Module utilities.section_utils
Utility functions and classes used by section_extractor.py and section_specs.py.
Functions
def default_roll_func(start_idxs: list[int], sect_lines: list[str])
-
Default line roll function. Developed for Cerner PDF line wrapping.
Appends wrapped line text to the end of the previous line while attempting to preserve tabular layouts by examining the lines immediately before and after. NOTE: sect_lines will be updated in place.
Args
start_idxs
:list[int]
- list of line indexes where line wrapping occurs.
sect_lines
:list[str]
- list of lines in the current section.
Example
>>> sect_lines = [
...     " Column1 Column2",
...     " Value1 Value",
...     "2"
... ]
>>> default_roll_func([1], sect_lines)
>>> sect_lines
[' Column1 Column2', ' Value1 Value 2']
def field_roll_func(idxs: list[int], lines: list[str])
-
Field line roll function. Developed for TeamHealth Racine Facesheets.
Appends line_splits from wrapped line text to a split with a matching index on the previous line. NOTE: lines will be updated in place.
Args
idxs
:list[int]
- list of line indexes where line wrapping occurs.
lines
:list[str]
- list of lines in the current section.
Returns
list[str]
- list of lines with wrapped text appended to the previous line.
Example
>>> lines = [
...     " up:",
...     " Address: 4567 WILLOW WOOD DR",
...     " City: MOUNT Stat WI Zip: 534 Phone: 602-620-2413",
...     " PLEASANT e: 03",
...     "Guarantor Information"
... ]
>>> field_roll_func([2], lines)
>>> lines
[' up:', ' Address: 4567 WILLOW WOOD DR', ' City: MOUNTPLEASANT State: WI Zip: 53403 Phone: 602-620-2413', 'Guarantor Information']
def is_attribution(line: str | Sequence[str]) -> bool
-
Determine if a given line constitutes an "attribution" from a provider.
Args
line
:str | Sequence[str]
- The input line or sequence of lines to check.
Returns
bool
- True if the line constitutes an attribution, False otherwise.
Example
>>> is_attribution(" JM.1 - Mawn, John Gregory, MD on 07/10/24 1402")
True
>>> is_attribution("Random text without attribution")
False
Classes
class AgentsStripper
-
Remove 'Agents' table of O2 flows from Epic records.
Initially implemented for NPH_BSWWACS.
Ancestors
- DocumentStripper
- abc.ABC
Inherited members
class DocumentStripper
-
Abstract class for constructing "strippers" that can be used to remove unwanted lines from a section. See RevisionHistoryStripper and VersionStripper below for example implementations.
Ancestors
- abc.ABC
Subclasses
- AgentsStripper
- DuplicatedSectionsStripper
- FlowsheetDataStripper
- HistoryAndPhysicalStripper
- InsuranceInformationDataStrippter
- RevisionHistoryStripper
- TimelineDataStripper
- VersionStripper
Instance variables
prop strip_start_idx
-
Line index where stripping began
prop stripping : bool
-
True if actively removing lines
Methods
def strip_check(self, lines: Sequence[str])
-
Check whether to begin or end stripping, based on the current stripping status.
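Because DocumentStripper is abstract, a concrete stripper has to supply the begin/end logic. The following is only a minimal sketch of that idea, not the module's implementation. It assumes strip_check receives the lines remaining from the current position and toggles the stripping flag. The stand-in base class SketchStripper, the MarkerStripper subclass, the strip_lines driver, and the marker strings are all hypothetical names introduced for illustration.

```python
from abc import ABC, abstractmethod
from collections.abc import Sequence


class SketchStripper(ABC):
    """Hypothetical stand-in for DocumentStripper; the real internals may differ."""

    def __init__(self) -> None:
        self.stripping = False  # True while lines are being removed

    @abstractmethod
    def strip_check(self, lines: Sequence[str]) -> None:
        """Begin or end stripping based on the remaining lines (assumed contract)."""


class MarkerStripper(SketchStripper):
    """Hypothetical stripper: drop everything from a start marker up to an end marker."""

    def __init__(self, start: str, end: str) -> None:
        super().__init__()
        self.start = start
        self.end = end

    def strip_check(self, lines: Sequence[str]) -> None:
        if not lines:
            return
        if not self.stripping and lines[0].startswith(self.start):
            self.stripping = True
        elif self.stripping and lines[0].startswith(self.end):
            self.stripping = False


def strip_lines(lines: list[str], stripper: SketchStripper) -> list[str]:
    """Hypothetical driver: keep only the lines seen while not stripping."""
    kept = []
    for i in range(len(lines)):
        stripper.strip_check(lines[i:])
        if not stripper.stripping:
            kept.append(lines[i])
    return kept


doc = ["Header", "Revision History", "v1", "v2", "Assessment", "text"]
print(strip_lines(doc, MarkerStripper("Revision History", "Assessment")))
# ['Header', 'Assessment', 'text']
```

In this sketch the line that ends stripping is kept; whether the real strippers keep or drop that boundary line is not stated in the documentation above.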
class DuplicatedSectionsStripper
-
Remove duplicated Intraop, Procedure Notes, summaries, events, and staff data present in NPH_BSWWACS.
Ancestors
- DocumentStripper
- abc.ABC
Inherited members
class FlowsheetDataStripper
-
Remove Intraprocedure Flowsheet Data section from Epic records.
Ancestors
- DocumentStripper
- abc.ABC
Inherited members
class ForceNameTuple (checks: Sequence[Callable[[str], bool]], title: str, apply_after_breaks: bool = False, replace: bool = False, insert: bool = False)
-
Tests the current title with each check function. If any returns True, the current title is replaced with the value in the title attribute.
Attributes
checks
:Sequence[Callable[[str], bool]]
- each callable is passed the current title. If any return True, the current title is replaced.
title
:str
- the new title to use when any check returns True
apply_after_breaks
:bool
- if True, the force name check is applied after any heading breaks are applied. Defaults to False.
replace
:bool
- if True, the line containing the original title is replaced with the new title. If False, only the title is changed. Defaults to False. NOTE: Implemented in section_extractor ONLY. Has no effect for table_extractor use cases.
insert
:bool
- if True, the forced title is inserted as the first line of the section/table. Defaults to False. NOTE: Implemented in section_extractor ONLY. Has no effect for table_extractor use cases.
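The checks/title behaviour described above can be illustrated with a short sketch. It is a simplified stand-in, not the extractor's code: ForceNameSketch mirrors only the first two fields, force_title is a hypothetical helper, and letting the first matching rule win is an assumption. The apply_after_breaks, replace, and insert flags are not modelled.

```python
from collections.abc import Callable, Sequence
from typing import NamedTuple


class ForceNameSketch(NamedTuple):
    """Hypothetical stand-in mirroring the checks and title fields of ForceNameTuple."""

    checks: Sequence[Callable[[str], bool]]
    title: str


def force_title(current: str, rules: Sequence[ForceNameSketch]) -> str:
    """Return the forced title of the first rule with a passing check (assumed order)."""
    for rule in rules:
        if any(check(current) for check in rule.checks):
            return rule.title
    return current  # no rule matched: the title is left as it was


rules = [
    ForceNameSketch(
        checks=(
            lambda t: t.lower().startswith("op note"),
            lambda t: "operative" in t.lower(),
        ),
        title="Operative Note",
    )
]
print(force_title("OP NOTE - 07/10/24", rules))  # Operative Note
print(force_title("Discharge Summary", rules))   # Discharge Summary
```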
Ancestors
- builtins.tuple
Instance variables
var apply_after_breaks : bool
-
Alias for field number 2
var checks : collections.abc.Sequence[collections.abc.Callable[[str], bool]]
-
Alias for field number 0
var insert : bool
-
Alias for field number 4
var replace : bool
-
Alias for field number 3
var title : str
-
Alias for field number 1
class HBTuple (hbreak: str, offset: int = 0, right: bool = False, replace: bool = False)
-
Heading Break Tuple
Attributes
hbreak
:str
- the string at which the heading's text will be split
offset
:int
- added to str.rfind(hbreak) to set final split position
right
:bool
- take final title from left (default) or right of split
replace
:bool
- replace the first line with the new title. Defaults to False. Applies only to sections; not implemented for tables.
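A worked example may help show how hbreak, offset, and right interact. The sketch below follows the attribute descriptions (split position is str.rfind(hbreak) plus offset, then take the left or right part), but it is not the extractor's code. HBSketch and apply_heading_break are hypothetical names, and two behaviours are assumptions: the title is returned unchanged when hbreak is not found, and surrounding whitespace is stripped from the result. The replace flag is not modelled.

```python
from typing import NamedTuple


class HBSketch(NamedTuple):
    """Hypothetical stand-in mirroring the first three HBTuple fields."""

    hbreak: str
    offset: int = 0
    right: bool = False


def apply_heading_break(title: str, hb: HBSketch) -> str:
    """Split the title at rfind(hbreak) + offset and keep one side."""
    pos = title.rfind(hb.hbreak)
    if pos == -1:
        return title  # assumption: no break string means no change
    pos += hb.offset
    part = title[pos:] if hb.right else title[:pos]
    return part.strip()  # assumption: trim whitespace around the result


heading = "Progress Notes - Smith, John MD"
print(apply_heading_break(heading, HBSketch(" - ")))                       # Progress Notes
print(apply_heading_break(heading, HBSketch(" - ", offset=3, right=True)))  # Smith, John MD
```

With right=True, an offset equal to len(hbreak) skips the break string itself, which is why the second call uses offset=3.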
Ancestors
- builtins.tuple
Instance variables
var hbreak : str
-
Alias for field number 0
var offset : int
-
Alias for field number 1
var replace : bool
-
Alias for field number 3
var right : bool
-
Alias for field number 2
class HistoryAndPhysicalStripper
-
Remove H&P / History and Physical section data from ABS Epic records.
Ancestors
- DocumentStripper
- abc.ABC
Inherited members
class InsuranceInformationDataStrippter
-
Remove an Insurance Information table. Specific to TMA_GTR.
Ancestors
- DocumentStripper
- abc.ABC
Inherited members
class LTTuple (latch: Callable[[Sequence[str]], bool], trigger: Callable[[Sequence[str]], bool] = <function LTTuple.<lambda>>, unlatch: Callable[[Sequence[str]], bool] = <function LTTuple.<lambda>>, title_check: Callable[[str], bool] = <function LTTuple.<lambda>>)
-
A Latch/Unlatch/Trigger mechanism for managing state transitions during iteration.
LTTuple defines four callable attributes: latch, trigger, unlatch, and title_check. Typically, an iterative process acts on the current item when that item returns True from 'trigger', but only if an earlier item returned True from 'latch' and no item between the latching item and the triggering item returned True from 'unlatch'.
Attributes
latch
:Callable[[Sequence[str]], bool]
- Function to determine when to latch.
trigger
:Callable[[Sequence[str]], bool]
- Function to determine when to trigger. Default is lambda x: True.
unlatch
:Callable[[Sequence[str]], bool]
- Function to determine when to unlatch. Default is lambda x: False.
title_check
:Callable[[str], bool]
- An optional function for checking the section title. If it returns False for the current title, the check is skipped. Default is lambda x: True.
Example
>>> ltt = LTTuple(
...     latch=lambda remainder: False if not remainder else remainder[0]=="latch",
...     trigger=lambda remainder: False if not remainder else remainder[0]=="trigger",
...     unlatch=lambda remainder: False if not remainder else remainder[0]=="unlatch"
... )
>>> lines = ["latch", "unlatch", "trigger", "latch", "line1", "line3", "trigger", "end"]
>>> latched = False
>>> for i, _ in enumerate(lines):
...     latched = (latched or ltt.latch(lines[i:])) and not ltt.unlatch(lines[i:])
...     if latched and ltt.trigger(lines[i:]):
...         print(i)
6
Ancestors
- builtins.tuple
Instance variables
var latch : collections.abc.Callable[[collections.abc.Sequence[str]], bool]
-
Alias for field number 0
var title_check : collections.abc.Callable[[str], bool]
-
Alias for field number 3
var trigger : collections.abc.Callable[[collections.abc.Sequence[str]], bool]
-
Alias for field number 1
var unlatch : collections.abc.Callable[[collections.abc.Sequence[str]], bool]
-
Alias for field number 2
class LineRollCheck (description: str, lines_check: Callable[[int, list[str]], bool], roll_func: Callable[[list[int], list[str]], None] = <function default_roll_func>, title_check: Callable[[str], bool] = <function LineRollCheck.<lambda>>)
-
Defines tests for detecting wrapped lines and correcting them.
Attributes
description
:str
- appears in a log message if this check is fired.
lines_check
:Callable[[int, list[str]], bool]
- given an index and the full list of lines in a section, return True if the line at index + 1 should be rolled up to the index line. False otherwise.
roll_func
:Callable[[list[int], list[str]], None]
- given a list of line indexes where line wrapping occurs and the list of lines in the current section, update the list of lines by rolling all wrapped line text into the previous line. Defaults to default_roll_func.
title_check
:Callable[[str], bool]
- given the title of the current section, return True to enable this line roll check or False to skip it. Defaults to lambda _: True.
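To show how the four fields might work together, here is a minimal sketch of applying one check to a section. It is an illustration under stated assumptions, not the extractor's actual loop: RollCheckSketch is a stand-in tuple with the same field names, simple_roll_func is a hypothetical roll function that joins with a single space (it does not attempt the tabular handling of default_roll_func), and apply_roll_check is a hypothetical driver.

```python
from collections.abc import Callable
from typing import NamedTuple


def simple_roll_func(start_idxs: list[int], sect_lines: list[str]) -> None:
    """Hypothetical roll function: join each wrapped line onto the line before it, in place."""
    # Work from the bottom up so earlier indexes stay valid after each deletion.
    for idx in sorted(start_idxs, reverse=True):
        sect_lines[idx] = f"{sect_lines[idx].rstrip()} {sect_lines[idx + 1].strip()}"
        del sect_lines[idx + 1]


class RollCheckSketch(NamedTuple):
    """Hypothetical stand-in with the same field names as LineRollCheck."""

    description: str
    lines_check: Callable[[int, list[str]], bool]
    roll_func: Callable[[list[int], list[str]], None] = simple_roll_func
    title_check: Callable[[str], bool] = lambda _: True


def apply_roll_check(check: RollCheckSketch, title: str, lines: list[str]) -> None:
    """Collect indexes whose following line is wrapped, then roll them up in place."""
    if not check.title_check(title):
        return  # check is disabled for this section title
    idxs = [i for i in range(len(lines) - 1) if check.lines_check(i, lines)]
    if idxs:
        check.roll_func(idxs, lines)


trailing_comma = RollCheckSketch(
    description="line ends with a comma, so the next line is a continuation",
    lines_check=lambda i, ls: ls[i].rstrip().endswith(","),
)
section = ["Medications:", "aspirin, ibuprofen,", "acetaminophen", "Allergies: none"]
apply_roll_check(trailing_comma, "Medications", section)
print(section)
# ['Medications:', 'aspirin, ibuprofen, acetaminophen', 'Allergies: none']
```

Note that each collected index refers to the line that receives the text, and the line at index + 1 is the one rolled up, which matches the lines_check description above.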
Ancestors
- builtins.tuple
Instance variables
var description : str
-
Alias for field number 0
var lines_check : collections.abc.Callable[[int, list[str]], bool]
-
Alias for field number 1
var roll_func : collections.abc.Callable[[list[int], list[str]], None]
-
Alias for field number 2
var title_check : collections.abc.Callable[[str], bool]
-
Alias for field number 3
class RevisionHistoryStripper
-
Remove revision history sections to prevent parsing errors.
Ancestors
- DocumentStripper
- abc.ABC
Inherited members
class SectExtBase
-
Defines fields required in a class in order for it to function in use cases intended for a full SectionExtractor instance.
Ancestors
- abc.ABC
Static methods
def new_from_table_dict(tab_dict: dict[str, list[dict[str, str]]])
-
Return a new SectExtBase instance with table_dictionary set equal to tab_dict.
class SectStartDisqualifier (title_check: Callable[[str], bool] = <function SectStartDisqualifier.<lambda>>, remaining_lines_disqualifier: Callable[[Sequence[str]], bool] = <function SectStartDisqualifier.<lambda>>)
-
Section start disqualifier. Prevents a new section from triggering if the conditions described below are met.
Attributes
title_check
:Callable[[str], bool]
- given the title of the currently extracting section, return True to enable this disqualifier or False to skip it.
remaining_lines_disqualifier
:Callable[[Sequence[str]], bool]
- given the lines remaining in the extracted text, return True to prevent a new section from starting at the current position.
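The two callables can be read as a gate on section starts. The sketch below shows one way such a gate could be evaluated; it is not the extractor's code. DisqualifierSketch and section_start_allowed are hypothetical names, the default lambdas are assumed values (the real defaults are not shown in this documentation), and the section titles and the "Dose:" rule are made up for the example.

```python
from collections.abc import Callable, Sequence
from typing import NamedTuple


class DisqualifierSketch(NamedTuple):
    """Hypothetical stand-in with the same field names as SectStartDisqualifier."""

    title_check: Callable[[str], bool] = lambda _: True  # assumed default
    remaining_lines_disqualifier: Callable[[Sequence[str]], bool] = lambda _: False  # assumed default


def section_start_allowed(
    current_title: str,
    remaining: Sequence[str],
    disqualifiers: Sequence[DisqualifierSketch],
) -> bool:
    """A new section may start only if no enabled disqualifier fires."""
    return not any(
        d.title_check(current_title) and d.remaining_lines_disqualifier(remaining)
        for d in disqualifiers
    )


# While extracting "Medications", a heading-like line followed by a "Dose:" line
# is treated as table content rather than as the start of a new section.
dose_follows = DisqualifierSketch(
    title_check=lambda title: title == "Medications",
    remaining_lines_disqualifier=lambda rem: len(rem) > 1 and rem[1].startswith("Dose:"),
)
print(section_start_allowed("Medications", ["Allergies", "Dose: 5 mg"], [dose_follows]))  # False
print(section_start_allowed("Vitals", ["Allergies", "Dose: 5 mg"], [dose_follows]))       # True
```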
Ancestors
- builtins.tuple
Instance variables
var remaining_lines_disqualifier : collections.abc.Callable[[collections.abc.Sequence[str]], bool]
-
Alias for field number 1
var title_check : collections.abc.Callable[[str], bool]
-
Alias for field number 0
class StripperTuple (stripper_class: Type[DocumentStripper], kwargs: dict[str, Any])
-
Simple tuple defining a stripper class and its kwargs.
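Pairing a class with its kwargs lets a configuration list describe strippers without constructing them up front. A minimal sketch of how such tuples could be turned into instances follows. StripperTupleSketch, DepthStripper, and build_strippers are hypothetical names, and instantiating via stripper_class(**kwargs) is an assumption about how the tuple is consumed.

```python
from typing import Any, NamedTuple


class DepthStripper:
    """Hypothetical stripper with a keyword-only argument, similar in shape to VersionStripper."""

    def __init__(self, *, depth: int = 2) -> None:
        self.depth = depth


class StripperTupleSketch(NamedTuple):
    """Hypothetical stand-in with the same field names as StripperTuple."""

    stripper_class: type
    kwargs: dict[str, Any]


def build_strippers(specs: list[StripperTupleSketch]) -> list[Any]:
    """Instantiate each stripper class with its stored keyword arguments."""
    return [spec.stripper_class(**spec.kwargs) for spec in specs]


specs = [
    StripperTupleSketch(DepthStripper, {"depth": 3}),
    StripperTupleSketch(DepthStripper, {}),  # empty kwargs fall back to the class defaults
]
print([s.depth for s in build_strippers(specs)])  # [3, 2]
```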
Ancestors
- builtins.tuple
Instance variables
var kwargs : dict[str, typing.Any]
-
Alias for field number 1
var stripper_class : Type[DocumentStripper]
-
Alias for field number 0
class TimelineDataStripper
-
Remove Patient Care Timeline section from ABS Epic records.
Ancestors
- DocumentStripper
- abc.ABC
Inherited members
class VersionStripper (*, depth=2, end_strip_latches: Sequence[LTTuple] = ())
-
Remove all but the most recent version for all Op Notes.
Ancestors
- DocumentStripper
- abc.ABC
Inherited members