Module utilities.section_utils
Utility functions and classes used by section_extractor.py and section_specs.py.
Functions
- def default_roll_func(start_idxs: list[int], sect_lines: list[str])
- 
Default line roll function. Developed for Cerner PDF line wrapping. Appends wrapped line text to the end of the previous line while attempting to preserve tabular layouts by examining the lines immediately before and after. NOTE: sect_lines will be updated in place.
 Args- start_idxs:- list[int]
- list of line indexes where line wrapping occurs.
- sect_lines:- list[str]
- list of lines in the current section.
 Example
>>> sect_lines = [
...     " Column1 Column2",
...     " Value1 Value",
...     "2"
... ]
>>> default_roll_func([1], sect_lines)
>>> sect_lines
[' Column1 Column2', ' Value1 Value 2']
- def field_roll_func(idxs: list[int], lines: list[str])
- 
Field line roll function. Developed for TeamHealth Racine Facesheets. Appends line_splits from wrapped line text to a split with a matching index on the previous line. NOTE: lines will be updated in place.
 Args- idxs:- list[int]
- list of line indexes where line wrapping occurs.
- lines:- list[str]
- list of lines in the current section.
 Returns- list[str]
- list of lines with wrapped text appended to the previous line.
 Example
>>> lines = [
...     " up:",
...     " Address: 4567 WILLOW WOOD DR",
...     " City: MOUNT Stat WI Zip: 534 Phone: 602-620-2413",
...     " PLEASANT e: 03",
...     "Guarantor Information"
... ]
>>> field_roll_func([2], lines)
>>> lines
[' up:', ' Address: 4567 WILLOW WOOD DR', ' City: MOUNTPLEASANT State: WI Zip: 53403 Phone: 602-620-2413', 'Guarantor Information']
- def is_attribution(line: str | Sequence[str]) ‑> bool
- 
Determine if a given line constitutes an "attribution" from a provider.
 Args- line:- str | Sequence[str]
- The input line or sequence of lines to check.
 Returns- bool
- True if the line constitutes an attribution, False otherwise.
 Example
>>> is_attribution(" JM.1 - Mawn, John Gregory, MD on 07/10/24 1402")
True
>>> is_attribution("Random text without attribution")
False
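The matching rules of is_attribution are not documented here. As a rough illustration only, the two examples above are consistent with a check for a provider name followed by "on MM/DD/YY HHMM". The sketch below assumes that shape; the name `looks_like_attribution`, the regular expression, and the handling of sequences are all hypothetical and not taken from the module.

```python
from __future__ import annotations

import re
from collections.abc import Sequence

# Hypothetical pattern: "<Last>, <First ...>, <CREDENTIAL> on MM/DD/YY HHMM".
# The real is_attribution may apply different or additional rules.
_ATTRIBUTION_RE = re.compile(
    r"\b[A-Za-z'\-]+, [A-Za-z .'\-]+, [A-Z]{2,5} on \d{2}/\d{2}/\d{2} \d{4}\b"
)


def looks_like_attribution(line: str | Sequence[str]) -> bool:
    """Return True if the line matches the assumed attribution pattern."""
    if not isinstance(line, str):
        # Assumption: for a sequence of lines, only the first is inspected.
        line = line[0] if line else ""
    return _ATTRIBUTION_RE.search(line) is not None


print(looks_like_attribution(" JM.1 - Mawn, John Gregory, MD on 07/10/24 1402"))
print(looks_like_attribution("Random text without attribution"))
```

A pattern this strict will miss attributions written in other formats, so it should be read as a starting point rather than a substitute for the real function.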
Classes
- class AgentsStripper
- 
Remove 'Agents' table of O2 flows from Epic records. Initially implemented for NPH_BSWWACS.
 Ancestors- DocumentStripper
- abc.ABC
 Inherited members
- class DocumentStripper
- 
Abstract class for constructing "strippers" that can be used to remove unwanted lines from a section. See RevisionHistoryStripper and VersionStripper below for example implementations.
 Ancestors- abc.ABC
 Subclasses- AgentsStripper
- DuplicatedSectionsStripper
- FlowsheetDataStripper
- HistoryAndPhysicalStripper
- InsuranceInformationDataStrippter
- RevisionHistoryStripper
- TimelineDataStripper
- VersionStripper
 Instance variables- prop strip_start_idx
- 
Line index where stripping began 
- prop stripping : bool
- 
True if actively removing lines 
 Methods- def strip_check(self, lines: Sequence[str])
- 
Check whether to begin or end stripping, based on the current stripping status.
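The abstract hooks that a DocumentStripper subclass must implement are not listed in this reference. The sketch below is therefore a self-contained stand-in that only mirrors the documented surface (`stripping`, `strip_start_idx`, `strip_check`). The class `StripperSketch`, its `starts`/`ends` hooks, the example subclass, and the driver `apply_stripper` are assumptions made for illustration.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from collections.abc import Sequence


class StripperSketch(ABC):
    """Hypothetical stand-in for DocumentStripper (not the real base class)."""

    def __init__(self) -> None:
        self.stripping = False      # True while lines are being removed
        self.strip_start_idx = -1   # line index where stripping began

    @abstractmethod
    def starts(self, lines: Sequence[str]) -> bool:
        """Assumed hook: True if stripping should begin at lines[0]."""

    @abstractmethod
    def ends(self, lines: Sequence[str]) -> bool:
        """Assumed hook: True if stripping should end at lines[0]."""

    def strip_check(self, lines: Sequence[str]) -> None:
        # Toggle state depending on whether stripping is already active.
        if not self.stripping and self.starts(lines):
            self.stripping = True
        elif self.stripping and self.ends(lines):
            self.stripping = False


class RevisionBlockStripper(StripperSketch):
    """Illustrative subclass: drop everything from 'Revision History' to the next 'Note'."""

    def starts(self, lines: Sequence[str]) -> bool:
        return bool(lines) and lines[0].startswith("Revision History")

    def ends(self, lines: Sequence[str]) -> bool:
        return bool(lines) and lines[0].startswith("Note")


def apply_stripper(stripper: StripperSketch, lines: list[str]) -> list[str]:
    """Hypothetical driver: keep only the lines seen while not stripping."""
    kept = []
    for i in range(len(lines)):
        was_stripping = stripper.stripping
        stripper.strip_check(lines[i:])
        if stripper.stripping and not was_stripping:
            stripper.strip_start_idx = i
        if not stripper.stripping:
            kept.append(lines[i])
    return kept


doc = ["Note A", "Revision History", "v1", "v2", "Note B", "end"]
stripper = RevisionBlockStripper()
print(apply_stripper(stripper, doc))
```

In this sketch the line that ends stripping is kept, on the assumption that it belongs to the following section.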
 
- class DuplicatedSectionsStripper
- 
Remove duplicated Intraop, Procedure Notes, summaries, events, and staff data present in NPH_BSWWACS.
 Ancestors- DocumentStripper
- abc.ABC
 Inherited members
- class FlowsheetDataStripper
- 
Remove Intraprocedure Flowsheet Data section from Epic records.
 Ancestors- DocumentStripper
- abc.ABC
 Inherited members
- class ForceNameTuple (checks: ForwardRef('Sequence[Callable[[str], bool]]'), title: ForwardRef('str'), apply_after_breaks: ForwardRef('bool') = False, replace: ForwardRef('bool') = False, insert: ForwardRef('bool') = False)
- 
Tests the current title with each check function. If any returns True, the current title is replaced with the value in the title attribute.
 Attributes- checks:- Sequence[Callable[[str], bool]]
- each callable is passed the current title. If any return True, the current title is replaced.
- title:- str
- the new title to use when any check returns True
- apply_after_breaks:- bool
- if True, the force name check is applied after any heading breaks are applied. Defaults to False.
- replace:- bool
- if True, the line containing the original title is replaced with the new title. If False, only the title is changed. Defaults to False. NOTE: Implemented in section_extractor ONLY. Has no effect for table_extractor use cases.
- insert:- bool
- if True, the forced title is inserted as the first line of the section/table. Default is False. NOTE: Implemented in section_extractor ONLY. Has no effect for table_extractor use cases.
 Ancestors- builtins.tuple
 Instance variables- var apply_after_breaks : bool
- 
Alias for field number 2 
- var checks : collections.abc.Sequence[collections.abc.Callable[[str], bool]]
- 
Alias for field number 0 
- var insert : bool
- 
Alias for field number 4 
- var replace : bool
- 
Alias for field number 3 
- var title : str
- 
Alias for field number 1 
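As a rough illustration of how the checks and title fields interact, the sketch below redeclares the tuple locally with the documented fields and applies a list of rules to a title. The helper `force_title` and the sample rule are hypothetical; how section_extractor actually consumes these tuples (including apply_after_breaks, replace, and insert) is not shown here.

```python
from __future__ import annotations

from collections.abc import Callable, Sequence
from typing import NamedTuple


class ForceNameTuple(NamedTuple):
    """Local stand-in mirroring the documented fields and defaults."""
    checks: Sequence[Callable[[str], bool]]
    title: str
    apply_after_breaks: bool = False
    replace: bool = False
    insert: bool = False


def force_title(current: str, rules: Sequence[ForceNameTuple]) -> str:
    """Hypothetical helper: return the title of the first rule with a passing check."""
    for rule in rules:
        if any(check(current) for check in rule.checks):
            return rule.title
    return current  # no rule matched, keep the original title


rules = [
    ForceNameTuple(
        checks=(
            lambda t: t.lower().startswith("op note"),
            lambda t: "operative" in t.lower(),
        ),
        title="Operative Note",
    )
]

print(force_title("OP NOTE - 07/10/24", rules))
print(force_title("Discharge Summary", rules))
```

Returning on the first matching rule is a design choice of this sketch; the real extractor may evaluate rules differently.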
 
- class HBTuple (hbreak: ForwardRef('str'), offset: ForwardRef('int') = 0, right: ForwardRef('bool') = False, replace: ForwardRef('bool') = False)
- 
Heading Break Tuple
 Attributes- hbreak:- str
- the string at which the heading's text will be split
- offset:- int
- added to str.rfind(hbreak) to set final split position
- right:- bool
- if True, take the final title from the right of the split; otherwise take it from the left (default).
- replace:- bool
- replace 1st line with new title. Defaults to False. Only applies to sections. Not implemented for tables.
 Ancestors- builtins.tuple
 Instance variables- var hbreak : str
- 
Alias for field number 0 
- var offset : int
- 
Alias for field number 1 
- var replace : bool
- 
Alias for field number 3 
- var right : bool
- 
Alias for field number 2 
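The split position described above is str.rfind(hbreak) plus offset. The sketch below works through that arithmetic on a sample heading. The helper `apply_heading_break`, the whitespace trimming, and the choice to leave the title unchanged when hbreak is absent are assumptions, not documented behavior.

```python
from __future__ import annotations

from typing import NamedTuple


class HBTuple(NamedTuple):
    """Local stand-in mirroring the documented fields and defaults."""
    hbreak: str
    offset: int = 0
    right: bool = False
    replace: bool = False


def apply_heading_break(title: str, hb: HBTuple) -> str:
    """Hypothetical helper: split the title at rfind(hbreak) + offset."""
    pos = title.rfind(hb.hbreak)
    if pos < 0:
        return title  # assumption: leave the title alone if hbreak is absent
    split_at = pos + hb.offset
    part = title[split_at:] if hb.right else title[:split_at]
    return part.strip()  # assumption: trim surrounding whitespace


heading = "Progress Notes by Smith, Jane, MD"
print(apply_heading_break(heading, HBTuple(" by ")))
print(apply_heading_break(heading, HBTuple(" by ", offset=4, right=True)))
```

With right=True, setting offset to the length of hbreak skips the break string itself, so only the text after it is kept.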
 
- class HistoryAndPhysicalStripper
- 
Remove H&P / History and Physical section data from ABS Epic records.
 Ancestors- DocumentStripper
- abc.ABC
 Inherited members
- class InsuranceInformationDataStrippter
- 
Remove an Insurance Information table. Specific to TMA_GTR.
 Ancestors- DocumentStripper
- abc.ABC
 Inherited members
- class LTTuple (latch: ForwardRef('Callable[[Sequence[str]], bool]'), trigger: ForwardRef('Callable[[Sequence[str]], bool]') = <function LTTuple.<lambda>>, unlatch: ForwardRef('Callable[[Sequence[str]], bool]') = <function LTTuple.<lambda>>, title_check: ForwardRef('Callable[[str], bool]') = <function LTTuple.<lambda>>)
- 
A Latch/Unlatch/Trigger mechanism for managing state transitions during iteration. LTTuple defines four callable attributes: latch, trigger, unlatch, and title_check. Typically, an iterative process will take some action if the current iteration object returns True when passed to 'trigger' if and only if a prior object passed to 'latch' has returned True and none of the objects between the latching object and the triggering object have returned True when passed to 'unlatch'.
 Attributes- latch:- Callable[[Sequence[str]], bool]
- Function to determine when to latch.
- trigger:- Callable[[Sequence[str]], bool]
- Function to determine when to trigger.
Default is lambda x: True.
- unlatch:- Callable[[Sequence[str]], bool]
- Function to determine when to unlatch.
Default is lambda x: False.
- title_check:- Callable[[str], bool]
- An optional function for checking the section title. If the current title returns False, skip the check.
Default is lambda x: True.
 Example
>>> ltt = LTTuple(
...     latch=lambda remainder: False if not remainder else remainder[0]=="latch",
...     trigger=lambda remainder: False if not remainder else remainder[0]=="trigger",
...     unlatch=lambda remainder: False if not remainder else remainder[0]=="unlatch"
... )
>>> lines = ["latch", "unlatch", "trigger", "latch", "line1", "line3", "trigger", "end"]
>>> latched = False
>>> for i, _ in enumerate(lines):
...     latched = (latched or ltt.latch(lines[i:])) and not ltt.unlatch(lines[i:])
...     if latched and ltt.trigger(lines[i:]):
...         print(i)
6
 Ancestors- builtins.tuple
 Instance variables- var latch : collections.abc.Callable[[collections.abc.Sequence[str]], bool]
- 
Alias for field number 0 
- var title_check : collections.abc.Callable[[str], bool]
- 
Alias for field number 3 
- var trigger : collections.abc.Callable[[collections.abc.Sequence[str]], bool]
- 
Alias for field number 1 
- var unlatch : collections.abc.Callable[[collections.abc.Sequence[str]], bool]
- 
Alias for field number 2 
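The title_check attribute gates the whole mechanism on the current section title. One way this could be wired up is sketched below with a local stand-in for the tuple. The helper `trigger_indexes` and the `first_is` factory are hypothetical, and the stand-in defaults follow the documented "lambda x: True" and "lambda x: False" values.

```python
from __future__ import annotations

from collections.abc import Callable, Sequence
from typing import NamedTuple


class LTTuple(NamedTuple):
    """Local stand-in mirroring the documented fields and defaults."""
    latch: Callable[[Sequence[str]], bool]
    trigger: Callable[[Sequence[str]], bool] = lambda x: True
    unlatch: Callable[[Sequence[str]], bool] = lambda x: False
    title_check: Callable[[str], bool] = lambda x: True


def first_is(word: str) -> Callable[[Sequence[str]], bool]:
    """Build a check that is True when the first remaining line equals word."""
    return lambda remainder: bool(remainder) and remainder[0] == word


def trigger_indexes(lines: Sequence[str], ltt: LTTuple, title: str) -> list[int]:
    """Hypothetical helper: indexes at which the trigger fires while latched."""
    if not ltt.title_check(title):
        return []  # this tuple does not apply to the current section
    hits = []
    latched = False
    for i in range(len(lines)):
        rest = lines[i:]
        latched = (latched or ltt.latch(rest)) and not ltt.unlatch(rest)
        if latched and ltt.trigger(rest):
            hits.append(i)
    return hits


ltt = LTTuple(
    latch=first_is("latch"),
    trigger=first_is("trigger"),
    unlatch=first_is("unlatch"),
    title_check=lambda title: title == "Op Note",
)
lines = ["latch", "unlatch", "trigger", "latch", "line1", "line3", "trigger", "end"]
print(trigger_indexes(lines, ltt, "Op Note"))
print(trigger_indexes(lines, ltt, "Labs"))
```

Note that with the default trigger (always True), the latching line itself would count as a trigger in this sketch.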
 
- class LineRollCheck (description: ForwardRef('str'), lines_check: ForwardRef('Callable[[int, list[str]], bool]'), roll_func: ForwardRef('Callable[[list[int], list[str]], None]') = <function default_roll_func>, title_check: ForwardRef('Callable[[str], bool]') = <function LineRollCheck.<lambda>>)
- 
Defines tests for detecting wrapped lines and correcting them.
 Attributes- description:- str
- appears in a log message if this check is fired.
- lines_check:- Callable[[int, list[str]], bool]
- given an index and the full list of lines in a section, return True if the line at index + 1 should be rolled up to the index line. False otherwise.
- roll_func:- Callable[[list[int], list[str]], None]
- given a list of line indexes where line wrapping occurs and the list of lines in the current section, update the list of lines by rolling all wrapped line text into the previous line. Defaults to default_roll_func.
- title_check:- Callable[[str], bool]
- given the title of the current section, return True to enable this line roll check or False to skip it. Defaults to lambda _: True.
 Ancestors- builtins.tuple
 Instance variables- var description : str
- 
Alias for field number 0 
- var lines_check : collections.abc.Callable[[int, list[str]], bool]
- 
Alias for field number 1 
- var roll_func : collections.abc.Callable[[list[int], list[str]], None]
- 
Alias for field number 2 
- var title_check : collections.abc.Callable[[str], bool]
- 
Alias for field number 3 
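The sketch below shows one way the four fields could fit together: lines_check finds the wrapped indexes, title_check gates the check, and roll_func merges the lines in place. It uses a local stand-in for the tuple and a deliberately simplified `simple_roll` in place of default_roll_func (no attempt to preserve table columns). The driver `apply_roll_checks` and the lowercase-continuation heuristic are assumptions for illustration.

```python
from __future__ import annotations

from collections.abc import Callable, Sequence
from typing import NamedTuple


def simple_roll(start_idxs: list[int], sect_lines: list[str]) -> None:
    """Simplified roll: append line idx + 1 to line idx, in place."""
    # Work from the bottom up so deletions do not shift earlier indexes.
    for idx in sorted(start_idxs, reverse=True):
        sect_lines[idx] = f"{sect_lines[idx].rstrip()} {sect_lines[idx + 1].strip()}"
        del sect_lines[idx + 1]


class LineRollCheck(NamedTuple):
    """Local stand-in; the real default roll_func is default_roll_func."""
    description: str
    lines_check: Callable[[int, list[str]], bool]
    roll_func: Callable[[list[int], list[str]], None] = simple_roll
    title_check: Callable[[str], bool] = lambda _: True


def apply_roll_checks(title: str, lines: list[str], checks: Sequence[LineRollCheck]) -> None:
    """Hypothetical driver: run each enabled check and roll the wrapped lines."""
    for check in checks:
        if not check.title_check(title):
            continue
        idxs = [i for i in range(len(lines) - 1) if check.lines_check(i, lines)]
        if idxs:
            check.roll_func(idxs, lines)


# Assumed heuristic: a line starting with a lowercase letter continues the previous one.
lowercase_continuation = LineRollCheck(
    description="next line starts with a lowercase letter",
    lines_check=lambda i, ls: ls[i + 1][:1].islower(),
)

section = ["Diagnosis: acute upper respiratory", "infection", "Plan: rest"]
apply_roll_checks("Assessment", section, [lowercase_continuation])
print(section)
```

Collecting all indexes before calling roll_func matches the documented signature, which takes the full list of wrap positions at once.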
 
- class RevisionHistoryStripper
- 
Remove revision history sections to prevent parsing errors.
 Ancestors- DocumentStripper
- abc.ABC
 Inherited members
- class SectExtBase
- 
Defines the fields a class must provide in order to be used where a full SectionExtractor instance is expected.
 Ancestors- abc.ABC
 Static methods- def new_from_table_dict(tab_dict: dict[str, list[dict[str, str]]])
- 
Return a new SectExtBase instance with self.table_dictionary set equal to tab_dict.
 
- class SectStartDisqualifier (title_check: ForwardRef('Callable[[str], bool]') = <function SectStartDisqualifier.<lambda>>, remaining_lines_disqualifier: ForwardRef('Callable[[Sequence[str]], bool]') = <function SectStartDisqualifier.<lambda>>)
- 
Section start disqualifier. Prevents a new section from triggering if the conditions described below are met.
 Attributes- title_check:- Callable[[str], bool]
- given the title of the currently extracting section, return True to enable this disqualifier or False to skip it.
- remaining_lines_disqualifier:- Callable[[Sequence[str]], bool]
- given the lines remaining in the extracted text, return True to prevent a new section from starting at the current position.
 Ancestors- builtins.tuple
 Instance variables- var remaining_lines_disqualifier : collections.abc.Callable[[collections.abc.Sequence[str]], bool]
- 
Alias for field number 1 
- var title_check : collections.abc.Callable[[str], bool]
- 
Alias for field number 0 
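A minimal sketch of how the two callables might combine: a section start is blocked only when title_check accepts the currently extracting section and remaining_lines_disqualifier returns True for the remaining text. The tuple is redeclared locally; its default lambdas, the helper `start_allowed`, and the indentation rule in the example are assumptions, since the real defaults are not shown in this reference.

```python
from __future__ import annotations

from collections.abc import Callable, Sequence
from typing import NamedTuple


class SectStartDisqualifier(NamedTuple):
    """Local stand-in; the default lambdas here are assumed, not documented."""
    title_check: Callable[[str], bool] = lambda _: True
    remaining_lines_disqualifier: Callable[[Sequence[str]], bool] = lambda _: False


def start_allowed(
    current_title: str,
    remaining: Sequence[str],
    disqualifiers: Sequence[SectStartDisqualifier],
) -> bool:
    """Hypothetical helper: False if any enabled disqualifier fires."""
    for dq in disqualifiers:
        if dq.title_check(current_title) and dq.remaining_lines_disqualifier(remaining):
            return False
    return True


# Assumed rule: inside "Medications", a heading-like line followed by an
# indented line is treated as table content rather than a new section.
indented_follow_up = SectStartDisqualifier(
    title_check=lambda title: title == "Medications",
    remaining_lines_disqualifier=lambda rest: len(rest) > 1 and rest[1].startswith("  "),
)

remaining = ["Allergies", "  penicillin"]
print(start_allowed("Medications", remaining, [indented_follow_up]))
print(start_allowed("Vitals", remaining, [indented_follow_up]))
```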
 
- class StripperTuple (stripper_class: ForwardRef('Type[DocumentStripper]'), kwargs: ForwardRef('dict[str, Any]'))
- 
Simple tuple defining a stripper class and its kwargs.
 Ancestors- builtins.tuple
 Instance variables- var kwargs : dict[str, typing.Any]
- 
Alias for field number 1 
- var stripper_class : Type[DocumentStripper]
- 
Alias for field number 0 
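Pairing a class with its kwargs lets a configuration list describe strippers without instantiating them up front. The sketch below shows that pattern with a local stand-in tuple and a placeholder class. `VersionStripperStandIn` and `build_strippers` are hypothetical; only the depth keyword is borrowed from the VersionStripper signature listed in this reference.

```python
from __future__ import annotations

from typing import Any, NamedTuple


class StripperTuple(NamedTuple):
    """Local stand-in mirroring the documented fields."""
    stripper_class: type
    kwargs: dict[str, Any]


class VersionStripperStandIn:
    """Placeholder class accepting a keyword-only depth argument."""

    def __init__(self, *, depth: int = 2) -> None:
        self.depth = depth


def build_strippers(specs: list[StripperTuple]) -> list[Any]:
    """Hypothetical helper: instantiate each stripper with its kwargs."""
    return [spec.stripper_class(**spec.kwargs) for spec in specs]


specs = [
    StripperTuple(VersionStripperStandIn, {"depth": 3}),
    StripperTuple(VersionStripperStandIn, {}),  # empty kwargs keeps the defaults
]
strippers = build_strippers(specs)
print([s.depth for s in strippers])
```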
 
- class TimelineDataStripper
- 
Remove Patient Care Timeline section from ABS Epic records.
 Ancestors- DocumentStripper
- abc.ABC
 Inherited members
- class VersionStripper (*, depth=2, end_strip_latches: Sequence[LTTuple] = ())
- 
Remove all but the most recent version for all Op Notes.
 Ancestors- DocumentStripper
- abc.ABC
 Inherited members