Module utilities.section_utils

Utility functions and classes used by section_extractor.py and section_specs.py.

Functions

def default_roll_func(start_idxs: list[int], sect_lines: list[str])

Default line roll function. Developed for Cerner PDF line wrapping.

Appends wrapped line text to the end of the previous line while attempting to preserve tabular layouts by examining the lines immediately before and after. NOTE: sect_lines will be updated in place.

Args

start_idxs : list[int]
list of line indexes where line wrapping occurs.
sect_lines : list[str]
list of lines in the current section.

Example

>>> sect_lines = [
...     "   Column1      Column2",
...     "    Value1      Value",
...     "2"
... ]
>>> default_roll_func([1], sect_lines)
>>> sect_lines
['   Column1      Column2', '    Value1      Value 2']

def field_roll_func(idxs: list[int], lines: list[str])

Field line roll function. Developed for TeamHealth Racine Facesheets.

Appends line_splits from wrapped line text to a split with a matching index on the previous line. NOTE: lines will be updated in place.

Args

idxs : list[int]
list of line indexes where line wrapping occurs.
lines : list[str]
list of lines in the current section.

Returns

None
lines is modified in place; wrapped text is appended to the previous line.

Example

>>> lines = [
...     "                               up:",
...     "   Address:     4567 WILLOW WOOD DR",
...     "   City:        MOUNT          Stat   WI     Zip:      534    Phone:           602-620-2413",
...     "                PLEASANT       e:                      03",
...     "Guarantor Information"
... ]
>>> field_roll_func([2], lines)
>>> lines
['                               up:',
 '   Address:     4567 WILLOW WOOD DR',
 '   City:        MOUNTPLEASANT          State:   WI     Zip:      53403    Phone:           602-620-2413',
 'Guarantor Information']

def is_attribution(line: str | Sequence[str]) -> bool

Determine if a given line constitutes an "attribution" from a provider.

Args

line : str | Sequence[str]
The input line or sequence of lines to check.

Returns

bool
True if the line constitutes an attribution, False otherwise.

Example

>>> is_attribution("   JM.1 - Mawn, John Gregory, MD on 07/10/24 1402")
True
>>> is_attribution("Random text without attribution")
False

Classes

class AgentsStripper

Remove 'Agents' table of O2 flows from Epic records.

Initially implemented for NPH_BSWWACS.

Ancestors

Inherited members

class DocumentStripper

Abstract class for constructing "strippers" that can be used to remove unwanted lines from a section. See RevisionHistoryStripper and VersionStripper below for example implementations.

Ancestors

  • abc.ABC

Subclasses

Instance variables

prop strip_start_idx

Line index where stripping began

prop stripping : bool

True if actively removing lines

Methods

def strip_check(self, lines: Sequence[str])

Check to begin/end stripping based on current stripping status
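
The begin/end behaviour described above can be sketched as a small state machine. This is only an illustration: MarkerStripper, its start_marker and end_marker arguments, the strip_lines driver, and the assumption that strip_check receives the lines from the current position onward are all hypothetical and are not taken from the real DocumentStripper implementation.

```python
from typing import Optional, Sequence


class MarkerStripper:
    """Hypothetical stand-in; NOT the real DocumentStripper API."""

    def __init__(self, start_marker: str, end_marker: str) -> None:
        self.start_marker = start_marker
        self.end_marker = end_marker
        self.stripping = False  # True while lines are being removed
        self.strip_start_idx: Optional[int] = None  # index where stripping began
        self._idx = -1  # index of the line most recently checked

    def strip_check(self, lines: Sequence[str]) -> None:
        """Assumes `lines` holds the lines from the current position onward."""
        self._idx += 1
        if not lines:
            return
        if not self.stripping and lines[0].startswith(self.start_marker):
            self.stripping = True
            self.strip_start_idx = self._idx
        elif self.stripping and lines[0].startswith(self.end_marker):
            self.stripping = False


def strip_lines(lines: Sequence[str], stripper: MarkerStripper) -> list:
    """Hypothetical driver: keep only lines seen while not stripping."""
    kept = []
    for i in range(len(lines)):
        stripper.strip_check(lines[i:])
        if not stripper.stripping:
            kept.append(lines[i])
    return kept


doc_lines = ["Note text", "Revision History", "v1 edited", "v2 edited", "Assessment", "Plan"]
stripper = MarkerStripper(start_marker="Revision History", end_marker="Assessment")
kept = strip_lines(doc_lines, stripper)
```

In this sketch the line that ends stripping is kept, because it is assumed to open the next wanted section.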

class DuplicatedSectionsStripper

Remove duplicated Intraop, Procedure Notes, summaries, events, and staff data present in NPH_BSWWACS.

Ancestors

Inherited members

class FlowsheetDataStripper

Remove Intraprocedure Flowsheet Data section from Epic records.

Ancestors

Inherited members

class ForceNameTuple (checks: Sequence[Callable[[str], bool]], title: str, apply_after_breaks: bool = False, replace: bool = False, insert: bool = False)

Tests the current title with each check function. If any check returns True, the current title is replaced with the value in the title attribute.

Attributes

checks : Sequence[Callable[[str], bool]]
each callable is passed the current title. If any return True, the current title is replaced.
title : str
the new title to use when any check returns True
apply_after_breaks : bool
if True, the force name check is applied after any heading breaks are applied. Defaults to False.
replace : bool
if True, the line containing the original title is replaced with the new title. If False, only the title is changed. Defaults to False. NOTE: Implemented in section_extractor ONLY. Has no effect for table_extractor use cases.
insert : bool
if True, the forced title is inserted as the first line of the section/table. Defaults to False. NOTE: Implemented in section_extractor ONLY. Has no effect for table_extractor use cases.
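
A minimal sketch of how the checks and title fields might be applied. The ForceNameTuple below is a local stand-in that only mirrors the documented fields, and apply_force_names is a hypothetical helper; the real logic in section_extractor may differ (for example in how apply_after_breaks, replace, and insert are honoured).

```python
from typing import Callable, NamedTuple, Sequence


class ForceNameTuple(NamedTuple):
    """Local stand-in mirroring the documented fields (not the real class)."""
    checks: Sequence[Callable[[str], bool]]
    title: str
    apply_after_breaks: bool = False
    replace: bool = False
    insert: bool = False


def apply_force_names(title: str, rules: Sequence[ForceNameTuple]) -> str:
    """Hypothetical helper: return the forced title of the first matching rule."""
    for rule in rules:
        # A rule fires if ANY of its checks accepts the current title.
        if any(check(title) for check in rule.checks):
            return rule.title
    return title


rules = [
    ForceNameTuple(
        checks=[
            lambda t: t.lower().startswith("op note"),
            lambda t: "operative" in t.lower(),
        ],
        title="Operative Note",
    )
]

forced = apply_force_names("OP NOTE - 07/10/24", rules)
unchanged = apply_force_names("Discharge Summary", rules)
```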

Ancestors

  • builtins.tuple

Instance variables

var apply_after_breaks : bool

Alias for field number 2

var checks : collections.abc.Sequence[collections.abc.Callable[[str], bool]]

Alias for field number 0

var insert : bool

Alias for field number 4

var replace : bool

Alias for field number 3

var title : str

Alias for field number 1

class HBTuple (hbreak: str, offset: int = 0, right: bool = False, replace: bool = False)

Heading Break Tuple

Attributes

hbreak : str
the string at which the heading's text will be split
offset : int
added to str.rfind(hbreak) to set final split position
right : bool
if True, take the final title from the right of the split. If False (default), take it from the left.
replace : bool
replace 1st line with new title. Defaults to False. Only applies to sections. Not implemented for tables.
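
The split rule described by hbreak, offset, and right can be sketched as follows. HBTuple here is a local stand-in with the documented fields, and apply_heading_break is a hypothetical helper; returning the title unchanged when hbreak is absent is an assumption, not documented behaviour.

```python
from typing import NamedTuple


class HBTuple(NamedTuple):
    """Local stand-in mirroring the documented fields (not the real class)."""
    hbreak: str
    offset: int = 0
    right: bool = False
    replace: bool = False


def apply_heading_break(title: str, hb: HBTuple) -> str:
    """Hypothetical helper: split the title at str.rfind(hbreak) + offset."""
    pos = title.rfind(hb.hbreak)
    if pos == -1:
        return title  # assumption: no break string means no change
    split_at = pos + hb.offset
    return title[split_at:] if hb.right else title[:split_at]


# Keep the text to the left of the last " by ".
left_title = apply_heading_break("Progress Notes by Smith, MD", HBTuple(hbreak=" by "))

# Keep the text to the right of ": "; the offset of 2 skips the break string itself.
right_title = apply_heading_break(
    "Encounter: Progress Notes", HBTuple(hbreak=": ", offset=2, right=True)
)
```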

Ancestors

  • builtins.tuple

Instance variables

var hbreak : str

Alias for field number 0

var offset : int

Alias for field number 1

var replace : bool

Alias for field number 3

var right : bool

Alias for field number 2

class HistoryAndPhysicalStripper

Remove H&P / History and Physical section data from ABS Epic records.

Ancestors

Inherited members

class InsuranceInformationDataStrippter

Remove an Insurance Information table. Specific to TMA_GTR.

Ancestors

Inherited members

class LTTuple (latch: Callable[[Sequence[str]], bool], trigger: Callable[[Sequence[str]], bool] = <function LTTuple.<lambda>>, unlatch: Callable[[Sequence[str]], bool] = <function LTTuple.<lambda>>, title_check: Callable[[str], bool] = <function LTTuple.<lambda>>)

A Latch/Unlatch/Trigger mechanism for managing state transitions during iteration.

LTTuple defines four callable attributes: latch, trigger, unlatch, and title_check. Typically, an iterative process will take some action if the current iteration object returns True when passed to 'trigger' if and only if a prior object passed to 'latch' has returned True and none of the objects between the latching object and the triggering object have returned True when passed to 'unlatch'.

Attributes

latch : Callable[[Sequence[str]], bool]
Function to determine when to latch.
trigger : Callable[[Sequence[str]], bool]
Function to determine when to trigger. Default is lambda x: True.
unlatch : Callable[[Sequence[str]], bool]
Function to determine when to unlatch. Default is lambda x: False.
title_check : Callable[[str], bool]
An optional function for checking the section title. If the current title returns False, skip the check. Default is lambda x: True.

Example

>>> ltt = LTTuple(
...     latch=lambda remainder: False if not remainder else remainder[0]=="latch",
...     trigger=lambda remainder: False if not remainder else remainder[0]=="trigger",
...     unlatch=lambda remainder: False if not remainder else remainder[0]=="unlatch"
... )
>>> lines = ["latch", "unlatch", "trigger", "latch", "line1", "line3", "trigger", "end"]
>>> latched = False
>>> for i, _ in enumerate(lines):
...     latched = (latched or ltt.latch(lines[i:])) and not ltt.unlatch(lines[i:])
...     if latched and ltt.trigger(lines[i:]):
...         print(i)
6

Ancestors

  • builtins.tuple

Instance variables

var latch : collections.abc.Callable[[collections.abc.Sequence[str]], bool]

Alias for field number 0

var title_check : collections.abc.Callable[[str], bool]

Alias for field number 3

var trigger : collections.abc.Callable[[collections.abc.Sequence[str]], bool]

Alias for field number 1

var unlatch : collections.abc.Callable[[collections.abc.Sequence[str]], bool]

Alias for field number 2

class LineRollCheck (description: str, lines_check: Callable[[int, list[str]], bool], roll_func: Callable[[list[int], list[str]], None] = <function default_roll_func>, title_check: Callable[[str], bool] = <function LineRollCheck.<lambda>>)

Defines tests for detecting wrapped lines and correcting them.

Attributes

description : str
appears in a log message if this check is fired.
lines_check : Callable[[int, list[str]], bool]
given an index and the full list of lines in a section, return True if the line at index + 1 should be rolled up to the index line. False otherwise.
roll_func : Callable[[list[int], list[str]], None]
given a list of line indexes where line wrapping occurs and the list of lines in the current section, update the list of lines by rolling all wrapped line text into the previous line. Defaults to default_roll_func.
title_check : Callable[[str], bool]
given the title of the current section, return True to enable this line roll check or False to skip it. Defaults to lambda _: True.
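
A minimal sketch of how the four fields could work together. LineRollCheck below is a local stand-in with the documented fields; simple_roll_func is a hypothetical, simplified replacement for default_roll_func (it does not try to preserve tabular layouts), and continues_sentence is an illustrative lines_check, not one from the real code base.

```python
from typing import Callable, NamedTuple


def simple_roll_func(start_idxs: list, sect_lines: list) -> None:
    """Hypothetical roll function: join each wrapped line onto its predecessor, in place."""
    # Work from the bottom up so earlier indexes stay valid after deletions.
    for idx in sorted(start_idxs, reverse=True):
        sect_lines[idx] = f"{sect_lines[idx].rstrip()} {sect_lines[idx + 1].strip()}"
        del sect_lines[idx + 1]


class LineRollCheck(NamedTuple):
    """Local stand-in mirroring the documented fields (not the real class)."""
    description: str
    lines_check: Callable[[int, list], bool]
    roll_func: Callable[[list, list], None] = simple_roll_func
    title_check: Callable[[str], bool] = lambda _: True


def continues_sentence(idx: int, lines: list) -> bool:
    """True if the line at idx + 1 looks like the tail of the sentence at idx."""
    if idx + 1 >= len(lines):
        return False
    nxt = lines[idx + 1].lstrip()
    return bool(nxt) and nxt[0].islower() and not lines[idx].rstrip().endswith(".")


check = LineRollCheck(description="sentence continuation", lines_check=continues_sentence)

sect_lines = [
    "Patient tolerated the procedure",
    "well without complications.",
    "Plan: follow up in 2 weeks.",
]
if check.title_check("Op Note"):
    wrap_idxs = [i for i in range(len(sect_lines)) if check.lines_check(i, sect_lines)]
    check.roll_func(wrap_idxs, sect_lines)
```

Here wrap_idxs holds the index of each line whose successor should be rolled up, matching the convention used in the default_roll_func example.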

Ancestors

  • builtins.tuple

Instance variables

var description : str

Alias for field number 0

var lines_check : collections.abc.Callable[[int, list[str]], bool]

Alias for field number 1

var roll_func : collections.abc.Callable[[list[int], list[str]], None]

Alias for field number 2

var title_check : collections.abc.Callable[[str], bool]

Alias for field number 3

class RevisionHistoryStripper

Remove revision history sections to prevent parsing errors.

Ancestors

Inherited members

class SectExtBase

Defines fields required in a class in order for it to function in use cases intended for a full SectionExtractor instance.

Ancestors

  • abc.ABC

Static methods

def new_from_table_dict(tab_dict: dict[str, list[dict[str, str]]])

Return a new SectExtBase instance with table_dictionary set to tab_dict.

class SectStartDisqualifier (title_check: Callable[[str], bool] = <function SectStartDisqualifier.<lambda>>, remaining_lines_disqualifier: Callable[[Sequence[str]], bool] = <function SectStartDisqualifier.<lambda>>)

Section start disqualifier. Prevents a new section from triggering if the conditions described below are met.

Attributes

title_check : Callable[[str], bool]
given the title of the currently extracting section, return True to enable this disqualifier or False to skip it.
remaining_lines_disqualifier : Callable[[Sequence[str]], bool]
given the lines remaining in the extracted text, return True to prevent a new section from starting at the current position.
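
A minimal sketch of how the two callables might gate a section start. SectStartDisqualifier below is a local stand-in; its default lambdas are assumptions (the real defaults are not shown in this documentation), and section_start_allowed is a hypothetical helper.

```python
from typing import Callable, NamedTuple, Sequence


class SectStartDisqualifier(NamedTuple):
    """Local stand-in mirroring the documented fields (defaults are assumed)."""
    title_check: Callable[[str], bool] = lambda _: True
    remaining_lines_disqualifier: Callable[[Sequence[str]], bool] = lambda _: False


def section_start_allowed(
    current_title: str,
    remaining: Sequence[str],
    disqualifiers: Sequence[SectStartDisqualifier],
) -> bool:
    """Hypothetical helper: False if any enabled disqualifier rejects this position."""
    for dq in disqualifiers:
        if dq.title_check(current_title) and dq.remaining_lines_disqualifier(remaining):
            return False
    return True


# While extracting "Medications", ignore a heading that is followed by a blank line.
dq = SectStartDisqualifier(
    title_check=lambda title: title == "Medications",
    remaining_lines_disqualifier=lambda rem: len(rem) > 1 and rem[1].strip() == "",
)
remaining = ["Allergies", "", "Penicillin"]

allowed_in_medications = section_start_allowed("Medications", remaining, [dq])
allowed_in_vitals = section_start_allowed("Vitals", remaining, [dq])
```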

Ancestors

  • builtins.tuple

Instance variables

var remaining_lines_disqualifier : collections.abc.Callable[[collections.abc.Sequence[str]], bool]

Alias for field number 1

var title_check : collections.abc.Callable[[str], bool]

Alias for field number 0

class StripperTuple (stripper_class: Type[DocumentStripper], kwargs: dict[str, Any])

Simple tuple defining a stripper class and its kwargs.
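
One plausible use of this pairing is to defer construction of a stripper until it is needed. The classes below are local stand-ins (the VersionStripper stand-in only copies the keyword-only signature shown in this documentation), and instantiating with stripper_class(**kwargs) is an assumption about how the tuple is consumed.

```python
from typing import Any, NamedTuple, Sequence, Type


class DocumentStripper:
    """Local stand-in for the abstract base class."""


class VersionStripper(DocumentStripper):
    """Local stand-in that only records its keyword arguments."""

    def __init__(self, *, depth: int = 2, end_strip_latches: Sequence = ()) -> None:
        self.depth = depth
        self.end_strip_latches = end_strip_latches


class StripperTuple(NamedTuple):
    """Local stand-in mirroring the documented fields (not the real class)."""
    stripper_class: Type[DocumentStripper]
    kwargs: dict


spec = StripperTuple(stripper_class=VersionStripper, kwargs={"depth": 3})

# Build the stripper from the stored class and keyword arguments.
version_stripper = spec.stripper_class(**spec.kwargs)
```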

Ancestors

  • builtins.tuple

Instance variables

var kwargs : dict[str, typing.Any]

Alias for field number 1

var stripper_class : Type[DocumentStripper]

Alias for field number 0

class TimelineDataStripper

Remove Patient Care Timeline section from ABS Epic records.

Ancestors

Inherited members

class VersionStripper (*, depth=2, end_strip_latches: Sequence[LTTuple] = ())

Remove all but the most recent version for all Op Notes.

Ancestors

Inherited members