Module utilities.table_interpreters

Library of "interpreter" functions called during table extraction that convert free-text lines into a raw table format.

Functions

def bullet_interpreter(lines: list[str], table_name: str, **kwargs) ‑> list[dict[str, str]]

Extracts rows corresponding to the first column value from a bulleted list.

Args

lines : list[str]
List of lines in the table.
table_name : str
Name of the table being interpreted.

KwArgs

min_split_spaces : int
Minimum number of consecutive spaces that triggers a split. Default is 2.

Returns

list[dict[str, str]]
List of dictionaries where each dictionary represents a row in the table.

Example

>>> lines = [
...     'Column1                               Column2',
...     '  •   ValueRow1                       IrrelevantRow1',
...     '  •   ValueRow2',
...     '  •   ValueRow3                       IrrelevantRow3',
... ]
>>> bullet_interpreter(lines, 'example_table')
[{'Column1': 'ValueRow1'},
 {'Column1': 'ValueRow2'},
 {'Column1': 'ValueRow3'}]
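
The behaviour shown in the example above can be approximated with a short sketch. This is not the library's implementation: bullet_rows_sketch is a hypothetical name, and the sketch assumes that the first line is the header, that rows start with a "•" bullet, and that cells are separated by at least min_split_spaces whitespace characters.

```python
import re


def bullet_rows_sketch(lines, min_split_spaces=2):
    """Hypothetical stand-in for bullet_interpreter (assumption, not library code)."""
    # Cells are assumed to be separated by runs of whitespace of this length or more.
    splitter = re.compile(r"\s{%d,}" % min_split_spaces)
    # The first header cell names the only column that is kept.
    column = splitter.split(lines[0].strip())[0]
    rows = []
    for line in lines[1:]:
        text = line.strip()
        if not text.startswith("•"):
            continue  # skip anything that is not a bulleted row
        # Drop the bullet, then keep only the first cell of the row.
        first_cell = splitter.split(text.lstrip("•").strip())[0]
        rows.append({column: first_cell})
    return rows
```

Any value after the first cell of a bulleted row is discarded, which matches the "IrrelevantRow" values in the example.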
def cerner_events_interpreter(lines: list[str], table_name: str, **kwargs) ‑> list[dict[str, str]]

Process OCR results from a Cerner Intraop Actions table.

This function processes a list of lines from OCR results for a Cerner Intraop Actions table. It currently supports only two columns of event data, separated by at least seven spaces, and collects only Anesthesia Start and Stop times. Future improvements should use utils.columns and support an arbitrary number of data columns and additional events.

Args

lines : list[str]
The input list of lines from OCR results.
table_name : str
The name of the table being processed.

KwArgs

debug : bool
If True, logs debug information. Defaults to False.

Returns

list[dict[str, str]]
A list of dictionaries representing the processed data.

Example

>>> lines = [
...     "12/1/2024                             12/1/2024",
...     " 12:01Patient In Room                  15:34Anesthesia Stop",
...     "       Anesthesia Start"
... ]
>>> cerner_events_interpreter(lines, "Intraop Actions")
[{'Date': '2024-12-01', 'Time': '12:01', 'Event': 'Anesthesia Start'},
 {'Date': '2024-12-01', 'Time': '15:34', 'Event': 'Anesthesia Stop'}]
def complex_roll_helper(lines: list[str], r_keys: Sequence[str], **kwargs) ‑> dict[str, str]

Helper function for fields_interpreter() to handle tables with a complicated multi-column, multiline value structure.

Args

lines : list[str]
List of lines in the table.
r_keys : Sequence[str]
Sequence of rollover keys for the fields.

KwArgs

value_append_separator : str
Separator to append values. Default is " ".
force_save_keys : tuple[str, …]
Tuple of keys to force save. Default is ().
min_split_spaces : int
Minimum number of consecutive spaces that triggers a split. Default is 2.
debug : bool
Flag to enable debug mode. Default is False.
min_val_length : int
Minimum length of values. Default is 0.

Returns

dict[str, str]
Dictionary where each key is a field and each value is the corresponding concatenated value.

Example

>>> lines = [
...     "Field1: start of val1   Field2: start of val2   Field3: start of val3",
...     "   end of val1             end of val2             end of val3"
... ]
>>> r_keys = ["Field1", "Field2", "Field3"]
>>> complex_roll_helper(lines, r_keys)
{'Field1': 'start of val1 end of val1',
 'Field2': 'start of val2 end of val2',
 'Field3': 'start of val3 end of val3'}
def date_pivot_interpreter(lines: list[str], table_name: str, **kwargs) ‑> list[dict[str, str]]

Interprets a table with field names in the left column and values under multiple date/time headings.

Args

lines : list[str]
List of lines in the table.
table_name : str
Name of the table being interpreted.

KwArgs

min_rows : int
Minimum number of rows. Default is 2.
min_split_spaces : int
Minimum number of consecutive spaces that triggers a split. Default is 2.
debug : bool
Flag to enable debug mode. Default is False.

Returns

list[dict[str, str]]
List of dictionaries where each dictionary represents a row in the table.

Example

>>> lines = [
...     "                    01/01/22        01/02/22        01/03/22",
...     "                    08:00           09:00           10:00",
...     "Field name A:       --              2nd A           3rd A",
...     "Wrapped field       1st B           2nd B           --",
...     "name B:             1st B (cont'd)",
...     "                    01/04/22        01/05/22        01/06/22",
...     "                    11:00           12:00           13:00",
...     "Field name A:       4th A           5th A           6th A",
...     "                    4th A (cont'd)                  6th A (cont'd)",
...     "                       01/04/22        01/05/22       01/06/22",
...     "                       11:00           12:00          13:00",
...     "Wrapped field          4th B                          6th B",
...     "name B:                4th B (cont'd)  --",
...     "                       01/07/22",
...     "                       14:00",
...     "Field name A:          7th A",
...     "Wrapped field",
...     "name B:                7th B",
... ]
>>> date_pivot_interpreter(lines, 'example_table')
[{'Date': '01/01/22 08:00',
'Field name A': '--',
'Wrapped field name B': "1st B 1st B (cont'd)"},
{'Date': '01/02/22 09:00',
'Field name A': '2nd A',
'Wrapped field name B': '2nd B'},
{'Date': '01/03/22 10:00',
'Field name A': '3rd A',
'Wrapped field name B': '--'},
{'Date': '01/04/22 11:00',
'Field name A': "4th A 4th A (cont'd)",
'Wrapped field name B': "4th B 4th B (cont'd)"},
{'Date': '01/05/22 12:00',
'Field name A': '5th A',
'Wrapped field name B': '--'},
{'Date': '01/06/22 13:00',
'Field name A': "6th A 6th A (cont'd)",
'Wrapped field name B': '6th B'},
{'Date': '01/07/22 14:00',
'Field name A': '7th A',
'Wrapped field name B': '7th B'}]
def dual_pivot_interpreter(lines: list[str], table_name: str, **kwargs) ‑> list[dict[str, str]]

Similar to pivot_interpreter except there are 4 total columns, two of which contain keys, two of which contain values.

Args

lines : list[str]
List of lines in the table.
table_name : str
Name of the table being interpreted.

KwArgs

debug : bool
Flag to enable debug mode. Default is False.

Returns

list[dict[str, str]]
List of dictionaries where each dictionary represents a row in the table.

Example

>>> lines = [
...     '                  Entry 1',
...     'Left Key 1        Left Value 1       Right Key 1     Right Value 1',
...     'Left Key 2        Left Value 2       Right Key 2     Right Value 2',
...     'Left Key 3        Left Value 3       Right Key 3     Right Value 3',
... ]
>>> dual_pivot_interpreter(lines, "example_table")
[{'Left Key 1': 'Left Value 1',
 'Left Key 2': 'Left Value 2',
 'Left Key 3': 'Left Value 3',
 'Right Key 1': 'Right Value 1',
 'Right Key 2': 'Right Value 2',
 'Right Key 3': 'Right Value 3'}]
def extended_header_lines(all_lines: Sequence[str], window: tuple[int, int], first_header='', extended_header='Details', true_line_check: collections.abc.Callable[[str], bool] = <function <lambda>>, headers_in_window=False) ‑> list[str]

Prepends the last header line found prior to the window as the header row. first_header is prepended and extended_header is appended to the captured header row while maintaining spacing.

Lines in the window are classified as true lines or addenda lines. Addenda lines are rolled up onto corresponding true lines to serve as the values for extended_header. Useful for tables that are interrupted periodically with "column" values spanning the entire page.

Args

all_lines : Sequence[str]
All lines for all subtables.
window : tuple[int, int]
Start and end indices of this subtable.
first_header : str
Prepended to the header row. Defaults to '', i.e. the first header was already present in the data.
extended_header : str
Appended to the header row. Defaults to 'Details'.
true_line_check : Callable[[str], bool]
Determines if a line is a true line. Defaults to lambda line: not utils.lindent(line).
headers_in_window : bool
Flag if headers are in the window. Defaults to False.

Returns

list[str]
List of lines with the extended header and rolled-up addenda lines.

Example

>>> all_lines = [
...     "Header1    Header2",
...     "Value1     Value2",
...     " Addenda1",
...     "Value3     Value4",
...     " Addenda2",
... ]
>>> window = (1, 3)
>>> extended_header_lines(all_lines, window)
['Header1    Header2     Details',
'Value1     Value2       Addenda1']
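
The true-line/addenda classification described above can be sketched as follows. This is only an illustration under stated assumptions: roll_addenda_sketch is a hypothetical helper, it returns (true line, addenda) pairs instead of re-spaced text lines, and its default check treats any line without leading whitespace as a true line.

```python
def roll_addenda_sketch(lines, true_line_check=lambda line: line == line.lstrip()):
    """Pair each true line with the addenda lines that follow it (hypothetical helper)."""
    rolled = []
    for line in lines:
        if true_line_check(line) or not rolled:
            # Start a new entry; a leading addendum with no true line before it
            # is kept as its own entry so that no text is lost.
            rolled.append([line.rstrip(), []])
        else:
            # Indented lines are rolled up onto the most recent true line.
            rolled[-1][1].append(line.strip())
    return [(true_line, " ".join(addenda)) for true_line, addenda in rolled]
```

In the documented function the joined addenda become the value of the extended_header column; the sketch stops at the pairing step and leaves the column spacing out.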
def fields_interpreter(lines: list[str], table_name: str, **kwargs) ‑> list[dict[str, str]]

Interprets a table with fields and values, handling various formatting issues and rollovers for multiline fields.

Args

lines : list[str]
List of lines in the table.
table_name : str
Name of the table being interpreted.

KwArgs

debug : bool
Flag to enable debug mode. Default is False.
force_save_keys : list[str]
List of keys to force save. Default is [].
roll_keys : list[tuple[str, …]]
List of keys for complex roll. Default is [].
roll_on_ending_colon : bool
Flag to roll on ending colon. Default is True.
roll_on_titles : bool
Flag to roll on title case lines. Default is True.

Returns

list[dict[str, str]]
List of dictionaries where each dictionary represents a row in the table.

Example

>>> lines = [
...     "Field1: Value1",
...     "Field2: Value2",
...     "Field3: Value3",
...     "Field4: Value4",
... ]
>>> fields_interpreter(lines, "example_table")
[{'Field1': 'Value1', 'Field2': 'Value2', 'Field3': 'Value3', 'Field4': 'Value4'}]
def flex_interpreter(lines: list[str], table_name: str, **kwargs) ‑> list[dict[str, str]]

Interprets a flexible table layout from a list of lines.

This function is designed to handle various tabular layouts, including those with 'field: value' pairs and mixed layouts. It was developed for the Epic Anesthesia Record but supports many other tabular formats.

The function processes the input lines to identify columns and values, supporting cases where columns wrap to new lines. It can handle tables where columns are right-aligned and tables with fields that span multiple lines.

Args

lines : list[str]
List of strings representing the lines of the table.
table_name : str
Name of the table being interpreted.

Returns

list[dict[str, str]]
List of dictionaries where each dictionary represents a row in the table.
def interpreter_check(interpreter: collections.abc.Callable[[list[str], str], ~T_INT_RESULT]) ‑> collections.abc.Callable[[list[str], str], ~T_INT_RESULT]

Decorator for interpreter functions to check table name and call an alternate interpreter as defined by table_specs. Prints the initial line for debugging if the table is in the debug_table list in table_specs.

Usage

>>> @interpreter_check
... def your_interpreter(lines, table_name, **kwargs):
...     pass

Args

interpreter : Callable[[list[str], str], tu.T_INT_RESULT]
The interpreter function to be decorated.

Returns

Callable[[list[str], str], tu.T_INT_RESULT]
The wrapped interpreter function.
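
A minimal sketch of how such a decorator might dispatch is given below. ALT_INTERPRETERS and DEBUG_TABLES are hypothetical stand-ins for whatever table_specs actually provides, and interpreter_check_sketch, null_sketch and echo_sketch are illustrative names, not part of this module.

```python
import functools

# Hypothetical stand-ins for table_specs (assumptions, not the real structure).
ALT_INTERPRETERS = {}   # table name -> alternate interpreter
DEBUG_TABLES = set()    # table names whose first line should be printed


def interpreter_check_sketch(interpreter):
    """Wrap an interpreter so that table_name can reroute the call."""
    @functools.wraps(interpreter)
    def wrapper(lines, table_name, **kwargs):
        if table_name in DEBUG_TABLES and lines:
            print(f"[{table_name}] first line: {lines[0]!r}")
        alternate = ALT_INTERPRETERS.get(table_name)
        # Guard against a table that lists this same wrapper as its alternate.
        if alternate is not None and alternate is not wrapper:
            return alternate(lines, table_name, **kwargs)
        return interpreter(lines, table_name, **kwargs)
    return wrapper


def null_sketch(lines, table_name, **kwargs):
    # Mirrors the documented null_interpreter result.
    return [{"null": ""}]


@interpreter_check_sketch
def echo_sketch(lines, table_name, **kwargs):
    return [{"line": line} for line in lines]


ALT_INTERPRETERS["Skipped Table"] = null_sketch
```

With this setup, echo_sketch(["a"], "Any Table") runs the wrapped function, while echo_sketch(["a"], "Skipped Table") is rerouted to null_sketch.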
def multicol_no_field_interpreter(lines: list[str], table_name: str, **kwargs) ‑> list[dict[str, str]]

Interprets a table with multiple columns and no field names, handling rollovers for any combination of columns.

Args

lines : list[str]
List of lines in the table.
table_name : str
Name of the table being interpreted.

KwArgs

debug : bool
Flag to enable debug mode. Default is False.

Returns

list[dict[str, str]]
Single-item list containing one dictionary, where each key is the column index and each value is the corresponding cell value.

Example

>>> lines = [
...     'Value                  Value 2',
...     '1',
...     'Value 3                Value',
...     '                       4',
...     'Value 5                Value 6',
... ]
>>> multicol_no_field_interpreter(lines, "example_table")
[{'0': 'Value 1',
 '1': 'Value 2',
 '2': 'Value 3',
 '3': 'Value 4',
 '4': 'Value 5',
 '5': 'Value 6'}]
def null_interpreter(lines: list[str], table_name: str, **kwargs) ‑> list[dict[str, str]]

Prevents extraction of supplied table. Used as an alt_interpreter for unextractable or irrelevant tables.

Returns

A list with a single dict having key 'null' and value '', regardless of input.

def pivot_interpreter(lines: list[str], table_name: str, **kwargs) ‑> list[dict[str, str]]

Interprets and converts poorly formatted tables into a structured format.

This function was created to support Cerner's poorly formatted tables. It processes the input lines to extract a columnar representation, rotates the table, and removes unnecessary rows and columns. The function then concatenates split values and returns a list of dictionaries representing the table rows.

Args

lines : list[str]
List of strings representing the lines of the table.
table_name : str
Name of the table being interpreted.

Returns

list[dict[str, str]]
List of dictionaries where each dictionary represents a row in the table.

Example

>>> lines = [
...     '                         Entry 1                        Entry 2',
...     'Case Attendee            Asgharian, MD, Behnam          Richardson CRNA,',
...     '                                                        Rebecca B',
...     'Role Performed           Surgeon - Primary              Anesthesia Provider of',
...     '                                                        Record',
...     'Time In                  09/16/22 08:52:00              09/16/22 08:45:00',
...     'Time Out                 09/16/22 09:09:00              09/16/22 09:12:00',
...     'Procedure                Esophagogastroduodenosco       Esophagogastroduodenosco',
...     '                         py(Upper)                      py(Upper)',
...     'Vendor Rep',
...     'Last Modified By:        Brookhart, Todd A              Brookhart, Todd A',
...     '                         09/16/22 09:16:14              09/16/22 09:16:14',
... ]
>>> pivot_interpreter(lines, "example_table")
[{'Case Attendee': 'Asgharian, MD, Behnam',
'Role Performed': 'Surgeon - Primary',
'Time In': '09/16/22 08:52:00',
'Time Out': '09/16/22 09:09:00',
'Procedure': 'Esophagogastroduodenosco py(Upper)',
'Last Modified By': 'Brookhart, Todd A 09/16/22 09:16:14'},
{'Case Attendee': 'Richardson CRNA, Rebecca B',
'Role Performed': 'Anesthesia Provider of Record',
'Time In': '09/16/22 08:45:00',
'Time Out': '09/16/22 09:12:00',
'Procedure': 'Esophagogastroduodenosco py(Upper)',
'Last Modified By': 'Brookhart, Todd A 09/16/22 09:16:14'}]
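
The rotate-and-concatenate idea can be illustrated with a simplified sketch. It is not the library's algorithm: pivot_rows_sketch is a hypothetical name, column boundaries are assumed to be the start offsets of the header cells ("Entry 1", "Entry 2", ...), and field names that wrap onto a second line are not handled.

```python
import re

# A header cell is assumed to be a run of words separated by single spaces.
HEADER_CELL = re.compile(r"\S+(?: \S+)*")


def pivot_rows_sketch(lines):
    """Turn each 'Entry N' column into one row dict (hypothetical helper)."""
    header, *body = lines
    starts = [m.start() for m in HEADER_CELL.finditer(header)]
    ends = starts[1:] + [None]
    rows = [{} for _ in starts]
    field = None
    for line in body:
        # Text left of the first entry column is the field name, if present.
        name = line[:starts[0]].strip().rstrip(":")
        if name:
            field = name
        if field is None:
            continue  # continuation text before any field name is ignored
        for row, start, end in zip(rows, starts, ends):
            piece = line[start:end].strip()
            if piece:
                # Wrapped values are concatenated with a single space.
                row[field] = f"{row[field]} {piece}" if field in row else piece
    return rows


demo_lines = [
    "".ljust(12) + "Entry 1".ljust(17) + "Entry 2",
    "Name".ljust(12) + "Smith,".ljust(17) + "Jones",
    "".ljust(12) + "Anna",
    "Role:".ljust(12) + "Surgeon".ljust(17) + "Nurse",
]
demo_rows = pivot_rows_sketch(demo_lines)
```

Fields with no values in any column, such as 'Vendor Rep' in the example above, never receive a key and therefore drop out of the result.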
def prepended_title_column_lines(all_lines: Sequence[str], table_window: tuple[int, int], is_header: collections.abc.Callable[[str], bool], is_value: collections.abc.Callable[[str], bool], title_column_name: str = 'Title', title_column_padding: int = 5) ‑> list[str]

Prepends the first line of each subtable window as a new column value in all remaining lines.

Args

all_lines : Sequence[str]
All lines for all subtables.
table_window : tuple[int, int]
Start and end indices of this subtable.
is_header : Callable[[str], bool]
Prepends the column name if True is returned.
is_value : Callable[[str], bool]
Prepends the new value if True is returned.
title_column_name : str
Name of the new column. Defaults to "Title".
title_column_padding : int
Number of spaces to pad the new column. Defaults to 5.

Returns

list[str]
List of lines with the new column prepended.

Example

>>> prepended_title_column_lines(
...     all_lines=[
...         'propofol injx (mg)',  # title line
...         '  Date/Time          Admin User',  # header line
...         '       1700          Matthew Krumholz,',  # value line
...         '                     CRNA',  # wrapped value line
...         'fentanyl (mcg)',  # title line
...         '  Date/Time          Admin User',  # header line
...         '       1700          Matthew Krumholz,',  # value line
...         '                     CRNA',  # wrapped value line
...     ],
...     table_window=(0, 4),
...     is_header=lambda line: line.strip().startswith('Date/Time'),
...     is_value=lambda line: utils.lindent(line) < 30 and line.strip()[0].isnumeric(),
...     title_column_name='Medication',
...     title_column_padding=5,
... )
['Medication               Date/Time          Admin User',
 'propofol injx (mg)            1700          Matthew Krumholz,',
 '                                            CRNA']
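
The spacing in the example output is consistent with a title column whose width is the title length plus title_column_padding. A sketch under that assumption follows; prepend_title_sketch is a hypothetical name, and for brevity it takes the already-sliced window lines instead of all_lines plus a window.

```python
def prepend_title_sketch(window_lines, is_header, is_value,
                         title_column_name="Title", title_column_padding=5):
    """Prepend the window's first line as a new leading column (hypothetical helper)."""
    title, *rest = window_lines
    title = title.strip()
    # Assumed width rule: widest of title and column name, plus the padding.
    width = max(len(title), len(title_column_name)) + title_column_padding
    out = []
    for line in rest:
        if is_header(line):
            prefix = title_column_name   # header rows get the new column name
        elif is_value(line):
            prefix = title               # value rows get the title itself
        else:
            prefix = ""                  # wrapped rows are only shifted right
        out.append(prefix.ljust(width) + line)
    return out
```

Every remaining line is shifted by the same width, so the original column alignment is preserved for a downstream interpreter.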
def regex_interpreter(lines: list[str], table_name: str, **kwargs) ‑> list[dict[str, str]]

Given a regex expression for a full row match with named groups corresponding to key/value pairs, return the list of matched groupdicts.

Args

lines : list[str]
List of lines in the table.
table_name : str
Name of the table being interpreted.

KwArgs

regex_expr : re.Pattern
Compiled regex expression to match lines.

Returns

list[dict[str, str]]
List of dictionaries where each dictionary represents a row in the table.

Example

>>> lines = ['line 1', 'line 2', 'bad line', 'line 3']
>>> regex_expr = re.compile(r'^(?P<label>line) (?P<idx>\d)$', re.MULTILINE)
>>> regex_interpreter(lines, 'example_table', regex_expr=regex_expr)
[{'label': 'line', 'idx': '1'},
 {'label': 'line', 'idx': '2'},
 {'label': 'line', 'idx': '3'}]
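
The core of this approach can be sketched with the standard re module alone. regex_rows_sketch is a hypothetical name, and the sketch assumes that lines which do not match are silently skipped, as the 'bad line' entry in the example suggests.

```python
import re


def regex_rows_sketch(lines, regex_expr):
    """Collect the named groups of every matching line (hypothetical helper)."""
    rows = []
    for line in lines:
        match = regex_expr.match(line)
        if match:
            # groupdict() maps each named group to the text it captured.
            rows.append(match.groupdict())
    return rows


pattern = re.compile(r"^(?P<label>line) (?P<idx>\d)$")
demo_rows = regex_rows_sketch(["line 1", "line 2", "bad line", "line 3"], pattern)
```

Writing the pattern as a raw string keeps \d from being treated as a string escape by Python.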
def simple_roll_helper(lines: list[str], **kwargs) ‑> tuple[str, str]

Helper function for fields_interpreter() to handle tables with a multiline value structure.

Args

lines : list[str]
List of lines in the table.

KwArgs

force_save_keys : tuple[str, …]
Tuple of keys to force save. Default is ().
value_append_separator : str
Separator to append values. Default is " ".
min_val_length : int
Minimum length of values. Default is 0.
min_split_spaces : int
Minimum number of consecutive spaces that triggers a split. Default is 2.

Returns

tuple[str, str]
A tuple where the first element is the field key and the second element is the concatenated value.

Example

>>> lines = [
...     "Field1:",
...     "    First line of field1 value",
...     "    Second line of field1 value",
... ]
>>> simple_roll_helper(lines)
('Field1', 'First line of field1 value Second line of field1 value')
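
The rollover shown above amounts to splitting the first line at its colon and joining the remaining lines onto the value. A minimal sketch, assuming the first line always contains the key and using the hypothetical name simple_roll_sketch:

```python
def simple_roll_sketch(lines, value_append_separator=" "):
    """Join a key line and its continuation lines (hypothetical helper)."""
    # Everything before the first colon is the key; the rest starts the value.
    key, _, first_value = lines[0].partition(":")
    parts = [first_value.strip()] + [line.strip() for line in lines[1:]]
    # Empty fragments are dropped so the separator is never doubled.
    value = value_append_separator.join(part for part in parts if part)
    return key.strip(), value
```

The sketch ignores force_save_keys, min_val_length and min_split_spaces, which the documented helper accepts.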
def single_value_interpreter(lines: list[str], table_name: str, **kwargs) ‑> list[dict[str, str]]

Interprets a table with a single value under a heading.

Args

lines : list[str]
List of lines in the table.
table_name : str
Name of the table being interpreted.

KwArgs

debug : bool
Flag to enable debug mode. Default is False.

Returns

list[dict[str, str]]
List containing a single dictionary where the key is the table heading and the value is the corresponding single value.

Example

>>> lines = ["   Anesthesia type: General"]
>>> single_value_interpreter(lines, "Final Anesthesia Type")
[{'Final Anesthesia Type': 'General'}]
def split_fields_interpreter(lines: list[str], table_name: str, **kwargs) ‑> list[dict[str, str]]

An extension of fields_interpreter() that vertically segments reliably spaced columns of key/value pairs and stacks them prior to calling its unextended namesake.

Supports proper "rolling value" data collection, as shown in the Example for keys '1st Col Key 1' and '2nd Col Key 2', which should both contain the value 'Value Continued Value'. The presence of a second key inline with a value continuation (e.g. "...Continued Value 2nd Col Key 2: Value") prevents the standard fields_interpreter from picking up the second-column value lines.

Args

lines : list[str]
Raw text lines, always laid out in two columns.
table_name : str
Name of the table (unused); always Events.

KwArgs

debug : bool
If True, logs the input and output lines.
min_split_spaces : int
Minimum number of consecutive spaces that triggers a split.
force_page_breaks : list[Callable[[str], bool]]
List of functions that each line is passed to. If any function returns True, a page break is forced at that line. Passed as an argument to utils.columns() when vertically partitioning lines into columns.
force_save_keys : list[str]
List of keys to force save. Default is [].
roll_keys : list[tuple[str, …]]
List of keys for complex roll. Default is [].
roll_on_ending_colon : bool
Flag to roll on ending colon. Default is True.
roll_on_titles : bool
Flag to roll on title case lines. Default is True.

Returns

list[dict[str, str]]
List of dictionaries where each dictionary represents a row in the table.

Examples

>>> unsplit = [
...    "1st Col Key 1: Value                   2nd Col Key 1: Value",
...    "               Continued Value         2nd Col Key 2: Value",
...    "1st Col Key 2: Value                                  Continued Value",
... ]
>>> split = [
...     "1st Col Key 1: Value",
...     "               Continued Value",
...     "1st Col Key 2: Value",
...     "2nd Col Key 1: Value",
...     "2nd Col Key 2: Value",
...     "               Continued Value",
... ]
>>> roll_keys = [("1st Col Key 1",), ("2nd Col Key 2",)]
>>> split_fields_output = split_fields_interpreter(unsplit, "", roll_keys=roll_keys)
>>> split_fields_output
[{'1st Col Key 1': 'Value Continued Value', '1st Col Key 2': 'Value', '2nd Col Key 1': 'Value', '2nd Col Key 2': 'Value Continued Value'}]
>>> fields_output = fields_interpreter(split, "", roll_keys=roll_keys)
>>> split_fields_output == fields_output
True
>>> fields_output_unsplit = fields_interpreter(unsplit, "", roll_keys=roll_keys)
>>> fields_output_unsplit
[{'1st Col Key 1': 'Value Continued Value', '2nd Col Key 2': 'Value', '1st Col Key 2': 'Value'}]
def subtable_interpreter(lines: list[str], table_name: str, **kwargs) ‑> SubtableParser

Processes tables with subtables as table entries.

See SubtableParser for more information.

Args

lines : list[str]
List of lines in the table.
table_name : str
Name of the table being interpreted.

KwArgs

debug : bool
Flag to enable debug mode. Default is False.
subtable_parser : Callable
SubtableParser instance to parse subtables. Default is tu.SubtableParser(table_interpreter()).

Returns

tu.SubtableParser
Parsed subtables.
def table_interpreter(lines: list[str], table_name: str, **kwargs) ‑> list[dict[str, str]]

Standard table interpreter that processes table lines and returns a list of dictionaries representing the table rows.

Args

lines : list[str]
List of lines in the table.
table_name : str
Name of the table being interpreted.

KwArgs

debug : bool
Flag to enable debug mode. Default is False.
split_table_columns : list[str]
List of columns to split. Default is [].
min_split_spaces : int
Minimum number of consecutive spaces that triggers a split. Default is 1.
min_val_length : int
Minimum length of values. Default is 5.
force_page_breaks : list[Callable[[str], bool]]
List of functions to force page breaks. Default is [].
min_rows : int
Minimum number of rows. Default is 2.

Returns

list[dict[str, str]]
List of dictionaries where each dictionary represents a row in the table.

Example

>>> lines = [
...     "Header1    Header2    Header3",
...     "Value1     Value2     Value3",
...     "Value4     Value5     Value6",
... ]
>>> table_interpreter(lines, "example_table")
[{'Header1': 'Value1', 'Header2': 'Value2', 'Header3': 'Value3'}, {'Header1': 'Value4', 'Header2': 'Value5', 'Header3': 'Value6'}]
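
One common way to read a space-aligned table like the one above is to slice every row at the offsets where the header cells start. The sketch below takes that approach as an assumption; it is not necessarily how table_interpreter works, and table_rows_sketch is a hypothetical name.

```python
import re

# A header cell is assumed to be a run of words separated by single spaces.
HEADER_CELL = re.compile(r"\S+(?: \S+)*")


def table_rows_sketch(lines):
    """Slice each body line at the header cell offsets (hypothetical helper)."""
    header, *body = lines
    cells = [(m.group(), m.start()) for m in HEADER_CELL.finditer(header)]
    starts = [start for _, start in cells]
    ends = starts[1:] + [None]   # the last column runs to the end of the line
    rows = []
    for line in body:
        row = {}
        for (name, start), end in zip(cells, ends):
            row[name] = line[start:end].strip()
        rows.append(row)
    return rows
```

Slicing by offset, instead of splitting on whitespace, keeps an empty cell in the middle of a row from shifting later values into the wrong column.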
def validation_failed_msg(origin: str, msg_type: str, lines: list[str], result: Optional[~T_INT_RESULT] = None, ex: Exception | None = None) ‑> str

Compile a message containing all relevant information for troubleshooting in the event of an interpreter failure.

Args

origin : str
The origin of the message.
msg_type : str
The type of the message.
lines : list[str]
List of lines related to the failure.
result : tu.T_INT_RESULT | None
The result of the interpreter, if any. Defaults to None.
ex : Exception | None
The exception that was raised, if any. Defaults to None.

Returns

str
A compiled message containing all relevant information for troubleshooting.

Example

>>> lines = ["line1", "line2"]
>>> result = [{"key1": "value1"}, {"key2": "value2"}]
>>> ex = ValueError("An error occurred")
>>> validation_failed_msg("origin", "msg_type", lines, result, ex).splitlines()
['origin: msg_type',
 '****** EXCEPTION ******:',
 '  An error occurred',
 'Traceback:',
 '****** RESULT ******:',
 '  Row [0]:',
 '    key1 : value1,',
 '  Row [1]:',
 '    key2 : value2,',
 '****** LINES *******:',
 'line1',
 'line2']