Module utilities.table_interpreters
Library of "interpreter" functions called during table extraction that convert free-text lines into a raw table format.
Functions
def bullet_interpreter(lines: list[str], table_name: str, **kwargs) ‑> list[dict[str, str]]
-
Extracts rows corresponding to the first column value from a bulleted list.
Args
lines
:list[str]
- List of lines in the table.
table_name
:str
- Name of the table being interpreted.
KwArgs
min_split_spaces
:int
- Minimum number of spaces to split. Default is 2.
Returns
list[dict[str, str]]
- List of dictionaries where each dictionary represents a row in the table.
Example
>>> lines = [
...     'Column1 Column2',
...     ' • ValueRow1 IrrelevantRow1',
...     ' • ValueRow2',
...     ' • ValueRow3 IrrelevantRow3',
... ]
>>> bullet_interpreter(lines, 'example_table')
[{'Column1': 'ValueRow1'}, {'Column1': 'ValueRow2'}, {'Column1': 'ValueRow3'}]
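The behaviour described above can be approximated with a short standalone sketch. This is not the module's implementation: `bullet_first_column_sketch` is a hypothetical name, and the sketch assumes that columns are separated by runs of at least `min_split_spaces` spaces and that every data row starts with a `•` bullet.

```python
import re


def bullet_first_column_sketch(lines: list[str], min_split_spaces: int = 2) -> list[dict[str, str]]:
    """Hypothetical stand-in: keep only the first column value of each bulleted line."""
    splitter = re.compile(rf"\s{{{min_split_spaces},}}")
    # The first header cell names the only column that is kept.
    column = splitter.split(lines[0].strip())[0]
    rows = []
    for line in lines[1:]:
        text = line.strip()
        if not text.startswith("•"):
            continue  # not a bulleted row
        value = splitter.split(text.lstrip("•").strip())[0]
        rows.append({column: value})
    return rows


example = [
    "Column1        Column2",
    " • ValueRow1   IrrelevantRow1",
    " • ValueRow2",
    " • ValueRow3   IrrelevantRow3",
]
print(bullet_first_column_sketch(example))
```

Everything to the right of the first wide gap is discarded, which is why the second-column values never reach the output.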
def cerner_events_interpreter(lines: list[str], table_name: str, **kwargs) ‑> list[dict[str, str]]
-
Process OCR results from a Cerner Intraop Actions table.
This function processes a list of lines from OCR results, specifically for a Cerner Intraop Actions table. It currently supports only two columns of event data, separated by at least seven spaces, and is limited to collecting Anesthesia Start and Stop times. Future improvements should use
utils.columns
and provide support for an arbitrary number of data columns and data for additional events.
Args
lines
:list[str]
- The input list of lines from OCR results.
table_name
:str
- The name of the table being processed.
KwArgs
debug
:bool
- If True, logs debug information. Defaults to False.
Returns
list[dict[str, str]]
- A list of dictionaries representing the processed data.
Example
>>> lines = [
...     "12/1/2024 12/1/2024",
...     " 12:01Patient In Room 15:34Anesthesia Stop",
...     " Anesthesia Start"
... ]
>>> cerner_events_interpreter(lines, "Intraop Actions")
[{'Date': '1924-12-01', 'Time': '12:01', 'Event': 'Anesthesia Start'}, {'Date': '1924-12-01', 'Time': '15:34', 'Event': 'Anesthesia Stop'}]
def complex_roll_helper(lines: list[str], r_keys: Sequence[str], **kwargs) ‑> dict[str, str]
-
Helper function for
fields_interpreter()
to handle tables with a complicated multi-column, multiline value structure.
Args
lines
:list[str]
- List of lines in the table.
r_keys
:Sequence[str]
- Sequence of rollover keys for the fields.
KwArgs
value_append_separator
:str
- Separator to append values. Default is " ".
force_save_keys
:tuple[str, …]
- Tuple of keys to force save. Default is ().
min_split_spaces
:int
- Minimum number of spaces to split. Default is 2.
debug
:bool
- Flag to enable debug mode. Default is False.
min_val_length
:int
- Minimum length of values. Default is 0.
Returns
dict[str, str]
- Dictionary where each key is a field and each value is the corresponding concatenated value.
Example
>>> lines = [
...     "Field1: start of val1 Field2: start of val2 Field3: start of val3",
...     " end of val1 end of val2 end of val3"
... ]
>>> r_keys = ["Field1", "Field2", "Field3"]
>>> complex_roll_helper(lines, r_keys)
{'Field1': 'start of val1 end of val1', 'Field2': 'start of val2 end of val2', 'Field3': 'start of val3 end of val3'}
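One way to picture the multi-column rollover is to slice every line at the character offsets where the keys start on the first line. The sketch below takes that approach. It is an assumption about the general technique, not the helper's actual logic, and `complex_roll_sketch` is a hypothetical name. The example builds its columns with `ljust` so the offsets are explicit.

```python
from collections.abc import Sequence


def complex_roll_sketch(
    lines: list[str], r_keys: Sequence[str], value_append_separator: str = " "
) -> dict[str, str]:
    """Hypothetical stand-in: slice every line at the offsets where the keys start."""
    starts = [lines[0].index(f"{key}:") for key in r_keys]
    ends = starts[1:] + [None]  # each column ends where the next one begins
    result = {}
    for key, start, end in zip(r_keys, starts, ends):
        parts = [line[start:end].strip() for line in lines]
        parts[0] = parts[0][len(key) + 1:].strip()  # drop "Key:" from the first line
        result[key] = value_append_separator.join(p for p in parts if p)
    return result


# Build fixed-width columns explicitly so the offsets are unambiguous.
roll_lines = [
    "Field1: start of val1".ljust(24) + "Field2: start of val2",
    "        end of val1".ljust(32) + "end of val2",
]
print(complex_roll_sketch(roll_lines, ["Field1", "Field2"]))
```

Offset slicing only works when continuation lines stay within their column; OCR output with drifting columns would need a more tolerant split.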
def date_pivot_interpreter(lines: list[str], table_name: str, **kwargs) ‑> list[dict[str, str]]
-
Interprets a table with field names in the left column and values arranged under multiple date/time column headings.
Args
lines
:list[str]
- List of lines in the table.
table_name
:str
- Name of the table being interpreted.
KwArgs
min_rows
:int
- Minimum number of rows. Default is 2.
min_split_spaces
:int
- Minimum number of spaces to split. Default is 2.
debug
:bool
- Flag to enable debug mode. Default is False.
Returns
list[dict[str, str]]
- List of dictionaries where each dictionary represents a row in the table.
Example
>>> lines = [
...     " 01/01/22 01/02/22 01/03/22",
...     " 08:00 09:00 10:00",
...     "Field name A: -- 2nd A 3rd A",
...     "Wrapped field 1st B 2nd B --",
...     "name B: 1st B (cont'd)",
...     " 01/04/22 01/05/22 01/06/22",
...     " 11:00 12:00 13:00",
...     "Field name A: 4th A 5th A 6th A",
...     " 4th A (cont'd) 6th A (cont'd)",
...     " 01/04/22 01/05/22 01/06/22",
...     " 11:00 12:00 13:00",
...     "Wrapped field 4th B 6th B",
...     "name B: 4th B (cont'd) --",
...     " 01/07/22",
...     " 14:00",
...     "Field name A: 7th A",
...     "Wrapped field",
...     "name B: 7th B",
... ]
>>> date_pivot_interpreter(lines, 'example_table')
[{'Date': '01/01/22 08:00', 'Field name A': '--', 'Wrapped field name B': "1st B 1st B (cont'd)"}, {'Date': '01/02/22 09:00', 'Field name A': '2nd A', 'Wrapped field name B': '2nd B'}, {'Date': '01/03/22 10:00', 'Field name A': '3rd A', 'Wrapped field name B': '--'}, {'Date': '01/04/22 11:00', 'Field name A': "4th A 4th A (cont'd)", 'Wrapped field name B': "4th B 4th B (cont'd)"}, {'Date': '01/05/22 12:00', 'Field name A': '5th A', 'Wrapped field name B': '--'}, {'Date': '01/06/22 13:00', 'Field name A': "6th A 6th A (cont'd)", 'Wrapped field name B': '6th B'}, {'Date': '01/07/22 14:00', 'Field name A': '7th A', 'Wrapped field name B': '7th B'}]
def dual_pivot_interpreter(lines: list[str], table_name: str, **kwargs) ‑> list[dict[str, str]]
-
Similar to pivot_interpreter, except that there are four columns in total: two containing keys and two containing values.
Args
lines
:list[str]
- List of lines in the table.
table_name
:str
- Name of the table being interpreted.
KwArgs
debug
:bool
- Flag to enable debug mode. Default is False.
Returns
list[dict[str, str]]
- List of dictionaries where each dictionary represents a row in the table.
Example
>>> lines = [
...     ' Entry 1',
...     'Left Key 1 Left Value 1 Right Key 1 Right Value 1',
...     'Left Key 2 Left Value 2 Right Key 2 Right Value 2',
...     'Left Key 3 Left Value 3 Right Key 3 Right Value 3',
... ]
>>> dual_pivot_interpreter(lines, "example_table")
[{'Left Key 1': 'Left Value 1', 'Left Key 2': 'Left Value 2', 'Left Key 3': 'Left Value 3', 'Right Key 1': 'Right Value 1', 'Right Key 2': 'Right Value 2', 'Right Key 3': 'Right Value 3'}]
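The four-column key/value layout can be illustrated with a minimal sketch. It assumes cells are separated by two or more spaces; `dual_pivot_sketch` and its `min_split_spaces` parameter are hypothetical and are not documented options of `dual_pivot_interpreter`.

```python
import re


def dual_pivot_sketch(lines: list[str], min_split_spaces: int = 2) -> list[dict[str, str]]:
    """Hypothetical stand-in: merge left and right key/value pairs into one row."""
    left: dict[str, str] = {}
    right: dict[str, str] = {}
    for line in lines:
        cells = re.split(rf"\s{{{min_split_spaces},}}", line.strip())
        if len(cells) != 4:
            continue  # skip headings such as "Entry 1"
        left[cells[0]] = cells[1]
        right[cells[2]] = cells[3]
    # Left-hand pairs come first, then right-hand pairs, as in the example output.
    return [{**left, **right}]


pivot_lines = [
    "                 Entry 1",
    "Left Key 1   Left Value 1    Right Key 1   Right Value 1",
    "Left Key 2   Left Value 2    Right Key 2   Right Value 2",
]
print(dual_pivot_sketch(pivot_lines))
```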
def extended_header_lines(all_lines: Sequence[str], window: tuple[int, int], first_header='', extended_header='Details', true_line_check: collections.abc.Callable[[str], bool] = <function <lambda>>, headers_in_window=False) ‑> list[str]
-
Prepends the last header line found prior to the window as the header row.
first_header
is prepended and
extended_header
is appended to the captured header row while maintaining spacing.
Lines in the window are classified as true lines or addenda lines. Addenda lines are rolled up onto corresponding true lines to serve as the values for
extended_header
. Useful for tables that are interrupted periodically with "column" values spanning the entire page.
Args
all_lines
:Sequence[str]
- All lines for all subtables.
window
:tuple[int, int]
- Start and end indices of this subtable.
first_header
:str
- Prepended to the header row. Defaults to '', i.e. the first header was already present in the data.
extended_header
:str
- Appended to the header row. Defaults to 'Details'.
true_line_check
:Callable[[str], bool]
- Determines if a line is a true line.
Defaults to
lambda line: not utils.lindent(line)
.
headers_in_window
:bool
- Flag if headers are in the window. Defaults to False.
Returns
list[str]
- List of lines with the extended header and rolled-up addenda lines.
Example
>>> all_lines = [
...     "Header1 Header2",
...     "Value1 Value2",
...     " Addenda1",
...     "Value3 Value4",
...     " Addenda2",
... ]
>>> window = (1, 3)
>>> extended_header_lines(all_lines, window)
['Header1 Header2 Details', 'Value1 Value2 Addenda1']
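The true-line and addenda-line roll-up can be sketched in isolation. The function below is a simplified, hypothetical stand-in (`roll_up_addenda_sketch`): it assumes the header line has already been located, treats any unindented line as a true line (mirroring the documented default check), and joins cells with two spaces instead of preserving the original column spacing.

```python
def roll_up_addenda_sketch(
    header: str, body: list[str], extended_header: str = "Details"
) -> list[str]:
    """Hypothetical stand-in: append indented addenda to the preceding true line."""
    out = [f"{header}  {extended_header}"]
    for line in body:
        if not line.strip():
            continue  # ignore blank lines
        if not line[0].isspace():
            out.append(line)  # a true line starts a new row
        elif len(out) > 1:
            out[-1] = f"{out[-1]}  {line.strip()}"  # addendum joins the previous row
    return out


rolled = roll_up_addenda_sketch(
    "Header1  Header2",
    ["Value1  Value2", "   Addenda1", "Value3  Value4"],
)
print(rolled)
```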
def fields_interpreter(lines: list[str], table_name: str, **kwargs) ‑> list[dict[str, str]]
-
Interprets a table with fields and values, handling various formatting issues and rollovers for multiline fields.
Args
lines
:list[str]
- List of lines in the table.
table_name
:str
- Name of the table being interpreted.
KwArgs
debug
:bool
- Flag to enable debug mode. Default is False.
force_save_keys
:list[str]
- List of keys to force save. Default is [].
roll_keys
:list[tuple[str, …]]
- List of keys for complex roll. Default is [].
roll_on_ending_colon
:bool
- Flag to roll on ending colon. Default is True.
roll_on_titles
:bool
- Flag to roll on title case lines. Default is True.
Returns
list[dict[str, str]]
- List of dictionaries where each dictionary represents a row in the table.
Example
>>> lines = [
...     "Field1: Value1",
...     "Field2: Value2",
...     "Field3: Value3",
...     "Field4: Value4",
... ]
>>> fields_interpreter(lines, "example_table")
[{'Field1': 'Value1', 'Field2': 'Value2', 'Field3': 'Value3', 'Field4': 'Value4'}]
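As a rough illustration of field parsing with rollover, the sketch below treats an unindented `key: value` line as a new field and any other non-blank line as a continuation of the previous value. `fields_sketch` is a hypothetical name, and the real interpreter's options (`roll_keys`, `roll_on_titles`, and so on) are not modelled.

```python
def fields_sketch(lines: list[str]) -> list[dict[str, str]]:
    """Hypothetical stand-in: parse 'key: value' lines, rolling continuations into values."""
    row: dict[str, str] = {}
    key = None
    for line in lines:
        text = line.strip()
        if not text:
            continue
        head, sep, tail = text.partition(":")
        if sep and not line[0].isspace():
            key = head.strip()  # unindented "key: value" starts a new field
            row[key] = tail.strip()
        elif key is not None:
            row[key] = f"{row[key]} {text}".strip()  # continuation of the last field
    return [row]


field_lines = [
    "Field1: Value1",
    "Field2:",
    "   wrapped value at 12:01",
    "Field3: Value3",
]
print(fields_sketch(field_lines))
```

Requiring new fields to be unindented keeps values that contain a colon, such as times, from being mistaken for keys.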
def flex_interpreter(lines: list[str], table_name: str, **kwargs) ‑> list[dict[str, str]]
-
Interprets a flexible table layout from a list of lines.
This function is designed to handle various tabular layouts, including those with 'field: value' pairs and mixed layouts. It was developed for the Epic Anesthesia Record but supports many other tabular formats.
The function processes the input lines to identify columns and values, supporting cases where columns wrap to new lines. It can handle tables where columns are right-aligned and tables with fields that span multiple lines.
Args
lines
:list[str]
- List of strings representing the lines of the table.
table_name
:str
- Name of the table being interpreted.
Returns
list[dict[str, str]]
- List of dictionaries where each dictionary represents a row in the table.
def interpreter_check(interpreter: collections.abc.Callable[[list[str], str], ~T_INT_RESULT]) ‑> collections.abc.Callable[[list[str], str], ~T_INT_RESULT]
-
Decorator for interpreter functions to check table name and call an alternate interpreter as defined by table_specs. Prints the initial line for debugging if the table is in the debug_table list in table_specs.
Usage
>>> @interpreter_check
... def your_interpreter(lines, table_name, **kwargs):
...     pass
Args
interpreter
:Callable[[list[str], str], tu.T_INT_RESULT]
- The interpreter function to be decorated.
Returns
Callable[[list[str], str], tu.T_INT_RESULT]
- The wrapped interpreter function.
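The dispatch pattern described for this decorator can be sketched as follows. The `TABLE_SPECS` dictionary and its `"alt_interpreter"` and `"debug"` keys are assumptions made for illustration, since the structure of the real table_specs is not shown here, and `interpreter_check_sketch` is a hypothetical name.

```python
import functools
from collections.abc import Callable

Rows = list[dict[str, str]]

# Hypothetical stand-in for table_specs: per-table alternate interpreters and debug flags.
TABLE_SPECS: dict[str, dict] = {}


def interpreter_check_sketch(interpreter: Callable[..., Rows]) -> Callable[..., Rows]:
    """Hypothetical decorator: reroute to an alternate interpreter when one is configured."""

    @functools.wraps(interpreter)
    def wrapper(lines: list[str], table_name: str, **kwargs) -> Rows:
        spec = TABLE_SPECS.get(table_name, {})
        if spec.get("debug") and lines:
            print(f"[{table_name}] first line: {lines[0]!r}")
        alt = spec.get("alt_interpreter")
        if alt is not None and alt is not wrapper:
            return alt(lines, table_name, **kwargs)
        return interpreter(lines, table_name, **kwargs)

    return wrapper


@interpreter_check_sketch
def echo_interpreter(lines, table_name, **kwargs):
    return [{"line": line} for line in lines]


def null_sketch(lines, table_name, **kwargs):
    return [{"null": ""}]


TABLE_SPECS["Skipped Table"] = {"alt_interpreter": null_sketch}
print(echo_interpreter(["a"], "Any Table"))
print(echo_interpreter(["a"], "Skipped Table"))
```

The `alt is not wrapper` guard avoids infinite recursion if a table's alternate interpreter is configured to be the decorated function itself.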
def multicol_no_field_interpreter(lines: list[str], table_name: str, **kwargs) ‑> list[dict[str, str]]
-
Interprets a table with multiple columns and no field names, handling rollovers for any combination of columns.
Args
lines
:list[str]
- List of lines in the table.
table_name
:str
- Name of the table being interpreted.
KwArgs
debug
:bool
- Flag to enable debug mode. Default is False.
Returns
list[dict[str, str]]
- Single-item list containing one dictionary, where each key is the column index (as a string) and each value is the corresponding cell value.
Example
>>> lines = [
...     'Value Value 2',
...     '1',
...     'Value 3 Value',
...     ' 4',
...     'Value 5 Value 6',
... ]
>>> multicol_no_field_interpreter(lines, "example_table")
[{'0': 'Value 1', '1': 'Value 2', '2': 'Value 3', '3': 'Value 4', '4': 'Value 5', '5': 'Value 6'}]
def null_interpreter(lines: list[str], table_name: str, **kwargs) ‑> list[dict[str, str]]
-
Prevents extraction of the supplied table. Used as an alt_interpreter for unextractable or irrelevant tables.
Returns
list[dict[str, str]]
- A list with a single dict having key 'null' and value '', regardless of input.
def pivot_interpreter(lines: list[str], table_name: str, **kwargs) ‑> list[dict[str, str]]
-
Interprets and converts poorly formatted tables into a structured format.
This function was created to support Cerner's poorly formatted tables. It processes the input lines to extract a columnar representation, rotates the table, and removes unnecessary rows and columns. The function then concatenates split values and returns a list of dictionaries representing the table rows.
Args
lines
:list[str]
- List of strings representing the lines of the table.
table_name
:str
- Name of the table being interpreted.
Returns
list[dict[str, str]]
- List of dictionaries where each dictionary represents a row in the table.
Example
>>> lines = [
...     ' Entry 1 Entry 2',
...     'Case Attendee Asgharian, MD, Behnam Richardson CRNA,',
...     ' Rebecca B',
...     'Role Performed Surgeon - Primary Anesthesia Provider of',
...     ' Record',
...     'Time In 09/16/22 08:52:00 09/16/22 08:45:00',
...     'Time Out 09/16/22 09:09:00 09/16/22 09:12:00',
...     'Procedure Esophagogastroduodenosco Esophagogastroduodenosco',
...     ' py(Upper) py(Upper)',
...     'Vendor Rep',
...     'Last Modified By: Brookhart, Todd A Brookhart, Todd A',
...     ' 09/16/22 09:16:14 09/16/22 09:16:14',
... ]
>>> pivot_interpreter(lines, "example_table")
[{'Case Attendee': 'Asgharian, MD, Behnam', 'Role Performed': 'Surgeon - Primary', 'Time In': '09/16/22 08:52:00', 'Time Out': '09/16/22 09:09:00', 'Procedure': 'Esophagogastroduodenosco py(Upper)', 'Last Modified By': 'Brookhart, Todd A 09/16/22 09:16:14'}, {'Case Attendee': 'Richardson CRNA, Rebecca B', 'Role Performed': 'Anesthesia Provider of Record', 'Time In': '09/16/22 08:45:00', 'Time Out': '09/16/22 09:12:00', 'Procedure': 'Esophagogastroduodenosco py(Upper)', 'Last Modified By': 'Brookhart, Todd A 09/16/22 09:16:14'}]
def prepended_title_column_lines(all_lines: Sequence[str], table_window: tuple[int, int], is_header: collections.abc.Callable[[str], bool], is_value: collections.abc.Callable[[str], bool], title_column_name: str = 'Title', title_column_padding: int = 5) ‑> list[str]
-
Prepends the first line of each subtable window as a new column value in all remaining lines.
Args
all_lines
:Sequence[str]
- All lines for all subtables.
table_window
:tuple[int, int]
- Start and end indices of this subtable.
is_header
:Callable[[str], bool]
- Prepends the column name if True is returned.
is_value
:Callable[[str], bool]
- Prepends the new value if True is returned.
title_column_name
:str
- Name of the new column. Defaults to "Title".
title_column_padding
:int
- Number of spaces to pad the new column. Defaults to 5.
Returns
list[str]
- List of lines with the new column prepended.
Example
>>> prepended_title_column_lines(
...     all_lines=[
...         'propofol injx (mg)',  # title line
...         ' Date/Time Admin User',  # header line
...         ' 1700 Matthew Krumholz,',  # value line
...         ' CRNA',  # wrapped value line
...         'fentanyl (mcg)',  # title line
...         ' Date/Time Admin User',  # header line
...         ' 1700 Matthew Krumholz,',  # value line
...         ' CRNA',  # wrapped value line
...     ],
...     table_window=(0, 4),
...     is_header=lambda line: line.strip().startswith('Date/Time'),
...     is_value=lambda line: utils.lindent(line) < 30 and line.strip()[0].isnumeric(),
...     title_column_name='Medication',
...     title_column_padding=5,
... )
['Medication Date/Time Admin User', 'propofol injx (mg) 1700 Matthew Krumholz,', ' CRNA']
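A simplified sketch of the title-column idea follows. It is hypothetical (`prepend_title_sketch` is not part of the module) and operates on the lines of a single window rather than on `all_lines` plus window indices. It assumes the new column should be as wide as the longer of the title and the column name, plus the padding.

```python
from collections.abc import Callable, Sequence


def prepend_title_sketch(
    window_lines: Sequence[str],
    is_header: Callable[[str], bool],
    is_value: Callable[[str], bool],
    title_column_name: str = "Title",
    title_column_padding: int = 5,
) -> list[str]:
    """Hypothetical stand-in: turn a subtable's title line into a leading column."""
    title = window_lines[0].strip()
    width = max(len(title), len(title_column_name)) + title_column_padding
    out = []
    for line in window_lines[1:]:
        if is_header(line):
            cell = title_column_name
        elif is_value(line):
            cell = title
        else:
            cell = ""  # wrapped value line: leave the new column blank
        out.append(cell.ljust(width) + line)  # fixed width keeps existing columns aligned
    return out


window = [
    "propofol injx (mg)",
    "  Date/Time   Admin User",
    "  1700        Matthew Krumholz,",
    "              CRNA",
]
result = prepend_title_sketch(
    window,
    is_header=lambda line: line.strip().startswith("Date/Time"),
    is_value=lambda line: line.strip()[:1].isdigit(),
    title_column_name="Medication",
)
print("\n".join(result))
```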
def regex_interpreter(lines: list[str], table_name: str, **kwargs) ‑> list[dict[str, str]]
-
Given a regular expression for a full-row match with named groups corresponding to key/value pairs, returns the list of matched groupdicts.
Args
lines
:list[str]
- List of lines in the table.
table_name
:str
- Name of the table being interpreted.
KwArgs
regex_expr
:re.Pattern
- Compiled regular expression to match lines.
Returns
list[dict[str, str]]
- List of dictionaries where each dictionary represents a row in the table.
Example
>>> import re
>>> lines = ['line 1', 'line 2', 'bad line', 'line 3']
>>> regex_expr = re.compile(r'^(?P<label>line) (?P<idx>\d)$', re.MULTILINE)
>>> regex_interpreter(lines, 'example_table', regex_expr=regex_expr)
[{'label': 'line', 'idx': '1'}, {'label': 'line', 'idx': '2'}, {'label': 'line', 'idx': '3'}]
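The core of this approach, matching each line and collecting `groupdict()` results, can be shown with a standalone sketch. `regex_rows_sketch` is a hypothetical name; whether the real interpreter matches line by line or against joined text is not stated here, so per-line matching is an assumption.

```python
import re


def regex_rows_sketch(lines: list[str], regex_expr: re.Pattern) -> list[dict[str, str]]:
    """Hypothetical stand-in: one row per line that matches the pattern."""
    rows = []
    for line in lines:
        match = regex_expr.match(line)
        if match:
            rows.append(match.groupdict())  # named groups become column names
    return rows


pattern = re.compile(r"^(?P<label>line) (?P<idx>\d)$")
print(regex_rows_sketch(["line 1", "line 2", "bad line", "line 3"], pattern))
```

Lines that do not match, such as `'bad line'`, are silently dropped, so a pattern that is too strict shows up as missing rows rather than as an error.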
def simple_roll_helper(lines: list[str], **kwargs) ‑> tuple[str, str]
-
Helper function for
fields_interpreter()
to handle tables with a multiline value structure.
Args
lines
:list[str]
- List of lines in the table.
KwArgs
force_save_keys
:tuple[str, …]
- Tuple of keys to force save. Default is ().
value_append_separator
:str
- Separator to append values. Default is " ".
min_val_length
:int
- Minimum length of values. Default is 0.
min_split_spaces
:int
- Minimum number of spaces to split. Default is 2.
Returns
tuple[str, str]
- A tuple where the first element is the field key and the second element is the concatenated value.
Example
>>> lines = [
...     "Field1:",
...     " First line of field1 value",
...     " Second line of field1 value",
... ]
>>> simple_roll_helper(lines)
('Field1', 'First line of field1 value Second line of field1 value')
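A minimal sketch of the single-field rollover, showing how `value_append_separator` shapes the result. `simple_roll_sketch` is hypothetical; it assumes the first line holds the key followed by a colon and that every later line continues the value.

```python
def simple_roll_sketch(lines: list[str], value_append_separator: str = " ") -> tuple[str, str]:
    """Hypothetical stand-in: join a key's first-line value with its continuation lines."""
    key, _, first = lines[0].partition(":")
    parts = [first.strip()] + [line.strip() for line in lines[1:]]
    value = value_append_separator.join(p for p in parts if p)  # skip empty fragments
    return key.strip(), value


roll_example = [
    "Field1:",
    "    First line of field1 value",
    "    Second line of field1 value",
]
print(simple_roll_sketch(roll_example))
print(simple_roll_sketch(roll_example, value_append_separator="\n"))
```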
def single_value_interpreter(lines: list[str], table_name: str, **kwargs) ‑> list[dict[str, str]]
-
Interprets a table with a single value under a heading.
Args
lines
:list[str]
- List of lines in the table.
table_name
:str
- Name of the table being interpreted.
KwArgs
debug
:bool
- Flag to enable debug mode. Default is False.
Returns
list[dict[str, str]]
- List containing a single dictionary where the key is the table heading and the value is the corresponding single value.
Example
>>> lines = [" Anesthesia type: General"]
>>> single_value_interpreter(lines, "Final Anesthesia Type")
[{'Final Anesthesia Type': 'General'}]
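The example suggests that the table name becomes the key and any inline label before the colon is discarded. The sketch below encodes that reading as an assumption; `single_value_sketch` is a hypothetical name, and the real function may handle labels differently.

```python
def single_value_sketch(lines: list[str], table_name: str) -> list[dict[str, str]]:
    """Hypothetical stand-in: map the table name to the single value found in the lines."""
    text = " ".join(line.strip() for line in lines if line.strip())
    _, sep, value = text.partition(":")
    # Without a colon, the whole text is taken as the value.
    return [{table_name: (value if sep else text).strip()}]


print(single_value_sketch(["  Anesthesia type: General"], "Final Anesthesia Type"))
```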
def split_fields_interpreter(lines: list[str], table_name: str, **kwargs) ‑> list[dict[str, str]]
-
An extension of
fields_interpreter()
that vertically segments reliably spaced columns of key/value pairs and stacks them prior to calling its unextended namesake.
Supports proper "rolling value" data collection as shown in the Example for columns '1st Col Key 1' and '2nd Col Key 2', which should both contain value 'Value\nContinued Value'. The presence of a 2nd key inline with a value continuation (e.g.
"...Continued Value 2nd Col Key 2: Value"
) prevents the standard fields_interpreter from picking up the 2nd value lines.
Args
lines
:list[str]
- Raw text, always arranged in two columns.
table_name
:str
- Name of the table, always Events. Not used.
KwArgs
debug
:bool
- If True, logs the input and output lines.
min_split_spaces
:int
- Minimum number of spaces for triggering splits.
force_page_breaks
:list[Callable[[str], bool]]
- List of functions that lines will be passed to. If any function returns True, a page break will be forced at that line. Serves as an arg to
utils.columns()
when vertically partitioning lines into columns.
force_save_keys
:list[str]
- List of keys to force save. Default is [].
roll_keys
:list[tuple[str, …]]
- List of keys for complex roll. Default is [].
roll_on_ending_colon
:bool
- Flag to roll on ending colon. Default is True.
roll_on_titles
:bool
- Flag to roll on title case lines. Default is True.
Returns
list[dict[str, str]]
- List of dictionaries where each dictionary represents a row in the table.
Examples
>>> unsplit = [
...     "1st Col Key 1: Value 2nd Col Key 1: Value",
...     " Continued Value 2nd Col Key 2: Value",
...     "1st Col Key 2: Value Continued Value",
... ]
>>> split = [
...     "1st Col Key 1: Value",
...     " Continued Value",
...     "1st Col Key 2: Value",
...     "2nd Col Key 1: Value",
...     "2nd Col Key 2: Value",
...     " Continued Value",
... ]
>>> roll_keys = [("1st Col Key 1",), ("2nd Col Key 2",)]
>>> split_fields_output = split_fields_interpreter(unsplit, "", roll_keys=roll_keys)
>>> split_fields_output
[{'1st Col Key 1': 'Value Continued Value', '1st Col Key 2': 'Value', '2nd Col Key 1': 'Value', '2nd Col Key 2': 'Value Continued Value'}]
>>> fields_output = fields_interpreter(split, "", roll_keys=roll_keys)
>>> split_fields_output == fields_output
True
>>> fields_output_unsplit = fields_interpreter(unsplit, "", roll_keys=roll_keys)
>>> fields_output_unsplit
[{'1st Col Key 1': 'Value Continued Value', '2nd Col Key 2': 'Value', '1st Col Key 2': 'Value'}]
def subtable_interpreter(lines: list[str], table_name: str, **kwargs) ‑> SubtableParser
-
Processes tables with subtables as table entries.
See
SubtableParser
for more information.
Args
lines
:list[str]
- List of lines in the table.
table_name
:str
- Name of the table being interpreted.
KwArgs
debug
:bool
- Flag to enable debug mode. Default is False.
subtable_parser
:Callable
- SubtableParser instance to parse subtables.
Default is
tu.SubtableParser(table_interpreter())
.
Returns
tu.SubtableParser
- Parsed subtables.
def table_interpreter(lines: list[str], table_name: str, **kwargs) ‑> list[dict[str, str]]
-
Standard table interpreter that processes table lines and returns a list of dictionaries representing the table rows.
Args
lines
:list[str]
- List of lines in the table.
table_name
:str
- Name of the table being interpreted.
KwArgs
debug
:bool
- Flag to enable debug mode. Default is False.
split_table_columns
:list[str]
- List of columns to split. Default is [].
min_split_spaces
:int
- Minimum number of spaces to split. Default is 1.
min_val_length
:int
- Minimum length of values. Default is 5.
force_page_breaks
:list[Callable[[str], bool]]
- List of functions to force page breaks. Default is [].
min_rows
:int
- Minimum number of rows. Default is 2.
Returns
list[dict[str, str]]
- List of dictionaries where each dictionary represents a row in the table.
Example
>>> lines = [
...     "Header1 Header2 Header3",
...     "Value1 Value2 Value3",
...     "Value4 Value5 Value6",
... ]
>>> table_interpreter(lines, "example_table")
[{'Header1': 'Value1', 'Header2': 'Value2', 'Header3': 'Value3'}, {'Header1': 'Value4', 'Header2': 'Value5', 'Header3': 'Value6'}]
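For orientation, the simplest form of header/row interpretation can be sketched as below. This is a hypothetical stand-in (`table_sketch`) that splits on runs of two or more spaces. It does not model column positions, page breaks, `min_val_length`, or the other documented options, and its split threshold is chosen for the sketch only.

```python
import re


def table_sketch(lines: list[str], min_split_spaces: int = 2) -> list[dict[str, str]]:
    """Hypothetical stand-in: first line is the header, each later line is a row."""
    splitter = re.compile(rf"\s{{{min_split_spaces},}}")
    headers = splitter.split(lines[0].strip())
    rows = []
    for line in lines[1:]:
        if not line.strip():
            continue  # skip blank lines
        cells = splitter.split(line.strip())
        rows.append(dict(zip(headers, cells)))  # pair cells with headers by position
    return rows


table_lines = [
    "Header1   Header2   Header3",
    "Value1    Value2    Value3",
    "Value4    Value5    Value6",
]
print(table_sketch(table_lines))
```

Pairing cells by position breaks down as soon as a row has an empty cell, which is one reason a production interpreter would track column offsets instead.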
def validation_failed_msg(origin: str, msg_type: str, lines: list[str], result: Optional[~T_INT_RESULT] = None, ex: Exception | None = None) ‑> str
-
Compile a message containing all relevant information for troubleshooting in the event of an interpreter failure.
Args
origin
:str
- The origin of the message.
msg_type
:str
- The type of the message.
lines
:list[str]
- List of lines related to the failure.
result
:tu.T_INT_RESULT | None
- The result of the interpreter, if any. Defaults to None.
ex
:Exception | None
- The exception that was raised, if any. Defaults to None.
Returns
str
- A compiled message containing all relevant information for troubleshooting.
Example
>>> lines = ["line1", "line2"]
>>> result = [{"key1": "value1"}, {"key2": "value2"}]
>>> ex = ValueError("An error occurred")
>>> validation_failed_msg("origin", "msg_type", lines, result, ex).splitlines()
['origin: msg_type', '****** EXCEPTION ******:', ' An error occurred', 'Traceback:', '****** RESULT ******:', ' Row [0]:', ' key1 : value1,', ' Row [1]:', ' key2 : value2,', '****** LINES *******:', 'line1', 'line2']