Module `utilities.transform_utils`

Utility functions used by table_transformer.py, transform_specs.py, and summary_specs.py.

Functions

def address_parser(address_string: str, tag_mapping: collections.abc.Mapping[str, str] | None = None) ‑> dict[str, str]

Given a string containing an address, return a dict keyed by the output tags, i.e. the set of values in tag_mapping.

Args

address_string : str: String containing a full address.
tag_mapping : Mapping[str, str]: Optional. Map from usaddress built-in tags to user-specified output tags. Defaults to gvars.USADDRESS_MAPPING.

Returns

dict[str, str]: dict containing one key for every output tag (value) in the supplied tag_mapping.
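
The tag remapping can be sketched without usaddress installed. In the sketch below, map_tags and the hand-written tagged list are hypothetical stand-ins: tagged mimics the (token, label) pairs that usaddress.parse() returns, and the mapping is an illustrative substitute for gvars.USADDRESS_MAPPING, whose real contents are not shown in this reference. How the real function joins tokens that share an output tag is an assumption.

```python
def map_tags(tagged, tag_mapping):
    """Collect tagged tokens under their output tags (hypothetical helper)."""
    # Start with every output tag present so callers can rely on the keys.
    parsed = {out_tag: "" for out_tag in tag_mapping.values()}
    for token, label in tagged:
        out_tag = tag_mapping.get(label)
        if out_tag is not None:
            # Assumption: tokens sharing an output tag are joined with spaces.
            parsed[out_tag] = f"{parsed[out_tag]} {token}".strip()
    return parsed


# Hand-written stand-in for the output of usaddress.parse().
tagged = [
    ("102", "AddressNumber"),
    ("Maple", "StreetName"),
    ("Ln", "StreetNamePostType"),
    ("Leesburg", "PlaceName"),
    ("GA", "StateName"),
    ("31763", "ZipCode"),
]
# Illustrative mapping only; not the contents of gvars.USADDRESS_MAPPING.
tag_mapping = {
    "AddressNumber": "street1",
    "StreetName": "street1",
    "StreetNamePostType": "street1",
    "OccupancyIdentifier": "street2",
    "PlaceName": "city",
    "StateName": "state",
    "ZipCode": "zip",
}
parsed = map_tags(tagged, tag_mapping)
print(parsed)
```

Every output tag appears in the result even when no token maps to it (street2 is an empty string here), which is the property the Returns entry describes.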

def address_split(table_name: str, full_address: str, *, key_prefix: str = '', key_suffix: str = '', debug=False) ‑> types.SimpleNamespace

Given a full address (e.g., "102 Maple Ln, Leesburg, GA 31763"), return a simple namespace with attributes street1, street2, city, state, and zip.

This function parses a full address string and returns a SimpleNamespace object with the parsed address components. It handles special cases such as addresses in Hawaii and formats ZIP codes appropriately. Optional key prefixes and suffixes can be added to the returned namespace attributes. NOTE: Appropriate for use as the split_func parameter of a split_key operation in a specs.TransformSpec.

Args

table_name : str: Name of the table containing the address.
full_address : str: Full address string.
key_prefix : str: Optional. Prefix to add to all keys in the returned namespace. Must be valid to appear in a Python identifier if supplied. Defaults to "".
key_suffix : str: Optional. Suffix to add to all keys in the returned namespace. Must be valid to appear in a Python identifier if supplied. Defaults to "".
debug : bool: Optional. Enable debug logging. Defaults to False.

Returns

SimpleNamespace: A namespace with attributes street1, street2, city, state, and zip.

Example

>>> address = "102 Maple Ln, Leesburg, GA 31763"
>>> result = address_split("AddressTable", address)
>>> result.street1
'102 MAPLE LN'
>>> result.city
'LEESBURG'
>>> result.state
'GA'
>>> result.zip
'31763'

def contextualize_summary(flat_summary: dict[str, dict[str, vStr]], drop_nulls: bool = False, context_prefix='', context_sep='|', force_index=False) ‑> dict[str, dict[str, vStr]]

Prep a deduped summary for additional merge/dedup operations. Extends keys in all sub-dicts with contexts from their vStr values.

Args

flat_summary: flattened, deduped summary dict.
drop_nulls: if True, drop keys with null values (see is_null()).
context_prefix: prefix to add to each context string if not already present.
context_sep: separator between key and context string.
force_index: if True, override table indices with list indices. Set if contextualizing a summary that's already been properly deduped to avoid improper incrementing of list indices.

Returns

dict[str, dict[str, vStr]]: contextualized summary dict.

def dedup_dict_lists(nest_dict)

Remove duplicates from a list of dictionaries or a nested dictionary containing lists of dictionaries.

This function recursively processes a nested dictionary or a list of dictionaries, removing duplicate dictionaries based on their key-value pairs. It handles nested structures and ensures that only unique dictionaries are retained. The function skips deduplication for the key "custom_fields".

Args

nest_dict : dict | list: A nested dictionary or a list of dictionaries to deduplicate.

Returns

dict | list: A deduplicated nested dictionary or list of dictionaries.

Example

>>> nest_dict = {
...     "key1": [{"a": "1", "b": "2"}, {"a": "1", "b": "2"}],
...     "key2": {"subkey": [{"c": "3"}, {"c": "3"}]},
...     "custom_fields": [{"d": "4"}, {"d": "4"}]
... }
>>> dedup_dict_lists(nest_dict)
{'key1': [{'a': '1', 'b': '2'}], 'key2': {'subkey': [{'c': '3'}]}, 'custom_fields': [{'d': '4'}, {'d': '4'}]}
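
The recursion described above can be approximated as follows. This is a minimal sketch, not the module's implementation: dedup_sketch and SKIP_KEYS are hypothetical names, and comparing dicts through a canonical JSON fingerprint is an assumption about how equality is decided.

```python
import json
from typing import Any

# Assumption: "custom_fields" is the only key exempt from deduplication.
SKIP_KEYS = {"custom_fields"}


def dedup_sketch(node: Any) -> Any:
    """Recursively drop duplicate dicts from lists, keeping first occurrences."""
    if isinstance(node, dict):
        return {
            key: value if key in SKIP_KEYS else dedup_sketch(value)
            for key, value in node.items()
        }
    if isinstance(node, list):
        seen = set()
        unique = []
        for item in node:
            item = dedup_sketch(item)
            if not isinstance(item, dict):
                unique.append(item)  # only dicts are deduplicated
                continue
            # Canonical JSON gives a hashable, order-independent fingerprint.
            fingerprint = json.dumps(item, sort_keys=True, default=str)
            if fingerprint not in seen:
                seen.add(fingerprint)
                unique.append(item)
        return unique
    return node


nest_dict = {
    "key1": [{"a": "1", "b": "2"}, {"a": "1", "b": "2"}],
    "key2": {"subkey": [{"c": "3"}, {"c": "3"}]},
    "custom_fields": [{"d": "4"}, {"d": "4"}],
}
result = dedup_sketch(nest_dict)
print(result)
```

The sketch returns a new structure rather than mutating its input; whether the real function works in place is not stated in this reference.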

def dedup_list_of_dict_by_keys(dict_list: list[dict]) ‑> list[dict]

Remove duplicates from a list of dictionaries based on their flattened key-value pairs.

This function takes a list of dictionaries and removes duplicates by comparing their flattened key-value pairs. It handles cases where the dictionaries have nested structures and ensures that only unique dictionaries are returned. If the list contains only one dictionary or if the dictionaries contain the key "inputType", no deduplication is performed.

Args

dict_list : list[dict]: A list of dictionaries to deduplicate.

Returns

list[dict]: A list of unique dictionaries.

Example

>>> dict_list = [
...     {"a": "1", "b": {"c": "2"}},
...     {"a": "1", "b": {"c": "2"}},
...     {"a": "3", "b": {"d": "4"}}
... ]
>>> dedup_list_of_dict_by_keys(dict_list)
[{'a': '1', 'b': {'c': '2'}}, {'a': '3', 'b': {'d': '4'}}]
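
One way to compare dictionaries by their flattened key-value pairs is sketched below. The helper names flat_items and dedup_by_flat_items are hypothetical, and two points are assumptions: that "inputType" is checked only at the top level of each dictionary, and that values are compared by their repr().

```python
from typing import Any


def flat_items(obj: Any, prefix: str = "") -> list:
    """Flatten nested dicts/lists into (path, value) pairs."""
    pairs = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            pairs.extend(flat_items(value, f"{prefix}.{key}" if prefix else key))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            pairs.extend(flat_items(value, f"{prefix}[{index}]"))
    else:
        pairs.append((prefix, obj))
    return pairs


def dedup_by_flat_items(dict_list: list) -> list:
    # Assumption: a single entry, or any entry with "inputType", disables dedup.
    if len(dict_list) <= 1 or any("inputType" in d for d in dict_list):
        return list(dict_list)
    seen = set()
    unique = []
    for d in dict_list:
        # Sorted (path, repr) pairs make the fingerprint independent of key order.
        fingerprint = tuple(sorted((path, repr(value)) for path, value in flat_items(d)))
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(d)
    return unique


dict_list = [
    {"a": "1", "b": {"c": "2"}},
    {"b": {"c": "2"}, "a": "1"},
    {"a": "3", "b": {"d": "4"}},
]
unique = dedup_by_flat_items(dict_list)
print(unique)
```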

def earliest_date(date_list: collections.abc.Sequence[str | vStr]) ‑> str | vStr

Return the earliest date in date_list.

def earliest_datetime(date_list: collections.abc.Sequence[str | vStr]) ‑> str | vStr

Return the earliest datetime in date_list.

def earliest_time(time_list: collections.abc.Sequence[str | vStr]) ‑> str | vStr

Return the earliest time in time_list.

def employer_split(table_name: str, employer_string: str, **kwargs) ‑> types.SimpleNamespace

Split an employer string into individual fields for name, address, and phone.

This function processes a single string containing employer information, which may include the employer's name, address, and phone number, and splits it into individual fields. It handles various delimiters and uses the usaddress library to parse the address. NOTE: Appropriate for use as the split_func parameter of a split_key operation in a specs.TransformSpec.

Args

table_name : str: The name of the table containing the employer string.
employer_string : str: The string containing employer information.

KwArgs

debug : bool: Optional. Enable debug logging. Defaults to False.
person_type : str: Optional. The type of person (e.g., "patient"). Defaults to "patient". Used as a prefix for all namespace attrs.

Returns

SimpleNamespace: A namespace with attributes for the employer's name, address, and phone number.

Example

>>> employer_string = "Acme Corp, 123 Main St, Anytown, USA; 843-563-4567"
>>> result = employer_split("EmployerTable", employer_string)
>>> result.patient_employer
'Acme Corp'
>>> result.patient_employer_street1
'123 Main St'
>>> result.patient_employer_city
'Anytown'
>>> result.patient_employer_phone
'+1 843-563-4567'

def explode_nested(nested: dict[str, typing.Any], explode_spec: ExplodeSpec) ‑> dict[str, typing.Any]

Explode an object in a nested dict into a list of dicts.

NOTE: updates nested in place.

Args

nested : dict[str, Any]: nested dict containing the object to explode
explode_spec : ExplodeSpec: defines which and how to explode keys in a nested object
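
The sketch below illustrates one plausible reading of "exploding" a nested object: each key/value pair of the object at target_path becomes its own dict in a list. ExplodeSpecSketch and explode_sketch are hypothetical stand-ins. Treating target_path as a dot-separated path of dict keys, and passing the source object to each add_keys callable, are assumptions rather than documented behaviour.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class ExplodeSpecSketch:
    """Hypothetical stand-in mirroring the documented ExplodeSpec fields."""
    target_path: str
    source_key_key: str = "key"
    source_value_key: str = "value"
    source_key_xform: Callable[[str], str] = lambda x: x
    source_value_xform: Callable[[str], str] = lambda x: x
    add_keys: dict[str, Callable[[dict[str, Any]], str]] = field(default_factory=dict)


def explode_sketch(nested: dict, spec: ExplodeSpecSketch) -> dict:
    # Assumption: target_path is a dot-separated path of plain dict keys.
    *parents, leaf = spec.target_path.split(".")
    node = nested
    for key in parents:
        node = node[key]
    source = node[leaf]
    exploded = []
    for key, value in source.items():
        entry = {
            spec.source_key_key: spec.source_key_xform(key),
            spec.source_value_key: spec.source_value_xform(value),
        }
        # Assumption: each add_keys callable receives the source object.
        for new_key, make_value in spec.add_keys.items():
            entry[new_key] = make_value(source)
        exploded.append(entry)
    node[leaf] = exploded  # updates nested in place, as the NOTE says
    return nested


nested = {"patient": {"vitals": {"hr": "72", "bp": "120/80"}}}
spec = ExplodeSpecSketch(
    target_path="patient.vitals",
    source_key_key="name",
    source_key_xform=str.upper,
    add_keys={"count": lambda src: str(len(src))},
)
explode_sketch(nested, spec)
print(nested["patient"]["vitals"])
```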

def extend_key_with_value_context(key: str, value: str | vStr, context_prefix='', context_sep='|', force_index=False) ‑> str

Extend flattened key with value context to prevent dict merges from overwriting data and allow for FieldManager operations against merged schedule/demo and PDF extracted data.

Args

key : str: flattened key
value : str | vStr: value to be merged
context_prefix : str: Optional. prefix to add to the context string if not already present. Defaults to "".
context_sep : str: Optional. separator between key and context. Defaults to "|".
force_index : bool: Optional. if True, override table index with list index. Set when working with a summary that's already been properly deduped to avoid improper incrementing of list indices. Defaults to False.

Returns

str: extended key

def extract_asa_status(candidates: collections.abc.Sequence[str | vStr]) ‑> str | vStr

Convert extracted ASA status values to a string representation of an integer.

This function processes a sequence of strings to find the numeric 'Physical Status' from a list of arbitrary extracted values. It converts Roman numeral ASA statuses to integer strings prior to selecting the most frequent element in the set of extracted candidates.

Args

candidates : Sequence[str | vStr]: A sequence of extracted strings that may contain ASA statuses as Roman numerals or digits.

Returns

str | vStr: The most frequent ASA status as a string representation of an integer.

Example

>>> extract_asa_status(["II", "III", "IV", "2 - emergent"])
'2'
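
The conversion and vote can be sketched as below. This is an approximation under stated assumptions, not the module's code: asa_status_sketch is a hypothetical name, the regex-based extraction of the first status token is assumed, and returning an empty string when nothing matches is also an assumption.

```python
import re
from collections import Counter

# ASA physical status classes run from I to VI.
ROMAN_TO_INT = {"I": "1", "II": "2", "III": "3", "IV": "4", "V": "5", "VI": "6"}
# Word boundaries keep "II" from matching inside "III".
ASA_TOKEN = re.compile(r"\b(VI|IV|V|III|II|I|[1-6])\b")


def asa_status_sketch(candidates):
    """Normalise each candidate to a digit string, then take the majority."""
    votes = Counter()
    for candidate in candidates:
        match = ASA_TOKEN.search(str(candidate))
        if match:
            token = match.group(1)
            votes[ROMAN_TO_INT.get(token, token)] += 1
    # Assumption: an empty string signals that no status was found.
    return votes.most_common(1)[0][0] if votes else ""


status = asa_status_sketch(["II", "III", "IV", "2 - emergent"])
print(status)
```

Here "II" and "2 - emergent" both normalise to "2", so "2" wins with two votes against one each for "3" and "4".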

def flatten(nest: dict[str, typing.Any], drop_nulls: bool = False) ‑> dict[str, str | vStr]

Given a nested json object, return {jmespath_pattern: value} for all base values (i.e. values of type str, vStr, bool, int or float).

Args

nest: nested json object to flatten.
drop_nulls: if True, drop keys with null values (see is_null()).

Returns

dict[str, str | vStr]: flattened json object.
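
A minimal sketch of this kind of flattening, assuming keys use dotted names and bracketed list indices as in the nest_from_jmespath_keys example. flatten_sketch is a hypothetical name, and the drop_nulls option is omitted.

```python
from typing import Any


def flatten_sketch(nest: Any, prefix: str = "") -> dict:
    """Map every base value to a path such as 'a.b[0].c'."""
    flat = {}
    if isinstance(nest, dict):
        for key, value in nest.items():
            flat.update(flatten_sketch(value, f"{prefix}.{key}" if prefix else key))
    elif isinstance(nest, list):
        for index, value in enumerate(nest):
            flat.update(flatten_sketch(value, f"{prefix}[{index}]"))
    else:
        flat[prefix] = nest  # base value: str, bool, int, float, ...
    return flat


nested = {"schedule": {"asa": "2"}, "input": {"entities": [{"text": "test text"}]}}
flat = flatten_sketch(nested)
print(flat)
```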

def flatten_summary(job_dict: dict[str, dict[str, typing.Any]], drop_nulls: bool = False, append_context=False, context_prefix='', context_sep='|', force_index=False) ‑> dict[str, dict[str, typing.Any]]

Flatten a summary dict for deduplication and merging.

Args

job_dict: summary dict to flatten.
drop_nulls: if True, drop keys with null values (see is_null()).
append_context: if True, extend keys with value context to prevent dict merges from overwriting data and allow for FieldManager operations against merged schedule/demo and PDF extracted data.
context_prefix: prefix to add to each context string if not already present.
context_sep: separator between key and context string.
force_index: if True, override table indices with list indices. Set if flattening a summary that's already been properly deduped to avoid improper incrementing of list indices.

Returns

dict[str, dict[str, Any]]: flattened summary dict.

def is_address(addr_str: str) ‑> bool

Return True if addr_str is a valid address.

def is_null(val: Any) ‑> bool

Return True if val is None or a string containing a null value.

Args

val: value to test.

Returns

bool: True if val is None, or if str(val).lower() is "false", "none", "–", or "null" and val is not a vStr that is verified or has a non-null original value.
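
The string comparison can be sketched as follows. The vStr exemption is deliberately left out because vStr is not defined in this reference, so is_null_sketch (a hypothetical name) only approximates the behaviour for None and plain values.

```python
# Null markers taken from the Returns description above.
NULL_STRINGS = {"false", "none", "–", "null"}


def is_null_sketch(val) -> bool:
    """Approximate is_null() for None and plain (non-vStr) values."""
    return val is None or str(val).lower() in NULL_STRINGS


checks = {repr(v): is_null_sketch(v) for v in (None, "NULL", "None", "–", "0", "n/a")}
print(checks)
```

Because the test goes through str(val).lower(), the boolean False is also reported as null under this reading.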

def joined_set(element_list: collections.abc.Sequence[str | vStr], sep: str = '; ') ‑> vStr

Return the unique elements of element_list joined with sep.

def json_pair(key: str, value: str | vStr | list[str | vStr]) ‑> str | vStr

Convenience function that returns a JSON-escaped key-value pair.

def latest_date(date_list: collections.abc.Sequence[str | vStr]) ‑> str | vStr

Return the latest date in date_list.

def latest_datetime(date_list: collections.abc.Sequence[str | vStr]) ‑> str | vStr

Return the latest datetime in date_list.

def latest_time(time_list: collections.abc.Sequence[str | vStr]) ‑> str | vStr

Return the latest time in time_list.

def most_freq_name_element(name_candidates_list: collections.abc.Sequence[str | vStr], element: str) ‑> str | vStr

Find the most frequent value in a list of name candidates and return the specified element of the HumanName object in uppercase.

This function determines the most frequent name from a list of name candidates and returns the specified element (e.g., "first", "last", "full_name") of the HumanName object in uppercase. The function uses a tiebreaker to handle cases where multiple names have the same frequency.

Args

name_candidates_list : Sequence[str | vStr]: A list of name candidates.
element : str: The element of the HumanName object to return (e.g., "first", "last", "full_name").

Returns

str | vStr: The most frequent name element in uppercase, preserving vStr attributes if present.

Example

>>> from nameparser import HumanName
>>> name_candidates = ["John Doe", "Jane Doe", "John Smith"]
>>> most_freq_name_element(name_candidates, "first")
'JOHN'

def nest_from_jmespath_keys(flat_dict: dict[str, typing.Any], dedup_lists: bool = True) ‑> dict[str, typing.Any]

Produce nested JSON output from a flat dictionary with jmespath keys.

Args

flat_dict : dict[str, Any]: A dictionary with jmespath query keys and their corresponding values.
dedup_lists : bool: Optional. If True, deduplicates lists within the nested structure. Defaults to True.

Returns

dict[str, Any]: A nested dictionary representation of the input flat dictionary.

Example

>>> flat_dict = {"schedule.asa": "2", "input.entities[0].text": "test text"}
>>> nest_from_jmespath_keys(flat_dict)
{'input': {'entities': [{'text': 'test text'}]}, 'schedule': {'asa': '2'}}
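
Rebuilding the nested structure from such keys can be sketched as below. parse_path and nest_sketch are hypothetical names; the sketch assumes keys consist only of dotted names and bracketed integer indices, pads skipped list positions with None, and leaves out the dedup_lists step.

```python
import re

# A path token is either a name or a bracketed list index.
TOKEN = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def parse_path(path: str) -> list:
    """Split 'a.b[0].c' into ['a', 'b', 0, 'c']."""
    return [int(index) if index else name for name, index in TOKEN.findall(path)]


def nest_sketch(flat_dict: dict) -> dict:
    root = {}
    for path, value in flat_dict.items():
        tokens = parse_path(path)
        node = root
        for token, nxt in zip(tokens, tokens[1:] + [None]):
            is_last = nxt is None
            # The next token decides whether the child is a list or a dict.
            child = value if is_last else ([] if isinstance(nxt, int) else {})
            if isinstance(token, int):
                while len(node) <= token:
                    node.append(None)  # pad so the index exists
                if is_last or node[token] is None:
                    node[token] = child
            elif is_last or token not in node:
                node[token] = child
            node = node[token]
    return root


flat_dict = {"schedule.asa": "2", "input.entities[0].text": "test text"}
nested = nest_sketch(flat_dict)
print(nested)
```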

def note_split(table_name: str, note: str, is_procedure=False, debug=False) ‑> types.SimpleNamespace

Split a note string into a namespace having the requisite attributes for an input->'entities' entry.

This function processes a note string, splitting it into its components such as text, note type, and note header. It handles both general notes and procedure notes, formatting them appropriately. NOTE: Appropriate for use as the split_func parameter of a split_key operation in a specs.TransformSpec.

Args

table_name : str: The name of the table containing the note.
note : str: The note string to be split.
is_procedure : bool: Optional. Indicates if the note is a procedure note. Defaults to False.
debug : bool: Optional. Enable debug logging. Defaults to False.

Returns

SimpleNamespace: A namespace containing the split components of the note.

Example

>>> note = "Header\nLine 1\nLine 2\nProcedure: Description"
>>> result = note_split("NotesTable", note, is_procedure=True)
>>> result.noteHeader
'Procedure - Description'
>>> result.text
'Header\nLine 1\nLine 2'
>>> result.noteType
'AnesProcNotes'

def note_type(table_name: str, first_line: str) ‑> str

Set the GUI noteType based on the section title. See gvars.TITLE_TO_NOTE_TYPE_MAP.

Args

table_name : str: The name of the table containing the note.
first_line : str: The first line of the note.

Returns

str: The determined note type.

Example

>>> table_name = "OpNotes"
>>> first_line = "Op Note: Procedure details"
>>> note_type(table_name, first_line)
'OpNote'

def ordered_function_calls(transform_function_dict)

refer to TableTransformer.apply_transforms() in table_transformer.py.

def phone_split(table_name: str, phone_string: str, *, class_dict: dict[str, str] | None = None, debug=False) ‑> types.SimpleNamespace

Split a phone string into home, work, and mobile phone numbers.

This function processes a phone string and splits it into separate home, work, and mobile phone numbers. It uses a provided class dictionary to customize the keys for the returned SimpleNamespace object. If no class dictionary is provided, default keys are used. NOTE: Appropriate for use as the split_func parameter of a split_key operation in a specs.TransformSpec.

Args

table_name : str: The name of the table containing the phone string.
phone_string : str: The phone string to be split.
class_dict : dict[str, str] | None, optional: A dictionary to customize the keys for the returned SimpleNamespace object. Defaults to None.
debug : bool: Optional. Enable debug logging. Defaults to False.

Returns

SimpleNamespace: A namespace with attributes for home, work, and mobile phone numbers.

Example

>>> phone_string = "Home: 843-521-1000; Work: 843-521-1001; Mobile: 843-521-1002"
>>> class_dict = {
...     "home_phone": "Home: {}",
...     "work_phone": "Work: {}",
...     "mobile_phone": "Mobile: {}",
... }
>>> result = phone_split("PhoneTable", phone_string, class_dict=class_dict)
>>> result.home_phone
'Home: +1 843-521-1000'
>>> result.work_phone
'Work: +1 843-521-1001'
>>> result.mobile_phone
'Mobile: +1 843-521-1002'

def proc_data_split(table_name: str, proc_data_json: str, debug: bool = False) ‑> types.SimpleNamespace

Splits procedure data into table entries compatible with Responsible Staff.

This function processes a JSON string containing procedure data and splits it into a dictionary of table entries that are compatible with the Responsible Staff format. It handles various keys related to providers and times, and ensures that the output dictionary is properly formatted. NOTE: Appropriate for use as the split_func parameter of a split_key operation in a specs.TransformSpec.

Args

table_name : str: The name of the table being processed.
proc_data_json : str: A JSON string containing the procedure data.
debug : bool: Optional. If True, debug information will be logged. Defaults to False.

Returns

SimpleNamespace: A namespace with attributes containing the processed provider data.

Example

>>> table_name = "ProcedureTable"
>>> proc_data_json = (
...     '{"PerformedBy": "Dr. Smith, MD", '
...     '"AnesStartTime": "08:00", '
...     '"AnesEndTime": "10:00"}'
... )
>>> result = proc_data_split(table_name, proc_data_json)
>>> result.AnPrvNm1
'Dr. Smith, MD'
>>> result.AnPrvStrtTm1
'08:00'
>>> result.AnPrvEndTm1
'10:00'

def select_datetime_from_list(date_list, earliest=True, date_only=False)

Select the earliest (or latest) value from date_list. Supports nested lists of date, datetime, and most date string formats.
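
A simplified sketch of selecting from nested lists is shown below. It assumes every leaf is an ISO 8601 string, whereas the real function accepts date and datetime objects and most date string formats. The names iter_leaves and select_datetime_sketch, and the string format of the return value, are hypothetical.

```python
from datetime import datetime


def iter_leaves(items):
    """Yield leaves from arbitrarily nested lists or tuples."""
    for item in items:
        if isinstance(item, (list, tuple)):
            yield from iter_leaves(item)
        else:
            yield item


def select_datetime_sketch(date_list, earliest=True, date_only=False):
    # Assumption: leaves are ISO 8601 strings such as "2024-02-28 14:30:00".
    parsed = [datetime.fromisoformat(str(leaf)) for leaf in iter_leaves(date_list)]
    chosen = min(parsed) if earliest else max(parsed)
    # Assumption: the result is returned as an ISO-formatted string.
    return chosen.date().isoformat() if date_only else chosen.isoformat(sep=" ")


dates = ["2024-03-01 08:00:00", ["2024-02-28 14:30:00", ["2024-03-05 09:15:00"]]]
first = select_datetime_sketch(dates)
last_day = select_datetime_sketch(dates, earliest=False, date_only=True)
print(first, last_day)
```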

def select_time_from_list(time_list, earliest=True)

Return the earliest or latest time from a list of strings interpretable as time values.

def standardize_field_value(field_value: str | vStr, field_name: str, standard_map: StandardMap, exempt_checks: collections.abc.Sequence[collections.abc.Callable[[str], bool]] = (), split_regex: str = '[\\s,;\\+\\\\/%\\$\\^\\*\\.:\'\\"\\-><`]+', min_token_length: int = 2) ‑> str | vStr

Standardize a field value based on a standard_map.

Args

field_value : str | vStr: value to standardize
field_name : str: name of field to standardize. used for logging.
standard_map : StandardMap: a mapping between official values and lists of keywords. The standard value order determines the priority of assignment.
exempt_checks : Sequence[Callable[[str], bool]]: return the raw value from the lookup if any check(raw) returns True. Default is (). See tokenized_lookup() for more info.
split_regex : str: regex used as a simple tokenizer for the raw value. See tokenized_lookup() for more info.
min_token_length : int: forces direct 'raw' matches for mapping values less than a cutoff. Default is 2. Avoids matches between, e.g., a mapping '1' and a cluttered raw value like 'N/A\nJunky Field 1\nJunky Data'.

Returns

str | vStr: The standardized value.

def stripped_first_element(element_list: collections.abc.Sequence[str | vStr]) ‑> str | vStr

Return the first element of element_list after calling strip(",- /.") on it.

def tokenized_lookup(raw: str, standard_map: StandardMap, split_regex: str = '[\\s,;\\+\\\\/%\\$\\^\\*\\.:\'\\"\\-><`]+', min_token_length: int = 2) ‑> str

Given a dict with standard field values as keys and a list of mapped tokens as values, return the most likely matched standard field value for the raw value.

Args

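raw : str: value to standardize
standard_map : StandardMap: map of standard values to tokens
split_regex : str: regex used to tokenize raw value. Default is the pattern [\s,;\+\\/%\$\^\*\.:'\"\-><`]+ (shown with Python string escaping in the signature above).
min_token_length : int: mapping values shorter than this cutoff only match the raw value directly, not individual tokens. Default is 2.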

Returns

str: the first of the following that applies, in order:
1. the first standard value whose map contains a hard match, i.e. where raw.lower() is a member of the map.
2. the first standard value whose map includes any token from the list of tokens produced by applying split_regex to the raw value.
3. the first standard value whose map includes "*".
4. an empty string.
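
The lookup order can be sketched with a plain dict of sets standing in for StandardMap. tokenized_lookup_sketch is a hypothetical name, and filtering tokens by min_token_length is an assumption about how the cutoff is applied.

```python
import re

# Same pattern as the default in the signature above.
SPLIT_REGEX = r"[\s,;\+\\/%\$\^\*\.:'\"\-><`]+"


def tokenized_lookup_sketch(raw, standard_map, split_regex=SPLIT_REGEX, min_token_length=2):
    lowered = raw.lower()
    # 1. Hard match: the whole raw value is one of the mapped keywords.
    for standard, keywords in standard_map.items():
        if lowered in keywords:
            return standard
    # 2. Token match. Assumption: tokens below the cutoff are ignored.
    tokens = {t for t in re.split(split_regex, lowered) if len(t) >= min_token_length}
    for standard, keywords in standard_map.items():
        if tokens & set(keywords):
            return standard
    # 3. Wildcard: a mapping that lists "*" catches everything else.
    for standard, keywords in standard_map.items():
        if "*" in keywords:
            return standard
    # 4. Nothing matched.
    return ""


# Illustrative map; keywords are already lowercase, as StandardMap would make them.
sex_map = {
    "Male": {"male", "m", "man"},
    "Female": {"female", "f", "woman"},
    "Unknown": {"*"},
}
results = [tokenized_lookup_sketch(raw, sex_map) for raw in ("M", "Sex: female", "n/a")]
print(results)
```

"M" is a hard match, "Sex: female" matches through the token "female", and "n/a" falls through to the wildcard because its tokens "n" and "a" are shorter than the cutoff.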

def unflatten(flat_dict: dict[str, typing.Any], drop_nulls: bool = False, level: int = 0) ‑> dict[str, typing.Any]

Given a dict of {jmespath_pattern: value}, return its nested form.

Args

flat_dict: flattened json object.
drop_nulls: if True, drop keys with null values (see is_null()).
level: the level at which the unflatten operation is applied, e.g. 1 for a 'flattened_summary' as returned by the eponymous function.

Returns

dict[str, Any]: nested json object.

def unq_clean_key(value_dict: dict[str, typing.Any] = None, this_key: str = 'key') ‑> str

Return a new key, unique within value_dict, that contains no illegal JSON key characters.

Classes

class ExplodeSpec (target_path: str, source_key_key: str = 'key', source_value_key: str = 'value', source_key_xform: collections.abc.Callable[[str], str] = <function ExplodeSpec.<lambda>>, source_value_xform: collections.abc.Callable[[str], str] = <function ExplodeSpec.<lambda>>, add_keys: dict[str, collections.abc.Callable[[dict[str, typing.Any]], str]] = <factory>)

Defines which keys to explode and how to explode them in a nested object.

Args

target_path : str: path to the nested object whose children will be exploded
source_key_key : str: key that will be set to xformed source key in the exploded object
source_value_key : str: key that will be set to xformed source value in the exploded object
source_key_xform : Callable[[str], str]: reformat function for source keys
source_value_xform : Callable[[str], str]: reformat function for source values
add_keys : dict[str, Callable[[dict[str, Any]], str]]: keys to add to the exploded object

Class variables

var add_keys : dict[str, collections.abc.Callable[[dict[str, typing.Any]], str]]
var source_key_key : str
var source_value_key : str
var target_path : str

Methods

def source_key_xform(x) ‑> collections.abc.Callable[[str], str]
def source_value_xform(x) ‑> collections.abc.Callable[[str], str]

class KeyTuple (source_key: str, dest_key: str, xform: collections.abc.Callable[[str], str | vStr] = utilities.v_str.vStr)

A tuple of source / destination keys with an optional value transform.

Args

source_key : str: original key name
dest_key : str: new key name
xform : Callable[[str], str | vStr]: optional value transform function.

Ancestors

builtins.tuple

Instance variables

var dest_key : str: Alias for field number 1
var source_key : str: Alias for field number 0
var xform : collections.abc.Callable[[str], str | vStr]: Alias for field number 2
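
A named tuple with this shape can be sketched as below. KeyTupleSketch and rename_keys are hypothetical: the sketch uses str as the default transform because vStr is not available here (the documented default is utilities.v_str.vStr), and the renaming helper only illustrates how such tuples might be applied to a row.

```python
from typing import Callable, NamedTuple


class KeyTupleSketch(NamedTuple):
    """Hypothetical stand-in for KeyTuple."""
    source_key: str
    dest_key: str
    xform: Callable[[str], str] = str  # the documented default is vStr


def rename_keys(row: dict, key_tuples: list) -> dict:
    """Copy values to their destination keys, applying each transform."""
    return {
        kt.dest_key: kt.xform(row[kt.source_key])
        for kt in key_tuples
        if kt.source_key in row
    }


row = {"pt_name": " doe, jane ", "pt_dob": "1980-01-01"}
key_tuples = [
    KeyTupleSketch("pt_name", "patient_name", lambda s: s.strip().upper()),
    KeyTupleSketch("pt_dob", "patient_dob"),
]
renamed = rename_keys(row, key_tuples)
print(renamed)
```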

class StandardMap (mapping: dict[str, collections.abc.Collection[str]])

A dictionary for use as a standard_map in a call to standardize_field_value()

Post-processing lowercases all mapping values, adds the standard value itself to each mapping, and converts each mapping to a set for faster lookup.

Ancestors

builtins.dict
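
The post-processing can be sketched as a small dict subclass. StandardMapSketch is a hypothetical name, and adding the standard value in lowercase is an assumption (it is consistent with lookups that compare against raw.lower()).

```python
class StandardMapSketch(dict):
    """Hypothetical stand-in for StandardMap."""

    def __init__(self, mapping):
        super().__init__(
            # Lowercase every keyword, store as a set, and include the
            # standard value itself (assumed to be lowercased as well).
            (standard, {kw.lower() for kw in keywords} | {standard.lower()})
            for standard, keywords in mapping.items()
        )


standard_map = StandardMapSketch({"Male": ["M", "Man"], "Female": ["F", "Woman"]})
print(standard_map)
```

Insertion order of the standard values is preserved, which matters because the standard value order determines the priority of assignment.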