Module utilities.transform_utils
Utility functions used by table_transformer.py, transform_specs.py, and summary_specs.py.
Functions
def address_parser(address_string: str, tag_mapping: collections.abc.Mapping[str, str] | None = None) ‑> dict[str, str]
-
Given a string containing an address, return a dict whose keys are drawn from the values of tag_mapping.
Args
address_string
:str
- string containing a full address
tag_mapping
:Mapping[str, str]
- Optional. Map from usaddress built-in tags to user-specified output tags. Defaults to gvars.USADDRESS_MAPPING.
Returns
dict[str, str]
- dict containing all keys in the supplied tag_mapping.
def address_split(table_name: str, full_address: str, *, key_prefix: str = '', key_suffix: str = '', debug=False) ‑> types.SimpleNamespace
-
Given a full address (e.g., "102 Maple Ln, Leesburg, GA 31763"), return a simple namespace with attributes street1, street2, city, state, and zip.
This function parses a full address string and returns a SimpleNamespace object with the parsed address components. It handles special cases such as addresses in Hawaii and formats ZIP codes appropriately. Optional key prefixes and suffixes can be added to the returned namespace attributes.
NOTE: Appropriate for use as the split_func parameter of a split_key operation in a specs.TransformSpec.
Args
table_name
:str
- Name of the table containing the address.
full_address
:str
- Full address string.
key_prefix
:str
- Optional. Prefix to add to all keys in the returned namespace. Must be valid to appear in a Python identifier if supplied. Defaults to "".
key_suffix
:str
- Optional. Suffix to add to all keys in the returned namespace. Must be valid to appear in a Python identifier if supplied. Defaults to "".
debug
:bool
- Optional. Enable debug logging. Defaults to False.
Returns
SimpleNamespace
- A namespace with attributes street1, street2, city, state, and zip.
Example
>>> address = "102 Maple Ln, Leesburg, GA 31763"
>>> result = address_split("AddressTable", address)
>>> result.street1
'102 MAPLE LN'
>>> result.city
'LEESBURG'
>>> result.state
'GA'
>>> result.zip
'31763'
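The key_prefix and key_suffix behaviour described above can be pictured with a small stand-alone sketch. It does not call address_split; the helper name with_affixes and the pre-parsed component dict are hypothetical, and the identifier check is an assumption based on the "must be valid to appear in a Python identifier" requirement.

```python
from types import SimpleNamespace


def with_affixes(components: dict[str, str], key_prefix: str = "", key_suffix: str = "") -> SimpleNamespace:
    """Hypothetical helper: expose each component as <prefix><key><suffix>."""
    renamed = {f"{key_prefix}{key}{key_suffix}": value for key, value in components.items()}
    for name in renamed:
        # Assumption: attribute names must remain valid Python identifiers.
        if not name.isidentifier():
            raise ValueError(f"{name!r} is not a valid Python identifier")
    return SimpleNamespace(**renamed)


# Already-parsed components, standing in for what an address parser would produce.
parsed = {"street1": "102 MAPLE LN", "street2": "", "city": "LEESBURG", "state": "GA", "zip": "31763"}
ns = with_affixes(parsed, key_prefix="patient_")
print(ns.patient_city)
```

With key_prefix="patient_", the city is read as ns.patient_city rather than ns.city, which keeps attributes from several split operations from colliding.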
def contextualize_summary(flat_summary: dict[str, dict[str, vStr]], drop_nulls: bool = False, context_prefix='', context_sep='|', force_index=False) ‑> dict[str, dict[str, vStr]]
-
Prep a deduped summary for additional merge/dedup operations. Extends keys in all sub-dicts with contexts from their vStr values.
Args
flat_summary
- flattened, deduped summary dict.
drop_nulls
- if True, drop keys with null values (see is_null()).
context_prefix
- prefix to add to each context string if not already present.
context_sep
- separator between key and context string.
force_index
- if True, override table indices with list indices. Set if contextualizing a summary that's already been properly deduped to avoid improper incrementing of list indices.
Returns
dict[str, dict[str, Any]]
- contextualized summary dict.
def dedup_dict_lists(nest_dict)
-
Remove duplicates from a list of dictionaries or a nested dictionary containing lists of dictionaries.
This function recursively processes a nested dictionary or a list of dictionaries, removing duplicate dictionaries based on their key-value pairs. It handles nested structures and ensures that only unique dictionaries are retained. The function skips deduplication for the key "custom_fields".
Args
nest_dict
:dict | list
- A nested dictionary or a list of dictionaries to deduplicate.
Returns
dict | list
- A deduplicated nested dictionary or list of dictionaries.
Example
>>> nest_dict = {
...     "key1": [{"a": "1", "b": "2"}, {"a": "1", "b": "2"}],
...     "key2": {"subkey": [{"c": "3"}, {"c": "3"}]},
...     "custom_fields": [{"d": "4"}, {"d": "4"}]
... }
>>> dedup_dict_lists(nest_dict)
{'key1': [{'a': '1', 'b': '2'}], 'key2': {'subkey': [{'c': '3'}]}, 'custom_fields': [{'d': '4'}, {'d': '4'}]}
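A minimal sketch of the recursive approach described above, not the module's implementation. The name dedup_sketch is hypothetical, and using sorted-key JSON as the identity of a dict is an assumption.

```python
import json
from typing import Any


def dedup_sketch(node: Any) -> Any:
    """Recursively drop duplicate dicts from lists, leaving "custom_fields" untouched."""
    if isinstance(node, dict):
        return {
            key: value if key == "custom_fields" else dedup_sketch(value)
            for key, value in node.items()
        }
    if isinstance(node, list):
        seen: set[str] = set()
        unique = []
        for item in node:
            item = dedup_sketch(item)
            if isinstance(item, dict):
                # Assumption: two dicts are duplicates when their sorted-key JSON matches.
                fingerprint = json.dumps(item, sort_keys=True, default=str)
                if fingerprint in seen:
                    continue
                seen.add(fingerprint)
            unique.append(item)
        return unique
    return node


nest = {
    "key1": [{"a": "1", "b": "2"}, {"a": "1", "b": "2"}],
    "key2": {"subkey": [{"c": "3"}, {"c": "3"}]},
    "custom_fields": [{"d": "4"}, {"d": "4"}],
}
result = dedup_sketch(nest)
print(result)
```

The sketch keeps the first occurrence of each dict and preserves list order; non-dict list items are passed through unchanged.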
def dedup_list_of_dict_by_keys(dict_list: list[dict]) ‑> list[dict]
-
Remove duplicates from a list of dictionaries based on their flattened key-value pairs.
This function takes a list of dictionaries and removes duplicates by comparing their flattened key-value pairs. It handles cases where the dictionaries have nested structures and ensures that only unique dictionaries are returned. If the list contains only one dictionary or if the dictionaries contain the key "inputType", no deduplication is performed.
Args
dict_list
:list[dict]
- A list of dictionaries to deduplicate.
Returns
list[dict]
- A list of unique dictionaries.
Example
>>> dict_list = [
...     {"a": "1", "b": {"c": "2"}},
...     {"a": "1", "b": {"c": "2"}},
...     {"a": "3", "b": {"d": "4"}}
... ]
>>> dedup_list_of_dict_by_keys(dict_list)
[{'a': '1', 'b': {'c': '2'}}, {'a': '3', 'b': {'d': '4'}}]
def earliest_date(date_list: collections.abc.Sequence[str | vStr]) ‑> str | vStr
-
Return the earliest date in date_list.
def earliest_datetime(date_list: collections.abc.Sequence[str | vStr]) ‑> str | vStr
-
Return the earliest datetime in date_list.
def earliest_time(time_list: collections.abc.Sequence[str | vStr]) ‑> str | vStr
-
Return the earliest time in time_list.
def employer_split(table_name: str, employer_string: str, **kwargs) ‑> types.SimpleNamespace
-
Split an employer string into individual fields for name, address, and phone.
This function processes a single string containing employer information, which may include the employer's name, address, and phone number, and splits it into individual fields. It handles various delimiters and uses the usaddress library to parse the address.
NOTE: Appropriate for use as the split_func parameter of a split_key operation in a specs.TransformSpec.
Args
table_name
:str
- The name of the table containing the employer string.
employer_string
:str
- The string containing employer information.
KwArgs
debug
:bool
- Optional. Enable debug logging. Defaults to False.
person_type
:str
- Optional. The type of person (e.g., "patient"). Defaults to "patient". Used as a prefix for all namespace attrs.
Returns
SimpleNamespace
- A namespace with attributes for the employer's name, address, and phone number.
Example
>>> employer_string = "Acme Corp, 123 Main St, Anytown, USA; 843-563-4567"
>>> result = employer_split("EmployerTable", employer_string)
>>> result.patient_employer
'Acme Corp'
>>> result.patient_employer_street1
'123 Main St'
>>> result.patient_employer_city
'Anytown'
>>> result.patient_employer_phone
'+1 843-563-4567'
def explode_nested(nested: dict[str, typing.Any], explode_spec: ExplodeSpec) ‑> dict[str, typing.Any]
-
Explode an object in a nested dict into a list of dicts.
NOTE: updates nested in place.
Args
nested
:dict[str, Any]
- nested dict containing the object to explode
explode_spec
:ExplodeSpec
- defines which and how to explode keys in a nested object
def extend_key_with_value_context(key: str, value: str | vStr, context_prefix='', context_sep='|', force_index=False) ‑> str
-
Extend flattened key with value context to prevent dict merges from overwriting data and allow for FieldManager operations against merged schedule/demo and PDF extracted data.
Args
key
:str
- flattened key
value
:str | vStr
- value to be merged
context_prefix
- Optional. Prefix to add to the context string if not already present. Defaults to "".
context_sep
:str
- Optional. separator between key and context. Defaults to "|".
force_index
- if True, override table index with list index. Set when working with a summary that's already been properly deduped to avoid improper incrementing of list indices.
Returns
str
- extended key
def extract_asa_status(candidates: collections.abc.Sequence[str | vStr]) ‑> str | vStr
-
Convert extracted ASA status values to a string representation of an integer.
This function processes a sequence of strings to find the numeric 'Physical Status' from a list of arbitrary extracted values. It converts Roman numeral ASA statuses to integer strings prior to selecting the most frequent element in the set of extracted candidates.
Args
candidates
:Sequence[str | vStr]
- A sequence of strings containing Roman numeral ASA statuses.
Returns
str | vStr
- The most frequent ASA status as a string representation of an integer.
Example
>>> extract_asa_status(["II", "III", "IV", "2 - emergent"])
'2'
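The normalize-then-vote logic described above can be sketched as follows. This is an illustration under stated assumptions, not the module's code: the name asa_status_sketch, the leading-token regex, the I to VI range, and the empty-string fallback are all hypothetical, and vStr handling is not modeled.

```python
import re
from collections import Counter

# Assumption: ASA classes I through VI are the only Roman numerals of interest.
ROMAN_TO_INT = {"I": "1", "II": "2", "III": "3", "IV": "4", "V": "5", "VI": "6"}
LEADING_STATUS = re.compile(r"\s*(VI|IV|V|III|II|I|[1-6])\b", flags=re.IGNORECASE)


def asa_status_sketch(candidates):
    """Normalize each candidate to an integer string, then pick the most frequent."""
    normalized = []
    for candidate in candidates:
        match = LEADING_STATUS.match(str(candidate))
        if match:
            token = match.group(1).upper()
            # Roman numerals are converted; plain digits pass through unchanged.
            normalized.append(ROMAN_TO_INT.get(token, token))
    if not normalized:
        return ""  # Assumption: no recognizable status yields an empty string.
    return Counter(normalized).most_common(1)[0][0]


status = asa_status_sketch(["II", "III", "IV", "2 - emergent"])
print(status)
```

Converting before counting is what lets "II" and "2 - emergent" vote for the same status.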
def flatten(nest: dict[str, typing.Any], drop_nulls: bool = False) ‑> dict[str, str | vStr]
-
Given a nested json object, return {jmespath_pattern: value} for all base values (i.e. values of type str, vStr, bool, int or float).
Args
nest
- nested json object to flatten.
drop_nulls
- if True, drop keys with null values (see is_null()).
Returns
dict[str, str | vStr]
- flattened json object.
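A rough sketch of the flattening idea, assuming dotted keys for dicts and [index] suffixes for lists. The name flatten_sketch is hypothetical; drop_nulls, vStr values, and quoting of keys that contain dots or brackets are not modeled.

```python
from typing import Any


def flatten_sketch(nest: Any, prefix: str = "") -> dict[str, Any]:
    """Map every base value to a jmespath-style path such as 'a.b[0].c'."""
    flat: dict[str, Any] = {}
    if isinstance(nest, dict):
        for key, value in nest.items():
            child = f"{prefix}.{key}" if prefix else str(key)
            flat.update(flatten_sketch(value, child))
    elif isinstance(nest, list):
        for index, value in enumerate(nest):
            flat.update(flatten_sketch(value, f"{prefix}[{index}]"))
    else:
        # Base value (str, bool, int, float, ...): record it under its full path.
        flat[prefix] = nest
    return flat


flat = flatten_sketch({"schedule": {"asa": "2"}, "input": {"entities": [{"text": "test text"}]}})
print(flat)
```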
def flatten_summary(job_dict: dict[str, dict[str, typing.Any]], drop_nulls: bool = False, append_context=False, context_prefix='', context_sep='|', force_index=False) ‑> dict[str, dict[str, typing.Any]]
-
Flatten a summary dict for deduplication and merging.
Args
job_dict
- summary dict to flatten.
drop_nulls
- if True, drop keys with null values (see is_null()).
append_context
- if True, extend keys with value context to prevent dict merges from overwriting data and allow for FieldManager operations against merged schedule/demo and PDF extracted data.
context_prefix
- prefix to add to each context string if not already present.
context_sep
- separator between key and context string.
force_index
- if True, override table indices with list indices. Set if flattening a summary that's already been properly deduped to avoid improper incrementing of list indices.
Returns
dict[str, dict[str, Any]]
- flattened summary dict.
def is_address(addr_str: str) ‑> bool
-
Return True if addr_str is a valid address.
def is_null(val: Any) ‑> bool
-
Return True if val is None or a string containing a null value.
Args
val
- value to test.
Returns
bool
- True if val is not a verified or non-null original value vStr and str(val).lower() is one of "false", "none", "–", or "null".
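For plain values, the string test described above reduces to a set lookup. The sketch below is a simplification: is_null_sketch and NULL_STRINGS are hypothetical names, and the vStr exemption for verified or non-null original values is deliberately left out.

```python
# The four null spellings listed in the Returns description.
NULL_STRINGS = {"false", "none", "–", "null"}


def is_null_sketch(val) -> bool:
    """Plain-value approximation; the vStr exemption is not modeled here."""
    return val is None or str(val).lower() in NULL_STRINGS


checks = [is_null_sketch(None), is_null_sketch("NULL"), is_null_sketch("0")]
print(checks)
```

Note that, under this reading, the boolean False is also treated as null because str(False).lower() is "false".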
def joined_set(element_list: collections.abc.Sequence[str | vStr], sep: str = '; ') ‑> vStr
-
self explanatory
def json_pair(key: str, value: str | vStr | list[str | vStr]) ‑> str | vStr
-
Convenience function that returns a JSON-escaped key-value pair.
def latest_date(date_list: collections.abc.Sequence[str | vStr]) ‑> str | vStr
-
Return the latest date in date_list.
def latest_datetime(date_list: collections.abc.Sequence[str | vStr]) ‑> str | vStr
-
Return the latest datetime in date_list.
def latest_time(time_list: collections.abc.Sequence[str | vStr]) ‑> str | vStr
-
Return the latest time in time_list.
def most_freq_name_element(name_candidates_list: collections.abc.Sequence[str | vStr], element: str) ‑> str | vStr
-
Find the most frequent value in a list of name candidates and return the specified element of the HumanName object in uppercase.
This function determines the most frequent name from a list of name candidates and returns the specified element (e.g., "first", "last", "full_name") of the HumanName object in uppercase. The function uses a tiebreaker to handle cases where multiple names have the same frequency.
Args
name_candidates_list
:Sequence[str | vStr]
- A list of name candidates.
element
:str
- The element of the HumanName object to return (e.g., "first", "last", "full_name").
Returns
str | vStr
- The most frequent name element in uppercase, preserving vStr attributes if present.
Example
>>> from nameparser import HumanName
>>> name_candidates = ["John Doe", "Jane Doe", "John Smith"]
>>> most_freq_name_element(name_candidates, "first")
'JOHN'
def nest_from_jmespath_keys(flat_dict: dict[str, typing.Any], dedup_lists: bool = True) ‑> dict[str, typing.Any]
-
Produce nested JSON output from a flat dictionary with jmespath keys.
Args
flat_dict
:dict[str, Any]
- A dictionary with jmespath query keys and their corresponding values.
dedup_lists
:bool
- Optional. If True, deduplicates lists within the nested structure. Defaults to True.
Returns
dict[str, Any]
- A nested dictionary representation of the input flat dictionary.
Example
>>> flat_dict = {"schedule.asa": "2", "input.entities[0].text": "test text"}
>>> nest_from_jmespath_keys(flat_dict)
{'input': {'entities': [{'text': 'test text'}]}, 'schedule': {'asa': '2'}}
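One way to rebuild a nested structure from such keys is sketched below. It assumes keys use only dotted names and [index] suffixes, pads sparse list indices with None, and omits list deduplication; the name nest_sketch is hypothetical and this is not the module's implementation.

```python
import re
from typing import Any

# Matches either a name segment or a bracketed list index.
TOKEN = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def nest_sketch(flat: dict[str, Any]) -> dict[str, Any]:
    """Rebuild nested dicts and lists from jmespath-style keys."""
    root: dict[str, Any] = {}
    for path, value in flat.items():
        # Turn "a.b[0].c" into ["a", "b", 0, "c"].
        steps = [int(index) if index else name for name, index in TOKEN.findall(path)]
        node: Any = root
        for step, next_step in zip(steps, steps[1:] + [None]):
            last = next_step is None
            # The container type of the child depends on the step that follows.
            child = value if last else ([] if isinstance(next_step, int) else {})
            if isinstance(step, int):
                while len(node) <= step:
                    node.append(None)  # Assumption: pad missing indices with None.
                if last or node[step] is None:
                    node[step] = child
            elif last or step not in node:
                node[step] = child
            node = node[step]
    return root


nested = nest_sketch({"schedule.asa": "2", "input.entities[0].text": "test text"})
print(nested)
```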
def note_split(table_name: str, note: str, is_procedure=False, debug=False) ‑> types.SimpleNamespace
-
Split a note string into a namespace having the requisite attributes for an input->'entities' entry.
This function processes a note string, splitting it into its components such as text, note type, and note header. It handles both general notes and procedure notes, formatting them appropriately.
NOTE: Appropriate for use as the split_func parameter of a split_key operation in a specs.TransformSpec.
Args
table_name
:str
- The name of the table containing the note.
note
:str
- The note string to be split.
is_procedure
:bool
- Optional. Indicates if the note is a procedure note. Defaults to False.
debug
:bool
- Optional. Enable debug logging. Defaults to False.
Returns
SimpleNamespace
- A namespace containing the split components of the note.
Example
>>> note = "Header\nLine 1\nLine 2\nProcedure: Description"
>>> result = note_split("NotesTable", note, is_procedure=True)
>>> result.noteHeader
'Procedure - Description'
>>> result.text
'Header\nLine 1\nLine 2'
>>> result.noteType
'AnesProcNotes'
def note_type(table_name: str, first_line: str) ‑> str
-
Set the GUI noteType based on the section title. See gvars.TITLE_TO_NOTE_TYPE_MAP.
Args
table_name
:str
- The name of the table containing the note.
first_line
:str
- The first line of the note.
Returns
str
- The determined note type.
Example
>>> table_name = "OpNotes"
>>> first_line = "Op Note: Procedure details"
>>> note_type(table_name, first_line)
'OpNote'
def ordered_function_calls(transform_function_dict)
-
refer to TableTransformer.apply_transforms() in table_transformer.py.
def phone_split(table_name: str, phone_string: str, *, class_dict: dict[str, str] | None = None, debug=False) ‑> types.SimpleNamespace
-
Split a phone string into home, work, and mobile phone numbers.
This function processes a phone string and splits it into separate home, work, and mobile phone numbers. It uses a provided class dictionary to customize the keys for the returned SimpleNamespace object. If no class dictionary is provided, default keys are used.
NOTE: Appropriate for use as the split_func parameter of a split_key operation in a specs.TransformSpec.
Args
table_name
:str
- The name of the table containing the phone string.
phone_string
:str
- The phone string to be split.
class_dict
:dict[str, str] | None
- Optional. A dictionary to customize the keys for the returned SimpleNamespace object. Defaults to None.
debug
:bool
- Optional. Enable debug logging. Defaults to False.
Returns
SimpleNamespace
- A namespace with attributes for home, work, and mobile phone numbers.
Example
>>> phone_string = "Home: 843-521-1000; Work: 843-521-1001; Mobile: 843-521-1002"
>>> class_dict = {
...     "home_phone": "Home: {}",
...     "work_phone": "Work: {}",
...     "mobile_phone": "Mobile: {}",
... }
>>> result = phone_split("PhoneTable", phone_string, class_dict=class_dict)
>>> result.home_phone
'Home: +1 843-521-1000'
>>> result.work_phone
'Work: +1 843-521-1001'
>>> result.mobile_phone
'Mobile: +1 843-521-1002'
def proc_data_split(table_name: str, proc_data_json: str, debug: bool = False) ‑> types.SimpleNamespace
-
Splits procedure data into table entries compatible with Responsible Staff.
This function processes a JSON string containing procedure data and splits it into a dictionary of table entries that are compatible with the Responsible Staff format. It handles various keys related to providers and times, and ensures that the output dictionary is properly formatted.
NOTE: Appropriate for use as the split_func parameter of a split_key operation in a specs.TransformSpec.
Args
table_name
:str
- The name of the table being processed.
proc_data_json
:str
- A JSON string containing the procedure data.
debug
:bool
- Optional. If True, debug information will be logged. Defaults to False.
Returns
SimpleNamespace
- A namespace with attributes containing the processed provider data.
Example
>>> table_name = "ProcedureTable"
>>> proc_data_json = (
...     '{"PerformedBy": "Dr. Smith, MD", '
...     '"AnesStartTime": "08:00", '
...     '"AnesEndTime": "10:00"}'
... )
>>> result = proc_data_split(table_name, proc_data_json)
>>> result.AnPrvNm1
'Dr. Smith, MD'
>>> result.AnPrvStrtTm1
'08:00'
>>> result.AnPrvEndTm1
'10:00'
def select_datetime_from_list(date_list, earliest=True, date_only=False)
-
supports recursive lists of date, datetime, and most date string formats
def select_time_from_list(time_list, earliest=True)
-
get earliest or latest time from a list of strings interpretable as time values.
def standardize_field_value(field_value: str | vStr, field_name: str, standard_map: StandardMap, exempt_checks: collections.abc.Sequence[collections.abc.Callable[[str], bool]] = (), split_regex: str = '[\\s,;\\+\\\\/%\\$\\^\\*\\.:\'\\"\\-><`]+', min_token_length: int = 2) ‑> str | vStr
-
Standardize a field value based on a standard_map.
Args
field_value
:str | vStr
- value to standardize
field_name
:str
- name of field to standardize. used for logging.
standard_map
:StandardMap
- a mapping between official values and lists of keywords. The standard value order determines the priority of assignment.
exempt_checks
:Sequence[Callable[[str], bool]]
- return raw from lookup if any check(raw) returns true. Default is (). See _tokenized_lookup() for more info.
split_regex
:str
- poor man's tokenizer. See tokenized_lookup() for more info.
min_token_length
:int
- forces direct 'raw' matches for mapping values less than a cutoff. Default is 2. Avoids matches between, e.g., a mapping '1' and a cluttered raw value like 'N/A\nJunky Field 1\nJunky Data'.
Returns
str | vStr
- The standardized value.
def stripped_first_element(element_list: collections.abc.Sequence[str | vStr]) ‑> str | vStr
-
return first element from list after calling strip(",- /.")
def tokenized_lookup(raw: str, standard_map: StandardMap, split_regex: str = '[\\s,;\\+\\\\/%\\$\\^\\*\\.:\'\\"\\-><`]+', min_token_length: int = 2) ‑> str
-
Given a dict with standard field values as keys and a list of mapped tokens as values, return the most likely matched standard field value for the raw value.
Args
raw
:str
- value to standardize
standard_map
:StandardMap
- map of standard values to tokens
split_regex
:str
- regex used to tokenize raw value. Default is "[\s,;\+\\/%\$\^\*\.:'\"\-><`]+".
Returns
- the first standard value whose map contains a hard match, i.e. where raw.lower() is a member of the map.
- the first standard value whose map includes any token from the list of tokens produced by applying split_regex to the raw value.
- the first standard value whose map includes "*".
- an empty string
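The fallback order listed under Returns can be sketched as three passes over the map. This is an illustration, not the module's code: lookup_sketch is a hypothetical name, a plain dict of lowercase keyword sets stands in for StandardMap, and min_token_length is omitted.

```python
import re

# Same pattern as the documented default, written as a raw string.
SPLIT_REGEX = r"[\s,;\+\\/%\$\^\*\.:'\"\-><`]+"


def lookup_sketch(raw: str, standard_map: dict[str, set[str]], split_regex: str = SPLIT_REGEX) -> str:
    """Try a hard match, then a token match, then a wildcard, else return ''."""
    lowered = raw.lower()
    # 1. Hard match: the whole lowercased value is one of the keywords.
    for standard, keywords in standard_map.items():
        if lowered in keywords:
            return standard
    # 2. Token match: any token of the raw value is one of the keywords.
    tokens = {token for token in re.split(split_regex, lowered) if token}
    for standard, keywords in standard_map.items():
        if tokens & keywords:
            return standard
    # 3. Wildcard: the first standard value whose keywords include "*".
    for standard, keywords in standard_map.items():
        if "*" in keywords:
            return standard
    return ""


sex_map = {"Male": {"male", "m"}, "Female": {"female", "f"}, "Unknown": {"*"}}
matches = [lookup_sketch(value, sex_map) for value in ("F", "sex: male", "n/a")]
print(matches)
```

Because each pass walks the map in insertion order, the order of the standard values decides which one wins when several could match.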
def unflatten(flat_dict: dict[str, typing.Any], drop_nulls: bool = False, level: int = 0) ‑> dict[str, typing.Any]
-
Given a dict of {jmespath_pattern: value}, return its nested form.
Args
flat_dict
- flattened json object.
drop_nulls
- if True, drop keys with null values (see is_null()).
level
- the level at which the unflatten operation is applied, e.g. 1 for a 'flattened_summary' as returned by the eponymous function.
Returns
dict[str, Any]
- nested json object.
def unq_clean_key(value_dict: dict[str, typing.Any] = None, this_key: str = 'key') ‑> str
-
return a new unique key for value_dict containing no illegal json key chars
Classes
class ExplodeSpec (target_path: str, source_key_key: str = 'key', source_value_key: str = 'value', source_key_xform: collections.abc.Callable[[str], str] = <function ExplodeSpec.<lambda>>, source_value_xform: collections.abc.Callable[[str], str] = <function ExplodeSpec.<lambda>>, add_keys: dict[str, collections.abc.Callable[[dict[str, typing.Any]], str]] = <factory>)
-
Defines which keys to explode and how to explode them in a nested object.
Args
target_path
:str
- path to the nested object whose children will be exploded
source_key_key
:str
- key that will be set to xformed source key in the exploded object
source_value_key
:str
- key that will be set to xformed source value in the exploded obj
source_key_xform
:Callable[[str], str]
- reformat function for source keys
source_value_xform
:Callable[[str], str]
- reformat function for source values
add_keys
:dict[str, Callable[[dict[str, Any]], str]]
- keys to add to the exploded object
Class variables
var add_keys : dict[str, collections.abc.Callable[[dict[str, typing.Any]], str]]
var source_key_key : str
var source_value_key : str
var target_path : str
Methods
def source_key_xform(x) ‑> collections.abc.Callable[[str], str]
def source_value_xform(x) ‑> collections.abc.Callable[[str], str]
class KeyTuple (source_key: str, dest_key: str, xform: collections.abc.Callable[[str], str | vStr] = utilities.v_str.vStr)
-
A tuple of source / destination keys with an optional value transform.
Args
source_key
:str
- original key name
dest_key
:str
- new key name
xform
:Callable[[str], str | vStr]
- optional value transform function.
Ancestors
- builtins.tuple
Instance variables
var dest_key : str
-
Alias for field number 1
var source_key : str
-
Alias for field number 0
var xform : collections.abc.Callable[[str], str | vStr]
-
Alias for field number 2
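A named tuple with a defaulted third field can be sketched with typing.NamedTuple. Everything here is illustrative: KeyTupleSketch and rename_keys are hypothetical names, str stands in for the vStr default, and how the module actually consumes KeyTuple instances is not documented above.

```python
from collections.abc import Callable
from typing import NamedTuple


class KeyTupleSketch(NamedTuple):
    source_key: str
    dest_key: str
    xform: Callable[[str], str] = str  # Stand-in for the documented vStr default.


def rename_keys(row: dict[str, str], key_tuples: list[KeyTupleSketch]) -> dict[str, str]:
    """Hypothetical consumer: copy source keys to destination keys through xform."""
    return {kt.dest_key: kt.xform(row[kt.source_key]) for kt in key_tuples if kt.source_key in row}


renamed = rename_keys(
    {"pt_name": " Jane ", "dob": "1990-01-01"},
    [KeyTupleSketch("pt_name", "name", str.strip), KeyTupleSketch("dob", "birth_date")],
)
print(renamed)
```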
class StandardMap (mapping: dict[str, collections.abc.Collection[str]])
-
A dictionary for use as a standard_map in a call to standardize_field_value()
Performs post processing to set the case of all mapping values to lower(), add the standard value itself to each mapping, and convert the mappings themselves to sets for faster lookup.
Ancestors
- builtins.dict
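The post-processing described above might look like the following dict subclass. StandardMapSketch is a hypothetical name, and lowercasing the standard value before adding it to its own keyword set is an assumption; the description does not say which case is used.

```python
from collections.abc import Collection


class StandardMapSketch(dict):
    """Dict of standard value -> set of lowercase keywords, built once at construction."""

    def __init__(self, mapping: dict[str, Collection[str]]):
        super().__init__(
            # Lowercase every keyword, add the standard value itself, store as a set.
            (standard, {keyword.lower() for keyword in keywords} | {standard.lower()})
            for standard, keywords in mapping.items()
        )


sex_map = StandardMapSketch({"Male": ["M", "Man"], "Female": ["F"]})
print(sex_map["Male"])
```

Storing sets keeps membership tests constant-time, and a plain dict preserves insertion order, which matters when the order of standard values sets their priority.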