Module `matchops.constructor_tools`

Generic specs and function definitions for data matching operations.

Functions

def add_column_range(new_column_template: str, source_columns_template: list[str], apply_function: collections.abc.Callable[[str, pandas.core.series.Series], str], rng: range) ‑> list[AddColumn]

Build a list of AddColumn definitions, one for each array index in rng.

Args

new_column_template : str: string formattable with the array index
source_columns_template : list[str]: list of strings, each formattable with the array index
apply_function : Callable[[str, pd.Series], str]: first arg is the array index; second is the standard row (aka Series) of source columns
rng : range: range object containing the indexes to build columns for

Returns

list[AddColumn]: AddColumn list for all indexes in rng
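
The template expansion described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the module's implementation: `AddColumnSketch` is a hypothetical stand-in for `AddColumn` (whose real fields are not shown in this reference), the column names are made up, and a plain dict stands in for the pandas row.

```python
from dataclasses import dataclass
from typing import Callable, Mapping


@dataclass
class AddColumnSketch:
    """Hypothetical stand-in for AddColumn; the real class may differ."""

    new_column: str
    source_columns: list[str]
    apply_function: Callable[[Mapping[str, str]], str]


def add_column_range_sketch(
    new_column_template: str,
    source_columns_template: list[str],
    apply_function: Callable[[str, Mapping[str, str]], str],
    rng: range,
) -> list[AddColumnSketch]:
    columns = []
    for i in rng:
        idx = str(i)  # the documented callable receives the index as a str
        columns.append(
            AddColumnSketch(
                new_column=new_column_template.format(i),
                source_columns=[t.format(i) for t in source_columns_template],
                # Bind idx as a default so each column keeps its own index.
                apply_function=lambda row, idx=idx: apply_function(idx, row),
            )
        )
    return columns


# Hypothetical source columns "Provider 0" and "Provider 1".
cols = add_column_range_sketch(
    "schedule.anesthesiaStaff[{}].provider",
    ["Provider {}"],
    lambda idx, row: row[f"Provider {idx}"].strip(),
    range(2),
)
```

Binding the index inside the loop matters: a bare closure over `i` would leave every generated column using the last index in `rng`.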

def anes_staff_w_npi_columns(md_name_column: str, md_npi_column: str, staff_column: str) ‑> list[AddColumn]

Generate columns for populating anesthesiaStaff rows from data in a CSV/TSV, e.g. using the "Anesthesiologist", "MDA NPI", and "Anesthesia Staff" columns from msa_bmd.

Args

md_name_column : str: column containing the anesthesiologist name
md_npi_column : str: column containing the anesthesiologist NPI
staff_column : str: column containing a multiline text representation of all provider times

Returns

list[AddColumn]: list of new columns to fully populate schedule.anesthesiaStaff. Column names map directly to standard flat output keys (e.g. schedule.anesthesiaStaff[0].provider), so mappings.py entries are not required.

def full_name_column(first_column: str, middle_column: str, last_column: str, new_column_name: str, suffix_column: str = None) ‑> AddColumn

Given the names of the columns holding first, middle, and last name, return an AddColumn that builds a properly formatted full name.

Args

first_column : str: column name that contains first name
middle_column : str: column name that contains middle name
last_column : str: column name that contains last name
new_column_name : str: name of the column to be added
suffix_column : str | None: optional column containing suffix

Returns

AddColumn: AddColumn object to create full name

def graphium_composite_columns(hank_key_template_map: str, source_column: str, repeat_count: int) ‑> list[AddColumn]

Create a series of add columns for a graphium repeating composite field.

Args

hank_key_template_map : str: a string containing a series of Hank keys separated by "^" that map to the composite field's subfields. Each key should contain a pair of curly braces to be formatted with the repeat index (e.g. "schedule.anesthesiaStaff[{}].provider").
source_column : str: the column containing the composite field data.
repeat_count : int: the number of times the composite field repeats.

Example

When unpacked in a DataFrameConstructor's add_columns parameter, the output from the following implementation:

    graphium_composite_columns(
        hank_key_template_map=(
            "schedule.anesthesiaStaff[{}].provider"
            "^schedule.anesthesiaStaff[{}].provider"
            "^schedule.anesthesiaStaff[{}].providerNPI"
            "^"
            "^schedule.anesthesiaStaff[{}].role"
            "^schedule.anesthesiaStaff[{}].startTime"
            "^schedule.anesthesiaStaff[{}].startTime"
            "^schedule.anesthesiaStaff[{}].endTime"
            "^schedule.anesthesiaStaff[{}].endTime"
        ),
        source_column="AnesthesiaStaff",
        repeat_count=2,
    )

will add the provider, providerNPI, role, startTime, and endTime columns for schedule.anesthesiaStaff[0] and schedule.anesthesiaStaff[1]. Increase repeat_count to add an arbitrary number of such columns.

Returns

list[AddColumn]: as described above.
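
How the "^"-separated key templates expand per repeat index can be sketched as below. This only illustrates the key formatting described in Args; `expand_composite_keys` is a hypothetical helper, not part of the module, and how the composite value in source_column is itself split into subfields is not shown because this reference does not specify it.

```python
def expand_composite_keys(hank_key_template_map: str, repeat_count: int) -> list[list[str]]:
    """Format each "^"-separated key template with every repeat index."""
    templates = hank_key_template_map.split("^")
    # An empty template (two adjacent "^") stays an empty string; treating it
    # as "no key for this subfield" is an assumption, not documented behavior.
    return [[t.format(i) for t in templates] for i in range(repeat_count)]


keys = expand_composite_keys(
    "schedule.anesthesiaStaff[{}].provider"
    "^"
    "^schedule.anesthesiaStaff[{}].role",
    repeat_count=2,
)
```

Here `keys[0]` holds the keys for the first repeat and `keys[1]` those for the second, each in subfield order.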

def hl7_joins_for_schedule_columns(patient_name: str, dos: str, mrn: str, dob: str, accn: str) ‑> list[JoinSpec]

Get the standard joins for an HL7 demographic merge, using the supplied schedule column names for patient name, dos (date of service), mrn (medical record number), dob (date of birth), and accn (hospital account number).

def split_address_columns(source_column: str, prefix='') ‑> list[AddColumn]

Returns a list of AddColumns for all address subfields from a source column containing a full address.

Args

source_column : str: column name containing full address
prefix : str: Optional. prefix to add to new column names. Defaults to "". Set to a valid "address" prefix (e.g. "patient_info.patient.address.") to return columns with valid summary_spec keys.

See transform_utils.address_split for details.

def split_name_columns(source_column: str, jmespath_keys=False) ‑> list[AddColumn]

Returns a list of AddColumns representing the first, middle, and last name elements extracted from the value contained in source_column. New columns are named f'{source_column} {element}', where element is one of ["First", "Middle", "Last"], e.g. "Patient Name First".
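
The naming pattern for the new columns can be illustrated with a short sketch. `split_name_column_names` is a hypothetical helper that reproduces only the documented f-string pattern; it does not parse names, and the effect of jmespath_keys is not covered because it is not described here.

```python
def split_name_column_names(source_column: str) -> list[str]:
    """Return the new column names following f'{source_column} {element}'."""
    return [f"{source_column} {element}" for element in ("First", "Middle", "Last")]


names = split_name_column_names("Patient Name")
```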

def split_phone_columns(source_column: str, class_dict=None, prefix='') ‑> list[AddColumn]

Returns a list of AddColumns for homePhone, workPhone, and mobilePhone based on the value of a consolidated source_column.

Example:
    given consolidated source_column value:
        'H: 222-222-2222
        W: 888-888-8888
        M: 555-555-5555'
    returns new columns equivalent to:
        {
            "homePhone": "+1 222-222-2222",
            "workPhone": "+1 888-888-8888",
            "mobilePhone": "+1 555-555-5555",
        }

Args:
    source_column (str): column name containing consolidated phone numbers
    class_dict (dict[str, str]): Optional. dictionary mapping phone class
        to format string. Defaults to None.
    prefix (str): Optional. prefix to add to new column names. Defaults to "".
        Set to a valid "person" prefix (e.g. "patient_info.patient.") to
        return columns with valid summary_spec keys.

See transform_utils.phone_split for details.
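
A rough sketch of the split shown in the example above, assuming the consolidated value labels each number with "H:", "W:", or "M:" and uses the NNN-NNN-NNNN layout. `split_phones_sketch` and its regular expression are hypothetical; the actual parsing lives in transform_utils.phone_split and may handle more formats, and class_dict is not modeled here.

```python
import re

# Assumed mapping from the single-letter label to the output key.
PHONE_KEYS = {"H": "homePhone", "W": "workPhone", "M": "mobilePhone"}
PHONE_PATTERN = re.compile(r"\b([HWM]):\s*(\d{3}-\d{3}-\d{4})")


def split_phones_sketch(value: str, prefix: str = "") -> dict[str, str]:
    """Map each labelled number to its prefixed key, adding a "+1 " country code."""
    return {
        f"{prefix}{PHONE_KEYS[label]}": f"+1 {number}"
        for label, number in PHONE_PATTERN.findall(value)
    }


phones = split_phones_sketch("H: 222-222-2222\nW: 888-888-8888 M: 555-555-5555")
prefixed = split_phones_sketch("M: 555-555-5555", prefix="patient_info.patient.")
```

Passing a prefix such as "patient_info.patient." yields keys like "patient_info.patient.mobilePhone", matching the prefix behavior described in Args.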

def standard_demographics_constructor(split_name_column_list: list[str], split_phone_column_list: list[str], add_columns: list[AddColumn], post_process: collections.abc.Callable[[pandas.core.frame.DataFrame], pandas.core.frame.DataFrame]) ‑> DataFrameConstructor

Builds a DataFrameConstructor for source CSVs that follow the Hank AI CSV field standard.

Args

split_name_column_list : list[str]: list of *.fullName columns to split
split_phone_column_list : list[str]: list of *.homePhone columns to split
add_columns : list[AddColumn]: additional implementation specific AddColumns
post_process : Callable[[pd.DataFrame], pd.DataFrame]: standard post_process

Returns

DataFrameConstructor: demographics dataframe constructor