Module matchops.constructor_tools
Generic specs and function definitions for data-matching operations.
Functions
def add_column_range(new_column_template: str, source_columns_template: list[str], apply_function: collections.abc.Callable[[str, pandas.core.series.Series], str], rng: range) ‑> list[AddColumn]
Add an AddColumn definition for each array index in a range.
Args
new_column_template
:str
- template string to be formatted with the array index
source_columns_template
:list[str]
- list of source column-name templates, each formattable with the array index as above
apply_function
:Callable[[str, pd.Series], str]
- function whose first arg is the array index and whose second is the standard row (aka Series) of source columns
rng
:range
- range object containing indexes to build columns for
Returns
list[AddColumn]
- AddColumn list for all indexes in rng
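A minimal sketch of how the templates might expand over rng, assuming str.format-style "{}" placeholders (as in the graphium_composite_columns example below in this module). AddColumnSketch is a hypothetical stand-in for AddColumn, whose real fields are not documented here, and a plain dict stands in for the pd.Series row.

```python
from dataclasses import dataclass
from typing import Callable, Mapping


# Hypothetical stand-in for AddColumn; the real class's fields are not documented here.
@dataclass
class AddColumnSketch:
    name: str
    source_columns: list[str]
    func: Callable[[Mapping[str, str]], str]


def add_column_range_sketch(
    new_column_template: str,
    source_columns_template: list[str],
    apply_function: Callable[[str, Mapping[str, str]], str],
    rng: range,
) -> list[AddColumnSketch]:
    columns = []
    for i in rng:
        index = str(i)  # apply_function receives the array index as a str

        # Default argument binds the current index, avoiding the late-binding closure pitfall.
        def func(row: Mapping[str, str], index: str = index) -> str:
            return apply_function(index, row)

        columns.append(
            AddColumnSketch(
                name=new_column_template.format(i),
                source_columns=[t.format(i) for t in source_columns_template],
                func=func,
            )
        )
    return columns


# Usage: build a name column for two hypothetical staff slots.
cols = add_column_range_sketch(
    "staff[{}].name",
    ["Staff {} First", "Staff {} Last"],
    lambda idx, row: f"{row[f'Staff {idx} First']} {row[f'Staff {idx} Last']}",
    range(2),
)
row = {"Staff 0 First": "Ann", "Staff 0 Last": "Lee",
       "Staff 1 First": "Bo", "Staff 1 Last": "Kim"}
print([c.name for c in cols])  # ['staff[0].name', 'staff[1].name']
print(cols[1].func(row))  # Bo Kim
```

Binding the index through a default argument matters here: without it, every generated function would see the last value of i.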
def anes_staff_w_npi_columns(md_name_column: str, md_npi_column: str, staff_column: str) ‑> list[AddColumn]
Generate columns for populating anesthesiaStaff rows from CSV/TSV data, e.g. using the "Anesthesiologist", "MDA NPI", and "Anesthesia Staff" columns from msa_bmd.
Args
md_name_column
:str
- column containing the anesthesiologist name
md_npi_column
:str
- column containing the anesthesiologist NPI
staff_column
:str
- column containing a multiline text representation of all provider times
Returns
list[AddColumn]
- list of new columns that fully populate schedule.anesthesiaStaff. Column names map directly to standard flat output keys (e.g. schedule.anesthesiaStaff[0].provider), so mappings.py entries are not required.
def full_name_column(first_column: str, middle_column: str, last_column: str, new_column_name: str, suffix_column: str = None) ‑> AddColumn
Given the names of the columns for first, middle, and last name (and, optionally, suffix), return an AddColumn that builds a properly formatted full name.
Args
first_column
:str
- column name that contains first name
middle_column
:str
- column name that contains middle name
last_column
:str
- column name that contains last name
new_column_name
:str
- name of the column to be added
suffix_column
:str | None
- optional column containing suffix
Returns
AddColumn
- AddColumn object to create full name
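A sketch of the row-level formatting such a column might apply, assuming the parts are joined as "First Middle Last Suffix" with blank or missing parts skipped. The exact output format of full_name_column is not documented here; format_full_name is a hypothetical helper that takes a dict row in place of a pd.Series.

```python
from typing import Mapping, Optional


def format_full_name(
    row: Mapping[str, Optional[str]],
    first_column: str,
    middle_column: str,
    last_column: str,
    suffix_column: Optional[str] = None,
) -> str:
    columns = [first_column, middle_column, last_column]
    if suffix_column is not None:
        columns.append(suffix_column)
    # Trim each part and drop blanks so a missing middle name leaves no double space.
    parts = [(row.get(c) or "").strip() for c in columns]
    return " ".join(p for p in parts if p)


row = {"First": "Jane", "Middle": "", "Last": "Doe", "Suffix": "Jr."}
print(format_full_name(row, "First", "Middle", "Last", "Suffix"))  # Jane Doe Jr.
```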
def graphium_composite_columns(hank_key_template_map: str, source_column: str, repeat_count: int) ‑> list[AddColumn]
Create a series of AddColumns for a Graphium repeating composite field.
Args
hank_key_template_map
:str
- a string containing a series of Hank keys separated by "^" that map to the composite field's subfields. Each key should contain a pair of curly braces to be formatted with the repeat index (e.g. "schedule.anesthesiaStaff[{}].provider").
source_column
:str
- the column containing the composite field data.
repeat_count
:int
- the number of times the composite field repeats.
Example
When unpacked into a DataFrameConstructor's add_columns parameter, the output of the following call:
    graphium_composite_columns(
        hank_key_template_map=(
            "schedule.anesthesiaStaff[{}].provider"
            "^schedule.anesthesiaStaff[{}].provider"
            "^schedule.anesthesiaStaff[{}].providerNPI"
            "^"
            "^schedule.anesthesiaStaff[{}].role"
            "^schedule.anesthesiaStaff[{}].startTime"
            "^schedule.anesthesiaStaff[{}].startTime"
            "^schedule.anesthesiaStaff[{}].endTime"
            "^schedule.anesthesiaStaff[{}].endTime"
        ),
        source_column="AnesthesiaStaff",
        repeat_count=2,
    )
will add the provider, providerNPI, role, startTime, and endTime columns for schedule.anesthesiaStaff[0] and schedule.anesthesiaStaff[1]. Increase repeat_count to add an arbitrary number of such columns.
Returns
list[AddColumn]
- as described above.
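The key-expansion step described under hank_key_template_map can be sketched as follows. expand_composite_keys is a hypothetical helper, not part of this module; it only shows how each "^"-separated template is paired with its subfield position and formatted with the repeat index, skipping empty slots. It does not model how source_column values are split into repeats and subfields, or how duplicate keys (such as the two provider entries) are resolved.

```python
def expand_composite_keys(
    hank_key_template_map: str, repeat_count: int
) -> list[tuple[int, int, str]]:
    """Return (repeat index, subfield position, Hank key) for every mapped subfield."""
    templates = hank_key_template_map.split("^")
    expanded = []
    for repeat in range(repeat_count):
        for position, template in enumerate(templates):
            if not template:
                continue  # empty slot: this subfield is not mapped to a Hank key
            expanded.append((repeat, position, template.format(repeat)))
    return expanded


keys = expand_composite_keys(
    "schedule.anesthesiaStaff[{}].provider"
    "^schedule.anesthesiaStaff[{}].provider"
    "^schedule.anesthesiaStaff[{}].providerNPI"
    "^"
    "^schedule.anesthesiaStaff[{}].role"
    "^schedule.anesthesiaStaff[{}].startTime"
    "^schedule.anesthesiaStaff[{}].startTime"
    "^schedule.anesthesiaStaff[{}].endTime"
    "^schedule.anesthesiaStaff[{}].endTime",
    repeat_count=2,
)
print(len(keys))  # 16
print(keys[2])  # (0, 2, 'schedule.anesthesiaStaff[0].providerNPI')
```

With the example map, each repeat yields eight mapped subfields (position 3 is empty), covering the five distinct keys listed above.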
def hl7_joins_for_schedule_columns(patient_name: str, dos: str, mrn: str, dob: str, accn: str) ‑> list[JoinSpec]
Get the standard joins for an HL7 demographic merge, using the supplied schedule column names for patient_name, dos (date of service), mrn (medical record number), dob (date of birth), and accn (hospital account number).
def split_address_columns(source_column: str, prefix='') ‑> list[AddColumn]
Return a list of AddColumns for all address subfields extracted from a source column that contains a full address.
Args
source_column
:str
- column name containing full address
prefix
:str
- Optional. Prefix to add to new column names. Defaults to "". Set to a valid "address" prefix (e.g. "patient_info.patient.address.") to return columns with valid summary_spec keys.
See transform_utils.address_split for details.
def split_name_columns(source_column: str, jmespath_keys=False) ‑> list[AddColumn]
Return a list of AddColumns for the first, middle, and last name elements extracted from the value in source_column. New columns are named f'{source_column} {element}', where element is one of ["First", "Middle", "Last"], e.g. "Patient Name First".
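For reference, the column names produced by the naming rule above can be reproduced with a one-line helper. split_name_column_names is hypothetical and covers only the naming described here; the effect of jmespath_keys is not documented in this section and is not modeled.

```python
def split_name_column_names(source_column: str) -> list[str]:
    # Element names come from the docstring above; the order here is illustrative.
    return [f"{source_column} {element}" for element in ("First", "Middle", "Last")]


print(split_name_column_names("Patient Name"))
# ['Patient Name First', 'Patient Name Middle', 'Patient Name Last']
```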
def split_phone_columns(source_column: str, class_dict=None, prefix='') ‑> list[AddColumn]
Return a list of AddColumns for homePhone, workPhone, and mobilePhone based on the value of a consolidated source_column.
Example: given the consolidated source_column value 'H: 222-222-2222 W: 888-888-8888 M: 555-555-5555', returns new columns equivalent to:
    {
        "homePhone": "+1 222-222-2222",
        "workPhone": "+1 888-888-8888",
        "mobilePhone": "+1 555-555-5555",
    }
Args
source_column
:str
- column name containing consolidated phone numbers
class_dict
:dict[str, str]
- Optional. Dictionary mapping phone class to format string. Defaults to None.
prefix
:str
- Optional. Prefix to add to new column names. Defaults to "". Set to a valid "person" prefix (e.g. "patient_info.patient.") to return columns with valid summary_spec keys.
See transform_utils.phone_split for details.
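A rough sketch of the parsing shown in the example, assuming the "H"/"W"/"M" class letters, NNN-NNN-NNNN numbers, and the fixed "+1 " prefix seen above. split_phones, PHONE_CLASSES, PHONE_PATTERN, and country_prefix are hypothetical names; this is not transform_utils.phone_split, and it ignores class_dict and prefix.

```python
import re

# Hypothetical class-letter-to-key mapping inferred from the docstring example.
PHONE_CLASSES = {"H": "homePhone", "W": "workPhone", "M": "mobilePhone"}
# A class letter, a colon, optional whitespace (including newlines), then the number.
PHONE_PATTERN = re.compile(r"\b([HWM]):\s*(\d{3}-\d{3}-\d{4})")


def split_phones(value: str, country_prefix: str = "+1 ") -> dict[str, str]:
    result = {}
    for phone_class, number in PHONE_PATTERN.findall(value):
        result[PHONE_CLASSES[phone_class]] = country_prefix + number
    return result


print(split_phones("H: 222-222-2222 W: 888-888-8888 M: 555-555-5555"))
# {'homePhone': '+1 222-222-2222', 'workPhone': '+1 888-888-8888', 'mobilePhone': '+1 555-555-5555'}
```

Because the separator is matched with \s*, the same pattern accepts values where the entries are split across lines.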
def standard_demographics_constructor(split_name_column_list: list[str], split_phone_column_list: list[str], add_columns: list[AddColumn], post_process: collections.abc.Callable[[pandas.core.frame.DataFrame], pandas.core.frame.DataFrame]) ‑> DataFrameConstructor
Build a demographics DataFrameConstructor for source CSVs that follow the Hank AI CSV field standard.
Args
split_name_column_list
:list[str]
- list of *.fullName columns to split
split_phone_column_list
:list[str]
- list of *.homePhone columns to split
add_columns
:list[AddColumn]
- additional implementation specific AddColumns
post_process
:Callable[[pd.DataFrame], pd.DataFrame]
- standard post_process function, which takes and returns a DataFrame
Returns
DataFrameConstructor
- demographics dataframe constructor