Module `utilities.managed_fields`

FieldManager and ManagedField classes and related utilities. See specs/_bases/_fields.py for examples of well-commented, globally applicable ManagedField and ContextRule definitions.

Classes

class ContextRule (title: ForwardRef('str'), controller_keys: ForwardRef('list[str]'), patterns: ForwardRef('list[re.Pattern]'), applied_context: ForwardRef('Sequence[str]'), is_active: ForwardRef('Callable[[KeyGroups], Any]'), candidate_filter: ForwardRef('Callable[[KeyGroups, Any], KeyGroups]') = <function ContextRule.<lambda>>)

Conditionally modify the preferred point of origin (context) and/or manipulate the list of candidate values for all standard keys matched by a list of regular expressions. See FieldManager._apply_context_rules() for implementation. See specs/_bases/_fields.py for examples of well-commented, globally applicable ContextRules that reduce DB/CSV and PDF/DB/CSV summaries. These global context rules are available from the specs subpackage, i.e.:

    import specs
    specs.CSV_DB_CONTEXT_RULE  # context rule for CSV/DB summaries

Args

title : str: a descriptive name used for logging and maintainability purposes.
controller_keys : list[str]: A list of standard keys whose standard_key_groups entries will determine if the rule is activated.
patterns : list[re.Pattern]: A list of regular expressions that determine the set of standard keys that are subject to the rule's actions.
applied_context : Sequence[str]: The new context_priority value for matched keys if the supplied is_active function returns True (boolean type only).
is_active : Callable[[KeyGroups], Any]: A function that takes the subset of the FieldManager's standard_key_groups that matches the rule's controller_keys and returns either a bool indicating whether the rule is in effect or, in advanced applications, an alternative input for a custom candidate_filter.
candidate_filter : Callable[[KeyGroups, Any], KeyGroups]: A function that takes the subset of the FieldManager's standard_key_groups that matches the rule's keys and the result from the is_active func and returns the subset of the FieldManager's standard_key_groups that are eligible to appear in the output. Optional. Defaults to a null operation (i.e. lambda x, _: x).
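
As a rough illustration of how these arguments fit together, the sketch below defines a local stand-in for ContextRule as a NamedTuple that mirrors the documented field order, then builds one rule. The simplified KeyGroups alias, the rule itself, and the key "schedule.startTime" used as a controller are assumptions made for illustration; they are not taken from specs/_bases/_fields.py.

```python
import re
from typing import Any, Callable, NamedTuple, Sequence

# Simplified stand-in: {standard key: {contextual key: value}}, values as plain str.
KeyGroups = dict[str, dict[str, str]]


class ContextRule(NamedTuple):
    """Local stand-in that mirrors the documented field order; not the real class."""

    title: str
    controller_keys: list[str]
    patterns: list[re.Pattern]
    applied_context: Sequence[str]
    is_active: Callable[[KeyGroups], Any]
    candidate_filter: Callable[[KeyGroups, Any], KeyGroups] = lambda groups, _: groups


# Hypothetical rule: if a controller key has a CSV candidate, prefer CSV over DB
# for every standard key that starts with "schedule.".
prefer_csv = ContextRule(
    title="Prefer CSV when a CSV schedule row exists",
    controller_keys=["schedule.startTime"],
    patterns=[re.compile(r"^schedule\.")],
    applied_context=("CSV", "DB"),
    is_active=lambda groups: any(
        "|CSV" in key for candidates in groups.values() for key in candidates
    ),
)

# The subset of standard_key_groups that matches controller_keys.
controller_groups = {
    "schedule.startTime": {
        "schedule.startTime|CSV|0": "07:30",
        "schedule.startTime|DB|0": "07:45",
    }
}
rule_is_on = prefer_csv.is_active(controller_groups)

# Standard keys that the rule would act on.
matched = [
    key
    for key in ("schedule.startTime", "patient_info.name")
    if any(pattern.search(key) for pattern in prefer_csv.patterns)
]
```

Because candidate_filter is left at its default here, the rule only changes the context priority of matched keys and passes the candidates through untouched.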

Ancestors

builtins.tuple

Instance variables

var applied_context : collections.abc.Sequence[str]: Alias for field number 3
var candidate_filter : collections.abc.Callable[[dict[str, dict[str, str | vStr]], typing.Any], dict[str, dict[str, str | vStr]]]: Alias for field number 5
var controller_keys : list[str]: Alias for field number 1
var is_active : collections.abc.Callable[[dict[str, dict[str, str | vStr]]], typing.Any]: Alias for field number 4
var patterns : list[re.Pattern]: Alias for field number 2
var title : str: Alias for field number 0

class FieldManager (managed_fields: dict[str, ManagedField], standard_keys: list[str], context_rules: Sequence[ContextRule] = (), context_sep: str = '|', output_dir: str | None = None, patient_id: str = 'Unknown', raw_summary: dict[str, vStr] = <factory>)

Apply the settings defined in the ManagedField objects to the data in a flattened summary dictionary from the output of a TableTransformer instance.

Args

managed_fields : dict[str, ManagedField]: A dictionary mapping keys to ManagedField objects.
standard_keys : list[str]: A list of standard keys used in the data typically defined in a summary_spec (see specs/summary_specs.py and table_transformer.py).
context_rules : Sequence[ContextRule]: A list of context rules to apply to the data. ContextRules allow for conditional modification of the preferred point of origin (context) and/or manipulation of the candidate values for a standard key subset.
context_sep : str: A string used to separate keys and their context. Defaults to "|". Example key: "schedule.startTime|CSV|0".
output_dir : str | None: An optional file path prefix (end in "/" to target a folder). If supplied, f"{output_dir}{patient_id}_managed_fields.json" will be created after calling the reduce() method.
patient_id : str: an identifier for the patient being processed. Used in debug output filenames and logging. Defaults to 'Unknown'.
raw_summary : dict[str, vStr]: A dict containing raw summary data for one patient. Defaults to an empty dict to facilitate templating (see __call__() docstring)

Attributes

standard_key_groups : KeyGroups: A mapping between standard keys and groups of corresponding context bearing candidate key value pairs. Has form {'<standard key>': {'<standard key><sep><context>': '<value>', ...}, ...}
context_groups : dict[str, KeyGroups]: Roughly the inverse of standard_key_groups: groups raw data by context. Initialized as a defaultdict(dict). Used in list and grouped element merge operations. See _GROUPED_ELEMENTS. Has form {'<parent>': {'<context>': {'<final element>': '<value>', ...}, ...}, ...}.
context_group_priorities : dict[str, list[str]]: Records the _group_context of the first candidate selected for a standard_key in a context group to ensure that the same context is prioritized for subsequent group members.
list_manager : dict[str, list[tuple[int, int]]]: init==False. A dictionary used to track the output list indexes of "array type" standard keys.
output : dict[str, vStr]: init==False. A dictionary containing the reduced output. Empty until the reduce() method is called.

During post-initialization, the supplied raw_summary (dict[str, vStr]) is compiled into standard_key_groups (dict[str, dict[str, vStr]]), where the keys of raw_summary appear as subkeys within standard_key_groups with standard_keys members as parent keys. E.g., raw_summary entries:

    {
        "schedule.diagnosis|PDF.Anesthesia Record.Case Summary|0": "Right hip displacement",
        "schedule.diagnosis|PDF.Operative Note|7": "Displacement of right hip",
    }

would be compiled into standard_key_groups entries:

    {
        "schedule.diagnosis": {
            "schedule.diagnosis|PDF.Anesthesia Record.Case Summary|0": "Right hip displacement",
            "schedule.diagnosis|PDF.Operative Note|7": "Displacement of right hip",
        }
    }

The same is true, generally speaking, for array type keys, with the additional step of replacing the wildcard in the standard key definition with the proper list index using the list_manager dictionary. E.g., raw_summary entries:

    {
        "patient_info.insurance[0].company|PDF.Anesthesia Record.Active Insurance|0": "Aetna",
        "patient_info.insurance[0].company|PDF.Anesthesia Record.Active Insurance|1": "BCBS",
        "patient_info.insurance[1].company|PDF.Anesthesia Record.Active Insurance|0": "MDCR",
    }

would be compiled into standard_key_groups entries:

    {
        "patient_info.insurance[0].company": {
            "patient_info.insurance[0].company|PDF.Anesthesia Record.Active Insurance|0": "Aetna",
        },
        "patient_info.insurance[1].company": {
            "patient_info.insurance[0].company|PDF.Anesthesia Record.Active Insurance|1": "BCBS",
        },
        "patient_info.insurance[2].company": {
            "patient_info.insurance[1].company|PDF.Anesthesia Record.Active Insurance|0": "MDCR",
        },
    }

by virtue of their combined list and table indexes (0, 0), (0, 1), and (1, 0) respectively.
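
The grouping described above can be approximated with the sketch below. It is not the FieldManager implementation: split_key and build_key_groups are hypothetical helpers, values are plain strings rather than vStr, each standard key is assumed to carry at most one list index, and the final "|"-separated element of every raw key is assumed to be an integer table index, as in the example keys.

```python
import re
from collections import defaultdict

SEP = "|"                            # documented default for context_sep
LIST_IDX = re.compile(r"\[(\d+)\]")  # a concrete list index such as "[0]"


def split_key(raw_key):
    """Return (standard key, wildcard key or None, (list idx, table idx) or None)."""
    standard_key = raw_key.split(SEP)[0]
    match = LIST_IDX.search(standard_key)
    if match is None:
        return standard_key, None, None
    wildcard_key = LIST_IDX.sub("[*]", standard_key, count=1)
    table_idx = int(raw_key.rsplit(SEP, 1)[1])  # assumed trailing integer
    return standard_key, wildcard_key, (int(match.group(1)), table_idx)


def build_key_groups(raw_summary):
    """Group raw keys under standard keys, re-indexing array type keys."""
    # Pass 1: collect every distinct (list idx, table idx) pair per wildcard key.
    list_manager = defaultdict(list)
    for raw_key in raw_summary:
        _, wildcard_key, pair = split_key(raw_key)
        if wildcard_key is not None and pair not in list_manager[wildcard_key]:
            list_manager[wildcard_key].append(pair)
    for pairs in list_manager.values():
        pairs.sort()  # a pair's sorted position becomes its output list index

    # Pass 2: file each raw key under its (possibly re-indexed) standard key.
    groups = defaultdict(dict)
    for raw_key, value in raw_summary.items():
        standard_key, wildcard_key, pair = split_key(raw_key)
        if wildcard_key is not None:
            out_idx = list_manager[wildcard_key].index(pair)
            standard_key = wildcard_key.replace("[*]", f"[{out_idx}]")
        groups[standard_key][raw_key] = value
    return dict(groups), dict(list_manager)


CTX = "PDF.Anesthesia Record.Active Insurance"
raw = {
    f"patient_info.insurance[0].company{SEP}{CTX}{SEP}0": "Aetna",
    f"patient_info.insurance[0].company{SEP}{CTX}{SEP}1": "BCBS",
    f"patient_info.insurance[1].company{SEP}{CTX}{SEP}0": "MDCR",
    "schedule.diagnosis|PDF.Operative Note|7": "Displacement of right hip",
}
key_groups, list_manager = build_key_groups(raw)
```

Sorting the (list index, table index) pairs is one simple way to obtain a stable output index; the real list_manager may assign indexes differently.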

The reduce() method iterates over each standard field in standard_key_groups, selecting the final value for each standard key from its list of candidates according to the ManagedField definition. If the field is not managed but is present in standard_keys, the default ManagedField is applied (it selects the most frequent value in the list of candidates).

Class variables

var context_group_priorities : dict[str, list[str]]
var context_groups : dict[str, dict[str, dict[str, str | vStr]]]
var context_rules : collections.abc.Sequence[ContextRule]
var context_sep : str
var list_manager : dict[str, list[tuple[int, int]]]
var managed_fields : dict[str, ManagedField]
var output : dict[str, str | vStr]
var output_dir : str | None
var patient_id : str
var raw_summary : dict[str, vStr]
var standard_key_groups : dict[str, dict[str, str | vStr]]
var standard_keys : list[str]

Methods

def expand_dependencies(self, this_field: ManagedField) ‑> collections.abc.Sequence[str]

Allow a 'non-list' field to depend on all values collected for a list field.

If a non-list field depends upon a list type field, replace the original wildcarded list key reference supplied in its dependencies with the list of indexed keys for which we actually collected data.

Example

Given a raw_summary containing data for three anesthesia providers, and a standard field "schedule.surgeon" with dependency "schedule.anesthesiaStaff[*].provider", expand the wildcarded list dependency into entries for the three provider values collected and return:

    [
        "schedule.anesthesiaStaff[0].provider",
        "schedule.anesthesiaStaff[1].provider",
        "schedule.anesthesiaStaff[2].provider",
    ]

Args

this_field : ManagedField: the managed field to process

Returns

list[str]: The original dependencies if the supplied field is itself a list field. Otherwise, all wildcarded list type dependency references are replaced with the list of indexed keys generated by the dependency's collected values in _build_standard_key_groups().
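
The expansion can be pictured with the standalone sketch below. It is a simplification under stated assumptions: the function takes the field's standard key, its dependencies, and the collected standard keys as plain arguments (the real method takes a ManagedField and reads the FieldManager's state), and a list field is recognized by the literal "[*]" in its key.

```python
def expand_dependencies(standard_key, dependencies, collected_keys):
    """Replace wildcarded list dependencies with the indexed keys actually collected."""
    if "[*]" in standard_key:
        # The field is itself a list field: leave its dependencies untouched.
        return list(dependencies)

    expanded = []
    for dep in dependencies:
        if "[*]" not in dep:
            expanded.append(dep)
            continue
        prefix, suffix = dep.split("[*]", 1)
        matches = []
        for key in collected_keys:
            if key.startswith(prefix + "[") and key.endswith("]" + suffix):
                # Text between the brackets must be a concrete integer index.
                idx = key[len(prefix) + 1 : len(key) - len(suffix) - 1]
                if idx.isdigit():
                    matches.append((int(idx), key))
        expanded.extend(key for _, key in sorted(matches))
    return expanded


collected = [
    "schedule.surgeon",
    "schedule.anesthesiaStaff[1].provider",
    "schedule.anesthesiaStaff[0].provider",
    "schedule.anesthesiaStaff[2].provider",
    "schedule.anesthesiaStaff[0].role",
]
surgeon_deps = expand_dependencies(
    "schedule.surgeon",
    ["schedule.anesthesiaStaff[*].provider"],
    collected,
)
list_field_deps = expand_dependencies(
    "schedule.anesthesiaStaff[*].role",
    ["schedule.anesthesiaStaff[*].provider"],
    collected,
)
```

Here surgeon_deps holds the three indexed provider keys in index order, while list_field_deps keeps the wildcarded reference because the depending field is a list field.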

def reduce(self)

Reduces self.standard_key_groups by applying the settings defined in self.managed_fields.

This method iterates over each standard field in self.standard_key_groups. If the field is not managed but is present in self.standard_keys, the default ManagedField is applied (it selects the most frequent value in the list of candidates). If the field is managed, its dependencies are recursively reduced and a final value is generated according to the preprocess and xform functions defined in its ManagedField definition.

Args

_input : KeyGroups | None: The input dictionary to reduce. If None, the method uses the standard_key_groups attribute.
_recursing_for : list[str]: A list of standard keys that are being recursively reduced due to their presence as a dependency in a different standard key's ManagedField definition. Used to avoid infinite recursion.

Returns

dict: The reduced dictionary.
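
The recursive flow described for reduce() can be sketched as follows. This is a minimal illustration, not the real method: Field is a hypothetical, pared-down stand-in for ManagedField, candidate selection is reduced to "most frequent value", context priorities and context rules are ignored, and the recursion guard is a tuple rather than the documented _recursing_for list.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for ManagedField with only the parts used here."""

    standard_key: str
    dependencies: Sequence[str] = ()
    preprocess: Callable[[str, dict], str] = lambda value, _: value
    xform: Callable[[str], str] = lambda value: value


def reduce_groups(key_groups, fields):
    """Reduce {standard key: {contextual key: value}} to {standard key: value}."""
    output = {}

    def resolve(key, recursing_for):
        if key in output:  # already reduced, possibly as a dependency
            return output[key]
        if key in recursing_for:  # guard against circular dependencies
            chain = " -> ".join(recursing_for + (key,))
            raise ValueError(f"circular dependency: {chain}")
        field = fields.get(key, Field(key))  # unmanaged keys get the default field
        deps = {
            dep: resolve(dep, recursing_for + (key,))
            for dep in field.dependencies
            if dep in key_groups
        }
        raw = Counter(key_groups[key].values()).most_common(1)[0][0]
        output[key] = field.xform(field.preprocess(raw, deps))
        return output[key]

    for key in key_groups:
        resolve(key, ())
    return output


key_groups = {
    "patient.display": {"patient.display|DB|0": "?"},
    "patient.first": {
        "patient.first|CSV|0": "ada",
        "patient.first|PDF|0": "ada",
        "patient.first|DB|0": "Ada L.",
    },
    "patient.last": {"patient.last|CSV|0": "lovelace"},
}
fields = {
    "patient.display": Field(
        "patient.display",
        dependencies=("patient.first", "patient.last"),
        preprocess=lambda _, deps: f"{deps['patient.last']}, {deps['patient.first']}",
        xform=str.upper,
    )
}
reduced = reduce_groups(key_groups, fields)
```

Dependencies are resolved before the field that needs them, so "patient.display" is built from the already reduced first and last names even though it appears first in key_groups.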

class ManagedField (standard_key: str, context_priority: Sequence[str] = ('CSV', 'PDF', 'DB'), xform: Callable[[str | vStr], str | vStr] = <function ManagedField.<lambda>>, generated: bool = False, dependencies: Sequence[str] = (), preprocess: Callable[[str | vStr, dict[str, str | vStr]], str | vStr] = <function ManagedField.<lambda>>, reducer: Callable[[Sequence[str | vStr]], str | vStr] = <function most_freq_element>)

Standardize, recombine, generate, and prioritize extracted data based on point of origin, current value, and pre-defined dependencies.

Useful for establishing enums, rejecting bad inputs, and accepting postprocedure data over preprocedure data, among other use cases. Use the transform_utils.standardize_field_value() function for establishing enums. Implemented at the client and facility level of specs/client_specs.py. Global definitions for the ClaimMaker UI use case are available as members of the specs subpackage, i.e.:

    import specs
    specs.BASE_MANAGED_FIELDS  # baseline field definitions

Args

standard_key : str: a key from a summary_specs entry
context_priority : Sequence[str]: A list used to prioritize the value selected from a list of vStr candidates based on vStr.ctx and/or point of origin provided in a summary key. Default is ("CSV", "PDF", "DB").
xform : Callable[[str | vStr], str | vStr]: transform to apply to produce a standard output from the provided input, if required. Defaults to a null operation (i.e. lambda x: x) to allow context only operations.
generated : bool: If true, calculate a value for this key from its dependencies and add it to the summary output even if the key is absent in the original summary provided to the FieldManager.
dependencies : Sequence[str]: list of standard field names that will be used to augment or construct the managed output
preprocess : Callable[[vStr, dict[str, vStr]], vStr]: Takes the raw value and the dict of dependency values as input and prepares the raw value for xform. Defaults to a null operation (i.e. lambda x, _: x).
reducer : Callable[[Sequence[vStr]], vStr]: takes a list of candidate vStr values as input and returns the final output for this field. Defaults to utils.most_freq_element.
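
To make context_priority more concrete, the sketch below narrows a candidate dict to the highest-priority context that is present. It is an assumption-laden illustration: pick_by_context is a hypothetical helper, the context is read from the second "|"-separated element of each candidate key, and a priority entry is treated as matching when the context starts with it. The actual selection logic in FieldManager may differ, for example by consulting vStr.ctx.

```python
def pick_by_context(candidates, context_priority=("CSV", "PDF", "DB"), sep="|"):
    """Keep only the candidates from the first prioritized context that has any."""
    for wanted in context_priority:
        preferred = {
            key: value
            for key, value in candidates.items()
            if key.split(sep)[1].startswith(wanted)
        }
        if preferred:
            return preferred
    # No candidate matches any prioritized context: keep them all.
    return dict(candidates)


candidates = {
    "schedule.diagnosis|PDF.Anesthesia Record.Case Summary|0": "Right hip displacement",
    "schedule.diagnosis|PDF.Operative Note|7": "Displacement of right hip",
    "schedule.diagnosis|DB|0": "Hip displacement",
}
default_choice = pick_by_context(candidates)
db_first = pick_by_context(candidates, context_priority=("DB", "PDF"))
```

With the default priority, no CSV candidate exists, so both PDF candidates survive and the reducer would then choose between them.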

Class variables

var context_priority : collections.abc.Sequence[str]
var dependencies : collections.abc.Sequence[str]
var generated : bool
var standard_key : str

Methods

def clone(self, list_idx: str) ‑> ManagedField

Clone this instance after replacing '*' with the supplied list_idx in standard_key and all dependencies.

Args

list_idx : str: an integer string e.g. "1"

Returns

ManagedField: a new ManagedField instance
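
A minimal sketch of the cloning behavior, assuming a dataclass-style field: Field below is a hypothetical stand-in that keeps only standard_key and dependencies, and the key names are invented for illustration.

```python
from dataclasses import dataclass, replace
from typing import Sequence


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for ManagedField; other attributes are omitted."""

    standard_key: str
    dependencies: Sequence[str] = ()

    def clone(self, list_idx: str) -> "Field":
        # Substitute the wildcard in the key and in every dependency.
        return replace(
            self,
            standard_key=self.standard_key.replace("*", list_idx),
            dependencies=tuple(dep.replace("*", list_idx) for dep in self.dependencies),
        )


template = Field(
    "patient_info.insurance[*].company",
    dependencies=("patient_info.insurance[*].plan", "schedule.date"),
)
second_entry = template.clone("1")
```

The template is left unchanged, so one wildcarded definition can be cloned once per list index that was actually collected.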

def preprocess(x, _) ‑> collections.abc.Callable[[str | vStr, dict[str, str | vStr]], str | vStr]

def reduce(self, candidates: dict[str, str | vStr]) ‑> tuple[str, str | vStr]

Call self.reducer and return the key and value of the selected candidate.

Args

candidates: dictionary of candidate keys and values

Returns

tuple[str, str | vStr]: the selected candidate key and value

def reducer(check_lists, tiebreak_func=<function vstr_confidence_tiebreak>, xform: collections.abc.Callable[[typing.Any], typing.Union[str, vStr, typing.Any]] = utilities.v_str.vStr) ‑> collections.abc.Callable[[collections.abc.Sequence[str | vStr]], str | vStr]

Return the most frequent element from a list of lists.

Args

check_lists : list[list[Any]]: A list of lists containing elements to check.
tiebreak_func : Callable[[Any, Any], Any]: A function to break ties between elements with the same frequency. Default is vstr_confidence_tiebreak.
xform : Callable[[Any], str | vStr | Any]: A transformation function applied to each element. Defaults to vStr.

Returns

Any: The most frequent element after applying the transformation and tiebreak functions.

Example

>>> check_lists = [["a", "b", "a"], ["a", "c", "b", "b"]]
>>> tiebreak_func = lambda *args: sorted(args)[0]
>>> xform = str.upper
>>> most_freq_element(check_lists, tiebreak_func, xform)
'A'
>>> tiebreak_func = lambda *args: sorted(args)[-1]
>>> most_freq_element(check_lists, tiebreak_func, xform)
'B'
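
For readers without access to utils.most_freq_element, the behavior shown in the example can be approximated by the sketch below. most_freq_sketch is a hypothetical name, and its defaults (str for xform, "keep the first tied element" for tiebreak_func) are placeholders for the documented vStr and vstr_confidence_tiebreak defaults, which are not reproduced here.

```python
from collections import Counter
from functools import reduce
from itertools import chain


def most_freq_sketch(check_lists, tiebreak_func=lambda a, b: a, xform=str):
    """Return the most frequent transformed element across a list of lists."""
    counts = Counter(xform(item) for item in chain.from_iterable(check_lists))
    if not counts:
        raise ValueError("no candidates supplied")
    top = max(counts.values())
    # Elements sharing the highest count, in first-seen order.
    tied = [element for element, count in counts.items() if count == top]
    # Fold the tiebreak over the tied elements, two at a time.
    return reduce(tiebreak_func, tied)


check_lists = [["a", "b", "a"], ["a", "c", "b", "b"]]
lowest = most_freq_sketch(check_lists, lambda *args: sorted(args)[0], str.upper)
highest = most_freq_sketch(check_lists, lambda *args: sorted(args)[-1], str.upper)
```

After the transform, "A" and "B" each occur three times, so the result depends entirely on the tiebreak function, matching the 'A' and 'B' outcomes in the example.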

def xform(x) ‑> collections.abc.Callable[[str | vStr], str | vStr]