Package specs

Manage client, section, table, transform, summary, and match specs

Sub-modules

specs._types

Spec definitions. TypedDicts are used for easy serialization.

specs.client_specs

Specs for extracting data from S3. Keys are subfolder names, with the exception of "db_secret_name" and "api_secret_name".

specs.client_specs_dev

specifications for extracting data from S3 for debugging purposes

specs.match_specs

facility-specific specifications for matching operations

specs.match_specs_maps

mappings between various data representations and the Hank AI standard JSON format

specs.section_specs

Function

common location for defining settings dictionaries for splitting PDFs into different sections. Different specifications can be …

specs.summary_specs

specifications to roll up all extracted data into a predefined JSON summary format.

specs.table_specs

Usage

Internal use within import_table_processing.py

Function

contains specifications for table processing in the form of dictionaries …

specs.transform_specs

TableTransformer 'apply_transforms' specs
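The specs._types entry above notes that TypedDicts are used for easy serialization. As a minimal sketch of why that works: a TypedDict value is an ordinary dict at runtime, so it can be passed directly to json.dumps. The FacilitySpecSketch type and its fields are hypothetical and are not taken from specs._types.

```python
import json
from typing import TypedDict


class FacilitySpecSketch(TypedDict):
    # Hypothetical fields, for illustration only.
    section_specs_key: str
    table_specs_key: str


spec: FacilitySpecSketch = {
    "section_specs_key": "demo_sections",
    "table_specs_key": "demo_tables",
}

# A TypedDict is only a static typing construct; the value is a plain dict.
is_plain_dict = type(spec) is dict

# Round-trip through JSON without any custom encoder.
encoded = json.dumps(spec, sort_keys=True)
decoded = json.loads(encoded)
```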

Functions

def dev_specs(send_to_disk: bool = True, dev_secrets: bool = True, staging: bool = False, clients_to_keep: collections.abc.Sequence[str] | None = None, facilities_to_keep: list[str] | None = None) ‑> dict[str, ClientSpecStatic | dict[str, FacilitySpec]]

Return SDLC client specs. Use the send_to_disk argument to toggle between the debug send function dict_to_disk and db_push_dict.

def full_name_from_parts(prefix: str, context_priority: collections.abc.Sequence[str] = ('DB', 'CSV', 'PDF')) ‑> utilities.managed_fields.ManagedField

Build a ManagedField to standardize the data in the fullName, first, middle, and last properties of a PersonModel object (see schemaops/patient_info.model.json).

Args

prefix
The prefix for the ManagedField's standard_key, e.g. patient_info.patient.
context_priority
The context priority for the ManagedField. Defaults to ("DB", "CSV", "PDF").

Returns

ManagedField
A managed field for constructing a fullName property from its corresponding first, middle, and last properties while preserving the name suffix from the originally extracted fullName. The standard_key for the field is set as f"{prefix}.fullName".
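The name-building behavior described for full_name_from_parts can be sketched on plain strings. This is only an illustration under stated assumptions: the ManagedField API is not shown in this page, so the helper rebuild_full_name, the KNOWN_SUFFIXES set, and the suffix-detection rule are all hypothetical. Only the f"{prefix}.fullName" key format comes from the description above.

```python
# Hypothetical suffix list; the real field may recognize a different set.
KNOWN_SUFFIXES = {"jr", "sr", "ii", "iii", "iv"}


def rebuild_full_name(first: str, middle: str, last: str, original_full_name: str) -> str:
    """Join first, middle, and last, keeping a suffix found in the original name."""
    parts = [part for part in (first, middle, last) if part]
    tokens = original_full_name.replace(",", " ").split()
    # Assumption: a suffix, if present, is the final token of the original name.
    if tokens and tokens[-1].rstrip(".").lower() in KNOWN_SUFFIXES:
        parts.append(tokens[-1])
    return " ".join(parts)


# The standard_key format stated in the Returns description.
prefix = "patient_info.patient"
standard_key = f"{prefix}.fullName"

with_suffix = rebuild_full_name("Ada", "", "Lovelace", "LOVELACE, ADA JR.")
without_suffix = rebuild_full_name("Ada", "M", "Lovelace", "Ada Lovelace")
```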
def get_client_specs(*, client_list: list[str] | None = None, facility_list: list[str] | None = None, client_specs: dict[str, ClientSpecStatic | dict[str, FacilitySpec]] | None = None, **kwargs) ‑> dict[str, ClientSpecStatic | dict[str, FacilitySpec]]

Inspect env vars and command line args and override the builtin client specs as appropriate. cli_kwargs are collected in the args namespace defined in extract_s3.py.

KwArgs

client_list
list of top level keys aka clients to include in the returned specs. If not supplied, retain all clients. Applied after overrides.
facility_list
list of non-static 1st level subkeys aka facilities to include in the returned specs. If not supplied, retain all facilities. Applied after overrides. Static 1st level subkeys (aka those defined in ClientSpecStatic) are always retained.
client_specs
custom specs unpickled from file or s3 that will supersede the builtin specs from this module.

client_specs override priority is as follows:

1. custom specs unpickled from file or s3 (cli_kwargs["client_specs"])
2. "debug" specs from specs/client_specs_dev.py, selected according to the env var PDF_EXT_RUN_MODE.
    a. Run mode 1 == End to end SDLC testing. Use SDLC secrets and earlier start dates from debug specs, but do NOT override the db_push_dict send_func or set an output directory.
    b. Run mode 2 == Local testing. Overrides secrets, start dates, and send_funcs AND sets an output dir. Saves final extract dict to disk (not DB) along with all intermediate objects.
3. builtin specs from specs/client_specs.py
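The priority order above can be sketched as a small selector. This is a simplified illustration, not the module's implementation: choose_spec_source and its return labels are hypothetical, and it assumes PDF_EXT_RUN_MODE holds the string "1" or "2" and that custom specs fully take precedence when supplied.

```python
import os
from collections.abc import Mapping
from typing import Any, Optional


def choose_spec_source(
    custom_specs: Optional[Mapping[str, Any]] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return a label naming which spec source would win (hypothetical helper)."""
    env = os.environ if env is None else env
    # 1. Custom specs unpickled from file or s3 come first.
    if custom_specs is not None:
        return "custom"
    # 2. Debug specs, selected by run mode (assumed to be a string value).
    run_mode = env.get("PDF_EXT_RUN_MODE")
    if run_mode == "1":
        return "debug_sdlc"  # SDLC secrets, db_push_dict left in place
    if run_mode == "2":
        return "debug_local"  # secrets, send_funcs, and output dir overridden
    # 3. Fall back to the builtin specs.
    return "builtin"
```

Passing env explicitly, as in choose_spec_source(env={"PDF_EXT_RUN_MODE": "2"}), keeps the sketch testable without touching the real environment.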

Returns

ClientSpecs
a dictionary of (client name: ClientSpec)
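The client_list and facility_list filtering described under KwArgs can be sketched as follows. The helper filter_client_specs is hypothetical, and STATIC_KEYS is an assumption: it holds only the two non-subfolder keys named in the specs.client_specs description, whereas the actual ClientSpecStatic may define more.

```python
from typing import Any, Optional

# Assumption: stand-in for the keys defined in ClientSpecStatic.
STATIC_KEYS = frozenset({"db_secret_name", "api_secret_name"})


def filter_client_specs(
    client_specs: dict[str, dict[str, Any]],
    client_list: Optional[list[str]] = None,
    facility_list: Optional[list[str]] = None,
) -> dict[str, dict[str, Any]]:
    """Keep the requested clients and facilities; static keys always survive."""
    filtered: dict[str, dict[str, Any]] = {}
    for client, spec in client_specs.items():
        if client_list is not None and client not in client_list:
            continue  # client not requested
        filtered[client] = {
            key: value
            for key, value in spec.items()
            if key in STATIC_KEYS or facility_list is None or key in facility_list
        }
    return filtered


demo_specs = {
    "acme": {"db_secret_name": "acme-db", "north": {}, "south": {}},
    "globex": {"db_secret_name": "globex-db", "east": {}},
}
only_acme_north = filter_client_specs(demo_specs, ["acme"], ["north"])
```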
def get_match_spec(match_specs_key: str, override_specs: dict[str, MatchSpec] | None = None) ‑> MatchSpec

Get the match spec defined for the supplied match_specs_key.

Args

match_specs_key : str
The match specs key to retrieve.
override_specs : MatchSpecs | None
Optional override of builtin_match_specs.

Returns

MatchSpec
The match spec for the supplied match_specs_key.
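The get_*_spec functions in this module share one shape: look up a key, optionally in an override mapping instead of the builtin one. A minimal sketch of that pattern follows. It assumes the override replaces the builtin mapping wholesale rather than merging with it, and that a missing key raises KeyError; neither behavior is confirmed by this page. BUILTIN_MATCH_SPECS_SKETCH and its contents are hypothetical.

```python
from typing import Any, Optional

# Hypothetical stand-in for builtin_match_specs.
BUILTIN_MATCH_SPECS_SKETCH: dict[str, dict[str, Any]] = {
    "facility_a": {"threshold": 0.9},
}


def get_spec_sketch(
    specs_key: str,
    override_specs: Optional[dict[str, dict[str, Any]]] = None,
) -> dict[str, Any]:
    """Return the spec for specs_key from the override mapping if given, else builtin."""
    specs = BUILTIN_MATCH_SPECS_SKETCH if override_specs is None else override_specs
    try:
        return specs[specs_key]
    except KeyError:
        raise KeyError(f"no spec defined for key {specs_key!r}") from None


builtin_result = get_spec_sketch("facility_a")
override_result = get_spec_sketch("facility_a", {"facility_a": {"threshold": 0.5}})
```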
def get_section_spec(section_specs_key: str, override_specs: dict[str, SectionSpec] | None = None) ‑> SectionSpec

Get the section spec defined for the supplied section_specs_key.

Args

section_specs_key : str
The section specs key to retrieve.
override_specs : dict[str, SectionSpec] | None
Optional override of builtin_section_specs.

Returns

SectionSpec
The section spec for the supplied section_specs_key.
def get_summary_spec(summary_specs_key: str, override_specs: dict[str, SummarySpec] | None = None) ‑> SummarySpec

Get the summary spec defined for the supplied summary_specs_key.

Args

summary_specs_key : str
The summary specs key to retrieve.
override_specs : dict[str, SummarySpec] | None
Optional override of builtin_summary_specs.

Returns

SummarySpec
The summary spec for the supplied summary_specs_key.
def get_table_spec(section_title: str, section_specs_key: str, table_specs: dict[str, TableSpec] = None) ‑> TableSpec

Get the TableSpec for the supplied section_title and section_specs_key.

Usage

    from functools import partial

    from section_extractor import section_extractor_factory
    from specs import get_table_spec, get_table_specs

    table_spec_getter = partial(
        get_table_spec,
        section_specs_key='your-section-specs-key',
        table_specs=get_table_specs('your-table-specs-key'),
    )
    sections = section_extractor_factory(files_dict, section_spec, table_spec_getter)

Args

section_title : str
The title of the section to retrieve.
section_specs_key : str
The section specs key to retrieve.
table_specs : dict[str, TableSpec]
Optional override of builtin_table_specs.

Returns

TableSpec
The TableSpec for the supplied section_title and section_specs_key augmented with the heading_breaks defined in the section spec.
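The Usage note for get_table_spec relies on functools.partial to pre-bind section_specs_key and table_specs, leaving a callable that takes only section_title. The sketch below shows that binding with a stand-in function; get_table_spec_stub and its return shape are hypothetical and do not reproduce the real lookup or the heading_breaks augmentation.

```python
from functools import partial
from typing import Any, Optional


def get_table_spec_stub(
    section_title: str,
    section_specs_key: str,
    table_specs: Optional[dict[str, dict[str, Any]]] = None,
) -> dict[str, Any]:
    """Stand-in with the same parameter order as get_table_spec."""
    table_specs = table_specs or {}
    # Echo the bound key so the effect of partial is visible in the result.
    return {"section_specs_key": section_specs_key, **table_specs.get(section_title, {})}


# Bind the two keyword arguments once; the result needs only section_title.
table_spec_getter = partial(
    get_table_spec_stub,
    section_specs_key="demo_key",
    table_specs={"vitals": {"columns": ["time", "bp"]}},
)

vitals_spec = table_spec_getter("vitals")
unknown_spec = table_spec_getter("unknown")
```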
def get_table_specs(table_specs_key: str, override_specs: dict[str, dict[str, TableSpec]] | None = None) ‑> dict[str, TableSpec]

Get the dict[str, TableSpec] of table specs for the supplied table_specs_key.

Args

table_specs_key : str
The table specs key to retrieve.
override_specs : dict[str, dict[str, TableSpec]] | None
Optional override of builtin_table_specs.

Returns

TableSectionSpecs
The dict[str, TableSpec] for the supplied table_specs_key.
def get_transform_spec(transform_specs_key: str, override_specs: dict[str, dict[str, dict[str, dict[str, dict[str, typing.Any]]]]] | None = None) ‑> dict[str, dict[str, dict[str, dict[str, typing.Any]]]]

Get the transform spec defined for the supplied transform_specs_key.

Args

transform_specs_key : str
The transform specs key to retrieve.
override_specs : dict[str, dict[str, dict[str, dict[str, dict[str, typing.Any]]]]] | None
Optional override of builtin_transform_specs.

Returns

dict[str, dict[str, dict[str, dict[str, typing.Any]]]]
The transform spec for the supplied transform_specs_key.
def to_spec_key(value: str) ‑> str

Convert a raw table, section, or facility title to a valid identifier.

Args

value : str
The raw title to convert.

Returns

str
The converted identifier. Spaces are replaced with underscores, non-alphabetic characters are removed, and the result is converted to lower case.
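The conversion described above can be sketched with a regular expression. This is an approximation under stated assumptions, not the module's code: it assumes the underscores that replace spaces are kept, and that digits and punctuation count as non-alphabetic and are dropped. The name to_spec_key_sketch is hypothetical.

```python
import re


def to_spec_key_sketch(value: str) -> str:
    """Approximate the documented title-to-identifier conversion."""
    # Spaces become underscores first.
    underscored = value.replace(" ", "_")
    # Assumption: everything except ASCII letters and underscores is removed.
    letters_only = re.sub(r"[^A-Za-z_]", "", underscored)
    return letters_only.lower()


example_key = to_spec_key_sketch("Lab Results: Final")
```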