Package specs
Manage client, section, table, transform, summary, and match specs
Sub-modules
specs._types
-
Spec definitions. TypedDicts are used for easy serialization.
specs.client_specs
-
specs for extracting data from S3. keys are subfolder names with the exception of "db_secret_name" and "api_secret_name".
specs.client_specs_dev
-
specifications for extracting data from S3 for debugging purposes
specs.match_specs
-
facility-specific specifications for matching operations
specs.match_specs_maps
-
mappings between various data representations and the Hank AI standard json format
specs.section_specs
-
Function
common location for defining settings dictionaries for splitting PDFs into different sections. Different specifications can be …
specs.summary_specs
-
specifications to roll up all extracted data into a predefined json summary format.
specs.table_specs
-
Usage
Internal use within import_table_processing.py
Function
contains specifications for table processing in the form of dictionaries …
specs.transform_specs
-
TableTransformer 'apply_transforms' specs
Functions
def dev_specs(send_to_disk: bool = True, dev_secrets: bool = True, staging: bool = False, clients_to_keep: collections.abc.Sequence[str] | None = None, facilities_to_keep: list[str] | None = None) ‑> dict[str, ClientSpecStatic | dict[str, FacilitySpec]]
-
Return SDLC specs for client. Use arg to toggle use of debug send func dict_to_disk vs db_push_dict.
def full_name_from_parts(prefix: str, context_priority: collections.abc.Sequence[str] = ('DB', 'CSV', 'PDF')) ‑> utilities.managed_fields.ManagedField
-
Build a ManagedField to standardize the data in the fullName, first, middle, and last properties of a PersonModel object (see schemaops/patient_info.model.json).
Args
prefix
- The prefix for the ManagedField's standard_key, e.g. patient_info.patient.
context_priority
- The context priority for the ManagedField. Defaults to ("DB", "CSV", "PDF").
Returns
ManagedField
- A managed field for constructing a fullName property
from its corresponding first, middle, and last properties while
preserving the name suffix from the originally extracted fullName.
The standard_key for the field is set as f"{prefix}.fullName".
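The name-building behaviour described above can be illustrated with a minimal, standalone sketch. It does not use ManagedField, whose API is not shown here. The helper names and the suffix list are hypothetical, chosen only to show how a suffix from the originally extracted fullName might be carried over when the name is rebuilt from its parts.

```python
from __future__ import annotations

# Hypothetical suffix list; the suffixes the real field recognizes are not documented here.
KNOWN_SUFFIXES = {"jr", "sr", "ii", "iii", "iv"}


def extract_suffix(full_name: str | None) -> str | None:
    """Return the trailing suffix token of an extracted full name, if any."""
    if not full_name:
        return None
    tokens = full_name.replace(",", " ").split()
    # A lone token is treated as a name, never as a suffix.
    if len(tokens) > 1 and tokens[-1].rstrip(".").lower() in KNOWN_SUFFIXES:
        return tokens[-1]
    return None


def build_full_name(
    first: str | None,
    middle: str | None,
    last: str | None,
    original_full_name: str | None = None,
) -> str:
    """Join the non-empty name parts and re-append the original suffix."""
    parts = [p.strip() for p in (first, middle, last) if p and p.strip()]
    suffix = extract_suffix(original_full_name)
    if suffix:
        parts.append(suffix)
    return " ".join(parts)
```

For example, build_full_name("John", "Q", "Public", "Public, John Q Jr.") yields "John Q Public Jr." under these assumptions.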
def get_client_specs(*, client_list: list[str] | None = None, facility_list: list[str] | None = None, client_specs: dict[str, ClientSpecStatic | dict[str, FacilitySpec]] | None = None, **kwargs) ‑> dict[str, ClientSpecStatic | dict[str, FacilitySpec]]
-
Inspect env vars and command line args and override the builtin client specs as appropriate. cli_kwargs are collected in the args namespace defined in extract_s3.py.KwArgs.
Args
client_list
- list of top level keys aka clients to include in the returned specs. If not supplied, retain all clients. Applied after overrides.
facility_list
- list of non-static 1st level subkeys aka facilities to include in the returned specs. if not supplied, retain all facilities. Applied after overrides. static 1st level subkeys (aka those defined in ClientSpecStatic) are always retained.
client_specs
- custom specs unpickled from file or s3 that will supersede the builtin specs from this module.
client_specs override priority is as follows:
1. custom specs unpickled from file or s3 (cli_kwargs["client_specs"])
2. "debug" specs from specs/client_specs_dev.py, selected according to env var PDF_EXT_RUN_MODE.
   a. Run mode 1 == End to end SDLC testing. Use SDLC secrets and earlier start dates from debug specs, but do NOT override the db_push_dict send_func or set an output directory.
   b. Run mode 2 == Local testing. Overrides secrets, start dates, and send_funcs AND sets an output dir. Saves final extract dict to disk (not DB) along with all intermediate objects.
3. builtin specs from specs/client_specs.py
Returns
ClientSpecs
- a dictionary of (client name: ClientSpec)
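The override priority and the "applied after overrides" filtering can be sketched as follows. This is an illustration under stated assumptions, not the module's implementation: resolve_client_specs and STATIC_KEYS are hypothetical names, overrides are assumed to be a shallow per-client merge, and the static keys are limited to the two named in the specs.client_specs summary.

```python
from __future__ import annotations

from typing import Any

# Hypothetical stand-in for the keys defined in ClientSpecStatic.
STATIC_KEYS = {"db_secret_name", "api_secret_name"}

Specs = dict[str, dict[str, Any]]


def resolve_client_specs(
    builtin: Specs,
    debug: Specs | None = None,
    custom: Specs | None = None,
    client_list: list[str] | None = None,
    facility_list: list[str] | None = None,
) -> Specs:
    """Overlay debug then custom specs on the builtin specs, then filter."""
    # Start from the lowest priority layer (builtin) and copy it.
    merged: Specs = {client: dict(spec) for client, spec in builtin.items()}
    # Later layers win: debug overrides builtin, custom overrides both.
    for layer in (debug, custom):
        for client, spec in (layer or {}).items():
            merged.setdefault(client, {}).update(spec)
    # Filters are applied only after all overrides.
    if client_list is not None:
        merged = {c: s for c, s in merged.items() if c in client_list}
    if facility_list is not None:
        merged = {
            c: {k: v for k, v in s.items() if k in STATIC_KEYS or k in facility_list}
            for c, s in merged.items()
        }
    return merged
```

Static subkeys survive the facility filter in this sketch, which mirrors the rule that they are always retained.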
def get_match_spec(match_specs_key: str, override_specs: dict[str, MatchSpec] | None = None) ‑> MatchSpec
-
Get the match spec defined for the supplied match_specs_key.
Args
match_specs_key
:str
- The match specs key to retrieve.
override_specs
:MatchSpecs | None
- Optional override of builtin_match_specs.
Returns
MatchSpec
- The match spec for the supplied match_specs_key.
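A keyed lookup with an optional override can be sketched as below. The names BUILTIN_SPECS_SKETCH and get_spec_sketch and the placeholder data are hypothetical. The sketch also assumes that override_specs replaces the builtin dictionary outright rather than being merged into it, which the source does not state.

```python
from __future__ import annotations

from typing import Any

# Placeholder data standing in for the module's builtin specs.
BUILTIN_SPECS_SKETCH: dict[str, dict[str, Any]] = {
    "facility_a": {"match_on": ["mrn", "dob"]},
}


def get_spec_sketch(
    specs_key: str,
    override_specs: dict[str, dict[str, Any]] | None = None,
) -> dict[str, Any]:
    """Look up specs_key in the override dict if given, else in the builtins."""
    specs = override_specs if override_specs is not None else BUILTIN_SPECS_SKETCH
    try:
        return specs[specs_key]
    except KeyError:
        raise KeyError(f"No spec defined for key {specs_key!r}") from None
```

The section, summary, table, and transform getters documented here share the same (key, override_specs) shape, so a similar pattern plausibly applies to them.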
def get_section_spec(section_specs_key: str, override_specs: dict[str, SectionSpec] | None = None) ‑> SectionSpec
-
Get the section spec defined for the supplied section_specs_key.
Args
section_specs_key
:str
- The section specs key to retrieve.
override_specs
:dict[str, SectionSpec] | None
- Optional override of builtin_section_specs.
Returns
SectionSpec
- The section spec for the supplied section_specs_key.
def get_summary_spec(summary_specs_key: str, override_specs: dict[str, SummarySpec] | None = None) ‑> SummarySpec
-
Get the summary spec defined for the supplied summary_specs_key.
Args
summary_specs_key
:str
- The summary specs key to retrieve.
override_specs
:dict[str, SummarySpec] | None
- Optional override of builtin_summary_specs.
Returns
SummarySpec
- The summary spec for the supplied summary_specs_key.
def get_table_spec(section_title: str, section_specs_key: str, table_specs: dict[str, TableSpec] = None) ‑> TableSpec
-
Get the TableSpec for the supplied section_title and section_specs_key.
Usage
    from functools import partial

    from section_extractor import section_extractor_factory
    from specs import get_table_spec, get_table_specs

    table_spec_getter = partial(
        get_table_spec,
        section_specs_key='your-section-specs-key',
        table_specs=get_table_specs('your-table-specs-key'),
    )
    sections = section_extractor_factory(files_dict, section_spec, table_spec_getter)
Args
section_title
:str
- The title of the section to retrieve.
section_specs_key
:str
- The section specs key to retrieve.
table_specs
:dict[str, TableSpec]
- Optional override of builtin_table_specs.
Returns
TableSpec
- The TableSpec for the supplied section_title and section_specs_key augmented with the heading_breaks defined in the section spec.
def get_table_specs(table_specs_key: str, override_specs: dict[str, dict[str, TableSpec]] | None = None) ‑> dict[str, TableSpec]
-
Get the dict of TableSpec specs (dict[str, TableSpec]) for the supplied table_specs_key.
Args
table_specs_key
:str
- The table specs key to retrieve.
override_specs
:dict[str, dict[str, TableSpec]] | None
- Optional override of builtin_table_specs.
Returns
TableSectionSpecs
- The dict of TableSpec specs (dict[str, TableSpec]) for the supplied table_specs_key.
def get_transform_spec(transform_specs_key: str, override_specs: dict[str, dict[str, dict[str, dict[str, dict[str, typing.Any]]]]] | None = None) ‑> dict[str, dict[str, dict[str, dict[str, typing.Any]]]]
-
Get the transform spec defined for the supplied transform_specs_key.
Args
transform_specs_key
:str
- The transform specs key to retrieve.
override_specs
:dict[str, dict[str, dict[str, dict[str, dict[str, typing.Any]]]]] | None
- Optional override of builtin_transform_specs.
Returns
dict[str, dict[str, dict[str, dict[str, typing.Any]]]]
- The transform spec for the supplied transform_specs_key.
def to_spec_key(value: str) ‑> str
-
Convert a raw table, section, or facility title to a valid identifier.
Args
value
:str
- The raw title to convert.
Returns
str
- The converted identifier. Spaces are replaced with underscores, non-alphabetic characters are removed, and the result is converted to lower case.
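The conversion rules stated above can be sketched as follows. This is an approximation rather than the module's implementation: the name to_spec_key_sketch is hypothetical, and the sketch assumes that the underscores introduced in the first step are kept and that digits count as non-alphabetic and are removed.

```python
import re


def to_spec_key_sketch(value: str) -> str:
    """Approximate the documented title-to-identifier conversion."""
    # Step 1: spaces become underscores.
    with_underscores = value.replace(" ", "_")
    # Step 2: drop everything that is not an ASCII letter or underscore
    # (assumption: underscores from step 1 are preserved).
    letters_only = re.sub(r"[^A-Za-z_]", "", with_underscores)
    # Step 3: lower-case the result.
    return letters_only.lower()
```

Under these assumptions, "Patient Info" becomes "patient_info" and "Vital-Signs 2" becomes "vitalsigns_".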