Module `specs._types`

Spec definitions. TypedDicts are used for easy serialization.
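
As a minimal sketch of why TypedDicts serialize easily: a TypedDict instance is an ordinary dict at runtime, so a spec whose values are JSON-compatible can be passed straight to `json.dumps`. `ExampleSpec` and its values are hypothetical and are not part of this module.

```python
import json
from typing import TypedDict


class ExampleSpec(TypedDict):
    """Hypothetical spec for illustration only (not defined in specs._types)."""

    db_secret_name: str
    summary_key_addendum: list[str]


spec: ExampleSpec = {
    "db_secret_name": "example/db",  # made-up value
    "summary_key_addendum": ["extra_key"],
}

# At runtime the spec is a plain dict, so it round-trips through JSON unchanged.
encoded = json.dumps(spec)
decoded = json.loads(encoded)
```

Fields that hold callables, such as `extract_func` or `reduce`, are not JSON-serializable and would need separate handling.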

Sub-modules

specs._types._client: TypedDict definitions for client_specs.py
specs._types._match: TypedDict definitions for matchops classes.
specs._types._section: Section spec TypedDict definition.
specs._types._summary: Type definitions for summary and condense specs.
specs._types._table: TypedDicts for use in table_specs.py

Classes

class ClientSpecStatic (*args, **kwargs)

Top level keys that must be present in ClientSpec objects

Attributes

db_secret_name : str: Name of the secret containing the database credentials.
api_secret_name : str: Name of the secret containing the api credentials.
managed_fields : dict[str, ManagedField]: client level standard fields.
managed_fields_context_rules : list[ContextRule]: client level context rules.
summary_key_addendum : list[str]: additional summary keys specific to client.
summary_map_addendum : dict[str, ReduceSpec]: dict of additional summary reduce specs.

Ancestors

builtins.dict

Class variables

var api_secret_name : str
var db_secret_name : str
var managed_fields : dict[str, utilities.managed_fields.ManagedField]
var managed_fields_context_rules : list[utilities.managed_fields.ContextRule]
var summary_key_addendum : list[str]
var summary_map_addendum : dict[str, ReduceSpec]

class FacilitySpec (*args, **kwargs)

Facility entry in client_specs.builtin_client_specs.

Attributes

azure_secret_name : str: stores endpoint and subscription key for Azure Computer Vision OCR operations.
dest_prefix : str: The final s3 folder specification for processed PDFs.
extract_func : partial | pu.ExtractFunc: Function used for text extraction.
facility_name : str: MUST MATCH FACILITY NAME FROM ACE SALESFORCE.
failed_prefix : str: The final s3 folder specification for failed PDFs.
first_dos : str: The first date of service to be coded for facility.
insurance_integration_mode : Literal['0', '1'] | None: overrides the equivalent env var when set.
max_keys : int: Override max keys passed to extract_buckets for this facility.
match_specs_key : str: Key for this facility in match_specs.
output_dir : str | None: Optional directory for saving intermediate outputs.
provider_integration_mode : Literal['0', '1', '2'] | None: overrides the equivalent env var when set.
s3_prefix : str: Non-filename portion of source PDF s3 keys.
section_specs_key : str: Key in section_specs.
send_func : Callable: S3Batch output processing function.
source_prefix : str: The final s3 folder specification for unprocessed PDFs.
summary_key_addendum : list[str]: List of additional summary keys valid for this facility.
summary_map_addendum : dict[str, ReduceSpec]: dict of additional summary reduce specs.
summary_specs_key : str: Key for summary type in summary_specs.
table_specs_key : str | None: Key for facility type in table_specs.
transform_specs_key : str: Key for facility type in transform_specs.
use_autocoding : bool: Flag to enable/disable autocoding for the facility.
use_docuvision : bool: Flag to enable/disable docuvision for the facility.
managed_fields : dict[str, ManagedField]: Custom standard field values (optional).
managed_fields_context_rules : list[ContextRule]: facility level context rules.
dv_preferred_networks : list[str] | None: List of DocuVision neural networks preferred for this facility.
dv_required_page_types : set[str] | None: if supplied, docuvision will only create a case for a pid if at least one of the pages assigned to that pid has a noteType in this set.
send_reject_notifications : bool: if True, include a UserNotification for each rejected s3 file input in the ClaimMaker Alert email to the client. Defaults to False.
file_groups : list[S3FileGroup]: list of file groups specifying how to process every file type according to a matched regex

Ancestors

builtins.dict

Class variables

var azure_secret_name : str
var dest_prefix : str
var dv_preferred_networks : list[str] | None
var dv_required_page_types : set[str] | None
var extract_func : functools.partial[dict[str, utilities.utils.FileContentsEntry]] | collections.abc.Callable[[dict[str, utilities.library_utils.PDFLibProto]], dict[str, utilities.utils.FileContentsEntry]]
var facility_name : str
var failed_prefix : str
var file_groups : list[utilities.client_utils.S3FileGroup]
var first_dos : str
var insurance_integration_mode : Optional[Literal['0', '1']]
var managed_fields : dict[str, utilities.managed_fields.ManagedField]
var managed_fields_context_rules : list[utilities.managed_fields.ContextRule]
var match_specs_key : str
var max_keys : int
var output_dir : str | None
var provider_integration_mode : Optional[Literal['0', '1', '2']]
var s3_prefix : str
var section_specs_key : str
var send_func : collections.abc.Callable[..., dict[str, bool]]
var send_reject_notifications : bool
var source_prefix : str
var summary_key_addendum : list[str]
var summary_map_addendum : dict[str, ReduceSpec]
var summary_specs_key : str
var table_specs_key : str | None
var transform_specs_key : str
var use_autocoding : bool
var use_docuvision : bool
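
The `dv_required_page_types` rule described above can be sketched as a small predicate. This is an illustration under stated assumptions, not the DocuVision implementation: `pid_qualifies` is a hypothetical helper, the note type strings are made up, and treating `None` as "no restriction" is inferred from the "if supplied" wording.

```python
from typing import Optional


def pid_qualifies(page_note_types: list[str], required: Optional[set[str]]) -> bool:
    """Hypothetical helper: decide whether a pid's pages justify creating a case."""
    if required is None:
        # Assumption: no set supplied means every pid qualifies.
        return True
    # At least one page assigned to the pid must have a noteType in the set.
    return any(note_type in required for note_type in page_note_types)


# Made-up note types for illustration.
has_required_page = pid_qualifies(["consent", "op_note"], {"op_note"})
lacks_required_page = pid_qualifies(["consent"], {"op_note"})
unrestricted = pid_qualifies([], None)
```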

class MatchSpec (*args, **kwargs)

Spec definition for reference file dataframe matching.

Attributes

schedule_other : partial[DataFrameMatcher]: partial function for matching other csvs to schedule.
schedule_demo : partial[DataFrameMatcher]: partial function for matching demographics to schedule.

Ancestors

builtins.dict

Class variables

var schedule_demo : functools.partial[DataFrameMatcher]
var schedule_other : functools.partial[DataFrameMatcher]

class ReduceSpec (*args, **kwargs)

TypedDict defining the fields required in a summary_map entry.

Attributes

reduce : Callable: function to reduce a list of values to a single value
key_filters : list[str]: if the source table key contains a value in this list, exclude its value from consideration.
value_filters : list[Callable]: if any of these functions returns True for a candidate value, exclude it from consideration.
queries : dict[str, Callable]: collect candidate values by querying the source table for keys that match the regex defined in the query key and reduce the results using the function defined in the query value.

Ancestors

builtins.dict

Class variables

var key_filters : list[str]
var queries : dict[str, collections.abc.Callable[[list[str]], str | bool | list[typing.Any]]]
var reduce : collections.abc.Callable[[collections.abc.Sequence[typing.Any]], str | bool | list[typing.Any]] | str
var value_filters : list[collections.abc.Callable[..., bool]]
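
One way to read the four ReduceSpec fields together is sketched below. `apply_reduce_spec` is a hypothetical function, not the module's summary code. It assumes that query keys are matched with `re.search`, that `key_filters` are substring tests, that each query's results are reduced before the top-level `reduce` runs, and that `reduce` is a callable (the `str` form allowed by the annotation is not handled). The table contents are made up.

```python
import re
from typing import Any


def apply_reduce_spec(table: dict[str, str], spec: dict[str, Any]) -> Any:
    """Hypothetical evaluation of a ReduceSpec-like dict against a flat table."""
    candidates = []
    for pattern, query_reduce in spec.get("queries", {}).items():
        matched = [
            value
            for key, value in table.items()
            # Keep values whose key matches the query regex ...
            if re.search(pattern, key)
            # ... unless the key contains any key_filters entry ...
            and not any(f in key for f in spec.get("key_filters", []))
            # ... or any value_filters function returns True for the value.
            and not any(check(value) for check in spec.get("value_filters", []))
        ]
        if matched:
            candidates.append(query_reduce(matched))
    # Reduce the per-query results to a single value.
    return spec["reduce"](candidates)


table = {
    "patient_name": "DOE, JANE",
    "patient_name_alt": "",
    "guarantor_name": "DOE, JOHN",
}
spec = {
    "reduce": lambda values: values[0] if values else "",
    "key_filters": ["guarantor"],
    "value_filters": [lambda v: not v.strip()],
    "queries": {r"name": lambda values: values[0]},
}
result = apply_reduce_spec(table, spec)
```

Here the guarantor entry is dropped by the key filter and the empty alternate name by the value filter, leaving a single candidate.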

class SectionSpec (*args, **kwargs)

Section spec TypedDict definition.

Attributes

exact_titles : list[str]: list of exact section titles to match
force_names : list[ForceNameTuple]: list of tuples of check functions and names to force if the check function returns True when passed a current section's title.
strip_ends : list[str]: list of strings to strip from the end of a section title.
heading_breaks : list[HBTuple]: list of tuples of break strings to remove unwanted data from section titles.
sect_start_checks : list[Callable]: list of functions to check if a section has started when passed the list of remaining extracted lines of text.
end_sect_latches : list[LTTuple]: list of tuples containing a latch function that should return True when passed the list of remaining lines if the section is ending, a trigger function to end the section based on the current line, and an unlatch function to clear the "section ending" latch based on the current line.
sect_start_dqs : list[su.SectStartDisqualifier]: list of su.SectStartDisqualifier entries that disqualify section starts based on the name of the section currently being extracted and the remaining extracted text.
line_roll_checks : list[su.LineRollCheck]: list of LineRollCheck tuples defining tests to detect and functions to correct improper line wrapping in the source document.
wrap_lines : bool: if True, search for horizontally distributed table layouts and move tables in the rightmost column such that they appear below the table in the leftmost column.
document_strippers : list[StripperTuple]: list of tuples defining document stripper classes and their kwargs.

Ancestors

builtins.dict

Class variables

var document_strippers : list[utilities.section_utils.StripperTuple]
var end_sect_latches : list[utilities.section_utils.LTTuple]
var exact_titles : list[str]
var force_names : list[utilities.section_utils.ForceNameTuple]
var heading_breaks : list[utilities.section_utils.HBTuple]
var line_roll_checks : list[utilities.section_utils.LineRollCheck]
var sect_start_checks : list[collections.abc.Callable[[collections.abc.Sequence[str]], bool]]
var sect_start_dqs : list[utilities.section_utils.SectStartDisqualifier]
var strip_ends : list[str]
var wrap_lines : bool
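
The latch, trigger, and unlatch roles described for `end_sect_latches` can be pictured as a small state machine. The sketch below is an assumption-laden illustration, not the section extractor: `Latch` is a hypothetical stand-in for `LTTuple` (the real field names and order may differ), `section_end_index` is invented for this example, and the order in which the three functions are consulted on each line is a guess.

```python
from typing import Callable, NamedTuple, Sequence


class Latch(NamedTuple):
    """Hypothetical stand-in for LTTuple; real field names may differ."""

    latch: Callable[[Sequence[str]], bool]  # sees the remaining lines
    trigger: Callable[[str], bool]  # sees the current line
    unlatch: Callable[[str], bool]  # sees the current line


def section_end_index(lines: Sequence[str], spec: Latch) -> int:
    """Return the index of the line that ends the section, or len(lines)."""
    latched = False
    for i, line in enumerate(lines):
        if latched:
            # A latched section can be released by the unlatch function.
            if spec.unlatch(line):
                latched = False
        elif spec.latch(lines[i:]):
            latched = True
        # The trigger only ends the section while the latch is set.
        if latched and spec.trigger(line):
            return i
    return len(lines)


# Made-up rules: latch on a signature line, end at the next blank line,
# and release the latch if an addendum follows.
rules = Latch(
    latch=lambda remaining: remaining[0].startswith("Signed by"),
    trigger=lambda line: line == "",
    unlatch=lambda line: line.startswith("Addendum"),
)
end = section_end_index(["Normal exam", "", "Signed by: Dr. Example", "", "HISTORY"], rules)
no_end = section_end_index(["Signed by: X", "Addendum: more", "", "text"], rules)
```

In the first call the blank line at index 1 is ignored because the latch is not yet set; the section ends at the blank line after the signature.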

class SummarySpec (*args, **kwargs)

Typed dict for required summary spec keys.

Attributes

summary_func : str: must be a valid summary function name from TableTransformer()
summary_args : dict[str, Any]: kwargs for summary_func
summary_key_addendum : list[str]: list of valid output keys not appearing in summary_map
summary_map : CondenseSpec: map of output keys to ReduceSpecs
summary_meets_claimmaker_standard : bool: enables "claimmaker only" operations. See aws_s3_batch.py for more information.

Ancestors

builtins.dict

Class variables

var summary_args : dict[str, typing.Any]
var summary_func : str
var summary_key_addendum : list[str]
var summary_map : dict[str, ReduceSpec]
var summary_meets_claimmaker_standard : bool

class TableSpec (*args, **kwargs)

TypedDict representing args/configuration items for parsing the tables of a single section.

Attributes

save_full_text : bool: if True, save the full text of the table
split_table_columns : list[str]: list of column names to split on
row_indent : int: lines indented by at least this many spaces are appended to the list of lines in the previous table.
force_table_names : list[ForceNameTuple]: list of tuples of check functions and names to force if the check function returns True when passed a current table's title.
rollup_cascade_reference : RollupCascadeManager: container class for defining rollup/cascade operations.
heading_indent : int: lines indented by exactly this many spaces are automatically considered to be table headings.
stripped_head_key : str: key to use when storing data stripped from a table heading.
heading_contains : list[str]: list of strings to check for in the line to detect it as a table heading.
start_checks : list[TableStartCheck]: list of tuples of check functions for starting new tables.
end_checks : list[TableEndCheck]: list of tuples of check functions for ending the current table.
interpreter : Callable: function to interpret the table
interpreter_kwargs : SwappingInterpreterKwArgs: kwargs for interpreter
process_residual : bool: if True, process residual text after removing all lines assigned to other tables.
heading_breaks : list[HBTuple]: list of tuples of break strings to remove unwanted data from table titles. Inherited from section specs.

Ancestors

builtins.dict

Class variables

var end_checks : list[utilities.table_utils.TableEndCheck]
var force_table_names : list[utilities.section_utils.ForceNameTuple]
var heading_breaks : list[utilities.section_utils.HBTuple]
var heading_contains : list[str]
var heading_indent : int
var interpreter : collections.abc.Callable[..., list[dict[str, str]] | utilities.table_utils.SubtableParser]
var interpreter_kwargs : utilities.table_utils.SwappingInterpreterKwArgs
var process_residual : bool
var rollup_cascade_reference : RollupCascadeManager
var row_indent : int
var save_full_text : bool
var split_table_columns : list[str]
var start_checks : list[utilities.table_utils.TableStartCheck]
var stripped_head_key : str
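
The `heading_indent` and `row_indent` rules can be sketched as a line classifier. `classify_line` is a hypothetical helper, not part of the table parser; checking the heading rule before the continuation rule is an assumption, since the attribute descriptions do not state a precedence. The sample lines are made up.

```python
def classify_line(line: str, heading_indent: int, row_indent: int) -> str:
    """Hypothetical classifier based on leading-space count alone."""
    indent = len(line) - len(line.lstrip(" "))
    if indent == heading_indent:
        # Exactly heading_indent spaces: treated as a table heading.
        return "heading"
    if indent >= row_indent:
        # At least row_indent spaces: appended to the previous table's lines.
        return "continuation"
    return "other"


labels = [
    classify_line(line, heading_indent=0, row_indent=4)
    for line in ["Medications", "    aspirin 81 mg", "  see note"]
]
```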