Module specs._types

Spec definitions. TypedDicts are used for easy serialization.

Sub-modules

specs._types._client

TypedDict definitions for client_specs.py

specs._types._match

TypedDict definitions for matchops classes.

specs._types._section

TypedDict definition for section specs.

specs._types._summary

Type definitions for summary and condense specs.

specs._types._table

TypedDicts for use in table_specs.py

Classes

class ClientSpecStatic (*args, **kwargs)

Top level keys that must be present in ClientSpec objects

Attributes

db_secret_name : str
Name of the secret containing the database credentials.
api_secret_name : str
Name of the secret containing the api credentials.
managed_fields : dict[str, ManagedField]
Client-level standard fields.
managed_fields_context_rules : list[ContextRule]
Client-level context rules.
summary_key_addendum : list[str]
Additional summary keys specific to the client.
summary_map_addendum : dict[str, ReduceSpec]
Dict of additional summary reduce specs.

Ancestors

  • builtins.dict

Class variables

var api_secret_name : str
var db_secret_name : str
var managed_fields : dict[str, utilities.managed_fields.ManagedField]
var managed_fields_context_rules : list[utilities.managed_fields.ContextRule]
var summary_key_addendum : list[str]
var summary_map_addendum : dict[str, ReduceSpec]
class FacilitySpec (*args, **kwargs)

Facility entry in client_specs.builtin_client_specs.

Attributes

azure_secret_name : str
Name of the secret that stores the endpoint and subscription key for Azure Computer Vision OCR operations.
dest_prefix : str
The final s3 folder specification for processed PDFs.
extract_func : partial | pu.ExtractFunc
Function used for text extraction.
facility_name : str
MUST MATCH FACILITY NAME FROM ACE SALESFORCE.
failed_prefix : str
The final s3 folder specification for failed PDFs.
first_dos : str
The first date of service to be coded for facility.
insurance_integration_mode : Literal['0', '1'] | None
Overrides the equivalent env var when set.
max_keys : int
Override max keys passed to extract_buckets for this facility.
match_specs_key : str
Key for this facility in match_specs.
output_dir : str | None
Optional directory for saving intermediate outputs.
provider_integration_mode : Literal['0', '1', '2'] | None
Overrides the equivalent env var when set.
s3_prefix : str
Non-filename portion of source PDF s3 keys.
section_specs_key : str
Key in section_specs.
send_func : Callable
S3Batch output processing function.
source_prefix : str
The final s3 folder specification for unprocessed PDFs.
summary_key_addendum : list[str]
List of additional summary keys valid for this facility.
summary_map_addendum : dict[str, ReduceSpec]
dict of additional summary reduce specs.
summary_specs_key : str
Key for summary type in summary_specs.
table_specs_key : str | None
Key for facility type in table_specs.
transform_specs_key : str
Key for facility type in transform_specs.
use_autocoding : bool
Flag to enable/disable autocoding for the facility.
use_docuvision : bool
Flag to enable/disable docuvision for the facility.
managed_fields : dict[str, ManagedField]
Custom standard field values (optional).
managed_fields_context_rules : list[ContextRule]
Facility-level context rules.
dv_preferred_networks : list[str] | None
List of DocuVision neural networks preferred for this facility.
dv_required_page_types : set[str] | None
If supplied, DocuVision will only create a case for a pid if at least one of the pages assigned to that pid has a noteType in this set.
send_reject_notifications : bool
If True, include a UserNotification for each rejected s3 file input in the ClaimMaker Alert email to the client. Defaults to False.
file_groups : list[S3FileGroup]
List of file groups specifying how to process each file type according to a matched regex.

Ancestors

  • builtins.dict

Class variables

var azure_secret_name : str
var dest_prefix : str
var dv_preferred_networks : list[str] | None
var dv_required_page_types : set[str] | None
var extract_func : functools.partial[dict[str, utilities.utils.FileContentsEntry]] | collections.abc.Callable[[dict[str, utilities.library_utils.PDFLibProto]], dict[str, utilities.utils.FileContentsEntry]]
var facility_name : str
var failed_prefix : str
var file_groups : list[utilities.client_utils.S3FileGroup]
var first_dos : str
var insurance_integration_mode : Optional[Literal['0', '1']]
var managed_fields : dict[str, utilities.managed_fields.ManagedField]
var managed_fields_context_rules : list[utilities.managed_fields.ContextRule]
var match_specs_key : str
var max_keys : int
var output_dir : str | None
var provider_integration_mode : Optional[Literal['0', '1', '2']]
var s3_prefix : str
var section_specs_key : str
var send_func : collections.abc.Callable[..., dict[str, bool]]
var send_reject_notifications : bool
var source_prefix : str
var summary_key_addendum : list[str]
var summary_map_addendum : dict[str, ReduceSpec]
var summary_specs_key : str
var table_specs_key : str | None
var transform_specs_key : str
var use_autocoding : bool
var use_docuvision : bool
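
The override and gating rules described for insurance_integration_mode, provider_integration_mode and dv_required_page_types can be sketched as follows. This is a minimal illustration, not the project's implementation: the helper names resolve_integration_mode and should_create_case, the env var names, the "0" default and the facility values are all assumptions made for the example.

```python
from __future__ import annotations

import os
from collections.abc import Mapping, Sequence


def resolve_integration_mode(
    facility_spec: Mapping[str, object],
    key: str,
    env_var: str,
    default: str = "0",
) -> str:
    """Return the facility-level mode when set, else fall back to the environment."""
    value = facility_spec.get(key)
    if value is not None:
        return str(value)
    # env_var and default are assumptions; the real names are not documented here.
    return os.environ.get(env_var, default)


def should_create_case(
    pages: Sequence[Mapping[str, str]],
    required_page_types: set[str] | None,
) -> bool:
    """Apply the dv_required_page_types rule to the pages assigned to one pid."""
    if required_page_types is None:
        # Nothing supplied, so no restriction applies.
        return True
    return any(page.get("noteType") in required_page_types for page in pages)


# Made-up facility entry containing only the keys this sketch reads.
facility = {
    "facility_name": "Example Surgery Center",
    "provider_integration_mode": "2",
    "insurance_integration_mode": None,
}
provider_mode = resolve_integration_mode(
    facility, "provider_integration_mode", "PROVIDER_INTEGRATION_MODE"
)
```

In this sketch an empty set rejects every pid, because no noteType can be a member of it; whether the real pipeline treats an empty set that way is not stated above.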
class MatchSpec (*args, **kwargs)

Spec definition for reference file dataframe matching.

Attributes

schedule_other : partial[DataFrameMatcher]
partial function for matching other csvs to schedule.
schedule_demo : partial[DataFrameMatcher]
partial function for matching demographics to schedule.

Ancestors

  • builtins.dict

Class variables

var schedule_demo : functools.partial[DataFrameMatcher]
var schedule_other : functools.partial[DataFrameMatcher]
class ReduceSpec (*args, **kwargs)

TypedDict defining the fields required in a summary_map entry.

Attributes

reduce : Callable
function to reduce a list of values to a single value
key_filters : list[str]
if the source table key contains a value in this list, exclude its value from consideration.
value_filters : list[Callable]
if any of these functions returns True for a candidate value, exclude it from consideration.
queries : dict[str, Callable]
collect candidate values by querying the source table for keys that match the regex defined in the query key and reduce the results using the function defined in the query value.

Ancestors

  • builtins.dict

Class variables

var key_filters : list[str]
var queries : dict[str, collections.abc.Callable[[list[str]], str | bool | list[typing.Any]]]
var reduce : collections.abc.Callable[[collections.abc.Sequence[typing.Any]], str | bool | list[typing.Any]] | str
var value_filters : list[collections.abc.Callable[..., bool]]
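
One plausible reading of how the four ReduceSpec fields interact is sketched below. ReduceSpecSketch is a local stand-in that mirrors only the documented keys, and apply_reduce_spec is a hypothetical helper: the order in which the real code applies the filters, and what it passes to reduce, are assumptions. The sketch also covers only the callable form of reduce.

```python
from __future__ import annotations

import re
from collections.abc import Callable, Sequence
from typing import Any, TypedDict


class ReduceSpecSketch(TypedDict, total=False):
    """Stand-in mirroring the documented ReduceSpec keys (not the real class)."""

    reduce: Callable[[Sequence[Any]], Any]
    key_filters: list[str]
    value_filters: list[Callable[..., bool]]
    queries: dict[str, Callable[[list[str]], Any]]


def apply_reduce_spec(table: dict[str, str], spec: ReduceSpecSketch) -> Any:
    """Hypothetical helper: collect candidates per query, filter, then reduce."""
    results = []
    for pattern, query_reduce in spec.get("queries", {}).items():
        candidates = [
            value
            for key, value in table.items()
            if re.search(pattern, key)
            # key_filters: drop keys containing any listed substring.
            and not any(fragment in key for fragment in spec.get("key_filters", []))
            # value_filters: drop values for which any check returns True.
            and not any(check(value) for check in spec.get("value_filters", []))
        ]
        if candidates:
            results.append(query_reduce(candidates))
    # The top-level reduce collapses the per-query results to one value.
    return spec["reduce"](results)


spec: ReduceSpecSketch = {
    "reduce": lambda values: values[0] if values else "",
    "key_filters": ["history"],
    "value_filters": [lambda value: not value.strip()],
    "queries": {r"(?i)diagnosis": lambda values: "; ".join(values)},
}
table = {
    "Preop Diagnosis": "K21.9",
    "Postop Diagnosis": "K21.9, K44.9",
    "Diagnosis history": "old entry",
    "Diagnosis notes": "   ",
}
summary_value = apply_reduce_spec(table, spec)
```

Here "Diagnosis history" is excluded by key_filters and "Diagnosis notes" by the blank-value check, so only the two remaining values reach the query's reducer.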
class SectionSpec (*args, **kwargs)

Section spec TypedDict definition.

Attributes

exact_titles : list[str]
list of exact section titles to match
force_names : list[ForceNameTuple]
list of tuples of check functions and names to force if the check function returns True when passed a current section's title.
strip_ends : list[str]
list of strings to strip from the end of a section title.
heading_breaks : list[HBTuple]
list of tuples of break strings to remove unwanted data from section titles.
sect_start_checks : list[Callable]
list of functions to check if a section has started when passed the list of remaining extracted lines of text.
end_sect_latches : list[LTTuple]
list of tuples, each containing three functions: a latch function that returns True when passed the list of remaining lines if the section is ending, a trigger function that ends the section based on the current line, and an unlatch function that clears the "section ending" latch based on the current line.
sect_start_dqs : list[su.SectStartDisqualifier]
list of su.SectStartDisqualifier. Disqualify section starts based on the currently extracting section name and the remaining extracted text.
line_roll_checks : list[su.LineRollCheck]
list of LineRollCheck tuples defining tests to detect, and functions to correct, improper line wrapping in the source document.
wrap_lines : bool
if True, search for horizontally distributed table layouts and move tables in the rightmost column such that they appear below the table in the leftmost column.
document_strippers : list[StripperTuple]
list of tuples defining document stripper classes and their kwargs.

Ancestors

  • builtins.dict

Class variables

var document_strippers : list[utilities.section_utils.StripperTuple]
var end_sect_latches : list[utilities.section_utils.LTTuple]
var exact_titles : list[str]
var force_names : list[utilities.section_utils.ForceNameTuple]
var heading_breaks : list[utilities.section_utils.HBTuple]
var line_roll_checks : list[utilities.section_utils.LineRollCheck]
var sect_start_checks : list[collections.abc.Callable[[collections.abc.Sequence[str]], bool]]
var sect_start_dqs : list[utilities.section_utils.SectStartDisqualifier]
var strip_ends : list[str]
var wrap_lines : bool
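
The latch, trigger and unlatch behaviour described for end_sect_latches can be sketched as a small state machine. LatchSketch is a hypothetical stand-in for LTTuple (the real field order and types are not shown here), and the order in which the three functions are evaluated on each line is an assumption of this sketch.

```python
from __future__ import annotations

from collections.abc import Callable, Sequence
from typing import NamedTuple


class LatchSketch(NamedTuple):
    """Hypothetical stand-in for LTTuple."""

    latch: Callable[[Sequence[str]], bool]  # sees the remaining lines
    trigger: Callable[[str], bool]  # sees the current line
    unlatch: Callable[[str], bool]  # sees the current line


def section_end_index(lines: Sequence[str], latches: Sequence[LatchSketch]) -> int:
    """Return the index of the line that ends the section, or len(lines) if none."""
    armed = [False] * len(latches)
    for i, line in enumerate(lines):
        for n, spec in enumerate(latches):
            if armed[n] and spec.unlatch(line):
                armed[n] = False  # the section is continuing after all
            if not armed[n] and spec.latch(lines[i:]):
                armed[n] = True  # the section looks like it is ending
            if armed[n] and spec.trigger(line):
                return i
    return len(lines)


# Illustrative rules: a blank line arms the latch, a bullet line clears it,
# and a line starting with "Signed" ends the section while the latch is set.
blank_then_signature = LatchSketch(
    latch=lambda rest: bool(rest) and rest[0].strip() == "",
    trigger=lambda line: line.startswith("Signed"),
    unlatch=lambda line: line.startswith("-"),
)
lines = [
    "Findings:",
    "- normal",
    "",
    "- addendum",
    "",
    "Signed by Dr. A",
    "Next section",
]
end_index = section_end_index(lines, [blank_then_signature])
```

The latch keeps a trigger phrase from ending the section too early: a "Signed" line that is not preceded by a blank line is ignored in this sketch.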
class SummarySpec (*args, **kwargs)

Typed dict for required summary spec keys.

Attributes

summary_func : str
must be a valid summary function name from TableTransformer()
summary_args : dict[str, Any]
kwargs for summary_func
summary_key_addendum : list[str]
list of valid output keys not appearing in summary_map
summary_map : CondenseSpec
map of output keys to ReduceSpecs
summary_meets_claimmaker_standard : bool
enables "claimmaker only" operations. See aws_s3_batch.py for more information.

Ancestors

  • builtins.dict

Class variables

var summary_args : dict[str, typing.Any]
var summary_func : str
var summary_key_addendum : list[str]
var summary_map : dict[str, ReduceSpec]
var summary_meets_claimmaker_standard : bool
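
Because summary_key_addendum and summary_map_addendum also appear at the client and facility levels, the set of valid output keys is presumably assembled from several layers. The sketch below shows one way that could work. The helper effective_summary, the precedence (facility entries override client entries, which override the base summary_map), and all sample values including the summary_func name are assumptions, not documented behaviour.

```python
from __future__ import annotations

from collections.abc import Mapping, Sequence
from typing import Any


def first(values: Sequence[Any]) -> Any:
    """Example reducer: keep the first candidate."""
    return values[0]


def last(values: Sequence[Any]) -> Any:
    """Example reducer: keep the last candidate."""
    return values[-1]


def effective_summary(
    summary_spec: Mapping[str, Any],
    client_spec: Mapping[str, Any],
    facility_spec: Mapping[str, Any],
) -> tuple[set[str], dict[str, Any]]:
    """Hypothetical merge of base, client and facility summary settings."""
    merged_map = dict(summary_spec["summary_map"])
    keys = set(summary_spec.get("summary_key_addendum", []))
    # Assumed precedence: later layers override earlier ones.
    for layer in (client_spec, facility_spec):
        keys.update(layer.get("summary_key_addendum", []))
        merged_map.update(layer.get("summary_map_addendum", {}))
    # Every key in the merged map is a valid output key as well.
    keys.update(merged_map)
    return keys, merged_map


summary_spec = {
    "summary_func": "example_summary_func",  # hypothetical function name
    "summary_args": {},
    "summary_key_addendum": ["case_id"],
    "summary_map": {"surgeon": {"reduce": first}},
    "summary_meets_claimmaker_standard": False,
}
client_spec = {
    "summary_key_addendum": ["payer"],
    "summary_map_addendum": {"surgeon": {"reduce": last}},
}
facility_spec = {
    "summary_key_addendum": ["room"],
    "summary_map_addendum": {"anesthesia": {"reduce": first}},
}
valid_keys, merged_map = effective_summary(summary_spec, client_spec, facility_spec)
```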
class TableSpec (*args, **kwargs)

TypedDict representing args/configuration items for parsing the tables of a single section.

Attributes

save_full_text : bool
if True, save the full text of the table
split_table_columns : list[str]
list of column names to split on
row_indent : int
lines indented by at least this many spaces are appended to the list of lines in the previous table.
force_table_names : list[ForceNameTuple]
list of tuples of check functions and names to force if the check function returns True when passed a current table's title.
rollup_cascade_reference : RollupCascadeManager
container class for defining rollup/cascade operations.
heading_indent : int
lines indented by exactly this many spaces are automatically considered to be table headings.
stripped_head_key : str
key to use when storing data stripped from a table heading.
heading_contains : list[str]
list of strings to check for in the line to detect it as a table heading.
start_checks : list[TableStartCheck]
list of tuples of check functions for starting new tables.
end_checks : list[TableEndCheck]
list of tuples of check functions for ending the current table.
interpreter : Callable
function to interpret the table
interpreter_kwargs : SwappingInterpreterKwArgs
kwargs for interpreter
process_residual : bool
if True, process residual text after removing all lines assigned to other tables.
heading_breaks : list[HBTuple]
list of tuples of break strings to remove unwanted data from table titles. Inherited from section specs.

Ancestors

  • builtins.dict

Class variables

var end_checks : list[utilities.table_utils.TableEndCheck]
var force_table_names : list[utilities.section_utils.ForceNameTuple]
var heading_breaks : list[utilities.section_utils.HBTuple]
var heading_contains : list[str]
var heading_indent : int
var interpreter : collections.abc.Callable[..., list[dict[str, str]] | utilities.table_utils.SubtableParser]
var interpreter_kwargs : utilities.table_utils.SwappingInterpreterKwArgs
var process_residual : bool
var rollup_cascade_reference : RollupCascadeManager
var row_indent : int
var save_full_text : bool
var split_table_columns : list[str]
var start_checks : list[utilities.table_utils.TableStartCheck]
var stripped_head_key : str
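
The heading_indent and row_indent rules can be illustrated with a small grouping routine. This is a sketch under stated assumptions: group_table_lines is a hypothetical helper, the heading test is assumed to run before the row test, and lines matching neither rule are simply dropped, which the real parser may not do.

```python
from __future__ import annotations

from collections.abc import Sequence


def group_table_lines(
    lines: Sequence[str],
    heading_indent: int,
    row_indent: int,
) -> list[tuple[str, list[str]]]:
    """Group lines into (heading, rows) pairs using indentation only."""
    tables: list[tuple[str, list[str]]] = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        indent = len(line) - len(line.lstrip(" "))
        if indent == heading_indent:
            # Exactly heading_indent spaces: start a new table.
            tables.append((line.strip(), []))
        elif indent >= row_indent and tables:
            # At least row_indent spaces: append to the previous table.
            tables[-1][1].append(line.strip())
    return tables


lines = [
    "Medications",
    "  aspirin 81 mg",
    "    daily",
    "Allergies",
    "  penicillin",
]
tables = group_table_lines(lines, heading_indent=0, row_indent=2)
```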