Package utilities
shared extractor utility functions
Sub-modules
utilities.aws_utils
-
Container for send functions called via aws_specs and aws_specs_dev
utilities.azure_ocr_integrator
-
Function call for Microsoft Azure OCR requests
utilities.client_utils
-
Container for send functions called via client_specs and client_specs_dev
utilities.env_var_action
-
EnvVarAction class definition
utilities.hl7_utils
-
Parse HL7 data into pdf extractor schedule/demographic format.
utilities.json_decoders
-
Custom json decoders
utilities.library_utils
-
Tools for constructing and managing 'pdf_library' objects
utilities.log_utils
-
exception handler with logging and dynamic continuation
utilities.managed_fields
-
FieldManager and ManagedField class and related utilities. See specs/_bases/_fields.py for examples of well commented, globally applicable …
utilities.pdf_utils
-
Utility functions for raw pdf, CSV, and other delimited file processing.
utilities.protocols
-
Define protocols for inaccessible (due to circular imports) external classes
utilities.section_utils
-
Utility functions and classes used by: section_extractor.py section_specs.py
utilities.table_interpreters
-
library of "interpreter" functions called during table extraction that convert free text lines into a raw table format
utilities.table_utils
-
Utility functions for table extraction
utilities.transform_utils
-
utility functions used by: table_transformer.py transform_specs.py summary_specs.py
utilities.utils
-
Utility functions with global scope
utilities.v_str
-
String implementation used to associate source context and confidence with information extracted from the PDF
utilities.value_cache_dict
-
Useful in extending functionality for comprehensions by allowing the current iteration to reference the results of prior iteration(s). See …
Classes
class AzureOCRIntegrator (max_retries: int = 10, timeout: int = 60)
-
Extract text from pdf images via calls to Azure Cognitive Services OCR API.
Class variables
var client : azure.cognitiveservices.vision.computervision._computer_vision_client.ComputerVisionClient | None
var last_op_id : str
var last_read_results : list[azure.cognitiveservices.vision.computervision.models._models_py3.ReadResult]
var max_retries : int
var timeout : int
Methods
def create_client(self, azure_secret_name: str) ‑> None
-
Create an Azure ComputerVisionClient based on the supplied AWS secret.
If the azure_secret_name parameter is empty or does not point to a valid AWS secret, self.client is set to None and no image text will be extracted.
Args
azure_secret_name
- the name of the secret defined in AWS for this facility's subscription key.
Environment Variables
AWS_ACCESS_KEY_ID
- the access key ID for an account with access to the secret
AWS_SECRET_ACCESS_KEY
- the secret access key for an account with access to the secret
AWS_SESSION_TOKEN (optional)
- required if the selected account is not allowed the non-interactive login permission. Blank otherwise.
AWS_REGION (optional)
- defaults to "us-east-1" if not set.
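The environment variable handling described above could be sketched as follows. This is a minimal illustration, not the package's implementation: the helper name resolve_aws_settings and the returned dict keys are hypothetical, and treating a blank value the same as an unset one is an assumption.

```python
import os
from typing import Mapping, Optional


def resolve_aws_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Hypothetical helper: collect the AWS settings read from the environment."""
    session_token: Optional[str] = env.get("AWS_SESSION_TOKEN") or None
    return {
        "access_key_id": env.get("AWS_ACCESS_KEY_ID"),
        "secret_access_key": env.get("AWS_SECRET_ACCESS_KEY"),
        # Optional: a blank or missing token is normalized to None (assumption).
        "session_token": session_token,
        # Optional: fall back to the documented default region.
        "region": env.get("AWS_REGION") or "us-east-1",
    }


# Passing a plain dict instead of os.environ keeps the example deterministic.
settings = resolve_aws_settings(
    {"AWS_ACCESS_KEY_ID": "example-id", "AWS_SECRET_ACCESS_KEY": "example-key"}
)
```

Accepting the environment as a mapping argument makes the defaulting rules easy to exercise without touching real credentials.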
def ocr_pages(self, pdf: bytes, pages: list[int], debug_path: pathlib.Path | None = None) ‑> list[str]
-
Read the text from supplied pdf page indices.
Args
pdf
- bytes of a pdf file
pages
- list of pdf page indices to OCR
Returns
list[str]
- list of strings containing structured text extracted from each supplied page index.
class CacheDictCheck (check_name: str, key_arg: str | int, value_arg: str | int, key_check: collections.abc.Callable[[typing.Any], bool], format_key: collections.abc.Callable[[typing.Any], collections.abc.Hashable] = builtins.str, value_check: collections.abc.Callable[[typing.Any], bool] = <function CacheDictCheck.<lambda>>, preprocess_value: collections.abc.Callable[[typing.Any], typing.Any] = <function CacheDictCheck.<lambda>>, concat_value: collections.abc.Callable[[typing.Any, typing.Any], typing.Any] = <function CacheDictCheck.<lambda>>, format_arg_value: collections.abc.Callable[[typing.Any, typing.Any], typing.Any] = <function CacheDictCheck.<lambda>>, cache_factory: collections.abc.Callable[[], typing.Any] = builtins.str, reset_arg: str | int | None = None, *, reset_check: collections.abc.Callable[[typing.Any, typing.Any], bool] = <function CacheDictCheck.<lambda>>)
-
Defines a cache_check for use with the ValueCacheDict decorator.
Defines the tests for determining whether an arg or kwarg should be added to the cache, how the value should be formatted prior to being added, how to combine it with previously cached values, and how the value in the cache should be reincorporated into the values of the args and kwargs supplied in the call before they are forwarded to the wrapped function.
Attributes
check_name
:str
- key for this check's cache in the cache_dict of the @ValueCacheDict wrapped function.
key_arg
:str | int
- the argument to the wrapped function that will be used to generate the cache dict key. An int represents the zero based index of a positional arg (see note below). A str must reference a valid keyword argument.
value_arg
:str | int
- the argument to the wrapped function that will be used to create values for the cache.
key_check
:Callable[[Any], bool]
- given the resolved value of the 'key_arg' (see ValueCacheDict), return True to trigger caching.
format_key
:Callable[[Any], Hashable]
- given the resolved value of the 'key_arg', return a cache dict key. (Default is str)
value_check
:Callable[[Any], bool]
- given the resolved value of the 'value_arg' (see ValueCacheDict), return True to trigger caching. (Default is lambda _: True)
preprocess_value
:Callable[[Any], Any]
- given the resolved value of the 'value_arg', return a modified representation fit for caching. (Default is lambda x: x)
concat_value
:Callable[[Any, Any], Any]
- given the cached value and the preprocessed new value, return a new cached value. (Default is lambda x, y: x + y)
format_arg_value
:Callable[[Any, Any], Any]
- given the original resolved value of the 'value_arg' and the new cache value, return the value of 'value_arg' to use when calling the wrapped function. (Default is lambda _, y: y (passes the new cache value))
cache_factory
:Callable[[], Any]
- assigned to the "default_factory" attribute of the internal defaultdict serving as the cache for the duration of this check. (Default is str)
reset_arg
:str | int | None
- the argument to the wrapped function that will be evaluated to determine whether the cache should be reset. If None, the cache will never reset.
reset_check
:Callable[[Any, Any], bool]
- given the previously resolved value of the 'reset_arg' and the current resolved value, return True to reset the cache. (Default is lambda x, y: x != y)
NOTE: the arg at position zero of a class or instance method will be "cls" or "self", so the arguments in the actual function call will begin at position 1 for those use cases.
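How the documented defaults might compose for a single check can be sketched with plain functions. This is an illustration under assumptions, not the ValueCacheDict implementation: the apply_check helper and the module-level cache are hypothetical, and argument resolution and cache resets are omitted.

```python
from collections import defaultdict

# Stand-ins mirroring the documented defaults.
preprocess_value = lambda x: x
concat_value = lambda x, y: x + y
format_arg_value = lambda _, y: y

# cache_factory=str means a missing key starts as an empty string.
cache = defaultdict(str)


def apply_check(key, value, key_check=lambda k: True,
                value_check=lambda _: True, format_key=str):
    """Hypothetical: return the value that would be forwarded to the wrapped function."""
    if not (key_check(key) and value_check(value)):
        return value  # nothing cached, original value passes through
    cache_key = format_key(key)
    cache[cache_key] = concat_value(cache[cache_key], preprocess_value(value))
    return format_arg_value(value, cache[cache_key])


# Each iteration sees the accumulated result of the prior iterations.
forwarded = [apply_check("address", part)
             for part in ["12 Main St", ", ", "Springfield"]]
```

With the defaults, every call forwards the running concatenation, which is the "current iteration references prior iterations" behavior the class description mentions.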
Class variables
var cache_factory : collections.abc.Callable[[], typing.Any]
var cached : dict[collections.abc.Hashable, typing.Any]
var check_name : str
var format_key : collections.abc.Callable[[typing.Any], collections.abc.Hashable]
var key_arg : str | int
var key_check : collections.abc.Callable[[typing.Any], bool]
var reset_arg : str | int | None
var reset_value : Any
var value_arg : str | int
Methods
def concat_value(x, y) ‑> collections.abc.Callable[[typing.Any, typing.Any], typing.Any]
def format_arg_value(_, y) ‑> collections.abc.Callable[[typing.Any, typing.Any], typing.Any]
def preprocess_value(x) ‑> collections.abc.Callable[[typing.Any], typing.Any]
def reset_check(x, y) ‑> collections.abc.Callable[[typing.Any, typing.Any], bool]
def value_check(_) ‑> collections.abc.Callable[[typing.Any], bool]
class ContextRule (title: ForwardRef('str'), controller_keys: ForwardRef('list[str]'), patterns: ForwardRef('list[re.Pattern]'), applied_context: ForwardRef('Sequence[str]'), is_active: ForwardRef('Callable[[KeyGroups], Any]'), candidate_filter: ForwardRef('Callable[[KeyGroups, Any], KeyGroups]') = <function ContextRule.<lambda>>)
-
Conditionally modify the preferred point of origin (context) and/or manipulate the list of candidate values for all standard keys matched by a list of regular expressions. See FieldManager._apply_context_rules() for implementation. See specs/_bases/_fields.py for examples of well commented, globally applicable ContextRules that reduce DB/CSV and PDF/DB/CSV summaries. These global context rules are available from the specs subpackage, i.e.:
    import specs
    specs.CSV_DB_CONTEXT_RULE  # context rule for CSV/DB summaries
Args
title
:str
- a descriptive name used for logging and maintainability purposes.
controller_keys
:list[str]
- A list of standard keys whose standard_key_groups entries will determine if the rule is activated.
patterns
:list[re.Pattern]
- A list of regular expressions that determine the set of standard keys that are subject to the rule's actions.
applied_context
:list[str]
- the new context_priority value for matched keys if the supplied is_active function returns True (boolean type only).
is_active
:Callable[[KeyGroups], Any]
- A function that takes the subset of the FieldManager's standard_key_groups that matches the rule's controller_keys and returns either a bool indicating whether the rule is in effect or an alternative input for a custom candidate_filter in advanced applications.
candidate_filter
:Callable[[KeyGroups, Any], KeyGroups]
- A function that takes the subset of the FieldManager's standard_key_groups that matches the rule's keys and the result from the is_active func and returns the subset of the FieldManager's standard_key_groups that are eligible to appear in the output. Optional. Defaults to a null operation (i.e. lambda x, _: x).
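The shapes of is_active and candidate_filter can be illustrated on plain dictionaries. The two functions below are hypothetical examples written against the KeyGroups form described in this reference; they are not taken from specs/_bases/_fields.py, and plain str values stand in for vStr.

```python
# KeyGroups: {standard key: {"<standard key>|<context>|<idx>": value}}
KeyGroups = dict[str, dict[str, str]]


def is_active(controller_groups: KeyGroups) -> bool:
    """Hypothetical rule: active when any controller key has a CSV candidate."""
    return any("|CSV" in key
               for group in controller_groups.values()
               for key in group)


def candidate_filter(groups: KeyGroups, active) -> KeyGroups:
    """Hypothetical filter: when active, drop candidates that originate from a PDF."""
    if not active:
        return groups
    return {std_key: {k: v for k, v in candidates.items() if "|PDF" not in k}
            for std_key, candidates in groups.items()}


groups = {
    "schedule.startTime": {
        "schedule.startTime|CSV|0": "07:30",
        "schedule.startTime|PDF.Anesthesia Record|0": "07:35",
    }
}
filtered = candidate_filter(groups, is_active(groups))
```

Because the result of is_active is passed straight to candidate_filter, a non-boolean return value can carry extra information to a custom filter.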
Ancestors
- builtins.tuple
Instance variables
var applied_context : collections.abc.Sequence[str]
-
Alias for field number 3
var candidate_filter : collections.abc.Callable[[dict[str, dict[str, str | vStr]], typing.Any], dict[str, dict[str, str | vStr]]]
-
Alias for field number 5
var controller_keys : list[str]
-
Alias for field number 1
var is_active : collections.abc.Callable[[dict[str, dict[str, str | vStr]]], typing.Any]
-
Alias for field number 4
var patterns : list[re.Pattern]
-
Alias for field number 2
var title : str
-
Alias for field number 0
class DocuVisionIntegratorProtocol (*args, **kwargs)
-
Define externally accessed methods for the DocuVisionIntegrator class. See ./app/integrators/docuvision_integrator.py for class implementation.
Ancestors
- typing.Protocol
- typing.Generic
Subclasses
Class variables
var api_key : str
var base_path : str
var base_url : str
var page_map : dict[str, dict[str, list[int]]]
var split_by_pid : bool
Instance variables
prop pdf_library : dict[str, typing.Any]
-
Dict of {case_doc_id: PDFLibEntry} where each entry represents the combined_pdf of a DocuVisionCase created by a child DocuVisionTask instance.
Merged into S3Batch.pdf_library in aws_s3_batch.py for file matching and other downstream processes.
prop pdfs_by_doc_id : dict[str, bytes]
-
Returns a dict of pdf bytes split and/or concatenated by task pids.
prop results : dict[str, list[dict[str, str]]]
-
Process responses from all child tasks to produce results according to the current instance settings.
Methods
def create_tasks(self, documents: dict[str, typing.Any] | None = None)
-
Obtain upload location and post PDFs. If mock_ids were defined, create mock tasks for each id and collect the existing responses.
Args
documents
:dict[str, lu.PDFLibProto]
- dict of {filename: PDFLibProto} where PDFLibProto is a namedtuple of (body, meta) where body is a bytes object and meta is a dict of metadata for the pdf. Optional. Extends self.documents if supplied.
def job_dict_entries(self, extracted_data: dict[str, dict[str, typing.Any]]) ‑> dict[str, dict[str, typing.Any]]
-
Dict of {job_id: job_dict} for all tasks in self._tasks where each job_dict contains values for db columns 'input', 'comments', and 'note'.
Called by aws_s3_batch.py to recombine the values for the columns noted above with their corresponding output from table_transformer.py.
Args
extracted_data
:dict[str, dict[str, Any]]
- TableTransformer output data supplied from aws_s3_batch.S3Batch.transformed.
Returns
dict[str, dict[str, Any]]
- dict of {job_id: job_dict} for all tasks in self._tasks.
def reset(self, **kwargs)
-
Reset the dataclass to prepare for a new facility by clearing all documents, tasks, results, and internal variables.
KwArgs
documents
:dict[str, lu.PDFLibProto]
- dict of {filename: PDFLibProto} where PDFLibProto is a namedtuple of (body, meta) where body is a bytes object and meta is a dict of metadata for the pdf.
page_map
:dict[str, dict[str, list[int]]]
- dict of {new_doc_id: {old_doc_id: [page_nums]}} where new_doc_id is the doc_id for the combined pdf created by docuvision and old_doc_id is the doc_id for the original pdf. page_nums is a list of page numbers from the original pdf that were included in the combined pdf.
out_dir
:str
- path to directory where output json files will be written. If None, no files will be written.
split_by_pid
:bool
- if True, docuvision will split each pdf into separate documents based on patient id. If False, docuvision will combine all pdfs into a single document.
fail_on_error
:bool
- if True, raise an error if any task fails to post or any response fails to be collected. Defaults to False.
mock_ids
:list[int]
- list of task_ids to use for mock responses. Defaults to [].
default_dos
:str
- default date of service to use if no date of service is extracted from the facesheet. Defaults to gvars.DEFAULT_DOS.
api_secret_name
:str
- name of secret in AWS secretsmanager containing the base_url, base_path, and api_key values for the docuvision API.
dv_preferred_networks
:list[str] | None
- list of preferred Docuvision Neural Networks. Created for facilities where people manually upload 1-page PDFs
table_converters
:dict[str, Callable[[list[str]], list[dict[str, str]]]]
- function reference for processing '*Table' labels returned by DV-1.
dv_required_page_types
:set[str]
- if supplied, a case will only be created for a pid if at least one of the pages assigned to that pid have a type in this set.
def tables_for(self, doc_id: str, section: str = 'DocuVision', sep: str = '.') ‑> dict[str, list[dict[str, str]]]
-
Get results for the supplied doc_id in a tabular format suitable for downstream processing in table_transformer.py.
Args
doc_id
:str
- doc_id for the document to retrieve results for.
section
:str
- section name to use for the table. Defaults to "DocuVision".
sep
:str
- separator to use for table keys. Defaults to ".".
Returns
dict[str, list[dict[str, str]]]
- dict of {table_name: [table_rows]} where each table_row is a dict of {label: value}.
def task_attr_list(self, attr: str) ‑> list
-
List the specified attribute for all tasks in self._tasks.
class EnvVarAction (option_strings, dest, default=None, type=None, **kwargs)
-
Custom argparse action to override command line args with environment variable values if environment variables have been set. To work, the environment variable name must equal (cmd line arg namespace name).upper(). For example:
    cmd line arg       namespace name    env var name
    ---------------    --------------    -------------
    --log-console      log_console       LOG_CONSOLE
    --aws-s3-bucket    aws_s3_bucket     AWS_S3_BUCKET
For boolean type args, this action works similarly to the 'store_true' built-in action, i.e. the cmd line arg functions as a switch to set the namespace value to True (unless the environment variable is defined in which case the env value will be used).
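The naming convention above can be illustrated with a small argparse action. This is a sketch under assumptions, not the EnvVarAction source: the class name EnvOverrideAction is hypothetical, only string-valued options are handled, and the boolean switch behavior is left out.

```python
import argparse
import os


class EnvOverrideAction(argparse.Action):
    """Hypothetical sketch: prefer the env var named dest.upper() over the CLI value."""

    def __init__(self, option_strings, dest, default=None, **kwargs):
        # Use the environment value as the default when the option is omitted.
        env_value = os.environ.get(dest.upper())
        if env_value is not None:
            default = env_value
        super().__init__(option_strings, dest, default=default, **kwargs)

    def __call__(self, parser, namespace, values, option_string=None):
        # The environment value wins even when the option is supplied.
        env_value = os.environ.get(self.dest.upper())
        setattr(namespace, self.dest, values if env_value is None else env_value)


os.environ["AWS_S3_BUCKET"] = "env-bucket"  # example value only
os.environ.pop("LOG_LEVEL", None)           # make sure this one is unset

parser = argparse.ArgumentParser()
parser.add_argument("--aws-s3-bucket", action=EnvOverrideAction)
parser.add_argument("--log-level", action=EnvOverrideAction, default="INFO")
args = parser.parse_args(["--aws-s3-bucket", "cli-bucket", "--log-level", "DEBUG"])
```

Here args.aws_s3_bucket comes from the environment while args.log_level keeps the command line value, because LOG_LEVEL is not set.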
Ancestors
- argparse.Action
- argparse._AttributeHolder
class FieldManager (managed_fields: dict[str, ManagedField], standard_keys: list[str], context_rules: Sequence[ContextRule] = (), context_sep: str = '|', output_dir: str | None = None, patient_id: str = 'Unknown', raw_summary: dict[str, vStr] = <factory>)
-
Apply the settings defined in the ManagedField objects to the data in a flattened summary dictionary from the output of a TableTransformer instance.
Args
managed_fields
:dict[str, ManagedField]
- A dictionary mapping keys to ManagedField objects.
standard_keys
:list[str]
- A list of standard keys used in the data typically defined in a summary_spec (see specs/summary_specs.py and table_transformer.py).
context_rules
:Sequence[ContextRule]
- A list of context rules to apply to the data. ContextRules allow for conditional modification of the preferred point of origin (context) and/or manipulation of the candidate values for a standard key subset.
context_sep
:str
- A string used to separate keys and their context. Defaults to "|". Example key: "schedule.startTime|CSV|0".
output_dir
:str | None
- An optional file path prefix (end in "/" to target a folder). If supplied, f"{output_dir}{patient_id}_managed_fields.json" will be created after calling the reduce() method.
patient_id
:str
- an identifier for the patient being processed. Used in debug output filenames and logging. Defaults to 'Unknown'.
raw_summary
:dict[str, vStr]
- A dict containing raw summary data for one patient. Defaults to an empty dict to facilitate templating (see __call__() docstring).
Attributes
standard_key_groups
:KeyGroups
- A mapping between standard keys and groups of corresponding context bearing candidate key value pairs. Has form {'<standard key>': {'<standard key><sep><context>': '<value>', ...}, ...}
context_groups
:dict[str, KeyGroups]
- The inverse of standard_key_groups (kind of). Groups raw data by context. Initialized as a defaultdict(dict). Used in list and grouped element merge operations. See _GROUPED_ELEMENTS. Has form {'<parent>': {'<context>': {'<final element>': '<value>', ...}, ...}, ...}.
context_group_priorities
:dict[str, list[str]]
- Records the _group_context of the first candidate selected for a standard_key in a context group to ensure that the same context is prioritized for subsequent group members.
list_manager
:dict[str, list[tuple[int, int]]]
- init==False. A dictionary used to track the output list indexes of "array type" standard keys.
output
:dict[str, vStr]
- init==False. A dictionary containing the reduced output. Empty until the reduce() method is called.
During post initialization, the supplied raw_summary (dict[str, vStr]) is compiled into standard_key_groups (dict[str, dict[str, vStr]]) where the keys of raw_summary appear as subkeys within standard_key_groups with standard_keys members as parent keys. E.g., raw_summary entries:
    {
        "schedule.diagnosis|PDF.Anesthesia Record.Case Summary|0": "Right hip displacement",
        "schedule.diagnosis|PDF.Operative Note|7": "Displacement of right hip",
    }
would be compiled into standard_key_groups entries:
    {
        "schedule.diagnosis": {
            "schedule.diagnosis|PDF.Anesthesia Record.Case Summary|0": "Right hip displacement",
            "schedule.diagnosis|PDF.Operative Note|7": "Displacement of right hip",
        }
    }
The same is true, generally speaking, for array type keys with the additional step of replacing the wildcard in the standard key definition with the proper list index using the list_manager dictionary. E.g., raw_summary entries:
    {
        "patient_info.insurance[0].company|PDF.Anesthesia Record.Active Insurance|0": "Aetna",
        "patient_info.insurance[0].company|PDF.Anesthesia Record.Active Insurance|1": "BCBS",
        "patient_info.insurance[1].company|PDF.Anesthesia Record.Active Insurance|0": "MDCR",
    }
would be compiled into standard_key_groups entries:
    {
        "patient_info.insurance[0].company": {
            "patient_info.insurance[0].company|PDF.Anesthesia Record.Active Insurance|0": "Aetna",
        },
        "patient_info.insurance[1].company": {
            "patient_info.insurance[0].company|PDF.Anesthesia Record.Active Insurance|1": "BCBS",
        },
        "patient_info.insurance[2].company": {
            "patient_info.insurance[1].company|PDF.Anesthesia Record.Active Insurance|0": "MDCR",
        },
    }
by virtue of their combined list and table indexes (0, 0), (0, 1), and (1, 0) respectively.
The reduce() method iterates over each standard field in standard_key_groups, selecting the proper final standard key value from the list of candidates according to the ManagedField definition. If the field is not managed and present in standard_keys, the default ManagedField will be applied (selects the most frequent value in the list of candidates).
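The grouping step for non-array keys can be sketched in a few lines. This is a simplified illustration, not FieldManager's code: the function name group_by_standard_key is hypothetical, plain str values stand in for vStr, and the array re-indexing handled by list_manager is deliberately left out.

```python
from collections import defaultdict


def group_by_standard_key(raw_summary: dict, context_sep: str = "|") -> dict:
    """Hypothetical: group context bearing keys under their standard key."""
    groups = defaultdict(dict)
    for key, value in raw_summary.items():
        # Everything before the first separator is the standard key.
        standard_key = key.split(context_sep, 1)[0]
        groups[standard_key][key] = value
    return dict(groups)


raw_summary = {
    "schedule.diagnosis|PDF.Anesthesia Record.Case Summary|0": "Right hip displacement",
    "schedule.diagnosis|PDF.Operative Note|7": "Displacement of right hip",
    "schedule.startTime|CSV|0": "07:30",
}
standard_key_groups = group_by_standard_key(raw_summary)
```

Each standard key now maps to all of its candidates, which is the input shape a reducer needs in order to pick one final value per key.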
Class variables
var context_group_priorities : dict[str, list[str]]
var context_groups : dict[str, dict[str, dict[str, str | vStr]]]
var context_rules : collections.abc.Sequence[ContextRule]
var context_sep : str
var list_manager : dict[str, list[tuple[int, int]]]
var managed_fields : dict[str, ManagedField]
var output : dict[str, str | vStr]
var output_dir : str | None
var patient_id : str
var raw_summary : dict[str, vStr]
var standard_key_groups : dict[str, dict[str, str | vStr]]
var standard_keys : list[str]
Methods
def expand_dependencies(self, this_field: ManagedField) ‑> collections.abc.Sequence[str]
-
Allow a 'non-list' field to depend on all values collected for a list field.
If a non-list field depends upon a list type field, replace the original wildcarded list key reference supplied in its dependencies with the list of indexed keys for which we actually collected data.
Example
Given a raw_summary containing data for three anesthesia providers, and a standard field "schedule.surgeon" with dependency "schedule.anesthesiaStaff[*].provider", expand the wildcarded list dependency into entries for the three provider values collected and return:
    [
        "schedule.anesthesiaStaff[0].provider",
        "schedule.anesthesiaStaff[1].provider",
        "schedule.anesthesiaStaff[2].provider",
    ]
Args
this_field
:ManagedField
- the managed field to process
Returns
list[str]
- Returns the original dependencies if the supplied field is itself a list field. Otherwise, replaces all wildcarded list type dependency references with the list of indexed keys generated by the dependency's collected values in _build_standard_key_groups().
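The wildcard expansion described for the non-list case can be sketched as a standalone function. This is an assumption-laden illustration rather than the method itself: expand_wildcards and its collected_keys argument are hypothetical, and the check for whether the field is itself a list field is omitted.

```python
import re


def expand_wildcards(dependencies, collected_keys):
    """Hypothetical: replace each '[*]' dependency with the indexed keys actually collected."""
    expanded = []
    for dep in dependencies:
        if "[*]" not in dep:
            expanded.append(dep)  # plain dependency, keep as-is
            continue
        prefix, suffix = dep.split("[*]", 1)
        pattern = re.compile(re.escape(prefix) + r"\[\d+\]" + re.escape(suffix))
        expanded.extend(k for k in collected_keys if pattern.fullmatch(k))
    return expanded


collected = [
    "schedule.anesthesiaStaff[0].provider",
    "schedule.anesthesiaStaff[0].role",
    "schedule.anesthesiaStaff[1].provider",
    "schedule.anesthesiaStaff[2].provider",
]
deps = expand_wildcards(["schedule.anesthesiaStaff[*].provider", "schedule.dos"], collected)
```

Only keys that match the wildcarded pattern exactly are included, so sibling elements such as the role entry are ignored.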
def reduce(self)
-
Reduces self.standard_key_groups by applying the settings defined in self.managed_fields.
This method iterates over each standard field in self.standard_key_groups. If the field is not managed and present in self.standard_keys, the default ManagedField will be applied (selects the most frequent value in the list of candidates). If the field is managed, its dependencies are recursively reduced and a final value is generated according to the preprocess and xform functions defined in its ManagedField definition.
Args
_input
:KeyGroups | None
- The input dictionary to reduce. If None, the method uses the standard_key_groups attribute.
_recursing_for
:list[str]
- A list of standard keys that are being recursively reduced due to their presence as a dependency in a different standard key's ManagedField definition. Used to avoid infinite recursion.
Returns
dict
- The reduced dictionary.
class LogExHandler (continue_on_error: bool, default_return=False, *, is_generator=False, **kwargs)
-
A decorator class to handle exceptions, log them, and optionally continue execution.
This decorator can be used to wrap functions and handle exceptions by logging them along with their trace information. It provides options to continue execution or stop it based on the configuration. It also supports generator functions.
Usage
    >>> @LogExHandler(True)
    ... def your_func(arg_1, arg_2, **kwargs):
    ...     pass
Args
continue_on_error
:bool
- If False, raises an additional exception after saving the log to disk to stop execution.
default_return
:Optional
- If a literal, returns the literal. If a string, checks if 'self.', 'args' or 'kwargs' is in default_return. If so, sets return_val = eval(default_return). If is_generator is False, this can be used to return an argument supplied to the original function for continued processing. If is_generator is True, eval(default_return) should generate a new and complete "args" object. This new "args" object will then be passed to another instance of the generator to allow processing to continue in a fully transparent fashion for the caller. NOTE: the new "args" object should be modified from the original to exclude the object that resulted in the original error. Otherwise, exceptions will be logged until depth >= gvars.MAX_EX_DEPTH.
KwArgs
is_generator
:bool
- Set to True if decorating a generator function that utilizes the "yield" statement. Otherwise, the LogExHandler wrapper immediately returns the generator object to the caller without evaluating any of the internal generator logic, and exceptions raised during iteration will not be caught or logged by this decorator. Modifies "default_return" behavior (see above).
max_depth
:int
- Set the maximum number of attempts at re-entering a wrapped generator. No effect when is_generator==False. Default is gvars.MAX_EX_DEPTH.
exit_code
:int
- Sets a custom exit code in implementations where continue_on_error=False. Should be a positive integer. Defaults to 1.
notify
:bool
- If True, capture logged exceptions in gvars.NOTIFICATIONS to send an email notification at the conclusion of the run. Default is False.
**kwargs
- Additional arbitrary keyword args are captured as class attributes. Allows the caller to pass in classes and/or objects not present in this module for use when evaluating a custom default_return (see above).
Example
    >>> import contextlib, io  # NOTE: required for stdout redirect during doctest evaluation
    >>> @LogExHandler(continue_on_error=True, default_return=0)
    ... def example_function(x, y):
    ...     return x / y
    >>> with contextlib.redirect_stdout(io.StringIO()):
    ...     test = example_function(10, 0)
    >>> test
    0
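The general pattern behind such a decorator can be sketched with the standard library alone. This is a simplified stand-in, not LogExHandler: the name log_and_continue and the logger name are hypothetical, the original exception is simply re-raised when continue_on_error is False, and generator support, string evaluated default_return values, exit codes, and notifications are omitted.

```python
import functools
import logging

logger = logging.getLogger("log_ex_sketch")  # hypothetical logger name
logger.addHandler(logging.NullHandler())     # keep the example quiet


def log_and_continue(continue_on_error: bool, default_return=None):
    """Hypothetical decorator: log any exception, then continue or re-raise."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception:
                # logger.exception records the message together with the traceback.
                logger.exception("Unhandled exception in %s", func.__name__)
                if not continue_on_error:
                    raise
                return default_return
        return wrapper
    return decorator


@log_and_continue(continue_on_error=True, default_return=0)
def safe_divide(x, y):
    return x / y


@log_and_continue(continue_on_error=False)
def strict_divide(x, y):
    return x / y


result = safe_divide(10, 0)
```

Using functools.wraps keeps the wrapped function's name and docstring intact, which matters when the log message refers to func.__name__.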
class LogExOverrideError (*args, **kwargs)
-
Raise this exception to force all nested LogExHandlers to propagate an error upward to the first wrapped function with continue_on_error=False.
Ancestors
- builtins.Exception
- builtins.BaseException
class ManagedField (standard_key: str, context_priority: Sequence[str] = ('CSV', 'PDF', 'DB'), xform: Callable[[str | vStr], str | vStr] = <function ManagedField.<lambda>>, generated: bool = False, dependencies: Sequence[str] = (), preprocess: Callable[[str | vStr, dict[str, str | vStr]], str | vStr] = <function ManagedField.<lambda>>, reducer: Callable[[Sequence[str | vStr]], str | vStr] = <function most_freq_element>)
-
Standardize, recombine, generate, and prioritize extracted data based on point of origin, current value, and pre-defined dependencies.
Useful for establishing enums, rejecting bad inputs, and accepting postprocedure data over preprocedure data, among other use cases. Use the standardize_field_value() function for establishing enums. Implemented at the client and facility level of specs/client_specs.py. Global definitions for the ClaimMaker UI use case are available as members of the specs subpackage, i.e.:
    import specs
    specs.BASE_MANAGED_FIELDS  # baseline field definitions
Args
standard_key
:str
- a key from a summary_specs entry
context_priority
:Sequence[str]
- A list used to prioritize the value selected from a list of vStr candidates based on vStr.ctx and/or point of origin provided in a summary key. Default is ("CSV", "PDF", "DB").
xform
:Callable[[str | vStr], str | vStr]
- transform to apply to produce a standard output from the provided input, if required. Defaults to a null operation (i.e. lambda x: x) to allow context only operations.
generated
:bool
- if true, calculate a value for this key from its dependencies and add it to the summary output even if the key is absent in the original summary provided to the FieldManager.
dependencies
:Sequence[str]
- list of standard field names that will be used to augment or construct the managed output
preprocess
:Callable[[vStr, dict[str, vStr]], vStr]
- takes raw value and the dict of dependency values as input. Preps raw value for xform. Defaults to a null operation (i.e. lambda x, _: x).
reducer
:Callable[[Sequence[vStr]], vStr]
- takes a list of candidate vStr values as input and returns the final output for this field. Defaults to most_freq_element().
Class variables
var context_priority : collections.abc.Sequence[str]
var dependencies : collections.abc.Sequence[str]
var generated : bool
var standard_key : str
Methods
def clone(self, list_idx: str) ‑> ManagedField
-
Clone this instance after replacing '*' with the supplied list_idx in standard_key and all dependencies.
Args
list_idx
:str
- an integer string e.g. "1"
Returns
ManagedField
- a new ManagedField instance
def preprocess(x, _) ‑> collections.abc.Callable[[str | vStr, dict[str, str | vStr]], str | vStr]
def reduce(self, candidates: dict[str, str | vStr]) ‑> tuple[str, str | vStr]
-
Call self.reducer and return the key and value of the selected candidate.
Args
candidates
- dictionary of candidate keys and values
Returns
tuple[str, str | vStr]
- the selected candidate key and value
def reducer(check_lists, tiebreak_func=<function vstr_confidence_tiebreak>, xform: collections.abc.Callable[[typing.Any], typing.Union[str, vStr, typing.Any]] = utilities.v_str.vStr) ‑> collections.abc.Callable[[collections.abc.Sequence[str | vStr]], str | vStr]
-
Return the most frequent element from a list of lists.
Args
check_lists
:list[list[Any]]
- A list of lists containing elements to check.
tiebreak_func
:Callable[[Any, Any], Any]
- A function to break ties between elements with the same frequency. Default is vstr_confidence_tiebreak.
xform
:Callable[[Any], str | vStr | Any]
- A transformation function applied to each element. Defaults to vStr.
Returns
Any
- The most frequent element after applying the transformation and tiebreak functions.
Example
    >>> check_lists = [["a", "b", "a"], ["a", "c", "b", "b"]]
    >>> tiebreak_func = lambda *args: sorted(args)[0]
    >>> xform = str.upper
    >>> most_freq_element(check_lists, tiebreak_func, xform)
    'A'
    >>> tiebreak_func = lambda *args: sorted(args)[-1]
    >>> most_freq_element(check_lists, tiebreak_func, xform)
    'B'
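One way to obtain the behavior shown in the example is to count transformed elements and pass the tied leaders to the tiebreak function. The sketch below is an independent reimplementation for illustration only: the name most_frequent is hypothetical, str replaces vStr as the default xform, and applying the tiebreak pairwise with functools.reduce is an assumption about how ties could be resolved.

```python
from collections import Counter
from functools import reduce
from itertools import chain


def most_frequent(check_lists, tiebreak_func, xform=str):
    """Hypothetical: most frequent transformed element across a list of lists."""
    counts = Counter(xform(item) for item in chain.from_iterable(check_lists))
    top = max(counts.values())  # raises ValueError when there are no elements
    tied = [item for item, n in counts.items() if n == top]
    # Pairwise application works for both two-argument and *args tiebreakers.
    return reduce(tiebreak_func, tied)


check_lists = [["a", "b", "a"], ["a", "c", "b", "b"]]
first = most_frequent(check_lists, lambda *args: sorted(args)[0], str.upper)
last = most_frequent(check_lists, lambda *args: sorted(args)[-1], str.upper)
```

In this data "A" and "B" both occur three times after the transform, so the tiebreak function alone decides the result.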
def xform(x) ‑> collections.abc.Callable[[str | vStr], str | vStr]
class ProviderIntegratorProtocol (*args, **kwargs)
-
Define externally accessed methods for the ProviderIntegrator class
Ancestors
- typing.Protocol
- typing.Generic
Subclasses
Class variables
var api_url : str
var full_name : vStr
var is_anes_provider : bool
var mode : str | None
var npi : str | vStr
var public_only : bool
Instance variables
prop last_api_response : dict[str, typing.Any] | None
-
last public API query response
prop last_url_params : dict[str, typing.Any]
-
last public API query parameters
prop query_name
-
name of last query function
Methods
def search(self, is_anes_provider: bool, full_name: str | vStr, npi: str | vStr = '', mode: str | None = None) ‑> tuple[vStr, vStr]
-
progressively search local DB and public API with supplied provider data and return (a) fully populated vStr objects for the provider name and NPI upon a successful lookup or (b) an 'original value only' vStr object for the provider name and a "null" vStr upon failure.
class ValueCacheDict (cache_checks: collections.abc.Sequence[CacheDictCheck], **kwargs)
-
A decorator providing flexible caching options for use in comprehensions.
Initially developed to provide a means to cache the parts of a full address when street address and city/state/zip are found in different sections of a form and thus labeled with independent bounding boxes, this class and its supporting class CacheDictCheck are implemented for general use. Useful in extending the basic functionality of comprehensions by allowing the current iteration to reference the results of prior iteration(s).
Args
cache_checks
:Sequence[CacheDictCheck]
- one CacheDictCheck instance per cached parameter. See CacheDictCheck for details.
ex_handler
:Callable[[Exception], bool]
- a user defined function to which all raised exceptions are passed. Return True to continue processing. Return False (default) to raise the exception to the calling thread. Default is lambda _: False.
Attributes
cache_dict
:dict[Any, Any]
- this attribute will be added to functions decorated with this class to provide convenient access to the cached attributes of each CacheDictCheck.
Instance variables
prop cache_dict : dict[collections.abc.Hashable, dict[collections.abc.Hashable, typing.Any]]
-
dict of dicts of form {[check.check_name]: {[check.cached]}}
class vStr (_str, context: str = '', confidence: float = 0.0, force_type: bool | None = None, og_value: str | None = None, verified: bool | None = None)
-
Built-in string extended with attributes used in data validation.
Attributes
ctx [str]: point of origin data for the value (context arg)
con [float]: confidence that the value is correct (confidence arg)
frc [bool]: force all descendants to class vStr (force_type arg)
ogv [str]: originalValue in data_entry_fields (og_value arg)
tru [bool]: isVerified in data_entry_fields (verified arg)
Ancestors
- builtins.str
Static methods
def cat(*args) ‑> vStr
-
equivalent to vStr.jn("", (args)). emulates ''.join(args)
def from_data_entry_dict(de_dict: dict[str, Any], verify_all: bool = False) ‑> vStr
-
Return a vStr constructed from a data_entry_fields object from the claimmaker DB and/or a validated claimmaker job dict
def from_nested(val: Any, is_verified: bool = False) ‑> str | vStr
-
Used when manually created cases contain data in nested columns with no corresponding entry in the data_entry_fields column.
returns a 1.0 confidence vStr with context 'DB.User' for 'non-null' vals i.e. when val != type(val)()
def jn(sep: str, iterable: Iterable[str | vStr]) ‑> vStr
-
emulates str.join()
def merge_attrs(iterable: Iterable[str | vStr]) ‑> tuple[str, float, bool, str | None, bool]
-
merge vStr attrs from multiple instances. returns pipe delimited string for ctx and mean for con or empty string / 0.0 if no vStr is present in input
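The idea of a string that carries context and confidence, and the merge rule described above for ctx and con, can be sketched with a small str subclass. This is not the vStr implementation: TaggedStr and merge_ctx_con are hypothetical names, and only two of the five attributes are modeled.

```python
class TaggedStr(str):
    """Hypothetical str subclass carrying a context and a confidence."""

    def __new__(cls, value, context="", confidence=0.0):
        obj = super().__new__(cls, value)
        obj.ctx = context
        obj.con = confidence
        return obj


def merge_ctx_con(items):
    """Hypothetical: pipe delimited ctx and mean con, or ('', 0.0) if nothing is tagged."""
    tagged = [item for item in items if isinstance(item, TaggedStr)]
    if not tagged:
        return "", 0.0
    ctx = "|".join(item.ctx for item in tagged)
    con = sum(item.con for item in tagged) / len(tagged)
    return ctx, con


a = TaggedStr("Aetna", context="PDF.Facesheet", confidence=0.5)
b = TaggedStr("PPO", context="CSV", confidence=1.0)
merged = merge_ctx_con([a, "plain text", b])

# Built-in str methods return plain str objects, so the attributes are lost.
upper_type = type(a.upper())
```

The last line shows why a wrapper around the built-in mutation methods is useful: without one, calls such as upper() or strip() silently drop the extra attributes.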
Instance variables
var con
var ctx
prop data_entry_dict
-
return a data_entry_fields dict representing this vStr
var frc
prop is_verified : bool
-
True if value has been validated by a user.
var ogv
var tru
Methods
def extend_context(self, suffix: str) ‑> str
-
Append suffix to current ctx value
def format(self, *args, **kwargs) ‑> vStr
-
If any of the supplied args or kwargs is of type vStr, return the str.format result as a vStr having attributes merged from all vStr inputs. If none of the inputs are
def mutation_decorator(self, mutation_func: Callable[..., str])
-
Decorates underlying str mutation functions (e.g. upper, lower, strip, split, etc.) to restore ctx and con attributes post mutation
def prepend_context(self, prefix: str) ‑> str
-
Prepend prefix to current ctx value
def replace_context(self, new_context: str) ‑> str
-
Replace the current ctx value with new_context
def set_custom_attrs(self, context: str, confidence: float, force_type: bool, og_value: str, verified: bool)
-
set custom attributes
("ctx", "con", "frc", "ogv", "tru")