Module utilities.library_utils

Tools for constructing and managing 'pdf_library' objects

Classes

class DateVersIsEmptyTuple (last_modified: ForwardRef('datetime'), version_id: ForwardRef('str'), is_empty: ForwardRef('bool'))

Last modified date, version id, and status for an s3 object.

Ancestors

  • builtins.tuple

Instance variables

var is_empty : bool

Alias for field number 2

var last_modified : datetime.datetime

Alias for field number 0

var version_id : str

Alias for field number 1
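
The field numbering above determines what positional access and unpacking return. A minimal sketch, assuming the class behaves like a plain typing.NamedTuple; the class below is a stand-in that mirrors the documented fields, not the module's own definition, and the values are invented for illustration:

```python
from datetime import datetime, timezone
from typing import NamedTuple


class DateVersIsEmptyTuple(NamedTuple):
    """Stand-in mirroring the documented field order."""
    last_modified: datetime  # field number 0
    version_id: str          # field number 1
    is_empty: bool           # field number 2


# Hypothetical values, for illustration only.
info = DateVersIsEmptyTuple(
    last_modified=datetime(2024, 1, 15, tzinfo=timezone.utc),
    version_id="example-version-id",
    is_empty=False,
)

# Fields can be read by name, by index, or by unpacking in field order.
last_modified, version_id, is_empty = info
```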

class KeyVersionTuple (key: ForwardRef('str'), version: ForwardRef('str'), check_results: ForwardRef('bool') = False, reject_reason: ForwardRef('str') = '')

Data elements required to copy and fully delete an object in a versioned S3 bucket.

Attributes

key : str
The S3 key for the object.
version : str
The S3 Version ID of the object.
check_results : bool
Default False. Set to True to generate an email alert and move the object to 'failed/' if its aws_s3_batch.S3Batch.send_results entry is absent or False.
reject_reason : str
Default is "". If set, reject the target object prior to aws_s3_batch processing.

Ancestors

  • builtins.tuple

Instance variables

var check_results : bool

Alias for field number 2

var key : str

Alias for field number 0

var reject_reason : str

Alias for field number 3

var version : str

Alias for field number 1
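
The two defaulted fields mean that only key and version are required at construction. A minimal sketch, assuming NamedTuple semantics; the class is a stand-in that mirrors the documented signature, and the keys, version IDs, and reject reason are hypothetical:

```python
from typing import NamedTuple


class KeyVersionTuple(NamedTuple):
    """Stand-in mirroring the documented fields and defaults."""
    key: str
    version: str
    check_results: bool = False
    reject_reason: str = ""


# Only key and version are required; the other fields fall back to defaults.
plain = KeyVersionTuple("incoming/example.pdf", "example-version-id")

# Tuples are immutable, so _replace returns a modified copy.
checked = plain._replace(check_results=True)

# A non-empty reject_reason marks the object for rejection.
rejected = KeyVersionTuple(
    "incoming/bad.pdf", "example-version-id", reject_reason="not a PDF"
)
```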

class PDFLibEntry (*, body: bytes, meta: dict[str, str] = <factory>, document_id: str = '', file_type: S3FileType = pdf)

Defines an entry in an S3Batch pdf_library.

Every PDFLibEntry object must link directly to an entry in an S3Batch send_result in order to be considered successfully extracted. If no send_result entry exists for a PDFLibEntry object, or the entry is False, an error is logged and an email notification is sent (ClaimMaker facilities only).

Attributes

body
bytes from a pdf file
meta
dict[str, str] of pdf file metadata
document_id
s3 key where pdf file is or will be stored
file_type
S3FileType to indicate how the entry should be processed
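
The send_result rule described for this class can be sketched as a simple filter. This is an illustration only: it assumes send_result behaves like a mapping from document_id to bool, which the documentation does not state, and EntrySketch and unconfirmed are hypothetical names rather than part of this module.

```python
from dataclasses import dataclass, field


@dataclass
class EntrySketch:
    """Hypothetical stand-in with some of the documented PDFLibEntry fields."""
    body: bytes
    meta: dict[str, str] = field(default_factory=dict)
    document_id: str = ""


def unconfirmed(entries: list[EntrySketch], send_results: dict[str, bool]) -> list[EntrySketch]:
    """Return the entries whose send_results value is absent or False."""
    return [e for e in entries if not send_results.get(e.document_id, False)]


# Hypothetical batch: "a" succeeded, "b" failed, "c" has no result at all.
entries = [EntrySketch(body=b"", document_id=name) for name in ("a", "b", "c")]
failed = unconfirmed(entries, {"a": True, "b": False})
```

In this sketch, failed holds the entries that would trigger the logged error and the email notification.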

Ancestors

Subclasses

Class variables

var body : bytes
var document_id : str
var file_type : S3FileType
var meta : dict[str, str]

Methods

def as_entry(self) ‑> PDFLibEntry

Dummy function returning self for compatibility with PDFLibProto.

def push_to_s3(self, s3_key: str, force=False)

Push self.body to S3 with the specified key. Also saves to disk if gvars.CURRENT_SPEC contains a valid "output_dir" entry.

NOTE: AWS credentials MUST be stored in the standard AWS environment variables or an exception will result.

Args

s3_key : str
S3 key to push to
force : bool
Optional. Force push even if NO_DB or PDF_EXT_RUN_MODE is set to 2. Defaults to False.

Inherited members

class PDFLibProto (*args, **kwargs)

Protocol for entries in an S3Batch pdf_library.

Attributes

body
bytes from a pdf file
meta
dict[str, str] of pdf file metadata
document_id
s3 key where pdf file is or will be stored
file_type
S3FileType to indicate how the entry should be processed
filename
the filename portion of the key of origin
last_modified
the last modified timestamp of the key of origin
as_entry
method to convert a PDFLibReference to a PDFLibEntry

Ancestors

  • typing.Protocol
  • typing.Generic

Subclasses

Class variables

var body : bytes
var document_id : str
var file_type : S3FileType
var meta : dict[str, str]

Instance variables

prop filename : str

the filename portion of the s3 key of origin

prop last_modified : str

the last modified timestamp of the s3 key of origin (str)

var pdf_reader

A PdfReader object. Raises TypeError if the file is not a PDF.

Methods

def as_entry(self) ‑> PDFLibEntry

Converts a PDFLibReference to a PDFLibEntry instance.
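
Because PDFLibProto is a typing.Protocol, a class satisfies it by providing matching members rather than by inheriting from it. A minimal sketch of that idea with a reduced, hypothetical protocol; ProtoSketch, EntrySketch, ReferenceSketch, and to_entries are illustrative names, and the empty body in ReferenceSketch.as_entry is a placeholder, not the real conversion:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


class ProtoSketch(Protocol):
    """Reduced, hypothetical version of the documented protocol."""
    document_id: str

    def as_entry(self) -> EntrySketch: ...


@dataclass
class EntrySketch:
    document_id: str
    body: bytes

    def as_entry(self) -> EntrySketch:
        # An entry is already an entry, so it returns itself.
        return self


@dataclass
class ReferenceSketch:
    document_id: str

    def as_entry(self) -> EntrySketch:
        # Placeholder conversion; a real reference would supply its body here.
        return EntrySketch(self.document_id, b"")


def to_entries(items: list[ProtoSketch]) -> list[EntrySketch]:
    """Normalise a mixed library to entries through the shared method."""
    return [item.as_entry() for item in items]


entry = EntrySketch("a.pdf", b"%PDF")
converted = to_entries([entry, ReferenceSketch("b.pdf")])
```

Neither stand-in class inherits from ProtoSketch, yet a static type checker accepts both wherever the protocol is expected.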

class PDFLibReference (*, document_id: str, body: bytes | None = None, meta: dict[str, str] | None = None, file_type: S3FileType = pdf, **kwargs)

Defines a reference to an S3 file object without actually downloading it.

If the body property is empty, the referenced object is downloaded from S3 when the body property is requested.

NOTE: Valid AWS credentials MUST be stored in the standard AWS environment variables or an exception will result.

PDFLibReferences are distinct from PDFLibEntries in that they are not required to link directly to an entry in an S3Batch send_result in order to be considered successfully extracted. This is useful for posting "information only" documents (e.g. PDF schedules) to placeholder cases, and for handling facilities (e.g. ms_chaph) that require two PDFs per patient but never send both PDFs in the same feed.

Attributes

document_id
s3 key where pdf file is or will be stored
file_type
S3FileType to indicate how the entry should be processed
body
bytes from a pdf file
meta
dict[str, str] of pdf file metadata

Ancestors

Instance variables

prop body : bytes

The bytes of the referenced object.

If the "body" property is empty, the reference is downloaded from S3 and cached upon first access. If gvars.CURRENT_SPEC contains a valid "output_dir" entry, a local copy is saved in the specified folder.
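
The download-on-first-access behaviour described above can be sketched with a cached property. This is not the module's implementation: LazyBodySketch is a hypothetical class, and the fetch callable stands in for the S3 download so that the example runs without AWS credentials. The local copy saved under "output_dir" is left out.

```python
from typing import Callable, Optional


class LazyBodySketch:
    """Hypothetical reference that fetches its body only when first asked."""

    def __init__(
        self,
        document_id: str,
        fetch: Callable[[str], bytes],
        body: Optional[bytes] = None,
    ) -> None:
        self.document_id = document_id
        self._fetch = fetch  # stand-in for the S3 download
        self._body = body

    @property
    def body(self) -> bytes:
        # None and b"" both count as empty and trigger a fetch.
        if not self._body:
            self._body = self._fetch(self.document_id)
        return self._body


calls = []


def fake_fetch(key: str) -> bytes:
    """Record each call so the caching can be observed."""
    calls.append(key)
    return b"%PDF-1.7 example"


ref = LazyBodySketch("incoming/example.pdf", fake_fetch)
first = ref.body   # triggers the fetch
second = ref.body  # served from the cached value
```

One consequence of testing for emptiness rather than for None is that a fetch returning empty bytes would be repeated on every access in this sketch.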

Methods

def as_entry(self) ‑> PDFLibEntry

Converts this PDFLibReference to a PDFLibEntry instance.

Inherited members

class RecordsLibEntry (*, body: bytes = b'', meta: dict[str, str] = <factory>, document_id: str = '', file_type: S3FileType = pdf, static_columns: dict[str, str | Sequence[str]] = <factory>, source_s3_keys: list[str] = <factory>, extracted_data: Any = None)

A dataclass to hold tabular data and metadata.

Ancestors

Class variables

var body : bytes
var extracted_data : Any
var source_s3_keys : list[str]
var static_columns : dict[str, typing.Union[str, typing.Sequence[str]]]

Instance variables

prop records : bytes | Any

Returns self.extracted_data if populated; otherwise returns self.body.
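
The fallback in the records property can be sketched as follows. RecordsSketch is a hypothetical reduction of the class, and it assumes "populated" means "not None", which the documentation leaves open:

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class RecordsSketch:
    """Hypothetical reduction of RecordsLibEntry to the two relevant fields."""
    body: bytes = b""
    extracted_data: Any = None

    @property
    def records(self) -> Any:
        # Assumption: "populated" is read as "is not None".
        if self.extracted_data is not None:
            return self.extracted_data
        return self.body


raw_only = RecordsSketch(body=b"a,b\n1,2\n")
parsed = RecordsSketch(body=b"a,b\n1,2\n", extracted_data=[{"a": "1", "b": "2"}])
```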

Inherited members

class S3FileType (value, names=None, *, module=None, qualname=None, type=None, start=1)

Marks a PDFLibProto object for processing as a schedule, a discrete demographics reference, a secondary discrete reference (other_csv), a parsable pdf, an image-based (docuvision) pdf, or a "display only" pdf (placeholder document).

Attributes

SCHDL
schedule. Handled by S3Batch.match_schedule_to_demographics().
DEMOS
demographics. Handled by S3Batch.match_schedule_to_demographics().
OTHCSV
other_csv. Handled by S3Batch.match_schedule_to_demographics().
PDF
parsed pdf. Handled by pdf_extractor.py.
DOCVIS
docuvision pdf. Handled by a DocuVisionIntegrator instance.
MANUAL
manual. Handled by S3Batch.add_placeholders().

Ancestors

  • builtins.str
  • enum.Enum

Class variables

var DEMOS
var DOCVIS
var MANUAL
var OTHCSV
var PDF
var SCHDL
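
Since the class derives from both str and enum.Enum, its members compare equal to plain strings and can be looked up by value. A minimal sketch: the member names and handler descriptions come from the list above, but the string values are assumptions chosen for illustration, and FileTypeSketch and HANDLERS are hypothetical names.

```python
from enum import Enum


class FileTypeSketch(str, Enum):
    """Stand-in enum; the string values here are assumed, not documented."""
    SCHDL = "schedule"
    DEMOS = "demographics"
    OTHCSV = "other_csv"
    PDF = "pdf"
    DOCVIS = "docuvision"
    MANUAL = "manual"


# Handler descriptions as listed in the Attributes section.
HANDLERS = {
    FileTypeSketch.SCHDL: "S3Batch.match_schedule_to_demographics()",
    FileTypeSketch.DEMOS: "S3Batch.match_schedule_to_demographics()",
    FileTypeSketch.OTHCSV: "S3Batch.match_schedule_to_demographics()",
    FileTypeSketch.PDF: "pdf_extractor.py",
    FileTypeSketch.DOCVIS: "a DocuVisionIntegrator instance",
    FileTypeSketch.MANUAL: "S3Batch.add_placeholders()",
}

# The str mixin allows comparison with plain strings and lookup by value.
is_pdf = FileTypeSketch.PDF == "pdf"
looked_up = FileTypeSketch("other_csv")
```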