Module utilities.library_utils
Tools for constructing and managing 'pdf_library' objects
Classes
class DateVersIsEmptyTuple (last_modified: datetime, version_id: str, is_empty: bool)
-
Last modified date, version id, and status for an s3 object.
Ancestors
- builtins.tuple
Instance variables
var is_empty : bool
-
Alias for field number 2
var last_modified : datetime.datetime
-
Alias for field number 0
var version_id : str
-
Alias for field number 1
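The field aliases above suggest a named tuple with three positional fields. The following is a minimal sketch of an equivalent declaration, not the module's own source; the sample values are hypothetical.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class DateVersIsEmptyTuple(NamedTuple):
    """Illustrative re-declaration; field order follows the documented aliases."""
    last_modified: datetime  # field number 0
    version_id: str          # field number 1
    is_empty: bool           # field number 2


# Hypothetical values, for illustration only.
status = DateVersIsEmptyTuple(
    last_modified=datetime(2024, 1, 15, 12, 30, tzinfo=timezone.utc),
    version_id="example-version-id",
    is_empty=False,
)

# Fields are reachable by name or by position, and the tuple unpacks in field order.
last_modified, version_id, is_empty = status
```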
class KeyVersionTuple (key: str, version: str, check_results: bool = False, reject_reason: str = '')
-
Data elements required to copy and fully delete an object in a versioned S3 bucket.
Attributes
key
:str
- The S3 key for the object.
version
:str
- The S3 Version ID of the object.
check_results
:bool
- Default False. Set to True to generate an email alert and move the object to 'failed/' if its aws_s3_batch.S3Batch.send_results entry is absent or False.
reject_reason
:str
- Default is "". If set, reject the target object prior to aws_s3_batch processing.
Ancestors
- builtins.tuple
Instance variables
var check_results : bool
-
Alias for field number 2
var key : str
-
Alias for field number 0
var reject_reason : str
-
Alias for field number 3
var version : str
-
Alias for field number 1
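As a rough illustration of the defaults described above, the sketch below re-declares the tuple with typing.NamedTuple and builds three instances. It is an assumption about the shape of the class, not its actual source, and the keys, version IDs, and reject reason are hypothetical.

```python
from typing import NamedTuple


class KeyVersionTuple(NamedTuple):
    """Illustrative re-declaration matching the documented signature."""
    key: str                   # S3 key for the object
    version: str               # S3 Version ID of the object
    check_results: bool = False
    reject_reason: str = ""


# Only key and version are required; the other fields fall back to their defaults.
plain = KeyVersionTuple("incoming/a.pdf", "v1")

# Opt in to the send_results check for this object.
checked = KeyVersionTuple("incoming/b.pdf", "v2", check_results=True)

# Mark an object for rejection before batch processing.
rejected = KeyVersionTuple("incoming/c.pdf", "v3", reject_reason="unsupported file type")
```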
class PDFLibEntry (*, body: bytes, meta: dict[str, str] = <factory>, document_id: str = '', file_type: S3FileType = pdf)
-
Defines an entry in an S3Batch pdf_library.
All PDFLibEntry objects are required to link directly to an entry in an S3Batch send_result in order to be considered successfully extracted. If no send_result entry exists or the send_result entry is False for a PDFLibEntry object, an error will be logged and an email notification sent (ClaimMaker facilities only).
Attributes
body
- bytes from a pdf file
meta
- dict[str, str] of pdf file metadata
document_id
- s3 key where pdf file is or will be stored
file_type
- S3FileType to indicate how the entry should be processed
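The send_result rule described for this class can be sketched as a small predicate. This assumes, purely for illustration, that send results are a dict keyed by document_id with boolean values; the real S3Batch structure is not shown here, and is_successfully_extracted is a hypothetical helper.

```python
def is_successfully_extracted(document_id: str, send_results: dict[str, bool]) -> bool:
    """Return True only if an entry exists for document_id and that entry is True.

    Assumption: send_results maps document_id -> bool. A missing entry and a
    False entry are both treated as failures, as the class description states.
    """
    return send_results.get(document_id, False) is True


# Hypothetical send results for three documents.
results = {"facility/a.pdf": True, "facility/b.pdf": False}

ok = is_successfully_extracted("facility/a.pdf", results)
failed = is_successfully_extracted("facility/b.pdf", results)
missing = is_successfully_extracted("facility/c.pdf", results)
```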
Ancestors
- PDFLibProto
- typing.Protocol
- typing.Generic
Subclasses
- RecordsLibEntry
Class variables
var body : bytes
var document_id : str
var file_type : S3FileType
var meta : dict[str, str]
Methods
def as_entry(self) ‑> PDFLibEntry
-
Dummy function returning self for compatibility with PDFLibProto.
def push_to_s3(self, s3_key: str, force=False)
-
Push self.body to S3 with the specified key. Also saves to disk if gvars.CURRENT_SPEC contains a valid "output_dir" entry.
NOTE: AWS credentials MUST be stored in the standard AWS environment variables or an exception will result.
Args
s3_key
:str
- S3 key to push to
force
:bool
- Optional. Force push even if NO_DB or PDF_EXT_RUN_MODE is set to 2. Defaults to False.
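One way to read the force argument is as an override for the settings that would otherwise suppress the push. The sketch below encodes that reading as a pure function. It assumes NO_DB behaves like a boolean flag and PDF_EXT_RUN_MODE like an integer, and should_push is a hypothetical helper, not part of the module.

```python
def should_push(no_db: bool, run_mode: int, force: bool = False) -> bool:
    """Decide whether a push would go ahead under the assumed reading.

    Assumption: a push is suppressed when NO_DB is set or when
    PDF_EXT_RUN_MODE equals 2, unless force is True.
    """
    suppressed = no_db or run_mode == 2
    return force or not suppressed


normal = should_push(no_db=False, run_mode=1)
suppressed_by_mode = should_push(no_db=False, run_mode=2)
forced = should_push(no_db=True, run_mode=2, force=True)
```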
Inherited members
class PDFLibProto (*args, **kwargs)
-
Protocol for entries in an S3Batch pdf_library.
Attributes
body
- bytes from a pdf file
meta
- dict[str, str] of pdf file metadata
document_id
- s3 key where pdf file is or will be stored
file_type
- S3FileType to indicate how the entry should be processed
filename
- the filename portion of the key of origin
last_modified
- the last modified timestamp of the s3 key of origin (str)
as_entry
- method to convert a PDFLibReference to a PDFLibEntry
Ancestors
- typing.Protocol
- typing.Generic
Subclasses
- PDFLibEntry
- PDFLibReference
Class variables
var body : bytes
var document_id : str
var file_type : S3FileType
var meta : dict[str, str]
Instance variables
prop filename : str
-
the filename portion of the s3 key of origin
prop last_modified : str
-
the last modified timestamp of the s3 key of origin (str)
var pdf_reader
-
A PdfReader object. Raises TypeError if the file is not a PDF.
Methods
def as_entry(self) ‑> PDFLibEntry
-
Converts a PDFLibReference to a PDFLibEntry instance.
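Because PDFLibProto is a typing.Protocol, any object with matching attributes can be handled the same way without sharing a base class. The sketch below shows that idea with a reduced, hypothetical protocol (EntryLike) and a hypothetical dataclass (DemoEntry). Deriving the filename from document_id is an assumption made only for this example.

```python
from dataclasses import dataclass, field
from posixpath import basename
from typing import Protocol


class EntryLike(Protocol):
    """Hypothetical subset of PDFLibProto, used only for this sketch."""
    body: bytes
    meta: dict[str, str]
    document_id: str


@dataclass
class DemoEntry:
    """Satisfies EntryLike structurally; it does not inherit from it."""
    body: bytes
    document_id: str
    meta: dict[str, str] = field(default_factory=dict)


def describe(entry: EntryLike) -> str:
    # Assumption: the filename is the last path segment of the S3 key.
    return f"{basename(entry.document_id)} ({len(entry.body)} bytes)"


summary = describe(DemoEntry(body=b"%PDF-1.7", document_id="facility/2024/report.pdf"))
```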
class PDFLibReference (*, document_id: str, body: bytes | None = None, meta: dict[str, str] | None = None, file_type: S3FileType = pdf, **kwargs)
-
Defines a reference to an S3 file object without actually downloading it.
If the body property is empty, the reference is downloaded from S3 when the body property is requested. NOTE: Valid AWS credentials MUST be stored in the standard AWS environment variables or an exception will result.
PDFLibReferences are distinct from PDFLibEntries in that they are not required to link directly to an entry in an S3Batch send_result in order to be considered successfully extracted. This is useful for posting "information only" documents (e.g. PDF schedules) to placeholder cases and for handling facilities (e.g. ms_chaph) that require two PDFs per patient but never send both PDFs in the same feed.
Attributes
document_id
- s3 key where pdf file is or will be stored
file_type
- S3FileType to indicate how the entry should be processed
body
- bytes from a pdf file
meta
- dict[str, str] of pdf file metadata
Ancestors
- PDFLibProto
- typing.Protocol
- typing.Generic
Instance variables
prop body : bytes
-
The bytes of the referenced object.
If the "body" property is empty, the reference is downloaded from S3 and cached upon first access. If gvars.CURRENT_SPEC contains a valid "output_dir" entry, a local copy is saved in the specified folder.
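The download-on-first-access behaviour of body can be sketched with a cached property. LazyReference and fake_fetch below are hypothetical stand-ins: the real class talks to S3, whereas this example injects a fetch function so it runs without AWS credentials.

```python
from typing import Callable, Optional


class LazyReference:
    """Hypothetical stand-in for PDFLibReference; not the library's implementation."""

    def __init__(self, document_id: str, fetch: Callable[[str], bytes],
                 body: Optional[bytes] = None) -> None:
        self.document_id = document_id
        self._fetch = fetch  # assumed downloader, injected for the example
        self._body = body

    @property
    def body(self) -> bytes:
        # Fetch only when nothing is cached yet, then reuse the cached bytes.
        if not self._body:
            self._body = self._fetch(self.document_id)
        return self._body


calls: list[str] = []


def fake_fetch(key: str) -> bytes:
    """Records each call so the caching can be observed."""
    calls.append(key)
    return b"%PDF-1.7 example"


ref = LazyReference("facility/schedule.pdf", fake_fetch)
first = ref.body   # triggers the fetch
second = ref.body  # served from the cache
```

Creating the reference costs nothing; the fetch happens only if some later step actually reads body.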
Methods
def as_entry(self) ‑> PDFLibEntry
-
Converts this PDFLibReference to a PDFLibEntry instance.
Inherited members
class RecordsLibEntry (*, body: bytes = b'', meta: dict[str, str] = <factory>, document_id: str = '', file_type: S3FileType = pdf, static_columns: dict[str, str | Sequence[str]] = <factory>, source_s3_keys: list[str] = <factory>, extracted_data: Any = None)
-
A dataclass to hold tabular data and metadata.
Ancestors
- PDFLibEntry
- PDFLibProto
- typing.Protocol
- typing.Generic
Class variables
var body : bytes
var extracted_data : Any
var source_s3_keys : list[str]
var static_columns : dict[str, typing.Union[str, typing.Sequence[str]]]
Instance variables
prop records : bytes | Any
-
Returns self.extracted_data if populated; otherwise, returns self.body.
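The fallback described for records can be sketched as follows. RecordsSketch is a hypothetical reduced dataclass, and treating "populated" as "is not None" is an assumption; the actual check may differ.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class RecordsSketch:
    """Hypothetical reduced version of RecordsLibEntry, for illustration only."""
    body: bytes = b""
    extracted_data: Any = None

    @property
    def records(self) -> Any:
        # Assumption: "populated" means extracted_data is not None.
        if self.extracted_data is not None:
            return self.extracted_data
        return self.body


raw_only = RecordsSketch(body=b"col_a,col_b\n1,2\n")
parsed = RecordsSketch(body=b"col_a,col_b\n1,2\n",
                       extracted_data=[{"col_a": "1", "col_b": "2"}])
```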
Inherited members
class S3FileType (value, names=None, *, module=None, qualname=None, type=None, start=1)
-
Mark a PDFLibProto object for processing as a schedule, a discrete demographics reference, a secondary discrete reference (other_csv), a parsable pdf, an image based (docuvision) pdf, or a "display only" pdf (placeholder document).
Attributes
SCHDL
- schedule. Handled by S3Batch.match_schedule_to_demographics().
DEMOS
- demographics. Handled by S3Batch.match_schedule_to_demographics().
OTHCSV
- other_csv. Handled by S3Batch.match_schedule_to_demographics().
PDF
- parsed pdf. Handled by pdf_extractor.py.
DOCVIS
- docuvision pdf. Handled by a DocuVisionIntegrator instance.
MANUAL
- manual. Handled by S3Batch.add_placeholders().
Ancestors
- builtins.str
- enum.Enum
Class variables
var DEMOS
var DOCVIS
var MANUAL
var OTHCSV
var PDF
var SCHDL
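Since S3FileType derives from both str and enum.Enum, its members compare equal to plain strings and can key a dispatch table. The sketch below uses a hypothetical enum, FileTypeSketch, whose member values are assumptions (the real values are not listed here); the handler names come from the attribute descriptions above.

```python
from enum import Enum


class FileTypeSketch(str, Enum):
    """Hypothetical stand-in for S3FileType; the member values are assumed."""
    SCHDL = "schedule"
    DEMOS = "demographics"
    OTHCSV = "other_csv"
    PDF = "pdf"
    DOCVIS = "docuvision"
    MANUAL = "manual"


# Handler names as documented for each member.
HANDLERS = {
    FileTypeSketch.SCHDL: "S3Batch.match_schedule_to_demographics()",
    FileTypeSketch.DEMOS: "S3Batch.match_schedule_to_demographics()",
    FileTypeSketch.OTHCSV: "S3Batch.match_schedule_to_demographics()",
    FileTypeSketch.PDF: "pdf_extractor.py",
    FileTypeSketch.DOCVIS: "DocuVisionIntegrator",
    FileTypeSketch.MANUAL: "S3Batch.add_placeholders()",
}


def handler_for(file_type: str) -> str:
    """Look up the documented handler for a member or its string value."""
    return HANDLERS[FileTypeSketch(file_type)]


pdf_handler = handler_for("pdf")
manual_handler = handler_for(FileTypeSketch.MANUAL)
```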