Module extract_s3
API for AWS and command line execution of aws_extractor.py
Purpose
Reads and processes the contents of an S3 "folder" (i.e. prefix) to extract data from PDFs, CSVs, FWFs, and more. Extracted data can be inserted into a PostgreSQL DB or posted as JSON to S3 or disk. Processed keys are relocated to either a "processed" or "failed" folder depending on whether the extraction was successful.
Usage
python extract_s3.py [cli options]
Use --help to display detailed documentation of the cli options.
Functions
def eval_specs(specs_dict: dict)
-
If a spec pickle value starts with 'eval::', perform eval() on the text following '::' and store the result back under the same key.
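The 'eval::' convention described above can be illustrated with a minimal sketch. It is not the module's actual implementation: the name eval_specs_sketch is hypothetical, and it assumes a flat dictionary (whether the real function recurses into nested values is not stated here).

```python
EVAL_PREFIX = "eval::"


def eval_specs_sketch(specs_dict: dict) -> dict:
    """Resolve 'eval::' string values in place and return the same dict.

    Hypothetical stand-in for eval_specs(); handles top-level values only.
    """
    for key, value in specs_dict.items():
        if isinstance(value, str) and value.startswith(EVAL_PREFIX):
            # eval() runs arbitrary code, so this is only safe on trusted specs.
            specs_dict[key] = eval(value[len(EVAL_PREFIX):])
    return specs_dict


# Plain values pass through untouched; prefixed strings are evaluated.
specs = eval_specs_sketch({"delimiter": ",", "widths": "eval::[4] * 3"})
```

Because eval() executes whatever expression the spec file contains, a convention like this is only appropriate for spec files from a trusted source.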
def main()
-
Parse cli args and execute the appropriate aws_extractor algorithm.
If the --to-extract-key cli arg is supplied, extract_cases() is called to process only that key. Otherwise, extract_buckets() is called, and all clients and facilities defined in the client specs (either built in or supplied via cli arg) are processed in series.
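The branch taken by main() can be sketched roughly as follows. The two stub functions stand in for the real extract_cases() and extract_buckets(), whose signatures are not documented here, and dispatch is a hypothetical helper name. Passing the namespace as keyword arguments follows the **vars(args) hand-off described for process_cli_args().

```python
import argparse


def extract_cases_stub(to_extract_key=None, **kwargs):
    # Stand-in for extract_cases(): would process a single key.
    return f"extract_cases({to_extract_key!r})"


def extract_buckets_stub(**kwargs):
    # Stand-in for extract_buckets(): would walk every client and facility.
    return "extract_buckets()"


def dispatch(args: argparse.Namespace) -> str:
    """Pick the extractor based on whether --to-extract-key was supplied."""
    kwargs = vars(args)
    if kwargs.get("to_extract_key"):
        return extract_cases_stub(**kwargs)
    return extract_buckets_stub(**kwargs)


parser = argparse.ArgumentParser()
parser.add_argument("--to-extract-key")

single = dispatch(parser.parse_args(["--to-extract-key", "inbox/report.pdf"]))
everything = dispatch(parser.parse_args([]))
```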
def process_aws_creds(args)
-
Check AWS env vars and add them to the args namespace if present.
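As a rough illustration of copying credentials from the environment into the namespace, consider the sketch below. The variable names are the standard AWS ones, but which variables the real function checks and which attribute names it sets are assumptions; process_aws_creds_sketch and its environ parameter are hypothetical.

```python
import argparse
import os

# Standard AWS credential variables; the real function may check a different set.
AWS_ENV_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_SESSION_TOKEN")


def process_aws_creds_sketch(args, environ=None):
    """Copy any AWS credential env vars onto args as lowercase attributes."""
    environ = os.environ if environ is None else environ
    for name in AWS_ENV_VARS:
        value = environ.get(name)
        if value:
            # Assumed naming: AWS_ACCESS_KEY_ID -> args.aws_access_key_id
            setattr(args, name.lower(), value)
    return args


# A dict is passed instead of os.environ so the example is deterministic.
args = process_aws_creds_sketch(
    argparse.Namespace(), {"AWS_ACCESS_KEY_ID": "example-id"}
)
```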
def process_cli_args() ‑> argparse.Namespace
-
Create command line arg parser and evaluate, validate, and resolve conflicts with environment variable settings.
Returns
argparse.Namespace
- command line arguments namespace supplied as kwargs to extract_buckets() or extract_cases() via **vars(args).
def process_client_specs(args)
-
Transmogrify pickle from --client-specs-file into valid client specs.
def process_s3_specs(args)
-
Transmogrify binary files from s3 into valid specs dictionaries.
Searches for args prefixed with 'aws_s3_key_' and ending with 'specs'. If found, the value is used as an S3 key to download a pickle file containing a specs dictionary. The dictionary is unpickled and evaluated to resolve any 'eval::' values. The resulting dictionary is stored in the args namespace with the 'aws_s3_key_' prefix removed, e.g. 'aws_s3_key_client_specs' becomes 'client_specs'.
Args
args
:argparse.Namespace
- command line arguments namespace.
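The prefix-stripping behaviour of process_s3_specs() can be sketched as below. This is an illustration under assumptions, not the module's code: the download callable is a hypothetical stand-in for the real S3 fetch, the function name is invented, and the 'eval::' resolution step is left out for brevity.

```python
import argparse
import pickle

PREFIX = "aws_s3_key_"


def process_s3_specs_sketch(args, download):
    """Replace aws_s3_key_*specs args with the unpickled specs they point to.

    `download` is a hypothetical callable mapping an S3 key to raw bytes.
    """
    # Snapshot the items first because setattr() mutates the namespace.
    for name, s3_key in list(vars(args).items()):
        if name.startswith(PREFIX) and name.endswith("specs") and s3_key:
            # pickle.loads() is unsafe on untrusted data; trusted buckets only.
            specs = pickle.loads(download(s3_key))
            setattr(args, name[len(PREFIX):], specs)
    return args


# An in-memory dict plays the role of the bucket in this example.
fake_bucket = {"specs/clients.pkl": pickle.dumps({"acme": {"facility": "north"}})}
args = argparse.Namespace(aws_s3_key_client_specs="specs/clients.pkl", verbose=True)
args = process_s3_specs_sketch(args, fake_bucket.__getitem__)
```

Injecting the download step keeps the renaming logic testable without network access; in practice it would wrap whatever S3 client the module uses.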
def register_fatal_except_hook()
-
Set custom sys.excepthook while preserving call to original.
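Wrapping sys.excepthook while still calling the original is a common pattern, sketched below. What the real hook does before delegating is not documented here, so the logging step, the log parameter, and the function name are all assumptions.

```python
import sys


def register_fatal_except_hook_sketch(log=print):
    """Install a hook that reports the exception, then defers to the old hook."""
    original_hook = sys.excepthook

    def fatal_hook(exc_type, exc_value, exc_traceback):
        # Assumed custom behaviour: emit a one-line fatal message first.
        log(f"FATAL: {exc_type.__name__}: {exc_value}")
        # Preserve the previous behaviour (by default, printing the traceback).
        original_hook(exc_type, exc_value, exc_traceback)

    sys.excepthook = fatal_hook
    return original_hook
```

Calling the function once at startup is enough; capturing the original hook in a closure means any hook installed earlier keeps working.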
def usage(parser: argparse.ArgumentParser)
-
Prints arg parse help and exits program if arg validation fails.
Args
parser
:argparse.ArgumentParser
- parser object to print help from.
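A helper of this kind is typically only a few lines. The sketch below assumes an exit status of 2 (the status argparse itself uses for argument errors); the real function's exit code is not documented here, and usage_sketch is a hypothetical name.

```python
import argparse
import sys


def usage_sketch(parser: argparse.ArgumentParser, exit_code: int = 2):
    """Print the parser's help text and terminate the program."""
    parser.print_help()
    # The exit status is an assumption; sys.exit raises SystemExit.
    sys.exit(exit_code)
```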