PDF Extractor

Documentation homepage for PDF Extractor

Repository Readme


pdf-extractor

A customizable framework for extracting, segmenting, contextualizing, and standardizing text data from PDF medical records

Purpose

Translate PDF, CSV, and other inputs into a desired JSON schema and insert the translated records into a Hank ClaimMaker DB or post them directly to S3.

Development Setup

AWS CLI

Download and install the AWS CLI

Python Environment (Windows)

  • Please note that most of the current development team is using Windows machines
  • Install miniconda
  • From a command prompt, run the following:
    • conda create --name=pdfext python=3.10.9
    • conda activate pdfext
    • conda install pip
    • pip install -r requirements-dev.txt
    • pip install notebook
  • For Mac/Linux:
    • pip install notebook
    • pip install psycopg2

VSCode Setup

  • Open pdfext.code-workspace using VSCode
  • Press Ctrl+Shift+P. In the command search bar, type "Select Interpreter", choose "Python: Select Interpreter", and then "Set at workspace level".
  • Choose your newly created pdfext environment from the list.

Environment Variables

The following parameters may ONLY be supplied as environment variables, and the first two MUST be supplied or errors will result. See the Command Line Argument / Environment Variable Reference below for additional options, paying particular attention to those marked Required and Recommended.

  • AWS_ACCESS_KEY_ID
    • Set to the access key ID of an active SDLC PowerUser/Admin AWS session or the IAM user hank-ai-clients-dev_importer (defined in the SDLC AWS account)
  • AWS_SECRET_ACCESS_KEY
    • Set to the secret access key of an active SDLC PowerUser/Admin AWS session or the IAM user hank-ai-clients-dev_importer
  • AWS_SESSION_TOKEN
    • OPTIONAL. If using SSO credentials, set to the session token of an active SDLC PowerUser/Admin AWS session. Not required for an IAM user.

Command Line Testing

  • Production operations all occur within a Docker Container, but either of the CLI Modules can be invoked directly from a command prompt on Windows, Linux, or Mac for rapid development and testing.
  • See the Command Line Argument / Environment Variable Reference below for additional information.

Basic ClaimMaker Client / Facility Setup

This section covers the basics of setting up a new client / facility in pdf_extractor. It does not attempt to provide an exhaustive description of all features available to the developer.

AWS Setup

  1. (new clients only) Follow the steps in scripts/create_client_secrets.py to generate the db and api secrets for the client in both DEV and PROD. Record the PROD secret names for use during Client Specs Setup.
  2. Follow the steps in scripts/create_s3_folders.py to generate the S3 folder structure for the new client / facility.

Client Specs Update

  1. (new clients only) Open app/specs/client_specs.py. Insert a new key in the builtin_client_specs ClientSpecs object that corresponds to the client folder name created in step 2 of AWS Setup excluding the final '/'.
  2. (new clients only) Set the value of the newly created key from step 1 to a ClientSpecStatic instance extended with an empty dictionary (i.e. ClientSpecStatic(...) | {}). The ClientSpecStatic instance should inherit from BASE_CLIENT_SPECS as shown for existing clients. Set the secret name parameters of the instance to match those recorded in step 1 of AWS Setup.
  3. Insert a new key in the extending dictionary that corresponds to the facility folder name created in step 2 of AWS Setup excluding the final '/'. Set its value to a FacilitySpec instance created according to Configure the FacilitySpec.

Configure the FacilitySpec

  1. Always inherit from BASE_FACILITY_SPEC as shown for existing clients, i.e. FacilitySpec(BASE_FACILITY_SPEC, ...settings...).
  2. Set the facility_name parameter to the value found in the shared.facilities table of the client's ClaimMaker DB. Typically, this is the uppercase variant of the facility key added in step 3 of Client Specs Update.
  3. Set the first_dos parameter to the startup date of service for the new facility (per client direction).
  4. If data will be extracted from PDFs using DocuVision, (a) set the use_docuvision parameter to True and (b) set the file_groups parameter to [cu.S3FileGroup(file_type=lu.S3FileType.DOCVIS)]. If the length of an average PDF is more than 100 pages, also set the max_keys parameter to 5 (recommended but not required).
  5. If data will be extracted from PDFs directly, (a) set the file_groups parameter to [cu.S3FileGroup(file_type=lu.S3FileType.PDF)] (optional if no CSVs will be processed as this is the default) and (b) set the extract_func parameter to extract_single_patient_pdfs or extract_multi_patient_pdfs (or a custom partial implementation of one or the other) according to the content of the source PDFs.
  6. If data will be extracted from a CSV schedule, (a) follow the steps described in the Schedule Mapping Template to populate app/specs/match_specs.py with an entry for the new facility, (b) set the match_specs_key parameter to the key of the match_specs entry you created, and (c) append cu.S3FileGroup(file_type=lu.S3FileType.SCHED, filename_test_expr=re.compile(r"(?i)^.*\.csv$")) to the file_groups list.
  7. If demographics data is provided in a separate CSV reference, (a) adjust the filename_test_expr shown for the schedule in step 6 to differentiate schedule files from demographic files and (b) append an additional cu.S3FileGroup with file_type=lu.S3FileType.DEMOS (and an appropriately differentiated filename_test_expr) to the file_groups list.

NOTE: The FacilitySpec setup described above covers only the most basic use cases and assumes (a) single-patient PDFs are comprehensive for a given encounter (no file combining), (b) multi-patient PDFs contain a first-line header with some encounter-identifying information followed by Page # of #, and (c) all file names are loosely equivalent to <optional non-numeric data>YYYYMMDD<optional additional info>.<pdf or csv>, with YYYYMMDD representing the date the file was bridged.
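
As a rough illustration only, a minimal entry in app/specs/client_specs.py might look like the sketch below. The client/facility keys, secret names, dates, and keyword argument names are all hypothetical; consult existing entries for the exact constructor signatures.

    # Hypothetical example entry; copy the pattern from existing clients rather
    # than from this sketch. All names and values below are illustrative.
    builtin_client_specs["newclient"] = ClientSpecStatic(
        BASE_CLIENT_SPECS,
        db_secret_name="prod/newclient/db",    # recorded in step 1 of AWS Setup (parameter name assumed)
        api_secret_name="prod/newclient/api",  # recorded in step 1 of AWS Setup
    ) | {
        "newfacility": FacilitySpec(
            BASE_FACILITY_SPEC,
            facility_name="NEWFACILITY",       # from shared.facilities in the ClaimMaker DB
            first_dos="2024-01-01",            # startup date of service (format per existing entries)
            use_docuvision=True,
            file_groups=[cu.S3FileGroup(file_type=lu.S3FileType.DOCVIS)],
        ),
    }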

Core Concepts

Functional Programming

  • Well, sort of... There are still a lot of OOP concepts in play, but all of the core modules rely on Mappings defined on a per-facility (or facility-type) basis to define the operations and transformations that must occur to successfully generate the desired output.
  • The Mappings rely heavily on the "partial" class defined in the built-in "functools" module to define facility-specific implementations of various functions and/or classes (see the sketch after this list).
  • The "LogExHandler" class defined in utils.py serves to some degree as a context manager for functional programming operations, allowing the caller to define fallbacks and re-enter generator loops after an exception has occurred. See "TEST CASE #4" in scripts/manual_tests.py for an example of its capabilities.

"Flattening" and "Unflattening" with JMESPath

  • https://jmespath.org/specification.html
  • The python port of JMESPath is relied upon heavily for flattening/unflattening operations. Understanding JMESPath search pattern syntax will be helpful in understanding many of the operations contained in the transform_utils.py module.
  • If a nested object is desired as the final output, a Summary Spec containing output keys that are valid JMESPath search patterns should be used along with the "summary_func" as_nested_dict.
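
As a quick, self-contained illustration of the "flattening" idea (the record below is invented), each flat output key doubles as a JMESPath search pattern against the nested data:

    import jmespath

    record = {"patient_info": {"patient": {"accn": "A-1001", "mrn": "12345"}}}

    # Each output key is itself a JMESPath search pattern.
    flat = {key: jmespath.search(key, record)
            for key in ("patient_info.patient.accn", "patient_info.patient.mrn")}
    # {'patient_info.patient.accn': 'A-1001', 'patient_info.patient.mrn': '12345'}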

Pickling Specs using Dill

  • All of the various "*_specs" mappings can be pickled using dill and supplied as kwargs to the aws_extractor.py extract_buckets function.
  • See scripts/ria_*_specs.py script files for pickling methods for each spec type.
  • Pickles for all spec types can be supplied via the --aws-s3-key-*-specs series of cli args.
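
A minimal sketch of the pickling step follows; the import path and output filename are assumptions, and the scripts/ria_*_specs.py files show the real workflow:

    import dill

    from app.specs.client_specs import builtin_client_specs  # import path assumed

    # Pickle the spec so it can be uploaded to S3 and referenced via
    # --aws-s3-key-client-specs, or passed locally via --client-specs-file.
    with open("client.bin", "wb") as fh:
        dill.dump(builtin_client_specs, fh)

    with open("client.bin", "rb") as fh:
        client_specs = dill.load(fh)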

"Sections" and "Tables"

  • A "section" refers to a logical grouping of lines of text that has been extracted from the larger body of a PDF. Unique specifications for delineating sections can be defined in the sect_specs mapping according to "facility type". The "section_specs_key" field in the facility's client_spec mapping (in specs.builtin_client_specs) determines which spec is utilized at runtime.
  • A "table" is loosely analogous to a "table" in a DB and consists of a list of dicts each of which represents a single "row" of data. As such, every dict MUST have the same set of keys if multiple "rows" are present, but it's not uncommon for a table to have only one "row". The data in a table is generated by passing a subset of the lines from a single section to a specified "interpreter" function. The "interpreter" to be used and the logic for selecting the lines from the source section are specified by section title AND facility type in specs.builtin_table_specs (driven by "section_specs_key" as above).
  • All extracted sections are included as separate "documents" in the right pane of the Hank UI unless specifically excluded via a transformation of "Raw Sections" defined in specs.builtin_transform_specs (e.g. the "Procedure Notes" section is dropped from the output via spec entry transform_specs["claimmaker_transform_spec"]["Raw Sections"]["section_transforms"]["pre"]["1.drop_tables"] because the individual tables in that section are included as documents rather than the section as a whole; see TableTransformer Tips below).
  • The key/value data from tables (e.g. Patient Name, MRN, DOB, etc.) are used to populate the Hank UI DB and/or to match documents with schedules and demographic downloads when such are available. Tables are NOT included as "documents" in the right pane of the UI by default, but a transformation specified in specs.builtin_transform_specs (see TableTransformer Tips below) can be applied in cases where the lines of a table (rather than a section) should be displayed (e.g. individual "Procedures" are displayed as documents via spec entry transform_specs["claimmaker_transform_spec"]["Raw Tables"]["Procedure"]["table_transforms"]["1.split_key"]).
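
To make the two terms concrete, here is an invented example of a "section" and the "table" an interpreter might produce from it (all values are illustrative only):

    # A "section": a logical grouping of lines extracted from the PDF.
    procedure_section = [
        "Procedure: Colonoscopy   Provider: DOE, JOHN MD   Start: 0800",
        "Procedure: EGD           Provider: DOE, JOHN MD   Start: 0845",
    ]

    # The corresponding "table": a list of dicts, one per row, all sharing the same keys.
    procedure_table = [
        {"Procedure": "Colonoscopy", "Provider": "DOE, JOHN MD", "Start": "0800"},
        {"Procedure": "EGD",         "Provider": "DOE, JOHN MD", "Start": "0845"},
    ]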

Core Functionality

specs

Client: "It's gotta look like this." Developer: "I got u fam..." *imports spec package
  • Provides TypedDict classes for customizing behaviors for all workflow steps.
  • Defines generic base instances for all TypedDict classes.
  • Includes a base definition for parsing Epic PDFs.
  • Defines builtin specs for all clients and facilities and functions for retrieving them at runtime.

aws_extractor.py

Serves as the head for all CLI modules. Two entrypoints: extract_cases() and extract_buckets().
  • Downloads PDFs from S3 for each client/facility and instantiates an S3Batch.
  • S3Batch contains methods for invoking all of the remaining core modules defined below.
  • Scheduled execution function extract_buckets yields instances of S3Batch back to caller.
  • On-demand file processing function extract_cases calls all remaining modules implicitly.
  • Each batch is sent to a pdf_extractor function as defined by facility specific specs in specs.builtin_client_specs.
  • Batches are determined by the filename portion of the S3 object key.
  • Extracts information contained in the filename and appends a Source File Metadata section to the text extracted from the PDF to include this information in the output.
  • Customizable "send_func" is defined in specs.builtin_client_specs for each facility. Target a db for uploading data or send directly to s3 as a flat file.
  • Manages s3 object location, moving objects from "unprocessed" to "processed" (when extraction is successful) or "failed" (if fatal errors are encountered or an object does not match any *_test_expr regular expression strings defined for the facility in specs/client.py).

pdf_extractor.py

Extract structured text information from PDF
  • Split multi-patient PDFs based on a supplied “split generator function” or combine multiple PDFs into a single case via a supplied "file match function"
  • Identifies repetitive “header/footer” lines and strips them from all pages. Any relevant information is appended to the end of the text output in a Header/Footer Info section.
  • Drops “continued” title lines
  • One function call to extract an entire library of PDFs in a one-or-more-PDFs-per-patient format
  • One function call to extract all patients from a library of multi-patient PDFs
  • Output contains a list of lines of text, a corresponding list of files of origin, and a "pagemap" used to select the proper pages when splitting a "multi-patient" pdf into patient specific children.

section_extractor.py

Separate extracted text into discrete sections + “everything else”
  • Sections are “stripped” from the main text into independent “notes”
  • Section names are dynamically determined (not hard coded)
  • Section starts and ends are identified based on supplied functions
  • Document “Stripping” support for removing unnecessary sections
  • Removes “Attributions” to reduce parsing/NLP complexity
    • E.g. Procedure:[AA.1] Colonoscopy --> Procedure: Colonoscopy
    • Epic specific; needs abstraction
  • All text remaining after section extraction concludes is added to a final “Anesthesia Record” section
  • 1 instance per patient record (created via factory function)
  • Automatically creates associated Table Extractor instance

table_extractor.py

Extract fields and tables from extracted sections (including the “everything else” section)
  • Transforms structured text into field/value pairs
  • Table names are dynamically determined
  • Processing driven by supplied “interpreter” function
  • A default “interpreter” is assigned based on section but alternate interpreters can be assigned based on table name (hard config; needs improvement)
  • Interpreter signature standardized with additional parameters passed via kwargs
  • Supports interpretation of entire section as single table or splitting into subtables
  • 1 instance per patient record; created automatically by SectionExtractor

table_transformer.py

Transform high dimensional field/table data into standard field outputs
  • Ingests patients from a dictionary of SectionExtractors (created via factory function) or a directory of json files
  • Groups data for all patients by section and table for interrogation and transformation
  • Manual mode with data analysis tools for data inspection, testing and debugging
  • Auto mode (default) applies all transformations defined in a “transform spec” dictionary and summarizes data into a summary dict and summary DF for all patients
  • Summaries generated based on function dictionaries
  • Currently developed summaries (see specs.builtin_summary_specs):
    • gui_summary_spec: creates json output suitable for ingestion by back end/GUI for lumberton multi-patient PDFs
    • default_summary_spec: creates a standardized output for use during automated audit activities
  • See TableTransformer Tips below for more info.

Support Libraries

matchops

Standard framework for matching records across datasets using pandas dataframes

matcher.py

  • DataFrameMatcher class definition
  • Standard class used to match records across datasets
  • Caller specifies left and right data sources and DataFrameConstructors, plus an ordered sequence of joins to be performed
  • After each join, successfully matched records are removed from the left dataframe and added to the output
  • Matching ends and output is returned when the left dataframe is empty (i.e. all records in left have been matched) or all defined joins have been exhausted
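
Conceptually, the loop works like the plain-pandas sketch below (the real DataFrameMatcher API is different; the data and join keys are invented):

    import pandas as pd

    # Toy data: schedule rows (left) matched against DB cases (right).
    left = pd.DataFrame({"mrn": ["1", "2", "3"],
                         "name": ["DOE,JOHN", "ROE,JANE", "POE,ALAN"]})
    right = pd.DataFrame({"mrn": ["1", "9"],
                          "name": ["DOE,JOHN", "POE,ALAN"],
                          "case_id": [101, 103]})

    joins = [["mrn"], ["name"]]          # ordered sequence of joins, strictest first
    matched, unmatched = [], left

    for keys in joins:
        if unmatched.empty:              # stop once every left record has matched
            break
        merged = unmatched.merge(right, on=keys, how="left",
                                 suffixes=("", "_r"), indicator=True)
        matched.append(merged[merged["_merge"] == "both"].drop(columns="_merge"))
        unmatched = merged.loc[merged["_merge"] == "left_only", unmatched.columns]

    output = pd.concat(matched, ignore_index=True)   # matched records, in join order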

constructor.py

  • DataFrameConstructor class definition
  • Standard class used to construct the dataframes utilized by the DataFrameMatcher class
  • Caller defines a data source and a pandas method to be used when attempting to construct the dataframe
  • Caller can optionally define a list of required columns; if an exception is encountered, an empty dataframe with those columns is returned.

standard_matchers.py

  • Contains general use matching specifications case_matcher and case_deduper.
  • case_matcher provides general purpose matching for combining data from disparate sources, e.g. matching CSV with DB data and then with PDF data.
  • case_deduper compares data between PDFs to prevent duplicate case creation when two PDFs are received for the same patient encounter.

dbops

Contains all DB related functionality including connections, selects, inserts and updates

push_analysis_jobs.py

  • Insert/update functions for tables analysis_jobs and analysis_job_claims

selects.py

  • Select functions for all tables
  • "Read only" db module

create_batch.py

  • Insert function for batches table

db_utils.py

  • Common DB functions
  • set_db_globals
  • db_check decorator
  • minimized_query convenience function
  • TrimEncoder, a custom JSONEncoder used to create a "preview" of the output for a case by trimming any value with a length > 100 chars to be 100 chars. Appends "..." to any trimmed value.
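
The actual TrimEncoder lives in db_utils.py; the stand-in below only illustrates the trimming idea with a custom JSONEncoder:

    import json

    class PreviewTrimEncoder(json.JSONEncoder):
        """Illustrative stand-in: trim long values to build a compact preview."""
        MAX_LEN = 100

        def _trim(self, obj):
            if isinstance(obj, str) and len(obj) > self.MAX_LEN:
                return obj[: self.MAX_LEN] + "..."
            if isinstance(obj, dict):
                return {key: self._trim(value) for key, value in obj.items()}
            if isinstance(obj, (list, tuple)):
                return [self._trim(value) for value in obj]
            return obj

        def iterencode(self, o, _one_shot=False):
            # Trim the structure before the standard encoder serializes it.
            return super().iterencode(self._trim(o), _one_shot)

    preview = json.dumps({"note": "x" * 250}, cls=PreviewTrimEncoder)
    # The 250-char value is cut to its first 100 chars plus "...".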

schemaops

Schema definitions and validators for jsonb DB columns patient_info, input, and schedule

*.model.json

  • All files ending with this suffix must contain a valid jsonschema draft 7 schema definition
  • The "*" portion of the filename should correspond to a DB column that implements the defined schema
  • All such files are added as validators by default when an instance of DefaultingValidatorGroup is created.

validators.py

  • DefaultingValidatorGroup class definition and support functions
  • When a DefaultingValidatorGroup instance is called, it verifies the values contained in the dict passed in as an argument, adding default values for all required fields if a default value is defined.
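
The exact implementation is in validators.py, but the standard jsonschema recipe for injecting defaults during validation looks roughly like this (the schema and field names are invented):

    from jsonschema import Draft7Validator, validators

    def extend_with_default(validator_class):
        """Standard jsonschema pattern: fill in defaults while validating."""
        validate_properties = validator_class.VALIDATORS["properties"]

        def set_defaults(validator, properties, instance, schema):
            for prop, subschema in properties.items():
                if "default" in subschema and isinstance(instance, dict):
                    instance.setdefault(prop, subschema["default"])
            yield from validate_properties(validator, properties, instance, schema)

        return validators.extend(validator_class, {"properties": set_defaults})

    DefaultingDraft7Validator = extend_with_default(Draft7Validator)

    schema = {"type": "object",
              "properties": {"status": {"type": "string", "default": "new"}}}
    record = {}
    DefaultingDraft7Validator(schema).validate(record)
    # record == {"status": "new"}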

CLI Module - extract_s3.py

  • Primary CLI module targeted by the Docker Container when populating the Hank UI DB
  • Can be run locally. See the Command Line Argument Reference for details.
  • Invoked via AWS EventBridge rules detailed below:
  • pdf-extractor[-dev][-client-reference]
    • standard scheduled runs at 10AM UTC. Processes ALL KEYS located in unprocessed/ folders for all clients and facilities defined in specs.builtin_client_specs
    • can be constrained to specific clients by setting the container command to ['--client-list', 'client1,client2,...'] in the Batch Job Input Transformer definition
  • pdf-extractor-run-on-upload[-dev]
    • triggered on S3 object creation if newly created key is prefixed to-extract/. Processes ONLY the newly created object.

Docker

  • Uses the standard python:3.10-slim-bullseye base image
  • The container version that runs is selected via version tag (e.g. v0.7.0) in the AWS Batch job definition and is manually configured.
  • To bump the version running in prod:
    • Using the AWS Batch console, create a new revision of the pdf-extractor job definition that targets the latest container version.
    • The job definition with the highest revision number is automatically selected when the job is triggered via EventBridge.

TableTransformer Tips

  • Two specs are passed to the constructor for the TableTransformer class defined in table_transformer.py:
    • A transform_spec from specs.builtin_transform_specs that defines an ordered series of section and table transforms to apply during preprocessing.
    • A summary_spec from specs.builtin_summary_specs that further reduces the transformed data into the final output.

What's a transform_spec?

  • Simply put, a transform_spec defines a series of transforms to apply to the "raw" section/table data that was supplied to the TableTransformer class constructor. A transform might: perform a pivot, copy a key/value from one table to another, drop a table, split a key (e.g. full address to street, city, state, etc.), or rename keys.
  • The transforms defined by a transform_spec occur in three steps: section "pre" transforms, table transforms, and section "post" transforms. Each of these steps is applied across the entire dataset in series (i.e. "pre" transforms for all sections, then table transforms for all sections, then "post" transforms for all sections), and sections are processed in the order in which they appear. THIS ORDERING IS CRITICAL!!! Transform B might target elements that are created by Transform A and later deleted by Transform C.
  • Each transform_spec has the following structure:
  • Section Name: The extracted name of a section from a PDF
    • "section_transforms" (literal, required): If no section transforms are required, the value should be set to {}. Defines transforms scoped at the Section level. As such, any transform that involves more than one table or section (e.g. moving data from one table to another, moving an entire table from one section to another, etc.) must be defined at this level.
      • "pre" (literal): transforms to apply PRIOR to any defined table transforms
        • Transform Function Reference (one or more): key is of form [alphanumeric identifier defining the order in which to apply the function].[name of a transform function defined in the TableTransformer class], e.g. "1.copy_keys", "2.move_tables", "zzz.drop_tables", etc.
          • transform function kwargs
      • "post" (literal): transforms to apply AFTER any defined table transforms
        • Transform Function Reference (one or more): as above
          • transform function kwargs
    • Table Name (0 or more): The extracted name of a table from the PDF that occurs within the parent section.
      • "table_transforms" (literal, required): included for extensibility, no other keys are currently defined under a Table Name
        • Transform Function Reference: as above except the referenced function must be defined in the Section class (in table_transformer.py)
          • transform function kwargs
  • Second Section Name: ...
  • One of the most common use cases for transforms is to disambiguate keys like "Name", "Address", "Provider", etc. "Name", for example, could refer to the name of the patient, provider, guarantor, or even the name of an insurance company depending on the table in which it appeared. To ensure the correct key's value is captured in the final output, it is often beneficial to rename these keys to something unique, e.g. converting "Name" to "GuarNm" in the Guarantor Information table.
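
Putting the pieces together, a transform_spec has roughly the shape sketched below. The transform function names and kwargs shown ("1.copy_keys" kwargs, "1.rename_keys", "key_map") are hypothetical placeholders; the real function names are the methods defined in the TableTransformer and Section classes.

    transform_spec = {
        "Guarantor Information": {                 # Section Name
            "section_transforms": {                # required, even if empty ({})
                "pre": {
                    # hypothetical kwargs for a section-level copy_keys transform
                    "1.copy_keys": {"keys": ["Name"], "target_table": "Guarantor"},
                },
                "post": {},
            },
            "Guarantor": {                         # Table Name within this section
                "table_transforms": {
                    # hypothetical: rename the ambiguous "Name" key to "GuarNm"
                    "1.rename_keys": {"key_map": {"Name": "GuarNm"}},
                },
            },
        },
        # "Second Section Name": { ... },
    }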

Hey! You didn't say anything about a "condense spec"

  • A third type of spec, a "condense spec", can be used to define the operations performed by a "condense_tables" transform. Getting to know them will make it a bit easier to understand how the summary_spec is applied. Here are the basics:
  • The purpose of the condense_tables transform is to convert a relatively unstructured, "as extracted" table into a standard format with predictable field names and values.
  • The "condense spec", the mapping parameter that defines this standard format, can be defined "inline" (within the transform_spec itself) or abstracted into a separate mapping (e.g. the "op_notes_condense_spec" in the specs/transform_specs.py module).
  • The top level keys in the spec are the keys that will appear in the transformed output.
  • The value of each entry in the spec is also a mapping. These "sub-mappings" each contain a "return" key containing a reference to a reduce function along with one or more additional keys. Each additional key is itself a "key search pattern" and its value is also a function reference. Each search pattern key is tested against the keys from all of the tables in the current section. The values for the matched keys are passed to the function reference to produce a list of candidate values collected by that particular search pattern. The results of all of the searches are then passed to the reduce function defined by the "return" key to select the final output from the list of candidates.
  • JEEZ! Maybe an example will help:

python "patient_info.patient.accn": { # this key will appear in the transform output "return": utils.most_freq_element, # final "reduce" function "Acct*ID": tu.stripped_first_element, # first search pattern "Hospital*Account": tu.stripped_first_element, # second search pattern "Acct*#": tu.stripped_first_element, # third search pattern "Acct*Num*": tu.stripped_first_element, # fourth search pattern }

Can we get to the summary_spec already?

  • A summary_spec is essentially a special case of a condense spec that is used to populate the "summary_dict" and "output" attributes of the TableTransformer instance. The majority of the entries in a summary spec are simply the condense spec for populating the "summary_dict" attribute, but there are three additional keys, 1 required and 2 optional, that are used to define the final transform for converting the "summary_dict" (which contains keys of form [condense spec key entry].[section name].[table name].[table instance #]) to the final "output" (which contains keys that are section/table agnostic):
  • summary_func (required, default="as_summary_dict"): a string matching the name of a function in the TableTransformer class that should be called to convert "summary_dict" into the desired "output". The default simply performs a simple "most frequent value" reduce operation to resolve any collisions resulting from the removal of the section/table context and copies the deduped results to "output".
  • summary_args (optional): a dict of kwargs to be passed to the summary func along with the output of the condense operation.
  • summary_key_addendum (optional): a list of additional keys eligible for inclusion in the "output" attribute of the TableTransformer instance.
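
A summary_spec, then, has roughly the shape below. The entries are illustrative only; real specs reference actual functions (e.g. utils.most_freq_element) rather than the placeholder strings shown here.

    summary_spec = {
        # condense-spec-style entries populate summary_dict (see the example above)
        "patient_info.patient.accn": {
            "return": "utils.most_freq_element",       # placeholder for a function reference
            "Acct*ID": "tu.stripped_first_element",    # placeholder for a function reference
        },
        # the three special keys drive the summary_dict -> output conversion
        "summary_func": "as_summary_dict",             # name of a TableTransformer method
        "summary_args": {},                            # extra kwargs passed to summary_func
        "summary_key_addendum": [],                    # extra keys eligible for "output"
    }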

Process Overview

Command Line Argument / Environment Variable Reference

  • The following "command line" options all have "twin" environment variables. For example, argument --aws-s3-bucket has a twin environment variable AWS_S3_BUCKET.
  • If the environment variable is defined, its value always overrides the command line value.
  • If the environment variable is undefined and the command line argument is supplied, the environment variable is set to the command line value for the current run.
  • See Environment Variables in Development Setup for additional required environment variables.

Required Parameters

--aws-s3-bucket bucket-name

  • The name of the AWS S3 Bucket to scan for source PDFs. The top level folders in this bucket should correspond to the top level keys in your client_specs (default client_specs can be found in specs.builtin_client_specs).

Recommended Parameters

--pdf-ext-run-mode [1 or 2]

  • Run mode 0 (default)
    • Normal production run.
  • Run mode 1
    • Populates the DEV database and moves DEV bucket keys to processed. Intended for end-to-end SDLC testing. Uses SDLC secrets and earlier start dates from specs/client_specs_dev, but does NOT override the db_push_dict send_func or set an output directory.
  • Run mode 2
    • Dumps output to disk (not the DB) and leaves all source keys in place. Overrides secrets, start dates, and send_funcs AND sets an output dir. Saves the final extract dict to disk along with all intermediate objects.

--log-dir /your/log/output/dir

  • The root directory path for logging. Default value is .local.

--client-list client1,client2

  • A comma separated list of client keys appearing in specs.builtin_client_specs. If supplied, only clients appearing in the list will be processed during this run. Commonly used to limit the execution scope during development and testing.

--facility-list facility1,facility2,facility3

  • A comma separated list of facility keys appearing in specs.builtin_client_specs. If supplied, only facilities appearing in the list will be processed during this run. Commonly used to limit the execution scope during development and testing.

Frequently Used Optional Parameters

--pdf-ext-dev-base-path /local/output/path/

  • Base path for all debug output when using run mode 2. Defaults to './local' with a warning regarding storage of PHI on unencrypted local resources.

--no-db

  • Switch parameter. Prevents DB and S3 updates for all Run Modes.

--no-notifications

  • Email notifications will NOT be sent if this arg is supplied.

--log-console

  • Switch parameter. Bypasses file logging and prints all log output to the console (stdout and stderr).

--client-specs-file /path/to/local/client.bin

  • Allows user to pass the path to a custom aws client specification stored on a local storage device. The passed file must contain a pickle of a valid client_specs config.

Other Optional Parameters

--max-keys integer-between-0-and-1000

  • The maximum number of AWS S3 object keys to process for a single facility. Overridden by both the MAX_KEYS env var AND the max_keys setting in client_specs; the client_specs setting has the highest precedence. Default value is the maximum, 1000.

--aws-s3-key-client-specs s3/path/to/client.bin

  • An AWS S3 key storing a pickle of a valid client_specs dict. If supplied, the value of specs.builtin_client_specs will be overwritten with the unpickled dict at runtime.

--aws-s3-key-match-specs s3/path/to/match.bin

  • An AWS S3 key storing a pickle of a valid match_specs dict. If supplied, specs.builtin_match_specs will be overwritten with the unpickled dict at runtime.

--aws-s3-key-section-specs s3/path/to/section.bin

  • An AWS S3 key storing a pickle of a valid section_specs dict. If supplied, the value of specs.builtin_section_specs will be overwritten with the unpickled dict at runtime.

--aws-s3-key-table-specs s3/path/to/table.bin

  • An AWS S3 key storing a pickle of a valid table_specs dict. If supplied, the value of specs.builtin_table_specs will be overwritten with the unpickled dict at runtime.

--aws-s3-key-transform-specs s3/path/to/transform.bin

  • An AWS S3 key storing a pickle of a valid transform_specs dict. If supplied, the value of specs.builtin_transform_specs will be overwritten with the unpickled dict at runtime.

--aws-s3-key-summary-specs s3/path/to/summary.bin

  • An AWS S3 key storing a pickle of a valid summary_specs dict. If supplied, the value of specs.builtin_summary_specs will be overwritten with the unpickled dict at runtime.

--to-extract-key s3/path/to/target.pdf

  • Specify a single key to extract when triggered by UI upload.

--api-secret-name your-secret-name

  • Overrides client spec entry 'api_secret_name'. Used when debugging production operations to prevent charges to client.

--aws-region us-east-1

  • Defaults to 'us-east-1'.