dataduct.s3 package
Submodules
dataduct.s3.s3_directory module
Base class for storing an S3 directory
- class dataduct.s3.s3_directory.S3Directory(path=None, s3_path=None)
Bases: object
S3 directory object that helps operate on a directory in S3
The S3Directory acts much like the S3File, but represents a directory. It tries to unify the concept of a directory stored locally with one stored in S3.
- s3_path
Outputs the s3_path
- upload_to_s3()
Uploads the directory to the s3 directory
dataduct.s3.s3_file module
Base class for storing an S3 file
- class dataduct.s3.s3_file.S3File(path=None, text=None, s3_path=None)
Bases: object
S3 file object that provides functions to operate on a file in S3
The S3 file unifies the concept of a file whose contents may be stored on the local file system, held as a string, or already in S3.
- file_name
The file name of this file
Returns: file_name – The file_name of this file
Return type: str
- s3_path
Outputs the s3_path
- text
Outputs the text of the associated file
Returns: result – The text of the file. Can be local or on S3
Return type: str
- upload_to_s3()
Sends file to URI. This action is idempotent.
Raises: ETLInputError – If no URL is provided
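The idea of unifying file sources can be illustrated with a minimal sketch. The helper `resolve_text` below is hypothetical and is not part of dataduct; it only covers the in-memory string and local file cases, and leaves out the S3 case because that would need a live connection.

```python
import os
import tempfile


def resolve_text(path=None, text=None):
    """Hypothetical helper: return contents from a string or a local path."""
    if text is not None:
        return text
    if path is not None:
        with open(path) as handle:
            return handle.read()
    raise ValueError('Provide either path or text')


# Contents supplied directly as a string.
from_string = resolve_text(text='SELECT 1;')

# Contents read from a file on the local file system.
with tempfile.NamedTemporaryFile('w', suffix='.sql', delete=False) as tmp:
    tmp.write('SELECT 2;')
from_file = resolve_text(path=tmp.name)
os.remove(tmp.name)
```

Callers of such a helper do not need to know where the contents came from, which is the same convenience the S3File description promises.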
dataduct.s3.s3_log_path module
Class for storing an S3 log path
- class dataduct.s3.s3_log_path.S3LogPath(key=None, bucket=None, uri=None, parent_dir=None, is_directory=False)
Bases: dataduct.s3.s3_path.S3Path
S3 log path for Data Pipeline
S3LogPath exists only to correct the way Data Pipeline uses S3 URIs. In most cases, a URI should end with a trailing slash to disambiguate prefixes. For instance, the first prefix below also matches the second unless it ends with a slash:
    s3://coursera-datapipeline/dev
    s3://coursera-datapipeline/dev_log_dir
However, if the log S3 URI ends with a slash, Data Pipeline adds another slash before appending subdirectories. These double slashes break boto.
- uri
Get the log directory path
Returns: s3_uri – s3_log path without the trailing ‘/’
Return type: str
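The prefix ambiguity and the doubled separator described for S3LogPath can be reproduced with plain strings. This is a sketch only: `matches_prefix` and `join_log_subdir` are hypothetical helpers, and the second one merely imitates a service that always inserts a '/' before a subdirectory. The key names are made up for illustration.

```python
def matches_prefix(key, prefix):
    """Return True if an S3-style key falls under the given prefix."""
    return key.startswith(prefix)


def join_log_subdir(log_uri, subdir):
    """Imitate a service that always inserts '/' before a subdirectory."""
    return log_uri + '/' + subdir


# Hypothetical keys in one bucket.
keys = ['dev/output.csv', 'dev_log_dir/run1.log']

# Without a trailing slash, 'dev' also matches keys under 'dev_log_dir'.
loose = [k for k in keys if matches_prefix(k, 'dev')]
# With a trailing slash, only keys inside 'dev/' match.
strict = [k for k in keys if matches_prefix(k, 'dev/')]

# A log URI that already ends in '/' picks up a second separator.
doubled = join_log_subdir('s3://coursera-datapipeline/dev_log_dir/', 'run1')
# A log URI without the trailing '/' yields a clean path.
clean = join_log_subdir('s3://coursera-datapipeline/dev_log_dir', 'run1')
```

Here `loose` contains both keys while `strict` contains only the one under `dev/`, which is why the log path is handed over without its trailing '/'.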
dataduct.s3.s3_path module
Base class for storing an S3 path
- class dataduct.s3.s3_path.S3Path(key=None, bucket=None, uri=None, parent_dir=None, is_directory=False)
Bases: object
S3 Path object that provides basic functions to interact with an S3 path
The s3 path ensures that there is a regular way of representing paths in s3, and distinguishing between directories and files.
Note
We don’t connect with S3 using boto for any checks here.
- append(new_key, is_directory=False)
Appends new key to the current key
Parameters: - new_key (str) – Key for the S3 path
- is_directory (bool) – Is the specified S3 path a directory
- uri
Returns the URI of the S3 path
Note
If the path is a directory, a ‘/’ is appended to the URI
Returns: S3 URI
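The directory convention can be sketched with a small stand-in class. `PathSketch` is hypothetical and is not dataduct's implementation; it assumes that `append` modifies the path in place and that the URI has the form `s3://bucket/key`. The bucket name is made up.

```python
class PathSketch(object):
    """Hypothetical stand-in for an S3 path; not dataduct's implementation."""

    def __init__(self, key, bucket, is_directory=False):
        self.key = key.strip('/')
        self.bucket = bucket
        self.is_directory = is_directory

    def append(self, new_key, is_directory=False):
        """Extend the current key and update the directory flag (assumed)."""
        self.key = self.key + '/' + new_key.strip('/')
        self.is_directory = is_directory

    @property
    def uri(self):
        """Build the URI, adding a trailing '/' only for directories."""
        base = 's3://{}/{}'.format(self.bucket, self.key)
        return base + '/' if self.is_directory else base


path = PathSketch(key='dev', bucket='example-bucket', is_directory=True)
dir_uri = path.uri    # directory, so the URI ends with '/'
path.append('output.csv')
file_uri = path.uri   # file, so no trailing '/'
```

Like the class it imitates, the sketch never contacts S3; it only normalizes strings.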
dataduct.s3.utils module
Shared utility functions
- dataduct.s3.utils.copy_within_s3(s3_old_path, s3_new_path, raise_when_no_exist=True)
Copies files from one S3 Path to another
Parameters: - s3_old_path (S3Path) – Source path of the file to be copied
- s3_new_path (S3Path) – Destination path of the file to be copied
- raise_when_no_exist (bool, optional) – Raise an error if the source file is not found
Raises: ETLInputError – If s3_old_path does not exist
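The effect of a flag like raise_when_no_exist can be sketched against a dictionary that stands in for a bucket. Everything here is hypothetical: `copy_key` is not the dataduct function, `MissingSourceError` stands in for ETLInputError, and returning False when the flag is off is an assumption about how a missing source might be skipped.

```python
class MissingSourceError(Exception):
    """Hypothetical stand-in for the error raised on a missing source."""


def copy_key(store, old_key, new_key, raise_when_no_exist=True):
    """Copy one entry inside a dict that stands in for a bucket."""
    if old_key not in store:
        if raise_when_no_exist:
            raise MissingSourceError('Source does not exist: ' + old_key)
        # Assumed behavior: skip quietly when the flag is off.
        return False
    store[new_key] = store[old_key]
    return True


store = {'dev/input.csv': 'a,b,c'}
copied = copy_key(store, 'dev/input.csv', 'dev/backup.csv')
skipped = copy_key(store, 'dev/missing.csv', 'dev/other.csv',
                   raise_when_no_exist=False)
```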
- dataduct.s3.utils.delete_dir_from_s3(s3_path)
Deletes a complete directory from s3
Parameters: s3_path (S3Path) – Path of the directory to be deleted
- dataduct.s3.utils.download_dir_from_s3(s3_path, local_path)
Downloads a complete directory from s3
Parameters: - s3_path (S3Path) – S3 path of the directory to be downloaded
- local_path (file_path) – Local path where the downloaded directory is written
- dataduct.s3.utils.get_s3_bucket(bucket_name)
Returns an S3 bucket object from boto
Parameters: bucket_name (str) – Name of the bucket to be read
Returns: bucket – Boto S3 bucket object
Return type: boto.s3.bucket.Bucket
- dataduct.s3.utils.read_from_s3(s3_path)
Reads the contents of a file from S3
Parameters: s3_path (S3Path) – Input path of the file to be read
Returns: results – Contents of the file as a string
Return type: str
- dataduct.s3.utils.upload_dir_to_s3(s3_path, local_path, filter_function=None)
Uploads a complete directory to s3
Parameters: - s3_path (S3Path) – S3 path that the directory is uploaded to
- local_path (file_path) – Local path of the directory to be uploaded
- filter_function (function) – Function used to filter out directories that should not be uploaded
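One way a directory filter could work during a local walk is sketched below. The exact contract of filter_function is not stated here, so the sketch assumes it receives a list of directory names and returns the ones to keep. Both `skip_hidden` and `list_upload_keys` are hypothetical helpers, and nothing is sent to S3.

```python
import os
import shutil
import tempfile


def skip_hidden(dir_names):
    """Hypothetical filter: keep directories that do not start with '.'."""
    return [name for name in dir_names if not name.startswith('.')]


def list_upload_keys(local_path, filter_function=None):
    """Collect relative file paths, pruning directories via the filter."""
    keys = []
    for root, dirs, files in os.walk(local_path):
        if filter_function is not None:
            # Assigning to dirs[:] stops os.walk from descending into
            # the directories that the filter removed.
            dirs[:] = filter_function(dirs)
        for file_name in files:
            full_path = os.path.join(root, file_name)
            rel = os.path.relpath(full_path, local_path)
            keys.append(rel.replace(os.sep, '/'))
    return sorted(keys)


# Build a small directory tree to walk.
base = tempfile.mkdtemp()
os.makedirs(os.path.join(base, 'scripts'))
os.makedirs(os.path.join(base, '.git'))
for rel_path in ('scripts/load.sql', '.git/config'):
    with open(os.path.join(base, *rel_path.split('/')), 'w') as handle:
        handle.write('placeholder')

all_keys = list_upload_keys(base)
filtered_keys = list_upload_keys(base, filter_function=skip_hidden)
shutil.rmtree(base)
```

With the filter in place, files under `.git` are left out, so only `scripts/load.sql` would be a candidate for upload.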
- dataduct.s3.utils.upload_to_s3(s3_path, file_name=None, file_text=None)
Uploads a file to S3
Parameters: - s3_path (S3Path) – Output path of the file to be uploaded
- file_name (str) – Name of the file to be uploaded to s3
- file_text (str) – Contents of the file to be uploaded