dataduct.s3 package

Submodules

dataduct.s3.s3_directory module

Base class for storing an S3 directory

class dataduct.s3.s3_directory.S3Directory(path=None, s3_path=None)

Bases: object

S3 directory object that helps operate on a directory in S3

S3Directory acts much like S3File, but represents a directory. It unifies the concept of a directory stored locally with one stored in S3.

s3_path

Outputs the s3_path

upload_to_s3()

Uploads the local directory to the S3 directory

dataduct.s3.s3_file module

Base class for storing an S3 file

class dataduct.s3.s3_file.S3File(path=None, text=None, s3_path=None)

Bases: object

S3 File object that provides functions to operate with a file on S3

S3File unifies the concept of a file whose contents may be stored on the local file system, held in a string, or already present in S3.
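As a rough illustration of the "unified file" idea, the sketch below resolves a file's contents from either a local path or an in-memory string. The `resolve_text` helper is hypothetical and is not part of dataduct; it assumes exactly one of the two sources is supplied, and it leaves out the S3 case entirely.

```python
import os
import tempfile


def resolve_text(path=None, text=None):
    """Hypothetical helper (not dataduct API): return file contents from
    a local path or from a string, whichever one was supplied."""
    if (path is None) == (text is None):
        # Assumption: exactly one source must be given.
        raise ValueError('Provide exactly one of path or text')
    if text is not None:
        return text
    with open(path) as handle:
        return handle.read()


# Usage: the same call works for a string and for a file on disk.
from_string = resolve_text(text='select 1;')

with tempfile.NamedTemporaryFile('w', suffix='.sql', delete=False) as tmp:
    tmp.write('select 2;')
from_disk = resolve_text(path=tmp.name)
os.remove(tmp.name)
```

Callers never need to know where the contents came from, which is the property the class description points at.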

file_name

The file name of this file

Returns:file_name – The file_name of this file
Return type:str

s3_path

Outputs the s3_path

text

Outputs the text of the associated file

Returns:result – The text of the file. Can be local or on S3
Return type:str

upload_to_s3()

Sends file to URI. This action is idempotent.

Raises:ETLInputError – If no URI is provided

dataduct.s3.s3_log_path module

Class for storing an S3 log path

class dataduct.s3.s3_log_path.S3LogPath(key=None, bucket=None, uri=None, parent_dir=None, is_directory=False)

Bases: dataduct.s3.s3_path.S3Path

S3 log path for Data Pipeline

S3LogPath exists only to correct the way Data Pipeline uses S3 URIs. In most cases, one should use a trailing slash to disambiguate prefixes. For instance, the first prefix below also matches the second unless it ends with a slash:

s3://coursera-datapipeline/dev
s3://coursera-datapipeline/dev_log_dir

However, if one adds a trailing slash to the log S3 URI, Data Pipeline adds another slash before appending subdirectories. These double slashes break boto.
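The prefix ambiguity and the doubled separator described above can be reproduced with plain string operations. This is only a sketch: `join_subdir` and `normalize_log_uri` are hypothetical helpers, `node_1` is a made-up subdirectory name, and the way Data Pipeline appends subdirectories is assumed to be simple concatenation with a '/' separator.

```python
def join_subdir(log_uri, subdir):
    """Assumed model of how a subdirectory gets appended to a log URI."""
    return log_uri + '/' + subdir


def normalize_log_uri(log_uri):
    """Drop trailing slashes so that joining never doubles them."""
    return log_uri.rstrip('/')


dev = 's3://coursera-datapipeline/dev'
dev_log_dir = 's3://coursera-datapipeline/dev_log_dir'

# Without a trailing slash, the first prefix also matches the second.
ambiguous = dev_log_dir.startswith(dev)
# With a trailing slash, the two prefixes are distinct.
distinct = not dev_log_dir.startswith(dev + '/')

# A trailing slash on the log URI leads to a doubled slash after joining.
doubled = join_subdir(dev_log_dir + '/', 'node_1')
# Stripping the trailing slash first keeps the joined URI clean.
clean = join_subdir(normalize_log_uri(dev_log_dir + '/'), 'node_1')
```

Stripping the trailing separator before handing the URI over is the behaviour the `uri` property below documents.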

uri

Get the log directory path

Returns:s3_uri – s3_log path without the trailing ‘/’
Return type:str

dataduct.s3.s3_path module

Base class for storing an S3 path

class dataduct.s3.s3_path.S3Path(key=None, bucket=None, uri=None, parent_dir=None, is_directory=False)

Bases: object

S3 Path object that provides basic functions to interact with an S3 path

The s3 path ensures that there is a regular way of representing paths in s3, and distinguishing between directories and files.

Note

We don’t connect with S3 using boto for any checks here.

append(new_key, is_directory=False)

Appends new key to the current key

Parameters:
  • new_key (str) – Key for the S3 path
  • is_directory (bool) – Whether the specified S3 path is a directory

uri

Returns the uri of the S3 path

Note

If the path is a directory, a ‘/’ is appended to the URI

Returns:S3 URI
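The directory convention can be illustrated with a small formatting function. This is a minimal sketch, not dataduct's implementation: `build_s3_uri` and the bucket and key names are hypothetical. Like S3Path, it works purely on strings and never contacts S3.

```python
def build_s3_uri(bucket, key, is_directory=False):
    """Hypothetical helper (not dataduct API): format an S3 URI, ending
    directory URIs with exactly one '/'."""
    uri = 's3://{}/{}'.format(bucket, key.strip('/'))
    return uri + '/' if is_directory else uri


file_uri = build_s3_uri('my-bucket', 'scripts/load.sql')
dir_uri = build_s3_uri('my-bucket', 'scripts/', is_directory=True)
```

Here `file_uri` has no trailing separator, while `dir_uri` ends in a single '/' even though the key passed in already had one.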

dataduct.s3.utils module

Shared utility functions

dataduct.s3.utils.copy_within_s3(s3_old_path, s3_new_path, raise_when_no_exist=True)

Copies files from one S3 Path to another

Parameters:
  • s3_old_path (S3Path) – Source path of the file to be copied
  • s3_new_path (S3Path) – Destination path of the file to be copied
  • raise_when_no_exist (bool, optional) – Raise error if file not found
Raises:ETLInputError – If s3_old_path does not exist

dataduct.s3.utils.delete_dir_from_s3(s3_path)

Deletes a complete directory from s3

Parameters:s3_path (S3Path) – Path of the directory to be deleted

dataduct.s3.utils.download_dir_from_s3(s3_path, local_path)

Downloads a complete directory from s3

Parameters:
  • s3_path (S3Path) – Input path of the directory to be downloaded
  • local_path (file_path) – Output path of the directory to be downloaded

dataduct.s3.utils.get_s3_bucket(bucket_name)

Returns an S3 bucket object from boto

Parameters:bucket_name (str) – Name of the bucket to be read
Returns:bucket – Boto S3 bucket object
Return type:boto.s3.bucket.Bucket

dataduct.s3.utils.read_from_s3(s3_path)

Reads the contents of a file from S3

Parameters:s3_path (S3Path) – Input path of the file to be read
Returns:results – Contents of the file as a string
Return type:str

dataduct.s3.utils.upload_dir_to_s3(s3_path, local_path, filter_function=None)

Uploads a complete directory to s3

Parameters:
  • s3_path (S3Path) – Output path of the directory to be uploaded
  • local_path (file_path) – Input path of the directory to be uploaded
  • filter_function (function) – Function to filter out directories

dataduct.s3.utils.upload_to_s3(s3_path, file_name=None, file_text=None)

Uploads a file to S3

Parameters:
  • s3_path (S3Path) – Output path of the file to be uploaded
  • file_name (str) – Name of the file to be uploaded to s3
  • file_text (str) – Contents of the file to be uploaded

Module contents