turntable package¶

Submodules¶

turntable.press module¶

The press module is used to create Record Collections.

class turntable.press.Record(**kwargs)¶

Bases: turntable.press.RecordSetter, turntable.press.SeriesLoader

Record is a container object with the special property “series”. Any property added to a Record is also added to its pandas.Series.

series : pandas.Series: container for parameters set to the instance

load : assigns items of **kwargs to the class and to the series parameter
_set_attributes : assigns items of a dictionary to the class and to the series parameter
runMethod : runs a method by a string call

Let's see how we can add a property to the Record object:

>>> from turntable.press import Record
>>> record = Record(first_item='one')
>>> record.second_item = 'two'
>>> print(record.series)

mint = True¶

class turntable.press.RecordPress(pickle=True, pickle_path='./tmp')¶

Bases: object

This class auto-serializes any attribute assigned to an instance and clears it from memory. When an attribute is accessed via the dot operator, it is read back from disk.

pickle : Boolean [True]: if False, the instance will behave as a normal class
pickle_path : string [‘./tmp’]: the path under which the files will be stored

clean_disk(): deletes all files stored by the instance
clean_memory(): sets the in-memory attribute values to None, reducing the memory footprint

This class can be subclassed for use elsewhere:

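>>> import turntable.utils
>>> from turntable.press import RecordPress
>>>
>>> class NewClass(RecordPress):
...     def __init__(self, pickle=True, pickle_path='./tmp'):
...         self.pickle = pickle
...         # directory that holds the files stored by this class
...         self.class_path = turntable.utils.path_to_filename(
...             pickle_path + '/' + self.__class__.__name__)[0]
...         self.pickles = []
...
>>> newClass = NewClass()
>>> newClass.x = 10        # serialized to disk
>>> y = newClass.x         # read back from disk
>>> newClass.clean_disk()  # delete the stored files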

clean_disk()¶: removes all files and folders under the class_path directory

clean_memory()¶: sets every attribute named in the pickles list to None. This is the default behaviour, so the method is redundant.
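The disk-backed behaviour described for RecordPress can be illustrated with a minimal standard-library sketch. This is not the library's implementation: the class name DiskBackedSketch, the one-file-per-attribute layout, and the ".pkl" suffix are all assumptions made for illustration.

```python
import os
import pickle
import shutil
import tempfile


class DiskBackedSketch(object):
    """Hypothetical stand-in: pickles each public attribute to its own file."""

    def __init__(self, pickle_path):
        # Bypass our own __setattr__ for the bookkeeping attributes.
        object.__setattr__(self, "_dir", os.path.join(pickle_path, type(self).__name__))
        object.__setattr__(self, "_names", [])
        os.makedirs(self._dir, exist_ok=True)

    def _file_for(self, name):
        return os.path.join(self._dir, name + ".pkl")

    def __setattr__(self, name, value):
        # Write the value to disk instead of keeping it on the instance.
        with open(self._file_for(name), "wb") as handle:
            pickle.dump(value, handle)
        if name not in self._names:
            self._names.append(name)

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for disk-backed names.
        if name.startswith("_") or name not in self._names:
            raise AttributeError(name)
        with open(self._file_for(name), "rb") as handle:
            return pickle.load(handle)

    def clean_disk(self):
        # Remove every stored file and forget the attribute names.
        shutil.rmtree(self._dir, ignore_errors=True)
        del self._names[:]


store = DiskBackedSketch(tempfile.mkdtemp())
store.x = 10
print(store.x)        # value is loaded from disk on access
store.clean_disk()
```

Because the value never lives on the instance, memory use stays flat no matter how large the stored objects are; the cost is one file read per attribute access.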

class turntable.press.RecordSetter(**kwargs)¶

RecordSetter provides a simple interface for initializing arguments passed in kwargs and a run method for running a class method by name.

kwargs : name : value

RecordSetter is a general Python class that assigns the items of **kwargs as attributes of itself.

>>> from turntable.press import RecordSetter
>>> obj = RecordSetter(name='me')
>>> print(obj.name)
me

load(**kwargs)¶

Takes an instance of Record() and named arguments from **kwargs, and returns the record instance with the named arguments added to it.

**kwargs : named arguments: first_arg = 1, second_arg = ‘two’

record.first_arg -> 1
record.second_arg -> ‘two’

>>> import turntable
>>> record = turntable.press.Record(first_arg = 1)
>>> record = record.load(second_arg = 'two')
>>> record.series
first_arg       1
second_arg    two

run_method(method)¶: Calls the specified method by name
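The two ideas behind RecordSetter, turning keyword arguments into attributes and calling a method by its string name, can be sketched with setattr and getattr. The class KwargsSetterSketch and its greet method are hypothetical and exist only for this illustration; the real class may differ in details.

```python
class KwargsSetterSketch(object):
    """Hypothetical stand-in for a kwargs-to-attributes class."""

    def __init__(self, **kwargs):
        self.load(**kwargs)

    def load(self, **kwargs):
        # Assign each named argument as an instance attribute.
        for name, value in kwargs.items():
            setattr(self, name, value)
        return self  # returning self allows chained calls

    def run_method(self, method):
        # Look the method up by its string name, then call it.
        return getattr(self, method)()

    def greet(self):
        return "hello " + self.name


obj = KwargsSetterSketch(name="me").load(mood="fine")
print(obj.run_method("greet"))  # → hello me
```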

class turntable.press.SeriesLoader¶

Bases: object

SeriesLoader assigns given properties to a special pandas.Series property: self.series.

series : all attributes of the class get added to an internal pandas series

This class can be encapsulated to be used elsewhere

>>> from turntable.press import SeriesLoader
>>> series_loader = SeriesLoader()
>>> series_loader.one = 'one'
>>> print(series_loader.series)

turntable.press.build_collection(df, **kwargs)¶

Generates a list of Record objects given a DataFrame. Each Record instance has a series attribute which is a pandas.Series of the same attributes in the DataFrame. Optional data can be passed in through kwargs which will be included by the name of each object.

df : pandas.DataFrame
kwargs : alternate arguments to be saved by name to the series of each object

collection : list: list of Record objects where each Record represents one row from a dataframe

This is how we generate a Record Collection from a DataFrame.

>>> import pandas as pd
>>> import turntable
>>>
>>> # list literals avoid stray newlines that splitting a multi-line string would leave in the values
>>> df = pd.DataFrame({
...     'Artist': ['Michael Jackson', 'Pink Floyd', 'Whitney Houston', 'Meat Loaf',
...                'Eagles', 'Fleetwood Mac', 'Bee Gees', 'AC/DC'],
...     'Album': ['Thriller', 'The Dark Side of the Moon', 'The Bodyguard', 'Bat Out of Hell',
...               'Their Greatest Hits (1971-1975)', 'Rumours', 'Saturday Night Fever', 'Back in Black'],
... })
>>> collection = turntable.press.build_collection(df, my_favorite_record='nevermind')
>>> record = collection[0]
>>> print(record.series)

turntable.press.collection_to_df(collection)¶

Converts a collection back into a pandas DataFrame

collection : list: list of Record objects where each Record represents one row from a dataframe

df : pandas.DataFrame: DataFrame of length=len(collection) where each row represents one Record

turntable.press.load_record(record, **kwargs)¶

Takes an instance of Record() and named arguments from **kwargs, and returns the record instance with the named arguments added to it.

record : Record(): either full or empty record object
**kwargs : named arguments: first_arg = 1, second_arg = ‘two’

record.first_arg -> 1
record.second_arg -> ‘two’

>>> import turntable
>>> record = turntable.press.load_record(turntable.press.Record(), first_arg = 1, second_arg = 'two')
>>> record.series
first_arg       1
second_arg    two

turntable.press.spin_frame(df, method)¶

Runs the full turntable process on a pandas DataFrame

df : pandas.DataFrame: each row represents a record
method : def method(record): function used to process each row

df : pandas.DataFrame: DataFrame processed by method

>>> import pandas as pd
>>> import turntable
>>>
>>> df = pd.DataFrame({'Artist':"""Michael Jackson, Pink Floyd, Whitney Houston, Meat Loaf, Eagles, Fleetwood Mac, Bee Gees, AC/DC""".split(', '), 'Album':"""Thriller, The Dark Side of the Moon, The Bodyguard, Bat Out of Hell, Their Greatest Hits (1971–1975), Rumours, Saturday Night Fever, Back in Black""".split(', ')})
>>>
>>> def method(record):
...     record.cost = 40
...     return record
...
>>> turntable.press.spin_frame(df, method)

turntable.spin module¶

The spin module contains tools to process Record Collections in either series or parallel.

Thanks to chriskiehl http://chriskiehl.com/article/parallelism-in-one-line/

turntable.spin.batch(collection, method, processes=None, batch_size=None, quiet=False, kwargs_to_dump=None, args=None, **kwargs)¶

Processes a collection in parallel batches; each batch is processed in series on a single process. Running batches in parallel can be more efficient than splitting a list across cores, as spin.parallel does, because parallel processing has high IO requirements.

collection : list: i.e. list of Record objects

method : method to call on each Record
processes : int: number of processes to run on [defaults to number of cores on machine]
batch_size : int: length of each batch [defaults to number of elements / number of processes]

collection : list: list of Record objects after going through method called

Adding 2 to every number in a range:

>>> import turntable
>>> collection = list(range(100))
>>> def jam(record):
...     return record + 2
...
>>> collection = turntable.spin.batch(collection, jam)

Lambda functions do not work in parallel.
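A likely reason for the lambda restriction (an assumption, since the source does not state the cause) is that process-based pools send the callable to worker processes with pickle, and the standard pickle module stores functions by importable name, which a lambda does not have. The helper can_pickle below is hypothetical and only demonstrates that limitation.

```python
import pickle


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        # Lambdas have no importable name, so pickling them fails.
        return False


print(can_pickle(len))
print(can_pickle(lambda record: record + 2))
```

Defining the function with def at module level, as jam is in the example above, gives it an importable name and avoids the problem.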

turntable.spin.new_function_batch(sequence, method, *args, **kwargs)¶

turntable.spin.new_function_dumping(args_to_load_names, function, main_arg, *args, **kwargs)¶

turntable.spin.parallel(collection, method, processes=None, args=None, **kwargs)¶

Processes a collection in parallel.

collection : list: i.e. list of Record objects

method : method to call on each Record
processes : int: number of processes to run on [defaults to number of cores on machine]
batch_size : int: length of each batch [defaults to number of elements / number of processes]

collection : list: list of Record objects after going through method called

Adding 2 to every number in a range:

>>> import turntable
>>> collection = list(range(100))
>>> def jam(record):
...     return record + 2
...
>>> collection = turntable.spin.parallel(collection, jam)

Lambda functions do not work in parallel.

turntable.spin.process_dump(collection, function, kwargs_to_dump, processes=None, args=None, **kwargs)¶

turntable.spin.series(collection, method, prints=15, *args, **kwargs)¶

Processes a collection in series

collection : list: list of Record objects

method : method to call on each Record
prints : int: number of timer prints to the screen

collection : list: list of Record objects after going through method called

If more than one collection is given, the function is called with an argument list consisting of the corresponding item of each collection, substituting None for missing values when not all collections have the same length. If the function is None, return the original collection (or a list of tuples if multiple collections).

Adding 2 to every number in a range:

>>> import turntable
>>> collection = range(100)
>>> method = lambda x: x + 2
>>> collection = turntable.spin.series(collection, method)

turntable.spin.thread(function, sequence, cores=None, runSeries=False, quiet=False)¶: sets up the threadpool with map for parallel processing
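A thread pool with map can be sketched with the standard library's multiprocessing.dummy.Pool, which exposes the process Pool API on top of threads. Whether turntable.spin.thread uses this exact class is an assumption; threaded_map and its workers parameter are hypothetical names.

```python
from multiprocessing.dummy import Pool  # thread-based Pool with the same API


def jam(record):
    return record + 2


def threaded_map(function, sequence, workers=4):
    """Hypothetical helper: apply function to every item using a thread pool."""
    pool = Pool(workers)
    try:
        # map blocks until all items are done and preserves input order.
        return pool.map(function, sequence)
    finally:
        pool.close()
        pool.join()


results = threaded_map(jam, range(10))
print(results[:3])  # → [2, 3, 4]
```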

turntable.utils module¶

The utils module provides a collection of methods used across the package or of general utility.

class turntable.utils.Timer(nLoops, numPrints=100, verbose=True)¶

Timer that calculates time remaining for a process and the percent complete

Todo

Ask for details about the usage

nLoops : integer
numPrints : integer (default is 100)
verbose : bool (default is True)

nLoops : integer
numPrints : integer
verbose : bool: if True, print values when loop is called
count : integer
elapsed : float: elapsed time
est_end : float: estimated end
ti : float: initial time
tf : float: current time
display_amt : integer

fin()¶

loop()¶: Tracks the time in a loop. The estimated time to completion is calculated and, if verbose is set to True, the object prints the estimated time to completion and the percent complete. Call it on every iteration to keep track.
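The source does not give the formula Timer uses, but a common way to estimate time remaining is linear extrapolation from the average time per completed loop. The sketch below assumes that approach; estimate_remaining is a hypothetical helper, and its arguments borrow the attribute names elapsed, count, and nLoops listed above.

```python
def estimate_remaining(elapsed, count, n_loops):
    """Hypothetical helper: linear estimate of seconds left and percent done."""
    avg = elapsed / count                # average seconds per completed loop
    remaining = avg * (n_loops - count)  # assume the remaining loops cost the same
    percent = 100.0 * count / n_loops
    return remaining, percent


print(estimate_remaining(elapsed=30.0, count=30, n_loops=120))  # → (90.0, 25.0)
```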

turntable.utils.Walk(root='.', recurse=True, pattern='*')¶

Generator for walking a directory tree. Starts at specified root folder, returning files that match our pattern. Optionally will also recurse through sub-folders.

root : string (default is ‘.’): Path for the root folder to look in.
recurse : bool (default is True): If True, will also look in the subfolders.
pattern : string (default is ‘*’, which matches all files): The pattern to match against file names.

generator: Walk yields the paths of the matching files.
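A generator with this shape can be sketched using os.walk and fnmatch from the standard library. This is an illustration under the assumption that the pattern is a shell-style wildcard applied to file names; walk_sketch is a hypothetical name and not the library's code.

```python
import fnmatch
import os
import tempfile


def walk_sketch(root=".", recurse=True, pattern="*"):
    """Hypothetical generator: yield paths under root whose file name matches pattern."""
    for folder, subfolders, files in os.walk(root):
        for name in files:
            if fnmatch.fnmatch(name, pattern):
                yield os.path.join(folder, name)
        if not recurse:
            # Emptying the list in place stops os.walk from descending.
            del subfolders[:]


# Build a small throwaway tree to walk.
demo = tempfile.mkdtemp()
os.mkdir(os.path.join(demo, "sub"))
for rel in ("a.txt", "b.csv", os.path.join("sub", "c.txt")):
    open(os.path.join(demo, rel), "w").close()

found = sorted(os.path.relpath(p, demo) for p in walk_sketch(demo, pattern="*.txt"))
print(found)
```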

turntable.utils.add_path_string(root_path='./results', path_string=None)¶

turntable.utils.batch_list(sequence, batch_size, mod=0, randomize=False)¶

Converts a list into a list of lists with equal batch_size.

sequence : list: list of items to be placed in batches
batch_size : int: length of each sub list
mod : int: remainder of list length divided by batch_size, i.e. mod = len(sequence) % batch_size
randomize : bool: whether the initial sequence should be randomized before being batched
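Splitting a list into batches can be sketched with slicing. The helper batch_list_sketch is hypothetical, and how the real batch_list treats the remainder (the mod parameter) is not documented here, so this version simply leaves the last batch shorter.

```python
import random


def batch_list_sketch(sequence, batch_size, randomize=False):
    """Hypothetical helper: split sequence into consecutive sublists of batch_size."""
    items = list(sequence)
    if randomize:
        random.shuffle(items)  # shuffle a copy; the caller's list is untouched
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


print(batch_list_sketch(range(7), 3))  # → [[0, 1, 2], [3, 4, 5], [6]]
```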

turntable.utils.catch(fcn, *args, **kwargs)¶

try:
    return fcn(*args, **kwargs)
except:
    print traceback
    if 'spit' in kwargs.keys():
        return kwargs['spit']

fcn : function
*args : unnamed parameters of fcn
**kwargs : named parameters of fcn

spit : value returned from the except branch when fcn raises an exception

The expected output of fcn; if fcn raises, the exception traceback is printed instead.
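The pseudocode for catch can be turned into a runnable sketch. This is a re-creation for illustration, not the library's code: catch_sketch is a hypothetical name, and removing spit from kwargs before calling fcn is an assumption the pseudocode does not confirm.

```python
import traceback


def catch_sketch(fcn, *args, **kwargs):
    """Hypothetical re-creation of the documented pseudocode."""
    # Assumption: 'spit' is removed before calling fcn so fcn never sees it.
    spit = kwargs.pop("spit", None)
    try:
        return fcn(*args, **kwargs)
    except Exception:
        traceback.print_exc()  # report the failure without re-raising
        return spit            # None when no fallback was supplied


print(catch_sketch(int, "12"))
print(catch_sketch(int, "twelve", spit=-1))
```

A wrapper like this keeps a long batch job running when a single record fails, at the price of hiding errors unless the printed traceback is read.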

turntable.utils.create_dir(path, dir_dict={})¶

Tries to create a new directory in the given path. create_dir can also create subfolders according to the dictionary given as second argument.

path : string: string giving the path of the location to create the directory, either absolute or relative.
dir_dict : dictionary, optional: Dictionary describing the subfolders to create. Keys must be strings, and values either None or path dictionaries. The default is {}, which means that no subfolders will be created.

>>> from turntable import utils
>>> path = './project'
>>> dir_dict = {'dir1': None, 'dir2': {'subdir21': None}}
>>> utils.create_dir(path, dir_dict)

will create:

project/dir1
project/dir2/subdir21

in your current working directory.
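Creating a folder tree from a nested dictionary is naturally recursive, as the following sketch shows. It is an illustration under stated assumptions rather than the library's implementation: create_dir_sketch is a hypothetical name, and it uses None as the default instead of {} to avoid sharing one mutable default between calls.

```python
import os
import tempfile


def create_dir_sketch(path, dir_dict=None):
    """Hypothetical helper: create path, then subfolders described by a nested dict."""
    os.makedirs(path, exist_ok=True)
    for name, children in (dir_dict or {}).items():
        # A None value marks a leaf folder; a dict describes deeper levels.
        create_dir_sketch(os.path.join(path, name), children)


root = os.path.join(tempfile.mkdtemp(), "project")
create_dir_sketch(root, {"dir1": None, "dir2": {"subdir21": None}})
print(sorted(os.listdir(root)))  # → ['dir1', 'dir2']
```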

turntable.utils.displayAll(elapsed, display_amt, est_end, nLoops, count, numPrints)¶: Displays time if verbose is true and count is within the display amount

turntable.utils.from_pickle(filename, clean_disk=False)¶

turntable.utils.path_to_filename(pathfile)¶

Takes a path filename string and returns the split between the path and the filename

if filename is not given, filename = ‘’
if path is not given, path = ‘./’
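The documented defaults can be reproduced on top of os.path.split. This is only a sketch: split_path_sketch is a hypothetical name, and the exact form of the path the real function returns (for example, whether it keeps a trailing slash) is not stated in the source.

```python
import os


def split_path_sketch(pathfile):
    """Hypothetical helper: return (path, filename) with the documented defaults."""
    path, filename = os.path.split(pathfile)
    if not path:
        path = "./"  # no directory part given
    return path, filename


print(split_path_sketch("data.csv"))  # → ('./', 'data.csv')
print(split_path_sketch("results/"))  # → ('results', '')
```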

turntable.utils.scan_path(root='.', recurse=False, pattern='*')¶

Runs a loop over the Walk Generator to find all file paths in the root directory with the given pattern. If recurse is True: matching paths are identified for all sub directories.

root : string (default is ‘.’): Path for the root folder to look in.
recurse : bool (default is False): If True, will also look in the subfolders.
pattern : string (default is ‘*’, which matches all files): The pattern to match against file names.

path_list : list: list of all the matching files paths.

turntable.utils.timeUnit(elapsed, avg, est_end)¶: calculates unit of time to display

turntable.utils.to_pickle(obj, filename, clean_memory=False)¶: http://stackoverflow.com/questions/7900944/read-write-classes-to-files-in-an-efficent-way
