turntable package
Submodules
turntable.press module
The press module is used to create Record Collections.
class turntable.press.Record(**kwargs)
Bases: turntable.press.RecordSetter, turntable.press.SeriesLoader
Record is a container object with the special property “series”. Any property added to a Record is also added to its pandas.Series.
- series : pandas.Series
- container for parameters set to the instance
load : assigns the items of **kwargs to the class and to the series parameter
_set_attributes : assigns the items of a dictionary to the class and to the series parameter
run_method : runs a method by a string call
Let's see how we can add a property to the Record object:
>>> from turntable.press import Record
>>> record = Record(first_item = 'one')
>>> record.second_item = 'two'
>>> print(record.series)
mint = True
class turntable.press.RecordPress(pickle=True, pickle_path='./tmp')
Bases: object
This class auto-serializes any attribute assigned to an instance and clears it from memory. When an attribute is accessed via the dot operator, it is read back from disk.
- pickle : Boolean [True]
- if False, the instance will behave as a normal class
- pickle_path : string [‘./tmp’]
- the path under which the files will be stored
- clean_disk()
- deletes all files stored by the instance
- clean_memory()
- sets the in-memory attribute values to None, reducing the memory footprint
This class can be subclassed to be used elsewhere:
>>> import turntable
>>> from turntable.press import RecordPress
>>> class NewClass(RecordPress):
...     def __init__(self, pickle = True, pickle_path = './tmp'):
...         self.pickle = pickle
...         self.class_path = turntable.utils.path_to_filename(pickle_path+'/'+self.__class__.__name__)[0]
...         self.pickles = []
...
>>> newClass = NewClass()
>>> newClass.x = 10
>>> y = newClass.x
>>> newClass.clean_disk()
clean_disk()
Removes all files and folders under the class_path directory.
clean_memory()
Sets all attribute values from the attribute list pickles to None. This is the default behaviour, so this method is redundant.
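The write-on-assign, read-on-access behaviour described for RecordPress can be sketched with Python's attribute hooks. This is a minimal illustration, not turntable's implementation: the class name `DiskBackedAttributes`, the one-file-per-attribute layout, and the `.pkl` suffix are all assumptions made for the example.

```python
import os
import pickle
import shutil
import tempfile


class DiskBackedAttributes(object):
    """Hypothetical stand-in for RecordPress; not turntable's code."""

    def __init__(self, pickle_path):
        # Bypass our own __setattr__ for the bookkeeping fields.
        object.__setattr__(self, "_dir", pickle_path)
        object.__setattr__(self, "_names", [])
        if not os.path.isdir(pickle_path):
            os.makedirs(pickle_path)

    def _file_for(self, name):
        return os.path.join(self._dir, name + ".pkl")

    def __setattr__(self, name, value):
        # Write the value to disk instead of keeping it on the instance.
        with open(self._file_for(name), "wb") as handle:
            pickle.dump(value, handle)
        if name not in self._names:
            self._names.append(name)

    def __getattr__(self, name):
        # Only reached when normal lookup fails, i.e. for pickled attributes.
        if name.startswith("_") or name not in self._names:
            raise AttributeError(name)
        with open(self._file_for(name), "rb") as handle:
            return pickle.load(handle)

    def clean_disk(self):
        # Remove every stored file and forget the attribute names.
        shutil.rmtree(self._dir, ignore_errors=True)
        del self._names[:]


store = DiskBackedAttributes(os.path.join(tempfile.mkdtemp(), "store"))
store.x = 10
y = store.x          # read back from disk
in_memory = "x" in store.__dict__
store.clean_disk()
```

Because the value never enters the instance dictionary in this sketch, there is nothing left in memory to clear, which mirrors the note that clean_memory is redundant under the default behaviour.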
class turntable.press.RecordSetter(**kwargs)
RecordSetter provides a simple interface for initializing arguments passed in kwargs and a run method for running a class method by name.
kwargs : name : value
RecordSetter is a general Python class that assigns **kwargs as attributes of itself.
>>> from turntable.press import RecordSetter
>>> obj = RecordSetter(name = 'me')
>>> print(obj.name)
me
load(**kwargs)
Takes an instance of Record() and named arguments from **kwargs, and returns the record instance with the named arguments added to it.
- **kwargs : named arguments
- first_arg = 1, second_arg = ‘two’
record.first_arg -> 1
record.second_arg -> ‘two’
>>> import turntable
>>> record = turntable.press.Record(first_arg = 1)
>>> record = record.load(second_arg = 'two')
>>> record.series
first_arg 1
second_arg two
run_method(method)
Calls the specified method by name.
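The two ideas behind RecordSetter, storing keyword arguments as attributes and calling a method from its string name, can be sketched with `setattr` and `getattr`. The class `KwargSetter` and its `shout` method are hypothetical names for illustration; this is not turntable's code.

```python
class KwargSetter(object):
    """Hypothetical illustration; not turntable's RecordSetter."""

    def __init__(self, **kwargs):
        self.load(**kwargs)

    def load(self, **kwargs):
        # Store every named argument as an attribute of the instance.
        for name, value in kwargs.items():
            setattr(self, name, value)
        return self

    def run_method(self, method):
        # Look the method up by its string name, then call it.
        return getattr(self, method)()

    def shout(self):
        return self.name.upper()


obj = KwargSetter(name="me").load(age=3)
result = obj.run_method("shout")
```

Returning `self` from `load` is what allows the `record = record.load(...)` pattern shown in the example above.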
class turntable.press.SeriesLoader
Bases: object
SeriesLoader assigns given properties to a special pandas.Series property: self.series.
series : all attributes of the class get added to an internal pandas Series
This class can be encapsulated to be used elsewhere:
>>> from turntable.press import SeriesLoader
>>> series_loader = SeriesLoader()
>>> series_loader.one = 'one'
>>> print(series_loader.series)
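Mirroring every assigned attribute into a second container can be done by overriding `__setattr__`. The sketch below is an assumption about the general technique, not SeriesLoader's source: the class name `AttributeMirror` is hypothetical, and a plain dict stands in for the pandas.Series so the example needs only the standard library.

```python
class AttributeMirror(object):
    """Hypothetical illustration; a dict stands in for pandas.Series."""

    def __init__(self):
        # Create the mirror without triggering our own __setattr__.
        object.__setattr__(self, "series", {})

    def __setattr__(self, name, value):
        # Set the attribute as usual, then record it in the mirror.
        object.__setattr__(self, name, value)
        self.series[name] = value


loader = AttributeMirror()
loader.one = "one"
loader.two = 2
```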
turntable.press.build_collection(df, **kwargs)
Generates a list of Record objects given a DataFrame. Each Record instance has a series attribute, which is a pandas.Series holding the same attributes as the DataFrame. Optional data can be passed in through kwargs and is stored by name on each object.
df : pandas.DataFrame
kwargs : alternate arguments to be saved by name to the series of each object
- collection : list
- list of Record objects where each Record represents one row from a dataframe
This is how we generate a Record Collection from a DataFrame.
>>> import pandas as pd
>>> import turntable
>>>
>>> df = pd.DataFrame({'Artist':"""Michael Jackson, Pink Floyd, Whitney Houston, Meat Loaf, Eagles, Fleetwood Mac, Bee Gees, AC/DC""".split(', '),
...                    'Album' :"""Thriller, The Dark Side of the Moon, The Bodyguard, Bat Out of Hell, Their Greatest Hits (1971-1975), Rumours, Saturday Night Fever, Back in Black""".split(', ')})
>>> collection = turntable.press.build_collection(df, my_favorite_record = 'nevermind')
>>> record = collection[0]
>>> print(record.series)
turntable.press.collection_to_df(collection)
Converts a collection back into a pandas DataFrame.
- collection : list
- list of Record objects where each Record represents one row from a dataframe
- df : pandas.DataFrame
- DataFrame of length=len(collection) where each row represents one Record
turntable.press.load_record(record, **kwargs)
Takes an instance of Record() and named arguments from **kwargs, and returns the record instance with the named arguments added to it.
- record : Record()
- either full or empty record object
- **kwargs : named arguments
- first_arg = 1, second_arg = ‘two’
record.first_arg -> 1
record.second_arg -> ‘two’
>>> import turntable
>>> record = turntable.press.load_record(turntable.press.Record(), first_arg = 1, second_arg = 'two')
>>> record.series
first_arg 1
second_arg two
turntable.press.spin_frame(df, method)
Runs the full turntable process on a pandas DataFrame.
- df : pandas.DataFrame
- each row represents a record
- method : def method(record)
- function used to process each row
- df : pandas.DataFrame
- DataFrame processed by method
>>> import pandas as pd
>>> import turntable
>>>
>>> df = pd.DataFrame({'Artist':"""Michael Jackson, Pink Floyd, Whitney Houston, Meat Loaf, Eagles, Fleetwood Mac, Bee Gees, AC/DC""".split(', '),
...                    'Album':"""Thriller, The Dark Side of the Moon, The Bodyguard, Bat Out of Hell, Their Greatest Hits (1971–1975), Rumours, Saturday Night Fever, Back in Black""".split(', ')})
>>>
>>> def method(record):
...     record.cost = 40
...     return record
...
>>> turntable.press.spin_frame(df, method)
turntable.spin module
The spin module contains tools to process Record Collections in either series or parallel.
Thanks to chriskiehl http://chriskiehl.com/article/parallelism-in-one-line/
turntable.spin.batch(collection, method, processes=None, batch_size=None, quiet=False, kwargs_to_dump=None, args=None, **kwargs)
Processes a collection in parallel batches; each batch is processed in series on a single process. Running batches in parallel can be more efficient than splitting a list across cores as in spin.parallel, because parallel processing has high IO requirements.
- collection : list
- i.e. list of Record objects
- method : method to call on each Record
- processes : int
- number of processes to run on [defaults to number of cores on machine]
- batch_size : int
- length of each batch [defaults to number of elements / number of processes]
- collection : list
- list of Record objects after going through method called
Adding 2 to every number in a range:
>>> import turntable
>>> collection = range(100)
>>> def jam(record):
...     return record + 2
...
>>> collection = turntable.spin.batch(collection, jam)
Lambda functions do not work in parallel.
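A likely reason for the lambda restriction, assuming turntable hands the method to worker processes through Python's standard multiprocessing machinery, is that the callable must be pickled, and pickle stores functions by importable name. A lambda has no such name. The helper `is_picklable` below is a hypothetical name used only to show the difference.

```python
import pickle


def is_picklable(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False


builtin_ok = is_picklable(len)              # built-ins are pickled by name
lambda_ok = is_picklable(lambda x: x + 2)   # a lambda has no importable name
```

Defining the method with `def` at module level, as `jam` is in the example above, avoids the problem.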
turntable.spin.new_function_batch(sequence, method, *args, **kwargs)
turntable.spin.new_function_dumping(args_to_load_names, function, main_arg, *args, **kwargs)
turntable.spin.parallel(collection, method, processes=None, args=None, **kwargs)
Processes a collection in parallel.
- collection : list
- i.e. list of Record objects
- method : method to call on each Record
- processes : int
- number of processes to run on [defaults to number of cores on machine]
- batch_size : int
- length of each batch [defaults to number of elements / number of processes]
- collection : list
- list of Record objects after going through method called
Adding 2 to every number in a range:
>>> import turntable
>>> collection = range(100)
>>> def jam(record):
...     return record + 2
...
>>> collection = turntable.spin.parallel(collection, jam)
Lambda functions do not work in parallel.
turntable.spin.process_dump(collection, function, kwargs_to_dump, processes=None, args=None, **kwargs)
turntable.spin.series(collection, method, prints=15, *args, **kwargs)
Processes a collection in series.
- collection : list
- list of Record objects
- method : method to call on each Record
- prints : int
- number of timer prints to the screen
- collection : list
- list of Record objects after going through method called
If more than one collection is given, the function is called with an argument list consisting of the corresponding item of each collection, substituting None for missing values when not all collections have the same length. If the function is None, return the original collection (or a list of tuples if multiple collections).
Adding 2 to every number in a range:
>>> import turntable
>>> collection = range(100)
>>> method = lambda x: x + 2
>>> collection = turntable.spin.series(collection, method)
turntable.spin.thread(function, sequence, cores=None, runSeries=False, quiet=False)
Sets up the thread pool with map for parallel processing.
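A thread pool with `map` can be set up in a few lines using `multiprocessing.dummy`, the thread-backed counterpart of `multiprocessing` in the standard library. The sketch below shows that general pattern only; the helper name `threaded_map` is hypothetical, and nothing here is taken from turntable's source.

```python
from multiprocessing.dummy import Pool  # thread-based Pool with the same API


def jam(record):
    return record + 2


def threaded_map(function, sequence, cores=None):
    """Hypothetical helper: map function over sequence on a thread pool."""
    pool = Pool(cores)  # None lets the pool pick a default size
    try:
        # map blocks until every item is processed and keeps input order.
        return pool.map(function, sequence)
    finally:
        pool.close()
        pool.join()


results = threaded_map(jam, range(10), cores=4)
```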
turntable.utils module
The utils module provides a collection of methods used across the package or of general utility.
class turntable.utils.Timer(nLoops, numPrints=100, verbose=True)
Timer that calculates the time remaining for a process and the percent complete.
Todo
Ask for details about the usage
- nLoops : integer
- numPrints : integer (default is 100)
- verbose : bool (default is True)
- if True, print values when loop is called
- count : integer
- elapsed : float
- elapsed time
- est_end : float
- estimated end
- ti : float
- initial time
- tf : float
- current time
- display_amt : integer
fin()
loop()
Tracks the time in a loop. The estimated time to completion can be calculated and, if verbose is set to True, the object prints the estimated time to completion and the percent complete. Call it on every iteration to keep track.
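One common way to estimate the time remaining is to multiply the average time per completed iteration by the number of iterations left. The function below is a sketch under that assumption; `estimate_remaining` and its parameters are hypothetical names, and Timer may compute its estimate differently.

```python
def estimate_remaining(elapsed, count, n_loops):
    """Return (seconds_remaining, percent_complete) for a loop in progress.

    Hypothetical helper: assumes every iteration takes about the same time.
    """
    if count <= 0:
        raise ValueError("count must be at least 1")
    avg = elapsed / float(count)           # average seconds per iteration
    remaining = avg * (n_loops - count)    # iterations left times the average
    percent = 100.0 * count / n_loops
    return remaining, percent


# 25 of 100 iterations finished after 30 seconds.
remaining, percent = estimate_remaining(elapsed=30.0, count=25, n_loops=100)
```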
turntable.utils.Walk(root='.', recurse=True, pattern='*')
Generator for walking a directory tree. Starts at the specified root folder and returns the files that match the pattern. Optionally also recurses through subfolders.
- root : string (default is ‘.’)
- Path for the root folder to look in.
- recurse : bool (default is True)
- If True, will also look in the subfolders.
- pattern : string (default is ‘*’, which matches all files)
- The pattern to look for in the file names.
- generator
- Walk yields the paths of the matching files.
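A generator with this behaviour can be built from `os.walk` and `fnmatch` in the standard library. The sketch below is a hypothetical re-creation under the name `walk_files`, not the Walk source; it assumes the pattern is a shell-style wildcard matched against the file name only.

```python
import fnmatch
import os
import tempfile


def walk_files(root=".", recurse=True, pattern="*"):
    """Hypothetical re-creation: yield paths under root matching pattern."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if fnmatch.fnmatch(name, pattern):
                yield os.path.join(dirpath, name)
        if not recurse:
            break  # the first os.walk item is root itself, so stop after it


# Build a small throwaway tree to walk.
demo_root = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_root, "sub"))
for rel in ("a.txt", "b.csv", os.path.join("sub", "c.txt")):
    open(os.path.join(demo_root, rel), "w").close()

deep = sorted(os.path.basename(p) for p in walk_files(demo_root, True, "*.txt"))
flat = [os.path.basename(p) for p in walk_files(demo_root, False, "*.txt")]
```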
turntable.utils.add_path_string(root_path='./results', path_string=None)
turntable.utils.batch_list(sequence, batch_size, mod=0, randomize=False)
Converts a list into a list of lists with equal batch_size.
- sequence : list
- list of items to be placed in batches
- batch_size : int
- length of each sub list
- mod : int
- remainder of the list length divided by batch_size: mod = len(sequence) % batch_size
- randomize : bool
- should the initial sequence be randomized before being batched
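Splitting a list into fixed-size batches is usually done by slicing in steps of batch_size. The sketch below shows that approach with a hypothetical helper named `chunk`. How batch_list itself uses the mod argument is not documented here, so this version simply lets the last batch hold the remainder.

```python
import random


def chunk(sequence, batch_size, randomize=False):
    """Hypothetical helper: split sequence into lists of batch_size items.

    When the length is not an exact multiple of batch_size, the final
    batch holds the remaining len(sequence) % batch_size items.
    """
    items = list(sequence)
    if randomize:
        random.shuffle(items)  # shuffle the copy, leave the input untouched
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


batches = chunk(range(7), 3)
shuffled = chunk(range(7), 3, randomize=True)
```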
turntable.utils.catch(fcn, *args, **kwargs)
Calls fcn inside a try/except block. On an exception, prints the traceback and, if ‘spit’ is in kwargs.keys(), returns kwargs[‘spit’].
- fcn : function
- *args : unnamed parameters of fcn
- **kwargs : named parameters of fcn
- spit : value to return when an exception is raised
Returns the expected output of fcn, or prints the exception traceback.
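The try/except wrapper described above can be sketched as follows. The name `safe_call` is hypothetical, and one detail is an assumption: this version removes `spit` from the keyword arguments before calling the function, which the documentation does not confirm.

```python
import traceback


def safe_call(fcn, *args, **kwargs):
    """Hypothetical re-creation of the documented behaviour."""
    # Pull the fallback out so it is not forwarded to fcn (an assumption).
    has_spit = "spit" in kwargs
    spit = kwargs.pop("spit", None)
    try:
        return fcn(*args, **kwargs)
    except Exception:
        traceback.print_exc()  # report the error instead of raising it
        if has_spit:
            return spit


ok = safe_call(int, "42")
fallback = safe_call(int, "not a number", spit=-1)
```

A wrapper like this is useful when one bad record should not abort the processing of a whole collection.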
turntable.utils.create_dir(path, dir_dict={})
Tries to create a new directory in the given path. create_dir can also create subfolders according to the dictionary given as second argument.
- path : string
- string giving the path of the location to create the directory, either absolute or relative.
- dir_dict : dictionary, optional
- Dictionary ordering the creation of subfolders. Keys must be strings, and values either None or path dictionaries. The default is {}, which means that no subfolders will be created.
>>> from turntable import utils
>>> path = './project'
>>> dir_dict = {'dir1':None, 'dir2':{'subdir21':None}}
>>> utils.create_dir(path, dir_dict)
will create:
- project/dir1
- project/dir2/subdir21
in your parent directory.
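A nested dictionary maps naturally onto a recursive function: create the directory, then recurse into each key. The sketch below is a hypothetical re-creation named `make_tree`, not the create_dir source, and it runs inside a temporary directory so nothing is written to the working directory.

```python
import os
import tempfile


def make_tree(path, dir_dict=None):
    """Hypothetical re-creation: create path, then the nested subfolders."""
    if not os.path.isdir(path):
        os.makedirs(path)
    for name, children in (dir_dict or {}).items():
        # A value of None means "no subfolders"; a dict describes them.
        make_tree(os.path.join(path, name), children)


root = os.path.join(tempfile.mkdtemp(), "project")
make_tree(root, {"dir1": None, "dir2": {"subdir21": None}})
```

The sketch defaults dir_dict to None rather than {} because a mutable default argument is shared between calls in Python; either choice means no subfolders are created.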
turntable.utils.displayAll(elapsed, display_amt, est_end, nLoops, count, numPrints)
Displays the time if verbose is true and count is within the display amount.
turntable.utils.from_pickle(filename, clean_disk=False)
turntable.utils.path_to_filename(pathfile)
Takes a combined path and filename string and returns it split into the path and the filename.
If filename is not given, filename = ‘’.
If path is not given, path = ‘./’.
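The split and its two defaults can be sketched with `os.path.split`. The function `split_path` is a hypothetical re-creation, and returning a (path, filename) tuple is an assumption based on the `[0]` indexing in the RecordPress example.

```python
import os


def split_path(pathfile):
    """Hypothetical re-creation: return (path, filename) with the documented defaults."""
    path, filename = os.path.split(pathfile)
    if not path:
        path = "./"  # no directory part was given
    return path, filename


both = split_path("results/run1/data.pkl")
name_only = split_path("data.pkl")
path_only = split_path("results/")
```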
turntable.utils.scan_path(root='.', recurse=False, pattern='*')
Runs a loop over the Walk generator to find all file paths in the root directory with the given pattern. If recurse is True, matching paths are also identified in all subdirectories.
- root : string (default is ‘.’)
- Path for the root folder to look in.
- recurse : bool (default is False)
- If True, will also look in the subfolders.
- pattern : string (default is ‘*’, which matches all files)
- The pattern to look for in the file names.
- path_list : list
- list of the paths of all matching files.
turntable.utils.timeUnit(elapsed, avg, est_end)
Calculates the unit of time to display.
turntable.utils.to_pickle(obj, filename, clean_memory=False)
http://stackoverflow.com/questions/7900944/read-write-classes-to-files-in-an-efficent-way