twistml.utility package
Submodules
twistml.utility.plotting module
Provides functions for creating graphs using matplotlib.
Author: Matthias Manhertz
Licence: MIT
twistml.utility.plotting.multi_group_bar_chart(data, errors, setting_labels, source_labels, x_label, y_label, title, colors=['b', 'g', 'y', 'c', 'r', 'm', 'k', 'w'], ylim=None)
Returns a bar chart figure.
The bar chart contains multiple groups of bars. Each group represents one setting and has the same number of bars, one for each data source. For example, the data sources could be different machine learning algorithms such as SVM or KNN, and the settings could be different feature representations such as bag of words, sentiment features or word2vec. In this case the data would have the shape (3 x 2): three settings by two sources.
- data : array-like
- The data that defines the actual bar heights. The shape is (n_settings x n_sources).
- errors : array-like
- The data that defines the error bars on each data bar. The shape is (n_settings x n_sources).
- setting_labels : list[str]
- A list of labels for the settings. Needs to be of length n_settings.
- source_labels : list[str]
- A list of labels for the sources. Needs to be of length n_sources.
- x_label : str
- Label for the x-axis.
- y_label : str
- Label for the y-axis.
- title : str
- Title for the whole plot.
- colors : list[str], optional for fewer than 9 sources
- The colors for the bar faces. There must be at least as many entries as the data has sources. Each source gets its own color, in order; additional colors are discarded. (Default is ['b', 'g', 'y', 'c', 'r', 'm', 'k', 'w'].) To find valid color names, check the matplotlib documentation or run the following code snippet:
    import matplotlib.colors
    for colorname in matplotlib.colors.cnames:
        print(colorname)
- ylim : tuple(ymin, ymax) or 'auto' or None, optional
- The min and max values for the y-axis. (Default is None, which lets matplotlib set these automatically.) Another option is 'auto', which sets the minimum just below the smallest height and the maximum just above the largest height; this often looks better than matplotlib's automatic setting.
Returns:
- fig : figure
- A matplotlib figure of the finished bar chart.
Raises:
- ValueError
- If the shapes of data, errors, setting_labels and source_labels do not match.
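The shape requirements listed above can be checked before any plotting takes place. The following is a minimal sketch of such a check, not the twistml implementation: the helper name check_chart_shapes is hypothetical, the inputs are assumed to be nested lists, and the numbers are arbitrary illustrative values.

```python
def check_chart_shapes(data, errors, setting_labels, source_labels):
    """Raise ValueError unless all inputs agree on (n_settings x n_sources).

    Hypothetical helper for illustration; not part of twistml.
    """
    n_settings = len(setting_labels)
    n_sources = len(source_labels)
    for name, table in (("data", data), ("errors", errors)):
        if len(table) != n_settings:
            raise ValueError("%s has %d rows, expected %d settings"
                             % (name, len(table), n_settings))
        for row in table:
            if len(row) != n_sources:
                raise ValueError("%s has a row of length %d, expected %d sources"
                                 % (name, len(row), n_sources))
    return n_settings, n_sources


# Three feature representations (settings) x two algorithms (sources).
# The values are arbitrary and only illustrate the expected layout.
data = [[0.71, 0.68], [0.64, 0.66], [0.75, 0.70]]
errors = [[0.02, 0.03], [0.04, 0.02], [0.01, 0.03]]
shape = check_chart_shapes(data, errors,
                           ["bag of words", "sentiment", "word2vec"],
                           ["SVM", "KNN"])
print(shape)
```

Each row of data and errors corresponds to one setting (one group of bars), and each column to one source (one bar within every group).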
twistml.utility.toydata module
Contains functions / classes to create toy datasets for twistml.
twistml.utility.toydata.create_toy_data(targets, outdir, keywords, tweets_per_target=1000, lag=4.0, sigma=1.0)
Creates a set of tweets for the given targets and saves them to disk.
- targets : dict[datetime,float]
- The percentage change in stock price for each datetime stamp.
- outdir : str
- The full path to a directory where the .json files with the generated tweets should be saved.
- tweets_per_target : int, optional
- How many tweets will be generated per target value. (Default is 1000.)
- lag : float, optional
- How many days the tweets should precede the stock prices. (Default is 4.0, which implies tweets are made, on average, four days before the corresponding change in stock price.)
- sigma : float, optional
- The sigma (standard deviation) of the Gaussian distribution of timestamps for the generated tweets, expressed in days. (Default is 1.0, which implies 68% of sampled dates will be within one day of the center, 95% within two days and 99.7% within three days.)
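The interplay of lag and sigma can be illustrated with a short sketch. It is not the twistml implementation: the helper name sample_tweet_times and its rng parameter are hypothetical, and it only assumes that timestamps are drawn from a Gaussian centred lag days before the target date.

```python
import random
from datetime import datetime, timedelta


def sample_tweet_times(target_time, n, lag=4.0, sigma=1.0, rng=None):
    """Draw n timestamps from a Gaussian centred `lag` days before target_time.

    Hypothetical helper; `sigma` is the standard deviation in days.
    """
    rng = rng or random.Random()
    center = target_time - timedelta(days=lag)
    return [center + timedelta(days=rng.gauss(0.0, sigma)) for _ in range(n)]


target = datetime(2015, 6, 10)
times = sample_tweet_times(target, 1000, rng=random.Random(42))

# Fraction of samples within one day of the centre (expected to be near 0.68).
center = target - timedelta(days=4.0)
within_one_day = sum(
    abs((t - center).total_seconds()) <= 86400 for t in times) / 1000.0
print(within_one_day)
```

Passing a seeded random.Random instance keeps the sample reproducible, which is convenient when a toy dataset has to be regenerated.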
twistml.utility.toydata.random() → x in the interval [0, 1).
twistml.utility.utility module
Provides some internal utility functions for the twistml package.
twistml.utility.utility.find_files(indir, subdirs=False, fromdate=None, todate=None, dateformat='%Y-%m-%d', filenameformat='%Y-%m-%d.json', logger=None)
Flexible function to find relevant files. Returns a list of filepaths.
- indir : str
- The input directory, where the twitter files are located.
- subdirs : bool, optional
- Whether the files are located in subdirectories within indir (default is False).
- fromdate : str, optional
- First date of the date range within which to find files. The string is parsed according to dateformat. File names are parsed according to filenameformat, and only files within the date range are returned.
- todate : str, optional
- Last date of the date range within which to find files. The string is parsed according to dateformat. File names are parsed according to filenameformat, and only files within the date range are returned.
- dateformat : str, optional
- The format used to parse datetime objects from the given fromdate and todate.
- filenameformat : str, optional
- The format used to parse datetime objects from the filenames.
- logger : logging.Logger, optional
- A logger object, used to display / log console output (default is None, which implies quiet execution).
Returns:
- filepaths : list[str]
- A list of the found files' paths.
For details about datetime formatting strings see the official documentation at docs.python.org.
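The date-range filtering described above can be sketched with the standard library alone. This is an illustration rather than the package's code: the helper name filter_by_date is hypothetical, it works on a plain list of file names instead of a directory, and it assumes that both ends of the range are inclusive and that names not matching filenameformat are skipped.

```python
from datetime import datetime


def filter_by_date(filenames, fromdate=None, todate=None,
                   dateformat='%Y-%m-%d', filenameformat='%Y-%m-%d.json'):
    """Keep filenames whose parsed date lies in [fromdate, todate].

    Hypothetical helper: names that do not match filenameformat are skipped.
    """
    start = datetime.strptime(fromdate, dateformat) if fromdate else None
    end = datetime.strptime(todate, dateformat) if todate else None
    selected = []
    for name in filenames:
        try:
            filedate = datetime.strptime(name, filenameformat)
        except ValueError:
            continue  # the name does not follow filenameformat
        if start is not None and filedate < start:
            continue
        if end is not None and filedate > end:
            continue
        selected.append(name)
    return selected


names = ['2015-01-01.json', '2015-01-02.json', '2015-01-03.json', 'notes.txt']
print(filter_by_date(names, fromdate='2015-01-02', todate='2015-01-03'))
# → ['2015-01-02.json', '2015-01-03.json']
```

Because filenameformat is an ordinary strptime pattern, literal parts such as the .json suffix are matched as-is.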
twistml.utility.utility.float_ceil(x, n=0)
Return the floating point value x rounded up to n digits after the decimal point.
- x : float
- The value to be rounded.
- n : int, optional
- The number of digits after the decimal point to be rounded to. (default is 0)
Returns:
- float
- The value x, rounded up to n digits after the decimal point.
twistml.utility.utility.float_floor(x, n=0)
Return the floating point value x rounded down to n digits after the decimal point.
- x : float
- The value to be rounded.
- n : int, optional
- The number of digits after the decimal point to be rounded to. (default is 0)
Returns:
- float
- The value x, rounded down to n digits after the decimal point.
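One common way to obtain the behaviour documented for float_ceil and float_floor is to scale by a power of ten, apply math.ceil or math.floor, and scale back. The sketch below assumes this approach; the names float_ceil_sketch and float_floor_sketch are hypothetical, and the package's own code may differ.

```python
import math


def float_ceil_sketch(x, n=0):
    """Round x up to n digits after the decimal point (illustrative only)."""
    factor = 10 ** n
    return math.ceil(x * factor) / float(factor)


def float_floor_sketch(x, n=0):
    """Round x down to n digits after the decimal point (illustrative only)."""
    factor = 10 ** n
    return math.floor(x * factor) / float(factor)


print(float_ceil_sketch(1.234, 2))   # → 1.24
print(float_floor_sketch(1.239, 2))  # → 1.23
```

Since binary floating point cannot represent every decimal fraction exactly, a value that is only approximately on a rounding boundary may be rounded one step further than expected.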
twistml.utility.utility.generate_datesequence(fromdate, todate, informat='%Y-%m-%d', outformat='%Y-%m-%d')
Generates a list of sequential dates.
- fromdate, todate : str or datetime
- The first and last date of the sequence, either as strings in the format given by informat or as datetime instances.
- informat : str, optional
- The format string used to parse fromdate and todate, as detailed in the Python documentation for the datetime package (default is '%Y-%m-%d'). If this is None, it is assumed that valid datetime instances were passed.
- outformat : str, optional
- The format string used for the returned dates, as detailed in the Python documentation for the datetime package (default is '%Y-%m-%d'). If this is None, the returned dates will be datetime instances instead of formatted strings.
Returns:
- list[str] or list[datetime]
- A list of dates in the format given by outformat, starting at fromdate and ending at todate (inclusive).
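The handling of informat and outformat can be sketched as follows. This is a minimal illustration and not the package's code: the name date_sequence_sketch is hypothetical, and a step of one day between consecutive dates is an assumption.

```python
from datetime import datetime, timedelta


def date_sequence_sketch(fromdate, todate,
                         informat='%Y-%m-%d', outformat='%Y-%m-%d'):
    """Return all days from fromdate to todate, both inclusive."""
    if informat is not None:
        # Strings are parsed; with informat=None datetimes are used as given.
        fromdate = datetime.strptime(fromdate, informat)
        todate = datetime.strptime(todate, informat)
    n_days = (todate - fromdate).days
    dates = [fromdate + timedelta(days=i) for i in range(n_days + 1)]
    if outformat is not None:
        return [d.strftime(outformat) for d in dates]
    return dates


print(date_sequence_sketch('2015-01-30', '2015-02-02'))
# → ['2015-01-30', '2015-01-31', '2015-02-01', '2015-02-02']
```

Setting both formats to None turns the function into a pure datetime-to-datetime helper, which avoids repeated parsing when the caller already holds datetime instances.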
twistml.utility.utility.progress_report(starting_time, total_files, completed_files)
Returns a progress report for multi-file-operations.
- starting_time : float
- Timestamp of the starting time of the multi-file-operation, usually obtained by calling time.time().
- total_files : int
- Number of files that are being processed.
- completed_files : int
- Number of files that have been completed.
Returns:
- report : str
- A two-line string stating time passed, number of files completed / left and estimated time remaining.
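A report of this kind is typically built by extrapolating the average time per completed file to the files that are left. The sketch below assumes exactly that; the name progress_report_sketch, the extra now parameter (added so the example is deterministic) and the wording of the two lines are hypothetical and not taken from twistml.

```python
import time


def progress_report_sketch(starting_time, total_files, completed_files,
                           now=None):
    """Build a two-line progress string (illustrative only)."""
    now = time.time() if now is None else now
    elapsed = now - starting_time
    remaining_files = total_files - completed_files
    if completed_files > 0:
        # Average time per finished file, extrapolated to the remaining files.
        eta_text = "%.0f s" % (elapsed / completed_files * remaining_files)
    else:
        eta_text = "unknown"
    line1 = "%.0f s elapsed, %d of %d files completed" % (
        elapsed, completed_files, total_files)
    line2 = "%d files left, estimated time remaining: %s" % (
        remaining_files, eta_text)
    return line1 + "\n" + line2


report = progress_report_sketch(0.0, 10, 4, now=20.0)
print(report)
```

In real use the starting time would come from time.time() at the start of the operation, and now would be left at its default.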
twistml.utility.utility.query_yes_no(question, default='yes')
Ask a yes/no question via raw_input() and return the user's answer.
"question" is a string that is presented to the user. "default" is the presumed answer if the user just hits <Enter>. It must be "yes" (the default), "no" or None (meaning an answer is required of the user).
The return value is True for "yes" or False for "no".
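The prompt logic described above can be sketched as follows. This is an illustration under assumptions, not the package's code: it targets Python 3, where input() takes the place of raw_input(), the name query_yes_no_sketch is hypothetical, and the read parameter is an addition that allows the function to be exercised without a terminal.

```python
def query_yes_no_sketch(question, default='yes', read=input):
    """Ask a yes/no question and return True for yes, False for no."""
    valid = {'yes': True, 'y': True, 'no': False, 'n': False}
    if default is None:
        prompt = ' [y/n] '
    elif default == 'yes':
        prompt = ' [Y/n] '
    elif default == 'no':
        prompt = ' [y/N] '
    else:
        raise ValueError("invalid default answer: %r" % default)
    while True:
        choice = read(question + prompt).strip().lower()
        if default is not None and choice == '':
            return valid[default]  # the user just hit <Enter>
        if choice in valid:
            return valid[choice]
        print("Please respond with 'yes' or 'no' (or 'y' or 'n').")


# Non-interactive demonstration: the lambda stands in for keyboard input.
print(query_yes_no_sketch("Overwrite file?", read=lambda prompt: "y"))
```

With default=None an empty answer is not accepted, so the loop keeps asking until the user types a valid reply.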
twistml.utility.utility.remap(oldValue, oldMin, oldMax, newMin=0.0, newMax=1.0)
Linear scaling of a value between two intervals.
Assuming oldValue lies in the interval [oldMin, oldMax] this function returns a new value that has been linearly scaled to the interval [newMin, newMax].
- oldValue : float
- The old value from the interval [oldMin, oldMax].
- oldMin : float
- The minimum of the old interval.
- oldMax : float
- The maximum of the old interval.
- newMin : float, optional
- The minimum of the new interval (default is 0.0).
- newMax : float, optional
- The maximum of the new interval (default is 1.0).
Returns:
- newValue : float
- The oldValue linearly scaled to the new interval.
Raises:
- ValueError
- If the oldValue is outside the old interval or either minimum is greater than or equal to the corresponding maximum.
The code for this function has been partially taken from this stackexchange question by SpliFF and the answers by jerryjvl and PenguinTD.
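The linear scaling amounts to computing the relative position of the value within the old interval and applying it to the new one. The following sketch follows the documented behaviour, including the ValueError conditions, but it is not the package's code; the name remap_sketch and the snake_case parameter names are hypothetical.

```python
def remap_sketch(old_value, old_min, old_max, new_min=0.0, new_max=1.0):
    """Linearly scale old_value from [old_min, old_max] to [new_min, new_max]."""
    if old_min >= old_max or new_min >= new_max:
        raise ValueError("each minimum must be smaller than its maximum")
    if not old_min <= old_value <= old_max:
        raise ValueError("old_value lies outside [old_min, old_max]")
    # Relative position of the value within the old interval, in [0, 1].
    fraction = (old_value - old_min) / float(old_max - old_min)
    return new_min + fraction * (new_max - new_min)


print(remap_sketch(5, 0, 10))          # → 0.5
print(remap_sketch(5, 0, 10, 0, 100))  # → 50.0
```

The explicit float() conversion keeps the division exact for integer inputs under Python 2 as well.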
twistml.utility.utility.setup_logger(name, logfilepath=None, console=True, level=20)
Setup logging to a file and/or the console. Returns the logger.
- name : str
- The name for the logger.
- logfilepath : str, optional
- The full path to the log file that shall be written. (Default is None, which implies no log file will be created.)
- console : bool, optional
- Determines whether the log should (also) be written to the console. (Default is True.)
- level : int, optional
- The minimum level of log messages that shall be written. (Default is logging.INFO (20).)
Returns:
- logger : logging.Logger
- The logger object.
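A logger with an optional file handler and an optional console handler can be assembled from the standard logging module as sketched below. The sketch mirrors the documented parameters but is not the package's code; the name setup_logger_sketch, the message format and the logger name used in the example are assumptions.

```python
import logging


def setup_logger_sketch(name, logfilepath=None, console=True,
                        level=logging.INFO):
    """Create a logger that writes to a file and/or the console."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    formatter = logging.Formatter('%(asctime)s %(levelname)s %(message)s')
    if logfilepath is not None:
        file_handler = logging.FileHandler(logfilepath)
        file_handler.setFormatter(formatter)
        logger.addHandler(file_handler)
    if console:
        console_handler = logging.StreamHandler()
        console_handler.setFormatter(formatter)
        logger.addHandler(console_handler)
    return logger


logger = setup_logger_sketch('twistml.example')
logger.info('logger ready')
```

Note that logging.getLogger returns the same object for the same name, so calling this sketch twice with one name would attach the handlers twice and duplicate every message.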
Module contents
Author: Matthias Manhertz
Licence: MIT