dautil.data

Utilities to download and load data.

class dautil.data.Nordpil

Utility class to get data from the Nordpil website.

Variables:dir – The data destination directory.
load_urban_tsv()

Downloads the urbanareas file.

Returns:The fully qualified path of the downloaded file.
class dautil.data.OHLC(data_source='yahoo')

Downloads and caches historical EOD data from the web with pandas.io.data.DataReader.

get(ticker)

Retrieves EOD data from cache or the web.

Parameters:ticker – The stock symbol, such as AAPL.
Returns:The data as a pandas DataFrame.
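The "cache or the web" behaviour can be pictured with a minimal sketch. This is not dautil's implementation: the get_cached helper, the pickle file layout and the fake_fetch stand-in are all hypothetical, and a real fetch would call a data reader instead.

```python
import pickle
import tempfile
from pathlib import Path


def get_cached(key, fetch, cache_dir):
    """Return the cached object for key, calling fetch(key) only on a miss."""
    cache_file = Path(cache_dir) / f"{key}.pkl"
    if cache_file.exists():
        # Cache hit: load the previously stored object.
        with cache_file.open("rb") as fh:
            return pickle.load(fh)
    # Cache miss: fetch the data and store it for next time.
    data = fetch(key)
    with cache_file.open("wb") as fh:
        pickle.dump(data, fh)
    return data


calls = []


def fake_fetch(ticker):
    # Hypothetical stand-in for a web request; records each call.
    calls.append(ticker)
    return {"ticker": ticker, "close": [1.0, 2.0]}


with tempfile.TemporaryDirectory() as tmp:
    first = get_cached("AAPL", fake_fetch, tmp)   # fetched and stored
    second = get_cached("AAPL", fake_fetch, tmp)  # served from the cache
```

The second call never reaches fake_fetch, which is the point of the cache.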
class dautil.data.SPANFB

Utility class which downloads data from the SPAN Facebook webpage.

Variables:fname – The path of the downloaded file.

load()

Downloads the SPAN Facebook dataset.

Returns:The fully qualified path of the downloaded file.
class dautil.data.Weather

Utility class which downloads or loads weather data from the KNMI website.

static beaufort_scale(df)

Categorizes wind speed using the Beaufort scale.

Parameters:df – A pandas DataFrame.
Returns:A categorized pandas DataFrame.
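As an illustration of Beaufort binning, here is a minimal sketch for a single value. It assumes the wind speed is given in metres per second and uses the commonly published lower bounds for each Beaufort number; the beaufort_number helper is hypothetical, and the units and bin edges dautil applies to the DataFrame may differ.

```python
from bisect import bisect_right

# Lower bounds in m/s of Beaufort numbers 1 through 12 (assumed units).
BEAUFORT_LOWER_BOUNDS = [
    0.3, 1.6, 3.4, 5.5, 8.0, 10.8, 13.9, 17.2, 20.8, 24.5, 28.5, 32.7,
]


def beaufort_number(speed_ms):
    """Map a wind speed in m/s to a Beaufort number from 0 to 12."""
    # The count of lower bounds at or below the speed is the Beaufort number.
    return bisect_right(BEAUFORT_LOWER_BOUNDS, speed_ms)


calm = beaufort_number(0.0)
gentle_breeze = beaufort_number(5.0)
hurricane = beaufort_number(40.0)
```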
static categorize_wind_dir(df)

Categorizes the wind direction (0 to 360 degrees) using cardinal directions (North, South, etc.).

Parameters:df – A pandas DataFrame.
Returns:A categorized pandas DataFrame.
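A minimal sketch of the idea for a single angle follows. The eight-point compass, the labels and the cardinal_direction helper are assumptions made for illustration; the categories dautil actually produces are not stated here.

```python
# Eight compass sectors of 45 degrees each, centred on N, NE, E, ...
CARDINALS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]


def cardinal_direction(degrees):
    """Map an angle in degrees to one of eight compass labels."""
    # Shift by half a sector so that, e.g., 350 and 10 both fall in "N".
    sector = int(((degrees % 360) + 22.5) // 45) % 8
    return CARDINALS[sector]


labels = [cardinal_direction(d) for d in (0, 90, 200, 315, 350)]
```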
static fetch_DeBilt_weather()

Downloads, cleans and pickles weather data from the KNMI website.

static get_header(alias)

Gets slightly longer and descriptive column labels.

Parameters:alias – A short column name.
Returns:A longer column name.
static get_headers()

Gets the column labels for the pandas DataFrame stored in the internal pickle.

Returns:A list that contains the column names.
static load()

Loads data from an internal pickle.

Returns:The pandas DataFrame loaded from the pickle.
static rain_values()

Loads rain values without NA values as a NumPy array.

Returns:The rain values as a NumPy array.
class dautil.data.Worldbank

Caching proxy for the pandas Worldbank API.

Variables:
  • indicators – A list of indicator tuples in the form: (alias, name, longname)
  • alias2name – Mapping of alias to name.
  • name2alias – Mapping of name to alias.
  • name2longname – Mapping of name to longname.
  • aliases – A list of aliases.
  • names – A list of indicator names.
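How the variables above relate to one another can be sketched from a list of indicator tuples. The aliases below are hypothetical and the construction is only an illustration of the (alias, name, longname) form, not dautil's code.

```python
# Indicator tuples in the form (alias, name, longname); aliases are made up.
indicators = [
    ("pop", "SP.POP.TOTL", "Population, total"),
    ("gdp", "NY.GDP.MKTP.CD", "GDP (current US$)"),
]

# Lookup tables derived from the tuples.
alias2name = {alias: name for alias, name, _ in indicators}
name2alias = {name: alias for alias, name, _ in indicators}
name2longname = {name: longname for _, name, longname in indicators}

# Flat lists, in the same order as the tuples.
aliases = [alias for alias, _, _ in indicators]
names = [name for _, name, _ in indicators]
```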
download(*args, **kwargs)

Caches the pandas.io.wb.download() results.

Returns:The result of the query from cache or the WWW.
get_alias(name)

Gets an internal alias for the official Worldbank indicator.

Parameters:name – The name of the Worldbank indicator.
Returns:The internal alias.
get_countries(*args, **kwargs)

Caches the pandas.io.wb.get_countries() results.

Returns:The result of the query from cache or the WWW.
get_longname(name)

Gets a longer descriptive name for a Worldbank indicator.

Parameters:name – The name of a Worldbank indicator.
Returns:The long descriptive name.
get_name(alias)

Gets the official Worldbank indicator for an internal alias.

Parameters:alias – The internal alias.
Returns:The name of the Worldbank indicator.
rename_columns(df, use_longnames=False)

Renames the columns of a pandas DataFrame.

Parameters:
  • df – A pandas DataFrame.
  • use_longnames – Whether to use longnames for the renaming.
Returns:

The pandas DataFrame with its columns renamed.

dautil.data.centify(text, multiplier=100)

Converts a string representing money to the corresponding number in cents.

Parameters:
  • text – A string such as 10.55.
  • multiplier – A multiplier for the conversion.
Returns:

Cents as an integer, for instance 1055.

>>> from dautil import data
>>> data.centify('10.55')
1055
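One way such a conversion could be written without binary floating-point rounding is with the standard decimal module. The centify_sketch function is a hypothetical stand-in; whether dautil uses Decimal internally is not stated here.

```python
from decimal import Decimal


def centify_sketch(text, multiplier=100):
    """Convert a money string such as '10.55' to an integer number of cents."""
    # Decimal parses the string exactly, so '10.55' * 100 is exactly 1055.
    return int(Decimal(text) * multiplier)


cents = centify_sketch("10.55")
```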
dautil.data.download(url, out)

Download a file from the web.

Parameters:
  • url – The URL of the file.
  • out – The path of the file.
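A download of this shape can be sketched with the standard library alone. The download_sketch function is hypothetical and may differ from dautil's implementation; the demonstration uses a local file:// URL so that it runs without network access.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path


def download_sketch(url, out):
    """Stream the resource at url into the file at path out."""
    with urllib.request.urlopen(url) as response, open(out, "wb") as fh:
        shutil.copyfileobj(response, fh)
    return out


with tempfile.TemporaryDirectory() as tmp:
    # Create a local source file and address it with a file:// URL.
    src = Path(tmp) / "source.txt"
    src.write_text("hello")
    dest = Path(tmp) / "copy.txt"
    download_sketch(src.as_uri(), dest)
    copied = dest.read_text()
```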
dautil.data.dropinf(arr)

Removes np.inf and np.nan values.

Parameters:arr – Array with numbers.
Returns:The cleaned array.
>>> from dautil import data
>>> import numpy as np
>>> arr = np.array([np.inf, 0, 42, np.nan])
>>> data.dropinf(arr)
array([  0.,  42.])
dautil.data.from_pickle(fname)

Loads object from pickle.

Parameters:fname – The name of the pickle file.
Returns:The object from the pickle.
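Loading an object from a pickle typically looks like the following sketch. The from_pickle_sketch name is hypothetical. Note that unpickling can execute arbitrary code, so only trusted files should be loaded this way.

```python
import pickle
import tempfile
from pathlib import Path


def from_pickle_sketch(fname):
    """Load and return the object stored in the pickle file fname."""
    with open(fname, "rb") as fh:
        return pickle.load(fh)


with tempfile.TemporaryDirectory() as tmp:
    # Write a small pickle first so there is something to load.
    path = Path(tmp) / "obj.pkl"
    path.write_bytes(pickle.dumps({"answer": 42}))
    loaded = from_pickle_sketch(path)
```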
dautil.data.get_data_dir()

Finds the appropriate data directory to store data files.

Returns:A data directory, which is OS dependent.
dautil.data.get_direct_marketing_csv()

Retrieves a CSV file with direct marketing data as described in http://blog.minethatdata.com/2008/03/minethatdata-e-mail-analytics-and-data.html

Returns:The path to the downloaded file.
dautil.data.get_smashing_baby()

Retrieves a WAV file of Austin Powers.

Returns:The path to the downloaded file.
dautil.data.process_gzip(url, file_path)

Downloads and uncompresses a GZIP file.

Parameters:
  • url – The URL of the GZIP file.
  • file_path – The path of the file.
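The decompression step can be sketched separately from the download. The gunzip_to_file helper is hypothetical and takes the compressed bytes directly, so the example runs offline; in practice those bytes would come from the URL.

```python
import gzip
import tempfile
from pathlib import Path


def gunzip_to_file(compressed, file_path):
    """Decompress GZIP bytes and write the result to file_path."""
    Path(file_path).write_bytes(gzip.decompress(compressed))
    return file_path


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "data.txt"
    # gzip.compress stands in for the bytes a download would return.
    gunzip_to_file(gzip.compress(b"a,b\n1,2\n"), target)
    restored = target.read_bytes()
```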
dautil.data.process_zip(url, path, fname)

Downloads and uncompresses a ZIP file.

Parameters:
  • url – The URL of the ZIP file.
  • path – The path of the file.
  • fname – The name of the file.
Returns:

The contents of the extracted file.
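Reading one member out of a ZIP archive can be sketched as below. The read_zip_member helper and the urban.tsv member name are hypothetical, and the archive is built in memory instead of being downloaded, so the example is self-contained.

```python
import io
import zipfile


def read_zip_member(zip_bytes, fname):
    """Return the contents of member fname from a ZIP archive given as bytes."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return archive.read(fname)


# Build a small archive in memory to stand in for a downloaded file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("urban.tsv", "city\tpop\n")

contents = read_zip_member(buffer.getvalue(), "urban.tsv")
```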

dautil.data.read_csv(fname)

Reads a CSV file and returns a list of dictionaries where each dictionary corresponds to a line in the file.

Parameters:fname – The name or path of the file.
Returns:A list of dictionaries.
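The list-of-dictionaries shape can be illustrated with csv.DictReader from the standard library. This sketch assumes the first line of the file is a header row that supplies the keys; the read_csv_sketch function is hypothetical and takes an open file object rather than a file name so that it runs without touching disk.

```python
import csv
import io


def read_csv_sketch(fh):
    """Return one dictionary per data row, keyed by the header row."""
    return [dict(row) for row in csv.DictReader(fh)]


rows = read_csv_sketch(io.StringIO("name,price\napple,10.55\npear,2.00\n"))
```

All values come back as strings, so numeric columns need an explicit conversion afterwards.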
