pyjstat

pyjstat is a Python module for manipulating JSON-stat formatted data.

This module allows reading and writing the JSON-stat [1] format with Python, using the data frame structures provided by the widely used pandas library [2]. JSON-stat is a simple, lightweight JSON format for data dissemination. pyjstat is inspired by rjstat [3], a library by ajschumacher for reading and writing JSON-stat with R.

pyjstat is written and maintained by Miguel Expósito Martín and is distributed under the Apache 2.0 License (see LICENSE file).

[1]	http://json-stat.org/ for JSON-stat information

[2]	http://pandas.pydata.org for Python Data Analysis Library information

[3]	https://github.com/ajschumacher/rjstat for rjstat library information

Example

Importing a JSON-stat file into a pandas data frame can be done as follows:

import json
from urllib.request import urlopen

import pyjstat

# Download the sample file, deserialize it and decode it into data frames.
url = 'http://json-stat.org/samples/oecd-canada.json'
results = pyjstat.from_json_stat(json.load(urlopen(url)))
print(results)

class pyjstat.Collection(*args, **kwargs)

A class representing a JSONstat collection.

get(element)

Gets the ith element of a collection as an object of the corresponding class.

Parameters:	element (int) – index of the element in the collection.
Returns:	An object of the class corresponding to that element.

classmethod read(data)

Reads data from URL or OrderedDict.

Parameters:	data – can be a URL pointing to a JSONstat file, a JSON string or an OrderedDict.

Returns:	An object of class Collection populated with data.

write(output='jsonstat')

Writes data from a Collection object to JSONstat or a list of Pandas Dataframes.

Parameters:	output (string) – can accept ‘jsonstat’ or ‘dataframe_list’. Defaults to ‘jsonstat’.
Returns:	Serialized JSONstat or a list of Pandas Dataframes, depending on the ‘output’ parameter.

class pyjstat.Dataset(*args, **kwargs)

A class representing a JSONstat dataset.

get_dimension_index(name, value)

Converts a dimension ID string and a category ID string into the numeric index of that category in that dimension.

Parameters:

name (string) – ID string of the dimension.
value (string) – ID string of the category.

Returns:	ndx[value] – index of the category in the dimension.
Return type:	int

get_dimension_indices(query)

Converts a dimension/category list of dicts into a list of dimensions’ indices.

Parameters:	query (list) – dimension/category list of dicts.
Returns:	indices – list of dimensions’ indices.
Return type:	list
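
To make the lookup concrete, the following is a minimal standalone sketch rather than pyjstat's own implementation. The function names `dimension_index` and `dimension_indices` and the sample dataset are hypothetical. It assumes a JSON-stat 2.0 layout in which `id` lists the dimensions in order and each category `index` is either a list of category IDs or a mapping from category ID to position.

```python
def dimension_index(dataset, name, value):
    """Return the position of category `value` within dimension `name`."""
    index = dataset["dimension"][name]["category"]["index"]
    if isinstance(index, list):
        # List form: the position in the list is the category index.
        return index.index(value)
    # Mapping form: category ID -> position.
    return index[value]


def dimension_indices(dataset, query):
    """Convert a list of {dimension ID: category ID} dicts into a list of
    indices, ordered like the dataset's `id` list."""
    wanted = {}
    for item in query:
        wanted.update(item)
    return [dimension_index(dataset, dim, wanted[dim]) for dim in dataset["id"]]


# Hypothetical two-dimension dataset (metadata only).
dataset = {
    "id": ["area", "year"],
    "dimension": {
        "area": {"category": {"index": ["CA", "US", "MX"]}},
        "year": {"category": {"index": {"2010": 0, "2011": 1}}},
    },
}

query = [{"year": "2011"}, {"area": "US"}]
indices = dimension_indices(dataset, query)
print(indices)  # [1, 1]
```

The result follows the dataset's dimension order, not the order in which the query lists the dimensions.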

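get_value(query)

Converts a dimension/category list of dicts into a data value in three steps: the query is converted into a list of dimension indices (get_dimension_indices), those indices into a numeric value index (get_value_index), and that index into the data value (get_value_by_index).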

Parameters:	query (list) – list of dicts with the desired query.
Returns:	value – numeric data value.
Return type:	float

get_value_by_index(index)

Converts a numeric value index into its data value.

Parameters:	index (int) – numeric value index.
Returns:	self[‘value’][index] – numeric data value.
Return type:	float

get_value_index(indices)

Converts a list of dimensions’ indices into a numeric value index.

Parameters:	indices (list) – list of dimensions’ indices.
Returns:	num – numeric value index.
Return type:	int
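
The conversion from per-dimension indices to a single value index can be sketched as below. This is an illustration, not the library's code: `value_index` is a hypothetical name, and the sketch assumes that the flat value array is stored in row-major order (the last dimension varies fastest), as JSON-stat specifies.

```python
def value_index(size, indices):
    """Map per-dimension indices to a position in the flat value array,
    assuming row-major order (the last dimension varies fastest)."""
    num = 0
    for dim_size, idx in zip(size, indices):
        if not 0 <= idx < dim_size:
            raise IndexError("index %d out of range for size %d" % (idx, dim_size))
        # Shift the running position by this dimension's size, then add idx.
        num = num * dim_size + idx
    return num


size = [3, 2]                      # e.g. 3 areas x 2 years
values = [10, 11, 20, 21, 30, 31]  # flat value array

num = value_index(size, [1, 1])
print(num, values[num])  # 3 21
```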

classmethod read(data)

Reads data from a URL, Dataframe, JSON string, JSON file or OrderedDict.

Parameters:	data – can be a Pandas Dataframe, a JSON file, a JSON string, an OrderedDict or a URL pointing to a JSONstat file.
Returns:	An object of class Dataset populated with data.

write(output='jsonstat')

Writes data from a Dataset object to JSONstat or a Pandas Dataframe.

Parameters:	output (string) – can accept ‘jsonstat’ or ‘dataframe’. Defaults to ‘jsonstat’.
Returns:	Serialized JSONstat or a Pandas Dataframe, depending on the ‘output’ parameter.

class pyjstat.Dimension(*args, **kwargs)

A class representing a JSONstat dimension.

classmethod read(data)

Reads data from a URL, Dataframe, JSON string, JSON file or OrderedDict.

Parameters:	data – can be a Pandas Dataframe, a JSON string, a JSON file, an OrderedDict or a URL pointing to a JSONstat file.
Returns:	An object of class Dimension populated with data.

write(output='jsonstat')

Writes data from a Dimension object to JSONstat or a Pandas Dataframe.

Parameters:	output (string) – can accept ‘jsonstat’ or ‘dataframe’. Defaults to ‘jsonstat’.
Returns:	Serialized JSONstat or a Pandas Dataframe, depending on the ‘output’ parameter.

class pyjstat.NumpyEncoder(skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, encoding='utf-8', default=None)

Custom JSON encoder class for Numpy data types.

pyjstat.check_input(naming)

Check and validate input params.

Parameters:	naming (string) – a string containing the naming type (label or id).
Returns:	Nothing
Raises:	`ValueError` – if the parameter is not in the allowed list.

pyjstat.check_version_2(dataset)

Checks whether the json-stat version attribute exists and is equal to or greater than 2.0 for a given dataset.

Parameters:	dataset (OrderedDict) – data in JSON-stat format, previously deserialized to a python object by json.load() or json.loads(), for example.
Returns:	True if version exists and is equal to or greater than 2.0, False otherwise. For datasets without the version attribute, always return False.
Return type:	bool
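
The documented behaviour can be approximated with the standalone sketch below. `has_version_2` is a hypothetical name, not the library function, and treating an unparseable version string as pre-2.0 is an assumption of this sketch.

```python
def has_version_2(dataset):
    """Return True if `dataset` declares a JSON-stat version >= 2.0."""
    version = dataset.get("version")
    if version is None:
        # No version attribute: treated as pre-2.0.
        return False
    try:
        return float(version) >= 2.0
    except (TypeError, ValueError):
        # Assumption: an unparseable version is treated as pre-2.0.
        return False


print(has_version_2({"version": "2.0", "class": "dataset"}))  # True
print(has_version_2({"version": "1.3"}))                      # False
print(has_version_2({"class": "dataset"}))                    # False
```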

pyjstat.from_json_stat(datasets, naming='label', value='value')

Decode JSON-stat formatted data into pandas.DataFrame objects.

Parameters:

datasets (OrderedDict, list) – data in JSON-stat format, previously deserialized to a python object by json.load() or json.loads(), for example. Both List and OrderedDict are accepted as inputs.
naming (string, optional) – dimension naming. Possible values: ‘label’ or ‘id’. Defaults to ‘label’.
value (string, optional) – name of the value column. Defaults to ‘value’.

Returns:	results – list of pandas.DataFrame with imported data.
Return type:	list
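
To show what the `naming` and `value` parameters control, here is a rough standard-library sketch of the decoding idea. It produces plain row dicts instead of data frames and is not how pyjstat is implemented. `decode_rows` and the sample dataset are hypothetical, and the sketch assumes a JSON-stat 2.0 dataset whose category `index` entries are lists.

```python
from itertools import product


def decode_rows(dataset, naming="label", value="value"):
    """Flatten a JSON-stat 2.0 dataset dict into a list of row dicts."""
    if naming not in ("label", "id"):
        raise ValueError("naming must be 'label' or 'id'")
    columns, categories = [], []
    for dim_id in dataset["id"]:
        dim = dataset["dimension"][dim_id]
        cat_ids = dim["category"]["index"]  # assumed to be a list here
        labels = dim["category"].get("label", {})
        if naming == "label":
            # Fall back to the ID when no label is present.
            columns.append(dim.get("label", dim_id))
            categories.append([labels.get(c, c) for c in cat_ids])
        else:
            columns.append(dim_id)
            categories.append(list(cat_ids))
    rows = []
    # product() varies the last dimension fastest, matching the row-major
    # order of the flat value array.
    for combo, val in zip(product(*categories), dataset["value"]):
        row = dict(zip(columns, combo))
        row[value] = val
        rows.append(row)
    return rows


dataset = {
    "version": "2.0",
    "class": "dataset",
    "id": ["area", "year"],
    "size": [2, 2],
    "dimension": {
        "area": {"label": "Country",
                 "category": {"index": ["CA", "US"],
                              "label": {"CA": "Canada", "US": "United States"}}},
        "year": {"label": "Year",
                 "category": {"index": ["2010", "2011"]}},
    },
    "value": [8.0, 7.5, 9.6, 8.9],
}

by_label = decode_rows(dataset)
by_id = decode_rows(dataset, naming="id")
print(by_label[1])  # {'Country': 'Canada', 'Year': '2011', 'value': 7.5}
print(by_id[2])     # {'area': 'US', 'year': '2010', 'value': 9.6}
```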

pyjstat.generate_df(js_dict, naming, value='value')

Decode JSON-stat dict into pandas.DataFrame object. Helper method that should be called inside from_json_stat().

Parameters:

js_dict (OrderedDict) – OrderedDict with data in JSON-stat format, previously deserialized into a python object by json.load() or json.loads(), for example.
naming (string) – dimension naming. Possible values: ‘label’ or ‘id’.
value (string, optional) – name of the value column. Defaults to ‘value’.

Returns:	output – pandas.DataFrame with converted data.
Return type:	DataFrame

pyjstat.get_df_row(dimensions, naming='label', i=0, record=None)

Generate row dimension values for a pandas dataframe.

Parameters:

dimensions (list) – list of pandas dataframes with dimension labels generated by get_dim_label or get_dim_index methods.
naming (string, optional) – dimension naming. Possible values: ‘label’ or ‘id’.
i (int) – dimension list iteration index. Defaults to 0; it is used in the recursive calls to the method.
record (list) – list of values representing a pandas dataframe row, except for the value column. Defaults to empty; it is used in the recursive calls to the method.

Yields:

list – list with pandas dataframe column values except for value column
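
The recursion over `i` and `record` described above can be illustrated with a simplified generator. This is a sketch under stated assumptions: plain lists of category names stand in for the pandas data frames, the `naming` parameter is left out, and `iter_rows` is a hypothetical name.

```python
def iter_rows(dimensions, i=0, record=None):
    """Recursively yield one list per combination of dimension categories."""
    if record is None:
        record = []
    if i == len(dimensions):
        # Every dimension has contributed a value: the row is complete.
        yield list(record)
        return
    for category in dimensions[i]:
        # Fix this dimension's category, then recurse into the next one.
        yield from iter_rows(dimensions, i + 1, record + [category])


dimensions = [["Canada", "Spain"], ["2010", "2011"]]
rows = list(iter_rows(dimensions))
print(rows)
# [['Canada', '2010'], ['Canada', '2011'], ['Spain', '2010'], ['Spain', '2011']]
```

The rows come out with the last dimension varying fastest, which is the same order `itertools.product` would give.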

pyjstat.get_dim_index(js_dict, dim)

Get index from a given dimension.

Parameters:

js_dict (dict) – dictionary containing dataset data and metadata.
dim (string) – dimension name obtained from JSON file.

Returns:	dim_index – DataFrame with index-based dimension data.
Return type:	pandas.DataFrame

pyjstat.get_dim_label(js_dict, dim, input='dataset')

Get label from a given dimension.

Parameters:

js_dict (dict) – dictionary containing dataset data and metadata.
dim (string) – dimension name obtained from JSON file.

Returns:	dim_label – DataFrame with label-based dimension data.
Return type:	pandas.DataFrame

pyjstat.get_dimensions(js_dict, naming)

Get dimensions from input data.

Parameters:

js_dict (dict) – dictionary containing dataset data and metadata.
naming (string) – dimension naming. Possible values: ‘label’ or ‘id’.

Returns:

dimensions (list) – list of pandas data frames with dimension category data.
dim_names (list) – list of strings with dimension names.

pyjstat.get_values(js_dict, value='value')

Get values from input data.

Parameters:

js_dict (dict) – dictionary containing dataset data and metadata.
value (string, optional) – name of the value column. Defaults to ‘value’.

Returns:	values – list of dataset values.
Return type:	list
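
JSON-stat allows the value member to be either an array or an object keyed by position (a sparse form). How pyjstat treats the sparse form is not stated here, so the following is only a sketch of one way to obtain a flat list in both cases. `dense_values` is a hypothetical name, and the sketch assumes a top-level `size` list as in JSON-stat 2.0.

```python
from math import prod


def dense_values(js_dict, value="value"):
    """Return the dataset values as a flat list, expanding the sparse form."""
    values = js_dict[value]
    if isinstance(values, list):
        return values
    # Sparse form: {"position": value}; positions not listed are missing.
    total = prod(js_dict["size"])
    dense = [None] * total
    for position, val in values.items():
        dense[int(position)] = val
    return dense


print(dense_values({"size": [2, 2], "value": [1, 2, 3, 4]}))
# [1, 2, 3, 4]
print(dense_values({"size": [2, 2], "value": {"0": 1.5, "3": 4}}))
# [1.5, None, None, 4]
```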

pyjstat.request(path)

Send a request to a given URL accepting JSON format and return a deserialized Python object.

Parameters:	path (str) – The URI to be requested.
Returns:	response – deserialized JSON Python object.
Raises:

`HTTPError` – the HTTP error returned by the requested server.
`InvalidURL` – an invalid URL has been requested.
`Exception` – generic exception.

pyjstat.to_int(variable)

Convert variable to integer or string depending on the case.

Parameters:	variable (string) – a string containing a real string or an integer.
Returns:	variable – an integer or a string, depending on the content of variable.
Return type:	int, string
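
A conversion of this kind is commonly written with a try/except around `int()`. The sketch below shows that pattern. `to_int_or_str` is a hypothetical name, and this is not necessarily how the library implements it.

```python
def to_int_or_str(variable):
    """Return int(variable) when it parses as an integer, else the string."""
    try:
        return int(variable)
    except ValueError:
        return variable


print(to_int_or_str("2014"))    # 2014
print(to_int_or_str("Canada"))  # Canada
```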

pyjstat.to_json_stat(input_df, value='value', output='list', version='1.3')

Encode pandas.DataFrame object into JSON-stat format. The DataFrames must have exactly one value column.

Parameters:

input_df (pandas.DataFrame) – pandas data frame (or list of data frames) to encode.
value (string, optional) – name of the value column. Defaults to ‘value’.
output (string) – accepts two values: ‘list’ or ‘dict’. Produce list of dicts or dict of dicts as output.
version (string) – desired json-stat version. 2.0 is preferred now. Apart from this, only the older 1.3 format is accepted, which is the default in order to preserve backwards compatibility.

Returns:	output – String with JSON-stat object.
Return type:	string

pyjstat.to_str(variable)

Convert variable to integer or string depending on the case.

Parameters:	variable (string) – a string containing a real string or an integer.
Returns:	variable – an integer or a string, depending on the content of variable.
Return type:	int, string

pyjstat.uniquify(seq)

Return unique values in a list in the original order. See: http://www.peterbe.com/plog/uniqifiers-benchmark

Parameters:	seq (list) – original list.
Returns:	list without duplicates preserving original order.
Return type:	list

pyjstat.unnest_collection(collection, df_list)

Unnest collection structure extracting all its datasets and converting them to Pandas Dataframes.

Parameters:

collection (OrderedDict) – data in JSON-stat format, previously deserialized to a python object by json.load() or json.loads(), for example.
df_list (list) – list variable which will contain the converted datasets.

Returns:	Nothing.
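
The unnesting pattern, a recursive walk that fills a list passed in by the caller and returns nothing, can be sketched as follows. This is a simplified illustration, not the library's code: `collect_datasets` is a hypothetical name, dataset items are collected as plain dicts instead of being converted to data frames, and nested collections are assumed to be embedded inline under `link.item` rather than referenced by URL.

```python
def collect_datasets(collection, found):
    """Append every dataset item reachable from `collection` to `found`."""
    for item in collection.get("link", {}).get("item", []):
        if item.get("class") == "dataset":
            found.append(item)
        elif item.get("class") == "collection":
            # Assumption: the nested collection is embedded inline.
            collect_datasets(item, found)


collection = {
    "class": "collection",
    "link": {"item": [
        {"class": "dataset", "label": "A"},
        {"class": "collection", "link": {"item": [
            {"class": "dataset", "label": "B"},
        ]}},
        {"class": "dimension", "label": "ignored"},
    ]},
}

found = []
collect_datasets(collection, found)
print([d["label"] for d in found])  # ['A', 'B']
```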
