pyjstat

pyjstat is a Python module for manipulating JSON-stat formatted data.

This module reads and writes the JSON-stat [1] format with Python, using the data frame structures provided by the widely used pandas library [2]. JSON-stat is a simple, lightweight JSON format for data dissemination. pyjstat is inspired by rjstat [3], a library by ajschumacher for reading and writing JSON-stat with R.

pyjstat is written and maintained by Miguel Expósito Martín and is distributed under the Apache 2.0 License (see LICENSE file).

[1] http://json-stat.org/ for JSON-stat information
[2] http://pandas.pydata.org for Python Data Analysis Library information
[3] https://github.com/ajschumacher/rjstat for rjstat library information

Example

Importing a JSON-stat file into a pandas data frame can be done as follows:

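import json
from urllib.request import urlopen

import pyjstat

# Download the sample file, deserialize it and decode it into pandas data frames.
url = 'http://json-stat.org/samples/oecd-canada.json'
results = pyjstat.from_json_stat(json.load(urlopen(url)))
print(results)

class pyjstat.Collection(*args, **kwargs)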

A class representing a JSONstat collection.

get(element)

Gets the ith element of a collection as an object of the corresponding class.

Parameters:element (int) – position of the element in the collection.
Returns:An object of the class that corresponds to the selected element.
classmethod read(data)

Reads data from a URL, a JSON string or an OrderedDict.

Parameters:data – can be a URL pointing to a JSONstat file, a JSON string or an OrderedDict.
Returns:An object of class Collection populated with data.
write(output='jsonstat')

Writes data from a Collection object to JSONstat or a list of Pandas Dataframes.

Parameters:output – can accept ‘jsonstat’ or ‘dataframe_list’.
Returns:Serialized JSONstat or a list of Pandas Dataframes, depending on the ‘output’ parameter.
class pyjstat.Dataset(*args, **kwargs)

A class representing a JSONstat dataset.

get_dimension_index(name, value)

Converts a dimension ID string and a category ID string into the numeric index of that category in that dimension.

Parameters:
  • name (string) – ID string of the dimension.
  • value – ID string of the category.
Returns:ndx[value] (int) – index of the category in the dimension.
get_dimension_indices(query)

Converts a dimension/category list of dicts into a list of dimensions’ indices.

Parameters:query (list) – dimension/category list of dicts.
Returns:indices – list of dimensions’ indices.
Return type:list
get_value(query)

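Converts a dimension/category list of dicts into a data value in three steps: the query is converted into dimension indices (get_dimension_indices), the indices into a numeric value index (get_value_index), and the value index into the data value (get_value_by_index).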

Parameters:query (list) – list of dicts with the desired query.
Returns:value – numeric data value.
Return type:float
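The three-step lookup can be illustrated without pyjstat. The sketch below is not the library's implementation. It assumes a JSON-stat 2.0 style dict with id, size, dimension and value keys, values stored in row-major order, and a query given as a list of single-entry dicts. The names category_index, flat_index, lookup_value and the sample dataset are hypothetical.

```python
# Hypothetical two-dimension dataset in JSON-stat 2.0 layout (not real data).
dataset = {
    "id": ["area", "year"],
    "size": [2, 3],
    "dimension": {
        "area": {"category": {"index": {"CA": 0, "US": 1}}},
        "year": {"category": {"index": ["2010", "2011", "2012"]}},
    },
    "value": [10, 11, 12, 20, 21, 22],
}


def category_index(ds, dim_id, cat_id):
    """Step 1: position of a category inside one dimension."""
    index = ds["dimension"][dim_id]["category"]["index"]
    # The index may be an array of ids or an id -> position mapping.
    return index.index(cat_id) if isinstance(index, list) else index[cat_id]


def flat_index(sizes, indices):
    """Step 2: row-major offset of a cell given its per-dimension indices."""
    offset = 0
    for size, i in zip(sizes, indices):
        offset = offset * size + i
    return offset


def lookup_value(ds, query):
    """Step 3: read the value stored at the computed offset."""
    wanted = {}
    for pair in query:
        wanted.update(pair)
    indices = [category_index(ds, dim, wanted[dim]) for dim in ds["id"]]
    return ds["value"][flat_index(ds["size"], indices)]


print(lookup_value(dataset, [{"area": "US"}, {"year": "2011"}]))  # prints 21
```

Here the indices are [1, 1], so the offset is 1 * 3 + 1 = 4 and the fifth stored value is returned.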
get_value_by_index(index)

Converts a numeric value index into its data value.

Parameters:index (int) – numeric value index.
Returns:self[‘value’][index] (float) – numeric data value.
get_value_index(indices)

Converts a list of dimensions’ indices into a numeric value index.

Parameters:indices (list) – list of dimensions’ indices.
Returns:num – numeric value index.
Return type:int
classmethod read(data)

Reads data from a URL, Dataframe, JSON string, JSON file or OrderedDict.

Parameters:data – can be a Pandas Dataframe, a JSON file, a JSON string, an OrderedDict or a URL pointing to a JSONstat file.
Returns:An object of class Dataset populated with data.
write(output='jsonstat')

Writes data from a Dataset object to JSONstat or Pandas Dataframe.

Parameters:output – can accept ‘jsonstat’ or ‘dataframe’. Defaults to ‘jsonstat’.
Returns:Serialized JSONstat or a Pandas Dataframe, depending on the ‘output’ parameter.
class pyjstat.Dimension(*args, **kwargs)

A class representing a JSONstat dimension.

classmethod read(data)

Reads data from a URL, Dataframe, JSON string, JSON file or OrderedDict.

Parameters:data – can be a Pandas Dataframe, a JSON string, a JSON file, an OrderedDict or a URL pointing to a JSONstat file.
Returns:An object of class Dimension populated with data.
write(output='jsonstat')

Writes data from a Dimension object to JSONstat or Pandas Dataframe.

Parameters:output – can accept ‘jsonstat’ or ‘dataframe’.
Returns:Serialized JSONstat or a Pandas Dataframe, depending on the ‘output’ parameter.
class pyjstat.NumpyEncoder(skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, encoding='utf-8', default=None)

Custom JSON encoder class for Numpy data types.

pyjstat.check_input(naming)

Check and validate input params.

Parameters:naming (string) – a string containing the naming type (label or id).
Returns:Nothing
Raises:ValueError – if the parameter is not in the allowed list.
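A validation of this kind can be sketched in a few lines. This is an illustration only, not pyjstat's code: the constant ALLOWED_NAMING and the function name validate_naming are hypothetical, and the allowed values are taken from the parameter description above.

```python
# Hypothetical list of accepted naming types, taken from the description above.
ALLOWED_NAMING = ("label", "id")


def validate_naming(naming):
    """Raise ValueError unless naming is one of the accepted strings."""
    if naming not in ALLOWED_NAMING:
        raise ValueError(
            "naming must be one of %s, got %r" % (", ".join(ALLOWED_NAMING), naming)
        )


validate_naming("label")  # accepted, returns None
```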
pyjstat.check_version_2(dataset)

Checks whether the json-stat version attribute exists and is equal to or greater than 2.0 for a given dataset.

Parameters:dataset (OrderedDict) – data in JSON-stat format, previously deserialized to a python object by json.load() or json.loads().
Returns:True if version exists and is equal to or greater than 2.0, False otherwise. For datasets without the version attribute, always returns False.
Return type:bool
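The rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: it assumes the version is stored under a top-level "version" key as a string such as "2.0", and the function name is_version_2 is hypothetical.

```python
def is_version_2(dataset):
    """Return True only when a version attribute exists and is >= 2.0."""
    version = dataset.get("version")
    if version is None:
        # Datasets without the attribute are treated as older than 2.0.
        return False
    try:
        return float(version) >= 2.0
    except (TypeError, ValueError):
        # A version that cannot be read as a number is treated as unsupported.
        return False


print(is_version_2({"version": "2.0", "class": "dataset"}))
print(is_version_2({"class": "dataset"}))
```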
pyjstat.from_json_stat(datasets, naming='label', value='value')

Decode JSON-stat formatted data into pandas.DataFrame object.

Parameters:
  • datasets (OrderedDict, list) – data in JSON-stat format, previously deserialized to a python object by json.load() or json.loads(), for example. Both List and OrderedDict are accepted as inputs.
  • naming (string, optional) – dimension naming. Possible values: ‘label’ or ‘id’. Defaults to ‘label’.
  • value (string, optional) – name of the value column. Defaults to ‘value’.
Returns:

results – list of pandas.DataFrame with imported data.

Return type:

list
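To show what decoding involves, the sketch below flattens a JSON-stat 2.0 style dict into plain row dicts using only the standard library. It is not pyjstat's implementation and does not build a pandas.DataFrame. It assumes a dense value array in row-major order, and the function dataset_to_rows and the sample data are hypothetical.

```python
from itertools import product


def dataset_to_rows(ds, naming="label", value="value"):
    """Flatten a JSON-stat 2.0 style dict into a list of row dicts."""
    columns, categories = [], []
    for dim_id in ds["id"]:
        dim = ds["dimension"][dim_id]
        index = dim["category"]["index"]
        # Order category ids by position, whether index is a list or a mapping.
        ids = index if isinstance(index, list) else sorted(index, key=index.get)
        labels = dim["category"].get("label", {})
        if naming == "label":
            columns.append(dim.get("label", dim_id))
            categories.append([labels.get(c, c) for c in ids])
        else:
            columns.append(dim_id)
            categories.append(list(ids))
    rows = []
    # product() varies the last dimension fastest, matching row-major values.
    for combo, cell in zip(product(*categories), ds["value"]):
        row = dict(zip(columns, combo))
        row[value] = cell
        rows.append(row)
    return rows


# Hypothetical sample, not real statistics.
sample = {
    "id": ["area", "year"],
    "size": [2, 2],
    "dimension": {
        "area": {
            "label": "Area",
            "category": {
                "index": {"CA": 0, "US": 1},
                "label": {"CA": "Canada", "US": "United States"},
            },
        },
        "year": {"label": "Year", "category": {"index": ["2010", "2011"]}},
    },
    "value": [1, 2, 3, 4],
}

for row in dataset_to_rows(sample, naming="id"):
    print(row)
```

With naming set to ‘label’ the columns and categories use the human-readable labels where they exist; with ‘id’ they use the identifiers.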

pyjstat.generate_df(js_dict, naming, value='value')

Decode JSON-stat dict into pandas.DataFrame object. Helper method that should be called inside from_json_stat().

Parameters:
  • js_dict (OrderedDict) – OrderedDict with data in JSON-stat format, previously deserialized into a python object by json.load() or json.loads(), for example.
  • naming (string) – dimension naming. Possible values: ‘label’ or ‘id’.
  • value (string, optional) – name of the value column. Defaults to ‘value’.
Returns:

output – pandas.DataFrame with converted data.

Return type:

DataFrame

pyjstat.get_df_row(dimensions, naming='label', i=0, record=None)

Generate row dimension values for a pandas dataframe.

Parameters:
  • dimensions (list) – list of pandas dataframes with dimension labels generated by get_dim_label or get_dim_index methods.
  • naming (string, optional) – dimension naming. Possible values: ‘label’ or ‘id’.
  • i (int) – dimension list iteration index. Defaults to 0; it is used in the recursive calls to the method.
  • record (list) – list of values representing a pandas dataframe row, except for the value column. Defaults to empty; it is used in the recursive calls to the method.
Yields:

list – list with pandas dataframe column values except for value column
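The recursive pattern suggested by the i and record parameters can be sketched with a plain generator. This is an illustration, not the library's code: it takes lists of category names instead of pandas dataframes, and the name iter_rows is hypothetical.

```python
def iter_rows(dimensions, i=0, record=None):
    """Yield one list per combination of categories, last dimension fastest."""
    record = [] if record is None else record
    if i == len(dimensions):
        # Every dimension has contributed a category: the row is complete.
        yield list(record)
        return
    for category in dimensions[i]:
        # Recurse into the next dimension with the row built so far.
        yield from iter_rows(dimensions, i + 1, record + [category])


for row in iter_rows([["Canada", "Spain"], ["2010", "2011"]]):
    print(row)
```

Each yielded list holds the dimension columns of one row, and the value column is attached separately.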

pyjstat.get_dim_index(js_dict, dim)

Get index from a given dimension.

Parameters:
  • js_dict (dict) – dictionary containing dataset data and metadata.
  • dim (string) – dimension name obtained from JSON file.
Returns:

dim_index – DataFrame with index-based dimension data.

Return type:

pandas.DataFrame

pyjstat.get_dim_label(js_dict, dim, input='dataset')

Get label from a given dimension.

Parameters:
  • js_dict (dict) – dictionary containing dataset data and metadata.
  • dim (string) – dimension name obtained from JSON file.
Returns:

dim_label – DataFrame with label-based dimension data.

Return type:

pandas.DataFrame

pyjstat.get_dimensions(js_dict, naming)

Get dimensions from input data.

Parameters:
  • js_dict (dict) – dictionary containing dataset data and metadata.
  • naming (string) – dimension naming. Possible values: ‘label’ or ‘id’.
Returns:

dimensions – list of pandas data frames with dimension category data.
dim_names – list of strings with dimension names.

Return type:

list

pyjstat.get_values(js_dict, value='value')

Get values from input data.

Parameters:
  • js_dict (dict) – dictionary containing dataset data and metadata.
  • value (string, optional) – name of the value column. Defaults to ‘value’.
Returns:

values – list of dataset values.

Return type:

list

pyjstat.request(path)

Send a request to a given URL accepting JSON format and return a deserialized Python object.

Parameters:

path (str) – The URI to be requested.

Returns:

Deserialized JSON Python object.

Return type:

response

Raises:
  • HTTPError – the HTTP error returned by the requested server.
  • InvalidURL – an invalid URL has been requested.
  • Exception – generic exception.
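A request of this kind can be sketched with the standard library alone. This is an assumption-laden illustration rather than pyjstat's implementation: the helper names build_request and fetch_json and the timeout parameter are hypothetical, and with urllib the errors raised are urllib.error.HTTPError and urllib.error.URLError rather than the exceptions listed above.

```python
import json
from urllib.request import Request, urlopen


def build_request(path):
    """Prepare a GET request that asks the server for JSON."""
    return Request(path, headers={"Accept": "application/json"})


def fetch_json(path, timeout=10):
    """Send the request and return the deserialized Python object."""
    with urlopen(build_request(path), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request does not touch the network; fetch_json() does.
req = build_request("http://json-stat.org/samples/oecd-canada.json")
print(req.get_header("Accept"))
```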
pyjstat.to_int(variable)

Convert variable to integer or string depending on the case.

Parameters:variable (string) – a string containing a real string or an integer.
Returns:variable – an integer or a string, depending on the content of variable.
Return type:int, string
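The "integer or string depending on the case" behaviour can be sketched as below. This is a guess at the intent based on the description, not the library's code, and the name to_int_or_str is hypothetical.

```python
def to_int_or_str(variable):
    """Return an int when the string holds an integer, else the string itself."""
    try:
        return int(variable)
    except ValueError:
        # Not an integer literal, so keep the original string.
        return variable


print(to_int_or_str("2010"))
print(to_int_or_str("Canada"))
```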
pyjstat.to_json_stat(input_df, value='value', output='list', version='1.3')

Encode pandas.DataFrame object into JSON-stat format. The DataFrames must have exactly one value column.

Parameters:
  • input_df (pandas.DataFrame) – pandas data frame (or list of data frames) to encode.
  • value (string, optional) – name of the value column. Defaults to ‘value’.
  • output (string) – accepts two values: ‘list’ or ‘dict’. Produces a list of dicts or a dict of dicts as output.
  • version (string) – desired json-stat version: ‘2.0’, which is preferred, or the older ‘1.3’, which remains the default in order to preserve backwards compatibility.
Returns:

output – String with JSON-stat object.

Return type:

string
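The encoding direction can be sketched with the standard library. The example below is not pyjstat's implementation and does not take a pandas.DataFrame. It assumes rows given as dicts with string category ids and exactly one value column, and it emits a minimal JSON-stat 2.0 style dataset. The function rows_to_json_stat and the sample rows are hypothetical.

```python
import json
from itertools import product


def rows_to_json_stat(rows, value="value"):
    """Encode row dicts into a minimal JSON-stat 2.0 style dataset string."""
    dim_ids = [key for key in rows[0] if key != value]
    # Category order follows first appearance in the rows.
    categories = {d: list(dict.fromkeys(r[d] for r in rows)) for d in dim_ids}
    cells = {tuple(r[d] for d in dim_ids): r[value] for r in rows}
    combos = product(*(categories[d] for d in dim_ids))
    dataset = {
        "version": "2.0",
        "class": "dataset",
        "id": dim_ids,
        "size": [len(categories[d]) for d in dim_ids],
        "dimension": {d: {"category": {"index": categories[d]}} for d in dim_ids},
        # Row-major order; combinations missing from the rows become null.
        "value": [cells.get(combo) for combo in combos],
    }
    return json.dumps(dataset)


# Hypothetical rows, not real statistics.
sample_rows = [
    {"area": "CA", "year": "2010", "value": 1},
    {"area": "CA", "year": "2011", "value": 2},
    {"area": "US", "year": "2010", "value": 3},
]
print(rows_to_json_stat(sample_rows))
```

The missing US/2011 combination is written as null so that the value array still covers the full cube.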

pyjstat.to_str(variable)

Convert variable to integer or string depending on the case.

Parameters:variable (string) – a string containing a real string or an integer.
Returns:variable – an integer or a string, depending on the content of variable.
Return type:int, string
pyjstat.uniquify(seq)

Return unique values in a list in the original order. See: http://www.peterbe.com/plog/uniqifiers-benchmark

Parameters:seq (list) – original list.
Returns:list without duplicates preserving original order.
Return type:list
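An order-preserving de-duplication can be sketched as follows. This is one common approach, offered as an illustration rather than as the library's code. It assumes the items are hashable, and the name unique_in_order is hypothetical.

```python
def unique_in_order(seq):
    """Return the items of seq without duplicates, keeping first occurrences."""
    seen = set()
    result = []
    for item in seq:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


print(unique_in_order(["2010", "2011", "2010", "2012", "2011"]))
```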
pyjstat.unnest_collection(collection, df_list)

Unnest collection structure extracting all its datasets and converting them to Pandas Dataframes.

Parameters:
  • collection (OrderedDict) – data in JSON-stat format, previously deserialized to a python object by json.load() or json.loads().
  • df_list (list) – list variable which will contain the converted datasets.
Returns:

Nothing.
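The traversal can be sketched without pandas. The example below is an illustration, not the library's implementation: it assumes datasets are embedded in the collection under link, then item, it collects the dataset dicts instead of converting them to dataframes, and the name collect_datasets is hypothetical. Like the function above, it fills the list passed in and returns nothing.

```python
def collect_datasets(collection, found):
    """Append every embedded dataset to found, descending into sub-collections."""
    for item in collection.get("link", {}).get("item", []):
        if item.get("class") == "dataset":
            found.append(item)
        elif item.get("class") == "collection":
            # Nested collections are unnested recursively.
            collect_datasets(item, found)


# Hypothetical collection with one dataset at each nesting level.
nested = {
    "class": "collection",
    "link": {
        "item": [
            {"class": "dataset", "label": "first"},
            {
                "class": "collection",
                "link": {"item": [{"class": "dataset", "label": "second"}]},
            },
        ]
    },
}

datasets = []
collect_datasets(nested, datasets)
print([d["label"] for d in datasets])
```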
