pyjstat is a Python module for manipulating JSON-stat formatted data.
This module allows reading and writing the JSON-stat [1] format with Python, using the data frame structures provided by the widely used pandas library [2]. JSON-stat is a simple, lightweight JSON format for data dissemination. pyjstat is inspired by rjstat [3], a library by ajschumacher for reading and writing JSON-stat with R.
pyjstat is written and maintained by Miguel Expósito Martín and is distributed under the Apache 2.0 License (see LICENSE file).
[1] http://json-stat.org/ for JSON-stat information
[2] http://pandas.pydata.org for Python Data Analysis Library information
[3] https://github.com/ajschumacher/rjstat for rjstat library information
Example
Importing a JSON-stat file into a pandas data frame can be done as follows:
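import json
import urllib.request

from pyjstat import pyjstat

# Download the sample dataset and deserialize it before handing it to pyjstat.
url = 'http://json-stat.org/samples/oecd-canada.json'
results = pyjstat.from_json_stat(json.load(urllib.request.urlopen(url)))
print(results)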
A class representing a JSON-stat collection.
Gets the ith element of a collection as an object of the corresponding class.
Parameters: output – can accept ‘jsonstat’ or ‘dataframe_list’.
Returns: Serialized JSON-stat or a list of pandas DataFrames, depending on the ‘output’ parameter.
Reads data from URL or OrderedDict.
Parameters: data – can be a URL pointing to a JSON-stat file, a JSON string or an OrderedDict.
Returns: An object of class Collection populated with data.
Writes data from a Collection object to JSON-stat or a list of pandas DataFrames.
Parameters: output – can accept ‘jsonstat’ or ‘dataframe_list’.
Returns: Serialized JSON-stat or a list of pandas DataFrames, depending on the ‘output’ parameter.
A class representing a JSON-stat dataset.
Converts a dimension ID string and a category ID string into the numeric index of that category in that dimension.
Parameters: name (string) – ID string of the dimension; value – ID string of the category.
Returns: ndx[value] (int) – index of the category in the dimension.
Converts a dimension/category list of dicts into a list of dimensions’ indices.
Parameters: query – dimension/category list of dicts.
Returns: indices – list of dimensions’ indices.
Return type: list
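As a rough illustration of the two lookups above, the following sketch works on a plain dict shaped like a JSON-stat dataset. The function names, the query shape and the sample data are assumptions made for this example and are not pyjstat's API. In JSON-stat a category index may be either a list of IDs or a mapping from ID to position, so the sketch handles both.

```python
# Hypothetical sample data shaped like a JSON-stat dataset.
dataset = {
    "id": ["sex", "year"],
    "size": [2, 3],
    "dimension": {
        "sex": {"category": {"index": ["M", "F"]}},
        "year": {"category": {"index": {"2010": 0, "2011": 1, "2012": 2}}},
    },
}


def dimension_index(ds, name, value):
    """Return the position of category `value` within dimension `name`."""
    index = ds["dimension"][name]["category"]["index"]
    if isinstance(index, list):
        # List form: the position in the list is the index.
        return index.index(value)
    # Mapping form: the index is stored explicitly.
    return index[value]


def dimension_indices(ds, query):
    """Turn a list of {dimension: category} dicts into positions ordered as ds['id']."""
    merged = {}
    for item in query:
        merged.update(item)
    return [dimension_index(ds, dim, merged[dim]) for dim in ds["id"]]


# The query may list dimensions in any order; the result follows ds['id'].
query = [{"year": "2011"}, {"sex": "F"}]
indices = dimension_indices(dataset, query)
print(indices)
```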
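Converts a dimension/category list of dicts into a data value in three steps: the query is converted into a list of dimensions’ indices, the indices into a numeric value index, and the value index into its data value.
Parameters: query (list) – list of dicts with the desired query.
Returns: value – numeric data value.
Return type: float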
Converts a numeric value index into its data value.
Parameters: index (int) – numeric value index.
Returns: self[‘value’][index] (float) – numeric data value.
Converts a list of dimensions’ indices into a numeric value index.
Parameters: indices (list) – list of dimensions’ indices.
Returns: num – numeric value index.
Return type: int
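The index arithmetic behind the three-step lookup can be sketched as follows. JSON-stat stores values in row-major order, so the last dimension varies fastest. The function names and sample data are assumptions for illustration and are not taken from pyjstat; the first step (query to per-dimension indices) is assumed to have already produced `indices`.

```python
def value_index(sizes, indices):
    """Flatten per-dimension positions into one offset (row-major order)."""
    num = 0
    for size, position in zip(sizes, indices):
        # Shift what has been accumulated so far by the size of this dimension.
        num = num * size + position
    return num


def value_by_index(values, index):
    """Look up the data value stored at a flat offset."""
    return values[index]


# Hypothetical 2 x 3 dataset: two categories in the first dimension, three in the second.
sizes = [2, 3]
values = [10.0, 11.0, 12.0, 20.0, 21.0, 22.0]

indices = [1, 1]                     # second category of each dimension
num = value_index(sizes, indices)    # 1 * 3 + 1
value = value_by_index(values, num)
print(num, value)
```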
Parameters: data – can be a pandas DataFrame, a JSON file, a JSON string, an OrderedDict or a URL pointing to a JSON-stat file.
Returns: An object of class Dataset populated with data.
Writes data from a Dataset object to JSON-stat or a pandas DataFrame.
Parameters: output – can accept ‘jsonstat’ or ‘dataframe’. Defaults to ‘jsonstat’.
Returns: Serialized JSON-stat or a pandas DataFrame, depending on the ‘output’ parameter.
A class representing a JSON-stat dimension.
Parameters: data – can be a pandas DataFrame, a JSON string, a JSON file, an OrderedDict or a URL pointing to a JSON-stat file.
Returns: An object of class Dimension populated with data.
Writes data from a Dimension object to JSON-stat or a pandas DataFrame.
Parameters: output – can accept ‘jsonstat’ or ‘dataframe’.
Returns: Serialized JSON-stat or a pandas DataFrame, depending on the ‘output’ parameter.
Custom JSON encoder class for NumPy data types.
Check and validate input params.
Parameters: naming (string) – a string containing the naming type (label or id).
Returns: Nothing.
Raises: ValueError – if the parameter is not in the allowed list.
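A minimal sketch of this kind of validation, assuming the allowed list holds exactly the two naming types mentioned above. The function name and the error message are illustrative, not pyjstat's own.

```python
# Assumed allowed values, taken from the parameter description above.
ALLOWED_NAMING = ("label", "id")


def check_naming(naming):
    """Raise ValueError unless `naming` is one of the allowed naming types."""
    if naming not in ALLOWED_NAMING:
        raise ValueError(
            "naming must be one of {}, got {!r}".format(ALLOWED_NAMING, naming)
        )


check_naming("label")  # passes silently and returns nothing

try:
    check_naming("name")
except ValueError as err:
    print(err)
```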
Checks if the JSON-stat version attribute exists and is equal to or greater than 2.0 for a given dataset.
Parameters: dataset (OrderedDict) – data in JSON-stat format, previously deserialized to a Python object by json.load() or json.loads().
Returns: True if version exists and is equal to or greater than 2.0, False otherwise. For datasets without the version attribute, always returns False.
Return type: bool
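The check described above might look roughly like this. It is a sketch, not pyjstat's implementation: the function name is hypothetical, and comparing the version as a float is an assumption that only holds for simple "major.minor" strings such as "1.3" or "2.0".

```python
def is_version_2(dataset):
    """Return True when a 'version' attribute exists and is at least 2.0."""
    version = dataset.get("version")
    if version is None:
        # Datasets without the attribute are treated as pre-2.0.
        return False
    try:
        return float(version) >= 2.0
    except (TypeError, ValueError):
        # An unparseable version string is treated as not matching.
        return False


print(is_version_2({"version": "2.0", "class": "dataset"}))
print(is_version_2({"class": "dataset"}))
```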
Decode JSON-stat formatted data into pandas.DataFrame object.
Returns: results – list of pandas.DataFrame with imported data.
Return type: list
Decode JSON-stat dict into pandas.DataFrame object. Helper method that should be called inside from_json_stat().
Returns: output – pandas.DataFrame with converted data.
Return type: DataFrame
Generate row dimension values for a pandas DataFrame.
Yields: list – list with pandas DataFrame column values, except for the value column.
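One way to picture this generator: each data value corresponds to one combination of categories, one per dimension, with the last dimension varying fastest. The sketch below uses `itertools.product` for that; it is a swapped-in technique under that assumption, and the function name and inputs are hypothetical rather than pyjstat's signature.

```python
from itertools import product


def dimension_rows(categories):
    """Yield one list of category labels per data value, last dimension fastest."""
    for combo in product(*categories):
        yield list(combo)


# Hypothetical category labels for two dimensions.
categories = [["M", "F"], ["2010", "2011", "2012"]]
rows = list(dimension_rows(categories))
for row in rows:
    print(row)
```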
Get index from a given dimension.
Returns: dim_index – DataFrame with index-based dimension data.
Return type: pandas.DataFrame
Get label from a given dimension.
Returns: dim_label – DataFrame with label-based dimension data.
Return type: pandas.DataFrame
Get dimensions from input data.
Returns: dimensions – list of pandas DataFrames with dimension category data; dim_names (list) – list of strings with dimension names.
Return type: list
Get values from input data.
Returns: values – list of dataset values.
Return type: list
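In JSON-stat the `value` property may be a plain array or a sparse object keyed by position, so producing a list of values can involve filling in the gaps. The following is a sketch under that assumption; the function name and the choice of `None` for missing positions are illustrative and not confirmed as pyjstat's behaviour.

```python
def dense_values(value, size):
    """Return values as a full list; positions missing from a sparse dict become None."""
    if isinstance(value, dict):
        # Sparse form: keys are positions written as strings.
        return [value.get(str(i)) for i in range(size)]
    return list(value)


print(dense_values({"0": 1.5, "3": 2.0}, 4))
print(dense_values([1.5, 2.5], 2))
```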
Send a request to a given URL accepting JSON format and return a deserialized Python object.
Parameters: path (str) – The URI to be requested.
Returns: Deserialized JSON Python object.
Return type: response
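A request of this kind can be sketched with the standard library alone. This is not pyjstat's implementation: the function names and the timeout value are assumptions, and `urllib.request` is used here only to keep the example dependency-free.

```python
import json
import urllib.request


def build_json_request(path):
    """Create a request that asks the server for a JSON response."""
    return urllib.request.Request(path, headers={"Accept": "application/json"})


def request_json(path, timeout=10):
    """Fetch `path` and return the deserialized JSON body."""
    with urllib.request.urlopen(build_json_request(path), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request does not touch the network; request_json() does.
req = build_json_request("http://json-stat.org/samples/oecd-canada.json")
print(req.get_header("Accept"))
```

Network errors raised by `urllib.request.urlopen` (for example `urllib.error.URLError`) propagate to the caller in this sketch.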
Convert variable to integer or string depending on the case.
Parameters: variable (string) – a string containing a real string or an integer.
Returns: variable – an integer or a string, depending on the content of variable.
Return type: int, string
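A minimal sketch of such a conversion, assuming that "an integer" means anything Python's `int()` accepts. The function name is hypothetical.

```python
def to_int_or_str(variable):
    """Return an int when the string holds an integer, otherwise the string itself."""
    try:
        return int(variable)
    except ValueError:
        return variable


print(to_int_or_str("2010"))
print(to_int_or_str("Canada"))
```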
Returns: output – String with JSON-stat object.
Return type: string
Convert variable to integer or string depending on the case.
Parameters: variable (string) – a string containing a real string or an integer.
Returns: variable – an integer or a string, depending on the content of variable.
Return type: int, string
Return unique values in a list in the original order. See: http://www.peterbe.com/plog/uniqifiers-benchmark
Parameters: seq (list) – original list.
Returns: list without duplicates preserving original order.
Return type: list
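Order-preserving deduplication is commonly written with a set of already-seen items, as in the sketch below. The function name is illustrative, and the items are assumed to be hashable.

```python
def unique_in_order(seq):
    """Return the items of `seq` without duplicates, keeping first occurrences in order."""
    seen = set()
    result = []
    for item in seq:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


print(unique_in_order(["CA", "US", "CA", "MX", "US"]))
```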
Unnest collection structure, extracting all its datasets and converting them to pandas DataFrames.
Returns: Nothing.
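The recursive shape of such an unnesting can be sketched on plain dicts. A JSON-stat collection lists its members under `link.item`, and each member may itself be a dataset or another collection. This sketch assumes the members are embedded in the document rather than referenced by URL, and it collects the dataset dicts instead of converting them to DataFrames. The function name and sample data are hypothetical.

```python
def unnest(collection, found):
    """Append every dataset nested in `collection` to `found`; returns nothing."""
    for item in collection.get("link", {}).get("item", []):
        if item.get("class") == "dataset":
            found.append(item)
        elif item.get("class") == "collection":
            # Nested collection: descend into it.
            unnest(item, found)


# Hypothetical collection with one dataset and one nested collection.
collection = {
    "class": "collection",
    "link": {"item": [
        {"class": "dataset", "label": "A"},
        {"class": "collection", "link": {"item": [
            {"class": "dataset", "label": "B"},
        ]}},
    ]},
}

datasets = []
unnest(collection, datasets)
print([d["label"] for d in datasets])
```

Passing the output list in as an argument is what allows the function to return nothing while still accumulating results across recursive calls.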