- builtins.Exception(builtins.BaseException)
  - FormatStrParseError
- builtins.object
  - JsonToCsv
class FormatStrParseError(builtins.Exception)

FormatStrParseError - Raised if there is an error in parsing the format string.
- Method resolution order:
- FormatStrParseError
- builtins.Exception
- builtins.BaseException
- builtins.object
Data descriptors defined here:
- __weakref__
- list of weak references to the object (if defined)
Methods inherited from builtins.Exception:
- __init__(self, /, *args, **kwargs)
- Initialize self. See help(type(self)) for accurate signature.
- __new__(*args, **kwargs) from builtins.type
- Create and return a new object. See help(type) for accurate signature.
Methods inherited from builtins.BaseException:
- __delattr__(self, name, /)
- Implement delattr(self, name).
- __getattribute__(self, name, /)
- Return getattr(self, name).
- __reduce__(...)
- helper for pickle
- __repr__(self, /)
- Return repr(self).
- __setattr__(self, name, value, /)
- Implement setattr(self, name, value).
- __setstate__(...)
- __str__(self, /)
- Return str(self).
- with_traceback(...)
- Exception.with_traceback(tb) --
set self.__traceback__ to tb and return self.
Data descriptors inherited from builtins.BaseException:
- __cause__
- exception cause
- __context__
- exception context
- __dict__
- __suppress_context__
- __traceback__
- args

class JsonToCsv(builtins.object)

JsonToCsv - Public class containing methods for converting JSON to CSV data, merging data, etc.
Designed to produce RFC 4180 CSV output from JSON data using a meta language.
Methods defined here:
- __init__(self, formatStr, nullValue='', debug=False)
- __init__ - Create a JsonToCsv object.
@param formatStr <str> - The format string describing how the JSON data is to be converted.
@param nullValue <str> Default empty string - The value to assign to a "null" result.
@param debug <bool> Default False - If True, will output some debug data on stderr.
- convertToCsv(self, data, quoteFields='smart', lineSeparator='\r\n')
- convertToCsv - Convert given data to csv.
Alias to calling:
extractData
and then passing those results to:
dataToStr
@param data <string/dict> - Either a string of json data, or a dict
@param quoteFields <bool or 'smart'> Default 'smart' -
If False, fields will not be quoted (thus a comma or newline, etc will break the output, but it looks neater on screen)
If True, fields will always be quoted (protecting against commas, allows values to contain newlines, etc)
If 'smart' (default), the need to quote fields will be auto-determined. This may take slightly longer on HUGE datasets,
but is generally okay.
@param lineSeparator <str> - This will separate the lines. RFC4180 defines CRLF as the preferred ending, but implementations
can vary (e.g. unix generally just uses '\n'). If you plan to have newlines ('\n') in the data, I suggest using '\r\n' as
the lineSeparator, as otherwise many implementations (like Python's own csv module) will swallow the newline within the data.
@return <str> - The csv data, as produced by dataToStr.
- extractData(self, data)
- extractData - Return a list of lists. The outer list represents lines, the inner list data points.
e.g. returnData[0] is the first line, returnData[0][2] is the third data point of the first line.
@param data <string/dict> - Either a string of JSON data, or a dict.
NOTE: This is the recommended method to be used. You can pass the data to
JsonToCsv.dataToStr to convert to csv, tsv, and other formats.
@return list<list<str>> - List of lines, each line containing a list of datapoints.
Static methods defined here:
- dataToStr(csvData, separator=',', quoteFields='smart', lineSeparator='\r\n')
- dataToStr - Convert a list of lists of csv data to a string.
@param csvData list<list> - A list of lists, first list is lines, inner-list are values.
This is the data returned by JsonToCsv.extractData
@param separator <str> - Default ',' this is the separator used between fields (i.e. would be a tab in TSV format)
@param quoteFields <bool or 'smart'> Default 'smart' -
If False, fields will not be quoted (thus a comma or newline, etc will break the output, but it looks neater on screen)
If True, fields will always be quoted (protecting against commas, allows values to contain newlines, etc)
If 'smart' (default), the need to quote fields will be auto-determined. This may take slightly longer on HUGE datasets,
but is generally okay. Quotes within a field (") will be replaced with two adjacent quotes ("") as per RFC4180.
Use 'smart' unless you REALLY need to specify otherwise, as 'smart' will always produce RFC4180 csv files.
@param lineSeparator <str> - This will separate the lines. RFC4180 defines CRLF as the preferred ending, but implementations
can vary (e.g. unix generally just uses '\n'). If you plan to have newlines ('\n') in the data, I suggest using '\r\n' as
the lineSeparator, as otherwise many implementations (like Python's own csv module) will swallow the newline within the data.
@return str - csv data
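The quoting rules described for dataToStr can be illustrated with a small stand-alone sketch. This is not the library's implementation: `quote_field` and `rows_to_str` are hypothetical helper names, and the rule used for 'smart' (quote only fields that contain the separator, a double quote, CR, or LF) is an assumption based on RFC 4180.

```python
def quote_field(value, separator=',', quote_fields='smart'):
    """Quote a single field following RFC 4180 conventions (illustrative only)."""
    # Assumption: 'smart' quotes a field only when leaving it bare would be ambiguous.
    needs_quote = any(ch in value for ch in (separator, '"', '\r', '\n'))
    if quote_fields is True or (quote_fields == 'smart' and needs_quote):
        # Embedded double quotes are doubled, as RFC 4180 requires.
        return '"' + value.replace('"', '""') + '"'
    return value


def rows_to_str(rows, separator=',', quote_fields='smart', line_separator='\r\n'):
    """Join a list of lists of strings into delimited text."""
    return line_separator.join(
        separator.join(quote_field(v, separator, quote_fields) for v in row)
        for row in rows
    )


rows = [['id', 'note'], ['1', 'plain'], ['2', 'has, comma'], ['3', 'says "hi"']]
csv_text = rows_to_str(rows)                   # comma-separated
tsv_text = rows_to_str(rows, separator='\t')   # tab-separated
```

With a tab separator the field `has, comma` no longer needs quoting, while the field containing double quotes is still quoted and escaped.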
- findDuplicates(csvData, fieldNum, flat=False)
- findDuplicates - Find lines with duplicate values in a specific field number.
This is useful to strip duplicates before using JsonToCsv.joinCsv
which requires unique values in the join field.
@see JsonToCsv.joinCsv for example code
@param csvData list<list<str>> - List of lines, each line containing string field values.
JsonToCsv.extractData returns data in this form.
@param fieldNum int - Index of the field number in which to search for duplicates
@param flat bool Default False - If False, return is a map of { "duplicateKey" : lines(copy) }.
If True, return is a flat list of all duplicate lines
@return :
When #flat is False:
dict { duplicateKeyValue[str] : lines[list<list<str>>] (copy) } -
This dict has the values with duplicates as the key, and a COPY of the lines as each value.
When #flat is True
lines[list<list<str>>] (copy)
Copies of all lines with duplicate value in #fieldNum. Duplicates will be adjacent
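The two return shapes of findDuplicates can be sketched as follows. This is a minimal stand-in written for illustration, not the library's code; the name `find_duplicates` and the grouping approach are assumptions.

```python
from collections import defaultdict


def find_duplicates(csv_data, field_num, flat=False):
    """Group lines by the value at field_num and keep groups with more than one line."""
    groups = defaultdict(list)
    for line in csv_data:
        groups[line[field_num]].append(list(line))  # store a copy of each line
    duplicates = {key: lines for key, lines in groups.items() if len(lines) > 1}
    if flat:
        # Flat form: lines sharing a value end up adjacent to each other.
        return [line for lines in duplicates.values() for line in lines]
    return duplicates


data = [['a', '1'], ['b', '2'], ['c', '1'], ['d', '3']]
as_map = find_duplicates(data, 1)
as_list = find_duplicates(data, 1, flat=True)
```

Here `as_map` holds one key, `'1'`, mapped to copies of the two lines that share it, and `as_list` holds those same two lines as a flat list.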
- joinCsv(csvData1, joinFieldNum1, csvData2, joinFieldNum2)
- joinCsv - Join two sets of csv data based on a common field value in the two sets.
csvData should be a list of lists (the outer list is lines, the inner lists are items). Such data is gathered by using the JsonToCsv.extractData method.
Combined data will append the fields of csvData2 to csvData1, omitting the common field from csvData2
@param csvData1 list<list> - The "primary" data set
@param joinFieldNum1 <int> - The index of the common field in csvData1
@param csvData2 list<list> - The secondary data set
@param joinFieldNum2 <int> - The index of the common field in csvData2
@return tuple( mergedData [list<list>], onlyCsvData1 [list<list>], onlyCsvData2 [list<list>] )
Return is a tuple of 3 elements. The first is the merged csv data where a join field matched.
The second is the elements only present in csvData1
The third is the elements only present in csvData2
@raises ValueError - If csvData1 or csvData2 are not in the right format (list of lists)
@raises KeyError - If there are duplicate keys preventing a proper merge
NOTE: each csvData MUST have unique values in the "join field", or it cannot join.
If the join field is not unique, consider the "multiJoinCsv" function instead.
Use multiJoinCsv to link all matches in csvData1 to all matches in csvData2 where join fields match.
JsonToCsv.findDuplicates will identify duplicate values for a given joinfield.
So you can have something like:
    myCsvData = JsonToCsv.extractData(....)
    joinFieldNum = 3 # Example, 4th field is the field we will join on
    myCsvDataDuplicateLines = JsonToCsv.findDuplicates(myCsvData, joinFieldNum, flat=True)
    if myCsvDataDuplicateLines:
        myCsvDataUniq = [line for line in myCsvData if line not in myCsvDataDuplicateLines]
    else:
        myCsvDataUniq = myCsvData
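The join semantics documented for joinCsv (a three-part result, and KeyError on duplicate join values) can be sketched with a small stand-alone function. This is an illustration under assumptions, not the library's implementation: `join_unique` is a hypothetical name, the output ordering is a choice made here, and the ValueError format check is omitted for brevity.

```python
def join_unique(csv_data1, field1, csv_data2, field2):
    """Join two lists of lines on a field that must be unique in each set."""
    def index_by(rows, field):
        index = {}
        for row in rows:
            key = row[field]
            if key in index:
                # Mirrors the documented behaviour: duplicates prevent a proper merge.
                raise KeyError('duplicate join key: %r' % (key,))
            index[key] = row
        return index

    index1 = index_by(csv_data1, field1)
    index2 = index_by(csv_data2, field2)

    merged, only1 = [], []
    for key, row1 in index1.items():
        if key in index2:
            row2 = index2[key]
            # Append the fields of the second set, omitting its join field.
            merged.append(row1 + row2[:field2] + row2[field2 + 1:])
        else:
            only1.append(row1)
    only2 = [row for key, row in index2.items() if key not in index1]
    return merged, only1, only2


people = [['1', 'Ann'], ['2', 'Bob'], ['3', 'Cy']]
cities = [['Paris', '1'], ['Oslo', '3'], ['Rome', '9']]
merged, only_people, only_cities = join_unique(people, 0, cities, 1)
```

In this example `merged` contains the lines for keys `'1'` and `'3'`, `only_people` contains Bob's line, and `only_cities` contains the Rome line.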
- multiJoinCsv(csvData1, joinFieldNum1, csvData2, joinFieldNum2)
- multiJoinCsv - Join two sets of csv data based on a common field value, allowing repeated keys: every line in csvData1
is linked to every line in csvData2 that has the same join field value.
csvData should be a list of lists (the outer list is lines, the inner lists are items). Such data is gathered by using the JsonToCsv.extractData method.
Combined data will append the fields of csvData2 to csvData1, omitting the common field from csvData2
@param csvData1 list<list> - The "primary" data set
@param joinFieldNum1 <int> - The index of the common field in csvData1
@param csvData2 list<list> - The secondary data set
@param joinFieldNum2 <int> - The index of the common field in csvData2
@return tuple( mergedData [list<list>], onlyCsvData1 [list<list>], onlyCsvData2 [list<list>] )
Return is a tuple of 3 elements. The first is the merged csv data where a join field matched.
The second is the elements only present in csvData1
The third is the elements only present in csvData2
@raises ValueError - If csvData1 or csvData2 are not in the right format (list of lists)
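The difference from joinCsv can be sketched as a join that pairs every matching line from the first set with every matching line from the second. This is a minimal illustration, not the library's implementation: `multi_join` is a hypothetical name, the output ordering is an assumption, and the ValueError format check is omitted.

```python
from collections import defaultdict


def multi_join(csv_data1, field1, csv_data2, field2):
    """Join two lists of lines on a field, allowing repeated join values."""
    index2 = defaultdict(list)
    for row in csv_data2:
        index2[row[field2]].append(row)

    merged, only1 = [], []
    keys1 = set()
    for row1 in csv_data1:
        key = row1[field1]
        keys1.add(key)
        matches = index2.get(key)  # .get() avoids creating empty entries
        if not matches:
            only1.append(row1)
            continue
        for row2 in matches:
            # One output line per (row1, row2) pair, omitting row2's join field.
            merged.append(row1 + row2[:field2] + row2[field2 + 1:])
    only2 = [row for row in csv_data2 if row[field2] not in keys1]
    return merged, only1, only2


orders = [['u1', 'book'], ['u1', 'pen'], ['u2', 'ink']]
colors = [['u1', 'red'], ['u1', 'blue'], ['u3', 'green']]
pairs, only_orders, only_colors = multi_join(orders, 0, colors, 0)
```

Key `'u1'` appears twice in each set, so `pairs` contains four lines, one for each combination.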
Data descriptors defined here:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)