Generated API documentation
SavReader
- class savReaderWriter.SavReader(savFileName, returnHeader=False, recodeSysmisTo=None, verbose=False, selectVars=None, idVar=None, rawMode=False, ioUtf8=False, ioLocale=None)
Bases: savReaderWriter.header.Header
Read SPSS system files (.sav, .zsav)
Parameters: savFileName : str
the file name of the spss data file
returnHeader : bool, default False
indicates whether the first record should be a list of variable names
recodeSysmisTo : (value), default None
indicates to which value SPSS system-missing values ($sysmis) should be recoded. Any value below 10 ** -10 is returned as None.
verbose : bool, default False
indicates whether information about the spss data file (e.g., number of cases, variable names, file size) should be printed on the screen.
selectVars : list or None, default None
indicates which variables in the file should be selected. The variables should be specified as a list of valid variable names. If None is specified, all the variables in the file are used.
idVar : str or None, default None
indicates which variable in the file should be used as the ID variable for the get method.
rawMode : bool, default False
indicates whether values should get SPSS-style formatting, and whether date variables (if present) should be converted into ISO dates. If set to True, the program does not format any values, which increases processing speed. In particular, rawMode=True implies that:
- SPSS datetimes will not be converted into ISO 8601 dates
- SPSS N formats will not be converted into strings with leading zeroes
- SPSS $sysmis values will not be converted into None values
- String values will be padded up to the next multiple of 8 bytes
See also Formats and Date formats
ioUtf8 : bool, int, default False
indicates the mode in which text is communicated to or from the I/O Module.
- codepage mode: ioUtf8=CODEPAGE_MODE, or ioUtf8=0, or ioUtf8=False. Use the current ioLocale setting to determine the encoding for reading and writing data. Cf. SET UNICODE=OFF.
- standard unicode mode: ioUtf8=UNICODE_UMODE, or ioUtf8=1, or ioUtf8=True. Use Unicode encoding (UTF-8) for reading and writing data. Data are returned as unicode strings. Cf. SET UNICODE=ON.
- bytes unicode mode: ioUtf8=UNICODE_BMODE, or ioUtf8=2. Like standard unicode mode, but data are returned as byte strings.
See also under savReaderWriter.Generic.ioUtf8() and under ioUtf8 in savReaderWriter.SavWriter.
Changed in version 3.4: ioUtf8=UNICODE_BMODE was added.
ioLocale : str or None, default None
indicates the locale of the I/O module. Cf. SET LOCALE (default = None, which corresponds to locale.setlocale(locale.LC_CTYPE), for example: en_US.UTF-8 (Unix) or english (Windows)). See also under savReaderWriter.Generic.ioLocale().
Examples
Typical use:
with SavReader('somefile.sav', returnHeader=True) as reader:
    header = reader.next()
    for line in reader:
        process(line)
Attributes
Methods
- __init__(savFileName, returnHeader=False, recodeSysmisTo=None, verbose=False, selectVars=None, idVar=None, rawMode=False, ioUtf8=False, ioLocale=None)
Constructor. Initializes all variables that can be recycled.
- __exit__(type, value, tb)
This function closes the spss data file and does some cleaning.
Warning
Always ensure that the .sav file is properly closed, either by using a context manager (with statement) or by using close().
- __len__()
This function reports the number of cases (rows) in the spss data file. For example: len(SavReader(savFileName))
- __cmp__(other)
This function implements behavior for all of the comparison operators, so comparisons can be made between SavReader instances, or between SavReader instances and integers.
- __str__()
This function returns a concise file report of the spss data file. For example:
data = SavReader(savFileName)
print(str(data))  # Python 3: bytes(data)
data.close()
- __unicode__()
This function returns a concise file report of the spss data file. For example:
data = SavReader(savFileName)
print(unicode(data))  # Python 3: str(data)
data.close()
- shape
This function returns the number of rows (nrows) and columns (ncols) as a namedtuple. For example:
data = SavReader(savFileName)
data.shape.nrows == len(data)  # True
data.close()
- formatValues(record)
This function formats date fields to ISO dates (yyyy-mm-dd), plus some other date/time formats. The SPSS N format is formatted to a character value with leading zeroes. System missing values are recoded to <recodeSysmisTo>, which defaults to None. If rawMode==True, this function does nothing.
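The N-format and $sysmis handling described for formatValues can be illustrated with a small sketch. format_n and SYSMIS are hypothetical names, not part of savReaderWriter, and the sketch assumes that $sysmis is the most negative double on the host (see sysmis under Generic). The real formatValues also converts date fields, which is left out here.

```python
import sys

# Assumption: SPSS $sysmis is the most negative double on the host (-DBL_MAX).
SYSMIS = -sys.float_info.max


def format_n(value, width, recode_sysmis_to=None):
    """Hypothetical helper: render a value the way an SPSS Nw format displays it."""
    if value == SYSMIS:
        # system-missing values are recoded instead of formatted
        return recode_sysmis_to
    # N formats hold non-negative integers padded with leading zeroes to width w
    return str(int(value)).zfill(width)


print(format_n(42.0, 5))  # 00042
```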
- __iter__()
x.__iter__() <==> iter(x). Yields records as a list. For example:
with SavReader("someFile.sav") as reader:
    for line in reader:
        process(line)
- __getitem__(key)
x.__getitem__(y) <==> x[y], where y may be int or slice. This function reports the record of case number <key>. The <key> argument may also be a slice, for example:
data = SavReader("someFile.sav")
print("The first six records look like this: %s" % data[:6])
print("The first record looks like this: %s" % data[0])
print("First column: %s" % data[..., 0])  # requires numpy
print("Row 4 & 5, first three cols: %s" % data[4:6, :3])
data.close()
- head(n=5)
This convenience function returns the first <n> records. Example:
data = SavReader("someFile.sav")
print("The first five records look like this: %s" % data.head())
data.close()
- tail(n=5)
This convenience function returns the last <n> records. Example:
data = SavReader("someFile.sav")
print("The last four records look like this: %s" % data.tail(4))
data.close()
- all()
This convenience function returns all the records. Example:
data = SavReader("someFile.sav")
list_of_lists = data.all()
data.close()
- __contains__(item)
This function implements membership testing and returns True if <idVar> contains <item>. Thus, it requires the idVar parameter to be set. Example:
reader = SavReader(savFileName, idVar="ssn")
"987654321" in reader  # returns True or False
- get(key, default=None, full=False)
This function returns the records for which <idVar> == <key> if <key> is in <savFileName>, else <default>. Thus, the function mimics dict.get, but note that dict[key] is NOT implemented. NB: even though this uses a binary search, it is not very fast on large data (especially the first call, and with full=True).
Parameters: key : str, int, float
key for which the corresponding record should be returned
default : (value)
value that should be returned if <key> is not found
full : bool
value that indicates whether all records for which <idVar> == <key> should be returned
Examples
For example:
data = SavReader(savFileName, idVar="ssn")
data.get("987654321", "social security number not found!")
data.close()
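The combination of a slow first call and a binary search suggests a one-off sort followed by bisection. The sketch below models that idea with the standard bisect module; build_index and get are hypothetical helpers operating on plain lists, not savReaderWriter's implementation, and returning a single record when full=False is an assumption.

```python
import bisect


def build_index(records, id_pos):
    """Sort the records once by their ID column so later lookups can bisect."""
    ordered = sorted(records, key=lambda record: record[id_pos])
    keys = [record[id_pos] for record in ordered]
    return keys, ordered


def get(index, key, default=None, full=False):
    """Hypothetical dict.get-style lookup: first match, all matches, or default."""
    keys, ordered = index
    lo = bisect.bisect_left(keys, key)
    if lo == len(keys) or keys[lo] != key:
        return default
    if not full:
        return ordered[lo]
    # all records sharing this ID sit next to each other after sorting
    hi = bisect.bisect_right(keys, key, lo)
    return ordered[lo:hi]


records = [[b"3", 10], [b"1", 20], [b"3", 30], [b"2", 40]]
index = build_index(records, id_pos=0)
print(get(index, b"3", full=True))  # [[b'3', 10], [b'3', 30]]
```

The sort is the expensive part, which is one plausible reason the first call is the slowest; each later lookup costs only O(log n).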
- getSavFileInfo()
This function reads and returns some basic information about the open spss data file.
- spss2strDate(*datetime)
This function converts internal SPSS dates (the number of seconds since midnight, October 14, 1582, the day before the Gregorian calendar took effect) to a human-readable format (ISO 8601 where possible).
Parameters: spssDateValue : int, float
fmt : strptime format
recodeSysmisTo : what SPSS $sysmis values will be replaced with
See also
savReaderWriter.SavReaderNp.spss2datetimeDate
- returns a datetime.datetime object
strptime-formats-settings
- __init__.py: change the strptime formats from ISO into something else. Note that dates before 1900 are not affected by format changes in __init__.py.
Date formats
- overview of SPSS datetime formats
Examples
For example:
data = SavReader(savFileName)
iso_date = data.spss2strDate(11654150400.0, "%Y-%m-%d", None)
data.close()
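The conversion itself is plain date arithmetic on the SPSS epoch. The sketch below is a hypothetical stand-in for spss2strDate using only the standard library; it assumes the epoch stated above and that $sysmis is the most negative double, and it ignores the pre-1900 special handling mentioned in the See also notes.

```python
import datetime
import sys

# Assumptions: the SPSS epoch is midnight, October 14, 1582 (as stated above),
# and $sysmis is the most negative double on the host system.
SPSS_EPOCH = datetime.datetime(1582, 10, 14)
SYSMIS = -sys.float_info.max


def spss_to_str_date(spss_value, fmt="%Y-%m-%d", recode_sysmis_to=None):
    """Hypothetical stand-in for spss2strDate: seconds since the SPSS epoch -> string."""
    if spss_value is None or spss_value == SYSMIS:
        return recode_sysmis_to
    moment = SPSS_EPOCH + datetime.timedelta(seconds=spss_value)
    return moment.strftime(fmt)


print(spss_to_str_date(11654150400.0))  # 1952-02-03
```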
SavReaderNp
New in version 3.4.0.
- class savReaderWriter.SavReaderNp(savFileName, recodeSysmisTo=nan, rawMode=False, ioUtf8=False, ioLocale=None)
Bases: savReaderWriter.savReader.SavReader
Read SPSS .sav file data into a numpy array (either in-memory or mmap)
Parameters: savFileName : str
The file name of the spss data file
recodeSysmisTo : value
Indicates to which value missing values should be recoded
rawMode : bool
Set to True to get faster processing speeds. rawMode=False indicates:
- that trailing blanks will be stripped off of string values
- that datetime variables (if present) will be converted into datetime.datetime objects
- that SPSS $sysmis values will be converted into recodeSysmisTo (default np.nan, except for datetimes).
ioUtf8 : bool
Indicates the mode in which text is communicated to or from the I/O Module. Valid values are True (UTF-8 mode aka Unicode mode) and False (Codepage mode). Cf. SET UNICODE=ON/OFF
ioLocale : locale str
indicates the locale of the I/O module. Cf. SET LOCALE (default = None, which corresponds to locale.setlocale(locale.LC_ALL, "")). For example, en_US.UTF-8.
See also
savReaderWriter.SavWriter
- use the _uncompressed.sav savFileName suffix to write uncompressed files
Examples
Typical use:
# memmapped array; omit the filename to get an in-memory array
reader_np = SavReaderNp("Employee data.sav")
array = reader_np.to_structured_array("/tmp/test.dat")
reader_np.close()
Note. The sav-to-array conversion is MUCH faster when uncompressed .sav files are used. These are created with the SPSS command:
SAVE OUTFILE = 'some_file.sav' /UNCOMPRESSED.
This is NOT the default in SPSS.
Attributes
Methods
- __getitem__(*args)
x.__getitem__(y) <==> x[y], where y may be int or slice
Parameters: key : int, slice
Returns: record : numpy.ndarray
Raises: IndexError, TypeError
- __iter__()
x.__iter__() <==> iter(x). Yields records as a tuple. If rawMode=True, trailing spaces of strings are not removed and SPSS dates are not converted into datetime dates.
Returns: record : tuple
Raises: SPSSIOError
- convert_datetimes(func)
Decorator to convert all the SPSS datetimes into datetime.datetime values. Missing datetimes are converted into the value datetime.datetime(1, 1, 1, 0, 0, 0).
- convert_missings(func)
Decorator to recode numerical missing values into recodeSysmisTo (default: np.nan), unless they are datetimes.
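The decorator pattern described for convert_missings can be sketched in plain Python. recode_missings below is a hypothetical decorator factory (the library's convert_missings decorates its own record-reading methods instead), datetime columns are identified by position as an assumption, and $sysmis is assumed to be the most negative double.

```python
import functools
import math
import sys

SYSMIS = -sys.float_info.max  # assumption: SPSS $sysmis is -DBL_MAX


def recode_missings(datetime_positions=(), recode_to=math.nan):
    """Hypothetical decorator factory modelled on the convert_missings description."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            record = func(*args, **kwargs)
            # recode $sysmis everywhere except in datetime columns
            return tuple(
                recode_to if (pos not in datetime_positions and value == SYSMIS) else value
                for pos, value in enumerate(record)
            )
        return wrapper
    return decorator


@recode_missings(datetime_positions={2})
def read_record():
    # stand-in for unpacking one raw record from a .sav file
    return (1.0, SYSMIS, SYSMIS, b"abc")


print(read_record())  # (1.0, nan, -1.7976931348623157e+308, b'abc')
```

Leaving datetime columns untouched mirrors the text: those are handled separately, as convert_datetimes describes.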
- datetime_dtype
Return the modified dtype that accommodates datetime.datetime values for the variables that were originally SPSS datetimes, stored as floats in the SPSS file.
Returns: datetime dtype : numpy.dtype (complex dtype)
- datetimevars
Returns a list of the datetime variable names (as unicode strings) in the dataset, if any.
- is_homogeneous
Returns a boolean that indicates whether the dataset contains only numerical variables (datetimes excluded). If rawMode=True, datetimes are also considered numeric. A dataset with string variables of equal length is not considered to be homogeneous.
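The rule above can be restated as a small function over a varTypes-style dictionary. This is a hypothetical re-statement for illustration, not the property's actual code; it assumes datetime variables are passed in as a collection of names.

```python
def is_homogeneous(var_types, datetime_vars, raw_mode=False):
    """Hypothetical re-statement of the is_homogeneous rule described above.

    var_types: {varName: varType}, where 0 means numeric and > 0 is a string width.
    datetime_vars: names of numeric variables that hold SPSS datetimes.
    """
    all_numeric = all(vtype == 0 for vtype in var_types.values())
    if not all_numeric:
        # any string variable (even if all strings share one width) breaks homogeneity
        return False
    # datetimes only count as numeric in raw mode
    return raw_mode or not datetime_vars


print(is_homogeneous({b"age": 0, b"bdate": 0}, {b"bdate"}))  # False
```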
- spss2datetimeDate(*datetime)
Convert an SPSS datetime into a datetime.datetime object
Parameters: spssDateValue : float, int
Returns: datetime : datetime.datetime; errors and missings are returned as datetime.datetime(datetime.MINYEAR, 1, 1, 0, 0, 0)
See also
savReaderWriter.SavReader.spss2strDate
- convert SPSS datetime into a datetime string
Date formats
- overview of SPSS datetime formats
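A sketch of the same conversion returning datetime.datetime objects, with errors and missings mapped to the MINYEAR sentinel as described above. spss_to_datetime is a hypothetical name; it assumes the SPSS epoch of midnight, October 14, 1582, and treats NaN, None and values that overflow datetime arithmetic (such as $sysmis) as missing.

```python
import datetime
import math
import sys

SPSS_EPOCH = datetime.datetime(1582, 10, 14)  # midnight, October 14, 1582
MISSING_DATETIME = datetime.datetime(datetime.MINYEAR, 1, 1, 0, 0, 0)
SYSMIS = -sys.float_info.max  # assumption: SPSS $sysmis is -DBL_MAX


def spss_to_datetime(spss_value):
    """Hypothetical stand-in for spss2datetimeDate."""
    try:
        if spss_value is None or math.isnan(spss_value):
            return MISSING_DATETIME
        return SPSS_EPOCH + datetime.timedelta(seconds=spss_value)
    except (OverflowError, TypeError, ValueError):
        # e.g. $sysmis is far outside the range that timedelta can represent
        return MISSING_DATETIME


print(spss_to_datetime(11654150400.0))  # 1952-02-03 00:00:00
```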
- struct_dtype
Get the dtype that is used to unpack the binary record
Returns: struct dtype : numpy.dtype (complex dtype if heterogeneous data, simple dtype otherwise). A complex dtype uses varNames as names and varLabels (if any) as titles (fields).
- to_array(filename=None)
Wrapper for to_ndarray and to_structured_array. Returns an ndarray if the dataset is all-numeric homogeneous (and contains no datetimes), a structured array otherwise.
- to_ndarray(*args)
Converts a homogeneous, all-numeric SPSS dataset into an ndarray, unless the numerical variables are actually datetimes
Parameters: filename : str, optional
The filename for the memory mapped array. If omitted, the array will be in-memory
Returns: array : numpy.ndarray (if filename=None) or numpy.core.memmap.memmap
The array has a simple dtype, i.e. is a regular ndarray
Raises: ValueError : if the data are not homogeneous. If rawMode=False (default), SPSS datetimes are not considered to be numerical, even though they are stored as such in the .sav file
See also
savReaderWriter.SavReaderNp.is_homogeneous
- determines whether a dataset is considered to be all-numeric
Examples
For example:
import numpy.ma
reader_np = SavReaderNp("./test_data/all_numeric.sav")
array = reader_np.to_ndarray()
average = numpy.ma.masked_invalid(array).mean()
reader_np.close()
- to_structured_array(*args)
Return the data in <savFileName> as a structured array, optionally using <filename> as a memmapped file.
Parameters: filename : str, optional
The filename for the memory mapped array. If omitted, the array will be in-memory
Returns: array : numpy.ndarray (if filename=None) or numpy.core.memmap.memmap
The array has a complex dtype, i.e. is a structured array. If defined, varLabels may also be used to retrieve columns
Examples
For example:
reader_np = SavReaderNp("./test_data/Employee data.sav")
array = reader_np.to_structured_array()
mean_salary = array["salary"].mean().round(2)
mean_salary == array["Current Salary"].mean().round(2)  # True
first_record = array[0]
reader_np.close()
- trunc_dtype
Returns the numpy dtype using the SPSS display formats
The following spss-format to numpy-dtype conversions are made:
spss          numpy
<= F2         float16 (f2)
F3-F5         float32 (f4)
>= F5         float64 (f8)
(datetime)    float64 (f8)*
A1 >=         S1 >= (a1)
*) Subsequently converted to datetime.datetime unless rawMode=True. Examples of SPSS datetime display formats are SDATE, EDATE, ADATE, JDATE and TIME.
Note that all numerical values are stored in SPSS files as double-precision floats. The SPSS display formats are used to create a more compact dtype. Datetime formats are never shrunk to a more compact format. In the table above, only F and A formats are displayed, but other numerical (e.g. DOLLAR) or string (e.g. AHEX) formats are treated the same way, e.g. DOLLAR5.2 will become float64.
Returns: truncated dtype : numpy.dtype (complex dtype)
See also
Formats
- overview of SPSS display formats
Date formats
- overview of SPSS datetime formats
- uformats
Returns a dictionary of variable names (keys) and SPSS formats (values), both as unicode strings
- uvarNames
Returns a list of variable names, as unicode strings
- uvarTypes
Returns a dictionary of variable names, as unicode strings (keys), and variable types (values, int). Variable type == 0 indicates numerical values; other values indicate the string length in bytes.
SavHeaderReader
- class savReaderWriter.SavHeaderReader(savFileName, ioUtf8=False, ioLocale=None)
Bases: savReaderWriter.header.Header
This class contains methods that read the data dictionary of an SPSS data file. This yields the same information as the SPSS command DISPLAY DICTIONARY. NB: do not confuse an SPSS dictionary with a Python dictionary!
Parameters: savFileName : str
The file name of the spss data file
ioUtf8 : bool, int, default False
Indicates the mode in which text is communicated to or from the I/O Module. See also under savReaderWriter.Generic.ioUtf8() and under ioUtf8 in savReaderWriter.SavReader.
Changed in version 3.4: ioUtf8=UNICODE_BMODE was added.
ioLocale : locale str, optional
indicates the locale of the I/O module. Cf. SET LOCALE. (default = None, which corresponds to locale.setlocale(locale.LC_CTYPE))
See also
savReaderWriter.Header
- for more options to retrieve individual metadata items
Examples
Typical use:
with SavHeaderReader(savFileName) as header:
    metadata = header.all()
    report = str(header)
    print(metadata.varLabels)
Attributes
Methods
- __init__(savFileName, ioUtf8=False, ioLocale=None)
Constructor. Initializes all variables that can be recycled.
- __str__()
This function returns a report of the SPSS data dictionary (i.e., the header), in the encoding of the spss file.
- __unicode__()
This function returns a report of the SPSS data dictionary (i.e., the header).
- __enter__()
This function returns the SavHeaderReader object itself so its methods become available for use with context managers (with statements).
Warning
Always ensure that the .sav file is properly closed, either by using a context manager (with statement) or by using close().
- dataDictionary(asNamedtuple=False)
This function returns all the dictionary items. It returns a Python dictionary based on the SPSS dictionary of the given SPSS file. This is equivalent to the SPSS command DISPLAY DICTIONARY. If asNamedtuple=True, this function returns a namedtuple, so one can retrieve metadata like e.g. metadata.valueLabels.
SavWriter
The most commonly used metadata aspects include VARIABLE LABELS, VALUE LABELS, FORMATS and MISSING VALUES.
- class savReaderWriter.SavWriter(savFileName, varNames, varTypes, valueLabels=None, varLabels=None, formats=None, missingValues=None, measureLevels=None, columnWidths=None, alignments=None, varSets=None, varRoles=None, varAttributes=None, fileAttributes=None, fileLabel=None, multRespDefs=None, caseWeightVar=None, overwrite=True, ioUtf8=False, ioLocale=None, mode='wb', refSavFileName=None)
Bases: savReaderWriter.header.Header
Write SPSS system files (.sav, .zsav)
Below, the associated SPSS commands are given in CAPS.
Parameters: savFileName : str
The file name of the spss data file.
- File names that end with ‘.sav’ are compressed using the ‘old’ compression scheme
- File names that end with ‘_uncompressed.sav’ are, well, not compressed. This is useful when you intend to read the files with the faster savReaderWriter.SavReaderNp class
- File names that end with ‘.zsav’ are compressed using the ZLIB (ZSAV) compression scheme (requires v21 SPSS I/O files)
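The suffix rules above can be summarised in a few lines. compression_for is a hypothetical helper; its return values are assumed to correspond to the 'uncompressed', 'standard' and 'zlib' switches listed under fileCompression in Generic, and the suffix checks are case-sensitive, which may differ from the library.

```python
def compression_for(sav_file_name):
    """Hypothetical helper: map a savFileName suffix to a fileCompression value."""
    # Check the most specific suffix first: '_uncompressed.sav' also ends in '.sav'.
    if sav_file_name.endswith("_uncompressed.sav"):
        return "uncompressed"
    if sav_file_name.endswith(".zsav"):
        return "zlib"
    if sav_file_name.endswith(".sav"):
        return "standard"
    raise ValueError("expected a .sav or .zsav file name, got %r" % sav_file_name)


print(compression_for("survey_uncompressed.sav"))  # uncompressed
```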
varNames : list
list of strings of the variable names in the order in which they should appear in the spss data file. See also under savReaderWriter.Header.varNamesTypes().
varTypes : dict
varTypes dictionary {varName: varType}
- varType == 0 --> numeric
- varType > 0 --> character of that length (in bytes!)
See also under savReaderWriter.Header.varNamesTypes().
valueLabels : dict, optional
value label dictionary {varName: {value: label}}. Cf. VALUE LABELS. See also under savReaderWriter.Header.valueLabels().
varLabels : dict, optional
variable label dictionary {varName: varLabel}. Cf. VARIABLE LABELS. See also under savReaderWriter.Header.varLabels().
formats : dict, optional
format dictionary {varName: printFmt}. Cf. FORMATS. See also under savReaderWriter.Header.formats(), under Formats and under Date formats.
missingValues : dict, optional
missing values dictionary {varName: {missing value spec}}. Cf. MISSING VALUES. See also under savReaderWriter.Header.missingValues().
measureLevels : dict, optional
measurement level dictionary {varName: <level>}. Valid levels are: “unknown”, “nominal”, “ordinal”, “scale”, “ratio”, “flag”, “typeless”. Cf. VARIABLE LEVEL. See also under savReaderWriter.Header.measureLevels().
Warning
measureLevels, columnWidths and alignments must all three be set, if used
columnWidths : dict, optional
column display width dictionary {varName: <int>}. Cf. VARIABLE WIDTH. (default: None --> >= 10 [stringVars] or automatic [numVars]) See also under savReaderWriter.Header.columnWidths().
alignments : dict, optional
variable alignment dictionary {varName: <left/center/right>}. Cf. VARIABLE ALIGNMENT (default: None --> left). See also under savReaderWriter.Header.alignments().
varSets : dict, optional
sets dictionary {setName: list_of_valid_varNames}. Cf. SETSMR command. See also under savReaderWriter.Header.varSets().
varRoles : dict, optional
variable roles dictionary {varName: varRole}, where varRole may be any of the following: ‘both’, ‘frequency’, ‘input’, ‘none’, ‘partition’, ‘record ID’, ‘split’, ‘target’. Cf. VARIABLE ROLE. See also under savReaderWriter.Header.varRoles().
varAttributes : dict, optional
variable attributes dictionary {varName: {attribName: attribValue}}. Cf. VARIABLE ATTRIBUTES. See also under savReaderWriter.Header.varAttributes().
fileAttributes : dict, optional
file attributes dictionary {attribName: attribValue}. Cf. FILE ATTRIBUTES. See also under savReaderWriter.Header.fileAttributes().
fileLabel : str, optional
file label string, which defaults to “File created by user <username> at <datetime>” if fileLabel is None. Cf. FILE LABEL. See also under savReaderWriter.Header.fileLabel().
multRespDefs : dict, optional
multiple response sets definitions (dichotomy groups or category groups) dictionary {setName: <set definition>}. In SPSS syntax, ‘setName’ has a dollar prefix (‘$someSet’). Cf. MRSETS. See also under savReaderWriter.Header.multRespDefs().
caseWeightVar : str, optional
valid varName that is set as case weight (cf. WEIGHT BY). See also under savReaderWriter.Header.caseWeightVar().
overwrite : bool, optional
indicates whether an existing SPSS file should be overwritten
ioUtf8 : bool, optional
indicates the mode in which text is communicated to or from the I/O Module. This refers to unicode mode (SET UNICODE=ON) and codepage mode (SET UNICODE=OFF) in SPSS. See also under savReaderWriter.Generic.ioUtf8() and under ioUtf8 in savReaderWriter.SavReader.
- ioUtf8=False. Use the current ioLocale setting to determine the encoding for writing data.
- ioUtf8=True. Use Unicode encoding (UTF-8) for writing data.
Note: Data files saved in Unicode encoding cannot be read by versions of IBM SPSS Statistics prior to 16. Unicode mode is the default since IBM SPSS Statistics version 21. When opening code page IBM SPSS Statistics data files in Unicode mode or saving data files as Unicode in codepage mode, defined string widths are automatically tripled.
ioLocale : str, optional
indicates the locale of the I/O module, cf. SET LOCALE (default: None, which is the same as locale.setlocale(locale.LC_CTYPE)). See also under savReaderWriter.Generic.ioLocale()
mode : str, optional
indicates the mode in which savFileName should be opened. Possible values are:
- “wb” --> write
- “ab” --> append
- “cp” --> copy: initialize header using refSavFileName as a reference file, cf. APPLY DICTIONARY.
refSavFileName : str, optional
reference file that should be used to initialize the header (aka the SPSS data dictionary) containing variable label, value label, missing value, etc. definitions. Only relevant in conjunction with mode="cp".
See also
savReaderWriter.Header
- for details about how to define individual metadata items
Examples
Typical use:
records = [[b'Test1', 1, 1], [b'Test2', 2, 1]]
varNames = [b'var1', b'v2', b'v3']
varTypes = {b'var1': 5, b'v2': 0, b'v3': 0}
savFileName = 'someFile.sav'
with SavWriter(savFileName, varNames, varTypes) as writer:
    for record in records:
        writer.writerow(record)
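The varTypes dictionary in the example above can be derived from sample records. infer_var_types is a hypothetical helper, not part of savReaderWriter; it assumes numeric columns hold int or float values and measures text in UTF-8 bytes, which matches unicode mode but not necessarily codepage mode.

```python
def infer_var_types(var_names, records):
    """Hypothetical helper: derive a SavWriter-style varTypes dict from sample records.

    Numeric columns get type 0; other columns get the longest length seen, in bytes.
    """
    var_types = {}
    for pos, name in enumerate(var_names):
        column = [record[pos] for record in records]
        if all(isinstance(value, (int, float)) for value in column):
            var_types[name] = 0
        else:
            # string widths are in bytes, so encode any text values first
            widths = [len(v if isinstance(v, bytes) else str(v).encode("utf-8"))
                      for v in column]
            var_types[name] = max(widths)
    return var_types


records = [[b'Test1', 1, 1], [b'Test2', 2, 1]]
print(infer_var_types([b'var1', b'v2', b'v3'], records))  # {b'var1': 5, b'v2': 0, b'v3': 0}
```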
Attributes
Methods
- __init__(savFileName, varNames, varTypes, valueLabels=None, varLabels=None, formats=None, missingValues=None, measureLevels=None, columnWidths=None, alignments=None, varSets=None, varRoles=None, varAttributes=None, fileAttributes=None, fileLabel=None, multRespDefs=None, caseWeightVar=None, overwrite=True, ioUtf8=False, ioLocale=None, mode='wb', refSavFileName=None)
Constructor. Initializes all variables that can be recycled.
- __enter__()
This function returns the writer object itself so the writerow and writerows methods become available for use with with statements.
- __exit__(type, value, tb)
This function closes the spss data file.
Warning
Always ensure that the .sav file is properly closed, either by using a context manager (with statement) or by using close().
- convertDate(day, month, year)
This function converts a Gregorian date expressed as day-month-year to the internal SPSS date format. The time portion of the date variable is set to 0:00. To set the time portion of the date variable to another value, use convertTime.
- convertTime(day, hour, minute, second)
This function converts a time given as day, hours, minutes, and seconds to the internal SPSS time format.
- spssDateTime(datetimeStr='2001-12-08', strptimeFmt='%Y-%m-%d')
This function converts a date/time string into an SPSS date, using a strptime format. See also Date formats
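The three conversions above reduce to seconds counted from the SPSS epoch (midnight, October 14, 1582, as stated under spss2strDate). The sketch below gives hypothetical pure-Python equivalents; they are illustrations under that assumption, not the writer's own methods, which go through the SPSS I/O Module.

```python
import datetime

SPSS_EPOCH = datetime.datetime(1582, 10, 14)  # midnight, October 14, 1582


def convert_date(day, month, year):
    """Hypothetical equivalent of convertDate: seconds since the SPSS epoch, time 0:00."""
    return (datetime.datetime(year, month, day) - SPSS_EPOCH).total_seconds()


def convert_time(day, hour, minute, second):
    """Hypothetical equivalent of convertTime: a duration expressed in seconds."""
    return day * 86400 + hour * 3600 + minute * 60 + second


def spss_date_time(datetime_str="2001-12-08", strptime_fmt="%Y-%m-%d"):
    """Hypothetical equivalent of spssDateTime."""
    parsed = datetime.datetime.strptime(datetime_str, strptime_fmt)
    return (parsed - SPSS_EPOCH).total_seconds()


# a date plus a time portion, combined as the convertDate description suggests
stamp = convert_date(8, 12, 2001) + convert_time(0, 13, 30, 0)
```

Adding the convertTime result to the convertDate result is what "set the time portion" amounts to in this model, since both are plain second counts.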
- writerows(records)
This function writes all records.
Parameters: records : list, tuple, numpy.ndarray, pandas.DataFrame, or similar
the records to be written to the .sav file
Raises: TypeError : if the records instance is not of a suitable type
ValueError : if bool(records) == False, or if the array/DataFrame is empty
Header
Note
This class should not be used directly. Use SavHeaderReader or SavReader to retrieve metadata.
- class savReaderWriter.Header(savFileName, mode, refSavFileName, ioUtf8=False, ioLocale=None)
Bases: savReaderWriter.generic.Generic
This class contains methods responsible for getting and setting meta data that is embedded in the IBM SPSS Statistics data file. In SPSS speak, this header information is known as the SPSS Data Dictionary (which has diddly squat to do with a Python dictionary!). NOTE: this class should not be called directly. Use SavHeaderReader to retrieve metadata.
Attributes
Methods
- numberofCases
This function reports the number of cases present in a data file. Prehistoric files (< SPSS v6.0) don’t contain nCases info, therefore a guesstimate of the number of cases is given for those files (cf. SHOW N).
See also
savReaderWriter.SavReader.__len__
- use len(reader) to get the number of cases
savReaderWriter.SavReader.shape
- use reader.shape to get an (nrows, ncols) namedtuple
- numberofVariables
This function returns the number of variables (columns) in the spss dataset.
See also
savReaderWriter.SavReader.shape
- use reader.shape to get an (nrows, ncols) namedtuple
- varNamesTypes
Get/Set a tuple of variable names and types
- Variable names is a list of the form [b'var1', b'var2', b'etc']
- Variable types is a dictionary of the form {varName: varType}
The variable type code is an integer in the range 0-32767, 0 indicating a numeric variable (e.g., F8.2) and a positive value indicating a string variable of that size (in bytes).
- valueLabels
Get/Set VALUE LABELS. Takes a dictionary of the form {varName: {value: valueLabel}}:
{b'numGender': {1: b'female', 2: b'male'}, b'strGender': {b'f': b'female', b'm': b'male'}}
- varLabels
Get/Set VARIABLE LABELS. Returns/takes a dictionary of the form {varName: varLabel}. For example:
{b'salary': b'Salary (dollars)', b'educ': b'Educational level (years)'}
- formats
Get the PRINT FORMATS, set PRINT FORMATS and WRITE FORMATS. Returns/takes a dictionary of the form {varName: <spss-format>}. For example:
{b'salary': b'DOLLAR8', b'gender': b'A1', b'educ': b'F8.2'}
- missingValues
Get/Set MISSING VALUES. User missing values are values that will not be included in calculations by SPSS. For example, ‘don’t know’ might be coded as a user missing value (a value of 999 is typically used, so when variable ‘age’ has values 5, 15, 999, the average age is 10). This is different from ‘system missing values’, which are blank/null values. Takes a dictionary of the following form:
# note that 'lower', 'upper', 'value(s)' are without b' prefix
missingValues = {
    # discrete values
    b"someNumvar1": {"values": [999, -1, -2]},
    # range, cf. MISSING VALUES x (-9 THRU -1)
    b"someNumvar2": {"lower": -9, "upper": -1},
    b"someNumvar3": {"lower": -9, "upper": -1, "value": 999},
    # string variables can have up to three missing values
    b"someStrvar1": {"values": [b"foo", b"bar", b"baz"]},
    b"someStrvar2": {"values": b"bletch"}
}
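The shapes in the missingValues example follow the usual SPSS rules: up to three discrete values, or a range optionally combined with one discrete value, and no ranges for string variables. The validator below is a hypothetical pre-flight check built from those rules and the example above; savReaderWriter may accept or reject specs differently.

```python
def check_missing_spec(spec, is_string=False):
    """Hypothetical validator for one {missing value spec} entry.

    Numeric variables: up to three discrete 'values', or a 'lower'/'upper'
    range, optionally plus one discrete 'value'. String variables: up to three
    discrete 'values' only.
    """
    allowed = {"values"} if is_string else {"values", "lower", "upper", "value"}
    unknown = set(spec) - allowed
    if unknown:
        raise ValueError("unexpected keys: %s" % sorted(unknown))
    values = spec.get("values", [])
    if not isinstance(values, list):
        values = [values]  # a single value may be given without a list
    if len(values) > 3:
        raise ValueError("at most three discrete missing values are allowed")
    has_range = "lower" in spec or "upper" in spec
    if has_range and not ("lower" in spec and "upper" in spec):
        raise ValueError("a range needs both 'lower' and 'upper'")
    if has_range and values:
        raise ValueError("combine a range with a single 'value', not with 'values'")
    if "value" in spec and not has_range:
        raise ValueError("'value' is only used together with a range")
    return True
```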
- measureLevels
Get/Set VARIABLE LEVEL (measurement level). Returns/Takes a dictionary of the form {varName: varMeasureLevel}. Valid measurement levels are: “unknown”, “nominal”, “ordinal”, “scale”, “ratio”, “flag”, “typeless”. This is used in SPSS procedures such as CTABLES.
- columnWidths
Get/Set VARIABLE WIDTH (display width). Returns/Takes a dictionary of the form {varName: <int>}. A value of zero is special and means that the IBM SPSS Statistics Data Editor is to set an appropriate width using its own algorithm. If used, variable alignment, measurement level and column width all need to be set.
- alignments
Get/Set VARIABLE ALIGNMENT. Returns/Takes a dictionary of the form {varName: alignment}. Valid alignment values are: “left”, “right”, “center”.
Warning
measureLevels, columnWidths and alignments must all three be set, if used
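The all-or-nothing rule for measureLevels, columnWidths and alignments, together with the valid values listed above, can be checked before a SavWriter is created. check_display_settings is a hypothetical helper, and it assumes the levels and alignments are given as str values (the library may expect byte strings).

```python
VALID_LEVELS = {"unknown", "nominal", "ordinal", "scale", "ratio", "flag", "typeless"}
VALID_ALIGNMENTS = {"left", "right", "center"}


def check_display_settings(measure_levels=None, column_widths=None, alignments=None):
    """Hypothetical pre-flight check: all three dicts or none, with valid values."""
    given = [d is not None for d in (measure_levels, column_widths, alignments)]
    if any(given) and not all(given):
        raise ValueError("measureLevels, columnWidths and alignments must all three be set")
    if not any(given):
        return True
    for name, level in measure_levels.items():
        if level not in VALID_LEVELS:
            raise ValueError("invalid measurement level for %r: %r" % (name, level))
    for name, width in column_widths.items():
        # zero is allowed: it lets the Data Editor pick a width itself
        if not isinstance(width, int) or width < 0:
            raise ValueError("column width for %r must be an int >= 0" % (name,))
    for name, align in alignments.items():
        if align not in VALID_ALIGNMENTS:
            raise ValueError("invalid alignment for %r: %r" % (name, align))
    return True
```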
- varSets
Get/Set VARIABLE SET information. Returns/Takes a dictionary with setname as keys and a list of SPSS variables as values. For example:
{b'SALARY': [b'salbegin', b'salary'], b'DEMOGR': [b'gender', b'minority', b'educ']}
- varRoles
Get/Set VARIABLE ROLES. Returns/Takes a dictionary of the form {varName: varRole}, where varRole may be any of the following: ‘both’, ‘frequency’, ‘input’, ‘none’, ‘partition’, ‘record ID’, ‘split’, ‘target’.
- varAttributes
Get/Set VARIABLE ATTRIBUTES. Returns/Takes a dictionary of the form:
{b'var1': {b'attr name x': b'attr value x', b'attr name y': b'attr value y'},
 b'var2': {b'attr name a': b'attr value a', b'attr name b': b'attr value b'}}
- fileAttributes
Get/Set DATAFILE ATTRIBUTES. Returns/Takes a dictionary of the form:
{b'attrName[1]': b'attrValue1', b'revision[1]': b'2010-10-09', b'revision[2]': b'2010-10-22', b'revision[3]': b'2010-11-19'}
Square brackets indicate attribute arrays, which must start with 1.
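Attribute arrays are encoded in the keys themselves, so a flat fileAttributes dictionary can be regrouped into lists. group_attribute_arrays is a hypothetical helper; it assumes that array indices must run from 1 without gaps and passes keys without brackets through unchanged.

```python
import re

# matches e.g. b'revision[2]' -> name b'revision', index 2
ARRAY_KEY = re.compile(rb"^(.+)\[(\d+)\]$")


def group_attribute_arrays(file_attributes):
    """Hypothetical helper: collect b'name[i]' keys into ordered lists."""
    result, arrays = {}, {}
    for key, value in file_attributes.items():
        match = ARRAY_KEY.match(key)
        if match is None:
            result[key] = value  # plain attribute, no brackets
        else:
            arrays.setdefault(match.group(1), {})[int(match.group(2))] = value
    for name, items in arrays.items():
        indices = sorted(items)
        # arrays start at 1; we additionally assume they have no gaps
        if indices != list(range(1, len(indices) + 1)):
            raise ValueError("attribute array %r must be numbered 1..n" % name)
        result[name] = [items[i] for i in indices]
    return result
```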
- multRespDefs
Get/Set MRSETS (multiple response) sets. Returns/takes a dictionary of the form:
- multiple category sets: {setName: {"setType": "C", "label": lbl, "varNames": [<list_of_varNames>]}}
- multiple dichotomy sets: {setName: {"setType": "D", "label": lbl, "varNames": [<list_of_varNames>], "countedValue": countedValue}}
- extended multiple dichotomy sets: {setName: {"setType": "E", "label": lbl, "varNames": [<list_of_varNames>], "countedValue": countedValue, "firstVarIsLabel": <bool>}}
Note. You can get values of extended multiple dichotomy sets with getMultRespSetsDefEx, but you cannot write extended multiple dichotomy sets.
For example:
categorical = {b"setType": b"C", b"label": b"labelC",
               b"varNames": [b"salary", b"educ"]}
dichotomous1 = {b"setType": b"D", b"label": b"labelD",
                b"varNames": [b"salary", b"educ"], b"countedValue": b"Yes"}
dichotomous2 = {b"setType": b"D", b"label": b"",
                b"varNames": [b"salary", b"educ", b"jobcat"], b"countedValue": b"No"}
extended1 = {b"setType": b"E", b"label": b"",
             b"varNames": [b"mevar1", b"mevar2", b"mevar3"], b"countedValue": b"1",
             b"firstVarIsLabel": True}
extended2 = {b"setType": b"E", b"label": b"Enhanced set with user specified label",
             b"varNames": [b"mevar4", b"mevar5", b"mevar6"], b"countedValue": b"Yes",
             b"firstVarIsLabel": False}
multRespDefs = {b"testSetC": categorical, b"testSetD1": dichotomous1,
                b"testSetD2": dichotomous2, b"testSetEx1": extended1,
                b"testSetEx2": extended2}
- caseWeightVar
Get/Set WEIGHT variable. Takes a valid varName, and returns the weight variable, if any, as a string.
- textInfo
Get/Set text information. Takes a savFileName and returns a string of the form: “File %r built using SavReaderWriter.py version %s (%s)”. This is akin to, but not equivalent to, the SPSS syntax command DISPLAY DOCUMENTS.
- fileLabel
Get/Set FILE LABEL (id string). Takes a file label, and returns the file label, if any, as a byte string.
Generic
Note
This class should not be used directly
- class savReaderWriter.Generic(savFileName, ioUtf8=False, ioLocale=None)
Bases: object
Class for methods and data used in reading as well as writing IBM SPSS Statistics data files
Attributes
Methods
- byteorder
This function returns the byte order of the open file as a string. It returns either ‘little’ or ‘big’.
- spssVersion
Return the SPSS version that was used to create the opened file as a three-tuple indicating major, minor, and fixpack version as ints. NB: in the transition from SPSS to IBM, a new four-digit versioning nomenclature was introduced. This function returns the old three-digit nomenclature. Therefore, no patch version information is available.
- spssioVersion
This function returns the version of the IBM SPSS I/O libraries as a named tuple with the fields major, minor, patch, fixpack. May also be inspected by passing an empty savFileName, as in: savReaderWriter.Generic("").spssioVersion
- fileCompression
Get/Set the file compression. Returns/Takes a compression switch which may be any of the following: ‘uncompressed’, ‘standard’, or ‘zlib’. Zlib compression requires SPSS v21 I/O files.
- systemString
This function returns the name of the system under which the file was created as a string.
- sysmis
This function returns the IBM SPSS Statistics system-missing value ($SYSMIS) for the host system (also called ‘NA’ in other systems).
- missingValuesLowHigh
This function returns the ‘lowest’ and ‘highest’ values used for numeric missing value ranges on the host system. This can be used in a similar way as the LO and HI keywords in missing values specifications (cf. MISSING VALUES foo (LO THRU 0)). It may be called at any time.
- ioLocale
This function gets/sets the I/O Module’s locale. This corresponds with the SPSS command SET LOCALE. The I/O Module’s locale is separate from that of the client application. The <localeName> parameter and the return value are identical to those for the C run-time function setlocale. The exact locale name specification depends on the OS of the host system, but has the following form:
<lang>_<territory>.<codeset>[@<modifiers>]
The ‘codeset’ and ‘modifier’ components are optional and in Windows, aliases (e.g. ‘english’) may be used. When the I/O Module is first loaded, its locale is set to the system default.
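The locale name form shown above can be taken apart with a regular expression. parse_locale_name is a hypothetical helper for illustration only: the I/O Module itself accepts whatever the C setlocale function accepts, and Windows aliases such as english are reported here as None rather than parsed.

```python
import re

# <lang>_<territory>.<codeset>[@<modifiers>]; codeset and modifiers are optional
LOCALE_NAME = re.compile(
    r"^(?P<lang>[A-Za-z]+)_(?P<territory>[A-Za-z]+)"
    r"(?:\.(?P<codeset>[^@]+))?(?:@(?P<modifiers>.+))?$"
)


def parse_locale_name(name):
    """Hypothetical helper: split a POSIX-style locale name into its components."""
    match = LOCALE_NAME.match(name)
    if match is None:
        # e.g. Windows aliases such as 'english' do not follow this form
        return None
    return match.groupdict()


print(parse_locale_name("de_DE.ISO-8859-15@euro"))
```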
- fileCodePage
This function provides the Windows code page number of the encoding applicable to a file.
- isCompatibleEncoding()
This function determines whether the file and interface encoding are compatible.
- ioUtf8
This function returns/sets the current interface encoding:
ioUtf8 = False --> CODEPAGE mode, ioUtf8 = True --> UTF-8 mode, aka UNICODE mode
This corresponds with the SPSS command SHOW UNICODE (getter) and SET UNICODE=ON/OFF (setter).
- fileEncoding
This function obtains the encoding applicable to a file. The encoding is returned as an IANA encoding name, such as ISO-8859-1, which is then converted to the corresponding Python codec name. If the file contains no file encoding, the locale’s preferred encoding is returned.
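One plausible way to turn an IANA encoding name into a Python codec name is the standard codecs registry, which resolves aliases. python_codec_name is a hypothetical helper; whether fileEncoding uses codecs.lookup internally is an assumption, and the fallback mirrors the "locale's preferred encoding" rule above.

```python
import codecs
import locale


def python_codec_name(iana_name=None):
    """Hypothetical helper: IANA encoding name -> canonical Python codec name.

    Falls back on the locale's preferred encoding when the file has none.
    """
    if not iana_name:
        iana_name = locale.getpreferredencoding(False)
    # codecs.lookup resolves aliases such as 'ISO-8859-1' to the codec's own name
    return codecs.lookup(iana_name).name


print(python_codec_name("ISO-8859-1"))  # iso8859-1
```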