Generated API documentation¶

SavReader¶

class savReaderWriter.SavReader(savFileName, returnHeader=False, recodeSysmisTo=None, verbose=False, selectVars=None, idVar=None, rawMode=False, ioUtf8=False, ioLocale=None)[source]¶

Bases: savReaderWriter.header.Header

Read SPSS system files (.sav, .zsav)

Parameters:

savFileName : str

the file name of the spss data file

returnHeader : bool, default False

indicates whether the first record should be a list of variable names

recodeSysmisTo: (value), default None

indicates to which value SPSS missing values ($sysmis) should be recoded. Any value below 10 ** -10 is returned as None

verbose : bool, default False

indicates whether information about the spss data file (e.g., number of cases, variable names, file size) should be printed on the screen.

selectVars : list or None, default None

indicates which variables in the file should be selected. The variables should be specified as a list of valid variable names. If None is specified, all the variables in the file are used

idVar : str or None, default None

indicates which variable in the file should be used as the id variable for the ‘get’ method

rawMode : bool, default False

indicates whether values should get SPSS-style formatting, and whether date variables (if present) should be converted into ISO-dates. If set to True the program does not format any values, which increases processing speed. In particular rawMode=True implies that:

SPSS datetimes will not be converted into ISO8601 dates

SPSS N formats will not be converted into strings with leading zeroes

SPSS $sysmis values will not be converted into None values

String values will have byte lengths that are rounded up to a multiple of 8

See also Formats and Date formats
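
The string-width rule for rawMode=True can be sketched as follows. This is only an illustration, assuming that raw string values are space-padded up to the next multiple of 8 bytes; `padded_width` and `pad_raw_string` are hypothetical helpers and not part of savReaderWriter.

```python
def padded_width(n_bytes):
    """Round a byte length up to the next multiple of 8."""
    return -(-n_bytes // 8) * 8


def pad_raw_string(value):
    """Pad a byte string with spaces to the rounded-up width (assumed behaviour)."""
    return value.ljust(padded_width(len(value)), b" ")


raw = pad_raw_string(b"Test1")
print(repr(raw), len(raw))
```

With rawMode=False the trailing blanks would normally be stripped again, which is part of the formatting work that rawMode=True skips.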

ioUtf8 : bool, int, default False

indicates the mode in which text communicated to or from the I/O Module will be.

codepage mode: ioUtf8=CODEPAGE_MODE, or ioUtf8=0, or ioUtf8=False. Use the current ioLocale setting to determine the encoding for reading and writing data. Cf. SET UNICODE=OFF.

standard unicode mode: ioUtf8=UNICODE_UMODE, or ioUtf8=1, or ioUtf8=True. Use Unicode encoding (UTF-8) for reading and writing data. Data are returned as unicode strings. Cf. SET UNICODE=ON.

bytes unicode mode: ioUtf8=UNICODE_BMODE, or ioUtf8=2. Like standard unicode mode, but data are returned as byte strings.

See also under savReaderWriter.Generic.ioUtf8() and under ioUtf8 in savReaderWriter.SavWriter.

Changed in version 3.4: ioUtf8=UNICODE_BMODE was added.
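
The three ioUtf8 modes accept several equivalent spellings. The sketch below only illustrates that equivalence, using the integer values listed above (0, 1, 2); the constants are redefined locally and `describe_io_mode` is a hypothetical helper, so nothing here touches the real I/O module.

```python
# Integer values as listed in this documentation
CODEPAGE_MODE = 0   # equivalent to ioUtf8=False
UNICODE_UMODE = 1   # equivalent to ioUtf8=True
UNICODE_BMODE = 2   # no boolean equivalent

MODE_NAMES = {
    CODEPAGE_MODE: "codepage mode",
    UNICODE_UMODE: "standard unicode mode",
    UNICODE_BMODE: "bytes unicode mode",
}


def describe_io_mode(ioUtf8):
    """Map a bool or int ioUtf8 value onto the name of its mode."""
    mode = int(ioUtf8)  # False -> 0, True -> 1
    if mode not in MODE_NAMES:
        raise ValueError("ioUtf8 must be False/0, True/1 or 2")
    return MODE_NAMES[mode]


print(describe_io_mode(True))
```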

ioLocale : str or None, default None

indicates the locale of the I/O module, cf. SET LOCALE. The default (None) corresponds to locale.setlocale(locale.LC_CTYPE). Examples: en_US.UTF-8 (Unix) or english (Windows). See also under savReaderWriter.Generic.ioLocale().

Examples

Typical use:

with SavReader('somefile.sav', returnHeader=True) as reader:
    header = next(reader)  # works in Python 2 and 3
    for line in reader:
        process(line)

Attributes

Methods

__init__(savFileName, returnHeader=False, recodeSysmisTo=None, verbose=False, selectVars=None, idVar=None, rawMode=False, ioUtf8=False, ioLocale=None)[source]¶: Constructor. Initializes all vars that can be recycled

__enter__()[source]¶: This function opens the spss data file (context manager).

__exit__(type, value, tb)[source]¶: This function closes the spss data file and does some cleaning.

Warning

Always ensure that the .sav file is properly closed, either by using a context manager (with statement) or by using close()

close()[source]¶: This function closes the spss data file and does some cleaning.

__len__()[source]¶: This function reports the number of cases (rows) in the spss data file. For example: len(SavReader(savFileName))

__cmp__(other)[source]¶: This function implements behavior for all of the comparison operators so comparisons can be made between SavReader instances, or comparisons between SavReader instances and integers.

__hash__()[source]¶: This function returns a hash value for the object to ensure it is hashable.

__str__()[source]¶

This function returns a concise file report of the spss data file. For example:

data = SavReader(savFileName)
print(str(data))  # Python 3: bytes(data)
data.close()

__unicode__()[source]¶

This function returns a concise file report of the spss data file. For example:

data = SavReader(savFileName)
print(unicode(data))  # Python 3: str(data)
data.close()

__next__()[source]¶: reader.next() -> the next value, or raise StopIteration

next() → the next value, or raise StopIteration[source]¶

shape¶

This function returns the number of rows (nrows) and columns (ncols) as a namedtuple. For example:

data = SavReader(savFileName)
data.shape.nrows == len(data) # True
data.close()

formatValues(record)[source]¶: This function formats date fields to ISO dates (yyyy-mm-dd), plus some other date/time formats. The SPSS N format is formatted to a character value with leading zeroes. System missing values are recoded to <recodeSysmisTo>, which defaults to None. If rawMode==True, this function does nothing
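
As a rough sketch of two of the steps described for formatValues (N format and system-missing recoding), the following assumes that $sysmis arrives from the file as the most negative double and that an N format simply zero-pads an integer. `SYSMIS` and `format_value` are hypothetical names used for illustration only.

```python
import sys

# Assumption: the raw representation of $sysmis is the most negative double
SYSMIS = -sys.float_info.max


def format_value(value, n_width=None, recode_sysmis_to=None):
    """Recode system-missing values and zero-pad N-format values."""
    if value == SYSMIS:
        return recode_sysmis_to
    if n_width is not None:
        # N format: character value with leading zeroes
        return "%0*d" % (n_width, int(value))
    return value


print(format_value(42.0, n_width=5))
```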

__iter__()[source]¶

x.__iter__() <==> iter(x). Yields records as a list. For example:

with SavReader("someFile.sav") as reader:
    for line in reader:
        process(line)

__getitem__(key)[source]¶

x.__getitem__(y) <==> x[y], where y may be int or slice. This function reports the record of case number <key>. The <key> argument may also be a slice, for example:

data = SavReader("someFile.sav") 
print("The first six records look like this: %s" % data[:6])
print("The first record looks like this: %s" % data[0])
print("First column: %s" % data[..., 0]) # requires numpy
print("Row 4 & 5, first three cols: %s" % data[4:6, :3])
data.close()
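
The indexing forms above can be tried without an SPSS file using a toy stand-in. The `Rows` class below is hypothetical and only mimics the int, slice and (rows, columns) key forms on an in-memory list of lists; the return types of the real SavReader may differ.

```python
class Rows(object):
    """Toy in-memory table that accepts the same key forms as shown above."""

    def __init__(self, records):
        self.records = [list(record) for record in records]

    def __getitem__(self, key):
        if isinstance(key, tuple):
            row_key, col_key = key
            if row_key is Ellipsis:      # data[..., 0]: all rows
                row_key = slice(None)
            if isinstance(row_key, int):
                return self.records[row_key][col_key]
            return [row[col_key] for row in self.records[row_key]]
        return self.records[key]         # int or slice


data = Rows([[i, i * 10, i * 100, i * 1000] for i in range(8)])
print(data[4:6, :3])
```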

head(n=5)[source]¶

This convenience function returns the first <n> records. Example:

data = SavReader("someFile.sav") 
print("The first five records look like this: %s" % data.head())
data.close()

tail(n=5)[source]¶

This convenience function returns the last <n> records. Example:

data = SavReader("someFile.sav") 
print("The last four records look like this: %s" % data.tail(4))
data.close()

all()[source]¶

This convenience function returns all the records. Example:

data = SavReader("someFile.sav") 
list_of_lists = data.all()
data.close()

__contains__(item)[source]¶

This function implements membership testing and returns True if <idVar> contains <item>. Thus, it requires the ‘idVar’ parameter to be set. Example:

reader = SavReader(savFileName, idVar="ssn")
"987654321" in reader # returns True or False

get(key, default=None, full=False)[source]¶

This function returns the records for which <idVar> == <key> if <key> in <savFileName>, else <default>. Thus, the function mimics dict.get, but note that dict[key] is NOT implemented. NB: Even though this uses a binary search, this is not very fast on large data (esp. the first call, and with full=True)

Parameters:

key : str, int, float

key for which the corresponding record should be returned

default : (value)

value that should be returned if <key> is not found

full : bool

value that indicates whether all records for which <idVar> == <key> should be returned

Examples

For example:

data = SavReader(savFileName, idVar="ssn")
data.get("987654321", "social security number not found!")
data.close()
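
The lookup idea behind get can be sketched with the standard bisect module. This is not the library's implementation: it assumes the id values are already sorted and held in memory alongside their records, and `get_records` is a hypothetical name. It also assumes that full=False returns only the first matching record.

```python
from bisect import bisect_left, bisect_right


def get_records(sorted_ids, records, key, default=None, full=False):
    """Binary search for key; mimics dict.get when the key is absent."""
    lo = bisect_left(sorted_ids, key)
    hi = bisect_right(sorted_ids, key)
    if lo == hi:                 # key not present
        return default
    return records[lo:hi] if full else records[lo]


ids = ["111", "222", "222", "333"]
rows = [["111", "a"], ["222", "b"], ["222", "c"], ["333", "d"]]
print(get_records(ids, rows, "222", full=True))
```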

getSavFileInfo()[source]¶: This function reads and returns some basic information of the open spss data file.

decode(func)[source]¶: Decorator to decode datestrings for ioUtf8

spss2strDate(*datetime)[source]¶

This function converts internal SPSS dates (number of seconds since midnight, Oct 14, 1582 (the beginning of the Gregorian calendar)) to a human-readable format (ISO-8601 where possible)

Parameters:

spssDateValue : int, float

fmt : strptime format

recodeSysmisTo : what SPSS $sysmis values will be replaced with

See also

savReaderWriter.SavReaderNp.spss2datetimeDate: returns datetime.datetime object
strptime-formats-settings: __init__.py to change the strptime formats from ISO into something else. Note that dates before 1900 are not affected by format changes in __init__.py.
Date formats: overview of SPSS datetime formats

Examples

For example:

data = SavReader(savFileName)
iso_date = data.spss2strDate(11654150400.0, "%Y-%m-%d", None)
data.close()
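
The underlying arithmetic can be reproduced with the standard library alone. The sketch below adds the number of seconds to midnight, Oct 14, 1582 and formats the result; `spss_to_iso` is a hypothetical helper and ignores the $sysmis and error handling that spss2strDate provides.

```python
from datetime import datetime, timedelta

SPSS_EPOCH = datetime(1582, 10, 14)  # start of the SPSS date scale


def spss_to_iso(spss_seconds, fmt="%Y-%m-%d"):
    """Convert seconds since the SPSS epoch into a formatted date string."""
    return (SPSS_EPOCH + timedelta(seconds=spss_seconds)).strftime(fmt)


print(spss_to_iso(11654150400.0))  # 1952-02-03
```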

getFileReport()[source]¶: This function prints a report about basic file characteristics

getHeader(selectVars)[source]¶: This function returns the variable names, or a selection thereof (as specified as a list using the selectVars parameter), as a list.

SavReaderNp¶

New in version 3.4.0.

class savReaderWriter.SavReaderNp(savFileName, recodeSysmisTo=nan, rawMode=False, ioUtf8=False, ioLocale=None)[source]¶

Bases: savReaderWriter.savReader.SavReader

Read SPSS .sav file data into a numpy array (either in-memory or mmap)

Parameters:

savFileName : str

The file name of the spss data file

recodeSysmisTo : value

Indicates to which value missing values should be recoded

rawMode : bool

Set to True to get faster processing speeds. rawMode=False indicates:

that trailing blanks will be stripped off of string values

that datetime variables (if present) will be converted into datetime.datetime objects,

that SPSS $sysmis values will be converted into recodeSysmisTo (default np.nan, except for datetimes).

ioUtf8 : bool

Indicates the mode in which text communicated to or from the I/O Module will be. Valid values are True (UTF-8 mode aka Unicode mode) and False (Codepage mode). Cf. SET UNICODE=ON/OFF

ioLocale : locale str

indicates the locale of the I/O module. Cf. SET LOCALE. (default = None, which corresponds to locale.setlocale(locale.LC_ALL, "")). For example, en_US.UTF-8.

See also

savReaderWriter.SavWriter: use _uncompressed.sav savFileName suffix to write uncompressed files

Examples

Typical use:

# memmapped array, omit filename to use in-memory array 
reader_np = SavReaderNp("Employee data.sav")
array = reader_np.to_structured_array("/tmp/test.dat") 
reader_np.close()

Note. The sav-to-array conversion is MUCH faster when uncompressed .sav files are used. These are created with the SPSS command:

SAVE OUTFILE = 'some_file.sav' /UNCOMPRESSED.

This is NOT the default in SPSS.

Attributes

Methods

__getitem__(*args)[source]¶

x.__getitem__(y) <==> x[y], where y may be int or slice

Parameters:	key : int, slice
Returns:	record : numpy.ndarray
Raises:	IndexError, TypeError

__iter__()[source]¶

x.__iter__() <==> iter(x). Yields records as a tuple. If rawMode=True, trailing spaces of strings are not removed and SPSS dates are not converted into datetime dates

Returns:	record : tuple
Raises:	SPSSIOError

all(filename=None)[source]¶: Wrapper for to_structured_array; overrides the SavReader version

See also

savReaderWriter.SavReaderNp.to_structured_array

convert_datetimes(func)[source]¶: Decorator to convert all the SPSS datetimes into datetime.datetime values. Missing datetimes are converted into the value datetime.datetime(1, 1, 1, 0, 0, 0)

convert_missings(func)[source]¶: Decorator to recode numerical missing values into recodeSysmisTo (default: np.nan), unless they are datetimes

datetime_dtype¶

Return the modified dtype in order to accommodate datetime.datetime values that were originally datetimes, stored as floats, in the SPSS file

Returns:	datetime dtype : numpy.dtype (complex dtype)

datetimevars¶: Returns a list of the datetime variable names (as unicode strings) in the dataset, if any

is_homogeneous¶: Returns boolean that indicates whether the dataset contains only numerical variables (datetimes excluded). If rawMode=True, datetimes are also considered numeric. A dataset with string variables of equal length is not considered to be homogeneous
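
The rule can be sketched as a plain function over the variable types and formats. This is an assumption-laden illustration: `is_homogeneous` below is a standalone hypothetical function, and `DATETIME_PREFIXES` lists only a few example datetime display formats (SDATE, EDATE, ADATE, JDATE, TIME), so it is not exhaustive.

```python
# Example datetime display formats only; the real list is longer
DATETIME_PREFIXES = ("SDATE", "EDATE", "ADATE", "JDATE", "TIME")


def is_homogeneous(var_types, formats, raw_mode=False):
    """True if every variable is numeric and, unless raw_mode, not a datetime."""
    if any(var_type != 0 for var_type in var_types.values()):
        return False                      # string variables present
    if raw_mode:
        return True                       # datetimes count as plain numbers
    return not any(fmt.upper().startswith(DATETIME_PREFIXES)
                   for fmt in formats.values())


print(is_homogeneous({"a": 0, "b": 0}, {"a": "F8.2", "b": "ADATE10"}))
```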

spss2datetimeDate(*datetime)[source]¶

Convert an SPSS datetime into a datetime.datetime object

Parameters:

spssDateValue : float, int

Returns:

datetime : datetime.datetime; errors and missings are returned as datetime.datetime(datetime.MINYEAR, 1, 1, 0, 0, 0)

See also

savReaderWriter.SavReader.spss2strDate: convert SPSS datetime into a datetime string
Date formats: overview of SPSS datetime formats

struct_dtype¶

Get the dtype that is used to unpack the binary record

Returns:

struct dtype : numpy.dtype (complex dtype if heterogeneous data, simple dtype otherwise). A complex dtype uses varNames as names and varLabels (if any) as titles (fields).

to_array(filename=None)[source]¶: Wrapper for to_ndarray and to_structured_array. Returns an ndarray if the dataset is all-numeric homogeneous (and no datetimes), a structured array otherwise

See also

savReaderWriter.SavReaderNp.to_ndarray, savReaderWriter.SavReaderNp.to_structured_array

to_ndarray(*args)[source]¶

Converts a homogeneous, all-numeric SPSS dataset into an ndarray, unless the numerical variables are actually datetimes

Parameters:

filename : str, optional

The filename for the memory mapped array. If omitted, the array will be in-memory

Returns:

array : numpy.ndarray (if filename=None) or numpy.core.memmap.memmap

The array has a simple dtype, i.e. is a regular ndarray

Raises:

ValueError : if the data are not homogeneous. If rawMode=False (default), SPSS datetimes are not considered to be numerical, even though they are stored as such in the .sav file

See also

savReaderWriter.SavReaderNp.is_homogeneous: determines whether a dataset is considered to be all-numeric

savReaderWriter.SavReaderNp.to_structured_array

Examples

For example:

import numpy.ma 
reader_np = SavReaderNp("./test_data/all_numeric.sav")
array = reader_np.to_ndarray()
average = numpy.ma.masked_invalid(array).mean()
reader_np.close()

to_structured_array(*args)[source]¶

Return the data in <savFileName> as a structured array, optionally using <filename> as a memmapped file.

Parameters:

filename : str, optional

The filename for the memory mapped array. If omitted, the array will be in-memory

Returns:

array : numpy.ndarray (if filename=None) or numpy.core.memmap.memmap

The array has a complex dtype, i.e. is a structured array. If defined, varLabels may also be used to retrieve columns

Examples

For example:

reader_np = SavReaderNp("./test_data/Employee data.sav")
array = reader_np.to_structured_array()
mean_salary = array["salary"].mean().round(2)
mean_salary == array["Current Salary"].mean().round(2)  # True
first_record = array[0]
reader_np.close()

trunc_dtype¶

Returns the numpy dtype using the SPSS display formats

The following spss-format to numpy-dtype conversions are made:

spss	numpy
<= F2	float16 (f2)
F3-F5	float32 (f4)
>= F5	float64 (f8)
(datetime)	float64 (f8)*
A1 >=	S1 >= (a1)

*) Subsequently converted to datetime.datetime unless rawMode=True. Examples of SPSS datetime display formats are SDATE, EDATE, ADATE, JDATE and TIME.

Note that all numerical values are stored in SPSS files as double precision floats. The SPSS display formats are used to create a more compact dtype. Datetime formats are never shrunk to a more compact format. In the table above, only F and A formats are displayed, but other numerical (e.g. DOLLAR) or string (AHEX) are treated the same way, e.g. DOLLAR5.2 will become float64.

Returns:	truncated dtype : numpy.dtype (complex dtype)

See also

Formats: overview of SPSS display formats
Date formats: overview of SPSS datetime formats

uformats¶: Returns a dictionary of variable names (keys) and SPSS formats (values), both as unicode strings

uvarNames¶: Returns a list of variable names, as unicode strings

uvarTypes¶: Returns a dictionary of variable names, as unicode strings (keys) and variable types (values, int). Variable type == 0 indicates numerical values, other values indicate the string length in bytes

SavHeaderReader¶

class savReaderWriter.SavHeaderReader(savFileName, ioUtf8=False, ioLocale=None)[source]¶

Bases: savReaderWriter.header.Header

This class contains methods that read the data dictionary of an SPSS data file. This yields the same information as the Spss command DISPLAY DICTIONARY. NB: do not confuse an Spss dictionary with a Python dictionary!

Parameters:

savFileName : str

The file name of the spss data file

ioUtf8 : bool, int, default False

Indicates the mode in which text communicated to or from the I/O Module will be. See also under savReaderWriter.Generic.ioUtf8() and under ioUtf8 in savReaderWriter.SavReader.

Changed in version 3.4: ioUtf8=UNICODE_BMODE was added.

ioLocale : locale str, optional

indicates the locale of the I/O module. Cf. SET LOCALE. (default = None, which corresponds to locale.setlocale(locale.LC_CTYPE))

See also

savReaderWriter.Header: for more options to retrieve individual metadata items

Examples

Typical use:

with SavHeaderReader(savFileName) as header:
    metadata = header.all()
    report = str(header)
    print(metadata.varLabels)

Attributes

Methods

__init__(savFileName, ioUtf8=False, ioLocale=None)[source]¶: Constructor. Initializes all vars that can be recycled

__str__()[source]¶: This function returns a report of the SPSS data dictionary (i.e., the header), in the encoding of the spss file

__unicode__()[source]¶: This function returns a report of the SPSS data dictionary (i.e., the header).

__enter__()[source]¶: This function returns the SavHeaderReader object itself so its methods become available for use with context managers (‘with’ statements).

Warning

Always ensure that the .sav file is properly closed, either by using a context manager (with statement) or by using close()

__exit__(type, value, tb)[source]¶: This function closes the spss data file and does some cleaning.

close()[source]¶: This function closes the spss data file and does some cleaning.

dataDictionary(asNamedtuple=False)[source]¶: This function returns all the dictionary items. It returns a Python dictionary based on the Spss dictionary of the given Spss file. This is equivalent to the Spss command ‘DISPLAY DICTIONARY’. If asNamedtuple=True, this function returns a namedtuple, so one can retrieve metadata like e.g. ‘metadata.valueLabels’

all(asNamedtuple=True)[source]¶: Returns all the metadata as a named tuple (cf. SavReader.all). Exactly the same as dataDictionary, but with a different (nicer?) default

reportSpssDataDictionary(dataDict)[source]¶: This function reports information from the Spss dictionary of the active Spss dataset. The parameter ‘dataDict’ is the return value of dataDictionary()

SavWriter¶

The most commonly used metadata aspects include VARIABLE LABELS, VALUE LABELS, FORMATS and MISSING VALUES.

class savReaderWriter.SavWriter(savFileName, varNames, varTypes, valueLabels=None, varLabels=None, formats=None, missingValues=None, measureLevels=None, columnWidths=None, alignments=None, varSets=None, varRoles=None, varAttributes=None, fileAttributes=None, fileLabel=None, multRespDefs=None, caseWeightVar=None, overwrite=True, ioUtf8=False, ioLocale=None, mode='wb', refSavFileName=None)[source]¶

Bases: savReaderWriter.header.Header

Write SPSS system files (.sav, .zsav)

Below, the associated SPSS commands are given in CAPS.

Parameters:

savFileName : str

The file name of the spss data file.

File names that end with ‘.sav’ are compressed using the ‘old’ compression scheme

File names that end with ‘_uncompressed.sav’ are, well, not compressed. This is useful when you intend to read the files with the faster savReaderWriter.SavReaderNp class

File names that end with ‘.zsav’ are compressed using the ZLIB (ZSAV) compression scheme (requires v21 SPSS I/O files)

varNames : list

list of strings of the variable names in the order in which they should appear in the spss data file. See also under savReaderWriter.Header.varNamesTypes().

varTypes : dict

varTypes dictionary {varName: varType}

varType == 0 –> numeric

varType > 0 –> character variable of that length (in bytes!)

See also under savReaderWriter.Header.varNamesTypes().
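
Because string lengths are counted in bytes, non-ASCII text can need a larger varType than its character count suggests. The sketch below is a hypothetical helper (`string_var_type` is not part of savReaderWriter) that picks the smallest byte length fitting every value, assuming the values are unicode strings that will be written in the given encoding.

```python
def string_var_type(values, encoding="utf-8"):
    """Return the smallest varType (byte length) that fits every value."""
    return max(len(value.encode(encoding)) for value in values)


names = [u"Zo\u00eb", u"Bob"]  # "Zoë" is 3 characters but 4 bytes in UTF-8
print(string_var_type(names))
```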

valueLabels : dict, optional

value label dictionary {varName: {value: label}}. Cf. VALUE LABELS. See also under savReaderWriter.Header.valueLabels().

varLabels : dict, optional

variable label dictionary {varName: varLabel}. Cf. VARIABLE LABELS. See also under savReaderWriter.Header.varLabels().

formats : dict, optional

format dictionary {varName: printFmt}. Cf. FORMATS. See also under savReaderWriter.Header.formats(), under Formats and under Date formats.

missingValues : dict, optional

missing values dictionary {varName: {missing value spec}}. Cf. MISSING VALUES. See also under savReaderWriter.Header.missingValues()

measureLevels : dict, optional

measurement level dictionary {varName: <level>}. Valid levels are: “unknown”, “nominal”, “ordinal”, “scale”, “ratio”, “flag”, “typeless”. Cf. VARIABLE LEVEL. See also under savReaderWriter.Header.measureLevels().

Warning

measureLevels, columnWidths and alignments must all three be set, if used

columnWidths : dict, optional

column display width dictionary {varName: <int>}. Cf. VARIABLE WIDTH. (default: None –> >= 10 [stringVars] or automatic [numVars]). See also under savReaderWriter.Header.columnWidths().

alignments : dict, optional

variable alignment dictionary {varName: <left/center/right>}. Cf. VARIABLE ALIGNMENT (default: None –> left). See also under savReaderWriter.Header.alignments().

varSets : dict, optional

sets dictionary {setName: list_of_valid_varNames}. Cf. SETSMR command. See also under savReaderWriter.Header.varSets()

varRoles : dict, optional

variable roles dictionary {varName: varRole}, where varRole may be any of the following: ‘both’, ‘frequency’, ‘input’, ‘none’, ‘partition’, ‘record ID’, ‘split’, ‘target’. Cf. VARIABLE ROLE. See also under savReaderWriter.Header.varRoles().

varAttributes : dict, optional

variable attributes dictionary {varName: {attribName: attribValue}}. Cf. VARIABLE ATTRIBUTES. See also under savReaderWriter.Header.varAttributes().

fileAttributes : dict, optional

file attributes dictionary {attribName: attribValue}. Cf. FILE ATTRIBUTES. See also under savReaderWriter.Header.fileAttributes().

fileLabel : str, optional

file label string, which defaults to “File created by user <username> at <datetime>” if fileLabel is None. Cf. FILE LABEL. See also under savReaderWriter.Header.fileLabel().

multRespDefs : dict, optional

multiple response sets definitions (dichotomy groups or category groups) dictionary {setName: <set definition>}. In SPSS syntax, ‘setName’ has a dollar prefix (‘$someSet’). Cf. MRSETS. See also under savReaderWriter.Header.multRespDefs().

caseWeightVar : str, optional

valid varName that is set as case weight (cf. WEIGHT BY). See also under savReaderWriter.Header.caseWeightVar().

overwrite : bool, optional

indicates whether an existing SPSS file should be overwritten

ioUtf8 : bool, optional

indicates the mode in which text communicated to or from the I/O Module will be. This refers to unicode mode (SET UNICODE=ON) and codepage mode in SPSS (SET UNICODE=OFF). See also under savReaderWriter.Generic.ioUtf8() and under ioUtf8 in savReaderWriter.SavReader.

ioUtf8=False. Use the current ioLocale setting to determine the encoding for writing data.

ioUtf8=True. Use Unicode encoding (UTF-8) for writing data.

Note: Data files saved in Unicode encoding cannot be read by versions of IBM SPSS Statistics prior to 16. Unicode mode is the default since IBM SPSS Statistics version 21. When opening code page IBM SPSS Statistics data files in Unicode mode or saving data files as Unicode in codepage mode, defined string widths are automatically tripled.

See also

http://www-01.ibm.com/support/knowledgecenter/SSLVMB_21.0.0/com.ibm.spss.statistics.help/faq_unicode.htm

ioLocale : str, optional

indicates the locale of the I/O module, cf. SET LOCALE (default: None, which is the same as locale.setlocale(locale.LC_CTYPE)). See also under savReaderWriter.Generic.ioLocale()

mode : str, optional

indicates the mode in which savFileName should be opened. Possible values are:

“wb” –> write

“ab” –> append

“cp” –> copy: initialize header using refSavFileName as a reference file, cf. APPLY DICTIONARY.

refSavFileName : str, optional

reference file that should be used to initialize the header (aka the SPSS data dictionary) containing variable label, value label, missing value, etc. etc. definitions. Only relevant in conjunction with mode="cp".

See also

savReaderWriter.Header: for details about how to define individual metadata items

Examples

Typical use:

records = [[b'Test1', 1, 1], [b'Test2', 2, 1]]
varNames = [b'var1', b'v2', b'v3']
varTypes = {b'var1': 5, b'v2': 0, b'v3': 0}
savFileName = 'someFile.sav'
with SavWriter(savFileName, varNames, varTypes) as writer:
    for record in records:
        writer.writerow(record)

Attributes

Methods

__init__(savFileName, varNames, varTypes, valueLabels=None, varLabels=None, formats=None, missingValues=None, measureLevels=None, columnWidths=None, alignments=None, varSets=None, varRoles=None, varAttributes=None, fileAttributes=None, fileLabel=None, multRespDefs=None, caseWeightVar=None, overwrite=True, ioUtf8=False, ioLocale=None, mode='wb', refSavFileName=None)[source]¶: Constructor. Initializes all vars that can be recycled

__enter__()[source]¶: This function returns the writer object itself so the writerow and writerows methods become available for use with ‘with’ statements

__exit__(type, value, tb)[source]¶: This function closes the spss data file.

Warning

Always ensure that the .sav file is properly closed, either by using a context manager (with statement) or by using close()

close()[source]¶: This function closes the spss data file.

convertDate(day, month, year)[source]¶: This function converts a Gregorian date expressed as day-month-year to the internal SPSS date format. The time portion of the date variable is set to 0:00. To set the time portion of the date variable to another value, use convertTime.

convertTime(day, hour, minute, second)[source]¶: This function converts a time given as day, hours, minutes, and seconds to the internal SPSS time format.
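
The arithmetic behind these two conversions can be sketched with the standard library, assuming the internal SPSS date format is the number of seconds since midnight, Oct 14, 1582 and the internal time format is a plain number of seconds. `convert_date` and `convert_time` are hypothetical stand-ins and do not call the I/O module.

```python
from datetime import datetime

SPSS_EPOCH = datetime(1582, 10, 14)  # start of the SPSS date scale


def convert_date(day, month, year):
    """Seconds from the SPSS epoch to midnight of the given date."""
    return (datetime(year, month, day) - SPSS_EPOCH).days * 86400


def convert_time(day, hour, minute, second):
    """Seconds represented by a days/hours/minutes/seconds duration."""
    return ((day * 24 + hour) * 60 + minute) * 60 + second


# A date with a time portion is the sum of both parts
spss_datetime = convert_date(3, 2, 1952) + convert_time(0, 13, 30, 0)
print(spss_datetime)
```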

spssDateTime(datetimeStr='2001-12-08', strptimeFmt='%Y-%m-%d')[source]¶: This function converts a date/time string into an SPSS date, using a strptime format. See also Date formats

writerow(record)[source]¶: This function writes one record, which is a Python list.

writerows(records)[source]¶

This function writes all records.

Parameters:

records : list, tuple, numpy.ndarray, pandas.DataFrame, or similar

the records to be written to the .sav file

Raises:

TypeError : if the records instance is not of a suitable type

ValueError : if bool(records) == False, or if the array/DataFrame is empty

Header¶

Note

This class should not be used directly. Use SavHeaderReader or SavReader to retrieve metadata.

class savReaderWriter.Header(savFileName, mode, refSavFileName, ioUtf8=False, ioLocale=None)[source]¶

Bases: savReaderWriter.generic.Generic

This class contains methods responsible for getting and setting meta data that is embedded in the IBM SPSS Statistics data file. In SPSS speak, this header information is known as the SPSS Data Dictionary (which has diddly squat to do with a Python dictionary!). NOTE: this class should not be called directly. Use SavHeaderReader to retrieve metadata.

Attributes

Methods

numberofCases¶

This function reports the number of cases present in a data file. Prehistoric files (< SPSS v6.0) don’t contain nCases info, therefore a guesstimate of the number of cases is given for those files (cf. SHOW N)

See also

savReaderWriter.SavReader.__len__: use len(reader) to get the number of cases
savReaderWriter.SavReader.shape: use reader.shape to get a (nrows, ncols) namedtuple

numberofVariables¶

This function returns the number of variables (columns) in the spss dataset

See also

savReaderWriter.SavReader.shape: use reader.shape to get a (nrows, ncols) namedtuple

varNamesTypes¶

Get/Set a tuple of variable names and types

Variable names is a list of the form [b'var1', b'var2', b'etc']
Variable types is a dictionary of the form {varName: varType}

The variable type code is an integer in the range 0-32767, 0 indicating a numeric variable (e.g., F8.2) and a positive value indicating a string variable of that size (in bytes).
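
A small sketch of how the type codes can be interpreted, using the same varNames/varTypes shapes as the SavWriter example. `describe_var_type` is a hypothetical helper that only applies the 0-32767 rule stated above.

```python
def describe_var_type(var_type):
    """Translate a variable type code into a readable description."""
    if not isinstance(var_type, int) or not 0 <= var_type <= 32767:
        raise ValueError("varType must be an integer in the range 0-32767")
    if var_type == 0:
        return "numeric"
    return "string of %d bytes" % var_type


varNames = [b"var1", b"v2", b"v3"]
varTypes = {b"var1": 5, b"v2": 0, b"v3": 0}
summary = [(name, describe_var_type(varTypes[name])) for name in varNames]
print(summary)
```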

valueLabels¶

Get/Set VALUE LABELS. Takes a dictionary of the form {varName: {value: valueLabel}}:

{b'numGender': {1: b'female',
                2: b'male'},
 b'strGender': {b'f': b'female', 
                b'm': b'male'}}

varLabels¶

Get/set VARIABLE LABELS. Returns/takes a dictionary of the form {varName: varLabel}. For example:

{b'salary': b'Salary (dollars)',
 b'educ': b'Educational level (years)'}

formats¶

Get the PRINT FORMATS, set PRINT FORMATS and WRITE FORMATS. Returns/takes a dictionary of the form {varName: <spss-format>}. For example:

{b'salary': b'DOLLAR8', 
 b'gender': b'A1',
 b'educ': b'F8.2'}

missingValues¶

Get/Set MISSING VALUES. User missing values are values that will not be included in calculations by SPSS. For example, ‘don’t know’ might be coded as a user missing value (a value of 999 is typically used, so when variable ‘age’ has values 5, 15, 999, the average age is 10). This is different from ‘system missing values’, which are blank/null values. Takes a dictionary of the following form:

# note that 'lower', 'upper', 'value(s)' are without b' prefix
missingValues = { 

# discrete values
b"someNumvar1": {"values": [999, -1, -2]},

# range, cf. MISSING VALUES x (-9 THRU -1)
b"someNumvar2": {"lower": -9, "upper": -1},
b"someNumvar3": {"lower": -9, "upper": -1, "value": 999},

# string variables can have up to three missing values
b"someStrvar1": {"values": [b"foo", b"bar", b"baz"]},
b"someStrvar2": {"values": b"bletch"}
}
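
The effect of user missing values on a calculation (the age example: 5, 15, 999 gives an average of 10) can be sketched in plain Python. The helpers `is_user_missing` and `mean_valid` are hypothetical; they assume the specification uses a list under "values" and numeric bounds under "lower"/"upper", and they only imitate what SPSS itself does with these definitions.

```python
def is_user_missing(value, spec):
    """Check a value against a missing-value specification dictionary."""
    if value in spec.get("values", []):
        return True                                   # discrete values
    if "value" in spec and value == spec["value"]:
        return True                                   # extra discrete value
    if "lower" in spec and "upper" in spec:
        return spec["lower"] <= value <= spec["upper"]  # range
    return False


def mean_valid(values, spec):
    """Average of the values that are not user missing."""
    valid = [v for v in values if not is_user_missing(v, spec)]
    return sum(valid) / float(len(valid))


ages = [5, 15, 999]
print(mean_valid(ages, {"values": [999]}))  # 10.0
```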

measureLevels¶: Get/Set VARIABLE LEVEL (measurement level). Returns/Takes a dictionary of the form {varName: varMeasureLevel}. Valid measurement levels are: “unknown”, “nominal”, “ordinal”, “scale”, “ratio”, “flag”, “typeless”. This is used in SPSS procedures such as CTABLES.

columnWidths¶: Get/Set VARIABLE WIDTH (display width). Returns/Takes a dictionary of the form {varName: <int>}. A value of zero is special and means that the IBM SPSS Statistics Data Editor is to set an appropriate width using its own algorithm. If used, variable alignment, measurement level and column width all need to be set.

alignments¶

Get/Set VARIABLE ALIGNMENT. Returns/Takes a dictionary of the form {varName: alignment}. Valid alignment values are: “left”, “right”, “center”.

Warning

measureLevels, columnWidths, alignments must all three be set, if used

varSets¶

Get/Set VARIABLE SET information. Returns/Takes a dictionary with setname as keys and a list of SPSS variables as values. For example:

{b'SALARY': [b'salbegin', b'salary'], 
 b'DEMOGR': [b'gender', b'minority', b'educ']}

varRoles¶: Get/Set VARIABLE ROLES. Returns/Takes a dictionary of the form {varName: varRole}, where varRole may be any of the following: ‘both’, ‘frequency’, ‘input’, ‘none’, ‘partition’, ‘record ID’, ‘split’, ‘target’

varAttributes¶

Get/Set VARIABLE ATTRIBUTES. Returns/Takes dictionary of the form:

{b'var1': {b'attr name x': b'attr value x',
           b'attr name y': b'attr value y'},
 b'var2': {b'attr name a': b'attr value a',
           b'attr name b': b'attr value b'}}

fileAttributes¶

Get/Set DATAFILE ATTRIBUTES. Returns/Takes a dictionary of the form:

{b'attrName[1]': b'attrValue1',
 b'revision[1]': b'2010-10-09',
 b'revision[2]': b'2010-10-22',
 b'revision[3]': b'2010-11-19'}

Square brackets indicate attribute arrays, which must start with 1
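
Attribute arrays can be generated from an ordinary list so that the 1-based numbering is never off. `attribute_array` is a hypothetical convenience function, shown only to illustrate the key format; it assumes ASCII attribute names.

```python
def attribute_array(name, values):
    """Expand a list into 1-based b'name[i]' keys for fileAttributes."""
    return {("%s[%d]" % (name, i)).encode("ascii"): value
            for i, value in enumerate(values, start=1)}


fileAttributes = attribute_array("revision", [b"2010-10-09", b"2010-10-22"])
print(sorted(fileAttributes.items()))
```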

multRespDefs¶

Get/Set MRSETS (multiple response) sets. Returns/takes a dictionary of the form:

multiple category sets: {setName: {"setType": "C", "label": lbl, "varNames": [<list_of_varNames>]}}
multiple dichotomy sets: {setName: {"setType": "D", "label": lbl, "varNames": [<list_of_varNames>], "countedValue": countedValue}}
extended multiple dichotomy sets: {setName: {"setType": "E", "label": lbl, "varNames": [<list_of_varNames>], "countedValue": countedValue, "firstVarIsLabel": <bool>}}

Note. You can get values of extended multiple dichotomy sets with getMultRespSetsDefEx, but you cannot write extended multiple dichotomy sets.

For example:

categorical =  {b"setType": b"C", 
                b"label": b"labelC",
                b"varNames": [b"salary", b"educ"]}
dichotomous1 = {b"setType": b"D", b"label": b"labelD",
                b"varNames": [b"salary", b"educ"], 
                b"countedValue": b"Yes"}
dichotomous2 = {b"setType": b"D", 
                b"label": b"", 
                b"varNames": [b"salary", b"educ", b"jobcat"], 
                b"countedValue": b"No"}
extended1 =    {b"setType": b"E", 
                b"label": b"", 
                b"varNames": [b"mevar1", b"mevar2", b"mevar3"], 
                b"countedValue": b"1",
                b"firstVarIsLabel": True}
extended2 =    {b"setType": b"E", 
                b"label": b"Enhanced set with user specified label", 
                b"varNames": [b"mevar4", b"mevar5", b"mevar6"], 
                b"countedValue": b"Yes", 
                b"firstVarIsLabel": False}
multRespDefs = {b"testSetC": categorical, 
                b"testSetD1": dichotomous1,
                b"testSetD2": dichotomous2, 
                b"testSetEx1": extended1,
                b"testSetEx2": extended2}
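
The key requirements per set type can be checked up front. The sketch below is an illustration only: check_mult_resp_defs and its forWriting flag are hypothetical and not part of savReaderWriter. The required keys are taken from the three forms listed above, and the forWriting check reflects the note that extended sets cannot be written.

```python
REQUIRED_KEYS = {
    b"C": {b"setType", b"label", b"varNames"},
    b"D": {b"setType", b"label", b"varNames", b"countedValue"},
    b"E": {b"setType", b"label", b"varNames", b"countedValue",
           b"firstVarIsLabel"},
}


def check_mult_resp_defs(multRespDefs, forWriting=False):
    """Hypothetical helper: raise ValueError on malformed set definitions."""
    for setName, definition in multRespDefs.items():
        setType = definition.get(b"setType")
        if setType not in REQUIRED_KEYS:
            raise ValueError("%r: unknown setType %r" % (setName, setType))
        if forWriting and setType == b"E":
            # Extended multiple dichotomy sets can be read but not written.
            raise ValueError("%r: extended sets cannot be written" % (setName,))
        missing = REQUIRED_KEYS[setType] - set(definition)
        if missing:
            raise ValueError("%r lacks %r" % (setName, sorted(missing)))
    return True


sample_defs = {
    b"testSetC": {b"setType": b"C", b"label": b"labelC",
                  b"varNames": [b"salary", b"educ"]},
    b"testSetD1": {b"setType": b"D", b"label": b"labelD",
                   b"varNames": [b"salary", b"educ"],
                   b"countedValue": b"Yes"},
}
sample_ok = check_mult_resp_defs(sample_defs, forWriting=True)
```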

caseWeightVar¶: Get/Set WEIGHT variable. Takes a valid varName, and returns weight variable, if any, as a string.

textInfo¶: Get/Set text information. Takes a savFileName and returns a string of the form: “File %r built using SavReaderWriter.py version %s (%s)”. This is akin to, but not equivalent to, the SPSS syntax command DISPLAY DOCUMENTS.

fileLabel¶: Get/Set FILE LABEL (id string). Takes a file label, and returns file label, if any, as a byte string.

Generic¶

Note

This class should not be used directly

class savReaderWriter.Generic(savFileName, ioUtf8=False, ioLocale=None)[source]¶

Bases: object

Class for methods and data used in reading as well as writing IBM SPSS Statistics data files

Attributes

Methods

byteorder¶: This function returns the byte order of the open file as a string. It returns either ‘little’ or ‘big’.

spssVersion¶: Return the SPSS version that was used to create the opened file as a three-tuple indicating major, minor, and fixpack version as ints. NB: in the transition from SPSS to IBM, a new four-digit versioning nomenclature is used. This function returns the old three-digit nomenclature. Therefore, no patch version information is available.

spssioVersion¶: This function returns the version of the IBM SPSS I/O libraries as a named tuple with the fields major, minor, patch, fixpack. May also be inspected by passing an empty savFileName, as in: savReaderWriter.Generic("").spssioVersion

fileCompression¶: Get/Set the file compression. Returns/Takes a compression switch which may be any of the following: ‘uncompressed’, ‘standard’, or ‘zlib’. Zlib compression requires SPSS v21 I/O files.

systemString¶: This function returns the name of the system under which the file was created as a string.

sysmis¶: This function returns the IBM SPSS Statistics system-missing value ($SYSMIS) for the host system (also called ‘NA’ in other systems).

missingValuesLowHigh¶: This function returns the ‘lowest’ and ‘highest’ values used for numeric missing value ranges on the host system. This can be used in a similar way as the LO and HI keywords in missing values specifications (cf. MISSING VALUES foo (LO THRU 0)). It may be called at any time.

ioLocale¶

This function gets/sets the I/O Module’s locale. This corresponds with the SPSS command SET LOCALE. The I/O Module’s locale is separate from that of the client application. The <localeName> parameter and the return value are identical to those for the C run-time function setlocale. The exact locale name specification depends on the OS of the host system, but has the following form:

<lang>_<territory>.<codeset>[@<modifiers>]

The ‘codeset’ and ‘modifier’ components are optional and in Windows, aliases (e.g. ‘english’) may be used. When the I/O Module is first loaded, its locale is set to the system default.
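
To make the locale name form concrete, here is a minimal sketch that splits a name into its components. split_locale_name is a hypothetical helper, not part of savReaderWriter. The pattern also treats the territory as optional so that a bare Windows alias such as ‘english’ is accepted, which is an assumption of this sketch.

```python
import re

# <lang>_<territory>.<codeset>[@<modifiers>], with everything after <lang> optional.
LOCALE_PATTERN = re.compile(
    r"^(?P<lang>[^_.@]+)"
    r"(?:_(?P<territory>[^.@]+))?"
    r"(?:\.(?P<codeset>[^@]+))?"
    r"(?:@(?P<modifiers>.+))?$")


def split_locale_name(localeName):
    """Hypothetical helper: return the components of a locale name as a dict."""
    match = LOCALE_PATTERN.match(localeName)
    if match is None:
        raise ValueError("not a valid locale name: %r" % (localeName,))
    # Components that are absent from the name come back as None.
    return match.groupdict()


parts = split_locale_name("de_DE.ISO-8859-15@euro")
```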

See also

linux: https://wiki.archlinux.org/index.php/Locale
windows: http://msdn.microsoft.com/en-us/library/39cwe7zf(v=vs.80).aspx

fileCodePage¶: This function provides the Windows code page number of the encoding applicable to a file.

isCompatibleEncoding()[source]¶: This function determines whether the file and interface encoding are compatible.

ioUtf8¶

This function returns/sets the current interface encoding:

ioUtf8 = False –> CODEPAGE mode,
ioUtf8 = True –> UTF-8 mode, aka. UNICODE mode

This corresponds with the SPSS command SHOW UNICODE (getter) and SET UNICODE=ON/OFF (setter).

See also

SPSS-unicode-mode: http://www-01.ibm.com/support/knowledgecenter/SSLVMB_21.0.0/com.ibm.spss.statistics.help/faq_unicode.htm

fileEncoding¶: This function obtains the encoding applicable to a file. The encoding is returned as an IANA encoding name, such as ISO-8859-1, which is then converted to the corresponding Python codec name. If the file contains no file encoding, the locale’s preferred encoding is returned.
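
The conversion from an IANA name to a Python codec name, with the locale fallback, can be sketched with the standard library alone. This illustrates the idea and is not necessarily how savReaderWriter implements it; to_python_codec is a hypothetical helper.

```python
import codecs
import locale


def to_python_codec(ianaName):
    """Hypothetical helper: map an IANA encoding name to a Python codec name."""
    if not ianaName:
        # No encoding recorded in the file: fall back to the locale's preference.
        return locale.getpreferredencoding()
    # codecs.lookup raises LookupError for names Python does not know.
    return codecs.lookup(ianaName).name


codec_name = to_python_codec("ISO-8859-1")
```

The returned name can be passed directly to bytes.decode or str.encode.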
