Welcome to savReaderWriter’s documentation!¶

Installation¶

Platforms¶

As shown in Table 0 below, this program works for Linux (incl. z/Linux), Windows, Mac OS (32 and 64 bit), AIX-64, HP-UX and Solaris-64. The program has been tested with Python 2.7, 3.3 and 3.4 on Debian Linux (32 and 64 bit), Mac OS and Windows 7 (64 bit).

**Table 0.** supported platforms for `savReaderWriter`¶
Operating System	Architecture
	32 bit	64 bit
AIX		X
HP-UX		X
Linux	X	X
Mac OS	X	X?
Solaris		X
Windows	X	X
zLinux		X

Setup¶

The program can be installed by running:

python setup.py install

Or alternatively:

pip install savReaderWriter

To get the ‘bleeding edge’ version straight from the repository do:

pip install -U -e git+https://bitbucket.org/fomcl/savreaderwriter.git#egg=savreaderwriter

Note

Users of Mac OS X need to do two additional things:

DYLD_LIBRARY_PATH needs to be set to the directory where the SPSS I/O libraries for Mac OS X live. If you also set the LC_ALL environment variable, you may skip the next ioLocale step. You may also want to edit your ~/.bashrc accordingly.
ioLocale needs to be set manually (work-around). The ioLocale is the locale of the SPSS I/O, which is supposed to be copied from the host system, if unset (i.e., equal to None). However, Python locale.setlocale and locale.getlocale are quirky in Mac OS X (see also this OS X and Python locale snippet).

The code below shows an example that uses Python 2.7.2 (Python 3.3.5 also works) under Mac OS X 10.8 (Mountain Lion):

fomcls-Mac-Pro:~ fomcl$ uname -a
Darwin fomcls-Mac-Pro.local 12.2.0 Darwin Kernel Version 12.2.0: Sat Aug 25 00:48:52 PDT 2012; root:xnu-2050.18.24~1/RELEASE_X86_64 x86_6
fomcls-Mac-Pro:~ fomcl$ export DYLD_LIBRARY_PATH=/Library/Python/2.7/site-packages/savReaderWriter/spssio/macos
fomcls-Mac-Pro:~ fomcl$ export LC_ALL=en_US.UTF-8  # if you also do this, specifying ioLocale is usually not needed
fomcls-Mac-Pro:savReaderWriter fomcl$ python
Python 2.7.2 (default, Jun 20 2012, 16:23:33) [GCC 4.2.1 Compatible Apple Clang 4.0 (tags/Apple/clang-418.0.60)] on darwin
>>> import savReaderWriter
>>> savFileName = "/Library/Python/2.7/site-packages/savReaderWriter/test_data/Employee data.sav"
>>> with savReaderWriter.SavReader(savFileName, ioLocale='en_US.UTF-8') as reader:
...     for line in reader:
...         print line
...
[1.0, 'm', '1952-02-03', 15.0, 3.0, 57000.0, 27000.0, 98.0, 144.0, 0.0]
[2.0, 'm', '1958-05-23', 16.0, 1.0, 40200.0, 18750.0, 98.0, 36.0, 0.0]
[3.0, 'f', '1929-07-26', 12.0, 1.0, 21450.0, 12000.0, 98.0, 381.0, 0.0]
[4.0, 'f', '1947-04-15', 8.0, 1.0, 21900.0, 13200.0, 98.0, 190.0, 0.0]
# etc. etc.

Users of AIX, Solaris, HP-UX and zLinux need to install the SPSS I/O libraries separately (PyPI has a file size limit of about 60 MB, so I had to exclude them - sorry):

python -m savReaderWriter.util.download_mainframe_libs

Changed in version 3.4.

Added SavReaderNp, a class to convert .sav files to numpy arrays
Added savViewer, a simple PyQt4-based script to view .sav, .zsav, .xls, .xlsx, .csv, .tab files. See also this savViewer screenshot.

Usage examples:
```
python -m savReaderWriter.util.savViewer
python -m savReaderWriter.util.savViewer '/path/to/some/file.sav'
```
Removed several bugs, notably one related to memoization of SPSS datetimes (THANKS everybody for taking the time to report these bugs!)
SavReader.__enter__ now returns self, not iter(self)

Changed in version 3.3.

The savReaderWriter program now runs on Python 2 and 3. It is tested with Python 2.7, 3.3 and PyPy under Debian Linux 3.2.0-4-AMD64.
Under Python 3.3, the data are in bytes! Use the b prefix (e.g. b'text') when writing string data, or write data in unicode mode (ioUtf8=True).
Several bugs were removed, notably two that prevented the I/O modules from loading in 64-bit Linux and 64-bit Windows systems (NB: these bugs were entirely unrelated). I re-downloaded the SPSS I/O v21 FP1 modules because the Win 64 libs were incorrectly compiled. In addition, long variable labels were truncated to 120 characters, which is now fixed.
This has not yet been tested for performance.

Changed in version 3.2.

The savReaderWriter program is now self-contained. That is, the IBM SPSS I/O modules now all load by themselves, without any changes being required anymore to PATH, LD_LIBRARY_PATH and equivalents. Also, no extra .deb files need to be installed anymore (i.e. no dependencies).
savReaderWriter now uses version 21.0.0.1 (i.e., Fixpack 1) of the I/O module.

Optional features¶

cWriterow. The cWriterow package is a faster Cython implementation of the pyWriterow method (66 % faster). To install it, install Cython and then run setup.py in the cWriterow folder:

easy_install cython
python setup.py build_ext --inplace

numpy.

The numpy package should be installed if you intend to use array slicing (e.g. data[:2, 2:4]).
numpy is also needed to use the SavReaderNp sav-to-numpy class.

Environment variables¶

SAVRW_DISPLAY_WARNS. To issue warnings, set the environment variable SAVRW_DISPLAY_WARNS to one of the following actions: “error”, “ignore”, “always”, “default”, “module”, “once”. If the environment variable is not defined, warnings are ignored. Note that warnings are usually harmless, e.g. SPSS_NO_LABELS. See: http://docs.python.org/2/library/warnings.html.

SAVRW_USE_CWRITEROW. You can use this variable to toggle between the cWriterow and the pyWriterow method, by setting this variable to ON or OFF, respectively. This is intended for testing purposes.

DYLD_LIBRARY_PATH. Users of Mac OS X need to set this variable; see the note under Setup.

LC_ALL. Users of Mac OS X may need to set this variable; see the note under Setup.
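
As a minimal sketch, the two savReaderWriter-specific variables can also be set from Python through os.environ. The helper name set_savrw_env is hypothetical, and the idea that the variables should be in place before savReaderWriter is imported is an assumption, not something stated in this documentation.

```python
import os

# Actions accepted by SAVRW_DISPLAY_WARNS, as listed above.
VALID_ACTIONS = ("error", "ignore", "always", "default", "module", "once")


def set_savrw_env(display_warns="ignore", use_cwriterow=False, environ=None):
    """Hypothetical helper: store the two SAVRW_* settings in `environ`."""
    if environ is None:
        environ = os.environ
    if display_warns not in VALID_ACTIONS:
        raise ValueError("unknown warnings action: %r" % (display_warns,))
    environ["SAVRW_DISPLAY_WARNS"] = display_warns
    environ["SAVRW_USE_CWRITEROW"] = "ON" if use_cwriterow else "OFF"
    return environ


# Assumption: call this before `import savReaderWriter`.
# A plain dict is used here so the example does not touch the real environment.
settings = set_savrw_env("always", use_cwriterow=False, environ={})
print(settings)
```

Passing no environ argument writes to os.environ itself, which only affects the current process and its children.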

Typical use (the TL;DR version)¶

The full documentation can be found in the Generated API documentation. Here are the most important parts.

Reading files:

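from savReaderWriter import SavReader

# iterate over the records one at a time
with SavReader('someFile.sav') as reader:
    header = reader.header
    for line in reader:
        process(line)  # process() stands for your own record handler

# or read all the records at once
with SavReader('someFile.sav') as reader:
    records = reader.all()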

Writing files:

from savReaderWriter import SavWriter

savFileName = 'someFile.sav'
records = [[b'Test1', 1, 1], [b'Test2', 2, 1]]
varNames = ['var1', 'v2', 'v3']
varTypes = {'var1': 5, 'v2': 0, 'v3': 0}  # 0 = numeric, n > 0 = string of n bytes
with SavWriter(savFileName, varNames, varTypes) as writer:
    for record in records:
        writer.writerow(record)
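
A rough sketch of how varTypes could be derived from the records instead of being typed by hand. It assumes the convention used in the example above, where 0 marks a numeric variable and a positive number gives the width in bytes of a string variable. infer_var_types is a hypothetical helper, not part of savReaderWriter.

```python
def infer_var_types(var_names, records, encoding="utf-8"):
    """Hypothetical helper: 0 for numeric columns, max byte length for strings."""
    var_types = dict.fromkeys(var_names, 0)
    for record in records:
        for name, value in zip(var_names, record):
            if isinstance(value, str):
                value = value.encode(encoding)  # Python 3 text -> bytes
            if isinstance(value, bytes):
                var_types[name] = max(var_types[name], len(value))
    return var_types


records = [[b'Test1', 1, 1], [b'Test2', 2, 1]]
var_names = ['var1', 'v2', 'v3']
var_types = infer_var_types(var_names, records)
print(var_types)
```

The width is measured after encoding because string widths are counted in bytes, so a non-ASCII character may need more than one position.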

Writing numpy arrays, pandas DataFrames, lists-of-lists, etc:

import numpy as np
from savReaderWriter import SavWriter

savFileName = 'someFile.sav'
args = (["v1", "v2"], dict(v1=0, v2=0))
array = np.arange(10, dtype=np.float64).reshape(5, 2)
with SavWriter(savFileName, *args) as writer:
    writer.writerows(array)

Reading file metadata:

from savReaderWriter import SavHeaderReader

savFileName = 'someFile.sav'
with SavHeaderReader(savFileName) as header:
    metadata = header.all()
    report = str(header)
print(metadata.valueLabels)
print(report)

Reading files into numpy arrays:

from savReaderWriter import SavReaderNp

with SavReaderNp("Employee data.sav") as reader_np:
    array = reader_np.to_structured_array()
mean_salary = array["salary"].mean()

Reading a file in unicode mode (default in SPSS v21 and up):

>>> with SavReader('greetings.sav', ioUtf8=True) as reader:
...    for record in reader:
...        print(record[-1])
     নমস্কাৰ
     আসসালামুআলাইকুম
Greetings and salutations
გამარჯობა
Сәлеметсіз бе
Здравствуйте
¡Hola!
Grüezi
สวัสดี
Bondjoû

Reading a file in codepage mode

This could be needed when the file was created using an older SPSS for Windows version, which used codepage mode. Usually this means that (meta)data are encoded as windows-1252. In Linux, you may need to generate a locale with a windows encoding:

# wrong: variables with accented characters are returned as v1, v2, v3
>>> with SavHeaderReader('german.sav') as header:
...     print(header.varNames)
[b'python', b'programmieren', b'macht', b'v1', b'v2', b'v3']

# correct: variable names contain non-ascii characters
# locale definition and presence is OS-specific
# Linux: sudo localedef -f CP1252 -i de_DE /usr/lib/locale/de_DE.cp1252
>>> with SavHeaderReader('german.sav', ioLocale='de_DE.cp1252') as header:
...     print(header.varNames)
[b'python', b'programmieren', b'macht', b'\xfcberhaupt', b'v\xf6llig', b'spa\xdf']
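
The byte strings returned in the second call can be turned into text by decoding them with the codepage of the file. A small sketch using only the standard library, assuming the names are windows-1252 (cp1252) encoded as described above:

```python
# Variable names as returned by SavHeaderReader in the example above.
var_names = [b'python', b'programmieren', b'macht',
             b'\xfcberhaupt', b'v\xf6llig', b'spa\xdf']

# Assumption: the file uses windows-1252, so cp1252 is the matching codec.
decoded = [name.decode('cp1252') for name in var_names]
print(decoded)
```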

Formats¶

SPSS knows just two different data types: string and numerical data. These data types can be formatted (displayed) by SPSS in several different ways. Format names are followed by total width (w) and an optional number of decimal positions (d). Table 1 below shows a complete list of all the available formats.

String data can be alphanumeric characters (A format) or the hexadecimal representation of alphanumeric characters (AHEX format). The maximum size of a string value is 32767 bytes. String formats do not have any decimal positions (d). Currently, SavReader maps both of the string formats to a regular alphanumeric string format.
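
Because the limit is expressed in bytes rather than characters, it is the encoded length that matters. The following sketch checks a value before it is written; fits_spss_string is a hypothetical helper and UTF-8 is an assumed encoding.

```python
MAX_STRING_BYTES = 32767  # maximum size of an SPSS string value, in bytes


def fits_spss_string(value, encoding="utf-8"):
    """Hypothetical check: does `value` fit in a single SPSS string value?"""
    if isinstance(value, str):
        value = value.encode(encoding)
    return len(value) <= MAX_STRING_BYTES


short_ok = fits_spss_string("Gr\u00fcezi")
# 20000 characters, but each one takes two bytes in UTF-8
long_ok = fits_spss_string("\u00fc" * 20000)
print(short_ok, long_ok)
```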

Numerical data formats include the default numeric format (F), scientific notation (E) and zero-padded (N). For example, a format of F5.2 represents a numeric value with a total width of 5, including two decimal positions and a decimal indicator. For all numeric formats, the maximum width (w) is 40. For numeric formats where decimals are allowed, the maximum number of decimals (d) is 16. SavReader does not format numerical values, except for the N format, and dates/times (see under Date formats). The N format is a zero-padded value (e.g. SPSS format N8 is formatted as Python format %08d, e.g. ‘00001234’). For most numerical values, formatting means loss of precision. For instance, formatting SPSS F5.3 to Python %5.3f means that only the first three decimals are retained. In addition, formatting incurs additional processing time. Finally, appending a percent sign to a value (PCT format), for example, renders the value less useful for calculations.
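
The two formatting cases mentioned above can be tried with plain Python string formatting. This is only an illustration of the format strings named in the text, not the code SavReader runs internally:

```python
value = 1234
zero_padded = "%08d" % value         # SPSS N8 -> Python %08d
print(zero_padded)

original = 3.14159265
formatted = "%5.3f" % original       # SPSS F5.3 -> Python %5.3f
print(formatted)

# Converting the formatted text back does not recover the original value.
print(float(formatted) == original)
```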

**Table 1.** string and numerical formats in SPSS and `savReaderWriter`¶
Format	Description	Format	Description
A	Alphanumeric	JDATE	Julian date - yyyyddd
AHEX	Alphanumeric hexadecimal	MONTH	Month
ADATE	Date format mm/dd/yyyy	MOYR	mmm yyyy
CCA	User Programmable currency format	N	N Format- unsigned with leading 0s
CCB	User Programmable currency format	P	Packed decimal
CCC	User Programmable currency format	PCT	Percent - F followed by %
CCD	User Programmable currency format	PIB	Positive integer binary unsigned
CCE	User Programmable currency format	PIBHEX	Positive integer binary - hex
COMMA	F Format with commas	PK	Positive integer binary unsigned
DATE	Date format dd-mmm-yyyy	QYR	q Q yyyy
DATETIME	Date and Time	RB	Floating point binary
DOLLAR	Commas and floating dollar sign	RBHEX	Floating point binary hex
DOT	Like COMMA, switching dot for comma	SDATE	Date in yyyy/mm/dd style
DTIME	Date-time dd hh:mm:ss.s	TIME	Time format hh:mm:ss.s
E	E Format- with explicit power of 10	WKDAY	Day of the week
EDATE	Date in dd/mm/yyyy style	WKYR	ww WK yyyy
F	Default Numeric Format	Z	Zoned decimal
IB	Integer binary

Note. The User Programmable currency formats (CCA, CCB, CCC, CCD and CCE) cannot be defined or written by SavWriter and existing definitions cannot be read by SavReader.

Date formats¶

Dates in SPSS. Date formats are a group of numerical formats. SPSS stores dates as the number of seconds since midnight, October 14, 1582 (the beginning of the Gregorian calendar). In SPSS, the user can make these seconds human-readable by giving them a print and/or write format (usually these are set at the same time using the FORMATS command). Examples of such display formats include ADATE (American date, mmddyyyy), EDATE (European date, ddmmyyyy), SDATE (Asian/sortable date, yyyymmdd) and JDATE (Julian date).
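
To make the storage scheme concrete, here is a minimal sketch that converts between SPSS seconds and Python datetime objects with the standard library. The function names are made up for this example, and this is not necessarily how savReaderWriter performs the conversion.

```python
from datetime import datetime, timedelta

# SPSS counts seconds from midnight, October 14, 1582.
SPSS_EPOCH = datetime(1582, 10, 14)


def spss_to_datetime(seconds):
    """Convert SPSS seconds to a (naive) Python datetime."""
    return SPSS_EPOCH + timedelta(seconds=seconds)


def datetime_to_spss(moment):
    """Convert a (naive) Python datetime to SPSS seconds."""
    return (moment - SPSS_EPOCH).total_seconds()


seconds = datetime_to_spss(datetime(2010, 10, 25))
print(seconds)
print(spss_to_datetime(seconds).strftime("%Y-%m-%d"))
```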

Reading dates. SavReader deliberately does not honor the different SPSS date display formats, but instead tries to convert them to the more practical (sortable) and less ambiguous ISO 8601 format (yyyy-mm-dd). You can easily change this behavior by modifying the supportedDates dictionary in __init__.py. Table 2 below shows how SavReader converts SPSS dates. Where applicable, the SPSS-to-Python conversion always results in the ‘long’ version of a date/time. For instance, TIME5 and TIME40.16 both result in a %H:%M:%S.%f-style format. If you do not want SavReader to automatically convert dates, you can set rawMode=True. If you use this setting, keep in mind that SavReader will not convert system missing values ($SYSMIS) to an empty string either; instead, sysmis values will appear as the lowest value that can be represented on that system (-1 * sys.float_info.max).
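
When reading with rawMode=True, the system-missing marker described above can be recognised and replaced by hand. A minimal sketch, in which replace_sysmis and the sample record are hypothetical:

```python
import sys

# In rawMode, system-missing values show up as the lowest representable float.
SYSMIS = -1 * sys.float_info.max


def replace_sysmis(record, missing=None):
    """Hypothetical helper: swap raw sysmis values for `missing`."""
    return [missing if value == SYSMIS else value for value in record]


raw_record = [1.0, SYSMIS, b'm', 57000.0]  # made-up record for illustration
cleaned = replace_sysmis(raw_record)
print(cleaned)
```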

**Table 2.** Date formats in SPSS and `SavReader`¶
General form	Format type	Min w in	Min w out	Max w	SPSS Example [1]	strftime format [2]	savReaderWriter Example	Note
dd-mmm-yy	DATEw	9	9	40	28-OCT-90	%Y-%m-%d	1990-10-28	[3]
dd-mmm-yyyy	DATEw	10	11		28-OCT-1990	idem
mm/dd/yy	ADATEw	8	8	40	10/28/90	%Y-%m-%d	1990-10-28	[3]
mm/dd/yyyy	ADATEw	10	10		10/28/1990	idem
dd.mm.yy	EDATEw	8	8	40	28.10.90	%Y-%m-%d	1990-10-28	[3]
dd.mm.yyyy	EDATEw	10	10		28.10.1990	idem
yyddd	JDATEw	5	5	40	90301	%Y-%m-%d	1990-10-28	[3]
yyyyddd	JDATEw	7	7		1990301	idem
yy/mm/dd	SDATEw	8	8	40	90/10/28	%Y-%m-%d	1990-10-28	[3]
yyyy/mm/dd	SDATEw	10	10		1990/10/28	idem
q Q yy	QYRw	4	6	40	4 Q 90	%m Q %Y	4 Q 1990	[4]
q Q yyyy	QYRw	6	8		4 Q 1990	idem
mmm yy	MOYRw	6	6	40	OCT 90	%B %Y	October 1990	[5]
mmm yyyy	MOYRw	8	8		OCT 1990	idem
ww WK yy	WKYRw	6	8	40	43 WK 90	%W WK %Y	43 WK 1990	[5]
ww WK yyyy	WKYRw	8	10		43 WK 1990	idem
(name of the day)	WKDAYw	2	2	40	SU	%A	Sunday	[5]
(name of the month)	MONTHw	3	3	40	JAN	%B	January	[5]
hh:mm	TIMEw	5	5	40	01:02	%H:%M:%S.%f	01:02:34.7500000
hh:mm:ss.s	TIMEw.d	10	10	40	01:02:34.75	idem
dd hh:mm	DTIMEw	1	1	40	20 08:03	%d %H:%M:%S	20 08:03:00
dd hh:mm:ss.s	DTIMEw.d	13	13	40	20 08:03:00	idem
dd-mmm-yyyy hh:mm	DATETIMEw	17	17	40	20-JUN-1990 08:03	%Y-%m-%d %H:%M:%S	1990-06-20 08:03:00
dd-mmm-yyyy hh:mm:ss.s	DATETIMEw.d	22	22	40	20-JUN-1990 08:03:00	idem

Note. [1] IBM SPSS Statistics Command Syntax Reference.pdf [2] http://docs.python.org/2/library/datetime.html [3] ISO 8601 format dates are used wherever possible, i.e. the mmddyyyy (ADATE) and ddmmyyyy (EDATE) orderings are not maintained. [4] Months are converted to quarters using a simple lookup table. [5] Weekday and month names depend on the host locale (not on the ioLocale argument).
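
The month-to-quarter lookup mentioned in note [4] could look roughly like the sketch below. The table and the format_qyr helper are illustrative and are not taken from the savReaderWriter source.

```python
from datetime import date

# Month number -> quarter number (an illustrative table, not the library's own).
MONTH_TO_QUARTER = {month: (month - 1) // 3 + 1 for month in range(1, 13)}


def format_qyr(day):
    """Hypothetical helper: render a date in the 'q Q yyyy' style of Table 2."""
    return "%d Q %d" % (MONTH_TO_QUARTER[day.month], day.year)


print(format_qyr(date(1990, 10, 28)))
```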

Writing dates. With SavWriter a Python date string value (e.g. “2010-10-25”) can be converted to an SPSS Gregorian date (i.e., just a whole bunch of seconds) by using the spssDateTime method, e.g.:

from savReaderWriter import SavWriter

kwargs = dict(savFileName='/tmp/date.sav', varNames=['aDate'],
              varTypes={'aDate': 0}, formats={'aDate': 'EDATE40'})
with SavWriter(**kwargs) as writer:
    spssDateValue = writer.spssDateTime(b'2010-10-25', '%Y-%m-%d')
    writer.writerow([spssDateValue])

The display format of the date (i.e., the way it looks in the SPSS data editor after opening the .sav file) may be set by specifying the formats dictionary (see also Table 1). This is one of the optional arguments of the SavWriter initializer. Without such a specification, the date will look like a large integer (the number of seconds since the beginning of the Gregorian calendar).

Indices and tables¶

Generated API documentation