spacepy.pycdf.Var

class spacepy.pycdf.Var(cdf_file, var_name, *args)

A CDF variable.

This object does not directly store the data from the CDF; rather, it provides access to the data in a format that looks much like a Python list or numpy ndarray. General information on lists is available in the Python documentation.

The CDF user’s guide, section 2.3, provides background on variables.

Note

Not intended to be created directly; use methods of CDF to gain access to a variable.

A record-varying variable’s data are viewed as a hypercube of dimensions n_dims+1 (the extra dimension is the record number). They are indexed in row-major fashion, i.e. the last index changes most frequently / is contiguous in memory. If the CDF is column-major, the data are transformed to row-major before return.

Non record-varying variables are similar, but do not have the extra dimension of record number.

Variables can be subscripted by a multidimensional index to return the data. Indices are in row-major order with the first dimension representing the record number. If the CDF is column major, the data are reordered to row major. Each dimension is specified by standard Python slice notation, with dimensions separated by commas. The ellipsis fills in any missing dimensions with full slices. The returned data are lists; Python represents multidimensional arrays as nested lists. The innermost set of lists represents contiguous data.

Note

numpy ‘fancy indexing’ is not supported.

Degenerate dimensions are ‘collapsed’, i.e. no list of only one element will be returned if a single subscript is specified instead of a range. (To avoid this, specify a slice like 1:2, which starts with 1 and ends before 2).
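The difference between a single subscript and a one-element slice is the same one seen with ordinary Python lists. The following is a minimal sketch using a plain nested list as a stand-in for variable data; it does not call pycdf, and the name `records` is only illustrative.

```python
# Plain nested list standing in for data read from a variable:
# three records, each with three elements.
records = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

single = records[1]      # integer index: the record dimension collapses
ranged = records[1:2]    # one-element slice: the record dimension is kept

print(single)  # [4, 5, 6]
print(ranged)  # [[4, 5, 6]]
```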

Two special cases:

1. Requesting a single-dimension slice for a record-varying variable will return all data for that record number (or those record numbers) for that variable.

2. Requests for multi-dimensional variables may skip the record-number dimension and simply specify the slice on the array itself. In that case, the slice of the array will be returned for all records.

In the event of ambiguity (e.g., single-dimension slice on a one-dimensional variable), case 1 takes priority. Otherwise, mismatch between the number of dimensions specified in the slice and the number of dimensions in the variable will cause an IndexError to be thrown.

This all sounds very complicated but it is essentially attempting to do the ‘right thing’ for a range of slices.

An unusual case is scalar (zero-dimensional) non-record-varying variables. Clearly they cannot be subscripted normally. In this case, use the [...] syntax meaning ‘access all data.’:

>>> from spacepy import pycdf
>>> testcdf = pycdf.CDF('test.cdf', '')
>>> variable = testcdf.new('variable', recVary=False,
...     type=pycdf.const.CDF_INT4)
>>> variable[...] = 10
>>> variable
<Var:
CDF_INT4 [] NRV
>
>>> variable[...]
10

Reading any empty non-record-varying variable will return an empty array with the same number of dimensions, but all dimensions will be of zero length. The scalar is, again, a special case: due to the inability to have a numpy array which is both zero-dimensional and empty, reading an NRV scalar variable with no data will return an empty one-dimensional array. This is really not recommended.

Variables with no records (RV) or no data (NRV) are considered to be “false”; those with records or data written are considered to be “true”, allowing for an easy check of data existence:

>>> if testcdf['variable']:
...     pass  # do things that require data to exist

As a list type, variables are also iterable; iterating over a variable returns a single complete record at a time.

This is all clearer with examples. Consider a variable B_GSM, with three elements per record (x, y, z components) and fifty records in the CDF. Then:

B_GSM[0, 1] is the y component of the first record.

B_GSM[10, :] is a three-element list, containing x, y, and z components of the 11th record. As a shortcut, if only one dimension is specified, it is assumed to be the record number, so this could also be written B_GSM[10].

B_GSM[...] reads all data for B_GSM and returns it as a fifty-element list, each element itself being a three-element list of x, y, z components.
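As a rough illustration of these three cases, and of iterating record by record, here is a sketch that uses a nested list in place of the variable. The data values are invented for the example, and the list indexing `b_gsm[0][1]` stands in for the pycdf subscript `B_GSM[0, 1]`.

```python
import math

# Nested-list stand-in for B_GSM: 50 records of (x, y, z) components.
# Values are made up so that each record's magnitude is easy to check.
b_gsm = [[3 * (rec + 1), 4 * (rec + 1), 0] for rec in range(50)]

y_first = b_gsm[0][1]    # like B_GSM[0, 1]: y component of the first record
eleventh = b_gsm[10]     # like B_GSM[10, :] or B_GSM[10]: the 11th record

# Iteration yields one complete record at a time, so per-record
# quantities such as the field magnitude are a simple loop.
magnitudes = [math.sqrt(x * x + y * y + z * z) for x, y, z in b_gsm]

print(y_first)        # 4
print(eleventh)       # [33, 44, 0]
print(magnitudes[0])  # 5.0
```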

Multidimensional example: consider fluxes stored as a function of pitch angle and energy. Such a variable may be called Flux and stored as a two-dimensional array, with the first dimension representing (say) ten energy steps and the second, eighteen pitch angle bins (ten degrees wide, centered from 5 to 175 degrees). Assume 100 records stored in the CDF (i.e. 100 different times).

Flux[4] is a list of ten elements, one per energy step, each element being a list of 18 fluxes, one per pitch bin. All are taken from the fifth record in the CDF.

Flux[4, :, 0:4] is the same record, all energies, but only the first four pitch bins (roughly, field-aligned).

Flux[..., 0:4] is a 100-element list (one per record), each element being a ten-element list (one per energy step), each containing fluxes for the first four pitch bins.

This slicing notation is very flexible and allows reading specifically the desired data from the CDF.
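To make the shapes in the Flux example concrete, the sketch below builds a nested-list stand-in of the same size and reproduces each slice with list operations. It is only an analogy: the comprehensions show which elements each pycdf subscript selects, not how pycdf reads them.

```python
# Nested-list stand-in for Flux: 100 records x 10 energies x 18 pitch bins.
# Each value records its own position so the slices are easy to check.
flux = [[[(rec, energy, pitch) for pitch in range(18)]
         for energy in range(10)]
        for rec in range(100)]

# Like Flux[4]: the fifth record, ten energy steps of 18 pitch bins each.
fifth = flux[4]

# Like Flux[4, :, 0:4]: same record, all energies, first four pitch bins.
fifth_aligned = [energy_row[0:4] for energy_row in flux[4]]

# Like Flux[..., 0:4]: every record, all energies, first four pitch bins.
all_aligned = [[energy_row[0:4] for energy_row in record] for record in flux]

print(len(fifth), len(fifth[0]))                                      # 10 18
print(len(fifth_aligned), len(fifth_aligned[0]))                      # 10 4
print(len(all_aligned), len(all_aligned[0]), len(all_aligned[0][0]))  # 100 10 4
```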

Note

The C CDF library allows reading records which have not been written to a file, returning a pad value. pycdf checks the size of a variable and will raise IndexError for most attempts to read past the end, except for variables with sparse records. If these checks fail, a value is returned with a warning VIRTUAL_RECORD_DATA. Please open an issue if this occurs for variables without sparse records. See pg. 39 and following of the CDF User’s Guide for more on virtual records.

All data are, on read, converted to appropriate Python data types; EPOCH, EPOCH16, and TIME_TT2000 types are converted to datetime. Data are returned in numpy arrays.

Note

Although pycdf supports TIME_TT2000 variables, the Python datetime object does not support leap seconds. Thus, on read, any seconds past 59 are truncated to 59.999999 (59 seconds, 999 milliseconds, 999 microseconds).
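The limitation comes from the standard library itself, as this short sketch shows. It uses only `datetime` and does not call pycdf; the date chosen is the leap second at the end of 2016.

```python
import datetime

# A leap second such as 2016-12-31T23:59:60 cannot be represented:
# datetime only accepts seconds in the range 0..59.
try:
    datetime.datetime(2016, 12, 31, 23, 59, 60)
    leap_second_ok = True
except ValueError:
    leap_second_ok = False

# The latest instant datetime can hold within that minute, matching
# the truncated value described in the note above.
clamped = datetime.datetime(2016, 12, 31, 23, 59, 59, 999999)

print(leap_second_ok)  # False
print(clamped)         # 2016-12-31 23:59:59.999999
```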

Potentially useful list methods and related functions:

The topic of array majority can be very confusing; good background material is available at IDL Array Storage and Indexing. In brief, regardless of the majority stored in the CDF, pycdf will always present the data in the native Python majority, row-major order, also known as C order. This is the default order in NumPy. However, packages that render image data may expect it in column-major order. If the axes seem ‘swapped’ this is likely the reason.
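A small sketch may help show why axes can look swapped. It interprets the same six values, laid out flat as they would be in memory, first as row-major and then as column-major. This is plain Python for illustration and says nothing about how pycdf performs the conversion.

```python
rows, cols = 2, 3
flat = [0, 1, 2, 3, 4, 5]   # the same six values, as laid out in memory

# Row-major (C order): the last index varies fastest.
row_major = [[flat[i * cols + j] for j in range(cols)] for i in range(rows)]

# Column-major (Fortran order): the first index varies fastest.
col_major = [[flat[i + j * rows] for j in range(cols)] for i in range(rows)]

print(row_major)  # [[0, 1, 2], [3, 4, 5]]
print(col_major)  # [[0, 2, 4], [1, 3, 5]]
```

Reading a buffer with the wrong assumption about majority therefore gives an array of the right size whose elements sit in unexpected positions.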

The attrs Python attribute acts as a dictionary referencing zAttributes (do not confuse the two); all the dictionary methods above also work on the attribute dictionary. See zAttrList for more on the dictionary of attributes.

With writing, as with reading, every attempt has been made to match the behavior of Python lists. You can write one record, many records, or even certain elements of all records. There is one restriction: only the record dimension (i.e. dimension 0) can be resized by write, as all records in a variable must have the same dimensions. Similarly, only whole records can be deleted.

Note

Unusual error messages on writing data usually mean that pycdf is unable to interpret the data as a regular array of a single type matching the type and shape of the variable being written. A 5x4 array is supported; an irregular array where one row has five columns and a different row has six columns is not. Error messages of this type include:

Data must be well-formed, regular array of number, string, or datetime

setting an array element with a sequence.

shape mismatch: objects cannot be broadcast to a single shape
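When one of these messages appears, it can help to check the shape of the data before writing. The helper below is a hypothetical pre-check, not part of pycdf: it returns the shape of a regular nested list, or None if the nesting is irregular. It checks shape only and does not verify that all elements share a type.

```python
def regular_shape(data):
    """Return the shape of a regular nested list, or None if it is ragged.

    Hypothetical helper for illustration; not part of pycdf.
    """
    if not isinstance(data, (list, tuple)):
        return ()                      # a scalar has an empty shape
    shapes = [regular_shape(item) for item in data]
    if any(s is None for s in shapes):
        return None                    # ragged somewhere below this level
    if len(set(shapes)) > 1:
        return None                    # siblings disagree on shape
    inner = shapes[0] if shapes else ()
    return (len(data),) + inner


good = [[0] * 4 for _ in range(5)]                # regular 5x4 array
ragged = [[1, 2, 3, 4, 5], [1, 2, 3, 4, 5, 6]]    # rows of five and six

print(regular_shape(good))    # (5, 4)
print(regular_shape(ragged))  # None
```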

For these examples, assume Flux has 100 records and dimensions [2, 3].

Rewrite the first record without changing the rest:

>>> Flux[0] = [[1, 2, 3], [4, 5, 6]]

Write a new first record and delete all the rest:

>>> Flux[...] = [[1, 2, 3], [4, 5, 6]]

Write a new record in the last position and add a new record after:

>>> Flux[99:] = [[[1, 2, 3], [4, 5, 6]],
...              [[11, 12, 13], [14, 15, 16]]]

Insert two new records between the current number 5 and 6:

>>> Flux[6:6] = [[[1, 2, 3], [4, 5, 6]],  [[11, 12, 13],
...               [14, 15, 16]]]

This operation can be quite slow, as it requires reading and rewriting the entire variable. (CDF does not directly support record insertion.)

Change the first element of the first two records but leave other elements alone:

>>> Flux[0:2, 0, 0] = [1, 2]

Remove the first record:

>>> del Flux[0]

Remove record 5 (the sixth record):

>>> del Flux[5]

Delete all data from Flux, but leave the variable definition intact:

>>> del Flux[...]
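Because writing is designed to match Python list behavior, the effect of the record-level operations on the record count can be previewed with an ordinary list. The sketch below uses a plain list of 100 records as a stand-in; it covers only the operations that lists also support (the `[...]` and multidimensional forms are specific to variables), and `del flux[:]` is used as the list analogue of deleting all data.

```python
# Stand-in: a plain list of 100 "records", each a 2x3 nested list.
flux = [[[r, r, r], [r, r, r]] for r in range(100)]

new_a = [[1, 2, 3], [4, 5, 6]]
new_b = [[11, 12, 13], [14, 15, 16]]

flux[0] = new_a               # overwrite record 0; count unchanged
count_after_rewrite = len(flux)

flux[99:] = [new_a, new_b]    # overwrite record 99 and append one more
count_after_append = len(flux)

flux[6:6] = [new_a, new_b]    # empty slice: insert without overwriting
count_after_insert = len(flux)
before, after = flux[5], flux[8]   # the old records 5 and 6

del flux[0]                   # remove the first record
del flux[5]                   # remove record 5 (the sixth)
count_after_deletes = len(flux)

del flux[:]                   # remove every record
count_after_clear = len(flux)

print(count_after_rewrite, count_after_append, count_after_insert)  # 100 101 103
print(count_after_deletes, count_after_clear)                       # 101 0
```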

Note

Variables using sparse records do not support insertion and only support deletion of a single record at a time. See sparse() and section 2.3.12 of the CDF user’s guide for more information on sparse records.

Note

Although this interface only directly supports zVariables, zMode is set on opening the CDF so rVars appear as zVars. See p. 24 of the CDF user’s guide; pycdf uses zMode 2.

`attrs`	zAttributes for this zVariable in a dict-like format.
`compress`([comptype, param])	Set or check the compression of this variable
`copy`()	Copies all data and attributes from this variable
`dtype`	Provide the numpy dtype equivalent to the CDF type of this variable.
`dv`([new_dv])	Gets or sets dimension variance of each dimension of variable.
`insert`(index, data)	Inserts a single record before an index
`name`()	Returns the name of this variable
`nelems`()	Number of elements for each value in this variable
`pad`([value])	Gets or sets this variable's pad value.
`rename`(new_name)	Renames this variable
`rv`([new_rv])	Gets or sets whether this variable has record variance
`shape`	Provides the numpy array-like shape of this variable.
`sparse`([sparsetype])	Gets or sets this variable's sparse records mode.
`type`([new_type])	Returns or sets the CDF type of this variable

attrs

zAttributes for this zVariable in a dict-like format. See zAttrList for details.

compress(comptype=None, param=None)

Set or check the compression of this variable

Compression may not be changeable on variables with data already written; even deleting the data may not permit the change.

See section 2.6 of the CDF user’s guide for more information on compression.

Returns:

out (tuple): the (comptype, param) currently in effect

Other Parameters:

comptype (ctypes.c_long): type of compression to change to, see CDF C reference manual section 4.10. Constants for this parameter are in const. If not specified, will not change compression.
param (ctypes.c_long): compression parameter, see CDF C reference manual section 4.10 and const. If not specified, will choose a reasonable default (5 for gzip; other types have only one possible parameter).

copy()

Copies all data and attributes from this variable

Returns:

out (VarCopy): list of all data in record order

dtype

Provide the numpy dtype equivalent to the CDF type of this variable.

Data from this variable will be returned in numpy arrays of this type.
