spacepy.pycdf.Var

class spacepy.pycdf.Var(cdf_file, var_name, *args)

A CDF variable.

This object does not directly store the data from the CDF; rather, it provides access to the data in a format that is much like a Python list or numpy ndarray. General list information is available in the Python docs: 1, 2, 3.

The CDF user’s guide, section 2.3, provides background on variables.

Note

Not intended to be created directly; use methods of CDF to gain access to a variable.

A record-varying variable’s data are viewed as a hypercube of dimensions n_dims+1 (the extra dimension is the record number). They are indexed in row-major fashion, i.e. the last index changes most frequently / is contiguous in memory. If the CDF is column-major, the data are transformed to row-major before return.
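As an illustration of the two storage orders, the flat position of an element can be computed by hand. This is a plain-Python sketch, not pycdf code; row_major_offset and column_major_offset are hypothetical helper names, and the 2 x 3 shape is made up.

```python
def row_major_offset(index, shape):
    """Flat offset of `index` when the last index varies fastest (C order)."""
    offset = 0
    for i, n in zip(index, shape):
        offset = offset * n + i
    return offset


def column_major_offset(index, shape):
    """Flat offset of `index` when the first index varies fastest."""
    offset = 0
    for i, n in zip(reversed(index), reversed(shape)):
        offset = offset * n + i
    return offset


# One record of a hypothetical 2 x 3 variable.
shape = (2, 3)

# Stepping the last index moves one element in row-major order...
step_last_row_major = (row_major_offset((0, 1), shape)
                       - row_major_offset((0, 0), shape))
# ...but two elements (the length of the first dimension) in column-major.
step_last_col_major = (column_major_offset((0, 1), shape)
                       - column_major_offset((0, 0), shape))
```

In both orders the element itself is the same; only its position in memory differs, which is why a column-major CDF has to be reordered before it can be presented in row-major form.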

Non record-varying variables are similar, but do not have the extra dimension of record number.

Variables can be subscripted by a multidimensional index to return the data. Indices are in row-major order with the first dimension representing the record number. If the CDF is column major, the data are reordered to row major. Each dimension is specified by standard Python slice notation, with dimensions separated by commas. The ellipsis fills in any missing dimensions with full slices. The returned data are lists; Python represents multidimensional arrays as nested lists. The innermost set of lists represents contiguous data.

Note

numpy ‘fancy indexing’ is not supported.

Degenerate dimensions are ‘collapsed’, i.e. no list of only one element will be returned if a single subscript is specified instead of a range. (To avoid this, specify a slice like 1:2, which starts with 1 and ends before 2.)
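Plain Python lists behave the same way, so the difference can be seen without a CDF. In this sketch, records is a made-up stand-in for a variable with three records of two elements each.

```python
# Stand-in for a variable with three records of two elements each.
records = [[1, 2], [3, 4], [5, 6]]

single = records[1]      # integer subscript: the record dimension collapses
ranged = records[1:2]    # one-element slice: the record dimension is kept

print(single)  # [3, 4]
print(ranged)  # [[3, 4]]
```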

Two special cases:

  1. Requesting a single-dimension slice for a record-varying variable will return all data for that record number (or those record numbers) for that variable.

  2. Requests for multi-dimensional variables may skip the record-number dimension and simply specify the slice on the array itself. In that case, the slice of the array will be returned for all records.

In the event of ambiguity (e.g., single-dimension slice on a one-dimensional variable), case 1 takes priority. Otherwise, mismatch between the number of dimensions specified in the slice and the number of dimensions in the variable will cause an IndexError to be thrown.
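The rules above can be sketched as a small function that expands a subscript into one entry per dimension. This is only an illustration of the described behavior, not pycdf's own implementation; expand_index and its parameters are hypothetical names.

```python
def expand_index(key, n_dims, rec_vary=True):
    """Expand a subscript into one entry per dimension (record first, if any).

    Hypothetical helper following the rules described in the text; it is
    not pycdf's own code.
    """
    if not isinstance(key, tuple):
        key = (key,)
    total = n_dims + (1 if rec_vary else 0)
    full = slice(None)

    if Ellipsis in key:
        # The ellipsis stands for as many full slices as are missing.
        if key.count(Ellipsis) > 1:
            raise IndexError("only one ellipsis is allowed")
        pos = key.index(Ellipsis)
        missing = total - (len(key) - 1)
        if missing < 0:
            raise IndexError("too many indices")
        return key[:pos] + (full,) * missing + key[pos + 1:]
    if len(key) == total:
        return key
    if rec_vary and len(key) == 1:
        # Case 1: a lone subscript selects records (wins any ambiguity).
        return key + (full,) * n_dims
    if rec_vary and len(key) == n_dims:
        # Case 2: record dimension skipped, so take every record.
        return (full,) + key
    raise IndexError("wrong number of dimensions in subscript")


# A lone subscript on a record-varying variable picks records.
one_record = expand_index(10, n_dims=1)
# Two subscripts on a two-dimensional variable apply to every record.
all_records = expand_index((slice(None), slice(0, 4)), n_dims=2)
```

Because the single-subscript test comes before the skipped-record test, a lone subscript on a one-dimensional record-varying variable is treated as a record selector, matching the stated priority of case 1.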

This all sounds very complicated but it is essentially attempting to do the ‘right thing’ for a range of slices.

An unusual case is scalar (zero-dimensional) non-record-varying variables. Clearly they cannot be subscripted normally. In this case, use the [...] syntax, meaning ‘access all data’:

>>> from spacepy import pycdf
>>> testcdf = pycdf.CDF('test.cdf', '')
>>> variable = testcdf.new('variable', recVary=False,
...     type=pycdf.const.CDF_INT4)
>>> variable[...] = 10
>>> variable
<Var:
CDF_INT4 [] NRV
>
>>> variable[...]
10

Reading any empty non-record-varying variable will return an empty array with the same number of dimensions, but all dimensions will be of zero length. The scalar is, again, a special case: due to the inability to have a numpy array which is both zero-dimensional and empty, reading an NRV scalar variable with no data will return an empty one-dimensional array. This is really not recommended.
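The scalar special case follows from simple arithmetic: an array holds as many elements as the product of its dimension sizes, and the product of no sizes at all is 1. A plain-Python illustration (element_count is a hypothetical helper, not part of pycdf or numpy):

```python
import math


def element_count(shape):
    """Number of elements held by an array of the given shape."""
    return math.prod(shape)


print(element_count((0, 0)))  # 0: an empty two-dimensional array is possible
print(element_count(()))      # 1: a zero-dimensional array always holds a value
print(element_count((0,)))    # 0: hence the empty one-dimensional fallback
```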

Variables with no records (RV) or no data (NRV) are considered to be “false”; those with records or data written are considered to be “true”, allowing for an easy check of data existence:

>>> if testcdf['variable']:
...     pass  # do things that require data to exist

As a list type, variables are also iterable; iterating over a variable returns a single complete record at a time.

This is all clearer with examples. Consider a variable B_GSM, with three elements per record (x, y, z components) and fifty records in the CDF. Then:

  1. B_GSM[0, 1] is the y component of the first record.

  2. B_GSM[10, :] is a three-element list, containing x, y, and z components of the 11th record. As a shortcut, if only one dimension is specified, it is assumed to be the record number, so this could also be written B_GSM[10].

  3. B_GSM[...] reads all data for B_GSM and returns it as a fifty-element list, each element itself being a three-element list of x, y, z components.
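The same three reads can be mimicked with a nested list, which also shows what iteration yields. This is a stand-in with made-up values, not a real CDF variable; plain lists need chained subscripts where the variable accepts a comma-separated index.

```python
# Stand-in for B_GSM: fifty records of (x, y, z); values are made up so
# that record i holds 100*i + 1, 100*i + 2, 100*i + 3.
b_gsm = [[100 * i + 1, 100 * i + 2, 100 * i + 3] for i in range(50)]

y_first = b_gsm[0][1]   # like B_GSM[0, 1]
rec_11 = b_gsm[10]      # like B_GSM[10, :] or B_GSM[10]
everything = b_gsm[:]   # like B_GSM[...]

# Iterating yields one complete three-element record at a time.
records_seen = 0
for record in b_gsm:
    x, y, z = record
    records_seen += 1
```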

Multidimensional example: consider fluxes stored as a function of pitch angle and energy. Such a variable may be called Flux and stored as a two-dimensional array, with the first dimension representing (say) ten energy steps and the second, eighteen pitch angle bins (ten degrees wide, centered from 5 to 175 degrees). Assume 100 records stored in the CDF (i.e. 100 different times).

  1. Flux[4] is a list of ten elements, one per energy step, each element being a list of 18 fluxes, one per pitch bin. All are taken from the fifth record in the CDF.

  2. Flux[4, :, 0:4] is the same record, all energies, but only the first four pitch bins (roughly, field-aligned).

  3. Flux[..., 0:4] is a 100-element list (one per record), each element being a ten-element list (one per energy step), each containing fluxes for the first four pitch bins.
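To check the shapes claimed above, the variable can be imitated with nested lists. The sketch below uses made-up contents, and shape_of is a hypothetical helper; list comprehensions stand in for the multidimensional subscripts.

```python
N_REC, N_ENERGY, N_PITCH = 100, 10, 18

# Stand-in for Flux; each made-up value encodes (record, energy, pitch).
flux = [[[(r, e, p) for p in range(N_PITCH)]
         for e in range(N_ENERGY)]
        for r in range(N_REC)]

fifth_record = flux[4]                                       # like Flux[4]
aligned_one = [row[0:4] for row in flux[4]]                  # like Flux[4, :, 0:4]
aligned_all = [[row[0:4] for row in rec] for rec in flux]    # like Flux[..., 0:4]


def shape_of(nested):
    """Shape of a regular, non-empty nested list (hypothetical helper)."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0]
    return tuple(shape)


print(shape_of(fifth_record))  # (10, 18)
print(shape_of(aligned_one))   # (10, 4)
print(shape_of(aligned_all))   # (100, 10, 4)
```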

This slicing notation is very flexible and allows reading specifically the desired data from the CDF.

Note

The C CDF library allows reading records which have not been written to a file, returning a pad value. pycdf checks the size of a variable and will raise IndexError for most attempts to read past the end, except for variables with sparse records. If these checks fail, a value is returned with a warning VIRTUAL_RECORD_DATA. Please open an issue if this occurs for variables without sparse records. See pg. 39 and following of the CDF User’s Guide for more on virtual records.

All data are, on read, converted to appropriate Python data types; EPOCH, EPOCH16, and TIME_TT2000 types are converted to datetime. Data are returned in numpy arrays.

Note

Although pycdf supports TIME_TT2000 variables, the Python datetime object does not support leap seconds. Thus, on read, any seconds past 59 are truncated to 59.999999 (59 seconds, 999 milliseconds, 999 microseconds).
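The limitation is in datetime itself, as the following sketch shows. clamp_leap_second is a hypothetical helper that imitates the truncation described here; it is not pycdf's conversion routine.

```python
import datetime


def clamp_leap_second(year, month, day, hour, minute, second, microsecond=0):
    """Build a datetime, mapping any second past 59 to 59.999999.

    Hypothetical helper illustrating the truncation described in the text.
    """
    if second > 59:
        second, microsecond = 59, 999999
    return datetime.datetime(year, month, day, hour, minute,
                             second, microsecond)


# datetime rejects a seconds value of 60 outright.
try:
    datetime.datetime(2016, 12, 31, 23, 59, 60)
    leap_accepted = True
except ValueError:
    leap_accepted = False

# Half a second into the leap second at the end of 2016.
clamped = clamp_leap_second(2016, 12, 31, 23, 59, 60, 500000)
```

Every instant inside a leap second therefore maps to the same datetime, so such times cannot be told apart after reading.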

Potentially useful list methods and related functions:

The topic of array majority can be very confusing; good background material is available at IDL Array Storage and Indexing. In brief, regardless of the majority stored in the CDF, pycdf will always present the data in the native Python majority, row-major order, also known as C order. This is the default order in NumPy. However, packages that render image data may expect it in column-major order. If the axes seem ‘swapped’ this is likely the reason.
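If a consumer does expect the axes the other way round, the data can be transposed after reading. A minimal sketch with a made-up 2 x 3 nested list (with numpy arrays, the array's own transpose would normally be used instead):

```python
# A 2 x 3 block in row-major presentation: 2 rows, 3 columns.
rows = [[1, 2, 3],
        [4, 5, 6]]

# Swap the two axes for a consumer that expects them the other way round.
swapped = [list(column) for column in zip(*rows)]
print(swapped)  # [[1, 4], [2, 5], [3, 6]]
```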

The attrs Python attribute acts as a dictionary referencing zAttributes (do not confuse the two); all the dictionary methods above also work on the attribute dictionary. See zAttrList for more on the dictionary of attributes.

With writing, as with reading, every attempt has been made to match the behavior of Python lists. You can write one record, many records, or even certain elements of all records. There is one restriction: only the record dimension (i.e. dimension 0) can be resized by write, as all records in a variable must have the same dimensions. Similarly, only whole records can be deleted.

Note

Unusual error messages on writing data usually mean that pycdf is unable to interpret the data as a regular array of a single type matching the type and shape of the variable being written. A 5x4 array is supported; an irregular array where one row has five columns and a different row has six columns is not. Error messages of this type include:

  • Data must be well-formed, regular array of number, string, or datetime

  • setting an array element with a sequence.

  • shape mismatch: objects cannot be broadcast to a single shape
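Data can be checked for regularity before writing. The sketch below is one way to do so in plain Python; regular_shape is a hypothetical helper, and pycdf performs its own checks independently of it.

```python
def regular_shape(data):
    """Return the shape of a nested list, or raise ValueError if ragged.

    Hypothetical helper; not part of pycdf.
    """
    if not isinstance(data, (list, tuple)):
        return ()          # a scalar has no dimensions
    if not data:
        return (0,)
    shapes = {regular_shape(item) for item in data}
    if len(shapes) != 1:
        raise ValueError("rows have differing shapes: not a regular array")
    return (len(data),) + shapes.pop()


# A 5x4 array is regular.
good = regular_shape([[0] * 4 for _ in range(5)])

# One row of five columns and one of six is not.
try:
    regular_shape([[1, 2, 3, 4, 5], [1, 2, 3, 4, 5, 6]])
    ragged_rejected = False
except ValueError:
    ragged_rejected = True
```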

For these examples, assume Flux has 100 records and dimensions [2, 3].

Rewrite the first record without changing the rest:

>>> Flux[0] = [[1, 2, 3], [4, 5, 6]]

Write a new first record and delete all the rest:

>>> Flux[...] = [[1, 2, 3], [4, 5, 6]]

Write a new record in the last position and add a new record after:

>>> Flux[99:] = [[[1, 2, 3], [4, 5, 6]],
...              [[11, 12, 13], [14, 15, 16]]]

Insert two new records between the current number 5 and 6:

>>> Flux[6:6] = [[[1, 2, 3], [4, 5, 6]],  [[11, 12, 13],
...               [14, 15, 16]]]

This operation can be quite slow, as it requires reading and rewriting the entire variable. (CDF does not directly support record insertion.)

Change the first element of the first two records but leave other elements alone:

>>> Flux[0:2, 0, 0] = [1, 2]

Remove the first record:

>>> del Flux[0]

Remove record 5 (the sixth):

>>> del Flux[5]

Delete all data from Flux, but leave the variable definition intact:

>>> del Flux[...]
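Since writes are meant to follow list behavior, the effect of record-level operations on the record count can be previewed with an ordinary Python list. This is a sketch with made-up data and a hypothetical record() helper; it does not touch a CDF, and it covers only operations on the record dimension.

```python
def record(value):
    """A made-up 2 x 3 record filled with one value."""
    return [[value] * 3 for _ in range(2)]


flux = [record(i) for i in range(100)]   # stand-in for 100 records
counts = []

flux[0] = record(-1)                     # overwrite the first record
counts.append(len(flux))                 # 100
flux[99:] = [record(-2), record(-3)]     # overwrite the last, append one
counts.append(len(flux))                 # 101
flux[6:6] = [record(-4), record(-5)]     # insert two between index 5 and 6
counts.append(len(flux))                 # 103
del flux[0]                              # remove the first record
counts.append(len(flux))                 # 102
del flux[5]                              # remove index 5 (the sixth)
counts.append(len(flux))                 # 101
del flux[:]                              # remove every record
counts.append(len(flux))                 # 0
```

Note that for a plain list, an empty slice such as 6:6 inserts without replacing anything, whereas a one-element slice such as 5:6 replaces the record at index 5.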

Note

Variables using sparse records do not support insertion and only support deletion of a single record at a time. See sparse() and section 2.3.12 of the CDF user’s guide for more information on sparse records.

Note

Although this interface only directly supports zVariables, zMode is set on opening the CDF so rVars appear as zVars. See p.24 of the CDF user’s guide; pycdf uses zMode 2.

attrs

zAttributes for this zVariable in a dict-like format.

compress([comptype, param])

Set or check the compression of this variable

copy()

Copies all data and attributes from this variable

dtype

Provide the numpy dtype equivalent to the CDF type of this variable.

dv([new_dv])

Gets or sets dimension variance of each dimension of variable.

insert(index, data)

Inserts a single record before an index

name()

Returns the name of this variable

nelems()

Number of elements for each value in this variable

pad([value])

Gets or sets this variable's pad value.

rename(new_name)

Renames this variable

rv([new_rv])

Gets or sets whether this variable has record variance

shape

Provides the numpy array-like shape of this variable.

sparse([sparsetype])

Gets or sets this variable's sparse records mode.

type([new_type])

Returns or sets the CDF type of this variable

attrs

zAttributes for this zVariable in a dict-like format. See zAttrList for details.

compress(comptype=None, param=None)

Set or check the compression of this variable

Compression may not be changeable on variables with data already written; even deleting the data may not permit the change.

See section 2.6 of the CDF user’s guide for more information on compression.

Returns:
out : tuple

the (comptype, param) currently in effect

Other Parameters:
comptype : ctypes.c_long

type of compression to change to, see CDF C reference manual section 4.10. Constants for this parameter are in const. If not specified, will not change compression.

param : ctypes.c_long

Compression parameter, see CDF CRM 4.10 and const. If not specified, will choose reasonable default (5 for gzip; other types have only one possible parameter.)

copy()

Copies all data and attributes from this variable

Returns:
out : VarCopy

list of all data in record order

dtype

Provide the numpy dtype equivalent to the CDF type of this variable.

Data from this variable will be returned in numpy arrays of this type.

See also

type

dv(new_dv=None)

Gets or sets dimension variance of each dimension of variable.

If the variance is unknown, True is assumed (this replicates the apparent behavior of the CDF library on variable creation).

Parameters:
new_dv : list of boolean

Each element True to change that dimension to dimension variance, False to change to not dimension variance. (Unspecified to simply check variance.)

Returns:
out : list of boolean

True if that dimension has variance, else False.

insert(index, data)

Inserts a single record before an index

Parameters:
index : int

index before which to insert the new record

data

the record to insert

name()

Returns the name of this variable

Returns:
out : str

variable’s name

nelems()

Number of elements for each value in this variable

This is the length of strings for CHAR and UCHAR, should be 1 otherwise.

Returns:
int

length of strings

pad(value=None)

Gets or sets this variable’s pad value.

See section 2.3.20 of the CDF user’s guide for more information on pad values.

Returns:
out

Current pad value for this variable. None if it has never been set. This rarely happens; the pad value is usually set by the CDF library on variable creation.

Other Parameters:
value

If specified, should be an appropriate pad value. If not specified, the pad value will not be set or changed.

Notes

New in version 0.2.3.

rename(new_name)

Renames this variable

Parameters:
new_name : str

the new name for this variable

rv(new_rv=None)

Gets or sets whether this variable has record variance

If the variance is unknown, True is assumed (this replicates the apparent behavior of the CDF library on variable creation).

Returns:
out : Boolean

True if record varying, False if NRV

Other Parameters:
new_rv : boolean

True to change to record variance, False to change to NRV, unspecified to simply check variance.

shape

Provides the numpy array-like shape of this variable.

Returns a tuple; the first element is the number of records (RV variables only) and the rest provide the dimensionality of the variable.

Note

Assigning to this attribute will not change the shape.

sparse(sparsetype=None)

Gets or sets this variable’s sparse records mode.

Sparse records mode may not be changeable on variables with data already written; even deleting the data may not permit the change.

See section 2.3.12 of the CDF user’s guide for more information on sparse records.

Returns:
out : ctypes.c_long

Sparse record mode for this variable.

Other Parameters:
sparsetype : ctypes.c_long

If specified, should be a sparse record mode from const; see also CDF C reference manual section 4.11.1. If not specified, the sparse record mode for this variable will not change.

Notes

New in version 0.2.3.

type(new_type=None)

Returns or sets the CDF type of this variable

Parameters:
new_type : ctypes.c_long

the new type from const

Returns:
out : int

CDF type