

3.24 Packed data

Availability: ncap2, ncbo, ncea, ncflint, ncpdq, ncra, ncwa
Short options: None

The phrase packed data refers to data that are stored in the standard netCDF3 packing format, which employs a lossy algorithm. See ncks netCDF Kitchen Sink for a description of deflation, a lossless compression technique available with netCDF4 only. Packed data may be deflated to save additional space.

Packing Algorithm

The standard netCDF packing algorithm is lossy. It produces data with the same dynamic range as the original, but which require no more than half the space to store. The packed variable is (usually) stored as type NC_SHORT, and the two attributes required to unpack it, scale_factor and add_offset, are stored at the original (unpacked) precision of the variable [1]. Let min and max be the minimum and maximum values of the unpacked variable upk.


scale_factor = (max-min)/ndrv
add_offset = 0.5*(min+max)
pck = (upk-add_offset)/scale_factor = (upk-0.5*(min+max))*ndrv/(max-min)

where ndrv is the number of discrete representable values for the given type of packed variable. The theoretical maximum value for ndrv is two raised to the number of bits used to store the packed variable. Thus if the variable is packed into type NC_SHORT, a two-byte datatype, then there are at most 2^16 = 65536 distinct representable values. In practice, the number of discrete representable values is taken to be two less than the theoretical maximum. This leaves space for a missing value and avoids potential rounding problems that may occur during the unpacking of the variable. Thus for NC_SHORT, ndrv = 65536 - 2 = 65534. Less often, the variable may be packed into type NC_CHAR, where ndrv = 256 - 2 = 254, or type NC_INT, where ndrv = 4294967296 - 2 = 4294967294. One useful feature of the (lossy) netCDF packing algorithm is that additional, lossless packing algorithms perform well on top of it.
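The packing formulas above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not NCO's implementation: the function names (ndrv_for_bits, pack_params, pack) are hypothetical, rounding to the nearest integer is assumed as the quantization step, and the guard for constant-valued data (max equal to min) is an assumption added so the sketch does not divide by zero.

```python
def ndrv_for_bits(bits):
    """Discrete representable values: 2**bits minus the two reserved values."""
    return 2 ** bits - 2


def pack_params(values, bits=16):
    """Return (scale_factor, add_offset) for the given unpacked values."""
    lo, hi = min(values), max(values)
    scale_factor = (hi - lo) / ndrv_for_bits(bits)
    add_offset = 0.5 * (lo + hi)
    return scale_factor, add_offset


def pack(values, scale_factor, add_offset):
    """Apply pck = (upk - add_offset) / scale_factor, rounded to an integer."""
    if scale_factor == 0:
        # Assumption: constant data packs to zero; the text does not cover this case.
        return [0 for _ in values]
    # Rounding to the nearest integer is the lossy step.
    return [round((v - add_offset) / scale_factor) for v in values]


# Hypothetical sample data, e.g. temperatures in kelvin.
upk = [250.0, 275.5, 300.0]
scale_factor, add_offset = pack_params(upk, bits=16)  # NC_SHORT is 16 bits
pck = pack(upk, scale_factor, add_offset)
print(scale_factor, add_offset, pck)
```

With ndrv = 65534 for NC_SHORT, the minimum and maximum map to -32767 and 32767, so the packed values fit in a signed two-byte integer with one value to spare at the negative end.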

Unpacking Algorithm

The unpacking algorithm depends on the presence of two attributes, scale_factor and add_offset. If scale_factor is present for a variable, the data are multiplied by scale_factor after they are read. If add_offset is present for a variable, add_offset is added to the data after they are read. If both attributes are present, the data are first scaled by scale_factor, and then add_offset is added.


upk = scale_factor*pck + add_offset = (max-min)*pck/ndrv + 0.5*(min+max)

When scale_factor and add_offset are used for packing, the associated variable (containing the packed data) is typically of type byte or short, whereas the unpacked values are intended to be of type int, float, or double. A variable's scale_factor, add_offset, and _FillValue attributes, if any, should all be of the type intended for the unpacked data, i.e., int, float, or double.
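The unpacking rule, including the cases where only one attribute is present, can be sketched as follows. This is an illustrative Python sketch, not NCO's code: the function name unpack is hypothetical, and a missing attribute is represented by None as an assumption of the example. The sample packed values and attributes correspond to data spanning 250.0 to 300.0 packed into a 16-bit type.

```python
def unpack(packed, scale_factor=None, add_offset=None):
    """Apply upk = scale_factor * pck + add_offset, skipping absent attributes."""
    unpacked = []
    for p in packed:
        value = float(p)
        if scale_factor is not None:
            value *= scale_factor  # scale first ...
        if add_offset is not None:
            value += add_offset    # ... then add the offset
        unpacked.append(value)
    return unpacked


# Hypothetical attributes for data with min = 250.0 and max = 300.0.
scale_factor = (300.0 - 250.0) / 65534
add_offset = 0.5 * (250.0 + 300.0)
pck = [-32767, 655, 32767]
upk = unpack(pck, scale_factor, add_offset)
print(upk)
```

If the packed values were produced by rounding to the nearest integer, each unpacked value should differ from its original by at most half of scale_factor, which is one way to quantify the loss.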

Default Handling of Packed Data

All NCO arithmetic operators understand packed data. The operators automatically unpack any packed variable in the input file that will be arithmetically processed. For example, ncra unpacks all record variables, and ncwa unpacks all variables that contain a dimension to be averaged. These variables are stored unpacked in the output file.

On the other hand, arithmetic operators do not unpack non-processed variables. For example, ncra leaves all non-record variables packed, and ncwa leaves packed all variables lacking an averaged dimension. These variables (called fixed variables) are passed unaltered from the input to the output file. Hence fixed variables which are packed in input files remain packed in output files. Completely packing and unpacking files is easily accomplished with ncpdq (see ncpdq netCDF Permute Dimensions Quickly). Packing and unpacking individual variables may be done with ncpdq and the ncap2 pack() and unpack() functions (see Methods and functions).
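The rule for which packed variables end up unpacked in the output can be restated as a small Python sketch. It only illustrates the behavior described above and is not how NCO is implemented; the function names and the sample variable and dimension names are hypothetical.

```python
def ncra_unpacks(var_dims, record_dim):
    """ncra processes (and so unpacks) variables that have the record dimension."""
    return record_dim in var_dims


def ncwa_unpacks(var_dims, averaged_dims):
    """ncwa processes (and so unpacks) variables with any averaged dimension."""
    return any(dim in averaged_dims for dim in var_dims)


# Hypothetical packed variables and their dimensions.
variables = {
    "temperature": ("time", "lat", "lon"),
    "orography": ("lat", "lon"),
}

for name, dims in variables.items():
    print(name,
          "ncra:", "unpacked" if ncra_unpacks(dims, "time") else "stays packed",
          "| ncwa over time:",
          "unpacked" if ncwa_unpacks(dims, {"time"}) else "stays packed")
```

In this sketch orography has no time dimension, so it is a fixed variable for both operators and would be passed through still packed.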


Footnotes

[1] Although not a part of the standard, NCO enforces the policy that the _FillValue attribute, if any, of a packed variable is also stored at the original precision.