Availability: ncap2, ncbo, ncea, ncflint, ncpdq, ncra, ncwa
Short options: None
The phrase packed data refers to data stored in the standard netCDF3 packing format, which employs a lossy algorithm. See ncks netCDF Kitchen Sink for a description of deflation, a lossless compression technique available with netCDF4 only. Packed data may be deflated to save additional space.
Packing
The standard netCDF packing algorithm is lossy. It produces data that have the same dynamic range as the original but require no more than half the space to store. The packed variable is stored (usually) as type NC_SHORT with the two attributes required to unpack the variable, scale_factor and add_offset, stored at the original (unpacked) precision of the variable [1].
Let min and max be the minimum and maximum values of x, the variable to be packed.
scale_factor = (max-min)/ndrv
add_offset = 0.5*(min+max)
pck = (upk-add_offset)/scale_factor = ndrv*[upk-0.5*(min+max)]/(max-min)
where ndrv is the number of discrete representable values for the given type of packed variable. The theoretical maximum value for ndrv is two raised to the number of bits used to store the packed variable. Thus if the variable is packed into type NC_SHORT, a two-byte datatype, then there are at most 2^16 = 65536 distinct representable values.
In practice, the number of discretely representable values is taken to be two less than the theoretical maximum. This leaves space for a missing value and solves potential problems with rounding that may occur during the unpacking of the variable. Thus for NC_SHORT, ndrv = 65536 - 2 = 65534.
Less often, the variable may be packed into type NC_CHAR, where ndrv = 2^8 - 2 = 256 - 2 = 254, or type NC_INT, where ndrv = 2^32 - 2 = 4294967296 - 2 = 4294967294.
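The packing arithmetic above can be sketched in a few lines of Python. This is an illustration only, not NCO's implementation: the function names are hypothetical, add_offset is taken as 0.5*(min+max) (the value implied by the unpacking formula in this section), and rounding to the nearest integer is an assumption.

```python
# Sketch of the packing arithmetic described in the text (not NCO source code).

# Bits used by each packed type; names mirror the netCDF types in the text.
BITS = {"NC_CHAR": 8, "NC_SHORT": 16, "NC_INT": 32}


def ndrv(packed_type):
    """Number of discrete representable values: theoretical maximum minus two."""
    return 2 ** BITS[packed_type] - 2


def packing_attributes(values, packed_type="NC_SHORT"):
    """Return (scale_factor, add_offset) for a sequence of unpacked values.

    A constant variable (max == min) would give scale_factor == 0;
    this sketch does not handle that case.
    """
    lo, hi = min(values), max(values)
    scale_factor = (hi - lo) / ndrv(packed_type)
    add_offset = 0.5 * (lo + hi)
    return scale_factor, add_offset


def pack(values, scale_factor, add_offset):
    """Map unpacked values to integers: pck = (upk - add_offset) / scale_factor.

    Rounding to the nearest integer is an assumption of this sketch.
    """
    return [round((v - add_offset) / scale_factor) for v in values]


temperatures = [250.0, 275.5, 300.0]  # made-up sample data
scale_factor, add_offset = packing_attributes(temperatures)
packed = pack(temperatures, scale_factor, add_offset)
print(ndrv("NC_SHORT"), add_offset, packed)
```

With this choice of add_offset the packed values are centered on zero, so the minimum and maximum of the variable map to the two ends of the signed integer range.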
One useful feature of the (lossy) netCDF packing algorithm is that additional, lossless packing algorithms perform well on top of it.
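As a rough illustration of this point, the sketch below packs a made-up smooth signal into 16-bit integers and then applies zlib's DEFLATE to both the original and the packed bytes. zlib is used here only as a stand-in for netCDF4 deflation, the signal and the round-to-nearest packing are assumptions, and the sizes obtained depend entirely on the data.

```python
import math
import struct
import zlib

# Made-up smooth signal standing in for a geophysical variable.
values = [math.sin(0.001 * i) for i in range(10000)]

# Pack to 16-bit integers with the scale_factor/add_offset scheme from the text.
lo, hi = min(values), max(values)
scale_factor = (hi - lo) / 65534          # ndrv for NC_SHORT
add_offset = 0.5 * (lo + hi)
packed = [round((v - add_offset) / scale_factor) for v in values]

raw_bytes = struct.pack(f"<{len(values)}d", *values)     # 8 bytes per value
packed_bytes = struct.pack(f"<{len(packed)}h", *packed)  # 2 bytes per value

# Lossless compression applied to the unpacked and to the packed bytes.
deflated_raw = zlib.compress(raw_bytes)
deflated_packed = zlib.compress(packed_bytes)

print(len(raw_bytes), len(packed_bytes), len(deflated_raw), len(deflated_packed))
```

Comparing the four printed sizes shows how much of the saving comes from packing alone and how much from the lossless pass applied afterwards.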
Unpacking
The unpacking algorithm depends on the presence of two attributes, scale_factor and add_offset. If scale_factor is present for a variable, the data are multiplied by the value scale_factor after the data are read. If add_offset is present for a variable, then the add_offset value is added to the data after the data are read. If both scale_factor and add_offset attributes are present, the data are first scaled by scale_factor before the offset add_offset is added.
upk = scale_factor*pck + add_offset = (max-min)*pck/ndrv + 0.5*(min+max)
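The unpacking rules can be sketched as follows. This is a minimal illustration, not NCO or netCDF library code: the `unpack` function and the use of a plain dict to hold a variable's attributes are hypothetical, and the sample attribute values are made up.

```python
def unpack(packed_values, attributes):
    """Unpack values using optional scale_factor and add_offset attributes.

    `attributes` is a plain dict standing in for a variable's attributes
    (a hypothetical representation, not an NCO or netCDF API).
    """
    scale_factor = attributes.get("scale_factor")
    add_offset = attributes.get("add_offset")
    unpacked = []
    for pck in packed_values:
        upk = float(pck)
        if scale_factor is not None:
            upk *= scale_factor   # scaling is applied first
        if add_offset is not None:
            upk += add_offset     # the offset is added after scaling
        unpacked.append(upk)
    return unpacked


# Made-up attributes for a variable whose values span 250.0 to 300.0.
attrs = {"scale_factor": 50.0 / 65534, "add_offset": 275.0}
restored = unpack([-32767, 655, 32767], attrs)
print(restored)
```

Because packing is lossy, restored values generally differ slightly from the originals. If packing rounded to the nearest integer (an assumption, as above), the difference is at most about half of scale_factor.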
When scale_factor and add_offset are used for packing, the associated variable (containing the packed data) is typically of type byte or short, whereas the unpacked values are intended to be of type int, float, or double. A variable's scale_factor, add_offset, and _FillValue attributes, if any, should all be of the type intended for the unpacked data, i.e., int, float, or double.
All NCO arithmetic operators understand packed data. The operators automatically unpack any packed variable in the input file which will be arithmetically processed. For example, ncra unpacks all record variables, and ncwa unpacks all variables which contain a dimension to be averaged. These variables are stored unpacked in the output file.
On the other hand, arithmetic operators do not unpack non-processed variables. For example, ncra leaves all non-record variables packed, and ncwa leaves packed all variables lacking an averaged dimension. These variables (called fixed variables) are passed unaltered from the input to the output file. Hence fixed variables which are packed in input files remain packed in output files. Completely packing and unpacking files is easily accomplished with ncpdq (see ncpdq netCDF Permute Dimensions Quickly). Packing and unpacking individual variables may be done with ncpdq and the ncap2 pack() and unpack() functions (see Methods and functions).
[1] Although not a part of the standard, NCO enforces the policy that the _FillValue attribute, if any, of a packed variable is also stored at the original precision.