ncpdq [-3] [-4] [-6] [-A] [-a [-]dim[,...]] [-C] [-c] [--cnk_dmn nm,sz] [--cnk_map map] [--cnk_plc plc] [--cnk_scl sz] [-D dbg] [-d dim,[min][,[max][,[stride]]]] [-F] [-h] [-L dfl_lvl] [-l path] [-M pck_map] [-O] [-o output-file] [-P pck_plc] [-p path] [-R] [-r] [-t thr_nbr] [-U] [-v var[,...]] [-X ...] [-x] input-file [output-file]
DESCRIPTION
ncpdq performs one of two distinct functions, packing or dimension permutation, but not both, when invoked. ncpdq is optimized to perform these actions in a parallel fashion with a minimum of time and memory. The pdq may stand for “Permute Dimensions Quickly”, “Pack Data Quietly”, “Pillory Dan Quayle”, or other silly uses.
The ncpdq packing (and unpacking) algorithms are described
in Methods and functions, and are also implemented in
ncap2.
ncpdq extends the functionality of these algorithms by
providing high level control of the packing policy so that
users can consistently pack (and unpack) entire files with one command.
The user specifies the desired packing policy with the ‘-P’ switch
(or its long option equivalents, ‘--pck_plc’ and
‘--pack_policy’) and its pck_plc argument.
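Four packing policies are currently implemented:
Packing (and Re-Packing) Variables [default]
    Definition: Pack unpacked variables, re-pack packed variables
    Alternate invocation: ncpack
    pck_plc key value: ‘all_new’
Packing (and not Re-Packing) Variables
    Definition: Pack unpacked variables, copy packed variables
    Alternate invocation: none
    pck_plc key value: ‘all_xst’
Re-Packing Variables
    Definition: Re-pack packed variables, copy unpacked variables
    Alternate invocation: none
    pck_plc key value: ‘xst_new’
Unpacking
    Definition: Unpack packed variables, copy unpacked variables
    Alternate invocation: ncunpack
    pck_plc key values: ‘upk’, ‘unpack’, ‘pck_upk’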
Regardless of the packing policy selected, ncpdq no longer (as of NCO version 4.0.4 in October, 2010) packs coordinate variables, or the special variables, weights, and other grid properties described in CF Conventions. Prior ncpdq versions treated coordinate variables and grid properties no differently from other variables. However, coordinate variables are one-dimensional, so packing saves little space on large files, and the resulting files are difficult for humans to read. Moreover, Gaussian and area weights and other grid properties are often used to derive fields in re-inflated (unpacked) files, so packing such grid properties causes a considerable loss of precision in downstream data processing. If users express strong wishes to pack grid properties, we will implement new packing policies. An immediate workaround for those needing to pack grid properties now is to use the ncap2 packing functions or to rename the grid properties prior to calling ncpdq. We welcome your feedback.
To reduce required memorization of these complex policy switches, ncpdq may also be invoked via a synonym or with switches that imply a particular policy. ncpack is a synonym for ncpdq and behaves the same in all respects. Both ncpdq and ncpack assume a default packing policy request of ‘all_new’. Hence ncpack may be invoked without any ‘-P’ switch, unlike ncpdq. Similarly, ncunpack is a synonym for ncpdq except that ncunpack implicitly assumes a request to unpack, i.e., ‘-P pck_upk’. Finally, the ncpdq ‘-U’ switch (or its long option equivalents, ‘--upk’ and ‘--unpack’) requires no argument. It simply requests unpacking.
Given the menagerie of synonyms, equivalent options, and implied
options, a short list of some equivalent commands is appropriate.
The following commands are equivalent for packing: ncpdq -P all_new, ncpdq --pck_plc=all_new, and ncpack.
The following commands are equivalent for unpacking: ncpdq -P upk, ncpdq -U, ncpdq --pck_plc=unpack, and ncunpack.
Equivalent commands for other packing policies, e.g., ‘all_xst’,
follow by analogy.
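The synonym and switch rules above can be summarized in a few lines of Python. This is only an illustrative sketch, not NCO source code: the function name `resolve_policy` and the two lookup tables are hypothetical, and the sketch considers packing invocations only (it ignores ‘-a’ and every other option).

```python
import os

# Policy implied by the name the operator is invoked under (per the text above).
IMPLIED_BY_NAME = {"ncpdq": "all_new", "ncpack": "all_new", "ncunpack": "upk"}

# Spellings that the text above treats as equivalent requests to unpack.
UNPACK_ALIASES = {"upk", "unpack", "pck_upk"}


def resolve_policy(argv):
    """Return the packing policy key implied by a command line (hypothetical helper)."""
    policy = IMPLIED_BY_NAME.get(os.path.basename(argv[0]), "all_new")
    args = argv[1:]
    i = 0
    while i < len(args):
        arg = args[i]
        if arg in ("-U", "--upk", "--unpack"):
            policy = "upk"                      # switch takes no argument
        elif arg in ("-P", "--pck_plc", "--pack_policy") and i + 1 < len(args):
            policy = args[i + 1]                # policy given as the next word
            i += 1
        elif arg.startswith(("--pck_plc=", "--pack_policy=")):
            policy = arg.split("=", 1)[1]       # policy given after "="
        i += 1
    return "upk" if policy in UNPACK_ALIASES else policy


print(resolve_policy(["ncpack", "in.nc", "out.nc"]))
print(resolve_policy(["ncpdq", "--pck_plc=unpack", "in.nc", "out.nc"]))
```

Under these assumptions every command in each equivalence group above resolves to the same policy key.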
Note that ncpdq synonyms are subject to the same constraints
and recommendations discussed in the section on ncbo synonyms
(see ncbo netCDF Binary Operator).
That is, symbolic links must exist from the synonym to ncpdq,
or else the user must define an alias.
The ncpdq packing algorithms must know to which type
particular types of input variables are to be packed.
The correspondence between the input variable type and the output,
packed type, is called the packing map.
The user specifies the desired packing map with the ‘-M’ switch
(or its long option equivalents, ‘--pck_map’ and
‘--map’) and its pck_map argument.
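Five packing maps are currently implemented:
Pack Floating Precisions to NC_SHORT [default]
    Definition: Pack floating precision types to NC_SHORT
    Map: Pack [NC_DOUBLE,NC_FLOAT] to NC_SHORT
    Types copied instead of packed: [NC_INT,NC_SHORT,NC_CHAR,NC_BYTE]
    pck_map key value: ‘flt_sht’
Pack Floating Precisions to NC_BYTE
    Definition: Pack floating precision types to NC_BYTE
    Map: Pack [NC_DOUBLE,NC_FLOAT] to NC_BYTE
    Types copied instead of packed: [NC_INT,NC_SHORT,NC_CHAR,NC_BYTE]
    pck_map key value: ‘flt_byt’
Pack Higher Precisions to NC_SHORT
    Definition: Pack higher precision types to NC_SHORT
    Map: Pack [NC_DOUBLE,NC_FLOAT,NC_INT] to NC_SHORT
    Types copied instead of packed: [NC_SHORT,NC_CHAR,NC_BYTE]
    pck_map key value: ‘hgh_sht’
Pack Higher Precisions to NC_BYTE
    Definition: Pack higher precision types to NC_BYTE
    Map: Pack [NC_DOUBLE,NC_FLOAT,NC_INT,NC_SHORT] to NC_BYTE
    Types copied instead of packed: [NC_CHAR,NC_BYTE]
    pck_map key value: ‘hgh_byt’
Pack to Next Lesser Precision
    Definition: Pack each type to the type of next lesser size
    Map: Pack NC_DOUBLE to NC_INT. Pack [NC_FLOAT,NC_INT] to NC_SHORT. Pack NC_SHORT to NC_BYTE.
    Types copied instead of packed: [NC_CHAR,NC_BYTE]
    pck_map key value: ‘nxt_lsr’
The default ‘all_new’ packing policy with the default ‘flt_sht’ packing map reduces the typical NC_FLOAT-dominated file size by about 50%. ‘flt_byt’ packing reduces an NC_DOUBLE-dominated file by about 87%.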
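The size figures quoted above follow from the byte widths of the netCDF types. The sketch below writes the packing maps as Python dictionaries and computes the per-value reduction. The dictionary layout and the helper names are illustrative assumptions, and the keys ‘hgh_sht’ and ‘nxt_lsr’ are named by analogy with ‘flt_sht’, ‘flt_byt’, and ‘hgh_byt’; check them against your NCO version. The calculation ignores the small overhead of the packing attributes and of any variables that are copied rather than packed.

```python
# Width in bytes of each netCDF-3 external type.
TYPE_SIZE = {"NC_BYTE": 1, "NC_CHAR": 1, "NC_SHORT": 2,
             "NC_INT": 4, "NC_FLOAT": 4, "NC_DOUBLE": 8}

# Each map: input type -> packed type. Types missing from a map are copied.
PACKING_MAPS = {
    "flt_sht": {"NC_DOUBLE": "NC_SHORT", "NC_FLOAT": "NC_SHORT"},
    "flt_byt": {"NC_DOUBLE": "NC_BYTE", "NC_FLOAT": "NC_BYTE"},
    "hgh_sht": {"NC_DOUBLE": "NC_SHORT", "NC_FLOAT": "NC_SHORT",
                "NC_INT": "NC_SHORT"},
    "hgh_byt": {"NC_DOUBLE": "NC_BYTE", "NC_FLOAT": "NC_BYTE",
                "NC_INT": "NC_BYTE", "NC_SHORT": "NC_BYTE"},
    "nxt_lsr": {"NC_DOUBLE": "NC_INT", "NC_FLOAT": "NC_SHORT",
                "NC_INT": "NC_SHORT", "NC_SHORT": "NC_BYTE"},
}


def packed_type(pck_map, in_type):
    """Type a variable of in_type has after packing with pck_map."""
    return PACKING_MAPS[pck_map].get(in_type, in_type)


def size_reduction(pck_map, in_type):
    """Fraction of per-value storage saved by packing in_type with pck_map."""
    return 1.0 - TYPE_SIZE[packed_type(pck_map, in_type)] / TYPE_SIZE[in_type]


print(size_reduction("flt_sht", "NC_FLOAT"))   # 4 bytes -> 2 bytes
print(size_reduction("flt_byt", "NC_DOUBLE"))  # 8 bytes -> 1 byte
```

A four-byte NC_FLOAT stored as a two-byte NC_SHORT saves one half, and an eight-byte NC_DOUBLE stored as a one-byte NC_BYTE saves seven eighths (87.5%), consistent with the approximate figures in the text.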
The netCDF packing algorithm (see Methods and functions) is lossy: once packed, the exact original data cannot be recovered without a full backup.
Hence users should be aware of some packing caveats:
First, the interaction of packing and data equal to the _FillValue is complex. Test the _FillValue behavior by performing a pack/unpack cycle to ensure data that are missing stay missing and data that are not missing do not join the Air National Guard and go missing. This may lead you to elect a new _FillValue.
Second, ncpdq actually allows packing into NC_CHAR (with, e.g., ‘flt_chr’). However, the intrinsic conversion of signed char to higher precision types is tricky for values equal to zero, i.e., for NUL. Hence packing to NC_CHAR is not documented or advertised. Pack into NC_BYTE (with, e.g., ‘flt_byt’) instead.
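A pack/unpack cycle of the kind recommended above can be mimicked in plain Python to see both the precision loss and the handling of missing values. This is a minimal sketch and not the NCO implementation. It relies on the standard netCDF convention that unpacked = packed * scale_factor + add_offset; the particular choice of scale_factor and add_offset, the reserved packed fill value of -32768, and the function names are assumptions made for illustration.

```python
def pack(values, fill_value, packed_fill=-32768, ndrv=2**16 - 2):
    """Pack floats into the NC_SHORT range.

    Returns (packed, scale_factor, add_offset). Valid data map onto
    [-32767, 32767]; -32768 is reserved here for missing values.
    """
    valid = [v for v in values if v != fill_value]
    lo, hi = min(valid), max(valid)
    add_offset = 0.5 * (lo + hi)
    scale_factor = (hi - lo) / ndrv if hi > lo else 1.0
    packed = [packed_fill if v == fill_value
              else round((v - add_offset) / scale_factor)
              for v in values]
    return packed, scale_factor, add_offset


def unpack(packed, scale_factor, add_offset, fill_value, packed_fill=-32768):
    """Invert pack(); missing values are restored to fill_value."""
    return [fill_value if p == packed_fill else p * scale_factor + add_offset
            for p in packed]


FILL = 1.0e36
original = [250.0, 273.15, FILL, 310.5]
packed, scale_factor, add_offset = pack(original, FILL)
restored = unpack(packed, scale_factor, add_offset, FILL)

# Largest round-trip error among the non-missing values.
max_error = max(abs(a - b) for a, b in zip(original, restored) if a != FILL)
print(packed)
print(max_error, "<=", 0.5 * scale_factor)
```

In this sketch the missing value survives the cycle only because a packed value outside the valid data range was reserved for it, and the remaining values come back to within half a scale_factor of the originals rather than exactly.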
ncpdq re-shapes variables in input-file by re-ordering and/or reversing dimensions specified in the dimension list. The dimension list is a whitespace-free, comma-separated list of dimension names, optionally prefixed by negative signs, that follows the ‘-a’ (or long options ‘--arrange’, ‘--permute’, ‘--re-order’, or ‘--rdr’) switch. To re-order variables by a subset of their dimensions, specify these dimensions in a comma-separated list following ‘-a’, e.g., ‘-a lon,lat’. To reverse a dimension, prefix its name with a negative sign in the dimension list, e.g., ‘-a -lat’. Re-ordering and reversal may be performed simultaneously, e.g., ‘-a lon,-lat,time,-lev’.
Users may specify any permutation of dimensions, including permutations which change the record dimension identity. The record dimension is re-ordered like any other dimension. This unique ncpdq capability makes it possible to concatenate files along any dimension. See Concatenation for a detailed example. The record dimension is always the most slowly varying dimension in a record variable (see C and Fortran Index Conventions). The specified re-ordering fails if it requires creating more than one record dimension amongst all the output variables [1].
Two special cases of dimension re-ordering and reversal deserve special mention. First, it may be desirable to completely reverse the storage order of a variable. To do this, include all the variable's dimensions in the dimension re-order list in their original order, and prefix each dimension name with the negative sign. Second, it may be useful to transpose a variable's storage order, e.g., from C to Fortran data storage order (see C and Fortran Index Conventions). To do this, include all the variable's dimensions in the dimension re-order list in reversed order. Explicit examples of these two techniques appear below.
EXAMPLES
Pack and unpack all variables in file in.nc and store the results in out.nc:
ncpdq in.nc out.nc # Same as ncpack in.nc out.nc
ncpdq -P all_new -M flt_sht in.nc out.nc # Defaults
ncpdq -P all_xst in.nc out.nc
ncpdq -P upk in.nc out.nc # Same as ncunpack in.nc out.nc
ncpdq -U in.nc out.nc # Same as ncunpack in.nc out.nc
The first two commands pack any unpacked variable in the input file. They also unpack and then re-pack every packed variable. The third command only packs unpacked variables in the input file. If a variable is already packed, the third command copies it unchanged to the output file. The fourth and fifth commands unpack any packed variables. If a variable is not packed, the fourth and fifth commands copy it unchanged.
The previous examples all utilized the default packing map. Suppose you wish to archive all data that are currently unpacked into a form which only preserves 256 distinct values. Then you could specify the packing map pck_map as ‘hgh_byt’ and the packing policy pck_plc as ‘all_xst’:
ncpdq -P all_xst -M hgh_byt in.nc out.nc
Many different packing maps may be used to construct a given file by performing the packing on subsets of variables (e.g., with ‘-v’) and using the append feature with ‘-A’ (see Appending Variables).
Re-order file in.nc so that the dimension lon always precedes the dimension lat and store the results in out.nc:
ncpdq -a lon,lat in.nc out.nc
ncpdq -v three_dmn_var -a lon,lat in.nc out.nc
The first command re-orders every variable in the input file. The second command extracts and re-orders only the variable three_dmn_var.
Suppose the dimension lat represents latitude and monotonically increases from south to north. Reversing the lat dimension means re-ordering the data so that latitude values decrease monotonically from north to south. Accomplish this with
% ncpdq -a -lat in.nc out.nc
% ncks -C -v lat in.nc
lat[0]=-90
lat[1]=90
% ncks -C -v lat out.nc
lat[0]=90
lat[1]=-90
This operation reversed the latitude dimension of all variables. Whitespace immediately preceding the negative sign that specifies dimension reversal may be dangerous. Quotes and long options can help protect negative signs that should indicate dimension reversal from being interpreted by the shell as dashes that indicate new command line switches.
ncpdq -a -lat in.nc out.nc # Dangerous? Whitespace before "-lat"
ncpdq -a '-lat' in.nc out.nc # OK. Quotes protect "-" in "-lat"
ncpdq -a lon,-lat in.nc out.nc # OK. No whitespace before "-"
ncpdq --rdr=-lat in.nc out.nc # Preferred. Uses "=" not whitespace
To create the mathematical transpose of a variable, place all its dimensions in the dimension re-order list in reversed order. This example creates the transpose of three_dmn_var:
% ncpdq -a lon,lev,lat -v three_dmn_var in.nc out.nc
% ncks -C -v three_dmn_var in.nc
lat[0]=-90 lev[0]=100 lon[0]=0 three_dmn_var[0]=0
lat[0]=-90 lev[0]=100 lon[1]=90 three_dmn_var[1]=1
lat[0]=-90 lev[0]=100 lon[2]=180 three_dmn_var[2]=2
...
lat[1]=90 lev[2]=1000 lon[1]=90 three_dmn_var[21]=21
lat[1]=90 lev[2]=1000 lon[2]=180 three_dmn_var[22]=22
lat[1]=90 lev[2]=1000 lon[3]=270 three_dmn_var[23]=23
% ncks -C -v three_dmn_var out.nc
lon[0]=0 lev[0]=100 lat[0]=-90 three_dmn_var[0]=0
lon[0]=0 lev[0]=100 lat[1]=90 three_dmn_var[1]=12
lon[0]=0 lev[1]=500 lat[0]=-90 three_dmn_var[2]=4
...
lon[3]=270 lev[1]=500 lat[1]=90 three_dmn_var[21]=19
lon[3]=270 lev[2]=1000 lat[0]=-90 three_dmn_var[22]=11
lon[3]=270 lev[2]=1000 lat[1]=90 three_dmn_var[23]=23
To completely reverse the storage order of a variable, include all its dimensions in the re-order list, each prefixed by a negative sign. This example reverses the storage order of three_dmn_var:
% ncpdq -a -lat,-lev,-lon -v three_dmn_var in.nc out.nc
% ncks -C -v three_dmn_var in.nc
lat[0]=-90 lev[0]=100 lon[0]=0 three_dmn_var[0]=0
lat[0]=-90 lev[0]=100 lon[1]=90 three_dmn_var[1]=1
lat[0]=-90 lev[0]=100 lon[2]=180 three_dmn_var[2]=2
...
lat[1]=90 lev[2]=1000 lon[1]=90 three_dmn_var[21]=21
lat[1]=90 lev[2]=1000 lon[2]=180 three_dmn_var[22]=22
lat[1]=90 lev[2]=1000 lon[3]=270 three_dmn_var[23]=23
% ncks -C -v three_dmn_var out.nc
lat[0]=90 lev[0]=1000 lon[0]=270 three_dmn_var[0]=23
lat[0]=90 lev[0]=1000 lon[1]=180 three_dmn_var[1]=22
lat[0]=90 lev[0]=1000 lon[2]=90 three_dmn_var[2]=21
...
lat[1]=-90 lev[2]=100 lon[1]=180 three_dmn_var[21]=2
lat[1]=-90 lev[2]=100 lon[2]=90 three_dmn_var[22]=1
lat[1]=-90 lev[2]=100 lon[3]=0 three_dmn_var[23]=0
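The transpose and reversal shown above can be reproduced on a flat, row-major array in a few lines of Python. This is a sketch of the index arithmetic only, not of how ncpdq is implemented. The function name `reorder_dims` is hypothetical, and the rule used for a partial list such as ‘-a lon,lat’ (listed dimensions fill, in list order, the slots those dimensions already occupy) is an assumption.

```python
from itertools import product


def reorder_dims(data, dims, sizes, reorder):
    """Permute and/or reverse dimensions of a flat, row-major (C-order) array.

    data    -- flat list of values, last dimension varying fastest
    dims    -- input dimension names, slowest first, e.g. ["lat", "lev", "lon"]
    sizes   -- dict mapping dimension name to its length
    reorder -- names as given to '-a'; a leading "-" requests reversal
    Returns (out_dims, out_data).
    """
    names = [r.lstrip("-") for r in reorder]
    flipped = {r[1:] for r in reorder if r.startswith("-")}

    # Assumption: listed dimensions are placed, in list order, into the slots
    # that those same dimensions occupy in the input; unlisted ones stay put.
    slots = [i for i, d in enumerate(dims) if d in names]
    out_dims = list(dims)
    for slot, name in zip(slots, names):
        out_dims[slot] = name

    # Row-major strides of the input layout.
    stride = {}
    step = 1
    for d in reversed(dims):
        stride[d] = step
        step *= sizes[d]

    # Walk the output in storage order and fetch each element from the input.
    out_data = []
    for index in product(*(range(sizes[d]) for d in out_dims)):
        offset = 0
        for d, k in zip(out_dims, index):
            src = sizes[d] - 1 - k if d in flipped else k
            offset += src * stride[d]
        out_data.append(data[offset])
    return out_dims, out_data


dims = ["lat", "lev", "lon"]
sizes = {"lat": 2, "lev": 3, "lon": 4}
three_dmn_var = list(range(24))

t_dims, transposed = reorder_dims(three_dmn_var, dims, sizes, ["lon", "lev", "lat"])
r_dims, reversed_var = reorder_dims(three_dmn_var, dims, sizes, ["-lat", "-lev", "-lon"])
print(t_dims, transposed[:3], transposed[21:])
print(r_dims, reversed_var[:3], reversed_var[21:])
```

With the 2 x 3 x 4 variable from the examples, the first three and last three values of each result match the ncks listings of out.nc above.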
Creating a record dimension named, e.g., time, in a file which has no existing record dimension is simple with ncecat:
ncecat -O -u time in.nc out.nc # Create degenerate record dimension named "time"
Now consider a file with all dimensions, including time, fixed (non-record). Suppose the user wishes to convert time from a fixed dimension to a record dimension. This may be useful, for example, when the user wishes to append additional time slices to the data. As of NCO version 4.0.1 (April, 2010) the preferred method for doing this is with ncks:
ncks -O --mk_rec_dmn time in.nc out.nc # Change "time" to record dimension
Prior to 4.0.1, the procedure to change an existing fixed dimension into a record dimension required three separate commands, ncecat followed by ncpdq, and then ncwa. It is still instructive to present the original procedure, as it shows how multiple operators can achieve the same ends by different means:
ncecat -O in.nc out.nc # Add degenerate record dimension named "record"
ncpdq -O -a time,record out.nc out.nc # Switch "record" and "time"
ncwa -O -a record out.nc out.nc # Remove (degenerate) "record"
The first step creates a degenerate (size equals one) record dimension named (by default) record. The second step swaps the ordering of the dimensions named time and record. Since time now occupies the position of the first (least rapidly varying) dimension, it becomes the record dimension. The dimension named record is no longer a record dimension. The third step averages over this degenerate record dimension. Averaging over a degenerate dimension does not alter the data. The ordering of other dimensions in the file (lat, lon, etc.) is immaterial to this procedure.
See ncecat netCDF Ensemble Concatenator and ncks netCDF Kitchen Sink for other methods of changing variable dimensionality, including the record dimension.
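The effect of the three-step procedure on a variable's dimension list can be traced with a small Python model. This sketch only tracks dimension names and treats the first dimension as the record dimension, as described above. The helper `reorder` and its slot-filling rule are assumptions for illustration, not NCO code, and the starting order lat, time, lon is chosen arbitrarily to show that the position of the other dimensions does not matter.

```python
def reorder(dims, order):
    """Place the dimensions in 'order' into the slots they already occupy."""
    slots = [i for i, d in enumerate(dims) if d in order]
    out = list(dims)
    for slot, name in zip(slots, order):
        out[slot] = name
    return out


dims = ["lat", "time", "lon"]                # all fixed; no record dimension
step1 = ["record"] + dims                    # ncecat: prepend degenerate "record"
step2 = reorder(step1, ["time", "record"])   # ncpdq -a time,record
step3 = [d for d in step2 if d != "record"]  # ncwa -a record
record_dim = step3[0]                        # first dimension is the record dimension

print(step1)
print(step2)
print(step3, "record dimension:", record_dim)
```

In this model time ends up first and therefore becomes the record dimension, while lat and lon keep their relative order.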
[1] This limitation, imposed by the netCDF storage layer, may be relaxed in the future with netCDF4.