

4.8 ncpdq netCDF Permute Dimensions Quickly

SYNTAX

     ncpdq [-3] [-4] [-6] [-A] [-a [-]dim[,...]] [-C] [-c] [-D dbg]
     [-d dim,[min][,[max][,[stride]]]] [-F] [-h] [-L dfl_lvl] [-l path]
     [-M pck_map] [-O] [-o output-file] [-P pck_plc] [-p path]
     [-R] [-r] [-t thr_nbr] [-U] [-v var[,...]] [-X ...] [-x]
     input-file [output-file]

DESCRIPTION

ncpdq performs one of two distinct functions, packing or dimension permutation, but not both, when invoked. ncpdq is optimized to perform these actions in a parallel fashion with a minimum of time and memory. The pdq may stand for “Permute Dimensions Quickly”, “Pack Data Quietly”, “Pillory Dan Quayle”, or other silly uses.

Packing and Unpacking Functions

The ncpdq packing (and unpacking) algorithms are described in Methods and functions, and are also implemented in ncap2. ncpdq extends the functionality of these algorithms by providing high level control of the packing policy so that users can pack (and unpack) entire files consistently with one command. The user specifies the desired packing policy with the ‘-P’ switch (or its long option equivalents, ‘--pck_plc’ and ‘--pack_policy’) and its pck_plc argument. Four packing policies are currently implemented:

Packing (and Re-Packing) Variables [default]
Definition: Pack unpacked variables, re-pack packed variables
Alternate invocation: ncpack
pck_plc key values: ‘all_new’, ‘pck_all_new_att’

Packing (and not Re-Packing) Variables
Definition: Pack unpacked variables, copy packed variables
Alternate invocation: none
pck_plc key values: ‘all_xst’, ‘pck_all_xst_att’

Re-Packing Variables
Definition: Re-pack packed variables, copy unpacked variables
Alternate invocation: none
pck_plc key values: ‘xst_new’, ‘pck_xst_new_att’

Unpacking
Definition: Unpack packed variables, copy unpacked variables
Alternate invocation: ncunpack
pck_plc key values: ‘upk’, ‘unpack’, ‘pck_upk’

Equivalent key values are fully interchangeable. Multiple equivalent options are provided to satisfy disparate needs and tastes of NCO users working with scripts and from the command line.
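
The four policies above can be summarized as a small lookup table. The following Python sketch is only an illustration of the documented behavior, not NCO source code; the names POLICY_ACTIONS, ALIASES, and action_for are hypothetical.

```python
# Illustrative summary of the four documented packing policies.
# Each policy says what happens to already-packed and to unpacked variables.
POLICY_ACTIONS = {
    "all_new": {"packed": "repack", "unpacked": "pack"},
    "all_xst": {"packed": "copy", "unpacked": "pack"},
    "xst_new": {"packed": "repack", "unpacked": "copy"},
    "upk": {"packed": "unpack", "unpacked": "copy"},
}

# Equivalent key values listed in the text map onto one canonical key.
ALIASES = {
    "pck_all_new_att": "all_new",
    "pck_all_xst_att": "all_xst",
    "pck_xst_new_att": "xst_new",
    "unpack": "upk",
    "pck_upk": "upk",
}


def action_for(pck_plc, is_packed):
    """Return the action a policy implies for one variable."""
    key = ALIASES.get(pck_plc, pck_plc)
    return POLICY_ACTIONS[key]["packed" if is_packed else "unpacked"]


print(action_for("all_new", True))           # repack
print(action_for("pck_all_xst_att", True))   # copy
print(action_for("unpack", False))           # copy
```

Reading the table row by row shows that the policies differ only in how they treat variables that are already packed and variables that are not.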

To reduce required memorization of these complex policy switches, ncpdq may also be invoked via a synonym or with switches that imply a particular policy. ncpack is a synonym for ncpdq and behaves the same in all respects. Both ncpdq and ncpack assume a default packing policy request of ‘all_new’. Hence ncpack may be invoked without any ‘-P’ switch, unlike ncpdq. Similarly, ncunpack is a synonym for ncpdq except that ncunpack implicitly assumes a request to unpack, i.e., ‘-P pck_upk’. Finally, the ncpdq ‘-U’ switch (or its long option equivalents, ‘--upk’ and ‘--unpack’) requires no argument. It simply requests unpacking.

Given the menagerie of synonyms, equivalent options, and implied options, a short list of some equivalent commands is appropriate. The following commands are equivalent for packing: ncpdq -P all_new, ncpdq --pck_plc=all_new, and ncpack. The following commands are equivalent for unpacking: ncpdq -P upk, ncpdq -U, ncpdq --pck_plc=unpack, and ncunpack. Equivalent commands for other packing policies, e.g., ‘all_xst’, follow by analogy. Note that ncpdq synonyms are subject to the same constraints and recommendations discussed in the section on ncbo synonyms (see ncbo netCDF Binary Operator). That is, symbolic links must exist from the synonym to ncpdq, or else the user must define an alias.

The ncpdq packing algorithms must know to which type particular types of input variables are to be packed. The correspondence between the input variable type and the output, packed type, is called the packing map. The user specifies the desired packing map with the ‘-M’ switch (or its long option equivalents, ‘--pck_map’ and ‘--map’) and its pck_map argument. Five packing maps are currently implemented:

Pack Floating Precisions to NC_SHORT [default]
Definition: Pack floating precision types to NC_SHORT
Map: Pack [NC_DOUBLE,NC_FLOAT] to NC_SHORT
Types copied instead of packed: [NC_INT,NC_SHORT,NC_CHAR,NC_BYTE]
pck_map key values: ‘flt_sht’, ‘pck_map_flt_sht’

Pack Floating Precisions to NC_BYTE
Definition: Pack floating precision types to NC_BYTE
Map: Pack [NC_DOUBLE,NC_FLOAT] to NC_BYTE
Types copied instead of packed: [NC_INT,NC_SHORT,NC_CHAR,NC_BYTE]
pck_map key values: ‘flt_byt’, ‘pck_map_flt_byt’

Pack Higher Precisions to NC_SHORT
Definition: Pack higher precision types to NC_SHORT
Map: Pack [NC_DOUBLE,NC_FLOAT,NC_INT] to NC_SHORT
Types copied instead of packed: [NC_SHORT,NC_CHAR,NC_BYTE]
pck_map key values: ‘hgh_sht’, ‘pck_map_hgh_sht’

Pack Higher Precisions to NC_BYTE
Definition: Pack higher precision types to NC_BYTE
Map: Pack [NC_DOUBLE,NC_FLOAT,NC_INT,NC_SHORT] to NC_BYTE
Types copied instead of packed: [NC_CHAR,NC_BYTE]
pck_map key values: ‘hgh_byt’, ‘pck_map_hgh_byt’

Pack to Next Lesser Precision
Definition: Pack each type to type of next lesser size
Map: Pack NC_DOUBLE to NC_INT. Pack [NC_FLOAT,NC_INT] to NC_SHORT. Pack NC_SHORT to NC_BYTE.
Types copied instead of packed: [NC_CHAR,NC_BYTE]
pck_map key values: ‘nxt_lsr’, ‘pck_map_nxt_lsr’

The default ‘all_new’ packing policy with the default ‘flt_sht’ packing map reduces the typical NC_FLOAT-dominated file size by about 50%. ‘flt_byt’ packing reduces an NC_DOUBLE-dominated file by about 87%.
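
The quoted size reductions follow from the per-value storage sizes of the netCDF types. The Python sketch below encodes the five maps as dictionaries and computes the reduction for a single variable. It is an illustration only: the names TYPE_SIZE, PACKING_MAPS, packed_type, and size_reduction are hypothetical, and the arithmetic ignores metadata and the attributes that packing adds.

```python
# Bytes per value for the netCDF classic types named in the text.
TYPE_SIZE = {
    "NC_DOUBLE": 8, "NC_FLOAT": 4, "NC_INT": 4,
    "NC_SHORT": 2, "NC_CHAR": 1, "NC_BYTE": 1,
}

# Each map lists only the types it packs; all other types are copied.
PACKING_MAPS = {
    "flt_sht": {"NC_DOUBLE": "NC_SHORT", "NC_FLOAT": "NC_SHORT"},
    "flt_byt": {"NC_DOUBLE": "NC_BYTE", "NC_FLOAT": "NC_BYTE"},
    "hgh_sht": {"NC_DOUBLE": "NC_SHORT", "NC_FLOAT": "NC_SHORT",
                "NC_INT": "NC_SHORT"},
    "hgh_byt": {"NC_DOUBLE": "NC_BYTE", "NC_FLOAT": "NC_BYTE",
                "NC_INT": "NC_BYTE", "NC_SHORT": "NC_BYTE"},
    "nxt_lsr": {"NC_DOUBLE": "NC_INT", "NC_FLOAT": "NC_SHORT",
                "NC_INT": "NC_SHORT", "NC_SHORT": "NC_BYTE"},
}


def packed_type(pck_map, nc_type):
    """Output type for nc_type under pck_map (unchanged if copied)."""
    return PACKING_MAPS[pck_map].get(nc_type, nc_type)


def size_reduction(pck_map, nc_type):
    """Fraction of per-value storage saved by packing nc_type."""
    before = TYPE_SIZE[nc_type]
    after = TYPE_SIZE[packed_type(pck_map, nc_type)]
    return 1.0 - after / before


print(size_reduction("flt_sht", "NC_FLOAT"))   # 0.5
print(size_reduction("flt_byt", "NC_DOUBLE"))  # 0.875
```

Real files save somewhat less than these per-value figures because coordinate variables, unpackable types, and metadata are stored unchanged.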

The netCDF packing algorithm (see Methods and functions) is lossy: once packed, the exact original data cannot be recovered without a full backup. Hence users should be aware of some packing caveats. First, the interaction of packing and data equal to the _FillValue is complex. Test the _FillValue behavior by performing a pack/unpack cycle to ensure data that are missing stay missing and data that are not missing do not join the Air National Guard and go missing. This may lead you to elect a new _FillValue. Second, ncpdq actually allows packing into NC_CHAR (with, e.g., ‘flt_chr’). However, the intrinsic conversion of signed char to higher precision types is tricky for values equal to zero, i.e., NUL. Hence packing to NC_CHAR is not documented or advertised. Pack into NC_BYTE (with, e.g., ‘flt_byt’) instead.
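
The loss of precision can be seen in a small pack/unpack cycle. The Python sketch below assumes a generic linear quantization with a scale_factor and an add_offset; the particular constants (2**nbits - 2 levels, offset at the midpoint of the data range) are assumptions made for illustration, and the algorithm NCO actually uses is the one described in Methods and functions. It does not model _FillValue handling.

```python
def pack(values, nbits=16):
    """Quantize floats to signed integers that fit in nbits bits."""
    lo, hi = min(values), max(values)
    levels = 2 ** nbits - 2  # symmetric range, e.g. -127..127 for 8 bits
    scale_factor = (hi - lo) / levels if hi > lo else 1.0
    add_offset = 0.5 * (lo + hi)
    packed = [round((v - add_offset) / scale_factor) for v in values]
    return packed, scale_factor, add_offset


def unpack(packed, scale_factor, add_offset):
    """Recover approximate floats from packed integers."""
    return [p * scale_factor + add_offset for p in packed]


values = [0.0, 0.1, 0.123456789, 1.0]
packed, scale_factor, add_offset = pack(values, nbits=8)  # byte-sized
restored = unpack(packed, scale_factor, add_offset)
errors = [abs(a - b) for a, b in zip(values, restored)]

print(packed)
print(restored)
print(max(errors) <= 0.5 * scale_factor + 1e-12)
```

In this sketch the round-trip error of each value is bounded by half of scale_factor, so the error grows with the range of the data and shrinks with the number of bits in the packed type.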

Dimension Permutation

ncpdq re-shapes variables in input-file by re-ordering and/or reversing dimensions specified in the dimension list. The dimension list is a whitespace-free, comma separated list of dimension names, optionally prefixed by negative signs, that follows the ‘-a’ (or long options ‘--arrange’, ‘--permute’, ‘--re-order’, or ‘--rdr’) switch. To re-order variables by a subset of their dimensions, specify these dimensions in a comma-separated list following ‘-a’, e.g., ‘-a lon,lat’. To reverse a dimension, prefix its name with a negative sign in the dimension list, e.g., ‘-a -lat’. Re-ordering and reversal may be performed simultaneously, e.g., ‘-a lon,-lat,time,-lev’.
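
The dimension list syntax described above can be read as a list of (name, reverse) pairs. The following Python sketch parses such a list; it is an illustration of the syntax only, and parse_dimension_list is a hypothetical name, not an NCO function.

```python
def parse_dimension_list(arg):
    """Split a '-a' argument into (dimension name, reverse?) pairs."""
    entries = []
    for item in arg.split(","):
        # The list must be whitespace-free and every entry needs a name.
        if any(ch.isspace() for ch in item):
            raise ValueError("whitespace is not allowed: %r" % arg)
        reverse = item.startswith("-")
        name = item[1:] if reverse else item
        if not name:
            raise ValueError("empty dimension name in %r" % arg)
        entries.append((name, reverse))
    return entries


print(parse_dimension_list("lon,-lat,time,-lev"))
```

For the argument ‘lon,-lat,time,-lev’ the sketch yields lon and time to be re-ordered only, and lat and lev to be re-ordered and reversed.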

Users may specify any permutation of dimensions, including permutations which change the record dimension identity. The record dimension is re-ordered like any other dimension. This unique ncpdq capability makes it possible to concatenate files along any dimension. See Concatenation for a detailed example. The record dimension is always the most slowly varying dimension in a record variable (see C and Fortran Index Conventions). The specified re-ordering fails if it requires creating more than one record dimension amongst all the output variables [1].

Two special cases of dimension re-ordering and reversal deserve special mention. First, it may be desirable to completely reverse the storage order of a variable. To do this, include all the variable's dimensions in the dimension re-order list in their original order, and prefix each dimension name with the negative sign. Second, it may be useful to transpose a variable's storage order, e.g., from C to Fortran data storage order (see C and Fortran Index Conventions). To do this, include all the variable's dimensions in the dimension re-order list in reversed order. Explicit examples of these two techniques appear below.

EXAMPLES

Pack and unpack all variables in file in.nc and store the results in out.nc:

     ncpdq in.nc out.nc # Same as ncpack in.nc out.nc
     ncpdq -P all_new -M flt_sht in.nc out.nc # Defaults
     ncpdq -P all_xst in.nc out.nc
     ncpdq -P upk in.nc out.nc # Same as ncunpack in.nc out.nc
     ncpdq -U in.nc out.nc # Same as ncunpack in.nc out.nc

The first two commands pack any unpacked variable in the input file. They also unpack and then re-pack every packed variable. The third command only packs unpacked variables in the input file. If a variable is already packed, the third command copies it unchanged to the output file. The fourth and fifth commands unpack any packed variables. If a variable is not packed, the fourth and fifth commands copy it unchanged.

The previous examples all utilized the default packing map. Suppose you wish to archive all data that are currently unpacked into a form which only preserves 256 distinct values. Then you could specify the packing map pck_map as ‘hgh_byt’ and the packing policy pck_plc as ‘all_xst’:

     ncpdq -P all_xst -M hgh_byt in.nc out.nc

Many different packing maps may be used to construct a given file by performing the packing on subsets of variables (e.g., with ‘-v’) and using the append feature with ‘-A’ (see Appending Variables).

Re-order file in.nc so that the dimension lon always precedes the dimension lat and store the results in out.nc:

     ncpdq -a lon,lat in.nc out.nc
     ncpdq -v three_dmn_var -a lon,lat in.nc out.nc

The first command re-orders every variable in the input file. The second command extracts and re-orders only the variable three_dmn_var.

Suppose the dimension lat represents latitude and monotonically increases from south to north. Reversing the lat dimension means re-ordering the data so that latitude values decrease monotonically from north to south. Accomplish this with:

     % ncpdq -a -lat in.nc out.nc
     % ncks -C -v lat in.nc
     lat[0]=-90
     lat[1]=90
     % ncks -C -v lat out.nc
     lat[0]=90
     lat[1]=-90

This operation reversed the latitude dimension of all variables. Whitespace immediately preceding the negative sign that specifies dimension reversal may be dangerous. Quotes and long options can help protect negative signs that should indicate dimension reversal from being interpreted by the shell as dashes that indicate new command line switches.

     ncpdq -a -lat in.nc out.nc # Dangerous? Whitespace before "-lat"
     ncpdq -a '-lat' in.nc out.nc # OK. Quotes protect "-" in "-lat"
     ncpdq -a lon,-lat in.nc out.nc # OK. No whitespace before "-"
     ncpdq --rdr=-lat in.nc out.nc # Preferred. Uses "=" not whitespace

To create the mathematical transpose of a variable, place all its dimensions in the dimension re-order list in reversed order. This example creates the transpose of three_dmn_var:

     % ncpdq -a lon,lev,lat -v three_dmn_var in.nc out.nc
     % ncks -C -v three_dmn_var in.nc
     lat[0]=-90 lev[0]=100 lon[0]=0 three_dmn_var[0]=0
     lat[0]=-90 lev[0]=100 lon[1]=90 three_dmn_var[1]=1
     lat[0]=-90 lev[0]=100 lon[2]=180 three_dmn_var[2]=2
     ...
     lat[1]=90 lev[2]=1000 lon[1]=90 three_dmn_var[21]=21
     lat[1]=90 lev[2]=1000 lon[2]=180 three_dmn_var[22]=22
     lat[1]=90 lev[2]=1000 lon[3]=270 three_dmn_var[23]=23
     % ncks -C -v three_dmn_var out.nc
     lon[0]=0 lev[0]=100 lat[0]=-90 three_dmn_var[0]=0
     lon[0]=0 lev[0]=100 lat[1]=90 three_dmn_var[1]=12
     lon[0]=0 lev[1]=500 lat[0]=-90 three_dmn_var[2]=4
     ...
     lon[3]=270 lev[1]=500 lat[1]=90 three_dmn_var[21]=19
     lon[3]=270 lev[2]=1000 lat[0]=-90 three_dmn_var[22]=11
     lon[3]=270 lev[2]=1000 lat[1]=90 three_dmn_var[23]=23
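
The values printed above can be reproduced with a few lines of Python. The sketch below stores three_dmn_var as a flat list in C order with sizes lat=2, lev=3, lon=4 (as implied by the output above) and re-orders it. It is an illustration of the index arithmetic, not of how ncpdq is implemented; the function name permute is hypothetical, and unlike ncpdq it requires the full list of dimensions.

```python
from itertools import product


def permute(flat, dims, sizes, new_dims):
    """Re-order C-ordered flat data so that new_dims is the storage order."""
    # Stride of each dimension in the original layout (last varies fastest).
    stride = {}
    step = 1
    for name in reversed(dims):
        stride[name] = step
        step *= sizes[name]
    # Walk the output in its own C order and fetch each source element.
    ranges = [range(sizes[name]) for name in new_dims]
    return [
        flat[sum(stride[n] * i for n, i in zip(new_dims, idx))]
        for idx in product(*ranges)
    ]


sizes = {"lat": 2, "lev": 3, "lon": 4}
three_dmn_var = list(range(24))
transposed = permute(three_dmn_var, ["lat", "lev", "lon"], sizes,
                     ["lon", "lev", "lat"])
print(transposed[:3])   # [0, 12, 4]
print(transposed[21:])  # [19, 11, 23]
```

The first and last three values agree with the ncks output for out.nc shown above.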

To completely reverse the storage order of a variable, include all its dimensions in the re-order list, each prefixed by a negative sign. This example reverses the storage order of three_dmn_var:

     % ncpdq -a -lat,-lev,-lon -v three_dmn_var in.nc out.nc
     % ncks -C -v three_dmn_var in.nc
     lat[0]=-90 lev[0]=100 lon[0]=0 three_dmn_var[0]=0
     lat[0]=-90 lev[0]=100 lon[1]=90 three_dmn_var[1]=1
     lat[0]=-90 lev[0]=100 lon[2]=180 three_dmn_var[2]=2
     ...
     lat[1]=90 lev[2]=1000 lon[1]=90 three_dmn_var[21]=21
     lat[1]=90 lev[2]=1000 lon[2]=180 three_dmn_var[22]=22
     lat[1]=90 lev[2]=1000 lon[3]=270 three_dmn_var[23]=23
     % ncks -C -v three_dmn_var out.nc
     lat[0]=90 lev[0]=1000 lon[0]=270 three_dmn_var[0]=23
     lat[0]=90 lev[0]=1000 lon[1]=180 three_dmn_var[1]=22
     lat[0]=90 lev[0]=1000 lon[2]=90 three_dmn_var[2]=21
     ...
     lat[1]=-90 lev[2]=100 lon[1]=180 three_dmn_var[21]=2
     lat[1]=-90 lev[2]=100 lon[2]=90 three_dmn_var[22]=1
     lat[1]=-90 lev[2]=100 lon[3]=0 three_dmn_var[23]=0
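
Reversing every dimension of a variable is equivalent to reversing its flat storage order, which is why the values above run from 23 down to 0. The Python sketch below checks this with explicit index arithmetic, again assuming sizes lat=2, lev=3, lon=4; reverse_all is a hypothetical name used only for illustration.

```python
from itertools import product


def reverse_all(flat, shape):
    """Reverse every dimension of a C-ordered array stored as a flat list."""
    out = []
    for idx in product(*(range(n) for n in shape)):
        # Along each dimension the source index counts down from the far end.
        src = 0
        for n, i in zip(shape, idx):
            src = src * n + (n - 1 - i)
        out.append(flat[src])
    return out


three_dmn_var = list(range(24))
reversed_var = reverse_all(three_dmn_var, (2, 3, 4))
print(reversed_var[:3])   # [23, 22, 21]
print(reversed_var[21:])  # [2, 1, 0]
```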

Consider a file with all dimensions, including time, fixed (non-record). Suppose the user wishes to convert time from a fixed dimension to a record dimension. This may be useful, for example, when the user wishes to append additional time slices to the data. The procedure is to use ncecat followed by ncpdq and then ncwa:

     ncecat -O in.nc out.nc # Add degenerate record dimension named "record"
     ncpdq -O -a time,record out.nc out.nc # Switch "record" and "time"
     ncwa -O -a record out.nc out.nc # Remove (degenerate) "record"

The first step creates a degenerate (size equals one) record dimension named (by default) record. The second step swaps the ordering of the dimensions named time and record. Since time now occupies the position of the first (least rapidly varying) dimension, it becomes the record dimension. The dimension named record is no longer a record dimension. The third step averages over this degenerate record dimension. Averaging over a degenerate dimension does not alter the data. The ordering of other dimensions in the file (lat, lon, etc.) is immaterial to this procedure. See ncecat netCDF Ensemble Concatenator for other methods of changing variable dimensionality, including the record dimension.
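
The effect of the three steps on the dimension ordering can be traced with a short Python sketch. It models only the list of dimension names, treats the first dimension as the record dimension as described above, and assumes an input ordering of time, lat, lon for illustration. The function names are hypothetical and do not call NCO.

```python
def add_record_dimension(dims):
    """Model of the ncecat step: prepend a degenerate 'record' dimension."""
    return ["record"] + list(dims)


def swap(dims, first, second):
    """Model of the ncpdq step: exchange the positions of two dimensions."""
    out = list(dims)
    i, j = out.index(first), out.index(second)
    out[i], out[j] = out[j], out[i]
    return out


def average_out(dims, name):
    """Model of the ncwa step: averaging removes the named dimension."""
    return [d for d in dims if d != name]


dims = ["time", "lat", "lon"]                 # all fixed dimensions
step1 = add_record_dimension(dims)            # record, time, lat, lon
step2 = swap(step1, "time", "record")         # time, record, lat, lon
step3 = average_out(step2, "record")          # time, lat, lon
print(step1, step2, step3, sep="\n")
```

After the second step time occupies the first position, so it remains the record dimension once the degenerate record dimension is averaged away.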


Footnotes

[1] This limitation, imposed by the netCDF storage layer, may be relaxed in the future with netCDF4.