

4.5 ncecat netCDF Ensemble Concatenator

SYNTAX

     ncecat [-3] [-4] [-6] [-A] [-C] [-c]
     [--cnk_dmn nm,sz] [--cnk_map map] [--cnk_plc plc] [--cnk_scl sz]
     [-D dbg] [-d dim,[min][,[max][,[stride]]]] [-F] [-h] [-L dfl_lvl] [-l path]
     [-M] [-n loop] [-O] [-o output-file] [-p path] [-R] [-r]
     [-t thr_nbr] [-u ulm_nm] [-v var[,...]] [-X ...] [-x]
     [input-files] [output-file]

DESCRIPTION

ncecat concatenates an arbitrary number of input files into a single output file. The input-files are stored consecutively as records in output-file. Each variable (except coordinate variables) in each input file becomes one record in the same variable in the output file. Coordinate variables are not concatenated; instead they are simply copied from the first input file to the output-file. All input-files must contain all extracted variables (or else there would be "gaps" in the output file).

A new record dimension is the glue which binds the input file data together. The new record dimension name is, by default, “record”. Its name can be specified with the ‘-u ulm_nm’ short option (or the ‘--ulm_nm’ or ‘--rcd_nm’ long options).

Each extracted variable must be constant in size and rank across all input-files. The only exception is that ncecat allows files to differ in the record dimension size if the requested record hyperslab (see Hyperslabs) resolves to the same size for all files. This makes it easier to glue and average unequal-length timeseries from simulation ensembles (e.g., the IPCC AR4 archive).
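
For instance, in a sketch like the following (the input files run1.nc, run2.nc, and run3.nc and the record dimension time are hypothetical), each file may contain a different number of time records, yet the requested hyperslab extracts the same twelve records from each, so the files can be glued:

     ncecat -d time,0,11 run1.nc run2.nc run3.nc ens.nc # First 12 records (indices 0-11) of each file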

Thus, the output-file size is the sum of the sizes of the extracted variables in the input files. See Averaging vs. Concatenating, for a description of the distinctions between the various averagers and concatenators. As a multi-file operator, ncecat will read the list of input-files from stdin if they are not specified as positional arguments on the command line (see Large Numbers of Files).
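
For example, a long list of input files may be generated by another command and piped to ncecat, with the output file named via the ‘-o’ switch; the wildcard pattern below is illustrative:

     ls 85_*.nc | ncecat -o 85.nc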

By default all NCO operators copy the global metadata of the first input file into output-file. This helps preserve the provenance of the output data. However, the use of metadata is burgeoning and it is not uncommon to encounter files with excessive amounts of extraneous metadata. Extracting small bits of data from such files leads to output files which are much larger than necessary due to the automatically copied metadata. ncecat supports turning off this default copying of global metadata via the ‘-M’ switch (or its long option equivalents, ‘--glb_mtd_spr’ and ‘--global_metadata_suppress’).
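
As a minimal sketch (the file names here are placeholders), the switch is simply added to the command line:

     ncecat -M in1.nc in2.nc out.nc # Do not copy global metadata of in1.nc to out.nc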

Consider five realizations, 85a.nc, 85b.nc, ... 85e.nc of 1985 predictions from the same climate model. Then ncecat 85?.nc 85_ens.nc glues the individual realizations together into the single file, 85_ens.nc. If an input variable was dimensioned [lat,lon], it will by default have dimensions [record,lat,lon] in the output file. A restriction of ncecat is that the hyperslabs of the processed variables must be the same from file to file. Normally this means all the input files are the same size, and contain data on different realizations of the same variables.
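
To verify the new dimensionality, the output may be inspected with the ncks ‘-m’ (print metadata) switch; the variable name T here is hypothetical:

     ncecat 85?.nc 85_ens.nc
     ncks -m -v T 85_ens.nc # T should now be dimensioned [record,lat,lon]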

Concatenating a variable packed with different scales across multiple datasets is beyond the capabilities of ncecat (and of ncrcat, the other concatenator; see Concatenation). ncecat does not unpack data; it simply copies the data from the input-files, and the metadata from the first input-file, to the output-file. This means that data compressed with a packing convention must use identical packing parameters (e.g., scale_factor and add_offset) for a given variable across all input files. Otherwise the concatenated dataset will not unpack correctly. The workaround for cases where the packing parameters differ across input-files requires three steps: First, unpack the data using ncpdq. Second, concatenate the unpacked data using ncecat. Third, re-pack the result with ncpdq.
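
A minimal sketch of these three steps, assuming two hypothetical packed input files in1.nc and in2.nc, and assuming ncpdq's ‘-U’ (unpack) switch and its ‘all_new’ packing policy:

     ncpdq -U in1.nc in1_upk.nc          # Step 1: unpack each input file
     ncpdq -U in2.nc in2_upk.nc
     ncecat in1_upk.nc in2_upk.nc out.nc # Step 2: concatenate unpacked data
     ncpdq -P all_new out.nc out_pck.nc  # Step 3: re-pack the result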

EXAMPLES

Consider a model experiment which generated five realizations of one year of data, say 1985. You can imagine that the experimenter slightly perturbs the initial conditions of the problem before generating each new solution. Assume each file contains all twelve months (a seasonal cycle) of data and we want to produce a single file containing all the seasonal cycles. Here the numeric filename suffix denotes the experiment number (not the month):

     ncecat 85_01.nc 85_02.nc 85_03.nc 85_04.nc 85_05.nc 85.nc
     ncecat 85_0[1-5].nc 85.nc
     ncecat -n 5,2,1 85_01.nc 85.nc

These three commands produce identical answers. See Specifying Input Files, for an explanation of the distinctions between these methods. The output file, 85.nc, is five times the size of a single input-file. It contains 60 months of data.

One often prefers that the (new) record dimension have a more descriptive, context-based name than simply “record”. This is easily accomplished with the ‘-u ulm_nm’ switch:

     ncecat -u realization 85_0[1-5].nc 85.nc

Users are more likely to understand the data processing history when such descriptive coordinates are used.

Consider a file with an existing record dimension named time, and suppose the user wishes to convert time from a record dimension to a non-record dimension. This may be useful, for example, when the user has another use for the record dimension. The procedure is to use ncecat followed by ncwa:

     ncecat in.nc out.nc           # Convert time to non-record dimension
     ncwa -a record out.nc out.nc  # Remove new degenerate record dimension

The second step removes the degenerate record dimension. See ncpdq netCDF Permute Dimensions Quickly and ncks netCDF Kitchen Sink for other methods of changing variable dimensionality, including the record dimension.
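
As a sketch of one such alternative (assuming a sufficiently recent NCO build, with the file and dimension names illustrative), ncks can change the record status of a dimension directly:

     ncks --fix_rec_dmn time in.nc out.nc # Make time a fixed (non-record) dimension
     ncks --mk_rec_dmn time in.nc out.nc  # Make time the record dimension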