Availability: ncap2, ncbo, ncea, ncecat, ncflint, ncks, ncpdq, ncra, ncrcat, ncwa
Short options: none
Long options: ‘--cnk_dmn dmn_nm,cnk_sz’, ‘--chunk_dimension dmn_nm,cnk_sz’, ‘--cnk_map cnk_map’, ‘--chunk_map cnk_map’, ‘--cnk_plc cnk_plc’, ‘--chunk_policy cnk_plc’, ‘--cnk_scl cnk_sz’, ‘--chunk_scalar cnk_sz’
All netCDF4-enabled NCO operators that define variables support a plethora of chunksize options. Chunking can significantly accelerate or degrade read/write access to large datasets. Dataset chunking issues are described in detail here.
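Chunking changes access speed because a read must fetch every chunk that overlaps the requested hyperslab. The bash function below is a back-of-the-envelope sketch, not NCO or HDF5 code. Its name, chunks_touched, and its comma-separated arguments (chunksizes, zero-based start indices, counts) are hypothetical. It counts the chunks overlapped along each dimension and multiplies those counts together.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of NCO): count the chunks a hyperslab overlaps.
# Args: comma-separated chunksizes, zero-based start indices, and counts,
# one entry per dimension, all in the variable's dimension order.
chunks_touched() {
  local IFS=,
  local -a cnk=($1) srt=($2) cnt=($3)
  local i tot=1
  for ((i = 0; i < ${#cnk[@]}; i++)); do
    # Overlapped chunks along dimension i: index of the last chunk hit
    # minus index of the first chunk hit, plus one (integer division).
    tot=$(( tot * ((srt[i] + cnt[i] - 1) / cnk[i] - srt[i] / cnk[i] + 1) ))
  done
  echo "$tot"
}
```

For a 12×64×128 (time, lat, lon) variable stored in chunks of 1,32,128, reading one time slice (chunks_touched 1,32,128 5,0,0 1,64,128) touches 2 chunks, while reading the 12-step time series at one grid point (chunks_touched 1,32,128 0,10,50 12,1,1) touches 12. Chunks of 12,1,1 reverse the trade-off: 1 chunk for the time series but 8192 for the time slice.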
The NCO chunking implementation is designed to be flexible. Users control three aspects of it: the chunking policy, the chunking map, and the chunksize. The first two are high-level mechanisms that apply to an entire file, while the third allows per-dimension specification of parameters. The implementation is a hybrid of the ncpdq packing policies (see ncpdq netCDF Permute Dimensions Quickly) and the hyperslab specifications (see Hyperslabs). Each aspect is intended to have a sensible default, so most users need to set only one switch to obtain sensible chunking. Power users can tune the three switches in tandem to obtain optimal performance.
The user specifies the desired chunking policy with the ‘--cnk_plc’
switch (or its long option equivalent, ‘--chunk_policy’) and its
cnk_plc argument.
Five chunking policies are currently implemented:
ncchunk
ncunchunk
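The policy decides which variables are chunked at all. The sketch below is a toy model, not NCO's implementation, and it covers only the policy names that appear in the examples of this section (all, g2d, uck/unchunk). The function name cnk_plc_decide is hypothetical. It assumes, from its name, that g2d selects variables with at least two dimensions, treats scalars as unchunkable because they have no dimensions, and applies the rule stated at the end of this section that record variables are chunked regardless of policy.

```shell
#!/usr/bin/env bash
# Toy model of a chunking policy (hypothetical; not NCO source code).
# Args: policy name, variable rank, 1 if the variable has the record dimension else 0.
# Prints "chunked" or "contiguous"; returns 1 for policies this sketch does not model.
cnk_plc_decide() {
  local plc=$1 rank=$2 has_rec=$3 min_rank
  # Record variables are always chunked, whatever the policy.
  if [ "$has_rec" -eq 1 ]; then
    echo chunked
    return 0
  fi
  case $plc in
    all)         min_rank=1 ;;  # any variable with at least one dimension
    g2d)         min_rank=2 ;;  # assumption: "at least two dimensions"
    uck|unchunk) echo contiguous; return 0 ;;
    *)           echo "unmodeled policy: $plc" >&2; return 1 ;;
  esac
  if [ "$rank" -ge "$min_rank" ]; then echo chunked; else echo contiguous; fi
}
```

For instance, cnk_plc_decide g2d 1 0 prints contiguous, whereas cnk_plc_decide unchunk 2 1 still prints chunked because the variable contains the record dimension.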
The chunking algorithms must know the chunksizes of each dimension of
each variable to be chunked.
The correspondence between the input variable shape and the chunksizes
is called the chunking map.
The user specifies the desired chunking map with the ‘--cnk_map’
switch (or its long option equivalent, ‘--chunk_map’) and its
cnk_map argument.
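A chunking map turns a variable's shape into a chunk shape. The function below is a hypothetical toy map, not any of NCO's actual maps, and the name cnk_shape is invented for illustration. It gives the record dimension a chunksize of one, lets a user override in the style of ‘--cnk_dmn dmn_nm,cnk_sz’ win for any other dimension, and otherwise (an assumption) uses the full dimension length. Overrides larger than their dimension are clamped to it.

```shell
#!/usr/bin/env bash
# Hypothetical toy chunking map: NOT one of NCO's chunking maps.
# Args: record dimension name ("" if none),
#       space-separated overrides "dmn_nm,cnk_sz ...", mirroring --cnk_dmn,
#       then the variable's dimensions as dmn_nm:dmn_sz in storage order.
# Prints the chunk shape as comma-separated chunksizes.
cnk_shape() {
  local rec=$1 ovr=$2
  shift 2
  local out="" dmn nm sz cnk pair
  for dmn in "$@"; do
    nm=${dmn%%:*}
    sz=${dmn#*:}
    if [ "$nm" = "$rec" ]; then
      cnk=1                          # record dimension: chunksize of one
    else
      cnk=$sz                        # toy default: whole dimension in one chunk
      for pair in $ovr; do           # a user override for this dimension wins
        if [ "${pair%%,*}" = "$nm" ]; then cnk=${pair#*,}; fi
      done
      if [ "$cnk" -gt "$sz" ]; then cnk=$sz; fi  # keep chunk within dimension
    fi
    out+="${out:+,}$cnk"
  done
  echo "$out"
}
```

For example, cnk_shape time "lat,32 lon,128" time:12 lat:64 lon:128 prints 1,32,128.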
Four chunking maps are currently implemented:
# Simple chunking and unchunking
ncks -O -4 --cnk_plc=all in.nc out.nc # Chunk in.nc
ncks -O -4 --cnk_plc=unchunk in.nc out.nc # Unchunk in.nc
# Chunk data then unchunk it, printing informative metadata
ncks -O -4 -D 4 --cnk_plc=all ~/nco/data/in.nc ~/foo.nc
ncks -O -4 -D 4 --cnk_plc=uck ~/foo.nc ~/foo.nc
# More complex chunking procedures, with informative metadata
ncks -O -4 -D 4 --cnk_scl=8 ~/nco/data/in.nc ~/foo.nc
ncks -O -4 -D 4 --cnk_scl=8 /data/zender/dstmch90/dstmch90_clm.nc ~/foo.nc
ncks -O -4 -D 4 --cnk_dmn lat,64 --cnk_dmn lon,128 /data/zender/dstmch90/dstmch90_clm.nc ~/foo.nc
ncks -O -4 -D 4 --cnk_plc=uck ~/foo.nc ~/foo.nc
ncks -O -4 -D 4 --cnk_plc=g2d --cnk_map=rd1 --cnk_dmn lat,32 --cnk_dmn lon,128 /data/zender/dstmch90/dstmch90_clm_0112.nc ~/foo.nc
# Chunking works with all operators...
ncap2 -O -4 -D 4 --cnk_scl=8 -S ~/nco/data/ncap2_tst.nco ~/nco/data/in.nc ~/foo.nc
ncbo -O -4 -D 4 --cnk_scl=8 -p ~/nco/data in.nc in.nc ~/foo.nc
ncecat -O -4 -D 4 -n 12,2,1 --cnk_dmn lat,32 -p /data/zender/dstmch90 dstmch90_clm01.nc ~/foo.nc
ncflint -O -4 -D 4 --cnk_scl=8 ~/nco/data/in.nc ~/foo.nc
ncpdq -O -4 -D 4 -P all_new --cnk_scl=8 -L 5 ~/nco/data/in.nc ~/foo.nc
ncrcat -O -4 -D 4 -n 12,2,1 --cnk_dmn lat,32 -p /data/zender/dstmch90 dstmch90_clm01.nc ~/foo.nc
ncwa -O -4 -D 4 -a time --cnk_plc=g2d --cnk_map=rd1 --cnk_dmn lat,32 --cnk_dmn lon,128 /data/zender/dstmch90/dstmch90_clm_0112.nc ~/foo.nc
Finally, users should be aware of one aspect of chunking that may not be expected: record dimensions are always chunked with a chunksize of one. Hence every variable that contains the record dimension is also stored as chunked, since data must be stored with chunking either in all dimensions or in none. Unless the user specifies otherwise, the other (fixed, non-record) dimensions of such variables are assigned default chunksizes. The HDF5 layer does all this automatically to optimize the on-disk storage geometry of record variables. Do not be surprised, then, to find that files created without any explicit instruction to activate chunking nevertheless contain chunked variables.
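The storage consequence of a record chunksize of one is simple arithmetic: the number of chunks is the product, over all dimensions, of the dimension size divided by the chunksize, rounded up. The helper below (hypothetical name n_chunks, not part of NCO) computes that product. With a chunksize of one along the record dimension, every record occupies its own set of chunks, so the chunk count grows linearly with the number of records.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of NCO): number of chunks used to store a variable.
# Args: comma-separated dimension sizes and comma-separated chunksizes,
# assumed to have the same length and the same dimension order.
n_chunks() {
  local IFS=,
  local -a dsz=($1) csz=($2)
  local i n=1
  for ((i = 0; i < ${#dsz[@]}; i++)); do
    n=$(( n * ((dsz[i] + csz[i] - 1) / csz[i]) ))   # ceil(dsz / csz)
  done
  echo "$n"
}
```

For example, a 12×64×128 record variable stored in chunks of 1,32,128 occupies 12×2×1 = 24 chunks (n_chunks 12,64,128 1,32,128), and doubling the number of records to 24 doubles that to 48.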