3.21 Missing values

Availability: ncap2, ncbo, ncea, ncflint, ncpdq, ncra, ncwa
Short options: None

The phrase missing data refers to data points that are missing, invalid, or for any reason not intended to be arithmetically processed in the same fashion as valid data. The NCO arithmetic operators attempt to handle missing data in an intelligent fashion. There are four steps in the NCO treatment of missing data:

  1. Identifying variables that may contain missing data.

    NCO follows the convention that missing data should be stored with the value specified in the variable's _FillValue attribute. The only way NCO recognizes that a variable may contain missing data is if the variable has a _FillValue attribute. In this case, any elements of the variable which are numerically equal to the _FillValue are treated as missing data.

    As of version 3.9.2 (August, 2007), NCO assumes that the _FillValue attribute, if present, specifies the value of data to ignore. Prior to that, the missing_value attribute, if any, was assumed to specify the value of data to ignore. Supporting both of these attributes simultaneously is not practical. Hence the behavior NCO once applied to missing_value it now applies to any _FillValue. NCO now treats any missing_value as normal data [1].

    It has been and remains most advisable to create both _FillValue and missing_value attributes with identical values in datasets. Many legacy datasets contain only missing_value attributes. NCO can help migrate datasets between these conventions. One may use ncrename (see ncrename netCDF Renamer) to rename all missing_value attributes to _FillValue:

              ncrename -a .missing_value,_FillValue inout.nc
    

    Alternatively, one may use ncatted (see ncatted netCDF Attribute Editor) to add a _FillValue attribute to all variables:

              ncatted -O -a _FillValue,,o,f,1.0e36 inout.nc
    
  2. Converting the _FillValue to the type of the variable, if necessary.

    Consider a variable var of type var_type with a _FillValue attribute of type att_type containing the value _FillValue. As a guideline, the type of the _FillValue attribute should be the same as the type of the variable it is attached to. If var_type equals att_type then NCO straightforwardly compares each value of var to _FillValue to determine which elements of var are to be treated as missing data. If not, then NCO converts _FillValue from att_type to var_type by using the implicit conversion rules of C, or, if att_type is NC_CHAR [2], by typecasting the results of the C function strtod(_FillValue). You may use the NCO operator ncatted to change the _FillValue attribute and all data whose value is _FillValue to a new value (see ncatted netCDF Attribute Editor).
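
The conversion described in step 2 can be illustrated with a short Python sketch. This is not NCO's implementation (NCO itself is written in C). The helper names `to_float32` and `convert_fill_value` and the type labels are hypothetical, `float()` stands in for strtod, and a `struct` round trip emulates the C conversion from double to single precision.

```python
import struct

def to_float32(value):
    """Round a Python float (a C double) to the nearest C float."""
    return struct.unpack("f", struct.pack("f", value))[0]

def convert_fill_value(fill_value, var_type):
    """Convert a _FillValue attribute to the variable's type (hypothetical helper).

    A str stands in for an NC_CHAR attribute and is parsed first,
    much as strtod() would parse it in C.
    """
    if isinstance(fill_value, str):
        fill_value = float(fill_value)  # e.g. "-99999." -> -99999.0
    if var_type == "float":
        return to_float32(fill_value)
    if var_type == "double":
        return float(fill_value)
    if var_type in ("short", "int"):
        return int(fill_value)  # truncates toward zero, like a C conversion
    raise ValueError("unsupported type: " + var_type)

# A float variable stores 1.0e36 rounded to single precision, so the
# double-precision attribute value matches only after conversion.
stored = to_float32(1.0e36)
fill_double = 1.0e36
matches_raw = (stored == fill_double)
matches_converted = (stored == convert_fill_value(fill_double, "float"))

# An NC_CHAR attribute such as '-99999.' is parsed, then converted.
char_fill = convert_fill_value("-99999.", "float")
```

The sketch shows why the guideline matters: without conversion to var_type, a single-precision element written as 1.0e36 would not compare equal to a double-precision _FillValue of 1.0e36.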

  3. Identifying missing data during arithmetic operations.

    When an NCO arithmetic operator processes a variable var with a _FillValue attribute, it compares each value of var to _FillValue before performing an operation. Note the _FillValue comparison imposes a performance penalty on the operator. Arithmetic processing of variables which contain the _FillValue attribute always incurs this penalty, even when none of the data are missing. Conversely, arithmetic processing of variables which do not contain the _FillValue attribute never incurs this penalty. In other words, do not attach a _FillValue attribute to a variable which does not contain missing data. This exhortation can usually be obeyed for model generated data, but it may be harder to know in advance whether all observational data will be valid or not.

  4. Treatment of any data identified as missing in arithmetic operators.

    NCO averagers (ncra, ncea, ncwa) do not count any element with the value _FillValue towards the average. ncbo and ncflint define a _FillValue result when either of the input values is a _FillValue. Sometimes the _FillValue may change from file to file in a multi-file operator, e.g., ncra. NCO is written to account for this (it always compares a variable to the _FillValue assigned to that variable in the current file). Suffice it to say that, in all known cases, NCO does “the right thing”.
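
The two rules above can be sketched in Python. The functions `masked_mean` and `masked_subtract` are hypothetical illustrations, not NCO code, and returning the fill value when no valid elements remain is an assumption made for this sketch.

```python
def masked_mean(values, fill_value=None):
    """Average values, skipping elements equal to fill_value.

    With fill_value=None (no _FillValue attribute) every element counts
    and no per-element comparison is made.
    """
    if fill_value is None:
        valid = list(values)
    else:
        valid = [v for v in values if v != fill_value]
    if not valid:
        return fill_value  # assumption: no valid data yields the fill value
    return sum(valid) / len(valid)

def masked_subtract(a, b, fill_value):
    """Element-wise a - b; the result is fill_value where either input is missing."""
    return [
        fill_value if (x == fill_value or y == fill_value) else x - y
        for x, y in zip(a, b)
    ]

FILL = 1.0e36
mean = masked_mean([1.0, FILL, 3.0], FILL)
diff = masked_subtract([5.0, FILL, 2.0], [1.0, 1.0, FILL], FILL)
```

Here `mean` is computed from the two valid elements only, and `diff` holds the fill value in every position where either operand was missing.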

    In some corner cases it is impossible to determine and store the correct result of a binary operation in a single variable. One such corner case occurs when both operands have differing _FillValue attributes, i.e., attributes with different numerical values. Since the output (result) of the operation can only have one _FillValue, some information may be lost. In this case, NCO always defines the output variable to have the same _FillValue as the first input variable. Prior to performing the arithmetic operation, all values of the second operand equal to the second _FillValue are replaced with the first _FillValue. Then the arithmetic operation proceeds as normal, comparing each element of each operand to a single _FillValue. Comparing each element to two distinct _FillValue values would be much slower and would be no likelier to yield a more satisfactory answer. In practice, judicious choice of _FillValue values prevents any important information from being lost.
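
The handling of operands with differing _FillValue attributes can be sketched as follows. This is a minimal Python illustration of the procedure described above, not NCO's own code; the function name `binary_op_two_fills` and its parameters are hypothetical.

```python
import operator

def binary_op_two_fills(a, fill_a, b, fill_b, op):
    """Apply op element-wise when the operands have different fill values.

    The output uses fill_a, the first operand's _FillValue.
    """
    # Replace the second operand's missing elements with the first fill value.
    b = [fill_a if y == fill_b else y for y in b]
    # Proceed as normal, comparing every element to the single fill value.
    return [
        fill_a if (x == fill_a or y == fill_a) else op(x, y)
        for x, y in zip(a, b)
    ]

FILL_A = -999.0
FILL_B = 1.0e36
total = binary_op_two_fills(
    [1.0, FILL_A, 3.0], FILL_A,
    [10.0, 20.0, FILL_B], FILL_B,
    operator.add,
)

# Information loss: a valid -999.0 in the second operand is
# indistinguishable from missing data once only FILL_A is compared.
lost = binary_op_two_fills([1.0], FILL_A, [-999.0], FILL_B, operator.add)
```

The second call shows the corner case: a valid element of the second operand that happens to equal the first _FillValue is treated as missing, which is why a judicious choice of _FillValue values matters.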


Footnotes

[1] The old functionality, i.e., where the ignored values are indicated by missing_value not _FillValue, may still be selected at NCO build time by compiling NCO with the token definition CPPFLAGS='-DNCO_MSS_VAL_SNG=missing_value'.

[2] For example, the DOE ARM program often uses att_type = NC_CHAR and _FillValue = ‘-99999.’.