Next: , Previous: Include files, Up: ncap2 netCDF Arithmetic Processor


4.1.16 sort methods

In ncap there are two ways to sort data. The first is a regular sort. This sorts ALL the elements of a variable or attribute without regard to any dimensions. The second method applies a sort map to a variable. To apply this sort map the size of the variable must be exactly divisible by the size of the sort map. The method sort(var_in,&var_map) is overloaded. The second optional argument is to supply a call-by-reference variable to hold the returned sort map.

     a1[$time]={10,2,3,4,6,5,7,3,4,1};
     a1_sort=sort(a1);
     print(a1_sort);
     // 1, 2, 3, 3, 4, 4, 5, 6, 7, 10;
     
     a2[$lon]={2,1,4,3};
     a2_sort=sort(a2,&a2_map);
     print(a2);
     // 1, 2, 3, 4
     print(a2_map);
     // 1, 0, 3, 2;

If the map variable does not exist prior to the sort() call, then it will be created with the same shape as the input variable and be of type NC_INT. If the map variable already exists, then the only restriction is that it be of at least the same size as the input variable. To apply a map use dsort(var_in,var_map).

     defdim("nlat",5);
     
     a3[$lon]={2,5,3,7};
     a4[$nlat,$lon]={
      1, 2, 3, 4,
      5, 6, 7, 8,
      9,10,11,12,
      13,14,15,16,
      17,18,19,20};
     
     a3_sort=sort(a3,&a3_map);
     print(a3_map);
     // 0, 2, 1, 3;
     
     a4_sort=dsort(a4,a3_map);
     print(a4_sort);
     // 1, 3, 2, 4,
     // 5, 7, 6, 8,
     // 9,11,10,12,
     // 13,15,14,16,
     // 17,19,18,20;
     
     a3_map2[$nlat]={4,3,0,2,1};
     
     a4_sort2=dsort(a4,a3_map2);
     print(a4_sort2);
     // 3, 5, 4, 2, 1
     // 8, 10, 9,7, 6,
     // 13,15,14,12,11,
     // 18,20,19,17,16

As in the above example you may create your own sort map. To sort in descending order, apply the reverse() method after the sort().

Here is an extended example of how to use ncap2 features to hyperslab an irregular region based on the values of a variable not a coordinate. The distinction is crucial: hyperslabbing based on dimensional indices or coordinate values is straightforward. Using the values of single or multi-dimensional variable to define a hyperslab is quite different.

     cat > ~/ncap2_foo.nco << 'EOF'
     // Purpose: Save irregular 1-D regions based on variable values
     
     // Included in NCO User's Guide at http://nco.sf.net/nco.html#sort
     
     /* NB: Single quotes around EOF above turn off shell parameter
         expansion in "here documents". This in turn prevents the
         need for protecting dollarsign characters in NCO scripts with
         backslashes when the script is cut-and-pasted (aka "moused")
         from an editor or e-mail into a shell console window */
     
     /* Copy coordinates and variable(s) of interest into RAM variable(s)
        Benefits:
        1. ncap2 defines writes all variables on LHS of expression to disk
           Only exception is RAM variables, which are stored in RAM only
           Repeated operations on regular variables takes more time,
           because changes are written to disk copy after every change.
           RAM variables are only changed in RAM so script works faster
           RAM variables can be written to disk at end with ram_write()
        2. Script permutes variables of interest during processing
           Safer to work with copies that have different names
           This discourages accidental, mistaken use of permuted versions
        3. Makes this script a more generic template:
           var_in instead of specific variable names everywhere */
     *var_in=one_dmn_rec_var;
     *crd_in=time;
     *dmn_in_sz=$time.size; // [nbr] Size of input arrays
     
     /* Create all other "intermediate" variables as RAM variables
        to prevent them from cluttering the output file.
        Mask flag and sort map are same size as variable of interest */
     *msk_flg=var_in;
     *srt_map=var_in;
     
     /* In this example we mask for all values evenly divisible by 3
        This is the key, problem-specific portion of the template
        Replace this where() condition by that for your problem
        Mask variable is Boolean: 1=Meets condition, 0=Fails condition */
     where(var_in % 3 == 0) msk_flg=1; elsewhere msk_flg=0;
     
     // print("msk_flg = ");print(msk_flg); // For debugging...
     
     /* The sort() routine is overloaded, and takes one or two arguments
        The second argument (optional) is the "sort map" (srt_map below)
        Pass the sort map by reference, i.e., prefix with an ampersand
        If the sort map does not yet exist, then it will be created and
        returned as an integer type the same shape as the input variable.
        The output of sort(), on the LHS, is a sorted version of the input
        msk_flg is not needed in its original order after sort()
        Hence we use msk_flg as both input to and output from sort()
        Doing this prevents the need to define a new, unneeded variable */
     msk_flg=sort(msk_flg,&srt_map);
     
     // Count number of valid points in mask by summing the one's
     *msk_nbr=msk_flg.total();
     
     // Define output dimension equal in size to number of valid points
     defdim("crd_out",msk_nbr);
     
     /* Now sort the variable of interest using the sort map and dsort()
        The output, on the LHS, is the input re-arranged so that all points
        meeting the mask condition are contiguous at the end of the array
        Use same srt_map to hyperslab multiple variables of the same shape
        Remember to apply srt_map to the coordinate variables */
     crd_in=dsort(crd_in,srt_map);
     var_in=dsort(var_in,srt_map);
     
     /* Hyperslab last msk_nbr values of variable(s) of interest */
     crd_out[crd_out]=crd_in((dmn_in_sz-msk_nbr):(dmn_in_sz-1));
     var_out[crd_out]=var_in((dmn_in_sz-msk_nbr):(dmn_in_sz-1));
     
     /* NB: Even though we created all variables possible as RAM variables,
        the original coordinate of interest, time, is written to the ouput.
        I'm not exactly sure why. For now, delete it from the output with:
        ncks -O -x -v time ~/foo.nc ~/foo.nc
        */
     EOF
     ncap2 -O -v -S ~/ncap2_foo.nco ~/nco/data/in.nc ~/foo.nc
     ncks -O -x -v time ~/foo.nc ~/foo.nc
     ncks ~/foo.nc

Here is an extended example of how to use ncap2 features to sort multi-dimensional arrays based on the coordinate values along a single dimension.

     cat > ~/ncap2_foo.nco << 'EOF'
     /* Purpose: Sort multi-dimensional array based on coordinate values
        This example sorts the variable three_dmn_rec_var(time,lat,lon)
        based on the values of the time coordinate. */
     
     // Included in NCO User's Guide at http://nco.sf.net/nco.html#sort
     
     // Randomize the time coordinate
     time=10.0*gsl_rng_uniform(time);
     //print("original randomized time =\n");print(time);
     
     /* The sort() routine is overloaded, and takes one or two arguments
        The first argument is a one dimensional array
        The second argument (optional) is the "sort map" (srt_map below)
        Pass the sort map by reference, i.e., prefix with an ampersand
        If the sort map does not yet exist, then it will be created and
        returned as an integer type the same shape as the input variable.
        The output of sort(), on the LHS, is a sorted version of the input */
     
     time=sort(time,&srt_map);
     //print("sorted time (ascending order) and associated sort map =\n");print(time);print(srt_map);
     
     /* sort() always sorts in ascending order
        The associated sort map therefore re-arranges the original,
        randomized time array into ascending order.
        There are two methods to obtain the descending order the user wants
        1) We could solve the problem in ascending order (the default)
        and then apply the reverse() method to re-arrange the results.
        2) We could change the sort map to return things in descending
        order of time and solve the problem directly in descending order. */
     
     // Following shows how to do method one:
     
     /* Expand the sort map to srt_map_3d, the size of the data array
        1. Use data array to provide right shape for the expanded sort map
        2. Coerce data array into an integer so srt_map_3d is an integer
        3. Multiply data array by zero so 3-d map elements are all zero
        4. Add the 1-d sort map to the 3-d sort map (NCO automatically resizes)
        5. Add the spatial (lat,lon) offsets to each time index
        6. de-sort using the srt_map_3d
        7. Use reverse to obtain descending in time order
        Loops could accomplish the same thing (exercise left for reader)
        However, loops are slow for large datasets */
     
     /* Following index manipulation requires understanding correspondence
        between 1-d (unrolled, memory order of storage) and access into that
        memory as a multidimensional (3-d, in this case) rectangular array.
        Key idea to understand is how dimensionality affects offsets */
     // Copy 1-d sort map into 3-d sort map
     srt_map_3d=(0*int(three_dmn_rec_var))+srt_map;
     // Multiply base offset by factorial of lesser dimensions
     srt_map_3d*=$lat.size*$lon.size;
     lon_idx=array(0,1,$lon);
     lat_idx=array(0,1,$lat)*$lon.size;
     lat_lon_idx[$lat,$lon]=lat_idx+lon_idx;
     srt_map_3d+=lat_lon_idx;
     
     print("sort map 3d =\n");print(srt_map_3d);
     
     // Use dsort() to "de-sort" the data using the sort map
     three_dmn_rec_var=dsort(three_dmn_rec_var,srt_map_3d);
     
     // Finally, reverse data so time coordinate is descending
     time=time.reverse($time);
     //print("sorted time (descending order) =\n");print(time);
     three_dmn_rec_var=three_dmn_rec_var.reverse($time);
     
     // Method two: Key difference is srt_map=$time.size-srt_map-1;
     EOF
     ncap2 -O -v -S ~/ncap2_foo.nco ~/nco/data/in.nc ~/foo.nc