Next: , Previous: GSL least-squares fitting, Up: ncap2 netCDF Arithmetic Processor


4.1.22 GSL statistics

Wrappers for most of the GSL Statistical functions have been implemented. The GSL function names include a type specifier (except for type double functions). To obtain the equivalent NCO name simply remove the type specifier; then depending on the data type the appropriate GSL function is called. The weighed statistical functions e.g gsl_stats_wvariance() are only defined in GSL for floating point types; so your data must of type float or double otherwise ncap2 will emit an error message. To view the implemented functions use the shell command ncap2 -f|grep _stats

GSL Functions

     short gsl_stats_max (short data[], size_t stride, size_t n);
     double gsl_stats_int_mean (int data[], size_t stride, size_t n);
     double gsl_stats_short_sd_with_fixed_mean (short data[], size_t stride, size_t n, double mean);
     double gsl_stats_wmean (double w[], size_t wstride, double data[], size_t stride, size_t n);
     double gsl_stats_quantile_from_sorted_data (double sorted_data[], size_t stride, size_t n, double f) ;

Equivalent ncap2 wrapper functions

     short gsl_stats_max (var_data, data_stride, n);
     double gsl_stats_mean (var_data, data_stride, n);
     double gsl_stats_sd_with_fixed_mean (var_data, data_stride, n, var_mean);
     double gsl_stats_wmean (var_weight, weight_stride, var_data, data_stride, n, var_mean);
     double gsl_stats_quantile_from_sorted_data (var_sorted_data, data_stride, n, var_f) ;

GSL has no notion of missing values or dimensionality beyond one. If your data has missing values which you want ignored in the calculations then use the ncap2 built in aggregate functions( Methods and functions ). The GSL functions operate on a vector of values created from the var_data/stride/n arguments. The ncap wrappers check that there is no bounding error with regard to the size of the data and the final value in the vector.

Some examples

     a1[time]={1,2,3,4,5,6,7,8,9,10 };
     
     a1_avg=gsl_stats_mean(a1,1,10);
     print(a1_avg); // 5.5
     
     a1_var=gsl_stats_variance(a1,4,3);
     print(a1_var); // 16.0
     
     // bounding error, vector attempts to access element a1(10)
     a1_sd=gsl_stats_sd(a1,5,3);
     

For functions with the signature func_nm(var_data, data_stride, n) you can omit the second or third arguments. The default value for stride is 1. The default value for n is 1+ (data.size()-1)/stride

     // the following are equvalent
     n2=gsl_stats_max(a1,1,10)
     n2=gsl_stats_max(a1,1);
     n2=gsl_stats_max(a1);
     
     // the following are equivalent
     n3=gsl_stats_median_from_sorted_data(a1,2,5);
     n3=gsl_stats_median_from_sorted_data(a1,2);
     
     // the following are NOT equivalent
     n4=gsl_stats_kurtosis(a1,3,2);
     n4=gsl_stats_kurtosis(a1,3); //default n=4

The following example illustrates some of the weighted functions in action. The data is randomly generated. In this case the value of the weight for each datum is either 0.0 or 1.0

     defdim("r1",2000);
     data[r1]=1.0;
     
     // fill with ramdon numbers [0.0,10.0)
     data=10.0*gsl_rng_uniform(data);
     
     // Create a weighting var
     weight=(data>4.0);
     
     wmean=gsl_stats_wmean(weight,1,data,1,$r1.size);
     print(wmean);
     
     wsd=gsl_stats_wsd(weight,1,data,1,$r1.size);
     print(wsd);
     
     // number of values in data that are greater than 4
     weight_size=weight.total();
     print(weight_size);
     
     // print min/max of data
     dmin=data.gsl_stats_min();
     dmax=data.gsl_stats_max();
     print(dmin);print(dmax);