Compare Paired Data Sets

Model verification and validation is supported in pysatModels through pyForecastTools. Many of the standard statistics used for these purposes can be run in a single go using the utility pysatModels.utils.compare.compare_model_and_inst(). The following example uses the paired data produced in example Match Data by Location. The matched_inst object at this point should display as shown below.

print(matched_inst)

pysat Instrument object
-----------------------
Platform: 'cnofs'
Name: 'ivm'
Tag: ''
Instrument id: ''

Data Processing
---------------
Cleaning Level: 'clean'
Data Padding: None
Keyword Arguments Passed to list_files: {}
Keyword Arguments Passed to load: {}
Keyword Arguments Passed to preprocess: {}
Keyword Arguments Passed to download: {}
Keyword Arguments Passed to list_remote_files: {}
Keyword Arguments Passed to clean: {}
Keyword Arguments Passed to init: {}
Custom Functions: 1 applied
    0: <function update_longitude at 0x12fcb8310>
     : Kwargs={'low': -180.0, 'lon_name': 'glon', 'high': 180.0}

Local File Statistics
---------------------
Number of files: 930
Date Range: 01 January 2009 --- 01 April 2015

Loaded Data Statistics
----------------------
Date: 01 January 2009
DOY: 001
Time range: 01 January 2009 00:00:00 --- 01 January 2009 23:59:59
Number of Times: 168596
Number of variables: 89

Variable Names:
RPAflag                         driftMeterflag                  ionVelocityX
                                    ...
ECISC_index1                    LVLHSC_index1                   dineof_model_equator_model_data

pysat Meta object
-----------------
Tracking 15 metadata values
Metadata for 92 standard variables
Metadata for 0 ND variables

Now, we need to convert this pysat.Instrument object to an xarray.Dataset. This conversion is needed to simplify the comparison analysis, since pysat.Instrument.data may be either xarray.Dataset or pandas.DataFrame objects. pysatModels uses xarray.Dataset as the base analysis class, because this class is best suited for modelled output. This can be easily done using pysatModels.utils.convert.convert_pysat_to_xarray(). Before we do that, though, we’re going to update the units of the modelled data. This is necessary for the comparison, since the pysatModels.utils.compare.compare_model_and_inst() tests to make sure the paired data have the same units. It can handle converting between different units of the same type, so we will specify that the modelled data is a velocity in cm/s, while the observations are a velocity measured in m/s.

from pysatModels.utils import convert

inst_data_keys = ['ionVelmeridional']
model_data_keys = ['dineof_model_equator_model_data']
matched_inst.meta[model_data_keys[0]] = {
    matched_inst.meta.labels.units: "cm/s"}
paired_data = convert.convert_pysat_to_xarray(matched_inst)
print(paired_data)

<xarray.Dataset>
Dimensions:                          (index: 168596)
Coordinates:
  * index                            (index) datetime64[ns] 2009-01-01T00:00:...
Data variables: (12/89)
  RPAflag                          (index) int16 4 4 4 4 4 4 4 ... 4 3 4 4 4 3
  driftMeterflag                   (index) int16 0 0 0 0 0 0 0 ... 0 0 0 0 0 0
  ionVelocityX                     (index) float32 nan nan nan ... nan nan nan
  ionVelocityY                     (index) float32 633.8 589.8 ... 234.5 238.1
  ionVelocityZ                     (index) float32 106.3 105.6 ... 119.6 118.5
  vXvariance                       (index) float32 0.0 0.0 0.0 ... 0.0 0.0 0.0
  ...                               ...
  meridionalunitvectorX            (index) float32 -0.04526 ... 0.02975
  meridionalunitvectorY            (index) float32 -0.2083 -0.208 ... -0.392
  meridionalunitvectorZ            (index) float32 -0.977 -0.9771 ... -0.9195
  ECISC_index1                     (index) float64 nan nan nan ... nan nan nan
  LVLHSC_index1                    (index) float64 nan nan nan ... nan nan nan
  dineof_model_equator_model_data  (index) float64 -0.06359 ... -1.612

Now we can compare the paired model and observed data. The example below will only calculate the bias-related statistics, but there are several options for the method keyword argument that allow single or groups of statistics to be calculated in one call.

Note the statistical output is in the units of the observed data set. The stat_dict output is a dict with the observed data variable name(s) as the first set of keys and the requested statistics for each data type as a nested dict.

Not all of the statistics were appropriate for the data set, as indicated by the RuntimeWarning messages seen when running compare_model_and_inst(). The values show that, unsurprisingly, the random data from the test model file does not agree well with the C/NOFS meridional E x B drifts.