Utilities

Convert

Routines to support extracting pysat.Instrument data as xarray.Datasets.

pysatModels.utils.convert.convert_pysat_to_xarray(inst)[source]

Extract data from a model Instrument as a Dataset with metadata.

Parameters

inst (pysat.Instrument) – Model instrument object

Returns

inst_data – Dataset from pysat Instrument object or None if there is no data

Return type

xarray.Dataset or NoneType

pysatModels.utils.convert.load_model_xarray(ftime, model_inst=None, filename=None)[source]

Load and extract data from a model Instrument at the specified time.

Parameters
  • ftime (dt.datetime) – Desired time for model Instrument input

  • model_inst (pysat.Instrument) – Model instrument object

  • filename (str or NoneType) – Model filename, if the file is not included in the Instrument file list, or a filename that requires time specification from ftime (default=None)

Returns

model_xarray – Dataset from pysat Instrument object or None if there is no data

Return type

xarray.Dataset or NoneType
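As an illustration of a filename that "requires time specification from ftime", the sketch below fills a template with datetime format codes. The template name is hypothetical, and it is an assumption here (not stated above) that the time is substituted with strftime-style codes.

```python
import datetime as dt

# Hypothetical filename template; the % codes are filled in from ftime.
filename_template = "model_run_%Y%m%d_%H%M.nc"

# Desired time for the model input.
ftime = dt.datetime(2019, 1, 1, 12, 30)

# Build the time-specific filename from the template.
filename = ftime.strftime(filename_template)
print(filename)
```

A template without any format codes passes through strftime unchanged, so the same call would work for a fixed filename.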

Match

Routines to match modelled and observational data.

pysatModels.utils.match.collect_inst_model_pairs(start, stop, tinc, inst, inst_download_kwargs=None, model_load_rout=<function load_model_xarray>, model_load_kwargs=None, inst_clean_rout=None, inst_lon_name=None, mod_lon_name=None, lon_pos='end', inst_name=None, mod_name=None, mod_datetime_name=None, mod_time_name=None, mod_units=None, sel_name=None, time_method='min', pair_method='closest', method='linear', model_label='model', comp_clean='clean')[source]

Pair instrument and model data.

Parameters
  • start (dt.datetime) – Starting datetime

  • stop (dt.datetime) – Ending datetime

  • tinc (dt.timedelta) – Time increment for model files

  • inst (pysat.Instrument) – Instrument object for which modelled data will be extracted

  • inst_download_kwargs (dict or NoneType) – Optional keyword arguments for downloading instrument data (default=None)

  • model_load_rout (func) – Routine to load model data into an xarray.Dataset, taking a datetime as its input argument and any other necessary data as keyword arguments. If the routine requires a time-dependent filename, ensure that the load routine uses the datetime input to construct the correct filename, as done in load_model_xarray. (default=load_model_xarray)

  • model_load_kwargs (dict or NoneType) – Keyword arguments for the model loading routine. (default=None)

  • inst_clean_rout (func) – Routine to clean the instrument data. (default=None)

  • inst_lon_name (str) – variable name for instrument longitude

  • mod_lon_name (str) – variable name for model longitude

  • lon_pos (str or int) – Accepts zero-offset integer for list order or ‘end’ (default=’end’)

  • inst_name (list or NoneType) – List of names of the data series to use for determining instrument location. (default=None)

  • mod_name (list or NoneType) – List of names of the data series to use for determining model locations in the same order as inst_name. These must make up a regular grid. (default=None)

  • mod_datetime_name (str) – Name of the data series in the model Dataset containing datetime info

  • mod_time_name (str) – Name of the time coordinate in the model Dataset

  • mod_units (list or NoneType) – Units for each of the mod_name location attributes. Currently supports: rad/radian(s), deg/degree(s), h/hr(s)/hour(s), m, km, and cm. (default=None)

  • sel_name (list or NoneType) – list of names of modelled data indices to append to instrument object, or None to append all modelled data (default=None)

  • time_method (str) – Pair data using larger (max) or smaller (min) of the smallest instrument/model time increments (default=’min’)

  • pair_method (str) – Find all relevant pairs (‘all’) or just the closest pairs (‘closest’). (default=’closest’)

  • method (str) – Interpolation method. Supported are ‘linear’, ‘nearest’, and ‘splinef2d’. The last is only supported for 2D data and is not recommended here. (default=’linear’)

  • model_label (str) – name of model, used to identify interpolated data values in instrument (default=”model”)

  • comp_clean (str) – Clean level for the comparison data (‘clean’, ‘dusty’, ‘dirty’, ‘none’) (default=’clean’)

Returns

matched_inst – Instrument object with observational data from inst and paired modelled data.

Return type

pysat.Instrument

Raises

ValueError – If input is incorrect

Note

Perform the data cleaning after finding the times and locations where the observations and model align.
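The start, stop, and tinc parameters together describe a sequence of model file times. The sketch below lists those times with the standard library only. The helper name is hypothetical, and treating stop as exclusive is an assumption; the documentation above does not say how the end point is handled.

```python
import datetime as dt


def model_file_times(start, stop, tinc):
    """Return datetimes from start up to, but not including, stop."""
    if tinc <= dt.timedelta(0):
        raise ValueError("tinc must be a positive time increment")

    times = []
    current = start
    while current < stop:
        times.append(current)
        current += tinc
    return times


start = dt.datetime(2019, 1, 1)
stop = dt.datetime(2019, 1, 2)
tinc = dt.timedelta(hours=6)

# Each entry is a time that could be handed to the model load routine.
ftimes = model_file_times(start, stop, tinc)
for ftime in ftimes:
    print(ftime.isoformat())
```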

Extract

Routines to extract observational-style data from model output.

pysatModels.utils.extract.extract_modelled_observations(inst, model, inst_name, mod_name, mod_datetime_name, mod_time_name, mod_units, sel_name=None, time_method='min', pair_method='closest', method='linear', model_label='model', model_units_attr='units')[source]

Extract instrument-aligned data from a modelled data set.

Parameters
  • inst (pysat.Instrument) – Instrument object for which modelled data will be extracted

  • model (xarray.Dataset) – Modelled data set

  • inst_name (array-like) – List of names of the data series to use for determining instrument location

  • mod_name (array-like) – List of names of the data series to use for determining model locations in the same order as inst_name. These must make up a regular grid.

  • mod_datetime_name (str) – Name of the data series in the model Dataset containing datetime info

  • mod_time_name (str) – Name of the time coordinate in the model Dataset

  • mod_units (list of strings) – Units for each of the mod_name location attributes. Currently supports: rad/radian(s), deg/degree(s), h/hr(s)/hour(s), m, km, and cm

  • sel_name (array-like or NoneType) – list of names of modelled data indices to append to instrument object, or None to append all modelled data (default=None)

  • time_method (str) – Pair data using larger (max) or smaller (min) of the smallest instrument/model time increments (default=’min’)

  • pair_method (str) – Find all relevant pairs (‘all’) or just the closest pairs (‘closest’). (default=’closest’)

  • method (str) – Interpolation method. Supported are ‘linear’, ‘nearest’, and ‘splinef2d’. The last is only supported for 2D data and is not recommended here. (default=’linear’)

  • model_label (str) – name of model, used to identify interpolated data values in instrument (default=”model”)

  • model_units_attr (str) – Attribute for model xarray values that contains units (default=’units’)

Returns

interp_data.keys() – List of keys of modelled data added to the instrument

Return type

list

Raises

ValueError – For incorrect input arguments

Notes

For best results, select clean instrument data after alignment with model
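One way to read the time_method option is sketched below: find the smallest time step in each data set, then keep the smaller or larger of the two. The helper names are hypothetical and this is only an interpretation of the parameter description, not the routine's actual code.

```python
import datetime as dt


def smallest_increment(times):
    """Return the smallest step between consecutive, sorted datetimes."""
    return min(later - earlier
               for earlier, later in zip(times[:-1], times[1:]))


def pairing_increment(inst_times, model_times, time_method="min"):
    """Choose the smaller or larger of the two smallest increments."""
    inst_inc = smallest_increment(inst_times)
    model_inc = smallest_increment(model_times)

    if time_method == "min":
        return min(inst_inc, model_inc)
    if time_method == "max":
        return max(inst_inc, model_inc)
    raise ValueError("unknown time_method: {:}".format(time_method))


# Instrument samples every second, model output every hour.
day = dt.datetime(2019, 1, 1)
inst_times = [day + dt.timedelta(seconds=i) for i in range(5)]
model_times = [day + dt.timedelta(hours=i) for i in range(3)]

print(pairing_increment(inst_times, model_times, "min"))
print(pairing_increment(inst_times, model_times, "max"))
```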

pysatModels.utils.extract.instrument_altitude_to_model_pressure(inst, model, inst_name, mod_name, mod_datetime_name, mod_time_name, mod_units, inst_alt, mod_alt, mod_alt_units, scale=100.0, inst_out_alt='model_altitude', inst_out_pres='model_pressure', tol=1.0)[source]

Interpolates altitude values onto model pressure levels.

Parameters
  • inst (pysat.Instrument) – Instrument object with observational data

  • model (xarray.Dataset) – Model data in xarray format

  • inst_name (array-like) – List of variable names containing the observational data coordinates at which the model data will be interpolated. Must be in the same order as mod_name.

  • mod_name (array-like) – list of names of the coordinates to use for determining model locations. Must be in the same order as mod_alt is stored within xarray. The coordinates must make up a regular grid.

  • mod_datetime_name (str) – Name of the data series in the model Dataset containing datetime info

  • mod_time_name (str) – Name of the time coordinate in the model Dataset

  • mod_units (list) – units for each of the mod_name location attributes. Currently supports: rad/radian(s), deg/degree(s), h/hr(s)/hour(s), m, km, and cm

  • inst_alt (str) – String identifier used in inst for the altitude variable

  • mod_alt (str) – Variable identifier for altitude data in the model, e.g., ‘ZG’ in standard TIEGCM files.

  • mod_alt_units (str) – units for the altitude variable. Currently supports: m, km, and cm

  • scale (float) – Scalar used to roughly translate a change in altitude with a change in pressure level, the scale height. Same units as used by inst. (default=100.)

  • inst_out_alt (str) – Label assigned to the model altitude data when attached to inst (default=’model_altitude’).

  • inst_out_pres (str) – Label assigned to the model pressure level when attached to inst (default=’model_pressure’).

  • tol (float) – Allowed difference between observed and modelled altitudes. Interpreted to have the same units as inst_alt (default=1.0).

Returns

[inst_out_alt, inst_out_pres] – List of keys corresponding to the modelled data that was added to the instrument.

Return type

list

Raises

ValueError – For incorrect input arguments

Notes

Uses an iterative regular grid interpolation to find the appropriate pressure level for the given input locations.
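The roles of scale and tol can be illustrated with a toy iteration. The sketch below is not the library routine: the update rule is an assumption, and a hypothetical analytic function stands in for the regular grid interpolation of model altitude at a given pressure level.

```python
def toy_model_altitude(level):
    """Hypothetical stand-in for model altitude (km) at a pressure level."""
    return 100.0 + 50.0 * level


def find_pressure_level(target_alt, alt_at_level, scale=100.0, tol=1.0,
                        start_level=0.0, max_iter=100):
    """Step the pressure level until the model altitude is within tol."""
    level = start_level
    for _ in range(max_iter):
        # Difference between the observed and current model altitude.
        diff = target_alt - alt_at_level(level)
        if abs(diff) <= tol:
            return level

        # Roughly translate the altitude difference to a level change.
        level += diff / scale

    raise ValueError("no pressure level found within tolerance")


level = find_pressure_level(300.0, toy_model_altitude)
print(level, toy_model_altitude(level))
```

In a scheme like this, a scale that is small compared to the true altitude change per pressure level can overshoot, which is one reason to keep an iteration limit.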

pysatModels.utils.extract.instrument_view_through_model(inst, model, inst_name, mod_name, mod_datetime_name, mod_time_name, mod_units, sel_name=None, methods=['linear'], model_label='model')[source]

Interpolates model values onto instrument locations.

Parameters
  • inst (pysat.Instrument) – Instrument object with observational data

  • model (xarray.Dataset) – Modelled data

  • inst_name (array-like) – List of variable names containing the observational data coordinates at which the model data will be interpolated. Do not include ‘time’, only spatial coordinates.

  • mod_name (array-like) – List of model dimension names used for organizing model data in the same order as inst_name. These must make up a regular grid. Do not include ‘time’, only spatial dimensions.

  • mod_datetime_name (str) – Name of the data series in the model Dataset containing datetime info.

  • mod_time_name (str) – Name of the time coordinate in the model Dataset.

  • mod_units (list) – Units for each of the mod_name location dimensions. Currently supports: rad/radian(s), deg/degree(s), h/hr(s)/hour(s), m, km, and cm

  • sel_name (array-like or NoneType) – List of names of modelled data indices to append to Instrument object, or None to append all modelled data. (default=None)

  • methods (list of str) – ‘linear’ interpolation or ‘nearest’ neighbor options for RegularGrid. Must supply an option for each variable. (default=[‘linear’])

  • model_label (str) – Name of model, used to identify interpolated data values in instrument (default=”model”)

Returns

interp_data.keys() – Keys of modelled data added to the instrument

Return type

Keys

Raises

ValueError – For incorrect input arguments

Note

Updates the inst Instrument with interpolated data from the model Instrument. The interpolation is performed via the RegularGridInterpolator for quick performance.

This method may require the use of a pre-processor on coordinate dimensions to ensure that a regular interpolation may actually be performed.

Models, such as TIEGCM, have a regular grid in pressure, not in altitude. To use this routine for TIEGCM please use instrument_altitude_to_model_pressure first to transform instrument altitudes to pressure levels suitable for this method.

Variables that vary exponentially in height may be approximated by taking a log before interpolating, though this also yields an exponential variation along the horizontal directions.
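The effect of taking a log before interpolating can be seen in one dimension. The sketch below uses a hypothetical exponential density profile and a hand-written linear interpolation; it does not call any pysatModels routine.

```python
import math


def linear_interp(x, x0, x1, y0, y1):
    """Linearly interpolate between the points (x0, y0) and (x1, y1)."""
    weight = (x - x0) / (x1 - x0)
    return y0 + weight * (y1 - y0)


def density(alt_km):
    """Hypothetical density that falls off exponentially with altitude."""
    return 1.0e12 * math.exp(-(alt_km - 100.0) / 50.0)


alt_lo, alt_hi, alt_mid = 100.0, 200.0, 150.0

# Interpolate the values directly.
direct = linear_interp(alt_mid, alt_lo, alt_hi,
                       density(alt_lo), density(alt_hi))

# Interpolate the log of the values, then undo the log.
via_log = math.exp(linear_interp(alt_mid, alt_lo, alt_hi,
                                 math.log(density(alt_lo)),
                                 math.log(density(alt_hi))))

truth = density(alt_mid)
print("direct relative error:", abs(direct - truth) / truth)
print("log relative error:", abs(via_log - truth) / truth)
```

For this purely exponential profile the log-space result recovers the true value to rounding error, while the direct interpolation overestimates it substantially.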

Expects units strings to have the units as the first word, if a long description is provided (e.g., ‘degrees’, ‘degrees North’, or ‘deg_N’ and not ‘geographic North degrees’)

See also

pysat.utils.scale_units
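The note about units strings can be checked before calling the routine. The sketch below pulls the leading word from a units string and compares it with the unit names listed under mod_units. The helper names are hypothetical, and splitting on underscores as well as whitespace is an assumption made so that ‘deg_N’ is treated the same way as ‘degrees North’.

```python
import re

# Unit names taken from the mod_units description above.
SUPPORTED_UNITS = {"rad", "radian", "radians", "deg", "degree", "degrees",
                   "h", "hr", "hrs", "hour", "hours", "m", "km", "cm"}


def leading_unit(units_string):
    """Return the first word of a units string, lower-cased."""
    words = re.split(r"[\s_]+", units_string.strip())
    return words[0].lower()


def has_supported_leading_unit(units_string):
    """Check whether the first word is one of the listed unit names."""
    return leading_unit(units_string) in SUPPORTED_UNITS


for units in ["degrees", "degrees North", "deg_N",
              "geographic North degrees"]:
    print(repr(units), has_supported_leading_unit(units))
```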

pysatModels.utils.extract.interp_inst_w_irregular_model_coord(inst, model, inst_name, mod_name, mod_datetime_name, mod_units, mod_reg_dim, mod_irreg_var, mod_var_delta, sel_name=None, model_label='model')[source]

Interpolate irregular-coordinate model data onto Instrument path.

Parameters
  • inst (pysat.Instrument) – pysat object that will receive interpolated data based upon position.

  • model (pysat.Instrument) – Xarray pysat Instrument with model data that will be interpolated onto the inst locations.

  • inst_name (list) – List of variable names containing the instrument data coordinates at which the model data will be interpolated. Do not include ‘time’, only spatial coordinates. Same ordering as used by mod_name.

  • mod_name (list) – List of names of the data dimensions used to organize model data, in the same order as inst_name. These dimensions must make up a regular grid. Values from mod_irreg_var will be used to replace one of these regular dimensions, mod_reg_dim, with irregular values.

  • mod_datetime_name (str) – Name of the data series in the model Dataset containing datetime info.

  • mod_units (list) – Units for each of the mod_name dimensions. Users must provide units for mod_irreg_var instead of the units for mod_reg_dim. Currently supports: rad/radian(s), deg/degree(s), h/hr(s)/hour(s), m, km, and cm.

  • mod_reg_dim (str) – Existing regular dimension name (must be in mod_name) used to organize model data that will be replaced with values from mod_irreg_var before performing interpolation.

  • mod_irreg_var (str) – Variable name in model used to define the irregular grid value locations along mod_reg_dim. Must have same dimensions as mod_name.

  • mod_var_delta (list) – List of delta values to be used when downselecting model values before interpolation, max(min(inst) - delta, min(model)) <= val <= min(max(inst) + delta, max(model)). Interpreted in the same order as mod_name.

  • sel_name (list) – List of strings denoting model variable names that will be interpolated onto inst. The coordinate dimensions for these variables must correspond to those in mod_irreg_var.

  • model_label (str) – Name of model, used to identify interpolated data values in instrument (default=”model”)

Returns

output_names – Keys of interpolated model data added to the instrument

Return type

list

Raises

ValueError – For incorrect input arguments

Notes

Expects units strings to have the units as the first word, if a long description is provided (e.g., ‘degrees’, ‘degrees North’, or ‘deg_N’ and not ‘geographic North degrees’).

See also

pysat.utils.scale_units
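The down-selection inequality given for mod_var_delta can be worked through for a single dimension. The sketch below uses made-up latitude values and a hypothetical helper name; it only evaluates the stated bounds and does not reproduce the routine itself.

```python
def downselect_bounds(inst_vals, model_vals, delta):
    """Return the lower and upper bounds from the stated inequality."""
    lower = max(min(inst_vals) - delta, min(model_vals))
    upper = min(max(inst_vals) + delta, max(model_vals))
    return lower, upper


# Made-up instrument latitudes and a regular 5 degree model grid.
inst_lat = [-12.0, -3.5, 4.0]
model_lat = [-90.0 + 5.0 * i for i in range(37)]

lower, upper = downselect_bounds(inst_lat, model_lat, delta=5.0)

# Keep only the model values that satisfy lower <= val <= upper.
kept = [val for val in model_lat if lower <= val <= upper]
print(lower, upper, kept)
```

The outer max and min keep the bounds inside the model grid, so a delta that reaches past the grid edge simply selects up to that edge.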

Compare

Routines to align and work with pairs of modelled and observational data.

pysatModels.utils.compare.compare_model_and_inst(pairs, inst_name, mod_name, methods=['all'], unit_label='units')[source]

Compare modelled and measured data.

Parameters
  • pairs (xarray.Dataset) – Dataset containing only the desired observation-model data pairs

  • inst_name (list) – Ordered list of strings indicating which instrument measurements to compare to modelled data

  • mod_name (list) – Ordered list of strings indicating which modelled data to compare to instrument measurements

  • methods (list) – Statistics to calculate. See Notes for accepted inputs. (default=[‘all’])

  • unit_label (str) – Unit attribute for data in pairs (default=’units’)

Returns

  • stat_dict (dict) – Dict of dicts where the first layer of keys denotes the instrument data name and the second layer provides the desired statistics

  • data_units (dict) – Dict containing the units for the data

Raises

ValueError – If input parameters are improperly formatted

See also

PyForecastTools

Notes

Statistics are calculated using PyForecastTools (imported as verify).

  1. all: all statistics

  2. all_bias: bias, meanPercentageError, medianLogAccuracy, symmetricSignedBias

  3. accuracy: returns dict with mean squared error, root mean squared error, mean absolute error, and median absolute error

  4. scaledAccuracy: returns dict with normalized root mean squared error, mean absolute scaled error, mean absolute percentage error, median absolute percentage error, median symmetric accuracy

  5. bias: scale-dependent bias as measured by the mean error

  6. meanPercentageError: mean percentage error

  7. medianLogAccuracy: median of the log accuracy ratio

  8. symmetricSignedBias: Symmetric signed bias, as a percentage

  9. meanSquaredError: mean squared error

  10. RMSE: root mean squared error

  11. meanAbsError: mean absolute error

  12. medAbsError: median absolute error

  13. nRMSE: normalized root mean squared error

  14. scaledError: scaled error (see PyForecastTools for references)

  15. MASE: mean absolute scaled error

  16. forecastError: forecast error (see PyForecastTools for references)

  17. percError: percentage error

  18. absPercError: absolute percentage error

  19. logAccuracy: log accuracy ratio

  20. medSymAccuracy: Scaled measure of accuracy

  21. meanAPE: mean absolute percentage error

  22. medAPE: median absolute percentage error
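A few of the listed statistics can be written out by hand to show what they measure. The sketch below uses the standard library rather than PyForecastTools. The sign convention (model minus observation) is an assumption, and the variable name used as the outer key is hypothetical.

```python
import math
import statistics


def basic_stats(modelled, observed):
    """Compute a handful of the listed statistics for paired values."""
    # Assumed convention: error is model minus observation.
    errors = [mod - obs for mod, obs in zip(modelled, observed)]
    abs_errors = [abs(err) for err in errors]
    mean_sq = statistics.mean([err ** 2 for err in errors])

    return {"bias": statistics.mean(errors),
            "meanSquaredError": mean_sq,
            "RMSE": math.sqrt(mean_sq),
            "meanAbsError": statistics.mean(abs_errors),
            "medAbsError": statistics.median(abs_errors)}


modelled = [2.0, 4.0, 6.0, 8.0]
observed = [1.0, 5.0, 5.0, 9.0]

# Dict of dicts: instrument data name first, then the statistic name.
stat_dict = {"density": basic_stats(modelled, observed)}
for stat_name, value in stat_dict["density"].items():
    print(stat_name, value)
```

In this made-up example the errors alternate in sign, so the bias is zero even though every pair differs by one unit, which is why the accuracy measures are reported alongside it.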

Model Methods

General

General functions for model instruments.

pysatModels.models.methods.general.clean(inst)[source]

Raise a low-level log message about lack of cleaning.

pysatModels.models.methods.general.download_test_data(remote_url, remote_file, data_path, test_date=None, format_str=None)[source]

Download test data from an online repository.

Parameters
  • remote_url (str) – URL of the target repository, including the path to the test file

  • remote_file (str) – Remote file name

  • data_path (str) – Path to directory where local file will be stored

  • test_date (dt.datetime or NoneType) – Datetime to which the test file will be assigned; it does not need to correspond to the test model run time. Only used if format_str is also provided. (default=None)

  • format_str (str or NoneType) – Format string to construct a pysat-compatible filename or None to not change the filename (default=None)

Note

This routine is invoked by pysat and is not intended for direct use by the end user.

The test object generates the datetime requested by the user, which may not match the date of the model run.
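To show how format_str and test_date could combine into a local filename, the sketch below fills a hypothetical template with date fields. The template, the field names, and the use of str.format are all assumptions for illustration; the documentation above does not specify how the filename is constructed.

```python
import datetime as dt

# Hypothetical format string with year, month, and day fields.
format_str = "test_model_{year:04d}-{month:02d}-{day:02d}.nc"

# Date assigned to the test file, independent of the model run time.
test_date = dt.datetime(2019, 1, 1)

# Fill the template from the assigned date.
local_name = format_str.format(year=test_date.year,
                               month=test_date.month,
                               day=test_date.day)
print(local_name)
```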