# API Reference¶

Here we provide the reference documentation for aospy’s public API. If you are new to the package and/or just trying to get a feel for the overall workflow, you are better off starting in the Overview, Using aospy, or Examples sections of this documentation.

Warning

aospy is under active development. While we strive to maintain backwards compatibility, it is likely that some breaking changes to the codebase will occur in the future as aospy is improved.

## Core Hierarchy for Input Data¶

aospy provides three classes for specifying the location and characteristics of data saved on disk as netCDF files that the user wishes to use as input data for aospy calculations: Proj, Model, and Run.

### Proj¶

class aospy.proj.Proj(name, description=None, models=None, default_models=None, regions=None, direc_out='', tar_direc_out='')[source]

An object that describes a single project that will use aospy.

This is the top-level class in the aospy hierarchy of data representations. It is meant to contain all of the Model, Run, and Region objects that are of relevance to a particular research project. (Any of these may be used by multiple Proj objects.)

The Proj class itself provides little functionality, but it is an important means of organizing a user’s work across different projects. In particular, the output of all calculations using aospy.Calc are saved in a directory structure whose root is that of the Proj object specified for that calculation.

Attributes

- name (str): The project's name.
- description (str): A description of the project.
- direc_out, tar_direc_out (str): The paths to the root directories of, respectively, the standard and .tar versions of the output of aospy calculations saved to disk.
- models (dict): A dictionary with entries of the form {model_obj.name: model_obj}, for each of this Proj's child Model objects.
- default_models (dict): The default Model objects on which to perform calculations via aospy.Calc if not otherwise specified.
- regions (dict): A dictionary with entries of the form {region_obj.name: region_obj}, for each of this Proj's child Region objects.
__init__(name, description=None, models=None, default_models=None, regions=None, direc_out='', tar_direc_out='')[source]
Parameters:

- name : str
  The project's name. This should be unique from that of any other Proj objects being used.
- description : str, optional
  A description of the project. This is not used internally by aospy; it is solely for the user's information.
- regions : {None, sequence of aospy.Region objects}, optional
  The desired regions over which to perform region-average calculations.
- models : {None, sequence of aospy.Model objects}, optional
  The child Model objects of this project.
- default_models : {None, sequence of aospy.Model objects}, optional
  The subset of this project's models on which to perform calculations by default.
- direc_out, tar_direc_out : str
  Paths to the root directories where, respectively, regular output and a .tar version of the output will be saved to disk.

See also: aospy.Model, aospy.Region, aospy.Run
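The dictionary-of-children structure described in the Attributes above can be sketched in plain Python. The classes here are minimal stand-ins for illustration only, not aospy's actual implementation:

```python
class Model:
    # Minimal stand-in for aospy.Model: just a named object.
    def __init__(self, name):
        self.name = name

class Proj:
    # Minimal stand-in for aospy.Proj: child models are stored in a
    # dict keyed by each child's name, as in the `models` attribute above.
    def __init__(self, name, models=()):
        self.name = name
        self.models = {m.name: m for m in models}

proj = Proj('example_proj', models=[Model('am2'), Model('am3')])
print(sorted(proj.models))  # ['am2', 'am3']
```

This name-keyed layout is what lets aospy later look objects up by their string names.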

### Model¶

class aospy.model.Model(name=None, description=None, proj=None, grid_file_paths=None, default_start_date=None, default_end_date=None, runs=None, default_runs=None, load_grid_data=False)[source]

An object that describes a single climate or weather model.

Each Model object is associated with a parent Proj object and also with one or more child Run objects.

If aospy is being used to work with non climate- or weather-model data, the Model object can be used e.g. to represent a gridded observational product, with its child Run objects representing different released versions of that dataset.

Attributes

- name (str): The model's name.
- description (str): A description of the model.
- proj ({None, aospy.Proj}): The model's parent aospy.Proj object.
- runs (list): A list of this model's child Run objects.
- default_runs (list): The default subset of child Run objects on which to perform calculations via aospy.Calc with this model if not otherwise specified.
- grid_file_paths (list): The paths to netCDF files stored on disk from which the model's coordinate data can be taken.
- default_start_date, default_end_date (datetime.datetime): The default start and end dates of any calculations using this Model.
__init__(name=None, description=None, proj=None, grid_file_paths=None, default_start_date=None, default_end_date=None, runs=None, default_runs=None, load_grid_data=False)[source]
Parameters:

- name : str
  The model's name. This must be unique from that of any other Model objects being used by the parent Proj.
- description : str, optional
  A description of the model. This is not used internally by aospy; it is solely for the user's information.
- proj : {None, aospy.Proj}, optional
  The parent Proj object. When the parent Proj object is instantiated with this Model included in its models attribute, this will be overwritten with that Proj object.
- grid_file_paths : {None, sequence of strings}, optional
  The paths to netCDF files stored on disk from which the model's coordinate data can be taken.
- default_start_date, default_end_date : {None, datetime.datetime}, optional
  Default start and end dates of calculations to be performed using this Model.
- runs : {None, sequence of aospy.Run objects}, optional
  The child Run objects of this Model.
- default_runs : {None, sequence of aospy.Run objects}, optional
  The subset of this Model's runs over which to perform calculations by default.
- load_grid_data : bool, optional (default False)
  Whether or not to load the grid data specified by grid_file_paths upon initialization.

See also: aospy.DataLoader, aospy.Proj, aospy.Run

set_grid_data()[source]

Populate the attrs that hold grid data.

### Run¶

class aospy.run.Run(name=None, description=None, proj=None, default_start_date=None, default_end_date=None, data_loader=None)[source]

An object that describes a single model ‘run’ (i.e. simulation).

Each Run object is associated with a parent Model object. This parent attribute is not set by Run itself, however; it is set during the instantiation of the parent Model object.

If aospy is being used to work with non climate-model data, the Run object can be used e.g. to represent different versions of a gridded observational data product, with the parent Model representing that data product more generally.

Attributes

- name (str): The run's name.
- description (str): A description of the run.
- proj ({None, aospy.Proj}): The run's parent aospy.Proj object.
- default_start_date, default_end_date (datetime.datetime): The default start and end dates of any calculations using this Run.
- data_loader (aospy.DataLoader): The aospy.DataLoader object used to find data on disk corresponding to this Run object.
__init__(name=None, description=None, proj=None, default_start_date=None, default_end_date=None, data_loader=None)[source]

Instantiate a Run object.

Parameters:

- name : str
  The run's name. This must be unique from that of any other Run objects being used by the parent Model.
- description : str, optional
  A description of the run. This is not used internally by aospy; it is solely for the user's information.
- proj : {None, aospy.Proj}, optional
  The parent Proj object.
- data_loader : aospy.DataLoader
  The DataLoader object used to find the data on disk to be used as inputs for aospy calculations for this run.
- default_start_date, default_end_date : datetime.datetime, optional
  Default start and end dates of calculations to be performed using this Run.

See also: aospy.DataLoader, aospy.Model

Run objects rely on a helper “data loader” to specify how to find their underlying data that is saved on disk. This mapping of variables, time ranges, and potentially other parameters to the location of the corresponding data on disk can differ among modeling centers or even between different models at the same center.
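As a schematic illustration of such a mapping (plain Python with invented names, not an actual aospy interface), a data loader conceptually resolves a variable and time frequency to file paths:

```python
# Hypothetical mapping from (variable, time frequency) to file paths.
# Real DataLoaders encapsulate logic like this; all names here are
# illustrative, and the real classes differ between modeling centers.
data_map = {
    ('precip', 'monthly'): ['/data/run1/precip.mon.nc'],
    ('precip', '3hr'): ['/data/run1/precip.3hr.nc'],
}

def find_files(variable, intvl_in):
    # Look up the files holding `variable` at time resolution `intvl_in`.
    return data_map[(variable, intvl_in)]

print(find_files('precip', 'monthly'))  # ['/data/run1/precip.mon.nc']
```

The DataLoader classes below differ mainly in how much structure this lookup has.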

Currently supported data loader types are DictDataLoader, NestedDictDataLoader, and GFDLDataLoader. Each of these inherits from the abstract base DataLoader class.

class aospy.data_loader.DataLoader[source]

__init__()

Initialize self. See help(type(self)) for accurate signature.

load_variable(var=None, start_date=None, end_date=None, time_offset=None, **DataAttrs)[source]

Load a DataArray for requested variable and time range.

Automatically renames all grid attributes to match aospy conventions.

Parameters:

- var : Var
  aospy Var object.
- start_date : datetime.datetime
  Start date for the interval.
- end_date : datetime.datetime
  End date for the interval.
- time_offset : dict
  Option to add a time offset to the time coordinate to correct for incorrect metadata.
- **DataAttrs
  Attributes needed to identify a unique set of files to load from.

Returns:

- da : DataArray
  DataArray for the specified variable, date range, and interval.
class aospy.data_loader.DictDataLoader(file_map=None, preprocess_func=<function DictDataLoader.<lambda>>)[source]

A DataLoader that uses a dict mapping string tags to lists of files.

This is the simplest DataLoader; it is useful for instance if one is dealing with raw model history files, which tend to group all variables of a single output interval into single filesets. The intvl_in parameter is a string description of the time frequency of the data one is referencing (e.g. ‘monthly’, ‘daily’, ‘3-hourly’). In principle, one can give it any string value.

Parameters:

- file_map : dict
  A dict mapping an input interval to a list of files.
- preprocess_func : function, optional
  A function to apply to every Dataset before processing in aospy. Must take a Dataset and **kwargs as its two arguments.

Examples

Case of two sets of files, one with monthly average output, and one with 3-hourly output.

>>> file_map = {'monthly': '000[4-6]0101.atmos_month.nc',
...             '3hr': '000[4-6]0101.atmos_8xday.nc'}


If one wanted to correct a CF-incompliant units attribute on each Dataset read in, which depended on the intvl_in of the fileset, one could define a preprocess_func that took into account the intvl_in keyword argument.

>>> def preprocess(ds, **kwargs):
...     if kwargs['intvl_in'] == 'monthly':
...         ds['time'].attrs['units'] = 'days since 0001-01-0000'
...     if kwargs['intvl_in'] == '3hr':
...         ds['time'].attrs['units'] = 'hours since 0001-01-0000'
...     return ds
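To see what such a preprocess_func does, it can be exercised with a minimal stand-in for an xarray Dataset. The stub classes here are purely illustrative; in practice aospy passes real xarray.Dataset objects:

```python
class StubVariable:
    # Minimal stand-in for an xarray variable: just an attrs dict.
    def __init__(self):
        self.attrs = {}

class StubDataset:
    # Minimal stand-in for an xarray.Dataset holding a 'time' variable.
    def __init__(self):
        self._vars = {'time': StubVariable()}
    def __getitem__(self, key):
        return self._vars[key]

def preprocess(ds, **kwargs):
    # Same logic as the example above: fix the units attribute
    # depending on the fileset's time frequency.
    if kwargs['intvl_in'] == 'monthly':
        ds['time'].attrs['units'] = 'days since 0001-01-0000'
    if kwargs['intvl_in'] == '3hr':
        ds['time'].attrs['units'] = 'hours since 0001-01-0000'
    return ds

ds = preprocess(StubDataset(), intvl_in='monthly')
print(ds['time'].attrs['units'])  # days since 0001-01-0000
```

The DictDataLoader calls the function with the fileset's intvl_in forwarded in **kwargs, which is what makes this branching possible.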

__init__(file_map=None, preprocess_func=<function DictDataLoader.<lambda>>)[source]

class aospy.data_loader.NestedDictDataLoader(file_map=None, preprocess_func=<function NestedDictDataLoader.<lambda>>)[source]

This is the most flexible existing type of DataLoader; it allows for the specification of different sets of files for different variables. The intvl_in parameter is a string description of the time frequency of the data one is referencing (e.g. ‘monthly’, ‘daily’, ‘3-hourly’). In principle, one can give it any string value. The variable name can be any variable name in your aospy object library (including alternative names).

Parameters:

- file_map : dict
  A dict mapping intvl_in to dictionaries mapping Var objects to lists of files.
- preprocess_func : function, optional
  A function to apply to every Dataset before processing in aospy. Must take a Dataset and **kwargs as its two arguments.

Examples

Case of a set of monthly average files for large-scale precipitation, and another set of monthly average files for convective precipitation.

>>> file_map = {'monthly': {'precl': '000[4-6]0101.precl.nc',
...                         'precc': '000[4-6]0101.precc.nc'}}
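The nested structure means file lookup is a two-level dict access, first by intvl_in and then by variable name. A plain-Python sketch of that lookup:

```python
# Same nested mapping as the example above; in real use the inner keys
# would be aospy Var objects (or their names), not bare strings.
file_map = {'monthly': {'precl': '000[4-6]0101.precl.nc',
                        'precc': '000[4-6]0101.precc.nc'}}

def files_for(file_map, intvl_in, var_name):
    # First key selects the time frequency, second key the variable.
    return file_map[intvl_in][var_name]

print(files_for(file_map, 'monthly', 'precc'))  # 000[4-6]0101.precc.nc
```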


See aospy.data_loader.DictDataLoader for an example of a possible function to pass as a preprocess_func.

__init__(file_map=None, preprocess_func=<function NestedDictDataLoader.<lambda>>)[source]

class aospy.data_loader.GFDLDataLoader(template=None, data_direc=None, data_dur=None, data_start_date=None, data_end_date=None, preprocess_func=<function GFDLDataLoader.<lambda>>)[source]

DataLoader for NOAA GFDL model output

This is an example of a domain-specific custom DataLoader, designed specifically for finding files output by the Geophysical Fluid Dynamics Laboratory’s model history file post-processing tools.

Parameters:

- template : GFDLDataLoader, optional
  Optional argument to specify a base GFDLDataLoader from which to inherit parameters.
- data_direc : str
  Root directory of data files.
- data_dur : int
  Number of years included per post-processed file.
- data_start_date : datetime.datetime
  Start date of data files.
- data_end_date : datetime.datetime
  End date of data files.
- preprocess_func : function, optional
  A function to apply to every Dataset before processing in aospy. Must take a Dataset and **kwargs as its two arguments.

Examples

Case without a template to start from.

>>> base = GFDLDataLoader(data_direc='/archive/control/pp', data_dur=5,
...                       data_start_date=datetime(2000, 1, 1),
...                       data_end_date=datetime(2010, 12, 31))


Case with a starting template.

>>> data_loader = GFDLDataLoader(base, data_direc='/archive/2xCO2/pp')


See aospy.data_loader.DictDataLoader for an example of a possible function to pass as a preprocess_func.
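The template mechanism can be thought of as "start from the template's settings, then override any explicitly passed values". A schematic sketch of that merge, not the actual GFDLDataLoader implementation:

```python
def merge_with_template(template_params, **overrides):
    # Start from the template's parameters, then apply any override
    # that was explicitly passed (i.e. is not None).
    merged = dict(template_params)
    merged.update({k: v for k, v in overrides.items() if v is not None})
    return merged

base = {'data_direc': '/archive/control/pp', 'data_dur': 5}
derived = merge_with_template(base, data_direc='/archive/2xCO2/pp')
print(derived)  # {'data_direc': '/archive/2xCO2/pp', 'data_dur': 5}
```

This mirrors the example above: the 2xCO2 loader keeps data_dur and the date settings of the base loader while pointing at a different directory.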

__init__(template=None, data_direc=None, data_dur=None, data_start_date=None, data_end_date=None, preprocess_func=<function GFDLDataLoader.<lambda>>)[source]

## Variables and Regions¶

The Var and Region classes are used to represent, respectively, physical quantities the user wishes to be able to compute and geographical regions over which the user wishes to aggregate their calculations.

Whereas the Proj - Model - Run hierarchy is used to describe the data resulting from particular model simulations, Var and Region represent the properties of generic physical entities that do not depend on the underlying data.

### Var¶

class aospy.var.Var(name, alt_names=None, func=None, variables=None, func_input_dtype='DataArray', units='', plot_units='', plot_units_conv=1, domain='atmos', description='', def_time=False, def_vert=False, def_lat=False, def_lon=False, math_str=False, colormap='RdBu_r', valid_range=None)[source]

An object representing a physical quantity to be computed.

Attributes

- name (str): The variable's name.
- alt_names (tuple of strings): All other names by which the variable may be referred to in the input data.
- names (tuple of strings): The combination of name and alt_names.
- description (str): A description of the variable.
- func (function): The function with which to compute the variable.
- variables (sequence of aospy.Var objects): The variables passed to func to compute it.
- func_input_dtype ({'DataArray', 'Dataset', 'numpy'}): The datatype expected by func of its arguments.
- units (aospy.units.Units object): The variable's physical units.
- domain (str): The physical domain of the variable, e.g. 'atmos', 'ocean', or 'land'.
- def_time, def_vert, def_lat, def_lon (bool): Whether the variable is defined, respectively, in time, vertically, in latitude, and in longitude.
- math_str (str): The mathematical representation of the variable.
- colormap (str): The name of the default colormap to be used in plots of this variable.
- valid_range (length-2 tuple): The range of values outside which to flag as unphysical/erroneous.
__init__(name, alt_names=None, func=None, variables=None, func_input_dtype='DataArray', units='', plot_units='', plot_units_conv=1, domain='atmos', description='', def_time=False, def_vert=False, def_lat=False, def_lon=False, math_str=False, colormap='RdBu_r', valid_range=None)[source]

Instantiate a Var object.

Parameters:

- name : str
  The variable's name.
- alt_names : tuple of strings
  All other names by which the variable might be referred to in any input data. Each of these should be unique to this variable in order to avoid loading the wrong quantity.
- description : str
  A description of the variable.
- func : function
  The function with which to compute the variable.
- variables : sequence of aospy.Var objects
  The variables passed to func to compute it. Order matters: whenever calculations are performed to generate data corresponding to this Var, the data corresponding to the elements of variables will be passed to self.function in the same order.
- func_input_dtype : {None, 'DataArray', 'Dataset', 'numpy'}
  The datatype expected by func of its arguments.
- units : aospy.units.Units object
  The variable's physical units.
- domain : str
  The physical domain of the variable, e.g. 'atmos', 'ocean', or 'land'. This is used by aospy only in some types of DataLoader, including GFDLDataLoader.
- def_time, def_vert, def_lat, def_lon : bool
  Whether the variable is defined, respectively, in time, vertically, in latitude, and in longitude.
- math_str : str
  The mathematical representation of the variable. This is typically a raw string of LaTeX math-mode, e.g. r'$T_\mathrm{sfc}$' for surface temperature.
- colormap : str
  (Currently not used by aospy) The name of the default colormap to be used in plots of this variable.
- valid_range : length-2 tuple
  The range of values outside which to flag as unphysical/erroneous.
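The func/variables pattern amounts to defining an ordinary function whose arguments correspond, in order, to the Var objects listed in variables. For instance, total precipitation as the sum of its large-scale and convective components might use a function like this (the names are illustrative, and in real use the arguments are xarray DataArrays rather than scalars):

```python
def total_precip(precl, precc):
    """Total precipitation: large-scale plus convective components."""
    return precl + precc

# With aospy, this function would be attached to a Var roughly as:
#   Var(name='total_precip', func=total_precip,
#       variables=(precl_var, precc_var), ...)
# where precl_var and precc_var are previously defined Var objects
# (hypothetical names, shown only to indicate the pattern).
print(total_precip(1.5, 0.5))  # 2.0
```

Because order matters, swapping the entries of variables would swap which data array is bound to which argument.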
mask_unphysical(data)[source]

Mask data array where values are outside physically valid range.

to_plot_units(data, dtype_vert=False)[source]

Convert the given data to plotting units.

### Region¶

class aospy.region.Region(name='', description='', lon_bounds=[], lat_bounds=[], mask_bounds=[], do_land_mask=False)[source]

Geographical region over which to perform averages and other reductions.

Each Proj object includes a list of Region objects, which Calc uses to determine the regions over which to perform time reductions of region-average quantities.

Region boundaries are specified as either a single “rectangle” in latitude and longitude or the union of multiple such rectangles. In addition, a land or ocean mask can be applied.

See also: aospy.Calc.region_calcs

Attributes

- name (str): The region's name.
- description (str): A description of the region.
- mask_bounds (tuple): The coordinates defining the lat-lon rectangle(s) that define(s) the region's boundaries.
- do_land_mask: Whether values occurring over land, ocean, or neither are excluded from the region, and whether the included points must be strictly land or ocean or if fractional land/ocean values are included.
__init__(name='', description='', lon_bounds=[], lat_bounds=[], mask_bounds=[], do_land_mask=False)[source]

Instantiate a Region object.

If a region spans across the endpoint of the data’s longitude array (i.e. it crosses the Prime Meridian for data with longitudes spanning 0 to 360), it must be defined as the union of two sections extending to the east and to the west of the Prime Meridian.

Examples

Define a region spanning the entire globe.

>>> globe = Region(name='globe', lat_bounds=(-90, 90),
...                lon_bounds=(0, 360))


Define a region corresponding to the Sahel region of Africa, which we’ll define as land points within 10N-20N latitude and 18W-40E longitude. Because this region crosses the 0 degrees longitude point, it has to be defined using mask_bounds as the union of two lat-lon rectangles.

>>> sahel = Region(name='sahel', do_land_mask=True,
...                mask_bounds=[((10, 20), (0, 40)),
...                             ((10, 20), (342, 360))])
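The 342 in the Sahel example comes from expressing 18W on a 0-360 longitude grid; the conversion is a simple modulo operation:

```python
def to_0_360(lon):
    # Convert a longitude in degrees East (negative values = West)
    # to the equivalent value on a 0-360 grid.
    return lon % 360

print(to_0_360(-18))  # 342
print(to_0_360(40))   # 40
```

Since the region crosses 0 degrees on such a grid, it must be split into the (0, 40) and (342, 360) rectangles, as described above.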

av(data)[source]

Time average of region-average time-series.

mask_var(data)[source]

Mask the data of the given variable outside the region.

std(data)[source]

Standard deviation of region-average time-series.

ts(data)[source]

Create time-series of region-average data.

## Calculations¶

Calc is the engine that combines the user's specifications of (1) the data on disk via Proj, Model, and Run, (2) the physical quantity to compute and regions to aggregate over via Var and Region, and (3) the desired date range, time reduction method, and other characteristics, to actually perform the calculation.

Whereas Proj, Model, Run, Var, and Region are all intended to be saved in .py files for reuse, Calc objects are intended to be generated dynamically by a main script and then not retained after they have written their outputs to disk following the user’s specifications.

Moreover, if the main.py script is used to execute calculations, the user does not need to interface directly with Calc or its helper class, CalcInterface, in which case this section can be skipped entirely.

Also included is the automate module, which enables aospy (e.g. in the main script) to find objects in the user's object library via their string names, rather than requiring the user to import the objects themselves.

### CalcInterface and Calc¶

class aospy.calc.CalcInterface(proj=None, model=None, run=None, ens_mem=None, var=None, date_range=None, region=None, intvl_in=None, intvl_out=None, dtype_in_time=None, dtype_in_vert=None, dtype_out_time=None, dtype_out_vert=None, level=None, time_offset=None)[source]

Interface to the Calc class.

__init__(proj=None, model=None, run=None, ens_mem=None, var=None, date_range=None, region=None, intvl_in=None, intvl_out=None, dtype_in_time=None, dtype_in_vert=None, dtype_out_time=None, dtype_out_vert=None, level=None, time_offset=None)[source]

Instantiate a CalcInterface object.

Parameters:

- proj : aospy.Proj object
  The project for this calculation.
- model : aospy.Model object
  The model for this calculation.
- run : aospy.Run object
  The run for this calculation.
- var : aospy.Var object
  The variable for this calculation.
- ens_mem
  Currently not supported. This will eventually be used to specify particular ensemble members of multi-member ensemble simulations.
- region : sequence of aospy.Region objects
  The region(s) over which any regional reductions will be performed.
- date_range : tuple of datetime.datetime objects
  The range of dates over which to perform calculations.
- intvl_in : {None, 'annual', 'monthly', 'daily', '6hr', '3hr'}, optional
  The time resolution of the input data.
- dtype_in_time : {None, 'inst', 'ts', 'av', 'av_ts'}, optional
  What the time axis of the input data represents:
  - 'inst' : Timeseries of instantaneous values
  - 'ts' : Timeseries of averages over the period of each time-index
  - 'av' : A single value averaged over a date range
- dtype_in_vert : {None, 'pressure', 'sigma'}, optional
  The vertical coordinate system used by the input data:
  - None : not defined vertically
  - 'pressure' : pressure coordinates
  - 'sigma' : hybrid sigma-pressure coordinates
- intvl_out : {'ann', season-string, month-integer}
  The sub-annual time interval over which to compute:
  - 'ann' : Annual mean
  - season-string : e.g. 'JJA' for June-July-August
  - month-integer : 1 for January, 2 for February, etc.
- dtype_out_time : tuple
  Elements must be one or more of the following. Gridpoint-by-gridpoint output:
  - 'av' : Gridpoint-by-gridpoint time-average
  - 'std' : Gridpoint-by-gridpoint temporal standard deviation
  - 'ts' : Gridpoint-by-gridpoint time-series
  Averages over each region specified via region:
  - 'reg.av', 'reg.std', 'reg.ts' : analogous to 'av', 'std', 'ts'
- dtype_out_vert : {None, 'vert_av', 'vert_int'}, optional
  How to reduce the data vertically:
  - None : no vertical reduction (i.e. output is defined vertically)
  - 'vert_av' : mass-weighted vertical average
  - 'vert_int' : mass-weighted vertical integral
- time_offset : {None, dict}, optional
  How to offset input data in time to correct for metadata errors:
  - None : no time offset applied
  - dict : e.g. {'hours': -3} to offset times by -3 hours
  See aospy.utils.times.apply_time_offset().
class aospy.calc.Calc(calc_interface)[source]

Calc objects are instantiated with a single argument: a CalcInterface object that includes all of the parameters necessary to determine what calculations to perform.

__init__(calc_interface)[source]
ARR_XARRAY_NAME = 'aospy_result'
compute(write_to_tar=True)[source]

Perform all desired calculations on the data and save externally.

load(dtype_out_time, dtype_out_vert=False, region=False, time=False, vert=False, lat=False, lon=False, plot_units=False, mask_unphysical=False)[source]

Load the data from the object if possible or from disk.

region_calcs(arr, func)[source]

Perform a calculation for all regions.

save(data, dtype_out_time, dtype_out_vert=False, save_files=True, write_to_tar=False)[source]

Save aospy data to data_out attr and to an external file.

### automate¶

Functionality for specifying and cycling through multiple calculations.

exception aospy.automate.AospyException[source]

Base exception class for the aospy package.

class aospy.automate.CalcSuite(calc_suite_specs)[source]

Suite of Calc objects generated from provided specifications.

create_calcs()[source]

Generate a Calc object for each requested parameter combination.

aospy.automate.submit_mult_calcs(calc_suite_specs, exec_options=None)[source]

Generate and execute all specified computations.

Once the calculations are prepped and submitted for execution, any calculation that triggers any exception or error is skipped, and the rest of the calculations proceed unaffected. This prevents an error in a single calculation from crashing a large suite of calculations.
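This isolation behavior amounts to wrapping each computation in its own exception handler. The following is a schematic sketch only, not the actual implementation (which also supports parallel execution via dask):

```python
def run_all(calcs):
    # Execute each calculation in turn; a failure yields None for that
    # entry and does not affect the remaining calculations.
    results = []
    for calc in calcs:
        try:
            results.append(calc())
        except Exception:
            results.append(None)
    return results

outcomes = run_all([lambda: 42, lambda: 1 / 0, lambda: 'ok'])
print(outcomes)  # [42, None, 'ok']
```

This matches the return-value convention described below: successful calculations yield their result, and any calculation that errored yields None.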

Parameters:

- calc_suite_specs : dict
  The specifications describing the full set of calculations to be generated and potentially executed. Accepted keys and their values:
  - library : module or package comprising an aospy object library
    The aospy object library for these calculations.
  - projects : list of aospy.Proj objects
    The projects to permute over.
  - models : 'all', 'default', or list of aospy.Model objects
    The models to permute over. If 'all', use all models in the models attribute of each Proj. If 'default', use all models in the default_models attribute of each Proj.
  - runs : 'all', 'default', or list of aospy.Run objects
    The runs to permute over. If 'all', use all runs in the runs attribute of each Model. If 'default', use all runs in the default_runs attribute of each Model.
  - variables : list of aospy.Var objects
    The variables to be calculated.
  - regions : 'all' or list of aospy.Region objects
    The region(s) over which any regional reductions will be performed. If 'all', use all regions in the regions attribute of each Proj.
  - date_ranges : 'default' or tuple of datetime.datetime objects
    The range of dates (inclusive) over which to perform calculations. If 'default', use the default_start_date and default_end_date attributes of each Run.
  - output_time_intervals : {'ann', season-string, month-integer}
    The sub-annual time interval over which to aggregate: 'ann' for the annual mean, a season string such as 'JJA' for June-July-August, or a month integer (1 for January, 2 for February, etc.). Each one is a separate reduction; e.g. [1, 2] would produce averages (or another specified time reduction) over all Januaries and, separately, over all Februaries.
  - output_time_regional_reductions : list of reduction string identifiers
    Unlike most other keys, these are not permuted over when creating the aospy.Calc objects that execute the calculations; each aospy.Calc performs all of the specified reductions. Accepted string identifiers are, for gridpoint-by-gridpoint output, 'av' (time-average), 'std' (temporal standard deviation), and 'ts' (time-series); and, for averages over each region specified via regions, 'reg.av', 'reg.std', and 'reg.ts', analogous to 'av', 'std', and 'ts'.
  - output_vertical_reductions : {None, 'vert_av', 'vert_int'}, optional
    How to reduce the data vertically: None for no vertical reduction, 'vert_av' for a mass-weighted vertical average, or 'vert_int' for a mass-weighted vertical integral.
  - input_time_intervals : {'annual', 'monthly', 'daily', '#hr'}
    A string specifying the time resolution of the input data. In '#hr', the '#' stands for a number, e.g. 3hr or 6hr, for sub-daily output. These are the suggested specifiers, but others may be used if they are also used by the DataLoaders for the given Runs.
  - input_time_datatypes : {'inst', 'ts', 'av'}
    What the time axis of the input data represents: 'inst' for a timeseries of instantaneous values, 'ts' for a timeseries of averages over the period of each time-index, or 'av' for a single value averaged over a date range.
  - input_vertical_datatypes : {False, 'pressure', 'sigma'}, optional
    The vertical coordinate system used by the input data: False if not defined vertically, 'pressure' for pressure coordinates, or 'sigma' for hybrid sigma-pressure coordinates.
  - input_time_offsets : {None, dict}, optional
    How to offset input data in time to correct for metadata errors: None for no offset, or a dict, e.g. {'hours': -3}, to offset times by -3 hours. See aospy.utils.times.apply_time_offset().
- exec_options : dict or None (default None)
  Options regarding how the calculations are reported, submitted, and saved. If None, default settings are used for all options. Currently supported options:
  - prompt_verify : bool (default False)
    If True, print a summary of the calculations to be performed and prompt the user to confirm before submitting them for execution.
  - parallelize : bool (default False)
    If True, submit calculations in parallel.
  - client : distributed.Client or None (default None)
    The dask.distributed Client used to schedule computations. If None and parallelize is True, a LocalCluster will be started.
  - write_to_tar : bool (default True)
    If True, write results of calculations to .tar files, one for each aospy.Run object. These tar files have a directory structure identical to that of the standard output, relative to their root directory, which is specified via the tar_direc_out argument of each Proj object's instantiation.

Returns:

- A list of the return values from each aospy.Calc.compute() call. If a calculation ran without error, this value is the aospy.Calc object itself, with the results of its calculations saved in its data_out attribute. data_out is a dictionary, with the keys being the temporal-regional reduction identifiers (e.g. 'reg.av') and the values being the corresponding results. If any error occurred during a calculation, the return value is None.

Raises:

- AospyException
  If the prompt_verify option is set to True and the user does not respond affirmatively to the prompt.
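Putting the accepted keys together, a calc_suite_specs dict might look like the following sketch. The library, projects, and variables entries must be objects from the user's own object library, so they are set to None here purely for illustration:

```python
# Illustrative only: replace the None placeholders with your own
# object-library module, Proj objects, and Var objects.
calc_suite_specs = {
    'library': None,          # your aospy object library module
    'projects': None,         # list of aospy.Proj objects
    'models': 'default',      # use each Proj's default_models
    'runs': 'default',        # use each Model's default_runs
    'variables': None,        # list of aospy.Var objects
    'regions': 'all',         # use all regions of each Proj
    'date_ranges': 'default',
    'output_time_intervals': ['ann'],
    'output_time_regional_reductions': ['av', 'reg.av'],
    'output_vertical_reductions': [None],
    'input_time_intervals': ['monthly'],
    'input_time_datatypes': ['ts'],
    'input_vertical_datatypes': [False],
    'input_time_offsets': [None],
}

exec_options = {'prompt_verify': False, 'parallelize': False,
                'write_to_tar': True}

# With real objects filled in, the suite would then be submitted via:
# aospy.automate.submit_mult_calcs(calc_suite_specs, exec_options)
```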

## Units and Constants¶

aospy provides the classes Constant and Units for representing, respectively, physical constants (e.g. Earth’s gravitational acceleration at the surface = 9.81 m/s^2) and physical units (e.g. meters per second squared in that example).

aospy comes with several commonly used constants saved within the constants module in which the Constant class is also defined. In contrast, there are no pre-defined Units objects; the user must define any Units objects they wish to use (e.g. to populate the units attribute of their Var objects).

Warning

Whereas these baked-in Constant objects are used by aospy in various places, aospy currently does not actually use the Var.units attribute during calculations or the Units class more generally; they are solely for the user’s own informational purposes.

### constants¶

Classes and objects pertaining to physical constants.

class aospy.constants.Constant(value, units, description='')[source]

Physical constants used in atmospheric and oceanic sciences.

### units¶

Functionality for representing physical units, e.g. meters.

class aospy.units.Units(units='', plot_units=False, plot_units_conv=1.0, vert_int_units=False, vert_int_plot_units=False, vert_int_plot_units_conv=False)[source]

String representation of physical units and conversion methods.

Note

There has been discussion of implementing units-handling upstream within xarray. If that happens, the Units class will likely be deprecated and replaced with the upstream version.

## Utilities¶

aospy includes a number of utility functions that are used internally and may also be useful to users for their own purposes. These include functions pertaining to input/output (IO), time arrays, and vertical coordinates.

### utils.io¶

Utility functions for data input and output.

aospy.utils.io.data_in_label(intvl_in, dtype_in_time, dtype_in_vert=False)[source]

Create string label specifying the input data of a calculation.

aospy.utils.io.data_name_gfdl(name, domain, data_type, intvl_type, data_yr, intvl, data_in_start_yr, data_in_dur)[source]

Determine the filename of GFDL model data output.

aospy.utils.io.data_out_label(time_intvl, dtype_time, dtype_vert=False)[source]
aospy.utils.io.dmget(files_list)[source]

Call GFDL command ‘dmget’ to access archived files.

aospy.utils.io.ens_label(ens_mem)[source]

Create label of the ensemble member for aospy data I/O.

aospy.utils.io.get_parent_attr(obj, attr, strict=False)[source]

Search recursively through an object and its parent for an attribute.

Check if the object has the given attribute and it is non-empty. If not, check each parent object for the attribute and use the first one found.
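The recursive parent lookup can be sketched in pure Python; the `parent` attribute name and the example objects below are assumptions for illustration, not aospy's actual traversal mechanism:

```python
def get_parent_attr_sketch(obj, attr):
    """Illustrative stand-in for get_parent_attr: return the first
    non-empty value of `attr` found on obj or, recursively, on its
    `parent` attribute (assumed name)."""
    val = getattr(obj, attr, None)
    if val:
        return val
    parent = getattr(obj, 'parent', None)
    if parent is not None:
        return get_parent_attr_sketch(parent, attr)
    return None

class Obj:
    """Minimal container for the demonstration."""
    def __init__(self, parent=None, **attrs):
        self.parent = parent
        for key, val in attrs.items():
            setattr(self, key, val)

proj = Obj(data_dir='/archive/proj')   # hypothetical path
model = Obj(parent=proj)               # no data_dir of its own
print(get_parent_attr_sketch(model, 'data_dir'))  # '/archive/proj'
```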

aospy.utils.io.time_label(intvl, return_val=True)[source]

Create time interval label for aospy data I/O.

aospy.utils.io.yr_label(yr_range)[source]

Create label of start and end years for aospy data I/O.

### utils.times¶

Utility functions for handling times, dates, etc.

aospy.utils.times.add_uniform_time_weights(ds)[source]

Append uniform time weights to a Dataset.

All DataArrays with a time coordinate require a time weights coordinate. For Datasets read in without a time bounds coordinate or explicit time weights built in, aospy adds uniform time weights at each point in the time coordinate.

Parameters: ds : Dataset Input data. Returns: Dataset
aospy.utils.times.apply_time_offset(time, years=0, months=0, days=0, hours=0)[source]

Apply a specified offset to the given time array.

This is useful for GFDL model output of instantaneous values. For example, 3 hourly data postprocessed to netCDF files spanning 1 year each will actually have time values that are offset by 3 hours, such that the first value is for 1 Jan 03:00 and the last value is 1 Jan 00:00 of the subsequent year. This causes problems in xarray, e.g. when trying to group by month. It is resolved by manually subtracting off those three hours, such that the dates span from 1 Jan 00:00 to 31 Dec 21:00 as desired.

Parameters: time : xarray.DataArray representing a timeseries years, months, days, hours : int, optional The number of years, months, days, and hours, respectively, to offset the time array by. Positive values move the times later. Returns: pandas.DatetimeIndex

Examples

Case of a length-1 input time array:

>>> times = xr.DataArray(datetime.datetime(1899, 12, 31, 21))
>>> apply_time_offset(times)
Timestamp('1900-01-01 00:00:00')


Case of input time array with length greater than one:

>>> times = xr.DataArray([datetime.datetime(1899, 12, 31, 21),
...                       datetime.datetime(1899, 1, 31, 21)])
>>> apply_time_offset(times)
DatetimeIndex(['1900-01-01', '1899-02-01'], dtype='datetime64[ns]',
freq=None)

aospy.utils.times.assert_matching_time_coord(arr1, arr2)[source]

Check to see if two DataArrays have the same time coordinate.

Parameters: arr1 : DataArray or Dataset First DataArray or Dataset arr2 : DataArray or Dataset Second DataArray or Dataset. Raises: ValueError If the time coordinates are not identical between the two Datasets
aospy.utils.times.average_time_bounds(ds)[source]

Return the average of each set of time bounds in the Dataset.

Useful for creating a new time array to replace the Dataset’s native time array, in the case that the latter matches either the start or end bounds. This can cause errors in grouping (akin to an off-by-one error) if the timesteps span e.g. one full month each. Note that the Dataset’s times must not have already undergone “CF decoding”, wherein they are converted from floats using the ‘units’ attribute into datetime objects.

Parameters: ds : xarray.Dataset A Dataset containing a time bounds array with name matching internal_names.TIME_BOUNDS_STR. This time bounds array must have two dimensions, one of whose coordinates is the Dataset's time array, and the other of which is length-2. Returns: xarray.DataArray The mean of the start and end times of each timestep in the original Dataset. Raises: ValueError If the time bounds array doesn't match the shape specified above.
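The midpoint computation itself is simple; a stdlib-only sketch using datetime objects in place of the Dataset's time bounds array (an illustration, not the aospy implementation):

```python
import datetime

def average_time_bounds_sketch(time_bounds):
    """Illustrative stand-in: midpoint of each (start, end) bounds pair."""
    return [start + (end - start) / 2 for start, end in time_bounds]

# Monthly time bounds for Jan and Feb 2000:
bounds = [(datetime.datetime(2000, 1, 1), datetime.datetime(2000, 2, 1)),
          (datetime.datetime(2000, 2, 1), datetime.datetime(2000, 3, 1))]
mids = average_time_bounds_sketch(bounds)
print(mids[0])  # 2000-01-16 12:00:00
```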
aospy.utils.times.convert_scalar_to_indexable_coord(scalar_da)[source]

Convert a scalar coordinate to an indexable one.

In xarray, scalar coordinates cannot be indexed. This converts a scalar coordinate-containing DataArray to one that can be indexed using da.sel and da.isel.

Parameters: scalar_da : DataArray Must contain a scalar coordinate. Returns: DataArray
aospy.utils.times.datetime_or_default(date, default)[source]

Return a datetime.datetime object or a default.

Parameters: date : None or datetime-like object default : The value to return if date is None. Returns: default if date is None, otherwise the result of utils.times.ensure_datetime(date)
aospy.utils.times.ensure_datetime(obj)[source]

Return the object if it is of type datetime.datetime; else raise.

Parameters: obj : Object to be tested. Returns: The original object, if it is a datetime.datetime object. Raises: TypeError If obj is not of type datetime.datetime.
aospy.utils.times.ensure_time_as_dim(ds)[source]

Ensures that time is an indexable dimension on relevant quantities

In xarray, scalar coordinates cannot be indexed. We rely on indexing in the time dimension throughout the code; therefore we need this helper method to (if needed) convert a scalar time coordinate to a dimension.

Note that this must be applied before CF-conventions are decoded; otherwise it casts np.datetime64[ns] as int values.

Parameters: ds : Dataset Dataset with a time coordinate. Returns: Dataset
aospy.utils.times.ensure_time_avg_has_cf_metadata(ds)[source]

Add time interval length and bounds coordinates for time avg data.

If the Dataset or DataArray contains time average data, enforce that there are coordinates that track the lower and upper bounds of the time intervals, and that there is a coordinate that tracks the amount of time per time average interval.

CF conventions require that a quantity stored as time averages over time intervals must have time and time_bounds coordinates [R1]. aospy further requires AVERAGE_DT for time average data, for accurate time-weighted averages, which can be inferred from the CF-required time_bounds coordinate if needed. This step should be done prior to decoding CF metadata with xarray to ensure proper computed timedeltas for different calendar types.

Parameters: ds : Dataset or DataArray Input data. Returns: Dataset or DataArray with time average metadata attributes added if needed.
aospy.utils.times.extract_months(time, months)[source]

Extract times within specified months of the year.

Parameters: time : xarray.DataArray Array of times that can be represented by numpy.datetime64 objects (i.e. the year is between 1678 and 2262). months : Desired months of the year to include. Returns: xarray.DataArray of the desired times
aospy.utils.times.month_indices(months)[source]

Convert string labels for months to integer indices.

Parameters: months : str, int If int, number of the desired month, where January=1, February=2, etc. If str, must be either 'ann' or some subset of 'jfmamjjasond'. If 'ann', use all months. Otherwise, use the specified months. Returns: np.ndarray of integers corresponding to the desired month indices. Raises: TypeError If months is not an int or str
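The string convention above can be sketched with substring matching against 'jfmamjjasond'; this matching rule is an assumption inferred from the docstring, not necessarily aospy's exact algorithm:

```python
def month_indices_sketch(months):
    """Illustrative stand-in for month_indices."""
    if isinstance(months, int):
        return [months]
    if months == 'ann':
        return list(range(1, 13))
    # Locate the month-initial string within the calendar-year sequence.
    start = 'jfmamjjasond'.find(months)
    if start == -1:
        raise ValueError('unrecognized months string: %s' % months)
    return list(range(start + 1, start + 1 + len(months)))

print(month_indices_sketch('jja'))  # [6, 7, 8]
```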

See also: _month_conditional

aospy.utils.times.monthly_mean_at_each_ind(monthly_means, sub_monthly_timeseries)[source]

Copy monthly mean over each time index in that month.

Parameters: monthly_means : xarray.DataArray Array of monthly means. sub_monthly_timeseries : xarray.DataArray Array of a timeseries at sub-monthly time resolution. Returns: xarray.DataArray with each monthly mean value from monthly_means repeated at each time within that month from sub_monthly_timeseries

See also: monthly_mean_ts : Create timeseries of monthly mean values
aospy.utils.times.monthly_mean_ts(arr)[source]

Convert a sub-monthly time-series into one of monthly means.

Also drops any months with no data in the original DataArray.

Parameters: arr : xarray.DataArray Timeseries of sub-monthly temporal resolution data. Returns: xarray.DataArray Array resampled to comprise monthly means

See also: monthly_mean_at_each_ind : Copy monthly means to each submonthly time
aospy.utils.times.numpy_datetime_range_workaround(date, min_year, max_year)[source]

Reset a date to earliest allowable year if outside of valid range.

Hack to address np.datetime64, and therefore pandas and xarray, not supporting dates outside the range 1677-09-21 and 2262-04-11 due to nanosecond precision. See e.g. https://github.com/spencerahill/aospy/issues/96.

Parameters: date : datetime.datetime object min_year : int Minimum year in the raw decoded dataset max_year : int Maximum year in the raw decoded dataset. Returns: datetime.datetime The original object if the original date is within the permissible dates, otherwise an object with the year offset to the earliest allowable year.
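The year-shifting idea can be sketched in stdlib Python. The mapping below (shift all years so that min_year lands on 1678) is an assumption inferred from the description above, not a transcription of the aospy source:

```python
import datetime

def range_workaround_sketch(date, min_year, max_year):
    """Illustrative stand-in: if the dataset's years fall outside the
    np.datetime64[ns]-representable range (roughly 1678-2262), shift
    this date so that min_year maps to 1678."""
    if min_year < 1678 or max_year > 2262:
        return date.replace(year=date.year - min_year + 1678)
    return date

# A year-5 date in a dataset spanning years 1-10 gets shifted:
shifted = range_workaround_sketch(datetime.datetime(5, 6, 15), 1, 10)
print(shifted)  # 1682-06-15 00:00:00
```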
aospy.utils.times.numpy_datetime_workaround_encode_cf(ds)[source]

Generate CF-compliant units for out-of-range dates.

Hack to address np.datetime64, and therefore pandas and xarray, not supporting dates outside the range 1677-09-21 and 2262-04-11 due to nanosecond precision. See e.g. https://github.com/spencerahill/aospy/issues/96.

Specifically, we coerce the data such that, when decoded, the earliest value starts in 1678 but with its month, day, and shorter timescales (hours, minutes, seconds, etc.) intact and with the time-spacing between values intact.

Parameters: ds : xarray.Dataset. Returns: (xarray.Dataset, int, int) Dataset with time units adjusted as needed, minimum year in loaded data, and maximum year in loaded data.
aospy.utils.times.sel_time(da, start_date, end_date)[source]

Subset a DataArray or Dataset for a given date range.

Ensures that data are present for full extent of requested range. Appends start and end date of the subset to the DataArray.

Parameters: da : DataArray or Dataset Data to subset. start_date : np.datetime64 Start of date interval. end_date : np.datetime64 End of date interval. Returns: da : DataArray or Dataset Subsetted data. Raises: AssertionError If data do not exist for part or all of the requested range
aospy.utils.times.yearly_average(arr, dt)[source]

Average a sub-yearly time-series over each year.

Resulting timeseries comprises one value for each year in which the original array had valid data. Accounts for (i.e. ignores) masked values in original data when computing the annual averages.

Parameters: arr : xarray.DataArray The array to be averaged. dt : xarray.DataArray Array of the duration of each timestep. Returns: xarray.DataArray Has the same shape and mask as the original arr, except for the time dimension, which is truncated to one value for each year that arr spanned
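The duration-weighted annual mean can be sketched with plain lists standing in for the xarray objects; this is an illustration of the arithmetic, not the aospy implementation:

```python
def yearly_average_sketch(values, durations, years):
    """Illustrative stand-in: duration-weighted mean per year, skipping
    masked/missing values (represented here as None)."""
    sums, weights = {}, {}
    for val, dt, yr in zip(values, durations, years):
        if val is None:
            continue
        sums[yr] = sums.get(yr, 0.0) + val * dt
        weights[yr] = weights.get(yr, 0.0) + dt
    return {yr: sums[yr] / weights[yr] for yr in sums}

# Two timesteps in 2000 and one in 2001, each 31 days long:
print(yearly_average_sketch([1.0, 3.0, 5.0], [31, 31, 31],
                            [2000, 2000, 2001]))
# {2000: 2.0, 2001: 5.0}
```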

### utils.vertcoord¶

Utility functions for dealing with vertical coordinates.

aospy.utils.vertcoord.d_deta_from_pfull(arr)[source]

Compute $\partial/\partial\eta$ of the array on full hybrid levels.

$\eta$ is the model vertical coordinate, and its value is assumed to simply increment by 1 from 0 at the surface upwards. The data to be differenced is assumed to be defined at full pressure levels.

Parameters: arr : xarray.DataArray containing the 'pfull' dim. Returns: deriv : xarray.DataArray with the derivative along 'pfull' computed via 2nd-order centered differencing.
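With the coordinate incrementing by 1 per level, 2nd-order centered differencing reduces to half the difference of the neighboring values. A stdlib sketch on a plain list (the one-sided treatment at the endpoints is an assumption for illustration):

```python
def d_deta_sketch(arr):
    """Illustrative stand-in: d/d(eta) with eta incrementing by 1 per
    level, via 2nd-order centered differences in the interior and
    one-sided differences at the endpoints (assumed behavior)."""
    n = len(arr)
    deriv = [0.0] * n
    for i in range(1, n - 1):
        deriv[i] = (arr[i + 1] - arr[i - 1]) / 2.0
    deriv[0] = arr[1] - arr[0]
    deriv[-1] = arr[-1] - arr[-2]
    return deriv

# A linear profile has a constant derivative:
print(d_deta_sketch([0.0, 2.0, 4.0, 6.0]))  # [2.0, 2.0, 2.0, 2.0]
```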
aospy.utils.vertcoord.d_deta_from_phalf(arr, pfull_coord)[source]

Compute pressure level thickness from half level pressures.

aospy.utils.vertcoord.does_coord_increase_w_index(arr)[source]

Determine if the array values increase with the index.

Useful, e.g., for pressure, which sometimes is indexed surface to TOA and sometimes the opposite.

aospy.utils.vertcoord.dp_from_p(p, ps, p_top=0.0, p_bot=110000.0)[source]

Get level thickness of pressure data, incorporating surface pressure.

Level edges are defined as halfway between the levels, as well as the user-specified uppermost and lowermost values. The dp of levels whose bottom pressure is less than the surface pressure is not changed by ps, since they don’t intersect the surface. If ps is in between a level’s top and bottom pressures, then its dp becomes the pressure difference between its top and ps. If ps is less than a level’s top and bottom pressures, then that level is underground and its values are masked.

Note that postprocessing routines (e.g. at GFDL) typically mask out data wherever the surface pressure is less than the level’s given value, not the level’s upper edge. This masks out more levels than the approach taken by this function.
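The level-edge logic described above can be sketched with plain Python lists (pressures in Pa, levels ordered top to bottom, None marking a masked underground level); this is an illustration of the described algorithm, not the aospy implementation:

```python
def dp_from_p_sketch(p, ps, p_top=0.0, p_bot=110000.0):
    """Illustrative stand-in for dp_from_p using plain lists."""
    # Edges: p_top, midpoints between adjacent levels, p_bot.
    edges = [p_top]
    edges += [(p[i] + p[i + 1]) / 2.0 for i in range(len(p) - 1)]
    edges.append(p_bot)
    dp = []
    for top, bot in zip(edges[:-1], edges[1:]):
        if top >= ps:
            dp.append(None)              # entirely underground: mask
        else:
            dp.append(min(bot, ps) - top)  # clip bottom edge at ps
    return dp

# Levels at 200, 500, 850, 1000 hPa with the surface at 950 hPa:
print(dp_from_p_sketch([20000., 50000., 85000., 100000.], 95000.))
# [35000.0, 32500.0, 25000.0, 2500.0]
```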

aospy.utils.vertcoord.dp_from_ps(bk, pk, ps, pfull_coord)[source]

Compute pressure level thickness from surface pressure.

aospy.utils.vertcoord.get_dim_name(arr, names)[source]

Determine if an object has an attribute name matching a given list.

aospy.utils.vertcoord.int_dp_g(arr, dp)[source]

Mass weighted integral.

aospy.utils.vertcoord.integrate(arr, ddim, dim=False, is_pressure=False)[source]

Integrate along the given dimension.

aospy.utils.vertcoord.level_thickness(p, p_top=0.0, p_bot=101325.0)[source]

Calculate the thickness, in Pa, of each pressure level.

Assumes that the pressure values given are at the center of that model level, except for the lowest value (typically 1000 hPa), which is the bottom boundary. The uppermost level extends to 0 hPa.

Unlike dp_from_p, this does not incorporate the surface pressure.

aospy.utils.vertcoord.pfull_from_ps(bk, pk, ps, pfull_coord)[source]

Compute pressure at full levels from surface pressure.

aospy.utils.vertcoord.phalf_from_ps(bk, pk, ps)[source]

Compute pressure of half levels of hybrid sigma-pressure coordinates.

aospy.utils.vertcoord.replace_coord(arr, old_dim, new_dim, new_coord)[source]

Replace a coordinate with new one; new and old must have same shape.

aospy.utils.vertcoord.to_hpa(arr)[source]

Convert pressure array from Pa to hPa (if needed).

aospy.utils.vertcoord.to_pascal(arr, is_dp=False)[source]

Force data with units either hPa or Pa to be in Pa.
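The hPa-vs-Pa decision has to be made heuristically from the data's magnitude. A stdlib sketch of that idea (the threshold value is an assumption for illustration, not aospy's actual criterion):

```python
def to_pascal_sketch(values):
    """Illustrative stand-in: treat values as hPa if their magnitudes
    look like hPa (assumed threshold) and convert them to Pa."""
    if max(values) < 2000.0:  # plausible hPa magnitudes
        return [v * 100.0 for v in values]
    return list(values)       # already in Pa

print(to_pascal_sketch([1000.0, 850.0]))     # [100000.0, 85000.0]
print(to_pascal_sketch([100000.0, 85000.0])) # unchanged
```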

aospy.utils.vertcoord.to_pfull_from_phalf(arr, pfull_coord)[source]

Compute data at full pressure levels from values at half levels.

aospy.utils.vertcoord.to_phalf_from_pfull(arr, val_toa=0, val_sfc=0)[source]

Compute data at half pressure levels from values at full levels.

Could be the pressure array itself, but it could also be any other data defined at pressure levels. Requires specification of values at surface and top of atmosphere.

aospy.utils.vertcoord.to_radians(arr, is_delta=False)[source]

aospy.utils.vertcoord.vert_coord_name(arr)[source]