Your aospy object library
The first step is writing the code that describes your data and the quantities you eventually want to compute using it. We refer to this code collectively as your “object library”.
Describing your data on disk
aospy needs to know where the data you want to use is located on disk
and how it is organized across different projects, models, and model
runs (i.e. simulations). This involves a hierarchy of three classes:
- aospy.Proj: This represents a single project that involves analysis of data from one or more models and simulations.
- aospy.Model: This represents a single climate model, other numerical model, observational data source, etc.
- aospy.Run: This represents a single simulation, version of observational data, etc.
So each user’s object library will contain one or more
aospy.Proj objects, each of which will have one or more child
aospy.Model objects, which in turn will each have
one or more child aospy.Run objects.
Currently, the Proj-Model-Run hierarchy is rigid, in that each Run has a parent Model, and each Model has a parent Proj. Work is ongoing to relax this to a more generic parent-child framework.
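To make the nesting concrete, here is a minimal sketch that uses plain Python dataclasses as hypothetical stand-ins for aospy.Proj, aospy.Model, and aospy.Run. These are not aospy's real classes or constructor signatures: the real ones also carry data locations, date ranges, and more. The sketch only shows the parent-child structure and one way to walk it.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical stand-ins, NOT aospy's real classes or signatures.
@dataclass
class Run:
    name: str
    data_dir: str  # hypothetical: where this simulation's netCDF files live


@dataclass
class Model:
    name: str
    runs: List[Run] = field(default_factory=list)


@dataclass
class Proj:
    name: str
    models: List[Model] = field(default_factory=list)


# One project containing one model, which has two runs (simulations).
control = Run("control", "/path/to/control")
warming = Run("warming", "/path/to/warming")
my_model = Model("my_model", runs=[control, warming])
my_proj = Proj("my_proj", models=[my_model])


def iter_runs(proj):
    """Walk the Proj -> Model -> Run hierarchy, yielding (model, run) names."""
    for model in proj.models:
        for run in model.runs:
            yield model.name, run.name


print(list(iter_runs(my_proj)))  # [('my_model', 'control'), ('my_model', 'warming')]
```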
The aospy.Var class is used to represent physical variables,
e.g. precipitation rate or potential temperature. This includes both
variables that are directly available in netCDF files (e.g. they were
directly output by your model or gridded data product) and
fields that must be computed from other variables (e.g. they
weren’t directly output but can be computed from other variables
that were).
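The distinction between stored and computed variables can be sketched as follows. SimpleVar, its func and variables fields, the in-memory disk dictionary, and the load helper are all hypothetical illustrations, not aospy.Var's actual signature or loading logic. A variable with no function is read directly. Otherwise, its input variables are loaded (recursively) and passed to its function.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class SimpleVar:
    """Hypothetical stand-in for aospy.Var (not its real signature).

    If func is None, the variable is assumed to be stored directly on disk;
    otherwise it is computed from the listed input variables.
    """
    name: str
    func: Optional[Callable] = None
    variables: Sequence["SimpleVar"] = ()


def total_precip(convective, large_scale):
    """Element-wise sum of two precipitation fields."""
    return [c + s for c, s in zip(convective, large_scale)]


precip_conv = SimpleVar("precip_conv")
precip_ls = SimpleVar("precip_ls")
precip = SimpleVar("precip", func=total_precip, variables=(precip_conv, precip_ls))

# Hypothetical "on-disk" data, keyed by variable name.
disk = {"precip_conv": [1.0, 2.0, 0.5], "precip_ls": [0.5, 0.0, 1.5]}


def load(var):
    """Read a variable from 'disk', computing it from its inputs if needed."""
    if var.func is None:
        return disk[var.name]
    return var.func(*(load(v) for v in var.variables))


print(load(precip))  # [1.5, 2.0, 2.0]
```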
The aospy.Region class is used to define geographical
regions over which quantities can be averaged (in addition to
gridpoint-by-gridpoint values). Like aospy.Var objects,
they are more generic than the objects of the
Proj-Model-Run hierarchy, in that
they correspond to generic physical quantities/regions rather than
to the data of a particular project, model, or simulation.
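Averaging over a region on a regular latitude-longitude grid usually means weighting each gridpoint by its cell area, which is proportional to the cosine of latitude. The following is a minimal sketch of that technique using the standard library only. SimpleRegion and region_average are hypothetical names, and the sketch is not how aospy.Region is implemented internally.

```python
import math
from dataclasses import dataclass


@dataclass
class SimpleRegion:
    """Hypothetical stand-in for aospy.Region: a single lat-lon box."""
    name: str
    lat_bounds: tuple  # (south, north) in degrees
    lon_bounds: tuple  # (west, east) in degrees, 0-360 convention

    def contains(self, lat, lon):
        return (self.lat_bounds[0] <= lat <= self.lat_bounds[1]
                and self.lon_bounds[0] <= lon <= self.lon_bounds[1])


def region_average(field, lats, lons, region):
    """Area-weighted mean of field[i][j] over the gridpoints inside region.

    On a regular lat-lon grid, cell area is proportional to cos(latitude),
    so that is used as the weight.
    """
    total = 0.0
    weight_sum = 0.0
    for i, lat in enumerate(lats):
        weight = math.cos(math.radians(lat))
        for j, lon in enumerate(lons):
            if region.contains(lat, lon):
                total += weight * field[i][j]
                weight_sum += weight
    return total / weight_sum


lats, lons = [0.0, 60.0], [0.0, 90.0]
field = [[1.0, 1.0],   # values at 0N
         [4.0, 4.0]]   # values at 60N
tropics = SimpleRegion("tropics", (-30, 30), (0, 360))
globe = SimpleRegion("globe", (-90, 90), (0, 360))
print(region_average(field, lats, lons, tropics))  # 1.0
print(region_average(field, lats, lons, globe))    # ~2.0: the 60N row gets half weight
```

Note that the unweighted mean of the global field would be 2.5. The cosine weighting pulls it down to about 2.0 because high-latitude cells cover less area.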
Object library structure
The officially supported way to submit calculations is the
aospy.submit_mult_calcs() function. In order for this to
work, your object library must follow one or the other of these two structures:
- All aospy.Proj and aospy.Var objects are accessible as attributes of your library. This means that my_obj_lib.my_obj works, where my_obj_lib is your object library and my_obj is the object in question.
- All aospy.Proj objects are stored in a container called projs, where projs is an attribute of your library (i.e. my_obj_lib.projs). And likewise for aospy.Var objects, which are stored in a corresponding container attribute.
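The two structures can be illustrated with a toy library built from types.SimpleNamespace, which exposes attributes just as a module does. The Proj class and the find_projs helper below are hypothetical and do not reproduce aospy's own lookup code. They only show that Proj objects can be found either directly among the library's attributes or inside its projs container.

```python
import types


class Proj:
    """Hypothetical stand-in for aospy.Proj."""
    def __init__(self, name):
        self.name = name


def find_projs(lib):
    """Collect Proj objects stored either as attributes of lib (structure 1)
    or in a projs container attribute of lib (structure 2)."""
    found = [obj for obj in vars(lib).values() if isinstance(obj, Proj)]
    found.extend(getattr(lib, "projs", []))
    return [proj.name for proj in found]


# Structure 1: my_obj_lib.my_proj works directly.
lib_attrs = types.SimpleNamespace(my_proj=Proj("my_proj"))
# Structure 2: the Proj objects live in my_obj_lib.projs.
lib_container = types.SimpleNamespace(projs=[Proj("proj_a"), Proj("proj_b")])

print(find_projs(lib_attrs))      # ['my_proj']
print(find_projs(lib_container))  # ['proj_a', 'proj_b']
```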
Beyond that, you can structure your object library however you wish. In particular, it can be structured as a Python module (i.e. a single “.py” file) or as a package (i.e. multiple “.py” files linked together; see the official documentation on package structuring).
A single module works great for small projects and for initially
trying out aospy (this is how the example object library,
aospy.examples.example_obj_lib, is structured). But as
your object library grows, it can become easier to manage as a package
of multiple files. For an example of a large object library that is
structured as a formal package, see here.
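As a rough illustration of the package approach, the script below writes a tiny hypothetical package to a temporary directory and imports it. The package has one submodule for projects and one for variables. Its __init__.py re-exports their objects so they remain accessible as attributes of the package, as the structures above require. The file names and the string placeholders standing in for aospy objects are assumptions made for the example. The sys.path.insert call mimics running Python from the directory that contains the package.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical layout: my_obj_lib/{__init__.py, projects.py, variables.py}.
files = {
    "projects.py": "my_proj = 'placeholder for an aospy.Proj object'\n",
    "variables.py": "precip = 'placeholder for an aospy.Var object'\n",
    "__init__.py": textwrap.dedent("""\
        # Re-export submodule objects so they are attributes of the package,
        # e.g. my_obj_lib.my_proj, and also collect them in a container.
        from .projects import my_proj
        from .variables import precip
        projs = [my_proj]
    """),
}

root = Path(tempfile.mkdtemp())
pkg_dir = root / "my_obj_lib"
pkg_dir.mkdir()
for fname, source in files.items():
    (pkg_dir / fname).write_text(source)

sys.path.insert(0, str(root))  # same effect as running from `root`
importlib.invalidate_caches()  # recommended after creating modules at runtime
my_obj_lib = importlib.import_module("my_obj_lib")
print(my_obj_lib.my_proj, my_obj_lib.projs)
```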
Accessing your library
If your current working directory is the one containing your library,
you can import your library via
import my_obj_lib (replacing
my_obj_lib with whatever you’ve named yours) in order to pass it
to aospy.submit_mult_calcs().
Once you start using aospy a lot, however, this requirement of being
in the same directory becomes cumbersome. As a solution, you can add
the directory containing your object library to the PYTHONPATH
environment variable. E.g. if you’re using the bash shell:
export PYTHONPATH=/path/to/your/object/library:${PYTHONPATH}
Of course, replace
/path/to/your/object/library with the actual
path to yours. This command places your object library at the front
of the PYTHONPATH environment variable, whose entries Python searches
(right after the directory of the script being run) when looking for
packages and modules to import. (For more, see Python’s official
documentation on PYTHONPATH.)
It’s convenient to copy this command into your shell profile (e.g.,
for the bash shell on Linux or Mac,
~/.bash_profile) so that
you don’t have to call it again in every new terminal session.
To test that this is working, run
python -c "import my_obj_lib" from a
directory other than where the library is located (again replacing
my_obj_lib with the name you’ve given to your library). If this
runs without error, you should be good to go.
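The same check can be scripted from Python to see the effect of PYTHONPATH directly. The sketch creates a throwaway single-module library named my_obj_lib (a hypothetical placeholder) in a temporary directory. It then runs python -c "import my_obj_lib" from a different directory twice: once with PYTHONPATH unset and once with the library's directory prepended. This mirrors what export PYTHONPATH=...:${PYTHONPATH} does in bash.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

# A throwaway single-module "library" in its own directory.
lib_dir = Path(tempfile.mkdtemp())
(lib_dir / "my_obj_lib.py").write_text("answer = 42\n")

# Try the import from a *different*, empty directory.
elsewhere = tempfile.mkdtemp()
cmd = [sys.executable, "-c", "import my_obj_lib"]

env = dict(os.environ)
env.pop("PYTHONPATH", None)
without = subprocess.run(cmd, cwd=elsewhere, env=env, capture_output=True)

# Prepend lib_dir to any existing PYTHONPATH, as the bash export would.
env["PYTHONPATH"] = os.pathsep.join(
    filter(None, [str(lib_dir), os.environ.get("PYTHONPATH")]))
with_path = subprocess.run(cmd, cwd=elsewhere, env=env, capture_output=True)

print("without PYTHONPATH:", without.returncode)    # nonzero: import fails
print("with PYTHONPATH:   ", with_path.returncode)  # 0: import succeeds
```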
As noted above, the officially supported way to submit calculations is the aospy.submit_mult_calcs() function.
We provide a template “main” script with aospy that uses this function. We recommend copying it to the location of your choice. In the copy, replace the example object library and associated objects with your own. (If you accidentally change the original, you can always get a fresh copy from GitHub.)
Running the main script
Once the main script parameters are all modified as desired, execute the script from the command line as follows:
/path/to/your/aospy_main.py
This should generate a text summary of the specified parameters and a prompt as to whether to proceed or not with the calculations. An affirmative response then triggers the calculations to execute.
You may need to change the permissions on the file to make it executable. E.g. from a Mac or Linux: chmod u+x /path/to/your/aospy_main.py. Alternatively you can call python or IPython from the command line to run it: python /path/to/your/aospy_main.py or ipython /path/to/your/aospy_main.py.
Specifically, the parameters are permuted over all possible combinations. So, for example, if two model names and three variable names were listed and all other parameters had only one element, six calculations would be generated and executed. There is no limit to the number of permutations.
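The permutation rule is a Cartesian product, so the number of calculations is the product of the list lengths. A minimal sketch with itertools.product follows. The dictionary keys and values are illustrative placeholders, not necessarily aospy's exact parameter names.

```python
import itertools

# Illustrative main-script parameters: two models, three variables, and a
# single value for everything else.
specs = {
    "models": ["model_a", "model_b"],
    "variables": ["precip", "temp", "olr"],
    "date_ranges": ["default"],
    "output_time_intervals": ["ann"],
}

# Every combination of one element from each list becomes its own calculation.
calcs = [dict(zip(specs, combo)) for combo in itertools.product(*specs.values())]
print(len(calcs))  # 6 (= 2 * 3 * 1 * 1)
print(calcs[0])
```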
You can also call the main script interactively within an IPython
session via %run /path/to/your/aospy_main.py or, from the command
line, run the script and then start an interactive IPython session
with ipython -i /path/to/your/aospy_main.py.
Or you can call
aospy.submit_mult_calcs() directly within
an interactive session.
As the calculations are performed, logging information will be printed to the terminal displaying their progress.
The calculations generated by the main script can be executed in
parallel using dask.distributed. If the user sets parallelize=True in
the calc_exec_options argument of aospy.submit_mult_calcs() in the
main script, aospy will either automatically set up a
dask.distributed.LocalCluster to perform the calculations,
or one can optionally specify an external
distributed.Client to delegate
the work. Otherwise (e.g. if the user sets parallelize=False), the
calculations will be executed one-by-one.
Particularly on institutional clusters with many cores, this parallelization can yield a substantial speed-up when multiple calculations are generated.
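The parallel-or-serial choice can be sketched with the standard library. Note that this swaps in concurrent.futures.ThreadPoolExecutor for the dask.distributed cluster that aospy actually uses. The run_calc placeholder and the execute helper are hypothetical, and the parallelize keyword only mirrors the option's name.

```python
from concurrent.futures import ThreadPoolExecutor


def run_calc(spec):
    """Hypothetical placeholder for executing one calculation."""
    model, var = spec
    return f"{var} from {model}: done"


def execute(specs, parallelize=False, max_workers=4):
    """Run calculations concurrently if requested, otherwise one-by-one."""
    if parallelize:
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            # map() yields results in input order, even if tasks finish
            # out of order.
            return list(pool.map(run_calc, specs))
    return [run_calc(spec) for spec in specs]


specs = [("model_a", "precip"), ("model_b", "precip")]
print(execute(specs, parallelize=True))
```

Threads are used here only to keep the sketch simple and portable. A real cluster-backed executor such as dask.distributed can also spread work across processes and machines.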
When calculations are performed in parallel, the logging information from different calculations running simultaneously often ends up interwoven, leading to output that is confusing to follow. Work is ongoing to improve the logging output when the computations are parallelized.
Finding the output
aospy saves the results of all calculations as netCDF files and embeds metadata describing them within the netCDF files, in their filenames, and in the directory structure within which they are saved.
- Directory structure:
- File name:
See the API Reference for an
explanation of each of these components of the path and file name.
Under the hood
aospy.submit_mult_calcs() creates an
object that permutes over the provided lists of calculation
specifications, encoding each permutation into an aospy.CalcInterface object.
Actually, when multiple regions and/or output time/regional
reductions are specified, these all get passed to each
aospy.CalcInterface object rather than being permuted
over. They are then looped over during the subsequent
calculations. This is to prevent unnecessary re-loading and
re-computing, because, for a given simulation/variable/etc., all
regions and reduction methods use the same data.
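The payoff of handing regions and reductions to each calculation as whole lists, instead of permuting over them, is easy to see in a small sketch. Everything here is hypothetical: load_data stands in for reading netCDF data, and REGIONS maps region names to index ranges of a toy 1-D field. Only the models and variables are permuted, so the data are loaded once per (model, variable) pair.

```python
import itertools

REGIONS = {"globe": slice(None), "tropics": slice(1, 3)}  # hypothetical index ranges
REDUCTIONS = {"av": lambda xs: sum(xs) / len(xs), "max": max}
load_log = []  # records every (model, variable) load


def load_data(model, var):
    """Hypothetical stand-in for the expensive step of reading netCDF data."""
    load_log.append((model, var))
    return [1.0, 2.0, 3.0, 4.0]


def run_calc(model, var, regions, reductions):
    """One calculation: load once, then loop over regions and reductions."""
    data = load_data(model, var)
    return {(reg, red): REDUCTIONS[red](data[REGIONS[reg]])
            for reg in regions for red in reductions}


models, variables = ["model_a", "model_b"], ["precip"]
regions, reductions = ["globe", "tropics"], ["av", "max"]

# Only the "core" specifications are permuted; regions and reductions are
# passed to each calculation as whole lists.
results = {(m, v): run_calc(m, v, regions, reductions)
           for m, v in itertools.product(models, variables)}

print(len(load_log))  # 2 loads, versus 2 * 2 * 2 = 8 if everything were permuted
```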
Each aospy.CalcInterface object is then used to create an
aospy.Calc object. The
aospy.Calc object, in turn:
- loads the required netCDF data given its simulation, variable, and date range
- (if necessary) further truncates the data in time (i.e. to the given subset of the annual cycle, and/or if the requested date range doesn’t exactly align with the time chunking of the input netCDF files)
- (if the variable is a function of other variables) executes the function that computes the calculation using this loaded and truncated data
- applies all specified temporal and regional reductions
- writes the results (plus additional metadata) to disk as netCDF
files and also stores them on the aospy.Calc object itself
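Unlike the aospy.Proj, aospy.Model, aospy.Run, aospy.Var, and
aospy.Region objects in your object library, these objects are not intended to be
saved in .py files for continual re-use. Instead, they are
generated as needed, perform their desired tasks, and then go away.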
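The sequence of steps an aospy.Calc object carries out can be sketched end to end with toy data. This is a minimal illustration, not aospy's implementation. The input data, the run_pipeline helper, and the output file name are hypothetical. The "annual cycle subset" is June-July-August, and the result is written as JSON rather than netCDF to stay within the standard library.

```python
import datetime as dt
import json
import tempfile
from pathlib import Path

# Hypothetical monthly input data for 2000-2001, keyed by variable name.
dates = [dt.date(year, month, 1) for year in (2000, 2001) for month in range(1, 13)]
inputs = {
    "precip_conv": {d: 1.0 + d.month / 12 for d in dates},
    "precip_ls": {d: 0.5 for d in dates},
}


def run_pipeline(var_names, func, start, end, months, out_path):
    """Sketch of the load -> truncate -> compute -> reduce -> write steps."""
    # 1. Load the data for each required input variable.
    loaded = [inputs[name] for name in var_names]
    # 2. Truncate to the date range and to the requested part of the annual cycle.
    keep = [d for d in dates if start <= d <= end and d.month in months]
    # 3. Compute the variable from its inputs, one timestep at a time.
    values = [func(*(data[d] for data in loaded)) for d in keep]
    # 4. Time reduction: average over all retained months.
    result = sum(values) / len(values)
    # 5. Write the result plus a little metadata (JSON here, not netCDF).
    meta = {"value": result, "start": start.isoformat(), "end": end.isoformat(),
            "months": sorted(months)}
    Path(out_path).write_text(json.dumps(meta))
    return result


out_file = Path(tempfile.mkdtemp()) / "precip.jja.2000-2001.json"
jja_mean = run_pipeline(["precip_conv", "precip_ls"], lambda c, s: c + s,
                        dt.date(2000, 1, 1), dt.date(2001, 12, 31), {6, 7, 8}, out_file)
print(jja_mean)
```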
See the API reference documentation for further details.