Using aospy¶
This section provides a high-level summary of how to use aospy. See the Overview section of this documentation for more background information, or the Examples section for concrete examples.
Your aospy object library¶
The first step is writing the code that describes your data and the quantities you eventually want to compute using it. We refer to this code collectively as your “object library”.
Describing your data on disk¶
aospy needs to know where the data you want to use is located on disk
and how it is organized across different projects, models, and model
runs (i.e. simulations). This involves a hierarchy of three classes,
aospy.Proj
, aospy.Model
, and
aospy.Run
.
aospy.Proj
: This represents a single project that involves analysis of data from one or more models and simulations.aospy.Model
: This represents a single climate model, other numerical model, observational data source, etc.aospy.Run
: This represents a single simulation, version of observational data, etc.
So each user’s object library will contain one or more
aospy.Proj
objects, each of which will have one or more
child aospy.Model
objects, which in turn will each have
one or more child aospy.Run
objects.
Note
Currently, the Proj-Model-Run hierarchy is rigid, in that each Run has a parent Model, and each Model has a parent Proj. Work is ongoing to relax this to a more generic parent-child framework.
Physical variables¶
The aospy.Var
class is used to represent physical variables,
e.g. precipitation rate or potential temperature. This includes both
variables which are directly available in netCDF files (e.g. they were
directly outputted by your model or gridded data product) as well as
those fields that must be computed from other variables (e.g. they
weren’t directly outputted but can be computed from other variables
that were outputted).
Note
aospy.Var
objects with the name 'p'
or 'dp'
are
handled in a special manner. These variables are meant to represent
pressure and pressure thicknesses, respectively. When encountering
variables named this way, aospy
will attempt to compute them in
accordance with the available vertical coordinates depending on the
dtype_in_vert
specified in the main script. Currently supported
dtype_in_vert
values are:
'sigma'
: for data output on hybrid pressure coordinates'pressure'
: for data interpolated to levels of constant pressure.
Geographical regions¶
The aospy.Region
class is used to define geographical
regions over which quantities can be averaged (in addition to
gridpoint-by-gridpoint values). Like aospy.Var
objects,
they are more generic than the objects of the aospy.Proj
-
aospy.Model
- aospy.Run
hierarchy, in that
they correspond to the generic physical quantities/regions rather than
the data of a particular project, model, or simulation.
Object library structure¶
The officially supported way to submit calculations is the
aospy.submit_mult_calcs()
function. In order for this to
work, your object library must follow one or the other of these
structures:
- All
aospy.Proj
andaospy.Var
objects are accessible as attributes of your library. This means thatmy_obj_lib.my_obj
works, wheremy_obj_lib
is your object library, andmy_obj
is the object in question. - All
aospy.Proj
objects are stored in a container calledprojs
, whereprojs
is an attribute of your library (i.e.my_obj_lib.projs
). And likewise foraospy.Var
objects in avariables
attribute.
Beyond that, you can structure your object library however you wish. In particular, it can be structured as a Python module (i.e. a single “.py” file) or as a package (i.e. multiple “.py” files linked together; see the official documentation on package structuring).
A single module works great for small projects and for initially
trying out aospy (this is how the example object library,
aospy.examples.example_obj_lib
, is structured). But as
your object library grows, it can become easier to manage as a package
of multiple files. For an example of a large object library that is
structured as a formal package, see here.
Accessing your library¶
If your current working directory is the one containing your library,
you can import your library via import my_obj_lib
(replacing
my_obj_lib
with whatever you’ve named yours) in order to pass it
to aospy.submit_mult_calcs()
.
Once you start using aospy a lot, however, this requirement of being
in the same directory becomes cumbersome. As a solution, you can add
the directory containing your object library to the PYTHONPATH
environment variable. E.g if you’re using the bash shell:
export PYTHONPATH=/path/to/your/object/library:${PYTHONPATH}
Of course, replace /path/to/your/object/library
with the actual
path to yours. This command places your object library at the front
of the PYTHONPATH
environment variable, which is essentially the
first place where Python looks to find packages and modules to be
imported. (For more, see Python’s official documentation on
PYTHONPATH).
Note
It’s convenient to copy this command into your shell profile (e.g.,
for the bash shell on Linux or Mac, ~/.bash_profile
) so that
you don’t have to call it again in every new terminal session.
To test this is working, run python -c "import my_obj_lib"
from a
directory other than where the library is located (again replacing
my_obj_lib
with the name you’ve given to your library). If this
runs without error, you should be good to go.
Executing calculations¶
As noted above, the officially supported way to submit calculations is the
aospy.submit_mult_calcs()
function.
We provide a template “main” script with aospy that uses this function. We recommend copying it to the location of your choice. In the copy, replace the example object library and associated objects with your own. (If you accidentally change the original, you can always get a fresh copy from Github).
Running the main script¶
Once the main script parameters are all modified as desired, execute the script from the command line as follows
/path/to/your/aospy_main.py
This should generate a text summary of the specified parameters and a prompt as to whether to proceed or not with the calculations. An affirmative response then triggers the calculations to execute.
Note
You may need to change the permissions on the file to make it executable. E.g. from a Mac or Linux: chmod u+x /path/to/your/aospy_main.py. Alternatively you can call python or IPython from the command line to run it: python /path/to/your/aospy_main.py or ipython /path/to/your/aospy_main.py.
Specifically, the parameters are permuted over all possible combinations. So, for example, if two model names and three variable names were listed and all other parameters had only one element, six calculations would be generated and executed. There is no limit to the number of permutations.
Note
You can also call the main script interactively within an IPython
session via %run /path/to/your/main.py
or, from the command
line, run the script and then start an interactive IPython session
via ipython -i /path/to/your/main.py
.
Or you can call aospy.submit_mult_calcs()
directly within
an interactive session.
As the calculations are performed, logging information will be printed to the terminal displaying their progress.
Parallelized calculations¶
The calculations generated by the main script can be executed in
parallel using dask.distributed
. aospy will either automatically
set up a dask.distributed.LocalCluster
to perform the calculations,
or one can optionally specify an external distributed.Client
to delegate
the work. Otherwise, or if the user sets parallelize=False
in the
calc_exec_options
argument of aospy.submit_mult_calcs()
,
script, the calculations will be executed one-by-one.
Particularly on instititutional clusters with many cores, this parallelization yields an impressive speed-up when multiple calculations are generated.
Note
When calculations are performed in parallel, often the logging information from different calculations running simultameously end up interwoven with one another, leading to output that is confusing to follow. Work is ongoing to improve the logging output when the computations are parallelized.
Finding the output¶
aospy saves the results of all calculations as netCDF files and embeds metadata describing it within the netCDF files, in their filenames, and in the directory structure within which they are saved.
- Directory structure:
/path/to/aospy-rootdir/projname/modelname/runname/varname
- File name :
varname.intvl_out.dtype_out_time.'from_'intvl_in'_'dtype_in_time.model.run.date_range.nc
See the API Reference on aospy.Calc
for explanation of
each of these components of the path and file name.
Under the hood¶
aospy.submit_mult_calcs()
creates a aospy.CalcSuite
object that permutes over the provided lists of calculation
specifications, encoding each permutation into a
aospy.Calc
object.
The aospy.Calc
object, in turn:
- loads the required netCDF data given its simulation, variable, and date range
- (if necessary) further truncates the data in time (i.e. to the given subset of the annual cycle, and/or if the requested date range doesn’t exactly align with the time chunking of the input netCDF files)
- (if the variable is a function of other variables) executes the function that computes the calculation using this loaded and truncated data
- applies all specified temporal and regional time reductions
- writes the results (plus additional metadata) to disk as netCDF
files and appends it to its own
data_out
attribute
Note
Actually, when multiple regions and/or output time/regional
reductions are specified, these all get passed to each
aospy.Calc
object rather than being permuted
over. They are then looped over during the subsequent
calculations. This is to prevent unnecessary re-loading and
re-computing, because, for a given simulation/variable/etc., all
regions and reduction methods use the same data.
Note
Unlike aospy.Proj
, aospy.Model
,
aospy.Run
, aospy.Var
, and
aospy.Region
, these objects are not intended to be
saved in .py
files for continual re-use. Instead, they are
generated as needed, perform their desired tasks, and then go away.
See the API reference documentation for further details.