Using aospy¶
This section provides a high-level summary of how to use aospy. See the Examples section and associated Jupyter Notebook for concrete examples.
Note
There is a non-trivial amount of effort required (mainly in creating and populating your object library, described below) before you will be able to perform any calculations. However, once the object library is in place, there is essentially no limit to the number of calculations that can be performed on your data either together at the same time or at different times. In other words, the spinup time should be well worth it.
Your aospy object library¶
The first step is writing the code that describes your data and the quantities you eventually want to compute using it. We refer to this code collectively as your “object library”.
Describing your data on disk¶
aospy needs to know where the data you want to use is located on disk
and how it is organized across different simulations, models, and
projects. This involves a hierarchy of three classes, Proj
,
Model
, and Run
.
Proj
: This represents a single project that involves analysis of data from one or more models and simulations.Model
: This represents a single climate model, other numerical model, observational data source, etc.Run
: This represents a single simulation, version of observational data, etc.
So each user’s object library will contain one or more Proj
objects, each of which will have one or more child Model
objects,
which in turn will each have one or more child Run
objects.
Note
Currently, the Proj-Model-Run hierarchy is rigid, in that each Run has a parent Model, and each Model has a parent Proj. For small projects, this can lead to a lot of boilerplate code. Work is ongoing to relax this constraint to facilitate easier exploratory analysis.
Physical variables¶
The Var
class is used to represent physical variables,
e.g. precipitation rate or potential temperature. This includes both
variables which are directly available in netCDF files (e.g. they were
directly outputted by your climate model) as well as those fields that
must be computed from other variables (e.g. they weren’t directly
outputted but can be computed from other variables that were
outputted).
Geographical regions¶
The Region
class is used to define geographical regions over which
quantities can be averaged (in addition to gridpoint-by-gridpoint
values). Like Var
objects, they are more generic than the objects
of the Proj
- Model
- Run
hierarchy, in that they
correspond to the generic physical quantities/regions rather than the
data of a particular project, model, or simulation.
Configuring your object library¶
Required components¶
In order for your object library to work with the main script, it must include the following two objects:
projs
: A container ofProj
objectsvariables
: A container ofVar
objects
(The Model
, Run
, and Region
objects are all included
within their parent Proj
objects and thus don’t require analogous
top-level containers.)
These must be accessible from the object library’s toplevel namespace,
i.e. the Python commands import my_obj_lib.projs
and import
my_obj_lib.variables
must work, where my_obj_lib
is the name
you’ve given to your library. Which leads to the next topic: how to
structure your object library within one or more .py
files.
File/directory structure¶
The simplest way to structure your object library is to define
everything in a single module (i.e. a single .py
file). This
works great for small projects and for initially trying out aospy.
As an object library grows, however, it can become desirable to split
it into multiple .py
files. This effectively changes it from a
module to a proper Python package. Python packages require a specific
directory structure and specification of things to include at each
level via __init__.py
files. See the official documentation on
packages for further guidance.
For an example of a large object library that is structured as a proper package, see here.
Making your object library visible to Python¶
Whether it is structured as a single module or as a proper package,
you’ll likely have to add the directory containing your object library
to the PYTHONPATH
environment variable in order for Python to be
able to import it:
export PYTHONPATH=/path/to/your/object/library:${PYTHONPATH}
Of course, replace /path/to/your/object/library
with the actual
path to yours. This command places your object library at the front
of the PYTHONPATH
environment variable, which is essentially the
first place where Python looks to find packages and modules to be
imported.
Note
It’s convenient to copy this command into your shell profile (e.g.,
for the bash shell on Linux or Mac, ~/.bash_profile
) so that
you don’t have to call it again in every new terminal session.
Note
For object libraries structured as packages, it is also possible to
properly install your object library by creating a properly set-up
setup.py
file and python setup.py install
. But unless
you’re prevented from modifying PYTHONPATH
for some reason,
there’s no advantage of this versus the simpler
PYTHONPATH
alternative above.
Once this has been done, you should be able to import your object
library from within Python via import my_obj_lib
, where
my_obj_lib
is the name you’ve given to your library. You will not
be able to use the main script until this works.
Executing calculations¶
The main script contents¶
Calculations are performed by specifying in a “main script” the desired parameters and then running the script.
We provide a template main script within aospy. You should copy it to the location of your choice and in the copy replace the given names with the names of your own project, model, etc. objects that you want to perform computations on. (If you accidentally change the original, you can always get a fresh copy from Github.)
Except where noted otherwise in the template script’s comments, all
parameters should be submitted as lists, even if they are a single
element. E.g. models = ['name-of-my-model']
.
Note
Although the main script is the recommended way to perform
calculations, it’s possible to submit calculations by other means.
For example, one could explicitly create Calc
objects and call
their compute
method, as is done in the example Jupyter
notebook.
Running the main script¶
Once the main script parameters are all modified as desired, execute the script from the command line as follows
/path/to/your/main.py
This should generate a text summary of the specified parameters and a prompt as to whether to proceed or not with the calculations. An affirmative response then triggers the calculations to execute.
Specifically, the parameters are permuted over all possible combinations. So, for example, if two model names and three variable names were listed and all other parameters had only one element, six calculations would be generated and executed. There is no limit to the number of permutations.
Note
You can also call the main script interactively within an IPython
session via %run /path/to/your/main.py
or, from the command
line, run the script and then start an interactive IPython session
via ipython -i /path/to/your/main.py
.
As the calculations are performed, logging information will be printed to the terminal displaying their progress.
Parallelized calculations¶
The calculations generated by the main script can be executed in
parallel provided the optional dependency multiprocess
is
installed. (It is available via pip: pip install multiprocess
.)
Otherwise, or if the user sets parallelize
to False
in the main
script, the calculations will be executed one-by-one.
Particularly on instititutional clusters with many cores, this parallelization yields an impressive speed-up when multiple calculations are generated.
Note
When calculations are performed in parallel, often the logging information from different calculations running simultameously end up interwoven with one another, leading to output that is confusing to follow. Work is ongoing to improve the logging output when the computations are parallelized.
Finding the output¶
aospy saves the results of all calculations as netCDF files and embeds metadata describing it within the netCDF files, in their filenames, and in the directory structure within which they are saved.
- Directory structure:
/path/to/aospy-rootdir/projname/modelname/runname/varname
- File name :
varname.intvl_out.dtype_out_time.'from_'intvl_in'_'dtype_in_time.model.run.date_range.nc
See the API reference documentation of CalcInterface
for explanation of each of these components of the path and file name.
Under the hood¶
The main script encodes each permutation of the input parameters into
a CalcInterface
object. This object, in turn, is used to
instantiate a Calc
object. The Calc
object, in turn, performs
the calculation.
Unlike Proj
, Model
, Run
, Var
, and Region
, these
objects are not intended to be saved in .py
files for continual
re-use. Instead, they are generated as needed, perform their desired
tasks, and then go away.
See the API reference documentation for further details.