IMOS Data access with Python¶
This notebook is based on the examples from the IMOS User Code Library tutorial for Python.
Note
In this first example (OceanData1.ipynb), we will work with the IMOS portal using Python via Jupyter Notebooks. Python has several advantages as a general data-analysis language, and the notebook environment is a versatile tool designed to be interactive, user-friendly, open-source and shareable.
We will see how to load NetCDF data into a Python environment, and how to use the data once loaded.
# from IPython.display import YouTubeVideo
# YouTubeVideo('7yTpv70gkGE', width=760, height=450)
Note
First we will load an IMOS NetCDF file, and see how to quickly use the data once loaded.
For this example, we will rely on the netCDF4 Python library (https://unidata.github.io/netcdf4-python/).
Attention
The examples provided in this document represent only a small fraction of the content of most NetCDF files. There are usually many more variables available in a NetCDF file, and therefore many other ways to display the data.
Content:
Installation of the IMOS User Code Library (Python) and required packages¶
The examples that will be used rely on the following Python packages, which have already been installed:
numpy – standard package for scientific computing in Python; provides versatile numerical array objects (http://www.numpy.org/)
matplotlib – for plotting (http://matplotlib.org/)
netCDF4 – for accessing NetCDF files (https://unidata.github.io/netcdf4-python/)
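If any of these are missing from your environment, they can usually be installed with pip; a minimal sketch (the package names shown are the standard PyPI ones):
# Install the required packages from a notebook cell
# (the leading "!" executes a shell command)
!pip install numpy matplotlib netCDF4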
Finding an IMOS NetCDF File¶
In order to find a dataset you are interested in, please refer to the portal help: http://help.aodn.org.au/help/?q=node/6. This how-to guide helps users find an IMOS NetCDF file. When downloading a chosen dataset from the portal, choose either the “List of URLs” or the “All source NETCDF files” download option to obtain NetCDF files.
For users who are already familiar with IMOS facilities and datasets, IMOS NetCDF files are also directly accessible via an OPeNDAP catalog at http://thredds.aodn.org.au/thredds/catalog/IMOS/catalog.html
Here we will use the ‘Data URL’ of a dataset. If you have downloaded your dataset from the portal, the data URL is the path to the file on your local machine. If you are using the THREDDS catalog, the file does not have to be downloaded to your local machine first; the OPeNDAP data URL can be passed directly to Python.
The OPeNDAP data URL is found on the ‘OPeNDAP Dataset Access Form’ page (see http://help.aodn.org.au/help/?q=node/11), inside the box labelled ‘Data URL’ just above the ‘Global Attributes’ field.
Note
The list of URLs generated by the IMOS portal when choosing that download option can be converted to a list of OPeNDAP data URLs by replacing the string http://data.aodn.org.au/IMOS/opendap with http://thredds.aodn.org.au/thredds/dodsC/IMOS.
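A minimal sketch of that conversion, assuming the portal's list has been saved locally as the hypothetical file urls.txt (one URL per line):
# Convert portal download URLs into OPeNDAP data URLs
# ('urls.txt' is a hypothetical file holding the portal's "List of URLs")
with open('urls.txt') as f:
    opendap_urls = [line.strip().replace('http://data.aodn.org.au/IMOS/opendap',
                                         'http://thredds.aodn.org.au/thredds/dodsC/IMOS')
                    for line in f if line.strip()]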
General Features of the netCDF4 module¶
The first step consists of opening the NetCDF file, whether it is available locally or remotely on an OPeNDAP server. Running the commands in the cell below creates a Dataset object, through which you can access all the contents of the file.
from netCDF4 import Dataset
aatams_URL = 'http://thredds.aodn.org.au/thredds/dodsC/IMOS/eMII/demos/AATAMS/marine_mammal_ctd-tag/2009_2011_ct64_Casey_Macquarie/ct64-M746-09/IMOS_AATAMS-SATTAG_TSP_20100205T043000Z_ct64-M746-09_END-20101029T071000Z_FV00.nc'
aatams_DATA = Dataset(aatams_URL)
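Opening a remote OPeNDAP URL can fail if the THREDDS server is temporarily unreachable. A defensive sketch, assuming you keep a previously downloaded copy at the hypothetical path local_copy.nc:
# Hedged sketch: fall back to a local copy if the remote open fails
# ('local_copy.nc' is a hypothetical path to a downloaded copy of the file)
try:
    aatams_DATA = Dataset(aatams_URL)
except OSError as err:
    print('Remote open failed ({}); trying local copy'.format(err))
    aatams_DATA = Dataset('local_copy.nc')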
Output structure¶
Please refer to the netCDF4 module documentation for a complete description of the Dataset object: https://unidata.github.io/netcdf4-python/ (or type help(Dataset) at the Python prompt).
help(aatams_DATA)
Help on Dataset object:
class Dataset(builtins.object)
| A netCDF [Dataset](#Dataset) is a collection of dimensions, groups, variables and
| attributes. Together they describe the meaning of data and relations among
| data fields stored in a netCDF file. See [Dataset.__init__](#Dataset.__init__) for more
| details.
|
| A list of attribute names corresponding to global netCDF attributes
| defined for the [Dataset](#Dataset) can be obtained with the
| [Dataset.ncattrs](#Dataset.ncattrs) method.
| These attributes can be created by assigning to an attribute of the
| [Dataset](#Dataset) instance. A dictionary containing all the netCDF attribute
| name/value pairs is provided by the `__dict__` attribute of a
| [Dataset](#Dataset) instance.
|
| The following class variables are read-only and should not be
| modified by the user.
|
| **`dimensions`**: The `dimensions` dictionary maps the names of
| dimensions defined for the [Group](#Group) or [Dataset](#Dataset) to instances of the
| [Dimension](#Dimension) class.
|
| **`variables`**: The `variables` dictionary maps the names of variables
| defined for this [Dataset](#Dataset) or [Group](#Group) to instances of the
| [Variable](#Variable) class.
|
| **`groups`**: The groups dictionary maps the names of groups created for
| this [Dataset](#Dataset) or [Group](#Group) to instances of the [Group](#Group) class (the
| [Dataset](#Dataset) class is simply a special case of the [Group](#Group) class which
| describes the root group in the netCDF4 file).
|
| **`cmptypes`**: The `cmptypes` dictionary maps the names of
| compound types defined for the [Group](#Group) or [Dataset](#Dataset) to instances of the
| [CompoundType](#CompoundType) class.
|
| **`vltypes`**: The `vltypes` dictionary maps the names of
| variable-length types defined for the [Group](#Group) or [Dataset](#Dataset) to instances
| of the [VLType](#VLType) class.
|
| **`enumtypes`**: The `enumtypes` dictionary maps the names of
| Enum types defined for the [Group](#Group) or [Dataset](#Dataset) to instances
| of the [EnumType](#EnumType) class.
|
| **`data_model`**: `data_model` describes the netCDF
| data model version, one of `NETCDF3_CLASSIC`, `NETCDF4`,
| `NETCDF4_CLASSIC`, `NETCDF3_64BIT_OFFSET` or `NETCDF3_64BIT_DATA`.
|
| **`file_format`**: same as `data_model`, retained for backwards compatibility.
|
| **`disk_format`**: `disk_format` describes the underlying
| file format, one of `NETCDF3`, `HDF5`, `HDF4`,
| `PNETCDF`, `DAP2`, `DAP4` or `UNDEFINED`. Only available if using
| netcdf C library version >= 4.3.1, otherwise will always return
| `UNDEFINED`.
|
| **`parent`**: `parent` is a reference to the parent
| [Group](#Group) instance. `None` for the root group or [Dataset](#Dataset)
| instance.
|
| **`path`**: `path` shows the location of the [Group](#Group) in
| the [Dataset](#Dataset) in a unix directory format (the names of groups in the
| hierarchy separated by backslashes). A [Dataset](#Dataset) instance is the root
| group, so the path is simply `'/'`.
|
| **`keepweakref`**: If `True`, child Dimension and Variables objects only keep weak
| references to the parent Dataset or Group.
|
| **`_ncstring_attrs__`**: If `True`, all text attributes will be written as variable-length
| strings.
|
| Methods defined here:
|
| __delattr__(self, name, /)
| Implement delattr(self, name).
|
| __enter__(...)
|
| __exit__(...)
|
| __getattr__(...)
|
| __getattribute__(self, name, /)
| Return getattr(self, name).
|
| __getitem__(self, key, /)
| Return self[key].
|
| __init__(...)
| **`__init__(self, filename, mode="r", clobber=True, diskless=False,
| persist=False, keepweakref=False, memory=None, encoding=None,
| parallel=False, comm=None, info=None, format='NETCDF4')`**
|
| [Dataset](#Dataset) constructor.
|
| **`filename`**: Name of netCDF file to hold dataset. Can also
| be a python 3 pathlib instance or the URL of an OpenDAP dataset. When memory is
| set this is just used to set the `filepath()`.
|
| **`mode`**: access mode. `r` means read-only; no data can be
| modified. `w` means write; a new file is created, an existing file with
| the same name is deleted. `a` and `r+` mean append (in analogy with
| serial files); an existing file is opened for reading and writing.
| Appending `s` to modes `r`, `w`, `r+` or `a` will enable unbuffered shared
| access to `NETCDF3_CLASSIC`, `NETCDF3_64BIT_OFFSET` or
| `NETCDF3_64BIT_DATA` formatted files.
| Unbuffered access may be useful even if you don't need shared
| access, since it may be faster for programs that don't access data
| sequentially. This option is ignored for `NETCDF4` and `NETCDF4_CLASSIC`
| formatted files.
|
| **`clobber`**: if `True` (default), opening a file with `mode='w'`
| will clobber an existing file with the same name. if `False`, an
| exception will be raised if a file with the same name already exists.
|
| **`format`**: underlying file format (one of `'NETCDF4',
| 'NETCDF4_CLASSIC', 'NETCDF3_CLASSIC'`, `'NETCDF3_64BIT_OFFSET'` or
| `'NETCDF3_64BIT_DATA'`.
| Only relevant if `mode = 'w'` (if `mode = 'r','a'` or `'r+'` the file format
| is automatically detected). Default `'NETCDF4'`, which means the data is
| stored in an HDF5 file, using netCDF 4 API features. Setting
| `format='NETCDF4_CLASSIC'` will create an HDF5 file, using only netCDF 3
| compatible API features. netCDF 3 clients must be recompiled and linked
| against the netCDF 4 library to read files in `NETCDF4_CLASSIC` format.
| `'NETCDF3_CLASSIC'` is the classic netCDF 3 file format that does not
| handle 2+ Gb files. `'NETCDF3_64BIT_OFFSET'` is the 64-bit offset
| version of the netCDF 3 file format, which fully supports 2+ GB files, but
| is only compatible with clients linked against netCDF version 3.6.0 or
| later. `'NETCDF3_64BIT_DATA'` is the 64-bit data version of the netCDF 3
| file format, which supports 64-bit dimension sizes plus unsigned and
| 64 bit integer data types, but is only compatible with clients linked against
| netCDF version 4.4.0 or later.
|
| **`diskless`**: If `True`, create diskless (in-core) file.
| This is a feature added to the C library after the
| netcdf-4.2 release. If you need to access the memory buffer directly,
| use the in-memory feature instead (see `memory` kwarg).
|
| **`persist`**: if `diskless=True`, persist file to disk when closed
| (default `False`).
|
| **`keepweakref`**: if `True`, child Dimension and Variable instances will keep weak
| references to the parent Dataset or Group object. Default is `False`, which
| means strong references will be kept. Having Dimension and Variable instances
| keep a strong reference to the parent Dataset instance, which in turn keeps a
| reference to child Dimension and Variable instances, creates circular references.
| Circular references complicate garbage collection, which may mean increased
| memory usage for programs that create may Dataset instances with lots of
| Variables. It also will result in the Dataset object never being deleted, which
| means it may keep open files alive as well. Setting `keepweakref=True` allows
| Dataset instances to be garbage collected as soon as they go out of scope, potentially
| reducing memory usage and open file handles. However, in many cases this is not
| desirable, since the associated Variable instances may still be needed, but are
| rendered unusable when the parent Dataset instance is garbage collected.
|
| **`memory`**: if not `None`, create or open an in-memory Dataset.
| If mode = 'r', the memory kwarg must contain a memory buffer object
| (an object that supports the python buffer interface).
| The Dataset will then be created with contents taken from this block of memory.
| If mode = 'w', the memory kwarg should contain the anticipated size
| of the Dataset in bytes (used only for NETCDF3 files). A memory
| buffer containing a copy of the Dataset is returned by the
| `Dataset.close` method. Requires netcdf-c version 4.4.1 for mode='r,
| netcdf-c 4.6.2 for mode='w'. To persist the file to disk, the raw
| bytes from the returned buffer can be written into a binary file.
| The Dataset can also be re-opened using this memory buffer.
|
| **`encoding`**: encoding used to encode filename string into bytes.
| Default is None (`sys.getdefaultfileencoding()` is used).
|
| **`parallel`**: open for parallel access using MPI (requires mpi4py and
| parallel-enabled netcdf-c and hdf5 libraries). Default is `False`. If
| `True`, `comm` and `info` kwargs may also be specified.
|
| **`comm`**: MPI_Comm object for parallel access. Default `None`, which
| means MPI_COMM_WORLD will be used. Ignored if `parallel=False`.
|
| **`info`**: MPI_Info object for parallel access. Default `None`, which
| means MPI_INFO_NULL will be used. Ignored if `parallel=False`.
|
| __new__(*args, **kwargs) from builtins.type
| Create and return a new object. See help(type) for accurate signature.
|
| __reduce__(...)
| helper for pickle
|
| __repr__(self, /)
| Return repr(self).
|
| __setattr__(self, name, value, /)
| Implement setattr(self, name, value).
|
| __unicode__(...)
|
| close(...)
| **`close(self)`**
|
| Close the Dataset.
|
| createCompoundType(...)
| **`createCompoundType(self, datatype, datatype_name)`**
|
| Creates a new compound data type named `datatype_name` from the numpy
| dtype object `datatype`.
|
| ***Note***: If the new compound data type contains other compound data types
| (i.e. it is a 'nested' compound type, where not all of the elements
| are homogeneous numeric data types), then the 'inner' compound types **must** be
| created first.
|
| The return value is the [CompoundType](#CompoundType) class instance describing the new
| datatype.
|
| createDimension(...)
| **`createDimension(self, dimname, size=None)`**
|
| Creates a new dimension with the given `dimname` and `size`.
|
| `size` must be a positive integer or `None`, which stands for
| "unlimited" (default is `None`). Specifying a size of 0 also
| results in an unlimited dimension. The return value is the [Dimension](#Dimension)
| class instance describing the new dimension. To determine the current
| maximum size of the dimension, use the `len` function on the [Dimension](#Dimension)
| instance. To determine if a dimension is 'unlimited', use the
| [Dimension.isunlimited](#Dimension.isunlimited) method of the [Dimension](#Dimension) instance.
|
| createEnumType(...)
| **`createEnumType(self, datatype, datatype_name, enum_dict)`**
|
| Creates a new Enum data type named `datatype_name` from a numpy
| integer dtype object `datatype`, and a python dictionary
| defining the enum fields and values.
|
| The return value is the [EnumType](#EnumType) class instance describing the new
| datatype.
|
| createGroup(...)
| **`createGroup(self, groupname)`**
|
| Creates a new [Group](#Group) with the given `groupname`.
|
| If `groupname` is specified as a path, using forward slashes as in unix to
| separate components, then intermediate groups will be created as necessary
| (analogous to `mkdir -p` in unix). For example,
| `createGroup('/GroupA/GroupB/GroupC')` will create `GroupA`,
| `GroupA/GroupB`, and `GroupA/GroupB/GroupC`, if they don't already exist.
| If the specified path describes a group that already exists, no error is
| raised.
|
| The return value is a [Group](#Group) class instance.
|
| createVLType(...)
| **`createVLType(self, datatype, datatype_name)`**
|
| Creates a new VLEN data type named `datatype_name` from a numpy
| dtype object `datatype`.
|
| The return value is the [VLType](#VLType) class instance describing the new
| datatype.
|
| createVariable(...)
| **`createVariable(self, varname, datatype, dimensions=(), zlib=False,
| complevel=4, shuffle=True, fletcher32=False, contiguous=False, chunksizes=None,
| endian='native', least_significant_digit=None, fill_value=None, chunk_cache=None)`**
|
| Creates a new variable with the given `varname`, `datatype`, and
| `dimensions`. If dimensions are not given, the variable is assumed to be
| a scalar.
|
| If `varname` is specified as a path, using forward slashes as in unix to
| separate components, then intermediate groups will be created as necessary
| For example, `createVariable('/GroupA/GroupB/VarC', float, ('x','y'))` will create groups `GroupA`
| and `GroupA/GroupB`, plus the variable `GroupA/GroupB/VarC`, if the preceding
| groups don't already exist.
|
| The `datatype` can be a numpy datatype object, or a string that describes
| a numpy dtype object (like the `dtype.str` attribute of a numpy array).
| Supported specifiers include: `'S1' or 'c' (NC_CHAR), 'i1' or 'b' or 'B'
| (NC_BYTE), 'u1' (NC_UBYTE), 'i2' or 'h' or 's' (NC_SHORT), 'u2'
| (NC_USHORT), 'i4' or 'i' or 'l' (NC_INT), 'u4' (NC_UINT), 'i8' (NC_INT64),
| 'u8' (NC_UINT64), 'f4' or 'f' (NC_FLOAT), 'f8' or 'd' (NC_DOUBLE)`.
| `datatype` can also be a [CompoundType](#CompoundType) instance
| (for a structured, or compound array), a [VLType](#VLType) instance
| (for a variable-length array), or the python `str` builtin
| (for a variable-length string array). Numpy string and unicode datatypes with
| length greater than one are aliases for `str`.
|
| Data from netCDF variables is presented to python as numpy arrays with
| the corresponding data type.
|
| `dimensions` must be a tuple containing dimension names (strings) that
| have been defined previously using [Dataset.createDimension](#Dataset.createDimension). The default value
| is an empty tuple, which means the variable is a scalar.
|
| If the optional keyword `zlib` is `True`, the data will be compressed in
| the netCDF file using gzip compression (default `False`).
|
| The optional keyword `complevel` is an integer between 1 and 9 describing
| the level of compression desired (default 4). Ignored if `zlib=False`.
|
| If the optional keyword `shuffle` is `True`, the HDF5 shuffle filter
| will be applied before compressing the data (default `True`). This
| significantly improves compression. Default is `True`. Ignored if
| `zlib=False`.
|
| If the optional keyword `fletcher32` is `True`, the Fletcher32 HDF5
| checksum algorithm is activated to detect errors. Default `False`.
|
| If the optional keyword `contiguous` is `True`, the variable data is
| stored contiguously on disk. Default `False`. Setting to `True` for
| a variable with an unlimited dimension will trigger an error.
|
| The optional keyword `chunksizes` can be used to manually specify the
| HDF5 chunksizes for each dimension of the variable. A detailed
| discussion of HDF chunking and I/O performance is available
| [here](http://www.hdfgroup.org/HDF5/doc/H5.user/Chunking.html).
| Basically, you want the chunk size for each dimension to match as
| closely as possible the size of the data block that users will read
| from the file. `chunksizes` cannot be set if `contiguous=True`.
|
| The optional keyword `endian` can be used to control whether the
| data is stored in little or big endian format on disk. Possible
| values are `little, big` or `native` (default). The library
| will automatically handle endian conversions when the data is read,
| but if the data is always going to be read on a computer with the
| opposite format as the one used to create the file, there may be
| some performance advantage to be gained by setting the endian-ness.
|
| The `zlib, complevel, shuffle, fletcher32, contiguous, chunksizes` and `endian`
| keywords are silently ignored for netCDF 3 files that do not use HDF5.
|
| The optional keyword `fill_value` can be used to override the default
| netCDF `_FillValue` (the value that the variable gets filled with before
| any data is written to it, defaults given in [default_fillvals](#default_fillvals)).
| If fill_value is set to `False`, then the variable is not pre-filled.
|
| If the optional keyword parameter `least_significant_digit` is
| specified, variable data will be truncated (quantized). In conjunction
| with `zlib=True` this produces 'lossy', but significantly more
| efficient compression. For example, if `least_significant_digit=1`,
| data will be quantized using `numpy.around(scale*data)/scale`, where
| scale = 2**bits, and bits is determined so that a precision of 0.1 is
| retained (in this case bits=4). From the
| [PSL metadata conventions](http://www.esrl.noaa.gov/psl/data/gridded/conventions/cdc_netcdf_standard.shtml):
| "least_significant_digit -- power of ten of the smallest decimal place
| in unpacked data that is a reliable value." Default is `None`, or no
| quantization, or 'lossless' compression.
|
| When creating variables in a `NETCDF4` or `NETCDF4_CLASSIC` formatted file,
| HDF5 creates something called a 'chunk cache' for each variable. The
| default size of the chunk cache may be large enough to completely fill
| available memory when creating thousands of variables. The optional
| keyword `chunk_cache` allows you to reduce (or increase) the size of
| the default chunk cache when creating a variable. The setting only
| persists as long as the Dataset is open - you can use the set_var_chunk_cache
| method to change it the next time the Dataset is opened.
| Warning - messing with this parameter can seriously degrade performance.
|
| The return value is the [Variable](#Variable) class instance describing the new
| variable.
|
| A list of names corresponding to netCDF variable attributes can be
| obtained with the [Variable](#Variable) method [Variable.ncattrs](#Variable.ncattrs). A dictionary
| containing all the netCDF attribute name/value pairs is provided by
| the `__dict__` attribute of a [Variable](#Variable) instance.
|
| [Variable](#Variable) instances behave much like array objects. Data can be
| assigned to or retrieved from a variable with indexing and slicing
| operations on the [Variable](#Variable) instance. A [Variable](#Variable) instance has six
| Dataset standard attributes: `dimensions, dtype, shape, ndim, name` and
| `least_significant_digit`. Application programs should never modify
| these attributes. The `dimensions` attribute is a tuple containing the
| names of the dimensions associated with this variable. The `dtype`
| attribute is a string describing the variable's data type (`i4, f8,
| S1,` etc). The `shape` attribute is a tuple describing the current
| sizes of all the variable's dimensions. The `name` attribute is a
| string containing the name of the Variable instance.
| The `least_significant_digit`
| attributes describes the power of ten of the smallest decimal place in
| the data the contains a reliable value. assigned to the [Variable](#Variable)
| instance. If `None`, the data is not truncated. The `ndim` attribute
| is the number of variable dimensions.
|
| delncattr(...)
| **`delncattr(self,name,value)`**
|
| delete a netCDF dataset or group attribute. Use if you need to delete a
| netCDF attribute with the same name as one of the reserved python
| attributes.
|
| filepath(...)
| **`filepath(self,encoding=None)`**
|
| Get the file system path (or the opendap URL) which was used to
| open/create the Dataset. Requires netcdf >= 4.1.2. The path
| is decoded into a string using `sys.getfilesystemencoding()` by default, this can be
| changed using the `encoding` kwarg.
|
| get_variables_by_attributes(...)
| **`get_variables_by_attribute(self, **kwargs)`**
|
| Returns a list of variables that match specific conditions.
|
| Can pass in key=value parameters and variables are returned that
| contain all of the matches. For example,
|
| ```python
| >>> # Get variables with x-axis attribute.
| >>> vs = nc.get_variables_by_attributes(axis='X')
| >>> # Get variables with matching "standard_name" attribute
| >>> vs = nc.get_variables_by_attributes(standard_name='northward_sea_water_velocity')
| ```
|
| Can pass in key=callable parameter and variables are returned if the
| callable returns True. The callable should accept a single parameter,
| the attribute value. None is given as the attribute value when the
| attribute does not exist on the variable. For example,
|
| ```python
| >>> # Get Axis variables
| >>> vs = nc.get_variables_by_attributes(axis=lambda v: v in ['X', 'Y', 'Z', 'T'])
| >>> # Get variables that don't have an "axis" attribute
| >>> vs = nc.get_variables_by_attributes(axis=lambda v: v is None)
| >>> # Get variables that have a "grid_mapping" attribute
| >>> vs = nc.get_variables_by_attributes(grid_mapping=lambda v: v is not None)
| ```
|
| getncattr(...)
| **`getncattr(self,name)`**
|
| retrieve a netCDF dataset or group attribute.
| Use if you need to get a netCDF attribute with the same
| name as one of the reserved python attributes.
|
| option kwarg `encoding` can be used to specify the
| character encoding of a string attribute (default is `utf-8`).
|
| isopen(...)
| **`close(self)`**
|
| is the Dataset open or closed?
|
| ncattrs(...)
| **`ncattrs(self)`**
|
| return netCDF global attribute names for this [Dataset](#Dataset) or [Group](#Group) in a list.
|
| renameAttribute(...)
| **`renameAttribute(self, oldname, newname)`**
|
| rename a [Dataset](#Dataset) or [Group](#Group) attribute named `oldname` to `newname`.
|
| renameDimension(...)
| **`renameDimension(self, oldname, newname)`**
|
| rename a [Dimension](#Dimension) named `oldname` to `newname`.
|
| renameGroup(...)
| **`renameGroup(self, oldname, newname)`**
|
| rename a [Group](#Group) named `oldname` to `newname` (requires netcdf >= 4.3.1).
|
| renameVariable(...)
| **`renameVariable(self, oldname, newname)`**
|
| rename a [Variable](#Variable) named `oldname` to `newname`
|
| set_always_mask(...)
| **`set_always_mask(self, True_or_False)`**
|
| Call [Variable.set_always_mask](#Variable.set_always_mask) for all variables contained in
| this [Dataset](#Dataset) or [Group](#Group), as well as for all
| variables in all its subgroups.
|
| **`True_or_False`**: Boolean determining if automatic conversion of
| masked arrays with no missing values to regular numpy arrays shall be
| applied for all variables. Default True. Set to False to restore the default behaviour
| in versions prior to 1.4.1 (numpy array returned unless missing values are present,
| otherwise masked array returned).
|
| ***Note***: Calling this function only affects existing
| variables. Variables created after calling this function will follow
| the default behaviour.
|
| set_auto_chartostring(...)
| **`set_auto_chartostring(self, True_or_False)`**
|
| Call [Variable.set_auto_chartostring](#Variable.set_auto_chartostring) for all variables contained in this [Dataset](#Dataset) or
| [Group](#Group), as well as for all variables in all its subgroups.
|
| **`True_or_False`**: Boolean determining if automatic conversion of
| all character arrays <--> string arrays should be performed for
| character variables (variables of type `NC_CHAR` or `S1`) with the
| `_Encoding` attribute set.
|
| ***Note***: Calling this function only affects existing variables. Variables created
| after calling this function will follow the default behaviour.
|
| set_auto_mask(...)
| **`set_auto_mask(self, True_or_False)`**
|
| Call [Variable.set_auto_mask](#Variable.set_auto_mask) for all variables contained in this [Dataset](#Dataset) or
| [Group](#Group), as well as for all variables in all its subgroups.
|
| **`True_or_False`**: Boolean determining if automatic conversion to masked arrays
| shall be applied for all variables.
|
| ***Note***: Calling this function only affects existing variables. Variables created
| after calling this function will follow the default behaviour.
|
| set_auto_maskandscale(...)
| **`set_auto_maskandscale(self, True_or_False)`**
|
| Call [Variable.set_auto_maskandscale](#Variable.set_auto_maskandscale) for all variables contained in this [Dataset](#Dataset) or
| [Group](#Group), as well as for all variables in all its subgroups.
|
| **`True_or_False`**: Boolean determining if automatic conversion to masked arrays
| and variable scaling shall be applied for all variables.
|
| ***Note***: Calling this function only affects existing variables. Variables created
| after calling this function will follow the default behaviour.
|
| set_auto_scale(...)
| **`set_auto_scale(self, True_or_False)`**
|
| Call [Variable.set_auto_scale](#Variable.set_auto_scale) for all variables contained in this [Dataset](#Dataset) or
| [Group](#Group), as well as for all variables in all its subgroups.
|
| **`True_or_False`**: Boolean determining if automatic variable scaling
| shall be applied for all variables.
|
| ***Note***: Calling this function only affects existing variables. Variables created
| after calling this function will follow the default behaviour.
|
| set_fill_off(...)
| **`set_fill_off(self)`**
|
| Sets the fill mode for a [Dataset](#Dataset) open for writing to `off`.
|
| This will prevent the data from being pre-filled with fill values, which
| may result in some performance improvements. However, you must then make
| sure the data is actually written before being read.
|
| set_fill_on(...)
| **`set_fill_on(self)`**
|
| Sets the fill mode for a [Dataset](#Dataset) open for writing to `on`.
|
| This causes data to be pre-filled with fill values. The fill values can be
| controlled by the variable's `_Fill_Value` attribute, but is usually
| sufficient to the use the netCDF default `_Fill_Value` (defined
| separately for each variable type). The default behavior of the netCDF
| library corresponds to `set_fill_on`. Data which are equal to the
| `_Fill_Value` indicate that the variable was created, but never written
| to.
|
| set_ncstring_attrs(...)
| **`set_ncstring_attrs(self, True_or_False)`**
|
| Call [Variable.set_ncstring_attrs](#Variable.set_ncstring_attrs) for all variables contained in
| this [Dataset](#Dataset) or [Group](#Group), as well as for all its
| subgroups and their variables.
|
| **`True_or_False`**: Boolean determining if all string attributes are
| created as variable-length NC_STRINGs, (if True), or if ascii text
| attributes are stored as NC_CHARs (if False; default)
|
| ***Note***: Calling this function only affects newly created attributes
| of existing (sub-) groups and their variables.
|
| setncattr(...)
| **`setncattr(self,name,value)`**
|
| set a netCDF dataset or group attribute using name,value pair.
| Use if you need to set a netCDF attribute with the
| with the same name as one of the reserved python attributes.
|
| setncattr_string(...)
| **`setncattr_string(self,name,value)`**
|
| set a netCDF dataset or group string attribute using name,value pair.
| Use if you need to ensure that a netCDF attribute is created with type
| `NC_STRING` if the file format is `NETCDF4`.
|
| setncatts(...)
| **`setncatts(self,attdict)`**
|
| set a bunch of netCDF dataset or group attributes at once using a python dictionary.
| This may be faster when setting a lot of attributes for a `NETCDF3`
| formatted file, since nc_redef/nc_enddef is not called in between setting
| each attribute
|
| sync(...)
| **`sync(self)`**
|
| Writes all buffered data in the [Dataset](#Dataset) to the disk file.
|
| tocdl(...)
| **`tocdl(self, coordvars=False, data=False, outfile=None)`**
|
| call `ncdump` via subprocess to create [CDL](https://www.unidata.ucar.edu/software/netcdf/docs/netcdf_utilities_guide.html#cdl_guide)
| text representation of Dataset. Requires `ncdump` to be installed and in `$PATH`.
|
| **`coordvars`**: include coordinate variable data (via `ncdump -c`). Default False
|
| **`data`**: if True, write out variable data (Default False).
|
| **`outfile`**: If not None, file to output ncdump to. Default is to return a string.
|
| ----------------------------------------------------------------------
| Static methods defined here:
|
| fromcdl(...)
| **`fromcdl(cdlfilename, ncfilename=None, mode='a',format='NETCDF4')`**
|
| call `ncgen` via subprocess to create Dataset from [CDL](https://www.unidata.ucar.edu/software/netcdf/docs/netcdf_utilities_guide.html#cdl_guide)
| text representation. Requires `ncgen` to be installed and in `$PATH`.
|
| **`cdlfilename`**: CDL file.
|
| **`ncfilename`**: netCDF file to create. If not given, CDL filename with
| suffix replaced by `.nc` is used..
|
| **`mode`**: Access mode to open Dataset (Default `'a'`).
|
| **`format`**: underlying file format to use (one of `'NETCDF4',
| 'NETCDF4_CLASSIC', 'NETCDF3_CLASSIC'`, `'NETCDF3_64BIT_OFFSET'` or
| `'NETCDF3_64BIT_DATA'`. Default `'NETCDF4'`.
|
| Dataset instance for `ncfilename` is returned.
|
| ----------------------------------------------------------------------
| Data descriptors defined here:
|
| __orthogonal_indexing__
|
| cmptypes
|
| data_model
|
| dimensions
|
| disk_format
|
| enumtypes
|
| file_format
|
| groups
|
| keepweakref
|
| name
| string name of Group instance
|
| parent
|
| path
|
| variables
|
| vltypes
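The read-only class variables listed in the help text above are available directly on our Dataset object, for example:
# quick structural checks via the read-only Dataset attributes
print(aatams_DATA.data_model)   # netCDF data model version (NETCDF3_CLASSIC for this file)
print(aatams_DATA.dimensions)   # mapping of dimension names to Dimension objects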
Discover the metadata¶
In order to see all the global attributes and some other information about the file, run:
print(aatams_DATA)
<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF3_CLASSIC data model, file format DAP2):
project: Integrated Marine Observing System (IMOS)
conventions: IMOS-1.2
date_created: 2012-09-13T07:27:03Z
title: Temperature, Salinity and Depth profiles in near real time
institution: AATAMS
site: CTD Satellite Relay Data Logger
abstract: CTD Satellite Relay Data Loggers are used to explore how marine mammal behaviour relates to their oceanic environment. Loggers developped at the University of St Andrews Sea Mammal Research Unit transmit data in near real time via the Argo satellite system
source: SMRU CTD Satellite relay Data Logger on marine mammals
keywords: Oceans>Ocean Temperature>Water Temperature ;Oceans>Salinity/Density>Conductivity ;Oceans>Marine Biology>Marine Mammals
references: http://imos.org.au/aatams.html
unique_reference_code: ct64-M746-09
platform_code: Q9900335
netcdf_version: 3.6
naming_authority: IMOS
quality_control_set: 1
cdm_data_type: Trajectory
geospatial_lat_min: -73.2633350301659
geospatial_lat_max: -54.4634576271227
geospatial_lat_units: degrees_north
geospatial_lon_min: -179.903050293358
geospatial_lon_max: 179.942919142718
geospatial_lon_units: degrees_east
geospatial_vertical_min: 6.0
geospatial_vertical_max: 1138.0
geospatial_vertical_units: dbar
time_coverage_start: 2010-02-05T04:30:00Z
time_coverage_end: 2010-10-29T07:10:00Z
data_centre_email: info@emii.org.au
data_centre: eMarine Information Infrastructure (eMII)
author: Besnard, Laurent
author_email: laurent.besnard@utas.edu.au
institution_references: http://imos.org.au/emii.html
principal_investigator: Harcourt, Rob
citation: Citation to be used in publications should follow the format: IMOS, [year-of-data-download], [Title], [data-access-URL],accessed [date-of-access]
acknowledgment: Any users of IMOS data are required to clearly acknowledge the source of the material in the format: "Data was sourced from the Integrated Marine Observing System (IMOS) - IMOS is supported by the Australian Government through the National Collaborative Research Infrastructure Strategy (NCRIS) and the Super Science Initiative (SSI)"
distribution_statement: AATAMS data may be re-used, provided that related metadata explaining the data has been reviewed by the user and the data is appropriately acknowledged. Data, products and services from IMOS are provided "as is" without any warranty as to fitness for a particular purpose
file_version: Level 0 - Raw data
file_version_quality_control: Data in this file has not undergone quality control. There has been no QC performed on this real-time data.
metadata_uuid: dbe1e98e-f447-4cf6-bb1c-412baae171ab
body_code: 11449
ptt_code: 54746
species_name: Southern ellie
release_site: Macquarie
sattag_program: ct64
EXTRA_DIMENSION.length_char: 8
dimensions(sizes): obs(12987), profiles(800)
variables(dimensions): float64 TIME(profiles), float64 LATITUDE(profiles), float64 LONGITUDE(profiles), float64 TEMP(obs), float64 PRES(obs), float64 PSAL(obs), float64 parentIndex(obs), float64 TIME_quality_control(profiles), float64 LATITUDE_quality_control(profiles), float64 LONGITUDE_quality_control(profiles), float64 TEMP_quality_control(obs), float64 PRES_quality_control(obs), float64 PSAL_quality_control(obs)
groups:
Tip
Global attributes in the netCDF file become attributes of the Dataset object.
A list of global attribute names is returned by the object's ncattrs() method. The object's __dict__ attribute is a dictionary of all netCDF attribute names and values.
# store the dataset's title in a local variable
title_str = aatams_DATA.title
# list all global attribute names
aatams_DATA.ncattrs()
['project',
'conventions',
'date_created',
'title',
'institution',
'site',
'abstract',
'source',
'keywords',
'references',
'unique_reference_code',
'platform_code',
'netcdf_version',
'naming_authority',
'quality_control_set',
'cdm_data_type',
'geospatial_lat_min',
'geospatial_lat_max',
'geospatial_lat_units',
'geospatial_lon_min',
'geospatial_lon_max',
'geospatial_lon_units',
'geospatial_vertical_min',
'geospatial_vertical_max',
'geospatial_vertical_units',
'time_coverage_start',
'time_coverage_end',
'data_centre_email',
'data_centre',
'author',
'author_email',
'institution_references',
'principal_investigator',
'citation',
'acknowledgment',
'distribution_statement',
'file_version',
'file_version_quality_control',
'metadata_uuid',
'body_code',
'ptt_code',
'species_name',
'release_site',
'sattag_program',
'EXTRA_DIMENSION.length_char']
We can store the complete set of attributes in a dictionary (OrderedDict) object (similar to a standard Python dict, but one that maintains the order in which items are entered):
globalAttr = aatams_DATA.__dict__
globalAttr
OrderedDict([('project', 'Integrated Marine Observing System (IMOS)'),
('conventions', 'IMOS-1.2'),
('date_created', '2012-09-13T07:27:03Z'),
('title',
'Temperature, Salinity and Depth profiles in near real time'),
('institution', 'AATAMS'),
('site', 'CTD Satellite Relay Data Logger'),
('abstract',
'CTD Satellite Relay Data Loggers are used to explore how marine mammal behaviour relates to their oceanic environment. Loggers developped at the University of St Andrews Sea Mammal Research Unit transmit data in near real time via the Argo satellite system'),
('source',
'SMRU CTD Satellite relay Data Logger on marine mammals'),
('keywords',
'Oceans>Ocean Temperature>Water Temperature ;Oceans>Salinity/Density>Conductivity ;Oceans>Marine Biology>Marine Mammals'),
('references', 'http://imos.org.au/aatams.html'),
('unique_reference_code', 'ct64-M746-09'),
('platform_code', 'Q9900335'),
('netcdf_version', '3.6'),
('naming_authority', 'IMOS'),
('quality_control_set', '1'),
('cdm_data_type', 'Trajectory'),
('geospatial_lat_min', -73.2633350301659),
('geospatial_lat_max', -54.4634576271227),
('geospatial_lat_units', 'degrees_north'),
('geospatial_lon_min', -179.903050293358),
('geospatial_lon_max', 179.942919142718),
('geospatial_lon_units', 'degrees_east'),
('geospatial_vertical_min', 6.0),
('geospatial_vertical_max', 1138.0),
('geospatial_vertical_units', 'dbar'),
('time_coverage_start', '2010-02-05T04:30:00Z'),
('time_coverage_end', '2010-10-29T07:10:00Z'),
('data_centre_email', 'info@emii.org.au'),
('data_centre', 'eMarine Information Infrastructure (eMII)'),
('author', 'Besnard, Laurent'),
('author_email', 'laurent.besnard@utas.edu.au'),
('institution_references', 'http://imos.org.au/emii.html'),
('principal_investigator', 'Harcourt, Rob'),
('citation',
'Citation to be used in publications should follow the format: IMOS, [year-of-data-download], [Title], [data-access-URL],accessed [date-of-access]'),
('acknowledgment',
'Any users of IMOS data are required to clearly acknowledge the source of the material in the format: "Data was sourced from the Integrated Marine Observing System (IMOS) - IMOS is supported by the Australian Government through the National Collaborative Research Infrastructure Strategy (NCRIS) and the Super Science Initiative (SSI)"'),
('distribution_statement',
'AATAMS data may be re-used, provided that related metadata explaining the data has been reviewed by the user and the data is appropriately acknowledged. Data, products and services from IMOS are provided "as is" without any warranty as to fitness for a particular purpose'),
('file_version', 'Level 0 - Raw data'),
('file_version_quality_control',
'Data in this file has not undergone quality control. There has been no QC performed on this real-time data.'),
('metadata_uuid', 'dbe1e98e-f447-4cf6-bb1c-412baae171ab'),
('body_code', '11449'),
('ptt_code', '54746'),
('species_name', 'Southern ellie'),
('release_site', 'Macquarie'),
('sattag_program', 'ct64'),
('EXTRA_DIMENSION.length_char', 8)])
# now you can also do (same effect as first command above)
title_str = globalAttr['title']
print(title_str)
Temperature, Salinity and Depth profiles in near real time
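Attributes can also be fetched by name with the getncattr method, which is useful when an attribute name clashes with one of the reserved Python attributes:
# equivalent to aatams_DATA.title, but works for any attribute name
print(aatams_DATA.getncattr('title'))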
Discover Variables¶
To list all the variables available in the NetCDF file, type:
aatams_DATA.variables.keys()
odict_keys(['TIME', 'LATITUDE', 'LONGITUDE', 'TEMP', 'PRES', 'PSAL', 'parentIndex', 'TIME_quality_control', 'LATITUDE_quality_control', 'LONGITUDE_quality_control', 'TEMP_quality_control', 'PRES_quality_control', 'PSAL_quality_control'])
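For a quick overview you can also loop over the variables dictionary (getattr with a default is used here because not every variable necessarily carries a standard_name attribute):
# summarise each variable: name, dimensions, shape and standard_name (if any)
for name, var in aatams_DATA.variables.items():
    print(name, var.dimensions, var.shape, getattr(var, 'standard_name', 'n/a'))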
Each variable is accessed via a Variable object, in a similar way to the Dataset object. To access the temperature variable (TEMP):
TEMP = aatams_DATA.variables['TEMP']
# now you can print the variable's attributes and other info
print(TEMP)
<class 'netCDF4._netCDF4.Variable'>
float64 TEMP(obs)
standard_name: sea_water_temperature
long_name: sea_water_temperature
units: Celsius
valid_min: -2.0
valid_max: 40.0
_FillValue: 9999.0
ancillary_variables: TEMP_quality_control
unlimited dimensions:
current shape = (12987,)
filling off
Variable attributes are accessed the same way, e.g. the variable's standard_name:
TEMP.standard_name
'sea_water_temperature'
Extract the data values (as a NumPy masked array):
temperature = TEMP[:]
print('Values:',temperature)
print('Min:',min(temperature),' Max:',max(temperature))
print(type(temperature))
Values: [7.644 7.6322 7.6243 ... 5.4599 5.1781 4.4438]
Min: -1.855 Max: 8.136
<class 'numpy.ma.core.MaskedArray'>
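Because the values come back as a masked array, any entries equal to the _FillValue (9999.0 here) are masked automatically. The masked-array methods skip them, which is generally safer than the Python builtins used above:
import numpy as np
# masked-array reductions ignore entries flagged by _FillValue
print('Min:', temperature.min(), ' Max:', temperature.max())
print('Masked (missing) values:', np.ma.count_masked(temperature))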
Extract the salinity variable¶
Following the same approach as for the temperature variable, we now extract the salinity.
# netCDF4 Variable object
PSAL = aatams_DATA.variables['PSAL']
# now you can print the variable's attributes and other info
print(PSAL)
# access variable attributes, e.g. its standard_name
PSAL.standard_name
# extract the data values (as a numpy array)
salinity = PSAL[:]
<class 'netCDF4._netCDF4.Variable'>
float64 PSAL(obs)
standard_name: sea_water_salinity
long_name: sea_water_salinity
units: 1e-3
_FillValue: 9999.0
ancillary_variables: PSAL_quality_control
unlimited dimensions:
current shape = (12987,)
filling off
print('Values:',salinity)
print('Min:',min(salinity),' Max:',max(salinity))
Values: [34.004 34.0069 34.0058 ... 34.1937 34.252 34.2277]
Min: 33.148 Max: 34.766
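Each entry along the obs dimension belongs to one of the 800 profiles. As a sketch (assuming, as is common for this ragged-array layout, that parentIndex holds the 1-based index of the parent profile for each observation), a time value can be attached to every temperature/salinity sample:
import numpy as np
# Hedged sketch: map each observation to the TIME of its parent profile.
# Assumption: parentIndex is a 1-based profile index.
parent = np.asarray(aatams_DATA.variables['parentIndex'][:], dtype=int)
profile_time = aatams_DATA.variables['TIME'][:]
obs_time = profile_time[parent - 1]   # one time value per observation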
Plotting the Salinity-Temperature relationship¶
We can now work with and plot these variables, for example to examine the relationship between them, or their temporal and/or spatial evolution.
To do so we need to import some useful Python libraries…
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.nonparametric.smoothers_lowess import lowess
%matplotlib inline
%config InlineBackend.figure_format = 'svg'
plt.rcParams['mathtext.fontset'] = 'cm'
We then use the smoothing function lowess (Locally Weighted Scatterplot Smoothing) to extract a general trend of salinity as a function of temperature from the scattered dataset.
ys = lowess(salinity, temperature, it=5, frac=0.2)
lowess returns a two-column array, with the sorted temperature values in the first column and the smoothed salinity values in the second. Now we plot the result with matplotlib:
plt.figure(figsize=(10,6))
plt.scatter(temperature, salinity, s=2, marker='o', facecolor='r', lw = 1,label='T-S dataset')
plt.xlabel('Temperature in Celsius')
plt.ylabel('Salinity in parts per thousand (1e-3)')
plt.title(title_str)
plt.plot(ys[:,0],ys[:,1],'k',linewidth=2,label='smoothing')
plt.legend(loc=0, fontsize=10)
plt.show()
plt.close()
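Finally, once we are done with the file, it is good practice to close the Dataset (or to open it with a context manager so it is closed automatically):
aatams_DATA.close()
# alternatively, scope the access with a context manager:
# with Dataset(aatams_URL) as ds:
#     temperature = ds.variables['TEMP'][:]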