API

Documentation of the pyaerocom programming interface.

Data classes

Gridded data

class pyaerocom.griddeddata.GriddedData(input=None, var_name=None, convert_unit_on_init=True, **meta)[source]

Base class representing model data

This class is largely based on the iris.Cube object. However, this class provides expanded functionality for convenience; for instance, netCDF files can be loaded directly into a GriddedData object, whereas iris.cube.Cube instances are typically created using helper methods such as

1. iris.load() (returns iris.cube.CubeList, i.e. a list-like iterable object that contains instances of Cube objects, one for each variable) or

2. iris.load_cube() which directly returns a iris.cube.Cube instance and typically requires specification of a variable constraint.

The GriddedData object represents one variable in space and time, as well as corresponding meta information. Since it is based on iris.cube.Cube, it is optimised for netCDF files that follow the CF conventions and may not work for files that do not follow this standard.

Parameters
  • input (str or Cube) – data input. Can be a single .nc file or a preloaded iris Cube.

  • var_name (str, optional) – variable name that is extracted if input is a file path. Irrelevant if input is preloaded Cube

COORDS_ORDER_TSERIES = ['time', 'latitude', 'longitude']

Req. order of dimension coordinates for time-series computation

SUPPORTED_VERT_SCHEMES = ['mean', 'max', 'min', 'surface', 'altitude', 'profile']
property TS_TYPES

List of valid filename encodings specifying temporal resolution

add_aggregator(aggr_name)[source]
aerocom_filename(at_stations=False)[source]

Filename of data following AeroCom 3 conventions

Parameters

at_stations (bool) – if True, then an AtStations string will be included in the filename

Returns

generated file name based on what is in this object

Return type

str

aerocom_savename(data_id=None, var_name=None, vert_code=None, year=None, ts_type=None)[source]

Get filename for saving following AeroCom conventions

property altitude_access
apply_region_mask(region_id, thresh_coast=0.5, inplace=False)[source]

Apply a masked region filter

area_weighted_mean()[source]

Get area weighted mean

property area_weights

Area weights of lat / lon grid

property base_year

Base year of time dimension

Note

Changing this attribute will update the time-dimension.

calc_area_weights()[source]

Calculate area weights for grid
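The idea behind area weighting can be illustrated with plain numpy (a sketch only; the actual method delegates to iris): on a regular lat / lon grid, cell areas are proportional to the cosine of latitude, so cells near the poles contribute less to the mean.

```python
import numpy as np

# Sketch: on a regular lat/lon grid, cell area is proportional to cos(lat),
# since grid cells shrink towards the poles.
lats = np.array([-60.0, 0.0, 60.0])          # degrees
lons = np.array([0.0, 90.0, 180.0, 270.0])   # degrees
data = np.ones((lats.size, lons.size))       # constant test field

weights = np.cos(np.deg2rad(lats))[:, None] * np.ones(lons.size)[None, :]
weights /= weights.sum()                     # normalise to sum to 1

aw_mean = (data * weights).sum()             # area-weighted mean
# for a constant field, the weighted mean equals the constant (1.0)
```

For a constant field the weighted and unweighted means coincide; for fields with strong polar gradients they can differ substantially.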

change_base_year(new_year, inplace=True)[source]

Changes base year of time dimension

Relevant, e.g. for climatological analyses.

Note

This method does not account for offsets arising from leap years (affecting daily or higher resolution data), so use it with care. E.g. if you use this method on a 2016 daily data object, containing a calendar that supports leap years, you'll end up with 366 time stamps also in the new data object.

Parameters
  • new_year (int) – new base year (can also be other than integer if it is convertible)

  • inplace (bool) – if True, modify this object, else, use a copy

Returns

modified data object

Return type

GriddedData

check_altitude_access()[source]

Checks if altitude levels can be accessed

Returns

True, if altitude access is provided, else False

Return type

bool

check_coord_order()[source]

Wrapper for check_dimcoords_tseries()

check_dimcoords_tseries()[source]

Check order of dimension coordinates for time series retrieval

For computation of time series at certain lon / lat coordinates, the data dimensions have to be in a certain order specified by COORDS_ORDER_TSERIES.

This method checks the current order (and dimensionality) of data and raises appropriate errors.

Raises
  • DataDimensionError – if dimension of data is not supported (currently, 3D or 4D data is supported)

  • DimensionOrderError – if dimensions are not in the right order (in which case reorder_dimensions_tseries() may be used to catch the Exception)
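What reordering amounts to can be sketched with plain numpy (a hypothetical dimension setup; the actual method operates on the underlying iris cube):

```python
import numpy as np

# Hypothetical data with dimensions in the "wrong" order for time-series
# retrieval; COORDS_ORDER_TSERIES requires (time, latitude, longitude).
arr = np.zeros((18, 36, 120))                 # (latitude, longitude, time)
dims = ("latitude", "longitude", "time")

order = tuple(dims.index(d) for d in ("time", "latitude", "longitude"))
fixed = arr.transpose(order)                  # now (time, latitude, longitude)
```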

check_frequency()[source]

Check if all datapoints are sampled at the same time frequency

check_lon_circular()[source]

Check if longitude coordinates are circular

check_unit()[source]

Check if unit of data is AeroCom default and convert if not

collapsed(coords, aggregator, **kwargs)[source]

Collapse cube

Reimplementation of method iris.cube.Cube.collapsed(); see the iris documentation for details

Parameters
  • coords (str or list) – string IDs of coordinate(s) that are to be collapsed (e.g. ["longitude", "latitude"])

  • aggregator (str or Aggregator or WeightedAggretor) – the aggregator used. If input is string, it is converted into the corresponding iris Aggregator object, see str_to_iris() for valid strings

  • **kwargs – additional keyword args (e.g. weights)

Returns

collapsed data object

Return type

GriddedData

compute_at_stations_file(latitudes=None, longitudes=None, out_dir=None, savename=None, obs_data=None)[source]

Creates and saves new netcdf file at input lat / lon coordinates

This method can be used to reduce the size of too large grid files. It reduces the lon / lat dimensionality corresponding to the locations of the input lat / lon coordinates.

property computed
property concatenated
convert_unit(new_unit)[source]

Convert unit of data to new unit

property coord_names

List containing coordinate names

property coords_order

Array containing the order of coordinates

copy()[source]

Copy this data object

copy_coords(other, inplace=True)[source]

Copy all coordinates from other data object

Requires the underlying data to be the same shape.

Warning

This operation will delete all existing coordinates and auxiliary coordinates and will then copy the ones from the input data object. No checks of any kind will be performed

Parameters

other (GriddedData or Cube) – other data object (needs to be same shape as this object)

Returns

data object containing coordinates from other object

Return type

GriddedData

crop(lon_range=None, lat_range=None, time_range=None, region=None)[source]

High level function that applies cropping along multiple axes

Note

1. For cropping of longitudes and latitudes, the method iris.cube.Cube.intersection() is used since it automatically accepts and understands longitude input based on the definition 0 <= lon <= 360 as well as -180 <= lon <= 180.

2. Time extraction may be provided directly as index or in form of pandas.Timestamp objects.

Parameters
  • lon_range (tuple, optional) – 2-element tuple containing longitude range for cropping. If None, the longitude axis remains unchanged. Example input to crop around meridian: lon_range=(-30, 30)

  • lat_range (tuple, optional) – 2-element tuple containing latitude range for cropping. If None, the latitude axis remains unchanged

  • time_range (tuple, optional) –

    2-element tuple containing time range for cropping. Allowed data types for specifying the times are

    1. a combination of 2 pandas.Timestamp instances or

    2. a combination of two strings that can be directly converted into pandas.Timestamp instances (e.g. time_range=("2010-1-1", "2012-1-1")) or

    3. directly a combination of indices (int).

    If None, the time axis remains unchanged.

  • region (str or Region, optional) – string ID of pyaerocom default region or directly an instance of the Region object. May be used instead of lon_range and lat_range, if these are unspecified.

Returns

new data object containing cropped grid

Return type

GriddedData
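A plain-numpy sketch of what lat / lon cropping boils down to (the actual method uses iris.cube.Cube.intersection(); grid and ranges here are illustrative):

```python
import numpy as np

# Illustrative grid and field
lons = np.arange(-180, 180, 30)               # -180 ... 150
lats = np.arange(-90, 91, 30)                 # -90 ... 90
field = np.zeros((lats.size, lons.size))

# Crop around the meridian and northern mid-latitudes
lon_range, lat_range = (-30, 30), (0, 60)
lon_mask = (lons >= lon_range[0]) & (lons <= lon_range[1])
lat_mask = (lats >= lat_range[0]) & (lats <= lat_range[1])
sub = field[np.ix_(lat_mask, lon_mask)]       # cropped sub-grid
```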

property cube

Instance of underlying cube object

property data

Data array (n-dimensional numpy array)

Note

This is a pointer to the data object of the underlying iris.Cube instance and will load the data into memory. Thus, in case of large datasets, this may lead to a memory error

property data_id

ID of data object (e.g. model run ID, obsnetwork ID)

Note

This attribute was formerly named name, which is also the corresponding attribute name in metadata

property data_revision

Revision string from file Revision.txt in the main data directory

delete_all_coords(inplace=True)[source]

Deletes all coordinates (dimension + auxiliary) in this object

delete_aux_vars()[source]

Delete auxiliary variables and iris AuxFactories

property delta_t

Array containing timedelta values for each time stamp

property dimcoord_names

List containing coordinate names

downscale_time(to_ts_type='monthly')[source]
extract(constraint, inplace=False)[source]

Extract subset

Parameters

constraint (iris.Constraint) – constraint that is to be applied

Returns

new data object containing cropped data

Return type

GriddedData

extract_surface_level()[source]

Extract surface level from 4D field

filter_altitude(alt_range=None)[source]

Currently dummy method that makes life easier in Filter

Returns

current instance

Return type

GriddedData

filter_region(region_id, inplace=False, **kwargs)[source]

Filter region based on ID

This works both for rectangular regions and mask regions

Parameters
  • region_id (str) – name of region

  • inplace (bool) – if True, the current data object is modified, else a new object is returned

  • **kwargs – additional keyword args passed to apply_region_mask() if input region is a mask.

Returns

filtered data object

Return type

GriddedData

find_closest_index(**dimcoord_vals)[source]

Find the closest indices for dimension coordinate values

property from_files

List of file paths from which this data object was created

get_altitude(**coords)[source]

Extract (or try to compute) altitude values at input coordinates

get_area_weighted_timeseries(region=None)[source]

Helper method to extract area weighted mean timeseries

Parameters

region – optional, name of AeroCom default region for which the mean is to be calculated (e.g. EUROPE)

Returns

station data containing area weighted mean

Return type

StationData

property grid

Underlying grid data object

property has_data

True if sum of shape of underlying Cube instance is > 0, else False

property has_latlon_dims

Boolean specifying whether data has latitude and longitude dimensions

property has_time_dim

Boolean specifying whether data has a time dimension

infer_ts_type()[source]

Try to infer sampling frequency from time dimension data

Returns

ts_type that was inferred (is assigned to metadata too)

Return type

str

Raises

DataDimensionError – if data object does not contain a time dimension

interpolate(sample_points=None, scheme='nearest', collapse_scalar=True, **coords)[source]

Interpolate cube at certain discrete points

Reimplementation of method iris.cube.Cube.interpolate(); see the iris documentation for details

Note

The input coordinates may also be provided using the input arg **coords, which provides a more intuitive option (e.g. input sample_points=[("longitude", [10, 20]), ("latitude", [1, 2])] is the same as input longitude=[10, 20], latitude=[1, 2]).

Parameters
  • sample_points (list) – sequence of coordinate pairs over which to interpolate

  • scheme (str or iris interpolator object) – interpolation scheme, pyaerocom default is nearest. If input is string, it is converted into the corresponding iris Interpolator object, see str_to_iris() for valid strings

  • collapse_scalar (bool) – Whether to collapse the dimension of scalar sample points in the resulting cube. Default is True.

  • **coords – additional keyword args that may be used to provide the interpolation coordinates in an easier way than using the Cube argument sample_points. May also be a combination of both.

Returns

new data object containing interpolated data

Return type

GriddedData

Examples

>>> from pyaerocom import GriddedData
>>> data = GriddedData()
>>> data._init_testdata_default()
>>> itp = data.interpolate([("longitude", (10)),
...                         ("latitude" , (35))])
>>> print(itp.shape)
(365, 1, 1)
intersection(*args, **kwargs)[source]

Extract subset using iris.cube.Cube.intersection()

See the iris documentation for details related to this method and its input parameters.

Note

Only works if underlying grid data type is iris.cube.Cube

Parameters
  • *args – non-keyword args

  • **kwargs – keyword args

Returns

new data object containing cropped data

Return type

GriddedData

property is_climatology
property is_masked

Flag specifying whether data is masked or not

Note

This method only works if the data is loaded.

isel(**kwargs)[source]
property lat_res
load_input(input, var_name=None, perform_fmt_checks=None)[source]

Import input as cube

Parameters
  • input (str or Cube) – data input. Can be a single .nc file or a preloaded iris Cube.

  • var_name (str, optional) – variable name that is extracted if input is a file path. Irrelevant if input is preloaded Cube

  • perform_fmt_checks (bool, optional) – perform formatting checks based on information in filenames. Only relevant if input is a file

property lon_res
property long_name

Long name of variable

max()[source]

Maximum value

mean(areaweighted=True)[source]

Mean value of data array

Note

If areaweighted is True (default), an area-weighted mean is computed; otherwise this corresponds to the numerical mean of the underlying N-dimensional numpy array without area weights.

mean_at_coords(latitude=None, longitude=None, time_resample_kwargs=None, **kwargs)[source]

Compute mean value at all input locations

Parameters
  • latitude (1D list or similar) – list of latitude coordinates of coordinate locations. If None, please provide coords in iris style as list of (lat, lon) tuples via coords (handled via arg kwargs)

  • longitude (1D list or similar) – list of longitude coordinates of coordinate locations. If None, please provide coords in iris style as list of (lat, lon) tuples via coords (handled via arg kwargs)

  • time_resample_kwargs (dict, optional) – time resampling arguments passed to StationData.resample_time()

  • **kwargs – additional keyword args passed to to_time_series()

Returns

mean value at coordinates over all times available in this object

Return type

float

property metadata
min()[source]

Minimum value

property name

ID of model to which data belongs

property ndim

Number of dimensions

property plot_settings

Variable instance that contains plot settings

The settings can be specified in the variables.ini file based on the unique var_name.

If no default settings can be found for this variable, all parameters will be initiated with None, in which case the AeroCom plot method uses its default settings.

quickplot_map(time_idx=0, xlim=(-180, 180), ylim=(-90, 90), add_mean=True, **kwargs)[source]

Make a quick plot onto a map

Parameters
  • time_idx (int) – index in time to be plotted

  • xlim (tuple) – 2-element tuple specifying plotted longitude range

  • ylim (tuple) – 2-element tuple specifying plotted latitude range

  • add_mean (bool) – if True, the mean value over the region and period is inserted

  • **kwargs – additional keyword arguments passed to pyaerocom.quickplot.plot_map()

Returns

matplotlib figure instance containing plot

Return type

fig

property reader

Instance of reader class from which this object was created

regrid(other=None, lat_res_deg=None, lon_res_deg=None, scheme='areaweighted', **kwargs)[source]

Regrid this grid to grid resolution of other grid

Parameters
  • other (GriddedData or Cube, optional) – other data object to regrid to. If None, then input args lat_res and lon_res are used to regrid.

  • lat_res_deg (float or int, optional) – latitude resolution in degrees (is only used if input arg other is None)

  • lon_res_deg (float or int, optional) – longitude resolution in degrees (is only used if input arg other is None)

  • scheme (str) – regridding scheme (e.g. linear, nearest, areaweighted)

Returns

regridded data object (new instance, this object remains unchanged)

Return type

GriddedData
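A toy nearest-neighbour regridding in plain numpy to illustrate the concept (the actual method delegates to iris regridding schemes such as areaweighted; grid and field are illustrative):

```python
import numpy as np

src_lat = np.linspace(-89.5, 89.5, 180)       # 1 degree source grid
src = np.cos(np.deg2rad(src_lat))             # some zonal field
dst_lat = np.linspace(-87.5, 87.5, 36)        # 5 degree target grid

# nearest source index for each target latitude
idx = np.abs(src_lat[None, :] - dst_lat[:, None]).argmin(axis=1)
dst = src[idx]                                # regridded field
```

Area-weighted schemes additionally conserve the integral of the field, which nearest-neighbour lookup does not.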

remove_outliers(low=None, high=None, inplace=True)[source]

Remove outliers from data

Parameters
  • low (float) – lower end of valid range for input variable. If None, then the corresponding value from the default settings for this variable are used (cf. minimum attribute of available variables)

  • high (float) – upper end of valid range for input variable. If None, then the corresponding value from the default settings for this variable are used (cf. maximum attribute of available variables)

  • inplace (bool) – if True, this object is modified, else outliers are removed in a copy of this object

Returns

modified data object

Return type

GriddedData
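The effect of outlier removal can be sketched with plain numpy (the valid range below is hypothetical; the actual method takes its defaults from the variable definition):

```python
import numpy as np

vals = np.array([0.1, 0.5, -2.0, 3.5, 0.9])
low, high = 0.0, 2.0                          # hypothetical valid range

# values outside [low, high] are replaced by NaN
cleaned = np.where((vals < low) | (vals > high), np.nan, vals)
```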

reorder_dimensions_tseries()[source]

Reorders dimensions of data such that to_time_series() works

reorder_dimensions_tseries_old()[source]

Reorders dimensions of data such that to_time_series() works

resample_time(to_ts_type='monthly', how=None, apply_constraints=None, min_num_obs=None, use_iris=False)[source]

Resample time to input resolution

Parameters
  • to_ts_type (str) – either of the supported temporal resolutions (cf. IRIS_AGGREGATORS in helpers, e.g. “monthly”)

  • how (str) – string specifying how the data is to be aggregated, default is mean

  • apply_constraints (bool, optional) – if True, hierarchical resampling is applied using input min_num_obs (if provided) or else, using constraints specified in pyaerocom.const.OBS_MIN_NUM_RESAMPLE

  • min_num_obs (dict or int, optional) –

    integer or nested dictionary specifying minimum number of observations required to resample from higher to lower frequency. For instance, if input_data is hourly and to_ts_type is monthly, you may specify something like:

    min_num_obs = {'monthly': {'daily': 7},
                   'daily':   {'hourly': 6}}

    to require at least 6 hours per day and 7 days per month.

  • use_iris (bool) – option to use resampling scheme from iris library rather than xarray.

Returns

new data object containing downscaled data

Return type

GriddedData

Raises

TemporalResolutionError – if input resolution is not provided, or if it is a higher temporal resolution than that of this object
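The min_num_obs logic can be sketched with plain pandas (not the pyaerocom API; the thresholds follow the example above):

```python
import numpy as np
import pandas as pd

# hourly series for January 2016; only the last 3 days contain data
idx = pd.date_range("2016-01-01", periods=24 * 31, freq="h")
s = pd.Series(np.arange(idx.size, dtype=float), index=idx)
s.iloc[: 24 * 28] = np.nan

# hourly -> daily: require at least 6 hourly values per day
daily = s.resample("D").agg(lambda x: x.mean() if x.count() >= 6 else np.nan)
# daily -> monthly: require at least 7 daily values per month
monthly = daily.resample("MS").agg(
    lambda x: x.mean() if x.count() >= 7 else np.nan
)
# January only has 3 valid days, so the monthly value is NaN
```

Hierarchical resampling thus propagates data availability constraints through each intermediate frequency rather than resampling in one step.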

search_other(var_name, require_same_shape=True)[source]

Searches data for another variable

sel(use_neirest=True, **dimcoord_vals)[source]

Select subset by dimension names

Note

This is a BETA version, please use with care

Parameters

**dimcoord_vals – key / value pairs specifying coordinate values to be extracted

Returns

subset data object

Return type

GriddedData

property shape
short_str()[source]

Short string representation

split_years(years=None)[source]

Generator to split data object into individual years

Note

This is a generator method and thus should be looped over

Parameters

years (list, optional) – List of years that should be excluded. If None, it uses output from years_avail().

Yields

GriddedData – single year data object
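The generator pattern can be sketched with plain pandas (illustrative only, not the pyaerocom implementation):

```python
import numpy as np
import pandas as pd

# monthly series spanning parts of three years
idx = pd.date_range("2010-06-01", "2012-06-01", freq="MS")
s = pd.Series(np.arange(idx.size, dtype=float), index=idx)

def split_years(series):
    """Yield (year, subset) pairs, one per year available in the series."""
    for year in sorted(series.index.year.unique()):
        yield year, series[series.index.year == year]

years = [y for y, _ in split_years(s)]
```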

property standard_name

Standard name of variable

property start

Start time of dataset as datetime64 object

std()[source]

Standard deviation of values

property stop

Stop time of dataset as datetime64 object

property suppl_info
time_stamps()[source]

Convert time stamps into list of numpy datetime64 objects

The conversion is done using method cfunit_to_datetime64()

Returns

list containing all time stamps as datetime64 objects

Return type

list

to_netcdf(out_dir, savename=None, **kwargs)[source]

Save as NetCDF file

Parameters
  • out_dir (str) – output directory (must exist)

  • savename (str, optional) – name of file. If None, aerocom_savename() is used which is generated automatically and may be modified via **kwargs

  • **kwargs – keywords for name

Returns

list of output files created

Return type

list

to_time_series(sample_points=None, scheme='nearest', vert_scheme=None, add_meta=None, use_iris=False, **coords)[source]

Extract time-series for provided input coordinates (lon, lat)

Extract time series for each lon / lat coordinate in this cube or at predefined sample points (e.g. station data). If sample points are provided, the cube is interpolated first onto the sample points.

Parameters
  • sample_points (list) – coordinates (e.g. lon / lat) at which time series is supposed to be retrieved

  • scheme (str or iris interpolator object) – interpolation scheme (for details, see interpolate())

  • vert_scheme (str) – string specifying how to treat vertical coordinates. This is only relevant for data that contains vertical levels and is ignored otherwise. Note that if the input coordinate specifications contain altitude information, this parameter is set to 'altitude' automatically. Allowed inputs are all data collapse schemes supported by pyaerocom.helpers.str_to_iris() (e.g. mean, median, sum) as well as altitude, surface and profile. If not otherwise specified, vert_scheme is set to altitude when altitude coordinates are provided via sample_points (or the **coords parameters), and to profile otherwise.

  • add_meta (dict, optional) – dictionary specifying additional metadata for individual input coordinates. Keys are meta attribute names (e.g. station_name) and corresponding values are lists (with length of input coords) or single entries that are supposed to be assigned to each station. E.g. add_meta=dict(station_name=[<list_of_station_names>]).

  • **coords – additional keyword args that may be used to provide the interpolation coordinates (for details, see interpolate())

Returns

list of result dictionaries for each coordinate. Dictionary keys are: longitude, latitude, var_name

Return type

list
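A minimal numpy sketch of nearest-neighbour time-series extraction, assuming the dimension order required by COORDS_ORDER_TSERIES (illustrative only, not the pyaerocom implementation):

```python
import numpy as np

# (time, latitude, longitude) - the order required by COORDS_ORDER_TSERIES
lats = np.arange(-90, 91, 10.0)
lons = np.arange(-180, 180, 10.0)
cube = np.zeros((12, lats.size, lons.size))
cube[:, 14, 18] = np.arange(12)               # mark the cell at lat=50, lon=0

def nearest_ts(lat, lon):
    """Time series at the grid cell closest to (lat, lon)."""
    i = np.abs(lats - lat).argmin()
    j = np.abs(lons - lon).argmin()
    return cube[:, i, j]

ts = nearest_ts(52.3, 4.9)                    # snaps to lat=50, lon=0
```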

to_time_series_single_coord(latitude, longitude)[source]

Make time series dictionary of single location using nearest coordinate

Parameters
  • latitude (float) – latitude of coordinate

  • longitude (float) – longitude of coordinate

Returns

dictionary containing results

Return type

dict

to_xarray()[source]
transpose(new_order)[source]

Re-order data dimensions in object

Wrapper for iris.cube.Cube.transpose()

Note

Changes THIS object (i.e. no new instance of GriddedData will be created)

Parameters

new_order (list) – new index order

property ts_type

Temporal resolution of data

property unit

Unit of data

property unit_ok

Boolean specifying if variable unit is AeroCom default

property units

Unit of data

update_meta(**kwargs)[source]

Update metadata dictionary

property var_info

Print information about variable

property var_name

Name of variable

property var_name_aerocom

AeroCom variable name

property vert_code

Vertical code of data (e.g. Column, Surface, ModelLevel)

years_avail()[source]

Generate list of years that are available in this dataset

Returns

Return type

list

Ungridded data

class pyaerocom.ungriddeddata.UngriddedData(num_points=None, add_cols=None)[source]

Class representing point-cloud data (ungridded)

The data is organised in a 2-dimensional numpy array in which the first index (rows) corresponds to individual measurements (i.e. one timestamp of one variable) and the second dimension (11 columns) stores the actual value (in column 6) along with additional information, such as the metadata index (which can be used as key in metadata to access additional information related to this measurement), timestamp, latitude, longitude, altitude of instrument and variable index, and, in case of 3D data (e.g. LIDAR profiles), also the altitude corresponding to the data value.

Note

To illustrate this layout, consider two examples.

Example 1: Suppose you load 3 variables from 5 files, each of which contains 30 timestamps. This corresponds to a total of 3*5*30=450 data points and hence, the shape of the underlying numpy array will be 450x11.

Example 2: 3 variables, 5 files, 30 timestamps, but each variable is height resolved, containing 100 altitudes => 3*5*30*100=4500 data points, thus, the final shape will be 4500x11.

metadata

dictionary containing meta information about the data. Keys are floating point numbers corresponding to each station, values are corresponding dictionaries containing station information.

Type

dict

meta_idx

dictionary containing index mapping for each station and variable. Keys correspond to metadata key (float -> station, see metadata) and values are dictionaries containing keys specifying variable name and corresponding values are arrays or lists, specifying indices (rows) of these station / variable information in _data. Note: this information is redundant and exists to accelerate station data extraction, since the data index matches for a given metadata block do not need to be searched in the underlying numpy array.

Type

dict

var_idx

mapping of variable name (keys, e.g. od550aer) to numerical variable index of this variable in data numpy array (in column specified by _VARINDEX)

Type

dict
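A toy numpy version of this layout (column numbers are illustrative, not the actual class constants):

```python
import numpy as np

# Toy layout: rows are single measurements; columns hold (among others)
# metadata index, value and variable index. Column numbers here are
# illustrative, not the actual class constants.
META, VAL, VARIDX = 0, 6, 7
data = np.full((6, 11), np.nan)
data[:, META] = [0, 0, 0, 1, 1, 1]            # two stations
data[:, VARIDX] = [0, 0, 1, 0, 1, 1]          # two variables
data[:, VAL] = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

# meta_idx-style lookup: rows belonging to station 0, variable 0
rows = np.where((data[:, META] == 0) & (data[:, VARIDX] == 0))[0]
vals = data[rows, VAL]
```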

Parameters
  • num_points (int, optional) – initial number of total datapoints (number of rows in 2D data array)

  • add_cols (list, optional) – list of additional index column names of 2D data array.

STANDARD_META_KEYS = ['filename', 'station_id', 'station_name', 'instrument_name', 'PI', 'country', 'country_code', 'ts_type', 'latitude', 'longitude', 'altitude', 'data_id', 'dataset_name', 'data_product', 'data_version', 'data_level', 'framework', 'instr_vert_loc', 'revision_date', 'website', 'ts_type_src', 'stat_merge_pref_attr']
add_chunk(size=None)[source]

Extend the size of the data array

Parameters

size (int, optional) – number of additional rows. If None (default) or smaller than minimum chunksize specified in attribute _CHUNKSIZE, then the latter is used.

add_station_data(stat, meta_idx=None, data_idx=None, check_index=False)[source]
all_datapoints_var(var_name)[source]

Get array of all data values of input variable

Parameters

var_name (str) – variable name

Returns

1-d numpy array containing all values of this variable

Return type

ndarray

Raises

AttributeError – if variable name is not available

property altitude

Altitudes of stations

append(other)[source]

Append other instance of UngriddedData to this object

Note

Calls merge(other, new_obj=False)

Parameters

other (UngriddedData) – other data object

Returns

merged data object

Return type

UngriddedData

Raises

ValueError – if input object is not an instance of UngriddedData

apply_filters(var_outlier_ranges=None, **filter_attributes)[source]

Extended filtering method

Combines filter_by_meta() and adds option to also remove outliers (keyword remove_outliers), set flagged data points to NaN (keyword set_flags_nan) and to extract individual variables (keyword var_name).

Parameters
  • var_outlier_ranges (dict, optional) – dictionary specifying custom outlier ranges for individual variables.

  • **filter_attributes (dict) – filters that are supposed to be applied to the data. To remove outliers, use keyword remove_outliers, to set flagged values to NaN, use keyword set_flags_nan, to extract single or multiple variables, use keyword var_name. Further filter keys are assumed to be metadata specific and are passed to filter_by_meta().

Returns

filtered data object

Return type

UngriddedData

apply_region_mask(region_id=None)[source]

TODO: Write documentation

Parameters

region_id (str or list (of strings)) – ID of region or IDs of multiple regions to be combined

change_var_idx(var_name, new_idx)[source]

Change index that is assigned to variable

Each variable in this object is assigned a unique index that is stored in the dictionary var_idx and which is used internally to access data from a certain variable from the data array _data (the indices are stored in the data column specified by _VARINDEX, cf. class header).

This index thus needs to be unique for each variable and hence may need to be updated when two instances of UngriddedData are merged (cf. merge()); performing that update is exactly what this method does.

Parameters
  • var_name (str) – name of variable

  • new_idx (int) – new index of variable

Raises

ValueError – if input new_idx already exist in this object as a variable index

check_convert_var_units(var_name, to_unit=None, inplace=True)[source]
check_set_country()[source]

Checks all metadata entries for availability of country information

Metadata blocks that are missing a country entry will be updated based on the country inferred from the corresponding lat / lon coordinate. Uses pyaerocom.geodesy.get_country_info_coords() (library reverse-geocode) to retrieve countries. This may be erroneous close to country borders as it uses Euclidean distance based on a list of known locations.

Note

Metadata blocks that do not contain latitude and longitude entries are skipped.

Returns

  • list – metadata entries where country was added

  • list – corresponding countries that were inferred from lat / lon

check_unit(var_name, unit=None)[source]

Check if variable unit corresponds to AeroCom unit

Parameters
  • var_name (str) – variable name for which unit is to be checked

  • unit (str, optional) – unit to be checked, if None, AeroCom default unit is used

Raises

MetaDataError – if unit information is not accessible for input variable name

clear_meta_no_data(inplace=True)[source]

Remove all metadata blocks that do not have data associated with it

Parameters

inplace (bool) – if True, the changes are applied to this instance directly, else to a copy

Returns

cleaned up data object

Return type

UngriddedData

Raises

DataCoverageError – if filtering results in empty data object

code_lat_lon_in_float()[source]

Encode lat and lon into a single number so that np.unique can be used to determine unique locations
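One possible packing scheme, sketched with plain numpy (illustrative only; not necessarily the exact encoding used by this method):

```python
import numpy as np

lats = np.array([52.3, 52.3, -33.9])
lons = np.array([4.9, 4.9, 151.2])

# shift both coordinates to positive ranges, then pack lat into the integer
# part and lon into the fractional part of a single float
coded = (lats + 90.0) * 1000.0 + (lons + 180.0) / 1000.0
n_locations = np.unique(coded).size           # distinct (lat, lon) pairs
```

np.unique on the packed values then counts distinct station locations in one call.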

colocate_vardata(var1, data_id1=None, var2=None, data_id2=None, other=None, **kwargs)[source]
property contains_datasets

List of all datasets in this object

property contains_instruments

List of all instruments in this object

property contains_vars

List of all variables in this dataset

copy()[source]

Make a copy of this object

Returns

copy of this object

Return type

UngriddedData

Raises

MemoryError – if copy is too big to fit into memory together with existing instance

property countries_available

Alphabetically sorted list of country names available

decode_lat_lon_from_float()[source]

Decode lat and lon from a single number calculated by code_lat_lon_in_float()

empty_trash()[source]

Set all values in trash column to NaN

extract_dataset(data_id)[source]

Extract single dataset into new instance of UngriddedData

Calls filter_by_meta().

Parameters

data_id (str) – ID of dataset

Returns

new instance of ungridded data containing only data from specified input network

Return type

UngriddedData

extract_var(var_name, check_index=True)[source]

Split this object into single-var UngriddedData objects

Parameters
  • var_name (str) – name of variable that is supposed to be extracted

  • check_index (bool) – Call _check_index() in the new data object.

Returns

new data object containing only input variable data

Return type

UngriddedData

extract_vars(var_names, check_index=True)[source]

Extract multiple variables from dataset

Loops over input variable names and calls extract_var() to retrieve single variable UngriddedData objects for each variable and then merges all of these into one object

Parameters
  • var_names (list or str) – list of variables to be extracted

  • check_index (bool) – Call _check_index() in the new data object.

Returns

new data object containing input variables

Return type

UngriddedData

Raises

VarNotAvailableError – if one of the input variables is not available in this data object

filter_altitude(alt_range)[source]

Filter altitude range

Parameters

alt_range (list or tuple) – 2-element list specifying altitude range to be filtered in m

Returns

filtered data object

Return type

UngriddedData

filter_by_meta(negate=None, **filter_attributes)[source]

Flexible method to filter these data based on input meta specs

Parameters
  • negate (list or str, optional) – specified meta key(s) provided via filter_attributes that are supposed to be treated as ‘not valid’. E.g. if station_name=”bad_site” is input in filter_attributes and if station_name is listed in negate, then all metadata blocks containing “bad_site” as station_name will be excluded in output data object.

  • **filter_attributes – valid meta keywords that are supposed to be filtered and the corresponding filter values (or value ranges) Only valid meta keywords are considered (e.g. data_id, longitude, latitude, altitude, ts_type)

Returns

filtered ungridded data object

Return type

UngriddedData

Raises
  • NotImplementedError – if an attempt is made to filter by variable (not yet possible)

  • IOError – if any of the input keys is not a valid meta key

Example

>>> import pyaerocom as pya
>>> r = pya.io.ReadUngridded(['AeronetSunV2Lev2.daily',
                              'AeronetSunV3Lev2.daily'], 'od550aer')
>>> data = r.read()
>>> data_filtered = data.filter_by_meta(data_id='AeronetSunV2Lev2.daily',
...                                     longitude=[-30, 30],
...                                     latitude=[20, 70],
...                                     altitude=[0, 1000])
filter_region(region_id, check_mask=True, check_country_meta=False, **kwargs)[source]

Filter object by a certain region

Parameters
  • region_id (str) – name of region (must be valid AeroCom region name or HTAP region)

  • check_mask (bool) – if True and region_id a valid name for a binary mask, then the filtering is done based on that binary mask.

  • check_country_meta (bool) – if True, then the input region_id is first checked against available country names in metadata. If that fails, it is assumed that this region is either a valid name for registered rectangular regions or for available binary masks.

  • **kwargs – currently not used in method (makes usage in higher level classes such as Filter easier as other data objects have the same method with possibly other input possibilities)

Returns

filtered data object (containing only stations that fall into input region)

Return type

UngriddedData

find_common_data_points(other, var_name, sampling_freq='daily')[source]
find_common_stations(other, check_vars_available=None, check_coordinates=True, max_diff_coords_km=0.1)[source]

Search common stations between two UngriddedData objects

This method loops over all stations stored within this object (using the metadata) and checks if the corresponding station exists in a second instance of UngriddedData that is provided. The check is performed on the basis of the station name and, optionally, for each station name match, the lon / lat coordinates can be compared within a certain radius (default 0.1 km).

Note

This is a beta version and thus to be treated with care.

Parameters
  • other (UngriddedData) – other object of ungridded data

  • check_vars_available (list (or similar), optional) – list of variables that need to be available in stations of both datasets

  • check_coordinates (bool) – if True, check that lon and lat coordinates of station candidates match within a certain range, specified by input parameter max_diff_coords_km

Returns

dictionary where keys are meta_indices of the common station in this object and corresponding values are meta indices of the station in the other object

Return type

OrderedDict

find_station_meta_indices(station_name_or_pattern, allow_wildcards=True)[source]

Find indices of all metadata blocks matching input station name

You may also use a wildcard pattern as input (e.g. *Potenza*)

Parameters
  • station_name_or_pattern (str) – station name or wildcard pattern

  • allow_wildcards (bool) – if True, input station_pattern will be used as wildcard pattern and all matches are returned.

Returns

list containing all metadata indices that match the input station name or pattern

Return type

list

Raises

StationNotFoundError – if no such station exists in this data object
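The wildcard lookup can be illustrated with Python's fnmatch module. This is a conceptual sketch, not pyaerocom source code; the station names and the helper function find_matching_indices are hypothetical examples.

```python
import fnmatch

def find_matching_indices(station_names, pattern):
    """Return indices of all station names matching a wildcard pattern."""
    return [i for i, name in enumerate(station_names)
            if fnmatch.fnmatch(name, pattern)]

stations = ["Potenza", "Potenza_2", "Leipzig"]
print(find_matching_indices(stations, "Potenza*"))  # [0, 1]
```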

property first_meta_idx
static from_cache(data_dir, file_name)[source]

Load pickled instance of UngriddedData

Parameters
  • data_dir (str) – directory where pickled object is stored

  • file_name (str) – file name of pickled object (needs to end with pkl)

Raises

ValueError – if loading failed

Returns

loaded UngriddedData object. If this method is called from an instance of UngriddedData, this instance remains unchanged. You may merge the returned reloaded instance using merge().

Return type

UngriddedData

static from_station_data(stats)[source]

Create UngriddedData from input station data object(s)

Parameters

stats (list or StationData) – input data object(s)

Raises

ValueError – if any of the input data objects is not an instance of StationData.

Returns

ungridded data object created from input station data objects

Return type

UngriddedData

get_time_series(station, var_name, start=None, stop=None, ts_type=None, **kwargs)[source]

Get time series of station variable

Parameters
  • station (str or int) – station name or index of station in metadata dict

  • var_name (str) – name of variable to be retrieved

  • start – start time (optional)

  • stop – stop time (optional). If start time is provided and stop time not, then only the corresponding year inferred from start time will be considered

  • ts_type (str, optional) – temporal resolution

  • **kwargs – Additional keyword args passed to method to_station_data()

Returns

time series data

Return type

pandas.Series

get_timeseries(station_name, var_name, start=None, stop=None, ts_type=None, insert_nans=True, **kwargs)[source]

Get variable timeseries data for a certain station

Parameters
  • station_name (str or int) – station name or index of station in metadata dict

  • var_name (str) – name of variable to be retrieved

  • start – start time (optional)

  • stop – stop time (optional). If start time is provided and stop time not, then only the corresponding year inferred from start time will be considered

  • ts_type (str, optional) – temporal resolution (can be pyaerocom ts_type or pandas freq. string)

  • **kwargs – Additional keyword args passed to method to_station_data()

Returns

time series data

Return type

pandas.Series

get_variable_data(variables, start=None, stop=None, ts_type=None, **kwargs)[source]

Extract all data points of a certain variable

Parameters

variables (str or list) – all variables that are supposed to be accessed

property has_flag_data

Boolean specifying whether this object contains flag data

property index
property is_empty

Boolean specifying whether this object contains data or not

property is_filtered

Boolean specifying whether this data object has been filtered

Note

Details about applied filtering can be found in filter_hist

last_filter_applied()[source]

Returns the last filter that was applied to this dataset

To see all filters, check out filter_hist

property last_meta_idx

Index of last metadata block

property latitude

Latitudes of stations

property longitude

Longitudes of stations

merge(other, new_obj=True)[source]

Merge another data object with this one

Parameters
  • other (UngriddedData) – other data object

  • new_obj (bool) – if True, this object remains unchanged and the merged data objects are returned in a new instance of UngriddedData. If False, then this object is modified

Returns

merged data object

Return type

UngriddedData

Raises

ValueError – if input object is not an instance of UngriddedData

merge_common_meta(ignore_keys=None)[source]

Merge all meta entries that are the same

Note

If there is an overlap in time between the data, the blocks are not merged

Parameters

ignore_keys (list) – list containing meta keys that are supposed to be ignored

Returns

merged data object

Return type

UngriddedData

property nonunique_station_names

List of station names that occur more than once in metadata

num_obs_var_valid(var_name)[source]

Number of valid observations of variable in this dataset

Parameters

var_name (str) – name of variable

Returns

number of valid observations (all values that are not NaN)

Return type

int
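Counting valid observations amounts to counting the non-NaN entries in a variable's data column. A minimal illustration (not pyaerocom source, just the underlying numpy idiom):

```python
import numpy as np

# hypothetical data column for one variable; NaN marks invalid points
values = np.array([1.0, np.nan, 3.0, np.nan, 5.0])

# number of valid (non-NaN) observations
num_valid = int(np.count_nonzero(~np.isnan(values)))
print(num_valid)  # 3
```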

plot_station_coordinates(var_name=None, start=None, stop=None, ts_type=None, color='r', marker='o', markersize=8, fontsize_base=10, legend=True, add_title=True, **kwargs)[source]

Plot station coordinates on a map

All input parameters are optional and may be used to add constraints on which stations are plotted. By default, all stations are plotted for all times.

Parameters
  • var_name (str, optional) – name of variable to be retrieved

  • start – start time (optional)

  • stop – stop time (optional). If start time is provided and stop time not, then only the corresponding year inferred from start time will be considered

  • ts_type (str, optional) – temporal resolution

  • color (str) – color of stations on map

  • marker (str) – marker type of stations

  • markersize (int) – size of station markers

  • fontsize_base (int) – basic fontsize

  • legend (bool) – if True, legend is added

  • add_title (bool) – if True, title will be added

  • **kwargs – Additional keyword args passed to pyaerocom.plot.plot_coordinates()

Returns

matplotlib axes instance

Return type

axes

plot_station_timeseries(station_name, var_name, start=None, stop=None, ts_type=None, insert_nans=True, ax=None, **kwargs)[source]

Plot time series of station and variable

Parameters
  • station_name (str or int) – station name or index of station in metadata dict

  • var_name (str) – name of variable to be retrieved

  • start – start time (optional)

  • stop – stop time (optional). If start time is provided and stop time not, then only the corresponding year inferred from start time will be considered

  • ts_type (str, optional) – temporal resolution

  • **kwargs – Additional keyword args passed to method pandas.Series.plot()

Returns

matplotlib axes instance

Return type

axes

remove_outliers(var_name, inplace=False, low=None, high=None, unit_ref=None, move_to_trash=True)[source]

Method that can be used to remove outliers from data

Parameters
  • var_name (str) – variable name

  • inplace (bool) – if True, the outliers will be removed in this object, otherwise a new object will be created and returned

  • low (float) – lower end of valid range for input variable. If None, then the corresponding value from the default settings for this variable are used (cf. minimum attribute of available variables)

  • high (float) – upper end of valid range for input variable. If None, then the corresponding value from the default settings for this variable are used (cf. maximum attribute of available variables)

  • unit_ref (str) – reference unit for assessment of input outlier ranges: all data needs to be in that unit, else an Exception will be raised

  • move_to_trash (bool) – if True, then all detected outliers will be moved to the trash column of this data object (i.e. column no. specified at UngriddedData._TRASHINDEX).

Returns

ungridded data object that has all outliers for this variable removed.

Return type

UngriddedData

Raises

ValueError – if input move_to_trash is True and in case for some of the measurements there is already data in the trash.
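The core of such an outlier filter can be sketched with numpy: values outside the valid range are replaced with NaN in the data and, if moved to trash, kept in a separate array. This is a simplified conceptual sketch under assumed inputs, not the actual pyaerocom implementation (which stores trash in a dedicated data column).

```python
import numpy as np

def remove_outliers(values, low, high):
    """Replace values outside [low, high] with NaN; return the cleaned
    array and the removed values (the 'trash')."""
    values = np.asarray(values, dtype=float).copy()
    mask = (values < low) | (values > high)
    # trash keeps only the removed values, NaN elsewhere
    trash = np.where(mask, values, np.nan)
    values[mask] = np.nan
    return values, trash

data = np.array([0.1, 5.0, 0.3, -2.0])
clean, trash = remove_outliers(data, low=0.0, high=1.0)
```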

save_as(file_name, save_dir)[source]

Save this object to disk

Note

So far, only storage as pickled object via CacheHandlerUngridded is supported, so input file_name must end with .pkl

Parameters
  • file_name (str) – name of output file

  • save_dir (str) – name of output directory

Returns

file path

Return type

str

set_flags_nan(inplace=False, verbose=False)[source]

Set all flagged datapoints to NaN

Parameters

inplace (bool) – if True, the flagged datapoints will be set to NaN in this object, otherwise a new object will be created and returned

Returns

data object that has all flagged data values set to NaN

Return type

UngriddedData

Raises

AttributeError – if no flags are assigned
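Setting flagged datapoints to NaN is a simple boolean-mask assignment. A minimal numpy sketch of the idea (hypothetical values and flags, not pyaerocom source):

```python
import numpy as np

# data values with a boolean flag per datapoint (True = flagged)
values = np.array([1.0, 2.0, 3.0, 4.0])
flagged = np.array([False, True, False, True])

# invalidate all flagged datapoints
values[flagged] = np.nan
```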

property shape

Shape of data array

property station_coordinates

dictionary with station coordinates

Returns

dictionary containing station coordinates (latitude, longitude, altitude -> values) for all stations (keys) where these parameters are accessible.

Return type

dict

property station_name

Names of stations

property time

Time dimension of data

to_station_data(meta_idx, vars_to_convert=None, start=None, stop=None, freq=None, merge_if_multi=True, merge_pref_attr=None, merge_sort_by_largest=True, insert_nans=False, allow_wildcards_station_name=True, add_meta_keys=None, **kwargs)[source]

Convert data from one station to StationData

Parameters
  • meta_idx (float) – index of station or name of station.

  • vars_to_convert (list or str, optional) – variables that are supposed to be converted. If None, use all variables that are available for this station

  • start – start time, optional (if not None, input must be convertible into pandas.Timestamp)

  • stop – stop time, optional (if not None, input must be convertible into pandas.Timestamp)

  • freq (str) – pandas frequency string (e.g. ‘D’ for daily, ‘M’ for month end) or valid pyaerocom ts_type

  • merge_if_multi (bool) – if True and if the data request results in multiple instances of StationData objects, then an attempt is made to merge these into one StationData object using merge_station_data()

  • merge_pref_attr – only relevant for merging of multiple matches: preferred attribute that is used to sort the individual StationData objects by relevance. Needs to be available in each of the individual StationData objects. For details cf. pref_attr in docstring of merge_station_data(). Example could be revision_date. If None, then the stations will be sorted based on the number of available data points (if merge_sort_by_largest is True, which is default).

  • merge_sort_by_largest (bool) – only relevant for merging of multiple matches: cf. prev. attr. and docstring of merge_station_data() method.

  • insert_nans (bool) – if True, then the retrieved StationData objects are filled with NaNs

  • allow_wildcards_station_name (bool) – if True and if input meta_idx is a string (i.e. a station name or pattern), metadata matches will be identified applying wildcard matches between input meta_idx and all station names in this object.

Returns

StationData object(s) containing results. list is only returned if input for meta_idx is station name and multiple matches are detected for that station (e.g. data from different instruments), else single instance of StationData. All variable time series are inserted as pandas Series

Return type

StationData or list

to_station_data_all(vars_to_convert=None, start=None, stop=None, freq=None, by_station_name=True, ignore_index=None, **kwargs)[source]

Convert all data to StationData objects

Creates one instance of StationData for each metadata block in this object.

Parameters
  • vars_to_convert (list or str, optional) – variables that are supposed to be converted. If None, use all variables that are available for this station

  • start – start time, optional (if not None, input must be convertible into pandas.Timestamp)

  • stop – stop time, optional (if not None, input must be convertible into pandas.Timestamp)

  • freq (str) – pandas frequency string (e.g. ‘D’ for daily, ‘M’ for month end) or valid pyaerocom ts_type (e.g. ‘hourly’, ‘monthly’).

  • by_station_name (bool) – if True, then iter over unique_station_name (and merge multiple matches if applicable), else, iter over metadata index

  • **kwargs – additional keyword args passed to to_station_data() (e.g. merge_if_multi, merge_pref_attr, merge_sort_by_largest, insert_nans)

Returns

4-element dictionary containing following key / value pairs:

  • stats: list of StationData objects

  • station_name: list of corresponding station names

  • latitude: list of latitude coordinates

  • longitude: list of longitude coordinates

Return type

dict

to_timeseries(station_name=None, start_date=None, end_date=None, freq=None)[source]

Convert this object into individual pandas.Series objects

Parameters
  • station_name (tuple or str, optional) – station name or list of station names to return

  • start_date (str, optional) – start date of the data to return

  • end_date (str, optional) – end date of the data to return

  • freq (str, optional) – frequency to resample to using the pandas resample method; use the offset aliases as noted in http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases

Returns

if station_name is a string: dictionary with station data; if station_name is a list or None: list of dictionaries with station data

Return type

list or dictionary

Example

>>> import pyaerocom.io.readobsdata
>>> obj = pyaerocom.io.readobsdata.ReadUngridded()
>>> obj.read()
>>> pdseries = obj.to_timeseries()
>>> pdseriesmonthly = obj.to_timeseries(station_name='Avignon',start_date='2011-01-01', end_date='2012-12-31', freq='M')
property unique_station_names

List of unique station names

property vars_to_retrieve
pyaerocom.ungriddeddata.reduce_array_closest(arr_nominal, arr_to_be_reduced)[source]

Colocated data

class pyaerocom.colocateddata.ColocatedData(data=None, **kwargs)[source]

Class representing colocated and unified data from two sources

Sources may be instances of UngriddedData or GriddedData that have been compared to each other.

Note

Currently it is not foreseen that this object is instantiated from scratch; rather, it is created in and returned by objects / methods that perform colocation. The purpose of this object is thus not the creation of colocated objects, but solely the analysis of such data as well as I/O features (e.g. save as / read from .nc files, convert to pandas.DataFrame, plot station time series overlays, scatter plots, etc.)

In the current design, such an object comprises 3 dimensions, where the first dimension (depth, index 0) is ALWAYS length 2 and specifies the two datasets that were compared

Parameters
  • data (xarray.DataArray or numpy.ndarray or str, optional) – Colocated data. If str, then it is attempted to be loaded from file. Else, it is assumed that data is numpy array and that all further supplementary inputs (e.g. coords, dims) for the instantiation of DataArray is provided via **kwargs.

  • ref_data_id (str, optional) – ID of reference data

  • **kwargs – Additional keyword args that are passed to init of DataArray in case input data is numpy array.

Raises

IOError – if init fails

apply_country_filter(region_id, use_country_code=False, inplace=False)[source]
apply_latlon_filter(lat_range=None, lon_range=None, region_id=None, inplace=False)[source]

Apply regional filter

Returns new object filtered for input coordinate range

Parameters
  • lat_range (list, optional) – latitude range that is supposed to be applied. If specified, then also lon_range need to be specified, else, region_id is checked against AeroCom default regions (and used if applicable)

  • lon_range (list, optional) – longitude range that is supposed to be applied. If specified, then also lat_range need to be specified, else, region_id is checked against AeroCom default regions (and used if applicable)

  • region_id (str) – name of region to be applied. If provided (i.e. not None) then input args lat_range and lon_range are ignored

Returns

filtered data object

Return type

ColocatedData

apply_region_mask(region_id, inplace=False)[source]

Apply a binary regions mask filter to data object. Available binary regions IDs can be found at pyaerocom.const.HTAP_REGIONS.

Parameters
  • region_id (str) – ID of binary regions.

  • inplace (bool, optional) – If True, the current instance, is modified, else a new instance of ColocatedData is created and filtered. The default is False.

Returns

data – Filtered data object.

Return type

ColocatedData

property area_weights

Wrapper for calc_area_weights()

calc_area_weights()[source]

Calculate area weights

Note

Only applies to colocated data that has latitude and longitude dimension.

Returns

array containing weights for each datapoint (same shape as self.data[0])

Return type

ndarray
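For a regular lat / lon grid, area weights are typically proportional to the cosine of latitude (grid cells shrink towards the poles). The sketch below illustrates that idea under this assumption; it is not the pyaerocom implementation, which works on the colocated data's grid directly.

```python
import numpy as np

def area_weights(lats_deg):
    """Latitude-dependent grid-cell weights proportional to cos(lat),
    normalised so they sum to 1."""
    w = np.cos(np.deg2rad(np.asarray(lats_deg, dtype=float)))
    return w / w.sum()

# a cell at the equator carries twice the weight of one at 60 deg N
w = area_weights([0.0, 60.0])
```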

calc_nmb_array()[source]

Calculate data array with normalised mean bias (NMB) values

Returns

NMBs at each coordinate

Return type

DataArray

calc_statistics(constrain_val_range=False, use_area_weights=False, **kwargs)[source]

Calculate statistics from model and obs data

Wrapper for function pyaerocom.mathutils.calc_statistics()

Returns

dictionary containing statistical parameters

Return type

dict
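A few of the typical statistics returned by such a call (mean bias, normalised mean bias, RMSE) can be computed directly from paired obs / model arrays. This is a hedged conceptual sketch with hypothetical input values, not the pyaerocom.mathutils implementation:

```python
import numpy as np

def basic_stats(obs, mod):
    """Mean bias (MB), normalised mean bias (NMB) and root mean
    square error (RMSE) for paired observation / model arrays."""
    obs = np.asarray(obs, dtype=float)
    mod = np.asarray(mod, dtype=float)
    diff = mod - obs
    return {
        "mb": diff.mean(),                      # mean(mod - obs)
        "nmb": diff.sum() / obs.sum(),          # sum(mod - obs) / sum(obs)
        "rmse": np.sqrt((diff ** 2).mean()),    # sqrt(mean((mod - obs)^2))
    }

stats = basic_stats(obs=[1.0, 2.0, 3.0], mod=[1.5, 2.5, 3.5])
```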

check_dimensions()[source]

Checks if data source and time dimension are at the right index

check_set_countries(inplace=True, assign_to_dim=None)[source]

Checks if country information is available and assigns if not

If no country information is available, countries will be assigned for each lat / lon coordinate using pyaerocom.geodesy.get_country_info_coords().

Parameters
  • inplace (bool, optional) – If True, modify and return this object, else a copy. The default is True.

  • assign_to_dim (str, optional) – name of dimension to which the country coordinate is assigned. Default is None, in which case station_name is used.

Raises

DataDimensionError – If data is 4D (i.e. if latitude and longitude are orthogonal dimensions)

Returns

data object with countries assigned

Return type

ColocatedData

property coords

Coordinates of data array

copy()[source]

Copy this object

property countries_available

Alphabetically sorted list of country names available

Raises

MetaDataError – if no country information is available

Returns

list of countries available in these data

Return type

list

property country_codes_available

Alphabetically sorted list of country codes available

Raises

MetaDataError – if no country information is available

Returns

list of countries available in these data

Return type

list

property data

Data object (instance of xarray.DataArray)

property data_source

Coordinate array containing data sources (z-axis)

property dims

Names of dimensions

filter_altitude(alt_range, inplace=False)[source]
filter_region(region_id, check_mask=True, check_country_meta=False, inplace=False)[source]

Filter object by region

Parameters
  • region_id (str) – ID of region

  • inplace (bool) – if True, the filtering is done directly in this instance, else a new instance is returned

  • check_mask (bool) – if True and region_id a valid name for a binary mask, then the filtering is done based on that binary mask.

  • check_country_meta (bool) – if True, then the input region_id is first checked against available country names in metadata. If that fails, it is assumed that this region is either a valid name for registered rectangular regions or for available binary masks.

Returns

filtered data object

Return type

ColocatedData

flatten_latlondim_station_name()[source]

Stack (flatten) lat / lon dimension into new dimension station_name

Returns

new colocated data object with dimension station_name and lat lon arrays as additional coordinates

Return type

ColocatedData

from_csv(file_path)[source]

Read data from CSV file

from_dataframe(df)[source]

Create colocated Data object from dataframe

Note

This is intended to be used as back-conversion from to_dataframe() and methods that use the latter (e.g. to_csv()).

get_coords_valid_obs()[source]

Get latitude / longitude coordinates where obsdata is available

Returns

  • list – latitude coordinates

  • list – longitude coordinates

get_country_codes()[source]

Get country names and codes for all locations contained in these data

Raises

MetaDataError – if no country information is available

Returns

dictionary of unique country names (keys) and corresponding country codes (values)

Return type

dict

static get_meta_from_filename(file_path)[source]

Get meta information from file name

Note

This does not yet include IDs of model and obs data as these should be included in the data anyways (e.g. column names in CSV file) and may include the delimiter _ in their name.

Returns

dictionary with meta information

Return type

dict

get_regional_timeseries(region_id, **filter_kwargs)[source]

Compute regional timeseries both for model and obs

Parameters
  • region_id (str) – name of region for which regional timeseries is supposed to be retrieved

  • **filter_kwargs – additional keyword args passed to filter_region().

Returns

dictionary containing regional timeseries for model (key mod) and obsdata (key obs) and name of region.

Return type

dict

property has_latlon_dims

Boolean specifying whether data has latitude and longitude dimensions

property has_time_dim

Boolean specifying whether data has a time dimension

property latitude

Array of latitude coordinates

property longitude

Array of longitude coordinates

max()[source]
property meta

Meta data

min()[source]
property name

Name of data (should be variable name)

property ndim

Dimension of data array

property num_coords

Total number of lat/lon coordinates

property num_coords_with_data

Number of lat/lon coordinates that contain at least one datapoint

property num_grid_points

Number of lon / lat grid points that contain data

open(file_path)[source]

High level helper for reading from supported file sources

Parameters

file_path (str) – file path

plot_coordinates(marker='x', markersize=12, fontsize_base=10, **kwargs)[source]
plot_scatter(constrain_val_range=False, **kwargs)[source]

Create scatter plot of data

Parameters

**kwargs – keyword args passed to pyaerocom.plot.plotscatter.plot_scatter()

Returns

matplotlib axes instance

Return type

ax

read_netcdf(file_path)[source]

Read data from NetCDF file

Parameters

file_path (str) – file path

rename_variable(var_name, new_var_name, data_source, inplace=True)[source]

Rename a variable in this object

Parameters
  • var_name (str) – current variable name

  • new_var_name (str) – new variable name

  • data_source (str) – name of data source (along data_source dimension)

  • inplace (bool) – replace here or create new instance

Returns

instance with renamed variable

Return type

ColocatedData

Raises
  • VarNotAvailableError – if input variable is not available in this object

  • DataSourceError – if input data_source is not available in this object

resample_time(to_ts_type, how='mean', apply_constraints=None, min_num_obs=None, colocate_time=True, inplace=True, **kwargs)[source]

Resample time dimension

Parameters

to_ts_type (str) – new temporal resolution (must be lower than current resolution)

Returns

new data object containing resampled data

Return type

ColocatedData

Raises

TemporalResolutionError – if input resolution is higher than current resolution
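Temporal downsampling of this kind maps directly onto pandas resampling. The sketch below shows the underlying operation for a daily series aggregated to monthly means (hypothetical data, not the pyaerocom resampler, which additionally supports minimum-coverage constraints):

```python
import numpy as np
import pandas as pd

# 60 days of daily data starting 2010-01-01
idx = pd.date_range("2010-01-01", periods=60, freq="D")
s = pd.Series(np.arange(60, dtype=float), index=idx)

# downsample to monthly means ('MS' = month start labels),
# conceptually resample_time(to_ts_type='monthly', how='mean')
monthly = s.resample("MS").mean()
```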

property savename_aerocom

Default save name for data object following AeroCom convention

set_zeros_nan(inplace=True)[source]

Replace all 0’s with NaN in data

Parameters

inplace (bool, optional) – Whether to modify this object or return a copy. The default is True.

Returns

cd – modified data object

Return type

ColocatedData

property shape

Shape of data array

stack(inplace=False, **kwargs)[source]

Stack one or more dimensions

Parameters

**kwargs – input arguments passed to DataArray.stack()

Returns

stacked data object

Return type

ColocatedData

Example

coldata = coldata.stack(latlon=['latitude', 'longitude'])
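Since ColocatedData wraps an xarray.DataArray, the example above corresponds to xarray's own stacking. The following self-contained sketch (hypothetical coordinates) shows what stacking a 2D lat / lon field into one combined dimension does:

```python
import numpy as np
import xarray as xr

# 2 x 3 field on a (latitude, longitude) grid
arr = xr.DataArray(
    np.arange(6.0).reshape(2, 3),
    dims=("latitude", "longitude"),
    coords={"latitude": [10.0, 20.0], "longitude": [0.0, 5.0, 10.0]},
)

# collapse both spatial dimensions into a single 'latlon' dimension
stacked = arr.stack(latlon=("latitude", "longitude"))
```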

property start

Start datetime of data

property stop

Stop datetime of data

property time

Array containing time stamps

to_csv(out_dir, savename=None)[source]

Save data object as .csv file

Converts data to pandas.DataFrame and then saves as csv

Parameters
  • out_dir (str) – output directory

  • savename (str, optional) – name of file, if None, the default save name is used (cf. savename_aerocom)

to_dataframe()[source]

Convert this object into pandas.DataFrame

Note

This does not include meta information

to_netcdf(out_dir, savename=None, **kwargs)[source]

Save data object as NetCDF file

Wrapper for method xarray.DataArray.to_netcdf()

Parameters
  • out_dir (str) – output directory

  • savename (str, optional) – name of file, if None, the default save name is used (cf. savename_aerocom)

  • **kwargs – additional, optional keyword arguments passed to xarray.DataArray.to_netcdf()

property ts_type

String specifying temporal resolution of data

property unit

Unit of data

property units

Unit of data

property unitstr
unstack(inplace=False, **kwargs)[source]

Unstack one or more dimensions

Parameters

**kwargs – input arguments passed to DataArray.unstack()

Returns

unstacked data object

Return type

ColocatedData

property var_name

Name of variable

Station data

class pyaerocom.stationdata.StationData(**meta_info)[source]

Dict-like base class for single station data

ToDo: write more detailed introduction

Note

Variable data (e.g. numpy array or pandas Series) can be directly assigned to the object. When assigning variable data it is recommended to add variable metadata (e.g. unit, ts_type) in var_info, where key is variable name and value is dict with metadata entries.

dtime

list / array containing time index values

Type

list

var_info

dictionary containing information about each variable

Type

dict

data_err

dictionary that may be used to store uncertainty timeseries or data arrays associated with the different variable data.

Type

dict

overlap

dictionary that may be filled to store overlapping timeseries data associated with one variable. This is, for instance, used in merge_vardata() to store overlapping data from another station.

Type

dict

PROTECTED_KEYS = ['dtime', 'var_info', 'station_coords', 'data_err', 'overlap', 'numobs', 'data_flagged']

Keys that are ignored when accessing metadata

STANDARD_COORD_KEYS = ['latitude', 'longitude', 'altitude']

List of keys that specify standard metadata attribute names. This is used e.g. in get_meta()

STANDARD_META_KEYS = ['filename', 'station_id', 'station_name', 'instrument_name', 'PI', 'country', 'country_code', 'ts_type', 'latitude', 'longitude', 'altitude', 'data_id', 'dataset_name', 'data_product', 'data_version', 'data_level', 'framework', 'instr_vert_loc', 'revision_date', 'website', 'ts_type_src', 'stat_merge_pref_attr']
VALID_TS_TYPES = ['minutely', 'hourly', 'daily', 'weekly', 'monthly', 'yearly', 'native']
calc_climatology(var_name, start=None, stop=None, apply_constraints=None, min_num_obs=None, clim_mincount=None, clim_freq=None, set_year=None, resample_how=None)[source]

Calculate climatological timeseries for input variable

The computation is done as follows:

1. Retrieve a monthly timeseries for the climatological interval (if the data is not already monthly). This is done by applying the input resampling constraints via apply_constraints and min_num_obs; if these are unspecified, the pyaerocom default is used (which usually applies hierarchical resampling).

2. The climatological timeseries is then computed from that monthly timeseries. If apply_constraints is True, a further sampling coverage criterion is applied, which can be specified via clim_mincount or, if unspecified, defaults to the pyaerocom default (cf. pyaerocom.const.CLIM_MIN_COUNT).

Parameters
  • var_name (str) – name of data variable

  • start – start time of data used to compute climatology

  • stop – start time of data used to compute climatology

  • apply_constraints (bool, optional) – if True, then hierarchical resampling constraints are applied (for details see pyaerocom.time_resampler.TimeResampler.resample())

  • min_num_obs (dict or int, optional) – minimum number of observations required per period (when downsampling). For details see pyaerocom.time_resampler.TimeResampler.resample())

  • clim_mincount (int, optional) – minimum number of monthly values required per month of climatology

  • set_year (int, optional) – if specified, the output data will be assigned the input year. Else the middle year of the climatological interval is used.

  • resample_how (str) – how should the resampled data be averaged (e.g. mean, median)

  • **kwargs – Additional keyword args passed to pyaerocom.time_resampler.TimeResampler.resample()

Returns

new instance of StationData containing climatological data

Return type

StationData
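The two-step computation described above can be sketched with plain pandas, independently of pyaerocom (the input series, the five-year interval and the minimum count of 3 are made-up values for illustration; the actual implementation differs in detail):

```python
import numpy as np
import pandas as pd

# Step 1: resample a daily series to a monthly timeseries.
idx = pd.date_range("2005-01-01", "2009-12-31", freq="D")
ts = pd.Series(np.sin(2 * np.pi * idx.dayofyear / 365.0), index=idx)
monthly = ts.resample("MS").mean()

# Step 2: average each calendar month over the climatological interval,
# requiring a minimum number of monthly values per month (cf. clim_mincount).
grouped = monthly.groupby(monthly.index.month)
clim = grouped.mean()
counts = grouped.count()
clim[counts < 3] = np.nan  # coverage criterion
```

Here each calendar month is covered by five monthly values, so no entry is masked.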

check_dtime()[source]

Checks if dtime attribute is array or list

check_if_3d(var_name)[source]

Checks if altitude data is available in this object

check_unit(var_name, unit=None)[source]

Check if variable unit corresponds to a certain unit

Parameters
  • var_name (str) – variable name for which unit is to be checked

  • unit (str, optional) – unit to be checked, if None, AeroCom default unit is used

Raises
  • MetaDataError – if unit information is not accessible for input variable name

  • UnitConversionError – if current unit cannot be converted into specified unit (e.g. 1 vs m-1)

  • DataUnitError – if current unit is not equal to input unit but can be converted (e.g. 1/Mm vs 1/m)

check_var_unit_aerocom(var_name)[source]

Check if unit of input variable is AeroCom default, if not, convert

Parameters

var_name (str) – name of variable

Raises
  • MetaDataError – if unit information is not accessible for input variable name

  • UnitConversionError – if current unit cannot be converted into specified unit (e.g. 1 vs m-1)

  • DataUnitError – if current unit is not equal to AeroCom default and cannot be converted.

compute_trend(var_name, start_year=None, stop_year=None, season=None, slope_confidence=None, **alt_range)[source]
convert_unit(var_name, to_unit)[source]

Try to convert unit of data

Requires that unit of input variable is available in var_info

Note

BETA version

Parameters
  • var_name (str) – name of variable

  • to_unit (str) – new unit

Raises
  • MetaDataError – if variable unit cannot be accessed

  • UnitConversionError – if conversion failed

copy() → a shallow copy of od[source]
property default_vert_grid

AeroCom default grid for vertical regridding

For details, see DEFAULT_VERT_GRID_DEF in Config

Returns

numpy array specifying default coordinates

Return type

ndarray

dist_other(other)[source]

Distance to other station in km

Parameters

other (StationData) – other data object

Returns

distance between this and other station in km

Return type

float
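The distance computation can be illustrated with the standard great-circle (haversine) formula. This is a sketch with assumed station coordinates (roughly Oslo and Bergen); it is not necessarily the exact formula pyaerocom uses internally:

```python
import math

def haversine_km(lat0, lon0, lat1, lon1, earth_radius_km=6371.0):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi0, phi1 = math.radians(lat0), math.radians(lat1)
    dphi = math.radians(lat1 - lat0)
    dlam = math.radians(lon1 - lon0)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi0) * math.cos(phi1) * math.sin(dlam / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

# Approximate distance between two example stations (Oslo, Bergen).
d = haversine_km(59.91, 10.75, 60.39, 5.32)
```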

get_data_columns()[source]

List containing all data columns

Iterates over all key / value pairs and finds all values that are lists or numpy arrays that match the length of the time-stamp array (attr. time)

Returns

list containing N arrays, where N is the total number of datacolumns found.

Return type

list

get_meta(force_single_value=True, quality_check=True, add_none_vals=False, add_meta_keys=None)[source]

Return meta-data as dictionary

By default, only default metadata keys are considered, use parameter add_meta_keys to add additional metadata.

Parameters
  • force_single_value (bool) – if True, then each meta value that is a list or array is converted to a single value.

  • quality_check (bool) – if True, and coordinate values are lists or arrays, then the standard deviation in the values is compared to the upper limits allowed in the local variation. The upper limits are specified in attr. COORD_MAX_VAR.

  • add_none_vals (bool) – Add metadata keys which have value set to None.

  • add_meta_keys (str or list, optional) – Add none-standard metadata.

Returns

dictionary containing the retrieved meta-data

Return type

dict

Raises
  • AttributeError – if one of the meta entries is invalid

  • MetaDataError – in case of inconsistencies in meta data between individual time-stamps
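The force_single_value / quality_check behaviour described above can be sketched as follows (hypothetical helper and threshold, not pyaerocom internals): collapse a list-valued metadata entry to its mean, rejecting it if the spread exceeds an allowed maximum variation:

```python
import numpy as np

def collapse_coord(values, max_std=0.05):
    """Collapse list/array metadata to a single value with a variation check."""
    arr = np.asarray(values, dtype=float)
    if arr.std() > max_std:
        # corresponds to the quality_check failure case
        raise ValueError("local variation in coordinate too large")
    return float(arr.mean())

v = collapse_coord([10.0, 10.01, 9.99])  # small spread: accepted
```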

get_station_coords(force_single_value=True, quality_check=True)[source]

Return coordinates as dictionary

This method uses the standard coordinate names defined in STANDARD_COORD_KEYS (latitude, longitude and altitude) to get the station coordinates. For each of these parameters it first checks in station_coords whether the parameter is defined (i.e. it is not None) and, if not, it checks whether this object has an attribute of that name and uses that one.

Parameters
  • force_single_value (bool) – if True and coordinate values are lists or arrays, then they are collapsed to single value using mean

  • quality_check (bool) – if True, and coordinate values are lists or arrays, then the standard deviation in the values is compared to the upper limits allowed in the local variation.

Returns

dictionary containing the retrieved coordinates

Return type

dict

Raises
  • AttributeError – if one of the coordinate values is invalid

  • CoordinateError – if local variation in either of the three spatial coordinates is found too large

get_unit(var_name)[source]

Get unit of variable data

Parameters

var_name (str) – name of variable

Returns

unit of variable

Return type

str

Raises

MetaDataError – if unit cannot be accessed for variable

get_var_ts_type(var_name, try_infer=True)[source]

Get ts_type for a certain variable

Note

Converts to ts_type string if assigned ts_type is in pandas format

Parameters
  • var_name (str) – data variable name for which the ts_type is supposed to be retrieved

  • try_infer (bool) – if ts_type is not available, try inferring it from data

Returns

the corresponding data time resolution

Return type

str

Raises

MetaDataError – if no metadata is available for this variable (e.g. if var_name cannot be found in var_info)
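The note above about converting pandas-format frequencies to ts_type strings can be illustrated with a toy mapping (the dictionary below covers a few assumed common cases only; it is not pyaerocom's actual lookup table):

```python
# Hypothetical mapping from pandas frequency strings to pyaerocom ts_types.
PANDAS_FREQ_TO_TS_TYPE = {"H": "hourly", "D": "daily", "MS": "monthly", "AS": "yearly"}

def to_ts_type(freq):
    """Return the ts_type for a pandas freq string; pass through if already a ts_type."""
    return PANDAS_FREQ_TO_TS_TYPE.get(freq, freq)
```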

has_var(var_name)[source]

Checks if input variable is available in data object

Parameters

var_name (str) – name of variable

Returns

True, if variable data is available, else False

Return type

bool

insert_nans_timeseries(var_name)[source]

Fill up missing values with NaNs in an existing time series

Note

This method resamples the data onto a regular grid. Thus, if the input ts_type differs from the current ts_type of the data, this method will not only insert NaNs but also resample the data to the new resolution.

Parameters
  • var_name (str) – variable name

  • inplace (bool) – if True, the actual data in this object will be overwritten with the new data that contains NaNs

Returns

the modified station data object

Return type

StationData
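A minimal pandas illustration of NaN insertion onto a regular time grid, which is the behaviour this method provides for StationData (sketch only; the actual method also infers the ts_type from the data):

```python
import pandas as pd

# A daily series with one missing time stamp (2010-01-03).
ts = pd.Series([1.0, 2.0, 4.0],
               index=pd.to_datetime(["2010-01-01", "2010-01-02", "2010-01-04"]))

# Reindexing onto the regular daily grid inserts NaN for the gap.
regular = ts.asfreq("D")
```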

interpolate_timeseries(var_name, freq, min_coverage_interp=0.3, resample_how='mean', inplace=False)[source]

Interpolate one variable timeseries to a certain frequency

ToDo: complete docstring

merge_meta_same_station(other, coord_tol_km=None, check_coords=True, inplace=True, add_meta_keys=None, raise_on_error=False)[source]

Merge meta information from other object

Note

Coordinate attributes (latitude, longitude and altitude) are not copied as they are required to be the same in both stations. The latter can be checked and ensured using input argument check_coords

Parameters
  • other (StationData) – other data object

  • coord_tol_km (float) – maximum distance in km between coordinates of input StationData object and self. Only relevant if check_coords is True. If None, then _COORD_MAX_VAR is used which is defined in the class header.

  • check_coords (bool) – if True, the coordinates are compared and checked if they are lying within a certain distance to each other (cf. coord_tol_km).

  • inplace (bool) – if True, the metadata from the other station is added to the metadata of this station, else, a new station is returned with the merged attributes.

  • add_meta_keys (str or list, optional) – additional non-standard metadata keys that are supposed to be considered for merging.

  • raise_on_error (bool) – if True, then an Exception will be raised in case one of the metadata items cannot be merged, which is most often due to unresolvable type differences of metadata values between the two objects

merge_other(other, var_name, add_meta_keys=None)[source]

Merge other station data object

Parameters
  • other (StationData) – other data object

  • var_name (str) – variable name for which info is to be merged (needs to be both available in this object and the provided other object)

  • add_meta_keys (str or list, optional) – additional non-standard metadata keys that are supposed to be considered for merging.

Returns

this object that has merged the other station

Return type

StationData

merge_vardata(other, var_name)[source]

Merge variable data from other object into this object

Note

This merges also the information about this variable in the dict var_info. It is required, that variable meta-info is specified in both StationData objects.

Note

This method removes NaN’s from the existing time series in the data objects. In order to fill up the time-series with NaNs again after merging, call insert_nans_timeseries()

Parameters
  • other (StationData) – other data object

  • var_name (str) – variable name for which info is to be merged (needs to be both available in this object and the provided other object)

Returns

this object

Return type

StationData
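The merge described above (NaNs removed first, then the two timeseries combined) can be sketched in plain pandas; the precedence rule shown here, where values from this object win over the other, is an assumption for illustration:

```python
import pandas as pd

this = pd.Series([1.0, None, 3.0],
                 index=pd.to_datetime(["2010-01-01", "2010-01-02", "2010-01-03"]))
other = pd.Series([10.0, 20.0],
                  index=pd.to_datetime(["2010-01-02", "2010-01-04"]))

# Drop NaNs first (cf. the note above), then combine: values present in
# `this` take precedence, gaps are filled from `other`.
merged = this.dropna().combine_first(other.dropna()).sort_index()
```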

merge_varinfo(other, var_name)[source]

Merge variable specific meta information from other object

Parameters
  • other (StationData) – other data object

  • var_name (str) – variable name for which info is to be merged (needs to be both available in this object and the provided other object)

plot_timeseries(var_name, freq=None, resample_how='mean', add_overlaps=False, legend=True, tit=None, **kwargs)[source]

Plot timeseries for variable

Note

If you set input arg add_overlaps = True the overlapping timeseries data - if it exists - will be plotted on top of the actual timeseries using red colour and dashed line. As the overlapping data may be identical with the actual data, you might want to increase the line width of the actual timeseries using an additional input argument lw=4, or similar.

Parameters
  • var_name (str) – name of variable (e.g. “od550aer”)

  • freq (str, optional) – sampling resolution of data (can be pandas freq. string, or pyaerocom ts_type).

  • resample_how (str, optional) – choose from mean or median (only relevant if input parameter freq is provided, i.e. if resampling is applied)

  • add_overlaps (bool) – if True and if overlapping data exists for this variable, it will be added to the plot.

  • tit (str, optional) – title of plot, if None, default title is used

  • **kwargs – additional keyword args passed to matplotlib plot method

Returns

matplotlib.axes instance of plot

Return type

axes

Raises
  • KeyError – if variable key does not exist in this dictionary

  • ValueError – if length of data array does not equal the length of the time array

remove_outliers(var_name, low=None, high=None, check_unit=True)[source]

Remove outliers from one of the variable timeseries

Parameters
  • var_name (str) – variable name

  • low (float) – lower end of valid range for input variable. If None, then the corresponding value from the default settings for this variable are used (cf. minimum attribute of available variables)

  • high (float) – upper end of valid range for input variable. If None, then the corresponding value from the default settings for this variable are used (cf. maximum attribute of available variables)

  • check_unit (bool) – if True, the unit of the data is checked against AeroCom default
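A plain numpy sketch of the outlier removal described above: values outside the valid range [low, high] are replaced with NaN (hypothetical helper, not the pyaerocom implementation; the example range mimics the od550aer default of [-0.05, 10]):

```python
import numpy as np

def mask_outliers(values, low, high):
    """Replace values outside [low, high] with NaN."""
    arr = np.asarray(values, dtype=float)
    arr[(arr < low) | (arr > high)] = np.nan
    return arr

out = mask_outliers([0.1, -1.0, 12.0], low=-0.05, high=10.0)
```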

remove_variable(var_name)[source]

Remove variable data

Parameters

var_name (str) – name of variable that is to be removed

Returns

current instance of this object, with data removed

Return type

StationData

Raises

VarNotAvailableError – if the input variable is not available in this object

resample_time(var_name, ts_type, how='mean', apply_constraints=None, min_num_obs=None, inplace=False, **kwargs)[source]

Resample one of the time-series in this object

Parameters
  • var_name (str) – name of data variable

  • ts_type (str) – new frequency string (can be pyaerocom ts_type or valid pandas frequency string)

  • how (str) – how should the resampled data be averaged (e.g. mean, median)

  • apply_constraints (bool, optional) – if True, then hierarchical resampling constraints are applied (for details see pyaerocom.time_resampler.TimeResampler.resample())

  • min_num_obs (dict or int, optional) – minimum number of observations required per period (when downsampling). For details see pyaerocom.time_resampler.TimeResampler.resample()

  • inplace (bool) – if True, then the current data object stored in self, will be overwritten with the resampled time-series

  • **kwargs – Additional keyword args passed to pyaerocom.time_resampler.TimeResampler.resample()

Returns

with resampled variable timeseries

Return type

StationData
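Downsampling with a minimum-count constraint, as controlled by min_num_obs above, can be sketched with plain pandas (the weekly target frequency and threshold of 3 are made-up values; pyaerocom's hierarchical constraints are more elaborate):

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2010-01-01", periods=10, freq="D")
daily = pd.Series(np.arange(10.0), index=idx)
daily.iloc[5:] = np.nan  # second week is mostly missing

def resample_with_constraint(s, freq, min_num_obs):
    """Resample to freq using the mean, masking periods with too few obs."""
    counts = s.resample(freq).count()
    means = s.resample(freq).mean()
    means[counts < min_num_obs] = np.nan
    return means

weekly = resample_with_constraint(daily, "W", min_num_obs=3)
```

The first week (3 valid days) survives; the second (2 valid days) is masked.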

resample_timeseries(var_name, **kwargs)[source]

Wrapper for resample_time() (for backwards compatibility)

Note

For backwards compatibility, this method will return a pandas Series instead of the actual StationData object

same_coords(other, tol_km=None)[source]

Compare station coordinates of other station with this station

Parameters
  • other (StationData) – other data object

  • tol_km (float) – distance tolerance in km

Returns

if True, then the two objects are located within the specified tolerance range

Return type

bool

select_altitude(var_name, altitudes)[source]

Extract variable data within certain altitude range

Note

Beta version

Parameters
  • var_name (str) – name of variable for which metadata is supposed to be extracted

  • altitudes (list) – altitude range in m, e.g. [0, 1000]

Returns

data object within input altitude range

Return type

pandas.Series or xarray.DataArray

to_dataframe()[source]

Convert this object to pandas dataframe

Find all key/value pairs that contain observation data (i.e. values must be list or array and must have the same length as attribute time)

to_timeseries(var_name, freq=None, resample_how='mean', apply_constraints=None, min_num_obs=None, **kwargs)[source]

Get pandas.Series object for one of the data columns

Parameters
  • var_name (str) – name of variable (e.g. “od550aer”)

  • freq (str) – new temporal resolution (can be pandas freq. string, or pyaerocom ts_type)

  • resample_how (str) – choose from mean or median (only relevant if input parameter freq is provided, i.e. if resampling is applied)

  • apply_constraints (bool, optional) – if True, then hierarchical resampling constraints are applied (for details see pyaerocom.time_resampler.TimeResampler.resample())

  • min_num_obs (dict or int, optional) – minimum number of observations required per period (when downsampling). For details see pyaerocom.time_resampler.TimeResampler.resample())

  • **kwargs – optional keyword args passed to resample_timeseries()

Returns

time series object

Return type

Series

Raises
  • KeyError – if variable key does not exist in this dictionary

  • ValueError – if length of data array does not equal the length of the time array

property units

Dictionary containing units of all variables in this object

property vars_available

Number of variables available in this data object

Method to compute trends for a StationData object

Note

This method is badly designed and will be outsourced at some point. Please do not use it; use StationData.compute_trend() directly (which will need to be rewritten as well, as it uses this method at the moment…)

No docstring because you shouldn’t use this method!

Other data classes

class pyaerocom.vertical_profile.VerticalProfile(data=None, altitude=None, dtime=None, var_name=None, data_err=None, var_unit=None, altitude_unit=None, **location_info)[source]

Object representing single variable profile data

property altitude

Array containing altitude values corresponding to data

property altitude_unit

Unit of altitude

compute_altitude()[source]

Compute altitude based on vertical coordinate information

property data

Array containing data values corresponding to data

property data_err

Array containing uncertainties (errors) corresponding to data

plot(plot_errs=True, whole_alt_range=False, rot_xlabels=30, errs_shaded=True, errs_alpha=0.1, add_vertbar_zero=True, **kwargs)[source]

Simple plot method for vertical profile

Parameters

plot_errs (bool) – if True, and if error data is available, the errors are plotted as well

update(**kwargs)[source]
property var_name

Variable name of profile data

property var_unit

Unit of variable (requires var_name to be available)

Colocation routines

Automatic colocation engine

High level module containing analysis classes and methods to perform colocation.

Note

This module will be deprecated soon but most of the code will be refactored into colocation.py module.

class pyaerocom.colocation_auto.ColocationSetup(model_id=None, obs_id=None, obs_vars=None, ts_type=None, start=None, stop=None, filter_name=None, regrid_res_deg=None, remove_outliers=True, vert_scheme=None, harmonise_units=False, model_use_vars=None, model_add_vars=None, model_read_aux=None, read_opts_ungridded=None, obs_vert_type=None, model_vert_type_alt=None, var_outlier_ranges=None, var_ref_outlier_ranges=None, model_ts_type_read=None, obs_ts_type_read=None, flex_ts_type_gridded=True, apply_time_resampling_constraints=None, min_num_obs=None, model_keep_outliers=True, obs_keep_outliers=False, obs_use_climatology=False, colocate_time=False, basedir_coldata=None, obs_name=None, model_name=None, save_coldata=True, **kwargs)[source]

Setup class for model / obs intercomparison

An instance of this setup class can be used to run a colocation analysis between a model and an observation network and will create a number of pya.ColocatedData instances and save them as netCDF file.

Note

This is a very first draft and will likely undergo significant changes

model_id

ID of model to be used

Type

str

obs_id

ID of observation network to be used

Type

str

obs_vars

variables to be analysed. If any of the provided variables to be analysed in the model data is not available in obsdata, the obsdata will be checked against potential alternative variables which are specified in model_use_vars and which can be specified in form of a dictionary for each . If None, all variables are analysed that are available both in model and obsdata.

Type

str or list, optional

ts_type

string specifying colocation frequency

start

start time. Input can be anything that can be converted into pandas.Timestamp using pyaerocom.helpers.to_pandas_timestamp(). If None, then the first available date in the model data is used.

stop

stop time. Anything that can be converted into pandas.Timestamp using pyaerocom.helpers.to_pandas_timestamp() or None. If None and if start is given at yearly resolution (e.g. start=2010), then stop will be automatically set to the end of that year. Else, it will be set to the last available timestamp in the model data.

filter_name

name of filter to be applied. If None, AeroCom default is used (i.e. pyaerocom.const.DEFAULT_REG_FILTER)

Type

str

regrid_res_deg

resolution in degrees for regridding of model grid (done before colocation)

Type

int, optional

remove_outliers

if True, outliers are removed from model and obs data before colocation, else not.

Type

bool

vert_scheme

vertical scheme used for colocation

Type

str, optional

harmonise_units

if True, units are attempted to be harmonised (note: raises Exception if True and units cannot be harmonised).

Type

bool

model_use_vars

dictionary that specifies mapping of model variables. Keys are observation variables, values are the corresponding model variables (e.g. model_use_vars=dict(od550aer='od550csaer')). Example: your observation has the variable od550aer but your model uses a different name for that variable, say od550. Then, you can specify this via model_use_vars = {'od550aer' : 'od550'}. NOTE: in this case, a model variable od550aer will be ignored, even if it exists (cf. model_add_vars).

Type

dict, optional

model_read_aux

may be used to specify additional computation methods of variables from models. Keys are obs variables, values are dictionaries with keys vars_required (list of required variables for computation of var and fun (method that takes list of read data objects and computes and returns var)

Type

dict, optional

read_opts_ungridded

dictionary that specifies reading constraints for ungridded reading (e.g. pyaerocom.io.ReadUngridded).

Type

dict, optional

obs_vert_type

Aerocom vertical code encoded in the model filenames (only AeroCom 3 and later). Specifies which model file should be read in case there are multiple options (e.g. surface level data can be read from a Surface.nc file as well as from a ModelLevel.nc file). If input is string (e.g. ‘Surface’), then the corresponding vertical type code is used for reading of all variables that are colocated (i.e. that are specified in obs_vars). Else (if input is dictionary, e.g. obs_vert_type=dict(od550aer=’Column’, ec550aer=’ModelLevel’)), information is extracted variable specific, for those who are defined in the dictionary, for all others, None is used.

Type

str or dict, optional

model_vert_type_alt

like obs_vert_type but is used in case of exception cases, i.e. where the obs_vert_type is not available in the models.

Type

str or dict, optional

var_outlier_ranges

dictionary specifying outlier ranges for individual variables. (e.g. dict(od550aer = [-0.05, 10], ang4487aer=[0,4]))

Type

dict, optional

model_ts_type_read

may be specified to explicitly define the reading frequency of the model data. Not to be confused with ts_type, which specifies the frequency used for colocation. Can be specified variable specific by providing a dictionary.

Type

str or dict, optional

obs_ts_type_read

may be specified to explicitly define the reading frequency of the observation data (so far, this does only apply to gridded obsdata such as satellites). For ungridded reading, the frequency may be specified via obs_id, where applicable (e.g. AeronetSunV3Lev2.daily). Not to be confused with ts_type, which specifies the frequency used for colocation. Can be specified variable specific in form of dictionary.

Type

str or dict, optional

flex_ts_type_gridded

boolean specifying whether reading frequency of gridded data is allowed to be flexible. This includes all gridded data, whether it is model or gridded observation (e.g. satellites). Defaults to True.

Type

bool

apply_time_resampling_constraints

if True, then time resampling constraints are applied as provided via min_num_obs or, if that one is unspecified, as defined in pyaerocom.const.OBS_MIN_NUM_RESAMPLE. If None, then pyaerocom.const.OBS_APPLY_TIME_RESAMPLE_CONSTRAINTS is used (which defaults to True!).

Type

bool, optional

min_num_obs

time resampling constraints applied if input arg apply_time_resampling_constraints is True - or None, in which case pyaerocom.const.OBS_MIN_NUM_RESAMPLE is used.

Type

dict or int, optional

resample_how

string specifying how data should be aggregated when resampling in time. Default is "mean". Can also be a nested dictionary, e.g. resample_how={'conco3': {'daily': {'hourly': 'max'}}} would use the maximum value to aggregate from hourly to daily for variable conco3, rather than the mean.

Type

str or dict

model_keep_outliers

if True, no outliers are removed from model data

Type

bool

obs_keep_outliers

if True, no outliers are removed from obs / reference data

Type

bool

obs_use_climatology

BETA: if True, the pyaerocom default climatology is computed from observation stations (so far only possible for ungridded / gridded colocation)

Type

bool

colocate_time

if True and if obs and model sampling frequency (e.g. daily) are higher than input colocation frequency (e.g. monthly), then the datasets are first colocated in time (e.g. on a daily basis), before the monthly averages are calculated. Default is False.

Type

bool

basedir_coldata

base directory for storing of colocated data files

Type

str

obs_name

if provided, this string will be used in colocated data filename to specify obsnetwork, else obs_id will be used

Type

str, optional

model_name

if provided, this string will be used in colocated data filename to specify model, else model_id will be used

Type

str, optional

save_coldata

if True, colocated data objects are saved as NetCDF file.

Type

bool
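To make the attributes above concrete, here is a hypothetical keyword set one might pass to ColocationSetup, shown as a plain dict so the snippet is independent of a pyaerocom installation (the model and obs IDs are illustrative examples, not guaranteed to exist in any given data installation):

```python
# Hypothetical setup for a monthly model/obs intercomparison of od550aer;
# ColocationSetup(**setup_kwargs) would consume these keywords.
setup_kwargs = dict(
    model_id="ECMWF_CAMS_REAN",          # example model ID (assumption)
    obs_id="AeronetSunV3Lev2.daily",     # example obs network ID (assumption)
    obs_vars="od550aer",
    ts_type="monthly",                   # colocation frequency
    start=2010,                          # stop inferred as end of 2010
    filter_name="WORLD-noMOUNTAINS",
    harmonise_units=True,
    remove_outliers=True,
    model_keep_outliers=True,            # only filter outliers in obs
)
```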

OBS_VERT_TYPES_ALT = {'Surface': 'ModelLevel'}

Dictionary specifying alternative vertical types that may be used to read model data. E.g. consider the variable is ec550aer, obs_vert_type=’Surface’ and obs_vert_type_alt=dict(Surface=’ModelLevel’). Now, if a model that is used for the analysis does not contain a data file for ec550aer at the surface (‘ec550aer*Surface.nc’), then, the colocation routine will look for ‘ec550aer*ModelLevel.nc’ and if this exists, it will load it and extract the surface level.

property UNGRIDDED_IDS

ID’s of all supported ungridded datasets

property basedir_logfiles

Base directory for storing logfiles

raise_exceptions

If True, the colocation routine will raise any Exception that may occur, else (False), expected exceptions will be ignored and logged.

reanalyse_existing

If True, existing colocated data files will be re-computed and overwritten

update([E, ]**F) → None. Update D from dict/iterable E and F.[source]

If E is present and has a .keys() method, then does: for k in E: D[k] = E[k] If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v In either case, this is followed by: for k in F: D[k] = F[k]

class pyaerocom.colocation_auto.Colocator(**kwargs)[source]

High level class for running colocation

Note

This object inherits from ColocationSetup and is also instantiated as such. For attributes, please see base class.

SUPPORTED_GRIDDED_READERS = {'ReadGridded': <class 'pyaerocom.io.readgridded.ReadGridded'>, 'ReadMscwCtm': <class 'pyaerocom.io.read_mscw_ctm.ReadMscwCtm'>}
static get_lowest_resolution(ts_type, *ts_types)[source]

Get the lowest resolution ts_type of input ts_types

instantiate_gridded_reader(what)[source]

Create reader for model or observational gridded data.

Parameters

what (str) – Type of reader. (“model” or “obs”)

Returns

Return type

Instance of reader class defined in self.SUPPORTED_GRIDDED_READERS

read_model_data(var_name, **kwargs)[source]

Read model variable data based on colocation setup

Parameters

var_name (str) – variable to be read

Returns

variable data

Return type

GriddedData

read_ungridded(vars_to_read=None)[source]

Helper to read UngriddedData

Note

Currently not used in main processing method _run_gridded_ungridded(). But should be.

Parameters

vars_to_read (str or list, optional) – variables that should be read from obs-network (obs_id)

Returns

loaded data object

Return type

UngriddedData

run(var_name=None, **opts)[source]

Perform colocation for current setup

The current setup comprises at least

Parameters

**opts – keyword args that may be specified to change the current setup before colocation

Low-level colocation methods

Methods and / or classes to perform colocation

pyaerocom.colocation.colocate_gridded_gridded(gridded_data, gridded_data_ref, ts_type=None, start=None, stop=None, filter_name=None, regrid_res_deg=None, remove_outliers=True, vert_scheme=None, harmonise_units=True, regrid_scheme='areaweighted', var_outlier_ranges=None, var_ref_outlier_ranges=None, update_baseyear_gridded=None, apply_time_resampling_constraints=None, min_num_obs=None, colocate_time=False, var_keep_outliers=True, var_ref_keep_outliers=False, resample_how=None, **kwargs)[source]

Colocate 2 gridded data objects

Parameters
  • gridded_data (GriddedData) – gridded data (e.g. model results)

  • gridded_data_ref (GriddedData) – reference dataset that is used to evaluate gridded_data (e.g. gridded observation data)

  • ts_type (str) – desired temporal resolution of colocated data (must be valid AeroCom ts_type str such as daily, monthly, yearly..)

  • start (str or datetime64 or similar, optional) – start time for colocation, if None, the start time of the input GriddedData object is used

  • stop (str or datetime64 or similar, optional) – stop time for colocation, if None, the stop time of the input GriddedData object is used

  • filter_name (str) – string specifying filter used (cf. pyaerocom.filter.Filter for details). If None, then it is set to ‘WORLD-wMOUNTAINS’, which corresponds to no filtering (world with mountains). Use WORLD-noMOUNTAINS to exclude mountain sites.

  • regrid_res_deg (int or dict, optional) – regrid resolution in degrees. If specified, the input gridded data objects will be regridded in lon / lat dimension to the input resolution (if input is integer, both lat and lon are regridded to that resolution, if input is dict, use keys lat_res_deg and lon_res_deg to specify regrid resolutions, respectively).

  • remove_outliers (bool) – if True, outliers are removed from model and obs data before colocation, else not.

  • vert_scheme (str) – string specifying scheme used to reduce the dimensionality in case input grid data contains vertical dimension. Example schemes are mean, surface, altitude, for details see GriddedData.to_time_series().

  • harmonise_units (bool) – if True, units are attempted to be harmonised (note: raises Exception if True and units cannot be harmonised).

  • regrid_scheme (str) – iris scheme used for regridding (defaults to area weighted regridding)

  • var_outlier_ranges (dict, optional) – dictionary specifying outlier ranges for dataset to be analysed (e.g. dict(od550aer = [-0.05, 10], ang4487aer=[0,4])). If None, then the pyaerocom default outlier ranges are used for the input variable. Defaults to None.

  • var_ref_outlier_ranges (dict, optional) – like var_outlier_ranges but for reference dataset.

  • update_baseyear_gridded (int, optional) – optional input that can be set in order to redefine the time dimension in the gridded data object to be analysed. E.g., if the data object is a climatology (one year of data) that has set the base year of the time dimension to a value other than the specified input start / stop time this may be used to update the time in order to make colocation possible.

  • apply_time_resampling_constraints (bool, optional) – if True, then time resampling constraints are applied as provided via min_num_obs or if that one is unspecified, as defined in pyaerocom.const.OBS_MIN_NUM_RESAMPLE. If None, than pyaerocom.const.OBS_APPLY_TIME_RESAMPLE_CONSTRAINTS is used (which defaults to True !!).

  • min_num_obs (int or dict, optional) – minimum number of observations for resampling of time

  • colocate_time (bool) – if True and if original time resolution of data is higher than desired time resolution (ts_type), then both datasets are colocated in time before resampling to lower resolution.

  • var_keep_outliers (bool) – if True, then no outliers will be removed from dataset to be analysed, even if remove_outliers is True. That is because for model evaluation often only outliers are supposed to be removed in the observations but not in the model.

  • var_ref_keep_outliers (bool) – if True, then no outliers will be removed from the reference dataset, even if remove_outliers is True.

  • resample_how (str or dict) – string specifying how data should be aggregated when resampling in time. Default is “mean”. Can also be a nested dictionary, e.g. resample_how={‘daily’: {‘hourly’ : ‘max’}} would use the maximum value to aggregate from hourly to daily, rather than the mean.

  • **kwargs – additional keyword args (not used here, but included such that factory class can handle different methods with different inputs)

Returns

instance of colocated data

Return type

ColocatedData
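The colocate_time option described above can be illustrated conceptually with plain pandas: when both datasets are at daily resolution but the target ts_type is monthly, the series are first aligned in time (gaps propagated to both) before the monthly means are computed. This is a sketch of the idea, not the pyaerocom implementation:

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2010-01-01", "2010-02-28", freq="D")
model = pd.Series(np.ones(len(idx)), index=idx)
obs = pd.Series(np.ones(len(idx)), index=idx)
obs.iloc[:20] = np.nan  # obs missing for most of January

# Colocate in time first: keep only time stamps valid in both datasets.
mask = model.notna() & obs.notna()
model_col = model.where(mask)
obs_col = obs.where(mask)

# Then resample both to the target monthly resolution.
model_monthly = model_col.resample("MS").mean()
obs_monthly = obs_col.resample("MS").mean()
```

Without the time colocation step, the January model mean would be based on all 31 days while the obs mean would use only 11, biasing the comparison.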

pyaerocom.colocation.colocate_gridded_ungridded(gridded_data, ungridded_data, ts_type=None, start=None, stop=None, filter_name=None, regrid_res_deg=None, remove_outliers=True, vert_scheme=None, harmonise_units=True, regrid_scheme='areaweighted', var_ref=None, var_outlier_ranges=None, var_ref_outlier_ranges=None, update_baseyear_gridded=None, ignore_station_names=None, apply_time_resampling_constraints=None, min_num_obs=None, colocate_time=False, var_keep_outliers=True, var_ref_keep_outliers=False, use_climatology_ref=False, resample_how=None, **kwargs)[source]

Colocate gridded with ungridded data (low level method)

For high-level colocation see pyaerocom.colocation_auto.Colocator and pyaerocom.colocation_auto.ColocationSetup

Note

Uses the variable that is contained in input GriddedData object (since these objects only contain a single variable). If this variable is not contained in observation data (or contained but using a different variable name) you may specify the obs variable to be used via input arg var_ref

Parameters
  • gridded_data (GriddedData) – gridded data object (e.g. model results).

  • ungridded_data (UngriddedData) – ungridded data object (e.g. observations).

  • ts_type (str) – desired temporal resolution of colocated data (must be a valid AeroCom ts_type string such as daily, monthly, yearly).

  • start (str or datetime64 or similar, optional) – start time for colocation, if None, the start time of the input GriddedData object is used.

  • stop (str or datetime64 or similar, optional) – stop time for colocation, if None, the stop time of the input GriddedData object is used

  • filter_name (str) – string specifying filter used (cf. pyaerocom.filter.Filter for details). If None, then it is set to ‘WORLD-wMOUNTAINS’, which corresponds to no filtering (world with mountains). Use WORLD-noMOUNTAINS to exclude mountain sites.

  • regrid_res_deg (int or dict, optional) – regrid resolution in degrees. If specified, the input gridded data object will be regridded in lon / lat dimension to the input resolution (if input is integer, both lat and lon are regridded to that resolution, if input is dict, use keys lat_res_deg and lon_res_deg to specify regrid resolutions, respectively).

  • remove_outliers (bool) – if True, outliers are removed from model and obs data before colocation, else not. Outlier ranges can be specified via input args var_outlier_ranges and var_ref_outlier_ranges.

  • vert_scheme (str) – string specifying scheme used to reduce the dimensionality in case input grid data contains vertical dimension. Example schemes are mean, surface, altitude, for details see GriddedData.to_time_series().

  • harmonise_units (bool) – if True, units are attempted to be harmonised (note: raises Exception if True and units cannot be harmonised).

  • var_ref (str, optional) – variable against which data in gridded_data is supposed to be compared. If None, then the same variable is used (i.e. gridded_data.var_name).

  • var_outlier_ranges (dict, optional) – dictionary specifying outlier ranges for dataset to be analysed (e.g. dict(od550aer = [-0.05, 10], ang4487aer=[0,4])). If None, then the pyaerocom default outlier ranges are used for the input variable. Defaults to None.

  • var_ref_outlier_ranges (dict, optional) – like var_outlier_ranges but for reference dataset.

  • update_baseyear_gridded (int, optional) – optional input that can be set in order to re-define the time dimension in the gridded data object to be analysed. E.g., if the data object is a climatology (one year of data) that has set the base year of the time dimension to a value other than the specified input start / stop time this may be used to update the time in order to make colocation possible.

  • ignore_station_names (str or list, optional) – station name or pattern or list of station names or patterns that should be ignored

  • apply_time_resampling_constraints (bool, optional) – if True, then time resampling constraints are applied as provided via min_num_obs or, if that is unspecified, as defined in pyaerocom.const.OBS_MIN_NUM_RESAMPLE. If None, then pyaerocom.const.OBS_APPLY_TIME_RESAMPLE_CONSTRAINTS is used (which defaults to True!).

  • min_num_obs (int or dict, optional) – minimum number of observations for resampling of time

  • colocate_time (bool) – if True and if original time resolution of data is higher than desired time resolution (ts_type), then both datasets are colocated in time before resampling to lower resolution.

  • var_keep_outliers (bool) – if True, then no outliers will be removed from dataset to be analysed, even if remove_outliers is True. That is because for model evaluation often only outliers are supposed to be removed in the observations but not in the model.

  • var_ref_keep_outliers (bool) – if True, then no outliers will be removed from the reference dataset, even if remove_outliers is True.

  • use_climatology_ref (bool) – if True, climatological timeseries are used from observations

  • resample_how (str or dict) – string specifying how data should be aggregated when resampling in time. Default is “mean”. Can also be a nested dictionary, e.g. resample_how={‘daily’: {‘hourly’ : ‘max’}} would use the maximum value to aggregate from hourly to daily, rather than the mean.

  • **kwargs – additional keyword args (passed to UngriddedData.to_station_data_all())

Returns

instance of colocated data

Return type

ColocatedData

Raises
  • VarNotAvailableError – if grid data variable is not available in ungridded data object

  • AttributeError – if instance of input UngriddedData object contains more than one dataset

  • TimeMatchError – if gridded data time range does not overlap with input time range

  • ColocationError – if none of the data points in input UngriddedData matches the input colocation constraints
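A hypothetical call sketch for colocate_gridded_ungridded, with the keyword arguments gathered in a dict (all values are illustrative; the model and obs data objects are assumed to be loaded elsewhere, e.g. via ReadGridded and ReadUngridded):

```python
# Illustrative keyword arguments for gridded-vs-ungridded colocation.
colocation_kwargs = dict(
    ts_type="monthly",                 # desired output frequency
    filter_name="WORLD-noMOUNTAINS",   # exclude mountain sites
    var_ref="od550aer",                # obs variable to compare against
    harmonise_units=True,
    apply_time_resampling_constraints=True,
    min_num_obs={"monthly": {"daily": 21}},  # hypothetical constraint
)
# coldata = colocate_gridded_ungridded(model_data, obs_data,
#                                      **colocation_kwargs)
```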

pyaerocom.colocation.correct_model_stp_coldata(coldata, p0=None, t0=273.15, inplace=False)[source]

Correct modeldata in colocated data object to STP conditions

Note

BETA version, quite inelegantly coded (at 8 pm, 3 weeks before the IPCC deadline), but should do the job for 2010 monthly colocated data files (AND NOTHING ELSE)!

Combining ungridded observations

pyaerocom.combine_vardata_ungridded.combine_vardata_ungridded(data_ids_and_vars, match_stats_how='closest', match_stats_tol_km=1, merge_how='combine', merge_eval_fun=None, var_name_out=None, data_id_out=None, var_unit_out=None, resample_how='mean', apply_time_resampling_constraints=False, min_num_obs=None, add_meta_keys=None)[source]

Combine and colocate different variables from UngriddedData

This method allows combining different variable timeseries from different ungridded observation records in multiple ways. The source data may all be included in a single instance of UngriddedData or in multiple instances; for details see the first input parameter :param:`data_ids_and_vars`. Merging can be done in flexible ways, e.g. by combining measurements of the same variable from 2 different datasets or by computing new variables based on 2 measured variables (e.g. concox=concno2+conco3). Doing this requires colocation of site locations and timestamps of both input observation records, which is done in this method.

It comprises 2 major steps:

  1. Compute the list of StationData objects for both input data combinations (data_id1 & var1; data_id2 & var2) and, based on these, find the coincident locations. Coincident sites can be matched either by station name or by their lat/lon locations; the method to use can be specified via input arg :param:`match_stats_how`.

  2. For all coincident locations, compute a new instance of StationData that merges the 2 timeseries in the way specified through input args :param:`merge_how` and :param:`merge_eval_fun`. If the 2 original timeseries come in different temporal resolutions, they are resampled to the lower of the two resolutions. Resampling constraints to be applied in that case can be provided via the respective input args for temporal resampling. The default is the pyaerocom default, which corresponds to a ~25% coverage constraint (as of 22.10.2020) for major resolution steps, such as daily->monthly.

Note

Currently, only 2 variables can be combined to a new one (e.g. concox=conco3+concno2).

Note

Be aware of unit conversion issues that may arise if your input data is not in AeroCom default units. For details see below.

Parameters
  • data_ids_and_vars (list) – list of 3 element tuples, each containing, in the following order 1. instance of UngriddedData; 2. dataset ID (remember that UngriddedData can contain more than one dataset); and 3. variable name. Note that currently only 2 of such tuples can be combined.

  • match_stats_how (str, optional) – String specifying how site locations are supposed to be matched. The default is ‘closest’. Supported are ‘closest’ and ‘station_name’.

  • match_stats_tol_km (float, optional) – radius tolerance in km for matching site locations when using ‘closest’ for site location matching. The default is 1.

  • merge_how (str, optional) – String specifying how to merge variable data at site locations. The default is ‘combine’. If both input variables are the same and combine is used, then the first input variable will be preferred over the other. Supported are ‘combine’, ‘mean’ and ‘eval’, for the latter, merge_eval_fun needs to be specified explicitly.

  • merge_eval_fun (str, optional) – String specifying how var1 and var2 data should be evaluated (only relevant if merge_how=’eval’ is used). The default is None. E.g., to retrieve the column aerosol fine mode fraction at 550nm (fmf550aer) through AERONET via the SDA product, the first input (data_id1, var1) would be (‘AeronetSDA’, ‘od550aer’) and the second input (data_id2, var2) would be (‘AeronetSDA’, ‘od550lt1aer’); merge_eval_fun could then be ‘fmf550aer=(AeronetSDA;od550lt1aer/AeronetSDA;od550aer)*100’. Note that the input variables will be converted to their AeroCom default units, so the specification of merge_eval_fun should take that into account in case the originally read obsdata is not in default units.

  • var_name_out (str, optional) – Name of output variable. Default is None, in which case it is attempted to be inferred.

  • data_id_out (str, optional) – data_id set in output StationData objects. Default is None, in which case it is inferred from input data_ids (e.g. in the above example of merge_eval_fun, the output data_id would be ‘AeronetSDA’ since both input IDs are the same).

  • var_unit_out (str) – unit of output variable.

  • resample_how (str, optional) – String specifying how temporal resampling should be done. The default is ‘mean’.

  • apply_time_resampling_constraints (bool, optional) – Boolean specifying whether constraints should be applied for temporal resampling (e.g. at least X daily values to get a monthly mean). The default is False.

  • min_num_obs (int or dict, optional) – Minimum number of observations for temporal resampling. The default is None in which case pyaerocom default is used, which is available via pyaerocom.const.OBS_MIN_NUM_RESAMPLE.

  • add_meta_keys (list, optional) – additional metadata keys to be added to output StationData objects from input data. If None, then only the pyaerocom default keys are added (see StationData.STANDARD_META_KEYS).

Raises
  • ValueError – If input for merge_how or match_stats_how is invalid.

  • NotImplementedError – If one of the input UngriddedData objects contains more than one dataset.

Returns

merged_stats – list of StationData objects containing the colocated and combined variable data.

Return type

list
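The fmf550aer example from the merge_eval_fun description can be sketched as a call to combine_vardata_ungridded. This is a hypothetical sketch: obs stands in for a preloaded UngriddedData instance containing the AeronetSDA dataset.

```python
# Placeholder for a preloaded UngriddedData instance (assumption: it contains
# the 'AeronetSDA' dataset with both variables below).
obs = ...

# Each tuple: (UngriddedData instance, dataset ID, variable name).
data_ids_and_vars = [
    (obs, "AeronetSDA", "od550aer"),
    (obs, "AeronetSDA", "od550lt1aer"),
]

# Evaluation string as in the docstring example above.
merge_eval_fun = "fmf550aer=(AeronetSDA;od550lt1aer/AeronetSDA;od550aer)*100"

# stats = combine_vardata_ungridded(data_ids_and_vars, merge_how="eval",
#                                   merge_eval_fun=merge_eval_fun,
#                                   var_unit_out="%")
```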

Reading of gridded data

Gridded data denotes any dataset that can be represented and stored on a regular grid within a certain domain (e.g. lat, lon, time), for instance model output or level 3 satellite data, stored for example as NetCDF files. In pyaerocom, the underlying data object is GriddedData and pyaerocom supports reading of such data for different file naming conventions.

Data stored using AeroCom conventions

class pyaerocom.io.readgridded.ReadGridded(data_id=None, data_dir=None, file_convention='aerocom3', init=True)[source]

Class for reading gridded files based on network or model ID

Note

The reading only works if files are stored using a valid file naming convention. See package data file file_conventions.ini for valid keys. You may define your own file convention in this file, if you wish.

data_id

string ID for model or obsdata network (see e.g. Aerocom interface map plots lower left corner)

Type

str

data

imported data object

Type

GriddedData

data_dir

directory containing result files for this model

Type

str

start

start time for data import

Type

pandas.Timestamp

stop

stop time for data import

Type

pandas.Timestamp

file_convention

class specifying details of the file naming convention for the model

Type

FileConventionRead

files

list containing all filenames that were found. Filled, e.g. in ReadGridded.get_model_files()

Type

list

from_files

List of all netCDF files that were used to concatenate the current data cube (i.e. that can be based on certain matching settings such as var_name or time interval).

Type

list

ts_types

list of all sampling frequencies (e.g. hourly, daily, monthly) that were inferred from filenames (based on Aerocom file naming convention) of all files that were found

Type

list

vars

list containing all variable names (e.g. od550aer) that were inferred from filenames based on Aerocom model file naming convention

Type

list

years

list of available years as inferred from the filenames in the data directory.

Type

list

Parameters
  • data_id (str) – string ID of model (e.g. “AATSR_SU_v4.3”,”CAM5.3-Oslo_CTRL2016”)

  • data_dir (str, optional) – directory containing data files. If provided, only this directory is considered for data files, else the input data_id is used to search for the corresponding directory.

  • file_convention (str) – string ID specifying the file convention of this model (cf. installation file file_conventions.ini)

  • init (bool) – if True, the model directory is searched (search_data_dir()) on instantiation and if it is found, all valid files for this model are searched using search_all_files().
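ReadGridded infers variables, years, and frequencies from filenames. The parsing idea behind the aerocom3 convention can be sketched as below; the filename and the field order are illustrative (the actual convention is defined in file_conventions.ini and handled by FileConventionRead):

```python
# Illustrative aerocom3-style filename; the underscore-delimited fields are
# assumed to be: convention, model, experiment, variable, vertical code,
# year, frequency.
fname = "aerocom3_CAM5.3-Oslo_CTRL2016_od550aer_Column_2010_monthly.nc"
conv, model, experiment, var_name, vert_code, year, ts_type = (
    fname.removesuffix(".nc").split("_")
)
```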

AUX_ADD_ARGS = {'concprcpoxn': {'prlim': 0.0001, 'prlim_set_under': nan, 'prlim_unit': 'm d-1', 'ts_type': 'daily'}}

Additional arguments passed to computation methods for auxiliary data. This is optional and defined per variable, as in AUX_FUNS.

AUX_ALT_VARS = {'ac550dryaer': ['ac550aer'], 'od440aer': ['od443aer'], 'od870aer': ['od865aer']}
AUX_FUNS = {'ang4487aer': <function compute_angstrom_coeff_cubes>, 'angabs4487aer': <function compute_angstrom_coeff_cubes>, 'conc*': <function multiply_cubes>, 'concno3': <function add_cubes>, 'concox': <function add_cubes>, 'concprcpoxn': <function compute_concprcp_from_pr_and_wetdep>, 'dryoa': <function add_cubes>, 'fmf550aer': <function divide_cubes>, 'od550gt1aer': <function subtract_cubes>, 'sc550dryaer': <function subtract_cubes>, 'vmrox': <function add_cubes>, 'wetoa': <function add_cubes>}
AUX_REQUIRES = {'ang4487aer': ('od440aer', 'od870aer'), 'angabs4487aer': ('abs440aer', 'abs870aer'), 'conc*': ('mmr*', 'rho'), 'concno3': ('concno3c', 'concno3f'), 'concox': ('concno2', 'conco3'), 'concprcpoxn': ('wetoxn', 'pr'), 'dryoa': ('drypoa', 'drysoa'), 'fmf550aer': ('od550lt1aer', 'od550aer'), 'od550gt1aer': ('od550aer', 'od550lt1aer'), 'sc550dryaer': ('ec550dryaer', 'ac550dryaer'), 'vmrox': ('vmrno2', 'vmro3'), 'wetoa': ('wetpoa', 'wetsoa')}
CONSTRAINT_OPERATORS = {'<': <ufunc 'less'>, '<=': <ufunc 'less_equal'>, '==': <ufunc 'equal'>, '>': <ufunc 'greater'>, '>=': <ufunc 'greater_equal'>}
property TS_TYPES

List with valid filename encodings specifying temporal resolution

Update 7.11.2019: no longer in use due to improved handling of all possible frequencies, now using the TsType class.

VERT_ALT = {'Surface': 'ModelLevel'}
add_aux_compute(var_name, vars_required, fun)[source]

Register new variable to be computed

Parameters
  • var_name (str) – variable name to be computed

  • vars_required (list) – list of variables to read, that are required to compute var_name

  • fun (callable) – function that takes a list of GriddedData objects as input and that are read using variable names specified by vars_required.
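Registering a computed variable can be sketched as follows, mirroring the concox entry in AUX_REQUIRES / AUX_FUNS. The function below is a simplified stand-in for pyaerocom's add_cubes helper:

```python
# Simplified stand-in for the add_cubes computation function: each argument
# would be a GriddedData object, and '+' would delegate to the underlying
# iris cubes; plain numbers are used here for illustration.
def add_cubes_sketch(no2, o3):
    return no2 + o3

# Hypothetical registration on a ReadGridded instance named `reader`:
# reader.add_aux_compute("concox", vars_required=["concno2", "conco3"],
#                        fun=add_cubes_sketch)
```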

apply_read_constraint(data, constraint, **kwargs)[source]

Filter a GriddedData object by the value of another variable

Note

BETA version, that was hacked down in a rush to be able to apply AOD>0.1 threshold when reading AE.

Parameters
  • data (GriddedData) – data object to which constraint is applied

  • constraint (dict) – dictionary defining the read constraint (see check_constraint_valid() for minimum requirements). If constraint contains the key var_name (not mandatory), then the corresponding variable is attempted to be read and is used to evaluate the constraint; the resulting boolean mask is then applied to the input data. Wherever this mask is True (i.e. the constraint is met), the current value in the input data is replaced with numpy.ma.masked or, if specified, with the entry new_val in the input constraint dict.

  • **kwargs – reading arguments in case additional variable data needs to be loaded to determine the filter mask (i.e. if var_name is specified in the input constraint). Passed to read_var().

Raises

ValueError – If constraint is invalid (cf. check_constraint_valid() for details).

Returns

modified data object (all grid points that met the constraint are replaced with either numpy.ma.masked or with a value that can be specified via key new_val in the input constraint).

Return type

GriddedData
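A read constraint dict, as accepted by apply_read_constraint, can be sketched as below. The values are illustrative, inspired by the AOD>0.1 use case mentioned in the note above; since masked points are those where the constraint is met, retaining AOD > 0.1 means masking where od550aer <= 0.1 (an assumption about the intended direction of the filter):

```python
# Hypothetical read constraint: mask all grid points where od550aer <= 0.1,
# i.e. keep only points with AOD > 0.1. "operator" must be a key of
# CONSTRAINT_OPERATORS; "var_name" is optional (see docstring above).
constraint = {
    "var_name": "od550aer",
    "operator": "<=",
    "filter_val": 0.1,
}
# filtered = reader.apply_read_constraint(data, constraint)
```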

browser

This object can be used to

check_compute_var(var_name)[source]

Check if variable name belongs to family that can be computed

For instance, if input var_name is concdust this method will check AUX_REQUIRES to see if there is a variable family pattern (conc*) defined that specifies how to compute these variables. If a match is found, the required variables and computation method is added via add_aux_compute().

Parameters

var_name (str) – variable name to be checked

Returns

True if match is found, else False

Return type

bool
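The variable-family matching that check_compute_var performs can be illustrated with shell-style pattern matching (the actual implementation may differ): the AUX_REQUIRES pattern conc* matches a concrete name such as concdust.

```python
from fnmatch import fnmatch

# Illustration only: does the family pattern "conc*" cover "concdust"?
matches = fnmatch("concdust", "conc*")
```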

check_constraint_valid(constraint)[source]

Check if reading constraint is valid

Parameters

constraint (dict) – reading constraint. Requires at least entries for the following keys: operator (str, for valid operators see CONSTRAINT_OPERATORS) and filter_val (float, value against which data is evaluated with respect to the operator).

Raises

ValueError – If constraint is invalid

Returns

Return type

None.

compute_var(var_name, start=None, stop=None, ts_type=None, experiment=None, vert_which=None, flex_ts_type=True, prefer_longer=False, vars_to_read=None, aux_fun=None, try_convert_units=True, aux_add_args=None, **kwargs)[source]

Compute auxiliary variable

Like read_var() but for auxiliary variables (cf. AUX_REQUIRES)

Parameters
  • var_name (str) – variable that is supposed to be read

  • start (Timestamp or str, optional) – start time of data import (if valid input, then the current start will be overwritten)

  • stop (Timestamp or str, optional) – stop time of data import

  • ts_type (str) – string specifying temporal resolution (choose from hourly, 3hourly, daily, monthly). If None, the most highly prioritised of the available resolutions is used

  • experiment (str) – name of experiment (only relevant if this dataset contains more than one experiment)

  • vert_which (str) – valid AeroCom vertical info string encoded in name (e.g. Column, ModelLevel)

  • flex_ts_type (bool) – if True and if applicable, then another ts_type is used in case the input ts_type is not available for this variable

  • prefer_longer (bool) – if True and applicable, the ts_type resulting in the longer time coverage will be preferred over other possible frequencies that match the query.

  • try_convert_units (bool) – if True, an attempt is made to convert the units of the GriddedData objects to the AeroCom default. This applies both to the GriddedData objects being read for the computation and to the variable computed from the former objects. This is, for instance, useful when computing concentration in precipitation from wet deposition and precipitation amount.

  • **kwargs – additional keyword args passed to _load_var()

Returns

loaded data object

Return type

GriddedData

concatenate_cubes(cubes)[source]

Concatenate list of cubes into one cube

Parameters

cubes (CubeList) – list of individual cubes

Returns

Single cube that contains concatenated cubes from input list

Return type

Cube

Raises

iris.exceptions.ConcatenateError – if concatenation of all cubes failed

property data_dir

Directory where data files are located

property data_id

Data ID of dataset

property experiments

List of all experiments that are available in this dataset

property file_type

File type of data files

property files

List of data files

filter_files(var_name=None, ts_type=None, start=None, stop=None, experiment=None, vert_which=None, is_at_stations=False, df=None)[source]

Filter file database

Parameters
  • var_name (str) – variable that is supposed to be read

  • ts_type (str) – string specifying temporal resolution (choose from “hourly”, “3hourly”, “daily”, “monthly”). If None, the most highly prioritised of the available resolutions is used

  • start (Timestamp or str, optional) – start time of data import

  • stop (Timestamp or str, optional) – stop time of data import

  • experiment (str) – name of experiment (only relevant if this dataset contains more than one experiment)

  • vert_which (str or dict, optional) – valid AeroCom vertical info string encoded in name (e.g. Column, ModelLevel) or dictionary containing var_name as key and vertical coded string as value, accordingly

  • flex_ts_type (bool) – if True and if applicable, then another ts_type is used in case the input ts_type is not available for this variable

  • prefer_longer (bool) – if True and applicable, the ts_type resulting in the longer time coverage will be preferred over other possible frequencies that match the query.

filter_query(var_name, ts_type=None, start=None, stop=None, experiment=None, vert_which=None, is_at_stations=False, flex_ts_type=True, prefer_longer=False)[source]

Filter files for read query based on input specs

Returns

dataframe containing filtered dataset

Return type

DataFrame

find_common_ts_type(vars_to_read, start=None, stop=None, ts_type=None, experiment=None, vert_which=None, flex_ts_type=True)[source]

Find common ts_type for list of variables to be read

Parameters
  • vars_to_read (list) – list of variables that is supposed to be read

  • start (Timestamp or str, optional) – start time of data import (if valid input, then the current start will be overwritten)

  • stop (Timestamp or str, optional) – stop time of data import (if valid input, then the current start will be overwritten)

  • ts_type (str) – string specifying temporal resolution (choose from hourly, 3hourly, daily, monthly). If None, the most highly prioritised of the available resolutions is used

  • experiment (str) – name of experiment (only relevant if this dataset contains more than one experiment)

  • vert_which (str) – valid AeroCom vertical info string encoded in name (e.g. Column, ModelLevel)

  • flex_ts_type (bool) – if True and if applicable, then another ts_type is used in case the input ts_type is not available for this variable

Returns

common ts_type for input variable

Return type

str

Raises

DataCoverageError – if no match can be found

get_files(var_name, ts_type=None, start=None, stop=None, experiment=None, vert_which=None, is_at_stations=False, flex_ts_type=True, prefer_longer=False)[source]

Get data files based on input specs

get_var_info_from_files()[source]

Creates a dictionary that contains variable-specific meta information

Returns

dictionary where keys are available variables and values (for each variable) contain information about available ts_types, years, etc.

Return type

OrderedDict

has_var(var_name)[source]

Check if variable is available

Parameters

var_name (str) – variable to be checked

Returns

Return type

bool

property name

Deprecated name of attribute data_id

read(vars_to_retrieve=None, start=None, stop=None, ts_type=None, experiment=None, vert_which=None, flex_ts_type=True, prefer_longer=False, require_all_vars_avail=False, **kwargs)[source]

Read all variables that could be found

Reads all variables that are available (i.e. in vars_filename)

Parameters
  • vars_to_retrieve (list or str, optional) – variables that are supposed to be read. If None, all variables that are available are read.

  • start (Timestamp or str, optional) – start time of data import

  • stop (Timestamp or str, optional) – stop time of data import

  • ts_type (str, optional) – string specifying temporal resolution (choose from “hourly”, “3hourly”, “daily”, “monthly”). If None, the most highly prioritised of the available resolutions is used

  • experiment (str) – name of experiment (only relevant if this dataset contains more than one experiment)

  • vert_which (str or dict, optional) – valid AeroCom vertical info string encoded in name (e.g. Column, ModelLevel) or dictionary containing var_name as key and vertical coded string as value, accordingly

  • flex_ts_type (bool) – if True and if applicable, then another ts_type is used in case the input ts_type is not available for this variable

  • prefer_longer (bool) – if True and applicable, the ts_type resulting in the longer time coverage will be preferred over other possible frequencies that match the query.

  • require_all_vars_avail (bool) – if True, it is strictly required that all input variables are available.

  • **kwargs – optional and support for deprecated input args

Returns

loaded data objects (type GriddedData)

Return type

tuple

Raises
  • IOError – if input variable names is not list or string

  • VarNotAvailableError

    1. if require_all_vars_avail=True and one or more of the desired variables is not available in this class

    2. if require_all_vars_avail=True and if none of the input variables is available in this object

read_var(var_name, start=None, stop=None, ts_type=None, experiment=None, vert_which=None, flex_ts_type=True, prefer_longer=False, aux_vars=None, aux_fun=None, constraints=None, **kwargs)[source]

Read model data for a specific variable

This method searches all valid files for a given variable and for a provided temporal resolution (e.g. daily, monthly), optionally within a certain time window, that may be specified on class instantiation or using the corresponding input parameters provided in this method.

The individual NetCDF files for a given temporal period are loaded as instances of the iris.Cube object and appended to an instance of the iris.cube.CubeList object. The latter is then used to concatenate the individual cubes in time into a single instance of the pyaerocom.GriddedData class. In order to ensure that this works, several things need to be ensured, which are listed in the following and which may be controlled within the global settings for NetCDF import using the attribute GRID_IO (instance of OnLoad) in the default instance of the pyaerocom.config.Config object accessible via pyaerocom.const.

Parameters
  • var_name (str) – variable that is supposed to be read

  • start (Timestamp or str, optional) – start time of data import

  • stop (Timestamp or str, optional) – stop time of data import

  • ts_type (str) – string specifying temporal resolution (choose from “hourly”, “3hourly”, “daily”, “monthly”). If None, the most highly prioritised of the available resolutions is used

  • experiment (str) – name of experiment (only relevant if this dataset contains more than one experiment)

  • vert_which (str or dict, optional) – valid AeroCom vertical info string encoded in name (e.g. Column, ModelLevel) or dictionary containing var_name as key and vertical coded string as value, accordingly

  • flex_ts_type (bool) – if True and if applicable, then another ts_type is used in case the input ts_type is not available for this variable

  • prefer_longer (bool) – if True and applicable, the ts_type resulting in the longer time coverage will be preferred over other possible frequencies that match the query.

  • aux_vars (list) – only relevant if var_name is not available for reading but needs to be computed: list of variables that are required to compute var_name

  • aux_fun (callable) – only relevant if var_name is not available for reading but needs to be computed: custom method for computation (cf. add_aux_compute() for details)

  • constraints (list, optional) – list of reading constraints (dict type). See check_constraint_valid() and apply_read_constraint() for details related to format of the individual constraints.

  • **kwargs – additional keyword args parsed to _load_var()

Returns

loaded data object

Return type

GriddedData

Raises
  • AttributeError – if none of the ts_types identified from file names is valid

  • VarNotAvailableError – if specified ts_type is not supported
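A typical read_var call can be sketched by gathering the keyword arguments in a dict (values are illustrative; a ReadGridded instance named reader, pointing at a locally available dataset, is assumed):

```python
# Illustrative keyword arguments for reading one variable; flex_ts_type allows
# falling back to another frequency if "monthly" is not available.
read_kwargs = dict(
    ts_type="monthly",
    start=2010,
    stop=2012,
    flex_ts_type=True,
    prefer_longer=False,
)
# data = reader.read_var("od550aer", **read_kwargs)
```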

property registered_var_patterns

List of string patterns for computation of variables

The information is extracted from AUX_REQUIRES

Returns

list of variable patterns

Return type

list

reinit()[source]

Reinit everything that is loaded specific to data_dir

search_all_files(update_file_convention=True)[source]

Search all valid model files for this model

This method browses the data directory and finds all valid files, that is, files that are named according to one of the aerocom file naming conventions. The file list is stored in files.

Note

It is presumed that the naming conventions of files in the data directory are not mixed, but all correspond to one of the conventions defined in file_conventions.ini.

Parameters

update_file_convention (bool) – if True, the first file in data_dir is used to identify the file naming convention (cf. FileConventionRead)

Raises

DataCoverageError – if no valid files could be found

search_data_dir()[source]

Search data directory based on model ID

Wrapper for method search_data_dir_aerocom()

Returns

data directory

Return type

str

Raises

IOError – if directory cannot be found

property start

First available year in the dataset (inferred from filenames)

Note

This is not variable or ts_type specific, so it is not necessarily given that data from this year is available for all variables in vars or all frequencies listed in ts_types

property stop

Last available year in the dataset (inferred from filenames)

Note

This is not variable or ts_type specific, so it is not necessarily given that data from this year is available for all variables in vars or all frequencies listed in ts_types

property ts_types

Available frequencies

update(**kwargs)[source]

Update one or more valid parameters

Parameters

**kwargs – keyword args that will be used to update (overwrite) valid class attributes such as data, data_dir, files

property vars
property vars_filename
property vars_provided

Variables provided by this dataset

property years

Wrapper for years_available

property years_avail

Years available in dataset

class pyaerocom.io.readgridded.ReadGriddedMulti(data_ids)[source]

Class for import of AEROCOM model data from multiple models

This class provides an interface to import model results from an arbitrary number of models for a specific time interval (which can, but need not, be specified). Largely based on ReadGridded.

Note

The reading only works if files are stored using a valid file naming convention. See package data file file_conventions.ini for valid keys. You may define your own file convention in this file, if you wish.

data_ids

list containing string IDs of all models that should be imported

Type

list

results

dictionary containing ReadGridded instances for each name

Type

dict

Examples

>>> import pyaerocom
>>> models = ["AATSR_SU_v4.3", "CAM5.3-Oslo_CTRL2016"]
>>> read = pyaerocom.io.ReadGriddedMulti(models)
>>> print(read.data_ids)
['AATSR_SU_v4.3', 'CAM5.3-Oslo_CTRL2016']
>>> read_cam = read['CAM5.3-Oslo_CTRL2016']
>>> assert type(read_cam) == pyaerocom.io.ReadGridded
>>> for var in read_cam.vars: print(var)
abs550aer
deltaz3d
humidity3d
od440aer
od550aer
od550aer3d
od550aerh2o
od550dryaer
od550dust
od550lt1aer
od870aer
read(vars_to_retrieve, start=None, stop=None, ts_type=None, **kwargs)[source]

High level method to import data for multiple variables and models

Parameters
  • vars_to_retrieve (str or list) – string IDs of all variables that are supposed to be imported

  • start (Timestamp or str, optional) – start time of data import (if valid input, then the current start will be overwritten)

  • stop (Timestamp or str, optional) – stop time of data import (if valid input, then the current start will be overwritten)

  • ts_type (str) – string specifying temporal resolution (choose from “hourly”, “3hourly”, “daily”, “monthly”). If None, the highest-priority available resolution is used

  • flex_ts_type (bool) – if True and if applicable, then another ts_type is used in case the input ts_type is not available for this variable

Returns

loaded objects, keys are variable names, values are instances of GriddedData.

Return type

dict

Examples

>>> read = ReadGriddedMulti(data_ids=["ECMWF_CAMS_REAN",
...                                   "ECMWF_OSUITE"])
>>> read.read(["od550aer", "od550so4", "od550bc"])
pyaerocom.io.readgridded.check_pr_units(gridded)[source]
pyaerocom.io.readgridded.check_wdep_units(gridded)[source]
pyaerocom.io.readgridded.compute_concprcp_from_pr_and_wetdep(wdep, pr, ts_type=None, prlim=None, prlim_unit=None, prlim_set_under=None)[source]

Data stored using EMEP conventions

Created on Mon Feb 10 13:20:04 2020

@author: eirikg

class pyaerocom.io.read_mscw_ctm.ReadEMEP(*args, **kwargs)[source]

Old name of ReadMscwCtm.

class pyaerocom.io.read_mscw_ctm.ReadMscwCtm(filepath=None, data_id=None, data_dir=None)[source]

Class for reading model output from the EMEP MSC-W chemical transport model.

Parameters
  • data_id (str) – string ID of model (e.g. “AATSR_SU_v4.3”,”CAM5.3-Oslo_CTRL2016”)

  • filepath (str) – Path to netcdf file.

  • data_dir (str, optional) – Base directory of EMEP data, containing one or more netcdf files

filepath

Path to netcdf file

Type

str

data_id

ID of model

Type

str

data_dir

Base directory of EMEP data, containing one or more netcdf files

Type

str

vars_provided

Variables that are available to read in filepath or data_dir

Type

str

ts_types

Available temporal resolution in filepath or data_dir

Type

str

years_avail

Years available for reading

Type

str

AUX_FUNS = {'depso4': <function add_cubes>, 'sconcbc': <function add_cubes>, 'sconcno3': <function add_cubes>, 'sconcoa': <function add_cubes>, 'sconctno3': <function add_cubes>}
AUX_REQUIRES = {'depso4': ['dryso4', 'wetso4'], 'sconcbc': ['sconcbcf', 'sconcbcc'], 'sconcno3': ['sconcno3c', 'sconcno3f'], 'sconcoa': ['sconcoac', 'sconcoaf']}
property data_dir

Directory containing netcdf files

property data_id

Data ID of dataset

property filepath

Path to netcdf file

has_var(var_name)[source]

Check if variable is available

Parameters

var_name (str) – variable to be checked

Returns

Return type

bool

static preprocess_units(units, prefix=None)[source]
read_var(var_name, ts_type=None, **kwargs)[source]

Load data for given variable.

Parameters
  • var_name (str) – Variable to be read

  • ts_type (str) – Temporal resolution of data to read. (“hourly”, “daily”, “monthly” , “yearly”)

Returns

Return type

GriddedData

property ts_types
property vars_provided

Variables provided by this dataset

property years_avail

Years available in dataset

pyaerocom.io.read_mscw_ctm.ts_type_from_filename(filename)[source]

Reading of ungridded data

In contrast to gridded data, ungridded data represents data that is irregularly sampled in space and time, for instance, observations at different locations around the globe. Such data is represented in pyaerocom by UngriddedData, which is essentially a point-cloud dataset. Reading of UngriddedData is typically specific to each observational data record, since these records come in various data formats and use various metadata conventions that need to be harmonised during data import.

The following flowchart illustrates the architecture of ungridded reading in pyaerocom. Below is information about the individual reading classes for each dataset (blue in the flowchart), the abstract template base classes that the reading classes are based on (dark green), and the factory class ReadUngridded (orange), which has all individual reading classes registered. The data classes returned by the reading classes are indicated in light green.

_images/pyaerocom_ungridded_io_flowchart.png

ReadUngridded factory class

Factory class that has all reading class for the individual datasets registered.

class pyaerocom.io.readungridded.ReadUngridded(datasets_to_read=None, vars_to_retrieve=None, ignore_cache=False, data_dir=None)[source]

Factory class for reading of ungridded data based on obsnetwork ID

This class also features reading functionality that goes beyond reading of individual observation datasets, including reading of multiple datasets and post-computation of new variables based on datasets that can be read.

property DATASET_PATH

Data directory of dataset to read

Raises exception if more than one dataset to read is specified

DONOTCACHE_NAME = 'DONOTCACHE'
property SUPPORTED_DATASETS

Returns list of strings containing all supported dataset names

SUPPORTED_READERS = [<class 'pyaerocom.io.read_aeronet_invv3.ReadAeronetInvV3'>, <class 'pyaerocom.io.read_aeronet_invv2.ReadAeronetInvV2'>, <class 'pyaerocom.io.read_aeronet_sdav2.ReadAeronetSdaV2'>, <class 'pyaerocom.io.read_aeronet_sdav3.ReadAeronetSdaV3'>, <class 'pyaerocom.io.read_aeronet_sunv2.ReadAeronetSunV2'>, <class 'pyaerocom.io.read_aeronet_sunv3.ReadAeronetSunV3'>, <class 'pyaerocom.io.read_earlinet.ReadEarlinet'>, <class 'pyaerocom.io.read_ebas.ReadEbas'>, <class 'pyaerocom.io.read_gaw.ReadGAW'>, <class 'pyaerocom.io.read_aasetal.ReadAasEtal'>, <class 'pyaerocom.io.read_ghost.ReadGhost'>]
property data_dir

Data directory(ies) for dataset(s) to read

dataset_provides_variables(dataset_to_read=None)[source]

List of variables provided by a certain dataset

property dataset_to_read

Helper that returns the dataset to be read

Note

Only works if a single dataset is assigned in datasets_to_read, else raises a ValueError.

Raises

ValueError – if datasets_to_read contains no or more than one entry.

property datasets_to_read

List of datasets supposed to be read

find_read_class(dataset_to_read)[source]

Find reading class for dataset name

Loops over all reading classes available in SUPPORTED_READERS and finds the first one that matches the input dataset name, by checking the attribute SUPPORTED_DATASETS in each respective reading class.

Parameters

dataset_to_read (str) – Name of dataset

Returns

instance of reading class (needs to be implementation of base class ReadUngriddedBase)

Return type

ReadUngriddedBase

Raises
  • NetworkNotSupported – if network is not supported by pyaerocom

  • NetworkNotImplemented – if network is supported but no reading routine is implemented yet

get_reader(dataset_to_read=None)[source]

Helper method that returns loaded reader class

Parameters

dataset_to_read (str) – Name of dataset

Returns

instance of reading class (needs to be implementation of base class ReadUngriddedBase)

Return type

ReadUngriddedBase

Raises
  • NetworkNotSupported – if network is not supported by pyaerocom

  • NetworkNotImplemented – if network is supported but no reading routine is implemented yet

get_vars_supported(obs_id, vars_desired)[source]

Filter input list of variables by supported ones for a certain data ID

Parameters
  • obs_id (str) – ID of observation network

  • vars_desired (list) – List of variables that are desired

Returns

list of variables that can be read through the input network

Return type

list
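
A minimal sketch of this filtering step. Note the simplification: the real method takes an observation network ID and looks up the provided variables itself, whereas here the provided-variable list (illustrative names) is passed in directly.

```python
def get_vars_supported(vars_provided, vars_desired):
    # Keep only the desired variables that the network actually provides,
    # preserving the order of the desired list
    return [var for var in vars_desired if var in vars_provided]

supported = get_vars_supported(["od550aer", "ang4487aer"],
                               ["od550aer", "concpm10"])
```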

property ignore_cache

Boolean specifying whether caching is active or not

property post_compute

Information about datasets that can be computed in post

read(datasets_to_read=None, vars_to_retrieve=None, only_cached=False, filter_post=None, **kwargs)[source]

Read observations

Iter over all datasets in datasets_to_read, call read_dataset() and append to data object

Parameters
  • datasets_to_read (str or list) – data ID or list of all datasets to be imported

  • vars_to_retrieve (str or list) – variable or list of variables to be imported

  • only_cached (bool) – if True, then nothing is reloaded but only data is loaded that is available as cached objects (not recommended to use but may be used if working offline without connection to database)

  • filter_post (dict, optional) – filters applied to UngriddedData object AFTER it is read into memory, via UngriddedData.apply_filters(). This option was introduced in pyaerocom version 0.10.0 and should be used preferably over **kwargs. There is a certain flexibility with respect to how these filters can be defined, for instance, sub dicts for each dataset_to_read. The most common way would be to provide directly the input needed for UngriddedData.apply_filters. If you want to read multiple variables from one or more datasets, and if you want to apply variable specific filters, it is recommended to read the data individually for each variable and corresponding set of filters and then merge the individual filtered UngriddedData objects afterwards, e.g. using data_var1 & data_var2.

  • **kwargs – Additional input options for reading of data, which are applied WHILE the data is read. If any such additional options are provided that are applied during the reading, then automatic caching of the output UngriddedData object will be deactivated. Thus, it is recommended to handle data filtering via filter_post argument whenever possible, which will result in better performance as the unconstrained original data is read in and cached, and then the filtering is applied.

Example

>>> import pyaerocom.io.readungridded as pio
>>> from pyaerocom import const
>>> obj = pio.ReadUngridded(datasets_to_read=const.AERONET_SUN_V3L15_AOD_ALL_POINTS_NAME)
>>> obj.read()
>>> print(obj)
>>> print(obj.metadata[0.]['latitude'])
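
The recommended per-variable read-then-merge pattern from the filter_post description can be sketched as below. TinyUngridded is a minimal stand-in for UngriddedData (only the & merge is modelled), and filter_post is simplified to a (min, max) value range; the station data is invented for illustration.

```python
class TinyUngridded:
    """Minimal stand-in for UngriddedData that supports '&' merging."""
    def __init__(self, records):
        self.records = list(records)  # (station, variable, value) tuples

    def __and__(self, other):
        # Merge two datasets into a new one
        return TinyUngridded(self.records + other.records)

def read(var, filter_post):
    # Pretend database; filter_post is reduced to a valid (min, max) range
    db = {"od550aer": [("Leipzig", 0.1), ("Leipzig", 3.5)],
          "ang4487aer": [("Leipzig", 1.2)]}
    lo, hi = filter_post
    return TinyUngridded((st, var, v) for st, v in db[var] if lo <= v <= hi)

data_var1 = read("od550aer", filter_post=(0, 2))   # drops the 3.5 outlier
data_var2 = read("ang4487aer", filter_post=(0, 4))
merged = data_var1 & data_var2
```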
read_dataset(dataset_to_read, vars_to_retrieve=None, only_cached=False, filter_post=None, **kwargs)[source]

Read dataset into an instance of ReadUngridded

Parameters
  • dataset_to_read (str) – name of dataset

  • vars_to_retrieve (str or list) – variable or list of variables to be imported

  • only_cached (bool) – if True, then nothing is reloaded but only data is loaded that is available as cached objects (not recommended to use but may be used if working offline without connection to database)

  • filter_post (dict, optional) – filters applied to UngriddedData object AFTER it is read into memory, via UngriddedData.apply_filters(). This option was introduced in pyaerocom version 0.10.0 and should be used preferably over **kwargs. There is a certain flexibility with respect to how these filters can be defined, for instance, sub dicts for each dataset_to_read. The most common way would be to provide directly the input needed for UngriddedData.apply_filters. If you want to read multiple variables from one or more datasets, and if you want to apply variable specific filters, it is recommended to read the data individually for each variable and corresponding set of filters and then merge the individual filtered UngriddedData objects afterwards, e.g. using data_var1 & data_var2.

  • **kwargs – Additional input options for reading of data, which are applied WHILE the data is read. If any such additional options are provided that are applied during the reading, then automatic caching of the output UngriddedData object will be deactivated. Thus, it is recommended to handle data filtering via filter_post argument whenever possible, which will result in better performance as the unconstrained original data is read in and cached, and then the filtering is applied.

Returns

data object

Return type

UngriddedData

read_dataset_post(dataset_to_read, vars_to_retrieve, only_cached=False, filter_post=None, **kwargs)[source]

Read dataset into an instance of ReadUngridded

Parameters
  • dataset_to_read (str) – name of dataset

  • vars_to_retrieve (list) – variable or list of variables to be imported

  • only_cached (bool) – if True, then nothing is reloaded but only data is loaded that is available as cached objects (not recommended to use but may be used if working offline without connection to database)

  • filter_post (dict, optional) – filters applied to UngriddedData object AFTER it is read into memory, via UngriddedData.apply_filters(). This option was introduced in pyaerocom version 0.10.0 and should be used preferably over **kwargs. There is a certain flexibility with respect to how these filters can be defined, for instance, sub dicts for each dataset_to_read. The most common way would be to provide directly the input needed for UngriddedData.apply_filters. If you want to read multiple variables from one or more datasets, and if you want to apply variable specific filters, it is recommended to read the data individually for each variable and corresponding set of filters and then merge the individual filtered UngriddedData objects afterwards, e.g. using data_var1 & data_var2.

  • **kwargs – Additional input options for reading of data, which are applied WHILE the data is read. If any such additional options are provided that are applied during the reading, then automatic caching of the output UngriddedData object will be deactivated. Thus, it is recommended to handle data filtering via filter_post argument whenever possible, which will result in better performance as the unconstrained original data is read in and cached, and then the filtering is applied.

Returns

data object

Return type

UngriddedData

property supported_datasets

Wrapper for SUPPORTED_DATASETS

property vars_to_retrieve

Variables to retrieve (list or dict)

Dictionary can be used in case different variables from multiple datasets are supposed to be read.

ReadUngriddedBase template class

All ungridded reading routines are based on this template class.

class pyaerocom.io.readungriddedbase.ReadUngriddedBase(dataset_to_read=None, dataset_path=None)[source]

TEMPLATE: Abstract base class template for reading of ungridded data

Note

The two dictionaries AUX_REQUIRES and AUX_FUNS can be filled with variables that are not contained in the original data files but are computed during the reading. The former specifies what additional variables are required to perform the computation and the latter specifies functions used to perform the computations of the auxiliary variables. See, for instance, the class ReadAeronetSunV3, which includes the computation of the AOD at 550nm and the Angstrom coefficient (in 440-870 nm range) from AODs measured at other wavelengths.

AUX_FUNS = {}

Functions that are used to compute additional variables (i.e. one for each variable defined in AUX_REQUIRES)

AUX_REQUIRES = {}

dictionary containing information about additionally required variables for each auxiliary variable (i.e. each variable that is not provided by the original data but computed on import)

property AUX_VARS

List of auxiliary variables (keys of attr. AUX_REQUIRES)

Auxiliary variables are those that are not included in original files but are computed from other variables during import

property DATASET_PATH

Path to datafiles of specified dataset

Is retrieved automatically (if not specified explicitly on class instantiation), based on network ID (DATA_ID) using get_obsnetwork_dir() (which uses the information in pyaerocom.const).

abstract property DATA_ID

Name of dataset (OBS_ID)

Note

  • May be implemented as global constant in header of derived class

  • May be multiple that can be specified on init (see example below)

abstract property DEFAULT_VARS

List containing default variables to read

IGNORE_META_KEYS = []
abstract property PROVIDES_VARIABLES

List of variables that are provided by this dataset

Note

May be implemented as global constant in header

property REVISION_FILE

Name of revision file located in data directory

abstract property SUPPORTED_DATASETS

List of all datasets supported by this interface

Note

  • best practice to specify in header of class definition

  • needless to mention that DATA_ID needs to be in this list

abstract property TS_TYPE

Temporal resolution of dataset

This should be defined in the header of an implementation class if it can be globally defined for the corresponding obs-network; in other cases, it should be initiated as string "undefined" and then, if applicable, updated in the reading routine of a file.

The TS_TYPE information should ultimately be written into the meta-data of objects returned by the implementation of read_file() (e.g. instance of StationData or a normal dictionary) and the method read() (which should ALWAYS return an instance of the UngriddedData class).

Note

  • Please use "undefined" if the dataset is not sampled on a regular basis.

  • If applicable please use Aerocom ts_type (i.e. hourly, 3hourly, daily, monthly, yearly)

  • Note also that the ts_type in a derived class may or may not be definable in the general case. For instance, in the EBAS database the resolution code can be found in the file header; it may thus be initiated as "undefined" when the reading class is instantiated and then updated when a file is read

  • For derived implementation classes that support reading of multiple network versions, you may also assign the temporal resolution per dataset (cf. the TS_TYPES dictionary in ReadAeronetBase)

check_vars_to_retrieve(vars_to_retrieve)[source]

Separate variables that are in file from those that are computed

Some of the variables provided by this interface are not included in the data files but are computed within this class during data import (e.g. od550aer, ang4487aer).

The latter may require additional parameters to be retrieved from the file, which is specified in the class header (cf. attribute AUX_REQUIRES).

This function checks the input list that specifies all required variables and separates them into two lists, one that includes all variables that can be read from the files and a second list that specifies all variables that are computed in this class.

Parameters

vars_to_retrieve (list) – all parameter names that are supposed to be loaded

Returns

2-element tuple, containing

  • list: list containing all variables to be read

  • list: list containing all variables to be computed

Return type

tuple
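
The splitting logic described above can be sketched like this, assuming the single auxiliary-variable entry shown in AUX_REQUIRES (illustrative, not the full pyaerocom registry):

```python
AUX_REQUIRES = {"od550aer": ["od440aer", "ang4487aer"]}

def check_vars_to_retrieve(vars_to_retrieve):
    # Split requested variables into those read from file and those
    # computed on import; inputs of computed variables join the read list
    vars_to_read, vars_to_compute = [], []
    for var in vars_to_retrieve:
        if var in AUX_REQUIRES:
            vars_to_compute.append(var)
            for req in AUX_REQUIRES[var]:
                if req not in vars_to_read:
                    vars_to_read.append(req)
        elif var not in vars_to_read:
            vars_to_read.append(var)
    return vars_to_read, vars_to_compute

read_list, compute_list = check_vars_to_retrieve(["od440aer", "od550aer"])
```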

compute_additional_vars(data, vars_to_compute)[source]

Compute all additional variables

The computations for each additional parameter are done using the specified methods in AUX_FUNS.

Parameters
  • data (dict-like) – data object containing data vectors for variables that are required for computation (cf. input param vars_to_compute)

  • vars_to_compute (list) – list of variable names that are supposed to be computed. Variables that are required for the computation of the variables need to be specified in AUX_VARS and need to be available as data vectors in the provided data dictionary (key is the corresponding variable name of the required variable).

Returns

updated data object now containing also computed variables

Return type

dict

property data_id

Wrapper for DATA_ID (pyaerocom standard name)

property data_revision

Revision string from file Revision.txt in the main data directory

property dataset_to_read
find_in_file_list(pattern=None)[source]

Find all files that match a certain wildcard pattern

Parameters

pattern (str, optional) – wildcard pattern that may be used to narrow down the search (e.g. use pattern=*Berlin* to find only files that contain Berlin in their filename)

Returns

list containing all files in the files attribute that match pattern

Return type

list

Raises

IOError – if no matches can be found

get_file_list(pattern=None)[source]

Search all files to be read

Uses _FILEMASK (+ optional input search pattern, e.g. station_name) to find valid files for query.

Parameters

pattern (str, optional) – file name pattern applied to search

Returns

list containing retrieved file locations

Return type

list

Raises

IOError – if no files can be found

logger

Class own instance of logger class

abstract read(vars_to_retrieve=None, files=[], first_file=None, last_file=None)[source]

Method that reads list of files as instance of UngriddedData

Parameters
  • vars_to_retrieve (list or similar, optional,) – list containing variable IDs that are supposed to be read. If None, all variables in PROVIDES_VARIABLES are loaded

  • files (list, optional) – list of files to be read. If None, then the file list is used that is returned on get_file_list().

  • first_file (int, optional) – index of first file in file list to read. If None, the very first file in the list is used

  • last_file (int, optional) – index of last file in list to read. If None, the very last file in the list is used

Returns

instance of ungridded data object containing data from all files.

Return type

UngriddedData

abstract read_file(filename, vars_to_retrieve=None)[source]

Read single file

Parameters
  • filename (str) – string specifying filename

  • vars_to_retrieve (list or similar, optional,) – list containing variable IDs that are supposed to be read. If None, all variables in PROVIDES_VARIABLES are loaded

Returns

imported data in a suitable format that can be handled by read(), which appends the loaded results from this method (which reads one data file) to an instance of UngriddedData for all files.

Return type

dict or StationData, or other…

read_first_file(**kwargs)[source]

Read first file returned from get_file_list()

Note

This method may be used for test purposes.

Parameters

**kwargs – keyword args passed to read_file() (e.g. vars_to_retrieve)

Returns

dictionary or similar containing loaded results from first file

Return type

dict-like

read_station(station_id_filename, **kwargs)[source]

Read data from a single station into UngriddedData

Find all files that contain the station ID in their filename and then call read(), providing the reduced filelist as input, in order to read all files from this station into data object.

Parameters
  • station_id_filename (str) – name of station (MUST be encoded in the filename)

  • **kwargs – additional keyword args passed to read() (e.g. vars_to_retrieve)

Returns

loaded data

Return type

UngriddedData

Raises

IOError – if no files can be found for this station ID

remove_outliers(data, vars_to_retrieve, **valid_rng_vars)[source]

Remove outliers from data

Parameters
  • data (dict-like) – data object containing data vectors for variables that are required for computation (cf. input param vars_to_compute)

  • vars_to_retrieve (list) – list of variable names for which outliers will be removed from data

  • **valid_rng_vars – additional keyword args specifying variable name and corresponding min / max interval (list or tuple) that specifies the valid range for the variable. For each variable that is not explicitly defined here, the default minimum / maximum value is used (accessed via pyaerocom.const.VARS[var_name])
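
A sketch of this range-based outlier removal. The data container is a plain dict of lists rather than a pyaerocom data object, and the hard-coded default range stands in for the variable defaults from pyaerocom.const.VARS; the **valid_rng_vars keyword interface is the same.

```python
def remove_outliers(data, vars_to_retrieve, **valid_rng_vars):
    default_rng = (0, 10)  # stand-in for per-variable pyaerocom defaults
    for var in vars_to_retrieve:
        vmin, vmax = valid_rng_vars.get(var, default_rng)
        # Replace out-of-range values with NaN
        data[var] = [v if vmin <= v <= vmax else float("nan")
                     for v in data[var]]
    return data

data = {"od550aer": [0.1, 5.0, -0.2]}
remove_outliers(data, ["od550aer"], od550aer=(0, 2))
```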

var_supported(var_name)[source]

Check if input variable is supported

Parameters

var_name (str) – AeroCom variable name or alias

Raises

VariableDefinitionError – if input variable is not supported by pyaerocom

Returns

True, if variable is supported by this interface, else False

Return type

bool

property verbosity_level

Current level of verbosity of logger

AERONET

All AERONET reading classes are based on the template ReadAeronetBase class which, in turn, inherits from ReadUngriddedBase.

class pyaerocom.io.readaeronetbase.ReadAeronetBase(dataset_to_read=None)[source]

Bases: pyaerocom.io.readungriddedbase.ReadUngriddedBase

TEMPLATE: Abstract base class template for reading of Aeronet data

Extended abstract base class, derived from low-level base class ReadUngriddedBase that contains some more functionality.

ALT_VAR_NAMES_FILE = {}

dictionary specifying alternative column names for variables defined in VAR_NAMES_FILE

Type

OPTIONAL

AUX_FUNS = {}
AUX_REQUIRES = {}
property AUX_VARS

List of auxiliary variables (keys of attr. AUX_REQUIRES)

Auxiliary variables are those that are not included in original files but are computed from other variables during import

COL_DELIM = ','

column delimiter in data block of files

property DATASET_PATH

Path to datafiles of specified dataset

Is retrieved automatically (if not specified explicitly on class instantiation), based on network ID (DATA_ID) using get_obsnetwork_dir() (which uses the information in pyaerocom.const).

abstract property DATA_ID

Name of dataset (OBS_ID)

Note

  • May be implemented as global constant in header of derived class

  • May be multiple that can be specified on init (see example below)

DEFAULT_UNIT = '1'

Default data unit that is assigned to all variables that are not specified in UNITS dictionary (cf. UNITS)

abstract property DEFAULT_VARS

List containing default variables to read

IGNORE_META_KEYS = ['date', 'time', 'day_of_year']
INSTRUMENT_NAME = 'sun_photometer'

name of measurement instrument

META_NAMES_FILE = {}

dictionary specifying the file column names (values) for each metadata key (cf. attributes of StationData, e.g. ‘station_name’, ‘longitude’, ‘latitude’, ‘altitude’)

META_NAMES_FILE_ALT = ({},)
abstract property PROVIDES_VARIABLES

List of variables that are provided by this dataset

Note

May be implemented as global constant in header

property REVISION_FILE

Name of revision file located in data directory

abstract property SUPPORTED_DATASETS

List of all datasets supported by this interface

Note

  • best practice to specify in header of class definition

  • needless to mention that DATA_ID needs to be in this list

property TS_TYPE

Default implementation of string for temporal resolution

TS_TYPES = {}

dictionary assigning temporal resolution flags for supported datasets that are provided in a defined temporal resolution. Key is the name of the dataset and value is the corresponding ts_type

UNITS = {}

Variable specific units, only required for variables that deviate from DEFAULT_UNIT (is irrelevant for all variables that are so far supported by the implemented Aeronet products, i.e. all variables are dimensionless as specified in DEFAULT_UNIT)

VAR_NAMES_FILE = {}

dictionary specifying the file column names (values) for each Aerocom variable (keys)

VAR_PATTERNS_FILE = {}

Mappings for identifying variables in file (may be specified in addition to explicit variable names specified in VAR_NAMES_FILE)

check_vars_to_retrieve(vars_to_retrieve)

Separate variables that are in file from those that are computed

Some of the variables provided by this interface are not included in the data files but are computed within this class during data import (e.g. od550aer, ang4487aer).

The latter may require additional parameters to be retrieved from the file, which is specified in the class header (cf. attribute AUX_REQUIRES).

This function checks the input list that specifies all required variables and separates them into two lists, one that includes all variables that can be read from the files and a second list that specifies all variables that are computed in this class.

Parameters

vars_to_retrieve (list) – all parameter names that are supposed to be loaded

Returns

2-element tuple, containing

  • list: list containing all variables to be read

  • list: list containing all variables to be computed

Return type

tuple

property col_index

Dictionary that specifies the index for each data column

Note

Implementation depends on the data. For instance, if the variable information is provided in all files (of all stations) and always in the same column, then this can be set as a fixed dictionary in the __init__ method of the implementation (see e.g. class ReadAeronetSunV2). In other cases, it may not be ensured that each variable is available in all files, or the column definition may differ between stations. In the latter case, you may automate the column index retrieval by providing the header names for each metadata and data column you want to extract via the attribute dictionaries META_NAMES_FILE and VAR_NAMES_FILE, and by calling _update_col_index() in your implementation of read_file() when you reach the line that contains the header information.

compute_additional_vars(data, vars_to_compute)

Compute all additional variables

The computations for each additional parameter are done using the specified methods in AUX_FUNS.

Parameters
  • data (dict-like) – data object containing data vectors for variables that are required for computation (cf. input param vars_to_compute)

  • vars_to_compute (list) – list of variable names that are supposed to be computed. Variables that are required for the computation of the variables need to be specified in AUX_VARS and need to be available as data vectors in the provided data dictionary (key is the corresponding variable name of the required variable).

Returns

updated data object now containing also computed variables

Return type

dict

property data_id

Wrapper for DATA_ID (pyaerocom standard name)

property data_revision

Revision string from file Revision.txt in the main data directory

property dataset_to_read
find_in_file_list(pattern=None)

Find all files that match a certain wildcard pattern

Parameters

pattern (str, optional) – wildcard pattern that may be used to narrow down the search (e.g. use pattern=*Berlin* to find only files that contain Berlin in their filename)

Returns

list containing all files in the files attribute that match pattern

Return type

list

Raises

IOError – if no matches can be found

get_file_list(pattern=None)

Search all files to be read

Uses _FILEMASK (+ optional input search pattern, e.g. station_name) to find valid files for query.

Parameters

pattern (str, optional) – file name pattern applied to search

Returns

list containing retrieved file locations

Return type

list

Raises

IOError – if no files can be found

infer_wavelength_colname(colname, low=250, high=2000)[source]

Get variable wavelength from column name

Parameters
  • colname (str) – string of column name

  • low (int) – lower limit of accepted value range

  • high (int) – upper limit of accepted value range

Returns

wavelength in nm, as str

Return type

str

Raises

ValueError – if no number, or more than one number, is detected in the variable string
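
An illustrative reimplementation of the behaviour just described (not the actual pyaerocom code): extract exactly one number within [low, high] from a column name, and raise ValueError otherwise.

```python
import re

def infer_wavelength_colname(colname, low=250, high=2000):
    # Collect all numbers in the column name that fall in the accepted range
    nums = [n for n in re.findall(r"\d+", colname)
            if low <= int(n) <= high]
    if len(nums) != 1:
        raise ValueError(f"Expected exactly one wavelength in {colname!r}")
    return nums[0]
```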

print_all_columns()[source]
read(vars_to_retrieve=None, files=None, first_file=None, last_file=None, file_pattern=None, common_meta=None)[source]

Method that reads list of files as instance of UngriddedData

Parameters
  • vars_to_retrieve (list or similar, optional,) – list containing variable IDs that are supposed to be read. If None, all variables in PROVIDES_VARIABLES are loaded

  • files (list, optional) – list of files to be read. If None, then the file list is used that is returned on get_file_list().

  • first_file (int, optional) – index of first file in file list to read. If None, the very first file in the list is used. Note: is ignored if input parameter file_pattern is specified.

  • last_file (int, optional) – index of last file in list to read. If None, the very last file in the list is used. Note: is ignored if input parameter file_pattern is specified.

  • file_pattern (str, optional) – string pattern for file search (cf get_file_list())

  • common_meta (dict, optional) – dictionary that contains additional metadata shared for this network (assigned to each metadata block of the UngriddedData object that is returned)

Returns

data object

Return type

UngriddedData

abstract read_file(filename, vars_to_retrieve=None)

Read single file

Parameters
  • filename (str) – string specifying filename

  • vars_to_retrieve (list or similar, optional,) – list containing variable IDs that are supposed to be read. If None, all variables in PROVIDES_VARIABLES are loaded

Returns

imported data in a suitable format that can be handled by read(), which appends the loaded results from this method (which reads one data file) to an instance of UngriddedData for all files.

Return type

dict or StationData, or other…

read_first_file(**kwargs)

Read first file returned from get_file_list()

Note

This method may be used for test purposes.

Parameters

**kwargs – keyword args passed to read_file() (e.g. vars_to_retrieve)

Returns

dictionary or similar containing loaded results from first file

Return type

dict-like

read_station(station_id_filename, **kwargs)

Read data from a single station into UngriddedData

Find all files that contain the station ID in their filename and then call read(), providing the reduced file list as input, in order to read all files from this station into a data object.

Parameters
  • station_id_filename (str) – name of station (the station ID MUST be encoded in the filename)

  • **kwargs – additional keyword args passed to read() (e.g. vars_to_retrieve)

Returns

loaded data

Return type

UngriddedData

Raises

IOError – if no files can be found for this station ID
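The filename-based filtering behind read_station() can be sketched as follows (find_station_files is a hypothetical helper, not part of the pyaerocom API):

```python
def find_station_files(files, station_id):
    """Keep files whose name contains the station ID; fail loudly
    if none match (mirrors the IOError documented above)."""
    matches = [f for f in files if station_id in f]
    if not matches:
        raise IOError(f"No files found for station {station_id}")
    return matches

files = ["19930101_Alta_Floresta.lev30", "19930101_Berlin_FUB.lev30"]
print(find_station_files(files, "Berlin_FUB"))
```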

remove_outliers(data, vars_to_retrieve, **valid_rng_vars)

Remove outliers from data

Parameters
  • data (dict-like) – data object containing data vectors for variables that are required for computation (cf. input param vars_to_compute)

  • vars_to_retrieve (list) – list of variable names for which outliers will be removed from data

  • **valid_rng_vars – additional keyword args specifying variable name and corresponding min / max interval (list or tuple) that specifies valid range for the variable. For each variable that is not explicitly defined here, the default minimum / maximum value is used (accessed via pyaerocom.const.VARS[var_name])
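The masking described above can be sketched with plain numpy (an illustrative re-implementation, not pyaerocom internals; DEFAULT_RANGES stands in for the defaults normally taken from pyaerocom.const.VARS):

```python
import numpy as np

# hypothetical stand-in for the per-variable defaults in pyaerocom.const.VARS
DEFAULT_RANGES = {"od550aer": (0, 10)}

def remove_outliers(data, vars_to_retrieve, **valid_rng_vars):
    for var in vars_to_retrieve:
        # explicit keyword range wins over the default range
        low, high = valid_rng_vars.get(var, DEFAULT_RANGES[var])
        vals = np.asarray(data[var], dtype=float)
        vals[(vals < low) | (vals > high)] = np.nan  # flag outliers as NaN
        data[var] = vals
    return data

out = remove_outliers({"od550aer": [0.1, 55.0, 0.3]}, ["od550aer"])
# 55.0 falls outside the valid range (0, 10) and is replaced by NaN
```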

var_supported(var_name)

Check if input variable is supported

Parameters

var_name (str) – AeroCom variable name or alias

Raises

VariableDefinitionError – if input variable is not supported by pyaerocom

Returns

True, if variable is supported by this interface, else False

Return type

bool

property verbosity_level

Current level of verbosity of logger

AERONET Sun (V3)

class pyaerocom.io.read_aeronet_sunv3.ReadAeronetSunV3(dataset_to_read=None)[source]

Bases: pyaerocom.io.readaeronetbase.ReadAeronetBase

Interface for reading Aeronet direct sun version 3 Level 1.5 and 2.0 data

See also

Base classes ReadAeronetBase and ReadUngriddedBase

ALT_VAR_NAMES_FILE = {}
AUX_FUNS = {'ang44&87aer': <function calc_ang4487aer>, 'od550aer': <function calc_od550aer>}

Functions that are used to compute additional variables (i.e. one for each variable defined in AUX_REQUIRES)

AUX_REQUIRES = {'ang44&87aer': ['od440aer', 'od870aer'], 'od550aer': ['od440aer', 'od500aer', 'ang4487aer']}

dictionary containing information about additionally required variables for each auxiliary variable (i.e. each variable that is not provided by the original data but computed on import)
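For example, AUX_REQUIRES lists od500aer and ang4487aer as inputs for od550aer because AOD can be shifted to another wavelength via the Angstrom power law. The sketch below shows that basic relation only; pyaerocom's actual calc_od550aer may include additional fallbacks (e.g. via od440aer) that are omitted here:

```python
def od_at_wavelength(od_ref, lambda_ref, lambda_out, angstrom):
    """Angstrom power law for shifting AOD between wavelengths:
    od(lambda_out) = od(lambda_ref) * (lambda_out / lambda_ref) ** (-angstrom)
    """
    return od_ref * (lambda_out / lambda_ref) ** (-angstrom)

# od550aer from od500aer and ang4487aer
od550 = od_at_wavelength(0.20, 500.0, 550.0, 1.0)
```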

property AUX_VARS

List of auxiliary variables (keys of attr. AUX_REQUIRES)

Auxiliary variables are those that are not included in original files but are computed from other variables during import

COL_DELIM = ','
property DATASET_PATH

Path to datafiles of specified dataset

Is retrieved automatically (if not specified explicitly on class instantiation), based on network ID (DATA_ID) using get_obsnetwork_dir() (which uses the information in pyaerocom.const).

DATA_ID = 'AeronetSunV3Lev2.daily'

Name of dataset (OBS_ID)

DEFAULT_UNIT = '1'
DEFAULT_VARS = ['od550aer', 'ang4487aer']

default variables for read method

IGNORE_META_KEYS = ['date', 'time', 'day_of_year']
INSTRUMENT_NAME = 'sun_photometer'
META_NAMES_FILE = {'altitude': 'Site_Elevation(m)', 'data_quality_level': 'Data_Quality_Level', 'date': 'Date(dd:mm:yyyy)', 'day_of_year': 'Day_of_Year', 'instrument_number': 'AERONET_Instrument_Number', 'latitude': 'Site_Latitude(Degrees)', 'longitude': 'Site_Longitude(Degrees)', 'station_name': 'AERONET_Site', 'time': 'Time(hh:mm:ss)'}

dictionary specifying the file column names (values) for each metadata key (cf. attributes of StationData, e.g. ‘station_name’, ‘longitude’, ‘latitude’, ‘altitude’)

META_NAMES_FILE_ALT = {'AERONET_Site': ['AERONET_Site_Name']}
NAN_VAL = -999.0
PROVIDES_VARIABLES = ['od340aer', 'od440aer', 'od500aer', 'od870aer', 'ang4487aer']

List of variables that are provided by this dataset (will be extended by auxiliary variables on class init, for details see __init__ method of base class ReadUngriddedBase)

property REVISION_FILE

Name of revision file located in data directory

SUPPORTED_DATASETS = ['AeronetSunV3Lev1.5.daily', 'AeronetSunV3Lev1.5.AP', 'AeronetSunV3Lev2.daily', 'AeronetSunV3Lev2.AP']

List of all datasets supported by this interface

property TS_TYPE

Default implementation of string for temporal resolution

TS_TYPES = {'AeronetSunV3Lev1.5.daily': 'daily', 'AeronetSunV3Lev2.daily': 'daily'}

dictionary assigning temporal resolution flags for supported datasets that are provided in a defined temporal resolution

UNITS = {}
VAR_NAMES_FILE = {'ang4487aer': '440-870_Angstrom_Exponent', 'od340aer': 'AOD_340nm', 'od440aer': 'AOD_440nm', 'od500aer': 'AOD_500nm', 'od870aer': 'AOD_870nm'}

dictionary specifying the file column names (values) for each Aerocom variable (keys)

VAR_PATTERNS_FILE = {'AOD_([0-9]*)nm': 'od*aer'}

Mappings for identifying variables in file
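The single pattern entry maps any AOD column onto the corresponding AeroCom variable name, with the captured wavelength filling the wildcard in od*aer. An illustrative sketch of that matching (the actual logic lives in the base class):

```python
import re

pattern, var_tmpl = r"AOD_([0-9]*)nm", "od*aer"

def colname_to_var(colname):
    """Map a file column name onto an AeroCom variable name, or None
    if the column does not match the pattern."""
    m = re.fullmatch(pattern, colname)
    if m is None:
        return None
    return var_tmpl.replace("*", m.group(1))

print(colname_to_var("AOD_675nm"))   # -> od675aer
print(colname_to_var("Day_of_Year")) # -> None
```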

check_vars_to_retrieve(vars_to_retrieve)

Separate variables that are in file from those that are computed

Some of the variables provided by this interface are not included in the data files but are computed within this class during data import (e.g. od550aer, ang4487aer).

The latter may require additional parameters to be retrieved from the file, which is specified in the class header (cf. attribute AUX_REQUIRES).

This function checks the input list that specifies all required variables and separates them into two lists, one that includes all variables that can be read from the files and a second list that specifies all variables that are computed in this class.

Parameters

vars_to_retrieve (list) – all parameter names that are supposed to be loaded

Returns

2-element tuple, containing

  • list: list containing all variables to be read

  • list: list containing all variables to be computed

Return type

tuple
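The separation can be sketched as follows, using the class attributes documented above (an illustrative re-implementation, not the actual method):

```python
PROVIDES_VARIABLES = ["od340aer", "od440aer", "od500aer", "od870aer", "ang4487aer"]
AUX_REQUIRES = {"od550aer": ["od440aer", "od500aer", "ang4487aer"]}

def check_vars_to_retrieve(vars_to_retrieve):
    """Split requested variables into (vars_to_read, vars_to_compute)."""
    vars_to_read, vars_to_compute = [], []
    for var in vars_to_retrieve:
        if var in AUX_REQUIRES:
            vars_to_compute.append(var)
            # dependencies of computed variables must be read from file
            for req in AUX_REQUIRES[var]:
                if req in PROVIDES_VARIABLES and req not in vars_to_read:
                    vars_to_read.append(req)
        elif var in PROVIDES_VARIABLES:
            vars_to_read.append(var)
    return vars_to_read, vars_to_compute

read_vars, comp_vars = check_vars_to_retrieve(["od550aer"])
# od550aer is computed; its inputs od440aer, od500aer, ang4487aer are read
```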

property col_index

Dictionary that specifies the index for each data column

Note

Implementation depends on the data. For instance, if the variable information is provided in all files (of all stations) and always in the same column, then this can be set as a fixed dictionary in the __init__ function of the implementation (see e.g. class ReadAeronetSunV2). In other cases, it may not be ensured that each variable is available in all files or the column definition may differ between different stations. In the latter case you may automate the column index retrieval by providing the header names for each meta and data column you want to extract using the attribute dictionaries META_NAMES_FILE and VAR_NAMES_FILE, and by calling _update_col_index() in your implementation of read_file() when you reach the line that contains the header information.
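The header-based retrieval described in the note can be sketched as follows (a hypothetical analogue of _update_col_index, not the real method; the header and dictionaries are abbreviated):

```python
# abbreviated versions of the attribute dictionaries documented above
META_NAMES_FILE = {"station_name": "AERONET_Site",
                   "latitude": "Site_Latitude(Degrees)"}
VAR_NAMES_FILE = {"od500aer": "AOD_500nm"}

def build_col_index(header_line, delim=","):
    """Map each meta/data key onto the column position found in the
    header line of the current file."""
    cols = header_line.split(delim)
    index = {}
    for key, colname in {**META_NAMES_FILE, **VAR_NAMES_FILE}.items():
        if colname in cols:
            index[key] = cols.index(colname)
    return index

header = "AERONET_Site,Site_Latitude(Degrees),AOD_500nm"
print(build_col_index(header))
```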

compute_additional_vars(data, vars_to_compute)

Compute all additional variables

The computations for each additional parameter are done using the specified methods in AUX_FUNS.

Parameters
  • data (dict-like) – data object containing data vectors for variables that are required for computation (cf. input param vars_to_compute)

  • vars_to_compute (list) – list of variable names that are supposed to be computed. Variables that are required for the computation of the variables need to be specified in AUX_VARS and need to be available as data vectors in the provided data dictionary (key is the corresponding variable name of the required variable).

Returns

updated data object now containing also computed variables

Return type

dict
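The dispatch over AUX_FUNS can be sketched like this (illustrative; the real entries are functions such as calc_od550aer, replaced here by a hypothetical lambda using the Angstrom relation):

```python
# hypothetical stand-in for the AUX_FUNS class attribute
AUX_FUNS = {
    "od550aer": lambda data: data["od500aer"] * (550 / 500) ** (-data["ang4487aer"])
}

def compute_additional_vars(data, vars_to_compute):
    """Apply the registered function for each variable to be computed
    and store the result under the variable name."""
    for var in vars_to_compute:
        data[var] = AUX_FUNS[var](data)
    return data

data = compute_additional_vars({"od500aer": 0.2, "ang4487aer": 1.0}, ["od550aer"])
```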

property data_id

Wrapper for DATA_ID (pyaerocom standard name)

property data_revision

Revision string from file Revision.txt in the main data directory

property dataset_to_read
find_in_file_list(pattern=None)

Find all files that match a certain wildcard pattern

Parameters

pattern (str, optional) – wildcard pattern that may be used to narrow down the search (e.g. use pattern=*Berlin* to find only files that contain Berlin in their filename)

Returns

list containing all files in files that match pattern

Return type

list

Raises

IOError – if no matches can be found

get_file_list(pattern=None)

Search all files to be read

Uses _FILEMASK (+ optional input search pattern, e.g. station_name) to find valid files for query.

Parameters

pattern (str, optional) – file name pattern applied to search

Returns

list containing retrieved file locations

Return type

list

Raises

IOError – if no files can be found

infer_wavelength_colname(colname, low=250, high=2000)

Get variable wavelength from column name

Parameters
  • colname (str) – string of column name

  • low (int) – lower limit of accepted value range

  • high (int) – upper limit of accepted value range

Returns

wavelength in nm, as string representation of a floating point number

Return type

str

Raises

ValueError – if no number or more than one number is detected in the column name
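A sketch of the documented behaviour (not the actual implementation; the returned string is the raw match here, whereas the real method may normalise it to a floating point representation):

```python
import re

def infer_wavelength_colname(colname, low=250, high=2000):
    """Extract exactly one number within [low, high] from a column
    name; raise ValueError for zero or multiple candidates."""
    nums = [n for n in re.findall(r"\d+", colname) if low <= int(n) <= high]
    if len(nums) != 1:
        raise ValueError(f"Expected exactly one wavelength in {colname!r}")
    return nums[0]

print(infer_wavelength_colname("AOD_440nm"))  # one match
# "440-870_Angstrom_Exponent" contains two numbers -> ValueError
```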

print_all_columns()
read(vars_to_retrieve=None, files=None, first_file=None, last_file=None, file_pattern=None, common_meta=None)

Read a list of files into an instance of UngriddedData

Parameters
  • vars_to_retrieve (list or similar, optional) – list containing variable IDs that are supposed to be read. If None, all variables in PROVIDES_VARIABLES are loaded

  • files (list, optional) – list of files to be read. If None, then the file list is used that is returned on get_file_list().

  • first_file (int, optional) – index of first file in file list to read. If None, the very first file in the list is used. Note: is ignored if input parameter file_pattern is specified.

  • last_file (int, optional) – index of last file in list to read. If None, the very last file in the list is used. Note: is ignored if input parameter file_pattern is specified.

  • file_pattern (str, optional) – string pattern for file search (cf get_file_list())

  • common_meta (dict, optional) – dictionary that contains additional metadata shared for this network (assigned to each metadata block of the UngriddedData object that is returned)

Returns

data object

Return type

UngriddedData

read_file(filename, vars_to_retrieve=None, vars_as_series=False, read_all_possible=False)[source]

Read Aeronet Sun V3 level 1.5 or 2 file

Parameters
  • filename (str) – absolute path to filename to read

  • vars_to_retrieve (list, optional) – list of str with variable names to read. If None, use DEFAULT_VARS

  • vars_as_series (bool) – if True, the data columns of all variables in the result dictionary are converted into pandas Series objects

  • read_all_possible (bool) – if True, then all available variables belonging to either of the variable families that are specified in VAR_PATTERNS_FILE are read from the file (in addition to the ones that are specified via vars_to_retrieve).

Returns

dict-like object containing results

Return type

StationData

read_first_file(**kwargs)

Read first file returned from get_file_list()

Note

This method may be used for test purposes.

Parameters

**kwargs – keyword args passed to read_file() (e.g. vars_to_retrieve)

Returns

dictionary or similar containing loaded results from first file

Return type

dict-like

read_station(station_id_filename, **kwargs)

Read data from a single station into UngriddedData

Find all files that contain the station ID in their filename and then call read(), providing the reduced file list as input, in order to read all files from this station into a data object.

Parameters
  • station_id_filename (str) – name of station (the station ID MUST be encoded in the filename)

  • **kwargs – additional keyword args passed to read() (e.g. vars_to_retrieve)

Returns

loaded data

Return type

UngriddedData

Raises

IOError – if no files can be found for this station ID

remove_outliers(data, vars_to_retrieve, **valid_rng_vars)

Remove outliers from data

Parameters
  • data (dict-like) – data object containing data vectors for variables that are required for computation (cf. input param vars_to_compute)

  • vars_to_retrieve (list) – list of variable names for which outliers will be removed from data

  • **valid_rng_vars – additional keyword args specifying variable name and corresponding min / max interval (list or tuple) that specifies valid range for the variable. For each variable that is not explicitly defined here, the default minimum / maximum value is used (accessed via pyaerocom.const.VARS[var_name])

var_supported(var_name)

Check if input variable is supported

Parameters

var_name (str) – AeroCom variable name or alias

Raises

VariableDefinitionError – if input variable is not supported by pyaerocom

Returns

True, if variable is supported by this interface, else False

Return type

bool

property verbosity_level

Current level of verbosity of logger

AERONET SDA (V3)

Read Aeronet SDA V3 data

class pyaerocom.io.read_aeronet_sdav3.ReadAeronetSdaV3(dataset_to_read=None)[source]

Bases: pyaerocom.io.readaeronetbase.ReadAeronetBase

Interface for reading Aeronet Sun SDA V3 Level 1.5 and 2.0 data

See also

Base classes ReadAeronetBase and ReadUngriddedBase

ALT_VAR_NAMES_FILE = {}
AUX_FUNS = {'od550aer': <function calc_od550aer>, 'od550gt1aer': <function calc_od550gt1aer>, 'od550lt1aer': <function calc_od550lt1aer>}

Functions that are used to compute additional variables (i.e. one for each variable defined in AUX_REQUIRES)

AUX_REQUIRES = {'od550aer': ['od500aer', 'ang4487aer'], 'od550gt1aer': ['od500gt1aer', 'ang4487aer'], 'od550lt1aer': ['od500lt1aer', 'ang4487aer']}

dictionary containing information about additionally required variables for each auxiliary variable (i.e. each variable that is not provided by the original data but computed on import)

property AUX_VARS

List of auxiliary variables (keys of attr. AUX_REQUIRES)

Auxiliary variables are those that are not included in original files but are computed from other variables during import

COL_DELIM = ','
property DATASET_PATH

Path to datafiles of specified dataset

Is retrieved automatically (if not specified explicitly on class instantiation), based on network ID (DATA_ID) using get_obsnetwork_dir() (which uses the information in pyaerocom.const).

DATA_ID = 'AeronetSDAV3Lev2.daily'

Name of dataset (OBS_ID)

DEFAULT_UNIT = '1'
DEFAULT_VARS = ['od550aer', 'od550gt1aer', 'od550lt1aer']

default variables for read method

IGNORE_META_KEYS = ['date', 'time', 'day_of_year']
INSTRUMENT_NAME = 'sun_photometer'
META_NAMES_FILE = {'altitude': 'Site_Elevation(m)', 'data_quality_level': 'Data_Quality_Level', 'date': 'Date_(dd:mm:yyyy)', 'day_of_year': 'Day_of_Year', 'instrument_number': 'AERONET_Instrument_Number', 'latitude': 'Site_Latitude(Degrees)', 'longitude': 'Site_Longitude(Degrees)', 'station_name': 'AERONET_Site', 'time': 'Time_(hh:mm:ss)'}

dictionary specifying the file column names (values) for each metadata key (cf. attributes of StationData, e.g. ‘station_name’, ‘longitude’, ‘latitude’, ‘altitude’)

META_NAMES_FILE_ALT = ({},)
NAN_VAL = -999.0

value corresponding to invalid measurement

PROVIDES_VARIABLES = ['od500gt1aer', 'od500lt1aer', 'od500aer', 'ang4487aer']

List of variables that are provided by this dataset (will be extended by auxiliary variables on class init, for details see __init__ method of base class ReadUngriddedBase)

property REVISION_FILE

Name of revision file located in data directory

SUPPORTED_DATASETS = ['AeronetSDAV3Lev1.5.daily', 'AeronetSDAV3Lev2.daily']

List of all datasets supported by this interface

property TS_TYPE

Default implementation of string for temporal resolution

TS_TYPES = {'AeronetSDAV3Lev1.5.daily': 'daily', 'AeronetSDAV3Lev2.daily': 'daily'}

dictionary assigning temporal resolution flags for supported datasets that are provided in a defined temporal resolution

UNITS = {}
VAR_NAMES_FILE = {'ang4487aer': 'Angstrom_Exponent(AE)-Total_500nm[alpha]', 'od500aer': 'Total_AOD_500nm[tau_a]', 'od500gt1aer': 'Coarse_Mode_AOD_500nm[tau_c]', 'od500lt1aer': 'Fine_Mode_AOD_500nm[tau_f]'}

dictionary specifying the file column names (values) for each Aerocom variable (keys)

VAR_PATTERNS_FILE = {}
check_vars_to_retrieve(vars_to_retrieve)

Separate variables that are in file from those that are computed

Some of the variables provided by this interface are not included in the data files but are computed within this class during data import (e.g. od550aer, ang4487aer).

The latter may require additional parameters to be retrieved from the file, which is specified in the class header (cf. attribute AUX_REQUIRES).

This function checks the input list that specifies all required variables and separates them into two lists, one that includes all variables that can be read from the files and a second list that specifies all variables that are computed in this class.

Parameters

vars_to_retrieve (list) – all parameter names that are supposed to be loaded

Returns

2-element tuple, containing

  • list: list containing all variables to be read

  • list: list containing all variables to be computed

Return type

tuple

property col_index

Dictionary that specifies the index for each data column

Note

Implementation depends on the data. For instance, if the variable information is provided in all files (of all stations) and always in the same column, then this can be set as a fixed dictionary in the __init__ function of the implementation (see e.g. class ReadAeronetSunV2). In other cases, it may not be ensured that each variable is available in all files or the column definition may differ between different stations. In the latter case you may automate the column index retrieval by providing the header names for each meta and data column you want to extract using the attribute dictionaries META_NAMES_FILE and VAR_NAMES_FILE, and by calling _update_col_index() in your implementation of read_file() when you reach the line that contains the header information.

compute_additional_vars(data, vars_to_compute)

Compute all additional variables

The computations for each additional parameter are done using the specified methods in AUX_FUNS.

Parameters
  • data (dict-like) – data object containing data vectors for variables that are required for computation (cf. input param vars_to_compute)

  • vars_to_compute (list) – list of variable names that are supposed to be computed. Variables that are required for the computation of the variables need to be specified in AUX_VARS and need to be available as data vectors in the provided data dictionary (key is the corresponding variable name of the required variable).

Returns

updated data object now containing also computed variables

Return type

dict

property data_id

Wrapper for DATA_ID (pyaerocom standard name)

property data_revision

Revision string from file Revision.txt in the main data directory

property dataset_to_read
find_in_file_list(pattern=None)

Find all files that match a certain wildcard pattern

Parameters

pattern (str, optional) – wildcard pattern that may be used to narrow down the search (e.g. use pattern=*Berlin* to find only files that contain Berlin in their filename)

Returns

list containing all files in files that match pattern

Return type

list

Raises

IOError – if no matches can be found

get_file_list(pattern=None)

Search all files to be read

Uses _FILEMASK (+ optional input search pattern, e.g. station_name) to find valid files for query.

Parameters

pattern (str, optional) – file name pattern applied to search

Returns

list containing retrieved file locations

Return type

list

Raises

IOError – if no files can be found

infer_wavelength_colname(colname, low=250, high=2000)

Get variable wavelength from column name

Parameters
  • colname (str) – string of column name

  • low (int) – lower limit of accepted value range

  • high (int) – upper limit of accepted value range

Returns

wavelength in nm, as string representation of a floating point number

Return type

str

Raises

ValueError – if no number or more than one number is detected in the column name

print_all_columns()
read(vars_to_retrieve=None, files=None, first_file=None, last_file=None, file_pattern=None, common_meta=None)

Read a list of files into an instance of UngriddedData

Parameters
  • vars_to_retrieve (list or similar, optional) – list containing variable IDs that are supposed to be read. If None, all variables in PROVIDES_VARIABLES are loaded

  • files (list, optional) – list of files to be read. If None, then the file list is used that is returned on get_file_list().

  • first_file (int, optional) – index of first file in file list to read. If None, the very first file in the list is used. Note: is ignored if input parameter file_pattern is specified.

  • last_file (int, optional) – index of last file in list to read. If None, the very last file in the list is used. Note: is ignored if input parameter file_pattern is specified.

  • file_pattern (str, optional) – string pattern for file search (cf get_file_list())

  • common_meta (dict, optional) – dictionary that contains additional metadata shared for this network (assigned to each metadata block of the UngriddedData object that is returned)

Returns

data object

Return type

UngriddedData

read_file(filename, vars_to_retrieve=None, vars_as_series=False)[source]

Read Aeronet SDA V3 file and return it in a dictionary

Parameters
  • filename (str) – absolute path to filename to read

  • vars_to_retrieve (list, optional) – list of str with variable names to read. If None, use DEFAULT_VARS

  • vars_as_series (bool) – if True, the data columns of all variables in the result dictionary are converted into pandas Series objects

Returns

dict-like object containing results

Return type

StationData

read_first_file(**kwargs)

Read first file returned from get_file_list()

Note

This method may be used for test purposes.

Parameters

**kwargs – keyword args passed to read_file() (e.g. vars_to_retrieve)

Returns

dictionary or similar containing loaded results from first file

Return type

dict-like

read_station(station_id_filename, **kwargs)

Read data from a single station into UngriddedData

Find all files that contain the station ID in their filename and then call read(), providing the reduced file list as input, in order to read all files from this station into a data object.

Parameters
  • station_id_filename (str) – name of station (the station ID MUST be encoded in the filename)

  • **kwargs – additional keyword args passed to read() (e.g. vars_to_retrieve)

Returns

loaded data

Return type

UngriddedData

Raises

IOError – if no files can be found for this station ID

remove_outliers(data, vars_to_retrieve, **valid_rng_vars)

Remove outliers from data

Parameters
  • data (dict-like) – data object containing data vectors for variables that are required for computation (cf. input param vars_to_compute)

  • vars_to_retrieve (list) – list of variable names for which outliers will be removed from data

  • **valid_rng_vars – additional keyword args specifying variable name and corresponding min / max interval (list or tuple) that specifies valid range for the variable. For each variable that is not explicitly defined here, the default minimum / maximum value is used (accessed via pyaerocom.const.VARS[var_name])

var_supported(var_name)

Check if input variable is supported

Parameters

var_name (str) – AeroCom variable name or alias

Raises

VariableDefinitionError – if input variable is not supported by pyaerocom

Returns

True, if variable is supported by this interface, else False

Return type

bool

property verbosity_level

Current level of verbosity of logger

AERONET Inversion (V3)

class pyaerocom.io.read_aeronet_invv3.ReadAeronetInvV3(dataset_to_read=None, level=None)[source]

Bases: pyaerocom.io.readaeronetbase.ReadAeronetBase

Interface for reading Aeronet inversion V3 Level 1.5 and 2.0 data

Parameters

dataset_to_read – string specifying either of the supported datasets that are defined in SUPPORTED_DATASETS

ALT_VAR_NAMES_FILE = {}
AUX_FUNS = {'abs550aer': <function calc_abs550aer>, 'od550aer': <function calc_od550aer>}

Functions that are used to compute additional variables (i.e. one for each variable defined in AUX_REQUIRES)

AUX_REQUIRES = {'abs550aer': ['abs440aer', 'angabs4487aer'], 'od550aer': ['od440aer', 'ang4487aer']}

dictionary containing information about additionally required variables for each auxiliary variable (i.e. each variable that is not provided by the original data but computed on import)

property AUX_VARS

List of auxiliary variables (keys of attr. AUX_REQUIRES)

Auxiliary variables are those that are not included in original files but are computed from other variables during import

COL_DELIM = ','
property DATASET_PATH

Path to datafiles of specified dataset

Is retrieved automatically (if not specified explicitly on class instantiation), based on network ID (DATA_ID) using get_obsnetwork_dir() (which uses the information in pyaerocom.const).

DATA_ID = 'AeronetInvV3Lev2.daily'

Name of dataset (OBS_ID)

DATA_LEVELS = {1.5: 'AeronetInvV3Lev1.5.daily', 2.0: 'AeronetInvV3Lev2.daily'}

Mapping for dataset location for different data levels that can be read with this interface (can be used when creating the object)

DEFAULT_UNIT = '1'
DEFAULT_VARS = ['abs550aer', 'od550aer']

default variables for read method

IGNORE_META_KEYS = ['date', 'time', 'day_of_year']
INSTRUMENT_NAME = 'sun_photometer'
META_NAMES_FILE = {'altitude': 'Elevation(m)', 'date': 'Date(dd:mm:yyyy)', 'day_of_year': 'Day_of_Year(fraction)', 'latitude': 'Latitude(Degrees)', 'longitude': 'Longitude(Degrees)', 'station_name': 'AERONET_Site', 'time': 'Time(hh:mm:ss)'}

dictionary specifying the file column names (values) for each metadata key (cf. attributes of StationData, e.g. ‘station_name’, ‘longitude’, ‘latitude’, ‘altitude’)

META_NAMES_FILE_ALT = ({},)
NAN_VAL = -999.0

value corresponding to invalid measurement

PROVIDES_VARIABLES = ['abs440aer', 'angabs4487aer', 'od440aer', 'ang4487aer']

List of variables that are provided by this dataset (will be extended by auxiliary variables on class init, for details see __init__ method of base class ReadUngriddedBase)

property REVISION_FILE

Name of revision file located in data directory

SUPPORTED_DATASETS = ['AeronetInvV3Lev2.daily', 'AeronetInvV3Lev1.5.daily']

List of all datasets supported by this interface

property TS_TYPE

Default implementation of string for temporal resolution

TS_TYPES = {'AeronetInvV3Lev1.5.daily': 'daily', 'AeronetInvV3Lev2.daily': 'daily'}

dictionary assigning temporal resolution flags for supported datasets that are provided in a defined temporal resolution

UNITS = {}
VAR_NAMES_FILE = {'abs440aer': 'Absorption_AOD[440nm]', 'ang4487aer': 'Extinction_Angstrom_Exponent_440-870nm-Total', 'angabs4487aer': 'Absorption_Angstrom_Exponent_440-870nm', 'od440aer': 'AOD_Extinction-Total[440nm]'}

dictionary specifying the file column names (values) for each Aerocom variable (keys)

VAR_PATTERNS_FILE = {}
change_data_level(level)[source]

Change level of Inversion data

Parameters

level (float or int) – data level (choose from 1.5 or 2)

Raises

ValueError – if input level is not available
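Using the DATA_LEVELS mapping documented above, the level switch can be sketched as follows (illustrative only; the real method also updates the reader's dataset paths):

```python
# mapping documented in the DATA_LEVELS class attribute
DATA_LEVELS = {1.5: "AeronetInvV3Lev1.5.daily", 2.0: "AeronetInvV3Lev2.daily"}

def change_data_level(level):
    """Return the dataset ID for the requested level, raising
    ValueError for unsupported levels."""
    if level not in DATA_LEVELS:
        raise ValueError(
            f"Invalid level {level}, choose from {list(DATA_LEVELS)}")
    return DATA_LEVELS[level]

print(change_data_level(2.0))  # dataset ID for level 2 daily data
```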

check_vars_to_retrieve(vars_to_retrieve)

Separate variables that are in file from those that are computed

Some of the variables provided by this interface are not included in the data files but are computed within this class during data import (e.g. od550aer, ang4487aer).

The latter may require additional parameters to be retrieved from the file, which is specified in the class header (cf. attribute AUX_REQUIRES).

This function checks the input list that specifies all required variables and separates them into two lists, one that includes all variables that can be read from the files and a second list that specifies all variables that are computed in this class.

Parameters

vars_to_retrieve (list) – all parameter names that are supposed to be loaded

Returns

2-element tuple, containing

  • list: list containing all variables to be read

  • list: list containing all variables to be computed

Return type

tuple

property col_index

Dictionary that specifies the index for each data column

Note

Implementation depends on the data. For instance, if the variable information is provided in all files (of all stations) and always in the same column, then this can be set as a fixed dictionary in the __init__ function of the implementation (see e.g. class ReadAeronetSunV2). In other cases, it may not be ensured that each variable is available in all files or the column definition may differ between different stations. In the latter case you may automate the column index retrieval by providing the header names for each meta and data column you want to extract using the attribute dictionaries META_NAMES_FILE and VAR_NAMES_FILE, and by calling _update_col_index() in your implementation of read_file() when you reach the line that contains the header information.

compute_additional_vars(data, vars_to_compute)

Compute all additional variables

The computations for each additional parameter are done using the specified methods in AUX_FUNS.

Parameters
  • data (dict-like) – data object containing data vectors for variables that are required for computation (cf. input param vars_to_compute)

  • vars_to_compute (list) – list of variable names that are supposed to be computed. Variables that are required for the computation of the variables need to be specified in AUX_VARS and need to be available as data vectors in the provided data dictionary (key is the corresponding variable name of the required variable).

Returns

updated data object now containing also computed variables

Return type

dict

property data_id

Wrapper for DATA_ID (pyaerocom standard name)

property data_revision

Revision string from file Revision.txt in the main data directory

property dataset_to_read
find_in_file_list(pattern=None)

Find all files that match a certain wildcard pattern

Parameters

pattern (str, optional) – wildcard pattern that may be used to narrow down the search (e.g. use pattern=*Berlin* to find only files that contain Berlin in their filename)

Returns

list containing all files in files that match pattern

Return type

list

Raises

IOError – if no matches can be found

get_file_list(pattern=None)

Search all files to be read

Uses _FILEMASK (+ optional input search pattern, e.g. station_name) to find valid files for query.

Parameters

pattern (str, optional) – file name pattern applied to search

Returns

list containing retrieved file locations

Return type

list

Raises

IOError – if no files can be found

infer_wavelength_colname(colname, low=250, high=2000)

Get variable wavelength from column name

Parameters
  • colname (str) – string of column name

  • low (int) – lower limit of accepted value range

  • high (int) – upper limit of accepted value range

Returns

wavelength in nm, as string representation of a floating point number

Return type

str

Raises

ValueError – if no number or more than one number is detected in the column name

print_all_columns()
read(vars_to_retrieve=None, files=None, first_file=None, last_file=None, file_pattern=None, common_meta=None)

Read a list of files into an instance of UngriddedData

Parameters
  • vars_to_retrieve (list or similar, optional) – list containing variable IDs that are supposed to be read. If None, all variables in PROVIDES_VARIABLES are loaded

  • files (list, optional) – list of files to be read. If None, then the file list is used that is returned on get_file_list().

  • first_file (int, optional) – index of first file in file list to read. If None, the very first file in the list is used. Note: is ignored if input parameter file_pattern is specified.

  • last_file (int, optional) – index of last file in list to read. If None, the very last file in the list is used. Note: is ignored if input parameter file_pattern is specified.

  • file_pattern (str, optional) – string pattern for file search (cf get_file_list())

  • common_meta (dict, optional) – dictionary that contains additional metadata shared for this network (assigned to each metadata block of the UngriddedData object that is returned)

Returns

data object

Return type

UngriddedData

read_file(filename, vars_to_retrieve=None, vars_as_series=False)[source]

Read Aeronet file containing results from the v2 inversion algorithm

Parameters
  • filename (str) – absolute path to filename to read

  • vars_to_retrieve (list) – list of str with variable names to read

  • vars_as_series (bool) – if True, the data columns of all variables in the result dictionary are converted into pandas Series objects

Returns

dict-like object containing results

Return type

StationData

Example

>>> import pyaerocom.io as pio
>>> obj = pio.read_aeronet_invv2.ReadAeronetInvV2()
>>> files = obj.get_file_list()
>>> filedata = obj.read_file(files[0])
read_first_file(**kwargs)

Read first file returned from get_file_list()

Note

This method may be used for test purposes.

Parameters

**kwargs – keyword args passed to read_file() (e.g. vars_to_retrieve)

Returns

dictionary or similar containing loaded results from first file

Return type

dict-like

read_station(station_id_filename, **kwargs)

Read data from a single station into UngriddedData

Find all files that contain the station ID in their filename and then call read(), providing the reduced file list as input, in order to read all files from this station into the data object.

Parameters
  • station_id_filename (str) – name of station (MUST be encoded in the filename)

  • **kwargs – additional keyword args passed to read() (e.g. vars_to_retrieve)

Returns

loaded data

Return type

UngriddedData

Raises

IOError – if no files can be found for this station ID
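The station matching step amounts to a substring filter over the file list; the helper below is an illustrative sketch, not pyaerocom's actual implementation:

```python
def find_station_files(files, station_id):
    """Return all files whose name contains the station ID."""
    matches = [f for f in files if station_id in f]
    if not matches:
        # mirrors the documented IOError when no files match
        raise IOError(f"No files found for station {station_id}")
    return matches

# fabricated filenames for demonstration
files = ["920801_180101_Berlin_FUB.lev20", "920801_180101_Cabauw.lev20"]
print(find_station_files(files, "Berlin"))
```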

remove_outliers(data, vars_to_retrieve, **valid_rng_vars)

Remove outliers from data

Parameters
  • data (dict-like) – data object containing data vectors for variables that are required for computation (cf. input param vars_to_compute)

  • vars_to_retrieve (list) – list of variable names for which outliers will be removed from data

  • **valid_rng_vars – additional keyword args specifying, per variable name, the min / max interval (list or tuple) that defines the valid range for that variable. For each variable that is not explicitly defined here, the default minimum / maximum values are used (accessed via pyaerocom.const.VARS[var_name])
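A simplified sketch of such range-based outlier masking (the default valid range below is made up for illustration; pyaerocom reads the real defaults from its variable configuration):

```python
import math

# hypothetical default valid ranges, standing in for pyaerocom.const.VARS
DEFAULT_RANGES = {"od550aer": (0.0, 10.0)}

def remove_outliers(data, vars_to_retrieve, **valid_rng_vars):
    for var in vars_to_retrieve:
        # explicit keyword range wins over the default range
        low, high = valid_rng_vars.get(var, DEFAULT_RANGES[var])
        # replace out-of-range values with NaN
        data[var] = [v if low <= v <= high else math.nan for v in data[var]]
    return data

data = remove_outliers({"od550aer": [0.12, 50.0, 0.3]}, ["od550aer"])
print(data["od550aer"])
```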

var_supported(var_name)

Check if input variable is supported

Parameters

var_name (str) – AeroCom variable name or alias

Raises

VariableDefinitionError – if input variable is not supported by pyaerocom

Returns

True if the variable is supported by this interface, else False

Return type

bool

property verbosity_level

Current level of verbosity of logger

AERONET (older versions)

class pyaerocom.io.read_aeronet_sunv2.ReadAeronetSunV2(dataset_to_read=None)[source]

Bases: pyaerocom.io.readaeronetbase.ReadAeronetBase

Interface for reading Aeronet direct sun version 2 Level 2.0 data

See also

Base classes ReadAeronetBase and ReadUngriddedBase

Parameters

dataset_to_read – string specifying one of the supported datasets that are defined in SUPPORTED_DATASETS.

ALT_VAR_NAMES_FILE = {}
AUX_FUNS = {'ang4487aer_calc': <function calc_ang4487aer>, 'od550aer': <function calc_od550aer>}

Functions that are used to compute additional variables (i.e. one for each variable defined in AUX_REQUIRES)

AUX_REQUIRES = {'ang4487aer_calc': ['od440aer', 'od870aer'], 'od550aer': ['od440aer', 'od500aer', 'ang4487aer']}

dictionary containing information about additionally required variables for each auxiliary variable (i.e. each variable that is not provided by the original data but computed on import)

property AUX_VARS

List of auxiliary variables (keys of attr. AUX_REQUIRES)

Auxiliary variables are those that are not included in original files but are computed from other variables during import
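As a sketch of how such auxiliary variables can be derived, od550aer follows from od500aer and the 440-870 nm Ångström exponent via the Ångström power law. The stand-alone functions below are illustrative, not pyaerocom's actual calc_od550aer / calc_ang4487aer:

```python
import math

def calc_ang4487aer(od440aer, od870aer):
    # Ångström exponent between 440 and 870 nm
    return -math.log(od440aer / od870aer) / math.log(440 / 870)

def calc_od550aer(od500aer, ang4487aer):
    # Ångström power law: od(l2) = od(l1) * (l2 / l1) ** (-alpha)
    return od500aer * (550 / 500) ** (-ang4487aer)

ang = calc_ang4487aer(od440aer=0.30, od870aer=0.12)
print(round(calc_od550aer(od500aer=0.25, ang4487aer=ang), 4))
```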

COL_DELIM = ','
COL_INDEX = {'ang4487aer': 37, 'date': 0, 'julien_day': 2, 'od1020aer': 4, 'od1640aer': 3, 'od340aer': 18, 'od380aer': 17, 'od440aer': 15, 'od500aer': 12, 'od531aer': 11, 'od532aer': 10, 'od551aer': 9, 'od555aer': 8, 'od667aer': 7, 'od675aer': 6, 'od870aer': 5, 'time': 1}

Dictionary that specifies the index for each data column
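In practice COL_INDEX maps straight into a split CSV row. A minimal sketch (the row and the column subset below are fabricated, with far fewer columns than a real Aeronet file):

```python
# subset of the documented column index, applied to a fabricated data row
COL_INDEX = {"date": 0, "time": 1, "od870aer": 5, "od500aer": 12}
NAN_VAL = -9999.0  # value corresponding to invalid measurement

row = "01:01:2010,12:00:00,1,0.08,0.10,0.15,x,x,x,x,x,x,0.25".split(",")

def get_value(row, var):
    """Look up a variable in the split row; map NAN_VAL to None."""
    val = float(row[COL_INDEX[var]])
    return None if val == NAN_VAL else val

print(get_value(row, "od500aer"))
```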

property DATASET_PATH

Path to datafiles of specified dataset

Is retrieved automatically (if not specified explicitly on class instantiation), based on network ID (DATA_ID) using get_obsnetwork_dir() (which uses the information in pyaerocom.const).

DATA_ID = 'AeronetSunV2Lev2.daily'

Name of dataset (OBS_ID)

DEFAULT_UNIT = '1'
DEFAULT_VARS = ['od550aer']

default variables for read method

IGNORE_META_KEYS = ['date', 'time', 'day_of_year']
INSTRUMENT_NAME = 'sun_photometer'
META_NAMES_FILE = {}
META_NAMES_FILE_ALT = ({},)
NAN_VAL = -9999.0

value corresponding to invalid measurement

PROVIDES_VARIABLES = ['od1640aer', 'od1020aer', 'od870aer', 'od675aer', 'od667aer', 'od555aer', 'od551aer', 'od532aer', 'od531aer', 'od500aer', 'od440aer', 'od380aer', 'od340aer', 'ang4487aer']

List of variables that are provided by this dataset (will be extended by auxiliary variables on class init, for details see __init__ method of base class ReadUngriddedBase)

property REVISION_FILE

Name of revision file located in data directory

SUPPORTED_DATASETS = ['AeronetSunV2Lev2.daily', 'AeronetSunV2Lev2.AP']

List of all datasets supported by this interface

property TS_TYPE

Default implementation of string for temporal resolution

TS_TYPES = {'AeronetSunV2Lev2.daily': 'daily'}

dictionary assigning temporal resolution flags for supported datasets that are provided in a defined temporal resolution

UNITS = {}
VAR_NAMES_FILE = {}
VAR_PATTERNS_FILE = {}
check_vars_to_retrieve(vars_to_retrieve)

Separate variables that are in file from those that are computed

Some of the variables provided by this interface are not included in the data files but are computed within this class during data import (e.g. od550aer, ang4487aer).

The latter may require additional parameters to be retrieved from the file, which is specified in the class header (cf. attribute AUX_REQUIRES).

This function checks the input list that specifies all required variables and separates them into two lists, one that includes all variables that can be read from the files and a second list that specifies all variables that are computed in this class.

Parameters

vars_to_retrieve (list) – all parameter names that are supposed to be loaded

Returns

2-element tuple, containing

  • list: list containing all variables to be read

  • list: list containing all variables to be computed

Return type

tuple
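A minimal sketch of this separation logic, assuming hypothetical AUX_REQUIRES contents (the real mappings live in the class attributes described above):

```python
# assumed example mapping for one computed variable
AUX_REQUIRES = {"od550aer": ["od440aer", "od500aer", "ang4487aer"]}

def check_vars_to_retrieve(vars_to_retrieve):
    """Split requested variables into (vars_to_read, vars_to_compute)."""
    vars_to_read, vars_to_compute = [], []
    for var in vars_to_retrieve:
        if var in AUX_REQUIRES:
            vars_to_compute.append(var)
            # the inputs of a computed variable must be read from file
            for req in AUX_REQUIRES[var]:
                if req not in vars_to_read:
                    vars_to_read.append(req)
        else:
            vars_to_read.append(var)
    return vars_to_read, vars_to_compute

print(check_vars_to_retrieve(["od550aer", "od870aer"]))
```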

property col_index

Dictionary that specifies the index for each data column

Note

Overload of method in base class

compute_additional_vars(data, vars_to_compute)

Compute all additional variables

The computations for each additional parameter are done using the specified methods in AUX_FUNS.

Parameters
  • data (dict-like) – data object containing data vectors for variables that are required for computation (cf. input param vars_to_compute)

  • vars_to_compute (list) – list of variable names that are supposed to be computed. Variables that are required for the computation of the variables need to be specified in AUX_VARS and need to be available as data vectors in the provided data dictionary (key is the corresponding variable name of the required variable).

Returns

updated data object now containing also computed variables

Return type

dict

property data_id

Wrapper for DATA_ID (pyaerocom standard name)

property data_revision

Revision string from file Revision.txt in the main data directory

property dataset_to_read
find_in_file_list(pattern=None)

Find all files that match a certain wildcard pattern

Parameters

pattern (str, optional) – wildcard pattern that may be used to narrow down the search (e.g. use pattern=*Berlin* to find only files that contain Berlin in their filename)

Returns

list containing all files in files that match pattern

Return type

list

Raises

IOError – if no matches can be found

get_file_list(pattern=None)

Search all files to be read

Uses _FILEMASK (+ optional input search pattern, e.g. station_name) to find valid files for query.

Parameters

pattern (str, optional) – file name pattern applied to search

Returns

list containing retrieved file locations

Return type

list

Raises

IOError – if no files can be found

infer_wavelength_colname(colname, low=250, high=2000)

Get variable wavelength from column name

Parameters
  • colname (str) – string of column name

  • low (int) – lower limit of accepted value range

  • high (int) – upper limit of accepted value range

Returns

wavelength in nm, as string of a floating point number

Return type

str

Raises

ValueError – if no number or more than one number is detected in the column name

print_all_columns()
read(vars_to_retrieve=None, files=None, first_file=None, last_file=None, file_pattern=None, common_meta=None)

Read a list of files into an instance of UngriddedData

Parameters
  • vars_to_retrieve (list or similar, optional) – list containing variable IDs that are supposed to be read. If None, all variables in PROVIDES_VARIABLES are loaded

  • files (list, optional) – list of files to be read. If None, then the file list is used that is returned on get_file_list().

  • first_file (int, optional) – index of first file in file list to read. If None, the very first file in the list is used. Note: is ignored if input parameter file_pattern is specified.

  • last_file (int, optional) – index of last file in list to read. If None, the very last file in the list is used. Note: is ignored if input parameter file_pattern is specified.

  • file_pattern (str, optional) – string pattern for file search (cf get_file_list())

  • common_meta (dict, optional) – dictionary that contains additional metadata shared for this network (assigned to each metadata block of the UngriddedData object that is returned)

Returns

data object

Return type

UngriddedData

read_file(filename, vars_to_retrieve=None, vars_as_series=False)[source]

Read Aeronet Sun V2 level 2 file

Parameters
  • filename (str) – absolute path to filename to read

  • vars_to_retrieve (list, optional) – list of str with variable names to read. If None, use DEFAULT_VARS

  • vars_as_series (bool) – if True, the data columns of all variables in the result dictionary are converted into pandas Series objects

Returns

dict-like object containing results

Return type

StationData

Example

>>> import pyaerocom.io.read_aeronet_sunv2
>>> obj = pyaerocom.io.read_aeronet_sunv2.ReadAeronetSunV2()
>>> files = obj.get_file_list()
>>> filedata = obj.read_file(files[0])
read_first_file(**kwargs)

Read first file returned from get_file_list()

Note

This method may be used for test purposes.

Parameters

**kwargs – keyword args passed to read_file() (e.g. vars_to_retrieve)

Returns

dictionary or similar containing loaded results from first file

Return type

dict-like

read_station(station_id_filename, **kwargs)

Read data from a single station into UngriddedData

Find all files that contain the station ID in their filename and then call read(), providing the reduced file list as input, in order to read all files from this station into the data object.

Parameters
  • station_id_filename (str) – name of station (MUST be encoded in the filename)

  • **kwargs – additional keyword args passed to read() (e.g. vars_to_retrieve)

Returns

loaded data

Return type

UngriddedData

Raises

IOError – if no files can be found for this station ID

remove_outliers(data, vars_to_retrieve, **valid_rng_vars)

Remove outliers from data

Parameters
  • data (dict-like) – data object containing data vectors for variables that are required for computation (cf. input param vars_to_compute)

  • vars_to_retrieve (list) – list of variable names for which outliers will be removed from data

  • **valid_rng_vars – additional keyword args specifying, per variable name, the min / max interval (list or tuple) that defines the valid range for that variable. For each variable that is not explicitly defined here, the default minimum / maximum values are used (accessed via pyaerocom.const.VARS[var_name])

var_supported(var_name)

Check if input variable is supported

Parameters

var_name (str) – AeroCom variable name or alias

Raises

VariableDefinitionError – if input variable is not supported by pyaerocom

Returns

True if the variable is supported by this interface, else False

Return type

bool

property verbosity_level

Current level of verbosity of logger

class pyaerocom.io.read_aeronet_sdav2.ReadAeronetSdaV2(dataset_to_read=None)[source]

Bases: pyaerocom.io.readaeronetbase.ReadAeronetBase

Interface for reading Aeronet Sun SDA V2 Level 2 data

Parameters

dataset_to_read – string specifying one of the supported datasets that are defined in SUPPORTED_DATASETS.

ALT_VAR_NAMES_FILE = {}
AUX_FUNS = {'ang4487aer': <function calc_ang4487aer>, 'od550aer': <function calc_od550aer>, 'od550gt1aer': <function calc_od550gt1aer>, 'od550lt1aer': <function calc_od550lt1aer>}

Functions that are used to compute additional variables (i.e. one for each variable defined in AUX_REQUIRES)

AUX_REQUIRES = {'ang4487aer': ['od440aer', 'od870aer'], 'od550aer': ['od500aer', 'ang4487aer'], 'od550gt1aer': ['od500gt1aer', 'ang4487aer'], 'od550lt1aer': ['od500lt1aer', 'ang4487aer']}

dictionary containing information about additionally required variables for each auxiliary variable (i.e. each variable that is not provided by the original data but computed on import)

property AUX_VARS

List of auxiliary variables (keys of attr. AUX_REQUIRES)

Auxiliary variables are those that are not included in original files but are computed from other variables during import

COL_DELIM = ','
COL_INDEX = {'_ang50aer': 11, '_aod500aer_fiterr': 7, '_aod500gt1aer_rmse': 9, '_aod500lt1aer_rmse': 8, '_eta500lt1': 6, '_eta500lt1_rmse': 10, 'date': 0, 'julien_day': 2, 'od380aer': 27, 'od412aer': 26, 'od440aer': 25, 'od443aer': 24, 'od490aer': 23, 'od500aer': 3, 'od500aer_input': 22, 'od500gt1aer': 5, 'od500lt1aer': 4, 'od531aer': 21, 'od532aer': 20, 'od551aer': 19, 'od555aer': 18, 'od667aer': 17, 'od675aer': 16, 'od870aer': 15, 'time': 1}

Dictionary that specifies the index for each data column

property DATASET_PATH

Path to datafiles of specified dataset

Is retrieved automatically (if not specified explicitly on class instantiation), based on network ID (DATA_ID) using get_obsnetwork_dir() (which uses the information in pyaerocom.const).

DATA_ID = 'AeronetSDAV2Lev2.daily'

Name of dataset (OBS_ID)

DEFAULT_UNIT = '1'
DEFAULT_VARS = ['od550aer', 'od550gt1aer', 'od550lt1aer']

default variables for read method

IGNORE_META_KEYS = ['date', 'time', 'day_of_year']
INSTRUMENT_NAME = 'sun_photometer'
META_NAMES_FILE = {}
META_NAMES_FILE_ALT = ({},)
NAN_VAL = -9999.0

value corresponding to invalid measurement

PROVIDES_VARIABLES = ['date', 'time', 'julien_day', 'od500aer', 'od500lt1aer', 'od500gt1aer', '_eta500lt1', '_aod500aer_fiterr', '_aod500lt1aer_rmse', '_aod500gt1aer_rmse', '_eta500lt1_rmse', '_ang50aer', 'od870aer', 'od675aer', 'od667aer', 'od555aer', 'od551aer', 'od532aer', 'od531aer', 'od500aer_input', 'od490aer', 'od443aer', 'od440aer', 'od412aer', 'od380aer']

List of variables that are provided by this dataset (will be extended by auxiliary variables on class init, for details see __init__ method of base class ReadUngriddedBase)

property REVISION_FILE

Name of revision file located in data directory

SUPPORTED_DATASETS = ['AeronetSDAV2Lev2.daily']

List of all datasets supported by this interface

property TS_TYPE

Default implementation of string for temporal resolution

TS_TYPES = {'AeronetSDAV2Lev2.daily': 'daily'}

dictionary assigning temporal resolution flags for supported datasets that are provided in a defined temporal resolution

UNITS = {}
VAR_NAMES_FILE = {}
VAR_PATTERNS_FILE = {}
check_vars_to_retrieve(vars_to_retrieve)

Separate variables that are in file from those that are computed

Some of the variables provided by this interface are not included in the data files but are computed within this class during data import (e.g. od550aer, ang4487aer).

The latter may require additional parameters to be retrieved from the file, which is specified in the class header (cf. attribute AUX_REQUIRES).

This function checks the input list that specifies all required variables and separates them into two lists, one that includes all variables that can be read from the files and a second list that specifies all variables that are computed in this class.

Parameters

vars_to_retrieve (list) – all parameter names that are supposed to be loaded

Returns

2-element tuple, containing

  • list: list containing all variables to be read

  • list: list containing all variables to be computed

Return type

tuple

property col_index

Dictionary that specifies the index for each data column

Note

Pointer to COL_INDEX

compute_additional_vars(data, vars_to_compute)

Compute all additional variables

The computations for each additional parameter are done using the specified methods in AUX_FUNS.

Parameters
  • data (dict-like) – data object containing data vectors for variables that are required for computation (cf. input param vars_to_compute)

  • vars_to_compute (list) – list of variable names that are supposed to be computed. Variables that are required for the computation of the variables need to be specified in AUX_VARS and need to be available as data vectors in the provided data dictionary (key is the corresponding variable name of the required variable).

Returns

updated data object now containing also computed variables

Return type

dict

property data_id

Wrapper for DATA_ID (pyaerocom standard name)

property data_revision

Revision string from file Revision.txt in the main data directory

property dataset_to_read
find_in_file_list(pattern=None)

Find all files that match a certain wildcard pattern

Parameters

pattern (str, optional) – wildcard pattern that may be used to narrow down the search (e.g. use pattern=*Berlin* to find only files that contain Berlin in their filename)

Returns

list containing all files in files that match pattern

Return type

list

Raises

IOError – if no matches can be found

get_file_list(pattern=None)

Search all files to be read

Uses _FILEMASK (+ optional input search pattern, e.g. station_name) to find valid files for query.

Parameters

pattern (str, optional) – file name pattern applied to search

Returns

list containing retrieved file locations

Return type

list

Raises

IOError – if no files can be found

infer_wavelength_colname(colname, low=250, high=2000)

Get variable wavelength from column name

Parameters
  • colname (str) – string of column name

  • low (int) – lower limit of accepted value range

  • high (int) – upper limit of accepted value range

Returns

wavelength in nm, as string of a floating point number

Return type

str

Raises

ValueError – if no number or more than one number is detected in the column name

print_all_columns()
read(vars_to_retrieve=None, files=None, first_file=None, last_file=None, file_pattern=None, common_meta=None)

Read a list of files into an instance of UngriddedData

Parameters
  • vars_to_retrieve (list or similar, optional) – list containing variable IDs that are supposed to be read. If None, all variables in PROVIDES_VARIABLES are loaded

  • files (list, optional) – list of files to be read. If None, then the file list is used that is returned on get_file_list().

  • first_file (int, optional) – index of first file in file list to read. If None, the very first file in the list is used. Note: is ignored if input parameter file_pattern is specified.

  • last_file (int, optional) – index of last file in list to read. If None, the very last file in the list is used. Note: is ignored if input parameter file_pattern is specified.

  • file_pattern (str, optional) – string pattern for file search (cf get_file_list())

  • common_meta (dict, optional) – dictionary that contains additional metadata shared for this network (assigned to each metadata block of the UngriddedData object that is returned)

Returns

data object

Return type

UngriddedData

read_file(filename, vars_to_retrieve=None, vars_as_series=False)[source]

Read Aeronet Sun SDA V2 file

Parameters
  • filename (str) – absolute path to filename to read

  • vars_to_retrieve (list, optional) – list of str with variable names to read. If None, use DEFAULT_VARS

  • vars_as_series (bool) – if True, the data columns of all variables in the result dictionary are converted into pandas Series objects

Returns

dict-like object containing results

Return type

StationData

read_first_file(**kwargs)

Read first file returned from get_file_list()

Note

This method may be used for test purposes.

Parameters

**kwargs – keyword args passed to read_file() (e.g. vars_to_retrieve)

Returns

dictionary or similar containing loaded results from first file

Return type

dict-like

read_station(station_id_filename, **kwargs)

Read data from a single station into UngriddedData

Find all files that contain the station ID in their filename and then call read(), providing the reduced file list as input, in order to read all files from this station into the data object.

Parameters
  • station_id_filename (str) – name of station (MUST be encoded in the filename)

  • **kwargs – additional keyword args passed to read() (e.g. vars_to_retrieve)

Returns

loaded data

Return type

UngriddedData

Raises

IOError – if no files can be found for this station ID

remove_outliers(data, vars_to_retrieve, **valid_rng_vars)

Remove outliers from data

Parameters
  • data (dict-like) – data object containing data vectors for variables that are required for computation (cf. input param vars_to_compute)

  • vars_to_retrieve (list) – list of variable names for which outliers will be removed from data

  • **valid_rng_vars – additional keyword args specifying, per variable name, the min / max interval (list or tuple) that defines the valid range for that variable. For each variable that is not explicitly defined here, the default minimum / maximum values are used (accessed via pyaerocom.const.VARS[var_name])

var_supported(var_name)

Check if input variable is supported

Parameters

var_name (str) – AeroCom variable name or alias

Raises

VariableDefinitionError – if input variable is not supported by pyaerocom

Returns

True if the variable is supported by this interface, else False

Return type

bool

property verbosity_level

Current level of verbosity of logger

class pyaerocom.io.read_aeronet_invv2.ReadAeronetInvV2(dataset_to_read=None, level=None)[source]

Bases: pyaerocom.io.readaeronetbase.ReadAeronetBase

Interface for reading Aeronet inversion V2 Level 1.5 and 2.0 data

Parameters

dataset_to_read – string specifying one of the supported datasets that are defined in SUPPORTED_DATASETS

ALT_VAR_NAMES_FILE = {'abs440aer': ['AOTAbsp439-T', 'AOTAbsp441-T', 'AOTAbsp438-T', 'AOTAbsp437-T', 'AOTAbsp442-T'], 'od440aer': ['AOTExt439-T', 'AOTExt441-T', 'AOTExt438-T', 'AOTExt437-T', 'AOTExt442-T'], 'ssa1020aer': ['SSA1022-T', 'SSA1016-T', 'SSA1018-T'], 'ssa440aer': ['SSA439-T', 'SSA441-T', 'SSA438-T', 'SSA437-T', 'SSA442-T'], 'ssa675aer': ['SSA676-T', 'SSA673-T', 'SSA674-T', 'SSA669-T', 'SSA677-T', 'SSA668-T', 'SSA672-T'], 'ssa870aer': ['SSA871-T', 'SSA869-T', 'SSA868-T', 'SSA873-T', 'SSA867-T', 'SSA872-T']}

dictionary specifying alternative column names for variables defined in VAR_NAMES_FILE. Check attribute _alt_vars_cols after running read().

Type

OPTIONAL

AUX_FUNS = {'abs550aer': <function calc_abs550aer>, 'od550aer': <function calc_od550aer>}

Functions that are used to compute additional variables (i.e. one for each variable defined in AUX_REQUIRES)

AUX_REQUIRES = {'abs550aer': ['abs440aer', 'angabs4487aer'], 'od550aer': ['od440aer', 'ang4487aer']}

dictionary containing information about additionally required variables for each auxiliary variable (i.e. each variable that is not provided by the original data but computed on import)

property AUX_VARS

List of auxiliary variables (keys of attr. AUX_REQUIRES)

Auxiliary variables are those that are not included in original files but are computed from other variables during import

COL_DELIM = ','
property DATASET_PATH

Path to datafiles of specified dataset

Is retrieved automatically (if not specified explicitly on class instantiation), based on network ID (DATA_ID) using get_obsnetwork_dir() (which uses the information in pyaerocom.const).

DATA_ID = 'AeronetInvV2Lev2.daily'

Name of dataset (OBS_ID)

DATA_LEVELS = {1.5: 'AeronetInvV2Lev1.5.daily', 2.0: 'AeronetInvV2Lev2.daily'}

Mapping for dataset location for different data levels that can be read with this interface (can be used when creating the object)

DEFAULT_UNIT = '1'
DEFAULT_VARS = ['ssa675aer', 'ssa440aer', 'ssa870aer', 'ssa1020aer', 'abs550aer', 'od550aer']

default variables for read method

IGNORE_META_KEYS = ['date', 'time', 'day_of_year']
INSTRUMENT_NAME = 'sun_photometer'
META_NAMES_FILE = {'data_quality_level': 'DATA_TYPE', 'date': 'Date(dd-mm-yyyy)', 'day_of_year': 'Julian_Day', 'time': 'Time(hh:mm:ss)'}

dictionary specifying the file column names (values) for each metadata key (cf. attributes of StationData, e.g. ‘station_name’, ‘longitude’, ‘latitude’, ‘altitude’)

META_NAMES_FILE_ALT = ({},)
NAN_VAL = -9999.0

value corresponding to invalid measurement

PROVIDES_VARIABLES = ['ssa440aer', 'ssa675aer', 'ssa870aer', 'ssa1020aer', 'od440aer', 'ang4487aer', 'abs440aer', 'angabs4487aer']

List of variables that are provided by this dataset (will be extended by auxiliary variables on class init, for details see __init__ method of base class ReadUngriddedBase)

property REVISION_FILE

Name of revision file located in data directory

SUPPORTED_DATASETS = ['AeronetInvV2Lev2.daily', 'AeronetInvV2Lev1.5.daily']

List of all datasets supported by this interface

property TS_TYPE

Default implementation of string for temporal resolution

TS_TYPES = {'AeronetInvV2Lev1.5.daily': 'daily', 'AeronetInvV2Lev2.daily': 'daily'}

dictionary assigning temporal resolution flags for supported datasets that are provided in a defined temporal resolution

UNITS = {}
VAR_NAMES_FILE = {'abs440aer': 'AOTAbsp440-T', 'ang4487aer': '870-440AngstromParam.[AOTExt]-Total', 'angabs4487aer': '870-440AngstromParam.[AOTAbsp]', 'od440aer': 'AOTExt440-T', 'ssa1020aer': 'SSA1020-T', 'ssa440aer': 'SSA440-T', 'ssa675aer': 'SSA675-T', 'ssa870aer': 'SSA870-T'}

dictionary specifying the file column names (values) for each Aerocom variable (keys)

VAR_PATTERNS_FILE = {}
change_data_level(level)[source]

Change level of Inversion data

Parameters

level (float or int) – data level (choose from 1.5 or 2)

Raises

ValueError – if input level is not available
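A sketch of the level switch against the DATA_LEVELS mapping shown above (a stand-alone function for illustration, not the actual method, which also updates the reader's dataset state):

```python
DATA_LEVELS = {1.5: "AeronetInvV2Lev1.5.daily", 2.0: "AeronetInvV2Lev2.daily"}

def change_data_level(level):
    """Return the dataset ID for the requested inversion data level."""
    level = float(level)  # accept both int (2) and float (1.5, 2.0)
    if level not in DATA_LEVELS:
        raise ValueError(
            f"Level {level} not available, choose from {list(DATA_LEVELS)}")
    return DATA_LEVELS[level]

print(change_data_level(2))
```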

check_vars_to_retrieve(vars_to_retrieve)

Separate variables that are in file from those that are computed

Some of the variables provided by this interface are not included in the data files but are computed within this class during data import (e.g. od550aer, ang4487aer).

The latter may require additional parameters to be retrieved from the file, which is specified in the class header (cf. attribute AUX_REQUIRES).

This function checks the input list that specifies all required variables and separates them into two lists, one that includes all variables that can be read from the files and a second list that specifies all variables that are computed in this class.

Parameters

vars_to_retrieve (list) – all parameter names that are supposed to be loaded

Returns

2-element tuple, containing

  • list: list containing all variables to be read

  • list: list containing all variables to be computed

Return type

tuple

property col_index

Dictionary that specifies the index for each data column

Note

Implementation depends on the data. For instance, if the variable information is provided in all files (of all stations) and always in the same column, then the column index can be set as a fixed dictionary in the __init__ method of the implementation (see e.g. class ReadAeronetSunV2). In other cases, it may not be guaranteed that each variable is available in all files, or the column definition may differ between stations. In the latter case, the column index retrieval can be automated by providing the header names for each meta and data column to be extracted (via the attribute dictionaries META_NAMES_FILE and VAR_NAMES_FILE) and by calling _update_col_index() in the implementation of read_file() when reaching the line that contains the header information.

compute_additional_vars(data, vars_to_compute)

Compute all additional variables

The computations for each additional parameter are done using the specified methods in AUX_FUNS.

Parameters
  • data (dict-like) – data object containing data vectors for variables that are required for computation (cf. input param vars_to_compute)

  • vars_to_compute (list) – list of variable names that are supposed to be computed. Variables that are required for the computation of the variables need to be specified in AUX_VARS and need to be available as data vectors in the provided data dictionary (key is the corresponding variable name of the required variable).

Returns

updated data object now containing also computed variables

Return type

dict

property data_id

Wrapper for DATA_ID (pyaerocom standard name)

property data_revision

Revision string from file Revision.txt in the main data directory

property dataset_to_read
find_in_file_list(pattern=None)

Find all files that match a certain wildcard pattern

Parameters

pattern (str, optional) – wildcard pattern that may be used to narrow down the search (e.g. use pattern=*Berlin* to find only files that contain Berlin in their filename)

Returns

list containing all files in files that match pattern

Return type

list

Raises

IOError – if no matches can be found

get_file_list(pattern=None)

Search all files to be read

Uses _FILEMASK (+ optional input search pattern, e.g. station_name) to find valid files for query.

Parameters

pattern (str, optional) – file name pattern applied to search

Returns

list containing retrieved file locations

Return type

list

Raises

IOError – if no files can be found

infer_wavelength_colname(colname, low=250, high=2000)

Get variable wavelength from column name

Parameters
  • colname (str) – string of column name

  • low (int) – lower limit of accepted value range

  • high (int) – upper limit of accepted value range

Returns

wavelength in nm, as string of a floating point number

Return type

str

Raises

ValueError – if no number or more than one number is detected in the column name

print_all_columns()
read(vars_to_retrieve=None, files=None, first_file=None, last_file=None, file_pattern=None, common_meta=None)

Read a list of files into an instance of UngriddedData

Parameters
  • vars_to_retrieve (list or similar, optional) – list containing variable IDs that are supposed to be read. If None, all variables in PROVIDES_VARIABLES are loaded

  • files (list, optional) – list of files to be read. If None, then the file list is used that is returned on get_file_list().

  • first_file (int, optional) – index of first file in file list to read. If None, the very first file in the list is used. Note: is ignored if input parameter file_pattern is specified.

  • last_file (int, optional) – index of last file in list to read. If None, the very last file in the list is used. Note: is ignored if input parameter file_pattern is specified.

  • file_pattern (str, optional) – string pattern for file search (cf get_file_list())

  • common_meta (dict, optional) – dictionary that contains additional metadata shared for this network (assigned to each metadata block of the UngriddedData object that is returned)

Returns

data object

Return type

UngriddedData

read_file(filename, vars_to_retrieve=None, vars_as_series=False)[source]

Read Aeronet file containing results from the v2 inversion algorithm

Parameters
  • filename (str) – absolute path to filename to read

  • vars_to_retrieve (list) – list of str with variable names to read

  • vars_as_series (bool) – if True, the data columns of all variables in the result dictionary are converted into pandas Series objects

Returns

dict-like object containing results

Return type

StationData

Example

>>> import pyaerocom.io as pio
>>> obj = pio.read_aeronet_invv2.ReadAeronetInvV2()
>>> files = obj.get_file_list()
>>> filedata = obj.read_file(files[0])
read_first_file(**kwargs)

Read first file returned from get_file_list()

Note

This method may be used for test purposes.

Parameters

**kwargs – keyword args passed to read_file() (e.g. vars_to_retrieve)

Returns

dictionary or similar containing loaded results from first file

Return type

dict-like

read_station(station_id_filename, **kwargs)

Read data from a single station into UngriddedData

Find all files that contain the station ID in their filename and then call read(), providing the reduced filelist as input, in order to read all files from this station into data object.

Parameters
  • station_id_filename (str) – name of station (MUST be encoded in the filename)

  • **kwargs – additional keyword args passed to read() (e.g. vars_to_retrieve)

Returns

loaded data

Return type

UngriddedData

Raises

IOError – if no files can be found for this station ID

remove_outliers(data, vars_to_retrieve, **valid_rng_vars)

Remove outliers from data

Parameters
  • data (dict-like) – data object containing data vectors for variables that are required for computation (cf. input param vars_to_compute)

  • vars_to_retrieve (list) – list of variable names for which outliers will be removed from data

  • **valid_rng_vars – additional keyword args specifying variable name and corresponding min / max interval (list or tuple) that specifies valid range for the variable. For each variable that is not explicitly defined here, the default minimum / maximum value is used (accessed via pyaerocom.const.VARS[var_name])
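
A minimal sketch of this range-based filtering (assuming NumPy arrays; the default-range dictionary below stands in for pyaerocom.const.VARS and its values are made up):

```python
import numpy as np

# stand-in for pyaerocom.const.VARS[var].minimum / .maximum
DEFAULT_RANGES = {"od550aer": (0, 10)}

def remove_outliers(data, vars_to_retrieve, **valid_rng_vars):
    """Set values outside the valid range to NaN for each variable."""
    for var in vars_to_retrieve:
        lo, hi = valid_rng_vars.get(var, DEFAULT_RANGES[var])
        arr = np.asarray(data[var], dtype=float)
        arr[(arr < lo) | (arr > hi)] = np.nan
        data[var] = arr
    return data

data = {"od550aer": [0.1, -1.0, 0.5, 99.0]}
cleaned = remove_outliers(data, ["od550aer"])
```

Outliers are replaced by NaN rather than dropped, so array shapes and time indices stay aligned with the rest of the data object.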

var_supported(var_name)

Check if input variable is supported

Parameters

var_name (str) – AeroCom variable name or alias

Raises

VariableDefinitionError – if input variable is not supported by pyaerocom

Returns

True, if variable is supported by this interface, else False

Return type

bool

property verbosity_level

Current level of verbosity of logger

EARLINET

class pyaerocom.io.read_earlinet.ReadEarlinet(dataset_to_read=None)[source]

Bases: pyaerocom.io.readungriddedbase.ReadUngriddedBase

Interface for reading of EARLINET data

ALTITUDE_ID = 'Altitude'

variable name of altitude in files

AUX_FUNS = {}
AUX_REQUIRES = {}
property AUX_VARS

List of auxiliary variables (keys of attr. AUX_REQUIRES)

Auxiliary variables are those that are not included in original files but are computed from other variables during import

property DATASET_PATH

Path to datafiles of specified dataset

Is retrieved automatically (if not specified explicitly on class instantiation), based on network ID (DATA_ID) using get_obsnetwork_dir() (which uses the information in pyaerocom.const).

DATA_ID = 'EARLINET'

Name of dataset (OBS_ID)

DEFAULT_VARS = ['ec532aer']

default variables for read method

ERR_VARNAMES = {'ec355aer': 'ErrorExtinction', 'ec532aer': 'ErrorExtinction'}

Variable names of uncertainty data

EXCLUDE_CASES = ['cirrus.txt']
IGNORE_META_KEYS = []
KEEP_ADD_META = ['location', 'wavelength_emis', 'wavelength_det', 'res_raw_m', 'zenith_ang_deg', 'comments', 'shots_avg', 'detection_mode', 'res_eval', 'input_params', 'eval_method']

Metadata keys from META_NAMES_FILE that are additional to standard keys defined in StationMetaData and that are supposed to be inserted into UngriddedData object created in read()

META_NAMES_FILE = {'altitude': 'Altitude_meter_asl', 'comments': 'Comments', 'detection_mode': 'DetectionMode', 'eval_method': 'EvaluationMethod', 'input_params': 'InputParameters', 'instrument_name': 'System', 'latitude': 'Latitude_degrees_north', 'location': 'Location', 'longitude': 'Longitude_degrees_east', 'res_eval': 'ResolutionEvaluated', 'res_raw_m': 'ResolutionRaw_meter', 'shots_avg': 'ShotsAveraged', 'start_date': 'StartDate', 'start_utc': 'StartTime_UT', 'stop_utc': 'StopTime_UT', 'wavelength_det': 'DetectionWavelength_nm', 'wavelength_emis': 'EmissionWavelength_nm', 'zenith_ang_deg': 'ZenithAngle_degrees'}

metadata names that are supposed to be imported

META_NEEDED = ['Location', 'StartDate', 'StartTime_UT', 'StopTime_UT', 'Longitude_degrees_east', 'Latitude_degrees_north', 'Altitude_meter_asl']

metadata keys that are needed for reading (must be values in META_NAMES_FILE)

PROVIDES_VARIABLES = ['ec532aer', 'ec355aer', 'bsc532aer', 'bsc355aer', 'bsc1064aer', 'zdust']
READ_ERR = True

If true, the uncertainties are also read (where available, cf. ERR_VARNAMES)

property REVISION_FILE

Name of revision file located in data directory

SUPPORTED_DATASETS = ['EARLINET']

List of all datasets supported by this interface

TS_TYPE = 'native'

temporal resolution

VAR_NAMES_FILE = {'bsc1064aer': 'Backscatter', 'bsc355aer': 'Backscatter', 'bsc532aer': 'Backscatter', 'ec1064aer': 'Extinction', 'ec355aer': 'Extinction', 'ec532aer': 'Extinction', 'zdust': 'DustLayerHeight'}

dictionary specifying the file column names (values) for each Aerocom variable (keys)

VAR_PATTERNS_FILE = {'bsc1064aer': '*.b1064', 'bsc355aer': '*.b355', 'bsc532aer': '*.b532', 'ec355aer': '*.e355', 'ec532aer': '*.e532', 'zdust': '*.e*'}

dictionary specifying the file search patterns for each variable

VAR_UNIT_NAMES = {'Altitude': 'units', 'Backscatter': ['BackscatterUnits', 'units'], 'Extinction': ['ExtinctionUnits', 'units']}

Attribute access names for unit reading of variable data

check_vars_to_retrieve(vars_to_retrieve)

Separate variables that are in file from those that are computed

Some of the variables provided by this interface are not included in the data files but are computed within this class during data import (e.g. od550aer, ang4487aer).

The latter may require additional parameters to be retrieved from the file, which is specified in the class header (cf. attribute AUX_REQUIRES).

This function checks the input list that specifies all required variables and separates them into two lists, one that includes all variables that can be read from the files and a second list that specifies all variables that are computed in this class.

Parameters

vars_to_retrieve (list) – all parameter names that are supposed to be loaded

Returns

2-element tuple, containing

  • list: list containing all variables to be read

  • list: list containing all variables to be computed

Return type

tuple
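
The separation logic can be sketched as follows (a hedged sketch: the AUX_REQUIRES content here is a hypothetical example, and the real method may handle further details such as nested dependencies):

```python
# hypothetical mapping of computed variables to their required inputs
AUX_REQUIRES = {"ang4487aer": ["od440aer", "od870aer"]}

def check_vars_to_retrieve(vars_to_retrieve):
    """Split requested variables into (vars_to_read, vars_to_compute)."""
    vars_to_read, vars_to_compute = [], []
    for var in vars_to_retrieve:
        if var in AUX_REQUIRES:
            vars_to_compute.append(var)
            # ensure the required input variables are read as well
            for req in AUX_REQUIRES[var]:
                if req not in vars_to_read:
                    vars_to_read.append(req)
        elif var not in vars_to_read:
            vars_to_read.append(var)
    return vars_to_read, vars_to_compute

to_read, to_compute = check_vars_to_retrieve(["od550aer", "ang4487aer"])
```

Requesting the computed variable pulls in its input variables, so the file-reading step always has everything the computation step needs.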

compute_additional_vars(data, vars_to_compute)

Compute all additional variables

The computations for each additional parameter are done using the specified methods in AUX_FUNS.

Parameters
  • data (dict-like) – data object containing data vectors for variables that are required for computation (cf. input param vars_to_compute)

  • vars_to_compute (list) – list of variable names that are supposed to be computed. Variables that are required for the computation of the variables need to be specified in AUX_VARS and need to be available as data vectors in the provided data dictionary (key is the corresponding variable name of the required variable).

Returns

updated data object now containing also computed variables

Return type

dict

copy()[source]

Make and return a deepcopy of this object

property data_id

Wrapper for DATA_ID (pyaerocom standard name)

property data_revision

Revision string from file Revision.txt in the main data directory

property dataset_to_read
exclude_files

files that are supposed to be excluded from reading

excluded_files

files that were actually excluded from reading

find_in_file_list(pattern=None)

Find all files that match a certain wildcard pattern

Parameters

pattern (str, optional) – wildcard pattern that may be used to narrow down the search (e.g. use pattern=*Berlin* to find only files that contain Berlin in their filename)

Returns

list containing all files in files that match pattern

Return type

list

Raises

IOError – if no matches can be found

get_file_list(vars_to_retrieve=None, pattern=None)[source]

Perform recursive file search for all input variables

Note

Overloaded implementation of base class, since for Earlinet, the paths are variable dependent

Parameters
  • vars_to_retrieve (list) – list of variables to retrieve

  • pattern (str, optional) – file name pattern applied to search

Returns

list containing file paths

Return type

list

read(vars_to_retrieve=None, files=None, first_file=None, last_file=None, read_err=None, remove_outliers=True, pattern=None)[source]

Method that reads list of files as instance of UngriddedData

Parameters
  • vars_to_retrieve (list or similar, optional) – list containing variable IDs that are supposed to be read. If None, all variables in PROVIDES_VARIABLES are loaded

  • files (list, optional) – list of files to be read. If None, then the file list is used that is returned on get_file_list().

  • first_file (int, optional) – index of first file in file list to read. If None, the very first file in the list is used

  • last_file (int, optional) – index of last file in list to read. If None, the very last file in the list is used

  • read_err (bool, optional) – if True, uncertainty data is also read (where available). If unspecified (None), then the default is used (cf. READ_ERR)

  • remove_outliers (bool) – if True, outliers are removed for each variable using the minimum and maximum attributes for that variable (accessed via pyaerocom.const.VARS[var_name])

  • pattern (str, optional) – string pattern for file search (cf. get_file_list())

Returns

data object

Return type

UngriddedData

read_file(filename, vars_to_retrieve=None, read_err=None, remove_outliers=True)[source]

Read EARLINET file and return it as instance of StationData

Parameters
  • filename (str) – absolute path to filename to read

  • vars_to_retrieve (list, optional) – list of str with variable names to read. If None, use DEFAULT_VARS

  • read_err (bool) – if True, uncertainty data is also read (where available).

  • remove_outliers (bool) – if True, outliers are removed for each variable using the minimum and maximum attributes for that variable (accessed via pyaerocom.const.VARS[var_name]).

Returns

dict-like object containing results

Return type

StationData

read_first_file(**kwargs)

Read first file returned from get_file_list()

Note

This method may be used for test purposes.

Parameters

**kwargs – keyword args passed to read_file() (e.g. vars_to_retrieve)

Returns

dictionary or similar containing loaded results from first file

Return type

dict-like

read_station(station_id_filename, **kwargs)

Read data from a single station into UngriddedData

Find all files that contain the station ID in their filename and then call read(), providing the reduced filelist as input, in order to read all files from this station into data object.

Parameters
  • station_id_filename (str) – name of station (MUST be encoded in the filename)

  • **kwargs – additional keyword args passed to read() (e.g. vars_to_retrieve)

Returns

loaded data

Return type

UngriddedData

Raises

IOError – if no files can be found for this station ID

remove_outliers(data, vars_to_retrieve, **valid_rng_vars)

Remove outliers from data

Parameters
  • data (dict-like) – data object containing data vectors for variables that are required for computation (cf. input param vars_to_compute)

  • vars_to_retrieve (list) – list of variable names for which outliers will be removed from data

  • **valid_rng_vars – additional keyword args specifying variable name and corresponding min / max interval (list or tuple) that specifies valid range for the variable. For each variable that is not explicitly defined here, the default minimum / maximum value is used (accessed via pyaerocom.const.VARS[var_name])

var_supported(var_name)

Check if input variable is supported

Parameters

var_name (str) – AeroCom variable name or alias

Raises

VariableDefinitionError – if input variable is not supported by pyaerocom

Returns

True, if variable is supported by this interface, else False

Return type

bool

property verbosity_level

Current level of verbosity of logger

EBAS

class pyaerocom.io.read_ebas.ReadEbas(dataset_to_read=None)[source]

Bases: pyaerocom.io.readungriddedbase.ReadUngriddedBase

Interface for reading EBAS data

Parameters

dataset_to_read – string specifying one of the supported datasets defined in SUPPORTED_DATASETS

ASSUME_AAE_SHIFT_WVL = 1.0
ASSUME_AE_SHIFT_WVL = 1
AUX_FUNS = {'ac550dryaer': <function compute_ac550dryaer>, 'ang4470dryaer': <function compute_ang4470dryaer_from_dry_scat>, 'sc440dryaer': <function compute_sc440dryaer>, 'sc550dryaer': <function compute_sc550dryaer>, 'sc700dryaer': <function compute_sc700dryaer>}
AUX_REQUIRES = {'ac550dryaer': ['ac550aer', 'acrh'], 'ang4470dryaer': ['sc440dryaer', 'sc700dryaer'], 'sc440dryaer': ['sc440aer', 'scrh'], 'sc550dryaer': ['sc550aer', 'scrh'], 'sc700dryaer': ['sc700aer', 'scrh']}
AUX_USE_META = {'ac550dryaer': 'ac550aer', 'sc440dryaer': 'sc440aer', 'sc550dryaer': 'sc550aer', 'sc700dryaer': 'sc700aer'}
property AUX_VARS

List of auxiliary variables (keys of attr. AUX_REQUIRES)

Auxiliary variables are those that are not included in original files but are computed from other variables during import

CACHE_SQLITE_FILE = ['EBASMC']

For the following data IDs, the sqlite database file will be cached if const.EBAS_DB_LOCAL_CACHE is True

property DATASET_PATH

Path to datafiles of specified dataset

Is retrieved automatically (if not specified explicitly on class instantiation), based on network ID (DATA_ID) using get_obsnetwork_dir() (which uses the information in pyaerocom.const).

DATA_ID = 'EBASMC'

Name of dataset (OBS_ID)

DEFAULT_VARS = ['ac550aer', 'sc550aer']

default variables for read method

property FILE_REQUEST_OPTS

List of options for file retrieval

FILE_SUBDIR_NAME = 'data'

Name of subdirectory containing data files (relative to DATASET_PATH)

IGNORE_FILES = ['CA0420G.20100101000000.20190125102503.filter_absorption_photometer.aerosol_absorption_coefficient.aerosol.1y.1h.CA01L_Magee_AE31_ALT.CA01L_aethalometer.lev2.nas']
IGNORE_META_KEYS = []
IGNORE_WAVELENGTH = ['conceqbc']
MERGE_STATIONS = {'Birkenes': 'Birkenes II'}
property NAN_VAL

Irrelevant for implementation of EBAS I/O

property PROVIDES_VARIABLES

List of variables provided by the interface

property REVISION_FILE

Name of revision file located in data directory

SQL_DB_NAME = 'ebas_file_index.sqlite3'

Name of sqlite database file

SUPPORTED_DATASETS = ['EBASMC']

List of all datasets supported by this interface

TS_TYPE = 'undefined'
TS_TYPE_CODES = {'1d': 'daily', '1h': 'hourly', '1mn': 'minutely', '1mo': 'monthly', '1w': 'weekly', 'd': 'daily', 'h': 'hourly', 'mn': 'minutely', 'mo': 'monthly', 'w': 'weekly'}

Temporal resolution codes that (so far) can be understood by pyaerocom

property all_station_names

List of all available station names in EBAS database

check_vars_to_retrieve(vars_to_retrieve)

Separate variables that are in file from those that are computed

Some of the variables provided by this interface are not included in the data files but are computed within this class during data import (e.g. od550aer, ang4487aer).

The latter may require additional parameters to be retrieved from the file, which is specified in the class header (cf. attribute AUX_REQUIRES).

This function checks the input list that specifies all required variables and separates them into two lists, one that includes all variables that can be read from the files and a second list that specifies all variables that are computed in this class.

Parameters

vars_to_retrieve (list) – all parameter names that are supposed to be loaded

Returns

2-element tuple, containing

  • list: list containing all variables to be read

  • list: list containing all variables to be computed

Return type

tuple

compute_additional_vars(data, vars_to_compute)[source]

Compute additional variables and put into station data

Note

Extended version of ReadUngriddedBase.compute_additional_vars()

Parameters
  • data (dict-like) – data object containing data vectors for variables that are required for computation (cf. input param vars_to_compute)

  • vars_to_compute (list) – list of variable names that are supposed to be computed. Variables that are required for the computation of the variables need to be specified in AUX_VARS and need to be available as data vectors in the provided data dictionary (key is the corresponding variable name of the required variable).

Returns

updated data object now containing also computed variables

Return type

dict

property data_id

Wrapper for DATA_ID (pyaerocom standard name)

property data_revision

Revision string from file Revision.txt in the main data directory

property dataset_to_read
property eval_flags

Boolean specifying whether to use EBAS flag columns

property file_dir

Directory containing EBAS NASA Ames files

property file_index

SQlite file mapping metadata with filenames

files_contain

This attribute is filled in get_file_list() and specifies the variables to be read from each file

find_in_file_list(pattern=None)

Find all files that match a certain wildcard pattern

Parameters

pattern (str, optional) – wildcard pattern that may be used to narrow down the search (e.g. use pattern=*Berlin* to find only files that contain Berlin in their filename)

Returns

list containing all files in files that match pattern

Return type

list

Raises

IOError – if no matches can be found

find_station_matches(stats_or_patterns)[source]
find_var_cols(vars_to_read, loaded_nasa_ames)[source]

Find best-match variable columns in loaded NASA Ames file

For each of the input variables, try to find one or more matches in the input NASA Ames file (loaded data object). If more than one match occurs, identify the best one (an example here is: user wants sc550aer and file contains scattering coefficients at 530 nm and 580 nm: in this case the 530 nm column will be used, cf. also accepted wavelength tolerance for reading of wavelength dependent variables wavelength_tol_nm).

Parameters
  • vars_to_read (list) – list of variables that are supposed to be read

  • loaded_nasa_ames (EbasNasaAmesFile) – loaded data object

Returns

dictionary specifying the best-match variable column for each of the input variables.

Return type

dict
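
The best-match selection can be illustrated with a small helper (a sketch under the assumption that candidate columns map column index to wavelength; names and the default tolerance are hypothetical):

```python
def best_match_column(target_nm, candidate_cols, tol_nm=50):
    """Pick the column whose wavelength is closest to target_nm.

    candidate_cols maps column index -> wavelength in nm. Returns the
    column index, or None if no candidate lies within tol_nm.
    """
    best, best_diff = None, tol_nm
    for col, wvl in candidate_cols.items():
        diff = abs(wvl - target_nm)
        if diff <= best_diff:
            best, best_diff = col, diff
    return best

# sc550aer requested; file offers scattering columns at 530 and 580 nm
col = best_match_column(550, {3: 530, 4: 580})
```

In this hypothetical file the 530 nm column (index 3) wins, matching the example in the method description; a file offering only 450 nm would yield no match within the tolerance.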

get_ebas_var(var_name)[source]

Get instance of EbasVarInfo for input AeroCom variable

get_file_list(vars_to_retrieve=None, **constraints)[source]

Get list of files for all variables to retrieve

Parameters
  • vars_to_retrieve (list, optional) – list of variables to retrieve

  • **constraints – further reading constraints (cf. read())

Returns

unified list of file paths each containing either of the specified variables

Return type

list

property ignore_statistics

List containing column statistics keys to be ignored

property keep_aux_vars

Keep auxiliary variables during reading

Type

Option

property merge_meta

if True, then common meta-data blocks are merged on reading

Type

Option

property prefer_statistics

List containing preferred statistics columns

read(vars_to_retrieve=None, first_file=None, last_file=None, multiproc=False, files=None, **constraints)[source]

Method that reads list of files as instance of UngriddedData

Parameters
  • vars_to_retrieve (list or similar, optional) – list containing variable IDs that are supposed to be read. If None, all variables in PROVIDES_VARIABLES are loaded

  • first_file (int, optional) – index of first file in file list to read. If None, the very first file in the list is used

  • last_file (int, optional) – index of last file in list to read. If None, the very last file in the list is used

  • **constraints – further reading constraints deviating from default (default info for each AeroCom variable can be found in `ebas_config.ini <https://github.com/metno/pyaerocom/blob/master/pyaerocom/data/ebas_config.ini>`__). For details on possible input parameters see EbasSQLRequest (or this tutorial)

Returns

data object

Return type

UngriddedData

read_file(filename, vars_to_retrieve=None, _vars_to_read=None, _vars_to_compute=None)[source]

Read EBAS NASA Ames file

Parameters
  • filename (str) – absolute path to filename to read

  • vars_to_retrieve (list, optional) – list of str with variable names to read. If None (and if neither of the alternative parameters _vars_to_read and _vars_to_compute is specified explicitly), then the default settings are used

Returns

dict-like object containing results

Return type

StationData

read_first_file(**kwargs)

Read first file returned from get_file_list()

Note

This method may be used for test purposes.

Parameters

**kwargs – keyword args passed to read_file() (e.g. vars_to_retrieve)

Returns

dictionary or similar containing loaded results from first file

Return type

dict-like

read_station(station_id_filename, **kwargs)

Read data from a single station into UngriddedData

Find all files that contain the station ID in their filename and then call read(), providing the reduced filelist as input, in order to read all files from this station into data object.

Parameters
  • station_id_filename (str) – name of station (MUST be encoded in the filename)

  • **kwargs – additional keyword args passed to read() (e.g. vars_to_retrieve)

Returns

loaded data

Return type

UngriddedData

Raises

IOError – if no files can be found for this station ID

remove_outliers(data, vars_to_retrieve, **valid_rng_vars)

Remove outliers from data

Parameters
  • data (dict-like) – data object containing data vectors for variables that are required for computation (cf. input param vars_to_compute)

  • vars_to_retrieve (list) – list of variable names for which outliers will be removed from data

  • **valid_rng_vars – additional keyword args specifying variable name and corresponding min / max interval (list or tuple) that specifies valid range for the variable. For each variable that is not explicitly defined here, the default minimum / maximum value is used (accessed via pyaerocom.const.VARS[var_name])

property sqlite_database_file

Path to EBAS SQL database

var_info(var_name)[source]

Aerocom variable info for input var_name

var_supported(var_name)

Check if input variable is supported

Parameters

var_name (str) – AeroCom variable name or alias

Raises

VariableDefinitionError – if input variable is not supported by pyaerocom

Returns

True, if variable is supported by this interface, else False

Return type

bool

property verbosity_level

Current level of verbosity of logger

property wavelength_tol_nm

Wavelength tolerance in nm for columns

class pyaerocom.io.read_ebas.ReadEbasOptions[source]

Bases: pyaerocom._lowlevel_helpers.BrowseDict

Options for EBAS reading routine

prefer_statistics

preferred order of data statistics. Some files may contain multiple columns for one variable, where each column corresponds to one of the statistics defined here that were applied to the data. This attribute is only considered for EBAS variables that have not explicitly defined which statistics to use (and in which preferred order, if applicable). Reading preferences for all EBAS variables are specified in the file ebas_config.ini in the data directory of pyaerocom.

Type

list

ignore_statistics

columns that have either of these statistics applied are ignored for variable data reading.

Type

list

wavelength_tol_nm

Wavelength tolerance in nm for reading of (wavelength dependent) variables. If multiple matches occur (e.g. query -> variable at 550nm but file contains 3 columns of that variable, e.g. at 520, 530 and 540 nm), then the closest wavelength to the queried wavelength is used within the specified tolerance level.

Type

int

eval_flags

If True, the flag columns in the NASA Ames files are read and decoded (using EbasFlagCol.decode()) and the (up to 3 flags for each measurement) are evaluated as valid / invalid using the information in the flags CSV file. The evaluated flags are stored in the data files returned by the reading methods ReadEbas.read() and ReadEbas.read_file().

Type

bool

keep_aux_vars

if True, auxiliary variables required for computed variables will be written to the UngriddedData object created in ReadEbas.read() (e.g. if sc550dryaer is requested, this requires reading of sc550aer and scrh. The latter 2 will be written to the data object if this parameter evaluates to True)

Type

bool

merge_meta

if True, then UngriddedData.merge_common_meta() will be called at the end of ReadEbas.read() (merges common metadata blocks together)

Type

bool

clear() → None. Remove all items from od.
copy() → a shallow copy of od
property filter_dict
fromkeys(value=None)

Create a new ordered dictionary with keys from iterable and values set to value.

get(key, default=None, /)

Return the value for key if key is in the dictionary, else default.

items() → a set-like object providing a view on D’s items
keys() → a set-like object providing a view on D’s keys
move_to_end(key, last=True)

Move an existing element to the end (or beginning if last is false).

Raise KeyError if the element does not exist.

pop(k[, d]) → v, remove specified key and return the corresponding

value. If key is not found, d is returned if given, otherwise KeyError is raised.

popitem(last=True)

Remove and return a (key, value) pair from the dictionary.

Pairs are returned in LIFO order if last is true or FIFO order if false.

setdefault(key, default=None)

Insert key with a value of default if key is not in the dictionary.

Return the value for key if key is in the dictionary, else default.

update([E, ]**F) → None. Update D from dict/iterable E and F.

If E is present and has a .keys() method, then does: for k in E: D[k] = E[k] If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v In either case, this is followed by: for k in F: D[k] = F[k]

values() → an object providing a view on D’s values

EBAS (low level)

pyaerocom module for reading and processing of EBAS NASA Ames files

For details on the file format see here

class pyaerocom.io.ebas_nasa_ames.EbasColDef(name, is_var, is_flag, unit='1')[source]

Dict-like object for EBAS NASA Ames column definitions

Note

The meta attribute name ‘unit’ can also be accessed using the CF attr name ‘units’

name

column name

Type

str

unit

unit of data in column (if applicable)

Type

str

is_var

True if column corresponds to variable data, False if not

Type

bool

is_flag

True, if column corresponds to Flag column, False if not

Type

bool

flag_col

column number of flag column that corresponds to this data column (only relevant if is_var is True)

Type

int

Parameters
  • name (str) – column name

  • is_var (bool) – True if column corresponds to variable data, False if not

  • is_flag (bool) – True, if column corresponds to Flag column, False if not

  • unit (str, optional) – unit of data in column (if applicable)

  • flag_col (str, optional) – name of flag column that corresponds to this data column (only relevant if is_var is True)

get_wavelength_nm()[source]

Try to access wavelength information in nm (as float)

to_dict(ignore_keys=['is_var', 'is_flag', 'flag_col', 'wavelength_nm'])[source]
to_str()[source]
class pyaerocom.io.ebas_nasa_ames.EbasFlagCol(raw_data, info, interpret_on_init=True)[source]

Simple helper class to decode and interpret EBAS flag columns

info

EBAS column definition information for flag column (from file header)

Type

EbasColDef

raw_data

raw flag column (containing X-digit floating point numbers)

Type

ndarray

property FLAG_INFO

Detailed information about EBAS flag definitions

decode()[source]

Decode raw flag column

property decoded

Nx3 numpy array containing decoded flag columns

property valid

Boolean array specifying valid and invalid measurements
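
The decoding of a raw flag value into its three integer flags (e.g. 0.111222333 -> 111, 222, 333, as described for evaluate_flags below) can be sketched as follows (an illustrative sketch; the actual decoding is done by decode() together with the flag definitions in FLAG_INFO):

```python
def decode_flag(raw):
    """Decode a raw EBAS flag value into its three integer flags.

    A raw flag is a float with 9 decimal digits: three 3-digit flags.
    """
    digits = f"{raw:.9f}".split(".")[1]
    return [int(digits[i:i + 3]) for i in range(0, 9, 3)]

print(decode_flag(0.111222333))
```

A raw value of 0.0 decodes to three zero flags, i.e. no flag set for that measurement.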

class pyaerocom.io.ebas_nasa_ames.EbasNasaAmesFile(file=None, only_head=False, replace_invalid_nan=True, convert_timestamps=True, evaluate_flags=False, quality_check=True, **kwargs)[source]

EBAS NASA Ames file interface

Class interface for reading and processing of EBAS NASA Ames file

time_stamps

array containing datetime64 objects with timestamps

Type

ndarray

flags

dictionary containing EbasFlagCol objects for each column containing flags

Type

dict

Parameters
  • file (str, optional) – EBAS NASA Ames file. if valid file path, then the file is read on init (please note following options for import)

  • only_head (bool) – read only file header

  • replace_invalid_nan (bool) – replace all invalid values in the table by NaNs. The invalid values for each dependent data column are identified based on the information in the file header.

  • convert_timestamps (bool) – compute array of numpy datetime64 timestamps from numeric timestamps in data

  • evaluate_flags (bool) – if True, all flags in all flag columns are decoded from floating point representation to 3 integers, e.g. 0.111222333 -> 111 222 333

  • quality_check (bool) – perform quality check after import (for details see _quality_check())

  • **kwargs – optional input args that are passed to init of NasaAmesHeader base class

ERR_HIGH_STATS = 'percentile:84.13'
ERR_LOW_STATS = 'percentile:15.87'
TIMEUNIT2SECFAC = {'Days': 86400, 'days': 86400}
assign_flagcols()[source]
property base_date

Base date of data as numpy.datetime64[s]

property col_names

Column names of table

property col_names_vars

Names of all columns that are flagged as variables

property col_num

Number of columns in table

property col_nums_vars

Column index number of all variables

compute_time_stamps()[source]

Compute time stamps from first two data columns

property data

2D numpy array containing data table

property data_header
get_colnames_unique()[source]

Create a list of unique column names

get_time_differences_meas(np_freq='s')[source]

Get array with time between individual measurements

This is computed based on the start timestamps, i.e. dt[0] = start[1] - start[0]

Parameters

np_freq (str) – string specifying output frequency of gap values

Returns

array with time-differences as floating point number in specified input resolution

Return type

ndarray

get_time_gaps_meas(np_freq='s')[source]

Get array with time gaps between individual measurements

This is computed based on the start and stop timestamps, i.e. dt[0] = start[1] - stop[0]

Parameters

np_freq (str) – string specifying output frequency of gap values

Returns

array with time-differences as floating point number in specified input resolution

Return type

ndarray
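
Both quantities (differences between start timestamps, and gaps between stop and subsequent start) can be computed directly with numpy datetime64 arrays; the timestamps below are made up for illustration:

```python
import numpy as np

start = np.array(["2020-01-01T00:00:00", "2020-01-01T01:00:00"],
                 dtype="datetime64[s]")
stop = np.array(["2020-01-01T00:59:00", "2020-01-01T01:59:00"],
                dtype="datetime64[s]")

# time differences between measurements: dt[0] = start[1] - start[0]
diffs = (start[1:] - start[:-1]) / np.timedelta64(1, "s")
# time gaps between measurements: dt[0] = start[1] - stop[0]
gaps = (start[1:] - stop[:-1]) / np.timedelta64(1, "s")
```

Dividing by np.timedelta64(1, "s") converts the timedeltas to floating point seconds, mirroring the np_freq parameter.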

init_flags(evaluate=True)[source]

Decode flag columns and store info in flags

static numarr_to_datetime64(basedate, num_arr, mulfac_to_sec)[source]

Convert array of numerical timestamps into datetime64 array

Parameters
  • basedate (datetime64) – reference date

  • num_arr (ndarray) – numerical time stamps relative to basedate

  • mulfac_to_sec (float) – multiplicative factor to convert numerical values to unit of seconds

Returns

array containing timestamps as datetime64 objects

Return type

ndarray
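
The conversion can be sketched in plain numpy (assuming day-based numeric offsets, cf. TIMEUNIT2SECFAC which maps 'days' to 86400; the base date below is made up):

```python
import numpy as np

def numarr_to_datetime64(basedate, num_arr, mulfac_to_sec):
    """Convert numeric timestamps (relative to basedate) to datetime64."""
    seconds = np.asarray(num_arr, dtype=float) * mulfac_to_sec
    # float seconds are truncated to integer seconds on cast
    return basedate + seconds.astype("timedelta64[s]")

base = np.datetime64("2020-01-01", "s")
ts = numarr_to_datetime64(base, [0, 0.5, 1], 86400)
```

Here offsets of 0, 0.5 and 1 days relative to the base date yield midnight, noon and midnight of the following day.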

plot_var(var_name, statistics='arithmetic mean', ax=None, style=None, **kwargs)[source]

Plot time series of one column

If percentiles are available, they will be plotted as shaded area

Parameters
  • var_name (str) – EBAS variable name

  • statistics (str) – statistics specifications

print_col_info()[source]

Print information about individual columns

read_file(nasa_ames_file, only_head=False, replace_invalid_nan=True, convert_timestamps=True, evaluate_flags=False, quality_check=False)[source]

Read NASA Ames file

Parameters
  • nasa_ames_file (str) – EBAS NASA Ames file

  • only_head (bool) – read only file header

  • replace_invalid_nan (bool) – replace all invalid values in the table by NaNs. The invalid values for each dependent data column are identified based on the information in the file header.

  • convert_timestamps (bool) – compute array of numpy datetime64 timestamps from numeric timestamps in data

  • evaluate_flags (bool) – if True, all data columns get assigned their corresponding flag column, the flags in all flag columns are decoded from floating point representation to 3 integers, e.g. 0.111222333 -> 111 222 333 and if input `replace_invalid_nan==True`, then the invalid measurements in each column are replaced with NaN’s.

  • quality_check (bool) – perform quality check after import (for details see _quality_check())

read_header(nasa_ames_file, quality_check=True)[source]
set_invalid_flags_nan(colnum=None)[source]

Use flag column information to identify and remove invalid measurements

property shape

Shape of data array

property time_unit

Time unit of data

to_dataframe(var_name=None, wavelength_nm=None, statistics=None)[source]

Convert table to dataframe

Parameters
  • var_name (str, optional) – EBAS variable name (e.g. aerosol_light_scattering_coefficient). If specified, only columns corresponding to this variable name will be extracted into the dataframe

  • wavelength_nm (int, tuple, optional) – wavelength (or wavelength range -> list or tuple input) in nm. If specified, only columns containing wavelength dependent data as specified are extracted and put into the Dataframe

  • statistics (str, optional) – specify column statistics (e.g. arithmetic mean)

class pyaerocom.io.ebas_nasa_ames.NasaAmesHeader(**kwargs)[source]

Header class for Ebas NASA Ames file

Note

Is used in EbasNasaAmesFile and should not be used directly.

CONV_FLOAT()
CONV_INT()
CONV_MULTIFLOAT()
CONV_MULTIINT()
CONV_PI()
CONV_STR()
property head_fix

Dictionary containing fixed header info (that is always available)

property meta

Meta data dictionary (specific for this file)

update(**kwargs)[source]
property var_defs

List containing column variable definitions

List index is column index in file and value is instance of EbasColDef

exception pyaerocom.io.ebas_nasa_ames.NasaAmesReadError[source]
exception pyaerocom.io.ebas_nasa_ames.NasaAmesVariableError[source]
class pyaerocom.io.ebas_file_index.EbasFileIndex(database=None)[source]

EBAS SQLite I/O interface

Takes care of connection to database and execution of requests

property ALL_INSTRUMENTS

List of all instruments available

property ALL_MATRICES

List of all matrix values available

property ALL_STATION_CODES

List of all available station codes in database

Note

Not tested whether the order is the same as the order in STATION_NAMES, i.e. the lists should not be linked to each other

property ALL_STATION_NAMES

List of all available station names in database

property ALL_STATISTICS_PARAMS

List of all statistical parameters available

For more info, see the EBAS documentation

property ALL_VARIABLES

List of all variables available

contains_altitudes(request)[source]

List altitudes of stations contained in request

Parameters

request (EbasSQLRequest) – request class

Returns

list containing result

Return type

list

contains_coordinates(request)[source]

List all station coordinates (lon, lat) that are contained in request

Parameters

request (EbasSQLRequest) – request class

Returns

list containing result

Return type

list

contains_matrices(request)[source]

List all matrices that are contained in request

Parameters

request (EbasSQLRequest) – request class

Returns

list containing result

Return type

list

contains_station_names(request)[source]

List all station_names that are contained in request

Parameters

request (EbasSQLRequest) – request class

Returns

list containing result

Return type

list

contains_variables(request)[source]

List all variables contained in request

Parameters

request (EbasSQLRequest) – request class

Returns

list containing result

Return type

list

property database

Path to ebas_file_index.sqlite3 file

execute_request(request)[source]

Connect to database and retrieve data for input request

Parameters

request (EbasSQLRequest or str) – request specifications

Returns

list of tuples containing the retrieved results. The number of items in each tuple corresponds to the number of requested parameters (usually one, can be specified in make_query_str() using argument what)

Return type

list
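Under the hood this amounts to a standard sqlite3 round trip. A minimal sketch using Python's sqlite3 module against an in-memory database; the table and column names below are made up for illustration and only mimic the structure of the real file index:

```python
import sqlite3

def execute_request(database, query):
    # connect to the SQLite file (or ':memory:') and fetch all rows as tuples
    with sqlite3.connect(database) as con:
        return con.execute(query).fetchall()

# hypothetical mini index mimicking ebas_file_index.sqlite3
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE variable (comp_name TEXT, filename TEXT)")
con.executemany(
    "INSERT INTO variable VALUES (?, ?)",
    [("aerosol_light_scattering_coefficient", "f1.nas"),
     ("aerosol_optical_depth", "f2.nas")])
rows = con.execute(
    "SELECT DISTINCT filename FROM variable ORDER BY filename").fetchall()
```

Each returned row is a tuple with one item per requested column, matching the description above.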

execute_request_fast(request)[source]

Connect to database and retrieve data for input request

Parameters

request (EbasSQLRequest or str) – request specifications

Returns

list of tuples containing the retrieved results. The number of items in each tuple corresponds to the number of requested parameters (usually one, can be specified in make_query_str() using argument what)

Return type

list

get_file_names(request)[source]

Get all files that match the request specifications

Parameters

request (EbasSQLRequest or str) – request specifications

Returns

list of file paths that match the request

Return type

list

table_columns(table_name)[source]
table_names()[source]
class pyaerocom.io.ebas_file_index.EbasSQLRequest(variables=None, start_date=None, stop_date=None, station_names=None, matrices=None, altitude_range=None, lon_range=None, lat_range=None, instrument_types=None, statistics=None, datalevel=None)[source]

Low level dictionary like object for EBAS sqlite queries

variables

tuple containing variable names to be extracted (e.g. ('aerosol_light_scattering_coefficient', 'aerosol_optical_depth')). If None, all available is used

Type

tuple, optional

start_date

start date of data request (format YYYY-MM-DD). If None, all available is used

Type

str, optional

stop_date

stop date of data request (format YYYY-MM-DD). If None, all available is used

Type

str, optional

station_names

tuple containing station_names of request (e.g. ('Birkenes II', 'Asa')). If None, all available is used

Type

tuple, optional

matrices

tuple containing matrices of request (e.g. ('pm1', 'pm10', 'pm25', 'aerosol')). If None, all available is used

Type

tuple, optional

altitude_range

tuple specifying altitude range of station in m (e.g. (0.0, 500.0)). If None, all available is used

Type

tuple, optional

lon_range

tuple specifying longitude range of station in degrees (e.g. (-20, 20)). If None, all available is used

Type

tuple, optional

lat_range

tuple specifying latitude range of station in degrees (e.g. (50, 80)). If None, all available is used

Type

tuple, optional

instrument_type

string specifying instrument types (e.g. "nephelometer")

Type

str, optional

statistics

tuple of strings specifying statistics codes (e.g. ("arithmetic mean",))

Type

tuple, optional

Parameters

see Attributes

make_file_query_str(distinct=True, **kwargs)[source]

Wrapper for base method make_query_str()

Parameters
  • distinct (bool) – return unique files

  • **kwargs – update request attributes (e.g. lon_range=(30, 60))

Returns

SQL file request command for current specs

Return type

str

make_query_str(what='filename', distinct=True, **kwargs)[source]

Translate current class state into SQL query command string

Parameters
  • what (str or tuple) – what columns to retrieve (e.g. comp_name for all variables) from table specified.

  • distinct (bool) – return unique files

  • **kwargs – update request attributes (e.g. lon_range=(30, 60))

Returns

SQL file request command for current specs

Return type

str
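The translation into an SQL string can be sketched as follows; the table name variable and the clause handling are simplifications for illustration (the real method also handles ranges and further options):

```python
def make_query_str(what="filename", distinct=True, **constraints):
    # assemble a SELECT over a hypothetical table 'variable', adding one
    # WHERE clause per constraint; tuples/lists become IN (...) clauses
    cols = what if isinstance(what, str) else ",".join(what)
    qry = f"SELECT {'DISTINCT ' if distinct else ''}{cols} FROM variable"
    conds = []
    for key, val in constraints.items():
        if isinstance(val, (list, tuple)):
            vals = ",".join(f"'{v}'" for v in val)
            conds.append(f"{key} IN ({vals})")
        else:
            conds.append(f"{key}='{val}'")
    if conds:
        qry += " WHERE " + " AND ".join(conds)
    return qry + ";"
```

For example, a tuple of component names turns into a single IN clause on the corresponding column.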

update([E, ]**F) → None. Update D from dict/iterable E and F.[source]

If E is present and has a .keys() method, then does: for k in E: D[k] = E[k] If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v In either case, this is followed by: for k in F: D[k] = F[k]

class pyaerocom.io.ebas_varinfo.EbasVarInfo(var_name, init=True, **kwargs)[source]

Interface for mapping between EBAS variable information and AeroCom

For more information about EBAS variable and data information see EBAS website.

var_name

AeroCom variable name

Type

str

component

list of EBAS variable / component names that are mapped to var_name

Type

list

matrix

list of EBAS matrix values that are accepted, default is None, i.e. all available matrices are used

Type

list, optional

instrument

list of all instruments that are accepted for this variable

Type

list, optional

requires

for variables that are computed and not directly available in EBAS. Provided as list of (AeroCom) variables that are required to compute var_name (e.g. for sc550dryaer this would be [sc550aer,scrh]).

Type

list, optional

scale_factor

multiplicative scale factor that is applied in order to convert EBAS variable into AeroCom variable (e.g. 1.4 for conversion of EBAS OC measurement to AeroCom concoa variable)

Type

float, optional

old_name

old variable name (refers to outdated conventions, currently not used)

Type

str

Parameters
  • var_name (str) – AeroCom variable name

  • init (bool) – if True, EBAS configuration for input variable is retrieved from data file ebas_config.ini (if possible)

  • **kwargs – additional keyword arguments (currently not used)

static PROVIDES_VARIABLES()[source]

List specifying provided variables

get_all_components()[source]

Get list of all components

instrument

list of instrument names (EBAS side, optional)

make_sql_request(**constraints)[source]

Create an SQL request for the specifications in this object

Parameters

constraints – request constraints deviating from default. For details on parameters see EbasSQLRequest

Returns

the SQL request object that can be used to retrieve corresponding file names using instance of EbasFileIndex.get_file_names().

Return type

EbasSQLRequest

matrix

list of matrix names (EBAS side, optional)

old_name

old variable name

static open_config()[source]
parse_from_ini(var_name=None, conf_reader=None)[source]

Parse EBAS info for input AeroCom variable (works also for aliases)

Parameters
  • var_name (str) – AeroCom variable name

  • conf_reader (ConfigParser) – open config parser object

Raises

VarNotAvailableError – if variable is not supported

Returns

True, if default could be loaded, False if not

Return type

bool
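The parsing step can be illustrated with Python's configparser; the ini excerpt below is a hypothetical sketch that only mimics the layout of ebas_config.ini:

```python
from configparser import ConfigParser

# hypothetical excerpt mimicking the layout of ebas_config.ini
INI_SKETCH = """
[sc550dryaer]
requires=sc550aer,scrh

[concoa]
component=organic_carbon
scale_factor=1.4
"""

def parse_from_ini(var_name):
    conf = ConfigParser()
    conf.read_string(INI_SKETCH)
    if not conf.has_section(var_name):
        raise LookupError(f"No EBAS info for variable {var_name}")
    info = dict(conf[var_name])
    # comma-separated entries become lists, numeric entries floats
    if "requires" in info:
        info["requires"] = info["requires"].split(",")
    if "scale_factor" in info:
        info["scale_factor"] = float(info["scale_factor"])
    return info
```

Each ini section corresponds to one AeroCom variable; missing sections raise, mirroring the VarNotAvailableError behaviour described above.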

requires

list of additional variables required for retrieval of this variable

scale_factor

scale factor for conversion to Aerocom units

statistics

list containing variable statistics info (EBAS side, optional)

to_dict()[source]

Convert into dictionary

property var_name_aerocom

Variable name in AeroCom convention

pyaerocom.io.ebas_varinfo.check_all_variables()[source]

Helper function that checks all EBAS variables against SQL database

For all variables, see file ebas_config.ini in data directory

Raises

AttributeError – if one of the variable definitions in the ini file is not according to requirements

pyaerocom.io.ebas_varinfo.get_all_components(var_name, varlist=None)[source]

Get all EBAS components required to read a certain variable

Parameters
  • var_name (str) – AeroCom variable name

  • varlist (list, optional) – list of components already inferred (this function runs recursively).

Returns

list of components required to read / compute input AeroCom variable

Return type

list
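The recursion can be sketched with a hypothetical requires/component mapping (the real mapping lives in ebas_config.ini):

```python
# hypothetical mappings; real info comes from ebas_config.ini
REQUIRES = {"sc550dryaer": ["sc550aer", "scrh"]}
COMPONENTS = {"sc550aer": ["aerosol_light_scattering_coefficient"],
              "scrh": ["relative_humidity"]}

def get_all_components(var_name, varlist=None):
    # recursively replace computed variables by the components they require
    if varlist is None:
        varlist = []
    if var_name in REQUIRES:
        for req in REQUIRES[var_name]:
            get_all_components(req, varlist)
    else:
        for comp in COMPONENTS.get(var_name, []):
            if comp not in varlist:
                varlist.append(comp)
    return varlist
```

A computed variable such as sc550dryaer thus resolves to the union of the EBAS components of its required input variables.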

GHOST

class pyaerocom.io.read_ghost.ReadGhost(dataset_to_read=None, dataset_path=None)[source]

Reading interface for GHOST data

First version of GHOST reading class for reading in pyaerocom

Note

This class inherits from the metaclass template ReadUngriddedBase. Please have a look at that and make sure you understand the idea behind it.

AUX_FUNS = {'concco': <function _vmr_to_conc_ghost_stats>, 'concno': <function _vmr_to_conc_ghost_stats>, 'concno2': <function _vmr_to_conc_ghost_stats>, 'conco3': <function _vmr_to_conc_ghost_stats>, 'concso2': <function _vmr_to_conc_ghost_stats>}
AUX_REQUIRES = {'concco': ['vmrco'], 'concno': ['vmrno'], 'concno2': ['vmrno2'], 'conco3': ['vmro3'], 'concso2': ['vmrso2']}
CONVERT_UNITS_META = {'network_provided_volume_standard_pressure': 'Pa'}
DATA_ID = 'GHOST.EEA.daily'
DEFAULT_FLAGS_INVALID = {'flag': None, 'qa': array([[ 0, 1, 2, 3, 6, 20, 21, 22, 72, 75, 82, 83, 90, 91, 92, 110, 111, 112, 113, 115, 132, 133]])}

Default flags used to invalidate data points (these may be either from provided flag or qa variable, or both, currently only from qa variable)

property DEFAULT_VARS

list of default variables to retrieve

FLAG_DIMNAMES = {'flag': 'N_flag_codes', 'qa': 'N_qa_codes'}
FLAG_VARS = ['flag', 'qa']
META_KEYS = ['EDGAR_v4.3.2_annual_average_BC_emissions', 'EDGAR_v4.3.2_annual_average_CO_emissions', 'EDGAR_v4.3.2_annual_average_NH3_emissions', 'EDGAR_v4.3.2_annual_average_NMVOC_emissions', 'EDGAR_v4.3.2_annual_average_NOx_emissions', 'EDGAR_v4.3.2_annual_average_OC_emissions', 'EDGAR_v4.3.2_annual_average_PM10_emissions', 'EDGAR_v4.3.2_annual_average_SO2_emissions', 'EDGAR_v4.3.2_annual_average_biogenic_PM2.5_emissions', 'EDGAR_v4.3.2_annual_average_fossilfuel_PM2.5_emissions', 'ESDAC_Iwahashi_landform_classification', 'ESDAC_Meybeck_landform_classification', 'ESDAC_modal_Iwahashi_landform_classification_25km', 'ESDAC_modal_Iwahashi_landform_classification_5km', 'ESDAC_modal_Meybeck_landform_classification_25km', 'ESDAC_modal_Meybeck_landform_classification_5km', 'ETOPO1_altitude', 'ETOPO1_max_altitude_difference_5km', 'GHOST_version', 'GPW_average_population_density_25km', 'GPW_average_population_density_5km', 'GPW_max_population_density_25km', 'GPW_max_population_density_5km', 'GPW_population_density', 'GSFC_coastline_proximity', 'Joly-Peuch_classification_code', 'Koppen-Geiger_classification', 'Koppen-Geiger_modal_classification_25km', 'Koppen-Geiger_modal_classification_5km', 'MODIS_MCD12C1_v6_IGBP_land_use', 'MODIS_MCD12C1_v6_LAI', 'MODIS_MCD12C1_v6_UMD_land_use', 'MODIS_MCD12C1_v6_modal_IGBP_land_use_25km', 'MODIS_MCD12C1_v6_modal_IGBP_land_use_5km', 'MODIS_MCD12C1_v6_modal_LAI_25km', 'MODIS_MCD12C1_v6_modal_LAI_5km', 'MODIS_MCD12C1_v6_modal_UMD_land_use_25km', 'MODIS_MCD12C1_v6_modal_UMD_land_use_5km', 'NOAA-DMSP-OLS_v4_average_nighttime_stable_lights_25km', 'NOAA-DMSP-OLS_v4_average_nighttime_stable_lights_5km', 'NOAA-DMSP-OLS_v4_max_nighttime_stable_lights_25km', 'NOAA-DMSP-OLS_v4_max_nighttime_stable_lights_5km', 'NOAA-DMSP-OLS_v4_nighttime_stable_lights', 'OMI_level3_column_annual_average_NO2', 'OMI_level3_column_cloud_screened_annual_average_NO2', 'OMI_level3_tropospheric_column_annual_average_NO2', 
'OMI_level3_tropospheric_column_cloud_screened_annual_average_NO2', 'UMBC_anthrome_classification', 'UMBC_modal_anthrome_classification_25km', 'UMBC_modal_anthrome_classification_5km', 'WMO_region', 'WWF_TEOW_biogeographical_realm', 'WWF_TEOW_biome', 'WWF_TEOW_terrestrial_ecoregion', 'administrative_country_division_1', 'administrative_country_division_2', 'altitude', 'area_classification', 'associated_networks', 'city', 'climatology', 'contact_email_address', 'contact_institution', 'contact_name', 'country', 'daily_passing_vehicles', 'data_level', 'daytime_traffic_speed', 'distance_to_building', 'distance_to_junction', 'distance_to_kerb', 'distance_to_source', 'land_use', 'latitude', 'longitude', 'main_emission_source', 'measurement_altitude', 'measurement_methodology', 'measurement_scale', 'measuring_instrument_calibration_scale', 'measuring_instrument_documented_absorption_cross_section', 'measuring_instrument_documented_accuracy', 'measuring_instrument_documented_flow_rate', 'measuring_instrument_documented_lower_limit_of_detection', 'measuring_instrument_documented_measurement_resolution', 'measuring_instrument_documented_precision', 'measuring_instrument_documented_span_drift', 'measuring_instrument_documented_uncertainty', 'measuring_instrument_documented_upper_limit_of_detection', 'measuring_instrument_documented_zero_drift', 'measuring_instrument_documented_zonal_drift', 'measuring_instrument_further_details', 'measuring_instrument_inlet_information', 'measuring_instrument_manual_name', 'measuring_instrument_name', 'measuring_instrument_process_details', 'measuring_instrument_reported_absorption_cross_section', 'measuring_instrument_reported_accuracy', 'measuring_instrument_reported_flow_rate', 'measuring_instrument_reported_lower_limit_of_detection', 'measuring_instrument_reported_measurement_resolution', 'measuring_instrument_reported_precision', 'measuring_instrument_reported_span_drift', 'measuring_instrument_reported_uncertainty', 
'measuring_instrument_reported_units', 'measuring_instrument_reported_upper_limit_of_detection', 'measuring_instrument_reported_zero_drift', 'measuring_instrument_reported_zonal_drift', 'measuring_instrument_sampling_type', 'network', 'network_maintenance_details', 'network_miscellaneous_details', 'network_provided_volume_standard_pressure', 'network_provided_volume_standard_temperature', 'network_qa_details', 'network_sampling_details', 'network_uncertainty_details', 'population', 'primary_sampling_further_details', 'primary_sampling_instrument_documented_flow_rate', 'primary_sampling_instrument_manual_name', 'primary_sampling_instrument_name', 'primary_sampling_instrument_reported_flow_rate', 'primary_sampling_process_details', 'primary_sampling_type', 'principal_investigator_email_address', 'principal_investigator_institution', 'principal_investigator_name', 'process_warnings', 'representative_radius', 'sample_preparation_further_details', 'sample_preparation_process_details', 'sample_preparation_techniques', 'sample_preparation_types', 'sampling_height', 'station_classification', 'station_name', 'station_reference', 'station_timezone', 'street_type', 'street_width', 'terrain']
property PROVIDES_VARIABLES

list of variable names that can be retrieved through this interface

SUPPORTED_DATASETS = ['GHOST.EEA.monthly', 'GHOST.EEA.hourly', 'GHOST.EEA.daily', 'GHOST.EBAS.monthly', 'GHOST.EBAS.hourly', 'GHOST.EBAS.daily']
property TS_TYPE

Default implementation of string for temporal resolution

TS_TYPES = {'GHOST.EBAS.daily': 'daily', 'GHOST.EBAS.hourly': 'hourly', 'GHOST.EBAS.monthly': 'monthly', 'GHOST.EEA.daily': 'daily', 'GHOST.EEA.hourly': 'hourly', 'GHOST.EEA.monthly': 'monthly'}
VARNAMES_DATA = {'conccl': 'sconccl', 'concpm1': 'pm1', 'concpm10': 'pm10', 'concpm10al': 'pm10al', 'concpm10as': 'pm10as', 'concpm25': 'pm2p5', 'vmrco': 'sconcco', 'vmrno': 'sconcno', 'vmrno2': 'sconcno2', 'vmro3': 'sconco3', 'vmrso2': 'sconcso2'}

dictionary mapping GHOST variable names to AeroCom variable names

VARS_TO_READ = ['concpm10', 'concpm10al', 'concpm10as', 'concpm25', 'concpm1', 'conccl', 'vmrco', 'vmrno', 'vmrno2', 'vmro3', 'vmrso2']

these need to be output variables in AeroCom convention (cf. file pyaerocom/data/variables.ini). See also VARNAMES_DATA for a mapping of variable names used in GHOST

compute_additional_vars(statlist_from_file, vars_to_compute)[source]

Compute all additional variables

The computations for each additional parameter are done using the specified methods in AUX_FUNS.

Parameters
  • statlist_from_file (dict-like) – data object containing data vectors for variables that are required for computation (cf. input param vars_to_compute)

  • vars_to_compute (list) – list of variable names that are supposed to be computed. Variables that are required for the computation of the variables need to be specified in AUX_VARS and need to be available as data vectors in the provided data dictionary (key is the corresponding variable name of the required variable).

Returns

updated data object now containing also computed variables

Return type

dict

get_file_list(vars_to_read=None, pattern=None)[source]

Retrieve a list of files to read based on input variable names

Parameters
  • vars_to_read (list, optional) – list of variables to be imported. If None, the default variables are used. The default is None.

  • pattern (str, optional) – file name pattern used to filter the search. The default is None.

Raises

ValueError – If no files can be found for any of the input variables.

Returns

list with file paths

Return type

list

get_meta_filename(filename)[source]

Extract metadata from data filename

Parameters

filename (str) – data file path or name.

Returns

dictionary containing var_name, start and stop, and possibly also frequency (ts_type)

Return type

dict
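A sketch of such filename parsing, assuming names of the form <var>_<YYYYMM>.nc (an illustrative assumption, not the documented GHOST convention):

```python
import os
import numpy as np

def get_meta_filename(filename):
    # assumes names like 'sconco3_201810.nc': variable name, then YYYYMM period
    var_name, period = os.path.basename(filename).split(".nc")[0].split("_")
    start = np.datetime64(f"{period[:4]}-{period[4:]}", "M")
    return dict(var_name=var_name, start=start,
                stop=start + np.timedelta64(1, "M"))
```

Start and stop delimit the monthly period encoded in the name; frequency would similarly be inferred from the dataset ID (cf. TS_TYPES).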

read(vars_to_retrieve=None, files=None, first_file=None, last_file=None, pattern=None, check_time=True, **kwargs)[source]

Read data files into UngriddedData object

Parameters
  • vars_to_retrieve (list or similar, optional,) – list containing variable IDs that are supposed to be read. If None, all variables in PROVIDES_VARIABLES are loaded

  • files (list, optional) – list of files to be read. If None, then the file list is used that is returned on get_file_list().

  • first_file (int, optional) – index of first file in file list to read. If None, the very first file in the list is used

  • last_file (int, optional) –

    index of last file in list to read. If None, the very last file

    in the list is used

    file_patternstr, optional

    string pattern for file search (cf get_file_list())

Returns

data object

Return type

UngriddedData

read_file(filename, var_to_read=None, invalidate_flags=None, var_to_write=None)[source]

Read GHOST NetCDF data file

Parameters
  • filename (str) – absolute path to filename to read

  • var_to_read (str, optional) – name of variable to be read; if None, it is inferred from the filename

Returns

list of loaded StationData objects (dict-like data objects)

Return type

list

property var_names_data_inv

Inverted version of dictionary VARNAMES_DATA

Further I/O features

Note

The pyaerocom.io package also includes all relevant data import and

reading routines. These are introduced above, in Section Reading of gridded data.

AeroCom database browser

Created on Wed Aug 1 09:06:06 2018

@author: jonasg

class pyaerocom.io.aerocom_browser.AerocomBrowser[source]

Interface for browsing all Aerocom data directories

Note

Use browse() to find directories matching a certain search pattern. The class methods find_matches() and find_dir() both use browse(); the only difference is that find_matches() adds the search result (a list with strings) to the instance attributes (cf. dirs_found and ids_found).

property dirs_found

All directories that were found

find_data_dir(name_or_pattern, ignorecase=True)[source]

Find match of input name or pattern in Aerocom database

Parameters
  • name_or_pattern (str) – name or pattern of data (can be model or obs data)

  • ignorecase (bool) – if True, upper / lower case is ignored

Returns

data directory of match

Return type

str

Raises

DataSearchError – if no matches or no unique match can be found

find_matches(name_or_pattern, ignorecase=True)[source]

Search all Aerocom data directories that match input name or pattern

Parameters
  • name_or_pattern (str) – name or pattern of data (can be model or obs data)

  • ignorecase (bool) – if True, upper / lower case is ignored

Returns

list of names that match the pattern (corresponding paths can be accessed from this class instance)

Return type

list

Raises

DataSearchError – if no matches can be found
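The matching logic can be sketched with the standard fnmatch module; the candidate IDs below are made up, and the real method browses the database directories:

```python
import fnmatch

def find_matches(name_or_pattern, candidates, ignorecase=True):
    # shell-style pattern match against a list of data IDs
    if ignorecase:
        matches = [c for c in candidates
                   if fnmatch.fnmatch(c.lower(), name_or_pattern.lower())]
    else:
        matches = [c for c in candidates
                   if fnmatch.fnmatchcase(c, name_or_pattern)]
    if not matches:
        raise LookupError(f"No matches found for {name_or_pattern}")
    return matches
```

An empty result raises, mirroring the DataSearchError behaviour described above.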

property ids_found

All data IDs that were found

File naming conventions

Low level classes and methods for io

class pyaerocom.io.fileconventions.FileConventionRead(name='aerocom3', file_sep='_', year_pos=None, var_pos=None, ts_pos=None, vert_pos=None, data_id_pos=None, from_file=None)[source]

Class that represents a file naming convention for reading Aerocom files

name

name of this convention (e.g. “aerocom3”)

Type

str

file_sep

filename delimiter for accessing different variables

Type

str

year_pos

position of year information in filename after splitting using delimiter file_sep

Type

int

var_pos

position of variable information in filename after splitting using delimiter file_sep

Type

int

ts_pos

position of information of temporal resolution in filename after splitting using delimiter file_sep

Type

int

vert_pos

position of information about vertical resolution of data

Type

int

data_id_pos

position of data ID

Type

int

AEROCOM3_VERT_INFO = {'2d': ['surface', 'column', 'modellevel'], '3d': ['modellevelatstations']}
check_validity(file)[source]

Check if filename is valid

from_dict(new_vals)[source]

Load info from dictionary

Parameters

new_vals (dict) – dictionary containing information

Returns

Return type

self

from_file(file)[source]

Identify convention from a file

Currently only two conventions (aerocom2 and aerocom3) exist that are identified by the delimiter used.

Parameters

file (str) – file path or file name

Returns

this object (with updated convention)

Return type

FileConventionRead

Raises

FileConventionError – if convention cannot be identified

Example

>>> from pyaerocom.io import FileConventionRead
>>> filename = 'aerocom3_CAM5.3-Oslo_AP3-CTRL2016-PD_od550aer_Column_2010_monthly.nc'
>>> print(FileConventionRead().from_file(filename))
pyaeorocom FileConventionRead
name: aerocom3
file_sep: _
year_pos: -2
var_pos: -4
ts_pos: -1
get_info_from_file(file)[source]

Identify convention from a file

Currently only two conventions (aerocom2 and aerocom3) exist that are identified by the delimiter used.

Parameters

file (str) – file path or file name

Returns

dictionary containing keys year, var_name, ts_type and corresponding variables, extracted from the filename

Return type

OrderedDict

Raises

FileConventionError – if convention cannot be identified

Example

>>> from pyaerocom.io import FileConventionRead
>>> filename = 'aerocom3_CAM5.3-Oslo_AP3-CTRL2016-PD_od550aer_Column_2010_monthly.nc'
>>> conv = FileConventionRead("aerocom3")
>>> info = conv.get_info_from_file(filename)
>>> for item in info.items(): print(item)
('year', 2010)
('var_name', 'od550aer')
('ts_type', 'monthly')
import_default(name)[source]

Check and load default information from database

property info_init

Empty dictionary containing init values of infos to be extracted from filenames

string_mask(data_id, var, year, ts_type, vert_which=None)[source]

Returns mask that can be used to identify files of this convention

Parameters
  • data_id (str) – experiment ID (e.g. GISS-MATRIX.A2.CTRL)

  • var (str) – variable string ID (e.g. “od550aer”)

  • year (int) – desired year of observation (e.g. 2012)

  • ts_type (str) – string specifying temporal resolution (e.g. “daily”)

Example

>>> from pyaerocom.io import FileConventionRead
>>> conf_aero2 = FileConventionRead(name="aerocom2")
>>> conf_aero3 = FileConventionRead(name="aerocom3")
>>> data_id = "GISS-MATRIX.A2.CTRL"
>>> var = "od550aer"
>>> year = 2012
>>> ts_type = "daily"
>>> match_str_aero2 = conf_aero2.string_mask(data_id, var, year, ts_type)
>>> match_str_aero3 = conf_aero3.string_mask(data_id, var, year, ts_type)
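A hypothetical reconstruction of such a mask for the aerocom3 convention; the field order is assumed from the example filename shown earlier, not taken from the implementation:

```python
import re

def string_mask(data_id, var, year, ts_type, vert_which="[a-zA-Z]+"):
    # hypothetical aerocom3-style mask: fields joined by '_', with an
    # arbitrary experiment field after data_id; vert_which defaults to
    # matching any vertical code
    return "_".join(["aerocom3", data_id, ".+", var, vert_which,
                     str(year), ts_type]) + r"\.nc"

mask = string_mask("CAM5.3-Oslo", "od550aer", 2010, "monthly", "Column")
fname = "aerocom3_CAM5.3-Oslo_AP3-CTRL2016-PD_od550aer_Column_2010_monthly.nc"
matched = re.fullmatch(mask, fname) is not None
```

Files with a different variable, vertical code, year or frequency fail the mask and are skipped during the directory scan.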

to_dict()[source]

Convert this object to ordered dictionary

Iris helpers

Module containing helper functions related to iris I/O methods. These contain reading of Cubes, and some methods to perform quality checks of the data, e.g.

  1. checking and correction of time definition

  2. number and length of dimension coordinates must match data array

  3. Longitude definition from -180 to 180 (corrected if defined on 0 -> 360 interval)

pyaerocom.io.iris_io.check_and_regrid_lons_cube(cube)[source]

Checks and corrects for if longitudes of grid are 0 -> 360

Note

This method checks if the maximum of the current longitudes array exceeds 180. Thus, it is not recommended to use this function after subsetting a cube, rather, it should be checked directly when the file is loaded (cf. load_input())

Parameters

cube (iris.cube.Cube) – gridded data loaded as iris.Cube

Returns

True, if longitudes were on 0 -> 360 and have been rolled, else False

Return type

bool
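The rolling operation can be sketched with plain numpy for a bare coordinate/data pair; the actual function operates on the iris Cube directly:

```python
import numpy as np

def regrid_lons(lons, data):
    """Shift longitudes from [0, 360) to [-180, 180) and reorder data to match.

    data is assumed to have longitude as its last dimension.
    """
    shifted = ((lons + 180.0) % 360.0) - 180.0
    order = np.argsort(shifted)
    return shifted[order], data[..., order]
```

As the note above explains, this should be applied at load time, before any longitude subsetting.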

pyaerocom.io.iris_io.check_dim_coord_names_cube(cube)[source]
pyaerocom.io.iris_io.check_dim_coords_cube(cube)[source]

Checks, and if necessary and applicable, updates coords names in Cube

Parameters

cube (iris.cube.Cube) – input cube

Returns

updated or unchanged cube

Return type

iris.cube.Cube

pyaerocom.io.iris_io.check_time_coord(cube, ts_type, year)[source]

Method that checks the time coordinate of an iris Cube

This method checks if the time dimension of a cube is accessible and according to the standard (i.e. fully usable). It only checks, and does not correct. For the latter, please see correct_time_coord().

Parameters
  • cube (Cube) – cube containing data

  • ts_type (str) – pyaerocom ts_type

  • year – year of data

Returns

True, if time dimension is ok, False if not

Return type

bool

pyaerocom.io.iris_io.check_time_coordOLD(cube, ts_type, year)[source]

Method that checks the time coordinate of an iris Cube

This method checks if the time dimension of a cube is accessible and according to the standard (i.e. fully usable). It only checks, and does not correct. For the latter, please see correct_time_coord().

Parameters
  • cube (Cube) – cube containing data

  • ts_type (str) – temporal resolution of data (e.g. “hourly”, “daily”). This information is e.g. encoded in the filename of a NetCDF file and may be accessed using pyaerocom.io.FileConventionRead

  • year (int) – integer specifying year of observation, e.g. 2017

Returns

True, if time dimension is ok, False if not

Return type

bool

pyaerocom.io.iris_io.concatenate_iris_cubes(cubes, error_on_mismatch=True)[source]

Concatenate list of iris.Cube instances cubes into single Cube

Helper method for concatenating list of cubes and that helps with handling the fact that the corresponding iris method is not well defined in the sense of what it returns (i.e. instance of Cube or CubeList, depending on whether all cubes could be concatenated or not…)

This method is not supposed to be called directly but rather concatenate_cubes() (which ALWAYS returns instance of Cube or raises Exception) or concatenate_possible_cubes() (which ALWAYS returns instance of CubeList or raises Exception)

Parameters
  • cubes (CubeList) – list of individual cubes

  • error_on_mismatch – boolean specifying whether an Exception is supposed to be raised or not

Returns

result of concatenation

Return type

Cube or CubeList

Raises

iris.exceptions.ConcatenateError – if error_on_mismatch=True and input cubes could not all concatenated into a single instance of iris.Cube class.

pyaerocom.io.iris_io.correct_time_coord(cube, ts_type, year)[source]

Method that corrects the time coordinate of an iris Cube

Parameters
  • cube (Cube) – cube containing data

  • ts_type (TsType or str) – temporal resolution of data (e.g. “hourly”, “daily”). This information is e.g. encoded in the filename of a NetCDF file and may be accessed using pyaerocom.io.FileConventionRead

  • year (int) – integer specifying start year, e.g. 2017

Returns

the same instance of the input cube with corrected time dimension axis

Return type

Cube

pyaerocom.io.iris_io.load_cube_custom(file, var_name=None, file_convention=None, perform_fmt_checks=None)[source]

Load netcdf file as iris.Cube

Parameters
  • file (str) – netcdf file

  • var_name (str) – name of variable to read

  • quality_check (bool) – if True, then a quality check of data is performed against the information provided in the filename

  • file_convention (FileConventionRead, optional) – Aerocom file convention. If provided, then the data content (e.g. dimension definitions) is tested against definition in file name

  • perform_fmt_checks (bool) – if True, additional quality checks (and corrections) are (attempted to be) performed.

Returns

loaded data as Cube

Return type

iris.cube.Cube

pyaerocom.io.iris_io.load_cubes_custom(files, var_name=None, file_convention=None, perform_fmt_checks=True, **kwargs)[source]

Load multiple NetCDF files into CubeList

Parameters
  • files (list) – list of netcdf files

  • var_name (str) – name of variable to read

  • quality_check (bool) – if True, then a quality check of data is performed against the information provide