API

Scripts

1. download_data.py:

2. download_data_by_date.py:

energy_balance.scripts.download_data_by_date.get_data(url, start_date, end_date, dir_path)[source]

Extract data from the Campbell data logger for each specified table and save it to a daily csv file for each day in the date range specified. Default tables are: Housekeeping, GPS_datetime, SoilTemperature, SoilMoisture, SoilHeatFlux and Radiation.

Parameters
  • url – (str) URL for connection with logger in format ‘tcp:iphost:port’ or ‘serial:/dev/ttyUSB0:19200:8N1’

  • start_date – (datetime.datetime) The start date from which to collect data

  • end_date – (datetime.datetime) The end date after which to stop collecting data. (the end date will be included in the data.)

  • dir_path – (str) The path to the top level directory in which to create the csv files and folders.

Returns

None
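The inclusive date range described above could be walked one day at a time as in the sketch below. `iter_days` is a hypothetical helper introduced only for illustration; it is not part of the package, and the script itself may iterate differently.

```python
from datetime import datetime, timedelta


def iter_days(start_date, end_date):
    """Yield one datetime per day from start_date to end_date, inclusive.

    Hypothetical helper: illustrates the inclusive end date only.
    """
    current = start_date
    while current <= end_date:
        yield current
        current += timedelta(days=1)


# Three daily files would be expected for this range: 30 Jan, 31 Jan, 1 Feb.
days = list(iter_days(datetime(2020, 1, 30), datetime(2020, 2, 1)))
```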

energy_balance.scripts.download_data_by_date.get_data_from_range(device, table, csv_path, start, end, header)[source]

Gets range of data specified by start and end dates and saves to csv at the path specified.

Parameters
  • device – (pycampbellcr1000.CR1000 object) The logger device connection, created from a URL in format ‘tcp:iphost:port’ or ‘serial:/dev/ttyUSB0:19200:8N1’

  • table – (str) The name of the table on the logger from which the data is being extracted.

  • csv_path – (str) The path to the csv file to backfill with today's data.

  • start – (datetime.datetime) The start datetime from which to collect data

  • end – (datetime.datetime) The end datetime after which to stop collecting data. (end will be included in the data.)

Returns

None

3. add_to_mysql.py:

energy_balance.scripts.add_to_mysql.insert_into_tables(user, password, database, dir_path)[source]

Gets data from the csv files found in the specified directory path and inserts it into MySQL tables. The MySQL tables must have been created prior to running this. The default names used to map the logger tables to MySQL tables are: {‘Housekeeping’: ‘housekeeping’, ‘GPS_datetime’: ‘gps’, ‘SoilTemperature’: ‘soil_temp’, ‘SoilMoisture’: ‘soil_moisture’, ‘SoilHeatFlux’: ‘soil_heat_flux’, ‘Radiation’: ‘radiation’}

Parameters
  • user – (str) The username for connecting to MySQL.

  • password – (str) The password for connecting to MySQL.

  • database – (str) The name of the database in which the tables exist.

  • dir_path – (str) The path to the top level directory in which the csv files and folders were created.

Returns

None

4. create_files.py:

energy_balance.scripts.create_files.create_files(start_date, end_date, frequency, data_product)[source]

Create netcdf files for the specified data product in the time range provided. If no data product is provided, netcdf files are created for soil and radiation.

Parameters
  • start_date – (datetime.datetime) The start date for which to create the files.

  • end_date – (datetime.datetime) The end date for which to create the files.

  • frequency – (str) The frequency for files - daily or monthly.

  • data_product – (str) The data product to create the netcdf files for e.g. radiation or soil

Returns

None

energy_balance.scripts.create_files.create_radiation_files(date, frequency)[source]

Create radiation netcdf.

Parameters
  • date – (datetime.datetime) The date for which to create the file.

  • frequency – (str) The frequency for files - daily or monthly.

Returns

None

energy_balance.scripts.create_files.create_soil_files(date, frequency)[source]

Create soil netcdf.

Parameters
  • date – (datetime.datetime) The date for which to create the file.

  • frequency – (str) The frequency for files - daily or monthly.

Returns

None

energy_balance.scripts.create_files.get_create_file(data_product)[source]

Get the function for creating files for the specified data product.
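A lookup of this kind is often written as a dictionary dispatch. The sketch below is an assumption about the general shape, not the package's code: the two creator functions are placeholders, and the `CREATORS` mapping and the `ValueError` for unknown products are hypothetical.

```python
def create_soil_files(date, frequency):
    # Placeholder standing in for the real soil file creator.
    return f"soil:{frequency}"


def create_radiation_files(date, frequency):
    # Placeholder standing in for the real radiation file creator.
    return f"radiation:{frequency}"


# Hypothetical mapping from data product name to creator function.
CREATORS = {
    "soil": create_soil_files,
    "radiation": create_radiation_files,
}


def get_create_file(data_product):
    """Return the creator function registered for data_product."""
    try:
        return CREATORS[data_product]
    except KeyError:
        raise ValueError(f"Unknown data product: {data_product!r}") from None
```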

5. calculate_valid_min_max.py:

energy_balance.scripts.calculate_valid_min_max.calculate_valid_min_max(fpath, var_name, qc_var_name, qc_value)[source]

Recalculate valid min and valid max for a variable, given the quality control variable and the maximum desired quality control value. Useful after the quality control variable has been changed manually.

Parameters
  • fpath – (str) Path to netCDF file on which to calculate the min/max

  • var_name – (str) The name of the variable to update the min/max on.

  • qc_var_name – (str) The name of the quality control variable to use as a mask for retrieving valid values.

  • qc_value – (int) Max value of qc to use i.e. 1 will calculate min/max on only ‘good data’, 2 will calculate it on good data and data marked with a flag of 2.
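The masking step can be illustrated without a netCDF file. The following is a minimal sketch on plain arrays, assuming that values are kept when their flag is less than or equal to `qc_value` and that NaN entries are ignored; `valid_min_max` is a hypothetical helper, and the script itself reads and updates the variable in the file at `fpath`.

```python
import numpy as np


def valid_min_max(values, qc_flags, qc_value):
    """Return (min, max) over values whose QC flag is <= qc_value.

    Hypothetical helper working on in-memory arrays only.
    """
    values = np.asarray(values, dtype=float)
    qc_flags = np.asarray(qc_flags)
    # Keep entries that pass the QC threshold and are not NaN.
    keep = (qc_flags <= qc_value) & ~np.isnan(values)
    if not keep.any():
        return None, None
    return float(values[keep].min()), float(values[keep].max())


values = [280.0, 285.5, 400.0, 279.0]
flags = [1, 1, 2, 3]
good_only = valid_min_max(values, flags, qc_value=1)
up_to_flag_2 = valid_min_max(values, flags, qc_value=2)
```

With `qc_value=1` only the first two values contribute; with `qc_value=2` the value flagged 2 is included as well.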

6. create_qc_csvs.py:

energy_balance.scripts.create_qc_csvs.create_files(start_date, end_date, frequency, data_product, fpath)[source]

Create masked csvs for the specified data product in the time range provided.

Parameters
  • start_date – (datetime.datetime) The start date for which to create the files.

  • end_date – (datetime.datetime) The end date for which to create the files.

  • frequency – (str) The frequency for files - daily or monthly.

  • data_product – (str) The data product to create the csvs for e.g. radiation or soil

  • fpath – (str) The directory path at which to create the output file.

Returns

None

energy_balance.scripts.create_qc_csvs.create_radiation_files(date, frequency, path)[source]

Create radiation masked csv.

Parameters
  • date – (datetime.datetime) The date for which to create the file.

  • frequency – (str) The frequency for files - daily or monthly.

Returns

None

energy_balance.scripts.create_qc_csvs.create_soil_files(date, frequency, path)[source]

Create soil masked csv.

Parameters
  • date – (datetime.datetime) The date for which to create the file.

  • frequency – (str) The frequency for files - daily or monthly.

Returns

None

energy_balance.scripts.create_qc_csvs.get_create_file(data_product)[source]

Get the function for creating files for the specified data product.

energy_balance.scripts.create_qc_csvs.prepare_date(date, frequency)[source]

Convert datetimes to strings, depending on frequency. If monthly, the format returned will be %Y%m. If daily, the format returned will be %Y%m%d.

Parameters
  • date – (datetime.datetime) The date for which the file is being created.

  • frequency – (str) The frequency for files - daily or monthly.

Returns

(str) The date converted to string format.
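The two formats can be sketched with `strftime` as below. This is an illustration of the documented behaviour rather than the package's code; raising `ValueError` for any other frequency is an assumption.

```python
from datetime import datetime


def prepare_date(date, frequency):
    """Format date as %Y%m for 'monthly' or %Y%m%d for 'daily'."""
    if frequency == "monthly":
        return date.strftime("%Y%m")
    if frequency == "daily":
        return date.strftime("%Y%m%d")
    # Assumption: unknown frequencies are rejected.
    raise ValueError(f"Unsupported frequency: {frequency!r}")


monthly = prepare_date(datetime(2020, 3, 5), "monthly")  # '202003'
daily = prepare_date(datetime(2020, 3, 5), "daily")      # '20200305'
```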

7. plot_csv.py:

energy_balance.scripts.plot_csv.plot(start, end, columns, fpath)[source]

Plot the columns from the csv specified.

Parameters
  • start – (str) The start date/time from which to plot.

  • end – (str) The end date/time for the plot.

  • columns – (list) List of columns to plot.

  • fpath – (str) File path of csv file.

Returns

None

energy_balance.scripts.plot_csv.validate_time(time)[source]

Validate that time is in the correct format (Y-m-d H:M:S).

Parameters

time – (str) The time string to validate

Returns

(str) The input time, if validated. Otherwise an exception is raised.
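A check of this kind can be sketched with `datetime.strptime`, assuming the expected format corresponds to `%Y-%m-%d %H:%M:%S`. The exception type used here (`ValueError`) is an assumption; the documentation above only states that an exception is raised.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # Assumed strptime equivalent of Y-m-d H:M:S


def validate_time(time):
    """Return time unchanged if it parses with TIME_FORMAT, else raise."""
    try:
        datetime.strptime(time, TIME_FORMAT)
    except ValueError:
        raise ValueError(
            f"Time {time!r} does not match format {TIME_FORMAT}"
        ) from None
    return time
```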

Quality control

class energy_balance.netcdf.quality_control.QualityControl(date, frequency)[source]

Bases: object

Base class used to apply quality control to data in pandas dataframes. Creates a quality control dataframe and a masked dataframe (the initial data with a quality control mask applied) from input csv files. The input files and various options are taken from a config file.

Constant values are taken from the config file, excluding ‘headers’ which must be set in each specific implementation.

Parameters
  • date – (datetime.datetime) The date to do the QC for. If frequency is monthly, only the year and month will be taken into account.

  • frequency – (str) ‘daily’ or ‘monthly’. Determines whether one day's worth of data, or one month's worth, is taken from the csv files to create the dataframes.

apply_qc(conditions, choices, col)[source]

Generic method to apply QC to a column in a dataframe. A new column is created in the QC dataframe.

Parameters
  • conditions – (list) The conditions at which a QC flag should be applied. e.g. [np.isnan(self._df[col]), self._df[col] < -35, self._df[col] > 50]

  • choices – (list) The QC flag to be applied, corresponds to conditions. e.g. [2, 2, 2]

  • col – (str) The name of the column to apply QC to e.g. ‘WP_kPa_1’
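The conditions/choices pairing matches the interface of `numpy.select`, so the idea can be sketched as below. Whether the method actually uses `numpy.select`, and the default flag of 1 for rows that match no condition, are assumptions; the dataframes here are small stand-ins for `self._df` and the QC dataframe.

```python
import numpy as np
import pandas as pd

# Stand-in data: one good value, one missing, one too low, one too high.
df = pd.DataFrame({"WP_kPa_1": [10.0, np.nan, -40.0, 60.0]})
qc = pd.DataFrame(index=df.index)

col = "WP_kPa_1"
conditions = [
    df[col].isna().to_numpy(),
    (df[col] < -35).to_numpy(),
    (df[col] > 50).to_numpy(),
]
choices = [2, 2, 2]

# The first matching condition decides the flag; unmatched rows get the
# assumed default of 1 ("good data").
qc[col] = np.select(conditions, choices, default=1)
```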

create_dataframes()[source]

Class specific implementation to create the pandas dataframe from the input csv and a QC dataframe that is empty apart from its column names. Sets self._df and self._qc.

create_masked_csv(file_path)[source]

Create a csv file from the masked dataframe.

Parameters

file_path – (str) The path at which to create the csv file e.g. /path/to/my/file.csv

create_masked_df(qc_flag)[source]

Create masked pandas dataframe based on self._qc and the qc flag requested. Sets self._df_masked.

Parameters

qc_flag – (int) Max value of qc to show i.e. 1 will show only ‘good data’, 2 will show good data and data marked with a flag of 2.
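One way to picture the masking is `DataFrame.where` with the QC dataframe as the condition. This is a sketch under the assumption that values whose flag exceeds `qc_flag` are replaced by NaN; `mask_by_qc` is a hypothetical function, not the method itself.

```python
import pandas as pd


def mask_by_qc(df, qc, qc_flag):
    """Keep values whose QC flag is <= qc_flag; others become NaN."""
    return df.where(qc <= qc_flag)


df = pd.DataFrame({"temp": [1.0, 2.0, 3.0]})
qc = pd.DataFrame({"temp": [1, 2, 3]})

good_only = mask_by_qc(df, qc, 1)     # only the first value survives
up_to_flag_2 = mask_by_qc(df, qc, 2)  # the first two values survive
```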

property df

Returns the original dataframe created from the input csv files. All headers set in each class implementation of self.headers are included.

property df_masked

Returns the original dataframe masked following QC.

dt_header = 'Datetime'

execute_qc()[source]

Create the dataframes, apply the QC and create the masked dataframe.

headers = 'UNDEFINED'

prepare_date(input_date_format)[source]

Prepares the input date format so it matches with the frequency requested.

Parameters

input_date_format – (str) The format in which the date is provided in the input csv files.

Returns

(str) The date now converted to string format.

property qc

Returns the QC dataframe created based on conditions and choices set in the qc_variables method.

qc_flag_level = 1

qc_variables()[source]

Class specific implementation to apply QC to all columns.

class energy_balance.netcdf.soil_quality_control.SoilQualityControl(date, frequency)[source]

Bases: energy_balance.netcdf.quality_control.QualityControl

create_dataframes()[source]

SoilQualityControl specific implementation to create the pandas dataframe from the input csvs and a QC dataframe that is empty apart from its column names. Sets self._df and self._qc.

qc_variables()[source]

SoilQualityControl specific implementation to set QC conditions and flags and record in QC dataframe.

class energy_balance.netcdf.radiation_quality_control.RadiationQualityControl(date, frequency)[source]

Bases: energy_balance.netcdf.quality_control.QualityControl

apply_cleaning_and_temp_masks()[source]

Apply cleaning QC and body temperature QC to all columns in the dataframe.

create_dataframes()[source]

RadiationQualityControl specific implementation to create the pandas dataframe from the input csvs and a QC dataframe that is empty apart from its column names. Sets self._df and self._qc.

create_masked_df(qc_flag)[source]

RadiationQualityControl specific implementation to create the masked dataframe.

qc_variables()[source]

RadiationQualityControl specific implementation to set QC conditions and flags and record in QC dataframe.

NetCDF

class energy_balance.netcdf.base_netcdf.BaseNetCDF(df, qc, date, frequency)[source]

Bases: object

Base class used for creating netCDF files. Creates all the common variables found in netCDF files under the NCAS-GENERAL Data Standard. Sets all the required global attributes.

Constant values are taken from the config file, excluding ‘headers’ and ‘data_product’ which must be set in each specific implementation.

Parameters
  • df – A pandas dataframe containing all columns required to create the netCDF file.

  • qc – A pandas dataframe with the same columns as df, but containing the quality control values instead. (i.e. 1, 2, 3 etc.)

  • date – (datetime.datetime) The date to create the netCDF file for. If frequency is monthly, only the year and month will be taken into account.

  • frequency – (str) ‘daily’ or ‘monthly’. Determines whether the file will use data from one day or for one month.

convert_date_to_string(date, frequency)[source]

Generate a date string for the file name based on the date provided and the frequency required.

Parameters
  • date – (datetime.datetime) The date to convert to string.

  • frequency – (str) The frequency at which to have the date string.

Returns

(str) The date now converted to string format.

static convert_times(times)[source]

Convert times from strings to total seconds since 1970-01-01T00:00:00.

Parameters

times – (sequence) Times to convert to total seconds since 1970-01-01T00:00:00.

Returns

(list) The times converted to total seconds since 1970-01-01T00:00:00.
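The conversion can be sketched with the standard library as follows. The input format `%Y-%m-%d %H:%M:%S` and the treatment of the strings as naive UTC times are assumptions; the `fmt` parameter is hypothetical and not part of the documented signature.

```python
from datetime import datetime

EPOCH = datetime(1970, 1, 1)


def convert_times(times, fmt="%Y-%m-%d %H:%M:%S"):
    """Convert time strings to total seconds since 1970-01-01T00:00:00.

    Assumes naive timestamps on the same time base as the epoch.
    """
    return [(datetime.strptime(t, fmt) - EPOCH).total_seconds() for t in times]


seconds = convert_times(["1970-01-01 00:00:00", "1970-01-02 00:00:10"])
```

The second entry is one day (86400 s) plus ten seconds after the epoch.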

create_lat_variable()[source]

Create the common latitude variable.

create_lon_variable()[source]

Create the common longitude variable.

create_netcdf()[source]

Create the netCDF dataset.

create_qc_variable(name, header, dimensions, **kwargs)[source]

Generic method to create a qc variable on the dataset.

Parameters
  • name – (str) The name of the variable to be created.

  • header – (str) The name of the column in the df pandas dataframe to use to populate the data of this variable.

  • dimensions – (tuple) The dimensions of the variable to be created e.g. (‘time’, ) or (‘time’, ‘index’)

  • kwargs – (dict) Dictionary of attributes {‘attr_name’: ‘attr_value’} to set on the variable e.g. {‘standard_name’: ‘soil_temperature’}

create_specific_dimensions()[source]

Class specific implementation to create dimensions specific to that data product.

create_specific_variables()[source]

Class specific implementation to create variables specific to that data product, including any qc variables.

create_time_related_variable(name, data_type, values, long_name)[source]

Generic method to create variables day of year, day, year, month, hour, second, minute.

Parameters
  • name – (str) The name of the variable to be created.

  • data_type – The data type of the variable to be created e.g. numpy.float32

  • values – (sequence) The values to set for this variable.

  • long_name – (str) The long name of this variable.

create_time_variable()[source]

Create the common time variable.

create_variable(name, data_type, dims, header, **kwargs)[source]

Generic method to create a variable in the netCDF4 dataset.

Parameters
  • name – (str) The name of the variable to be created.

  • data_type – The data type of the variable to be created e.g. numpy.float32

  • dims – (tuple) The dimensions of the variable to be created e.g. (‘time’, ) or (‘time’, ‘index’)

  • header – (str) The name of the column in the pandas dataframe to use to populate the data of this variable.

  • kwargs – (dict) Dictionary of attributes {‘attr_name’: ‘attr_value’} to set on the variable e.g. {‘standard_name’: ‘soil_temperature’}

data_product = 'UNDEFINED'

dt_header = 'Datetime'

fill_value = -1e+20

get_masked_data(mask_value)[source]

Create masked pandas dataframe based on self.qc and the qc flag requested. Sets self.df_masked.

Parameters

mask_value – (int) Max value of qc to show i.e. 1 will show only ‘good data’, 2 will show good data and data marked with a flag of 2.

headers = 'UNDEFINED'

qc_flag_level = 1

set_global_attributes()[source]

Sets the global attributes in the dataset based on those listed in the config file.

static times_as_datetimes(times)[source]

Convert times from strings to datetimes.

Parameters

times – (sequence) Times to convert to datetimes in format Y-m-d H:M:S.

Returns

(list) The times converted to datetimes.

class energy_balance.netcdf.soil_netcdf.SoilNetCDF(df, qc, date, frequency)[source]

Bases: energy_balance.netcdf.base_netcdf.BaseNetCDF

Class for creating soil netcdf files. Creates soil specific dimensions and variables.

static convert_temps_to_kelvin(temps)[source]

Convert temperatures from degrees Celsius to Kelvin.

Parameters

temps – (sequence) Temperatures to convert (in degrees C).

Returns

(list) The temperatures converted to Kelvin.
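The conversion adds the standard offset of 273.15. A minimal sketch of a function with the documented input and output types (the body is an illustration, not the package's code):

```python
def convert_temps_to_kelvin(temps):
    """Convert a sequence of temperatures in degrees Celsius to Kelvin."""
    return [t + 273.15 for t in temps]


kelvin = convert_temps_to_kelvin([0.0, 20.0, -273.15])
```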

create_qc_variable(name, headers, **kwargs)[source]

SoilNetCDF specific implementation to account for index dimension.

Parameters
  • name – (str) The name of the variable to be created.

  • headers – (list) The names of the columns in the df pandas dataframe to use to populate the data of this variable.

  • kwargs – (dict) Dictionary of attributes {‘attr_name’: ‘attr_value’} to set on the variable e.g. {‘standard_name’: ‘soil_temperature’}

create_soil_heat_flux_variable()[source]

Create soil heat flux variable on the netCDF dataset.

create_soil_moisture_variable()[source]

Create soil water potential variable on the netCDF dataset.

create_soil_temp_variable()[source]

Create soil temperature variable on the netCDF dataset.

create_specific_dimensions()[source]

SoilNetCDF specific implementation to create index dimension.

create_specific_variables()[source]

SoilNetCDF specific implementation to create all soil specific variables.

create_variable(name, data_type, headers, **kwargs)[source]

SoilNetCDF specific implementation to account for index dimension.

Parameters
  • name – (str) The name of the variable to be created.

  • data_type – The data type of the variable to be created e.g. numpy.float32

  • headers – (list) The names of the columns in the pandas dataframe to use to populate the data of this variable.

  • kwargs – (dict) Dictionary of attributes {‘attr_name’: ‘attr_value’} to set on the variable e.g. {‘standard_name’: ‘soil_temperature’}

class energy_balance.netcdf.radiation_netcdf.RadiationNetCDF(df, qc, date, frequency)[source]

Bases: energy_balance.netcdf.base_netcdf.BaseNetCDF

create_radiation_variables()[source]

Create all radiation variables.

create_specific_dimensions()[source]

Create any radiation specific dimensions - there are none.

create_specific_variables()[source]

RadiationNetCDF specific implementation to create all radiation specific variables, including qc variables.