Welcome to energy-balance-software’s documentation!

This software is intended for logging data from an NCAS AMOF energy balance tower via a Campbell Scientific CR3000 logger using Python. The software is intended for soil and radiation data products, and where data is referred to, it is data relating to these data products. It may also work with CR1000 or other Campbell loggers, but this has not been tested.

There are various components as explained briefly below:

  • The software provides capabilities for streaming data from the Campbell logger, downloading data from specific dates and producing a csv file containing this data for each day.

  • There is functionality to put the data from these daily csv files into a MySQL database, to allow for visualization e.g. with Grafana.

  • There is a script to create daily or monthly netCDF files that conform to the NCAS-GENERAL Data Standard from these daily csv files. As part of this process, quality control is carried out on the variables, and quality control variables are added to the netCDF datasets.

  • It is also possible to create csv files for soil and radiation that have had quality control applied, to allow you to view the data manually or plot via the plotting script provided.

  • A plotting script is provided to plot specified csv columns against time, provided a timestamp exists in the csv file.

With this software, the process of collecting data, applying quality control, plotting and creating netCDF files should be much more straightforward.

Below are images of the sensors in use:

_images/sensors.jpg _images/sensors_2.jpg

Quick Guide

The full documentation can be found at https://ncas-energy-balance-1-software.readthedocs.io/en/latest/

  • This repository contains scripts for logging data from an NCAS AMOF energy balance tower via a Campbell Scientific CR3000 logger using Python.

  • The logger should be configured to generate data tables for the data it collects.

  • Information about getting started with the logger can be found on the Campbell Scientific website (https://www.campbellsci.co.uk/cr3000). See in particular the ‘Getting Started Guide’ and the Videos & Tutorials.

  • The names of these data tables will be required to use this software and get the data from the logger.

  • This software replaces the need to use Microsoft Windows software for acquiring the data.

Before attempting to install this software, ensure you have Python 3.7 and git installed. (What is Git: https://github.com/git-guides/) (Installing Git: https://github.com/git-guides/install-git )

The CR3000 logger:

_images/logger.jpg

To see which scripts are available and what they do, go to scripts.

Install

All scripts can be downloaded from the GitHub repo. To do this, follow the steps below. An explanation of using git clone can be found here: https://github.com/git-guides/git-clone

$ git clone https://github.com/ncasuk/ncas-energy-balance-1-software.git
$ cd ncas-energy-balance-1-software

To use the scripts, create a virtual environment and install the package from within the ncas-energy-balance-1-software repository. The package is called energy_balance and contains the scripts and everything you need to use them.

$ python3 -m venv venv
$ source venv/bin/activate
$ pip install .

If you would like to be able to edit the scripts, change the final line to

$ pip install -e .

Creating a virtual environment allows you to manage the installation requirements. You will need to activate the virtual environment before using the scripts (source venv/bin/activate) after opening a new terminal window.

If running script 7 - plot_csv.py, you will need to install matplotlib. On Raspberry Pi, run

$ sudo apt-get install python3-matplotlib

Otherwise,

$ pip install matplotlib

It is likely that you will use a Raspberry Pi to connect to the logger via a Moxa NPort in Real COM mode, and will therefore run the scripts on the Raspberry Pi. You may need to adjust the ownership and permissions of the serial device so that your user can access it:

$ sudo chown root:dialout /dev/ttyr0
$ sudo chmod 660 /dev/ttyr0

Sometimes there can be a problem installing numpy on Raspberry Pi. The troubleshooting page for this is: https://numpy.org/devdocs/user/troubleshooting-importerror.html

The command:

$ sudo apt-get install libatlas-base-dev

usually works.

If you get the error invalid command 'bdist_wheel', run the command below.

$ pip install wheel

You may also encounter problems installing netCDF4 on Raspberry Pi. If you get the error ValueError: did not find HDF5 headers, install the following packages:

$ sudo apt-get install libhdf5-dev
$ sudo apt-get install libnetcdf-dev

Below are images of the logger set up with a Moxa NPort and the Raspberry Pi.

_images/full_setup.jpg _images/raspberry_pi.jpg

Downloading Data

To start downloading data from the logger, ensure you have activated your virtual environment and set up any config changes (see what to change and how: config). It is likely you will need to change logger_url, logger_csv_path and logger_tables in your config to get started.

You can either download data from a specific date range:

$ cd energy_balance/scripts
$ python download_data_by_date.py -s 2021-07-21 -e 2021-07-23

Or download all data from today:

$ cd energy_balance/scripts
$ python download_data.py

These commands will pull data from the tables you have specified from your logger and save it in csv files at the path you have provided.

Errors

Sometimes the scripts that retrieve data from the logger may raise an error such as:

NoDeviceException

If this occurs, please run the script again. The connection to the logger sometimes fails if there are too many queries sent.
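One simple workaround is to wrap the query in a retry loop. The sketch below is not part of the package; retry is a hypothetical helper you could place around any flaky logger call (e.g. retry(lambda: device.get_data('Radiation', start, stop))):

```python
import time

def retry(func, attempts=3, delay=5):
    """Call func(), retrying on any exception up to `attempts` times.

    Illustrative helper, not part of the energy_balance package.
    Waits `delay` seconds between attempts; re-raises on final failure.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == attempts:
                raise
            time.sleep(delay)
```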

Scripts

The scripts make use of the package PyCampbellCR1000 (an updated fork); this works for CR3000 loggers as well. The documentation for this package can be found here: https://pycampbellcr1000.readthedocs.io/en/latest/

To see the API and source code for these scripts, go to api.

Various settings used in these scripts can be set/changed in the config file: ncas-energy-balance-1-software/energy_balance/etc/config.ini. This includes input/output file paths and settings for netcdf global attributes. This is explained on the config page.

Note

To change the default config settings, provide your own config file by setting the environment variable CONFIG as the file path to your file. e.g.

$ export CONFIG='path/to/my/config.ini'

Note

To find your datalogger URL you must know:
  • The path to the port and the baud rate of the port. (URL = ‘serial:path:baudrate’ e.g. ‘serial:/dev/ttyUSB0:115200’)

  • For serial connections, you can also provide the byte size, parity and stop bits if required; this is assumed to be ‘8N1’ if not provided. (e.g. ‘serial:/dev/ttyUSB0:115200:8N1’)

  • It may also be possible to connect via tcp (URL = ‘tcp:host-ip:port’), however this is experimental and serial is recommended.
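As an illustration of the URL forms above, a hypothetical parser (not part of the package) could split them apart like this:

```python
def parse_logger_url(url):
    """Split a datalogger URL into its parts.

    Illustrative only. Handles the forms described above:
      'serial:/dev/ttyUSB0:115200'      -> path + baudrate, mode defaults to '8N1'
      'serial:/dev/ttyUSB0:115200:8N1'  -> explicit byte size/parity/stop bits
      'tcp:host-ip:port'                -> host + port
    """
    scheme, rest = url.split(":", 1)
    if scheme == "serial":
        parts = rest.rsplit(":", 2)
        if len(parts) == 3:
            path, baud, mode = parts
        else:
            path, baud = parts
            mode = "8N1"  # assumed when not provided
        return {"scheme": scheme, "path": path, "baudrate": int(baud), "mode": mode}
    elif scheme == "tcp":
        host, port = rest.split(":")
        return {"scheme": scheme, "host": host, "port": int(port)}
    raise ValueError(f"unknown scheme: {scheme}")
```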

Use the -h option on any script to see the command line arguments available.

1. download_data.py:

  • Created to be set up as a cron job every 5 minutes (or another time interval). This downloads data from tables on the logger and saves to a daily csv file. Doing this provides a stream of data and saving as daily files allows netCDF files to easily be created.

  • If the command line argument -t is present, the script will update the logger time at midnight, to prevent the logger time drifting. So if set up to run on an interval with a cron job (that will run at midnight), the logger clock will be reset daily.

  • The files are made in the directory specified in the config file, under logger_csv_path, under another directory named after the table e.g. <logger_csv_path>/SoilMoisture/SoilMoisture_2021-07-21.csv

  • The datalogger URL must be set in the config file e.g. serial:/dev/ttyUSB0:115200 or tcp:host-ip:port (see above note explaining this.)

  • Edit logger_tables in the config file to change the tables downloaded. The default tables are Housekeeping, GPS_datetime, SoilTemperature, SoilMoisture, SoilHeatFlux and Radiation, these are AMOF specific.

To run once:

$ cd energy_balance/scripts
$ python download_data.py

If successful, this will produce output like the example below, showing how many records were found (repeated for each table):

Your download is starting.
Packet 0 with 13 records
Packet 1 with 14 records
Packet 2 with 14 records
---------------------------
41 new records were found

To set up a cron job:

$ crontab -e

Add a command, such as the one below, to this file:

*/5 * * * * . /home/pi/ncas-energy-balance-1-software/venv/bin/activate && /home/pi/ncas-energy-balance-1-software/energy_balance/scripts/download_data.py >> /home/pi/campbell_data/data-download-cron.log 2>&1

The section . /home/pi/ncas-energy-balance-1-software/venv/bin/activate activates the virtual environment, and */5 * * * * runs the script every 5 minutes. The first file path needs to point to your virtual environment and the second to the location of the script. The final file path points to the location at which to write a log file; this can be excluded if not required.

The command:

$ crontab -l

will list cron jobs you have set up.

By using download_data.py -t, instead of just download_data.py, the logger time will be updated at midnight.
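The midnight check behind the -t option could be sketched as below. This illustrates the idea (a job running every 5 minutes only resets the clock on the midnight run, so the logger time is corrected once per day); the script's actual logic may differ:

```python
from datetime import datetime

def should_sync_clock(now):
    """Return True only on the midnight run of a 5-minute cron schedule.

    Illustrative sketch of the -t behaviour, not the script's code:
    the clock is reset once per day, on the run at 00:00-00:04.
    """
    return now.hour == 0 and now.minute < 5
```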

2. download_data_by_date.py:

  • Intended to be used to bulk download data over a range of days.

  • Useful if system has been turned off/ was down etc.

  • This downloads data from tables on the logger and saves to a daily csv file.

  • The files are made in the directory specified in the config file, under logger_csv_path, under another directory named after the table e.g. <logger_csv_path>/SoilMoisture/SoilMoisture_2021-07-21.csv

  • Can be used in conjunction with the download_data.py script. For example, if the download_data.py script has stopped working over a period of time, the download_data_by_date.py script can be used to fill in these missing days, and will fill partially complete daily files as well.

  • The datalogger URL must be set in the config file e.g. serial:/dev/ttyUSB0:115200 or tcp:host-ip:port

  • The start and end dates of the days to download should be provided on the command line (in the format YYYY-MM-DD). A start date is required but an end date is not. If an end date is not provided, data is downloaded only for the day provided as the start date.

  • If a file for a day has partial data, this script will download the rest of the data for that day, following on from the latest entry in that file.

  • Edit logger_tables in the config file to change the tables downloaded. The default tables are Housekeeping, GPS_datetime, SoilTemperature, SoilMoisture, SoilHeatFlux and Radiation.

To run:

The below command will download data for 21/07/2021, 22/07/2021 and 23/07/2021 and create a csv file for each day.

$ cd energy_balance/scripts
$ python download_data_by_date.py -s 2021-07-21 -e 2021-07-23

This next command will download data only for 21/07/2021.

$ python download_data_by_date.py -s 2021-07-21

You will see a statement saying Data downloaded for ... once this is complete.

3. add_to_mysql.py:

  • This script will load the csv data for today’s files, created by the download_data script, into MySQL tables, provided the tables have already been created in the database.

  • For information on setting up MySQL on a Raspberry Pi, see https://pimylifeup.com/raspberry-pi-mysql/

  • For information on creating tables in MySQL, see https://dev.mysql.com/doc/refman/8.0/en/creating-tables.html

  • These updating tables could then be used as a source for visualizing the data, for example with Grafana. This would mean the plots could be kept up to date and allow you to see the data in real time.

  • This could be set up as a cron job along with the download_data script, to keep the tables up to date. See explanation below.

  • Edit logger_tables and mysql_tables in the config file to change the table names to those of your table names from the logger and the corresponding tables you have created in MySQL.

  • The default values used for the MySQL tables are housekeeping, gps, soil_temp, soil_moisture, soil_heat_flux and radiation. The defaults used for the logger tables are Housekeeping, GPS_datetime, SoilTemperature, SoilMoisture, SoilHeatFlux and Radiation.

  • The top level directory containing the csv files is taken from the config file (under logger_csv_path), assumed to be the same as that used to create the files. (i.e. the same as that used for the download_data.py script)

  • The username, password and database name should also be provided as command line arguments. See below:

$ cd energy_balance/scripts
$ python add_to_mysql.py -u <username> -p <password> -d <database>

This will output Inserted data into MySQL tables if successful.
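Under the hood, loading each csv row amounts to running a parameterized INSERT against the matching MySQL table. The helper below is a hypothetical sketch of that statement-building step, not the script's actual SQL:

```python
def build_insert(table, columns):
    """Build a parameterized MySQL INSERT statement for one csv row.

    Illustrative only. Using %s placeholders (as MySQL connector
    libraries expect) keeps values properly escaped rather than
    interpolating them into the SQL string.
    """
    cols = ", ".join(columns)
    placeholders = ", ".join(["%s"] * len(columns))
    return f"INSERT INTO {table} ({cols}) VALUES ({placeholders})"
```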

Setting up as a cron job:

If the download data script is set up to run every 5 minutes, this script could be set up to run on a 5 minute interval but 3 minutes after the download data script. The jobs in the crontab file would look like this:

*/5 * * * * . /home/pi/ncas-energy-balance-1-software/venv/bin/activate && /home/pi/ncas-energy-balance-1-software/energy_balance/scripts/download_data.py >> /home/pi/campbell_data/data-download-cron.log 2>&1
3-59/5 * * * * . /home/pi/ncas-energy-balance-1-software/venv/bin/activate && /home/pi/campbell_data/mysql_insert/add_to_mysql.py -u<username> -p<password> -d<database-name> >> /home/pi/campbell_data/cron_output/mysql-cron.log 2>&1

For extra security, the username and password for the database could be passed in from a text file, preventing them appearing in any logs. This can be done by using the path to the text file as below:

3-59/5 * * * * . /home/pi/ncas-energy-balance-1-software/venv/bin/activate && /home/pi/campbell_data/mysql_insert/add_to_mysql.py `cat /home/pi/campbell_data/mysql_insert/args.txt` >> /home/pi/campbell_data/cron_output/mysql-cron.log 2>&1

where /home/pi/campbell_data/mysql_insert/args.txt contains -u<username> -p<password> -d<database-name>

This means that the download data script would run at 00:00, 00:05, 00:10, 00:15 and every 5 minutes after. The MySQL script would run at 00:03, 00:08, 00:13 and every 5 minutes after.

4. create_files.py:

  • This script can be used to make netCDF files, that conform to the NCAS-GENERAL Data Standard, for soil and radiation data products. Quality control is carried out during this step, and quality control variables are included in the netCDF file.

  • Further details of the values used for quality control by these scripts can be found at: qc

  • The quality control level used to calculate valid min/max values is the value set in the config file under qc_flag_level.

  • Information on how the netCDF file should be built can be found at https://sites.google.com/ncas.ac.uk/ncasobservations/home/data-project/ncas-data-standards/ncas-amof/. Example files can also be found here.

  • For this to work, ensure settings in the config file are filled in correctly, e.g. column names, input files, input date format

  • Some of the quality control settings can be adjusted in the config file. e.g. the max/min temperature expected for Soil Temperature and the lower and upper bounds for the cleaning time of the radiation sensors. It would be sensible to discuss these settings with the instrument scientist.

  • The script takes some command line arguments to specify options for the creation of the files.

  • The files are created at the netcdf_path specified in the config file.

usage: create_files.py [-h] -s START_DATE [-e END_DATE] -f {daily,monthly}
                    -d {soil,radiation}

optional arguments:
-h, --help            show this help message and exit
-s START_DATE, --start-date START_DATE
                        The start date to create netCDF files for. e.g.
                        '2021-07-30' when creating daily files, '2021-07' when
                        creating monthly files.
-e END_DATE, --end-date END_DATE
                        The end date to create netCDF files for. e.g.
                        '2021-07-30' when creating daily files, '2021-07' when
                        creating monthly files. This is inclusive.
-f {daily,monthly}, --frequency {daily,monthly}
                        The frequency for creating the netCDF files, options
                        are daily or monthly.
-d {soil,radiation}, --data-product {soil,radiation}
                        The data product to create files for.

A start date is required, but an end date is not. If an end date is not provided, files are only created for the given start date. An example of usage is below.

To create a monthly netCDF file for June 2021, July 2021 and August 2021 for soil:

$ cd energy_balance/scripts
$ python create_files.py -s 2021-06 -e 2021-08 -f monthly -d soil

The file created for June 2021 would be ncas-energy-balance-1_<platform>_202106_soil_v<version>.nc, where platform and version are set in the config file.

To create a monthly netCDF file for soil for July 2021 only:

$ cd energy_balance/scripts
$ python create_files.py -s 2021-07 -f monthly -d soil

The file created would be called ncas-energy-balance-1_<platform>_202107_soil_v<version>.nc, where platform and version are set in the config file.

To create daily netCDF files for each day between 20th July 2021 and 27th July 2021 for radiation:

$ cd energy_balance/scripts
$ python create_files.py -s 2021-07-20 -e 2021-07-27 -f daily -d radiation

A file would be created for each day, e.g. for 20th July 2021: ncas-energy-balance-1_<platform>_20210720_radiation_v<version>.nc, where platform and version are set in the config file.
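The naming pattern shared by all of these examples can be expressed as a small sketch (the function is illustrative; the script takes platform and version from the config file):

```python
def netcdf_file_name(platform, date, data_product, version):
    """Compose an output file name following the pattern above:
    ncas-energy-balance-1_<platform>_<date>_<product>_v<version>.nc

    `date` is YYYYMM for monthly files or YYYYMMDD for daily files.
    Illustrative sketch only.
    """
    return f"ncas-energy-balance-1_{platform}_{date}_{data_product}_v{version}.nc"
```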

5. calculate_valid_min_max.py:

  • This script allows you to recalculate the valid min/max variables after manually changing the values of a quality control flag variable.

  • For example, the qc flag variable for soil_temperature is qc_flag_soil_temperature. If values of the qc flag variable are changed, it may change the valid minimum/maximum.

  • The quality control level used remains the value set in the config file under qc_flag_level.

  • To update the valid max/min values, use this script as below:

$ cd energy_balance/scripts
$ python calculate_valid_min_max.py -v soil_temperature -qc qc_flag_soil_temperature -fp /path/to/ncas-energy-balance-1_lab_20210730_soil_v0.1.nc

Once complete, you will see a message, e.g.

Recalculated valid min and valid max for soil_temperature, using qc_flag_soil_temperature as a mask, with qc flag value of 1

In general, the usage is:

usage: calculate_valid_min_max.py [-h] [-v VAR_NAME] [-qc QC_VAR_NAME] -fp FILE_PATH

optional arguments:
    -h, --help              show this help message and exit
    -v VAR_NAME, --var-name VAR_NAME
                            The name of the variable to update the min/max on.
                            e.g. 'soil_temperature'
    -qc QC_VAR_NAME, --qc-var-name QC_VAR_NAME
                            The name of the quality control variable to use as a
                            mask for retrieving valid values. e.g.
                            'qc_flag_soil_temperature'
    -fp FILE_PATH, --file-path FILE_PATH
                            The path to netCDF file on which to recalculate the
                            min/max e.g. /path/to/my/file.nc
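The recalculation itself amounts to taking the minimum and maximum over values whose qc flag is at or below qc_flag_level. A minimal sketch of the idea (the real script operates on the netCDF variables directly):

```python
def valid_min_max(values, qc_flags, qc_flag_level=1):
    """Recompute valid min/max using a qc flag variable as a mask.

    Illustrative sketch: only values flagged at or below
    qc_flag_level ('good' data, plus any further levels you choose
    to accept) contribute to the min/max.
    """
    good = [v for v, q in zip(values, qc_flags) if q <= qc_flag_level]
    if not good:
        return None, None
    return min(good), max(good)
```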

6. create_qc_csvs.py:

  • This script will generate csvs for soil/radiation data that have been quality controlled according to the level of quality control specified in the config file. These can then be plotted to see how changing the quality control changes the plot.

  • This will only apply automatic quality control as discussed in qc and will not take into account any manual changes done on the netCDF file.

  • Only columns used as variables in the netCDF files will be included. In the soil files these are: soil temperature, soil water potential, soil heat flux. In the radiation files: downwelling longwave radiation in air, upwelling longwave radiation in air, downwelling shortwave radiation in air, upwelling shortwave radiation in air and radiometer body temperature.

  • The name of the file created will be <data_product>_qc_<date>.csv e.g. soil_qc_20210730.csv.

  • The files are made in the directory specified in the config file, under qc_csv_path.

  • The quality control carried out flags data outside operational bounds, suspect data and data taken when sensors are being cleaned. To do this a quality control matrix is created, assigning each value a quality control flag. These are numbers from 0 to 255.

    • 0 is not used.

    • 1 means the data is ‘good’ i.e. it is within operational and expected bounds and hasn’t raised any suspicion.

    • Further values 2, 3, 4 etc. are assigned specific definitions e.g. 2 could mean the data is outside the operational bounds, 3 could mean there is a timestamp error.

    • Further details of the values used for quality control by these scripts can be found at: qc

  • The flag level to use can be set in the config file under qc_flag_level. Setting the level to 1 means only ‘good’ data is provided. This can be increased to include data from other qc flags, as described by the variables in the netCDF files. (The level chosen will include data from that level and below.)

  • Some of the quality control settings can be adjusted in the config file. e.g. the max/min temperature expected for Soil Temperature and the lower and upper bounds for the cleaning time of the radiation sensors. It would be sensible to discuss these settings with the instrument scientist.

  • These csvs can be plotted using script 7 (plot_csv.py) below.

usage: create_qc_csvs.py [-h] -s START_DATE [-e END_DATE] -f {daily,monthly}
                        -d {soil,radiation}

optional arguments:
-h, --help            show this help message and exit
-s START_DATE, --start-date START_DATE
                        The start date to create files for. e.g.
                        '2021-07-30' when creating daily files, '2021-07' when
                        creating monthly files.
-e END_DATE, --end-date END_DATE
                        The end date to create files for. e.g.
                        '2021-07-30' when creating daily files, '2021-07' when
                        creating monthly files. This is inclusive.
-f {daily,monthly}, --frequency {daily,monthly}
                        The frequency for creating the csv files, options
                        are daily or monthly.
-d {soil,radiation}, --data-product {soil,radiation}
                        The data product to create files for.

For example:

$ cd energy_balance/scripts
$ python create_qc_csvs.py -s 2021-07-30 -f daily -d radiation

An example of how the data could look before and after the quality control, in csv format, is shown below:

BEFORE:

Datetime,WP_kPa_1,T107_1,shf_1
2021-07-30 00:00:00,101.2294921875,21.69464111328125,0.4606184959411621
2021-07-30 00:05:00,67.27587890625,21.682518005371094,8.577472686767578
2021-07-30 00:10:00,55.2167313385,21.796310424804688,2.078993320465088
2021-07-30 00:15:00,86.1962890625,21.664581298828125,0.1369409263134002

AFTER:

Datetime,WP_kPa_1,T107_1,shf_1
2021-07-30 00:00:00,,21.69464111328125,0.4606184959411621
2021-07-30 00:05:00,67.27587890625,21.682518005371094,8.577472686767578
2021-07-30 00:10:00,55.2167313385,21.796310424804688,2.078993320465088
2021-07-30 00:15:00,,21.664581298828125,0.1369409263134002

The two soil water potential values (column WP_kPa_1) over 80 kPa have been masked out, as this is one of the quality control settings.
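That masking step can be sketched as below; the function is illustrative, with the 80 kPa bound standing in for the configurable setting:

```python
def mask_out_of_bounds(values, upper_bound=80.0):
    """Blank out readings above an operational bound, as in the
    before/after csvs above (soil water potential over 80 kPa).

    Illustrative sketch only; the package builds a full qc flag
    matrix instead. None entries write out as empty csv fields.
    """
    return [v if v <= upper_bound else None for v in values]
```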

7. plot_csv.py:

  • This script can be used to generate quick plots from csv files, provided the file contains a date/time column, using matplotlib. It will plot the csv columns you specify against datetime.

  • The name of the datetime column must be specified in the config file, under datetime_header.

  • This will allow you to take a quick look at any data, and could be used to look at how the plot changes when data is masked by the quality control.

  • The command line options allow you to specify the datetimes to plot between and which columns of the csv to plot.

  • If a start and/or end date are not provided, these will default to the start/end times in the csv.
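The windowing described in the last bullet can be sketched as follows (illustrative only; the script itself plots the filtered rows with matplotlib):

```python
def rows_in_window(rows, start=None, end=None):
    """Keep (datetime, value) rows between start and end, inclusive.

    Illustrative sketch of the command line behaviour above: when
    start or end is omitted it defaults to the first/last timestamp
    in the data. Assumes rows are in time order.
    """
    if start is None:
        start = rows[0][0]
    if end is None:
        end = rows[-1][0]
    return [(t, v) for t, v in rows if start <= t <= end]
```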

usage: plot_csv.py [-h] [-s START] [-e END] -fp FILE_PATH -c COLUMNS

optional arguments:
-h, --help            show this help message and exit
-s START, --start START
                        The start date/time for the plot in 'YYYY-MM-dd
                        HH:MM:SS' format. e.g. '2021-07-10 04:00:00'.
-e END, --end END     The end date/time for the plot in 'YYYY-MM-dd
                        HH:MM:SS' format. e.g. '2021-07-10 16:00:00'.
-fp FILE_PATH, --file-path FILE_PATH
                        The path to the csv file to plot. e.g. /path/to/file.csv
-c COLUMNS, --columns COLUMNS
                        The columns from the csv to plot against datetime,
                        provide as comma separated list if more than one e.g. 'IR01Dn,IR01Up'.

Note that datetimes should be provided in quotations to allow them to be parsed correctly.

$ cd energy_balance/scripts
$ python plot_csv.py -s '2021-07-10 04:00' -e '2021-07-10 16:00' -fp /path/to/my/file.csv -c shf_1,shf_2,shf_3

An example plot, of temperature from sensor 2, is shown below:

_images/T107_2.png

The command for this was:

$ python plot_csv.py -s '2021-07-30 00:00:00' -e '2021-07-30 23:59:00' -fp path/to/SoilTemperature_2021-07-30.csv -c T107_2

Config

This explains the various options that can be set when using the scripts. To see the config file: https://github.com/ncasuk/ncas-energy-balance-1-software/blob/master/energy_balance/etc/config.ini

To override these settings, provide your own config file by setting the environment variable CONFIG as the file path to your file. e.g.

$ export CONFIG='path/to/my/config.ini'

Specifying types

It is possible to specify the type of the entries in the configuration file, for example if you want a value to be a list when the file is parsed.

This is managed through a [config_data_types] section at the top of the INI file which has the following options:

[config_data_types]
# only used by installed package
lists =
dicts =
ints =
floats =
boolean =
# use the below when creating your own file
extra_lists =
extra_dicts =
extra_ints =
extra_floats =
extra_booleans =

Simply adding the name of the value you want to format after = will render the correct format. e.g. ints = index_length processing_level qc_flag_level will set index_length, processing_level and qc_flag_level as ints.
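The mechanism can be illustrated with a simplified sketch using Python's configparser (the package's own parsing is more complete):

```python
import configparser

def load_config(text):
    """Parse an INI string and apply the [config_data_types] casts.

    Simplified sketch of the mechanism described above, not the
    package's actual code: names listed under 'ints' are cast to int,
    names under 'lists' become whitespace-split lists, everything
    else stays a string.
    """
    cp = configparser.ConfigParser()
    cp.read_string(text)
    ints = cp.get("config_data_types", "ints", fallback="").split()
    lists = cp.get("config_data_types", "lists", fallback="").split()
    config = {}
    for section in cp.sections():
        if section == "config_data_types":
            continue
        for key, value in cp[section].items():
            if key in ints:
                config[key] = int(value)
            elif key in lists:
                config[key] = value.split()
            else:
                config[key] = value
    return config
```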

Settings

These settings apply across the scripts:

[common]
# url for logger E.g. tcp:iphost:port or serial:/dev/ttyUSB0:19200:8N1 or serial:/COM1:19200:8N1
logger_url = serial:/dev/ttyUSB0:115200
# latitude of station
latitude_value = 13.1
# longitude of station
longitude_value = 25.5
# name of date/time column in files/on logger
datetime_header = Datetime
# path to output/read logger csv files from
logger_csv_path = ~/ncas-energy-balance-1-software
# where to output netcdf files
netcdf_path = ~/ncas-energy-balance-1-software
# path to output qc csv files
qc_csv_path = ~/ncas-energy-balance-1-software
# masking will include values that have been quality controlled to this level or less (e.g. <= 1)
qc_flag_level = 1
# fill value to use in netcdf files
fill_value = -1e+20
# names of the data tables in the logger to process in scripts
logger_tables = Housekeeping GPS_datetime SoilTemperature SoilMoisture SoilHeatFlux Radiation
# names of the tables you have created in mysql (must map to logger tables)
mysql_tables = housekeeping gps soil_temp soil_moisture soil_heat_flux radiation

These settings are specific for the soil data product:

[soil]
# number of sensors
index_length = 3
# input file path to directory containing soil csv files to turn into netcdf files
input_file_path = ~/ncas-energy-balance-1-software
# double % to escape as it is a special character
# date format as it is on the input files
input_date_format = %%Y-%%m-%%d
# location and format of files - use {date} where the date exists in the file name
# if files are found in directories under the main 'input file path directory', include that here
soil_moisture_file = SoilMoisture/SoilMoisture_{date}.csv
soil_temperature_file = SoilTemperature/SoilTemperature_{date}.csv
soil_heat_flux_file = SoilHeatFlux/SoilHeatFlux_{date}.csv
# columns to extract: what the columns are called in the input csv file to allow them to be extracted
# give the below in index order i.e. the first listed will be given index 1
soil_moisture_headers = WP_kPa_1 WP_kPa_2 WP_kPa_3
soil_temperature_headers = T107_1 T107_2 T107_3
soil_heat_flux_headers = shf_1 shf_2 shf_3
# the max/min expected soil temp in degrees C. Any temperature outside this will be flagged as suspect data
max_expected_temp = 28
min_expected_temp = 18

These settings are specific for the radiation data product:

[radiation]
# input file path to directory containing radiation csv files to turn into netcdf files
input_file_path = ~/ncas-energy-balance-1-software
# double % to escape as it is a special character
# date format as it is on the input files
input_date_format = %%Y-%%m-%%d
# location and format of file
# if file is found in directories under the main 'input file path directory', include that here
radiation_file = Radiation/Radiation_{date}.csv
# columns to extract: what the columns are called in the input csv file to allow them to be extracted
# give the below in index order i.e. the first listed will be given index 1
# longwave downwelling
lwdn_header = IR01Dn
# longwave upwelling
lwup_header = IR01Up
# shortwave downwelling
swdn_header = SR01Dn
# shortwave upwelling
swup_header = SR01Up
# radiometer body temperature (in kelvin)
body_temp_header = NR01TK
# the time range to qc as 'sensor being cleaned'
# give in hh:mm:ss
cleaning_time_lower = 05:55:00
cleaning_time_upper = 06:05:00
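The cleaning-time bounds above are used to flag readings taken while the sensors are being cleaned. A minimal sketch of that check (illustrative only; the package assigns a dedicated qc flag value rather than returning a boolean):

```python
from datetime import time

def in_cleaning_window(t, lower=time(5, 55), upper=time(6, 5)):
    """True when a reading's time of day falls within the cleaning
    window (cleaning_time_lower to cleaning_time_upper, inclusive),
    so the reading can be flagged. Illustrative sketch only.
    """
    return lower <= t <= upper
```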

These settings correspond to the global attributes on the netCDF files produced. Anything set here will be set as a global attribute:

[global]
Conventions = CF-1.6, NCAS-AMF-2.0.0
source = NCAS Energy Balance Station unit 1
instrument_manufacturer = Campbell Scientific
instrument_model = CR3000
instrument_serial_number =
instrument_software = EB1_logger.cr5
instrument_software_version = v1
creator_name = Eleanor Smith
creator_email = eleanor.smith@stfc.ac.uk
creator_url = https://orcid.org/0000-0002-6448-5778
institution = Centre for Environmental Data Analysis (CEDA)
processing_software_url = https://github.com/ncasuk/ncas-energy-balance-1-software
processing_software_version = v0.1
calibration_sensitivity =
calibration_certification_date =
calibration_certification_url =
sampling_interval = 5 minute
averaging_interval = 5 minute
product_version = 0.1
processing_level = 1
project = energy balance placement
project_principal_invesitgator = Eleanor Smith
project_principal_invesitgator_email = eleanor.smith@stfc.ac.uk
project_principal_invesitgator_url = https://orcid.org/0000-0002-6448-5778
licence = Data usage licence - UK Government Open Licence agreement: http://www.nationalarchives.gov.uk/doc/open-government-licence
acknowledment = Acknowledgement of NCAS as the data provider is required whenever and wherever these data are used
platform = lab
platform_type = stationary_platform
deployment_mode = land
title = Measurements from the NCAS energy balance station.
featureType = timeSeries
geospatial_bounds =
platform_altitude =
location_keywords =
amf_vocabularies_release = https://github.com/ncasuk/AMF_CVs/tree/v2.0.0
history =
comment =
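
A sketch of how the `[global]` section might be turned into a set of global attributes: each option becomes one attribute. Note that configparser lowercases option names by default, which would break names like `Conventions`, so `optionxform` is overridden. Only a few of the options above are inlined here:

```python
from configparser import ConfigParser

cfg = ConfigParser()
cfg.optionxform = str  # preserve attribute-name case (e.g. 'Conventions')
cfg.read_string("""
[global]
Conventions = CF-1.6, NCAS-AMF-2.0.0
instrument_model = CR3000
processing_level = 1
""")

global_attrs = dict(cfg.items("global"))
# With a netCDF4.Dataset `ds`, these could then be applied one by one,
# e.g. ds.setncattr(name, value) for each pair.
```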

Grafana

Grafana can be used to visualize data, in conjunction with the download_data.py and add_to_mysql.py scripts (script numbers 1 and 3), to create plots that are kept up to date.

The MySQL database that you create when using the add_to_mysql.py script can be set up as a data source for Grafana.

With the data download and add-to-MySQL scripts running at 5 minute intervals via cron jobs, as discussed on the scripts page, the data will be kept up to date.

For an example, see the image below:

_images/grafana1.png

This shows the data for the Soil Temperature, Soil Water Potential and Soil Heat Flux for sensor channel 1. This allows you to see changes in the data over time and notice any anomalies. You can set the time range to be much smaller or larger than that shown in the image and can zoom in on data to see it in more detail.

These plots are time series plots, but there are many more types of plot available such as bar chart, gauge, pie chart and others.

There are other configurable options on Grafana. For example, it is possible to set up alerts if the data exceeds or goes below a certain value. You can also share a snapshot of the dashboard you create with someone else.

Install

Create a plot

  • To create plots you must create a new dashboard, then use the ‘add panel’ button in the top right corner.

  • Then, under Query, you can select the data source required (my database is called energybalance), the table from the database to use (my table is called soil_temp), and set the time column and the column to plot. (In the image below, the time column is called date and the column I am plotting is called temp3.)

_images/create_plot_grafana.png
  • There are options to change the name of the plot, line colours, axis names etc. on the right hand side.

Quality Control

This section explains the quality control values and definitions used when creating quality control csvs and netCDF files (which include quality control variables once created). For more information on how these netCDF files have been built, see https://sites.google.com/ncas.ac.uk/ncasobservations/home/data-project/ncas-data-standards/ncas-amof/. The quality control variables are included as they are ‘Data Product Specific Variables’.

soil

For the soil data product, 3 quality control variables are created. The values and meanings used are listed below each variable, and are also included in the netCDF file created. The variables are:

  1. qc_flag_soil_heat_flux

    0: not used
    1: good data
    2: bad data - value outside operational range (-30 to 70 C) (e.g. if the soil temperature is outside this range, the corresponding value of soil heat flux is flagged)
    3: suspect data
    4: timestamp error

    • flag 2 is applied automatically, all other data is marked as 1.

  2. qc_flag_soil_temperature

    0: not used
    1: good data
    2: bad data - value outside operational range (-35 to 50 C)
    3: suspect data (includes temperature data that is higher/lower than expected - the expected values can be set in the config file - speak to the instrument scientist to find out what these should be)
    4: timestamp error

    • flags 2 and 3 (3 for outside range of expected temperature) are applied automatically, all other data is marked as 1. More data can be manually marked as suspect if needed.

  3. qc_flag_soil_water_potential

    0: not used
    1: good data
    2: bad data - soil water potential over 80 kPa (contact between soil and sensor is usually lost at this point)
    3: bad data - value outside operational range (0 to 200 kPa)
    4: suspect data
    5: timestamp error

    • flags 2 and 3 are applied automatically, all other data is marked as 1.
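
The range checks above can be sketched with numpy. The thresholds below use the soil temperature operational range from flag 2 and the configurable expected range (the max_expected_temp/min_expected_temp values from the config example):

```python
import numpy as np

# Illustrative sketch of the soil temperature checks:
# flag 2 for values outside the operational range (-35 to 50 C),
# flag 3 for values outside the expected range set in the config,
# flag 1 ('good data') otherwise.
temps = np.array([12.0, 55.0, -40.0, 25.0, 30.0])
max_expected, min_expected = 28, 18   # from the config example

conditions = [
    (temps < -35) | (temps > 50),                      # operational range
    (temps < min_expected) | (temps > max_expected),   # expected range
]
flags = np.select(conditions, [2, 3], default=1)
print(flags)  # [3 2 2 1 3]
```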

radiation

For the radiation data product, 6 quality control variables are created.

  1. qc_flag_upwelling_shortwave

    0: not used
    1: good data
    2: no data
    3: bad data - sw radiation < 0
    4: bad data - sw radiation > 2000 W m-2
    5: suspect data
    6: timestamp error

    • flags 2, 3 and 4 are applied automatically, all other data is marked as 1.

  2. qc_flag_downwelling_shortwave

    0: not used
    1: good data
    2: no data
    3: bad data - sw radiation < 0
    4: bad data - sw radiation > 2000 W m-2
    5: suspect data
    6: timestamp error

    • flags 2, 3 and 4 are applied automatically, all other data is marked as 1.

  3. qc_flag_upwelling_longwave

    0: not used
    1: good data
    2: no data
    3: bad data - lw radiation < 0
    4: bad data - lw radiation > 1000 W m-2
    5: suspect data
    6: timestamp error

    • flags 2, 3 and 4 are applied automatically, all other data is marked as 1.

  4. qc_flag_downwelling_longwave

    0: not used
    1: good data
    2: no data
    3: bad data - lw radiation < 0
    4: bad data - lw radiation > 1000 W m-2
    5: suspect data
    6: timestamp error

    • flags 2, 3 and 4 are applied automatically, all other data is marked as 1.

  5. qc_flag_body_temperature

    0: not used
    1: good data
    2: bad data - body temperature outside operational range (-40 to 80 C)
    3: suspect data
    4: timestamp error

    • flag 2 is applied automatically, all other data is marked as 1. This flag is applied to all variables, as they are all affected if the body temperature is outside operational bounds.

  6. qc_flag_cleaning

    0: not used
    1: good data
    2: bad data - sensor being cleaned (the times for this should be set in the config file - speak to the instrument scientist to find out what these should be)
    3: suspect data
    4: timestamp error

    • flag 2 is applied automatically, all other data is marked as 1. This flag is applied to all variables, as they are all affected when the sensor is being cleaned.

  • The scripts that create netCDF files or quality control csvs automatically apply flags as described under each variable. Any other flags are to be added manually after inspection of the data.

  • Manually adding flags may result in the valid max/min values of variables needing to be changed in the netCDF file - these are calculated only from ‘good data’. There is a script available to recalculate these values. (Script 5 - calculate_valid_min_max.py)

  • Suspect data has been included to cover scenarios not covered by the other flags, e.g. data that is higher/lower than expected, or data that has changed significantly since the last reading.
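
As an illustration of the cleaning flag, a row's timestamp can be checked against the cleaning window set in the config (05:55:00 to 06:05:00 in the example above):

```python
from datetime import time, datetime

# Sketch of qc_flag_cleaning: rows whose timestamp falls inside the
# cleaning window from the config get flag 2, everything else flag 1.
cleaning_lower = time(5, 55, 0)
cleaning_upper = time(6, 5, 0)

def cleaning_flag(ts: datetime) -> int:
    return 2 if cleaning_lower <= ts.time() <= cleaning_upper else 1

flags = [cleaning_flag(datetime(2021, 7, 1, 6, 0, 0)),
         cleaning_flag(datetime(2021, 7, 1, 12, 0, 0))]
print(flags)  # [2, 1]
```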

API

Scripts

1. download_data.py:

2. download_data_by_date.py:

energy_balance.scripts.download_data_by_date.get_data(url, start_date, end_date, dir_path)[source]

Extract data from the Campbell data logger for each specified table and save it to a daily csv file for each day in the specified date range. Default tables are: Housekeeping, GPS_datetime, SoilTemperature, SoilMoisture, SoilHeatFlux and Radiation.

Parameters
  • url – (str) URL for connection with logger in format ‘tcp:iphost:port’ or ‘serial:/dev/ttyUSB0:19200:8N1’

  • start_date – (datetime.datetime) The start date from which to collect data

  • end_date – (datetime.datetime) The end date after which to stop collecting data. (the end date will be included in the data.)

  • dir_path – (str) The path to the top level directory in which to create the csv files and folders.

Returns

None
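
The per-day file handling can be sketched as an inclusive date loop producing one csv path per table per day. The folder and file naming below is illustrative only, not necessarily the exact layout the script produces:

```python
from datetime import datetime, timedelta
from pathlib import Path

# Illustrative sketch: iterate the inclusive date range (the end date
# is included, per the docstring) and build one csv path per table
# per day under the top level directory.
def daily_csv_paths(dir_path, tables, start_date, end_date):
    paths = []
    date = start_date
    while date <= end_date:  # end date included
        for table in tables:
            paths.append(Path(dir_path) / table / f"{table}_{date:%Y-%m-%d}.csv")
        date += timedelta(days=1)
    return paths

paths = daily_csv_paths("~/data", ["SoilTemperature"],
                        datetime(2021, 7, 1), datetime(2021, 7, 3))
# three files, one per day in the range
```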

energy_balance.scripts.download_data_by_date.get_data_from_range(device, table, csv_path, start, end, header)[source]

Gets range of data specified by start and end dates and saves to csv at the path specified.

Parameters
  • device – (pycampbellcr1000.CR1000 object) The device object for the connection to the logger.

  • table – (str) The name of the table on the logger from which the data is being extracted.

  • csv_path – (str) The path to the csv file to backfill with today's data.

  • start – (datetime.datetime) The start datetime from which to collect data

  • end – (datetime.datetime) The end datetime after which to stop collecting data. (end will be included in the data.)

Returns

None

3. add_to_mysql.py:

energy_balance.scripts.add_to_mysql.insert_into_tables(user, password, database, dir_path)[source]

Gets data from the csv files found in the specified directory path and inserts it into MySQL tables. The MySQL tables must have been created prior to running this. The default names used to map the logger tables to MySQL tables are: {‘Housekeeping’: ‘housekeeping’, ‘GPS_datetime’: ‘gps’, ‘SoilTemperature’: ‘soil_temp’, ‘SoilMoisture’: ‘soil_moisture’, ‘SoilHeatFlux’: ‘soil_heat_flux’, ‘Radiation’: ‘radiation’}

Parameters
  • user – (str) The username for connecting to MySQL.

  • password – (str) The password for connecting to MySQL.

  • database – (str) The name of the database in which the tables exist.

  • dir_path – (str) The path to the top level directory in which the csv files and folders were created.

Returns

None
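
The insertion step can be sketched as building a parameterized INSERT statement from a csv header, so rows can later be executed with a MySQL cursor (e.g. `cursor.executemany(sql, rows)`). The table and column names below are illustrative, not the script's exact schema:

```python
import csv
import io

# Build a parameterized INSERT statement from the csv field names.
# '%s' is the placeholder style used by MySQL client libraries.
def build_insert(table, fieldnames):
    cols = ", ".join(fieldnames)
    placeholders = ", ".join(["%s"] * len(fieldnames))
    return f"INSERT INTO {table} ({cols}) VALUES ({placeholders})"

# A tiny inline csv standing in for one of the daily files.
sample = io.StringIO("Datetime,temp1,temp2\n2021-07-01 00:00:00,19.5,20.1\n")
reader = csv.DictReader(sample)
sql = build_insert("soil_temp", reader.fieldnames)
rows = [tuple(r.values()) for r in reader]
print(sql)
# INSERT INTO soil_temp (Datetime, temp1, temp2) VALUES (%s, %s, %s)
```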

4. create_files.py:

energy_balance.scripts.create_files.create_files(start_date, end_date, frequency, data_product)[source]

Create netcdf files for the specified data product in the time range provided. If no data product is provided, netcdf files are created for soil and radiation.

Parameters
  • start_date – (datetime.datetime) The start date for which to create the files.

  • end_date – (datetime.datetime) The end date for which to create the files.

  • frequency – (str) The frequency for files - daily or monthly.

  • data_product – (str) The data product to create the netcdf files for e.g. radiation or soil

Returns

None

energy_balance.scripts.create_files.create_radiation_files(date, frequency)[source]

Create radiation netcdf.

Parameters
  • date – (datetime.datetime) The date for which to create the file.

  • frequency – (str) The frequency for files - daily or monthly.

Returns

None

energy_balance.scripts.create_files.create_soil_files(date, frequency)[source]

Create soil netcdf.

Parameters
  • date – (datetime.datetime) The date for which to create the file.

  • frequency – (str) The frequency for files - daily or monthly.

Returns

None

energy_balance.scripts.create_files.get_create_file(data_product)[source]

Get the function for creating files for the specified data product.

5. calculate_valid_min_max.py:

energy_balance.scripts.calculate_valid_min_max.calculate_valid_min_max(fpath, var_name, qc_var_name, qc_value)[source]

Recalculate the valid min and valid max for a variable, given the quality control variable and the maximum desired quality control value. Useful after a quality control variable has been changed manually.

Parameters
  • fpath – (str) Path to netCDF file on which to calculate the min/max

  • var_name – (str) The name of the variable to update the min/max on.

  • qc_var_name – (str) The name of the quality control variable to use as a mask for retrieving valid values.

  • qc_value – (int) Max value of qc to use i.e. 1 will calculate min/max on only ‘good data’, 2 will calculate it on good data and data marked with a flag of 2.
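
The recalculation can be sketched with numpy, keeping only values whose qc flag is at or below the requested level and taking the min/max of what remains (reading and writing the netCDF file itself is omitted here):

```python
import numpy as np

# Keep values whose qc flag is at or below qc_value, then take
# min/max. qc_value=1 restricts the calculation to 'good data'.
def valid_min_max(values, qc_flags, qc_value=1):
    valid = values[qc_flags <= qc_value]
    return float(valid.min()), float(valid.max())

values = np.array([18.2, 55.0, 21.7, -40.0, 25.3])
qc = np.array([1, 2, 1, 2, 3])
vmin, vmax = valid_min_max(values, qc, qc_value=1)  # good data only
```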

6. create_qc_csvs.py:

energy_balance.scripts.create_qc_csvs.create_files(start_date, end_date, frequency, data_product, fpath)[source]

Create masked csvs for the specified data product in the time range provided.

Parameters
  • start_date – (datetime.datetime) The start date for which to create the files.

  • end_date – (datetime.datetime) The end date for which to create the files.

  • frequency – (str) The frequency for files - daily or monthly.

  • data_product – (str) The data product to create the csvs for e.g. radiation or soil

  • fpath – (str) The directory path at which to create the output file.

Returns

None

energy_balance.scripts.create_qc_csvs.create_radiation_files(date, frequency, path)[source]

Create radiation masked csv.

Parameters
  • date – (datetime.datetime) The date for which to create the file.

  • frequency – (str) The frequency for files - daily or monthly.

  • path – (str) The directory path at which to create the output file.

Returns

None

energy_balance.scripts.create_qc_csvs.create_soil_files(date, frequency, path)[source]

Create soil masked csv.

Parameters
  • date – (datetime.datetime) The date for which to create the file.

  • frequency – (str) The frequency for files - daily or monthly.

  • path – (str) The directory path at which to create the output file.

Returns

None

energy_balance.scripts.create_qc_csvs.get_create_file(data_product)[source]

Get the function for creating files for the specified data product.

energy_balance.scripts.create_qc_csvs.prepare_date(date, frequency)[source]

Convert datetimes to strings, depending on frequency. If monthly, the format returned will be %Y%m; if daily, the format returned will be %Y%m%d.

Parameters
  • date – (datetime.datetime) The date for which the file is being created.

  • frequency – (str) The frequency for files - daily or monthly.

Returns

(str) The date converted to string format.
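
A minimal sketch of this conversion:

```python
from datetime import datetime

# Monthly files use %Y%m, daily files %Y%m%d, per the docstring above.
def prepare_date(date, frequency):
    fmt = "%Y%m" if frequency == "monthly" else "%Y%m%d"
    return date.strftime(fmt)

d = datetime(2021, 7, 1)
monthly = prepare_date(d, "monthly")  # '202107'
daily = prepare_date(d, "daily")      # '20210701'
```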

7. plot_csv.py:

energy_balance.scripts.plot_csv.plot(start, end, columns, fpath)[source]

Plot the columns from the csv specified.

Parameters
  • start – (str) The start date/time from which to plot.

  • end – (str) The end date/time for the plot.

  • columns – (list) List of columns to plot.

  • fpath – (str) File path of csv file.

Returns

None

energy_balance.scripts.plot_csv.validate_time(time)[source]

Validate that time is in the correct format (Y-M-d H:M:S)

Parameters

time – (str) The time string to validate

Return time

(str) The input time, if validated. Otherwise an exception is raised.
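
A sketch of this validation using strptime, which raises ValueError when the string does not match the expected format:

```python
from datetime import datetime

# Parse purely to validate; the original string is returned unchanged.
def validate_time(time_str):
    datetime.strptime(time_str, "%Y-%m-%d %H:%M:%S")  # raises if invalid
    return time_str

ok = validate_time("2021-07-01 00:05:00")  # passes through
```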

Quality control

class energy_balance.netcdf.quality_control.QualityControl(date, frequency)[source]

Bases: object

Base class used for applying quality control to data in pandas data frames. Creates a quality control dataframe and a masked dataframe (the initial data with a quality control mask applied) from input csv files. The input files and various options are taken from a config file.

Constant values are taken from the config file, excluding ‘headers’ which must be set in each specific implementation.

Parameters
  • date – (datetime.datetime) The date to do the QC for. If frequency is monthly, only the year and month will be taken into account.

  • frequency – (str) ‘daily’ or ‘monthly’. Determines whether one day's worth of data, or one month's worth, is taken from the csv files to create the dataframes.

apply_qc(conditions, choices, col)[source]

Generic method to apply QC to a column in a dataframe, new column is created in QC dataframe.

Parameters
  • conditions – (list) The conditions at which a QC flag should be applied. e.g. [np.isnan(self._df[col]), self._df[col] < -35, self._df[col] > 50]

  • choices – (list) The QC flag to be applied, corresponds to conditions. e.g. [2, 2, 2]

  • col – (str) The name of the column to apply QC to e.g. ‘WP_kPa_1’
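
The conditions/choices pairs map naturally onto numpy.select, with 1 ('good data') as the default. A sketch using the soil temperature thresholds from the conditions example above:

```python
import numpy as np
import pandas as pd

# Each condition is paired with the flag in choices at the same index;
# rows matching no condition fall back to 1 ('good data').
df = pd.DataFrame({"T107_1": [np.nan, -40.0, 20.0, 60.0]})
col = "T107_1"

conditions = [np.isnan(df[col]), df[col] < -35, df[col] > 50]
choices = [2, 2, 2]
qc = pd.DataFrame({col: np.select(conditions, choices, default=1)})
print(qc[col].tolist())  # [2, 2, 1, 2]
```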

create_dataframes()[source]

Class specific implementation to create pandas dataframe from input csv and empty QC dataframe other than column names. Sets self._df and self._qc

create_masked_csv(file_path)[source]

Create a csv file from the masked dataframe.

Parameters

file_path – (str) The path at which to create the csv file e.g. /path/to/my/file.csv

create_masked_df(qc_flag)[source]

Create masked pandas dataframe based on self._qc and the qc flag requested. Sets self._df_masked.

Parameters

qc_flag – (int) Max value of qc to show i.e. 1 will show only ‘good data’, 2 will show good data and data marked with a flag of 2.
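
A sketch of this masking with pandas, replacing values whose flag exceeds the requested level with NaN:

```python
import pandas as pd

# Values are kept where the qc flag is at or below qc_flag,
# and replaced with NaN everywhere else.
df = pd.DataFrame({"T107_1": [19.5, 60.0, 21.0]})
qc = pd.DataFrame({"T107_1": [1, 2, 3]})

qc_flag = 1
df_masked = df.where(qc <= qc_flag)
print(df_masked["T107_1"].tolist())  # [19.5, nan, nan]
```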

property df

Returns the original dataframe created from the input csv files. All headers set in each class implementation of self.headers are included.

property df_masked

Returns the original dataframe masked following QC.

dt_header = 'Datetime'
execute_qc()[source]

Create the dataframes, apply the QC and create the masked dataframe.

headers = 'UNDEFINED'
prepare_date(input_date_format)[source]

Prepares the input date format so it matches the frequency requested.

Parameters

input_date_format – (str) The format in which the date is provided in the input csv files.

Returns

(str) The date now converted to string format.

property qc

Returns the QC dataframe created based on conditions and choices set in the qc_variables method.

qc_flag_level = 1
qc_variables()[source]

Class specific implementation to apply QC to all columns.

class energy_balance.netcdf.soil_quality_control.SoilQualityControl(date, frequency)[source]

Bases: energy_balance.netcdf.quality_control.QualityControl

create_dataframes()[source]

SoilQualityControl specific implementation to create pandas dataframe from input csvs and empty QC dataframe other than column names. Sets self._df and self._qc

qc_variables()[source]

SoilQualityControl specific implementation to set QC conditions and flags and record in QC dataframe.

class energy_balance.netcdf.radiation_quality_control.RadiationQualityControl(date, frequency)[source]

Bases: energy_balance.netcdf.quality_control.QualityControl

apply_cleaning_and_temp_masks()[source]

Apply cleaning QC and body temperature QC to all columns in the dataframe.

create_dataframes()[source]

RadiationQualityControl specific implementation to create pandas dataframe from input csvs and empty QC dataframe other than column names. Sets self._df and self._qc

create_masked_df(qc_flag)[source]

RadiationQualityControl specific implementation to create the masked dataframe.

qc_variables()[source]

RadiationQualityControl specific implementation to set QC conditions and flags and record in QC dataframe.

NetCDF

class energy_balance.netcdf.base_netcdf.BaseNetCDF(df, qc, date, frequency)[source]

Bases: object

Base class used for creating netCDF files. Creates all the common variables found in netCDF files under the NCAS-GENERAL Data Standard. Sets all the required global attributes.

Constant values are taken from the config file, excluding ‘headers’ and ‘data_product’ which must be set in each specific implementation.

Parameters
  • df – A pandas dataframe containing all columns required to create the netCDF file.

  • qc – A pandas dataframe with the same columns as df, but containing the quality control values instead. (i.e. 1, 2, 3 etc.)

  • date – (datetime.datetime) The date to create the netCDF file for. If frequency is monthly, only the year and month will be taken into account.

  • frequency – (str) ‘daily’ or ‘monthly’. Determines whether the file will use data from one day or for one month.

convert_date_to_string(date, frequency)[source]

Generate a date string for the file name based on the date provided and the frequency required.

Parameters
  • date – (datetime.datetime) The date to convert to string.

  • frequency – (str) The frequency at which to have the date string.

Returns

(str) The date now converted to string format.

static convert_times(times)[source]

Convert times from strings to total seconds since 1970-01-01T00:00:00.

Parameters

times – (sequence) Times to convert to total seconds since 1970-01-01T00:00:00.

Returns

(list) The times converted to total seconds since 1970-01-01T00:00:00.
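
A sketch of this conversion using timezone-aware datetimes:

```python
from datetime import datetime, timezone

# Seconds elapsed since the Unix epoch, 1970-01-01T00:00:00 UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_epoch_seconds(time_str):
    dt = datetime.strptime(time_str, "%Y-%m-%d %H:%M:%S")
    return (dt.replace(tzinfo=timezone.utc) - EPOCH).total_seconds()

secs = to_epoch_seconds("1970-01-02 00:00:00")  # 86400.0
```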

create_lat_variable()[source]

Create the common latitude variable.

create_lon_variable()[source]

Create the common longitude variable.

create_netcdf()[source]

Method to create the netCDF dataset

create_qc_variable(name, header, dimensions, **kwargs)[source]

Generic method to create a qc variable on the dataset.

Parameters
  • name – (str) The name of the variable to be created.

  • header – (str) The name of the column in the df pandas dataframe to use to populate the data of this variable.

  • dimensions – (tuple) The dimensions of the variable to be created e.g. (‘time’, ) or (‘time’, ‘index’)

  • kwargs – (dict) Dictionary of attributes {‘attr_name’: ‘attr_value’} to set on the variable e.g. {‘standard_name’: ‘soil_temperature’}

create_specific_dimensions()[source]

Class specific implementation to create dimensions specific to that data product.

create_specific_variables()[source]

Class specific implementation to create variables specific to that data product, including any qc variables.

create_time_related_variable(name, data_type, values, long_name)[source]

Generic method to create time-related variables such as day of year, year, month, day, hour, minute and second.

Parameters
  • name – (str) The name of the variable to be created.

  • data_type – The data type of the variable to be created e.g. numpy.float32

  • values – (sequence) The values to set for this variable.

  • long_name – (str) The long name of this variable.

create_time_variable()[source]

Create the common time variable.

create_variable(name, data_type, dims, header, **kwargs)[source]

Generic method to create a variable in the netCDF4 dataset.

Parameters
  • name – (str) The name of the variable to be created.

  • data_type – The data type of the variable to be created e.g. numpy.float32

  • dims – (tuple) The dimensions of the variable to be created e.g. (‘time’, ) or (‘time’, ‘index’)

  • header – (str) The name of the column in the pandas dataframe to use to populate the data of this variable.

  • kwargs – (dict) Dictionary of attributes {‘attr_name’: ‘attr_value’} to set on the variable e.g. {‘standard_name’: ‘soil_temperature’}

data_product = 'UNDEFINED'
dt_header = 'Datetime'
fill_value = -1e+20
get_masked_data(mask_value)[source]

Create masked pandas dataframe based on self.qc and the qc flag requested. Sets self.df_masked.

Parameters

mask_value – (int) Max value of qc to show i.e. 1 will show only ‘good data’, 2 will show good data and data marked with a flag of 2.

headers = 'UNDEFINED'
qc_flag_level = 1
set_global_attributes()[source]

Sets the global attributes in the dataset based on those listed in the config file.

static times_as_datetimes(times)[source]

Convert times from strings to datetimes.

Parameters

times – (sequence) Times to convert to datetimes in format Y-m-d H:M:S.

Returns

(list) The times converted to datetimes.

class energy_balance.netcdf.soil_netcdf.SoilNetCDF(df, qc, date, frequency)[source]

Bases: energy_balance.netcdf.base_netcdf.BaseNetCDF

Class for creating soil netcdf files. Creates soil specific dimensions and variables

static convert_temps_to_kelvin(temps)[source]

Convert temperatures from degrees celsius to Kelvin.

Parameters

temps – (sequence) Temperatures to convert (in degrees C).

Returns

(list) The temperatures converted to Kelvin.
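
A minimal sketch of this conversion (K = C + 273.15), applied element-wise:

```python
# Convert a sequence of temperatures from degrees Celsius to Kelvin.
def to_kelvin(temps_c):
    return [t + 273.15 for t in temps_c]

kelvin = to_kelvin([0.0, 25.0])  # 0 C and 25 C
```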

create_qc_variable(name, headers, **kwargs)[source]

SoilNetCDF specific implementation to account for index dimension.

Parameters
  • name – (str) The name of the variable to be created.

  • headers – (list) The names of the columns in the df pandas dataframe to use to populate the data of this variable.

  • kwargs – (dict) Dictionary of attributes {‘attr_name’: ‘attr_value’} to set on the variable e.g. {‘standard_name’: ‘soil_temperature’}

create_soil_heat_flux_variable()[source]

Create soil heat flux variable on the netCDF dataset.

create_soil_moisture_variable()[source]

Create soil water potential variable on the netCDF dataset.

create_soil_temp_variable()[source]

Create soil temperature variable on the netCDF dataset.

create_specific_dimensions()[source]

SoilNetCDF specific implementation to create index dimension.

create_specific_variables()[source]

SoilNetCDF specific implementation to create all soil specific variables.

create_variable(name, data_type, headers, **kwargs)[source]

SoilNetCDF specific implementation to account for index dimension.

Parameters
  • name – (str) The name of the variable to be created.

  • data_type – The data type of the variable to be created e.g. numpy.float32

  • headers – (list) The names of the columns in the pandas dataframe to use to populate the data of this variable.

  • kwargs – (dict) Dictionary of attributes {‘attr_name’: ‘attr_value’} to set on the variable e.g. {‘standard_name’: ‘soil_temperature’}

class energy_balance.netcdf.radiation_netcdf.RadiationNetCDF(df, qc, date, frequency)[source]

Bases: energy_balance.netcdf.base_netcdf.BaseNetCDF

create_radiation_variables()[source]

Create all radiation variables.

create_specific_dimensions()[source]

Create any radiation specific dimensions - there are none.

create_specific_variables()[source]

RadiationNetCDF specific implementation to create all radiation specific variables, including qc variables.

Indices and tables