Processing ERA-Interim dataset in ncdf using GDAL and GRASS GIS

Processing big data efficiently has become a necessity of the hour in ecological and climate change research. Now there are enormous number of public data available online, which demands high level of processing capabilities too.  In this post I will show you how we can process ESA-Interim data which is global atmospheric reanalysis from 1979 to present developed by ECMWF.

air jordan Promo Codegreen oakley sunglasses

The parameters available from this model are given here. The open source geospatial tools – GDAL and GRASS GIS can handle these type of data quite intuitively.  The temporal resolution is from 3 hours to 12 hours.
In this blog, I will explain how to download a 3 hours data (for skin temperature) in 4D ncdf format from ecmwf archive, import the data using GDAL and develop a temporal database in GRASS GIS 7.1.ray ban shield sunglassesray ban sunglasses for men

Acquiring the data:
oakley sunglasses sale
It is not very intuitive to get the required data from this archive. The data is available here:
We can retrieve the data in GRIB or NCDF format. I downloaded in NCDF format at a spatial resolution of 0.125 degrees for the years 1979 -1984.
The maximum limit for a single order is 20000 fields (time slots) so you may have to do it multiple times to acquire entire time frame from 1979.
nike air max penny
Below screenshot shows the input settings to download the variable “skin_temperature” in 3 hours. The “select time” and “select step” are important and has to be filled according to your temporal requirement. The below setting is for daily 3 hours data. In the next page (below pic, second column) , you can select the extent and the spatial resolution – finest being at 0.125 degrees.
Where Can I Buy Oakley sunglasses
ecmwf_download
Order Oakley sunglasses
Interpreting the data:

Here we use gdal to interpret the data and understand how it is arranged. Converting a time field into another raster format is straightforward, but what is important is to understand the time fields.
For example if we gdalinfo the data (3 hours data from 1979 to 1984)and get the first few lines;

gdalinfo netcdf-web232-20150203094809-31266-40370.nc |head -16

Ireland Oakley sunglasses

Driver: netCDF/Network Common Data Format
Files: netcdf-web232-20150203094809-31266-40370.nc
Size is 89, 70
Coordinate System is `'
Origin = (5.062500000000000,48.312500000000000)
Pixel Size = (0.125000000000000,-0.125000000000000)
Metadata:
latitude#long_name=latitude
latitude#units=degrees_north
longitude#long_name=longitude
longitude#units=degrees_east
NC_GLOBAL#Conventions=CF-1.0
NC_GLOBAL#history=2015-02-03 09:48:58 GMT by grib_to_netcdf-1.12.3: grib_to_netcdf /data/data01/netcdf-web232-20150203094420-31266-40369.target -o /data/data01/netcdf-web232-20150203094809-31266-40370.nc
NETCDF_DIM_EXTRA={time}
NETCDF_DIM_time_DEF={17536,4}
NETCDF_DIM_time_VALUES={692499,692502,692505,692508,692511,692514,692517,........}

NETCDF_DIM_time_DEF={17536,4}; means, there are 17536 (layers) fields in 4 dimensions. Look at the “NETCDF_DIM_time_VALUES” of 692499,692502 etc. Before wondering what those big integers are, let us search a bit more in gdalinfo.

gdalinfo netcdf-web232-20150203094809-31266-40370.nc|grep time#units
time#units=hours since 1900-01-01 00:00:0.0

So, unit is hours since 1900-01-01 00:00:0.0. For example,the time field of band 1 which represents 1979-01-01 03:00 is given as 692499 hours since 1900-01-01 00:00:0.0. It is a common way of representing times in big datasets, though the reference time is not very common.
ray ban sunglasses australia
Processing the data:

The main aim of this step is to get the data in GRASS GIS, as temporal data to do further analysis. The main part of this exercise is to convert the time in hours into an understandable format which would be a part of the raster name itself for the sake of search and lists. Here we use the date command in bash extensively to convert the time into a easy format. Also the data is scaled for which the offset and gain values are given in the metadata.

For example reading band 1 in the data:
custom oakley sunglasses


gdal_translate -b 1 -a_srs epsg:4326 netcdf-web232-20150203094809-31266-40370.nc band1.tif
gdalinfo band1.tif|grep NETCDF_DIM_time=|cut -d'=' -f2

692499 – Time is since 1900 01 01 00:00 in hours

echo $((692499 - 613608))

ray ban sunglasses Promo Code
78891 – Time is since 1970 01 01 00:00 in hours; Note that 613608 (which is constant here) is the number of hours from 1900 01 01 00:00 to 1970 01 01 00:00. This step is necessary since the date command in bash works with reference to the time from 1970 01 01 00:00 (same is the case with date package in R)

echo $((78891 * 60 * 60))

284007600The time in seconds since 1970 01 01 00:00

date -d @284007600 -u +%Y%j_%H%M_%d%m%Y

1979001_0300_01011979 – date format in YYYYDOY_HHMM_DDMMYYYY

date -d @284007600 -u -R|cut -c6-

01 Jan 1979 03:00:00 +0000 – This step is for input to the r.timestamp module in GRASS which is required for the preparation of temporal database in GRASS. This will register the right timestamp for each raster in GRASS for temporal queries.

The above commands are then put in a loop to convert all the fields in ncdf file to GRASS raster bands representing each time slot and named in understandable time format. See the complete code for processing in one go. Please read the comments in between. The script works when run inside a GRASS mapset where the rasters are supposed to be imported.

#Setting the variables
#Reading the Offset and Gain from the metadata for rescaling the data
OFFSET=`gdalinfo netcdf-web220-20150202222632-9509-34735.nc|grep skt#add_offset|cut -d'=' -f2`
GAIN=`gdalinfo netcdf-web220-20150202222632-9509-34735.nc|grep skt#scale_factor|cut -d'=' -f2`
#Number of bands from the metadata
NBANDS=`gdalinfo netcdf-web220-20150202222632-9509-34735.nc|grep NETCDF_DIM_time_DEF=|cut -d'{' -f2|cut -d',' -f1`
#Set your region using g.region before
for i in `seq 1 $NBANDS`; do
    #convert each band into tif in lat long coordinates (epsg 4326)
    gdal_translate -b ${i} -a_srs epsg:4326 netcdf-web220-20150202222632-9509-34735.nc b_${i}.tif
    #get the time since 1900 01 01 00:00 in hours as in meta data
    TIMEHRS1=`gdalinfo b_${i}.tif|grep NETCDF_DIM_time=|cut -d'=' -f2`
    #convert the time into since 1970 01 01 00:00 in hours
    TIMEHRS2=`echo $((${TIMEHRS1} - 613608))`
    #convert the time into seconds
    TIMESEC=`echo $((${TIMEHRS2} * 60 * 60))`
    #convert the time in seconds to date,long format for the raster name in GRASS
    DATE1=`date -d @${TIMESEC} -u +%Y%j_%H%M_%d%m%Y`
    #date is in the format acceptable for r.timestamp
    DATE2=`date -d @${TIMESEC} -u -R|cut -c6-`
    #warping the data into epsg 3035 projection
    gdalwarp -s_srs epsg:4326 -t_srs epsg:3035 -tr 1000 1000 -r bilinear b_${i}.tif b_${i}_3035.tif
    #importing the data into GRASS
    r.in.gdal input=b_${i}_3035.tif output=${DATE1}_ecmwf memory=${MEMORY} --o
    #rescaling the data using offset and gain values, using r.mapcalc
    r.mapcalc "${DATE1}_ecmwf = 1.0 * (${DATE1}_ecmwf * ${GAIN}) + ${OFFSET}" --o
    #assigning the timestamp
    r.timestamp map=${DATE1}_ecmwf date="${DATE2}"
    # clean up
    rm -f b_${i}.tif b_${i}_3035.tif
done

Once completed do a g.list to see the rasters created.

g.list rast |head -10

1979001_0300_01011979_ecmwf
1979001_0600_01011979_ecmwf
1979001_0900_01011979_ecmwf
1979001_1200_01011979_ecmwf
1979001_1500_01011979_ecmwf
1979001_1800_01011979_ecmwf
1979001_2100_01011979_ecmwf
1979002_0000_02011979_ecmwf
1979002_0300_02011979_ecmwf
1979002_0600_02011979_ecmwf

Once you have the rasters, it is time to use the TGRASS modules to create a temporal database of granularity 3 hours. please see the below commands to see how it is done. This opens to a wider options of analysis and increase the ease to use such big data in a time domain.

TGRASS commands:

#Create an empty temporal database
t.create output=T_3hourly_ecmwf type=strds semantictype=mean temporaltype=absolute title="3 hour ecmwf from 1979 Jan1" description="three hour daily ecmwf data" --o
#Create a text file with all the raster files to be in the created empty strds
g.list type=rast pattern=*_ecmwf >> filenames_ecmwf_3hourly.txt
#Now register those raster files in the temporal strds from the text file
t.register -i input=T_3hourly_ecmwf type=rast file=filenames_ecmwf_3hourly.txt start="1979-01-01 03:00" increment="3 hours" --o
#To see the details of the created temporal database
t.info T_3hourly_ecmwf
#To query data in time say for example between 2014-01-03 17:00:00 and 2014-01-05 07:00:00
t.rast.list input=Thourly_ecmwf columns=name method=comma where="start_time > '2014-01-03 17:00:00' and start_time < '2014-01-05 07:00:00'" -s

Now you have your entire ecmwf fields in GRASS GIS temporal database (strds). With this you can move ahead with your analysis. Please check this link for more details on TGRASS modules.

Enjoiii..

Please post comments or email on any queries..

SP

One thought on “Processing ERA-Interim dataset in ncdf using GDAL and GRASS GIS

  1. Hello Sir,
    First of all thanks for the tutorial.
    As i tried to follow the tutorial on Ubuntu and after entering the first command the following error is displayed:

    gdalinfo _grib2netcdf-atls20-95e2cf679cd58ee9b4db4dd119a05a8d-rvprZj.nc | head -16

    netcdf: 65536 is not a valid cdfid

    Please help

    Thanks and Regards
    John

Leave a Reply

Your email address will not be published. Required fields are marked *