Why is downloading from 'reanalysis-era5-complete' so much slower?

Xiaobo_Yang · 17 December 2019 15:14

Hi Michael,

I had a look at your script, and I'd recommend making one request per month of hourly ERA5 data.

...
for year in range(1982, 2018):
    for mon in range(1, 13):
        # Make a request, you can set day from 1 to 31 and time from 0 to 23 as you wish. Our system is smart enough to return the proper data

I also recommend checking the status of our system at https://cds.climate.copernicus.eu/live/queue.

I hope this helps.

Kind regards,

Xiaobo

Michael_Shaw · 17 December 2019 15:47

Thank you, Xiaobo. Any hints on setting days and times from 1-31 and 0-23 within the "retrieve" itself could help save a little time. Thanks much, again.

Michael_Shaw · 17 December 2019 16:18

E.g., will something as simple as this do? Or should I include some logic within the c.retrieve call for "day", e.g. to handle months of different lengths, and if so, how? Additionally, can you explain why my previous script occasionally hit apparently arbitrary timeouts and then crashed? Is that a function of user demand on your end, which makes retrieval (wall-clock) times variable, so that an occasional, unpredictable slow retrieval exceeds some time threshold and triggers the error?

import cdsapi

c = cdsapi.Client()

for year in range(2015, 2020):
    for mon in range(1, 13):
        if mon < 10:
            themon = "0" + str(mon)
        else:
            themon = str(mon)
        c.retrieve("reanalysis-era5-single-levels", {
            "product_type": "reanalysis",
            "format": "netcdf",
            "variable": "runoff",
            "year": str(year),
            "month": themon,
            # Assume cdsapi will skip extraneous days in shorter months without a problem...
            "day": ["01","02","03","04","05","06","07","08","09","10","11",
                    "12","13","14","15","16","17","18","19","20","21","22",
                    "23","24","25","26","27","28","29","30","31"],
            "time": ["00","01","02","03","04","05","06","07","08","09","10","11",
                     "12","13","14","15","16","17","18","19","20","21","22","23"],
        }, "output." + str(year) + str(themon) + ".nc")

Xiaobo_Yang · 17 December 2019 17:12

Hi Michael,

Your day and time settings should work as expected.

When your request is executed depends on how busy our system is and on the queueing algorithm we use. I'll ask our technical team to explain more if needed.

Kind regards,

Xiaobo

Michael_Shaw · 17 December 2019 18:21

Thank you Xiaobo.

Yes, more explanation would be helpful: why and how retrieval times fluctuate with user demand, and what to consider (e.g. how to design scripts) to avoid any size or time thresholds that might trigger timeouts and crashes.

Also, is there a way to put logic in the "day" and "time" objects/options in the retrieve method in order to, e.g. use a loop instead of a list like that?
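
One way to build these lists in code rather than typing them out is sketched below. This is only an illustration, not a cdsapi feature: the helper names days_in_month and hours_of_day are hypothetical, and the sketch relies solely on Python's standard calendar module. Since sending all 31 days was confirmed above to work, trimming the list to the real month length is optional. The "HH:00" time format is an assumption based on the scripts the CDS web form generates.

```python
import calendar


def days_in_month(year, month):
    """Return zero-padded day strings for the days that exist in this month."""
    last_day = calendar.monthrange(year, month)[1]  # (weekday, number_of_days)
    return [f"{day:02d}" for day in range(1, last_day + 1)]


def hours_of_day():
    """Return the 24 hourly time strings, "00:00" through "23:00"."""
    return [f"{hour:02d}:00" for hour in range(24)]
```

The results can be passed straight into the request dictionary, for example "day": days_in_month(year, mon) and "time": hours_of_day().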

Most importantly, though, I just want to get through this data pull without so many timeouts and crashes; so far, overcoming them has required too much "user in the loop".
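
A script can be made less dependent on manual restarts by retrying failed downloads and skipping files that already exist. The sketch below shows that general pattern under stated assumptions: retrieve_with_retry is a hypothetical helper, not part of cdsapi, and the attempt count and delays are arbitrary starting values. It does not address the cause of the timeouts on the server side.

```python
import os
import time


def retrieve_with_retry(download, target, max_attempts=5, base_delay=60.0,
                        sleep=time.sleep):
    """Call download(path) until it succeeds, waiting longer after each failure.

    `download` is any callable that writes the file at the path it is given,
    for example a small function wrapping c.retrieve(...).
    Returns "skipped" if `target` already exists, otherwise "downloaded".
    """
    if os.path.exists(target):
        return "skipped"  # finished in an earlier run, nothing to do

    partial = target + ".part"  # download here so a crash never leaves a fake "done" file
    for attempt in range(1, max_attempts + 1):
        try:
            download(partial)
            os.replace(partial, target)  # rename only after a complete download
            return "downloaded"
        except Exception as error:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            delay = base_delay * 2 ** (attempt - 1)  # 60 s, 120 s, 240 s, ...
            print(f"Attempt {attempt} for {target} failed ({error}); "
                  f"retrying in {delay:.0f} s")
            sleep(delay)
```

Inside the month loop this could be called as retrieve_with_retry(lambda path: c.retrieve("reanalysis-era5-single-levels", request, path), target), where request is the dictionary for that month. Rerunning the script after a crash then resumes at the first missing file.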

Thanks very much again,

Michael

Niels_Holst · 10 January 2020 11:38

Can anyone tell me why month runs from 1 to 13 and not from 1 to 12?

Vivien_MAVEL · 10 January 2020 11:48

Months range from 1 to 12.

>>> list(range(1, 13))
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]

Niels_Holst · 10 January 2020 11:53

Thanks! I am new to Python and was surprised by how range() works.

Michael_Shaw · 10 January 2020 14:35

I found it surprising at first, too, Niels! I actually accidentally used 12 as the upper bound of my range when I first wrote the above, before catching myself and remembering that Python's "range" function treats the upper bound as a "stop" or "until" criterion, not a "through" criterion.

UnknownUser · 16 April 2020 12:10

Hey,

I have run into the same problem. When I try to retrieve data from reanalysis-era5-complete, the request seems to wait in the queue forever. I also have a question: can reanalysis-era5-single-levels provide the same data that we previously retrieved from MARS? The two requests are shown below.

The first retrieves the 10 m u- and v-components of wind:

retrieve,
class=ea,
date=2015-01-01/to/2015-12-31,
expver=1,
levtype=sfc, (sfc: surface, pl: pressure level, pt: potential temperature level, pv: potential vorticity level. Ocean data in MARS is archived with levtype=dp; wave data is archived with levtype=sfc)
param=165.128/166.128,
step=0,
stream=oper, (oper: operational atmospheric model; wave: wave model)
time=06:00:00,
type=fc, (fc: forecast; cf: control forecast; an: analysis)

The second retrieves parameter IDs 140229 and 140245 (10 metre wind speed):

retrieve,
class=od, (od: operational archive; ea: ERA5)
date=2015-01-01/to/2015-12-31,
expver=1, (1: operational data, default; 69: IFS cycle 41r2 test data)
param=229.140/245.140,
step=600/624/648/672,
stream=waef, (waef: wave ensemble forecast)
time=00:00:00,
type=cf,
expect=any,
target="2015long.grib"

We want to retrieve exactly the same data as these requests specify, but from the CDS, and we cannot find the same variables in the CDS datasets. Are there equivalent parameters on the CDS that we can retrieve?

Thanks!

Michela · 16 April 2020 19:51

Hi,

ERA5 complete data is archived not on the CDS disks but in the tape library of ECMWF's MARS archive. Please be aware that there is an additional queueing system for downloading data from MARS: at present, expect submitted requests to take several hours to several days to complete. You can check the live status of your request. To retrieve MARS data efficiently (and get your data quicker!), retrieve all the data you need from one tape, then from the next tape, and so on. In most cases, this means retrieving all the data you need for one month, then for the next month, and so on. To find out what data is available on each tape, browse the ERA5 Catalogue and work your way down to the bottom of the archive tree (where parameters are listed). Once you have reached that level, what you see is what is stored on a single tape. See the Retrieval efficiency page for more details.
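
The month-by-month advice can be sketched as follows. This only builds one request dictionary per calendar month and does not contact the server. The function names are hypothetical, the keyword values are copied from the first MARS request posted earlier in this thread, and whether this exact combination is accepted for reanalysis-era5-complete is an assumption to check against the ERA5 Catalogue.

```python
import calendar


def monthly_date_ranges(first_year, last_year):
    """Yield one MARS-style "start/to/end" date range per calendar month."""
    for year in range(first_year, last_year + 1):
        for month in range(1, 13):
            last_day = calendar.monthrange(year, month)[1]
            yield (f"{year}-{month:02d}-01/to/"
                   f"{year}-{month:02d}-{last_day:02d}")


def monthly_requests(first_year, last_year):
    """Return one request dictionary per month, so each request stays on one tape."""
    base = {
        # Values taken from the MARS request posted above (assumed valid as-is).
        "class": "ea",
        "expver": "1",
        "levtype": "sfc",
        "param": "165.128/166.128",
        "stream": "oper",
        "type": "fc",
        "time": "06:00:00",
        "step": "0",
    }
    return [dict(base, date=dates)
            for dates in monthly_date_ranges(first_year, last_year)]
```

Each dictionary could then be submitted in turn with c.retrieve("reanalysis-era5-complete", request, target), using a different target file name for every month.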

The data in your requests can also be downloaded from the CDS web form or with the CDS API. You can get the CDS API script from the web form using the 'Show API request' button. Please have a look at these articles for more details about request size limits and some efficiency tips: Climate Data Store (CDS) documentation and How to download ERA5. Unfortunately, the 10 m wind speed is not available on the CDS.

Here are two examples of CDS API scripts for January 2015:

import cdsapi

c = cdsapi.Client()
c.retrieve(
    'reanalysis-era5-single-levels',
    {
        'product_type': 'ensemble_members',
        'variable': 'significant_height_of_combined_wind_waves_and_swell',
        'year': '2015',
        'month': '01',
        'day': [
            '01', '02', '03',
            '04', '05', '06',
            '07', '08', '09',
            '10', '11', '12',
            '13', '14', '15',
            '16', '17', '18',
            '19', '20', '21',
            '22', '23', '24',
            '25', '26', '27',
            '28', '29', '30',
            '31',
        ],
        'time': '00:00',
        'format': 'grib',
    },
    'download.grib')

import cdsapi

c = cdsapi.Client()
c.retrieve(
    'reanalysis-era5-single-levels',
    {
        'product_type': 'reanalysis',
        'variable': [
            '10m_u_component_of_wind', '10m_v_component_of_wind',
        ],
        'year': '2015',
        'month': '01',
        'day': [
            '01', '02', '03',
            '04', '05', '06',
            '07', '08', '09',
            '10', '11', '12',
            '13', '14', '15',
            '16', '17', '18',
            '19', '20', '21',
            '22', '23', '24',
            '25', '26', '27',
            '28', '29', '30',
            '31',
        ],
        'time': '00:00',
        'format': 'grib',
    },
    'download.grib')

Thanks

Michela

Brian_Yalle · 9 December 2020 19:47

Hi,

I have the same problem: it takes hours to download data.

In my case, I want hourly data from ERA5-Land. My goal is to download 3 variables over a 10-year range for a specific geographic area.

The first two variables took a while to download (less than 2 hours each), but the third has already taken more than 2 hours. There was no time in the queue (zero), but the "in progress" stage is taking too long.

I hope you can help me or offer any advice to improve my downloads.

Xiaobo_Yang · 10 December 2020 12:01

Hi Brian,

Yes, we are aware of this problem, and my colleagues are looking into performance issues with our storage system. Unfortunately, it will take some time.

Regards,

Xiaobo

Matthew_Salter · 2 June 2021 11:36

Hi, I am trying to pull down some data from reanalysis-era5-complete. I realise this will take some time given that this data is archived in the tape library at ECMWF's MARS archive. However, when I check the status of my request at https://cds.climate.copernicus.eu/live/queue there are no requests listed for me as a user. I have successfully pulled down data from reanalysis-era5-single-levels without issue. Should I not be able to see my request in the list in the previously mentioned link?

Many thanks, Matt

Michela · 2 June 2021 15:38

Hi Matt,
Thank you for posting to the forum. As per our Forum guidelines, we advise users not to post personal information, so we have modified your message re: your CDS details.

The questions you raise are about the CDS infrastructure and are probably better suited to a user query for expert guidance from the CDS team. Could you please raise this issue via our Support Portal instead?
Best Regards,
Michela

Forum Administrator