We have had a download chain in place for a long time to download your L2 products, and it seems we were blacklisted from roughly 2022-01-19 17:00 UTC to 2022-01-20 12:00 UTC. I think this is related to the recent reprocessing for the ozone issue, because our chain has downloaded more files than usual. However, our chain already applies the following logic to avoid blacklisting, and we are trying to understand why it is not enough: it ensures a minimum delay of 5 seconds between each HTTP request to your website. Is that enough? Sometimes we get this kind of error:
HTTPError: 429 Client Error: Too Many Requests for url: https://oceandata.sci.gsfc.nasa.gov/ob/ ... SNPP_OC.nc
and I don't think we had this error in the past.
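For reference, our throttling boils down to something like the following (a simplified sketch, not our actual code; the names and the `fetch` callback are illustrative):

```python
import time

def paced_fetch(urls, fetch, min_delay=5.0, sleep=time.sleep, clock=time.monotonic):
    """Call fetch(url) for each URL, enforcing a minimum delay between requests.

    sleep and clock are injectable so the pacing logic can be tested
    without real waiting; in production the defaults are used.
    """
    last = None
    results = []
    for url in urls:
        if last is not None:
            elapsed = clock() - last
            if elapsed < min_delay:
                # Not enough time has passed since the previous request.
                sleep(min_delay - elapsed)
        last = clock()
        results.append(fetch(url))
    return results
```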
I have just tried increasing the delay to 10 seconds, and it seems better for the moment.
Maybe we could improve our download approach to avoid these blacklistings? What delay do you recommend between two HTTP requests? Are there any other recommendations to avoid blacklisting?
Note that it is also difficult on our side to ensure that no one else in our company accesses your website manually, because the same IP is used for general browsing (Firefox, etc.). Do you have any recommendations about this problem?
Many thanks in advance,
Anyway, we are still interested in any recommendations to avoid blacklisting.
The 429 "Too Many Requests" response status code indicates that the client has sent too many requests in a given amount of time. Note that server-side rate limiting is dynamic, depending on server load and the number of clients currently being serviced. Sometimes the rate limit is sixty (60) requests per minute, while at other times it might be twenty (20) requests per minute.
Depending on your download client, you should be able to throttle requests so that the downloads avoid the 429 error code, or retry the download when a 429 is received.
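If your chain is scripted, the retry-on-429 idea can be sketched in Python using only the standard library (an illustrative example, not an official client; the function names and parameters here are made up):

```python
import time
import urllib.request
import urllib.error

def retry_delay(attempt, base=10.0):
    """Exponential backoff: base, 2*base, 4*base, ... seconds."""
    return base * (2 ** attempt)

def download(url, dest, max_retries=5, base=10.0, sleep=time.sleep):
    """Download url to dest, backing off and retrying when the server returns 429.

    Returns True on success, False if every retry was rate limited.
    """
    for attempt in range(max_retries):
        try:
            urllib.request.urlretrieve(url, dest)
            return True
        except urllib.error.HTTPError as err:
            if err.code != 429:
                raise  # other HTTP errors are not rate limiting; surface them
            sleep(retry_delay(attempt, base))
    return False
```

Because the server's rate limit is dynamic, an increasing (exponential) backoff like this is more robust than a fixed wait: each consecutive 429 doubles the pause until the request gets through.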
For example, wget has an option to add a wait time if needed: "-w 5" makes wget sleep for five (5) seconds between download requests.
Another option is to let wget watch for 429 "Too Many Requests" errors and sleep for a few seconds to get below the server's dynamic rate limit. The following command instructs wget to wait ten (10) seconds before retrying the download when the server sends back a 429 "Too Many Requests" error:
wget --waitretry=10 --retry-on-http-error=429 <URL>
Check the wget man page for more details.
Hope this helps.