Just added a post to this thread:
"Data access stopped due to IP address blocked"
I don't know if that was the right thing to do, or whether I should have started a new thread here instead.
In any case, I hope you can help us! We have had access problems that affect both our real-time operations with locally received Terra/Aqua data and our automatic fetch of Level-1 products.
How can we prevent this in the future?
It is really damaging to our real-time service when we can't access your site.
As you saw, we couldn't even reach this forum to post a question. It was not until I got home, outside our firewall and with a different IP address, that I could post about our problem.
Also, it worries me that your site appears to be down when checking here: https://downforeveryoneorjustme.com/oceancolor.gsfc.nasa.gov
It appears that a whole range of IP addresses is blocked, not only the one above that you unblocked yesterday.
We still cannot access your site from our production servers, nor from our desktops on the SMHI premises.
The IP address I gave you above is the one I get when on VPN from home. I have now checked with our IT colleagues, and we come from the following IP block when we access the internet:
I suggest you always keep those open, thanks.
There are limits imposed on data access, but these limits are reasonable and exist only to prevent DDoS attacks on our systems. If you have multiple systems behind a single firewall and they all attempt to connect to our systems at the same time, this can exceed the imposed limits. Even if you are accessing from a single system, using "download accelerators" or spawning many simultaneous processes can also exceed the limits.
We don't use any accelerators!
But we do try to fetch from more than one server. We have a setup with three types of servers: "dev", "test", and "prod". Usually we only run the fetch from the test and prod environments. We need a functioning test environment in order to test and verify things for production, and the test and prod environments should be as similar as possible.
If there is no way you can accept that we download files from both sites, we will see what we can do. We do want to minimise the risk that this incident ever happens again!
Using your wget example, and with the knowledge that it runs from cron (yes, I can confirm you're doing it hourly :grin:), we were able to identify why you were being repeatedly blocked.
We have made a couple of small changes on our end to mitigate this, so you should be good to go from now on. There is a change you can make that may also help.
For efficiency, you can have your web client ask our server whether the modification date of a file has changed. If it has, the file is downloaded; if not, it is skipped. Here's how you would do that with wget (based on your previous example):
wget -q --post-data="subID=1527&subType=2&addurl=1&results_as_file=1&sdate=`date --date="yesterday" +"%Y-%m-%d"`" -O - https://oceandata.sci.gsfc.nasa.gov/api/file_search | wget --no-if-modified-since -N -i -
You could keep the '-c'; it works, but it will take time to, well, time out if you have already successfully retrieved the file. When it does time out, you'll see a response like this:
Connecting to oceandata.sci.gsfc.nasa.gov
HTTP request sent, awaiting response... 416 Requested Range Not Satisfiable
The file is already fully retrieved; nothing to do.
If you have a good network connection, it is less likely that you'll need the continuation (-c) option.
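Put together as a small cron-friendly script, the pipeline above might look like this. This is only a sketch: subID=1527 and the file_search URL are copied from the example earlier in the thread, and the SDATE variable simply factors out the date computation.

```shell
#!/bin/sh
# Sketch of a cron wrapper around the two-stage wget pipeline:
# stage 1 POSTs the subscription query and writes the resulting
# URL list to stdout; stage 2 reads that list (-i -) and downloads
# only files whose timestamps have changed (-N).
SDATE=$(date --date="yesterday" +"%Y-%m-%d")

wget -q --post-data="subID=1527&subType=2&addurl=1&results_as_file=1&sdate=${SDATE}" \
     -O - https://oceandata.sci.gsfc.nasa.gov/api/file_search \
  | wget --no-if-modified-since -N -i -
```

With -N and --no-if-modified-since, wget checks the remote timestamp before transferring, so an unchanged file is not downloaded again.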
Many thanks for the prompt help!
We will aim to implement your proposed improvement soon, and will probably avoid using the '-c' flag.
I can now confirm that we actually fetch the ocean color Level-1 files from only one single (production) server.
However, we fetch the utcpole.dat and leapsec.dat files for running SeaDAS locally from both the test and prod servers, and occasionally also from the dev server. So in the "worst" case we might download from three machines. However, these are small files, and the download is attempted less often than the Level-1 product fetch. Do you think that is still a problem? If yes, we will implement this fetch centrally on the prod server and change the setup a bit.
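One way to centralize this, sketched under stated assumptions: fetch the small ancillary files once on the production host only, then copy them to the other environments, so a single system contacts the server. BASE_URL, the host names, and ANC_DIR below are all illustrative placeholders, not details from this thread.

```shell
#!/bin/sh
# Hypothetical sketch: download utcpole.dat and leapsec.dat on one
# host, then distribute them to the test and dev machines over scp.
# BASE_URL, ANC_DIR, and the host names are assumptions.
BASE_URL="https://example.invalid/ancillary"   # placeholder; set to the real source
ANC_DIR="/data/seadas/var"
FILES="utcpole.dat leapsec.dat"
PEERS="testhost devhost"

mkdir -p "$ANC_DIR"
for f in $FILES; do
    wget -q -N -P "$ANC_DIR" "$BASE_URL/$f"   # only prod talks to the server
done
for host in $PEERS; do
    scp -q "$ANC_DIR/utcpole.dat" "$ANC_DIR/leapsec.dat" "$host:$ANC_DIR/"
done
```

With this layout, test and dev never open connections to the data server at all, which removes them from the rate-limit picture entirely.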