We could not access your server for two consecutive days:
from 21 August 20:00 UTC to 22 August 10:00 UTC
from 22 August 14:30 UTC to 23 August 10:00 UTC
It looks like our IP address (xx.xxx.xx.xx) is blocked.
We had similar issues in the past (see https://oceancolor.gsfc.nasa.gov/forum/oceancolor/topic_show.pl?tid=8432 and https://oceancolor.gsfc.nasa.gov/forum/oceancolor/topic_show.pl?tid=8930), caused by malformed or excessive wget requests leading to ERROR 404.
This time, however, we no longer find any ERROR 404 on our side. We also checked with our IT support team, and everything looks fine.
Could you tell us what happened, and remove the block?
Thank you for your help,
Unfortunately, access was still not permitted last night, so we would appreciate it if you could check again.
Is it possible to have more detail on the "many errors from your ip address"? We need to identify which process on our side is responsible.
The IP address xx.xxx.xx.xx was automatically blocked on 21 August and 22 August for generating too many errors on the web servers. Errors can be generated by anything from requesting files that do not exist to opening too many concurrent downloads, which leads to spurious 503 errors. An error is any response code other than 200, 301, or 304.
The automated block list is cleared every day at 5am EST5EDT.
As of today, xx.xxx.xx.xx is not currently blocked.
xx.xxx.xx.xx has been automatically blocked every day for the past four (4) days.
It looks like the client is opening up 20 connections to download random parts of every single file it requests. This multipart download is causing many "206" response codes, which are triggering the temporary blocks. Multipart client downloads put extra load on our system and are considered unnecessary; we do not limit the download throughput of the client. I would suggest not using any download accelerators and downloading the files serially, one file at a time, using a single connection.
# For example, download a file using a single wget request
wget -c https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/MOD00.P2002165.0030_1.PDS.bz2
If the client script continues to cause errors, our system will automatically block the IP again.
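Following that advice, here is a minimal sketch of a serial downloader, assuming the file URLs are listed one per line in a list file (the function name, pause length, and list-file name are illustrative, not part of any official script):

```shell
# fetch_serially: download each URL in the given list file one at a time,
# resuming partial files (-c) and pausing briefly between downloads so the
# request rate stays low. No accelerator, a single connection per file.
fetch_serially() {
    listfile=$1
    while IFS= read -r url; do
        wget -c --no-verbose "$url"
        sleep 5   # illustrative pause between files
    done < "$listfile"
}
```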
We have just modified the way we download the ancillary files needed by our data processing: we now download them once for all processing jobs, instead of each job downloading its own copy. This should drastically reduce the number of wget requests. Fingers crossed...
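The download-once idea can be sketched as a simple cache check; the function name, cache path, and environment variable below are illustrative (the getfile URL pattern follows the example earlier in the thread):

```shell
# fetch_ancillary: download an ancillary file only if it is not already in
# the local cache, so every processing job reuses one shared copy instead of
# issuing its own wget request.
ANC_CACHE=${ANC_CACHE:-$HOME/anc_cache}
fetch_ancillary() {
    f=$1
    mkdir -p "$ANC_CACHE"
    if [ ! -f "$ANC_CACHE/$f" ]; then
        wget -q -O "$ANC_CACHE/$f" "https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/$f"
    fi
}
```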
Sorry for this new post, but our IP address was blocked again from yesterday 15:00 UTC to today 10:00 UTC, after three days of successful connections.
Since we believe our wget requests are now correct, we would like more specific information from your logs about the offending wget commands we sent to your server (the files we attempted to download and the timestamps of the requests would help a lot).
On our side, we are again going to limit the number of wget commands we send (adding a timeout to wget, and adding sleeps in our scripts to avoid repeating a wget command within a small time interval...).
Your help is much appreciated,
- User Services
The issue is that your IP is making an inordinate number of partial data connections, and accessing zero bytes of data for each connection.
For a single VIIRS geolocation file (V2018239015400.GEO-M_SNPP.nc), your IP downloaded the full file 11 times, but made 225 connections, 214 of them look like this:
xx.xxx.xx.xx oceandata.sci.gsfc.nasa.gov - [27/Aug/2018:10:29:53 -0400] "GET /cgi/getfile/V2018239015400.GEO-M_SNPP.nc HTTP/1.1" 206 0 "-" "-"
This is occurring for every file you access. I hope this helps you sort out what your script is doing that is odd.
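If it helps with the debugging, here is one hypothetical way to tally such entries in a log copy, assuming the format matches the sample line above (status code in field 9, transferred bytes in field 10):

```shell
# count_zero_byte_206: print, per requested path, how many lines in the given
# access log show a 206 (Partial Content) response that transferred 0 bytes.
count_zero_byte_206() {
    awk '$9 == 206 && $10 == 0 { count[$7]++ }
         END { for (f in count) print count[f], f }' "$1"
}
```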
Thank you for this very useful information. We identified the problem in our scripts: it was due to insufficient wget options.
Before, our wget options were limited to: --continue
We have now added: --timeout 300 --tries 2
The --timeout option limits each network operation to 300 seconds (5 minutes), and the --tries option limits the number of attempts to download a file: the default is 20, and it will now be 2.
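For reference, the resulting invocation would look like this (wrapped in a function only for illustration; the function name is not from our actual scripts):

```shell
# fetch_one: single-connection download with the new options: resume partial
# files, time out each network operation after 300 s, and try at most twice.
fetch_one() {
    wget --continue --timeout 300 --tries 2 "$1"
}
```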
This modification has been installed in all our scripts.
We hope that the block on our IP will now be removed soon.
We have had access problems for quite a few days now. Unsure exactly how long, but probably a couple of weeks.
It is hitting operational production quite hard now! :-(
We have problems both getting MODIS ocean color data and getting the ancillary data needed to run SeaDAS (on locally received Terra/Aqua data).
We fetch the MODIS level-1 data like this:
wget --post-data="subID=1527&subType=2&addurl=1&results_as_file=1&sdate=`date --date="yesterday" +"%Y-%m-%d"`" -O - https://oceandata.sci.gsfc.nasa.gov/api/file_search | wget -c -i -
And we fetch ancillary data from here:
We fetch the files from Python when the local file is 2 weeks old, like this:
try:
    usock = urllib2.urlopen(URL + filename)
    data = usock.read()
    LOG.info("Data retrieved from url...")
except urllib2.URLError:
    LOG.warning("Failed opening file " + filename)
The MODIS level-1 files are fetched from a cron-type job, where we make an attempt to see if new data needs to be downloaded. So we make regular attempts. I can't remember how often we try, but I think it is at least every hour; we can check if this is important. We suspect that we are being blocked at the moment. This has happened before, unfortunately.
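If the check does run hourly, the crontab entry would look something like this (the script path here is hypothetical; the actual schedule on our side still needs to be confirmed):

```crontab
# Illustrative crontab entry: attempt the MODIS level-1 fetch once per hour,
# at minute 0 (script path is hypothetical).
0 * * * * /usr/local/bin/fetch_modis_l1.sh
```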
The IP address I tried just now, and which fails, is this:
We have tried from various servers in-house, and it fails (almost) everywhere. So we suspect you are blocking us on a set of IPs?
Doing it from this IP address here is okay:
Adding to the strange behaviour, and what made it difficult for us to realize that the problems are only related to blocking: https://downforeveryoneorjustme.com/ says that both https://oceandata.sci.gsfc.nasa.gov/ and https://oceancolor.gsfc.nasa.gov are down!???
Grateful for any help!