Fetching multiple MOD00.Pyyyyjjj* files in parallel?

woodbri
Posts: 58
Joined: Thu Jun 04, 2015 10:50 am America/New_York
Answers: 0

by woodbri » Sun Jun 28, 2020 1:24 pm America/New_York

Hi,

Has there been a recent policy change for fetching multiple MOD00.P* files in parallel?
I have had a script running successfully for some time that will do this for a given day, but recently it has been failing and only retrieving a single file.
I've also had other issues fetching data via getanc.py and modis_atteph.py.

Thanks,
-Steve

OB.DAAC - SeanBailey
User Services
Posts: 1469
Joined: Wed Sep 18, 2019 6:15 pm America/New_York
Answers: 1
Been thanked: 5 times

by OB.DAAC - SeanBailey » Mon Jun 29, 2020 9:14 pm America/New_York

Steve,
No, there hasn't been a change. As long as your download activity is "reasonable" you shouldn't have any issues. If your access is deemed unreasonable, you may find yourself in a temporary block, but that would mean no data, not limited data. Since you are getting data, you aren't blocked. Networks are notoriously finicky, and without more information about how things are failing for you I can't even speculate as to what the problem would be.

The getanc/modis_atteph issue, though, might be related to a DB issue we suffered last week but which *should* be fully recovered as of today.  If you're still having issues with those, please let us know (with specifics as to which data are troublesome).

Regards,
Sean

by woodbri » Tue Jun 30, 2020 3:25 pm America/New_York

Hi Sean,

Yeah, I still have random wget failures trying to access files. For example, I just ran my script, which said I have 2 files to download. It's a Perl script that forks off a copy to fetch and process each file in parallel. It got the first file and failed on the second. I immediately copied and pasted the failed wget command into another window and it downloaded the file. I have been running this script for months without any problems. I keep a DB of all the files that have been processed so I don't reprocess any of them. And I obviously have the wget credentials set up correctly or I couldn't get the first file.

I'll try making a simplified script that does nothing but try to fetch the files in parallel and see if I can reproduce the problem with that.

$ /u/oceancolor-bin# ./parallel-modis-aqua.sh 1
## composites=
## contoursonly=
days_ago: 1
jdate: 181
date: 20200629
wget -q -O - 'https://oceandata.sci.gsfc.nasa.gov/api/file_search?subID=2380&subType=1&format=txt&search=MOD00.P2020181*'
wget -q -O - 'https://oceandata.sci.gsfc.nasa.gov/api/file_search?subID=2381&subType=1&format=txt&search=MOD00.P2020181*'
wget -q -O - 'https://oceandata.sci.gsfc.nasa.gov/api/file_search?subID=2382&subType=1&format=txt&search=MOD00.P2020181*'
--------------------------------------
$TODO = {
          'MOD00.P2020181.0530_1.PDS.bz2' => 'https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/MOD00.P2020181.0530_1.PDS.bz2',
          'MOD00.P2020181.0710_1.PDS.bz2' => 'https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/MOD00.P2020181.0710_1.PDS.bz2'
        };
## processing in /maps/nasa/tmp-31793, MOD00.P2020181.0530_1.PDS.bz2
## processing in /maps/nasa/tmp-31794, MOD00.P2020181.0710_1.PDS.bz2
## /u/oceancolor-bin/parallel-modis-seadas -v MOD00.P2020181.0710_1.PDS.bz2

wget -q 'https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/MOD00.P2020181.0710_1.PDS.bz2'

Failed to wget 'https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/MOD00.P2020181.0710_1.PDS.bz2' error 2048
ERROR: Failed to run '/u/oceancolor-bin/parallel-modis-seadas -v MOD00.P2020181.0710_1.PDS.bz2' with exit 256
parallel-modis-aqua.pl: Problems processing MOD00.P2020181.0710_1.PDS.bz2 in /maps/nasa/tmp-31794
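
A note on the numbers in the log above: "error 2048" and "exit 256" look like raw wait statuses of the kind Perl's `system` returns in `$?`, where the child's exit code sits in the high byte. That is an assumption, since the script itself isn't shown. Under that reading, 2048 is wget exit code 8, which the wget manual describes as "Server issued an error response", and 256 is exit code 1 from the wrapper script. A minimal Python sketch of the decoding follows; `decode_wait_status` and the lookup table are illustrative helpers, not part of any OB.DAAC tool.

```python
# Exit codes as listed in the "Exit Status" section of the wget manual.
WGET_EXIT_CODES = {
    0: "No problems occurred",
    1: "Generic error code",
    2: "Parse error",
    3: "File I/O error",
    4: "Network failure",
    5: "SSL verification failure",
    6: "Username/password authentication failure",
    7: "Protocol errors",
    8: "Server issued an error response",
}


def decode_wait_status(status):
    """Split a 16-bit wait status (as in Perl's $?) into (exit_code, signal).

    Assumes the traditional Unix layout: exit code in the high byte,
    terminating signal number in the low 7 bits.
    """
    exit_code = (status >> 8) & 0xFF
    signal = status & 0x7F
    return exit_code, signal


if __name__ == "__main__":
    # 2048 is the value reported for the failed wget in the log.
    code, sig = decode_wait_status(2048)
    print(f"wget exit code {code}, signal {sig}:", WGET_EXIT_CODES.get(code, "unknown"))
```

If that interpretation holds, the failure is a server-side error response rather than a local network or file problem, which would fit an authentication or redirect issue better than a dropped connection.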

by woodbri » Tue Jun 30, 2020 3:48 pm America/New_York

Hi Sean,

I reran the script above with wget debug logging turned on, which might help determine the issue here:
map01.saltwatercentral.com/wget-error-log.txt

I would appreciate it if someone could help sort this out.

Thanks,
-Steve

by woodbri » Tue Jun 30, 2020 4:49 pm America/New_York

Hi Sean,

I just had a thought on this. Since all the fetches are done in parallel and they share the same cookie jar, it is possible that the cookie jar is getting messed up as each process reads and writes to it. Since this is a new problem within the last week, I'm wondering if you changed cookies and I'm now getting tripped up by them?
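
One way to test that theory is to give each parallel fetch its own cookie jar, so that no two wget processes read and write the same file. The sketch below is an illustration under stated assumptions, not the script discussed in this thread: `build_wget_command`, `fetch_with_private_jar` and `fetch_all` are hypothetical helpers, and credentials are assumed to be picked up by wget from its usual configuration (for example `~/.netrc`). The `--load-cookies`, `--save-cookies`, `--keep-session-cookies`, `--auth-no-challenge` and `-P` flags are standard wget options.

```python
import os
import subprocess
import tempfile
from concurrent.futures import ThreadPoolExecutor


def build_wget_command(url, cookie_jar, out_dir):
    """Return a wget argument list that reads and writes one private cookie jar."""
    return [
        "wget", "-q",
        "--load-cookies", cookie_jar,
        "--save-cookies", cookie_jar,
        "--keep-session-cookies",
        "--auth-no-challenge",
        "-P", out_dir,  # directory prefix for the downloaded file
        url,
    ]


def fetch_with_private_jar(url, out_dir, run=subprocess.run):
    """Fetch one URL using a throwaway cookie jar; return wget's exit code."""
    with tempfile.TemporaryDirectory(prefix="wget-cookies-") as tmp:
        cookie_jar = os.path.join(tmp, "cookies.txt")
        # Create an empty jar up front so --load-cookies has a file to read.
        open(cookie_jar, "w").close()
        result = run(build_wget_command(url, cookie_jar, out_dir))
        return result.returncode


def fetch_all(urls, out_dir, max_workers=2, run=subprocess.run):
    """Fetch URLs in parallel, one cookie jar per fetch; return {url: exit_code}."""
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        codes = pool.map(lambda u: fetch_with_private_jar(u, out_dir, run), urls)
        return dict(zip(urls, codes))
```

Calling `fetch_all` with the two `getfile` URLs from the earlier log and a scratch directory would run both downloads concurrently and report an exit code for each. The trade-off is that every fetch performs its own login handshake instead of reusing a saved session, so this is better suited to a handful of files than to bulk transfers.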

-Steve

by OB.DAAC - SeanBailey » Wed Jul 01, 2020 7:51 am America/New_York

Steve,

I passed your error log along to my network guru and the response was "looks like the URS server is telling wget to delete the cookie, but it's not and eventually times out.  Also, he shouldn't be running wget as root..."
So, indeed it seems to be a cookie/cookie jar issue. We didn't change anything, but we don't control the Earthdata Login (URS) side. They may have made a change.
...and you should run the script as a non-privileged user...
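
For a download wrapper written in Python, a guard along these lines could refuse to start when invoked as root. It is only a sketch for Unix-like systems: `running_as_root` and `refuse_root` are hypothetical names, and the Perl script in this thread would need its own equivalent.

```python
import os
import sys


def running_as_root(euid=None):
    """Return True when the effective user ID is 0 (root)."""
    if euid is None:
        euid = os.geteuid()  # available on Unix-like systems only
    return euid == 0


def refuse_root():
    """Exit with an error message if the current process is running as root."""
    if running_as_root():
        sys.exit("Refusing to download as root; rerun as an unprivileged user.")
```

Calling `refuse_root()` at the top of the wrapper, before any wget process is spawned, keeps the check in one place.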

Sean

by woodbri » Wed Jul 01, 2020 10:49 am America/New_York

I'll make appropriate changes and also see if I can work around the cookie jar issue.
I appreciate your help.

Thanks,
-Steve
