Curl Maximum redirects and fails

by oo_processing » Fri May 15, 2020 3:20 pm America/New_York

Sean,

I'm seeing this issue again. As soon as 47 files have downloaded, curl reports the maximum number of redirects and fails.
Note that I am feeding curl a list of files so that it keeps the network connection alive.
I've tried interfaces 2607:fe50:0:6330::100 through 2607:fe50:0:6330::109, with the same results.
All of them are doing this now. Is this a new issue, or the old one again? I thought it was just a corrupt cookie jar, but apparently not.
Each curl command has its own cookie jar file and uses a different network interface.

time curl -b .urs_cookies_109 -c .urs_cookies_109 -L -n --interface 2607:fe50:0:6330::109 --retry 5 --retry-delay 2 --max-time 0 --remote-name-all https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/{$(sed ':a;N;$!ba;s/\n/,/g' /shares/cms_optics/virtual_ant/S4P/bin/fa_density/non_fai_rois/CAPE_COD/x02.trimmed)}
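For reference, the sed expression ':a;N;$!ba;s/\n/,/g' joins every line of the list file into one comma-separated string, and wrapping it in {...} turns the argument into a curl URL glob. Bash performs brace expansion before command substitution, so bash leaves these braces alone and curl expands the set itself, which is where the [46/353] counters in the output come from. The Python sketch below reproduces that transformation. The helper names build_glob_url, expand_glob_url and glob_url_from_file are hypothetical, and unlike the sed one-liner the sketch drops blank lines instead of keeping them as empty items.

```python
from pathlib import Path

# Base URL taken from the commands in this thread.
BASE = "https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/"


def build_glob_url(names):
    """Join file names into one curl URL glob, as {$(sed ...)} does.

    Blank entries are skipped; the sed one-liner would instead keep
    them as empty items between commas.
    """
    cleaned = [n.strip() for n in names if n.strip()]
    return BASE + "{" + ",".join(cleaned) + "}"


def expand_glob_url(url):
    """Expand a URL containing a single {a,b,c} set into individual URLs.

    Only one brace set is handled, which is all the commands above use;
    curl's own globbing also supports several sets and [1-9] ranges.
    """
    prefix, rest = url.split("{", 1)
    body, suffix = rest.split("}", 1)
    return [prefix + item + suffix for item in body.split(",")]


def glob_url_from_file(path):
    """Read a list file (one file name per line) and build the glob URL."""
    return build_glob_url(Path(path).read_text().splitlines())


if __name__ == "__main__":
    url = build_glob_url(["MOD00.A2011341.1455_1.PDS.bz2",
                          "MOD00.A2011343.1440_1.PDS.bz2"])
    for single in expand_glob_url(url):
        print(single)
```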

[46/353]: https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/MOD00.A2011341.1455_1.PDS.bz2 --> MOD00.A2011341.1455_1.PDS.bz2
--_curl_--https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/MOD00.A2011341.1455_1.PDS.bz2
100  299M  100  299M    0     0  3916k      0  0:01:18  0:01:18 --:--:-- 3980k

[47/353]: https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/MOD00.A2011343.1440_1.PDS.bz2 --> MOD00.A2011343.1440_1.PDS.bz2
--_curl_--https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/MOD00.A2011343.1440_1.PDS.bz2
100  295M  100  295M    0     0  3907k      0  0:01:17  0:01:17 --:--:-- 3982k

[48/353]: https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/MOD00.A2011350.1445_1.PDS.bz2 --> MOD00.A2011350.1445_1.PDS.bz2
--_curl_--https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/MOD00.A2011350.1445_1.PDS.bz2
  0  295M    0   191    0     0     22      0 162d 20h  0:00:08 162d 20h    22
curl: (47) Maximum (50) redirects followed

And the same failure with a different interface and list:

time curl -b .urs_cookies_106 -c .urs_cookies_106 -L -n --interface 2607:fe50:0:6330::106 --retry 5 --retry-delay 2 --max-time 0 --remote-name-all https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/{$(sed ':a;N;$!ba;s/\n/,/g' /shares/cms_optics/virtual_ant/S4P/bin/fa_density/non_fai_rois/CAPE_COD/x00)}

[47/400]: https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/MOD00.A2001059.1450_1.PDS.bz2 --> MOD00.A2001059.1450_1.PDS.bz2
--_curl_--https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/MOD00.A2001059.1450_1.PDS.bz2
100  285M  100  285M    0     0  3935k      0  0:01:14  0:01:14 --:--:-- 4004k

[48/400]: https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/MOD00.A2001066.1455_1.PDS.bz2 --> MOD00.A2001066.1455_1.PDS.bz2
--_curl_--https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/MOD00.A2001066.1455_1.PDS.bz2
  0  285M    0   191    0     0     21      0 164d 19h  0:00:09 164d 18h    21
curl: (47) Maximum (50) redirects followed

Any advice would be appreciated. Do we have to keep our download list to 47 lines max? The transfers are also only running at ~4000k per second, which is much slower than the last time I did bulk downloads. Is the server throttled now?
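One way to test the 47-file question is to split the list into smaller batches and start a fresh curl process for each, so every batch begins on a new connection. The sketch below only builds the commands; the batch size of 40 is an arbitrary assumption, not a known server limit, and chunk and batch_commands are hypothetical helpers. The curl flags are copied from the commands above.

```python
import shlex

BASE = "https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/"


def chunk(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def batch_commands(names, interface, cookie_jar, size=40):
    """Build one curl argv per batch, reusing the flags from this thread.

    size=40 is an assumed value chosen to stay under the 47 files at
    which the failures were observed; it is not a documented limit.
    """
    commands = []
    for batch in chunk(names, size):
        # A single name is written as a plain URL rather than a one-item set.
        if len(batch) == 1:
            url = BASE + batch[0]
        else:
            url = BASE + "{" + ",".join(batch) + "}"
        commands.append([
            "curl", "-b", cookie_jar, "-c", cookie_jar, "-L", "-n",
            "--interface", interface,
            "--retry", "5", "--retry-delay", "2", "--max-time", "0",
            "--remote-name-all", url,
        ])
    return commands


if __name__ == "__main__":
    names = ["MOD00.A2001059.1450_1.PDS.bz2", "MOD00.A2001066.1455_1.PDS.bz2"]
    for cmd in batch_commands(names, "2607:fe50:0:6330::106", ".urs_cookies_106"):
        # shlex.join quotes the braces so a shell passes them to curl intact.
        print(shlex.join(cmd))
```

Each argv list could also be passed to subprocess.run in turn. If batches of 40 succeed where one long run fails, that would point toward per-connection state rather than the file list itself; if they still fail at the same place, batch size is probably not the cause.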

by OB.DAAC - SeanBailey » Fri May 15, 2020 4:00 pm America/New_York

Brock,

We haven't changed anything. Using the obdaac_download.py script that I wrote (and posted, and made available via SeaDAS), I just pulled down 360 L2 files (8.2G) in 18 minutes, which is pretty much the cap of my home wireless network, with no issues. I've asked our network guy to take a peek; maybe he'll respond.

Sean

by oo_processing » Fri May 15, 2020 4:06 pm America/New_York

Sean,

I sent a separate message to Chris as well.
I have at least one user on another campus having this issue over IPv4 (I use IPv6 where I am).
She says she doesn't have this problem with MERIS and OLCI downloads, which work the same way; she can download and process those images fine.
She found, after many tests, that if she downloads one MERIS or OLCI image first and then goes back to processing MODIS, it works again for a while.

Brock

by OB.DAAC - SeanBailey » Fri May 15, 2020 4:28 pm America/New_York

Brock,
The getfile script is agnostic to the mission, except that OLCI and MERIS require the user to have accepted the appropriate EULA. MODIS doesn't require that, so having that extra bit would make no difference to the Earthdata Login step when pulling down MODIS. By the way, I have also accepted those EULAs for my user, and the files I was pulling were MODIS. Try running your cURL commands with increased verbosity (-v) to see if anything jumps out at you as to why it's in a redirect loop. Please don't post the output here; if you can't interpret it and still want to share it, attach it as a file.
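When reading verbose output, the useful lines are the "< Location:" response headers: with -L, curl follows each one, and error 47 means it stopped after the default limit of 50 redirects (--max-redirs). A minimal sketch, assuming the stderr of "curl -v" for a single file has been saved to text: it resolves each Location against the URL that returned it and reports the first URL that repeats. The function names redirect_chain and first_repeat are hypothetical.

```python
import re
from urllib.parse import urljoin

# Matches response-header lines such as "< Location: /ob/getfile/X".
LOCATION_RE = re.compile(r"^<\s*location:\s*(\S+)", re.IGNORECASE)


def redirect_chain(start_url, verbose_log):
    """Return the absolute URLs curl was redirected to, in order."""
    chain = []
    current = start_url
    for line in verbose_log.splitlines():
        match = LOCATION_RE.match(line.strip())
        if match:
            # A relative Location is resolved against the previous hop.
            current = urljoin(current, match.group(1))
            chain.append(current)
    return chain


def first_repeat(chain):
    """Return (first_index, repeat_index) of the first repeated URL, or None."""
    seen = {}
    for i, url in enumerate(chain):
        if url in seen:
            return seen[url], i
        seen[url] = i
    return None
```

A repeat that bounces between oceandata.sci.gsfc.nasa.gov and urs.earthdata.nasa.gov would suggest the login round trip is not producing a session the download server accepts.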

Sean

by oo_processing » Fri May 15, 2020 4:48 pm America/New_York

I'm going to look at the verbose output as well. Is there a hard limit on the number of sessions a user can connect with?
In other words, if NASA sees me (oo_username) coming in from various interfaces, is that an issue?
I have several interfaces for downloading data, and they all use the same username and password (from the .netrc file).

Also, an up-front comment: I see this on every request:
* Couldn't find host oceandata.sci.gsfc.nasa.gov in the .netrc file; using defaults

My .netrc has the Earthdata login. Is that an issue? For example:

machine urs.earthdata.nasa.gov login oo_username password ????????????

Brock

by OB.DAAC - SeanBailey » Fri May 15, 2020 4:56 pm America/New_York

You could try adding an entry for oceandata.sci.gsfc.nasa.gov with the same username and password you use for urs.earthdata.nasa.gov.
I don't have one, but maybe it's confusing cURL (although that's not likely, since there have been successful downloads).
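To confirm which hosts actually get credentials from a given .netrc, Python's standard netrc module can be used. This is a minimal sketch; the host list comes from this thread and missing_hosts is a hypothetical helper.

```python
import os
import netrc

# Hosts involved in the download flow discussed in this thread.
HOSTS = ("urs.earthdata.nasa.gov", "oceandata.sci.gsfc.nasa.gov")


def missing_hosts(path, hosts=HOSTS):
    """Return the hosts for which the .netrc file supplies no credentials.

    authenticators() falls back to a 'default' entry if the file has
    one, so hosts covered by such an entry are not reported as missing.
    """
    rc = netrc.netrc(path)
    return [host for host in hosts if rc.authenticators(host) is None]


if __name__ == "__main__":
    print(missing_hosts(os.path.expanduser("~/.netrc")))
```

An empty list means both hosts would resolve to credentials; a file containing only the urs.earthdata.nasa.gov entry would report oceandata.sci.gsfc.nasa.gov as missing.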

The only limit we have is on the number of keep-alive requests, but hitting it would just cause your client to reconnect silently, or so the theory goes :wink:

Sean

by oo_processing » Fri May 15, 2020 5:06 pm America/New_York

Sean,
What is the hard limit on keep-alive requests? And is it per connection or per user?
:)
Brock

by oo_processing » Fri May 15, 2020 5:51 pm America/New_York

Sean,

I think there are some interesting things in the verbose output. (It's still slower than it used to be: an hour to download 47 files :confused:)

This is interesting. In the 47th file:
< Connection: keep-alive
< Keep-Alive: timeout=60
< Location: /ob/getfile/MOD00.A2003021.1420_1.PDS.bz2

In the 48th file:
Found
< Connection: keep-alive
< Keep-Alive: timeout=60
< Location: https://urs.earthdata.nasa.gov/oauth/authorize?response_type=code&redirect_uri=https%3A%2F%2Foceandata.sci.gsfc.nasa.gov%2Fob%2Fgetfile%2Frestrict&client_id=Z0u-MdLNypXBjiDREZ3roA
                                      
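The 48th response's Location header is an Earthdata Login authorization URL, which suggests that at that point the download server no longer accepted the session and sent curl back through the login flow; if each pass through that flow ends in yet another redirect, curl would keep following until it hits the 50-redirect limit. That reading is an interpretation, not something confirmed in this thread. The sketch below decodes such a header with the standard urllib.parse module so the redirect_uri and client_id are readable; describe_location is a hypothetical name.

```python
from urllib.parse import parse_qs, urlsplit


def describe_location(location):
    """Split a redirect Location into host, path and decoded query values."""
    parts = urlsplit(location)
    # parse_qs returns a list per key; keep the first value of each.
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return {
        "host": parts.netloc,
        "path": parts.path,
        "is_login_redirect": (parts.netloc == "urs.earthdata.nasa.gov"
                              and parts.path == "/oauth/authorize"),
        "redirect_uri": params.get("redirect_uri"),
        "client_id": params.get("client_id"),
    }


if __name__ == "__main__":
    loc = ("https://urs.earthdata.nasa.gov/oauth/authorize?response_type=code"
           "&redirect_uri=https%3A%2F%2Foceandata.sci.gsfc.nasa.gov%2Fob%2Fgetfile%2Frestrict"
           "&client_id=Z0u-MdLNypXBjiDREZ3roA")
    print(describe_location(loc))
```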
I modified the .netrc as well, without effect:
cat ~/.netrc
machine urs.earthdata.nasa.gov login oo_login password ??????????????
machine oceandata.sci.gsfc.nasa.gov login oo_login password ??????????????

I'm attaching the verbose output as a file, as requested.

by gnwiii » Sat May 16, 2020 8:52 am America/New_York

I found it helpful to add the entry "machine oceandata.sci.gsfc.nasa.gov ..." to my "~/.netrc". I think both wget and curl have been tweaking their SSO handling, so the specific version and configure options of your curl could matter. The article "Troubleshooting Authentication Issues with registry.redhat.io" has examples of using curl to experiment with tokens.

by oo_processing » Sat May 16, 2020 3:15 pm America/New_York

Well, I do know that a little over a year ago I was able to use this same curl version, with the same set of interfaces, to run 10 concurrent downloads in separate terminals, each on its own interface. I reprocessed 40 regions of interest from mission start for Terra, Aqua, and VIIRS: over 250,000 work orders. So I know this curl version can handle the keep-alives and the downloads. It has not changed:

Name        : curl                         Relocations: (not relocatable)
Version     : 7.19.7                            Vendor: Red Hat, Inc.
Release     : 52.el6                       Build Date: Fri 29 Jan 2016 08:25:34 AM EST
Install Date: Mon 22 Jan 2018 03:01:14 AM EST      Build Host: x86-033.build.eng.bos.redhat.com
Group       : Applications/Internet         Source RPM: curl-7.19.7-52.el6.src.rpm

Of course, that was before the .netrc requirement and before I had to add -b and -c to the curl command below, so I can't help but think it is still something on the server side with authentication (see my previous post for the log snippet).

This command fails after the 47th file download, even though its x00 list has only 400 lines. All my other recent runs fail the same way, and searching the forum I noticed that others outside our university have had the same issue in the past, with no real solution:
curl -b .urs_cookies_106 -c .urs_cookies_106 -L -n --interface 2607:fe50:0:6330::106 --retry 5 --retry-delay 2 --max-time 0 --remote-name-all https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/{$(sed ':a;N;$!ba;s/\n/,/g' /shares/cms_optics/virtual_ant/S4P/bin/fa_density/non_fai_rois/CAPE_COD/x00)}

The runs I did before without issue looked like the command below. Whereas the x00 list in the command above has 400 lines, the one below contained 2500 lines and still achieved incredible download speeds: I downloaded a year of PDS.bz2 files in under 3 hours.

This command never failed a year ago, with an x00 list of 2500 lines:
curl --interface 2607:fe50:0:6330::100 --retry 5 --retry-delay 2 --max-time 0 --remote-name-all https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/{$(sed ':a;N;$!ba;s/\n/,/g' /cms_zfs/work_orders/modis/PDS/2006/x00)}
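The visible differences between the two commands are the cookie jar options (-b/-c) and -n. Since a corrupt cookie jar was suspected at the start of the thread, it may be worth inspecting the jar directly. A minimal sketch, assuming the jar uses the Netscape cookie-file format curl writes (seven tab-separated fields per cookie; some curl versions prefix HttpOnly cookies with "#HttpOnly_"). The helper names and the cookie names in the test data are hypothetical.

```python
import time

HTTPONLY_PREFIX = "#HttpOnly_"


def read_cookie_jar(text):
    """Parse Netscape-format cookie jar text.

    Returns (cookies, malformed): cookies is a list of dicts, and
    malformed counts non-comment lines without exactly seven fields
    or with a non-integer expiry time.
    """
    cookies = []
    malformed = 0
    # splitlines() drops line endings but keeps trailing tabs,
    # which matters because an empty cookie value is legal.
    for line in text.splitlines():
        if line.startswith(HTTPONLY_PREFIX):
            line = line[len(HTTPONLY_PREFIX):]
        elif not line.strip() or line.startswith("#"):
            continue  # blank line or ordinary comment
        fields = line.split("\t")
        if len(fields) != 7:
            malformed += 1
            continue
        domain, _subdomains, path, secure, expires, name, value = fields
        try:
            expiry = int(expires)
        except ValueError:
            malformed += 1
            continue
        cookies.append({"domain": domain, "path": path,
                        "secure": secure == "TRUE", "expires": expiry,
                        "name": name, "value": value})
    return cookies, malformed


def expired_names(cookies, now=None):
    """Names of cookies whose expiry has passed (0 marks a session cookie)."""
    now = time.time() if now is None else now
    return [c["name"] for c in cookies if c["expires"] and c["expires"] < now]
```

For example, reading .urs_cookies_106 with open(...).read() and passing the text to read_cookie_jar would show whether the jar has malformed lines or cookies that have already expired.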
