Wget constantly redirected

schckngs
Posts: 23
Joined: Wed Feb 03, 2021 1:43 pm America/New_York

by schckngs » Fri Feb 21, 2020 10:49 am America/New_York

Hi OC forum,
Quick question: I'm using the new wget procedure and it works perfectly until it doesn't. My script plugs along for a couple of hours, and then wget is suddenly redirected in a loop and never reaches the files (example below). Deleting the .urs_cookies file seems to fix it. Is it possible to tell whether this is an error on my end (e.g. network), or why it is happening? I'm working on a large processing job, and it would be nice to leave it running without having to check back constantly to make sure it's still working.

Example:
wget --load-cookies ~/.urs_cookies --save-cookies ~/.urs_cookies --auth-no-challenge=on --keep-session-cookies --content-disposition https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/A2016201174500.L2_LAC_OC.nc
--2020-02-21 11:38:37--  https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/A2016201174500.L2_LAC_OC.nc
Resolving oceandata.sci.gsfc.nasa.gov (oceandata.sci.gsfc.nasa.gov)... xx.xxx.xx.xx, 2001:4d0:2418:128::84
Connecting to oceandata.sci.gsfc.nasa.gov (oceandata.sci.gsfc.nasa.gov)|xx.xxx.xx.xx|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: /ob/getfile/A2016201174500.L2_LAC_OC.nc [following]
--2020-02-21 11:38:37--  https://oceandata.sci.gsfc.nasa.gov/ob/getfile/A2016201174500.L2_LAC_OC.nc
Reusing existing connection to oceandata.sci.gsfc.nasa.gov:443.
HTTP request sent, awaiting response... 302 Found
Location: https://urs.earthdata.nasa.gov/oauth/authorize?client_id=Z0u-MdLNypXBjiDREZ3roA&redirect_uri=https%3A%2F%2Foceandata.sci.gsfc.nasa.gov%2Fob%2Fgetfile%2Frestrict&response_type=code [following]
--2020-02-21 11:38:37--  https://urs.earthdata.nasa.gov/oauth/authorize?client_id=Z0u-MdLNypXBjiDREZ3roA&redirect_uri=https%3A%2F%2Foceandata.sci.gsfc.nasa.gov%2Fob%2Fgetfile%2Frestrict&response_type=code
Resolving urs.earthdata.nasa.gov (urs.earthdata.nasa.gov)... xx.xxx.xx.xx, 2001:4d0:241a:4081::89
Connecting to urs.earthdata.nasa.gov (urs.earthdata.nasa.gov)|xx.xxx.xx.xx|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://oceandata.sci.gsfc.nasa.gov/ob/getfile/restrict?code=7c6cfe60597147c8d27c205eceed61a246f5cdbb37f2c8a36b7e7119d78e6289 [following]
--2020-02-21 11:38:38--  https://oceandata.sci.gsfc.nasa.gov/ob/getfile/restrict?code=7c6cfe60597147c8d27c205eceed61a246f5cdbb37f2c8a36b7e7119d78e6289
Connecting to oceandata.sci.gsfc.nasa.gov (oceandata.sci.gsfc.nasa.gov)|xx.xxx.xx.xx|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://urs.earthdata.nasa.gov/oauth/authorize?redirect_uri=https%3A%2F%2Foceandata.sci.gsfc.nasa.gov%2Fob%2Fgetfile%2Frestrict&client_id=Z0u-MdLNypXBjiDREZ3roA&response_type=code [following]
--2020-02-21 11:38:38--  https://urs.earthdata.nasa.gov/oauth/authorize?redirect_uri=https%3A%2F%2Foceandata.sci.gsfc.nasa.gov%2Fob%2Fgetfile%2Frestrict&client_id=Z0u-MdLNypXBjiDREZ3roA&response_type=code
Connecting to urs.earthdata.nasa.gov (urs.earthdata.nasa.gov)|xx.xxx.xx.xx|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://oceandata.sci.gsfc.nasa.gov/ob/getfile/restrict?code=24d69431beb953c1b53e7e01f9a9695febf519af9e04f82a1527452d197645f2 [following]
--2020-02-21 11:38:38--  https://oceandata.sci.gsfc.nasa.gov/ob/getfile/restrict?code=24d69431beb953c1b53e7e01f9a9695febf519af9e04f82a1527452d197645f2
Connecting to oceandata.sci.gsfc.nasa.gov (oceandata.sci.gsfc.nasa.gov)|xx.xxx.xx.xx|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://urs.earthdata.nasa.gov/oauth/authorize?client_id=Z0u-MdLNypXBjiDREZ3roA&redirect_uri=https%3A%2F%2Foceandata.sci.gsfc.nasa.gov%2Fob%2Fgetfile%2Frestrict&response_type=code [following]
--2020-02-21 11:38:38--  https://urs.earthdata.nasa.gov/oauth/authorize?client_id=Z0u-MdLNypXBjiDREZ3roA&redirect_uri=https%3A%2F%2Foceandata.sci.gsfc.nasa.gov%2Fob%2Fgetfile%2Frestrict&response_type=code
Connecting to urs.earthdata.nasa.gov (urs.earthdata.nasa.gov)|xx.xxx.xx.xx|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://oceandata.sci.gsfc.nasa.gov/ob/getfile/restrict?code=afb8a035a75c5424a47ebfa2438e4360acdfa8fead849db0e26822df61a6ec55 [following]
--2020-02-21 11:38:39--  https://oceandata.sci.gsfc.nasa.gov/ob/getfile/restrict?code=afb8a035a75c5424a47ebfa2438e4360acdfa8fead849db0e26822df61a6ec55
Connecting to oceandata.sci.gsfc.nasa.gov (oceandata.sci.gsfc.nasa.gov)|xx.xxx.xx.xx|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://urs.earthdata.nasa.gov/oauth/authorize?client_id=Z0u-MdLNypXBjiDREZ3roA&redirect_uri=https%3A%2F%2Foceandata.sci.gsfc.nasa.gov%2Fob%2Fgetfile%2Frestrict&response_type=code [following]
--2020-02-21 11:38:39--  https://urs.earthdata.nasa.gov/oauth/authorize?client_id=Z0u-MdLNypXBjiDREZ3roA&redirect_uri=https%3A%2F%2Foceandata.sci.gsfc.nasa.gov%2Fob%2Fgetfile%2Frestrict&response_type=code
Connecting to urs.earthdata.nasa.gov (urs.earthdata.nasa.gov)|xx.xxx.xx.xx|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://oceandata.sci.gsfc.nasa.gov/ob/getfile/restrict?code=f3b283d52a8ae0845c67c59cd228a760e71fc2112609b666c4e7bb87154e9aae [following]
--2020-02-21 11:38:39--  https://oceandata.sci.gsfc.nasa.gov/ob/getfile/restrict?code=f3b283d52a8ae0845c67c59cd228a760e71fc2112609b666c4e7bb87154e9aae
Connecting to oceandata.sci.gsfc.nasa.gov (oceandata.sci.gsfc.nasa.gov)|xx.xxx.xx.xx|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://urs.earthdata.nasa.gov/oauth/authorize?response_type=code&client_id=Z0u-MdLNypXBjiDREZ3roA&redirect_uri=https%3A%2F%2Foceandata.sci.gsfc.nasa.gov%2Fob%2Fgetfile%2Frestrict [following]
--2020-02-21 11:38:40--  https://urs.earthdata.nasa.gov/oauth/authorize?response_type=code&client_id=Z0u-MdLNypXBjiDREZ3roA&redirect_uri=https%3A%2F%2Foceandata.sci.gsfc.nasa.gov%2Fob%2Fgetfile%2Frestrict
Connecting to urs.earthdata.nasa.gov (urs.earthdata.nasa.gov)|xx.xxx.xx.xx|:443... connected.

----ETC...---

20 redirections exceeded.

schckngs
Posts: 23
Joined: Wed Feb 03, 2021 1:43 pm America/New_York

by schckngs » Fri Feb 21, 2020 1:23 pm America/New_York

I was mistaken: it is not deleting .urs_cookies that fixes it, and I have no idea why it started working again for me. I was able to download 6 files before it stopped again. It worked once when I added --user and --password to the wget command, but just once!
Copying and pasting the link into a browser works some of the time; other times the page keeps refreshing there as well.

gnwiii
Posts: 604
Joined: Fri Jan 29, 2021 5:51 pm America/New_York
Answers: 1

by gnwiii » Fri Feb 21, 2020 3:49 pm America/New_York

Wget has been erratic for me. Running with the "--debug" option gives lots of detail, but so far all it has shown is that wget sometimes tries to use IPv6 and dies with "no route to host". For that it is necessary to use the "-4" option, but wget still gives random failures.
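
A minimal sketch of combining the two options mentioned above: -4 to force IPv4, and --debug with the output appended to a log file so it can be inspected after a failure. The function name fetch_ipv4_debug and the WGET and DEBUG_LOG variables are hypothetical, and --debug only produces output if wget was built with debug support.

```shell
# Hypothetical helper, not an official script.
WGET="${WGET:-wget}"                        # override to substitute another binary
DEBUG_LOG="${DEBUG_LOG:-./wget_debug.log}"  # assumed log location

# Usage: fetch_ipv4_debug [wget options...] URL
fetch_ipv4_debug() {
    # -4       connect to IPv4 addresses only
    # --debug  print detailed debugging information
    # -a FILE  append all messages to FILE instead of the terminal
    "$WGET" -4 --debug -a "$DEBUG_LOG" "$@"
}
```

The cookie and authentication options from the download command would be passed through unchanged, followed by the URL.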

schckngs
Posts: 23
Joined: Wed Feb 03, 2021 1:43 pm America/New_York

by schckngs » Mon Feb 24, 2020 7:44 am America/New_York

Okay, glad in a sense to hear that this problem is not just me!
I left a script processing over the weekend and it was able to download 1 year of data before stopping.
Once again, it only started working again after I deleted the .urs_cookies file.
I guess the next step is to see if curl is more reliable?

gnwiii
Posts: 604
Joined: Fri Jan 29, 2021 5:51 pm America/New_York
Answers: 1

by gnwiii » Mon Feb 24, 2020 10:29 am America/New_York

Sean posted some download scripts that retry downloads after an error occurs and save a log file for failed downloads. I haven't tried these recently, but they were very useful in the past on a heavily used internet connection. I modified the scripts to save the log file for each download so I could use the download rates from the logs to identify slowdown and failure times. At my site, internet usage was high from after the morning coffee break until the afternoon coffee break, and at night for replication of data stores to another site. Scheduling downloads for early AM (after replication was finished) worked well.
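
This is not Sean's script, just a minimal sketch of the same idea: retry a download a few times, keep one wget log per attempt, and record URLs that never succeed. The names fetch_with_retry, WGET, MAX_TRIES, RETRY_DELAY and LOG_DIR are all hypothetical.

```shell
# Sketch only: every variable and function name here is an assumption.
WGET="${WGET:-wget}"
MAX_TRIES="${MAX_TRIES:-3}"
RETRY_DELAY="${RETRY_DELAY:-30}"   # seconds to wait between attempts
LOG_DIR="${LOG_DIR:-./wget_logs}"

# Usage: fetch_with_retry URL [extra wget options...]
fetch_with_retry() {
    local url="$1"
    shift
    local name="${url##*/}"        # last path component of the URL
    local try=1
    mkdir -p "$LOG_DIR"
    while [ "$try" -le "$MAX_TRIES" ]; do
        # -o writes this attempt's messages (including transfer rates) to its own log
        if "$WGET" -o "$LOG_DIR/$name.try$try.log" "$@" "$url"; then
            return 0
        fi
        try=$((try + 1))
        if [ "$try" -le "$MAX_TRIES" ]; then
            sleep "$RETRY_DELAY"
        fi
    done
    # Record the URL so failed downloads can be revisited later
    echo "$url" >> "$LOG_DIR/failed.txt"
    return 1
}
```

Keeping a separate log per attempt makes it possible to compare download rates across the day, as described above.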

schckngs
Posts: 23
Joined: Wed Feb 03, 2021 1:43 pm America/New_York

by schckngs » Wed Feb 26, 2020 3:54 pm America/New_York

Thanks for sharing the link!  (Novice wget user here :grin:)

I am not getting any error codes to work with. As posted above, wget keeps following redirects until it hits the limit, so a script can only test for failure after the wget command has exited. The problem seems to be the .urs_cookies file: the cookies are frequently (once an hour or more) saved incorrectly. When I delete the .urs_cookies file, it works again. Without the "--auth-no-challenge=on" option, the redirects also save an XML file for every retry once it stops working.
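
One way to test for this condition in a script is to look for the "redirections exceeded" message in wget's log after it exits. The following is a minimal sketch, assuming wget returns a non-zero status when it gives up and that clearing the cookie file is an acceptable workaround. The names fetch_reset_on_loop, WGET and COOKIES are hypothetical.

```shell
# Sketch only: empties the cookie file and retries once if wget reports a redirect loop.
WGET="${WGET:-wget}"
COOKIES="${COOKIES:-$HOME/.urs_cookies}"

# Usage: fetch_reset_on_loop URL [extra wget options...]
fetch_reset_on_loop() {
    local url="$1"
    shift
    local log attempt
    log="$(mktemp)"
    for attempt in 1 2; do
        if "$WGET" --load-cookies "$COOKIES" --save-cookies "$COOKIES" \
                -o "$log" "$@" "$url"; then
            rm -f "$log"
            return 0
        fi
        # Only the redirect loop is treated as a stale-cookie symptom
        if grep -q "redirections exceeded" "$log"; then
            : > "$COOKIES"   # empty the cookie file; the loop then retries once
        else
            break            # some other failure: do not touch the cookies
        fi
    done
    cat "$log" >&2           # show the last log so the failure is visible
    rm -f "$log"
    return 1
}
```

Emptying the file rather than deleting it leaves something for --load-cookies to open on the retry.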

I'm following the wget command posted on the data download page:
wget --load-cookies ~/.urs_cookies --save-cookies ~/.urs_cookies --auth-no-challenge=on --keep-session-cookies --content-disposition https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/${myfile}

Is it a bad idea to only load a .urs_cookies file that I know works, without saving cookies or keeping session cookies, like so?
wget --load-cookies ~/.urs_cookies_good --auth-no-challenge=on --content-disposition https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/${myfile}

OB.DAAC - amscott
User Services
Posts: 64
Joined: Mon Jun 22, 2020 5:24 pm America/New_York

by OB.DAAC - amscott » Wed Feb 26, 2020 4:07 pm America/New_York

We removed the recommendation to --keep-session-cookies in the data download examples. What happens if you "save" cookies during your session, but don't "keep" them after you exit the browser?

schckngs
Posts: 23
Joined: Wed Feb 03, 2021 1:43 pm America/New_York

by schckngs » Thu Feb 27, 2020 1:21 pm America/New_York

Thanks, I'm now trying saving and loading cookies, but not keeping them.
First I did a run that saved a cookie, to make sure I had one that works. Then in my script I ran subsequent wget commands with only the --load-cookies option (same as the example in my previous comment). It worked well for a few hours and then the same thing happened again, after which it would not work with that cookie. I must be exceeding some kind of time limit or file limit...?

schckngs
Posts: 23
Joined: Wed Feb 03, 2021 1:43 pm America/New_York

by schckngs » Fri Feb 28, 2020 8:04 am America/New_York

As a follow-up: loading and saving cookies without --keep-session-cookies seems to have fixed it, thanks amscott. :)

E.g.
wget --load-cookies ~/.urs_cookies --save-cookies ~/.urs_cookies --auth-no-challenge=on --content-disposition https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/${l2nasa_file}
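
For a long batch, the same command could be wrapped in a loop over a list of file names. This is a minimal sketch, assuming a plain-text list with one file name per line. The names download_list, WGET and BASE_URL are hypothetical, and the function prints the number of files that failed.

```shell
# Sketch only: runs the command above once per name in a list file.
WGET="${WGET:-wget}"
BASE_URL="${BASE_URL:-https://oceandata.sci.gsfc.nasa.gov/cgi/getfile}"

# Usage: download_list LIST_FILE   (one file name per line, blank lines ignored)
download_list() {
    local list="$1"
    local name
    local failed=0
    while IFS= read -r name; do
        if [ -z "$name" ]; then
            continue
        fi
        # stdin is redirected so the download command cannot consume the list
        if ! "$WGET" --load-cookies ~/.urs_cookies --save-cookies ~/.urs_cookies \
                --auth-no-challenge=on --content-disposition \
                "$BASE_URL/$name" < /dev/null; then
            failed=$((failed + 1))
        fi
    done < "$list"
    echo "$failed"           # number of files that could not be downloaded
    [ "$failed" -eq 0 ]
}
```

A non-zero return value means at least one file failed, so a calling script can decide whether to rerun the list.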

khyde
Posts: 37
Joined: Mon Dec 04, 2006 11:01 am America/New_York

by khyde » Fri Feb 28, 2020 1:32 pm America/New_York

Hello,

I have been having similar issues, but when I tried this latest example it did work. However, I would really like to use wget's -N option to download a file only if it is newer than the existing local copy, and this always returns a 400 Bad Request error. Is there a reason why -N will not work?

Thanks,
Kim

PS The example below also uses -c, which does work as long as -N isn't an option, although I haven't tested it with a partially downloaded file yet.

wget --load-cookies ~/.urs_cookies --save-cookies ~/.urs_cookies --auth-no-challenge=on --content-disposition -c -N https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/A2020026183500.L1A_LAC.bz2
--2020-02-28 13:25:57--  https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/A2020026183500.L1A_LAC.bz2
Resolving oceandata.sci.gsfc.nasa.gov (oceandata.sci.gsfc.nasa.gov)... xx.xxx.xx.xx, 2001:4d0:2418:128::84
Connecting to oceandata.sci.gsfc.nasa.gov (oceandata.sci.gsfc.nasa.gov)|xx.xxx.xx.xx|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: /ob/getfile/A2020026183500.L1A_LAC.bz2 [following]
--2020-02-28 13:25:58--  https://oceandata.sci.gsfc.nasa.gov/ob/getfile/A2020026183500.L1A_LAC.bz2
Connecting to oceandata.sci.gsfc.nasa.gov (oceandata.sci.gsfc.nasa.gov)|xx.xxx.xx.xx|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://urs.earthdata.nasa.gov/oauth/authorize?client_id=Z0u-MdLNypXBjiDREZ3roA&redirect_uri=https%3A%2F%2Foceandata.sci.gsfc.nasa.gov%2Fob%2Fgetfile%2Frestrict&response_type=code [following]
--2020-02-28 13:25:58--  https://urs.earthdata.nasa.gov/oauth/authorize?client_id=Z0u-MdLNypXBjiDREZ3roA&redirect_uri=https%3A%2F%2Foceandata.sci.gsfc.nasa.gov%2Fob%2Fgetfile%2Frestrict&response_type=code
Resolving urs.earthdata.nasa.gov (urs.earthdata.nasa.gov)... xx.xxx.xx.xx, 2001:4d0:241a:4081::89
Connecting to urs.earthdata.nasa.gov (urs.earthdata.nasa.gov)|xx.xxx.xx.xx|:443... connected.
HTTP request sent, awaiting response... 400 Bad Request
2020-02-28 13:25:58 ERROR 400: Bad Request.
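
Why -N triggers the 400 is not explained here. As a stopgap, and explicitly not a replacement for -N because it compares no timestamps, a script could skip any file that already exists locally. This is a minimal sketch, assuming the local file name matches the last component of the URL (which --content-disposition does not guarantee). The names fetch_if_missing, WGET and DEST_DIR are hypothetical.

```shell
# Sketch only: existence check instead of -N; it does not detect newer remote files.
WGET="${WGET:-wget}"
DEST_DIR="${DEST_DIR:-.}"

# Usage: fetch_if_missing URL
fetch_if_missing() {
    local url="$1"
    local name="${url##*/}"      # assumed local file name
    if [ -e "$DEST_DIR/$name" ]; then
        echo "skip $name"
        return 0
    fi
    # -P sets the directory the file is saved into
    "$WGET" --load-cookies ~/.urs_cookies --save-cookies ~/.urs_cookies \
        --auth-no-challenge=on --content-disposition \
        -P "$DEST_DIR" "$url"
}
```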
