Issues with wget authentication

Use this Forum to find information on, or ask a question about, NASA Earth Science data.
earthengine_urs
Posts: 72
Joined: Mon Jan 27, 2020 10:36 am America/New_York
Answers: 0
Has thanked: 3 times
Been thanked: 1 time

by earthengine_urs » Mon Jan 27, 2020 10:52 am America/New_York

Hi,

I'm having the same problems as above: wget only works with --auth-no-challenge=on, but since this method sends the password in plaintext, it's not great. LP DAAC downloads (e.g., https://e4ftl01.cr.usgs.gov/VIIRS/VNP13A1.001/2016.08.28/VNP13A1.A2016241.h24v06.001.2018162005736.h5) also require authentication, but work with just "wget --user --password". Is it possible to configure this site in a similar way?

Thanks,
Simon

OB.DAAC - SeanBailey
User Services
Posts: 1464
Joined: Wed Sep 18, 2019 6:15 pm America/New_York
Answers: 1
Been thanked: 5 times

by OB.DAAC - SeanBailey » Mon Jan 27, 2020 3:03 pm America/New_York

Simon,

True, using --auth-no-challenge=on is not ideal, but it is going over an HTTPS connection, so it isn't as bad as it could be :eek:
You probably should use the .netrc/urs_cookie approach described on https://oceancolor.gsfc.nasa.gov/data/download_methods/ instead of the command line username/password approach.
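
For reference, the .netrc part of that approach comes down to a one-line entry for the Earthdata Login (URS) host. Below is a minimal Python sketch of reading such an entry with the standard-library netrc module. The host name urs.earthdata.nasa.gov, the temporary file, and the placeholder credentials are assumptions for illustration and are not taken from the linked page.

```python
import netrc
import os
import tempfile

# Assumed machine name for the Earthdata Login (URS) entry.
URS_HOST = "urs.earthdata.nasa.gov"


def read_urs_credentials(netrc_path):
    """Return (login, password) for URS_HOST from a .netrc-style file."""
    auth = netrc.netrc(netrc_path).authenticators(URS_HOST)
    if auth is None:
        raise LookupError(f"no entry for {URS_HOST} in {netrc_path}")
    login, _account, password = auth
    return login, password


def demo():
    """Write a throwaway .netrc-style file with placeholder values and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "netrc")
        with open(path, "w") as fh:
            # Placeholder credentials; a real file would hold the user's own.
            fh.write(f"machine {URS_HOST} login my_user password my_secret\n")
        os.chmod(path, 0o600)  # keep the file private, as is usual for .netrc
        return read_urs_credentials(path)
```

Keeping the credentials in a private file means they do not appear on the command line or in the shell history, which is the main practical gain over passing --user and --password directly.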

BTW, I ran your example file through wget with the --verbose option set, and it seems that the login fails (yes, I did pass it proper credentials :wink:), but the download proceeds anyway - which suggests to me that they're not verifying the URS response.  This would explain why it *works* for them but not us (we verify).

Sean

by earthengine_urs » Mon Jan 27, 2020 10:57 pm America/New_York

Sean,

Thank you, but the problem is that I can't actually use either wget or curl, as my binaries do not have access to the Internet. Instead, we have an internal system that proxies HTTP requests, and I don't think I'd be able to use plaintext auth there. I was hoping to use this system for HEAD requests to get file sizes - any chance you could turn off auth for HEAD, maybe? (The actual downloads go through another system, but it's too cumbersome to use for HEAD.)

The .netrc approach also might not be trivial with the internal system.

What auth failure do you see with the LP DAAC file? What about https://n5eil01u.ecs.nsidc.org/MOST/MOD10A1.006/2000.02.24/MOD10A1.A2000055.h34v10.006.2016061160522.hdf?
(They definitely require the correct credentials.)

by OB.DAAC - SeanBailey » Tue Jan 28, 2020 12:46 pm America/New_York

Simon,

Yes, upon closer inspection, it does indeed seem to require a valid login.
It also spits out a 401 amid a flurry of 302s, so that is odd...I've asked folks to dig deeper to see if we can get wget to be happy without the --auth-no-challenge=on option set.
Perhaps there is something in the way we're making the authentication request to URS...

You do not need to make a HEAD request to get a file size.  In fact, it is bad form to do so (in my opinion :grin:)

The file_search API does not require authentication and can be used to retrieve information about a file.
For example:
https://oceandata.sci.gsfc.nasa.gov/api/file_search?search=V2020001020000.L1A_SNPP.nc&format=json
will return:
{"V2020001020000.L1A_SNPP.nc":{"cdate":"2020-01-03 13:34:00","checksum":"sha1:54ab2d38004208ad9612ba4581357686dc8071d0","getfile":"https:\/\/oceandata.sci.gsfc.nasa.gov\/ob\/getfile","size":386400944}}

If you are less specific in the search parameters, you'll get a JSON output with the information for all the files that match your search.
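
As a rough illustration of the lookup described above, here is a minimal Python sketch that builds a file_search URL and pulls the "size" field out of a response. The function names are hypothetical, the sample response is the one quoted in this post, and the actual HTTP fetch is left out so the sketch does not depend on network access.

```python
import json
from urllib.parse import urlencode

FILE_SEARCH_URL = "https://oceandata.sci.gsfc.nasa.gov/api/file_search"


def build_file_search_url(pattern):
    """Build a file_search query URL that asks for JSON output."""
    return FILE_SEARCH_URL + "?" + urlencode({"search": pattern, "format": "json"})


def file_sizes(response_text):
    """Map each file name in a file_search JSON response to its 'size' value."""
    records = json.loads(response_text)
    return {name: info["size"] for name, info in records.items()}


# The response quoted above, kept verbatim (raw string preserves the \/ escapes).
SAMPLE_RESPONSE = (
    r'{"V2020001020000.L1A_SNPP.nc":{"cdate":"2020-01-03 13:34:00",'
    r'"checksum":"sha1:54ab2d38004208ad9612ba4581357686dc8071d0",'
    r'"getfile":"https:\/\/oceandata.sci.gsfc.nasa.gov\/ob\/getfile",'
    r'"size":386400944}}'
)

if __name__ == "__main__":
    print(build_file_search_url("V2020001020000.L1A_SNPP.nc"))
    print(file_sizes(SAMPLE_RESPONSE))
```

Because the response is keyed by file name, the same parsing works unchanged when a less specific search returns several files.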

Regards,
Sean

by earthengine_urs » Tue Jan 28, 2020 5:48 pm America/New_York

Thanks, Sean. This looks promising, but I'm hitting a weird snag. Using wget on such URLs works fine, but fetching them using our internal system returns a 403 and this:

<!DOCTYPE html><html lang="en-US"><head><meta http-equiv="content-type" content="text/html;charset=utf-8"><meta name="ROBOTS" content="NOARCHIVE"><title>ERROR @ OceanColor Biology Processing Group (OBPG)</title></head><body link=#323232 vlink=#323232 alink=#323232 style="background-color:#ffffff; color:#323232; font-size:175%"><br><hr color=#323232><center><h1><b>.:. ERROR .:.</b></h1><h2>OceanColor Biology Processing Group (OBPG)</h2><blockquote>Sorry, an error has occurred. Use the back button to return to the previous page or go to the <a href="https://oceancolor.gsfc.nasa.gov">Ocean Color Home Page</a>.</blockquote><br><hr color= #323232></body></html>

Do you happen to have some IP blocks, maybe?

Thanks,
Simon

by OB.DAAC - SeanBailey » Tue Jan 28, 2020 6:34 pm America/New_York

It is possible to get hit by a network block, but if you're seeing the error page, you're not (yet) blocked (hit it enough and you may get blocked).  More likely there is an issue with the request you're making.  Without seeing exactly what you're asking for, I can't say what that issue would be.

Sean

by earthengine_urs » Tue Jan 28, 2020 7:54 pm America/New_York

I'm able to repeat the request with the exact same headers without issues from my desktop, so I'm kinda stumped. Could you look in nginx's error logs to see if there are any details? The URL I tried was:

https://oceandata.sci.gsfc.nasa.gov/api/file_search?search=A2020014.L3m_DAY_CHL_chlor_a_4km.nc&format=json

BTW, is the information about file size/date exported to NASA CMR?

by OB.DAAC - SeanBailey » Wed Jan 29, 2020 12:08 pm America/New_York

Simon,

Would your internal system be seen as crawl-????.googlebot.com?  If so, then yes, you're being denied access to our search API.

Yes, the information returned by the API for filesize, etc. should match the corresponding information we provide to CMR.

Sean

by earthengine_urs » Wed Jan 29, 2020 12:13 pm America/New_York

Sean,

Yes, crawl-????.googlebot.com sounds right. If it's impossible to unblock it, I'll try CMR.

gnwiii
Posts: 713
Joined: Fri Jan 29, 2021 5:51 pm America/New_York
Answers: 2
Has thanked: 1 time

by gnwiii » Wed Jan 29, 2020 1:19 pm America/New_York

GNU wget's bugzilla has a new take on the use of --auth-no-challenge=on, arguing that there is no real security advantage, that the extra request has a poor cost/benefit ratio, and that curl already defaults to wget's --auth-no-challenge=on behaviour. I expect such a change is more likely to appear in wget2.
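
To make the "plaintext" concern in this thread concrete: with --auth-no-challenge, wget sends HTTP Basic credentials on every request without waiting for a 401 challenge. The Python sketch below shows how a Basic Authorization header value is formed. The helper names are hypothetical, and the credentials are the well-known example pair from the HTTP Basic authentication RFC, not real ones.

```python
import base64


def basic_auth_header(user, password):
    """Build the value of an HTTP Basic 'Authorization' header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def decode_basic_auth_header(value):
    """Recover (user, password) from a Basic header value; no key is needed."""
    _scheme, token = value.split(" ", 1)
    user, _sep, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user, password


if __name__ == "__main__":
    header = basic_auth_header("Aladdin", "open sesame")
    print(header)
    print(decode_basic_auth_header(header))
```

The header is only base64-encoded, so anyone who can read the request can reverse it. The protection comes from the HTTPS connection, whether the credentials are sent up front or after a challenge.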
