Issues with wget authentication

OB.DAAC - SeanBailey
User Services
Posts: 1222
Joined: Wed Sep 18, 2019 6:15 pm America/New_York
Answers: 1

by OB.DAAC - SeanBailey » Wed Jan 29, 2020 2:22 pm America/New_York

It's not impossible. It just seemed prudent to keep crawlers out of APIs that require some a priori knowledge to be meaningful.
I'll make the request of the network gurus that the deny entry be removed for file_search.

That said, if you are writing something that is intended to get data from multiple DAACs, then you probably do want to target CMR for consistency.
It is the reason CMR exists, after all.

Sean

Edit:  ask and ye shall receive...

>Done. Googlebot ips should now be able to see indexed pages and the api/file_search


earthengine_urs
Posts: 34
Joined: Mon Jan 27, 2020 10:36 am America/New_York

by earthengine_urs » Wed Jan 29, 2020 4:56 pm America/New_York

Thank you, these URLs now work. Last question: if a file has different creation and modification times, which of them shows up in the cdate field? (Hopefully, modification time?)

by OB.DAAC - SeanBailey » Wed Jan 29, 2020 5:35 pm America/New_York

Short answer: it is the time you should care about.

Long answer: No, it's not "modification time", because in our database world, modification time is set for actions that have nothing to do with the file being modified :grin:
It's not mtime in the filesystem sense; it's more akin to ctime. If the *file* is modified (i.e. its contents change), then the creation time changes, because the only time we modify files is when we create (or recreate) them. So the time reported in the cdate field is the time you should care about, which is why it's the time we report.

Sean
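
Since cdate changes whenever a file is created or recreated, a mirroring script could simply treat any change in the reported value as a signal to fetch the file again. The following is a minimal sketch of that idea, not OB.DAAC code: the function name, the (filename, cdate) pairs, and the idea of keeping a local record of the last cdate seen are all assumptions for illustration. The cdate values are compared as opaque strings, so no particular timestamp format is assumed.

```python
from typing import Dict, Iterable, List, Tuple


def files_to_refetch(
    listing: Iterable[Tuple[str, str]],
    seen: Dict[str, str],
) -> List[str]:
    """Return filenames whose reported cdate differs from the locally recorded one.

    listing: (filename, cdate) pairs as reported by the file search.
    seen:    filename -> cdate recorded at the time of the last download.
    """
    stale = []
    for name, cdate in listing:
        # A missing record (never downloaded) or any difference in cdate
        # (file was recreated) both mean the local copy cannot be trusted.
        if seen.get(name) != cdate:
            stale.append(name)
    return stale
```

Comparing for inequality rather than "newer than" avoids having to parse the timestamp at all.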

by earthengine_urs » Tue Feb 04, 2020 4:59 pm America/New_York

Sean,

To make sure I get this right: what are the CMR concept IDs for the MODIS Terra and Aqua L3BIN datasets? For example, is this the correct concept ID for CHL:
https://search.earthdata.nasa.gov/search/granules?p=C1458149034-OB_DAAC&q=modis%20terra%20l3bin&tl=1565042310!4!!

by OB.DAAC - SeanBailey » Fri Feb 07, 2020 9:12 am America/New_York

Simon,

You can get a list of the collections with information to help you select the desired concept IDs via the CMR API.
For example, to get all the MODIS L3 collections we archive:

curl "https://cmr.earthdata.nasa.gov/search/collections.json?archive_center=OB.DAAC&processing_level_id%5b%5d=3&instrument%5b%5d=MODIS"

Parse the resulting JSON output and use the information there to choose the IDs you care about.

Sean
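
For anyone scripting this step, here is a rough Python equivalent of the curl call above, plus a helper that pulls out the fields most useful for picking a concept ID. It is a sketch under assumptions: the function names are made up, and the response layout (a top-level "feed" object holding an "entry" list whose items carry "id", "short_name" and "title") reflects my understanding of CMR's JSON output and should be checked against a real response.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

CMR_COLLECTIONS = "https://cmr.earthdata.nasa.gov/search/collections.json"


def build_collections_url(archive_center="OB.DAAC", level="3", instrument="MODIS"):
    """Build the same query as the curl example; urlencode escapes the [] brackets."""
    params = [
        ("archive_center", archive_center),
        ("processing_level_id[]", level),
        ("instrument[]", instrument),
    ]
    return CMR_COLLECTIONS + "?" + urlencode(params)


def summarize_collections(payload):
    """Return (concept id, short name, title) for each collection in the payload.

    Assumes the layout payload["feed"]["entry"][i]["id"] etc.; missing keys
    come back as None rather than raising.
    """
    entries = payload.get("feed", {}).get("entry", [])
    return [(e.get("id"), e.get("short_name"), e.get("title")) for e in entries]


def fetch_collections(url):
    """Fetch and decode the JSON response (needs network access)."""
    with urlopen(url) as resp:
        return json.load(resp)
```

Typical use would be `summarize_collections(fetch_collections(build_collections_url()))`. CMR returns results in pages, so a broad query may need additional paging parameters to see every collection.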

by earthengine_urs » Fri Apr 03, 2020 1:29 pm America/New_York

Sean,

I am able to read the file listing from NASA CMR, but the lack of the file size in the HEAD request is still a problem. Our generic downloading code looks at the expected file size to make sure the download is not interrupted. I can turn this check off or propagate the known file size from elsewhere, but the Content-Length header is fairly standard, so I was hoping your server can be configured to send it.

Thanks,
Simon
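
The check described here, with the two fallbacks mentioned (skip the check, or take the size from elsewhere), might look roughly like the sketch below. The function and parameter names are hypothetical and this is not the actual downloading code discussed in the post.

```python
import os
from typing import Optional


def size_matches(
    path: str,
    content_length: Optional[int] = None,
    known_size: Optional[int] = None,
) -> Optional[bool]:
    """Compare the size of a downloaded file with the expected size.

    content_length: value of the Content-Length header, if the server sent one.
    known_size:     size obtained from another source, used as a fallback.
    Returns True/False when an expected size is available, or None when the
    check has to be skipped because no size is known.
    """
    expected = content_length if content_length is not None else known_size
    if expected is None:
        return None
    return os.path.getsize(path) == expected
```

Returning None instead of True when nothing is known keeps "verified" and "not checked" distinguishable for the caller.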

by OB.DAAC - SeanBailey » Wed Apr 15, 2020 9:14 am America/New_York

Simon,
I'm not sure what your issue is... our server *does* report the Content-Length, e.g.:

$ curl --head -b ~/.urs_cookies -c ~/.urs_cookies -L -n https://oceandata.sci.gsfc.nasa.gov/ob/getfile/T2017004001500.L1A_LAC.bz2
HTTP/1.1 200 OK
Server: nginx
Date: Wed, 15 Apr 2020 13:12:31 GMT
Content-Type: application/octet-stream
Content-Length: 45004319
Connection: keep-alive
Keep-Alive: timeout=60
Set-Cookie: app-obdaac=ede629ca49fc9f69786b7b0801846946112c18bb; path=/; secure
Last-Modified: Wed, 04 Jan 2017 08:05:27 GMT
Content-Disposition: attachment; filename=T2017004001500.L1A_LAC.bz2
X-Username: <it's me!>
Referrer-Policy: no-referrer
Expect-CT: max-age=31536000, enforce
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
Content-Security-Policy: upgrade-insecure-requests; default-src 'self' oceancolor.gsfc.nasa.gov data:; script-src 'self' 'unsafe-inline' 'unsafe-eval' www.google-analytics.com www.googletagmanager.com cdn.earthdata.nasa.gov dap.digitalgov.gov data:; style-src 'self' 'unsafe-inline' code.jquery.com cdn.earthdata.nasa.gov; img-src 'self' data: oceancolor.gsfc.nasa.gov www.google-analytics.com cdn.earthdata.nasa.gov
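
When `curl --head -L` follows redirects it prints one header block per response, so a script reading this output needs the Content-Length of the final block only. The helper below is an illustrative sketch (the function names are made up) that assumes header-only output like the listing above, i.e. no response body mixed in.

```python
from typing import List, Optional, Tuple


def split_responses(raw: str) -> List[List[str]]:
    """Split curl header output into one list of lines per HTTP response."""
    blocks: List[List[str]] = []
    current: List[str] = []
    for line in raw.splitlines():
        line = line.strip()
        if line.startswith("HTTP/"):
            # A status line starts a new response block.
            if current:
                blocks.append(current)
            current = [line]
        elif line and current:
            current.append(line)
    if current:
        blocks.append(current)
    return blocks


def final_status_and_length(raw: str) -> Tuple[int, Optional[int]]:
    """Return (status code, Content-Length or None) of the last response."""
    blocks = split_responses(raw)
    if not blocks:
        raise ValueError("no HTTP status line found")
    last = blocks[-1]
    status = int(last[0].split()[1])
    length = None
    for line in last[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive (HTTP/2 prints them lowercase).
        if name.strip().lower() == "content-length":
            length = int(value.strip())
    return status, length
```

Checking the status code alongside the length matters because a failed login redirect can end in an error response that carries no Content-Length at all.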

gnwiii
Posts: 642
Joined: Fri Jan 29, 2021 5:51 pm America/New_York
Answers: 2

by gnwiii » Wed Apr 15, 2020 11:02 am America/New_York

From Fedora 31 (also tried Linux Mint 19) I get "400 Bad Request":
% curl --version
curl 7.66.0 (x86_64-redhat-linux-gnu) libcurl/7.66.0 OpenSSL/1.1.1d-fips zlib/1.2.11 brotli/1.0.7 libidn2/2.3.0 libpsl/0.21.0 (+libidn2/2.2.0) libssh/0.9.3/openssl/zlib nghttp2/1.40.0
Release-Date: 2019-09-11
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp scp sftp smb smbs smtp smtps telnet tftp
Features: AsynchDNS brotli GSS-API HTTP2 HTTPS-proxy IDN IPv6 Kerberos Largefile libz Metalink NTLM NTLM_WB PSL SPNEGO SSL TLS-SRP UnixSockets

% curl --head -b ~/.urs_cookies -c ~/.urs_cookies -L -n https://oceandata.sci.gsfc.nasa.gov/ob/getfile/T2017004001500.L1A_LAC.bz2
HTTP/2 302
server: nginx
date: Wed, 15 Apr 2020 14:49:01 GMT
location: https://urs.earthdata.nasa.gov/oauth/authorize?response_type=code&redirect_uri=https%3A%2F%2Foceandata.sci.gsfc.nasa.gov%2Fob%2Fgetfile%2Frestrict&client_id=Z0u-MdLNypXBjiDREZ3roA
expires: Mon, 01 Jan 1970 00:00:00 GMT
cache-control: no-cache, must-revalidate, max-age=0, no-store
pragma: no-cache
set-cookie: app-obdaac=2237029341fdafdbf55ebfcb015dbc5632d67b75; path=/; secure
referrer-policy: no-referrer
expect-ct: max-age=31536000, enforce
strict-transport-security: max-age=31536000; includeSubDomains; preload
content-security-policy: upgrade-insecure-requests; default-src 'self' oceancolor.gsfc.nasa.gov data:; script-src 'self' 'unsafe-inline' 'unsafe-eval' www.google-analytics.com www.googletagmanager.com cdn.earthdata.nasa.gov dap.digitalgov.gov data:; style-src 'self' 'unsafe-inline' code.jquery.com cdn.earthdata.nasa.gov; img-src 'self' data: oceancolor.gsfc.nasa.gov www.google-analytics.com cdn.earthdata.nasa.gov

HTTP/1.1 400 Bad Request
Server: nginx/1.17.5
Date: Wed, 15 Apr 2020 14:49:01 GMT
Content-Type: application/json; charset=utf-8
Connection: keep-alive
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
X-Content-Type-Options: nosniff
X-Download-Options: noopen
X-Permitted-Cross-Domain-Policies: none
Referrer-Policy: strict-origin-when-cross-origin
Access-Control-Allow-Origin: null
Access-Control-Allow-Credentials: true
Access-Control-Allow-Methods: GET, POST

Access-Control-Expose-Headers: true
Cache-Control: no-cache
Set-Cookie: urs_user_already_logged=yes; domain=earthdata.nasa.gov; path=/; expires=Thu, 16 Apr 2020 14:49:01 -0000
Set-Cookie: _urs-gui_session=266848e6e89b82c5c0cb2873323ecc62; path=/; expires=Thu, 16 Apr 2020 14:49:01 -0000; HttpOnly
X-Request-Id: 405584a6-f946-44a5-bc97-9cac8b84fc0c
X-Runtime: 0.191130
Strict-Transport-Security: max-age=31536000


Using your Python script works:

% obdaac_download.py T2017004001500.L1A_LAC.bz2
% bzip2 -t T2017004001500.L1A_LAC.bz2
% [no news is good news]


It seems curl wants an entry for oceandata.sci.gsfc.nasa.gov in .netrc.

$ vi .netrc
$ curl --head -b ~/.urs_cookies -c ~/.urs_cookies -L -n https://oceandata.sci.gsfc.nasa.gov/ob/getfile/T2017004001500.L1A_LAC.bz2
HTTP/2 200
server: nginx
date: Wed, 15 Apr 2020 15:23:29 GMT
content-type: application/octet-stream
content-length: 45004319
set-cookie: app-obdaac=b1c9efa0e92c905814033b978d5faf9025567914; path=/; secure
last-modified: Wed, 04 Jan 2017 08:05:27 GMT
content-disposition: attachment; filename=T2017004001500.L1A_LAC.bz2
x-username: <...>
referrer-policy: no-referrer
expect-ct: max-age=31536000, enforce
strict-transport-security: max-age=31536000; includeSubDomains; preload
content-security-policy: upgrade-insecure-requests; default-src 'self' oceancolor.gsfc.nasa.gov data:; script-src 'self' 'unsafe-inline' 'unsafe-eval' www.google-analytics.com www.googletagmanager.com cdn.earthdata.nasa.gov dap.digitalgov.gov data:; style-src 'self' 'unsafe-inline' code.jquery.com cdn.earthdata.nasa.gov; img-src 'self' data: oceancolor.gsfc.nasa.gov www.google-analytics.com cdn.earthdata.nasa.gov
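
A quick way to see which hosts a .netrc file already covers is Python's standard `netrc` module. The sketch below is only an illustration of the observation in this post: the function name is made up, and the list of hosts to check (the URS login host seen in the redirect plus the data host) is my assumption about what is relevant.

```python
import netrc
from typing import Iterable, List


def missing_netrc_hosts(netrc_path: str, hosts: Iterable[str]) -> List[str]:
    """Return the hosts that have no 'machine' entry in the given netrc file."""
    # .hosts maps each 'machine' name in the file to its credentials.
    known = netrc.netrc(netrc_path).hosts
    return [host for host in hosts if host not in known]
```

For example, `missing_netrc_hosts(os.path.expanduser("~/.netrc"), ["urs.earthdata.nasa.gov", "oceandata.sci.gsfc.nasa.gov"])` would list whichever of the two still needs an entry.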

by OB.DAAC - SeanBailey » Wed Apr 15, 2020 11:17 am America/New_York

Odd, but my point that the server does report the Content-Length is still valid, since it does :razz:

As for cURL not working for you, what if you try -i instead of --head? e.g.:

$ curl -i  -b ~/.urs_cookies -c ~/.urs_cookies -L -n https://oceandata.sci.gsfc.nasa.gov/ob/getfile/T2017004001500.L1A_LAC.bz2
HTTP/1.1 200 OK
Server: nginx
Date: Wed, 15 Apr 2020 15:13:44 GMT
Content-Type: application/octet-stream
Content-Length: 45004319
Connection: keep-alive
Keep-Alive: timeout=60
Set-Cookie: app-obdaac=f706ab6ee85bbc3507b775433974d8527134e1dd; path=/; secure
Last-Modified: Wed, 04 Jan 2017 08:05:27 GMT
Content-Disposition: attachment; filename=T2017004001500.L1A_LAC.bz2
X-Username: <it's me!>
Referrer-Policy: no-referrer
Expect-CT: max-age=31536000, enforce
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
Content-Security-Policy: upgrade-insecure-requests; default-src 'self' oceancolor.gsfc.nasa.gov data:; script-src 'self' 'unsafe-inline' 'unsafe-eval' www.google-analytics.com www.googletagmanager.com cdn.earthdata.nasa.gov dap.digitalgov.gov data:; style-src 'self' 'unsafe-inline' code.jquery.com cdn.earthdata.nasa.gov; img-src 'self' data: oceancolor.gsfc.nasa.gov www.google-analytics.com cdn.earthdata.nasa.gov

by gnwiii » Wed Apr 15, 2020 2:36 pm America/New_York

curl -i works without the second entry in ~/.netrc