Get Level0 files size from curl command line
-
- Posts: 3
- Joined: Thu Mar 28, 2024 11:22 am America/New_York
Hello,
I would like to check whether a file I downloaded (say https://oceandata.sci.gsfc.nasa.gov/getfile/A2002311192500.L0_LAC.bz2 ) was fully downloaded, to avoid downloading it again during batch processing. For that, I think I need to know the size of the file on the server, usually reported in the "Content-Length" header. This information is not available when I run "curl -sI link".
If I follow the location in the header, I get:
###################
curl -sI https://urs.earthdata.nasa.gov//oauth/authorize?client_id=pDPu0awH156XLrK6VV0Y0w&response_type=code&redirect_uri=https://oceandata.sci.gsfc.nasa.gov/getfile/urs/
[1] 81559
[2] 81560
[2]+ Done response_type=code
[data]$ HTTP/1.1 302 Found
Server: nginx/1.22.1
Date: Mon, 08 Jul 2024 17:38:48 GMT
Content-Type: text/html; charset=utf-8
Connection: keep-alive
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
X-Content-Type-Options: nosniff
X-Download-Options: noopen
X-Permitted-Cross-Domain-Policies: none
Referrer-Policy: strict-origin-when-cross-origin
Cache-Control: no-store
Pragma: no-cache
Expires: Fri, 01 Jan 1990 00:00:00 GMT
Location: https://urs.earthdata.nasa.gov/home
Set-Cookie: _urs-gui_session=xxxxx; path=/; expires=Tue, 09 Jul 2024 17:38:48 GMT; HttpOnly
X-Request-Id: xxxxx
X-Runtime: 0.011409
Strict-Transport-Security: max-age=31536000
################
Still no Content-Length (maybe due to the nosniff header?). Is there a solution?
Thanks
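A note on the output above: the job-control lines ("[1] 81559", "[2]+ Done response_type=code") indicate that the shell split the unquoted URL at each "&" and ran the pieces as background jobs, so curl only received the part up to client_id. The 302 responses are redirects toward the Earthdata login, so their headers do not describe the data file, and X-Content-Type-Options: nosniff has no bearing on Content-Length. Below is a minimal sketch of quoting the URL and pulling Content-Length out of the final response. The function name content_length is made up for illustration, the ~/.netrc and ~/.urs_cookies files are assumptions about the local Earthdata login setup, and this thread does not confirm that the server reports Content-Length for these files once authenticated.

```shell
# Print the Content-Length of the last response in HTTP headers read from stdin.
# With -L, curl -I prints one header block per redirect hop; the value is reset
# at every status line so only the final response counts.
content_length() {
    tr -d '\r' | awk '/^HTTP\// { len = "" }
                      tolower($1) == "content-length:" { len = $2 }
                      END { if (len != "") print len }'
}

# Usage sketch (needs network and credentials, so it is commented out).
# The quotes keep "&" inside the URL instead of backgrounding the command.
# ~/.netrc and ~/.urs_cookies are assumed to hold the Earthdata login.
# url='https://oceandata.sci.gsfc.nasa.gov/getfile/A2002311192500.L0_LAC.bz2'
# curl -sIL -n -b ~/.urs_cookies -c ~/.urs_cookies "$url" | content_length
```

If a size does come back, it can be compared with the local one, for example the output of wc -c < file. An empty result means the final response carried no Content-Length.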
-
- Subject Matter Expert
- Posts: 450
- Joined: Fri Feb 05, 2021 9:17 am America/New_York
- Been thanked: 7 times
Re: Get Level0 files size from curl command line
Since the files are bz2 compressed, you can run bunzip2 in test mode. If the download is incomplete, the file will not have a valid bz2 structure and the test will return a non-zero exit status.
bunzip2 -t <file>
Tommy
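For a batch of files, the test above could be wrapped in a loop along these lines. This is only a sketch: the function name is_complete_bz2 is made up for illustration, and the file names are whatever is passed on the command line.

```shell
# Return 0 if the file passes the bzip2 integrity test, non-zero otherwise.
# Error messages are discarded; only the exit status is used.
is_complete_bz2() {
    bunzip2 -t -- "$1" 2>/dev/null
}

# Report every file given on the command line that fails the test.
for f in "$@"; do
    is_complete_bz2 "$f" || echo "Incomplete or corrupt: $f"
done
```

Saved as a script (say check_bz2.sh, a hypothetical name), it would be run as: sh check_bz2.sh *.bz2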
-
- Posts: 3
- Joined: Thu Mar 28, 2024 11:22 am America/New_York
Re: Get Level0 files size from curl command line
That would be a lovely solution if it did not take so long to run. We are talking about 10-20 seconds to check a single file (about 700 MB each), which is not realistic when batch processing thousands of files or more.
My first attempt was to do something similar with the Linux built-in sha1sum command, but many files (take a random MODIS Aqua L0 file) do not have a valid hash key, so I cannot use this approach. I am open to suggestions.
-
- Subject Matter Expert
- Posts: 450
- Joined: Fri Feb 05, 2021 9:17 am America/New_York
- Been thanked: 7 times
Re: Get Level0 files size from curl command line
Use wget and check the return code on the transfer?
Either way, there is no chance of you processing a partial file, because the bunzip2 will fail.
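#!/bin/bash
# Download one file and report whether the transfer succeeded.
url=https://oceandata.sci.gsfc.nasa.gov/getfile
file=MOD00.P2002165.0000_1.PDS.bz2
# wget exits with a non-zero status if the transfer fails.
if wget "${url}/${file}" &> /dev/null; then
    echo "Success"
else
    echo "Error downloading $file"
fi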
Tommy
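Building on the return-code idea above, one way to avoid both re-downloading and re-testing each file is to download under a temporary name and rename only after wget reports success, so that a file under its final name was, as far as wget could tell, fully transferred. A minimal sketch, not a tested recipe for this server: the function name fetch_once and the .part suffix are made up for illustration, and authentication options are left out as in the script above.

```shell
# fetch_once URL DEST: download URL to DEST unless DEST already exists.
fetch_once() {
    # The final name only appears after a successful transfer, so its
    # presence is taken to mean the file is complete.
    if [ -e "$2" ]; then
        echo "Skipping $2 (already complete)"
        return 0
    fi
    if wget -q -O "${2}.part" "$1"; then
        # Rename only once wget has exited with status 0.
        mv -- "${2}.part" "$2"
        echo "Success: $2"
    else
        # Remove the partial file so it is never mistaken for a complete one.
        rm -f -- "${2}.part"
        echo "Error downloading $2" >&2
        return 1
    fi
}
```

With the variables from the script above, a call would look like: fetch_once "${url}/${file}" "$file"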