Discrepancy between MERIS staged data and wget based on level2 browser

emaure
Posts: 27
Joined: Tue Mar 26, 2013 7:18 am America/New_York
Answers: 0

by emaure » Wed Jan 09, 2019 7:11 pm America/New_York

Hi there,

I am trying to batch download ocean colour data for SeaWiFS, MERIS, MODIS and VIIRS using the script discussed in a previous forum thread (https://oceancolor.gsfc.nasa.gov/forum/oceancolor/topic_show.pl?pid=19165).

I am able to download data from all sensors for my region of interest except MERIS, whose downloaded files cannot be opened.

For testing purposes, I downloaded the same file through both the data order system and wget, with the following results:
Data order file: M2003105004421.L2_RR_OC.x.nc (file size: 163,412 KB). Although this file has a ".nc" suffix, it is actually an HDF file.
wget file: M2003105004421.L2_RR_OC (file size: 86,610 KB)

Even though I am able to download this file with wget, it appears to be corrupt and cannot be opened. Could you kindly help me understand the reason for this issue?
Sincerely,
Eligio

OB ODPS - jgwilding
Subject Matter Expert
Posts: 139
Joined: Fri Feb 19, 2021 1:09 pm America/New_York
Answers: 0
Been thanked: 1 time

by OB ODPS - jgwilding » Wed Jan 09, 2019 10:09 pm America/New_York

Hi Eligio,

The smaller file size of M2003105004421.L2_RR_OC.x.nc is because it is an extract.  The .nc suffix might be incorrect, as you say, because the script that creates the extract assumes the input L2 file is a netCDF file.  We haven't reprocessed MERIS L2 data since switching the L2/L3 products to netCDF.  I will try to put in a fix for that for future orders.

What application are you using to try to open the file?  Does renaming M2003105004421.L2_RR_OC.x.nc to M2003105004421.L2_RR_OC.x.hdf work?

john

by emaure » Thu Jan 10, 2019 1:27 am America/New_York

Hi John,

Thanks for your reply.

"What application are you using to try to open the file?  Does renaming M2003105004421.L2_RR_OC.x.nc to M2003105004421.L2_RR_OC.x.hdf work?"
To open this file (M2003105004421.L2_RR_OC.x.nc) I used MATLAB. The file works fine even with the ".nc" suffix, as long as I use the HDF tools.

I realize my initial post was not clear enough about what I wanted to ask.
My problem is with the smaller file, which I obtained through wget. When I try to open it, MATLAB reports: HDF file 'M2003105004421.L2_RR_OC' may be invalid or corrupt.
So my question is: why does the file downloaded with wget end up corrupt, and only for the MERIS data?
I reported the file sizes because I suspected that the download did not complete.

Thank you

Eligio

by OB ODPS - jgwilding » Thu Jan 10, 2019 8:30 am America/New_York

Hi Eligio,

I see now.  The file, M2003105004421.L2_RR_OC, is compressed with bzip2 on our system.  If you are downloading the file with wget and specifying the uncompressed name, some versions of wget will make that the output-file name, and you'll get a compressed file in an uncompressed name.  If you're on a Linux-based system, you can try renaming the file to M2003105004421.L2_RR_OC.bz2 followed by trying to uncompress it: bunzip2 M2003105004421.L2_RR_OC.bz2 or bzip2 -d M2003105004421.L2_RR_OC.bz2

Also the 'file' command should identify it as bzip2 compressed regardless of the suffix.

I believe macOS has the bzip2 utility, and there are easy-to-get utilities for Windows that support it.
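
For anyone scripting this in Python rather than at the shell, the same check and decompression can be sketched with the standard-library bz2 module. This is only a minimal sketch: the function names is_bzip2 and decompress_if_bzip2 are made up for illustration, and the check relies on the fact that a bzip2 stream begins with the bytes "BZh", which is what lets a file be identified regardless of its suffix.

```python
import bz2

BZIP2_MAGIC = b"BZh"  # a bzip2 stream starts with these three bytes


def is_bzip2(path):
    """Return True if the file starts with the bzip2 signature, whatever its suffix."""
    with open(path, "rb") as fh:
        return fh.read(len(BZIP2_MAGIC)) == BZIP2_MAGIC


def decompress_if_bzip2(src, dst, chunk_size=1024 * 1024):
    """Write a decompressed copy of src to dst; return False if src is not bzip2."""
    if not is_bzip2(src):
        return False
    # bz2.open reads the compressed file and hands back decompressed bytes.
    with bz2.open(src, "rb") as fin, open(dst, "wb") as fout:
        for chunk in iter(lambda: fin.read(chunk_size), b""):
            fout.write(chunk)
    return True
```

A call such as decompress_if_bzip2("M2003105004421.L2_RR_OC", "M2003105004421.L2_RR_OC.hdf") would leave the downloaded file untouched and write the decompressed copy next to it; the output name here is only an example.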

Some versions of wget support the --content-disposition option that will save the file to its actual name rather than what you asked for.

john

by emaure » Fri Jan 11, 2019 3:15 am America/New_York

Hi John,

Wow, I am stupefied. The problem was file compression!
I did not imagine that the file was compressed, since its name ends with '.L2_RR_OC'.

Many thanks for your help, I am able to open it now.

As for wget, I am actually using Python requests on Windows. I will check whether there is a way to uncompress on the fly.

Thank you!
Eligio

by OB ODPS - jgwilding » Fri Jan 11, 2019 9:04 am America/New_York

Hi Eligio,

Happy to help, and glad that was the resolution to the problem.  If your Python script can look for the 'Content-Disposition' header, it should find the actual file name in that.  That is basically what wget's --content-disposition option is trying to do.
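
A minimal Python sketch of that header lookup follows. It uses only the standard library, the helper name filename_from_content_disposition is hypothetical, and the exact header value the server sends is an assumption (the sketch expects something of the form attachment; filename="...").

```python
from email.message import Message


def filename_from_content_disposition(header_value, fallback):
    """Return the filename parameter of a Content-Disposition header, or fallback."""
    if not header_value:
        return fallback
    # Let the standard library's header parser handle quoting and parameters.
    msg = Message()
    msg["Content-Disposition"] = header_value
    return msg.get_filename() or fallback
```

With requests, the header value would typically come from response.headers.get("Content-Disposition"), and the requested (uncompressed) name can be passed as the fallback so the script still has a name to save under if the header is absent.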

The getfile service call will accept the file name with or without the compressed suffix, using the uncompressed name as the key.  So as long as the key name is found in the DB, you'll get the file without having to know whether it is compressed or having to guess at the type of compression.  However, that results in the compressed file being saved under the name given to the getfile service, which is usually the uncompressed name.
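
Because the bytes that arrive may be bzip2-compressed whatever name was requested, one option in a Python script is to decompress them as they are downloaded. The sketch below assumes the payload is a single bzip2 stream; the generator name decompress_chunks is made up for illustration.

```python
import bz2


def decompress_chunks(chunks):
    """Yield decompressed bytes from an iterable of bzip2-compressed chunks."""
    decompressor = bz2.BZ2Decompressor()
    for chunk in chunks:
        if decompressor.eof:
            break  # end of the bzip2 stream reached; ignore anything after it
        data = decompressor.decompress(chunk)
        if data:
            yield data
```

With requests, the chunks could come from response.iter_content(chunk_size=...) on a request made with stream=True, and each yielded block can be written straight to the output file. It is safest to confirm the data really is bzip2 first (for example by checking that the first chunk starts with b"BZh"), since feeding uncompressed data to the decompressor raises an error.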

As an alternative, you can use the file_search service to provide you with the actual URL to download the file you want.  If you give the file_search service the uncompressed name and the file is compressed, the URL it returns will have the compressed name in it:

Ex:
    https://oceandata.sci.gsfc.nasa.gov/api/file_search?search=M2003105004421.L2_RR_OC&results_as_file=1&addurl=1

I used wget with -O wget.out to save the result, and after wget completes, the wget.out file contains,

    https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/M2003105004421.L2_RR_OC.bz2

If you're using Python, you can probably just grab the output directly, and in that case, you would omit the 'results_as_file=1' parameter on the file_search URL.
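
A rough sketch of that lookup in Python, using only the standard library. The function names are hypothetical, and the parsing assumes the response body is plain text with one URL per line, as in the wget.out example above; the format returned when results_as_file is omitted is not shown in this thread, so the flag is left switchable.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

FILE_SEARCH = "https://oceandata.sci.gsfc.nasa.gov/api/file_search"


def file_search_url(name, as_file=True):
    """Build a file_search query URL for one file name."""
    params = {"search": name}
    if as_file:
        params["results_as_file"] = 1
    params["addurl"] = 1
    return FILE_SEARCH + "?" + urlencode(params)


def extract_urls(text):
    """Keep only the lines of the response that look like URLs (assumed format)."""
    lines = (line.strip() for line in text.splitlines())
    return [line for line in lines if line.startswith("http")]


def find_download_urls(name, timeout=60):
    """Query file_search and return the download URLs it reports."""
    with urlopen(file_search_url(name), timeout=timeout) as resp:
        return extract_urls(resp.read().decode("utf-8", errors="replace"))
```

For the file in this thread, find_download_urls("M2003105004421.L2_RR_OC") would be expected to return the getfile URL ending in .bz2, provided the service still behaves as described here.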

john
