Discrepancy between MERIS staged data and wget based on level2 browser

Posted: Wed Jan 09, 2019 7:11 pm America/New_York
by emaure
Hi there,

I am trying to batch download ocean colour data for SeaWiFS, MERIS, MODIS and VIIRS based on the script discussed in a previous forum thread (https://oceancolor.gsfc.nasa.gov/forum/oceancolor/topic_show.pl?pid=19165).

I am able to download data from all sensors for my region of interest except MERIS, whose downloaded files cannot be opened.

For testing purposes, I downloaded the same files through the "dataorder system" and "wget" and the results are as shown below:
dataorder file: M2003105004421.L2_RR_OC.x.nc (file size: 163,412 KB). Although this file has a ".nc" suffix, it is in fact an HDF file.
wget file: M2003105004421.L2_RR_OC (file size: 86,610 KB)

Even though I am able to "wget" this data, the file is corrupt and cannot be opened. Could you kindly help me understand the reason for this issue?
Sincerely,
Eligio

Posted: Wed Jan 09, 2019 10:09 pm America/New_York
by OB ODPS - jgwilding
Hi Eligio,

The smaller file size of M2003105004421.L2_RR_OC.x.nc is because it is an extract.  The .nc suffix might be incorrect as you say, because the script that is creating the extract is assuming the input L2 file is a netcdf file.  We haven't reprocessed MERIS L2 data since switching the L2/L3 products to netcdf.  I will try to put a fix in for that for future orders.

What application are you using to try to open the file?  Does renaming M2003105004421.L2_RR_OC.x.nc to M2003105004421.L2_RR_OC.x.hdf work?

john

Posted: Thu Jan 10, 2019 1:27 am America/New_York
by emaure
Hi John,

Thanks for your reply.

"What application are you using to try to open the file?  Does renaming M2003105004421.L2_RR_OC.x.nc to M2003105004421.L2_RR_OC.x.hdf work?"
To open this file (M2003105004421.L2_RR_OC.x.nc) I used Matlab. The file works fine even with ".nc" as long as I use hdf tools.

I realize my initial post was not clear about what I wanted to ask.
My problem is with the smaller file, which I obtained through wget. Matlab reports "HDF file 'M2003105004421.L2_RR_OC' may be invalid or corrupt" when I try to open it.
So my question is: why is the file downloaded with wget corrupt, and only for the MERIS data?
I reported the file sizes because I suspected that the download had not completed.

Thank you

Eligio

Posted: Thu Jan 10, 2019 8:30 am America/New_York
by OB ODPS - jgwilding
Hi Eligio,

I see now.  The file, M2003105004421.L2_RR_OC, is compressed with bzip2 on our system.  If you are downloading the file with wget and specifying the uncompressed name, some versions of wget will make that the output-file name, and you'll get a compressed file in an uncompressed name.  If you're on a Linux-based system, you can try renaming the file to M2003105004421.L2_RR_OC.bz2 followed by trying to uncompress it: bunzip2 M2003105004421.L2_RR_OC.bz2 or bzip2 -d M2003105004421.L2_RR_OC.bz2

Also the 'file' command should identify it as bzip2 compressed regardless of the suffix.
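
On a system without the 'file' or bunzip2 commands, the same check and fix can be sketched in Python using only the standard library. This is a minimal sketch, not a tested recipe for the OceanColor archive: it assumes the download is a single bzip2 stream (which always begins with the bytes "BZh"), and the function names is_bzip2 and decompress_in_place are made up for illustration.

```python
import bz2
import os
import shutil

BZIP2_MAGIC = b"BZh"  # every bzip2 stream starts with these three bytes


def is_bzip2(path):
    """Return True if the file starts with the bzip2 magic bytes."""
    with open(path, "rb") as f:
        return f.read(3) == BZIP2_MAGIC


def decompress_in_place(path):
    """Replace a bzip2-compressed file with its decompressed contents.

    Returns True if the file was decompressed, False if it was left untouched.
    (Hypothetical helper; the name is not part of any OceanColor tool.)
    """
    path = os.fspath(path)
    if not is_bzip2(path):
        return False
    tmp = path + ".tmp"
    # Stream through a temporary file so the whole granule is never held in memory.
    with bz2.open(path, "rb") as src, open(tmp, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.replace(tmp, path)  # overwrite the compressed file with the decompressed one
    return True
```

Calling decompress_in_place("M2003105004421.L2_RR_OC") on the wget download should leave a file that HDF tools can open, while files that are not compressed are left as they are.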

I believe macOS has the bzip2 utility, and there are easy-to-get utilities for Windows that support it.

Some versions of wget support the --content-disposition option that will save the file to its actual name rather than what you asked for.

john

Posted: Fri Jan 11, 2019 3:15 am America/New_York
by emaure
Hi John,

Wow, I am stupefied. The problem was file compression!!!
I did not imagine that the file was compressed, as its name ends with '.L2_RR_OC'.

Many thanks for your help, I am able to open it now.

As for wget, I am actually using Python requests on Windows. I will check if there is a way to uncompress on the fly.

Thank you!
Eligio

Posted: Fri Jan 11, 2019 9:04 am America/New_York
by OB ODPS - jgwilding
Hi Eligio,

Happy to help, and glad that was the resolution to the problem.  If your Python script can look for the 'Content-Disposition' header, it should find the actual file name in that.  That is basically what wget's --content-disposition option is trying to do.
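
A rough Python sketch of reading the file name from that header is below. It only parses a header value with the standard library; the exact header the server sends is an assumption here (something like attachment; filename="M2003105004421.L2_RR_OC.bz2"), and filename_from_content_disposition is a made-up helper name.

```python
from email.message import Message


def filename_from_content_disposition(header_value, fallback):
    """Return the filename parameter of a Content-Disposition header value.

    Falls back to the given name when the header is missing or carries no
    filename. (Hypothetical helper, shown for illustration only.)
    """
    if not header_value:
        return fallback
    msg = Message()
    msg["Content-Disposition"] = header_value
    # get_filename() returns the unquoted filename parameter, or the fallback.
    return msg.get_filename(fallback)


# With requests, the header value would come from the response, e.g.:
#   resp = requests.get(url, stream=True)
#   name = filename_from_content_disposition(
#       resp.headers.get("Content-Disposition"), "M2003105004421.L2_RR_OC")
```

If the returned name ends in .bz2, the script knows the body is compressed and can save or decompress it accordingly.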

The getfile service call will accept the file name with or without the compressed suffix, using the uncompressed name as the key.  So as long as the key name is found in the DB, you'll get the file without having to know whether it is compressed or having to guess at the type of compression.  However, that results in the compressed file being saved under the name given to the getfile service, which is usually the uncompressed name.
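
One way around that from Python is to decompress while downloading, so that the file saved under the uncompressed name really is uncompressed. The sketch below assumes the body is a single bzip2 stream and that the caller already knows it is compressed (for example from the Content-Disposition name); iter_decompressed is a hypothetical helper name.

```python
import bz2


def iter_decompressed(chunks):
    """Yield decompressed bytes from an iterable of bzip2-compressed chunks.

    Handles a single bzip2 stream; anything after the end of that stream
    is ignored. (Hypothetical helper, shown for illustration only.)
    """
    decompressor = bz2.BZ2Decompressor()
    for chunk in chunks:
        if decompressor.eof:
            break  # end of the bzip2 stream reached; ignore trailing data
        data = decompressor.decompress(chunk)
        if data:
            yield data


# With requests, the chunks would come from a streamed response, e.g.:
#   resp = requests.get(url, stream=True)
#   with open("M2003105004421.L2_RR_OC", "wb") as out:
#       for block in iter_decompressed(resp.iter_content(chunk_size=65536)):
#           out.write(block)
```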

As an alternative, you can use the file_search service to provide you with the actual URL to download the file you want.  If you give the file_search service the uncompressed name and the file is compressed, the URL it returns will have the compressed name in it:

Ex:
    https://oceandata.sci.gsfc.nasa.gov/api/file_search?search=M2003105004421.L2_RR_OC&results_as_file=1&addurl=1

I used wget with -O wget.out to save the result, and after wget completes, the wget.out file contains,

    https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/M2003105004421.L2_RR_OC.bz2

If you're using Python, you can probably just grab the output directly, and in that case, you would omit the 'results_as_file=1' parameter on the file_search URL.
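
A minimal Python sketch of that approach follows. The endpoint and the search, addurl and results_as_file parameters are the ones shown above; the format of the response body is an assumption (the sketch simply pulls any getfile URL out of the returned text), and file_search_url and extract_urls are made-up helper names.

```python
import re
from urllib.parse import urlencode

FILE_SEARCH = "https://oceandata.sci.gsfc.nasa.gov/api/file_search"


def file_search_url(name, as_file=False):
    """Build a file_search query URL for one file name."""
    params = {"search": name, "addurl": 1}
    if as_file:
        params["results_as_file"] = 1
    return FILE_SEARCH + "?" + urlencode(params)


def extract_urls(body):
    """Return every getfile URL found in the response text.

    Assumes the URLs appear literally in the body; the real response
    format has not been checked here.
    """
    return re.findall(r"https?://[^\s\"'<>]+/getfile/[^\s\"'<>]+", body)


# With requests, the body would come from the service, e.g.:
#   body = requests.get(file_search_url("M2003105004421.L2_RR_OC")).text
#   for url in extract_urls(body):
#       print(url)
```

A returned URL ending in .bz2 tells the script up front that the file is compressed, so it can be saved under its actual name.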

john