xz compressed, not bzip2

oo_processing
Posts: 271
Joined: Wed Apr 06, 2005 12:11 pm America/New_York
Answers: 0

by oo_processing » Mon Aug 05, 2019 1:42 pm America/New_York

Two of the files I downloaded from 2018 have the wrong compression type: they carry a .bz2 extension but are actually xz compressed.
Here is one of them, and how I 'fixed' it:
[seadas_l1a_geo_extract_h5]$ file /cms_zfs/sat_data/modis/l0/2018/070/MOD00.P2018070.0140_1.PDS.bz2
/cms_zfs/sat_data/modis/l0/2018/070/MOD00.P2018070.0140_1.PDS.bz2: xz compressed data
[070]$ mv MOD00.P2018070.0140_1.PDS.bz2 MOD00.P2018070.0140_1.PDS.xz
[070]$ xz --decompress MOD00.P2018070.0140_1.PDS.xz
[070]$ ll MOD00.P2018070.0140_1.PDS
-rw-rw-r-- 1 bmurch cms_optics 396889536 Jul 30 01:30 MOD00.P2018070.0140_1.PDS
[070]$ bzip2 MOD00.P2018070.0140_1.PDS
[070]$ file /cms_zfs/sat_data/modis/l0/2018/070/MOD00.P2018070.0140_1.PDS.bz2
/cms_zfs/sat_data/modis/l0/2018/070/MOD00.P2018070.0140_1.PDS.bz2: bzip2 compressed data, block size = 900k

OB ODPS - towens
Subject Matter Expert
Posts: 232
Joined: Fri Feb 05, 2021 9:17 am America/New_York
Answers: 0

by OB ODPS - towens » Mon Aug 05, 2019 1:53 pm America/New_York

Our data provider changed to using xz compression for their long-term storage.
Our ingest code was not expecting it when we replaced some corrupted files with new copies.
I'll fix these on the server.

Thanks,
Tommy

OB ODPS - towens
Subject Matter Expert
Posts: 232
Joined: Fri Feb 05, 2021 9:17 am America/New_York
Answers: 0

by OB ODPS - towens » Mon Aug 05, 2019 1:57 pm America/New_York

I just checked the server, and the file has the correct xz extension: MOD00.P2018070.0140_1.PDS.xz
Is your code renaming it to bz2?

Tommy

oo_processing
Posts: 271
Joined: Wed Apr 06, 2005 12:11 pm America/New_York
Answers: 0

by oo_processing » Mon Aug 05, 2019 2:37 pm America/New_York

Tommy,

I use the L1/2 browser to generate an L0 list. I then save the list to a file, append .bz2 to each name, and normally download the files like this, where x00 is the list:

time curl --interface 2607:fe50:0:6330::100 --retry 5 --retry-delay 2 --max-time 0 --remote-name-all https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/{$(sed ':a;N;$!ba;s/\n/,/g' /cms_zfs/work_orders/modis/PDS/2018/x00)}

However, I just noticed this:

[bin]$ time curl --interface 2607:fe50:0:6330::100 --retry 5 --retry-delay 2 --max-time 0 --remote-name-all https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/MOD00.A2000364.1045_1.PDS
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  289M  100  289M    0     0  20.7M      0  0:00:13  0:00:13 --:--:-- 21.8M

real    0m13.955s
user    0m0.362s
sys     0m0.258s
[bin]$ ll MOD00.A2000364.1045_1.PDS
-rw-rw-r-- 1 bmurch bmurch 303855723 Aug  5 14:23 MOD00.A2000364.1045_1.PDS
[bin]$ file MOD00.A2000364.1045_1.PDS
MOD00.A2000364.1045_1.PDS: bzip2 compressed data, block size = 900k
[bin]$ time curl --interface 2607:fe50:0:6330::100 --retry 5 --retry-delay 2 --max-time 0 --remote-name-all https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/MOD00.A2000364.1045_1.PDS.bz2
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  289M  100  289M    0     0  21.3M      0  0:00:13  0:00:13 --:--:-- 21.8M

real    0m13.591s
user    0m0.330s
sys     0m0.254s
[bin]$ file MOD00.A2000364.1045_1.PDS*
MOD00.A2000364.1045_1.PDS:     bzip2 compressed data, block size = 900k
MOD00.A2000364.1045_1.PDS.bz2: bzip2 compressed data, block size = 900k
[bin]$ diff MOD00.A2000364.1045_1.PDS MOD00.A2000364.1045_1.PDS.bz2

[bin]$

So it appears that the same file is returned with or without the .bz2 extension in the above cases.
But not with the .xz extension:

time curl --interface 2607:fe50:0:6330::100 --retry 5 --retry-delay 2 --max-time 0 --remote-name-all https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/MOD00.A2000364.1045_1.PDS.xz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed 
110   665  110   665    0     0    119      0  0:00:05  0:00:05 --:--:--   197

real    0m5.581s
user    0m0.051s
sys     0m0.051s

[bin]$ cat MOD00.A2000364.1045_1.PDS.xz
<!DOCTYPE html><html lang="en-US"><head><meta http-equiv="content-type" content="text/html;charset=utf-8"><meta name="ROBOTS" content="NOARCHIVE"><title>ERROR @ OceanColor Biology Processing Group (OBPG)</title></head><body link=#323232 vlink=#323232 alink=#323232 style="background-color:#ffffff; color:#323232; font-size:175%"><br><hr color=#323232><center><h1><b>.:. ERROR .:.</b></h1><h2>OceanColor Biology Processing Group (OBPG)</h2><blockquote>Sorry, an error has occurred. Use the back button to return to the previous page or go to the <a href="https://oceancolor.gsfc.nasa.gov">Ocean Color Home Page</a>.</blockquote><br><hr color= #323232></body></html>
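
A failed request like this one leaves an HTML error page on disk under the data file's name, so a batch job could check for that before trying to decompress anything. The following is a minimal sketch, not part of the original post: the function name `is_html_error_page` is hypothetical, and it only assumes that the error page starts with `<!DOCTYPE html`, as the output above does. The thread does not show whether the server also sets an HTTP error status, so the check looks at the file content instead.

```shell
# Hypothetical helper: succeed if the file begins with an HTML doctype,
# which is how the OBPG error page shown above starts.
is_html_error_page() {
    head -c 15 "$1" | grep -qi '^<!DOCTYPE html'
}

# Example use after a download (the file name is taken from the thread):
# if is_html_error_page MOD00.A2000364.1045_1.PDS.xz; then
#     echo "download failed: got an HTML error page" >&2
# fi
```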

So do you suggest that I need to test every downloaded file (with file command) and then determine the type of compression from that?

OB.DAAC - SeanBailey
User Services
Posts: 1381
Joined: Wed Sep 18, 2019 6:15 pm America/New_York
Answers: 1

by OB.DAAC - SeanBailey » Mon Aug 05, 2019 6:22 pm America/New_York

I suggest you don't append the .bz2.  The file search is based on the uncompressed filename - which is why it pulls down the .bz2 file even if you don't append the extension.
.xz is not one of the compression extensions (currently) recognized by the script, so it doesn't know to strip it off when doing the lookup, and so doesn't find the file.  If you don't go to the effort to guess the extension, you won't have to, well, guess the extension :grin:

Let cURL assign the filename from the Content-Disposition header:

$ curl -O -J https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/MOD00.A2000364.1045_1.PDS
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  289M  100  289M    0     0  6980k      0  0:00:42  0:00:42 --:--:-- 7370k
curl: Saved to filename 'MOD00.A2000364.1045_1.PDS.bz2'

$ curl -O -J https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/MOD00.P2018070.0140_1.PDS
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  277M  100  277M    0     0  8695k      0  0:00:32  0:00:32 --:--:-- 8933k
curl: Saved to filename 'MOD00.P2018070.0140_1.PDS.xz'
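
For a whole list of files, the same idea can be wrapped in a small loop. This is only a sketch under stated assumptions: the function names `getfile_url` and `fetch_list` are hypothetical, the list file is assumed to hold one uncompressed file name per line, and the retry settings are copied from the commands earlier in the thread.

```shell
# Base URL taken from the commands in this thread.
GETFILE_URL="https://oceandata.sci.gsfc.nasa.gov/cgi/getfile"

# Build the download URL from the uncompressed name (no .bz2 or .xz appended).
getfile_url() {
    printf '%s/%s\n' "$GETFILE_URL" "$1"
}

# Hypothetical helper: fetch every name listed in a file, one per line.
# -O -J lets curl save each file under the name given in Content-Disposition.
fetch_list() {
    while IFS= read -r name; do
        [ -n "$name" ] || continue      # skip blank lines
        curl --retry 5 --retry-delay 2 -O -J "$(getfile_url "$name")" || return 1
    done < "$1"
}

# Example use (path taken from the thread):
# fetch_list /cms_zfs/work_orders/modis/PDS/2018/x00
```

Because the server chooses the saved name, each file should arrive with whatever extension matches its actual compression.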


Sean

oo_processing
Posts: 271
Joined: Wed Apr 06, 2005 12:11 pm America/New_York
Answers: 0

by oo_processing » Wed Aug 07, 2019 2:35 pm America/New_York

But, I will have to decide what to do with the downloaded file.
So do I bunzip2 it? Or unxz?
I guess test to ensure it is a bzip2 file?
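
One way to run such a test without trusting the file name is to look at the leading bytes, which is roughly what the `file` command used earlier in the thread does. The sketch below is an illustration only: the function name `compression_type` is hypothetical, and it relies on the standard signatures of the two formats (`BZh` for bzip2, and the bytes FD 37 7A 58 5A 00 for xz).

```shell
# Hypothetical helper: report the compression type from the file's leading bytes.
compression_type() {
    # Hex dump of the first 6 bytes with whitespace removed, e.g. "fd377a585a00".
    magic=$(head -c 6 "$1" | od -An -tx1 | tr -d ' \n')
    case "$magic" in
        425a68*)      echo "bzip2" ;;    # ASCII "BZh"
        fd377a585a00) echo "xz" ;;       # xz stream header
        *)            echo "unknown" ;;
    esac
}

# Example use:
# case "$(compression_type "$f")" in
#     bzip2) bunzip2 "$f" ;;
#     xz)    unxz "$f" ;;
# esac
```

Note that bunzip2 and unxz may refuse a file whose suffix they do not recognize, so a mislabeled file might still need renaming first, as in the first post.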

Brock

OB.DAAC - SeanBailey
User Services
Posts: 1381
Joined: Wed Sep 18, 2019 6:15 pm America/New_York
Answers: 1

by OB.DAAC - SeanBailey » Wed Aug 07, 2019 3:55 pm America/New_York

> But, I will have to decide what to do with the downloaded file.


Yes, you will, but the file extension should clue you in as to which decompression utility to use.
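
As a rough illustration of choosing the utility from the extension, a script could dispatch on the saved file name. The function names below are hypothetical, and only the two extensions that appear in this thread are handled.

```shell
# Hypothetical helper: pick a decompression command from the file extension.
decompressor_for() {
    case "$1" in
        *.bz2) echo "bunzip2" ;;
        *.xz)  echo "unxz" ;;
        *)     echo "none" ;;
    esac
}

# Decompress in place, or report an error if the extension is not recognized.
decompress_by_extension() {
    tool=$(decompressor_for "$1")
    if [ "$tool" = "none" ]; then
        echo "no known compression extension: $1" >&2
        return 1
    fi
    "$tool" "$1"
}

# Example use:
# decompress_by_extension MOD00.P2018070.0140_1.PDS.xz
```

This only works if the name was assigned by the server (for example with curl -O -J) rather than appended by hand.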
