PDS files in subscriptions now in both bz2 and 'data' format

OB ODPS - jgwilding
Subject Matter Expert
Posts: 139
Joined: Fri Feb 19, 2021 1:09 pm America/New_York
Answers: 0
Been thanked: 1 time

by OB ODPS - jgwilding » Wed May 13, 2020 4:18 pm America/New_York

It seems there might be two solutions.

One, ignore files without the .bz2 suffix.  They will eventually get compressed, and then you can process them.  For instance, the 2020134.1530 and 2020134.1535 files in your example above have been compressed as of this writing.
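The first option amounts to a filter over the file-search results. Below is a minimal sketch. It assumes the results have been saved one file name per line, which is an assumption about the input and not the documented OB.DAAC output format. `filter_compressed` is a hypothetical helper name.

```shell
# Hypothetical helper: read file names (one per line) on stdin and print only
# the ones ending in .bz2. Names without the suffix are reported on stderr and
# skipped, on the assumption that the compressed copy will appear in a later
# file search and be picked up then.
filter_compressed() {
    while IFS= read -r name; do
        case "$name" in
            *.bz2) printf '%s\n' "$name" ;;
            "") ;;  # ignore blank lines
            *) printf 'STATUS: %s is not compressed; skipping it for now\n' "$name" >&2 ;;
        esac
    done
}
```

Usage would be something like `filter_compressed < search_results.txt > to_download.txt`, where both file names are placeholders.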

The second might involve some changes on your side.  If you're using the name of the downloaded file as the base of the work-order file name, and that name may or may not end in .bz2, why not strip any .bz2 suffix from the downloaded file name and use the result as the work-order key?  You must be uncompressing the .bz2 files, so at some point the file carries its uncompressed name anyway.  Just use that as the work-order key.

For example, if the file first appears in the file-search list uncompressed, you download it.  There's no .bz2 suffix to strip, so the work-order key is, say, MOD00.P2020134.1540_1.PDS.  Later, the compressed file shows up as MOD00.P2020134.1540_1.PDS.bz2.  You strip off the .bz2 suffix, find that the work order already exists, and there is nothing to do.
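The second option can be sketched as follows. The sketch assumes that each work order is tracked as an empty marker file named after its key in one directory. `WORK_ORDER_DIR`, `work_order_key`, and `register_work_order` are hypothetical names used for illustration and are not part of any OB.DAAC tool.

```shell
# Hypothetical layout: one empty marker file per work order in WORK_ORDER_DIR.
WORK_ORDER_DIR=${WORK_ORDER_DIR:-./work_orders}

# Print the work-order key for a downloaded file name: drop a trailing .bz2, if any.
work_order_key() {
    printf '%s\n' "${1%.bz2}"
}

# Create a work order unless one already exists for the same granule.
# Returns 0 if a new work order was created, 1 if it already existed.
register_work_order() {
    key=$(work_order_key "$1")
    mkdir -p "$WORK_ORDER_DIR"
    if [ -e "$WORK_ORDER_DIR/$key" ]; then
        return 1   # granule already seen, compressed or not: nothing to do
    fi
    : > "$WORK_ORDER_DIR/$key"
    return 0
}
```

For example, `register_work_order MOD00.P2020134.1540_1.PDS.bz2 || echo "work order already exists"` does nothing new if the uncompressed file was registered first. Because both names map to the same key, the arrival order stops mattering.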

John

oo_processing
Posts: 304
Joined: Wed Apr 06, 2005 12:11 pm America/New_York
Answers: 0
Has thanked: 6 times

by oo_processing » Wed May 13, 2020 6:58 pm America/New_York

John,

There might be a simple third: wait until the bzip2 compression has finished before pushing the files to the subscriptions.

Having said that, I have made adjustments to ignore files returned without the .bz2 extension.
If, as you say, they will be bzip2'ed and then found in that format by a subscription file search, ignoring them shouldn't be an issue for me.
My programs are designed for that.
When a file appears in the subscription as a .bz2 file, it will be picked up and processed then.

I guess we will see.

Cheers, mate, and stay safe, all!

by OB ODPS - jgwilding » Wed May 13, 2020 9:47 pm America/New_York

The default for the subscription file search is to look back 3 days. That makes ignoring the uncompressed files the easiest solution in terms of work required, because all but anomalous cases will be compressed long before the 3 days are up.

We changed the ingest scheme for the L0 data because, with multiple L0 ingests running concurrently, we were killing our disks and seeing wall times of 10-15 minutes for ingests that should finish in under 3 minutes.  And that's with only MODIS L0s running.  We could just as easily have a 5-GB VIIRS L0 file ingesting alongside them.  The new scheme lets us archive the file to the storage system more quickly and lets the storage system perform the compression.  That will hopefully let the disks live longer, and it reduces the latency for downstream products.

Stay safe as well.

by oo_processing » Wed May 13, 2020 10:29 pm America/New_York

Good to know.
I understand there are many excellent reasons to change workflows, and I can understand your motivation.
I just thought it might be simple to push to the subscription database after the compression had been done.
No harm, no foul, though, as that would only mean the uncompressed file wouldn't show up on my subscription list.
That's effectively what happens in my code now, since I am looping past non-.bz2 files.

Cheers, and thanks for the quick response.

by oo_processing » Wed May 20, 2020 12:28 pm America/New_York

It looks like you have done something to resolve this issue? I have not seen this message in my logs since 5/14/2020:

"STATUS: The $granule MOD00.P2020135.1805_1.PDS is not compressed. I'm going to skip it for now"

But then again, I saw in this post:
https://oceancolor.gsfc.nasa.gov/forum/oceancolor/topic_show.pl?pid=54708#pid54708

that the "47 too many redirects" failures started on the same day. Coincidence?
I ask because Sean said here:

https://oceancolor.gsfc.nasa.gov/forum/oceancolor/topic_show.pl?pid=54696#pid54696

"Nothing changed on our end either...the trouble is, there is a third party involved"

I understand that it is a different issue, but the dates are the same, and I think something did happen that day, since that's when this bz2 problem ended.
Yet the "47 redirects" failures have occurred from that day forward (again?).

Cheers,
Brock

OB.DAAC - SeanBailey
User Services
Posts: 1468
Joined: Wed Sep 18, 2019 6:15 pm America/New_York
Answers: 1
Been thanked: 5 times

by OB.DAAC - SeanBailey » Wed May 20, 2020 1:31 pm America/New_York

Brock,

Despite the coincidence, the two are completely unrelated.
Any changes to address the compression race condition simply could not have affected an HTTP transfer.  The redirects for Bruce seemed to start on the 15th, but Stefan reported them going back months.  We're still trying to identify the cause.  The coincidence I noticed with a different HTTP transfer issue, which *did* start on the 15th, was, well, also just a coincidence.

Sean

by oo_processing » Wed May 20, 2020 1:43 pm America/New_York

Sean,

Well, even shots in the dark sometimes hit their targets.
Indeed, they have been going on for months for Chuanmin and others here who do many downloads.
I believe Chuanmin wrote a little script that downloads only a small batch (fewer than 45 files), deletes the cookies file, and then, after a sleep statement, starts again with the next batch of fewer than 47.
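Below is a rough sketch of that batch-and-reset workaround. It is for illustration only and is not Chuanmin's actual script. It assumes Earthdata credentials in ~/.netrc, a cookie jar file that is deleted between batches, and the getfile URL from the curl command further down. `BATCH_SIZE`, `PAUSE_SECONDS`, `COOKIE_FILE`, `fetch_one`, and `download_in_batches` are hypothetical names.

```shell
# Hypothetical settings; the batch size is kept under the ~45-47 mentioned above.
BASE_URL=https://oceandata.sci.gsfc.nasa.gov/cgi/getfile
BATCH_SIZE=${BATCH_SIZE:-40}
PAUSE_SECONDS=${PAUSE_SECONDS:-60}
COOKIE_FILE=${COOKIE_FILE:-./cookies.txt}

# Download one file, reading and writing the cookie jar for the current batch.
# stdin is redirected so curl can never consume the list being looped over.
fetch_one() {
    curl --silent --show-error --fail --location --netrc \
         --cookie "$COOKIE_FILE" --cookie-jar "$COOKIE_FILE" \
         --remote-name "$BASE_URL/$1" < /dev/null
}

# Read file names (one per line) on stdin and download them in batches,
# deleting the cookie jar and pausing after every BATCH_SIZE files.
download_in_batches() {
    n=0
    while IFS= read -r name; do
        [ -n "$name" ] || continue
        fetch_one "$name" || printf 'failed: %s\n' "$name" >&2
        n=$((n + 1))
        if [ "$n" -ge "$BATCH_SIZE" ]; then
            rm -f "$COOKIE_FILE"   # next batch starts with a fresh session
            sleep "$PAUSE_SECONDS"
            n=0
        fi
    done
}
```

A run might look like `download_in_batches < DO.GET_NASA_MODIS_PSD.2019.txt`. Whether resetting the cookies actually avoids the redirect errors is taken from the description above, not verified here.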

It works, but it's inelegant IMO, and it needs a whole different way to prepare the lists.
I prefer hitting the L0 button on the results returned from a Level 1&2 browser search and then saving the results to a file.

Then I do this:
split -d -l 1500 DO.GET_NASA_MODIS_PSD.2019.txt
and end up with lists of 1500 file names each:
x00, x01, ... x(n)

And then this:
time curl --interface 2607:fe50:0:6330::101 --retry 5 --retry-delay 2 --max-time 0 --remote-name-all https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/{$(sed ':a;N;$!ba;s/\n/,/g' /cms_zfs/work_orders/modis/PDS/2019/x01)}
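A loop like the one below could run that command over every list instead of one list at a time. This is a sketch with a few substitutions. `paste -s -d ,` replaces the sed one-liner, and both join the lines of a file with commas. The `--interface` option is left out. `LIST_DIR`, `join_list`, `fetch_list`, and `fetch_all_lists` are hypothetical names.

```shell
# Directory holding the x00, x01, ... lists produced by split (path from the command above).
LIST_DIR=${LIST_DIR:-/cms_zfs/work_orders/modis/PDS/2019}
BASE_URL=https://oceandata.sci.gsfc.nasa.gov/cgi/getfile

# Join the lines of a list file with commas ("a\nb\n" -> "a,b").
join_list() {
    paste -s -d , "$1"
}

# The braces are quoted, so the shell leaves them alone and curl's own URL
# globbing expands {a,b,...} into one request per name.
fetch_list() {
    curl --retry 5 --retry-delay 2 --remote-name-all \
         "$BASE_URL/{$(join_list "$1")}"
}

# Fetch every split list in turn, reporting (but not stopping on) failures.
fetch_all_lists() {
    for list in "$LIST_DIR"/x[0-9]*; do
        [ -f "$list" ] || continue
        fetch_list "$list" || printf 'curl failed for %s\n' "$list" >&2
    done
}
```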

As I mentioned, the downloads were extremely fast. They must have throttled the stream at some point, as I have a 10GB backbone to you.

Cheers, stay very safe.
We need you :grin: :eek: :grin:
Brock
