Welcome to the Earthdata Forum! Here, the scientific user community and subject matter experts from NASA Distributed Active Archive Centers (DAACs), and other contributors, discuss research needs, data, and data applications.
by jvaldezch » Wed Sep 09, 2020 11:31 am America/New_York
I was wondering whether it is possible to run processes in parallel. I'm currently processing L2 products to obtain AFAI, but when running scripts such as modis_GEO.py and modis_L1B.py I noticed that both use the current working directory and create two files with non-unique names, "ShmMem" and "GetAttr.temp". This causes other processes running in the same directory to crash.
I guess I would take the easy way out. Make a directory for each parallel process and move a set of files to each directory. Process each directory with one process. When done, move all the files back to the original directory. Moving a file just changes a pointer to the actual data in the filesystem, so it is fast regardless of the size of the file, assuming you are moving the file within one physical storage device (more precisely, within the same filesystem).
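The directory-per-process approach could be sketched in Python roughly as follows. This is only a minimal illustration, not part of the SeaDAS/OCSSW scripts: `process_group`, `run_parallel`, and `build_command` are hypothetical names, and the command to run for each file is supplied by the caller, so no particular script arguments are assumed. The scratch file names are the two mentioned in the question.

```python
import shutil
import subprocess
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Fixed-name temp files reported in this thread; they are discarded, not moved back.
SCRATCH_NAMES = ("ShmMem", "GetAttr.temp")


def process_group(files, build_command, data_dir):
    """Process a group of files serially inside one private directory."""
    # Create the directory inside data_dir so that moves stay on one
    # filesystem and remain cheap renames.
    workdir = Path(tempfile.mkdtemp(prefix="worker_", dir=data_dir))
    try:
        for src in files:
            local = workdir / src.name
            shutil.move(str(src), str(local))
            # build_command is a hypothetical caller-supplied function that
            # returns the argument list to run for this file.
            # cwd=workdir keeps fixed-name temp files away from other workers.
            subprocess.run(build_command(local), cwd=workdir, check=True)
    finally:
        # Move inputs and outputs back, even if a command failed.
        for item in workdir.iterdir():
            if item.name in SCRATCH_NAMES:
                item.unlink()
            else:
                shutil.move(str(item), str(data_dir / item.name))
        workdir.rmdir()


def run_parallel(files, build_command, data_dir, workers=2):
    """Split files into one group per worker and process the groups in parallel."""
    data_dir = Path(data_dir)
    files = [Path(f) for f in files]
    groups = [g for g in (files[i::workers] for i in range(workers)) if g]
    # Threads are enough here: the real work happens in the child processes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(process_group, g, build_command, data_dir)
                   for g in groups]
        for future in futures:
            future.result()  # re-raise any failure from a worker
```

A caller would pass something like `lambda f: ["some_script.py", f.name]` as `build_command`, where the script name and its arguments are placeholders to be replaced with the real command line.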
We might revisit those scripts and make the temp filenames unique to each process, but don't hold your breath.
by gnwiii » Wed Sep 09, 2020 3:03 pm America/New_York
The level-2 processing is the step that can benefit the most from parallel processing. The GEO and L1B processing are dominated by I/O. In the past (on a system "borrowed" from numerical modellers that had one data disk but 24 cores) it made sense to run the lower-level processing steps serially and use GNU parallel for the level-2 processing. There were diminishing returns and heat problems when using all 24 cores, so in the end l2gen was given fewer cores.
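The same staging can be sketched in Python instead of GNU parallel. This is a rough illustration under stated assumptions: `run_staged`, `capped_workers`, and the command-building functions are hypothetical, the actual command lines are supplied by the caller, and the number of cores to leave idle is a guess to be tuned per machine.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def capped_workers(reserve=2):
    """Leave some cores idle rather than using every one (reserve is a guess)."""
    return max(1, (os.cpu_count() or 1) - reserve)


def run_staged(inputs, serial_commands, parallel_command, max_workers=None):
    """Run I/O-bound steps one file at a time, then the level-2 step in parallel.

    serial_commands is a list of hypothetical functions, each returning the
    argument list for one lower-level step; parallel_command does the same
    for the level-2 step.
    """
    # Stage 1: I/O-dominated steps, strictly serial so a single data disk
    # is not hit by competing readers and writers.
    for item in inputs:
        for build in serial_commands:
            subprocess.run(build(item), check=True)

    # Stage 2: CPU-bound step, limited to fewer workers than cores.
    workers = max_workers or capped_workers()

    def run_one(item):
        return subprocess.run(parallel_command(item), check=True).returncode

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_one, inputs))
```

Timing a few runs with different `max_workers` values is the simplest way to find where the returns start to diminish on a given system.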