l2gen very slow when using output from l1bextract_safe_nc

by **andrew.meredith** » Thu Feb 15, 2024 2:19 pm America/New_York

l2gen takes significantly longer to process an OLCI granule subset created with l1bextract_safe_nc than to process the full granule (l2gen version msl12 9.6.0-V2023.3 (Oct 4 2023 22:27:24)).

I tested on both RHEL8.9 and Ubuntu 22.04 with similar results.

The following example demonstrates the problem. The OLCI L1B granule is first subset and the subsetted L1B file is used as input to l2gen:

l1bextract_safe_nc -v --north=30.6654098 --south=24.1619781 --east=-79.8336442 --west=-84.6220924 S3A_OL_1_EFR____20240213T152551_20240213T152851_20240213T171740_0180_109_068_2520_MAR_O_NR_002.SEN3 S3A_OL_1_EFR____20240213T152551_20240213T152851_20240213T171740_0180_109_068_2520_MAR_O_NR_002.SEN3.FL3

time l2gen ifile=S3A_OL_1_EFR____20240213T152551_20240213T152851_20240213T171740_0180_109_068_2520_MAR_O_NR_002.SEN3.FL3/xfdumanifest.xml ofile=S3A_OL_1_EFR____20240213T152251.FL3.L2

Processing Rate = 0.413858 scans/sec

real 106m48.199s
user 94m18.375s
sys 12m9.557s

Processing the full granule as follows took only ~20 mins, versus ~106 mins for the subset granule:
time l2gen ifile=/scratch/SAPS/test/speed/S3A_OL_1_EFR____20240213T152551_20240213T152851_20240213T171740_0180_109_068_2520_MAR_O_NR_002.SEN3/xfdumanifest.xml ofile=S3A_OL_1_EFR____20240213T152251.L2

Processing Rate = 3.375413 scans/sec

real 20m12.506s
user 20m3.674s
sys 0m2.717s

An earlier version of l2gen (msl12 9.6.0-T2022.20 (Jul 28 2022 18:14:29)) behaved as expected: the subset input took ~4 mins to process and the full granule took ~17 mins.

Regards,
Andrew
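
For reference, the timings reported above can be compared with a few lines of Python. This is only a sketch of the arithmetic: the numbers are copied from the two runs in this post, and `parse_time` is a hypothetical helper for the bash `time` format. The large jump in sys time may point to I/O overhead rather than computation, but that reading is an inference, not something the logs state.

```python
import re

def parse_time(text):
    """Convert a bash `time` value such as '106m48.199s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", text)
    if match is None:
        raise ValueError(f"unrecognized time value: {text!r}")
    return int(match.group(1)) * 60 + float(match.group(2))

# Values copied from the two l2gen runs reported in this post.
subset = {"rate": 0.413858, "real": "106m48.199s", "sys": "12m9.557s"}
full = {"rate": 3.375413, "real": "20m12.506s", "sys": "0m2.717s"}

# How much longer the subset run took in wall-clock time.
wall_ratio = parse_time(subset["real"]) / parse_time(full["real"])
# How much slower each scan line was processed in the subset run.
rate_ratio = full["rate"] / subset["rate"]
# How much more kernel (sys) time the subset run consumed.
sys_ratio = parse_time(subset["sys"]) / parse_time(full["sys"])

print(f"wall-clock slowdown: {wall_ratio:.1f}x")
print(f"per-scan slowdown:   {rate_ratio:.1f}x")
print(f"sys-time increase:   {sys_ratio:.0f}x")
```

The per-scan ratio is larger than the wall-clock ratio because the subset contains fewer scan lines than the full granule.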

by **andrew.meredith** » Tue Feb 20, 2024 4:54 pm America/New_York

The problem seems to be related to different values being assigned to ChunkSizes in the netCDF files that l1bextract_safe_nc writes. Using nccopy to recreate the radiance files with updated ChunkSizes gave much better l2gen performance.

I see there's a "TODO: get and set chunksizes" comment in the netcdf_utils.py script. Any chance that's been addressed?

Thanks
Andrew
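
One way to check the ChunkSizes difference described above is to print the chunk layout of each variable in the original and extracted files and compare them. The sketch below assumes the netCDF4 Python package is installed; `describe_chunking` and `report_chunking` are hypothetical helper names, not part of OCSSW.

```python
def describe_chunking(name, shape, chunking):
    """Return a one-line summary of a variable's chunk layout."""
    # Variable.chunking() yields "contiguous" for unchunked variables and
    # None for files that are not in a netCDF-4 format.
    if chunking is None or chunking == "contiguous":
        return f"{name}: shape={tuple(shape)} contiguous"
    return f"{name}: shape={tuple(shape)} chunks={tuple(chunking)}"

def report_chunking(path):
    """Print the chunk layout of every root-group variable in `path`."""
    import netCDF4  # third-party; imported lazily so the helper above needs nothing

    with netCDF4.Dataset(path) as ds:
        for name, var in ds.variables.items():
            print(describe_chunking(name, var.shape, var.chunking()))
```

Running `report_chunking` on the same radiance file from the full granule directory and from the extracted directory should show whether the chunk shapes differ. The file paths to pass in depend on the local layout and are not given here.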

by **OB.DAAC - amscott** » Fri Feb 23, 2024 1:37 am America/New_York

A solution isn't ready yet. This is still being investigated.

by **andrew.meredith** » Fri Feb 23, 2024 11:04 am America/New_York

Thanks for the update.

I made the following change to the nccopy_var function in netcdf_utils.py so that chunking is set when the output variable is created. This fixed the problem for me:

# create variable with same name, dimnames, storage format
zlib = srcvar.filters().get('zlib', False)
shuffle = srcvar.filters().get('shuffle', False)
complevel = srcvar.filters().get('complevel', 0)
chunking = srcvar.chunking()

for idx, dimname in enumerate(srcvar.dimensions):
    if indices and dimname in indices:
        if chunking[idx] > len(indices[dimname]):
            chunking[idx] = len(indices[dimname])

dstvar = dstgrp.createVariable(srcvar.name,
                               srcvar.dtype,
                               srcvar.dimensions,
                               zlib=zlib,
                               shuffle=shuffle,
                               chunksizes=chunking,
                               complevel=complevel)

Andrew
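
The chunk-clamping step in the change above can also be written as a small standalone function, which makes it easier to test and to guard against variables that have no chunk layout. This is a sketch only: `clamp_chunksizes` is a hypothetical name, and the handling of "contiguous" and None is an assumption based on what netCDF4-python's `Variable.chunking()` can return, not something taken from netcdf_utils.py.

```python
def clamp_chunksizes(chunking, dimensions, indices):
    """Return chunk sizes no larger than the subset length of each dimension.

    chunking   -- result of srcvar.chunking(): list of ints, "contiguous", or None
    dimensions -- tuple of dimension names, as in srcvar.dimensions
    indices    -- dict mapping dimension name to the selected indices, or None
    Returns a new list, or None when there is no chunk layout to copy.
    """
    if chunking is None or chunking == "contiguous":
        # Nothing to clamp; createVariable accepts chunksizes=None and then
        # falls back to the library's default layout.
        return None
    clamped = list(chunking)  # copy so the caller's list is not modified
    for idx, dimname in enumerate(dimensions):
        if indices and dimname in indices:
            # A chunk must not exceed the subset dimension length, and must
            # stay at least 1 even if the selection is empty.
            subset_len = max(1, len(indices[dimname]))
            clamped[idx] = min(clamped[idx], subset_len)
    return clamped
```

The result would be passed as `chunksizes=clamp_chunksizes(srcvar.chunking(), srcvar.dimensions, indices)` in the `createVariable` call.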
