l2gen very slow when using output from l1bextract_safe_nc
- Posts: 38
- Joined: Tue Dec 31, 2013 10:29 am America/New_York
l2gen very slow when using output from l1bextract_safe_nc
l2gen takes significantly longer to process an OLCI granule subset produced by l1bextract_safe_nc than it does to process the full granule (l2gen version msl12 9.6.0-V2023.3 (Oct 4 2023 22:27:24)).
I tested on both RHEL8.9 and Ubuntu 22.04 with similar results.
The following example demonstrates the problem. The OLCI L1B granule is first subset and the subsetted L1B file is used as input to l2gen:
l1bextract_safe_nc -v --north=30.6654098 --south=24.1619781 --east=-79.8336442 --west=-84.6220924 S3A_OL_1_EFR____20240213T152551_20240213T152851_20240213T171740_0180_109_068_2520_MAR_O_NR_002.SEN3 S3A_OL_1_EFR____20240213T152551_20240213T152851_20240213T171740_0180_109_068_2520_MAR_O_NR_002.SEN3.FL3
time l2gen ifile=S3A_OL_1_EFR____20240213T152551_20240213T152851_20240213T171740_0180_109_068_2520_MAR_O_NR_002.SEN3.FL3/xfdumanifest.xml ofile=S3A_OL_1_EFR____20240213T152251.FL3.L2
Processing Rate = 0.413858 scans/sec
real 106m48.199s
user 94m18.375s
sys 12m9.557s
Processing the full granule as follows only took ~20 mins versus ~106 mins when using the subset granule:
time l2gen ifile=/scratch/SAPS/test/speed/S3A_OL_1_EFR____20240213T152551_20240213T152851_20240213T171740_0180_109_068_2520_MAR_O_NR_002.SEN3/xfdumanifest.xml ofile=S3A_OL_1_EFR____20240213T152251.L2
Processing Rate = 3.375413 scans/sec
real 20m12.506s
user 20m3.674s
sys 0m2.717s
An earlier version of l2gen (msl12 9.6.0-T2022.20 (Jul 28 2022 18:14:29)) produced the expected results: the subset input took ~4 mins to process and the full granule took ~17 mins.
Regards,
Andrew
- Posts: 38
- Joined: Tue Dec 31, 2013 10:29 am America/New_York
Re: l2gen very slow when using output from l1bextract_safe_nc
The problem seems to be related to different values being assigned to ChunkSizes in the l1bextract_safe_nc output netCDF files. Using nccopy to recreate the radiance files with updated ChunkSizes gave much better l2gen performance.
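For example, something along these lines rewrites one of the extracted radiance files with explicit chunk sizes (the dimension names and 256x256 chunking here are just placeholders; nccopy's -c option takes a dim/len list, and the exact values will depend on the file):
nccopy -c rows/256,columns/256 Oa01_radiance.nc Oa01_radiance.rechunked.nc
The rechunked copy then replaces the original radiance file inside the .SEN3.FL3 directory before running l2gen.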
I see there's a "TODO: get and set chunksizes" comment in the netcdf_utils.py script. Any chance that's been addressed?
Thanks
Andrew
- Posts: 396
- Joined: Mon Jun 22, 2020 5:24 pm America/New_York
- Has thanked: 8 times
- Been thanked: 8 times
Re: l2gen very slow when using output from l1bextract_safe_nc
A solution isn't ready yet. This is still being investigated.
- Posts: 38
- Joined: Tue Dec 31, 2013 10:29 am America/New_York
Re: l2gen very slow when using output from l1bextract_safe_nc
Thanks for the update.
I implemented the following change to the nccopy_var function in netcdf_utils.py to set the chunking when creating the output variable, which fixed the problem for me:
# create variable with same name, dimnames, storage format
zlib = srcvar.filters().get('zlib', False)
shuffle = srcvar.filters().get('shuffle', False)
complevel = srcvar.filters().get('complevel', 0)

# carry over the source chunking, clamping each chunk dimension to the
# size of the extracted region so chunks never exceed the subset extent
chunking = srcvar.chunking()
if chunking == 'contiguous':
    chunking = None  # contiguous source variable; let the library decide
else:
    for idx, dimname in enumerate(srcvar.dimensions):
        if indices and dimname in indices:
            if chunking[idx] > len(indices[dimname]):
                chunking[idx] = len(indices[dimname])

dstvar = dstgrp.createVariable(srcvar.name,
                               srcvar.dtype,
                               srcvar.dimensions,
                               zlib=zlib,
                               shuffle=shuffle,
                               chunksizes=chunking,
                               complevel=complevel)
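To check the effect, the chunk layout written by the modified script can be inspected with ncdump's -s option, which prints the hidden _ChunkSizes attribute for each variable, e.g. for one of the radiance files in the extracted directory:
ncdump -hs S3A_OL_1_EFR____20240213T152551_20240213T152851_20240213T171740_0180_109_068_2520_MAR_O_NR_002.SEN3.FL3/Oa01_radiance.nc | grep _ChunkSizes
With the change above, the reported chunk sizes should follow the source file's chunking (clamped to the extracted region) rather than the library defaults.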
Andrew