Hi, I've requested spatial subsetting of GEDI L2A data using the Harmony API. I am using a single rectangular area of 25 x 13 km, and the request generates 44 H5 files. This process can take anywhere between 10 minutes and 1 hour. I don't think I'm waiting in a queue, because the progress bar starts immediately after I submit the request. Why is this so slow? Is the subsetter routine open source? One possibility would be to run it on my own machine in AWS so as to have more control over the computational resources.
Here's the main part of my code:
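from harmony import Collection, Request

# concept_id, my_site, temporal_range and harmony_client are defined earlier in the script
request = Request(
    collection=Collection(id=concept_id),
    shape=my_site,
    temporal=temporal_range,
)
task = harmony_client.submit(request)  # submit() returns the job ID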
Harmony API prohibitively slow for GEDI L2A data
Re: Harmony API prohibitively slow for GEDI L2A data
This may be much faster now, following several recent and significant Harmony updates. I suspect it is now closer to the 10 min figure and only rarely held up by a Harmony backlog. A backlog cannot be ruled out, but it should be much less common now.
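If you want to check whether a given request sat in a backlog, one option is to poll the job and record when progress first moves. The following is only a minimal sketch: `time_job` is a hypothetical helper, and it assumes that `client` is a harmony-py `Client` whose `status(job_id)` returns a dict with `status` and `progress` keys, and that the terminal state names listed in the code match what Harmony reports.

```python
import time

# Assumed terminal job states; adjust if your Harmony version reports others.
TERMINAL_STATES = ("successful", "failed", "canceled", "complete_with_errors")


def time_job(client, job_id, poll_seconds=5.0, clock=time.monotonic, sleep=time.sleep):
    """Poll a job until it finishes.

    Returns (seconds until progress first moved, total seconds). The first
    value is None if the job ended without ever reporting progress.
    """
    start = clock()
    first_progress = None
    while True:
        info = client.status(job_id)  # assumed to return a dict
        now = clock()
        if first_progress is None and info.get("progress", 0) > 0:
            first_progress = now - start
        if info.get("status") in TERMINAL_STATES:
            return first_progress, now - start
        sleep(poll_seconds)
```

With the code from the original post this would be called as `time_job(harmony_client, task)`. A long gap before progress first moves points to queueing, while a short gap followed by a long total points to the processing itself.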
At 10 min (~15 sec per file), I’m not sure I would characterize it as too slow; that is on par with our previous on-premises deployment. At 1 hour (>1m20s per file), it is a bit more concerning.
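The per-file figures come from dividing the total run time by the 44 files in the request. A quick check of that arithmetic, with `seconds_per_file` as an illustrative helper name:

```python
def seconds_per_file(total_minutes, n_files):
    """Average wall-clock seconds spent per output file."""
    return total_minutes * 60 / n_files


N_FILES = 44  # number of H5 files in the request

fast = seconds_per_file(10, N_FILES)
slow = seconds_per_file(60, N_FILES)
print(f"{fast:.1f} s per file at 10 min")
print(f"{slow:.1f} s per file at 1 hour")
```

This is an average only; it assumes the files are processed one after another and take similar time, which may not hold if Harmony processes several granules in parallel.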
The subsetter source is here: https://github.com/nasa/harmony-trajectory-subsetter. One thing to note is that the entire file has to be downloaded from the archive to a working machine before it can be subset. Because your region is a small part of each file, a significant share of the time (perhaps ~20%) is likely spent on that whole-file download. We have not seen specific requirements to optimize this for the cloud, but it is something we have considered.
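To put the download share in perspective, here is a rough estimate of how much a request could speed up if the whole-file download were eliminated entirely. It takes the ~20% figure at face value, which is a guess rather than a measurement, and `best_case_minutes` is just an illustrative helper.

```python
def best_case_minutes(total_minutes, download_fraction):
    """Run time left if the whole-file download cost nothing at all."""
    return total_minutes * (1.0 - download_fraction)


DOWNLOAD_FRACTION = 0.20  # the rough "~20%" guess, not a measured value

for total in (10, 60):
    remaining = best_case_minutes(total, DOWNLOAD_FRACTION)
    print(f"{total} min request -> about {remaining:.0f} min without the download")

# Upper bound on the speed-up from removing the download entirely.
max_speedup = 1.0 / (1.0 - DOWNLOAD_FRACTION)
print(f"maximum speed-up: {max_speedup:.2f}x")
```

Under that assumption the ceiling is about 1.25x, so avoiding the download alone would not turn a 1 hour request into a 10 minute one.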