I have a workflow that uses HLS (HLSL30.v2.0, HLSS30.v2.0). I access HLS through 'https://cmr.earthdata.nasa.gov/stac/LPCLOUD' using pystac_client and stackstac.
It works, but I have noticed it is slow to process these STACs/COGs into memory (e.g., with compute()) when they need to be accessed locally, such as for plotting. Accessing a temporally long stack can take from 3 minutes (for 1 year) to 30 minutes (for 10 years).
By comparison, I can use the same workflow with S2 L2A from 'https://earth-search.aws.element84.com/v0' and retrieve the same outputs an order of magnitude faster (30 seconds for 1 year to 2 minutes for the full temporal depth).
This difference in access time holds true even for a basic workflow (e.g., https://stackstac.readthedocs.io/en/v0.2.0/basic.html).
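For anyone who wants to reproduce the comparison, here is a minimal sketch of a timing harness using only the standard library. The names `time_call` and `compare_sources` are hypothetical helpers, not part of stackstac or pystac_client, and the loaders passed in are assumed to be zero-argument callables that trigger the actual read (for example, a lambda wrapping `compute()` on a lazy stack).

```python
import time
from typing import Any, Callable, Dict, Tuple


def time_call(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Run func(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


def compare_sources(loaders: Dict[str, Callable[[], Any]]) -> Dict[str, float]:
    """Time each zero-argument loader once and return {name: elapsed_seconds}."""
    return {name: time_call(loader)[1] for name, loader in loaders.items()}


# Hypothetical usage, assuming hls_stack and s2_stack are lazy arrays
# built elsewhere (e.g., with stackstac):
# timings = compare_sources({
#     "HLS": lambda: hls_stack.compute(),
#     "S2 L2A": lambda: s2_stack.compute(),
# })
```

A single run per source is noisy, so repeating the measurement a few times and at different times of day would give a fairer picture.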
Are there faster ways to access HLS that I should make use of?
Would you be able to share your script? If you have a GitHub repository for your work, please post the link. Otherwise, you can attach the script or notebook here in the forum, or you can send it to LP DAAC User Services (LPDAAC@usgs.gov).
I cannot share the full notebook for my workflow, but I have attached a simple example based on the basic stackstac tutorial that shows the difference in speed between accessing S2 L2A through AWS E84 and HLS S30 from LPCLOUD.
I am not sure how to attach the notebook here on the forum since the file type is not accepted, so I have emailed it to LP DAAC User Services.
That being said, I have seen a moderate speedup in accessing HLS over the last week or two, so I am not sure whether any changes have been made. It is still slower than S2 L2A, but not by as much as before.
I was able to run the notebook you shared. As you mentioned, it works but it’s relatively slow. Unfortunately, we don’t know why there’s such a difference in speed, but we will look into it and see if there is something we can do on our side to improve the performance. In the meantime, there are alternatives to using stackstac. I’ve pulled together an example that uses Dask here: https://github.com/nasa/HLS-Data-Resources/blob/main/python/how-tos/Data_Access__Create_HLS_Timeseries_Dask.ipynb.
The example assumes that you have already obtained a list of HLS URLs and have assigned it to a variable (as a list). I know it doesn’t have the smooth coupling between STAC search results and commands to read in the data, but it is a working alternative.
Thanks for the response and notebook! We access thousands of HLS images at a time, so I am not sure about getting a list like that, but maybe there are other ways we can leverage dask on our end.
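One possible way to build such a URL list at scale is to pull the asset hrefs straight out of the STAC search results. The sketch below is an illustration under stated assumptions, not the method used in the linked notebook: `collect_asset_urls` is a hypothetical helper, the items are assumed to be plain STAC Item dictionaries (for instance from calling `to_dict()` on each pystac item), and the asset keys `B04` and `Fmask` as well as the example hrefs are placeholders.

```python
from typing import Dict, Iterable, List, Sequence


def collect_asset_urls(items: Iterable[dict], bands: Sequence[str]) -> Dict[str, List[str]]:
    """Group asset hrefs by band name across many STAC item dicts.

    Items that lack a requested band are skipped for that band.
    """
    urls: Dict[str, List[str]] = {band: [] for band in bands}
    for item in items:
        assets = item.get("assets", {})
        for band in bands:
            asset = assets.get(band)
            if asset and "href" in asset:
                urls[band].append(asset["href"])
    return urls


# Hypothetical items shaped like STAC Item JSON; the hrefs are placeholders.
example_items = [
    {
        "id": "scene-1",
        "assets": {
            "B04": {"href": "https://example.com/scene-1.B04.tif"},
            "Fmask": {"href": "https://example.com/scene-1.Fmask.tif"},
        },
    },
    {
        "id": "scene-2",
        "assets": {"B04": {"href": "https://example.com/scene-2.B04.tif"}},
    },
]

urls_by_band = collect_asset_urls(example_items, ["B04", "Fmask"])
```

Because the helper accepts any iterable, it can consume search results lazily, so thousands of items do not need to be held in memory before the per-band lists are built.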
Looking forward to hearing if you find anything on your end regarding the speed difference too.