HLSL30 - Cannot load certain tif into memory

by mitchbon » Fri Mar 10, 2023 5:05 pm America/New_York

Hello,

I have a set of code that builds an HLS data cube over a given location and time-interval, then runs further processing. As part of that, this data cube needs to be loaded into memory with persist().

The code works well in most scenarios. However, I have found that one particular tif is unable to be accessed, triggering an error and preventing the HLS data cube from being loaded into memory.

The relevant portions of the error:

CPLE_OpenFailedError: '/vsicurl/https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T10UDU.2014082T190147.v2.0/HLS.L30.T10UDU.2014082T190147.v2.0.B06.tif' does not exist in the file system, and is not recognized as a supported dataset name.

RuntimeError: Error opening 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T10UDU.2014082T190147.v2.0/HLS.L30.T10UDU.2014082T190147.v2.0.B06.tif': RasterioIOError("'/vsicurl/https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T10UDU.2014082T190147.v2.0/HLS.L30.T10UDU.2014082T190147.v2.0.B06.tif' does not exist in the file system, and is not recognized as a supported dataset name.")

Note that it is always the same tif link that is provided by the error, and retrying the function triggers the same error.

Any help would be appreciated!
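
Since the error always names the same file, it can help to pin down exactly which granule and band that is before reporting or re-searching it. The following is a minimal sketch, assuming the HLS v2.0 file naming seen in the URL above (HLS.<product>.<tile>.<YYYYDDD>T<HHMMSS>.v2.0.<band>.tif). The helper name parse_hls_asset is hypothetical and not part of any library.

```python
from datetime import datetime


def parse_hls_asset(url):
    """Split an HLS v2.0 asset URL into product, tile, acquisition time and band."""
    name = url.rsplit("/", 1)[-1]
    # Assumed layout: HLS.<product>.<tile>.<YYYYDDD>T<HHMMSS>.v2.0.<band>.tif
    parts = name.split(".")
    if len(parts) != 8 or parts[0] != "HLS":
        raise ValueError(f"Unexpected HLS file name: {name}")
    _, product, tile, stamp, _major, _minor, band, _ext = parts
    return {
        "product": product,
        "tile": tile,
        # %j is the day of year, so 2014082 is the 82nd day of 2014
        "acquired": datetime.strptime(stamp, "%Y%jT%H%M%S"),
        "band": band,
    }


info = parse_hls_asset(
    "https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/"
    "HLS.L30.T10UDU.2014082T190147.v2.0/HLS.L30.T10UDU.2014082T190147.v2.0.B06.tif"
)
print(info["tile"], info["band"], info["acquired"].date())
```

Because only the last path component is used, this also works on the '/vsicurl/https://...' form of the path that appears in the GDAL error message.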

Re: HLSL30 - Cannot load certain tif into memory

by LP DAAC - dgolon » Mon Mar 13, 2023 11:44 am America/New_York

Hi @mitchbon, I've passed your question along to our developers. We will follow up on this post once we have an answer or if we need additional information from you. Thanks!
Subscribe to the LP DAAC listserv by sending a blank email to lpdaac-join@lists.nasa.gov.

Sign up for the Landsat listserv to receive the most up to date information about Landsat data: https://public.govdelivery.com/accounts/USDOIGS/subscriber/new#tab1.

Re: HLSL30 - Cannot load certain tif into memory

by mitchbon » Mon Mar 13, 2023 12:27 pm America/New_York

I ran the same code again today and it was able to successfully find and load that problematic tif into memory. I guess that means this is some sort of server issue?

Re: HLSL30 - Cannot load certain tif into memory

by LP DAAC - jwilson » Wed Mar 15, 2023 12:40 pm America/New_York

@mitchbon

If you are unable to load EarthData assets from https URLs via vsicurl and rasterio in Python, there are 3 common solutions:

1. Ensure you have a properly configured .netrc file. Instructions can be found here: https://github.com/nasa/LPDAAC-Data-Resources/blob/main/notebooks/Earthdata_Authentication__Create_netrc_file.ipynb

2. Ensure that you have set the necessary GDAL configuration options to access data using vsicurl. The code below can be used to set them:
import os
from osgeo import gdal
# Expand '~' in Python; the path is passed through without shell expansion
cookie_file = os.path.expanduser('~/cookies.txt')
gdal.SetConfigOption('GDAL_HTTP_COOKIEFILE', cookie_file)
gdal.SetConfigOption('GDAL_HTTP_COOKIEJAR', cookie_file)
gdal.SetConfigOption('GDAL_DISABLE_READDIR_ON_OPEN', 'EMPTY_DIR')
gdal.SetConfigOption('CPL_VSIL_CURL_ALLOWED_EXTENSIONS', 'TIF')

3. Sometimes cached information can cause an issue. To resolve this, try restarting your Python kernel.
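
As a quick way to verify the first point, the standard-library netrc module can confirm that the file parses and contains credentials. This is only a sketch: the host name urs.earthdata.nasa.gov (Earthdata Login) is an assumption not stated in this thread, and has_earthdata_login is a hypothetical helper name.

```python
import netrc
import os

EARTHDATA_HOST = "urs.earthdata.nasa.gov"  # assumed Earthdata Login host


def has_earthdata_login(netrc_path=None, host=EARTHDATA_HOST):
    """Return True if the netrc file holds a login and password for host."""
    path = netrc_path or os.path.join(os.path.expanduser("~"), ".netrc")
    try:
        entry = netrc.netrc(path).authenticators(host)
    except (OSError, netrc.NetrcParseError):
        # Missing, unreadable, or malformed file
        return False
    if entry is None:
        return False
    login, _account, password = entry
    return bool(login) and bool(password)


print("Earthdata entry found:", has_earthdata_login())
```

A True result only shows that an entry exists; it does not prove that the stored username and password are still valid.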

Re: HLSL30 - Cannot load certain tif into memory

by cbourque17 » Mon Jan 15, 2024 11:08 am America/New_York

Hi, I have the same error, even when following @mitchbon's instructions.
I have GDAL set to the recommended configuration, a .netrc file in my home directory, and have tried restarting the kernel multiple times... Someone else in my lab also tried running the same code and got the same error when it's time to load the tif into memory. Thoughts?

Re: HLSL30 - Cannot load certain tif into memory

by victorohden » Wed Feb 07, 2024 3:53 pm America/New_York

Hi
Any update?
I also got that error. It happens when I try to open the tif with rioxarray.

code:
import rioxarray as rxr
chunk_size = dict(band=1, x=512, y=512)
rxr.open_rasterio('https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T14QMG.2021306T170002.v2.0/HLS.L30.T14QMG.2021306T170002.v2.0.B05.tif', chunks=chunk_size, masked=True).squeeze('band', drop=True)

Error:
RasterioIOError: '/vsicurl/https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T14QMG.2021306T170002.v2.0/HLS.L30.T14QMG.2021306T170002.v2.0.B05.tif' not recognized as a supported file format.

This seems to be an authentication problem, because I am able to open the tiff file when I download it manually. I have the .netrc file (created following https://github.com/nasa/LPDAAC-Data-Resources/blob/main/python/how-tos/Earthdata_Authentication__Create_netrc_file.ipynb).

Not sure what else I can do.
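
One way to test the authentication theory is to look at the first bytes the server actually returns for the URL. If they are an HTML page (for example a login redirect) rather than a TIFF header, a "not recognized as a supported file format" error would be the expected result. That this is the cause here is an assumption, not something confirmed in this thread. The sketch below only classifies bytes you have already fetched with a client of your choice; classify_payload is a hypothetical helper name.

```python
# Classic TIFF and BigTIFF signatures, little-endian and big-endian
TIFF_MAGIC = (b"II*\x00", b"MM\x00*", b"II+\x00", b"MM\x00+")


def classify_payload(head):
    """Guess whether the first bytes of a response are a TIFF, HTML, or neither."""
    if head[:4] in TIFF_MAGIC:
        return "tiff"
    text = head.lstrip().lower()
    if text.startswith((b"<!doctype html", b"<html")):
        return "html"
    return "unknown"


print(classify_payload(b"II*\x00\x08\x00\x00\x00"))
print(classify_payload(b"<!DOCTYPE html><html><body>Login</body></html>"))
```

An "html" result would point toward the .netrc or cookie settings, while "tiff" would suggest looking at the network or caching side instead.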

Re: HLSL30 - Cannot load certain tif into memory

by LP DAAC - dgolon » Wed Feb 07, 2024 4:37 pm America/New_York

Hi @victorohden We posted this response on another thread related to this issue but I'll add it here. Please let me know if this does not resolve your issue:

We've updated the HLS_Tutorial.ipynb (https://github.com/nasa/HLS-Data-Resources/blob/main/python/tutorials/HLS_Tutorial.ipynb) to include a try loop around reading of the HLS files via https. There seems to be some sort of network issue that can cause that error, and it appears to be independent of the code being executed in the notebook. We would recommend adding a retry loop around reading of the files within your script. An example from the updated tutorial:
# Use vsicurl to load the data directly into memory (be patient, may take a few seconds)
chunk_size = dict(band=1, x=512, y=512) # Tiles have 1 band and are divided into 512x512 pixel chunks

# Sometimes a vsi curl error occurs so we need to retry if it does
max_retries = 10
for e in evi_band_links:
    print(e)
    # Try Loop
    for _i in range(max_retries):
        try:
            # Open and build datasets
            if e.rsplit('.', 2)[-2] == evi_bands[0]: # NIR index
                nir = rxr.open_rasterio(e, chunks=chunk_size, masked=True).squeeze('band', drop=True)
                nir.attrs['scale_factor'] = 0.0001 # hard coded the scale_factor attribute
            elif e.rsplit('.', 2)[-2] == evi_bands[1]: # red index
                red = rxr.open_rasterio(e, chunks=chunk_size, masked=True).squeeze('band', drop=True)
                red.attrs['scale_factor'] = 0.0001 # hard coded the scale_factor attribute
            elif e.rsplit('.', 2)[-2] == evi_bands[2]: # blue index
                blue = rxr.open_rasterio(e, chunks=chunk_size, masked=True).squeeze('band', drop=True)
                blue.attrs['scale_factor'] = 0.0001 # hard coded the scale_factor attribute
            break # Break out of the retry loop
        except Exception as ex:
            print(f"vsi curl error: {ex}. Retrying...")
    else:
        print(f"Failed to process {e} after {max_retries} retries. Please check to see you're authenticated with earthaccess.")
print("The COGs have been loaded into memory!")
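
The same retry idea can be pulled into a small reusable helper. This is a sketch rather than the tutorial's code: open_with_retry, opener, and base_delay are hypothetical names, and the exponential backoff between attempts is an addition that the tutorial loop does not have. Unlike the loop above, it raises once the retries are exhausted, so a failed read cannot pass silently.

```python
import time


def open_with_retry(opener, url, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call opener(url), retrying with exponential backoff on any exception."""
    last_error = None
    for attempt in range(max_retries):
        try:
            return opener(url)
        except Exception as ex:  # broad on purpose, like the tutorial's loop
            last_error = ex
            if attempt < max_retries - 1:
                # Wait 1x, 2x, 4x ... base_delay before the next attempt
                sleep(base_delay * 2 ** attempt)
    raise RuntimeError(
        f"Failed to open {url} after {max_retries} attempts"
    ) from last_error
```

With rioxarray, opener could be something like lambda u: rxr.open_rasterio(u, chunks=chunk_size, masked=True). Retrying helps only with transient failures, so an error that persists until the kernel is restarted will still surface here as a RuntimeError.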

Re: HLSL30 - Cannot load certain tif into memory

by victorohden » Wed Feb 07, 2024 5:13 pm America/New_York

Thank you for the quick feedback!
I am using the notebook that you mentioned. https://github.com/nasa/HLS-Data-Resources/blob/main/python/tutorials/HLS_Tutorial.ipynb.
I tried a bigger max_retries and got the same error.

Re: HLSL30 - Cannot load certain tif into memory

by mitchbon » Thu Feb 08, 2024 9:28 am America/New_York

Just chiming in here...

I have found that restarting my Python kernel has resolved this error, although I do not know if it works all the time.

From my testing, I think the issue may be related to how imagery is processed from STAC into memory. Depending on how you do any compositing, chunking, etc., your code may sometimes be looking for an image or band that has already been scrubbed from memory and cannot find it. This can get complicated, since you may need to delve into dask graphs (if you are using something like stackstac, which uses dask on the back end) to understand what data the process is trying to access (it's not always consistent). I still get these types of errors from time to time, but I have noticed that the frequency changes a lot depending on things like how you set up your chunks.

And just setting a high retry parameter has not worked for me. Generally it will fail forever until I restart the kernel.
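
Because lazy, dask-backed reads only fail once the graph is computed, one possible way to narrow things down is to probe every asset URL eagerly before building the data cube. The sketch below assumes a caller-supplied probe callable (hypothetical; for instance a function that opens the file and reads its metadata) and simply separates URLs that open from those that raise. It is a diagnostic aid and does not address the caching behaviour described above.

```python
def split_readable(urls, probe):
    """Probe each URL once, eagerly; return (readable, failed) lists."""
    readable, failed = [], []
    for url in urls:
        try:
            probe(url)
        except Exception as ex:
            # Keep the error text so the failing asset can be reported
            failed.append((url, repr(ex)))
        else:
            readable.append(url)
    return readable, failed
```

If the failed list always contains the same URL, the problem is more likely tied to that asset or to cached state than to how the chunks are set up.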

Re: HLSL30 - Cannot load certain tif into memory

by victorohden » Thu Feb 08, 2024 11:52 am America/New_York

Thanks all for the answers!
Now it is working here. I'm not sure what happened, but I restarted my machine and created a repo from scratch (again):
- Download the lpdaac_windows.yml file: https://github.com/nasa/LPDAAC-Data-Resources/blob/main/setup/lpdaac_windows.yml
- Create an env using: conda env create -f "path_to\lpdaac_windows.yml"

After that, it is all set and the stuff is running smoothly.

Cheers.
