HLSL30 - Cannot load certain tif into memory
Hello,
I have a set of code that builds an HLS data cube over a given location and time-interval, then runs further processing. As part of that, this data cube needs to be loaded into memory with persist().
The code works well in most scenarios. However, I have found that one particular tif cannot be accessed, which triggers an error and prevents the HLS data cube from being loaded into memory.
The relevant portions of the error:
CPLE_OpenFailedError: '/vsicurl/https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T10UDU.2014082T190147.v2.0/HLS.L30.T10UDU.2014082T190147.v2.0.B06.tif' does not exist in the file system, and is not recognized as a supported dataset name.
RuntimeError: Error opening 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T10UDU.2014082T190147.v2.0/HLS.L30.T10UDU.2014082T190147.v2.0.B06.tif': RasterioIOError("'/vsicurl/https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T10UDU.2014082T190147.v2.0/HLS.L30.T10UDU.2014082T190147.v2.0.B06.tif' does not exist in the file system, and is not recognized as a supported dataset name.")
Note that the error always points to the same tif link, and retrying the function triggers the same error.
Any help would be appreciated!
Just chiming in here...
I have found that restarting my Python kernel has resolved this error, although I do not know if it works all the time.
From my testing, I think the issue may be related to how imagery is processed from STAC into memory. Depending on how you do any compositing, chunking, etc., sometimes your code may be looking for an image or band that has already been scrubbed from memory and cannot find it. This can get complicated, since you may need to delve into dask graphs (if using something like stackstac, which uses dask on the back end) to understand what data the process is trying to access (it's not always consistent). I still get these types of errors from time to time, but have noticed the frequency changes a lot depending on things like how you set up your chunks.
And just setting a high retry parameter has not worked for me. Generally it will fail forever until I restart the kernel.
- User Services
- Posts: 288
- Joined: Mon Sep 30, 2019 10:00 am America/New_York
- Has thanked: 16 times
- Been thanked: 2 times
Re: HLSL30 - Cannot load certain tif into memory
Hi @mitchbon I've passed your question along to our developers. We will reach back out on this post once we have an answer or if we need additional information from you. Thanks!
Subscribe to the LP DAAC listserv by sending a blank email to lpdaac-join@lists.nasa.gov.
Sign up for the Landsat listserv to receive the most up to date information about Landsat data: https://public.govdelivery.com/accounts/USDOIGS/subscriber/new#tab1.
Re: HLSL30 - Cannot load certain tif into memory
I ran the same code again today and it was able to successfully find and load that problematic tif into memory. I guess that means this is some sort of server issue?
Last edited by mitchbon on Mon Mar 13, 2023 12:28 pm America/New_York, edited 1 time in total.
- User Services
- Posts: 268
- Joined: Mon Sep 30, 2019 12:39 pm America/New_York
- Has thanked: 9 times
Re: HLSL30 - Cannot load certain tif into memory
@mitchbon
If you are unable to load EarthData assets from https URLs via vsicurl and rasterio in Python, there are 3 common solutions:
1. Ensure you have a properly configured .netrc file. Instructions can be found here: https://github.com/nasa/LPDAAC-Data-Resources/blob/main/notebooks/Earthdata_Authentication__Create_netrc_file.ipynb
2. Ensure that you have set the necessary GDAL configuration options to access data using vsicurl. The code below can be used to set these:
from osgeo import gdal

gdal.SetConfigOption('GDAL_HTTP_COOKIEFILE', '~/cookies.txt')
gdal.SetConfigOption('GDAL_HTTP_COOKIEJAR', '~/cookies.txt')
gdal.SetConfigOption('GDAL_DISABLE_READDIR_ON_OPEN', 'EMPTY_DIR')
gdal.SetConfigOption('CPL_VSIL_CURL_ALLOWED_EXTENSIONS', 'TIF')
3. Sometimes cached information can cause an issue; to resolve this, try restarting your Python kernel.
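For users going through rasterio/rioxarray rather than the osgeo bindings, the same options can also be set as environment variables before any dataset is opened. This is a minimal sketch under that assumption; the option names match the GDAL calls above, but the cookie-file path is just an example:

```python
import os

# Same GDAL options as above, expressed as environment variables so that
# rasterio/rioxarray pick them up when GDAL initializes.
os.environ['GDAL_HTTP_COOKIEFILE'] = os.path.expanduser('~/cookies.txt')
os.environ['GDAL_HTTP_COOKIEJAR'] = os.path.expanduser('~/cookies.txt')
os.environ['GDAL_DISABLE_READDIR_ON_OPEN'] = 'EMPTY_DIR'
os.environ['CPL_VSIL_CURL_ALLOWED_EXTENSIONS'] = 'TIF'
```

Set these at the very top of the script or notebook, before the first open, since GDAL reads them when the first dataset is accessed.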
- Posts: 6
- Joined: Mon Jan 15, 2024 11:05 am America/New_York
Re: HLSL30 - Cannot load certain tif into memory
Hi, I have the same error, even when following @mitchbon's instructions.
I have GDAL set to that configuration, a .netrc file in my home directory, and have tried restarting the kernel multiple times... Someone else in my lab also tried running the same code and hit the same error when it's time to load the tif into memory. Thoughts?
- Posts: 3
- Joined: Wed Feb 07, 2024 2:06 pm America/New_York
Re: HLSL30 - Cannot load certain tif into memory
Hi
Any update?
I also got that error. This happens when I try to open the tif with rioxarray.
code:
chunk_size = dict(band=1, x=512, y=512)
rxr.open_rasterio('https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T14QMG.2021306T170002.v2.0/HLS.L30.T14QMG.2021306T170002.v2.0.B05.tif', chunks=chunk_size, masked=True).squeeze('band', drop=True)
Error:
RasterioIOError: '/vsicurl/https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T14QMG.2021306T170002.v2.0/HLS.L30.T14QMG.2021306T170002.v2.0.B05.tif' not recognized as a supported file format.
It seems like an authentication problem, because I am able to open the tif file when it is downloaded manually. I have the .netrc (from https://github.com/nasa/LPDAAC-Data-Resources/blob/main/python/how-tos/Earthdata_Authentication__Create_netrc_file.ipynb)
Not sure what else I can do.
- User Services
- Posts: 288
- Joined: Mon Sep 30, 2019 10:00 am America/New_York
- Has thanked: 16 times
- Been thanked: 2 times
Re: HLSL30 - Cannot load certain tif into memory
Hi @victorohden We posted this response on another thread related to this issue but I'll add it here. Please let me know if this does not resolve your issue:
We've updated the HLS_Tutorial.ipynb (https://github.com/nasa/HLS-Data-Resources/blob/main/python/tutorials/HLS_Tutorial.ipynb) to include a retry loop around the reading of the HLS files via https. There seems to be some sort of network issue that can cause this error, and it appears to be independent of the code being executed in the notebook. We would recommend adding a retry loop around the reading of the files within your script. An example from the updated tutorial:
# 1) Use vsicurl to load the data directly into memory (be patient, may take a few seconds)
chunk_size = dict(band=1, x=512, y=512)  # Tiles have 1 band and are divided into 512x512 pixel chunks

# 2) Sometimes a vsi curl error occurs so we need to retry if it does
max_retries = 10
for e in evi_band_links:
    print(e)
    # Retry loop
    for _i in range(max_retries):
        try:
            # Open and build datasets
            if e.rsplit('.', 2)[-2] == evi_bands[0]:  # NIR index
                nir = rxr.open_rasterio(e, chunks=chunk_size, masked=True).squeeze('band', drop=True)
                nir.attrs['scale_factor'] = 0.0001  # hard-coded scale_factor attribute
            elif e.rsplit('.', 2)[-2] == evi_bands[1]:  # red index
                red = rxr.open_rasterio(e, chunks=chunk_size, masked=True).squeeze('band', drop=True)
                red.attrs['scale_factor'] = 0.0001  # hard-coded scale_factor attribute
            elif e.rsplit('.', 2)[-2] == evi_bands[2]:  # blue index
                blue = rxr.open_rasterio(e, chunks=chunk_size, masked=True).squeeze('band', drop=True)
                blue.attrs['scale_factor'] = 0.0001  # hard-coded scale_factor attribute
            break  # Break out of the retry loop
        except Exception as ex:
            print(f"vsi curl error: {ex}. Retrying...")
    else:
        print(f"Failed to process {e} after {max_retries} retries. Please check to see you're authenticated with earthaccess.")

print("The COGs have been loaded into memory!")
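The same retry pattern can also be factored into a small reusable helper. This is a sketch only: the function name, the open_fn parameter, and the backoff values are illustrative, not part of the tutorial:

```python
import time

def open_with_retries(open_fn, url, max_retries=10, base_delay=1.0):
    """Call open_fn(url), retrying with exponential backoff on any exception.

    open_fn is a placeholder for an opener such as
    lambda u: rxr.open_rasterio(u, chunks=chunk_size, masked=True).
    """
    for attempt in range(max_retries):
        try:
            return open_fn(url)
        except Exception as ex:
            print(f"Attempt {attempt + 1}/{max_retries} failed: {ex}")
            if attempt < max_retries - 1:
                time.sleep(base_delay * 2 ** attempt)  # back off before retrying
    raise RuntimeError(f"Failed to open {url} after {max_retries} retries")
```

In the tutorial's terms, nir = open_with_retries(lambda u: rxr.open_rasterio(u, chunks=chunk_size, masked=True).squeeze('band', drop=True), e) would replace one arm of the inner try loop.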
Subscribe to the LP DAAC listserv by sending a blank email to lpdaac-join@lists.nasa.gov.
Sign up for the Landsat listserv to receive the most up to date information about Landsat data: https://public.govdelivery.com/accounts/USDOIGS/subscriber/new#tab1.
- Posts: 3
- Joined: Wed Feb 07, 2024 2:06 pm America/New_York
Re: HLSL30 - Cannot load certain tif into memory
Thank you for the quick feedback!
I am using the notebook that you mentioned: https://github.com/nasa/HLS-Data-Resources/blob/main/python/tutorials/HLS_Tutorial.ipynb.
I tried a larger max_retries value and got the same error.
Re: HLSL30 - Cannot load certain tif into memory
Just chiming in here...
I have found that restarting my Python kernel has resolved this error, although I do not know if it works all the time.
From my testing, I think the issue may be related to how imagery is processed from STAC into memory. Depending on how you do any compositing, chunking, etc., sometimes your code may be looking for an image or band that has already been scrubbed from memory and cannot find it. This can get complicated, since you may need to delve into dask graphs (if using something like stackstac, which uses dask on the back end) to understand what data the process is trying to access (it's not always consistent). I still get these types of errors from time to time, but have noticed the frequency changes a lot depending on things like how you set up your chunks.
And just setting a high retry parameter has not worked for me. Generally it will fail forever until I restart the kernel.
Last edited by mitchbon on Thu Feb 08, 2024 9:31 am America/New_York, edited 3 times in total.
- Posts: 3
- Joined: Wed Feb 07, 2024 2:06 pm America/New_York
Re: HLSL30 - Cannot load certain tif into memory
Thanks all for the answers!
Now it is working here. I'm not sure what happened, but I restarted my machine and created a repo from scratch (again):
- Download the lpdaac_windows.yml file: https://github.com/nasa/LPDAAC-Data-Resources/blob/main/setup/lpdaac_windows.yml
- Create an env using: conda env create -f "path_to\lpdaac_windows.yml"
After that, it is all set and everything runs smoothly.
Cheers.