Intermittent error downloading HLS data, HTTP error 403.

matushodul
Joined: Fri May 17, 2024 2:28 pm America/New_York

by matushodul » Wed May 22, 2024 10:07 am America/New_York

Hi LPDAAC team,

I am running into a frequent but intermittent error while downloading HLS imagery with pystac_client. I can build the stack without problems, but when I download it with .to_numpy(), I receive HTTP response code 403 (see below for a complete example of the error message). I have the download function in a retry loop, which occasionally resolves the issue after a few retries, but more often it cycles through retries for hours until I end the process.
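One way to keep a retry loop from cycling for hours is to bound it and space the attempts out. The sketch below assumes the failure surfaces as a Python exception whose message contains "HTTP response code: 403", as the RuntimeError at the end of the traceback below does. `retry_with_backoff` is a hypothetical helper, not part of pystac_client, stackstac or xarray; it caps the number of attempts and waits exponentially longer (with random jitter) between them.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 2.0,
    max_delay: float = 60.0,
    retry_on: str = "HTTP response code: 403",
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func() and return its result, retrying transient 403 errors.

    Only exceptions whose message contains `retry_on` are retried; any other
    error is re-raised immediately, and so is the last 403 once
    `max_attempts` calls have failed.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception as exc:
            if retry_on not in str(exc) or attempt == max_attempts:
                raise
            # Exponential backoff: 2 s, 4 s, 8 s, ... (with the defaults),
            # capped at max_delay, plus up to 50% extra random delay so
            # parallel jobs do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay * (1 + 0.5 * random.random()))
    raise RuntimeError("unreachable")
```

With the pickled stack from the traceback, the call would be `retry_with_backoff(stack_saved.to_numpy)`. Each retry re-runs the whole dask computation, so it may re-read assets that already succeeded; the point of the cap is that the job fails loudly instead of spinning indefinitely.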

The error occurs at various ROI locations and on various networks, and the .tif on which the download gets hung up is different every time. I can download the failing .tif manually by clicking the link in the error message. I have experimented with various GDAL settings, including these:

GDAL_DISABLE_READDIR_ON_OPEN='EMPTY_DIR',
GDAL_HTTP_COOKIEFILE=os.path.expanduser('~/cookies.txt'),
GDAL_HTTP_COOKIEJAR=os.path.expanduser('~/cookies.txt'),
GDAL_HTTP_MAX_RETRY=5,
GDAL_HTTP_RETRY_DELAY=15,
CPL_VSIL_CURL_ALLOWED_EXTENSIONS='TIF',
GDAL_HTTP_UNSAFESSL='YES',
CPL_DEBUG='ON',
CPL_CURL_VERBOSE='ON',
GDAL_HTTP_TCP_KEEPALIVE='YES',
GDAL_HTTP_TCP_KEEPIDLE=120
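The settings above can also be exported as environment variables: GDAL falls back to the environment when looking up a configuration option, and os.environ is process-wide, so the values are visible to worker threads as well. A minimal sketch, assuming the reads happen in the same Python process (e.g. dask's default threaded scheduler); `apply_gdal_env` is a hypothetical helper. GDAL_HTTP_UNSAFESSL is left out because it disables TLS certificate verification, and a 403 is a status the server returns, so it seems unlikely to help. stackstac.stack also accepts a `gdal_env` argument for passing GDAL options to its readers, which may be the more direct route.

```python
import os
from typing import Dict, MutableMapping, Optional

# The options from the post, as strings (environment values must be str).
# GDAL_HTTP_UNSAFESSL is intentionally omitted: it disables TLS certificate checks.
GDAL_OPTIONS: Dict[str, str] = {
    "GDAL_DISABLE_READDIR_ON_OPEN": "EMPTY_DIR",
    "GDAL_HTTP_COOKIEFILE": os.path.expanduser("~/cookies.txt"),
    "GDAL_HTTP_COOKIEJAR": os.path.expanduser("~/cookies.txt"),
    "GDAL_HTTP_MAX_RETRY": "5",
    "GDAL_HTTP_RETRY_DELAY": "15",
    "CPL_VSIL_CURL_ALLOWED_EXTENSIONS": "TIF",
    "GDAL_HTTP_TCP_KEEPALIVE": "YES",
    "GDAL_HTTP_TCP_KEEPIDLE": "120",
    # Debug output; curl's verbose log may show which request returns the 403.
    "CPL_DEBUG": "ON",
    "CPL_CURL_VERBOSE": "ON",
}


def apply_gdal_env(
    options: Dict[str, str],
    environ: MutableMapping[str, str] = os.environ,
) -> Dict[str, Optional[str]]:
    """Export GDAL config options as environment variables.

    Returns the previous values so the caller can restore them later
    (None means the variable was not set before).
    """
    previous: Dict[str, Optional[str]] = {}
    for key, value in options.items():
        previous[key] = environ.get(key)
        environ[key] = str(value)
    return previous
```

Calling `apply_gdal_env(GDAL_OPTIONS)` once, before the first `.to_numpy()`, would make the options available to every read in that process.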

Here is an example of the complete error message:
---------------------------------------------------------------------------
CPLE_HttpResponseError Traceback (most recent call last)
File rasterio/_base.pyx:308, in rasterio._base.DatasetBase.__init__()

File rasterio/_base.pyx:219, in rasterio._base.open_dataset()

File rasterio/_err.pyx:221, in rasterio._err.exc_wrap_pointer()

CPLE_HttpResponseError: HTTP response code: 403

During handling of the above exception, another exception occurred:

RasterioIOError Traceback (most recent call last)
File ~/miniconda3/envs/decaf/lib/python3.9/site-packages/stackstac/rio_reader.py:327, in AutoParallelRioReader._open(self)
326 try:
--> 327 ds = SelfCleaningDatasetReader(self.url, sharing=False)
328 except Exception as e:

File rasterio/_base.pyx:310, in rasterio._base.DatasetBase.__init__()

RasterioIOError: HTTP response code: 403

The above exception was the direct cause of the following exception:

RuntimeError Traceback (most recent call last)
Cell In[8], line 5
2 with open('saved_stacks/stack/epsg32610_0_stack', 'rb') as file:
3 stack_saved = pickle.load(file)
----> 5 stack_saved.to_numpy()

File ~/miniconda3/envs/decaf/lib/python3.9/site-packages/xarray/core/dataarray.py:787, in DataArray.to_numpy(self)
776 def to_numpy(self) -> np.ndarray:
777 """
778 Coerces wrapped data to numpy and returns a numpy.ndarray.
779
(...)
785 DataArray.data
786 """
--> 787 return self.variable.to_numpy()

File ~/miniconda3/envs/decaf/lib/python3.9/site-packages/xarray/namedarray/core.py:829, in NamedArray.to_numpy(self)
827 """Coerces wrapped data to numpy and returns a numpy.ndarray"""
828 # TODO an entrypoint so array libraries can choose coercion method?
--> 829 return to_numpy(self._data)

File ~/miniconda3/envs/decaf/lib/python3.9/site-packages/xarray/namedarray/pycompat.py:111, in to_numpy(data, **kwargs)
109 if is_chunked_array(data):
110 chunkmanager = get_chunked_array_type(data)
--> 111 data, *_ = chunkmanager.compute(data, **kwargs)
112 if isinstance(data, array_type("cupy")):
113 data = data.get()

File ~/miniconda3/envs/decaf/lib/python3.9/site-packages/xarray/namedarray/daskmanager.py:86, in DaskManager.compute(self, *data, **kwargs)
81 def compute(
82 self, *data: Any, **kwargs: Any
83 ) -> tuple[np.ndarray[Any, _DType_co], ...]:
84 from dask.array import compute
---> 86 return compute(*data, **kwargs)

File ~/miniconda3/envs/decaf/lib/python3.9/site-packages/dask/base.py:661, in compute(traverse, optimize_graph, scheduler, get, *args, **kwargs)
658 postcomputes.append(x.__dask_postcompute__())
660 with shorten_traceback():
--> 661 results = schedule(dsk, keys, **kwargs)
663 return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])

File ~/miniconda3/envs/decaf/lib/python3.9/site-packages/stackstac/to_dask.py:189, in fetch_raster_window(reader_table, slices, dtype, fill_value)
182 # Only read if the window we're fetching actually overlaps with the asset
183 if windows.intersect(current_window, asset_window):
184 # NOTE: when there are multiple assets, we _could_ parallelize these reads with our own threadpool.
185 # However, that would probably increase memory usage, since the internal, thread-local GDAL datasets
186 # would end up copied to even more threads.
187
188 # TODO when the Reader won't be rescaling, support passing `output` to avoid the copy?
--> 189 data = reader.read(current_window)
191 if all_empty:
192 # Turn `output` from a broadcast-trick array to a real array, so it's writeable
193 if (
194 np.isnan(data)
195 if np.isnan(fill_value)
196 else np.equal(data, fill_value)
197 ).all():
198 # Unless the data we just read is all empty anyway

File ~/miniconda3/envs/decaf/lib/python3.9/site-packages/stackstac/rio_reader.py:385, in AutoParallelRioReader.read(self, window, **kwargs)
384 def read(self, window: Window, **kwargs) -> np.ndarray:
--> 385 reader = self.dataset
386 try:
387 result = reader.read(
388 window=window,
389 out_dtype=self.dtype,
(...)
393 **kwargs,
394 )

File ~/miniconda3/envs/decaf/lib/python3.9/site-packages/stackstac/rio_reader.py:381, in AutoParallelRioReader.dataset(self)
379 with self._dataset_lock:
380 if self._dataset is None:
--> 381 self._dataset = self._open()
382 return self._dataset

File ~/miniconda3/envs/decaf/lib/python3.9/site-packages/stackstac/rio_reader.py:336, in AutoParallelRioReader._open(self)
331 warnings.warn(msg)
332 return NodataReader(
333 dtype=self.dtype, fill_value=self.fill_value
334 )
--> 336 raise RuntimeError(msg) from e
337 if ds.count != 1:
338 ds.close()

RuntimeError: Error opening 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSS30.020/HLS.S30.T10UDU.2021208T191911.v2.0/HLS.S30.T10UDU.2021208T191911.v2.0.B03.tif': RasterioIOError('HTTP response code: 403')
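Since the failing .tif differs on every run, logging which asset failed (and with which status) may help show whether the 403s follow any pattern. The sketch below assumes the message format of the RuntimeError at the end of the traceback above, "Error opening '<url>': RasterioIOError('HTTP response code: 403')"; `parse_open_error` is a hypothetical helper, and the format is not guaranteed to stay stable across stackstac versions.

```python
import re
from typing import Dict, Optional, Union

# Assumed format, copied from the RuntimeError in the traceback:
#   Error opening '<url>': RasterioIOError('HTTP response code: 403')
_OPEN_ERROR = re.compile(r"Error opening '(?P<url>[^']+)': (?P<cause>.+)")
_HTTP_STATUS = re.compile(r"HTTP response code: (\d{3})")


def parse_open_error(message: str) -> Optional[Dict[str, Union[str, int, None]]]:
    """Extract the asset URL, file name and HTTP status from an open error.

    Returns None if the message does not match the expected format.
    """
    match = _OPEN_ERROR.search(message)
    if match is None:
        return None
    url = match.group("url")
    status = _HTTP_STATUS.search(match.group("cause"))
    return {
        "url": url,
        "file": url.rsplit("/", 1)[-1],
        "status": int(status.group(1)) if status else None,
    }
```

In an `except RuntimeError as exc:` block around `.to_numpy()`, `parse_open_error(str(exc))` would give the failing file name for a log. The traceback also shows the other path in stackstac's `_open` (a warning plus a `NodataReader`), which is what the `errors_as_nodata` argument of stackstac.stack enables; that avoids the crash but silently fills the failed asset with nodata.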

LP DAAC - dgolon
User Services
Joined: Mon Sep 30, 2019 10:00 am America/New_York

Re: Intermittent error downloading HLS data, HTTP error 403.

by LP DAAC - dgolon » Wed May 22, 2024 3:03 pm America/New_York

Hi @matushodul, thanks for bringing this to our attention. Our developers are looking into it, and we will report back when we have an answer. Thanks, Danielle
Subscribe to the LP DAAC listserv by sending a blank email to lpdaac-join@lists.nasa.gov.

Sign up for the Landsat listserv to receive the most up to date information about Landsat data: https://public.govdelivery.com/accounts/USDOIGS/subscriber/new#tab1.

LP DAAC - dgolon

Re: Intermittent error downloading HLS data, HTTP error 403.

by LP DAAC - dgolon » Thu May 23, 2024 9:38 am America/New_York

Hello @matushodul, could you please send a copy of the failing code block to lpdaac@usgs.gov? We'd like to take a look. Please reference this post in your email. Thanks, Danielle