Python download script utf-8 UnicodeDecodeError VNP46A1
Posted: Wed Jun 07, 2023 6:44 am America/New_York
When using the Python download script from here:
https://ladsweb.modaps.eosdis.nasa.gov/tools-and-services/data-download-scripts/#python
to download a file, e.g.:
"https://ladsweb.modaps.eosdis.nasa.gov/archive/allData/5000/VNP46A1/2014/305/VNP46A1.A2014305.h25v06.001.2019133142702.h5"
The script fails here:
> return result.decode('utf-8') if isinstance(result, bytes) else result
With the error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 0: invalid start byte
I am using Python 3.11.3. The underlying curl call below succeeds, but seems to return data that isn't decodable:
['curl', '--fail', '-sS', '-L', '-b session', '--get', 'https://ladsweb.modaps.eosdis.nasa.gov/archive/allData/5000/VNP46A1/2014/305/VNP46A1.A2014305.h25v06.001.2019133142702.h5.csv', '-H', 'user-agent: tis/download.py_1.0--3.11.3 (main, Apr 7 2023, 21:05:46) [Clang 14.0.0 (clang-1400.0.29.202)]', '-H', 'Authorization: Bearer XXX-TOKEN-XXX']
---------------------------
What I've tried:
- Checking the encoding with chardet, which reports encoding: 'Windows-1252'. Decoding with 'Windows-1252' then fails with:
"UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 5216: character maps to <undefined>"
This seems to imply that the result is not in a single text encoding.
- I've also tried using a less restrictive encoding 'latin-1', but the result of this is simply 'None'.
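One observation on the decoding attempts above: the HDF5 format signature is the 8 bytes 0x89 'H' 'D' 'F' '\r' '\n' 0x1a '\n', which matches the "byte 0x89 in position 0" in the first error. So it may be that the payload is the binary .h5 file itself rather than text in some encoding. A minimal sketch under that assumption, which checks for the signature and writes the raw bytes to disk without decoding. The helper names looks_like_hdf5 and save_payload are hypothetical and are not part of the LAADS download script:

```python
from pathlib import Path

# The 8-byte signature that opens an HDF5 file (when it has no user block).
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(payload: bytes) -> bool:
    """Return True if the payload starts with the HDF5 format signature."""
    return payload.startswith(HDF5_SIGNATURE)


def save_payload(payload: bytes, dest: Path) -> int:
    """Write the raw bytes to dest without any text decoding.

    Hypothetical helper: returns the number of bytes written.
    """
    return dest.write_bytes(payload)
```

If looks_like_hdf5(result) is true for the bytes that curl returns, then no text codec would be expected to decode them cleanly, and saving them with save_payload (or opening the target file in 'wb' mode) would be the thing to try instead of result.decode(...).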
I need to download these files for every day of the year for multiple years, so not having a suitable download script is currently slowing/blocking my research.