When using the Python download script from here:
https://ladsweb.modaps.eosdis.nasa.gov/tools-and-services/data-download-scripts/#python
to download a file, e.g.:
"https://ladsweb.modaps.eosdis.nasa.gov/archive/allData/5000/VNP46A1/2014/305/VNP46A1.A2014305.h25v06.001.2019133142702.h5"
The script fails here:
> return result.decode('utf-8') if isinstance(result, bytes) else result
With the error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 0: invalid start byte
I am using Python 3.11.3. The underlying curl call (shown below) succeeds, but seems to return data that isn't decodable:
['curl', '--fail', '-sS', '-L', '-b session', '--get', 'https://ladsweb.modaps.eosdis.nasa.gov/archive/allData/5000/VNP46A1/2014/305/VNP46A1.A2014305.h25v06.001.2019133142702.h5.csv', '-H', 'user-agent: tis/download.py_1.0--3.11.3 (main, Apr 7 2023, 21:05:46) [Clang 14.0.0 (clang-1400.0.29.202)]', '-H', 'Authorization: Bearer XXX-TOKEN-XXX']
---------------------------
What I've tried:
- checking the encoding using chardet which returns encoding: 'Windows-1252'. When using 'Windows-1252' to decode, the result is:
"UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 5216: character maps to <undefined> "
This implies multiple types of encoding in the result.
- I've also tried using a less restrictive encoding 'latin-1', but the result of this is simply 'None'.
I need to download these files for every day of the year for multiple years, so not having a suitable download script is currently slowing/blocking my research.
Python download script utf-8 UnicodeDecodeError VNP46A1
- User Services
Re: Python download script utf-8 UnicodeDecodeError VNP46A1
The issue is that HDF5 is a binary format, not a text format. No text encoding will decode it as text (as you are seeing). The script also reads CSV or JSON files from the website to get the directory listings, so it needs to handle both text and binary data. Unfortunately, encodings are difficult to detect correctly.
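The byte 0x89 at position 0 in the reported error is consistent with this: an HDF5 file without a user block begins with an 8-byte signature whose first byte is 0x89, which is not a valid UTF-8 start byte. The sketch below reproduces the failure locally without downloading anything. The helper name `looks_like_hdf5` and the stand-in `payload` are hypothetical and not part of the LAADS script.

```python
# 8-byte HDF5 format signature; it sits at offset 0 when the file has no user block.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(data: bytes) -> bool:
    """Hypothetical helper: True if the payload starts with the HDF5 signature."""
    return data.startswith(HDF5_SIGNATURE)


# Stand-in for the first bytes of a downloaded .h5 file (not real data).
payload = HDF5_SIGNATURE + b"\x00" * 8

try:
    payload.decode("utf-8")
except UnicodeDecodeError as err:
    # Same kind of failure as in the original report: byte 0x89 at position 0.
    print(err)
```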
Maybe the best way to handle this is: if the filename ends in .hdf, .h5, or .nc (the three data formats used by LAADS data providers), just return the result as raw bytes, with no decoding.
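A minimal sketch of that suggestion, not an official fix to the script. The function name `decode_result` and the constant `BINARY_SUFFIXES` are hypothetical, and the sketch assumes the request URL is available at the point where the result is decoded. Only the final `return` line mirrors the line quoted from the script.

```python
from urllib.parse import urlparse

# Data formats named above; payloads with these suffixes are left as raw bytes.
BINARY_SUFFIXES = (".hdf", ".h5", ".nc")


def decode_result(url: str, result):
    """Hypothetical helper: decode text responses, pass binary data through."""
    path = urlparse(url).path.lower()
    if path.endswith(BINARY_SUFFIXES):
        return result  # binary data file: do not attempt a text decode
    # Directory listings (.csv / .json) are text, so decode them as before.
    return result.decode("utf-8") if isinstance(result, bytes) else result
```

With this approach the caller has to write the returned bytes to a file opened in binary mode (`"wb"`), since the data never becomes a `str`.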
Regards,
LAADS User Services
To receive news from LAADS DAAC direct to your inbox, email laadsdaac-join@lists.nasa.gov with “subscribe” in the subject line.