Python download script utf-8 UnicodeDecodeError VNP46A1



by jamieallen59 » Wed Jun 07, 2023 6:44 am America/New_York

When using the Python download script from here:

To download a file, e.g.:

The script fails here:
> return result.decode('utf-8') if isinstance(result, bytes) else result

With the error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 0: invalid start byte

I am using Python 3.11.3. The underlying curl call, shown below, succeeds but seems to return data that isn't decodable:
['curl', '--fail', '-sS', '-L', '-b session', '--get', '', '-H', 'user-agent: tis/download.py_1.0--3.11.3 (main, Apr 7 2023, 21:05:46) [Clang 14.0.0 (clang-1400.0.29.202)]', '-H', 'Authorization: Bearer XXX-TOKEN-XXX']

What I've tried:
- Checking the encoding with chardet, which reports encoding: 'Windows-1252'. Decoding with 'Windows-1252' then fails with:
"UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 5216: character maps to <undefined>"
This implies multiple types of encoding in the result.
- Using the less restrictive 'latin-1' encoding, but the result of this is simply 'None'.

I need to download these files for every day of the year for multiple years, so not having a suitable download script is currently slowing/blocking my research.



Re: Python download script utf-8 UnicodeDecodeError VNP46A1

by LAADS_UserServices_M » Wed Jun 07, 2023 11:14 am America/New_York

The issue is that HDF5 is a binary format, not a text format. No text encoding will decode it as text (as you are seeing). The script also reads CSV or JSON files from the website to get the directory listings, so it needs to handle both text and binary data. Unfortunately, encodings are difficult to detect correctly.
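
As a rough illustration of the point above (not part of the LAADS script): the 0x89 byte in the original error matches the first byte of the 8-byte HDF5 format signature, which normally sits at the very start of the file, and UTF-8 rejects it as a start byte. The helper name `looks_like_hdf5` is hypothetical.

```python
# The 8-byte signature that begins an HDF5 file (from the HDF5 file format spec).
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(payload: bytes) -> bool:
    """Return True if the payload starts with the HDF5 signature.

    Hypothetical helper for illustration; it only checks offset 0.
    """
    return payload.startswith(HDF5_SIGNATURE)


if __name__ == "__main__":
    # Stand-in for the first bytes of a downloaded .h5 file.
    sample = HDF5_SIGNATURE + b"\x00" * 8
    print(looks_like_hdf5(sample))
    try:
        sample.decode("utf-8")
    except UnicodeDecodeError as err:
        # Same failure mode as in the original post: byte 0x89 at position 0.
        print(err)
```

A check like this could complement filename-based handling, but it only recognizes HDF5, not the other binary formats.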

Maybe the best way to handle this is to check the filename: if it ends in .hdf, .h5, or .nc (the three data formats used by LAADS data providers), just return the result as raw bytes, with no decoding.
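
A minimal sketch of that suggestion, assuming the script has the target filename available at the point where it decodes the curl output. The names `decode_if_text` and `BINARY_SUFFIXES` are hypothetical and do not come from the LAADS script.

```python
# Suffixes of the binary data formats mentioned above (assumed complete).
BINARY_SUFFIXES = (".hdf", ".h5", ".nc")


def decode_if_text(result, filename):
    """Return raw bytes for binary data files, decoded text otherwise.

    Hypothetical replacement for the line
    `return result.decode('utf-8') if isinstance(result, bytes) else result`.
    """
    if not isinstance(result, bytes):
        # Already a str (or None); nothing to decode.
        return result
    if filename.lower().endswith(BINARY_SUFFIXES):
        # Binary science data: hand back the bytes untouched.
        return result
    # Directory listings (CSV/JSON) are text, so decode them.
    return result.decode("utf-8")
```

With a change like this, whatever code saves the data file would need to open it in binary mode (for example `open(path, "wb")`), since it would receive bytes rather than a string.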