The following code requires
- The requests package which can be installed from the command line or terminal using the following command:
Code: Select all
pip install requests
- Python 3.6 or newer. To check your Python version, launch the Python interpreter; the version is printed on the first line.
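You can also check the version from the command line without starting an interactive session (on some systems the command is "python" rather than "python3"):

```shell
# Prints something like "Python 3.11.4"
python3 --version
```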
- An Earthdata login token. If you don't have one, you can generate one at https://urs.earthdata.nasa.gov/.
- Copy the example script. Save the script text to a file ending in .py. Advanced users may want to edit the code to add features and more advanced filtering.
- Select a top level URL. Search https://asdc.larc.nasa.gov/data/ for a directory or file that contains the data you want to download.
- Find your Earthdata login token. You can generate a token or copy an existing token by going to https://urs.earthdata.nasa.gov/ and selecting the "Generate Token" option from the top menu. Please note the token's expiration date; after that date a new token must be generated.
- Add your URL and token to the script.
- Run the script.
Code: Select all
import requests
from pathlib import Path
url = "<the URL of the file you want to download>"
token = "<your token>"
header = {"Authorization": f"Bearer {token}"}
response = requests.get(url, headers=header)
content = response.content
file_name = url.split('/')[-1]
data_path = Path(file_name)
data_path.write_bytes(content)
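The minimal script above saves whatever the server returns, even an HTML error page if the token or URL is wrong. A hardened sketch of the same idea (the function names and the 120-second timeout are illustrative choices, not part of the original script):

```python
from pathlib import Path

import requests

def file_name_for(url):
    # The local file name is the last component of the URL path.
    return url.rstrip("/").split("/")[-1]

def download_file(url, token, out_dir=Path(".")):
    # Fail loudly on 401 (bad token) or 404 (bad URL) instead of
    # silently writing the error page to disk.
    response = requests.get(url, headers={"Authorization": f"Bearer {token}"}, timeout=120)
    response.raise_for_status()
    dest = out_dir / file_name_for(url)
    dest.write_bytes(response.content)
    return dest
```

Called as download_file(url, token), this saves the file next to the script, just like the original.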
- Copy the example script. Save the script text to a file ending in .py. Advanced users may want to edit the code to add features and more advanced filtering.
- Select a top level URL. Search https://asdc.larc.nasa.gov/data/ for a directory that contains the data you want to download.
- Find your Earthdata login token. You can generate a token or copy an existing token by going to https://urs.earthdata.nasa.gov/ and selecting the "Generate Token" option from the top menu. Please note the token's expiration date; after that date a new token must be generated.
- Run the script.
- When prompted, provide your top level URL. You can also enter "test" to download a small dataset, letting you confirm the script is working.
- When prompted, paste your token into the window. Please note, much like a website's password entry field, the value you enter is hidden. Simply paste the token and hit enter. Some systems clear the clipboard once you paste the token, so you may have to copy it again if you want to rerun the script.
- Wait for the script to find links. The script will check each page in the hierarchy and collect a list of file links. Depending on the number of pages that must be checked, this may take a while.
- Let the script know if you want to remove existing files. If you've downloaded the data before and only want updates, you may want to leave the existing files. The script will only re-download previously downloaded data if the file sizes have changed.
- Verify you have enough space for the download. The script will use the file headers to get the size of each file to be downloaded and report the total download size in MB. Ensure you have enough drive space for your download.
- Download the files. The script will download any files that are not already in the data folder, or that are in the data folder but have changed in size.
Code: Select all
from getpass import getpass
from http.client import NOT_FOUND, UNAUTHORIZED
from pathlib import Path
from requests import Session
def url_to_path(url, output_dir):
    # The local file name is the last component of the URL
    return output_dir.joinpath(url.split('/')[-1])

print("Welcome to the ASDC Download Script!\nThis script downloads data from https://asdc.larc.nasa.gov/data/")
with Session() as session:
    # get login
    url = input("Enter the top level URL (you can also enter 'test' to download a small dataset)\n\turl: ")
    if url == "test":
        url = "https://asdc.larc.nasa.gov/data/AJAX/CH2O_1/"
    token = getpass("Enter your token, if you don't have a token, get one from https://urs.earthdata.nasa.gov/\n\ttoken: ")
    if not token:
        print("Token cannot be blank, exiting.")
        exit()
    session.headers = {"Authorization": f"Bearer {token}"}
    # verify login works
    response = session.get(url)
    if not response.ok:
        if response.status_code == UNAUTHORIZED:
            print("Earthdata Login responded with Unauthorized, did you enter a valid token?")
            exit()
        if response.status_code == NOT_FOUND:
            print("The top level URL does not exist, select a URL within https://asdc.larc.nasa.gov/data/")
            exit()
    output_dir = Path('data')
    # get a list of all urls (appending to `pages` while iterating walks the whole hierarchy)
    pages = [url]
    file_urls = []
    print("Getting file links")
    for i, page in enumerate(pages):
        print(f"Checking {page} for links", end="\r", flush=True)
        response = session.get(page)
        if not response.ok:
            if response.status_code == NOT_FOUND:
                print(f"The following page was not found: {page}")
            else:
                print(f"Received {response.reason} status for {page}")
            continue
        content = response.content.decode('utf-8')
        if '<table id="indexlist">' not in content:
            print(f"Data table not found for {page}")
            continue
        table_content = content.split('<table id="indexlist">')[-1].split('</table>')[0]
        hrefs = {part.split('"')[0] for i, part in enumerate(table_content.split('href="')) if i}
        for href in hrefs:
            if href.endswith('/'):
                pages.append(page + href)
            else:
                file_urls.append(page + href)
    if not file_urls:
        print("No files found, exiting.")
        exit()
    # offer to remove existing data
    output_dir.mkdir(exist_ok=True)
    if any(output_dir.iterdir()):
        if input(f"There's already data in {output_dir.absolute()}, \n\tRemove it? [y/n]: ") == "y":
            for path in output_dir.iterdir():
                path.unlink()
    # get a list of new files (skip already downloaded files if the size is unchanged)
    print("Getting size")
    total_size = 0
    file_count = len(file_urls)
    new_files = []
    for i, url in enumerate(file_urls):
        print(f"Getting size for file {i+1} of {file_count}", end="\r", flush=True)
        _response = session.head(url)
        size = int(_response.headers.get('content-length', 0))
        if url_to_path(url, output_dir).exists() and size == url_to_path(url, output_dir).stat().st_size:
            continue
        total_size += size
        new_files.append(url)
    if not new_files:
        print("No new files, exiting.")
        exit()
    if input(f"Found {len(new_files)} files totaling {total_size // 1024**2} MB in {output_dir.absolute()}.\n\tDownload [y/n]: ") == 'n':
        exit()
    # download files
    for i, url in enumerate(new_files):
        print(f"Downloading file {i+1} of {len(new_files)}", end="\r", flush=True)
        _response = session.get(url)
        with url_to_path(url, output_dir).open('wb') as file:
            file.write(_response.content)
    print("\nDownload Complete")
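The script above finds links by string-splitting on href=" inside the index table, which works for ASDC's current page layout but is brittle. A more robust sketch using only the standard library's html.parser (the table id "indexlist" is taken from the script above; the sample HTML is illustrative):

```python
from html.parser import HTMLParser

class IndexLinkParser(HTMLParser):
    """Collect href values from anchors inside the <table id="indexlist"> element."""

    def __init__(self):
        super().__init__()
        self.in_index = False  # True while we are inside the index table
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "table" and attrs.get("id") == "indexlist":
            self.in_index = True
        elif tag == "a" and self.in_index and "href" in attrs:
            self.hrefs.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "table":
            self.in_index = False

parser = IndexLinkParser()
parser.feed('<table id="indexlist"><tr><td><a href="sub/">sub/</a></td>'
            '<td><a href="file.hdf">file.hdf</a></td></tr></table>')
print(parser.hrefs)  # hrefs ending with "/" are subdirectories, the rest are files
```

The same ends-with-"/" convention as in the script then separates pages to crawl from files to download.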