Bulk downloading files using python

danit_twito
Posts: 4
Joined: Tue Oct 26, 2021 9:50 am America/New_York

Bulk downloading files using python

by danit_twito » Tue Oct 26, 2021 10:06 am America/New_York

Hi,
I am trying to download all the daily mapped chlorophyll-a files from 2002 to 2020 using Python (Linux is not installed on my computer).
Could someone please attach a Python script for Earthdata bulk downloading?

Many thanks!


gnwiii
Posts: 640
Joined: Fri Jan 29, 2021 5:51 pm America/New_York
Answers: 2

Re: Bulk downloading files using python

by gnwiii » Tue Oct 26, 2021 5:45 pm America/New_York

See https://oceancolor.gsfc.nasa.gov/data/download_methods/ for a Python script. That page also describes ways to get lists of files to download, file checksums, etc.

danit_twito
Posts: 4
Joined: Tue Oct 26, 2021 9:50 am America/New_York

Re: Bulk downloading files using python

by danit_twito » Wed Oct 27, 2021 5:27 am America/New_York

So far, I have used a Python script from the web to handle data access.
For the URL variable, I created this for loop:

import calendar

for year in range(2002, 2021):
    # Leap years (2004, 2008, ...) have a day 366
    n_days = 366 if calendar.isleap(year) else 365
    for day in range(1, n_days + 1):
        url = "https://oceandata.sci.gsfc.nasa.gov/ob/getfile/A{}{:03d}.L3m_DAY_CHL_chlor_a_4km.nc".format(year, day)
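The same list of URLs can also be built from a date range with `datetime`, which handles leap years automatically because `%j` gives the zero-padded day of the year (001 to 366). This is a minimal sketch that assumes the `A<year><day-of-year>.L3m_DAY_CHL_chlor_a_4km.nc` naming from the loop above is still what the server uses; `daily_chl_urls` is a hypothetical helper name, and the dates are only examples.

```python
from datetime import date, timedelta

# Base URL and file-name pattern copied from the loop above (assumed still valid).
BASE_URL = "https://oceandata.sci.gsfc.nasa.gov/ob/getfile/"
PATTERN = "A{:%Y%j}.L3m_DAY_CHL_chlor_a_4km.nc"  # %Y = year, %j = day of year 001-366


def daily_chl_urls(start, end):
    """Return one URL per day from start to end, both inclusive (hypothetical helper)."""
    urls = []
    day = start
    while day <= end:
        urls.append(BASE_URL + PATTERN.format(day))
        day += timedelta(days=1)
    return urls


if __name__ == "__main__":
    urls = daily_chl_urls(date(2002, 1, 1), date(2020, 12, 31))
    print(len(urls), "URLs")
    print(urls[0])
    print(urls[-1])
```

Written to a text file (one URL per line), this list can feed whatever download method you end up using.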

Once I have done that, do I need to create a .netrc file? (Is it necessary?)
If so, what should I do next?

Is there a simpler way to download the data? Could you explain how to do it?

I apologize for asking so many questions, but I need detailed instructions. Programming and working with this data are new to me.

Many thanks

OB.DAAC - amscott
User Services
Posts: 96
Joined: Mon Jun 22, 2020 5:24 pm America/New_York
Answers: 1

Re: Bulk downloading files using python

by OB.DAAC - amscott » Wed Oct 27, 2021 7:11 am America/New_York

All data downloads now require users to identify themselves with their Earthdata Login credentials. A .netrc file removes the need to enter your credentials for each file, so creating one is recommended. You can use the .netrc file together with your Python script. If you prefer not to create a .netrc file, you can generate an appkey instead, which is also discussed at the links provided.
--
Alicia
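To check that Python can actually see the credentials stored in a .netrc file, the standard-library `netrc` module can read the entry back. A minimal sketch, assuming the entry uses the machine name `urs.earthdata.nasa.gov` as in the Earthdata instructions; `earthdata_credentials` is a hypothetical helper. The Python `requests` library also looks in `~/.netrc` (or `~/_netrc`) by itself when a request has no other credentials, which is how a download script can run without prompting.

```python
import netrc
import os
import tempfile

EARTHDATA_HOST = "urs.earthdata.nasa.gov"


def earthdata_credentials(path=None):
    """Return (login, password) for Earthdata from a netrc file, or None if absent.

    path=None reads the default file in the home directory (~/.netrc).
    A malformed file may raise netrc.NetrcParseError.
    """
    try:
        entry = netrc.netrc(path).authenticators(EARTHDATA_HOST)
    except FileNotFoundError:
        return None  # no netrc file at all
    if entry is None:
        return None  # file exists but has no Earthdata entry
    login, _account, password = entry
    return login, password


if __name__ == "__main__":
    # Self-check on a throwaway file so real credentials are never printed.
    with tempfile.TemporaryDirectory() as tmp:
        demo = os.path.join(tmp, "netrc")
        with open(demo, "w") as f:
            f.write("machine urs.earthdata.nasa.gov login demo_user password demo_pass\n")
        print(earthdata_credentials(demo))  # ('demo_user', 'demo_pass')
    found = earthdata_credentials()
    print("Earthdata entry found in ~/.netrc" if found else "No Earthdata entry in ~/.netrc")
```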

danit_twito
Posts: 4
Joined: Tue Oct 26, 2021 9:50 am America/New_York

Re: Bulk downloading files using python

by danit_twito » Wed Oct 27, 2021 9:06 am America/New_York

Thanks for your response, Alicia.
I'm following the commands in this post, using Cygwin: viewtopic.php?f=7&t=2328&p=8245&hilit=C ... 2430#p8245
If, for example, I want to download a thousand files, what should I assign to the URL variable? I saw there is an option to export a text file with all the URLs; can I use it somehow for this task?

Thanks
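A text file of URLs (one per line) can indeed drive a download loop. A minimal sketch, assuming the credentials are in `~/.netrc` and the `requests` library is installed; `parse_url_list`, `local_name` and `download_all` are hypothetical names, and this is not the official OB.DAAC script, which handles more error cases. Files that already exist are skipped, so an interrupted run can simply be restarted.

```python
import os
from urllib.parse import urlparse


def parse_url_list(lines):
    """Return the URLs in an iterable of lines, skipping blank lines and '#' comments."""
    urls = []
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            urls.append(line)
    return urls


def local_name(url):
    """Use the last component of the URL path as the local file name."""
    return os.path.basename(urlparse(url).path)


def download_all(url_file, dest_dir="."):
    """Download every URL listed in url_file into dest_dir, skipping existing files."""
    import requests  # imported here so the two helpers above work without requests

    with open(url_file) as f:
        urls = parse_url_list(f)
    os.makedirs(dest_dir, exist_ok=True)
    # One session for every file: with no explicit auth, requests looks up
    # the Earthdata login in ~/.netrc, and the session keeps the login cookies.
    with requests.Session() as session:
        for url in urls:
            target = os.path.join(dest_dir, local_name(url))
            if os.path.exists(target):
                continue  # finished in an earlier run
            partial = target + ".part"
            with session.get(url, stream=True, timeout=120) as resp:
                resp.raise_for_status()
                with open(partial, "wb") as out:
                    for chunk in resp.iter_content(chunk_size=1 << 20):
                        out.write(chunk)
            os.replace(partial, target)  # rename only once the file is complete
            print("saved", target)
```

Usage: `download_all("urls.txt", "C:/NC_files")`, where `urls.txt` is a hypothetical file name. If the login is not picked up, the saved files may turn out to be HTML pages rather than NetCDF, so it is worth opening one or two before starting a long run.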

gnwiii
Posts: 640
Joined: Fri Jan 29, 2021 5:51 pm America/New_York
Answers: 2

Re: Bulk downloading files using python

by gnwiii » Wed Oct 27, 2021 11:22 am America/New_York

The post you are following is not for ocean color data. You should follow the instructions for Ocean Color Data (https://oceancolor.gsfc.nasa.gov/data/download_methods/). The Python script you can download from that page works with Cygwin (using the Cygwin terminal with bash or zsh), native Windows, or the WSL Linux command line. You do need to pay attention to the required versions of Python and of the Python requests library.
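To see which Python and requests versions are actually in use (Cygwin's Python and a native Windows Python are separate installations with separate packages), a short check like this works on Python 3.8 or later. It is a sketch: the minimum versions are not given here, so compare the output against what the download page asks for. `installed_versions` is a hypothetical name.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def installed_versions():
    """Report the running Python version, its executable, and the requests version (or None)."""
    try:
        requests_version = version("requests")
    except PackageNotFoundError:
        requests_version = None  # not installed for this Python: python -m pip install requests
    return {
        "python": "{}.{}.{}".format(*sys.version_info[:3]),
        "python_executable": sys.executable,  # shows which of several Pythons is running
        "requests": requests_version,
    }


if __name__ == "__main__":
    for name, value in installed_versions().items():
        print(f"{name}: {value}")
```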

Note that even experienced users ran into problems when the US Government stopped providing http URLs. This step was necessary to ensure that you are connecting to the official site and not a copycat site run by "bad actors". Security is not easy, and staying ahead of the "bad actors" is a moving target. It is not unusual for things to stop working until you apply the required security update (new versions of software or certificates).
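Old scripts and saved URL lists may still contain http:// addresses. A small helper can rewrite them to https:// before downloading; this sketch assumes the server publishes the same path over https, which is worth confirming on a single file first. `force_https` is a hypothetical name.

```python
from urllib.parse import urlsplit, urlunsplit


def force_https(url):
    """Return url with an http:// scheme replaced by https://; other URLs are unchanged."""
    parts = urlsplit(url)
    if parts.scheme == "http":
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)


if __name__ == "__main__":
    print(force_https("http://oceandata.sci.gsfc.nasa.gov/ob/getfile/A2004001.L3m_DAY_CHL_chlor_a_4km.nc"))
```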

You may be wishing for a simple GUI download system. There have been attempts to provide this in the past, but they tend to be fragile and don't provide the level of detail about problems that you get with command-line tools, so my colleagues who tried them soon went back to command-line tools.

danit_twito
Posts: 4
Joined: Tue Oct 26, 2021 9:50 am America/New_York

Re: Bulk downloading files using python

by danit_twito » Wed Oct 27, 2021 12:14 pm America/New_York

I will break the problem into chunks:
The first block, in Cygwin:

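USERNAME=danit_twito
PASSWORD=******
cd ~                      # curl -n and the ~/.urs_cookies options look in the home directory
touch .netrc
echo "machine urs.earthdata.nasa.gov login $USERNAME password $PASSWORD" > .netrc
chmod 0600 .netrc
touch .urs_cookies
cd /cygdrive/c/NC_files   # then move to the folder where the files should be saved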

Does the first block look okay so far?
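For comparison, the same .netrc setup can be done from Python without Cygwin. Both curl's `-n` option and the `requests` library look for the file in the home directory, not in the data folder, and the home directory seen by Cygwin tools (/home/<name>) is by default not the same as the Windows one (C:\Users\<name>), so the file has to be where the downloading program looks. A minimal sketch, assuming the Earthdata machine name used above; `write_earthdata_netrc` is a hypothetical helper.

```python
import netrc
import os
import stat
import tempfile
from pathlib import Path

EARTHDATA_HOST = "urs.earthdata.nasa.gov"


def write_earthdata_netrc(login, password, path=None):
    """Write a one-line netrc for Earthdata and restrict it to the owner (like chmod 0600).

    Like `echo ... > .netrc`, this replaces any existing file at path.
    """
    path = Path(path) if path is not None else Path.home() / ".netrc"
    path.write_text(f"machine {EARTHDATA_HOST} login {login} password {password}\n")
    # 0600 on Linux/Cygwin; on native Windows chmod only toggles the read-only flag.
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
    return path


if __name__ == "__main__":
    # Demo on a throwaway file; call write_earthdata_netrc("your_login", "your_password")
    # with no path to write the real ~/.netrc.
    with tempfile.TemporaryDirectory() as tmp:
        demo = write_earthdata_netrc("demo_user", "demo_pass", Path(tmp) / "netrc")
        print(netrc.netrc(str(demo)).authenticators(EARTHDATA_HOST)[0])  # demo_user
```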

Second block:

curl -d "sensor=octs&sdate=1996-11-01&edate=1997-01-01&dtype=L3b&addurl=1&results_as_file=1&search=*DAY_CHL*" https://oceandata.sci.gsfc.nasa.gov/api/file_search | grep getfile | cut -d "'" -f 2 | head -1 | xargs -n 1 curl -LJO -n -c ~/.urs_cookies -b ~/.urs_cookies


I copy-pasted these commands from the link you attached. Do I just need to adjust the options in the second block (start/end date, etc.) to match the data I want to use?
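In the curl command, the `-d` string carries the search parameters (`sensor`, `sdate`, `edate`, `dtype`, `search`), and `head -1` keeps only the first matching URL, which is handy for a test run. The same query can be sent from Python. This sketch mirrors the curl pipeline; the form fields are copied from the command above, and whether the `file_search` API still accepts them unchanged is an assumption worth testing on a short date range first. `extract_getfile_urls` and `search_getfile_urls` are hypothetical names.

```python
def extract_getfile_urls(text):
    """Mimic `grep getfile | cut -d "'" -f 2` on the search response text."""
    urls = []
    for line in text.splitlines():
        if "getfile" not in line:
            continue
        # cut prints the whole line when it has no ' delimiter, otherwise field 2
        urls.append(line.split("'")[1] if "'" in line else line)
    return urls


def search_getfile_urls(sensor, sdate, edate, dtype, pattern):
    """POST the same form fields as `curl -d ...` and return the getfile URLs."""
    import requests  # imported here so extract_getfile_urls works without requests

    form = {
        "sensor": sensor,
        "sdate": sdate,
        "edate": edate,
        "dtype": dtype,
        "addurl": 1,
        "results_as_file": 1,
        "search": pattern,
    }
    resp = requests.post("https://oceandata.sci.gsfc.nasa.gov/api/file_search", data=form, timeout=60)
    resp.raise_for_status()
    return extract_getfile_urls(resp.text)
```

For example, `search_getfile_urls("octs", "1996-11-01", "1997-01-01", "L3b", "*DAY_CHL*")` repeats the query from the command above; the returned URLs can be written to a text file, one per line, and downloaded in a loop.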

gnwiii
Posts: 640
Joined: Fri Jan 29, 2021 5:51 pm America/New_York
Answers: 2

Re: Bulk downloading files using python

by gnwiii » Wed Oct 27, 2021 2:03 pm America/New_York

As long as curl is working -- go for it.

Curl has worked for me in the past, but it was not very reliable. I think older tools (curl, and wget before wget2) carry logic for dealing with older web services, which can lead them to mishandle issues with modern services, particularly single-sign-on (SSO) sites like NASA Earthdata. The Python script has been more reliable for me, on both home and "corporate" networks.
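On flaky networks, much of the difference comes down to retrying a failed transfer instead of giving up. A generic retry wrapper with exponential backoff, as a sketch; it is not taken from the OB.DAAC script, and `with_retries` is a hypothetical name. The exceptions raised by `requests` (including HTTP errors from `raise_for_status()`) derive from `OSError`, so the default `retry_on` covers them; `func` would be a function that downloads one file.

```python
import time


def with_retries(func, attempts=4, base_delay=2.0, retry_on=(OSError,), sleep=time.sleep):
    """Call func(); if it raises one of retry_on, wait and try again.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts,
    and re-raises the last error once all attempts are used up.
    """
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```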

For mass downloads, I have learned to use the available checksum support. There have been times when 10% of the files in a multi-year download were corrupt.
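A checksum pass after the download can catch such corrupt files. A minimal sketch, assuming the checksum list is in the two-column `<hex digest>  <file name>` layout printed by `sha1sum` and uses SHA-1; if the list you get uses another algorithm or layout, change `algorithm` or the parser. `file_digest`, `read_checksum_list` and `find_bad_files` are hypothetical names.

```python
import hashlib
import os


def file_digest(path, algorithm="sha1", chunk_size=1 << 20):
    """Hash a file in chunks so large NetCDF files need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def read_checksum_list(lines):
    """Parse '<digest>  <name>' lines (sha1sum style) into {name: digest}."""
    expected = {}
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:
            # sha1sum marks binary-mode entries with a leading '*' on the name
            expected[parts[1].lstrip("*")] = parts[0].lower()
    return expected


def find_bad_files(expected, data_dir=".", algorithm="sha1"):
    """Return the names that are missing or whose digest does not match."""
    bad = []
    for name, digest in sorted(expected.items()):
        path = os.path.join(data_dir, name)
        if not os.path.exists(path) or file_digest(path, algorithm) != digest:
            bad.append(name)
    return bad
```

Files reported by `find_bad_files` can be deleted and downloaded again.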
