S3 PODAAC Access Denied

trevorskaggs

S3 PODAAC Access Denied

by trevorskaggs » Fri Jun 23, 2023 11:18 am America/New_York

I'm trying to access the GRACE product found here: https://podaac.jpl.nasa.gov/dataset/TELLUS_GRAC_L3_CSR_RL06_LND_v04

Following the S3 documentation (https://archive.podaac.earthdata.nasa.gov/s3credentialsREADME), I have built a `get_credentials` method that passes in my credentials, works through the redirects, and ultimately yields the AWS credentials from the credentials endpoint (https://archive.podaac.earthdata.nasa.gov/s3credentials).

After grabbing the credentials, I set up a boto3 S3 client with the following code:
import boto3

creds = get_credentials()  # my own helper, described above
client = boto3.client(
    's3',
    aws_access_key_id=creds["accessKeyId"],
    aws_secret_access_key=creds["secretAccessKey"],
    aws_session_token=creds["sessionToken"],
    region_name="us-west-2"
)
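
For reference, the mapping from the endpoint's JSON keys to boto3's keyword arguments can be kept in one small helper that also fails early when a key is absent. This is only a sketch: `to_boto3_kwargs` is a hypothetical name, and it assumes the response carries exactly the `accessKeyId`, `secretAccessKey` and `sessionToken` keys used above.

```python
# Hypothetical helper: converts the s3credentials JSON into boto3.client() kwargs.
REQUIRED_KEYS = ("accessKeyId", "secretAccessKey", "sessionToken")


def to_boto3_kwargs(creds, region="us-west-2"):
    """Return keyword arguments suitable for boto3.client('s3', **kwargs)."""
    missing = [key for key in REQUIRED_KEYS if key not in creds]
    if missing:
        # An unexpected response body (for example an error message)
        # would not contain the credential keys.
        raise KeyError(f"credentials response is missing: {', '.join(missing)}")
    return {
        "aws_access_key_id": creds["accessKeyId"],
        "aws_secret_access_key": creds["secretAccessKey"],
        "aws_session_token": creds["sessionToken"],
        "region_name": region,
    }
```

With this in place the client call would become `boto3.client("s3", **to_boto3_kwargs(creds))`. It does not change what S3 returns; it only rules out a malformed credentials response as the cause.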

The session is established, but when I try to list the objects in S3 I get this error: `ClientError: An error occurred (AccessDenied) when calling the ListObjectsV2 operation: Access Denied`

Command: client.list_objects_v2(Bucket='podaac-ops-cumulus-protected', Prefix="TELLUS_GRAC_L3_CSR_RL06_LND_v04/")

A few coworkers tried the same approach and configuration with the same result, so it does not appear to be an issue with my specific account. I am able to download the tiles from the Earthdata Search UI, but I need programmatic access for Lambda function calls.
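
Since the goal is to call this from Lambda, it may be worth ruling out a region mismatch before anything else. As far as I understand PO.DAAC's guidance, direct S3 access is intended for compute running in us-west-2. The sketch below reads the `AWS_REGION` and `AWS_DEFAULT_REGION` environment variables, which Lambda sets for a running function; `running_in_region` is a hypothetical helper name, and the us-west-2 requirement is an assumption to confirm against the PO.DAAC documentation.

```python
import os


def running_in_region(required="us-west-2", env=None):
    """Return True if the environment reports the required AWS region."""
    env = os.environ if env is None else env
    # Lambda exposes the function's region through these variables.
    region = env.get("AWS_REGION") or env.get("AWS_DEFAULT_REGION")
    return region == required


if __name__ == "__main__":
    if not running_in_region():
        print("Not running in us-west-2 (or region unknown); direct S3 access may be refused.")
```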


PODAAC - wenhaoli
Subject Matter Expert

Re: S3 PODAAC Access Denied

by PODAAC - wenhaoli » Tue Jun 27, 2023 1:13 pm America/New_York

Hi,

PO.DAAC provides some resources to help with accessing collections in the Earthdata Cloud (AWS). You can find useful information in the PO.DAAC Cookbook at https://podaac.github.io/tutorials/. Please take a look at the "How to Access Data Directly in Cloud (netCDF)" section (https://podaac.github.io/tutorials/external/Direct_S3_Access_NetCDF.html) and see whether it helps.

Thanks,

broodj3ham

Re: S3 PODAAC Access Denied

by broodj3ham » Fri Aug 25, 2023 5:35 am America/New_York

I'm having the exact same issue unfortunately. Did you find any solution?

When I follow the instructions here (https://podaac.github.io/tutorials/external/Direct_S3_Access_NetCDF.html) I can get the credentials successfully (from the 'https://archive.podaac.earthdata.nasa.gov/s3credentials' endpoint), however in the next part when using s3fs, the returned list `ssh_files` is empty (no error though...).

When I use boto3 directly, like the OP, I get an Access Denied error.
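
To tell an empty listing apart from a refused request, it can help to look at the error code rather than only the message. A minimal sketch, assuming the exception is either a botocore `ClientError` (which exposes the parsed error under `response["Error"]["Code"]`) or the `PermissionError` that s3fs, as far as I can tell, raises in its place; `describe_s3_failure` is a hypothetical helper, not part of either library.

```python
def describe_s3_failure(exc):
    """Return a short hint for an exception raised while listing a bucket."""
    # botocore's ClientError stores the parsed service error in `.response`.
    response = getattr(exc, "response", None) or {}
    code = response.get("Error", {}).get("Code")
    if code == "AccessDenied" or isinstance(exc, PermissionError):
        return "access denied: the request was signed but S3 refused it"
    if code in ("ExpiredToken", "InvalidAccessKeyId"):
        return "credentials problem: fetch a fresh set from the s3credentials endpoint"
    return f"unexpected error: {exc!r}"
```

Wrapping the `list_objects_v2` or `fs.glob` call in `try/except Exception as exc` and printing `describe_s3_failure(exc)` would show which of these cases applies.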

PODAAC - jmcnelis
Subject Matter Expert

Re: S3 PODAAC Access Denied

by PODAAC - jmcnelis » Tue Sep 05, 2023 2:05 pm America/New_York

Thanks again, broodj3ham and trevorskaggs. I can't offer any explanation for why access using boto3 is not working from your EC2 instances running in us-west-2.

We are looking into the issue. In the meantime, can you please try using this exact code snippet? This assumes you have the .netrc file set up appropriately on the same host:
import os
import s3fs
import requests
import xarray as xr

def begin_s3_direct_access(url: str="https://archive.podaac.earthdata.nasa.gov/s3credentials"):
    response = requests.get(url).json()
    return s3fs.S3FileSystem(key=response['accessKeyId'],
                             secret=response['secretAccessKey'],
                             token=response['sessionToken'],
                             client_kwargs={'region_name':'us-west-2'})

fs = begin_s3_direct_access()

short_name = "TELLUS_GRAC_L3_CSR_RL06_LND_v04"

files = sorted(fs.glob(os.path.join("podaac-ops-cumulus-protected/", short_name, "*.nc")))
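
One portability note on the last line: `os.path.join` uses the host's path separator, so on Windows it would put backslashes into the S3 pattern. A sketch of building the same pattern with `posixpath.join` instead, which always uses forward slashes (`granule_pattern` is a hypothetical helper name; the bucket and short name are the ones from the snippet):

```python
import posixpath


def granule_pattern(short_name, bucket="podaac-ops-cumulus-protected", ext="nc"):
    """Build an s3fs glob pattern with forward slashes on every platform."""
    return posixpath.join(bucket, short_name, f"*.{ext}")


if __name__ == "__main__":
    print(granule_pattern("TELLUS_GRAC_L3_CSR_RL06_LND_v04"))
```

On Linux and macOS this produces the same string as the `os.path.join` call, so it should not affect the Access Denied behaviour itself.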
Thanks again to you both for bringing the issue to our attention.

Jack

alexfore

Re: S3 PODAAC Access Denied

by alexfore » Tue Dec 05, 2023 2:27 pm America/New_York

I have exactly the same issue. I can use the s3fs interface to look at the podaac-ops-cumulus-protected bucket, but I cannot use the AWS CLI. I also have access to other (non-public, SWOT) PO.DAAC buckets that do not have this issue, so this seems like a configuration issue...
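
For comparing the CLI against s3fs with the same temporary credentials, the JSON from the s3credentials endpoint can be turned into the environment variables that the AWS CLI reads. A sketch, assuming the JSON keys quoted earlier in this thread; `to_cli_exports` is a hypothetical helper, and its output is meant to be pasted into a POSIX shell.

```python
import shlex


def to_cli_exports(creds, region="us-west-2"):
    """Render temporary credentials as shell `export` lines for the AWS CLI."""
    pairs = {
        "AWS_ACCESS_KEY_ID": creds["accessKeyId"],
        "AWS_SECRET_ACCESS_KEY": creds["secretAccessKey"],
        "AWS_SESSION_TOKEN": creds["sessionToken"],
        "AWS_DEFAULT_REGION": region,
    }
    # shlex.quote protects values that contain shell metacharacters.
    return "\n".join(
        f"export {name}={shlex.quote(value)}" for name, value in pairs.items()
    )
```

After exporting these, a command such as `aws s3 ls s3://podaac-ops-cumulus-protected/TELLUS_GRAC_L3_CSR_RL06_LND_v04/` would run with exactly the credentials that s3fs used, which makes the two results directly comparable.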

davenovelli

Re: S3 PODAAC Access Denied

by davenovelli » Wed Jan 31, 2024 6:02 pm America/New_York

I'm also getting the same error when trying to access the OSTIA-UKMO-L4-GLOB-v2.0 prefix. I'm manually logging in, copying the resulting AWS credentials JSON string into a script, parsing it, and attempting to access the bucket/prefix that way.
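
Because the credentials are temporary, a copied string can go stale between the login and the script run, so it seems worth checking for that explicitly. A minimal sketch of the parsing step: `parse_pasted_credentials` is a hypothetical helper, and both the `expiration` key and its ISO-like timestamp format are assumptions about the endpoint's response that should be checked against the README.

```python
import json
from datetime import datetime, timezone

REQUIRED_KEYS = ("accessKeyId", "secretAccessKey", "sessionToken")


def parse_pasted_credentials(text, now=None):
    """Parse a pasted JSON string; reject incomplete or expired credentials."""
    creds = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if not creds.get(key)]
    if missing:
        raise ValueError(f"pasted credentials are missing: {', '.join(missing)}")

    expiration = creds.get("expiration")  # assumed key name
    if expiration:
        # Assumed format, e.g. "2024-01-31 19:02:00+00:00".
        expires_at = datetime.fromisoformat(expiration)
        if expires_at.tzinfo is None:
            # Assumption: a timestamp without an offset is in UTC.
            expires_at = expires_at.replace(tzinfo=timezone.utc)
        current = now or datetime.now(timezone.utc)
        if expires_at <= current:
            raise ValueError(f"credentials expired at {expiration}")
    return creds
```

If this raises for the pasted string, the Access Denied error may simply be a symptom of expired credentials rather than of the bucket configuration.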

russelan

Re: S3 PODAAC Access Denied

by russelan » Thu Feb 01, 2024 10:46 am America/New_York

Has anyone found a solution to this? I copied the code that Jack posted and I have not had any luck getting it to work yet.

When I use Python 3.12, I am getting the "An error occurred (AccessDenied) when calling the ListObjectsV2 operation: Access Denied" from the "files = sorted(fs.glob(os.path.join("podaac-ops-cumulus-protected/", short_name, "*.nc")))" line.

When I use Python 3.6, the files variable comes back empty.

I believe my Earthdata account is set up correctly in my .netrc, as it is working with the "podaac-data-downloader". If I move the .netrc, I get a "json.decoder.JSONDecodeError" when I run "response = requests.get(url).json()", so I believe my .netrc is being read by Python.
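
The JSONDecodeError without a .netrc suggests the endpoint returns something other than JSON (presumably a login page) when no credentials are sent. A more direct way to confirm that the file is usable is to read it with Python's standard `netrc` module. This is a sketch: `has_edl_entry` is a hypothetical helper, and it assumes the Earthdata Login host is `urs.earthdata.nasa.gov`.

```python
import netrc

EDL_HOST = "urs.earthdata.nasa.gov"  # assumed Earthdata Login host


def has_edl_entry(netrc_path=None, host=EDL_HOST):
    """Return True if the .netrc file holds a login and password for `host`."""
    try:
        # With netrc_path=None the module reads ~/.netrc.
        entry = netrc.netrc(netrc_path).authenticators(host)
    except (FileNotFoundError, netrc.NetrcParseError):
        return False
    # authenticators() returns (login, account, password) or None.
    return entry is not None and bool(entry[0]) and bool(entry[2])


if __name__ == "__main__":
    print("Earthdata Login entry found:", has_edl_entry())
```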

egoh

Re: S3 PODAAC Access Denied

by egoh » Tue Feb 13, 2024 10:41 pm America/New_York

Hello, I'm running into the same issue as well. I started from the "Documentation" link (https://archive.podaac.earthdata.nasa.gov/s3credentialsREADME) under the Direct S3-Access option for my data product of interest (https://podaac.jpl.nasa.gov/dataset/MITgcm_LLC4320_Pre-SWOT_JPL_L4_ACC_SMST_v1.0), and wrote the following script:
```
import argparse
import base64
import json
import os
from getpass import getpass

import boto3
import requests
from botocore.exceptions import NoCredentialsError

# Default bucket and region
DEFAULT_BUCKET = "podaac-ops-cumulus-public"
DEFAULT_OBJ_PREFIX = "MITgcm_LLC4320_Pre-SWOT_JPL_L4_ACC_SMST_v1.0"
DEFAULT_REGION = "us-west-2"
S3_CREDENTIALS_ENDPOINT = "https://archive.podaac.earthdata.nasa.gov/s3credentials"


def main(bucket_name, object_prefix, local_download_path):
    creds = retrieve_credentials()
    # Create an S3 client with the temporary credentials and default region
    s3_client = boto3.client(
        "s3",
        region_name=DEFAULT_REGION,
        aws_access_key_id=creds["accessKeyId"],
        aws_secret_access_key=creds["secretAccessKey"],
        aws_session_token=creds["sessionToken"],
    )

    try:
        # List objects within a specified bucket and prefix
        response = s3_client.list_objects(Bucket=bucket_name, Prefix=object_prefix)
        if "Contents" in response:
            for item in response["Contents"]:
                key = item["Key"]
                download_file(
                    s3_client,
                    bucket_name,
                    key,
                    f"{local_download_path}/{key.split('/')[-1]}",
                )
        else:
            print("No objects found.")
    except Exception as e:
        print(f"Error listing objects in bucket {bucket_name}: {e}")


def retrieve_credentials():
    """Authenticate with EarthData and return a set of temporary S3 credentials."""
    # Prompt only when the environment variable is unset. A prompt passed as the
    # default to os.environ.get() is evaluated (and shown) unconditionally.
    username = os.environ.get("EARTHDATA_USERNAME") or input("Enter EDL username: ")
    password = os.environ.get("EARTHDATA_PASSWORD") or getpass("Enter EDL password: ")

    edl_login_url = get_edl_login_url()
    s3_cred_url = authenticate_with_edl(
        encode_edl_creds(username, password), edl_login_url
    )
    return get_temporary_aws_credentials(s3_cred_url)


def get_edl_login_url():
    """Make initial GET request to S3 credentials service and get EDL login form URL."""
    response = requests.get(S3_CREDENTIALS_ENDPOINT, allow_redirects=False)
    response.raise_for_status()
    return response.headers["location"]


def encode_edl_creds(username, password):
    """Encode EDL credentials for authorization."""
    return base64.b64encode(f"{username}:{password}".encode("ascii")).decode("ascii")


def authenticate_with_edl(encoded_credentials, edl_login_url):
    """Make POST request to EDL login form URL and return the S3 credentials URL."""
    response = requests.post(
        edl_login_url,
        data={"credentials": encoded_credentials},
        headers={"Origin": S3_CREDENTIALS_ENDPOINT},
        allow_redirects=False,
    )
    response.raise_for_status()
    return response.headers["location"]


def get_temporary_aws_credentials(s3_cred_url):
    """Make authenticated GET request and return temporary AWS credentials."""
    print(f"Obtaining S3 access token from {s3_cred_url}")
    response = requests.get(s3_cred_url, allow_redirects=False)
    final_response = requests.get(
        S3_CREDENTIALS_ENDPOINT,
        cookies={"accessToken": response.cookies["accessToken"]},
    )
    final_response.raise_for_status()
    return json.loads(final_response.content)


def download_file(s3_client, bucket, key, download_path):
    """Download a file from S3."""
    try:
        s3_client.download_file(bucket, key, download_path)
        print(f"File {key} downloaded to {download_path}")
    except NoCredentialsError:
        print("Credentials are not available or invalid")
    except Exception as e:
        print(f"Failed to download {key}: {e}")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Download files from AWS S3 with temporary credentials"
    )
    parser.add_argument(
        "--bucket",
        default=DEFAULT_BUCKET,
        help="S3 bucket name, defaults to a specific PODAAC dataset",
    )
    parser.add_argument(
        "--prefix", help="Object prefix for files to download", default=DEFAULT_OBJ_PREFIX
    )
    parser.add_argument(
        "--download-path", required=True, help="Local path to download files"
    )
    args = parser.parse_args()

    main(args.bucket, args.prefix, args.download_path)
```
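
One caveat about the listing step in the script above, separate from the access error: a single `list_objects` call returns at most 1,000 keys, so a large prefix would be truncated. A sketch of the same listing using boto3's paginator follows; `iter_keys` is a hypothetical helper name, and it takes the client as an argument so it can be used with the `s3_client` created in `main`.

```python
def iter_keys(s3_client, bucket, prefix):
    """Yield every object key under `prefix`, following pagination."""
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent from a page when nothing matches.
        for item in page.get("Contents", []):
            yield item["Key"]
```

In `main`, the loop over `response["Contents"]` could then be written as `for key in iter_keys(s3_client, bucket_name, object_prefix):`. This would not by itself change an Access Denied response.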

The following is a sample of my Conda environment:
```
python 3.10.12 hd12c33a_0_cpython conda-forge
boto3 1.29.1 py310h06a4308_0 anaconda
aws-c-auth 0.6.26 h987a71b_2 conda-forge
aws-c-cal 0.5.21 h48707d8_2 conda-forge
aws-c-common 0.8.14 h0b41bf4_0 conda-forge
aws-c-compression 0.2.16 h03acc5a_5 conda-forge
aws-c-event-stream 0.2.20 h00877a2_4 conda-forge
aws-c-http 0.7.6 hf342b9f_0 conda-forge
aws-c-io 0.13.19 h5b20300_3 conda-forge
aws-c-mqtt 0.8.6 hc4349f7_12 conda-forge
aws-c-s3 0.2.7 h909e904_1 conda-forge
aws-c-sdkutils 0.1.9 h03acc5a_0 conda-forge
aws-checksums 0.1.14 h03acc5a_5 conda-forge
aws-crt-cpp 0.19.8 hf7fbfca_12 conda-forge
aws-sdk-cpp 1.10.57 h17c43bd_8 conda-forge
requests 2.31.0 py310h06a4308_0 anaconda
requests-oauthlib 1.3.0 py_0 anaconda
s3transfer 0.7.0 py310h06a4308_0 anaconda
```

russelan

Re: S3 PODAAC Access Denied

by russelan » Wed Feb 14, 2024 9:40 am America/New_York

I wasn't able to get direct S3 access working, so I ended up looking at https://github.com/podaac/data-subscriber/tree/main/subscriber to make a custom implementation of the PO.DAAC Data Subscriber to solve my issue. Maybe you could do something similar?
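
For anyone taking the same route, downloading over HTTPS means working with HTTPS URLs rather than `s3://` paths. The sketch below converts one into the other. It rests on an assumption that should be verified against the granule links in Earthdata Search: that the file is served at `https://archive.podaac.earthdata.nasa.gov/<bucket>/<key>`. `s3_to_https` and `ARCHIVE_HOST` are hypothetical names.

```python
from urllib.parse import quote

# Assumed base URL for HTTPS downloads; verify against real granule links.
ARCHIVE_HOST = "https://archive.podaac.earthdata.nasa.gov"


def s3_to_https(s3_uri, host=ARCHIVE_HOST):
    """Map s3://<bucket>/<key> to <host>/<bucket>/<key>."""
    if not s3_uri.startswith("s3://"):
        raise ValueError(f"not an S3 URI: {s3_uri}")
    bucket_and_key = s3_uri[len("s3://"):]
    # quote() keeps "/" intact and escapes characters unsafe in a URL path.
    return f"{host}/{quote(bucket_and_key)}"
```

The resulting URL would then be fetched with `requests` using Earthdata Login credentials from .netrc, much as the credentials endpoint is fetched earlier in this thread.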
