OCSSW ancillary data server unresponsive

liamgumley
Posts: 25
Joined: Tue Aug 09, 2005 12:58 pm America/New_York
Answers: 0

OCSSW ancillary data server unresponsive

by liamgumley » Mon Aug 26, 2024 11:18 am America/New_York

The OCSSW ancillary data server has been unresponsive for more than 48 hours.

Example 1: The "Direct Data Access" link at https://oceancolor.gsfc.nasa.gov/data/find-data/ is unresponsive, i.e.
https://oceandata.sci.gsfc.nasa.gov/directdataaccess/

Example 2: Scripted ancillary data download is unresponsive (it times out) as shown below

[oper@leodbp1 ~]$ time getanc -s 2024200000000 --verbose
ancillary_data.db
Searching database: /home/oper/dbvm/apps/ocssw/var/log/ancillary_data.db

Input file: None
Sensor : None
Start time: 2024-07-18T00:00:00
End time : None

OBPG session started
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 385, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 381, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib64/python3.6/http/client.py", line 1365, in getresponse
    response.begin()
  File "/usr/lib64/python3.6/http/client.py", line 320, in begin
    version, status, reason = self._read_status()
  File "/usr/lib64/python3.6/http/client.py", line 281, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/lib64/python3.6/socket.py", line 586, in readinto
    return self._sock.recv_into(b)
  File "/usr/lib64/python3.6/ssl.py", line 1005, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/lib64/python3.6/ssl.py", line 867, in read
    return self._sslobj.read(len, buffer)
  File "/usr/lib64/python3.6/ssl.py", line 590, in read
    v = self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 601, in urlopen
    chunked=chunked)
  File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 387, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 307, in _raise_timeout
    raise ReadTimeoutError(self, url, "Read timed out. (read timeout=%s)" % timeout_value)
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='oceandata.sci.gsfc.nasa.gov', port=443): Read timed out. (read timeout=10.0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 668, in urlopen
    **response_kw)
  File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 668, in urlopen
    **response_kw)
  File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 668, in urlopen
    **response_kw)
  [Previous line repeated 2 more times]
  File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 639, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/usr/lib/python3.6/site-packages/urllib3/util/retry.py", line 399, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='oceandata.sci.gsfc.nasa.gov', port=443): Max retries exceeded with url: /api/anc_data_api/?&m=0&s=2024-07-18T00:00:00&e=2024-07-18T00:05:00&missing_tags=1 (Caused by ReadTimeoutError("HTTPSConnectionPool(host='oceandata.sci.gsfc.nasa.gov', port=443): Read timed out. (read timeout=10.0)",))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/oper/dbvm/apps/ocssw/bin/getanc", line 149, in <module>
    exit(main())
  File "/home/oper/dbvm/apps/ocssw/bin/getanc", line 142, in main
    g.findweb()
  File "/data1/oper/dbvm/apps/ocssw/bin/seadasutils/anc_utils.py", line 511, in findweb
    verbose=self.verbose
  File "/data1/oper/dbvm/apps/ocssw/bin/seadasutils/ProcUtils.py", line 105, in httpdl
    with obpgSession.get(urlStr, stream=True, timeout=timeout, headers=headers) as req:
  File "/usr/lib/python3.6/site-packages/requests/sessions.py", line 548, in get
    return self.request('GET', url, **kwargs)
  File "/usr/lib/python3.6/site-packages/requests/sessions.py", line 535, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python3.6/site-packages/requests/sessions.py", line 648, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python3.6/site-packages/requests/adapters.py", line 516, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='oceandata.sci.gsfc.nasa.gov', port=443): Max retries exceeded with url: /api/anc_data_api/?&m=0&s=2024-07-18T00:00:00&e=2024-07-18T00:05:00&missing_tags=1 (Caused by ReadTimeoutError("HTTPSConnectionPool(host='oceandata.sci.gsfc.nasa.gov', port=443): Read timed out. (read timeout=10.0)",))

real 1m0.497s
user 0m0.131s
sys 0m0.013s
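
For anyone probing this independently, the failing query URL from the MaxRetryError above can be rebuilt outside of getanc and tested with any HTTP client. A minimal sketch (parameter values are copied from the error message; the URL construction here is illustrative, not getanc's internal code):

```python
# Rebuild the ancillary API query seen in the MaxRetryError above.
# Illustrative only; getanc constructs this request internally.
from urllib.parse import urlencode

base = "https://oceandata.sci.gsfc.nasa.gov/api/anc_data_api/"
params = {"m": 0, "s": "2024-07-18T00:00:00",
          "e": "2024-07-18T00:05:00", "missing_tags": 1}
query_url = base + "?" + urlencode(params)
print(query_url)

# To actually probe the server, fetch with an explicit timeout, e.g.:
#   from urllib.request import urlopen
#   urlopen(query_url, timeout=30)
```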


oo_processing
Posts: 338
Joined: Wed Apr 06, 2005 12:11 pm America/New_York
Answers: 0
Has thanked: 10 times
Been thanked: 3 times

Re: OCSSW ancillary data server unresponsive

by oo_processing » Mon Aug 26, 2024 11:42 am America/New_York

Me, too.
Using --timeout=60 didn't solve the issue.

Also, the getanc and modis_atteph commands took ten to a hundred times the timeout setting to finish or return an error message. In the last 24 hours I made only about 210 requests.
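
When the tool's internal timeout isn't honored like this, one way to guarantee a hard upper bound on wall-clock time is to run it under a watchdog. A minimal sketch (generic, not part of OCSSW; `sleep 5` stands in for a hung getanc/modis_atteph invocation):

```python
# Hypothetical watchdog: enforce a hard wall-clock limit on a command,
# independent of whatever internal --timeout setting it has.
import subprocess

def run_with_deadline(cmd, deadline_s):
    """Run cmd; kill it and return None if it exceeds deadline_s seconds."""
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=deadline_s)
        return result.returncode
    except subprocess.TimeoutExpired:
        return None  # process was killed at the deadline

# 'sleep 5' stands in for a hung download command.
print(run_with_deadline(["sleep", "5"], deadline_s=1))  # prints None
```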

I tried to post earlier but got a "Web Page Blocked!" error.

Yuyuan

OB.DAACx - amscott
Posts: 396
Joined: Mon Jun 22, 2020 5:24 pm America/New_York
Answers: 1
Has thanked: 8 times
Been thanked: 8 times

Re: OCSSW ancillary data server unresponsive

by OB.DAACx - amscott » Mon Aug 26, 2024 4:19 pm America/New_York

Thanks for reporting your findings! The team has begun working on a solution to improve the response times.

oo_processing
Posts: 338
Joined: Wed Apr 06, 2005 12:11 pm America/New_York
Answers: 0
Has thanked: 10 times
Been thanked: 3 times

Re: OCSSW ancillary data server unresponsive

by oo_processing » Wed Aug 28, 2024 9:28 am America/New_York

I also found freezing connections to the subscription API, the command looked like:

curl --silent -L --connect-timeout 5 --retry 5 --retry-max-time 40 -d "subID=1067&sdate=2024-08-26 00:00:00&edate=2024-08-28 23:59:59&results_as_file=1" https://oceandata.sci.gsfc.nasa.gov/api/file_search
The command could freeze when my program called it. If I ran it again in a console, the results came back, but my program's curl was still frozen; I had to kill that curl process by PID before my program could move on.

I wonder if this and the anc query issue are related. While I ask our IT team to check whether it's a network issue on our end, please let me know what you find on your side.

Thanks
Yuyuan

liamgumley
Posts: 25
Joined: Tue Aug 09, 2005 12:58 pm America/New_York
Answers: 0

Re: OCSSW ancillary data server unresponsive

by liamgumley » Thu Aug 29, 2024 4:55 pm America/New_York

To be clear, the problem is not that the ancillary data server is slow. The problem is that the ancillary data server is unresponsive; it does not return any data. It just times out.

OB.DAACx - SeanBailey
Posts: 1519
Joined: Wed Sep 18, 2019 6:15 pm America/New_York
Answers: 1
Been thanked: 9 times

Re: OCSSW ancillary data server unresponsive

by OB.DAACx - SeanBailey » Fri Aug 30, 2024 7:52 am America/New_York

Liam,

Yes, there is an issue that we've not yet rooted out. While not an ideal solution, you can increase the timeout period with getanc (e.g. --timeout=90; the default is 30). It should never take 90 seconds, or even 30, or even 1, but it has been. We'll keep roto-rootering until we find and clear the clog, but I have no idea how long it will take...it's a frustrating one...
Oh and to respond to Yuyuan, it's not just the ancillary service...
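
Until the server-side fix lands, a client-side retry with exponential backoff can paper over intermittent hangs. A generic sketch (illustrative only, not part of getanc or OCSSW; the flaky function simulates a request that times out twice before succeeding):

```python
# Generic retry-with-backoff wrapper (illustrative; not OCSSW code).
import time

def with_retries(fn, attempts=4, base_delay=1.0):
    """Call fn(); on exception, sleep base_delay * 2**i and retry."""
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** i)

# Stand-in for an HTTP call that times out twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("simulated read timeout")
    return "ok"

print(with_retries(flaky, attempts=4, base_delay=0.01))  # prints ok
```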

Sean

oo_processing
Posts: 338
Joined: Wed Apr 06, 2005 12:11 pm America/New_York
Answers: 0
Has thanked: 10 times
Been thanked: 3 times

Re: OCSSW ancillary data server unresponsive

by oo_processing » Fri Aug 30, 2024 4:03 pm America/New_York

oo_processing wrote: Wed Aug 28, 2024 9:28 am America/New_York I also found freezing connections to the subscription API, the command looked like:

curl --silent -L --connect-timeout 5 --retry 5 --retry-max-time 40 -d "subID=1067&sdate=2024-08-26 00:00:00&edate=2024-08-28 23:59:59&results_as_file=1" https://oceandata.sci.gsfc.nasa.gov/api/file_search
The command could freeze when my program called it. If I ran it again in a console, the results came back, but my program's curl was still frozen; I had to kill that curl process by PID before my program could move on.

I wonder if this and the anc query issue are related. While I ask our IT team to check whether it's a network issue on our end, please let me know what you find on your side.

Thanks
Yuyuan
For the subscription API, my temporary workaround is to force curl to stop after 60 seconds. Since the hangs are random, if I don't grab this subID now, I will probably get it on the next hourly run.

curl --max-time 60 ...

It was even worse when I didn't use the 'curl -L' option; I don't know whether that is significant or just random.

Yuyuan

oo_processing
Posts: 338
Joined: Wed Apr 06, 2005 12:11 pm America/New_York
Answers: 0
Has thanked: 10 times
Been thanked: 3 times

Re: OCSSW ancillary data server unresponsive

by oo_processing » Fri Sep 20, 2024 3:51 pm America/New_York

Looks like this has been fixed. Just out of curiosity, did you find out the cause?

Yuyuan

OB.DAACx - SeanBailey
Posts: 1519
Joined: Wed Sep 18, 2019 6:15 pm America/New_York
Answers: 1
Been thanked: 9 times

Re: OCSSW ancillary data server unresponsive

by OB.DAACx - SeanBailey » Sat Sep 21, 2024 1:28 pm America/New_York

Yes, it has been fixed :)
We've made a lot of under-the-hood changes. Not all of them were related to this problem, but they allowed us to get past it. We believe the issue boiled down to a bad index on a database table.

Sean
