Would it be possible to create, for each dataset, a listing of all files and their modification times that can be retrieved in a single fetch?
I am the data catalog maintainer for Google Earth Engine. We fully mirror several ocean color datasets and would like to reingest older assets when their files are modified, but such changes are hard to detect. Currently the only way is to rescan all of the directories for each product daily, which is brittle and time-consuming.
One solution would be a single per-product listing containing all of this metadata (LP DAAC provides this for Landsat data, for example).
We are working with NASA EOSDIS to make progress on this issue in general, but in the meantime a temporary solution would be much appreciated.
You can use the file_search API to retrieve such a listing. Using your POC daily mapped files as an example:
wget --post-data="search=A20*L3m_DAY_POC_poc_4km.nc&dtype=L3m&sensor=aqua&format=json&std_only=1" https://oceandata.sci.gsfc.nasa.gov/api/file_search -O poc-daily.json
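Once you have the listing, detecting modified assets is just a diff against the metadata you stored on the previous run. A minimal sketch in Python (the JSON payload below is illustrative only; the actual field names and structure returned by file_search with format=json may differ):

```python
import json

# Hypothetical example of a file_search JSON response; field names
# ("mtime", "size") are assumptions for illustration, not the real schema.
sample = """
{
  "A2018001.L3m_DAY_POC_poc_4km.nc": {"size": 12345678, "mtime": "2018-10-02 03:15:00"},
  "A2018002.L3m_DAY_POC_poc_4km.nc": {"size": 12345999, "mtime": "2018-10-02 03:20:00"}
}
"""

listing = json.loads(sample)

# Modification times recorded during the previous sync (stored locally).
previous = {"A2018001.L3m_DAY_POC_poc_4km.nc": "2018-10-02 03:15:00"}

# Any file that is new, or whose modification time changed, needs reingestion.
to_reingest = [name for name, meta in listing.items()
               if previous.get(name) != meta["mtime"]]
print(to_reingest)
```

Here only the second file is flagged, since the first one's stored time still matches.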
Yes, dumping the entire archive might take longer than the timeout window for the API. But I dislike static files, since I have to make sure they are updated regularly; the API is always current.
Since the vast majority of the data will not change between reprocessing events (which we announce; subscribe to our mailing list to be notified of them), you can add the psdate option:
wget --post-data="search=A20*nc&dtype=L3m&sensor=aqua&format=json&std_only=1&psdate=2018-10-01" https://oceandata.sci.gsfc.nasa.gov/api/file_search -O l3-daily.json
The 'p' refers to the processed date, and the 's' (as in psdate) refers to the start of the range; an equivalent end-range parameter also exists, so you can search on a range of processed dates. See the FAQ on the file_search utility.
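For a daily incremental check, the processed-date window can be computed mechanically. A sketch (psdate follows the example above; the end-range name pedate is an assumption based on the start/end naming pattern described):

```python
from datetime import date, timedelta

def processed_date_params(days_back: int, today: date) -> dict:
    """Build file_search POST parameters restricting results to files
    processed within the last `days_back` days."""
    start = today - timedelta(days=days_back)
    return {
        "psdate": start.isoformat(),  # processed-date range start
        "pedate": today.isoformat(),  # assumed end-range parameter name
    }

params = processed_date_params(7, date(2018, 10, 8))
print(params)  # {'psdate': '2018-10-01', 'pedate': '2018-10-08'}
```

These parameters would then be appended to the --post-data string alongside search, dtype, sensor, and format.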