revision_date vs 'updated' in CMR granule search?

revision_date vs 'updated' in CMR granule search?

by earthengine_urs » Tue Aug 17, 2021 8:38 pm America/New_York

Hi,

To find the recent MODIS granules, I'm trying to use CMR searches to list all the MODIS granules for a dataset past a certain production date. This would be easiest if the results are sorted by the production date, but this does not seem to be trivial.

I can use revision_date in temporal filters and as a sort key, but the returned entries don't have a revision_date field; they have an 'updated' field instead. These two dates do not have the same semantics: e.g., in https://cmr.earthdata.nasa.gov/search/g ... ision_date the revision date is 20:08:25, which is later than the 'updated' value of the entry (20:08:24).

Suggestions welcome.

Thanks,
Simon

ASDC - ingridgs
Subject Matter Expert

Re: revision_date vs 'updated' in CMR granule search?

by ASDC - ingridgs » Thu Sep 30, 2021 2:35 pm America/New_York

In order to search for granules in a range of ProductionDate values, use the production_date parameter. See: https://cmr.earthdata.nasa.gov/search/s ... ction-date
E.g., to search for granules in your target collection with a production_date since 2020-10-16T20:08:25Z:
https://cmr.earthdata.nasa.gov/search/g ... ision_date

Note that I have replaced your use of concept_id with collection_concept_id which is more appropriate for this granule search.
Also note that I have used the temporal search syntax of "start," to indicate a search "since" the start date-time value. You may, of course, use the explicit "start, end" range as you have it in your example as well.
https://cmr.earthdata.nasa.gov/search/s ... e-searches
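
As a rough illustration of the query described above, here is a minimal Python sketch that builds such a URL. The collection concept ID used in the usage line is a made-up placeholder, the `.json` result format is just one choice, and sorting by revision_date is carried over from the original question; treat all of these as assumptions rather than the exact query linked above.

```python
from urllib.parse import urlencode

# Public CMR granule search endpoint; ".json" is one of several result formats.
CMR_GRANULES_URL = "https://cmr.earthdata.nasa.gov/search/granules.json"


def production_date_query(collection_concept_id, start, end=""):
    """Build a granule search URL filtered by production_date.

    Leaving `end` empty yields the open-ended "start," form,
    i.e. everything produced since `start`.
    """
    params = {
        # Scope the granule search to one collection.
        "collection_concept_id": collection_concept_id,
        # "start,end" range; "start," means "since start".
        "production_date": f"{start},{end}",
        # Sort key taken from the original question (assumption).
        "sort_key": "revision_date",
    }
    return f"{CMR_GRANULES_URL}?{urlencode(params)}"


# Usage (hypothetical collection ID, not a real one):
# production_date_query("C0000000000-EXAMPLE", "2020-10-16T20:08:25Z")
```

urlencode percent-encodes the colons and the comma, which should decode back to the same "start," value on the server side.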

FYI: if you are looking for the meta-metadata revision_date field, specify umm_json as your result type.
E.g.,
https://cmr.earthdata.nasa.gov/search/g ... ision_date
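
For reading the revision date out of a umm_json response, something along these lines may help. It is only a sketch: it assumes each item carries a `meta` object with `concept-id` and `revision-date` keys, which is my understanding of the umm_json layout, and the sample payload is invented for illustration (the granule ID is a placeholder).

```python
import json

# Invented sample in the assumed umm_json shape; not real CMR output.
SAMPLE_RESPONSE = """
{
  "hits": 1,
  "items": [
    {
      "meta": {
        "concept-id": "G0000000000-EXAMPLE",
        "revision-date": "2020-10-16T20:08:25Z"
      },
      "umm": {}
    }
  ]
}
"""


def revision_dates(response_text):
    """Map each granule concept ID to its meta revision date (or None)."""
    body = json.loads(response_text)
    return {
        item["meta"]["concept-id"]: item["meta"].get("revision-date")
        for item in body.get("items", [])
    }


# Usage:
# revision_dates(SAMPLE_RESPONSE)
```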

Finally, in case it's not clear: revision_date and updated both refer to the date-time the record was updated in the CMR metadata database.

Re: revision_date vs 'updated' in CMR granule search?

by ASDC - ingridgs » Fri Oct 01, 2021 2:21 pm America/New_York

Some followup questions:

1. What is the difference between concept_id and collection_concept_id and why is the latter more appropriate?

2. Is there a limit on the number of parallel CMR queries we can send you? We can sometimes generate hundreds of simultaneous queries unless I add some throttling.

Re: revision_date vs 'updated' in CMR granule search?

by ASDC - ingridgs » Fri Oct 01, 2021 2:22 pm America/New_York

"Concept ID" is the unique identifier assigned by the CMR to each of its holdings (Collections, Granules, etc.). By convention, the first character of the ID indicates the type of concept: 'C' - Collection, 'G' - Granule, etc. So, in your case, a granule search constrained to a specific CMR Collection would more accurately specify the 'collection_concept_id' of the search space, as no granule would itself have a concept id of 'C*'. The CMR assumes your intent, however, and interprets the indicated concept_id as collection_concept_id, based on the naming convention. It's simply more accurate, and less prone to possible confusion or error, to scope the granule query to a collection via 'collection_concept_id'.

See: https://cmr.earthdata.nasa.gov/search/s ... h/api.html , Find all granules for a collection.
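
To make the prefix convention concrete, a client could check the ID before building a granule query. This is only a sketch of a client-side guard; the helper names are hypothetical and it covers just the two prefixes mentioned above.

```python
# Prefixes named in this thread; other concept types exist but are not covered.
CONCEPT_PREFIXES = {"C": "collection", "G": "granule"}


def concept_type(concept_id):
    """Return the concept type implied by the first character of the ID."""
    return CONCEPT_PREFIXES.get(concept_id[:1], "unknown")


def granule_scope_params(concept_id):
    """Return query parameters that scope a granule search to a collection.

    Raises ValueError if the ID does not look like a collection ID, rather
    than relying on the server to interpret a plain concept_id.
    """
    if concept_type(concept_id) != "collection":
        raise ValueError(f"expected a collection concept ID, got {concept_id!r}")
    return {"collection_concept_id": concept_id}


# Usage (placeholder ID):
# granule_scope_params("C0000000000-EXAMPLE")
```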

No, there is no defined limit on the concurrent number of queries initiated from a single client source. We want to be flexible to support the client's needs, while monitoring the system for any issues.
The system is designed and provisioned in order to accommodate the need for clients to define their request rate as required, subject to overall performance considerations. Naturally there are finite resources available to the platform and services, and performance and stability of the entire system may be degraded for all users if the system were to be placed under "excessive" load for long periods of time.

How many concurrent threads do you want to be able to use? You said "hundreds"...can you be more specific?
What is your expected or measured rate of overall queries per sec/min/hour?
Do you have an acceptable error rate?
Do you have an acceptable search latency per request (average and/or 95th percentile)?

Do you supply a Client-Id in the header of your search requests? While the Client-Id is not required, it is helpful and we encourage its use. If you haven't done so, please see https://wiki.earthdata.nasa.gov/display ... -Client-Id , which describes its use and usefulness:
Client-Id is an additional header that allows the client to specify their name. Client Partners are strongly encouraged to use this header for the following reasons:

Helps the CMR operations team monitor query performance per client
Aids the CMR operations team in identifying clients who are attempting to contact them for assistance with a request.
Facilitates NASA in collecting information on how much traffic flows through a client provider and what kind of data interests their users.

Note this value is provided as part of the header for each request.
E.g.,
curl -v -H "Client-Id: Client Partner Name" -i https://cmr.uat.earthdata.nasa.gov/search/collections

If you are already providing the Client-Id, I can use it to examine your search history to quantify the volume and error-rate of use.
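
On the client side, the Client-Id header and some throttling can be combined in a few lines. The following is a minimal sketch using only the Python standard library; the client name and the worker count are placeholders I picked for illustration, not values recommended by the CMR team.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Placeholder client name; replace with your own.
CLIENT_ID = "example-client-name"


def build_request(url, client_id=CLIENT_ID):
    """Create a request that carries the Client-Id header."""
    return urllib.request.Request(url, headers={"Client-Id": client_id})


def fetch(url):
    """Perform one CMR request and return the raw response body."""
    with urllib.request.urlopen(build_request(url), timeout=60) as resp:
        return resp.read()


def fetch_all(urls, max_workers=4, fetch_fn=fetch):
    """Fetch many URLs with at most `max_workers` requests in flight.

    The thread pool size acts as a simple throttle; results come back
    in the same order as `urls`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_fn, urls))


# Usage:
# bodies = fetch_all(list_of_query_urls, max_workers=4)
```

Capping the pool size keeps "hundreds" of queued queries from all hitting the service at once, while the header lets the operations team attribute the traffic.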

Re: revision_date vs 'updated' in CMR granule search?

by ASDC - ingridgs » Fri Oct 01, 2021 2:22 pm America/New_York

Followup question: I plan to use either production or revision dates as a temporal filter to make sure I retrieve all records from the CMR. E.g., if I have previously read all the entries up to date X, my job's next run will ask for all entries between dates X and Y, after which I will save Y as the new date up to which we have read all the entries (for a particular collection).

But I can't know that for sure - it's possible that CMR will later add more entries with either production or revision dates earlier than Y. I can mitigate that by always looking for entries earlier than Y by, e.g., 1 day. Or 1 week?

What would you recommend as the safest approach? Is production or revision date more appropriate here? What offset from Y would be sufficient? Or maybe I should do something else entirely?

Re: revision_date vs 'updated' in CMR granule search?

by ASDC - ingridgs » Fri Oct 01, 2021 2:23 pm America/New_York

The revision date is set by the CMR itself to describe the metadata record, so it will always move forward in time. To iteratively query for updates, you can safely move your start time forward to your previous end time.

The production date, on the other hand, is set by the provider as product metadata describing the holding, and is part of the metadata record. This value could certainly fall before your previous search interval, since there is no constraint on the value; it depends solely on the needs of the data provider. The provider could update their holdings with metadata carrying earlier production date values, e.g., because they reprocessed some data or recovered some previously missing data. The updated records would have “new” revision_date values but “old” production dates. I don't think there is a way to mitigate this with an offset.

It sounds like you want to capture metadata records which have been added or modified since you last searched, in which case revision_date (or updated_since) would be more appropriate than production_date.
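
The iterative approach described above (each run starts where the previous one ended) can be sketched as follows. This is an illustration under assumptions: the function name is hypothetical, timestamps are expected to be timezone-aware, and how the checkpoint is stored between runs is left out.

```python
from datetime import datetime, timezone

# Date-time layout used in the queries earlier in this thread.
CMR_TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def revision_window(previous_end, now=None):
    """Return (revision_date parameter value, new checkpoint).

    `previous_end` is the end of the last window that was fully read.
    Both datetimes must be timezone-aware; they are converted to UTC.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    # Drop sub-second precision so the saved checkpoint matches the query text.
    start = previous_end.astimezone(timezone.utc).replace(microsecond=0)
    end = now.astimezone(timezone.utc).replace(microsecond=0)
    value = f"{start.strftime(CMR_TIME_FORMAT)},{end.strftime(CMR_TIME_FORMAT)}"
    return value, end


# Usage: pass `value` as the revision_date search parameter, read all result
# pages, then persist `checkpoint` as the start of the next run.
# value, checkpoint = revision_window(last_saved_checkpoint)
```

Because consecutive windows share a boundary timestamp, a record revised exactly at that instant might show up in two runs, so deduplicating by concept ID on the client side is probably worthwhile.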
