New Data format keyword request (ORNL DAAC and Metadata Stewardship)
-
- Posts: 10
- Joined: Mon May 17, 2021 7:51 pm America/New_York
- Location: Oak Ridge, TN
Please add "Multiple" as a data format in the GCMD Data Format keyword hierarchy (details and use cases below). This request is from the ORNL DAAC and the Metadata Stewardship team to resolve https://bugs.earthdata.nasa.gov/browse/ECSE-1845.
Alternate Label(s): none
Definition: The collection or granule contains files of more than one type.
Reference: None
Use case: The Data Format hierarchy does include "Not Provided". However, there are cases where collections and granules contain multiple files covering different formats. It is not always practical to break this out into a list of separate formats, and it is useful to be able to display "Multiple" for the file formats on a dataset landing page. This data format is the best solution identified for the ORNL DAAC to curate total dataset size in CMR in support of Web Unification.
-
- Subject Matter Expert
- Posts: 70
- Joined: Fri Feb 03, 2023 1:02 pm America/New_York
- Been thanked: 1 time
Re: New Data format keyword request (ORNL DAAC and Metadata Stewardship)
Thank you for submitting your keyword request. Staff will review and contact you as soon as they are available.
Ticket started to track request: https://bugs.earthdata.nasa.gov/browse/CMRSCI-4802
-
- User Services
- Posts: 369
- Joined: Tue Dec 03, 2019 3:26 pm America/New_York
- Been thanked: 7 times
Re: New Data format keyword request (ORNL DAAC and Metadata Stewardship)
@ORNL - bewilson_ornl I understand your use case, but how will using 'Multiple' help the user know the format of the data? It also does not help when a user wants to do a CMR search for a specific data format by provider. How many data formats are you referring to? The field 'ArchiveAndDistributionInformation' is repeatable, so you could document multiple formats.
-
- Posts: 10
- Joined: Mon May 17, 2021 7:51 pm America/New_York
- Location: Oak Ridge, TN
Re: New Data format keyword request (ORNL DAAC and Metadata Stewardship)
@tstevens -- A fair concern. However, there is already a "Not Provided" option. For a significant number of existing datasets, the individual formats have not been curated in the metadata, but they are described in the User Guide. The underlying problem is that Data Format is a required element in the ArchiveAndDistributionInformation element for UMM-C (and UMM-G). ArchiveAndDistributionInformation is the only place to provide information about the total collection size, which is valuable information, particularly for field data. ORNL DAAC dataset sizes span 9 orders of magnitude. Under Web Unification, populating ArchiveAndDistributionInformation is the only way to get dataset size on dataset landing pages. We could do this by using "Not Provided" for Data Format, but our sense was that indicating multiple formats was preferable. Those data aren't searchable by data format today.
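For illustration, a minimal sketch of a single entry that carries the total collection size alongside the requested keyword. The field names (`FileArchiveInformation`, `Format`, `TotalCollectionFileSize`, `TotalCollectionFileSizeUnit`) and the unit list are assumptions based on a reading of the UMM-C schema and should be checked against the current version. The helper names and the sample byte count are hypothetical.

```python
import json

# Units assumed to be valid for TotalCollectionFileSizeUnit (verify against UMM-C).
UNITS = ["KB", "MB", "GB", "TB", "PB"]


def size_with_unit(total_bytes):
    """Convert a byte count to (value, unit) using decimal (1000-based) steps."""
    value = total_bytes / 1000.0
    unit_index = 0
    # Step up one unit at a time until the value is below 1000 or units run out.
    while value >= 1000 and unit_index < len(UNITS) - 1:
        value /= 1000.0
        unit_index += 1
    return round(value, 2), UNITS[unit_index]


def archive_info(data_format, total_bytes):
    """Build a hypothetical ArchiveAndDistributionInformation body with one entry."""
    size, unit = size_with_unit(total_bytes)
    return {
        "FileArchiveInformation": [
            {
                "Format": data_format,  # required even when formats are not curated
                "TotalCollectionFileSize": size,
                "TotalCollectionFileSizeUnit": unit,
            }
        ]
    }


# Invented example: a 3.5 GB collection whose files span several formats.
example = archive_info("Multiple", 3_500_000_000)
print(json.dumps({"ArchiveAndDistributionInformation": example}, indent=2))
```

Because the helper steps through units, the same code covers collections from a few kilobytes up to petabytes, which matters when sizes span many orders of magnitude.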
-
- Posts: 257
- Joined: Tue Dec 03, 2019 3:31 pm America/New_York
- Has thanked: 1 time
- Been thanked: 5 times
Re: New Data format keyword request (ORNL DAAC and Metadata Stewardship)
@ORNL - bewilson_ornl I understand your case. Is the issue that there are too many formats to list in the collection? If we do add the keyword, I would suggest the more descriptive "Multiple Formats". Other options are adding an enumeration "Multiple Formats" to "Format Type" or adding Multiple Formats to the "Format Description". Just throwing it out there.
-
- Posts: 10
- Joined: Mon May 17, 2021 7:51 pm America/New_York
- Location: Oak Ridge, TN
Re: New Data format keyword request (ORNL DAAC and Metadata Stewardship)
There is a way to list multiple formats explicitly in the ArchiveAndDistributionInformation CMR element, but that also requires calculating the data volume for each format. For a substantial number of historic datasets, we do not have the specific formats for all of the files -- we can infer them and/or we can dig them out of textual descriptions in the User Guide. However, that's work that is not possible at this time. For that matter, it may not be value-added for all of the incoming heterogeneous field data for the ORNL DAAC. That file format information is generally in the User Guide (apart from the data volumes). That information in the User Guide isn't machine actionable, but the nature of the data we work with often requires that a user go through the User Guide in order to understand which files are which.
"Multiple Formats" could make sense, except that the key in the ArchiveAndDistributionInformation CMR element (which is what's involved for the ORNL DAAC in this case) is called DataFormat. So, to me, it made sense to have DataFormat="Multiple" in the metadata, which would result in a presentation on a landing page of "Data format: Multiple". Thoughts?
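The trade-off described in this post can be sketched as follows: listing formats explicitly needs one entry per format, each with its own volume, whereas the proposed keyword needs only the collection total. This is a sketch under assumptions, not the CMR implementation. The field names follow a reading of the UMM-C schema (the post refers to the format key as DataFormat), and the function names, formats, and sizes are invented for illustration.

```python
def per_format_entries(volumes_gb):
    """One entry per format; only possible when every format's volume is known."""
    return [
        {
            "Format": fmt,
            "TotalCollectionFileSize": size,
            "TotalCollectionFileSizeUnit": "GB",
        }
        for fmt, size in sorted(volumes_gb.items())
    ]


def single_entry(total_gb, data_format="Multiple"):
    """One entry for the whole collection; only the total volume is needed."""
    return [
        {
            "Format": data_format,
            "TotalCollectionFileSize": total_gb,
            "TotalCollectionFileSizeUnit": "GB",
        }
    ]


def total_size_gb(entries):
    """Sum the sizes across entries (all entries here use GB)."""
    return sum(entry["TotalCollectionFileSize"] for entry in entries)


# Hypothetical per-format volumes for a heterogeneous field dataset.
volumes = {"CSV": 1.5, "GeoTIFF": 40.0, "netCDF-4": 8.5}

explicit = per_format_entries(volumes)            # three entries, three volumes
combined = single_entry(total_size_gb(explicit))  # one entry, one total

print(len(explicit), len(combined), combined[0]["Format"])
```

Both forms report the same total, so a landing page could show the dataset size either way; the difference is the curation effort needed to produce the per-format volumes.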
-
- User Services
- Posts: 369
- Joined: Tue Dec 03, 2019 3:26 pm America/New_York
- Been thanked: 7 times
Re: New Data format keyword request (ORNL DAAC and Metadata Stewardship)
@ORNL - bewilson_ornl Given the use case and that this is a requirement for the dataset landing pages, I think adding "Multiple" as a data format would be OK. However, are there plans to go back later and do cleanup to add the specific data formats, or will the intention always be for the user to go through the User Guide in order to understand which files are which? Thanks.
-
- Posts: 10
- Joined: Mon May 17, 2021 7:51 pm America/New_York
- Location: Oak Ridge, TN
Re: New Data format keyword request (ORNL DAAC and Metadata Stewardship)
@tstevens -- I think it is highly unlikely that historic collections will be reprocessed in order to sort out file formats. That is a substantial effort requiring a lot of manual curation, and likely of very limited value.
-
- User Services
- Posts: 369
- Joined: Tue Dec 03, 2019 3:26 pm America/New_York
- Been thanked: 7 times
Re: New Data format keyword request (ORNL DAAC and Metadata Stewardship)
@ORNL - bewilson_ornl The new data format keyword has been added and will be published on Friday, May 9, during our next keyword release. Please review: