@lavanya3k I tested on the following concept IDs: C3173447237-NSIDC_CPRD. It pulled the latest version, 1.18.4! Great work! Just to confirm: does this actually cross-validate against schema version 1.18.4 (the latest version available)? It isn't just updating the schema specification value to the latest, right? Also, can we fix the output message here so it says …? Let @slesaad test this too!
@fb0023 Thanks for testing the PR. This fix retrieves the latest version of the collection (e.g., 1.18.4) and validates it against UMM-C 1.18.4 in pyQuARC. I'm not sure whether it fully addresses the CMR validation that the ESDIS team requested. I'll let @slesaad and you take over; please let me know if anything in the request is missing.
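When deciding which schema version counts as the "latest", dotted versions need to be compared numerically, not as strings: a plain string comparison ranks "1.9.0" above "1.18.4". A minimal sketch, assuming versions arrive as purely numeric dotted strings like the MetadataSpecification.Version values discussed here; `version_key` and `latest_version` are hypothetical helpers, not part of pyQuARC.

```python
from typing import Iterable, Tuple


def version_key(version: str) -> Tuple[int, ...]:
    """Turn a dotted version such as "1.18.4" into (1, 18, 4) for numeric comparison."""
    return tuple(int(part) for part in version.split("."))


def latest_version(versions: Iterable[str]) -> str:
    """Return the numerically highest dotted version (hypothetical helper)."""
    return max(versions, key=version_key)


candidates = ["1.9.0", "1.18.4", "1.17.3"]
print(max(candidates))             # naive string comparison picks 1.9.0
print(latest_version(candidates))  # numeric comparison picks 1.18.4
```

Version strings with non-numeric parts (for example a suffix like "-beta") would make `int()` raise `ValueError`; a real implementation would have to decide how to treat those.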
pyQuARC/main.py
Outdated
def _get_latest_version(self, concept_id):
    """
    Fetches the latest revision version for a given concept_id from CMR

    Args:
        concept_id (str): The concept ID to query

    Returns:
        str: The latest revision number, or None if not found
    """
    try:
        # Construct the CMR metadata URL for the concept
        url = f"{self.cmr_host}/search/concepts/{concept_id}.umm_json"
        headers = get_headers()
        response = requests.get(url, headers=headers)

        if response.status_code == 200:
            # Extract revision-id from response headers
            revision_id = response.headers.get('CMR-Revision-Id')
            return revision_id
        else:
            print(f"Warning: Could not fetch latest version for {concept_id}. Using default.")
            return None
    except Exception as e:
        print(f"Error fetching latest version for {concept_id}: {str(e)}")
        return None

def _get_collection_version(self, concept_id):
    """
    Fetch the MetadataSpecification.Version of a collection from CMR.
    Args:
        concept_id (str): The concept ID to query.

    Returns:
        str: The collection's MetadataSpecification.Version, or None if not found.
    """
    try:
        url = f"{self.cmr_host}/search/concepts/{concept_id}.umm_json"
        headers = get_headers()
        response = requests.get(url, headers=headers)

        if response.status_code == 200:
            data = response.json()
            # UMM collections have MetadataSpecification.Version
            version = data.get("MetadataSpecification", {}).get("Version")
            return version
        else:
            print(f"Warning: Could not fetch metadata for {concept_id}.")
            return None
    except Exception as e:
        print(f"Error fetching collection version for {concept_id}: {str(e)}")
        return None
|
|
The two methods have a lot in common; can you DRY (don't repeat yourself) them? I'm also not sure what the difference between these two versions is. It looks like you're only printing collection_version and not using it anywhere else; do you even need it?
Agreed. Please take a look at the updated code changes.
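On the question of what distinguishes the two "versions": as the code above reads them, the revision ID comes from the `CMR-Revision-Id` response header and changes each time CMR stores a new revision of the record, while `MetadataSpecification.Version` sits in the UMM JSON body and names the UMM-C schema version the record declares. Both come from the same request, so one lookup can return both. A minimal sketch that takes the header map and the parsed body as plain mappings so it runs without network access; `extract_collection_info` is a hypothetical name.

```python
from typing import Any, Dict, Mapping, Optional


def extract_collection_info(
    headers: Mapping[str, str], body: Mapping[str, Any]
) -> Dict[str, Optional[str]]:
    """Pull both version identifiers out of one CMR umm_json response."""
    return {
        # Revision counter of the stored record (from the HTTP response headers)
        "revision_id": headers.get("CMR-Revision-Id"),
        # UMM-C schema version the record declares (from the JSON body)
        "metadata_version": body.get("MetadataSpecification", {}).get("Version"),
    }


info = extract_collection_info(
    {"CMR-Revision-Id": "12"},                        # illustrative values only
    {"MetadataSpecification": {"Version": "1.18.4"}},
)
print(info)  # {'revision_id': '12', 'metadata_version': '1.18.4'}
```

With `requests`, `response.headers` is case-insensitive, so passing it in directly behaves like the original `.get('CMR-Revision-Id')`; the plain dict in this example is case-sensitive.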
@@ -194,24 +260,19 @@
        )
        continue
    content = content.encode()
    cmr_response = self._validate_with_cmr(concept_id, content)
    validation_errors, pyquarc_errors = checker.run(content)
    self.errors.append(
        {
            "concept_id": concept_id,
            "errors": validation_errors,
            "cmr_validation": {
                "errors": cmr_response.json().get("errors", []),
                # TODO: show warnings
                "warnings": cmr_response.json().get("warnings", [])
            },
Why did we remove CMR validation?
Do you suggest keeping cmr_validation? Please feel free to edit the code.
We have a ticket asking for CMR validation, so why remove it?
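If CMR validation stays, one option is to parse the CMR response once and record its errors and warnings next to the pyQuARC results, instead of calling `cmr_response.json()` twice as the hunk above does. A minimal sketch, assuming `_validate_with_cmr` returns a response whose JSON body may carry `errors` and `warnings` lists, as the hunk reads it; `build_error_entry` is a hypothetical helper, not existing pyQuARC code.

```python
from typing import Any, Dict, Optional


def build_error_entry(
    concept_id: str,
    validation_errors: Any,
    cmr_payload: Optional[Dict[str, Any]],
) -> Dict[str, Any]:
    """Combine pyQuARC validation errors with the CMR validation result."""
    payload = cmr_payload or {}  # an empty or missing body means CMR reported nothing
    return {
        "concept_id": concept_id,
        "errors": validation_errors,
        "cmr_validation": {
            "errors": payload.get("errors", []),
            "warnings": payload.get("warnings", []),
        },
    }


# In the loop this could be fed with (hypothetical wiring):
#   cmr_payload = cmr_response.json() if cmr_response.content else {}
#   self.errors.append(build_error_entry(concept_id, validation_errors, cmr_payload))
entry = build_error_entry("C1000000003-CDDIS", [], {"warnings": ["example warning"]})
print(entry["cmr_validation"])  # {'errors': [], 'warnings': ['example warning']}
```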
pyQuARC/main.py
Outdated
    info_type (str): Type of information to fetch.
        Options: "revision" or "metadata_version".
I don't think this is implemented?
It only returns the metadata version, not the revision.
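If the documented `info_type` option were kept, a minimal dispatch could look like the sketch below. It assumes the method first collects both values into a dict, as the code discussed further down in this thread does; `select_collection_info` and `INFO_TYPE_KEYS` are hypothetical names. Since the thread settles on returning both values at once, dropping the parameter from the docstring is the simpler fix.

```python
from typing import Dict, Optional

# Maps the documented info_type options to keys of the collected info dict
INFO_TYPE_KEYS = {"revision": "revision_id", "metadata_version": "metadata_version"}


def select_collection_info(info: Dict[str, Optional[str]], info_type: str) -> Optional[str]:
    """Return the single value named by info_type, rejecting unknown options."""
    try:
        key = INFO_TYPE_KEYS[info_type]
    except KeyError:
        raise ValueError(
            f"info_type must be one of {sorted(INFO_TYPE_KEYS)}, got {info_type!r}"
        ) from None
    return info.get(key)
```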
pyQuARC/main.py
Outdated
     Returns:
-        str: The collection's MetadataSpecification.Version, or None if not found.
+        str: The requested info (revision ID or MetadataSpecification.Version), or None if not found.
Suggested change:
-        str: The requested info (revision ID or MetadataSpecification.Version), or None if not found.
+        dict: {"revision_id": str | None, "metadata_version": str | None}. The collection's revision ID and metadata version.
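The suggested return shape can also be written as a type, so the docstring and the annotation stay in sync. A sketch assuming Python 3.8+ (`typing.TypedDict`); `CollectionInfo` and `FAILURE_RETURN_VALUE` are hypothetical names. `Optional[str]` is used instead of `str | None` because the latter is only valid at runtime from Python 3.10 onward.

```python
from typing import Optional, TypedDict


# Return type of the collection lookup; either field is None when unavailable
class CollectionInfo(TypedDict):
    revision_id: Optional[str]
    metadata_version: Optional[str]


# The failure value from the review, expressed with the same type
FAILURE_RETURN_VALUE: CollectionInfo = {"revision_id": None, "metadata_version": None}
```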
pyQuARC/main.py
Outdated
     if response.status_code != 200:
         print(f"Warning: Could not fetch data for {concept_id}. Status: {response.status_code}")
         return {"revision_id": None, "metadata_version": None}

     data = response.json() if response.content else {}
     return {
         "revision_id": response.headers.get("CMR-Revision-Id"),
         "metadata_version": data.get("MetadataSpecification", {}).get("Version"),
     }

 except Exception as e:
-    print(f"Error fetching collection version for {concept_id}: {str(e)}")
-    return None
+    # Unified error handling: return dict even on failure
+    print(f"Error fetching collection info for {concept_id}: {str(e)}")
Rewrite this method with something like this:
    failure_return_value = {"revision_id": None, "metadata_version": None}
    try:
        url = f"{self.cmr_host}/search/concepts/{concept_id}.umm_json"
        headers = get_headers()
        response = requests.get(url, headers=headers)
        response.raise_for_status()
        data = response.json() if response.content else {}
        return {
            "revision_id": response.headers.get("CMR-Revision-Id"),
            "metadata_version": data.get("MetadataSpecification", {}).get("Version"),
        }
    except Exception as e:
        # Unified error handling: return dict even on failure
        print(f"Error fetching collection info for {concept_id}: {str(e)}")
        return failure_return_value
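To check that the suggested version behaves as intended without calling CMR, the same logic can be written as a standalone function that takes the HTTP call as a parameter. A sketch under assumptions: `get_collection_info` and its `get` parameter are hypothetical; passing `functools.partial(requests.get, headers=get_headers())` would reproduce the call above. The `timeout` argument is an addition that is not in the suggestion, so a stalled CMR request cannot hang the run.

```python
from typing import Any, Callable, Dict, Optional


def get_collection_info(
    concept_id: str, cmr_host: str, get: Callable[..., Any], timeout: float = 30.0
) -> Dict[str, Optional[str]]:
    """Fetch revision ID and UMM schema version; never raise, never return None."""
    url = f"{cmr_host}/search/concepts/{concept_id}.umm_json"
    try:
        response = get(url, timeout=timeout)
        response.raise_for_status()  # non-2xx statuses take the except path below
        data = response.json() if response.content else {}
        return {
            "revision_id": response.headers.get("CMR-Revision-Id"),
            "metadata_version": data.get("MetadataSpecification", {}).get("Version"),
        }
    except Exception as e:
        # Same shape on every failure, so callers can index the result safely
        print(f"Error fetching collection info for {concept_id}: {e}")
        return {"revision_id": None, "metadata_version": None}
```

Returning the dict from both branches means a caller reading `info["metadata_version"]` never hits a `TypeError` on `None`, which the earlier `return None` in the `except` branch would have caused.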
pyQuARC/main.py
Outdated
    if version_to_use:
        print(f"Using latest revision {version_to_use} for {concept_id}")
    if metadata_version:
        print(f"Collection {concept_id} schema version: {metadata_version}")
Let's remove these print statements



Description of the code changes: During a pyQuARC run, we want to ensure that the collection retrieved by the CMR query uses the latest version of the metadata schema (e.g., UMM-C). In some cases, particularly for CDDIS, multiple versions of the same collection existed in CMR. With this fix, pyQuARC runs against the latest (most recent) version of the collection.
Example: C1000000003-CDDIS (umm-c).
Expected output: The schema version is displayed at the top of the results.
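Since the review asks to drop the ad hoc print statements, one way to still meet the expected output is to render the schema version as a header line when the results are reported. A minimal sketch; `format_results_header` and its layout are hypothetical, not pyQuARC's actual report format, and the version passed in the usage line is illustrative only.

```python
from typing import Optional


def format_results_header(concept_id: str, metadata_version: Optional[str]) -> str:
    """Build the line shown above a collection's results."""
    version = metadata_version or "unknown"  # the CMR lookup may have failed
    return f"{concept_id} (UMM-C schema version: {version})"


# Illustrative version value, not looked up from CMR
print(format_results_header("C1000000003-CDDIS", "1.18.4"))
```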