feat: add CMIP6-to-CMIP7 Data Request variable mappings #530
lewisjared merged 15 commits into main
Pull request overview
Adds bundled CMIP6→CMIP7 Data Request (DReq) variable mappings and updates Climate REF’s CMIP7 conversion + ESMValTool provider/diagnostics so CMIP7 datasets can be handled via OR-logic data requirements and correct CMIP7 DRS/filename conventions.
Changes:
- Introduces a frozen `DReqVariableMapping` model and loads a bundled `cmip6_cmip7_variable_map.json` at import time for branding/realm/compound-name lookups.
- Adds/updates conversion and filename/path generation to align with CMIP7 (MIP-DRS7) and updates CMIP7 conversion caching behavior.
- Updates ESMValTool diagnostics/recipe/config handling to accept either CMIP6 or CMIP7 inputs, plus provider setup to install pinned ESMValTool/ESMValCore via pip URLs.
Reviewed changes
Copilot reviewed 29 out of 29 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| scripts/extract-data-request-mappings.py | New script to download/filter DReq export and generate the bundled mapping JSON. |
| scripts/create-cmip7-datasets.py | Writes CMIP7-style filenames when converting sample datasets. |
| packages/climate-ref-esmvaltool/tests/unit/diagnostics/test_base.py | Updates expectations for the generated ESMValTool config (projects/search settings). |
| packages/climate-ref-esmvaltool/src/climate_ref_esmvaltool/recipe.py | Adds CMIP7 facet mapping and pins ESMValTool/ESMValCore git URLs. |
| packages/climate-ref-esmvaltool/src/climate_ref_esmvaltool/diagnostics/base.py | Adds CMIP6/CMIP7 selector helper + rewrites ESMValTool config to include CMIP7 local templates. |
| packages/climate-ref-esmvaltool/src/climate_ref_esmvaltool/diagnostics/zec.py | Adds CMIP7 alternative requirements + selects CMIP source dynamically for recipe updates. |
| packages/climate-ref-esmvaltool/src/climate_ref_esmvaltool/diagnostics/tcre.py | Adds CMIP7 alternative requirements + selects CMIP source dynamically. |
| packages/climate-ref-esmvaltool/src/climate_ref_esmvaltool/diagnostics/tcr.py | Adds CMIP7 alternative requirements + selects CMIP source dynamically. |
| packages/climate-ref-esmvaltool/src/climate_ref_esmvaltool/diagnostics/sea_ice_sensitivity.py | Adds CMIP7 alternative requirements + selects CMIP source dynamically. |
| packages/climate-ref-esmvaltool/src/climate_ref_esmvaltool/diagnostics/sea_ice_area_basic.py | Adds CMIP7 alternative requirements + selects CMIP source dynamically. |
| packages/climate-ref-esmvaltool/src/climate_ref_esmvaltool/diagnostics/regional_historical_changes.py | Adds CMIP7 requirements + CMIP7 ESGF test cases using CMIP7Request. |
| packages/climate-ref-esmvaltool/src/climate_ref_esmvaltool/diagnostics/example.py | Adds CMIP7 alternative requirements + selects CMIP source dynamically. |
| packages/climate-ref-esmvaltool/src/climate_ref_esmvaltool/diagnostics/enso.py | Adds CMIP7 alternative requirements + selects CMIP source dynamically. |
| packages/climate-ref-esmvaltool/src/climate_ref_esmvaltool/diagnostics/ecs.py | Adds CMIP7 alternative requirements + selects CMIP source dynamically. |
| packages/climate-ref-esmvaltool/src/climate_ref_esmvaltool/diagnostics/cloud_scatterplots.py | Generalizes requirements to CMIP6+CMIP7; suptitle now uses project. |
| packages/climate-ref-esmvaltool/src/climate_ref_esmvaltool/diagnostics/cloud_radiative_effects.py | Adds CMIP7 alternative requirements + selects CMIP source dynamically. |
| packages/climate-ref-esmvaltool/src/climate_ref_esmvaltool/diagnostics/climate_drivers_for_fire.py | Adds CMIP7 alternative requirements + selects CMIP source dynamically. |
| packages/climate-ref-esmvaltool/src/climate_ref_esmvaltool/diagnostics/climate_at_global_warming_levels.py | Adds CMIP7 alternative requirements + CMIP7-specific grouping/matching facets in recipe update. |
| packages/climate-ref-esmvaltool/src/climate_ref_esmvaltool/__init__.py | Switches provider setup to install ESMValTool/ESMValCore via pip_packages. |
| packages/climate-ref-core/tests/unit/test_providers.py | Adds unit test coverage for pip_packages installation behavior. |
| packages/climate-ref-core/tests/unit/test_cmip6_to_cmip7.py | Reworks tests around DReq-backed branding/realm/compound-name lookups + serialization. |
| packages/climate-ref-core/tests/unit/esgf/test_cmip7.py | Updates CMIP7 conversion tests for new filename behavior. |
| packages/climate-ref-core/src/climate_ref_core/providers.py | Replaces single dev install URL with a list of pip_packages installed post-conda-create. |
| packages/climate-ref-core/src/climate_ref_core/esgf/cmip7.py | Generates CMIP7-style output filename for cached conversions. |
| packages/climate-ref-core/src/climate_ref_core/data/cmip6_cmip7_variable_map.json | Adds bundled subset of DReq mappings shipped with the package. |
| packages/climate-ref-core/src/climate_ref_core/cmip6_to_cmip7.py | Loads bundled DReq mappings; updates branding/realm/compound-name logic; adds CMIP7 filename/path helpers. |
| changelog/519.feature.md | Changelog entry for CMIP7 support via OR-logic requirements. |
| .vscode/settings.json | Editor setting update for Python REPL smart send. |
…ppings
Replace raw dict usage with a frozen attrs class for type-safe serialisation/deserialisation of Data Request variable mappings. The class is used both in the extract script (`to_dict` for JSON output) and at load time (`from_dict` when reading the bundled JSON).
- Replace `__file__` with stable relative path in extract script metadata
- Fix bundled JSON description containing absolute path
- Validate branding suffix format before splitting in extract script
- Fix docstring to match raise-on-duplicate behavior
- Move cache check before `xr.open_dataset` in CMIP7 converter
52c7275 to e327bfa
The function was using cmip6_path.name for the output filename instead of generating a proper CMIP7 filename. Import and use create_cmip7_filename to produce correct CMIP7 DRS filenames.
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.
- Fix docstring grammar in extract script
- Add 60s timeout to `urllib.request.urlopen`
- Add `assert_not_called` checks for `mock_open`/`mock_convert` in cache test
…nstruction
`_convert_file_to_cmip7` now looks up the DReq entry using `table_id` and `variable_id` to inject `branding_suffix` and `region` into the facets before calling `create_cmip7_filename`. This prevents empty branding components in the generated filenames. Tests no longer mock `create_cmip7_filename`, instead providing `table_id` in the facets so real filename construction is exercised.
For variables where `out_name` differs from `variable_id` (e.g. tasmax -> tas), the filename and DRS path now correctly use the CMIP7 `out_name`. The `variable_id` attribute in the dataset stays as the CMIP6 identity.
- `create_cmip7_filename` and `create_cmip7_path` prefer `out_name` over `variable_id` (with fallback)
- `convert_cmip6_to_cmip7_attrs` sets `out_name` and `branded_variable` from the DReq entry
- `_convert_file_to_cmip7` injects `out_name` during DReq enrichment
- Added end-to-end tests for tasmax filename generation
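The "prefer `out_name`, fall back to `variable_id`" rule described in this commit can be sketched as follows. This is only an illustration: `filename_variable` is a hypothetical helper name, and the facets are assumed to be a plain dictionary rather than whatever structure `create_cmip7_filename` actually receives.

```python
from typing import Mapping


def filename_variable(facets: Mapping[str, object]) -> str:
    """Pick the variable name used in filenames and DRS paths (hypothetical helper).

    Prefers the CMIP7 ``out_name`` and falls back to ``variable_id`` when
    ``out_name`` is missing or empty.
    """
    return str(facets.get("out_name") or facets["variable_id"])


# tasmax keeps its CMIP6 identity in variable_id but is written out as "tas".
enriched = {"variable_id": "tasmax", "out_name": "tas"}
# Facets that were never enriched from the DReq only carry variable_id.
plain = {"variable_id": "pr"}

print(filename_variable(enriched))
print(filename_variable(plain))
```

Keeping the fallback means facets that were not enriched from the DReq still produce a usable name.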
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.
str(cmip7_facets.get("frequency", "mon")),
str(cmip7_facets.get("variable_id", "tas")),
str(cmip7_facets.get("grid_label", "gn")),
output_file is now generated from facets via create_cmip7_filename(cmip7_facets) but no time_range is passed. For multi-file datasets split by time (typical on ESGF), every slice for the same variable/experiment will map to the same CMIP7 filename and collide/overwrite in the cache directory. Consider extracting the time range from cmip6_path.name (or from the dataset’s time coordinate) and passing it through to create_cmip7_filename, or otherwise incorporating the original CMIP6 timerange suffix to keep filenames unique per file.
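One way to read the first suggestion above is sketched below: pull the trailing time range out of the CMIP6 filename and carry it over so each time slice gets its own output name. The helper names `extract_time_range` and `with_time_range` are hypothetical and are not part of the PR. The regular expression assumes the usual CMIP6 `_<start>-<end>.nc` suffix, and the `converted.nc` base name is a stand-in for whatever `create_cmip7_filename` returns.

```python
import re
from pathlib import Path
from typing import Optional

# Assumption: CMIP6 filenames end with an optional "_<start>-<end>" block
# of digits (e.g. "_185001-194912") just before the ".nc" extension.
_TIME_RANGE = re.compile(r"_(\d{4,12}-\d{4,12})\.nc$")


def extract_time_range(cmip6_path: Path) -> Optional[str]:
    """Return the time range suffix of a CMIP6 filename, or None if absent."""
    match = _TIME_RANGE.search(cmip6_path.name)
    return match.group(1) if match else None


def with_time_range(base_filename: str, time_range: Optional[str]) -> str:
    """Append a time range to a filename that lacks one (hypothetical helper)."""
    if time_range is None:
        return base_filename
    stem = base_filename[: -len(".nc")] if base_filename.endswith(".nc") else base_filename
    return f"{stem}_{time_range}.nc"


slices = [
    Path("tas_Amon_ACCESS-ESM1-5_historical_r1i1p1f1_gn_185001-194912.nc"),
    Path("tas_Amon_ACCESS-ESM1-5_historical_r1i1p1f1_gn_195001-201412.nc"),
]
for path in slices:
    print(with_time_range("converted.nc", extract_time_range(path)))
```

With the suffix carried over, the two slices no longer map to the same cache filename. Fixed-frequency files without a time range are left unchanged.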
- Make `_get_dreq_entry` a public API (`get_dreq_entry`) for cross-module use
- Use `out_name` in DRS path construction in `_convert_file_to_cmip7`
- Use DReq region instead of hardcoded 'glb' in `convert_cmip6_to_cmip7_attrs`
- Fix pytest parametrize ids to use `pytest.param` with explicit id strings
- Add tests for `out_name` != `variable_id` (tasmax/tasmin) and non-glb region (ImonAnt)
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
packages/climate-ref-core/src/climate_ref_core/esgf/cmip7.py:71
- The inline comment describing the CMIP7 DRS path still says the variable component is `{variable_id}`, but the code now uses `out_name` (and the core `create_cmip7_path` uses `out_name` too). Update the comment to avoid documenting the wrong facet and confusing future changes.
# Build CMIP7 DRS path
# CMIP7 DRS: {activity_id}/{institution_id}/{source_id}/{experiment_id}/
# {variant_label}/{frequency}/{variable_id}/{grid_label}/{version}
# Ensure all facet values are strings (some may be integers from metadata)
…ments
* origin/dr-mappings:
  - fix: remove redundant information from breaking changelog entry for Data Request API
  - docs: add breaking changelog entry for DReq API changes
  - fix: address third round of PR review comments
  - fix: use out_name from DReq for CMIP7 filenames and paths
  - fix: enrich cmip7_facets with DReq branding_suffix before filename construction
  - chore: exclude setting chnage
  - fix: address second round of PR review comments
  - fix(core): use create_cmip7_filename in _convert_file_to_cmip7
  - fix: address PR review comments
  - docs: add changelog entry for PR #530
  - refactor(core): update documentation and remove unused CMIP7 name mapping functions
  - feat(core): add DReqVariableMapping attrs class for CMIP6-to-CMIP7 mappings

Conflicts:
  - packages/climate-ref-core/src/climate_ref_core/cmip6_to_cmip7.py
  - packages/climate-ref-core/src/climate_ref_core/esgf/cmip7.py
Add format_cmip7_time_range() to format dataset time coordinates as YYYYMM-YYYYMM for monthly data and None for fx, per the timeRangeDD spec in CMIP7 Global Attributes V1.0. Update create-cmip7-datasets.py to generate proper CMIP7 filenames with time ranges instead of reusing CMIP6 filenames.
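As a rough illustration of the formatting rule in this commit message (YYYYMM-YYYYMM for monthly data, `None` for fx), the sketch below works on plain `datetime` objects rather than an xarray dataset. `format_time_range_sketch` is a hypothetical stand-in, not the PR's `format_cmip7_time_range()`, and it only covers the monthly and fixed cases named above.

```python
from datetime import datetime
from typing import Optional, Sequence


def format_time_range_sketch(times: Sequence[datetime], frequency: str) -> Optional[str]:
    """Format a time axis as YYYYMM-YYYYMM, or None when there is no time range.

    Simplified assumption: every non-fx frequency is treated like monthly
    data, so only the year and month of the first and last steps are used.
    """
    if frequency == "fx" or len(times) == 0:
        return None
    start, end = min(times), max(times)
    return f"{start:%Y%m}-{end:%Y%m}"


monthly = [datetime(1850, 1, 16), datetime(1850, 2, 15), datetime(2014, 12, 16)]
print(format_time_range_sketch(monthly, "mon"))
print(format_time_range_sketch([], "fx"))
```

Returning `None` rather than an empty string lets the caller omit the time range component from the filename entirely.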
… dr-mappings
* 'dr-mappings' of github.com:Climate-REF/climate-ref:
  - fix: apply ignore_datasets config in solver regression tests
  - fix: use prefix-only replacement in strip_path_prefix
  - fix: address PR review comments for solve helpers
  - docs: add ESGF catalog and solver regression testing to developer guide
  - docs: add changelog entry for PR #529
  - feat: add esgf_data_catalog fixture and solver regression baselines
  - feat: Add an esgf catalog with ~100k entries
  - fix: set mix_stderr attribute directly for Click 8.3+ compatibility
  - fix: remove deprecated mix_stderr parameter from CliRunner
  - feat: add CMIP7 catalog support in load_solve_catalog function
  - feat: add solve helpers for solver regression testing
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.
    Formatted time range string, or ``None`` for fixed-frequency data
    or datasets without a time coordinate.
    """
    if frequency == "fx" or "time" not in ds or len(ds["time"]) == 0:
The condition len(ds["time"]) == 0 will raise a TypeError if the time coordinate is a scalar (0-dimensional array) rather than empty. While this is an edge case, xarray allows scalar time coordinates for single-timestep datasets. Consider using ds["time"].size == 0 instead, which safely handles both array-like and scalar coordinates.
- if frequency == "fx" or "time" not in ds or len(ds["time"]) == 0:
+ if frequency == "fx" or "time" not in ds or ds["time"].size == 0:
Description
Add structured CMIP6-to-CMIP7 variable mappings sourced from the CMIP7 Data Request (DReq).
In the data request there are unique mappings of cmip6_compound_name ({table_id}.{variable_id}) to cmip7_compound_name. This is now the source of truth for mapping to branded variables.
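The lookup described above can be pictured with the following minimal sketch. It uses a frozen standard-library dataclass instead of the attrs class from the PR, and `VariableMappingSketch`, its fields, `build_lookup`, and the placeholder CMIP7 names are all assumptions made for illustration. The real `DReqVariableMapping` carries more fields, and real CMIP7 compound names come from the Data Request.

```python
from dataclasses import asdict, dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class VariableMappingSketch:
    """Immutable stand-in for one DReq mapping entry (hypothetical fields)."""

    table_id: str
    variable_id: str
    cmip7_compound_name: str

    @property
    def cmip6_compound_name(self) -> str:
        # The CMIP6 compound name is "{table_id}.{variable_id}".
        return f"{self.table_id}.{self.variable_id}"

    def to_dict(self) -> dict:
        return asdict(self)

    @classmethod
    def from_dict(cls, data: dict) -> "VariableMappingSketch":
        return cls(**data)


def build_lookup(entries: Iterable[VariableMappingSketch]) -> Dict[str, VariableMappingSketch]:
    """Index entries by CMIP6 compound name, rejecting duplicate keys."""
    lookup: Dict[str, VariableMappingSketch] = {}
    for entry in entries:
        key = entry.cmip6_compound_name
        if key in lookup:
            raise ValueError(f"duplicate CMIP6 compound name: {key}")
        lookup[key] = entry
    return lookup


# Placeholder CMIP7 names; real values come from the Data Request export.
entries = [
    VariableMappingSketch("Amon", "tas", "placeholder-cmip7-name-a"),
    VariableMappingSketch("Amon", "tasmax", "placeholder-cmip7-name-b"),
]
lookup = build_lookup(entries)
print(lookup["Amon.tas"].cmip7_compound_name)
```

Because the mapping is described as unique, raising on a duplicate key surfaces a corrupted or mis-filtered export early instead of silently overwriting an entry.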
Key changes:
- `extract-data-request-mappings.py` script: Downloads the DReq release export, extracts variable mappings filtered to mon/fx tables and REF provider variables, and writes the bundled JSON.
- `cmip6_cmip7_variable_map.json`: Pre-extracted subset of DReq mappings shipped with climate-ref-core.

Checklist
Please confirm that this pull request has done the following:
changelog/