This repository contains JSON schema, JSON-LD frames, contexts, and SHACL rule sets for validating CDIF metadata documents.
- Files
- Quick Start
- Validation Workflow
- RO-Crate Conversion and Validation
- Croissant Conversion
- Usage Examples
- Context Requirements
- Authoring Instances Without Prefixes
- Schema Structure
- Flattened Graph Schema
- Troubleshooting
- Composite SHACL Shapes
- SHACL Validation
- DDI-CDI Resolved Schema
- Notes
| File | Description |
|---|---|
| `CDIFDiscoverySchema.json` | JSON Schema for framed (tree) CDIF discovery profile metadata, generated by `generate_validation_schema.py` from CDIFDiscoveryProfile resolvedSchema |
| `CDIFCompleteSchema.json` | JSON Schema for framed (tree) CDIF complete profile metadata (discovery + data description + archive + provenance), generated by `generate_validation_schema.py` from CDIFcompleteProfile resolvedSchema |
| `CDIFDataDescriptionSchema.json` | JSON Schema for framed (tree) CDIF data description profile metadata (discovery + data description), generated by `generate_validation_schema.py` from CDIFDataDescriptionProfile resolvedSchema |
| `generate_validation_schema.py` | Generates framed-tree validation schemas from building block profile resolved schemas |
| `CDIF-graph-schema-2026.json` | JSON Schema for flattened JSON-LD graphs (`@graph` arrays), generated by `generate_graph_schema.py` |
| `generate_graph_schema.py` | Generates the graph schema from building block source schemas |
| `ShaclValidation/generate_shacl_shapes.py` | Generates composite SHACL shapes from building block `rules.shacl` files |
| `ShaclValidation/generate_shacl_report.py` | Generates markdown SHACL validation reports with severity grouping |
| `ShaclValidation/CDIF-Discovery-Shapes.ttl` | Composite SHACL shapes for CDIFDiscovery profile (generated by `ShaclValidation/generate_shacl_shapes.py`) |
| `ShaclValidation/CDIF-Complete-Shapes.ttl` | Composite SHACL shapes for CDIFcomplete profile (generated by `generate_shacl_shapes.py --profile complete`) |
| `CDIF-frame-2026.jsonld` | JSON-LD frame for 2026 schema |
| `CDIF-context-2026.jsonld` | JSON-LD context for authoring without namespace prefixes |
| `FrameAndValidate.py` | Python script for framing and validation |
| `croissant/ConvertToCroissant.py` | Converts CDIF JSON-LD to Croissant (mlcommons.org/croissant/1.0) format |
| `validate_building_blocks.py` | Validates building block schemas, SHACL shapes, and examples across the BB source tree |
| `validate-cdif.bat` | Windows batch script for oXygen XML Editor integration |
| `batch_validate.py` | Batch validation of CDIF metadata files across multiple file groups (JSON Schema + SHACL) |
| `validate_conformance.py` | Validates JSON-LD instances against the CDIF profiles they claim conformance to via `schema:subjectOf`/`dcterms:conformsTo`. Maps conformsTo URIs to profile/building-block schemas and reports per-file, per-profile results |
| `geocodes_harvester.py` | Harvests dataset metadata from the EarthCube GeoCodes SPARQL endpoint, extracts original JSON-LD from landing pages, and optionally converts to CDIF core or discovery profile format |
| `DCAT/dcat_to_cdif.py` | Converts DCAT JSON-LD catalogs to CDIF schema.org format. Maps DCAT/Dublin Core properties to schema.org equivalents per the CDIF DCAT implementation guide. See `DCAT/README.md` |
| File | Description |
|---|---|
| `ddi-cdi/ddi-cdi.schema_normative.json` | Full DDI-CDI normative JSON Schema (395 definitions) |
| `ddi-cdi/cls-InstanceVariable-resolved.json` | Self-contained resolved schema for DDI-CDI InstanceVariable class |
| `ddi-cdi/cls-InstanceVariable-resolved-README.md` | Documentation for the resolved schema generation process |
| File | Description |
|---|---|
| `CDIFDiscoverySchema.json` | Hand-maintained discovery schema (superseded by generated version) |
| `CDIFCompleteSchema.json` | Hand-maintained complete schema (superseded by generated version) |
| `CDIF-JSONLD-schema-2026.json` | Original all-in-one framed tree schema (superseded by CDIFDiscoverySchema + CDIFCompleteSchema) |
| `CDIF-JSONLD-schema-schemaprefix.json` | JSON Schema for CDIF Discovery profile metadata with `schema:` prefixes |
| `CDIF-frame.jsonld` | JSON-LD frame for legacy schema |
| `CDIF-context.jsonld` | Legacy JSON-LD context |
```bash
pip install PyLD jsonschema
```

```bash
# Using Python script (default: 2026 schema)
python FrameAndValidate.py my-metadata.jsonld -v
# Using Windows batch script
validate-cdif.bat my-metadata.jsonld
```

```bash
python FrameAndValidate.py my-metadata.jsonld -o framed.json -v
```

`batch_validate.py` runs both JSON Schema and SHACL validation across multiple file groups:

```bash
python batch_validate.py
```

File groups validated:
- testJSONMetadata -- 77 ADA metadata test files
- cdifbook -- 10 cdifbook example documents
- cdifProfiles -- 5 CDIF profile examples from building blocks
- adaProfiles -- 36 ADA profile examples from building blocks
Output shows per-file results for each validation type with severity-aware reporting:
- JSON Schema: PASS or FAIL
- SHACL: PASS (clean), PASS (N warnings, M info), FAIL (N violations, M warnings), or SKIP (for generated output files like `-croissant.json`, `-rocrate.json`)
Group summaries and an overall summary list all violations and schema failures.
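The SHACL status strings listed above can be derived from per-severity result counts. The following is a minimal sketch, assuming the counts have already been extracted from a SHACL validation report; the function name `shacl_status` and its parameters are hypothetical and are not taken from `batch_validate.py`.

```python
def shacl_status(violations: int, warnings: int, info: int) -> str:
    """Summarize SHACL result counts as a PASS/FAIL status string.

    Hypothetical helper for illustration; not the batch_validate.py code.
    """
    if violations > 0:
        # Any sh:Violation result fails the file.
        return f"FAIL ({violations} violations, {warnings} warnings)"
    if warnings > 0 or info > 0:
        # Warnings and info results do not fail the file.
        return f"PASS ({warnings} warnings, {info} info)"
    return "PASS (clean)"


for counts in [(0, 0, 0), (0, 3, 2), (1, 4, 0)]:
    print(counts, "->", shacl_status(*counts))
```

Treating only violations as failures matches the severity alignment described in this section: optional-but-recommended properties surface as warnings or info rather than failing a file.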
As of April 2026, validation across testJSONMetadata (77 files) and all 5 CDIF profile examples shows:
- JSON Schema: 77/77 testJSONMetadata pass against all three schemas (Discovery, DataDescription, Complete)
- Profile examples: 5/5 pass (Discovery, DiscoveryMinimal, DiscoveryComplete, DataDescription, Complete)
- SHACL Violations: 0 across all files
- SHACL Warnings/Info: All files pass with warnings/info only — these reflect optional-but-recommended properties (missing activity descriptions, contact points, physical data types, etc.)
SHACL severity levels are aligned with JSON Schema: properties that are optional in the JSON Schema are sh:Warning (not sh:Violation) in SHACL.
CDIF metadata is expressed as JSON-LD. To validate JSON-LD documents against the JSON Schema, you need to first frame the document to ensure it has the correct structure. The framing process:
- Reshapes the JSON-LD graph into a tree structure
- Ensures properties use the expected prefixes (e.g., `schema:name`)
- Embeds referenced nodes inline
- Normalizes arrays and single values
Use a JSON-LD processor to apply CDIF-frame-2026.jsonld to your metadata document.
Validate the framed output against the appropriate schema:
- `CDIFDiscoverySchema.json` -- discovery profile only
- `CDIFDataDescriptionSchema.json` -- discovery + data description
- `CDIFCompleteSchema.json` -- discovery + data description + archive + provenance (default)
RO-Crate conversion and validation tools (ConvertToROCrate.py, ValidateROCrate.py) have been moved to the CDIF packaging repository. These tools convert nested/compacted CDIF JSON-LD into RO-Crate 1.1 form via JSON-LD expand + flatten.
See the packaging repository documentation for conversion details, validation checks, and usage.
croissant/ConvertToCroissant.py converts CDIF JSON-LD metadata to Croissant (mlcommons.org/croissant/1.0) JSON-LD, an ML-oriented dataset metadata format developed by MLCommons. Both formats build on schema.org and JSON-LD, so discovery-level metadata maps directly.
```bash
# Convert a CDIF document to Croissant
python croissant/ConvertToCroissant.py input.jsonld -o output-croissant.json
# Validate the output (requires: pip install mlcroissant)
mlcroissant validate --jsonld output-croissant.json
```

See `croissant/README.md` for detailed documentation on the conversion process, property mappings, example output files, and usage options. The full property-by-property mapping is in `croissant/CDIFtoCroissant.md`.
The FrameAndValidate.py script handles the complete workflow:
```bash
# Validate with 2026 schema (default)
python FrameAndValidate.py my-metadata.jsonld -v
# Save framed output
python FrameAndValidate.py my-metadata.jsonld -o framed.json -v
# Use legacy schema
python FrameAndValidate.py my-metadata.jsonld --frame archive/CDIF-frame.jsonld --schema archive/CDIF-JSONLD-schema-schemaprefix.json -v
```

Options:
- `-v, --validate` - Validate against JSON Schema
- `-o, --output FILE` - Save framed output to file
- `--schema FILE` - Path to JSON Schema (default: `CDIFCompleteSchema.json`)
- `--frame FILE` - Path to JSON-LD frame (default: `CDIF-frame-2026.jsonld`)
The validate-cdif.bat script enables validation from within oXygen XML Editor.
- Go to Tools → External Tools → Configure...
- Click New and configure:
| Field | Value |
|---|---|
| Name | CDIF Validate |
| Command | Path to validate-cdif.bat |
| Arguments | "${cf}" |
| Working directory | (leave empty) |
- Open a JSON-LD file in oXygen
- Go to Tools → External Tools → CDIF Validate
- Results appear in the oXygen console
```
validate-cdif.bat file.jsonld            # Validate with 2026 schema
validate-cdif.bat file.jsonld --framed   # Validate + save framed output
validate-cdif.bat file.jsonld --legacy   # Use pre-2026 schema
validate-cdif.bat --help                 # Show help
```

```python
import json
from pyld import jsonld
import jsonschema

# Load the frame
with open('CDIF-frame-2026.jsonld') as f:
    frame = json.load(f)

# Load your JSON-LD metadata document
with open('my-metadata.jsonld') as f:
    doc = json.load(f)

# Load the schema
with open('CDIFCompleteSchema.json') as f:
    schema = json.load(f)

# Step 1: Frame the document
framed = jsonld.frame(doc, frame)

# Step 2: Validate against schema
try:
    jsonschema.validate(instance=framed, schema=schema)
    print("Validation successful!")
except jsonschema.ValidationError as e:
    print(f"Validation failed: {e.message}")
```

Required packages:
```bash
pip install PyLD jsonschema
```

```javascript
const jsonld = require('jsonld');
const Ajv = require('ajv');
const addFormats = require('ajv-formats');
const fs = require('fs');

async function validateCDIF(metadataPath) {
  // Load files
  const frame = JSON.parse(fs.readFileSync('CDIF-frame-2026.jsonld', 'utf8'));
  const doc = JSON.parse(fs.readFileSync(metadataPath, 'utf8'));
  const schema = JSON.parse(fs.readFileSync('CDIFCompleteSchema.json', 'utf8'));

  // Step 1: Frame the document
  const framed = await jsonld.frame(doc, frame);

  // Step 2: Validate against schema
  const ajv = new Ajv({ allErrors: true });
  addFormats(ajv);
  const validate = ajv.compile(schema);

  if (validate(framed)) {
    console.log('Validation successful!');
    return true;
  } else {
    console.log('Validation failed:', validate.errors);
    return false;
  }
}

validateCDIF('my-metadata.jsonld');
```

Required packages:

```bash
npm install jsonld ajv ajv-formats
```

Your JSON-LD metadata documents must include a `@context` with namespace prefixes. Only `schema` and `dcterms` are required at the discovery level; additional prefixes are needed depending on which optional properties are used.
Required (discovery level):

```json
{
  "@context": {
    "schema": "http://schema.org/",
    "dcterms": "http://purl.org/dc/terms/"
  }
}
```

Optional prefixes (add as needed for the properties you use):
| Prefix | IRI | When needed |
|---|---|---|
| `spdx` | `http://spdx.org/rdf/terms#` | Checksum properties on distributions |
| `dcat` | `http://www.w3.org/ns/dcat#` | `dcat:CatalogRecord` on subjectOf |
| `geosparql` | `http://www.opengis.net/ont/geosparql#` | Spatial coverage geometry |
| `prov` | `http://www.w3.org/ns/prov#` | Provenance (wasGeneratedBy) |
| `dqv` | `http://www.w3.org/ns/dqv#` | Data quality measurements |
| `cdi` | `http://ddialliance.org/Specification/DDI-CDI/1.0/RDF/` | DDI-CDI variable/data structure properties |
| `csvw` | `http://www.w3.org/ns/csvw#` | CSVW tabular data properties (data description level) |
Domain-specific metadata may also use extension namespace prefixes. For example, the XAS (X-ray absorption spectroscopy) test example uses:
| Prefix | IRI | Purpose |
|--------|-----|---------|
| `xas` | `http://cdi4exas.org/` | XAS-specific types and properties (beamline, detector, edge energy, etc.) |
| `cdifq` | `http://crossdomaininteroperability.org/cdifq/` | Placeholder namespace for data structure properties (`nColumns`, `nRows`) not yet assigned to a formal vocabulary |
The `cdifq` namespace is a temporary placeholder. Properties using it (such as row/column counts on data structures) may migrate to DDI-CDI, CSVW, or another standard vocabulary in the future. `croissant/ConvertToCroissant.py` includes `cdifq` in its output context so that these terms resolve correctly during JSON-LD processing.
### Legacy Schema Requirements
```json
{
  "@context": {
    "schema": "http://schema.org/",
    "dcterms": "http://purl.org/dc/terms/",
    "prov": "http://www.w3.org/ns/prov#",
    "dqv": "http://www.w3.org/ns/dqv#",
    "geosparql": "http://www.opengis.net/ont/geosparql#",
    "spdx": "http://spdx.org/rdf/terms#",
    "time": "http://www.w3.org/2006/time#"
  }
}
```
If you prefer to author metadata without namespace prefixes (e.g., name instead of schema:name), you can use the CDIF-context-2026.jsonld context file. This context maps unprefixed property names to their full IRIs.
```json
{
  "@context": "https://your-server.org/CDIF-context-2026.jsonld",
  "@type": "Dataset",
  "@id": "https://example.org/dataset/123",
  "name": "My Dataset",
  "description": "A sample dataset description",
  "identifier": "dataset-123",
  "dateModified": "2024-01-15",
  "url": "https://example.org/data/123",
  "license": "https://creativecommons.org/licenses/by/4.0/",
  "subjectOf": {
    "@type": ["Dataset"],
    "additionalType": ["dcat:CatalogRecord"],
    "sdDatePublished": "2024-01-15"
  }
}
```

The validation workflow handles both prefixed and unprefixed instances:
- Unprefixed instance references `CDIF-context-2026.jsonld`
- Framing with `CDIF-frame-2026.jsonld` transforms the instance
- The frame's context uses prefixed names, so the output has prefixed keys
- Validate against `CDIFCompleteSchema.json`
This means you only need one schema. The framing step normalizes all instances to the prefixed format regardless of how they were authored.
For production use, host CDIF-context-2026.jsonld at a stable URL and reference it in your instances:
```json
{
  "@context": "https://your-server.org/CDIF-context-2026.jsonld",
  ...
}
```

Or embed the context directly in your instance by copying the contents of `CDIF-context-2026.jsonld`.
The schema validates CDIF Discovery profile metadata with the following required fields:
- `@id` - Resource identifier
- `@type` - Must include `schema:Dataset`
- `@context` - JSON-LD context with required prefixes
- `schema:name` - Resource name
- `schema:identifier` - Primary identifier
- `schema:dateModified` - Last modification date
- `schema:subjectOf` - Metadata about the metadata record (requires `@type` containing `schema:Dataset` and `schema:additionalType` containing `dcat:CatalogRecord`)
- Either `schema:url` or `schema:distribution` - Access information
- Either `schema:license` or `schema:conditionsOfAccess` - Usage terms
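Before running full JSON Schema validation, the required fields listed above can be checked quickly on a framed document. The sketch below is illustrative only: the function name `missing_discovery_fields` is hypothetical, it checks top-level keys only (not the nested requirements inside `schema:subjectOf`), and it is not a substitute for validating against `CDIFDiscoverySchema.json`.

```python
def missing_discovery_fields(doc: dict) -> list:
    """Return a list of problems; an empty list means the quick check passed.

    Hypothetical pre-check for a framed (prefixed) CDIF document.
    """
    problems = []
    required = ("@id", "@type", "@context", "schema:name",
                "schema:identifier", "schema:dateModified", "schema:subjectOf")
    for key in required:
        if key not in doc:
            problems.append(f"missing {key}")

    # @type may be a single string or an array of strings.
    types = doc.get("@type", [])
    if isinstance(types, str):
        types = [types]
    if "schema:Dataset" not in types:
        problems.append("@type must include schema:Dataset")

    # Either/or requirements for access information and usage terms.
    if "schema:url" not in doc and "schema:distribution" not in doc:
        problems.append("need schema:url or schema:distribution")
    if "schema:license" not in doc and "schema:conditionsOfAccess" not in doc:
        problems.append("need schema:license or schema:conditionsOfAccess")
    return problems


example = {
    "@context": {"schema": "http://schema.org/"},
    "@id": "https://example.org/dataset/123",
    "@type": ["schema:Dataset"],
    "schema:name": "My Dataset",
    "schema:identifier": "dataset-123",
    "schema:dateModified": "2024-01-15",
    "schema:subjectOf": {},
    "schema:url": "https://example.org/data/123",
}
print(missing_discovery_fields(example))
```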
The 2026 schema adds support for:
Variables (schema:variableMeasured):
- Items are `anyOf` PropertyValue-based (`cdifVariableMeasured`) or `schema:StatisticalVariable`
- PropertyValue variables: typed as `schema:PropertyValue` with DDI-CDI extensions (`cdi:intendedDataType`, `cdi:simpleUnitOfMeasure`, `cdi:describedUnitOfMeasure`, `cdi:uses`, `cdi:role`)
- `cdi:role` -- enum: `MeasureComponent`, `AttributeComponent`, `DimensionComponent`, `DescriptorComponent`, `ReferenceValueComponent`
- StatisticalVariable: typed as `schema:StatisticalVariable` with `schema:statType`, `schema:measuredProperty` (required)
- `cdi:physicalDataType` is required at the data description level (CDIFDataDescription/CDIFcomplete profiles), not at discovery level
Distributions:
- `cdi:StructuredDataSet` - For structured formats (JSON, XML, HDF5, NetCDF)
- `cdi:TabularTextDataSet` - For tabular text (wide format) with CSVW properties:
  - `csvw:delimiter`, `csvw:header`, `csvw:headerRowCount`
  - `cdi:isDelimited` OR `cdi:isFixedWidth`
  - `cdi:hasPhysicalMapping` - Links variables to physical representation
- `cdi:LongStructureDataSet` - For long/narrow data format where each row is a single observation:
  - A descriptor column identifies which variable each row measures (`cdi:role: DescriptorComponent`)
  - A reference column holds the actual value (`cdi:role: ReferenceValueComponent`)
  - Optional CSVW properties (delimiter, header, etc.) and DDI-CDI physical properties
  - `cdi:hasPhysicalMapping` - Links variables to physical representation
  - SHACL rules enforce exactly one `DescriptorComponent` and at least one `ReferenceValueComponent`
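The role constraint on long-format data (exactly one `DescriptorComponent`, at least one `ReferenceValueComponent`) can be illustrated outside SHACL. This is a minimal sketch, assuming the components are available as a flat list of dictionaries that each carry a `cdi:role` string; that input shape and the function name `check_long_roles` are assumptions for illustration, not the structure the SHACL shapes actually traverse.

```python
from collections import Counter


def check_long_roles(components: list) -> list:
    """Check the long-format role rule on a list of component dicts.

    Hypothetical helper; each item is assumed to look like {"cdi:role": "..."}.
    """
    roles = Counter(c.get("cdi:role") for c in components)
    problems = []
    if roles["DescriptorComponent"] != 1:
        problems.append(
            f"expected exactly one DescriptorComponent, found {roles['DescriptorComponent']}"
        )
    if roles["ReferenceValueComponent"] < 1:
        problems.append("expected at least one ReferenceValueComponent")
    return problems


components = [
    {"cdi:role": "DescriptorComponent"},
    {"cdi:role": "ReferenceValueComponent"},
    {"cdi:role": "MeasureComponent"},
]
print(check_long_roles(components))
```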
CDIF-graph-schema-2026.json is the graph-based counterpart to the framed tree schema. It validates flattened JSON-LD documents that use @graph arrays directly, without requiring framing first. This is useful for validating JSON-LD as it naturally comes out of RDF stores or JSON-LD flatten operations.
The schema is generated by generate_graph_schema.py from the CDIF building block source schemas.
The generator reads building block schemas from the metadataBuildingBlocks/_sources/ directory (the BuildingBlockSubmodule). The location is auto-detected or can be overridden:
```bash
# Auto-detect (looks for BuildingBlockSubmodule/_sources/ relative to script)
python generate_graph_schema.py
# Explicit path
python generate_graph_schema.py --bb-dir /path/to/_sources
# Environment variable
export CDIF_BB_DIR=/path/to/_sources
python generate_graph_schema.py
# Custom output path
python generate_graph_schema.py --output my-graph-schema.json
```

```bash
# Validate a flattened JSON-LD document directly
python -c "
import json, jsonschema
with open('CDIF-graph-schema-2026.json') as f: schema = json.load(f)
with open('my-flattened.jsonld') as f: doc = json.load(f)
jsonschema.validate(doc, schema)
print('Valid')
"
```

The graph schema accepts three input forms:
- A `{"@context": {...}, "@graph": [...]}` document (the primary use case)
- A bare array of typed objects
- A single typed object
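Code that post-processes such documents often needs a uniform list of nodes regardless of which of the three forms was supplied. A minimal sketch follows; the helper name `nodes_from_input` is hypothetical and is not part of this repository.

```python
def nodes_from_input(doc):
    """Return the list of node objects for any of the three accepted forms.

    Hypothetical helper for illustration only.
    """
    if isinstance(doc, list):
        # Bare array of typed objects.
        return doc
    if isinstance(doc, dict) and "@graph" in doc:
        # {"@context": {...}, "@graph": [...]} document.
        return doc["@graph"]
    if isinstance(doc, dict) and "@type" in doc:
        # Single typed object.
        return [doc]
    raise ValueError("unrecognized input form")


graph_doc = {"@context": {}, "@graph": [{"@id": "_:a", "@type": "schema:Dataset"}]}
print(len(nodes_from_input(graph_doc)))
```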
The generated schema has this high-level structure:
- `root-graph`: validates `@context` prefix declarations + `@graph` array of nodes
- `root-object`: a nested if/then/else chain dispatching objects by `@type` to the correct type definition
- `id-reference`: shared `{"@id": "string"}` definition for cross-node references
- 24 type definitions: `type-Dataset`, `type-Person`, `type-Organization`, `type-PropertyValue`, `type-DefinedTerm`, `type-CreativeWork`, `type-DataDownload`, `type-MediaObject`, `type-WebAPI`, `type-Action`, `type-HowTo`, `type-Place`, `type-ProperInterval`, `type-MonetaryGrant`, `type-Role`, `type-Activity`, `type-QualityMeasurement`, `type-Claim`, `type-CatalogRecord`, `type-Identifier`, `type-InstanceVariable`, `type-StructuredDataSet`, `type-TabularTextDataSet`, `type-LongStructureDataSet`
Type dispatch is ordered most-specific-first (e.g., cdi:StructuredDataSet before schema:Dataset) so that subtypes are matched before their parent types.
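The effect of most-specific-first ordering can be shown with a first-match lookup. The sketch below is illustrative: the three-entry `DISPATCH` table and the `dispatch` function are hypothetical stand-ins for the generated if/then/else chain, and the pairing of `@type` values with definition names is an assumption based on the names listed above.

```python
# Ordered most-specific-first; an illustrative subset, not the generated chain.
DISPATCH = [
    ("cdi:StructuredDataSet", "type-StructuredDataSet"),
    ("schema:DataDownload", "type-DataDownload"),
    ("schema:Dataset", "type-Dataset"),
]


def dispatch(node: dict):
    """Return the first matching definition name, or None if nothing matches."""
    types = node.get("@type", [])
    if isinstance(types, str):
        types = [types]
    for type_name, definition in DISPATCH:
        if type_name in types:
            return definition
    return None


# A multi-typed node resolves to the more specific definition.
node = {"@type": ["schema:DataDownload", "cdi:StructuredDataSet"]}
print(dispatch(node))
```

If the general entries came first in the table, the multi-typed node above would be matched as a plain `type-DataDownload` and the CDI-specific constraints would never be applied.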
The generator applies these transformations when reading building block source schemas:
- External `$ref` resolution -- Cross-building-block `$ref`s (e.g., `../person/schema.yaml`) are resolved to internal `#/$defs/type-X` references
- `anyOf` alternatives -- Properties that reference other building block types get `anyOf [type-ref, id-reference]` so they accept either inline objects or `@id` cross-references
- `@type` disambiguation -- Composite types get additional type markers for dispatch (e.g., cdifCatalogRecord becomes `dcat:CatalogRecord`, identifier adds `cdi:Identifier`)
- `@context` stripping -- Context declarations are removed from non-root types (the `@context` goes on the root-graph wrapper only)
- Composite type assembly -- Complex types like `type-Dataset` merge mandatory + optional building blocks; `type-StructuredDataSet`/`type-TabularTextDataSet`/`type-LongStructureDataSet` compose dataDownload + CDI extensions
- Extended provenance -- `type-Activity` built from `cdifProv` building block, requiring multi-typed `@type: ["schema:Action", "prov:Activity"]`, merging base `generatedBy` properties (`prov:used`) with schema.org Action properties (`schema:agent`, `schema:actionProcess`, etc.). Instruments are nested within `prov:used` items via `schema:instrument` sub-key (instruments are `prov:Entity` subclasses). `type-HowTo` and `type-Claim` added as new dispatch types for methodology and assertion objects
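As an illustration of the `anyOf` alternatives transformation, the sketch below builds the schema fragment for a property that may hold either an inline object or an `@id` cross-reference. The function name is hypothetical, and the `#/$defs/id-reference` path is an assumption about where the shared definition lives; the generator's actual output may differ.

```python
import json


def accept_inline_or_reference(type_name: str) -> dict:
    """Build an anyOf fragment accepting an inline object or an @id reference.

    Hypothetical helper; not the code in generate_graph_schema.py.
    """
    return {
        "anyOf": [
            {"$ref": f"#/$defs/{type_name}"},   # inline object of the given type
            {"$ref": "#/$defs/id-reference"},   # {"@id": "..."} cross-reference
        ]
    }


print(json.dumps(accept_inline_or_reference("type-Person"), indent=2))
```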
- Missing required property
  - Ensure all required fields are present
  - Check that `schema:subjectOf` contains required nested fields
- Type mismatch
  - Properties like `schema:spatialCoverage` and `schema:temporalCoverage` expect arrays
  - Check that `@type` values use the `schema:` prefix
- Invalid @type
  - Root `@type` must include `schema:Dataset`
  - For 2026 schema, variables must include both `schema:PropertyValue` and `cdi:InstanceVariable`
- Framing issues
  - Ensure your document has proper `@id` values for node references
  - Check that the `@context` is compatible with the frame
- dcterms:conformsTo syntax
  - Must use object syntax: `[{"@id": "..."}]` not `["..."]`
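For the `dcterms:conformsTo` case, bare strings can be rewritten into object syntax before validation. A minimal sketch, with a hypothetical helper name and a placeholder URI:

```python
def normalize_conforms_to(values):
    """Rewrite bare-string conformsTo entries as {"@id": ...} objects.

    Hypothetical helper; entries already in object syntax are left unchanged.
    """
    if not isinstance(values, list):
        values = [values]
    return [v if isinstance(v, dict) else {"@id": v} for v in values]


# Placeholder URI for illustration only.
print(normalize_conforms_to(["https://example.org/profile"]))
```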
To see the framed output before validation:
```bash
python FrameAndValidate.py my-metadata.jsonld -o framed.json
```

Or in Python:

```python
framed = jsonld.frame(doc, frame)
print(json.dumps(framed, indent=2))
```

In addition to JSON Schema validation, CDIF metadata can be validated using SHACL (Shapes Constraint Language) rules. SHACL validation operates on the RDF graph and can express constraints that JSON Schema cannot -- SPARQL-based targeting, cross-node relationships, and semantic inference.
The composite SHACL shapes are compiled from modular rules.shacl files in individual building blocks in the metadataBuildingBlocks repository. Two profiles are available:
- discovery -- `ShaclValidation/CDIF-Discovery-Shapes.ttl` (64 shapes)
- complete -- `ShaclValidation/CDIF-Complete-Shapes.ttl` (76 shapes, adds provenance + data description)
Quick start:
```bash
# Validate against discovery shapes
python ShaclValidation/ShaclJSONLDContext.py my-metadata.jsonld ShaclValidation/CDIF-Discovery-Shapes.ttl
# Generate a markdown validation report
python ShaclValidation/generate_shacl_report.py my-metadata.jsonld ShaclValidation/CDIF-Complete-Shapes.ttl -o report.md
# Regenerate shapes after building block changes
python ShaclValidation/generate_shacl_shapes.py --profile discovery
python ShaclValidation/generate_shacl_shapes.py --profile complete
```

See `ShaclValidation/README.md` for detailed documentation on the SHACL tools, shapes architecture, report format, and how to add new building block shapes.
Recommendation: Use both JSON Schema and SHACL validation for comprehensive coverage. batch_validate.py runs both automatically across multiple file groups.
validate_conformance.py inspects JSON-LD files for schema:subjectOf/dcterms:conformsTo claims and validates each file against the profile schemas it claims to conform to. Supports cdifCore, CDIFDiscovery, CDIFDataDescription, CDIFcomplete, and building block schemas (provenance, manifest/archive distribution).
```bash
# Validate a directory of JSON-LD files against their claimed profiles
python validate_conformance.py testJSONMetadata/
# Summary only
python validate_conformance.py testJSONMetadata/ --summary
# Verbose per-file error details
python validate_conformance.py testJSONMetadata/ --verbose
```

Conformance URIs with `ada:` prefix are ignored. URIs are normalized (trailing slashes stripped, `dataDescription` mapped to `data_description`).
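The normalization rules just described can be sketched as a small function. This is an assumption-laden illustration: the function name is hypothetical, the example URIs are placeholders, and treating the `dataDescription` mapping as a substring replacement is a guess at how `validate_conformance.py` applies it.

```python
def normalize_conformance_uri(uri: str):
    """Normalize a conformsTo URI, or return None if it should be ignored.

    Hypothetical helper; not the code in validate_conformance.py.
    """
    if uri.startswith("ada:"):
        # ada: conformance claims are skipped.
        return None
    # Strip trailing slashes, then map dataDescription to data_description.
    return uri.rstrip("/").replace("dataDescription", "data_description")


# Placeholder URIs for illustration only.
print(normalize_conformance_uri("https://example.org/profiles/dataDescription/"))
print(normalize_conformance_uri("ada:someProfile"))
```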
geocodes_harvester.py harvests dataset metadata from the EarthCube GeoCodes catalog (~170K indexed datasets). It queries the Blazegraph SPARQL endpoint, fetches original JSON-LD from source landing pages when available, and optionally converts records to CDIF profile format.
```bash
# List publishers and dataset counts
python geocodes_harvester.py --list-publishers
# Harvest 5 records from diverse publishers, convert to CDIF Discovery
python geocodes_harvester.py --count 5 --output ./examples --cdif discovery
# Harvest from a specific publisher
python geocodes_harvester.py --publisher "PANGAEA" --count 3 --output ./examples
# Harvest without CDIF conversion (raw schema.org JSON-LD)
python geocodes_harvester.py --count 5 --output ./raw-examples
```

The CDIF conversion handles: property prefixing (`schema:`), `@context`/`@type` normalization, `@list` wrapping for creators, distribution fixes, subjectOf with conformsTo, type mappings (FundingAgency to Organization, Grant to MonetaryGrant, Croissant `sc:Dataset` to Dataset), Person name synthesis, and sameAs array normalization. All conversions are documented in each record's subjectOf description. Extra properties from the source are preserved (open-world assumption).
DCAT/dcat_to_cdif.py converts DCAT JSON-LD catalogs or individual dataset records to CDIF-conformant schema.org JSON-LD. Maps DCAT/Dublin Core properties to schema.org equivalents per the CDIF DCAT implementation guide.
```bash
# List datasets in a DCAT catalog
python DCAT/dcat_to_cdif.py catalog.jsonld --list
# Convert selected records, validate output
python DCAT/dcat_to_cdif.py catalog.jsonld --output ./examples \
    --select 0,3,5 --catalog-name "My Catalog" --catalog-url "https://example.org/" \
    --validate
```

Key mappings: `dcterms:title` → `schema:name`, `dcterms:description` → `schema:description`, `dcterms:modified` → `schema:dateModified`, `dcterms:license` → `schema:license`, `dcterms:accessRights` → `schema:conditionsOfAccess`, `dcat:keyword` → `schema:keywords`, `dcat:Distribution` → `schema:DataDownload`, `dcterms:spatial` → `schema:spatialCoverage`, `dcterms:temporal` → `schema:temporalCoverage`. Unmapped properties are preserved (open world). The converter auto-detects Discovery vs Core profile based on spatial/temporal content.
See DCAT/README.md for the full property mapping table, PSDI catalog example, and known limitations.
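The property-level part of the key mappings can be pictured as a rename table applied to each record, with unmapped keys passed through unchanged. The sketch below is a simplified illustration and not the converter itself: `rename_properties` is a hypothetical name, values are copied as-is without restructuring, and the `dcat:Distribution` to `schema:DataDownload` mapping is omitted because it is a type mapping rather than a property rename.

```python
# Property renames taken from the key mappings listed above.
DCAT_TO_SCHEMA = {
    "dcterms:title": "schema:name",
    "dcterms:description": "schema:description",
    "dcterms:modified": "schema:dateModified",
    "dcterms:license": "schema:license",
    "dcterms:accessRights": "schema:conditionsOfAccess",
    "dcat:keyword": "schema:keywords",
    "dcterms:spatial": "schema:spatialCoverage",
    "dcterms:temporal": "schema:temporalCoverage",
}


def rename_properties(record: dict) -> dict:
    """Rename mapped keys; keep unmapped keys unchanged (open world)."""
    return {DCAT_TO_SCHEMA.get(key, key): value for key, value in record.items()}


record = {
    "dcterms:title": "My Dataset",
    "dcat:keyword": ["water", "quality"],
    "ex:custom": "kept as-is",  # hypothetical unmapped property
}
print(rename_properties(record))
```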
The MetadataExamples/ directory contains sample CDIF JSON-LD documents for testing:
| File | Technique | Description |
|---|---|---|
| `tof-htk9-f770.json` | ToF-SIMS | Time-of-flight mass spectrometry particle analysis |
| `xrd-2j0t-gq80.json` | XRD | X-ray diffraction |
| `xanes-2arx-b516.json` | XANES | X-ray absorption near-edge structure |
| `yv1f-jb20.json` | -- | General dataset |
| `test_se_na2so4-testschemaorg-cdiv3.json` | XAS | X-ray absorption spectroscopy with DDI-CDI data structure (WideDataStructure, InstanceVariable, ValueMapping). Uses `xas:` and `cdifq:` extension namespaces |
| `nwis-water-quality-longdata.json` | Water Quality | NWIS groundwater nutrient analysis (464 rows, 20 columns) in `cdi:LongStructureDataSet` long (narrow) format with DescriptorComponent/ReferenceValueComponent roles, `cdi:hasPhysicalMapping`, and 5 MeasureComponent domain variables. Validates against graph schema (`CDIF-graph-schema-2026.json`) |
| `prov-ocean-temp-example.json` | Ocean Temperature | Extended provenance example demonstrating cdifProv building block: action chaining (`schema:object`/`schema:result`), multi-typed `["schema:Action", "prov:Activity"]` activities, agents with Role wrappers, inline `schema:HowTo` methodology via `schema:actionProcess` with 3 steps, diverse instruments, facility location, and backward-compatible `prov:used`. Validates against graph schema |
Corresponding Croissant output files are in the croissant/ directory.
The ddi-cdi/cls-InstanceVariable-resolved.json file is a standalone JSON Schema (Draft 2020-12) for the DDI-CDI InstanceVariable class, derived from ddi-cdi/ddi-cdi.schema_normative.json. It resolves all $ref references into a self-contained schema suitable for use in editors like oXygen without needing the full 395-definition DDI-CDI schema.
The resolved schema applies several transformations to make the schema practical:
- Reverse properties removed - 767 `_OF_` reverse relationship properties stripped (use JSON-LD `@reverse` instead)
- `catalogDetails` removed - Catalog-level metadata omitted from all classes
- Redundant classes omitted - `cls-DataPoint`, `cls-Datum`, `cls-RepresentedVariable` simplified to IRI-only references
- XSD types inlined - Primitive types (`xsd:string`, `xsd:integer`, etc.) replaced with inline definitions
- Patterns normalized - `if`/`then`/`else` array patterns converted to consistent `anyOf`
- Frequency-based `$ref` resolution - Common definitions (>3 uses) in `$defs`; rare definitions inlined
See ddi-cdi/cls-InstanceVariable-resolved-README.md for full details on the generation process, circular reference analysis, and transformation rationale.
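The frequency-based `$ref` rule (more than 3 uses stay in `$defs`, the rest are inlined) amounts to counting references and splitting on a threshold. The sketch below illustrates only that counting step under stated assumptions: the function names are hypothetical, the sample schema is invented for the example, and the actual generation process is the one documented in the README referenced above.

```python
from collections import Counter


def count_refs(node, counts=None):
    """Recursively count every "$ref" string value in a JSON Schema tree."""
    if counts is None:
        counts = Counter()
    if isinstance(node, dict):
        ref = node.get("$ref")
        if isinstance(ref, str):
            counts[ref] += 1
        for value in node.values():
            count_refs(value, counts)
    elif isinstance(node, list):
        for item in node:
            count_refs(item, counts)
    return counts


def split_by_frequency(schema, threshold=3):
    """Return (refs to keep in $defs, refs to inline) using a > threshold rule."""
    counts = count_refs(schema)
    keep = sorted(ref for ref, n in counts.items() if n > threshold)
    inline = sorted(ref for ref, n in counts.items() if n <= threshold)
    return keep, inline


# Invented sample: A is referenced four times, B once.
sample = {
    "properties": {
        "a": {"$ref": "#/$defs/A"},
        "b": {"$ref": "#/$defs/A"},
        "c": {"items": {"$ref": "#/$defs/A"}},
        "d": {"anyOf": [{"$ref": "#/$defs/A"}, {"$ref": "#/$defs/B"}]},
    }
}
print(split_by_frequency(sample))
```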
- The framed tree schemas (`CDIFCompleteSchema.json`, `CDIFDiscoverySchema.json`) are generated from building block profile resolved schemas using `generate_validation_schema.py`. The hand-maintained originals and the all-in-one `CDIF-JSONLD-schema-2026.json` are in `archive/`.
- Legacy schema (`CDIF-JSONLD-schema-schemaprefix.json`) is still available for older documents.
- All schema.org elements require the `schema:` prefix for SHACL validation compatibility.
- The frame ensures that after framing, the output structure matches what the JSON schema expects.
- For SHACL validation, use the corresponding `.shacl` or `.ttl` files in this repository.
- `@type` flexibility: All `@type` definitions in the framed schemas use `anyOf` to accept either a string (`"schema:Dataset"`) or an array (`["schema:Dataset"]`). JSON-LD framing may compact single-element arrays to strings; `FrameAndValidate.py` recursively normalizes all `@type` values back to arrays.
- `spdx:Checksum` typing: All `spdx:checksum` objects must include `"@type": "spdx:Checksum"`. This is required by both the JSON Schema (`required: ["@type"]`) and SHACL shapes (`sh:class spdx:Checksum`).