Support multi-file spec loading, --entity filter, and --emit-spec #202

@amc-corey-cox

Summary

Add the ability to load and merge multiple transformation spec files, filter execution by entity class, and emit the resolved spec.

Multiple -T spec loading

Both map-data and validate-spec should accept multiple -T flags (and/or directories) and merge them into a single TransformationSpecification at load time.

Merge semantics:

  • class_derivations: append (order preserved per file, files in argument order)
  • enum_derivations: union by name (error on conflict)
  • slot_derivations: union by name (error on conflict)
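
The merge rules above can be sketched roughly as follows. This is an illustration only, not linkml-map's implementation: it works on plain dicts as they might come out of a YAML loader rather than on TransformationSpecification objects, it assumes class_derivations is a list and the other two sections are name-keyed mappings, and it treats any repeated name as a conflict. `merge_specs` and `SpecMergeError` are hypothetical names.

```python
from typing import Any


class SpecMergeError(ValueError):
    """Raised when the same derivation name appears in more than one spec."""


def merge_specs(specs: list[dict[str, Any]]) -> dict[str, Any]:
    """Merge already-loaded spec dicts, given in argument order (hypothetical helper)."""
    merged: dict[str, Any] = {
        "class_derivations": [],
        "enum_derivations": {},
        "slot_derivations": {},
    }
    for spec in specs:
        # class_derivations: append, preserving per-file order and file order.
        merged["class_derivations"].extend(spec.get("class_derivations") or [])
        # enum_derivations / slot_derivations: union by name, error on conflict.
        for section in ("enum_derivations", "slot_derivations"):
            for name, derivation in (spec.get(section) or {}).items():
                if name in merged[section]:
                    raise SpecMergeError(
                        f"{section}: '{name}' is defined in more than one spec"
                    )
                merged[section][name] = derivation
    return merged
```

Whether two identical definitions of the same name should also be rejected is left open by the rules above; the sketch takes the stricter reading.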

This enables modular spec authoring — keeping enum derivations in a shared file, splitting class derivations by domain, or maintaining per-variable specs that get merged at load time.

linkml-map map-data -T enums.yaml -T measurements/*.yaml -s schema.yaml input/
linkml-map validate-spec -T specs/
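
One way the -T arguments could be expanded into an ordered file list before merging is sketched below. The issue does not say how directories are traversed, so the choices here (non-recursive, only `.yaml`/`.yml` files, sorted by path) are assumptions, and `expand_spec_paths` is a hypothetical helper name.

```python
from pathlib import Path


def expand_spec_paths(args: list[str]) -> list[Path]:
    """Turn -T arguments (files and/or directories) into an ordered file list."""
    paths: list[Path] = []
    for arg in args:
        path = Path(arg)
        if path.is_dir():
            # Assumption: take only YAML files directly inside the directory,
            # in sorted order so the merge result is deterministic.
            paths.extend(
                sorted(
                    child
                    for child in path.iterdir()
                    if child.is_file() and child.suffix in {".yaml", ".yml"}
                )
            )
        else:
            # Plain file arguments keep their position in argument order.
            paths.append(path)
    return paths
```

Shell globs such as `measurements/*.yaml` are expanded by the shell before the command sees them, so they arrive here as ordinary file arguments.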

--entity filter

Both map-data and validate-spec should accept --entity <class_name> to restrict processing to class_derivations matching that name. The filter applies only to top-level class_derivations; nested object_derivations (e.g., Quantity inside MeasurementObservation) are unaffected and are processed normally as part of their parent.

linkml-map map-data -T specs/*.yaml --entity MeasurementObservation -s schema.yaml input/

This enables external parallelization (e.g., make -j) by running filtered transforms concurrently, and is useful for debugging individual entity transforms.
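
A minimal sketch of the filter, again on a plain dict with class_derivations as a list of entries that each carry a `name` key (an assumption about the loaded shape, not the actual data model). `filter_by_entity` is a hypothetical name. Because only the top-level list is filtered, anything nested inside a kept entry passes through untouched.

```python
from typing import Any, Optional


def filter_by_entity(spec: dict[str, Any], entity: Optional[str] = None) -> dict[str, Any]:
    """Keep only top-level class_derivations whose name equals `entity`."""
    if entity is None:
        # No --entity given: the spec is used as-is.
        return spec
    kept = [
        derivation
        for derivation in spec.get("class_derivations") or []
        if derivation.get("name") == entity
    ]
    # Return a shallow copy so the merged spec itself is not mutated;
    # enum_derivations and slot_derivations are carried over unchanged.
    return {**spec, "class_derivations": kept}
```

What should happen when no class_derivation matches is not specified above; this sketch simply returns an empty list, though raising an error may be the friendlier behavior.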

--emit-spec

  • On validate-spec: emit the resolved (merged + filtered) spec to stdout. Enables inspection of exactly what linkml-map would execute.
  • On map-data: emit the resolved spec to a file path (--emit-spec merged.yaml) as a side-effect alongside normal transformation. Useful for reproducibility logging.

# Preview the merged + filtered spec
linkml-map validate-spec -T specs/*.yaml --entity MeasurementObservation --emit-spec

# Transform and log what spec was used
linkml-map map-data -T specs/*.yaml --entity Foo --emit-spec resolved.yaml -s schema.yaml input/

Motivation

Transformation specs for large-scale ETL pipelines are naturally modular: per-variable, per-domain, or with shared enum definitions. Today, users must pre-compose these into a single file externally before passing it to linkml-map. This is particularly painful for enum_derivations, which need to be shared across entity transforms but must currently be duplicated into each composed file.

This follows the same pattern as other declarative config systems (Docker Compose -f, LinkML schema imports, Terraform modules): composable parts, merged at load time.

Design notes

  • The merge logic lives in spec loading (shared by both commands), not as a separate CLI command
  • --entity is distinct from --source-type (which overrides the type name passed to map_object, not which class_derivations are executed)
  • validate-spec with --entity + --emit-spec serves as a complete preview of what map-data would execute

Labels: enhancement