Summary
Add the ability to load and merge multiple transformation spec files, filter execution by entity class, and emit the resolved spec.
Multi -T spec loading
Both map-data and validate-spec should accept multiple -T flags (and/or directories) and merge them into a single TransformationSpecification at load time.
Merge semantics:
- class_derivations: append (order preserved per file, files in argument order)
- enum_derivations: union by name (error on conflict)
- slot_derivations: union by name (error on conflict)
This enables modular spec authoring — keeping enum derivations in a shared file, splitting class derivations by domain, or maintaining per-variable specs that get merged at load time.
linkml-map map-data -T enums.yaml -T measurements/*.yaml -s schema.yaml input/
linkml-map validate-spec -T specs/
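The merge semantics listed above can be sketched roughly as follows. This is an illustration only, not linkml-map's loader: it assumes each spec file has already been parsed into a plain dict, with class_derivations as a list and enum_derivations / slot_derivations as dicts keyed by name. The names merge_specs and SpecMergeError are hypothetical, and treating any repeated name as a conflict (even when both definitions are identical) is an assumption about what "error on conflict" should mean.

```python
from typing import Any

# Assumption: a parsed spec is a plain dict, not the real datamodel class.
Spec = dict[str, Any]


class SpecMergeError(ValueError):
    """Raised when two specs define the same enum or slot derivation (hypothetical)."""


def merge_specs(specs: list[Spec]) -> Spec:
    """Merge already-parsed specs, given in command-line argument order."""
    merged: Spec = {
        "class_derivations": [],
        "enum_derivations": {},
        "slot_derivations": {},
    }
    for index, spec in enumerate(specs):
        # class_derivations: append, keeping per-file order and file order.
        merged["class_derivations"].extend(spec.get("class_derivations") or [])
        # enum_derivations / slot_derivations: union by name, error on conflict.
        for key in ("enum_derivations", "slot_derivations"):
            for name, derivation in (spec.get(key) or {}).items():
                if name in merged[key]:
                    raise SpecMergeError(
                        f"{key} '{name}' is defined more than once "
                        f"(second definition in spec #{index + 1})"
                    )
                merged[key][name] = derivation
    return merged


# Usage: a shared enum file followed by a per-domain class file.
enums = {"enum_derivations": {"UnitEnum": {}}}
measurements = {"class_derivations": [{"name": "MeasurementObservation"}]}
merged = merge_specs([enums, measurements])
```

Whether identical duplicate definitions should be tolerated rather than rejected is an open design question; the sketch takes the stricter reading.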
--entity filter
Both map-data and validate-spec should accept --entity <class_name> to restrict processing to class_derivations matching that name. The filter applies only to top-level class_derivations; nested object_derivations (e.g., Quantity inside MeasurementObservation) are unaffected and are processed normally as part of their parent.
linkml-map map-data -T specs/*.yaml --entity MeasurementObservation -s schema.yaml input/
This enables external parallelization (e.g., make -j) by running filtered transforms concurrently, and is useful for debugging individual entity transforms.
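A minimal sketch of the intended filtering behavior, under the same assumption that the resolved spec is a plain dict with class_derivations as a list of dicts carrying a name key. The function name filter_by_entity is hypothetical, and raising when nothing matches is a choice made for this sketch, not something the proposal specifies.

```python
from typing import Any

# Assumption: a parsed spec is a plain dict, not the real datamodel class.
Spec = dict[str, Any]


def filter_by_entity(spec: Spec, entity: str) -> Spec:
    """Return a copy of spec keeping only top-level class_derivations named entity."""
    kept = [
        derivation
        for derivation in spec.get("class_derivations") or []
        if derivation.get("name") == entity
    ]
    if not kept:
        # Assumption: an unmatched --entity is an error rather than a no-op.
        raise ValueError(f"no class_derivation named '{entity}'")
    # Only the top-level list is narrowed. Enum and slot derivations, and
    # anything nested inside the kept class derivations, pass through as-is.
    return {**spec, "class_derivations": kept}
```

Because only the top-level list is replaced, a Quantity derivation nested inside MeasurementObservation stays attached to its parent, and the input spec is left unmodified.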
--emit-spec
- On validate-spec: emit the resolved (merged + filtered) spec to stdout. Enables inspection of exactly what linkml-map would execute.
- On map-data: emit the resolved spec to a file path (--emit-spec merged.yaml) as a side-effect alongside normal transformation. Useful for reproducibility logging.
# Preview the merged + filtered spec
linkml-map validate-spec -T specs/*.yaml --entity MeasurementObservation --emit-spec
# Transform and log what spec was used
linkml-map map-data -T specs/*.yaml --entity Foo --emit-spec resolved.yaml -s schema.yaml input/
Motivation
Transformation specs for large-scale ETL pipelines are naturally modular: per-variable, per-domain, or with shared enum definitions. Today, users must pre-compose these into a single file externally before passing it to linkml-map. This is particularly painful for enum_derivations, which need to be shared across entity transforms but currently have to be duplicated into each composed file.
This follows the same pattern as other declarative config systems (Docker Compose -f, LinkML schema imports, Terraform modules): composable parts, merged at load time.
Design notes
- The merge logic lives in spec loading (shared by both commands), not as a separate CLI command
- --entity is distinct from --source-type (which overrides the type name passed to map_object, not which class_derivations are executed)
- validate-spec with --entity + --emit-spec serves as a complete preview of what map-data would execute