Consolidate normalization pipeline #198

Open
amc-corey-cox wants to merge 2 commits into main from consolidate-normalization

@amc-corey-cox amc-corey-cox commented Apr 9, 2026

Summary

  • Compilers now receive derived_specification (with schema-inferred defaults) instead of the raw specification, eliminating a duplicate induce_missing_values call in PythonCompiler
  • Documents the two-phase normalization pipeline (load-time structural normalization vs schema-bind-time semantic induction) in the Transformer class docstring and derived_specification property
  • Fixes compiler output to include inferred populated_from and range values — previously these showed as None in compiled markdown/python output
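
A minimal sketch of why the compiler input matters, using toy stand-ins rather than the real linkml-map classes. `SlotDerivation`, `Spec`, `compile_markdown`, and the simplified `induce_missing_values` below are hypothetical and only mirror the behaviour described in the bullets above: a compiler fed the raw spec prints `None`, while one fed the derived spec prints the inferred values.

```python
from copy import deepcopy
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SlotDerivation:
    # Hypothetical stand-in for one slot derivation in a transformation spec.
    name: str
    populated_from: Optional[str] = None
    range: Optional[str] = None


@dataclass
class Spec:
    # Hypothetical stand-in for a transformation specification.
    slots: Dict[str, SlotDerivation] = field(default_factory=dict)


def induce_missing_values(spec: Spec, source_slot_ranges: Dict[str, str]) -> None:
    """Toy induction: fill populated_from and range in place.

    Assumption: an unset populated_from defaults to the slot's own name,
    and an unset range is looked up in the source schema.
    """
    for slot in spec.slots.values():
        if slot.populated_from is None:
            slot.populated_from = slot.name
        if slot.range is None:
            slot.range = source_slot_ranges.get(slot.populated_from)


def compile_markdown(spec: Spec) -> str:
    """Toy compiler: renders whatever spec it is handed, with no induction of its own."""
    return "\n".join(
        f"{s.name}: populated_from={s.populated_from}, range={s.range}"
        for s in spec.slots.values()
    )


raw = Spec(slots={"age": SlotDerivation(name="age")})

# The derived spec is a copy with inferred defaults; the raw spec is untouched.
derived = deepcopy(raw)
induce_missing_values(derived, {"age": "integer"})

raw_output = compile_markdown(raw)
derived_output = compile_markdown(derived)
```

Because induction happens once, before compilation, the toy compiler needs no induction call of its own, which is the duplication this PR removes from PythonCompiler.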

Test plan

  • All 586 tests pass (531 unit + 55 compliance)
  • Regenerated golden files for markdown and python compiler output
  • Ruff clean

Closes #124

Copilot AI review requested due to automatic review settings April 9, 2026 14:02
Compilers now receive derived_specification (with inferred defaults)
instead of the raw specification, eliminating a duplicate
induce_missing_values call in PythonCompiler. Documents the two-phase
normalization pipeline in the Transformer class docstring.

Closes #124
@amc-corey-cox amc-corey-cox force-pushed the consolidate-normalization branch from cf46a51 to 2f83dfc Compare April 9, 2026 14:06

Copilot AI left a comment

Pull request overview

This PR consolidates the transformation-spec normalization pipeline by clearly separating load-time structural normalization from schema-bind-time semantic induction, and updates compilation pathways to use the schema-derived specification so compiler outputs include inferred fields (e.g., populated_from, range).

Changes:

  • Route compiler invocations (CLI + tests) through ObjectTransformer.derived_specification instead of the raw spec.
  • Remove redundant induce_missing_values invocation from PythonCompiler.
  • Document the two-phase normalization pipeline and regenerate golden compiled markdown output.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/linkml_map/transformer/transformer.py | Documents the two-phase normalization pipeline and expands derived_specification documentation. |
| src/linkml_map/compiler/python_compiler.py | Removes internal deep-copy + induction from the python compiler so it compiles the already-derived spec. |
| src/linkml_map/cli/cli.py | CLI compile now compiles the derived specification. |
| tests/test_compliance/test_compliance_suite.py | Compliance helper now compiles the derived spec when available. |
| tests/test_compiler/test_python_compiler.py | Python compiler unit test now compiles the derived spec via ObjectTransformer. |
| tests/input/examples/personinfo_basic/output/personinfo_compiled.md | Golden output updated to reflect inferred populated_from / range values. |

Comments suppressed due to low confidence (1)

src/linkml_map/transformer/transformer.py:250

  • derived_specification can raise and/or cache a partially-induced spec when source_schemaview is unset or if induce_missing_values errors. Because _derived_specification is assigned before induction, an exception leaves the cache populated and subsequent calls will skip induction. Consider (1) returning None early when source_schemaview is not set (consistent with the | None return type), and (2) computing into a local variable and only assigning _derived_specification after induce_missing_values succeeds.
        if self._derived_specification is None:
            if self.specification is None:
                return None
            self._apply_source_schema_patches()
            self._derived_specification = deepcopy(self.specification)
            induce_missing_values(self._derived_specification, self.source_schemaview)
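
One way the suggestion above could look, as a sketch only and not the code in this PR. The `Transformer` class and `induce_missing_values` function here are simplified, hypothetical stand-ins (plain dicts instead of real spec and SchemaView objects, and the `_apply_source_schema_patches` step is omitted); the point is the pattern of inducing into a local variable and caching only after success.

```python
from copy import deepcopy


def induce_missing_values(spec, schemaview):
    """Hypothetical stand-in: mutates spec in place and may raise midway."""
    if schemaview.get("broken"):
        spec["partially_induced"] = True
        raise ValueError("induction failed")
    spec["induced"] = True


class Transformer:
    """Simplified stand-in showing only the caching logic."""

    def __init__(self, specification=None, source_schemaview=None):
        self.specification = specification
        self.source_schemaview = source_schemaview
        self._derived_specification = None

    @property
    def derived_specification(self):
        if self._derived_specification is None:
            # (1) Return None early when either input is missing.
            if self.specification is None or self.source_schemaview is None:
                return None
            # (2) Induce into a local copy; an exception here leaves the cache empty.
            derived = deepcopy(self.specification)
            induce_missing_values(derived, self.source_schemaview)
            # Cache only after induction has succeeded.
            self._derived_specification = derived
        return self._derived_specification
```

With this shape, a failed induction does not poison the cache: the next access retries from a fresh copy of the specification instead of returning a partially induced one.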
