Purpose
Implement a Newton-Schulz nonlinearity path for inner momentum updates so that they align with the Eq. (24)-style formulation.
Mandatory Reading (blocking)
The first issue comment must summarize:
- reports/NL_IMPLEMENTATION_ORACLE.md, section 6.1.1 and the optimizer gap notes
- reports/paper/NL-print.extracted.clean.txt, Eq. (24)
- src/nested_learning/optim/m3.py, the Newton-Schulz implementation
Required Code Anchors
- src/nested_learning/optim/deep.py
- src/nested_learning/optim/m3.py
- src/nested_learning/optim/factory.py
Scope
- Add inner variant muon_ns using Newton-Schulz output transform.
- Clarify difference between outer Muon optimizer and inner muon_ns memory rule in docs.
- Keep backward compatibility with current muon configs.
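A minimal NumPy sketch of what the inner muon_ns path could look like, offered as an illustration rather than the repository's implementation. It assumes the inner rule accumulates momentum and then passes it through a Newton-Schulz output transform before updating the memory. The function names, the `use_ns` flag, and the placement of the learning rate outside the transform are all hypothetical, and the classic cubic iteration is used here instead of whatever coefficients m3.py actually uses.

```python
import numpy as np


def newton_schulz_orthogonalize(g, steps=10, eps=1e-7):
    """Approximate the orthogonal polar factor of g (hypothetical helper).

    Uses the classic cubic iteration X <- 1.5 X - 0.5 X X^T X. Dividing by
    the Frobenius norm first keeps every singular value at or below 1,
    which is inside the iteration's convergence region.
    """
    x = g / (np.linalg.norm(g) + eps)
    for _ in range(steps):
        x = 1.5 * x - 0.5 * (x @ x.T @ x)
    return x


def inner_momentum_step(memory, momentum, grad, lr=0.1, beta=0.9, use_ns=False):
    """One inner update (hypothetical signature).

    use_ns=False applies the raw momentum; use_ns=True applies the
    Newton-Schulz transform to the momentum before the update.
    """
    momentum = beta * momentum + grad
    direction = newton_schulz_orthogonalize(momentum) if use_ns else momentum
    return memory - lr * direction, momentum
```

Keeping the transform behind a flag that defaults to off is one way to leave existing configs untouched: callers that never set it get the same arithmetic as before.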
Test Requirements
- Unit tests for NS path shape/stability.
- Deterministic toy-case checks.
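The checks above could be sketched as plain assert-based test functions, which pytest would collect as written. This is a self-contained illustration: `ns_transform` is a hypothetical stand-in (classic cubic Newton-Schulz in NumPy) for the real implementation, and the test names and tolerances are assumptions.

```python
import numpy as np


def ns_transform(g, steps=10, eps=1e-7):
    # Hypothetical stand-in for the real Newton-Schulz transform.
    x = g / (np.linalg.norm(g) + eps)
    for _ in range(steps):
        x = 1.5 * x - 0.5 * (x @ x.T @ x)
    return x


def test_ns_preserves_shape():
    g = np.random.default_rng(0).standard_normal((4, 6))
    assert ns_transform(g).shape == g.shape


def test_ns_is_finite_on_extreme_inputs():
    # Zero, very large, and very small inputs must not produce NaN or inf.
    base = np.random.default_rng(1).standard_normal((3, 3))
    for g in (np.zeros((3, 3)), 1e6 * base, 1e-30 * base):
        assert np.all(np.isfinite(ns_transform(g)))


def test_ns_is_deterministic():
    g = np.random.default_rng(2).standard_normal((5, 5))
    assert np.array_equal(ns_transform(g), ns_transform(g))


def test_ns_toy_cases():
    # A positive diagonal matrix has the identity as its polar factor.
    assert np.allclose(ns_transform(np.diag([3.0, 4.0])), np.eye(2), atol=1e-6)
    # A negative scalar maps to -1.
    assert np.allclose(ns_transform(np.array([[-2.0]])), [[-1.0]], atol=1e-6)
```

The toy cases are chosen so the expected output is known in closed form, which keeps the checks independent of any random seed.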
Deliverables
- Variant implementation + docs + ablation config.
Acceptance Criteria
- No regression in outer optimizer behavior.
- 5k run finite with expected telemetry keys.
- First issue comment contains mandatory reading summary.
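The finiteness and telemetry criterion could be checked with a small helper along these lines. It is a sketch only: the key names `loss` and `inner_update_norm` are placeholders, since the expected telemetry keys are not listed here, and the record format (one dict of floats per logged step) is an assumption.

```python
import math

# Placeholder key names; substitute the keys the run is expected to emit.
REQUIRED_KEYS = ("loss", "inner_update_norm")


def check_telemetry(records, required_keys=REQUIRED_KEYS):
    """Return (step_index, key, reason) for every missing or non-finite value."""
    problems = []
    for step, record in enumerate(records):
        for key in required_keys:
            if key not in record:
                problems.append((step, key, "missing"))
            elif not math.isfinite(record[key]):
                problems.append((step, key, "non-finite"))
    return problems
```

An empty return value means every logged step carried all required keys with finite values.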