V2.0 update #55
Open
kaylode wants to merge 3 commits into
- Bump version to 2.0.0
- Modernize all dependencies (Lightning 2.4+, wandb 0.17+, optuna 3.6+, etc.)
- Add huggingface-hub and safetensors as core dependencies
- Replace black/isort with ruff for linting and formatting
- Modernize Dockerfile (CUDA 12.4, Ubuntu 22.04, uv)
- Enhance `Registry` with generics, `merge()`, `get_or_none()`, `__len__`, `__getitem__`
- Refactor `BasePipeline` with shared `_PipelineBase`, extracted helpers
- Fix `LightningModelWrapper` autocast device detection, use `optimizers()` API
- Refactor `LoggerObserver` with dispatch table (O(1) routing)
- Make `Metric` an ABC with `@abstractmethod` decorators
- Cache `inspect.signature` in `getter.py` via `lru_cache`
- Add FSDP strategy support to `Trainer`
- Add `HuggingFaceHubMixin` (save/load/push with safetensors)
- Add `HuggingFaceHubCallback` for automatic model pushing
- Modernize all GitHub workflows (actions v4/v5, Python 3.11, uv, caching)
- Add `lint.yml` workflow (ruff check + format)
- Add `release.yml` workflow (PyPI publishing via trusted OIDC)
- Create `AGENT.md` development guide
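For reviewers unfamiliar with the signature-caching item above, here is a minimal sketch of the general technique. It is not the code in `getter.py`: `cached_signature`, `filter_kwargs`, and `make_optimizer` are hypothetical names used only for illustration.

```python
import inspect
from functools import lru_cache


@lru_cache(maxsize=None)
def cached_signature(fn):
    """Return inspect.signature(fn), computed once per callable.

    Hypothetical helper: callables are hashable, so they can serve
    directly as lru_cache keys.
    """
    return inspect.signature(fn)


def filter_kwargs(fn, kwargs):
    """Keep only the keyword arguments that fn accepts (hypothetical helper)."""
    params = cached_signature(fn).parameters
    # A **kwargs parameter means fn accepts any keyword, so nothing is dropped.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {k: v for k, v in kwargs.items() if k in params}


def make_optimizer(lr, momentum=0.9):
    """Stand-in factory, used only to exercise the helpers above."""
    return {"lr": lr, "momentum": momentum}
```

Repeated calls such as `filter_kwargs(make_optimizer, config)` then inspect the callable only once. The cache holds a strong reference to each callable it has seen, which is usually acceptable for a fixed set of registered classes and functions.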
…seus ML framework
Detailed Changes:
- Logger Overhaul:
  - Fixed `LoggerObserver` singleton pattern for thread-safety and session persistence.
  - Added master rank check for distributed training in `LoggerObserver.log` to prevent duplicate logs across GPUs.
  - Improved `LoggerObserver.text()` to support multiple args and dictionary-to-JSON serialization.
  - Resolved log duplication by making `loguru` handlers (StdoutLogger/FileLogger) instance-specific using name filters.
- Subpackage Decoupling:
  - Moved heavy domain-specific dependencies (lightning, torchvision, transformers) to optional extras.
  - Converted `theseus/` and `theseus/base/` `__init__` files to lazy-loadable structures (removed eager wildcards).
  - Implemented lazy imports in `LoggerObserver` to break global dependencies on PyTorch/Plotly.
- ML Module Simplification:
  - Removed monolithic `MLPipeline`, `MLTrainer`, and custom ML-specific callbacks/metrics.
  - Replaced with streamlined `tradml.py` containing direct-fit functions and `TradMLTuner` with Optuna integration.
  - Fixed critical bugs in `LabelEncode` (lazy column initialization) and `FillNaN` (None-value handling).
- CI/CD & Infrastructure:
  - Updated pyproject.toml and uv.lock to modernized dependency stack.
  - Rewrote tabular test suite in `tests/tabular/` to use the new simplified functional API.
  - Preserved existing preprocessors, reduction, and visualization utilities.
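
To illustrate the logger items above (master-rank check and multi-argument `text()` with dictionary-to-JSON serialization), here is a rough standalone sketch. It is not the `LoggerObserver` implementation: `is_master_rank`, `format_text`, and `log_text` are hypothetical names. It also assumes the launcher exports a `RANK` environment variable, as `torchrun` does, and treats a missing variable as a single-process run.

```python
import json
import os


def is_master_rank() -> bool:
    """True on the process whose global rank is 0.

    Assumption: the launcher sets RANK (torchrun does); when it is
    absent we treat the run as single-process.
    """
    return int(os.environ.get("RANK", "0")) == 0


def format_text(*args) -> str:
    """Join several arguments into one message, rendering dicts as JSON."""
    parts = []
    for arg in args:
        if isinstance(arg, dict):
            # default=str keeps non-JSON-native values (paths, dates) printable.
            parts.append(json.dumps(arg, sort_keys=True, default=str))
        else:
            parts.append(str(arg))
    return " ".join(parts)


def log_text(*args, sink=print):
    """Emit the message only on the master rank; other ranks do nothing."""
    if not is_master_rank():
        return None
    message = format_text(*args)
    sink(message)
    return message
```

With this shape, every rank can call `log_text("epoch", 3, {"loss": 0.5})` unconditionally and only rank 0 writes the line, which is the duplication the rank check is meant to prevent.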