Version: 2.0.0-beta.1 #27

Merged: crvernon merged 56 commits into main from develop on May 20, 2026

@crvernon
Member

This pull request introduces several significant improvements and additions across documentation, configuration, CI workflows, and citation metadata. The most notable changes include the introduction of a beginner tutorial series, major ML and AI assistant subsystems, expanded documentation, new environment variable options for AI providers, CI workflow enhancements, and the addition of a formal citation file.

Key changes:

Documentation and Tutorials

  • Added a comprehensive beginner tutorial series (docs/tutorials/beginner/, notebooks/beginner/) with 10 step-by-step guides for non-experts, including foundational concepts and Jupyter notebooks. The notebook directory structure was reorganized to separate beginner and advanced tracks, and documentation now recommends the beginner path for new users.
  • Expanded and reorganized documentation to cover new features (AI assistants, ML optimization, cloud providers, artifacts, overlays, cost estimation, etc.), with new and updated pages and cross-links.

Machine Learning and AI Assistant Features

  • Introduced an ML optimization subsystem (scalable.ml) for learned resource prediction and adaptive scaling, with new CLI commands, telemetry, and settings.
  • Added an AI assistant subsystem (scalable.ai) with pluggable LLM backend support, multiple assistant commands (onboarding, diagnosis, explanation, composition, migration), and a prompt template system.

Cloud, Kubernetes, and Artifacts

  • Added Kubernetes and AWS cloud providers, cloud cost estimation, and an artifact store layer supporting local and remote (S3/GCS) storage. Manifest overlays and cost telemetry are now supported.
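The description does not spell out how manifest overlays are applied. One common interpretation is a recursive merge in which overlay keys override the base target, and the sketch below assumes that behavior. The function `apply_overlay` and the example keys (`provider`, `resources`, `cpus`, `memory`) are hypothetical and not taken from the scalable codebase.

```python
from copy import deepcopy
from typing import Any, Mapping


def apply_overlay(base: Mapping[str, Any], overlay: Mapping[str, Any]) -> dict:
    """Return a new dict where overlay values recursively override base values.

    Assumed semantics (not confirmed by the PR): nested mappings are merged
    key by key, while scalars and lists in the overlay replace the base value.
    """
    merged = deepcopy(dict(base))
    for key, value in overlay.items():
        if isinstance(value, Mapping) and isinstance(merged.get(key), Mapping):
            # Both sides are mappings: merge them instead of replacing.
            merged[key] = apply_overlay(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical target definition and overlay, for illustration only.
base_target = {"provider": "local", "resources": {"cpus": 2, "memory": "4G"}}
gke_overlay = {"provider": "kubernetes", "resources": {"cpus": 8}}
merged_target = apply_overlay(base_target, gke_overlay)
```

Returning a new dict rather than mutating the base keeps the unmodified target available, so several overlays can be applied to the same base.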

Configuration and Environment

  • Added a new .env.example section for AI/LLM provider configuration, supporting multiple providers (OpenAI, Anthropic, Google, xAI, Groq, Ollama) with provider-agnostic variables and advanced override options.
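As a rough illustration of provider-agnostic configuration, the sketch below reads the three settings named in the Phase 4 commit message of this PR (`SCALABLE_AI_BACKEND`, `SCALABLE_AI_MODEL`, `SCALABLE_AI_ENDPOINT`) from the environment. The `AISettings` class, the `load_ai_settings` function, and the rule that an unset backend means heuristic-only mode are assumptions made for this example. The actual settings loader in scalable may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class AISettings:
    """Hypothetical container for the AI-related environment variables."""

    backend: Optional[str]
    model: Optional[str]
    endpoint: Optional[str]

    @property
    def llm_enabled(self) -> bool:
        # Assumption: no backend configured means heuristic-only operation.
        return self.backend is not None


def load_ai_settings(env: Mapping[str, str] = os.environ) -> AISettings:
    """Read the three variables, treating empty or blank values as unset."""

    def read(name: str) -> Optional[str]:
        value = env.get(name, "").strip()
        return value or None

    return AISettings(
        backend=read("SCALABLE_AI_BACKEND"),
        model=read("SCALABLE_AI_MODEL"),
        endpoint=read("SCALABLE_AI_ENDPOINT"),
    )


# Passing a plain dict instead of os.environ keeps the example deterministic.
settings = load_ai_settings({"SCALABLE_AI_BACKEND": "ollama", "SCALABLE_AI_MODEL": " "})
```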

CI and Testing

  • Enhanced CI workflows to trigger on version branches, expanded the test matrix, and added a dedicated job to validate and dry-run example manifests.

Citation and Metadata

  • Added a CITATION.cff file for formal citation, including authors, version, DOI, and keywords.

Other Notable Updates

  • Updated changelog and version links to reflect new release branches and semantic versioning.
  • Deprecated legacy Dockerfile/config auto-discovery in favor of manifest-driven configuration.

These changes collectively advance the project's usability, extensibility, and scientific reproducibility, making it accessible for both new and advanced users.

crvernon and others added 30 commits May 19, 2026 14:42
Creates the additive Phase 1 package structure off of version/2.0.0:
manifest/, providers/, session/, planning/, cli/. Each new package ships
with a docstring describing its Phase 1 role and its hooks for later
phases (telemetry, AI assistants, Kubernetes/cloud providers, ML
advisor).

scalable/manifest/schema.py defines the frozen v1 schema dataclasses
(ManifestModel, ProjectConfig, TargetConfig, ComponentConfig, TaskConfig)
and SCHEMA_VERSION = 1. The schema is intentionally implemented with
stdlib dataclasses so manifest validation works without the optional
[ai] extra (resolves Phase 1 plan section 9 open question #1).
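To illustrate the stdlib-only approach described above, here is a minimal sketch of frozen dataclasses with a schema version check. The class names and `SCHEMA_VERSION = 1` come from the commit message. Every field, the version check in `__post_init__`, and the bare `ManifestError` stand-in are assumptions for illustration and do not reproduce the real `schema.py`.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Tuple

SCHEMA_VERSION = 1


class ManifestError(Exception):
    """Stand-in for the base error in scalable/manifest/errors.py."""


@dataclass(frozen=True)
class ProjectConfig:
    name: str  # hypothetical field


@dataclass(frozen=True)
class TargetConfig:
    name: str  # hypothetical field
    provider: str  # hypothetical field


@dataclass(frozen=True)
class ManifestModel:
    project: ProjectConfig
    targets: Tuple[TargetConfig, ...] = ()
    schema_version: int = SCHEMA_VERSION

    def __post_init__(self) -> None:
        # Assumed validation rule: reject manifests from another schema version.
        if self.schema_version != SCHEMA_VERSION:
            raise ManifestError(
                f"unsupported schema version {self.schema_version}; "
                f"expected {SCHEMA_VERSION}"
            )


manifest = ManifestModel(
    project=ProjectConfig(name="demo"),
    targets=(TargetConfig(name="laptop", provider="local"),),
)

# Frozen dataclasses reject attribute assignment after construction.
try:
    manifest.schema_version = 2
except FrozenInstanceError:
    mutation_blocked = True
else:
    mutation_blocked = False
```

Using tuples rather than lists for collections keeps the whole model immutable, since `frozen=True` only blocks attribute reassignment.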

scalable/manifest/errors.py declares the ManifestError hierarchy used by
the parser, validator, and Phase 4 AI migration assistant.

scalable/cli/main.py is a Phase 1 stub for the [project.scripts] entry
point; the real validate / plan --dry-run wiring lands in WU-10.

pyproject.toml: version bumped to 2.0.0a1, pyyaml pinned explicitly,
empty placeholder extras for ai/cloud/kubernetes registered so
pip install scalable[ai] resolves cleanly from day one, scalable
console script registered, packages.find used so the new sub-packages
are picked up by setuptools.

Verified: existing 73 unit tests pass unchanged; ruff clean on all new
modules. No public API removed or renamed.

Refs plans/v2.0.0_phase1_plan.md WU-1.
Phase 1: provider abstraction + scalable.yaml manifest foundation
phase 2 progress towards telemetry and deterministic advising
Implements Phase 3 of the v2.0.0 roadmap:

- KubernetesProvider over Dask Kubernetes Operator
- AWSBatchProvider over dask-cloudprovider (Fargate/EC2)
- GCPProvider scaffold (validation only; build_cluster deferred)
- ArtifactStore protocol with local and fsspec backends
- RemoteCacheBackend for opt-in remote cache (SCALABLE_CACHE_REMOTE)
- Manifest overlays (overlays: block + targets[*].overlay)
- CostEstimate primitives and static cost tables
- scalable run CLI verb
- Settings: cache_remote_uri, default_storage, runs_dir_remote
- Telemetry: CostEvent, cost.jsonl stream, cost in report
- Provider protocol: optional estimate_cost() method
- Public API: Phase 3 exports with optional-dep guards
- Docs: cloud.rst, kubernetes.rst, artifacts.rst, overlays.rst, cost.rst
- Example manifests: gke, aws, overlays
- 238 unit tests passing, ruff clean

Version bumped to 2.0.0a3.
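The ArtifactStore protocol with a local backend could look roughly like the sketch below. Only the name `ArtifactStore` comes from the commit message. The method names (`put`, `get`, `exists`), the `LocalArtifactStore` class, and the key layout are hypothetical, and the fsspec-backed variant is omitted.

```python
import tempfile
from pathlib import Path
from typing import Protocol, runtime_checkable


@runtime_checkable
class ArtifactStore(Protocol):
    """Assumed minimal interface; the real protocol may expose other methods."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...

    def exists(self, key: str) -> bool: ...


class LocalArtifactStore:
    """Hypothetical local backend that maps keys to files under a root directory."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)

    def _path(self, key: str) -> Path:
        return self.root / key

    def put(self, key: str, data: bytes) -> None:
        path = self._path(key)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def get(self, key: str) -> bytes:
        return self._path(key).read_bytes()

    def exists(self, key: str) -> bool:
        return self._path(key).is_file()


with tempfile.TemporaryDirectory() as tmp:
    store = LocalArtifactStore(tmp)
    store.put("runs/001/plan.json", b"{}")
    roundtrip = store.get("runs/001/plan.json")
    missing = store.exists("runs/002/plan.json")
    # Structural typing: no inheritance is needed to satisfy the protocol.
    conforms = isinstance(store, ArtifactStore)
```

Because the protocol is structural, a remote backend only needs to provide the same three methods, which is one way to keep optional dependencies out of the core package.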
Phase 3: cloud + Kubernetes execution, artifact stores, overlays, cost
Implements the Phase 4 deliverables from the v2.0.0 development plan:

- AI assistant subsystem (scalable.ai) with pluggable LLM backend
  protocol and heuristic-only fallback mode
- Component onboarding assistant (scalable init-component)
- Failure diagnosis assistant (scalable diagnose)
- Plan explanation assistant (scalable explain)
- Workflow composition assistant (scalable compose)
- Manifest migration assistant (scalable migrate)
- ScalableSession.plan(objective=, policy=) now functional with
  heuristic-based resource/worker adjustments
- Prompt template system for all assistants
- Settings: SCALABLE_AI_BACKEND, SCALABLE_AI_MODEL, SCALABLE_AI_ENDPOINT
- Populated [project.optional-dependencies] ai extra
- Version bumped to 2.0.0a4
- 356 unit tests passing, ruff clean

All AI features work without an LLM backend via deterministic heuristic
fallbacks. LLM enhancement is opt-in. All outputs are reviewable
artifacts - never auto-executed.
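The opt-in LLM pattern with a deterministic fallback can be sketched as follows. This is an illustration of the general technique under stated assumptions, not the scalable.ai implementation: `LLMBackend`, `complete`, `heuristic_diagnosis`, `diagnose`, and the matching rules are all hypothetical names and logic.

```python
from typing import Optional, Protocol


class LLMBackend(Protocol):
    """Hypothetical backend interface: any object with a complete() method."""

    def complete(self, prompt: str) -> str: ...


def heuristic_diagnosis(log_text: str) -> str:
    """Deterministic rules that work with no LLM configured."""
    lowered = log_text.lower()
    if "memoryerror" in lowered or "out of memory" in lowered:
        return "Likely out-of-memory failure: consider raising the memory request."
    if "timeout" in lowered or "timed out" in lowered:
        return "Likely timeout: consider a longer walltime or fewer tasks per worker."
    return "No known failure pattern matched: inspect the full log."


def diagnose(log_text: str, backend: Optional[LLMBackend] = None) -> str:
    """Return a reviewable diagnosis string; nothing is executed automatically."""
    baseline = heuristic_diagnosis(log_text)
    if backend is None:
        return baseline
    try:
        prompt = f"Explain this failure:\n{log_text}\nHeuristic hint: {baseline}"
        return backend.complete(prompt)
    except Exception:
        # Any backend failure degrades to the heuristic answer.
        return baseline


class UnreachableBackend:
    """Test double that simulates an unavailable LLM service."""

    def complete(self, prompt: str) -> str:
        raise ConnectionError("backend unavailable")


offline = diagnose("Worker killed: out of memory")
degraded = diagnose("Worker killed: out of memory", UnreachableBackend())
```

Computing the heuristic answer first means the function always has a result to return, whether or not a backend is configured or reachable.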

Ref: plans/v2.0.0_phase4_plan.md
Co-authored-by: crvernon <3947069+crvernon@users.noreply.github.com>
[WIP] Fix failing GitHub Actions job 'ruff + mypy'
Agent-Logs-Url: https://github.com/JGCRI/scalable/sessions/fe9e5b5a-f73f-4999-8e77-194af9b7b931

Co-authored-by: crvernon <3947069+crvernon@users.noreply.github.com>
crvernon added 25 commits May 19, 2026 20:28
- Add scalable.ml package: LearnedAdvisor, AdaptiveScaler, FeatureExtractor,
  ResourceModel, HyperparameterSearch, cross_validate_advisor
- Add scalable.emulation package: @emulatable decorator, EmulatorRegistry,
  EmulatorDispatch, ActiveLearner, GradientBoostingEmulator,
  RandomForestEmulator, uncertainty calibration
- Add scalable advise CLI command with ML-backed recommendations
- Add EmulationEvent to telemetry events
- Add Phase 5 settings (ML cache, emulator registry, enable flags)
- Add [ml] optional dependency extra (scikit-learn, dask-ml, joblib)
- Bump version to 2.0.0a5
- 75 new unit tests, 431 total passing
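The commit message names an @emulatable decorator and an EmulatorRegistry but does not describe how they behave. The sketch below shows one plausible reading, in which a decorated function dispatches to a registered surrogate when one exists and otherwise runs the original code. The module-level `_EMULATORS` dict, the decorator signature, and the example functions are assumptions and should not be read as the scalable.emulation API.

```python
import functools
from typing import Any, Callable, Dict, List

# Hypothetical stand-in for an emulator registry: name -> surrogate callable.
_EMULATORS: Dict[str, Callable[..., Any]] = {}


def emulatable(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Mark a function as replaceable by a registered surrogate model."""

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            surrogate = _EMULATORS.get(name)
            if surrogate is not None:
                # A surrogate is registered: skip the expensive computation.
                return surrogate(*args, **kwargs)
            return func(*args, **kwargs)

        return wrapper

    return decorator


exact_calls: List[int] = []


@emulatable("square")
def expensive_square(x: int) -> int:
    """Placeholder for a costly simulation step."""
    exact_calls.append(x)
    return x * x


first = expensive_square(3)  # no surrogate yet, so the original function runs

# Register a cheap stand-in; a real system would use a trained model here.
_EMULATORS["square"] = lambda x: x * x
second = expensive_square(4)  # dispatched to the surrogate
```

Looking up the surrogate at call time rather than at decoration time lets an emulator be registered after the function is defined.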
@crvernon crvernon requested a review from pralitp May 20, 2026 20:49
@crvernon crvernon merged commit cd9fc40 into main May 20, 2026
10 checks passed