Version: 2.0.0-beta.1 by crvernon · Pull Request #27 · JGCRI/scalable

crvernon · 2026-05-20T20:49:10Z

This pull request introduces several significant improvements and additions across documentation, configuration, CI workflows, and citation metadata. The most notable changes include the introduction of a beginner tutorial series, major ML and AI assistant subsystems, expanded documentation, new environment variable options for AI providers, CI workflow enhancements, and the addition of a formal citation file.

Key changes:

Documentation and Tutorials

Added a comprehensive beginner tutorial series (docs/tutorials/beginner/, notebooks/beginner/) with 10 step-by-step guides for non-experts, including foundational concepts and Jupyter notebooks. The notebook directory structure was reorganized to separate beginner and advanced tracks, and documentation now recommends the beginner path for new users.
Expanded and reorganized documentation to cover new features (AI assistants, ML optimization, cloud providers, artifacts, overlays, cost estimation, etc.), with new and updated pages and cross-links.

Machine Learning and AI Assistant Features

Introduced an ML optimization subsystem (scalable.ml) for learned resource prediction and adaptive scaling, with new CLI commands, telemetry, and settings.
Added an AI assistant subsystem (scalable.ai) with pluggable LLM backend support, multiple assistant commands (onboarding, diagnosis, explanation, composition, migration), and a prompt template system.

Cloud, Kubernetes, and Artifacts

Added Kubernetes and AWS cloud providers, cloud cost estimation, and an artifact store layer supporting local and remote (S3/GCS) storage. Manifest overlays and cost telemetry are now supported.
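To illustrate what an artifact store layer with interchangeable backends could look like, here is a minimal sketch. The `ArtifactStore` name appears in this PR's Phase 3 commit message, but the `put`/`get` method names, the `LocalArtifactStore` class, and the `save_report` helper are assumptions made for illustration, not the project's actual API.

```python
import tempfile
from pathlib import Path
from typing import Protocol


class ArtifactStore(Protocol):
    """Hypothetical minimal interface; the method names are assumptions."""

    def put(self, key: str, data: bytes) -> str: ...

    def get(self, key: str) -> bytes: ...


class LocalArtifactStore:
    """Stores artifacts as files under a root directory."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)

    def put(self, key: str, data: bytes) -> str:
        path = self.root / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)
        return str(path)

    def get(self, key: str) -> bytes:
        return (self.root / key).read_bytes()


def save_report(store: ArtifactStore, run_id: str, text: str) -> str:
    """Write a run report through whichever store the caller supplies."""
    return store.put(f"runs/{run_id}/report.txt", text.encode("utf-8"))


def demo() -> bytes:
    """Round-trip a report through a temporary local store."""
    with tempfile.TemporaryDirectory() as tmp:
        store = LocalArtifactStore(tmp)
        save_report(store, "run-001", "ok")
        return store.get("runs/run-001/report.txt")
```

Because callers depend only on the protocol, a remote (S3/GCS) backend could in principle be swapped in without changing `save_report`.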

Configuration and Environment

Added a new .env.example section for AI/LLM provider configuration, supporting multiple providers (OpenAI, Anthropic, Google, xAI, Groq, Ollama) with provider-agnostic variables and advanced override options.
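As a rough sketch of how provider-agnostic variables might be read, the snippet below uses the three setting names listed in this PR's Phase 4 commit message (`SCALABLE_AI_BACKEND`, `SCALABLE_AI_MODEL`, `SCALABLE_AI_ENDPOINT`). The `AISettings` class, the `load_ai_settings` function, and the `"heuristic"` default are assumptions for illustration only.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class AISettings:
    """Hypothetical container for provider-agnostic AI settings."""

    backend: str
    model: Optional[str]
    endpoint: Optional[str]


def load_ai_settings(env: Optional[Mapping[str, str]] = None) -> AISettings:
    """Read AI settings from a mapping (defaults to the process environment)."""
    source = os.environ if env is None else env
    # Assumption: an unset or empty backend means "no LLM, heuristics only".
    backend = source.get("SCALABLE_AI_BACKEND", "").strip().lower() or "heuristic"
    # Empty strings are treated as "not set".
    model = source.get("SCALABLE_AI_MODEL") or None
    endpoint = source.get("SCALABLE_AI_ENDPOINT") or None
    return AISettings(backend=backend, model=model, endpoint=endpoint)
```

Passing the mapping explicitly keeps the function easy to test without mutating the real environment.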

CI and Testing

Enhanced CI workflows to trigger on version branches, expanded the test matrix, and added a dedicated job to validate and dry-run example manifests.

Citation and Metadata

Added a CITATION.cff file for formal citation, including authors, version, DOI, and keywords.

Other Notable Updates

Updated changelog and version links to reflect new release branches and semantic versioning.
Deprecated legacy Dockerfile/config auto-discovery in favor of manifest-driven configuration.

These changes collectively advance the project's usability, extensibility, and scientific reproducibility, making it accessible for both new and advanced users.

Creates the additive Phase 1 package structure off of version/2.0.0: manifest/, providers/, session/, planning/, cli/. Each new package ships with a docstring describing its Phase 1 role and its hooks for later phases (telemetry, AI assistants, Kubernetes/cloud providers, ML advisor).

- scalable/manifest/schema.py defines the frozen v1 schema dataclasses (ManifestModel, ProjectConfig, TargetConfig, ComponentConfig, TaskConfig) and SCHEMA_VERSION = 1. The schema is intentionally implemented with stdlib dataclasses so manifest validation works without the optional [ai] extra (resolves Phase 1 plan section 9 open question #1).
- scalable/manifest/errors.py declares the ManifestError hierarchy used by the parser, validator, and Phase 4 AI migration assistant.
- scalable/cli/main.py is a Phase 1 stub for the [project.scripts] entry point; the real validate / plan --dry-run wiring lands in WU-10.
- pyproject.toml: version bumped to 2.0.0a1, pyyaml pinned explicitly, empty placeholder extras for ai/cloud/kubernetes registered so pip install scalable[ai] resolves cleanly from day one, scalable console script registered, packages.find used so the new sub-packages are picked up by setuptools.

Verified: existing 73 unit tests pass unchanged; ruff clean on all new modules. No public API removed or renamed.

Refs plans/v2.0.0_phase1_plan.md WU-1.
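A schema built from stdlib dataclasses might look roughly like the sketch below. The class names and `SCHEMA_VERSION = 1` come from the commit message above; every field, the `ensure_supported` helper, and the use of `frozen=True` are assumptions ("frozen" in the commit message may simply mean the v1 schema is stable rather than that instances are immutable).

```python
from dataclasses import dataclass, field
from typing import Dict

SCHEMA_VERSION = 1


@dataclass(frozen=True)
class ProjectConfig:
    name: str  # hypothetical field


@dataclass(frozen=True)
class TargetConfig:
    provider: str  # hypothetical field, e.g. "local" or "slurm"


@dataclass(frozen=True)
class ComponentConfig:
    cpus: int = 1  # hypothetical field


@dataclass(frozen=True)
class TaskConfig:
    component: str  # hypothetical field: name of the component to run


@dataclass(frozen=True)
class ManifestModel:
    project: ProjectConfig
    targets: Dict[str, TargetConfig] = field(default_factory=dict)
    components: Dict[str, ComponentConfig] = field(default_factory=dict)
    tasks: Dict[str, TaskConfig] = field(default_factory=dict)
    version: int = SCHEMA_VERSION


def ensure_supported(manifest: ManifestModel) -> None:
    """Reject manifests written for a different schema version."""
    if manifest.version != SCHEMA_VERSION:
        raise ValueError(
            f"unsupported schema version {manifest.version}; "
            f"expected {SCHEMA_VERSION}"
        )
```

Using only the standard library here means validation needs no third-party package, which matches the stated goal of working without the optional [ai] extra.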

…validation

Phase 1: provider abstraction + scalable.yaml manifest foundation

…sing phase 2 progress towards telemetry and deterministic advising

Implements Phase 3 of the v2.0.0 roadmap:
- KubernetesProvider over Dask Kubernetes Operator
- AWSBatchProvider over dask-cloudprovider (Fargate/EC2)
- GCPProvider scaffold (validation only; build_cluster deferred)
- ArtifactStore protocol with local and fsspec backends
- RemoteCacheBackend for opt-in remote cache (SCALABLE_CACHE_REMOTE)
- Manifest overlays (overlays: block + targets[*].overlay)
- CostEstimate primitives and static cost tables
- scalable run CLI verb
- Settings: cache_remote_uri, default_storage, runs_dir_remote
- Telemetry: CostEvent, cost.jsonl stream, cost in report
- Provider protocol: optional estimate_cost() method
- Public API: Phase 3 exports with optional-dep guards
- Docs: cloud.rst, kubernetes.rst, artifacts.rst, overlays.rst, cost.rst
- Example manifests: gke, aws, overlays
- 238 unit tests passing, ruff clean

Version bumped to 2.0.0a3.
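The "optional estimate_cost() method" item can be pictured with a small sketch: callers probe for the method instead of requiring every provider to implement it. `build_cluster`, `estimate_cost`, and `CostEstimate` are names from the commit message; the field names, the `FlatRateProvider` class, its rate, and the `try_estimate_cost` helper are made up for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Protocol


@dataclass(frozen=True)
class CostEstimate:
    amount_usd: float  # hypothetical field
    basis: str         # hypothetical field: where the number came from


class Provider(Protocol):
    """Required surface only; estimate_cost() is deliberately not listed."""

    name: str

    def build_cluster(self, spec: Dict[str, Any]) -> Any: ...


class LocalProvider:
    name = "local"

    def build_cluster(self, spec: Dict[str, Any]) -> Any:
        return None  # placeholder; a real provider would start workers


class FlatRateProvider:
    """Hypothetical provider with a static per-worker-hour price."""

    name = "flat"
    hourly_rate_usd = 0.10  # invented rate, for illustration only

    def build_cluster(self, spec: Dict[str, Any]) -> Any:
        return None

    def estimate_cost(self, spec: Dict[str, Any]) -> CostEstimate:
        worker_hours = spec["workers"] * spec["hours"]
        amount = round(worker_hours * self.hourly_rate_usd, 2)
        return CostEstimate(amount_usd=amount, basis="static table")


def try_estimate_cost(
    provider: Provider, spec: Dict[str, Any]
) -> Optional[CostEstimate]:
    """Return an estimate if the provider offers one, else None."""
    method = getattr(provider, "estimate_cost", None)
    if callable(method):
        return method(spec)
    return None
```

With this shape, a provider that lacks cost data simply yields `None`, and the caller can skip the cost section of a report.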

Phase 3: cloud + Kubernetes execution, artifact stores, overlays, cost
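One plausible reading of manifest overlays is a named patch that a target opts into and that is merged over the base configuration. The sketch below assumes a recursive dictionary merge in which overlay values win and lists are replaced wholesale; the actual merge rules, the `deep_merge` and `resolve_for_target` helpers, and the sample keys are all assumptions.

```python
from copy import deepcopy
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], overlay: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict with overlay values merged over base, recursively."""
    merged = deepcopy(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def resolve_for_target(manifest: Dict[str, Any], target: str) -> Dict[str, Any]:
    """Apply the overlay named by targets[target].overlay, if any."""
    base = {k: deepcopy(v) for k, v in manifest.items() if k != "overlays"}
    overlay_name = manifest["targets"][target].get("overlay")
    if overlay_name is None:
        return base
    return deep_merge(base, manifest["overlays"][overlay_name])


# Hypothetical manifest content, already parsed into a dict.
MANIFEST = {
    "components": {"model": {"cpus": 2, "memory": "4G"}},
    "targets": {"laptop": {}, "cluster": {"overlay": "big"}},
    "overlays": {"big": {"components": {"model": {"cpus": 32}}}},
}
```

Resolving `"cluster"` raises `cpus` to 32 while keeping `memory` at `"4G"`, and resolving `"laptop"` leaves the base values untouched.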

Implements the Phase 4 deliverables from the v2.0.0 development plan:
- AI assistant subsystem (scalable.ai) with pluggable LLM backend protocol and heuristic-only fallback mode
- Component onboarding assistant (scalable init-component)
- Failure diagnosis assistant (scalable diagnose)
- Plan explanation assistant (scalable explain)
- Workflow composition assistant (scalable compose)
- Manifest migration assistant (scalable migrate)
- ScalableSession.plan(objective=, policy=) now functional with heuristic-based resource/worker adjustments
- Prompt template system for all assistants
- Settings: SCALABLE_AI_BACKEND, SCALABLE_AI_MODEL, SCALABLE_AI_ENDPOINT
- Populated [project.optional-dependencies] ai extra
- Version bumped to 2.0.0a4
- 356 unit tests passing, ruff clean

All AI features work without an LLM backend via deterministic heuristic fallbacks. LLM enhancement is opt-in. All outputs are reviewable artifacts, never auto-executed.

Ref: plans/v2.0.0_phase4_plan.md
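The combination of a pluggable backend and a deterministic fallback could be sketched as follows. The `LLMBackend` protocol, its `complete` method, the matching rules, and the `diagnose` function are hypothetical and only illustrate the pattern; they are not taken from scalable.ai.

```python
from typing import Optional, Protocol


class LLMBackend(Protocol):
    """Hypothetical backend interface: turn a prompt into text."""

    def complete(self, prompt: str) -> str: ...


def heuristic_diagnosis(log_text: str) -> str:
    """Deterministic rules that need no LLM (rules invented for illustration)."""
    lowered = log_text.lower()
    if "out of memory" in lowered or "oomkilled" in lowered:
        return "Likely memory exhaustion; consider raising the memory request."
    if "timed out" in lowered or "timeout" in lowered:
        return "Likely timeout; consider a longer walltime or fewer tasks per worker."
    return "No known failure pattern matched; review the log manually."


def diagnose(log_text: str, backend: Optional[LLMBackend] = None) -> str:
    """Use the LLM only when one is supplied; otherwise fall back to heuristics."""
    baseline = heuristic_diagnosis(log_text)
    if backend is None:
        return baseline
    prompt = (
        "Explain this failure.\n"
        f"Heuristic finding: {baseline}\n"
        f"Log:\n{log_text}"
    )
    try:
        return backend.complete(prompt)
    except Exception:
        # A broken or unreachable backend must not break diagnosis.
        return baseline
```

The function only returns text for a person to review, in keeping with the statement that outputs are never auto-executed.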

Co-authored-by: crvernon <3947069+crvernon@users.noreply.github.com>

Agent-Logs-Url: https://github.com/JGCRI/scalable/sessions/b7e62493-29e0-4a5f-9bdb-28a778012e68
Co-authored-by: crvernon <3947069+crvernon@users.noreply.github.com>

[WIP] Fix failing GitHub Actions job 'ruff + mypy'

Agent-Logs-Url: https://github.com/JGCRI/scalable/sessions/fe9e5b5a-f73f-4999-8e77-194af9b7b931
Co-authored-by: crvernon <3947069+crvernon@users.noreply.github.com>

Phase 4: AI assistant features

- Add scalable.ml package: LearnedAdvisor, AdaptiveScaler, FeatureExtractor, ResourceModel, HyperparameterSearch, cross_validate_advisor
- Add scalable.emulation package: @emulatable decorator, EmulatorRegistry, EmulatorDispatch, ActiveLearner, GradientBoostingEmulator, RandomForestEmulator, uncertainty calibration
- Add scalable advise CLI command with ML-backed recommendations
- Add EmulationEvent to telemetry events
- Add Phase 5 settings (ML cache, emulator registry, enable flags)
- Add [ml] optional dependency extra (scikit-learn, dask-ml, joblib)
- Bump version to 2.0.0a5
- 75 new unit tests, 431 total passing
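The commit message does not describe how the @emulatable decorator behaves, so the following is only a guess at the general pattern: a decorator that routes a call to a registered surrogate when one exists and otherwise runs the real function. The registry dictionary, `register_emulator`, the decorator signature, and the example functions are all assumptions, and the sketch omits uncertainty calibration entirely.

```python
import functools
from typing import Any, Callable, Dict

# Hypothetical in-memory registry mapping a name to a surrogate callable.
_EMULATORS: Dict[str, Callable[..., Any]] = {}


def register_emulator(name: str, emulator: Callable[..., Any]) -> None:
    """Make a surrogate available for the function registered under name."""
    _EMULATORS[name] = emulator


def emulatable(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Mark a function as replaceable by a surrogate registered under name."""

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            emulator = _EMULATORS.get(name)
            if emulator is not None:
                return emulator(*args, **kwargs)  # cheap surrogate path
            return func(*args, **kwargs)          # full computation
        return wrapper

    return decorator


@emulatable("expensive_model")
def expensive_model(x: float) -> float:
    """Stand-in for a costly simulation."""
    return x * x + 1.0


def linear_surrogate(x: float) -> float:
    """Stand-in for a trained emulator; not fitted to anything."""
    return 2.0 * x
```

Before any surrogate is registered, `expensive_model` runs the full computation; after `register_emulator("expensive_model", linear_surrogate)`, the same call is served by the surrogate.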

Version/2.0.0 phase5 ml emulation

Version/2.0.0

crvernon and others added 30 commits May 19, 2026 14:42

ignore roo

1bbd5b7

resize logo

b0c3ba4

resize logo

0ed8671

resize logo

65d539e

WU-2: add scalable.yaml parser with env expansion + schema checks

2f6c7fd

WU-3: add manifest semantic validator + validation report tests

5b6b7b9

WU-4: add provider protocol, deployment spec, and registry

a2eccad

WU-5: add LocalProvider with tagged local execution and tests

609b9e0

WU-6: add SlurmProvider translation layer with mocked tests

738a0b5

WU-7: add manifest-to-legacy adapter and ModelConfig deprecation gate

1bbac8f

feat(v2-phase1): add session+dryrun APIs, CLI commands, docs, and CI …

cdbc51d

…validation

ignore env files

abecbeb

add env example file

78ca831

update changelog

f9b5642

Merge pull request #20 from JGCRI/version/2.0.0-phase1-provider-manifest

fc0a8e8

Phase 1: provider abstraction + scalable.yaml manifest foundation

phase 2 progress towards telemetry and deterministic advising

5cf2d97

Merge pull request #21 from JGCRI/version/2.0.0-phase2-telemetry-advi…

1ae928c

…sing phase 2 progress towards telemetry and deterministic advising

Merge pull request #22 from JGCRI/version/2.0.0-phase3-cloud-kubernetes

6c55dde

Phase 3: cloud + Kubernetes execution, artifact stores, overlays, cost

Initial plan

78b12cb

Merge version/2.0.0-phase4-ai-assistants into fix branch

468ff67

Co-authored-by: crvernon <3947069+crvernon@users.noreply.github.com>

Fix lint violations in session and AI planning tests

6e18a73

Co-authored-by: crvernon <3947069+crvernon@users.noreply.github.com>

Add explicit GitHub Actions token permissions in tests workflow

cc28345

Agent-Logs-Url: https://github.com/JGCRI/scalable/sessions/b7e62493-29e0-4a5f-9bdb-28a778012e68
Co-authored-by: crvernon <3947069+crvernon@users.noreply.github.com>

Merge pull request #24 from JGCRI/copilot/fix-ruff-mypy-job-failure

10f57eb

[WIP] Fix failing GitHub Actions job 'ruff + mypy'

Rollback branch content to commit 1460fff

3dc3683

Agent-Logs-Url: https://github.com/JGCRI/scalable/sessions/fe9e5b5a-f73f-4999-8e77-194af9b7b931
Co-authored-by: crvernon <3947069+crvernon@users.noreply.github.com>

ruff fixes

3b4bd09

ruff fixes

d911ef8

Merge pull request #23 from JGCRI/version/2.0.0-phase4-ai-assistants

a864a60

Phase 4: AI assistant features

crvernon added 25 commits May 19, 2026 20:28

Add Phase 5 implementation plan

60dfd39

Merge Phase 4: AI assistant features into version/2.0.0

bc321c9

Merge pull request #25 from JGCRI/version/2.0.0-phase5-ml-emulation

a3b68a6

Version/2.0.0 phase5 ml emulation

update docs

dcb8a2d

how-to tutorials in docs

b3b2151

jupyter notebook tutorials

6112929

pydanticai transition

ea975f4

support tests failure for ai

2dc4988

ruff adjustments

c4021e1

formatting for ruff

d0df09f

Merge pull request #26 from JGCRI/version/2.0.0

7c2b2bd

Version/2.0.0

adjust title overline length

12df6d5

update storylines in tutorials

a9fcec7

update systems language

1e7dc30

extending more ai provider options

d3e0e5b

update docs on ai provider expansion

8254f5b

handle env setup for ai notebook

be11ab9

docs update for provider support

1a9568a

new beginner level tutorials

8ef0f57

beginner tutorials

b42ff4e

break out tutorials into beginner and advanced

bf7c7ad

reorder docs

eff3b5d

update version to 2.0.0-beta.1

d3bce61

local dev install clarity

ec67aa1

crvernon requested a review from pralitp May 20, 2026 20:49

fix formatting via ruff feedback

f67c8d3

crvernon merged commit cd9fc40 into main May 20, 2026
10 checks passed

crvernon merged 56 commits into main from develop

2 participants
