
Pass train_dtype into _load_model_memory_efficient #105

Closed
RobotSail wants to merge 1 commit into main from agent/mini-trainer-dev/cec043a1

Conversation

@RobotSail

Collaborator

Summary

  • Adds train_dtype as an explicit parameter to _load_model_memory_efficient instead of extracting it from base_kwargs internally
  • Removes the now-unnecessary torch_dtype lookup and validation from inside the function
  • Updates the call site in from_pretrained to extract and validate torch_dtype before passing it as train_dtype
  • Adds two new tests: one verifying train_dtype overrides base_kwargs["torch_dtype"], and one verifying it works when base_kwargs has no torch_dtype

Closes #34

Test plan

  • ruff check passes
  • ruff format --check passes
  • Existing test_memory_efficient_loading_calls_alignment_hook updated to pass the new train_dtype parameter
  • New test_memory_efficient_loading_uses_train_dtype verifies override behavior
  • New test_memory_efficient_loading_fallback_when_base_kwargs_missing_dtype verifies no-dtype-in-kwargs case

Co-Authored-By: Claude Opus 4.6 (1M context) noreply@anthropic.com

Instead of extracting torch_dtype from base_kwargs inside the function,
accept train_dtype as a proper parameter. This makes the API clearer and
ensures the caller controls which dtype is used for model loading.

Closes #34

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: multica-agent <github@multica.ai>
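The caller-side half of the change, where the dtype is extracted and validated before being passed in as train_dtype, can be sketched as follows. This sketch assumes that "validate" means checking that torch_dtype is present and belongs to a supported set. The PR text does not spell out the actual checks in from_pretrained, so resolve_train_dtype and _ALLOWED_DTYPES are hypothetical names. Dtypes are plain strings rather than torch.dtype objects so that the sketch has no dependencies.

```python
from typing import Any

# Hypothetical supported set; the real validation rules are not stated in the PR.
_ALLOWED_DTYPES = {"float32", "float16", "bfloat16"}


def resolve_train_dtype(base_kwargs: dict[str, Any]) -> str:
    """Sketch: pull torch_dtype out of the caller's kwargs and validate it once."""
    train_dtype = base_kwargs.get("torch_dtype")
    if train_dtype is None:
        raise ValueError("torch_dtype must be set before memory-efficient loading")
    if train_dtype not in _ALLOWED_DTYPES:
        raise ValueError(f"unsupported torch_dtype: {train_dtype!r}")
    return train_dtype


base_kwargs = {"torch_dtype": "bfloat16"}
train_dtype = resolve_train_dtype(base_kwargs)
# train_dtype would then be passed explicitly to the loading helper.
print(train_dtype)  # prints bfloat16
```

With validation done once at the call site, the loading helper can trust its train_dtype argument and does not need its own lookup.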
@coderabbitai

coderabbitai Bot commented Jun 2, 2026


Warning

Review limit reached

@RobotSail, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 16 minutes and 44 seconds.

Your organization has run out of usage credits. Purchase more in the billing tab.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c243ffe7-b9c4-485b-90f4-259600fe8f8a

📥 Commits

Reviewing files that changed from the base of the PR and between e3db6da and 700e288.

📒 Files selected for processing (2)
  • src/mini_trainer/osft_utils.py
  • tests/test_osft.py

RobotSail closed this Jun 2, 2026
@codecov

codecov Bot commented Jun 2, 2026


Codecov Report

❌ Patch coverage is 40.00000% with 3 lines in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines
src/mini_trainer/osft_utils.py | 40.00% | 3 Missing ⚠️
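One way to read the report is to assume Codecov's usual definition of patch coverage: covered changed lines divided by all coverable changed lines, with no partially covered lines in this case. Under that assumption, 40% with 3 missing lines implies 5 coverable changed lines in osft_utils.py, of which 2 were exercised by tests:

$$\text{patch coverage} = \frac{\text{hits}}{\text{hits} + \text{misses}} = \frac{2}{2 + 3} = 0.40 = 40\%$$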

Development

Successfully merging this pull request may close these issues.

Pass train_dtype into _load_gpt_oss_model_memory_efficient

1 participant