Refactor Axolotl to pass a generation error callback #4927
Open
mgrange1998 wants to merge 8 commits into facebook:main from
…tch_utils
Summary: Renames the `max_parallelism` parameter to `max_concurrency` across GenerationStep, GenerationNode, and the generation strategy dispatch utilities. Adds backward-compatible, deprecated `max_parallelism` parameters with deprecation warnings where the public API is affected (`choose_generation_strategy`). Internal variable names (`sobol_parallelism`, `bo_parallelism`) are renamed to `sobol_concurrency` and `bo_concurrency` for consistency.
Differential Revision: D92457714

Summary: Renames the `parallelism` parameter to `concurrency` in `Client.run_trials()` and adds backward-compatible, deprecated `max_parallelism` parameters to `AxClient.create_experiment()` and to `AxClient.get_max_parallelism()` (renamed to `get_max_concurrency()`). Both emit deprecation warnings that point callers to the new parameter names, and both validate that the old and new parameters are not specified together.
Differential Revision: D93771849

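The deprecation pattern described in the two commits above (keep accepting the old name, warn, and reject calls that pass both names) can be sketched in Python roughly as follows. `resolve_max_concurrency` is a hypothetical helper written for illustration; it is not the actual Ax code, and the warning text is an assumption.

```python
import warnings
from typing import Optional


def resolve_max_concurrency(
    max_concurrency: Optional[int] = None,
    max_parallelism: Optional[int] = None,  # deprecated alias
) -> Optional[int]:
    """Hypothetical helper: map the deprecated kwarg onto the new one."""
    if max_parallelism is None:
        # Only the new name (or nothing) was passed.
        return max_concurrency
    if max_concurrency is not None:
        # Passing both names is ambiguous, so reject it outright.
        raise ValueError(
            "Specify only one of `max_concurrency` and the deprecated "
            "`max_parallelism`."
        )
    # The old name still works, but callers are told to migrate.
    warnings.warn(
        "`max_parallelism` is deprecated; use `max_concurrency` instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    return max_parallelism
```

Using `stacklevel=2` attributes the warning to the caller's line rather than to the helper itself, which makes it easier for users to find the call site they need to update.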
…Settings
Summary: Renames `num_parallel_jobs` to `num_concurrent_jobs` in `BenchmarkExecutionSettings` and all nightly benchmark configurations. Also updates the docstring in `BenchmarkMethod` to reference "pending trials" instead of "parallelism". This is a mechanical rename with no behavioral change.
Differential Revision: D93771883

…ants, and telemetry
Summary: Updates remaining references from "parallelism" to "concurrency" across orchestration, telemetry, early stopping, and other modules. This covers docstrings, comments, constant names (`MAX_PENDING_TRIALS` → `MAX_CONCURRENT_TRIALS`, `DUMMY_MAX_PENDING_TRIALS` → `DUMMY_MAX_CONCURRENT_TRIALS`), telemetry field names, and variable names in test files. No behavioral changes; this is purely a terminology alignment.
Differential Revision: D93771906

…tDesign.concurrency_limit`
Summary: As titled, adds a simple `ExperimentDesign` object. For now it is stored in the experiment's properties for serialization, to avoid duplicating work ahead of the storage refactor (and in case things change while working on this stack).
Differential Revision: D89770462

Summary: Migrates all references from `experiment._properties[Keys.EXPERIMENT_TOTAL_CONCURRENT_ARMS]` to `experiment.design.concurrency_limit`, completing the transition to the `ExperimentDesign` dataclass introduced in the prior diff. This affects generation node input constructors (including `ALL_N` and `REPEAT_N`), the Axolotl updater, and associated tests. Also cleans up the `no-commit` code in `generation_node_input_constructors.py` to use the new `concurrency_limit` field, with a fallback to a default of 10.
Differential Revision: D89772029

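The `ExperimentDesign` change in the two commits above (a small dataclass mirrored into properties for serialization, read back through `design.concurrency_limit` with a fallback of 10) might look like the sketch below. Only `ExperimentDesign`, `concurrency_limit`, and the default of 10 come from the summaries; the toy `Experiment` class, the `"experiment_design"` property key, and the helper names are hypothetical.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, Optional

DEFAULT_CONCURRENCY_LIMIT = 10  # fallback value mentioned in the summary


@dataclass
class ExperimentDesign:
    """Sketch of a design object holding the experiment's concurrency limit."""

    concurrency_limit: Optional[int] = None


@dataclass
class Experiment:
    """Hypothetical toy experiment; not Ax's real Experiment class."""

    design: ExperimentDesign = field(default_factory=ExperimentDesign)
    properties: Dict[str, Any] = field(default_factory=dict)

    def sync_design_to_properties(self) -> None:
        # Stash the design in the generic properties dict until dedicated
        # storage exists (the "put it into properties for now" approach).
        self.properties["experiment_design"] = asdict(self.design)

    @classmethod
    def from_properties(cls, properties: Dict[str, Any]) -> "Experiment":
        # Rebuild the design from the serialized properties, if present.
        design = ExperimentDesign(**properties.get("experiment_design", {}))
        return cls(design=design, properties=dict(properties))


def get_concurrency_limit(experiment: Experiment) -> int:
    # Read the limit from the design, falling back to a default when unset.
    limit = experiment.design.concurrency_limit
    return limit if limit is not None else DEFAULT_CONCURRENCY_LIMIT
```

Routing every read through one accessor means the fallback lives in a single place instead of being repeated at each former `_properties[...]` call site.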
Summary:
## Changes
Consolidates `generate_candidates` and `_prepare_trials` into a unified API:
- Renames `generate_candidates` → `generate_candidate_trials` and changes its return type to a 3-tuple `(existing_candidates, new_trials, error)`, incorporating the existing-candidate-trial logic that was previously in `_prepare_trials`.
- Extracts the capacity/limit calculation from `_prepare_trials` into a new `compute_n_to_generate` method, which the Orchestrator's main loop now calls before `generate_candidate_trials`.
- Renames `should_generate_candidates_for_pts` → `should_generate_candidate_trials_for_pts` and adds a "not enough data" check that validates that metrics have at least 1 day of data before allowing generation.
- Adds two new test methods for the "not enough data" and "missing metrics + not enough data" scenarios.
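The "not enough data" check in the list above could be approximated as follows. This is a hedged sketch: `check_enough_data` is a hypothetical function, and interpreting "at least 1 day of data" as the span between a metric's earliest and latest timestamps is an assumption. It reports missing metrics and too-short metrics together, mirroring the "missing metrics + not enough data" test scenario.

```python
from datetime import datetime, timedelta
from typing import List, Mapping, Optional, Sequence

# Assumed threshold: a metric's observed data must span at least one day.
MIN_DATA_SPAN = timedelta(days=1)


def check_enough_data(
    required_metrics: Sequence[str],
    data_timestamps: Mapping[str, Sequence[datetime]],
    min_span: timedelta = MIN_DATA_SPAN,
) -> Optional[str]:
    """Return None if generation may proceed, otherwise a reason string."""
    reasons: List[str] = []
    # Metrics with no data points at all.
    missing = [m for m in required_metrics if not data_timestamps.get(m)]
    if missing:
        reasons.append("missing metrics: " + ", ".join(missing))
    # Metrics that have data, but spanning less than `min_span`.
    too_short = [
        m
        for m in required_metrics
        if data_timestamps.get(m)
        and max(data_timestamps[m]) - min(data_timestamps[m]) < min_span
    ]
    if too_short:
        reasons.append("not enough data for: " + ", ".join(too_short))
    return "; ".join(reasons) if reasons else None
```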
## Devmate session
How doing this with Devmate went:
1. First we ask Devmate to analyse the difference between the methods; it does remarkably well: {F1984363089}
2. Next, a tangent: I renamed `generate_candidates` to a more precise name (`generate_candidate_trials`), since that is the method we will keep between the two, and it might as well have a better name. I then asked Devmate to apply the changes throughout fbcode.
{F1984363157} {F1984363170}
3. Now for the hard part: get `generate_candidate_trials` to match the behavior of `_prepare_trials`, without me writing any of the code: {F1984363323} {F1984363333}
^ Pretty good for starters! I give corrections (see above), and it applies them well: {F1984363346}
Then with one more small correction, we have a very solid plan: {F1984363398}, which Devmate implements: {F1984363406} {F1984363458}. I think it did really well!
Differential Revision: D89750211
Summary: Adds an `on_generation_error` callback parameter to `generate_candidate_trials` and a corresponding `on_generation_error` function in Axolotl utils. This allows callers like Axolotl to format error messages (including paste upload with full traceback) without the Orchestrator needing to know about paste infrastructure. The `generate_candidate_trials` return type changes from a 3-tuple to a 4-tuple, adding a `cannot_generate_reason` string that the callback populates when generation fails. The existing candidate trial accounting is also moved from `generate_candidate_trials` into `compute_n_to_generate`, so the `n` parameter now represents exactly the number of new trials to generate.
Differential Revision: D89751541
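The callback design described above can be sketched with a toy orchestrator. It borrows the method names `compute_n_to_generate` and `generate_candidate_trials` from the summary, but the class, signatures, and bodies are assumptions, as is placing `cannot_generate_reason` last in the 4-tuple. `format_generation_error` is a hypothetical stand-in for the Axolotl-side callback; a real one might upload the traceback instead of inlining it.

```python
import traceback
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Hypothetical callback type: turns an exception into a user-facing reason.
ErrorCallback = Callable[[Exception], str]


@dataclass
class Orchestrator:
    """Toy orchestrator; names are illustrative, not Ax's real API."""

    concurrency_limit: int
    existing_candidates: List[str] = field(default_factory=list)
    running: int = 0
    fail: bool = False  # simulate a generation failure

    def compute_n_to_generate(self) -> int:
        # Existing candidate trials count against capacity here, so the
        # returned value is exactly the number of *new* trials to generate.
        used = self.running + len(self.existing_candidates)
        return max(0, self.concurrency_limit - used)

    def generate_candidate_trials(
        self, n: int, on_generation_error: Optional[ErrorCallback] = None
    ) -> Tuple[List[str], List[str], Optional[Exception], Optional[str]]:
        try:
            if self.fail:
                raise RuntimeError("model fitting failed")
            start = len(self.existing_candidates)
            new_trials = [f"trial_{start + i}" for i in range(n)]
            return self.existing_candidates, new_trials, None, None
        except Exception as e:
            # The orchestrator only delegates formatting to the caller.
            reason = on_generation_error(e) if on_generation_error else str(e)
            return self.existing_candidates, [], e, reason


def format_generation_error(e: Exception) -> str:
    # Include the full traceback in the reason string.
    tb = "".join(traceback.format_exception(type(e), e, e.__traceback__))
    return f"Cannot generate: {e}\n{tb}"
```

Keeping the formatter outside the orchestrator means a caller can swap in its own reporting without the orchestrator importing that infrastructure; when no callback is passed, the sketch falls back to `str(e)`.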
@mgrange1998 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D89751541.