rdagent/app/data_science/conf.py (+5 −1)
@@ -145,7 +145,9 @@ class DataScienceBasePropSetting(KaggleBasePropSetting):
     coder_longer_timeout_multiplier_upper: int=3
     runner_longer_timeout_multiplier_upper: int=2
     coder_timeout_increase_stage: float=0.3
-    runner_timeout_increase_stage: float=0.15
+    runner_timeout_increase_stage: float=0.3
+    runner_timeout_increase_stage_patience: int=2
+    """Number of failures tolerated before escalating to the next timeout level (stage width): every `patience` failures, the timeout increases by `runner_timeout_increase_stage`."""
     show_hard_limit: bool=True
 
     #### enable runner code change summary
@@ -174,6 +176,8 @@ class DataScienceBasePropSetting(KaggleBasePropSetting):
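For intuition, here is a minimal sketch (not the repository's actual code) of the staged escalation these settings describe, with defaults mirroring the values above:

```python
# Hypothetical illustration of the staged timeout escalation described by the
# new settings: every `patience` failures, the multiplier grows by `stage`,
# capped at the configured upper bound. Names mirror the config fields above.

def runner_timeout_multiplier(
    n_failures: int,
    stage: float = 0.3,     # runner_timeout_increase_stage
    patience: int = 2,      # runner_timeout_increase_stage_patience
    upper: float = 2.0,     # runner_longer_timeout_multiplier_upper
) -> float:
    level = n_failures // patience          # completed "patience" windows
    return min(1.0 + level * stage, upper)

for n in range(6):
    print(n, runner_timeout_multiplier(n))  # 1.0, 1.0, 1.3, 1.3, 1.6, 1.6
```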
rdagent/scenarios/data_science/dev/runner/prompts.yaml (+4 −2)
@@ -34,16 +34,18 @@ DSCoSTEER_eval:
     For example, if the code uses only a very small portion of the allowed time, and hyperparameters like `n_estimators` or `epochs` have low values, with early stopping not being triggered and possible signs of underfitting, you should suggest increasing these hyperparameters.
     You should also pay attention to other resource-utilization hyperparameters.
     For example, if you are using a GPU with large memory and the batch size is set very low, you should suggest increasing the batch size if the current value is not reasonable.
-
+    For example, prioritize adjustments to batch size and the number of epochs. If further tuning is needed, consider parameters with a significant impact on performance, such as the learning rate and the number of model folds. For CV competitions, also consider image size (imgsize); for NLP competitions, consider maximum sequence length (maxlen), as these can have a substantial impact on results.
     ## Evaluation Guidelines
     1. The code execution time or resource utilization suggests that there is room for improvement in the hyperparameters.
     2. The code must already apply an early stopping strategy (in order to prevent overfitting).
     3. Your suggestion should have a strong chance of improving the model's performance. Focus on the most obvious and impactful opportunities for quick improvement by leveraging more training time. Don't explore hyperparameters with low confidence. If there are no obvious and impactful opportunities and the code runs well, please accept it.
     4. Only include the suggestions in your response, without leaking any time-limit information, because the user might overfit the model to the time limit.
     5. Never make your judgment based only on the time spent; you should also consider the code and the stdout.
+
     If the code satisfies the requirements:
     - Set "hyperparameter_tuning_decision" to true.
-    - In "hyperparameter_tuning_suggestion", provide a clear, specific, and actionable suggestion. Begin with a concrete observation, then state a direct action to take. Do not use vague language, options, or uncertainty (avoid words like "A or B"). For example: "[Observation] The maximum number of epochs was reached, but the validation loss is still decreasing and early stopping was not activated. Only a small portion of the allowed time was used. [Suggestion] Increase epochs to 100 to avoid underfitting and further improve model performance."
+    - In "hyperparameter_tuning_suggestion", provide a clear, specific, and actionable suggestion. Begin with a concrete observation, then state a direct action to take. Do not use vague language, options, or uncertainty (avoid words like "A or B"). For example: "[Observation] Training stopped due to early stopping while the validation loss was still decreasing. This suggests the patience parameter may be too small.
+    [Suggestion] Increase the early stopping patience to allow more training epochs before stopping, which can further improve model performance."
     If the code does not satisfy the requirements:
     - Set "hyperparameter_tuning_decision" to false.
     - Set "hyperparameter_tuning_suggestion" to an empty string.
@@ ... @@
     - Low Runtime Case: current max runtime ({{ time_max }} hours) is far from the time limit.
       - Prefer hypotheses with runtimes ≤ {{ full_time }} hours.
       - Hypotheses slightly above {{ time_max }} hours can be retained only with strong justification.
     {% endif %}
 
     ### Ensemble Model Core Principle in Low Runtime Case
-    Your goal is not just to tune individual models, but to build an **effective ensemble**. Make design decisions that lead to **strong overall ensemble performance**, not just strong base models.
-    Please note: you are operating under a time budget of {{ res_time }} seconds dedicated to ensemble training, and the maximum allowed time is {{ full_time }} seconds.
-
-    Please take the remaining {{ res_time }} seconds to carefully consider and design the most reasonable and optimal ensemble models based on your current progress.
+    Your goal is not just to tune individual models, but to build an **effective ensemble**. Make design decisions that lead to **strong overall ensemble performance**, not just strong base models.
+    These are examples:
+
+    Example 1:
     Assume training a single model takes about 1 hour. For example, if you have roughly twice that time left, you can try training multiple models with different random seeds or data splits to use the time effectively.
     If you have more time, you might consider training a multi-fold ensemble. Use your judgment to decide how many folds or seeds fit within your remaining time budget.
+
+    Example 2:
+    Assume training a single fold of a model takes at most {{ time_max }} hours. Within your remaining time budget, prioritize training multiple folds of the same model rather than trying many different models.
+    For instance, if you have roughly 2 × {{ time_max }} hours left, you could train 2 folds of the same model with different data splits or random seeds.
+    If more time is available, you might consider increasing the number of folds further. Use your judgment to decide how many folds fit within the remaining time budget while respecting the time_max constraint for a single fold.
 
     ### 2. Training-Time Resource Allocation
-    - You may use **multiple folds** if justified, but you must **ensure the full pipeline completes within runtime limits**.
+    - You may use **multiple folds** if justified, but you must **ensure the full pipeline completes within the remaining time budget**.
     - Avoid reducing base model quality just to save time. For example:
       - Freezing large parts of the model (e.g., embeddings)
       - Using only embedding-level regression instead of full modeling
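As a back-of-the-envelope companion to Example 2 above, a hypothetical helper (not part of the repository) that turns the remaining budget into a fold count:

```python
# Hypothetical budgeting helper: given the remaining ensemble budget in
# seconds (res_time) and the worst-case runtime of one fold in hours
# (time_max), estimate how many folds of the same model fit.

def affordable_folds(res_time_s: float, time_max_h: float) -> int:
    fold_cost_s = time_max_h * 3600.0              # worst-case cost of one fold
    return max(1, int(res_time_s // fold_cost_s))  # always train at least one fold

print(affordable_folds(res_time_s=2 * 3600, time_max_h=1.0))  # -> 2, as in Example 2
```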
@@ -702,19 +705,10 @@ task_gen:
     10. File Handling & DataFrame Generation: Generate a pandas DataFrame with columns ["id", "path", "fold"].
       - id: a unique identifier for each sample.
       - path: the file path of the corresponding sample.
-      - split: indicates the assignment of each sample for data splitting. Two modes are supported:
-        - K-Fold (optional): assign integers 0, 1, …, K-1 for each fold.
-        - Train/Test Split (optional): assign "train" or "test" to each sample according to the split ratio (e.g., 8:2).
-      - Ensure reproducibility: the DataFrame must be generated exactly the same way every time the script runs, e.g., by fixing the random seed to 42.
-    Data Splitting: use this DataFrame to perform dataset splitting, selecting samples for training and testing based on the fold column.
-    11. Random Seed for Model Training:
-      - If training neural networks, ensure the initial weights and all random operations use a fixed seed of 42 (e.g., torch.manual_seed(42), numpy.random.seed(42), random.seed(42)).
-      - If training machine learning models such as LightGBM, XGBoost, or scikit-learn estimators, absolutely ensure the random seed is fixed (e.g., `random_state=42`) to guarantee reproducibility.
-      - This is mandatory: all aspects of the experiment must be fully reproducible and aligned, including dataset splits and random seeds.
-      - For multi-fold training, use out-of-fold (OOF) predictions as validation scores and save them as an OOF file.
-    12. Hypothesis Handling: At the initial stage, multiple hypotheses may be proposed simultaneously. If some hypotheses overlap, select the most promising one for implementation and ignore redundant overlapping hypotheses. Each implemented hypothesis should remain an independent task.
-    Ensure reproducibility: the DataFrame must be generated exactly the same way every time the script runs, regardless of system or runtime conditions (e.g., by fixing the random seed).
+
+    11. Hypothesis Handling: At the initial stage, multiple hypotheses may be proposed simultaneously. If some hypotheses overlap, select the most promising one for implementation and ignore redundant overlapping hypotheses. Each implemented hypothesis should remain an independent task.
     {% endif %}
+
     ## Package Declaration
     At the end of your design, **you MUST** provide a key `packages` in the final JSON output.
     It should be an **array of PyPI package names** (strings) that you expect to `import` in the forthcoming implementation.
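The deleted lines in the hunk above specify a reproducible fold table; as a minimal, self-contained sketch of that pattern (the helper name, input paths, and fold count are illustrative, not from the prompt):

```python
import numpy as np
import pandas as pd

def make_fold_table(paths: list, k: int = 5, seed: int = 42) -> pd.DataFrame:
    """Build the ["id", "path", "fold"] table deterministically."""
    rng = np.random.default_rng(seed)           # fixed seed -> identical folds every run
    folds = rng.permutation(len(paths)) % k     # balanced assignment of folds 0..k-1
    return pd.DataFrame({
        "id": range(len(paths)),                # unique identifier per sample
        "path": paths,                          # file path of the corresponding sample
        "fold": folds,                          # used to select train/validation samples
    })

df = make_fold_table([f"data/sample_{i:03d}.png" for i in range(10)])
train = df[df["fold"] != 0]                     # e.g. hold out fold 0 for validation
valid = df[df["fold"] == 0]
```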