CodeCureAgent is an autonomous LLM-based agent for automated static analysis warning repair.
It classifies and fixes SonarQube rule violations in Java projects.
Please find our paper describing CodeCureAgent and its evaluation here: https://arxiv.org/abs/2509.11787.
You have two options. Either set up CodeCureAgent using the provided Dev Container (requires VS Code), or use the pre-built Docker image.
- User requirements:
  - Basic familiarity with Docker and/or VS Code Dev Containers.
  - An OpenAI API key with credits: create an account on the OpenAI website, purchase credits to use the API, and generate an API token on the same website.
- Hardware requirements:
  - At least 40 GB of free disk space.
  - At least 8 GB of free RAM (16 GB+ recommended).
  - Internet access.
- Software requirements:
  - Linux, macOS, or Windows with WSL2.
  - Docker 20.10+.
  - VS Code (optional, for the Dev Container workflow).
- Ensure the Dev Containers extension is installed in VS Code. You can install it from the Visual Studio Code Marketplace.
- Clone the CodeCureAgent repository:
  git clone https://github.com/sola-st/CodeCureAgent.git
- Open the repository folder in VS Code.
- Reopen in Container: when VS Code prompts you to "Reopen in Container," click it. If not prompted, open the Command Palette (Ctrl+Shift+P) and select "Dev Containers: Reopen in Container." VS Code will build and start the Dev Container, setting up the environment for you; this takes roughly 4 minutes. After the container is built, it continues running further setup in the terminal; wait until this completes as well (roughly 2 more minutes). If the Dev Container opened in less than a few minutes, it likely failed to create the container properly; in that case, rebuild it by opening the Command Palette (Ctrl+Shift+P) and selecting "Dev Containers: Rebuild in Container."
- In the VS Code terminal, move into the folder code_cure_agent:
  cd code_cure_agent
- Set the OpenAI API key: inside the Dev Container terminal, run
  python3 set_api_key.py
  The script will prompt you to paste your API token.
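For illustration, a dotenv-style key store could look like the following. This is a hypothetical sketch only; the repository's actual set_api_key.py may store the token differently, and the file name and variable name here are assumptions.

```python
import io


def save_api_key(key: str, stream) -> None:
    """Persist an OpenAI API key in dotenv format (hypothetical sketch,
    not the repository's actual set_api_key.py implementation)."""
    if not key.startswith("sk-"):
        raise ValueError("OpenAI API keys start with 'sk-'")
    stream.write(f"OPENAI_API_KEY={key}\n")


def load_api_key(stream) -> str:
    """Read the key back, e.g. at agent startup."""
    for line in stream:
        if line.startswith("OPENAI_API_KEY="):
            return line.split("=", 1)[1].strip()
    raise KeyError("OPENAI_API_KEY not found")
```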
- Clone the CodeCureAgent repository:
  git clone https://github.com/sola-st/CodeCureAgent.git
- Move into the repository root:
  cd CodeCureAgent
- Pull the pre-built image from Docker Hub:
  docker pull pascaljoos12d/codecureagent:latest
- Start the container with the experiment folders mounted:
  docker run -it --rm \
    -v "$(pwd)/code_cure_agent/experimental_setups:/workspace/CodeCureAgent/code_cure_agent/experimental_setups" \
    -v "$(pwd)/code_cure_agent/evaluation_results:/workspace/CodeCureAgent/code_cure_agent/evaluation_results" \
    pascaljoos12d/codecureagent:latest
  Any experiment logs written inside the container are immediately visible on the host (and vice versa). The container starts a bash shell inside the repository root.
- From the container shell, move into code_cure_agent before running any commands:
  cd code_cure_agent
- Set the OpenAI API key: from the container shell, run
  python3 set_api_key.py
  The script will prompt you to paste your API token.
All commands in the following sections 2 to 6 must be run from the folder code_cure_agent.
Run CodeCureAgent on a small included example with 3 warnings:
./run_on_dataset.sh ./experimental_setups/example_dataset/example_dataset_input_file.csv hyperparams.json
What happens:
- The input file is processed warning by warning.
- For each warning, CodeCureAgent checks out the target repository and commit.
- It initiates the autonomous repair process, first classifying the warning as a true positive or false positive, and then fixing or suppressing the warning accordingly.
- Detailed live logs are shown in the terminal.
- Run logs are stored in code_cure_agent/experimental_setups/experiment_X (auto-incrementing index).
Expected terminal output (shortened):
...
<Info on the configuration of the run>
...
Project checkout procedure starting.
...
CodeCureAgent is now running the Classification-Sub-Agent.
AUTHORISED COMMANDS LEFT: 20
CODECUREAGENT THOUGHTS: <Some agent thoughts>
NEXT ACTION: COMMAND = <Some command selected by the agent> ARGUMENTS = {<Arguments to the command>}
SYSTEM: Command `<command_name>` returned:
<Output from the command>
AUTHORISED COMMANDS LEFT: 19
... and so on, until the agent finishes its run.
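The cycle structure visible in the log (budget countdown, agent thoughts, selected command, command result) can be sketched schematically. The following is an illustrative sketch of such a budget-limited command loop, not the repository's actual implementation; all names are assumptions.

```python
from typing import Callable


def run_sub_agent(choose_action: Callable[[str], tuple[str, dict]],
                  execute: Callable[[str, dict], str],
                  cycles_limit: int = 20) -> list[str]:
    """Schematic budget-limited agent loop (illustrative only).

    choose_action stands in for the LLM call that picks the next command;
    execute stands in for running the selected tool.
    """
    transcript = []
    observation = "warning details"
    for budget_left in range(cycles_limit, 0, -1):
        transcript.append(f"AUTHORISED COMMANDS LEFT: {budget_left}")
        command, args = choose_action(observation)
        if command == "finish":
            break
        observation = execute(command, args)
        transcript.append(f"SYSTEM: Command `{command}` returned: {observation}")
    return transcript
```

A stub choose_action that issues two commands and then finishes reproduces the countdown pattern shown in the expected output above.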
Review the included logs, summaries, CSVs, Markdown files, and plots from our 1000-warning evaluation.
The experiment input files are located in code_cure_agent/experimental_setups/evaluation_dataset.
Most relevant here is the file code_cure_agent/experimental_setups/evaluation_dataset/evaluation_dataset_filled_up_to_1000_input_file.csv, which is the full input file to CodeCureAgent.
All log files from running the experiment on the 1000 warnings are located in code_cure_agent/evaluation_results/evaluation_outputs (split into multiple batches of experiment runs).
The most interesting files in this log output are in the subfolders code_cure_agent/evaluation_results/evaluation_outputs/experiment_X/run_summaries. For each warning run, these show details about the warning and the classification and fix results, including a diff of the changes made for successful fixes. (Multi-file fix example: code_cure_agent/evaluation_results/evaluation_outputs/experiment_1/run_summaries/6_summary.diff)
The extracted and aggregated evaluation results are located in code_cure_agent/evaluation_results.
Important evaluation result files:
- code_cure_agent/evaluation_results/analysis_results_overview_all.md: Aggregated results for all 1000 warnings. Includes effectiveness stats (with manual inspection) (RQ1), efficiency stats (RQ2), and ablation stats (RQ4).
- code_cure_agent/evaluation_results/analysis_results_overview_first_291.md: Aggregated results for the subset of 291 warnings with distinct SonarQube rules.
- code_cure_agent/evaluation_results/analysis_results_overview_random_samples.md: Aggregated results for the subset of 709 random samples.
- code_cure_agent/evaluation_results/plots: Plots visualizing the evaluation results.
- code_cure_agent/evaluation_results/evaluation_results.csv: CSV file with extracted evaluation results for all 1000 warnings (used to create the aggregated Markdown files and plots). This file includes the manual inspection results and reasoning for each warning.
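To illustrate how aggregated stats are derived from such a per-warning CSV, here is a minimal sketch. The column names and values (warning_id, classification, fix_successful) are hypothetical; the repository's evaluation_results.csv uses its own schema.

```python
import csv
import io

# Hypothetical per-warning results in the spirit of evaluation_results.csv;
# the real file's columns may differ.
SAMPLE = """warning_id,classification,fix_successful
1,true_positive,yes
2,true_positive,no
3,false_positive,suppressed
4,true_positive,yes
"""


def aggregate(csv_text: str) -> dict:
    """Compute simple effectiveness stats from per-warning rows."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    tp = [r for r in rows if r["classification"] == "true_positive"]
    fixed = [r for r in tp if r["fix_successful"] == "yes"]
    return {
        "total": len(rows),
        "true_positives": len(tp),
        "fix_rate": len(fixed) / len(tp),
    }
```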
The baseline comparisons are located in the comparative_study folder. Each baseline has its own subfolder with README and assets.
- Sorald comparison assets:
- iSMELL comparison assets:
- CORE comparison assets:
This section details how to regenerate result files and plots from included experiment outputs (via scripts).
If you want to recompute metrics from the provided full logs, copy all folders/files from:
code_cure_agent/evaluation_results/evaluation_outputs
into:
code_cure_agent/experimental_setups
- Create the evaluation results CSV:
  python3 experimental_setups/write_experiment_results_to_csv_file.py -t evaluation_results/new_experiment_results.csv
- Create the extended results CSV:
  python3 experimental_setups/extend_evaluation_results_with_more_stats.py evaluation_results/new_experiment_results.csv -t evaluation_results/new_experiment_results_extended.csv
- Aggregate stats into Markdown (for table-ready summaries):
  python3 experimental_setups/calculate_stats_from_evaluation_results.py evaluation_results/new_experiment_results_extended.csv -t evaluation_results/new_experiment_results_analysis.md
- Create the per-warning run summaries (the x_summary.diff files):
  python3 experimental_setups/create_warning_summaries.py -e evaluation_results/new_experiment_results_extended.csv
- Open all relevant resources for a repaired warning to perform a manual inspection (requires VS Code):
  python3 experimental_setups/show_next_warning_for_manual_inspection.py -e evaluation_results/new_experiment_results.csv --id-to-show 1
  Provide the ID of the warning to inspect via --id-to-show.
- Regenerate the plots via the notebooks in code_cure_agent/experimental_setups.
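The four scripted steps above can also be assembled programmatically, e.g. to drive them with subprocess. The sketch below only builds the ordered command list from the commands given above (it does not execute them, since that requires the full logs to be in place first).

```python
import shlex

# Output paths as used in the steps above.
CSV = "evaluation_results/new_experiment_results.csv"
EXT = "evaluation_results/new_experiment_results_extended.csv"
MD = "evaluation_results/new_experiment_results_analysis.md"

# The four regeneration steps, in order.
PIPELINE = [
    f"python3 experimental_setups/write_experiment_results_to_csv_file.py -t {CSV}",
    f"python3 experimental_setups/extend_evaluation_results_with_more_stats.py {CSV} -t {EXT}",
    f"python3 experimental_setups/calculate_stats_from_evaluation_results.py {EXT} -t {MD}",
    f"python3 experimental_setups/create_warning_summaries.py -e {EXT}",
]


def commands() -> list[list[str]]:
    """Split each step into argv form, ready for subprocess.run(cmd)."""
    return [shlex.split(step) for step in PIPELINE]
```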
Run:
./run_on_dataset.sh ./experimental_setups/evaluation_dataset/evaluation_dataset_filled_up_to_1000_input_file.csv hyperparams.json
Then use the same script chain from Section 4.2 to create:
- evaluation CSV
- extended CSV
- aggregated Markdown stats
- per-warning summaries
- plots
- Create a CSV listing the target repositories (no header), one line per repository, with three columns:
  - repository URL
  - commit ID (or MASTER for the latest commit on the master/main branch)
  - target Java version: the Java version the project compiles to, used to configure the SonarQube analyzer with the correct rules. It can be inferred automatically with the script code_cure_agent/experimental_setups/infer_target_java_version_of_projects.py.
  Example repo list (used in the following): code_cure_agent/experimental_setups/evaluation_dataset/evaluation_dataset_repos_list_with_java_versions.csv
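Based on the three-column, header-less format described above, a repo list can be produced and parsed as follows. The repository URLs, commit ID, and Java versions in this sketch are made-up examples.

```python
import csv
import io

# Hypothetical entries in the three-column, header-less format:
# repository URL, commit ID (or MASTER), target Java version.
repos = [
    ("https://github.com/example/project-a.git", "MASTER", "11"),
    ("https://github.com/example/project-b.git", "0a1b2c3", "8"),
]

# Write the list in CSV form (no header row).
buf = io.StringIO()
csv.writer(buf).writerows(repos)
repo_list_csv = buf.getvalue()

# Parse it back, as a consumer of the list might.
parsed = [tuple(row) for row in csv.reader(io.StringIO(repo_list_csv))]
```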
- Mine SonarQube warnings via Sorald:
  java -jar ./sorald/sorald.jar mine \
    --git-repos-list ./experimental_setups/evaluation_dataset/evaluation_dataset_repos_list_with_java_versions.csv \
    --miner-output-file ./experimental_setups/evaluation_dataset/mining_results/evaluation_dataset_out.txt \
    --stats-output-file ./experimental_setups/evaluation_dataset/mining_results/evaluation_dataset_mining_result.json \
    --temp-dir ./experimental_setups/evaluation_dataset/temp \
    --stats-on-git-repos \
    --rule-parameters ./sonarqube_quality_profile/quality_profile_rule_parameters.json \
    --handled-rules
  Notes:
  - Run from code_cure_agent.
  - Remove --handled-rules to mine all supported rules.
  - Filter by rule IDs with --rule-keys or by rule type with --rule-types.
  - SonarWay quality profile used in our experiments (pass to --rule-keys):
- Convert the mining JSON to a CodeCureAgent input CSV:
  python3 ./experimental_setups/prepare_experiment_input_file.py ./experimental_setups/evaluation_dataset/mining_results/evaluation_dataset_mining_result.json \
    --target-csv-file-path ./experimental_setups/evaluation_dataset/mining_results/evaluation_dataset_input_file_all_violations.csv \
    --rule-violations-mode single
- Optionally sample from the warnings using code_cure_agent/experimental_setups/sample_rule_violations_from_input_file.py.
- Run CodeCureAgent on the created input file:
  ./run_on_dataset.sh ./experimental_setups/evaluation_dataset/mining_results/evaluation_dataset_input_file_all_violations.csv hyperparams.json
Current limitation:
- Only Maven projects that build with mvn clean package using Maven 3.6.3 are supported.
7.1 Modify Hyperparameters in code_cure_agent/hyperparams.json
- Budget Control Strategy:
  Defines how the agent views the remaining cycles, suggested fixes, and minimum required fixes:
  - FULL-TRACK: Puts the max, consumed, and remaining budget in the prompt (default for our experiments).
  - NO-TRACK: Suppresses budget information.
  Example configuration:
  "budget_control": {
    "name": "FULL-TRACK",
    "params": {
      "#fixes": 4
    }
  }
  The agent should suggest at least 4 patches within the given budget; this number is updated based on agent progress (4 is the default).
- Repetition Handling:
  The default setting restricts repetitions (reprompting the LLM for repeated actions):
  "repetition_handling": "RESTRICT"
  Change it to "NONE" to allow unrestricted repetitions.
- Cycle Limits:
  Control the maximum allowed cycles (budget) of the different sub-agents. Defaults for our experiments:
  "classification_cycles_limit": 20,
  "fix_cycles_limit": 40
- Prioritize write_fix When Few Cycles Are Left:
  Default for our experiments:
  "prioritize_write_fix_cycle_threshold": 5
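Putting the options above together, such a configuration can be parsed and sanity-checked as follows. The JSON fragment is assembled from the defaults described above; the actual hyperparams.json in the repository may contain additional keys.

```python
import json

# Fragment assembled from the hyperparameters described above (defaults for
# our experiments); the real hyperparams.json may contain more keys.
HYPERPARAMS = """{
  "budget_control": {"name": "FULL-TRACK", "params": {"#fixes": 4}},
  "repetition_handling": "RESTRICT",
  "classification_cycles_limit": 20,
  "fix_cycles_limit": 40,
  "prioritize_write_fix_cycle_threshold": 5
}"""


def validate(config: dict) -> dict:
    """Check that the values fall within the documented option sets."""
    assert config["budget_control"]["name"] in {"FULL-TRACK", "NO-TRACK"}
    assert config["repetition_handling"] in {"RESTRICT", "NONE"}
    # The write_fix threshold only makes sense within the fix budget.
    assert config["prioritize_write_fix_cycle_threshold"] <= config["fix_cycles_limit"]
    return config


config = validate(json.loads(HYPERPARAMS))
```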
In the code_cure_agent/run_on_dataset.sh file, locate the line:
./run.sh --ai-settings agent_config_and_prompt_files/ai_settings.yaml --model-version gpt-4.1-mini-2025-04-14 -m json_file --experiment-file "$2"
Replace the --model-version value with one of:
- gpt-3.5-turbo-0125
- gpt-4-turbo-2024-04-09
- gpt-4o-mini-2024-07-18
- gpt-4o-2024-08-06
- gpt-4.1-nano-2025-04-14
- gpt-4.1-mini-2025-04-14
- gpt-4.1-2025-04-14
Reasoning models are not supported by the API version currently used in this project.
The following are the most important folders for the CodeCureAgent implementation:
- code_cure_agent/agent_core: Main parts of the CodeCureAgent implementation
- code_cure_agent/agent_core/agents: Implementation of the cyclic agent with two sub-agents
- code_cure_agent/agent_core/commands: Implemented tools that the agent can use
- code_cure_agent/agent_config_and_prompt_files: Used prompt files for the two sub-agents and agent configuration
