
wjurayj/final_answer


Is That Your Final Answer? Test-Time Scaling Improves Selective Question Answering

Our paper shows how reasoning models can use additional test-time compute to improve their confidence allocation and deliver stronger performance in selective question answering.
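To make the selective question answering setting concrete, here is a minimal sketch of how it can be scored. It rests on assumptions that do not come from this repo. Each question has a model confidence in [0, 1]. The model answers only when its confidence meets a threshold. An answer scores +1 if correct, minus a chosen `wrong_penalty` if incorrect, and 0 if the model abstains. The names `selective_utility` and `best_threshold` are hypothetical and are not part of this codebase, and the paper's exact scoring may differ.

```python
from typing import Sequence


def selective_utility(
    confidences: Sequence[float],
    correct: Sequence[bool],
    threshold: float,
    wrong_penalty: float,
) -> float:
    """Average utility per question when answering only if confidence >= threshold.

    Illustrative scoring (an assumption, not necessarily the paper's scheme):
    +1 for a correct answer, -wrong_penalty for a wrong answer, 0 for abstaining.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        return 0.0
    total = 0.0
    for conf, is_correct in zip(confidences, correct):
        if conf < threshold:
            continue  # abstain: contributes 0 to the total
        total += 1.0 if is_correct else -wrong_penalty
    return total / len(confidences)


def best_threshold(
    confidences: Sequence[float],
    correct: Sequence[bool],
    wrong_penalty: float,
    candidates: Sequence[float],
) -> float:
    """Return the candidate threshold with the highest utility (first one wins ties)."""
    return max(
        candidates,
        key=lambda t: selective_utility(confidences, correct, t, wrong_penalty),
    )


if __name__ == "__main__":
    # Toy data: two confident correct answers, two unconfident wrong ones.
    confs = [0.95, 0.80, 0.40, 0.20]
    labels = [True, True, False, False]
    print(selective_utility(confs, labels, threshold=0.0, wrong_penalty=1.0))  # 0.0
    print(selective_utility(confs, labels, threshold=0.5, wrong_penalty=1.0))  # 0.5
    print(best_threshold(confs, labels, wrong_penalty=1.0, candidates=[0.0, 0.5, 0.9]))  # 0.5
```

With these toy values and a penalty of 1, answering everything nets zero, while abstaining below 0.5 confidence recovers a positive score. If confidences are calibrated, answering has positive expected utility exactly when confidence exceeds `wrong_penalty / (1 + wrong_penalty)`. This is why higher penalties favor higher thresholds. `best_threshold` picks the threshold on the same examples it scores, which is in-sample and meant only for illustration.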


Installation

Be sure to use our version of vLLM, which is modified from the one in the original s1 repo.

pip install -r requirements.txt
cd eval/lm-evaluation-harness
pip install -e .[vllm]

Usage

First, run:

scripts/generate_chains.sh

Then run:

scripts/incremental_answers.sh

Then use notebooks/figures_aime.ipynb to recreate the plots from the paper.

Citing

If you find our paper or code useful, consider citing us:

@misc{jurayj2025finalanswertesttimescaling,
      title={Is That Your Final Answer? Test-Time Scaling Improves Selective Question Answering}, 
      author={William Jurayj and Jeffrey Cheng and Benjamin Van Durme},
      year={2025},
      eprint={2502.13962},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.13962}, 
}

About

Code for "Is That Your Final Answer? Test-Time Scaling Improves Selective Question Answering"
