Add MathAlea French math MCQ community task#3
Conversation
Jeronymous
left a comment
The ultimate test would be to get good results with a SOTA model that is not too big (SmolLM 3B)?
You chose to expose only the MCF formulation, which excludes pretrained models and, in my (limited) experience, is challenging to make work with instruct models.
Maybe you could propose several formulations (CF, MCF and hybrid) by doing something similar to https://github.com/huggingface/lighteval/blob/main/src/lighteval/tasks/multilingual/tasks/mlmm_arc_challenge.py
community_tasks/mathalea.py
Outdated
```python
GRADE_LEVELS = {
    "cinquième": "cinquieme",
    "quatrième": "quatrieme",
    "troisième": "troisieme",
    "première": "premiere",
    "terminale": "terminale",
}
```
You could use a function like this to normalize the subset names:

```python
import unicodedata

def remove_accents(text: str) -> str:
    return ''.join(
        c for c in unicodedata.normalize('NFD', text)
        if unicodedata.category(c) != 'Mn'
    )
```
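The grade-level mapping could then be derived rather than hard-coded; a self-contained sketch (it repeats the helper so it runs on its own, and takes the grade list from the PR description):

```python
import unicodedata

def remove_accents(text: str) -> str:
    # Decompose accented characters (NFD), then drop combining marks ('Mn').
    return ''.join(
        c for c in unicodedata.normalize('NFD', text)
        if unicodedata.category(c) != 'Mn'
    )

GRADES = ["cinquième", "quatrième", "troisième", "première", "terminale"]
# Map each accented subset name to its ASCII alias, e.g. "cinquième" -> "cinquieme".
GRADE_LEVELS = {grade: remove_accents(grade) for grade in GRADES}
```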
community_tasks/mathalea.py
Outdated
```python
        stop_sequence=["\n"],
        version=0,
    )
    for subset, alias in GRADE_LEVELS.items()
```
This can be factored by treating "all" like the other subsets.
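A sketch of that factoring, with "all" folded into the same mapping so a single loop builds every task config. The `make_task_config` helper is hypothetical, standing in for the `LightevalTaskConfig` construction in the PR:

```python
# "all" is treated like any other subset; one loop covers all six tasks.
SUBSETS = {
    "all": "all",
    "cinquième": "cinquieme",
    "quatrième": "quatrieme",
    "troisième": "troisieme",
    "première": "premiere",
    "terminale": "terminale",
}

def make_task_config(subset: str, alias: str) -> dict:
    # Hypothetical stand-in: returns the keyword arguments that would be
    # passed to LightevalTaskConfig in the real module.
    return {
        "name": f"mathalea:{alias}",
        "hf_subset": subset,
        "stop_sequence": ["\n"],
        "version": 0,
    }

TASKS = [make_task_config(subset, alias) for subset, alias in SUBSETS.items()]
```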
community_tasks/mathalea.py
Outdated
```python
def prompt_mathalea(line, task_name: str = None):
    """Build a multiple-choice prompt from a MathAlea dataset line."""
    choices = line["choices"]
    query = f"{line['question'].strip()}\n"
```
If you add "Réponse :" at the end, don't you want to add a "Question :" prefix here?
Or even better, start with an instruction such as "Réponds à la question à choix multiple suivante, en répondant avec le format LETTER, où LETTER est une lettre parmi A, B, C ou D."
Also, do we want models to be allowed to reason before answering?
If so, something along the lines of "gpqa_diamond_instruct" is needed -- https://github.com/huggingface/lighteval/blob/main/src/lighteval/tasks/tasks/gpqa.py
(it is a fairly hard dataset; models are not expected to be able to answer with the correct letter directly, without reasoning first).
> est une lettre parmi A, B, C ou D.

Oh wait, it seems that sometimes there are fewer than 4 possible answers, right?
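Both points could be handled by deriving the letter list from the actual number of choices. A hypothetical sketch of such a prompt function (the field names follow the diff above; the instruction wording is the one proposed earlier):

```python
import string

def prompt_mathalea_instruct(line: dict) -> str:
    # Hypothetical sketch: instruct-style MCQ prompt whose letter list
    # matches the number of choices in the dataset line (assumes >= 2 choices).
    choices = line["choices"]
    letters = list(string.ascii_uppercase[:len(choices)])
    # e.g. ["A", "B", "C"] -> "A, B ou C"
    letter_list = ", ".join(letters[:-1]) + " ou " + letters[-1]
    instruction = (
        "Réponds à la question à choix multiple suivante, en répondant "
        f"avec le format LETTER, où LETTER est une lettre parmi {letter_list}.\n\n"
    )
    body = f"Question : {line['question'].strip()}\n"
    for letter, choice in zip(letters, choices):
        body += f"{letter}. {choice}\n"
    return instruction + body + "Réponse :"
```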
Summary
`community_tasks/mathalea.py` — a multiple-choice math benchmark for French middle and high school students (cinquième, quatrième, troisième, première, terminale)

- `mathalea:all` (full dataset) + 5 per-grade subtasks
- `loglikelihood_acc` metric with single-letter choices (A, B, C, D)

Test plan

```
lighteval accelerate "model_name=..." "community|mathalea:all|0"
```