
Add MathAlea French math MCQ community task #3

Open
Lduignan1 wants to merge 5 commits into OpenLLM-France:main from Lduignan1:mathalea

Conversation


@Lduignan1 Lduignan1 commented Mar 6, 2026

Summary

  • Add community_tasks/mathalea.py — a multiple-choice math benchmark for French middle and
    high school students (cinquième, quatrième, troisième, première, terminale)
  • Dataset: OpenLLM-BPI/MathAleaMCQ
  • 6 task configs: mathalea:all (full dataset) + 5 per-grade subtasks
  • Uses loglikelihood_acc metric with single-letter choices (A, B, C, D)

Test plan

  • Verify task loads: lighteval accelerate "model_name=..." "community|mathalea:all|0"
  • Verify per-grade subtasks load correctly
  • Confirm dataset is accessible on HuggingFace

@Jeronymous (Member) left a comment


Shouldn't the ultimate test be to get good results with a SOTA model that is not too big (e.g. SmolLM 3B)?

You chose to expose only the MCF formulation, which excludes pretrained models and, in my (limited) experience, is challenging to make work with instruct models.
Maybe you could propose several formulations (CF, MCF and hybrid) by doing something similar to https://github.com/huggingface/lighteval/blob/main/src/lighteval/tasks/multilingual/tasks/mlmm_arc_challenge.py

Comment on lines +29 to +35
GRADE_LEVELS = {
"cinquième": "cinquieme",
"quatrième": "quatrieme",
"troisième": "troisieme",
"première": "premiere",
"terminale": "terminale",
}
Member


You could use a function like this to normalize the subset names:

import unicodedata

def remove_accents(text: str) -> str:
    return ''.join(
        c for c in unicodedata.normalize('NFD', text)
        if unicodedata.category(c) != 'Mn'
    )
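As a usage sketch (assuming the helper above; the `grades` list just restates the subsets from the PR summary), the hand-written mapping could be derived instead of maintained by hand:

```python
import unicodedata

def remove_accents(text: str) -> str:
    # NFD decomposes "è" into "e" + a combining accent; dropping
    # category "Mn" (mark, nonspacing) removes the accent.
    return ''.join(
        c for c in unicodedata.normalize('NFD', text)
        if unicodedata.category(c) != 'Mn'
    )

grades = ["cinquième", "quatrième", "troisième", "première", "terminale"]
GRADE_LEVELS = {g: remove_accents(g) for g in grades}
# GRADE_LEVELS["cinquième"] == "cinquieme"
```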

stop_sequence=["\n"],
version=0,
)
for subset, alias in GRADE_LEVELS.items()
Member


This can be factored by treating "all" like the other subsets.
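A minimal sketch of that factoring, using plain dicts as stand-ins for the real task-config arguments (the field names here are illustrative, not lighteval's exact API):

```python
GRADE_LEVELS = {
    "cinquième": "cinquieme",
    "quatrième": "quatrieme",
    "troisième": "troisieme",
    "première": "premiere",
    "terminale": "terminale",
}

# Treat "all" as just another subset so a single comprehension
# builds all six task configs.
SUBSETS = {"all": "all", **GRADE_LEVELS}

TASKS_TABLE = [
    {
        "name": f"mathalea:{alias}",  # "mathalea:all", "mathalea:cinquieme", ...
        "hf_subset": subset,          # subset name in the HF dataset
        "stop_sequence": ["\n"],
        "version": 0,
    }
    for subset, alias in SUBSETS.items()
]
```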

def prompt_mathalea(line, task_name: str = None):
"""Build a multiple-choice prompt from a MathAlea dataset line."""
choices = line["choices"]
query = f"{line['question'].strip()}\n"
Member


If you add "Réponse :" at the end, don't you also want to add a "Question :" prefix here?
Or even better, start with an instruction such as "Réponds à la question à choix multiple suivante, en répondant avec le format LETTER, où LETTER est une lettre parmi A, B, C ou D."

Also, do we want models to be allowed to reason before answering?
If so, we should do something like "gpqa_diamond_instruct" -- https://github.com/huggingface/lighteval/blob/main/src/lighteval/tasks/tasks/gpqa.py
(it is a fairly hard dataset; models are not expected to be able to answer directly with the correct letter without thinking first).
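A sketch of the suggested prompt shape (the dataset field names `question` and `choices` come from the diff; the instruction wording is the reviewer's proposal, and `build_query` is a hypothetical helper):

```python
INSTRUCTION = (
    "Réponds à la question à choix multiple suivante, en répondant avec le "
    "format LETTER, où LETTER est une lettre parmi A, B, C ou D.\n\n"
)

def build_query(line: dict) -> str:
    # Letters are generated from the number of choices rather than
    # hard-coded, so questions with fewer than 4 options still work.
    choices = line["choices"]
    letters = [chr(ord("A") + i) for i in range(len(choices))]
    parts = [INSTRUCTION, f"Question : {line['question'].strip()}\n"]
    parts += [f"{letter}. {choice}\n" for letter, choice in zip(letters, choices)]
    parts.append("Réponse :")
    return "".join(parts)
```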

Member


est une lettre parmi A, B, C ou D.

Oh wait, it seems that sometimes there are fewer than 4 possible answers, right?
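If so, the letters listed in the instruction could be generated from the actual number of choices (a sketch; `instruction_for` is a hypothetical helper, assuming at least two choices):

```python
def instruction_for(n_choices: int) -> str:
    # Build an "A, B ou C"-style enumeration from the real choice
    # count (assumes n_choices >= 2).
    letters = [chr(ord("A") + i) for i in range(n_choices)]
    listed = ", ".join(letters[:-1]) + " ou " + letters[-1]
    return (
        "Réponds à la question à choix multiple suivante, en répondant "
        f"avec le format LETTER, où LETTER est une lettre parmi {listed}."
    )
```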
