
Add MathAlea French math MCQ community task #3

Open
Lduignan1 wants to merge 5 commits into OpenLLM-France:main from Lduignan1:mathalea

Conversation


@Lduignan1 Lduignan1 commented Mar 6, 2026

Summary

  • Add community_tasks/mathalea.py — a multiple-choice math benchmark for French middle and
    high school students (cinquième, quatrième, troisième, première, terminale)
  • Dataset: OpenLLM-BPI/MathAleaMCQ
  • 6 task configs: mathalea:all (full dataset) + 5 per-grade subtasks
  • Uses loglikelihood_acc metric with single-letter choices (A, B, C, D)

Test plan

  • Verify task loads: lighteval accelerate "model_name=..." "community|mathalea:all|0"
  • Verify per-grade subtasks load correctly
  • Confirm dataset is accessible on HuggingFace

@Jeronymous (Member) left a comment


Shouldn't the ultimate test be to get good results with a SOTA model that is not too big (e.g. SmolLM 3B)?

You chose to expose only the MCF formulation, which excludes pretrained models and, in my (limited) experience, is challenging to make work with instruct models.
Maybe you could propose several formulations (CF, MCF and hybrid) by doing something similar to https://github.com/huggingface/lighteval/blob/main/src/lighteval/tasks/multilingual/tasks/mlmm_arc_challenge.py

Comment on lines +29 to +35
GRADE_LEVELS = {
"cinquième": "cinquieme",
"quatrième": "quatrieme",
"troisième": "troisieme",
"première": "premiere",
"terminale": "terminale",
}
Member


You could use a function like this to normalize the subset names:

import unicodedata

def remove_accents(text: str) -> str:
    return ''.join(
        c for c in unicodedata.normalize('NFD', text)
        if unicodedata.category(c) != 'Mn'
    )
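As a usage sketch (assuming the helper above; the `grades` list just restates the subsets from the PR summary), the hand-written mapping could be derived instead of maintained by hand:

```python
import unicodedata

def remove_accents(text: str) -> str:
    # NFD decomposes "è" into "e" + a combining accent; dropping
    # category "Mn" (mark, nonspacing) removes the accent.
    return ''.join(
        c for c in unicodedata.normalize('NFD', text)
        if unicodedata.category(c) != 'Mn'
    )

grades = ["cinquième", "quatrième", "troisième", "première", "terminale"]
GRADE_LEVELS = {g: remove_accents(g) for g in grades}
# GRADE_LEVELS["cinquième"] == "cinquieme"
```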

stop_sequence=["\n"],
version=0,
)
for subset, alias in GRADE_LEVELS.items()
Member


This can be factored by treating "all" like the other subsets.
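A minimal sketch of that factoring, using plain dicts as stand-ins for the real task-config arguments (the field names here are illustrative, not lighteval's exact API):

```python
GRADE_LEVELS = {
    "cinquième": "cinquieme",
    "quatrième": "quatrieme",
    "troisième": "troisieme",
    "première": "premiere",
    "terminale": "terminale",
}

# Treat "all" as just another subset so a single comprehension
# builds all six task configs.
SUBSETS = {"all": "all", **GRADE_LEVELS}

TASKS_TABLE = [
    {
        "name": f"mathalea:{alias}",  # "mathalea:all", "mathalea:cinquieme", ...
        "hf_subset": subset,          # subset name in the HF dataset
        "stop_sequence": ["\n"],
        "version": 0,
    }
    for subset, alias in SUBSETS.items()
]
```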

def prompt_mathalea(line, task_name: str = None):
"""Build a multiple-choice prompt from a MathAlea dataset line."""
choices = line["choices"]
query = f"{line['question'].strip()}\n"
Member


If you add "Réponse :" at the end, don't you also want to add a "Question :" prefix here?
Or even better, start with an instruction such as "Réponds à la question à choix multiple suivante, en répondant avec le format LETTER, où LETTER est une lettre parmi A, B, C ou D."

Also, do we want models to be allowed to reason before answering?
If so, we should do something like "gpqa_diamond_instruct" -- https://github.com/huggingface/lighteval/blob/main/src/lighteval/tasks/tasks/gpqa.py
(it is a fairly hard dataset; models are not expected to be able to answer directly with the correct letter without thinking first).
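A sketch of the suggested prompt shape (the dataset field names `question` and `choices` come from the diff; the instruction wording is the reviewer's proposal, and `build_query` is a hypothetical helper):

```python
INSTRUCTION = (
    "Réponds à la question à choix multiple suivante, en répondant avec le "
    "format LETTER, où LETTER est une lettre parmi A, B, C ou D.\n\n"
)

def build_query(line: dict) -> str:
    # Letters are generated from the number of choices rather than
    # hard-coded, so questions with fewer than 4 options still work.
    choices = line["choices"]
    letters = [chr(ord("A") + i) for i in range(len(choices))]
    parts = [INSTRUCTION, f"Question : {line['question'].strip()}\n"]
    parts += [f"{letter}. {choice}\n" for letter, choice in zip(letters, choices)]
    parts.append("Réponse :")
    return "".join(parts)
```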

Member


est une lettre parmi A, B, C ou D.

Oh wait, it seems that sometimes there are fewer than 4 possible answers, right?
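If so, the letters listed in the instruction could be generated from the actual number of choices (a sketch; `instruction_for` is a hypothetical helper, assuming at least two choices):

```python
def instruction_for(n_choices: int) -> str:
    # Build an "A, B ou C"-style enumeration from the real choice
    # count (assumes n_choices >= 2).
    letters = [chr(ord("A") + i) for i in range(n_choices)]
    listed = ", ".join(letters[:-1]) + " ou " + letters[-1]
    return (
        "Réponds à la question à choix multiple suivante, en répondant "
        f"avec le format LETTER, où LETTER est une lettre parmi {listed}."
    )
```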
