HeshamHaroon's picture
Update: Auto-evaluation on Space startup
de63c9e verified

A newer version of the Gradio SDK is available: 6.1.0

Upgrade
metadata
title: Arabic Function Calling Leaderboard
emoji: ๐Ÿ†
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: true
license: apache-2.0
tags:
  - arabic
  - function-calling
  - leaderboard
  - llm-evaluation

๐Ÿ† Arabic Function Calling Leaderboard

ู„ูˆุญุฉ ุชู‚ูŠูŠู… ุงุณุชุฏุนุงุก ุงู„ุฏูˆุงู„ ุจุงู„ุนุฑุจูŠุฉ

Overview

The Arabic Function Calling Leaderboard (AFCL) evaluates Large Language Models on their ability to:

  1. Understand Arabic queries (MSA + Dialects)
  2. Select appropriate functions from available options
  3. Extract correct arguments from Arabic text
  4. Handle parallel and complex function calls
  5. Detect when no function should be called

Models Evaluated

  • Arabic-Native: Jais, ALLaM, SILMA, AceGPT
  • Multilingual: Qwen, Llama, Gemma, Mistral, Phi, BLOOMZ, Aya

Dataset

๐Ÿ“Š Dataset: HeshamHaroon/Arabic_Function_Calling

  • 1,470 total samples across 10 categories
  • Simple, Multiple, Parallel, Parallel Multiple
  • Irrelevance Detection
  • Dialect Handling (Egyptian, Gulf, Levantine)

Evaluation

The leaderboard automatically evaluates models using the HuggingFace Inference API when the Space starts.

Citation

@misc{afcl2024,
    title={Arabic Function Calling Leaderboard},
    author={Hesham Haroon},
    year={2024},
    url={https://huggingface.co/spaces/HeshamHaroon/Arabic-Function-Calling-Leaderboard}
}