metadata

title: Arabic Function Calling Leaderboard
emoji: 🏆
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: true
license: apache-2.0
tags:
  - arabic
  - function-calling
  - leaderboard
  - llm-evaluation

🏆 Arabic Function Calling Leaderboard

A leaderboard for evaluating function calling in Arabic

Overview

The Arabic Function Calling Leaderboard (AFCL) evaluates Large Language Models on their ability to:

Understand Arabic queries (MSA + Dialects)
Select appropriate functions from available options
Extract correct arguments from Arabic text
Handle parallel and complex function calls
Detect when no function should be called
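To make these criteria concrete, here is a minimal sketch of an exact-match check for function calls. It is not taken from the Space's code: the call format (a dict with `name` and `arguments`), the helper names `normalize` and `calls_match`, and the `get_weather` function are all assumptions made for illustration, and the leaderboard's real scoring may be more lenient or more detailed.

```python
import json


def normalize(call):
    """Return a comparable (name, canonical-arguments) pair for one call.

    Assumed call format (hypothetical): {"name": str, "arguments": dict}.
    """
    # sort_keys makes the comparison independent of argument order;
    # ensure_ascii=False keeps Arabic argument values readable when debugging.
    args = json.dumps(call.get("arguments", {}), sort_keys=True, ensure_ascii=False)
    return (call["name"], args)


def calls_match(predicted, expected):
    """Exact-match check of a list of predicted calls against the expected list.

    An empty expected list models the "no function should be called" case.
    Parallel calls are compared as a multiset, so their order does not matter.
    """
    return sorted(map(normalize, predicted)) == sorted(map(normalize, expected))


# Hypothetical example: an Arabic query about the weather in Cairo.
expected = [{"name": "get_weather", "arguments": {"city": "القاهرة", "unit": "celsius"}}]
predicted = [{"name": "get_weather", "arguments": {"unit": "celsius", "city": "القاهرة"}}]

print(calls_match(predicted, expected))  # True: same function, same arguments
print(calls_match([], []))               # True: correctly made no call
print(calls_match(predicted, []))        # False: called a function when none was expected
```

Exact matching is the strictest option. A real evaluator might also accept equivalent argument values, such as a city name written in a different script.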

Models Evaluated

Arabic-Native: Jais, ALLaM, SILMA, AceGPT
Multilingual: Qwen, Llama, Gemma, Mistral, Phi, BLOOMZ, Aya

Dataset

📊 Dataset: HeshamHaroon/Arabic_Function_Calling

1,470 total samples across 10 categories
Simple, Multiple, Parallel, Parallel Multiple
Irrelevance Detection
Dialect Handling (Egyptian, Gulf, Levantine)

Evaluation

The leaderboard automatically evaluates models using the Hugging Face Inference API when the Space starts.
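The Space's evaluation code is not shown here, so the following is only a sketch of how a per-category evaluation loop could be organised. The sample fields (`category`, `query`, `functions`, `expected`), the `evaluate` helper, and the `model_fn` callable are hypothetical. In a real run, `model_fn` would wrap a request to the Inference API. Here a stub stands in for it so the example runs offline.

```python
from collections import defaultdict


def evaluate(model_fn, samples, match=lambda pred, exp: pred == exp):
    """Return accuracy per category.

    model_fn: hypothetical callable (query, functions) -> list of predicted calls.
    samples:  iterable of dicts with assumed keys
              "category", "query", "functions", "expected".
    match:    comparison between predicted and expected calls
              (plain equality by default).
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for sample in samples:
        category = sample["category"]
        totals[category] += 1
        try:
            predicted = model_fn(sample["query"], sample["functions"])
        except Exception:
            predicted = None  # a failed request counts as a wrong answer
        if predicted is not None and match(predicted, sample["expected"]):
            correct[category] += 1
    return {category: correct[category] / totals[category] for category in totals}


# Illustrative data only; these are not samples from the actual dataset.
weather_fn = {"name": "get_weather", "parameters": {"city": "string"}}
samples = [
    {
        "category": "simple",
        "query": "ما حالة الطقس في القاهرة؟",  # "What is the weather in Cairo?"
        "functions": [weather_fn],
        "expected": [{"name": "get_weather", "arguments": {"city": "القاهرة"}}],
    },
    {
        "category": "irrelevance",
        "query": "كيف حالك؟",  # "How are you?" -> no function should be called
        "functions": [weather_fn],
        "expected": [],
    },
]


def stub_model(query, functions):
    """Stand-in for a model: always calls get_weather for Cairo."""
    return [{"name": "get_weather", "arguments": {"city": "القاهرة"}}]


scores = evaluate(stub_model, samples)
print(scores)  # {'simple': 1.0, 'irrelevance': 0.0}
```

Reporting accuracy per category rather than as one number shows where a model is weak. The stub above handles the simple call but fails irrelevance detection.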

Citation

@misc{afcl2024,
    title={Arabic Function Calling Leaderboard},
    author={Hesham Haroon},
    year={2024},
    url={https://huggingface.co/spaces/HeshamHaroon/Arabic-Function-Calling-Leaderboard}
}