Upload from GitHub Actions: add gpt-5.1, gemini-3 9ea2dd3 verified davidpomerenke committed on 15 days ago
Upload from GitHub Actions: flores filter for available dev split 34b05c6 verified davidpomerenke committed on Nov 10
Upload from GitHub Actions: model name no bracket stuff aa92add verified davidpomerenke committed on Nov 9
Upload from GitHub Actions: Merge pull request #22 from datenlabor-bmz/dev 2cdada4 verified davidpomerenke committed on Oct 27
Upload from GitHub Actions: Add auto-translated datasets 68a93b5 verified davidpomerenke committed on Sep 20
Upload from GitHub Actions: Merge pull request #18 from datenlabor-bmz/pr-17 a0d1624 verified davidpomerenke committed on Sep 11
Upload from GitHub Actions: Add auto-translated datasets c790fdb verified davidpomerenke committed on Sep 1
Upload from GitHub Actions: ran full evaluation locally 088f96f verified davidpomerenke committed on Aug 30
Upload from GitHub Actions: updated and cleaned up scripts for new eval runs 963cb78 verified davidpomerenke committed on Aug 29
Upload from GitHub Actions: Update models.py, models.json, and results.json with latest evaluation data and model additions 8eebb41 verified davidpomerenke committed on Aug 27
Upload from GitHub Actions: Add Todos for using existing machine-translated datasets rather than our own ones 56adaa2 verified davidpomerenke committed on Aug 14
Upload from GitHub Actions: updated translation functions 8f5ce26 verified davidpomerenke committed on Aug 13
Upload from GitHub Actions: import flexibility on backend b8cbeff verified davidpomerenke committed on Aug 13
Upload from GitHub Actions: updated frontend and backend to fix bugs 4e8cb1a verified davidpomerenke committed on Aug 13
Upload from GitHub Actions: Merge pull request #13 from datenlabor-bmz/jn-dev 80d21cb verified davidpomerenke committed on Aug 8
Upload from GitHub Actions: Merge pull request #10 from datenlabor-bmz/jn-dev c2eeeac verified davidpomerenke committed on Aug 5
Upload from GitHub Actions: updated batch size and delay 02f927b verified davidpomerenke committed on Aug 5
Upload from GitHub Actions: updated workflow settings e51c770 verified davidpomerenke committed on Aug 5
Upload from GitHub Actions: Merge pull request #9 from datenlabor-bmz/jn-dev 7c06aef verified davidpomerenke committed on Aug 5
Upload from GitHub Actions: Merge pull request #7 from datenlabor-bmz/jn-dev 6878a71 verified davidpomerenke committed on Jul 25
Upload from GitHub Actions: Merge pull request #6 from datenlabor-bmz/jn-dev 6234f5c verified davidpomerenke committed on Jul 24
Upload from GitHub Actions: Exclude TruthfulQA from proficiency score 3fbff09 verified davidpomerenke committed on Jul 4
Upload from GitHub Actions: TruthfulQA translation WIP fd102e9 verified davidpomerenke committed on Jul 4
Upload from GitHub Actions: Get more results, compute average based on all tasks 98c6811 verified davidpomerenke committed on Jul 2
Upload from GitHub Actions: Translate MMLU and evaluate 4c5c136 verified davidpomerenke committed on Jun 30
Upload from GitHub Actions: Evaluate on autotranslated GSM dataset f3a09a2 verified davidpomerenke committed on Jun 29
Upload from GitHub Actions: Evaluate Google Translate 338dc9b verified davidpomerenke committed on Jun 28
Upload from GitHub Actions: More models and languages a73f888 verified davidpomerenke committed on Jun 6
Upload from GitHub Actions: Merge remote changes and apply terminology updates: Commercial->closed-source, Open->open-source ebaf279 verified davidpomerenke committed on Jun 4
Upload from GitHub Actions: Use task subset for average score b1e5b40 verified davidpomerenke committed on Jun 4
Upload from GitHub Actions: Evaluate on 40 languages 941d5c5 verified davidpomerenke committed on Jun 4
Upload from GitHub Actions: Update model ranking fetching f840423 verified davidpomerenke committed on May 22
Upload from GitHub Actions: Use FLORES+ via Huggingface 913253a verified davidpomerenke committed on May 22
Upload from GitHub Actions: Merge pull request #4 from datenlabor-bmz/jonas-dev 7c6a118 verified davidpomerenke committed on May 12