RoBERTa-SA-FT (Anno-lexical)
This model is a sentence-level classifier for lexical media bias (biased or loaded wording). It is the Synthetic-Annotations Fine-Tuned (SA-FT) classifier from the paper “The Promises and Pitfalls of LLM Annotations in Dataset Labeling: a Case Study on Media Bias Detection” (NAACL Findings 2025; arXiv:2411.11081). The architecture is a roberta-base encoder with a 2-layer classification head, trained on the Anno-lexical dataset (48,330 sentences; 70/15/15 train/dev/test splits). Labels are binary: 0 = neutral (no lexical bias), 1 = lexical bias.
Paper: The Promises and Pitfalls of LLM Annotations in Dataset Labeling: a Case Study on Media Bias Detection
Dataset: mediabiasgroup/anno-lexical
Intended use & limitations
- Intended use: research on lexical/loaded-language bias in news-like English sentences; benchmarking SA-FT vs human-annotation fine-tuning.
- Out-of-scope: detecting non-lexical forms of media bias (e.g., informational/selection bias), political leaning, stance, or factuality; production deployments without human oversight.
- Known caveats: Compared with a model fine-tuned on human labels (HA-FT), SA-FT emphasizes recall over precision; robustness to perturbations is weaker than HA-FT.
How to use
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

m = "mediabiasgroup/roberta-anno-lexical-ft"
tok = AutoTokenizer.from_pretrained(m)
model = AutoModelForSequenceClassification.from_pretrained(m)
model.eval()  # inference mode (disables dropout)

text = "Democrats shamelessly rammed the bill through Congress."
inputs = tok(text, return_tensors="pt")
with torch.no_grad():  # no gradients needed at inference time
    probs = model(**inputs).logits.softmax(-1).squeeze().tolist()
print({"neutral": probs[0], "lexical_bias": probs[1]})
Training data & setup
Training data: Anno-lexical (48,330 sentences; binary lexical-bias labels aggregated via majority vote from multiple LLM annotators; see the sketch below).
Base encoder: roberta-base; head: 2-layer classifier.
Hardware: single A100; single-run training.
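As an illustration of the label-aggregation step (the exact annotation pipeline, including tie-breaking, is described in the paper; the tie-breaking rule here is an assumption), majority voting over several LLM annotators reduces to:

from collections import Counter

def majority_vote(labels: list[int]) -> int:
    """Aggregate binary labels (0/1) from several LLM annotators.
    Ties are broken toward 0 (neutral) in this sketch; the paper's
    exact tie-breaking rule may differ."""
    counts = Counter(labels)
    return 1 if counts[1] > counts[0] else 0

# e.g., three annotators: two say biased, one says neutral
print(majority_vote([1, 1, 0]))  # -> 1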
Evaluation
BABE (test): P 0.875 / R 0.814 / F1 0.843 / MCC 0.662
BASIL (all): P 0.171 / R 0.502 / F1 0.254 / MCC 0.205 (Positive class = lexical bias; BASIL informational bias treated as neutral.)
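These metrics can be reproduced from model predictions with scikit-learn. A minimal sketch; y_true and y_pred below are placeholder values, not the actual benchmark labels:

from sklearn.metrics import precision_recall_fscore_support, matthews_corrcoef

# Placeholder gold labels and predictions (1 = lexical bias, 0 = neutral)
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1]

p, r, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary", pos_label=1
)
mcc = matthews_corrcoef(y_true, y_pred)
print(f"P {p:.3f} / R {r:.3f} / F1 {f1:.3f} / MCC {mcc:.3f}")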
Safety, bias & ethics
Media bias perception is subjective and culturally dependent. This model may over-flag biased wording and should not be used to penalize individuals or outlets. Use with human-in-the-loop review and domain-specific calibration.
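One concrete form of domain-specific calibration is tuning the decision threshold on in-domain validation data instead of taking the argmax. A hedged sketch, reusing tok and model from the How to use snippet; the threshold value is an assumption to be validated, not a recommended setting:

import torch

def predict_biased(texts, tok, model, threshold=0.5):
    """Flag a sentence as lexical bias only if P(bias) exceeds threshold.
    Raising the threshold trades the model's high recall for precision;
    0.5 is the argmax-equivalent default."""
    enc = tok(texts, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        probs = model(**enc).logits.softmax(-1)[:, 1]
    return (probs > threshold).tolist()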
Citation
If you use this model, please cite:
@inproceedings{horych-etal-2025-promises,
title = "The Promises and Pitfalls of {LLM} Annotations in Dataset Labeling: a Case Study on Media Bias Detection",
author = "Horych, Tom{\'a}{\v{s}} and
Mandl, Christoph and
Ruas, Terry and
Greiner-Petter, Andre and
Gipp, Bela and
Aizawa, Akiko and
Spinde, Timo",
editor = "Chiruzzo, Luis and
Ritter, Alan and
Wang, Lu",
booktitle = "Findings of the Association for Computational Linguistics: NAACL 2025",
month = apr,
year = "2025",
address = "Albuquerque, New Mexico",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.findings-naacl.75/",
doi = "10.18653/v1/2025.findings-naacl.75",
pages = "1370--1386",
ISBN = "979-8-89176-195-7"
}