Estonian POS Tagging Model (XLM-RoBERTa-Base)
This model is a fine-tuned version of FacebookAI/xlm-roberta-base for Estonian Part-of-Speech (POS) tagging.
It is trained on Estonian data from the Universal Dependencies Treebank (UDT).
The model assigns a Universal POS tag to each token and can be used as a component in downstream Estonian NLP tasks.
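The following is a minimal usage sketch, not an official example from the model authors. It assumes the checkpoint loads with `AutoModelForTokenClassification`, that `config.id2label` holds the UPOS tag names, and that labelling each word by its first subword is appropriate for this model; none of these is confirmed by the card. The helper names `first_subword_labels` and `tag_words` are hypothetical.

```python
MODEL_ID = "PitchayaS/xlm-roberta-estonian-pos"


def first_subword_labels(word_ids, label_ids, id2label):
    """Keep one tag per word: the prediction on the word's first subword.

    word_ids  -- per-token word index, None for special tokens
    label_ids -- per-token predicted label id
    id2label  -- mapping from label id to tag name
    """
    tags = {}
    for word_id, label_id in zip(word_ids, label_ids):
        if word_id is not None and word_id not in tags:
            tags[word_id] = id2label[label_id]
    return [tags[i] for i in sorted(tags)]


def tag_words(words, model_id=MODEL_ID):
    """Tag a pre-tokenized sentence (a list of word strings)."""
    # Imported here so the helper above stays usable without these packages.
    import torch
    from transformers import AutoModelForTokenClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForTokenClassification.from_pretrained(model_id)
    model.eval()

    # Assumption: the model expects word-level input split into subwords.
    enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits
    label_ids = logits[0].argmax(dim=-1).tolist()

    tags = first_subword_labels(enc.word_ids(0), label_ids, model.config.id2label)
    return list(zip(words, tags))


if __name__ == "__main__":
    # Downloads the checkpoint on first run.
    for word, tag in tag_words(["Tallinn", "on", "Eesti", "pealinn", "."]):
        print(f"{word}\t{tag}")
```

Passing a list of words rather than a raw string keeps punctuation as separate tokens, which matches the word-level tokenization used in Universal Dependencies data.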
Evaluation Results
POS tagging accuracy on UDT test datasets (EDT + EWT): 0.9775
| Label | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| ADJ | 0.96 | 0.96 | 0.96 | 4902 |
| ADP | 0.95 | 0.97 | 0.96 | 1134 |
| ADV | 0.97 | 0.98 | 0.97 | 6511 |
| AUX | 0.98 | 0.99 | 0.98 | 3409 |
| CCONJ | 0.99 | 0.99 | 0.99 | 2510 |
| DET | 0.91 | 0.93 | 0.92 | 1142 |
| INTJ | 0.85 | 0.83 | 0.84 | 129 |
| NOUN | 0.98 | 0.98 | 0.98 | 15336 |
| NUM | 0.97 | 0.95 | 0.96 | 1104 |
| PRON | 0.98 | 0.98 | 0.98 | 3425 |
| PROPN | 0.96 | 0.95 | 0.96 | 3805 |
| PUNCT | 1.00 | 1.00 | 1.00 | 9939 |
| SCONJ | 0.98 | 0.98 | 0.98 | 1459 |
| SYM | 0.85 | 0.73 | 0.79 | 63 |
| VERB | 0.99 | 0.98 | 0.98 | 6746 |
| X | 0.78 | 0.53 | 0.63 | 75 |
| Accuracy | - | - | 0.98 | 61689 |
| Macro avg | 0.94 | 0.92 | 0.93 | 61689 |
| Weighted avg | 0.98 | 0.98 | 0.98 | 61689 |
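The two average rows can be recomputed from the per-label rows. The sketch below uses the rounded F1 scores and supports from the table, so it reproduces the averages only to two decimal places. The macro average weights every label equally, while the weighted average weights each label by its support, which is why rare labels such as X and SYM pull the macro figure down.

```python
# (label, F1-score, support) copied from the table above.
rows = [
    ("ADJ", 0.96, 4902), ("ADP", 0.96, 1134), ("ADV", 0.97, 6511),
    ("AUX", 0.98, 3409), ("CCONJ", 0.99, 2510), ("DET", 0.92, 1142),
    ("INTJ", 0.84, 129), ("NOUN", 0.98, 15336), ("NUM", 0.96, 1104),
    ("PRON", 0.98, 3425), ("PROPN", 0.96, 3805), ("PUNCT", 1.00, 9939),
    ("SCONJ", 0.98, 1459), ("SYM", 0.79, 63), ("VERB", 0.98, 6746),
    ("X", 0.63, 75),
]

total_support = sum(support for _, _, support in rows)

# Macro average: unweighted mean over labels.
macro_f1 = sum(f1 for _, f1, _ in rows) / len(rows)

# Weighted average: mean over labels, weighted by support.
weighted_f1 = sum(f1 * support for _, f1, support in rows) / total_support

print(f"support={total_support} macro={macro_f1:.2f} weighted={weighted_f1:.2f}")
```

The supports sum to 61689, and the two averages round to 0.93 and 0.98, in line with the Macro avg and Weighted avg rows.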
Model tree for PitchayaS/xlm-roberta-estonian-pos
Base model
FacebookAI/xlm-roberta-base