Update README.md
README.md
CHANGED
@@ -4,10 +4,11 @@ language:
 - az
 pipeline_tag: text-classification
 ---
-# Sentiment Analysis
+# Sentiment Analysis Model for Azerbaijani Text
+This repository hosts a fine-tuned XLM-RoBERTa model for sentiment analysis on Azerbaijani text. The model is capable of classifying text into three categories: negative, neutral, and positive. This README provides guidelines on how to setup and use the model for your own sentiment analysis tasks.
 
 ## Model Description
-
+The model is based on `xlm-roberta-base`, which has been fine-tuned on a diverse dataset of Azerbaijani text samples. It is designed to understand the sentiment expressed in texts and classify them accordingly.
 
 ## How to Use
 You can use this model directly with a pipeline for text classification, or you can use it with the `transformers` library for more custom usage, as shown in the example below.
@@ -19,58 +20,51 @@ pip install transformers
 ```
 
 ```python
-from transformers import AutoModelForSequenceClassification,
+from transformers import AutoModelForSequenceClassification, AutoTokenizer
 import torch
 
-# Load
-
-
-
-
-text
-
-
-
-model
-with torch.no_grad():
-
-
-#
-
-
-
-
-
-
+# Load the model and tokenizer from Hugging Face Hub
+model_name = "LocalDoc/sentiment_analysis_azerbaijani"
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForSequenceClassification.from_pretrained(model_name)
+
+def predict_sentiment(text):
+    # Encode the text using the tokenizer
+    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=128)
+
+    # Get predictions from the model
+    with torch.no_grad():
+        outputs = model(**inputs)
+
+    # Convert logits to probabilities using softmax
+    probs = torch.nn.functional.softmax(outputs.logits, dim=-1)
+
+    # Get the highest probability and corresponding label
+    top_prob, top_label = torch.max(probs, dim=-1)
+    labels = ["negative", "neutral", "positive"]
+
+    # Return the label with the highest probability
+    return labels[top_label], top_prob
+
+# Example text
+text = "Bu mənim xoşuma gəlir"
+
+# Get the sentiment
+predicted_label, probability = predict_sentiment(text)
+print(f"Predicted sentiment: {predicted_label} with a probability of {probability.item():.4f}")
+
 ```
 
 ## Language Label Information
 
 The model outputs a label for each prediction, corresponding to one of the languages listed below. Each label is associated with a specific language code as detailed in the following table:
 
-| Label |
-
-| 0 |
-
-
-
-| LABEL_4 | el | Greek |
-| LABEL_5 | en | English |
-| LABEL_6 | es | Spanish |
-| LABEL_7 | fr | French |
-| LABEL_8 | hi | Hindi |
-| LABEL_9 | it | Italian |
-| LABEL_10 | ja | Japanese |
-| LABEL_11 | nl | Dutch |
-| LABEL_12 | pl | Polish |
-| LABEL_13 | pt | Portuguese |
-| LABEL_14 | ru | Russian |
-| LABEL_15 | sw | Swahili |
-| LABEL_16 | th | Thai |
-| LABEL_17 | tr | Turkish |
-| LABEL_18 | ur | Urdu |
-| LABEL_19 | vi | Vietnamese |
-| LABEL_20 | zh | Chinese |
+| Label | Result |
+|-------|--------|
+| 0 | negative |
+| 1 | neutral |
+| 2 | positive |
+
 
 This mapping is utilized to decode the model's predictions into understandable language names, facilitating the interpretation of results for further processing or analysis.
 
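
The added label table maps class indices 0, 1 and 2 to negative, neutral and positive, and the added `predict_sentiment` function applies a softmax before picking the top class. As a minimal, dependency-free sketch of that decoding step, the snippet below uses only the Python standard library. The names `ID2LABEL`, `softmax`, `decode` and the logits in `example_logits` are illustrative assumptions, not part of the commit; real logits would come from `model(**inputs).logits`.

```python
import math

# Index-to-label mapping copied from the label table added in this commit.
ID2LABEL = {0: "negative", 1: "neutral", 2: "positive"}


def softmax(logits):
    """Turn raw scores into probabilities, subtracting the max for stability."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Hypothetical logits for a single input (made up for illustration).
example_logits = [-1.2, 0.3, 2.5]
label, prob = decode(example_logits)
print(f"Predicted sentiment: {label} with a probability of {prob:.4f}")
```

Keeping the mapping in a dictionary keyed by class index, rather than relying on list position, makes a mismatch between the table and the code easier to spot.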