license: cc-by-nc-4.0
language:
  - az
pipeline_tag: text-classification

Sentiment Analysis Model for Azerbaijani Text

This repository hosts a fine-tuned XLM-RoBERTa model for sentiment analysis of Azerbaijani text. The model classifies text into three categories: negative, neutral, and positive. This README explains how to set up and use the model for your own sentiment analysis tasks.

Model Description

The model is based on xlm-roberta-base, which has been fine-tuned on a diverse dataset of Azerbaijani text samples. It is designed to understand the sentiment expressed in texts and classify them accordingly.

How to Use

You can use this model directly with a pipeline for text classification, or you can use it with the transformers library for more custom usage, as shown in the example below.

Quick Start

First, install the transformers library and PyTorch (the example below imports torch) if you haven't already:

pip install transformers torch

Then load the model and run a prediction:

from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load the model and tokenizer from Hugging Face Hub
model_name = "LocalDoc/sentiment_analysis_azerbaijani"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

def predict_sentiment(text):
    # Encode the text using the tokenizer
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=128)

    # Get predictions from the model
    with torch.no_grad():
        outputs = model(**inputs)

    # Convert logits to probabilities using softmax
    probs = torch.nn.functional.softmax(outputs.logits, dim=-1)

    # Get the highest probability and its class index (single input text)
    top_prob, top_label = torch.max(probs, dim=-1)
    labels = ["negative", "neutral", "positive"]

    # Return the label name and its probability (a one-element tensor)
    return labels[top_label.item()], top_prob

# Example text
text = "Bu mənim xoşuma gəlir"

# Get the sentiment
predicted_label, probability = predict_sentiment(text)
print(f"Predicted sentiment: {predicted_label} with a probability of {probability.item():.4f}")
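
The pipeline route mentioned under How to Use can be sketched as follows. This is a minimal sketch, not the authors' reference code: the helper names `normalize_label` and `classify` are hypothetical, and it assumes the pipeline may return generic labels such as `LABEL_2` when the model config does not define readable names. In that case the IDs are decoded with the label table from this README.

```python
# Assumed mapping, copied from the label table in this README.
ID2LABEL = {0: "negative", 1: "neutral", 2: "positive"}


def normalize_label(raw_label):
    """Map a pipeline label such as 'LABEL_2' to a sentiment name.

    Labels that are already readable (e.g. 'Positive') are only lowercased.
    """
    if raw_label.upper().startswith("LABEL_"):
        return ID2LABEL[int(raw_label.rsplit("_", 1)[-1])]
    return raw_label.lower()


def classify(texts, model_name="LocalDoc/sentiment_analysis_azerbaijani"):
    """Classify a list of texts and return (label, score) pairs.

    Hypothetical helper: transformers is imported lazily so the label
    helper above can be used without loading the model.
    """
    from transformers import pipeline

    classifier = pipeline("text-classification", model=model_name)
    # Truncate long inputs to the same length as the example above.
    results = classifier(texts, truncation=True, max_length=128)
    return [(normalize_label(r["label"]), r["score"]) for r in results]
```

Calling `classify(["Bu mənim xoşuma gəlir"])` downloads the model on first use and returns one (label, score) pair per input text.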

Label Information

The model outputs a label for each prediction, corresponding to one of the sentiment classes listed below. Each label ID maps to a sentiment as detailed in the following table:

Label  Sentiment
0      negative
1      neutral
2      positive

This mapping is used to decode the model's predictions into readable sentiment names, which makes the results easier to interpret in further processing or analysis.
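
The decoding step described above can be illustrated without loading the model. The sketch below assumes you already have the three raw logits for one text as plain Python floats. The function name `decode_logits` is hypothetical, and the softmax is written out by hand only to show what happens between the logits and the final label.

```python
import math

# Label IDs and names from the table in this README.
ID2LABEL = {0: "negative", 1: "neutral", 2: "positive"}


def decode_logits(logits):
    """Turn three raw logits into (sentiment name, probability)."""
    # Subtract the maximum before exponentiating for numerical stability.
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # The predicted class is the index with the highest probability.
    best = max(range(len(probs)), key=lambda i: probs[i])
    return ID2LABEL[best], probs[best]


# Made-up logits for illustration only, not real model output.
label, prob = decode_logits([-1.0, 0.5, 3.0])
print(label, round(prob, 4))
```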

Training Performance

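The model was trained for three epochs. Validation loss decreased and accuracy and F1 score increased with each epoch; training loss rose slightly in epoch 2 before dropping sharply in epoch 3: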

Epoch 1: Training Loss: 0.0127, Validation Loss: 0.0174, Accuracy: 0.9966, F1 Score: 0.9966
Epoch 2: Training Loss: 0.0149, Validation Loss: 0.0141, Accuracy: 0.9973, F1 Score: 0.9973
Epoch 3: Training Loss: 0.0001, Validation Loss: 0.0109, Accuracy: 0.9984, F1 Score: 0.9984

Test Results

The model achieved the following results on the test set:

Loss: 0.0133
Accuracy: 0.9975
F1 Score: 0.9975
Precision: 0.9975
Recall: 0.9975
Evaluation Time: 17.5 seconds
Samples per Second: 599.685
Steps per Second: 9.424
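
The throughput figures above allow a rough estimate of the evaluation setup. The sketch below is plain arithmetic on the reported numbers. The results are estimates only, because the evaluation time is rounded and the last batch may have been smaller than the others. The README does not state the test set size or batch size.

```python
# Figures reported in the test results above.
eval_time_s = 17.5
samples_per_s = 599.685
steps_per_s = 9.424

# Implied quantities (approximate, derived from rounded figures).
approx_samples = eval_time_s * samples_per_s  # about 10,494 test samples
approx_steps = eval_time_s * steps_per_s      # about 165 evaluation steps
approx_batch = samples_per_s / steps_per_s    # about 63.6 samples per step

print(f"~{approx_samples:.0f} samples, ~{approx_steps:.0f} steps, "
      f"~{approx_batch:.1f} samples per step")
```

An average of roughly 63.6 samples per step would be consistent with a batch size of 64, but that is an inference rather than a documented setting.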

License

The model is licensed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. This license allows you to share and adapt the material with attribution to the source, but prohibits commercial use.

Contact Information

If you have any questions or suggestions, please contact us at [[email protected]].