metadata
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:111476
  - loss:CosineSimilarityLoss
base_model: sergeyzh/LaBSE-ru-sts
widget:
  - source_sentence: 'трюковый самокат plank 180 белый '
    sentences:
      - смарт-телевизор 75 sony kd-75x950h
      - самокат для трюков плэнк 1.80 м белый
      - xiaomi mi 11 8gb 128gb
  - source_sentence: 'вейп vaporesso xros '
    sentences:
      - садовая ограда классика 4 2 м белый
      - кухонные весы
      - электронная сигарета voopoo drag
  - source_sentence: серьги l atelier precieux 1628710
    sentences:
      - фильтр hepa для пылесоса варис st400
      - потолочная люстра майтон nostalgia ceiling chandelier mod048pl-06g
      - серьги atelier de bijoux 1628712
  - source_sentence: 'мобильный геймпад триггерами x2 '
    sentences:
      - электроскутер nitro pro milano 750w led
      - наушники без проводов мейзу ep52 lite
      - геймпад с функцией триггеров x2
  - source_sentence: комод 7 рисунком машинки 4 ящика
    sentences:
      - удлинитель far f 505 d lara выключателем 2 0м
      - беззеркальный фотоаппарат nikon z50 kit 16-50mm ilce-7cl красный
      - комод 8 с изображением супергероев 6 ящиков
datasets:
  - seregadgl/data_cross_gpt_139k
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
  - cosine_accuracy
  - cosine_accuracy_threshold
  - cosine_f1
  - cosine_f1_threshold
  - cosine_precision
  - cosine_recall
  - cosine_ap
  - cosine_mcc
model-index:
  - name: SentenceTransformer based on sergeyzh/LaBSE-ru-sts
    results:
      - task:
          type: binary-classification
          name: Binary Classification
        dataset:
          name: eval
          type: eval
        metrics:
          - type: cosine_accuracy
            value: 0.9722640832436311
            name: Cosine Accuracy
          - type: cosine_accuracy_threshold
            value: 0.630459189414978
            name: Cosine Accuracy Threshold
          - type: cosine_f1
            value: 0.9724366041896361
            name: Cosine F1
          - type: cosine_f1_threshold
            value: 0.5821653008460999
            name: Cosine F1 Threshold
          - type: cosine_precision
            value: 0.9647847565278758
            name: Cosine Precision
          - type: cosine_recall
            value: 0.9802107980210798
            name: Cosine Recall
          - type: cosine_ap
            value: 0.9945729266353226
            name: Cosine Ap
          - type: cosine_mcc
            value: 0.9445047865635516
            name: Cosine Mcc

SentenceTransformer based on sergeyzh/LaBSE-ru-sts

This is a sentence-transformers model fine-tuned from sergeyzh/LaBSE-ru-sts on the data_cross_gpt_139k dataset. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sergeyzh/LaBSE-ru-sts
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset: data_cross_gpt_139k

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
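The architecture above enables CLS-token pooling (`pooling_mode_cls_token: True`) followed by a `Normalize()` step. As a rough illustration of what those two modules compute, and not the library's actual implementation, the sketch below applies them to a toy list of token vectors. The helper names `cls_pool` and `l2_normalize` are hypothetical.

```python
import math

def cls_pool(token_embeddings):
    """Return the vector of the first ([CLS]) token of one sequence."""
    return token_embeddings[0]

def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

# Toy "token embeddings" for one sentence: 3 tokens, 2 dimensions each.
tokens = [[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]]

sentence_vec = l2_normalize(cls_pool(tokens))
print(sentence_vec)  # [0.6, 0.8]
```

Because the output is unit length, the dot product of two such vectors equals their cosine similarity.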

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("seregadgl/sts_v11")
# Run inference
sentences = [
    'комод 7 рисунком машинки 4 ящика',
    'комод 8 с изображением супергероев 6 ящиков',
    'беззеркальный фотоаппарат nikon z50 kit 16-50mm ilce-7cl красный',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
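The Evaluation section of this card reports a cosine_accuracy_threshold of 0.6305 and a cosine_f1_threshold of 0.5822. One plausible way to turn a similarity score into a match / no-match decision is to compare it against one of those thresholds. The sketch below does this on toy 2-dimensional vectors rather than real embeddings; `is_match` is a hypothetical helper, and treating a score exactly equal to the threshold as a match is an assumption.

```python
import math

# Thresholds reported in the Evaluation section of this card.
ACCURACY_THRESHOLD = 0.6305
F1_THRESHOLD = 0.5822

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def is_match(a, b, threshold=ACCURACY_THRESHOLD):
    """Hypothetical decision rule: pairs scoring at or above the threshold match."""
    return cosine_similarity(a, b) >= threshold

# Toy vectors standing in for embeddings.
query = [1.0, 0.0]
close = [1.0, 1.0]       # cosine is about 0.707
borderline = [0.6, 0.8]  # cosine is about 0.6

print(is_match(query, close))                               # True
print(is_match(query, borderline))                          # False
print(is_match(query, borderline, threshold=F1_THRESHOLD))  # True
```

The borderline pair shows why the choice of threshold matters: the lower F1 threshold accepts it, while the accuracy threshold rejects it.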

Evaluation

Metrics

Binary Classification

Metric Value
cosine_accuracy 0.9723
cosine_accuracy_threshold 0.6305
cosine_f1 0.9724
cosine_f1_threshold 0.5822
cosine_precision 0.9648
cosine_recall 0.9802
cosine_ap 0.9946
cosine_mcc 0.9445
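As a quick consistency check on the table above, F1 is the harmonic mean of precision and recall. The sketch below recomputes it from the unrounded precision and recall values given in the card's metadata. Note that those two values were measured at the F1-optimal threshold, which is an assumption about how the evaluator reports them.

```python
# Unrounded values from the model-index metadata of this card.
precision = 0.9647847565278758
recall = 0.9802107980210798

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(round(f1, 4))  # 0.9724
```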

Training Details

Training Dataset

data_cross_gpt_139k

  • Dataset: data_cross_gpt_139k at 9e1f5ca
  • Size: 111,476 training samples
  • Columns: sentence1, sentence2, and label
  • Approximate statistics based on the first 1000 samples:
    sentence1 (string): min 3 tokens, mean 14.84 tokens, max 45 tokens
    sentence2 (string): min 4 tokens, mean 15.64 tokens, max 55 tokens
    label (float): min 0.0, mean 0.47, max 1.0
  • Samples:
    sentence1 | sentence2 | label
    нож кухонный 21см синий | кухонный нож 22см зелёный | 0.0
    блок питания универсальный для мерцающих флэш гирлянд rich led бахрома занавес нить белый | адаптер питания для мигающих led гирлянд "luminous decor" бахрома занавес нить зелёный | 0.0
    защитная пленка для apple iphone 6 прозрачная | protective film for apple iphone 6 transparent | 1.0
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
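With `loss_fct` set to MSELoss, CosineSimilarityLoss penalizes the squared difference between the cosine similarity of the two sentence embeddings and the gold label. The sketch below shows that computation on toy vectors in plain Python. It illustrates the formula only, it is not the library's implementation, and `cosine_similarity_loss` is a hypothetical name.

```python
import math

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def cosine_similarity_loss(pairs, labels):
    """Mean squared error between cosine scores and gold labels."""
    errors = [(cosine(u, v) - y) ** 2 for (u, v), y in zip(pairs, labels)]
    return sum(errors) / len(errors)

# Two toy embedding pairs: identical vectors and orthogonal vectors.
pairs = [([1.0, 0.0], [1.0, 0.0]), ([1.0, 0.0], [0.0, 1.0])]

print(cosine_similarity_loss(pairs, [1.0, 0.0]))  # 0.0 (labels agree with the cosines)
print(cosine_similarity_loss(pairs, [0.0, 1.0]))  # 1.0 (labels are the opposite)
```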
    

Evaluation Dataset

data_cross_gpt_139k

  • Dataset: data_cross_gpt_139k at 9e1f5ca
  • Size: 27,870 evaluation samples
  • Columns: sentence1, sentence2, and label
  • Approximate statistics based on the first 1000 samples:
    sentence1 (string): min 3 tokens, mean 15.05 tokens, max 58 tokens
    sentence2 (string): min 4 tokens, mean 15.57 tokens, max 53 tokens
    label (float): min 0.0, mean 0.48, max 1.0
  • Samples:
    sentence1 | sentence2 | label
    сумка дорожная складная полет оранжевая bradex td 0599 | сумка для путешествий складная брадекс orange | 1.0
    наушники sennheiser hd 450bt белый | наушники сенхайзер hd 450bt white | 1.0
    перчатки stg al-05-1871 синие серые черные зеленыеполноразмерные xl | перчатки stg al-05-1871 blue gray black green full size xl | 1.0
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • learning_rate: 4.7459131195420915e-05
  • weight_decay: 0.03196240090522689
  • num_train_epochs: 2
  • warmup_ratio: 0.014344463935915175
  • fp16: True
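For anyone trying to reproduce a similar run, the non-default values above can be collected as keyword arguments. The sketch below only builds the dictionary. Passing it to `SentenceTransformerTrainingArguments` is an assumption about how the training was configured, and the `output_dir` value in the commented lines is hypothetical.

```python
# Non-default hyperparameters copied from the list above.
training_kwargs = {
    "eval_strategy": "steps",
    "per_device_train_batch_size": 32,
    "per_device_eval_batch_size": 32,
    "learning_rate": 4.7459131195420915e-05,
    "weight_decay": 0.03196240090522689,
    "num_train_epochs": 2,
    "warmup_ratio": 0.014344463935915175,
    "fp16": True,
}

# Assumed usage (needs sentence-transformers installed and a GPU for fp16):
# from sentence_transformers import SentenceTransformerTrainingArguments
# args = SentenceTransformerTrainingArguments(
#     output_dir="output/sts_v11",  # hypothetical path
#     **training_kwargs,
# )

for name, value in training_kwargs.items():
    print(f"{name} = {value}")
```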

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 4.7459131195420915e-05
  • weight_decay: 0.03196240090522689
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 2
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.014344463935915175
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
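The list above combines `lr_scheduler_type: linear` with a `warmup_ratio` and `warmup_steps: 0`, so the warmup length is derived from the ratio. The sketch below shows one common form of such a schedule: a linear ramp to the peak learning rate, then a linear decay to zero. It assumes the warmup length is the ceiling of `warmup_ratio` times the total step count, and it takes the total of 6968 steps from the final row of the training log. `linear_lr` is a hypothetical helper, not a library function.

```python
import math

PEAK_LR = 4.7459131195420915e-05
WARMUP_RATIO = 0.014344463935915175
TOTAL_STEPS = 6968  # last step in the training log of this card

# Assumption: warmup length is the ceiling of ratio * total steps.
warmup_steps = math.ceil(TOTAL_STEPS * WARMUP_RATIO)

def linear_lr(step):
    """Learning rate at a given optimizer step under linear warmup and decay."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    remaining = TOTAL_STEPS - step
    return PEAK_LR * max(0.0, remaining / max(1, TOTAL_STEPS - warmup_steps))

print(warmup_steps)            # 100
print(linear_lr(0))            # 0.0
print(linear_lr(TOTAL_STEPS))  # 0.0
```

Under these assumptions the warmup covers roughly the first 100 steps, which lines up with the first logged row of the training log.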

Training Logs

Epoch Step Training Loss Validation Loss eval_cosine_ap
0.0287 100 0.189 - -
0.0574 200 0.0695 - -
0.0861 300 0.067 - -
0.1148 400 0.0643 - -
0.1435 500 0.0594 0.0549 0.9862
0.1722 600 0.0565 - -
0.2009 700 0.0535 - -
0.2296 800 0.0506 - -
0.2583 900 0.0549 - -
0.2870 1000 0.0535 0.0451 0.9888
0.3157 1100 0.0492 - -
0.3444 1200 0.0499 - -
0.3731 1300 0.0486 - -
0.4018 1400 0.0458 - -
0.4305 1500 0.0458 0.0419 0.9877
0.4592 1600 0.0502 - -
0.4879 1700 0.045 - -
0.5166 1800 0.0435 - -
0.5454 1900 0.0426 - -
0.5741 2000 0.0422 0.0386 0.9906
0.6028 2100 0.0436 - -
0.6315 2200 0.043 - -
0.6602 2300 0.0432 - -
0.6889 2400 0.0397 - -
0.7176 2500 0.0394 0.0357 0.9903
0.7463 2600 0.039 - -
0.7750 2700 0.0398 - -
0.8037 2800 0.0394 - -
0.8324 2900 0.0426 - -
0.8611 3000 0.0345 0.0341 0.9921
0.8898 3100 0.0361 - -
0.9185 3200 0.0365 - -
0.9472 3300 0.0401 - -
0.9759 3400 0.0391 - -
1.0046 3500 0.0342 0.0310 0.9928
1.0333 3600 0.0267 - -
1.0620 3700 0.0264 - -
1.0907 3800 0.0263 - -
1.1194 3900 0.0248 - -
1.1481 4000 0.0282 0.0301 0.9928
1.1768 4100 0.0279 - -
1.2055 4200 0.0258 - -
1.2342 4300 0.0248 - -
1.2629 4400 0.0289 - -
1.2916 4500 0.0261 0.0291 0.9935
1.3203 4600 0.0262 - -
1.3490 4700 0.0276 - -
1.3777 4800 0.0256 - -
1.4064 4900 0.0272 - -
1.4351 5000 0.0283 0.0284 0.9939
1.4638 5100 0.0254 - -
1.4925 5200 0.0252 - -
1.5212 5300 0.0234 - -
1.5499 5400 0.0228 - -
1.5786 5500 0.0248 0.0277 0.9941
1.6073 5600 0.024 - -
1.6361 5700 0.0225 - -
1.6648 5800 0.0234 - -
1.6935 5900 0.0226 - -
1.7222 6000 0.0248 0.0265 0.9942
1.7509 6100 0.0247 - -
1.7796 6200 0.0219 - -
1.8083 6300 0.026 - -
1.8370 6400 0.0209 - -
1.8657 6500 0.0252 0.0262 0.9945
1.8944 6600 0.0218 - -
1.9231 6700 0.0223 - -
1.9518 6800 0.0228 - -
1.9805 6900 0.0242 - -
2.0 6968 - 0.0257 0.9946
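The Epoch and Step columns in the log can be cross-checked against the dataset size and batch size reported earlier. The sketch below assumes a single device and `gradient_accumulation_steps: 1`, as listed in the hyperparameters, and that the last partial batch is kept (`dataloader_drop_last: False`).

```python
import math

TRAIN_SAMPLES = 111_476  # training set size from this card
BATCH_SIZE = 32          # per_device_train_batch_size
EPOCHS = 2               # num_train_epochs

# The last partial batch still counts as a step, hence the ceiling.
steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS

print(steps_per_epoch)                   # 3484
print(total_steps)                       # 6968
print(round(100 / steps_per_epoch, 4))   # 0.0287, the epoch value logged at step 100
```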

Framework Versions

  • Python: 3.11.11
  • Sentence Transformers: 4.1.0
  • Transformers: 4.51.3
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.5.2
  • Datasets: 3.6.0
  • Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}