SentenceTransformer based on sergeyzh/LaBSE-ru-sts

This is a sentence-transformers model finetuned from sergeyzh/LaBSE-ru-sts on the data_cross_gpt_139k dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sergeyzh/LaBSE-ru-sts
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset: data_cross_gpt_139k

Model Sources

  • Documentation: Sentence Transformers Documentation (https://sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
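
The stack is straightforward: the BERT encoder produces token embeddings, the Pooling module keeps only the [CLS] token (pooling_mode_cls_token=True), and Normalize rescales the result to unit length. Below is a minimal sketch of the same pipeline in plain transformers + torch, assuming the tokenizer and encoder weights load directly from the repo root (as they do for typical sentence-transformers repos); for real use, prefer the SentenceTransformer API shown in the next section.

import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("seregadgl/sts_v11")
encoder = AutoModel.from_pretrained("seregadgl/sts_v11")

batch = tokenizer(["пример предложения"],  # "example sentence"
                  padding=True, truncation=True,
                  max_length=512, return_tensors="pt")
with torch.no_grad():
    hidden = encoder(**batch).last_hidden_state  # [batch, tokens, 768]
cls = hidden[:, 0]                  # Pooling: keep the [CLS] token only
emb = F.normalize(cls, p=2, dim=1)  # Normalize: unit-length embeddings
print(emb.shape)  # torch.Size([1, 768])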

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("seregadgl/sts_v11")
# Run inference
sentences = [
    'комод 7 рисунком машинки 4 ящика',  # dresser 7, car print, 4 drawers
    'комод 8 с изображением супергероев 6 ящиков',  # dresser 8, superhero print, 6 drawers
    'беззеркальный фотоаппарат nikon z50 kit 16-50mm ilce-7cl красный',  # mirrorless camera nikon z50 kit 16-50mm ilce-7cl, red
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
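
Since the embeddings are unit-normalized, cosine similarity doubles as a dot product, which makes bulk comparison cheap. As a sketch of paraphrase mining over a larger list of titles (the titles here are illustrative, taken from the dataset samples further below):

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("seregadgl/sts_v11")
titles = [
    "наушники sennheiser hd 450bt белый",  # sennheiser hd 450bt headphones, white
    "наушники сенхайзер hd 450bt white",   # same product, brand transliterated
    "кухонный нож 22см зелёный",           # kitchen knife 22cm, green
]
# util.paraphrase_mining returns [score, i, j] triples, highest score first
for score, i, j in util.paraphrase_mining(model, titles):
    print(f"{score:.3f}  {titles[i]}  <->  {titles[j]}")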

Evaluation

Metrics

Binary Classification

Metric                     Value
cosine_accuracy            0.9723
cosine_accuracy_threshold  0.6305
cosine_f1                  0.9724
cosine_f1_threshold        0.5822
cosine_precision           0.9648
cosine_recall              0.9802
cosine_ap                  0.9946
cosine_mcc                 0.9445
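
In application terms, the table above gives a ready-made decision rule: two titles count as a match when their cosine similarity clears cosine_accuracy_threshold. A minimal sketch, using a pair from the evaluation samples further below:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("seregadgl/sts_v11")
emb = model.encode([
    "наушники sennheiser hd 450bt белый",
    "наушники сенхайзер hd 450bt white",
])
score = model.similarity(emb[0], emb[1]).item()
is_match = score >= 0.6305  # accuracy-maximizing threshold from the table
print(score, is_match)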

Training Details

Training Dataset

data_cross_gpt_139k

  • Dataset: data_cross_gpt_139k at 9e1f5ca
  • Size: 111,476 training samples
  • Columns: sentence1, sentence2, and label
  • Approximate statistics based on the first 1000 samples:

             sentence1           sentence2           label
    type     string              string              float
    details  min: 3 tokens       min: 4 tokens       min: 0.0
             mean: 14.84 tokens  mean: 15.64 tokens  mean: 0.47
             max: 45 tokens      max: 55 tokens      max: 1.0

  • Samples:

    sentence1: нож кухонный 21см синий
    sentence2: кухонный нож 22см зелёный
    label:     0.0

    sentence1: блок питания универсальный для мерцающих флэш гирлянд rich led бахрома занавес нить белый
    sentence2: адаптер питания для мигающих led гирлянд "luminous decor" бахрома занавес нить зелёный
    label:     0.0

    sentence1: защитная пленка для apple iphone 6 прозрачная
    sentence2: protective film for apple iphone 6 transparent
    label:     1.0
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
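
As a sketch of what this loss computes: the cosine similarity of the two sentence embeddings is regressed onto the gold label in [0, 1] with the MSELoss configured above.

import torch.nn.functional as F

def cosine_similarity_loss(u, v, label):
    # u, v: sentence embedding tensors; label: gold scores in [0, 1]
    cos = F.cosine_similarity(u, v, dim=-1)  # predicted similarity
    return F.mse_loss(cos, label)            # regressed onto the label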
    

Evaluation Dataset

data_cross_gpt_139k

  • Dataset: data_cross_gpt_139k at 9e1f5ca
  • Size: 27,870 evaluation samples
  • Columns: sentence1, sentence2, and label
  • Approximate statistics based on the first 1000 samples:

             sentence1           sentence2           label
    type     string              string              float
    details  min: 3 tokens       min: 4 tokens       min: 0.0
             mean: 15.05 tokens  mean: 15.57 tokens  mean: 0.48
             max: 58 tokens      max: 53 tokens      max: 1.0

  • Samples:

    sentence1: сумка дорожная складная полет оранжевая bradex td 0599
    sentence2: сумка для путешествий складная брадекс orange
    label:     1.0

    sentence1: наушники sennheiser hd 450bt белый
    sentence2: наушники сенхайзер hd 450bt white
    label:     1.0

    sentence1: перчатки stg al-05-1871 синие серые черные зеленыеполноразмерные xl
    sentence2: перчатки stg al-05-1871 blue gray black green full size xl
    label:     1.0
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • learning_rate: 4.7459131195420915e-05
  • weight_decay: 0.03196240090522689
  • num_train_epochs: 2
  • warmup_ratio: 0.014344463935915175
  • fp16: True
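
These values map directly onto the Sentence Transformers v3+ training API. A minimal sketch of the setup they imply; output_dir and the dataset loading are illustrative assumptions, since the data_cross_gpt_139k dataset id is not public:

from datasets import load_dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.losses import CosineSimilarityLoss
from sentence_transformers.training_args import SentenceTransformerTrainingArguments

model = SentenceTransformer("sergeyzh/LaBSE-ru-sts")
# illustrative stand-in for data_cross_gpt_139k: any dataset with
# sentence1 / sentence2 / label columns works
dataset = load_dataset("csv", data_files="pairs.csv")["train"].train_test_split(test_size=0.2)

args = SentenceTransformerTrainingArguments(
    output_dir="sts_v11",  # assumed name
    eval_strategy="steps",
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    learning_rate=4.7459131195420915e-05,
    weight_decay=0.03196240090522689,
    num_train_epochs=2,
    warmup_ratio=0.014344463935915175,
    fp16=True,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    loss=CosineSimilarityLoss(model),
)
trainer.train()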

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 4.7459131195420915e-05
  • weight_decay: 0.03196240090522689
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 2
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.014344463935915175
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss Validation Loss eval_cosine_ap
0.0287 100 0.189 - -
0.0574 200 0.0695 - -
0.0861 300 0.067 - -
0.1148 400 0.0643 - -
0.1435 500 0.0594 0.0549 0.9862
0.1722 600 0.0565 - -
0.2009 700 0.0535 - -
0.2296 800 0.0506 - -
0.2583 900 0.0549 - -
0.2870 1000 0.0535 0.0451 0.9888
0.3157 1100 0.0492 - -
0.3444 1200 0.0499 - -
0.3731 1300 0.0486 - -
0.4018 1400 0.0458 - -
0.4305 1500 0.0458 0.0419 0.9877
0.4592 1600 0.0502 - -
0.4879 1700 0.045 - -
0.5166 1800 0.0435 - -
0.5454 1900 0.0426 - -
0.5741 2000 0.0422 0.0386 0.9906
0.6028 2100 0.0436 - -
0.6315 2200 0.043 - -
0.6602 2300 0.0432 - -
0.6889 2400 0.0397 - -
0.7176 2500 0.0394 0.0357 0.9903
0.7463 2600 0.039 - -
0.7750 2700 0.0398 - -
0.8037 2800 0.0394 - -
0.8324 2900 0.0426 - -
0.8611 3000 0.0345 0.0341 0.9921
0.8898 3100 0.0361 - -
0.9185 3200 0.0365 - -
0.9472 3300 0.0401 - -
0.9759 3400 0.0391 - -
1.0046 3500 0.0342 0.0310 0.9928
1.0333 3600 0.0267 - -
1.0620 3700 0.0264 - -
1.0907 3800 0.0263 - -
1.1194 3900 0.0248 - -
1.1481 4000 0.0282 0.0301 0.9928
1.1768 4100 0.0279 - -
1.2055 4200 0.0258 - -
1.2342 4300 0.0248 - -
1.2629 4400 0.0289 - -
1.2916 4500 0.0261 0.0291 0.9935
1.3203 4600 0.0262 - -
1.3490 4700 0.0276 - -
1.3777 4800 0.0256 - -
1.4064 4900 0.0272 - -
1.4351 5000 0.0283 0.0284 0.9939
1.4638 5100 0.0254 - -
1.4925 5200 0.0252 - -
1.5212 5300 0.0234 - -
1.5499 5400 0.0228 - -
1.5786 5500 0.0248 0.0277 0.9941
1.6073 5600 0.024 - -
1.6361 5700 0.0225 - -
1.6648 5800 0.0234 - -
1.6935 5900 0.0226 - -
1.7222 6000 0.0248 0.0265 0.9942
1.7509 6100 0.0247 - -
1.7796 6200 0.0219 - -
1.8083 6300 0.026 - -
1.8370 6400 0.0209 - -
1.8657 6500 0.0252 0.0262 0.9945
1.8944 6600 0.0218 - -
1.9231 6700 0.0223 - -
1.9518 6800 0.0228 - -
1.9805 6900 0.0242 - -
2.0 6968 - 0.0257 0.9946

Framework Versions

  • Python: 3.11.11
  • Sentence Transformers: 4.1.0
  • Transformers: 4.51.3
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.5.2
  • Datasets: 3.6.0
  • Tokenizers: 0.21.1
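
To reproduce this environment, the listed versions can be pinned at install time, mirroring the install command from the Usage section:

pip install "sentence-transformers==4.1.0" "transformers==4.51.3" "accelerate==1.5.2" "datasets==3.6.0" "tokenizers==0.21.1"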

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}