SentenceTransformer based on jhu-clsp/mmBERT-small

This is a sentence-transformers model finetuned from jhu-clsp/mmBERT-small on the vn-sts-0003 dataset. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: jhu-clsp/mmBERT-small
  • Maximum Sequence Length: 8192 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset: vn-sts-0003

Model Sources

  • Documentation: https://sbert.net
  • Repository: https://github.com/UKPLab/sentence-transformers
  • Hugging Face: https://huggingface.co/models?library=sentence-transformers

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False, 'architecture': 'ModernBertModel'})
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
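
The Pooling module above mean-pools the token embeddings produced by the Transformer module (ignoring padding) to obtain the 384-dimensional sentence embedding. A minimal sketch of that pooling step, assuming token embeddings and an attention mask as inputs; the function name is illustrative:

import torch

def mean_pool(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    # token_embeddings: (batch, seq_len, 384); attention_mask: (batch, seq_len), 1 for real tokens
    mask = attention_mask.unsqueeze(-1).float()       # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(dim=1)     # sum of non-padding token vectors
    counts = mask.sum(dim=1).clamp(min=1e-9)          # number of non-padding tokens per sentence
    return summed / counts                            # (batch, 384) sentence embeddings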

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("8Opt/mmbert-sts-0003-0001")
# Run inference
sentences = [
    'Chi phí chăm sóc trẻ em tăng gần 6%',  # "Childcare costs rise nearly 6%"
    "Chăm sóc trẻ em 'tiêu tốn 15.000 bảng mỗi năm'",  # "Childcare 'costs £15,000 a year'"
    'Không có con chó nào đuổi theo quả bóng trong cỏ.',  # "No dog is chasing the ball in the grass."
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.9201, 0.8759],
#         [0.9201, 1.0000, 0.8777],
#         [0.8759, 0.8777, 1.0000]])
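
Since the card lists semantic search among the intended uses, here is a hedged sketch of ranking a small corpus against a query with the same model; the query and corpus strings are illustrative and not taken from the training data:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("8Opt/mmbert-sts-0003-0001")

corpus = [
    "Chi phí chăm sóc trẻ em tăng gần 6%",                # "Childcare costs rise nearly 6%"
    "Không có con chó nào đuổi theo quả bóng trong cỏ.",  # "No dog is chasing the ball in the grass."
]
query = "Chăm sóc trẻ em ngày càng đắt đỏ"                # illustrative query: "Childcare is getting more expensive"

corpus_embeddings = model.encode(corpus)
query_embedding = model.encode([query])

# Cosine similarities between the query and every corpus sentence, shape (1, len(corpus))
scores = model.similarity(query_embedding, corpus_embeddings)
best = scores.argmax(dim=1).item()
print(corpus[best], float(scores[0, best]))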

Evaluation

Metrics

Semantic Similarity

Metric           8Opt-sts-dev-0001   8Opt-sts-test-0001
pearson_cosine   0.6497              0.6495
spearman_cosine  0.6934              0.6941
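
These metric names match what the sentence-transformers EmbeddingSimilarityEvaluator reports. A hedged sketch of recomputing them on the test split; the dataset id and split name are assumptions based on this card:

from datasets import load_dataset
from sentence_transformers import SentenceTransformer, SimilarityFunction
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("8Opt/mmbert-sts-0003-0001")
test = load_dataset("8Opt/vn-sts-0003", split="test")  # assumed dataset id and split name

evaluator = EmbeddingSimilarityEvaluator(
    sentences1=test["sentence1"],
    sentences2=test["sentence2"],
    scores=[s / 5.0 for s in test["score"]],  # card scores range 0-5; Pearson/Spearman are unaffected by linear rescaling
    main_similarity=SimilarityFunction.COSINE,
    name="8Opt-sts-test-0001",
)
print(evaluator(model))  # includes pearson_cosine and spearman_cosine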

Training Details

Training Dataset

vn-sts-0003

  • Dataset: vn-sts-0003 at 5164e7d
  • Size: 21,543 training samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
    • sentence1: string; min 5 tokens, mean 17.83 tokens, max 79 tokens
    • sentence2: string; min 6 tokens, mean 17.71 tokens, max 87 tokens
    • score: float; min 0.01, mean 2.55, max 5.0
  • Samples:
    • sentence1: Một tay đua xe đạp trần truồng ("A naked bicycle racer")
      sentence2: Một tay đua xe đạp đang mặc đồ màu đen. ("A bicycle racer is wearing black.")
      score: 3.315
    • sentence1: Thật không may, câu trả lời cho câu hỏi của bạn là chúng tôi không biết. ("Unfortunately, the answer to your question is that we do not know.")
      sentence2: Nếu cuộc trò chuyện không phải về công việc, bạn sẽ biết câu trả lời cho câu hỏi của mình là gì. ("If the conversation were not about work, you would know what the answer to your question is.")
      score: 0.4
    • sentence1: Tòa án Trung Quốc duy trì án tử hình cho những kẻ giết người Mekong ("Chinese court upholds death sentences for the Mekong killers")
      sentence2: Tòa án duy trì án tử hình cho những kẻ giết người Mekong ("Court upholds death sentences for the Mekong killers")
      score: 4.4
  • Loss: CoSENTLoss with these parameters (see the construction sketch after this list):
    {
        "scale": 20.0,
        "similarity_fct": "pairwise_cos_sim"
    }
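
A minimal sketch of constructing this loss with the parameters listed above, using the sentence-transformers API; the model here is the base checkpoint being fine-tuned:

from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import CoSENTLoss
from sentence_transformers.util import pairwise_cos_sim

model = SentenceTransformer("jhu-clsp/mmBERT-small")  # base model being fine-tuned
loss = CoSENTLoss(model, scale=20.0, similarity_fct=pairwise_cos_sim)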
    

Evaluation Dataset

vn-sts-0003

  • Dataset: vn-sts-0003 at 5164e7d
  • Size: 3,077 evaluation samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
    • sentence1: string; min 5 tokens, mean 18.03 tokens, max 80 tokens
    • sentence2: string; min 6 tokens, mean 17.6 tokens, max 71 tokens
    • score: float; min 0.04, mean 2.44, max 5.0
  • Samples:
    • sentence1: Có một vài thứ để giúp bánh của bạn tăng cân một chút. ("There are a few things to help your cake gain a little weight.")
      sentence2: John Cavan có một số phương pháp hợp lý và thích hợp để tăng trọng lượng thỏ trong câu trả lời của mình. ("John Cavan has some reasonable and suitable methods for increasing rabbit weight in his answer.")
      score: 3.0
    • sentence1: Tháng 4 / tháng 5 / tháng 6 năm 1963 trở thành một nhà lãnh đạo của các cuộc biểu tình dân quyền ở Greensboro, Bắc Carolina. ("April/May/June 1963: became a leader of the civil rights protests in Greensboro, North Carolina.")
      sentence2: Mùa xuân năm 1966 trở thành người đứng đầu chi nhánh Chicago của CLC Operation Breadbasket. ("Spring 1966: became head of the Chicago branch of CLC Operation Breadbasket.")
      score: 0.5
    • sentence1: một người đàn ông đi dọc theo một con đường. ("a man walks along a road.")
      sentence2: một người đàn ông đi qua một sợi dây thừng chặt chẽ. ("a man walks across a tight rope.")
      score: 0.8
  • Loss: CoSENTLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "pairwise_cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • learning_rate: 2e-05
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • fp16: True
  • batch_sampler: no_duplicates
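
A hedged sketch of a training run that uses these non-default hyperparameters with the sentence-transformers trainer; the dataset id, split names, and output directory are assumptions rather than values from this card:

from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import CoSENTLoss
from sentence_transformers.training_args import BatchSamplers

model = SentenceTransformer("jhu-clsp/mmBERT-small")
dataset = load_dataset("8Opt/vn-sts-0003")  # assumed dataset id; columns: sentence1, sentence2, score

args = SentenceTransformerTrainingArguments(
    output_dir="mmbert-sts-0003-0001",      # assumed output directory
    eval_strategy="steps",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=2e-5,
    num_train_epochs=1,
    warmup_ratio=0.1,
    fp16=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],     # assumed split name
    loss=CoSENTLoss(model, scale=20.0),
)
trainer.train()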

All Hyperparameters

Click to expand
  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss Validation Loss 8Opt-sts-dev-0001_spearman_cosine 8Opt-sts-test-0001_spearman_cosine
0.0742 100 4.7338 4.7076 0.4797 -
0.1485 200 4.6967 4.6637 0.5562 -
0.2227 300 4.6581 4.6458 0.6063 -
0.2970 400 4.6176 4.6038 0.6419 -
0.3712 500 4.6214 4.6475 0.5919 -
0.4454 600 4.6049 4.5982 0.6413 -
0.5197 700 4.598 4.5892 0.6367 -
0.5939 800 4.6088 4.5742 0.6751 -
0.6682 900 4.581 4.5980 0.6726 -
0.7424 1000 4.5617 4.5716 0.6776 -
0.8166 1100 4.5584 4.5545 0.6890 -
0.8909 1200 4.5537 4.5487 0.6929 -
0.9651 1300 4.5175 4.5535 0.6934 -
-1 -1 - - - 0.6941

Framework Versions

  • Python: 3.12.12
  • Sentence Transformers: 5.1.1
  • Transformers: 4.57.1
  • PyTorch: 2.8.0+cu126
  • Accelerate: 1.11.0
  • Datasets: 4.0.0
  • Tokenizers: 0.22.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

CoSENTLoss

@article{10531646,
    author={Huang, Xiang and Peng, Hao and Zou, Dongcheng and Liu, Zhiwei and Li, Jianxin and Liu, Kay and Wu, Jia and Su, Jianlin and Yu, Philip S.},
    journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
    title={CoSENT: Consistent Sentence Embedding via Similarity Ranking},
    year={2024},
    doi={10.1109/TASLP.2024.3402087}
}