SentenceTransformer based on jhu-clsp/mmBERT-small
This is a sentence-transformers model finetuned from jhu-clsp/mmBERT-small on the vn-sts-0003 dataset. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: jhu-clsp/mmBERT-small
- Maximum Sequence Length: 8192 tokens
- Output Dimensionality: 384 dimensions
- Similarity Function: Cosine Similarity
- Training Dataset: vn-sts-0003
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 8192, 'do_lower_case': False, 'architecture': 'ModernBertModel'})
(1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
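The Pooling module above has only `pooling_mode_mean_tokens` enabled, so the sentence vector is the average of the token vectors. The following is a minimal pure-Python sketch of that idea, not the library's implementation. The function name `mean_pool`, the toy 4-dimensional vectors (the real model uses 384 dimensions), and the 0/1 attention mask are assumptions made for illustration.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, skipping positions whose mask value is 0."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    # Guard against an all-padding input to avoid division by zero.
    count = max(count, 1)
    return [value / count for value in summed]


# Toy input: three token vectors of width 4; the last one is padding.
tokens = [[1.0, 2.0, 3.0, 4.0], [3.0, 2.0, 1.0, 0.0], [9.0, 9.0, 9.0, 9.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)  # [2.0, 2.0, 2.0, 2.0]
```

The padded position is excluded from both the sum and the count, which is why the third vector does not affect the result.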
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("8Opt/mmbert-sts-0003-0001")
# Run inference
sentences = [
    'Chi phí chăm sóc trẻ em tăng gần 6%',
    "Chăm sóc trẻ em 'tiêu tốn 15.000 bảng mỗi năm'",
    'Không có con chó nào đuổi theo quả bóng trong cỏ.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.9201, 0.8759],
# [0.9201, 1.0000, 0.8777],
# [0.8759, 0.8777, 1.0000]])
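Because the similarity function is cosine similarity, the same scores can be used to rank candidates against a query, as in semantic search. Below is a small sketch of that ranking step in pure Python. The helper names `cosine_similarity` and `rank_by_similarity` are hypothetical, and the 3-dimensional vectors are toy stand-ins for rows of `embeddings`.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, candidates):
    """Return candidate indices sorted from most to least similar to the query."""
    scores = [cosine_similarity(query, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


# Toy vectors standing in for model.encode(...) output (assumption).
query = [1.0, 0.0, 0.0]
candidates = [[0.0, 1.0, 0.0], [2.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
order = rank_by_similarity(query, candidates)
print(order)  # [1, 2, 0]
```

Cosine similarity ignores vector length, so the second candidate ranks first even though its norm differs from the query's.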
Evaluation
Metrics
Semantic Similarity
- Datasets: 8Opt-sts-dev-0001 and 8Opt-sts-test-0001
- Evaluated with EmbeddingSimilarityEvaluator
| Metric | 8Opt-sts-dev-0001 | 8Opt-sts-test-0001 |
|---|---|---|
| pearson_cosine | 0.6497 | 0.6495 |
| spearman_cosine | 0.6934 | 0.6941 |
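The two metrics compare the model's cosine similarities with the gold scores. Pearson measures linear agreement and Spearman measures agreement of rank order. The sketch below shows one way to compute both in pure Python. It is an illustration rather than the evaluator's code, and the `predicted` and `gold` values are made up for the example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Made-up cosine similarities and gold scores (illustrative only).
predicted = [0.92, 0.88, 0.95, 0.60]
gold = [4.4, 3.0, 5.0, 0.5]
print(spearman(predicted, gold))  # 1.0
print(pearson(predicted, gold))
```

In this toy case the rank order matches exactly, so Spearman is 1.0 even though the relationship is not linear and Pearson is lower.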
Training Details
Training Dataset
vn-sts-0003
- Dataset: vn-sts-0003 at 5164e7d
- Size: 21,543 training samples
- Columns: sentence1, sentence2, and score
- Approximate statistics based on the first 1000 samples:

| | sentence1 | sentence2 | score |
|---|---|---|---|
| type | string | string | float |
| details | min: 5 tokens; mean: 17.83 tokens; max: 79 tokens | min: 6 tokens; mean: 17.71 tokens; max: 87 tokens | min: 0.01; mean: 2.55; max: 5.0 |

- Samples:

| sentence1 | sentence2 | score |
|---|---|---|
| Một tay đua xe đạp trần truồng | Một tay đua xe đạp đang mặc đồ màu đen. | 3.315 |
| Thật không may, câu trả lời cho câu hỏi của bạn là chúng tôi không biết. | Nếu cuộc trò chuyện không phải về công việc, bạn sẽ biết câu trả lời cho câu hỏi của mình là gì. | 0.4 |
| Tòa án Trung Quốc duy trì án tử hình cho những kẻ giết người Mekong | Tòa án duy trì án tử hình cho những kẻ giết người Mekong | 4.4 |

- Loss: CoSENTLoss with these parameters: { "scale": 20.0, "similarity_fct": "pairwise_cos_sim" }
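CoSENTLoss is a ranking loss: whenever one pair has a higher gold score than another, it penalizes the model if that pair's cosine similarity is not also higher. The sketch below writes out this objective in pure Python with the `scale` of 20.0 listed above. It illustrates the formula and is not the library's implementation; the function name `cosent_loss` and the toy inputs are assumptions.

```python
import math


def cosent_loss(cosines, gold_scores, scale=20.0):
    """log(1 + sum of exp(scale * (cos_j - cos_i))) over all i, j with gold_i > gold_j."""
    terms = []
    for i in range(len(cosines)):
        for j in range(len(cosines)):
            if gold_scores[i] > gold_scores[j]:
                terms.append(scale * (cosines[j] - cosines[i]))
    # The extra 0.0 term supplies the "1 +" inside the log; subtracting the
    # maximum before exponentiating keeps the computation numerically stable.
    terms.append(0.0)
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


# Two sentence pairs: the first has the higher gold score.
good = cosent_loss([0.9, 0.1], [5.0, 0.0])  # cosines ordered like the gold scores
bad = cosent_loss([0.1, 0.9], [5.0, 0.0])   # cosines ordered the wrong way round
print(round(good, 4))  # 0.0
print(round(bad, 4))   # 16.0
```

Under this formula, a batch of 16 pairs with distinct gold scores and identical cosines would give log(1 + 120) ≈ 4.80, which is the same order of magnitude as the training and validation losses reported in the training logs.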
Evaluation Dataset
vn-sts-0003
- Dataset: vn-sts-0003 at 5164e7d
- Size: 3,077 evaluation samples
- Columns: sentence1, sentence2, and score
- Approximate statistics based on the first 1000 samples:

| | sentence1 | sentence2 | score |
|---|---|---|---|
| type | string | string | float |
| details | min: 5 tokens; mean: 18.03 tokens; max: 80 tokens | min: 6 tokens; mean: 17.6 tokens; max: 71 tokens | min: 0.04; mean: 2.44; max: 5.0 |

- Samples:

| sentence1 | sentence2 | score |
|---|---|---|
| Có một vài thứ để giúp bánh của bạn tăng cân một chút. | John Cavan có một số phương pháp hợp lý và thích hợp để tăng trọng lượng thỏ trong câu trả lời của mình. | 3.0 |
| Tháng 4 / tháng 5 / tháng 6 năm 1963 trở thành một nhà lãnh đạo của các cuộc biểu tình dân quyền ở Greensboro, Bắc Carolina. | Mùa xuân năm 1966 trở thành người đứng đầu chi nhánh Chicago của CLC Operation Breadbasket. | 0.5 |
| một người đàn ông đi dọc theo một con đường. | một người đàn ông đi qua một sợi dây thừng chặt chẽ. | 0.8 |

- Loss: CoSENTLoss with these parameters: { "scale": 20.0, "similarity_fct": "pairwise_cos_sim" }
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- learning_rate: 2e-05
- num_train_epochs: 1
- warmup_ratio: 0.1
- fp16: True
- batch_sampler: no_duplicates
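To show how `learning_rate`, `warmup_ratio`, and the linear scheduler interact over the single epoch, here is a rough sketch of a linear warmup followed by linear decay. The step count assumes one optimizer step per batch of 16 on a single device and that the batch sampler drops no batches. Both are assumptions, although the epoch fractions in the training logs (step 1300 at epoch 0.9651) are consistent with roughly 1,347 steps. The function name `linear_warmup_decay` is hypothetical.

```python
import math


def linear_warmup_decay(step, total_steps, peak_lr=2e-05, warmup_ratio=0.1):
    """Learning rate at `step`: linear ramp to peak_lr, then linear decay to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Assumed step count: 21,543 training samples in batches of 16, one epoch.
total_steps = math.ceil(21543 / 16)
warmup_steps = math.ceil(total_steps * 0.1)
print(total_steps, warmup_steps)  # 1347 135

for step in (0, 100, 135, 700, 1347):
    print(step, linear_warmup_decay(step, total_steps))
```

Under these assumptions the rate starts at 0, reaches its peak of 2e-05 at step 135, and falls back to 0 at the final step.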
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 2e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 1
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- bf16: False
- fp16: True
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- parallelism_config: None
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch_fused
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- project: huggingface
- trackio_space_id: trackio
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- hub_revision: None
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: no
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- liger_kernel_config: None
- eval_use_gather_object: False
- average_tokens_across_devices: True
- prompts: None
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional
- router_mapping: {}
- learning_rate_mapping: {}
Training Logs
| Epoch | Step | Training Loss | Validation Loss | 8Opt-sts-dev-0001_spearman_cosine | 8Opt-sts-test-0001_spearman_cosine |
|---|---|---|---|---|---|
| 0.0742 | 100 | 4.7338 | 4.7076 | 0.4797 | - |
| 0.1485 | 200 | 4.6967 | 4.6637 | 0.5562 | - |
| 0.2227 | 300 | 4.6581 | 4.6458 | 0.6063 | - |
| 0.2970 | 400 | 4.6176 | 4.6038 | 0.6419 | - |
| 0.3712 | 500 | 4.6214 | 4.6475 | 0.5919 | - |
| 0.4454 | 600 | 4.6049 | 4.5982 | 0.6413 | - |
| 0.5197 | 700 | 4.598 | 4.5892 | 0.6367 | - |
| 0.5939 | 800 | 4.6088 | 4.5742 | 0.6751 | - |
| 0.6682 | 900 | 4.581 | 4.5980 | 0.6726 | - |
| 0.7424 | 1000 | 4.5617 | 4.5716 | 0.6776 | - |
| 0.8166 | 1100 | 4.5584 | 4.5545 | 0.6890 | - |
| 0.8909 | 1200 | 4.5537 | 4.5487 | 0.6929 | - |
| 0.9651 | 1300 | 4.5175 | 4.5535 | 0.6934 | - |
| -1 | -1 | - | - | - | 0.6941 |
Framework Versions
- Python: 3.12.12
- Sentence Transformers: 5.1.1
- Transformers: 4.57.1
- PyTorch: 2.8.0+cu126
- Accelerate: 1.11.0
- Datasets: 4.0.0
- Tokenizers: 0.22.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
CoSENTLoss
@article{10531646,
author={Huang, Xiang and Peng, Hao and Zou, Dongcheng and Liu, Zhiwei and Li, Jianxin and Liu, Kay and Wu, Jia and Su, Jianlin and Yu, Philip S.},
journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
title={CoSENT: Consistent Sentence Embedding via Similarity Ranking},
year={2024},
doi={10.1109/TASLP.2024.3402087}
}