SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-MiniLM-L6-v2
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity
  • Number of Parameters: 22.7M (F32)

Model Sources

  • Documentation: https://www.sbert.net
  • Repository: https://github.com/UKPLab/sentence-transformers
  • Hugging Face: https://huggingface.co/models?library=sentence-transformers

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
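
Concretely, the stack above is a BERT encoder producing per-token embeddings, a masked mean pooling step, and L2 normalization. The snippet below is a minimal sketch of that pipeline using the transformers library directly (the example sentence is made up); for normal use, load the model through Sentence Transformers as shown in the Usage section.

import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

model_id = "amanrajput/MiniLM-L6-v2-biology-finetuned"
tokenizer = AutoTokenizer.from_pretrained(model_id)
encoder = AutoModel.from_pretrained(model_id)

batch = tokenizer(
    ["Detrital storage influences soil formation and fertility."],
    padding=True, truncation=True, max_length=256, return_tensors="pt",
)

with torch.no_grad():
    token_embeddings = encoder(**batch).last_hidden_state   # (0) Transformer: per-token embeddings

mask = batch["attention_mask"].unsqueeze(-1).float()
embeddings = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)  # (1) Pooling: masked mean over tokens
embeddings = F.normalize(embeddings, p=2, dim=1)                     # (2) Normalize: unit L2 norm
print(embeddings.shape)  # torch.Size([1, 384])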

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("amanrajput/MiniLM-L6-v2-biology-finetuned")
# Run inference
sentences = [
    'Pollen-mediated gene flow can have significant implications for the management of invasive species.',
    'If an invasive species is able to hybridize with a native species through pollen-mediated gene flow, it may gain a competitive advantage, leading to the displacement of the native species and altered ecosystem dynamics.',
    'A condition that occurs when glucocorticoids are abruptly discontinued, leading to symptoms such as adrenal insufficiency and fatigue.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[ 1.0000,  0.7669, -0.0300],
#         [ 0.7669,  1.0000,  0.0155],
#         [-0.0300,  0.0155,  1.0000]])
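
Since the embeddings are unit-normalized and scored with cosine similarity, the model can be used for semantic search out of the box. A small illustrative sketch (the corpus sentences below are made up, not taken from the training data):

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("amanrajput/MiniLM-L6-v2-biology-finetuned")

corpus = [
    "Humus formation improves soil fertility and water retention.",
    "Abrupt glucocorticoid withdrawal can cause adrenal insufficiency and fatigue.",
    "Hybridization through pollen-mediated gene flow can help invasive species displace natives.",
]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

# Embed a query and retrieve the two most similar corpus sentences.
query_embedding = model.encode("How does gene flow affect invasive species?", convert_to_tensor=True)
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(f"{hit['score']:.3f}  {corpus[hit['corpus_id']]}")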

Training Details

Training Dataset

Unnamed Dataset

  • Size: 33,038 training samples
  • Columns: sentence_0 and sentence_1
  • Approximate statistics based on the first 1000 samples:
    • sentence_0: string, min: 4 tokens, mean: 18.77 tokens, max: 75 tokens
    • sentence_1: string, min: 6 tokens, mean: 31.75 tokens, max: 67 tokens
  • Samples:
    • sentence_0: "In terrestrial ecosystems, detrital storage can significantly influence soil formation and fertility."
      sentence_1: "The accumulation of detritus can lead to the formation of humus, a rich source of nutrients for plants, while also affecting soil structure and water-holding capacity."
    • sentence_0: "Rebound anxiety"
      sentence_1: "A phenomenon where individuals experiencing protracted withdrawal syndrome from anxiolytic medications exhibit intensified anxiety symptoms, often exceeding pre-treatment levels."
    • sentence_0: "Synchrony Breakdown"
      sentence_1: "A phenomenon where population synchrony is disrupted, often due to changes in environmental conditions, species interactions, or other factors that affect the populations' dynamics."
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false
    }
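
As a rough sketch of what this loss does with the parameters above: every sentence_0 in a batch is scored against every sentence_1 via cosine similarity scaled by 20.0, and cross-entropy over those scores pulls each true pair together while pushing apart the other in-batch candidates (in-batch negatives). The function below is an illustrative re-implementation, not the library code:

import torch
import torch.nn.functional as F

def mnr_loss(anchors: torch.Tensor, positives: torch.Tensor, scale: float = 20.0) -> torch.Tensor:
    # Cosine similarity matrix between every anchor (sentence_0) and every candidate (sentence_1).
    scores = F.normalize(anchors, dim=1) @ F.normalize(positives, dim=1).T * scale
    # The correct candidate for anchor i is positive i, i.e. the diagonal of the matrix.
    labels = torch.arange(scores.size(0), device=scores.device)
    return F.cross_entropy(scores, labels)

# Toy batch of 3 pairs with 384-dimensional embeddings, matching this model's output.
print(mnr_loss(torch.randn(3, 384), torch.randn(3, 384)))

The actual training used the library implementation in sentence_transformers.losses; see the training sketch in the next section.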
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • num_train_epochs: 100
  • multi_dataset_batch_sampler: round_robin
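
The values above plug into SentenceTransformerTrainingArguments as sketched below. This is a hedged reconstruction, not the original training script: the output directory is a placeholder, and the two-pair dataset merely stands in for the 33,038-pair training set, which is not published with this card.

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    losses,
)

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
loss = losses.MultipleNegativesRankingLoss(model, scale=20.0)  # cos_sim is the default similarity_fct

# Stand-in dataset built from the sample pairs shown above; the real run used 33,038 pairs.
train_dataset = Dataset.from_dict({
    "sentence_0": [
        "In terrestrial ecosystems, detrital storage can significantly influence soil formation and fertility.",
        "Rebound anxiety",
    ],
    "sentence_1": [
        "The accumulation of detritus can lead to the formation of humus, a rich source of nutrients for plants, while also affecting soil structure and water-holding capacity.",
        "A phenomenon where individuals experiencing protracted withdrawal syndrome from anxiolytic medications exhibit intensified anxiety symptoms, often exceeding pre-treatment levels.",
    ],
})

args = SentenceTransformerTrainingArguments(
    output_dir="minilm-l6-v2-finetuned",   # placeholder output path
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    num_train_epochs=100,
    multi_dataset_batch_sampler="round_robin",
)

trainer = SentenceTransformerTrainer(model=model, args=args, train_dataset=train_dataset, loss=loss)
trainer.train()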

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 100
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss
0.9671 500 0.2396
1.9342 1000 0.1298
2.9014 1500 0.0946
3.8685 2000 0.0726
4.8356 2500 0.0589
5.8027 3000 0.0479
6.7698 3500 0.043
7.7369 4000 0.037
8.7041 4500 0.0349
9.6712 5000 0.03
10.6383 5500 0.0286
11.6054 6000 0.0269
12.5725 6500 0.0248
13.5397 7000 0.0232
14.5068 7500 0.0223
15.4739 8000 0.0212
16.4410 8500 0.0202
17.4081 9000 0.0186
18.3752 9500 0.0172
19.3424 10000 0.018
20.3095 10500 0.0159
21.2766 11000 0.0155
22.2437 11500 0.016
23.2108 12000 0.0144
24.1779 12500 0.0142
25.1451 13000 0.0141
26.1122 13500 0.0127
27.0793 14000 0.0138
28.0464 14500 0.0123
29.0135 15000 0.0117
29.9807 15500 0.0118
30.9478 16000 0.0117
31.9149 16500 0.0121
32.8820 17000 0.0111
33.8491 17500 0.0105
34.8162 18000 0.0104
35.7834 18500 0.0107
36.7505 19000 0.0107
37.7176 19500 0.0098
38.6847 20000 0.01
39.6518 20500 0.0104
40.6190 21000 0.0099
41.5861 21500 0.0094
42.5532 22000 0.0091
43.5203 22500 0.0096
44.4874 23000 0.0086
45.4545 23500 0.0087
46.4217 24000 0.0081
47.3888 24500 0.008
48.3559 25000 0.0078
49.3230 25500 0.0087
50.2901 26000 0.0075
51.2573 26500 0.0077
52.2244 27000 0.0076
53.1915 27500 0.0076
54.1586 28000 0.0074
55.1257 28500 0.0072
56.0928 29000 0.0076
57.0600 29500 0.0066
58.0271 30000 0.0073
58.9942 30500 0.0075
59.9613 31000 0.0064
60.9284 31500 0.0069
61.8956 32000 0.0071
62.8627 32500 0.0073
63.8298 33000 0.0071
64.7969 33500 0.0068
65.7640 34000 0.0065
66.7311 34500 0.0069
67.6983 35000 0.0063
68.6654 35500 0.0067
69.6325 36000 0.0059
70.5996 36500 0.0061
71.5667 37000 0.0061
72.5338 37500 0.0065
73.5010 38000 0.0056
74.4681 38500 0.0057
75.4352 39000 0.0063
76.4023 39500 0.0059
77.3694 40000 0.006
78.3366 40500 0.0066
79.3037 41000 0.0061
80.2708 41500 0.0062
81.2379 42000 0.0057
82.2050 42500 0.0057
83.1721 43000 0.0055
84.1393 43500 0.0054
85.1064 44000 0.0048
86.0735 44500 0.0051
87.0406 45000 0.006
88.0077 45500 0.0055
88.9749 46000 0.0057
89.9420 46500 0.0052
90.9091 47000 0.0054
91.8762 47500 0.0052
92.8433 48000 0.0053
93.8104 48500 0.0051
94.7776 49000 0.006
95.7447 49500 0.005
96.7118 50000 0.0058
97.6789 50500 0.005
98.6460 51000 0.0052
99.6132 51500 0.0056

Framework Versions

  • Python: 3.12.12
  • Sentence Transformers: 5.1.2
  • Transformers: 4.57.2
  • PyTorch: 2.9.0+cu126
  • Accelerate: 1.12.0
  • Datasets: 4.0.0
  • Tokenizers: 0.22.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}