SentenceTransformer based on google-bert/bert-base-cased

This is a sentence-transformers model finetuned from google-bert/bert-base-cased on the csv dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: google-bert/bert-base-cased
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset:
    • csv

Model Sources

  • Documentation: Sentence Transformers Documentation (https://sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
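
For reference, an equivalent module stack can be assembled by hand from the two components listed above. This is a minimal sketch; in practice the published checkpoint should simply be loaded from the Hub as shown in the Usage section.

from sentence_transformers import SentenceTransformer, models

# BERT backbone with the settings shown above (cased, max_seq_length=512)
word_embedding_model = models.Transformer(
    "google-bert/bert-base-cased",
    max_seq_length=512,
    do_lower_case=False,
)

# Mean pooling over token embeddings yields one 768-dimensional sentence vector
pooling_model = models.Pooling(
    word_embedding_model.get_word_embedding_dimension(),
    pooling_mode="mean",
)

model = SentenceTransformer(modules=[word_embedding_model, pooling_model])
print(model)  # mirrors the architecture printed above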

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Jimmy-Ooi/Tyrisonase_test_model_1000_10_adafactor")
# Run inference
sentences = [
    'CCCCc1ccc(/C(CC)=N/NC(N)=S)cc1',
    'COc1ccccc1/C=N/NC(=O)c1ccc(OC)c(OC)c1',
    'O=C(/C=C/c1ccc(O)c(O)c1)NCCc1c[nH]c2ccc(O)cc12',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[ 1.0000, -0.2733, -0.0760],
#         [-0.2733,  1.0000,  0.9683],
#         [-0.0760,  0.9683,  1.0000]])
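
The same embeddings can also be used to rank a small corpus against a query. The sketch below reuses the three example strings above as the corpus and, purely for illustration, takes the first of them as the query:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Jimmy-Ooi/Tyrisonase_test_model_1000_10_adafactor")

corpus = [
    'CCCCc1ccc(/C(CC)=N/NC(N)=S)cc1',
    'COc1ccccc1/C=N/NC(=O)c1ccc(OC)c(OC)c1',
    'O=C(/C=C/c1ccc(O)c(O)c1)NCCc1c[nH]c2ccc(O)cc12',
]
query = corpus[0]  # illustrative query

corpus_embeddings = model.encode(corpus)
query_embedding = model.encode([query])

# Cosine similarity (the model's configured similarity function), highest first
scores = model.similarity(query_embedding, corpus_embeddings)[0]
for idx in scores.argsort(descending=True):
    print(f"{scores[idx]:.4f}  {corpus[int(idx)]}")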

Training Details

Training Dataset

csv

  • Dataset: csv
  • Size: 188,228 training samples
  • Columns: premise, hypothesis, and label
  • Approximate statistics based on the first 1000 samples:
    • premise: string; min: 8 tokens, mean: 38.76 tokens, max: 106 tokens
    • hypothesis: string; min: 8 tokens, mean: 39.72 tokens, max: 145 tokens
    • label: int; 0: ~50.20%, 2: ~49.80%
  • Samples:
    premise hypothesis label
    COc1cc(OC)c(C2CCN(C)C2CO)c(O)c1-c1cc(-c2ccccc2Cl)[nH]n1 O=C(O)c1ccc(OCc2cn(Cc3cc(=O)c(O)co3)nn2)cc1 2
    Cl.NC(Cc1ccc(=O)n(O)c1)C(=O)O Cn1c2ccccc2c2cc(/C=C/C(=O)c3cccc(NC(=O)c4cccc(Cl)c4)c3)ccc21 0
    Cc1ccc(O)cc1O O=C1NC(=O)C(=Cc2cc(O)c(O)c(O)c2)C(=O)N1 0
  • Loss: SoftmaxLoss
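
A minimal sketch of how a CSV with these premise, hypothesis, and label columns pairs with SoftmaxLoss (the file path is a placeholder, and num_labels=3 is an assumption, since only labels 0 and 2 appear in the statistics above):

from datasets import load_dataset
from sentence_transformers import SentenceTransformer, losses

model = SentenceTransformer("google-bert/bert-base-cased")

# Placeholder path; the actual training CSV is not distributed with this card.
train_dataset = load_dataset("csv", data_files="train.csv", split="train")

# SoftmaxLoss classifies the combined embeddings of each (premise, hypothesis) pair
loss = losses.SoftmaxLoss(
    model=model,
    sentence_embedding_dimension=model.get_sentence_embedding_dimension(),
    num_labels=3,  # assumption: NLI-style 3-way labels
)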

Evaluation Dataset

csv

  • Dataset: csv
  • Size: 33,217 evaluation samples
  • Columns: premise, hypothesis, and label
  • Approximate statistics based on the first 1000 samples:
    • premise: string; min: 11 tokens, mean: 39.05 tokens, max: 145 tokens
    • hypothesis: string; min: 8 tokens, mean: 39.29 tokens, max: 145 tokens
    • label: int; 0: ~51.40%, 2: ~48.60%
  • Samples:
    premise hypothesis label
    CC(=O)Oc1c(/N=N/c2ccc(C3=N/C(=C/c4ccc(OC(F)(F)F)cc4)C(=O)O3)cc2)ccc2ccccc12 CC1(C)C=Cc2c(cc3c(c2O)C(=O)CC(c2ccc(O)cc2O)O3)O1 0
    COc1cc2c(c(OC)c1OC)[C@@H]1OC@HC@@HC@H[C@H]1OC2=O O=C(c1cccc(F)c1)N1CCN(Cc2ccc(F)cc2)CC1 0
    CC(=O)Nc1ccc(/N=N/c2c(N)nc(N)nc2Cl)cc1 NC(=S)c1ccncc1 0
  • Loss: SoftmaxLoss

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • weight_decay: 0.001
  • num_train_epochs: 10
  • warmup_steps: 100
  • fp16: True
  • optim: adafactor
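
These non-default values map directly onto SentenceTransformerTrainingArguments. A minimal end-to-end training sketch under the same assumptions as above (placeholder file paths, assumed num_labels=3):

from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    losses,
)

model = SentenceTransformer("google-bert/bert-base-cased")

# Placeholder paths; columns must be premise, hypothesis, and label
train_dataset = load_dataset("csv", data_files="train.csv", split="train")
eval_dataset = load_dataset("csv", data_files="eval.csv", split="train")

loss = losses.SoftmaxLoss(
    model=model,
    sentence_embedding_dimension=model.get_sentence_embedding_dimension(),
    num_labels=3,  # assumption, as above
)

args = SentenceTransformerTrainingArguments(
    output_dir="output",  # placeholder
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    weight_decay=0.001,
    num_train_epochs=10,
    warmup_steps=100,
    fp16=True,
    optim="adafactor",
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,
)
trainer.train()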

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.001
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 10
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 100
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adafactor
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss
0.0340 100 0.7569
0.0680 200 0.6817
0.1020 300 0.6559
0.1360 400 0.6302
0.1700 500 0.6204
0.2039 600 0.6015
0.2379 700 0.5901
0.2719 800 0.583
0.3059 900 0.5892
0.3399 1000 0.58
0.3739 1100 0.5752
0.4079 1200 0.5707
0.4419 1300 0.5727
0.4759 1400 0.5562
0.5099 1500 0.5736
0.5438 1600 0.5609
0.5778 1700 0.5545
0.6118 1800 0.5528
0.6458 1900 0.5503
0.6798 2000 0.5527
0.7138 2100 0.5552
0.7478 2200 0.5499
0.7818 2300 0.5477
0.8158 2400 0.5429
0.8498 2500 0.5314
0.8838 2600 0.5542
0.9177 2700 0.5373
0.9517 2800 0.5321
0.9857 2900 0.5412
1.0197 3000 0.5367
1.0537 3100 0.5368
1.0877 3200 0.5388
1.1217 3300 0.5419
1.1557 3400 0.5303
1.1897 3500 0.5369
1.2237 3600 0.5357
1.2576 3700 0.5296
1.2916 3800 0.5368
1.3256 3900 0.5351
1.3596 4000 0.533
1.3936 4100 0.5294
1.4276 4200 0.5341
1.4616 4300 0.5307
1.4956 4400 0.5295
1.5296 4500 0.5269
1.5636 4600 0.5272
1.5976 4700 0.5227
1.6315 4800 0.529
1.6655 4900 0.5316
1.6995 5000 0.53
1.7335 5100 0.5251
1.7675 5200 0.5294
1.8015 5300 0.5225
1.8355 5400 0.5204
1.8695 5500 0.5139
1.9035 5600 0.525
1.9375 5700 0.5242
1.9714 5800 0.5208
2.0054 5900 0.5183
2.0394 6000 0.523
2.0734 6100 0.5144
2.1074 6200 0.514
2.1414 6300 0.516
2.1754 6400 0.527
2.2094 6500 0.5182
2.2434 6600 0.5213
2.2774 6700 0.5162
2.3114 6800 0.5202
2.3453 6900 0.5258
2.3793 7000 0.5191
2.4133 7100 0.5185
2.4473 7200 0.5134
2.4813 7300 0.5231
2.5153 7400 0.513
2.5493 7500 0.5167
2.5833 7600 0.5089
2.6173 7700 0.5163
2.6513 7800 0.517
2.6852 7900 0.5081
2.7192 8000 0.5171
2.7532 8100 0.5138
2.7872 8200 0.508
2.8212 8300 0.5172
2.8552 8400 0.5109
2.8892 8500 0.5023
2.9232 8600 0.5128
2.9572 8700 0.5119
2.9912 8800 0.5082
3.0252 8900 0.5183
3.0591 9000 0.512
3.0931 9100 0.5112
3.1271 9200 0.5157
3.1611 9300 0.5066
3.1951 9400 0.5035
3.2291 9500 0.5037
3.2631 9600 0.5112
3.2971 9700 0.5147
3.3311 9800 0.5112
3.3651 9900 0.5
3.3990 10000 0.5152
3.4330 10100 0.5146
3.4670 10200 0.5103
3.5010 10300 0.5129
3.5350 10400 0.5005
3.5690 10500 0.5065
3.6030 10600 0.5105
3.6370 10700 0.5101
3.6710 10800 0.5058
3.7050 10900 0.5093
3.7390 11000 0.5102
3.7729 11100 0.511
3.8069 11200 0.4982
3.8409 11300 0.4973
3.8749 11400 0.5068
3.9089 11500 0.497
3.9429 11600 0.5018
3.9769 11700 0.5028
4.0109 11800 0.5132
4.0449 11900 0.5024
4.0789 12000 0.4992
4.1128 12100 0.4954
4.1468 12200 0.5094
4.1808 12300 0.5091
4.2148 12400 0.507
4.2488 12500 0.504
4.2828 12600 0.5029
4.3168 12700 0.4976
4.3508 12800 0.5001
4.3848 12900 0.5077
4.4188 13000 0.496
4.4528 13100 0.5075
4.4867 13200 0.5059
4.5207 13300 0.5111
4.5547 13400 0.504
4.5887 13500 0.4977
4.6227 13600 0.5156
4.6567 13700 0.4949
4.6907 13800 0.5064
4.7247 13900 0.5014
4.7587 14000 0.5006
4.7927 14100 0.5018
4.8266 14200 0.5079
4.8606 14300 0.5089
4.8946 14400 0.5006
4.9286 14500 0.5123
4.9626 14600 0.5019
4.9966 14700 0.5023
5.0306 14800 0.496
5.0646 14900 0.4934
5.0986 15000 0.5006
5.1326 15100 0.5021
5.1666 15200 0.4989
5.2005 15300 0.4932
5.2345 15400 0.5023
5.2685 15500 0.5047
5.3025 15600 0.5007
5.3365 15700 0.4982
5.3705 15800 0.5005
5.4045 15900 0.5101
5.4385 16000 0.4958
5.4725 16100 0.5039
5.5065 16200 0.4988
5.5404 16300 0.5028
5.5744 16400 0.499
5.6084 16500 0.4923
5.6424 16600 0.5024
5.6764 16700 0.5022
5.7104 16800 0.5007
5.7444 16900 0.4982
5.7784 17000 0.4969
5.8124 17100 0.4981
5.8464 17200 0.4987
5.8804 17300 0.4964
5.9143 17400 0.4974
5.9483 17500 0.4925
5.9823 17600 0.5087
6.0163 17700 0.4963
6.0503 17800 0.4954
6.0843 17900 0.4914
6.1183 18000 0.4878
6.1523 18100 0.5001
6.1863 18200 0.5008
6.2203 18300 0.5035
6.2542 18400 0.5016
6.2882 18500 0.4944
6.3222 18600 0.5011
6.3562 18700 0.4927
6.3902 18800 0.4965
6.4242 18900 0.5039
6.4582 19000 0.4971
6.4922 19100 0.4992
6.5262 19200 0.488
6.5602 19300 0.4935
6.5942 19400 0.5032
6.6281 19500 0.4955
6.6621 19600 0.494
6.6961 19700 0.4997
6.7301 19800 0.4941
6.7641 19900 0.4996
6.7981 20000 0.4951
6.8321 20100 0.497
6.8661 20200 0.4989
6.9001 20300 0.4937
6.9341 20400 0.4983
6.9680 20500 0.4968
7.0020 20600 0.5024
7.0360 20700 0.4979
7.0700 20800 0.4919
7.1040 20900 0.509
7.1380 21000 0.4961
7.1720 21100 0.4981
7.2060 21200 0.4903
7.2400 21300 0.4995
7.2740 21400 0.4961
7.3080 21500 0.4929
7.3419 21600 0.4919
7.3759 21700 0.5023
7.4099 21800 0.4865
7.4439 21900 0.4984
7.4779 22000 0.4882
7.5119 22100 0.4928
7.5459 22200 0.4929
7.5799 22300 0.504
7.6139 22400 0.4998
7.6479 22500 0.494
7.6818 22600 0.4891
7.7158 22700 0.4981
7.7498 22800 0.4888
7.7838 22900 0.4893
7.8178 23000 0.4948
7.8518 23100 0.4985
7.8858 23200 0.5004
7.9198 23300 0.492
7.9538 23400 0.4937
7.9878 23500 0.4947
8.0218 23600 0.4932
8.0557 23700 0.491
8.0897 23800 0.4966
8.1237 23900 0.5002
8.1577 24000 0.4956
8.1917 24100 0.4923
8.2257 24200 0.4935
8.2597 24300 0.492
8.2937 24400 0.489
8.3277 24500 0.4948
8.3617 24600 0.4937
8.3956 24700 0.4909
8.4296 24800 0.5005
8.4636 24900 0.4962
8.4976 25000 0.4865
8.5316 25100 0.4893
8.5656 25200 0.4931
8.5996 25300 0.4968
8.6336 25400 0.4951
8.6676 25500 0.4907
8.7016 25600 0.505
8.7356 25700 0.4938
8.7695 25800 0.4953
8.8035 25900 0.4968
8.8375 26000 0.4854
8.8715 26100 0.4847
8.9055 26200 0.4918
8.9395 26300 0.4987
8.9735 26400 0.4918
9.0075 26500 0.5023
9.0415 26600 0.4976
9.0755 26700 0.4947
9.1094 26800 0.4924
9.1434 26900 0.4914
9.1774 27000 0.4976
9.2114 27100 0.4908
9.2454 27200 0.4873
9.2794 27300 0.491
9.3134 27400 0.4912
9.3474 27500 0.4915
9.3814 27600 0.4933
9.4154 27700 0.4949
9.4494 27800 0.4978
9.4833 27900 0.4956
9.5173 28000 0.4854
9.5513 28100 0.4919
9.5853 28200 0.4919
9.6193 28300 0.4979
9.6533 28400 0.4921
9.6873 28500 0.4961
9.7213 28600 0.4918
9.7553 28700 0.4923
9.7893 28800 0.4934
9.8232 28900 0.4871
9.8572 29000 0.4879
9.8912 29100 0.4922
9.9252 29200 0.4921
9.9592 29300 0.4884
9.9932 29400 0.4936

Framework Versions

  • Python: 3.12.12
  • Sentence Transformers: 5.1.2
  • Transformers: 4.57.3
  • PyTorch: 2.9.0+cu126
  • Accelerate: 1.12.0
  • Datasets: 4.0.0
  • Tokenizers: 0.22.1

Citation

BibTeX

Sentence Transformers and SoftmaxLoss

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}