SentenceTransformer based on google-bert/bert-base-cased

This is a sentence-transformers model finetuned from google-bert/bert-base-cased on the csv dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: google-bert/bert-base-cased
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset:
    • csv

Model Sources

  • Documentation: Sentence Transformers Documentation (https://sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
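
For reference, an equivalent module stack can be assembled by hand from the two components listed above. This is a minimal sketch; in practice the published checkpoint should simply be loaded from the Hub as shown in the Usage section.

from sentence_transformers import SentenceTransformer, models

# BERT backbone with the settings shown above (cased, max_seq_length=512)
word_embedding_model = models.Transformer(
    "google-bert/bert-base-cased",
    max_seq_length=512,
    do_lower_case=False,
)

# Mean pooling over token embeddings yields one 768-dimensional sentence vector
pooling_model = models.Pooling(
    word_embedding_model.get_word_embedding_dimension(),
    pooling_mode="mean",
)

model = SentenceTransformer(modules=[word_embedding_model, pooling_model])
print(model)  # mirrors the architecture printed above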

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Jimmy-Ooi/Tyrisonase_test_model_1000_10_adafactor")
# Run inference
sentences = [
    'CCCCc1ccc(/C(CC)=N/NC(N)=S)cc1',
    'COc1ccccc1/C=N/NC(=O)c1ccc(OC)c(OC)c1',
    'O=C(/C=C/c1ccc(O)c(O)c1)NCCc1c[nH]c2ccc(O)cc12',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[ 1.0000, -0.2733, -0.0760],
#         [-0.2733,  1.0000,  0.9683],
#         [-0.0760,  0.9683,  1.0000]])
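
The same embeddings can also be used to rank a small corpus against a query. The sketch below reuses the three example strings above as the corpus and, purely for illustration, takes the first of them as the query:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Jimmy-Ooi/Tyrisonase_test_model_1000_10_adafactor")

corpus = [
    'CCCCc1ccc(/C(CC)=N/NC(N)=S)cc1',
    'COc1ccccc1/C=N/NC(=O)c1ccc(OC)c(OC)c1',
    'O=C(/C=C/c1ccc(O)c(O)c1)NCCc1c[nH]c2ccc(O)cc12',
]
query = corpus[0]  # illustrative query

corpus_embeddings = model.encode(corpus)
query_embedding = model.encode([query])

# Cosine similarity (the model's configured similarity function), highest first
scores = model.similarity(query_embedding, corpus_embeddings)[0]
for idx in scores.argsort(descending=True):
    print(f"{scores[idx]:.4f}  {corpus[int(idx)]}")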

Training Details

Training Dataset

csv

  • Dataset: csv
  • Size: 188,228 training samples
  • Columns: premise, hypothesis, and label
  • Approximate statistics based on the first 1000 samples:
    • premise: string; min: 8 tokens, mean: 38.76 tokens, max: 106 tokens
    • hypothesis: string; min: 8 tokens, mean: 39.72 tokens, max: 145 tokens
    • label: int; 0: ~50.20%, 2: ~49.80%
  • Samples:
    premise hypothesis label
    COc1cc(OC)c(C2CCN(C)C2CO)c(O)c1-c1cc(-c2ccccc2Cl)[nH]n1 O=C(O)c1ccc(OCc2cn(Cc3cc(=O)c(O)co3)nn2)cc1 2
    Cl.NC(Cc1ccc(=O)n(O)c1)C(=O)O Cn1c2ccccc2c2cc(/C=C/C(=O)c3cccc(NC(=O)c4cccc(Cl)c4)c3)ccc21 0
    Cc1ccc(O)cc1O O=C1NC(=O)C(=Cc2cc(O)c(O)c(O)c2)C(=O)N1 0
  • Loss: SoftmaxLoss
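
A minimal sketch of how a CSV with these premise, hypothesis, and label columns pairs with SoftmaxLoss (the file path is a placeholder, and num_labels=3 is an assumption, since only labels 0 and 2 appear in the statistics above):

from datasets import load_dataset
from sentence_transformers import SentenceTransformer, losses

model = SentenceTransformer("google-bert/bert-base-cased")

# Placeholder path; the actual training CSV is not distributed with this card.
train_dataset = load_dataset("csv", data_files="train.csv", split="train")

# SoftmaxLoss classifies the combined embeddings of each (premise, hypothesis) pair
loss = losses.SoftmaxLoss(
    model=model,
    sentence_embedding_dimension=model.get_sentence_embedding_dimension(),
    num_labels=3,  # assumption: NLI-style 3-way labels
)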

Evaluation Dataset

csv

  • Dataset: csv
  • Size: 33,217 evaluation samples
  • Columns: premise, hypothesis, and label
  • Approximate statistics based on the first 1000 samples:
    • premise: string; min: 11 tokens, mean: 39.05 tokens, max: 145 tokens
    • hypothesis: string; min: 8 tokens, mean: 39.29 tokens, max: 145 tokens
    • label: int; 0: ~51.40%, 2: ~48.60%
  • Samples:
    premise hypothesis label
    CC(=O)Oc1c(/N=N/c2ccc(C3=N/C(=C/c4ccc(OC(F)(F)F)cc4)C(=O)O3)cc2)ccc2ccccc12 CC1(C)C=Cc2c(cc3c(c2O)C(=O)CC(c2ccc(O)cc2O)O3)O1 0
    COc1cc2c(c(OC)c1OC)[C@@H]1OC@HC@@HC@H[C@H]1OC2=O O=C(c1cccc(F)c1)N1CCN(Cc2ccc(F)cc2)CC1 0
    CC(=O)Nc1ccc(/N=N/c2c(N)nc(N)nc2Cl)cc1 NC(=S)c1ccncc1 0
  • Loss: SoftmaxLoss

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • weight_decay: 0.001
  • num_train_epochs: 10
  • warmup_steps: 100
  • fp16: True
  • optim: adafactor
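
These non-default values map directly onto SentenceTransformerTrainingArguments. A minimal end-to-end training sketch under the same assumptions as above (placeholder file paths, assumed num_labels=3):

from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    losses,
)

model = SentenceTransformer("google-bert/bert-base-cased")

# Placeholder paths; columns must be premise, hypothesis, and label
train_dataset = load_dataset("csv", data_files="train.csv", split="train")
eval_dataset = load_dataset("csv", data_files="eval.csv", split="train")

loss = losses.SoftmaxLoss(
    model=model,
    sentence_embedding_dimension=model.get_sentence_embedding_dimension(),
    num_labels=3,  # assumption, as above
)

args = SentenceTransformerTrainingArguments(
    output_dir="output",  # placeholder
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    weight_decay=0.001,
    num_train_epochs=10,
    warmup_steps=100,
    fp16=True,
    optim="adafactor",
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,
)
trainer.train()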

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.001
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 10
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 100
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adafactor
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss
0.0340 100 0.7569
0.0680 200 0.6817
0.1020 300 0.6559
0.1360 400 0.6302
0.1700 500 0.6204
0.2039 600 0.6015
0.2379 700 0.5901
0.2719 800 0.583
0.3059 900 0.5892
0.3399 1000 0.58
0.3739 1100 0.5752
0.4079 1200 0.5707
0.4419 1300 0.5727
0.4759 1400 0.5562
0.5099 1500 0.5736
0.5438 1600 0.5609
0.5778 1700 0.5545
0.6118 1800 0.5528
0.6458 1900 0.5503
0.6798 2000 0.5527
0.7138 2100 0.5552
0.7478 2200 0.5499
0.7818 2300 0.5477
0.8158 2400 0.5429
0.8498 2500 0.5314
0.8838 2600 0.5542
0.9177 2700 0.5373
0.9517 2800 0.5321
0.9857 2900 0.5412
1.0197 3000 0.5367
1.0537 3100 0.5368
1.0877 3200 0.5388
1.1217 3300 0.5419
1.1557 3400 0.5303
1.1897 3500 0.5369
1.2237 3600 0.5357
1.2576 3700 0.5296
1.2916 3800 0.5368
1.3256 3900 0.5351
1.3596 4000 0.533
1.3936 4100 0.5294
1.4276 4200 0.5341
1.4616 4300 0.5307
1.4956 4400 0.5295
1.5296 4500 0.5269
1.5636 4600 0.5272
1.5976 4700 0.5227
1.6315 4800 0.529
1.6655 4900 0.5316
1.6995 5000 0.53
1.7335 5100 0.5251
1.7675 5200 0.5294
1.8015 5300 0.5225
1.8355 5400 0.5204
1.8695 5500 0.5139
1.9035 5600 0.525
1.9375 5700 0.5242
1.9714 5800 0.5208
2.0054 5900 0.5183
2.0394 6000 0.523
2.0734 6100 0.5144
2.1074 6200 0.514
2.1414 6300 0.516
2.1754 6400 0.527
2.2094 6500 0.5182
2.2434 6600 0.5213
2.2774 6700 0.5162
2.3114 6800 0.5202
2.3453 6900 0.5258
2.3793 7000 0.5191
2.4133 7100 0.5185
2.4473 7200 0.5134
2.4813 7300 0.5231
2.5153 7400 0.513
2.5493 7500 0.5167
2.5833 7600 0.5089
2.6173 7700 0.5163
2.6513 7800 0.517
2.6852 7900 0.5081
2.7192 8000 0.5171
2.7532 8100 0.5138
2.7872 8200 0.508
2.8212 8300 0.5172
2.8552 8400 0.5109
2.8892 8500 0.5023
2.9232 8600 0.5128
2.9572 8700 0.5119
2.9912 8800 0.5082
3.0252 8900 0.5183
3.0591 9000 0.512
3.0931 9100 0.5112
3.1271 9200 0.5157
3.1611 9300 0.5066
3.1951 9400 0.5035
3.2291 9500 0.5037
3.2631 9600 0.5112
3.2971 9700 0.5147
3.3311 9800 0.5112
3.3651 9900 0.5
3.3990 10000 0.5152
3.4330 10100 0.5146
3.4670 10200 0.5103
3.5010 10300 0.5129
3.5350 10400 0.5005
3.5690 10500 0.5065
3.6030 10600 0.5105
3.6370 10700 0.5101
3.6710 10800 0.5058
3.7050 10900 0.5093
3.7390 11000 0.5102
3.7729 11100 0.511
3.8069 11200 0.4982
3.8409 11300 0.4973
3.8749 11400 0.5068
3.9089 11500 0.497
3.9429 11600 0.5018
3.9769 11700 0.5028
4.0109 11800 0.5132
4.0449 11900 0.5024
4.0789 12000 0.4992
4.1128 12100 0.4954
4.1468 12200 0.5094
4.1808 12300 0.5091
4.2148 12400 0.507
4.2488 12500 0.504
4.2828 12600 0.5029
4.3168 12700 0.4976
4.3508 12800 0.5001
4.3848 12900 0.5077
4.4188 13000 0.496
4.4528 13100 0.5075
4.4867 13200 0.5059
4.5207 13300 0.5111
4.5547 13400 0.504
4.5887 13500 0.4977
4.6227 13600 0.5156
4.6567 13700 0.4949
4.6907 13800 0.5064
4.7247 13900 0.5014
4.7587 14000 0.5006
4.7927 14100 0.5018
4.8266 14200 0.5079
4.8606 14300 0.5089
4.8946 14400 0.5006
4.9286 14500 0.5123
4.9626 14600 0.5019
4.9966 14700 0.5023
5.0306 14800 0.496
5.0646 14900 0.4934
5.0986 15000 0.5006
5.1326 15100 0.5021
5.1666 15200 0.4989
5.2005 15300 0.4932
5.2345 15400 0.5023
5.2685 15500 0.5047
5.3025 15600 0.5007
5.3365 15700 0.4982
5.3705 15800 0.5005
5.4045 15900 0.5101
5.4385 16000 0.4958
5.4725 16100 0.5039
5.5065 16200 0.4988
5.5404 16300 0.5028
5.5744 16400 0.499
5.6084 16500 0.4923
5.6424 16600 0.5024
5.6764 16700 0.5022
5.7104 16800 0.5007
5.7444 16900 0.4982
5.7784 17000 0.4969
5.8124 17100 0.4981
5.8464 17200 0.4987
5.8804 17300 0.4964
5.9143 17400 0.4974
5.9483 17500 0.4925
5.9823 17600 0.5087
6.0163 17700 0.4963
6.0503 17800 0.4954
6.0843 17900 0.4914
6.1183 18000 0.4878
6.1523 18100 0.5001
6.1863 18200 0.5008
6.2203 18300 0.5035
6.2542 18400 0.5016
6.2882 18500 0.4944
6.3222 18600 0.5011
6.3562 18700 0.4927
6.3902 18800 0.4965
6.4242 18900 0.5039
6.4582 19000 0.4971
6.4922 19100 0.4992
6.5262 19200 0.488
6.5602 19300 0.4935
6.5942 19400 0.5032
6.6281 19500 0.4955
6.6621 19600 0.494
6.6961 19700 0.4997
6.7301 19800 0.4941
6.7641 19900 0.4996
6.7981 20000 0.4951
6.8321 20100 0.497
6.8661 20200 0.4989
6.9001 20300 0.4937
6.9341 20400 0.4983
6.9680 20500 0.4968
7.0020 20600 0.5024
7.0360 20700 0.4979
7.0700 20800 0.4919
7.1040 20900 0.509
7.1380 21000 0.4961
7.1720 21100 0.4981
7.2060 21200 0.4903
7.2400 21300 0.4995
7.2740 21400 0.4961
7.3080 21500 0.4929
7.3419 21600 0.4919
7.3759 21700 0.5023
7.4099 21800 0.4865
7.4439 21900 0.4984
7.4779 22000 0.4882
7.5119 22100 0.4928
7.5459 22200 0.4929
7.5799 22300 0.504
7.6139 22400 0.4998
7.6479 22500 0.494
7.6818 22600 0.4891
7.7158 22700 0.4981
7.7498 22800 0.4888
7.7838 22900 0.4893
7.8178 23000 0.4948
7.8518 23100 0.4985
7.8858 23200 0.5004
7.9198 23300 0.492
7.9538 23400 0.4937
7.9878 23500 0.4947
8.0218 23600 0.4932
8.0557 23700 0.491
8.0897 23800 0.4966
8.1237 23900 0.5002
8.1577 24000 0.4956
8.1917 24100 0.4923
8.2257 24200 0.4935
8.2597 24300 0.492
8.2937 24400 0.489
8.3277 24500 0.4948
8.3617 24600 0.4937
8.3956 24700 0.4909
8.4296 24800 0.5005
8.4636 24900 0.4962
8.4976 25000 0.4865
8.5316 25100 0.4893
8.5656 25200 0.4931
8.5996 25300 0.4968
8.6336 25400 0.4951
8.6676 25500 0.4907
8.7016 25600 0.505
8.7356 25700 0.4938
8.7695 25800 0.4953
8.8035 25900 0.4968
8.8375 26000 0.4854
8.8715 26100 0.4847
8.9055 26200 0.4918
8.9395 26300 0.4987
8.9735 26400 0.4918
9.0075 26500 0.5023
9.0415 26600 0.4976
9.0755 26700 0.4947
9.1094 26800 0.4924
9.1434 26900 0.4914
9.1774 27000 0.4976
9.2114 27100 0.4908
9.2454 27200 0.4873
9.2794 27300 0.491
9.3134 27400 0.4912
9.3474 27500 0.4915
9.3814 27600 0.4933
9.4154 27700 0.4949
9.4494 27800 0.4978
9.4833 27900 0.4956
9.5173 28000 0.4854
9.5513 28100 0.4919
9.5853 28200 0.4919
9.6193 28300 0.4979
9.6533 28400 0.4921
9.6873 28500 0.4961
9.7213 28600 0.4918
9.7553 28700 0.4923
9.7893 28800 0.4934
9.8232 28900 0.4871
9.8572 29000 0.4879
9.8912 29100 0.4922
9.9252 29200 0.4921
9.9592 29300 0.4884
9.9932 29400 0.4936

Framework Versions

  • Python: 3.12.12
  • Sentence Transformers: 5.1.2
  • Transformers: 4.57.3
  • PyTorch: 2.9.0+cu126
  • Accelerate: 1.12.0
  • Datasets: 4.0.0
  • Tokenizers: 0.22.1

Citation

BibTeX

Sentence Transformers and SoftmaxLoss

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}