Update README.md
README.md CHANGED
@@ -2606,7 +2606,7 @@ license: mit
 
 # gte-small
 
-General Text Embeddings (GTE) model.
+General Text Embeddings (GTE) model. [Towards General Text Embeddings with Multi-stage Contrastive Learning](https://arxiv.org/abs/2308.03281)
 
 The GTE models are trained by Alibaba DAMO Academy. They are mainly based on the BERT framework and currently offer three different sizes of models, including [GTE-large](https://huggingface.co/thenlper/gte-large), [GTE-base](https://huggingface.co/thenlper/gte-base), and [GTE-small](https://huggingface.co/thenlper/gte-small). The GTE models are trained on a large-scale corpus of relevance text pairs, covering a wide range of domains and scenarios. This enables the GTE models to be applied to various downstream tasks of text embeddings, including **information retrieval**, **semantic textual similarity**, **text reranking**, etc.
 
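The second hunk's header below carries the context line `print(cos_sim(embeddings[0], embeddings[1]))`, so the card already walks through an embedding-similarity example around that point. A minimal sketch of that usage pattern, assuming the `sentence-transformers` package, with example sentences invented here for illustration:

```python
# A sketch, not the model card's exact snippet: embed two sentences with
# gte-small and score them with cosine similarity, matching the
# `print(cos_sim(embeddings[0], embeddings[1]))` context line in the diff.
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

model = SentenceTransformer("thenlper/gte-small")

# Example sentences chosen for illustration (semantic textual similarity).
sentences = [
    "That is a happy person",
    "That is a very happy person",
]
embeddings = model.encode(sentences)
print(cos_sim(embeddings[0], embeddings[1]))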
@@ -2685,3 +2685,18 @@ print(cos_sim(embeddings[0], embeddings[1]))
 ### Limitation
 
 This model exclusively caters to English texts, and any lengthy texts will be truncated to a maximum of 512 tokens.
+
+### Citation
+
+If you find our paper or models helpful, please consider citing them as follows:
+
+```
+@misc{li2023general,
+      title={Towards General Text Embeddings with Multi-stage Contrastive Learning},
+      author={Zehan Li and Xin Zhang and Yanzhao Zhang and Dingkun Long and Pengjun Xie and Meishan Zhang},
+      year={2023},
+      eprint={2308.03281},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL}
+}
+```
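The Limitation context line above notes a hard 512-token cap. A quick way to see where the cutoff lands, assuming the checkpoint's standard Hugging Face tokenizer (the `long_text` input is a made-up example):

```python
# Illustrating the 512-token truncation noted in the Limitation section.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("thenlper/gte-small")

long_text = "word " * 1000  # deliberately longer than the model accepts
encoded = tokenizer(long_text, truncation=True, max_length=512)

# Tokens past position 512 (including special tokens) are dropped.
print(len(encoded["input_ids"]))  # -> 512
```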