Update README.md
README.md CHANGED

@@ -26,7 +26,7 @@ If you just want to check out how to use the model, please check out the [Usage
Welcome to JaColBERT version 2, the second release of JaColBERT, a Japanese-only document retrieval model based on [ColBERT](https://github.com/stanford-futuredata/ColBERT).
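
For context on what "based on ColBERT" means in practice: ColBERT-style retrievers keep one embedding per token and score a document with the MaxSim late-interaction rule. The sketch below is an illustration of that rule only (it is not code from this repository, and it omits tokenisation, query augmentation and indexing):

```python
import torch

def maxsim_score(query_emb: torch.Tensor, doc_emb: torch.Tensor) -> torch.Tensor:
    """ColBERT-style late interaction: each query token takes its best
    similarity over all document tokens; the per-token maxima are summed.

    query_emb: (n_query_tokens, dim), doc_emb: (n_doc_tokens, dim);
    embeddings are assumed L2-normalised, so the dot product is cosine similarity.
    """
    sim = query_emb @ doc_emb.T            # (n_query_tokens, n_doc_tokens)
    return sim.max(dim=1).values.sum()     # MaxSim, summed over query tokens
```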

-JaColBERTv2 is a model that offers very strong out-of-domain generalisation. Having been only trained on a single dataset (MMarco), it reaches state-of-the-art performance
+JaColBERTv2 is a model that offers very strong out-of-domain generalisation. Having been only trained on a single dataset (MMarco), it reaches state-of-the-art performance.

JaColBERTv2 was initialised off JaColBERTv1 and trained using knowledge distillation with 31 negative examples per positive example. It was trained for 250k steps using a batch size of 32.
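
As a rough sketch of what that distillation objective can look like: the excerpt above does not spell out the exact loss or teacher, so the KL-divergence form and the cross-encoder teacher below are assumptions in the spirit of ColBERTv2-style distillation, not the actual training code.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_scores: torch.Tensor,
                      teacher_scores: torch.Tensor) -> torch.Tensor:
    """KL-divergence distillation over 1 positive + 31 negatives per query.

    Both tensors are (batch_size, 32): column 0 scores the positive passage,
    columns 1..31 the negatives. The teacher scores (e.g. from a cross-encoder
    reranker) act as soft targets for the ColBERT student.
    """
    student_log_probs = F.log_softmax(student_scores, dim=-1)
    teacher_probs = F.softmax(teacher_scores, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
```

With the batch size of 32 mentioned above, each step would average this loss over 32 such (1 positive + 31 negatives) groups.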
@@ -66,21 +66,22 @@ We present the first results, on two datasets: JQaRa, a passage retrieval task c
JaColBERTv2 reaches state-of-the-art results on both datasets, outperforming models with 5x more parameters.

+| | | | JQaRa | | | | JSQuAD | | |
+| ------------------- | --- | --------- | --------- | --------- | --------- | --- | --------- | --------- | --------- |
+| | | NDCG@10 | MRR@10 | NDCG@100 | MRR@100 | | R@1 | R@5 | R@10 |
+| JaColBERTv2 | | **0.585** | **0.836** | **0.753** | **0.838** | | **0.918** | **0.975** | **0.982** |
+| JaColBERT | | 0.549 | 0.811 | 0.730 | 0.814 | | 0.906 | 0.968 | 0.978 |
+| bge-m3+all | | 0.576 | 0.818 | 0.745 | 0.820 | | N/A | N/A | N/A |
+| bge-m3+dense | | 0.539 | 0.785 | 0.721 | 0.788 | | 0.850 | 0.959 | 0.976 |
+| m-e5-large | | 0.554 | 0.799 | 0.731 | 0.801 | | 0.865 | 0.966 | 0.977 |
+| m-e5-base | | 0.471 | 0.727 | 0.673 | 0.731 | | *0.838* | *0.955* | 0.973 |
+| m-e5-small | | 0.492 | 0.729 | 0.689 | 0.733 | | *0.840* | *0.954* | 0.973 |
+| GLuCoSE | | 0.308 | 0.518 | 0.564 | 0.527 | | 0.645 | 0.846 | 0.897 |
+| sup-simcse-ja-base | | 0.324 | 0.541 | 0.572 | 0.550 | | 0.632 | 0.849 | 0.897 |
+| sup-simcse-ja-large | | 0.356 | 0.575 | 0.596 | 0.583 | | 0.603 | 0.833 | 0.889 |
+| fio-base-v0.1 | | 0.372 | 0.616 | 0.608 | 0.622 | | 0.700 | 0.879 | 0.924 |
+| | | | | | | | | | |

# Usage