Update README_CN.md

README_CN.md  +2 -107

@@ -1,30 +1,10 @@
[**Chinese**](README_CN.md) | [**English**](README.md)
# Project Introduction
This project aims to provide a better Chinese CLIP model. All of the training data consists of publicly accessible image URLs and associated Chinese text descriptions, 400M pairs in total. After filtering, we ultimately used 100M pairs for training.
-This project at QQ-ARC Joint Lab, Tencent PCG
+This project was completed at the QQ-ARC Joint Lab, Tencent PCG.
+For more details, please refer to the [QA-CLIP project homepage](https://huggingface.co/TencentARC/QA-CLIP).
<br><br>

-# Models & Experiments
-<span id="model_card"></span>
-## Model Sizes & Download Links
-QA-CLIP is currently open-sourced at three different scales; model details and download links are given in the table below:
-
-<table border="1" width="100%">
-<tr align="center">
-<th>Model</th><th>Download link</th><th>#Params</th><th>Vision backbone</th><th>Vision #params</th><th>Text backbone</th><th>Text #params</th><th>Resolution</th>
-</tr>
-<tr align="center">
-<td>QA-CLIP<sub>RN50</sub></td><td><a href="https://huggingface.co/TencentARC/QA-CLIP/resolve/main/QA-CLIP-RN50.pt">Download</a></td><td>77M</td><td>ResNet50</td><td>38M</td><td>RBT3</td><td>39M</td><td>224</td>
-</tr>
-<tr align="center">
-<td>QA-CLIP<sub>ViT-B/16</sub></td><td><a href="https://huggingface.co/TencentARC/QA-CLIP/resolve/main/QA-CLIP-base.pt">Download</a></td><td>188M</td><td>ViT-B/16</td><td>86M</td><td>RoBERTa-wwm-Base</td><td>102M</td><td>224</td>
-</tr>
-<tr align="center">
-<td>QA-CLIP<sub>ViT-L/14</sub></td><td><a href="https://huggingface.co/TencentARC/QA-CLIP/resolve/main/QA-CLIP-large.pt">Download</a></td><td>406M</td><td>ViT-L/14</td><td>304M</td><td>RoBERTa-wwm-Base</td><td>102M</td><td>224</td>
-</tr>
-</table>
-<br>
-
## Experimental Results
For the image-text retrieval task, we ran zero-shot tests on [MUGE Retrieval](https://tianchi.aliyun.com/muge), [Flickr30K-CN](https://github.com/li-xirong/cross-lingual-cap) and [COCO-CN](https://github.com/li-xirong/coco-cn).
For the zero-shot image classification task, we tested on the ImageNet dataset. The results are shown in the table below:

@@ -156,19 +136,6 @@ QA-CLIP is currently open-sourced at three different scales; model details and download links are given in the table below:


# Tutorial
-## Requirements
-Environment requirements:
-
-* python >= 3.6.4
-* pytorch >= 1.8.0 (with torchvision >= 0.9.0)
-* CUDA Version >= 10.2
-
-Install the libraries required by this project:
-```bash
-cd /yourpath/QA-CLIP-main
-pip install -r requirements.txt
-```
-
## Inference Code
Example inference code:
```python
@@ -202,78 +169,6 @@ probs = logits_per_image.softmax(dim=1)
```
<br><br>
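Only the final line of the Python example, `probs = logits_per_image.softmax(dim=1)`, is visible as hunk context here. As a hedged sketch, inference with the Chinese-CLIP-style `cn_clip` API that this project builds on might look like the following; the model name, image path, and candidate captions are illustrative assumptions:

```python
# A minimal sketch, assuming the cn_clip package from Chinese-CLIP, which this
# project is built on. Model name, image path, and captions are placeholders.
import torch
from PIL import Image

import cn_clip.clip as clip
from cn_clip.clip import load_from_name

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load a pretrained checkpoint together with its matching image preprocessor.
model, preprocess = load_from_name("ViT-B-16", device=device, download_root="./")
model.eval()

# Preprocess one image and tokenize a few candidate Chinese captions.
image = preprocess(Image.open("examples/example.jpg")).unsqueeze(0).to(device)
text = clip.tokenize(["一只猫", "一只狗", "一辆汽车"]).to(device)

with torch.no_grad():
    # Similarity logits between the image and every caption, in both directions.
    logits_per_image, logits_per_text = model.get_similarity(image, text)
    probs = logits_per_image.softmax(dim=1)  # matches the hunk context above

print(probs)  # one probability per candidate caption
```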

-## Prediction and Evaluation
-
-### Downloading the image-text retrieval test sets
-The <b>[Chinese-CLIP](https://github.com/OFA-Sys/Chinese-CLIP)</b> project has already preprocessed the test sets; these are the download links they provide:
-
-MUGE data: [download link](https://clip-cn-beijing.oss-cn-beijing.aliyuncs.com/datasets/MUGE.zip)
-
-Flickr30K-CN data: [download link](https://clip-cn-beijing.oss-cn-beijing.aliyuncs.com/datasets/Flickr30k-CN.zip)
-
-In addition, the [COCO-CN](https://github.com/li-xirong/coco-cn) data must be requested from the original authors.
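Judging from the paths used in the evaluation commands below, the unpacked archives are expected to sit under `${DATAPATH}/datasets/`, e.g. `${DATAPATH}/datasets/Flickr30k-CN/lmdb/test/imgs` for the image LMDB and `${DATAPATH}/datasets/Flickr30k-CN/test_texts.jsonl` for the captions.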
-### Downloading the ImageNet dataset
-Please download the raw data yourself; the [Chinese labels](http://clip-cn-beijing.oss-cn-beijing.aliyuncs.com/datasets/ImageNet-1K/label_cn.txt) and [English labels](http://clip-cn-beijing.oss-cn-beijing.aliyuncs.com/datasets/ImageNet-1K/label.txt) are likewise provided by the <b>[Chinese-CLIP](https://github.com/OFA-Sys/Chinese-CLIP)</b> project.
-### Image-text retrieval evaluation
-The image-text retrieval evaluation can be run as follows:
-```bash
-split=test # compute features for the valid or test split
-resume=your_ckp_path
-DATAPATH=your_DATAPATH
-dataset_name=Flickr30k-CN
-# dataset_name=MUGE
-
-python -u eval/extract_features.py \
-    --extract-image-feats \
-    --extract-text-feats \
-    --image-data="${DATAPATH}/datasets/${dataset_name}/lmdb/${split}/imgs" \
-    --text-data="${DATAPATH}/datasets/${dataset_name}/${split}_texts.jsonl" \
-    --img-batch-size=32 \
-    --text-batch-size=32 \
-    --context-length=52 \
-    --resume=${resume} \
-    --vision-model=ViT-B-16 \
-    --text-model=RoBERTa-wwm-ext-base-chinese
-
-python -u eval/make_topk_predictions.py \
-    --image-feats="${DATAPATH}/datasets/${dataset_name}/${split}_imgs.img_feat.jsonl" \
-    --text-feats="${DATAPATH}/datasets/${dataset_name}/${split}_texts.txt_feat.jsonl" \
-    --top-k=10 \
-    --eval-batch-size=32768 \
-    --output="${DATAPATH}/datasets/${dataset_name}/${split}_predictions.jsonl"
-
-python -u eval/make_topk_predictions_tr.py \
-    --image-feats="${DATAPATH}/datasets/${dataset_name}/${split}_imgs.img_feat.jsonl" \
-    --text-feats="${DATAPATH}/datasets/${dataset_name}/${split}_texts.txt_feat.jsonl" \
-    --top-k=10 \
-    --eval-batch-size=32768 \
-    --output="${DATAPATH}/datasets/${dataset_name}/${split}_tr_predictions.jsonl"
-
-python eval/evaluation.py \
-    ${DATAPATH}/datasets/${dataset_name}/${split}_texts.jsonl \
-    ${DATAPATH}/datasets/${dataset_name}/${split}_predictions.jsonl \
-    ${DATAPATH}/datasets/${dataset_name}/output1.json
-cat ${DATAPATH}/datasets/${dataset_name}/output1.json
-
-python eval/transform_ir_annotation_to_tr.py \
-    --input ${DATAPATH}/datasets/${dataset_name}/${split}_texts.jsonl
-
-python eval/evaluation_tr.py \
-    ${DATAPATH}/datasets/${dataset_name}/${split}_texts.tr.jsonl \
-    ${DATAPATH}/datasets/${dataset_name}/${split}_tr_predictions.jsonl \
-    ${DATAPATH}/datasets/${dataset_name}/output2.json
-cat ${DATAPATH}/datasets/${dataset_name}/output2.json
-```
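The two `evaluation*.py` steps write recall metrics to `output1.json` (text-to-image retrieval) and `output2.json` (image-to-text retrieval, hence the `_tr` suffix), which the `cat` commands then print; the exact JSON schema is whatever the upstream Chinese-CLIP evaluation scripts emit.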
-
-### ImageNet zero-shot classification
-The ImageNet zero-shot classification code can be run as follows:
-```bash
-bash scripts/zeroshot_eval.sh 0 \
-    ${DATAPATH} imagenet \
-    ViT-B-16 RoBERTa-wwm-ext-base-chinese \
-    ./pretrained_weights/QA-CLIP-base.pt
-```
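Following the upstream Chinese-CLIP script convention, the positional arguments appear to be the GPU id (`0`), the data root, the dataset name (`imagenet`), the vision and text towers, and the checkpoint to evaluate.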
-<br><br>
# Acknowledgements
The project code is based on <b>[Chinese-CLIP](https://github.com/OFA-Sys/Chinese-CLIP)</b>; many thanks for their excellent open-source work.
<br><br>