init q8 gguf

- .gitattributes +1 -0
- GGUF_Q8_0_README.md +223 -0
- gguf_output/dream-coder-7b-f16.gguf +3 -0
- gguf_output/dream-coder-7b-q8_0.gguf +3 -0
- pyproject.toml +16 -0
- quantize_dream_q8_0.py +342 -0
- uv.lock +0 -0
.gitattributes
CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+*.gguf filter=lfs diff=lfs merge=lfs -text
GGUF_Q8_0_README.md
ADDED
@@ -0,0 +1,223 @@
# Dream-Coder GGUF Q8_0 Quantization Guide

This guide covers GGUF Q8_0 quantization of the Dream-Coder v0-Instruct-7B model.

## Quick Start

### 1. Environment Setup

```bash
# 1. Clone and build llama.cpp (the quantization script expects the
#    CMake build layout, i.e. binaries under build/bin/)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build -j

# 2. Install the Python dependencies
pip install "transformers>=4.46.2" torch safetensors numpy
```

### 2. Run the Quantization

#### Option 1: use the provided script

```bash
# Point the script at your llama.cpp checkout
export LLAMA_CPP_PATH=/path/to/llama.cpp

# Run the quantization script
./quantize_example.sh
```

#### Option 2: run the steps manually

```bash
python quantize_dream_q8_0.py \
    --model_path /path/to/Dream-Coder-v0-Instruct-7B \
    --llama_cpp_path /path/to/llama.cpp \
    --output_dir ./gguf_output \
    --keep_f16
```

### 3. Arguments

- `--model_path`: path to the Dream-Coder model (default: current directory)
- `--llama_cpp_path`: path to the llama.cpp checkout (required)
- `--output_dir`: output directory (default: `./gguf_output`)
- `--keep_f16`: keep the F16 intermediate file
## Architecture Notes

### Dream-Coder-Specific Configuration

The quantization script handles the following Dream-Coder specifics:

1. **Architecture mapping**: DreamModel → LlamaForCausalLM (for compatibility)
2. **Special token IDs**:
   - `mask_token_id`: 151666 (the key diffusion token)
   - `bos_token_id`: 151665
   - `eos_token_id`: 151643
   - `pad_token_id`: 151643
3. **Model parameters**:
   - Vocabulary size: 152,064
   - Hidden size: 3,584
   - Attention heads: 28 (4 key-value heads)
   - Layers: 28
   - Context length: 32,768
4. **Diffusion properties**:
   - The `mask_token_id` metadata is preserved
   - RoPE theta: 1,000,000.0
   - Activation: SiLU

After conversion you can confirm what actually landed in the GGUF header, as sketched below.
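A minimal sketch for inspecting the GGUF metadata, assuming the `gguf` Python package that ships with llama.cpp (`pip install gguf`); whether diffusion-specific fields such as the mask token survive conversion depends on the converter, which is exactly what this check is for:

```python
# Sketch: list the metadata keys stored in the quantized GGUF.
# Assumes the `gguf` package from llama.cpp (pip install gguf).
from gguf import GGUFReader

reader = GGUFReader("gguf_output/dream-coder-7b-q8_0.gguf")

# reader.fields maps key names (e.g. "llama.context_length",
# "tokenizer.ggml.bos_token_id") to their stored values.
for name in sorted(reader.fields):
    print(name)
```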
## Output

### File Layout

```
gguf_output/
├── dream-coder-7b-f16.gguf   # F16 intermediate file (optionally kept)
└── dream-coder-7b-q8_0.gguf  # final Q8_0 quantized file
```

### Expected Performance

| Metric           | Original (BF16) | Q8_0     |
|------------------|-----------------|----------|
| Memory footprint | ~14 GB          | ~6.7 GB  |
| Inference speed  | 1.0x            | 1.2-1.5x |
| Accuracy loss    | 0%              | <0.1%    |

The Q8_0 file size follows directly from the format, as the sketch below shows.
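Q8_0 packs 32 int8 weights plus one f16 scale into a 34-byte block, i.e. 8.5 bits per weight. A back-of-the-envelope size check (the ~7.6B parameter count is an assumption based on the dimensions listed above):

```python
# Rough size estimate for a Q8_0 GGUF: 34 bytes per 32-weight block.
# The 7.6e9 parameter count is an assumption used for illustration.
n_params = 7.6e9
bytes_per_weight = 34 / 32          # 32 int8 weights + 1 f16 scale
est_bytes = n_params * bytes_per_weight
print(f"estimated Q8_0 size: {est_bytes / 1024**3:.1f} GiB")
# ~7.5 GiB, in line with the 8,098,525,216-byte file in this commit
```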
## Usage

### llama.cpp CLI

```bash
# Basic usage
./llama.cpp/build/bin/llama-cli \
    -m gguf_output/dream-coder-7b-q8_0.gguf \
    -p "def quicksort(arr):" \
    -n 512 \
    -c 2048

# Advanced options
./llama.cpp/build/bin/llama-cli \
    -m gguf_output/dream-coder-7b-q8_0.gguf \
    -p "Write a binary search function" \
    -n 256 \
    -c 2048 \
    --temp 0.1 \
    --top-p 0.95 \
    --repeat-penalty 1.1 \
    -t 8
```

### Python (llama-cpp-python)

```bash
pip install llama-cpp-python
```

```python
from llama_cpp import Llama

# Load the model
llm = Llama(
    model_path="gguf_output/dream-coder-7b-q8_0.gguf",
    n_ctx=2048,
    n_threads=8,
    n_gpu_layers=0  # CPU inference; set >0 to offload layers to the GPU
)

# Generate code
output = llm(
    "def fibonacci(n):",
    max_tokens=512,
    temperature=0.1,
    top_p=0.95,
    repeat_penalty=1.1
)

print(output['choices'][0]['text'])
```

### GPU Acceleration

If llama.cpp was built with CUDA support:

```bash
# Build with CUDA
cd llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build -j

# Offload part of the model to the GPU
./build/bin/llama-cli \
    -m gguf_output/dream-coder-7b-q8_0.gguf \
    -p "def quicksort(arr):" \
    -n 512 \
    -ngl 20  # number of layers to offload
```
## Troubleshooting

### Common Issues

1. **Conversion fails**:
   - Make sure llama.cpp is built correctly
   - Check the Python dependency versions
   - Verify the model files are intact
2. **Quantization fails**:
   - Check free disk space (~20 GB of temporary space is needed)
   - Make sure there is enough RAM (32 GB+ recommended)
3. **Inference errors**:
   - Verify the GGUF file integrity
   - Check the context-length setting
   - Try lowering `n_gpu_layers`

A preflight check for the resource requirements above is sketched below.
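A minimal preflight sketch using `psutil` (already a dependency in `pyproject.toml`); the 20 GB and 32 GB thresholds are the ones stated in this section:

```python
# Sketch: check free disk space and RAM before starting the conversion.
# Thresholds follow this guide: ~20 GB temp disk, 32 GB+ RAM recommended.
import shutil
import psutil

free_disk_gb = shutil.disk_usage(".").free / 1024**3
total_ram_gb = psutil.virtual_memory().total / 1024**3

if free_disk_gb < 20:
    print(f"warning: only {free_disk_gb:.1f} GB free disk (need ~20 GB)")
if total_ram_gb < 32:
    print(f"warning: only {total_ram_gb:.1f} GB RAM (32 GB+ recommended)")
```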
### Verifying the Model

```bash
# File integrity check
ls -lh gguf_output/dream-coder-7b-q8_0.gguf

# Quick inference test
./llama.cpp/build/bin/llama-cli -m gguf_output/dream-coder-7b-q8_0.gguf -p "def hello():" -n 20
```

To check integrity against the Git LFS pointer, see the sketch below.
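Because the repository stores the GGUF files via Git LFS, each pointer file records an `oid sha256:` checksum (see the pointer entries later in this commit). A sketch that verifies a download against it:

```python
# Sketch: verify the downloaded GGUF against the sha256 recorded in
# its Git LFS pointer (the `oid sha256:<hex>` line in this commit).
import hashlib

EXPECTED = "d4628b3c1e531f063c6e15ad4a64ceb3d9c082a7f5a4820ef19975ec4e4d79c7"

h = hashlib.sha256()
with open("gguf_output/dream-coder-7b-q8_0.gguf", "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):  # read in 1 MiB chunks
        h.update(chunk)

print("OK" if h.hexdigest() == EXPECTED else "checksum mismatch")
```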
## Performance Tuning

### CPU
- Set the thread count with `-t`
- Enable AVX2/AVX512 at build time
- Tune the batch size (`-b`)

### GPU
- Build with CUDA/OpenCL
- Tune the number of offloaded layers (`-ngl`)
- Monitor GPU memory usage

### Memory
- Memory mapping is on by default; pass `--no-mmap` to disable it
- Use `--mlock` to pin the model in RAM
- Choose an appropriate context length

A sketch for picking a sensible `-t` value follows.
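A small sketch, again assuming `psutil`, that derives a starting `-t` value from the physical core count; this is a common heuristic for llama.cpp, not a guarantee, so tune from there:

```python
# Sketch: pick a thread count for llama-cli from the physical core count.
# Hyperthreads rarely help compute-bound matmuls, so physical cores are a
# reasonable default; benchmark and adjust for your machine.
import psutil

threads = psutil.cpu_count(logical=False) or psutil.cpu_count() or 4
print(f"suggested flag: -t {threads}")
```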
## Notes

1. **Diffusion model**: Dream-Coder generates via diffusion, unlike conventional autoregressive models
2. **Special tokens**: make sure `mask_token_id` (151666) is handled correctly
3. **Context length**: up to 32K tokens is supported, but 2K-4K is recommended for best performance
4. **Generation parameters**: a low temperature (0.1-0.3) and top_p around 0.9-0.95 are recommended

The special IDs from note 2 can be spot-checked as sketched below.
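A sketch that decodes the special token IDs with the HF tokenizer; the local model path and the `trust_remote_code` flag are assumptions, adjust them to your setup:

```python
# Sketch: confirm the special token IDs listed above against the tokenizer.
# The model path is a placeholder; point it at your local checkout.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained(
    "/path/to/Dream-Coder-v0-Instruct-7B",
    trust_remote_code=True,  # assumption: Dream ships custom tokenizer code
)
for tid in (151665, 151643, 151666):  # bos, eos/pad, mask
    print(tid, tok.convert_ids_to_tokens(tid))
```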
## Support

If you run into problems, check:
1. llama.cpp version and build status
2. Python dependency compatibility
3. Model file integrity
4. System resources (RAM/disk)

Further reading:
- [llama.cpp GitHub](https://github.com/ggerganov/llama.cpp)
- [GGUF format specification](https://github.com/ggerganov/ggml/blob/master/docs/gguf.md)
gguf_output/dream-coder-7b-f16.gguf
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1cd0e2ecd60fce13c9a4f6831dfeaab924704c6bf36ec608bf5996cc7794cf06
size 15237853216
gguf_output/dream-coder-7b-q8_0.gguf
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d4628b3c1e531f063c6e15ad4a64ceb3d9c082a7f5a4820ef19975ec4e4d79c7
size 8098525216
pyproject.toml
ADDED
@@ -0,0 +1,16 @@
[project]
name = "dream-coder-v0-instruct-7b"
version = "0.1.0"
description = "Dream-Coder quantization tools"
readme = "README.md"
requires-python = ">=3.11"
dependencies = [
    "torch>=2.5.0",
    "transformers>=4.46.2",
    "safetensors>=0.4.0",
    "numpy>=1.24.0",
    "psutil>=5.9.0",
    "accelerate>=0.34.0",
    "mistral-common>=1.8.4",
    "sentencepiece>=0.2.1",
]
quantize_dream_q8_0.py
ADDED
@@ -0,0 +1,342 @@
#!/usr/bin/env python3
"""
GGUF Q8_0 quantization script for Dream-Coder v0-Instruct-7B.

Tailored to the Dream-Coder architecture and its diffusion-specific
configuration, wrapping llama.cpp's conversion tools.

Usage:
python quantize_dream_q8_0.py --model_path /path/to/Dream-Coder-v0-Instruct-7B --output_dir ./gguf_output

Dependencies:
- llama.cpp (cloned and built beforehand)
- transformers>=4.46.2
- torch
- safetensors
"""

import os
import sys
import json
import argparse
import subprocess
from pathlib import Path
from typing import Dict, Any

def check_llama_cpp_installation(llama_cpp_path: str) -> bool:
    """Check that llama.cpp is present and built."""
    required_files = [
        "convert_hf_to_gguf.py",    # conversion script
        "build/bin/llama-quantize"  # compiled quantization tool
    ]

    for file in required_files:
        file_path = Path(llama_cpp_path) / file
        if not file_path.exists():
            print(f"Missing file: {file_path}")
            return False

    return True

def prepare_dream_config(model_path: str) -> Dict[str, Any]:
    """
    Prepare Dream-Coder-specific configuration,
    handling the architecture mapping and special tokens.
    """
    config_path = Path(model_path) / "config.json"

    with open(config_path, 'r', encoding='utf-8') as f:
        config = json.load(f)

    # Dream-Coder-specific configuration mapping
    dream_config = {
        # Basic architecture info
        "model_type": "llama",                  # mapped to a type llama.cpp supports
        "architectures": ["LlamaForCausalLM"],  # compatibility mapping

        # Model parameters
        "vocab_size": config.get("vocab_size", 152064),
        "hidden_size": config.get("hidden_size", 3584),
        "intermediate_size": config.get("intermediate_size", 18944),
        "num_hidden_layers": config.get("num_hidden_layers", 28),
        "num_attention_heads": config.get("num_attention_heads", 28),
        "num_key_value_heads": config.get("num_key_value_heads", 4),
        "max_position_embeddings": config.get("max_position_embeddings", 32768),

        # Special settings
        "hidden_act": config.get("hidden_act", "silu"),
        "rms_norm_eps": config.get("rms_norm_eps", 1e-06),
        "rope_theta": config.get("rope_theta", 1000000.0),
        "rope_scaling": config.get("rope_scaling"),

        # Special token IDs
        "bos_token_id": config.get("bos_token_id", 151665),
        "eos_token_id": config.get("eos_token_id", 151643),
        "pad_token_id": config.get("pad_token_id", 151643),

        # Dream-Coder specific: the mask token (critical!)
        "mask_token_id": config.get("mask_token_id", 151666),

        # Other parameters
        "tie_word_embeddings": config.get("tie_word_embeddings", False),
        "torch_dtype": config.get("torch_dtype", "bfloat16"),
        "use_cache": config.get("use_cache", True),
        "attention_dropout": config.get("attention_dropout", 0.0),
        "initializer_range": config.get("initializer_range", 0.02),

        # Dream-Coder diffusion-related
        "max_window_layers": config.get("max_window_layers", 28),
        "sliding_window": config.get("sliding_window"),
        "use_sliding_window": config.get("use_sliding_window", False),
    }

    return dream_config

def create_compatible_config(model_path: str, temp_dir: str) -> str:
    """
    Write a llama.cpp-compatible config file.
    Helper for patching the HF config; not used by the default pipeline below.
    """
    dream_config = prepare_dream_config(model_path)

    # Create a temporary config file
    temp_config_path = Path(temp_dir) / "config.json"

    with open(temp_config_path, 'w', encoding='utf-8') as f:
        json.dump(dream_config, f, indent=2, ensure_ascii=False)

    return str(temp_config_path)

def convert_to_gguf_f16(model_path: str, llama_cpp_path: str, output_path: str) -> bool:
    """
    Step 1: convert the PyTorch model to GGUF F16.
    """
    print("Step 1: converting the PyTorch model to GGUF F16...")

    convert_script = Path(llama_cpp_path) / "convert_hf_to_gguf.py"

    cmd = [
        sys.executable,
        str(convert_script),
        model_path,
        "--outfile", output_path,
        "--outtype", "f16",
        "--verbose",  # print details
    ]

    try:
        result = subprocess.run(
            cmd,
            check=True,
            capture_output=True,
            text=True,
            cwd=llama_cpp_path
        )
        print("✓ F16 conversion succeeded")
        print(f"Output: {result.stdout}")
        return True
    except subprocess.CalledProcessError as e:
        print(f"✗ F16 conversion failed: {e}")
        print(f"Error output: {e.stderr}")
        return False

def quantize_to_q8_0(f16_path: str, llama_cpp_path: str, q8_0_path: str) -> bool:
    """
    Step 2: quantize the F16 model to Q8_0.
    """
    print("Step 2: quantizing to Q8_0...")

    quantize_tool = Path(llama_cpp_path) / "build/bin/llama-quantize"
    if os.name == 'nt':  # Windows
        quantize_tool = quantize_tool.with_suffix('.exe')

    cmd = [
        str(quantize_tool),
        f16_path,
        q8_0_path,
        "Q8_0"
    ]

    try:
        result = subprocess.run(
            cmd,
            check=True,
            capture_output=True,
            text=True,
            cwd=llama_cpp_path
        )
        print("✓ Q8_0 quantization succeeded")
        print(f"Output: {result.stdout}")
        return True
    except subprocess.CalledProcessError as e:
        print(f"✗ Q8_0 quantization failed: {e}")
        print(f"Error output: {e.stderr}")
        return False

def verify_gguf_model(gguf_path: str, llama_cpp_path: str) -> bool:
    """
    Verify the generated GGUF model.
    """
    print("Step 3: verifying the GGUF model...")

    # Check that the file exists
    if not Path(gguf_path).exists():
        print(f"✗ GGUF file not found: {gguf_path}")
        return False

    # Report the file size
    file_size = Path(gguf_path).stat().st_size / (1024**3)  # GB
    print(f"✓ GGUF file size: {file_size:.2f} GB")

    # Smoke-test with llama.cpp's llama-cli
    main_tool = Path(llama_cpp_path) / "build/bin/llama-cli"
    if os.name == 'nt':
        main_tool = main_tool.with_suffix('.exe')

    if main_tool.exists():
        cmd = [
            str(main_tool),
            "-m", gguf_path,
            "-p", "def quicksort(arr):",
            "-n", "10",
            "--temp", "0.1"
        ]

        try:
            result = subprocess.run(
                cmd,
                check=True,
                capture_output=True,
                text=True,
                timeout=30,
                cwd=llama_cpp_path
            )
            print("✓ Model verification succeeded")
            print("Sample output:")
            print(result.stdout[-200:])  # show the last 200 characters
            return True
        except (subprocess.CalledProcessError, subprocess.TimeoutExpired) as e:
            print(f"⚠ Verification failed, but the file may still be valid: {e}")
            return True  # a failed smoke test does not necessarily mean quantization failed
    else:
        print("⚠ llama-cli not found, skipping verification")
        return True

def main():
    parser = argparse.ArgumentParser(
        description="GGUF Q8_0 quantization tool for Dream-Coder v0-Instruct-7B"
    )
    parser.add_argument(
        "--model_path",
        type=str,
        default=".",
        help="Path to the Dream-Coder model (default: current directory)"
    )
    parser.add_argument(
        "--llama_cpp_path",
        type=str,
        required=True,
        help="Path to the llama.cpp checkout (required)"
    )
    parser.add_argument(
        "--output_dir",
        type=str,
        default="./gguf_output",
        help="Output directory (default: ./gguf_output)"
    )
    parser.add_argument(
        "--keep_f16",
        action="store_true",
        help="Keep the F16 intermediate file"
    )

    args = parser.parse_args()

    # Resolve paths
    model_path = Path(args.model_path).resolve()
    llama_cpp_path = Path(args.llama_cpp_path).resolve()
    output_dir = Path(args.output_dir).resolve()

    print("=" * 60)
    print("Dream-Coder v0-Instruct-7B GGUF Q8_0 quantization tool")
    print("=" * 60)
    print(f"Model path: {model_path}")
    print(f"llama.cpp path: {llama_cpp_path}")
    print(f"Output directory: {output_dir}")
    print()

    # Validate inputs
    if not model_path.exists():
        print(f"✗ Model path does not exist: {model_path}")
        return 1

    if not (model_path / "config.json").exists():
        print(f"✗ Model config not found: {model_path}/config.json")
        return 1

    if not check_llama_cpp_installation(llama_cpp_path):
        print(f"✗ llama.cpp is missing or not built: {llama_cpp_path}")
        print("Please run:")
        print(f"  cd {llama_cpp_path}")
        print("  cmake -B build && cmake --build build -j")
        return 1

    # Create the output directory
    output_dir.mkdir(parents=True, exist_ok=True)

    # Output file paths
    f16_path = output_dir / "dream-coder-7b-f16.gguf"
    q8_0_path = output_dir / "dream-coder-7b-q8_0.gguf"

    # Run the conversion pipeline
    success = True

    # Step 1: convert to F16
    if not convert_to_gguf_f16(str(model_path), str(llama_cpp_path), str(f16_path)):
        success = False

    # Step 2: quantize to Q8_0
    if success and not quantize_to_q8_0(str(f16_path), str(llama_cpp_path), str(q8_0_path)):
        success = False

    # Step 3: verify the model
    if success and not verify_gguf_model(str(q8_0_path), str(llama_cpp_path)):
        success = False

    # Clean up the intermediate file
    if success and not args.keep_f16 and f16_path.exists():
        f16_path.unlink()
        print("✓ Removed the F16 intermediate file")

    # Report results
    print()
    print("=" * 60)
    if success:
        print("✓ Quantization finished!")
        print(f"Output file: {q8_0_path}")

        # File info
        if q8_0_path.exists():
            size_gb = q8_0_path.stat().st_size / (1024**3)
            print(f"File size: {size_gb:.2f} GB")
            print(f"Expected memory footprint: ~{size_gb:.1f} GB")

        print()
        print("Usage:")
        print("  # with llama.cpp")
        print(f"  {llama_cpp_path}/build/bin/llama-cli -m {q8_0_path} -p 'def quicksort(arr):' -n 512")
        print()
        print("  # with llama-cpp-python")
        print("  from llama_cpp import Llama")
        print(f"  llm = Llama(model_path='{q8_0_path}', n_ctx=2048)")
        print("  output = llm('def quicksort(arr):', max_tokens=512)")

    else:
        print("✗ Quantization failed")
        return 1

    print("=" * 60)
    return 0

if __name__ == "__main__":
    sys.exit(main())
uv.lock
ADDED
The diff for this file is too large to render. See raw diff.