Add baseline perplexity logs/data from BF16 & Q8_0
Files changed:
- README.md +5 -4
- logs/perplexity-GLM-4.7-BF16.log +208 -0
- logs/perplexity-GLM-4.7-Q8_0.log +204 -0
README.md
CHANGED

@@ -20,7 +20,8 @@ Currently cooking this now!
  - [x] download bf16 safetensors https://huggingface.co/zai-org/GLM-4.7
  - [x] use llama.cpp/convert_hf_to_gguf.py to create bf16 GGUF
  - [x] calculate imatrix and upload to HF first so others can use as desired
- - [
+ - [x] cook Q8_0 and test perplexity of BF16 and Q8_0 for baseline data
+ - [ ] look into making MTP nextn tensors full q8_0 (won't affect RAM+VRAM usage otherwise)
  - [ ] cook IQ5_K with full q8_0 attn/shexp/first 3 dense layers and test
  - [ ] upload IQ5_K if all looking good
  - [ ] continue with smaller quants

@@ -47,9 +48,9 @@ Perplexity computed against *wiki.test.raw*.

  These first two are just test quants for baseline perplexity comparison:
  * `BF16` 667.598 GiB (16.003 BPW)
- - Final estimate: PPL =
- * `Q8_0`
- - Final estimate: PPL =
+ - Final estimate: PPL over 565 chunks for n_ctx=512 = 3.9267 +/- 0.02423
+ * `Q8_0` 354.794 GiB (8.505 BPW)
+ - Final estimate: PPL over 565 chunks for n_ctx=512 = 3.9320 +/- 0.02428

  ## IQ5_K TODO

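Taken together, the two baselines land very close to each other. A minimal sketch (plain shell, with the two values copied from the Final estimate lines above) of the absolute and relative PPL difference of Q8_0 versus BF16:

```bash
#!/usr/bin/env bash
# Compare the two baseline perplexities quoted above
# (wiki.test.raw, 565 chunks, n_ctx=512); values copied from the logs.
bf16=3.9267
q8_0=3.9320

awk -v a="$bf16" -v b="$q8_0" 'BEGIN {
    printf "Q8_0 - BF16 = %+.4f PPL\n", b - a
    printf "relative    = %+.3f %%\n", 100.0 * (b - a) / a
}'
```

That works out to roughly +0.0053 PPL (about +0.13 %), well inside the ±0.024 reported on each estimate, which supports treating Q8_0 as a near-lossless reference point for the smaller quants.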
logs/perplexity-GLM-4.7-BF16.log
ADDED
@@ -0,0 +1,208 @@
model=/mnt/data/models/ubergarm/GLM-4.7-GGUF/GLM-160x21B-4.7-BF16-00001-of-00015.gguf

numactl -N "$SOCKET" -m "$SOCKET" \
./build/bin/llama-perplexity \
-m "$model" \
-f wiki.test.raw \
--seed 1337 \
--ctx-size 512 \
-ub 4096 -b 4096 \
--numa numactl \
--threads 96 \
--threads-batch 128 \
--validate-quants \
--no-mmap

SOCKET is set to: 0
main: build = 4073 (55626050)
main: built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu
main: seed = 1337
CPU: using device CPU - 0 MiB free
llama_model_loader: additional 14 GGUFs metadata loaded.
llama_model_loader: loaded meta data with 49 key-value pairs and 1761 tensors from /mnt/data/models/ubergarm/GLM-4.7-GGUF/GLM-160x21B-4.7-BF16-00001-of-00015.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = glm4moe
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.sampling.temp f32 = 1.000000
llama_model_loader: - kv 3: general.name str = GLM 4.7
llama_model_loader: - kv 4: general.version str = 4.7
llama_model_loader: - kv 5: general.basename str = GLM
llama_model_loader: - kv 6: general.size_label str = 160x21B
llama_model_loader: - kv 7: general.license str = mit
llama_model_loader: - kv 8: general.tags arr[str,1] = ["text-generation"]
llama_model_loader: - kv 9: general.languages arr[str,2] = ["en", "zh"]
llama_model_loader: - kv 10: glm4moe.block_count u32 = 93
llama_model_loader: - kv 11: glm4moe.context_length u32 = 202752
llama_model_loader: - kv 12: glm4moe.embedding_length u32 = 5120
llama_model_loader: - kv 13: glm4moe.feed_forward_length u32 = 12288
llama_model_loader: - kv 14: glm4moe.attention.head_count u32 = 96
llama_model_loader: - kv 15: glm4moe.attention.head_count_kv u32 = 8
llama_model_loader: - kv 16: glm4moe.rope.freq_base f32 = 1000000.000000
llama_model_loader: - kv 17: glm4moe.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 18: glm4moe.expert_used_count u32 = 8
llama_model_loader: - kv 19: glm4moe.expert_group_count u32 = 1
llama_model_loader: - kv 20: glm4moe.expert_group_used_count u32 = 1
llama_model_loader: - kv 21: glm4moe.attention.key_length u32 = 128
llama_model_loader: - kv 22: glm4moe.attention.value_length u32 = 128
llama_model_loader: - kv 23: general.file_type u32 = 32
llama_model_loader: - kv 24: glm4moe.rope.dimension_count u32 = 64
llama_model_loader: - kv 25: glm4moe.expert_count u32 = 160
llama_model_loader: - kv 26: glm4moe.expert_feed_forward_length u32 = 1536
llama_model_loader: - kv 27: glm4moe.expert_shared_count u32 = 1
llama_model_loader: - kv 28: glm4moe.leading_dense_block_count u32 = 3
llama_model_loader: - kv 29: glm4moe.expert_gating_func u32 = 2
llama_model_loader: - kv 30: glm4moe.expert_weights_scale f32 = 2.500000
llama_model_loader: - kv 31: glm4moe.expert_weights_norm bool = true
llama_model_loader: - kv 32: glm4moe.nextn_predict_layers u32 = 1
llama_model_loader: - kv 33: general.quantization_version u32 = 2
llama_model_loader: - kv 34: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 35: tokenizer.ggml.pre str = glm4
llama_model_loader: - kv 36: tokenizer.ggml.tokens arr[str,151552] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 37: tokenizer.ggml.token_type arr[i32,151552] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 38: tokenizer.ggml.merges arr[str,318088] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv 39: tokenizer.ggml.eos_token_id u32 = 151329
llama_model_loader: - kv 40: tokenizer.ggml.padding_token_id u32 = 151329
llama_model_loader: - kv 41: tokenizer.ggml.bos_token_id u32 = 151331
llama_model_loader: - kv 42: tokenizer.ggml.eot_token_id u32 = 151336
llama_model_loader: - kv 43: tokenizer.ggml.unknown_token_id u32 = 151329
llama_model_loader: - kv 44: tokenizer.ggml.eom_token_id u32 = 151338
llama_model_loader: - kv 45: tokenizer.chat_template str = [gMASK]<sop>\n{%- if tools -%}\n<|syste...
llama_model_loader: - kv 46: split.no u16 = 0
llama_model_loader: - kv 47: split.count u16 = 15
llama_model_loader: - kv 48: split.tensors.count i32 = 1761
llama_model_loader: - type f32: 835 tensors
llama_model_loader: - type bf16: 926 tensors
load: special_eot_id is not in special_eog_ids - the tokenizer config may be incorrect
load: special_eom_id is not in special_eog_ids - the tokenizer config may be incorrect
load: printing all EOG tokens:
load: - 151329 ('<|endoftext|>')
load: - 151336 ('<|user|>')
load: - 151338 ('<|observation|>')
load: special tokens cache size = 36
load: token to piece cache size = 0.9713 MB
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = glm4moe
llm_load_print_meta: n_ctx_train = 202752
llm_load_print_meta: n_embd = 5120
llm_load_print_meta: n_layer = 93
llm_load_print_meta: n_head = 96
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_rot = 64
llm_load_print_meta: n_swa = 0
llm_load_print_meta: n_swa_pattern = 1
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 12
llm_load_print_meta: n_embd_k_gqa = 1024
llm_load_print_meta: n_embd_v_gqa = 1024
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 12288
llm_load_print_meta: n_expert = 160
llm_load_print_meta: n_expert_used = 8
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 2
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 1000000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn = 202752
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = 355B.A32B
llm_load_print_meta: model ftype = BF16
llm_load_print_meta: model params = 358.338 B
llm_load_print_meta: model size = 667.598 GiB (16.003 BPW)
llm_load_print_meta: repeating layers = 664.707 GiB (16.003 BPW, 356.786 B parameters)
llm_load_print_meta: general.name = GLM 4.7
print_info: vocab type = BPE
print_info: n_vocab = 151552
print_info: n_merges = 318088
print_info: BOS token = 151331 '[gMASK]'
print_info: EOS token = 151329 '<|endoftext|>'
print_info: EOT token = 151336 '<|user|>'
print_info: EOM token = 151338 '<|observation|>'
print_info: UNK token = 151329 '<|endoftext|>'
print_info: PAD token = 151329 '<|endoftext|>'
print_info: LF token = 198 'Ċ'
print_info: FIM PRE token = 151347 '<|code_prefix|>'
print_info: FIM SUF token = 151349 '<|code_suffix|>'
print_info: FIM MID token = 151348 '<|code_middle|>'
print_info: EOG token = 151329 '<|endoftext|>'
print_info: EOG token = 151336 '<|user|>'
print_info: EOG token = 151338 '<|observation|>'
print_info: max token length = 1024
llm_load_tensors: ggml ctx size = 0.72 MiB
model has unused tensor blk.92.attn_norm.weight (size = 20480 bytes) -- ignoring
model has unused tensor blk.92.attn_q.weight (size = 125829120 bytes) -- ignoring
model has unused tensor blk.92.attn_k.weight (size = 10485760 bytes) -- ignoring
model has unused tensor blk.92.attn_v.weight (size = 10485760 bytes) -- ignoring
model has unused tensor blk.92.attn_q.bias (size = 49152 bytes) -- ignoring
model has unused tensor blk.92.attn_k.bias (size = 4096 bytes) -- ignoring
model has unused tensor blk.92.attn_v.bias (size = 4096 bytes) -- ignoring
model has unused tensor blk.92.attn_output.weight (size = 125829120 bytes) -- ignoring
model has unused tensor blk.92.attn_q_norm.weight (size = 512 bytes) -- ignoring
model has unused tensor blk.92.attn_k_norm.weight (size = 512 bytes) -- ignoring
model has unused tensor blk.92.post_attention_norm.weight (size = 20480 bytes) -- ignoring
model has unused tensor blk.92.ffn_gate_inp.weight (size = 3276800 bytes) -- ignoring
model has unused tensor blk.92.exp_probs_b.bias (size = 640 bytes) -- ignoring
model has unused tensor blk.92.ffn_gate_exps.weight (size = 2516582400 bytes) -- ignoring
model has unused tensor blk.92.ffn_down_exps.weight (size = 2516582400 bytes) -- ignoring
model has unused tensor blk.92.ffn_up_exps.weight (size = 2516582400 bytes) -- ignoring
model has unused tensor blk.92.ffn_gate_shexp.weight (size = 15728640 bytes) -- ignoring
model has unused tensor blk.92.ffn_down_shexp.weight (size = 15728640 bytes) -- ignoring
model has unused tensor blk.92.ffn_up_shexp.weight (size = 15728640 bytes) -- ignoring
model has unused tensor blk.92.nextn.eh_proj.weight (size = 104857600 bytes) -- ignoring
model has unused tensor blk.92.nextn.embed_tokens.weight (size = 1551892480 bytes) -- ignoring
model has unused tensor blk.92.nextn.enorm.weight (size = 20480 bytes) -- ignoring
model has unused tensor blk.92.nextn.hnorm.weight (size = 20480 bytes) -- ignoring
model has unused tensor blk.92.nextn.shared_head_head.weight (size = 1551892480 bytes) -- ignoring
model has unused tensor blk.92.nextn.shared_head_norm.weight (size = 20480 bytes) -- ignoring
llm_load_tensors: offloading 0 repeating layers to GPU
llm_load_tensors: offloaded 0/94 layers to GPU
llm_load_tensors: CPU buffer size = 673051.91 MiB
....................................................................................................
llama_new_context_with_model: n_ctx = 4096
llama_new_context_with_model: n_batch = 4096
llama_new_context_with_model: n_ubatch = 4096
llama_new_context_with_model: flash_attn = 1
llama_new_context_with_model: attn_max_b = 0
llama_new_context_with_model: fused_moe = 1
llama_new_context_with_model: grouped er = 0
llama_new_context_with_model: fused_up_gate = 1
llama_new_context_with_model: fused_mmad = 1
llama_new_context_with_model: rope_cache = 0
llama_new_context_with_model: graph_reuse = 0
llama_new_context_with_model: k_cache_hadam = 0
llama_new_context_with_model: split_mode_graph_scheduling = 0
llama_new_context_with_model: ser = -1, 0
llama_new_context_with_model: freq_base = 1000000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CPU KV buffer size = 1472.00 MiB
llama_new_context_with_model: KV self size = 1472.00 MiB, K (f16): 736.00 MiB, V (f16): 736.00 MiB
llama_new_context_with_model: CPU output buffer size = 4.63 MiB
llama_new_context_with_model: CPU compute buffer size = 2448.00 MiB
llama_new_context_with_model: graph nodes = 4278
llama_new_context_with_model: graph splits = 1
XXXXXXXXXXXXXXXXXXXXX Setting only active experts offload

system_info: n_threads = 96 (n_threads_batch = 128) / 512 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
perplexity: tokenizing the input ..
perplexity: tokenization took 369.524 ms
perplexity: calculating perplexity over 565 chunks, n_ctx=512, batch_size=4096, n_seq=8
perplexity: 35.41 seconds per pass - ETA 41.67 minutes
======================================= HAVE_FANCY_SIMD is defined
[1]2.8926,[2]3.5859,[3]2.7995,[4]2.5299,[5]2.5820,[6]2.7924,[7]2.8602,[8]2.8440,[9]2.9531,[10]2.8590,[11]2.8730,[12]3.0487,[13]3.0794,[14]3.0892,[15]3.2528,[16]3.3286,[17]3.4781,[18]3.6998,[19]3.6556,[20]3.7121,[21]3.7950,[22]3.7604,[23]3.6688,[24]3.5707,[25]3.4949,[26]3.4440,[27]3.4100,[28]3.4292,[29]3.4712,[30]3.5417,[31]3.6083,[32]3.6691,[33]3.7315,[34]3.7650,[35]3.8319,[36]3.8767,[37]3.8820,[38]3.9504,[39]3.9800,[40]4.0195,[41]4.0972,[42]4.1190,[43]4.1284,[44]4.1543,[45]4.2516,[46]4.3116,[47]4.2478,[48]4.1584,[49]4.0910,[50]4.0475,[51]4.0689,[52]4.0855,[53]4.1178,[54]4.1084,[55]4.1223,[56]4.1429,[57]4.1026,[58]4.1010,[59]4.0923,[60]4.1306,[61]4.1677,[62]4.2150,[63]4.2443,[64]4.2543,[65]4.2501,[66]4.2200,[67]4.1734,[68]4.1355,[69]4.1541,[70]4.1574,[71]4.1546,[72]4.1518,[73]4.1598,[74]4.1934,[75]4.1994,[76]4.1521,[77]4.1263,[78]4.1095,[79]4.0587,[80]4.0150,[81]4.0283,[82]4.0150,[83]4.0132,[84]4.0227,[85]3.9949,[86]3.9912,[87]3.9738,[88]3.9697,[89]3.9520,[90]3.9212,[91]3.8889,[92]3.8951,[93]3.9118,[94]3.8968,[95]3.8973,[96]3.9268,[97]3.9782,[98]3.9853,[99]3.9751,[100]3.9550,[101]3.9726,[102]3.9796,[103]4.0039,[104]3.9914,[105]4.0071,[106]4.0377,[107]4.1099,[108]4.1147,[109]4.1234,[110]4.1661,[111]4.1921,[112]4.1598,[113]4.1268,[114]4.1000,[115]4.0745,[116]4.0617,[117]4.0441,[118]4.0464,[119]4.0372,[120]4.0213,[121]4.0136,[122]3.9937,[123]3.9645,[124]3.9409,[125]3.9260,[126]3.9032,[127]3.8961,[128]3.8874,[129]3.8858,[130]3.8736,[131]3.8596,[132]3.8444,[133]3.8363,[134]3.8445,[135]3.8635,[136]3.8547,[137]3.8548,[138]3.8440,[139]3.8302,[140]3.8446,[141]3.8409,[142]3.8392,[143]3.8296,[144]3.8255,[145]3.8188,[146]3.8146,[147]3.8108,[148]3.8116,[149]3.8098,[150]3.8094,[151]3.7967,[152]3.7892,[153]3.7905,[154]3.7836,[155]3.7803,[156]3.7782,[157]3.7770,[158]3.7761,[159]3.7935,[160]3.8045,[161]3.8100,[162]3.8173,[163]3.8091,[164]3.8198,[165]3.8265,[166]3.8521,[167]3.8757,[168]3.8860,[169]3.9163,[170]3.9370,[171]3.9477,[172]3.9762,[173]3.9626,[174]3.9470,[175]3.9237,[176]3.9019,[177]3.8868,[178]3.8703,[179]3.8488,[180]3.8429,[181]3.8378,[182]3.8535,[183]3.8738,[184]3.9045,[185]3.9231,[186]3.9293,[187]3.9519,[188]3.9836,[189]4.0054,[190]4.0192,[191]4.0387,[192]4.0456,[193]4.0544,[194]4.0543,[195]4.0486,[196]4.0450,[197]4.0581,[198]4.0750,[199]4.0670,[200]4.0727,[201]4.0722,[202]4.0716,[203]4.0654,[204]4.0744,[205]4.0801,[206]4.0843,[207]4.0873,[208]4.0926,[209]4.0917,[210]4.0891,[211]4.0932,[212]4.0882,[213]4.0845,[214]4.0857,[215]4.0868,[216]4.0884,[217]4.0867,[218]4.0942,[219]4.0868,[220]4.0828,[221]4.0791,[222]4.0770,[223]4.0765,[224]4.0779,[225]4.0765,[226]4.0812,[227]4.0746,[228]4.0713,[229]4.0567,[230]4.0453,[231]4.0378,[232]4.0381,[233]4.0360,[234]4.0335,[235]4.0250,[236]4.0318,[237]4.0302,[238]4.0364,[239]4.0459,[240]4.0590,[241]4.0687,[242]4.0778,[243]4.0899,[244]4.1006,[245]4.1147,[246]4.1263,[247]4.1401,[248]4.1464,[249]4.1493,[250]4.1472,[251]4.1316,[252]4.1209,[253]4.1191,[254]4.1189,[255]4.1198,[256]4.1254,[257]4.1256,[258]4.1253,[259]4.1267,[260]4.1302,[261]4.1271,[262]4.1285,[263]4.1278,[264]4.1275,[265]4.1276,[266]4.1275,[267]4.1251,[268]4.1233,[269]4.1204,[270]4.1261,[271]4.1258,[272]4.1201,[273]4.1190,[274]4.1083,[275]4.1042,[276]4.0905,[277]4.0848,[278]4.0805,[279]4.0820,[280]4.0881,[281]4.0898,[282]4.0962,[283]4.1031,[284]4.1058,[285]4.1108,[286]4.1218,[287]4.1370,[288]4.1342,[289]4.1329,[290]4.1335,[291]4.1335,[292]4.1274,[293]4.1134,[294]4.1103,[295]4.1105,[296]4.1009,[297]4.0888,[298]4.0808,[299]4.0697,[300]4.0591,[301]4.0560,[302]4.0441,[303]4.0357,[304]4.0237,[305]4.0140,[30
6]4.0099,[307]4.0140,[308]4.0193,[309]4.0326,[310]4.0195,[311]4.0172,[312]4.0067,[313]3.9996,[314]3.9946,[315]3.9920,[316]3.9833,[317]3.9755,[318]3.9677,[319]3.9595,[320]3.9536,[321]3.9482,[322]3.9438,[323]3.9335,[324]3.9261,[325]3.9215,[326]3.9150,[327]3.9152,[328]3.9144,[329]3.9135,[330]3.9103,[331]3.9061,[332]3.9120,[333]3.9152,[334]3.9183,[335]3.9194,[336]3.9192,[337]3.9205,[338]3.9191,[339]3.9186,[340]3.9205,[341]3.9221,[342]3.9250,[343]3.9330,[344]3.9396,[345]3.9520,[346]3.9518,[347]3.9450,[348]3.9426,[349]3.9442,[350]3.9370,[351]3.9252,[352]3.9174,[353]3.9150,[354]3.9170,[355]3.9247,[356]3.9376,[357]3.9408,[358]3.9448,[359]3.9542,[360]3.9664,[361]3.9682,[362]3.9736,[363]3.9792,[364]3.9850,[365]3.9872,[366]3.9916,[367]3.9953,[368]4.0015,[369]4.0083,[370]4.0144,[371]4.0172,[372]4.0259,[373]4.0398,[374]4.0499,[375]4.0548,[376]4.0584,[377]4.0630,[378]4.0761,[379]4.0884,[380]4.0907,[381]4.0854,[382]4.0842,[383]4.0849,[384]4.0920,[385]4.0956,[386]4.0997,[387]4.1017,[388]4.1049,[389]4.1116,[390]4.1125,[391]4.1031,[392]4.0949,[393]4.0861,[394]4.0818,[395]4.0761,[396]4.0697,[397]4.0610,[398]4.0538,[399]4.0491,[400]4.0380,[401]4.0337,[402]4.0337,[403]4.0253,[404]4.0164,[405]4.0131,[406]4.0054,[407]3.9969,[408]3.9874,[409]3.9806,[410]3.9732,[411]3.9717,[412]3.9702,[413]3.9710,[414]3.9645,[415]3.9647,[416]3.9618,[417]3.9547,[418]3.9455,[419]3.9515,[420]3.9461,[421]3.9482,[422]3.9493,[423]3.9420,[424]3.9412,[425]3.9408,[426]3.9413,[427]3.9390,[428]3.9395,[429]3.9347,[430]3.9341,[431]3.9344,[432]3.9283,[433]3.9225,[434]3.9149,[435]3.9135,[436]3.9066,[437]3.9002,[438]3.8942,[439]3.8923,[440]3.8930,[441]3.8913,[442]3.8897,[443]3.8962,[444]3.9068,[445]3.9028,[446]3.9000,[447]3.8982,[448]3.8965,[449]3.9023,[450]3.9016,[451]3.8999,[452]3.9030,[453]3.9107,[454]3.9138,[455]3.9145,[456]3.9183,[457]3.9182,[458]3.9207,[459]3.9211,[460]3.9267,[461]3.9319,[462]3.9347,[463]3.9349,[464]3.9312,[465]3.9296,[466]3.9381,[467]3.9380,[468]3.9369,[469]3.9433,[470]3.9453,[471]3.9497,[472]3.9552,[473]3.9562,[474]3.9549,[475]3.9572,[476]3.9594,[477]3.9623,[478]3.9615,[479]3.9622,[480]3.9628,[481]3.9651,[482]3.9661,[483]3.9713,[484]3.9682,[485]3.9712,[486]3.9697,[487]3.9751,[488]3.9814,[489]3.9876,[490]3.9882,[491]3.9925,[492]3.9963,[493]3.9991,[494]4.0051,[495]4.0107,[496]4.0100,[497]4.0089,[498]4.0092,[499]4.0104,[500]4.0123,[501]4.0120,[502]4.0116,[503]4.0162,[504]4.0218,[505]4.0215,[506]4.0210,[507]4.0234,[508]4.0283,[509]4.0367,[510]4.0391,[511]4.0437,[512]4.0377,[513]4.0354,[514]4.0312,[515]4.0323,[516]4.0297,[517]4.0277,[518]4.0264,[519]4.0220,[520]4.0215,[521]4.0209,[522]4.0164,[523]4.0151,[524]4.0173,[525]4.0163,[526]4.0140,[527]4.0163,[528]4.0114,[529]4.0060,[530]4.0015,[531]3.9968,[532]3.9967,[533]3.9942,[534]3.9916,[535]3.9871,[536]3.9818,[537]3.9746,[538]3.9725,[539]3.9639,[540]3.9634,[541]3.9671,[542]3.9654,[543]3.9604,[544]3.9588,[545]3.9594,[546]3.9594,[547]3.9612,[548]3.9594,[549]3.9536,[550]3.9484,[551]3.9439,[552]3.9376,[553]3.9338,[554]3.9303,[555]3.9238,[556]3.9185,[557]3.9141,[558]3.9171,[559]3.9154,[560]3.9133,[561]3.9149,[562]3.9191,[563]3.9244,[564]3.9281,[565]3.9267,
llama_print_timings: load time = 155689.14 ms
llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
llama_print_timings: prompt eval time = 2116254.98 ms / 289280 tokens ( 7.32 ms per token, 136.69 tokens per second)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
llama_print_timings: total time = 2127881.74 ms / 289281 tokens

Final estimate: PPL over 565 chunks for n_ctx=512 = 3.9267 +/- 0.02423
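Both logs begin with the same llama-perplexity invocation; only the model path and the socket differ. For readability, here is a commented restatement of that command as a small wrapper script. The SOCKET and model assignments and the echo line are assumptions about the surrounding script (the logs only show the command and the "SOCKET is set to:" line); every flag is copied from the logs as-is.

```bash
#!/usr/bin/env bash
# Sketch of the wrapper implied by the logs above (assumed structure, flags copied verbatim).
SOCKET=0   # NUMA node to pin the run to; the Q8_0 run used socket 1
model=/mnt/data/models/ubergarm/GLM-4.7-GGUF/GLM-160x21B-4.7-BF16-00001-of-00015.gguf

echo "SOCKET is set to: $SOCKET"

# numactl binds both CPU and memory allocation to the chosen socket.
#   -f wiki.test.raw    text corpus scored in 512-token chunks (565 chunks total here)
#   --ctx-size 512      the usual short-context PPL setting
#   -ub 4096 -b 4096    large micro/logical batches for prompt-processing throughput
#   --numa numactl      tells llama.cpp that NUMA placement is handled externally
#   --validate-quants   checks tensor data while loading
#   --no-mmap           loads the model fully into RAM instead of memory-mapping it
numactl -N "$SOCKET" -m "$SOCKET" \
    ./build/bin/llama-perplexity \
    -m "$model" \
    -f wiki.test.raw \
    --seed 1337 \
    --ctx-size 512 \
    -ub 4096 -b 4096 \
    --numa numactl \
    --threads 96 \
    --threads-batch 128 \
    --validate-quants \
    --no-mmap
```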
logs/perplexity-GLM-4.7-Q8_0.log
ADDED
@@ -0,0 +1,204 @@
model=/mnt/data/models/ubergarm/GLM-4.7-GGUF/GLM-4.7-Q8_0.gguf

numactl -N "$SOCKET" -m "$SOCKET" \
./build/bin/llama-perplexity \
-m "$model" \
-f wiki.test.raw \
--seed 1337 \
--ctx-size 512 \
-ub 4096 -b 4096 \
--numa numactl \
--threads 96 \
--threads-batch 128 \
--validate-quants \
--no-mmap

SOCKET is set to: 1
main: build = 4073 (55626050)
main: built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu
main: seed = 1337
CPU: using device CPU - 0 MiB free
llama_model_loader: loaded meta data with 46 key-value pairs and 1761 tensors from /mnt/data/models/ubergarm/GLM-4.7-GGUF/GLM-4.7-Q8_0.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = glm4moe
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.sampling.temp f32 = 1.000000
llama_model_loader: - kv 3: general.name str = GLM 4.7
llama_model_loader: - kv 4: general.version str = 4.7
llama_model_loader: - kv 5: general.basename str = GLM
llama_model_loader: - kv 6: general.size_label str = 160x21B
llama_model_loader: - kv 7: general.license str = mit
llama_model_loader: - kv 8: general.tags arr[str,1] = ["text-generation"]
llama_model_loader: - kv 9: general.languages arr[str,2] = ["en", "zh"]
llama_model_loader: - kv 10: glm4moe.block_count u32 = 93
llama_model_loader: - kv 11: glm4moe.context_length u32 = 202752
llama_model_loader: - kv 12: glm4moe.embedding_length u32 = 5120
llama_model_loader: - kv 13: glm4moe.feed_forward_length u32 = 12288
llama_model_loader: - kv 14: glm4moe.attention.head_count u32 = 96
llama_model_loader: - kv 15: glm4moe.attention.head_count_kv u32 = 8
llama_model_loader: - kv 16: glm4moe.rope.freq_base f32 = 1000000.000000
llama_model_loader: - kv 17: glm4moe.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 18: glm4moe.expert_used_count u32 = 8
llama_model_loader: - kv 19: glm4moe.expert_group_count u32 = 1
llama_model_loader: - kv 20: glm4moe.expert_group_used_count u32 = 1
llama_model_loader: - kv 21: glm4moe.attention.key_length u32 = 128
llama_model_loader: - kv 22: glm4moe.attention.value_length u32 = 128
llama_model_loader: - kv 23: general.file_type u32 = 7
llama_model_loader: - kv 24: glm4moe.rope.dimension_count u32 = 64
llama_model_loader: - kv 25: glm4moe.expert_count u32 = 160
llama_model_loader: - kv 26: glm4moe.expert_feed_forward_length u32 = 1536
llama_model_loader: - kv 27: glm4moe.expert_shared_count u32 = 1
llama_model_loader: - kv 28: glm4moe.leading_dense_block_count u32 = 3
llama_model_loader: - kv 29: glm4moe.expert_gating_func u32 = 2
llama_model_loader: - kv 30: glm4moe.expert_weights_scale f32 = 2.500000
llama_model_loader: - kv 31: glm4moe.expert_weights_norm bool = true
llama_model_loader: - kv 32: glm4moe.nextn_predict_layers u32 = 1
llama_model_loader: - kv 33: general.quantization_version u32 = 2
llama_model_loader: - kv 34: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 35: tokenizer.ggml.pre str = glm4
llama_model_loader: - kv 36: tokenizer.ggml.tokens arr[str,151552] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 37: tokenizer.ggml.token_type arr[i32,151552] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 38: tokenizer.ggml.merges arr[str,318088] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv 39: tokenizer.ggml.eos_token_id u32 = 151329
llama_model_loader: - kv 40: tokenizer.ggml.padding_token_id u32 = 151329
llama_model_loader: - kv 41: tokenizer.ggml.bos_token_id u32 = 151331
llama_model_loader: - kv 42: tokenizer.ggml.eot_token_id u32 = 151336
llama_model_loader: - kv 43: tokenizer.ggml.unknown_token_id u32 = 151329
llama_model_loader: - kv 44: tokenizer.ggml.eom_token_id u32 = 151338
llama_model_loader: - kv 45: tokenizer.chat_template str = [gMASK]<sop>\n{%- if tools -%}\n<|syste...
llama_model_loader: - type f32: 835 tensors
llama_model_loader: - type q8_0: 926 tensors
load: special_eot_id is not in special_eog_ids - the tokenizer config may be incorrect
load: special_eom_id is not in special_eog_ids - the tokenizer config may be incorrect
load: printing all EOG tokens:
load: - 151329 ('<|endoftext|>')
load: - 151336 ('<|user|>')
load: - 151338 ('<|observation|>')
load: special tokens cache size = 36
load: token to piece cache size = 0.9713 MB
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = glm4moe
llm_load_print_meta: n_ctx_train = 202752
llm_load_print_meta: n_embd = 5120
llm_load_print_meta: n_layer = 93
llm_load_print_meta: n_head = 96
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_rot = 64
llm_load_print_meta: n_swa = 0
llm_load_print_meta: n_swa_pattern = 1
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 12
llm_load_print_meta: n_embd_k_gqa = 1024
llm_load_print_meta: n_embd_v_gqa = 1024
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 12288
llm_load_print_meta: n_expert = 160
llm_load_print_meta: n_expert_used = 8
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 2
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 1000000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn = 202752
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = 355B.A32B
llm_load_print_meta: model ftype = Q8_0
llm_load_print_meta: model params = 358.338 B
llm_load_print_meta: model size = 354.794 GiB (8.505 BPW)
llm_load_print_meta: repeating layers = 353.259 GiB (8.505 BPW, 356.786 B parameters)
llm_load_print_meta: general.name = GLM 4.7
print_info: vocab type = BPE
print_info: n_vocab = 151552
print_info: n_merges = 318088
print_info: BOS token = 151331 '[gMASK]'
print_info: EOS token = 151329 '<|endoftext|>'
print_info: EOT token = 151336 '<|user|>'
print_info: EOM token = 151338 '<|observation|>'
print_info: UNK token = 151329 '<|endoftext|>'
print_info: PAD token = 151329 '<|endoftext|>'
print_info: LF token = 198 'Ċ'
print_info: FIM PRE token = 151347 '<|code_prefix|>'
print_info: FIM SUF token = 151349 '<|code_suffix|>'
print_info: FIM MID token = 151348 '<|code_middle|>'
print_info: EOG token = 151329 '<|endoftext|>'
print_info: EOG token = 151336 '<|user|>'
print_info: EOG token = 151338 '<|observation|>'
print_info: max token length = 1024
llm_load_tensors: ggml ctx size = 0.72 MiB
model has unused tensor blk.92.attn_norm.weight (size = 20480 bytes) -- ignoring
model has unused tensor blk.92.attn_q.weight (size = 66846720 bytes) -- ignoring
model has unused tensor blk.92.attn_k.weight (size = 5570560 bytes) -- ignoring
model has unused tensor blk.92.attn_v.weight (size = 5570560 bytes) -- ignoring
model has unused tensor blk.92.attn_q.bias (size = 49152 bytes) -- ignoring
model has unused tensor blk.92.attn_k.bias (size = 4096 bytes) -- ignoring
model has unused tensor blk.92.attn_v.bias (size = 4096 bytes) -- ignoring
model has unused tensor blk.92.attn_output.weight (size = 66846720 bytes) -- ignoring
model has unused tensor blk.92.attn_q_norm.weight (size = 512 bytes) -- ignoring
model has unused tensor blk.92.attn_k_norm.weight (size = 512 bytes) -- ignoring
model has unused tensor blk.92.post_attention_norm.weight (size = 20480 bytes) -- ignoring
model has unused tensor blk.92.ffn_gate_inp.weight (size = 3276800 bytes) -- ignoring
model has unused tensor blk.92.exp_probs_b.bias (size = 640 bytes) -- ignoring
model has unused tensor blk.92.ffn_gate_exps.weight (size = 1336934400 bytes) -- ignoring
model has unused tensor blk.92.ffn_down_exps.weight (size = 1336934400 bytes) -- ignoring
model has unused tensor blk.92.ffn_up_exps.weight (size = 1336934400 bytes) -- ignoring
model has unused tensor blk.92.ffn_gate_shexp.weight (size = 8355840 bytes) -- ignoring
model has unused tensor blk.92.ffn_down_shexp.weight (size = 8355840 bytes) -- ignoring
model has unused tensor blk.92.ffn_up_shexp.weight (size = 8355840 bytes) -- ignoring
model has unused tensor blk.92.nextn.eh_proj.weight (size = 55705600 bytes) -- ignoring
model has unused tensor blk.92.nextn.embed_tokens.weight (size = 824442880 bytes) -- ignoring
model has unused tensor blk.92.nextn.enorm.weight (size = 20480 bytes) -- ignoring
model has unused tensor blk.92.nextn.hnorm.weight (size = 20480 bytes) -- ignoring
model has unused tensor blk.92.nextn.shared_head_head.weight (size = 824442880 bytes) -- ignoring
model has unused tensor blk.92.nextn.shared_head_norm.weight (size = 20480 bytes) -- ignoring
llm_load_tensors: offloading 0 repeating layers to GPU
llm_load_tensors: offloaded 0/94 layers to GPU
llm_load_tensors: CPU buffer size = 357693.32 MiB
....................................................................................................
llama_new_context_with_model: n_ctx = 4096
llama_new_context_with_model: n_batch = 4096
llama_new_context_with_model: n_ubatch = 4096
llama_new_context_with_model: flash_attn = 1
llama_new_context_with_model: attn_max_b = 0
llama_new_context_with_model: fused_moe = 1
llama_new_context_with_model: grouped er = 0
llama_new_context_with_model: fused_up_gate = 1
llama_new_context_with_model: fused_mmad = 1
llama_new_context_with_model: rope_cache = 0
llama_new_context_with_model: graph_reuse = 0
llama_new_context_with_model: k_cache_hadam = 0
llama_new_context_with_model: split_mode_graph_scheduling = 0
llama_new_context_with_model: ser = -1, 0
llama_new_context_with_model: freq_base = 1000000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CPU KV buffer size = 1472.00 MiB
llama_new_context_with_model: KV self size = 1472.00 MiB, K (f16): 736.00 MiB, V (f16): 736.00 MiB
llama_new_context_with_model: CPU output buffer size = 4.63 MiB
llama_new_context_with_model: CPU compute buffer size = 2448.00 MiB
llama_new_context_with_model: graph nodes = 4094
llama_new_context_with_model: graph splits = 1
XXXXXXXXXXXXXXXXXXXXX Setting only active experts offload

system_info: n_threads = 96 (n_threads_batch = 128) / 512 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
perplexity: tokenizing the input ..
perplexity: tokenization took 360.715 ms
perplexity: calculating perplexity over 565 chunks, n_ctx=512, batch_size=4096, n_seq=8
perplexity: 25.21 seconds per pass - ETA 29.67 minutes
======================================= HAVE_FANCY_SIMD is defined
[1]2.8959,[2]3.6434,[3]2.8310,[4]2.5436,[5]2.5836,[6]2.7702,[7]2.8400,[8]2.8174,[9]2.9294,[10]2.8401,[11]2.8496,[12]3.0270,[13]3.0620,[14]3.0705,[15]3.2246,[16]3.3080,[17]3.4611,[18]3.6886,[19]3.6442,[20]3.6780,[21]3.7565,[22]3.7264,[23]3.6369,[24]3.5435,[25]3.4685,[26]3.4180,[27]3.3834,[28]3.4027,[29]3.4477,[30]3.5210,[31]3.5891,[32]3.6516,[33]3.7132,[34]3.7451,[35]3.8137,[36]3.8594,[37]3.8661,[38]3.9354,[39]3.9641,[40]4.0048,[41]4.0810,[42]4.1113,[43]4.1201,[44]4.1480,[45]4.2458,[46]4.3072,[47]4.2442,[48]4.1574,[49]4.0901,[50]4.0465,[51]4.0657,[52]4.0822,[53]4.1156,[54]4.1056,[55]4.1212,[56]4.1420,[57]4.1044,[58]4.1039,[59]4.0942,[60]4.1331,[61]4.1697,[62]4.2170,[63]4.2465,[64]4.2550,[65]4.2505,[66]4.2230,[67]4.1734,[68]4.1364,[69]4.1563,[70]4.1579,[71]4.1559,[72]4.1545,[73]4.1637,[74]4.1959,[75]4.2064,[76]4.1549,[77]4.1298,[78]4.1117,[79]4.0598,[80]4.0158,[81]4.0294,[82]4.0146,[83]4.0108,[84]4.0195,[85]3.9923,[86]3.9916,[87]3.9726,[88]3.9728,[89]3.9561,[90]3.9278,[91]3.8956,[92]3.9027,[93]3.9185,[94]3.9019,[95]3.9027,[96]3.9306,[97]3.9850,[98]3.9926,[99]3.9812,[100]3.9634,[101]3.9818,[102]3.9879,[103]4.0120,[104]4.0010,[105]4.0156,[106]4.0467,[107]4.1231,[108]4.1280,[109]4.1357,[110]4.1785,[111]4.2032,[112]4.1710,[113]4.1368,[114]4.1108,[115]4.0851,[116]4.0736,[117]4.0560,[118]4.0587,[119]4.0481,[120]4.0329,[121]4.0274,[122]4.0069,[123]3.9776,[124]3.9540,[125]3.9373,[126]3.9145,[127]3.9066,[128]3.8968,[129]3.8941,[130]3.8821,[131]3.8662,[132]3.8516,[133]3.8443,[134]3.8524,[135]3.8710,[136]3.8620,[137]3.8627,[138]3.8528,[139]3.8387,[140]3.8534,[141]3.8502,[142]3.8484,[143]3.8394,[144]3.8351,[145]3.8286,[146]3.8248,[147]3.8222,[148]3.8222,[149]3.8209,[150]3.8206,[151]3.8077,[152]3.7998,[153]3.8008,[154]3.7940,[155]3.7907,[156]3.7890,[157]3.7872,[158]3.7872,[159]3.8032,[160]3.8161,[161]3.8224,[162]3.8287,[163]3.8204,[164]3.8305,[165]3.8369,[166]3.8625,[167]3.8860,[168]3.8959,[169]3.9262,[170]3.9469,[171]3.9577,[172]3.9865,[173]3.9727,[174]3.9564,[175]3.9329,[176]3.9114,[177]3.8957,[178]3.8778,[179]3.8558,[180]3.8507,[181]3.8438,[182]3.8594,[183]3.8794,[184]3.9089,[185]3.9272,[186]3.9334,[187]3.9559,[188]3.9880,[189]4.0094,[190]4.0229,[191]4.0421,[192]4.0492,[193]4.0587,[194]4.0587,[195]4.0535,[196]4.0500,[197]4.0631,[198]4.0796,[199]4.0712,[200]4.0760,[201]4.0759,[202]4.0756,[203]4.0696,[204]4.0778,[205]4.0834,[206]4.0877,[207]4.0912,[208]4.0955,[209]4.0946,[210]4.0919,[211]4.0961,[212]4.0910,[213]4.0873,[214]4.0881,[215]4.0889,[216]4.0900,[217]4.0884,[218]4.0964,[219]4.0895,[220]4.0854,[221]4.0809,[222]4.0791,[223]4.0787,[224]4.0801,[225]4.0796,[226]4.0838,[227]4.0773,[228]4.0732,[229]4.0597,[230]4.0484,[231]4.0414,[232]4.0428,[233]4.0405,[234]4.0377,[235]4.0286,[236]4.0343,[237]4.0335,[238]4.0402,[239]4.0498,[240]4.0630,[241]4.0730,[242]4.0821,[243]4.0943,[244]4.1055,[245]4.1200,[246]4.1317,[247]4.1456,[248]4.1515,[249]4.1541,[250]4.1523,[251]4.1369,[252]4.1263,[253]4.1245,[254]4.1243,[255]4.1253,[256]4.1307,[257]4.1307,[258]4.1305,[259]4.1322,[260]4.1356,[261]4.1327,[262]4.1342,[263]4.1330,[264]4.1324,[265]4.1325,[266]4.1326,[267]4.1305,[268]4.1289,[269]4.1260,[270]4.1317,[271]4.1313,[272]4.1257,[273]4.1244,[274]4.1134,[275]4.1103,[276]4.0964,[277]4.0917,[278]4.0874,[279]4.0890,[280]4.0952,[281]4.0968,[282]4.1033,[283]4.1100,[284]4.1127,[285]4.1175,[286]4.1287,[287]4.1437,[288]4.1408,[289]4.1397,[290]4.1404,[291]4.1400,[292]4.1343,[293]4.1202,[294]4.1160,[295]4.1165,[296]4.1065,[297]4.0942,[298]4.0864,[299]4.0755,[300]4.0647,[301]4.0612,[302]4.0495,[303]4.0404,[304]4.0284,[305]4.0184,[30
6]4.0143,[307]4.0180,[308]4.0233,[309]4.0367,[310]4.0238,[311]4.0212,[312]4.0107,[313]4.0034,[314]3.9977,[315]3.9950,[316]3.9859,[317]3.9778,[318]3.9702,[319]3.9620,[320]3.9558,[321]3.9499,[322]3.9452,[323]3.9349,[324]3.9271,[325]3.9225,[326]3.9163,[327]3.9167,[328]3.9152,[329]3.9142,[330]3.9111,[331]3.9074,[332]3.9128,[333]3.9161,[334]3.9195,[335]3.9202,[336]3.9201,[337]3.9212,[338]3.9199,[339]3.9193,[340]3.9212,[341]3.9228,[342]3.9258,[343]3.9339,[344]3.9405,[345]3.9526,[346]3.9522,[347]3.9450,[348]3.9423,[349]3.9441,[350]3.9367,[351]3.9249,[352]3.9172,[353]3.9145,[354]3.9164,[355]3.9242,[356]3.9373,[357]3.9406,[358]3.9443,[359]3.9535,[360]3.9655,[361]3.9672,[362]3.9728,[363]3.9787,[364]3.9844,[365]3.9868,[366]3.9910,[367]3.9949,[368]4.0013,[369]4.0081,[370]4.0143,[371]4.0169,[372]4.0260,[373]4.0397,[374]4.0495,[375]4.0549,[376]4.0584,[377]4.0628,[378]4.0758,[379]4.0880,[380]4.0904,[381]4.0858,[382]4.0842,[383]4.0851,[384]4.0921,[385]4.0958,[386]4.0998,[387]4.1016,[388]4.1047,[389]4.1113,[390]4.1120,[391]4.1027,[392]4.0948,[393]4.0865,[394]4.0821,[395]4.0769,[396]4.0705,[397]4.0613,[398]4.0546,[399]4.0499,[400]4.0388,[401]4.0347,[402]4.0352,[403]4.0264,[404]4.0172,[405]4.0147,[406]4.0081,[407]3.9994,[408]3.9898,[409]3.9830,[410]3.9754,[411]3.9729,[412]3.9715,[413]3.9726,[414]3.9661,[415]3.9669,[416]3.9641,[417]3.9565,[418]3.9474,[419]3.9536,[420]3.9480,[421]3.9502,[422]3.9511,[423]3.9437,[424]3.9430,[425]3.9426,[426]3.9427,[427]3.9407,[428]3.9416,[429]3.9369,[430]3.9371,[431]3.9370,[432]3.9307,[433]3.9249,[434]3.9173,[435]3.9165,[436]3.9090,[437]3.9024,[438]3.8963,[439]3.8945,[440]3.8950,[441]3.8937,[442]3.8922,[443]3.8990,[444]3.9095,[445]3.9055,[446]3.9026,[447]3.9006,[448]3.8990,[449]3.9045,[450]3.9039,[451]3.9022,[452]3.9055,[453]3.9131,[454]3.9161,[455]3.9167,[456]3.9206,[457]3.9207,[458]3.9231,[459]3.9236,[460]3.9291,[461]3.9344,[462]3.9372,[463]3.9386,[464]3.9347,[465]3.9335,[466]3.9423,[467]3.9422,[468]3.9420,[469]3.9482,[470]3.9498,[471]3.9545,[472]3.9600,[473]3.9611,[474]3.9597,[475]3.9624,[476]3.9648,[477]3.9676,[478]3.9668,[479]3.9677,[480]3.9684,[481]3.9708,[482]3.9717,[483]3.9769,[484]3.9736,[485]3.9763,[486]3.9747,[487]3.9800,[488]3.9868,[489]3.9931,[490]3.9936,[491]3.9980,[492]4.0018,[493]4.0046,[494]4.0104,[495]4.0159,[496]4.0151,[497]4.0141,[498]4.0144,[499]4.0157,[500]4.0174,[501]4.0171,[502]4.0167,[503]4.0213,[504]4.0270,[505]4.0272,[506]4.0266,[507]4.0290,[508]4.0340,[509]4.0422,[510]4.0450,[511]4.0495,[512]4.0437,[513]4.0411,[514]4.0371,[515]4.0380,[516]4.0353,[517]4.0336,[518]4.0324,[519]4.0278,[520]4.0275,[521]4.0273,[522]4.0228,[523]4.0213,[524]4.0233,[525]4.0224,[526]4.0202,[527]4.0225,[528]4.0174,[529]4.0120,[530]4.0074,[531]4.0026,[532]4.0026,[533]4.0000,[534]3.9972,[535]3.9927,[536]3.9871,[537]3.9798,[538]3.9782,[539]3.9698,[540]3.9683,[541]3.9724,[542]3.9702,[543]3.9649,[544]3.9627,[545]3.9644,[546]3.9640,[547]3.9662,[548]3.9646,[549]3.9587,[550]3.9531,[551]3.9488,[552]3.9426,[553]3.9391,[554]3.9354,[555]3.9293,[556]3.9236,[557]3.9196,[558]3.9225,[559]3.9208,[560]3.9187,[561]3.9201,[562]3.9244,[563]3.9297,[564]3.9335,[565]3.9320,
llama_print_timings: load time = 137402.15 ms
llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
llama_print_timings: prompt eval time = 1530320.65 ms / 289280 tokens ( 5.29 ms per token, 189.03 tokens per second)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
llama_print_timings: total time = 1541349.05 ms / 289281 tokens

Final estimate: PPL over 565 chunks for n_ctx=512 = 3.9320 +/- 0.02428
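The summary numbers quoted in the README can be pulled straight back out of these logs; a hypothetical convenience for doing so (run from the repo root, using the filenames added in this commit):

```bash
# Print the final PPL line from each baseline log.
for f in logs/perplexity-GLM-4.7-BF16.log logs/perplexity-GLM-4.7-Q8_0.log; do
    printf '%s\n  %s\n' "$f" "$(grep 'Final estimate:' "$f")"
done
```

Run as-is, this reproduces the two Final estimate lines that back the BF16 and Q8_0 entries in the README.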