Add baseline perplexity logs/data from BF16 & Q8_0
Files changed:
- README.md +5 -4
- logs/perplexity-GLM-4.7-BF16.log +208 -0
- logs/perplexity-GLM-4.7-Q8_0.log +204 -0
README.md
CHANGED

@@ -20,7 +20,8 @@ Currently cooking this now!
  - [x] download bf16 safetensors https://huggingface.co/zai-org/GLM-4.7
  - [x] use llama.cpp/convert_hf_to_gguf.py to create bf16 GGUF
  - [x] calculate imatrix and upload to HF first so others can use as desired
- - [
+ - [x] cook Q8_0 and test perplexity of BF16 and Q8_0 for baseline data
+ - [ ] look into making MTP nextn tensors full q8_0 (won't affect RAM+VRAM usage otherwise)
  - [ ] cook IQ5_K with full q8_0 attn/shexp/first 3 dense layers and test
  - [ ] upload IQ5_K if all looking good
  - [ ] continue with smaller quants

@@ -47,9 +48,9 @@ Perplexity computed against *wiki.test.raw*.

  These first two are just test quants for baseline perplexity comparison:
  * `BF16` 667.598 GiB (16.003 BPW)
- - Final estimate: PPL =
- * `Q8_0`
- - Final estimate: PPL =
+ - Final estimate: PPL over 565 chunks for n_ctx=512 = 3.9267 +/- 0.02423
+ * `Q8_0` 354.794 GiB (8.505 BPW)
+ - Final estimate: PPL over 565 chunks for n_ctx=512 = 3.9320 +/- 0.02428

  ## IQ5_K TODO

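Taken together, the two baselines land very close to each other. A minimal sketch (plain shell, with the two values copied from the Final estimate lines above) of the absolute and relative PPL difference of Q8_0 versus BF16:

```bash
#!/usr/bin/env bash
# Compare the two baseline perplexities quoted above
# (wiki.test.raw, 565 chunks, n_ctx=512); values copied from the logs.
bf16=3.9267
q8_0=3.9320

awk -v a="$bf16" -v b="$q8_0" 'BEGIN {
    printf "Q8_0 - BF16 = %+.4f PPL\n", b - a
    printf "relative    = %+.3f %%\n", 100.0 * (b - a) / a
}'
```

That works out to roughly +0.0053 PPL (about +0.13 %), well inside the ±0.024 reported on each estimate, which supports treating Q8_0 as a near-lossless reference point for the smaller quants.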
logs/perplexity-GLM-4.7-BF16.log
ADDED
@@ -0,0 +1,208 @@
model=/mnt/data/models/ubergarm/GLM-4.7-GGUF/GLM-160x21B-4.7-BF16-00001-of-00015.gguf

numactl -N "$SOCKET" -m "$SOCKET" \
./build/bin/llama-perplexity \
-m "$model" \
-f wiki.test.raw \
--seed 1337 \
--ctx-size 512 \
-ub 4096 -b 4096 \
--numa numactl \
--threads 96 \
--threads-batch 128 \
--validate-quants \
--no-mmap

SOCKET is set to: 0
main: build = 4073 (55626050)
main: built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu
main: seed = 1337
CPU: using device CPU - 0 MiB free
llama_model_loader: additional 14 GGUFs metadata loaded.
llama_model_loader: loaded meta data with 49 key-value pairs and 1761 tensors from /mnt/data/models/ubergarm/GLM-4.7-GGUF/GLM-160x21B-4.7-BF16-00001-of-00015.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = glm4moe
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.sampling.temp f32 = 1.000000
llama_model_loader: - kv 3: general.name str = GLM 4.7
llama_model_loader: - kv 4: general.version str = 4.7
llama_model_loader: - kv 5: general.basename str = GLM
llama_model_loader: - kv 6: general.size_label str = 160x21B
llama_model_loader: - kv 7: general.license str = mit
llama_model_loader: - kv 8: general.tags arr[str,1] = ["text-generation"]
llama_model_loader: - kv 9: general.languages arr[str,2] = ["en", "zh"]
llama_model_loader: - kv 10: glm4moe.block_count u32 = 93
llama_model_loader: - kv 11: glm4moe.context_length u32 = 202752
llama_model_loader: - kv 12: glm4moe.embedding_length u32 = 5120
llama_model_loader: - kv 13: glm4moe.feed_forward_length u32 = 12288
llama_model_loader: - kv 14: glm4moe.attention.head_count u32 = 96
llama_model_loader: - kv 15: glm4moe.attention.head_count_kv u32 = 8
llama_model_loader: - kv 16: glm4moe.rope.freq_base f32 = 1000000.000000
llama_model_loader: - kv 17: glm4moe.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 18: glm4moe.expert_used_count u32 = 8
llama_model_loader: - kv 19: glm4moe.expert_group_count u32 = 1
llama_model_loader: - kv 20: glm4moe.expert_group_used_count u32 = 1
llama_model_loader: - kv 21: glm4moe.attention.key_length u32 = 128
llama_model_loader: - kv 22: glm4moe.attention.value_length u32 = 128
llama_model_loader: - kv 23: general.file_type u32 = 32
llama_model_loader: - kv 24: glm4moe.rope.dimension_count u32 = 64
llama_model_loader: - kv 25: glm4moe.expert_count u32 = 160
llama_model_loader: - kv 26: glm4moe.expert_feed_forward_length u32 = 1536
llama_model_loader: - kv 27: glm4moe.expert_shared_count u32 = 1
llama_model_loader: - kv 28: glm4moe.leading_dense_block_count u32 = 3
llama_model_loader: - kv 29: glm4moe.expert_gating_func u32 = 2
llama_model_loader: - kv 30: glm4moe.expert_weights_scale f32 = 2.500000
llama_model_loader: - kv 31: glm4moe.expert_weights_norm bool = true
llama_model_loader: - kv 32: glm4moe.nextn_predict_layers u32 = 1
llama_model_loader: - kv 33: general.quantization_version u32 = 2
llama_model_loader: - kv 34: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 35: tokenizer.ggml.pre str = glm4
llama_model_loader: - kv 36: tokenizer.ggml.tokens arr[str,151552] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 37: tokenizer.ggml.token_type arr[i32,151552] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 38: tokenizer.ggml.merges arr[str,318088] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv 39: tokenizer.ggml.eos_token_id u32 = 151329
llama_model_loader: - kv 40: tokenizer.ggml.padding_token_id u32 = 151329
llama_model_loader: - kv 41: tokenizer.ggml.bos_token_id u32 = 151331
llama_model_loader: - kv 42: tokenizer.ggml.eot_token_id u32 = 151336
llama_model_loader: - kv 43: tokenizer.ggml.unknown_token_id u32 = 151329
llama_model_loader: - kv 44: tokenizer.ggml.eom_token_id u32 = 151338
llama_model_loader: - kv 45: tokenizer.chat_template str = [gMASK]<sop>\n{%- if tools -%}\n<|syste...
llama_model_loader: - kv 46: split.no u16 = 0
llama_model_loader: - kv 47: split.count u16 = 15
llama_model_loader: - kv 48: split.tensors.count i32 = 1761
llama_model_loader: - type f32: 835 tensors
llama_model_loader: - type bf16: 926 tensors
load: special_eot_id is not in special_eog_ids - the tokenizer config may be incorrect
load: special_eom_id is not in special_eog_ids - the tokenizer config may be incorrect
load: printing all EOG tokens:
load: - 151329 ('<|endoftext|>')
load: - 151336 ('<|user|>')
load: - 151338 ('<|observation|>')
load: special tokens cache size = 36
load: token to piece cache size = 0.9713 MB
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = glm4moe
llm_load_print_meta: n_ctx_train = 202752
llm_load_print_meta: n_embd = 5120
llm_load_print_meta: n_layer = 93
llm_load_print_meta: n_head = 96
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_rot = 64
llm_load_print_meta: n_swa = 0
llm_load_print_meta: n_swa_pattern = 1
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 12
llm_load_print_meta: n_embd_k_gqa = 1024
llm_load_print_meta: n_embd_v_gqa = 1024
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 12288
llm_load_print_meta: n_expert = 160
llm_load_print_meta: n_expert_used = 8
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 2
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 1000000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn = 202752
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = 355B.A32B
llm_load_print_meta: model ftype = BF16
llm_load_print_meta: model params = 358.338 B
llm_load_print_meta: model size = 667.598 GiB (16.003 BPW)
llm_load_print_meta: repeating layers = 664.707 GiB (16.003 BPW, 356.786 B parameters)
llm_load_print_meta: general.name = GLM 4.7
print_info: vocab type = BPE
print_info: n_vocab = 151552
print_info: n_merges = 318088
print_info: BOS token = 151331 '[gMASK]'
print_info: EOS token = 151329 '<|endoftext|>'
print_info: EOT token = 151336 '<|user|>'
print_info: EOM token = 151338 '<|observation|>'
print_info: UNK token = 151329 '<|endoftext|>'
print_info: PAD token = 151329 '<|endoftext|>'
print_info: LF token = 198 'Ċ'
print_info: FIM PRE token = 151347 '<|code_prefix|>'
print_info: FIM SUF token = 151349 '<|code_suffix|>'
print_info: FIM MID token = 151348 '<|code_middle|>'
print_info: EOG token = 151329 '<|endoftext|>'
print_info: EOG token = 151336 '<|user|>'
print_info: EOG token = 151338 '<|observation|>'
print_info: max token length = 1024
llm_load_tensors: ggml ctx size = 0.72 MiB
model has unused tensor blk.92.attn_norm.weight (size = 20480 bytes) -- ignoring
model has unused tensor blk.92.attn_q.weight (size = 125829120 bytes) -- ignoring
model has unused tensor blk.92.attn_k.weight (size = 10485760 bytes) -- ignoring
model has unused tensor blk.92.attn_v.weight (size = 10485760 bytes) -- ignoring
model has unused tensor blk.92.attn_q.bias (size = 49152 bytes) -- ignoring
model has unused tensor blk.92.attn_k.bias (size = 4096 bytes) -- ignoring
model has unused tensor blk.92.attn_v.bias (size = 4096 bytes) -- ignoring
model has unused tensor blk.92.attn_output.weight (size = 125829120 bytes) -- ignoring
model has unused tensor blk.92.attn_q_norm.weight (size = 512 bytes) -- ignoring
model has unused tensor blk.92.attn_k_norm.weight (size = 512 bytes) -- ignoring
model has unused tensor blk.92.post_attention_norm.weight (size = 20480 bytes) -- ignoring
model has unused tensor blk.92.ffn_gate_inp.weight (size = 3276800 bytes) -- ignoring
model has unused tensor blk.92.exp_probs_b.bias (size = 640 bytes) -- ignoring
model has unused tensor blk.92.ffn_gate_exps.weight (size = 2516582400 bytes) -- ignoring
model has unused tensor blk.92.ffn_down_exps.weight (size = 2516582400 bytes) -- ignoring
model has unused tensor blk.92.ffn_up_exps.weight (size = 2516582400 bytes) -- ignoring
model has unused tensor blk.92.ffn_gate_shexp.weight (size = 15728640 bytes) -- ignoring
model has unused tensor blk.92.ffn_down_shexp.weight (size = 15728640 bytes) -- ignoring
model has unused tensor blk.92.ffn_up_shexp.weight (size = 15728640 bytes) -- ignoring
model has unused tensor blk.92.nextn.eh_proj.weight (size = 104857600 bytes) -- ignoring
model has unused tensor blk.92.nextn.embed_tokens.weight (size = 1551892480 bytes) -- ignoring
model has unused tensor blk.92.nextn.enorm.weight (size = 20480 bytes) -- ignoring
model has unused tensor blk.92.nextn.hnorm.weight (size = 20480 bytes) -- ignoring
model has unused tensor blk.92.nextn.shared_head_head.weight (size = 1551892480 bytes) -- ignoring
model has unused tensor blk.92.nextn.shared_head_norm.weight (size = 20480 bytes) -- ignoring
llm_load_tensors: offloading 0 repeating layers to GPU
llm_load_tensors: offloaded 0/94 layers to GPU
llm_load_tensors: CPU buffer size = 673051.91 MiB
....................................................................................................
llama_new_context_with_model: n_ctx = 4096
llama_new_context_with_model: n_batch = 4096
llama_new_context_with_model: n_ubatch = 4096
llama_new_context_with_model: flash_attn = 1
llama_new_context_with_model: attn_max_b = 0
llama_new_context_with_model: fused_moe = 1
llama_new_context_with_model: grouped er = 0
llama_new_context_with_model: fused_up_gate = 1
llama_new_context_with_model: fused_mmad = 1
llama_new_context_with_model: rope_cache = 0
llama_new_context_with_model: graph_reuse = 0
llama_new_context_with_model: k_cache_hadam = 0
llama_new_context_with_model: split_mode_graph_scheduling = 0
llama_new_context_with_model: ser = -1, 0
llama_new_context_with_model: freq_base = 1000000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CPU KV buffer size = 1472.00 MiB
llama_new_context_with_model: KV self size = 1472.00 MiB, K (f16): 736.00 MiB, V (f16): 736.00 MiB
llama_new_context_with_model: CPU output buffer size = 4.63 MiB
llama_new_context_with_model: CPU compute buffer size = 2448.00 MiB
llama_new_context_with_model: graph nodes = 4278
llama_new_context_with_model: graph splits = 1
XXXXXXXXXXXXXXXXXXXXX Setting only active experts offload

system_info: n_threads = 96 (n_threads_batch = 128) / 512 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
perplexity: tokenizing the input ..
perplexity: tokenization took 369.524 ms
perplexity: calculating perplexity over 565 chunks, n_ctx=512, batch_size=4096, n_seq=8
perplexity: 35.41 seconds per pass - ETA 41.67 minutes
======================================= HAVE_FANCY_SIMD is defined
[1]2.8926,[2]3.5859,[3]2.7995,[4]2.5299,[5]2.5820,[6]2.7924,[7]2.8602,[8]2.8440,[9]2.9531,[10]2.8590,[11]2.8730,[12]3.0487,[13]3.0794,[14]3.0892,[15]3.2528,[16]3.3286,[17]3.4781,[18]3.6998,[19]3.6556,[20]3.7121,[21]3.7950,[22]3.7604,[23]3.6688,[24]3.5707,[25]3.4949,[26]3.4440,[27]3.4100,[28]3.4292,[29]3.4712,[30]3.5417,[31]3.6083,[32]3.6691,[33]3.7315,[34]3.7650,[35]3.8319,[36]3.8767,[37]3.8820,[38]3.9504,[39]3.9800,[40]4.0195,[41]4.0972,[42]4.1190,[43]4.1284,[44]4.1543,[45]4.2516,[46]4.3116,[47]4.2478,[48]4.1584,[49]4.0910,[50]4.0475,[51]4.0689,[52]4.0855,[53]4.1178,[54]4.1084,[55]4.1223,[56]4.1429,[57]4.1026,[58]4.1010,[59]4.0923,[60]4.1306,[61]4.1677,[62]4.2150,[63]4.2443,[64]4.2543,[65]4.2501,[66]4.2200,[67]4.1734,[68]4.1355,[69]4.1541,[70]4.1574,[71]4.1546,[72]4.1518,[73]4.1598,[74]4.1934,[75]4.1994,[76]4.1521,[77]4.1263,[78]4.1095,[79]4.0587,[80]4.0150,[81]4.0283,[82]4.0150,[83]4.0132,[84]4.0227,[85]3.9949,[86]3.9912,[87]3.9738,[88]3.9697,[89]3.9520,[90]3.9212,[91]3.8889,[92]3.8951,[93]3.9118,[94]3.8968,[95]3.8973,[96]3.9268,[97]3.9782,[98]3.9853,[99]3.9751,[100]3.9550,[101]3.9726,[102]3.9796,[103]4.0039,[104]3.9914,[105]4.0071,[106]4.0377,[107]4.1099,[108]4.1147,[109]4.1234,[110]4.1661,[111]4.1921,[112]4.1598,[113]4.1268,[114]4.1000,[115]4.0745,[116]4.0617,[117]4.0441,[118]4.0464,[119]4.0372,[120]4.0213,[121]4.0136,[122]3.9937,[123]3.9645,[124]3.9409,[125]3.9260,[126]3.9032,[127]3.8961,[128]3.8874,[129]3.8858,[130]3.8736,[131]3.8596,[132]3.8444,[133]3.8363,[134]3.8445,[135]3.8635,[136]3.8547,[137]3.8548,[138]3.8440,[139]3.8302,[140]3.8446,[141]3.8409,[142]3.8392,[143]3.8296,[144]3.8255,[145]3.8188,[146]3.8146,[147]3.8108,[148]3.8116,[149]3.8098,[150]3.8094,[151]3.7967,[152]3.7892,[153]3.7905,[154]3.7836,[155]3.7803,[156]3.7782,[157]3.7770,[158]3.7761,[159]3.7935,[160]3.8045,[161]3.8100,[162]3.8173,[163]3.8091,[164]3.8198,[165]3.8265,[166]3.8521,[167]3.8757,[168]3.8860,[169]3.9163,[170]3.9370,[171]3.9477,[172]3.9762,[173]3.9626,[174]3.9470,[175]3.9237,[176]3.9019,[177]3.8868,[178]3.8703,[179]3.8488,[180]3.8429,[181]3.8378,[182]3.8535,[183]3.8738,[184]3.9045,[185]3.9231,[186]3.9293,[187]3.9519,[188]3.9836,[189]4.0054,[190]4.0192,[191]4.0387,[192]4.0456,[193]4.0544,[194]4.0543,[195]4.0486,[196]4.0450,[197]4.0581,[198]4.0750,[199]4.0670,[200]4.0727,[201]4.0722,[202]4.0716,[203]4.0654,[204]4.0744,[205]4.0801,[206]4.0843,[207]4.0873,[208]4.0926,[209]4.0917,[210]4.0891,[211]4.0932,[212]4.0882,[213]4.0845,[214]4.0857,[215]4.0868,[216]4.0884,[217]4.0867,[218]4.0942,[219]4.0868,[220]4.0828,[221]4.0791,[222]4.0770,[223]4.0765,[224]4.0779,[225]4.0765,[226]4.0812,[227]4.0746,[228]4.0713,[229]4.0567,[230]4.0453,[231]4.0378,[232]4.0381,[233]4.0360,[234]4.0335,[235]4.0250,[236]4.0318,[237]4.0302,[238]4.0364,[239]4.0459,[240]4.0590,[241]4.0687,[242]4.0778,[243]4.0899,[244]4.1006,[245]4.1147,[246]4.1263,[247]4.1401,[248]4.1464,[249]4.1493,[250]4.1472,[251]4.1316,[252]4.1209,[253]4.1191,[254]4.1189,[255]4.1198,[256]4.1254,[257]4.1256,[258]4.1253,[259]4.1267,[260]4.1302,[261]4.1271,[262]4.1285,[263]4.1278,[264]4.1275,[265]4.1276,[266]4.1275,[267]4.1251,[268]4.1233,[269]4.1204,[270]4.1261,[271]4.1258,[272]4.1201,[273]4.1190,[274]4.1083,[275]4.1042,[276]4.0905,[277]4.0848,[278]4.0805,[279]4.0820,[280]4.0881,[281]4.0898,[282]4.0962,[283]4.1031,[284]4.1058,[285]4.1108,[286]4.1218,[287]4.1370,[288]4.1342,[289]4.1329,[290]4.1335,[291]4.1335,[292]4.1274,[293]4.1134,[294]4.1103,[295]4.1105,[296]4.1009,[297]4.0888,[298]4.0808,[299]4.0697,[300]4.0591,[301]4.0560,[302]4.0441,[303]4.0357,[304]4.0237,[305]4.0140,[30
6]4.0099,[307]4.0140,[308]4.0193,[309]4.0326,[310]4.0195,[311]4.0172,[312]4.0067,[313]3.9996,[314]3.9946,[315]3.9920,[316]3.9833,[317]3.9755,[318]3.9677,[319]3.9595,[320]3.9536,[321]3.9482,[322]3.9438,[323]3.9335,[324]3.9261,[325]3.9215,[326]3.9150,[327]3.9152,[328]3.9144,[329]3.9135,[330]3.9103,[331]3.9061,[332]3.9120,[333]3.9152,[334]3.9183,[335]3.9194,[336]3.9192,[337]3.9205,[338]3.9191,[339]3.9186,[340]3.9205,[341]3.9221,[342]3.9250,[343]3.9330,[344]3.9396,[345]3.9520,[346]3.9518,[347]3.9450,[348]3.9426,[349]3.9442,[350]3.9370,[351]3.9252,[352]3.9174,[353]3.9150,[354]3.9170,[355]3.9247,[356]3.9376,[357]3.9408,[358]3.9448,[359]3.9542,[360]3.9664,[361]3.9682,[362]3.9736,[363]3.9792,[364]3.9850,[365]3.9872,[366]3.9916,[367]3.9953,[368]4.0015,[369]4.0083,[370]4.0144,[371]4.0172,[372]4.0259,[373]4.0398,[374]4.0499,[375]4.0548,[376]4.0584,[377]4.0630,[378]4.0761,[379]4.0884,[380]4.0907,[381]4.0854,[382]4.0842,[383]4.0849,[384]4.0920,[385]4.0956,[386]4.0997,[387]4.1017,[388]4.1049,[389]4.1116,[390]4.1125,[391]4.1031,[392]4.0949,[393]4.0861,[394]4.0818,[395]4.0761,[396]4.0697,[397]4.0610,[398]4.0538,[399]4.0491,[400]4.0380,[401]4.0337,[402]4.0337,[403]4.0253,[404]4.0164,[405]4.0131,[406]4.0054,[407]3.9969,[408]3.9874,[409]3.9806,[410]3.9732,[411]3.9717,[412]3.9702,[413]3.9710,[414]3.9645,[415]3.9647,[416]3.9618,[417]3.9547,[418]3.9455,[419]3.9515,[420]3.9461,[421]3.9482,[422]3.9493,[423]3.9420,[424]3.9412,[425]3.9408,[426]3.9413,[427]3.9390,[428]3.9395,[429]3.9347,[430]3.9341,[431]3.9344,[432]3.9283,[433]3.9225,[434]3.9149,[435]3.9135,[436]3.9066,[437]3.9002,[438]3.8942,[439]3.8923,[440]3.8930,[441]3.8913,[442]3.8897,[443]3.8962,[444]3.9068,[445]3.9028,[446]3.9000,[447]3.8982,[448]3.8965,[449]3.9023,[450]3.9016,[451]3.8999,[452]3.9030,[453]3.9107,[454]3.9138,[455]3.9145,[456]3.9183,[457]3.9182,[458]3.9207,[459]3.9211,[460]3.9267,[461]3.9319,[462]3.9347,[463]3.9349,[464]3.9312,[465]3.9296,[466]3.9381,[467]3.9380,[468]3.9369,[469]3.9433,[470]3.9453,[471]3.9497,[472]3.9552,[473]3.9562,[474]3.9549,[475]3.9572,[476]3.9594,[477]3.9623,[478]3.9615,[479]3.9622,[480]3.9628,[481]3.9651,[482]3.9661,[483]3.9713,[484]3.9682,[485]3.9712,[486]3.9697,[487]3.9751,[488]3.9814,[489]3.9876,[490]3.9882,[491]3.9925,[492]3.9963,[493]3.9991,[494]4.0051,[495]4.0107,[496]4.0100,[497]4.0089,[498]4.0092,[499]4.0104,[500]4.0123,[501]4.0120,[502]4.0116,[503]4.0162,[504]4.0218,[505]4.0215,[506]4.0210,[507]4.0234,[508]4.0283,[509]4.0367,[510]4.0391,[511]4.0437,[512]4.0377,[513]4.0354,[514]4.0312,[515]4.0323,[516]4.0297,[517]4.0277,[518]4.0264,[519]4.0220,[520]4.0215,[521]4.0209,[522]4.0164,[523]4.0151,[524]4.0173,[525]4.0163,[526]4.0140,[527]4.0163,[528]4.0114,[529]4.0060,[530]4.0015,[531]3.9968,[532]3.9967,[533]3.9942,[534]3.9916,[535]3.9871,[536]3.9818,[537]3.9746,[538]3.9725,[539]3.9639,[540]3.9634,[541]3.9671,[542]3.9654,[543]3.9604,[544]3.9588,[545]3.9594,[546]3.9594,[547]3.9612,[548]3.9594,[549]3.9536,[550]3.9484,[551]3.9439,[552]3.9376,[553]3.9338,[554]3.9303,[555]3.9238,[556]3.9185,[557]3.9141,[558]3.9171,[559]3.9154,[560]3.9133,[561]3.9149,[562]3.9191,[563]3.9244,[564]3.9281,[565]3.9267,
llama_print_timings: load time = 155689.14 ms
llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
llama_print_timings: prompt eval time = 2116254.98 ms / 289280 tokens ( 7.32 ms per token, 136.69 tokens per second)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
llama_print_timings: total time = 2127881.74 ms / 289281 tokens

Final estimate: PPL over 565 chunks for n_ctx=512 = 3.9267 +/- 0.02423
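Both logs begin with the same llama-perplexity invocation; only the model path and the socket differ. For readability, here is a commented restatement of that command as a small wrapper script. The SOCKET and model assignments and the echo line are assumptions about the surrounding script (the logs only show the command and the "SOCKET is set to:" line); every flag is copied from the logs as-is.

```bash
#!/usr/bin/env bash
# Sketch of the wrapper implied by the logs above (assumed structure, flags copied verbatim).
SOCKET=0   # NUMA node to pin the run to; the Q8_0 run used socket 1
model=/mnt/data/models/ubergarm/GLM-4.7-GGUF/GLM-160x21B-4.7-BF16-00001-of-00015.gguf

echo "SOCKET is set to: $SOCKET"

# numactl binds both CPU and memory allocation to the chosen socket.
#   -f wiki.test.raw    text corpus scored in 512-token chunks (565 chunks total here)
#   --ctx-size 512      the usual short-context PPL setting
#   -ub 4096 -b 4096    large micro/logical batches for prompt-processing throughput
#   --numa numactl      tells llama.cpp that NUMA placement is handled externally
#   --validate-quants   checks tensor data while loading
#   --no-mmap           loads the model fully into RAM instead of memory-mapping it
numactl -N "$SOCKET" -m "$SOCKET" \
    ./build/bin/llama-perplexity \
    -m "$model" \
    -f wiki.test.raw \
    --seed 1337 \
    --ctx-size 512 \
    -ub 4096 -b 4096 \
    --numa numactl \
    --threads 96 \
    --threads-batch 128 \
    --validate-quants \
    --no-mmap
```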
logs/perplexity-GLM-4.7-Q8_0.log
ADDED
@@ -0,0 +1,204 @@
model=/mnt/data/models/ubergarm/GLM-4.7-GGUF/GLM-4.7-Q8_0.gguf

numactl -N "$SOCKET" -m "$SOCKET" \
./build/bin/llama-perplexity \
-m "$model" \
-f wiki.test.raw \
--seed 1337 \
--ctx-size 512 \
-ub 4096 -b 4096 \
--numa numactl \
--threads 96 \
--threads-batch 128 \
--validate-quants \
--no-mmap

SOCKET is set to: 1
main: build = 4073 (55626050)
main: built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu
main: seed = 1337
CPU: using device CPU - 0 MiB free
llama_model_loader: loaded meta data with 46 key-value pairs and 1761 tensors from /mnt/data/models/ubergarm/GLM-4.7-GGUF/GLM-4.7-Q8_0.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = glm4moe
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.sampling.temp f32 = 1.000000
llama_model_loader: - kv 3: general.name str = GLM 4.7
llama_model_loader: - kv 4: general.version str = 4.7
llama_model_loader: - kv 5: general.basename str = GLM
llama_model_loader: - kv 6: general.size_label str = 160x21B
llama_model_loader: - kv 7: general.license str = mit
llama_model_loader: - kv 8: general.tags arr[str,1] = ["text-generation"]
llama_model_loader: - kv 9: general.languages arr[str,2] = ["en", "zh"]
llama_model_loader: - kv 10: glm4moe.block_count u32 = 93
llama_model_loader: - kv 11: glm4moe.context_length u32 = 202752
llama_model_loader: - kv 12: glm4moe.embedding_length u32 = 5120
llama_model_loader: - kv 13: glm4moe.feed_forward_length u32 = 12288
llama_model_loader: - kv 14: glm4moe.attention.head_count u32 = 96
llama_model_loader: - kv 15: glm4moe.attention.head_count_kv u32 = 8
llama_model_loader: - kv 16: glm4moe.rope.freq_base f32 = 1000000.000000
llama_model_loader: - kv 17: glm4moe.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 18: glm4moe.expert_used_count u32 = 8
llama_model_loader: - kv 19: glm4moe.expert_group_count u32 = 1
llama_model_loader: - kv 20: glm4moe.expert_group_used_count u32 = 1
llama_model_loader: - kv 21: glm4moe.attention.key_length u32 = 128
llama_model_loader: - kv 22: glm4moe.attention.value_length u32 = 128
llama_model_loader: - kv 23: general.file_type u32 = 7
llama_model_loader: - kv 24: glm4moe.rope.dimension_count u32 = 64
llama_model_loader: - kv 25: glm4moe.expert_count u32 = 160
llama_model_loader: - kv 26: glm4moe.expert_feed_forward_length u32 = 1536
llama_model_loader: - kv 27: glm4moe.expert_shared_count u32 = 1
llama_model_loader: - kv 28: glm4moe.leading_dense_block_count u32 = 3
llama_model_loader: - kv 29: glm4moe.expert_gating_func u32 = 2
llama_model_loader: - kv 30: glm4moe.expert_weights_scale f32 = 2.500000
llama_model_loader: - kv 31: glm4moe.expert_weights_norm bool = true
llama_model_loader: - kv 32: glm4moe.nextn_predict_layers u32 = 1
llama_model_loader: - kv 33: general.quantization_version u32 = 2
llama_model_loader: - kv 34: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 35: tokenizer.ggml.pre str = glm4
llama_model_loader: - kv 36: tokenizer.ggml.tokens arr[str,151552] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 37: tokenizer.ggml.token_type arr[i32,151552] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 38: tokenizer.ggml.merges arr[str,318088] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv 39: tokenizer.ggml.eos_token_id u32 = 151329
llama_model_loader: - kv 40: tokenizer.ggml.padding_token_id u32 = 151329
llama_model_loader: - kv 41: tokenizer.ggml.bos_token_id u32 = 151331
llama_model_loader: - kv 42: tokenizer.ggml.eot_token_id u32 = 151336
llama_model_loader: - kv 43: tokenizer.ggml.unknown_token_id u32 = 151329
llama_model_loader: - kv 44: tokenizer.ggml.eom_token_id u32 = 151338
llama_model_loader: - kv 45: tokenizer.chat_template str = [gMASK]<sop>\n{%- if tools -%}\n<|syste...
llama_model_loader: - type f32: 835 tensors
llama_model_loader: - type q8_0: 926 tensors
load: special_eot_id is not in special_eog_ids - the tokenizer config may be incorrect
load: special_eom_id is not in special_eog_ids - the tokenizer config may be incorrect
load: printing all EOG tokens:
load: - 151329 ('<|endoftext|>')
load: - 151336 ('<|user|>')
load: - 151338 ('<|observation|>')
load: special tokens cache size = 36
load: token to piece cache size = 0.9713 MB
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = glm4moe
llm_load_print_meta: n_ctx_train = 202752
llm_load_print_meta: n_embd = 5120
llm_load_print_meta: n_layer = 93
llm_load_print_meta: n_head = 96
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_rot = 64
llm_load_print_meta: n_swa = 0
llm_load_print_meta: n_swa_pattern = 1
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 12
llm_load_print_meta: n_embd_k_gqa = 1024
llm_load_print_meta: n_embd_v_gqa = 1024
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 12288
llm_load_print_meta: n_expert = 160
llm_load_print_meta: n_expert_used = 8
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 2
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 1000000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn = 202752
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = 355B.A32B
llm_load_print_meta: model ftype = Q8_0
llm_load_print_meta: model params = 358.338 B
llm_load_print_meta: model size = 354.794 GiB (8.505 BPW)
llm_load_print_meta: repeating layers = 353.259 GiB (8.505 BPW, 356.786 B parameters)
llm_load_print_meta: general.name = GLM 4.7
print_info: vocab type = BPE
print_info: n_vocab = 151552
print_info: n_merges = 318088
print_info: BOS token = 151331 '[gMASK]'
print_info: EOS token = 151329 '<|endoftext|>'
print_info: EOT token = 151336 '<|user|>'
print_info: EOM token = 151338 '<|observation|>'
print_info: UNK token = 151329 '<|endoftext|>'
print_info: PAD token = 151329 '<|endoftext|>'
print_info: LF token = 198 'Ċ'
print_info: FIM PRE token = 151347 '<|code_prefix|>'
print_info: FIM SUF token = 151349 '<|code_suffix|>'
print_info: FIM MID token = 151348 '<|code_middle|>'
print_info: EOG token = 151329 '<|endoftext|>'
print_info: EOG token = 151336 '<|user|>'
print_info: EOG token = 151338 '<|observation|>'
print_info: max token length = 1024
llm_load_tensors: ggml ctx size = 0.72 MiB
model has unused tensor blk.92.attn_norm.weight (size = 20480 bytes) -- ignoring
model has unused tensor blk.92.attn_q.weight (size = 66846720 bytes) -- ignoring
model has unused tensor blk.92.attn_k.weight (size = 5570560 bytes) -- ignoring
model has unused tensor blk.92.attn_v.weight (size = 5570560 bytes) -- ignoring
model has unused tensor blk.92.attn_q.bias (size = 49152 bytes) -- ignoring
model has unused tensor blk.92.attn_k.bias (size = 4096 bytes) -- ignoring
model has unused tensor blk.92.attn_v.bias (size = 4096 bytes) -- ignoring
model has unused tensor blk.92.attn_output.weight (size = 66846720 bytes) -- ignoring
model has unused tensor blk.92.attn_q_norm.weight (size = 512 bytes) -- ignoring
model has unused tensor blk.92.attn_k_norm.weight (size = 512 bytes) -- ignoring
model has unused tensor blk.92.post_attention_norm.weight (size = 20480 bytes) -- ignoring
model has unused tensor blk.92.ffn_gate_inp.weight (size = 3276800 bytes) -- ignoring
model has unused tensor blk.92.exp_probs_b.bias (size = 640 bytes) -- ignoring
model has unused tensor blk.92.ffn_gate_exps.weight (size = 1336934400 bytes) -- ignoring
model has unused tensor blk.92.ffn_down_exps.weight (size = 1336934400 bytes) -- ignoring
model has unused tensor blk.92.ffn_up_exps.weight (size = 1336934400 bytes) -- ignoring
model has unused tensor blk.92.ffn_gate_shexp.weight (size = 8355840 bytes) -- ignoring
model has unused tensor blk.92.ffn_down_shexp.weight (size = 8355840 bytes) -- ignoring
model has unused tensor blk.92.ffn_up_shexp.weight (size = 8355840 bytes) -- ignoring
model has unused tensor blk.92.nextn.eh_proj.weight (size = 55705600 bytes) -- ignoring
model has unused tensor blk.92.nextn.embed_tokens.weight (size = 824442880 bytes) -- ignoring
model has unused tensor blk.92.nextn.enorm.weight (size = 20480 bytes) -- ignoring
model has unused tensor blk.92.nextn.hnorm.weight (size = 20480 bytes) -- ignoring
model has unused tensor blk.92.nextn.shared_head_head.weight (size = 824442880 bytes) -- ignoring
model has unused tensor blk.92.nextn.shared_head_norm.weight (size = 20480 bytes) -- ignoring
llm_load_tensors: offloading 0 repeating layers to GPU
llm_load_tensors: offloaded 0/94 layers to GPU
llm_load_tensors: CPU buffer size = 357693.32 MiB
....................................................................................................
llama_new_context_with_model: n_ctx = 4096
llama_new_context_with_model: n_batch = 4096
llama_new_context_with_model: n_ubatch = 4096
llama_new_context_with_model: flash_attn = 1
llama_new_context_with_model: attn_max_b = 0
llama_new_context_with_model: fused_moe = 1
llama_new_context_with_model: grouped er = 0
llama_new_context_with_model: fused_up_gate = 1
llama_new_context_with_model: fused_mmad = 1
llama_new_context_with_model: rope_cache = 0
llama_new_context_with_model: graph_reuse = 0
llama_new_context_with_model: k_cache_hadam = 0
llama_new_context_with_model: split_mode_graph_scheduling = 0
llama_new_context_with_model: ser = -1, 0
llama_new_context_with_model: freq_base = 1000000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CPU KV buffer size = 1472.00 MiB
llama_new_context_with_model: KV self size = 1472.00 MiB, K (f16): 736.00 MiB, V (f16): 736.00 MiB
llama_new_context_with_model: CPU output buffer size = 4.63 MiB
llama_new_context_with_model: CPU compute buffer size = 2448.00 MiB
llama_new_context_with_model: graph nodes = 4094
llama_new_context_with_model: graph splits = 1
XXXXXXXXXXXXXXXXXXXXX Setting only active experts offload

system_info: n_threads = 96 (n_threads_batch = 128) / 512 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
perplexity: tokenizing the input ..
perplexity: tokenization took 360.715 ms
perplexity: calculating perplexity over 565 chunks, n_ctx=512, batch_size=4096, n_seq=8
perplexity: 25.21 seconds per pass - ETA 29.67 minutes
======================================= HAVE_FANCY_SIMD is defined
[1]2.8959,[2]3.6434,[3]2.8310,[4]2.5436,[5]2.5836,[6]2.7702,[7]2.8400,[8]2.8174,[9]2.9294,[10]2.8401,[11]2.8496,[12]3.0270,[13]3.0620,[14]3.0705,[15]3.2246,[16]3.3080,[17]3.4611,[18]3.6886,[19]3.6442,[20]3.6780,[21]3.7565,[22]3.7264,[23]3.6369,[24]3.5435,[25]3.4685,[26]3.4180,[27]3.3834,[28]3.4027,[29]3.4477,[30]3.5210,[31]3.5891,[32]3.6516,[33]3.7132,[34]3.7451,[35]3.8137,[36]3.8594,[37]3.8661,[38]3.9354,[39]3.9641,[40]4.0048,[41]4.0810,[42]4.1113,[43]4.1201,[44]4.1480,[45]4.2458,[46]4.3072,[47]4.2442,[48]4.1574,[49]4.0901,[50]4.0465,[51]4.0657,[52]4.0822,[53]4.1156,[54]4.1056,[55]4.1212,[56]4.1420,[57]4.1044,[58]4.1039,[59]4.0942,[60]4.1331,[61]4.1697,[62]4.2170,[63]4.2465,[64]4.2550,[65]4.2505,[66]4.2230,[67]4.1734,[68]4.1364,[69]4.1563,[70]4.1579,[71]4.1559,[72]4.1545,[73]4.1637,[74]4.1959,[75]4.2064,[76]4.1549,[77]4.1298,[78]4.1117,[79]4.0598,[80]4.0158,[81]4.0294,[82]4.0146,[83]4.0108,[84]4.0195,[85]3.9923,[86]3.9916,[87]3.9726,[88]3.9728,[89]3.9561,[90]3.9278,[91]3.8956,[92]3.9027,[93]3.9185,[94]3.9019,[95]3.9027,[96]3.9306,[97]3.9850,[98]3.9926,[99]3.9812,[100]3.9634,[101]3.9818,[102]3.9879,[103]4.0120,[104]4.0010,[105]4.0156,[106]4.0467,[107]4.1231,[108]4.1280,[109]4.1357,[110]4.1785,[111]4.2032,[112]4.1710,[113]4.1368,[114]4.1108,[115]4.0851,[116]4.0736,[117]4.0560,[118]4.0587,[119]4.0481,[120]4.0329,[121]4.0274,[122]4.0069,[123]3.9776,[124]3.9540,[125]3.9373,[126]3.9145,[127]3.9066,[128]3.8968,[129]3.8941,[130]3.8821,[131]3.8662,[132]3.8516,[133]3.8443,[134]3.8524,[135]3.8710,[136]3.8620,[137]3.8627,[138]3.8528,[139]3.8387,[140]3.8534,[141]3.8502,[142]3.8484,[143]3.8394,[144]3.8351,[145]3.8286,[146]3.8248,[147]3.8222,[148]3.8222,[149]3.8209,[150]3.8206,[151]3.8077,[152]3.7998,[153]3.8008,[154]3.7940,[155]3.7907,[156]3.7890,[157]3.7872,[158]3.7872,[159]3.8032,[160]3.8161,[161]3.8224,[162]3.8287,[163]3.8204,[164]3.8305,[165]3.8369,[166]3.8625,[167]3.8860,[168]3.8959,[169]3.9262,[170]3.9469,[171]3.9577,[172]3.9865,[173]3.9727,[174]3.9564,[175]3.9329,[176]3.9114,[177]3.8957,[178]3.8778,[179]3.8558,[180]3.8507,[181]3.8438,[182]3.8594,[183]3.8794,[184]3.9089,[185]3.9272,[186]3.9334,[187]3.9559,[188]3.9880,[189]4.0094,[190]4.0229,[191]4.0421,[192]4.0492,[193]4.0587,[194]4.0587,[195]4.0535,[196]4.0500,[197]4.0631,[198]4.0796,[199]4.0712,[200]4.0760,[201]4.0759,[202]4.0756,[203]4.0696,[204]4.0778,[205]4.0834,[206]4.0877,[207]4.0912,[208]4.0955,[209]4.0946,[210]4.0919,[211]4.0961,[212]4.0910,[213]4.0873,[214]4.0881,[215]4.0889,[216]4.0900,[217]4.0884,[218]4.0964,[219]4.0895,[220]4.0854,[221]4.0809,[222]4.0791,[223]4.0787,[224]4.0801,[225]4.0796,[226]4.0838,[227]4.0773,[228]4.0732,[229]4.0597,[230]4.0484,[231]4.0414,[232]4.0428,[233]4.0405,[234]4.0377,[235]4.0286,[236]4.0343,[237]4.0335,[238]4.0402,[239]4.0498,[240]4.0630,[241]4.0730,[242]4.0821,[243]4.0943,[244]4.1055,[245]4.1200,[246]4.1317,[247]4.1456,[248]4.1515,[249]4.1541,[250]4.1523,[251]4.1369,[252]4.1263,[253]4.1245,[254]4.1243,[255]4.1253,[256]4.1307,[257]4.1307,[258]4.1305,[259]4.1322,[260]4.1356,[261]4.1327,[262]4.1342,[263]4.1330,[264]4.1324,[265]4.1325,[266]4.1326,[267]4.1305,[268]4.1289,[269]4.1260,[270]4.1317,[271]4.1313,[272]4.1257,[273]4.1244,[274]4.1134,[275]4.1103,[276]4.0964,[277]4.0917,[278]4.0874,[279]4.0890,[280]4.0952,[281]4.0968,[282]4.1033,[283]4.1100,[284]4.1127,[285]4.1175,[286]4.1287,[287]4.1437,[288]4.1408,[289]4.1397,[290]4.1404,[291]4.1400,[292]4.1343,[293]4.1202,[294]4.1160,[295]4.1165,[296]4.1065,[297]4.0942,[298]4.0864,[299]4.0755,[300]4.0647,[301]4.0612,[302]4.0495,[303]4.0404,[304]4.0284,[305]4.0184,[30
6]4.0143,[307]4.0180,[308]4.0233,[309]4.0367,[310]4.0238,[311]4.0212,[312]4.0107,[313]4.0034,[314]3.9977,[315]3.9950,[316]3.9859,[317]3.9778,[318]3.9702,[319]3.9620,[320]3.9558,[321]3.9499,[322]3.9452,[323]3.9349,[324]3.9271,[325]3.9225,[326]3.9163,[327]3.9167,[328]3.9152,[329]3.9142,[330]3.9111,[331]3.9074,[332]3.9128,[333]3.9161,[334]3.9195,[335]3.9202,[336]3.9201,[337]3.9212,[338]3.9199,[339]3.9193,[340]3.9212,[341]3.9228,[342]3.9258,[343]3.9339,[344]3.9405,[345]3.9526,[346]3.9522,[347]3.9450,[348]3.9423,[349]3.9441,[350]3.9367,[351]3.9249,[352]3.9172,[353]3.9145,[354]3.9164,[355]3.9242,[356]3.9373,[357]3.9406,[358]3.9443,[359]3.9535,[360]3.9655,[361]3.9672,[362]3.9728,[363]3.9787,[364]3.9844,[365]3.9868,[366]3.9910,[367]3.9949,[368]4.0013,[369]4.0081,[370]4.0143,[371]4.0169,[372]4.0260,[373]4.0397,[374]4.0495,[375]4.0549,[376]4.0584,[377]4.0628,[378]4.0758,[379]4.0880,[380]4.0904,[381]4.0858,[382]4.0842,[383]4.0851,[384]4.0921,[385]4.0958,[386]4.0998,[387]4.1016,[388]4.1047,[389]4.1113,[390]4.1120,[391]4.1027,[392]4.0948,[393]4.0865,[394]4.0821,[395]4.0769,[396]4.0705,[397]4.0613,[398]4.0546,[399]4.0499,[400]4.0388,[401]4.0347,[402]4.0352,[403]4.0264,[404]4.0172,[405]4.0147,[406]4.0081,[407]3.9994,[408]3.9898,[409]3.9830,[410]3.9754,[411]3.9729,[412]3.9715,[413]3.9726,[414]3.9661,[415]3.9669,[416]3.9641,[417]3.9565,[418]3.9474,[419]3.9536,[420]3.9480,[421]3.9502,[422]3.9511,[423]3.9437,[424]3.9430,[425]3.9426,[426]3.9427,[427]3.9407,[428]3.9416,[429]3.9369,[430]3.9371,[431]3.9370,[432]3.9307,[433]3.9249,[434]3.9173,[435]3.9165,[436]3.9090,[437]3.9024,[438]3.8963,[439]3.8945,[440]3.8950,[441]3.8937,[442]3.8922,[443]3.8990,[444]3.9095,[445]3.9055,[446]3.9026,[447]3.9006,[448]3.8990,[449]3.9045,[450]3.9039,[451]3.9022,[452]3.9055,[453]3.9131,[454]3.9161,[455]3.9167,[456]3.9206,[457]3.9207,[458]3.9231,[459]3.9236,[460]3.9291,[461]3.9344,[462]3.9372,[463]3.9386,[464]3.9347,[465]3.9335,[466]3.9423,[467]3.9422,[468]3.9420,[469]3.9482,[470]3.9498,[471]3.9545,[472]3.9600,[473]3.9611,[474]3.9597,[475]3.9624,[476]3.9648,[477]3.9676,[478]3.9668,[479]3.9677,[480]3.9684,[481]3.9708,[482]3.9717,[483]3.9769,[484]3.9736,[485]3.9763,[486]3.9747,[487]3.9800,[488]3.9868,[489]3.9931,[490]3.9936,[491]3.9980,[492]4.0018,[493]4.0046,[494]4.0104,[495]4.0159,[496]4.0151,[497]4.0141,[498]4.0144,[499]4.0157,[500]4.0174,[501]4.0171,[502]4.0167,[503]4.0213,[504]4.0270,[505]4.0272,[506]4.0266,[507]4.0290,[508]4.0340,[509]4.0422,[510]4.0450,[511]4.0495,[512]4.0437,[513]4.0411,[514]4.0371,[515]4.0380,[516]4.0353,[517]4.0336,[518]4.0324,[519]4.0278,[520]4.0275,[521]4.0273,[522]4.0228,[523]4.0213,[524]4.0233,[525]4.0224,[526]4.0202,[527]4.0225,[528]4.0174,[529]4.0120,[530]4.0074,[531]4.0026,[532]4.0026,[533]4.0000,[534]3.9972,[535]3.9927,[536]3.9871,[537]3.9798,[538]3.9782,[539]3.9698,[540]3.9683,[541]3.9724,[542]3.9702,[543]3.9649,[544]3.9627,[545]3.9644,[546]3.9640,[547]3.9662,[548]3.9646,[549]3.9587,[550]3.9531,[551]3.9488,[552]3.9426,[553]3.9391,[554]3.9354,[555]3.9293,[556]3.9236,[557]3.9196,[558]3.9225,[559]3.9208,[560]3.9187,[561]3.9201,[562]3.9244,[563]3.9297,[564]3.9335,[565]3.9320,
llama_print_timings: load time = 137402.15 ms
llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
llama_print_timings: prompt eval time = 1530320.65 ms / 289280 tokens ( 5.29 ms per token, 189.03 tokens per second)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
llama_print_timings: total time = 1541349.05 ms / 289281 tokens

Final estimate: PPL over 565 chunks for n_ctx=512 = 3.9320 +/- 0.02428
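The summary numbers quoted in the README can be pulled straight back out of these logs; a hypothetical convenience for doing so (run from the repo root, using the filenames added in this commit):

```bash
# Print the final PPL line from each baseline log.
for f in logs/perplexity-GLM-4.7-BF16.log logs/perplexity-GLM-4.7-Q8_0.log; do
    printf '%s\n  %s\n' "$f" "$(grep 'Final estimate:' "$f")"
done
```

Run as-is, this reproduces the two Final estimate lines that back the BF16 and Q8_0 entries in the README.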