upload imatrix log file has cossim per layer
- README.md +2 -2
- logs/imatrix-GLM-4.7-BF16.log +667 -0
README.md CHANGED
@@ -19,12 +19,12 @@ Currently cooking this now!
 
 - [x] download bf16 safetensors https://huggingface.co/zai-org/GLM-4.7
 - [x] use llama.cpp/convert_hf_to_gguf.py to create bf16 GGUF
-- [
+- [x] calculate imatrix and upload to HF first so others can use as desired
 - [ ] cook Q8_0 and test perplexity of BF16 and Q8_0 for baseline data
 - [ ] cook IQ5_K with full q8_0 attn/shexp/first 3 dense layers and test
 - [ ] upload IQ5_K if all looking good
 - [ ] continue with smaller quants
-- [ ]
+- [ ] check if any folks open discussions with desired RAM/VRAM breakpoints
 
 ## `ik_llama.cpp` imatrix Quantizations of zai-org/GLM-4.7
 *NOTE* `ik_llama.cpp` can also run your existing GGUFs from bartowski, unsloth, mradermacher, etc if you want to try it out before downloading my quants.
logs/imatrix-GLM-4.7-BF16.log ADDED
@@ -0,0 +1,667 @@
model=/mnt/data/models/ubergarm/GLM-4.7-GGUF/GLM-160x21B-4.7-BF16-00001-of-00015.gguf

numactl -N ${SOCKET} -m ${SOCKET} \
./build/bin/llama-imatrix \
--model "$model"\
-f ubergarm-imatrix-calibration-corpus-v02.txt \
-o /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat \
--no-fused-moe \
--no-fused-up-gate \
--no-fused-mul-multiadd \
--ctx-size 512 \
-ub 4096 -b 4096 \
--threads 96 \
--threads-batch 128 \
--no-mmap \
--numa numactl \
--verbosity 1 \
--layer-similarity

CPU: using device CPU - 0 MiB free
llama_model_loader: additional 14 GGUFs metadata loaded.
llama_model_loader: loaded meta data with 49 key-value pairs and 1761 tensors from /mnt/data/models/ubergarm/GLM-4.7-GGUF/GLM-160x21B-4.7-BF16-00001-of-00015.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = glm4moe
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.sampling.temp f32 = 1.000000
llama_model_loader: - kv 3: general.name str = GLM 4.7
llama_model_loader: - kv 4: general.version str = 4.7
llama_model_loader: - kv 5: general.basename str = GLM
llama_model_loader: - kv 6: general.size_label str = 160x21B
llama_model_loader: - kv 7: general.license str = mit
llama_model_loader: - kv 8: general.tags arr[str,1] = ["text-generation"]
llama_model_loader: - kv 9: general.languages arr[str,2] = ["en", "zh"]
llama_model_loader: - kv 10: glm4moe.block_count u32 = 93
llama_model_loader: - kv 11: glm4moe.context_length u32 = 202752
llama_model_loader: - kv 12: glm4moe.embedding_length u32 = 5120
llama_model_loader: - kv 13: glm4moe.feed_forward_length u32 = 12288
llama_model_loader: - kv 14: glm4moe.attention.head_count u32 = 96
llama_model_loader: - kv 15: glm4moe.attention.head_count_kv u32 = 8
llama_model_loader: - kv 16: glm4moe.rope.freq_base f32 = 1000000.000000
llama_model_loader: - kv 17: glm4moe.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 18: glm4moe.expert_used_count u32 = 8
llama_model_loader: - kv 19: glm4moe.expert_group_count u32 = 1
llama_model_loader: - kv 20: glm4moe.expert_group_used_count u32 = 1
llama_model_loader: - kv 21: glm4moe.attention.key_length u32 = 128
llama_model_loader: - kv 22: glm4moe.attention.value_length u32 = 128
llama_model_loader: - kv 23: general.file_type u32 = 32
llama_model_loader: - kv 24: glm4moe.rope.dimension_count u32 = 64
llama_model_loader: - kv 25: glm4moe.expert_count u32 = 160
llama_model_loader: - kv 26: glm4moe.expert_feed_forward_length u32 = 1536
llama_model_loader: - kv 27: glm4moe.expert_shared_count u32 = 1
llama_model_loader: - kv 28: glm4moe.leading_dense_block_count u32 = 3
llama_model_loader: - kv 29: glm4moe.expert_gating_func u32 = 2
llama_model_loader: - kv 30: glm4moe.expert_weights_scale f32 = 2.500000
llama_model_loader: - kv 31: glm4moe.expert_weights_norm bool = true
llama_model_loader: - kv 32: glm4moe.nextn_predict_layers u32 = 1
llama_model_loader: - kv 33: general.quantization_version u32 = 2
llama_model_loader: - kv 34: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 35: tokenizer.ggml.pre str = glm4
llama_model_loader: - kv 36: tokenizer.ggml.tokens arr[str,151552] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 37: tokenizer.ggml.token_type arr[i32,151552] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 38: tokenizer.ggml.merges arr[str,318088] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv 39: tokenizer.ggml.eos_token_id u32 = 151329
llama_model_loader: - kv 40: tokenizer.ggml.padding_token_id u32 = 151329
llama_model_loader: - kv 41: tokenizer.ggml.bos_token_id u32 = 151331
llama_model_loader: - kv 42: tokenizer.ggml.eot_token_id u32 = 151336
llama_model_loader: - kv 43: tokenizer.ggml.unknown_token_id u32 = 151329
llama_model_loader: - kv 44: tokenizer.ggml.eom_token_id u32 = 151338
llama_model_loader: - kv 45: tokenizer.chat_template str = [gMASK]<sop>\n{%- if tools -%}\n<|syste...
llama_model_loader: - kv 46: split.no u16 = 0
llama_model_loader: - kv 47: split.count u16 = 15
llama_model_loader: - kv 48: split.tensors.count i32 = 1761
llama_model_loader: - type f32: 835 tensors
llama_model_loader: - type bf16: 926 tensors
load: special_eot_id is not in special_eog_ids - the tokenizer config may be incorrect
load: special_eom_id is not in special_eog_ids - the tokenizer config may be incorrect
load: printing all EOG tokens:
load: - 151329 ('<|endoftext|>')
load: - 151336 ('<|user|>')
load: - 151338 ('<|observation|>')
load: special tokens cache size = 36
load: token to piece cache size = 0.9713 MB
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = glm4moe
llm_load_print_meta: n_ctx_train = 202752
llm_load_print_meta: n_embd = 5120
llm_load_print_meta: n_layer = 93
llm_load_print_meta: n_head = 96
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_rot = 64
llm_load_print_meta: n_swa = 0
llm_load_print_meta: n_swa_pattern = 1
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 12
llm_load_print_meta: n_embd_k_gqa = 1024
llm_load_print_meta: n_embd_v_gqa = 1024
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 12288
llm_load_print_meta: n_expert = 160
llm_load_print_meta: n_expert_used = 8
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 2
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 1000000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn = 202752
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = 355B.A32B
llm_load_print_meta: model ftype = BF16
llm_load_print_meta: model params = 358.338 B
llm_load_print_meta: model size = 667.598 GiB (16.003 BPW)
llm_load_print_meta: repeating layers = 664.707 GiB (16.003 BPW, 356.786 B parameters)
llm_load_print_meta: general.name = GLM 4.7
print_info: vocab type = BPE
print_info: n_vocab = 151552
print_info: n_merges = 318088
print_info: BOS token = 151331 '[gMASK]'
print_info: EOS token = 151329 '<|endoftext|>'
print_info: EOT token = 151336 '<|user|>'
print_info: EOM token = 151338 '<|observation|>'
print_info: UNK token = 151329 '<|endoftext|>'
print_info: PAD token = 151329 '<|endoftext|>'
print_info: LF token = 198 'Ċ'
print_info: FIM PRE token = 151347 '<|code_prefix|>'
print_info: FIM SUF token = 151349 '<|code_suffix|>'
print_info: FIM MID token = 151348 '<|code_middle|>'
print_info: EOG token = 151329 '<|endoftext|>'
print_info: EOG token = 151336 '<|user|>'
print_info: EOG token = 151338 '<|observation|>'
print_info: max token length = 1024
llm_load_tensors: ggml ctx size = 0.72 MiB
model has unused tensor blk.92.attn_norm.weight (size = 20480 bytes) -- ignoring
model has unused tensor blk.92.attn_q.weight (size = 125829120 bytes) -- ignoring
model has unused tensor blk.92.attn_k.weight (size = 10485760 bytes) -- ignoring
model has unused tensor blk.92.attn_v.weight (size = 10485760 bytes) -- ignoring
model has unused tensor blk.92.attn_q.bias (size = 49152 bytes) -- ignoring
model has unused tensor blk.92.attn_k.bias (size = 4096 bytes) -- ignoring
model has unused tensor blk.92.attn_v.bias (size = 4096 bytes) -- ignoring
model has unused tensor blk.92.attn_output.weight (size = 125829120 bytes) -- ignoring
model has unused tensor blk.92.attn_q_norm.weight (size = 512 bytes) -- ignoring
model has unused tensor blk.92.attn_k_norm.weight (size = 512 bytes) -- ignoring
model has unused tensor blk.92.post_attention_norm.weight (size = 20480 bytes) -- ignoring
model has unused tensor blk.92.ffn_gate_inp.weight (size = 3276800 bytes) -- ignoring
model has unused tensor blk.92.exp_probs_b.bias (size = 640 bytes) -- ignoring
model has unused tensor blk.92.ffn_gate_exps.weight (size = 2516582400 bytes) -- ignoring
model has unused tensor blk.92.ffn_down_exps.weight (size = 2516582400 bytes) -- ignoring
model has unused tensor blk.92.ffn_up_exps.weight (size = 2516582400 bytes) -- ignoring
model has unused tensor blk.92.ffn_gate_shexp.weight (size = 15728640 bytes) -- ignoring
model has unused tensor blk.92.ffn_down_shexp.weight (size = 15728640 bytes) -- ignoring
model has unused tensor blk.92.ffn_up_shexp.weight (size = 15728640 bytes) -- ignoring
model has unused tensor blk.92.nextn.eh_proj.weight (size = 104857600 bytes) -- ignoring
model has unused tensor blk.92.nextn.embed_tokens.weight (size = 1551892480 bytes) -- ignoring
model has unused tensor blk.92.nextn.enorm.weight (size = 20480 bytes) -- ignoring
model has unused tensor blk.92.nextn.hnorm.weight (size = 20480 bytes) -- ignoring
model has unused tensor blk.92.nextn.shared_head_head.weight (size = 1551892480 bytes) -- ignoring
model has unused tensor blk.92.nextn.shared_head_norm.weight (size = 20480 bytes) -- ignoring
llm_load_tensors: offloading 0 repeating layers to GPU
llm_load_tensors: offloaded 0/94 layers to GPU
llm_load_tensors: CPU buffer size = 673051.91 MiB
....................................................................................................
llama_new_context_with_model: n_ctx = 512
llama_new_context_with_model: n_batch = 512
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: flash_attn = 1
llama_new_context_with_model: attn_max_b = 0
llama_new_context_with_model: fused_moe = 0
llama_new_context_with_model: grouped er = 0
llama_new_context_with_model: fused_up_gate = 0
llama_new_context_with_model: fused_mmad = 0
llama_new_context_with_model: rope_cache = 0
llama_new_context_with_model: graph_reuse = 0
llama_new_context_with_model: k_cache_hadam = 0
llama_new_context_with_model: split_mode_graph_scheduling = 0
llama_new_context_with_model: ser = -1, 0
llama_new_context_with_model: freq_base = 1000000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CPU KV buffer size = 184.00 MiB
llama_new_context_with_model: KV self size = 184.00 MiB, K (f16): 92.00 MiB, V (f16): 92.00 MiB
llama_new_context_with_model: CPU output buffer size = 0.58 MiB
llama_new_context_with_model: CPU compute buffer size = 306.00 MiB
llama_new_context_with_model: graph nodes = 4634
llama_new_context_with_model: graph splits = 1
XXXXXXXXXXXXXXXXXXXXX Setting only active experts offload

| 195 |
+
system_info: n_threads = 96 (n_threads_batch = 128) / 512 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
|
| 196 |
+
compute_imatrix: tokenizing the input ..
|
| 197 |
+
compute_imatrix: tokenization took 508.555 ms
|
| 198 |
+
compute_imatrix: computing over 814 chunks with batch_size 512
|
| 199 |
+
compute_imatrix: 9.95 seconds per pass - ETA 2 hours 15.02 minutes
|
| 200 |
+
======================================= HAVE_FANCY_SIMD is defined
|
| 201 |
+
[1]17.5129,[2]6.9568,[3]4.5205,[4]3.2674,[5]2.6460,[6]2.2556,[7]2.0217,[8]1.8697,[9]1.8579,
|
| 202 |
+
save_imatrix: entry ' blk.73.ffn_gate_exps.weight' has partial data (98.75%) 2 out of 160 experts are missing data Storing **but be aware**
|
| 203 |
+
save_imatrix: entry ' blk.73.ffn_up_exps.weight' has partial data (98.75%) 2 out of 160 experts are missing data Storing **but be aware**
|
| 204 |
+
save_imatrix: entry ' blk.56.ffn_down_exps.weight' has partial data (99.38%) 1 out of 160 experts are missing data Storing **but be aware**
|
| 205 |
+
save_imatrix: entry ' blk.56.ffn_gate_exps.weight' has partial data (99.38%) 1 out of 160 experts are missing data Storing **but be aware**
|
| 206 |
+
save_imatrix: entry ' blk.56.ffn_up_exps.weight' has partial data (99.38%) 1 out of 160 experts are missing data Storing **but be aware**
|
| 207 |
+
save_imatrix: entry ' blk.48.ffn_gate_exps.weight' has partial data (99.38%) 1 out of 160 experts are missing data Storing **but be aware**
|
| 208 |
+
save_imatrix: entry ' blk.6.ffn_gate_exps.weight' has partial data (99.38%) 1 out of 160 experts are missing data Storing **but be aware**
|
| 209 |
+
save_imatrix: entry ' blk.6.ffn_down_exps.weight' has partial data (99.38%) 1 out of 160 experts are missing data Storing **but be aware**
|
| 210 |
+
save_imatrix: entry ' blk.6.ffn_up_exps.weight' has partial data (99.38%) 1 out of 160 experts are missing data Storing **but be aware**
|
| 211 |
+
save_imatrix: entry ' blk.73.ffn_down_exps.weight' has partial data (98.75%) 2 out of 160 experts are missing data Storing **but be aware**
|
| 212 |
+
save_imatrix: entry ' blk.48.ffn_down_exps.weight' has partial data (99.38%) 1 out of 160 experts are missing data Storing **but be aware**
|
| 213 |
+
save_imatrix: entry ' blk.48.ffn_up_exps.weight' has partial data (99.38%) 1 out of 160 experts are missing data Storing **but be aware**
|
| 214 |
+
|
| 215 |
+
save_imatrix: stored collected data after 10 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
|
| 216 |
+
[10]1.7776,[11]1.8889,[12]1.9858,[13]2.0575,[14]2.1264,[15]2.0255,[16]1.9435,[17]1.8890,[18]1.8325,[19]1.7753,
|
| 217 |
+
save_imatrix: stored collected data after 20 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
|
| 218 |
+
[20]1.7407,[21]1.6968,[22]1.6679,[23]1.6358,[24]1.6059,[25]1.5762,[26]1.6561,[27]1.7531,[28]1.8685,[29]1.8406,
|
| 219 |
+
save_imatrix: stored collected data after 30 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
|
| 220 |
+
[30]1.8213,[31]1.8366,[32]1.8313,[33]1.9053,[34]1.8840,[35]1.8786,[36]1.8690,[37]1.8635,[38]1.8952,[39]1.9113,
|
| 221 |
+
save_imatrix: stored collected data after 40 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
|
| 222 |
+
[40]1.9002,[41]1.9285,[42]1.9373,[43]1.9526,[44]1.9645,[45]1.9712,[46]1.9569,[47]1.9671,[48]1.9661,[49]1.9672,
|
| 223 |
+
save_imatrix: stored collected data after 50 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
|
| 224 |
+
[50]1.9573,[51]1.9761,[52]1.9892,[53]1.9766,[54]1.9847,[55]1.9872,[56]1.9924,[57]1.9852,[58]2.0341,[59]2.0846,
|
| 225 |
+
save_imatrix: stored collected data after 60 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
|
| 226 |
+
[60]2.1336,[61]2.1480,[62]2.1927,[63]2.2231,[64]2.2163,[65]2.2162,[66]2.2193,[67]2.2055,[68]2.2219,[69]2.2626,
|
| 227 |
+
save_imatrix: stored collected data after 70 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
|
| 228 |
+
[70]2.3155,[71]2.3455,[72]2.3846,[73]2.4166,[74]2.4360,[75]2.4648,[76]2.4808,[77]2.5090,[78]2.5058,[79]2.4882,
|
| 229 |
+
save_imatrix: stored collected data after 80 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
|
| 230 |
+
[80]2.4855,[81]2.4882,[82]2.5159,[83]2.5581,[84]2.5770,[85]2.5819,[86]2.5855,[87]2.5763,[88]2.5781,[89]2.5664,
|
| 231 |
+
save_imatrix: stored collected data after 90 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
|
| 232 |
+
[90]2.5561,[91]2.5523,[92]2.5356,[93]2.5154,[94]2.5440,[95]2.5928,[96]2.6132,[97]2.6158,[98]2.6236,[99]2.6440,
|
| 233 |
+
save_imatrix: stored collected data after 100 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
|
| 234 |
+
[100]2.6613,[101]2.6685,[102]2.6709,[103]2.7057,[104]2.7300,[105]2.7232,[106]2.7660,[107]2.8099,[108]2.8406,[109]2.8811,
|
| 235 |
+
save_imatrix: stored collected data after 110 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
|
| 236 |
+
[110]2.9123,[111]2.9467,[112]2.9798,[113]2.9740,[114]2.9899,[115]3.0050,[116]3.0137,[117]3.0245,[118]3.0569,[119]3.0943,
|
| 237 |
+
save_imatrix: stored collected data after 120 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
|
| 238 |
+
[120]3.1353,[121]3.1299,[122]3.1032,[123]3.0869,[124]3.1069,[125]3.0955,[126]3.0717,[127]3.0709,[128]3.0689,[129]3.0750,
|
| 239 |
+
save_imatrix: stored collected data after 130 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
|
| 240 |
+
[130]3.0827,[131]3.0998,[132]3.1160,[133]3.1226,[134]3.1614,[135]3.1798,[136]3.1540,[137]3.1290,[138]3.1061,[139]3.0821,
|
| 241 |
+
save_imatrix: stored collected data after 140 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
|
| 242 |
+
[140]3.0908,[141]3.1033,[142]3.1438,[143]3.1734,[144]3.1791,[145]3.2029,[146]3.2291,[147]3.2515,[148]3.2845,[149]3.3145,
|
| 243 |
+
save_imatrix: stored collected data after 150 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
|
| 244 |
+
[150]3.3451,[151]3.3640,[152]3.3849,[153]3.4019,[154]3.4113,[155]3.4073,[156]3.4248,[157]3.4352,[158]3.4461,[159]3.4588,
|
| 245 |
+
save_imatrix: stored collected data after 160 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
|
| 246 |
+
[160]3.4724,[161]3.4748,[162]3.4794,[163]3.4943,[164]3.4998,[165]3.5079,[166]3.5212,[167]3.5227,[168]3.5252,[169]3.5323,
|
| 247 |
+
save_imatrix: stored collected data after 170 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
|
| 248 |
+
[170]3.5418,[171]3.5468,[172]3.5522,[173]3.5591,[174]3.5774,[175]3.5892,[176]3.5948,[177]3.6013,[178]3.6183,[179]3.6066,
|
| 249 |
+
save_imatrix: stored collected data after 180 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
|
| 250 |
+
[180]3.6147,[181]3.6290,[182]3.6530,[183]3.6691,[184]3.6754,[185]3.6775,[186]3.6758,[187]3.6737,[188]3.6741,[189]3.6748,
|
| 251 |
+
save_imatrix: stored collected data after 190 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
|
| 252 |
+
[190]3.6751,[191]3.6712,[192]3.6937,[193]3.7245,[194]3.7498,[195]3.7776,[196]3.7992,[197]3.8353,[198]3.8457,[199]3.8632,
|
| 253 |
+
save_imatrix: stored collected data after 200 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
|
| 254 |
+
[200]3.8543,[201]3.8694,[202]3.8607,[203]3.8371,[204]3.8146,[205]3.8352,[206]3.8499,[207]3.8590,[208]3.8685,[209]3.8886,
|
| 255 |
+
save_imatrix: stored collected data after 210 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
|
| 256 |
+
[210]3.9044,[211]3.9209,[212]3.9405,[213]3.9567,[214]3.9581,[215]3.9352,[216]3.9111,[217]3.8873,[218]3.8636,[219]3.8407,
|
| 257 |
+
save_imatrix: stored collected data after 220 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
|
| 258 |
+
[220]3.8241,[221]3.8216,[222]3.8120,[223]3.8085,[224]3.7952,[225]3.7765,[226]3.7761,[227]3.7828,[228]3.8044,[229]3.8287,
|
| 259 |
+
save_imatrix: stored collected data after 230 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
|
| 260 |
+
[230]3.8397,[231]3.8627,[232]3.8583,[233]3.8832,[234]3.9142,[235]3.9274,[236]3.9423,[237]3.9476,[238]3.9719,[239]4.0009,
|
| 261 |
+
save_imatrix: stored collected data after 240 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
|
| 262 |
+
[240]3.9975,[241]4.0078,[242]4.0223,[243]4.0432,[244]4.0635,[245]4.0782,[246]4.0913,[247]4.1018,[248]4.0917,[249]4.1182,
|
| 263 |
+
save_imatrix: stored collected data after 250 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
|
| 264 |
+
[250]4.1322,[251]4.1512,[252]4.1620,[253]4.1670,[254]4.1736,[255]4.1769,[256]4.1893,[257]4.1941,[258]4.2055,[259]4.2207,
|
| 265 |
+
save_imatrix: stored collected data after 260 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
|
| 266 |
+
[260]4.2305,[261]4.2418,[262]4.2543,[263]4.2700,[264]4.2824,[265]4.2997,[266]4.2846,[267]4.2893,[268]4.2945,[269]4.3088,
|
| 267 |
+
save_imatrix: stored collected data after 270 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
|
| 268 |
+
[270]4.3304,[271]4.3455,[272]4.3672,[273]4.3680,[274]4.3670,[275]4.3766,[276]4.3829,[277]4.3999,[278]4.4148,[279]4.4279,
|
| 269 |
+
save_imatrix: stored collected data after 280 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
|
| 270 |
+
[280]4.4372,[281]4.4395,[282]4.4538,[283]4.4654,[284]4.4684,[285]4.4848,[286]4.4863,[287]4.4904,[288]4.4993,[289]4.4958,
|
| 271 |
+
save_imatrix: stored collected data after 290 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
|
| 272 |
+
[290]4.5076,[291]4.5134,[292]4.5196,[293]4.5372,[294]4.5508,[295]4.5653,[296]4.5830,[297]4.5879,[298]4.6079,[299]4.6212,
|
| 273 |
+
save_imatrix: stored collected data after 300 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
|
| 274 |
+
[300]4.6382,[301]4.6506,[302]4.6647,[303]4.6700,[304]4.6895,[305]4.6978,[306]4.7023,[307]4.7108,[308]4.7293,[309]4.7394,
|
| 275 |
+
save_imatrix: stored collected data after 310 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
|
| 276 |
+
[310]4.7444,[311]4.7529,[312]4.7618,[313]4.7761,[314]4.7839,[315]4.7935,[316]4.8056,[317]4.8184,[318]4.8331,[319]4.8379,
|
| 277 |
+
save_imatrix: stored collected data after 320 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
|
| 278 |
+
[320]4.8422,[321]4.8356,[322]4.8464,[323]4.8297,[324]4.8477,[325]4.8512,[326]4.8283,[327]4.8413,[328]4.8518,[329]4.8579,
|
| 279 |
+
save_imatrix: stored collected data after 330 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
|
| 280 |
+
[330]4.8640,[331]4.8633,[332]4.8671,[333]4.8863,[334]4.8834,[335]4.8949,[336]4.9110,[337]4.9204,[338]4.9253,[339]4.9127,
|
| 281 |
+
save_imatrix: stored collected data after 340 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
|
| 282 |
+
[340]4.9237,[341]4.9406,[342]4.9567,[343]4.9745,[344]4.9973,[345]5.0267,[346]5.0290,[347]5.0303,[348]5.0331,[349]5.0414,
|
| 283 |
+
save_imatrix: stored collected data after 350 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
|
| 284 |
+
[350]5.0554,[351]5.0758,[352]5.0761,[353]5.0728,[354]5.0839,[355]5.0801,[356]5.0811,[357]5.0798,[358]5.0753,[359]5.0797,
|
| 285 |
+
save_imatrix: stored collected data after 360 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
|
| 286 |
+
[360]5.0912,[361]5.0878,[362]5.0861,[363]5.0686,[364]5.0505,[365]5.0334,[366]5.0192,[367]4.9996,[368]4.9826,[369]4.9654,
|
| 287 |
+
save_imatrix: stored collected data after 370 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
|
| 288 |
+
[370]4.9512,[371]4.9353,[372]4.9188,[373]4.9057,[374]4.8921,[375]4.8736,[376]4.8610,[377]4.8467,[378]4.8301,[379]4.8157,
|
| 289 |
+
save_imatrix: stored collected data after 380 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
|
| 290 |
+
[380]4.8141,[381]4.7993,[382]4.7929,[383]4.7967,[384]4.7841,[385]4.7779,[386]4.7666,[387]4.7479,[388]4.7307,[389]4.7229,
|
| 291 |
+
save_imatrix: stored collected data after 390 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
|
| 292 |
+
[390]4.7128,[391]4.6979,[392]4.6796,[393]4.6613,[394]4.6594,[395]4.6573,[396]4.6530,[397]4.6422,[398]4.6436,[399]4.6429,
|
| 293 |
+
save_imatrix: stored collected data after 400 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
|
| 294 |
+
[400]4.6261,[401]4.6110,[402]4.6038,[403]4.5899,[404]4.5786,[405]4.5680,[406]4.5586,[407]4.5421,[408]4.5262,[409]4.5115,
|
| 295 |
+
save_imatrix: stored collected data after 410 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
|
| 296 |
+
[410]4.4991,[411]4.4877,[412]4.4820,[413]4.4730,[414]4.4690,[415]4.4643,[416]4.4624,[417]4.4571,[418]4.4519,[419]4.4374,
save_imatrix: stored collected data after 420 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
[420]4.4230,[421]4.4077,[422]4.3944,[423]4.3803,[424]4.3684,[425]4.3546,[426]4.3397,[427]4.3292,[428]4.3145,[429]4.3076,
save_imatrix: stored collected data after 430 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
[430]4.2946,[431]4.2847,[432]4.2735,[433]4.2636,[434]4.2620,[435]4.2610,[436]4.2546,[437]4.2443,[438]4.2379,[439]4.2240,
save_imatrix: stored collected data after 440 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
[440]4.2110,[441]4.1987,[442]4.1868,[443]4.1755,[444]4.1721,[445]4.1629,[446]4.1593,[447]4.1535,[448]4.1430,[449]4.1400,
save_imatrix: stored collected data after 450 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
[450]4.1326,[451]4.1249,[452]4.1137,[453]4.1065,[454]4.0994,[455]4.0910,[456]4.0787,[457]4.0669,[458]4.0547,[459]4.0430,
save_imatrix: stored collected data after 460 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
[460]4.0317,[461]4.0223,[462]4.0129,[463]4.0069,[464]3.9991,[465]3.9951,[466]3.9893,[467]3.9839,[468]3.9785,[469]3.9728,
save_imatrix: stored collected data after 470 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
[470]3.9673,[471]3.9618,[472]3.9564,[473]3.9517,[474]3.9461,[475]3.9405,[476]3.9357,[477]3.9303,[478]3.9250,[479]3.9215,
save_imatrix: stored collected data after 480 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
[480]3.9109,[481]3.9015,[482]3.8973,[483]3.8903,[484]3.8828,[485]3.8726,[486]3.8630,[487]3.8537,[488]3.8443,[489]3.8383,
save_imatrix: stored collected data after 490 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
[490]3.8310,[491]3.8243,[492]3.8204,[493]3.8151,[494]3.8084,[495]3.8004,[496]3.7998,[497]3.7966,[498]3.7917,[499]3.7900,
save_imatrix: stored collected data after 500 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
[500]3.7876,[501]3.7866,[502]3.7874,[503]3.7902,[504]3.7887,[505]3.7828,[506]3.7748,[507]3.7788,[508]3.7894,[509]3.7982,
save_imatrix: stored collected data after 510 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
[510]3.8064,[511]3.8136,[512]3.8212,[513]3.8257,[514]3.8295,[515]3.8312,[516]3.8390,[517]3.8421,[518]3.8486,[519]3.8575,
save_imatrix: stored collected data after 520 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
[520]3.8708,[521]3.8874,[522]3.9011,[523]3.8995,[524]3.9063,[525]3.9102,[526]3.9165,[527]3.9179,[528]3.9201,[529]3.9289,
save_imatrix: stored collected data after 530 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
[530]3.9341,[531]3.9355,[532]3.9425,[533]3.9482,[534]3.9554,[535]3.9553,[536]3.9550,[537]3.9558,[538]3.9602,[539]3.9650,
save_imatrix: stored collected data after 540 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
[540]3.9698,[541]3.9741,[542]3.9765,[543]3.9788,[544]3.9835,[545]3.9886,[546]3.9973,[547]4.0056,[548]4.0122,[549]4.0208,
save_imatrix: stored collected data after 550 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
[550]4.0278,[551]4.0358,[552]4.0422,[553]4.0482,[554]4.0550,[555]4.0610,[556]4.0581,[557]4.0553,[558]4.0519,[559]4.0563,
save_imatrix: stored collected data after 560 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
[560]4.0622,[561]4.0664,[562]4.0717,[563]4.0722,[564]4.0769,[565]4.0773,[566]4.0819,[567]4.0827,[568]4.0828,[569]4.0823,
save_imatrix: stored collected data after 570 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
[570]4.0830,[571]4.0859,[572]4.0823,[573]4.0797,[574]4.0752,[575]4.0715,[576]4.0642,[577]4.0590,[578]4.0525,[579]4.0453,
save_imatrix: stored collected data after 580 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
[580]4.0425,[581]4.0443,[582]4.0423,[583]4.0433,[584]4.0410,[585]4.0407,[586]4.0404,[587]4.0377,[588]4.0320,[589]4.0325,
save_imatrix: stored collected data after 590 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
[590]4.0294,[591]4.0221,[592]4.0155,[593]4.0081,[594]4.0022,[595]3.9988,[596]3.9974,[597]3.9952,[598]3.9942,[599]3.9917,
save_imatrix: stored collected data after 600 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
[600]3.9871,[601]3.9813,[602]3.9814,[603]3.9815,[604]3.9813,[605]3.9772,[606]3.9751,[607]3.9720,[608]3.9753,[609]3.9744,
save_imatrix: stored collected data after 610 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
[610]3.9720,[611]3.9726,[612]3.9723,[613]3.9676,[614]3.9607,[615]3.9530,[616]3.9455,[617]3.9375,[618]3.9301,[619]3.9224,
save_imatrix: stored collected data after 620 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
[620]3.9147,[621]3.9061,[622]3.8977,[623]3.8901,[624]3.8827,[625]3.8750,[626]3.8686,[627]3.8608,[628]3.8540,[629]3.8483,
save_imatrix: stored collected data after 630 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
[630]3.8414,[631]3.8345,[632]3.8297,[633]3.8220,[634]3.8178,[635]3.8159,[636]3.8125,[637]3.8052,[638]3.7995,[639]3.7934,
save_imatrix: stored collected data after 640 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
[640]3.7862,[641]3.7807,[642]3.7742,[643]3.7684,[644]3.7622,[645]3.7555,[646]3.7488,[647]3.7428,[648]3.7423,[649]3.7357,
save_imatrix: stored collected data after 650 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
[650]3.7289,[651]3.7222,[652]3.7158,[653]3.7091,[654]3.7022,[655]3.6956,[656]3.6892,[657]3.6834,[658]3.6769,[659]3.6795,
save_imatrix: stored collected data after 660 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
[660]3.6798,[661]3.6828,[662]3.6807,[663]3.6746,[664]3.6705,[665]3.6650,[666]3.6584,[667]3.6531,[668]3.6477,[669]3.6427,
save_imatrix: stored collected data after 670 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
[670]3.6377,[671]3.6320,[672]3.6260,[673]3.6203,[674]3.6165,[675]3.6115,[676]3.6057,[677]3.6006,[678]3.5948,[679]3.5888,
save_imatrix: stored collected data after 680 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
[680]3.5861,[681]3.5802,[682]3.5752,[683]3.5704,[684]3.5649,[685]3.5604,[686]3.5584,[687]3.5571,[688]3.5532,[689]3.5485,
save_imatrix: stored collected data after 690 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
[690]3.5423,[691]3.5359,[692]3.5305,[693]3.5249,[694]3.5211,[695]3.5185,[696]3.5168,[697]3.5141,[698]3.5125,[699]3.5101,
save_imatrix: stored collected data after 700 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
[700]3.5082,[701]3.5068,[702]3.5052,[703]3.5033,[704]3.5014,[705]3.4998,[706]3.4983,[707]3.4959,[708]3.4946,[709]3.4925,
save_imatrix: stored collected data after 710 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
[710]3.4907,[711]3.4886,[712]3.4894,[713]3.4891,[714]3.4893,[715]3.4904,[716]3.4915,[717]3.4923,[718]3.4931,[719]3.4948,
save_imatrix: stored collected data after 720 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
[720]3.4969,[721]3.4973,[722]3.4981,[723]3.4991,[724]3.5006,[725]3.5017,[726]3.5034,[727]3.5048,[728]3.5068,[729]3.5068,
save_imatrix: stored collected data after 730 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
[730]3.5070,[731]3.5082,[732]3.5111,[733]3.5122,[734]3.5126,[735]3.5127,[736]3.5141,[737]3.5162,[738]3.5169,[739]3.5198,
save_imatrix: stored collected data after 740 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
[740]3.5214,[741]3.5233,[742]3.5248,[743]3.5255,[744]3.5255,[745]3.5267,[746]3.5283,[747]3.5298,[748]3.5312,[749]3.5323,
save_imatrix: stored collected data after 750 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
[750]3.5335,[751]3.5345,[752]3.5365,[753]3.5398,[754]3.5405,[755]3.5417,[756]3.5434,[757]3.5449,[758]3.5457,[759]3.5472,
save_imatrix: stored collected data after 760 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
[760]3.5482,[761]3.5489,[762]3.5507,[763]3.5511,[764]3.5530,[765]3.5540,[766]3.5556,[767]3.5563,[768]3.5573,[769]3.5577,
save_imatrix: stored collected data after 770 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
[770]3.5588,[771]3.5610,[772]3.5617,[773]3.5619,[774]3.5626,[775]3.5646,[776]3.5655,[777]3.5679,[778]3.5679,[779]3.5693,
save_imatrix: stored collected data after 780 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
[780]3.5708,[781]3.5729,[782]3.5750,[783]3.5778,[784]3.5781,[785]3.5787,[786]3.5794,[787]3.5812,[788]3.5814,[789]3.5837,
save_imatrix: stored collected data after 790 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
[790]3.5849,[791]3.5861,[792]3.5863,[793]3.5874,[794]3.5896,[795]3.5911,[796]3.5914,[797]3.5930,[798]3.5942,[799]3.5979,
save_imatrix: stored collected data after 800 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
[800]3.5984,[801]3.5983,[802]3.6000,[803]3.6018,[804]3.6027,[805]3.6036,[806]3.6041,[807]3.6050,[808]3.6054,[809]3.6062,
save_imatrix: stored collected data after 810 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
[810]3.6083,[811]3.6108,[812]3.6119,[813]3.6131,[814]3.6137,
save_imatrix: stored collected data after 814 chunks in /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat
Final estimate: PPL = 3.6137 +/- 0.01805
======================== sorted layer importances
0: Layer 0, <cos_sim> = 0.433589
1: Layer 2, <cos_sim> = 0.752289
2: Layer 1, <cos_sim> = 0.764358
3: Layer 3, <cos_sim> = 0.861103
4: Layer 4, <cos_sim> = 0.90387
5: Layer 32, <cos_sim> = 0.905589
6: Layer 6, <cos_sim> = 0.912358
7: Layer 37, <cos_sim> = 0.913118
8: Layer 39, <cos_sim> = 0.913941
9: Layer 31, <cos_sim> = 0.914878
10: Layer 23, <cos_sim> = 0.915726
11: Layer 91, <cos_sim> = 0.915909
12: Layer 41, <cos_sim> = 0.917222
13: Layer 40, <cos_sim> = 0.918507
14: Layer 33, <cos_sim> = 0.918549
15: Layer 29, <cos_sim> = 0.919203
16: Layer 30, <cos_sim> = 0.919353
17: Layer 28, <cos_sim> = 0.921385
18: Layer 38, <cos_sim> = 0.921396
19: Layer 24, <cos_sim> = 0.922245
20: Layer 34, <cos_sim> = 0.922372
21: Layer 22, <cos_sim> = 0.922432
22: Layer 26, <cos_sim> = 0.924714
23: Layer 36, <cos_sim> = 0.924901
24: Layer 14, <cos_sim> = 0.925139
25: Layer 25, <cos_sim> = 0.9268
26: Layer 13, <cos_sim> = 0.92694
27: Layer 35, <cos_sim> = 0.927297
28: Layer 10, <cos_sim> = 0.927834
29: Layer 27, <cos_sim> = 0.928177
30: Layer 11, <cos_sim> = 0.929866
31: Layer 21, <cos_sim> = 0.929894
32: Layer 85, <cos_sim> = 0.93049
33: Layer 7, <cos_sim> = 0.930774
34: Layer 84, <cos_sim> = 0.932103
35: Layer 8, <cos_sim> = 0.933102
36: Layer 9, <cos_sim> = 0.935479
37: Layer 42, <cos_sim> = 0.935862
38: Layer 12, <cos_sim> = 0.936215
39: Layer 5, <cos_sim> = 0.941695
40: Layer 43, <cos_sim> = 0.943382
41: Layer 86, <cos_sim> = 0.947319
42: Layer 15, <cos_sim> = 0.948505
43: Layer 20, <cos_sim> = 0.948549
44: Layer 18, <cos_sim> = 0.951088
45: Layer 44, <cos_sim> = 0.952598
46: Layer 83, <cos_sim> = 0.952599
47: Layer 19, <cos_sim> = 0.952615
48: Layer 45, <cos_sim> = 0.953287
49: Layer 17, <cos_sim> = 0.956447
50: Layer 80, <cos_sim> = 0.957907
51: Layer 16, <cos_sim> = 0.957981
52: Layer 46, <cos_sim> = 0.958118
53: Layer 81, <cos_sim> = 0.959244
54: Layer 87, <cos_sim> = 0.959352
55: Layer 90, <cos_sim> = 0.960285
56: Layer 82, <cos_sim> = 0.961087
57: Layer 47, <cos_sim> = 0.961475
58: Layer 89, <cos_sim> = 0.962276
59: Layer 88, <cos_sim> = 0.963196
60: Layer 79, <cos_sim> = 0.963523
61: Layer 48, <cos_sim> = 0.963567
62: Layer 50, <cos_sim> = 0.964597
63: Layer 49, <cos_sim> = 0.965508
64: Layer 51, <cos_sim> = 0.965609
65: Layer 52, <cos_sim> = 0.967696
66: Layer 54, <cos_sim> = 0.968009
67: Layer 53, <cos_sim> = 0.970224
68: Layer 76, <cos_sim> = 0.970396
69: Layer 78, <cos_sim> = 0.971591
70: Layer 55, <cos_sim> = 0.971771
71: Layer 75, <cos_sim> = 0.973436
72: Layer 77, <cos_sim> = 0.975951
73: Layer 58, <cos_sim> = 0.978094
74: Layer 56, <cos_sim> = 0.978404
75: Layer 57, <cos_sim> = 0.979015
76: Layer 59, <cos_sim> = 0.979639
77: Layer 73, <cos_sim> = 0.980629
78: Layer 67, <cos_sim> = 0.981126
79: Layer 66, <cos_sim> = 0.981658
80: Layer 72, <cos_sim> = 0.981951
81: Layer 65, <cos_sim> = 0.981978
82: Layer 61, <cos_sim> = 0.982014
83: Layer 68, <cos_sim> = 0.982152
84: Layer 74, <cos_sim> = 0.982164
85: Layer 60, <cos_sim> = 0.982302
86: Layer 71, <cos_sim> = 0.982914
87: Layer 63, <cos_sim> = 0.983344
88: Layer 70, <cos_sim> = 0.983749
89: Layer 64, <cos_sim> = 0.984071
90: Layer 69, <cos_sim> = 0.984258
91: Layer 62, <cos_sim> = 0.984467
======================== sorted attention importances
0: Layer 0, <cos_sim> = 0.335289
1: Layer 1, <cos_sim> = 0.552763
2: Layer 2, <cos_sim> = 0.637396
3: Layer 3, <cos_sim> = 0.816339
4: Layer 7, <cos_sim> = 0.824544
5: Layer 13, <cos_sim> = 0.850178
6: Layer 6, <cos_sim> = 0.850298
7: Layer 4, <cos_sim> = 0.851804
8: Layer 9, <cos_sim> = 0.859275
9: Layer 8, <cos_sim> = 0.866695
10: Layer 12, <cos_sim> = 0.874505
11: Layer 15, <cos_sim> = 0.876165
12: Layer 5, <cos_sim> = 0.876507
13: Layer 10, <cos_sim> = 0.87806
14: Layer 11, <cos_sim> = 0.880676
15: Layer 16, <cos_sim> = 0.893902
16: Layer 17, <cos_sim> = 0.899423
17: Layer 21, <cos_sim> = 0.900672
18: Layer 14, <cos_sim> = 0.9032
19: Layer 19, <cos_sim> = 0.909055
20: Layer 20, <cos_sim> = 0.911488
21: Layer 18, <cos_sim> = 0.917251
22: Layer 23, <cos_sim> = 0.919361
23: Layer 22, <cos_sim> = 0.928206
24: Layer 24, <cos_sim> = 0.932381
25: Layer 25, <cos_sim> = 0.936273
26: Layer 32, <cos_sim> = 0.938645
27: Layer 28, <cos_sim> = 0.941543
28: Layer 26, <cos_sim> = 0.942651
29: Layer 33, <cos_sim> = 0.943323
30: Layer 27, <cos_sim> = 0.943763
31: Layer 37, <cos_sim> = 0.944613
32: Layer 31, <cos_sim> = 0.945652
33: Layer 30, <cos_sim> = 0.946387
34: Layer 38, <cos_sim> = 0.948997
35: Layer 39, <cos_sim> = 0.94954
36: Layer 35, <cos_sim> = 0.950607
37: Layer 41, <cos_sim> = 0.951778
38: Layer 34, <cos_sim> = 0.952551
39: Layer 40, <cos_sim> = 0.95284
40: Layer 29, <cos_sim> = 0.952981
41: Layer 42, <cos_sim> = 0.954776
42: Layer 36, <cos_sim> = 0.958211
43: Layer 85, <cos_sim> = 0.963066
44: Layer 43, <cos_sim> = 0.963722
45: Layer 44, <cos_sim> = 0.964977
46: Layer 45, <cos_sim> = 0.966557
47: Layer 46, <cos_sim> = 0.969251
48: Layer 84, <cos_sim> = 0.971393
49: Layer 86, <cos_sim> = 0.971928
50: Layer 51, <cos_sim> = 0.973333
51: Layer 52, <cos_sim> = 0.974347
52: Layer 83, <cos_sim> = 0.974803
53: Layer 50, <cos_sim> = 0.977621
54: Layer 48, <cos_sim> = 0.977849
55: Layer 47, <cos_sim> = 0.97789
56: Layer 81, <cos_sim> = 0.978345
57: Layer 82, <cos_sim> = 0.978486
58: Layer 49, <cos_sim> = 0.978655
59: Layer 80, <cos_sim> = 0.97866
60: Layer 53, <cos_sim> = 0.979166
61: Layer 91, <cos_sim> = 0.98049
62: Layer 58, <cos_sim> = 0.981312
63: Layer 54, <cos_sim> = 0.981736
64: Layer 87, <cos_sim> = 0.982023
65: Layer 79, <cos_sim> = 0.982483
66: Layer 78, <cos_sim> = 0.983622
67: Layer 88, <cos_sim> = 0.983653
68: Layer 90, <cos_sim> = 0.985642
69: Layer 61, <cos_sim> = 0.986197
70: Layer 89, <cos_sim> = 0.986293
71: Layer 68, <cos_sim> = 0.986564
72: Layer 59, <cos_sim> = 0.986572
73: Layer 73, <cos_sim> = 0.98676
74: Layer 71, <cos_sim> = 0.986905
75: Layer 55, <cos_sim> = 0.986992
76: Layer 72, <cos_sim> = 0.987429
77: Layer 76, <cos_sim> = 0.987882
78: Layer 57, <cos_sim> = 0.988337
79: Layer 56, <cos_sim> = 0.988355
80: Layer 77, <cos_sim> = 0.98847
81: Layer 67, <cos_sim> = 0.988501
82: Layer 65, <cos_sim> = 0.98852
83: Layer 70, <cos_sim> = 0.988926
84: Layer 74, <cos_sim> = 0.988971
85: Layer 64, <cos_sim> = 0.988973
86: Layer 63, <cos_sim> = 0.989051
87: Layer 66, <cos_sim> = 0.989456
88: Layer 60, <cos_sim> = 0.989791
89: Layer 69, <cos_sim> = 0.99087
90: Layer 75, <cos_sim> = 0.991036
91: Layer 62, <cos_sim> = 0.991942
======================== sorted ffn importances
0: Layer 0, <cos_sim> = 0.584305
1: Layer 1, <cos_sim> = 0.599857
2: Layer 2, <cos_sim> = 0.734115
3: Layer 6, <cos_sim> = 0.807659
4: Layer 3, <cos_sim> = 0.823881
5: Layer 8, <cos_sim> = 0.855655
6: Layer 11, <cos_sim> = 0.855686
7: Layer 4, <cos_sim> = 0.858089
8: Layer 14, <cos_sim> = 0.858127
9: Layer 12, <cos_sim> = 0.860683
10: Layer 5, <cos_sim> = 0.864949
11: Layer 7, <cos_sim> = 0.867606
12: Layer 9, <cos_sim> = 0.883365
13: Layer 10, <cos_sim> = 0.884968
14: Layer 15, <cos_sim> = 0.885658
15: Layer 16, <cos_sim> = 0.887954
16: Layer 20, <cos_sim> = 0.892895
17: Layer 18, <cos_sim> = 0.90077
18: Layer 19, <cos_sim> = 0.901247
19: Layer 13, <cos_sim> = 0.902837
20: Layer 24, <cos_sim> = 0.914728
21: Layer 22, <cos_sim> = 0.91663
22: Layer 25, <cos_sim> = 0.91686
23: Layer 17, <cos_sim> = 0.91982
24: Layer 26, <cos_sim> = 0.920503
25: Layer 23, <cos_sim> = 0.921116
26: Layer 27, <cos_sim> = 0.924545
27: Layer 29, <cos_sim> = 0.92818
28: Layer 32, <cos_sim> = 0.931219
29: Layer 21, <cos_sim> = 0.931957
30: Layer 31, <cos_sim> = 0.931987
31: Layer 28, <cos_sim> = 0.933451
32: Layer 30, <cos_sim> = 0.934623
33: Layer 34, <cos_sim> = 0.935862
34: Layer 37, <cos_sim> = 0.93849
35: Layer 36, <cos_sim> = 0.939261
36: Layer 33, <cos_sim> = 0.94047
37: Layer 39, <cos_sim> = 0.942833
38: Layer 40, <cos_sim> = 0.943535
39: Layer 35, <cos_sim> = 0.943962
40: Layer 41, <cos_sim> = 0.944572
41: Layer 91, <cos_sim> = 0.944611
42: Layer 38, <cos_sim> = 0.94701
43: Layer 43, <cos_sim> = 0.951876
44: Layer 42, <cos_sim> = 0.953462
45: Layer 44, <cos_sim> = 0.954221
46: Layer 45, <cos_sim> = 0.954828
47: Layer 84, <cos_sim> = 0.960194
48: Layer 46, <cos_sim> = 0.962422
49: Layer 47, <cos_sim> = 0.963472
50: Layer 50, <cos_sim> = 0.963841
51: Layer 48, <cos_sim> = 0.964882
52: Layer 51, <cos_sim> = 0.96498
53: Layer 49, <cos_sim> = 0.965125
54: Layer 85, <cos_sim> = 0.965745
55: Layer 90, <cos_sim> = 0.966198
56: Layer 52, <cos_sim> = 0.968709
57: Layer 89, <cos_sim> = 0.969302
58: Layer 86, <cos_sim> = 0.970209
59: Layer 79, <cos_sim> = 0.971392
60: Layer 80, <cos_sim> = 0.97181
61: Layer 83, <cos_sim> = 0.971817
62: Layer 53, <cos_sim> = 0.972442
63: Layer 81, <cos_sim> = 0.972559
64: Layer 87, <cos_sim> = 0.973106
65: Layer 78, <cos_sim> = 0.973454
66: Layer 57, <cos_sim> = 0.973742
67: Layer 77, <cos_sim> = 0.97382
68: Layer 82, <cos_sim> = 0.974303
69: Layer 55, <cos_sim> = 0.974649
70: Layer 54, <cos_sim> = 0.974867
71: Layer 76, <cos_sim> = 0.975321
72: Layer 75, <cos_sim> = 0.975472
73: Layer 88, <cos_sim> = 0.975633
74: Layer 58, <cos_sim> = 0.976417
75: Layer 56, <cos_sim> = 0.976436
76: Layer 60, <cos_sim> = 0.976607
77: Layer 73, <cos_sim> = 0.977296
78: Layer 72, <cos_sim> = 0.977447
79: Layer 65, <cos_sim> = 0.977744
80: Layer 67, <cos_sim> = 0.977822
81: Layer 70, <cos_sim> = 0.977891
82: Layer 59, <cos_sim> = 0.978032
83: Layer 71, <cos_sim> = 0.978203
84: Layer 69, <cos_sim> = 0.97839
85: Layer 64, <cos_sim> = 0.978551
86: Layer 66, <cos_sim> = 0.978619
87: Layer 63, <cos_sim> = 0.97875
88: Layer 74, <cos_sim> = 0.979117
89: Layer 62, <cos_sim> = 0.979471
90: Layer 68, <cos_sim> = 0.979636
91: Layer 61, <cos_sim> = 0.980855
llama_print_timings: load time = 195855.60 ms
llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
llama_print_timings: prompt eval time = 7668240.55 ms / 416768 tokens ( 18.40 ms per token, 54.35 tokens per second)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
llama_print_timings: total time = 7872041.86 ms / 416769 tokens