pure_model_weights/code/xtuner/slidechat_baseline_eval.txt
/data/qingq/PathVLM/baselines/github/SlideChat/outputs/output_lgg_text_guided_reducer_attn.csv
Evaluation Summary:
---------------------
Total Samples : 121
Correct : 91
Accuracy : 75.21%
/data/qingq/PathVLM/baselines/github/SlideChat/outputs/output_lgg_text_guided_reducer_attn.csv
Evaluation Summary:
---------------------
Total Samples : 121
Correct : 92
Accuracy : 76.03%
/data/qingq/PathVLM/baselines/github/SlideChat/outputs/output_lgg_text_guided_reducer_attn.csv
Evaluation Summary:
---------------------
Total Samples : 121
Correct : 94
Accuracy : 77.69%
/data/qingq/PathVLM/baselines/github/SlideChat/outputs/output_blca_random_selection.csv
Evaluation Summary:
---------------------
Total Samples : 158
Correct : 130
Accuracy : 82.28%
/data/qingq/PathVLM/baselines/github/SlideChat/outputs/output_skcm_reducer_attn_rephase.csv
Evaluation Summary:
---------------------
Total Samples : 34
Correct : 27
Accuracy : 79.41%
Average Generation Time : 1.7586 seconds
/data/qingq/PathVLM/baselines/github/SlideChat/outputs/output_skcm_reducer_attn_rephase.csv
Evaluation Summary:
---------------------
Total Samples : 34
Correct : 27
Accuracy : 79.41%
Average Generation Time : 0.3120 seconds
/data/qingq/PathVLM/baselines/github/SlideChat/outputs/output_skcm_test_no_choices.csv
Evaluation Summary:
---------------------
Total Samples : 34
Correct : 1
Accuracy : 2.94%
Average Generation Time : 5.6964 seconds
/data/qingq/PathVLM/baselines/github/SlideChat/outputs/output_skcm_test_no_choices2.csv
Evaluation Summary:
---------------------
Total Samples : 34
Correct : 1
Accuracy : 2.94%
Average Generation Time : 5.5581 seconds
/data/qingq/PathVLM/baselines/github/SlideChat/outputs/output_skcm_test_no_choices_no_visual_input.csv
Evaluation Summary:
---------------------
Total Samples : 34
Correct : 1
Accuracy : 2.94%
Average Generation Time : 5.8198 seconds
/data/qingq/PathVLM/baselines/github/SlideChat/outputs/output_skcm_test_no_choices_stage1.csv
Evaluation Summary:
---------------------
Total Samples : 34
Correct : 1
Accuracy : 2.94%
Average Generation Time : 5.2619 seconds
/data/qingq/PathVLM/baselines/github/SlideChat/outputs/output_skcm_test_no_choices_stage1_llm_only.csv
Evaluation Summary:
---------------------
Total Samples : 34
Correct : 1
Accuracy : 2.94%
Average Generation Time : 5.2839 seconds
/data/qingq/PathVLM/baselines/github/SlideChat/outputs/output_skcm_test_no_choices_stage2_llm_only.csv
Evaluation Summary:
---------------------
Total Samples : 34
Correct : 4
Accuracy : 11.76%
Average Generation Time : 5.5989 seconds
/data/qingq/PathVLM/baselines/github/SlideChat/outputs/output_skcm_test_no_choices_stage2_random_fixed_question_id.csv
Evaluation Summary:
---------------------
Total Samples : 34
Correct : 1
Accuracy : 2.94%
Average Generation Time : 4.2200 seconds
/data/qingq/PathVLM/baselines/github/SlideChat/outputs/output_skcm_test_no_choices_stage2_random_fixed_question_id.csv
Evaluation Summary:
---------------------
Total Samples : 34
Correct : 1
Accuracy : 2.94%
Average Generation Time : 4.1402 seconds
/data/qingq/PathVLM/baselines/github/SlideChat/outputs/output_skcm_test_no_choices_stage2_random_fixed_question_id_2.csv
Evaluation Summary:
---------------------
Total Samples : 20
Correct : 0
Accuracy : 0.00%
Average Generation Time : 1.3137 seconds
/data/qingq/PathVLM/baselines/github/SlideChat/outputs/output_skcm_test_no_choices_stage2_random_fixed_question_id_2.csv
Evaluation Summary:
---------------------
Total Samples : 20
Correct : 0
Accuracy : 0.00%
Average Generation Time : 2.9803 seconds
/data/qingq/PathVLM/baselines/github/SlideChat/outputs/output_skcm_test_no_choices_stage2_beam_search_decoding.csv
Evaluation Summary:
---------------------
Total Samples : 45
Correct : 3
Accuracy : 6.67%
Average Generation Time : 6.5524 seconds
/data/qingq/PathVLM/baselines/github/SlideChat/outputs/output_skcm_test_no_choices_stage2_beam_search_decoding.csv
Evaluation Summary:
---------------------
Total Samples : 1
Correct : 0
Accuracy : 0.00%
Average Generation Time : 7.5340 seconds
/data/qingq/PathVLM/baselines/github/SlideChat/outputs/output_skcm_test_no_choices_stage2_beam_search_decoding.csv
Evaluation Summary:
---------------------
Total Samples : 34
Correct : 3
Accuracy : 8.82%
Average Generation Time : 6.4082 seconds
/data/qingq/PathVLM/baselines/github/SlideChat/outputs/output_skcm_test_no_choices_stage2_random_fixed_question_id_2_decoding.csv
Evaluation Summary:
---------------------
Total Samples : 20
Correct : 0
Accuracy : 0.00%
Average Generation Time : 20.2492 seconds
/data/qingq/PathVLM/baselines/github/SlideChat/outputs/output_skcm_test_no_choices_stage2_random_fixed_question_id_decoding_2.csv
Evaluation Summary:
---------------------
Total Samples : 20
Correct : 0
Accuracy : 0.00%
Average Generation Time : 20.2163 seconds
Image Seq Len (avg/min/max) : 11482.2/3095/23567
/data/qingq/PathVLM/baselines/github/SlideChat/outputs/output_skcm_divprune_025.csv
Evaluation Summary:
---------------------
Total Samples : 34
Correct : 27
Accuracy : 79.41%
Average Generation Time : 0.1512 seconds
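
The blocks above share one layout: a CSV path, an "Evaluation Summary:" header, then "Name : value" lines. The reported accuracy appears to be Correct / Total Samples as a percentage (for example, 91 / 121 gives 75.21%). A minimal sketch of how such a log could be parsed and that arithmetic re-checked follows. The helper names `parse_summaries` and `accuracy_percent` and the path `outputs/example.csv` are hypothetical and are not part of the SlideChat or xtuner code.

```python
import re

# Matches "Name : value" lines; a header such as "Evaluation Summary:"
# has nothing after the colon and therefore does not match.
FIELD = re.compile(r"^\s*([^:]+?)\s*:\s*(.+?)\s*$")


def parse_summaries(text):
    """Return one dict per block; a line ending in .csv starts a new block."""
    runs = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.endswith(".csv"):
            runs.append({"path": line})
        elif runs:
            match = FIELD.match(line)
            if match:
                runs[-1][match.group(1)] = match.group(2)
    return runs


def accuracy_percent(run):
    """Recompute accuracy from the raw counts, rounded to two decimals."""
    return round(100.0 * int(run["Correct"]) / int(run["Total Samples"]), 2)


# Hypothetical path; the counts are copied from the first block of this log.
sample = """outputs/example.csv
Evaluation Summary:
---------------------
Total Samples : 121
Correct : 91
Accuracy : 75.21%
"""

runs = parse_summaries(sample)
for run in runs:
    print(run["path"], run["Accuracy"], accuracy_percent(run))
```

Because each run is keyed by its fields rather than by its path, repeated runs that write to the same CSV path stay as separate entries in the returned list, in log order.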