# NullAI Technical Deck: DeepSeek R1 32B Fine-tuned Model

---

### 1. NullAI System Architecture

NullAI is a knowledge-based system that integrates reasoning and verification across multiple domains. It is more than a standalone LLM: it combines structured knowledge management with a multi-stage verification system.

#### 1.1 Hierarchical Structure

```
┌─────────────────────────────────────┐
│  Layer 5: State Management          │ ← System state management
├─────────────────────────────────────┤
│  Layer 4: Judge System              │ ← Answer verification & evaluation
│   ├─ Alpha Lobe (Basic Logic)       │
│   ├─ Beta Basic (Domain Consistency)│
│   └─ Beta Advanced (Deep Reasoning) │
├─────────────────────────────────────┤
│  Layer 3: Inference Engine          │ ← DeepSeek R1 inference
├─────────────────────────────────────┤
│  Layer 2: Episodic Binding          │ ← Knowledge tile association
├─────────────────────────────────────┤
│  Layer 1: Spatial Encoding          │ ← Knowledge spatial placement
└─────────────────────────────────────┘
```

### 2. Knowledge Tile System

#### 2.1 Structure

Each piece of knowledge is stored as a tile with the following fields:

```python
# Schema sketch: a value such as `float` or `bool` names the field's expected type.
{
    "tile_id": "unique_identifier",
    "domain": "medical|legal|programming|science|general",
    "content": "Knowledge content",
    "coordinates": {
        "x": float,  # X coordinate in concept space (abstraction level)
        "y": float,  # Y coordinate in concept space (expertise level)
        "z": float   # Z coordinate in concept space (temporality)
    },
    "certainty_score": float,  # 0.0-1.0
    "orcid_verified": bool,
    "expert_id": "ORCID_ID",
    "reasoning_chain": [...],
    "citations": [...]
}
```

#### 2.2 Spatial Coordinate System

- **X-axis**: Abstraction level (concrete ← → abstract)
- **Y-axis**: Expertise level (basic ← → advanced)
- **Z-axis**: Temporality (universal ← → latest trends)

This 3D space is intended to make retrieval of, and reasoning over, related knowledge efficient.

### 3. Judge System

#### 3.1 Alpha Lobe - Basic Logic Verification

```python
def alpha_lobe_check(reasoning_chain):
    """
    Verifies basic logical consistency
    - Contradiction detection
    - Premise-conclusion consistency
    - Reasoning step validity
    """
    # Interface sketch: the returned dict shows field names and types.
    return {
        "passed": bool,
        "issues": [],
        "confidence": float
    }
```

#### 3.2 Beta Lobe (Basic) - Domain Knowledge Consistency

```python
def beta_lobe_basic(answer, domain_knowledge):
    """
    Checks consistency with domain-specific knowledge
    - Terminology accuracy
    - Domain common sense alignment
    - Standard protocol compliance
    """
    # Interface sketch: the returned dict shows field names and types.
    return {
        "domain_consistency": float,
        "terminology_accuracy": float,
        "protocol_compliance": bool
    }
```

#### 3.3 Beta Lobe (Advanced) - Deep Reasoning Verification

```python
def beta_lobe_advanced(answer, reasoning_chain, meta_knowledge):
    """
    Verifies advanced reasoning processes
    - Multi-step reasoning validity
    - Causal relationship accuracy
    - Edge case consideration
    """
    # Interface sketch: the returned dict shows field names and types.
    return {
        "reasoning_depth": int,
        "causal_accuracy": float,
        "edge_case_coverage": float
    }
```

### 4. Fine-tuning Details

#### 4.1 Training Process

**Phase 1: Data Preparation**

```
# Dataset split (stated ratio 8:1:1; the train:validation counts are about 9:1)
- Training data: 8,768 examples
- Validation data: 975 examples
- Test data: Withheld

# Data format
{
  "text": "System prompt + Question + Answer",
  "domain": "medical|legal|programming|science|general",
  "difficulty": 1-5,
  "requires_reasoning": bool
}
```

**Phase 2: Model Quantization**

```bash
# 4-bit quantization with MLX
python -m mlx_lm.convert \
  --hf-path deepseek-ai/DeepSeek-R1-Distill-Qwen-32B \
  --mlx-path ./deepseek-r1-32b-mlx-4bit \
  --quantize \
  --q-bits 4 \
  --q-group-size 64 \
  --trust-remote-code

# Result: 61GB → 17.2GB (72% reduction)
```

**Phase 3: LoRA Fine-tuning**

```bash
python -m mlx_lm lora \
  --model ./deepseek-r1-32b-mlx-4bit \
  --train \
  --data . \
  --iters 1000 \
  --adapter-path ./adapters \
  --batch-size 1 \
  --learning-rate 1e-5 \
  --steps-per-report 10 \
  --steps-per-eval 100 \
  --save-every 250 \
  --grad-checkpoint \
  --max-seq-length 2048
```

#### 4.2 Hyperparameter Selection Rationale

| Parameter | Value | Reasoning |
|-----------|-------|-----------|
| Learning Rate | 1e-5 | Stable learning for large models |
| Batch Size | 1 | Fits within the memory budget |
| LoRA Rank | 16 | Balance between parameter efficiency and quality |
| LoRA Alpha | 32 | Standard setting of Rank×2 |
| Max Seq Length | 2048 | Support for long-form reasoning |
| Gradient Checkpointing | True | Reduced memory usage |

#### 4.3 Learning Curve Analysis

```
Iteration    Train Loss    Val Loss    Improvement
--------------------------------------------------
0            -             3.318       -
100          1.548         1.583       52.3%
200          0.860         0.934       71.9%
300          0.682         1.113       66.5%
400          1.260         0.741       77.7%
500          0.681         0.832       74.9%
600          0.561         0.885       73.3%
700          0.710         0.897       73.0%
800          0.589         0.621       81.3%
900          0.574         0.705       78.7%
1000         0.583         0.712       78.5%
```

Improvement is the relative reduction in validation loss from its value at iteration 0 (3.318).

**Observations:**

- Rapid improvement in the first 100 iterations (52.3%)
- Iterations 200-500: losses keep falling overall but are noisy (validation loss ranges from 0.741 to 1.113)
- Best validation loss (0.621, an 81.3% improvement) at iteration 800
- Final validation loss of 0.712, a 78.5% improvement

### 5. Inference Optimization

#### 5.1 Apple Silicon (Metal) Optimization

MLX is designed for Apple Silicon:

- Uses the unified memory architecture, so the CPU and GPU share arrays without copies
- Runs GPU computation through Metal
- Does not use the Neural Engine; computation runs on the CPU and GPU

#### 5.2 Inference Speed

| Metric | Value |
|--------|-------|
| Tokens/sec | 30-35 |
| Iterations/sec | 0.35-0.40 |
| Peak Memory | 19.9 GB |
| Average Latency | ~2.8 s/iteration |

### 6. Model Capabilities by Domain

**Medical Domain:**

- Diagnostic reasoning pathways
- Treatment protocol recommendations
- Drug interaction analysis
- Clinical guideline interpretation

**Legal Domain:**

- Legal precedent analysis
- Statutory interpretation
- Contract clause analysis
- Regulatory compliance guidance

**Programming Domain:**

- Code generation and optimization
- Bug detection and debugging
- Algorithm design and analysis
- Software architecture recommendations

**Scientific Domain:**

- Research methodology design
- Statistical analysis guidance
- Experimental design optimization
- Data interpretation support

**General Domain:**

- Broad knowledge retrieval
- Multi-domain reasoning
- Explanation generation
- Knowledge synthesis

### 7. Limitations and Future Work

**Current Limitations:**

- Requires significant RAM (20GB+) for inference
- Response latency increases on non-optimized hardware
- Accuracy varies by domain

**Future Improvements:**

- Further quantization experiments (3-bit, 2-bit)
- Domain-specific adapter modules
- Real-time ORCID verification integration
- Expanded training dataset across domains
- Multilingual support expansion
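
To put the proposed 3-bit and 2-bit quantization experiments in context, the weight footprint can be estimated with simple arithmetic. The sketch below is a back-of-the-envelope estimate, not a measurement. It rests on two assumptions that do not come from this document: a parameter count of about 32.8B (`N_PARAMS`), and a group-quantization format that stores one 16-bit scale and one 16-bit bias per group of 64 weights. The helper names are hypothetical. Under those assumptions the 16-bit and 4-bit estimates land close to the 61GB and 17.2GB figures reported in Phase 2, if those figures are read as GiB.

```python
GIB = 2**30  # bytes per GiB

# Assumed parameter count for the 32B model; not stated in this document.
N_PARAMS = 32.8e9


def fp16_size_gib(n_params: float) -> float:
    """Footprint of unquantized 16-bit weights, in GiB (2 bytes per weight)."""
    return n_params * 2 / GIB


def quantized_size_gib(
    n_params: float,
    q_bits: int,
    group_size: int = 64,
    overhead_bits_per_group: int = 32,
) -> float:
    """Approximate footprint of group-quantized weights, in GiB.

    Assumes every weight is quantized and that each group of `group_size`
    weights carries `overhead_bits_per_group` extra bits (here: one 16-bit
    scale plus one 16-bit bias, which is an assumption about the format).
    """
    bits_per_weight = q_bits + overhead_bits_per_group / group_size
    return n_params * bits_per_weight / 8 / GIB


baseline = fp16_size_gib(N_PARAMS)
print(f"16-bit: {baseline:.1f} GiB")
for bits in (4, 3, 2):
    size = quantized_size_gib(N_PARAMS, bits)
    reduction = 1 - size / baseline
    print(f"{bits}-bit: {size:.1f} GiB ({reduction:.0%} smaller than 16-bit)")
```

The estimate covers weights only. Activations, the KV cache, and LoRA adapters add to it, which is consistent with the observed peak memory exceeding the 4-bit weight size.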