NullAI Technical Deck: DeepSeek R1 32B Fine-tuned Model


1. NullAI System Architecture

NullAI is an advanced knowledge-based system that integrates multi-domain knowledge reasoning and verification. It is not merely an LLM: it combines structured knowledge management with a multi-stage verification system.

1.1 Hierarchical Structure

┌───────────────────────────────────────┐
│   Layer 5: State Management           │  ← System state management
├───────────────────────────────────────┤
│   Layer 4: Judge System               │  ← Answer verification & evaluation
│   ├─ Alpha Lobe  (Basic Logic)        │
│   ├─ Beta Basic  (Domain Consistency) │
│   └─ Beta Advanced (Deep Reasoning)   │
├───────────────────────────────────────┤
│   Layer 3: Inference Engine           │  ← DeepSeek R1 inference
├───────────────────────────────────────┤
│   Layer 2: Episodic Binding           │  ← Knowledge tile association
├───────────────────────────────────────┤
│   Layer 1: Spatial Encoding           │  ← Knowledge spatial placement
└───────────────────────────────────────┘

2. Knowledge Tile System

2.1 Structure

Each piece of knowledge is structured as a tile with the following elements:

{
    "tile_id": "unique_identifier",
    "domain": "medical|legal|programming|science|general",
    "content": "Knowledge content",
    "coordinates": {
        "x": float,  # X coordinate in concept space
        "y": float,  # Y coordinate in concept space
        "z": float   # Z coordinate in concept space
    },
    "certainty_score": float,  # 0.0-1.0
    "orcid_verified": bool,
    "expert_id": "ORCID_ID",
    "reasoning_chain": [...],
    "citations": [...]
}

2.2 Spatial Coordinate System

  • X-axis: Abstraction level (Concrete ← → Abstract)
  • Y-axis: Expertise level (Basic ← → Advanced)
  • Z-axis: Temporality (Universal ← → Latest trends)

This 3D space enables efficient retrieval and reasoning of related knowledge.
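As a sketch of how retrieval in this space could work (the distance metric and the per-axis weights are illustrative assumptions, not the documented retrieval algorithm), nearest-tile lookup reduces to a weighted Euclidean distance over the three axes:

```python
import math

def tile_distance(a, b, weights=(1.0, 1.0, 1.0)):
    """Weighted Euclidean distance between two tiles in (x, y, z) concept space.

    The per-axis weights are a hypothetical knob for biasing retrieval toward
    abstraction (x), expertise (y), or recency (z); they are not part of the
    documented tile schema.
    """
    return math.sqrt(sum(
        w * (a["coordinates"][k] - b["coordinates"][k]) ** 2
        for w, k in zip(weights, ("x", "y", "z"))
    ))

def nearest_tiles(query_tile, tiles, k=3):
    """Return the k tiles closest to query_tile in concept space."""
    return sorted(tiles, key=lambda t: tile_distance(query_tile, t))[:k]
```

Raising the z-axis weight, for example, would bias retrieval toward tiles that match the query's recency rather than its level of abstraction.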

3. Judge System

3.1 Alpha Lobe - Basic Logic Verification

def alpha_lobe_check(reasoning_chain):
    """
    Verifies basic logical consistency
    - Contradiction detection
    - Premise-conclusion consistency
    - Reasoning step validity
    """
    return {
        "passed": bool,
        "issues": [],
        "confidence": float
    }

3.2 Beta Lobe (Basic) - Domain Knowledge Consistency

def beta_lobe_basic(answer, domain_knowledge):
    """
    Checks consistency with domain-specific knowledge
    - Terminology accuracy
    - Domain common sense alignment
    - Standard protocol compliance
    """
    return {
        "domain_consistency": float,
        "terminology_accuracy": float,
        "protocol_compliance": bool
    }

3.3 Beta Lobe (Advanced) - Deep Reasoning Verification

def beta_lobe_advanced(answer, reasoning_chain, meta_knowledge):
    """
    Verifies advanced reasoning processes
    - Multi-step reasoning validity
    - Causal relationship accuracy
    - Edge case consideration
    """
    return {
        "reasoning_depth": int,
        "causal_accuracy": float,
        "edge_case_coverage": float
    }
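The three lobes can be chained into a single verdict. The composition below is a hedged sketch: the stubs above define only the return shapes, so the toy inner results and the aggregation rule (all hard gates must pass and confidence must clear a threshold) are illustrative assumptions:

```python
def run_judge(answer, reasoning_chain, domain_knowledge, meta_knowledge,
              min_confidence=0.7):
    """Chain Alpha → Beta Basic → Beta Advanced and aggregate one verdict.

    The three inner results are toy placeholders with the return shapes the
    stubs declare; only the aggregation rule is being demonstrated, so the
    answer and knowledge arguments are left unused here.
    """
    alpha = {"passed": len(reasoning_chain) > 0, "issues": [], "confidence": 0.9}
    beta_basic = {"domain_consistency": 0.85, "terminology_accuracy": 0.9,
                  "protocol_compliance": True}
    beta_advanced = {"reasoning_depth": len(reasoning_chain),
                     "causal_accuracy": 0.8, "edge_case_coverage": 0.75}

    # Hard gates: logical pass + protocol compliance; soft gates: thresholds.
    approved = (alpha["passed"]
                and alpha["confidence"] >= min_confidence
                and beta_basic["protocol_compliance"]
                and beta_basic["domain_consistency"] >= min_confidence)
    return {"approved": approved, "alpha": alpha,
            "beta_basic": beta_basic, "beta_advanced": beta_advanced}
```

An answer with an empty reasoning chain fails the Alpha gate and is rejected regardless of the Beta scores.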

4. Fine-tuning Details

4.1 Training Process

Phase 1: Data Preparation

# Dataset split (8:1:1 ratio)
- Training data: 8,768 examples
- Validation data: 975 examples
- Test data: Withheld

# Data format
{
    "text": "System prompt + Question + Answer",
    "domain": "medical|legal|programming|science|general",
    "difficulty": 1-5,
    "requires_reasoning": bool
}
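A split like the one above can be produced with a few lines of Python. This is a minimal sketch assuming each record carries the `text` field shown in the data format, and that the trainer consumes `train.jsonl` / `valid.jsonl` files of one JSON object per line (the layout `mlx_lm lora --data .` reads):

```python
import json
import random

def split_dataset(examples, ratios=(0.8, 0.1, 0.1), seed=42):
    """Shuffle and split examples into train/valid/test by the given ratios."""
    examples = list(examples)
    random.Random(seed).shuffle(examples)
    n = len(examples)
    n_train = int(n * ratios[0])
    n_valid = int(n * ratios[1])
    return (examples[:n_train],
            examples[n_train:n_train + n_valid],
            examples[n_train + n_valid:])

def write_jsonl(path, examples):
    """Write one JSON object per line, keeping only the 'text' field."""
    with open(path, "w", encoding="utf-8") as f:
        for ex in examples:
            f.write(json.dumps({"text": ex["text"]}, ensure_ascii=False) + "\n")
```

Fixing the shuffle seed keeps the split reproducible across runs, so validation loss stays comparable between experiments.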

Phase 2: Model Quantization

# 4-bit quantization with MLX
python -m mlx_lm.convert \
    --hf-path deepseek-ai/DeepSeek-R1-Distill-Qwen-32B \
    --mlx-path ./deepseek-r1-32b-mlx-4bit \
    --quantize \
    --q-bits 4 \
    --q-group-size 64 \
    --trust-remote-code

# Result: 61GB → 17.2GB (72% reduction)
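The reported size is consistent with simple arithmetic: 4 bits per weight plus per-group quantization metadata, interpreted in GiB. The overhead model below (32 bits of scale-and-bias metadata per group of 64 weights, ~32.8B parameters) is an assumption about the affine quantization layout, not something the command above states:

```python
def quantized_size_gib(n_params, q_bits=4, group_size=64, meta_bits=32):
    """Estimate on-disk size of a group-quantized model in GiB:
    q_bits per weight, plus meta_bits (scale + bias) per group of weights."""
    weight_bits = n_params * q_bits
    meta_bits_total = (n_params / group_size) * meta_bits
    return (weight_bits + meta_bits_total) / 8 / 2**30

# Assumed ~32.8B parameters, 4-bit weights, group size 64
print(round(quantized_size_gib(32.8e9), 1))  # → 17.2
```

The estimate landing on 17.2 suggests the reported "17.2GB" is actually GiB, as file-size tools commonly report.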

Phase 3: LoRA Fine-tuning

python -m mlx_lm lora \
    --model ./deepseek-r1-32b-mlx-4bit \
    --train \
    --data . \
    --iters 1000 \
    --adapter-path ./adapters \
    --batch-size 1 \
    --learning-rate 1e-5 \
    --steps-per-report 10 \
    --steps-per-eval 100 \
    --save-every 250 \
    --grad-checkpoint \
    --max-seq-length 2048

4.2 Hyperparameter Selection Rationale

Parameter               Value   Reasoning
-----------------------------------------------------------------------
Learning Rate           1e-5    Stable learning for large models
Batch Size              1       Maximum efficiency under memory constraints
LoRA Rank               16      Balance between parameter efficiency and quality
LoRA Alpha              32      Standard setting of Rank × 2
Max Seq Length          2048    Support for long-form reasoning
Gradient Checkpointing  True    Reduced memory usage
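For context on why rank 16 is parameter-efficient: a LoRA pair adds only rank × (d_in + d_out) trainable weights per adapted matrix. The dimensions below (hidden size 5120, 64 layers, q/v projections adapted) are assumptions about the base model used purely for a back-of-the-envelope count, not values read from the checkpoint:

```python
def lora_trainable_params(d_in, d_out, rank=16):
    """Trainable weights added by one LoRA pair: A is d_in x rank, B is rank x d_out."""
    return rank * (d_in + d_out)

# Hypothetical dimensions: hidden size 5120, 64 layers,
# with q_proj and v_proj adapted in each layer.
per_layer = 2 * lora_trainable_params(5120, 5120, rank=16)
total = 64 * per_layer
print(total)  # → 20971520, i.e. ~21M trainable vs ~32.8B frozen parameters
```

Under these assumptions the adapters touch well under 0.1% of the model, which is why batch size 1 with gradient checkpointing fits alongside the 4-bit base weights.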

4.3 Learning Curve Analysis

Iteration   Train Loss   Val Loss   Improvement
----------------------------------------------
    0          -         3.318         -
  100        1.548       1.583       52.3%
  200        0.860       0.934       71.9%
  300        0.682       1.113       66.5%
  400        1.260       0.741       77.7%
  500        0.681       0.832       74.9%
  600        0.561       0.885       73.3%
  700        0.710       0.897       73.0%
  800        0.589       0.621       81.3%
  900        0.574       0.705       78.7%
 1000        0.583       0.712       78.5%

Observations:

  • Rapid improvement in first 100 iterations (52.3%)
  • Stable learning from iterations 200-500
  • Best validation loss around iteration 800
  • Final improvement of 78.5% achieved
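The Improvement column is the relative reduction from the iteration-0 validation loss (3.318); the figures in the table can be reproduced directly:

```python
def improvement(val_loss, baseline=3.318):
    """Percent reduction in validation loss vs. the iteration-0 baseline."""
    return round((baseline - val_loss) / baseline * 100, 1)

print(improvement(1.583))  # → 52.3  (iteration 100)
print(improvement(0.621))  # → 81.3  (iteration 800, best checkpoint)
print(improvement(0.712))  # → 78.5  (iteration 1000)
```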

5. Inference Optimization

5.1 Apple Silicon (MPS) Optimization

# MLX automatically optimizes for Apple Silicon
- Unified Memory Architecture utilization
- Metal Performance Shaders usage
- Neural Engine utilization (partial operations)

5.2 Inference Speed

Metric            Value
---------------------------------
Tokens/sec        30-35
Iterations/sec    0.35-0.40
Peak Memory       19.9 GB
Average Latency   ~2.8 s/iteration
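As a quick consistency check, the average latency follows from the iteration rate (1 / 0.35 ≈ 2.86 s), in line with the reported ~2.8 s per iteration:

```python
iters_per_sec = 0.35          # lower end of the reported range
latency = 1 / iters_per_sec   # seconds per iteration
print(round(latency, 2))      # → 2.86
```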

6. Model Capabilities by Domain

Medical Domain:

  • Diagnostic reasoning pathways
  • Treatment protocol recommendations
  • Drug interaction analysis
  • Clinical guideline interpretation

Legal Domain:

  • Legal precedent analysis
  • Statutory interpretation
  • Contract clause analysis
  • Regulatory compliance guidance

Programming Domain:

  • Code generation and optimization
  • Bug detection and debugging
  • Algorithm design and analysis
  • Software architecture recommendations

Scientific Domain:

  • Research methodology design
  • Statistical analysis guidance
  • Experimental design optimization
  • Data interpretation support

General Domain:

  • Broad knowledge retrieval
  • Multi-domain reasoning
  • Explanation generation
  • Knowledge synthesis

7. Limitations and Future Work

Current Limitations:

  • Requires significant RAM (20 GB+) for inference
  • Higher response latency on non-optimized hardware
  • Accuracy varies across domains

Future Improvements:

  • Further quantization experiments (3-bit, 2-bit)
  • Domain-specific adapter modules
  • Real-time ORCID verification integration
  • Expanded training dataset across domains
  • Multi-lingual support expansion