NullAI Technical Deck: DeepSeek R1 32B Fine-tuned Model


1. NullAI System Architecture

NullAI is an advanced knowledge-based system that integrates multi-domain knowledge reasoning and verification. It is not merely an LLM: it combines structured knowledge management with a multi-stage verification system.

1.1 Hierarchical Structure

┌───────────────────────────────────────┐
│   Layer 5: State Management           │  ← System state management
├───────────────────────────────────────┤
│   Layer 4: Judge System               │  ← Answer verification & evaluation
│   ├─ Alpha Lobe  (Basic Logic)        │
│   ├─ Beta Basic  (Domain Consistency) │
│   └─ Beta Advanced (Deep Reasoning)   │
├───────────────────────────────────────┤
│   Layer 3: Inference Engine           │  ← DeepSeek R1 inference
├───────────────────────────────────────┤
│   Layer 2: Episodic Binding           │  ← Knowledge tile association
├───────────────────────────────────────┤
│   Layer 1: Spatial Encoding           │  ← Knowledge spatial placement
└───────────────────────────────────────┘

2. Knowledge Tile System

2.1 Structure

Each piece of knowledge is structured as a tile with the following elements:

{
    "tile_id": "unique_identifier",
    "domain": "medical|legal|programming|science|general",
    "content": "Knowledge content",
    "coordinates": {
        "x": float,  # X coordinate in concept space
        "y": float,  # Y coordinate in concept space
        "z": float   # Z coordinate in concept space
    },
    "certainty_score": float,  # 0.0-1.0
    "orcid_verified": bool,
    "expert_id": "ORCID_ID",
    "reasoning_chain": [...],
    "citations": [...]
}

2.2 Spatial Coordinate System

  • X-axis: Abstraction level (Concrete ← → Abstract)
  • Y-axis: Expertise level (Basic ← → Advanced)
  • Z-axis: Temporality (Universal ← → Latest trends)

This 3D space enables efficient retrieval and reasoning of related knowledge.
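As a sketch of how retrieval in this space could work (the distance metric and the per-axis weights are illustrative assumptions, not the documented retrieval algorithm), nearest-tile lookup reduces to a weighted Euclidean distance over the three axes:

```python
import math

def tile_distance(a, b, weights=(1.0, 1.0, 1.0)):
    """Weighted Euclidean distance between two tiles in (x, y, z) concept space.

    The per-axis weights are a hypothetical knob for biasing retrieval toward
    abstraction (x), expertise (y), or recency (z); they are not part of the
    documented tile schema.
    """
    return math.sqrt(sum(
        w * (a["coordinates"][k] - b["coordinates"][k]) ** 2
        for w, k in zip(weights, ("x", "y", "z"))
    ))

def nearest_tiles(query_tile, tiles, k=3):
    """Return the k tiles closest to query_tile in concept space."""
    return sorted(tiles, key=lambda t: tile_distance(query_tile, t))[:k]
```

Raising the z-axis weight, for example, would bias retrieval toward tiles that match the query's recency rather than its level of abstraction.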

3. Judge System

3.1 Alpha Lobe - Basic Logic Verification

def alpha_lobe_check(reasoning_chain):
    """
    Verifies basic logical consistency
    - Contradiction detection
    - Premise-conclusion consistency
    - Reasoning step validity
    """
    return {
        "passed": bool,
        "issues": [],
        "confidence": float
    }

3.2 Beta Lobe (Basic) - Domain Knowledge Consistency

def beta_lobe_basic(answer, domain_knowledge):
    """
    Checks consistency with domain-specific knowledge
    - Terminology accuracy
    - Domain common sense alignment
    - Standard protocol compliance
    """
    return {
        "domain_consistency": float,
        "terminology_accuracy": float,
        "protocol_compliance": bool
    }

3.3 Beta Lobe (Advanced) - Deep Reasoning Verification

def beta_lobe_advanced(answer, reasoning_chain, meta_knowledge):
    """
    Verifies advanced reasoning processes
    - Multi-step reasoning validity
    - Causal relationship accuracy
    - Edge case consideration
    """
    return {
        "reasoning_depth": int,
        "causal_accuracy": float,
        "edge_case_coverage": float
    }
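The three lobes can be chained into a single verdict. The composition below is a hedged sketch: the stubs above define only the return shapes, so the toy inner results and the aggregation rule (all hard gates must pass and confidence must clear a threshold) are illustrative assumptions:

```python
def run_judge(answer, reasoning_chain, domain_knowledge, meta_knowledge,
              min_confidence=0.7):
    """Chain Alpha → Beta Basic → Beta Advanced and aggregate one verdict.

    The three inner results are toy placeholders with the return shapes the
    stubs declare; only the aggregation rule is being demonstrated, so the
    answer and knowledge arguments are left unused here.
    """
    alpha = {"passed": len(reasoning_chain) > 0, "issues": [], "confidence": 0.9}
    beta_basic = {"domain_consistency": 0.85, "terminology_accuracy": 0.9,
                  "protocol_compliance": True}
    beta_advanced = {"reasoning_depth": len(reasoning_chain),
                     "causal_accuracy": 0.8, "edge_case_coverage": 0.75}

    # Hard gates: logical pass + protocol compliance; soft gates: thresholds.
    approved = (alpha["passed"]
                and alpha["confidence"] >= min_confidence
                and beta_basic["protocol_compliance"]
                and beta_basic["domain_consistency"] >= min_confidence)
    return {"approved": approved, "alpha": alpha,
            "beta_basic": beta_basic, "beta_advanced": beta_advanced}
```

An answer with an empty reasoning chain fails the Alpha gate and is rejected regardless of the Beta scores.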

4. Fine-tuning Details

4.1 Training Process

Phase 1: Data Preparation

# Dataset split (8:1:1 ratio)
- Training data: 8,768 examples
- Validation data: 975 examples
- Test data: Withheld

# Data format
{
    "text": "System prompt + Question + Answer",
    "domain": "medical|legal|programming|science|general",
    "difficulty": 1-5,
    "requires_reasoning": bool
}
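A split like the one above can be produced with a few lines of Python. This is a minimal sketch assuming each record carries the `text` field shown in the data format, and that the trainer consumes `train.jsonl` / `valid.jsonl` files of one JSON object per line (the layout `mlx_lm lora --data .` reads):

```python
import json
import random

def split_dataset(examples, ratios=(0.8, 0.1, 0.1), seed=42):
    """Shuffle and split examples into train/valid/test by the given ratios."""
    examples = list(examples)
    random.Random(seed).shuffle(examples)
    n = len(examples)
    n_train = int(n * ratios[0])
    n_valid = int(n * ratios[1])
    return (examples[:n_train],
            examples[n_train:n_train + n_valid],
            examples[n_train + n_valid:])

def write_jsonl(path, examples):
    """Write one JSON object per line, keeping only the 'text' field."""
    with open(path, "w", encoding="utf-8") as f:
        for ex in examples:
            f.write(json.dumps({"text": ex["text"]}, ensure_ascii=False) + "\n")
```

Fixing the shuffle seed keeps the split reproducible across runs, so validation loss stays comparable between experiments.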

Phase 2: Model Quantization

# 4-bit quantization with MLX
python -m mlx_lm.convert \
    --hf-path deepseek-ai/DeepSeek-R1-Distill-Qwen-32B \
    --mlx-path ./deepseek-r1-32b-mlx-4bit \
    --quantize \
    --q-bits 4 \
    --q-group-size 64 \
    --trust-remote-code

# Result: 61GB → 17.2GB (72% reduction)
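The reported size is consistent with simple arithmetic: 4 bits per weight plus per-group quantization metadata, interpreted in GiB. The overhead model below (32 bits of scale-and-bias metadata per group of 64 weights, ~32.8B parameters) is an assumption about the affine quantization layout, not something the command above states:

```python
def quantized_size_gib(n_params, q_bits=4, group_size=64, meta_bits=32):
    """Estimate on-disk size of a group-quantized model in GiB:
    q_bits per weight, plus meta_bits (scale + bias) per group of weights."""
    weight_bits = n_params * q_bits
    meta_bits_total = (n_params / group_size) * meta_bits
    return (weight_bits + meta_bits_total) / 8 / 2**30

# Assumed ~32.8B parameters, 4-bit weights, group size 64
print(round(quantized_size_gib(32.8e9), 1))  # → 17.2
```

The estimate landing on 17.2 suggests the reported "17.2GB" is actually GiB, as file-size tools commonly report.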

Phase 3: LoRA Fine-tuning

python -m mlx_lm lora \
    --model ./deepseek-r1-32b-mlx-4bit \
    --train \
    --data . \
    --iters 1000 \
    --adapter-path ./adapters \
    --batch-size 1 \
    --learning-rate 1e-5 \
    --steps-per-report 10 \
    --steps-per-eval 100 \
    --save-every 250 \
    --grad-checkpoint \
    --max-seq-length 2048

4.2 Hyperparameter Selection Rationale

Parameter               Value   Reasoning
-----------------------------------------------------------------------
Learning Rate           1e-5    Stable learning for large models
Batch Size              1       Maximum efficiency under memory constraints
LoRA Rank               16      Balance between parameter efficiency and quality
LoRA Alpha              32      Standard setting of Rank × 2
Max Seq Length          2048    Support for long-form reasoning
Gradient Checkpointing  True    Reduced memory usage
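For context on why rank 16 is parameter-efficient: a LoRA pair adds only rank × (d_in + d_out) trainable weights per adapted matrix. The dimensions below (hidden size 5120, 64 layers, q/v projections adapted) are assumptions about the base model used purely for a back-of-the-envelope count, not values read from the checkpoint:

```python
def lora_trainable_params(d_in, d_out, rank=16):
    """Trainable weights added by one LoRA pair: A is d_in x rank, B is rank x d_out."""
    return rank * (d_in + d_out)

# Hypothetical dimensions: hidden size 5120, 64 layers,
# with q_proj and v_proj adapted in each layer.
per_layer = 2 * lora_trainable_params(5120, 5120, rank=16)
total = 64 * per_layer
print(total)  # → 20971520, i.e. ~21M trainable vs ~32.8B frozen parameters
```

Under these assumptions the adapters touch well under 0.1% of the model, which is why batch size 1 with gradient checkpointing fits alongside the 4-bit base weights.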

4.3 Learning Curve Analysis

Iteration   Train Loss   Val Loss   Improvement
----------------------------------------------
    0          -         3.318         -
  100        1.548       1.583       52.3%
  200        0.860       0.934       71.9%
  300        0.682       1.113       66.5%
  400        1.260       0.741       77.7%
  500        0.681       0.832       74.9%
  600        0.561       0.885       73.3%
  700        0.710       0.897       73.0%
  800        0.589       0.621       81.3%
  900        0.574       0.705       78.7%
 1000        0.583       0.712       78.5%

Observations:

  • Rapid improvement in first 100 iterations (52.3%)
  • Stable learning from iterations 200-500
  • Best validation loss around iteration 800
  • Final improvement of 78.5% achieved
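The Improvement column is the relative reduction from the iteration-0 validation loss (3.318); the figures in the table can be reproduced directly:

```python
def improvement(val_loss, baseline=3.318):
    """Percent reduction in validation loss vs. the iteration-0 baseline."""
    return round((baseline - val_loss) / baseline * 100, 1)

print(improvement(1.583))  # → 52.3  (iteration 100)
print(improvement(0.621))  # → 81.3  (iteration 800, best checkpoint)
print(improvement(0.712))  # → 78.5  (iteration 1000)
```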

5. Inference Optimization

5.1 Apple Silicon (MPS) Optimization

# MLX automatically optimizes for Apple Silicon
- Unified Memory Architecture utilization
- Metal Performance Shaders usage
- Neural Engine utilization (partial operations)

5.2 Inference Speed

Metric            Value
---------------------------------
Tokens/sec        30-35
Iterations/sec    0.35-0.40
Peak Memory       19.9 GB
Average Latency   ~2.8 s/iteration
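As a quick consistency check, the average latency follows from the iteration rate (1 / 0.35 ≈ 2.86 s), in line with the reported ~2.8 s per iteration:

```python
iters_per_sec = 0.35          # lower end of the reported range
latency = 1 / iters_per_sec   # seconds per iteration
print(round(latency, 2))      # → 2.86
```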

6. Model Capabilities by Domain

Medical Domain:

  • Diagnostic reasoning pathways
  • Treatment protocol recommendations
  • Drug interaction analysis
  • Clinical guideline interpretation

Legal Domain:

  • Legal precedent analysis
  • Statutory interpretation
  • Contract clause analysis
  • Regulatory compliance guidance

Programming Domain:

  • Code generation and optimization
  • Bug detection and debugging
  • Algorithm design and analysis
  • Software architecture recommendations

Scientific Domain:

  • Research methodology design
  • Statistical analysis guidance
  • Experimental design optimization
  • Data interpretation support

General Domain:

  • Broad knowledge retrieval
  • Multi-domain reasoning
  • Explanation generation
  • Knowledge synthesis

7. Limitations and Future Work

Current Limitations:

  • Requires significant RAM (20 GB+) for inference
  • Higher response latency on non-optimized hardware
  • Accuracy varies across domains

Future Improvements:

  • Further quantization experiments (3-bit, 2-bit)
  • Domain-specific adapter modules
  • Real-time ORCID verification integration
  • Expanded training dataset across domains
  • Multi-lingual support expansion