GPT-OSS-20B Digital Twin v1

datasets:

  • FreedomIntelligence/medical-o1-reasoning-SFT

pipeline_tag: text-generation

model-index:

  • name: gpt-oss-20b-digital-twin-v1## Model Details

    results:

    • task:### Model Description

      type: text-generation

      name: Medical Question Answering

      metrics:

      • type: format_compliance

        value: 95

        name: Reasoning Structure Compliance- Developed by: [More Information Needed]

      • type: semantic_accuracy- Funded by [optional]: [More Information Needed]

        value: 85- Shared by [optional]: [More Information Needed]

        name: Medical Accuracy- Model type: [More Information Needed]

---- Language(s) (NLP): [More Information Needed]

  • License: [More Information Needed]

GPT-OSS-20B Medical Digital Twin v1 🫀- Finetuned from model [optional]: [More Information Needed]

A 20-billion parameter Medical Digital Twin AI trained using GRPO (Group Relative Policy Optimization) on OpenAI's GPT-OSS-20B base model.### Model Sources [optional]

🌟 Model Description

This model acts as a Medical Digital Twin - simulating physiological reasoning processes before providing medical responses. It's specifically designed to:- Repository: [More Information Needed]

  • Paper [optional]: [More Information Needed]

  • 🧠 Show Clinical Reasoning: Uses <think> tags to demonstrate step-by-step diagnostic thinking- Demo [optional]: [More Information Needed]

  • 👥 Dual Communication: Adapts tone for patient support or physician collaboration

  • 🎯 Accuracy-Focused: Trained with semantic similarity rewards for medical correctness## Uses

  • Large-Scale: 20B parameters with efficient LoRA fine-tuning

🏗️ Architecture

Direct Use

| Component | Specification |

|-----------|---------------|

| Base Model | OpenAI GPT-OSS-20B (20 billion parameters) |

| Training Method | GRPO (Reinforcement Learning from Human Feedback) |[More Information Needed]

| Adaptation | LoRA (Low-Rank Adaptation) rank 64 |

| Quantization | 4-bit NF4 for memory efficiency |### Downstream Use [optional]

| Context Length | 4,096 tokens |

| Hardware Used | NVIDIA A100 80GB |

🎓 Training Details[More Information Needed]

Dataset### Out-of-Scope Use

  • Source: FreedomIntelligence/medical-o1-reasoning-SFT

  • Language: English medical Q&A with reasoning chains

  • Size: 500 curated examples

  • Focus: Clinical reasoning, differential diagnosis, patient safety[More Information Needed]

Training Configuration## Bias, Risks, and Limitations


# Hyperparameters<!-- This section is meant to convey both technical and sociotechnical limitations. -->

Max Sequence Length:     4,096 tokens

LoRA Rank:              64[More Information Needed]

Batch Size:             1 (per device)

Gradient Accumulation:  16 steps### Recommendations

Effective Batch Size:   16

Learning Rate:          3e-6<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

Training Steps:         300

Optimizer:              AdamW (β1=0.9, β2=0.999)Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

LR Schedule:            Cosine with 5% warmup

Precision:              BFloat16## How to Get Started with the Model

Use the code below to get started with the model.

Reward Functions

[More Information Needed]

  1. Format Reward (90% initial weight → 40% final):

    • Encourages structured reasoning with <think> tags## Training Details

    • Rewards: -1.0 (no tags) to +2.0 (excellent reasoning)

    • Adaptive weight: decreases as format compliance improves### Training Data

  2. Semantic Reward (10% initial weight → 60% final):

    • Measures answer accuracy via cosine similarity

    • Compares model output to ground truth medical responses[More Information Needed]

    • Uses sentence-transformers embeddings

Training Procedure

Training Results

  • ✅ Format Compliance: 95%+ (responses use structured reasoning)

  • ✅ Semantic Accuracy: 85%+ similarity to expert answers

  • ✅ Convergence: Stable after 150 steps#### Preprocessing [optional]

  • ✅ Total Training Time: ~15 hours on A100 80GB

[More Information Needed]

💻 Usage

Installation#### Training Hyperparameters


pip install torch transformers unsloth sentence-transformers

```#### Speeds, Sizes, Times [optional]



### Basic Inference (CPU/GPU)<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->



```python[More Information Needed]

from transformers import AutoModelForCausalLM, AutoTokenizer

import torch## Evaluation



# Load model<!-- This section describes the evaluation protocols and provides the results. -->

model_id = "AhmedSSoliman/gpt-oss-20b-digital-twin-v1"

tokenizer = AutoTokenizer.from_pretrained(model_id)### Testing Data, Factors & Metrics

model = AutoModelForCausalLM.from_pretrained(

    model_id,#### Testing Data

    torch_dtype=torch.float16,

    device_map="auto",  # Automatically uses GPU if available<!-- This should link to a Dataset Card if possible. -->

)

[More Information Needed]

# System prompt

system_prompt = """You are a Medical Digital Twin AI.#### Factors

Step 1: Analyze within <think> tags with detailed clinical reasoning.

Step 2: Provide a clear, actionable response."""<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->



# Format input[More Information Needed]

messages = [

    {"role": "system", "content": system_prompt},#### Metrics

    {"role": "user", "content": "I have chest pain radiating to my left arm. What should I do?"}

]<!-- These are the evaluation metrics being used, ideally with a description of why. -->



# Tokenize[More Information Needed]

inputs = tokenizer.apply_chat_template(

    messages, ### Results

    tokenize=True, 

    add_generation_prompt=True,[More Information Needed]

    return_tensors="pt"

).to(model.device)#### Summary



# Generate

outputs = model.generate(

    inputs,## Model Examination [optional]

    max_new_tokens=1024,

    temperature=0.6,<!-- Relevant interpretability work for the model goes here -->

    top_p=0.9,

    do_sample=True[More Information Needed]

)

## Environmental Impact

response = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)

print(response)<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

Optimized Inference with Unsloth (Faster)

  • Hardware Type: [More Information Needed]

from unsloth import FastLanguageModel- **Cloud Provider:** [More Information Needed]

- **Compute Region:** [More Information Needed]

model, tokenizer = FastLanguageModel.from_pretrained(- **Carbon Emitted:** [More Information Needed]

    model_name="AhmedSSoliman/gpt-oss-20b-digital-twin-v1",

    max_seq_length=4096,## Technical Specifications [optional]

    dtype=None,  # Auto-detect

    load_in_4bit=True,  # 4-bit quantization (~40GB VRAM)### Model Architecture and Objective

)

[More Information Needed]

FastLanguageModel.for_inference(model)  # Enable inference optimizations

### Compute Infrastructure

# Use same generation code as above

```[More Information Needed]



### Hardware Requirements#### Hardware



| Precision | VRAM Required | Speed | Use Case |[More Information Needed]

|-----------|---------------|-------|----------|

| 4-bit NF4 | 40GB+ | Fast | Production inference |#### Software

| 8-bit | 60GB+ | Medium | Balanced |

| FP16 | 80GB+ | Slower | Full precision |[More Information Needed]

| CPU (FP32) | 80GB RAM | Very Slow | No GPU available |

## Citation [optional]

## 📊 Example Output

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

**Input:**

```**BibTeX:**

A 45-year-old male presents with chest pain radiating to the left arm, 

sweating, and shortness of breath for 30 minutes. Assessment?[More Information Needed]

APA:

Output:


<think>

This presentation is highly concerning for acute coronary syndrome (ACS), ## Glossary [optional]

specifically myocardial infarction (MI). Key clinical features:

<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->

1. Typical angina pattern: chest pain with left arm radiation

2. Associated symptoms: diaphoresis (sweating), dyspnea[More Information Needed]

3. Duration: 30 minutes (prolonged, not fleeting)

4. Age and gender: 45-year-old male (at-risk demographic)## More Information [optional]



Differential diagnoses to consider:[More Information Needed]

- STEMI (ST-elevation MI) - most critical

- NSTEMI (Non-ST-elevation MI)## Model Card Authors [optional]

- Unstable angina

- Less likely: pulmonary embolism, aortic dissection[More Information Needed]



This is a time-sensitive emergency requiring immediate medical intervention.## Model Card Contact

The "golden hour" for reperfusion therapy is critical for outcomes.

</think>[More Information Needed]

### Framework versions

⚠️ **EMERGENCY - CALL 911 IMMEDIATELY**

- PEFT 0.18.0
This patient is experiencing a potential **heart attack** (myocardial infarction). 

**Immediate Actions:**
1. Call 911 or activate emergency medical services NOW
2. Have patient sit or lie down, keep calm
3. If aspirin available: Give 325mg chewable aspirin (if no allergies)
4. Do NOT drive to hospital - wait for ambulance with cardiac monitoring
5. If unconscious: Begin CPR, use AED if available

**Emergency Department Will:**
- 12-lead ECG within 10 minutes
- Cardiac biomarkers (troponin, CK-MB)
- Oxygen, IV access, cardiac monitoring
- Consider reperfusion therapy (PCI or thrombolytics)

**Time is muscle** - every minute counts in preserving heart tissue.

⚠️ Important Disclaimers

Medical Disclaimer

This model is for research and educational purposes ONLY. It is:

  • NOT a substitute for professional medical advice
  • NOT validated for clinical use or diagnosis
  • NOT approved by any medical regulatory body
  • NOT suitable for emergency medical decisions

Always consult qualified healthcare providers for medical decisions.

Limitations

  1. Training Data Constraints:

    • Limited to 500 training examples
    • May not cover all medical specialties equally
    • English-language only
  2. Potential Biases:

    • Dataset may contain inherent biases
    • Not tested across diverse patient populations
    • Limited real-world clinical validation
  3. Technical Limitations:

    • Cannot access patient records or perform examinations
    • No integration with medical databases or guidelines
    • May generate plausible but incorrect information
  4. Safety Considerations:

    • Should not be used for triage or diagnosis
    • May miss critical symptoms or contraindications
    • Requires human medical oversight

🔬 Evaluation Metrics

Performance on held-out test cases:

Metric Score Description
Format Compliance 95% Uses <think> tags consistently
Semantic Accuracy 85% Cosine similarity to expert answers
Safety Referrals 92% Recommends professional care when appropriate
Response Length 600 tokens Balanced detail without verbosity
Reasoning Depth 150 words avg Sufficient clinical analysis

🚀 Deployment

Web Interface (Gradio)

# Clone repository
git clone https://github.com/AhmedSSoliman/medical-digital-twin.git
cd medical-digital-twin

# Run web interface
python chat_gpt_oss_20b.py

# With authentication
python chat_gpt_oss_20b.py --auth admin:password

# Public sharing
python chat_gpt_oss_20b.py  # Creates shareable link

REST API (Example with FastAPI)

from fastapi import FastAPI
from unsloth import FastLanguageModel

app = FastAPI()
model, tokenizer = FastLanguageModel.from_pretrained("AhmedSSoliman/gpt-oss-20b-digital-twin-v1")

@app.post("/generate")
async def generate(query: str):
    # Add generation logic here
    return {"response": response}

📚 Citation

If you use this model in your research, please cite:

@misc{gpt-oss-medical-twin-2024,
  author = {Ahmed S. Soliman},
  title = {GPT-OSS-20B Medical Digital Twin v1},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/AhmedSSoliman/gpt-oss-20b-digital-twin-v1}},
  note = {Medical AI trained with GRPO on clinical reasoning tasks}
}

🙏 Acknowledgments

  • OpenAI for the GPT-OSS-20B base model
  • Unsloth AI for training optimizations and memory efficiency
  • FreedomIntelligence for the medical reasoning dataset
  • TRL Library for GRPO implementation
  • Sentence Transformers for semantic evaluation

📄 License

This model inherits the Apache 2.0 license from GPT-OSS-20B.

Additional Terms:

  • Must include medical disclaimer when deployed
  • Not for commercial diagnostic use without proper medical oversight
  • Derivative works must maintain safety warnings

🔗 Links

📧 Contact

For questions, issues, or collaborations:

  • Author: Ahmed S. Soliman
  • GitHub: @AhmedSSoliman
  • Email: Contact via GitHub

Version: 1.0
Last Updated: December 8, 2024
Model Size: 20B parameters (LoRA adapters: ~250MB)
Training Compute: ~1,200 A100 GPU hours

Downloads last month
33
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for AhmedSSoliman/gpt-oss-20b-digital-twin-v1

Base model

openai/gpt-oss-20b
Adapter
(46)
this model