# WeDLM-8B-Instruct-MLX
This is an unquantized, half-precision (fp16) MLX conversion of tencent/WeDLM-8B-Instruct for inference on Apple Silicon.
## Model Details
- Base Model: tencent/WeDLM-8B-Instruct
- Precision: fp16 (no quantization)
- Format: MLX SafeTensors
- Size: ~15.2 GB
## About WeDLM
WeDLM (Window-based Efficient Diffusion Language Model) is a novel approach that combines:
- Entropy-based parallel decoding: Multiple tokens generated simultaneously based on prediction confidence
- Topological reordering: Efficient KV cache layout while preserving logical positions via RoPE
- Window-based generation: Fixed-size window processed in parallel per forward pass
Reference: arXiv:2512.22737
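To make the decoding idea concrete, here is a minimal, illustrative sketch of entropy-gated parallel decoding in MLX. This is not the WeDLM implementation: the `entropy_parallel_step` helper, the window size, and the threshold are hypothetical, and the real model adds topological KV-cache reordering on top of this.

```python
import mlx.core as mx

def entropy_parallel_step(window_logits, threshold=1.0):
    """One decoding step over a fixed window of masked positions.

    window_logits: (window_size, vocab_size) logits from one forward pass.
    Positions whose predictive entropy falls below `threshold` are committed
    in parallel; the rest stay masked for the next pass.
    """
    log_probs = window_logits - mx.logsumexp(window_logits, axis=-1, keepdims=True)
    probs = mx.exp(log_probs)
    # Shannon entropy per position: low entropy = high prediction confidence.
    entropy = -mx.sum(probs * log_probs, axis=-1)
    tokens = mx.argmax(window_logits, axis=-1)  # greedy choice per position
    accept = entropy < threshold                # boolean mask of committed slots
    return tokens, accept
```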
## Usage

### Installation

```bash
pip install mlx mlx-lm
```
### Quick Start

```python
from mlx_lm import load, generate

model, tokenizer = load("zimengxiong/WeDLM-8B-Instruct-MLX")
response = generate(model, tokenizer, prompt="What is machine learning?", max_tokens=256)
print(response)
```
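If you want tokens as they are produced rather than a single blocking call, mlx-lm also exposes `stream_generate`. Note this is version-dependent: recent mlx-lm releases yield response objects with a `.text` field, while older releases yield plain strings, so adapt the loop to your installed version.

```python
from mlx_lm import load, stream_generate

model, tokenizer = load("zimengxiong/WeDLM-8B-Instruct-MLX")

# Print tokens as they arrive instead of waiting for the full completion.
# Recent mlx-lm versions yield objects with a .text field; older ones yield str.
for chunk in stream_generate(model, tokenizer, prompt="What is machine learning?", max_tokens=256):
    print(chunk.text, end="", flush=True)
print()
```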
### Chat Template

```python
from mlx_lm import load, generate

model, tokenizer = load("zimengxiong/WeDLM-8B-Instruct-MLX")
messages = [{"role": "user", "content": "Explain quantum computing in simple terms."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
response = generate(model, tokenizer, prompt=prompt, max_tokens=512)
print(response)
```
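For multi-turn chat, append each assistant reply to the message history and re-apply the chat template so the model sees the whole conversation on the next turn. A minimal sketch using the same API as above:

```python
from mlx_lm import load, generate

model, tokenizer = load("zimengxiong/WeDLM-8B-Instruct-MLX")
messages = [{"role": "user", "content": "Explain quantum computing in simple terms."}]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
reply = generate(model, tokenizer, prompt=prompt, max_tokens=512)

# Carry the assistant's reply forward so the next turn has full context.
messages.append({"role": "assistant", "content": reply})
messages.append({"role": "user", "content": "Now give a concrete example."})
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(generate(model, tokenizer, prompt=prompt, max_tokens=512))
```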
## Model Architecture
| Parameter | Value |
|---|---|
| Hidden Size | 4096 |
| Intermediate Size | 12288 |
| Num Layers | 36 |
| Num Attention Heads | 32 |
| Num KV Heads | 8 |
| Head Dim | 128 |
| Vocab Size | 151936 |
| Max Position Embeddings | 16384 |
| RoPE Theta | 1000000 |
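As a back-of-the-envelope check, the table above is consistent with the ~15.2 GB size listed earlier, assuming a SwiGLU-style MLP (gate/up/down projections) and tied input/output embeddings. Both are assumptions for illustration; neither is confirmed by this card.

```python
# Rough parameter count from the architecture table (norms omitted).
hidden, inter, layers, vocab = 4096, 12288, 36, 151936
heads, kv_heads, head_dim = 32, 8, 128

attn = hidden * heads * head_dim * 2        # q_proj + o_proj
attn += hidden * kv_heads * head_dim * 2    # k_proj + v_proj (GQA)
mlp = 3 * hidden * inter                    # gate, up, down (SwiGLU assumed)
embed = vocab * hidden                      # tied embedding/lm_head assumed

params = layers * (attn + mlp) + embed
print(f"{params / 1e9:.2f}B params, ~{params * 2 / 1e9:.1f} GB at fp16")
# -> roughly 7.57B params, ~15.1 GB, in line with the ~15.2 GB above
```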
## Related Models
| Variant | HuggingFace |
|---|---|
| 4-bit | zimengxiong/WeDLM-8B-Instruct-MLX-4bit |
| 8-bit | zimengxiong/WeDLM-8B-Instruct-MLX-8bit |
| fp16 (this model) | zimengxiong/WeDLM-8B-Instruct-MLX |
## License
This model inherits the license from the base model tencent/WeDLM-8B-Instruct.