# Lettuce MiniLM BGE-M3 v1 - ONNX INT8
Lettuce MiniLM BGE-M3 v1 is a distilled and quantized MiniLM-based sentence embedding model designed for fast, on-device semantic search and conversational memory retrieval.
It provides:
- Small model size (int8 quantized ONNX)
- Low-latency inference on CPU and mobile devices
- 384-dimensional embeddings
- Sentence-transformers compatible tokenizer + config
- Great for roleplay memory systems, local RAG, vector search, clustering
This model is ideal for applications where speed and size matter, such as:
- Offline / on-device chat apps
- Memory retrieval for roleplay systems
- Lightweight RAG pipelines
- Mobile devices (Android, iOS)
- Desktop apps (Tauri, Electron, native)
## Model Description

- Base architecture: all-MiniLM-L6-v2 (6-layer MiniLM encoder)
- Teacher: BAAI/bge-m3
- Embedding dimension: 384
- Format: ONNX (int8 quantized)
- Pooling: Mean pooling + normalization
- Tokenizer: WordPiece (MiniLM-compatible)
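Mean pooling averages the token embeddings, using the attention mask so padding tokens are ignored, and the result is L2-normalized. The pooling is baked into the exported ONNX graph, but the operation itself can be sketched in NumPy (function name and toy inputs are illustrative):

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    # token_embeddings: (batch, seq_len, dim); attention_mask: (batch, seq_len)
    mask = attention_mask[..., None].astype(np.float32)   # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)        # sum of non-padding tokens
    counts = np.clip(mask.sum(axis=1), 1e-9, None)        # number of real tokens
    pooled = summed / counts
    # L2-normalize so cosine similarity reduces to a dot product
    return pooled / np.linalg.norm(pooled, axis=1, keepdims=True)

# toy batch of 1: two tokens, the second is padding and must be ignored
tok = np.array([[[1.0, 0.0], [3.0, 4.0]]])
mask = np.array([[1, 0]])
emb = mean_pool(tok, mask)  # -> [[1.0, 0.0]]
```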
This model was trained by distilling a larger teacher embedding model (BGE-M3) into a compact MiniLM student, then exporting and quantizing the model to ONNX int8 for maximum runtime efficiency.
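The card does not specify the exact distillation objective. A common choice for embedding distillation is minimizing one minus the cosine similarity between student and teacher embeddings (with a projection when their dimensions differ, as here between the 1024-dim BGE-M3 teacher and the 384-dim student). A toy NumPy sketch of that loss:

```python
import numpy as np

def cosine_distillation_loss(student, teacher):
    # 1 - cosine similarity between L2-normalized student and teacher
    # embeddings, averaged over the batch. Assumes both have already been
    # projected to the same dimension.
    s = student / np.linalg.norm(student, axis=1, keepdims=True)
    t = teacher / np.linalg.norm(teacher, axis=1, keepdims=True)
    return float(np.mean(1.0 - np.sum(s * t, axis=1)))

# identical directions (regardless of magnitude) -> zero loss
s = np.array([[1.0, 0.0], [0.0, 2.0]])
t = np.array([[2.0, 0.0], [0.0, 1.0]])
loss = cosine_distillation_loss(s, t)  # -> 0.0
```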
## Usage (Python + ONNX Runtime)

```python
import onnxruntime as ort
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("zeolit/lettuce-minilm-bge-m3-v1-onnx-int8")
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

def embed(texts):
    if isinstance(texts, str):
        texts = [texts]
    enc = tokenizer(
        texts,
        padding=True,
        truncation=True,
        max_length=256,
        return_tensors="np",
    )
    outputs = session.run(
        ["sentence_embedding"],
        {
            "input_ids": enc["input_ids"],
            "attention_mask": enc["attention_mask"],
        },
    )[0]
    return outputs  # (batch_size, 384)

embeddings = embed("Sam took a bullet for me.")
print(embeddings.shape)  # (1, 384)
```
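Because the embeddings are L2-normalized, cosine similarity reduces to a dot product, which is all a simple retrieval loop needs. A minimal ranking sketch, with placeholder 2-d vectors standing in for `embed()` outputs:

```python
import numpy as np

def rank(query_emb, doc_embs):
    # Embeddings are already L2-normalized, so cosine similarity
    # is just a dot product; higher score = more similar.
    scores = query_emb @ doc_embs.T            # (1, n_docs)
    return scores, int(np.argmax(scores, axis=1)[0])

# placeholder unit vectors for illustration only
query = np.array([[0.6, 0.8]])
docs = np.array([[0.6, 0.8],    # identical direction -> score 1.0
                 [0.8, -0.6]])  # orthogonal         -> score 0.0
scores, best = rank(query, docs)  # best -> 0
```

In a real application, `query` would come from embedding the user's message and `docs` from embedding stored memories or passages.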
## Limitations

- Primarily optimized for English text
- Student model is smaller than the teacher; not intended for high-precision semantic tasks
- Not suitable for safety-critical decision making
## License

This model is released under the Apache-2.0 license.

Derived from:
- sentence-transformers/all-MiniLM-L6-v2 (Apache-2.0)
- BAAI/bge-m3 (MIT License)
## Acknowledgements
Thanks to:
- Sentence Transformers
- BAAI for the BGE-M3 model
- The ONNX Runtime team