| # Ministral-3-3B-Turbo | |
| Run Ministral-3-3B on Qualcomm NPU with NexaSDK and turbo optimization. | |
| ## Quickstart | |
| Install nexaSDK and create a free account at [sdk.nexa.ai](https://sdk.nexa.ai) | |
| Activate your device with your access token: | |
| ```bash | |
| nexa config set license '<access_token>' | |
| ``` | |
| Run the model locally in one line: | |
| ```bash | |
| nexa infer NexaAI/Ministral-3-3B-turbo-npu | |
| ``` | |
| ## Model Description | |
| **Ministral-3-3B-Instruct-2512** is the instruction-tuned variant of Mistral AI’s smallest Ministral 3 model: a compact multimodal language model combining a ~3.4B-parameter language core with a 0.4B-parameter vision encoder. | |
| It is post-trained in FP8 for instruction-following, making it well-suited for chat-style agents, tool use, and grounded reasoning on both text and images. | |
| With a large 256k context window and efficient edge-oriented design, it targets real-time use on GPUs and other resource-constrained hardware. | |
| ## Features | |
| - **Multimodal (vision + text)**: Understands and reasons over images alongside text in a single conversation. | |
| - **Instruction-tuned**: Optimized for following natural-language instructions, chat, and assistant-style workflows. | |
| - **Agentic capabilities**: Native support for function calling and structured JSON-style outputs for tool and API orchestration. | |
| - **Large context window**: Up to **256k tokens** for long documents, multi-step workflows, and complex sessions. | |
| - **Edge-optimized FP8 weights**: FP8 checkpoint designed for efficient deployment and serving, including on a single modern GPU. | |
| - **Multilingual**: Supports dozens of languages, including English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, and Arabic. | |
| - **Part of the Ministral 3 family**: Seamlessly aligned with 3B/8B/14B base, instruct, and reasoning variants for scalable deployments. | |
| ## Use Cases | |
| - **Vision + language assistants** | |
| - Image captioning and explanation (UI screenshots, photos, diagrams) | |
| - Multimodal Q&A (e.g., “describe this chart and summarize its implications”) | |
| - **Lightweight agents and tools** | |
| - Function-calling workflows (retrieval, calculators, external APIs) | |
| - JSON-structured responses for downstream automation | |
| - **Text understanding & generation** | |
| - Classification, tagging, routing, and extraction from long documents | |
| - Short-form copywriting, drafting, and rewriting across multiple languages | |
| - **Edge & low-resource deployments** | |
| - On-device or near-edge assistants where latency, context length, and cost matter | |
| - Local/private workloads that benefit from a small yet capable multimodal model | |
| ## Inputs and Outputs | |
| **Inputs** | |
| - **Text-only prompts** | |
| - Single-turn or multi-turn chat-style conversations (`system`, `user`, `assistant` roles). | |
| - Long-context inputs up to the model’s context limit (e.g., documents, logs, transcripts). | |
| - **Multimodal prompts** | |
| - One or more images (e.g., URLs or image tensors) combined with text. | |
| - **Structured tool schemas** | |
| - Function / tool definitions for agentic workflows (JSON schemas describing functions and parameters). | |
| **Outputs** | |
| - **Generated text** | |
| - Answers, explanations, step-by-step reasoning, summaries, and creative content. | |
| - **Multimodal-aware responses** | |
| - Text grounded in the provided images (descriptions, comparisons, localized details). | |
| - **Structured tool calls** | |
| - JSON-like tool call objects for function execution and programmatic integration. | |
| - **Logits / probabilities (advanced)** | |
| - For users accessing the raw model via low-level APIs, token-level scores for custom decoding or research. | |
| ## License | |
| This repo is licensed under the Creative Commons Attribution–NonCommercial 4.0 (CC BY-NC 4.0) license, which allows use, sharing, and modification only for non-commercial purposes with proper attribution. All NPU-related models, runtimes, and code in this project are protected under this non-commercial license and cannot be used in any commercial or revenue-generating applications. Commercial licensing or enterprise usage requires a separate agreement. For inquiries, please contact `[email protected]` |