
aquif-3.5-Nano-1B

aquif-3.5-Nano-1B is a lightweight yet capable language model with 1.72B parameters, delivering a strong performance-to-efficiency ratio for resource-constrained deployments. Built on Qwen3-1.7B with comprehensive instruction-tuning, this model achieves competitive results across reasoning, mathematics, and code generation tasks while maintaining compatibility with consumer-grade hardware.

With a 40K token context window and bfloat16 precision, aquif-3.5-Nano-1B enables practical applications requiring fast inference and minimal memory overhead.

Model Overview

| Attribute | Value |
|-----------|-------|
| Total Parameters | 1.72B |
| Context Window | 40K tokens |
| Hidden Size | 2048 |
| Attention Heads | 16 |
| Key-Value Heads | 8 |
| Hidden Layers | 28 |
| Activation | SiLU |
| Precision | BF16 |
| Model Type | Causal Language Model (Qwen3) |
| Multilingual | 10 languages |
| License | Apache 2.0 |

Key Features

Efficient Architecture

aquif-3.5-Nano-1B achieves remarkable performance density through:

  • Optimized Layer Configuration: 28 layers with full attention mechanism balancing capacity and efficiency
  • Extended Context: 40K token window enables complex reasoning and document processing without architectural constraints
  • Memory Efficient: Designed for deployment on devices with 8GB+ VRAM; quantization support available for smaller footprints (see the estimate sketch after this list)
  • Fast Inference: Minimal parameter count enables rapid token generation and batch processing
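
As a back-of-envelope check on the memory claim above, the following sketch estimates weight and KV-cache memory from the figures in the Model Overview table (an illustrative estimate; actual usage varies with runtime and framework overhead):

```python
# Rough memory estimate for aquif-3.5-Nano-1B (figures from the Model Overview table).
params = 1.72e9      # total parameters
bytes_per_param = 2  # BF16 stores 2 bytes per parameter

weights_gib = params * bytes_per_param / 2**30
print(f"weights: ~{weights_gib:.1f} GiB")  # ~3.2 GiB

# KV cache per token: K and V tensors across all layers and KV heads.
layers, kv_heads, head_dim = 28, 8, 128
kv_bytes_per_token = 2 * layers * kv_heads * head_dim * bytes_per_param
kv_gib_full_context = kv_bytes_per_token * 40_000 / 2**30
print(f"KV cache at 40K tokens: ~{kv_gib_full_context:.1f} GiB")  # ~4.3 GiB

# Weights plus a full-context KV cache land around 7.5 GiB,
# consistent with the 8GB+ VRAM guidance above.
```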

Strong Core Capabilities

  • Reasoning: Excels at multi-step logical inference and problem-solving
  • Mathematics: Robust performance on mathematical reasoning and calculation tasks
  • Code Generation: Proficient in generating and understanding code across multiple programming languages
  • Instruction Following: Refined through comprehensive instruction-tuning for reliable task execution

Multilingual Support

Native support for 10 languages: English, German, Italian, Portuguese, French, Hindi, Spanish, Thai, Chinese, and Japanese.

Evaluation

Benchmark Performance

| Metric | aquif-3.5-Nano-1B | Ministral 3 3B Instruct | aquif-3.5-3B | Granite 4.0 H 1B | Qwen3-1.7B |
|--------|-------------------|-------------------------|--------------|------------------|------------|
| MMLU | 72.9 | 70.7 | 70.2 | 59.7 | 59.1 |
| GPQA Diamond | 42.0 | 35.8 | 35.8 | 29.7 | 27.7 |
| AIME 2025 | 28.7 | 22.0 | 13.4 | 6.3 | 7.3 |
| LiveCodeBench | 24.8 | 24.7 | 23.1 | 11.5 | 12.6 |
| Average | 42.1 | 38.3 | 35.6 | 26.8 | 26.7 |

Performance Analysis

Despite its small size, aquif-3.5-Nano-1B demonstrates exceptional capability for its parameter count:

  • MMLU: 72.9% accuracy, significantly outperforming base Qwen3-1.7B and Granite 4.0 H 1B
  • GPQA Diamond: 42.0% on expert-level questions, indicating robust reasoning capability
  • AIME 2025: 28.7% on advanced mathematics, demonstrating substantial improvement over comparable models
  • LiveCodeBench: 24.8% on real-world programming tasks, competitive with larger instruction-tuned variants

The model shows particular strength in reasoning and technical tasks, making it suitable for applications requiring intelligence without significant computational overhead.

Installation

```bash
pip install transformers torch
```

For faster inference with quantization support:

```bash
pip install transformers torch bitsandbytes
```
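
Once installed, a minimal generation sketch using the standard transformers API (the model id matches this repository; the prompt and generation settings are illustrative, not official recommendations):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "aquif-ai/aquif-3.5-Nano-1B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # native precision of the released weights
    device_map="auto",
)

# Build a chat-formatted prompt with the tokenizer's chat template.
messages = [{"role": "user", "content": "Explain KV caching in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```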

Technical Specifications

  • Architecture: Qwen3 Causal Language Model
  • Attention Mechanism: Full attention across all 28 layers
  • Position Encoding: RoPE (Rotary Position Embeddings) with theta=1,000,000
  • Normalization: RMSNorm with epsilon=1e-6
  • Vocabulary Size: 151,936 tokens
  • Head Dimension: 128
  • Intermediate Size: 6,144
  • Training Data Format: Instructions and reasoning tasks in multilingual contexts
  • Attention Dropout: 0.0
  • KV Caching: Enabled for efficient multi-turn inference
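
These values map onto standard fields of the Hugging Face model config and can be checked directly (a verification sketch; field names follow the Qwen3 config class):

```python
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("aquif-ai/aquif-3.5-Nano-1B")

# Each printed field should match the specification above.
print(cfg.num_hidden_layers)    # 28
print(cfg.hidden_size)          # 2048
print(cfg.num_attention_heads)  # 16
print(cfg.num_key_value_heads)  # 8
print(cfg.head_dim)             # 128
print(cfg.intermediate_size)    # 6144
print(cfg.rope_theta)           # 1000000.0
print(cfg.rms_norm_eps)         # 1e-06
print(cfg.vocab_size)           # 151936
```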

Use Cases

aquif-3.5-Nano-1B excels at:

  • Edge Deployment: Real-time inference on resource-limited devices
  • API Services: Cost-effective inference at scale with minimal latency
  • Research Prototyping: Fast experimentation with instruction-following models
  • Educational Applications: Learning model behavior without computational barriers
  • Local Processing: Privacy-preserving on-device inference
  • Embedded Systems: Integration into IoT and edge computing environments
  • Mathematical Reasoning: Problem-solving and technical explanation tasks
  • Code Assistance: Programming help and code generation at constrained budgets

Limitations and Considerations

  • Parameter Scale: Though efficient, the model's smaller capacity relative to 3B+ models may limit performance on highly complex tasks
  • Context Length: The 40K-token window supports extended reasoning but is shorter than that of frontier models
  • Hardware Optimization: Best performance with recent hardware supporting BF16; FP32 inference available but slower
  • Specialized Domains: May require domain-specific fine-tuning for niche applications
  • Real-Time Requirements: Suitable for most applications; extremely latency-critical scenarios may benefit from optimization or quantization

Performance Optimization

  • Quantization: Use INT8 quantization to reduce the memory footprint from ~4GB to 2-2.5GB (see the sketch after this list)
  • Flash Attention: Compatible with flash-attention implementations for faster inference
  • KV Caching: Leverages caching for efficient multi-turn conversations
  • Batch Inference: Process multiple prompts simultaneously for throughput optimization
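
A sketch of the INT8 path via bitsandbytes, combined with simple batched generation (BitsAndBytesConfig is the standard transformers quantization interface; exact memory savings and throughput depend on hardware and library versions):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "aquif-ai/aquif-3.5-Nano-1B"

# INT8 weight quantization: roughly halves weight memory relative to BF16.
bnb_config = BitsAndBytesConfig(load_in_8bit=True)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    # attn_implementation="flash_attention_2",  # optional, if flash-attn is installed
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.padding_side = "left"  # left-pad for batched decoder-only generation
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Batch inference: tokenize several prompts together for higher throughput.
prompts = [
    "Summarize rotary position embeddings in one sentence.",
    "Write a Python one-liner that reverses a list.",
]
batch = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)
outputs = model.generate(**batch, max_new_tokens=64)
for seq in outputs:
    print(tokenizer.decode(seq, skip_special_tokens=True))
```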

Acknowledgements

  • Qwen Team: Base architecture and foundational model
  • HuggingFace: Model infrastructure and community ecosystem
  • aquif AI Research Team: Instruction-tuning optimization and performance refinement

License

This project is released under the Apache 2.0 License.


Made in 🇧🇷

© 2025 aquif AI. All rights reserved.
