Model Card: Chatbot Caca Retrieval
Model Description
Lightweight retrieval-based QA system untuk Bahasa Indonesia.
Training Data
- Source: datasets-caca-3500
- Size: 3,500 conversational QA pairs
- Language: Indonesian
- Format: User-Assistant conversations
Architecture
- Algorithm: Hybrid scoring system
- BM25 (40% weight) - keyword matching
- TF-IDF + Cosine Similarity (50% weight) - semantic matching
- Fuzzy String Matching (10% weight) - typo tolerance
Performance
| Metric | Value |
|---|---|
| Model Size | 2.69 MB |
| Query Latency | <10 ms |
| Memory Usage | ~5 MB RAM |
| Paraphrase Accuracy | High |
Limitations
- Only works for questions in dataset or similar paraphrases
- No generative capability
- Limited to Indonesian language
Ethical Considerations
- Responses reflect training data (datasets-caca-3500)
- Personality may include sarcasm/humor
- Not suitable for critical applications
License
MIT License