# Model Card: Chatbot Caca Retrieval

## Model Description

A lightweight retrieval-based question-answering (QA) system for Indonesian (Bahasa Indonesia).

### Training Data

- **Source:** datasets-caca-3500
- **Size:** 3,500 conversational QA pairs
- **Language:** Indonesian
- **Format:** User-Assistant conversations

### Architecture

- **Algorithm:** Hybrid scoring system
  - BM25 (40% weight): keyword matching
  - TF-IDF + Cosine Similarity (50% weight): semantic matching
  - Fuzzy String Matching (10% weight): typo tolerance

### Performance

| Metric | Value |
|--------|-------|
| Model Size | 2.69 MB |
| Query Latency | <10 ms |
| Memory Usage | ~5 MB RAM |
| Paraphrase Accuracy | High (qualitative; not quantified) |

### Limitations

- Only works for questions that appear in the dataset or are close paraphrases of them
- No generative capability
- Limited to the Indonesian language

### Ethical Considerations

- Responses reflect the training data (datasets-caca-3500)
- The assistant's personality may include sarcasm and humor
- Not suitable for critical applications

### License

MIT License
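
### Appendix: Hybrid Scoring Sketch

The hybrid scoring listed under Architecture can be illustrated with a minimal, standard-library-only sketch. This is not the model's actual implementation: the `HybridRetriever` class, the whitespace tokenizer, the BM25 parameters `k1` and `b`, the max-normalization of BM25 scores, and the use of `difflib.SequenceMatcher` for fuzzy matching are all assumptions made for illustration. Only the 40/50/10 weights come from this card.

```python
import math
from collections import Counter
from difflib import SequenceMatcher

# Weights taken from the Architecture section of this model card.
W_BM25, W_TFIDF, W_FUZZY = 0.4, 0.5, 0.1


def tokenize(text):
    """Lowercase whitespace tokenizer (assumption; the real tokenizer is not documented)."""
    return text.lower().split()


class HybridRetriever:
    """Hypothetical retriever combining BM25, TF-IDF cosine, and fuzzy matching."""

    def __init__(self, qa_pairs, k1=1.5, b=0.75):
        # qa_pairs: non-empty list of (question, answer) tuples.
        # k1 and b are common BM25 defaults, not values from the card.
        self.qa_pairs = qa_pairs
        self.k1, self.b = k1, b
        self.docs = [tokenize(q) for q, _ in qa_pairs]
        self.avgdl = sum(len(d) for d in self.docs) / len(self.docs)
        n = len(self.docs)
        df = Counter(t for d in self.docs for t in set(d))
        # Smoothed IDF that stays positive for every term.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}
        self.doc_vecs = [self._tfidf(d) for d in self.docs]

    def _tfidf(self, tokens):
        # Terms unseen in the indexed questions get weight 0.
        return {t: c * self.idf.get(t, 0.0) for t, c in Counter(tokens).items()}

    @staticmethod
    def _cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        na = math.sqrt(sum(w * w for w in a.values()))
        nb = math.sqrt(sum(w * w for w in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    def _bm25(self, query_tokens, doc):
        tf = Counter(doc)
        score = 0.0
        for t in query_tokens:
            if t not in tf:
                continue
            norm = tf[t] + self.k1 * (1 - self.b + self.b * len(doc) / self.avgdl)
            score += self.idf[t] * tf[t] * (self.k1 + 1) / norm
        return score

    def query(self, text):
        """Return (answer, combined_score) for the best-matching stored question."""
        q = tokenize(text)
        q_vec = self._tfidf(q)
        bm25 = [self._bm25(q, d) for d in self.docs]
        top = max(bm25) or 1.0  # scale BM25 into [0, 1]; avoid dividing by zero
        best_idx, best_score = 0, -1.0
        for i, (question, _) in enumerate(self.qa_pairs):
            fuzzy = SequenceMatcher(None, text.lower(), question.lower()).ratio()
            score = (W_BM25 * bm25[i] / top
                     + W_TFIDF * self._cosine(q_vec, self.doc_vecs[i])
                     + W_FUZZY * fuzzy)
            if score > best_score:
                best_idx, best_score = i, score
        return self.qa_pairs[best_idx][1], best_score


# Made-up sample pairs for illustration; not drawn from datasets-caca-3500.
SAMPLE_PAIRS = [
    ("siapa nama kamu", "Namaku Caca."),
    ("apa kabar hari ini", "Baik, terima kasih!"),
    ("jam berapa sekarang", "Coba lihat jam di ponselmu."),
]

if __name__ == "__main__":
    retriever = HybridRetriever(SAMPLE_PAIRS)
    print(retriever.query("nama kamu siapa"))
```

With the sample pairs above, the reordered query `"nama kamu siapa"` retrieves the answer `"Namaku Caca."`, and the misspelled query `"apa kabr hari ini"` still resolves to the second pair because the remaining tokens and the fuzzy score carry it. In a real deployment a minimum score threshold would likely be needed to reject out-of-dataset questions, which this sketch does not include.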