Model Card: Chatbot Caca Retrieval

Model Description

A lightweight retrieval-based question-answering (QA) system for Indonesian (Bahasa Indonesia).

Training Data

  • Source: datasets-caca-3500
  • Size: 3,500 conversational QA pairs
  • Language: Indonesian
  • Format: User-Assistant conversations

Architecture

  • Algorithm: Hybrid scoring system
    • BM25 (40% weight) - keyword matching
    • TF-IDF + Cosine Similarity (50% weight) - similarity matching on weighted terms
    • Fuzzy String Matching (10% weight) - typo tolerance
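The weighted combination above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the class name `HybridRetriever`, the whitespace tokenizer, the BM25 parameters (`k1=1.5`, `b=0.75`), the max-normalization of BM25 scores, and the use of `difflib.SequenceMatcher` as the fuzzy matcher are all assumptions, and the sample QA pairs are invented for the example rather than taken from datasets-caca-3500. Only the 40/50/10 weights come from this card.

```python
import math
from collections import Counter
from difflib import SequenceMatcher

# Weights as stated in this model card.
WEIGHTS = {"bm25": 0.4, "tfidf": 0.5, "fuzzy": 0.1}


def tokenize(text):
    # Assumption: simple lowercase whitespace tokenization.
    return text.lower().split()


class HybridRetriever:
    """Hypothetical retriever that scores stored questions against a query."""

    def __init__(self, pairs, k1=1.5, b=0.75):
        self.pairs = pairs                      # list of (question, answer)
        self.docs = [tokenize(q) for q, _ in pairs]
        self.k1, self.b = k1, b
        n = len(self.docs)
        self.avgdl = sum(len(d) for d in self.docs) / n
        df = Counter(t for d in self.docs for t in set(d))
        # Separate IDF variants for BM25 and for TF-IDF.
        self.bm25_idf = {t: math.log(1 + (n - c + 0.5) / (c + 0.5)) for t, c in df.items()}
        self.tfidf_idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
        self.doc_vecs = [self._tfidf(d) for d in self.docs]

    def _tfidf(self, tokens):
        # Sparse TF-IDF vector; unknown terms are ignored.
        tf = Counter(t for t in tokens if t in self.tfidf_idf)
        return {t: f * self.tfidf_idf[t] for t, f in tf.items()}

    @staticmethod
    def _cosine(u, v):
        dot = sum(w * v.get(t, 0.0) for t, w in u.items())
        nu = math.sqrt(sum(w * w for w in u.values()))
        nv = math.sqrt(sum(w * w for w in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0

    def _bm25(self, query, doc):
        tf = Counter(doc)
        norm = self.k1 * (1 - self.b + self.b * len(doc) / self.avgdl)
        return sum(self.bm25_idf[t] * tf[t] * (self.k1 + 1) / (tf[t] + norm)
                   for t in set(query) if t in tf)

    def answer(self, question):
        q = tokenize(question)
        qv = self._tfidf(q)
        bm25 = [self._bm25(q, d) for d in self.docs]
        top = max(bm25) or 1.0                  # scale BM25 into [0, 1]
        best_score, best_idx = -1.0, 0
        for i in range(len(self.docs)):
            fuzzy = SequenceMatcher(None, question.lower(),
                                    self.pairs[i][0].lower()).ratio()
            score = (WEIGHTS["bm25"] * bm25[i] / top
                     + WEIGHTS["tfidf"] * self._cosine(qv, self.doc_vecs[i])
                     + WEIGHTS["fuzzy"] * fuzzy)
            if score > best_score:
                best_score, best_idx = score, i
        return self.pairs[best_idx][1], best_score


# Invented sample pairs, for illustration only.
pairs = [
    ("siapa nama kamu", "Aku Caca."),
    ("apa kabar hari ini", "Baik, terima kasih!"),
    ("kamu bisa bantu apa", "Aku bisa menjawab pertanyaan sederhana."),
]
retriever = HybridRetriever(pairs)
reply, score = retriever.answer("nama kamu siapa")
print(reply, round(score, 3))
```

Because the returned score stays in roughly the 0 to 1 range under these assumptions, a caller could compare it against a cutoff to decline queries that match nothing in the dataset.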

Performance

| Metric              | Value     |
|---------------------|-----------|
| Model Size          | 2.69 MB   |
| Query Latency       | <10 ms    |
| Memory Usage        | ~5 MB RAM |
| Paraphrase Accuracy | High      |

Limitations

  • Only answers questions that appear in the dataset or are close paraphrases of them
  • No generative capability; every response is retrieved from the dataset
  • Limited to the Indonesian language

Ethical Considerations

  • Responses reflect training data (datasets-caca-3500)
  • Personality may include sarcasm/humor
  • Not suitable for critical applications

License

MIT License