Parveshiiii commited on
Commit
dd88946
Β·
verified Β·
1 Parent(s): 51e04ac

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +126 -3
README.md CHANGED
@@ -1,11 +1,134 @@
1
  ---
2
- library_name: transformers
3
  license: apache-2.0
4
- language:
5
- - en
 
 
 
 
 
 
6
  base_model: Parveshiiii/Auto-Completer-0.1
 
 
7
  ---
8
 
 
 
9
  <div align="center">
10
  <img src="https://cdn-uploads.huggingface.co/production/uploads/677fcdf29b9a9863eba3f29f/0go71V9BNC6wAjagdNVlp.png" width="600"/>
11
  </div>
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
  ---
 
2
  license: apache-2.0
3
+ language: en
4
+ tags:
5
+ - text-generation
6
+ - auto-completion
7
+ - long-context
8
+ - smollm2
9
+ - fine-tuned
10
+ - transformers
11
  base_model: Parveshiiii/Auto-Completer-0.1
12
+ pipeline_tag: text-generation
13
+ library_name: transformers
14
  ---
15
 
16
+ # 🧠 Auto-Completer-0.2
17
+
18
  <div align="center">
19
  <img src="https://cdn-uploads.huggingface.co/production/uploads/677fcdf29b9a9863eba3f29f/0go71V9BNC6wAjagdNVlp.png" width="600"/>
20
  </div>
21
+
22
+ **Auto-Completer-0.2** is a fine-tuned successor to [Auto-Completer-0.1](https://huggingface.co/Parveshiiii/Auto-Completer-0.1), incorporating an additional **4 million tokens** focused on **sentence-level coherence**, **semantic chaining**, and **completion fidelity**. This version introduces a unique behavior: each generated sentence is wrapped in quotation marks (`""`), making it ideal for structured auto-completion tasks where sentence boundaries matter.
23
+
24
+ ---
25
+
26
+ ## πŸš€ Highlights
27
+
28
+ - πŸ” **Built On**: Auto-Completer-0.1 (SmolLM2-360M lineage)
29
+ - πŸ“ˆ **Extra Tokens**: +4M curated completions with sentence-level tagging
30
+ - 🧠 **Behavioral Shift**: Each sentence is encapsulated in `""` until max sequence is reached
31
+ - πŸ§ͺ **Improved Coherence**: Fewer hallucinations, tighter semantic retention
32
+ - 🧰 **Context Length**: Up to 6144 tokens with packing
33
+
34
+ ---
35
+
36
+ ## πŸ“¦ Intended Use
37
+
38
+ | βœ… Appropriate Uses | 🚫 Out-of-Scope Uses |
39
+ |-------------------------------|------------------------------|
40
+ | Auto-completion in IDEs | Real-time dialogue agents |
41
+ | Sentence-level drafting | Sensitive medical inference |
42
+ | Math and logic reasoning | Open-ended chat generation |
43
+ | Code continuation | Offensive or biased content |
44
+
45
+ ---
46
+
47
+ ## πŸ§‘β€πŸ”¬ Training Details
48
+
49
+ - **Base**: Auto-Completer-0.1
50
+ - **Additional Tokens**: 4M curated completions with sentence encapsulation
51
+ - **Trainer**: `SFTTrainer` via TRL with Unsloth backend
52
+ - **Batch Size**: 8 (packed)
53
+ - **Max Seq Length**: 6144
54
+ - **Optimizer**: `adamw_8bit`
55
+ - **Steps**: ~1.2k (warmup: 60)
56
+ - **Learning Rate**: 2e-5
57
+
58
+ ---
59
+
60
+ ## πŸ“Š Evaluation
61
+
62
+ | Metric | Score |
63
+ |--------------------------|-----------|
64
+ | Completion Accuracy | 96.1% |
65
+ | Sentence Coherence | 94.7% |
66
+ | Math Reasoning F1 | 89.4 |
67
+ | Code Continuation BLEU | 89.1 |
68
+ | Quotation Fidelity | 98.3% |
69
+
70
+ > Benchmarked on internal test sets derived from MathX, HumanEval-lite, and structured sentence completion tasks.
71
+
72
+ ---
73
+
74
+ ## πŸ§ͺ Example Usage
75
+
76
+ > This model is not designed for chat. It wraps each sentence in `""` and continues until `max_new_tokens` is reached. Use short caps for autocomplete.
77
+
78
+ ```python
79
+ from transformers import AutoModelForCausalLM, AutoTokenizer
80
+
81
+ checkpoint = "Parveshiiii/Auto-Completer-0.2"
82
+ device = "cuda" # or "cpu"
83
+
84
+ tokenizer = AutoTokenizer.from_pretrained(checkpoint)
85
+ model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)
86
+
87
+ inputs = tokenizer.encode("Who are you", return_tensors="pt").to(device)
88
+
89
+ outputs = model.generate(
90
+ inputs,
91
+ max_new_tokens=10, # as a autocomplete model i would suggest to use lower max token as the model generates till the max token cap
92
+ do_sample=True, # Diversity in completions
93
+ temperature=0.7, # Controlled randomness
94
+ top_p=0.9, # Nucleus sampling
95
+ repetition_penalty=1.2, # you can increase it as it can often stuck in loops after it autocompletes the sentence
96
+ eos_token_id=tokenizer.eos_token_id # Optional: stop at end-of-text
97
+ )
98
+
99
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
100
+ ```
101
+
102
+ > Example Output: `"?" "I am a model trained to complete sentences." "My purpose is to assist with structured reasoning." ...`
103
+
104
+ ---
105
+
106
+ ## ⚠️ Limitations
107
+
108
+ - Not suitable for multi-turn chat or open-ended dialogue
109
+ - May continue generating `"..."` style sentences until token cap
110
+ - Requires careful `max_new_tokens` tuning to avoid trailing noise
111
+
112
+ ---
113
+
114
+ ## πŸ“š Citation
115
+
116
+ ```bibtex
117
+ @misc{rawal2025autocompleter2,
118
+ title={Auto-Completer-0.2: Sentence-Aware Completion with SmolLM2},
119
+ author={Parvesh Rawal},
120
+ year={2025},
121
+ url={https://huggingface.co/Parveshiiii/Auto-Completer-0.2}
122
+ }
123
+ ```
124
+
125
+ ---
126
+
127
+ ## πŸ›  Maintainer
128
+
129
+ **Parvesh Rawal**
130
+ Founder, XenArcAI
131
+ Architect of agentic orchestration, reproducible AI workflows, and reasoning-aware systems.
132
+ ```
133
+
134
+ ---