Lyon28 committed
Commit f9e51c6 · verified · 1 Parent(s): 29e65f5

Update README.md

Files changed (1)
  1. README.md +1 -507

README.md CHANGED
@@ -53,510 +53,4 @@ print(f"Loaded {len(data['qa_pairs'])} QA pairs!")
 
   ## Credits
 
- Created by Lyon28
-
-
- <!--
- HEADER SECTION
- -->
-
- <div align="center">
- <picture>
- <source
- media="(prefers-color-scheme: dark)"
- srcset="https://huggingface.co/Lyon28/caca-10m/resolve/main/logo-dark.png"
- type="image/png"
- />
- <source
- media="(prefers-color-scheme: light)"
- srcset="https://huggingface.co/Lyon28/caca-10m/resolve/main/logo-light.png"
- type="image/png"
- />
- <img
- src="https://huggingface.co/Lyon28/caca-10m/resolve/main/logo.png"
- alt="Caca Transformers Logo"
- title="Caca - Modern Transformer Architecture"
- width="60%"
- height="auto"
- loading="lazy"
- />
- </picture>
- </div>
-
- <!--
- BADGES SECTION
- -->
-
- <div align="center">
-
- <!-- Social Links -->
- <p>
- <a href="https://huggingface.co/Lyon28" target="_blank" rel="noopener noreferrer">
- <img
- src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Lyon28-ffc107?color=ffc107&logoColor=white"
- alt="Hugging Face Profile"
- title="Visit Hugging Face Profile"
- />
- </a>
- </p>
-
- <!-- License Badge -->
- <p>
- <a
- href="https://github.com/Lyon-28/caca-transformers?tab=Apache-2.0-1-ov-file"
- target="_blank"
- rel="noopener noreferrer"
- title="Apache 2.0 License"
- >
- <img
- src="https://img.shields.io/badge/License-Apache%202.0-blue.svg"
- alt="License: Apache 2.0"
- height="20"
- />
- </a>
- </p>
-
- <!-- PyPI Badge -->
- <p>
- <a href="https://pypi.org/project/caca-transformers/" target="_blank" rel="noopener noreferrer">
- <img
- src="https://img.shields.io/pypi/v/caca-transformers?color=blue&label=PyPI&logo=pypi&logoColor=white"
- alt="PyPI Version"
- title="View on PyPI"
- />
- </a>
- </p>
-
- <!-- GitHub Stars -->
- <p>
- <a href="https://github.com/Lyon-28/caca-transformers" target="_blank" rel="noopener noreferrer">
- <img
- src="https://img.shields.io/github/stars/Lyon-28/caca-transformers?style=social&label=Star&maxAge=2592000"
- alt="GitHub Stars"
- title="Star on GitHub"
- />
- </a>
- </p>
-
- <!-- Description -->
- <p>
- <strong>A Modern Transformer Architecture with GQA, RoPE, SwiGLU &amp; Flash Attention</strong>
- </p>
-
- </div>
-
- <!-- Horizontal Rule -->
- <hr/>
-
- <!--
- WARNING/ALERT SECTION
- -->
-
- <blockquote>
- <p>
- <strong>🔬 RESEARCH PROJECT</strong>
- </p>
- <p>
- <strong>⚠️ WARNING: UNTRAINED MODEL</strong>
- </p>
- <p>
- This model has random weights and requires pretraining before use.
- It cannot be used for inference as-is!<br/>
- This model is an architecture experiment and has not been validated for production use.
- </p>
- </blockquote>
-
- <!--
- MAIN TITLE
- -->
-
- <h1 align="center">
- 🐣 CACA-10M - TINY
- </h1>
-
- <p align="center">
- <strong>🔢 10,485,760 Parameters (0.01B)</strong>
- </p>
-
- <p align="center">
- <strong>💾 ~0.02GB (FP16) / ~0.04GB (FP32)</strong>
- </p>
-
- <p align="center">
- <strong>📏 8,192 Context Length</strong>
- </p>
-
- <p align="center">
- <strong>🎯 Use Case:</strong> Rapid experimentation, edge devices, learning
- </p>
-
- <p align="center">
- <strong>🖥️ Recommended GPU:</strong> GTX 1060 6GB or better
- </p>
-
- <!--
- FEATURES SECTION
- -->
-
- <h2>🎯 Key Features</h2>
-
- <p>
- The Caca architecture combines the best modern techniques from various state-of-the-art models:
- </p>
-
- <ul>
- <li>
- <strong>🔄 Grouped Query Attention (GQA)</strong> -
- An optimal balance between inference speed and output quality
- </li>
- <li>
- <strong>🌀 RoPE (Rotary Positional Embeddings)</strong> -
- Positional encoding proven effective for long sequences
- </li>
- <li>
- <strong>⚡ SwiGLU Activation</strong> -
- Superior performance to ReLU/GELU in language modeling
- </li>
- <li>
- <strong>📊 RMSNorm</strong> -
- More efficient and stable normalization than LayerNorm
- </li>
- <li>
- <strong>🪟 Sliding Window Attention</strong> -
- Memory efficiency for long context windows (4,096 tokens)
- </li>
- <li>
- <strong>💫 Flash Attention Compatible</strong> -
- Optional support for Flash Attention, 2-4x faster
- </li>
- <li>
- <strong>🔄 KV Cache Support</strong> -
- Efficient autoregressive generation with caching
- </li>
- </ul>
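The sliding-window attention the removed README lists above restricts the causal mask so each position only attends to the most recent `window` positions. A minimal NumPy sketch of that mask, using toy sizes instead of the model's actual 8,192/4,096 configuration (this is an illustration, not code from the caca-transformers package):

```python
import numpy as np

seq_len, window = 8, 4  # toy sizes; the real config uses 8,192 and 4,096

# Causal sliding-window mask: position i may attend to positions j
# satisfying i - window < j <= i
i = np.arange(seq_len)[:, None]
j = np.arange(seq_len)[None, :]
mask = (j <= i) & (j > i - window)
print(mask.astype(int))
```

Each row has at most `window` True entries, which is what bounds the attention memory for long contexts.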
-
- <!--
- TABLE SECTION - with all attributes
- -->
-
- <h2 align="center">🏗️ Technical Specifications</h2>
-
- <div align="center">
-
- <table>
- <caption>
- <strong>Model Configuration Parameters</strong>
- </caption>
- <colgroup>
- <col style="width: 50%"/>
- <col style="width: 50%"/>
- </colgroup>
- <thead>
- <tr>
- <th align="left">Parameter</th>
- <th align="right">Value</th>
- </tr>
- </thead>
- <tbody>
- <tr>
- <td align="left"><strong>Total Parameters</strong></td>
- <td align="right"><code>10,485,760</code> (~0.01B)</td>
- </tr>
- <tr>
- <td align="left"><strong>Vocab Size</strong></td>
- <td align="right"><code>50,000</code></td>
- </tr>
- <tr>
- <td align="left"><strong>Hidden Size</strong></td>
- <td align="right"><code>256</code></td>
- </tr>
- <tr>
- <td align="left"><strong>Num Layers</strong></td>
- <td align="right"><code>8</code></td>
- </tr>
- <tr>
- <td align="left"><strong>Attention Heads</strong></td>
- <td align="right"><code>8</code></td>
- </tr>
- <tr>
- <td align="left"><strong>KV Heads (GQA)</strong></td>
- <td align="right"><code>2</code></td>
- </tr>
- <tr>
- <td align="left"><strong>GQA Ratio</strong></td>
- <td align="right"><code>4:1</code></td>
- </tr>
- <tr>
- <td align="left"><strong>Intermediate Size</strong></td>
- <td align="right"><code>682</code></td>
- </tr>
- <tr>
- <td align="left"><strong>Context Length</strong></td>
- <td align="right"><code>8,192</code> tokens</td>
- </tr>
- <tr>
- <td align="left"><strong>Sliding Window</strong></td>
- <td align="right"><code>4,096</code> tokens</td>
- </tr>
- <tr>
- <td align="left"><strong>RoPE Theta</strong></td>
- <td align="right"><code>10,000</code></td>
- </tr>
- <tr>
- <td align="left"><strong>Memory (FP16)</strong></td>
- <td align="right">~<code>0.02</code> GB</td>
- </tr>
- <tr>
- <td align="left"><strong>Memory (FP32)</strong></td>
- <td align="right">~<code>0.04</code> GB</td>
- </tr>
- </tbody>
- <tfoot>
- <tr>
- <td colspan="2" align="center">
- <small><em>All values are approximate and may vary based on implementation</em></small>
- </td>
- </tr>
- </tfoot>
- </table>
-
- </div>
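The 4:1 GQA ratio in the removed spec table (8 query heads sharing 2 KV heads) amounts to repeating each KV head across its group of query heads before running ordinary scaled-dot-product attention. A minimal NumPy sketch of that sharing; the head dimension of 32 here is illustrative, not taken from the package:

```python
import numpy as np

# Per the spec table: 8 query heads, 2 KV heads -> 4:1 GQA ratio
n_heads, n_kv_heads, head_dim, seq_len = 8, 2, 32, 16
group_size = n_heads // n_kv_heads  # 4 query heads share each KV head

q = np.random.randn(n_heads, seq_len, head_dim)
k = np.random.randn(n_kv_heads, seq_len, head_dim)
v = np.random.randn(n_kv_heads, seq_len, head_dim)

# Broadcast each KV head across its group, then do standard attention
k_rep = np.repeat(k, group_size, axis=0)  # (8, seq_len, head_dim)
v_rep = np.repeat(v, group_size, axis=0)

scores = q @ k_rep.transpose(0, 2, 1) / np.sqrt(head_dim)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
out = weights @ v_rep
print(out.shape)  # (8, 16, 32)
```

Only the 2 KV heads need to be cached during generation, which is where the memory saving over full multi-head attention comes from.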
-
- <!--
- DETAILS/SUMMARY - Collapsible sections
- -->
-
- <h2>📚 Model Family</h2>
-
- <p>We provide a range of model sizes for different use cases:</p>
-
- <details open>
- <summary>
- <strong>🐣 Tiny &amp; Small Models (10M - 500M)</strong>
- </summary>
-
- <p>Suitable for: rapid experimentation, edge devices, learning</p>
-
- <table>
- <thead>
- <tr>
- <th>Model</th>
- <th>Params</th>
- <th>Hidden</th>
- <th>Layers</th>
- <th>Heads</th>
- <th>KV Heads</th>
- <th>Context</th>
- <th>Memory (FP16)</th>
- </tr>
- </thead>
- <tbody>
- <tr>
- <td>
- <a href="https://huggingface.co/Lyon28/caca-10m" target="_blank">caca-10M</a>
- </td>
- <td>10M</td>
- <td>256</td>
- <td>8</td>
- <td>8</td>
- <td>2</td>
- <td>8K</td>
- <td>~0.02 GB</td>
- </tr>
- <tr>
- <td>
- <a href="https://huggingface.co/Lyon28/caca-50m" target="_blank">caca-50M</a>
- </td>
- <td>50M</td>
- <td>512</td>
- <td>12</td>
- <td>8</td>
- <td>2</td>
- <td>8K</td>
- <td>~0.1 GB</td>
- </tr>
- <tr>
- <td>
- <a href="https://huggingface.co/Lyon28/caca-100m" target="_blank">caca-100M</a>
- </td>
- <td>100M</td>
- <td>768</td>
- <td>12</td>
- <td>12</td>
- <td>3</td>
- <td>8K</td>
- <td>~0.2 GB</td>
- </tr>
- </tbody>
- </table>
-
- </details>
-
- <details>
- <summary>
- <strong>🦅 Medium Models (1B - 10B)</strong>
- </summary>
-
- <p>Suitable for: production applications, fine-tuning, domain-specific tasks</p>
-
- <p><em>Click to expand for model list...</em></p>
-
- </details>
-
- <!--
- CODE BLOCKS with syntax highlighting
- -->
-
- <h2>🚀 Quick Start</h2>
-
- <h3>💻 Installation</h3>
-
- <pre><code class="language-bash"># Install with xFormers for a 3x speedup
- pip install caca-transformers[xformers]
-
- # Or install manually
- pip install caca-transformers
- pip install xformers
-
- # For Flash Attention (4x speedup) - optional
- pip install flash-attn --no-build-isolation
- </code></pre>
-
- <h3>Basic Usage</h3>
-
- <pre><code class="language-python">from caca_transformers import CacaForCausalLM, CacaConfig
- import torch
-
- # Load the model
- model = CacaForCausalLM.from_pretrained("Lyon28/caca-10m")
-
- # Or build one from scratch
- config = CacaConfig()
- model = CacaForCausalLM(config)
-
- # Model info
- print(f"Parameters: {model.num_parameters():,}")
- </code></pre>
-
- <!--
- INLINE ELEMENTS
- -->
-
- <h2>💡 Tips &amp; Best Practices</h2>
-
- <p>
- Use <kbd>Ctrl</kbd> + <kbd>C</kbd> to copy code.
- The <code>learning_rate</code> parameter should be around <mark>3e-4</mark> for pretraining.
- RMSNorm formula: <code>x / RMS(x) * γ</code> where
- RMS(x) = <code>sqrt(mean(x<sup>2</sup>) + ε)</code>
- </p>
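The RMSNorm formula quoted above is easy to check numerically. A small self-contained NumPy sketch; the function name and tensor shapes are illustrative, not the package's API:

```python
import numpy as np

def rms_norm(x, gamma, eps=1e-6):
    # RMS(x) = sqrt(mean(x^2) + eps), taken over the hidden dimension
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * gamma

hidden_size = 256  # caca-10M hidden size
x = 5.0 * np.random.randn(4, hidden_size)
y = rms_norm(x, gamma=np.ones(hidden_size))
print(np.sqrt(np.mean(y * y, axis=-1)))  # each row now has RMS close to 1
```

Unlike LayerNorm, there is no mean subtraction and no bias term, which is why it is cheaper while remaining stable.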
-
- <p>
- <small>
- <em>Note: all values are approximate and may vary</em>
- </small>
- </p>
-
- <p>
- Reference: <cite>Attention Is All You Need</cite> (Vaswani et al., 2017)
- </p>
-
- <!--
- MIXED CONTENT TABLE
- -->
-
- <h2>📊 Comparison with Other Architectures</h2>
-
- <table>
- <thead>
- <tr>
- <th rowspan="2">Feature</th>
- <th colspan="3">Decoder-Only</th>
- <th colspan="1">Encoder-Only</th>
- </tr>
- <tr>
- <th>Caca</th>
- <th>LLaMA 2</th>
- <th>GPT-3</th>
- <th>BERT</th>
- </tr>
- </thead>
- <tbody>
- <tr>
- <td>GQA</td>
- <td align="center">✅</td>
- <td align="center">✅</td>
- <td align="center">❌</td>
- <td align="center">❌</td>
- </tr>
- <tr>
- <td>RoPE</td>
- <td align="center">✅</td>
- <td align="center">✅</td>
- <td align="center">❌</td>
- <td align="center">❌</td>
- </tr>
- <tr>
- <td>Open Source</td>
- <td align="center">✅</td>
- <td align="center">✅</td>
- <td align="center">❌</td>
- <td align="center">✅</td>
- </tr>
- </tbody>
- </table>
-
- <!--
- FOOTER SECTION
- -->
-
- <hr/>
-
- <div align="center">
-
- <h2>🌟 Star History</h2>
-
- <a href="https://star-history.com/#Lyon-28/caca-transformers&Date" target="_blank" rel="noopener noreferrer">
- <img
- src="https://api.star-history.com/svg?repos=Lyon-28/caca-transformers&type=Date"
- alt="Star History Chart"
- title="View Star History"
- width="100%"
- loading="lazy"
- />
- </a>
-
- </div>
-
- <hr/>
-
- <div align="center">
-
- <p>
- <strong>🚀 Built with ❤️ for the Indonesian AI Community</strong>
- </p>
-
- <p>
- <a href="https://github.com/Lyon-28/caca-transformers" target="_blank" rel="noopener noreferrer">GitHub</a>
-
- <a href="https://huggingface.co/Lyon28" target="_blank" rel="noopener noreferrer">Hugging Face</a>
- </p>
-
- <p>
- <small>
- <strong>Created by
- <a href="https://huggingface.co/Lyon28" target="_blank" rel="noopener noreferrer">Lyon</a>
- </strong>
- <br/>
- Apache 2.0 License | 2025
- </small>
- </p>
-
- </div>
-
- <!--
- TODO:
- - Add more model variants
- - Include benchmark results
- - Add training scripts
- -->
 
 
   ## Credits
 
+ Created by Lyon28