lucrbrtv (Luc Robert--Villanueva)

upvoted an article 3 days ago

Article

We Got Claude to Fine-Tune an Open Source LLM

4 days ago

•

299

upvoted an article 6 days ago

Article

Transformers v5: Simple model definitions powering the AI ecosystem

+2

7 days ago

•

224

liked a Space 6 days ago

Z Image Turbo

🏃

1.2k

Generate images from text prompts

liked 2 datasets 7 days ago

HuggingFaceM4/FineVisionMax

Viewer • Updated Oct 21 • 24.2M • 27.3k • 15

timm/mini-imagenet

Viewer • Updated Nov 20, 2024 • 65k • 10.1k • 18

upvoted a paper 8 days ago

Rotary Position Embedding for Vision Transformer

Paper • 2403.13298 • Published Mar 20, 2024 • 6

upvoted 2 articles 8 days ago

Article

~Don't~ Repeat Yourself

Apr 5, 2022

•

48

Article

nanoVLM: The simplest repository to train your VLM in pure PyTorch

+5

May 21

•

234

reacted to sergiopaniego's post with 🔥 9 days ago

Post

1698

nanochat is now in transformers!

The LLM by @karpathy is officially in the library, and we wrote a blog covering: how did we port the model, differences from the original, and how to run or train it.

go read it 🤓

nanochat-students/transformers

liked a model 10 days ago

Tongyi-MAI/Z-Image-Turbo

Text-to-Image • Updated 6 days ago • 187k • • 2.27k

liked a model 11 days ago

deepseek-ai/DeepSeek-Math-V2

Text Generation • 685B • Updated 11 days ago • 9.34k • 640

reacted to samerzaher80's post with 👍 11 days ago

Post

1615

AetherMind_SRL: How I beat 7B models on MMLU with 184M params and a $300 GPU
I’m Sameer, a solo researcher from Iraq working on a single RTX 3050 8GB laptop.Today I’m releasing AetherMind_SRL – a 184M-parameter NLI model that was trained only on tasks (SNLI, MNLI, ANLI, and a small clinical Alzheimer’s dataset).
It was never fine-tuned or even shown a single MMLU question during training.Yet here are the zero-shot MMLU (57 subjects) results:Model
MMLU Zero-Shot
Training Data
AetherMind_SRL (me)
184M
36.05 %
Only NLI (SNLI/MNLI/ANLI + ADNI)
DeBERTa-v3-base
278M
~30.8 %
General pre-training
BERT-large
340M
27–30 %
General pre-training
LLaMA-1 7B
7B
34–35 %
Massive text corpus
LLaMA-2 7B
7B
~45 %
Bigger + better data

Yes – my 184M model beats every classic 300–400M model and the original 7-billion-parameter LLaMA-1, all while running at 300+ samples/sec on a $300 laptop GPU.How did this happen?I built a standardized self-improvement loop called AetherMind Self-Reflective Learning (SRL) v1.0:Train normally on NLI
Let the model predict on hard adversarial data (ANLI)
Log every mistake + low-confidence case
Build a balanced “SMART” buffer (60% errors + 40% correct anchors)
Fine-tune with tiny LR and error-weighted loss
Repeat until stable
That’s it. No external knowledge, no MMLU data, no cluster.
Just pure reasoning transfer from entailment/contradiction patterns → real-world knowledge.Try it yourself python
from transformers import pipeline
import torch

nli_pipeline = pipeline(
"text-classification",
model="samerzaher80/AetherMind_SRL",
device=0 if torch.cuda.is_available() else -1
)

# DEFINE YOUR TEST HERE
premise = "Patient shows progressive memory decline."
hypothesis = "Patient shows progressive memory decline."

input_text = f"{premise} [SEP] {hypothesis}"
result = nli_pipeline(input_text)[0]
print(f"Prediction: {result['label']}")
print(f"Confidence: {result['score']:
Model: samerzaher80/AetherMind_SRL

liked a model 12 days ago

google/embeddinggemma-300m

liked a dataset 14 days ago

MrLight/paper-visa

Viewer • Updated Dec 6, 2024 • 103k • 535 • 1

upvoted an article 14 days ago

Article

The 1 Billion Token Challenge: Finding the Perfect Pre-training Mix

Nov 3

•

47

reacted to Kseniase's post with ❤️ 17 days ago

Post

6031

12 Types of JEPA

Since Yann LeCun together with Randall Balestriero released a new paper on JEPA (Joint-Embedding Predictive Architecture), laying out its theory and introducing an efficient practical version called LeJEPA, we figured you might need even more JEPA. Here are 7 recent JEPA variants plus 5 iconic ones:

1. LeJEPA → LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics (2511.08544)
Explains a full theory for JEPAs, defining the “ideal” JEPA embedding as an isotropic Gaussian, and proposes the SIGReg objective to push JEPA toward this ideal, resulting in practical LeJEPA

2. JEPA-T → JEPA-T: Joint-Embedding Predictive Architecture with Text Fusion for Image Generation (2510.00974)
A text-to-image model that tokenizes images and captions with a joint predictive Transformer, enhances fusion with cross-attention and text embeddings before training loss, and generates images by iteratively denoising visual tokens conditioned on text

3. Text-JEPA → Speaking in Words, Thinking in Logic: A Dual-Process Framework in QA Systems (2507.20491)
Converts natural language into first-order logic, with a Z3 solver handling reasoning, enabling efficient, explainable QA with far lower compute than large LLMs

4. N-JEPA (Noise-based JEPA) → Improving Joint Embedding Predictive Architecture with Diffusion Noise (2507.15216)
Connects self-supervised learning with diffusion-style noise by using noise-based masking and multi-level schedules, especially improving visual classification

5. SparseJEPA → SparseJEPA: Sparse Representation Learning of Joint Embedding Predictive Architectures (2504.16140)
Adds sparse representation learning to make embeddings more interpretable and efficient. It groups latent variables by shared semantic structure using a sparsity penalty while preserving accuracy

6. TS-JEPA (Time Series JEPA) → Joint Embeddings Go Temporal (2509.25449)
Adapts JEPA to time-series by learning latent self-supervised representations and predicting future latents for robustness to noise and confounders

Read further below ↓
It you like it, also subscribe to the Turing Post: https://www.turingpost.com/subscribe

1 reply

·

upvoted an article 17 days ago

Article

Text-to-image Architectural Experiments

25 days ago

•

37

upvoted an article 18 days ago

Article

Why Did MiniMax M2 End Up as a Full Attention Model?

Oct 30

•

67

liked a model 18 days ago

janhq/Jan-v2-VL-high-gguf

Image-Text-to-Text • 8B • Updated 12 days ago • 56.1k • 25

upvoted an article 19 days ago

Article

We’re open-sourcing our text-to-image model and the process behind it

26 days ago

•

73

Luc Robert--Villanueva

AI & ML interests

Recent Activity

Organizations

We Got Claude to Fine-Tune an Open Source LLM

Transformers v5: Simple model definitions powering the AI ecosystem

Z Image Turbo

HuggingFaceM4/FineVisionMax

timm/mini-imagenet

Rotary Position Embedding for Vision Transformer

~Don't~ Repeat Yourself

nanoVLM: The simplest repository to train your VLM in pure PyTorch

Tongyi-MAI/Z-Image-Turbo

deepseek-ai/DeepSeek-Math-V2

google/embeddinggemma-300m

MrLight/paper-visa

The 1 Billion Token Challenge: Finding the Perfect Pre-training Mix

Text-to-image Architectural Experiments

Why Did MiniMax M2 End Up as a Full Attention Model?

janhq/Jan-v2-VL-high-gguf

We’re open-sourcing our text-to-image model and the process behind it

Luc Robert--Villanueva

AI & ML interests

Recent Activity

Organizations

lucrbrtv's activity

We Got Claude to Fine-Tune an Open Source LLM

Transformers v5: Simple model definitions powering the AI ecosystem

Z Image Turbo

~Don't~ Repeat Yourself

nanoVLM: The simplest repository to train your VLM in pure PyTorch

The 1 Billion Token Challenge: Finding the Perfect Pre-training Mix

Text-to-image Architectural Experiments

Why Did MiniMax M2 End Up as a Full Attention Model?

We’re open-sourcing our text-to-image model and the process behind it