Locai L1-Large
Locai L1-Large is an open-source instruction-tuned model based on Qwen3-235B-A22B-Instruct-2507, post-trained using our Forget-Me-Not framework, which combines experience replay and self-improvement to enhance performance whilst mitigating catastrophic forgetting. Paper coming soon.
The model achieves state-of-the-art performance on Arena Hard v2, outperforming the non-reasoning variants of GPT-5, Claude Sonnet 4.5, Gemini 2.5 Flash, DeepSeek V3.2, and Mistral Medium, whilst delivering competitive results across instruction following, mathematics, and scientific reasoning.
For more details on the model training, please refer to our technical report.
Highlights
- State-of-the-art alignment: Highest score on Arena Hard v2, outperforming all evaluated non-reasoning frontier models, including GPT-5 and Claude Sonnet 4.5
- Improved itself: The model generated and evaluated its own training data across helpfulness, relevance, conciseness, complexity, correctness, and harmlessness, improving the base model's instruction-following, safety, and alignment
- Enhanced safety: 17% improvement on the AgentHarm benchmark (27.7 vs 33.4, lower is better) compared to the base model
- Maintains base capabilities: Retains Qwen's strong performance in mathematics and scientific reasoning thanks to the Forget-Me-Not framework's experience replay
- Efficient training: Parameter-efficient fine-tuning with LoRA on a single node of 8×H200 GPUs
- Sustainable: Trained using 100% renewable energy in UK data centres
- Low-resource language support: Improved proficiency in Celtic languages (Welsh, Irish, Scottish Gaelic) plus Basque, Armenian, Tagalog, and Swahili through bidirectional translation pairs
Evaluation Results
| Model | Arena Hard v2 | IFEval | IFBench | GSM Plus | GPQA Diamond | AgentHarm ↓ |
|---|---|---|---|---|---|---|
| Locai L1-Large | 72.9 | 92.45 | 40.14 | 90.43 | 63.63 | 27.7 |
| Qwen3-235B-Instruct-2507 (base) | 70.8 | 91.97 | 39.46 | 90.48 | 62.63 | 33.4 |
| GPT-5 | 68.9 | 91.85 | 41.5 | 89.14 | 70.20 | 12.8 |
| Claude Sonnet 4.5 | 52.8 | 92.57 | 34.69 | 91.48 | 68.69 | 16.6 |
| Gemini 2.5 Flash | 54.4 | 91.13 | 34.01 | 89.67 | 35.35 | 40.5 |
| DeepSeek V3.2 | 52.5 | 90.89 | 35.71 | 90.10 | 80.30 | 18.2 |
| Mistral Medium | 37.9 | 81.65 | 28.91 | 89.62 | 71.21 | 69.1 |
Benchmark Details
- Arena Hard v2: Evaluates alignment with human preferences using real-world user queries
- IFEval: Measures strict instruction-following accuracy
- IFBench: Tests precise instruction-following on out-of-distribution constraints
- GSM Plus: Assesses mathematical reasoning on grade-school level problems
- GPQA Diamond: Evaluates expert-level scientific reasoning
- AgentHarm: Measures safety and robustness against adversarial attacks (lower is better)
Usage
Installation
pip install transformers torch accelerate
Basic Inference
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "locailabs/locai-l1-large"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype="auto",
)

messages = [
    {"role": "user", "content": "Explain quantum entanglement in simple terms"}
]

# Build the prompt using the model's chat template
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=2048,
    do_sample=True,  # required for temperature/top-k/top-p to take effect
    temperature=0.7,
    top_k=20,
    top_p=0.8,
)

# Decode only the newly generated tokens, not the prompt
response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[-1]:],
    skip_special_tokens=True,
)
print(response)
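For interactive use you may want to stream tokens to the terminal as they are generated rather than waiting for the full completion. A minimal sketch using transformers' TextStreamer, reusing the model, tokenizer, and inputs defined above:

from transformers import TextStreamer

# Print decoded tokens to stdout as they are generated,
# skipping the prompt and special tokens.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

model.generate(
    **inputs,
    max_new_tokens=2048,
    do_sample=True,
    temperature=0.7,
    top_k=20,
    top_p=0.8,
    streamer=streamer,
)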
Using vLLM (Recommended for Production)
from vllm import LLM, SamplingParams

# Note: a 235B-parameter model will not fit on a single GPU;
# set tensor_parallel_size to match your hardware.
llm = LLM(model="locailabs/locai-l1-large", tensor_parallel_size=8)

sampling_params = SamplingParams(
    temperature=0.7,
    top_k=20,
    top_p=0.8,
    max_tokens=2048,  # SamplingParams otherwise defaults to 16 tokens
)

prompts = [
    "Explain quantum entanglement in simple terms."
]

# llm.generate treats prompts as raw text; use llm.chat(messages, sampling_params)
# if you want the model's chat template applied automatically.
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.outputs[0].text)
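For serving behind an API, vLLM also ships an OpenAI-compatible server, started with `vllm serve locailabs/locai-l1-large`. A minimal client sketch using the openai Python package; the base_url and port below assume vLLM's defaults and the api_key value is a placeholder:

from openai import OpenAI

# Point the OpenAI client at the local vLLM server
# (http://localhost:8000/v1 is vLLM's default address; the key is unused).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="locailabs/locai-l1-large",
    messages=[
        {"role": "user", "content": "Explain quantum entanglement in simple terms."}
    ],
    temperature=0.7,
    max_tokens=2048,
)
print(response.choices[0].message.content)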
Training Details
Training Configuration
- Base Model: Qwen3-235B-A22B-Instruct-2507
- Method: Supervised Fine-Tuning (SFT) using Parameter-Efficient Fine-Tuning (PEFT) through Low-Rank Adaptation (LoRA); a configuration sketch follows this list
- Hardware: 1 node × 8 NVIDIA H200 GPUs
- Energy: 100% renewable energy (UK data centres)
- Parallelisation: Tensor parallelism, expert parallelism, and sequence parallelism
- MoE Optimisations: Grouped GEMM, permute fusion, shared expert overlap, auxiliary loss for balanced expert utilisation
- Memory & Compute: Activation recomputation, sample packing, Flash Attention, loss fusion with final layer
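The exact LoRA hyperparameters are given in the technical report. Purely as an illustration, a typical PEFT/LoRA setup for a causal language model looks like the sketch below; the rank, alpha, dropout, and target modules are placeholders, not the values used for L1-Large, and in practice this runs inside a distributed training framework rather than a single process:

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-235B-A22B-Instruct-2507",
    device_map="auto",
    torch_dtype="auto",
)

# Placeholder hyperparameters for illustration only; the values used to
# train L1-Large are documented in the technical report.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the low-rank adapters are trainable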
Training Data
The model was trained on a curated dataset combining the following sources (an illustrative record sketch follows the list):
- Self-improvement data: Generated and evaluated by the model across helpfulness, relevance, conciseness, complexity, correctness, and harmlessness
- Low-resource language translations: Bidirectional translation pairs from OpenSubtitles corpora
- Cultural alignment data: British cultural knowledge generated from CultureBank
- Self-cognition data: Multilingual Q&A pairs about the model
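The released schema is described in the technical report; the records below are a purely hypothetical illustration of the first two source types, with field names and contents invented for this sketch:

# Hypothetical record shapes, for illustration only.

self_improvement_example = {
    "prompt": "Summarise the causes of the 1929 Wall Street Crash.",
    "response": "...",  # generated by the model itself
    "scores": {         # self-assigned ratings used to filter or weight samples
        "helpfulness": 4,
        "relevance": 5,
        "conciseness": 4,
        "complexity": 2,
        "correctness": 5,
        "harmlessness": 5,
    },
}

translation_example = {
    "source_lang": "en",
    "target_lang": "cy",   # Welsh; pairs are included in both directions
    "source_text": "Where is the nearest station?",
    "target_text": "Ble mae'r orsaf agosaf?",
}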
Ethical Considerations
Locai L1-Large has been developed with consideration for:
- Sustainability: Trained using 100% renewable energy in UK data centres
- Inclusivity: Enhanced support for low-resource languages to reduce digital inequality
- Safety: Improved robustness against adversarial attacks (17% improvement on AgentHarm)
Citation
@misc{locai2025l1large,
title={Locai L1-Large: Self-Improving Language Models with Forget-Me-Not},
author={Locai Labs},
year={2025},
url={https://www.locai.chat}
}
License
Apache 2.0
Model Card Contact
- Website: www.locai.chat
- Hugging Face: locailabs
- Issues: Please report via Hugging Face discussions