Active filters: rlvr
SultanR/SmolTulu-1.7b-Reinforced-GGUF • Text Generation • 2B • Updated • 11 • 1
thuml/rt1-world-model-multi-step-rlvr • Updated • 16
thuml/rt1-world-model-single-step-rlvr • Updated • 10
thuml/webarena-world-model-rlvr • 2B • Updated • 6
thuml/bytesized32-world-model-rlvr-binary-reward • 2B • Updated • 9
thuml/bytesized32-world-model-rlvr-task-specific-reward • 2B • Updated • 6
DebateLabKIT/Llama-3.1-Argunaut-1-8B-HIRPO • Text Generation • 8B • Updated • 84 • 1
Question Answering • 4B • Updated • 17 • 2
thinkwee/NOVER1-Qwen2.5-7B • Question Answering • 8B • Updated • 5 • 2
mradermacher/NOVER1-Qwen3-4B-GGUF • 4B • Updated • 93 • 1
mradermacher/NOVER1-Qwen2.5-7B-GGUF • 8B • Updated • 89 • 1
mradermacher/NOVER1-Qwen3-4B-i1-GGUF • 4B • Updated • 1.66k • 1
mradermacher/NOVER1-Qwen2.5-7B-i1-GGUF • 8B • Updated • 1.91k • 1
DebateLabKIT/Phi-4-Argunaut-1-HIRPO • Text Generation • 415k • Updated • 5
mradermacher/Llama-3.1-Argunaut-1-8B-HIRPO-GGUF • 8B • Updated • 120 • 1
mradermacher/Llama-3.1-Argunaut-1-8B-HIRPO-i1-GGUF • 8B • Updated • 1.69k • 1
Text Generation • 2B • Updated • 23 • 8
Text Generation • 4B • Updated • 24 • 1
mradermacher/airesupdated-v2-GGUF • Reinforcement Learning • 4B • Updated • 181
ABaroian/Apertus-8B-RLVR-GSM • Text Generation • Updated • 2
Anonymouslolol/qwen3-8B-hanabi-step110 • Reinforcement Learning • Updated • 13
Text Generation • 4B • Updated • 1
mradermacher/Phi-4-Argunaut-1-HIRPO-GGUF • 15B • Updated • 374
mradermacher/Phi-4-Argunaut-1-HIRPO-i1-GGUF • 15B • Updated • 1.84k
TrialPanorama/LLaMA-3-8B-TP • Text Generation • Updated • 11
Text Generation • Updated • 4
TrialPanorama/Qwen-3-8B-TP • Text Generation • Updated • 6