Aw yisss.
Yeah, the model's an absolute banger. I actually got spoiled by it, so it's kinda hard to go back to Cydonia D:
What GPUs do you guys have? And what quant? Cydonia to this one is a pretty big jump.
RX 5500 XT Nitro+ (8 GB) at the moment ;D (waiting for the RX 9070 XT Sapphire Pulse to come back in stock where I am atm)
Running Q5_K_M with 16k context on KoboldCpp Vulkan and zero layers offloaded to VRAM (got 64 GB of 5600 MHz DDR5 and a Ryzen 7 9800X3D). Funnily enough, speed actually decreases as I offload more layers to VRAM, the opposite of what you'd expect.
It takes about 15 mins to get the full 16k context processed (23-27 sec per 512-token batch), then it's about 1.2 t/s for generation. With ContextShift it's alright considering the quality I get.
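For anyone curious how that ~15 min figure falls out of the batch timings, here's a quick back-of-the-envelope (assuming a full 16,384-token prompt and the midpoint of the quoted 23-27 s range):

```python
# Rough sanity check of the prompt-processing time quoted above.
context_tokens = 16_384
batch_size = 512
secs_per_batch = (23 + 27) / 2  # midpoint of the quoted 23-27 s range

batches = context_tokens / batch_size   # 32 batches of 512 tokens
total_secs = batches * secs_per_batch   # ~800 s
print(f"~{total_secs / 60:.1f} min to process the full context")
```

That lands around 13-14 minutes, which matches the "about 15 mins" observation once you add model load time.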
Hi. Thanks for a really cool tune. I use a low quant, IQ3_M, on KoboldCpp (4090 + 32 GB RAM, no offload, 20k context with the KV cache quantized to 8-bit). So far it's the most coherent, slop-free, instruction-following model for me. Cydonia and the recent Qwen3 models don't even come close; only GLM-4-32B gets somewhere near this finetune.
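In case it helps anyone reproduce this setup, a rough sketch of the KoboldCpp launch I'd expect for it. The flag names are from memory and the model filename is a placeholder, so double-check everything against `python koboldcpp.py --help` on your install:

```shell
# Hypothetical launch for the setup above (verify flags with --help):
#   --gpulayers 0     -> "no offload": all layers stay in system RAM
#   --contextsize     -> ~20k context window
#   --quantkv 1       -> 8-bit KV cache (0 = f16, 2 = 4-bit)
python koboldcpp.py --model ./model-IQ3_M.gguf \
    --usecublas --gpulayers 0 --contextsize 20480 --quantkv 1
```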
Hi there!
4060 Ti 16 GB with 64 GB system RAM, Q5, similar experience to Nesaliti. Nemotron is a BEAST!
