What if you could take a pre-built AI brain, teach it to think out loud, and make it dramatically smarter at a specific task, all on your own laptop? I take SmolLM2, train it to perform unit conversions, and raise its accuracy from 0% to 72% using Chain-of-Thought prompting, LoRA fine-tuning, and Rejection Fine-Tuning.
Patch tokenization, BSQ quantization, a causal transformer, and arithmetic coding — built from the ground up. Here's what broke, what worked, and where this research is headed.
The paper described the model. Here's what actually happened when we deployed it on 24 H100 GPUs at a major cancer center—the infrastructure decisions, the failures, and the thing I wish I'd built first.