Patch tokenization, BSQ quantization, a causal transformer, and arithmetic coding — built from the ground up. Here's what broke, what worked, and where this research is headed.
The paper described the model. Here's what actually happened when we deployed it on 24 H100 GPUs at a major cancer center—the infrastructure decisions, the failures, and the thing I wish I'd built first.