z-lab's DFlash drafter was trained against stock Gemma 4 31B. We dropped it on top of our QLoRA fine-tune and it captured 92% of the published speedup with no drafter retraining. Here is the math, the vLLM patch we had to upstream to make it run, and the prod-cutover numbers (~15× faster, ~4× cheaper).
DFlash · Speculative Decoding · Gemma 4 · vLLM · Inference · H100 · QLoRA · DFO
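The post promises "the math" behind the speedup. Its exact derivation isn't shown here, but the standard starting point for speculative decoding is the expected number of accepted tokens per target-model forward pass as a function of acceptance rate. A minimal sketch (the values of `alpha` and `gamma` are illustrative, not DFlash's):

```python
def expected_tokens_per_verify(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per target-model pass in speculative decoding.

    alpha: per-token probability the target accepts a drafted token
    gamma: number of tokens drafted per verification step
    Geometric-series identity: E[tokens] = (1 - alpha^(gamma+1)) / (1 - alpha).
    """
    if alpha >= 1.0:
        # Every draft accepted: gamma drafted tokens plus one bonus token.
        return gamma + 1
    return (1 - alpha ** (gamma + 1)) / (1 - alpha)
```

"Capturing 92% of the published speedup with no drafter retraining" is, in these terms, a claim that the acceptance rate against the QLoRA fine-tune stayed close to the acceptance rate against stock Gemma 4.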
Gemma 4 and Qwen3 were trained by different organizations on different data with different architectures. Their internal representations are 99.2% similar at matched depth. Neither model knew the other existed.
LarQL · Interpretability · CKA · Cross-Model · Mechanistic Interpretability · Universal Constants
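The tags name CKA, the usual tool for this kind of cross-model representation comparison. A self-contained sketch of linear CKA in the style of Kornblith et al., which is one plausible way a "99.2% similar at matched depth" number gets computed (the activation shapes here are illustrative):

```python
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear CKA between two activation matrices of shape (n_samples, dim).

    After column-centering:
        CKA(X, Y) = ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    Invariant to orthogonal transforms and isotropic scaling of either space,
    so it can compare layers from models with different hidden sizes.
    """
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, "fro")
    norm_y = np.linalg.norm(Y.T @ Y, "fro")
    return hsic / (norm_x * norm_y)
```

Feeding both models the same prompts and comparing layer k of one against layer k' at matched relative depth of the other is the standard recipe; the post's exact pairing scheme may differ.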
Two natively-trained 1-bit language models, from two different organizations, converge on the same anomaly: the four-stage circuit that organizes every fp16 transformer simply isn't there. Both models still answer correctly. The structure is gone, but the behavior survived.
LarQL · Interpretability · Quantization · BitNet · Bonsai · Mechanistic Interpretability
We ran a 200-item RAG arena on the AskTheDoctor corpus across three models and two retrieval configurations. The headline (v2-atd ≈ Llama 4 Scout, both at ~0.58) is interesting. The methodology footnote is more interesting: we then re-judged 415 of those answers with two different LLM judges and got Spearman ρ = 0.55 between them. That number is the case for human calibration.
RAG-Arena · ScoredQA · RAG Routing · EXIT · LLM-as-Judge · Spearman · Evaluation · QLoRA
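The ρ = 0.55 inter-judge figure is a rank correlation over paired per-item scores. For readers who want to reproduce that kind of number on their own judge outputs, here is a dependency-free Spearman sketch with tie-aware average ranks (the function and variable names are mine, not the post's harness):

```python
def _ranks(xs):
    """1-based ranks, averaging over ties."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(a, b):
    """Spearman's rho = Pearson correlation of the rank vectors."""
    ra, rb = _ranks(a), _ranks(b)
    n = len(a)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra) ** 0.5
    vb = sum((y - mb) ** 2 for y in rb) ** 0.5
    return cov / (va * vb)
```

With two judges' scores over the same 415 answers as `a` and `b`, `spearman_rho(a, b)` is the statistic being reported; 0.55 means the judges only loosely agree on which answers are better.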
A single rank-1 weight edit suppresses one learned fact while leaving the rest of the model intact. No fine-tuning. No retraining. Just a feature subtracted from one layer's gate matrix — with a receipt.
LarQL · Interpretability · Knowledge Editing · Unlearning · Mechanistic Interpretability
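The post's exact edit isn't reproduced here, but "a feature subtracted from one layer's gate matrix" is, generically, a rank-1 direction ablation. A minimal sketch of that identity (names like `ablate_direction` are mine, and this stands in for the post's feature-finding step, which it does not describe):

```python
import numpy as np

def ablate_direction(W: np.ndarray, u: np.ndarray) -> np.ndarray:
    """Rank-1 edit removing W's action along direction u.

    W' = W - (W u) u^T  for unit u, so W' @ u == 0 while W' agrees with W
    on everything orthogonal to u. The edit is a single outer product:
    no fine-tuning, no retraining.
    """
    u = u / np.linalg.norm(u)
    return W - np.outer(W @ u, u)
```

The "receipt" framing matches a checkable property: after the edit, the matrix provably sends the feature direction to zero and provably leaves the orthogonal complement untouched.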
I've run LarQL on 9 models from 5 organizations — from a 360M toy to OpenAI's 120B MoE. Three numbers hold within ±15% across all of them. One pattern vanishes the moment you go to 1-bit weights.
LarQL · Interpretability · Transformers · Machine Learning · Mechanistic Interpretability