This tool visualizes n-gram speculation effectiveness between full and quantized model responses.
We use this tool at doubleword to investigate the forms of speculative decoding specific to batched inference that power our Batched API.
Hover over words in the full response to see matching n-grams in the quantized response.
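
As a rough illustration of what the hover view is showing, the sketch below finds n-grams in the full response that also occur in the quantized response and reports what fraction of the full response they cover. It is not this tool's implementation: the function names (`ngram_index`, `matching_ngrams`, `coverage`), whitespace tokenization, and the default `n=3` are all assumptions made for the example. A real setup would most likely work on model token IDs rather than words.

```python
from collections import defaultdict


def ngram_index(tokens, n):
    """Map each n-gram (as a tuple) to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(tokens) - n + 1):
        index[tuple(tokens[i:i + n])].append(i)
    return index


def matching_ngrams(full, quantized, n=3):
    """Return {start position in full: [start positions in quantized]}.

    Assumption: responses are split on whitespace for simplicity.
    """
    full_tokens = full.split()
    index = ngram_index(quantized.split(), n)
    matches = {}
    for i in range(len(full_tokens) - n + 1):
        gram = tuple(full_tokens[i:i + n])
        if gram in index:  # membership test does not create new entries
            matches[i] = index[gram]
    return matches


def coverage(full, quantized, n=3):
    """Fraction of full-response tokens that lie inside a matched n-gram."""
    full_tokens = full.split()
    if not full_tokens:
        return 0.0
    covered = set()
    for start in matching_ngrams(full, quantized, n):
        covered.update(range(start, start + n))
    return len(covered) / len(full_tokens)


if __name__ == "__main__":
    full = "the cat sat on the mat"
    quantized = "a cat sat on a mat"
    print(matching_ngrams(full, quantized))
    print(coverage(full, quantized))
```

Higher coverage suggests that the quantized response would be a more useful source of draft tokens for the full model, though coverage alone says nothing about how many of those drafts would be accepted during decoding.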