About Me
I work in the LLM inference space and am the co-founder of doubleword.ai, where we're building state-of-the-art inference systems. Our focus is on making offline, batched LLM inference cheaper.
Before working with LLMs, I did a PhD in computational quantum physics, where I specialized in simulating quantum systems using GPUs and early-stage quantum computers. You can check out some of my work here.
In this blog, I'll share insights from my work in LLM inference and post deep dives into some of our inference systems research.
Tools
I've built some interactive tools to help with LLM development and optimization:
- N-gram Speculation Visualizer - Visualize how speculative decoding with n-gram matching can speed up LLM inference by predicting and verifying multiple tokens at once.
- JSONL Toolkit - Upload, edit, and manage JSONL files for batch API requests. Supports bulk editing of model parameters, random sampling, and dataset repetition.
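
For readers new to the idea behind the first tool, here is a minimal sketch of n-gram speculation. It is not the visualizer's implementation: the names `propose_draft`, `verify`, and `next_token` are hypothetical, and `next_token` is a stand-in for a real model's greedy next-token choice. A real system would check all draft tokens in one batched forward pass; the loop below only imitates that result.

```python
from typing import Callable, List, Sequence


def propose_draft(tokens: Sequence[int], n: int = 2, max_draft: int = 4) -> List[int]:
    """Draft tokens by reusing what followed the most recent earlier
    occurrence of the last n tokens (hypothetical helper)."""
    if len(tokens) <= n:
        return []
    suffix = list(tokens[-n:])
    # Scan backwards over earlier positions, skipping the suffix itself.
    for start in range(len(tokens) - n - 1, -1, -1):
        if list(tokens[start:start + n]) == suffix:
            follow = start + n
            return list(tokens[follow:follow + max_draft])
    return []


def verify(
    tokens: Sequence[int],
    draft: Sequence[int],
    next_token: Callable[[List[int]], int],
) -> List[int]:
    """Accept the longest draft prefix that matches next_token's choices.
    next_token is an assumed stand-in for the model's greedy decoding."""
    context = list(tokens)
    accepted: List[int] = []
    for t in draft:
        if next_token(context) != t:
            break
        accepted.append(t)
        context.append(t)
    return accepted


if __name__ == "__main__":
    # Toy "model" that cycles 1, 2, 3, 4, 1, 2, ...
    toy_model = lambda ctx: ctx[-1] % 4 + 1
    history = [1, 2, 3, 4, 1, 2]
    draft = propose_draft(history, n=2, max_draft=4)
    print(draft, verify(history, draft, toy_model))
```

When the text is repetitive, most of the draft is accepted and several tokens are produced for the cost of one verification step; when the draft is wrong, decoding falls back to the model's own token.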