A Frontier Open Source LLM Will Be Released On 3rd December 2026
I have seen a version of the above plot going around Twitter and wanted to dig a bit deeper into it. The plot shows the gap between open weights LLMs and closed source LLMs. We measure this gap by taking the best score any open weights LLM has reached on a benchmark at a given time, and then looking back to find how long ago the closed source frontier first reached that level. In other words, it measures how long it took open source models to catch up with the capabilities reached by the closed source frontier. The benchmark here is the Artificial Analysis Intelligence Index, their headline index that tries to assess the overall capabilities of models. In general it correlates quite well with the ‘vibe’ people seem to get from models.
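To make the gap definition concrete, here is a minimal sketch of how it could be computed. It is not the code behind the plot: the function names, the `(release_date, score)` data layout, the made-up scores, and the 30.44 days-per-month conversion are all my own assumptions for illustration.

```python
from datetime import date

# Hypothetical data: (release date, benchmark score). These numbers are invented.
open_models = [(date(2024, 1, 1), 40), (date(2024, 7, 1), 55)]
closed_models = [(date(2023, 3, 1), 50), (date(2023, 12, 1), 60), (date(2024, 6, 1), 70)]

def frontier_score(models, as_of):
    """Best score among models released on or before `as_of` (None if there are none)."""
    scores = [score for released, score in models if released <= as_of]
    return max(scores) if scores else None

def gap_in_months(open_models, closed_models, as_of):
    """Months between `as_of` and the first date a closed model reached
    the open weights frontier score as of `as_of`."""
    open_best = frontier_score(open_models, as_of)
    if open_best is None:
        return None  # no open weights model released yet
    # Release dates of closed models that had already matched or beaten that score.
    reached = [released for released, score in closed_models
               if score >= open_best and released <= as_of]
    if not reached:
        return 0.0  # assumption: treat "open is level or ahead" as a gap of zero
    # Assumption: an average month is 30.44 days.
    return (as_of - min(reached)).days / 30.44

# About 7.0 with these made-up numbers: the open frontier score of 55 on
# 2024-07-01 was first matched by a closed model on 2023-12-01.
print(gap_in_months(open_models, closed_models, date(2024, 7, 1)))
```

Repeating this for every month gives one point per month, which is the series the plot tracks.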
You can see that around summer 2024 the gap on this benchmark starts to shrink, and has been reliably shrinking since then. If you plot a line of best fit and extend it into the future you find that the gap shrinks to 0 months around December 3rd 2026 - 6 months or so from the time of writing.
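The extrapolation itself is just an ordinary least squares line and its zero crossing. A minimal sketch, assuming the gap is sampled as months since some start point; the sample values below are invented and will not reproduce the December 3rd date, and `fit_line` and `zero_crossing` are illustrative names of my own.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def zero_crossing(slope, intercept):
    """x at which the fitted line hits zero, or None if the gap is not shrinking."""
    if slope >= 0:
        return None
    return -intercept / slope

# Invented example: months since the start of the trend, and the gap in months.
months_since_start = [0, 6, 12, 18]
gap_months = [12, 10, 8, 6]

slope, intercept = fit_line(months_since_start, gap_months)
# With these numbers the gap shrinks by a third of a month per month
# and reaches zero about 36 months after the start.
print(zero_crossing(slope, intercept))
```

The date is only as good as the assumption that the trend stays linear all the way down to zero.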
Now is probably a good time to liquidate your pension, fly to a remote island somewhere, and live out the remaining 6 months or so of civilization in peace.
…
Except.
This might not be the whole picture. The plot above uses only a single benchmark, and one benchmark can’t capture the full range of LLM capabilities. Kindly, Artificial Analysis gives us access to 18 different benchmarks that they have measured for these models. I have repeated the analysis for all 18 benchmarks and summarized the results in the plot below:
For each of the 18 datasets we have created a similar chart. You can see all 18 at the bottom of the page. For each month we have drawn a box plot of the gaps across the datasets, and then plotted all the box plots over time. We have also calculated the average of the gaps across datasets, and calculated a line of best fit for that. That line is almost completely flat, at just under 5 months for the entire period.
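As a rough sketch of that aggregation step, the snippet below summarizes the per-benchmark gaps month by month. The benchmark names and gap values are invented, and `monthly_summary` is a hypothetical helper, not the code used for the plot. The numbers are chosen so that one benchmark closing fast is cancelled out by the others drifting apart.

```python
from statistics import mean, median

# Invented gaps in months, one list per benchmark, one entry per month.
gaps_by_benchmark = {
    "coding": [12.0, 7.0, 2.0],
    "math": [2.0, 4.0, 7.0],
    "knowledge": [1.0, 4.0, 6.0],
}

def monthly_summary(gaps_by_benchmark):
    """For each month, summarize the spread of gaps across benchmarks."""
    summary = []
    # zip(*...) regroups the per-benchmark lists into per-month tuples.
    for month_gaps in zip(*gaps_by_benchmark.values()):
        summary.append({
            "mean": mean(month_gaps),
            "median": median(month_gaps),
            "min": min(month_gaps),
            "max": max(month_gaps),
        })
    return summary

for month, row in enumerate(monthly_summary(gaps_by_benchmark)):
    print(month, row)
```

Here the mean stays at 5 months in every month even though the coding gap falls from 12 to 2, which is the kind of pattern a flat average line can hide.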
What is notable is that a large part of the open models’ catching up has happened on the coding benchmark. The coding index has gone from 15 months behind to only a month or two behind. Most other datasets show a moderate increase in their gaps over time.
So maybe the open source apocalypse won’t happen yet.
What this exercise does suggest is the difficulty of measuring LLM quality. Depending on how you measure it, you would either predict the open source singularity by Christmas, or say that open source LLMs are consistently 5 months behind closed source and that the gap might be growing.