$ cat diff/2026-05-14-2026-05-14-three-claims-keynote.md


Three claims from the GTC keynote, checked against the paper

The stage version of the new long-context architecture is cleaner than the arXiv preprint. Here is what survives the comparison, and what quietly does not.

A keynote is a marketing artifact. A paper is a research artifact. They are allowed to disagree, and they often do. The interesting question is where they disagree and how well the disagreement is hidden. Here are three claims from Tuesday’s keynote, lined up against the paper that was released the same morning, sentence by sentence.

Claim 1: “We process two million tokens of context at the same cost as our previous fifty-thousand.”

The keynote framed this as a flat cost curve. The paper does not. Table 4 in the supplementary material shows compute per token rising sub-linearly with context length, which is genuinely impressive, but the curve is not flat. At two million tokens, per-token compute is roughly 1.8x what it is at fifty thousand. Wall-clock latency is approximately seven times longer. Both numbers are still ahead of every published competitor. Neither number is “the same.”

Verdict: partially true. The architecture genuinely scales better than its peers. The “same cost” framing is a stage simplification that the paper does not endorse.
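
For readers who want to check the Claim 1 arithmetic themselves, here is a minimal sketch in Python. It uses only the two figures quoted above (the context lengths and the 1.8x per-token ratio). The power-law fit between the two points and the whole-context total are our own assumptions, not something the paper states, and the function name is ours.

```python
import math

# Figures quoted in this post for Claim 1.
SHORT_CONTEXT = 50_000       # tokens, the previous context length
LONG_CONTEXT = 2_000_000     # tokens, the new context length
PER_TOKEN_RATIO = 1.8        # per-token compute at 2M relative to 50k


def scaling_summary(short_ctx, long_ctx, per_token_ratio):
    """Summarise how cost grows between two context lengths.

    ASSUMPTION (ours, not the paper's): per-token cost follows a
    power law, cost ~ length ** alpha, between the two points.
    A flat curve would give alpha = 0; linear growth gives alpha = 1.
    """
    context_ratio = long_ctx / short_ctx
    alpha = math.log(per_token_ratio) / math.log(context_ratio)
    # ASSUMPTION: the per-token figure is an average over the whole
    # context, so total compute scales as tokens * per-token cost.
    total_ratio = context_ratio * per_token_ratio
    return context_ratio, alpha, total_ratio


context_ratio, alpha, total_ratio = scaling_summary(
    SHORT_CONTEXT, LONG_CONTEXT, PER_TOKEN_RATIO
)
print(f"context grows {context_ratio:.0f}x")
print(f"implied per-token exponent: {alpha:.3f}")
print(f"implied total compute: about {total_ratio:.0f}x")
```

Under those assumptions the context grows 40x and the implied exponent is about 0.16: much closer to flat than to linear, but not zero. That is the gap between “sub-linear” and “the same.”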

Claim 2: “Accuracy on long-context retrieval improves with more context, rather than degrading.”

This is the claim that earned the loudest applause. It is also the claim that the paper most carefully qualifies. The improvement is shown on a synthetic needle-in-haystack benchmark that the lab introduced in the same paper, with retrieval anchors that were specifically constructed to be unambiguous. The paper notes, in a paragraph in section 5.4 that was not in the keynote, that on the existing LongBench-Multi suite the model’s accuracy at the two-million-token mark is 4.1 points below its accuracy at five hundred thousand tokens.

Verdict: misleading by selection. The claim is true on the benchmark the lab built. It is false on the benchmark the field has been using. A reader who relied only on the keynote would have the wrong mental model.

Claim 3: “The architecture is open. Weights will be released.”

This was stated unambiguously. The paper, in its acknowledgements and licensing footnote, restricts the weight release to non-commercial research use, with a commercial licence obtainable on application. That is not “open” in the sense the audience understood. It is a research-only release with a paid commercial track. Several existing models marketed as open use the same structure, so the criticism applies broadly, not just to this lab. But the gap between the stage word and the licence file is, here, a full step wider than usual.

Verdict: false as stated. The release is restricted in ways that the keynote did not disclose.

What to take away

None of this is a scandal. Keynotes are theatre. But diffing the theatre against the paper is the most useful thing a reader can do with a release of this size, and almost no one will do it for you. We do it because it is the only honest way to read this kind of announcement.

If you only have time for one number from Tuesday: the sub-linear cost curve is the real result. Everything else can wait for the third-party reproductions.

// end of file
