Long-context AI prompts slow down because standard attention costs grow quadratically with input length. DeepSeek’s new v3.2-Exp release introduces DeepSeek Sparse Attention (DSA), touted to dramatically reduce compute and API costs, with the company claiming a 50% price cut.
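For a sense of what “quadratic” means in practice: dense attention compares every token in the context with every other token, so doubling the context length roughly quadruples the work. The snippet below is purely illustrative arithmetic, not a benchmark of any particular model.

```python
# Illustrative only: count the query-key comparisons dense attention performs.
# Doubling the context length quadruples the comparison count.
for context_len in (8_000, 16_000, 32_000, 64_000):
    comparisons = context_len ** 2
    print(f"{context_len:>6} tokens -> {comparisons:,} pairwise comparisons")
```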
Sparse attention is not new; it was used in models like OpenAI’s GPT-3 (2020) and Google’s Reformer (2020). Western labs’ current usage is not fully disclosed, but the technique is widely cited as a path to efficiency.
DeepSeek has been notable for other reasons: its R1 model reportedly matched OpenAI’s o1 performance at a training cost of about $6 million, and its earlier chat app briefly topped the iPhone App Store.
In v3.2-Exp, DeepSeek implements what it calls a “fine-grained sparse attention” mechanism and a “lightning indexer” that scores pairwise word relevance and keeps only the top 2,048 connections for each word, skipping the rest, which the company says does not hurt overall comprehension.
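The selection step DeepSeek describes resembles a standard top-k sparse attention pattern. The sketch below is a minimal illustration under that assumption, not DeepSeek’s implementation: a plain scaled dot product stands in for the “lightning indexer,” and only the 2,048 highest-scoring keys per query feed into the softmax. Function and variable names here are illustrative.

```python
# Minimal top-k sparse attention sketch (PyTorch). Illustrative only;
# not DeepSeek's DSA kernel. A cheap dot-product score plays the role of
# the "indexer" that decides which connections to keep.
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, top_k=2048):
    """q, k, v: [seq_len, dim] tensors; returns [seq_len, dim]."""
    seq_len, dim = q.shape
    top_k = min(top_k, seq_len)  # can't keep more keys than exist

    # Score every query-key pair (the "indexer" step in this sketch).
    scores = q @ k.T / dim ** 0.5                      # [seq_len, seq_len]

    # Keep only the top-k highest-scoring keys per query; mask out the rest.
    _, topk_idx = scores.topk(top_k, dim=-1)           # [seq_len, top_k]
    mask = torch.full_like(scores, float("-inf"))
    mask.scatter_(-1, topk_idx, 0.0)

    # Softmax over the surviving connections only, then weight the values.
    attn = F.softmax(scores + mask, dim=-1)
    return attn @ v

# Example: a 4,096-token sequence with 64-dim heads; each query attends
# to only its 2,048 best-scoring keys.
q, k, v = (torch.randn(4096, 64) for _ in range(3))
print(topk_sparse_attention(q, k, v).shape)  # torch.Size([4096, 64])
```

Note that this sketch still builds the full score matrix and merely masks it, so it shows the selection logic rather than the savings; an efficient kernel would score cheaply and gather only the selected keys and values, which is where the claimed cost reduction would come from.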
The release includes open-source components under the MIT License, along with open model weights, enabling peer review and further experimentation. Early benchmarks noted by the tech press suggest cost savings in long-context scenarios, though independent verification is still pending.
If the results hold, the approach could reduce AI inference costs for long conversations and large-scale deployments, potentially changing how companies balance hardware and software efficiency in the coming years.