Alibaba Qwen Clinches 2025 NeurIPS Best Paper Award for Attention Mechanism Breakthrough

The Alibaba Qwen team has been honored with a Best Paper Award at NeurIPS 2025, the Conference on Neural Information Processing Systems, one of the premier venues in machine learning and artificial intelligence research.

Groundbreaking Research on Attention Mechanisms

The award-winning paper, titled “Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free,” marks a significant advance in the understanding of attention mechanisms within large language models (LLMs). It is the first study to systematically analyze the impact of attention gating on model performance and training.

The Importance of Gating

Gating plays an essential role in controlling information flow within LLMs. Acting like “intelligent noise-canceling headphones,” it filters out irrelevant information, thereby enhancing model efficiency. Gating mechanisms are widely used across LLM architectures.

Extensive Comparative Study

To evaluate the effects of gating comprehensively, the Qwen team examined more than 30 variants of 15-billion-parameter Mixture-of-Experts (MoE) models and 1.7-billion-parameter dense models, all trained on a dataset of 3.5 trillion tokens.

  • The study found that adding a head-specific sigmoid gate after Scaled Dot-Product Attention (SDPA) consistently improved model performance (a minimal sketch follows this list).
  • This architectural tweak enhanced training stability, allowed for higher learning rates, and improved scaling capabilities.
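
In concrete terms, the modification multiplies each head's SDPA output, elementwise, by a sigmoid gate computed from the layer input. The PyTorch sketch below illustrates the idea; the class name, shapes, and the choice of the gate's input are illustrative assumptions, not the Qwen implementation.

```python
# Minimal sketch: head-specific sigmoid gate applied after SDPA (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        # One gate value per head and per channel, computed from the layer input.
        self.gate_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        # Project and split into heads: (batch, heads, seq, d_head)
        q = self.q_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        k = self.k_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        v = self.v_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        # Standard causal scaled dot-product attention.
        attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        # Head-specific sigmoid gate applied elementwise to the SDPA output.
        gate = torch.sigmoid(self.gate_proj(x)).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        attn = attn * gate
        # Merge heads and project back to the model dimension.
        attn = attn.transpose(1, 2).reshape(b, t, d)
        return self.out_proj(attn)
```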

Innovations in Qwen3-Next

These findings were integrated into the Qwen3-Next model, released in September 2025. The model replaces standard attention with a combination of Gated DeltaNet and Gated Attention layers, improving in-context learning and computational efficiency.
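
As a rough illustration of how such a hybrid stack can be assembled, the sketch below interleaves linear-attention-style blocks (a stand-in for Gated DeltaNet) with the full gated-attention blocks from the earlier sketch. The layer ratio, class names, and the DeltaNet stand-in are assumptions for illustration and do not reflect the released Qwen3-Next architecture or code.

```python
# Illustrative only: a toy hybrid decoder stack mixing linear-attention-style
# blocks with full gated-attention blocks. Reuses GatedAttention from the sketch above.
import torch
import torch.nn as nn

class DeltaNetStub(nn.Module):
    """Placeholder for a Gated DeltaNet (linear attention) block; not the real operator."""
    def __init__(self, d_model: int):
        super().__init__()
        self.mix = nn.Linear(d_model, d_model)
        self.gate = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # A gated linear mixing step standing in for the recurrent DeltaNet update.
        return torch.sigmoid(self.gate(x)) * self.mix(x)

class HybridStack(nn.Module):
    def __init__(self, n_layers: int, d_model: int, n_heads: int, full_attn_every: int = 4):
        super().__init__()
        # Insert a full gated-attention block periodically; use the stub elsewhere.
        self.layers = nn.ModuleList(
            GatedAttention(d_model, n_heads) if (i + 1) % full_attn_every == 0
            else DeltaNetStub(d_model)
            for i in range(n_layers)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            x = x + layer(x)  # simple residual connection around each block
        return x
```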

Commitment to Community and Collaboration

The Qwen team has committed to supporting further research by releasing the related code and models on GitHub and HuggingFace. This encourages community adoption and collaboration in advancing the understanding of attention mechanisms in LLMs.
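
For reference, checkpoints published on the Hugging Face Hub can typically be loaded with the transformers library along the lines of the sketch below. The exact model id used here is an assumption for illustration; check the Qwen organization page for the published checkpoint names.

```python
# Hypothetical usage sketch: loading a Qwen3-Next checkpoint from the Hugging Face Hub.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-Next-80B-A3B-Instruct"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

prompt = "Explain gated attention in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```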

Recognition from the NeurIPS Selection Committee

The NeurIPS Selection Committee praised the paper, highlighting its practical recommendations and the robust evidence supporting the proposed architectural modifications. They also noted that the scale of the experiments was made possible by access to industrial-scale computing resources.

Moreover, they commended the authors for sharing their findings openly at a time when open sharing of LLM research has become less common. This openness is expected to contribute to a broader understanding and advancement of attention mechanisms in large language models.