
[Chapter 8] Correction regarding the relationship between GQA and MQA (Page 51) #8

@Sameta-cani

Location

  • File: chapters/nlp-book-chapter8.pdf
  • Page: 51
  • Section: 8.3.4 Sharing across Heads and Layers

Problem Description
I would like to report a potential error in the description of Grouped-Query Attention (GQA).
In the text regarding the parameter $n_g$ (number of groups), the book states:

"By contrast, when $n_g = 1$, it becomes the GQA model."

Reasoning
If $n_g$ represents the number of groups:

  1. $n_g = 1$ implies that all query heads share a single key head and a single value head. This is exactly the definition of MQA (Multi-Query Attention).
  2. As proposed in the original GQA paper (Ainslie et al.), GQA is an interpolation between MHA and MQA.
    • Limit 1 ($n_g = 1$): MQA
    • Limit 2 ($n_g = H$, where $H$ is the number of query heads): MHA
    • Intermediate: GQA

Therefore, stating that $n_g = 1$ gives the "GQA model" is misleading: GQA usually refers to the general scheme or to the intermediate case $1 < n_g < H$, whereas the specific limit $n_g = 1$ is widely recognized as MQA.
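
To make the two limits concrete, here is a minimal sketch of how query heads could be mapped to shared key-value heads for a given $n_g$. The function names `kv_head_index` and `kv_assignment` are hypothetical, and the contiguous grouping of query heads is an assumption made for illustration; it is not taken from the book.

```python
def kv_head_index(query_head: int, num_heads: int, num_groups: int) -> int:
    """Return the index of the key-value head used by a given query head.

    Assumption: query heads are split into contiguous, equally sized groups,
    and each group shares one key-value head.
    """
    if num_heads % num_groups != 0:
        raise ValueError("num_heads must be divisible by num_groups")
    heads_per_group = num_heads // num_groups
    return query_head // heads_per_group


def kv_assignment(num_heads: int, num_groups: int) -> list[int]:
    """List the key-value head index for every query head."""
    return [kv_head_index(h, num_heads, num_groups) for h in range(num_heads)]


H = 8  # number of query heads (illustrative value)

print(kv_assignment(H, 1))  # n_g = 1: [0, 0, 0, 0, 0, 0, 0, 0] -> MQA
print(kv_assignment(H, 4))  # 1 < n_g < H: [0, 0, 1, 1, 2, 2, 3, 3] -> GQA
print(kv_assignment(H, H))  # n_g = H: [0, 1, 2, 3, 4, 5, 6, 7] -> MHA
```

With $n_g = 1$ every query head points at the same key-value head, which is the MQA configuration, while $n_g = H$ gives each query head its own key-value head, which is MHA.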

Suggested Fix
I suggest changing the sentence to:

"By contrast, when $n_g = 1$, it becomes the MQA model."

Thank you for the great resources.
