[Chapter 8] Correction regarding the relationship between GQA and MQA (Page 51) #8

**Location**
* **File:** `chapters/nlp-book-chapter8.pdf`
* **Page:** 51
* **Section:** 8.3.4 Sharing across Heads and Layers

**Problem Description**
I would like to report a potential error in the description of Grouped-Query Attention (GQA).
In the text regarding the parameter $n_g$ (number of groups), the book states:

> "By contrast, when $n_g = 1$, it becomes the **GQA** model."

**Reasoning**
If $n_g$ represents the number of groups and $H$ the number of query heads:
1.  **$n_g = 1$** implies that all query heads share a single key head and a single value head. This is exactly the definition of **MQA (Multi-Query Attention)**.
2.  As proposed in the original GQA paper (*Ainslie et al.*), GQA is an interpolation between MHA and MQA.
    * Limit 1 ($n_g = 1$): MQA
    * Limit 2 ($n_g = H$): MHA
    * Intermediate ($1 < n_g < H$): GQA

Therefore, stating that the model with $n_g = 1$ becomes the "GQA model" is misleading: GQA usually refers to the general case or the intermediate setting, whereas the specific limit $n_g = 1$ is widely recognized as MQA.
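
To make the two limits concrete, here is a minimal sketch of how query heads could be assigned to shared key/value heads for a given $n_g$. The function names `kv_head_for_query` and `attention_variant` are hypothetical and do not come from the book or the GQA paper, and the sketch assumes that $H$ is divisible by $n_g$ and that consecutive query heads form a group.

```python
def kv_head_for_query(head: int, num_heads: int, num_groups: int) -> int:
    """Return the index of the shared key/value head used by query head `head`.

    Assumption: consecutive query heads are placed in the same group.
    """
    if num_heads % num_groups != 0:
        raise ValueError("num_heads must be divisible by num_groups")
    heads_per_group = num_heads // num_groups
    return head // heads_per_group


def attention_variant(num_heads: int, num_groups: int) -> str:
    """Name the attention variant implied by the number of groups n_g."""
    if num_groups == 1:
        return "MQA"  # every query head shares one key/value head
    if num_groups == num_heads:
        return "MHA"  # every query head has its own key/value head
    return "GQA"      # intermediate case: 1 < n_g < H


H = 8  # number of query heads (illustrative value)
for n_g in (1, 4, 8):
    mapping = [kv_head_for_query(h, H, n_g) for h in range(H)]
    print(n_g, attention_variant(H, n_g), mapping)
# 1 MQA [0, 0, 0, 0, 0, 0, 0, 0]
# 4 GQA [0, 0, 1, 1, 2, 2, 3, 3]
# 8 MHA [0, 1, 2, 3, 4, 5, 6, 7]
```

With $n_g = 1$ every query head maps to key/value head 0, which is the MQA configuration, so the sentence on page 51 names the wrong model for this limit.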

**Suggested Fix**
I suggest changing the sentence to:

> "By contrast, when $n_g = 1$, it becomes the **MQA** model."

Thank you for the great resources.
