Conversation

@kaiming-cheng (Contributor) commented on Jan 27, 2026

This PR introduces a hierarchical optimization database that stores GPU kernel optimization techniques and code examples for RAG-based optimization.

Key components:

  • OptNode / OptHierarchy: Tree structure organizing optimizations by bottleneck type (latency, memory, utilization) → technique → code example
  • docs/: Optimization technique documentation (TMA, PID swizzling, persistence)
  • code_samples/: Reference Triton kernel implementations (matmul, matadd with various optimizations applied)

Optimization techniques covered:

  • Host-side and device-side Tensor Memory Accelerator (TMA)
  • PID swizzling for L2 cache locality
  • Persistent kernel programming style

This database enables the agent to retrieve relevant optimization strategies and reference implementations based on diagnosed performance bottlenecks.
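
For a sense of the shape, here is a minimal sketch of how the hierarchy fits together. add_children, add_parents, and opt_desc match names in the actual diff; the constructor, other field names, and the example wiring are illustrative assumptions, not the PR's exact implementation.

# Minimal sketch of the hierarchy shape; details beyond the diff's names are assumed.
class OptNode:
    def __init__(self, name: str, opt_desc: str):
        self.name = name
        self.opt_desc = opt_desc            # description (or code) used for retrieval
        self.opt_children: list["OptNode"] = []
        self.opt_parents: list["OptNode"] = []

    def add_children(self, child_nodes: list["OptNode"]) -> None:
        """Adds child nodes to the current node."""
        self.opt_children.extend(child_nodes)

    def add_parents(self, parent_nodes: list["OptNode"]) -> None:
        """Adds parent nodes to the current node."""
        self.opt_parents.extend(parent_nodes)

# Level 1 (bottleneck) -> Level 2 (technique) -> Level 3 (code example)
root = OptNode("root", "GPU kernel optimization")
memory = OptNode("memory", "memory-bound bottlenecks")
tma = OptNode("tma", "on-device Tensor Memory Accelerator (TMA)")
root.add_children([memory]); memory.add_parents([root])
memory.add_children([tma]); tma.add_parents([memory])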

Test

query = "use TMA for memory optimization"
prescriber = RAGPrescriber()
opt_node, similarities = prescriber.retrieve(query)
Retrieved: 
============================= On-Device Tensor Memory Accel... (similarity: 0.573)

Generated context (4620 chars):
--------------------------------------------------------------------------------
## Optimization Technique
...
context = prescriber.build_context(opt_node, max_code_examples=1, max_chars=2000)
## Code Examples
...
 
add_kernel[grid](
        x,
        y,
        output,
        M,
        N,
        x.stride(0),
        x.stride(1),
        BLOCK_SIZE_M=BLOCK_SIZE_M,
        BLOCK_SIZE_N=BLOCK_SIZE_N,
    )
    return output

(Showing 1 of 2 examples)

meta-cla bot added the "CLA Signed" label (managed by the Meta Open Source bot) on Jan 27, 2026.
@kaiming-cheng changed the title from "[Optimization 7/n] Add Database in Kernel_opt" to "[Optimization 7/n] Add Knowledge Database to Kernel optimization" on Jan 27, 2026.
@kaiming-cheng force-pushed the kaiming/opt_component_7_clean branch from b9cb0d7 to 84708fd on January 28, 2026 at 00:29.
@Jack-Khuu (Contributor) left a comment:

Looks good. Is it hard to also add the integration code that uses the RAG into this PR?

Remember to cite sources for the code_samples/docs.

  • Drop [Optimization 7/n] from the title just to avoid confusion

"""Adds a child node to the current node."""
self.opt_parents.extend(parent_nodes)

def remove_parents(self, parent_nodes):
Contributor:

Do we need this for any reason?

@kaiming-cheng (Contributor, author):

Good call - I've removed it in the following commit.

Comment on lines 105 to 109
level_1_opts = [optnode_latency, optnode_memory, optnode_utilization]
self.root.add_children(level_1_opts)
optnode_latency.add_parents([self.root])
optnode_memory.add_parents([self.root])
optnode_utilization.add_parents([self.root])
Contributor:

nit: For legibility, can we add a helper like add_relation or something that updates the child and parent symmetrically?

It's easy to parse here, but level 3 is harder to parse.
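
Something like this hypothetical helper (name and signature are assumptions) would keep both sides in sync:

def add_relation(parent: OptNode, children: list[OptNode]) -> None:
    """Link a parent and its children symmetrically in one call."""
    parent.add_children(children)
    for child in children:
        child.add_parents([parent])

# The level-1 wiring above would then collapse to one line:
# add_relation(self.root, [optnode_latency, optnode_memory, optnode_utilization])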

@kaiming-cheng changed the title from "[Optimization 7/n] Add Knowledge Database to Kernel optimization" to "Add Knowledge Database to Kernel optimization" on Jan 28, 2026.

# Default path
if database_path is None:
    database_path = (
Contributor:

database_path seems wrong; use Path(__file__).resolve().parents[...] and walk up until you hit the project root (where pyproject.toml is).
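
One way to do that (the helper name is an assumption; it just walks parents until it finds pyproject.toml):

from pathlib import Path

def find_project_root(marker: str = "pyproject.toml") -> Path:
    """Walk up from this file until a directory containing `marker` is found."""
    here = Path(__file__).resolve()
    for parent in here.parents:
        if (parent / marker).exists():
            return parent
    raise FileNotFoundError(f"no {marker} found above {here}")

# database_path = find_project_root() / "kernel_perf_agent" / "kernel_opt" / "database"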

    return 0.0
return dot_product / (norm_vec1 * norm_vec2)

def retrieve(self, opt_prompt: str) -> tuple[OptNode | None, dict[OptNode, float]]:
Contributor:

retrieve() calls embeddings.embed_query(node.opt_desc) for every node on each call. Some nodes include full code examples, which is slow and costly.
Precompute embeddings once at init and cache them per node, or at least cache an in-memory dict {OptNode: embedding} after the first compute.
Also consider embedding only the L1/L2 text nodes for retrieval, then traversing down for code examples; embedding code blobs is noisy and expensive.
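
A sketch of the caching idea; embed_query matches the call above, while the cache dict and helper name are illustrative:

class RAGPrescriber:
    def __init__(self, embeddings):
        self.embeddings = embeddings
        # filled lazily: each node's opt_desc is embedded at most once
        self._embedding_cache: dict[OptNode, list[float]] = {}

    def _node_embedding(self, node: OptNode) -> list[float]:
        if node not in self._embedding_cache:
            self._embedding_cache[node] = self.embeddings.embed_query(node.opt_desc)
        return self._embedding_cache[node]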


return best_node, opt_similarity

def build_context(self, opt_node: OptNode) -> str:
Contributor:

It traverses from the selected node down and concatenates every descendant's opt_desc, including entire code files. That will quickly blow context limits and drown out the signal.
Put a max character/token budget in place and stop after N leaf examples.
Add separators between nodes (right now it just concatenates).
Optionally include only (a) the technique description and (b) the top-k leaf code examples.
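
A sketch of a budgeted version; the max_code_examples/max_chars parameters match the test output in the PR description, while the traversal details are assumptions:

def build_context(self, opt_node: OptNode, max_code_examples: int = 1,
                  max_chars: int = 2000) -> str:
    parts = [f"## Optimization Technique\n{opt_node.opt_desc}"]
    # assume the leaf children hold the code examples
    leaves = [c for c in opt_node.opt_children if not c.opt_children]
    examples = [leaf.opt_desc[:max_chars] for leaf in leaves[:max_code_examples]]
    if examples:
        parts.append("## Code Examples\n" + "\n\n---\n\n".join(examples))
        parts.append(f"(Showing {len(examples)} of {len(leaves)} examples)")
    return "\n\n".join(parts)  # separators between sections instead of raw concatenation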


from pathlib import Path

from kernel_perf_agent.kernel_opt.database.docs import (
Contributor:

I don’t see a kernel_perf_agent/kernel_opt/database/docs/__init__.py added in this PR.

@kaiming-cheng (Contributor, author):

Thanks for the catch - updated this in c57e13c.

pyproject.toml (outdated)
"python-dotenv",
"gradio>=5.5.0",
"requests",
"langchain-openai",
Contributor:

Adding langchain-openai is a big dependency. If you only need embeddings, consider using the project’s existing LLM client (if any) or a thinner dependency.

If you keep it, I’d suggest pinning compatible versions or adding it as an optional dependency for the RAG feature.

@kaiming-cheng (Contributor, author):

Good point! We can use OpenAI's text-embedding model for simplicity.
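
If we drop langchain-openai, a thin call through the openai client could stand in for embed_query (the model name here is just an example):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed_query(text: str) -> list[float]:
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return resp.data[0].embedding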
