CASSANDRA-21301: add AGENTS.md and CLAUDE.md by maoling · Pull Request #4734 · apache/cassandra

maoling · 2026-04-12T13:32:48Z

More details are in CASSANDRA-21301.

netudima · 2026-04-13T11:41:16Z

+
+## Environment
+
+- Java 11 (default) or 17.


we support 21 in runtime now as well

netudima · 2026-04-13T11:42:49Z

+- Commit messages should reference the JIRA issue. Disclose that AI assistance was used in the PR description.
+
+    ```
+    CASSANDRA-XXXXX: Brief description of the change


the format does not match the agreed commit message described in https://cassandra.apache.org/_/development/how_to_commit.html

jmckenzie-dev · 2026-04-13T13:34:36Z

+
+    ```bash
+    # Run a single unit test class
+    ant testsome -Dtest.name=org.apache.cassandra.service.StorageServiceServerTest -Dtest.methods=testGetAllRangesEmpty


Differentiating how to run unit vs. integration could be useful. Though I think it's more likely that just adding a SKILL.md for running tests that has a distilled extraction of guidance from build.xml is probably the better way to do that.

So - maybe a quick example here + pointing to build.xml w/some string bread crumbs to know what to search for could be a decent 1st step.

this is an area where i feel we should have a script. For example the above is 100% wrong, and is what 90% of Cassandra devs use =D

CI uses `testclasslist` which has different JVM arguments, so if you test using `testsome` and it fails in CI you will struggle to figure out why... they are not the same!

We also have different test scopes and they have their own commands or special flags... this is way too much to put in an agents file and really should just be a script to make it idiot proof (LLMs do stupid things, try to make this deterministic when possible)
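One way to make the scope decision deterministic is to derive it from the test's source path rather than asking the agent to choose. The sketch below is only an illustration: `test_scope` is a hypothetical helper, and the `test/<scope>/...` directory layout it matches on is an assumption. It deliberately stops short of invoking Ant, since the exact target and flags per scope belong in the real script.

```bash
# Hypothetical helper: infer a test's scope from its source path so that the
# caller (human or agent) never picks the command or flags by hand.
# Assumption: tests live under test/<scope>/..., e.g. test/unit, test/distributed.
test_scope() {
    local path="$1"
    case "$path" in
        test/unit/*)        echo "unit" ;;
        test/distributed/*) echo "distributed" ;;
        test/long/*)        echo "long" ;;
        test/burn/*)        echo "burn" ;;
        *)
            # Fail loudly instead of guessing a scope.
            echo "unknown test scope for: $path" >&2
            return 1
            ;;
    esac
}
```

A wrapper script could call `test_scope` first and then dispatch to whatever command CI actually uses for that scope, so local runs and CI runs stay aligned.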

dcapwell · 2026-04-13T16:00:56Z

+## Build
+
+```bash
+ant build                # compile all classes (includes Accord submodule)
+ant jar                  # build the main JAR
+ant clean                # remove locally created artifacts
+ant realclean            # remove entire build directory and downloaded artifacts
+```


not a blocker for this PR at all, but i find it's best to not let the harness know about / touch ant as it's wasteful for tokens. Locally I have 2 scripts:

- `ai-ci-test <test>`: runs the test and strips the output so it's "success" *or* the failing task
- `ai-build`: `ant clean && ant build` but strips the output so it's "success" or the failing task
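A minimal sketch of the idea behind such a wrapper, not the actual `ai-build` script discussed in this thread. `run_quiet` is a hypothetical name, and printing only the last 40 lines of the log on failure is an assumption made for brevity (the description above says the real script reports the failing task).

```bash
# Hypothetical wrapper: run a noisy command, print a single line on success,
# and only the tail of its log on failure.
run_quiet() {
    local log status
    log="$(mktemp)"
    # Capture stdout and stderr together so nothing leaks to the caller.
    if "$@" >"$log" 2>&1; then
        echo "BUILD SUCCESSFUL"
        status=0
    else
        status=$?
        echo "BUILD FAILED"
        # Assumption: the last 40 lines are enough to locate the failure.
        tail -n 40 "$log"
    fi
    rm -f "$log"
    return "$status"
}
```

Under these assumptions, `run_quiet sh -c 'ant clean && ant build'` would print one line when the build passes and a short log excerpt when it fails, while preserving the command's exit code.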

dcapwell · 2026-04-13T16:01:25Z

+ant realclean            # remove entire build directory and downloaded artifacts
+```
+
+Do NOT run `ant build` if you only need to verify a small change compiles.


how do you test compile then?

dcapwell · 2026-04-13T16:06:28Z

+    ant testsome -Dtest.name=org.apache.cassandra.service.StorageServiceServerTest -Dtest.methods=testGetAllRangesEmpty
+    ```
+
+- When fixing a bug, first create a regression test that reproduces the failure, then implement the fix and verify.


not a blocker for this patch, but this is an area where skills can help. How to work with CQLTester, how to work with jvm-dtest, how to work with qt or stateful, how to work with the simulator; etc.

> first create a regression test that reproduces the failure

do we have any example JIRA where this went well? I find claude makes really horrible tests. It will try to create a fake unit test that makes no sense then it defines success as it made the logic handle its test... but you actually go through Cassandra and everything is broken.

dcapwell · 2026-04-13T16:07:07Z

+    ```
+
+- When fixing a bug, first create a regression test that reproduces the failure, then implement the fix and verify.
+- Provide test(s) coverage for all new or modified code.


how? this is vague so LLMs are likely to make up random numbers; if we care about this (which we don't track) it should be deterministic

dcapwell · 2026-04-13T16:22:59Z

+ant check                # runs checkstyle, RAT license check, and builds
+ant checkstyle           # checkstyle only


is there a reason the harness should use checkstyle vs check? the more we add the more likely it will do the wrong thing

dcapwell · 2026-04-13T16:24:22Z

+## Code Style
+Cassandra enforces style via Checkstyle (`ant checkstyle`). Key rules are included in `checkstyle.xml` file.
+General style:
+- 4-space indentation, no tabs.
+- Braces on a new line below control statements (Allman style).
+- Brace-less style for single-line control statements.
+- Match existing code style in the file you are editing.
+- All new files must include the Apache License 2.0 header.
+- Concise English documentation is required for complex classes and methods; trivial ones may not require them.


Sadly this is misleading and the actual style guide is at https://cassandra.apache.org/_/development/code_style.html

The only way for an agent to know our style guide is to read that page; and this is manual and lacking deterministic checks

That style guide comes from the code right? We could just have a .md redirect pointing to that file for digging into specifics around style. Not sure when an LLM would be triggered to check that vs. just inferring from the local context from files.

isn't this in the website project?

$ # in cassandra dir
$ fd code_s
$

✓ ~/src/github/apache/cassandra-website git:(trunk)
$ fd code_s
site-content/source/modules/ROOT/pages/development/code_style.adoc

dcapwell · 2026-04-13T16:26:26Z

+    Co-authored-by: GitHub Copilot
+    Co-authored-by: Claude
+    Co-authored-by: gemini-code-assist
+    ```


i don't think this is agreed to yet. Mick proposed in slack (informal) to use the Linux kernel way of `Assisted-by`; as of this moment we don't have a syntax agreed to.

dcapwell · 2026-04-13T16:27:26Z

+    Co-authored-by: gemini-code-assist
+    ```
+
+- Do NOT modify submodule references without understanding the implications. Submodule changes must be committed and pushed before the parent Cassandra commit.


do we need this comment? this tells the LLM to hack around Accord rather than make clean fixes. Why are we asking harnesses to own git?

dcapwell · 2026-04-13T16:28:22Z

+- 🚫 Never commit secrets, credentials, or API keys.
+- 🚫 Never run the full test suite (`ant test`) — it takes hours. Run targeted tests only.
+- 🚫 Never bypass Checkstyle violations without a suppression comment explaining why.
+- 🚫 Never create summary or documentation files unless explicitly asked.


i feel like this should be removed; harnesses are better at maintaining docs than us humans... so actually making sure features are documented should be desired?

dcapwell · 2026-04-13T16:29:03Z

+- 🚫 Never bypass Checkstyle violations without a suppression comment explaining why.
+- 🚫 Never create summary or documentation files unless explicitly asked.
+- ⚠️ Ask before modifying the CQL grammar (`src/antlr/Cql.g`) — changes cascade widely.
+- ⚠️ Ask before modifying `modules/accord/` — it is a separate repository.


this is a git thing, this tells the harness to hack when the proper solution is to update accord.

jmckenzie-dev · 2026-04-14T15:18:07Z

Broadly, LLMs aren't as good at being told what not to do vs. being told what to do. So we should try and err on the side of being "positively oriented" when possible.

@dcapwell - do you have some scripts for the CI and testing stuff we could bundle in with this PR?

dcapwell · 2026-04-14T23:20:16Z

yeah, i should be able to add the 3 scripts i use...

rustyrazorblade · 2026-04-14T23:54:48Z

> Broadly, LLMs aren't as good at being told what not to do vs. being told what to do. So we should try and err on the side of being "positively oriented" when possible.

There's some truth to this, but I think it's still useful to say what not to do. I've found in my own repos telling the LLM not to write mock-echo tests where it just verifies the mocked thing returns results, to be quite helpful. I think it helps a lot to provide the positive direction in addition to the negative.

dcapwell · 2026-04-15T16:30:50Z

+
+## Git Workflow
+- Do NOT commit unless explicitly asked.
+- Commit messages should reference the JIRA issue. Disclose that AI assistance was used in the PR description.


honestly I think we should remove this. Logically this is only needed if the committer pushes this on the author (many don't), and is done after the review.

For agent files you need them to work in 100% of sessions, and this section isn't applicable the majority of the time.

For example, how does it learn the JIRA? How does it learn the reviewers? How does it learn that a reviewer gave code feedback and should be included as a co-author? What should the commit message be, and is it in sync with JIRA (we are required to push context to JIRA as a source of truth)?

dcapwell · 2026-04-15T16:38:34Z

going to push review branch in a second; here is example usage of the scripts

$ ai-build
BUILD FAILED
==================================================

Output from failed target 'None':
----------------------------------------
Buildfile: /Users/dcapwell/src/github/apache/cassandra/trunk/build.xml
BUILD FAILED
/Users/dcapwell/src/github/apache/cassandra/trunk/build.xml:223: Unsupported JDK version used: 25
Total time: 0 seconds
$ setjdk 21
$ ai-build
BUILD SUCCESSFUL
$ vim src/java/org/apache/cassandra/config/Config.java # make it not compile
$ ai-build
BUILD FAILED
Failed target: _build_java
==================================================

Output from failed target '_build_java':
----------------------------------------
_build_java:
     [echo] Compiling for Java 21...
    [javac] Compiling 3194 source files to /Users/dcapwell/src/github/apache/cassandra/trunk/build/classes/main
    [javac] Note: Annotation processing is enabled because one or more processors were found
    [javac]   on the class path. A future release of javac may disable annotation processing
    [javac]   unless at least one processor is specified by name (-processor), or a search
    [javac]   path is specified (--processor-path, --processor-module-path), or annotation
    [javac]   processing is enabled explicitly (-proc:only, -proc:full).
    [javac]   Use -Xlint:-options to suppress this message.
    [javac]   Use -proc:none to disable annotation processing.
    [javac] /Users/dcapwell/src/github/apache/cassandra/trunk/src/java/org/apache/cassandra/config/Config.java:18: error: ';' expected
    [javac] package org.apache.cassandra.config
    [javac]                                    ^
    [javac] 1 error
BUILD FAILED
/Users/dcapwell/src/github/apache/cassandra/trunk/build.xml:693: The following error occurred while executing this line:
/Users/dcapwell/src/github/apache/cassandra/trunk/build.xml:678: Compile failed; see the compiler error output for details.
Total time: 10 seconds

dcapwell · 2026-04-16T12:47:28Z

@@ -0,0 +1 @@
+@AGENTS.md


this isn't a symlink in git, so when you checkout you get

$ cat CLAUDE.md
@AGENTS.md%
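If a real symlink is what is wanted here, it could be created along these lines. This is a sketch only: `link_agents_file` is a hypothetical helper name, and it assumes both files sit in the same directory.

```bash
# Sketch: make CLAUDE.md a real symlink to AGENTS.md instead of a regular
# file whose only content is the text "@AGENTS.md".
link_agents_file() {
    local dir="$1"
    (
        cd "$dir" || exit 1
        # -s: create a symbolic link; -f: replace an existing CLAUDE.md.
        ln -sf AGENTS.md CLAUDE.md
    )
}
```

After `git add CLAUDE.md`, `git ls-files -s CLAUDE.md` should list the entry with mode `120000`, which is how git records a symlink; a regular file shows `100644`.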

jmckenzie-dev · 2026-04-16T13:23:24Z

> There's some truth to this, but I think it's still useful to say what not to do. I've found in my own repos telling the LLM not to write mock-echo tests where it just verifies the mocked thing returns results, to be quite helpful. I think it helps a lot to provide the positive direction in addition to the negative.

As is tradition, I communicated the tip of the iceberg on what was in my head. 😭

What I meant to get across: we should absolutely do both, but be aware that this is a current limitation so lean on the positive hard knowing they'll generally comply with that and then all caps "THE WORLD WILL END IF YOU DO THIS BAD THING" + copy/paste multiple times to tweak the attention mechanism and get it to maybe respect what we tell it not to do.

So yeah - quite helpful, but understanding that the "mathematical vibe" is probably like 4:1 in terms of impact on saying what an LLM should do vs. what they should not in terms of their compliance. Plus it varies based on model too.

Wild West.

dcapwell · 2026-04-16T18:38:00Z

dcapwell@4ae63da

Posted my scripts and updated AGENTS.md to point to it (also fixed the fact CLAUDE.md wasn't a symlink). I didn't resolve any other feedback; left for author

maoling · 2026-04-20T09:19:51Z

Thanks everyone for the reviews. I’ve done my best to address the comments. PTAL.
Regarding the great scripts created by @dcapwell, could you please Co-authored-by with this PR?
I’ve added more resources to the JIRA, including guidance on how to write a good AGENTS.md and examples from other well-known projects.
I’ve removed the CLAUDE.md file due to potential risks such as vendor neutrality concerns, IP and licensing issues, and possible vendor lock-in. Although a few ASF projects (e.g., Apache Airflow) are already using CLAUDE.md, I think it’s safer to avoid it here for now.

dcapwell · 2026-04-21T13:30:17Z

> Regarding the great scripts created by @dcapwell, could you please Co-authored-by with this PR?

Can you apply it? Don't worry about co-authorship or attribution, that's something committers can do when merging

> I’ve removed the CLAUDE.md file due to potential risks ... IP and licensing issues,

Is there anywhere these concerns are posted? Can you share a link?

i get the vendor-neutral aspect but we also have to grok the fact that only like 3-4 people in Cassandra don't use Claude Code; we are crazy people; and most new contributors are also likely using Claude Code.

maoling · 2026-04-28T09:20:40Z

Applied @dcapwell's scripts via git and added him as Co-authored-by.
Regarding the legality of CLAUDE.md, I might be over-concerned. According to the ASF guideline (https://www.apache.org/legal/generative-tooling.html), contributors may use any tools as long as they follow the guidance in that document. Additionally, many ASF projects already use CLAUDE.md, such as Apache Airflow, Apache Superset, Apache Spark, and Apache Struts. Therefore, I have reintroduced CLAUDE.md.

smiklosovic · 2026-04-30T12:31:02Z

@jmckenzie-dev @rustyrazorblade @netudima anything else you would like to see in the first iteration of this? cc @dcapwell

dcapwell · 2026-04-30T19:05:43Z

+- Java 11 (default), 17, 21.
+- Python 3 for `cqlsh` and dtests.
+- Apache Ant >= 1.10 for all builds. Do NOT attempt to use Maven, Gradle, or any other build tool. Cassandra uses Ant exclusively.
+- Do NOT attempt to install dependencies, you do not have Internet access


> , you do not have Internet access

should remove this. Agents work best when they can fetch docs... so saying it can't do that will degrade the experience.

`Do NOT attempt to install dependencies` should be fine, or you could expand it if you want to `Do NOT attempt to install dependencies, every dependency requires OSS community approval first`

dcapwell · 2026-04-30T19:06:24Z

+
+- Java 11 (default), 17, 21.
+- Python 3 for `cqlsh` and dtests.
+- Apache Ant >= 1.10 for all builds. Do NOT attempt to use Maven, Gradle, or any other build tool. Cassandra uses Ant exclusively.


wonder if this will confuse claude to think to use ant rather than the scripts... will see i guess =D

dcapwell · 2026-04-30T19:07:48Z

+    .build/sh/ai-ci-test org.apache.cassandra.service.StorageServiceServerTest
+    ```
+
+- `ai-ci-test` does NOT support method-level filtering — it runs the entire test class.


FYI; i have lived with this for years... telling claude this does not stop it from trying 80% of the time... 'tis a pain... but it's a limitation with ant! In #4778 we could fix it as i made sure gradle supports method level

nothing to change in this patch, its just something i know opus still struggles with
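Since instructions alone apparently do not stop method-level attempts, one option is for the script itself to reject them with a clear message. The guard below is a hypothetical sketch, not part of `ai-ci-test`; the function name and the accepted pattern (exactly one dotted Java class name) are assumptions.

```bash
# Hypothetical guard: accept exactly one fully qualified class name and reject
# anything that looks like a method filter (Class#method, extra -D flags, ...).
require_class_name() {
    local re='^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)+$'
    if [ "$#" -ne 1 ] || ! printf '%s\n' "$1" | grep -Eq "$re"; then
        echo "pass exactly one fully qualified test class; method-level filtering is not supported" >&2
        return 2
    fi
}
```

Failing fast with an explicit message gives the agent a deterministic signal to retry with the class name only, rather than leaving it to interpret an Ant error.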

dcapwell · 2026-04-30T19:09:44Z

+`.build/sh/ai-build` includes checkstyle validation. There is no need to run checkstyle separately.
+
+## Code Style
+Cassandra enforces style via Checkstyle (run via `.build/sh/ai-build`). The official style guide is at https://cassandra.apache.org/_/development/code_style.html. Always defer to it when in doubt.


i wonder if claude will still fetch this given above we said there isn't network... i have this in my agent and its almost never fetched...

dcapwell · 2026-04-30T19:12:30Z

+
+## Git Workflow
+- Do NOT commit unless explicitly asked.
+- Commit messages format. For example:


I'm still against this but will defer to others. How will claude know who the reviewer is? IMO this is a Cassandra committer's concern and not something agents should worry about; just more chances for failure

Assuming it can read the JIRA, it should be able to get both the feedback there, as well as the PR. I do that now with my projects.

> Assuming it can read the JIRA

We don't ship that capability yet. I have it in my commit process; it's a simple API call that doesn't need auth, but there are tricky bits as the additional authors field isn't a standard field, so it's very easy for agents to get this wrong.

Again, I am 100% against asking contributors to care about our commit message; this is a committer's issue to me (not every member agrees with my stance btw; it's not a settled debate).

Also, agent files are for 100% of sessions, and this is needed after all review is done and you are ready to merge; aka it's not needed for 100% of sessions so it can cause issues.

dcapwell

I am ok with the file as is, and we can refine later on if we see issues. I'll leave it to @jmckenzie-dev and @dk2k how they feel about all existing comments.

+1

rustyrazorblade · 2026-04-30T19:15:30Z

> @jmckenzie-dev @rustyrazorblade @netudima anything else you would like to see in the first iteration of this? cc @dcapwell

Thanks for checking. I took a brief scan and I think @dcapwell has covered what I would have asked to address. It's a solid first pass and I expect to treat this as a living document more than we normally would, so I'm not concerned with having it be absolutely perfect.

I think the biggest item that can be left to follow ups would be to improve the clarity around the code structure and the relationship between different core classes and their responsibilities. That can just as easily be a follow up.

dk2k · 2026-05-02T17:18:15Z

I have no objections

dcapwell · 2026-05-02T22:57:05Z

Just bringing visibility. @driftx posted https://issues.apache.org/jira/browse/CASSANDRA-21301?focusedCommentId=18073143&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-18073143 with a -1 to having top level AGENTS.md and CLAUDE.md; so we cannot merge until we resolve this feedback.

driftx · 2026-05-05T20:09:54Z

At the time, I don't think AGENTS.md was on the table. That said I'll amend to -0, but do we really need both?


maoling · 2026-05-06T04:01:32Z

Latest updates: Remove internet restriction and simplify commit message format
A few points to answer reviewers:
- Modern AI models are becoming increasingly powerful and can easily understand project structure. We may further improve this in the future.
- Both AGENTS.md and CLAUDE.md are currently needed. Claude Code does not yet support AGENTS.md, and it is one of the mainstream AI coding tools.

Here are some examples from other ASF projects, which suggest we are on the right track :)

Apache Spark:
https://github.com/apache/spark/blob/master/AGENTS.md
https://github.com/apache/spark/blob/master/CLAUDE.md

Apache Iceberg:
https://github.com/apache/iceberg/blob/main/AGENTS.md

Apache Airflow:
https://github.com/apache/airflow/blob/main/AGENTS.md
https://github.com/apache/airflow/blob/main/CLAUDE.md

Apache Superset:
https://github.com/apache/superset/blob/master/CLAUDE.md
https://github.com/apache/superset/blob/master/AGENTS.md

Apache Struts:
https://github.com/apache/struts/blob/main/CLAUDE.md

driftx · 2026-05-06T10:59:40Z

> Claude Code does not yet support AGENTS.md

That's disappointing, but alright.

> examples from other ASF projects, which suggest we are on the right track

The inconsistency there suggests that perhaps nobody really knows what the correct track is yet, and I do think we'll see friction from this, but I am -0.


maoling force-pushed the CASSANDRA-21301 branch from 7318476 to 0e37a6e Compare April 20, 2026 09:15

maoling changed the title ~~CASSANDRA-21301: add AGENTS.md and CLAUDE.md~~ CASSANDRA-21301: add AGENTS.md for AI coding Apr 20, 2026

maoling force-pushed the CASSANDRA-21301 branch from 0e37a6e to 723b0af Compare April 28, 2026 09:17

maoling changed the title ~~CASSANDRA-21301: add AGENTS.md for AI coding~~ CASSANDRA-21301: add AGENTS.md and CLAUDE.md Apr 28, 2026


maoling and others added 2 commits May 6, 2026 11:13

CASSANDRA-21301: add AGENTS.md and CLAUDE.md

246e353

Co-authored-by: David Capwell <dcapwell@gmail.com>

Remove internet restriction and simplify commit message format

6208aef

maoling force-pushed the CASSANDRA-21301 branch from 723b0af to 6208aef Compare May 6, 2026 03:56
