refactor recursive comment grammar rules with external scanner #8
joshuadavidthomas wants to merge 1 commit into interdependence:main
Interesting. It's been a while since I touched this project, but I specifically designed the original implementation to avoid having to write a scanner. If implemented for comments, I'm thinking it might also be a good idea to consider using a similar technique for paired statements in general. I will test this out. Just to clarify, this fixed your Zed extension issues?
Hey, apologies for how long it took me to get back to this! To answer your question: yes, this fixed the crashes I was hitting with Zed. I ended up releasing the Zed extension pointing at my fork in the meantime, but I'd love to point it back at upstream if/when this lands. I wasn't aware of it when I was working on this, but there was a separate bug in tree-sitter's WASM loader (tree-sitter/tree-sitter#4844) that was also causing crashes with this grammar (zed-industries/zed#29827). Based on my reading of the two issues, that bug seems unrelated to the recursion issue I ran into, though I haven't dug in deep enough to confirm that. Re: extending the scanner approach to paired statements in general, that sounds like a good idea to me. The scanner is already there, so the incremental cost is low. The paired statements don't appear to have the same recursive pattern issue, so it'd be more of a consistency thing, right? Happy to help out with that if it'd be useful. And thanks for this project, it has made my Neovim and now Zed experience much better!
I've been working on a new language extension for the Zed editor for Django templates. I tried using this tree-sitter grammar, but kept running into crashes. The logs from the editor were no help, but after some fumbling around I narrowed it down to the comment rules.
Both `unpaired_comment` and `paired_comment` use recursive patterns that I think are the cause of the issue, possibly because Zed extensions get compiled to WASM (though that's just a hunch, no concrete evidence that's the core issue).

The problematic patterns:

- `unpaired_comment`: `repeat(seq(alias($.unpaired_comment, ""), repeat(/.|\s/)))`
- `paired_comment`: `repeat(seq(alias($.paired_comment, ""), repeat(/.|\s/)))`

To fix this, I made two changes, one small to unpaired comments and one large to paired comments.

For unpaired comments, I changed to a simple `token()` pattern: Django just ignores everything between `{#` and `#}`, so no recursion is needed.

For paired comments, I added an external C scanner inspired by tree-sitter-liquid, but took a different approach to preserve the original parsing behavior. The scanner uses depth tracking to find the balanced closing `{% endcomment %}`, incrementing depth when it sees nested `{% comment %}` tags and decrementing for `{% endcomment %}`. This maintains the exact same tree structure as the original grammar (single comment node), just without the recursive patterns that caused crashes.
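
For readers who want to see the depth-tracking idea in isolation, here is a minimal standalone sketch in C. It is not the scanner from this PR: it walks a plain NUL-terminated string instead of using tree-sitter's lexer interface, and the names `match_tag` and `find_comment_end` are hypothetical, chosen only for illustration. It also assumes a tag ends at the first `%}` after the tag name, which is a simplification.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: if `s` begins with a tag of the form `{% name ... %}`,
 * return the length of that tag in bytes; otherwise return 0.
 * Assumption: the tag ends at the first "%}" after the name. */
static size_t match_tag(const char *s, const char *name) {
    const char *p = s;
    if (p[0] != '{' || p[1] != '%') return 0;
    p += 2;
    while (isspace((unsigned char)*p)) p++;

    size_t n = strlen(name);
    if (strncmp(p, name, n) != 0) return 0;
    p += n;

    /* Require a word boundary so "comment" does not match "commentary". */
    if (!isspace((unsigned char)*p) && *p != '%') return 0;

    const char *close = strstr(p, "%}");
    if (close == NULL) return 0;
    return (size_t)(close + 2 - s);
}

/* Given `body` pointing just past an opening `{% comment %}` tag, return the
 * byte offset of the balanced closing `{% endcomment %}`, or -1 if the
 * comment is never closed. Nested `{% comment %}` tags raise the depth and
 * each `{% endcomment %}` lowers it; depth 0 marks the matching close. */
static long find_comment_end(const char *body) {
    int depth = 1; /* we are already inside one open comment */
    size_t i = 0;

    while (body[i] != '\0') {
        size_t len;
        if ((len = match_tag(body + i, "comment")) > 0) {
            depth++;
            i += len;
        } else if ((len = match_tag(body + i, "endcomment")) > 0) {
            depth--;
            if (depth == 0) return (long)i;
            i += len;
        } else {
            i++; /* ordinary comment content, skip one byte */
        }
    }
    return -1; /* unterminated comment */
}
```

Because the loop is iterative and only keeps a single integer of state, the nesting depth of the input never translates into recursion depth in the parser. For the input `x{% comment %}y{% endcomment %}z{% endcomment %}`, this sketch skips the inner pair and returns offset 32, the start of the outer closing tag.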