diff --git a/AGENTS.md b/AGENTS.md index 8e7c1c7..813d0c3 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -46,6 +46,18 @@ wrong results. The LSP server checks this on startup and triggers a forced rebui | `ExtractAliases` | LSP handlers use `ExtractAliasesInScope(text, lineNum)` — scope-aware. Only `Completion` and `CodeAction` use the unscoped `ExtractAliases` intentionally | | Any new store query | Add an index if the query will run on hot paths (definition, hover, references) | +## Token walking + +Use `parser.TokenWalker` for new code that iterates over tokens. It provides: +- Consistent depth tracking with automatic clamping (never goes negative) +- Forward progress guarantees via `EnsureProgress()` +- Module/function detection helpers +- Statement boundary handling + +See `internal/parser/token_walker.go` for the API and `token_walk_test.go` for examples. + +The consistency tests in `elixir_test.go` (`TestModuleScopeConsistency`, `TestDepthTrackingConsistency`) verify that similar functions handle edge cases the same way. Add new edge cases there when fixing scope-related bugs. + ## Common gotchas - **Nested modules**: `defmodule Inner do` inside `Outer` creates `Outer.Inner` in the store but the raw line only says `Inner`. Use `LookupModulesInFile` / `LookupEnclosingModule` rather than regex-scanning lines for module names. diff --git a/docs/architecture.md b/docs/architecture.md index 990dfaa..00e2610 100644 --- a/docs/architecture.md +++ b/docs/architecture.md @@ -5,9 +5,9 @@ Dexter is a fast Elixir LSP server. It indexes module and function definitions f ## Module structure - `cmd/main.go` — CLI entrypoint: `init`, `reindex`, `lookup`, `lsp` subcommands -- `internal/parser/` — Regex-based Elixir parser. Extracts defmodule, def, defp, defmacro, defdelegate, defguard, defprotocol, defimpl, @type, @callback. Handles heredocs, module nesting, alias resolution for defdelegate targets. 
+- `internal/parser/` — Elixir parser backed by a hand-rolled tokenizer (`tokenizer.go`). The tokenizer produces a flat token stream (handling heredocs, sigils, strings, comments as opaque tokens) and `parser_tokenized.go` walks it to extract defmodule, def, defp, defmacro, defdelegate, defguard, defprotocol, defimpl, @type, @callback, alias, import, use, and Module.function references. Handles module nesting, alias resolution for defdelegate targets, and multi-line expressions natively via bracket depth tracking. - `internal/store/` — SQLite layer. Tables: `files` (path + mtime), `definitions` (module, function, kind, line, file_path, delegate_to, delegate_as), `refs` (module, function, line, file_path, kind). -- `internal/lsp/` — LSP server. `server.go` handles all LSP methods. `elixir.go` contains pure functions for cursor expression extraction, alias/import resolution, use-chain parsing. `rename.go` has rename helpers. `hover.go` has hover formatting. `documents.go` is an in-memory open-buffer store. +- `internal/lsp/` — LSP server. `server.go` handles all LSP methods. `elixir.go` contains pure functions for cursor expression extraction, alias/import/use extraction (tokenizer-based), and use-chain parsing. `rename.go` has rename helpers. `hover.go` has hover formatting. `documents.go` is an in-memory open-buffer store. - `internal/treesitter/` — Tree-sitter integration for scope-aware variable rename and go-to-references. ## LSP feature map @@ -45,7 +45,7 @@ The `__using__` cache (`usingCacheEntry`) stores the parsed result of each modul - **`transUses`** — `use Mod` inside the body (double-use chains); also a heuristic for `Keyword.put_new/put` - **`optBindings`** — dynamic `import unquote(var)` where `var` comes from `Keyword.get(opts, :key, Default)`; stores `{optKey, defaultMod, kind}` so consumer opts override the default -`parseUsingBody` handles three forms: +`parseUsingBody` uses the tokenizer to walk the `__using__` body directly on the token stream. 
This avoids line-joining heuristics and correctly handles heredocs in moduledocs. (Line-based joining previously caused a regression: `bracketDepth` treated `#` inside markdown links as comments, cascading into file-wide line merges.) It handles three forms: - `defmacro __using__` — standard form - `using opts do` — ExUnit.CaseTemplate form (only when `use ExUnit.CaseTemplate` is present) - Function delegation — when the body calls a local helper like `using_block(opts)`, `parseHelperQuoteBlock` finds the function definition and parses its `quote do` body @@ -84,11 +84,10 @@ Call sites are attributed to the **injecting module** in the store (not the defi ## Key design decisions -- **Regex instead of tree-sitter for indexing** — 7.5x faster per file. Tree-sitter is only used when necessary in a - file already opened by the editor. +- **Tokenizer instead of tree-sitter for indexing** — a hand-rolled tokenizer + walker replaced the original regex-based parser for both file indexing and runtime `__using__` parsing. The tokenizer treats heredocs, sigils, strings, and comments as opaque tokens and tracks multi-line expressions via bracket depth, eliminating fragile line-joining heuristics. Tree-sitter is only used for scope-aware variable operations in files already opened by the editor. - **SQLite for storage** — single file, fast reads, incremental updates via mtime tracking. - **Parallel indexing** — `init` uses all CPU cores for parsing, single writer for SQLite. -- **Delegate following** — `defdelegate` targets are resolved at index time (including alias resolution and `as:` renames). +- **Delegate following** — `defdelegate` targets are resolved at index time (including alias resolution and `as:` renames). `LookupFollowDelegate` follows chains recursively (up to 5 hops) so `A → B → C` resolves to `C`. - **Git HEAD polling** — watches `.git/HEAD` mtime every 2 seconds to detect branch switches and trigger reindex.
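The delegate-following decision above can be illustrated with a minimal standalone sketch. The toy `def` map and the `followDelegate` helper below are hypothetical stand-ins for the SQLite `definitions` table (its `delegate_to` column) and `LookupFollowDelegate`, showing only the hop-capped chain resolution:

```go
package main

import "fmt"

// def is a toy stand-in for a row in the definitions table.
type def struct {
	delegateTo string // "" when the definition is not a defdelegate
}

// followDelegate resolves a delegate chain with a 5-hop cap, so
// A.f → B.f → C.f resolves to C.f and cycles cannot loop forever.
// Sketch only; the real lookup goes through SQLite.
func followDelegate(defs map[string]def, name string) string {
	for hops := 0; hops < 5; hops++ {
		d, ok := defs[name]
		if !ok || d.delegateTo == "" {
			return name
		}
		name = d.delegateTo
	}
	return name
}

func main() {
	defs := map[string]def{
		"A.f": {delegateTo: "B.f"},
		"B.f": {delegateTo: "C.f"},
		"C.f": {},
	}
	fmt.Println(followDelegate(defs, "A.f")) // prints "C.f"
}
```

The hop cap is what keeps accidental delegate cycles from hanging the lookup.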
- **Full document sync** — `TextDocumentSyncKindFull`; Elixir files are small enough that incremental sync adds complexity without benefit. - **Index versioning** — `IndexVersion` in `internal/version/version.go`. Mismatch on startup triggers a forced rebuild. Bump when parser or schema changes would invalidate existing indexes. diff --git a/internal/lsp/documents.go b/internal/lsp/documents.go index 5044738..f168fa6 100644 --- a/internal/lsp/documents.go +++ b/internal/lsp/documents.go @@ -5,12 +5,17 @@ import ( tree_sitter "github.com/tree-sitter/go-tree-sitter" tree_sitter_elixir "github.com/tree-sitter/tree-sitter-elixir/bindings/go" + + "github.com/remoteoss/dexter/internal/parser" ) type cachedDoc struct { - text string - tree *tree_sitter.Tree - src []byte // source bytes the tree references — must stay alive + text string + tree *tree_sitter.Tree + src []byte // source bytes the tree references — must stay alive + tokens []parser.Token // cached tokenizer output + tokSrc []byte // source bytes for tokens + lineStarts []int // byte offset of each line start (from TokenizeFull) } // DocumentStore tracks the text content of open buffers and caches @@ -89,3 +94,51 @@ func (ds *DocumentStore) GetTree(uri string) (*tree_sitter.Tree, []byte, bool) { } return doc.tree, doc.src, true } + +// GetTokens returns cached tokenizer output and source bytes for the given URI. +// Tokenizes on first access and caches the result. The cache is invalidated on +// the next Set() call. 
+func (ds *DocumentStore) GetTokens(uri string) ([]parser.Token, []byte, bool) { + ds.mu.Lock() + defer ds.mu.Unlock() + doc, ok := ds.docs[uri] + if !ok { + return nil, nil, false + } + if doc.tokens == nil { + doc.tokSrc = []byte(doc.text) + result := parser.TokenizeFull(doc.tokSrc) + doc.tokens = result.Tokens + doc.lineStarts = result.LineStarts + } + return doc.tokens, doc.tokSrc, true +} + +// GetTokensFull returns cached tokenizer output including line starts for +// efficient (line, col) → byte offset conversion. +func (ds *DocumentStore) GetTokensFull(uri string) ([]parser.Token, []byte, []int, bool) { + ds.mu.Lock() + defer ds.mu.Unlock() + doc, ok := ds.docs[uri] + if !ok { + return nil, nil, nil, false + } + if doc.tokens == nil { + doc.tokSrc = []byte(doc.text) + result := parser.TokenizeFull(doc.tokSrc) + doc.tokens = result.Tokens + doc.lineStarts = result.LineStarts + } + return doc.tokens, doc.tokSrc, doc.lineStarts, true +} + +// GetTokenizedFile returns a cached TokenizedFile for the given URI, or nil +// if the document is not tracked. This is the preferred way to get a +// TokenizedFile from the document store. +func (ds *DocumentStore) GetTokenizedFile(uri string) *TokenizedFile { + tokens, src, lineStarts, ok := ds.GetTokensFull(uri) + if !ok { + return nil + } + return NewTokenizedFileFromCache(tokens, src, lineStarts) +} diff --git a/internal/lsp/elixir.go b/internal/lsp/elixir.go index 214f062..20d9200 100644 --- a/internal/lsp/elixir.go +++ b/internal/lsp/elixir.go @@ -1,7 +1,7 @@ package lsp import ( - "regexp" + "slices" "strconv" "strings" "unicode" @@ -9,93 +9,324 @@ import ( "github.com/remoteoss/dexter/internal/parser" ) +// TokenizedFile holds pre-tokenized source for efficient multi-operation queries. +// Use this when multiple tokenizer-based lookups will be performed on the same text. 
+type TokenizedFile struct { + source []byte + tokens []parser.Token + n int + lineStarts []int +} + +// NewTokenizedFile tokenizes the text once for reuse across multiple queries. +func NewTokenizedFile(text string) *TokenizedFile { + source := []byte(text) + result := parser.TokenizeFull(source) + return &TokenizedFile{ + source: source, + tokens: result.Tokens, + n: len(result.Tokens), + lineStarts: result.LineStarts, + } +} + +// NewTokenizedFileFromCache wraps pre-existing tokens (e.g. from DocumentStore cache). +func NewTokenizedFileFromCache(tokens []parser.Token, source []byte, lineStarts []int) *TokenizedFile { + return &TokenizedFile{ + source: source, + tokens: tokens, + n: len(tokens), + lineStarts: lineStarts, + } +} + +// ExpressionAtCursor extracts the dotted expression at the given 0-based line +// and col, using the cached token stream. +func (tf *TokenizedFile) ExpressionAtCursor(line, col int) CursorContext { + return ExpressionAtCursor(tf.tokens, tf.source, tf.lineStarts, line, col) +} + +// FullExpressionAtCursor extracts the complete dotted expression at the given +// 0-based line and col without truncating at the cursor's segment. +func (tf *TokenizedFile) FullExpressionAtCursor(line, col int) CursorContext { + return FullExpressionAtCursor(tf.tokens, tf.source, tf.lineStarts, line, col) +} + +// FirstDefmodule returns the first defmodule name found, or "". +func (tf *TokenizedFile) FirstDefmodule() string { + for i := 0; i < tf.n; i++ { + if tf.tokens[i].Kind == parser.TokDefmodule { + j := tokNextSig(tf.tokens, tf.n, i+1) + name, _ := tokCollectModuleName(tf.source, tf.tokens, tf.n, j) + if name != "" { + return name + } + } + } + return "" +} + +// ResolveModuleExpr replaces __MODULE__ in expr with the enclosing module name +// at the given 0-based line. If targetLine < 0, uses the first defmodule found. 
+func (tf *TokenizedFile) ResolveModuleExpr(expr string, targetLine int) string { + if !strings.Contains(expr, "__MODULE__") { + return expr + } + + var moduleName string + if targetLine >= 0 { + moduleName = extractEnclosingModuleFromTokens(tf.source, tf.tokens, targetLine) + } + if moduleName == "" { + moduleName = tf.FirstDefmodule() + } + + if moduleName != "" { + return strings.ReplaceAll(expr, "__MODULE__", moduleName) + } + return expr +} + +// FindFunctionDefinition searches for a def/defp/defmacro/defmacrop/defguard/defguardp/defdelegate +// definition or a @type/@typep/@opaque declaration matching the given function name. +// Returns the 1-based line number and true if found. +func (tf *TokenizedFile) FindFunctionDefinition(functionName string) (int, bool) { + for i := 0; i < tf.n; i++ { + tok := tf.tokens[i] + + switch tok.Kind { + case parser.TokDef, parser.TokDefp, parser.TokDefmacro, parser.TokDefmacrop, + parser.TokDefguard, parser.TokDefguardp, parser.TokDefdelegate: + j := tokNextSig(tf.tokens, tf.n, i+1) + if j >= tf.n || tf.tokens[j].Kind != parser.TokIdent { + continue + } + if parser.TokenText(tf.source, tf.tokens[j]) == functionName { + return tok.Line, true + } + + case parser.TokAttrType: + j := tokNextSig(tf.tokens, tf.n, i+1) + if j >= tf.n || tf.tokens[j].Kind != parser.TokIdent { + continue + } + if parser.TokenText(tf.source, tf.tokens[j]) == functionName { + return tok.Line, true + } + } + } + return 0, false +} + +// ExtractAliasesInScope parses alias declarations visible at the given 0-based line. +func (tf *TokenizedFile) ExtractAliasesInScope(targetLine int) map[string]string { + return extractAliasesFromTokens(tf.source, tf.tokens, targetLine) +} + +// ExtractImports returns all import declarations from the tokenized file.
+func (tf *TokenizedFile) ExtractImports() []string { + var imports []string + for i := 0; i < tf.n; i++ { + if tf.tokens[i].Kind == parser.TokImport { + j := tokNextSig(tf.tokens, tf.n, i+1) + mod, _ := tokCollectModuleName(tf.source, tf.tokens, tf.n, j) + if mod != "" { + imports = append(imports, mod) + } + } + } + return imports +} + func isExprChar(b byte) bool { c := rune(b) return unicode.IsLetter(c) || unicode.IsDigit(c) || c == '_' || c == '.' || c == '?' || c == '!' } -// ExtractExpression returns the dotted expression up to and including the -// segment the cursor is on. Line is the text content, col is a 0-based -// character offset. +// CursorContext holds the result of token-based expression extraction at a +// cursor position. It replaces the combination of ExtractExpression + +// ExtractModuleAndFunction with a single token-aware lookup. +type CursorContext struct { + // ModuleRef is the dot-joined module chain (e.g. "Foo.Bar"). Empty for + // bare function calls. + ModuleRef string + // FunctionName is the lowercase identifier (e.g. "baz"). Empty for + // module-only expressions like "Foo.Bar". + FunctionName string + // ExprStart is the 0-based column of the expression start on its line. + ExprStart int + // ExprEnd is the 0-based column one past the end of the expression. + ExprEnd int +} + +// Expr returns the combined dotted expression string (e.g. "Foo.Bar.baz"). +func (c CursorContext) Expr() string { + if c.ModuleRef == "" && c.FunctionName == "" { + return "" + } + if c.ModuleRef == "" { + return c.FunctionName + } + if c.FunctionName == "" { + return c.ModuleRef + } + return c.ModuleRef + "." + c.FunctionName +} + +// Empty returns true if no expression was found at the cursor. +func (c CursorContext) Empty() bool { + return c.ModuleRef == "" && c.FunctionName == "" +} + +// isExprToken returns true for token kinds that can be part of a dotted +// expression chain (Module.function). 
+func isExprToken(k parser.TokenKind) bool { + return k == parser.TokModule || k == parser.TokIdent +} + +// ExpressionAtCursor extracts the dotted expression at the cursor position +// using the token stream. Unlike the char-based ExtractExpression, this +// correctly ignores expressions inside strings, comments, heredocs, sigils, +// and atoms. // -// Examples (cursor position marked with |): +// The returned expression is truncated to the cursor's segment (matching +// ExtractExpression behavior): cursor on "Foo" in "Foo.Bar.baz" returns +// only "Foo" as the module ref. // -// "MyApp.Re|po.all" → "MyApp.Repo" -// "MyApp.Repo.a|ll" → "MyApp.Repo.all" -// "Ti|ger.Repo.all" → "MyApp" -// "MyApp|.Repo.all" → "MyApp.Repo" (cursor on dot → include next segment) -func ExtractExpression(line string, col int) string { - expr, _ := extractExpressionBounds(line, col) - return expr +// line and col are 0-based. +func ExpressionAtCursor(tokens []parser.Token, source []byte, lineStarts []int, line, col int) CursorContext { + return expressionAtCursorImpl(tokens, source, lineStarts, line, col, false) +} + +// FullExpressionAtCursor is like ExpressionAtCursor but returns the complete +// dotted chain without truncating at the cursor's segment. +func FullExpressionAtCursor(tokens []parser.Token, source []byte, lineStarts []int, line, col int) CursorContext { + return expressionAtCursorImpl(tokens, source, lineStarts, line, col, true) } -// extractExpressionBounds returns the same expression as ExtractExpression plus -// the start column (0-based) of that expression within the line. Returns ("", 0) -// when there is no expression at the cursor position. 
-func extractExpressionBounds(line string, col int) (expr string, startCol int) { - if len(line) == 0 { - return "", 0 +func expressionAtCursorImpl(tokens []parser.Token, source []byte, lineStarts []int, line, col int, full bool) CursorContext { + offset := parser.LineColToOffset(lineStarts, line, col) + if offset < 0 { + return CursorContext{} } - if col >= len(line) { - col = len(line) - 1 + + n := len(tokens) + idx := parser.TokenAtOffset(tokens, offset) + + // If cursor lands between tokens, try the token just before (handles + // cursor immediately after an identifier with no gap). + if idx < 0 { + idx = parser.TokenAtOffset(tokens, offset-1) + if idx < 0 { + return CursorContext{} + } } - if col < 0 { - col = 0 + + tok := tokens[idx] + + // Cursor on a dot: advance to the next significant token so we include + // the segment after the dot (matching old behavior: cursor on dot → + // include next segment). + if tok.Kind == parser.TokDot { + next := idx + 1 + if next < n && isExprToken(tokens[next].Kind) { + idx = next + tok = tokens[idx] + } else { + return CursorContext{} + } } - if !isExprChar(line[col]) { - return "", 0 + + // Reject non-expression tokens (strings, comments, atoms, etc.) 
+ if !isExprToken(tok.Kind) { + return CursorContext{} } - start := col - for start > 0 && isExprChar(line[start-1]) { - start-- + // cursorIdx is the token the cursor is physically on — used for truncation + cursorIdx := idx + + // Walk backward through Dot+Module/Ident chains to find the start + startIdx := idx + for startIdx >= 2 { + dotIdx := startIdx - 1 + prevIdx := startIdx - 2 + if tokens[dotIdx].Kind == parser.TokDot && isExprToken(tokens[prevIdx].Kind) { + startIdx = prevIdx + } else { + break + } } - end := col - for end+1 < len(line) && isExprChar(line[end+1]) { - end++ + + // Walk forward through Dot+Module/Ident chains to find the end + endIdx := idx + for endIdx+2 < n { + dotIdx := endIdx + 1 + nextIdx := endIdx + 2 + if tokens[dotIdx].Kind == parser.TokDot && isExprToken(tokens[nextIdx].Kind) { + endIdx = nextIdx + } else { + break + } } - fullExpr := line[start : end+1] - cursorOffset := col - start - searchFrom := cursorOffset - if fullExpr[searchFrom] == '.' { - searchFrom++ + // Determine truncation point: include up to the cursor's segment + truncEnd := endIdx + if !full { + truncEnd = cursorIdx } - nextDot := strings.IndexByte(fullExpr[searchFrom:], '.') - if nextDot == -1 { - return fullExpr, start + + // Build module ref and function name from the token chain + lineStart := 0 + if line < len(lineStarts) { + lineStart = lineStarts[line] } - return fullExpr[:searchFrom+nextDot], start -} -// ExtractFullExpression returns the complete dotted expression at the cursor -// position without truncating at the cursor's segment. Unlike ExtractExpression -// which returns "DocuSign" when the cursor is on "DocuSign.Client.request", -// this returns the entire "DocuSign.Client.request". 
-func ExtractFullExpression(line string, col int) string { - if len(line) == 0 || col < 0 { - return "" + var moduleParts []string + functionName := "" + + for ti := startIdx; ti <= truncEnd; ti += 2 { + t := tokens[ti] + text := parser.TokenText(source, t) + if t.Kind == parser.TokModule { + moduleParts = append(moduleParts, text) + } else { + // TokIdent — this is the function name; stop here + functionName = text + break + } } - if col >= len(line) { - col = len(line) - 1 + + moduleRef := "" + if len(moduleParts) > 0 { + moduleRef = strings.Join(moduleParts, ".") } - if !isExprChar(line[col]) { - return "" + + exprStart := tokens[startIdx].Start - lineStart + lastTok := tokens[truncEnd] + if functionName != "" { + // Find the ident token for end position + for ti := startIdx; ti <= truncEnd; ti += 2 { + if tokens[ti].Kind == parser.TokIdent { + lastTok = tokens[ti] + break + } + } } - // Reuse the same boundary scan as extractExpressionBounds, but pass col - // at the end of the expression so no truncation occurs. - end := col - for end+1 < len(line) && isExprChar(line[end+1]) { - end++ + exprEnd := lastTok.End - lineStart + + return CursorContext{ + ModuleRef: moduleRef, + FunctionName: functionName, + ExprStart: exprStart, + ExprEnd: exprEnd, } - expr, _ := extractExpressionBounds(line, end) - return expr } // ExtractModuleAndFunction splits a dotted expression into module reference and optional function name. // Uppercase-starting parts are module segments, the first lowercase part is the function. // Returns ("Foo.Bar", "baz") for "Foo.Bar.baz", ("Foo.Bar.Baz", "") for "Foo.Bar.Baz", // ("", "do_something") for "do_something". +// +// Deprecated: Use ExpressionAtCursor which returns ModuleRef and FunctionName directly. 
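The splitting rule documented for `ExtractModuleAndFunction` (uppercase-starting parts are module segments, the first lowercase part is the function) can be sketched in isolation. `splitModuleAndFunction` below is a hypothetical standalone port for illustration, not the dexter implementation:

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// splitModuleAndFunction splits a dotted expression: uppercase-starting
// segments accumulate into the module reference; the first segment that is
// not uppercase-starting is returned as the function name.
func splitModuleAndFunction(expr string) (string, string) {
	var moduleParts []string
	for _, part := range strings.Split(expr, ".") {
		if part != "" && unicode.IsUpper(rune(part[0])) {
			moduleParts = append(moduleParts, part)
			continue
		}
		return strings.Join(moduleParts, "."), part
	}
	return strings.Join(moduleParts, "."), ""
}

func main() {
	m, f := splitModuleAndFunction("Foo.Bar.baz")
	fmt.Printf("%q %q\n", m, f) // "Foo.Bar" "baz"
	m, f = splitModuleAndFunction("Foo.Bar.Baz")
	fmt.Printf("%q %q\n", m, f) // "Foo.Bar.Baz" ""
	m, f = splitModuleAndFunction("do_something")
	fmt.Printf("%q %q\n", m, f) // "" "do_something"
}
```

These three cases mirror the examples given in the doc comment.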
func ExtractModuleAndFunction(expr string) (moduleRef string, functionName string) { var moduleParts []string for _, part := range strings.Split(expr, ".") { @@ -148,6 +379,285 @@ func ExtractCompletionContext(line string, col int) (prefix string, afterDot boo return raw, false, start } +// ExtractAliasBlockParent detects whether the given 0-based line is inside +// a multi-line alias brace block (alias Parent.{ ... }). If so, it returns +// the resolved parent module name. This is used by the completion and hover +// handlers to resolve module names inside multi-line alias blocks. +func ExtractAliasBlockParent(lines []string, targetLine int) (string, bool) { + if targetLine < 0 || targetLine >= len(lines) { + return "", false + } + + // Quick pre-check: scan backward for an "alias ...{" line without a + // matching "}" on the same line. Pure string ops, no allocations in + // the fast path, so this is nearly free for the 99% of hover/definition + // requests that are not inside an alias block. + found := false + for i := targetLine; i >= 0; i-- { + trimmed := strings.TrimSpace(lines[i]) + if strings.HasPrefix(trimmed, "alias ") && strings.Contains(trimmed, "{") && !strings.Contains(trimmed, "}") { + found = true + break + } + // Any def/defp/defmodule means we've left the possible alias context. 
+ if strings.HasPrefix(trimmed, "def ") || strings.HasPrefix(trimmed, "defp ") || strings.HasPrefix(trimmed, "defmodule ") { + break + } + } + if !found { + return "", false + } + + // Use tokenizer for accurate parsing + source := []byte(strings.Join(lines, "\n")) + tokens := parser.Tokenize(source) + n := len(tokens) + + // targetLine is 0-based; token.Line is 1-based + targetLine1 := targetLine + 1 + + // Find the token position for the target line + targetIdx := 0 + for i, tok := range tokens { + if tok.Line >= targetLine1 { + targetIdx = i + break + } + } + + // Check if target line has only a closing brace (no module content) + hasModuleOnLine := false + hasCloseBraceOnLine := false + for i := targetIdx; i < n && tokens[i].Line == targetLine1; i++ { + if tokens[i].Kind == parser.TokModule { + hasModuleOnLine = true + } + if tokens[i].Kind == parser.TokCloseBrace { + hasCloseBraceOnLine = true + } + } + if hasCloseBraceOnLine && !hasModuleOnLine { + return "", false + } + + // Scan backward through tokens looking for "alias Parent.{" without matching "}" + for i := targetIdx; i >= 0; i-- { + tok := tokens[i] + + // If we see a closing brace before finding alias, we're not in an open block + if tok.Kind == parser.TokCloseBrace && tok.Line < targetLine1 { + return "", false + } + + if tok.Kind != parser.TokAlias { + continue + } + + // Found alias — collect the module name + j := tokNextSig(tokens, n, i+1) + modName, k := tokCollectModuleName(source, tokens, n, j) + if modName == "" { + return "", false + } + + // Check for ".{" after module name + if k >= n || tokens[k].Kind != parser.TokDot { + return "", false + } + k++ + if k >= n || tokens[k].Kind != parser.TokOpenBrace { + return "", false + } + openBraceLine := tokens[k].Line + + // Check that "}" is NOT on the same line as "{" + for m := k + 1; m < n; m++ { + if tokens[m].Line > openBraceLine { + break + } + if tokens[m].Kind == parser.TokCloseBrace { + if tokens[m].Line == openBraceLine { + return "", 
false // single-line alias block + } + break + } + } + + // We're inside a multi-line alias block — resolve the parent module + parent := modName + + // Resolve __MODULE__ using enclosing module from token stream + aliasLine := tok.Line - 1 // convert to 0-based + enclosingModule := extractEnclosingModuleFromTokens(source, tokens, aliasLine) + if enclosingModule != "" { + parent = strings.ReplaceAll(parent, "__MODULE__", enclosingModule) + } + if strings.Contains(parent, "__MODULE__") { + return "", false + } + return parent, true + } + + return "", false +} + +func tokParseModuleDef(source []byte, tokens []parser.Token, from int, currentModule string) (name string, nextPos int, hasDo bool) { + n := len(tokens) + j := tokNextSig(tokens, n, from) + name, k := tokCollectModuleName(source, tokens, n, j) + if name == "" { + return "", from, false + } + if !strings.Contains(name, ".") && currentModule != "" { + name = currentModule + "." + name + } + _, nextPos, hasDo = parser.ScanForwardToBlockDo(tokens, n, k) + return name, nextPos, hasDo +} + +// extractEnclosingModuleFromTokens finds the innermost defmodule enclosing the given 0-based line. 
+func extractEnclosingModuleFromTokens(source []byte, tokens []parser.Token, targetLine int) string { + n := len(tokens) + targetLine1 := targetLine + 1 + + type moduleFrame struct { + name string + depth int + } + var stack []moduleFrame + depth := 0 + + processModuleDef := func(i int) int { + currentModule := "" + if len(stack) > 0 { + currentModule = stack[len(stack)-1].name + } + name, nextPos, hasDo := tokParseModuleDef(source, tokens, i, currentModule) + if name == "" { + return i + } + if hasDo { + depth++ + stack = append(stack, moduleFrame{name, depth}) + } + return nextPos + } + + for i := 0; i < n; i++ { + tok := tokens[i] + if tok.Line > targetLine1 { + break + } + + switch tok.Kind { + case parser.TokDo, parser.TokFn: + parser.TrackBlockDepth(tok.Kind, &depth) + case parser.TokEnd: + prevDepth := depth + parser.TrackBlockDepth(tok.Kind, &depth) + if len(stack) > 0 && stack[len(stack)-1].depth == prevDepth { + stack = stack[:len(stack)-1] + } + case parser.TokDefmodule, parser.TokDefprotocol, parser.TokDefimpl: + i = processModuleDef(i+1) - 1 // -1: loop post-increment will advance to the returned position + continue + } + } + + if len(stack) > 0 { + return stack[len(stack)-1].name + } + return "" +} + +// IsDefmoduleLine returns true if the given 0-based line contains a defmodule +// keyword, and returns the module name being defined on that line. 
+func IsDefmoduleLine(text string, lineNum int) (string, bool) { + // Fast path: check if the line even contains "defmodule" + lines := strings.Split(text, "\n") + if lineNum < 0 || lineNum >= len(lines) { + return "", false + } + if !strings.Contains(lines[lineNum], "defmodule") { + return "", false + } + + // Tokenize just that line to extract the module name + source := []byte(lines[lineNum]) + tokens := parser.Tokenize(source) + n := len(tokens) + + for i := 0; i < n; i++ { + if tokens[i].Kind == parser.TokDefmodule { + j := tokNextSig(tokens, n, i+1) + name, _ := tokCollectModuleName(source, tokens, n, j) + if name != "" { + return name, true + } + } + } + return "", false +} + +// FindModuleAttributeDefinitionTokenized searches for the line where @attr_name +// is defined (assigned a value, not used). Returns the 1-based line number and +// true if found. Uses the tokenizer for accurate parsing. +func FindModuleAttributeDefinitionTokenized(text string, attrName string) (int, bool) { + if reservedModuleAttrs[attrName] { + return 0, false + } + + source := []byte(text) + tokens := parser.Tokenize(source) + n := len(tokens) + + for i := 0; i < n; i++ { + tok := tokens[i] + if tok.Kind != parser.TokAttr { + continue + } + + // TokAttr includes the @ prefix, so extract the name + attrText := parser.TokenText(source, tok) + if len(attrText) < 2 || attrText[0] != '@' { + continue + } + name := attrText[1:] + if name != attrName { + continue + } + + // Match only line-start attributes (equivalent to ^\s*@attr from + // the line-based parser), not references inside expressions. + atLineStart := true + for k := i - 1; k >= 0 && tokens[k].Kind != parser.TokEOL; k-- { + if tokens[k].Kind != parser.TokComment { + atLineStart = false + break + } + } + if !atLineStart { + continue + } + + // A definition needs a value token on the same line after @attr. 
+ j := i + 1 + for j < n && tokens[j].Kind == parser.TokComment { + j++ + } + if j >= n || tokens[j].Kind == parser.TokEOL || tokens[j].Line != tok.Line { + continue + } + // Skip invalid `@attr @other_attr` patterns. + if tokens[j].Kind == parser.TokAttr { + continue + } + + return tok.Line, true + } + return 0, false +} + // IsPipeContext returns true if the text before prefixStartCol on this line // contains a pipe operator (|>), meaning the first argument is supplied by the // pipe and should be omitted from the completion snippet. @@ -174,194 +684,233 @@ type BufferFunction struct { // Functions with default parameters emit one entry per callable arity. // Private types (@typep) are included since they are accessible within the same file. func FindBufferFunctions(text string) []BufferFunction { + source := []byte(text) + tokens := parser.Tokenize(source) + n := len(tokens) + seen := make(map[string]bool) var results []BufferFunction - for _, line := range strings.Split(text, "\n") { - if m := parser.FuncDefRe.FindStringSubmatch(line); m != nil { - name := m[2] - paramContent := parser.FindParamContent(line, name) - maxArity := parser.ArityFromParams(paramContent) - minArity := maxArity - parser.DefaultsFromParams(paramContent) - allParamNames := parser.ExtractParamNames(line, name) + + for i := 0; i < n; i++ { + tok := tokens[i] + + switch tok.Kind { + case parser.TokDef, parser.TokDefp, parser.TokDefmacro, parser.TokDefmacrop, + parser.TokDefguard, parser.TokDefguardp, parser.TokDefdelegate: + kind := parser.TokenText(source, tok) + j := tokNextSig(tokens, n, i+1) + if j >= n || tokens[j].Kind != parser.TokIdent { + continue + } + name := parser.TokenText(source, tokens[j]) + j++ + pj := tokNextSig(tokens, n, j) + maxArity := 0 + defaultCount := 0 + var paramNames []string + if pj < n && tokens[pj].Kind == parser.TokOpenParen { + maxArity, defaultCount, paramNames, _ = parser.CollectParams(source, tokens, n, pj) + paramNames = parser.FixParamNames(paramNames) 
+ } + minArity := maxArity - defaultCount for arity := minArity; arity <= maxArity; arity++ { key := name + "/" + strconv.Itoa(arity) if !seen[key] { seen[key] = true - results = append(results, BufferFunction{Name: name, Arity: arity, Kind: m[1], Params: parser.JoinParams(allParamNames, arity)}) + results = append(results, BufferFunction{ + Name: name, + Arity: arity, + Kind: kind, + Params: parser.JoinParams(paramNames, arity), + }) } } - } else if m := parser.TypeDefRe.FindStringSubmatch(line); m != nil { - name := m[2] - arity := parser.ExtractArity(line, name) + + case parser.TokAttrType: + attrText := parser.TokenText(source, tok) + kind := "type" + switch attrText { + case "@opaque": + kind = "opaque" + case "@typep": + kind = "typep" + } + j := tokNextSig(tokens, n, i+1) + if j >= n || tokens[j].Kind != parser.TokIdent { + continue + } + name := parser.TokenText(source, tokens[j]) + arity := 0 + pj := tokNextSig(tokens, n, j+1) + if pj < n && tokens[pj].Kind == parser.TokOpenParen { + arity, _, _, _ = parser.CollectParams(source, tokens, n, pj) + } key := name + "/" + strconv.Itoa(arity) if !seen[key] { seen[key] = true - results = append(results, BufferFunction{Name: name, Arity: arity, Kind: m[1]}) + results = append(results, BufferFunction{Name: name, Arity: arity, Kind: kind}) } } } return results } -var ( - aliasMultiRe = regexp.MustCompile(`^\s*alias\s+([A-Za-z0-9_.]+)\.{([^}]+)}`) - importRe = regexp.MustCompile(`^\s*import\s+([A-Za-z0-9_.]+)`) - useRe = regexp.MustCompile(`^\s*use\s+([A-Za-z0-9_.]+)`) - usingDefRe = regexp.MustCompile(`^\s*defmacro\s+__using__`) - moduleAttrDefRe = regexp.MustCompile(`^\s*@([a-z_][a-z0-9_]*)\s+[^@]`) - keywordModuleRe = regexp.MustCompile(`Keyword\.(?:put_new|put)\([^,]+,\s*:[a-z_]+,\s*([A-Z][A-Za-z0-9_.]+)\)`) - - // Dynamic opt-binding patterns inside __using__ bodies. 
- // var = Keyword.get/pop(opts, :key, Default) - varKeywordWithDefaultRe = regexp.MustCompile(`^\s*([a-z_][a-z0-9_]*)\s*=\s*Keyword\.(?:get|pop)\s*\([^,]+,\s*:([a-z_][a-z0-9_]*),\s*([A-Z][A-Za-z0-9_.]+)\)`) - // {var, _} = Keyword.pop(opts, :key, Default) — tuple destructuring - varTupleKeywordRe = regexp.MustCompile(`^\s*\{([a-z_][a-z0-9_]*),\s*[^}]+\}\s*=\s*Keyword\.pop\s*\([^,]+,\s*:([a-z_][a-z0-9_]*),\s*([A-Z][A-Za-z0-9_.]+)\)`) - // var = Keyword.fetch/fetch!/pop!/pop_lazy(opts, :key) — no parseable default - varKeywordNoDefaultRe = regexp.MustCompile(`^\s*([a-z_][a-z0-9_]*)\s*=\s*Keyword\.(?:fetch!?|pop!|pop_lazy)\s*\([^,]+,\s*:([a-z_][a-z0-9_]*)\b`) - // var = ModuleName (simple assignment to a capitalized module) - varSimpleModuleRe = regexp.MustCompile(`^\s*([a-z_][a-z0-9_]*)\s*=\s*([A-Z][A-Za-z0-9_.]+)\s*$`) - - // import/use with unquote(var) — captures the var name - importUnquoteRe = regexp.MustCompile(`^\s*import\s+unquote\(([a-z_][a-z0-9_]*)\)`) - useUnquoteRe = regexp.MustCompile(`^\s*use\s+unquote\(([a-z_][a-z0-9_]*)\)`) - - // ExUnit.CaseTemplate detection and `using` macro form - caseTemplateRe = regexp.MustCompile(`^\s*use\s+ExUnit\.CaseTemplate\b`) - caseTemplateUsingRe = regexp.MustCompile(`^\s*using\b`) - // quote do block inside a helper function - quoteDoRe = regexp.MustCompile(`^\s*quote\s+do\b`) - // bare function call: function_name( — used to detect delegation to a helper - bareCallRe = regexp.MustCompile(`^\s*([a-z_][a-z0-9_]*)\s*\(`) - - // use Module, key: Val, key2: Val2 — captures (module, opts_string) - useWithOptsRe = regexp.MustCompile(`^\s*use\s+([A-Za-z0-9_.]+)\s*,\s*(.+)$`) - // individual key: Module pairs in opts - optKeyModuleRe = regexp.MustCompile(`([a-z_][a-z0-9_]*):\s*([A-Z][A-Za-z0-9_.]+)`) -) - // ExtractAliases parses all alias declarations from document text. // Returns a map of short name -> full module name (not scope-aware). 
func ExtractAliases(text string) map[string]string { - return extractAliasesFromLines(strings.Split(text, "\n"), -1) + return extractAliasesFromText(text, -1) } // ExtractAliasesInScope parses alias declarations visible at the given 0-based // line. In Elixir, aliases are lexically scoped to the enclosing defmodule — // a nested module does NOT inherit its parent's aliases. func ExtractAliasesInScope(text string, targetLine int) map[string]string { - return extractAliasesFromLines(strings.Split(text, "\n"), targetLine) + return extractAliasesFromText(text, targetLine) } -// extractAliasesFromLines is the shared implementation. When targetLine >= 0, only -// aliases from the module scope enclosing that line are returned. -// Uses a single pass: collects all aliases keyed by their module scope, then -// returns only those matching the target line's scope. -func extractAliasesFromLines(lines []string, targetLine int) map[string]string { +// extractAliasesFromText is the shared implementation using the tokenizer. +// When targetLine >= 0, only aliases from the module scope enclosing that +// 0-based line are returned. Uses a single pass over the token stream. +func extractAliasesFromText(text string, targetLine int) map[string]string { + source := []byte(text) + tokens := parser.Tokenize(source) + return extractAliasesFromTokens(source, tokens, targetLine) +} + +// extractAliasesFromTokens is the implementation that works with pre-tokenized data. 
+func extractAliasesFromTokens(source []byte, tokens []parser.Token, targetLine int) map[string]string { + n := len(tokens) + type moduleFrame struct { name string - depth int // do..end nesting depth when this module was opened + depth int } var stack []moduleFrame depth := 0 - // Per-scope alias collection: module name → alias map - var allAliases []struct { - scope string - short string - full string + type aliasEntry struct { + scope, short, full string } + var allAliases []aliasEntry var targetModule string unscoped := targetLine < 0 - inHeredoc := false + // targetLine is 0-based; token.Line is 1-based + targetLine1 := targetLine + 1 - for i, line := range lines { - var skip bool - inHeredoc, skip = parser.CheckHeredoc(line, inHeredoc) - if skip { - // Still track targetLine so scope is correct for lines inside heredocs - if i == targetLine { - if len(stack) > 0 { - targetModule = stack[len(stack)-1].name - } - } - continue + currentModule := func() string { + if len(stack) > 0 { + return stack[len(stack)-1].name } + return "" + } - trimmed := strings.TrimSpace(line) - stripped := strings.TrimSpace(parser.StripCommentsAndStrings(trimmed)) - - // Track do..end nesting - if parser.IsEnd(stripped) { - if len(stack) > 0 && stack[len(stack)-1].depth == depth { - stack = stack[:len(stack)-1] - } - depth-- - if depth < 0 { - depth = 0 - } + resolve := func(s string) string { + cm := currentModule() + if cm != "" { + return strings.ReplaceAll(s, "__MODULE__", cm) } + return s + } - if parser.OpensBlock(stripped) { - depth++ + processModuleDef := func(i int) int { + name, nextPos, hasDo := tokParseModuleDef(source, tokens, i, currentModule()) + if name == "" { + return i } - - if m := parser.DefmoduleRe.FindStringSubmatch(trimmed); m != nil { - name := m[1] - if !strings.Contains(name, ".") && len(stack) > 0 { - name = stack[len(stack)-1].name + "." 
+ name - } + if hasDo { + depth++ stack = append(stack, moduleFrame{name, depth}) } + return nextPos + } - currentModule := "" - if len(stack) > 0 { - currentModule = stack[len(stack)-1].name - } + for i := 0; i < n; i++ { + tok := tokens[i] - if i == targetLine { - targetModule = currentModule + // Track target line's module scope (check before any depth changes) + if !unscoped && targetModule == "" && tok.Line >= targetLine1 { + targetModule = currentModule() } - resolve := func(s string) string { - if currentModule != "" { - return strings.ReplaceAll(s, "__MODULE__", currentModule) + switch tok.Kind { + case parser.TokDo, parser.TokFn: + parser.TrackBlockDepth(tok.Kind, &depth) + case parser.TokEnd: + prevDepth := depth + parser.TrackBlockDepth(tok.Kind, &depth) + if len(stack) > 0 && stack[len(stack)-1].depth == prevDepth { + stack = stack[:len(stack)-1] } - return s - } - // Collect alias declarations - if m := parser.AliasAsRe.FindStringSubmatch(line); m != nil { - resolved := resolve(m[1]) - if !strings.Contains(resolved, "__MODULE__") { - allAliases = append(allAliases, struct { - scope, short, full string - }{currentModule, m[2], resolved}) - } - } else if m := aliasMultiRe.FindStringSubmatch(line); m != nil { - base := resolve(m[1]) - if !strings.Contains(base, "__MODULE__") { - for _, name := range strings.Split(m[2], ",") { - name = strings.TrimSpace(name) - if len(name) > 0 && unicode.IsUpper(rune(name[0])) { - allAliases = append(allAliases, struct { - scope, short, full string - }{currentModule, name, base + "." 
+ name}) - } + case parser.TokDefmodule, parser.TokDefprotocol, parser.TokDefimpl: + i = processModuleDef(i+1) - 1 // -1: loop post-increment will advance to the returned position + continue + + case parser.TokAlias: + cm := currentModule() + j := tokNextSig(tokens, n, i+1) + modName, k := tokCollectModuleName(source, tokens, n, j) + if modName == "" { + continue + } + + // Multi-alias: alias Parent.{A, B, C} + if children, nextPos, ok := parser.ScanMultiAliasChildren(source, tokens, n, k, true); ok { + base := resolve(modName) + if strings.Contains(base, "__MODULE__") { + continue + } + for _, child := range children { + allAliases = append(allAliases, aliasEntry{cm, parser.AliasShortName(child), base + "." + child}) + } + i = nextPos - 1 + continue + } + + // Check for alias Module, as: Name + if asName, nextPos, ok := parser.ScanKeywordOptionValue(source, tokens, n, k, "as"); ok { + resolved := resolve(modName) + if !strings.Contains(resolved, "__MODULE__") { + allAliases = append(allAliases, aliasEntry{cm, asName, resolved}) } + i = nextPos - 1 + continue + } + + // Simple alias + resolved := resolve(modName) + if !strings.Contains(resolved, "__MODULE__") { + allAliases = append(allAliases, aliasEntry{cm, parser.AliasShortName(resolved), resolved}) } - } else if m := parser.AliasRe.FindStringSubmatch(line); m != nil { - fullMod := resolve(m[1]) - if !strings.Contains(fullMod, "__MODULE__") { - parts := strings.Split(fullMod, ".") - allAliases = append(allAliases, struct { - scope, short, full string - }{currentModule, parts[len(parts)-1], fullMod}) + i = k - 1 + + case parser.TokRequire: + cm := currentModule() + j := tokNextSig(tokens, n, i+1) + modName, k := tokCollectModuleName(source, tokens, n, j) + if modName == "" { + continue + } + + // Check for require Module, as: Name + if asName, nextPos, ok := parser.ScanKeywordOptionValue(source, tokens, n, k, "as"); ok { + resolved := resolve(modName) + if !strings.Contains(resolved, "__MODULE__") { + allAliases 
= append(allAliases, aliasEntry{cm, asName, resolved})
+ }
+ i = nextPos - 1
+ continue
}
+ i = k - 1
}
}

- // Filter by scope
+ // If targetLine was past all tokens, resolve its scope now
+ if !unscoped && targetModule == "" {
+ targetModule = currentModule()
+ }
+
+ // Filter collected aliases by the target scope
aliases := make(map[string]string)
for _, a := range allAliases {
if unscoped || a.scope == targetModule {
@@ -371,141 +920,202 @@ func extractAliasesFromLines(lines []string, targetLine int) map[string]string {
return aliases
}

+// Short local names for the shared parser token-walking helpers.
+var (
+ tokNextSig = parser.NextSigToken
+ tokCollectModuleName = parser.CollectModuleName
+)
+
// ExtractImports parses all import declarations from document text.
// Returns a slice of full module names.
func ExtractImports(text string) []string {
+ source := []byte(text)
+ tokens := parser.Tokenize(source)
+ n := len(tokens)
var imports []string
- for _, line := range strings.Split(text, "\n") {
- if m := importRe.FindStringSubmatch(line); m != nil {
- imports = append(imports, m[1])
+ for i := 0; i < n; i++ {
+ if tokens[i].Kind == parser.TokImport {
+ j := tokNextSig(tokens, n, i+1)
+ mod, _ := tokCollectModuleName(source, tokens, n, j)
+ if mod != "" {
+ imports = append(imports, mod)
+ }
}
}
return imports
}

-// extractAliasFromLine checks whether line matches an alias declaration
-// (alias X, as: Y / alias X.{A, B} / alias X.Y) and, if so, records it in
-// aliases and returns the (possibly newly-created) map plus true. Returns
-// (aliases, false) when the line is not an alias declaration. 
-func extractAliasFromLine(line string, aliases map[string]string, resolveAlias func(string) string) (map[string]string, bool) { - if m := parser.AliasAsRe.FindStringSubmatch(line); m != nil { - if aliases == nil { - aliases = make(map[string]string) - } - aliases[m[2]] = resolveAlias(m[1]) - return aliases, true - } - if m := aliasMultiRe.FindStringSubmatch(line); m != nil { - base := resolveAlias(m[1]) - for _, name := range strings.Split(m[2], ",") { - name = strings.TrimSpace(name) - if len(name) > 0 && unicode.IsUpper(rune(name[0])) { - if aliases == nil { - aliases = make(map[string]string) - } - aliases[name] = base + "." + name +// skipToEndOfStatement advances from the given token index past the current statement +// (to the next TokEOL at bracket/block depth 0, or to end of tokens). +func skipToEndOfStatement(tokens []parser.Token, n, from int) int { + depth := 0 + blockDepth := 0 + for i := from; i < n; i++ { + switch tokens[i].Kind { + case parser.TokOpenParen, parser.TokOpenBracket, parser.TokOpenBrace, parser.TokOpenAngle: + depth++ + case parser.TokCloseParen, parser.TokCloseBracket, parser.TokCloseBrace, parser.TokCloseAngle: + if depth > 0 { + depth-- + } + case parser.TokDo, parser.TokFn: + blockDepth++ + case parser.TokEnd: + if blockDepth > 0 { + blockDepth-- + } + case parser.TokEOL, parser.TokEOF: + if depth <= 0 && blockDepth <= 0 { + return i } } - return aliases, true - } - if m := parser.AliasRe.FindStringSubmatch(line); m != nil { - resolved := resolveAlias(m[1]) - parts := strings.Split(resolved, ".") - if aliases == nil { - aliases = make(map[string]string) - } - aliases[parts[len(parts)-1]] = resolved - return aliases, true } - return aliases, false + return n } -// parseHelperQuoteBlock finds `def/defp helperName` in lines, locates its -// `quote do` block, and extracts imports/uses/inline-defs/aliases from it. -// Returns nil slices if the function or its quote block can't be found. 
+// parseHelperQuoteBlock finds `def/defp helperName` in the source text, locates +// its `quote do` block, and extracts imports/uses/inline-defs/aliases from it. +// Uses the tokenizer for correct heredoc and multi-line handling. func parseHelperQuoteBlock(lines []string, helperName string, fileAliases map[string]string) (imported []string, inlineDefs map[string][]inlineDef, transUses []string, optBindings []optBinding, aliases map[string]string) { - resolveAlias := func(modName string) string { - return parser.ResolveModuleRef(modName, fileAliases, "") - } + source := []byte(strings.Join(lines, "\n")) + tokens := parser.Tokenize(source) + n := len(tokens) - // Find the def/defp for helperName - funcIdx := -1 - funcIndent := 0 - for i, line := range lines { - trimmed := strings.TrimSpace(line) - rest := "" - if strings.HasPrefix(trimmed, "defp ") { - rest = trimmed[5:] - } else if strings.HasPrefix(trimmed, "def ") { - rest = trimmed[4:] - } - if rest != "" && strings.HasPrefix(rest, helperName) { - after := rest[len(helperName):] - if after == "" || after[0] == '(' || after[0] == ' ' || after[0] == '\t' || after[0] == ',' { - funcIdx = i - funcIndent = len(line) - len(strings.TrimLeft(line, " \t")) - break - } + resolveAlias := func(modName string) string { + if resolved := parser.ResolveModuleRef(modName, aliases, ""); resolved != modName { + return resolved } - } - if funcIdx < 0 { - return + return parser.ResolveModuleRef(modName, fileAliases, "") } - // Find the quote do block inside the function - quoteIdx := -1 - quoteIndent := 0 - for i := funcIdx + 1; i < len(lines); i++ { - line := lines[i] - trimmed := strings.TrimSpace(line) - indent := len(line) - len(strings.TrimLeft(line, " \t")) - if indent <= funcIndent && (trimmed == "end" || parser.FuncDefRe.MatchString(line)) { - break + // Find def/defp helperName + helperStart := -1 + for i := 0; i < n; i++ { + if tokens[i].Kind != parser.TokDef && tokens[i].Kind != parser.TokDefp { + continue } - if 
quoteDoRe.MatchString(line) { - quoteIdx = i - quoteIndent = indent + j := tokNextSig(tokens, n, i+1) + if j < n && tokens[j].Kind == parser.TokIdent && string(source[tokens[j].Start:tokens[j].End]) == helperName { + // Find the TokDo that opens this function. Don't stop at TokEOL + // because Elixir allows `do` on the next line after multi-line params. + if _, nextPos, hasDo := parser.ScanForwardToBlockDo(tokens, n, j+1); hasDo { + helperStart = nextPos + } break } } - if quoteIdx < 0 { + if helperStart < 0 { return } - // Parse the quote do block for imports/uses/inline-defs - inlineDefs = make(map[string][]inlineDef) - for i := quoteIdx + 1; i < len(lines); i++ { - line := lines[i] - trimmed := strings.TrimSpace(line) - if trimmed == "" { - continue - } - indent := len(line) - len(strings.TrimLeft(line, " \t")) - if indent <= quoteIndent && (trimmed == "end" || parser.FuncDefRe.MatchString(line)) { - break - } - - if m := importRe.FindStringSubmatch(line); m != nil { - imported = append(imported, resolveAlias(m[1])) - continue - } - if m := useRe.FindStringSubmatch(line); m != nil { - transUses = append(transUses, resolveAlias(m[1])) - continue + // Find `quote do` inside the function body + quoteBodyStart := -1 + depth := 1 + for i := helperStart; i < n && depth > 0; i++ { + parser.TrackBlockDepth(tokens[i].Kind, &depth) + switch tokens[i].Kind { + case parser.TokIdent: + if string(source[tokens[i].Start:tokens[i].End]) == "quote" { + j := tokNextSig(tokens, n, i+1) + if j < n && tokens[j].Kind == parser.TokDo { + quoteBodyStart = j + 1 + } + } } - if updated, matched := extractAliasFromLine(line, aliases, resolveAlias); matched { - aliases = updated - continue + if quoteBodyStart >= 0 { + break } - if m := parser.FuncDefRe.FindStringSubmatch(line); m != nil { - funcName := m[2] - arity := parser.ExtractArity(line, funcName) - inlineDefs[funcName] = append(inlineDefs[funcName], inlineDef{ - line: i + 1, - arity: arity, - kind: m[1], - params: 
parser.JoinParams(parser.ExtractParamNames(line, funcName), arity), - }) + } + if quoteBodyStart < 0 { + return + } + + // Walk the quote do block (depth 1 = inside quote do, 0 = we hit its end) + inlineDefs = make(map[string][]inlineDef) + depth = 1 + for i := quoteBodyStart; i < n && depth > 0; i++ { + tok := tokens[i] + parser.TrackBlockDepth(tok.Kind, &depth) + switch tok.Kind { + + case parser.TokImport: + j := tokNextSig(tokens, n, i+1) + mod, _ := tokCollectModuleName(source, tokens, n, j) + if mod != "" { + imported = append(imported, resolveAlias(mod)) + } + + case parser.TokUse: + j := tokNextSig(tokens, n, i+1) + mod, _ := tokCollectModuleName(source, tokens, n, j) + if mod != "" { + transUses = append(transUses, resolveAlias(mod)) + } + + case parser.TokAlias: + j := tokNextSig(tokens, n, i+1) + modName, k := tokCollectModuleName(source, tokens, n, j) + if modName == "" { + continue + } + // Multi-alias: alias Parent.{A, B} + if children, nextPos, ok := parser.ScanMultiAliasChildren(source, tokens, n, k, false); ok { + base := resolveAlias(modName) + for _, child := range children { + if aliases == nil { + aliases = make(map[string]string) + } + aliases[parser.AliasShortName(child)] = base + "." 
+ child + } + i = nextPos - 1 + continue + } + // alias Module, as: Name + if asName, nextPos, ok := parser.ScanKeywordOptionValue(source, tokens, n, k, "as"); ok { + if aliases == nil { + aliases = make(map[string]string) + } + aliases[asName] = resolveAlias(modName) + i = nextPos - 1 + continue + } + // Simple alias + resolved := resolveAlias(modName) + if aliases == nil { + aliases = make(map[string]string) + } + aliases[parser.AliasShortName(resolved)] = resolved + i = k - 1 + + case parser.TokDef, parser.TokDefp, parser.TokDefmacro, parser.TokDefmacrop, + parser.TokDefguard, parser.TokDefguardp, parser.TokDefdelegate: + kind := string(source[tok.Start:tok.End]) + defLine := tok.Line + j := tokNextSig(tokens, n, i+1) + if j >= n || tokens[j].Kind != parser.TokIdent { + continue + } + funcName := string(source[tokens[j].Start:tokens[j].End]) + j++ + pj := tokNextSig(tokens, n, j) + nextPos := pj + maxArity := 0 + defaultCount := 0 + var paramNames []string + if pj < n && tokens[pj].Kind == parser.TokOpenParen { + maxArity, defaultCount, paramNames, nextPos = collectParamsFromTokens(source, tokens, n, pj) + paramNames = parser.FixParamNames(paramNames) + } + minArity := maxArity - defaultCount + for arity := minArity; arity <= maxArity; arity++ { + inlineDefs[funcName] = append(inlineDefs[funcName], inlineDef{ + line: defLine, + arity: arity, + kind: kind, + params: parser.JoinParams(paramNames, arity), + }) + } + i = skipToEndOfStatement(tokens, n, nextPos) - 1 } } return @@ -513,10 +1123,17 @@ func parseHelperQuoteBlock(lines []string, helperName string, fileAliases map[st // ExtractUses returns module names from all `use Module` declarations. 
func ExtractUses(text string) []string { + source := []byte(text) + tokens := parser.Tokenize(source) + n := len(tokens) var uses []string - for _, line := range strings.Split(text, "\n") { - if m := useRe.FindStringSubmatch(line); m != nil { - uses = append(uses, m[1]) + for i := 0; i < n; i++ { + if tokens[i].Kind == parser.TokUse { + j := tokNextSig(tokens, n, i+1) + mod, _ := tokCollectModuleName(source, tokens, n, j) + if mod != "" { + uses = append(uses, mod) + } } } return uses @@ -530,31 +1147,78 @@ type UseCall struct { // ExtractUsesWithOpts parses all `use Module` and `use Module, key: Val` // declarations, returning each as a UseCall. Aliases are resolved using the -// provided map. +// provided map. Handles opts spanning multiple lines via the tokenizer. func ExtractUsesWithOpts(text string, aliases map[string]string) []UseCall { + source := []byte(text) + tokens := parser.Tokenize(source) + n := len(tokens) var calls []UseCall - for _, line := range strings.Split(text, "\n") { - // use Module, key: Val - if m := useWithOptsRe.FindStringSubmatch(line); m != nil { - module := parser.ResolveModuleRef(m[1], aliases, "") - calls = append(calls, UseCall{Module: module, Opts: ParseKeywordModuleOpts(m[2], aliases)}) + + for i := 0; i < n; i++ { + if tokens[i].Kind != parser.TokUse { + continue + } + j := tokNextSig(tokens, n, i+1) + modName, k := tokCollectModuleName(source, tokens, n, j) + if modName == "" { continue } - // plain use Module - if m := useRe.FindStringSubmatch(line); m != nil { - calls = append(calls, UseCall{Module: parser.ResolveModuleRef(m[1], aliases, "")}) + module := parser.ResolveModuleRef(modName, aliases, "") + + // Check for comma after module name → keyword opts follow + nk := tokNextSig(tokens, n, k) + if nk < n && tokens[nk].Kind == parser.TokComma { + opts := tokCollectKeywordModuleOpts(source, tokens, n, nk+1, aliases) + calls = append(calls, UseCall{Module: module, Opts: opts}) + } else { + calls = append(calls, UseCall{Module: 
module}) } + i = k } return calls } -// ParseKeywordModuleOpts parses an Elixir keyword list string (e.g. "mod: Hammox, repo: MyRepo") -// into a map of key → value. Only entries whose value starts with an uppercase letter -// (module names) are included. Alias resolution is applied to module values. -func ParseKeywordModuleOpts(optsStr string, aliases map[string]string) map[string]string { +// tokCollectKeywordModuleOpts scans tokens starting at pos for keyword pairs +// like `key: ModuleName` and returns a map of key → resolved module name. +// Only includes entries whose value is a module (starts with uppercase). +func tokCollectKeywordModuleOpts(source []byte, tokens []parser.Token, n, pos int, aliases map[string]string) map[string]string { result := make(map[string]string) - for _, m := range optKeyModuleRe.FindAllStringSubmatch(optsStr, -1) { - result[m[1]] = parser.ResolveModuleRef(m[2], aliases, "") + i := tokNextSig(tokens, n, pos) + for i < n { + tok := tokens[i] + // Stop at EOL not followed by a continuation (keyword opt) + if tok.Kind == parser.TokEOL { + j := tokNextSig(tokens, n, i+1) + if j >= n || tokens[j].Kind == parser.TokEOL || tokens[j].Kind == parser.TokEOF { + break + } + // Check if next sig token is an ident followed by colon (keyword opt) + if tokens[j].Kind == parser.TokIdent { + jj := j + 1 + if jj < n && tokens[jj].Kind == parser.TokColon { + i = j + continue + } + } + break + } + if tok.Kind == parser.TokEOF { + break + } + // Match: ident colon Module + if tok.Kind == parser.TokIdent { + if i+1 < n && tokens[i+1].Kind == parser.TokColon { + key := parser.TokenText(source, tok) + k := tokNextSig(tokens, n, i+2) + if k < n && tokens[k].Kind == parser.TokModule { + modName, _ := tokCollectModuleName(source, tokens, n, k) + if modName != "" { + result[key] = parser.ResolveModuleRef(modName, aliases, "") + } + } + } + } + i++ } return result } @@ -574,211 +1238,467 @@ type inlineDef struct { // dynamic opt-driven imports (e.g. 
`import unquote(mod)` where `mod` comes from // a Keyword.get on opts), and alias declarations that get injected into the // consumer module. +// +// Uses the tokenizer so that heredocs, multi-line expressions, and comments are +// handled correctly without line-joining heuristics. func parseUsingBody(text string) (imported []string, inlineDefs map[string][]inlineDef, transUses []string, optBindings []optBinding, aliases map[string]string) { - lines := strings.Split(text, "\n") - fileAliases := extractAliasesFromLines(lines, -1) + source := []byte(text) + tokens := parser.Tokenize(source) + n := len(tokens) - // Check if this module uses ExUnit.CaseTemplate (which provides the `using` macro) + nextSig := func(from int) int { + return tokNextSig(tokens, n, from) + } + + collectModuleName := func(i int) (string, int) { + return tokCollectModuleName(source, tokens, n, i) + } + + // Check if this module uses ExUnit.CaseTemplate usesCaseTemplate := false - for _, line := range lines { - if caseTemplateRe.MatchString(line) { - usesCaseTemplate = true - break + for i := 0; i < n; i++ { + if tokens[i].Kind == parser.TokUse { + j := nextSig(i + 1) + mod, _ := collectModuleName(j) + if mod == "ExUnit.CaseTemplate" { + usesCaseTemplate = true + break + } } } - usingIdx := -1 - usingIndent := 0 - inHeredoc := false - for i, line := range lines { - if strings.IndexByte(line, '"') >= 0 { - if count := strings.Count(line, `"""`); count > 0 { - if count < 2 { - inHeredoc = !inHeredoc + // Find the __using__ entry point: defmacro __using__ or ExUnit.CaseTemplate `using` + usingBodyStart := -1 + usingDepth := -1 + + for i := 0; i < n; i++ { + tok := tokens[i] + if tok.Kind == parser.TokDefmacro { + j := nextSig(i + 1) + if j < n && tokens[j].Kind == parser.TokIdent && string(source[tokens[j].Start:tokens[j].End]) == "__using__" { + // Scan forward to find TokDo; Elixir allows split-line heads. 
+ if _, nextPos, hasDo := parser.ScanForwardToBlockDo(tokens, n, j+1); hasDo { + usingBodyStart = nextPos + usingDepth = 1 // inside the defmacro do..end } - continue + break } } - if strings.IndexByte(line, '\'') >= 0 { - if count := strings.Count(line, `'''`); count > 0 { - if count < 2 { - inHeredoc = !inHeredoc + // ExUnit.CaseTemplate: `using do` or `using opts do` + if usesCaseTemplate && tok.Kind == parser.TokIdent && string(source[tok.Start:tok.End]) == "using" { + // Must be at statement start + if i == 0 || tokens[i-1].Kind == parser.TokEOL { + if _, nextPos, hasDo := parser.ScanForwardToBlockDo(tokens, n, i+1); hasDo { + usingBodyStart = nextPos + usingDepth = 1 + } + if usingBodyStart >= 0 { + break } - continue } } - if inHeredoc { - continue - } - if usingDefRe.MatchString(line) { - usingIdx = i - usingIndent = len(line) - len(strings.TrimLeft(line, " \t")) - break - } - // ExUnit.CaseTemplate's `using opts do` form — only when the module - // explicitly uses ExUnit.CaseTemplate (or a known subclass). 
- if usesCaseTemplate && caseTemplateUsingRe.MatchString(line) {
- usingIdx = i
- usingIndent = len(line) - len(strings.TrimLeft(line, " \t"))
- break
- }
}
- if usingIdx < 0 {
+ if usingBodyStart < 0 {
return
}
+ lines := strings.Split(text, "\n")
+ // Extract file-level aliases for resolution (reuses the already-tokenized data)
+ fileAliases := extractAliasesFromTokens(source, tokens, -1)
+ inlineDefs = make(map[string][]inlineDef)
resolveAlias := func(modName string) string {
+ if resolved := parser.ResolveModuleRef(modName, aliases, ""); resolved != modName {
+ return resolved
+ }
return parser.ResolveModuleRef(modName, fileAliases, "")
}
// varToOpt tracks variables bound from opts: var_name → {optKey, defaultMod}
type varBinding struct {
optKey string
defaultMod string
}
varToOpt := make(map[string]varBinding)
- for i := usingIdx + 1; i < len(lines); i++ {
- line := lines[i]
- trimmed := strings.TrimSpace(line)
- if trimmed == "" {
- continue
- }
- indent := len(line) - len(strings.TrimLeft(line, " \t"))
- // Stop at another definition or closing end at the same indentation level
- if indent <= usingIndent && (parser.FuncDefRe.MatchString(line) || trimmed == "end") {
- break
- }
-
- // Detect var = Keyword.get/pop(opts, :key, Default)
- if m := varKeywordWithDefaultRe.FindStringSubmatch(line); m != nil {
- varToOpt[m[1]] = varBinding{optKey: m[2], defaultMod: resolveAlias(m[3])}
- continue
+ // scanKeywordCall checks if tokens starting at i match:
+ // Keyword.{get|pop|put|put_new|fetch|fetch!|pop!|pop_lazy}(ident, :key [, Default])
+ // Returns (funcName, atomKey, defaultMod, endPos); the strings are empty when nothing matches. 
+ scanKeywordCall := func(i int) (string, string, string, int) { + // Expect: TokModule("Keyword") TokDot TokIdent(funcName) TokOpenParen + if i+3 >= n { + return "", "", "", i } - // Detect {var, _} = Keyword.pop(opts, :key, Default) — tuple destructuring - if m := varTupleKeywordRe.FindStringSubmatch(line); m != nil { - varToOpt[m[1]] = varBinding{optKey: m[2], defaultMod: resolveAlias(m[3])} - continue + if tokens[i].Kind != parser.TokModule || string(source[tokens[i].Start:tokens[i].End]) != "Keyword" { + return "", "", "", i } - // Detect var = Keyword.fetch/fetch!/pop!/pop_lazy(opts, :key) — no parseable default - if m := varKeywordNoDefaultRe.FindStringSubmatch(line); m != nil { - varToOpt[m[1]] = varBinding{optKey: m[2]} - continue + if tokens[i+1].Kind != parser.TokDot { + return "", "", "", i } - // Detect var = ModuleName (simple assignment to a module constant) - if m := varSimpleModuleRe.FindStringSubmatch(line); m != nil { - varToOpt[m[1]] = varBinding{defaultMod: resolveAlias(m[2])} - continue + if tokens[i+2].Kind != parser.TokIdent { + return "", "", "", i } - // Keyword.put_new/put with a module default: the module may be passed - // into a transitive `use` via unquote(opts) — keep as a heuristic. - if m := keywordModuleRe.FindStringSubmatch(line); m != nil { - transUses = append(transUses, resolveAlias(m[1])) + funcName := string(source[tokens[i+2].Start:tokens[i+2].End]) + j := nextSig(i + 3) + if j >= n || tokens[j].Kind != parser.TokOpenParen { + return "", "", "", i } - - // import unquote(var) — dynamic import, goes into optBindings only so - // consumer-provided opts take priority over the default. 
- if m := importUnquoteRe.FindStringSubmatch(line); m != nil { - varName := m[1] - if b, ok := varToOpt[varName]; ok { - optBindings = append(optBindings, optBinding{optKey: b.optKey, defaultMod: b.defaultMod, kind: "import"}) + j++ // skip ( + + // Skip first argument (the opts variable) up to comma + depth := 1 + for j < n && depth > 0 { + switch tokens[j].Kind { + case parser.TokOpenParen, parser.TokOpenBracket, parser.TokOpenBrace: + depth++ + case parser.TokCloseParen, parser.TokCloseBracket, parser.TokCloseBrace: + depth-- + if depth == 0 { + return funcName, "", "", j + 1 + } + case parser.TokComma: + if depth == 1 { + j++ + goto foundFirstComma + } } - continue + j++ } + return funcName, "", "", j + foundFirstComma: - // use unquote(var) — dynamic use, goes into optBindings only. - if m := useUnquoteRe.FindStringSubmatch(line); m != nil { - varName := m[1] - if b, ok := varToOpt[varName]; ok { - optBindings = append(optBindings, optBinding{optKey: b.optKey, defaultMod: b.defaultMod, kind: "use"}) - } - continue + // Expect :atom_key + j = nextSig(j) + if j >= n || tokens[j].Kind != parser.TokAtom { + return funcName, "", "", skipToEndOfStatement(tokens, n, j) } - - if m := importRe.FindStringSubmatch(line); m != nil { - imported = append(imported, resolveAlias(m[1])) - continue + atomText := string(source[tokens[j].Start:tokens[j].End]) + atomKey := "" + if len(atomText) > 1 && atomText[0] == ':' { + atomKey = atomText[1:] } + j++ - if m := useRe.FindStringSubmatch(line); m != nil { - transUses = append(transUses, resolveAlias(m[1])) - continue + // Check for optional comma + default module + j = nextSig(j) + if j >= n { + return funcName, atomKey, "", j } - - if updated, matched := extractAliasFromLine(line, aliases, resolveAlias); matched { - aliases = updated - continue + if tokens[j].Kind == parser.TokCloseParen { + return funcName, atomKey, "", j + 1 + } + if tokens[j].Kind == parser.TokComma { + j = nextSig(j + 1) + defaultMod, endJ := 
collectModuleName(j) + if defaultMod != "" { + // Advance to close paren + for endJ < n && tokens[endJ].Kind != parser.TokCloseParen { + endJ++ + } + if endJ < n { + endJ++ + } + return funcName, atomKey, defaultMod, endJ + } } + // Skip to end + return funcName, atomKey, "", skipToEndOfStatement(tokens, n, j) + } - // Delegation to a helper function: `using_block(opts)` or similar. - // Find that function's definition in the same file and parse its - // quote do block, which contains the actual imports/uses to inject. - if m := bareCallRe.FindStringSubmatch(line); m != nil { - helperName := m[1] - helperImported, helperDefs, helperTransUses, helperBindings, helperAliases := parseHelperQuoteBlock(lines, helperName, fileAliases) - if helperImported != nil { - imported = append(imported, helperImported...) - for k, v := range helperDefs { - inlineDefs[k] = append(inlineDefs[k], v...) + // Walk tokens inside the __using__ body + depth := usingDepth + i := usingBodyStart + for i < n && depth > 0 { + tok := tokens[i] + + switch tok.Kind { + case parser.TokDo, parser.TokFn, parser.TokEnd: + parser.TrackBlockDepth(tok.Kind, &depth) + i++ + case parser.TokEOL, parser.TokComment, parser.TokString, parser.TokHeredoc, + parser.TokSigil, parser.TokAtom, parser.TokNumber, parser.TokCharLiteral, + parser.TokEOF: + i++ + + case parser.TokImport: + i++ + j := nextSig(i) + // import unquote(var) + if j < n && tokens[j].Kind == parser.TokIdent && string(source[tokens[j].Start:tokens[j].End]) == "unquote" { + if j+2 < n && tokens[j+1].Kind == parser.TokOpenParen && tokens[j+2].Kind == parser.TokIdent { + varName := source[tokens[j+2].Start:tokens[j+2].End] + if b, ok := varToOpt[string(varName)]; ok { + optBindings = append(optBindings, optBinding{optKey: b.optKey, defaultMod: b.defaultMod, kind: "import"}) + } } - transUses = append(transUses, helperTransUses...) - optBindings = append(optBindings, helperBindings...) 
+ i = skipToEndOfStatement(tokens, n, j) + continue + } + // import Module + modName, k := collectModuleName(j) + if modName != "" { + imported = append(imported, resolveAlias(modName)) + } + i = k + + case parser.TokUse: + i++ + j := nextSig(i) + // use unquote(var) + if j < n && tokens[j].Kind == parser.TokIdent && string(source[tokens[j].Start:tokens[j].End]) == "unquote" { + if j+2 < n && tokens[j+1].Kind == parser.TokOpenParen && tokens[j+2].Kind == parser.TokIdent { + varName := source[tokens[j+2].Start:tokens[j+2].End] + if b, ok := varToOpt[string(varName)]; ok { + optBindings = append(optBindings, optBinding{optKey: b.optKey, defaultMod: b.defaultMod, kind: "use"}) + } + } + i = skipToEndOfStatement(tokens, n, j) + continue + } + // use Module + modName, k := collectModuleName(j) + if modName != "" { + transUses = append(transUses, resolveAlias(modName)) + } + i = k + + case parser.TokAlias: + i++ + j := nextSig(i) + modName, k := collectModuleName(j) + if modName == "" { + i = k + continue + } + // Multi-alias: alias Parent.{A, B} + if children, nextPos, ok := parser.ScanMultiAliasChildren(source, tokens, n, k, false); ok { + parent := resolveAlias(modName) + for _, childName := range children { + if aliases == nil { + aliases = make(map[string]string) + } + aliases[parser.AliasShortName(childName)] = parent + "." 
+ childName + } + i = nextPos + continue } - for k, v := range helperAliases { + // alias Module, as: Name + if asName, nextPos, ok := parser.ScanKeywordOptionValue(source, tokens, n, k, "as"); ok { if aliases == nil { aliases = make(map[string]string) } - aliases[k] = v + aliases[asName] = resolveAlias(modName) + i = nextPos - 1 + continue + } + // Simple alias + resolved := resolveAlias(modName) + if aliases == nil { + aliases = make(map[string]string) + } + aliases[parser.AliasShortName(resolved)] = resolved + i = k + + case parser.TokDef, parser.TokDefp, parser.TokDefmacro, parser.TokDefmacrop, + parser.TokDefguard, parser.TokDefguardp, parser.TokDefdelegate: + kind := string(source[tok.Start:tok.End]) + defLine := tok.Line + i++ + j := nextSig(i) + if j >= n || tokens[j].Kind != parser.TokIdent { + i = j + continue + } + funcName := string(source[tokens[j].Start:tokens[j].End]) + j++ + pj := nextSig(j) + nextPos := pj + maxArity := 0 + defaultCount := 0 + var paramNames []string + if pj < n && tokens[pj].Kind == parser.TokOpenParen { + maxArity, defaultCount, paramNames, nextPos = collectParamsFromTokens(source, tokens, n, pj) + paramNames = parser.FixParamNames(paramNames) + } + minArity := maxArity - defaultCount + for arity := minArity; arity <= maxArity; arity++ { + inlineDefs[funcName] = append(inlineDefs[funcName], inlineDef{ + line: defLine, + arity: arity, + kind: kind, + params: parser.JoinParams(paramNames, arity), + }) } + i = skipToEndOfStatement(tokens, n, nextPos) continue - } - if m := parser.FuncDefRe.FindStringSubmatch(line); m != nil { - funcName := m[2] - arity := parser.ExtractArity(line, funcName) - inlineDefs[funcName] = append(inlineDefs[funcName], inlineDef{ - line: i + 1, - arity: arity, - kind: m[1], - params: parser.JoinParams(parser.ExtractParamNames(line, funcName), arity), - }) + case parser.TokModule: + // Check for Keyword.put/put_new(opts, :key, Module) heuristic + modText := string(source[tok.Start:tok.End]) + if modText == 
"Keyword" && i+2 < n && tokens[i+1].Kind == parser.TokDot && tokens[i+2].Kind == parser.TokIdent {
+				funcName := string(source[tokens[i+2].Start:tokens[i+2].End])
+				if funcName == "put" || funcName == "put_new" {
+					_, atomKey, defaultMod, endJ := scanKeywordCall(i)
+					if atomKey != "" && defaultMod != "" {
+						transUses = append(transUses, resolveAlias(defaultMod))
+					}
+					i = endJ
+					continue
+				}
+				if funcName == "get" || funcName == "pop" {
+					// A bare Keyword.get/pop (no assignment) contributes nothing;
+					// `var = Keyword.get/pop` patterns are handled in the TokIdent
+					// case. Just skip past the call.
+					_, atomKey, _, endJ := scanKeywordCall(i)
+					if atomKey != "" {
+						i = endJ
+						continue
+					}
+				}
+			}
+			i++
+
+		case parser.TokIdent:
+			identName := string(source[tok.Start:tok.End])
+			isStmtStart := i == 0 || tokens[i-1].Kind == parser.TokEOL || tokens[i-1].Kind == parser.TokComment
+			j := nextSig(i + 1)
+
+			// Check for var = Keyword.{get,pop,put,put_new,...}(opts, :key, Default)
+			// or var = ModuleName
+			if isStmtStart && j < n && tokens[j].Kind == parser.TokOther && string(source[tokens[j].Start:tokens[j].End]) == "=" {
+				k := nextSig(j + 1)
+				if k < n && tokens[k].Kind == parser.TokModule && string(source[tokens[k].Start:tokens[k].End]) == "Keyword" {
+					funcName, atomKey, defaultMod, endJ := scanKeywordCall(k)
+					switch funcName {
+					case "get", "pop", "pop!":
+						if atomKey != "" {
+							varToOpt[identName] = varBinding{optKey: atomKey, defaultMod: resolveAlias(defaultMod)}
+						}
+					case "fetch", "fetch!", "pop_lazy":
+						if atomKey != "" {
+							varToOpt[identName] = varBinding{optKey: atomKey}
+						}
+					case "put", "put_new":
+						if atomKey != "" && defaultMod != "" {
+							transUses = append(transUses, resolveAlias(defaultMod))
+						}
+					}
+					i = endJ
+					continue
+				}
+				// var = ModuleName
+				if k < n && tokens[k].Kind == parser.TokModule {
+					modName, endK := collectModuleName(k)
+					if modName != "" {
+						// Check it's a simple assignment (next sig token is EOL or EOF)
+						peek := nextSig(endK)
+						if peek 
>= n || tokens[peek].Kind == parser.TokEOL || tokens[peek].Kind == parser.TokEOF { + varToOpt[identName] = varBinding{defaultMod: resolveAlias(modName)} + i = endK + continue + } + } + } + } + // Check for bare function call that delegates to a helper: + // helper_name(opts) where helper_name is a def/defp in the same file. + // Only at statement start to avoid matching function calls inside expressions. + if isStmtStart && j < n && tokens[j].Kind == parser.TokOpenParen && !parser.IsElixirKeyword(identName) { + helperImported, helperDefs, helperTransUses, helperBindings, helperAliases := parseHelperQuoteBlock(lines, identName, fileAliases) + if helperImported != nil { + imported = append(imported, helperImported...) + for hk, hv := range helperDefs { + inlineDefs[hk] = append(inlineDefs[hk], hv...) + } + transUses = append(transUses, helperTransUses...) + optBindings = append(optBindings, helperBindings...) + } + for hk, hv := range helperAliases { + if aliases == nil { + aliases = make(map[string]string) + } + aliases[hk] = hv + } + i = skipToEndOfStatement(tokens, n, i) + continue + } + i++ + + case parser.TokOpenBrace: + // Check for {var, _} = Keyword.pop(opts, :key, Default) + j := nextSig(i + 1) + if j < n && tokens[j].Kind == parser.TokIdent { + varName := string(source[tokens[j].Start:tokens[j].End]) + // Scan forward to find } = Keyword.pop pattern + k := j + 1 + braceDepth := 1 + for k < n && braceDepth > 0 { + switch tokens[k].Kind { + case parser.TokOpenBrace: + braceDepth++ + case parser.TokCloseBrace: + braceDepth-- + } + k++ + } + // k is now past } + eq := nextSig(k) + if eq < n && tokens[eq].Kind == parser.TokOther && string(source[tokens[eq].Start:tokens[eq].End]) == "=" { + kw := nextSig(eq + 1) + if kw < n && tokens[kw].Kind == parser.TokModule && string(source[tokens[kw].Start:tokens[kw].End]) == "Keyword" { + funcName, atomKey, defaultMod, endJ := scanKeywordCall(kw) + if (funcName == "pop" || funcName == "pop!") && atomKey != "" { + 
varToOpt[varName] = varBinding{optKey: atomKey, defaultMod: resolveAlias(defaultMod)}
+						} else if (funcName == "fetch" || funcName == "fetch!" || funcName == "pop_lazy") && atomKey != "" {
+							varToOpt[varName] = varBinding{optKey: atomKey}
+						}
+						i = endJ
+						continue
+					}
+				}
+			}
+			i++
+
+		default:
+			i++
+		}
 	}
 	return
 }
 
-// ExtractModuleAttribute returns the attribute name if the cursor is on a @attr reference,
-// otherwise returns "". For example, on "@endpoint_scopes" returns "endpoint_scopes".
-func ExtractModuleAttribute(line string, col int) string {
-	if col >= len(line) {
+// collectParamsFromTokens delegates to the shared parser implementation.
+var collectParamsFromTokens = parser.CollectParams
+
+// ModuleAttributeAtCursor returns the attribute name if the cursor is on a
+// @attr reference, otherwise returns "". For example, on "@endpoint_scopes"
+// returns "endpoint_scopes". Uses the token stream to correctly ignore
+// attributes inside strings, comments, and heredocs.
+func ModuleAttributeAtCursor(tokens []parser.Token, source []byte, lineStarts []int, line, col int) string { + offset := parser.LineColToOffset(lineStarts, line, col) + if offset < 0 { return "" } - // Scan back to find a leading @ - start := col - for start > 0 && (unicode.IsLetter(rune(line[start-1])) || unicode.IsDigit(rune(line[start-1])) || line[start-1] == '_') { - start-- - } - if start > 0 && line[start-1] == '@' { - start-- - } else if start < len(line) && line[start] == '@' { - // cursor is on the @ itself - } else { + + idx := parser.TokenAtOffset(tokens, offset) + if idx < 0 { return "" } - if start >= len(line) || line[start] != '@' { + + tok := tokens[idx] + if tok.Kind != parser.TokAttr { return "" } - end := start + 1 - for end < len(line) && (unicode.IsLetter(rune(line[end])) || unicode.IsDigit(rune(line[end])) || line[end] == '_') { - end++ - } - name := line[start+1 : end] - if len(name) == 0 { + + text := parser.TokenText(source, tok) + if len(text) <= 1 { return "" } - return name + return text[1:] // strip leading '@' +} + +// ExtractModuleAttribute is the TokenizedFile method version of ModuleAttributeAtCursor. +func (tf *TokenizedFile) ModuleAttributeAtCursor(line, col int) string { + return ModuleAttributeAtCursor(tf.tokens, tf.source, tf.lineStarts, line, col) } // reservedModuleAttrs are Elixir built-in module attributes that are not @@ -796,186 +1716,316 @@ var reservedModuleAttrs = map[string]bool{ // (assigned a value, not used). Returns the 1-based line number and true if found. // Returns false for reserved Elixir module attributes. 
func FindModuleAttributeDefinition(text string, attrName string) (int, bool) { - if reservedModuleAttrs[attrName] { - return 0, false - } - for i, line := range strings.Split(text, "\n") { - if m := moduleAttrDefRe.FindStringSubmatch(line); m != nil && m[1] == attrName { - return i + 1, true - } - } - return 0, false + return FindModuleAttributeDefinitionTokenized(text, attrName) } -// FindFunctionDefinition searches the document text for a def/defp/defmacro/defmacrop -// matching the given function name. Returns the 1-based line number and true if found. -func FindFunctionDefinition(text string, functionName string) (int, bool) { - for i, line := range strings.Split(text, "\n") { - if m := parser.FuncDefRe.FindStringSubmatch(line); m != nil { - if m[2] == functionName { - return i + 1, true +// FindBareFunctionCalls scans text for unqualified calls to functionName, +// including direct calls like functionName(...) and pipe calls like |> functionName. +// Returns 1-based line numbers. Definition lines are excluded. 
+func FindBareFunctionCalls(text string, functionName string) []int { + source := []byte(text) + tokens := parser.Tokenize(source) + n := len(tokens) + + seenLines := make(map[int]bool) + defLines := make(map[int]bool) + + // First pass: identify definition lines to exclude + for i := 0; i < n; i++ { + tok := tokens[i] + switch tok.Kind { + case parser.TokDef, parser.TokDefp, parser.TokDefmacro, parser.TokDefmacrop, + parser.TokDefguard, parser.TokDefguardp, parser.TokDefdelegate: + j := tokNextSig(tokens, n, i+1) + if j < n && tokens[j].Kind == parser.TokIdent { + if parser.TokenText(source, tokens[j]) == functionName { + defLines[tok.Line] = true + } } - continue // FuncDefRe and TypeDefRe match different line prefixes - } - if m := parser.TypeDefRe.FindStringSubmatch(line); m != nil { - if m[2] == functionName { - return i + 1, true + case parser.TokAttrSpec, parser.TokAttrCallback: + // Skip @spec and @callback lines that define this function + j := tokNextSig(tokens, n, i+1) + if j < n && tokens[j].Kind == parser.TokIdent { + if parser.TokenText(source, tokens[j]) == functionName { + defLines[tok.Line] = true + } } } } - return 0, false -} -// FindBareFunctionCalls scans text for unqualified calls to functionName, -// including direct calls like functionName(...) and pipe calls like |> functionName. -// Returns 1-based line numbers. Definition lines are excluded. 
-func FindBareFunctionCalls(text string, functionName string) []int { - var lineNums []int - for i, line := range strings.Split(text, "\n") { - trimmed := strings.TrimSpace(line) - if m := parser.FuncDefRe.FindStringSubmatch(trimmed); m != nil && m[2] == functionName { + // Second pass: find bare calls + for i := 0; i < n; i++ { + tok := tokens[i] + + if tok.Kind != parser.TokIdent { continue } - if strings.HasPrefix(trimmed, "@spec ") || strings.HasPrefix(trimmed, "@callback ") { + if parser.TokenText(source, tok) != functionName { + continue + } + if defLines[tok.Line] { continue } - found := false - - // Direct bare call: functionName( — but NOT Module.functionName( - for _, col := range findAllTokenColumns(line, functionName) { - if col == 0 || line[col-1] != '.' { - afterToken := line[col+len(functionName):] - afterTrimmed := strings.TrimLeft(afterToken, " \t") - if strings.HasPrefix(afterTrimmed, "(") { - found = true - break - } - } + // Check this is a bare call (not preceded by dot) + if i > 0 && tokens[i-1].Kind == parser.TokDot { + continue } - // Pipe call: |> functionName - if !found { - for pipeSearch := line; ; { - idx := strings.Index(pipeSearch, "|>") - if idx < 0 { + // Check it's followed by ( or preceded by |> + isCall := false + j := tokNextSig(tokens, n, i+1) + if j < n && tokens[j].Kind == parser.TokOpenParen { + isCall = true + } + // Check for pipe call: |> functionName + if !isCall && i > 0 { + // Look back for |> (may have EOL/comments between) + for k := i - 1; k >= 0; k-- { + if tokens[k].Kind == parser.TokPipe { + isCall = true break } - afterPipe := strings.TrimLeft(pipeSearch[idx+2:], " \t") - if col := findTokenColumn(afterPipe, functionName); col == 0 { - found = true + if tokens[k].Kind != parser.TokEOL && tokens[k].Kind != parser.TokComment { break } - pipeSearch = pipeSearch[idx+2:] } } - if found { - lineNums = append(lineNums, i+1) + if isCall && !seenLines[tok.Line] { + seenLines[tok.Line] = true } } + + var lineNums []int + 
for line := range seenLines {
+		lineNums = append(lineNums, line)
+	}
+	// Sort for deterministic output
+	slices.Sort(lineNums)
 	return lineNums
 }
 
-// ExtractCallContext scans backward from (lineNum, col) in text to find the
-// innermost open function call. Returns the function expression (e.g.
+// CallContextAtCursor scans backward through the token stream from (lineNum, col)
+// to find the innermost open function call. Returns the function expression (e.g.
 // "Enum.map" or "my_func"), the 0-based argument index, and true if found.
-func ExtractCallContext(text string, lineNum, col int) (funcExpr string, argIndex int, ok bool) {
-	lines := strings.Split(text, "\n")
-	if lineNum >= len(lines) {
+// Handles both parenthesized calls like `Enum.map(list, fun)` and paren-less
+// calls like `IO.puts "hello"` or `import MyApp.Repo`.
+func CallContextAtCursor(tokens []parser.Token, source []byte, lineStarts []int, lineNum, col int) (funcExpr string, argIndex int, ok bool) {
+	offset := parser.LineColToOffset(lineStarts, lineNum, col)
+	if offset < 0 {
+		return "", 0, false
+	}
+
+	startIdx := tokenAtOrBeforeOffset(tokens, offset)
+	if startIdx < 0 {
+		return "", 0, false
+	}
+
+	// Bail out when the cursor sits inside a comment. Strings are deliberately
+	// not skipped: a string literal can itself be a call argument.
+	if tokens[startIdx].Kind == parser.TokComment {
+		return "", 0, false
+	}
-	// Clamp col to line length
-	if col > len(lines[lineNum]) {
-		col = len(lines[lineNum])
+
+	// If cursor is exactly on a closing delimiter, step back one token so the
+	// scan sees us as *inside* the call rather than outside the balanced pair.
+ scanIdx := startIdx + switch tokens[scanIdx].Kind { + case parser.TokCloseParen, parser.TokCloseBracket, parser.TokCloseBrace: + if scanIdx > 0 { + scanIdx-- + } + } + + // Try parenthesized call first + if expr, argIdx, found := callContextParen(tokens, source, scanIdx); found { + return expr, argIdx, true } - // Convert (lineNum, col) to a flat byte offset - offset := 0 - for i := 0; i < lineNum; i++ { - offset += len(lines[i]) + 1 // +1 for newline + // Try paren-less call: scan backward on the same line for + // `func_or_module.func arg1, arg2` patterns. + return callContextNoParen(tokens, source, startIdx) +} + +// tokenAtOrBeforeOffset returns the index of the token at or just before the +// given byte offset. Returns -1 if no suitable token exists. +func tokenAtOrBeforeOffset(tokens []parser.Token, offset int) int { + idx := parser.TokenAtOffset(tokens, offset) + if idx >= 0 { + return idx + } + // Cursor is between tokens — find the last token starting before offset + for i := len(tokens) - 1; i >= 0; i-- { + if tokens[i].Start < offset { + return i + } } - offset += col + return -1 +} - if offset > len(text) { - offset = len(text) +// collectDotChain walks backward from tokens[j] collecting a dotted identifier +// chain (e.g. Module.SubModule.func). Returns the expression string or "". +func collectDotChain(tokens []parser.Token, source []byte, j int) string { + var parts []string + for j >= 0 { + t := tokens[j] + if isCallableToken(t.Kind) { + parts = append(parts, parser.TokenText(source, t)) + if j-1 >= 0 && tokens[j-1].Kind == parser.TokDot { + j -= 2 + continue + } + break + } + break + } + if len(parts) == 0 { + return "" } + for l, r := 0, len(parts)-1; l < r; l, r = l+1, r-1 { + parts[l], parts[r] = parts[r], parts[l] + } + return strings.Join(parts, ".") +} - // Scan backward tracking nesting depth +// callContextParen scans backward from startIdx looking for an unmatched open +// paren to identify a parenthesized function call. 
+// +// All bracket types (paren, bracket, brace) are tracked in a unified depth +// counter so that commas inside nested containers are not counted toward the +// outer call's argument index. +func callContextParen(tokens []parser.Token, source []byte, startIdx int) (string, int, bool) { depth := 0 commas := 0 - inString := false - for i := offset - 1; i >= 0; i-- { - ch := text[i] + for i := startIdx; i >= 0; i-- { + tok := tokens[i] - // String skip: when we hit a closing ", scan backward to find the opening " - if ch == '"' && !inString { - inString = true - continue - } - if inString { - if ch == '"' { - // Count preceding backslashes — an odd number means the quote is escaped - backslashes := 0 - for j := i - 1; j >= 0 && text[j] == '\\'; j-- { - backslashes++ + switch tok.Kind { + case parser.TokCloseParen, parser.TokCloseBracket, parser.TokCloseBrace, parser.TokCloseAngle: + depth++ + case parser.TokOpenBracket, parser.TokOpenBrace, parser.TokOpenAngle: + if depth > 0 { + depth-- + } else { + // Unmatched open bracket/brace/angle — we exited a container + // that is itself an argument. Reset comma count for this + // nesting level and keep scanning for the enclosing call. + commas = 0 + } + case parser.TokOpenParen: + if depth > 0 { + depth-- + } else { + j := i - 1 + // Anonymous call: callback.(arg) — skip the dot + if j >= 0 && tokens[j].Kind == parser.TokDot { + j-- } - if backslashes%2 == 0 { - inString = false + expr := collectDotChain(tokens, source, j) + if expr == "" || parser.IsElixirKeyword(expr) { + return "", 0, false } + return expr, commas, true + } + case parser.TokComma: + if depth == 0 { + commas++ } - continue } + } + return "", 0, false +} + +// isCallableToken returns true if the token kind can be the name of a +// paren-less function/macro call. 
+func isCallableToken(kind parser.TokenKind) bool { + switch kind { + case parser.TokIdent, parser.TokModule, + parser.TokImport, parser.TokAlias, parser.TokUse, parser.TokRequire: + return true + default: + return false + } +} + +// isArgStartToken returns true if the token kind can appear as the beginning +// of a function argument (i.e., it's a value-like token, not an operator). +func isArgStartToken(kind parser.TokenKind) bool { + switch kind { + case parser.TokIdent, parser.TokModule, parser.TokNumber, + parser.TokString, parser.TokHeredoc, parser.TokSigil, + parser.TokAtom, parser.TokCharLiteral, + parser.TokOpenParen, parser.TokOpenBracket, parser.TokOpenBrace, + parser.TokOpenAngle, parser.TokPercent, + parser.TokAttr, parser.TokFn: + return true + default: + return false + } +} + +// callContextNoParen detects paren-less function calls by scanning backward +// for a pattern like `ident arg, arg` or `Module.func arg, arg` where the +// function name is separated from its arguments by whitespace (no parens). +// +// Follows Elixir's own approach (Code.Fragment): if the preceding token is an +// identifier separated by whitespace from the next token, it's a no-paren call. 
+func callContextNoParen(tokens []parser.Token, source []byte, startIdx int) (string, int, bool) { + depth := 0 + commas := 0 + + for i := startIdx; i >= 0; i-- { + tok := tokens[i] - switch ch { - case ')', ']', '}': + switch tok.Kind { + case parser.TokCloseParen, parser.TokCloseBracket, parser.TokCloseBrace, parser.TokCloseAngle: depth++ - case '[', '{': + case parser.TokOpenParen: if depth > 0 { depth-- } else { - // Inside a list/map/tuple, not a function call return "", 0, false } - case '(': + case parser.TokOpenBracket, parser.TokOpenBrace, parser.TokOpenAngle: if depth > 0 { depth-- } else { - // Found the opening paren of our call — extract the function name before it - // Scan backward from i-1 to find the expression - exprEnd := i - 1 - // Skip whitespace between expression and paren - for exprEnd >= 0 && (text[exprEnd] == ' ' || text[exprEnd] == '\t' || text[exprEnd] == '\n' || text[exprEnd] == '\r') { - exprEnd-- - } - if exprEnd < 0 { - return "", 0, false - } - // Find the start of the expression - exprStart := exprEnd - for exprStart > 0 && isExprChar(text[exprStart-1]) { - exprStart-- - } - funcExpr = text[exprStart : exprEnd+1] - if funcExpr == "" { - return "", 0, false - } - // Skip Elixir keywords that take parens (if, case, etc.) - if parser.IsElixirKeyword(funcExpr) { - return "", 0, false - } - return funcExpr, commas, true + commas = 0 } - case ',': + case parser.TokComma: if depth == 0 { commas++ } + default: + if depth == 0 && isCallableToken(tok.Kind) { + if i+1 < len(tokens) { + next := tokens[i+1] + // Part of a dotted chain — keep scanning + if next.Kind == parser.TokDot { + continue + } + // Must be separated by whitespace AND followed by a + // value-like token (not an operator like =, ->, etc.) 
+ if next.Start > tok.End && isArgStartToken(next.Kind) { + expr := collectDotChain(tokens, source, i) + if expr == "" || parser.IsElixirKeyword(expr) { + return "", 0, false + } + return expr, commas, true + } + } + } } } return "", 0, false } +// CallContextAtCursor is the TokenizedFile method version. +func (tf *TokenizedFile) CallContextAtCursor(line, col int) (funcExpr string, argIndex int, ok bool) { + return CallContextAtCursor(tf.tokens, tf.source, tf.lineStarts, line, col) +} + // extractParamNames reads the function definition line at defIdx and returns // the parameter names. Falls back to positional names (arg1, arg2, ...) for // complex patterns. @@ -983,10 +2033,30 @@ func extractParamNames(lines []string, defIdx int) []string { if defIdx < 0 || defIdx >= len(lines) { return nil } - line := lines[defIdx] - m := parser.FuncDefRe.FindStringSubmatch(line) - if m == nil { - return nil + + // Tokenize just the single line for efficiency + source := []byte(lines[defIdx]) + tokens := parser.Tokenize(source) + n := len(tokens) + + for i := 0; i < n; i++ { + tok := tokens[i] + + switch tok.Kind { + case parser.TokDef, parser.TokDefp, parser.TokDefmacro, parser.TokDefmacrop, + parser.TokDefguard, parser.TokDefguardp, parser.TokDefdelegate: + j := tokNextSig(tokens, n, i+1) + if j >= n || tokens[j].Kind != parser.TokIdent { + return nil + } + j++ + pj := tokNextSig(tokens, n, j) + if pj >= n || tokens[pj].Kind != parser.TokOpenParen { + return nil + } + _, _, paramNames, _ := parser.CollectParams(source, tokens, n, pj) + return parser.FixParamNames(paramNames) + } } - return parser.ExtractParamNames(line, m[2]) + return nil } diff --git a/internal/lsp/elixir_docs.go b/internal/lsp/elixir_docs.go new file mode 100644 index 0000000..dd901e3 --- /dev/null +++ b/internal/lsp/elixir_docs.go @@ -0,0 +1,280 @@ +package lsp + +import ( + "strings" + + "github.com/remoteoss/dexter/internal/parser" +) + +// ExtractDocAbove scans backward from defLineIdx (0-based) to find 
@doc and @spec.
+func (tf *TokenizedFile) ExtractDocAbove(defLineIdx int) (doc, spec string) {
+	defLine1 := defLineIdx + 1
+
+	// Find the token index for defLine1
+	startIdx := -1
+	for i, tok := range tf.tokens {
+		if tok.Line >= defLine1 {
+			startIdx = i
+			break
+		}
+	}
+	if startIdx < 0 {
+		return "", ""
+	}
+
+	// Scan backward to find the previous statement boundary to scope the search
+	boundaryIdx := 0
+	for i := startIdx - 1; i >= 0; i-- {
+		if parser.IsStatementBoundaryToken(tf.tokens[i].Kind) {
+			boundaryIdx = i + 1
+			break
+		}
+	}
+
+	var currentDoc string
+	var currentSpec []string
+	inSpecBlock := false
+
+	// Iterate line by line within the token range rather than token by token:
+	// @doc and @spec attributes are line-oriented constructs in Elixir source.
+	startLine := tf.tokens[boundaryIdx].Line - 1
+	endLine := defLineIdx
+
+	lines := strings.Split(string(tf.source), "\n")
+	if startLine < 0 {
+		startLine = 0
+	}
+	if endLine > len(lines) {
+		endLine = len(lines)
+	}
+
+	inDocHeredoc := false
+	var docLines []string
+
+	for i := startLine; i < endLine; i++ {
+		trimmed := strings.TrimSpace(lines[i])
+
+		if inDocHeredoc {
+			if trimmed == `"""` {
+				inDocHeredoc = false
+				currentDoc = dedentBlock(docLines)
+				docLines = nil
+			} else {
+				docLines = append(docLines, lines[i])
+			}
+			continue
+		}
+
+		if inSpecBlock {
+			if trimmed == "" || strings.HasPrefix(trimmed, "@") || strings.HasPrefix(trimmed, "def") {
+				inSpecBlock = false
+			} else {
+				currentSpec = append(currentSpec, lines[i])
+				continue
+			}
+		}
+
+		if trimmed == `@doc """` || trimmed == `@doc ~S"""` || trimmed == `@doc ~s"""` ||
+			trimmed == `@typedoc """` || trimmed == `@typedoc ~S"""` || trimmed == `@typedoc ~s"""` {
+			inDocHeredoc = true
+			docLines = nil
+			continue
+		}
+
+		if strings.HasPrefix(trimmed, `@doc "`) {
+			currentDoc = extractQuotedString(trimmed[5:])
+			continue
+		}
+
+		if strings.HasPrefix(trimmed, `@typedoc "`) {
+			currentDoc = extractQuotedString(trimmed[9:])
+			continue
+		}
+
+		if trimmed 
== "@doc false" || trimmed == "@typedoc false" { + currentDoc = "" + continue + } + + if strings.HasPrefix(trimmed, "@spec ") { + currentSpec = []string{lines[i]} + inSpecBlock = true + continue + } + } + + if len(currentSpec) > 0 { + spec = strings.TrimSpace(strings.Join(currentSpec, "\n")) + } + return currentDoc, spec +} + +// ExtractModuledoc scans forward from defLineIdx to find @moduledoc. +func (tf *TokenizedFile) ExtractModuledoc(defLineIdx int) string { + defLine1 := defLineIdx + 1 + n := tf.n + + startIdx := -1 + for i, tok := range tf.tokens { + if tok.Line >= defLine1 { + startIdx = i + break + } + } + if startIdx < 0 { + return "" + } + + // Scan forward within the module block + for i := startIdx; i < n; i++ { + tok := tf.tokens[i] + + // Stop if we hit a definition (we went too far) + if tok.Kind == parser.TokDef || tok.Kind == parser.TokDefp || tok.Kind == parser.TokDefmacro || tok.Kind == parser.TokDefmacrop || tok.Kind == parser.TokEnd { + break + } + + if tok.Kind == parser.TokAttrDoc || tok.Kind == parser.TokAttr { + attrText := parser.TokenText(tf.source, tok) + if attrText == "@moduledoc" { + j := parser.NextSigToken(tf.tokens, n, i+1) + if j < n { + nextTok := tf.tokens[j] + if nextTok.Kind == parser.TokHeredoc || nextTok.Kind == parser.TokString || nextTok.Kind == parser.TokSigil { + return extractDocFromStringToken(tf.source, nextTok) + } else if nextTok.Kind == parser.TokIdent && parser.TokenText(tf.source, nextTok) == "false" { + return "" + } + } + } + } + } + return "" +} + +func extractDocFromStringToken(source []byte, tok parser.Token) string { + text := string(source[tok.Start:tok.End]) + if tok.Kind == parser.TokHeredoc { + // remove quotes + lines := strings.Split(text, "\n") + if len(lines) >= 2 { + // strip first line `"""` + lines = lines[1 : len(lines)-1] + return dedentBlock(lines) + } + return "" + } + if tok.Kind == parser.TokString { + return extractQuotedString(text) + } + if tok.Kind == parser.TokSigil { + if doc := 
extractDocFromSigilToken(text); doc != "" { + return doc + } + } + return text +} + +func extractDocFromSigilToken(text string) string { + if len(text) < 3 || text[0] != '~' || !isASCIILetter(text[1]) { + return "" + } + i := 2 + sigilLetter := text[1] + if isUpperASCII(sigilLetter) { + for i < len(text) && isUpperASCII(text[i]) { + i++ + } + } + if i >= len(text) { + return "" + } + + open := text[i] + close := open + nested := false + switch open { + case '(': + close = ')' + nested = true + case '[': + close = ']' + nested = true + case '{': + close = '}' + nested = true + case '<': + close = '>' + nested = true + } + + end := len(text) + for end > i+1 && isASCIILetter(text[end-1]) { + end-- + } + if end <= i+1 { + return "" + } + + // Heredoc sigils: ~s"""...""" or ~S'''...''' + if (open == '"' || open == '\'') && i+2 < end && text[i+1] == open && text[i+2] == open { + if end < i+6 || text[end-3] != open || text[end-2] != open || text[end-1] != open { + return "" + } + body := text[i+3 : end-3] + lines := strings.Split(body, "\n") + if len(lines) > 0 && lines[0] == "" { + lines = lines[1:] + } + return dedentBlock(lines) + } + + contentStart := i + 1 + escapes := isLowerASCII(sigilLetter) + + if nested { + depth := 1 + for j := contentStart; j < end; j++ { + ch := text[j] + if escapes && ch == '\\' && j+1 < end { + j++ + continue + } + if ch == open { + depth++ + continue + } + if ch == close { + depth-- + if depth == 0 { + return text[contentStart:j] + } + } + } + return "" + } + + for j := contentStart; j < end; j++ { + ch := text[j] + if escapes && ch == '\\' && j+1 < end { + j++ + continue + } + if ch == close { + return text[contentStart:j] + } + } + return "" +} + +func isASCIILetter(ch byte) bool { + return (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z') +} + +func isUpperASCII(ch byte) bool { + return ch >= 'A' && ch <= 'Z' +} + +func isLowerASCII(ch byte) bool { + return ch >= 'a' && ch <= 'z' +} diff --git a/internal/lsp/elixir_test.go 
b/internal/lsp/elixir_test.go index 0c97a92..ea9fb70 100644 --- a/internal/lsp/elixir_test.go +++ b/internal/lsp/elixir_test.go @@ -1,118 +1,11 @@ package lsp import ( + "strings" "testing" -) - -func TestExtractExpression(t *testing.T) { - tests := []struct { - name string - line string - col int - expected string - }{ - // Cursor on middle segment → truncate at that segment's end - { - name: "cursor on middle module segment", - line: " Foo.Bar.baz(123)", - col: 9, - expected: "Foo.Bar", - }, - // Cursor on dot → include next segment - { - name: "cursor on dot between segments", - line: " Foo.Bar.Baz", - col: 7, - expected: "Foo.Bar", - }, - { - name: "bare function", - line: " do_something(x)", - col: 7, - expected: "do_something", - }, - // Cursor on first segment → return only that segment - { - name: "cursor at start of expr", - line: " Foo.bar()", - col: 4, - expected: "Foo", - }, - // Cursor on last segment → return full expression - { - name: "cursor at end of expr", - line: " Foo.bar()", - col: 10, - expected: "Foo.bar", - }, - { - name: "function with question mark", - line: " valid?(x)", - col: 6, - expected: "valid?", - }, - { - name: "function with bang", - line: " process!(x)", - col: 6, - expected: "process!", - }, - // Cursor on first segment of underscore module - { - name: "cursor on first segment of underscore module", - line: " MyApp_Web.Router", - col: 8, - expected: "MyApp_Web", - }, - // Cursor on last segment → full expr - { - name: "cursor on last segment", - line: " MyApp_Web.Router", - col: 16, - expected: "MyApp_Web.Router", - }, - { - name: "empty line", - line: "", - col: 0, - expected: "", - }, - { - name: "cursor on paren", - line: " Foo.bar()", - col: 11, - expected: "", - }, - // Three-part expression: cursor on each segment - { - name: "three-part: cursor on first", - line: "MyApp.Repo.all", - col: 2, - expected: "MyApp", - }, - { - name: "three-part: cursor on middle", - line: "MyApp.Repo.all", - col: 7, - expected: "MyApp.Repo", 
- }, - { - name: "three-part: cursor on last", - line: "MyApp.Repo.all", - col: 11, - expected: "MyApp.Repo.all", - }, - } - for _, tt := range tests { - t.Run(tt.name, func(t *testing.T) { - got := ExtractExpression(tt.line, tt.col) - if got != tt.expected { - t.Errorf("ExtractExpression(%q, %d) = %q, want %q", tt.line, tt.col, got, tt.expected) - } - }) - } -} + "github.com/remoteoss/dexter/internal/parser" +) func TestExtractModuleAndFunction(t *testing.T) { tests := []struct { @@ -178,6 +71,398 @@ func TestExtractModuleAndFunction(t *testing.T) { } } +func tokenize(code string) ([]parser.Token, []byte, []int) { + source := []byte(code) + result := parser.TokenizeFull(source) + return result.Tokens, source, result.LineStarts +} + +func TestExpressionAtCursor(t *testing.T) { + tests := []struct { + name string + code string + line int + col int + wantMod string + wantFunc string + }{ + { + name: "cursor on middle module segment", + code: " Foo.Bar.baz(123)", + line: 0, + col: 9, // 'a' in Bar + wantMod: "Foo.Bar", + wantFunc: "", + }, + { + name: "cursor on function name", + code: " Foo.Bar.baz(123)", + line: 0, + col: 12, // 'b' in baz + wantMod: "Foo.Bar", + wantFunc: "baz", + }, + { + name: "cursor on first module segment", + code: " Foo.bar()", + line: 0, + col: 4, // 'F' in Foo + wantMod: "Foo", + wantFunc: "", + }, + { + name: "bare function call", + code: " do_something(x)", + line: 0, + col: 7, + wantMod: "", + wantFunc: "do_something", + }, + { + name: "cursor on dot includes next segment", + code: " Foo.Bar.Baz", + line: 0, + col: 7, // the dot between Foo and Bar + wantMod: "Foo.Bar", + wantFunc: "", + }, + { + name: "three-part cursor on last", + code: "MyApp.Repo.all", + line: 0, + col: 11, // 'a' in all + wantMod: "MyApp.Repo", + wantFunc: "all", + }, + { + name: "three-part cursor on middle", + code: "MyApp.Repo.all", + line: 0, + col: 7, // 'e' in Repo + wantMod: "MyApp.Repo", + wantFunc: "", + }, + { + name: "three-part cursor on first", + code: 
"MyApp.Repo.all", + line: 0, + col: 2, // 'A' in MyApp + wantMod: "MyApp", + wantFunc: "", + }, + { + name: "function with question mark", + code: " valid?(x)", + line: 0, + col: 6, + wantMod: "", + wantFunc: "valid?", + }, + { + name: "function with bang", + code: " process!(x)", + line: 0, + col: 6, + wantMod: "", + wantFunc: "process!", + }, + { + name: "empty line", + code: "", + line: 0, + col: 0, + wantMod: "", + wantFunc: "", + }, + { + name: "cursor on paren", + code: " Foo.bar()", + line: 0, + col: 11, // the open paren + wantMod: "", + wantFunc: "", + }, + // --- Token-aware improvements over char-based version --- + { + name: "expression inside string is ignored", + code: `x = "Foo.bar"`, + line: 0, + col: 7, // 'o' in Foo inside the string + wantMod: "", + wantFunc: "", + }, + { + name: "expression inside comment is ignored", + code: " # Foo.bar is great", + line: 0, + col: 6, // 'o' in Foo inside comment + wantMod: "", + wantFunc: "", + }, + { + name: "expression inside heredoc is ignored", + code: " \"\"\"\n Foo.bar\n \"\"\"", + line: 1, + col: 4, // 'o' in Foo inside heredoc + wantMod: "", + wantFunc: "", + }, + { + name: "multiline: cursor on second line", + code: "defmodule MyApp do\n Foo.Bar.baz()\nend", + line: 1, + col: 6, // 'B' in Bar + wantMod: "Foo.Bar", + wantFunc: "", + }, + { + name: "module-only expression", + code: " Foo.Bar.Baz", + line: 0, + col: 10, // 'B' in Baz + wantMod: "Foo.Bar.Baz", + wantFunc: "", + }, + { + name: "pipe into qualified call", + code: " |> Foo.Bar.transform()", + line: 0, + col: 15, // 't' in transform + wantMod: "Foo.Bar", + wantFunc: "transform", + }, + } + + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + tokens, source, lineStarts := tokenize(tt.code) + ctx := ExpressionAtCursor(tokens, source, lineStarts, tt.line, tt.col) + if ctx.ModuleRef != tt.wantMod { + t.Errorf("ModuleRef = %q, want %q", ctx.ModuleRef, tt.wantMod) + } + if ctx.FunctionName != tt.wantFunc { + t.Errorf("FunctionName = 
%q, want %q", ctx.FunctionName, tt.wantFunc) + } + }) + } +} + +func TestFullExpressionAtCursor(t *testing.T) { + code := " Foo.Bar.baz(123)" + tokens, source, lineStarts := tokenize(code) + + // Cursor on Foo — full returns entire chain + ctx := FullExpressionAtCursor(tokens, source, lineStarts, 0, 5) + if ctx.ModuleRef != "Foo.Bar" { + t.Errorf("ModuleRef = %q, want %q", ctx.ModuleRef, "Foo.Bar") + } + if ctx.FunctionName != "baz" { + t.Errorf("FunctionName = %q, want %q", ctx.FunctionName, "baz") + } + + // Truncated version should only return Foo + ctx2 := ExpressionAtCursor(tokens, source, lineStarts, 0, 5) + if ctx2.ModuleRef != "Foo" { + t.Errorf("truncated ModuleRef = %q, want %q", ctx2.ModuleRef, "Foo") + } + if ctx2.FunctionName != "" { + t.Errorf("truncated FunctionName = %q, want %q", ctx2.FunctionName, "") + } +} + +func TestExpressionAtCursor_ExprBounds(t *testing.T) { + code := " Foo.Bar.baz(123)" + tokens, source, lineStarts := tokenize(code) + + // Cursor on baz: exprStart should be at Foo (col 4), exprEnd after baz + ctx := ExpressionAtCursor(tokens, source, lineStarts, 0, 12) + if ctx.ExprStart != 4 { + t.Errorf("ExprStart = %d, want 4", ctx.ExprStart) + } + if ctx.ExprEnd != 15 { + t.Errorf("ExprEnd = %d, want 15", ctx.ExprEnd) + } + + // Cursor on Bar: exprStart at Foo (col 4), exprEnd after Bar + ctx2 := ExpressionAtCursor(tokens, source, lineStarts, 0, 9) + if ctx2.ExprStart != 4 { + t.Errorf("ExprStart = %d, want 4", ctx2.ExprStart) + } + if ctx2.ExprEnd != 11 { + t.Errorf("ExprEnd = %d, want 11", ctx2.ExprEnd) + } +} + +func TestCursorContext_Expr(t *testing.T) { + tests := []struct { + mod, fn, want string + }{ + {"Foo.Bar", "baz", "Foo.Bar.baz"}, + {"Foo.Bar", "", "Foo.Bar"}, + {"", "baz", "baz"}, + {"", "", ""}, + } + for _, tt := range tests { + ctx := CursorContext{ModuleRef: tt.mod, FunctionName: tt.fn} + if got := ctx.Expr(); got != tt.want { + t.Errorf("CursorContext{%q, %q}.Expr() = %q, want %q", tt.mod, tt.fn, got, tt.want) + } + 
} +} + +func TestExtractAliasBlockParent(t *testing.T) { + t.Run("cursor inside multi-line block", func(t *testing.T) { + text := `defmodule MyApp.Web do + alias MyApp.Services.{ + Accounts, + + } +end` + parent, ok := ExtractAliasBlockParent(strings.Split(text, "\n"), 3) + if !ok { + t.Fatal("expected to be inside alias block") + } + if parent != "MyApp.Services" { + t.Errorf("got %q, want MyApp.Services", parent) + } + }) + + t.Run("cursor on line with children", func(t *testing.T) { + text := `defmodule MyApp.Web do + alias MyApp.Services.{ + Accounts, + } +end` + parent, ok := ExtractAliasBlockParent(strings.Split(text, "\n"), 2) + if !ok { + t.Fatal("expected to be inside alias block") + } + if parent != "MyApp.Services" { + t.Errorf("got %q, want MyApp.Services", parent) + } + }) + + t.Run("cursor after closing brace", func(t *testing.T) { + text := `defmodule MyApp.Web do + alias MyApp.Services.{ + Accounts + } + +end` + _, ok := ExtractAliasBlockParent(strings.Split(text, "\n"), 4) + if ok { + t.Error("should not be inside alias block after closing brace") + } + }) + + t.Run("cursor on normal alias line", func(t *testing.T) { + text := `defmodule MyApp.Web do + alias MyApp.Repo + +end` + _, ok := ExtractAliasBlockParent(strings.Split(text, "\n"), 2) + if ok { + t.Error("should not be inside alias block on a normal line") + } + }) + + t.Run("cursor on same line as opening brace", func(t *testing.T) { + text := `defmodule MyApp.Web do + alias MyApp.Handlers.{ +end` + parent, ok := ExtractAliasBlockParent(strings.Split(text, "\n"), 1) + if !ok { + t.Fatal("expected to be inside alias block") + } + if parent != "MyApp.Handlers" { + t.Errorf("got %q, want MyApp.Handlers", parent) + } + }) + + t.Run("resolves __MODULE__ in parent", func(t *testing.T) { + text := `defmodule MyApp.HRIS do + alias __MODULE__.{ + Services, + + } +end` + parent, ok := ExtractAliasBlockParent(strings.Split(text, "\n"), 3) + if !ok { + t.Fatal("expected to be inside alias block") + } + 
if parent != "MyApp.HRIS" { + t.Errorf("got %q, want MyApp.HRIS", parent) + } + }) + + t.Run("single-line block with closing brace", func(t *testing.T) { + text := `defmodule MyApp.Web do + alias MyApp.{Accounts, Users} + +end` + _, ok := ExtractAliasBlockParent(strings.Split(text, "\n"), 1) + if ok { + t.Error("should not be inside alias block when braces close on same line") + } + }) + + t.Run("trailing brace on content line", func(t *testing.T) { + text := `defmodule MyApp.Web do + alias MyApp.Billing.{ + Services.MakePayment } +end` + parent, ok := ExtractAliasBlockParent(strings.Split(text, "\n"), 2) + if !ok { + t.Fatal("expected to be inside alias block when } follows module content") + } + if parent != "MyApp.Billing" { + t.Errorf("got %q, want MyApp.Billing", parent) + } + }) + + t.Run("blank lines between alias and cursor", func(t *testing.T) { + text := `defmodule MyApp.Web do + alias MyApp.Services.{ + Accounts, + + + } +end` + parent, ok := ExtractAliasBlockParent(strings.Split(text, "\n"), 4) + if !ok { + t.Fatal("expected to be inside alias block") + } + if parent != "MyApp.Services" { + t.Errorf("got %q, want MyApp.Services", parent) + } + }) + t.Run("missing close brace", func(t *testing.T) { + text := `defmodule MyApp.Web do + alias MyApp.Services.{ + Accounts, + + def foo do + # missing close brace + end +end` + lines := strings.Split(text, "\n") + // Unclosed `{`: user is still typing. We still want the parent for completion/hover + // on lines inside the block, and the forward scan must not walk the whole file + // looking for a `}` on the same line as `{` (regression guard for the line-bound + // scan in ExtractAliasBlockParent). 
+ for _, line := range []int{2, 3} { + parent, ok := ExtractAliasBlockParent(lines, line) + if !ok || parent != "MyApp.Services" { + t.Errorf("line %d: expected in block parent MyApp.Services, got %q, ok=%v", line, parent, ok) + } + } + parent, ok := ExtractAliasBlockParent(lines, 0) + if ok { + t.Errorf("line 0 (defmodule): expected not in alias block, got parent %q", parent) + } + }) +} + func TestExtractAliases(t *testing.T) { t.Run("simple alias", func(t *testing.T) { aliases := ExtractAliases(" alias MyApp.Repo") @@ -253,6 +538,94 @@ func TestExtractAliases(t *testing.T) { } }) + t.Run("multi-line alias with as on next line", func(t *testing.T) { + text := "defmodule MyApp.Web do\n alias MyApp.Helpers.Paginator,\n as: Pages\nend" + aliases := ExtractAliases(text) + if aliases["Pages"] != "MyApp.Helpers.Paginator" { + t.Errorf("Pages: got %q, want MyApp.Helpers.Paginator", aliases["Pages"]) + } + // Should NOT also register as a simple alias under the last segment + if _, ok := aliases["Paginator"]; ok { + t.Error("should not register simple alias Paginator when as: is on next line") + } + }) + + t.Run("multi-line alias with as and extra whitespace before comma", func(t *testing.T) { + text := "defmodule MyApp.Web do\n alias MyApp.Billing.Services.MakePayment ,\n as: MakePaymentNow\nend" + aliases := ExtractAliases(text) + if aliases["MakePaymentNow"] != "MyApp.Billing.Services.MakePayment" { + t.Errorf("MakePaymentNow: got %q, want MyApp.Billing.Services.MakePayment", aliases["MakePaymentNow"]) + } + if _, ok := aliases["MakePayment"]; ok { + t.Error("should not register simple alias MakePayment when as: is on next line") + } + }) + + t.Run("multi-line multi-alias with braces spanning lines", func(t *testing.T) { + text := "defmodule MyApp.Web do\n alias MyApp.Handlers.{\n Accounts,\n Users,\n Profiles\n }\nend" + aliases := ExtractAliases(text) + if aliases["Accounts"] != "MyApp.Handlers.Accounts" { + t.Errorf("Accounts: got %q, want MyApp.Handlers.Accounts", 
aliases["Accounts"]) + } + if aliases["Users"] != "MyApp.Handlers.Users" { + t.Errorf("Users: got %q, want MyApp.Handlers.Users", aliases["Users"]) + } + if aliases["Profiles"] != "MyApp.Handlers.Profiles" { + t.Errorf("Profiles: got %q, want MyApp.Handlers.Profiles", aliases["Profiles"]) + } + }) + + t.Run("multi-line multi-alias with comments inside", func(t *testing.T) { + text := "defmodule MyApp.Web do\n alias MyApp.Services.{\n Accounts,\n # Users is deprecated\n Profiles\n }\nend" + aliases := ExtractAliases(text) + if aliases["Accounts"] != "MyApp.Services.Accounts" { + t.Errorf("Accounts: got %q, want MyApp.Services.Accounts", aliases["Accounts"]) + } + if aliases["Profiles"] != "MyApp.Services.Profiles" { + t.Errorf("Profiles: got %q, want MyApp.Services.Profiles", aliases["Profiles"]) + } + if len(aliases) != 2 { + t.Errorf("expected 2 aliases, got %d: %v", len(aliases), aliases) + } + }) + + t.Run("multi-line multi-alias with multiple children per line", func(t *testing.T) { + text := "defmodule MyApp.Web do\n alias MyApp.Handlers.{\n Accounts, Users,\n Profiles\n }\nend" + aliases := ExtractAliases(text) + if aliases["Accounts"] != "MyApp.Handlers.Accounts" { + t.Errorf("Accounts: got %q, want MyApp.Handlers.Accounts", aliases["Accounts"]) + } + if aliases["Users"] != "MyApp.Handlers.Users" { + t.Errorf("Users: got %q, want MyApp.Handlers.Users", aliases["Users"]) + } + if aliases["Profiles"] != "MyApp.Handlers.Profiles" { + t.Errorf("Profiles: got %q, want MyApp.Handlers.Profiles", aliases["Profiles"]) + } + }) + + t.Run("multi-line multi-alias with trailing comma", func(t *testing.T) { + text := "defmodule MyApp.Web do\n alias MyApp.Handlers.{\n Accounts,\n Users,\n }\nend" + aliases := ExtractAliases(text) + if aliases["Accounts"] != "MyApp.Handlers.Accounts" { + t.Errorf("Accounts: got %q, want MyApp.Handlers.Accounts", aliases["Accounts"]) + } + if aliases["Users"] != "MyApp.Handlers.Users" { + t.Errorf("Users: got %q, want MyApp.Handlers.Users", 
aliases["Users"]) + } + if len(aliases) != 2 { + t.Errorf("expected 2 aliases, got %d: %v", len(aliases), aliases) + } + }) + + t.Run("multi-line alias bail-out on new statement", func(t *testing.T) { + text := "defmodule MyApp.Web do\n alias MyApp.Handlers.{\n Accounts,\n def foo, do: :ok\nend" + aliases := ExtractAliases(text) + // Key assertion: no alias for "foo" or anything weird — the def line must not be swallowed + if _, ok := aliases["foo"]; ok { + t.Error("should not register 'foo' as an alias") + } + }) + t.Run("partial __MODULE__ alias resolves in lookup", func(t *testing.T) { // Simulates: alias __MODULE__.Services -> Services = MyApp.HRIS.Services // Then a lookup for "Services.AssociateWithTeamV2" should resolve @@ -271,6 +644,17 @@ func TestExtractAliases(t *testing.T) { t.Errorf("got %q, want MyApp.HRIS.Services.AssociateWithTeamV2", full) } }) + + t.Run("alias on same line as defmodule do is not skipped", func(t *testing.T) { + // Regression: the for-loop post-increment skipped the first token after + // processModuleDef returned. On a single-line defmodule + alias, the + // alias token was missed. + text := "defmodule MyApp.Web do alias MyApp.Accounts\nend" + aliases := ExtractAliases(text) + if aliases["Accounts"] != "MyApp.Accounts" { + t.Errorf("Accounts: got %q, want MyApp.Accounts", aliases["Accounts"]) + } + }) } func TestExtractAliasesInScope(t *testing.T) { @@ -334,11 +718,37 @@ end } }) - t.Run("fn...end block does not break scope tracking", func(t *testing.T) { - // Regression: fn...end has an "end" without a corresponding "do", - // which caused the depth counter to go out of sync and pop the - // module scope prematurely. 
- fnSrc := `defmodule MyApp.Aggregator do + t.Run("defmodule with do on next line keeps alias in inner scope", func(t *testing.T) { + src := `defmodule MyApp.Outer do + defmodule Inner + do + alias MyApp.InnerOnly + def run, do: InnerOnly.call() + end + + def outer_run do + :ok + end +end +` + // Line 4 = inside Inner module body. + innerAliases := ExtractAliasesInScope(src, 4) + if innerAliases["InnerOnly"] != "MyApp.InnerOnly" { + t.Errorf("expected InnerOnly alias in inner scope, got %q", innerAliases["InnerOnly"]) + } + + // Line 7 = inside Outer after Inner ends. + outerAliases := ExtractAliasesInScope(src, 7) + if _, ok := outerAliases["InnerOnly"]; ok { + t.Error("InnerOnly alias should NOT leak to outer scope") + } + }) + + t.Run("fn...end block does not break scope tracking", func(t *testing.T) { + // Regression: fn...end has an "end" without a corresponding "do", + // which caused the depth counter to go out of sync and pop the + // module scope prematurely. + fnSrc := `defmodule MyApp.Aggregator do alias MyApp.Filters defp build_filter(:active, items) do @@ -447,6 +857,36 @@ end t.Errorf("expected Validator alias after trailing fn, got %q", aliases["Validator"]) } }) + t.Run("alias and require as on same line with semicolon", func(t *testing.T) { + // Regression: after `alias Mod, as: Name` / `require Mod, as: Name`, the token + // walker must resume past the value token (ScanKeywordOptionValue's nextPos) so the + // for-loop post-increment does not skip the next statement on the same line. + text := `defmodule MyApp.Outer do + alias MyApp.Foo, as: MyFoo; alias MyApp.Bar, as: MyBar + require MyApp.Baz, as: MyBaz; require MyApp.Qux, as: MyQux + + def call do + MyFoo.run() + MyBar.run() + MyBaz.ok() + MyQux.ok() + end +end` + // Line 4 is `def call do` — still inside Outer; aliases from lines 1–2 must be visible. 
+ aliases := ExtractAliasesInScope(text, 4) + if aliases["MyFoo"] != "MyApp.Foo" { + t.Errorf("MyFoo: got %q, want MyApp.Foo", aliases["MyFoo"]) + } + if aliases["MyBar"] != "MyApp.Bar" { + t.Errorf("MyBar: got %q, want MyApp.Bar", aliases["MyBar"]) + } + if aliases["MyBaz"] != "MyApp.Baz" { + t.Errorf("MyBaz: got %q, want MyApp.Baz", aliases["MyBaz"]) + } + if aliases["MyQux"] != "MyApp.Qux" { + t.Errorf("MyQux: got %q, want MyApp.Qux", aliases["MyQux"]) + } + }) } func TestExtractImports(t *testing.T) { @@ -492,6 +932,7 @@ func TestFindFunctionDefinition(t *testing.T) { end end` + tf := NewTokenizedFile(text) tests := []struct { name string functionName string @@ -507,7 +948,7 @@ end` for _, tt := range tests { t.Run(tt.name, func(t *testing.T) { - line, found := FindFunctionDefinition(text, tt.functionName) + line, found := tf.FindFunctionDefinition(tt.functionName) if found != tt.expectedFound { t.Errorf("found: got %v, want %v", found, tt.expectedFound) } @@ -524,12 +965,13 @@ func TestFindFunctionDefinition_Guards(t *testing.T) { defguardp is_active(user) when user.status == :active end` - line, found := FindFunctionDefinition(text, "is_admin") + tf := NewTokenizedFile(text) + line, found := tf.FindFunctionDefinition("is_admin") if !found || line != 2 { t.Errorf("is_admin: got line %d found %v", line, found) } - line, found = FindFunctionDefinition(text, "is_active") + line, found = tf.FindFunctionDefinition("is_active") if !found || line != 3 { t.Errorf("is_active: got line %d found %v", line, found) } @@ -540,7 +982,8 @@ func TestFindFunctionDefinition_Delegate(t *testing.T) { defdelegate fetch(id), to: MyApp.Repo end` - line, found := FindFunctionDefinition(text, "fetch") + tf := NewTokenizedFile(text) + line, found := tf.FindFunctionDefinition("fetch") if !found || line != 2 { t.Errorf("fetch: got line %d found %v", line, found) } @@ -552,25 +995,27 @@ func TestFindFunctionDefinition_InlineDo(t *testing.T) { defp secret(x), do: x * 2 end` - line, found := 
FindFunctionDefinition(text, "add") + tf := NewTokenizedFile(text) + line, found := tf.FindFunctionDefinition("add") if !found || line != 2 { t.Errorf("add: got line %d found %v", line, found) } - line, found = FindFunctionDefinition(text, "secret") + line, found = tf.FindFunctionDefinition("secret") if !found || line != 3 { t.Errorf("secret: got line %d found %v", line, found) } } -func TestExtractExpression_PipeOperator(t *testing.T) { - line := " |> Foo.Bar.transform()" - // col=12 is on 'a' of Bar → returns up to and including Bar - if got := ExtractExpression(line, 12); got != "Foo.Bar" { - t.Errorf("cursor on Bar: got %q, want %q", got, "Foo.Bar") +func TestExtractAliases_MultiAliasBraceUnexpectedTokenForwardProgress(t *testing.T) { + text := `defmodule MyApp.Web do + alias MyApp.{:unexpected, Accounts, 42, Users} +end` + aliases := ExtractAliases(text) + if aliases["Accounts"] != "MyApp.Accounts" { + t.Errorf("Accounts: got %q, want MyApp.Accounts", aliases["Accounts"]) } - // col=15 is on 't' of transform → returns full expression - if got := ExtractExpression(line, 15); got != "Foo.Bar.transform" { - t.Errorf("cursor on transform: got %q, want %q", got, "Foo.Bar.transform") + if aliases["Users"] != "MyApp.Users" { + t.Errorf("Users: got %q, want MyApp.Users", aliases["Users"]) } } @@ -599,25 +1044,30 @@ func TestExtractModuleAndFunction_QuestionMarkBang(t *testing.T) { } } -func TestExtractModuleAttribute(t *testing.T) { +func TestModuleAttributeAtCursor(t *testing.T) { tests := []struct { name string - line string + text string + line int col int expected string }{ - {"cursor on attr name", " tags: @open_api_shared_tags,", 18, "open_api_shared_tags"}, - {"cursor on @", " tags: @open_api_shared_tags,", 12, "open_api_shared_tags"}, - {"cursor at end of attr", " tags: @open_api_shared_tags,", 31, "open_api_shared_tags"}, - {"not on attr", " tags: :something,", 10, ""}, - {"standalone attr", " @endpoint_scopes %{", 4, "endpoint_scopes"}, + {"cursor on attr 
name", " tags: @open_api_shared_tags,", 0, 18, "open_api_shared_tags"}, + {"cursor on @", " tags: @open_api_shared_tags,", 0, 12, "open_api_shared_tags"}, + {"cursor at end of attr", " tags: @open_api_shared_tags,", 0, 31, "open_api_shared_tags"}, + {"not on attr", " tags: :something,", 0, 10, ""}, + {"standalone attr", " @endpoint_scopes %{", 0, 4, "endpoint_scopes"}, + {"inside string ignored", ` x = "has @fake_attr inside"`, 0, 14, ""}, + {"inside comment ignored", " # @fake_attr comment", 0, 5, ""}, + {"multiline second line", "first_line\n @my_attr value", 1, 5, "my_attr"}, } for _, tt := range tests { t.Run(tt.name, func(t *testing.T) { - got := ExtractModuleAttribute(tt.line, tt.col) + tf := NewTokenizedFile(tt.text) + got := tf.ModuleAttributeAtCursor(tt.line, tt.col) if got != tt.expected { - t.Errorf("ExtractModuleAttribute(%q, %d) = %q, want %q", tt.line, tt.col, got, tt.expected) + t.Errorf("ModuleAttributeAtCursor(%d, %d) = %q, want %q", tt.line, tt.col, got, tt.expected) } }) } @@ -666,6 +1116,21 @@ end` t.Error("expected not found for nonexistent attribute") } }) + + t.Run("does not treat attribute reference as definition", func(t *testing.T) { + refText := `defmodule MyApp.Worker do + def run(job) do + process(@my_attr) + @my_attr + :ok + end +end` + + _, found := FindModuleAttributeDefinition(refText, "my_attr") + if found { + t.Error("expected reference-only @my_attr to not be treated as a definition") + } + }) } func TestExtractCompletionContext(t *testing.T) { @@ -908,6 +1373,7 @@ func TestParseUsingBody_InlineDefArity(t *testing.T) { def zero_arity, do: :ok def one_arity(x), do: x def two_arity(x, y), do: x + y + def bitstring_param(<>), do: {header, rest} defmacro my_macro(ast), do: ast end end @@ -932,6 +1398,7 @@ end` check("zero_arity", 0, "def") check("one_arity", 1, "def") check("two_arity", 2, "def") + check("bitstring_param", 1, "def") check("my_macro", 1, "defmacro") } @@ -1171,6 +1638,43 @@ end` } }) + t.Run("Keyword.fetch! 
and Keyword.pop! bindings", func(t *testing.T) { + text := `defmodule MyLib do + defmacro __using__(opts) do + fetched = Keyword.fetch!(opts, :fetched_mod) + {popped, opts} = Keyword.pop!(opts, :popped_mod, DefaultMod) + + quote do + import unquote(fetched) + use unquote(popped) + end + end +end` + _, _, _, optBindings, _ := parseUsingBody(text) + foundFetch := false + foundPop := false + for _, b := range optBindings { + if b.optKey == "fetched_mod" && b.kind == "import" { + foundFetch = true + if b.defaultMod != "" { + t.Errorf("fetch! should have no default, got %q", b.defaultMod) + } + } + if b.optKey == "popped_mod" && b.kind == "use" { + foundPop = true + if b.defaultMod != "DefaultMod" { + t.Errorf("pop! default: want DefaultMod, got %q", b.defaultMod) + } + } + } + if !foundFetch { + t.Errorf("expected opt binding for fetched_mod (via fetch!), got %v", optBindings) + } + if !foundPop { + t.Errorf("expected opt binding for popped_mod (via pop!), got %v", optBindings) + } + }) + t.Run("use unquote(mod) with Keyword.get default", func(t *testing.T) { text := `defmodule MyLib do defmacro __using__(opts \\ []) do @@ -1257,6 +1761,28 @@ end` } }) + t.Run("two alias as on one line in quote", func(t *testing.T) { + // Same regression as ExtractAliasesInScope semicolon case, but through parseUsingBody + // (use-chain / __using__ extraction uses a separate loop with the same nextPos rule). 
+ text := `defmodule MyApp.Schema do + defmacro __using__(_opts) do + quote do + alias MyApp.Foo, as: MyFoo; alias MyApp.Bar, as: MyBar + end + end +end` + _, _, _, _, aliases := parseUsingBody(text) + if aliases == nil { + t.Fatal("expected aliases, got nil") + } + if aliases["MyFoo"] != "MyApp.Foo" { + t.Errorf("MyFoo: got %q, want MyApp.Foo", aliases["MyFoo"]) + } + if aliases["MyBar"] != "MyApp.Bar" { + t.Errorf("MyBar: got %q, want MyApp.Bar", aliases["MyBar"]) + } + }) + t.Run("alias resolved through file-level alias", func(t *testing.T) { text := `defmodule MyApp.Schema do alias Remote.Ecto.Schema, as: EctoSchema @@ -1283,60 +1809,258 @@ end` t.Errorf("expected nil aliases, got %v", aliases) } }) + + t.Run("multi alias with unexpected tokens does not hang", func(t *testing.T) { + text := `defmodule MyApp.Schema do + defmacro __using__(_opts) do + quote do + alias MyApp.{:unexpected, Repo, 42} + end + end +end` + _, _, _, _, aliases := parseUsingBody(text) + if aliases == nil || aliases["Repo"] != "MyApp.Repo" { + t.Errorf("Repo: got %q, want MyApp.Repo", aliases["Repo"]) + } + }) } -func TestParseKeywordModuleOpts(t *testing.T) { - tests := []struct { - name string - input string - aliases map[string]string - expected map[string]string - }{ - { - name: "single module opt", - input: "mod: Hammox", - expected: map[string]string{"mod": "Hammox"}, - }, - { - name: "multiple module opts", - input: "mod: Hammox, repo: MyRepo", - expected: map[string]string{"mod": "Hammox", "repo": "MyRepo"}, - }, - { - name: "alias resolved", - input: "mod: Hammox", - aliases: map[string]string{"Hammox": "MyApp.Hammox"}, - expected: map[string]string{"mod": "MyApp.Hammox"}, - }, - { - name: "non-module values ignored", - input: "mod: Hammox, async: true, queue: :default", - expected: map[string]string{"mod": "Hammox"}, - }, - { - name: "empty string", - input: "", - expected: map[string]string{}, - }, - { - name: "dotted module name", - input: "repo: MyApp.Oban.Repo", - expected: 
map[string]string{"repo": "MyApp.Oban.Repo"}, - }, +func TestParseUsingBody_IgnoresHelperCallsInsideInlineDefBodies(t *testing.T) { + text := `defmodule MyLib do + def helper_name(_opts) do + quote do + import SharedLib.Hidden + end + end + + defmacro __using__(opts) do + quote do + def run(value) do + helper_name(opts) + value + end + end + end +end` + + imported, _, _, _, _ := parseUsingBody(text) + if len(imported) != 0 { + t.Fatalf("expected no imports from helper call inside inline def body, got %v", imported) } - for _, tt := range tests { - t.Run(tt.name, func(t *testing.T) { - got := ParseKeywordModuleOpts(tt.input, tt.aliases) - if len(got) != len(tt.expected) { - t.Fatalf("ParseKeywordModuleOpts(%q) = %v, want %v", tt.input, got, tt.expected) +} + +func TestParseHelperQuoteBlock_IgnoresInlineDefBodies(t *testing.T) { + text := `defmodule MyLib do + def helper_name(_opts) do + quote do + def run(value) do + import SharedLib.Hidden + value + end + end + end +end` + + lines := strings.Split(text, "\n") + imported, _, _, _, _ := parseHelperQuoteBlock(lines, "helper_name", nil) + if len(imported) != 0 { + t.Fatalf("expected no imports from inside inline def body, got %v", imported) + } +} + +func TestParseHelperQuoteBlock_MultiAliasUnexpectedTokenForwardProgress(t *testing.T) { + text := `defmodule MyLib do + def build_aliases(_opts) do + quote do + alias MyApp.{:unexpected, Accounts, 42} + end + end +end` + lines := strings.Split(text, "\n") + _, _, _, _, aliases := parseHelperQuoteBlock(lines, "build_aliases", nil) + if aliases == nil || aliases["Accounts"] != "MyApp.Accounts" { + t.Errorf("Accounts: got %q, want MyApp.Accounts", aliases["Accounts"]) + } +} + +func TestParseUsingBody_HeredocModuledoc(t *testing.T) { + // Regression: moduledocs with code examples containing brackets that span + // multiple lines (e.g. multi-line keyword lists, markdown links) must not + // confuse the parser. 
Line-based joinBracketLines treats heredoc content as + // code, causing unmatched [ or ( on one line to join with all subsequent + // lines until the bracket closes — potentially swallowing defmacro __using__. + t.Run("import inside __using__ survives moduledoc with brackets", func(t *testing.T) { + text := `defmodule SharedLib.Pro.Workers.Chunk do + @moduledoc """ + Chunk workers execute jobs in groups based on a size or timeout option. + + ## Usage + + defmodule MyApp.ChunkWorker do + use SharedLib.Pro.Workers.Chunk, queue: :messages, size: 100 + end + + ## Options + + Options are passed as a keyword list: + + [ + by: :worker, + size: 100, + timeout: 1000 + ] + + The [return values](#t:result/0) are different from standard workers. + + See [the documentation](#module-options) for more details. + """ + + @type options :: [ + by: atom(), + size: pos_integer(), + timeout: pos_integer() + ] + + @doc false + defmacro __using__(opts) do + {chunk_opts, other_opts} = Keyword.split(opts, [:by, :size, :timeout]) + + quote do + use SharedLib.Pro.Worker, unquote(other_opts) + + alias SharedLib.Pro.Workers.Chunk + + @impl SharedLib.Worker + def new(args, opts) when is_map(args) and is_list(opts) do + super(args, opts) + end + + @impl SharedLib.Worker + def perform(%Job{} = job) do + :ok + end + end + end +end` + imports, inlineDefs, transUses, _, _ := parseUsingBody(text) + // The __using__ body has "use SharedLib.Pro.Worker" — should appear in transUses + found := false + for _, u := range transUses { + if u == "SharedLib.Pro.Worker" { + found = true } - for k, v := range tt.expected { - if got[k] != v { - t.Errorf("key %q: got %q, want %q", k, got[k], v) - } + } + if !found { + t.Errorf("expected SharedLib.Pro.Worker in transUses, got %v", transUses) + } + // Inline defs: new/2, perform/1 + if _, ok := inlineDefs["new"]; !ok { + t.Errorf("expected 'new' in inlineDefs, got keys: %v", mapKeys(inlineDefs)) + } + if _, ok := inlineDefs["perform"]; !ok { + t.Errorf("expected 
'perform' in inlineDefs, got keys: %v", mapKeys(inlineDefs)) + } + _ = imports + }) + + t.Run("full chain: import through __using__ with long moduledoc", func(t *testing.T) { + text := `defmodule SharedLib.Pro.Worker do + @moduledoc """ + The SharedLib.Pro.Worker is a replacement for SharedLib.Worker with expanded + capabilities such as encryption and output recording. + + ## Usage + + def MyApp.Worker do + use SharedLib.Pro.Worker + + @impl SharedLib.Pro.Worker + def process(%Job{} = job) do + :ok + end + end + + ## Encryption + + Workers can be encrypted by passing the ` + "`:encrypted`" + ` option: + + use SharedLib.Pro.Worker, + encrypted: [key: {MyApp.Config, :secret_key}] + + ## Hooks + + Lifecycle hooks are declared with the ` + "`:hooks`" + ` option: + + use SharedLib.Pro.Worker, + hooks: [ + on_start: &MyApp.Telemetry.worker_started/1, + on_complete: &MyApp.Telemetry.worker_completed/1 + ] + """ + + defmacro __using__(opts) do + {_hook_opts, other_opts} = Keyword.split(opts, [:hooks, :encrypted]) + + quote do + @behaviour SharedLib.Worker + @behaviour SharedLib.Pro.Worker + + import SharedLib.Pro.Worker, + only: [ + args_schema: 1, + field: 2, + field: 3, + embeds_one: 2, + embeds_one: 3 + ] + + alias SharedLib.{Job, Worker} + + def __opts__, do: unquote(other_opts) + end + end + + defmacro args_schema(do: block) do + quote do + Module.register_attribute(__MODULE__, :args_fields, accumulate: true) + unquote(block) + end + end + + defmacro field(name, type, opts \\ []) do + quote do + @args_fields {unquote(name), unquote(type), unquote(opts)} + end + end +end` + imports, inlineDefs, _, _, aliases := parseUsingBody(text) + // Should find the import + found := false + for _, imp := range imports { + if imp == "SharedLib.Pro.Worker" { + found = true } - }) + } + if !found { + t.Errorf("expected SharedLib.Pro.Worker in imports, got %v", imports) + } + // Should find inline def __opts__ + if _, ok := inlineDefs["__opts__"]; !ok { + t.Errorf("expected '__opts__' 
in inlineDefs, got keys: %v", mapKeys(inlineDefs)) + } + // Should find aliases + if aliases == nil || aliases["Job"] != "SharedLib.Job" { + t.Errorf("expected alias Job -> SharedLib.Job, got %v", aliases) + } + }) +} + +func mapKeys[V any](m map[string][]V) []string { + keys := make([]string, 0, len(m)) + for k := range m { + keys = append(keys, k) } + return keys } func TestExtractUsesWithOpts(t *testing.T) { @@ -1384,6 +2108,34 @@ func TestExtractUsesWithOpts(t *testing.T) { t.Errorf("alias not resolved: got %q", calls[0].Opts["mod"]) } }) + + t.Run("multiline opts", func(t *testing.T) { + text := "defmodule Foo do\n use Tool,\n name: \"mock\",\n controller: CompanyController,\n action: :show\nend" + calls := ExtractUsesWithOpts(text, nil) + if len(calls) != 1 { + t.Fatalf("expected 1 use call, got %d", len(calls)) + } + if calls[0].Module != "Tool" { + t.Errorf("module: want Tool, got %q", calls[0].Module) + } + if calls[0].Opts["controller"] != "CompanyController" { + t.Errorf("controller: want CompanyController, got %q", calls[0].Opts["controller"]) + } + }) + + t.Run("multiline opts with module values", func(t *testing.T) { + text := "defmodule Foo do\n use Remote.Mox,\n mod: Hammox,\n repo: MyRepo\nend" + calls := ExtractUsesWithOpts(text, nil) + if len(calls) != 1 { + t.Fatalf("expected 1 use call, got %d", len(calls)) + } + if calls[0].Opts["mod"] != "Hammox" { + t.Errorf("mod: want Hammox, got %q", calls[0].Opts["mod"]) + } + if calls[0].Opts["repo"] != "MyRepo" { + t.Errorf("repo: want MyRepo, got %q", calls[0].Opts["repo"]) + } + }) } func TestFindBufferFunctions(t *testing.T) { @@ -1453,7 +2205,7 @@ end` }) } -func TestExtractCallContext(t *testing.T) { +func TestCallContextAtCursor(t *testing.T) { tests := []struct { name string text string @@ -1462,6 +2214,7 @@ func TestExtractCallContext(t *testing.T) { wantArgIdx int wantOK bool }{ + // Parenthesized calls { name: "simple call first arg", text: "foo(x, y)", @@ -1499,7 +2252,7 @@ func 
TestExtractCallContext(t *testing.T) { wantOK: true, }, { - name: "multi-line", + name: "multi-line paren call", text: "defmodule MyApp do\n def run do\n foo(x,\n y)\n end\nend", line: 3, col: 6, @@ -1511,48 +2264,368 @@ func TestExtractCallContext(t *testing.T) { name: "not in call", text: "x = 1", line: 0, - col: 0, + col: 4, wantExpr: "", wantArgIdx: 0, wantOK: false, }, - } - - for _, tt := range tests { - t.Run(tt.name, func(t *testing.T) { - expr, argIdx, ok := ExtractCallContext(tt.text, tt.line, tt.col) - if ok != tt.wantOK { - t.Errorf("ok = %v, want %v", ok, tt.wantOK) - } - if expr != tt.wantExpr { - t.Errorf("expr = %q, want %q", expr, tt.wantExpr) - } - if argIdx != tt.wantArgIdx { - t.Errorf("argIdx = %d, want %d", argIdx, tt.wantArgIdx) - } - }) - } -} - -func TestFindBareFunctionCalls(t *testing.T) { - tests := []struct { - name string - text string - funcName string - want []int - }{ + // Paren-less calls { - name: "simple call", - text: "def foo do\n bar(x)\nend", - funcName: "bar", - want: []int{2}, + name: "no-paren qualified call first arg", + text: `IO.puts "hello"`, + line: 0, + col: 10, + wantExpr: "IO.puts", + wantArgIdx: 0, + wantOK: true, }, { - name: "keyword key shadows call on same line", - text: "def foo do\n %{resource_type: resource_type(x)}\nend", - funcName: "resource_type", - want: []int{2}, - }, + name: "no-paren bare call first arg", + text: `import MyApp.Repo`, + line: 0, + col: 10, + wantExpr: "import", + wantArgIdx: 0, + wantOK: true, + }, + { + name: "no-paren keyword if is not a call", + text: "if true do\n :ok\nend", + line: 0, + col: 5, + wantExpr: "", + wantArgIdx: 0, + wantOK: false, + }, + { + name: "no-paren two args second", + text: `Enum.each list, fun`, + line: 0, + col: 18, + wantExpr: "Enum.each", + wantArgIdx: 1, + wantOK: true, + }, + { + name: "no-paren inside string not matched", + text: `x = "foo bar"`, + line: 0, + col: 8, + wantExpr: "", + wantArgIdx: 0, + wantOK: false, + }, + // Edge cases: maps, nested 
calls, keyword lists, tuples + { + name: "no-paren map param cursor on key", + text: `IO.inspect %{a: 1, b: 2}`, + line: 0, + col: 15, + wantExpr: "IO.inspect", + wantArgIdx: 0, + wantOK: true, + }, + { + name: "no-paren map param cursor inside map after comma", + text: `IO.inspect %{a: 1, b: 2}`, + line: 0, + col: 20, + wantExpr: "IO.inspect", + wantArgIdx: 0, + wantOK: true, + }, + { + name: "no-paren with nested paren call as arg", + text: `IO.puts String.upcase("hi")`, + line: 0, + col: 23, + wantExpr: "String.upcase", + wantArgIdx: 0, + wantOK: true, + }, + { + name: "no-paren second arg is paren call", + text: `Enum.each list, Enum.count(other)`, + line: 0, + col: 30, + wantExpr: "Enum.count", + wantArgIdx: 0, + wantOK: true, + }, + { + name: "no-paren keyword list arg", + text: `plug :auth, only: [:index]`, + line: 0, + col: 20, + wantExpr: "plug", + wantArgIdx: 1, + wantOK: true, + }, + { + name: "no-paren tuple arg", + text: `send self(), {:ok, result}`, + line: 0, + col: 20, + wantExpr: "send", + wantArgIdx: 1, + wantOK: true, + }, + { + name: "no-paren with paren call as sole arg", + text: `IO.puts inspect(x)`, + line: 0, + col: 17, + wantExpr: "inspect", + wantArgIdx: 0, + wantOK: true, + }, + { + name: "cursor on func name of no-paren call", + text: `IO.puts "hello"`, + line: 0, + col: 5, + wantExpr: "IO.puts", + wantArgIdx: 0, + wantOK: true, + }, + // Pipe operator — cursor inside paren call on RHS + { + name: "pipe into paren call", + text: `list |> Enum.map(fn x -> x end)`, + line: 0, + col: 20, + wantExpr: "Enum.map", + wantArgIdx: 0, + wantOK: true, + }, + // Struct as argument + { + name: "no-paren struct arg", + text: `Repo.insert %User{name: "joe"}`, + line: 0, + col: 20, + wantExpr: "Repo.insert", + wantArgIdx: 0, + wantOK: true, + }, + // Multi-line no-paren (comma at end of prev line) + { + name: "multi-line no-paren call", + text: "use MyApp.Web,\n controllers: true", + line: 1, + col: 15, + wantExpr: "use", + wantArgIdx: 1, + wantOK: 
true, + }, + // Nested paren call inside paren call — cursor on outer's second arg + { + name: "nested paren call second arg of outer", + text: `Enum.reduce(list, %{}, fn x, acc -> acc end)`, + line: 0, + col: 18, + wantExpr: "Enum.reduce", + wantArgIdx: 1, + wantOK: true, + }, + // Sigil as argument to no-paren call + { + name: "no-paren sigil arg", + text: `Regex.compile ~r/foo/`, + line: 0, + col: 18, + wantExpr: "Regex.compile", + wantArgIdx: 0, + wantOK: true, + }, + // Guard clause — cursor inside is_integer(x) call + { + name: "inside guard call", + text: `def foo(x) when is_integer(x) do`, + line: 0, + col: 27, + wantExpr: "is_integer", + wantArgIdx: 0, + wantOK: true, + }, + // Capture operator — not a call + { + name: "capture operator not a call", + text: `&Enum.map/2`, + line: 0, + col: 5, + wantExpr: "", + wantArgIdx: 0, + wantOK: false, + }, + // Bare assignment — not a call + { + name: "assignment not a call", + text: `result = Enum.map(list, fun)`, + line: 0, + col: 3, + wantExpr: "", + wantArgIdx: 0, + wantOK: false, + }, + // Nested map inside paren call + { + name: "map inside paren call", + text: `Repo.insert(%{name: "joe", age: 30})`, + line: 0, + col: 25, + wantExpr: "Repo.insert", + wantArgIdx: 0, + wantOK: true, + }, + // Keyword list as last arg in paren call + { + name: "keyword list in paren call", + text: `GenServer.call(pid, :msg, timeout: 5000)`, + line: 0, + col: 35, + wantExpr: "GenServer.call", + wantArgIdx: 2, + wantOK: true, + }, + // Empty args — cursor right after open paren + { + name: "cursor right after open paren", + text: `foo()`, + line: 0, + col: 4, + wantExpr: "foo", + wantArgIdx: 0, + wantOK: true, + }, + // Cross-line: unrelated expression on previous line should not match + { + name: "unrelated line above not matched", + text: "result\n\"hello\"", + line: 1, + col: 3, + wantExpr: "", + wantArgIdx: 0, + wantOK: false, + }, + // def/defp — keyword not treated as call + { + name: "def keyword not a call", + text: `def 
foo(x) do`, + line: 0, + col: 8, + wantExpr: "foo", + wantArgIdx: 0, + wantOK: true, + }, + // Binary <<>> as argument + { + name: "no-paren binary arg", + text: `send pid, <<1, 2, 3>>`, + line: 0, + col: 15, + wantExpr: "send", + wantArgIdx: 1, + wantOK: true, + }, + // fn/end block as argument to paren call + { + name: "fn end block inside paren call", + text: "Enum.map(list, fn x -> x * 2 end)", + line: 0, + col: 25, + wantExpr: "Enum.map", + wantArgIdx: 1, + wantOK: true, + }, + // Cursor inside fn block body — enclosing call is Task.async + { + name: "cursor inside fn block of paren call", + text: "Task.async(fn ->\n heavy_work()\nend)", + line: 1, + col: 5, + wantExpr: "Task.async", + wantArgIdx: 0, + wantOK: true, + }, + // Pipe chain — cursor on rightmost call + { + name: "pipe chain paren call", + text: `list |> Enum.filter(fn x -> x > 0 end) |> Enum.map(fn x -> x * 2 end)`, + line: 0, + col: 55, + wantExpr: "Enum.map", + wantArgIdx: 0, + wantOK: true, + }, + // Anonymous function call var.(arg) + { + name: "anonymous function call", + text: `callback.(arg1, arg2)`, + line: 0, + col: 15, + wantExpr: "callback", + wantArgIdx: 1, + wantOK: true, + }, + // Nested keyword default — def foo(opts \\ [key: :val]) + { + name: "cursor inside default keyword list", + text: `def foo(opts \\ [key: :val]) do`, + line: 0, + col: 22, + wantExpr: "foo", + wantArgIdx: 0, + wantOK: true, + }, + // String interpolation as argument + { + name: "string interpolation arg", + text: `Logger.info("User #{name} logged in")`, + line: 0, + col: 20, + wantExpr: "Logger.info", + wantArgIdx: 0, + wantOK: true, + }, + } + + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + tf := NewTokenizedFile(tt.text) + expr, argIdx, ok := tf.CallContextAtCursor(tt.line, tt.col) + if ok != tt.wantOK { + t.Errorf("ok = %v, want %v", ok, tt.wantOK) + } + if expr != tt.wantExpr { + t.Errorf("expr = %q, want %q", expr, tt.wantExpr) + } + if argIdx != tt.wantArgIdx { + t.Errorf("argIdx = 
%d, want %d", argIdx, tt.wantArgIdx) + } + }) + } +} + +func TestFindBareFunctionCalls(t *testing.T) { + tests := []struct { + name string + text string + funcName string + want []int + }{ + { + name: "simple call", + text: "def foo do\n bar(x)\nend", + funcName: "bar", + want: []int{2}, + }, + { + name: "keyword key shadows call on same line", + text: "def foo do\n %{resource_type: resource_type(x)}\nend", + funcName: "resource_type", + want: []int{2}, + }, { name: "keyword key only, no call", text: "def foo do\n %{resource_type: :payroll}\nend", @@ -1630,3 +2703,578 @@ func TestExtractParamNames(t *testing.T) { }) } } + +func TestExtractAliasesInScope_AliasInString(t *testing.T) { + text := `defmodule MyApp.Foo do + def bar do + x = "alias MyApp.Helpers, as: H" + H.help() + end +end` + aliases := ExtractAliasesInScope(text, 3) + if _, ok := aliases["H"]; ok { + t.Error("should not extract alias from string content") + } +} + +func TestExtractAliasesInScope_AliasInHeredoc(t *testing.T) { + text := `defmodule MyApp.Foo do + @doc """ + alias MyApp.Helpers, as: H + """ + def bar do + H.help() + end +end` + aliases := ExtractAliasesInScope(text, 5) + if _, ok := aliases["H"]; ok { + t.Error("should not extract alias from heredoc content") + } +} + +func TestExtractAliasesInScope_MultilineAliasWithComment(t *testing.T) { + text := `defmodule MyApp.Foo do + alias MyApp.Helpers.Paginator, + # Short name for convenience + as: Pages + + def bar, do: Pages.paginate() +end` + aliases := ExtractAliasesInScope(text, 5) + if aliases["Pages"] != "MyApp.Helpers.Paginator" { + t.Errorf("expected Pages -> MyApp.Helpers.Paginator, got %q", aliases["Pages"]) + } +} + +func TestExtractAliasesInScope_NestedModuleScope(t *testing.T) { + text := `defmodule MyApp.Outer do + alias MyApp.Helpers + + defmodule Inner do + def bar, do: Helpers.help() + end +end` + outerAliases := ExtractAliasesInScope(text, 1) + innerAliases := ExtractAliasesInScope(text, 4) + + if outerAliases["Helpers"] != 
"MyApp.Helpers" { + t.Error("outer module should have the alias") + } + if _, ok := innerAliases["Helpers"]; ok { + t.Error("inner module should NOT inherit outer alias") + } +} + +func TestExtractAliasesInScope_MultilineBlockTrailingComma(t *testing.T) { + text := `defmodule MyApp.Web do + alias MyApp.{ + Accounts, + Users, + } + + def foo, do: Accounts.list() +end` + aliases := ExtractAliasesInScope(text, 6) + if aliases["Accounts"] != "MyApp.Accounts" { + t.Errorf("Accounts: got %q, want MyApp.Accounts", aliases["Accounts"]) + } + if aliases["Users"] != "MyApp.Users" { + t.Errorf("Users: got %q, want MyApp.Users", aliases["Users"]) + } +} + +func TestExtractEnclosingModuleFromTokens_NestedModules(t *testing.T) { + text := `defmodule MyApp.Outer do + defmodule Inner do + def run do + __MODULE__ + end + end + + def call do + __MODULE__ + end +end` + + tokens := parser.Tokenize([]byte(text)) + + inner := extractEnclosingModuleFromTokens([]byte(text), tokens, 3) + if inner != "MyApp.Outer.Inner" { + t.Errorf("inner: got %q, want MyApp.Outer.Inner", inner) + } + + outer := extractEnclosingModuleFromTokens([]byte(text), tokens, 7) + if outer != "MyApp.Outer" { + t.Errorf("outer: got %q, want MyApp.Outer", outer) + } +} + +func TestExtractEnclosingModuleFromTokens_DoesNotStealLaterDoFromInlineModule(t *testing.T) { + text := `defmodule MyApp.Outer do + defmodule Inline, do: nil + + def run do + __MODULE__ + end +end` + + tokens := parser.Tokenize([]byte(text)) + + enclosing := extractEnclosingModuleFromTokens([]byte(text), tokens, 4) + if enclosing != "MyApp.Outer" { + t.Errorf("got %q, want MyApp.Outer", enclosing) + } +} + +func TestExtractUsesWithOpts_StringContent(t *testing.T) { + text := `defmodule MyApp.Foo do + def bar do + x = "use Tool," + y = "name: mock" + end +end` + calls := ExtractUsesWithOpts(text, nil) + for _, c := range calls { + if c.Module == "Tool" { + t.Error("should not extract use from string content") + } + } +} + +func 
TestExtractAliasBlockParent_NotConfusedByMapBraces(t *testing.T) { + lines := strings.Split(`defmodule MyApp.Foo do + def bar do + map = %{ + key: "value" + } + end +end`, "\n") + _, inBlock := ExtractAliasBlockParent(lines, 3) + if inBlock { + t.Error("map literal brace should not be detected as alias block") + } +} + +func TestSkipToEndOfStatement_NegativeDepthClamp(t *testing.T) { + // Regression: skipToEndOfStatement would go negative on unmatched closing + // brackets, causing premature termination on the next TokEOL. + source := []byte("x = ) + y\nz = 1") + tokens := parser.Tokenize(source) + n := len(tokens) + + // Start at index 0; the ) at index 2 is unmatched. + // Without clamping, depth goes -1, and the function returns at the first EOL. + // With clamping, we should reach the EOL at the end of the first line normally. + endIdx := skipToEndOfStatement(tokens, n, 0) + + // We expect it to stop at the EOL after "y" (end of first statement) + if endIdx >= n { + t.Fatalf("expected endIdx < n, got %d", endIdx) + } + if tokens[endIdx].Kind != parser.TokEOL && tokens[endIdx].Kind != parser.TokEOF { + t.Errorf("expected TokEOL or TokEOF at endIdx, got %v", tokens[endIdx].Kind) + } +} + +func TestExtractEnclosingModule_DefprotocolAndDefimpl(t *testing.T) { + // Regression: extractEnclosingModuleFromTokens only handled TokDefmodule, + // missing TokDefprotocol and TokDefimpl. 
+	t.Run("defprotocol", func(t *testing.T) {
+		text := `defprotocol MyApp.Printable do
+  def print(data)
+end`
+		tokens := parser.Tokenize([]byte(text))
+		enclosing := extractEnclosingModuleFromTokens([]byte(text), tokens, 1)
+		if enclosing != "MyApp.Printable" {
+			t.Errorf("got %q, want MyApp.Printable", enclosing)
+		}
+	})
+
+	t.Run("defimpl", func(t *testing.T) {
+		text := `defimpl MyApp.Printable, for: MyApp.User do
+  def print(user), do: user.name
+end`
+		tokens := parser.Tokenize([]byte(text))
+		enclosing := extractEnclosingModuleFromTokens([]byte(text), tokens, 1)
+		if enclosing != "MyApp.Printable" {
+			t.Errorf("got %q, want MyApp.Printable", enclosing)
+		}
+	})
+}
+
+func TestExtractAliasesInScope_DefmoduleDoOnNextLine(t *testing.T) {
+	// Regression: when `do` appears on the next line after defmodule,
+	// the module frame was not properly pushed, causing aliases to leak.
+	text := `defmodule MyApp.Outer
+do
+  alias MyApp.OuterOnly
+
+  defmodule Inner do
+    alias MyApp.InnerOnly
+    def run, do: InnerOnly.call()
+  end
+
+  def outer_run, do: OuterOnly.call()
+end`
+
+	// Line 6 is inside Inner — should see InnerOnly but not OuterOnly
+	innerAliases := ExtractAliasesInScope(text, 6)
+	if innerAliases["InnerOnly"] != "MyApp.InnerOnly" {
+		t.Errorf("InnerOnly: got %q, want MyApp.InnerOnly", innerAliases["InnerOnly"])
+	}
+	if _, ok := innerAliases["OuterOnly"]; ok {
+		t.Error("OuterOnly should NOT be visible inside Inner")
+	}
+
+	// Line 9 is inside Outer after Inner ends — should see OuterOnly but not InnerOnly
+	outerAliases := ExtractAliasesInScope(text, 9)
+	if outerAliases["OuterOnly"] != "MyApp.OuterOnly" {
+		t.Errorf("OuterOnly: got %q, want MyApp.OuterOnly", outerAliases["OuterOnly"])
+	}
+	if _, ok := outerAliases["InnerOnly"]; ok {
+		t.Error("InnerOnly should NOT leak to outer scope")
+	}
+}
+
+func TestExtractAliases_MultiAliasUnexpectedTokensForwardProgress(t *testing.T) {
+	// Regression: collectModuleName returning ("", k) without advancing k
+	// caused infinite loops in multi-alias brace scanning.
+	// Note: we test atoms and numbers as unexpected tokens. Maps with braces
+	// are a separate edge case that may confuse brace depth tracking.
+	text := `defmodule MyApp.Web do
+  alias MyApp.{
+    :unexpected_atom,
+    Accounts,
+    123,
+    Users
+  }
+end`
+	aliases := ExtractAliases(text)
+
+	// Should extract valid module names despite unexpected tokens
+	if aliases["Accounts"] != "MyApp.Accounts" {
+		t.Errorf("Accounts: got %q, want MyApp.Accounts", aliases["Accounts"])
+	}
+	if aliases["Users"] != "MyApp.Users" {
+		t.Errorf("Users: got %q, want MyApp.Users", aliases["Users"])
+	}
+}
+
+func TestParseUsingBody_KeywordFetchAndPopBang(t *testing.T) {
+	// Regression: Keyword.fetch! and Keyword.pop! were not handled because
+	// the switch cases only checked "fetch" and "pop", not "fetch!" and "pop!".
+	text := `defmodule MyLib do
+  defmacro __using__(opts) do
+    required_mod = Keyword.fetch!(opts, :required_mod)
+    {optional_mod, opts} = Keyword.pop!(opts, :optional_mod, DefaultMod)
+
+    quote do
+      import unquote(required_mod)
+      use unquote(optional_mod)
+    end
+  end
+end`
+
+	_, _, _, optBindings, _ := parseUsingBody(text)
+
+	foundFetch := false
+	foundPop := false
+	for _, b := range optBindings {
+		if b.optKey == "required_mod" && b.kind == "import" {
+			foundFetch = true
+			if b.defaultMod != "" {
+				t.Errorf("fetch! should have no default, got %q", b.defaultMod)
+			}
+		}
+		if b.optKey == "optional_mod" && b.kind == "use" {
+			foundPop = true
+			if b.defaultMod != "DefaultMod" {
+				t.Errorf("pop! default: want DefaultMod, got %q", b.defaultMod)
+			}
+		}
+	}
+	if !foundFetch {
+		t.Errorf("expected opt binding for required_mod via fetch!, got %v", optBindings)
+	}
+	if !foundPop {
+		t.Errorf("expected opt binding for optional_mod via pop!, got %v", optBindings)
+	}
+}
+
+func TestFindModuleAttributeDefinition_StatementStartCheck(t *testing.T) {
+	// Regression: FindModuleAttributeDefinitionTokenized matched @attr used
+	// as a value reference (not at statement start), jumping to wrong locations.
+	text := `defmodule MyApp.Worker do
+  @config_value %{timeout: 5000}
+
+  def run(job) do
+    process(@config_value)
+    @config_value
+    :ok
+  end
+end`
+
+	line, found := FindModuleAttributeDefinition(text, "config_value")
+	if !found {
+		t.Fatal("expected to find @config_value definition")
+	}
+	// Should find line 2 (the actual definition), not lines 5 or 6 (references)
+	if line != 2 {
+		t.Errorf("expected definition at line 2, got line %d", line)
+	}
+}
+
+func TestCallContextNoParen_KeywordFilter(t *testing.T) {
+	// Regression: callContextNoParen didn't filter Elixir keywords like `if`,
+	// `case`, `cond`, `with`, causing them to be detected as function calls.
+	tests := []struct {
+		name   string
+		text   string
+		line   int
+		col    int
+		wantOK bool
+	}{
+		{"if is not a call", "if true do\n :ok\nend", 0, 5, false},
+		{"case is not a call", "case x do\n _ -> :ok\nend", 0, 6, false},
+		{"cond is not a call", "cond do\n true -> :ok\nend", 0, 3, false},
+		{"with is not a call", "with {:ok, x} <- foo() do\n x\nend", 0, 10, false},
+		{"unless is not a call", "unless false do\n :ok\nend", 0, 8, false},
+	}
+
+	for _, tt := range tests {
+		t.Run(tt.name, func(t *testing.T) {
+			tf := NewTokenizedFile(tt.text)
+			_, _, ok := tf.CallContextAtCursor(tt.line, tt.col)
+			if ok != tt.wantOK {
+				t.Errorf("got ok=%v, want ok=%v", ok, tt.wantOK)
+			}
+		})
+	}
+}
+
+// =============================================================================
+// Consistency tests - ensure similar functions handle edge cases the same way
+// =============================================================================
+
+// TestModuleScopeConsistency verifies that functions handling module scope
+// produce consistent results for tricky edge cases.
+func TestModuleScopeConsistency(t *testing.T) { + // These test cases have historically caused divergence between similar functions + testCases := []struct { + name string + text string + innerLine int // line inside inner module + outerLine int // line inside outer module (after inner ends) + wantInnerMod string + wantOuterMod string + wantInnerAlias string // alias only visible in inner + wantOuterAlias string // alias only visible in outer + }{ + { + name: "nested modules basic", + text: `defmodule MyApp.Outer do + alias MyApp.OuterOnly + + defmodule Inner do + alias MyApp.InnerOnly + def run, do: :ok + end + + def call, do: :ok +end`, + innerLine: 5, + outerLine: 8, + wantInnerMod: "MyApp.Outer.Inner", + wantOuterMod: "MyApp.Outer", + wantInnerAlias: "InnerOnly", + wantOuterAlias: "OuterOnly", + }, + { + name: "do on next line", + text: `defmodule MyApp.Outer +do + alias MyApp.OuterOnly + + defmodule Inner + do + alias MyApp.InnerOnly + def run, do: :ok + end + + def call, do: :ok +end`, + innerLine: 7, + outerLine: 10, + wantInnerMod: "MyApp.Outer.Inner", + wantOuterMod: "MyApp.Outer", + wantInnerAlias: "InnerOnly", + wantOuterAlias: "OuterOnly", + }, + { + name: "defprotocol and defimpl", + text: `defprotocol MyApp.Printable do + alias MyApp.ProtoOnly + def print(data) +end + +defimpl MyApp.Printable, for: MyApp.User do + alias MyApp.ImplOnly + def print(user), do: user.name +end`, + innerLine: 2, // inside protocol + outerLine: 7, // inside impl + wantInnerMod: "MyApp.Printable", + wantOuterMod: "MyApp.Printable", + wantInnerAlias: "ProtoOnly", + wantOuterAlias: "ImplOnly", + // Note: protocol and impl are separate top-level constructs, + // so their aliases don't leak to each other (both are "" for the other) + }, + { + name: "fn...end does not break scope", + text: `defmodule MyApp.Worker do + alias MyApp.Helper + + def run do + handler = fn x -> + x * 2 + end + handler.(1) + end + + def other, do: Helper.call() +end`, + innerLine: 5, // inside fn + 
outerLine: 10, // after fn ends + wantInnerMod: "MyApp.Worker", + wantOuterMod: "MyApp.Worker", + wantInnerAlias: "Helper", + wantOuterAlias: "Helper", + }, + } + + for _, tc := range testCases { + t.Run(tc.name, func(t *testing.T) { + // Test extractEnclosingModuleFromTokens + source := []byte(tc.text) + tokens := parser.Tokenize(source) + + innerMod := extractEnclosingModuleFromTokens(source, tokens, tc.innerLine) + if innerMod != tc.wantInnerMod { + t.Errorf("enclosing module at inner line %d: got %q, want %q", + tc.innerLine, innerMod, tc.wantInnerMod) + } + + outerMod := extractEnclosingModuleFromTokens(source, tokens, tc.outerLine) + if outerMod != tc.wantOuterMod { + t.Errorf("enclosing module at outer line %d: got %q, want %q", + tc.outerLine, outerMod, tc.wantOuterMod) + } + + // Test ExtractAliasesInScope + innerAliases := ExtractAliasesInScope(tc.text, tc.innerLine) + if tc.wantInnerAlias != "" { + if _, ok := innerAliases[tc.wantInnerAlias]; !ok { + t.Errorf("inner line %d: expected alias %q not found, got %v", + tc.innerLine, tc.wantInnerAlias, innerAliases) + } + } + // Only check alias leakage if inner and outer are different scopes + // (same module name means they're in the same scope or separate top-level modules) + if tc.wantOuterAlias != "" && tc.wantOuterAlias != tc.wantInnerAlias && tc.wantInnerMod != tc.wantOuterMod { + if _, ok := innerAliases[tc.wantOuterAlias]; ok { + t.Errorf("inner line %d: outer alias %q should not be visible", + tc.innerLine, tc.wantOuterAlias) + } + } + + outerAliases := ExtractAliasesInScope(tc.text, tc.outerLine) + if tc.wantOuterAlias != "" { + if _, ok := outerAliases[tc.wantOuterAlias]; !ok { + t.Errorf("outer line %d: expected alias %q not found, got %v", + tc.outerLine, tc.wantOuterAlias, outerAliases) + } + } + // Only check alias leakage if inner and outer are different scopes + if tc.wantInnerAlias != "" && tc.wantInnerAlias != tc.wantOuterAlias && tc.wantInnerMod != tc.wantOuterMod { + if _, ok := 
outerAliases[tc.wantInnerAlias]; ok { + t.Errorf("outer line %d: inner alias %q should not be visible", + tc.outerLine, tc.wantInnerAlias) + } + } + }) + } +} + +// TestDepthTrackingConsistency verifies that all depth-tracking code handles +// edge cases consistently (especially negative depth clamping). +func TestDepthTrackingConsistency(t *testing.T) { + // Code with unmatched brackets at start (simulates cursor mid-expression) + testCases := []struct { + name string + text string + line int + wantOK bool // should not crash or return garbage + }{ + { + name: "unmatched close paren", + text: ") + foo(x)\nbar()", + line: 1, + wantOK: true, + }, + { + name: "unmatched close bracket", + text: "] ++ list\nother()", + line: 1, + wantOK: true, + }, + { + name: "unmatched end", + text: "end\ndef foo, do: :ok", + line: 1, + wantOK: true, + }, + { + name: "deeply nested then unmatched", + text: "foo(bar([{x}]))\n))]}\nvalid()", + line: 2, + wantOK: true, + }, + } + + for _, tc := range testCases { + t.Run(tc.name, func(t *testing.T) { + // These should not panic + defer func() { + if r := recover(); r != nil { + t.Errorf("panic on %q: %v", tc.name, r) + } + }() + + // Test various functions that track depth + source := []byte(tc.text) + tokens := parser.Tokenize(source) + + // extractEnclosingModuleFromTokens + _ = extractEnclosingModuleFromTokens(source, tokens, tc.line) + + // ExtractAliasesInScope + _ = ExtractAliasesInScope(tc.text, tc.line) + + // skipToEndOfStatement + if len(tokens) > 0 { + _ = skipToEndOfStatement(tokens, len(tokens), 0) + } + + // TokenWalker + w := parser.NewTokenWalker(source, tokens) + for w.More() { + w.Advance() + } + if w.Depth() < 0 || w.BlockDepth() < 0 { + t.Errorf("TokenWalker depth went negative: depth=%d blockDepth=%d", + w.Depth(), w.BlockDepth()) + } + }) + } +} diff --git a/internal/lsp/hover.go b/internal/lsp/hover.go index b0cdedb..104d737 100644 --- a/internal/lsp/hover.go +++ b/internal/lsp/hover.go @@ -5,7 +5,6 @@ import ( 
"go.lsp.dev/protocol" - "github.com/remoteoss/dexter/internal/parser" "github.com/remoteoss/dexter/internal/store" ) @@ -22,14 +21,15 @@ func (s *Server) hoverFromFile(function string, result store.LookupResult) (*pro return nil, nil } + tf := NewTokenizedFile(text) var doc, spec, signature string if function == "" { - doc = extractModuledoc(lines, defIdx) + doc = tf.ExtractModuledoc(defIdx) signature = strings.TrimSpace(lines[defIdx]) signature = strings.TrimSuffix(signature, " do") } else { - doc, spec = extractDocAbove(lines, defIdx) + doc, spec = tf.ExtractDocAbove(defIdx) signature = extractSignature(lines, defIdx) } @@ -46,9 +46,9 @@ func (s *Server) hoverFromFile(function string, result store.LookupResult) (*pro }, nil } -func (s *Server) hoverFromBuffer(text string, defIdx int) (*protocol.Hover, error) { +func (s *Server) hoverFromBuffer(tf *TokenizedFile, text string, defIdx int) (*protocol.Hover, error) { lines := strings.Split(text, "\n") - doc, spec := extractDocAbove(lines, defIdx) + doc, spec := tf.ExtractDocAbove(defIdx) signature := extractSignature(lines, defIdx) content := formatHoverContent(doc, spec, signature) @@ -74,146 +74,6 @@ func extractSignature(lines []string, defIdx int) string { return sig } -// extractDocAbove scans the region above a function definition to find the -// @doc content and @spec that precede it. -func extractDocAbove(lines []string, defIdx int) (doc, spec string) { - // Scan backward to find the previous function/module boundary so we don't - // have to process the entire file — the relevant doc block is always between - // the previous definition and this one. We must skip heredoc content so that - // example code inside @doc blocks (e.g. "defmodule MyApp.Worker do") doesn't - // get mistaken for a real boundary. 
-	start := 0
-	inHeredocBack := false
-	for i := defIdx - 1; i >= 0; i-- {
-		trimmed := strings.TrimSpace(lines[i])
-		if !inHeredocBack && (trimmed == `"""` || trimmed == `'''`) {
-			inHeredocBack = true
-			continue
-		}
-		if inHeredocBack {
-			if strings.HasSuffix(trimmed, `"""`) || strings.HasSuffix(trimmed, `'''`) {
-				inHeredocBack = false
-			}
-			continue
-		}
-		if parser.FuncDefRe.MatchString(lines[i]) || parser.DefmoduleRe.MatchString(lines[i]) || parser.TypeDefRe.MatchString(lines[i]) {
-			start = i + 1
-			break
-		}
-	}
-
-	var currentDoc string
-	var currentSpec []string
-	inDocHeredoc := false
-	var docLines []string
-	inSpecBlock := false
-
-	for i := start; i < defIdx; i++ {
-		trimmed := strings.TrimSpace(lines[i])
-
-		if inDocHeredoc {
-			if trimmed == `"""` {
-				inDocHeredoc = false
-				currentDoc = dedentBlock(docLines)
-				docLines = nil
-			} else {
-				docLines = append(docLines, lines[i])
-			}
-			continue
-		}
-
-		if inSpecBlock {
-			if trimmed == "" || strings.HasPrefix(trimmed, "@") || strings.HasPrefix(trimmed, "def") {
-				inSpecBlock = false
-			} else {
-				currentSpec = append(currentSpec, lines[i])
-				continue
-			}
-		}
-
-		if trimmed == `@doc """` || trimmed == `@doc ~S"""` || trimmed == `@doc ~s"""` ||
-			trimmed == `@typedoc """` || trimmed == `@typedoc ~S"""` || trimmed == `@typedoc ~s"""` {
-			inDocHeredoc = true
-			docLines = nil
-			continue
-		}
-
-		if strings.HasPrefix(trimmed, `@doc "`) {
-			currentDoc = extractQuotedString(trimmed[5:])
-			continue
-		}
-
-		if strings.HasPrefix(trimmed, `@typedoc "`) {
-			currentDoc = extractQuotedString(trimmed[9:])
-			continue
-		}
-
-		if trimmed == "@doc false" || trimmed == "@typedoc false" {
-			currentDoc = ""
-			continue
-		}
-
-		if strings.HasPrefix(trimmed, "@spec ") {
-			currentSpec = []string{lines[i]}
-			inSpecBlock = true
-			continue
-		}
-
-		if parser.FuncDefRe.MatchString(lines[i]) || parser.DefmoduleRe.MatchString(lines[i]) || parser.TypeDefRe.MatchString(lines[i]) {
-			currentDoc = ""
-			currentSpec = nil
-		}
-	}
-
-	if len(currentSpec) > 0 {
-		spec = strings.TrimSpace(strings.Join(currentSpec, "\n"))
-	}
-
-	return currentDoc, spec
-}
-
-// extractModuledoc scans forward from a defmodule line to find the @moduledoc content.
-func extractModuledoc(lines []string, moduleIdx int) string {
-	for i := moduleIdx + 1; i < len(lines); i++ {
-		trimmed := strings.TrimSpace(lines[i])
-
-		if trimmed == "" {
-			continue
-		}
-
-		if trimmed == `@moduledoc """` || trimmed == `@moduledoc ~S"""` || trimmed == `@moduledoc ~s"""` {
-			var docLines []string
-			for j := i + 1; j < len(lines); j++ {
-				if strings.TrimSpace(lines[j]) == `"""` {
-					return dedentBlock(docLines)
-				}
-				docLines = append(docLines, lines[j])
-			}
-			return ""
-		}
-
-		if strings.HasPrefix(trimmed, `@moduledoc "`) {
-			return extractQuotedString(trimmed[len("@moduledoc "):])
-		}
-
-		if trimmed == "@moduledoc false" {
-			return ""
-		}
-
-		if strings.HasPrefix(trimmed, "use ") || strings.HasPrefix(trimmed, "import ") ||
-			strings.HasPrefix(trimmed, "alias ") || strings.HasPrefix(trimmed, "require ") ||
-			strings.HasPrefix(trimmed, "@") || strings.HasPrefix(trimmed, "#") {
-			continue
-		}
-
-		if strings.HasPrefix(trimmed, "def") || trimmed == "end" {
-			break
-		}
-	}
-
-	return ""
-}
-
 func extractQuotedString(s string) string {
 	if len(s) < 2 || s[0] != '"' {
 		return ""
diff --git a/internal/lsp/hover_test.go b/internal/lsp/hover_test.go
index 2f8b336..1df8cac 100644
--- a/internal/lsp/hover_test.go
+++ b/internal/lsp/hover_test.go
@@ -20,8 +20,8 @@ func TestExtractDocAbove_Heredoc(t *testing.T) {
     :ok
   end
 end`
-	lines := strings.Split(src, "\n")
-	doc, spec := extractDocAbove(lines, 6)
+
+	doc, spec := NewTokenizedFile(src).ExtractDocAbove(6)
 
 	if doc == "" {
 		t.Fatal("expected doc, got empty")
@@ -44,8 +44,8 @@ func TestExtractDocAbove_SingleLine(t *testing.T) {
     :ok
   end
 end`
-	lines := strings.Split(src, "\n")
-	doc, _ := extractDocAbove(lines, 2)
+
+	doc, _ := NewTokenizedFile(src).ExtractDocAbove(2)
 
 	if doc != "Creates a new user." {
 		t.Errorf("expected 'Creates a new user.', got %q", doc)
@@ -62,8 +62,8 @@ func TestExtractDocAbove_WithSpec(t *testing.T) {
     :ok
   end
 end`
-	lines := strings.Split(src, "\n")
-	doc, spec := extractDocAbove(lines, 5)
+
+	doc, spec := NewTokenizedFile(src).ExtractDocAbove(5)
 
 	if !strings.Contains(doc, "Creates a new user") {
 		t.Errorf("expected doc content, got %q", doc)
@@ -81,8 +81,8 @@ func TestExtractDocAbove_MultiLineSpec(t *testing.T) {
     :ok
   end
 end`
-	lines := strings.Split(src, "\n")
-	_, spec := extractDocAbove(lines, 3)
+
+	_, spec := NewTokenizedFile(src).ExtractDocAbove(3)
 
 	if !strings.Contains(spec, "@spec create") {
 		t.Errorf("expected spec to contain '@spec create', got %q", spec)
@@ -99,8 +99,8 @@ func TestExtractDocAbove_DocFalse(t *testing.T) {
     :ok
   end
 end`
-	lines := strings.Split(src, "\n")
-	doc, _ := extractDocAbove(lines, 2)
+
+	doc, _ := NewTokenizedFile(src).ExtractDocAbove(2)
 
 	if doc != "" {
 		t.Errorf("expected empty doc for @doc false, got %q", doc)
@@ -113,8 +113,8 @@ func TestExtractDocAbove_NoDoc(t *testing.T) {
     :ok
   end
 end`
-	lines := strings.Split(src, "\n")
-	doc, spec := extractDocAbove(lines, 1)
+
+	doc, spec := NewTokenizedFile(src).ExtractDocAbove(1)
 
 	if doc != "" {
 		t.Errorf("expected no doc, got %q", doc)
@@ -137,8 +137,8 @@ func TestExtractDocAbove_DoesNotLeakFromPreviousFunction(t *testing.T) {
     :ok
   end
 end`
-	lines := strings.Split(src, "\n")
-	doc, _ := extractDocAbove(lines, 8)
+
+	doc, _ := NewTokenizedFile(src).ExtractDocAbove(8)
 
 	if doc != "" {
 		t.Errorf("expected no doc for second function, got %q", doc)
@@ -153,8 +153,8 @@ func TestExtractDocAbove_SpecBeforeDoc(t *testing.T) {
     :ok
   end
 end`
-	lines := strings.Split(src, "\n")
-	doc, spec := extractDocAbove(lines, 3)
+
+	doc, spec := NewTokenizedFile(src).ExtractDocAbove(3)
 
 	if doc != "Creates a user." {
 		t.Errorf("expected doc, got %q", doc)
@@ -176,8 +176,8 @@ func TestExtractModuledoc_Heredoc(t *testing.T) {
     :ok
   end
 end`
-	lines := strings.Split(src, "\n")
-	doc := extractModuledoc(lines, 0)
+
+	doc := NewTokenizedFile(src).ExtractModuledoc(0)
 
 	if !strings.Contains(doc, "Manages user accounts") {
 		t.Errorf("expected moduledoc content, got %q", doc)
@@ -195,22 +195,108 @@ func TestExtractModuledoc_SingleLine(t *testing.T) {
     :ok
   end
 end`
-	lines := strings.Split(src, "\n")
-	doc := extractModuledoc(lines, 0)
+
+	doc := NewTokenizedFile(src).ExtractModuledoc(0)
 
 	if doc != "Manages user accounts." {
 		t.Errorf("expected 'Manages user accounts.', got %q", doc)
 	}
 }
 
+func TestExtractModuledoc_SigilHeredoc(t *testing.T) {
+	src := `defmodule MyApp.Users do
+  @moduledoc ~S"""
+  Manages user accounts.
+
+  Keep #{interpolation} literal.
+  """
+
+  def create(attrs) do
+    :ok
+  end
+end`
+
+	doc := NewTokenizedFile(src).ExtractModuledoc(0)
+
+	if !strings.Contains(doc, "Manages user accounts.") {
+		t.Errorf("expected moduledoc content, got %q", doc)
+	}
+	if !strings.Contains(doc, "#{interpolation}") {
+		t.Errorf("expected sigil heredoc content, got %q", doc)
+	}
+}
+
+func TestExtractModuledoc_SigilVariants(t *testing.T) {
+	tests := []struct {
+		name string
+		src  string
+		want string
+	}{
+		{
+			name: "raw pipe delimiter",
+			src: `defmodule MyApp.Users do
+  @moduledoc ~S|Pipe-delimited docs|
+end`,
+			want: "Pipe-delimited docs",
+		},
+		{
+			name: "escaped paren delimiter",
+			src: `defmodule MyApp.Users do
+  @moduledoc ~s(Paren docs)
+end`,
+			want: "Paren docs",
+		},
+		{
+			name: "single quote delimiter",
+			src: `defmodule MyApp.Users do
+  @moduledoc ~s'Single-quoted docs'
+end`,
+			want: "Single-quoted docs",
+		},
+		{
+			name: "delimiter with modifier",
+			src: `defmodule MyApp.Users do
+  @moduledoc ~s|Docs with modifier|m
+end`,
+			want: "Docs with modifier",
+		},
+		{
+			name: "multi-char uppercase sigil",
+			src: `defmodule MyApp.Users do
+  @moduledoc ~HTML|HTML-like docs|
+end`,
+			want: "HTML-like docs",
+		},
+		{
+			name: "single quote heredoc",
+			src: `defmodule MyApp.Users do
+  @moduledoc ~S'''
+  Multi-line docs.
+  Keep #{raw} literal.
+  '''
+end`,
+			want: "Multi-line docs.\nKeep #{raw} literal.",
+		},
+	}
+
+	for _, tt := range tests {
+		t.Run(tt.name, func(t *testing.T) {
+			doc := NewTokenizedFile(tt.src).ExtractModuledoc(0)
+			if doc != tt.want {
+				t.Errorf("ExtractModuledoc() = %q, want %q", doc, tt.want)
+			}
+		})
+	}
+}
+
 func TestExtractModuledoc_False(t *testing.T) {
 	src := `defmodule MyApp.Internal do
   @moduledoc false
 
   def helper(x), do: x
 end`
-	lines := strings.Split(src, "\n")
-	doc := extractModuledoc(lines, 0)
+
+	doc := NewTokenizedFile(src).ExtractModuledoc(0)
 
 	if doc != "" {
 		t.Errorf("expected empty doc for @moduledoc false, got %q", doc)
@@ -230,8 +316,8 @@ func TestExtractModuledoc_AfterUseAndAlias(t *testing.T) {
     Repo.all(User)
   end
 end`
-	lines := strings.Split(src, "\n")
-	doc := extractModuledoc(lines, 0)
+
+	doc := NewTokenizedFile(src).ExtractModuledoc(0)
 
 	if !strings.Contains(doc, "Users context module") {
 		t.Errorf("expected moduledoc after use/alias, got %q", doc)
@@ -244,8 +330,8 @@ func TestExtractModuledoc_None(t *testing.T) {
     :ok
   end
 end`
-	lines := strings.Split(src, "\n")
-	doc := extractModuledoc(lines, 0)
+
+	doc := NewTokenizedFile(src).ExtractModuledoc(0)
 
 	if doc != "" {
 		t.Errorf("expected no moduledoc, got %q", doc)
@@ -574,11 +660,10 @@ func TestHover_TypeDocNotLeakingAcrossTypes(t *testing.T) {
   @type first :: integer()
   @type second :: string()
 end`
-	lines := strings.Split(src, "\n")
 
 	// extractDocAbove for second (line index 3) should find no doc —
 	// the @typedoc belongs to first, not second.
-	doc, _ := extractDocAbove(lines, 3)
+	doc, _ := NewTokenizedFile(src).ExtractDocAbove(3)
 	if doc != "" {
 		t.Errorf("expected no doc for second type, got %q", doc)
 	}
@@ -960,8 +1045,8 @@ func TestHover_SigilHeredoc(t *testing.T) {
     :ok
   end
 end`
-	lines := strings.Split(src, "\n")
-	doc, _ := extractDocAbove(lines, 6)
+
+	doc, _ := NewTokenizedFile(src).ExtractDocAbove(6)
 
 	if !strings.Contains(doc, "Creates a user") {
 		t.Errorf("expected doc from sigil heredoc, got %q", doc)
@@ -1026,3 +1111,119 @@ end`
 		t.Errorf("expected submodule in hover, got %q", hover.Contents.Value)
 	}
 }
+
+func TestHover_MultiLineAliasBlock(t *testing.T) {
+	server, cleanup := setupTestServer(t)
+	defer cleanup()
+
+	indexFile(t, server.store, server.projectRoot, "lib/accounts.ex", `defmodule MyApp.Accounts do
+  @moduledoc "The Accounts context."
+  def list, do: []
+end
+`)
+
+	indexFile(t, server.store, server.projectRoot, "lib/users.ex", `defmodule MyApp.Users do
+  @moduledoc "User management."
+  def get(id), do: nil
+end
+`)
+
+	src := `defmodule MyApp.Web do
+  alias MyApp.{
+    Accounts,
+    Users
+  }
+
+  def run, do: Accounts.list()
+end`
+	uri := "file:///test.ex"
+	server.docs.Set(uri, src)
+
+	// Hover on "Accounts" inside the multi-line alias block (line 2, col 4)
+	hover := hoverAt(t, server, uri, 2, 4)
+	if hover == nil {
+		t.Fatal("expected hover for Accounts inside multi-line alias block")
+	}
+	if !strings.Contains(hover.Contents.Value, "The Accounts context") {
+		t.Errorf("expected moduledoc in hover, got %q", hover.Contents.Value)
+	}
+
+	// Hover on "Users" inside the multi-line alias block (line 3, col 4)
+	hover = hoverAt(t, server, uri, 3, 4)
+	if hover == nil {
+		t.Fatal("expected hover for Users inside multi-line alias block")
+	}
+	if !strings.Contains(hover.Contents.Value, "User management") {
+		t.Errorf("expected moduledoc in hover, got %q", hover.Contents.Value)
+	}
+
+	// Trailing brace on content line: alias MyApp.{ Users }
+	src2 := `defmodule MyApp.Web do
+  alias MyApp.{
+    Accounts }
+end`
+	uri2 := "file:///test2.ex"
+	server.docs.Set(uri2, src2)
+
+	hover = hoverAt(t, server, uri2, 2, 6)
+	if hover == nil {
+		t.Fatal("expected hover for module on line with trailing brace")
+	}
+	if !strings.Contains(hover.Contents.Value, "The Accounts context") {
+		t.Errorf("expected moduledoc in hover, got %q", hover.Contents.Value)
+	}
+}
+
+func TestHover_UseInjectedMultilineAlias(t *testing.T) {
+	server, cleanup := setupTestServer(t)
+	defer cleanup()
+
+	indexFile(t, server.store, server.projectRoot, "lib/helpers.ex", `defmodule MyApp.Helpers do
+  @moduledoc "Helper utilities."
+  def help, do: :ok
+end
+`)
+
+	// Module with __using__ that has a multiline alias ... as:
+	// AND uses the alias in an import within the same quote block.
+	indexFile(t, server.store, server.projectRoot, "lib/base.ex", `defmodule MyApp.Base do
+  defmacro __using__(_opts) do
+    quote do
+      alias MyApp.Helpers,
+        as: H
+
+      import H
+    end
+  end
+end
+`)
+
+	uri := "file:///test.ex"
+	server.docs.Set(uri, `defmodule MyApp.Consumer do
+  use MyApp.Base
+
+  def run, do: help()
+end`)
+
+	// help() should resolve via import H → MyApp.Helpers
+	hover := hoverAt(t, server, uri, 3, 15)
+	if hover == nil {
+		t.Fatal("expected hover for help() injected via multiline alias + import in __using__")
+	}
+	if !strings.Contains(hover.Contents.Value, "def help") {
+		t.Errorf("expected help signature in hover, got %q", hover.Contents.Value)
+	}
+}
+
+func TestHover_DefKeywordNoCrash(t *testing.T) {
+	server, cleanup := setupTestServer(t)
+	defer cleanup()
+
+	uri := "file:///test.ex"
+	server.docs.Set(uri, `defmodule MyApp.Foo do
+  def bar, do: :ok
+end`)
+
+	hover := hoverAt(t, server, uri, 1, 3)
+	_ = hover
+}
diff --git a/internal/lsp/server.go b/internal/lsp/server.go
index 72c78de..492f421 100644
--- a/internal/lsp/server.go
+++ b/internal/lsp/server.go
@@ -525,8 +525,14 @@ func (s *Server) Definition(ctx context.Context, params *protocol.DefinitionPara
 		return nil, nil
 	}
 
+	// Get cached tokens for efficient multi-query operations
+	tf := s.docs.GetTokenizedFile(docURI)
+	if tf == nil {
+		tf = NewTokenizedFile(text)
+	}
+
 	// Check for @module_attribute reference first
-	if attrName := ExtractModuleAttribute(lines[lineNum], col); attrName != "" {
+	if attrName := tf.ModuleAttributeAtCursor(lineNum, col); attrName != "" {
 		if line, found := FindModuleAttributeDefinition(text, attrName); found {
 			return []protocol.Location{{
 				URI:   params.TextDocument.URI,
@@ -536,24 +542,21 @@ func (s *Server) Definition(ctx context.Context, params *protocol.DefinitionPara
 		return nil, nil
 	}
 
-	expr := ExtractExpression(lines[lineNum], col)
-	if expr == "" {
+	exprCtx := tf.ExpressionAtCursor(lineNum, col)
+	if exprCtx.Empty() {
 		return nil, nil
 	}
 
-	// Substitute __MODULE__ with the actual module name so that expressions
-	// like __MODULE__.User resolve correctly through normal alias/module paths.
-	if strings.Contains(expr, "__MODULE__") {
-		for _, l := range lines {
-			if m := parser.DefmoduleRe.FindStringSubmatch(l); m != nil {
-				expr = strings.ReplaceAll(expr, "__MODULE__", m[1])
-				break
-			}
+	expr := tf.ResolveModuleExpr(exprCtx.Expr(), lineNum)
+	moduleRef, functionName := ExtractModuleAndFunction(expr)
+
+	if moduleRef != "" {
+		if aliasParent, inBlock := ExtractAliasBlockParent(lines, lineNum); inBlock {
+			moduleRef = aliasParent + "." + moduleRef
 		}
 	}
 
-	moduleRef, functionName := ExtractModuleAndFunction(expr)
-	aliases := ExtractAliasesInScope(text, lineNum)
+	aliases := tf.ExtractAliasesInScope(lineNum)
 	s.mergeAliasesFromUse(text, aliases)
 
 	s.debugf("Definition: expr=%q module=%q function=%q", expr, moduleRef, functionName)
@@ -575,7 +578,7 @@ func (s *Server) Definition(ctx context.Context, params *protocol.DefinitionPara
 		}
 	}
 
-	currentModule := firstDefmodule(lines)
+	currentModule := tf.FirstDefmodule()
 	fullModule := s.resolveBareFunctionModule(uriToPath(protocol.DocumentURI(docURI)), text, lines, lineNum, functionName, aliases)
 	s.debugf("Definition: resolved bare %q -> %q", functionName, fullModule)
 	if fullModule == "" {
@@ -585,7 +588,7 @@ func (s *Server) Definition(ctx context.Context, params *protocol.DefinitionPara
 
 	// Current module — return buffer location directly (works before indexing)
 	if fullModule == currentModule {
-		if line, found := FindFunctionDefinition(text, functionName); found {
+		if line, found := tf.FindFunctionDefinition(functionName); found {
 			return []protocol.Location{{
 				URI:   params.TextDocument.URI,
 				Range: lineRange(line - 1),
@@ -779,17 +782,22 @@ func (s *Server) CodeAction(ctx context.Context, params *protocol.CodeActionPara
 	// Find the full dotted expression at the cursor so that "DocuSign.Client.request"
 	// gives us the complete module reference, not just the segment under the cursor.
 	col := int(params.Range.Start.Character)
-	fullExpr := ExtractFullExpression(lines[lineNum], col)
-	if fullExpr == "" {
+	tf := s.docs.GetTokenizedFile(docURI)
+	if tf == nil {
+		tf = NewTokenizedFile(text)
+	}
+
+	exprCtx := tf.FullExpressionAtCursor(lineNum, col)
+	if exprCtx.Empty() {
 		return nil, nil
 	}
 
-	moduleRef, _ := ExtractModuleAndFunction(fullExpr)
+	moduleRef := exprCtx.ModuleRef
 	if moduleRef == "" {
 		return nil, nil
 	}
 
-	aliases := ExtractAliasesInScope(text, lineNum)
+	aliases := tf.ExtractAliasesInScope(lineNum)
 	s.mergeAliasesFromUse(text, aliases)
 
 	// Check if the first segment is already aliased — if so, the reference
@@ -812,11 +820,7 @@ func (s *Server) CodeAction(ctx context.Context, params *protocol.CodeActionPara
 	lastSegment := moduleLastSegment(moduleRef)
 	aliasText := indent + "alias " + moduleRef + "\n"
 
-	// Replace the module part of the expression on the current line
-	// with just the last segment (e.g. "MyApp.RandomAPI.Client" → "Client").
-	// Use the expression start column rather than strings.Index so that
-	// duplicate module references on the same line are not misidentified.
-	_, exprStart := extractExpressionBounds(lines[lineNum], col)
+	exprStart := exprCtx.ExprStart
 
 	var edits []protocol.TextEdit
 	// Insert the alias line
 	edits = append(edits, protocol.TextEdit{
@@ -949,6 +953,46 @@ func (s *Server) Completion(ctx context.Context, params *protocol.CompletionPara
 	}
 
 	prefix, afterDot, prefixStartCol := ExtractCompletionContext(lines[lineNum], col)
+
+	// Inside a multi-line alias block: complete child module segments under the parent.
+	if aliasParent, inBlock := ExtractAliasBlockParent(lines, lineNum); inBlock {
+		searchParent := aliasParent
+		segmentPrefix := prefix
+		labelPrefix := ""
+
+		if afterDot && prefix != "" {
+			searchParent = aliasParent + "." + prefix
+			segmentPrefix = ""
+			labelPrefix = prefix + "."
+		} else if prefix != "" {
+			if dotIdx := strings.LastIndexByte(prefix, '.'); dotIdx >= 0 {
+				searchParent = aliasParent + "." + prefix[:dotIdx]
+				segmentPrefix = prefix[dotIdx+1:]
+				labelPrefix = prefix[:dotIdx+1]
+			}
+		}
+
+		segments, err := s.store.SearchSubmoduleSegments(searchParent, segmentPrefix)
+		if err != nil {
+			return nil, nil
+		}
+		var items []protocol.CompletionItem
+		for _, segment := range segments {
+			items = append(items, protocol.CompletionItem{
+				Label:  labelPrefix + segment,
+				Kind:   protocol.CompletionItemKindModule,
+				Detail: searchParent + "." + segment,
+			})
+		}
+		if len(items) == 0 {
+			return nil, nil
+		}
+		return &protocol.CompletionList{
+			IsIncomplete: len(items) >= 100,
+			Items:        items,
+		}, nil
+	}
+
 	if prefix == "" && !afterDot {
 		return nil, nil
 	}
@@ -963,6 +1007,12 @@ func (s *Server) Completion(ctx context.Context, params *protocol.CompletionPara
 	moduleRef, funcPrefix := ExtractModuleAndFunction(prefix)
 	inPipe := IsPipeContext(lines[lineNum], prefixStartCol)
 
+	// "Module.func." or "variable." — dot after a function call result or
+	// map/struct field access. We have no type info to complete the result.
+	if afterDot && (funcPrefix != "" || moduleRef == "") {
+		return nil, nil
+	}
+
 	var items []protocol.CompletionItem
 
 	if moduleRef != "" && (afterDot || funcPrefix != "") {
@@ -1526,16 +1576,6 @@ func (s *Server) addCompletionsFromUsing(moduleName, funcPrefix string, seen map
 	}
 }
 
-// firstDefmodule returns the first defmodule name found in the file, or "".
-func firstDefmodule(lines []string) string {
-	for _, l := range lines {
-		if m := parser.DefmoduleRe.FindStringSubmatch(l); m != nil {
-			return m[1]
-		}
-	}
-	return ""
-}
-
 // resolveBareFunctionModule finds the module that defines a bare function name.
 // Mirrors the go-to-definition priority: current file modules → imports → use chains → Kernel.
 // Callers should pass pre-computed aliases to avoid redundant ExtractAliases scans.
@@ -1774,7 +1814,8 @@ func (s *Server) CompletionResolve(ctx context.Context, params *protocol.Complet
 		return params, nil
 	}
 
-	doc, spec := extractDocAbove(lines, defIdx)
+	tf := NewTokenizedFile(string(fileData))
+	doc, spec := tf.ExtractDocAbove(defIdx)
 	signature := extractSignature(lines, defIdx)
 	content := formatHoverContent(doc, spec, signature)
@@ -1821,12 +1862,16 @@ func (s *Server) Declaration(ctx context.Context, params *protocol.DeclarationPa
 		arity = ar
 	}
 	if functionName == "" {
-		expr := ExtractExpression(lines[lineNum], col)
-		if expr == "" {
+		tf := s.docs.GetTokenizedFile(docURI)
+		if tf == nil {
+			tf = NewTokenizedFile(text)
+		}
+		exprCtx := tf.ExpressionAtCursor(lineNum, col)
+		if exprCtx.Empty() {
 			s.debugf("Declaration: no expression at cursor")
 			return nil, nil
 		}
-		_, functionName = ExtractModuleAndFunction(expr)
+		functionName = exprCtx.FunctionName
 		if functionName == "" {
 			s.debugf("Declaration: no function name in expression")
 			return nil, nil
@@ -2040,21 +2085,18 @@ func (s *Server) DocumentHighlight(ctx context.Context, params *protocol.Documen
 		return highlights, nil
 	}
 
-	// Extract the cursor's line without splitting the entire document
-	line, ok := nthLine(text, lineNum)
-	if !ok || line == "" {
-		return nil, nil
+	tf := s.docs.GetTokenizedFile(docURI)
+	if tf == nil {
+		tf = NewTokenizedFile(text)
 	}
-
-	expr := ExtractExpression(line, col)
-	if expr == "" {
+	curCtx := tf.ExpressionAtCursor(lineNum, col)
+	if curCtx.Empty() {
 		return nil, nil
 	}
-	moduleRef, functionName := ExtractModuleAndFunction(expr)
-	token := functionName
+	token := curCtx.FunctionName
 	if token == "" {
-		token = moduleLastSegment(moduleRef)
+		token = moduleLastSegment(curCtx.ModuleRef)
 	}
 	if token == "" {
 		return nil, nil
@@ -2091,28 +2133,51 @@ func (s *Server) DocumentSymbol(ctx context.Context, params *protocol.DocumentSy
 		return nil, nil
 	}
 
+	var tokens []parser.Token
+	var source []byte
+	if cachedTokens, cachedSrc, ok := s.docs.GetTokens(docURI); ok {
+		tokens = cachedTokens
+		source = cachedSrc
+	} else {
+		source = []byte(text)
+		tokens = parser.Tokenize(source)
+	}
+
 	lines := strings.Split(text, "\n")
 	lastLine := len(lines) - 1
+	n := len(tokens)
+
+	tokText := func(t parser.Token) string { return string(source[t.Start:t.End]) }
+
+	// tokCol returns the 0-based column of token t within its line.
+	tokCol := func(t parser.Token) int {
+		lineStart := t.Start
+		for lineStart > 0 && source[lineStart-1] != '\n' {
+			lineStart--
+		}
+		return t.Start - lineStart
+	}
+
+	// nextSig returns the index of the next significant (non-EOL, non-comment) token.
+	nextSig := func(from int) int { return parser.NextSigToken(tokens, n, from) }
 
 	type symbolEntry struct {
 		symbol    protocol.DocumentSymbol
-		module    string // owning module name (empty for top-level modules)
-		parentIdx int    // index of parent entry (-1 for top-level)
+		module    string
+		parentIdx int
 	}
 
 	type blockFrame struct {
-		name     string // module full name, or "" for functions
-		indent   int
-		entryIdx int // index into entries slice
+		name     string
+		depth    int
+		entryIdx int
 	}
 
 	var entries []symbolEntry
-	var moduleStack []blockFrame // defmodule/defprotocol/defimpl
-	var funcStack []blockFrame   // def/defp/defmacro/describe/test/etc with do...end bodies
-	inHeredoc := false
+	var moduleStack []blockFrame
+	var funcStack []blockFrame
+	depth := 0
 
-	// currentParentIdx returns the index of the innermost enclosing block.
-	// funcStack entries (describe blocks) take priority over moduleStack.
 	currentParentIdx := func() int {
 		if len(funcStack) > 0 {
 			return funcStack[len(funcStack)-1].entryIdx
@@ -2123,321 +2188,398 @@ func (s *Server) DocumentSymbol(ctx context.Context, params *protocol.DocumentSy
 		return -1
 	}
 
-	for lineIdx, line := range lines {
-		if strings.IndexByte(line, '"') >= 0 {
-			quoteCount := strings.Count(line, `"""`)
-			if quoteCount > 0 {
-				if quoteCount >= 2 {
-					continue
-				}
-				inHeredoc = !inHeredoc
+	currentModule := func() string {
+		if len(moduleStack) > 0 {
+			return moduleStack[len(moduleStack)-1].name
+		}
+		return ""
+	}
+
+	// lineEndChar returns the character length of the given 0-based line index.
+	lineEndChar := func(lineIdx int) int {
+		if lineIdx >= 0 && lineIdx < len(lines) {
+			return len(lines[lineIdx])
+		}
+		return 0
+	}
+
+	// isLineFirstSignificant returns true if tokens[i] is the first
+	// non-EOL/comment token on its line.
+	isLineFirstSignificant := func(i int) bool {
+		tokLine := tokens[i].Line
+		for j := i - 1; j >= 0; j-- {
+			k := tokens[j].Kind
+			if k == parser.TokEOL || k == parser.TokEOF {
+				return true
+			}
+			if k == parser.TokComment {
 				continue
 			}
+			if tokens[j].Line == tokLine {
+				return false
+			}
+			return true
 		}
-		if strings.IndexByte(line, '\'') >= 0 {
-			quoteCount := strings.Count(line, `'''`)
-			if quoteCount > 0 {
-				if quoteCount >= 2 {
-					continue
-				}
-				inHeredoc = !inHeredoc
+		return true
+	}
+
+	for i := 0; i < n; i++ {
+		tok := tokens[i]
+
+		switch tok.Kind {
+		case parser.TokEnd:
+			if !isLineFirstSignificant(i) {
+				parser.TrackBlockDepth(tok.Kind, &depth)
 				continue
 			}
-		}
-		if inHeredoc {
-			continue
-		}
+			lineIdx := tok.Line - 1
+			endPos := protocol.Position{Line: uint32(lineIdx), Character: uint32(lineEndChar(lineIdx))}
+
+			prevDepth := depth
+			parser.TrackBlockDepth(tok.Kind, &depth)
+
+			if len(funcStack) > 0 && funcStack[len(funcStack)-1].depth == prevDepth {
+				entries[funcStack[len(funcStack)-1].entryIdx].symbol.Range.End = endPos
+				funcStack = funcStack[:len(funcStack)-1]
+			} else if len(moduleStack) > 0 && moduleStack[len(moduleStack)-1].depth == prevDepth {
+				entries[moduleStack[len(moduleStack)-1].entryIdx].symbol.Range.End = endPos
+				moduleStack = moduleStack[:len(moduleStack)-1]
+			}
 
-		trimStart := 0
-		for trimStart < len(line) && (line[trimStart] == ' ' || line[trimStart] == '\t') {
-			trimStart++
-		}
-		if trimStart >= len(line) {
-			continue
-		}
-		first := line[trimStart]
-		rest := line[trimStart:]
+		case parser.TokDefmodule, parser.TokDefprotocol, parser.TokDefimpl:
+			if !isLineFirstSignificant(i) {
+				continue
+			}
+			keyword := tokText(tok)
+			lineIdx := tok.Line - 1
+			indent := tokCol(tok)
 
-		// Fast first-character dispatch
-		if first != 'd' && first != 'e' && first != '@' && (first < 'a' || first > 'z') {
-			continue
-		}
+			j := nextSig(i + 1)
+			name, _ := parser.CollectModuleName(source, tokens, n, j)
+			if name == "" {
+				continue
+			}
 
-		// 'e' — pop stacks on "end"
-		if first == 'e' {
-			if len(rest) >= 3 && rest[0] == 'e' && rest[1] == 'n' && rest[2] == 'd' &&
-				(len(rest) == 3 || rest[3] == ' ' || rest[3] == '\t' || rest[3] == '\r') {
-				trimmedEnd := strings.TrimRight(rest, " \t\r")
-				if trimmedEnd == "end" {
-					endPos := protocol.Position{Line: uint32(lineIdx), Character: uint32(len(line))}
-					if len(funcStack) > 0 && funcStack[len(funcStack)-1].indent == trimStart {
-						entries[funcStack[len(funcStack)-1].entryIdx].symbol.Range.End = endPos
-						funcStack = funcStack[:len(funcStack)-1]
-					} else if len(moduleStack) > 0 && moduleStack[len(moduleStack)-1].indent == trimStart {
-						entries[moduleStack[len(moduleStack)-1].entryIdx].symbol.Range.End = endPos
-						moduleStack = moduleStack[:len(moduleStack)-1]
-					}
-				}
+			curMod := currentModule()
+			fullName := name
+			if !strings.Contains(name, ".") && curMod != "" {
+				fullName = curMod + "." + name
 			}
-			continue
-		}
 
-		currentModule := ""
-		if len(moduleStack) > 0 {
-			currentModule = moduleStack[len(moduleStack)-1].name
-		}
-
-		// 'd' — defmodule/defprotocol/defimpl/def*/defstruct/defexception
-		if first == 'd' && strings.HasPrefix(rest, "def") {
-
-			// Try module-level keywords first, ordered by frequency
-			var matchedKeyword string
-			switch {
-			case strings.HasPrefix(rest, "defmodule") && len(rest) > 9 && (rest[9] == ' ' || rest[9] == '\t'):
-				matchedKeyword = "defmodule"
-			case strings.HasPrefix(rest, "defimpl") && len(rest) > 7 && (rest[7] == ' ' || rest[7] == '\t'):
-				matchedKeyword = "defimpl"
-			case strings.HasPrefix(rest, "defprotocol") && len(rest) > 11 && (rest[11] == ' ' || rest[11] == '\t'):
-				matchedKeyword = "defprotocol"
-			}
-			if matchedKeyword != "" {
-				after := strings.TrimLeft(rest[len(matchedKeyword)+1:], " \t")
-				name := parser.ScanModuleName(after)
-				if name != "" {
-					fullName := name
-					if !strings.Contains(name, ".") && currentModule != "" {
-						fullName = currentModule + "." + name
-					}
+			kind := defKindToSymbolKind(keyword)
+			if keyword == "defmodule" {
+				kind = defKindToSymbolKind("module")
+			}
 
-					kind := defKindToSymbolKind(matchedKeyword)
-					if matchedKeyword == "defmodule" {
-						kind = defKindToSymbolKind("module")
-					}
+			nameCol := strings.Index(lines[lineIdx], name)
+			if nameCol < 0 {
+				nameCol = indent
+			}
 
-					nameCol := strings.Index(line, name)
-					if nameCol < 0 {
-						nameCol = trimStart
-					}
+			entryIdx := len(entries)
+			moduleParentIdx := -1
+			if len(moduleStack) > 0 {
+				moduleParentIdx = moduleStack[len(moduleStack)-1].entryIdx
+			}
+			entries = append(entries, symbolEntry{
+				symbol: protocol.DocumentSymbol{
+					Name:   name,
+					Detail: keyword,
+					Kind:   kind,
+					Range: protocol.Range{
+						Start: protocol.Position{Line: uint32(lineIdx), Character: 0},
+						End:   protocol.Position{Line: uint32(lastLine), Character: 0},
					},
+					SelectionRange: protocol.Range{
+						Start: protocol.Position{Line: uint32(lineIdx), Character: uint32(nameCol)},
+						End:   protocol.Position{Line: uint32(lineIdx), Character: uint32(nameCol + len(name))},
+					},
+				},
+				module:    curMod,
+				parentIdx: moduleParentIdx,
+			})
+
+			_, nextPos, hasDoBlock := parser.ScanForwardToBlockDo(tokens, n, j)
+			if hasDoBlock {
+				depth++
+				moduleStack = append(moduleStack, blockFrame{name: fullName, depth: depth, entryIdx: entryIdx})
+				i = nextPos - 1
+			}
 
-					entryIdx := len(entries)
-					moduleParentIdx := -1
-					if len(moduleStack) > 0 {
-						moduleParentIdx = moduleStack[len(moduleStack)-1].entryIdx
-					}
-					entries = append(entries, symbolEntry{
-						symbol: protocol.DocumentSymbol{
-							Name:   name,
-							Detail: matchedKeyword,
-							Kind:   kind,
-							Range: protocol.Range{
-								Start: protocol.Position{Line: uint32(lineIdx), Character: 0},
-								End:   protocol.Position{Line: uint32(lastLine), Character: 0},
-							},
-							SelectionRange: protocol.Range{
-								Start: protocol.Position{Line: uint32(lineIdx), Character: uint32(nameCol)},
-								End:   protocol.Position{Line: uint32(lineIdx), Character: uint32(nameCol + len(name))},
-							},
-						},
-						module:    currentModule,
-						parentIdx: moduleParentIdx,
-					})
-					moduleStack = append(moduleStack, blockFrame{name: fullName, indent: trimStart, entryIdx: entryIdx})
-					continue
-				}
+		case parser.TokDef, parser.TokDefp, parser.TokDefmacro, parser.TokDefmacrop,
+			parser.TokDefguard, parser.TokDefguardp, parser.TokDefdelegate:
+			if !isLineFirstSignificant(i) {
+				continue
+			}
+			curMod := currentModule()
+			if curMod == "" {
+				continue
 			}
+			kind := tokText(tok)
+			lineIdx := tok.Line - 1
 
-			// Function/macro/guard/delegate definitions
-			if currentModule != "" {
-				if kind, funcName, ok := parser.ScanFuncDef(rest); ok {
-					arity := parser.ExtractArity(line, funcName)
-					nameWithArity := fmt.Sprintf("%s/%d", funcName, arity)
+			j := nextSig(i + 1)
+			if j >= n || tokens[j].Kind != parser.TokIdent {
+				continue
+			}
+			funcName := tokText(tokens[j])
+			nameCol := tokCol(tokens[j])
+			j = nextSig(j + 1)
+			arity, _, _, _ := parser.CollectParams(source, tokens, n, j)
+			nameWithArity := fmt.Sprintf("%s/%d", funcName, arity)
 
-					nameCol := strings.Index(line, funcName)
-					if nameCol < 0 {
-						nameCol = trimStart
-					}
+			_, nextPos, hasDoBlock := parser.ScanForwardToBlockDo(tokens, n, j)
 
-					hasDoBlock := false
-					trimmedRight := strings.TrimRight(rest, " \t\r")
-					if strings.HasSuffix(trimmedRight, " do") || strings.HasSuffix(trimmedRight, "\tdo") {
-						hasDoBlock = true
-					}
+			rangeEnd := protocol.Position{Line: uint32(lineIdx), Character: uint32(lineEndChar(lineIdx))}
+			if hasDoBlock {
+				rangeEnd = protocol.Position{Line: uint32(lastLine), Character: 0}
+			}
 
-					rangeEnd := protocol.Position{Line: uint32(lineIdx), Character: uint32(len(line))}
-					if hasDoBlock {
-						rangeEnd = protocol.Position{Line: uint32(lastLine), Character: 0}
-					}
+			entryIdx := len(entries)
+			entries = append(entries, symbolEntry{
+				symbol: protocol.DocumentSymbol{
+					Name:   nameWithArity,
+					Detail: kind,
+					Kind:   defKindToSymbolKind(kind),
+					Range: protocol.Range{
+						Start: protocol.Position{Line: uint32(lineIdx), Character: 0},
+						End:   rangeEnd,
+					},
+					SelectionRange: protocol.Range{
+						Start: protocol.Position{Line: uint32(lineIdx), Character: uint32(nameCol)},
+						End:   protocol.Position{Line: uint32(lineIdx), Character: uint32(nameCol + len(funcName))},
+					},
+				},
+				module:    curMod,
+				parentIdx: currentParentIdx(),
+			})
 
-					entryIdx := len(entries)
-					entries = append(entries, symbolEntry{
-						symbol: protocol.DocumentSymbol{
-							Name:   nameWithArity,
-							Detail: kind,
-							Kind:   defKindToSymbolKind(kind),
-							Range: protocol.Range{
-								Start: protocol.Position{Line: uint32(lineIdx), Character: 0},
-								End:   rangeEnd,
-							},
-							SelectionRange: protocol.Range{
-								Start: protocol.Position{Line: uint32(lineIdx), Character: uint32(nameCol)},
-								End:   protocol.Position{Line: uint32(lineIdx), Character: uint32(nameCol + len(funcName))},
-							},
-						},
-						module:    currentModule,
-						parentIdx: currentParentIdx(),
-					})
+			if hasDoBlock {
+				depth++
+				funcStack = append(funcStack, blockFrame{depth: depth, entryIdx: entryIdx})
+				i = nextPos - 1
+			}

-					if hasDoBlock {
-						funcStack = append(funcStack, blockFrame{indent: trimStart, entryIdx: entryIdx})
-					}
-				} else if strings.HasPrefix(rest, "defstruct ") || strings.HasPrefix(rest, "defstruct\t") {
-					entries = append(entries, symbolEntry{
-						symbol: protocol.DocumentSymbol{
-							Name:   "defstruct",
-							Detail: "defstruct",
-							Kind:   defKindToSymbolKind("defstruct"),
-							Range: protocol.Range{
-								Start: protocol.Position{Line: uint32(lineIdx), Character: 0},
-								End:   protocol.Position{Line: uint32(lineIdx), Character: uint32(len(line))},
-							},
-							SelectionRange: protocol.Range{
-								Start: protocol.Position{Line: uint32(lineIdx), Character: uint32(trimStart)},
-								End:   protocol.Position{Line: uint32(lineIdx), Character: uint32(trimStart + 9)},
-							},
-						},
-						module:    currentModule,
-						parentIdx: currentParentIdx(),
-					})
-				} else if strings.HasPrefix(rest, "defexception ") || strings.HasPrefix(rest, "defexception\t") {
-					entries = append(entries, symbolEntry{
-						symbol: protocol.DocumentSymbol{
-							Name:   "defexception",
-							Detail: "defexception",
-							Kind:   defKindToSymbolKind("defexception"),
-							Range: protocol.Range{
-								Start: protocol.Position{Line: uint32(lineIdx), Character: 0},
-								End:   protocol.Position{Line: uint32(lineIdx), Character: uint32(len(line))},
-							},
-							SelectionRange: protocol.Range{
-								Start: protocol.Position{Line: uint32(lineIdx), Character: uint32(trimStart)},
-								End:   protocol.Position{Line: uint32(lineIdx), Character: uint32(trimStart + 12)},
-							},
-						},
-						module:    currentModule,
-						parentIdx: currentParentIdx(),
-					})
-				}
-			}
-			continue
-		}
+		case parser.TokDefstruct:
+			if !isLineFirstSignificant(i) {
+				continue
+			}
+			curMod := currentModule()
+			if curMod == "" {
+				continue
+			}
+			lineIdx := tok.Line - 1
+			indent := tokCol(tok)
+			entries = append(entries, symbolEntry{
+				symbol: protocol.DocumentSymbol{
+					Name:   "defstruct",
+					Detail: "defstruct",
+					Kind:   defKindToSymbolKind("defstruct"),
+					Range: protocol.Range{
+						Start: protocol.Position{Line: uint32(lineIdx), Character: 0},
+						End:   protocol.Position{Line: uint32(lineIdx), Character: uint32(lineEndChar(lineIdx))},
+					},
+					SelectionRange: protocol.Range{
+						Start: protocol.Position{Line: uint32(lineIdx), Character: uint32(indent)},
+						End:   protocol.Position{Line: uint32(lineIdx), Character: uint32(indent + 9)},
+					},
+				},
+				module:    curMod,
+				parentIdx: currentParentIdx(),
+			})
+
+		case parser.TokDefexception:
+			if !isLineFirstSignificant(i) {
+				continue
+			}
+			curMod := currentModule()
+			if curMod == "" {
+				continue
+			}
+			lineIdx := tok.Line - 1
+			indent := tokCol(tok)
+			entries = append(entries, symbolEntry{
+				symbol: protocol.DocumentSymbol{
+					Name:   "defexception",
+					Detail: "defexception",
+					Kind:   defKindToSymbolKind("defexception"),
+					Range: protocol.Range{
+						Start: protocol.Position{Line: uint32(lineIdx), Character: 0},
+						End:   protocol.Position{Line: uint32(lineIdx), Character: uint32(lineEndChar(lineIdx))},
+					},
+					SelectionRange: protocol.Range{
+						Start: protocol.Position{Line: uint32(lineIdx), Character: uint32(indent)},
+						End:   protocol.Position{Line: uint32(lineIdx), Character: uint32(indent + 12)},
+					},
+				},
+				module:    curMod,
+				parentIdx: currentParentIdx(),
+			})
 
-		// '@' — type definitions and @behaviour refs
-		if first == '@' {
-			if currentModule == "" {
+		case parser.TokAttrType:
+			if !isLineFirstSignificant(i) {
 				continue
 			}
+			curMod := currentModule()
+			if curMod == "" {
+				continue
+			}
+			attrText := tokText(tok)
 			var kind string
-			var afterKw string
-			if strings.HasPrefix(rest, "@typep") && len(rest) > 6 && (rest[6] == ' ' || rest[6] == '\t') {
+			switch attrText {
+			case "@typep":
 				kind = "typep"
-				afterKw = strings.TrimLeft(rest[6:], " \t")
-			} else if strings.HasPrefix(rest, "@type") && len(rest) > 5 && (rest[5] == ' ' || rest[5] == '\t') {
+			case "@type":
 				kind = "type"
-				afterKw = strings.TrimLeft(rest[5:], " \t")
-			} else if strings.HasPrefix(rest, "@opaque") && len(rest) > 7 && (rest[7] == ' ' || rest[7] == '\t') {
+			case "@opaque":
 				kind = "opaque"
-				afterKw = strings.TrimLeft(rest[7:], " \t")
-			} else if strings.HasPrefix(rest, "@macrocallback") && len(rest) > 14 && (rest[14] == ' ' || rest[14] == '\t') {
-				kind = "macrocallback"
-				afterKw = strings.TrimLeft(rest[14:], " \t")
-			} else if strings.HasPrefix(rest, "@callback") && len(rest) > 9 && (rest[9] == ' ' || rest[9] == '\t') {
+			default:
+				continue
+			}
+			lineIdx := tok.Line - 1
+
+			j := nextSig(i + 1)
+			if j >= n || tokens[j].Kind != parser.TokIdent {
+				continue
+			}
+			name := tokText(tokens[j])
+			nameCol := tokCol(tokens[j])
+			j = nextSig(j + 1)
+			arity, _, _, _ := parser.CollectParams(source, tokens, n, j)
+			nameWithArity := fmt.Sprintf("%s/%d", name, arity)
+
+			entries = append(entries, symbolEntry{
+				symbol: protocol.DocumentSymbol{
+					Name:   nameWithArity,
+					Detail: "@" + kind,
+					Kind:   defKindToSymbolKind(kind),
+					Range: protocol.Range{
+						Start: protocol.Position{Line: uint32(lineIdx), Character: 0},
+						End:   protocol.Position{Line: uint32(lineIdx), Character: uint32(lineEndChar(lineIdx))},
+					},
+					SelectionRange: protocol.Range{
+						Start: protocol.Position{Line: uint32(lineIdx), Character: uint32(nameCol)},
+						End:   protocol.Position{Line: uint32(lineIdx), Character: uint32(nameCol + len(name))},
+					},
+				},
+				module:    curMod,
+				parentIdx: currentParentIdx(),
+			})
+
+		case parser.TokAttrCallback:
+			if !isLineFirstSignificant(i) {
+				continue
+			}
+			curMod := currentModule()
+			if curMod == "" {
+				continue
+			}
+			attrText := tokText(tok)
+			var kind string
+			switch attrText {
+			case "@callback":
 				kind = "callback"
-				afterKw = strings.TrimLeft(rest[9:], " \t")
+			case "@macrocallback":
+				kind = "macrocallback"
+			default:
+				continue
 			}
 
-			if kind != "" {
-				name := parser.ScanFuncName(afterKw)
-				if name != "" {
-					arity := parser.ExtractArity(line, name)
-					nameWithArity := fmt.Sprintf("%s/%d", name, arity)
+			lineIdx := tok.Line - 1
 
-					nameCol := strings.Index(line, name)
-					if nameCol < 0 {
-						nameCol = trimStart
-					}
+			j := nextSig(i + 1)
+			if j >= n || tokens[j].Kind != parser.TokIdent {
+				continue
+			}
+			name := tokText(tokens[j])
+			nameCol := tokCol(tokens[j])
+			j = nextSig(j + 1)
+			arity, _, _, _ := parser.CollectParams(source, tokens, n, j)
+			nameWithArity := fmt.Sprintf("%s/%d", name, arity)
+
+			entries = append(entries, symbolEntry{
+				symbol: protocol.DocumentSymbol{
+					Name:   nameWithArity,
+					Detail: "@" + kind,
+					Kind:   defKindToSymbolKind(kind),
+					Range: protocol.Range{
+						Start: protocol.Position{Line: uint32(lineIdx), Character: 0},
+						End:   protocol.Position{Line: uint32(lineIdx), Character: uint32(lineEndChar(lineIdx))},
+					},
+					SelectionRange: protocol.Range{
+						Start: protocol.Position{Line: uint32(lineIdx), Character: uint32(nameCol)},
+						End:   protocol.Position{Line: uint32(lineIdx), Character: uint32(nameCol + len(name))},
+					},
+				},
+				module:    curMod,
+				parentIdx: currentParentIdx(),
+			})
 
-					entries = append(entries, symbolEntry{
-						symbol: protocol.DocumentSymbol{
-							Name:   nameWithArity,
-							Detail: "@" + kind,
-							Kind:   defKindToSymbolKind(kind),
-							Range: protocol.Range{
-								Start: protocol.Position{Line:
uint32(lineIdx), Character: 0}, - End: protocol.Position{Line: uint32(lineIdx), Character: uint32(len(line))}, - }, - SelectionRange: protocol.Range{ - Start: protocol.Position{Line: uint32(lineIdx), Character: uint32(nameCol)}, - End: protocol.Position{Line: uint32(lineIdx), Character: uint32(nameCol + len(name))}, - }, - }, - module: currentModule, - parentIdx: currentParentIdx(), - }) - } + case parser.TokIdent: + // Bare macro calls with do blocks (describe, test, setup, etc.) + curMod := currentModule() + if curMod == "" { + continue + } + if !isLineFirstSignificant(i) { + continue + } + macroName := tokText(tok) + if parser.IsElixirKeyword(macroName) { + continue + } + // Skip keyword keys like "reduce:" — the token after the ident is TokColon + if i+1 < n && tokens[i+1].Kind == parser.TokColon { + continue + } + doIdx, nextPos, hasDoBlock := parser.ScanForwardToBlockDo(tokens, n, i+1) + if !hasDoBlock { + continue } - continue - } - // Bare macro calls with do blocks (describe, test, setup, etc.) 
- if currentModule != "" && first >= 'a' && first <= 'z' { - trimmedRight := strings.TrimRight(rest, " \t\r") - if strings.HasSuffix(trimmedRight, " do") || strings.HasSuffix(trimmedRight, "\tdo") { - macroName := parser.ScanFuncName(rest) - if macroName != "" && !parser.IsElixirKeyword(macroName) { - afterName := rest[len(macroName):] - doIdx := strings.LastIndex(trimmedRight, " do") - if doIdx < 0 { - doIdx = strings.LastIndex(trimmedRight, "\tdo") - } - label := macroName - if doIdx > len(macroName) { - arg := strings.TrimSpace(afterName[:doIdx-len(macroName)]) - if len(arg) >= 2 && arg[0] == '"' && arg[len(arg)-1] == '"' { - arg = arg[1 : len(arg)-1] - } - if arg != "" { - label = macroName + " " + arg - } - } + lineIdx := tok.Line - 1 + indent := tokCol(tok) - entryIdx := len(entries) - entries = append(entries, symbolEntry{ - symbol: protocol.DocumentSymbol{ - Name: label, - Detail: macroName, - Kind: protocol.SymbolKindFunction, - Range: protocol.Range{ - Start: protocol.Position{Line: uint32(lineIdx), Character: 0}, - End: protocol.Position{Line: uint32(lastLine), Character: 0}, - }, - SelectionRange: protocol.Range{ - Start: protocol.Position{Line: uint32(lineIdx), Character: uint32(trimStart)}, - End: protocol.Position{Line: uint32(lineIdx), Character: uint32(trimStart + len(macroName))}, - }, - }, - module: currentModule, - parentIdx: currentParentIdx(), - }) - funcStack = append(funcStack, blockFrame{indent: trimStart, entryIdx: entryIdx}) - } + // Extract the argument between the macro name and `do` from source bytes + label := macroName + argBytes := source[tok.End:tokens[doIdx].Start] + arg := strings.TrimSpace(string(argBytes)) + if len(arg) >= 2 && arg[0] == '"' && arg[len(arg)-1] == '"' { + arg = arg[1 : len(arg)-1] } + if arg != "" { + label = macroName + " " + arg + } + + entryIdx := len(entries) + entries = append(entries, symbolEntry{ + symbol: protocol.DocumentSymbol{ + Name: label, + Detail: macroName, + Kind: protocol.SymbolKindFunction, + 
Range: protocol.Range{ + Start: protocol.Position{Line: uint32(lineIdx), Character: 0}, + End: protocol.Position{Line: uint32(lastLine), Character: 0}, + }, + SelectionRange: protocol.Range{ + Start: protocol.Position{Line: uint32(lineIdx), Character: uint32(indent)}, + End: protocol.Position{Line: uint32(lineIdx), Character: uint32(indent + len(macroName))}, + }, + }, + module: curMod, + parentIdx: currentParentIdx(), + }) + depth++ + funcStack = append(funcStack, blockFrame{depth: depth, entryIdx: entryIdx}) + i = nextPos - 1 + + case parser.TokDo, parser.TokFn: + parser.TrackBlockDepth(tok.Kind, &depth) } } // Build hierarchical tree using parentIdx references. - // Process in reverse so children are attached before their parents are read. type symNode struct { sym protocol.DocumentSymbol - children []int // indices of child entries + children []int } nodes := make([]symNode, len(entries)) for i, e := range entries { @@ -2497,72 +2639,63 @@ func (s *Server) FoldingRanges(ctx context.Context, params *protocol.FoldingRang return nil, nil } - lines := strings.Split(text, "\n") + // Use cached tokens if available + var tokens []parser.Token + var source []byte + if cachedTokens, cachedSrc, ok := s.docs.GetTokens(docURI); ok { + tokens = cachedTokens + source = cachedSrc + } else { + source = []byte(text) + tokens = parser.Tokenize(source) + } + n := len(tokens) + var ranges []protocol.FoldingRange - inHeredoc := false - heredocStart := 0 + // Track do/fn..end blocks by depth type blockStart struct { - line int - indent int + line int + depth int } var stack []blockStart + depth := 0 - for i, line := range lines { - // Track heredoc boundaries as foldable regions (""" and ''') - isHeredocDelimiter := false - if strings.Contains(line, `"""`) { - if strings.Count(line, `"""`) < 2 { - isHeredocDelimiter = true - } - } else if strings.Contains(line, `'''`) { - if strings.Count(line, `'''`) < 2 { - isHeredocDelimiter = true - } - } - if isHeredocDelimiter { - if 
!inHeredoc { - inHeredoc = true - heredocStart = i - } else { - inHeredoc = false - if i > heredocStart { - ranges = append(ranges, protocol.FoldingRange{ - StartLine: uint32(heredocStart), - EndLine: uint32(i), - }) + for i := 0; i < n; i++ { + tok := tokens[i] + + switch tok.Kind { + case parser.TokHeredoc: + // Heredocs are single tokens spanning multiple lines — fold them + // Find the end line by scanning for last newline in the token + startLine := tok.Line + endLine := startLine + for j := tok.Start; j < tok.End; j++ { + if source[j] == '\n' { + endLine++ } } - continue - } - if inHeredoc { - continue - } - - trimmed := strings.TrimSpace(line) - if trimmed == "" { - continue - } - indent := len(line) - len(strings.TrimLeft(line, " \t")) + if endLine > startLine { + ranges = append(ranges, protocol.FoldingRange{ + StartLine: uint32(startLine - 1), // convert to 0-based + EndLine: uint32(endLine - 1), + }) + } - // Strip strings/comments for block detection so content like - // `x = "foo do"` doesn't create a false folding range. 
- stripped := strings.TrimSpace(parser.StripCommentsAndStrings(trimmed)) + case parser.TokDo, parser.TokFn: + parser.TrackBlockDepth(tok.Kind, &depth) + stack = append(stack, blockStart{line: tok.Line, depth: depth}) - if parser.OpensBlock(stripped) { - stack = append(stack, blockStart{line: i, indent: indent}) - continue - } - - // Pop on "end" at matching indent - if parser.IsEnd(stripped) && len(stack) > 0 { - top := stack[len(stack)-1] - if indent == top.indent { + case parser.TokEnd: + prevDepth := depth + parser.TrackBlockDepth(tok.Kind, &depth) + if len(stack) > 0 && stack[len(stack)-1].depth == prevDepth { + top := stack[len(stack)-1] stack = stack[:len(stack)-1] - if i > top.line { + if tok.Line > top.line { ranges = append(ranges, protocol.FoldingRange{ - StartLine: uint32(top.line), - EndLine: uint32(i), + StartLine: uint32(top.line - 1), // convert to 0-based + EndLine: uint32(tok.Line - 1), }) } } @@ -2628,22 +2761,29 @@ func (s *Server) Hover(ctx context.Context, params *protocol.HoverParams) (*prot return nil, nil } - expr := ExtractExpression(lines[lineNum], col) - if expr == "" { + // Get cached tokens for efficient multi-query operations + tf := s.docs.GetTokenizedFile(docURI) + if tf == nil { + tf = NewTokenizedFile(text) + } + + exprCtx := tf.ExpressionAtCursor(lineNum, col) + if exprCtx.Empty() { return nil, nil } - if strings.Contains(expr, "__MODULE__") { - for _, l := range lines { - if m := parser.DefmoduleRe.FindStringSubmatch(l); m != nil { - expr = strings.ReplaceAll(expr, "__MODULE__", m[1]) - break - } + expr := tf.ResolveModuleExpr(exprCtx.Expr(), lineNum) + moduleRef, functionName := ExtractModuleAndFunction(expr) + + // Inside a multi-line alias block like "alias MyModule.{ Something }", + // prepend the parent so "Something" resolves to "MyModule.Something". + if moduleRef != "" { + if aliasParent, inBlock := ExtractAliasBlockParent(lines, lineNum); inBlock { + moduleRef = aliasParent + "." 
+ moduleRef } } - moduleRef, functionName := ExtractModuleAndFunction(expr) - aliases := ExtractAliasesInScope(text, lineNum) + aliases := tf.ExtractAliasesInScope(lineNum) s.mergeAliasesFromUse(text, aliases) if moduleRef == "" { @@ -2651,14 +2791,14 @@ func (s *Server) Hover(ctx context.Context, params *protocol.HoverParams) (*prot return nil, nil } - currentModule := firstDefmodule(lines) + currentModule := tf.FirstDefmodule() fullModule := s.resolveBareFunctionModule(uriToPath(protocol.DocumentURI(docURI)), text, lines, lineNum, functionName, aliases) if fullModule != "" { // Current module — hover from the buffer directly if fullModule == currentModule { - if line, found := FindFunctionDefinition(text, functionName); found { - return s.hoverFromBuffer(text, line-1) + if line, found := tf.FindFunctionDefinition(functionName); found { + return s.hoverFromBuffer(tf, text, line-1) } } @@ -2725,11 +2865,12 @@ func (s *Server) Implementation(ctx context.Context, params *protocol.Implementa return nil, nil } - expr := ExtractExpression(lines[lineNum], col) - if expr == "" { - return nil, nil + tf := s.docs.GetTokenizedFile(docURI) + if tf == nil { + tf = NewTokenizedFile(text) } - _, functionName := ExtractModuleAndFunction(expr) + exprCtx := tf.ExpressionAtCursor(lineNum, col) + functionName := exprCtx.FunctionName if functionName == "" { return nil, nil } @@ -2790,11 +2931,13 @@ func (s *Server) PrepareRename(ctx context.Context, params *protocol.PrepareRena return nil, nil } - expr, exprStart := extractExpressionBounds(lines[lineNum], col) - moduleRef, functionName := "", "" - if expr != "" { - moduleRef, functionName = ExtractModuleAndFunction(expr) + tf := s.docs.GetTokenizedFile(docURI) + if tf == nil { + tf = NewTokenizedFile(text) } + exprCtx := tf.ExpressionAtCursor(lineNum, col) + moduleRef, functionName := exprCtx.ModuleRef, exprCtx.FunctionName + exprStart := exprCtx.ExprStart // For bare identifiers (no module qualifier), check tree-sitter variables // 
first — a local variable shadows a same-named function in Elixir. @@ -2818,7 +2961,7 @@ func (s *Server) PrepareRename(ctx context.Context, params *protocol.PrepareRena } // Try module/function rename via the index - if expr != "" { + if !exprCtx.Empty() { aliases := ExtractAliasesInScope(text, lineNum) // Detect `as:` aliases — these are file-local renames, not module renames. @@ -2934,48 +3077,52 @@ func (s *Server) References(ctx context.Context, params *protocol.ReferenceParam return nil, nil } - expr := ExtractExpression(lines[lineNum], col) - if expr == "" { + // Get cached tokens for efficient multi-query operations + tf := s.docs.GetTokenizedFile(docURI) + if tf == nil { + tf = NewTokenizedFile(text) + } + + cursorCtx := tf.ExpressionAtCursor(lineNum, col) + if cursorCtx.Empty() { s.debugf("References: no expression at cursor") return nil, nil } // Special case: cursor on defmacro __using__ — find all `use ModuleName` sites. - if expr == "__using__" { - for _, l := range lines { - if m := parser.DefmoduleRe.FindStringSubmatch(l); m != nil { - s.debugf("References: __using__ in module %s — looking up use sites", m[1]) - allRefs, err := s.store.LookupReferences(m[1], "") - if err != nil { - return nil, nil - } - var locations []protocol.Location - for _, r := range allRefs { - if r.Kind == "use" { - locations = append(locations, protocol.Location{ - URI: uri.File(r.FilePath), - Range: lineRange(r.Line - 1), - }) - } + if cursorCtx.Expr() == "__using__" { + moduleName := tf.FirstDefmodule() + if moduleName != "" { + s.debugf("References: __using__ in module %s — looking up use sites", moduleName) + allRefs, err := s.store.LookupReferences(moduleName, "") + if err != nil { + return nil, nil + } + var locations []protocol.Location + for _, r := range allRefs { + if r.Kind == "use" { + locations = append(locations, protocol.Location{ + URI: uri.File(r.FilePath), + Range: lineRange(r.Line - 1), + }) } - s.debugf("References: returning %d use sites", 
len(locations)) - return locations, nil } + s.debugf("References: returning %d use sites", len(locations)) + return locations, nil } return nil, nil } - if strings.Contains(expr, "__MODULE__") { - for _, l := range lines { - if m := parser.DefmoduleRe.FindStringSubmatch(l); m != nil { - expr = strings.ReplaceAll(expr, "__MODULE__", m[1]) - break - } + expr := tf.ResolveModuleExpr(cursorCtx.Expr(), lineNum) + moduleRef, functionName := ExtractModuleAndFunction(expr) + + if moduleRef != "" { + if aliasParent, inBlock := ExtractAliasBlockParent(lines, lineNum); inBlock { + moduleRef = aliasParent + "." + moduleRef } } - moduleRef, functionName := ExtractModuleAndFunction(expr) - aliases := ExtractAliasesInScope(text, lineNum) + aliases := tf.ExtractAliasesInScope(lineNum) s.mergeAliasesFromUse(text, aliases) s.debugf("References: expr=%q module=%q function=%q", expr, moduleRef, functionName) @@ -3020,7 +3167,7 @@ func (s *Server) References(ctx context.Context, params *protocol.ReferenceParam // When cursor is on a defmodule line, use the store's fully-qualified // name directly — the user is asking about the module being defined, // not a reference that might be shadowed by an alias with the same name. 
- if m := parser.DefmoduleRe.FindStringSubmatch(lines[lineNum]); m != nil { + if _, isDefmod := IsDefmoduleLine(text, lineNum); isDefmod { if enclosing := s.store.LookupEnclosingModule(uriToPath(params.TextDocument.URI), lineNum+1); enclosing != "" { fullModule = enclosing } else { @@ -3172,6 +3319,7 @@ func (s *Server) References(ctx context.Context, params *protocol.ReferenceParam s.debugf("References: returning %d locations", len(locations)) return locations, nil } + func (s *Server) Rename(ctx context.Context, params *protocol.RenameParams) (*protocol.WorkspaceEdit, error) { docURI := string(params.TextDocument.URI) text, ok := s.docs.Get(docURI) @@ -3186,11 +3334,12 @@ func (s *Server) Rename(ctx context.Context, params *protocol.RenameParams) (*pr return nil, nil } - expr, _ := extractExpressionBounds(lines[lineNum], col) - moduleRef, functionName := "", "" - if expr != "" { - moduleRef, functionName = ExtractModuleAndFunction(expr) + tf := s.docs.GetTokenizedFile(docURI) + if tf == nil { + tf = NewTokenizedFile(text) } + renameCtx := tf.ExpressionAtCursor(lineNum, col) + moduleRef, functionName := renameCtx.ModuleRef, renameCtx.FunctionName // For bare identifiers, check tree-sitter variables first — a local // variable shadows a same-named function in Elixir. 
@@ -3216,7 +3365,7 @@ func (s *Server) Rename(ctx context.Context, params *protocol.RenameParams) (*pr } // Try module/function rename via the index - if expr != "" { + if !renameCtx.Empty() { aliases := ExtractAliasesInScope(text, lineNum) // Detect `as:` aliases — file-local rename of the alias name, not @@ -4009,6 +4158,7 @@ func (s *Server) buildTextEdits(sites []renameSite, oldToken, newToken string) * applyTokenEdits := func(origLines []string, fileSites []renameSite) []string { lines := make([]string, len(origLines)) copy(lines, origLines) + for _, site := range fileSites { if site.line-1 >= len(lines) { continue @@ -4241,7 +4391,13 @@ func (s *Server) SignatureHelp(ctx context.Context, params *protocol.SignatureHe lineNum := int(params.Position.Line) col := int(params.Position.Character) - funcExpr, argIndex, found := ExtractCallContext(text, lineNum, col) + // Get cached tokens for efficient multi-query operations + tf := s.docs.GetTokenizedFile(docURI) + if tf == nil { + tf = NewTokenizedFile(text) + } + + funcExpr, argIndex, found := tf.CallContextAtCursor(lineNum, col) if !found { return nil, nil } @@ -4251,7 +4407,7 @@ func (s *Server) SignatureHelp(ctx context.Context, params *protocol.SignatureHe return nil, nil } - aliases := ExtractAliasesInScope(text, lineNum) + aliases := tf.ExtractAliasesInScope(lineNum) s.mergeAliasesFromUse(text, aliases) lines := strings.Split(text, "\n") @@ -4264,13 +4420,13 @@ func (s *Server) SignatureHelp(ctx context.Context, params *protocol.SignatureHe } } else { // Bare function — check buffer, imports, use chains, Kernel - if defLine, found := FindFunctionDefinition(text, functionName); found { + if defLine, found := tf.FindFunctionDefinition(functionName); found { // Build signature from buffer paramNames := extractParamNames(lines, defLine-1) if paramNames == nil { return nil, nil } - sig := buildSignature(functionName, paramNames, lines, defLine-1) + sig := buildSignature(functionName, paramNames, tf, lines, 
defLine-1) return &protocol.SignatureHelp{ Signatures: []protocol.SignatureInformation{sig}, ActiveSignature: 0, @@ -4278,7 +4434,7 @@ func (s *Server) SignatureHelp(ctx context.Context, params *protocol.SignatureHe }, nil } - for _, mod := range ExtractImports(text) { + for _, mod := range tf.ExtractImports() { if results, err := s.store.LookupFunction(mod, functionName); err == nil && len(results) > 0 { result = &results[0] break @@ -4318,7 +4474,8 @@ func (s *Server) SignatureHelp(ctx context.Context, params *protocol.SignatureHe return nil, nil } - sig := buildSignature(functionName, paramNames, fileLines, defIdx) + tfDef := NewTokenizedFile(fileText) + sig := buildSignature(functionName, paramNames, tfDef, fileLines, defIdx) return &protocol.SignatureHelp{ Signatures: []protocol.SignatureInformation{sig}, ActiveSignature: 0, @@ -4326,7 +4483,7 @@ func (s *Server) SignatureHelp(ctx context.Context, params *protocol.SignatureHe }, nil } -func buildSignature(functionName string, paramNames []string, lines []string, defIdx int) protocol.SignatureInformation { +func buildSignature(functionName string, paramNames []string, tf *TokenizedFile, lines []string, defIdx int) protocol.SignatureInformation { label := functionName + "(" + strings.Join(paramNames, ", ") + ")" var params []protocol.ParameterInformation @@ -4342,7 +4499,7 @@ func buildSignature(functionName string, paramNames []string, lines []string, de } // Add @spec and @doc as documentation if present - doc, spec := extractDocAbove(lines, defIdx) + doc, spec := tf.ExtractDocAbove(defIdx) var docParts []string if spec != "" { docParts = append(docParts, "```elixir\n"+spec+"\n```") @@ -4405,19 +4562,19 @@ func (s *Server) TypeDefinition(ctx context.Context, params *protocol.TypeDefini return nil, nil } - expr := ExtractExpression(lines[lineNum], col) - if expr == "" { - return nil, nil + tf := s.docs.GetTokenizedFile(docURI) + if tf == nil { + tf = NewTokenizedFile(text) } - - moduleRef, typeName := 
ExtractModuleAndFunction(expr) + typeCtx := tf.ExpressionAtCursor(lineNum, col) + typeName := typeCtx.FunctionName if typeName == "" { return nil, nil } - aliases := ExtractAliasesInScope(text, lineNum) + aliases := tf.ExtractAliasesInScope(lineNum) s.mergeAliasesFromUse(text, aliases) - fullModule := s.resolveModuleWithNesting(moduleRef, aliases, uriToPath(protocol.DocumentURI(docURI)), lineNum) + fullModule := s.resolveModuleWithNesting(typeCtx.ModuleRef, aliases, uriToPath(protocol.DocumentURI(docURI)), lineNum) results, err := s.store.LookupFunction(fullModule, typeName) if err != nil { @@ -4478,21 +4635,21 @@ func (s *Server) PrepareCallHierarchy(ctx context.Context, params *protocol.Call return nil, nil } - expr := ExtractExpression(lines[lineNum], col) - if expr == "" { - return nil, nil + tf := s.docs.GetTokenizedFile(docURI) + if tf == nil { + tf = NewTokenizedFile(text) } - - moduleRef, functionName := ExtractModuleAndFunction(expr) + callCtx := tf.ExpressionAtCursor(lineNum, col) + functionName := callCtx.FunctionName if functionName == "" { return nil, nil } - aliases := ExtractAliasesInScope(text, lineNum) + aliases := tf.ExtractAliasesInScope(lineNum) s.mergeAliasesFromUse(text, aliases) var fullModule string - if moduleRef != "" { - fullModule = resolveModule(moduleRef, aliases) + if callCtx.ModuleRef != "" { + fullModule = resolveModule(callCtx.ModuleRef, aliases) } else { fullModule = s.resolveBareFunctionModule(uriToPath(protocol.DocumentURI(docURI)), text, lines, lineNum, functionName, aliases) } @@ -4514,7 +4671,7 @@ func (s *Server) PrepareCallHierarchy(ctx context.Context, params *protocol.Call } item := protocol.CallHierarchyItem{ - Name: fmt.Sprintf("%s.%s/%d", fullModule, functionName, parser.ExtractArity(lines[lineNum], functionName)), + Name: fmt.Sprintf("%s.%s/%d", fullModule, functionName, r.Arity), Kind: protocol.SymbolKindFunction, Detail: r.Kind, URI: protocol.DocumentURI(uri.File(r.FilePath)), diff --git 
a/internal/lsp/server_test.go b/internal/lsp/server_test.go index 89ccfb3..64813ab 100644 --- a/internal/lsp/server_test.go +++ b/internal/lsp/server_test.go @@ -752,6 +752,130 @@ end`) } } +func TestCompletion_AliasBlock_SimplePrefix(t *testing.T) { + server, cleanup := setupTestServer(t) + defer cleanup() + + indexFile(t, server.store, server.projectRoot, "lib/services.ex", `defmodule MyApp.Services.Accounts do +end + +defmodule MyApp.Services.Analytics do +end + +defmodule MyApp.Services.Billing do +end +`) + + uri := "file:///test.ex" + server.docs.Set(uri, `defmodule MyModule do + alias MyApp.Services.{ + Ac + } +end`) + + // cursor at "Ac" → line 2, col 6 + items := completionAt(t, server, uri, 2, 6) + if !hasCompletionItem(items, "Accounts") { + t.Error("expected 'Accounts' in alias block completions") + } + if hasCompletionItem(items, "Analytics") { + t.Error("should not include 'Analytics' — doesn't match prefix 'Ac'") + } + if hasCompletionItem(items, "Billing") { + t.Error("should not include 'Billing' — doesn't match prefix 'Ac'") + } +} + +func TestCompletion_AliasBlock_DottedPrefix(t *testing.T) { + server, cleanup := setupTestServer(t) + defer cleanup() + + indexFile(t, server.store, server.projectRoot, "lib/ecto.ex", `defmodule MyApp.Ecto.Paginator do +end + +defmodule MyApp.Ecto.ChangesetHelpers do +end + +defmodule MyApp.Accounts do +end +`) + + uri := "file:///test.ex" + server.docs.Set(uri, `defmodule MyModule do + alias MyApp.{ + Ecto. + } +end`) + + // cursor after "Ecto." 
→ line 2, col 9 + items := completionAt(t, server, uri, 2, 9) + if !hasCompletionItem(items, "Ecto.Paginator") { + t.Error("expected 'Ecto.Paginator' in alias block completions") + } + if !hasCompletionItem(items, "Ecto.ChangesetHelpers") { + t.Error("expected 'Ecto.ChangesetHelpers' in alias block completions") + } + if hasCompletionItem(items, "Accounts") { + t.Error("should not include 'Accounts' — not a child of MyApp.Ecto") + } +} + +func TestCompletion_AliasBlock_DottedPrefixWithPartial(t *testing.T) { + server, cleanup := setupTestServer(t) + defer cleanup() + + indexFile(t, server.store, server.projectRoot, "lib/ecto.ex", `defmodule MyApp.Ecto.Paginator do +end + +defmodule MyApp.Ecto.ChangesetHelpers do +end +`) + + uri := "file:///test.ex" + server.docs.Set(uri, `defmodule MyModule do + alias MyApp.{ + Ecto.Pag + } +end`) + + // cursor at "Ecto.Pag" → line 2, col 12 + items := completionAt(t, server, uri, 2, 12) + if !hasCompletionItem(items, "Ecto.Paginator") { + t.Error("expected 'Ecto.Paginator' in alias block completions") + } + if hasCompletionItem(items, "Ecto.ChangesetHelpers") { + t.Error("should not include 'Ecto.ChangesetHelpers' — doesn't match prefix 'Pag'") + } +} + +func TestCompletion_AliasBlock_EmptyPrefix(t *testing.T) { + server, cleanup := setupTestServer(t) + defer cleanup() + + indexFile(t, server.store, server.projectRoot, "lib/services.ex", `defmodule MyApp.Accounts do +end + +defmodule MyApp.Billing do +end +`) + + uri := "file:///test.ex" + server.docs.Set(uri, `defmodule MyModule do + alias MyApp.{ + + } +end`) + + // cursor on blank line inside the block → line 2, col 4 + items := completionAt(t, server, uri, 2, 4) + if !hasCompletionItem(items, "Accounts") { + t.Error("expected 'Accounts' in alias block completions with empty prefix") + } + if !hasCompletionItem(items, "Billing") { + t.Error("expected 'Billing' in alias block completions with empty prefix") + } +} + func TestCompletion_ImportedFunctions(t *testing.T) { server, 
cleanup := setupTestServer(t) defer cleanup() @@ -947,6 +1071,46 @@ func TestCompletion_NoResults(t *testing.T) { } } +func TestCompletion_FunctionResultDotNoResults(t *testing.T) { + server, cleanup := setupTestServer(t) + defer cleanup() + + indexFile(t, server.store, server.projectRoot, "lib/accounts.ex", `defmodule MyApp.Accounts do + def list, do: [] +end +`) + + uri := "file:///test.ex" + server.docs.Set(uri, `defmodule MyApp.Web do + alias MyApp.Accounts + Accounts.list. +end`) + + // col 16 = right after "Accounts.list." on line 2 + items := completionAt(t, server, uri, 2, 16) + if len(items) != 0 { + t.Errorf("expected no completions after function result dot, got %d: %v", len(items), items) + } +} + +func TestCompletion_VariableDotNoResults(t *testing.T) { + server, cleanup := setupTestServer(t) + defer cleanup() + + uri := "file:///test.ex" + server.docs.Set(uri, `defmodule MyApp.Web do + def run(config) do + config. + end +end`) + + // col 11 = right after "config." on line 2 + items := completionAt(t, server, uri, 2, 11) + if len(items) != 0 { + t.Errorf("expected no completions after variable dot, got %d: %v", len(items), items) + } +} + func TestCompletionResolve_WithDoc(t *testing.T) { server, cleanup := setupTestServer(t) defer cleanup() @@ -1257,6 +1421,47 @@ end`) } } +func TestCompletion_MultilineUseOpts(t *testing.T) { + server, cleanup := setupTestServer(t) + defer cleanup() + + indexFile(t, server.store, server.projectRoot, "lib/custom_mock.ex", `defmodule MyApp.CustomMock do + def mock_func, do: :ok +end +`) + + indexFile(t, server.store, server.projectRoot, "lib/mox_base.ex", `defmodule MyApp.MoxBase do + defmacro __using__(opts) do + mod = Keyword.get(opts, :mod, MyApp.DefaultMod) + quote do + import unquote(mod) + end + end +end +`) + + uri := "file:///test.ex" + src := `defmodule MyApp.Test do + use MyApp.MoxBase, + mod: MyApp.CustomMock + + def test, do: mock_func() +end` + server.docs.Set(uri, src) + + aliases := map[string]string{} + 
calls := ExtractUsesWithOpts(src, aliases) + found := false + for _, c := range calls { + if c.Module == "MyApp.MoxBase" && c.Opts["mod"] == "MyApp.CustomMock" { + found = true + } + } + if !found { + t.Errorf("expected use MyApp.MoxBase with mod: MyApp.CustomMock; got %+v", calls) + } +} + func TestDefinition_ModuleKeyword(t *testing.T) { server, cleanup := setupTestServer(t) defer cleanup() @@ -1628,6 +1833,74 @@ end` } } +func TestDefinition_AliasInjectedByUse_MultiAliasUnexpectedTokens_NoHang(t *testing.T) { + server, cleanup := setupTestServer(t) + defer cleanup() + + // Regression: malformed tokens inside a multi-alias brace list must not hang + // alias parsing, and valid children in the same list should still resolve. + schemaSrc := `defmodule MyApp.Schema do + defmacro __using__(_opts) do + quote do + alias MyApp.{:unexpected, Meta, 42} + end + end +end` + metaSrc := `defmodule MyApp.Meta do + def source(x), do: x +end` + callerSrc := `defmodule MyApp.MyCheck do + use MyApp.Schema + + def run do + Meta.source(:foo) + end +end` + + indexFile(t, server.store, server.projectRoot, "lib/schema.ex", schemaSrc) + indexFile(t, server.store, server.projectRoot, "lib/meta.ex", metaSrc) + + callerURI := "file://" + filepath.Join(server.projectRoot, "lib/my_check.ex") + server.docs.Set(callerURI, callerSrc) + + type definitionResult struct { + locs []protocol.Location + err error + } + done := make(chan definitionResult, 1) + go func() { + locs, err := server.Definition(context.Background(), &protocol.DefinitionParams{ + TextDocumentPositionParams: protocol.TextDocumentPositionParams{ + TextDocument: protocol.TextDocumentIdentifier{URI: protocol.DocumentURI(callerURI)}, + Position: protocol.Position{Line: 4, Character: 4}, + }, + }) + done <- definitionResult{locs: locs, err: err} + }() + + select { + case got := <-done: + if got.err != nil { + t.Fatal(got.err) + } + if len(got.locs) == 0 { + t.Fatal("expected definition for Meta from use-injected alias") + } + 
foundMeta := false + for _, loc := range got.locs { + if strings.Contains(string(loc.URI), "meta.ex") { + foundMeta = true + break + } + } + if !foundMeta { + t.Fatalf("expected definition location in meta.ex, got %v", got.locs) + } + case <-time.After(2 * time.Second): + t.Fatal("definition timed out; possible infinite loop in multi-alias brace scanning") + } +} + func TestReferences_UseWithOptOverride(t *testing.T) { server, cleanup := setupTestServer(t) defer cleanup() @@ -3508,6 +3781,133 @@ end } } +func TestDocumentSymbol_ForReduceDoesNotAddDepth(t *testing.T) { + server, cleanup := setupTestServer(t) + defer cleanup() + + content := `defmodule MyApp.Chat do + defp handle_event(socket, message) do + for {_key, %State{} = state} <- socket.assigns, + state.topic == message.topic, + reduce: socket do + socket -> + process(socket, state, message) + end + end + + defp other_func(x) do + x + 1 + end +end +` + docURI := "file:///test/chat.ex" + server.docs.Set(docURI, content) + + symbols := documentSymbols(t, server, docURI) + + if len(symbols) != 1 { + t.Fatalf("expected 1 top-level symbol, got %d", len(symbols)) + } + + mod := symbols[0] + childNames := collectNames(mod.Children) + + // Both functions should be direct children of the module, not nested. + // The "reduce: socket do" line must NOT be treated as a macro call. 
+ expectedChildren := []string{"handle_event/2", "other_func/1"} + if len(childNames) != len(expectedChildren) { + t.Fatalf("expected %d children of module, got %d: %v", len(expectedChildren), len(childNames), childNames) + } + for i, name := range expectedChildren { + if childNames[i] != name { + t.Errorf("child %d: expected %q, got %q", i, name, childNames[i]) + } + } + + // Verify "reduce" is not present as a symbol anywhere + if found := findSymbol(symbols, "reduce socket"); found != nil { + t.Error("reduce should not appear as a document symbol") + } +} + +func TestDocumentSymbol_MisindentedInnerEndDoesNotCloseFunction(t *testing.T) { + server, cleanup := setupTestServer(t) + defer cleanup() + + content := `defmodule MyApp do + def outer do + if true do + :ok + end + :still_in_outer + end + + def after_func, do: :ok +end +` + docURI := "file:///test/misindented_end.ex" + server.docs.Set(docURI, content) + + symbols := documentSymbols(t, server, docURI) + if len(symbols) != 1 { + t.Fatalf("expected 1 top-level symbol, got %d", len(symbols)) + } + + mod := symbols[0] + childNames := collectNames(mod.Children) + expectedChildren := []string{"outer/0", "after_func/0"} + if len(childNames) != len(expectedChildren) { + t.Fatalf("expected %d children of module, got %d: %v", len(expectedChildren), len(childNames), childNames) + } + for i, name := range expectedChildren { + if childNames[i] != name { + t.Errorf("child %d: expected %q, got %q", i, name, childNames[i]) + } + } + + outer := findSymbol(mod.Children, "outer/0") + if outer == nil { + t.Fatal("outer/0 not found") + } + if outer.Range.End.Line != 6 { + t.Errorf("outer end line: expected 6, got %d", outer.Range.End.Line) + } +} + +func TestDocumentSymbol_SplitLineDoTracksFunctionBody(t *testing.T) { + server, cleanup := setupTestServer(t) + defer cleanup() + + content := `defmodule MyApp do + def split( + x + ) + do + x + 1 + end +end +` + docURI := "file:///test/split_line_do.ex" + server.docs.Set(docURI, 
content) + + symbols := documentSymbols(t, server, docURI) + if len(symbols) != 1 { + t.Fatalf("expected 1 top-level symbol, got %d", len(symbols)) + } + + mod := symbols[0] + split := findSymbol(mod.Children, "split/1") + if split == nil { + t.Fatal("split/1 not found") + } + if split.Range.Start.Line != 1 { + t.Errorf("split start line: expected 1, got %d", split.Range.Start.Line) + } + if split.Range.End.Line != 6 { + t.Errorf("split end line: expected 6, got %d", split.Range.End.Line) + } +} + // Verify capabilities are advertised func TestServer_Capabilities_DocumentSymbolAndWorkspaceSymbol(t *testing.T) { server, cleanup := setupTestServer(t) @@ -4184,3 +4584,47 @@ end`) t.Error("expected an edit replacing the module ref with 'Client'") } } + +func TestDefinition_RequireWithAs(t *testing.T) { + server, cleanup := setupTestServer(t) + defer cleanup() + + snapshotSrc := `defmodule MyApp.Snapshots.ContractSnapshotSchema do + defmacro snapshot_fields do + quote do + field(:snapshot_data, :map) + end + end +end` + + contractSrc := `defmodule MyApp.Contract do + require MyApp.Snapshots.ContractSnapshotSchema, as: ContractSnapshotSchema + + schema "contracts" do + ContractSnapshotSchema.snapshot_fields() + end +end` + + indexFile(t, server.store, server.projectRoot, "lib/snapshots/contract_snapshot_schema.ex", snapshotSrc) + snapshotURI := "file://" + filepath.Join(server.projectRoot, "lib/snapshots/contract_snapshot_schema.ex") + server.docs.Set(snapshotURI, snapshotSrc) + + contractURI := "file://" + filepath.Join(server.projectRoot, "lib/contract.ex") + server.docs.Set(contractURI, contractSrc) + + // line 4 (0-indexed): ` ContractSnapshotSchema.snapshot_fields()` — col 4 is on ContractSnapshotSchema + locs := definitionAt(t, server, contractURI, 4, 4) + if len(locs) == 0 { + t.Fatal("expected go-to-definition for ContractSnapshotSchema resolved via require with as:") + } + + found := false + for _, loc := range locs { + if strings.Contains(string(loc.URI), 
"contract_snapshot_schema.ex") { + found = true + } + } + if !found { + t.Errorf("expected definition location in contract_snapshot_schema.ex, got %v", locs) + } +} diff --git a/internal/parser/module_refs_test.go b/internal/parser/module_refs_test.go index 6407cc2..effd9d9 100644 --- a/internal/parser/module_refs_test.go +++ b/internal/parser/module_refs_test.go @@ -96,3 +96,33 @@ end } } } + +func TestMultiAliasBrace_UnexpectedTokenForwardProgress(t *testing.T) { + dir := t.TempDir() + content := `defmodule MyApp.Web do + alias MyApp.{:unexpected, Accounts, 42} + + def test do + Accounts.list() + end +end +` + path := filepath.Join(dir, "web.ex") + if err := os.WriteFile(path, []byte(content), 0644); err != nil { + t.Fatal(err) + } + _, refs, err := ParseFile(path) + if err != nil { + t.Fatal(err) + } + + found := false + for _, r := range refs { + if r.Module == "MyApp.Accounts" { + found = true + } + } + if !found { + t.Error("expected MyApp.Accounts ref despite unexpected tokens in brace block") + } +} diff --git a/internal/parser/parser.go b/internal/parser/parser.go index 78ea6a1..0e2bab2 100644 --- a/internal/parser/parser.go +++ b/internal/parser/parser.go @@ -4,26 +4,9 @@ import ( "io/fs" "os" "path/filepath" - "regexp" - "strconv" "strings" ) -// Shared regex patterns used by both the parser and the LSP. 
-var ( - AliasRe = regexp.MustCompile(`^\s*alias\s+([A-Za-z0-9_.]+)`) - AliasAsRe = regexp.MustCompile(`^\s*alias\s+([A-Za-z0-9_.]+)\s*,\s*as:\s*([A-Za-z0-9_]+)`) - FuncDefRe = regexp.MustCompile(`^\s*(defp?|defmacrop?|defguardp?|defdelegate)\s+([a-z_][a-z0-9_?!]*)[\s(,]`) - TypeDefRe = regexp.MustCompile(`^\s*@(typep?|opaque)\s+([a-z_][a-z0-9_?!]*)`) -) - -var ( - DefmoduleRe = regexp.MustCompile(`^\s*defmodule\s+([A-Za-z0-9_.]+)\s+do`) - delegateToRe = regexp.MustCompile(`to:\s*([A-Za-z0-9_.]+)`) - delegateAsRe = regexp.MustCompile(`as:\s*:?([a-z_][a-z0-9_?!]*)`) - newStatementRe = regexp.MustCompile(`^\s*(defdelegate|defp?|defmacrop?|defguardp?|alias|import|@|end)\b`) -) - // IsElixirKeyword returns true if the name is an Elixir language keyword // (control flow or definition keyword) rather than a user-defined macro. func IsElixirKeyword(name string) bool { @@ -78,465 +61,9 @@ func ParseFile(path string) ([]Definition, []Reference, error) { // ParseText parses Elixir source text and returns definitions and references. // The path is used to populate FilePath fields but the text is not read from disk. func ParseText(path, text string) ([]Definition, []Reference, error) { - type moduleFrame struct { - name string - depth int // do..end/fn..end nesting depth when this module was opened - savedAliases map[string]string - savedInjectors map[string]bool - } - - lines := strings.Split(text, "\n") - var defs []Definition - var refs []Reference - var moduleStack []moduleFrame - depth := 0 - aliases := map[string]string{} // short name -> full module - injectors := map[string]bool{} // modules from use/import that inject bare functions - inHeredoc := false - - for lineIdx, line := range lines { - lineNum := lineIdx + 1 - - var skip bool - inHeredoc, skip = CheckHeredoc(line, inHeredoc) - if skip { - continue - } - - // Find first non-whitespace character for fast pre-filtering. 
- trimStart := 0 - for trimStart < len(line) && (line[trimStart] == ' ' || line[trimStart] == '\t') { - trimStart++ - } - if trimStart >= len(line) { - continue - } - first := line[trimStart] - rest := line[trimStart:] // line content from first non-whitespace char - - strippedRest := strings.TrimRight(StripCommentsAndStrings(rest), " \t\r") - - // 'e' — check for "end" to pop module stack; otherwise fall through - if first == 'e' { - if IsEnd(strippedRest) { - if len(moduleStack) > 0 && moduleStack[len(moduleStack)-1].depth == depth { - frame := moduleStack[len(moduleStack)-1] - moduleStack = moduleStack[:len(moduleStack)-1] - aliases = frame.savedAliases - injectors = frame.savedInjectors - } - depth-- - if depth < 0 { - depth = 0 - } - continue - } - // Not "end" — may be a bare macro call like "embedded_schema do" - } - - // Track block-opening keywords (do..end and fn..end) for depth counting. - // This covers defmodule/def/defp/case/cond/fn/etc. — any construct closed by "end". - if OpensBlock(strippedRest) { - depth++ - } - - currentModule := "" - if len(moduleStack) > 0 { - currentModule = moduleStack[len(moduleStack)-1].name - } - - // 'a' — alias tracking (+ emit alias ref) - if first == 'a' { - if strings.HasPrefix(rest, "alias") && len(rest) > 5 && (rest[5] == ' ' || rest[5] == '\t') { - afterAlias := strings.TrimLeft(rest[5:], " \t") - moduleName := ScanModuleName(afterAlias) - if moduleName != "" { - remaining := afterAlias[len(moduleName):] - remaining = strings.TrimLeft(remaining, " \t") - - // Multi-alias: alias MyApp.{Accounts, Users} - // ScanModuleName consumes the trailing "." 
so remaining starts with "{" - if strings.HasPrefix(remaining, "{") { - braceEnd := strings.IndexByte(remaining, '}') - if braceEnd >= 0 { - inner := remaining[1:braceEnd] - // Trim trailing dot from parent module name - parent := strings.TrimRight(moduleName, ".") - parentResolved := resolveModule(parent, currentModule) - for _, segment := range strings.Split(inner, ",") { - segment = strings.TrimSpace(segment) - childName := ScanModuleName(segment) - if childName != "" { - fullChild := parentResolved + "." + childName - aliasKey := childName - if dot := strings.LastIndexByte(childName, '.'); dot >= 0 { - aliasKey = childName[dot+1:] - } - aliases[aliasKey] = fullChild - if !strings.Contains(fullChild, "__MODULE__") { - refs = append(refs, Reference{Module: fullChild, Line: lineNum, FilePath: path, Kind: "alias"}) - } - } - } - continue - } - } - - if strings.HasPrefix(remaining, ", as:") { - asStr := strings.TrimLeft(remaining[5:], " \t") // skip ", as:" - asName := scanIdentifier(asStr) - if asName != "" { - resolved := resolveModule(moduleName, currentModule) - if !strings.Contains(resolved, "__MODULE__") { - aliases[asName] = resolved - refs = append(refs, Reference{Module: resolved, Line: lineNum, FilePath: path, Kind: "alias"}) - } - } - } else if strings.HasPrefix(remaining, ",as:") { - asStr := strings.TrimLeft(remaining[4:], " \t") // skip ",as:" - asName := scanIdentifier(asStr) - if asName != "" { - resolved := resolveModule(moduleName, currentModule) - if !strings.Contains(resolved, "__MODULE__") { - aliases[asName] = resolved - refs = append(refs, Reference{Module: resolved, Line: lineNum, FilePath: path, Kind: "alias"}) - } - } - } else { - resolved := resolveModule(moduleName, currentModule) - dot := strings.LastIndexByte(resolved, '.') - var shortName string - if dot >= 0 { - shortName = resolved[dot+1:] - } else { - shortName = resolved - } - aliases[shortName] = resolved - if !strings.Contains(resolved, "__MODULE__") { - refs = append(refs, 
Reference{Module: resolved, Line: lineNum, FilePath: path, Kind: "alias"}) - } - } - continue - } - } - // Not an alias — fall through to ref extraction - goto extractCallRefs - } - - // 'i' — import (refs only) - if first == 'i' { - if strings.HasPrefix(rest, "import") && len(rest) > 6 && (rest[6] == ' ' || rest[6] == '\t') { - afterImport := strings.TrimLeft(rest[6:], " \t") - moduleName := ScanModuleName(afterImport) - if moduleName != "" { - resolved := resolveModule(moduleName, currentModule) - if !strings.Contains(resolved, "__MODULE__") { - refs = append(refs, Reference{Module: resolved, Line: lineNum, FilePath: path, Kind: "import"}) - injectors[resolved] = true - } - continue - } - } - goto extractCallRefs - } - - // 'u' — use (refs only) - if first == 'u' { - if strings.HasPrefix(rest, "use") && len(rest) > 3 && (rest[3] == ' ' || rest[3] == '\t') { - afterUse := strings.TrimLeft(rest[3:], " \t") - moduleName := ScanModuleName(afterUse) - if moduleName != "" { - resolved := resolveModule(moduleName, currentModule) - if !strings.Contains(resolved, "__MODULE__") { - refs = append(refs, Reference{Module: resolved, Line: lineNum, FilePath: path, Kind: "use"}) - injectors[resolved] = true - } - continue - } - } - goto extractCallRefs - } - - // '@' — type definitions (@type, @opaque) and @behaviour refs - // @typep is private-to-file and not indexed. 
- if first == '@' { - if currentModule != "" { - var kind string - var afterKw string - if strings.HasPrefix(rest, "@type") && len(rest) > 5 && (rest[5] == ' ' || rest[5] == '\t') { - kind = "type" - afterKw = strings.TrimLeft(rest[5:], " \t") - } else if strings.HasPrefix(rest, "@opaque") && len(rest) > 7 && (rest[7] == ' ' || rest[7] == '\t') { - kind = "opaque" - afterKw = strings.TrimLeft(rest[7:], " \t") - } - if kind != "" { - name := ScanFuncName(afterKw) - if name != "" { - defs = append(defs, Definition{ - Module: currentModule, - Function: name, - Arity: ExtractArity(line, name), - Line: lineNum, - FilePath: path, - Kind: kind, - }) - } - } - - // @behaviour ModuleName — record as a ref so module renames update it - if strings.HasPrefix(rest, "@behaviour") && len(rest) > 10 && (rest[10] == ' ' || rest[10] == '\t') { - afterBehaviour := strings.TrimLeft(rest[10:], " \t") - moduleName := ScanModuleName(afterBehaviour) - if moduleName != "" { - resolved := resolveModule(moduleName, currentModule) - if !strings.Contains(resolved, "__MODULE__") { - refs = append(refs, Reference{Module: resolved, Line: lineNum, FilePath: path, Kind: "behaviour"}) - } - } - } - - // @callback/@macrocallback — index as definitions for go-to-declaration and go-to-implementation. - // Check @macrocallback first since it shares a prefix with @callback. 
- var callbackKind string - var afterCallbackKw string - if strings.HasPrefix(rest, "@macrocallback") && len(rest) > 14 && (rest[14] == ' ' || rest[14] == '\t') { - callbackKind = "macrocallback" - afterCallbackKw = strings.TrimLeft(rest[14:], " \t") - } else if strings.HasPrefix(rest, "@callback") && len(rest) > 9 && (rest[9] == ' ' || rest[9] == '\t') { - callbackKind = "callback" - afterCallbackKw = strings.TrimLeft(rest[9:], " \t") - } - if callbackKind != "" { - name := ScanFuncName(afterCallbackKw) - if name != "" { - defs = append(defs, Definition{ - Module: currentModule, - Function: name, - Arity: ExtractArity(line, name), - Line: lineNum, - FilePath: path, - Kind: callbackKind, - }) - } - } - - } - // Don't continue — fall through to extractCallRefs so that module - // references in @spec/@type/@callback annotations are captured - // (e.g. User.t() in "@spec get_user() :: User.t()"). - goto extractCallRefs - } - - // 'd' — defmodule, defprotocol, defimpl, def*, defstruct, defexception - if first == 'd' && strings.HasPrefix(rest, "def") { - if name, ok := scanDefKeyword(rest, "defmodule"); ok { - if !strings.Contains(name, ".") && currentModule != "" { - name = currentModule + "." + name - } - currentModule = name - moduleStack = append(moduleStack, moduleFrame{name: currentModule, depth: depth, savedAliases: copyMap(aliases), savedInjectors: copyBoolMap(injectors)}) - defs = append(defs, Definition{ - Module: currentModule, - Line: lineNum, - FilePath: path, - Kind: "module", - }) - continue - } - - if name, ok := scanDefKeyword(rest, "defprotocol"); ok { - if !strings.Contains(name, ".") && currentModule != "" { - name = currentModule + "." 
+ name - } - currentModule = name - moduleStack = append(moduleStack, moduleFrame{name: currentModule, depth: depth, savedAliases: copyMap(aliases), savedInjectors: copyBoolMap(injectors)}) - defs = append(defs, Definition{ - Module: currentModule, - Line: lineNum, - FilePath: path, - Kind: "defprotocol", - }) - continue - } - - if name, ok := scanDefKeyword(rest, "defimpl"); ok { - if !strings.Contains(name, ".") && currentModule != "" { - name = currentModule + "." + name - } - currentModule = name - moduleStack = append(moduleStack, moduleFrame{name: currentModule, depth: depth, savedAliases: copyMap(aliases), savedInjectors: copyBoolMap(injectors)}) - defs = append(defs, Definition{ - Module: currentModule, - Line: lineNum, - FilePath: path, - Kind: "defimpl", - }) - continue - } - - if currentModule != "" { - if kind, funcName, ok := ScanFuncDef(rest); ok { - paramContent := FindParamContent(line, funcName) - maxArity := ArityFromParams(paramContent) - defaultCount := DefaultsFromParams(paramContent) - minArity := maxArity - defaultCount - - var delegateTo, delegateAs string - if kind == "defdelegate" { - delegateTo, delegateAs = findDelegateToAndAs(lines, lineIdx, aliases, currentModule) - } - - allParamNames := ExtractParamNames(line, funcName) - - for arity := minArity; arity <= maxArity; arity++ { - params := JoinParams(allParamNames, arity) - defs = append(defs, Definition{ - Module: currentModule, - Function: funcName, - Arity: arity, - Line: lineNum, - FilePath: path, - Kind: kind, - DelegateTo: delegateTo, - DelegateAs: delegateAs, - Params: params, - }) - } - // Don't continue — line may contain refs like: def foo, do: Repo.all() - goto extractCallRefs - } - - if strings.HasPrefix(rest, "defstruct ") || strings.HasPrefix(rest, "defstruct\t") { - defs = append(defs, Definition{ - Module: currentModule, - Function: "__struct__", - Line: lineNum, - FilePath: path, - Kind: "defstruct", - }) - } - if strings.HasPrefix(rest, "defexception ") || 
strings.HasPrefix(rest, "defexception\t") {
-				defs = append(defs, Definition{
-					Module:   currentModule,
-					Function: "__exception__",
-					Line:     lineNum,
-					FilePath: path,
-					Kind:     "defexception",
-				})
-			}
-		}
-		// Fall through to ref extraction for lines like: def foo, do: Mod.func()
-	}
-
-	extractCallRefs:
-		// Detect bare calls to functions/macros from use'd/import'd modules.
-		if currentModule != "" && len(injectors) > 0 {
-			trimmedRest := strings.TrimRight(rest, " \t\r")
-			if strings.HasSuffix(trimmedRest, " do") || strings.HasSuffix(trimmedRest, "\tdo") {
-				// Macro call with do block: embedded_schema do, schema "t" do, test "x" do
-				name := ScanFuncName(rest)
-				if name != "" && !elixirKeyword[name] {
-					for mod := range injectors {
-						refs = append(refs, Reference{Module: mod, Function: name, Line: lineNum, FilePath: path, Kind: "call"})
-					}
-				}
-			} else {
-				// DSL call at line start: field :name, :string or cast(struct, params)
-				name := ScanFuncName(rest)
-				if name != "" && !elixirKeyword[name] {
-					after := rest[len(name):]
-					if len(after) > 0 && (after[0] == '(' || (after[0] == ' ' && len(after) > 1 && after[1] == ':')) {
-						for mod := range injectors {
-							refs = append(refs, Reference{Module: mod, Function: name, Line: lineNum, FilePath: path, Kind: "call"})
-						}
-					}
-				}
-				// Pipe call: |> cast_embed(:jobs), |> validate_required(...)
-				if idx := strings.Index(rest, "|>"); idx >= 0 {
-					afterPipe := strings.TrimLeft(rest[idx+2:], " \t")
-					name := ScanFuncName(afterPipe)
-					if name != "" && !elixirKeyword[name] {
-						for mod := range injectors {
-							refs = append(refs, Reference{Module: mod, Function: name, Line: lineNum, FilePath: path, Kind: "call"})
-						}
-					}
-				}
-			}
-		}
-
-		// Extract Module.function calls and %Module{} struct literals from any line.
- if !hasUppercase(line) { - continue - } - { - codeLine := StripCommentsAndStrings(line) - - // Module.function calls (including type refs like User.t()) - for _, match := range moduleCallRe.FindAllStringSubmatch(codeLine, -1) { - modRef, funcName := match[1], match[2] - if elixirKeyword[funcName] { - continue - } - resolved := ResolveModuleRef(modRef, aliases, currentModule) - if resolved != "" { - refs = append(refs, Reference{Module: resolved, Function: funcName, Line: lineNum, FilePath: path, Kind: "call"}) - } - } - - // %Module{} struct literals and pattern matches - for _, match := range structLiteralRe.FindAllStringSubmatch(codeLine, -1) { - resolved := ResolveModuleRef(match[1], aliases, currentModule) - if resolved != "" { - refs = append(refs, Reference{Module: resolved, Line: lineNum, FilePath: path, Kind: "call"}) - } - } - - // Standalone module references: @impl GenServer, @derive [Jason.Encoder], - // rescue e in MyError, is_struct(x, User), etc. - for _, match := range standaloneModuleRe.FindAllStringSubmatchIndex(codeLine, -1) { - modStart, modEnd := match[2], match[3] - modRef := codeLine[modStart:modEnd] - // Skip Module.function — already caught by moduleCallRe - if modEnd < len(codeLine) && codeLine[modEnd] == '.' && - modEnd+1 < len(codeLine) && ((codeLine[modEnd+1] >= 'a' && codeLine[modEnd+1] <= 'z') || codeLine[modEnd+1] == '_') { - continue - } - // Skip %Module{ — already caught by structLiteralRe - if modStart > 0 && codeLine[modStart-1] == '%' { - continue - } - // Skip self-references to the current module - if modRef == currentModule { - continue - } - resolved := ResolveModuleRef(modRef, aliases, currentModule) - if resolved != "" { - refs = append(refs, Reference{Module: resolved, Line: lineNum, FilePath: path, Kind: "call"}) - } - } - } - } - - return defs, refs, nil -} - -// ScanModuleName reads a module name ([A-Za-z0-9_.]+) from the start of s. 
-func ScanModuleName(s string) string { - i := 0 - for i < len(s) { - c := s[i] - if (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9') || c == '_' || c == '.' { - i++ - } else { - break - } - } - if i == 0 { - return "" - } - return s[:i] + source := []byte(text) + tokens := Tokenize(source) + return parseTextFromTokens(path, source, tokens) } // ScanFuncName reads a function/type name ([a-z_][a-z0-9_?!]*) from the start of s. @@ -560,408 +87,6 @@ func ScanFuncName(s string) string { return s[:i] } -// scanIdentifier reads an identifier ([A-Za-z0-9_]+) from the start of s. -func scanIdentifier(s string) string { - i := 0 - for i < len(s) { - c := s[i] - if (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9') || c == '_' { - i++ - } else { - break - } - } - if i == 0 { - return "" - } - return s[:i] -} - -// scanDefKeyword checks if rest starts with keyword (e.g. "defmodule") followed -// by whitespace and a module name. For defmodule/defprotocol, requires " do" after. -func scanDefKeyword(rest, keyword string) (string, bool) { - if !strings.HasPrefix(rest, keyword) { - return "", false - } - after := rest[len(keyword):] - if len(after) == 0 || (after[0] != ' ' && after[0] != '\t') { - return "", false - } - after = strings.TrimLeft(after, " \t") - name := ScanModuleName(after) - if name == "" { - return "", false - } - if keyword == "defimpl" { - return name, true - } - remaining := strings.TrimLeft(after[len(name):], " \t") - if remaining == "do" || strings.HasPrefix(remaining, "do ") || strings.HasPrefix(remaining, "do\t") || strings.HasPrefix(remaining, "do\r") { - return name, true - } - return "", false -} - -// funcDefKeywords is ordered longest-first to avoid prefix ambiguity -// (e.g. "defmacrop" before "defmacro", "defp" before "def"). 
-var funcDefKeywords = []string{ - "defdelegate", - "defmacrop", - "defmacro", - "defguardp", - "defguard", - "defp", - "def", -} - -// ScanFuncDef checks if rest matches a function definition keyword followed by -// whitespace and a function name. Returns the kind, name, and true if matched. -func ScanFuncDef(rest string) (string, string, bool) { - for _, kw := range funcDefKeywords { - if !strings.HasPrefix(rest, kw) { - continue - } - after := rest[len(kw):] - // Must be followed by whitespace - if len(after) == 0 || (after[0] != ' ' && after[0] != '\t') { - continue - } - after = strings.TrimLeft(after, " \t") - name := ScanFuncName(after) - if name == "" { - continue - } - // Verify next char is whitespace, '(', or ',' - afterName := after[len(name):] - if len(afterName) > 0 { - c := afterName[0] - if c != ' ' && c != '\t' && c != '(' && c != ',' && c != '\n' && c != '\r' { - continue - } - } - return kw, name, true - } - return "", "", false -} - -// CheckHeredoc updates the inHeredoc state for a given line. Returns the new -// inHeredoc state and whether this line is a heredoc boundary or content that -// should be skipped by callers doing line-by-line analysis. -func CheckHeredoc(line string, inHeredoc bool) (newState bool, skip bool) { - if strings.IndexByte(line, '"') >= 0 { - if c := strings.Count(line, `"""`); c > 0 { - if c < 2 { - inHeredoc = !inHeredoc - } - return inHeredoc, true - } - } - if strings.IndexByte(line, '\'') >= 0 { - if c := strings.Count(line, `'''`); c > 0 { - if c < 2 { - inHeredoc = !inHeredoc - } - return inHeredoc, true - } - } - return inHeredoc, inHeredoc -} - -// ContainsDo returns true if the trimmed line ends with a block-opening " do" -// (not an inline "do:" keyword argument). -func ContainsDo(trimmed string) bool { - return trimmed == "do" || strings.HasSuffix(trimmed, " do") || strings.HasSuffix(trimmed, "\tdo") -} - -// IsEnd returns true if the trimmed line starts with the block-closing "end" -// keyword. 
It distinguishes "end" from identifiers like "endpoint" by checking -// that the character after "end" is not an identifier character. -func IsEnd(trimmed string) bool { - if !strings.HasPrefix(trimmed, "end") { - return false - } - // "end" at end of string, or followed by a non-identifier char - return len(trimmed) == 3 || !isIdentChar(trimmed[3]) -} - -// OpensBlock returns true if the trimmed line opens a block that will be closed -// by a matching "end". This covers both do..end blocks and fn..end blocks. -func OpensBlock(trimmed string) bool { - return ContainsDo(trimmed) || ContainsFn(trimmed) -} - -// ContainsFn returns true if the line opens an anonymous function block -// (fn ... -> on its own line) that will be closed by a matching "end". -// It returns false when the fn...end is entirely on one line. -// Callers should pass input through StripCommentsAndStrings first. -func ContainsFn(code string) bool { - if !containsFnKeyword(code) { - return false - } - // Must not have a matching end on the same line (inline fn...end). - if idx := strings.LastIndex(code, " end"); idx >= 0 { - if IsEnd(strings.TrimSpace(code[idx:])) { - return false - } - } - return true -} - -// containsFnKeyword returns true if code contains "fn" as a standalone keyword, -// not part of a longer identifier. The character before "fn" must be a -// non-identifier char (or start of string), and the character after must also -// be a non-identifier char (or end of string). -func containsFnKeyword(code string) bool { - for i := 0; i <= len(code)-2; i++ { - if code[i] != 'f' || code[i+1] != 'n' { - continue - } - // Check character before: must be start of string or non-identifier. - // ':' before means it's an atom (:fn), not the keyword. - if i > 0 && (isIdentChar(code[i-1]) || code[i-1] == ':') { - continue - } - // Check character after: must be end of string or non-identifier. - // ':' after means it's a keyword key (fn: value), not the keyword. 
-		if i+2 < len(code) && (isIdentChar(code[i+2]) || code[i+2] == ':') {
-			continue
-		}
-		return true
-	}
-	return false
-}
-
-func isIdentChar(b byte) bool {
-	return (b >= 'a' && b <= 'z') || (b >= 'A' && b <= 'Z') || (b >= '0' && b <= '9') || b == '_'
-}
-
-// findDelegateToAndAs searches the current line and up to 5 subsequent lines for
-// to: and as: targets, resolving the to: module via aliases.
-func findDelegateToAndAs(lines []string, startIdx int, aliases map[string]string, currentModule string) (string, string) {
-	end := startIdx + 6
-	if end > len(lines) {
-		end = len(lines)
-	}
-
-	var targetModule, targetFunc string
-	for i := startIdx; i < end; i++ {
-		// A new statement on any line after the first means the current defdelegate ended
-		if i > startIdx && newStatementRe.MatchString(lines[i]) {
-			break
-		}
-		if m := delegateToRe.FindStringSubmatch(lines[i]); m != nil && targetModule == "" {
-			target := m[1]
-			// Resolve __MODULE__ directly in to: field
-			if currentModule != "" {
-				target = strings.ReplaceAll(target, "__MODULE__", currentModule)
-			}
-			if resolved, ok := aliases[target]; ok {
-				// Exact alias match: "to: Services" where Services -> MyApp.HRIS.Services
-				targetModule = resolved
-			} else if parts := strings.SplitN(target, ".", 2); len(parts) == 2 {
-				// Partial alias: "to: Services.Foo" where Services -> MyApp.HRIS.Services
-				if resolved, ok := aliases[parts[0]]; ok {
-					targetModule = resolved + "." + parts[1]
-				} else {
-					targetModule = target
-				}
-			} else {
-				targetModule = target
-			}
-		}
-		if m := delegateAsRe.FindStringSubmatch(lines[i]); m != nil && targetFunc == "" {
-			targetFunc = m[1]
-		}
-	}
-	return targetModule, targetFunc
-}
-
-// FindParamContent locates funcName in line, finds the opening parenthesis
-// after it, and returns the substring starting after that '('. Returns ""
-// if funcName is not found or has no parenthesized arguments.
-// This allows callers that need both arity and default counts to avoid -// repeating the Index + IndexByte lookup. -func FindParamContent(line, funcName string) string { - idx := strings.Index(line, funcName) - if idx < 0 { - return "" - } - rest := line[idx+len(funcName):] - parenIdx := strings.IndexByte(rest, '(') - if parenIdx < 0 { - return "" - } - return rest[parenIdx+1:] -} - -// ArityFromParams counts the number of top-level arguments in the parameter -// content string (as returned by FindParamContent). Respects nested -// parens/brackets/braces and skips string literals. -func ArityFromParams(inside string) int { - if inside == "" { - return 0 - } - depth := 1 - commas := 0 - hasContent := false - for i := 0; i < len(inside); i++ { - ch := inside[i] - // Skip string and charlist literals - if ch == '"' || ch == '\'' { - i = skipStringLiteral(inside, i) - hasContent = true - continue - } - switch ch { - case '(', '[', '{': - depth++ - case ')', ']', '}': - depth-- - if depth == 0 { - if hasContent { - return commas + 1 - } - return 0 - } - case ',': - if depth == 1 { - commas++ - } - } - if depth == 1 && ch != ' ' && ch != '\t' && ch != '\n' { - hasContent = true - } - } - if hasContent { - return commas + 1 - } - return 0 -} - -// DefaultsFromParams counts parameters with default values (\\) in the -// parameter content string (as returned by FindParamContent). Only counts -// defaults at the top-level param depth, not inside nested structures. 
-func DefaultsFromParams(inside string) int { - if inside == "" { - return 0 - } - depth := 1 - defaults := 0 - for i := 0; i < len(inside); i++ { - ch := inside[i] - // Skip string and charlist literals - if ch == '"' || ch == '\'' { - i = skipStringLiteral(inside, i) - continue - } - switch ch { - case '(', '[', '{': - depth++ - case ')', ']', '}': - depth-- - if depth == 0 { - return defaults - } - case '\\': - if depth == 1 && i+1 < len(inside) && inside[i+1] == '\\' { - defaults++ - i++ // skip the second backslash - } - } - } - return defaults -} - -// ExtractArity counts the number of arguments in a function definition line. -// It finds the first parenthesized argument list after the function name and -// counts top-level commas, respecting nested parens/brackets/braces. -func ExtractArity(line string, funcName string) int { - return ArityFromParams(FindParamContent(line, funcName)) -} - -// CountDefaultParams counts the number of parameters with default values (\\) -// in a function definition line. Only counts defaults at the top-level param -// depth, not inside nested structures. -func CountDefaultParams(line string, funcName string) int { - return DefaultsFromParams(FindParamContent(line, funcName)) -} - -// ExtractParamNames extracts readable parameter names from a function -// definition line. Returns nil if the line can't be parsed. For complex -// patterns (e.g. %{name: name}), falls back to positional names like "arg1". 
-func ExtractParamNames(line, funcName string) []string { - idx := strings.Index(line, funcName) - if idx < 0 { - return nil - } - rest := line[idx+len(funcName):] - parenIdx := strings.IndexByte(rest, '(') - if parenIdx < 0 { - return nil - } - - inside := rest[parenIdx+1:] - depth := 1 - var end int - for i := 0; i < len(inside); i++ { - switch inside[i] { - case '(', '[', '{': - depth++ - case ')', ']', '}': - depth-- - if depth == 0 { - end = i - goto found - } - } - } - return nil - -found: - paramStr := inside[:end] - if strings.TrimSpace(paramStr) == "" { - return nil - } - - var params []string - depth = 0 - start := 0 - for i := 0; i < len(paramStr); i++ { - switch paramStr[i] { - case '(', '[', '{': - depth++ - case ')', ']', '}': - depth-- - case '<': - if i+1 < len(paramStr) && paramStr[i+1] == '<' { - depth++ - i++ - } - case '>': - if i+1 < len(paramStr) && paramStr[i+1] == '>' { - depth-- - i++ - } - case ',': - if depth == 0 { - params = append(params, strings.TrimSpace(paramStr[start:i])) - start = i + 1 - } - } - } - params = append(params, strings.TrimSpace(paramStr[start:])) - - var names []string - for i, p := range params { - if bsIdx := strings.Index(p, "\\\\"); bsIdx >= 0 { - p = strings.TrimSpace(p[:bsIdx]) - } - name := scanParamName(p, i) - names = append(names, name) - } - return names -} - // JoinParams returns a comma-separated string of the first `arity` parameter // names extracted from a function definition. Returns "" when names is nil or // shorter than arity. @@ -972,38 +97,6 @@ func JoinParams(names []string, arity int) string { return strings.Join(names[:arity], ",") } -func scanParamName(param string, index int) string { - param = strings.TrimSpace(param) - if name := ScanFuncName(param); name != "" && name != "_" { - return name - } - // Handle "pattern = variable" (e.g. 
%User{} = user, [_ | _] = list) - if eqIdx := strings.LastIndex(param, "="); eqIdx >= 0 { - after := strings.TrimSpace(param[eqIdx+1:]) - if name := ScanFuncName(after); name != "" && name != "_" { - return name - } - } - return "arg" + strconv.Itoa(index+1) -} - -// skipStringLiteral advances past a string or charlist literal starting at -// position i in s (where s[i] is '"' or '\"). Returns the index of the -// closing quote character so that the outer for-loop's i++ lands past it. -func skipStringLiteral(s string, i int) int { - quote := s[i] - for j := i + 1; j < len(s); j++ { - if s[j] == '\\' { - j++ // skip escaped character - continue - } - if s[j] == quote { - return j - } - } - return len(s) - 1 -} - func resolveModule(s, currentModule string) string { if currentModule != "" { return strings.ReplaceAll(s, "__MODULE__", currentModule) @@ -1011,18 +104,6 @@ func resolveModule(s, currentModule string) string { return s } -// moduleCallRe matches Module.function calls — an uppercase module segment -// followed by a dot and a lowercase function name. -var moduleCallRe = regexp.MustCompile(`([A-Z][A-Za-z0-9_]*(?:\.[A-Z][A-Za-z0-9_]*)*)\.([a-z_][a-z0-9_?!]*)`) - -// structLiteralRe matches %Module{...} struct literals and pattern matches. -var structLiteralRe = regexp.MustCompile(`%([A-Z][A-Za-z0-9_]*(?:\.[A-Z][A-Za-z0-9_]*)*)\{`) - -// standaloneModuleRe matches module names that appear without a .function or %{ -// suffix — covers @impl GenServer, @derive [Jason.Encoder], rescue e in MyError, -// is_struct(x, User), etc. The negative lookahead for . and { is handled in code. -var standaloneModuleRe = regexp.MustCompile(`(?:^|[^A-Za-z0-9_.%])([A-Z][A-Za-z0-9_]*(?:\.[A-Z][A-Za-z0-9_]*)*)`) - // ResolveModuleRef resolves a module reference through aliases and __MODULE__. // Returns "" if the reference contains unresolvable __MODULE__. 
func ResolveModuleRef(modRef string, aliases map[string]string, currentModule string) string { @@ -1041,143 +122,6 @@ func ResolveModuleRef(modRef string, aliases map[string]string, currentModule st return resolved } -func hasUppercase(s string) bool { - for i := 0; i < len(s); i++ { - if s[i] >= 'A' && s[i] <= 'Z' { - return true - } - } - return false -} - -// StripCommentsAndStrings removes inline comments and replaces the content -// of string literals and sigils with spaces so that regex-based extraction -// doesn't produce false-positive references from comments, strings, or sigils. -func StripCommentsAndStrings(line string) string { - // Fast path: skip allocation if line has no strings, comments, or sigils - if !strings.ContainsAny(line, "\"'#~") { - return line - } - buf := []byte(line) - i := 0 - for i < len(buf) { - ch := buf[i] - // Skip escaped characters - if ch == '\\' && i+1 < len(buf) { - i += 2 - continue - } - // Sigil: ~s(...), ~r/.../, etc. - if ch == '~' && i+1 < len(buf) { - next := buf[i+1] - sigilStart := i + 2 - // Sigil letter (uppercase or lowercase) - if (next >= 'a' && next <= 'z') || (next >= 'A' && next <= 'Z') { - if sigilStart < len(buf) { - i = blankSigil(buf, sigilStart) - continue - } - } - i++ - continue - } - // String literal (double-quoted) - if ch == '"' { - i = blankQuoted(buf, i, '"') - continue - } - // Single-quoted charlist - if ch == '\'' { - i = blankQuoted(buf, i, '\'') - continue - } - // Comment — blank everything from here to end of line - if ch == '#' { - for k := i; k < len(buf); k++ { - buf[k] = ' ' - } - break - } - i++ - } - return string(buf) -} - -// blankQuoted blanks the contents of a quoted literal (string or charlist) -// starting at buf[i] (which is the opening quote). Returns the index after -// the closing quote. 
-func blankQuoted(buf []byte, i int, quote byte) int { - j := i + 1 - for j < len(buf) { - if buf[j] == '\\' && j+1 < len(buf) { - buf[j] = ' ' - buf[j+1] = ' ' - j += 2 - continue - } - if buf[j] == quote { - i = j + 1 - return i - } - buf[j] = ' ' - j++ - } - // Unterminated — blank to end - for k := i + 1; k < len(buf); k++ { - buf[k] = ' ' - } - return len(buf) -} - -// blankSigil blanks the contents of a sigil starting at buf[i] (the opening -// delimiter character, after the ~X). Returns the index after the closing -// delimiter + modifiers. -func blankSigil(buf []byte, i int) int { - opener := buf[i] - var closer byte - switch opener { - case '(': - closer = ')' - case '[': - closer = ']' - case '{': - closer = '}' - case '<': - closer = '>' - case '/', '|', '"', '\'': - closer = opener - default: - return i + 1 - } - j := i + 1 - depth := 1 - for j < len(buf) { - if buf[j] == '\\' && j+1 < len(buf) { - buf[j] = ' ' - buf[j+1] = ' ' - j += 2 - continue - } - if buf[j] == closer { - depth-- - if depth == 0 { - j++ - // Skip trailing sigil modifiers (letters) - for j < len(buf) && ((buf[j] >= 'a' && buf[j] <= 'z') || (buf[j] >= 'A' && buf[j] <= 'Z')) { - buf[j] = ' ' - j++ - } - return j - } - } else if closer != opener && buf[j] == opener { - depth++ - } - buf[j] = ' ' - j++ - } - return len(buf) -} - func copyMap(m map[string]string) map[string]string { cp := make(map[string]string, len(m)) for k, v := range m { diff --git a/internal/parser/parser_bench_test.go b/internal/parser/parser_bench_test.go new file mode 100644 index 0000000..74b6940 --- /dev/null +++ b/internal/parser/parser_bench_test.go @@ -0,0 +1,112 @@ +package parser + +import ( + "os" + "path/filepath" + "testing" +) + +var benchFiles []struct { + name string + data []byte +} + +func loadBenchFiles(b *testing.B) { + b.Helper() + if benchFiles != nil { + return + } + testdata := filepath.Join("..", "lsp", "testdata", "monorepo", "apps", "app_with_ecto_migration", "deps") + candidates := 
[]string{ + filepath.Join(testdata, "ecto", "lib", "ecto", "changeset.ex"), + filepath.Join(testdata, "db_connection", "lib", "db_connection.ex"), + filepath.Join(testdata, "ecto", "lib", "ecto", "repo.ex"), + filepath.Join(testdata, "ecto", "lib", "ecto", "query.ex"), + filepath.Join(testdata, "ecto_sql", "lib", "ecto", "adapters", "sql.ex"), + } + for _, path := range candidates { + data, err := os.ReadFile(path) + if err != nil { + continue + } + benchFiles = append(benchFiles, struct { + name string + data []byte + }{filepath.Base(path), data}) + } + if len(benchFiles) == 0 { + b.Skip("no benchmark files found") + } +} + +func BenchmarkParseText(b *testing.B) { + loadBenchFiles(b) + for _, f := range benchFiles { + b.Run(f.name, func(b *testing.B) { + text := string(f.data) + b.SetBytes(int64(len(f.data))) + b.ResetTimer() + for i := 0; i < b.N; i++ { + _, _, _ = ParseText("bench.ex", text) + } + }) + } +} + +func BenchmarkTokenize(b *testing.B) { + loadBenchFiles(b) + for _, f := range benchFiles { + b.Run(f.name, func(b *testing.B) { + b.SetBytes(int64(len(f.data))) + b.ResetTimer() + for i := 0; i < b.N; i++ { + Tokenize(f.data) + } + }) + } +} + +func BenchmarkParseTextAllFiles(b *testing.B) { + testdata := filepath.Join("..", "lsp", "testdata") + var allFiles []struct { + name string + data []byte + } + _ = filepath.Walk(testdata, func(path string, info os.FileInfo, err error) error { + if err != nil || info.IsDir() { + return nil + } + ext := filepath.Ext(path) + if ext != ".ex" && ext != ".exs" { + return nil + } + data, err := os.ReadFile(path) + if err != nil { + return nil + } + rel, _ := filepath.Rel(testdata, path) + allFiles = append(allFiles, struct { + name string + data []byte + }{rel, data}) + return nil + }) + if len(allFiles) == 0 { + b.Skip("no test files found") + } + + var totalBytes int64 + for _, f := range allFiles { + totalBytes += int64(len(f.data)) + } + + b.Run("all_testdata", func(b *testing.B) { + b.SetBytes(totalBytes) + 
b.ResetTimer() + for i := 0; i < b.N; i++ { + for _, f := range allFiles { + _, _, _ = ParseText(f.name, string(f.data)) + } + } + }) +} diff --git a/internal/parser/parser_test.go b/internal/parser/parser_test.go index 99548ae..5b7181a 100644 --- a/internal/parser/parser_test.go +++ b/internal/parser/parser_test.go @@ -4,6 +4,7 @@ import ( "fmt" "os" "path/filepath" + "strings" "testing" ) @@ -130,6 +131,72 @@ end } } +func TestParseFile_NestedModuleDoNextLine(t *testing.T) { + path := writeTempFile(t, `defmodule MyApp.Outer do + defmodule Inner + do + def inner_func, do: :ok + end + + def outer_func, do: :ok +end +`) + + defs, _, err := ParseFile(path) + if err != nil { + t.Fatal(err) + } + + var innerMod, innerFunc bool + for _, d := range defs { + if d.Kind == "module" && d.Module == "MyApp.Outer.Inner" { + innerMod = true + } + if d.Function == "inner_func" { + innerFunc = true + if d.Module != "MyApp.Outer.Inner" { + t.Errorf("inner_func should belong to MyApp.Outer.Inner, got %s", d.Module) + } + } + if d.Function == "outer_func" && d.Module != "MyApp.Outer" { + t.Errorf("outer_func should belong to MyApp.Outer, got %s", d.Module) + } + } + if !innerMod { + t.Error("missing inner module definition") + } + if !innerFunc { + t.Error("missing inner_func") + } +} + +func TestParseFile_InlineDoModule(t *testing.T) { + path := writeTempFile(t, `defmodule MyApp.Outer do + defmodule Inline, do: (def greet, do: :hi) + + def outer_func, do: :ok +end +`) + + defs, _, err := ParseFile(path) + if err != nil { + t.Fatal(err) + } + + var inlineMod bool + for _, d := range defs { + if d.Kind == "module" && d.Module == "MyApp.Outer.Inline" { + inlineMod = true + } + if d.Function == "outer_func" && d.Module != "MyApp.Outer" { + t.Errorf("outer_func should belong to MyApp.Outer, got %s", d.Module) + } + } + if !inlineMod { + t.Error("missing MyApp.Outer.Inline module definition from inline do: form") + } +} + func TestParseFile_Macros(t *testing.T) { path := writeTempFile(t, 
`defmodule MyApp.Macros do defmacro my_macro(arg) do @@ -1327,103 +1394,6 @@ end } } -func TestExtractArity(t *testing.T) { - tests := []struct { - name string - line string - funcName string - expected int - }{ - {"no parens", " def foo, do: :ok", "foo", 0}, - {"empty parens", " def foo(), do: :ok", "foo", 0}, - {"one arg", " def foo(a), do: :ok", "foo", 1}, - {"two args", " def foo(a, b) do", "foo", 2}, - {"three args", " def create(name, email, role) do", "create", 3}, - {"nested map", " def foo(%{name: name}, opts) do", "foo", 2}, - {"nested list", " def foo([head | tail], acc) do", "foo", 2}, - {"default arg", " defmacro from(expr, kw \\\\ []) do", "from", 2}, - {"defguard", " defguard is_admin(user) when user.role == :admin", "is_admin", 1}, - {"defdelegate", " defdelegate create(attrs), to: Create", "create", 1}, - {"tuple arg", " def foo({a, b}, c) do", "foo", 2}, - {"keyword list", " def foo(a, opts \\\\ [key: :val]) do", "foo", 2}, - {"pattern match", " def handle_call(:get, _from, state) do", "handle_call", 3}, - } - - for _, tt := range tests { - t.Run(tt.name, func(t *testing.T) { - got := ExtractArity(tt.line, tt.funcName) - if got != tt.expected { - t.Errorf("ExtractArity(%q, %q) = %d, want %d", tt.line, tt.funcName, got, tt.expected) - } - }) - } -} - -func TestCountDefaultParams(t *testing.T) { - tests := []struct { - name string - line string - funcName string - expected int - }{ - {"no defaults", " def foo(a, b) do", "foo", 0}, - {"one default", ` def foo(a, b \\ []) do`, "foo", 1}, - {"two defaults", ` def foo(a, b \\ nil, c \\ []) do`, "foo", 2}, - {"no parens", " def foo, do: :ok", "foo", 0}, - {"empty parens", " def foo() do", "foo", 0}, - {"default with keyword list", ` def foo(a, opts \\ [key: :val]) do`, "foo", 1}, - {"default in nested not counted", ` def foo(%{a: b \\ c}, d) do`, "foo", 0}, - {"defmacro default", ` defmacro from(expr, kw \\ []) do`, "from", 1}, - } - - for _, tt := range tests { - t.Run(tt.name, func(t *testing.T) { - 
got := CountDefaultParams(tt.line, tt.funcName) - if got != tt.expected { - t.Errorf("CountDefaultParams(%q, %q) = %d, want %d", tt.line, tt.funcName, got, tt.expected) - } - }) - } -} - -func TestExtractParamNames(t *testing.T) { - tests := []struct { - name string - line string - funcName string - expected []string - }{ - {"simple params", " def create(name, email) do", "create", []string{"name", "email"}}, - {"default param", ` def fetch(slug, opts \\ []) do`, "fetch", []string{"slug", "opts"}}, - {"pattern match map", " def process(%{name: name}, data) do", "process", []string{"arg1", "data"}}, - {"no params", " def run do", "run", nil}, - {"empty parens", " def run() do", "run", nil}, - {"single param", " def get(id) do", "get", []string{"id"}}, - {"underscore param", " def handle(_ignored, state) do", "handle", []string{"_ignored", "state"}}, - {"struct = var", " def update(%User{} = user, attrs) do", "update", []string{"user", "attrs"}}, - {"var = struct", " def update(user = %User{}, attrs) do", "update", []string{"user", "attrs"}}, - {"tuple pattern", " def handle_info({:DOWN, _ref, :process, pid, _reason}, state) do", "handle_info", []string{"arg1", "state"}}, - {"bare underscore", " def foo(_, b) do", "foo", []string{"arg1", "b"}}, - {"atom literal", " def handle_call(:get, _from, state) do", "handle_call", []string{"arg1", "_from", "state"}}, - {"list pattern = var", " def process([_ | _] = list, opts) do", "process", []string{"list", "opts"}}, - {"binary pattern", " def parse(<>) do", "parse", []string{"arg1"}}, - } - - for _, tt := range tests { - t.Run(tt.name, func(t *testing.T) { - got := ExtractParamNames(tt.line, tt.funcName) - if len(got) != len(tt.expected) { - t.Fatalf("ExtractParamNames(%q, %q) = %v, want %v", tt.line, tt.funcName, got, tt.expected) - } - for i := range got { - if got[i] != tt.expected[i] { - t.Errorf("param %d: got %q, want %q", i, got[i], tt.expected[i]) - } - } - }) - } -} - func TestParseFile_DefaultParamExpansion(t 
*testing.T) {
 	path := writeTempFile(t, `defmodule MyApp.Companies do
   def fetch_company_by_slug(slug, opts \\ []) do
@@ -2047,3 +2017,684 @@ end
 		t.Errorf("expected String.t ref from callback type annotation, refs: %v", refs)
 	}
 }
+
+// --- Regression tests for parser edge cases ---
+
+func TestParseFile_CharLiteralDoesNotConfuseStringBlanking(t *testing.T) {
+	// Bug 1: char literal ?" should not eat the module ref on the same line
+	path := writeTempFile(t, "defmodule MyApp.Foo do\n def bar do\n x = ?\"\n Real.Module.call()\n end\nend\n")
+
+	_, refs, err := ParseFile(path)
+	if err != nil {
+		t.Fatal(err)
+	}
+
+	found := false
+	for _, r := range refs {
+		if r.Module == "Real.Module" && r.Function == "call" {
+			found = true
+		}
+	}
+	if !found {
+		t.Errorf("expected Real.Module.call ref; got refs: %+v", refs)
+	}
+}
+
+func TestParseFile_InterpolationDoesNotConfuseRefExtraction(t *testing.T) {
+	// Bug 2: string interpolation with nested quotes containing module refs
+	path := writeTempFile(t, "defmodule MyApp.Foo do\n def bar do\n x = \"hello #{Real.Module.call(\\\"arg\\\")}\"\n Other.Module.work()\n end\nend\n")
+
+	_, refs, err := ParseFile(path)
+	if err != nil {
+		t.Fatal(err)
+	}
+
+	for _, r := range refs {
+		if r.Module == "Real.Module" {
+			t.Errorf("should not extract refs from inside string interpolation, got %+v", r)
+		}
+	}
+
+	found := false
+	for _, r := range refs {
+		if r.Module == "Other.Module" && r.Function == "work" {
+			found = true
+		}
+	}
+	if !found {
+		t.Errorf("expected Other.Module.work ref; got refs: %+v", refs)
+	}
+}
+
+func TestParseFile_TripleQuoteInStringDoesNotToggleHeredoc(t *testing.T) {
+	// Bug 3: """ inside a string should not cause subsequent lines to be skipped
+	path := writeTempFile(t, "defmodule MyApp.Foo do\n def bar do\n x = \"contains \\\"\\\"\\\" triple quotes\"\n Real.Module.call()\n end\nend\n")
+
+	_, refs, err := ParseFile(path)
+	if err != nil {
+		t.Fatal(err)
+	}
+
+	found := false
+	for _, r := range refs {
+		if 
r.Module == "Real.Module" && r.Function == "call" { + found = true + } + } + if !found { + t.Errorf("Real.Module.call should be found; got refs: %+v", refs) + } +} + +func TestParseText_LineContinuation(t *testing.T) { + // Bug 4: backslash at EOL joins with next line. + // Use a case where the module name spans the continuation boundary. + text := "defmodule MyApp.Foo do\n alias Some.\\\n Module\n def bar, do: Some.Module.call()\nend\n" + + _, refs, err := ParseText("test.ex", text) + if err != nil { + t.Fatal(err) + } + + found := false + for _, r := range refs { + if r.Module == "Some.Module" && r.Function == "call" { + found = true + } + } + if !found { + t.Errorf("Some.Module.call should be resolved after line continuation; got refs: %+v", refs) + } +} + +// --- Regression tests for multi-line construct bugs --- + +func TestParseFile_MultiLineAliasAs(t *testing.T) { + // Bug: alias with multi-line as: was silently lost because the trailing + // comma didn't trigger bracket joining and the parser saw two separate lines. + path := writeTempFile(t, `defmodule MyApp do + alias MyModule.MySubModule, + as: Something + + def foo do + Something.call() + end +end +`) + + _, refs, err := ParseFile(path) + if err != nil { + t.Fatal(err) + } + + // The alias ref should be recorded + foundAlias := false + for _, r := range refs { + if r.Kind == "alias" && r.Module == "MyModule.MySubModule" { + foundAlias = true + } + } + if !foundAlias { + t.Error("expected alias ref for MyModule.MySubModule") + } + + // Something.call() should resolve via the as: alias + foundCall := false + for _, r := range refs { + if r.Kind == "call" && r.Module == "MyModule.MySubModule" && r.Function == "call" { + foundCall = true + } + } + if !foundCall { + t.Error("expected Something.call() to resolve to MyModule.MySubModule.call via as: alias") + } +} + +func TestParseFile_MultiLineAliasAs_Defdelegate(t *testing.T) { + // Multi-line alias ... as: must resolve for defdelegate targets too. 
+ path := writeTempFile(t, `defmodule MyApp do + alias MyApp.Serializer.Date, + as: DateSerializer + + defdelegate format(date), to: DateSerializer +end +`) + + defs, _, err := ParseFile(path) + if err != nil { + t.Fatal(err) + } + + for _, d := range defs { + if d.Function == "format" { + if d.DelegateTo != "MyApp.Serializer.Date" { + t.Errorf("expected DelegateTo MyApp.Serializer.Date, got %q", d.DelegateTo) + } + return + } + } + t.Error("missing defdelegate format") +} + +func TestParseFile_SigilTripleQuoteDoesNotToggleHeredoc(t *testing.T) { + // Bug: """ inside ~s(...) toggled heredoc mode on, causing subsequent + // lines to be silently skipped by the parser. + path := writeTempFile(t, `defmodule MyApp do + def foo do + x = ~s(this has """ inside parens) + Real.Module.call() + end +end +`) + + _, refs, err := ParseFile(path) + if err != nil { + t.Fatal(err) + } + + found := false + for _, r := range refs { + if r.Module == "Real.Module" && r.Function == "call" { + found = true + } + } + if !found { + t.Error("expected Real.Module.call ref — triple quote inside sigil may have toggled heredoc") + } +} + +func TestParseFile_SigilTripleQuoteDoesNotToggleHeredoc_Bracket(t *testing.T) { + // Same bug with ~s[...] delimiter. + path := writeTempFile(t, `defmodule MyApp do + def foo do + x = ~s[this has """ inside brackets] + Real.Module.call() + end +end +`) + + _, refs, err := ParseFile(path) + if err != nil { + t.Fatal(err) + } + + found := false + for _, r := range refs { + if r.Module == "Real.Module" && r.Function == "call" { + found = true + } + } + if !found { + t.Error("expected Real.Module.call ref") + } +} + +func TestParseFile_MultiLineSigilNoFalseRefs(t *testing.T) { + // Bug: multi-line sigil content was indexed as real references because the + // parser had no "inside sigil" tracking (only heredoc tracking). 
+	path := writeTempFile(t, `defmodule MyApp do
+  @doc ~S(
+  Fake.Module.ref() inside sigil
+  )
+  def foo do
+    Real.Module.call()
+  end
+end
+`)
+
+	_, refs, err := ParseFile(path)
+	if err != nil {
+		t.Fatal(err)
+	}
+
+	for _, r := range refs {
+		if r.Module == "Fake.Module" {
+			t.Errorf("should not extract refs from inside multi-line sigil, got %+v", r)
+		}
+	}
+
+	found := false
+	for _, r := range refs {
+		if r.Module == "Real.Module" && r.Function == "call" {
+			found = true
+		}
+	}
+	if !found {
+		t.Error("expected Real.Module.call ref after multi-line sigil")
+	}
+}
+
+func TestParseFile_MultiLineSigilNoFalseDefs(t *testing.T) {
+	// Multi-line sigil should not swallow subsequent function definitions.
+	path := writeTempFile(t, `defmodule MyApp do
+  @doc ~S(
+  multi-line sigil content
+  )
+  def foo do
+    :ok
+  end
+
+  def bar do
+    :ok
+  end
+end
+`)
+
+	defs, _, err := ParseFile(path)
+	if err != nil {
+		t.Fatal(err)
+	}
+
+	funcs := map[string]bool{}
+	for _, d := range defs {
+		if d.Function != "" {
+			funcs[d.Function] = true
+		}
+	}
+	if !funcs["foo"] {
+		t.Error("missing def foo after multi-line sigil")
+	}
+	if !funcs["bar"] {
+		t.Error("missing def bar after multi-line sigil")
+	}
+}
+
+// --- Additional regression tests for edge cases ---
+
+func TestParseFile_MultiLineUseWithOpts(t *testing.T) {
+	// use with opts spanning multiple lines must produce correct refs
+	path := writeTempFile(t, `defmodule MyApp.Worker do
+  use GenServer,
+    restart: :transient
+
+  def init(state), do: {:ok, state}
+end
+`)
+
+	_, refs, err := ParseFile(path)
+	if err != nil {
+		t.Fatal(err)
+	}
+
+	found := false
+	for _, r := range refs {
+		if r.Module == "GenServer" && r.Kind == "use" {
+			found = true
+		}
+	}
+	if !found {
+		t.Errorf("expected use ref for GenServer; refs: %+v", refs)
+	}
+}
+
+func TestParseFile_MultilineDefWithDefaults(t *testing.T) {
+	// Function head with params spanning lines AND \\ defaults
+	path := writeTempFile(t, `defmodule MyApp.Accounts do
+  
def fetch( + slug, + opts \\ [] + ) do + :ok + end +end +`) + + defs, _, err := ParseFile(path) + if err != nil { + t.Fatal(err) + } + + found := map[string]bool{} + for _, d := range defs { + if d.Function == "fetch" { + found[fmt.Sprintf("fetch/%d", d.Arity)] = true + } + } + if !found["fetch/1"] || !found["fetch/2"] { + t.Errorf("expected fetch/1 and fetch/2 from multiline def with defaults; got %v", found) + } +} + +func TestParseFile_StringContainingDirectiveComma(t *testing.T) { + // A string literal that looks like "alias Foo," must NOT trigger joining + path := writeTempFile(t, `defmodule MyApp.Foo do + def bar do + x = "alias Fake.Module," + Real.Module.call() + end +end +`) + + _, refs, err := ParseFile(path) + if err != nil { + t.Fatal(err) + } + + foundReal := false + for _, r := range refs { + if r.Module == "Real.Module" && r.Function == "call" { + foundReal = true + } + if r.Module == "Fake.Module" { + t.Errorf("should not extract refs from string content, got %+v", r) + } + } + if !foundReal { + t.Errorf("Real.Module.call should be found; refs: %+v", refs) + } +} + +func TestParseFile_MultiLineAliasAs_PreservesLineNumber(t *testing.T) { + // Verify that joining preserves the original line number for definitions + path := writeTempFile(t, `defmodule MyApp.Foo do + alias MyModule.MySubModule, + as: Something + + def bar do + :ok + end +end +`) + + defs, _, err := ParseFile(path) + if err != nil { + t.Fatal(err) + } + + for _, d := range defs { + if d.Function == "bar" { + if d.Line != 5 { + t.Errorf("def bar should be on original line 5, got %d", d.Line) + } + } + } +} + +func TestParseFile_SigilContainingDirective(t *testing.T) { + // Sigil content containing alias/use keywords should not produce refs + path := writeTempFile(t, `defmodule MyApp.Foo do + def bar do + x = ~s(alias Fake.Module) + y = ~s(use Fake.Module, key: val) + Real.Module.call() + end +end +`) + + _, refs, err := ParseFile(path) + if err != nil { + t.Fatal(err) + } + + for _, r := 
range refs { + if r.Module == "Fake.Module" { + t.Errorf("should not extract refs from sigil content, got %+v", r) + } + } + + foundReal := false + for _, r := range refs { + if r.Module == "Real.Module" && r.Function == "call" { + foundReal = true + } + } + if !foundReal { + t.Errorf("Real.Module.call should be found; refs: %+v", refs) + } +} + +func TestParseFile_TrailingCommaInAliasBlock(t *testing.T) { + // Trailing comma after last child in alias block (common formatter output) + path := writeTempFile(t, `defmodule MyApp.Web do + alias MyApp.{ + Accounts, + Users, + } + + def foo do + Accounts.list() + end +end +`) + + _, refs, err := ParseFile(path) + if err != nil { + t.Fatal(err) + } + + aliasRefs := filterRefs(refs, "alias") + found := map[string]bool{} + for _, r := range aliasRefs { + found[r.Module] = true + } + if !found["MyApp.Accounts"] { + t.Error("expected alias ref for MyApp.Accounts from block with trailing comma") + } + if !found["MyApp.Users"] { + t.Error("expected alias ref for MyApp.Users from block with trailing comma") + } + + callRefs := filterRefs(refs, "call") + foundCall := false + for _, r := range callRefs { + if r.Module == "MyApp.Accounts" && r.Function == "list" { + foundCall = true + } + } + if !foundCall { + t.Error("expected Accounts.list() to resolve to MyApp.Accounts.list via alias") + } +} + +func TestParseFile_BareMacroCallMultiTokenBeforeDo(t *testing.T) { + // Bare macro calls with complex arguments before do must be detected. + // These are real patterns from ExUnit (use ExUnit.Case injects setup/test). 
+ path := writeTempFile(t, `defmodule MyApp.Test do + use ExUnit.Case + + setup %{conn: conn} do + {:ok, conn: conn} + end + + test "creates user", %{conn: conn} do + assert conn + end +end +`) + + _, refs, err := ParseFile(path) + if err != nil { + t.Fatal(err) + } + + callRefs := filterRefs(refs, "call") + foundSetup := false + foundTest := false + for _, r := range callRefs { + if r.Function == "setup" { + foundSetup = true + } + if r.Function == "test" { + foundTest = true + } + } + if !foundSetup { + t.Errorf("expected bare macro call ref for setup; got refs: %+v", callRefs) + } + if !foundTest { + t.Errorf("expected bare macro call ref for test; got refs: %+v", callRefs) + } +} + +func TestParseFile_BareMacroCallDoOnNextLine(t *testing.T) { + // do can appear on a separate line from the macro call in valid Elixir. + path := writeTempFile(t, `defmodule MyApp.Test do + use ExUnit.Case + + setup :ok + do + :ok + end + + setup %{ + conn: conn + } do + {:ok, conn: conn} + end +end +`) + + _, refs, err := ParseFile(path) + if err != nil { + t.Fatal(err) + } + + callRefs := filterRefs(refs, "call") + setupCount := 0 + for _, r := range callRefs { + if r.Function == "setup" { + setupCount++ + } + } + if setupCount < 2 { + t.Errorf("expected 2 bare macro call refs for setup, got %d; refs: %+v", setupCount, callRefs) + } +} + +func TestTokenize_HeredocInterpolationWithNestedString(t *testing.T) { + // #{"}"} inside a heredoc must not close the interpolation prematurely. 
+ source := []byte("x = \"\"\"\n#{\"}\"}\n\"\"\"") + tokens := Tokenize(source) + + var kinds []TokenKind + for _, tok := range tokens { + if tok.Kind != TokEOL { + kinds = append(kinds, tok.Kind) + } + } + if len(kinds) < 3 { + t.Fatalf("expected at least 3 non-EOL tokens, got %d: %v", len(kinds), kinds) + } + if kinds[2] != TokHeredoc { + t.Errorf("expected TokHeredoc at position 2, got %v (tokens: %v)", kinds[2], kinds) + } + heredocTok := tokens[0] + for _, tok := range tokens { + if tok.Kind == TokHeredoc { + heredocTok = tok + break + } + } + content := string(source[heredocTok.Start:heredocTok.End]) + if !strings.Contains(content, "#{") { + t.Errorf("heredoc token should contain interpolation, got: %q", content) + } +} + +func TestBareMacroCall_NoFalsePositiveAcrossStatements(t *testing.T) { + source := `defmodule Test do + use SomeMacroLib + + x = 1 + if x > 0 do + :positive + end +end +` + _, refs, _ := ParseText("test.ex", source) + for _, r := range refs { + if r.Kind == "call" && r.Function == "x" { + t.Errorf("false positive: x detected as bare macro call: %+v", r) + } + } +} + +func TestTokenAtOffset(t *testing.T) { + source := []byte("defmodule Foo.Bar do\n def baz(x), do: x\nend\n") + tokens := Tokenize(source) + + tests := []struct { + name string + offset int + want TokenKind + }{ + {"defmodule keyword", 0, TokDefmodule}, + {"middle of defmodule", 5, TokDefmodule}, + {"Foo module", 10, TokModule}, + {"dot", 13, TokDot}, + {"Bar module", 14, TokModule}, + {"do keyword", 18, TokDo}, + {"def keyword", 23, TokDef}, + {"baz ident", 27, TokIdent}, + {"open paren", 30, TokOpenParen}, + {"x param", 31, TokIdent}, + } + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + idx := TokenAtOffset(tokens, tt.offset) + if idx < 0 { + t.Fatalf("TokenAtOffset(%d) returned -1", tt.offset) + } + if tokens[idx].Kind != tt.want { + t.Errorf("TokenAtOffset(%d) = %v, want %v", tt.offset, tokens[idx].Kind, tt.want) + } + }) + } + + // Offset in whitespace 
(between tokens) should return -1 + if idx := TokenAtOffset(tokens, 9); idx >= 0 { + t.Errorf("expected -1 for whitespace offset 9, got token kind %v", tokens[idx].Kind) + } +} + +func TestLineColToOffset(t *testing.T) { + source := []byte("defmodule Foo do\n def bar, do: :ok\nend\n") + result := TokenizeFull(source) + + tests := []struct { + line, col int + wantOff int + }{ + {0, 0, 0}, // start of file + {0, 10, 10}, // "F" in "Foo" + {1, 2, 19}, // "d" in "def" + {2, 0, 36}, // "e" in "end" + } + for _, tt := range tests { + got := LineColToOffset(result.LineStarts, tt.line, tt.col) + if got != tt.wantOff { + t.Errorf("LineColToOffset(line=%d, col=%d) = %d, want %d", tt.line, tt.col, got, tt.wantOff) + } + } + + // Out-of-range line + if got := LineColToOffset(result.LineStarts, 99, 0); got != -1 { + t.Errorf("expected -1 for out-of-range line, got %d", got) + } +} + +func TestBareMacroCall_CommentBetweenArgsAndDo(t *testing.T) { + source := `defmodule Test do + use SomeMacroLib + + setup %{ + # this sets up the connection + conn: conn + } + # yeah, I know it's odd + do + :ok + end +end +` + _, refs, _ := ParseText("test.ex", source) + found := false + for _, r := range refs { + if r.Kind == "call" && r.Function == "setup" { + found = true + } + } + if !found { + t.Errorf("expected bare macro call ref for setup with comment before do") + } +} diff --git a/internal/parser/parser_tokenized.go b/internal/parser/parser_tokenized.go new file mode 100644 index 0000000..f18b61e --- /dev/null +++ b/internal/parser/parser_tokenized.go @@ -0,0 +1,932 @@ +package parser + +import "strings" + +// parseTextFromTokens is the token-stream replacement for the line-based ParseText. +// It walks a []Token stream from the tokenizer and produces identical Definition +// and Reference output. 
+func parseTextFromTokens(path string, source []byte, tokens []Token) ([]Definition, []Reference, error) { + var defs []Definition + var refs []Reference + + type moduleFrame struct { + name string + depth int + savedAliases map[string]string + savedInjectors map[string]bool + } + + var moduleStack []moduleFrame + depth := 0 + aliases := map[string]string{} + injectors := map[string]bool{} + + n := len(tokens) + + tokenText := func(t Token) string { + return TokenText(source, t) + } + + nextSig := func(from int) int { + return NextSigToken(tokens, n, from) + } + + // isUserModuleToken returns true if the TokModule token represents a user-defined + // module name (starts with ASCII uppercase). Returns false for __MODULE__. + isUserModuleToken := func(t Token) bool { + return source[t.Start] >= 'A' && source[t.Start] <= 'Z' + } + + collectModuleName := func(i int) (string, int) { + return CollectModuleName(source, tokens, n, i) + } + + collectParamsFromTokens := func(i int) (int, int, []string, int) { + return CollectParams(source, tokens, n, i) + } + + fixParamNames := func(names []string) []string { + return FixParamNames(names) + } + + currentModule := func() string { + if len(moduleStack) > 0 { + return moduleStack[len(moduleStack)-1].name + } + return "" + } + + // processModuleDef handles defmodule/defprotocol/defimpl. + // It collects the module name, scans forward to consume the TokDo token, + // increments depth, and pushes the module frame with the post-increment depth. + // For inline `, do:` modules the definition is still emitted but no frame is + // pushed (no do..end scope to track). + processModuleDef := func(i int, kind string) int { + kwLine := tokens[i-1].Line + j := nextSig(i) + name, j := collectModuleName(j) + if name == "" { + return i + } + if !strings.Contains(name, ".") && currentModule() != "" { + name = currentModule() + "." 
+ name + } + + // Always emit the module definition — even `, do:` one-liners must be + // tracked so callers can find the module. + defs = append(defs, Definition{ + Module: name, + Line: kwLine, + FilePath: path, + Kind: kind, + }) + + // Scan forward to find and consume TokDo (skipping "for: Module" etc.). + // Do not stop at TokEOL — Elixir allows `defmodule Name` then `do` on the + // next line; stopping at EOL left TokDo to the main loop (double-counting + // depth) so inner `end` did not pop the inner module frame. + // Stop at statement-boundary tokens to avoid stealing a later module's TokDo + // when the current module uses the `, do:` keyword form. + _, scanPos, hasDo := ScanForwardToBlockDo(tokens, n, j) + if hasDo { + depth++ + moduleStack = append(moduleStack, moduleFrame{ + name: name, + depth: depth, + savedAliases: copyMap(aliases), + savedInjectors: copyBoolMap(injectors), + }) + } + return scanPos + } + + emitModuleRef := func(modName string, line int, kind string) { + resolved := resolveModule(modName, currentModule()) + if !strings.Contains(resolved, "__MODULE__") { + refs = append(refs, Reference{Module: resolved, Line: line, FilePath: path, Kind: kind}) + } + } + + scanDelegateOpts := func(i int) (string, string) { + var delegateTo, delegateAs string + bracketDepth := 0 + for i < n { + tok := tokens[i] + if tok.Kind == TokEOF { + break + } + if bracketDepth == 0 { + switch tok.Kind { + case TokEnd, TokDef, TokDefp, TokDefmacro, TokDefmacrop, + TokDefguard, TokDefguardp, TokDefdelegate, TokDefmodule, + TokDefprotocol, TokDefimpl, TokAlias, TokImport: + return delegateTo, delegateAs + } + } + switch tok.Kind { + case TokOpenParen, TokOpenBracket, TokOpenBrace: + bracketDepth++ + case TokCloseParen, TokCloseBracket, TokCloseBrace: + bracketDepth-- + } + if tok.Kind == TokIdent { + text := tokenText(tok) + if text == "to" && i+1 < n && tokens[i+1].Kind == TokColon { + j := nextSig(i + 2) + modName, _ := collectModuleName(j) + if modName != "" { + 
target := modName + if currentModule() != "" { + target = strings.ReplaceAll(target, "__MODULE__", currentModule()) + } + if resolved, ok := aliases[target]; ok { + delegateTo = resolved + } else if parts := strings.SplitN(target, ".", 2); len(parts) == 2 { + if resolved, ok := aliases[parts[0]]; ok { + delegateTo = resolved + "." + parts[1] + } else { + delegateTo = target + } + } else { + delegateTo = target + } + } + } + if text == "as" && i+1 < n && tokens[i+1].Kind == TokColon { + j := nextSig(i + 2) + if j < n { + switch tokens[j].Kind { + case TokAtom: + atomText := tokenText(tokens[j]) + if len(atomText) > 1 && atomText[0] == ':' { + delegateAs = atomText[1:] + } + case TokIdent: + delegateAs = tokenText(tokens[j]) + } + } + } + } + i++ + } + return delegateTo, delegateAs + } + + // extractModuleRefs emits call/struct refs for module references in a token range. + // Only processes TokModule tokens that start with ASCII uppercase (matching old regex behavior). + extractModuleRefs := func(lineStart, lineEnd int) { + cm := currentModule() + for j := lineStart; j < lineEnd; j++ { + tok := tokens[j] + + // %Module{ struct literal + if tok.Kind == TokPercent && j+1 < lineEnd && tokens[j+1].Kind == TokModule && isUserModuleToken(tokens[j+1]) { + modName, k := collectModuleName(j + 1) + if k < lineEnd && tokens[k].Kind == TokOpenBrace { + resolved := ResolveModuleRef(modName, aliases, cm) + if resolved != "" { + refs = append(refs, Reference{Module: resolved, Line: tok.Line, FilePath: path, Kind: "call"}) + } + j = k + continue + } + } + + if tok.Kind != TokModule || !isUserModuleToken(tok) { + continue + } + + modName, k := collectModuleName(j) + + // Skip if preceded by % (struct literal already handled above) + if j > 0 && tokens[j-1].Kind == TokPercent { + j = k - 1 + continue + } + + // Module.function call + if k < lineEnd && tokens[k].Kind == TokDot && k+1 < lineEnd && tokens[k+1].Kind == TokIdent { + funcName := tokenText(tokens[k+1]) + if 
!elixirKeyword[funcName] { + resolved := ResolveModuleRef(modName, aliases, cm) + if resolved != "" { + refs = append(refs, Reference{Module: resolved, Function: funcName, Line: tok.Line, FilePath: path, Kind: "call"}) + } + } + j = k + 1 + continue + } + + // Standalone module ref (skip self-references) + if modName != cm { + resolved := ResolveModuleRef(modName, aliases, cm) + if resolved != "" { + refs = append(refs, Reference{Module: resolved, Line: tok.Line, FilePath: path, Kind: "call"}) + } + } + j = k - 1 + } + } + + // trackLineDepth scans tokens[lineStart:lineEnd] for TokDo/TokFn/TokEnd + // and updates depth accordingly. TokEnd also checks for module stack pops. + trackLineDepth := func(lineStart, lineEnd int) { + for j := lineStart; j < lineEnd; j++ { + switch tokens[j].Kind { + case TokDo, TokFn: + TrackBlockDepth(tokens[j].Kind, &depth) + case TokEnd: + prevDepth := depth + TrackBlockDepth(tokens[j].Kind, &depth) + if len(moduleStack) > 0 && moduleStack[len(moduleStack)-1].depth == prevDepth { + frame := moduleStack[len(moduleStack)-1] + moduleStack = moduleStack[:len(moduleStack)-1] + aliases = frame.savedAliases + injectors = frame.savedInjectors + } + } + } + } + + // Main token walker + i := 0 + for i < n { + tok := tokens[i] + + switch tok.Kind { + case TokEOL, TokComment, TokString, TokHeredoc, TokSigil, + TokCharLiteral, TokAtom, TokNumber, TokOther, + TokDot, TokColon, TokOpenParen, TokCloseParen, + TokOpenBracket, TokCloseBracket, TokOpenBrace, TokCloseBrace, + TokOpenAngle, TokCloseAngle, TokBackslash, TokRightArrow, + TokLeftArrow, TokAssoc, TokDoubleColon, TokComma, TokWhen: + i++ + continue + + case TokEOF: + i = n + continue + + case TokEnd: + prevDepth := depth + TrackBlockDepth(tok.Kind, &depth) + if len(moduleStack) > 0 && moduleStack[len(moduleStack)-1].depth == prevDepth { + frame := moduleStack[len(moduleStack)-1] + moduleStack = moduleStack[:len(moduleStack)-1] + aliases = frame.savedAliases + injectors = frame.savedInjectors + } 
+ i++ + continue + + case TokDo, TokFn: + TrackBlockDepth(tok.Kind, &depth) + i++ + continue + + case TokDefmodule: + i++ + i = processModuleDef(i, "module") + continue + + case TokDefprotocol: + i++ + i = processModuleDef(i, "defprotocol") + continue + + case TokDefimpl: + i++ + i = processModuleDef(i, "defimpl") + continue + + case TokDef, TokDefp, TokDefmacro, TokDefmacrop, TokDefguard, TokDefguardp, TokDefdelegate: + cm := currentModule() + if cm == "" { + i++ + continue + } + kind := tokenText(tok) + defLine := tok.Line + i++ + j := nextSig(i) + if j >= n || tokens[j].Kind != TokIdent { + i = j + goto extractRefsForLine + } + { + funcName := tokenText(tokens[j]) + j++ + + pj := nextSig(j) + maxArity := 0 + defaultCount := 0 + var paramNames []string + if pj < n && tokens[pj].Kind == TokOpenParen { + maxArity, defaultCount, paramNames, pj = collectParamsFromTokens(pj) + paramNames = fixParamNames(paramNames) + } + + var delegateTo, delegateAs string + if kind == "defdelegate" { + delegateTo, delegateAs = scanDelegateOpts(pj) + } + + minArity := maxArity - defaultCount + for arity := minArity; arity <= maxArity; arity++ { + params := JoinParams(paramNames, arity) + defs = append(defs, Definition{ + Module: cm, + Function: funcName, + Arity: arity, + Line: defLine, + FilePath: path, + Kind: kind, + DelegateTo: delegateTo, + DelegateAs: delegateAs, + Params: params, + }) + } + i = j + } + goto extractRefsForLine + + case TokDefstruct: + cm := currentModule() + if cm != "" { + defs = append(defs, Definition{ + Module: cm, + Function: "__struct__", + Line: tok.Line, + FilePath: path, + Kind: "defstruct", + }) + } + i++ + goto extractRefsForLine + + case TokDefexception: + cm := currentModule() + if cm != "" { + defs = append(defs, Definition{ + Module: cm, + Function: "__exception__", + Line: tok.Line, + FilePath: path, + Kind: "defexception", + }) + } + i++ + goto extractRefsForLine + + case TokAlias: + aliasLine := tok.Line + i++ + j := nextSig(i) + modName, k := 
collectModuleName(j) + if modName == "" { + i = k + continue + } + cm := currentModule() + + // Multi-alias: alias MyApp.{Users, Accounts} + if children, nextPos, ok := ScanMultiAliasChildren(source, tokens, n, k, false); ok { + parentResolved := resolveModule(modName, cm) + for _, childName := range children { + fullChild := parentResolved + "." + childName + aliases[AliasShortName(childName)] = fullChild + emitModuleRef(fullChild, aliasLine, "alias") + } + i = nextPos + continue + } + + // Alias with as: + if asName, nextPos, ok := ScanKeywordOptionValue(source, tokens, n, k, "as"); ok { + resolved := resolveModule(modName, cm) + if !strings.Contains(resolved, "__MODULE__") { + aliases[asName] = resolved + refs = append(refs, Reference{Module: resolved, Line: aliasLine, FilePath: path, Kind: "alias"}) + } + i = nextPos + continue + } + + // Simple alias + { + resolved := resolveModule(modName, cm) + aliases[AliasShortName(resolved)] = resolved + emitModuleRef(resolved, aliasLine, "alias") + } + i = k + continue + + case TokImport: + importLine := tok.Line + i++ + j := nextSig(i) + modName, k := collectModuleName(j) + if modName != "" { + resolved := resolveModule(modName, currentModule()) + if !strings.Contains(resolved, "__MODULE__") { + refs = append(refs, Reference{Module: resolved, Line: importLine, FilePath: path, Kind: "import"}) + injectors[resolved] = true + } + } + i = k + continue + + case TokUse: + useLine := tok.Line + i++ + j := nextSig(i) + modName, k := collectModuleName(j) + if modName != "" { + resolved := resolveModule(modName, currentModule()) + if !strings.Contains(resolved, "__MODULE__") { + refs = append(refs, Reference{Module: resolved, Line: useLine, FilePath: path, Kind: "use"}) + injectors[resolved] = true + } + } + i = k + continue + + case TokRequire: + requireLine := tok.Line + i++ + j := nextSig(i) + modName, k := collectModuleName(j) + if modName == "" { + i = k + goto extractRefsForLine + } + cm := currentModule() + + // Check for 
require Module, as: Name + if asName, nextPos, ok := ScanKeywordOptionValue(source, tokens, n, k, "as"); ok { + resolved := resolveModule(modName, cm) + if !strings.Contains(resolved, "__MODULE__") { + aliases[asName] = resolved + refs = append(refs, Reference{Module: resolved, Line: requireLine, FilePath: path, Kind: "require"}) + } + i = nextPos + continue + } + + // Simple require (no as:) — still emit reference but no alias + resolved := resolveModule(modName, cm) + if !strings.Contains(resolved, "__MODULE__") { + refs = append(refs, Reference{Module: resolved, Line: requireLine, FilePath: path, Kind: "require"}) + } + i = k + continue + + case TokAttrType: + cm := currentModule() + if cm != "" { + attrLine := tok.Line + attrText := tokenText(tok) + kind := "type" + switch attrText { + case "@opaque": + kind = "opaque" + case "@typep": + i++ + goto extractRefsForLine + } + i++ + j := nextSig(i) + if j < n && tokens[j].Kind == TokIdent { + name := tokenText(tokens[j]) + arity := 0 + pj := nextSig(j + 1) + if pj < n && tokens[pj].Kind == TokOpenParen { + arity, _, _, _ = collectParamsFromTokens(pj) + } + defs = append(defs, Definition{ + Module: cm, + Function: name, + Arity: arity, + Line: attrLine, + FilePath: path, + Kind: kind, + }) + } + i = j + } else { + i++ + } + goto extractRefsForLine + + case TokAttrBehaviour: + cm := currentModule() + if cm != "" { + attrLine := tok.Line + i++ + j := nextSig(i) + modName, k := collectModuleName(j) + if modName != "" { + resolved := resolveModule(modName, cm) + if !strings.Contains(resolved, "__MODULE__") { + refs = append(refs, Reference{Module: resolved, Line: attrLine, FilePath: path, Kind: "behaviour"}) + } + } + i = k + } else { + i++ + } + goto extractRefsForLine + + case TokAttrCallback: + cm := currentModule() + if cm != "" { + attrLine := tok.Line + attrText := tokenText(tok) + kind := "callback" + if attrText == "@macrocallback" { + kind = "macrocallback" + } + i++ + j := nextSig(i) + if j < n && 
tokens[j].Kind == TokIdent { + name := tokenText(tokens[j]) + arity := 0 + pj := nextSig(j + 1) + if pj < n && tokens[pj].Kind == TokOpenParen { + arity, _, _, _ = collectParamsFromTokens(pj) + } + defs = append(defs, Definition{ + Module: cm, + Function: name, + Arity: arity, + Line: attrLine, + FilePath: path, + Kind: kind, + }) + } + i = j + } else { + i++ + } + goto extractRefsForLine + + case TokAttrDoc, TokAttrSpec, TokAttr: + i++ + goto extractRefsForLine + + case TokPercent: + // %Module{ struct literal + if i+1 < n && tokens[i+1].Kind == TokModule && isUserModuleToken(tokens[i+1]) { + modName, k := collectModuleName(i + 1) + if k < n && tokens[k].Kind == TokOpenBrace { + cm := currentModule() + resolved := ResolveModuleRef(modName, aliases, cm) + if resolved != "" { + refs = append(refs, Reference{Module: resolved, Line: tok.Line, FilePath: path, Kind: "call"}) + } + i = k + 1 + continue + } + } + i++ + continue + + case TokModule: + // Skip __MODULE__ and other non-ASCII-uppercase module tokens + if !isUserModuleToken(tok) { + i++ + continue + } + + cm := currentModule() + modName, k := collectModuleName(i) + + // Skip if preceded by % (struct literal handled by TokPercent case) + if i > 0 && tokens[i-1].Kind == TokPercent { + i = k + continue + } + + // Module.function call + if k < n && tokens[k].Kind == TokDot && k+1 < n && tokens[k+1].Kind == TokIdent { + funcName := tokenText(tokens[k+1]) + if !elixirKeyword[funcName] { + resolved := ResolveModuleRef(modName, aliases, cm) + if resolved != "" { + refs = append(refs, Reference{Module: resolved, Function: funcName, Line: tok.Line, FilePath: path, Kind: "call"}) + } + } + i = k + 2 + continue + } + + // Standalone module ref (skip self-references) + if modName != cm { + resolved := ResolveModuleRef(modName, aliases, cm) + if resolved != "" { + refs = append(refs, Reference{Module: resolved, Line: tok.Line, FilePath: path, Kind: "call"}) + } + } + i = k + continue + + case TokPipe: + cm := currentModule() 
+ if cm != "" && len(injectors) > 0 { + j := nextSig(i + 1) + if j < n && tokens[j].Kind == TokIdent { + name := tokenText(tokens[j]) + if !elixirKeyword[name] { + for mod := range injectors { + refs = append(refs, Reference{Module: mod, Function: name, Line: tokens[j].Line, FilePath: path, Kind: "call"}) + } + } + } + } + i++ + continue + + case TokIdent: + cm := currentModule() + if cm != "" && len(injectors) > 0 { + isStatementStart := i == 0 || tokens[i-1].Kind == TokEOL || tokens[i-1].Kind == TokComment + if isStatementStart { + name := tokenText(tok) + if !elixirKeyword[name] { + emit := false + j := i + 1 + if j < n { + switch tokens[j].Kind { + case TokDo: + // macro_name do + emit = true + case TokOpenParen: + // macro_name(...) + emit = true + case TokAtom: + // macro_name :atom + emit = true + default: + // Scan forward to see if TokDo follows the arguments. + // In Elixir, `do` can follow across EOLs and blank lines + // but not past an intervening statement. We track whether + // we've seen EOL at bracket depth 0: once we have, the + // only token that can continue the expression is `do`. + scanDepth := 0 + seenEOLAtZero := false + for k := j; k < n; k++ { + switch tokens[k].Kind { + case TokDo: + if scanDepth == 0 { + emit = true + } + case TokEOL, TokComment: + if scanDepth == 0 { + seenEOLAtZero = true + } + case TokOpenParen, TokOpenBracket, TokOpenBrace: + scanDepth++ + seenEOLAtZero = false + case TokCloseParen, TokCloseBracket, TokCloseBrace: + scanDepth-- + case TokEOF: + k = n + default: + // At depth 0, after seeing EOL, any non-do + // token means a new statement started. 
+ if scanDepth == 0 && seenEOLAtZero { + k = n // stop + } + } + if emit { + break + } + } + } + } + if emit { + for mod := range injectors { + refs = append(refs, Reference{Module: mod, Function: name, Line: tok.Line, FilePath: path, Kind: "call"}) + } + } + } + } + } + i++ + continue + } + + i++ + continue + + extractRefsForLine: + { + triggerLine := tok.Line + lineStart := i + for lineStart > 0 && tokens[lineStart-1].Line == triggerLine && tokens[lineStart-1].Kind != TokEOL { + lineStart-- + } + lineEnd := i + for lineEnd < n && tokens[lineEnd].Kind != TokEOL && tokens[lineEnd].Kind != TokEOF { + lineEnd++ + } + + // Track depth changes (TokDo/TokFn/TokEnd) on this line so that + // def/defp/case/fn blocks that open here are properly counted. + trackLineDepth(lineStart, lineEnd) + + extractModuleRefs(lineStart, lineEnd) + + // Check for pipe calls on this line + if currentModule() != "" && len(injectors) > 0 { + for j := lineStart; j < lineEnd; j++ { + if tokens[j].Kind == TokPipe { + pj := nextSig(j + 1) + if pj < lineEnd && tokens[pj].Kind == TokIdent { + name := tokenText(tokens[pj]) + if !elixirKeyword[name] { + for mod := range injectors { + refs = append(refs, Reference{Module: mod, Function: name, Line: tokens[pj].Line, FilePath: path, Kind: "call"}) + } + } + } + } + } + } + + // Advance past this line + for i < n && tokens[i].Kind != TokEOL && tokens[i].Kind != TokEOF { + i++ + } + } + } + + return defs, refs, nil +} + +func boolToInt(b bool) int { + if b { + return 1 + } + return 0 +} + +func itoa(n int) string { + if n < 10 { + return string(rune('0' + n)) + } + return itoa(n/10) + string(rune('0'+n%10)) +} + +// Exported token-walking helpers shared with the LSP package. + +// LineColToOffset converts a 0-based (line, col) pair to a byte offset using +// the LineStarts table from TokenizeFull. Returns -1 if out of range. 
+func LineColToOffset(lineStarts []int, line, col int) int {
+	if line < 0 || line >= len(lineStarts) || col < 0 {
+		return -1
+	}
+	return lineStarts[line] + col
+}
+
+// TokenAtOffset returns the index of the token containing byteOffset, or -1
+// if the offset falls in a gap between tokens (whitespace) or is out of range.
+// Uses binary search for O(log n) lookup.
+func TokenAtOffset(tokens []Token, byteOffset int) int {
+	lo, hi := 0, len(tokens)-1
+	for lo <= hi {
+		mid := lo + (hi-lo)/2
+		t := tokens[mid]
+		if byteOffset < t.Start {
+			hi = mid - 1
+		} else if byteOffset >= t.End {
+			lo = mid + 1
+		} else {
+			return mid
+		}
+	}
+	return -1
+}
+
+func TokenText(source []byte, t Token) string {
+	return string(source[t.Start:t.End])
+}
+
+func NextSigToken(tokens []Token, n, from int) int {
+	for from < n && (tokens[from].Kind == TokEOL || tokens[from].Kind == TokComment) {
+		from++
+	}
+	return from
+}
+
+func CollectModuleName(source []byte, tokens []Token, n, i int) (string, int) {
+	if i >= n || tokens[i].Kind != TokModule {
+		return "", i
+	}
+	var parts []string
+	parts = append(parts, string(source[tokens[i].Start:tokens[i].End]))
+	i++
+	for i+1 < n && tokens[i].Kind == TokDot && tokens[i+1].Kind == TokModule {
+		parts = append(parts, string(source[tokens[i+1].Start:tokens[i+1].End]))
+		i += 2
+	}
+	return strings.Join(parts, "."), i
+}
+
+func CollectParams(source []byte, tokens []Token, n, i int) (int, int, []string, int) {
+	if i >= n || tokens[i].Kind != TokOpenParen {
+		return 0, 0, nil, i
+	}
+	i++
+	bracketDepth := 1
+	commas := 0
+	defaults := 0
+	hasContent := false
+	var paramNames []string
+	currentParamName := ""
+	seenDefault := false
+
+	for i < n && bracketDepth > 0 {
+		tok := tokens[i]
+		switch tok.Kind {
+		case TokOpenParen, TokOpenBracket, TokOpenBrace:
+			bracketDepth++
+			hasContent = true
+			i++
+		case TokOpenAngle:
+			bracketDepth++
+			hasContent = true
+			i++
+		case TokCloseAngle:
+			bracketDepth--
+			i++
+		case TokCloseParen, TokCloseBracket, 
TokCloseBrace: + bracketDepth-- + if bracketDepth == 0 { + if hasContent { + if seenDefault { + defaults++ + } + paramNames = append(paramNames, currentParamName) + } + i++ + return commas + boolToInt(hasContent), defaults, paramNames, i + } + i++ + case TokComma: + if bracketDepth == 1 { + commas++ + if seenDefault { + defaults++ + } + paramNames = append(paramNames, currentParamName) + currentParamName = "" + seenDefault = false + } + i++ + case TokBackslash: + if bracketDepth == 1 { + seenDefault = true + } + hasContent = true + i++ + case TokIdent: + if bracketDepth == 1 && currentParamName == "" { + name := string(source[tok.Start:tok.End]) + if name != "_" { + currentParamName = name + } + } + hasContent = true + i++ + case TokOther: + if bracketDepth == 1 && tok.End-tok.Start == 1 && source[tok.Start] == '=' { + currentParamName = "" + } + hasContent = true + i++ + case TokEOL, TokComment: + i++ + default: + hasContent = true + i++ + } + } + if hasContent { + if seenDefault { + defaults++ + } + paramNames = append(paramNames, currentParamName) + return commas + 1, defaults, paramNames, i + } + return 0, 0, nil, i +} + +func FixParamNames(names []string) []string { + for idx, name := range names { + if name == "" { + names[idx] = "arg" + itoa(idx+1) + } + } + return names +} diff --git a/internal/parser/token_walk.go b/internal/parser/token_walk.go new file mode 100644 index 0000000..09321a9 --- /dev/null +++ b/internal/parser/token_walk.go @@ -0,0 +1,121 @@ +package parser + +import "strings" + +// IsStatementBoundaryToken reports whether kind starts a new statement or closes +// the current one, so forward scans should stop before consuming later syntax. 
+func IsStatementBoundaryToken(kind TokenKind) bool {
+	switch kind {
+	case TokEOF, TokEnd,
+		TokDefmodule, TokDefprotocol, TokDefimpl,
+		TokDef, TokDefp, TokDefmacro, TokDefmacrop,
+		TokDefguard, TokDefguardp, TokDefdelegate,
+		TokAttrType, TokAttrCallback:
+		return true
+	}
+	return false
+}
+
+// ScanForwardToBlockDo scans tokens[from:] for a block-opening TokDo.
+// It does not stop at EOL because Elixir allows split-line heads with `do`
+// on the next line. It stops at statement-boundary tokens so malformed or
+// inline `, do:` forms do not steal a later construct's block opener.
+func ScanForwardToBlockDo(tokens []Token, n, from int) (doIdx, nextPos int, hasDo bool) {
+	for j := from; j < n; j++ {
+		switch tokens[j].Kind {
+		case TokDo:
+			return j, j + 1, true
+		default:
+			if IsStatementBoundaryToken(tokens[j].Kind) {
+				return -1, j, false
+			}
+		}
+	}
+	return -1, n, false
+}
+
+// TrackBlockDepth updates the block depth counter for do/fn/end tokens.
+func TrackBlockDepth(kind TokenKind, depth *int) {
+	switch kind {
+	case TokDo, TokFn:
+		*depth += 1
+	case TokEnd:
+		if *depth > 0 {
+			*depth -= 1
+		}
+	}
+}
+
+// AliasShortName returns the alias key for a module path: the last
+// dot-separated segment (e.g. "Foo.Bar" yields "Bar").
+func AliasShortName(name string) string {
+	if dot := strings.LastIndexByte(name, '.'); dot >= 0 {
+		return name[dot+1:]
+	}
+	return name
+}
+
+// ScanKeywordOptionValue scans for a `, key: Value` keyword option immediately
+// after the token at from (typically the position just past a parsed module
+// expression) and returns the Value token text when present. The leading comma
+// is required. nextPos points one past the Value token.
+func ScanKeywordOptionValue(source []byte, tokens []Token, n, from int, key string) (value string, nextPos int, ok bool) { + nk := NextSigToken(tokens, n, from) + if nk >= n || tokens[nk].Kind != TokComma { + return "", from, false + } + afterComma := NextSigToken(tokens, n, nk+1) + if afterComma >= n || tokens[afterComma].Kind != TokIdent || TokenText(source, tokens[afterComma]) != key { + return "", from, false + } + afterKey := NextSigToken(tokens, n, afterComma+1) + if afterKey >= n || tokens[afterKey].Kind != TokColon { + return "", from, false + } + afterColon := NextSigToken(tokens, n, afterKey+1) + if afterColon >= n { + return "", from, false + } + if tokens[afterColon].Kind != TokModule && tokens[afterColon].Kind != TokIdent { + return "", from, false + } + return TokenText(source, tokens[afterColon]), afterColon + 1, true +} + +// ScanMultiAliasChildren collects child module names from `alias Parent.{A, B}`. +// It expects `from` to point at the token after the parent module expression. +// When stopAtStatement is true, it aborts on statement keywords inside the brace +// body so malformed input does not swallow later declarations. 
+func ScanMultiAliasChildren(source []byte, tokens []Token, n, from int, stopAtStatement bool) (children []string, nextPos int, ok bool) { + if from >= n || tokens[from].Kind != TokDot || from+1 >= n || tokens[from+1].Kind != TokOpenBrace { + return nil, from, false + } + k := from + 2 + for k < n && tokens[k].Kind != TokCloseBrace && tokens[k].Kind != TokEOF { + k = NextSigToken(tokens, n, k) + if k >= n || tokens[k].Kind == TokCloseBrace { + break + } + if stopAtStatement { + switch tokens[k].Kind { + case TokDef, TokDefp, TokDefmacro, TokDefmacrop, + TokDefmodule, TokEnd, TokImport, TokUse, TokAlias: + return children, k, true + } + } + child, nk := CollectModuleName(source, tokens, n, k) + if child != "" { + children = append(children, child) + } + if nk == k { + k++ + } else { + k = nk + } + if k < n && tokens[k].Kind == TokComma { + k++ + } + } + if k < n && tokens[k].Kind == TokCloseBrace { + k++ + } + return children, k, true +} diff --git a/internal/parser/token_walk_test.go b/internal/parser/token_walk_test.go new file mode 100644 index 0000000..0bb92c2 --- /dev/null +++ b/internal/parser/token_walk_test.go @@ -0,0 +1,386 @@ +package parser + +import ( + "reflect" + "testing" +) + +func TestIsStatementBoundaryToken_IncludesTypeAndCallbackAttrs(t *testing.T) { + if !IsStatementBoundaryToken(TokAttrType) { + t.Fatal("expected TokAttrType to be a statement boundary") + } + if !IsStatementBoundaryToken(TokAttrCallback) { + t.Fatal("expected TokAttrCallback to be a statement boundary") + } +} + +func TestTrackBlockDepth(t *testing.T) { + depth := 0 + + TrackBlockDepth(TokEnd, &depth) + if depth != 0 { + t.Fatalf("unexpected negative depth handling: got %d", depth) + } + + TrackBlockDepth(TokDo, &depth) + TrackBlockDepth(TokFn, &depth) + if depth != 2 { + t.Fatalf("expected depth 2 after do+fn, got %d", depth) + } + + TrackBlockDepth(TokEnd, &depth) + TrackBlockDepth(TokEnd, &depth) + if depth != 0 { + t.Fatalf("expected depth back to 0, got %d", depth) + } +} 
+ +func TestAliasShortName(t *testing.T) { + tests := []struct { + in string + want string + }{ + {"Foo", "Foo"}, + {"Foo.Bar", "Bar"}, + {"Foo.Bar.Baz", "Baz"}, + } + + for _, tt := range tests { + if got := AliasShortName(tt.in); got != tt.want { + t.Fatalf("AliasShortName(%q) = %q, want %q", tt.in, got, tt.want) + } + } +} + +func TestScanForwardToBlockDo(t *testing.T) { + // Split-line do should still be found. + source := []byte("def foo(\n x\n)\ndo\n x\nend\n") + tokens := Tokenize(source) + n := len(tokens) + + defIdx := -1 + for i, tok := range tokens { + if tok.Kind == TokDef { + defIdx = i + break + } + } + if defIdx < 0 { + t.Fatal("missing TokDef in test source") + } + + doIdx, _, hasDo := ScanForwardToBlockDo(tokens, n, defIdx+1) + if !hasDo || doIdx < 0 || tokens[doIdx].Kind != TokDo { + t.Fatalf("expected split-line TokDo to be found, got hasDo=%v doIdx=%d", hasDo, doIdx) + } +} + +func TestScanForwardToBlockDo_StopsAtStatementBoundary(t *testing.T) { + source := []byte("def foo, do: :ok\ndef bar do\n :ok\nend\n") + tokens := Tokenize(source) + n := len(tokens) + + firstDef := -1 + for i, tok := range tokens { + if tok.Kind == TokDef { + firstDef = i + break + } + } + if firstDef < 0 { + t.Fatal("missing first TokDef in test source") + } + + _, nextPos, hasDo := ScanForwardToBlockDo(tokens, n, firstDef+1) + if hasDo { + t.Fatal("unexpected TokDo detected for inline do: form") + } + if nextPos >= n || tokens[nextPos].Kind != TokDef { + t.Fatalf("expected scan to stop at next TokDef boundary, got nextPos=%d kind=%v", nextPos, tokens[nextPos].Kind) + } +} + +func TestScanKeywordOptionValue(t *testing.T) { + source := []byte("alias Foo.Bar, as: Baz") + tokens := Tokenize(source) + n := len(tokens) + + aliasIdx := -1 + for i, tok := range tokens { + if tok.Kind == TokAlias { + aliasIdx = i + break + } + } + if aliasIdx < 0 { + t.Fatal("missing TokAlias in test source") + } + + j := NextSigToken(tokens, n, aliasIdx+1) + _, k := CollectModuleName(source, 
tokens, n, j) + + value, _, ok := ScanKeywordOptionValue(source, tokens, n, k, "as") + if !ok { + t.Fatal("expected as: option to be detected") + } + if value != "Baz" { + t.Fatalf("expected alias value Baz, got %q", value) + } + + if _, _, ok := ScanKeywordOptionValue(source, tokens, n, k, "nope"); ok { + t.Fatal("unexpected option match for invalid key") + } +} + +func TestScanMultiAliasChildren(t *testing.T) { + source := []byte("alias Parent.{A, B, C}") + tokens := Tokenize(source) + n := len(tokens) + + aliasIdx := -1 + for i, tok := range tokens { + if tok.Kind == TokAlias { + aliasIdx = i + break + } + } + if aliasIdx < 0 { + t.Fatal("missing TokAlias in test source") + } + + j := NextSigToken(tokens, n, aliasIdx+1) + _, k := CollectModuleName(source, tokens, n, j) + + children, _, ok := ScanMultiAliasChildren(source, tokens, n, k, false) + if !ok { + t.Fatal("expected multi-alias children to be detected") + } + want := []string{"A", "B", "C"} + if !reflect.DeepEqual(children, want) { + t.Fatalf("children mismatch: got %v, want %v", children, want) + } +} + +func TestScanMultiAliasChildren_StopAtStatement(t *testing.T) { + source := []byte("alias Parent.{A, def foo, do: :ok}") + tokens := Tokenize(source) + n := len(tokens) + + aliasIdx := -1 + for i, tok := range tokens { + if tok.Kind == TokAlias { + aliasIdx = i + break + } + } + if aliasIdx < 0 { + t.Fatal("missing TokAlias in test source") + } + + j := NextSigToken(tokens, n, aliasIdx+1) + _, k := CollectModuleName(source, tokens, n, j) + + children, nextPos, ok := ScanMultiAliasChildren(source, tokens, n, k, true) + if !ok { + t.Fatal("expected scan to return even for malformed multi-alias") + } + if len(children) != 1 || children[0] != "A" { + t.Fatalf("expected only first child before statement boundary, got %v", children) + } + if nextPos >= n || tokens[nextPos].Kind != TokDef { + t.Fatalf("expected nextPos at TokDef boundary, got nextPos=%d kind=%v", nextPos, tokens[nextPos].Kind) + } +} + +// 
=============================================================================
+// TokenWalker tests
+// =============================================================================
+
+func TestTokenWalker_BasicIteration(t *testing.T) {
+	source := []byte("def foo do\n :ok\nend")
+	tokens := Tokenize(source)
+	w := NewTokenWalker(source, tokens)
+
+	var kinds []TokenKind
+	for w.More() {
+		kinds = append(kinds, w.CurrentKind())
+		w.Advance()
+	}
+
+	if len(kinds) == 0 {
+		t.Fatal("expected some tokens")
+	}
+	if kinds[0] != TokDef {
+		t.Errorf("first token: got %v, want TokDef", kinds[0])
+	}
+}
+
+func TestTokenWalker_DepthTracking(t *testing.T) {
+	source := []byte("foo(bar(x), [y])")
+	tokens := Tokenize(source)
+	w := NewTokenWalker(source, tokens)
+
+	// Find positions manually
+	maxDepth := 0
+	for w.More() {
+		if w.Depth() > maxDepth {
+			maxDepth = w.Depth()
+		}
+		w.Advance()
+	}
+
+	// Depth should hit 2 for nested parens
+	if maxDepth < 2 {
+		t.Errorf("expected max depth >= 2, got %d", maxDepth)
+	}
+
+	// After full iteration, should be back to 0
+	if w.Depth() != 0 {
+		t.Errorf("final depth: got %d, want 0", w.Depth())
+	}
+}
+
+func TestTokenWalker_BlockDepthTracking(t *testing.T) {
+	source := []byte("def foo do\n fn -> :ok end\nend")
+	tokens := Tokenize(source)
+	w := NewTokenWalker(source, tokens)
+
+	maxBlockDepth := 0
+	for w.More() {
+		if w.BlockDepth() > maxBlockDepth {
+			maxBlockDepth = w.BlockDepth()
+		}
+		w.Advance()
+	}
+
+	// Block depth should hit 2 (def do + fn)
+	if maxBlockDepth != 2 {
+		t.Errorf("expected max block depth 2, got %d", maxBlockDepth)
+	}
+
+	// After full iteration, should be back to 0
+	if w.BlockDepth() != 0 {
+		t.Errorf("final block depth: got %d, want 0", w.BlockDepth())
+	}
+}
+
+func TestTokenWalker_NegativeDepthClamp(t *testing.T) {
+	// Start mid-expression with unmatched closing bracket
+	source := []byte(") + x")
+	tokens := Tokenize(source)
+	w := NewTokenWalker(source, tokens)
+
+	for w.More() {
+		w.Advance()
+	}
+
+	// Depth should never go negative
+	if w.Depth() < 0 {
+		t.Errorf("depth went negative: %d", w.Depth())
+	}
+}
+
+func TestTokenWalker_SkipToEndOfStatement(t *testing.T) {
+	source := []byte("foo(x,\n y)\nbar()")
+	tokens := Tokenize(source)
+	w := NewTokenWalker(source, tokens)
+
+	w.SkipToEndOfStatement()
+
+	// Should stop at EOL after the first complete statement
+	if w.CurrentKind() != TokEOL {
+		t.Errorf("expected TokEOL, got %v", w.CurrentKind())
+	}
+}
+
+func TestTokenWalker_EnsureProgress(t *testing.T) {
+	source := []byte("foo bar")
+	tokens := Tokenize(source)
+	w := NewTokenWalker(source, tokens)
+
+	prevPos := w.Pos()
+	// Simulate a function that doesn't advance
+	w.EnsureProgress(prevPos)
+
+	if w.Pos() == prevPos {
+		t.Error("EnsureProgress should have advanced position")
+	}
+}
+
+func TestTokenWalker_CollectModuleName(t *testing.T) {
+	source := []byte("Foo.Bar.Baz")
+	tokens := Tokenize(source)
+	w := NewTokenWalker(source, tokens)
+
+	name := w.CollectModuleName()
+	if name != "Foo.Bar.Baz" {
+		t.Errorf("got %q, want Foo.Bar.Baz", name)
+	}
+}
+
+func TestTokenWalker_ScanForBlockDo(t *testing.T) {
+	t.Run("do on same line", func(t *testing.T) {
+		source := []byte("defmodule Foo do")
+		tokens := Tokenize(source)
+		w := NewTokenWalker(source, tokens)
+		w.Advance() // skip defmodule
+		w.SkipToNextSig()
+		w.CollectModuleName() // skip Foo
+
+		if !w.ScanForBlockDo() {
+			t.Error("expected to find do")
+		}
+		if w.BlockDepth() != 1 {
+			t.Errorf("block depth: got %d, want 1", w.BlockDepth())
+		}
+	})
+
+	t.Run("do on next line", func(t *testing.T) {
+		source := []byte("defmodule Foo\ndo")
+		tokens := Tokenize(source)
+		w := NewTokenWalker(source, tokens)
+		w.Advance() // skip defmodule
+		w.SkipToNextSig()
+		w.CollectModuleName() // skip Foo
+
+		if !w.ScanForBlockDo() {
+			t.Error("expected to find do on next line")
+		}
+	})
+
+	t.Run("inline do: form", func(t *testing.T) {
+		source := []byte("def foo, do: :ok\ndef bar do")
+		tokens := Tokenize(source)
+		w := NewTokenWalker(source, tokens)
+		w.Advance() // skip def
+		w.SkipToNextSig()
+
+		// Should NOT find the do from the next def
+		if w.ScanForBlockDo() {
+			t.Error("should not find block do for inline do: form")
+		}
+	})
+}
+
+func TestTokenWalker_IsModuleDefiningToken(t *testing.T) {
+	tests := []struct {
+		source string
+		want   bool
+	}{
+		{"defmodule Foo do end", true},
+		{"defprotocol P do end", true},
+		{"defimpl P, for: M do end", true},
+		{"def foo do end", false},
+		{"alias Foo", false},
+	}
+
+	for _, tt := range tests {
+		tokens := Tokenize([]byte(tt.source))
+		w := NewTokenWalker([]byte(tt.source), tokens)
+
+		got := w.IsModuleDefiningToken()
+		if got != tt.want {
+			t.Errorf("%q: got %v, want %v", tt.source, got, tt.want)
+		}
+	}
+}
diff --git a/internal/parser/token_walker.go b/internal/parser/token_walker.go
new file mode 100644
index 0000000..1ab2538
--- /dev/null
+++ b/internal/parser/token_walker.go
@@ -0,0 +1,274 @@
+package parser
+
+// TokenWalker provides a consistent interface for iterating over tokens with
+// automatic depth tracking for brackets and blocks. It consolidates patterns
+// that were previously duplicated across multiple functions, ensuring:
+// - Consistent depth tracking (never goes negative)
+// - Forward progress guarantees
+// - Proper handling of module-defining constructs (defmodule, defprotocol, defimpl)
+//
+// Usage:
+//
+//	w := NewTokenWalker(source, tokens)
+//	for w.More() {
+//		tok := w.Current()
+//		// process token...
+//		w.Advance()
+//	}
+type TokenWalker struct {
+	Source []byte
+	Tokens []Token
+	N      int
+
+	pos        int
+	depth      int // bracket depth: (), [], {}, <<>>
+	blockDepth int // block depth: do/fn/end
+}
+
+// NewTokenWalker creates a walker starting at position 0.
+func NewTokenWalker(source []byte, tokens []Token) *TokenWalker {
+	return &TokenWalker{
+		Source: source,
+		Tokens: tokens,
+		N:      len(tokens),
+		pos:    0,
+	}
+}
+
+// Pos returns the current position.
+func (w *TokenWalker) Pos() int {
+	return w.pos
+}
+
+// SetPos sets the current position (does not update depths automatically).
+func (w *TokenWalker) SetPos(pos int) {
+	w.pos = pos
+}
+
+// More returns true if there are more tokens to process.
+func (w *TokenWalker) More() bool {
+	return w.pos < w.N
+}
+
+// Current returns the token at the current position.
+// Panics if position is out of bounds.
+func (w *TokenWalker) Current() Token {
+	return w.Tokens[w.pos]
+}
+
+// CurrentKind returns the kind of the current token, or TokEOF if at end.
+func (w *TokenWalker) CurrentKind() TokenKind {
+	if w.pos >= w.N {
+		return TokEOF
+	}
+	return w.Tokens[w.pos].Kind
+}
+
+// CurrentText returns the text of the current token.
+func (w *TokenWalker) CurrentText() string {
+	if w.pos >= w.N {
+		return ""
+	}
+	return TokenText(w.Source, w.Tokens[w.pos])
+}
+
+// Peek returns the token at pos+offset without advancing.
+// Returns nil if out of bounds.
+func (w *TokenWalker) Peek(offset int) *Token {
+	idx := w.pos + offset
+	if idx < 0 || idx >= w.N {
+		return nil
+	}
+	return &w.Tokens[idx]
+}
+
+// PeekKind returns the kind at pos+offset, or TokEOF if out of bounds.
+func (w *TokenWalker) PeekKind(offset int) TokenKind {
+	if tok := w.Peek(offset); tok != nil {
+		return tok.Kind
+	}
+	return TokEOF
+}
+
+// Advance moves forward by one token and updates depth counters.
+func (w *TokenWalker) Advance() {
+	if w.pos < w.N {
+		w.trackDepths(w.Tokens[w.pos].Kind)
+		w.pos++
+	}
+}
+
+// AdvanceTo moves to a specific position (must be >= current pos).
+// Does NOT track depths for skipped tokens — use when you know depths are irrelevant.
+func (w *TokenWalker) AdvanceTo(pos int) {
+	if pos > w.pos {
+		w.pos = pos
+	}
+}
+
+// AdvanceWithDepthTracking moves to a specific position while tracking depths
+// for all intermediate tokens.
+func (w *TokenWalker) AdvanceWithDepthTracking(pos int) {
+	for w.pos < pos && w.pos < w.N {
+		w.Advance()
+	}
+}
+
+// SkipToNextSig advances to the next significant token (skipping EOL and comments).
+func (w *TokenWalker) SkipToNextSig() {
+	w.pos = NextSigToken(w.Tokens, w.N, w.pos)
+}
+
+// NextSigPos returns the position of the next significant token without advancing.
+func (w *TokenWalker) NextSigPos() int {
+	return NextSigToken(w.Tokens, w.N, w.pos)
+}
+
+// Depth returns the current bracket depth.
+func (w *TokenWalker) Depth() int {
+	return w.depth
+}
+
+// BlockDepth returns the current block depth.
+func (w *TokenWalker) BlockDepth() int {
+	return w.blockDepth
+}
+
+// AtBalancedPoint returns true when both bracket and block depths are zero.
+func (w *TokenWalker) AtBalancedPoint() bool {
+	return w.depth == 0 && w.blockDepth == 0
+}
+
+// trackDepths updates depth counters based on token kind, with clamping to prevent
+// negative values (which cause bugs when starting mid-expression).
+func (w *TokenWalker) trackDepths(kind TokenKind) {
+	switch kind {
+	case TokOpenParen, TokOpenBracket, TokOpenBrace, TokOpenAngle:
+		w.depth++
+	case TokCloseParen, TokCloseBracket, TokCloseBrace, TokCloseAngle:
+		if w.depth > 0 {
+			w.depth--
+		}
+	case TokDo, TokFn:
+		w.blockDepth++
+	case TokEnd:
+		if w.blockDepth > 0 {
+			w.blockDepth--
+		}
+	}
+}
+
+// CollectModuleName collects a module name starting at current position.
+// Returns the name and advances the walker past the module name tokens.
+func (w *TokenWalker) CollectModuleName() string {
+	name, nextPos := CollectModuleName(w.Source, w.Tokens, w.N, w.pos)
+	w.pos = nextPos
+	return name
+}
+
+// ScanForBlockDo scans forward for a block-opening TokDo.
+// Returns true and advances past the do if found.
+// Returns false and advances to the boundary token if not found.
+func (w *TokenWalker) ScanForBlockDo() bool {
+	doIdx, nextPos, hasDo := ScanForwardToBlockDo(w.Tokens, w.N, w.pos)
+	if hasDo {
+		w.pos = nextPos
+		w.blockDepth++
+		return true
+	}
+	w.pos = nextPos
+	_ = doIdx
+	return false
+}
+
+// ScanKeywordOption looks for `key: Value` after the current position.
+// If found, returns the value and advances past it.
+func (w *TokenWalker) ScanKeywordOption(key string) (value string, ok bool) {
+	value, nextPos, ok := ScanKeywordOptionValue(w.Source, w.Tokens, w.N, w.pos, key)
+	if ok {
+		w.pos = nextPos
+	}
+	return value, ok
+}
+
+// IsModuleDefiningToken returns true if the current token is defmodule, defprotocol, or defimpl.
+func (w *TokenWalker) IsModuleDefiningToken() bool {
+	if w.pos >= w.N {
+		return false
+	}
+	switch w.Tokens[w.pos].Kind {
+	case TokDefmodule, TokDefprotocol, TokDefimpl:
+		return true
+	}
+	return false
+}
+
+// IsFunctionDefiningToken returns true if the current token is a function definition keyword.
+func (w *TokenWalker) IsFunctionDefiningToken() bool {
+	if w.pos >= w.N {
+		return false
+	}
+	switch w.Tokens[w.pos].Kind {
+	case TokDef, TokDefp, TokDefmacro, TokDefmacrop, TokDefguard, TokDefguardp, TokDefdelegate:
+		return true
+	}
+	return false
+}
+
+// IsStatementBoundary returns true if the current token is a statement boundary.
+func (w *TokenWalker) IsStatementBoundary() bool {
+	if w.pos >= w.N {
+		return true
+	}
+	return IsStatementBoundaryToken(w.Tokens[w.pos].Kind)
+}
+
+// SkipToEndOfStatement advances to the EOL/EOF that ends the current statement
+// (at zero bracket/block depth), leaving the walker on that boundary token.
+func (w *TokenWalker) SkipToEndOfStatement() {
+	for w.pos < w.N {
+		kind := w.Tokens[w.pos].Kind
+		switch kind {
+		case TokOpenParen, TokOpenBracket, TokOpenBrace, TokOpenAngle:
+			w.depth++
+		case TokCloseParen, TokCloseBracket, TokCloseBrace, TokCloseAngle:
+			if w.depth > 0 {
+				w.depth--
+			}
+		case TokDo, TokFn:
+			w.blockDepth++
+		case TokEnd:
+			if w.blockDepth > 0 {
+				w.blockDepth--
+			}
+		case TokEOL, TokEOF:
+			if w.depth <= 0 && w.blockDepth <= 0 {
+				return
+			}
+		}
+		w.pos++
+	}
+}
+
+// EnsureProgress guarantees the walker advances by at least one token.
+// Call this in loops where external functions might not advance position.
+func (w *TokenWalker) EnsureProgress(prevPos int) {
+	if w.pos == prevPos && w.pos < w.N {
+		w.pos++
+	}
+}
+
+// TokenAt returns the token at the given position, or nil if out of bounds.
+func (w *TokenWalker) TokenAt(pos int) *Token {
+	if pos < 0 || pos >= w.N {
+		return nil
+	}
+	return &w.Tokens[pos]
+}
+
+// TextAt returns the text of the token at the given position.
+func (w *TokenWalker) TextAt(pos int) string {
+	if pos < 0 || pos >= w.N {
+		return ""
+	}
+	return TokenText(w.Source, w.Tokens[pos])
+}
diff --git a/internal/parser/tokenizer.go b/internal/parser/tokenizer.go
new file mode 100644
index 0000000..c43471e
--- /dev/null
+++ b/internal/parser/tokenizer.go
@@ -0,0 +1,959 @@
+package parser
+
+import (
+	"unicode"
+	"unicode/utf8"
+)
+
+// TokenKind identifies the kind of a lexed token.
+type TokenKind byte
+
+const (
+	TokDefmodule TokenKind = iota // defmodule
+	TokDef // def
+	TokDefp // defp
+	TokDefmacro // defmacro
+	TokDefmacrop // defmacrop
+	TokDefguard // defguard
+	TokDefguardp // defguardp
+	TokDefdelegate // defdelegate
+	TokDefprotocol // defprotocol
+	TokDefimpl // defimpl
+	TokDefstruct // defstruct
+	TokDefexception // defexception
+	TokAlias // alias
+	TokImport // import
+	TokUse // use
+	TokRequire // require
+	TokDo // do
+	TokEnd // end
+	TokFn // fn
+	TokWhen // when
+	TokIdent // lowercase identifier or _ prefixed
+	TokModule // uppercase-starting identifier segment
+	TokAttr // @identifier (general)
+	TokAttrDoc // @doc, @moduledoc
+	TokAttrSpec // @spec
+	TokAttrType // @type, @typep, @opaque
+	TokAttrBehaviour // @behaviour
+	TokAttrCallback // @callback, @macrocallback
+	TokString // "..." or '...' (content blanked)
+	TokHeredoc // """...""" or '''...''' (content blanked)
+	TokSigil // ~X... (content blanked)
+	TokCharLiteral // ?x or ?\n etc.
+	TokAtom // :foo or :"..." (colon-prefixed)
+	TokDot // .
+	TokComma // ,
+	TokColon // : (keyword separator, not atom prefix)
+	TokOpenParen // (
+	TokCloseParen // )
+	TokOpenBracket // [
+	TokCloseBracket // ]
+	TokOpenBrace // {
+	TokCloseBrace // }
+	TokOpenAngle // <<
+	TokCloseAngle // >>
+	TokPipe // |>
+	TokBackslash // \\
+	TokRightArrow // ->
+	TokLeftArrow // <-
+	TokAssoc // =>
+	TokDoubleColon // ::
+	TokPercent // %
+	TokNumber // integer or float literal
+	TokComment // # to end of line
+	TokEOL // newline
+	TokEOF // end of input
+	TokOther // anything else (operators, etc.)
+)
+
+// Token is a lexed token from an Elixir source file.
+// Start and End are byte offsets into the source; source[Start:End] is the token text.
+// Line is 1-based.
+type Token struct {
+	Kind  TokenKind
+	Start int
+	End   int
+	Line  int
+}
+
+// TokenResult holds the output of Tokenize: the token stream and a line-starts
+// table for O(1) byte-offset-to-column conversion. LineStarts[i] is the byte
+// offset of the first character on line i+1 (0-indexed). Column for a token
+// on line L at byte offset B is: B - LineStarts[L-1].
+//
+// Note: LineStarts tracks every newline in the source, including newlines
+// inside strings, heredocs, sigils, and interpolations (the scan helpers
+// append to it as they advance). Token.Line is updated in lockstep, so
+// byte-offset-to-line/column conversion stays accurate even for tokens
+// inside multi-line string literals.
+type TokenResult struct {
+	Tokens     []Token
+	LineStarts []int
+}
+
+// keywordKinds maps keyword strings to their token kind.
+// Checked after lexing a lowercase identifier.
+var keywordKinds = map[string]TokenKind{
+	"defmodule":    TokDefmodule,
+	"defprotocol":  TokDefprotocol,
+	"defimpl":      TokDefimpl,
+	"defstruct":    TokDefstruct,
+	"defexception": TokDefexception,
+	"defdelegate":  TokDefdelegate,
+	"defmacrop":    TokDefmacrop,
+	"defmacro":     TokDefmacro,
+	"defguardp":    TokDefguardp,
+	"defguard":     TokDefguard,
+	"defp":         TokDefp,
+	"def":          TokDef,
+	"alias":        TokAlias,
+	"import":       TokImport,
+	"use":          TokUse,
+	"require":      TokRequire,
+	"do":           TokDo,
+	"end":          TokEnd,
+	"fn":           TokFn,
+	"when":         TokWhen,
+}
+
+// operatorAtomChars are the characters that can form operator atoms like :+, :&&, :>>>.
+// Elixir allows these after a bare : to form atoms.
+var operatorAtomChars = [256]bool{
+	'+': true, '-': true, '*': true, '/': true,
+	'=': true, '!': true, '<': true, '>': true,
+	'|': true, '&': true, '^': true, '~': true,
+	'@': true, '\\': true,
+}
+
+func Tokenize(source []byte) []Token {
+	return TokenizeFull(source).Tokens
+}
+
+func TokenizeFull(source []byte) TokenResult {
+	tokens := make([]Token, 0, len(source)/8)
+	lineStarts := make([]int, 1, 64)
+	lineStarts[0] = 0 // line 1 starts at byte 0
+	line := 1
+	i := 0
+	afterDot := false // true when the last significant token was TokDot
+
+	for i < len(source) {
+		ch := source[i]
+
+		// Whitespace, newlines, and comments don't affect afterDot — they preserve it.
+		// Everything else clears it (except the dot case which sets it).
+		switch {
+		case ch == '\n':
+			tokens = append(tokens, Token{Kind: TokEOL, Start: i, End: i + 1, Line: line})
+			line++
+			i++
+			lineStarts = append(lineStarts, i)
+			continue
+
+		case ch == ' ' || ch == '\t' || ch == '\r':
+			i++
+			continue
+
+		case ch == '#':
+			start := i
+			for i < len(source) && source[i] != '\n' {
+				i++
+			}
+			tokens = append(tokens, Token{Kind: TokComment, Start: start, End: i, Line: line})
+			continue
+
+		case ch == '?':
+			start := i
+			startLine := line
+			i++ // consume '?'
+			if i < len(source) {
+				if source[i] == '\\' {
+					i++ // consume backslash
+					if i < len(source) {
+						if source[i] == 'x' || source[i] == 'X' {
+							// hex escape: \xFF
+							i++
+							for i < len(source) && isHexDigit(source[i]) {
+								i++
+							}
+						} else if source[i] >= '0' && source[i] <= '7' {
+							// octal escape
+							i++
+							for i < len(source) && source[i] >= '0' && source[i] <= '7' {
+								i++
+							}
+						} else {
+							if source[i] == '\n' {
+								line++
+								lineStarts = append(lineStarts, i+1)
+							}
+							i++ // single char escape like \n \t \\
+						}
+					}
+				} else {
+					i++ // any other single char
+				}
+			}
+			tokens = append(tokens, Token{Kind: TokCharLiteral, Start: start, End: i, Line: startLine})
+
+		case ch == '"':
+			// Check for heredoc
+			if i+2 < len(source) && source[i+1] == '"' && source[i+2] == '"' {
+				start := i
+				startLine := line
+				i += 3 // consume opening """
+				// scan to closing """ on its own line
+				i, line = scanHeredocContent(source, i, line, '"', &lineStarts)
+				tokens = append(tokens, Token{Kind: TokHeredoc, Start: start, End: i, Line: startLine})
+			} else {
+				start := i
+				startLine := line
+				i++ // consume opening "
+				i, line = scanStringContent(source, i, line, '"', &lineStarts)
+				tokens = append(tokens, Token{Kind: TokString, Start: start, End: i, Line: startLine})
+			}
+
+		case ch == '\'':
+			// Check for heredoc
+			if i+2 < len(source) && source[i+1] == '\'' && source[i+2] == '\'' {
+				start := i
+				startLine := line
+				i += 3 // consume opening '''
+				i, line = scanHeredocContent(source, i, line, '\'', &lineStarts)
+				tokens = append(tokens, Token{Kind: TokHeredoc, Start: start, End: i, Line: startLine})
+			} else {
+				start := i
+				startLine := line
+				i++ // consume opening '
+				i, line = scanStringContent(source, i, line, '\'', &lineStarts)
+				tokens = append(tokens, Token{Kind: TokString, Start: start, End: i, Line: startLine})
+			}

+		case ch == '~':
+			// Sigil: ~ followed by letter(s) then delimiter.
+			// Single-char sigils: ~r, ~s, ~S, etc.
+			// Multi-char sigils (Elixir 1.15+): ~HTML, ~HEEX — uppercase only.
+			if i+1 < len(source) && isLetter(source[i+1]) {
+				start := i
+				startLine := line
+				sigilLetter := source[i+1]
+				i += 2 // consume ~ and first letter
+				// Multi-char sigils: continue reading uppercase letters
+				if isUpper(sigilLetter) {
+					for i < len(source) && isUpper(source[i]) {
+						i++
+					}
+				}
+				if i < len(source) {
+					i, line = scanSigilContent(source, i, line, sigilLetter, &lineStarts)
+				}
+				tokens = append(tokens, Token{Kind: TokSigil, Start: start, End: i, Line: startLine})
+			} else {
+				tokens = append(tokens, Token{Kind: TokOther, Start: i, End: i + 1, Line: line})
+				i++
+			}
+
+		case ch == ':':
+			if i+1 < len(source) && source[i+1] == ':' {
+				tokens = append(tokens, Token{Kind: TokDoubleColon, Start: i, End: i + 2, Line: line})
+				i += 2
+			} else if i+1 < len(source) && source[i+1] == '"' {
+				// Atom with quoted string: :"..."
+				start := i
+				startLine := line
+				i += 2 // consume :"
+				i, line = scanStringContent(source, i, line, '"', &lineStarts)
+				tokens = append(tokens, Token{Kind: TokAtom, Start: start, End: i, Line: startLine})
+			} else if i+1 < len(source) && source[i+1] == '\'' {
+				// Atom with quoted charlist: :'...'
+				start := i
+				startLine := line
+				i += 2 // consume :'
+				i, line = scanStringContent(source, i, line, '\'', &lineStarts)
+				tokens = append(tokens, Token{Kind: TokAtom, Start: start, End: i, Line: startLine})
+			} else if i+1 < len(source) && (isLower(source[i+1]) || source[i+1] == '_' || isUpperAtomStart(source, i+1)) {
+				start := i
+				i++ // consume ':'
+				i = scanIdentContinue(source, i)
+				tokens = append(tokens, Token{Kind: TokAtom, Start: start, End: i, Line: line})
+			} else if i+1 < len(source) && source[i+1] >= 0x80 {
+				r, size := utf8.DecodeRune(source[i+1:])
+				if r != utf8.RuneError && unicode.IsLetter(r) {
+					start := i
+					i++
+					i += size
+					i = scanIdentContinue(source, i)
+					tokens = append(tokens, Token{Kind: TokAtom, Start: start, End: i, Line: line})
+				} else {
+					tokens = append(tokens, Token{Kind: TokColon, Start: i, End: i + 1, Line: line})
+					i++
+				}
+			} else if i+1 < len(source) && operatorAtomChars[source[i+1]] {
+				// Operator atom: :+, :-, :&&, :>>>, etc.
+				start := i
+				i++ // consume ':'
+				for i < len(source) && operatorAtomChars[source[i]] {
+					i++
+				}
+				tokens = append(tokens, Token{Kind: TokAtom, Start: start, End: i, Line: line})
+			} else {
+				tokens = append(tokens, Token{Kind: TokColon, Start: i, End: i + 1, Line: line})
+				i++
+			}
+
+		case ch == '@':
+			if i+1 < len(source) && (isLower(source[i+1]) || source[i+1] == '_') {
+				start := i
+				i++ // consume '@'
+				i = scanIdentContinue(source, i)
+				tokens = append(tokens, Token{Kind: classifyAttr(source, start, i), Start: start, End: i, Line: line})
+			} else if i+1 < len(source) && source[i+1] >= 0x80 {
+				// Check for Unicode lowercase letter after @
+				r, _ := utf8.DecodeRune(source[i+1:])
+				if r != utf8.RuneError && unicode.IsLetter(r) && unicode.IsLower(r) {
+					start := i
+					i++ // consume '@'
+					i = scanIdentContinue(source, i)
+					tokens = append(tokens, Token{Kind: classifyAttr(source, start, i), Start: start, End: i, Line: line})
+				} else {
+					tokens = append(tokens, Token{Kind: TokOther, Start: i, End: i + 1, Line: line})
+					i++
+				}
+			} else {
+				tokens = append(tokens, Token{Kind: TokOther, Start: i, End: i + 1, Line: line})
+				i++
+			}
+
+		case ch == '.':
+			if i+2 < len(source) && source[i+1] == '.' && source[i+2] == '.' {
+				tokens = append(tokens, Token{Kind: TokOther, Start: i, End: i + 3, Line: line})
+				i += 3
+			} else if i+1 < len(source) && source[i+1] == '.' {
+				tokens = append(tokens, Token{Kind: TokOther, Start: i, End: i + 2, Line: line})
+				i += 2
+			} else {
+				tokens = append(tokens, Token{Kind: TokDot, Start: i, End: i + 1, Line: line})
+				i++
+				afterDot = true
+				continue
+			}
+
+		case ch == ',':
+			tokens = append(tokens, Token{Kind: TokComma, Start: i, End: i + 1, Line: line})
+			i++
+
+		case ch == '(':
+			tokens = append(tokens, Token{Kind: TokOpenParen, Start: i, End: i + 1, Line: line})
+			i++
+
+		case ch == ')':
+			tokens = append(tokens, Token{Kind: TokCloseParen, Start: i, End: i + 1, Line: line})
+			i++
+
+		case ch == '[':
+			tokens = append(tokens, Token{Kind: TokOpenBracket, Start: i, End: i + 1, Line: line})
+			i++
+
+		case ch == ']':
+			tokens = append(tokens, Token{Kind: TokCloseBracket, Start: i, End: i + 1, Line: line})
+			i++
+
+		case ch == '{':
+			tokens = append(tokens, Token{Kind: TokOpenBrace, Start: i, End: i + 1, Line: line})
+			i++
+
+		case ch == '}':
+			tokens = append(tokens, Token{Kind: TokCloseBrace, Start: i, End: i + 1, Line: line})
+			i++
+
+		case ch == '<':
+			if i+1 < len(source) && source[i+1] == '<' {
+				tokens = append(tokens, Token{Kind: TokOpenAngle, Start: i, End: i + 2, Line: line})
+				i += 2
+			} else if i+1 < len(source) && source[i+1] == '-' {
+				tokens = append(tokens, Token{Kind: TokLeftArrow, Start: i, End: i + 2, Line: line})
+				i += 2
+			} else {
+				tokens = append(tokens, Token{Kind: TokOther, Start: i, End: i + 1, Line: line})
+				i++
+			}
+
+		case ch == '>':
+			if i+1 < len(source) && source[i+1] == '>' {
+				tokens = append(tokens, Token{Kind: TokCloseAngle, Start: i, End: i + 2, Line: line})
+				i += 2
+			} else {
+				tokens = append(tokens, Token{Kind: TokOther, Start: i, End: i + 1, Line: line})
+				i++
+			}
+
+		case ch == '|':
+			if i+1 < len(source) && source[i+1] == '>' {
+				tokens = append(tokens, Token{Kind: TokPipe, Start: i, End: i + 2, Line: line})
+				i += 2
+			} else {
+				tokens = append(tokens, Token{Kind: TokOther, Start: i, End: i + 1, Line: line})
+				i++
+			}
+
+		case ch == '\\':
+			if i+1 < len(source) && source[i+1] == '\\' {
+				tokens = append(tokens, Token{Kind: TokBackslash, Start: i, End: i + 2, Line: line})
+				i += 2
+			} else {
+				tokens = append(tokens, Token{Kind: TokOther, Start: i, End: i + 1, Line: line})
+				i++
+			}
+
+		case ch == '-':
+			if i+1 < len(source) && source[i+1] == '>' {
+				tokens = append(tokens, Token{Kind: TokRightArrow, Start: i, End: i + 2, Line: line})
+				i += 2
+			} else {
+				tokens = append(tokens, Token{Kind: TokOther, Start: i, End: i + 1, Line: line})
+				i++
+			}
+
+		case ch == '=':
+			if i+1 < len(source) && source[i+1] == '>' {
+				tokens = append(tokens, Token{Kind: TokAssoc, Start: i, End: i + 2, Line: line})
+				i += 2
+			} else {
+				tokens = append(tokens, Token{Kind: TokOther, Start: i, End: i + 1, Line: line})
+				i++
+			}
+
+		case ch == '%':
+			tokens = append(tokens, Token{Kind: TokPercent, Start: i, End: i + 1, Line: line})
+			i++
+
+		case ch >= '0' && ch <= '9':
+			start := i
+			i++
+			// Hex: 0x, Octal: 0o, Binary: 0b
+			if ch == '0' && i < len(source) {
+				switch source[i] {
+				case 'x', 'X':
+					i++
+					for i < len(source) && (isHexDigit(source[i]) || source[i] == '_') {
+						i++
+					}
+					tokens = append(tokens, Token{Kind: TokNumber, Start: start, End: i, Line: line})
+					continue
+				case 'o', 'O':
+					i++
+					for i < len(source) && ((source[i] >= '0' && source[i] <= '7') || source[i] == '_') {
+						i++
+					}
+					tokens = append(tokens, Token{Kind: TokNumber, Start: start, End: i, Line: line})
+					continue
+				case 'b', 'B':
+					i++
+					for i < len(source) && (source[i] == '0' || source[i] == '1' || source[i] == '_') {
+						i++
+					}
+					tokens = append(tokens, Token{Kind: TokNumber, Start: start, End: i, Line: line})
+					continue
+				}
+			}
+			// Decimal digits (with optional underscores)
+			for i < len(source) && (isDigit(source[i]) || source[i] == '_') {
+				i++
+			}
+			// Float: decimal point followed by digit
+			if i < len(source) && source[i] == '.' && i+1 < len(source) && isDigit(source[i+1]) {
+				i++ // consume '.'
+				for i < len(source) && (isDigit(source[i]) || source[i] == '_') {
+					i++
+				}
+			}
+			// Scientific notation
+			if i < len(source) && (source[i] == 'e' || source[i] == 'E') {
+				i++
+				if i < len(source) && (source[i] == '+' || source[i] == '-') {
+					i++
+				}
+				for i < len(source) && (isDigit(source[i]) || source[i] == '_') {
+					i++
+				}
+			}
+			tokens = append(tokens, Token{Kind: TokNumber, Start: start, End: i, Line: line})
+
+		case ch == '_':
+			start := i
+			if i+9 < len(source) && string(source[i:i+10]) == "__MODULE__" && !isIdentContinueAt(source, i+10) {
+				tokens = append(tokens, Token{Kind: TokModule, Start: i, End: i + 10, Line: line})
+				i += 10
+			} else {
+				i++
+				i = scanIdentContinue(source, i)
+				word := string(source[start:i])
+				if !afterDot {
+					if kind, ok := keywordKinds[word]; ok && !isIdentContinueAt(source, i) && !isKeywordKey(source, i) {
+						tokens = append(tokens, Token{Kind: kind, Start: start, End: i, Line: line})
+					} else {
+						tokens = append(tokens, Token{Kind: TokIdent, Start: start, End: i, Line: line})
+					}
+				} else {
+					tokens = append(tokens, Token{Kind: TokIdent, Start: start, End: i, Line: line})
+				}
+			}
+
+		case isUpper(ch):
+			start := i
+			i++
+			i = scanIdentContinueMod(source, i)
+			tokens = append(tokens, Token{Kind: TokModule, Start: start, End: i, Line: line})
+
+		case isLower(ch):
+			start := i
+			i++
+			i = scanIdentContinue(source, i)
+			word := string(source[start:i])
+			if !afterDot {
+				if kind, ok := keywordKinds[word]; ok && !isIdentContinueAt(source, i) && !isKeywordKey(source, i) {
+					tokens = append(tokens, Token{Kind: kind, Start: start, End: i, Line: line})
+				} else {
+					tokens = append(tokens, Token{Kind: TokIdent, Start: start, End: i, Line: line})
+				}
+			} else {
+				tokens = append(tokens, Token{Kind: TokIdent, Start: start, End: i, Line: line})
+			}
+
+		default:
+			// Check for multi-byte UTF-8 rune (Unicode identifiers)
+			if ch >= 0x80 {
+				r, size := utf8.DecodeRune(source[i:])
+				if r != utf8.RuneError && unicode.IsLetter(r) {
+					start := i
+					isModuleStart := unicode.IsUpper(r)
+					i += size
+					if isModuleStart {
+						i = scanIdentContinueMod(source, i)
+						tokens = append(tokens, Token{Kind: TokModule, Start: start, End: i, Line: line})
+					} else {
+						i = scanIdentContinue(source, i)
+						tokens = append(tokens, Token{Kind: TokIdent, Start: start, End: i, Line: line})
+					}
+				} else {
+					// Non-letter Unicode or invalid UTF-8 — skip the whole rune
+					tokens = append(tokens, Token{Kind: TokOther, Start: i, End: i + size, Line: line})
+					i += size
+				}
+			} else {
+				tokens = append(tokens, Token{Kind: TokOther, Start: i, End: i + 1, Line: line})
+				i++
+			}
+		}
+
+		afterDot = false
+	}
+
+	tokens = append(tokens, Token{Kind: TokEOF, Start: len(source), End: len(source), Line: line})
+	return TokenResult{Tokens: tokens, LineStarts: lineStarts}
+}
+
+// scanStringContent scans from after the opening delimiter to (and including) the matching closing delimiter.
+// Returns the new position (after closing delimiter) and updated line count.
+// Handles escape sequences and #{} interpolation with proper brace depth tracking.
+func scanStringContent(source []byte, i, line int, delim byte, lineStarts *[]int) (int, int) {
+	for i < len(source) {
+		ch := source[i]
+		if ch == '\n' {
+			line++
+			i++
+			*lineStarts = append(*lineStarts, i)
+		} else if ch == '\\' && i+1 < len(source) {
+			if source[i+1] == '\n' {
+				line++
+				*lineStarts = append(*lineStarts, i+2)
+			}
+			i += 2 // skip backslash and next char
+		} else if ch == '#' && i+1 < len(source) && source[i+1] == '{' {
+			i += 2 // consume #{
+			i, line = scanInterpolation(source, i, line, lineStarts)
+		} else if ch == delim {
+			i++ // consume closing delimiter
+			return i, line
+		} else {
+			i++
+		}
+	}
+	return i, line
+}
+
+// scanInterpolation scans the body of a #{} interpolation block, starting after the #{.
+// Tracks brace depth and properly handles nested strings, char literals, and sigils
+// so that } inside those constructs doesn't prematurely close the interpolation.
+func scanInterpolation(source []byte, i, line int, lineStarts *[]int) (int, int) {
+	depth := 1
+	for i < len(source) && depth > 0 {
+		c := source[i]
+		switch {
+		case c == '\n':
+			line++
+			i++
+			*lineStarts = append(*lineStarts, i)
+		case c == '\\' && i+1 < len(source):
+			if source[i+1] == '\n' {
+				line++
+				*lineStarts = append(*lineStarts, i+2)
+			}
+			i += 2
+		case c == '"' || c == '\'':
+			innerDelim := c
+			i++
+			i, line = scanStringContent(source, i, line, innerDelim, lineStarts)
+		case c == '?' && i+1 < len(source):
+			i++ // consume '?'
+			if source[i] == '\\' && i+1 < len(source) {
+				if source[i+1] == '\n' {
+					line++
+					*lineStarts = append(*lineStarts, i+2)
+				}
+				i += 2 // escape sequence like ?\n
+			} else {
+				i++ // single char like ?} or ?a
+			}
+		case c == '~' && i+1 < len(source) && isLetter(source[i+1]):
+			sigilLetter := source[i+1]
+			i += 2 // consume ~ and letter
+			if i < len(source) {
+				i, line = scanSigilContent(source, i, line, sigilLetter, lineStarts)
+			}
+		case c == '#' && i+1 < len(source) && source[i+1] == '{':
+			i += 2
+			i, line = scanInterpolation(source, i, line, lineStarts)
+		case c == '{':
+			depth++
+			i++
+		case c == '}':
+			depth--
+			i++
+		default:
+			i++
+		}
+	}
+	return i, line
+}
+
+// scanHeredocContent scans from after the opening """ (or ''') to (and including) the closing """ on its own line.
+// The closing delimiter must appear at the start of a line (possibly with leading whitespace).
+func scanHeredocContent(source []byte, i, line int, delim byte, lineStarts *[]int) (int, int) {
+	for i < len(source) {
+		ch := source[i]
+		if ch == '\n' {
+			line++
+			i++
+			*lineStarts = append(*lineStarts, i)
+			// Check if the next non-space chars are the closing delimiter
+			j := i
+			for j < len(source) && (source[j] == ' ' || source[j] == '\t') {
+				j++
+			}
+			if j+2 < len(source) && source[j] == delim && source[j+1] == delim && source[j+2] == delim {
+				i = j + 3 // consume closing delimiter
+				return i, line
+			}
+		} else if ch == '\\' && i+1 < len(source) {
+			if source[i+1] == '\n' {
+				line++
+				*lineStarts = append(*lineStarts, i+2)
+			}
+			i += 2
+		} else if ch == '#' && i+1 < len(source) && source[i+1] == '{' {
+			i += 2
+			i, line = scanInterpolation(source, i, line, lineStarts)
+		} else {
+			i++
+		}
+	}
+	return i, line
+}
+
+// scanSigilContent scans from the opening delimiter of a sigil to its closing delimiter,
+// including any trailing modifier letters. Returns new position and updated line count.
+// sigilLetter is the letter after ~ (e.g. 's' in ~s, 'S' in ~S). Uppercase sigil letters
+// mean the content is "raw" — backslash is NOT an escape character.
+func scanSigilContent(source []byte, i, line int, sigilLetter byte, lineStarts *[]int) (int, int) {
+	if i >= len(source) {
+		return i, line
+	}
+
+	escapes := isLower(sigilLetter) // only lowercase sigils process escapes
+	openCh := source[i]
+
+	// Check for heredoc sigil: ~s""" or ~S"""
+	if openCh == '"' && i+2 < len(source) && source[i+1] == '"' && source[i+2] == '"' {
+		i += 3 // consume """
+		if escapes {
+			i, line = scanHeredocContent(source, i, line, '"', lineStarts)
+		} else {
+			i, line = scanRawHeredocContent(source, i, line, '"', lineStarts)
+		}
+		return i, line
+	}
+	if openCh == '\'' && i+2 < len(source) && source[i+1] == '\'' && source[i+2] == '\'' {
+		i += 3 // consume '''
+		if escapes {
+			i, line = scanHeredocContent(source, i, line, '\'', lineStarts)
+		} else {
+			i, line = scanRawHeredocContent(source, i, line, '\'', lineStarts)
+		}
+		return i, line
+	}
+
+	i++ // consume opening delimiter
+
+	var closeCh byte
+	nested := false
+
+	switch openCh {
+	case '(':
+		closeCh = ')'
+		nested = true
+	case '[':
+		closeCh = ']'
+		nested = true
+	case '{':
+		closeCh = '}'
+		nested = true
+	case '<':
+		closeCh = '>'
+		nested = true
+	default:
+		closeCh = openCh
+		nested = false
+	}
+
+	if nested {
+		depth := 1
+		for i < len(source) && depth > 0 {
+			ch := source[i]
+			if ch == '\n' {
+				line++
+				i++
+				*lineStarts = append(*lineStarts, i)
+			} else if escapes && ch == '\\' && i+1 < len(source) {
+				if source[i+1] == '\n' {
+					line++
+					*lineStarts = append(*lineStarts, i+2)
+				}
+				i += 2
+			} else if ch == openCh {
+				depth++
+				i++
+			} else if ch == closeCh {
+				depth--
+				i++
+			} else {
+				i++
+			}
+		}
+	} else {
+		for i < len(source) {
+			ch := source[i]
+			if ch == '\n' {
+				line++
+				i++
+				*lineStarts = append(*lineStarts, i)
+			} else if escapes && ch == '\\' && i+1 < len(source) {
+				if source[i+1] == '\n' {
+					line++
+					*lineStarts = append(*lineStarts, i+2)
+				}
+				i += 2
+			} else if ch == closeCh {
+				i++ // consume closing delimiter
+				break
+			} else {
+				i++
+			}
+		}
+	}
+
+	// Consume trailing modifier letters (e.g. the 'i' in ~r/foo/i)
+	for i < len(source) && isLetter(source[i]) {
+		i++
+	}
+
+	return i, line
+}
+
+// scanRawHeredocContent scans a heredoc body where backslash is NOT an escape character
+// (used by uppercase sigils like ~S"""). Only tracks newlines and looks for closing delimiter.
+func scanRawHeredocContent(source []byte, i, line int, delim byte, lineStarts *[]int) (int, int) {
+	for i < len(source) {
+		ch := source[i]
+		if ch == '\n' {
+			line++
+			i++
+			*lineStarts = append(*lineStarts, i)
+			j := i
+			for j < len(source) && (source[j] == ' ' || source[j] == '\t') {
+				j++
+			}
+			if j+2 < len(source) && source[j] == delim && source[j+1] == delim && source[j+2] == delim {
+				i = j + 3
+				return i, line
+			}
+		} else {
+			i++
+		}
+	}
+	return i, line
+}
+
+// isLetter returns true for ASCII [a-zA-Z].
+func isLetter(ch byte) bool {
+	return (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z')
+}
+
+// isLower returns true for ASCII [a-z].
+func isLower(ch byte) bool {
+	return ch >= 'a' && ch <= 'z'
+}
+
+// isUpper returns true for ASCII [A-Z].
+func isUpper(ch byte) bool {
+	return ch >= 'A' && ch <= 'Z'
+}
+
+// isDigit returns true for [0-9].
+func isDigit(ch byte) bool {
+	return ch >= '0' && ch <= '9'
+}
+
+// isHexDigit returns true for [0-9a-fA-F].
+func isHexDigit(ch byte) bool {
+	return isDigit(ch) || (ch >= 'a' && ch <= 'f') || (ch >= 'A' && ch <= 'F')
+}
+
+// isIdentContinue returns true for ASCII characters valid after the first character of a lowercase identifier.
+func isIdentContinue(ch byte) bool {
+	return isLetter(ch) || isDigit(ch) || ch == '_' || ch == '?' || ch == '!' || ch == '@'
+}
+
+// isIdentContinueMod returns true for ASCII characters valid in module name identifiers (no ? or !).
+func isIdentContinueMod(ch byte) bool {
+	return isLetter(ch) || isDigit(ch) || ch == '_' || ch == '@'
+}
+
+// scanIdentContinue advances i past identifier continuation characters,
+// including multi-byte UTF-8 letters/digits. For lowercase identifiers,
+// allows ? and ! as the final character (Elixir convention).
+func scanIdentContinue(source []byte, i int) int {
+	for i < len(source) {
+		ch := source[i]
+		if isIdentContinue(ch) {
+			i++
+			continue
+		}
+		// Check for multi-byte UTF-8 letter/digit
+		if ch >= 0x80 {
+			r, size := utf8.DecodeRune(source[i:])
+			if r != utf8.RuneError && (unicode.IsLetter(r) || unicode.IsDigit(r)) {
+				i += size
+				continue
+			}
+		}
+		break
+	}
+	return i
+}
+
+// scanIdentContinueMod advances i past module identifier continuation characters
+// (no ? or !), including multi-byte UTF-8 letters/digits.
+func scanIdentContinueMod(source []byte, i int) int {
+	for i < len(source) {
+		ch := source[i]
+		if isIdentContinueMod(ch) {
+			i++
+			continue
+		}
+		if ch >= 0x80 {
+			r, size := utf8.DecodeRune(source[i:])
+			if r != utf8.RuneError && (unicode.IsLetter(r) || unicode.IsDigit(r)) {
+				i += size
+				continue
+			}
+		}
+		break
+	}
+	return i
+}
+
+// isIdentContinueAt checks if the byte at position i in source is an identifier
+// continuation character, including multi-byte UTF-8 letters/digits.
+func isIdentContinueAt(source []byte, i int) bool {
+	if i >= len(source) {
+		return false
+	}
+	ch := source[i]
+	if isIdentContinue(ch) {
+		return true
+	}
+	if ch >= 0x80 {
+		r, size := utf8.DecodeRune(source[i:])
+		_ = size
+		return r != utf8.RuneError && (unicode.IsLetter(r) || unicode.IsDigit(r))
+	}
+	return false
+}
+
+// isUpperAtomStart returns true if source[i] starts an uppercase letter
+// (either ASCII A-Z or a multi-byte uppercase Unicode letter).
+// Elixir allows :Foo atoms (though they're typically aliases).
+func isUpperAtomStart(source []byte, i int) bool {
+	if i >= len(source) {
+		return false
+	}
+	if isUpper(source[i]) {
+		return true
+	}
+	if source[i] >= 0x80 {
+		r, _ := utf8.DecodeRune(source[i:])
+		return r != utf8.RuneError && unicode.IsUpper(r)
+	}
+	return false
+}
+
+// classifyAttr returns the specific TokAttr* kind for known attribute names,
+// or TokAttr for everything else. source[start:end] includes the leading '@'.
+func classifyAttr(source []byte, start, end int) TokenKind {
+	name := source[start+1 : end] // strip '@'
+	switch {
+	case bytesEqual(name, "doc") || bytesEqual(name, "moduledoc"):
+		return TokAttrDoc
+	case bytesEqual(name, "spec"):
+		return TokAttrSpec
+	case bytesEqual(name, "type") || bytesEqual(name, "typep") || bytesEqual(name, "opaque"):
+		return TokAttrType
+	case bytesEqual(name, "behaviour"):
+		return TokAttrBehaviour
+	case bytesEqual(name, "callback") || bytesEqual(name, "macrocallback"):
+		return TokAttrCallback
+	default:
+		return TokAttr
+	}
+}
+
+func bytesEqual(b []byte, s string) bool {
+	if len(b) != len(s) {
+		return false
+	}
+	for i := range b {
+		if b[i] != s[i] {
+			return false
+		}
+	}
+	return true
+}
+
+// isKeywordKey checks if source[i] is ':' not followed by another ':'.
+// When true, the preceding keyword (do, end, fn, when, etc.) is being used as a
+// keyword-list key (e.g. `do: :something`) and should emit TokIdent instead.
+func isKeywordKey(source []byte, i int) bool {
+	return i < len(source) && source[i] == ':' && (i+1 >= len(source) || source[i+1] != ':')
+}
diff --git a/internal/parser/tokenizer_test.go b/internal/parser/tokenizer_test.go
new file mode 100644
index 0000000..7b38365
--- /dev/null
+++ b/internal/parser/tokenizer_test.go
@@ -0,0 +1,2143 @@
+package parser
+
+import (
+	"fmt"
+	"strings"
+	"testing"
+)
+
+// tokenizeNoEOF runs Tokenize and strips the trailing TokEOF for cleaner assertions.
+func tokenizeNoEOF(source string) []Token { + tokens := Tokenize([]byte(source)) + if len(tokens) > 0 && tokens[len(tokens)-1].Kind == TokEOF { + return tokens[:len(tokens)-1] + } + return tokens +} + +// kindNames maps TokenKind to a human-readable name for test output. +var kindNames = map[TokenKind]string{ + TokDefmodule: "TokDefmodule", + TokDef: "TokDef", + TokDefp: "TokDefp", + TokDefmacro: "TokDefmacro", + TokDefmacrop: "TokDefmacrop", + TokDefguard: "TokDefguard", + TokDefguardp: "TokDefguardp", + TokDefdelegate: "TokDefdelegate", + TokDefprotocol: "TokDefprotocol", + TokDefimpl: "TokDefimpl", + TokDefstruct: "TokDefstruct", + TokDefexception: "TokDefexception", + TokAlias: "TokAlias", + TokImport: "TokImport", + TokUse: "TokUse", + TokRequire: "TokRequire", + TokDo: "TokDo", + TokEnd: "TokEnd", + TokFn: "TokFn", + TokWhen: "TokWhen", + TokIdent: "TokIdent", + TokModule: "TokModule", + TokAttr: "TokAttr", + TokAttrDoc: "TokAttrDoc", + TokAttrSpec: "TokAttrSpec", + TokAttrType: "TokAttrType", + TokAttrBehaviour: "TokAttrBehaviour", + TokAttrCallback: "TokAttrCallback", + TokString: "TokString", + TokHeredoc: "TokHeredoc", + TokSigil: "TokSigil", + TokCharLiteral: "TokCharLiteral", + TokAtom: "TokAtom", + TokDot: "TokDot", + TokComma: "TokComma", + TokColon: "TokColon", + TokOpenParen: "TokOpenParen", + TokCloseParen: "TokCloseParen", + TokOpenBracket: "TokOpenBracket", + TokCloseBracket: "TokCloseBracket", + TokOpenBrace: "TokOpenBrace", + TokCloseBrace: "TokCloseBrace", + TokOpenAngle: "TokOpenAngle", + TokCloseAngle: "TokCloseAngle", + TokPipe: "TokPipe", + TokBackslash: "TokBackslash", + TokRightArrow: "TokRightArrow", + TokLeftArrow: "TokLeftArrow", + TokAssoc: "TokAssoc", + TokDoubleColon: "TokDoubleColon", + TokPercent: "TokPercent", + TokNumber: "TokNumber", + TokComment: "TokComment", + TokEOL: "TokEOL", + TokEOF: "TokEOF", + TokOther: "TokOther", +} + +func kindName(k TokenKind) string { + if name, ok := kindNames[k]; ok { + return name + } + 
return fmt.Sprintf("Token(%d)", int(k)) +} + +// assertKinds checks that the tokens produced have exactly the given kinds (ignoring EOL). +func assertKinds(t *testing.T, source string, expected []TokenKind) { + t.Helper() + tokens := tokenizeNoEOF(source) + // Filter out EOL tokens for easier assertions unless EOL is in expected + wantEOL := false + for _, k := range expected { + if k == TokEOL { + wantEOL = true + break + } + } + var got []TokenKind + for _, tok := range tokens { + if tok.Kind == TokEOL && !wantEOL { + continue + } + got = append(got, tok.Kind) + } + if len(got) != len(expected) { + t.Errorf("source %q: got %d tokens, want %d", source, len(got), len(expected)) + t.Logf(" got: %v", kindSlice(got)) + t.Logf(" want: %v", kindSlice(expected)) + return + } + for i, k := range expected { + if got[i] != k { + t.Errorf("source %q: token[%d] = %s, want %s", source, i, kindName(got[i]), kindName(k)) + } + } +} + +func kindSlice(kinds []TokenKind) []string { + names := make([]string, len(kinds)) + for i, k := range kinds { + names[i] = kindName(k) + } + return names +} + +// assertText checks that a specific token (by index, excluding EOL) has the given text. +func assertText(t *testing.T, source string, index int, expected string) { + t.Helper() + tokens := tokenizeNoEOF(source) + var filtered []Token + for _, tok := range tokens { + if tok.Kind != TokEOL { + filtered = append(filtered, tok) + } + } + if index >= len(filtered) { + t.Errorf("source %q: token index %d out of range (have %d tokens)", source, index, len(filtered)) + return + } + got := string([]byte(source)[filtered[index].Start:filtered[index].End]) + if got != expected { + t.Errorf("source %q: token[%d] text = %q, want %q", source, index, got, expected) + } +} + +// TestTokenize_EmptySource verifies that an empty input produces just TokEOF. 
+func TestTokenize_EmptySource(t *testing.T) { + tokens := Tokenize([]byte("")) + if len(tokens) != 1 || tokens[0].Kind != TokEOF { + t.Errorf("empty source: expected [TokEOF], got %v", tokens) + } +} + +// TestTokenize_BasicKeywords verifies that `defmodule Foo do` produces correct tokens. +func TestTokenize_BasicKeywords(t *testing.T) { + assertKinds(t, "defmodule Foo do", []TokenKind{TokDefmodule, TokModule, TokDo}) + assertText(t, "defmodule Foo do", 1, "Foo") +} + +// TestTokenize_IdentVsKeyword verifies that `define` is TokIdent, not TokDef. +func TestTokenize_IdentVsKeyword(t *testing.T) { + assertKinds(t, "define", []TokenKind{TokIdent}) + assertText(t, "define", 0, "define") + assertKinds(t, "def", []TokenKind{TokDef}) + assertKinds(t, "defmodule_helper", []TokenKind{TokIdent}) +} + +// TestTokenize_AllDefKeywords verifies all def-family keywords are recognized. +func TestTokenize_AllDefKeywords(t *testing.T) { + cases := []struct { + source string + kind TokenKind + }{ + {"defmodule ", TokDefmodule}, + {"def ", TokDef}, + {"defp ", TokDefp}, + {"defmacro ", TokDefmacro}, + {"defmacrop ", TokDefmacrop}, + {"defguard ", TokDefguard}, + {"defguardp ", TokDefguardp}, + {"defdelegate ", TokDefdelegate}, + {"defprotocol ", TokDefprotocol}, + {"defimpl ", TokDefimpl}, + {"defstruct ", TokDefstruct}, + {"defexception ", TokDefexception}, + {"alias ", TokAlias}, + {"import ", TokImport}, + {"use ", TokUse}, + {"require ", TokRequire}, + {"do\n", TokDo}, + {"end", TokEnd}, + {"fn ", TokFn}, + } + for _, tc := range cases { + t.Run(tc.source, func(t *testing.T) { + tokens := Tokenize([]byte(tc.source)) + if tokens[0].Kind != tc.kind { + t.Errorf("source %q: first token = %s, want %s", tc.source, kindName(tokens[0].Kind), kindName(tc.kind)) + } + }) + } +} + +// TestTokenize_String verifies that "hello" produces TokString. 
+func TestTokenize_String(t *testing.T) { + assertKinds(t, `"hello"`, []TokenKind{TokString}) + assertText(t, `"hello"`, 0, `"hello"`) +} + +// TestTokenize_StringWithInterpolation verifies that the whole interpolated string is one TokString. +func TestTokenize_StringWithInterpolation(t *testing.T) { + source := `"hello #{World.name}"` + assertKinds(t, source, []TokenKind{TokString}) + assertText(t, source, 0, source) +} + +// TestTokenize_StringNestedInterpolation verifies nested strings inside interpolation are handled. +func TestTokenize_StringNestedInterpolation(t *testing.T) { + source := `"#{foo("arg")}"` + assertKinds(t, source, []TokenKind{TokString}) + assertText(t, source, 0, source) +} + +// TestTokenize_CharLiteralQuote verifies that ?" is TokCharLiteral, not TokString. +func TestTokenize_CharLiteralQuote(t *testing.T) { + assertKinds(t, `?"`, []TokenKind{TokCharLiteral}) + assertText(t, `?"`, 0, `?"`) +} + +// TestTokenize_CharLiteralSingleQuote verifies that ?' is TokCharLiteral, not TokString/charlist. +func TestTokenize_CharLiteralSingleQuote(t *testing.T) { + assertKinds(t, `?'`, []TokenKind{TokCharLiteral}) + assertText(t, `?'`, 0, `?'`) +} + +// TestTokenize_CharLiteralHash verifies that ?# is TokCharLiteral, not a comment. +func TestTokenize_CharLiteralHash(t *testing.T) { + assertKinds(t, `?#`, []TokenKind{TokCharLiteral}) + assertText(t, `?#`, 0, `?#`) +} + +// TestTokenize_CharLiteralEscape verifies backslash escape char literals. +func TestTokenize_CharLiteralEscape(t *testing.T) { + assertKinds(t, `?\n`, []TokenKind{TokCharLiteral}) + assertText(t, `?\n`, 0, `?\n`) + assertKinds(t, `?\\`, []TokenKind{TokCharLiteral}) +} + +// TestTokenize_Atom verifies that :foo produces TokAtom. +func TestTokenize_Atom(t *testing.T) { + assertKinds(t, ":foo", []TokenKind{TokAtom}) + assertText(t, ":foo", 0, ":foo") +} + +// TestTokenize_AtomQuoted verifies that :"hello" produces TokAtom. 
+func TestTokenize_AtomQuoted(t *testing.T) {
+	assertKinds(t, `:"hello world"`, []TokenKind{TokAtom})
+	assertText(t, `:"hello world"`, 0, `:"hello world"`)
+}
+
+// TestTokenize_DoubleColon verifies that :: is TokDoubleColon, not an atom.
+func TestTokenize_DoubleColon(t *testing.T) {
+	assertKinds(t, "::", []TokenKind{TokDoubleColon})
+}
+
+// TestTokenize_KeywordKey verifies that `as: Foo` is TokIdent + TokColon + TokModule (not atom).
+func TestTokenize_KeywordKey(t *testing.T) {
+	assertKinds(t, "as: Foo", []TokenKind{TokIdent, TokColon, TokModule})
+	assertText(t, "as: Foo", 0, "as")
+}
+
+// TestTokenize_Sigil verifies that a sigil produces a single TokSigil and no tokens inside.
+func TestTokenize_Sigil(t *testing.T) {
+	assertKinds(t, "~s(alias Fake.Module)", []TokenKind{TokSigil})
+}
+
+// TestTokenize_SigilBracketNested verifies nested brackets inside sigil are handled.
+func TestTokenize_SigilBracketNested(t *testing.T) {
+	assertKinds(t, "~s(foo (bar) baz)", []TokenKind{TokSigil})
+}
+
+// TestTokenize_SigilModifier verifies sigil with trailing modifier letters.
+func TestTokenize_SigilModifier(t *testing.T) {
+	assertKinds(t, "~r/foo/i", []TokenKind{TokSigil})
+	assertText(t, "~r/foo/i", 0, "~r/foo/i")
+}
+
+// TestTokenize_SigilHeredoc verifies that a heredoc sigil produces TokSigil (not TokHeredoc).
+func TestTokenize_SigilHeredoc(t *testing.T) {
+	source := "~s\"\"\"\nhello\n\"\"\""
+	assertKinds(t, source, []TokenKind{TokSigil})
+}
+
+// TestTokenize_Heredoc verifies that """ heredocs produce TokHeredoc.
+func TestTokenize_Heredoc(t *testing.T) {
+	source := "\"\"\"\nhello\n\"\"\""
+	assertKinds(t, source, []TokenKind{TokHeredoc})
+}
+
+// TestTokenize_HeredocSingleQuote verifies that ''' heredocs produce TokHeredoc.
+func TestTokenize_HeredocSingleQuote(t *testing.T) {
+	source := "'''\nhello\n'''"
+	assertKinds(t, source, []TokenKind{TokHeredoc})
+}
+
+// TestTokenize_ModuleName verifies `MyApp.Accounts` produces TokModule, TokDot, TokModule.
+func TestTokenize_ModuleName(t *testing.T) { + assertKinds(t, "MyApp.Accounts", []TokenKind{TokModule, TokDot, TokModule}) + assertText(t, "MyApp.Accounts", 0, "MyApp") + assertText(t, "MyApp.Accounts", 2, "Accounts") +} + +// TestTokenize_Comment verifies that # starts a comment and no keywords are recognized inside. +func TestTokenize_Comment(t *testing.T) { + assertKinds(t, "# defmodule Foo", []TokenKind{TokComment}) +} + +// TestTokenize_Attribute verifies that @doc produces TokAttrDoc. +func TestTokenize_Attribute(t *testing.T) { + assertKinds(t, "@doc", []TokenKind{TokAttrDoc}) + assertText(t, "@doc", 0, "@doc") +} + +// TestTokenize_AttrOther verifies that @ not followed by identifier is TokOther. +func TestTokenize_AttrOther(t *testing.T) { + assertKinds(t, "@ ", []TokenKind{TokOther}) +} + +// TestTokenize_SpecLine verifies the token kinds for a @spec line. +func TestTokenize_SpecLine(t *testing.T) { + source := "@spec foo(String.t()) :: {:ok, User.t()}" + tokens := tokenizeNoEOF(source) + var kinds []TokenKind + for _, tok := range tokens { + if tok.Kind != TokEOL { + kinds = append(kinds, tok.Kind) + } + } + expected := []TokenKind{ + TokAttrSpec, // @spec + TokIdent, // foo + TokOpenParen, // ( + TokModule, // String + TokDot, // . + TokIdent, // t + TokOpenParen, // ( + TokCloseParen, // ) + TokCloseParen, // ) + TokDoubleColon, // :: + TokOpenBrace, // { + TokAtom, // :ok + TokComma, // , + TokModule, // User + TokDot, // . + TokIdent, // t + TokOpenParen, // ( + TokCloseParen, // ) + TokCloseBrace, // } + } + if len(kinds) != len(expected) { + t.Errorf("spec line: got %d tokens, want %d", len(kinds), len(expected)) + t.Logf(" got: %v", kindSlice(kinds)) + t.Logf(" want: %v", kindSlice(expected)) + return + } + for i, k := range expected { + if kinds[i] != k { + t.Errorf("spec line: token[%d] = %s, want %s", i, kindName(kinds[i]), kindName(k)) + } + } +} + +// TestTokenize_LineNumbers verifies that newlines increment line numbers correctly. 
+func TestTokenize_LineNumbers(t *testing.T) { + source := "foo\nbar\nbaz" + tokens := Tokenize([]byte(source)) + // foo is on line 1, bar on line 2, baz on line 3 + var idents []Token + for _, tok := range tokens { + if tok.Kind == TokIdent { + idents = append(idents, tok) + } + } + if len(idents) != 3 { + t.Fatalf("expected 3 idents, got %d", len(idents)) + } + if idents[0].Line != 1 { + t.Errorf("foo: line = %d, want 1", idents[0].Line) + } + if idents[1].Line != 2 { + t.Errorf("bar: line = %d, want 2", idents[1].Line) + } + if idents[2].Line != 3 { + t.Errorf("baz: line = %d, want 3", idents[2].Line) + } +} + +// TestTokenize_MultilineString verifies that line numbers are tracked inside multi-line strings. +func TestTokenize_MultilineString(t *testing.T) { + source := "\"line1\nline2\"\nafter" + tokens := Tokenize([]byte(source)) + var afterTok Token + for _, tok := range tokens { + if tok.Kind == TokIdent { + afterTok = tok + } + } + if afterTok.Line != 3 { + t.Errorf("'after' token: line = %d, want 3", afterTok.Line) + } +} + +// TestTokenize_ModuleSpecial verifies that __MODULE__ is TokModule. +func TestTokenize_ModuleSpecial(t *testing.T) { + assertKinds(t, "__MODULE__", []TokenKind{TokModule}) + assertText(t, "__MODULE__", 0, "__MODULE__") +} + +// TestTokenize_Structural verifies structural tokens: <<, >>, |>, \\. +func TestTokenize_Structural(t *testing.T) { + assertKinds(t, "<<", []TokenKind{TokOpenAngle}) + assertKinds(t, ">>", []TokenKind{TokCloseAngle}) + assertKinds(t, "|>", []TokenKind{TokPipe}) + assertKinds(t, `\\`, []TokenKind{TokBackslash}) + assertKinds(t, "->", []TokenKind{TokRightArrow}) + assertKinds(t, "=>", []TokenKind{TokAssoc}) +} + +// TestTokenize_Dots verifies that . is TokDot, .. and ... are TokOther. 
+func TestTokenize_Dots(t *testing.T) { + assertKinds(t, ".", []TokenKind{TokDot}) + assertKinds(t, "..", []TokenKind{TokOther}) + assertKinds(t, "...", []TokenKind{TokOther}) +} + +// TestTokenize_KeywordInsideString verifies that keywords inside strings are not emitted. +func TestTokenize_KeywordInsideString(t *testing.T) { + assertKinds(t, `"defmodule"`, []TokenKind{TokString}) +} + +// TestTokenize_KeywordInsideComment verifies that keywords inside comments are not emitted. +func TestTokenize_KeywordInsideComment(t *testing.T) { + assertKinds(t, "# defmodule Foo", []TokenKind{TokComment}) +} + +// TestTokenize_KeywordFollowedByIdentChar verifies keywords need word boundary. +func TestTokenize_KeywordFollowedByIdentChar(t *testing.T) { + // defp followed by _ shouldn't produce TokDefp + assertKinds(t, "defp_helper", []TokenKind{TokIdent}) + // def followed by 2 shouldn't produce TokDef + assertKinds(t, "def2", []TokenKind{TokIdent}) +} + +// TestTokenize_DefWithParens verifies def followed by ( is a keyword. +func TestTokenize_DefWithParens(t *testing.T) { + assertKinds(t, "def(", []TokenKind{TokDef, TokOpenParen}) +} + +// TestTokenize_FullModule verifies a realistic module header parses correctly. 
+func TestTokenize_FullModule(t *testing.T) {
+	source := "defmodule MyApp.Accounts do\n @moduledoc false\nend"
+	tokens := tokenizeNoEOF(source)
+	var kinds []TokenKind
+	for _, tok := range tokens {
+		if tok.Kind != TokEOL {
+			kinds = append(kinds, tok.Kind)
+		}
+	}
+	expected := []TokenKind{
+		TokDefmodule,
+		TokModule, TokDot, TokModule,
+		TokDo,
+		TokAttrDoc, // @moduledoc
+		TokIdent,   // false — not a keyword, just an identifier
+		TokEnd,
+	}
+	if len(kinds) != len(expected) {
+		t.Errorf("full module: got %d tokens, want %d", len(kinds), len(expected))
+		t.Logf("  got:  %v", kindSlice(kinds))
+		t.Logf("  want: %v", kindSlice(expected))
+		return
+	}
+	for i, k := range expected {
+		if kinds[i] != k {
+			t.Errorf("full module: token[%d] = %s, want %s", i, kindName(kinds[i]), kindName(k))
+		}
+	}
+}
+
+// TestTokenize_Charlist verifies single-quoted charlists are TokString.
+func TestTokenize_Charlist(t *testing.T) {
+	assertKinds(t, `'hello'`, []TokenKind{TokString})
+}
+
+// TestTokenize_SigilWithAngleBrackets verifies angle bracket sigils.
+func TestTokenize_SigilWithAngleBrackets(t *testing.T) {
+	assertKinds(t, "~s<foo bar>", []TokenKind{TokSigil})
+}
+
+// TestTokenize_SigilWithSquareBrackets verifies square bracket sigils.
+func TestTokenize_SigilWithSquareBrackets(t *testing.T) {
+	assertKinds(t, "~w[foo bar baz]", []TokenKind{TokSigil})
+}
+
+// TestTokenize_AtomWithBang verifies atoms with ! work correctly.
+func TestTokenize_AtomWithBang(t *testing.T) {
+	assertKinds(t, ":ok!", []TokenKind{TokAtom})
+	assertText(t, ":ok!", 0, ":ok!")
+}
+
+// TestTokenize_BinaryPattern verifies << and >> tokens in binary syntax.
+func TestTokenize_BinaryPattern(t *testing.T) {
+	assertKinds(t, "<<foo>>", []TokenKind{TokOpenAngle, TokIdent, TokCloseAngle})
+}
+
+// TestTokenize_PipeChain verifies |> produces TokPipe.
+func TestTokenize_PipeChain(t *testing.T) { + assertKinds(t, "foo |> bar", []TokenKind{TokIdent, TokPipe, TokIdent}) +} + +// TestTokenize_DefaultParam verifies \\ produces TokBackslash. +func TestTokenize_DefaultParam(t *testing.T) { + assertKinds(t, `def foo(x \\ 0)`, []TokenKind{TokDef, TokIdent, TokOpenParen, TokIdent, TokBackslash, TokNumber, TokCloseParen}) +} + +// TestTokenize_HeredocLineNumbers verifies that line numbers are tracked inside heredocs. +func TestTokenize_HeredocLineNumbers(t *testing.T) { + source := "\"\"\"\nline1\nline2\n\"\"\"\nafter" + tokens := Tokenize([]byte(source)) + var afterTok Token + for _, tok := range tokens { + if tok.Kind == TokIdent && string([]byte(source)[tok.Start:tok.End]) == "after" { + afterTok = tok + } + } + if afterTok.Line != 5 { + t.Errorf("'after' token: line = %d, want 5", afterTok.Line) + } +} + +// TestTokenize_SigilHeredocLineNumbers verifies line numbers after a sigil heredoc. +func TestTokenize_SigilHeredocLineNumbers(t *testing.T) { + source := "~s\"\"\"\nhello\nworld\n\"\"\"\nafter" + tokens := Tokenize([]byte(source)) + var afterTok Token + for _, tok := range tokens { + if tok.Kind == TokIdent && string([]byte(source)[tok.Start:tok.End]) == "after" { + afterTok = tok + } + } + if afterTok.Line != 5 { + t.Errorf("'after' token after sigil heredoc: line = %d, want 5", afterTok.Line) + } +} + +// --- Edge cases from Elixir tokenizer cross-check --- + +func TestTokenize_NestedInterpolation(t *testing.T) { + // "This is a #{var("#{that}", here)}" → single TokString, then the code after is parsed correctly + source := `"This is #{var("#{that}", here)}" + x` + tokens := tokenizeNoEOF(source) + var nonEOL []Token + for _, tok := range tokens { + if tok.Kind != TokEOL { + nonEOL = append(nonEOL, tok) + } + } + // Should be: TokString, TokOther(+), TokIdent(x) + if len(nonEOL) < 3 { + t.Fatalf("expected at least 3 non-EOL tokens, got %d", len(nonEOL)) + } + if nonEOL[0].Kind != TokString { + t.Errorf("token[0] = 
%s, want TokString", kindName(nonEOL[0].Kind)) + } + // The last token should be the identifier "x" + last := nonEOL[len(nonEOL)-1] + if last.Kind != TokIdent || string([]byte(source)[last.Start:last.End]) != "x" { + t.Errorf("last token = %s %q, want TokIdent \"x\"", kindName(last.Kind), string([]byte(source)[last.Start:last.End])) + } +} + +func TestTokenize_CharLiteralCloseBraceInInterpolation(t *testing.T) { + // "#{?}}" — ?} is a char literal, the second } closes the interpolation + source := `"#{?}}" <> rest` + tokens := tokenizeNoEOF(source) + var nonEOL []Token + for _, tok := range tokens { + if tok.Kind != TokEOL { + nonEOL = append(nonEOL, tok) + } + } + // Should be: TokString("#{?}}"), TokOther(<>), TokIdent(rest) + if len(nonEOL) < 2 { + t.Fatalf("expected at least 2 tokens, got %d", len(nonEOL)) + } + if nonEOL[0].Kind != TokString { + t.Errorf("token[0] = %s, want TokString", kindName(nonEOL[0].Kind)) + } + lastTok := nonEOL[len(nonEOL)-1] + if lastTok.Kind != TokIdent || string([]byte(source)[lastTok.Start:lastTok.End]) != "rest" { + t.Errorf("last token = %s %q, want TokIdent \"rest\"", kindName(lastTok.Kind), string([]byte(source)[lastTok.Start:lastTok.End])) + } +} + +func TestTokenize_SigilInsideInterpolation(t *testing.T) { + // "#{~r/pat}tern/}" — the } inside the regex is part of the sigil, not interpolation + source := `"#{~r/pat}tern/}" <> rest` + tokens := tokenizeNoEOF(source) + var nonEOL []Token + for _, tok := range tokens { + if tok.Kind != TokEOL { + nonEOL = append(nonEOL, tok) + } + } + if len(nonEOL) < 2 { + t.Fatalf("expected at least 2 tokens, got %d", len(nonEOL)) + } + if nonEOL[0].Kind != TokString { + t.Errorf("token[0] = %s, want TokString", kindName(nonEOL[0].Kind)) + } + lastTok := nonEOL[len(nonEOL)-1] + if lastTok.Kind != TokIdent || string([]byte(source)[lastTok.Start:lastTok.End]) != "rest" { + t.Errorf("last token = %s %q, want TokIdent \"rest\"", kindName(lastTok.Kind), 
string([]byte(source)[lastTok.Start:lastTok.End])) + } +} + +func TestTokenize_UppercaseSigilNoEscapes(t *testing.T) { + // ~S"contains \" stuff" — uppercase sigil, backslash is NOT escape. + // The \" is literal \ then " which closes the sigil. + // So ~S"contains \" is the sigil, then stuff" is separate tokens. + source := `~S"contains \" stuff` + tokens := tokenizeNoEOF(source) + var nonEOL []Token + for _, tok := range tokens { + if tok.Kind != TokEOL { + nonEOL = append(nonEOL, tok) + } + } + if len(nonEOL) < 1 { + t.Fatal("expected at least 1 token") + } + if nonEOL[0].Kind != TokSigil { + t.Errorf("token[0] = %s, want TokSigil", kindName(nonEOL[0].Kind)) + } + // The sigil should end at the first unescaped " (which is the \" — no escape in uppercase sigil) + sigilText := string([]byte(source)[nonEOL[0].Start:nonEOL[0].End]) + expected := `~S"contains \"` + if sigilText != expected { + t.Errorf("sigil text = %q, want %q", sigilText, expected) + } +} + +func TestTokenize_LowercaseSigilEscapes(t *testing.T) { + // ~s"contains \" stuff" — lowercase sigil, backslash IS escape. + source := `~s"contains \" stuff"` + tokens := tokenizeNoEOF(source) + var nonEOL []Token + for _, tok := range tokens { + if tok.Kind != TokEOL { + nonEOL = append(nonEOL, tok) + } + } + if len(nonEOL) != 1 { + t.Fatalf("expected 1 token, got %d: %v", len(nonEOL), kindSlice(kindsOf(nonEOL))) + } + if nonEOL[0].Kind != TokSigil { + t.Errorf("token[0] = %s, want TokSigil", kindName(nonEOL[0].Kind)) + } + sigilText := string([]byte(source)[nonEOL[0].Start:nonEOL[0].End]) + if sigilText != source { + t.Errorf("sigil text = %q, want %q", sigilText, source) + } +} + +// --- Broken/incomplete code tests (LSP context: code is frequently mid-edit) --- + +func TestTokenize_UnterminatedString(t *testing.T) { + // User is mid-edit, string not closed + source := "def foo do\n x = \"hello\nend" + tokens := Tokenize([]byte(source)) + // Must not panic. Should produce tokens and end with TokEOF. 
+ if tokens[len(tokens)-1].Kind != TokEOF { + t.Error("expected TokEOF at end") + } +} + +func TestTokenize_UnterminatedHeredoc(t *testing.T) { + source := "def foo do\n @doc \"\"\"\n some docs\n" + tokens := Tokenize([]byte(source)) + if tokens[len(tokens)-1].Kind != TokEOF { + t.Error("expected TokEOF at end") + } +} + +func TestTokenize_UnterminatedSigil(t *testing.T) { + source := "~r(pattern" + tokens := Tokenize([]byte(source)) + if tokens[len(tokens)-1].Kind != TokEOF { + t.Error("expected TokEOF at end") + } + if tokens[0].Kind != TokSigil { + t.Errorf("token[0] = %s, want TokSigil", kindName(tokens[0].Kind)) + } +} + +func TestTokenize_UnterminatedInterpolation(t *testing.T) { + // String with unclosed interpolation: "hello #{world + source := "\"hello #{world\nend" + tokens := Tokenize([]byte(source)) + if tokens[len(tokens)-1].Kind != TokEOF { + t.Error("expected TokEOF at end") + } +} + +func TestTokenize_TrailingBackslashInString(t *testing.T) { + // String ending with backslash at EOF: "hello\ + source := "\"hello\\" + tokens := Tokenize([]byte(source)) + if tokens[len(tokens)-1].Kind != TokEOF { + t.Error("expected TokEOF at end") + } +} + +func TestTokenize_LoneQuestion(t *testing.T) { + // ? at end of file + source := "?" 
+ tokens := Tokenize([]byte(source)) + if tokens[len(tokens)-1].Kind != TokEOF { + t.Error("expected TokEOF at end") + } + // Should emit TokCharLiteral (even if malformed) + if tokens[0].Kind != TokCharLiteral { + t.Errorf("token[0] = %s, want TokCharLiteral", kindName(tokens[0].Kind)) + } +} + +func TestTokenize_LoneTilde(t *testing.T) { + // ~ at end of file + source := "~" + tokens := Tokenize([]byte(source)) + if tokens[len(tokens)-1].Kind != TokEOF { + t.Error("expected TokEOF at end") + } +} + +func TestTokenize_TildeLetterNoDelimiter(t *testing.T) { + // ~r at end of file (no delimiter) + source := "~r" + tokens := Tokenize([]byte(source)) + if tokens[len(tokens)-1].Kind != TokEOF { + t.Error("expected TokEOF at end") + } +} + +func TestTokenize_EmptyString(t *testing.T) { + source := `""` + assertKinds(t, source, []TokenKind{TokString}) +} + +func TestTokenize_PartialDefmodule(t *testing.T) { + // User is typing "defmo" — not a full keyword yet + source := "defmo" + tokens := tokenizeNoEOF(source) + var nonEOL []Token + for _, tok := range tokens { + if tok.Kind != TokEOL { + nonEOL = append(nonEOL, tok) + } + } + if len(nonEOL) != 1 || nonEOL[0].Kind != TokIdent { + t.Errorf("partial keyword 'defmo' should be TokIdent, got %v", kindSlice(kindsOf(nonEOL))) + } +} + +func TestTokenize_CodeAfterUnterminatedSigil(t *testing.T) { + // Even if a sigil eats the rest of the file, we must not panic or loop forever + source := "~s(unclosed sigil\ndefmodule Foo do\nend" + tokens := Tokenize([]byte(source)) + if tokens[len(tokens)-1].Kind != TokEOF { + t.Error("expected TokEOF at end") + } +} + +func TestTokenize_AtSignAtEOF(t *testing.T) { + source := "@" + tokens := Tokenize([]byte(source)) + if tokens[len(tokens)-1].Kind != TokEOF { + t.Error("expected TokEOF at end") + } +} + +func TestTokenize_ColonAtEOF(t *testing.T) { + source := ":" + tokens := Tokenize([]byte(source)) + if tokens[len(tokens)-1].Kind != TokEOF { + t.Error("expected TokEOF at end") + } + if 
tokens[0].Kind != TokColon { + t.Errorf("token[0] = %s, want TokColon", kindName(tokens[0].Kind)) + } +} + +func TestTokenize_UppercaseSigilHeredocNoEscapes(t *testing.T) { + // ~S""" should not process escapes + source := "~S\"\"\"\ncontains \\\" stuff\n\"\"\"\nafter" + tokens := tokenizeNoEOF(source) + var nonEOL []Token + for _, tok := range tokens { + if tok.Kind != TokEOL { + nonEOL = append(nonEOL, tok) + } + } + // Should be: TokSigil (the whole heredoc), TokIdent("after") + if len(nonEOL) < 2 { + t.Fatalf("expected at least 2 tokens, got %d", len(nonEOL)) + } + if nonEOL[0].Kind != TokSigil { + t.Errorf("token[0] = %s, want TokSigil", kindName(nonEOL[0].Kind)) + } + lastTok := nonEOL[len(nonEOL)-1] + if lastTok.Kind != TokIdent || string([]byte(source)[lastTok.Start:lastTok.End]) != "after" { + t.Errorf("last token = %s %q, want TokIdent \"after\"", kindName(lastTok.Kind), string([]byte(source)[lastTok.Start:lastTok.End])) + } +} + +// --- Edge cases from Elixir tokenizer test suite cross-check --- + +func TestTokenize_EscapedInterpolation(t *testing.T) { + // \#{ inside a string is NOT interpolation — it's a literal #{ + source := `"hello \#{world}" <> rest` + tokens := tokenizeNoEOF(source) + var nonEOL []Token + for _, tok := range tokens { + if tok.Kind != TokEOL { + nonEOL = append(nonEOL, tok) + } + } + // Should be: TokString("hello \#{world}"), TokOther(<), TokOther(>), TokIdent(rest) + if nonEOL[0].Kind != TokString { + t.Errorf("token[0] = %s, want TokString", kindName(nonEOL[0].Kind)) + } + lastTok := nonEOL[len(nonEOL)-1] + if lastTok.Kind != TokIdent || string([]byte(source)[lastTok.Start:lastTok.End]) != "rest" { + t.Errorf("last token = %s %q, want TokIdent \"rest\"", kindName(lastTok.Kind), string([]byte(source)[lastTok.Start:lastTok.End])) + } +} + +func TestTokenize_SigilEscapedDelimiter(t *testing.T) { + // ~s(f\(oo) — escaped paren inside sigil is literal, not nesting + source := `~s(f\(oo) + x` + tokens := tokenizeNoEOF(source) + var 
nonEOL []Token + for _, tok := range tokens { + if tok.Kind != TokEOL { + nonEOL = append(nonEOL, tok) + } + } + if nonEOL[0].Kind != TokSigil { + t.Errorf("token[0] = %s, want TokSigil", kindName(nonEOL[0].Kind)) + } + sigilText := string([]byte(source)[nonEOL[0].Start:nonEOL[0].End]) + if sigilText != `~s(f\(oo)` { + t.Errorf("sigil text = %q, want %q", sigilText, `~s(f\(oo)`) + } + lastTok := nonEOL[len(nonEOL)-1] + if lastTok.Kind != TokIdent || string([]byte(source)[lastTok.Start:lastTok.End]) != "x" { + t.Errorf("last token = %s %q, want TokIdent \"x\"", kindName(lastTok.Kind), string([]byte(source)[lastTok.Start:lastTok.End])) + } +} + +func TestTokenize_DotAfterNewline(t *testing.T) { + // Dot on next line is a valid remote call continuation + source := "Foo\n.bar" + tokens := tokenizeNoEOF(source) + var nonEOL []Token + for _, tok := range tokens { + if tok.Kind != TokEOL { + nonEOL = append(nonEOL, tok) + } + } + // TokModule("Foo"), TokDot, TokIdent("bar") + if len(nonEOL) != 3 { + t.Fatalf("expected 3 tokens, got %d: %v", len(nonEOL), kindSlice(kindsOf(nonEOL))) + } + if nonEOL[0].Kind != TokModule { + t.Errorf("token[0] = %s, want TokModule", kindName(nonEOL[0].Kind)) + } + if nonEOL[1].Kind != TokDot { + t.Errorf("token[1] = %s, want TokDot", kindName(nonEOL[1].Kind)) + } + if nonEOL[2].Kind != TokIdent { + t.Errorf("token[2] = %s, want TokIdent", kindName(nonEOL[2].Kind)) + } +} + +func TestTokenize_AtomWithOperatorName(t *testing.T) { + // Atoms can be operator names: :+, :==, :|| + source := ":+ :== :||" + tokens := tokenizeNoEOF(source) + var nonEOL []Token + for _, tok := range tokens { + if tok.Kind != TokEOL { + nonEOL = append(nonEOL, tok) + } + } + // These would not match our ident-based atom scanner — they become TokColon + TokOther + // That's acceptable for Dexter's purposes. Just verify no panic. 
+	if len(nonEOL) == 0 {
+		t.Error("expected some tokens")
+	}
+}
+
+func TestTokenize_HeredocOpeningMustEndLine(t *testing.T) {
+	// The opening """ must be followed by a newline; the closing """ must
+	// sit on a line of its own.
+	source := "\"\"\"\ncontent\n\"\"\""
+	tokens := tokenizeNoEOF(source)
+	var nonEOL []Token
+	for _, tok := range tokens {
+		if tok.Kind != TokEOL {
+			nonEOL = append(nonEOL, tok)
+		}
+	}
+	if len(nonEOL) != 1 || nonEOL[0].Kind != TokHeredoc {
+		t.Errorf("expected single TokHeredoc, got %v", kindSlice(kindsOf(nonEOL)))
+	}
+}
+
+func TestTokenize_HeredocWithIndentedClosing(t *testing.T) {
+	// Closing """ can have leading whitespace
+	source := "\"\"\"\n content\n \"\"\""
+	tokens := tokenizeNoEOF(source)
+	var nonEOL []Token
+	for _, tok := range tokens {
+		if tok.Kind != TokEOL {
+			nonEOL = append(nonEOL, tok)
+		}
+	}
+	if len(nonEOL) != 1 || nonEOL[0].Kind != TokHeredoc {
+		t.Errorf("expected single TokHeredoc, got %v", kindSlice(kindsOf(nonEOL)))
+	}
+}
+
+func TestTokenize_SigilWithModifiers(t *testing.T) {
+	// ~r/foo/iu — modifiers after closing delimiter
+	source := `~r/foo/iu
+x`
+	tokens := tokenizeNoEOF(source)
+	var nonEOL []Token
+	for _, tok := range tokens {
+		if tok.Kind != TokEOL {
+			nonEOL = append(nonEOL, tok)
+		}
+	}
+	if nonEOL[0].Kind != TokSigil {
+		t.Errorf("token[0] = %s, want TokSigil", kindName(nonEOL[0].Kind))
+	}
+	sigilText := string([]byte(source)[nonEOL[0].Start:nonEOL[0].End])
+	if sigilText != "~r/foo/iu" {
+		t.Errorf("sigil text = %q, want %q", sigilText, "~r/foo/iu")
+	}
+}
+
+func TestTokenize_QuotedAtomWithInterpolation(t *testing.T) {
+	// :"hello #{world}" — quoted atom with interpolation. A Go raw string
+	// needs no escaping here, since the literal contains no backquotes.
+	src := []byte(`:"hello #{world}" <> rest`)
+	tokens := Tokenize(src)
+	if tokens[0].Kind != TokAtom {
+		t.Errorf("token[0] = %s, want TokAtom", kindName(tokens[0].Kind))
+	}
+}
+
+func TestTokenize_NumberLiterals(t *testing.T) {
+	assertKinds(t, "42", []TokenKind{TokNumber})
+	assertKinds(t, "1_000_000", []TokenKind{TokNumber})
+	assertKinds(t, "0xFF", []TokenKind{TokNumber})
+	assertKinds(t, "0b101", []TokenKind{TokNumber})
+	assertKinds(t, "0o777", []TokenKind{TokNumber})
+	assertKinds(t, "3.14", []TokenKind{TokNumber})
+	assertKinds(t, "1.0e10", []TokenKind{TokNumber})
+	assertKinds(t, "1.0e-3", []TokenKind{TokNumber})
+	// Multiple numbers with operators
+	assertKinds(t, "1 + 2", []TokenKind{TokNumber, TokOther, TokNumber})
+}
+
+func TestTokenize_OnlyWhitespace(t *testing.T) {
+	source := " \t\t \n \n"
+	tokens := Tokenize([]byte(source))
+	if tokens[len(tokens)-1].Kind != TokEOF {
+		t.Error("expected TokEOF")
+	}
+}
+
+func TestTokenize_NullBytesInSource(t *testing.T) {
+	// Binary content in file — should not panic
+	source := []byte("def foo\x00do\nend")
+	tokens := Tokenize(source)
+	if tokens[len(tokens)-1].Kind != TokEOF {
+		t.Error("expected TokEOF")
+	}
+}
+
+func TestTokenize_TripleColons(t *testing.T) {
+	// ::: — should not cause issues
+	source := ":::"
+	tokens := Tokenize([]byte(source))
+	if tokens[len(tokens)-1].Kind != TokEOF {
+		t.Error("expected TokEOF")
+	}
+}
+
+func TestTokenize_DoubleColonInSpec(t *testing.T) {
+	// @spec foo() :: {:ok, term()}
+	source := "@spec foo() :: {:ok, term()}"
+	assertKinds(t, source, []TokenKind{
+		TokAttrSpec,    // @spec
+		TokIdent,       // foo
+		TokOpenParen,   // (
+		TokCloseParen,  // )
+		TokDoubleColon, // ::
+		TokOpenBrace,   // {
+		TokAtom,        // :ok
+		TokComma,       // ,
+		TokIdent,       // term
+		TokOpenParen,   // (
+		TokCloseParen,  // )
+		TokCloseBrace,  // }
+	})
+}
+
+func 
TestTokenize_CharLiteralBeforeString(t *testing.T) { + // ?", "hello" — char literal then string, not confused + source := `?", "hello"` + tokens := tokenizeNoEOF(source) + var nonEOL []Token + for _, tok := range tokens { + if tok.Kind != TokEOL { + nonEOL = append(nonEOL, tok) + } + } + if len(nonEOL) < 3 { + t.Fatalf("expected at least 3 tokens, got %d: %v", len(nonEOL), kindSlice(kindsOf(nonEOL))) + } + if nonEOL[0].Kind != TokCharLiteral { + t.Errorf("token[0] = %s, want TokCharLiteral", kindName(nonEOL[0].Kind)) + } + // Find the string token + found := false + for _, tok := range nonEOL { + if tok.Kind == TokString { + found = true + break + } + } + if !found { + t.Error("expected a TokString token for \"hello\"") + } +} + +// --- Unicode identifier tests --- + +func TestTokenize_UnicodeModuleName(t *testing.T) { + // Élixir is a valid module name with Unicode + source := "defmodule Élixir do end" + tokens := tokenizeNoEOF(source) + var nonEOL []Token + for _, tok := range tokens { + if tok.Kind != TokEOL { + nonEOL = append(nonEOL, tok) + } + } + if len(nonEOL) < 3 { + t.Fatalf("expected at least 3 tokens, got %d: %v", len(nonEOL), kindSlice(kindsOf(nonEOL))) + } + if nonEOL[0].Kind != TokDefmodule { + t.Errorf("token[0] = %s, want TokDefmodule", kindName(nonEOL[0].Kind)) + } + if nonEOL[1].Kind != TokModule { + t.Errorf("token[1] = %s, want TokModule", kindName(nonEOL[1].Kind)) + } + text := string([]byte(source)[nonEOL[1].Start:nonEOL[1].End]) + if text != "Élixir" { + t.Errorf("module name = %q, want %q", text, "Élixir") + } +} + +func TestTokenize_UnicodeLowercaseIdent(t *testing.T) { + // ólá is a valid lowercase identifier + source := "ólá = 1" + tokens := tokenizeNoEOF(source) + var nonEOL []Token + for _, tok := range tokens { + if tok.Kind != TokEOL { + nonEOL = append(nonEOL, tok) + } + } + if len(nonEOL) < 1 { + t.Fatal("expected at least 1 token") + } + if nonEOL[0].Kind != TokIdent { + t.Errorf("token[0] = %s, want TokIdent", 
kindName(nonEOL[0].Kind)) + } + text := string([]byte(source)[nonEOL[0].Start:nonEOL[0].End]) + if text != "ólá" { + t.Errorf("ident text = %q, want %q", text, "ólá") + } +} + +func TestTokenize_UnicodeAtom(t *testing.T) { + // :ólá is a valid atom + source := ":ólá" + tokens := tokenizeNoEOF(source) + var nonEOL []Token + for _, tok := range tokens { + if tok.Kind != TokEOL { + nonEOL = append(nonEOL, tok) + } + } + if len(nonEOL) != 1 { + t.Fatalf("expected 1 token, got %d: %v", len(nonEOL), kindSlice(kindsOf(nonEOL))) + } + if nonEOL[0].Kind != TokAtom { + t.Errorf("token[0] = %s, want TokAtom", kindName(nonEOL[0].Kind)) + } + text := string([]byte(source)[nonEOL[0].Start:nonEOL[0].End]) + if text != ":ólá" { + t.Errorf("atom text = %q, want %q", text, ":ólá") + } +} + +func TestTokenize_UnicodeInModuleDotChain(t *testing.T) { + // Élixir.Módulo.foo + source := "Élixir.Módulo.foo" + tokens := tokenizeNoEOF(source) + var nonEOL []Token + for _, tok := range tokens { + if tok.Kind != TokEOL { + nonEOL = append(nonEOL, tok) + } + } + // TokModule("Élixir"), TokDot, TokModule("Módulo"), TokDot, TokIdent("foo") + if len(nonEOL) != 5 { + t.Fatalf("expected 5 tokens, got %d: %v", len(nonEOL), kindSlice(kindsOf(nonEOL))) + } + if nonEOL[0].Kind != TokModule { + t.Errorf("token[0] = %s, want TokModule", kindName(nonEOL[0].Kind)) + } + if nonEOL[2].Kind != TokModule { + t.Errorf("token[2] = %s, want TokModule", kindName(nonEOL[2].Kind)) + } +} + +func TestTokenize_JapaneseAtom(t *testing.T) { + // :こんにちは — CJK characters in atom + source := ":こんにちは" + tokens := tokenizeNoEOF(source) + var nonEOL []Token + for _, tok := range tokens { + if tok.Kind != TokEOL { + nonEOL = append(nonEOL, tok) + } + } + if len(nonEOL) != 1 || nonEOL[0].Kind != TokAtom { + t.Errorf("expected single TokAtom, got %v", kindSlice(kindsOf(nonEOL))) + } +} + +func TestTokenize_MixedASCIIUnicodeIdent(t *testing.T) { + // http_сервер — Latin + Cyrillic + source := "http_сервер = 1" + tokens := 
tokenizeNoEOF(source) + var nonEOL []Token + for _, tok := range tokens { + if tok.Kind != TokEOL { + nonEOL = append(nonEOL, tok) + } + } + if nonEOL[0].Kind != TokIdent { + t.Errorf("token[0] = %s, want TokIdent", kindName(nonEOL[0].Kind)) + } + text := string([]byte(source)[nonEOL[0].Start:nonEOL[0].End]) + if text != "http_сервер" { + t.Errorf("ident text = %q, want %q", text, "http_сервер") + } +} + +func TestTokenize_EmojiIsNotIdentifier(t *testing.T) { + // 🎉 is not a letter — should be TokOther + source := "🎉 + x" + tokens := tokenizeNoEOF(source) + var nonEOL []Token + for _, tok := range tokens { + if tok.Kind != TokEOL { + nonEOL = append(nonEOL, tok) + } + } + if nonEOL[0].Kind == TokIdent || nonEOL[0].Kind == TokModule { + t.Errorf("emoji should not be TokIdent or TokModule, got %s", kindName(nonEOL[0].Kind)) + } + // The "x" at the end should still be a valid TokIdent + lastTok := nonEOL[len(nonEOL)-1] + if lastTok.Kind != TokIdent { + t.Errorf("last token = %s, want TokIdent", kindName(lastTok.Kind)) + } +} + +func TestTokenize_AttrWithUnicode(t *testing.T) { + // @módulo — attribute name containing Unicode + source := "@módulo" + tokens := Tokenize([]byte(source)) + if tokens[len(tokens)-1].Kind != TokEOF { + t.Error("expected TokEOF") + } + if tokens[0].Kind != TokAttr { + t.Errorf("token[0] = %s, want TokAttr", kindName(tokens[0].Kind)) + } + text := string([]byte(source)[tokens[0].Start:tokens[0].End]) + if text != "@módulo" { + t.Errorf("attr text = %q, want %q", text, "@módulo") + } +} + +// --- Keyword list edge cases --- + +func TestTokenize_KeywordList(t *testing.T) { + // [:foo, :bar, three: :four, five: :six] + source := "[:foo, :bar, three: :four, five: :six]" + assertKinds(t, source, []TokenKind{ + TokOpenBracket, // [ + TokAtom, // :foo + TokComma, // , + TokAtom, // :bar + TokComma, // , + TokIdent, // three + TokColon, // : + TokAtom, // :four + TokComma, // , + TokIdent, // five + TokColon, // : + TokAtom, // :six + TokCloseBracket, 
// ] + }) +} + +func TestTokenize_KeywordListWithTuples(t *testing.T) { + // [:foo, :bar, {:three, :four}] + source := "[:foo, :bar, {:three, :four}]" + assertKinds(t, source, []TokenKind{ + TokOpenBracket, // [ + TokAtom, // :foo + TokComma, // , + TokAtom, // :bar + TokComma, // , + TokOpenBrace, // { + TokAtom, // :three + TokComma, // , + TokAtom, // :four + TokCloseBrace, // } + TokCloseBracket, // ] + }) +} + +func TestTokenize_KeywordArgInFunctionCall(t *testing.T) { + // func(arg, key: value, other: thing) + source := "func(arg, key: value, other: thing)" + assertKinds(t, source, []TokenKind{ + TokIdent, // func + TokOpenParen, // ( + TokIdent, // arg + TokComma, // , + TokIdent, // key + TokColon, // : + TokIdent, // value + TokComma, // , + TokIdent, // other + TokColon, // : + TokIdent, // thing + TokCloseParen, // ) + }) +} + +func TestTokenize_AliasWithKeywordAs(t *testing.T) { + // alias MyApp.Foo, as: Bar + source := "alias MyApp.Foo, as: Bar" + assertKinds(t, source, []TokenKind{ + TokAlias, // alias + TokModule, // MyApp + TokDot, // . + TokModule, // Foo + TokComma, // , + TokIdent, // as + TokColon, // : + TokModule, // Bar + }) +} + +func TestTokenize_UseWithKeywordOpts(t *testing.T) { + // use Phoenix.Controller, namespace: MyApp.Web + source := "use Phoenix.Controller, namespace: MyApp.Web" + assertKinds(t, source, []TokenKind{ + TokUse, // use + TokModule, // Phoenix + TokDot, // . + TokModule, // Controller + TokComma, // , + TokIdent, // namespace + TokColon, // : + TokModule, // MyApp + TokDot, // . 
+ TokModule, // Web + }) +} + +func TestTokenize_MapLiteral(t *testing.T) { + // %{name: "foo", age: 1} + source := `%{name: "foo", age: 1}` + assertKinds(t, source, []TokenKind{ + TokPercent, // % + TokOpenBrace, // { + TokIdent, // name + TokColon, // : + TokString, // "foo" + TokComma, // , + TokIdent, // age + TokColon, // : + TokNumber, // 1 + TokCloseBrace, // } + }) +} + +func TestTokenize_StructLiteral(t *testing.T) { + // %User{name: "foo"} + source := `%User{name: "foo"}` + assertKinds(t, source, []TokenKind{ + TokPercent, // % + TokModule, // User + TokOpenBrace, // { + TokIdent, // name + TokColon, // : + TokString, // "foo" + TokCloseBrace, // } + }) +} + +func TestTokenize_DefdelegateWithKeywordOpts(t *testing.T) { + // defdelegate foo(x), to: Mod, as: :bar + source := "defdelegate foo(x), to: Mod, as: :bar" + assertKinds(t, source, []TokenKind{ + TokDefdelegate, // defdelegate + TokIdent, // foo + TokOpenParen, // ( + TokIdent, // x + TokCloseParen, // ) + TokComma, // , + TokIdent, // to + TokColon, // : + TokModule, // Mod + TokComma, // , + TokIdent, // as + TokColon, // : + TokAtom, // :bar + }) +} + +func TestTokenize_AfterDotKeywordBecomesIdent(t *testing.T) { + // foo.do should emit TokIdent, not TokDo + assertKinds(t, "foo.do", []TokenKind{ + TokIdent, // foo + TokDot, // . + TokIdent, // do (de-keyworded because after dot) + }) + // foo.end should emit TokIdent, not TokEnd + assertKinds(t, "foo.end", []TokenKind{ + TokIdent, // foo + TokDot, // . + TokIdent, // end + }) + // foo.def should emit TokIdent, not TokDef + assertKinds(t, "foo.def", []TokenKind{ + TokIdent, // foo + TokDot, // . + TokIdent, // def + }) + // foo.fn should emit TokIdent, not TokFn + assertKinds(t, "foo.fn", []TokenKind{ + TokIdent, // foo + TokDot, // . 
+ TokIdent, // fn + }) +} + +func TestTokenize_AfterDotKeywordWithNewline(t *testing.T) { + // afterDot persists through whitespace and newlines + source := "foo.\n do" + assertKinds(t, source, []TokenKind{ + TokIdent, TokDot, TokEOL, TokIdent, + }) +} + +func TestTokenize_AfterDotKeywordWithComment(t *testing.T) { + // afterDot persists through comments + source := "foo. # comment\ndo" + assertKinds(t, source, []TokenKind{ + TokIdent, TokDot, TokComment, TokEOL, TokIdent, + }) +} + +func TestTokenize_AfterDotClearedByNonDot(t *testing.T) { + // afterDot does NOT persist through other tokens + // foo.bar, do — the comma clears afterDot, so "do" is a keyword + assertKinds(t, "foo.bar, do", []TokenKind{ + TokIdent, TokDot, TokIdent, TokComma, TokDo, + }) +} + +func TestTokenize_OperatorAtoms(t *testing.T) { + assertKinds(t, ":+", []TokenKind{TokAtom}) + assertKinds(t, ":-", []TokenKind{TokAtom}) + assertKinds(t, ":&&", []TokenKind{TokAtom}) + assertKinds(t, ":>>>", []TokenKind{TokAtom}) + assertKinds(t, ":||", []TokenKind{TokAtom}) + assertKinds(t, ":|>", []TokenKind{TokAtom}) + assertKinds(t, ":!", []TokenKind{TokAtom}) + assertKinds(t, ":~", []TokenKind{TokAtom}) + assertKinds(t, ":\\\\", []TokenKind{TokAtom}) +} + +func TestTokenize_IdentWithAt(t *testing.T) { + // Elixir allows @ inside identifiers (e.g. 
a@b) + assertKinds(t, "a@b", []TokenKind{TokIdent}) + + tokens := tokenizeNoEOF("a@b") + if string([]byte("a@b")[tokens[0].Start:tokens[0].End]) != "a@b" { + t.Errorf("expected a@b to be a single identifier") + } +} + +func TestTokenize_When(t *testing.T) { + // when as keyword in guard clause + assertKinds(t, "def foo(x) when is_integer(x) do", []TokenKind{ + TokDef, TokIdent, TokOpenParen, TokIdent, TokCloseParen, + TokWhen, TokIdent, TokOpenParen, TokIdent, TokCloseParen, TokDo, + }) + // when: as keyword key → TokIdent, not TokWhen + assertKinds(t, "[when: true]", []TokenKind{ + TokOpenBracket, TokIdent, TokColon, TokIdent, TokCloseBracket, + }) +} + +func TestTokenize_KeywordAsKeywordKey(t *testing.T) { + // do: inline syntax → TokIdent, not TokDo + assertKinds(t, "def foo, do: :bar", []TokenKind{ + TokDef, TokIdent, TokComma, TokIdent, TokColon, TokAtom, + }) + // end: as keyword key + assertKinds(t, "[end: 1]", []TokenKind{ + TokOpenBracket, TokIdent, TokColon, TokNumber, TokCloseBracket, + }) + // fn: as keyword key + assertKinds(t, "[fn: 1]", []TokenKind{ + TokOpenBracket, TokIdent, TokColon, TokNumber, TokCloseBracket, + }) + // do without colon is still TokDo + assertKinds(t, "do", []TokenKind{TokDo}) + // do:: (double colon) — do is still TokDo, :: is TokDoubleColon + assertKinds(t, "do::", []TokenKind{TokDo, TokDoubleColon}) +} + +func TestTokenizeFull_LineStarts(t *testing.T) { + source := "defmodule Foo do\n def bar do\n :ok\n end\nend\n" + result := TokenizeFull([]byte(source)) + + // 6 lines (5 newlines + line 1) + if len(result.LineStarts) != 6 { + t.Fatalf("expected 6 line starts, got %d: %v", len(result.LineStarts), result.LineStarts) + } + if result.LineStarts[0] != 0 { + t.Errorf("line 1 should start at 0, got %d", result.LineStarts[0]) + } + // Line 2 starts after "defmodule Foo do\n" = 17 bytes + if result.LineStarts[1] != 17 { + t.Errorf("line 2 should start at 17, got %d", result.LineStarts[1]) + } + + // Verify column calculation: "def" 
on line 2 starts at byte 19 (2 spaces + "def")
+	// Column = offset - lineStarts[line-1] = 19 - 17 = 2 (0-based)
+	defTok := result.Tokens[0]
+	for _, tok := range result.Tokens {
+		if tok.Kind == TokDef {
+			defTok = tok
+			break
+		}
+	}
+	col := defTok.Start - result.LineStarts[defTok.Line-1]
+	if col != 2 {
+		t.Errorf("def column should be 2 (0-based), got %d", col)
+	}
+}
+
+func TestTokenize_LeftArrow(t *testing.T) {
+	// <- in for comprehension
+	assertKinds(t, "for x <- list do", []TokenKind{
+		TokIdent, TokIdent, TokLeftArrow, TokIdent, TokDo,
+	})
+	// << is still TokOpenAngle, not confused with <-
+	assertKinds(t, "<<x>>", []TokenKind{TokOpenAngle, TokIdent, TokCloseAngle})
+}
+
+func TestTokenize_MultiCharSigil(t *testing.T) {
+	// ~HTML is a multi-char uppercase sigil (Elixir 1.15+)
+	assertKinds(t, `~HTML"""
hello
"""`, []TokenKind{TokSigil}) + assertKinds(t, `~HEEX"
"`, []TokenKind{TokSigil}) + assertKinds(t, `~JSON[{"a": 1}]`, []TokenKind{TokSigil}) + + // Verify the full token text is captured + tokens := tokenizeNoEOF(`~HTML"""
hello
"""`) + source := `~HTML"""
hello
"""` + if string(source[tokens[0].Start:tokens[0].End]) != source { + t.Errorf("expected full sigil text, got %q", string(source[tokens[0].Start:tokens[0].End])) + } + + // Multi-char sigils are raw (no escape processing), like uppercase single-char + raw := `~HTML"""hello\nworld"""` + tokensRaw := tokenizeNoEOF(raw) + if len(tokensRaw) != 1 || tokensRaw[0].Kind != TokSigil { + t.Errorf("expected single TokSigil for raw multi-char sigil") + } + + // Single lowercase letter is still just one char — ~sigil is NOT multi-char + // ~s followed by ( is a normal single-char sigil + assertKinds(t, `~s(hello)`, []TokenKind{TokSigil}) +} + +func TestTokenize_BrokenCodeRecovery(t *testing.T) { + // Unterminated string consumes to EOF (Elixir allows multi-line strings). + // The key property: we don't crash and always produce EOF. + source := "\"unterminated\ndefmodule Foo do\nend" + tokens := Tokenize([]byte(source)) + if tokens[len(tokens)-1].Kind != TokEOF { + t.Fatal("expected EOF token at end") + } + + // If the broken string is accidentally closed by a later quote, code after recovers. 
+	source2 := "\"oops\" def bar, do: :ok"
+	tokens2 := Tokenize([]byte(source2))
+	if tokens2[len(tokens2)-1].Kind != TokEOF {
+		t.Fatal("expected EOF token at end")
+	}
+	hasDef := false
+	for _, tok := range tokens2 {
+		if tok.Kind == TokDef {
+			hasDef = true
+		}
+	}
+	if !hasDef {
+		t.Error("expected to find TokDef after closed string")
+	}
+}
+
+func TestTokenize_UnterminatedQuotedAtom(t *testing.T) {
+	// :"foo without closing quote — should not panic, should produce tokens
+	source := `:"foo`
+	tokens := Tokenize([]byte(source))
+	if len(tokens) == 0 {
+		t.Fatal("expected at least one token")
+	}
+	if tokens[len(tokens)-1].Kind != TokEOF {
+		t.Error("expected EOF token at end")
+	}
+}
+
+func TestTokenize_PartialUTF8(t *testing.T) {
+	// Truncated UTF-8 sequence: é is 0xC3 0xA9, send only 0xC3
+	source := []byte{0xC3}
+	tokens := Tokenize(source)
+	if len(tokens) == 0 {
+		t.Fatal("expected at least one token")
+	}
+	if tokens[len(tokens)-1].Kind != TokEOF {
+		t.Error("expected EOF token at end")
+	}
+}
+
+func TestTokenize_PartialUTF8InIdentifier(t *testing.T) {
+	// Valid identifier start, then truncated UTF-8
+	source := []byte{'f', 'o', 'o', 0xC3}
+	tokens := Tokenize(source)
+	hasIdent := false
+	for _, tok := range tokens {
+		if tok.Kind == TokIdent {
+			hasIdent = true
+		}
+	}
+	if !hasIdent {
+		t.Error("expected TokIdent for 'foo' before truncated UTF-8")
+	}
+	if tokens[len(tokens)-1].Kind != TokEOF {
+		t.Error("expected EOF token at end")
+	}
+}
+
+func TestTokenize_MidEditIncompleteFunction(t *testing.T) {
+	// User is mid-edit: typed "def " and hasn't finished
+	source := "defmodule Foo do\n def \nend"
+	assertKinds(t, source, []TokenKind{
+		TokDefmodule, TokModule, TokDo, TokEOL,
+		TokDef, TokEOL,
+		TokEnd,
+	})
+}
+
+func TestTokenize_MidEditIncompletePipe(t *testing.T) {
+	// User is mid-edit with a pipe: "foo |> "
+	source := "foo |> "
+	assertKinds(t, source, []TokenKind{
+		TokIdent, TokPipe,
+	})
+}
+
+func 
TestTokenize_MultipleConsecutiveErrors(t *testing.T) { + // Multiple broken constructs — must not panic, must always reach EOF. + // Unterminated strings/sigils consume greedily, so we can't guarantee + // recovery of tokens after them. The invariant is: no crash, always EOF. + cases := []string{ + "\"unterminated\n~r/unclosed\n:'also broken\ndef valid_func do\nend", + "~s[[[[\n\n\n", + ":\"\n:\"\n:\"\ndef foo, do: :ok", + "?", + "~", + "@\n@\n@", + "::::", + "...", + "<<<>>><<<>>>", + } + for _, source := range cases { + tokens := Tokenize([]byte(source)) + if len(tokens) == 0 { + t.Errorf("source %q: expected at least one token", source) + continue + } + if tokens[len(tokens)-1].Kind != TokEOF { + t.Errorf("source %q: expected EOF at end, got %s", source, kindName(tokens[len(tokens)-1].Kind)) + } + } +} + +func TestTokenize_FullModuleIntegration(t *testing.T) { + source := `defmodule MyApp.Accounts.User do + @moduledoc """ + A user account. + """ + + use Ecto.Schema + alias MyApp.Accounts.Role + import MyApp.Helpers, only: [normalize: 1] + + @primary_key {:id, :binary_id, autogenerate: true} + + defstruct [:name, :email, active?: true] + + defmodule Settings do + @moduledoc false + + defstruct theme: "dark", locale: "en" + + def default do + %__MODULE__{} + end + end + + @spec changeset(t(), map()) :: Ecto.Changeset.t() + def changeset(%__MODULE__{} = user, attrs \\ %{}) do + user + |> cast(attrs, [:name, :email]) + |> validate_required([:email]) + end + + defp normalize_email(email) when is_binary(email) do + email + |> String.downcase() + |> String.trim() + end + + defmacro __using__(_opts) do + quote do + import unquote(__MODULE__) + end + end + + defdelegate find(id), to: MyApp.Accounts.Repo, as: :get_user + + @type t :: %__MODULE__{ + name: String.t(), + email: String.t() + } +end +` + tokens := Tokenize([]byte(source)) + + type expected struct { + kind TokenKind + text string + } + want := []expected{ + {TokDefmodule, "defmodule"}, + {TokModule, "MyApp"}, + 
{TokDot, "."}, + {TokModule, "Accounts"}, + {TokDot, "."}, + {TokModule, "User"}, + {TokDo, "do"}, + {TokEOL, "\n"}, + {TokAttrDoc, "@moduledoc"}, + {TokHeredoc, "\"\"\"\n A user account.\n \"\"\""}, + {TokEOL, "\n"}, + {TokEOL, "\n"}, + {TokUse, "use"}, + {TokModule, "Ecto"}, + {TokDot, "."}, + {TokModule, "Schema"}, + {TokEOL, "\n"}, + {TokAlias, "alias"}, + {TokModule, "MyApp"}, + {TokDot, "."}, + {TokModule, "Accounts"}, + {TokDot, "."}, + {TokModule, "Role"}, + {TokEOL, "\n"}, + {TokImport, "import"}, + {TokModule, "MyApp"}, + {TokDot, "."}, + {TokModule, "Helpers"}, + {TokComma, ","}, + {TokIdent, "only"}, + {TokColon, ":"}, + {TokOpenBracket, "["}, + {TokIdent, "normalize"}, + {TokColon, ":"}, + {TokNumber, "1"}, + {TokCloseBracket, "]"}, + {TokEOL, "\n"}, + {TokEOL, "\n"}, + {TokAttr, "@primary_key"}, + {TokOpenBrace, "{"}, + {TokAtom, ":id"}, + {TokComma, ","}, + {TokAtom, ":binary_id"}, + {TokComma, ","}, + {TokIdent, "autogenerate"}, + {TokColon, ":"}, + {TokIdent, "true"}, + {TokCloseBrace, "}"}, + {TokEOL, "\n"}, + {TokEOL, "\n"}, + {TokDefstruct, "defstruct"}, + {TokOpenBracket, "["}, + {TokAtom, ":name"}, + {TokComma, ","}, + {TokAtom, ":email"}, + {TokComma, ","}, + {TokIdent, "active?"}, + {TokColon, ":"}, + {TokIdent, "true"}, + {TokCloseBracket, "]"}, + {TokEOL, "\n"}, + {TokEOL, "\n"}, + // nested module + {TokDefmodule, "defmodule"}, + {TokModule, "Settings"}, + {TokDo, "do"}, + {TokEOL, "\n"}, + {TokAttrDoc, "@moduledoc"}, + {TokIdent, "false"}, + {TokEOL, "\n"}, + {TokEOL, "\n"}, + {TokDefstruct, "defstruct"}, + {TokIdent, "theme"}, + {TokColon, ":"}, + {TokString, "\"dark\""}, + {TokComma, ","}, + {TokIdent, "locale"}, + {TokColon, ":"}, + {TokString, "\"en\""}, + {TokEOL, "\n"}, + {TokEOL, "\n"}, + {TokDef, "def"}, + {TokIdent, "default"}, + {TokDo, "do"}, + {TokEOL, "\n"}, + {TokPercent, "%"}, + {TokModule, "__MODULE__"}, + {TokOpenBrace, "{"}, + {TokCloseBrace, "}"}, + {TokEOL, "\n"}, + {TokEnd, "end"}, + {TokEOL, "\n"}, + {TokEnd, 
"end"}, + {TokEOL, "\n"}, + {TokEOL, "\n"}, + // @spec changeset(t(), map()) :: Ecto.Changeset.t() + {TokAttrSpec, "@spec"}, + {TokIdent, "changeset"}, + {TokOpenParen, "("}, + {TokIdent, "t"}, + {TokOpenParen, "("}, + {TokCloseParen, ")"}, + {TokComma, ","}, + {TokIdent, "map"}, + {TokOpenParen, "("}, + {TokCloseParen, ")"}, + {TokCloseParen, ")"}, + {TokDoubleColon, "::"}, + {TokModule, "Ecto"}, + {TokDot, "."}, + {TokModule, "Changeset"}, + {TokDot, "."}, + {TokIdent, "t"}, + {TokOpenParen, "("}, + {TokCloseParen, ")"}, + {TokEOL, "\n"}, + // def changeset(%__MODULE__{} = user, attrs \\ %{}) do + {TokDef, "def"}, + {TokIdent, "changeset"}, + {TokOpenParen, "("}, + {TokPercent, "%"}, + {TokModule, "__MODULE__"}, + {TokOpenBrace, "{"}, + {TokCloseBrace, "}"}, + {TokOther, "="}, + {TokIdent, "user"}, + {TokComma, ","}, + {TokIdent, "attrs"}, + {TokBackslash, "\\\\"}, + {TokPercent, "%"}, + {TokOpenBrace, "{"}, + {TokCloseBrace, "}"}, + {TokCloseParen, ")"}, + {TokDo, "do"}, + {TokEOL, "\n"}, + {TokIdent, "user"}, + {TokEOL, "\n"}, + {TokPipe, "|>"}, + {TokIdent, "cast"}, + {TokOpenParen, "("}, + {TokIdent, "attrs"}, + {TokComma, ","}, + {TokOpenBracket, "["}, + {TokAtom, ":name"}, + {TokComma, ","}, + {TokAtom, ":email"}, + {TokCloseBracket, "]"}, + {TokCloseParen, ")"}, + {TokEOL, "\n"}, + {TokPipe, "|>"}, + {TokIdent, "validate_required"}, + {TokOpenParen, "("}, + {TokOpenBracket, "["}, + {TokAtom, ":email"}, + {TokCloseBracket, "]"}, + {TokCloseParen, ")"}, + {TokEOL, "\n"}, + {TokEnd, "end"}, + {TokEOL, "\n"}, + {TokEOL, "\n"}, + // defp normalize_email(email) when is_binary(email) do + {TokDefp, "defp"}, + {TokIdent, "normalize_email"}, + {TokOpenParen, "("}, + {TokIdent, "email"}, + {TokCloseParen, ")"}, + {TokWhen, "when"}, + {TokIdent, "is_binary"}, + {TokOpenParen, "("}, + {TokIdent, "email"}, + {TokCloseParen, ")"}, + {TokDo, "do"}, + {TokEOL, "\n"}, + {TokIdent, "email"}, + {TokEOL, "\n"}, + {TokPipe, "|>"}, + {TokModule, "String"}, + {TokDot, "."}, + 
{TokIdent, "downcase"}, + {TokOpenParen, "("}, + {TokCloseParen, ")"}, + {TokEOL, "\n"}, + {TokPipe, "|>"}, + {TokModule, "String"}, + {TokDot, "."}, + {TokIdent, "trim"}, + {TokOpenParen, "("}, + {TokCloseParen, ")"}, + {TokEOL, "\n"}, + {TokEnd, "end"}, + {TokEOL, "\n"}, + {TokEOL, "\n"}, + // defmacro __using__(_opts) do + {TokDefmacro, "defmacro"}, + {TokIdent, "__using__"}, + {TokOpenParen, "("}, + {TokIdent, "_opts"}, + {TokCloseParen, ")"}, + {TokDo, "do"}, + {TokEOL, "\n"}, + {TokIdent, "quote"}, + {TokDo, "do"}, + {TokEOL, "\n"}, + {TokImport, "import"}, + {TokIdent, "unquote"}, + {TokOpenParen, "("}, + {TokModule, "__MODULE__"}, + {TokCloseParen, ")"}, + {TokEOL, "\n"}, + {TokEnd, "end"}, + {TokEOL, "\n"}, + {TokEnd, "end"}, + {TokEOL, "\n"}, + {TokEOL, "\n"}, + // defdelegate find(id), to: MyApp.Accounts.Repo, as: :get_user + {TokDefdelegate, "defdelegate"}, + {TokIdent, "find"}, + {TokOpenParen, "("}, + {TokIdent, "id"}, + {TokCloseParen, ")"}, + {TokComma, ","}, + {TokIdent, "to"}, + {TokColon, ":"}, + {TokModule, "MyApp"}, + {TokDot, "."}, + {TokModule, "Accounts"}, + {TokDot, "."}, + {TokModule, "Repo"}, + {TokComma, ","}, + {TokIdent, "as"}, + {TokColon, ":"}, + {TokAtom, ":get_user"}, + {TokEOL, "\n"}, + {TokEOL, "\n"}, + // @type t :: %__MODULE__{...} + {TokAttrType, "@type"}, + {TokIdent, "t"}, + {TokDoubleColon, "::"}, + {TokPercent, "%"}, + {TokModule, "__MODULE__"}, + {TokOpenBrace, "{"}, + {TokEOL, "\n"}, + {TokIdent, "name"}, + {TokColon, ":"}, + {TokModule, "String"}, + {TokDot, "."}, + {TokIdent, "t"}, + {TokOpenParen, "("}, + {TokCloseParen, ")"}, + {TokComma, ","}, + {TokEOL, "\n"}, + {TokIdent, "email"}, + {TokColon, ":"}, + {TokModule, "String"}, + {TokDot, "."}, + {TokIdent, "t"}, + {TokOpenParen, "("}, + {TokCloseParen, ")"}, + {TokEOL, "\n"}, + {TokCloseBrace, "}"}, + {TokEOL, "\n"}, + {TokEnd, "end"}, + {TokEOL, "\n"}, + {TokEOF, ""}, + } + + if len(tokens) != len(want) { + t.Fatalf("token count mismatch: got %d, want %d", 
len(tokens), len(want)) + } + for i, w := range want { + tok := tokens[i] + gotText := string(source[tok.Start:tok.End]) + if tok.Kind != w.kind || gotText != w.text { + t.Errorf("token[%d]: got {%s, %q}, want {%s, %q}", + i, kindName(tok.Kind), gotText, kindName(w.kind), w.text) + } + } +} + +func kindsOf(tokens []Token) []TokenKind { + kinds := make([]TokenKind, len(tokens)) + for i, tok := range tokens { + kinds[i] = tok.Kind + } + return kinds +} + +func TestTokenize_EscapedNewlineLineTracking(t *testing.T) { + // Regression: backslash-escaped newlines were not incrementing the line + // counter, causing all subsequent tokens to have wrong line numbers. + // This affected scanStringContent, scanHeredocContent, scanInterpolation, + // scanSigilContent, and char literals. + + findLine := func(t *testing.T, src string, kind TokenKind) int { + t.Helper() + tokens := Tokenize([]byte(src)) + for _, tok := range tokens { + if tok.Kind == kind { + return tok.Line + } + } + t.Fatalf("token kind %d not found", kind) + return 0 + } + + t.Run("heredoc", func(t *testing.T) { + src := "@doc \"\"\"\n Line one \\\n continued \\\n more\n \"\"\"\n defmacro foo do\n end\n" + if got := findLine(t, src, TokDefmacro); got != 6 { + t.Errorf("defmacro line=%d, want 6", got) + } + }) + + t.Run("regular string", func(t *testing.T) { + src := "x = \"line one \\\n continued\"\ndefmacro bar do\nend\n" + if got := findLine(t, src, TokDefmacro); got != 3 { + t.Errorf("defmacro line=%d, want 3", got) + } + }) + + t.Run("interpolation", func(t *testing.T) { + // Escaped newline inside #{} interpolation + src := "x = \"hello #{a \\\n b}\"\ndef foo do\nend\n" + if got := findLine(t, src, TokDef); got != 3 { + t.Errorf("def line=%d, want 3", got) + } + }) + + t.Run("sigil nested parens", func(t *testing.T) { + // ~s(...\\\n...) 
— lowercase sigil with escaped newline inside parens
+		src := "x = ~s(hello \\\n world)\ndef foo do\nend\n"
+		if got := findLine(t, src, TokDef); got != 3 {
+			t.Errorf("def line=%d, want 3", got)
+		}
+	})
+
+	t.Run("sigil non-nested slash", func(t *testing.T) {
+		// ~r/...\\\n.../ — lowercase sigil with slash delimiter
+		src := "x = ~r/hello \\\n world/\ndef foo do\nend\n"
+		if got := findLine(t, src, TokDef); got != 3 {
+			t.Errorf("def line=%d, want 3", got)
+		}
+	})
+
+	t.Run("char literal escaped newline", func(t *testing.T) {
+		// ?\\\n is the char literal for newline (?\n)
+		src := "x = ?\\\ndef foo do\nend\n"
+		if got := findLine(t, src, TokDef); got != 2 {
+			t.Errorf("def line=%d, want 2", got)
+		}
+	})
+
+	t.Run("quoted atom string", func(t *testing.T) {
+		// :"...\\\n..." — escaped newline in quoted atom
+		src := "x = :\"hello \\\n world\"\ndef foo do\nend\n"
+		if got := findLine(t, src, TokDef); got != 3 {
+			t.Errorf("def line=%d, want 3", got)
+		}
+	})
+
+	t.Run("multiple escaped newlines accumulate", func(t *testing.T) {
+		// 3 escaped newlines should shift the line by 3
+		src := "x = \"a \\\nb \\\nc \\\nd\"\ndef foo do\nend\n"
+		if got := findLine(t, src, TokDef); got != 5 {
+			t.Errorf("def line=%d, want 5", got)
+		}
+	})
+}
+
+func TestLineStartsAccuracy(t *testing.T) {
+	assertLineStarts := func(t *testing.T, src string) {
+		t.Helper()
+		result := TokenizeFull([]byte(src))
+		lineStarts := result.LineStarts
+		lines := strings.Split(src, "\n")
+		if len(lineStarts) != len(lines) {
+			t.Fatalf("lineStarts has %d entries but source has %d lines", len(lineStarts), len(lines))
+		}
+		for i, ls := range lineStarts {
+			if ls > len(src) {
+				t.Errorf("lineStarts[%d] = %d out of range", i, ls)
+				continue
+			}
+			end := ls
+			for end < len(src) && src[end] != '\n' {
+				end++
+			}
+			if got := src[ls:end]; got != lines[i] {
+				t.Errorf("lineStarts[%d] = %d -> %q, want %q", i, ls, got, lines[i])
+			}
+		}
+	}
+
+	assertTokenAt := func(t *testing.T, src string, line0, col int, wantKind TokenKind, wantText string) {
+		t.Helper()
+		result := TokenizeFull([]byte(src))
+		offset := LineColToOffset(result.LineStarts, line0, col)
+		idx := TokenAtOffset(result.Tokens, offset)
+		if idx < 0 {
+			t.Fatalf("no token at line %d col %d (offset %d)", line0, col, offset)
+		}
+		tok := result.Tokens[idx]
+		if tok.Kind != wantKind {
+			t.Errorf("token kind = %d, want %d", tok.Kind, wantKind)
+		}
+		if text := TokenText([]byte(src), tok); text != wantText {
+			t.Errorf("token text = %q, want %q", text, wantText)
+		}
+	}
+
+	t.Run("heredoc", func(t *testing.T) {
+		src := "defmodule MyApp.Example do\n @moduledoc \"\"\"\n This is a long\n multiline heredoc\n with several lines\n of documentation.\n \"\"\"\n\n @type t :: %__MODULE__{\n name: String.t(),\n age: Integer.t()\n }\n\n def hello do\n :world\n end\nend"
+		assertLineStarts(t, src)
+		assertTokenAt(t, src, 9, 16, TokModule, "String")
+	})
+
+	t.Run("multiline string", func(t *testing.T) {
+		src := "x = \"line one\nline two\nline three\"\ny = Enum.map(list, fn x -> x end)"
+		assertLineStarts(t, src)
+		assertTokenAt(t, src, 3, 4, TokModule, "Enum")
+	})
+
+	t.Run("sigil heredoc", func(t *testing.T) {
+		src := "x = ~s\"\"\"\nline one\nline two\n\"\"\"\ny = MyModule.func()"
+		assertLineStarts(t, src)
+		assertTokenAt(t, src, 4, 4, TokModule, "MyModule")
+	})
+
+	t.Run("multiline interpolation", func(t *testing.T) {
+		src := "x = \"hello #{\n some_func()\n}\"\ny = String.trim(x)"
+		assertLineStarts(t, src)
+		assertTokenAt(t, src, 3, 4, TokModule, "String")
+	})
+}
diff --git a/internal/store/store.go b/internal/store/store.go
index fe3a542..e295918 100644
--- a/internal/store/store.go
+++ b/internal/store/store.go
@@ -552,13 +552,14 @@ type LookupResult struct {
 	FilePath   string
 	Line       int
 	Kind       string
+	Arity      int
 	DelegateTo string
 	DelegateAs string
 }
 
 func (s *Store) LookupModule(module string) ([]LookupResult, error) {
 	return s.queryLookup(
-		"SELECT file_path, line, kind, delegate_to, delegate_as FROM definitions WHERE module = ? AND function = '' AND kind IN ('module', 'defprotocol', 'defimpl')",
+		"SELECT file_path, line, kind, arity, delegate_to, delegate_as FROM definitions WHERE module = ? AND function = '' AND kind IN ('module', 'defprotocol', 'defimpl')",
 		module,
 	)
 }
@@ -617,7 +618,7 @@ func (s *Store) LookupFunctionInFile(filePath, function string, nearLine int) (s
 
 func (s *Store) LookupFunction(module, function string) ([]LookupResult, error) {
 	return s.queryLookup(
-		"SELECT file_path, line, kind, delegate_to, delegate_as FROM definitions WHERE module = ? AND function = ? AND kind NOT IN ('module', 'defprotocol', 'defimpl', 'callback', 'macrocallback') ORDER BY CASE WHEN kind IN ('type', 'opaque') THEN 1 ELSE 0 END, line",
+		"SELECT file_path, line, kind, arity, delegate_to, delegate_as FROM definitions WHERE module = ? AND function = ? AND kind NOT IN ('module', 'defprotocol', 'defimpl', 'callback', 'macrocallback') ORDER BY CASE WHEN kind IN ('type', 'opaque') THEN 1 ELSE 0 END, line",
 		module, function,
 	)
 }
@@ -769,7 +770,7 @@ func (s *Store) LookupFunctionInModules(modules []string, function string, arity
 	}
 	args = append(args, function)
 
-	query := "SELECT file_path, line, kind, delegate_to, delegate_as FROM definitions WHERE module IN (" +
+	query := "SELECT file_path, line, kind, arity, delegate_to, delegate_as FROM definitions WHERE module IN (" +
 		strings.Join(placeholders, ",") + ") AND function = ? AND kind NOT IN ('module', 'defprotocol', 'defimpl', 'callback', 'macrocallback')"
@@ -792,7 +793,7 @@ func (s *Store) queryLookup(query string, args ...interface{}) ([]LookupResult,
 	var results []LookupResult
 	for rows.Next() {
 		var r LookupResult
-		if err := rows.Scan(&r.FilePath, &r.Line, &r.Kind, &r.DelegateTo, &r.DelegateAs); err != nil {
+		if err := rows.Scan(&r.FilePath, &r.Line, &r.Kind, &r.Arity, &r.DelegateTo, &r.DelegateAs); err != nil {
 			return nil, err
 		}
 		results = append(results, r)
@@ -867,7 +868,7 @@ func (s *Store) LookupReferencesByPrefix(prefix string) ([]ModuleReferenceResult
 // Used for bulk module renames to replace N per-module LookupModule calls with one.
 func (s *Store) LookupModulesByPrefix(prefix string) ([]LookupResult, error) {
 	rows, err := s.db.Query(
-		"SELECT module, file_path, line, kind, delegate_to, delegate_as FROM definitions WHERE function = '' AND (module = ? OR module LIKE ?) AND kind IN ('module', 'defprotocol', 'defimpl') ORDER BY module",
+		"SELECT module, file_path, line, kind, arity, delegate_to, delegate_as FROM definitions WHERE function = '' AND (module = ? OR module LIKE ?) AND kind IN ('module', 'defprotocol', 'defimpl') ORDER BY module",
 		prefix, prefix+".%",
 	)
 	if err != nil {
@@ -878,7 +879,7 @@ func (s *Store) LookupModulesByPrefix(prefix string) ([]LookupResult, error) {
 	var results []LookupResult
 	for rows.Next() {
 		var r LookupResult
-		if err := rows.Scan(&r.Module, &r.FilePath, &r.Line, &r.Kind, &r.DelegateTo, &r.DelegateAs); err != nil {
+		if err := rows.Scan(&r.Module, &r.FilePath, &r.Line, &r.Kind, &r.Arity, &r.DelegateTo, &r.DelegateAs); err != nil {
 			return nil, err
 		}
 		results = append(results, r)
@@ -1103,6 +1104,14 @@ func (s *Store) NextFunctionLine(filePath string, startLine int) int {
 }
 
 func (s *Store) LookupFollowDelegate(module, function string) ([]LookupResult, error) {
+	return s.lookupFollowDelegate(module, function, 0)
+}
+
+func (s *Store) lookupFollowDelegate(module, function string, depth int) ([]LookupResult, error) {
+	if depth > 5 {
+		return nil, nil
+	}
+
 	results, err := s.LookupFunction(module, function)
 	if err != nil {
 		return nil, err
 	}
@@ -1123,7 +1132,7 @@ func (s *Store) LookupFollowDelegate(module, function string) ([]LookupResult, e
 	if results[0].DelegateAs != "" {
 		targetFunc = results[0].DelegateAs
 	}
-	targetResults, err := s.LookupFunction(targetModule, targetFunc)
+	targetResults, err := s.lookupFollowDelegate(targetModule, targetFunc, depth+1)
 	if err != nil {
 		return nil, err
 	}
diff --git a/internal/treesitter/variables_test.go b/internal/treesitter/variables_test.go
index e183fa3..083d965 100644
--- a/internal/treesitter/variables_test.go
+++ b/internal/treesitter/variables_test.go
@@ -1358,3 +1358,25 @@ end`)
 		t.Fatalf("expected 1 occurrence of 'process' (not atom), got %d: %+v", len(occs), occs)
 	}
 }
+
+func TestFindVariableOccurrences_DefpLine(t *testing.T) {
+	src := []byte(`defmodule MyApp.Worker do
+  def enqueue(resource) do
+    %{
+      resource_type: resource_type(resource)
+    }
+  end
+
+  defp resource_type(%{type: t}), do: t
+end`)
+
+	// Line 7 is "defp resource_type(%{type: t}), do: t"
+	// Col 7 is on the 'r' in 'resource_type'
+	occs := FindVariableOccurrences(src, 7, 7)
+	if occs != nil {
+		t.Errorf("expected nil on defp line, got %d occurrences", len(occs))
+		for _, occ := range occs {
+			t.Logf("  Line %d, col %d-%d", occ.Line, occ.StartCol, occ.EndCol)
+		}
+	}
+}
diff --git a/internal/version/version.go b/internal/version/version.go
index 53f1c83..5f56476 100644
--- a/internal/version/version.go
+++ b/internal/version/version.go
@@ -5,4 +5,4 @@ const Version = "0.5.3"
 // IndexVersion is incremented whenever the index schema or parser changes in a
 // way that requires a full rebuild. Bump this alongside Version when releasing
 // a change that makes existing indexes stale.
-const IndexVersion = 9
+const IndexVersion = 11