Integrate the toxic2 tolerant parser across providers#1267

Open
lukaszsamson wants to merge 15 commits into master from toxic2-integration

@lukaszsamson

Collaborator

Summary

Integrates the toxic2 error-tolerant parser (via the
matching elixir_sense branch) across the ElixirLS providers, replacing the tokenizer-driven and
Code.Fragment-driven implementations with passes derived from toxic2's ranged AST.

Companion PR: elixir-lsp/elixir_sense#336 (this branch pins that elixir_sense SHA).

What changed

Providers reimplemented on toxic2 ranges

  • Selection ranges: delimiter pairs, comment blocks and AST node ranges now derive from the
    toxic2 parse; the old FoldingRange.Token/TokenPair/SpecialToken tokenizer is deleted.
    Half-open containment fixes an adjacent-bracket (foo[bar][baz]) crash.
  • Folding ranges and document symbols reimplemented over toxic2 ranges (dead range
    heuristics removed).
  • Navigation (definition / references / implementation / declaration / hover / call hierarchy):
    routed through ElixirSense.Core.SurroundContext.Toxic for symbol-under-cursor classification.
  • selection_ranges reuses the AST it already parsed instead of re-parsing per cursor position.
  • Context.ast/metadata for .ex/.exs built from the toxic2 ranged AST; neutralize_errors
    deduped onto the shared ElixirSense.Core.Parser helper.

The providers no longer call Code.Fragment.surround_context directly; the only remaining use
is the internal fallback inside the toxic2 classifier.

Build / CI

  • Move elixir_sense from a local path dep back to the git-dep model, pinned to the
    toxic2-parser SHA (dep_versions.exs + mix.lock); this transitively pulls toxic2.
  • elixir_sense/toxic2 require Elixir ~> 1.19, so drop the 1.16/1.17/1.18 jobs from the CI
    matrix (smoke tests + test matrix, Linux + Windows).

Testing

  • apps/language_server provider suite green (1170 passing) against the git-pinned deps.
  • mix compile --warnings-as-errors, mix format --check-formatted, and mix dialyzer all clean.

⚠️ Depends on elixir-lsp/elixir_sense#336 being merged (and its pinned toxic2). The pinned
elixir_sense SHA should be advanced to the merge commit before this lands.

🤖 Generated with Claude Code

lukaszsamson and others added 15 commits June 13, 2026 15:59
… toxic2 ranges

- Point elixir_sense to a local path dep (carries the toxic2-backed parser)
- Reimplement SelectionRanges AST node ranges using toxic2's range: metadata
  and comments from Toxic2.string_to_quoted_with_comments, dropping the bespoke
  AstUtils.node_range computation
- Delete now-unused AstUtils module and its tests
- Adjust document_symbols test for toxic2-recovered AST

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Derive structural and comment folds from the error-tolerant toxic2 parser
(node source ranges + Toxic2.string_to_quoted_with_comments) instead of the
Elixir-tokenizer-backed token-pair / special-token passes. The line-based
indentation pass is kept (it supplies assignment/clause folds that have no
single closing token) and AST folds override it at shared start lines, as the
token-pair pass used to. Comments inside strings/heredocs are no longer
mistaken for fold-able comment blocks.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Parse with toxic2 (range: true, no literal_encoder) and read each node's range:
meta for symbol range and selection range, replacing the token-metadata
end-position heuristics (kept only as a fallback for range-less nodes).

Also: preserve nil args (bare identifiers like var / __MODULE__) in
neutralize_errors across document_symbols, selection_ranges and folding_range -
the previous `not is_list(args)` clause turned them into zero-arity calls; and
ignore error-recovery placeholders when computing function/type arity so an
incomplete `def foo(` reports foo/0 rather than an inflated arity.
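
A minimal sketch of why the `not is_list(args)` clause matters, using only the standard quoted form: a bare identifier has `nil` in the args slot, while a zero-arity call has `[]`. The `NilArgsSketch` module and its `coerce/1` function are hypothetical and only mimic the shape of the old clause; they are not the ElixirLS code.

```elixir
defmodule NilArgsSketch do
  # Hypothetical stand-in for the old clause: any non-list args (including nil)
  # are replaced by [], which turns a bare identifier into a zero-arity call.
  def coerce({name, meta, args}) when is_atom(name) and not is_list(args) do
    {name, meta, []}
  end

  def coerce(other), do: other
end

# A bare identifier parses to {:var, meta, nil}.
{:ok, var} = Code.string_to_quoted("var")

IO.puts(Macro.to_string(var))
# prints: var

IO.puts(Macro.to_string(NilArgsSketch.coerce(var)))
# prints: var()
```

Leaving the `nil` in place keeps `var` and `var()` distinguishable for anything that later inspects the node.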

Adjust the records test for the toxic2 call-node range (starts at `Record`).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Every node that extract_* takes a location from carries a toxic2 range:, so the old
token-metadata heuristic was unreachable. It also harbored a latent bug
(elixir_position_to_lsp({nil, nil}) returns end-of-file because nil sorts above
integers). Replace location_to_range with a range:-only version that degrades a
range-less node to a zero-width range at its line/column, and drop the now-dead
symbol argument.
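
The term-ordering point can be illustrated in isolation. In Erlang term order every number sorts before every atom, and `nil` is an atom, so a comparison-based clamp quietly accepts `nil`. The `PositionSketch` module below is hypothetical; it is an assumption about how such a bug can arise, not a copy of `elixir_position_to_lsp`.

```elixir
defmodule PositionSketch do
  # Hypothetical unguarded clamp: min/2 uses term ordering, so a nil line
  # compares greater than any integer and the result is last_line.
  def unguarded_clamp(line, last_line), do: min(line, last_line)

  # Guarded variant: a missing line is reported instead of being clamped.
  def clamp_line(line, last_line) when is_integer(line), do: min(line, last_line)
  def clamp_line(_missing, _last_line), do: :no_position
end

IO.inspect(Enum.sort([3, nil, 1]))
# prints: [1, 3, nil]

IO.inspect(PositionSketch.unguarded_clamp(nil, 120))
# prints: 120

IO.inspect(PositionSketch.clamp_line(nil, 120))
# prints: :no_position
```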

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

The LS Parser now produces a range-bearing toxic2 AST for Context.ast (and
the metadata built from it) on every .ex/.exs parse, replacing the
Code.string_to_quoted! AST and the ElixirSense fault-tolerant fallback.
This is the foundation that lets range-aware providers read node ranges
straight off Context.ast.

- parse_file/3 stays the sole diagnostics source (Code.with_diagnostics)
  and the EEx/HEEx parser; it now returns a tagged {:ok, ast, diagnostics}
  / {:error, diagnostics} so a falsey-but-valid AST (literal nil/false) is
  no longer mistaken for a parse failure.
- do_parse/2 decides the flag from the Code success tag: clean -> :exact,
  else toxic recovered something usable -> :fixed, else :not_parsable.
- parse_elixir_toxic/3 builds the AST/metadata via
  ElixirSense.Core.Parser.parse_to_neutralized_ast(range: true,
  keep_range: true), keeping the catch/telemetry safety net.
- fault_tolerant_parse/2 removed (toxic always recovers; cursor env is
  derived separately in Metadata.get_cursor_env).
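
A rough sketch of the tagged-result idea and the flag decision described in the list above. `ParseFlagSketch`, `parse/1` and `flag/2` are hypothetical names, and the `{:ok, ast}` / `:error` shape used for the tolerant parser's result is an assumption made for illustration; only `Code.string_to_quoted/1` is a real call.

```elixir
defmodule ParseFlagSketch do
  # Hypothetical wrapper returning tagged tuples in the style the commit
  # describes: {:ok, ast, diagnostics} or {:error, diagnostics}.
  def parse(source) do
    case Code.string_to_quoted(source) do
      {:ok, ast} -> {:ok, ast, []}
      {:error, reason} -> {:error, [reason]}
    end
  end

  # The flag comes from the success tag, never from the truthiness of the AST.
  def flag({:ok, _ast, _diagnostics}, _recovered), do: :exact
  def flag({:error, _diagnostics}, {:ok, _recovered_ast}), do: :fixed
  def flag({:error, _diagnostics}, :error), do: :not_parsable
end

# "nil" is valid source whose AST is the literal nil, so a truthiness check
# on the AST would misreport this successful parse as a failure.
{:ok, ast, []} = ParseFlagSketch.parse("nil")
IO.inspect(if(ast, do: :parsed, else: :looks_failed))
# prints: :looks_failed

IO.inspect(ParseFlagSketch.flag(ParseFlagSketch.parse("nil"), :error))
# prints: :exact

IO.inspect(ParseFlagSketch.flag(ParseFlagSketch.parse("def foo("), {:ok, :recovered_ast}))
# prints: :fixed
```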

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…r helper

DocumentSymbols, FoldingRange and SelectionRanges each carried a byte-identical
private neutralize_errors/1. Replace all three with the shared
ElixirSense.Core.Parser.neutralize_errors/3 (keep_range: true so the range:
meta survives). document_symbols/folding_range pass their parse diagnostics
(also gets the call-arg sentinel cleaning); selection_ranges' self-neutralizing
ast_node_ranges/4 passes [] (range-only, and __error__ nodes carry no range, so
the diagnostics-driven cleaning is a no-op there).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

definition/references/implementation/declaration/call_hierarchy/hover locators and
the llm_environment command now classify the symbol under the cursor via
ElixirSense.Core.SurroundContext.Toxic.surround_context/2 (stage 0 delegates to
Code.Fragment). Completion (cursor_context/container_cursor_to_quoted) untouched.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Replaces the Elixir-tokenizer passes (token_pair_ranges via FoldingRange.Token/
TokenPair, special_token_group_ranges via FoldingRange.SpecialToken, and the
stop-token machinery) with a toxic2-AST pass, delimiter_pair_ranges/4. It derives
outer/inner ranges for ()/[]/{}/%{}/<<>>, calls, bracket access x[y], and
do/else/rescue/after/catch/end blocks from the toxic2 closing:/do:/end:/section-key
range: metadata, plus a stab pattern .. -> range. String/heredoc/sigil ranges now
come from ast_node_ranges (the toxic nodes carry range:), so the special-token pass
is gone. Both selection and folding providers no longer use :elixir_tokenizer for
their output (FoldingRange was already migrated).

Adversarial review found a crash: a cursor exactly on a block section keyword
(else/rescue/...) made two sibling, non-nested ranges and tripped the
"increasingly narrowing" merge invariant. Fixed by selecting the cursor's section
with half-open containment (end exclusive). A fuzz over real files dropped
selection-range crashes from 238 (old tokenizer code) to 6 pre-existing
"no intersection" cases in the shared merge. Bracket-access ranges (lost in the
first cut) restored via the from_brackets meta. Regression tests added.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…ules

These FoldingRange submodules were lib-dead after both providers moved off the
Elixir tokenizer (FoldingRange.provide and SelectionRanges no longer use them).
Remove the modules and their tests. convert_text_to_input and @type input drop
the :tokens field (now lines-only); Indentation and CommentBlock provide_ranges
only ever read :lines, so their doctests and the folding_range_test passes keep
working. The only remaining ElixirSense.Core.Normalized.Tokenizer user is now
elixir_sense's Source.which_func.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

elixir-ls already extracts join bindings (backported in 906c8c8) but had no unit
test for it - only commented-out integration TODOs. Port elixir_sense's focused,
self-contained QueryTest (mock Post/Comment schemas) to lock in the behavior.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…cent bracket crash)

container_ranges decided cursor containment with the inclusive in?/2 on end-exclusive
ranges, so adjacent bracket accesses (foo[bar][baz]) both claimed the shared boundary
column and emitted two non-nested sibling ranges, raising "ranges_1 is not
increasingly narrowing" in the merge. Use the half-open check (end exclusive) like
do_block_ranges already does. Found by gpt-5.5 adversarial review.
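
The boundary problem can be reproduced with plain column pairs. The sketch below assumes single-line, end-exclusive `{start_col, end_col}` ranges and 0-based columns; `ContainmentSketch` and its functions are hypothetical names, not the `container_ranges` or `in?/2` implementations.

```elixir
defmodule ContainmentSketch do
  # Inclusive check: the end column still counts as inside the range.
  def inclusive?({s, e}, col), do: col >= s and col <= e

  # Half-open check: the end column belongs to whatever starts there.
  def half_open?({s, e}, col), do: col >= s and col < e

  # Return every range that claims the cursor column under the given check.
  def containing(ranges, col, check), do: Enum.filter(ranges, &check.(&1, col))
end

# foo[bar][baz]: `[bar]` occupies columns 3..7 and `[baz]` columns 8..12,
# so the end-exclusive ranges are {3, 8} and {8, 13}.
ranges = [{3, 8}, {8, 13}]

# At the shared boundary column 8 the inclusive check returns both siblings,
# which are not nested inside each other.
IO.inspect(ContainmentSketch.containing(ranges, 8, &ContainmentSketch.inclusive?/2))
# prints: [{3, 8}, {8, 13}]

# The half-open check returns only the bracket that starts at column 8.
IO.inspect(ContainmentSketch.containing(ranges, 8, &ContainmentSketch.half_open?/2))
# prints: [{8, 13}]
```

With at most one sibling selected per nesting level, the resulting chain of ranges can stay strictly nested, which is the property the merge step checks.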

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…Toxic

selection_ranges: the symbol-under-cursor pass called Code.Fragment.surround_context
directly. Route it through ElixirSense.Core.SurroundContext.Toxic instead - the same
toxic2-backed entry point the navigation providers already use. Navigable shapes now get
their span from the AST range: metadata; only purely lexical units (a bare do/end, exotic
operators) reach Toxic's internal Code.Fragment fallback. AST ranges alone don't cover
these symbol-level spans (e.g. the do/end keyword units, the dot-path callee), so the pass
is kept rather than removed.

document_symbols: drop the stale 'extract module name location from
Code.Fragment.surround_context?' TODO - module name locations come from the toxic2 AST
range metadata now.

This removes the last direct Code.Fragment.surround_context call in the providers; the only
remaining uses are Toxic's own internal fallback.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Pass the AST already parsed at the top of selection_ranges/3 into
SurroundContext.Toxic.surround_context/3 instead of having it re-parse the
source on every cursor position.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Move elixir_sense from a local path dep back to the git dep model and pin it to
the pushed toxic2-parser SHA (b928399b) via dep_versions.exs + mix.lock. This
transitively pulls toxic2 (lukaszsamson/toxic2).

elixir_sense/toxic2 require Elixir ~> 1.19, so drop the 1.16/1.17/1.18 jobs from
the CI matrix (smoke tests + test matrix, Linux + Windows).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…ixes)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>