Skip to content

Comments

Fix emphasis delimiters inside link destinations (destination-only)#81

Open
dereuromark wants to merge 1 commit intomasterfrom
fix/emphasis-destination-only
Open

Fix emphasis delimiters inside link destinations (destination-only)#81
dereuromark wants to merge 1 commit intomasterfrom
fix/emphasis-destination-only

Conversation

@dereuromark
Copy link
Contributor

@dereuromark dereuromark commented Feb 9, 2026

Summary

Simpler alternative to #80 that only protects delimiters inside link destinations ](...), without the full link lookahead complexity.

  • Fixes underscores/stars in URLs breaking emphasis: _[link](url_bar)_ now works
  • Does NOT fix the bracket-text case: _[foo_](url) remains emphasis (intentional)
  • Minimal code change, easy to review

Problem

Emphasis delimiters (_, *) inside link destinations were matched as closers, breaking link formation:

_[link](http://example.com?foo_bar=1), more text_
→ <em>[link](http://example.com?foo</em>bar=1), more text_

Solution

When scanning for emphasis closers in parseDelimited(), detect ]( and skip over the parentheses content. This protects delimiters inside URLs from closing emphasis.

Why destination-only?

Per discussion in jgm/djot#375, the two cases are separable:

Case Example This PR
Underscore in destination _[foo](bar_baz)_ ✓ Fixed - link wins
Underscore in bracket text _[foo_](bar) Not changed - emphasis wins

The destination case is the common real-world issue (URLs with query params). The bracket-text case is rare and arguably the current behavior (_[foo_] → emphasis) is intuitive.

This approach:

Refs: jgm/djot#375, #80

Simpler alternative to PR #80 that only protects delimiters inside
link destinations `](...)`, without the full link lookahead.

Problem: Emphasis delimiters (`_`, `*`) inside link destinations were
matched as closers, breaking link formation:
  `_[link](http://example.com?foo_bar=1), more text_`
  → `<em>[link](http://example.com?foo</em>bar=1), more text_`

Fix: When scanning for emphasis closers in `parseDelimited()`, skip
over link destinations by detecting `](` and finding the matching `)`.
This prevents delimiters inside URLs from closing emphasis.

This is a targeted fix for the common real-world issue (URLs with
underscores in query parameters). It intentionally does NOT fix the
bracket-text case (`_[foo_](url)`) which remains emphasis - this
keeps the implementation simple per discussion in jgm/djot#375.

Refs: jgm/djot#375
@codecov
Copy link

codecov bot commented Feb 9, 2026

Codecov Report

❌ Patch coverage is 95.83333% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 93.85%. Comparing base (9b26950) to head (838aaf8).

Files with missing lines Patch % Lines
src/Parser/InlineParser.php 95.83% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff            @@
##             master      #81   +/-   ##
=========================================
  Coverage     93.85%   93.85%           
- Complexity     2115     2130   +15     
=========================================
  Files            77       77           
  Lines          5675     5699   +24     
=========================================
+ Hits           5326     5349   +23     
- Misses          349      350    +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@dereuromark dereuromark added the bug Something isn't working label Feb 9, 2026
@dereuromark
Copy link
Contributor Author

Performance Benchmark

Ran benchmarks comparing master (no fix) vs this PR:

Regular test cases (1000 iterations each)

Test Case Master With Fix Change
simple (13 chars) 0.010 ms 0.009 ms -10%
link_no_underscore 0.013 ms 0.013 ms same
link_with_underscore 0.019 ms 0.014 ms -26%
multiple_links 0.018 ms 0.018 ms same
complex_url 0.031 ms 0.016 ms -48%
many_underscores 0.019 ms 0.011 ms -42%
nested_parens 0.024 ms 0.013 ms -46%
long_text (702 chars, 50 links) 0.203 ms 0.246 ms +21%
pathological_brackets 0.390 ms 0.399 ms +2%

Scaling test (pathological input _[x_[x_[x_...)

n Master With Fix
100 (301 chars) 0.432 ms 0.500 ms
500 (1501 chars) 7.578 ms 7.603 ms

Analysis

Counterintuitive finding: The fix is often faster for URLs with underscores because:

  • Without fix: underscore closes emphasis early → remaining text reparsed → more work
  • With fix: emphasis closes correctly at the end → single clean pass

Overhead:

  • ~20% slower for long_text (50 links with constant skipping over ](...))
  • ~2% slower for pathological brackets
  • No measurable difference for content without links

Comparison with #80 (full lookahead):

This PR #80
Trigger condition ]( only Every [ and ![
Scan scope Just (...) Full [...](...)
Code added ~50 lines ~85 lines
Covers bracket-text case No Yes

This PR has lower overhead because it only triggers on ](, not every [.

@dereuromark
Copy link
Contributor Author

dereuromark commented Feb 17, 2026

Performance Benchmark (Updated 2026-02-17)

Ran benchmarks comparing master vs this PR (5 rounds, 500 iterations each, median times).

Regular test cases

Test Case Chars Master With Fix Change
simple 12 0.007 ms 0.008 ms ~same
link_no_underscore 28 0.014 ms 0.011 ms 21% faster
link_with_underscore 38 0.021 ms 0.013 ms 36% faster
multiple_links 19 0.012 ms 0.014 ms ~same
complex_url 48 0.025 ms 0.014 ms 44% faster
many_underscores 25 0.021 ms 0.012 ms 42% faster
nested_parens 27 0.015 ms 0.012 ms 18% faster
long_text (50 links) 1950 0.723 ms 0.439 ms 39% faster
pathological_brackets 400 0.872 ms 0.858 ms ~same

Note: Variations under 0.005ms (~15%) are within noise range for these micro-benchmarks.

Scaling test (pathological input _[x_[x_[x_...)

n Chars Master With Fix Change
100 400 0.88 ms 0.80 ms 9% faster
200 800 2.95 ms 2.85 ms 3% faster
500 2000 16.28 ms 15.74 ms 3% faster
1000 4000 59.75 ms 59.12 ms ~same

Analysis

Key finding: The fix is faster for URLs with underscores:

  • Without fix: underscore closes emphasis early → remaining text reparsed → more work
  • With fix: emphasis closes correctly at the end → single clean pass

Performance characteristics:

  • Significant improvement (30-44% faster) for links with underscores in URLs
  • 39% faster for long_text with many links
  • No measurable overhead for pathological bracket cases
  • Simple cases without links show no meaningful difference

Comparison with #80 (full lookahead):

This PR #80
Trigger condition ]( only Every [ and ![
Scan scope Just (...) Full [...](...)
Code added ~50 lines ~85 lines
Covers bracket-text case No Yes

This PR has lower overhead because it only triggers on ](, not every [.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant