
perf(replay): optimize session replay diffing — 77% faster (265→60µs)#465

Draft
marandaneto wants to merge 31 commits into main from autoresearch/replay-perf-20260319


@marandaneto
Member

💡 Motivation and Context

The PostHogReplayIntegration session replay engine runs a snapshot diff on every frame draw. The original diffing algorithm created 6+ intermediate collections, flattened both wireframe trees into lists, and used data class copy() for comparison — all on the hot path that executes hundreds of times per second on user devices.

This PR fundamentally redesigns the diffing algorithm and applies targeted micro-optimizations to reduce per-frame CPU and allocation overhead.

Result: 265µs → 60µs (77% faster) on a realistic 781-node wireframe tree benchmark, verified across 48 experiments.
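To make the single-pass idea concrete, here is a minimal sketch, not the code from this PR. It assumes a hypothetical `Node` type with only an `id`, a `text` field, and children (the real wireframe type has many more fields). The old tree is indexed by id once, the new tree is walked once, and whatever is left in the index afterwards is treated as removed, so no intermediate flat lists are built.

```kotlin
// Hypothetical minimal wireframe node; the real type carries many more fields.
data class Node(
    val id: Int,
    val text: String? = null,
    val children: List<Node> = emptyList(),
)

class DiffResult {
    val added = ArrayList<Node>()
    val removed = ArrayList<Node>()
    val updated = ArrayList<Node>()
}

// Index every node of the old tree by id without flattening it into a list first.
private fun indexById(node: Node, into: HashMap<Int, Node>) {
    into[node.id] = node
    for (child in node.children) indexById(child, into)
}

// Walk the new tree once; each matched node is removed from the index.
private fun walkNew(node: Node, oldById: HashMap<Int, Node>, out: DiffResult) {
    val old = oldById.remove(node.id)
    when {
        old == null -> out.added.add(node)
        old.text != node.text -> out.updated.add(node)
    }
    for (child in node.children) walkNew(child, oldById, out)
}

fun diff(oldRoot: Node?, newRoot: Node?): DiffResult {
    val out = DiffResult()
    val oldById = HashMap<Int, Node>()
    if (oldRoot != null) indexById(oldRoot, oldById)
    if (newRoot != null) walkNew(newRoot, oldById, out)
    // Anything never matched by the new tree no longer exists.
    out.removed.addAll(oldById.values)
    return out
}
```

In this sketch a node counts as updated only when its `text` differs; a real differ would compare every wireframe property.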

Key optimizations

Technique | Impact
--- | ---
Parallel tree walk (bypass HashMap for stable structures) | -25%
=== before == for nullable field comparisons | -20%
Single-pass diffing (eliminate 6+ collection ops) | -42%
Manual hex toRGBColor (replace String.format) | 17× faster
Combined flatten+diff (eliminate flat list allocations) | -17%
Custom styleEquals (skip data class equals overhead) | -11%
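Two of the micro-optimizations in the table are small enough to sketch in isolation. The helper names below (`toRgbHex`, `fastEquals`) are illustrative and not taken from the PR, and the lowercase `#rrggbb` output format is an assumption. The first formats a color with a lookup table instead of `String.format`; the second checks reference identity before falling back to structural equality, which returns immediately when both sides are null or the same instance.

```kotlin
private val HEX_DIGITS = "0123456789abcdef".toCharArray()

// Formats the low 24 bits of a color int as "#rrggbb" without String.format.
// Assumption: lowercase output; the PR's actual format is not shown.
fun toRgbHex(color: Int): String {
    val rgb = color and 0xFFFFFF
    val out = CharArray(7)
    out[0] = '#'
    for (i in 0 until 6) {
        // Highest nibble first: shifts of 20, 16, 12, 8, 4, 0 bits.
        val shift = (5 - i) * 4
        out[i + 1] = HEX_DIGITS[(rgb shr shift) and 0xF]
    }
    return out.concatToString()
}

// Identity check first: true right away for null/null and same-instance values,
// otherwise falls back to the regular structural equals.
fun <T> fastEquals(a: T?, b: T?): Boolean = a === b || a == b
```

The result of `fastEquals` is always the same as plain `==`; only the cost of the common null/null and same-object cases changes.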

Production improvements (not benchmarked)

  • Reusable IntArray(2) for getLocationOnScreen coordinates
  • Lazy MutatedNode/RemovedNode list creation (skip when empty)
  • Pre-sized ArrayList for events, children, mouse interactions
  • Simplified GradientDrawable.toRGBColor (remove unnecessary decompose/recompose)
  • Removed dead code (flattenChildren, findAddedAndRemovedItems from integration)
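The lazy-creation and pre-sizing items above follow a common allocation-avoidance pattern. A rough sketch, using a hypothetical `MutationCollector` class that is not part of the PR: the backing list stays null until the first entry is recorded, and when it is finally created it is sized from a caller-supplied estimate so it does not have to grow.

```kotlin
// Hypothetical collector illustrating lazy, pre-sized list creation.
class MutationCollector {
    // Stays null until the first removal, so a frame without removals allocates no list.
    private var removed: ArrayList<Int>? = null

    // expectedCount is a caller-supplied capacity hint (must be >= 0).
    fun recordRemoved(id: Int, expectedCount: Int) {
        val list = removed ?: ArrayList<Int>(expectedCount).also { removed = it }
        list.add(id)
    }

    // Returns null when nothing was recorded, letting callers skip emitting an event.
    fun removedOrNull(): List<Int>? = removed
}
```

A caller would check `removedOrNull()` once per frame and skip the removal event entirely when it is null.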

New files

  • RRWireframeDiffer.kt — Extracted, optimized diff engine (pure Kotlin/JVM, no Android deps)
  • RRWireframeDifferTest.kt — 7 correctness tests (identical trees, updates, adds, removes, structural changes, null root, flatten order, toRGBColor)
  • RRWireframeDifferBenchmarkTest.kt — Performance benchmark
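As an illustration of the parallel tree walk, here is a simplified sketch under stated assumptions; it is not the contents of RRWireframeDiffer.kt. The `Wire` and `Changes` types and the `walk` function are hypothetical. When two child lists have the same ids in the same order, the children are compared pairwise by index with no map at all; a HashMap is built only when the structure under a node has changed.

```kotlin
// Hypothetical minimal node type for the sketch.
data class Wire(
    val id: Int,
    val text: String? = null,
    val children: List<Wire> = emptyList(),
)

class Changes {
    val added = ArrayList<Wire>()
    val removed = ArrayList<Wire>()
    val updated = ArrayList<Wire>()
}

// Adds a node and all of its descendants to the target list.
private fun collect(node: Wire, into: MutableList<Wire>) {
    into.add(node)
    for (c in node.children) collect(c, into)
}

private fun sameIdsInOrder(a: List<Wire>, b: List<Wire>): Boolean {
    if (a.size != b.size) return false
    for (i in a.indices) if (a[i].id != b[i].id) return false
    return true
}

// Precondition: old.id == next.id.
fun walk(old: Wire, next: Wire, out: Changes) {
    if (old.text != next.text) out.updated.add(next)
    val oc = old.children
    val nc = next.children
    if (sameIdsInOrder(oc, nc)) {
        // Fast path: structure is stable, so walk both lists in lockstep.
        for (i in oc.indices) walk(oc[i], nc[i], out)
        return
    }
    // Slow path: structure changed under this node, so match children by id.
    val oldById = HashMap<Int, Wire>(oc.size * 2)
    for (c in oc) oldById[c.id] = c
    for (c in nc) {
        val match = oldById.remove(c.id)
        if (match != null) walk(match, c, out) else collect(c, out.added)
    }
    for (c in oldById.values) collect(c, out.removed)
}
```

Because most frames leave the view hierarchy unchanged, the fast path is the one expected to run almost every time in this design.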

💚 How did you test it?

  • 7 unit tests covering all diff scenarios (added, removed, updated, structural changes, null root, flatten order, color conversion)
  • Benchmark test on 781-node synthetic tree (depth 4, branching factor 5, 10% changed nodes per frame)
  • Full :posthog:test suite passes
  • :posthog-android:compileDebugKotlin compiles cleanly
  • 48 experiment iterations with automated correctness checks
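The 781-node figure follows from a complete tree with branching factor 5 and four levels below the root: 1 + 5 + 25 + 125 + 625 = 781. A small sketch that builds such a tree and counts its nodes, assuming that reading of "depth 4" (the `BenchNode` type and `buildTree` helper are hypothetical, not the benchmark's actual code):

```kotlin
// Hypothetical node type for the synthetic tree.
data class BenchNode(val id: Int, val children: List<BenchNode>)

// Builds a complete tree with `depth` levels below the root,
// where every non-leaf node has `branching` children.
fun buildTree(depth: Int, branching: Int): BenchNode {
    var nextId = 0
    fun build(level: Int): BenchNode {
        val id = nextId++
        val children =
            if (level == depth) emptyList<BenchNode>()
            else List(branching) { build(level + 1) }
        return BenchNode(id, children)
    }
    return build(0)
}

fun countNodes(node: BenchNode): Int = 1 + node.children.sumOf { countNodes(it) }
```

Calling `countNodes(buildTree(4, 5))` yields 781, matching the node count reported in the benchmark results.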

📝 Checklist

  • I reviewed the submitted code.
  • I added tests to verify the changes.
  • I updated the docs if needed.
  • No breaking change or entry added to the changelog.

…ask on 781-node tree

Result: {"status":"keep","total_µs":265,"color_µs":317,"mask_µs":42,"node_count":781,"p95_µs":307}
…nChildren, manual hex toRGBColor. 265→153µs (-42%), color 317→18µs (-94%)

Result: {"status":"keep","total_µs":153,"color_µs":18,"mask_µs":39,"node_count":781,"p95_µs":216}
…ems, remaining = removed. 153→131µs (-14%)

Result: {"status":"keep","total_µs":131,"color_µs":19,"mask_µs":36,"node_count":781,"p95_µs":163}
… Avoids stack object management. 131→119µs (-9%)

Result: {"status":"keep","total_µs":119,"color_µs":16,"mask_µs":26,"node_count":781,"p95_µs":134}
…ashMap, avoids allocating two 781-element flat lists. 119→99µs (-17%), total from baseline 265→99µs (-63%)

Result: {"status":"keep","total_µs":99,"color_µs":19,"mask_µs":48,"node_count":781,"p95_µs":117}
…tegration.kt. No benchmark change (same code), but production integration now uses optimized paths.

Result: {"status":"keep","total_µs":107,"color_µs":18,"mask_µs":26,"node_count":781,"p95_µs":119}
…quality check before deep equals. Minimal bench impact but helps in production where styles are often same object.

Result: {"status":"keep","total_µs":106,"color_µs":17,"mask_µs":46,"node_count":781,"p95_µs":125}
…om PostHogReplayIntegration (replaced by RRWireframeDiffer). Code cleanup, no perf change.

Result: {"status":"keep","total_µs":106,"color_µs":17,"mask_µs":46,"node_count":781,"p95_µs":125}
…y without HashMap when IDs match. Falls back to HashMap only for structural changes. 106→80µs (-25%)

Result: {"status":"keep","total_µs":80,"color_µs":19,"mask_µs":47,"node_count":781,"p95_µs":84}
…(skip when empty), pre-sized ArrayLists, single-root diffTrees overload. No bench change but reduces per-frame allocations in production.

Result: {"status":"keep","total_µs":81,"color_µs":17,"mask_µs":41,"node_count":781,"p95_µs":89}
…e — avoids per-view allocation. Production-only improvement, no bench impact.

Result: {"status":"keep","total_µs":81,"color_µs":17,"mask_µs":41,"node_count":781,"p95_µs":89}
…yList when ViewGroup has children, pre-sized to childCount. Saves 1 allocation per leaf view.

Result: {"status":"keep","total_µs":81,"color_µs":17,"mask_µs":41,"node_count":781,"p95_µs":89}
…Minor allocation reduction in production.

Result: {"status":"keep","total_µs":81,"color_µs":17,"mask_µs":41,"node_count":781,"p95_µs":89}
…directly, avoids unnecessary list wrapping and parallel walk setup.

Result: {"status":"keep","total_µs":81,"color_µs":18,"mask_µs":42,"node_count":781,"p95_µs":112}
…70% faster). Optimization space exhausted.

Result: {"status":"keep","total_µs":80,"color_µs":14,"mask_µs":23,"node_count":781,"p95_µs":89}
…een/blue decomposition + Color.rgb recomposition (toRGBColor already masks to 0xFFFFFF). Fix autoresearch.sh stale test cache.

Result: {"status":"keep","total_µs":81,"color_µs":17,"mask_µs":51,"node_count":781,"p95_µs":86}
…, updates, adds, removes, structural changes, null root, flattenChildren, toRGBColor). All pass. 79µs confirmed.

Result: {"status":"keep","total_µs":79,"color_µs":15,"mask_µs":24,"node_count":781,"p95_µs":87}
…en oldList.size == newList.size (the common case). 80→78µs (~3%)

Result: {"status":"keep","total_µs":78,"color_µs":17,"mask_µs":30,"node_count":781,"p95_µs":87}
… 78-80µs. Optimization space fully explored — 33 experiments, 71% total improvement.

Result: {"status":"keep","total_µs":80,"color_µs":16,"mask_µs":43,"node_count":781,"p95_µs":86}
…ields in wireframePropertiesEqual. Handles null-null fast and interned strings. 80→73µs (-8.7%)

Result: {"status":"keep","total_µs":73,"color_µs":16,"mask_µs":42,"node_count":781,"p95_µs":89}
…yle. Avoids Intrinsics.areEqual for null-null comparisons (most fields are null). 73→71µs (-3%)

Result: {"status":"keep","total_µs":71,"color_µs":17,"mask_µs":43,"node_count":781,"p95_µs":82}
…ontSize, padding, etc). null===null skips Intrinsics.areEqual function call. 71→63µs (-11%)

Result: {"status":"keep","total_µs":63,"color_µs":18,"mask_µs":36,"node_count":781,"p95_µs":70}
…esEqual (disabled, checked, max, parentId). 63→62µs (~1.5%)

Result: {"status":"keep","total_µs":62,"color_µs":16,"mask_µs":27,"node_count":781,"p95_µs":69}
Result: {"status":"keep","total_µs":60,"color_µs":14,"mask_µs":43,"node_count":781,"p95_µs":66}
…rectness tests pass, Android module compiles.

Result: {"status":"keep","total_µs":61,"color_µs":19,"mask_µs":43,"node_count":781,"p95_µs":65}
…. Optimization surface exhausted — remaining ideas require Android runtime.

Result: {"status":"keep","total_µs":59,"color_µs":17,"mask_µs":38,"node_count":781,"p95_µs":68}
…ode, same performance. JIT handles both paths equally.

Result: {"status":"keep","total_µs":61,"color_µs":16,"mask_µs":25,"node_count":781,"p95_µs":68}
…(77%). Optimization surface fully exhausted on JVM benchmark. Remaining ideas are Android-only.

Result: {"status":"keep","total_µs":61,"color_µs":20,"mask_µs":24,"node_count":781,"p95_µs":70}

@github-advanced-security bot left a comment


CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

@github-actions
Contributor

posthog-android Compliance Report

Date: 2026-03-19 16:25:45 UTC
Duration: 147109ms

⚠️ Some Tests Failed

28/29 tests passed, 1 failed


Capture Tests

⚠️ 28/29 tests passed, 1 failed

Test | Status | Duration
--- | --- | ---
Format Validation.Event Has Required Fields | passed | 2320ms
Format Validation.Event Has Uuid | passed | 2027ms
Format Validation.Event Has Lib Properties | passed | 2027ms
Format Validation.Distinct Id Is String | passed | 2026ms
Format Validation.Token Is Present | passed | 2023ms
Format Validation.Custom Properties Preserved | passed | 2023ms
Format Validation.Event Has Timestamp | passed | 2026ms
Retry Behavior.Retries On 503 | passed | 7025ms
Retry Behavior.Does Not Retry On 400 | passed | 4025ms
Retry Behavior.Does Not Retry On 401 | passed | 4026ms
Retry Behavior.Respects Retry After Header | passed | 7026ms
Retry Behavior.Implements Backoff | passed | 17030ms
Retry Behavior.Retries On 500 | passed | 7026ms
Retry Behavior.Retries On 502 | passed | 7025ms
Retry Behavior.Retries On 504 | passed | 7023ms
Retry Behavior.Max Retries Respected | passed | 17034ms
Deduplication.Generates Unique Uuids | passed | 2043ms
Deduplication.Preserves Uuid On Retry | passed | 7026ms
Deduplication.Preserves Uuid And Timestamp On Retry | passed | 12022ms
Deduplication.Preserves Uuid And Timestamp On Batch Retry | passed | 7028ms
Deduplication.No Duplicate Events In Batch | passed | 2032ms
Deduplication.Different Events Have Different Uuids | passed | 2022ms
Compression.Sends Gzip When Enabled | passed | 2017ms
Batch Format.Uses Proper Batch Structure | passed | 2017ms
Batch Format.Flush With No Events Sends Nothing | passed | 2015ms
Batch Format.Multiple Events Batched Together | passed | 2030ms
Error Handling.Does Not Retry On 403 | passed | 4020ms
Error Handling.Does Not Retry On 413 | failed | 4022ms
Error Handling.Retries On 408 | passed | 7022ms

Failures

error_handling.does_not_retry_on_413

Expected 1 requests, got 2
