perf(replay): optimize session replay diffing — 77% faster (265→60µs)#465
Draft
marandaneto wants to merge 31 commits into main
…ask on 781-node tree
Result: {"status":"keep","total_µs":265,"color_µs":317,"mask_µs":42,"node_count":781,"p95_µs":307}
…nChildren, manual hex toRGBColor. 265→153µs (-42%), color 317→18µs (-94%)
Result: {"status":"keep","total_µs":153,"color_µs":18,"mask_µs":39,"node_count":781,"p95_µs":216}
…ems, remaining = removed. 153→131µs (-14%)
Result: {"status":"keep","total_µs":131,"color_µs":19,"mask_µs":36,"node_count":781,"p95_µs":163}
… Avoids stack object management. 131→119µs (-9%)
Result: {"status":"keep","total_µs":119,"color_µs":16,"mask_µs":26,"node_count":781,"p95_µs":134}
…ashMap, avoids allocating two 781-element flat lists. 119→99µs (-17%), total from baseline 265→99µs (-63%)
Result: {"status":"keep","total_µs":99,"color_µs":19,"mask_µs":48,"node_count":781,"p95_µs":117}
…tegration.kt. No benchmark change (same code), but production integration now uses optimized paths.
Result: {"status":"keep","total_µs":107,"color_µs":18,"mask_µs":26,"node_count":781,"p95_µs":119}
…quality check before deep equals. Minimal bench impact but helps in production where styles are often same object.
Result: {"status":"keep","total_µs":106,"color_µs":17,"mask_µs":46,"node_count":781,"p95_µs":125}
…om PostHogReplayIntegration (replaced by RRWireframeDiffer). Code cleanup, no perf change.
Result: {"status":"keep","total_µs":106,"color_µs":17,"mask_µs":46,"node_count":781,"p95_µs":125}
…y without HashMap when IDs match. Falls back to HashMap only for structural changes. 106→80µs (-25%)
Result: {"status":"keep","total_µs":80,"color_µs":19,"mask_µs":47,"node_count":781,"p95_µs":84}
…(skip when empty), pre-sized ArrayLists, single-root diffTrees overload. No bench change but reduces per-frame allocations in production.
Result: {"status":"keep","total_µs":81,"color_µs":17,"mask_µs":41,"node_count":781,"p95_µs":89}
…e — avoids per-view allocation. Production-only improvement, no bench impact.
Result: {"status":"keep","total_µs":81,"color_µs":17,"mask_µs":41,"node_count":781,"p95_µs":89}
…yList when ViewGroup has children, pre-sized to childCount. Saves 1 allocation per leaf view.
Result: {"status":"keep","total_µs":81,"color_µs":17,"mask_µs":41,"node_count":781,"p95_µs":89}
…Minor allocation reduction in production.
Result: {"status":"keep","total_µs":81,"color_µs":17,"mask_µs":41,"node_count":781,"p95_µs":89}
…directly, avoids unnecessary list wrapping and parallel walk setup.
Result: {"status":"keep","total_µs":81,"color_µs":18,"mask_µs":42,"node_count":781,"p95_µs":112}
…70% faster). Optimization space exhausted.
Result: {"status":"keep","total_µs":80,"color_µs":14,"mask_µs":23,"node_count":781,"p95_µs":89}
…een/blue decomposition + Color.rgb recomposition (toRGBColor already masks to 0xFFFFFF). Fix autoresearch.sh stale test cache.
Result: {"status":"keep","total_µs":81,"color_µs":17,"mask_µs":51,"node_count":781,"p95_µs":86}
…, updates, adds, removes, structural changes, null root, flattenChildren, toRGBColor). All pass. 79µs confirmed.
Result: {"status":"keep","total_µs":79,"color_µs":15,"mask_µs":24,"node_count":781,"p95_µs":87}
…en oldList.size == newList.size (the common case). 80→78µs (~3%)
Result: {"status":"keep","total_µs":78,"color_µs":17,"mask_µs":30,"node_count":781,"p95_µs":87}
… 78-80µs. Optimization space fully explored — 33 experiments, 71% total improvement.
Result: {"status":"keep","total_µs":80,"color_µs":16,"mask_µs":43,"node_count":781,"p95_µs":86}
…ields in wireframePropertiesEqual. Handles null-null fast and interned strings. 80→73µs (-8.7%)
Result: {"status":"keep","total_µs":73,"color_µs":16,"mask_µs":42,"node_count":781,"p95_µs":89}
…yle. Avoids Intrinsics.areEqual for null-null comparisons (most fields are null). 73→71µs (-3%)
Result: {"status":"keep","total_µs":71,"color_µs":17,"mask_µs":43,"node_count":781,"p95_µs":82}
…ontSize, padding, etc). null===null skips Intrinsics.areEqual function call. 71→63µs (-11%)
Result: {"status":"keep","total_µs":63,"color_µs":18,"mask_µs":36,"node_count":781,"p95_µs":70}
…esEqual (disabled, checked, max, parentId). 63→62µs (~1.5%)
Result: {"status":"keep","total_µs":62,"color_µs":16,"mask_µs":27,"node_count":781,"p95_µs":69}
Result: {"status":"keep","total_µs":60,"color_µs":14,"mask_µs":43,"node_count":781,"p95_µs":66}
…rectness tests pass, Android module compiles.
Result: {"status":"keep","total_µs":61,"color_µs":19,"mask_µs":43,"node_count":781,"p95_µs":65}
…. Optimization surface exhausted — remaining ideas require Android runtime.
Result: {"status":"keep","total_µs":59,"color_µs":17,"mask_µs":38,"node_count":781,"p95_µs":68}
…ode, same performance. JIT handles both paths equally.
Result: {"status":"keep","total_µs":61,"color_µs":16,"mask_µs":25,"node_count":781,"p95_µs":68}
…(77%). Optimization surface fully exhausted on JVM benchmark. Remaining ideas are Android-only.
Result: {"status":"keep","total_µs":61,"color_µs":20,"mask_µs":24,"node_count":781,"p95_µs":70}
CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.
posthog-android Compliance Report
Date: 2026-03-19 16:25:45 UTC

| Test | Status | Duration |
|---|---|---|
| Format Validation.Event Has Required Fields | ✅ | 2320ms |
| Format Validation.Event Has Uuid | ✅ | 2027ms |
| Format Validation.Event Has Lib Properties | ✅ | 2027ms |
| Format Validation.Distinct Id Is String | ✅ | 2026ms |
| Format Validation.Token Is Present | ✅ | 2023ms |
| Format Validation.Custom Properties Preserved | ✅ | 2023ms |
| Format Validation.Event Has Timestamp | ✅ | 2026ms |
| Retry Behavior.Retries On 503 | ✅ | 7025ms |
| Retry Behavior.Does Not Retry On 400 | ✅ | 4025ms |
| Retry Behavior.Does Not Retry On 401 | ✅ | 4026ms |
| Retry Behavior.Respects Retry After Header | ✅ | 7026ms |
| Retry Behavior.Implements Backoff | ✅ | 17030ms |
| Retry Behavior.Retries On 500 | ✅ | 7026ms |
| Retry Behavior.Retries On 502 | ✅ | 7025ms |
| Retry Behavior.Retries On 504 | ✅ | 7023ms |
| Retry Behavior.Max Retries Respected | ✅ | 17034ms |
| Deduplication.Generates Unique Uuids | ✅ | 2043ms |
| Deduplication.Preserves Uuid On Retry | ✅ | 7026ms |
| Deduplication.Preserves Uuid And Timestamp On Retry | ✅ | 12022ms |
| Deduplication.Preserves Uuid And Timestamp On Batch Retry | ✅ | 7028ms |
| Deduplication.No Duplicate Events In Batch | ✅ | 2032ms |
| Deduplication.Different Events Have Different Uuids | ✅ | 2022ms |
| Compression.Sends Gzip When Enabled | ✅ | 2017ms |
| Batch Format.Uses Proper Batch Structure | ✅ | 2017ms |
| Batch Format.Flush With No Events Sends Nothing | ✅ | 2015ms |
| Batch Format.Multiple Events Batched Together | ✅ | 2030ms |
| Error Handling.Does Not Retry On 403 | ✅ | 4020ms |
| Error Handling.Does Not Retry On 413 | ❌ | 4022ms |
| Error Handling.Retries On 408 | ✅ | 7022ms |
Failures
error_handling.does_not_retry_on_413
Expected 1 requests, got 2
💡 Motivation and Context
The `PostHogReplayIntegration` session replay engine runs a snapshot diff on every frame draw. The original diffing algorithm created 6+ intermediate collections, flattened both wireframe trees into lists, and used data class `copy()` for comparison, all on the hot path that executes hundreds of times per second on user devices.

This PR fundamentally redesigns the diffing algorithm and applies targeted micro-optimizations to reduce per-frame CPU and allocation overhead.
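For readers who have not opened the diff, here is a minimal sketch of the idea the commit notes describe: walk the old and new trees in parallel without a `HashMap` while child IDs line up, fall back to a `HashMap` only where the structure changed, and treat whatever is left in the map as removed. It is not the code in `RRWireframeDiffer.kt`. The `Node` and `Diff` types, the `text` property, and all function names here are hypothetical stand-ins.

```kotlin
// Hypothetical stand-in for a wireframe node; NOT the SDK's wireframe type.
class Node(val id: Int, val text: String?, val children: List<Node> = emptyList())

// Hypothetical result holder; the real differ's output types may differ.
class Diff {
    val added = ArrayList<Node>()
    val removed = ArrayList<Node>()
    val updated = ArrayList<Node>()
}

// Adds a node and its whole subtree to the given list.
fun collect(node: Node, out: MutableList<Node>) {
    out.add(node)
    for (child in node.children) collect(child, out)
}

// True when both child lists have the same IDs at the same positions.
fun sameIdsInOrder(a: List<Node>, b: List<Node>): Boolean {
    if (a.size != b.size) return false
    for (i in a.indices) if (a[i].id != b[i].id) return false
    return true
}

// Precondition: oldNode.id == newNode.id.
fun walk(oldNode: Node, newNode: Node, diff: Diff) {
    if (oldNode.text != newNode.text) diff.updated.add(newNode)

    val oldKids = oldNode.children
    val newKids = newNode.children
    if (sameIdsInOrder(oldKids, newKids)) {
        // Fast path: index-by-index recursion, no map and no flattened lists.
        for (i in oldKids.indices) walk(oldKids[i], newKids[i], diff)
        return
    }
    // Slow path: the structure changed under this parent, so match by ID.
    val oldById = HashMap<Int, Node>(oldKids.size * 2)
    for (child in oldKids) oldById[child.id] = child
    for (child in newKids) {
        val match = oldById.remove(child.id)
        if (match == null) collect(child, diff.added) else walk(match, child, diff)
    }
    // Anything never matched no longer exists in the new tree.
    for (leftover in oldById.values) collect(leftover, diff.removed)
}

fun diffTrees(oldRoot: Node?, newRoot: Node?): Diff {
    val diff = Diff()
    if (oldRoot == null) {
        if (newRoot != null) collect(newRoot, diff.added)
    } else if (newRoot == null) {
        collect(oldRoot, diff.removed)
    } else if (oldRoot.id != newRoot.id) {
        collect(oldRoot, diff.removed)
        collect(newRoot, diff.added)
    } else {
        walk(oldRoot, newRoot, diff)
    }
    return diff
}
```

In this sketch an unchanged frame does no map or list-flattening allocation during the walk, which is the property the benchmark numbers in the commit notes are attributed to. Whether the real differ reports whole subtrees as added or removed is an assumption of the sketch.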
Result: 265µs → 60µs (77% faster) on a realistic 781-node wireframe tree benchmark, verified across 48 experiments.
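Two of the micro-optimizations mentioned in the commit notes are easy to illustrate in isolation: checking reference equality before structural equality for nullable fields (so that null-vs-null and same-object comparisons skip the `equals` call), and building the hex color string by hand instead of going through `String.format`. The sketch below is illustrative only. `fastEquals` and `toRgbHex` are hypothetical names, and the uppercase `#RRGGBB` output is an assumption, not taken from the SDK's `toRGBColor`.

```kotlin
// Hypothetical helper: identity check first, structural equals only if needed.
// Returns true immediately for null/null and for the same object instance.
fun <T> fastEquals(a: T?, b: T?): Boolean = a === b || (a != null && a == b)

// Lookup table for hex digits (uppercase is an assumption of this sketch).
val HEX_DIGITS: CharArray = "0123456789ABCDEF".toCharArray()

// Hypothetical manual formatter: masks off alpha and writes six hex digits
// into a fixed-size CharArray, avoiding String.format's parsing and boxing.
fun toRgbHex(color: Int): String {
    val rgb = color and 0xFFFFFF
    val out = CharArray(7)
    out[0] = '#'
    for (i in 0 until 6) {
        val shift = (5 - i) * 4          // 20, 16, 12, 8, 4, 0
        out[i + 1] = HEX_DIGITS[(rgb shr shift) and 0xF]
    }
    return String(out)
}
```

For any `Int` input, `toRgbHex(color)` should produce the same string as `String.format("#%06X", color and 0xFFFFFF)`, which makes the two easy to cross-check in a unit test.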
Key optimizations
- `===` before `==` for nullable field comparisons
- `toRGBColor` (replace `String.format`)
- `styleEquals` (skip data class equals overhead)

Production improvements (not benchmarked)
- `IntArray(2)` for `getLocationOnScreen` coordinates
- `MutatedNode`/`RemovedNode` list creation (skip when empty)
- `ArrayList` for events, children, mouse interactions
- `GradientDrawable.toRGBColor` (remove unnecessary decompose/recompose)
- … (`flattenChildren`, `findAddedAndRemovedItems` from integration)

New files
- `RRWireframeDiffer.kt`: Extracted, optimized diff engine (pure Kotlin/JVM, no Android deps)
- `RRWireframeDifferTest.kt`: 7 correctness tests (identical trees, updates, adds, removes, structural changes, null root, flatten order, toRGBColor)
- `RRWireframeDifferBenchmarkTest.kt`: Performance benchmark

💚 How did you test it?
- `:posthog:test` suite passes
- `:posthog-android:compileDebugKotlin` compiles cleanly

📝 Checklist