[codex] speed up BLE image preparation#21

Merged
g4bri3lDev merged 5 commits into OpenDisplay:main from LordMike:codex/ble-image-pipeline-performance
May 12, 2026

@LordMike LordMike commented May 7, 2026

Summary

Fixes #17.

This PR makes BLE image uploads avoid the slow internal JPEG round trip and reduces the CPU cost of preparing display bytes.

  • Keeps drawcustom output as a PIL image inside the integration until a destination actually needs JPEG.
  • Encodes JPEG only for AP /imgupload and Home Assistant image previews.
  • Sends PIL images directly into BLE upload preparation instead of PIL -> JPEG -> PIL internally.
  • Adds an exact-palette fast path for dither=0 using Image.getcolors(maxcolors=len(palette)).
  • Replaces per-pixel Python loops for direct mapping, ordered dithering quantization, and direct-write packing with chunked NumPy operations.
  • Selects compressed direct-write using the actual encoded display payload size, falling back to raw when the compressed payload exceeds the MCU buffer limit.
  • Moves BLE image preparation, direct-write encoding, compression checks, and compression off the Home Assistant event loop.
  • Logs drawcustom stage timings at upload completion so remote installations can show where time is spent.
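
A minimal sketch of the exact-palette check, assuming an RGB image and a hypothetical three-color palette (`PALETTE` and `is_exact_palette_image` are illustrative names, not the integration's own). It relies on `Image.getcolors` returning `None` once the image holds more distinct colors than `maxcolors`:

```python
from PIL import Image

# Hypothetical display palette, for illustration only.
PALETTE = [(0, 0, 0), (255, 255, 255), (255, 0, 0)]


def is_exact_palette_image(img, palette):
    """Return True if every pixel is already one of the palette colors."""
    # getcolors() returns None when the image has more than maxcolors
    # distinct colors, so an image that cannot fit the palette exits early.
    colors = img.convert("RGB").getcolors(maxcolors=len(palette))
    if colors is None:
        return False
    allowed = set(palette)
    # Each entry is (pixel_count, color); only the color matters here.
    return all(color in allowed for _count, color in colors)
```

When the check returns True, palette mapping can be skipped and the pixels only need to be translated to palette indices.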

Key Commits

e72b609 - internal image format

  • drawcustom now keeps generated images as PIL images internally.
  • AP upload and HA previews perform JPEG encoding at their own boundaries.
  • BLE block/direct upload paths receive PIL images directly and no longer decode the generated image from JPEG.
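
The boundary idea above can be sketched as follows. This is an illustration only: `encode_jpeg`, `deliver`, the destination strings, and the quality value are assumptions, not names or settings taken from the integration.

```python
import io

from PIL import Image


def encode_jpeg(img, quality=90):
    """Encode a PIL image to JPEG bytes (quality is an assumed default)."""
    buf = io.BytesIO()
    img.convert("RGB").save(buf, format="JPEG", quality=quality)
    return buf.getvalue()


def deliver(img, destination):
    """Hand the image to a destination, encoding only where JPEG is needed."""
    if destination in ("ap", "preview"):
        # AP upload and previews are the only consumers that need JPEG.
        return encode_jpeg(img)
    # BLE preparation receives the PIL image untouched: no JPEG round trip.
    return img
```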

c6fabd1 - BLE preparation optimization

  • dither=0 skips palette mapping when every pixel is already an exact display-palette color.
  • Direct mapping and ordered dithering quantization now use chunked NumPy processing to avoid large peak allocations.
  • Direct-write encoding for color schemes 0-5 now packs bytes with vectorized operations.
  • CPU-heavy BLE preparation runs in an executor so the HA event loop remains responsive.
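
A rough sketch of what chunked direct mapping and vectorized packing can look like. It assumes nearest-color mapping by squared RGB distance and a hypothetical 2-bits-per-pixel layout; the real bit layouts for color schemes 0-5 are not described here, and `map_to_palette`, `pack_2bpp`, and `chunk_rows` are illustrative names.

```python
import numpy as np


def map_to_palette(pixels, palette, chunk_rows=64):
    """Map (H, W, 3) uint8 pixels to nearest palette indices, (H, W) uint8."""
    pal = palette.astype(np.int32)
    out = np.empty(pixels.shape[:2], dtype=np.uint8)
    # Process a band of rows at a time so the (rows, W, N, 3) distance
    # tensor never covers the whole image at once.
    for start in range(0, pixels.shape[0], chunk_rows):
        chunk = pixels[start:start + chunk_rows].astype(np.int32)
        diff = chunk[:, :, None, :] - pal[None, None, :, :]
        dist = (diff * diff).sum(axis=-1)
        out[start:start + chunk_rows] = dist.argmin(axis=-1)
    return out


def pack_2bpp(indices):
    """Pack palette indices (values 0-3) four per byte, first pixel in the high bits."""
    flat = (indices.reshape(-1) & 0x3).astype(np.uint8)
    pad = (-flat.size) % 4
    if pad:
        # Pad with index 0 so the pixel count is a multiple of four.
        flat = np.concatenate([flat, np.zeros(pad, dtype=np.uint8)])
    quads = flat.reshape(-1, 4)
    packed = (quads[:, 0] << 6) | (quads[:, 1] << 4) | (quads[:, 2] << 2) | quads[:, 3]
    return packed.astype(np.uint8).tobytes()
```

The chunk size trades peak memory against per-chunk overhead; 64 rows is an arbitrary value for the sketch.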

1233130 - direct-write compression selection

  • The previous compression gate used generated JPEG byte length as a proxy for zlib output size, even though direct-write sends encoded display bytes rather than JPEG.
  • Direct-write selection now treats zip as an allowed capability, not a preselected upload mode.
  • The executor-side prepare step compresses the real encoded display bytes and uses compressed direct-write only when that payload fits the 50 KiB MCU compressed buffer limit.
  • Oversized compressed attempts are capped during streaming compression and fall back to raw direct-write.
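
The selection logic can be sketched with the standard-library `zlib` module. This is an assumption-laden illustration: `compress_capped`, `choose_direct_write`, the chunk size, and the mode strings are hypothetical; only the 50 KiB limit comes from the description above.

```python
import zlib

MAX_COMPRESSED = 50 * 1024  # MCU compressed buffer limit stated above


def compress_capped(payload, limit=MAX_COMPRESSED, chunk_size=4096):
    """Compress in chunks and give up as soon as the output exceeds limit."""
    comp = zlib.compressobj()
    out = bytearray()
    for start in range(0, len(payload), chunk_size):
        out += comp.compress(payload[start:start + chunk_size])
        if len(out) > limit:
            return None  # stop early instead of compressing the rest
    out += comp.flush()
    return bytes(out) if len(out) <= limit else None


def choose_direct_write(payload, zip_allowed, limit=MAX_COMPRESSED):
    """Return (mode, data): compressed when allowed and it fits, else raw."""
    if zip_allowed:
        compressed = compress_capped(payload, limit)
        if compressed is not None:
            return "compressed", compressed
    return "raw", payload
```

The decision is made on the encoded display bytes themselves, so no proxy such as JPEG length is involved.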

fc034d4 - drawcustom upload timing logs

  • Completion logs now include render, dither/quantize, and send/refresh timings for BLE uploads.
  • AP upload completion logs include render, JPEG encode, and HTTP send timings.
  • Dry-run logs include render and preview encode timings.
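
One way to collect per-stage timings like these is a small context-manager helper. The sketch below is hypothetical (`StageTimer` and the stage names are illustrative and not taken from the integration):

```python
import time
from contextlib import contextmanager


class StageTimer:
    """Accumulate wall-clock time per named stage."""

    def __init__(self):
        self.stages = {}

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Add to any earlier time recorded under the same stage name.
            elapsed = time.perf_counter() - start
            self.stages[name] = self.stages.get(name, 0.0) + elapsed

    def summary(self):
        """Format stages in insertion order, e.g. for one completion log line."""
        return ", ".join(f"{name}={secs * 1000:.1f}ms" for name, secs in self.stages.items())
```

Usage would wrap each step, for example `with timer.stage("render"): ...`, and log `timer.summary()` once when the upload completes.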

Timing Context

The original report came from issue #17, where my ARM Home Assistant server took about 29s to process a 960x640 image in the current path. That CPU-bound work ran on the HA event loop, so HA's HTTP server stopped responding during the call; in practice, drawcustom service calls timed out in the browser while the upload continued in the background.

The largest panel size I found was 2560x1440, so this PR uses that as a practical worst case. It has exactly 6x as many pixels as 960x640, so linear scaling on my ARM host implies roughly 174s in the original path.
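
The scaling estimate can be reproduced with a few lines of arithmetic; `scale_linear` is a hypothetical helper that assumes processing time grows linearly with pixel count:

```python
def scale_linear(base_seconds, base_size, target_size):
    """Scale a measured time by the ratio of pixel counts (linear assumption)."""
    base_pixels = base_size[0] * base_size[1]
    target_pixels = target_size[0] * target_size[1]
    return base_seconds * target_pixels / base_pixels


ratio = (2560 * 1440) / (960 * 640)  # 3,686,400 / 614,400 = 6.0
estimate = scale_linear(29, (960, 640), (2560, 1440))  # 174.0 seconds
```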

The timings below for the original path were measured locally on a much faster x64 CPU; read them only as lower-bound comparison data, not as representative of my HA server:

  • 2560x1440, scheme 4, dither=0, original path: roughly 10-11.4s depending on image content.
  • 2560x1440, optimized candidate: roughly 0.50-0.54s for vectorized process + encode + zlib.
  • 2560x1440, exact-palette skip path: roughly 0.006s detection plus 0.051s direct-write encoding.

The ARM host should benefit strongly because this removes the expensive Python per-pixel loops and moves the remaining CPU-heavy BLE preparation into executor jobs, keeping HA responsive while the work runs.
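
The effect of moving blocking work off the event loop can be sketched with plain `asyncio`. Home Assistant provides its own helper for executor jobs; this sketch deliberately uses only the standard library, and `blocking_prepare`, `upload`, and the heartbeat task are illustrative stand-ins, with `time.sleep` substituting for the CPU-heavy preparation.

```python
import asyncio
import time


def blocking_prepare(seconds):
    """Stand-in for CPU-heavy image preparation (blocks its thread)."""
    time.sleep(seconds)
    return b"prepared"


async def upload(loop_ticks):
    """Run the blocking step in the default executor while the loop keeps ticking."""
    loop = asyncio.get_running_loop()

    async def heartbeat():
        # Records a timestamp every 10 ms; it only runs if the loop is free.
        while True:
            loop_ticks.append(time.monotonic())
            await asyncio.sleep(0.01)

    hb = asyncio.create_task(heartbeat())
    try:
        return await loop.run_in_executor(None, blocking_prepare, 0.2)
    finally:
        hb.cancel()
```

If `blocking_prepare` were called directly inside the coroutine instead, the heartbeat would record nothing until it returned, which matches the unresponsive behaviour described above.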

Validation

  • Python compile check for custom_components/opendisplay passed.
  • In-memory equivalence check passed for direct/ordered processing and direct-write encoding across color schemes 0-5.
  • Direct-write compression selection smoke check passed for compressed, raw fallback, and zip-disabled cases.
  • Prepare-timing smoke check passed for the block and direct prepare return values and their timing fields.
  • Full pytest did not complete locally because setting up the HA test dependency environment was interrupted; CI should run on this PR.

Comment thread custom_components/opendisplay/ble/image_upload.py Outdated
Comment thread custom_components/opendisplay/ble/image_upload.py
@LordMike LordMike force-pushed the codex/ble-image-pipeline-performance branch 2 times, most recently from 82dd583 to 9536527 Compare May 7, 2026 19:40
@LordMike LordMike force-pushed the codex/ble-image-pipeline-performance branch from 9536527 to 9bde2b3 Compare May 7, 2026 19:41
@LordMike LordMike marked this pull request as ready for review May 7, 2026 19:59
@LordMike LordMike requested a review from g4bri3lDev as a code owner May 7, 2026 19:59
@g4bri3lDev g4bri3lDev merged commit d1ad0c3 into OpenDisplay:main May 12, 2026
2 of 3 checks passed

Development

Successfully merging this pull request may close these issues.

drawcustom: Home Assistant stops responding while image is being rendered
