A 338K-parameter CNN that finds the iPhone Pointer-Control cursor in a screen capture, in 10 ms.
When you pair a Bluetooth mouse with an iPhone, iOS draws a small "@-symbol" cursor on the screen (Pointer Control). This repo trains and ships a tiny model that finds that cursor with sub-cursor-radius accuracy, on any iPhone screen, in real time. It's the perception layer of a closed-loop iPhone agent that drives an unmodified phone over BLE HID + AirPlay mirroring.
A heatmap-regression CNN trained on synthetic composites. No iPhone images are bundled. Bring your own backgrounds, run `synthesize.py`, and train your own variant — or just use the shipped weights directly.
Status: PyPI publish + Hugging Face Hub model repo are not live yet — both are gated on the v0.7 retrain finishing (see Roadmap). Today's install path is from a clone:
```bash
git clone https://github.com/ellyseum/ios_pointer_finder.git
cd ios_pointer_finder
pip install -e ".[hub,safetensors]"
```

Then either:
A) Train your own weights (see "Train your own" below) — produces `pointer_model_v{X}.{Y}.{Z}_{err}px.{pt|safetensors}`. Convert legacy `.pt` to `.safetensors`:

```bash
python scripts/convert_pt_to_safetensors.py pointer_model_v0.7.0.pt
# → pointer_model_v0.7.0.safetensors + pointer_model_v0.7.0.config.json
```

`train.py` can also write `.safetensors` directly via `--weights-out *.safetensors`.
B) Once the HF repo is up (gated on the v0.7 retrain), use the one-liner the package is designed for:
```python
import cv2
from inference import PointerFinder

img = cv2.imread("snap.jpg")  # any resolution — auto-resized to native (994×2160)
finder = PointerFinder.from_pretrained("ellyseum/ios_pointer_finder")
result = finder(img)
print(result.x, result.y, result.confidence, result.heatmap_peak)
# 656 1424 0.94 0.81
```

For now (path A), call PointerFinder with a local `.safetensors` or `.pt` path:
```python
from inference import PointerFinder

finder = PointerFinder.from_pretrained("./pointer_model_v0.7.0.safetensors")
```

See docs/MODEL_CARD.md for the full inference contract and provenance.
Vision-LLM-based agents that "look at the screen" are too slow for real-time control loops (2–10 seconds per turn). For the iPhone agent we wanted to build, we needed one perception question — where is the cursor, right now? — answered in milliseconds, on commodity hardware, with no API call. A specialized 338K-param CNN handles it in 10 ms on an RTX 5080 and in ~30 ms on Apple Silicon via CoreML.
The general-purpose vision LLM still answers "what's on this screen" when we need it. The cursor finder answers "where's the cursor" continuously.
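As a concrete sketch of that continuous loop (hypothetical glue code: `grab_frame` stands in for whatever AirPlay capture you use; only the `PointerFinder` call and the `result` fields come from this repo's API):

```python
import time
from inference import PointerFinder

finder = PointerFinder.from_pretrained("./pointer_model_v0.7.0.safetensors")

def track(grab_frame, conf_threshold=0.5, hz=30):
    """Poll the screen and yield cursor positions the agent can act on.
    grab_frame() is a hypothetical capture callback returning a BGR frame."""
    while True:
        result = finder(grab_frame())
        if result.confidence >= conf_threshold:  # gate on the conf head
            yield result.x, result.y             # native-pixel coordinates
        time.sleep(1.0 / hz)
```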
| Field | Value |
|---|---|
| Architecture | 5-block conv backbone → 1×1 heatmap head + conf head |
| Parameters | 338,274 |
| File size | 1.3 MB (.safetensors fp32) |
| Native input | 994 × 2160 (iPhone H264 stream) |
| Train input | 497 × 1080 (2× downsample) |
| Heatmap stride | 1/8 of train resolution (≈ 1/16 of native after the 2× input downsample) |
| Inference latency | 10 ms (RTX 5080) / ~30 ms (M-series CoreML) |
| Throughput | 95 fps (single-image batch, fp32, RTX 5080) |
| Val pos error | TBD — v0.7.0 cold-start retrain pending. v0.5 reached 18.9 px on bg-level honest split before the v0.6 fixes landed. |
| FPR (cursor-free) | <2% at conf ≥ 0.5 on held-out backgrounds |
| Weights license | CC-BY-4.0 |
See docs/MODEL_CARD.md for the full evaluation breakdown
(per-sample-type metrics, failure modes, comparison to v0.2 / v0.3.x).
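The resolution chain in the table composes as plain arithmetic (a quick sanity check, not repo code):

```python
import math

native = (994, 2160)                       # W, H of the AirPlay mirror
train = (native[0] // 2, native[1] // 2)   # 2x downsample -> (497, 1080)
heatmap = (math.ceil(train[0] / 8), train[1] // 8)  # stride 8 -> (63, 135)
print(train, heatmap)                      # (497, 1080) (63, 135)
```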
```
iPhone screen capture (994×2160 BGR)
        │
        │ resize to 497×1080
        ▼
conv backbone (3 stride-2 + 2 stride-1 blocks → 1/8 of train resolution)
        │
        ├── heatmap head (1×1 conv) → 63×135 sigmoid map → argmax → (cx, cy) in native px
        └── conf head (global avg pool → MLP) → P(cursor present)
```
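In PyTorch terms, the network's shape looks roughly like this (an illustrative sketch; channel widths and block internals are assumptions, and only the strides, the 1×1 heatmap head, and the `(conf_logit, heatmap)` return contract come from this README):

```python
import torch.nn as nn

class PointerNetSketch(nn.Module):
    """Illustrative stand-in for the real PointerNet; widths are guesses."""

    def __init__(self):
        super().__init__()

        def block(cin, cout, stride):
            return nn.Sequential(
                nn.Conv2d(cin, cout, 3, stride=stride, padding=1, bias=False),
                nn.BatchNorm2d(cout),
                nn.ReLU(inplace=True),
            )

        # 3 stride-2 blocks + 2 stride-1 blocks -> overall stride 8
        self.backbone = nn.Sequential(
            block(3, 16, 2), block(16, 32, 2), block(32, 64, 2),
            block(64, 64, 1), block(64, 64, 1),
        )
        self.heatmap_head = nn.Conv2d(64, 1, kernel_size=1)  # raw per-cell logits
        self.conf_head = nn.Sequential(  # avg pool per the v0.6 diagram (v0.7 swaps to max)
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 1),
        )

    def forward(self, x):
        f = self.backbone(x)
        # v0.6 contract: (conf_logit, heatmap); sigmoid/decoding happen downstream
        return self.conf_head(f), self.heatmap_head(f)
```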
Trained on synthetic composites:
- backgrounds — real iPhone screen captures (cursor-free) — bring your own
- cursor — procedural smoothstep disc (peak alpha ≈ 0.25, luminance-matched to the local background), resized to ~46 native px and composited at random positions. v0.7 ships on this canonical synth target. A captured sprite at `sprites/at_dot.png` may be substituted, but only when accompanied by an approved sidecar manifest (sha256 + approver) — the loader fails hard otherwise.

The synthetic mix includes hard negatives (decoy shapes designed to look cursor-like at a distance) and edge-clipped positives (cursor partially off-screen).
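To make "procedural smoothstep disc" concrete, here is a minimal NumPy sketch of the compositing idea (the real generator lives in `synthesize.py`; the radii, falloff edges, and luminance matching below are simplified assumptions):

```python
import numpy as np

def smoothstep(edge0, edge1, x):
    t = np.clip((x - edge0) / (edge1 - edge0), 0.0, 1.0)
    return t * t * (3.0 - 2.0 * t)

def composite_disc(bg, cx, cy, radius=23, peak_alpha=0.25):
    """Blend a soft disc into bg (H, W, 3 uint8) centered at (cx, cy).
    radius≈23 approximates the ~46 native-px cursor diameter; luminance
    matching is crudely simplified to brighten-on-dark / darken-on-light."""
    h, w = bg.shape[:2]
    yy, xx = np.mgrid[0:h, 0:w]
    d = np.sqrt((xx - cx) ** 2 + (yy - cy) ** 2)
    alpha = peak_alpha * (1.0 - smoothstep(radius * 0.6, radius, d))
    patch = bg[max(cy - radius, 0):cy + radius, max(cx - radius, 0):cx + radius]
    target = 255.0 if patch.mean() < 128 else 0.0  # push toward the contrasting tone
    out = bg.astype(np.float32) * (1.0 - alpha[..., None]) + target * alpha[..., None]
    return out.astype(np.uint8)
```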
See docs/ARCHITECTURE.md for the full data + loss + training
schedule writeup.
| Version | Val pos err | Cursor-free FPR | Real-frame top-1 hit | Inference (ms, RTX 5080) |
|---|---|---|---|---|
| v0.2 | 73.9 px | 27% | 41/50 | 10 |
| v0.3.4 | 30.5 px | 1.7% | 49/50 | 10 |
| v0.4.0 | 22.9 px | <2% | — | 10 |
| v0.5.0 | 18.9 px | <2% | — | 10 |
| v0.7.0 | TBD | TBD | — | 10 |
All versions train on synthetic data (real backgrounds + composited cursor sprite) with a bg-level honest val split — apples-to-apples comparable from v0.3.x onward. v0.4.0 added correctness fixes (float labels through augmentation, parabolic subpixel refinement at inference).

v0.5.0/v0.6.x shipped on a sprite asset that turned out to be a UI badge, not the cursor. v0.7 reverts to the v0.4 procedural smoothstep disc as the canonical synth target, adds a visual-validation gate that catches that failure pattern, and re-enables the previously-silent real-frame regression eval. v0.7 also unifies the decoder across all aux scripts (single canonical `decode.py`), switches the heatmap BCE reduction from mean to sum with a calibrated `HM_WEIGHT` (the prior mean form diluted the localization gradient ~1400× relative to the confidence head), and swaps confidence-head pooling from average to max (the avg-pool washed out the cursor signal at the head). The v0.7.0 numbers land here once the retrain completes.
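The reduction change is easiest to see in code. A hedged sketch (the calibrated `HM_WEIGHT` and the exact head losses live in `train.py`; the constant below is a placeholder):

```python
import torch.nn.functional as F

HM_WEIGHT = 1e-3  # placeholder, NOT the repo's calibrated constant

def combined_loss(conf_logit, heatmap_logits, conf_target, heatmap_target):
    # reduction="sum": every heatmap cell keeps its full gradient instead of
    # being averaged away over thousands of cells (the dilution noted above)
    hm = F.binary_cross_entropy_with_logits(
        heatmap_logits, heatmap_target, reduction="sum")
    conf = F.binary_cross_entropy_with_logits(
        conf_logit, conf_target, reduction="mean")
    return conf + HM_WEIGHT * hm
```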
Run the eval harness yourself:

```bash
python eval_v03.py --v02 pointer_model_v0.2.0.pt --v07 pointer_model_v0.7.0.safetensors
```

To train your own weights from scratch:

```bash
git clone https://github.com/ellyseum/ios_pointer_finder.git
cd ios_pointer_finder
pip install -e ".[dev]"

# 1. Capture iPhone backgrounds (cursor OFF). See capture_backgrounds.py for the workflow.
mkdir -p backgrounds_kept
python capture_backgrounds.py   # interactive curation

# 2. Synthesize the training set (~150K samples).
python synthesize.py --out dataset --n 150000

# 3. Train.
python train.py --dataset dataset --epochs 30 --augment

# 4. Eval against v0.2 baseline.
python eval_v03.py
```

See docs/DATASET.md for background-collection guidance.
```
ios_pointer_finder/
├── inference.py              # public PointerFinder API (load + predict)
├── train.py                  # training loop, semver-tagged checkpoints
├── synthesize.py             # synthetic dataset generator
├── eval_v03.py               # version comparison eval
├── capture_backgrounds.py    # iPhone screen capture pipeline
├── extract_sprite.py         # extract a real cursor sprite from a screen capture (optional)
├── grid_overlay.py           # debug overlay
├── keep_picker.py            # interactive background curation
├── cli.py                    # `ipf` console entry point
├── scripts/
│   ├── convert_pt_to_safetensors.py  # one-shot .pt → .safetensors converter
│   ├── export_onnx.py                # ONNX export
│   └── export_coreml.py              # CoreML export (Mac)
├── tests/                    # pytest suite (shape, golden image, smoke)
├── docs/
│   ├── MODEL_CARD.md
│   ├── DATASET.md
│   └── ARCHITECTURE.md
├── .github/workflows/        # CI (ruff + pytest) + release automation
├── VERSION                   # current model version (semver)
└── bump.sh                   # patch/minor/major bump + tag
```
Trained weights are distributed at:

- Primary: Hugging Face Hub — `ellyseum/ios_pointer_finder` — versioned `.safetensors` + sidecar `config.json` + model card.
- Mirror: GitHub Releases — each semver tag attaches the canonical `pointer_model_v{X}.{Y}.{Z}.safetensors` and a matching `pointer_model_v{X}.{Y}.{Z}.config.json`.
We don't use Git LFS — at 1.3 MB per checkpoint and ~10 production checkpoints, hosting on HF/Releases is faster, cheaper, and more discoverable.
To convert any historical `.pt` checkpoint locally:

```bash
python scripts/convert_pt_to_safetensors.py pointer_model_v0.3.4_30.5px.pt
# → pointer_model_v0.3.4_30.5px.safetensors + pointer_model_v0.3.4_30.5px.config.json
```

Development setup:

```bash
pip install -e ".[dev]"

# Lint + format
ruff check .
ruff format .

# Tests
pytest -q

# Bump version + tag a new model
./bump.sh patch --commit
```

CI runs ruff + pytest on every push and PR. See CONTRIBUTING.md for the full PR / release workflow.
The shipped weights are calibrated on an iPhone 16 Pro Max (iPhone17,2), iOS 26.3.1, portrait orientation, at the native 994×2160 AirPlay mirror resolution. Each major version represents a deliberate stage in moving from "works on one device" to "works on any current iPhone."
| Stage | Tag(s) | Story |
|---|---|---|
| PoC | v0.1 / v0.2 | Initial proof: synthetic-data CNN learns the cursor sprite at all. Bg-leaky validation, 73.9 px val pos error. |
| Heatmap head + honest val | v0.3.0 … v0.3.4 | Heatmap regression replaces xy-only regression. Loss mask fix so negatives no longer push down the heatmap globally. Bg-level val split so validation stops leaking. Hard negatives (decoy cursor shapes) so the model rejects icon dots and badges. Train-time augmentation + cosine restart for the overfit-fix wave. v0.3.4: 30.5 px val, FPR <2%. |
| Correctness wave I | v0.4.0 | Float labels through augmentation, parabolic subpixel refinement at inference, mask-aware heatmap eval. 22.9 px val. |
| Correctness wave II | v0.5.0 | Real captured iOS pointer sprite (replaces procedural smoothstep disc — still synthetic compositing), alpha-mass-centroid labels (replaces geometric-tile-center labels), stride-aware coord mapping (replaces linear stretch), parabolic on raw logits (replaces sigmoid-domain), tighter Gaussian σ, plain/hard neg loss split. 18.9 px val. |
| v0.6.0 (current) | v0.6.0 | Forward signature simplified — drops the unused soft-argmax head — so `PointerNet.forward(x)` returns `(conf_logit, heatmap)`. Asymmetric cursor-safe crop matching the real-sprite hotspot. H-flip disabled on positives (real sprite is left-right asymmetric). Hard-negative crop guard. Single canonical decoder reused by all aux scripts and exporters. `.safetensors` round-trip with `<stem>.config.json` sidecar. Breaking change for the exported `.onnx` / `.mlpackage` schema. |
| Bootstrap loop (planned) | — | Use a trained-enough agent to drive the cursor and emit verified real labels (move → observe → reverse-move → re-observe). Retrain on synthesis + verified-real. Eliminates the synthetic-data ceiling without manual labels. |
| Generalization (planned) | — | Multi-device dataset (iPhone 15 series, 16 / Pro / Pro Max / Plus) collected via the bootstrap loop on each device. One model that works on any current iPhone in portrait + landscape. Per-app UI element classifier as a second perception head. |
| v1.0 | — | Stable public API + cross-platform export (CoreML / ONNX) + multi-device coverage. Frozen interface. |
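The decode path those rows describe (argmax, then parabolic subpixel refinement on raw logits, then stride-aware mapping to native pixels) can be sketched as follows; this is an illustration, not the canonical `decode.py`:

```python
import numpy as np

STRIDE = 8             # heatmap cell size at train resolution (497×1080)
TRAIN_TO_NATIVE = 2.0  # train input is a 2x downsample of 994×2160 native

def _parabolic_offset(left, center, right):
    """1D parabolic peak interpolation on raw logits; offset in (-0.5, 0.5)."""
    denom = left - 2.0 * center + right
    return 0.0 if denom == 0 else 0.5 * (left - right) / denom

def decode_peak(logits):
    """Map an (Hc, Wc) logit heatmap to native-pixel cursor coordinates."""
    hc, wc = logits.shape
    iy, ix = np.unravel_index(np.argmax(logits), logits.shape)
    dy = _parabolic_offset(logits[iy - 1, ix], logits[iy, ix], logits[iy + 1, ix]) \
        if 0 < iy < hc - 1 else 0.0
    dx = _parabolic_offset(logits[iy, ix - 1], logits[iy, ix], logits[iy, ix + 1]) \
        if 0 < ix < wc - 1 else 0.0
    # stride-aware: a cell's center sits at (i + 0.5) * STRIDE, not i * STRIDE
    x = (ix + dx + 0.5) * STRIDE * TRAIN_TO_NATIVE
    y = (iy + dy + 0.5) * STRIDE * TRAIN_TO_NATIVE
    return x, y
```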
Roadmap details:

- Bootstrap loop / noisy-student self-labeling (next)
  - On-device explorer agent: drives the cursor, records frame + commanded position
  - Move-and-undo dance for verified ground-truth labels (motion existence + reversibility, NOT commanded magnitude — iOS Tracking Speed is per-device-tunable); sketched after this list
  - 3.5 s auto-hide window for cursor-free background capture
  - Retrain on synthesis + verified-real mix; iterate until val saturates
  - Goal: surpass the synthetic-only ceiling on real-frame eval
- Generalization across iPhones
  - Bootstrap on iPhone 15 / 15 Pro / 16 / 16 Pro / 16 Plus / 16 Pro Max
  - Landscape orientation
  - Single multi-device model
  - Per-app UI element classifier as a second perception head
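The move-and-undo dance referenced above could look like this (entirely hypothetical: `hid` and `grab_frame` are stand-in interfaces for the BLE HID mouse and the AirPlay capture; only motion existence and reversibility are checked, never commanded magnitude):

```python
def verified_label(hid, finder, grab_frame, dx=80, dy=0, tol_px=4.0):
    """Hypothetical sketch: emit (frame, position) labels only when the
    cursor demonstrably moved and demonstrably moved back."""
    f0 = grab_frame(); p0 = finder(f0)   # cursor before the move
    hid.move(dx, dy)                     # commanded move (magnitude untrusted)
    f1 = grab_frame(); p1 = finder(f1)
    hid.move(-dx, -dy)                   # reverse-move
    p2 = finder(grab_frame())            # re-observe after the undo
    moved = abs(p1.x - p0.x) + abs(p1.y - p0.y) > tol_px
    reversed_ok = abs(p2.x - p0.x) + abs(p2.y - p0.y) < tol_px
    if moved and reversed_ok:
        return [(f0, (p0.x, p0.y)), (f1, (p1.x, p1.y))]
    return []  # inconsistent motion -> discard, never label
```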
```bibtex
@software{ios_pointer_finder_2026,
  author  = {Jocelyn Ellyse},
  title   = {ios\_pointer\_finder: a tiny CNN for iPhone Pointer-Control cursor detection},
  year    = {2026},
  url     = {https://github.com/ellyseum/ios_pointer_finder},
  version = {0.6.0}
}
```

See CITATION.cff for machine-readable citation metadata.
- Code: MIT (see `LICENSE`)
- Trained weights: CC-BY-4.0 (see `LICENSE-WEIGHTS`)
The optional captured cursor sprite (`sprites/at_dot.png`, 36×36 BGRA) was extracted from a single high-resolution screenshot of the iOS Pointer-Control cursor and alpha-matted by hand. It ships in this repository as a small training artifact; iOS asset reproduction at this scale and form is fair-use research/utility, not a redistribution of any Apple image set. Hard-negative decoy sprites are still procedurally generated in `synthesize.py`.