imessage: backfill display names from vCard contacts #303

Open
bjabes wants to merge 4 commits into wesm:main from bjabes:imessage-contact-names

bjabes commented Apr 29, 2026

Summary

iMessage-imported users showed up as raw phone numbers / emails because chat.db only stores handles — names came through only when another source (Gmail headers, Google Voice, WhatsApp --contacts) had already populated participants.display_name.

Two changes that together let names appear:

  • Stop poisoning display_name with the phone string when the iMessage importer creates a new participant. The previous behavior left display_name = "+15551234567", which blocked later imports from filling in a real name (the update guard only writes when display_name is NULL/empty). New iMessage rows now leave display_name empty and let the first name-bearing import win.

  • Add --contacts <vcf> to import-imessage, mirroring WhatsApp's flag. Backfills names by phone and email from a vCard export (e.g. macOS Contacts.app → File → Export → Export vCard). Only updates participants that already exist with an empty display_name, so any prior name is preserved.

Plumbing

  • Extract the vCard parser into a new internal/vcard package so iMessage and WhatsApp share it (whatsapp.ImportContacts now wraps it). The parser also reads EMAIL fields, not just TEL/FN.
  • Add Store.UpdateParticipantDisplayNameByEmail — case-insensitive, first-writer-wins, never creates rows.
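
The semantics listed for the email backfill (case-insensitive, first-writer-wins, never creates rows) can be illustrated with a minimal in-memory sketch. The `participant` type and `backfillByEmail` function are hypothetical stand-ins for the participants table and the store method; the real method presumably runs as a SQL UPDATE.

```go
package main

import (
	"fmt"
	"strings"
)

// participant is a hypothetical in-memory stand-in for a participants row.
type participant struct {
	Email       string
	DisplayName string
}

// backfillByEmail sets name on rows whose email matches case-insensitively
// and whose display name is still empty. It never appends rows, so an
// unknown email is a no-op. It returns the number of rows updated.
func backfillByEmail(rows []participant, email, name string) int {
	updated := 0
	for i := range rows {
		if !strings.EqualFold(rows[i].Email, email) {
			continue
		}
		if strings.TrimSpace(rows[i].DisplayName) != "" {
			continue // first writer wins: keep the existing name
		}
		rows[i].DisplayName = name
		updated++
	}
	return updated
}

func main() {
	rows := []participant{
		{Email: "Ada@Example.com"},
		{Email: "bob@example.com", DisplayName: "Bob From Gmail"},
	}
	fmt.Println(backfillByEmail(rows, "ada@example.com", "Ada Example"))
	fmt.Println(backfillByEmail(rows, "bob@example.com", "Robert"))
	fmt.Println(backfillByEmail(rows, "nobody@example.com", "Nobody"))
	fmt.Println(rows)
}
```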

Usage

msgvault import-imessage --contacts ~/contacts.vcf

Output now includes:

Contacts applied:
  Source:           ~/contacts.vcf (1234 entries)
  Names backfilled: 412 by phone, 38 by email


roborev-ci Bot commented Apr 29, 2026

roborev: Combined Review (c4911d6)

Summary verdict: One medium issue remains; no high or critical findings were reported.

Medium

  • internal/store/messages.go:1226 - The iMessage contacts backfill only updates empty display_name values, but existing imports may have stored phone numbers as display_name. For upgraded users, import-imessage --contacts will not replace those legacy placeholder names with vCard names, limiting the backfill for the main existing-data case. Treat legacy placeholders as empty during iMessage contact backfill, for example by allowing updates when display_name = phone_number for participants with an iMessage identifier, while still preserving real names from other sources.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

bjabes (Author) commented Apr 30, 2026

Pushed 122189b to address the legacy-placeholder feedback.

import-imessage --contacts now uses a new Store.UpdateImessageParticipantDisplayNameByPhone that overwrites when:

  • display_name IS NULL / empty (existing behavior), or
  • display_name = phone_number (legacy placeholder), gated on the participant having an imessage identifier so other-source participants are never touched.

Test (TestUpdateImessageParticipantDisplayNameByPhone) covers all three cases: legacy placeholder cleared, real Gmail name preserved, non-iMessage participant untouched. The email path is unchanged — iMessage email handles were never poisoned.
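
As a rough illustration of the overwrite rule above, here is a minimal sketch of the decision as a pure function. The `participant` struct and `shouldBackfillName` are hypothetical names, not the store's API, and the sketch assumes the iMessage-identifier gate applies only to the legacy-placeholder case.

```go
package main

import (
	"fmt"
	"strings"
)

// participant is a hypothetical in-memory view of a participants row.
type participant struct {
	DisplayName   string
	PhoneNumber   string
	HasImessageID bool // true when the row carries an imessage identifier
}

// shouldBackfillName reports whether a vCard name may replace DisplayName.
func shouldBackfillName(p participant) bool {
	name := strings.TrimSpace(p.DisplayName)
	if name == "" {
		return true // NULL/empty: existing behavior
	}
	// Legacy placeholder: the phone string was stored as the name.
	// Assumption: only iMessage participants qualify for this overwrite.
	return p.HasImessageID && p.PhoneNumber != "" && name == p.PhoneNumber
}

func main() {
	const phone = "+15551234567"
	cases := []participant{
		{DisplayName: phone, PhoneNumber: phone, HasImessageID: true},         // legacy placeholder
		{DisplayName: "Ada Example", PhoneNumber: phone, HasImessageID: true}, // real name from another source
		{DisplayName: phone, PhoneNumber: phone, HasImessageID: false},        // non-iMessage participant
	}
	for _, c := range cases {
		fmt.Printf("%+v -> overwrite=%v\n", c, shouldBackfillName(c))
	}
}
```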


roborev-ci Bot commented Apr 30, 2026

roborev: Combined Review (122189b)

Findings: two medium issues need attention before merge.

Medium

  • internal/vcard/vcard.go:35 - The parser unfolds continuation lines before handling quoted-printable soft breaks, so a valid wrapped value like FN;ENCODING=QUOTED-PRINTABLE:Jo=C3=A3o da =\r\n Silva becomes ... da =Silva and decodes with a literal = in the name. Handle quoted-printable soft breaks during unfolding, or join QP soft breaks before normal continuation folding.

  • internal/vcard/vcard.go:91 - vCard property groups such as item1.TEL and item1.EMAIL are not recognized because matching only checks for lines starting with TEL or EMAIL. These grouped properties are common in vCard 3.0 exports, including Apple Contacts-style files, so valid contact phones/emails can be skipped. Strip an optional group prefix before matching the property name, then handle TEL, EMAIL, and FN from the normalized key.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)


roborev-ci Bot commented Apr 30, 2026

roborev: Combined Review (44d9f7d)

Summary verdict: One medium correctness issue needs fixing before merge.

Medium

  • internal/store/messages.go:1291 - RetitleImessageDirectChats can skip email-only iMessage direct chats. The query matches email-titled chats, but p.display_name != p.phone_number is not NULL-safe; when phone_number is NULL, the comparison evaluates to UNKNOWN, so Apple ID/email participants whose display names were backfilled may remain titled as the email address.

    Fix: Make the placeholder check NULL-safe in both the title subquery and EXISTS predicate, for example:

    p.phone_number IS NULL OR p.display_name != p.phone_number

    Add a regression test for an email-only direct iMessage chat.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)


roborev-ci Bot commented Apr 30, 2026

roborev: Combined Review (9205e44)

Clean: no Medium, High, or Critical findings were reported.

All reviewed outputs agree there are no actionable issues to include.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

wesm (Owner) commented May 1, 2026

Thanks for working on this. I hadn't yet looked into how to get people's names into the WhatsApp/iMessage screens. I need to see how this composes with the other open PRs, so bear with me.

bjabes and others added 4 commits May 6, 2026 15:00
chat.db only stores phone/email handles, so iMessage-imported
participants showed up as raw numbers/addresses unless another source
(Gmail headers, Google Voice, WhatsApp --contacts) had previously
populated their display_name.

Two changes that together let names appear:

- Stop poisoning display_name with the phone string when the iMessage
  importer creates a new participant. The previous behavior left a
  non-empty display_name = "+15551234567" that blocked later imports
  from filling in a real name (the update guard requires NULL/empty).
  Pass "" instead so the first name-bearing import wins.

- Add --contacts <vcf> to import-imessage, mirroring WhatsApp's flag.
  Backfills participant names by phone and email from a vCard export
  (e.g. macOS Contacts.app → Export). Only updates participants that
  already exist and have an empty display_name, so prior names are
  preserved.

Plumbing:
- Extract the vCard parser into internal/vcard so iMessage and WhatsApp
  share it instead of cross-importing each other. EMAIL fields are now
  parsed in addition to TEL/FN.
- Add Store.UpdateParticipantDisplayNameByEmail (case-insensitive,
  first-writer-wins, never creates rows).

Additional changes squashed in:
- imessage: clear legacy phone-as-name placeholders on backfill
- imessage: refresh stale 1:1 conversation titles after import
- imessage: force full cache rebuild after name/title backfill
- vcard: stop eating END:VCARD when base64 PHOTO ends with '=' padding
- imessage: refresh generated group chat titles
- vcard: handle folded QP and grouped properties

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Phone-only iMessage/SMS participants (display_name empty,
email_address NULL, phone_number set) disappeared from sender/
recipient name views and filters because the name fallback chain
stopped at email_address.

Add a single helper participantNameExpr(alias) returning
COALESCE(NULLIF(TRIM(display_name), ''), NULLIF(phone_number, ''),
email_address) and route every aggregate keyExpr, null-guard, and
SenderName/RecipientName filter through it. The existing text-search
sites that already had the phone fallback (duckdb_text.go,
sqlite_text.go) now use the helper too.

ParticipantOpts grows a Phone field and TestDataBuilder gets
AddPhoneParticipant; tests cover the SQLite aggregate, DuckDB
aggregate, and both ListMessages name-filter paths.
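
A helper of the shape described above might look like the sketch below. The COALESCE chain comes from the commit message; qualifying each column with the alias is an assumption about how the helper is used, and the function body is illustrative rather than the repository's code.

```go
package main

import "fmt"

// participantNameExpr returns a SQL expression that resolves a participant's
// visible name: trimmed display_name first, then phone_number, then
// email_address. Qualifying every column with alias is an assumption.
func participantNameExpr(alias string) string {
	return fmt.Sprintf(
		"COALESCE(NULLIF(TRIM(%[1]s.display_name), ''), NULLIF(%[1]s.phone_number, ''), %[1]s.email_address)",
		alias,
	)
}

func main() {
	// Example: use the same expression as both the aggregate key and its
	// null-guard so the two can never drift apart.
	expr := participantNameExpr("p")
	fmt.Printf("SELECT %s AS name FROM participants p WHERE %s IS NOT NULL\n", expr, expr)
}
```

Routing every call site through one function means a future change to the fallback order only has to be made once.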

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

SearchByDomains skipped store.LiveMessagesWhere, so the MCP
search_by_domains tool could surface dedup losers (deleted_at) and
source-deleted rows that every other read path suppresses.

Apply LiveMessagesWhere("m", true) — same predicate Search/SearchFast
use — so dedup losers are always hidden alongside source-deleted
rows. New regression test soft-deletes one message of each kind and
verifies neither leaks. Adds dbtest.MarkDedupLoserByID for setting
deleted_at directly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

PST EntryIDs are unique only within a single archive, but the
importer keys messages as "pst-<EntryID>" against a source shared by
all PST files imported under the same identifier. Importing a
second archive into the same mailbox could collide with rows from
the first and silently skip or mis-update unrelated messages.

Compute a stable per-archive fingerprint from a SHA-256 over the
first 4 KiB of the PST file (the header, where unique BID/NID
counters live) and prefix it on source_message_id:

  pst-<fp12>-<EntryID>

Same bytes always yield the same fingerprint regardless of path, so
re-importing the same file remains idempotent. Different files now
sit in disjoint key spaces even when they share a source.

Unit test exercises distinct/stable fingerprint behavior; the
existing TestImportPst_SupportPST_Idempotent integration test
continues to cover round-trip idempotence.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
wesm force-pushed the imessage-contact-names branch from 9205e44 to 7081192 on May 10, 2026 13:36

roborev-ci Bot commented May 10, 2026

roborev: Combined Review (7081192)

Duplicate imports and incomplete sender-name filtering remain; no Critical or High findings were reported.

Medium

  • internal/importer/pst_import.go:515
    Changing PST source_message_id from pst-<EntryID> to pst-<archiveID>-<EntryID> breaks idempotence for PSTs imported before this change. Re-importing the same archive after upgrade will not match existing rows and can duplicate messages under the same source.
    Fix: Add legacy-key handling during import, such as checking for pst-<EntryID> and migrating it to the namespaced key or treating either key as an existing message before ingesting.

  • internal/query/duckdb.go:733
    DuckDB sender-name aggregate search still does not include phone_number in the sender-name key columns, even though the visible aggregate key now falls back to phone numbers. A phone-only sender can appear in ViewSenderNames but disappear when searching/filtering that aggregate by the displayed phone number.
    Fix: Include pAlias + ".phone_number" in the ViewSenderNames keyColumns, matching the recipient-name update and the new participantNameExpr fallback.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)


2 participants