
Add Claude Code and Gemini transcript support for context importer #159

Closed

Ankit-Kotnala wants to merge 2 commits into XortexAI:main from Ankit-Kotnala:dev/Ankit

Conversation

@Ankit-Kotnala
Contributor

Summary

Fixes #155.

Adds deterministic transcript parsing support for additional /context upload formats:

  • Claude Code JSONL session transcripts
  • Claude/Claude Code role-heading exports
  • Gemini CLI /chat share JSON exports
  • Gemini CLI /chat share Markdown exports

Also keeps existing Cursor and Antigravity behavior by moving transcript parsing into a shared helper used by both the production memory route and the legacy server entrypoint.

Changes

  • Added src/utils/transcripts.py as the shared transcript parser module.
  • Updated /v1/memory/parse_transcript to use the shared parser.
  • Updated legacy server.py parsing wrapper to use the same shared parser.
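One way a shared helper like this can route an upload to the right parser is by sniffing the payload shape first. The sketch below is illustrative only; the function names and format labels are hypothetical and are not the actual API of src/utils/transcripts.py.

```python
import json


def looks_like_jsonl(raw: str) -> bool:
    """Heuristic: every non-empty line parses as a JSON object
    (the shape of Claude Code session transcripts)."""
    lines = [ln for ln in raw.splitlines() if ln.strip()]
    if not lines:
        return False
    try:
        return all(isinstance(json.loads(ln), dict) for ln in lines)
    except json.JSONDecodeError:
        return False


def detect_format(raw: str) -> str:
    """Return a label for the transcript format (names are illustrative)."""
    stripped = raw.lstrip()
    if looks_like_jsonl(stripped):
        return "claude_code_jsonl"
    if stripped.startswith(("{", "[")):
        return "gemini_share_json"
    if "**User**" in stripped:
        return "cursor_markdown"
    return "role_heading_markdown"
```

A dispatcher built this way keeps the Cursor/Antigravity path untouched while letting new formats fall through to their own parsers.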

@Ankit-Kotnala
Contributor Author

@ishaanxgupta Please review this.


@gemini-code-assist (bot) left a comment

Code Review

This pull request centralizes transcript parsing logic into a new utility module, src/utils/transcripts.py, replacing duplicate implementations in server.py and src/api/routes/memory.py. The new shared parser adds support for Claude Code (JSONL), Gemini CLI, and Claude-style markdown exports while implementing logic to filter out tool calls and thinking blocks. Feedback identified several instances where consecutive user messages would be overwritten rather than concatenated, leading to potential data loss. Additionally, a bug was found in the text cleaning utility that would incorrectly strip markdown list markers.

Comment thread src/utils/transcripts.py Outdated
Comment on lines +488 to +489
    def _clean_text(text: str) -> str:
        return text.strip().strip("-").strip()

Severity: high

The strip("-") call in _clean_text is problematic for Markdown content. It will remove leading bullet points from list items (e.g., - Item becomes Item) and can strip horizontal rules or other intentional formatting. It should be removed to preserve the integrity of the message content.

Suggested change:

    def _clean_text(text: str) -> str:
        return text.strip().strip("-").strip()

becomes:

    def _clean_text(text: str) -> str:
        return text.strip()
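The effect of the extra strip("-") can be demonstrated directly. The helper names below (clean_before / clean_after) are chosen here for illustration and do not appear in the PR.

```python
def clean_before(text: str) -> str:
    # Original behavior: also strips leading/trailing hyphens
    return text.strip().strip("-").strip()


def clean_after(text: str) -> str:
    # Suggested fix: strip whitespace only
    return text.strip()


# strip("-") removes the markdown bullet marker:
print(clean_before("- Item"))  # Item
print(clean_after("- Item"))   # - Item
# and destroys a horizontal rule entirely:
print(clean_before("---"))     # (empty string)
```

This is why the reviewer rates it high severity: the cleaning step silently alters message content rather than just trimming whitespace.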

Comment thread src/utils/transcripts.py
Comment on lines +280 to +284
    if role in _USER_ROLES:
        flush_pair()
        current_user_query = text
    elif current_user_query:
        assistant_chunks.append(text)

Severity: medium

This logic discards previous user messages if multiple user turns occur consecutively without an intervening assistant response. In many chat transcripts (especially from CLI tools or when users send multiple fragments), it's better to concatenate consecutive user messages to ensure no context is lost during import.

Suggested change:

    if role in _USER_ROLES:
        flush_pair()
        current_user_query = text
    elif current_user_query:
        assistant_chunks.append(text)

becomes:

    if role in _USER_ROLES:
        if assistant_chunks:
            flush_pair()
            current_user_query = text
        else:
            current_user_query = (current_user_query + "\n\n" + text) if current_user_query else text
    elif current_user_query:
        assistant_chunks.append(text)
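The data-loss scenario and the suggested fix can be seen side by side in a self-contained sketch. The pair_up helper below is illustrative and simplified, not the PR's actual implementation; the concat_users flag exists only to contrast the two behaviors.

```python
def pair_up(messages, concat_users=True):
    """Group a flat (role, text) message list into (user, assistant) pairs.
    With concat_users=False, a later user turn overwrites an earlier one,
    mirroring the behavior the review flags."""
    pairs = []
    current_user = ""
    assistant_chunks = []

    def flush():
        nonlocal current_user, assistant_chunks
        if current_user and assistant_chunks:
            pairs.append((current_user, "\n\n".join(assistant_chunks)))
        current_user = ""
        assistant_chunks = []

    for role, text in messages:
        if role == "user":
            if assistant_chunks or not concat_users:
                flush()
                current_user = text
            else:
                # Suggested behavior: merge consecutive user turns
                current_user = f"{current_user}\n\n{text}" if current_user else text
        elif current_user:
            assistant_chunks.append(text)
    flush()
    return pairs


msgs = [("user", "first part"), ("user", "second part"), ("assistant", "reply")]
print(pair_up(msgs, concat_users=False))  # [('second part', 'reply')] -- first user turn lost
print(pair_up(msgs))                      # [('first part\n\nsecond part', 'reply')]
```

The same pattern applies to the role-heading parser flagged below, so one shared fix covers both code paths.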

Comment thread src/utils/transcripts.py
Comment on lines +436 to +440
    if role in _USER_ROLES:
        flush_pair()
        current_user_query = content
    elif role in assistant_roles and current_user_query:
        assistant_chunks.append(content)

Severity: medium

Similar to the JSON parser, this role-heading parser also overwrites the current_user_query if multiple user headings are encountered before an assistant response. Concatenating them would prevent data loss.

Suggested change:

    if role in _USER_ROLES:
        flush_pair()
        current_user_query = content
    elif role in assistant_roles and current_user_query:
        assistant_chunks.append(content)

becomes:

    if role in _USER_ROLES:
        if assistant_chunks:
            flush_pair()
            current_user_query = content
        else:
            current_user_query = (current_user_query + "\n\n" + content) if current_user_query else content
    elif role in assistant_roles and current_user_query:
        assistant_chunks.append(content)

Comment thread src/utils/transcripts.py Outdated
Comment on lines +99 to +100
    if section.startswith("**User**"):
        current_user_query = section.replace("**User**", "", 1).strip()

Severity: medium

In the Cursor transcript parser, consecutive user sections result in the earlier sections being lost. Consider concatenating them to preserve all user input:

        if section.startswith("**User**"):
            content = section.replace("**User**", "", 1).strip()
            if current_user_query:
                current_user_query += "\n\n" + content
            else:
                current_user_query = content

@Ankit-Kotnala Ankit-Kotnala deleted the dev/Ankit branch May 8, 2026 19:02
@Ankit-Kotnala Ankit-Kotnala restored the dev/Ankit branch May 8, 2026 19:02


Development

Successfully merging this pull request may close these issues:

add support of gemini, claude in /context route
