Skip to content

refactor(llm-access-kiro): split anthropic/converter.rs into submodules#12

Merged
acking-you merged 2 commits into
masterfrom
refactor/split-kiro-anthropic-converter
May 31, 2026
Merged

refactor(llm-access-kiro): split anthropic/converter.rs into submodules#12
acking-you merged 2 commits into
masterfrom
refactor/split-kiro-anthropic-converter

Conversation

@acking-you
Copy link
Copy Markdown
Owner

What

Split the 6071-line llm-access-kiro/src/anthropic/converter.rs into a
converter/mod.rs facade plus 15 concern-focused submodules. Pure structural
move — no logic changes. Next step in the incremental llm-access* refactor
(follows #10 stream, #7 cache_sim, #4 request).

Layout

The facade (mod.rs) keeps the shared data model (ConversionResult,
NormalizedRequest + its private fields, SessionTracking, the Session*/
Tool* events, ConversionError), module consts, the invalid_request
helper, and the integration tests. Each conversion concern moves to a
descendant submodule — descendants read the parent's private struct fields, so
no field was made public.

Submodule Concern
model model-id mapping + context-window sizing
schema JSON-schema normalization + multimodal-compat checks
document / image attachment normalization + format detection
tool_name tool-name/tool-use-id sanitization, aliasing, dup-id rewrite
tools tool-definition conversion + structured-output injection
tool_result tool-result text/attachment extraction
tool_pairing tool-use/tool-result pairing validation + orphan pruning
session conversation-id resolution from metadata
identity model-identity probes / Claude Code identity + billing-header strip
thinking thinking-prefix generation + injection
system system-prompt assembly
validate up-front request validation
normalize per-message + tool-schema normalization (normalize_request)
convert history/current-turn assembly into a Kiro ConversationState

Module style (per the agreed standard)

  • mod.rs convention; no use super::* in submodules — explicit origin
    imports. Anthropic types via crate::anthropic::types:: (submodule super
    is now converter, not anthropic), wire types via crate::wire::.
  • No pub use X::* globs: mod.rs re-exports only the 8 entry-point fns
    used outside the module, by name. The public data-model types stay defined in
    the facade, so the converter::X surface (consumed by llm-access
    provider.rs / provider/kiro_error.rs and the stream submodule) is
    unchanged.
  • Minimal visibility: a moved fn is pub only if referenced from another
    module (45 of 96); the rest stay private.
    convert_normalized_request_with_validation keeps pub(crate). Cross-module
    helper imports used only by #[cfg(test)] code are #[cfg(test)]-gated so
    the lib build stays warning-clean.
  • Tests inlined: 7 self-contained unit tests move into their owning submodule;
    the 70 integration tests stay in mod.rs against the public API.

Verification

  • Item inventory diff vs original: 0 dropped, 0 added (the lone "added"
    hit was a doc-comment prose match — media-type canonicalization).
  • #[test] count 77 = 77.
  • cargo clippy -p llm-access-kiro --all-targets -- -D warnings: clean.
  • cargo test -p llm-access-kiro: 164 passed.
  • Reverse-dep cargo build -p llm-access: compiles against the unchanged
    public surface.
  • rustfmt on the 16 new files only; deps/lance & deps/lancedb stayed clean.

🤖 Generated with Claude Code

…r/ submodules

Break the 6071-line anthropic/converter.rs into a converter/mod.rs facade plus
15 concern-focused submodules, following the agreed module style. Pure
structural move — no logic changes.

The facade (mod.rs) keeps the shared data model (ConversionResult,
NormalizedRequest, SessionTracking, the Session*/Tool* events, ConversionError,
NormalizedRequest's private fields, etc.), the module consts, the
invalid_request helper, and the 70 integration tests. Each conversion concern
moves to a descendant submodule (descendants read the parent's private struct
fields, so no field was made public):

- model: model-id mapping + per-model context-window sizing.
- schema: JSON-schema normalization + multimodal-compat keyword checks.
- document / image: attachment normalization + format detection.
- tool_name: tool-name/tool-use-id sanitization, aliasing, dup-id rewriting.
- tools: tool-definition conversion + structured-output tool injection.
- tool_result: tool-result text/attachment extraction.
- tool_pairing: tool-use/tool-result pairing validation + orphan pruning.
- session: conversation-id resolution from metadata.
- identity: model-identity probes / Claude Code identity + billing-header strip.
- thinking: thinking-prefix generation + injection.
- system: system-prompt assembly.
- validate: up-front request validation.
- normalize: per-message + tool-schema normalization (normalize_request).
- convert: history/current-turn assembly into a Kiro ConversationState.

Module style:
- mod.rs convention; no `use super::*` in submodules — explicit origin imports.
  Anthropic types are reached via `crate::anthropic::types::` (submodule `super`
  is now `converter`, not `anthropic`), wire types via `crate::wire::`.
- No `pub use X::*` globs: mod.rs re-exports only the 8 entry-point fns used
  outside the module, by name (map_model, get_context_window_size,
  normalize_request, convert_normalized_request_with_resolved_session,
  current_user_message_range, preview_session_value,
  resolve_conversation_id_from_metadata, extract_tool_result_content). The
  public data-model types stay defined in the facade, so the converter::X
  surface (consumed by llm-access provider.rs/provider/kiro_error.rs and the
  stream submodule) is unchanged.
- Minimal visibility: a moved fn is `pub` only if referenced from another module
  (45 of 96); the rest stay private. `convert_normalized_request_with_validation`
  keeps `pub(crate)`. Cross-module helper imports used only by `#[cfg(test)]`
  code are `#[cfg(test)]`-gated so the lib build stays warning-clean.
- Tests inlined: 7 self-contained unit tests move into their owning submodule
  (so the private items they exercise need no extra exposure); the 70
  integration tests stay in mod.rs against the public API.

No behavior change: item inventory diff vs the original shows zero dropped and
zero added items, #[test] count 77 = 77; clippy -D warnings clean (lib + tests),
164 kiro tests pass, and the reverse-dep llm-access binary compiles against the
unchanged public surface.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a comprehensive conversion pipeline to translate Anthropic MessagesRequest payloads into Kiro ConversationState structures, handling message normalization, document deduplication, model mapping, and tool-use pairing. The review feedback highlights several critical improvements for robustness and performance, including avoiding potential panics from byte-based string slicing on non-UTF-8 boundaries, strictly validating UUIDs using the uuid crate, optimizing string traversal in session previews, and eliminating redundant string allocations during tool name generation.

Comment on lines +36 to +43
let pos = user_id.find("session_")?;
let session_part = &user_id[pos + 8..];
if session_part.len() < 36 {
return None;
}

let uuid = &session_part[..36];
is_valid_uuid(uuid).then(|| uuid.to_string())
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Slicing session_part[..36] directly can panic if the index 36 does not lie on a UTF-8 character boundary (e.g., if the string contains multi-byte characters). To prevent panics on non-character boundaries, avoid byte-based slicing and use character-based truncation instead.

    let pos = user_id.find("session_")?;
    let session_part = &user_id[pos + 8..];
    let uuid: String = session_part.chars().take(36).collect();
    (uuid.chars().count() == 36 && is_valid_uuid(&uuid)).then(|| uuid)
References
  1. In Rust, avoid using String::truncate with byte indices on strings that may contain multi-byte UTF-8 characters to prevent panics on non-character boundaries. Use character-based truncation like chars().take(limit).collect() instead.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as the resolve_conversation_id_from_metadata thread — deferring to a dedicated fix PR. Note this particular fn is #[cfg(test)] (test-only), so of the two slice sites it is the lower-priority one; the production path is resolve_conversation_id_from_metadata.

Comment on lines +109 to +118
let session_part = &user_id[pos + 8..];
if session_part.len() < 36 {
return generated_fallback(
SessionFallbackReason::InvalidLegacySessionId,
None,
user_id_preview,
);
}

let uuid = &session_part[..36];
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Slicing session_part[..36] directly can panic if the index 36 does not lie on a UTF-8 character boundary. To prevent panics on non-character boundaries, avoid byte-based slicing and use character-based truncation instead.

Suggested change
let session_part = &user_id[pos + 8..];
if session_part.len() < 36 {
return generated_fallback(
SessionFallbackReason::InvalidLegacySessionId,
None,
user_id_preview,
);
}
let uuid = &session_part[..36];
let session_part = &user_id[pos + 8..];
let uuid: String = session_part.chars().take(36).collect();
if uuid.chars().count() != 36 {
return generated_fallback(
SessionFallbackReason::InvalidLegacySessionId,
None,
user_id_preview,
);
}
References
  1. In Rust, avoid using String::truncate with byte indices on strings that may contain multi-byte UTF-8 characters to prevent panics on non-character boundaries. Use character-based truncation like chars().take(limit).collect() instead.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Valid pre-existing concern, but holding for a separate fix PR. This PR is a behavior-preserving structural split (verified by item-inventory diff + the same unchanged tests), and a panic-to-fallback guard is a behavior change that does not belong in it — the slice site here was relocated verbatim, not introduced.

Worth flagging: the suggested session_part.chars().take(36) also changes byte semantics to char semantics (equal only because UUIDs are ASCII). The byte-preserving guard is session_part.get(..36). That deserves its own PR with a regression test for non-UTF-8 / multibyte user_id input.

Comment on lines +46 to +48
pub fn is_valid_uuid(value: &str) -> bool {
value.len() == 36 && value.chars().filter(|ch| *ch == '-').count() == 4
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The current implementation of is_valid_uuid only checks the length and the count of hyphens, which is extremely loose and can accept invalid UUID strings (e.g., all hyphens). Since the uuid crate is already imported, we should use Uuid::parse_str to perform a strict and robust validation.

Suggested change
pub fn is_valid_uuid(value: &str) -> bool {
value.len() == 36 && value.chars().filter(|ch| *ch == '-').count() == 4
}
pub fn is_valid_uuid(value: &str) -> bool {
Uuid::parse_str(value).is_ok()
}

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Holding for the same follow-up PR. Tightening to Uuid::parse_str is a behavior change — it rejects strings the current 36-char / 4-hyphen check accepts — so it needs its own PR plus a check that upstream session_id values are always strict UUIDs before narrowing acceptance. Out of scope for this behavior-preserving structural move.

Comment on lines +12 to +21
pub fn preview_session_value(value: &str) -> String {
let mut preview = value
.chars()
.take(SESSION_SOURCE_PREVIEW_MAX_CHARS)
.collect::<String>();
if value.chars().count() > SESSION_SOURCE_PREVIEW_MAX_CHARS {
preview.push_str("...[truncated]");
}
preview
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The preview_session_value function currently traverses the string twice: once to collect the preview characters, and once to count all characters in the string via value.chars().count(). This can be inefficient for very long strings. We can optimize this by using the iterator directly to check if there are any remaining characters.

pub fn preview_session_value(value: &str) -> String {
    let mut chars = value.chars();
    let mut preview: String = chars.by_ref().take(SESSION_SOURCE_PREVIEW_MAX_CHARS).collect();
    if chars.next().is_some() {
        preview.push_str("...[truncated]");
    }
    preview
}

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 34c9b57. Single iterator now: take the preview, then peek one more char (chars.next().is_some()) to decide the truncation marker — no second chars().count() pass. Byte-identical output.

Comment on lines +122 to +129
if !existing.contains(&STRUCTURED_OUTPUT_TOOL_NAME_BASE.to_lowercase()) {
return STRUCTURED_OUTPUT_TOOL_NAME_BASE.to_string();
}
for suffix in 1.. {
let candidate = format!("{STRUCTURED_OUTPUT_TOOL_NAME_BASE}_{suffix}");
if !existing.contains(&candidate.to_lowercase()) {
return candidate;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Since STRUCTURED_OUTPUT_TOOL_NAME_BASE is a lowercase constant ("sf_emit_structured_output"), calling .to_lowercase() on it and the generated candidate string is redundant and causes unnecessary allocations. We can directly query the HashSet with the lowercase constant and the candidate string.

Suggested change
if !existing.contains(&STRUCTURED_OUTPUT_TOOL_NAME_BASE.to_lowercase()) {
return STRUCTURED_OUTPUT_TOOL_NAME_BASE.to_string();
}
for suffix in 1.. {
let candidate = format!("{STRUCTURED_OUTPUT_TOOL_NAME_BASE}_{suffix}");
if !existing.contains(&candidate.to_lowercase()) {
return candidate;
}
if !existing.contains(STRUCTURED_OUTPUT_TOOL_NAME_BASE) {
return STRUCTURED_OUTPUT_TOOL_NAME_BASE.to_string();
}
for suffix in 1.. {
let candidate = format!("{STRUCTURED_OUTPUT_TOOL_NAME_BASE}_{suffix}");
if !existing.contains(&candidate) {
return candidate;
}
}

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 34c9b57. STRUCTURED_OUTPUT_TOOL_NAME_BASE and the {BASE}_{suffix} candidates are already lowercase, so those .to_lowercase() calls were no-op allocations — now querying the HashSet with the borrowed &str/&String directly. The per-tool .to_lowercase() that builds the set stays (it normalizes arbitrary tool names for case-insensitive comparison).

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8f50d18829

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".


use std::collections::{BTreeMap, HashMap};

pub use convert::{convert_normalized_request_with_resolved_session, current_user_message_range};
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Re-export convert_request from the facade

convert_request was a public function on anthropic::converter before this split, but the new facade only re-exports other entry points while convert_request now lives inside the private convert submodule. Because anthropic/mod.rs exposes pub mod converter, any caller using llm_access_kiro::anthropic::converter::convert_request can no longer compile after this refactor even though the function still exists internally; include it in this pub use to preserve the previous public surface.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one does not hold. convert_request is #[cfg(test)]-gated (it was in the original file too), so it does not exist in non-test builds and was never part of the public surface — no downstream caller can name llm_access_kiro::anthropic::converter::convert_request, and a workspace grep finds no external (non-test) caller. The real entry points (convert_normalized_request_with_resolved_session, normalize_request, current_user_message_range, extract_tool_result_content, etc.) are re-exported by name, and the reverse-dep llm-access build compiles green against the unchanged surface. So there is nothing to re-export here.

…/tools

Two behavior-preserving cleanups flagged in PR #12 review (both on code this PR
only relocated, not introduced):

- preview_session_value traversed the string twice (collect the preview, then
  chars().count() the whole value). Drive a single iterator: take the preview,
  then peek one more char to decide whether to append the truncation marker.
  Byte-identical output.
- make_structured_output_tool_name called .to_lowercase() on
  STRUCTURED_OUTPUT_TOOL_NAME_BASE and each generated candidate before querying
  the HashSet. Both are already lowercase, so those calls were no-op
  allocations; query the set with the borrowed &str / &String directly. The
  per-tool .to_lowercase() that builds the set stays (it normalizes arbitrary
  tool names for case-insensitive comparison).

No behavior change: clippy -D warnings clean (lib + tests), 164 kiro tests pass
(incl. the session-fallback + structured-output tests that exercise both fns).

Addresses PR #12 review findings from gemini-code-assist.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@acking-you acking-you merged commit bc2002d into master May 31, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant