refactor(llm-access-kiro): split anthropic/converter.rs into submodules by acking-you · Pull Request #12 · acking-you/static_flow

acking-you · 2026-05-31T01:00:25Z

What

Split the 6071-line llm-access-kiro/src/anthropic/converter.rs into a
converter/mod.rs facade plus 15 concern-focused submodules. Pure structural
move — no logic changes. Next step in the incremental llm-access* refactor
(follows #10 stream, #7 cache_sim, #4 request).

Layout

The facade (mod.rs) keeps the shared data model (ConversionResult,
NormalizedRequest + its private fields, SessionTracking, the Session*/
Tool* events, ConversionError), module consts, the invalid_request
helper, and the integration tests. Each conversion concern moves to a
descendant submodule — descendants read the parent's private struct fields, so
no field was made public.

Submodule	Concern
`model`	model-id mapping + context-window sizing
`schema`	JSON-schema normalization + multimodal-compat checks
`document` / `image`	attachment normalization + format detection
`tool_name`	tool-name/tool-use-id sanitization, aliasing, dup-id rewrite
`tools`	tool-definition conversion + structured-output injection
`tool_result`	tool-result text/attachment extraction
`tool_pairing`	tool-use/tool-result pairing validation + orphan pruning
`session`	conversation-id resolution from metadata
`identity`	model-identity probes / Claude Code identity + billing-header strip
`thinking`	thinking-prefix generation + injection
`system`	system-prompt assembly
`validate`	up-front request validation
`normalize`	per-message + tool-schema normalization (`normalize_request`)
`convert`	history/current-turn assembly into a Kiro `ConversationState`

Module style (per the agreed standard)

mod.rs convention; no use super::* in submodules — explicit origin
imports. Anthropic types via crate::anthropic::types:: (submodule super
is now converter, not anthropic), wire types via crate::wire::.
No pub use X::* globs: mod.rs re-exports only the 8 entry-point fns
used outside the module, by name. The public data-model types stay defined in
the facade, so the converter::X surface (consumed by llm-access
provider.rs / provider/kiro_error.rs and the stream submodule) is
unchanged.
Minimal visibility: a moved fn is pub only if referenced from another
module (45 of 96); the rest stay private.
convert_normalized_request_with_validation keeps pub(crate). Cross-module
helper imports used only by #[cfg(test)] code are #[cfg(test)]-gated so
the lib build stays warning-clean.
Tests inlined: 7 self-contained unit tests move into their owning submodule;
the 70 integration tests stay in mod.rs against the public API.

Verification

Item inventory diff vs original: 0 dropped, 0 added (the lone "added"
hit was a doc-comment prose match — media-type canonicalization).
#[test] count 77 = 77.
cargo clippy -p llm-access-kiro --all-targets -- -D warnings: clean.
cargo test -p llm-access-kiro: 164 passed.
Reverse-dep cargo build -p llm-access: compiles against the unchanged
public surface.
rustfmt on the 16 new files only; deps/lance & deps/lancedb stayed clean.

🤖 Generated with Claude Code

…r/ submodules Break the 6071-line anthropic/converter.rs into a converter/mod.rs facade plus 15 concern-focused submodules, following the agreed module style. Pure structural move — no logic changes. The facade (mod.rs) keeps the shared data model (ConversionResult, NormalizedRequest, SessionTracking, the Session*/Tool* events, ConversionError, NormalizedRequest's private fields, etc.), the module consts, the invalid_request helper, and the 70 integration tests. Each conversion concern moves to a descendant submodule (descendants read the parent's private struct fields, so no field was made public): - model: model-id mapping + per-model context-window sizing. - schema: JSON-schema normalization + multimodal-compat keyword checks. - document / image: attachment normalization + format detection. - tool_name: tool-name/tool-use-id sanitization, aliasing, dup-id rewriting. - tools: tool-definition conversion + structured-output tool injection. - tool_result: tool-result text/attachment extraction. - tool_pairing: tool-use/tool-result pairing validation + orphan pruning. - session: conversation-id resolution from metadata. - identity: model-identity probes / Claude Code identity + billing-header strip. - thinking: thinking-prefix generation + injection. - system: system-prompt assembly. - validate: up-front request validation. - normalize: per-message + tool-schema normalization (normalize_request). - convert: history/current-turn assembly into a Kiro ConversationState. Module style: - mod.rs convention; no `use super::*` in submodules — explicit origin imports. Anthropic types are reached via `crate::anthropic::types::` (submodule `super` is now `converter`, not `anthropic`), wire types via `crate::wire::`. - No `pub use X::*` globs: mod.rs re-exports only the 8 entry-point fns used outside the module, by name (map_model, get_context_window_size, normalize_request, convert_normalized_request_with_resolved_session, current_user_message_range, preview_session_value, resolve_conversation_id_from_metadata, extract_tool_result_content). The public data-model types stay defined in the facade, so the converter::X surface (consumed by llm-access provider.rs/provider/kiro_error.rs and the stream submodule) is unchanged. - Minimal visibility: a moved fn is `pub` only if referenced from another module (45 of 96); the rest stay private. `convert_normalized_request_with_validation` keeps `pub(crate)`. Cross-module helper imports used only by `#[cfg(test)]` code are `#[cfg(test)]`-gated so the lib build stays warning-clean. - Tests inlined: 7 self-contained unit tests move into their owning submodule (so the private items they exercise need no extra exposure); the 70 integration tests stay in mod.rs against the public API. No behavior change: item inventory diff vs the original shows zero dropped and zero added items, #[test] count 77 = 77; clippy -D warnings clean (lib + tests), 164 kiro tests pass, and the reverse-dep llm-access binary compiles against the unchanged public surface. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

gemini-code-assist

Code Review

This pull request introduces a comprehensive conversion pipeline to translate Anthropic MessagesRequest payloads into Kiro ConversationState structures, handling message normalization, document deduplication, model mapping, and tool-use pairing. The review feedback highlights several critical improvements for robustness and performance, including avoiding potential panics from byte-based string slicing on non-UTF-8 boundaries, strictly validating UUIDs using the uuid crate, optimizing string traversal in session previews, and eliminating redundant string allocations during tool name generation.

gemini-code-assist · 2026-05-31T01:02:38Z

+    let pos = user_id.find("session_")?;
+    let session_part = &user_id[pos + 8..];
+    if session_part.len() < 36 {
+        return None;
+    }
+
+    let uuid = &session_part[..36];
+    is_valid_uuid(uuid).then(|| uuid.to_string())


Slicing session_part[..36] directly can panic if the index 36 does not lie on a UTF-8 character boundary (e.g., if the string contains multi-byte characters). To prevent panics on non-character boundaries, avoid byte-based slicing and use character-based truncation instead.

let pos = user_id.find("session_")?; let session_part = &user_id[pos + 8..]; let uuid: String = session_part.chars().take(36).collect(); (uuid.chars().count() == 36 && is_valid_uuid(&uuid)).then(|| uuid)

References

In Rust, avoid using String::truncate with byte indices on strings that may contain multi-byte UTF-8 characters to prevent panics on non-character boundaries. Use character-based truncation like chars().take(limit).collect() instead.

Same as the resolve_conversation_id_from_metadata thread — deferring to a dedicated fix PR. Note this particular fn is #[cfg(test)] (test-only), so of the two slice sites it is the lower-priority one; the production path is resolve_conversation_id_from_metadata.

gemini-code-assist · 2026-05-31T01:02:38Z

+    let session_part = &user_id[pos + 8..];
+    if session_part.len() < 36 {
+        return generated_fallback(
+            SessionFallbackReason::InvalidLegacySessionId,
+            None,
+            user_id_preview,
+        );
+    }
+
+    let uuid = &session_part[..36];


Slicing session_part[..36] directly can panic if the index 36 does not lie on a UTF-8 character boundary. To prevent panics on non-character boundaries, avoid byte-based slicing and use character-based truncation instead.

Suggested change

let session_part = &user_id[pos + 8..];

if session_part.len() < 36 {

return generated_fallback(

SessionFallbackReason::InvalidLegacySessionId,

None,

user_id_preview,

);

}

let uuid = &session_part[..36];

let session_part = &user_id[pos + 8..];

let uuid: String = session_part.chars().take(36).collect();

if uuid.chars().count() != 36 {

return generated_fallback(

SessionFallbackReason::InvalidLegacySessionId,

None,

user_id_preview,

);

}

References

In Rust, avoid using String::truncate with byte indices on strings that may contain multi-byte UTF-8 characters to prevent panics on non-character boundaries. Use character-based truncation like chars().take(limit).collect() instead.

Valid pre-existing concern, but holding for a separate fix PR. This PR is a behavior-preserving structural split (verified by item-inventory diff + the same unchanged tests), and a panic-to-fallback guard is a behavior change that does not belong in it — the slice site here was relocated verbatim, not introduced.

Worth flagging: the suggested session_part.chars().take(36) also changes byte semantics to char semantics (equal only because UUIDs are ASCII). The byte-preserving guard is session_part.get(..36). That deserves its own PR with a regression test for non-UTF-8 / multibyte user_id input.

gemini-code-assist · 2026-05-31T01:02:39Z

+pub fn is_valid_uuid(value: &str) -> bool {
+    value.len() == 36 && value.chars().filter(|ch| *ch == '-').count() == 4
+}


The current implementation of is_valid_uuid only checks the length and the count of hyphens, which is extremely loose and can accept invalid UUID strings (e.g., all hyphens). Since the uuid crate is already imported, we should use Uuid::parse_str to perform a strict and robust validation.

Suggested change

pub fn is_valid_uuid(value: &str) -> bool {

value.len() == 36 && value.chars().filter(|ch| *ch == '-').count() == 4

}

pub fn is_valid_uuid(value: &str) -> bool {

Uuid::parse_str(value).is_ok()

}

Holding for the same follow-up PR. Tightening to Uuid::parse_str is a behavior change — it rejects strings the current 36-char / 4-hyphen check accepts — so it needs its own PR plus a check that upstream session_id values are always strict UUIDs before narrowing acceptance. Out of scope for this behavior-preserving structural move.

gemini-code-assist · 2026-05-31T01:02:39Z

+pub fn preview_session_value(value: &str) -> String {
+    let mut preview = value
+        .chars()
+        .take(SESSION_SOURCE_PREVIEW_MAX_CHARS)
+        .collect::<String>();
+    if value.chars().count() > SESSION_SOURCE_PREVIEW_MAX_CHARS {
+        preview.push_str("...[truncated]");
+    }
+    preview
+}


The preview_session_value function currently traverses the string twice: once to collect the preview characters, and once to count all characters in the string via value.chars().count(). This can be inefficient for very long strings. We can optimize this by using the iterator directly to check if there are any remaining characters.

pub fn preview_session_value(value: &str) -> String { let mut chars = value.chars(); let mut preview: String = chars.by_ref().take(SESSION_SOURCE_PREVIEW_MAX_CHARS).collect(); if chars.next().is_some() { preview.push_str("...[truncated]"); } preview }

Fixed in 34c9b57. Single iterator now: take the preview, then peek one more char (chars.next().is_some()) to decide the truncation marker — no second chars().count() pass. Byte-identical output.

gemini-code-assist · 2026-05-31T01:02:39Z

+    if !existing.contains(&STRUCTURED_OUTPUT_TOOL_NAME_BASE.to_lowercase()) {
+        return STRUCTURED_OUTPUT_TOOL_NAME_BASE.to_string();
+    }
+    for suffix in 1.. {
+        let candidate = format!("{STRUCTURED_OUTPUT_TOOL_NAME_BASE}_{suffix}");
+        if !existing.contains(&candidate.to_lowercase()) {
+            return candidate;
+        }


Since STRUCTURED_OUTPUT_TOOL_NAME_BASE is a lowercase constant ("sf_emit_structured_output"), calling .to_lowercase() on it and the generated candidate string is redundant and causes unnecessary allocations. We can directly query the HashSet with the lowercase constant and the candidate string.

Suggested change

if !existing.contains(&STRUCTURED_OUTPUT_TOOL_NAME_BASE.to_lowercase()) {

return STRUCTURED_OUTPUT_TOOL_NAME_BASE.to_string();

}

for suffix in 1.. {

let candidate = format!("{STRUCTURED_OUTPUT_TOOL_NAME_BASE}_{suffix}");

if !existing.contains(&candidate.to_lowercase()) {

return candidate;

}

if !existing.contains(STRUCTURED_OUTPUT_TOOL_NAME_BASE) {

return STRUCTURED_OUTPUT_TOOL_NAME_BASE.to_string();

}

for suffix in 1.. {

let candidate = format!("{STRUCTURED_OUTPUT_TOOL_NAME_BASE}_{suffix}");

if !existing.contains(&candidate) {

return candidate;

}

}

Fixed in 34c9b57. STRUCTURED_OUTPUT_TOOL_NAME_BASE and the {BASE}_{suffix} candidates are already lowercase, so those .to_lowercase() calls were no-op allocations — now querying the HashSet with the borrowed &str/&String directly. The per-tool .to_lowercase() that builds the set stays (it normalizes arbitrary tool names for case-insensitive comparison).

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8f50d18829

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-31T01:04:57Z

+
+use std::collections::{BTreeMap, HashMap};
+
+pub use convert::{convert_normalized_request_with_resolved_session, current_user_message_range};


Re-export convert_request from the facade

convert_request was a public function on anthropic::converter before this split, but the new facade only re-exports other entry points while convert_request now lives inside the private convert submodule. Because anthropic/mod.rs exposes pub mod converter, any caller using llm_access_kiro::anthropic::converter::convert_request can no longer compile after this refactor even though the function still exists internally; include it in this pub use to preserve the previous public surface.

Useful? React with 👍 / 👎.

This one does not hold. convert_request is #[cfg(test)]-gated (it was in the original file too), so it does not exist in non-test builds and was never part of the public surface — no downstream caller can name llm_access_kiro::anthropic::converter::convert_request, and a workspace grep finds no external (non-test) caller. The real entry points (convert_normalized_request_with_resolved_session, normalize_request, current_user_message_range, extract_tool_result_content, etc.) are re-exported by name, and the reverse-dep llm-access build compiles green against the unchanged surface. So there is nothing to re-export here.

…/tools Two behavior-preserving cleanups flagged in PR #12 review (both on code this PR only relocated, not introduced): - preview_session_value traversed the string twice (collect the preview, then chars().count() the whole value). Drive a single iterator: take the preview, then peek one more char to decide whether to append the truncation marker. Byte-identical output. - make_structured_output_tool_name called .to_lowercase() on STRUCTURED_OUTPUT_TOOL_NAME_BASE and each generated candidate before querying the HashSet. Both are already lowercase, so those calls were no-op allocations; query the set with the borrowed &str / &String directly. The per-tool .to_lowercase() that builds the set stays (it normalizes arbitrary tool names for case-insensitive comparison). No behavior change: clippy -D warnings clean (lib + tests), 164 kiro tests pass (incl. the session-fallback + structured-output tests that exercise both fns). Addresses PR #12 review findings from gemini-code-assist. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

gemini-code-assist Bot reviewed May 31, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed May 31, 2026

View reviewed changes

acking-you merged commit bc2002d into master May 31, 2026
3 checks passed

acking-you mentioned this pull request May 31, 2026

refactor(llm-access-core): split store.rs into submodules #13

Open


		use std::collections::{BTreeMap, HashMap};

		pub use convert::{convert_normalized_request_with_resolved_session, current_user_message_range};

Conversation

acking-you commented May 31, 2026

What

Layout

Module style (per the agreed standard)

Verification

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 31, 2026

Choose a reason for hiding this comment

Uh oh!

acking-you May 31, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 31, 2026

Choose a reason for hiding this comment

Uh oh!

acking-you May 31, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 31, 2026

Choose a reason for hiding this comment

Uh oh!

acking-you May 31, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 31, 2026

Choose a reason for hiding this comment

Uh oh!

acking-you May 31, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 31, 2026

Choose a reason for hiding this comment

Uh oh!

acking-you May 31, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 31, 2026

Choose a reason for hiding this comment

Uh oh!

acking-you May 31, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant