feat(provider): AGENTMEMORY_DISABLE_THINKING to recover structured output on Qwen3/GLM/Kimi by efenex · Pull Request #569 · rohitg00/agentmemory

efenex · 2026-05-20T15:24:20Z

Summary

Adds an opt-in env var `AGENTMEMORY_DISABLE_THINKING=true` that forces thinking mode OFF on hybrid-reasoning models served behind OpenAI-compatible endpoints (Qwen3 family, GLM, Kimi, DeepSeek V4-Flash, etc.).

Motivation

When the configured `OPENAI_BASE_URL` points at a hybrid-reasoning model (e.g. Qwen3.6-35B-A3B-FP8 via vLLM, or a serverless provider hosting Qwen3-32B), every chat completion burns tokens on a `<think>...</think>` block before the actual answer. Worse, structured-output prompts (graph extraction, XML/JSON-mode summarization, `mem::compress`, `mem::summarize`'s XML schema) frequently truncate inside the thinking block: `message.content` comes back empty and `message.reasoning` carries the half-finished reasoning, which the existing parsers can't turn into a valid summary or graph entity.

I hit this concretely while bulk-rebuilding the graph (~8k nodes / ~5k edges across observations + memories) against a local Qwen3.6 vLLM, where ~30% of chunks failed to parse due to mid-thinking truncation. Same pattern documented in another OSS project's `llm_engine.py:6207-6260` as the "$7 Qwen3-32B incident".
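The failure mode described above can be recognized from the response itself. The following is a minimal sketch, not code from this PR: the helper name `truncatedWhileThinking` and the `ChoiceLike` type are hypothetical, and it assumes the server reports the standard `finish_reason: "length"` when the token budget runs out and exposes the partial thinking text as `message.reasoning`.

```typescript
// Subset of an OpenAI-compatible chat completion choice. The `reasoning`
// field is a server-side extension (name taken from the description above),
// not part of the official OpenAI schema.
interface ChoiceLike {
  finish_reason: string | null;
  message: { content: string | null; reasoning?: string | null };
}

// Hypothetical helper: returns true when the model ran out of tokens while
// still thinking, i.e. there is reasoning text but no answer to parse.
function truncatedWhileThinking(choice: ChoiceLike): boolean {
  const hasAnswer = (choice.message.content ?? "").trim().length > 0;
  const hasReasoning = (choice.message.reasoning ?? "").trim().length > 0;
  return !hasAnswer && hasReasoning && choice.finish_reason === "length";
}
```

A check like this only diagnoses the problem; it cannot recover the missing summary, which is why the PR disables thinking up front instead.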

What it does

Two signals, belt-and-suspenders:

- Server-side: send `chat_template_kwargs.enable_thinking=false` in the chat completions request body. Honored by vLLM, SGLang, llama.cpp's OpenAI-compatible server, and most thinking-aware hosting layers.
- Client-side fallback: prefix `/no_think` to the system message. This is the convention Qwen3 and friends recognize even when the server doesn't pass through `chat_template_kwargs`.

Both fire only when `AGENTMEMORY_DISABLE_THINKING=true` is set. Default off; existing setups with non-thinking models are unaffected.
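The two signals can be sketched as a small request builder. This is an illustration under stated assumptions rather than the code in `src/providers/openai.ts`: the function name `buildChatBody`, the interface names, and the newline used to separate `/no_think` from the prompt are all hypothetical.

```typescript
// Minimal message/request shapes for an OpenAI-compatible chat completions call.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequestBody {
  model: string;
  messages: ChatMessage[];
  // Passed through to the chat template by servers that support it;
  // left out entirely unless thinking is being disabled.
  chat_template_kwargs?: { enable_thinking: boolean };
}

// Hypothetical helper, not the name used in the PR.
function buildChatBody(
  model: string,
  systemPrompt: string,
  userPrompt: string,
  disableThinking: boolean,
): ChatRequestBody {
  // Client-side fallback: prefix /no_think to the system message.
  // The newline separator is an assumption of this sketch.
  const effectiveSystemPrompt = disableThinking
    ? `/no_think\n${systemPrompt}`
    : systemPrompt;

  const body: ChatRequestBody = {
    model,
    messages: [
      { role: "system", content: effectiveSystemPrompt },
      { role: "user", content: userPrompt },
    ],
  };

  // Server-side signal: only added when the flag is on, so the default
  // request body keeps its original shape.
  if (disableThinking) {
    body.chat_template_kwargs = { enable_thinking: false };
  }
  return body;
}
```

With `disableThinking` set to `false` the returned body has no `chat_template_kwargs` key and the system prompt is untouched, which matches the "default off" behaviour described above.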

Test plan

- `npm test` passes (existing OpenAI provider tests still green; no new tests added, since the change is purely additive to the request body)
- With `AGENTMEMORY_DISABLE_THINKING=true` and a Qwen3 endpoint, `mem::compress` returns parseable XML on the first attempt instead of intermittently truncating mid-think
- With the env var unset, the request body is unchanged (verified by inspection)

Related

Sister to fix(embeddings): allow custom OpenAI embedding path #467 and fix(embed): allow separate OPENAI_EMBEDDING_BASE_URL + OPENAI_EMBEDDING_API_KEY #503: operator-side knobs for OpenAI-compatible deployments with non-standard transports or backends.

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Added environment variable configuration to control AI reasoning behavior for hybrid reasoning models, providing operational flexibility without code modifications.

…mode

When AGENTMEMORY_DISABLE_THINKING=true, the OpenAI-compatible provider forces thinking mode OFF on hybrid-reasoning models (Qwen3 family, GLM, Kimi, DeepSeek V4-Flash). Without this, every call burns tokens on a <think>...</think> block before the actual answer, and structured-output prompts (graph extraction, XML/JSON-mode summarization, etc.) often truncate inside the thinking block, yielding empty `content` and a meandering `reasoning` field that parsers can't recover.

Belt-and-suspenders: send `chat_template_kwargs.enable_thinking=false` as the server-side signal AND prefix `/no_think` to the system message as the client-side fallback (same pattern as gitops-assistant's llm_engine.py:6207-6260, which has a documented "$7 Qwen3-32B incident" from missing this signal).

The env var is opt-in (default off), so existing setups with thinking-required models are unaffected. Operators running Qwen3.x / GLM / Kimi behind an OpenAI-compatible endpoint (vLLM, LM Studio, Ollama, etc.) can set this to recover deterministic structured-output behaviour.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

vercel · 2026-05-20T15:24:26Z

@efenex is attempting to deploy a commit to the rohitg00's projects Team on Vercel.

A member of the Team first needs to authorize it.

coderabbitai · 2026-05-20T15:24:34Z

Walkthrough

OpenAI provider now conditionally disables "thinking" for hybrid reasoning models. When AGENTMEMORY_DISABLE_THINKING=true, the system prompt is prefixed with /no_think and enable_thinking=false is added to the request body; otherwise, the original prompt is sent unchanged.

Changes

Thinking Disable Configuration

| Layer / File(s) | Summary |
| --- | --- |
| Conditional thinking disable in request construction `src/providers/openai.ts` | The `call()` method reads `AGENTMEMORY_DISABLE_THINKING` and computes an effective system prompt. When the variable is `true`, it prefixes `/no_think` to the system prompt and adds `chat_template_kwargs.enable_thinking=false` to the request body; otherwise it uses the original prompt. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

rohitg00/agentmemory#307: Also updates src/providers/openai.ts with conditional thinking disable logic for the same OpenAI provider request-building path.

Poem

A thought blocker springs to life,
/no_think calms the strife.
When flags flip true, minds rest—
Reasoning models run their best. 🐇✨

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately reflects the main change: introducing AGENTMEMORY_DISABLE_THINKING feature for OpenAI provider to fix structured output issues on specific hybrid-reasoning models. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.

coderabbitai

🧹 Nitpick comments (2)

src/providers/openai.ts (2)
93-94: ⚡ Quick win

Use getEnvVar() for consistency with rest of file.

Line 94 accesses the env var directly via process.env, while all other env var reads in this file use the getEnvVar() helper (lines 62, 63, 66, 171, 175). This breaks the abstraction and bypasses any normalization or mocking that getEnvVar may provide.
♻️ Proposed fix
     const disableThinking =
-      (process.env["AGENTMEMORY_DISABLE_THINKING"] || "").toLowerCase() === "true";
+      (getEnvVar("AGENTMEMORY_DISABLE_THINKING") || "").toLowerCase() === "true";
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/providers/openai.ts` around lines 93 - 94, The code reads
AGENTMEMORY_DISABLE_THINKING directly via process.env for the disableThinking
constant; replace that direct access with the existing helper getEnvVar to keep
env normalization/mocking consistent (use
getEnvVar("AGENTMEMORY_DISABLE_THINKING", "") and then apply the .toLowerCase()
=== "true" check) so the disableThinking assignment uses getEnvVar instead of
process.env.
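To illustrate why routing the read through a helper makes the flag easier to test, here is a minimal sketch. `getEnvVarSketch` is a hypothetical stand-in: the real `getEnvVar()` signature and any normalization it performs are not shown in this review, so the environment is passed in explicitly as an assumption.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical stand-in for the repository's getEnvVar() helper.
function getEnvVarSketch(env: Env, name: string): string | undefined {
  return env[name];
}

// Only the literal string "true" (in any casing) enables the flag, matching
// the comparison in the proposed fix; values such as "1" or "yes" do not.
function isThinkingDisabled(env: Env): boolean {
  const raw = getEnvVarSketch(env, "AGENTMEMORY_DISABLE_THINKING") || "";
  return raw.toLowerCase() === "true";
}
```

In Node the caller would pass `process.env`; in a test, a plain object such as `{ AGENTMEMORY_DISABLE_THINKING: "TRUE" }` is enough, with no global state to patch.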
80-92: 💤 Low value

Consider trimming the extended comment.

The comment provides valuable context about why this feature exists (truncation issues, the referenced incident). However, the coding guidelines suggest avoiding comments explaining WHAT in favor of clear naming. The first half (lines 80-86) documents the problem well; lines 87-92 about "belt-and-suspenders" could be shortened since the code itself shows both signals are applied.

As per coding guidelines: "Avoid code comments explaining WHAT — use clear naming instead".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/providers/openai.ts` around lines 80 - 92, Trim the extended block
comment to keep the concise explanation of the problem and removal of redundant
detail: keep the first sentence(s) that describe why
AGENTMEMORY_DISABLE_THINKING exists (token waste / truncation) and remove the
extra "belt-and-suspenders" paragraph; ensure the remaining comment still
mentions the two signals by name (AGENTMEMORY_DISABLE_THINKING,
chat_template_kwargs.enable_thinking=false and the '/no_think' system prefix) so
the intent is clear from naming and the code (no additional incident history or
long rationale).

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 86909026-1531-4b08-aed6-3dc9e8f625b1

📥 Commits

Reviewing files that changed from the base of the PR and between 93d1bdd and ed03f59.

📒 Files selected for processing (1)

src/providers/openai.ts

coderabbitai Bot reviewed May 20, 2026

efenex wants to merge 1 commit into rohitg00:main from efenex:feat/v3-e-disable-thinking
