Replace motleycrew with local structured_output_with_retries; drop langchain by ZmeiGorynych · Pull Request #1 · MotleyAI/schema_gen

ZmeiGorynych · 2026-04-16T11:19:21Z

Summary

Replace the motleycrew dependency with a local, in-tree structured_output_with_retries helper, and drop langchain-core / langchain-openai entirely. The new helper is built on instructor + litellm + tenacity — the same stack motleycrew was wrapping — and accepts a plain LiteLLM model string (e.g. openai/gpt-4o) instead of a LangChain BaseLanguageModel instance.

Also purges leftover storyline.schema_gen.* imports from when this package was extracted from the storyline monorepo (they would have broken a clean install), and adds a small offline test suite.

What changed

Dependencies (pyproject.toml):

Removed: motleycrew, langchain-core, langchain-openai
Added: instructor >= 1.15.0, litellm == 1.82.6 (exact pin — see security note), tenacity >= 9.0.0

LLM call site (schema_gen/llm_generator.py):

generate_sqlalchemy_models_from_prompt(prompt, language_model: BaseLanguageModel = None) → generate_sqlalchemy_models_from_prompt(prompt, model: str = \"anthropic/claude-sonnet-4-5-20250929\", max_attempts: int = 3)
Builds messages=[{\"role\": \"user\", \"content\": rendered_prompt}] and calls the local helper with retry_exceptions=(ValidationError, ValueError) so that both Pydantic field-validation failures and ORM-conversion failures from post_process_schema are retried.

Helper (schema_gen/structured_output_with_retries.py):

Replaced reraise=True with the default (no reraise), so exhausting retries raises tenacity.RetryError with __cause__ set to the last exception. The existing except RetryError → raise RuntimeError from e.__cause__ block now actually fires.
Replaced the lowercase-any annotations with typing.Any.
Added an import-time shim — see below.

Instructor import shim: instructor >= 1.10 eagerly imports every provider shim (openai, anthropic, mistralai, bedrock, cohere, …) from instructor.providers.__init__. When an optional provider's expected top-level symbol isn't present — e.g. mistralai 2.x no longer exposes a top-level Mistral class — `import instructor` fails even though we only use `from_litellm`. The shim retries the import; on each `ImportError`, it parses the message and installs a dummy stub for the missing name or module, then retries. This handles future provider breakage automatically. See 567-labs/instructor#2205 (still open) and the closed PR #2206.

Security note on litellm pin: versions `1.82.7` and `1.82.8` were compromised by the TeamPCP supply-chain attack (March 2026) and exfiltrated cloud credentials, SSH keys, and Kubernetes secrets on import. `1.82.6` is exact-pinned (`==`) rather than caret-pinned (`^`) so those releases cannot be resolved. See the LiteLLM security advisory.

Namespace cleanup: removed stale `from storyline.schema_gen.* import ...` lines in `models.py`, `converter.py`, and `validators.py` (leftovers from extraction; the `storyline` package isn't declared as a dependency and wasn't resolvable).

Docs + example: `README.md` and `example.py` updated to reflect the new LiteLLM-based API — `ChatOpenAI(...)` → pass `model="openai/gpt-4o"` and let litellm read `OPENAI_API_KEY` from the environment.

Tests

New `tests/` directory with 9 unit tests, all offline (no LLM calls):

`tests/conftest.py` — `FakeInstructorClient` + `patch_instructor` fixture that monkeypatches `instructor.from_litellm` and honours the passed-in `tenacity.Retrying` so the real retry policy drives the fake.
`tests/test_structured_output_with_retries.py` — success path, retry on `ValidationError` then success, retry on callable's `ValueError` (Pydantic wraps it as `ValidationError`) then success, exhaustion → `RuntimeError` with chained cause, non-retryable custom exception propagates untouched.
`tests/test_llm_generator.py` — `post_process_schema` passthrough and nullable-primary-key correction; `generate_sqlalchemy_models_from_prompt` wires the right args to the helper; default model is used when unspecified.

Test plan

`poetry lock` regenerates cleanly with the new dependency set
`poetry install` into the project `.venv/` succeeds
`poetry run pytest -v` — all 9 tests pass in ~6s
`poetry run python -c "import schema_gen; ..."` — package imports cleanly (~2s cold)
`grep -rE "storyline|langchain|motleycrew" schema_gen/ example.py pyproject.toml README.md` — no hits
(not run here; needs a real API key) `poetry run python example.py` — end-to-end smoke test

Summary by CodeRabbit

Breaking Changes
- Schema-generation API signature changed to accept a model string and max attempts; failure exception updated to RuntimeError — update call sites.
New Features
- Added a local structured-output helper with retry behavior (configurable model and attempts).
Documentation
- README and examples updated to show model-string-driven LiteLLM usage and streamlined env var guidance.
Chores
- Project dependencies and test config updated for new LLM tooling.
Tests
- New unit and integration tests covering validation, retries, and end-to-end generation.

…ngchain Removes the `motleycrew` and `langchain-core` / `langchain-openai` runtime dependencies. The only thing the project used from motleycrew was its `structured_output_with_retries` helper; that helper is now built in-tree on top of `instructor` + `litellm` + `tenacity`, which is what motleycrew itself was wrapping. The old LangChain `BaseLanguageModel` argument is replaced with a plain LiteLLM model string (e.g. `openai/gpt-4o`). Also: * Purge leftover `storyline.schema_gen.*` imports in `models.py`, `converter.py`, and `validators.py` — the package was extracted from `storyline` but kept referencing the parent namespace, which broke imports on a clean install. * Pin `litellm == 1.82.6` (exact). Versions 1.82.7 and 1.82.8 were compromised by the TeamPCP supply-chain attack (March 2026) and must not be resolvable from the constraint. * Add an import-time shim for `instructor` >= 1.10, which eagerly imports every provider module at top level. When an optional provider's top-level symbol isn't available (e.g. `mistralai` 2.x no longer exposes top-level `Mistral`), `import instructor` fails even though we only use `from_litellm`. The shim retries the import and installs a stub for whatever name/module is reported missing, so future provider breakage is handled automatically. See 567-labs/instructor#2205. * New test suite (9 tests, all offline) covering the helper's success and retry paths and the generator's wiring to it. Uses a fake instructor client that honours the passed-in tenacity retry policy. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-04-16T11:19:35Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 56989aac-3340-4744-a00d-9cf5b38d9fef

📥 Commits

Reviewing files that changed from the base of the PR and between 3505f8b and 1326e7e.

📒 Files selected for processing (1)

README.md

📝 Walkthrough

Walkthrough

Replaces LangChain/MotleyCrew LLM plumbing with a LiteLLM + instructor-based flow: generate_sqlalchemy_models_from_prompt(prompt, model: str = DEFAULT_MODEL, max_attempts: int = 3) now drives schema generation via schema_gen.structured_output_with_retries(...); updates imports, docs, example, dependencies, and tests.

Changes

Cohort / File(s)	Summary
Documentation & README `README.md`	Replaced LangChain examples with LiteLLM/instructor usage; rename API to `generate_sqlalchemy_models_from_prompt(...)`, swap `language_model`+`max_retries` for `model`+`max_attempts`, change documented failure to `RuntimeError`, and adjust usage and dependency guidance.
Project Config `pyproject.toml`	Removed LangChain/motleycrew deps; added `instructor (>=1.15.0)`, `litellm (>=1.83.0)`, `tenacity (>=9.0.0)`; added pytest `integration` marker and default exclude for integration tests.
Core LLM Generator `schema_gen/llm_generator.py`	Reworked to accept `model: str` and `max_attempts`; render prompt into a single user-role message; call `structured_output_with_retries(..., validation_callable=post_process_schema, retry_exceptions=(ValidationError, ValueError))`; added `DEFAULT_MODEL` and adjusted logging.
New Retry/Parsing Module `schema_gen/structured_output_with_retries.py`	New module implementing `structured_output_with_retries(...)` using `instructor.from_litellm`, a `WrappedSchema` invoking `validation_callable` in `model_post_init`, tenacity-based retry loop driven by `max_attempts`/`retry_exceptions`, import-time shim to stub missing instructor provider imports, and raising `RuntimeError` on retry exhaustion with chained cause.
Package API & Imports `schema_gen/__init__.py`, `schema_gen/models.py`, `schema_gen/converter.py`, `schema_gen/validators.py`	Updated example in `__init__.py`. Normalized internal imports from `storyline.schema_gen.` → `schema_gen.`. Minor import-path changes only.
Example Script `example.py`	Removed LangChain/OpenAI client construction; check `OPENAI_API_KEY` presence and pass LiteLLM-style `model` string (e.g., `openai/gpt-4.1-mini`) to `generate_sqlalchemy_models_from_prompt(model=...)`.
Tests — Fixtures & Generator `tests/conftest.py`, `tests/test_llm_generator.py`	Added `conftest.py` with `patch_instructor` fixture and fake instructor client (`FakeInstructorClient`, `_FakeCompletions`); new tests for `post_process_schema` and that `generate_sqlalchemy_models_from_prompt()` forwards `schema`, `model`, `max_attempts`, `validation_callable`, `retry_exceptions`, and embeds the prompt as a single user message.
Tests — Structured Output `tests/test_structured_output_with_retries.py`	New tests covering success, validation-error retries, validation-callable exceptions, retry exhaustion raising `RuntimeError` with chained `__cause__`, and non-retryable exception propagation; asserts completion call counts and args.
Tests — Integration `tests/test_integration.py`	Added integration test exercising prompt → schema → ORM → SQLite lifecycle; marked `pytest.mark.integration`, requires `OPENAI_API_KEY`, includes `in_memory_engine` fixture and `test_full_cycle`.

Sequence Diagram

sequenceDiagram
    participant Client as Client\n(generate_sqlalchemy_models_from_prompt)
    participant Retry as tenacity.Retrying
    participant Instructor as instructor.from_litellm()
    participant LLM as LiteLLM API
    participant Validator as post_process_schema

    Client->>Client: Render prompt into single user message
    Client->>Retry: Enter retry loop (max_attempts)
    Retry->>Instructor: request completion(model, messages, response_model)
    Instructor->>LLM: send provider request
    LLM-->>Instructor: return completion payload
    Instructor->>Instructor: parse into pydantic schema (response_model)
    Instructor->>Validator: call validation_callable(schema)
    alt validation_callable succeeds
        Validator-->>Retry: return validated schema
        Retry-->>Client: return result
    else validation_callable raises (ValidationError/ValueError)
        Instructor-->>Retry: raise -> retry loop continues
    end
    alt retries exhausted
        Retry-->>Client: raise RuntimeError (with last exception as __cause__)
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Poem

🐰 I hopped through code where prompts take flight,

LiteLLM and Instructor guide the night.
Tenacity retries keep schemas bright,
Tests check the burrows, snug and tight.
A little rabbit cheers this tidy bite.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 58.06% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'Replace motleycrew with local structured_output_with_retries; drop langchain' directly and accurately reflects the primary changes in the PR: removing motleycrew and langchain dependencies while introducing a local structured_output_with_retries helper.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch remove-motleycrew-langchain

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (2)

schema_gen/llm_generator.py (1)
140-142: Redundant to_orm_classes() call.

The to_orm_classes() method is called here as a "final sanity check," but it was already called in post_process_schema (line 81) which is the validation_callable. Since post_process_schema already validates the schema by attempting ORM conversion, this second call is redundant and could be removed to avoid unnecessary computation.
♻️ Suggested simplification
-    # Convert the validated schema to ORM classes (also acts as a final sanity check)
-    logger.info("Converting validated schema to ORM classes")
-    orm_classes = schema_result.to_orm_classes()
-
-    logger.info(
-        f"Successfully generated {len(orm_classes)} ORM model classes: {list(orm_classes.keys())}"
-    )
+    logger.info(
+        f"Successfully generated schema with {len(schema_result.tables)} tables: "
+        f"{[t.name for t in schema_result.tables]}"
+    )
     return schema_result
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@schema_gen/llm_generator.py` around lines 140 - 142, Remove the redundant ORM
conversion by deleting the second call to schema_result.to_orm_classes() in the
final conversion step: since post_process_schema already calls to_orm_classes()
as part of the validation_callable, avoid re-invoking to_orm_classes() here;
keep the logger.info("Converting validated schema to ORM classes") if you want
the message, but do not call schema_result.to_orm_classes() again—rely on the
validated schema returned by post_process_schema instead (references:
post_process_schema, schema_result, to_orm_classes, logger).
schema_gen/structured_output_with_retries.py (1)
76-76: Inconsistent default model string.

The default model here is "anthropic/claude-3-5-sonnet-20241022", but llm_generator.py uses DEFAULT_MODEL = "anthropic/claude-sonnet-4-5-20250929". While callers typically pass an explicit model, having different defaults could cause confusion. Consider aligning these or importing DEFAULT_MODEL from llm_generator.
♻️ Suggested fix
-    model: str = "anthropic/claude-3-5-sonnet-20241022",
+    model: str = "anthropic/claude-sonnet-4-5-20250929",
Also update line 169 in the __main__ example block to match.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@schema_gen/structured_output_with_retries.py` at line 76, The default model
string in structured_output_with_retries.py (parameter name model) is
inconsistent with llm_generator.py's DEFAULT_MODEL; import DEFAULT_MODEL from
llm_generator and use it as the default for the model parameter (replace the
hard-coded "anthropic/claude-3-5-sonnet-20241022"), and update the __main__
example block (the example model usage around the existing example at line ~169)
to reference DEFAULT_MODEL as well so both places stay in sync.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pyproject.toml`:
- Around line 19-22: The dependency pin for litellm in pyproject.toml currently
forces "==1.82.6"; update this to a modern secure constraint such as ">=1.83.0"
(or a specific recent 1.83.x release) to adopt the new secure CI/CD fixes while
avoiding the compromised 1.82.7/1.82.8 releases; locate the litellm entry in
pyproject.toml and replace the exact pin with the recommended range or specific
version, then run your dependency install/lock step to verify the updated
resolver picks the intended version.

In `@README.md`:
- Line 323: Update the README entry for the "instructor" dependency which
currently lists "(>=1.6.0)" to reflect the correct minimum version "(>=1.15.0)";
locate the line containing the bullet "**instructor** (>=1.6.0) - Structured
output wrapper" and change the version constraint to "(>=1.15.0)" so the
documentation matches the PR's dependency requirement.

---

Nitpick comments:
In `@schema_gen/llm_generator.py`:
- Around line 140-142: Remove the redundant ORM conversion by deleting the
second call to schema_result.to_orm_classes() in the final conversion step:
since post_process_schema already calls to_orm_classes() as part of the
validation_callable, avoid re-invoking to_orm_classes() here; keep the
logger.info("Converting validated schema to ORM classes") if you want the
message, but do not call schema_result.to_orm_classes() again—rely on the
validated schema returned by post_process_schema instead (references:
post_process_schema, schema_result, to_orm_classes, logger).

In `@schema_gen/structured_output_with_retries.py`:
- Line 76: The default model string in structured_output_with_retries.py
(parameter name model) is inconsistent with llm_generator.py's DEFAULT_MODEL;
import DEFAULT_MODEL from llm_generator and use it as the default for the model
parameter (replace the hard-coded "anthropic/claude-3-5-sonnet-20241022"), and
update the __main__ example block (the example model usage around the existing
example at line ~169) to reference DEFAULT_MODEL as well so both places stay in
sync.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d2e5ccbc-3ce9-40c5-89b3-a027649e7a8f

📥 Commits

Reviewing files that changed from the base of the PR and between fb52b19 and 7142609.

⛔ Files ignored due to path filters (1)

poetry.lock is excluded by !**/*.lock

📒 Files selected for processing (13)

README.md
example.py
pyproject.toml
schema_gen/__init__.py
schema_gen/converter.py
schema_gen/llm_generator.py
schema_gen/models.py
schema_gen/structured_output_with_retries.py
schema_gen/validators.py
tests/__init__.py
tests/conftest.py
tests/test_llm_generator.py
tests/test_structured_output_with_retries.py

…ai/gpt-4.1-mini - Upgrade litellm from ==1.82.6 to >=1.83.0 (compromised versions removed from PyPI) - Fix README instructor version (>=1.6.0 -> >=1.15.0) to match pyproject.toml - Remove redundant to_orm_classes() call in generate_sqlalchemy_models_from_prompt - Change default model from anthropic/claude-sonnet to openai/gpt-4.1-mini everywhere - Add integration test with real LLM call (pytest -m integration) - Configure pytest to skip integration tests by default Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@README.md`:
- Around line 51-54: Remove the duplicated sentence "the corresponding provider
API key must be set in the environment" from the quick-start snippet so it
appears only once; keep a single instance (the one followed by the example in
parentheses) and delete the accidental copy to avoid the repeated API-key note
in README.md.

In `@schema_gen/llm_generator.py`:
- Around line 123-125: The code currently logs the raw user prompt via
logger.info("User prompt: {prompt}") in llm_generator.py; change this to avoid
emitting sensitive user-supplied data at INFO by moving the full prompt log to
DEBUG (logger.debug) and/or redacting it before logging, and instead keep a
non-sensitive INFO-level message (e.g., "Received user prompt" with length or a
masked preview). Update the logging calls that reference prompt and model (the
logger.info lines) so that only non-sensitive metadata stays at INFO and the
full prompt is only logged at DEBUG or not at all.

In `@schema_gen/structured_output_with_retries.py`:
- Around line 40-56: The ImportError handling block that uses
_CANNOT_IMPORT_NAME_RE and _NO_MODULE_RE is too permissive; add an explicit
allowlist check (e.g., SAFE_TO_STUB_PROVIDERS = {...}) and only perform the
sys.modules mutation and setattr(type(...)) when the resolved module_name or
attr_name is in that allowlist; otherwise re-raise the ImportError. Update the
code paths that create a placeholder module (where sys.modules[module_name] =
types.ModuleType(module_name)) and where setattr(target, attr_name, type(...))
to be guarded by this allowlist so unrelated ImportErrors are not silently
stubbed.
- Around line 102-107: WrappedSchema.model_post_init overrides Pydantic's hook
but doesn't call the parent hook, which drops any existing post-init behavior
from wrapped schemas; update WrappedSchema.model_post_init to invoke
super().model_post_init(__context) (preserving __context) and then set
object.__setattr__(self, "_validation_result", validation_callable(self)) so the
original initialization/validation chain runs before/after you capture the
callable's result; reference the WrappedSchema class, its model_post_init
method, the _validation_result attribute, and validation_callable when making
the change.

In `@tests/test_integration.py`:
- Around line 15-31: Update test_full_cycle to explicitly require an OpenAI
provider: at start of the test, check os.environ.get("OPENAI_API_KEY") and call
pytest.skip("OPENAI_API_KEY not set") when absent; then call
generate_sqlalchemy_models_from_prompt with an explicit model argument (pass
model=DEFAULT_MODEL or a concrete OpenAI model string) instead of relying
implicitly on the global default so changes to DEFAULT_MODEL won't silently
break the integration test; reference test_full_cycle and
generate_sqlalchemy_models_from_prompt (and import DEFAULT_MODEL if you choose
to use it).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: dcb4d301-1a55-4f5a-a661-053a4d214d1b

📥 Commits

Reviewing files that changed from the base of the PR and between 7142609 and 3505f8b.

⛔ Files ignored due to path filters (1)

poetry.lock is excluded by !**/*.lock

📒 Files selected for processing (7)

README.md
example.py
pyproject.toml
schema_gen/__init__.py
schema_gen/llm_generator.py
schema_gen/structured_output_with_retries.py
tests/test_integration.py

✅ Files skipped from review due to trivial changes (2)

schema_gen/init.py
pyproject.toml

coderabbitai · 2026-04-18T14:07:44Z

    logger.info("Starting SQLAlchemy model generation from prompt")
    logger.info(f"User prompt: {prompt}")
-    logger.info(
-        f"Language model: {language_model.__class__.__name__ if language_model else 'None'}"
-    )
-
-    prompt_template = """You are a database schema designer. Generate a complete database schema based on the user's requirements:
-
-    {prompt}
-
-    For each table, specify:
-    - Table name and comment (description)
-    - Columns with types, constraints, and comments
-    - Foreign keys with proper references and ON DELETE/UPDATE actions
-    - Indexes for frequently queried columns
-    - Unique constraints EXACTLY for all primary keys and no other columns
-
-    Use appropriate data types:
-    - INTEGER, BIGINT, SMALLINT for whole numbers
-    - STRING (with length), TEXT for strings
-    - FLOAT, NUMERIC (with precision/scale) for decimals
-    - BOOLEAN for true/false
-    - DATE, DATETIME, TIMESTAMP for dates/times
-    - JSON, JSONB for structured data
-    - UUID for unique identifiers
-
-    Ensure referential integrity:
-    - Foreign keys reference existing tables
-    - Primary keys are defined
-    - Avoid circular non-nullable foreign key dependencies
-
-    Don't insert uniqueness constraints except for primary keys.
-    Provide clear, descriptive comments for tables and all columns.
-    """
-
-    prompt_template = PromptTemplate.from_template(prompt_template).format(prompt=prompt)
-    logger.info("Generated prompt template for LLM")
+    logger.info(f"Model: {model}")


⚠️ Potential issue | 🟠 Major

Avoid logging the raw prompt at INFO.

The prompt is user-supplied input and can easily contain sensitive schema or business details. Emitting it at INFO copies that data into routine logs by default; log metadata instead, or move the full prompt behind DEBUG with explicit redaction.

Suggested fix

- logger.info("Starting SQLAlchemy model generation from prompt") - logger.info(f"User prompt: {prompt}") - logger.info(f"Model: {model}") + logger.info( + "Starting SQLAlchemy model generation from prompt; model=%s prompt_chars=%d", + model, + len(prompt), + )

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

logger.info("Starting SQLAlchemy model generation from prompt")

logger.info(f"User prompt: {prompt}")

logger.info(

f"Language model: {language_model.__class__.__name__ if language_model else 'None'}"

)

prompt_template = """You are a database schema designer. Generate a complete database schema based on the user's requirements:

{prompt}

For each table, specify:

- Table name and comment (description)

- Columns with types, constraints, and comments

- Foreign keys with proper references and ON DELETE/UPDATE actions

- Indexes for frequently queried columns

- Unique constraints EXACTLY for all primary keys and no other columns

Use appropriate data types:

- INTEGER, BIGINT, SMALLINT for whole numbers

- STRING (with length), TEXT for strings

- FLOAT, NUMERIC (with precision/scale) for decimals

- BOOLEAN for true/false

- DATE, DATETIME, TIMESTAMP for dates/times

- JSON, JSONB for structured data

- UUID for unique identifiers

Ensure referential integrity:

- Foreign keys reference existing tables

- Primary keys are defined

- Avoid circular non-nullable foreign key dependencies

Don't insert uniqueness constraints except for primary keys.

Provide clear, descriptive comments for tables and all columns.

"""

prompt_template = PromptTemplate.from_template(prompt_template).format(prompt=prompt)

logger.info("Generated prompt template for LLM")

logger.info(f"Model: {model}")

logger.info(

"Starting SQLAlchemy model generation from prompt; model=%s prompt_chars=%d",

model,

len(prompt),

)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@schema_gen/llm_generator.py` around lines 123 - 125, The code currently logs the raw user prompt via logger.info("User prompt: {prompt}") in llm_generator.py; change this to avoid emitting sensitive user-supplied data at INFO by moving the full prompt log to DEBUG (logger.debug) and/or redacting it before logging, and instead keep a non-sensitive INFO-level message (e.g., "Received user prompt" with length or a masked preview). Update the logging calls that reference prompt and model (the logger.info lines) so that only non-sensitive metadata stays at INFO and the full prompt is only logged at DEBUG or not at all.

coderabbitai · 2026-04-18T14:07:44Z

+        except ImportError as exc:
+            msg = str(exc)
+            m = _CANNOT_IMPORT_NAME_RE.search(msg)
+            if m:
+                attr_name, module_name = m.group(1), m.group(2)
+                try:
+                    target = importlib.import_module(module_name)
+                except ImportError:
+                    target = types.ModuleType(module_name)
+                    sys.modules[module_name] = target
+                setattr(target, attr_name, type(attr_name, (), {}))
+                continue
+            m = _NO_MODULE_RE.search(msg)
+            if m:
+                module_name = m.group(1)
+                sys.modules[module_name] = types.ModuleType(module_name)
+                continue


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find . -name "structured_output_with_retries.py" -type f

Repository: MotleyAI/schema_gen

Length of output: 109

🏁 Script executed:

git ls-files | grep -E "(schema_gen|structured_output)" | head -20

Repository: MotleyAI/schema_gen

Length of output: 316

🏁 Script executed:

fd -type f -name "*.py" | grep -i instructor | head -20

Repository: MotleyAI/schema_gen

Length of output: 233

🏁 Script executed:

cat -n schema_gen/structured_output_with_retries.py

Repository: MotleyAI/schema_gen

Length of output: 8362

🏁 Script executed:

cat -n tests/test_structured_output_with_retries.py

Repository: MotleyAI/schema_gen

Length of output: 5980

🏁 Script executed:

rg "sys\.modules|_import_instructor|optional" schema_gen/ tests/ -A 3 -B 1

Repository: MotleyAI/schema_gen

Length of output: 3301

Add an allowlist guard for safe-to-stub provider imports.

The regex patterns (_CANNOT_IMPORT_NAME_RE and _NO_MODULE_RE) are too broad. They will silently stub any ImportError matching those message shapes, not just instructor's optional provider failures. This risks masking real bugs in unrelated code. Wrap the stubbing at lines 43–56 with an explicit allowlist check before mutating sys.modules.

Suggested guard

+def _is_known_optional_provider_import(module_name: str, attr_name: str | None = None) -> bool: + # Keep this intentionally narrow — only stub known optional provider imports. + known_modules = { + # fill with specific optional provider modules that are safe to stub + } + known_symbols = { + # ("mistralai", "Mistral"), + } + return module_name in known_modules or (module_name, attr_name) in known_symbols + def _import_instructor_with_stubs(max_shims: int = 20): @@ if m: attr_name, module_name = m.group(1), m.group(2) + if not _is_known_optional_provider_import(module_name, attr_name): + raise try: target = importlib.import_module(module_name) except ImportError: @@ if m: module_name = m.group(1) + if not _is_known_optional_provider_import(module_name): + raise sys.modules[module_name] = types.ModuleType(module_name) continue

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@schema_gen/structured_output_with_retries.py` around lines 40 - 56, The ImportError handling block that uses _CANNOT_IMPORT_NAME_RE and _NO_MODULE_RE is too permissive; add an explicit allowlist check (e.g., SAFE_TO_STUB_PROVIDERS = {...}) and only perform the sys.modules mutation and setattr(type(...)) when the resolved module_name or attr_name is in that allowlist; otherwise re-raise the ImportError. Update the code paths that create a placeholder module (where sys.modules[module_name] = types.ModuleType(module_name)) and where setattr(target, attr_name, type(...)) to be guarded by this allowlist so unrelated ImportErrors are not silently stubbed.

coderabbitai · 2026-04-18T14:07:44Z

+    class WrappedSchema(schema):  # type: ignore[valid-type]
+        _validation_result: Any = None
+
+        def model_post_init(self, __context: Any) -> None:
+            # Store the callable's return value so we can retrieve it after .create()
+            object.__setattr__(self, "_validation_result", validation_callable(self))


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cd /tmp && find . -name "structured_output_with_retries.py" 2>/dev/null | head -5

Repository: MotleyAI/schema_gen

Length of output: 45

🏁 Script executed:

git ls-files | grep -i "structured_output_with_retries"

Repository: MotleyAI/schema_gen

Length of output: 152

🏁 Script executed:

cat -n schema_gen/structured_output_with_retries.py | head -150

Repository: MotleyAI/schema_gen

Length of output: 7433

🏁 Script executed:

cat -n schema_gen/structured_output_with_retries.py | tail -50

Repository: MotleyAI/schema_gen

Length of output: 1998

🏁 Script executed:

cat -n tests/test_structured_output_with_retries.py

Repository: MotleyAI/schema_gen

Length of output: 5980

🏁 Script executed:

rg "model_post_init" --type py

Repository: MotleyAI/schema_gen

Length of output: 500

🏁 Script executed:

rg -A 5 "class.*BaseModel" tests/test_structured_output_with_retries.py

Repository: MotleyAI/schema_gen

Length of output: 179

🏁 Script executed:

rg -B 2 -A 10 "def model_post_init" .

Repository: MotleyAI/schema_gen

Length of output: 1223

🏁 Script executed:

cat -n tests/conftest.py

Repository: MotleyAI/schema_gen

Length of output: 3136

🏁 Script executed:

rg -B 5 -A 5 "will also invoke WrappedSchema.model_post_init" tests/conftest.py

Repository: MotleyAI/schema_gen

Length of output: 653

🏁 Script executed:

python3 << 'EOF' # Let's check what Pydantic's BaseModel.model_post_init() signature looks like # and whether it needs to be called import sys from pydantic import BaseModel from inspect import signature # Check the signature and docstring sig = signature(BaseModel.model_post_init) print(f"BaseModel.model_post_init signature: {sig}") print(f"BaseModel.model_post_init docstring: {BaseModel.model_post_init.__doc__}") # Create a simple example to show what happens when super() is not called class Parent(BaseModel): x: int def model_post_init(self, __context): print("Parent.model_post_init called") object.__setattr__(self, "_parent_initialized", True) class ChildWithoutSuper(Parent): y: int def model_post_init(self, __context): print("Child.model_post_init called (no super)") object.__setattr__(self, "_child_initialized", True) class ChildWithSuper(Parent): y: int def model_post_init(self, __context): print("Child.model_post_init called (with super)") super().model_post_init(__context) object.__setattr__(self, "_child_initialized", True) print("\n--- Test 1: Child WITHOUT super() ---") child1 = ChildWithoutSuper(x=1, y=2) print(f"Has _parent_initialized: {hasattr(child1, '_parent_initialized')}") print(f"Has _child_initialized: {hasattr(child1, '_child_initialized')}") print("\n--- Test 2: Child WITH super() ---") child2 = ChildWithSuper(x=1, y=2) print(f"Has _parent_initialized: {hasattr(child2, '_parent_initialized')}") print(f"Has _child_initialized: {hasattr(child2, '_child_initialized')}") EOF

Repository: MotleyAI/schema_gen

Length of output: 692

🏁 Script executed:

# Let's create a more realistic scenario matching the actual use case python3 << 'EOF' from pydantic import BaseModel, field_validator # This mimics a real-world schema that has model_post_init class SchemaWithPostInit(BaseModel): name: str value: int def model_post_init(self, __context): print(f"SchemaWithPostInit.model_post_init: initializing internal state") # Simulate some post-initialization that depends on the entire model object.__setattr__(self, "_initialized", True) `@property` def is_initialized(self): return getattr(self, "_initialized", False) # Simulate WrappedSchema WITHOUT super() call class WrappedSchemaBad(SchemaWithPostInit): _validation_result = None def model_post_init(self, __context): print(f"WrappedSchemaBad.model_post_init: running validation") object.__setattr__(self, "_validation_result", f"validation for {self.name}") # Simulate WrappedSchema WITH super() call class WrappedSchemaGood(SchemaWithPostInit): _validation_result = None def model_post_init(self, __context): print(f"WrappedSchemaGood.model_post_init: calling super first") super().model_post_init(__context) print(f"WrappedSchemaGood.model_post_init: running validation") object.__setattr__(self, "_validation_result", f"validation for {self.name}") print("=== Bad (no super()) ===") bad_instance = WrappedSchemaBad(name="test", value=42) print(f"is_initialized: {bad_instance.is_initialized}") print(f"_validation_result: {bad_instance._validation_result}") print("\n=== Good (with super()) ===") good_instance = WrappedSchemaGood(name="test", value=42) print(f"is_initialized: {good_instance.is_initialized}") print(f"_validation_result: {good_instance._validation_result}") EOF

Repository: MotleyAI/schema_gen

Length of output: 465

🏁 Script executed:

# Let's verify that the current tests don't actually test a schema with model_post_init python3 << 'EOF' from pydantic import BaseModel class SampleSchema(BaseModel): """Minimal Pydantic schema used by these tests.""" name: str count: int # Check if it has a custom model_post_init print("SampleSchema has custom model_post_init:", hasattr(SampleSchema, 'model_post_init') and SampleSchema.model_post_init is not BaseModel.model_post_init) # The issue would only manifest if someone passes a schema that has model_post_init class SchemaWithValidation(BaseModel): name: str def model_post_init(self, __context): # Example: custom validation or initialization if len(self.name) < 2: raise ValueError("Name too short") print("\nSchemaWithValidation has custom model_post_init:", SchemaWithValidation.model_post_init is not BaseModel.model_post_init) EOF

Repository: MotleyAI/schema_gen

Length of output: 164

🏁 Script executed:

python3 << 'EOF' from pydantic import BaseModel import sys # Simulate the actual WrappedSchema behavior from the code def test_wrapped_schema_without_super(): """Reproduce the bug: schema with model_post_init loses its initialization""" class UserSchema(BaseModel): """A schema that depends on model_post_init for proper initialization""" name: str def model_post_init(self, __context): print("UserSchema.model_post_init called - setting up internal state") object.__setattr__(self, "_is_valid", True) `@property` def is_valid(self): return getattr(self, "_is_valid", False) # This is how WrappedSchema is currently implemented (NO super()) class WrappedSchemaCurrently(UserSchema): _validation_result = None def model_post_init(self, __context): # Current implementation - NO super() call object.__setattr__(self, "_validation_result", f"validated {self.name}") print("Creating instance with current implementation...") instance = WrappedSchemaCurrently(name="Alice") print(f" is_valid: {instance.is_valid} (SHOULD BE True)") print(f" _validation_result: {instance._validation_result}") if not instance.is_valid: print("\n✗ BUG CONFIRMED: UserSchema's initialization was skipped!") return False return True test_wrapped_schema_without_super() EOF

Repository: MotleyAI/schema_gen

Length of output: 243

Preserve the wrapped schema's model_post_init() by calling super().

WrappedSchema overrides Pydantic's post-init hook but never calls super().model_post_init(__context). Any schema passed into this helper that already relies on model_post_init() will silently lose that initialization/validation path. The current tests only use SampleSchema, which has no custom model_post_init(), so this bug goes undetected.

Suggested fix

class WrappedSchema(schema): # type: ignore[valid-type] _validation_result: Any = None def model_post_init(self, __context: Any) -> None: + super().model_post_init(__context) # Store the callable's return value so we can retrieve it after .create() object.__setattr__(self, "_validation_result", validation_callable(self))

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@schema_gen/structured_output_with_retries.py` around lines 102 - 107, WrappedSchema.model_post_init overrides Pydantic's hook but doesn't call the parent hook, which drops any existing post-init behavior from wrapped schemas; update WrappedSchema.model_post_init to invoke super().model_post_init(__context) (preserving __context) and then set object.__setattr__(self, "_validation_result", validation_callable(self)) so the original initialization/validation chain runs before/after you capture the callable's result; reference the WrappedSchema class, its model_post_init method, the _validation_result attribute, and validation_callable when making the change.

coderabbitai · 2026-04-18T14:07:44Z

+pytestmark = pytest.mark.integration
+
+
+@pytest.fixture
+def in_memory_engine():
+    engine = create_engine("sqlite:///:memory:")
+    yield engine
+    engine.dispose()
+
+
+def test_full_cycle(in_memory_engine):
+    """Generate a schema from a prompt, convert to ORM, create tables in SQLite."""
+    schema = generate_sqlalchemy_models_from_prompt(
+        prompt="Create a schema with a users table (id, username, email) "
+        "and a posts table (id, title, body, user_id foreign key to users).",
+        max_attempts=3,
+    )


⚠️ Potential issue | 🟡 Minor

Make the integration test explicit about its provider dependency.

This test assumes OpenAI credentials, but it actually relies on DEFAULT_MODEL staying on an OpenAI model. If that default changes, pytest -m integration will fail for the wrong reason. Pass the model explicitly here and skip cleanly when OPENAI_API_KEY is absent.

Suggested fix

+import os + import pytest from sqlalchemy import create_engine, inspect @@ def test_full_cycle(in_memory_engine): """Generate a schema from a prompt, convert to ORM, create tables in SQLite.""" + if not os.getenv("OPENAI_API_KEY"): + pytest.skip("OPENAI_API_KEY is required for integration tests") + schema = generate_sqlalchemy_models_from_prompt( prompt="Create a schema with a users table (id, username, email) " "and a posts table (id, title, body, user_id foreign key to users).", + model="openai/gpt-4.1-mini", max_attempts=3, )

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/test_integration.py` around lines 15 - 31, Update test_full_cycle to explicitly require an OpenAI provider: at start of the test, check os.environ.get("OPENAI_API_KEY") and call pytest.skip("OPENAI_API_KEY not set") when absent; then call generate_sqlalchemy_models_from_prompt with an explicit model argument (pass model=DEFAULT_MODEL or a concrete OpenAI model string) instead of relying implicitly on the global default so changes to DEFAULT_MODEL won't silently break the integration test; reference test_full_cycle and generate_sqlalchemy_models_from_prompt (and import DEFAULT_MODEL if you choose to use it).

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

coderabbitai Bot reviewed Apr 16, 2026

View reviewed changes

Comment thread pyproject.toml Outdated

Comment thread README.md Outdated

coderabbitai Bot reviewed Apr 18, 2026

View reviewed changes

Update README.md

1326e7e

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Replace motleycrew with local structured_output_with_retries; drop langchain#1

Replace motleycrew with local structured_output_with_retries; drop langchain#1
ZmeiGorynych wants to merge 3 commits into
mainfrom
remove-motleycrew-langchain

ZmeiGorynych commented Apr 16, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Apr 16, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

coderabbitai Bot Apr 18, 2026

Uh oh!

coderabbitai Bot Apr 18, 2026

Uh oh!

coderabbitai Bot Apr 18, 2026

Uh oh!

coderabbitai Bot Apr 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ZmeiGorynych commented Apr 16, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What changed

Tests

Test plan

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Apr 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot Apr 18, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 18, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 18, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 18, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

ZmeiGorynych commented Apr 16, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Apr 16, 2026 •

edited

Loading