FixtureForge

Agentic test data harness for Python — deterministic in CI, AI-powered in dev.

The problem with test data today

# ❌ Everyone does this. It's brittle and misses real-world edge cases.
user = User(name="Test User", email="test@test.com", bio="Lorem ipsum...")

# ❌ factory_boy is great — but it's static. No surprises, no edge cases.
UserFactory.create(role="admin")

# ❌ Writing 500 of them by hand? Not happening.

Hardcoded fixtures rot. AI-generated fixtures are unpredictable in CI.
FixtureForge solves both — same codebase, two behaviors:

Dev mode  →  AI generates rich, realistic, edge-case-aware fixtures
CI mode   →  same fixtures, frozen with seed=42, 100% reproducible

Quickstart

pip install fixtureforge

from fixtureforge import Forge
from pydantic import BaseModel

class User(BaseModel):
    id: int
    name: str
    email: str
    bio: str

forge = Forge()                    # auto-detects AI provider from env vars
users = forge.create_batch(User, count=50, context="SaaS platform users")

FixtureForge routes each field to the cheapest generator automatically:

id → sequential counter (free)
name, email → Faker (free)
bio → single batched AI call for all 50 records (1 API call, not 50)

No AI key? No problem. Pure Faker mode works out of the box:

forge = Forge(use_ai=False, seed=42)   # deterministic, zero network, CI-safe
users = forge.create_batch(User, count=500)
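
To illustrate what the seed guarantees, here is a minimal stdlib-only sketch of seeded generation. It does not use FixtureForge or Faker; `make_users` and `FIRST_NAMES` are hypothetical names invented for this example. The point is only that the same seed yields the same batch on every run.

```python
import random
from typing import Dict, List, Union

# Hypothetical value pool standing in for a Faker provider.
FIRST_NAMES = ["Ada", "Grace", "Linus", "Margaret", "Dennis", "Barbara"]


def make_users(count: int, seed: int) -> List[Dict[str, Union[int, str]]]:
    """Build `count` records from a private, seeded random generator."""
    rng = random.Random(seed)  # private RNG: no dependence on global random state
    return [
        {"id": i, "name": rng.choice(FIRST_NAMES), "age": rng.randint(18, 90)}
        for i in range(1, count + 1)
    ]


first_run = make_users(500, seed=42)
second_run = make_users(500, seed=42)
other_seed = make_users(500, seed=7)
```

Using a private `random.Random` instance rather than the module-level functions keeps the output stable even when other test code also draws random numbers.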

Intelligent Field Routing

Every field is classified into a tier. Only semantic fields hit the AI:

Tier	Fields	Generator	Cost
Structural	`id`, `user_id`, `order_id`	Counters + FK registry	Free
Standard	`name`, `email`, `phone`, `address`	Faker	Free
Computed	`@computed_field`	Pydantic	Free
Semantic	`bio`, `description`, `review`, `message`	LLM (batched)	API tokens

100 users with 2 semantic fields = 2 API calls, not 200.
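
The routing idea above can be sketched as a name-based classifier. This is an illustration under stated assumptions, not FixtureForge's actual implementation: `classify_field`, `count_api_calls`, and the two name sets are hypothetical, the name sets only contain the examples from the table, and the Computed tier is left out because it depends on pydantic decorators rather than field names.

```python
from typing import Dict, List

# Hypothetical name lists taken from the table's examples; the real classifier may differ.
STANDARD_FIELDS = {"name", "email", "phone", "address"}
SEMANTIC_FIELDS = {"bio", "description", "review", "message"}


def classify_field(field_name: str) -> str:
    """Return the tier a field name falls into under these assumptions."""
    if field_name == "id" or field_name.endswith("_id"):
        return "structural"
    if field_name in SEMANTIC_FIELDS:
        return "semantic"
    # Assumption: known standard names and anything unrecognized use the free tier.
    return "standard"


def count_api_calls(field_names: List[str], record_count: int) -> int:
    """One batched call per semantic field, independent of record_count."""
    if record_count <= 0:
        return 0
    return sum(1 for name in field_names if classify_field(name) == "semantic")


fields = ["id", "name", "email", "bio", "description"]
tiers: Dict[str, str] = {name: classify_field(name) for name in fields}
calls_for_100 = count_api_calls(fields, record_count=100)
calls_for_10k = count_api_calls(fields, record_count=10_000)
```

Under this model the call count depends only on how many semantic fields the schema has, which is why 100 users and 10,000 users cost the same number of calls.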

CI/CD — zero config changes between environments

# .github/workflows/test.yml
- name: Run tests
  env:
    FORGE_SEED: 42        # identical output every run
    # No AI key needed — FixtureForge auto-detects and falls back to Faker
  run: pytest

In dev, export any provider key and AI kicks in automatically:

export ANTHROPIC_API_KEY=sk-ant-...   # → Claude
export OPENAI_API_KEY=sk-...          # → GPT
export GOOGLE_API_KEY=...             # → Gemini
export GROQ_API_KEY=...               # → Groq (fast + cheap)
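
A rough sketch of how environment-based detection could work. The variable names come from the examples above; the function names `detect_provider` and `resolve_seed`, and the priority order when several keys are set, are assumptions made for this example.

```python
from typing import List, Mapping, Optional, Tuple

# Env var names are from the README; the lookup order is an assumption.
PROVIDER_ENV_VARS: List[Tuple[str, str]] = [
    ("ANTHROPIC_API_KEY", "anthropic"),
    ("OPENAI_API_KEY", "openai"),
    ("GOOGLE_API_KEY", "gemini"),
    ("GROQ_API_KEY", "groq"),
]


def detect_provider(env: Mapping[str, str]) -> Optional[str]:
    """Return the first provider whose key is set, or None for Faker-only mode."""
    for var_name, provider in PROVIDER_ENV_VARS:
        if env.get(var_name):
            return provider
    return None


def resolve_seed(env: Mapping[str, str]) -> Optional[int]:
    """Read FORGE_SEED if present; None means no fixed seed."""
    raw = env.get("FORGE_SEED")
    return int(raw) if raw is not None else None


ci_env = {"FORGE_SEED": "42"}                # CI: seed set, no provider key
dev_env = {"GROQ_API_KEY": "example-key"}    # dev: provider key, no seed
```

Passing the environment in as a mapping (for example `os.environ`) instead of reading it inside the function keeps the logic easy to test.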

Foreign Key Relationships — automatic

# Step 1: generate customers
customers = forge.create_batch(Customer, count=10)

# Step 2: Order.customer_id auto-resolves to a real customer.id
orders = forge.create_batch(Order, count=100)
# → every order.customer_id is valid. No manual wiring.
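
One way to picture the automatic wiring is a registry that remembers generated primary keys and hands them out to matching `*_id` fields. The sketch below is a stand-in, not the library's FK registry: `FKRegistry`, its methods, and the rule that `customer_id` points at a model registered as `"customer"` are all assumptions.

```python
import random
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Customer:
    id: int


@dataclass
class Order:
    id: int
    customer_id: int


class FKRegistry:
    """Hypothetical registry: remembers generated primary keys per model name."""

    def __init__(self, seed: int) -> None:
        self._ids: Dict[str, List[int]] = {}
        self._rng = random.Random(seed)  # seeded so FK choices are reproducible

    def register(self, model_name: str, ids: List[int]) -> None:
        self._ids.setdefault(model_name, []).extend(ids)

    def resolve(self, fk_field: str) -> int:
        # Assumption: "customer_id" refers to the model registered as "customer".
        model_name = fk_field[: -len("_id")]
        return self._rng.choice(self._ids[model_name])


registry = FKRegistry(seed=42)
customers = [Customer(id=i) for i in range(1, 11)]
registry.register("customer", [c.id for c in customers])
orders = [
    Order(id=i, customer_id=registry.resolve("customer_id"))
    for i in range(1, 101)
]
```

In this sketch, resolving a foreign key for a model that was never registered raises `KeyError`, which mirrors the ordering requirement in the two steps above: parents first, children second.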

DataSwarms — bulk generation, shared cache

Generate multiple models in parallel. The first model warms the AI cache;
every subsequent model reuses it, which makes each additional model roughly 90% cheaper.

results = forge.swarm(
    models=[User, Order, Product, Payment],
    counts=[10,   50,    100,     30],
    contexts=["SaaS users", "E-commerce orders", None, None],
)
# {
#   "User":    [...10 users...],
#   "Order":   [...50 orders...],
#   "Product": [...100 products...],
#   "Payment": [...30 payments...],
# }

5 models cost roughly as much as 1.5 models generated separately.
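
The arithmetic behind that estimate can be written out as a small cost model. It assumes the first model pays full price and every later model pays 10% of it, taking the ~90% figure at face value; `swarm_relative_cost` is a hypothetical helper, not part of the library.

```python
def swarm_relative_cost(model_count: int, cache_discount: float = 0.9) -> float:
    """Total cost in units of one uncached model.

    The first model is charged 1.0; each further model is charged
    (1 - cache_discount), following the ~90% saving quoted above.
    """
    if model_count <= 0:
        return 0.0
    return 1.0 + (model_count - 1) * (1.0 - cache_discount)


cost_for_five = swarm_relative_cost(5)    # 1 + 4 * 0.1, about 1.4
cost_separately = 5 * 1.0                 # five uncached models
```

Under these assumptions five models come to about 1.4 units, in line with the rounded 1.5 figure; real savings will depend on how much of each prompt is actually cacheable.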

ForgeMemory — fixtures that remember your domain

forge.memory.add_rule("financial", "Users under 18 get restricted account type")
forge.memory.add_rule("user", "Israeli phone numbers use format 05x-xxx-xxxx")
forge.memory.add_rule("orders", "Max 3 active loans per customer at any time")

# Rules inject into AI prompts automatically on every generation call
users = forge.create_batch(User, count=50, context="Israeli SaaS platform")

Rules persist across sessions. Update a rule and the next call respects it immediately.
Skeptical Memory validates stored rules against the live schema before every call.
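
As a mental model for persistent rules that get injected into prompts, consider the following sketch. It is not ForgeMemory: `RuleStore`, the JSON file layout, and the prompt format are assumptions for this example, and the schema validation done by Skeptical Memory is not modeled.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


class RuleStore:
    """Hypothetical JSON-backed rule store; not the library's ForgeMemory."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.rules: Dict[str, List[str]] = (
            json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
        )

    def add_rule(self, category: str, rule: str) -> None:
        self.rules.setdefault(category, []).append(rule)
        # Write through on every change so a later session sees the rule.
        self.path.write_text(json.dumps(self.rules, indent=2), encoding="utf-8")

    def build_prompt(self, context: str) -> str:
        lines = [f"Context: {context}", "Domain rules:"]
        for category, rules in sorted(self.rules.items()):
            lines.extend(f"- [{category}] {rule}" for rule in rules)
        return "\n".join(lines)


store_path = Path(tempfile.mkdtemp()) / "rules.json"
RuleStore(store_path).add_rule("user", "Israeli phone numbers use format 05x-xxx-xxxx")

# A fresh instance simulates a new session reading the persisted rules.
prompt = RuleStore(store_path).build_prompt("Israeli SaaS platform")
```

Writing the file on every `add_rule` call is the simplest way to get the "next call respects it immediately" behavior; a real store would likely add locking and validation.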

pytest plugin — one line per fixture

# conftest.py
from fixtureforge.pytest_plugin import forge_fixture, forge_swarm_fixture
from pydantic import BaseModel

class User(BaseModel):
    id: int
    name: str
    email: str

# Order is assumed to be another pydantic model defined or imported in this conftest.
forge_fixture(User, count=10, seed=42)           # → fixture: "users"
forge_swarm_fixture([User, Order], counts=[5, 20], seed=42)  # → "swarm_data"

# test_users.py
def test_signup(users):
    for user in users:
        assert "@" in user.email

def test_full_flow(swarm_data):
    users  = swarm_data["User"]
    orders = swarm_data["Order"]

The forge fixture is auto-available in every test with zero config.

Multi-provider support

# Be explicit
forge = Forge(provider_name="anthropic", model="claude-haiku-4-5-20251001")
forge = Forge(provider_name="openai",    model="gpt-4o-mini")
forge = Forge(provider_name="gemini",    model="gemini-2.0-flash")
forge = Forge(provider_name="groq",      model="llama-3.3-70b-versatile")
forge = Forge(provider_name="ollama",    model="llama3.2")   # local, zero cost
forge = Forge(use_ai=False)                                   # pure Faker

Large datasets — constant AI cost regardless of count

# Seed + Interpolation: seed_ratio=0.01 generates ~1,000 unique AI values, then tiles them to 100,000 records
dataset = forge.create_large(Order, count=100_000, seed_ratio=0.01)

# Streaming — one record at a time, never loads all into memory
for user in forge.create_stream(User, count=1_000_000, filename="users.json"):
    process(user)
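
The seed-and-tile idea can be sketched with a lazy generator. This shows plain tiling only, under the assumption that a small pool of expensive values is repeated to fill the dataset; whatever interpolation FixtureForge applies on top is not shown, and `tile_values`, `create_large_sketch`, and `make_seed` are hypothetical names.

```python
import itertools
from typing import Callable, Dict, Iterator, List, Union


def tile_values(seed_values: List[str], count: int) -> Iterator[str]:
    """Lazily repeat a small pool of values until `count` items are produced."""
    return itertools.islice(itertools.cycle(seed_values), count)


def create_large_sketch(
    count: int,
    seed_ratio: float,
    make_seed: Callable[[int], str],
) -> Iterator[Dict[str, Union[int, str]]]:
    """Yield `count` records one at a time from a pool of count * seed_ratio values."""
    seed_count = max(1, round(count * seed_ratio))
    # Stands in for the AI calls: only seed_count expensive values are created.
    seed_pool = [make_seed(i) for i in range(seed_count)]
    for record_id, text in enumerate(tile_values(seed_pool, count), start=1):
        yield {"id": record_id, "description": text}


stream = create_large_sketch(100_000, 0.01, lambda i: f"seed text {i}")
first_record = next(stream)  # records are produced on demand, not up front
```

Because the function is a generator, only the seed pool is held in memory; the expensive work scales with `count * seed_ratio` rather than with `count`.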

Export

from fixtureforge.core.exporter import DataExporter

users = forge.create_batch(User, count=100)
DataExporter.to_json(users, "users.json")
DataExporter.to_csv(users,  "users.csv")
DataExporter.to_sql(users,  "users.sql", table_name="users")

FixtureForge vs alternatives

Feature	FixtureForge	factory_boy	Faker	hypothesis
AI-powered context	✅	❌	❌	❌
Deterministic (seed=)	✅	✅	✅	✅
FK relationships	Auto	Manual	❌	❌
Batched AI calls	✅	❌	❌	❌
Coverage gap analysis	✅	❌	❌	Partial
Large datasets (100k+)	✅	Manual	Manual	❌
pytest plugin	✅	❌	❌	❌
Multi-LLM support	✅	❌	❌	❌
Permission gates	✅	❌	❌	❌
CI-safe (zero network)	✅	✅	✅	✅

FixtureForge is not a replacement for Faker — it uses Faker internally for standard fields.
It adds the layer between "I need realistic data" and "I need it to feel like production."

Installation

# Core (deterministic mode, no AI)
pip install fixtureforge

# With your preferred provider
pip install "fixtureforge[anthropic]"   # Claude
pip install "fixtureforge[openai]"      # GPT
pip install "fixtureforge[gemini]"      # Gemini
pip install "fixtureforge[all]"         # All providers

Requirements: Python 3.11+ · pydantic ≥ 2.5 · faker ≥ 22.0

Enterprise Edition

For teams with compliance requirements — GDPR, SOC2, HIPAA, multi-tenant SaaS:

Feature	Community	Enterprise
AI generation + Faker	✅	✅
pytest plugin	✅	✅
Deterministic seeding	✅	✅
Cryptographic Provenance Envelope	❌	✅
PII Airgap — fail-closed scanner	❌	✅
Contextual Tenant Enclaves	❌	✅
Cross-tenant FK violation detection	❌	✅
Presidio / custom scanner support	❌	✅

from fixtureforge.enterprise import ForgeEnterprise

forge = ForgeEnterprise(use_ai=False)
users = forge.create_batch(User, count=10)

users[0].model_dump()
# { "id": 1, "name": "...", "bio": "...",
#   "forge_metadata": {
#     "forge_id": "abc-123",
#     "provenance_hash": "sha256:029773ed...",  ← immutable audit stamp
#     "tenant_id": "tenant-acme",
#     "source": "faker",
#     ...
#   }
# }

with forge.isolate_tenant("tenant-acme"):
    acme_users = forge.create_batch(User, count=5)
    # FK references from tenant-acme can NEVER resolve to tenant-xyz records
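
For intuition about the `provenance_hash` field shown above, here is one common way to build a tamper-evident stamp: hash a canonical JSON encoding of the record. This is a generic technique and an assumption about the approach, not the Enterprise Edition's actual algorithm; `provenance_hash`, `with_metadata`, and `verify` are hypothetical helpers.

```python
import hashlib
import json
from typing import Any, Dict


def provenance_hash(record: Dict[str, Any]) -> str:
    """SHA-256 over canonical JSON, so equal records always hash equally."""
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def with_metadata(record: Dict[str, Any], tenant_id: str, source: str) -> Dict[str, Any]:
    """Return a copy of the record with a forge_metadata envelope attached."""
    stamped = dict(record)
    stamped["forge_metadata"] = {
        "provenance_hash": provenance_hash(record),
        "tenant_id": tenant_id,
        "source": source,
    }
    return stamped


def verify(stamped: Dict[str, Any]) -> bool:
    """Recompute the hash over the payload and compare it with the stored stamp."""
    payload = {k: v for k, v in stamped.items() if k != "forge_metadata"}
    return stamped["forge_metadata"]["provenance_hash"] == provenance_hash(payload)


stamped = with_metadata({"id": 1, "name": "Ada"}, "tenant-acme", "faker")
tampered = dict(stamped, name="Eve")  # payload changed, stamp left untouched
```

Sorting keys and fixing separators matters: without a canonical encoding, two equal records could serialize differently and produce different hashes.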

Access: yaniv2809@gmail.com

Project status

Component	Status
Core (`Forge`, `create_batch`)	✅ Stable
DataSwarms	✅ Stable
ForgeMemory	✅ Stable
pytest plugin	✅ v2.2.0
Anthropic / OpenAI / Gemini / Groq / Ollama	✅
`assert_semantic_match`	✅ v2.2.0
`SmartFailureAnalyzer`	✅ v2.2.0
Enterprise Edition	✅ (access by request)
ForgeDream (coverage analysis)	🔜 Feature-flagged
Async support	🔜 Planned

Links

Docs: https://yaniv2809.github.io/fixtureforge/
PyPI: https://pypi.org/project/fixtureforge/
Repository: https://github.com/Yaniv2809/fixtureforge
Issues: https://github.com/Yaniv2809/fixtureforge/issues
Discussion: #1

Contributing

Issues and PRs welcome.

git clone https://github.com/Yaniv2809/fixtureforge
cd fixtureforge
pip install -e ".[dev]"
PYTHONPATH=src python -m pytest tests/

License

If FixtureForge saved you time, give it a ⭐ — it helps others find it.
