pixelpipe

Browser-driven HTTP API for stock-photo AI generators.

A self-hosted bridge that turns subscription-based image generators into a programmable endpoint.

Quick Start · How It Works · Endpoints · Configuration · Build Journal

The Problem

You pay for an unlimited image generation subscription. The platform doesn't expose a public API. The only way to use it programmatically is to drive the web UI.

You try the obvious approach. Headless Playwright. The site detects automation in milliseconds. You see a paywall instead of generated images, even though you're logged in with an active subscription.

You add cookies. You try Selenium. You try fingerprint patches. The platform fingerprints the browser, the IP, the timing. Every layer leaks a different signal.

The Solution

pixelpipe wraps the entire stack — anti-detection browser, residential proxy, CAPTCHA solver, automated email verification — into a single HTTP API. POST a prompt, get back a URL.

The trick isn't bypassing one defense. It's bypassing all of them at once, every time, without breaking when one slips.

Patchright drives a real Chromium browser inside Xvfb so it never runs headless. CapSolver handles the reCAPTCHA on first login. Gmail IMAP reads the verification code that gets emailed. A residential proxy carries every request. The session persists in a Chromium profile mounted as a Docker volume.

Quick Start

git clone https://github.com/numarulunu/pixelpipe.git
cd pixelpipe
cp .env.example .env
# fill in .env with your credentials
docker compose up -d

Then call it:

curl -X POST http://localhost:9100/generate/image \
  -H "Content-Type: application/json" \
  -d '{"prompt":"A medieval castle at sunset","model":"Auto","aspect_ratio":"16:9"}'

{
  "status": "success",
  "images": [
    {"url": "https://cdn.example.com/...", "index": 0}
  ],
  "model_used": "Auto",
  "generation_time_seconds": 12.3
}
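
The same call from Python, as a minimal sketch using only the standard library. The `build_payload` and `generate_image` helpers, the `"Auto"` default, and the 180-second timeout are illustrative assumptions, not part of pixelpipe itself.

```python
import json
import urllib.request

API_BASE = "http://localhost:9100"  # port taken from the curl example above


def build_payload(prompt, model="Auto", aspect_ratio="1:1", count=1):
    """Assemble the JSON body for POST /generate/image."""
    return {
        "prompt": prompt,
        "model": model,
        "aspect_ratio": aspect_ratio,
        "count": count,
    }


def generate_image(prompt, timeout=180, **options):
    """POST the prompt and return the list of image URLs from the response."""
    body = json.dumps(build_payload(prompt, **options)).encode("utf-8")
    request = urllib.request.Request(
        f"{API_BASE}/generate/image",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Generous timeout (an assumption): a cold start includes the login flow.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        result = json.load(response)
    return [image["url"] for image in result["images"]]
```

Calling `generate_image("A medieval castle at sunset", aspect_ratio="16:9")` against a running container should return the URLs from the `images` array.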

How It Works

flowchart TD
    A["n8n / your script"] -->|"POST /generate/image"| B["pixelpipe API"]
    B --> C{"Session valid?"}
    C -->|Yes| G["Drive the page"]
    C -->|No| D["Auto-login flow"]

    D --> D1["Fill credentials"]
    D1 --> D2["CapSolver solves reCAPTCHA"]
    D2 --> D3["Gmail IMAP reads code"]
    D3 --> D4["Submit verification"]
    D4 --> G

    G --> G1["Select model"]
    G1 --> G2["Set aspect ratio"]
    G1 --> G3["Set image count"]
    G2 --> G4["Type prompt"]
    G3 --> G4
    G4 --> G5["Click Generate"]
    G5 --> H["Poll for new CDN URLs"]
    H --> I["Return JSON"]
    I --> A

    style A fill:#1a1a2e,stroke:#9333ea,color:#fff
    style B fill:#1a1a2e,stroke:#0ea5e9,color:#fff
    style D fill:#2d1a1a,stroke:#e94560,color:#fff
    style I fill:#1a2e1a,stroke:#22c55e,color:#fff
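
The "Poll for new CDN URLs" step in the flowchart above amounts to diffing the URLs on the page against the set seen before clicking Generate. A rough sketch of that idea follows. `get_urls` is a hypothetical callable that returns the image URLs currently on the page, and the timeout and interval values are guesses.

```python
import time


def poll_for_new_urls(get_urls, known, timeout=120.0, interval=1.0, sleep=time.sleep):
    """Call get_urls() until it returns URLs not in `known`, or time out.

    get_urls is a hypothetical callable returning the image URLs currently
    present on the page (for example by reading <img src> attributes).
    """
    known = set(known)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        fresh = [url for url in get_urls() if url not in known]
        if fresh:
            return fresh
        sleep(interval)
    raise TimeoutError("no new image URLs appeared before the timeout")
```

Snapshotting `known` before the click is what keeps images from earlier generations out of the result.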

The Five Layers

Layer	What it does	Why it's there
Patchright	Drop-in Playwright fork that patches CDP automation leaks	Vanilla Playwright leaks `navigator.webdriver = true` and is detected instantly
Xvfb + headless=False	Virtual display so Chromium runs in "headed" mode inside Docker	Headless mode has its own fingerprint that even Patchright can't hide
Residential proxy	All traffic routes through a residential IP	The target site allows page loads from datacenter IPs but blocks generation. Static residential, ~$2/month
CapSolver	Solves the reCAPTCHA v2 on the login page	Automated logins always trigger a captcha. Charges ~$0.003 per solve
Gmail IMAP	Reads the verification code emailed on first login from a new browser	The site emails a 6-digit code on every new browser fingerprint. Bot reads it from your inbox in seconds

Drop any one of these and the bot fails in a different way. All five together make the round trip invisible.
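
To make the first three layers concrete, here is a hedged sketch of how a headed, proxied, persistent Chromium could be launched through Patchright. `parse_proxy` and `launch_browser` are illustrative names, and the `http://` proxy scheme is an assumption; the real code lives in app/browser.py and may differ.

```python
from urllib.parse import urlsplit


def parse_proxy(value):
    """Turn 'user:pass@host:port' into a Playwright-style proxy dict."""
    parts = urlsplit("//" + value)  # leading // makes urlsplit read it as a netloc
    return {
        "server": f"http://{parts.hostname}:{parts.port}",  # scheme is an assumption
        "username": parts.username or "",
        "password": parts.password or "",
    }


def launch_browser(profile_dir, proxy_value):
    """Start headed Chromium with a persistent profile behind the proxy."""
    # Imported here so parse_proxy stays usable without Patchright installed.
    from patchright.sync_api import sync_playwright

    playwright = sync_playwright().start()
    context = playwright.chromium.launch_persistent_context(
        user_data_dir=profile_dir,
        headless=False,  # needs a display, e.g. DISPLAY=:99 provided by Xvfb
        proxy=parse_proxy(proxy_value),
    )
    return playwright, context
```

Pointing `profile_dir` at a path on the mounted volume is what lets the session survive container restarts.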

The Problems This Solves (in the order I hit them)

flowchart LR
    P1["1. Site detects Playwright"] --> P2["2. Headless leaks fingerprint"]
    P2 --> P3["3. Datacenter IP blocked"]
    P3 --> P4["4. Login triggers reCAPTCHA"]
    P4 --> P5["5. New browser triggers email verification"]
    P5 --> P6["6. Session expires every 1-2 weeks"]

    style P1 fill:#2d1a1a,stroke:#e94560,color:#fff
    style P6 fill:#1a2e1a,stroke:#22c55e,color:#fff

Symptom	Root cause	Fix
"Sign up for free" modal even when logged in	Site fingerprints Playwright via CDP commands	Switch to Patchright (Playwright fork that patches CDP leaks)
403 on page load from server	Datacenter IP detection	Route through residential proxy
Login form shows but generation triggers paywall	Headless browser fingerprint	Run Chromium with headless=False inside Xvfb virtual display
Login submit just reloads the page	Invisible reCAPTCHA challenge blocks the submit	CapSolver returns a token, the bot injects it and calls `form.submit()`
Login redirects to verify-account page	Email-based MFA on new browser	Gmail IMAP reads the latest 6-digit code
Bot works once, breaks on next request	Profile not persisted	Persistent Chromium profile mounted as a Docker volume

The full incident log is in BUILD_JOURNAL.txt — every dead end, every wrong assumption, every fix.

Architecture

flowchart TB
    subgraph host ["Docker host"]
        subgraph container ["pixelpipe container"]
            X["Xvfb :99\nVirtual display"]
            C["Chromium\n(via Patchright)"]
            F["FastAPI\nport 9100"]
            X -.provides display.-> C
            F -->|drives| C
        end

        V[("/data volume")]
        C -.profile.-> V
    end

    P["Residential proxy"]
    CS["CapSolver API"]
    G["Gmail IMAP"]
    T["Target site"]

    C -->|all traffic| P
    P --> T
    F -->|reCAPTCHA tokens| CS
    F -->|verification codes| G

    style F fill:#1a1a2e,stroke:#0ea5e9,color:#fff
    style C fill:#1a1a2e,stroke:#9333ea,color:#fff
    style V fill:#1a2e1a,stroke:#22c55e,color:#fff

Endpoints

`POST /generate/image`

Generate an image. Request body:

Field	Required	Default	Description
`prompt`	yes	—	Text prompt
`model`	yes	—	Exact model name as shown in the target site UI
`aspect_ratio`	no	`1:1`	`1:1`, `16:9`, `9:16`, `4:3`, etc.
`count`	no	`1`	1 to 4
`reference_url`	no	—	URL to a reference image (downloaded and uploaded automatically)

Response:

{
  "status": "success",
  "images": [
    {"url": "https://cdn.example.com/...", "index": 0}
  ],
  "model_used": "Auto",
  "generation_time_seconds": 12.3
}
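
The request fields and defaults listed above can be expressed as a small schema. The project uses Pydantic for this (see app/models.py); the dataclass below is only a standard-library approximation, and the empty-string checks are an assumption beyond what the table states.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ImageRequest:
    prompt: str
    model: str
    aspect_ratio: str = "1:1"
    count: int = 1
    reference_url: Optional[str] = None

    def __post_init__(self):
        # Required fields must carry real content (assumed rule).
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        if not self.model.strip():
            raise ValueError("model must not be empty")
        # The table allows 1 to 4 images per request.
        if not 1 <= self.count <= 4:
            raise ValueError("count must be between 1 and 4")
```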

`POST /generate/video`

Takes the same request body as `POST /generate/image`. The response carries `video.url` instead of `images[]`.

`GET /health`

{"status": "ok", "browser_alive": true, "logged_in": true}
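
A caller can use the three fields above to decide what to expect from the next request. In this sketch the labels `down`, `needs_login`, and `ready` are made up for illustration. A logged-out session is not treated as an error, since the bot re-logs in on the next generation request.

```python
import json
import urllib.request


def classify_health(health):
    """Map a /health payload to 'down', 'needs_login', or 'ready'."""
    if health.get("status") != "ok" or not health.get("browser_alive"):
        return "down"
    return "ready" if health.get("logged_in") else "needs_login"


def check_health(base_url="http://localhost:9100"):
    """Fetch GET /health and classify it; an unreachable API counts as 'down'."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=5) as response:
            return classify_health(json.load(response))
    except (OSError, ValueError):  # connection errors or a non-JSON body
        return "down"
```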

`POST /verify/{code}`

Manual verification code entry. Only needed if Gmail IMAP isn't configured.

`GET /debug/screenshot`

Returns a base64 PNG of the current page state. Useful when something breaks and you need to see what the bot sees.
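
Once you have the base64 string out of the response, turning it into a file is a few lines. The exact response shape is not documented here, so this sketch takes the encoded string directly. `save_screenshot` is an illustrative helper, and the data-URI handling is a precaution rather than something the endpoint is known to emit.

```python
import base64
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first eight bytes of every PNG file


def save_screenshot(encoded, path):
    """Decode a base64 PNG string, write it to `path`, return the byte count."""
    # Tolerate a data-URI prefix such as "data:image/png;base64,".
    if encoded.startswith("data:"):
        encoded = encoded.split(",", 1)[1]
    data = base64.b64decode(encoded)
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data is not a PNG")
    Path(path).write_bytes(data)
    return len(data)
```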

`GET /debug/controls`

Inspects the live DOM for data-cy attributes around the generation form. Use this when the target site updates their UI and selectors break.

Configuration

All configuration is via environment variables. Copy .env.example to .env and fill in:

Variable	Required	What it does
`FREEPIK_EMAIL`	yes	Account email
`FREEPIK_PASSWORD`	yes	Account password
`CAPSOLVER_API_KEY`	recommended	Solves the reCAPTCHA on auto-login. Without it the bot can only use existing cookies
`PROXY_SERVER`	recommended	Format: `user:pass@host:port`. Required if the container runs from a datacenter IP (any cloud VPS)
`GMAIL_USER`	recommended	Gmail address that receives verification emails
`GMAIL_APP_PASSWORD`	recommended	Gmail app password (16 chars, no spaces) — not your account password
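
For reference, here is one way the variables in the table could be loaded and checked. The project itself uses Pydantic Settings (app/config.py); this standard-library version is a stand-in, and stripping spaces from the app password is a convenience I have assumed, not documented behavior.

```python
import os
from dataclasses import dataclass
from typing import Optional

REQUIRED = ("FREEPIK_EMAIL", "FREEPIK_PASSWORD")


@dataclass(frozen=True)
class Settings:
    freepik_email: str
    freepik_password: str
    capsolver_api_key: Optional[str] = None
    proxy_server: Optional[str] = None
    gmail_user: Optional[str] = None
    gmail_app_password: Optional[str] = None


def load_settings(env=os.environ):
    """Build Settings from a mapping of environment variables."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing required variables: " + ", ".join(missing))
    # Google displays app passwords in groups of four; drop the spaces.
    app_password = (env.get("GMAIL_APP_PASSWORD") or "").replace(" ", "")
    return Settings(
        freepik_email=env["FREEPIK_EMAIL"],
        freepik_password=env["FREEPIK_PASSWORD"],
        capsolver_api_key=env.get("CAPSOLVER_API_KEY") or None,
        proxy_server=env.get("PROXY_SERVER") or None,
        gmail_user=env.get("GMAIL_USER") or None,
        gmail_app_password=app_password or None,
    )
```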

Where to get the credentials

Service	Where	Cost
CapSolver	capsolver.com — sign up, top up $5	~$0.003 per login
Residential proxy	proxycheap.com, iproyal.com, etc. — get a static residential IP	~$2/month
Gmail App Password	myaccount.google.com/apppasswords — requires 2FA enabled	Free

How the Auto-Login Works

sequenceDiagram
    participant U as Your script
    participant B as pixelpipe
    participant T as Target site
    participant CS as CapSolver
    participant GM as Gmail

    U->>B: POST /generate/image
    B->>T: GET /pikaso (check session)
    T-->>B: Sign in button visible
    Note over B: Not logged in, start auto-login

    B->>T: GET /login
    B->>T: Fill email + password
    B->>T: Click submit
    T-->>B: reCAPTCHA challenge

    B->>CS: POST createTask (sitekey + URL)
    CS-->>B: taskId
    loop poll every 2s
        B->>CS: getTaskResult
    end
    CS-->>B: gRecaptchaResponse token

    B->>T: Inject token + submit form
    T-->>B: Redirect to verify-account

    loop poll every 10s
        B->>GM: IMAP search "authentication code"
    end
    GM-->>B: 6-digit code
    B->>T: Fill code + click verify
    T-->>B: Redirect to dashboard

    Note over B: Session saved to profile volume
    B->>T: Drive the generation flow
    T-->>B: CDN URLs
    B-->>U: JSON response
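
The CapSolver leg of the diagram above (createTask, then poll getTaskResult every 2 seconds) could look roughly like this. The endpoint URLs and the `ReCaptchaV2TaskProxyLess` task type reflect my understanding of CapSolver's public API and should be checked against their documentation. `post_json` and `solve_recaptcha` are illustrative names, not functions from app/auth.py.

```python
import json
import time
import urllib.request

CAPSOLVER_API = "https://api.capsolver.com"


def post_json(url, payload):
    """POST a JSON body and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def solve_recaptcha(api_key, site_key, page_url,
                    post=post_json, sleep=time.sleep, max_polls=60):
    """Create a solve task, poll until it is ready, return the token."""
    created = post(f"{CAPSOLVER_API}/createTask", {
        "clientKey": api_key,
        "task": {
            "type": "ReCaptchaV2TaskProxyLess",  # assumed task type
            "websiteURL": page_url,
            "websiteKey": site_key,
        },
    })
    if created.get("errorId"):
        raise RuntimeError(f"createTask failed: {created.get('errorDescription')}")
    for _ in range(max_polls):
        sleep(2)  # the diagram polls every 2s
        result = post(f"{CAPSOLVER_API}/getTaskResult",
                      {"clientKey": api_key, "taskId": created["taskId"]})
        if result.get("errorId"):
            raise RuntimeError(f"getTaskResult failed: {result.get('errorDescription')}")
        if result.get("status") == "ready":
            return result["solution"]["gRecaptchaResponse"]
    raise TimeoutError("CapSolver did not return a token in time")
```

The `post` and `sleep` parameters exist only so the polling logic can be exercised without network access.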

The first request after a fresh deploy takes ~60 seconds (login + captcha + verify). Every request after that takes ~15-30 seconds because the session is reused from the Chromium profile.
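
The Gmail step of the login flow can be sketched with the standard library's `imaplib`. I have assumed the search matches on the subject line and that the code is the first standalone 6-digit number in the message; the actual matching rules in app/auth.py may be stricter.

```python
import email
import imaplib
import re
from email import policy

CODE_PATTERN = re.compile(r"(?<!\d)(\d{6})(?!\d)")  # exactly six digits


def extract_code(raw_message):
    """Return the first standalone 6-digit number in a raw email, or None."""
    message = email.message_from_bytes(raw_message, policy=policy.default)
    body = message.get_body(preferencelist=("plain", "html"))
    text = body.get_content() if body else ""
    match = CODE_PATTERN.search(f"{message['subject'] or ''}\n{text}")
    return match.group(1) if match else None


def fetch_latest_code(user, app_password, subject="authentication code"):
    """Log in to Gmail over IMAP and read the code from the newest match."""
    with imaplib.IMAP4_SSL("imap.gmail.com") as mailbox:
        mailbox.login(user, app_password)
        mailbox.select("INBOX")
        _, data = mailbox.search(None, "SUBJECT", f'"{subject}"')
        message_ids = data[0].split()
        if not message_ids:
            return None  # nothing yet; the caller retries (every 10s in the diagram)
        _, fetched = mailbox.fetch(message_ids[-1], "(RFC822)")
        return extract_code(fetched[0][1])
```

A production version would also want to ignore messages older than the current login attempt, so a stale code is never submitted.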

When the Session Expires

The target site invalidates the session every 1-2 weeks. Here's what happens:

flowchart LR
    A["Request comes in"] --> B{"Session\nvalid?"}
    B -->|"Yes (95% of the time)"| C["Generate immediately"]
    B -->|"No"| D["Auto re-login"]
    D --> E["CapSolver solves captcha"]
    E --> F["Gmail reads verify code"]
    F --> C
    C --> G["Return URLs"]

    style D fill:#2d1a1a,stroke:#e94560,color:#fff
    style C fill:#1a2e1a,stroke:#22c55e,color:#fff

Zero human intervention. The bot heals itself.
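
Stripped of the browser details, the self-healing behavior is a check, an optional login, and then the real work. In the sketch below all three callables are hypothetical placeholders for whatever app/auth.py and app/generator.py actually expose.

```python
def generate_with_relogin(is_logged_in, login, generate):
    """Run generate(), logging in first if the session has expired.

    is_logged_in, login and generate are hypothetical callables standing in
    for the session check, the auto-login flow and the generation flow.
    """
    if not is_logged_in():
        login()
        if not is_logged_in():
            raise RuntimeError("auto-login did not produce a valid session")
    return generate()
```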

Costs

Item	Monthly cost
Subscription to the target service	whatever you already pay
Residential proxy (static)	~$2
CapSolver credit	~$0.01 (a few re-logins)
Total extra	~$2/month

For comparison, the equivalent volume on the official API of similar services would cost $20-100/month depending on usage. This is the bridge that makes a flat-rate UI subscription behave like an unlimited API.
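
The "~$2/month" total follows directly from the numbers in the table. A quick check, assuming 2 to 4 re-logins per month (sessions expire every 1-2 weeks):

```python
PROXY_MONTHLY = 2.00         # static residential proxy, from the table
CAPSOLVER_PER_SOLVE = 0.003  # per reCAPTCHA solve


def monthly_extra(relogins):
    """Extra monthly cost in dollars for a given number of re-logins."""
    return PROXY_MONTHLY + relogins * CAPSOLVER_PER_SOLVE


low, high = monthly_extra(2), monthly_extra(4)
print(f"${low:.3f} to ${high:.3f} per month")  # $2.006 to $2.012 per month
```

The proxy dominates; the captcha spend only matters if logins start failing and retrying in a loop.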

Project Structure

pixelpipe/
  app/
    main.py          FastAPI app — endpoints, lifecycle, request queue
    browser.py       Patchright browser lifecycle (launch, restart, close)
    auth.py          Login flow + CapSolver + Gmail IMAP integration
    generator.py     Image and video generation Playwright flows
    config.py        Pydantic Settings (env var loading)
    models.py        Request/response schemas
  scripts/
    extract_cookies.py    One-time manual cookie extraction (fallback)
    refresh_session.py    Local login + push cookies to remote server
  tests/             pytest unit tests for config and models
  Dockerfile         Python 3.12 + Patchright Chromium + Xvfb + dbus
  docker-compose.yml Single service, mounts /data volume
  BUILD_JOURNAL.txt  Complete dev journal — every failure and fix
  .env.example       All env vars with explanations

Maintenance

When the target site updates their UI

Selectors in app/generator.py and app/auth.py may break. Use the /debug/controls endpoint to find the new selectors:

curl http://localhost:9100/debug/controls | jq

It returns every element with a data-cy attribute on the current page. Find the new attribute name for the broken control, then update the matching locator in app/generator.py or app/auth.py.

When auto-login fails

docker logs pixelpipe 2>&1 | tail -50

Look for CapSolver, verify-account, or Gmail errors. Each layer logs what it tried and why it failed.

When you need to start fresh

docker compose down
rm -rf data/patchright-profile data/cookies.json
mkdir -p data/patchright-profile
docker compose up -d

The bot will auto-login on the next request.

FAQ

Will this get me banned?

Possibly. Browser automation against a paid service is technically against most ToS. The bot generates at human-realistic intervals (one request at a time, no parallelism) so it's less detectable than a parallel scraper, but the risk is non-zero. Don't use this for anything critical without a backup.
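
"One request at a time" is easy to enforce in an async server with a single lock around the browser. This is only a sketch of the idea: the `SerialRunner` class is hypothetical, and the request queue in app/main.py may be implemented differently.

```python
import asyncio


class SerialRunner:
    """Run submitted coroutines strictly one after another."""

    def __init__(self):
        # Create this inside the running event loop (e.g. at app startup).
        self._lock = asyncio.Lock()

    async def run(self, job):
        """Await job() while holding the lock, so jobs never overlap."""
        async with self._lock:
            return await job()
```

Requests that arrive while a generation is in flight simply wait on the lock, which keeps the single browser tab from being driven by two flows at once.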

Why not use the official API?

Most paid image-generation platforms either don't offer an API at all, or charge per-request fees that defeat the purpose of an unlimited subscription. This bridge gives you the cost profile of a subscription with the access pattern of an API.

Can I run this on a Raspberry Pi / cheap VPS?

Yes, but you need at least 2GB RAM (Chromium is hungry) and a residential proxy. Datacenter IPs get blocked. A €5/month Hetzner VPS works fine if you add a residential proxy on top.

What if CapSolver is down?

The bot will fail at the login step and return a 401. POST /verify/{code} does not help here, since it only covers the emailed verification code. Fall back to scripts/extract_cookies.py, which lets you log in manually in a real browser, or scripts/refresh_session.py, which logs in locally and pushes the cookies to the server.

Can I run multiple accounts?

Not in a single instance. The bot is single-tab, single-account by design. Run multiple containers on different ports for multiple accounts.

Why Patchright instead of regular Playwright?

Vanilla Playwright leaks navigator.webdriver = true and several CDP signals. Patchright is a drop-in fork that patches all of them. Same API, same code, just a different import. Without it, the target site shows a paywall on every generation attempt.

The trick isn't bypassing one defense. It's bypassing all of them at once.

MIT License · Build Journal
