pixelpipe

Browser-driven HTTP API for stock-photo AI generators.

A self-hosted bridge that turns subscription-based image generators into a programmable endpoint.


Python 3.12+ · Patchright · FastAPI · Docker · License MIT


Quick Start · How It Works · Endpoints · Configuration · Build Journal




The Problem

You pay for an unlimited image generation subscription. The platform doesn't expose a public API. The only way to use it programmatically is to drive the web UI.

You try the obvious approach. Headless Playwright. The site detects automation in milliseconds. You see a paywall instead of generated images, even though you're logged in with an active subscription.

You add cookies. You try Selenium. You try fingerprint patches. The platform fingerprints the browser, the IP, the timing. Every layer leaks a different signal.


The Solution

pixelpipe wraps the entire stack — anti-detection browser, residential proxy, CAPTCHA solver, automated email verification — into a single HTTP API. POST a prompt, get back a URL.

The trick isn't bypassing one defense. It's bypassing all of them at once, every time, without breaking when one slips.

Patchright drives a real Chromium browser inside Xvfb so it never runs headless. CapSolver handles the reCAPTCHA on first login. Gmail IMAP reads the verification code that gets emailed. A residential proxy carries every request. The session persists in a Chromium profile mounted as a Docker volume.




Quick Start

git clone https://github.com/numarulunu/pixelpipe.git
cd pixelpipe
cp .env.example .env
# fill in .env with your credentials
docker compose up -d

Then call it:

curl -X POST http://localhost:9100/generate/image \
  -H "Content-Type: application/json" \
  -d '{"prompt":"A medieval castle at sunset","model":"Auto","aspect_ratio":"16:9"}'

Response:

{
  "status": "success",
  "images": [
    {"url": "https://cdn.example.com/...", "index": 0}
  ],
  "model_used": "Auto",
  "generation_time_seconds": 12.3
}
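
The same call from Python, as a standard-library sketch. The function names and the 180-second timeout are assumptions; the URL, fields, and response keys come from the example above.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9100"  # port from the Quick Start; adjust if you remap it


def build_generate_request(
    prompt: str, model: str = "Auto", aspect_ratio: str = "16:9"
) -> urllib.request.Request:
    """Build the POST /generate/image request without sending it."""
    body = json.dumps(
        {"prompt": prompt, "model": model, "aspect_ratio": aspect_ratio}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/generate/image",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_image(prompt: str, timeout: float = 180.0) -> list[str]:
    """Send the request and return the image URLs from the JSON response."""
    request = build_generate_request(prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        payload = json.load(resp)
    if payload.get("status") != "success":
        raise RuntimeError(f"generation failed: {payload}")
    return [image["url"] for image in payload["images"]]
```

The generous timeout is there because the first request after a fresh deploy also has to log in.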



How It Works


flowchart TD
    A["n8n / your script"] -->|"POST /generate/image"| B["pixelpipe API"]
    B --> C{"Session valid?"}
    C -->|Yes| G["Drive the page"]
    C -->|No| D["Auto-login flow"]

    D --> D1["Fill credentials"]
    D1 --> D2["CapSolver solves reCAPTCHA"]
    D2 --> D3["Gmail IMAP reads code"]
    D3 --> D4["Submit verification"]
    D4 --> G

    G --> G1["Select model"]
    G1 --> G2["Set aspect ratio"]
    G1 --> G3["Set image count"]
    G2 --> G4["Type prompt"]
    G3 --> G4
    G4 --> G5["Click Generate"]
    G5 --> H["Poll for new CDN URLs"]
    H --> I["Return JSON"]
    I --> A

    style A fill:#1a1a2e,stroke:#9333ea,color:#fff
    style B fill:#1a1a2e,stroke:#0ea5e9,color:#fff
    style D fill:#2d1a1a,stroke:#e94560,color:#fff
    style I fill:#1a2e1a,stroke:#22c55e,color:#fff
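
The "Poll for new CDN URLs" step can be sketched as a snapshot-and-diff loop: record the image URLs on the page before clicking Generate, then poll until URLs that were not in the snapshot appear. This is a guess at the approach, not the code in app/generator.py; `list_urls` stands in for whatever reads URLs from the page, and the timeout and interval are assumed values.

```python
import time
from typing import Callable, Iterable


def wait_for_new_urls(
    list_urls: Callable[[], Iterable[str]],
    before: set[str],
    expected: int = 1,
    timeout: float = 120.0,
    interval: float = 2.0,
) -> list[str]:
    """Poll list_urls() until at least `expected` URLs not in `before` show up."""
    deadline = time.monotonic() + timeout
    while True:
        # dict.fromkeys de-duplicates while keeping the order the page reported.
        fresh = [url for url in dict.fromkeys(list_urls()) if url not in before]
        if len(fresh) >= expected:
            return fresh
        if time.monotonic() >= deadline:
            raise TimeoutError(f"only {len(fresh)} of {expected} new URLs appeared")
        time.sleep(interval)
```

Diffing against a snapshot avoids returning images left on the page by an earlier generation.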

The Five Layers

| Layer | What it does | Why it's there |
| --- | --- | --- |
| Patchright | Drop-in Playwright fork that patches CDP automation leaks | Vanilla Playwright leaks navigator.webdriver = true and is detected instantly |
| Xvfb + headless=False | Virtual display so Chromium runs in "headed" mode inside Docker | Headless mode has its own fingerprint that even Patchright can't hide |
| Residential proxy | All traffic routes through a residential IP | The target site allows page loads from datacenter IPs but blocks generation. Static residential, ~$2/month |
| CapSolver | Solves the reCAPTCHA v2 on the login page | Automated logins always trigger a captcha. Charges ~$0.003 per solve |
| Gmail IMAP | Reads the verification code emailed on first login from a new browser | The site emails a 6-digit code on every new browser fingerprint. Bot reads it from your inbox in seconds |

Drop any one of these and the bot fails in a different way. All five together make the round trip invisible.




The Problems This Solves (in the order I hit them)

flowchart LR
    P1["1. Site detects Playwright"] --> P2["2. Headless leaks fingerprint"]
    P2 --> P3["3. Datacenter IP blocked"]
    P3 --> P4["4. Login triggers reCAPTCHA"]
    P4 --> P5["5. New browser triggers email verification"]
    P5 --> P6["6. Session expires every 1-2 weeks"]

    style P1 fill:#2d1a1a,stroke:#e94560,color:#fff
    style P6 fill:#1a2e1a,stroke:#22c55e,color:#fff

| Symptom | Root cause | Fix |
| --- | --- | --- |
| "Sign up for free" modal even when logged in | Site fingerprints Playwright via CDP commands | Switch to Patchright (Playwright fork that patches CDP leaks) |
| 403 on page load from server | Datacenter IP detection | Route through residential proxy |
| Login form shows but generation triggers paywall | Headless browser fingerprint | Run Chromium with headless=False inside Xvfb virtual display |
| Login submit just reloads the page | reCAPTCHA invisible challenge | CapSolver posts the token, bot calls form.submit() |
| Login redirects to verify-account page | Email-based MFA on new browser | Gmail IMAP reads the latest 6-digit code |
| Bot works once, breaks on next request | Profile not persisted | Persistent Chromium profile mounted as a Docker volume |

The full incident log is in BUILD_JOURNAL.txt — every dead end, every wrong assumption, every fix.




Architecture


flowchart TB
    subgraph host ["Docker host"]
        subgraph container ["pixelpipe container"]
            X["Xvfb :99\nVirtual display"]
            C["Chromium\n(via Patchright)"]
            F["FastAPI\nport 9100"]
            X -.provides display.-> C
            F -->|drives| C
        end

        V[("/data volume")]
        C -.profile.-> V
    end

    P["Residential proxy"]
    CS["CapSolver API"]
    G["Gmail IMAP"]
    T["Target site"]

    C -->|all traffic| P
    P --> T
    F -->|reCAPTCHA tokens| CS
    F -->|verification codes| G

    style F fill:#1a1a2e,stroke:#0ea5e9,color:#fff
    style C fill:#1a1a2e,stroke:#9333ea,color:#fff
    style V fill:#1a2e1a,stroke:#22c55e,color:#fff



Endpoints

POST /generate/image

Generate an image. Request body:

| Field | Required | Default | Description |
| --- | --- | --- | --- |
| prompt | yes | | Text prompt |
| model | yes | | Exact model name as shown in the target site UI |
| aspect_ratio | no | 1:1 | 1:1, 16:9, 9:16, 4:3, etc. |
| count | no | 1 | 1 to 4 |
| reference_url | no | | URL to a reference image (downloaded and uploaded automatically) |

Response:

{
  "status": "success",
  "images": [
    {"url": "https://cdn.example.com/...", "index": 0}
  ],
  "model_used": "Auto",
  "generation_time_seconds": 12.3
}

POST /generate/video

Same request and response shape as POST /generate/image, except the response carries video.url instead of the images[] array.


GET /health

{"status": "ok", "browser_alive": true, "logged_in": true}
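
A small sketch of how a caller might interpret that payload. The status labels and function names are made up for illustration; only the three JSON keys come from the response above.

```python
import json
import urllib.request


def health_summary(health: dict) -> str:
    """Collapse the /health payload into one human-readable state."""
    if health.get("status") != "ok":
        return "service unhealthy"
    if not health.get("browser_alive"):
        return "browser down"
    if not health.get("logged_in"):
        return "logged out"
    return "ready"


def check_health(base_url: str = "http://localhost:9100") -> str:
    """Fetch /health and summarize it; 'unreachable' if the API does not answer."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=10) as resp:
            return health_summary(json.load(resp))
    except OSError:  # urllib.error.URLError is a subclass of OSError
        return "unreachable"
```

"logged out" is not necessarily an error: the bot logs in automatically on the next generation request.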

POST /verify/{code}

Manual verification code entry. Only needed if Gmail IMAP isn't configured.


GET /debug/screenshot

Returns a base64 PNG of the current page state. Useful when something breaks and you need to see what the bot sees.


GET /debug/controls

Inspects the live DOM for data-cy attributes around the generation form. Use this when the target site updates their UI and selectors break.




Configuration

All configuration is via environment variables. Copy .env.example to .env and fill in:

| Variable | Required | What it does |
| --- | --- | --- |
| FREEPIK_EMAIL | yes | Account email |
| FREEPIK_PASSWORD | yes | Account password |
| CAPSOLVER_API_KEY | recommended | Solves the reCAPTCHA on auto-login. Without it the bot can only use existing cookies |
| PROXY_SERVER | recommended | Format: user:pass@host:port. Required if running in a datacenter (any cloud VPS) |
| GMAIL_USER | recommended | Gmail address that receives verification emails |
| GMAIL_APP_PASSWORD | recommended | Gmail app password (16 chars, no spaces), not your account password |

Where to get the credentials

| Service | Where | Cost |
| --- | --- | --- |
| CapSolver | capsolver.com: sign up, top up $5 | ~$0.003 per login |
| Residential proxy | proxycheap.com, iproyal.com, etc.: get a static residential IP | ~$2/month |
| Gmail App Password | myaccount.google.com/apppasswords (requires 2FA enabled) | Free |


How the Auto-Login Works

sequenceDiagram
    participant U as Your script
    participant B as pixelpipe
    participant T as Target site
    participant CS as CapSolver
    participant GM as Gmail

    U->>B: POST /generate/image
    B->>T: GET /pikaso (check session)
    T-->>B: Sign in button visible
    Note over B: Not logged in, start auto-login

    B->>T: GET /login
    B->>T: Fill email + password
    B->>T: Click submit
    T-->>B: reCAPTCHA challenge

    B->>CS: POST createTask (sitekey + URL)
    CS-->>B: taskId
    loop poll every 2s
        B->>CS: getTaskResult
    end
    CS-->>B: gRecaptchaResponse token

    B->>T: Inject token + submit form
    T-->>B: Redirect to verify-account

    loop poll every 10s
        B->>GM: IMAP search "authentication code"
    end
    GM-->>B: 6-digit code
    B->>T: Fill code + click verify
    T-->>B: Redirect to dashboard

    Note over B: Session saved to profile volume
    B->>T: Drive the generation flow
    T-->>B: CDN URLs
    B-->>U: JSON response

The first request after a fresh deploy takes ~60 seconds (login + captcha + verify). Every request after that takes ~15-30 seconds because the session is reused from the Chromium profile.
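
The createTask / getTaskResult loop from the diagram can be sketched as below. The endpoint paths and JSON field names follow CapSolver's public API as I understand it; the task type `ReCaptchaV2TaskProxyLess` is a guess at what fits this site, and `post`, `interval`, and `max_polls` are illustrative parameters. Check CapSolver's documentation before relying on any of it.

```python
import json
import time
import urllib.request
from typing import Callable

API = "https://api.capsolver.com"


def post_json(url: str, payload: dict) -> dict:
    """POST a JSON body and decode the JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.load(resp)


def solve_recaptcha(
    api_key: str,
    site_key: str,
    page_url: str,
    post: Callable[[str, dict], dict] = post_json,
    interval: float = 2.0,
    max_polls: int = 60,
) -> str:
    """Create a solve task, then poll until the token is ready."""
    task = {"type": "ReCaptchaV2TaskProxyLess", "websiteURL": page_url, "websiteKey": site_key}
    created = post(f"{API}/createTask", {"clientKey": api_key, "task": task})
    if created.get("errorId"):
        raise RuntimeError(f"createTask failed: {created.get('errorDescription')}")
    for _ in range(max_polls):
        time.sleep(interval)  # the diagram polls every 2 seconds
        result = post(f"{API}/getTaskResult", {"clientKey": api_key, "taskId": created["taskId"]})
        if result.get("errorId"):
            raise RuntimeError(f"getTaskResult failed: {result.get('errorDescription')}")
        if result.get("status") == "ready":
            return result["solution"]["gRecaptchaResponse"]
    raise TimeoutError("captcha was not solved in time")
```

Passing `post` as a parameter keeps the polling logic testable without network access.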




When the Session Expires

The target site invalidates the session every 1-2 weeks. Here's what happens:

flowchart LR
    A["Request comes in"] --> B{"Session\nvalid?"}
    B -->|"Yes (95% of the time)"| C["Generate immediately"]
    B -->|"No"| D["Auto re-login"]
    D --> E["CapSolver solves captcha"]
    E --> F["Gmail reads verify code"]
    F --> C
    C --> G["Return URLs"]

    style D fill:#2d1a1a,stroke:#e94560,color:#fff
    style C fill:#1a2e1a,stroke:#22c55e,color:#fff

No human intervention is needed as long as CapSolver and Gmail IMAP are configured. The bot heals itself.
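
The check-then-recover flow in the diagram boils down to a small wrapper. This is a sketch of the control flow only: `is_logged_in`, `login`, and `generate` are hypothetical callables standing in for the real browser steps, and the single-retry limit is an assumption.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class LoginFailed(RuntimeError):
    """Raised when the session is still invalid after the allowed login attempts."""


def with_session(
    is_logged_in: Callable[[], bool],
    login: Callable[[], None],
    generate: Callable[[], T],
    max_logins: int = 1,
) -> T:
    """Run generate() once a valid session exists, logging in first if needed."""
    logins = 0
    while not is_logged_in():
        if logins >= max_logins:
            raise LoginFailed("still logged out after auto-login")
        login()  # captcha solve plus emailed code
        logins += 1
    return generate()
```

Capping the number of login attempts keeps a broken captcha or mail setup from looping forever and burning solver credit.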




Costs

| Item | Monthly cost |
| --- | --- |
| Subscription to the target service | whatever you already pay |
| Residential proxy (static) | ~$2 |
| CapSolver credit | ~$0.01 (a few re-logins) |
| Total extra | ~$2/month |
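
The arithmetic behind those figures, assuming one captcha solve per re-login and a session lifetime of one to two weeks (both stated elsewhere in this README). The constant and function names are illustrative.

```python
PROXY_USD_PER_MONTH = 2.00        # static residential proxy
CAPSOLVER_USD_PER_SOLVE = 0.003   # one solve per login
WEEKS_PER_MONTH = 52 / 12


def monthly_extra_cost(session_lifetime_weeks: float) -> float:
    """Proxy fee plus captcha solves for the re-logins in an average month."""
    relogins = WEEKS_PER_MONTH / session_lifetime_weeks
    return PROXY_USD_PER_MONTH + relogins * CAPSOLVER_USD_PER_SOLVE


worst = monthly_extra_cost(1)  # session expires weekly
best = monthly_extra_cost(2)   # session lasts two weeks
print(f"${best:.2f} to ${worst:.2f} per month")
```

Either way the captcha cost is about a cent, so the proxy dominates the total.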

For comparison, the equivalent volume on the official API of similar services would cost $20-100/month depending on usage. This is the bridge that makes a flat-rate UI subscription behave like an unlimited API.




Project Structure

pixelpipe/
  app/
    main.py          FastAPI app — endpoints, lifecycle, request queue
    browser.py       Patchright browser lifecycle (launch, restart, close)
    auth.py          Login flow + CapSolver + Gmail IMAP integration
    generator.py     Image and video generation Playwright flows
    config.py        Pydantic Settings (env var loading)
    models.py        Request/response schemas
  scripts/
    extract_cookies.py    One-time manual cookie extraction (fallback)
    refresh_session.py    Local login + push cookies to remote server
  tests/             pytest unit tests for config and models
  Dockerfile         Python 3.12 + Patchright Chromium + Xvfb + dbus
  docker-compose.yml Single service, mounts /data volume
  BUILD_JOURNAL.txt  Complete dev journal — every failure and fix
  .env.example       All env vars with explanations



Maintenance

When the target site updates their UI

Selectors in app/generator.py and app/auth.py may break. Use the /debug/controls endpoint to find the new selectors:

curl http://localhost:9100/debug/controls | jq

It returns every element with a data-cy attribute on the current page. Find the new attribute name for the broken control, then update the relevant locator in code.
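
To show the idea offline, the sketch below scans an HTML snippet for data-cy attributes and reports which expected ones are missing. The endpoint's actual response shape isn't documented here, so this works on raw HTML instead. The selector names in the usage note are invented.

```python
from html.parser import HTMLParser


class DataCyCollector(HTMLParser):
    """Collect (tag, data-cy value) pairs from an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.found: list[tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "data-cy" and value:
                self.found.append((tag, value))


def missing_selectors(html: str, expected: set[str]) -> set[str]:
    """Return the expected data-cy values that no longer appear in the page."""
    collector = DataCyCollector()
    collector.feed(html)
    present = {value for _, value in collector.found}
    return expected - present
```

For example, `missing_selectors(page_html, {"generate-button", "model-select"})` returns the subset of those names that the page no longer contains, which points at the locators to update.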


When auto-login fails

docker logs pixelpipe 2>&1 | tail -50

Look for CapSolver, verify-account, or Gmail errors. Each layer logs what it tried and why it failed.


When you need to start fresh

docker compose down
rm -rf data/patchright-profile data/cookies.json
mkdir -p data/patchright-profile
docker compose up -d

The bot will auto-login on the next request.




FAQ

Will this get me banned?
Possibly. Browser automation against a paid service is technically against most ToS. The bot generates at human-realistic intervals (one request at a time, no parallelism) so it's less detectable than a parallel scraper, but the risk is non-zero. Don't use this for anything critical without a backup.
Why not use the official API?
Most paid image-generation platforms either don't offer an API at all, or charge per-request fees that defeat the purpose of an unlimited subscription. This bridge gives you the cost profile of a subscription with the access pattern of an API.
Can I run this on a Raspberry Pi / cheap VPS?
Yes, but you need at least 2GB RAM (Chromium is hungry) and a residential proxy. Datacenter IPs get blocked. A €5/month Hetzner VPS works fine if you add a residential proxy on top.
What if CapSolver is down?
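The bot will fail at the login step and return a 401. Requests that reuse an existing valid session keep working, because the captcha only appears during login. POST /verify/{code} won't help here: it submits the emailed verification code, not the captcha. Fall back to the extract_cookies.py script, which lets you log in manually in a real browser and ship the cookies to the server.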
Can I run multiple accounts?
Not in a single instance. The bot is single-tab, single-account by design. Run multiple containers on different ports for multiple accounts.
Why Patchright instead of regular Playwright?
Vanilla Playwright leaks navigator.webdriver = true and several CDP signals. Patchright is a drop-in fork that patches them: same API, same code, just a different import. Without it, the target site shows a paywall on every generation attempt.



The trick isn't bypassing one defense. It's bypassing all of them at once.


MIT License · Build Journal
