webba

Lightweight web search: three complementary providers, any-URL fetching.

Install

pip install webba
scrapling install   # one-time: fetch browser binaries for the stealth/dynamic fetch tiers

Quick Start

from webba import search, fetch

# Search — zero API keys needed (SearXNG by default)
results = search('python asyncio tutorial', n=5)
print(results.to_md())

# Fetch any URL as clean text
text = fetch('https://github.com/AnswerDotAI/ContextKit/blob/main/contextkit/read.py')

CLI

webba "python asyncio" --n 5 --fmt md
webba "latest AI news" --provider searxng --fmt json
webba --purge-cache
webba --start-searxng
webba --stop-searxng
webba --chrome-debug          # launch an isolated Chrome for login-gated search

Providers

webba uses three providers that together cover the search space:

| Provider | Role | Needs |
| --- | --- | --- |
| searxng | Meta-search aggregator (Google/Bing/Brave/DDG + news/science/code/video) | Docker (auto-started) |
| perplexity | LLM-synthesised cited answers for Q&A / research | `PERPLEXITY_API_KEY` |
| fastcdp | Real-browser SERP scrape via Chrome DevTools Protocol | Chrome/Chromium |

provider='auto' (default) routes Q&A/academic queries to Perplexity (when a key is set) and everything else to SearXNG, then cascades to fastcdp if a tier returns nothing. provider='all' runs every available provider and merges the ranked results with reciprocal rank fusion (RRF).
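To make the merge step concrete, here is a minimal sketch of reciprocal rank fusion over ranked URL lists. It is an illustration, not webba's actual code: the function name `rrf_merge`, the constant `k=60`, and the example URLs are all assumptions.

```python
from collections import defaultdict

def rrf_merge(rankings, k=60):
    """Merge ranked URL lists with reciprocal rank fusion.

    Each URL scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used constant and an
    assumption here, not a documented webba setting.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, url in enumerate(ranking, start=1):
            scores[url] += 1.0 / (k + rank)
    # Highest fused score first; ties fall back to URL order for determinism.
    return sorted(scores, key=lambda u: (-scores[u], u))

# Hypothetical result lists from two providers.
searxng_hits = ["https://a.example", "https://b.example", "https://c.example"]
fastcdp_hits = ["https://b.example", "https://d.example", "https://a.example"]
merged = rrf_merge([searxng_hits, fastcdp_hits])
```

URLs that rank well in several providers rise to the top, which is why RRF suits merging lists whose raw scores are not comparable.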

Features

Zero-key search: SearXNG runs locally in Docker, no accounts needed
Smart routing: keyword intent detection picks the provider + SearXNG category
Perplexity error handling: the HTTP status code decides retry vs. fail (400/401/403/404 are fatal; 429 and 5xx are retried once with backoff)
TTL cache: literal-query results cached in ~/.webba/cache via diskcache
Any-URL fetch: GitHub files/repos, arxiv, gists, PDFs, YouTube, docs, HTML
scrapling-powered scraping: battle-tested fetch cascade (fast HTTP → stealth browser → dynamic browser)
YouTube: fetch() on a video URL returns metadata + transcript via yt-dlp
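The Perplexity retry rule listed above can be sketched as follows. This is a minimal illustration under stated assumptions: `call_with_retry`, the `send` callable returning `(status, body)`, and the one-second backoff are hypothetical and not taken from webba's source.

```python
import time

# Status codes the feature list describes as fatal (never retried).
FATAL = {400, 401, 403, 404}

def should_retry(status):
    """429 and any 5xx are worth one more attempt."""
    return status == 429 or 500 <= status <= 599

def call_with_retry(send, backoff=1.0, sleep=time.sleep):
    """Call send() -> (status, body); retry once after a pause on 429/5xx.

    `send`, `backoff`, and `sleep` are hypothetical parameters used only
    for this sketch.
    """
    status, body = send()
    if should_retry(status):
        sleep(backoff)          # single backoff pause before the one retry
        status, body = send()
    if not 200 <= status < 300:
        kind = "fatal" if status in FATAL else "failed after retry"
        raise RuntimeError(f"Perplexity request {kind}: HTTP {status}")
    return body
```

Passing `sleep` as a parameter keeps the sketch easy to exercise without real delays.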

Fetching

fetch(url, sel=None, heavy=False, cdp=False, save_pdf=False, **kwargs)

sel: CSS selector to extract specific content from HTML
heavy: skip fast HTTP, go straight to a browser engine for JS-heavy pages
cdp: attach to a running debug Chrome (see below) for login-gated/enterprise pages — reuses that Chrome's cookies
kwargs: forwarded to read_gh_repo for repo URLs (branch, as_dict, etc.)
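The fetch cascade (fast HTTP, then stealth browser, then dynamic browser) and the effect of `heavy` can be pictured with a small fallback loop. This is a sketch only: `fetch_cascade` and the tier callables are hypothetical stand-ins, not scrapling or webba APIs.

```python
def fetch_cascade(url, tiers, heavy=False):
    """Try each (name, fetcher) tier in order and return the first non-empty text.

    `tiers` is an ordered list such as
    [("http", f1), ("stealth", f2), ("dynamic", f3)], where each fetcher
    is a hypothetical callable taking a URL and returning text.
    """
    if heavy:
        tiers = tiers[1:]  # assumption: heavy=True skips the fast HTTP tier
    errors = []
    for name, fetcher in tiers:
        try:
            text = fetcher(url)
        except Exception as exc:  # a failing tier falls through to the next
            errors.append(f"{name}: {exc}")
            continue
        if text and text.strip():
            return name, text
        errors.append(f"{name}: empty response")
    raise RuntimeError("all tiers failed: " + "; ".join(errors))

# Stand-in fetchers: the fast tier gets an empty page, the browser tier works.
demo_tiers = [
    ("http", lambda url: ""),
    ("stealth", lambda url: "rendered page text"),
    ("dynamic", lambda url: "fully rendered page text"),
]
tier_used, text = fetch_cascade("https://example.com", demo_tiers)
```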

fastcdp & enterprise / login-gated search

The fastcdp provider drives a real Chrome over the Chrome DevTools Protocol.

Headless (default): webba auto-launches a headless Chrome with an isolated profile at ~/.webba/chrome-profile. Zero-config — handles most public sites.
Login-gated / enterprise: run chrome_debug_setup() (or webba --chrome-debug). This launches a dedicated, isolated Chrome profile with remote debugging. Log into the SSO/enterprise sites you want webba to reach, once, in that window — webba reuses those cookies.

Best practice: webba never attaches to your primary Chrome profile. Chrome's remote-debugging port is unauthenticated, and a daily-driver profile exposes every logged-in session (mail, banking, cloud consoles) to anything that can connect. The dedicated ~/.webba/chrome-profile keeps automation isolated from your personal browsing — log into only what webba needs.
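For readers curious what an isolated debug Chrome amounts to, the sketch below builds a launch command with a dedicated profile directory. The Chrome flags are real; the helper name `debug_chrome_args`, the `google-chrome` binary name, and the claim that webba launches Chrome this way are assumptions.

```python
from pathlib import Path

def debug_chrome_args(chrome="google-chrome", port=9222,
                      profile="~/.webba/chrome-profile", headless=False):
    """Build an argv list for a Chrome with remote debugging and its own profile.

    Hypothetical helper for illustration; the binary name varies by platform.
    """
    profile_dir = Path(profile).expanduser()
    args = [
        chrome,
        f"--remote-debugging-port={port}",   # exposes the DevTools endpoint
        f"--user-data-dir={profile_dir}",    # keeps cookies out of the primary profile
        "--no-first-run",
    ]
    if headless:
        args.append("--headless=new")
    return args

print(" ".join(debug_chrome_args()))
```

Because `--user-data-dir` points at a separate directory, only the sessions you log into in that window are reachable through the debugging port.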

Environment Variables (all optional)

| Variable | Effect |
| --- | --- |
| `PERPLEXITY_API_KEY` | Enables the Perplexity provider |
| `SEARXNG_URL` | Use an existing SearXNG instead of the auto-started one |
| `WEBBA_SEARXNG` | Set to `false` to disable SearXNG entirely (default: `true`) |
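One way to read these variables, shown as a minimal sketch: the helper `load_settings`, the returned keys, and the case-insensitive parsing of `false` are assumptions, not webba's documented behaviour.

```python
import os

def load_settings(env=None):
    """Interpret the three optional variables from a mapping (default: os.environ)."""
    env = os.environ if env is None else env
    # Assumption: only the literal value "false" (any case) disables SearXNG.
    searxng_flag = env.get("WEBBA_SEARXNG", "true").strip().lower()
    return {
        "perplexity_enabled": bool(env.get("PERPLEXITY_API_KEY")),
        "searxng_enabled": searxng_flag != "false",
        "searxng_url": env.get("SEARXNG_URL") or None,  # None means use the auto-started instance
    }

settings = load_settings({"SEARXNG_URL": "http://localhost:8080"})
```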

API

`search(q, n=10, provider='auto', cache=True, cache_ttl=3600) -> SearchResults`

Search the web. provider: 'auto' | 'searxng' | 'perplexity' | 'fastcdp' | 'all'.
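The 'auto' behaviour can be pictured roughly as below. This is a sketch under loud assumptions: the keyword sets, the names `pick_provider` and `search_auto`, and the exact cascade order are all hypothetical; webba's real intent detection is not shown in this README.

```python
# Hypothetical hint words; the real keyword lists are not documented here.
QA_WORDS = {"how", "why", "what", "who", "when", "explain"}
ACADEMIC_WORDS = {"paper", "study", "arxiv", "research"}

def pick_provider(q, has_perplexity_key):
    """Return 'perplexity' for Q&A/academic-looking queries when a key is set."""
    text = q.lower().strip()
    words = text.split()
    is_qa = (
        text.endswith("?")
        or (bool(words) and words[0] in QA_WORDS)
        or any(w in ACADEMIC_WORDS for w in words)
    )
    return "perplexity" if is_qa and has_perplexity_key else "searxng"

def search_auto(q, providers, has_perplexity_key=False):
    """Run the chosen provider, then fall back to fastcdp if it returns nothing.

    `providers` maps a provider name to a callable returning a list of results.
    """
    first = pick_provider(q, has_perplexity_key)
    for name in (first, "fastcdp"):
        run = providers.get(name)
        if run is None:
            continue
        results = run(q)
        if results:
            return name, results
    return None, []
```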

`fetch(url, sel=None, heavy=False, cdp=False, save_pdf=False, **kwargs) -> str|dict`

Fetch any URL or local path as clean text. Handles GitHub, arxiv, gists, PDFs, YouTube, docs, HTML.

`crawl(seed, link_pat=None, sel=None, max_pages=500, delay=0.5, same_domain=True, save_dir=None) -> L`

Crawl a site, follow links, return L of {url, text} dicts.
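How `link_pat`, `same_domain`, and `max_pages` might interact is easiest to see in a small breadth-first sketch. It is not webba's implementation: `crawl_sketch` and the `get_page` callable are hypothetical, the politeness `delay` and `save_dir` are omitted, and a plain list stands in for `L`.

```python
import re
from collections import deque
from urllib.parse import urljoin, urlparse

def crawl_sketch(seed, get_page, link_pat=None, max_pages=500, same_domain=True):
    """Breadth-first crawl; get_page(url) -> (text, [hrefs]) is a stand-in fetcher."""
    pat = re.compile(link_pat) if link_pat else None
    seen, queue, out = {seed}, deque([seed]), []
    while queue and len(out) < max_pages:
        url = queue.popleft()
        text, hrefs = get_page(url)
        out.append({"url": url, "text": text})
        for href in hrefs:
            link = urljoin(url, href).split("#")[0]  # resolve relative links, drop fragments
            if link in seen:
                continue
            if same_domain and urlparse(link).netloc != urlparse(seed).netloc:
                continue
            if pat and not pat.search(link):
                continue
            seen.add(link)
            queue.append(link)
    return out

# Tiny in-memory "site" so the sketch runs without network access.
site = {
    "https://docs.example/": ("home", ["/a", "/b", "https://other.example/x"]),
    "https://docs.example/a": ("A", ["/b", "/"]),
    "https://docs.example/b": ("B", []),
}
pages = crawl_sketch("https://docs.example/", lambda url: site[url])
```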

`SearchResults.to_md()` / `.to_context(max_chars=4000)` / `.fetch_all(sel=None, heavy=False)`

Format as markdown / concatenate snippets as LLM context / fetch all result URLs.
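As a rough picture of what "concatenate snippets as LLM context" with a character budget could look like, here is a standalone sketch. The result fields (`title`, `url`, `snippet`), the markdown-link layout, and the whole-block truncation rule are assumptions; the real `SearchResults.to_context` may format and truncate differently.

```python
def to_context(results, max_chars=4000):
    """Join result snippets into one string without exceeding max_chars.

    `results` is a list of dicts with hypothetical keys title/url/snippet.
    Blocks are added whole; the first block that would overflow stops the loop.
    """
    parts, used = [], 0
    for r in results:
        block = f"[{r['title']}]({r['url']})\n{r['snippet']}"
        sep = 2 if parts else 0  # blank line between blocks
        if used + sep + len(block) > max_chars:
            break
        parts.append(block)
        used += sep + len(block)
    return "\n\n".join(parts)

demo_results = [
    {"title": "A", "url": "https://a.example", "snippet": "alpha"},
    {"title": "B", "url": "https://b.example", "snippet": "beta"},
]
context = to_context(demo_results)
```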

`chrome_debug_setup(headless=False, port=9222) -> str`

Launch the isolated debug Chrome for login-gated search.

`searxng_start()` / `searxng_stop()`

Start / stop the local SearXNG container. searxng_start() is idempotent.

`purge_cache(db_path='~/.webba/cache')`

Wipe the search cache.
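The caching behaviour behind `cache=True`, `cache_ttl`, and `purge_cache` can be illustrated with an in-memory stand-in. webba itself uses diskcache on disk; the `TTLCache` class, the `cached_search` helper, and the choice of `(q, n, provider)` as the key are assumptions made for this sketch.

```python
import time

class TTLCache:
    """In-memory stand-in for a disk-backed TTL cache."""

    def __init__(self, clock=time.monotonic):
        self._store = {}
        self._clock = clock

    def get(self, key):
        hit = self._store.get(key)
        if hit is None:
            return None
        expires, value = hit
        if self._clock() >= expires:  # entry outlived its TTL
            del self._store[key]
            return None
        return value

    def set(self, key, value, ttl=3600):
        self._store[key] = (self._clock() + ttl, value)

    def purge(self):
        self._store.clear()

def cached_search(q, n, provider, run, cache, ttl=3600):
    """Look up the literal query first; call run(q, n, provider) only on a miss."""
    key = (q, n, provider)  # literal query string, no normalisation (assumption)
    hit = cache.get(key)
    if hit is not None:
        return hit
    value = run(q, n, provider)
    cache.set(key, value, ttl)
    return value
```

Keying on the literal query means "Python asyncio" and "python asyncio" are cached separately in this sketch.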

Architecture

| File | Owns |
| --- | --- |
| `webba/search.py` | Providers (searxng/perplexity/fastcdp), routing, SearXNG setup, CLI |
| `webba/cache.py` | `diskcache`-backed literal-query TTL cache |
| `webba/fetch.py` | URL classification, scrapling fetch cascade, YouTube, crawl |

License

Apache-2.0
