1 change: 1 addition & 0 deletions docs/01_introduction/quick-start.mdx
@@ -105,4 +105,5 @@ To see how you can integrate the Apify SDK with popular web scraping libraries,
- [Selenium](../guides/selenium)
- [Crawlee](../guides/crawlee)
- [Scrapy](../guides/scrapy)
- [Browser Use](../guides/browser-use)
- [Running webserver](../guides/running-webserver)
90 changes: 90 additions & 0 deletions docs/03_guides/09_browser_use.mdx
@@ -0,0 +1,90 @@
---
id: browser-use
title: Browser AI agents with Browser Use
description: Build an Apify Actor that automates a browser with an LLM agent using the Browser Use library.
---

import RunnableCodeBlock from '@site/src/components/RunnableCodeBlock';

import BrowserUseExample from '!!raw-loader!roa-loader!./code/09_browser_use.py';

In this guide, you'll learn how to use the [Browser Use](https://browser-use.com/) library to drive a browser with an LLM agent in your Apify Actors.

## Introduction

[Browser Use](https://browser-use.com/) is a Python library that lets an LLM control a real web browser. Instead of writing selectors and navigation steps by hand, you give an agent a natural-language task - such as "find the top post on Hacker News and return its title and URL" - and the agent decides which pages to open, what to click, and what to read until the task is done.

Some of the features that make Browser Use a good fit for Apify Actors:

- **Natural-language tasks** - Describe what you want in plain English; the agent figures out the steps. This is well suited to pages whose structure changes often or is hard to target with fixed selectors.
- **Model-agnostic** - Browser Use ships wrappers for many providers (`ChatOpenAI`, `ChatAnthropic`, `ChatGoogle`, and more), so you can pick the model that fits your task and budget.
- **Structured output** - Pass a [Pydantic](https://docs.pydantic.dev/) model as the output schema and the agent returns a validated object instead of free-form text, which maps cleanly onto an Apify dataset.
- **Real browser via CDP** - The agent drives a real Chromium browser over the Chrome DevTools Protocol (CDP), so JavaScript-heavy pages render just as they would for a human.
- **First-class async support** - The agent's `run` method is asynchronous, which integrates naturally with the asyncio-based Apify SDK.

Browser Use is distributed as the `browser-use` package; install it with:

```bash
pip install browser-use
```

## Configuring the LLM

Browser Use needs an LLM to drive the agent. You choose a provider wrapper, give it a model name, and supply the provider's API key:

- **`ChatOpenAI`** - OpenAI models such as `gpt-4.1-mini` or `gpt-5-mini`. Reads the key from `OPENAI_API_KEY`, or accepts it via the `api_key` argument.
- **`ChatAnthropic`** - Anthropic Claude models such as `claude-sonnet-4-5` or `claude-haiku-4-5`. Reads the key from `ANTHROPIC_API_KEY`.
- **`ChatGoogle`** - Google Gemini models such as `gemini-2.5-flash`. Reads the key from `GOOGLE_API_KEY`.

The example Actor in this guide uses `ChatOpenAI`. Switching providers only requires changing the wrapper class (and its import) in `run_agent_task`, plus the environment variable that `main` reads the key from. More capable models generally complete tasks in fewer steps and more reliably, while smaller models cost less per step.

Keep the API key out of the Actor input and source code. The example reads it from the `OPENAI_API_KEY` environment variable: on the Apify platform, set it as a [secret environment variable](https://docs.apify.com/platform/actors/development/programming-interface/environment-variables); locally, export it in your shell.
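If you want to switch providers without editing `main`, the key lookup can be generalized. The following is a minimal sketch of a hypothetical `read_llm_api_key` helper (not part of the example or of Browser Use) that maps each provider wrapper listed above to the environment variable it reads, and fails fast when the key is missing. It accepts an optional mapping so you can exercise it without touching the real environment.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# Provider wrapper -> environment variable holding its API key (from the list above).
PROVIDER_KEY_VARS = {
    'ChatOpenAI': 'OPENAI_API_KEY',
    'ChatAnthropic': 'ANTHROPIC_API_KEY',
    'ChatGoogle': 'GOOGLE_API_KEY',
}


def read_llm_api_key(provider: str, environ: Mapping[str, str] | None = None) -> str:
    """Return the API key for `provider`, raising if it is unknown or missing."""
    env = os.environ if environ is None else environ
    var_name = PROVIDER_KEY_VARS.get(provider)
    if var_name is None:
        raise ValueError(f'Unknown provider: {provider!r}')
    api_key = env.get(var_name)
    if not api_key:
        raise RuntimeError(f'The {var_name} environment variable is not set.')
    return api_key
```

In the example Actor, `read_llm_api_key('ChatOpenAI')` would replace the explicit `os.environ.get('OPENAI_API_KEY')` check in `main`.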

## Example Actor

The following Actor runs a Browser Use agent for a single task and stores its structured result in the default dataset. By default it opens [Hacker News](https://news.ycombinator.com) and returns the title and URL of the top five posts, but the task, model, and step limit are all configurable through the Actor input.
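The example reads its input with a few `dict.get` calls. If you prefer a typed view of the input, a minimal sketch using a hypothetical `AgentInput` dataclass could look like this. The `task`, `model`, and `maxSteps` keys and their defaults come from the example; rejecting non-positive step limits and treating an empty `task` as missing are assumptions of this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any

# Same default task as the example Actor.
DEFAULT_TASK = (
    'Open https://news.ycombinator.com and return the title and URL '
    'of the top 5 posts on the front page.'
)


@dataclass(frozen=True)
class AgentInput:
    """Hypothetical typed view of the example's Actor input."""

    task: str
    model: str
    max_steps: int

    @classmethod
    def from_actor_input(cls, actor_input: dict[str, Any] | None) -> AgentInput:
        data = actor_input or {}
        raw_steps = data.get('maxSteps')
        max_steps = 25 if raw_steps is None else int(raw_steps)
        # Assumption of this sketch: reject non-positive step limits early.
        if max_steps < 1:
            raise ValueError('maxSteps must be a positive integer.')
        return cls(
            # Assumption of this sketch: an empty string also falls back to the default.
            task=data.get('task') or DEFAULT_TASK,
            model=data.get('model') or 'gpt-4.1-mini',
            max_steps=max_steps,
        )
```

Inside `main`, this would be used as `agent_input = AgentInput.from_actor_input(await Actor.get_input())`.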

The whole Actor fits in a single file. A `run_agent_task` helper holds the Browser Use-specific logic - it defines the output schema and builds the LLM, browser, and agent - while the `main` coroutine handles the [Actor](https://docs.apify.com/platform/actors) lifecycle, reads the input, sets up [Apify Proxy](https://docs.apify.com/platform/proxy), runs the agent, and stores the result:

<RunnableCodeBlock className="language-python" language="python">
{BrowserUseExample}
</RunnableCodeBlock>

A few things worth pointing out:

- Keeping the agent setup in `run_agent_task` separates the Browser Use-specific code from the Actor's orchestration logic. `main` only decides what to read from the input and what to store.
- Passing `output_model_schema=Posts` makes the agent return a validated `Posts` instance via `history.structured_output`, so `main` can push each item straight to the dataset. Adapt the task and the `Post`/`Posts` models together to fit your own use case.
- `enable_signal_handler=False` leaves signal handling to the Actor, which manages the run's lifecycle. Without it, Browser Use would install its own handlers and interfere with a clean shutdown.
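- `headless=Actor.configuration.headless` follows the Actor's `headless` configuration option, which is on by default, so the browser runs without a visible window on the platform. You can turn it off to watch the agent while debugging locally.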
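The final step in `main`, turning the structured output into dataset rows, can be sketched without external dependencies. Here plain dataclasses stand in for the Pydantic `Post`/`Posts` models and `dataclasses.asdict` plays the role of `model_dump()`; the `to_dataset_rows` name is hypothetical. A `None` result yields no rows, mirroring the warning branch in `main`.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass


# Stand-ins for the Pydantic models in the example.
@dataclass
class Post:
    title: str
    url: str


@dataclass
class Posts:
    posts: list[Post]


def to_dataset_rows(result: Posts | None) -> list[dict[str, str]]:
    """Flatten the agent's structured output into one dict per dataset row."""
    if result is None:
        # The case `main` logs as a warning: no structured output was returned.
        return []
    return [asdict(post) for post in result.posts]
```

Each returned dict corresponds to one `Actor.push_data` call in the example.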

## Using Apify Proxy

Running on the Apify platform gives your agent access to [Apify Proxy](https://docs.apify.com/platform/proxy), which rotates IP addresses to avoid rate limiting and blocking. In the example above, `main` creates a proxy configuration with `Actor.create_proxy_configuration` and passes a fresh proxy URL to `run_agent_task`.

Browser Use expects the proxy as a `ProxySettings` object with separate `server`, `username`, and `password` fields, whereas `ProxyConfiguration.new_url` returns a single URL string (for example `http://user:pass@proxy.apify.com:8000`). The `to_browser_use_proxy` helper splits that URL into the fields Browser Use expects. To select specific proxy groups or a country, pass the relevant arguments to `Actor.create_proxy_configuration`. For more details, see the [Proxy management](../concepts/proxy-management) guide.
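To see the conversion in isolation, here is a minimal stdlib-only sketch of the same splitting logic. The hypothetical `split_proxy_url` returns a plain dict instead of `ProxySettings`, so it runs without Browser Use installed. Percent-decoding the credentials and omitting the port when the URL has none are additions in this sketch.

```python
from __future__ import annotations

from urllib.parse import unquote, urlsplit


def split_proxy_url(proxy_url: str) -> dict[str, str | None]:
    """Split a proxy URL into the server/username/password fields Browser Use expects."""
    parts = urlsplit(proxy_url)
    if not parts.scheme or not parts.hostname:
        raise ValueError(f'Not a valid proxy URL: {proxy_url!r}')
    # Keep the port only if the URL specifies one.
    host = parts.hostname if parts.port is None else f'{parts.hostname}:{parts.port}'
    return {
        'server': f'{parts.scheme}://{host}',
        # Credentials in a URL are percent-encoded; decode them before passing them on.
        'username': unquote(parts.username) if parts.username is not None else None,
        'password': unquote(parts.password) if parts.password is not None else None,
    }
```

For the example URL above, this returns `server='http://proxy.apify.com:8000'`, `username='user'`, and `password='pass'`.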

## Running on the Apify platform

Browser Use drives a real Chromium browser over CDP, so the Actor needs a browser binary available at runtime. The simplest way to provide one is to build on top of the [Apify Playwright base image](https://hub.docker.com/r/apify/actor-python-playwright), which already ships a browser together with all of its system-level dependencies. Browser Use discovers that browser automatically, so no extra install step is needed in the image.

Disable Browser Use's telemetry and cloud sync inside the Actor by setting the `ANONYMIZED_TELEMETRY=false` and `BROWSER_USE_CLOUD_SYNC=false` environment variables in your Dockerfile.
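The Dockerfile covers the platform build. For local runs, you may want the same opt-out applied from Python. Below is a minimal sketch with a hypothetical `disable_browser_use_telemetry` helper. It uses `os.environ.setdefault`, so values you already exported take precedence, and it assumes Browser Use reads these variables no earlier than its import. That is why the call is placed before importing `browser_use`.

```python
import os

# The opt-out variables named above, with the values that disable each feature.
BROWSER_USE_PRIVACY_ENV = {
    'ANONYMIZED_TELEMETRY': 'false',
    'BROWSER_USE_CLOUD_SYNC': 'false',
}


def disable_browser_use_telemetry() -> None:
    """Set the opt-out variables unless they are already defined."""
    for name, value in BROWSER_USE_PRIVACY_ENV.items():
        os.environ.setdefault(name, value)


# Call this before `import browser_use` so the settings are in place
# whenever the library reads them.
disable_browser_use_telemetry()
```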

When running the Actor locally, install the browser once with the `browser-use install` command, which downloads a Chromium build together with its dependencies:

```bash
browser-use install
```

Remember to provide the LLM API key in both environments - as a secret environment variable on the platform, and exported in your shell when running locally.

## Conclusion

In this guide, you learned how to use Browser Use in your Apify Actors. You can now drive a real browser with an LLM agent, return its results as a validated Pydantic model, route the browser through Apify Proxy, and run the whole thing on the Apify platform. See the [Actor templates](https://apify.com/templates/categories/python) to get started with your own automation tasks. If you have questions or need assistance, feel free to reach out on our [GitHub](https://github.com/apify/apify-sdk-python) or join our [Discord community](https://discord.com/invite/jyEM2PRvMU). Happy automating!

## Additional resources

- [Browser Use: Official documentation](https://docs.browser-use.com/)
- [Browser Use: Supported models](https://docs.browser-use.com/customize/supported-models)
- [Browser Use: Structured output](https://docs.browser-use.com/customize/agent/output-format)
- [Browser Use: GitHub repository](https://github.com/browser-use/browser-use)
- [Apify: Proxy management](https://docs.apify.com/platform/proxy)
113 changes: 113 additions & 0 deletions docs/03_guides/code/09_browser_use.py
@@ -0,0 +1,113 @@
import asyncio
import os
from urllib.parse import unquote, urlsplit

from browser_use import Agent, Browser, ChatOpenAI
from browser_use.browser import ProxySettings
from pydantic import BaseModel

from apify import Actor

# Default task, aligned with the `Posts` schema below.
DEFAULT_TASK = (
    'Open https://news.ycombinator.com and return the title and URL '
    'of the top 5 posts on the front page.'
)


class Post(BaseModel):
    """A single item the agent is asked to extract."""

    title: str
    url: str


class Posts(BaseModel):
    """The structured result returned by the agent."""

    posts: list[Post]


def to_browser_use_proxy(proxy_url: str) -> ProxySettings:
    """Convert an Apify Proxy URL into Browser Use `ProxySettings`."""
    parts = urlsplit(proxy_url)
    # Only append the port if the URL specifies one.
    host = parts.hostname if parts.port is None else f'{parts.hostname}:{parts.port}'
    # Credentials in a URL are percent-encoded; decode them for Browser Use.
    return ProxySettings(
        server=f'{parts.scheme}://{host}',
        username=unquote(parts.username) if parts.username else None,
        password=unquote(parts.password) if parts.password else None,
    )


async def run_agent_task(
    task: str,
    *,
    model: str,
    llm_api_key: str,
    max_steps: int,
    headless: bool = True,
    proxy_url: str | None = None,
) -> Posts | None:
    """Run a Browser Use agent for one task and return its structured output."""
    # Configure the LLM. Swap `ChatOpenAI` for another provider if needed.
    llm = ChatOpenAI(model=model, api_key=llm_api_key)

    # Configure the browser, optionally routed through a proxy.
    browser = Browser(
        headless=headless,
        proxy=to_browser_use_proxy(proxy_url) if proxy_url else None,
    )

    # `output_model_schema` returns a validated `Posts`; signals stay with the Actor.
    agent = Agent(
        task=task,
        llm=llm,
        browser=browser,
        output_model_schema=Posts,
        enable_signal_handler=False,
    )

    history = await agent.run(max_steps=max_steps)
    return history.structured_output


async def main() -> None:
    async with Actor:
        # Read the Actor input.
        actor_input = await Actor.get_input() or {}
        task = actor_input.get('task', DEFAULT_TASK)
        model = actor_input.get('model', 'gpt-4.1-mini')
        max_steps = actor_input.get('maxSteps', 25)

        # Read the LLM API key from the environment (set it as a secret on Apify).
        llm_api_key = os.environ.get('OPENAI_API_KEY')
        if not llm_api_key:
            raise RuntimeError('The OPENAI_API_KEY environment variable is not set.')

        # Route the browser through Apify Proxy.
        proxy_configuration = await Actor.create_proxy_configuration()
        proxy_url = await proxy_configuration.new_url() if proxy_configuration else None

        Actor.log.info(f'Running the agent (model={model}) for task: {task}')

        result = await run_agent_task(
            task,
            model=model,
            llm_api_key=llm_api_key,
            max_steps=max_steps,
            headless=Actor.configuration.headless,
            proxy_url=proxy_url,
        )

        if result is None:
            Actor.log.warning('The agent did not return any structured output.')
            return

        # Store each extracted item as a dataset row.
        Actor.log.info(f'The agent returned {len(result.posts)} post(s); storing them.')
        for post in result.posts:
            Actor.log.info(f'Storing post: {post.title!r} ({post.url})')
            await Actor.push_data(post.model_dump())


if __name__ == '__main__':
    asyncio.run(main())