diff --git a/src/content/docs/creating-custom-feeds.mdx b/src/content/docs/creating-custom-feeds.mdx
index 3814969c..16ddd656 100644
--- a/src/content/docs/creating-custom-feeds.mdx
+++ b/src/content/docs/creating-custom-feeds.mdx
@@ -6,6 +6,7 @@ sidebar:
---
import { Aside } from "@astrojs/starlight/components";
+import Code from "astro/components/Code.astro";
When auto-sourcing isn't enough, you can write your own configuration files to create custom RSS feeds for any website. This guide shows you how to take full control with YAML configs.
@@ -160,6 +161,22 @@ html2rss supports many configuration options:
4. **Check the output:** Make sure all items have titles, links, and descriptions
+### Useful CLI flags when a site is difficult
+
+Some sites need a larger request budget than the defaults allow.
+
+- Use `--max-redirects` when the site bounces through several canonicalization or tracking redirects before the real page loads.
+- Use `--max-requests` when your config needs more than one request, for example pagination or other follow-up fetches.
+
+
+
+Keep these values tight. Raise them only when the site proves it needs more.
+
## Add It To html2rss-web
Once the config works locally, add it to your `feeds.yml` or shared config repository and restart your
diff --git a/src/content/docs/getting-started.mdx b/src/content/docs/getting-started.mdx
index f08de5ba..f5e2a871 100644
--- a/src/content/docs/getting-started.mdx
+++ b/src/content/docs/getting-started.mdx
@@ -5,6 +5,8 @@ sidebar:
order: 1
---
+import Code from "astro/components/Code.astro";
+
This page points to the main onboarding flow.
## Start Here
@@ -23,3 +25,15 @@ That guide is the canonical setup flow for:
- **[Browse working feed examples](/feed-directory/)** - See what success looks like
- **[Create Custom Feeds](/creating-custom-feeds)** - Write configs when you need more control
- **[Troubleshooting Guide](/troubleshooting/troubleshooting)** - Fix startup or extraction problems
+
+## Using the Ruby CLI
+
+If you are working directly with the gem instead of `html2rss-web`, start with:
+
+
+
+If the target site is unusually redirect-heavy or needs extra follow-up requests, the CLI also supports:
+
+
+
+For config-driven runs, the same flags are available on `html2rss feed`.
diff --git a/src/content/docs/ruby-gem/how-to/advanced-features.mdx b/src/content/docs/ruby-gem/how-to/advanced-features.mdx
index 703bd9e9..7d1088b3 100644
--- a/src/content/docs/ruby-gem/how-to/advanced-features.mdx
+++ b/src/content/docs/ruby-gem/how-to/advanced-features.mdx
@@ -7,13 +7,7 @@ This guide covers advanced features and performance optimizations for html2rss.
## Parallel Processing
-html2rss uses parallel processing to improve performance when scraping multiple items. This happens automatically and doesn't require any configuration.
-
-### How It Works
-
-- **Auto-source scraping:** Multiple scrapers run in parallel to analyze the page
-- **Item processing:** Each scraped item is processed in parallel
-- **Performance benefit:** Significantly faster when dealing with many items
+html2rss uses parallel processing in auto-source discovery. This happens automatically and doesn't require any configuration.
### Performance Tips
@@ -88,7 +82,7 @@ LOG_LEVEL=debug html2rss feed config.yml
Use the health check endpoint to monitor feed generation:
```bash
-curl -u username:password http://localhost:3000/health_check.txt
+curl -u username:password http://localhost:4000/health_check.txt
```
## Article Validation
diff --git a/src/content/docs/ruby-gem/how-to/custom-http-requests.mdx b/src/content/docs/ruby-gem/how-to/custom-http-requests.mdx
index 33b6cca3..23fdfa7b 100644
--- a/src/content/docs/ruby-gem/how-to/custom-http-requests.mdx
+++ b/src/content/docs/ruby-gem/how-to/custom-http-requests.mdx
@@ -3,7 +3,15 @@ title: "Custom HTTP Requests"
description: "Learn how to customize HTTP requests with custom headers, authentication, and API interactions for html2rss."
---
-Some websites require custom HTTP headers, authentication, or other request settings to access their content. `html2rss` lets you customize requests for those cases.
+import Code from "astro/components/Code.astro";
+
+Some sites only work when requests carry the headers, tokens, or cookies your browser uses. `html2rss` supports those cases without changing the rest of your feed workflow.
+
+Keep this structure in mind:
+
+- `headers` stays top-level
+- `strategy` stays top-level
+- request-specific controls such as budgets and Browserless options live under `request`
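+
+As a rough sketch of that layout (a config fragment, not a complete feed; the header value and the redirect limit are illustrative placeholders, not recommended settings):
+
+```yaml
+# Fragment only: channel and selector settings are omitted.
+headers: # stays top-level
+  User-Agent: "Mozilla/5.0 (compatible; html2rss)" # placeholder value
+strategy: browserless # stays top-level
+request: # request-specific controls live here
+  max_redirects: 5 # illustrative limit, not a documented default
+```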
## When You Need Custom Headers
@@ -19,8 +27,8 @@ You might need custom HTTP requests when:
Add a `headers` section to your feed configuration. This example is a complete, valid config:
-```yaml
-headers:
+
+
+## Request Controls
+
+Request budgets are configured under `request`, not as top-level keys:
+
+
+
+- `request.max_redirects` limits redirect hops
+- `request.max_requests` limits the total request budget for the feed build
+- `request.browserless.*` is reserved for Browserless-only behavior such as preload actions
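+
+A minimal sketch of such a block, assuming illustrative numbers rather than the gem's actual defaults:
+
+```yaml
+# Fragment only; both numbers are placeholders chosen for illustration.
+request:
+  max_redirects: 5 # cap on redirect hops
+  max_requests: 3 # cap on total requests for the feed build
+```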
## Common Use Cases
diff --git a/src/content/docs/ruby-gem/how-to/handling-dynamic-content.mdx b/src/content/docs/ruby-gem/how-to/handling-dynamic-content.mdx
index c0e5e379..2ca0db72 100644
--- a/src/content/docs/ruby-gem/how-to/handling-dynamic-content.mdx
+++ b/src/content/docs/ruby-gem/how-to/handling-dynamic-content.mdx
@@ -3,12 +3,38 @@ title: Handling Dynamic Content
description: "Learn how to handle JavaScript-heavy websites and dynamic content with html2rss. Use browserless strategy for sites that load content dynamically."
---
+import Code from "astro/components/Code.astro";
+
Some websites load their content dynamically using JavaScript. The default `html2rss` strategy might not see this content.
## Solution
Use the [`browserless` strategy](/ruby-gem/reference/strategy) to render JavaScript-heavy websites with a headless browser.
+Keep the strategy at the top level and put request-specific options under `request`:
+
+
+
## When to Use Browserless
The `browserless` strategy is necessary when:
@@ -18,6 +44,53 @@ The `browserless` strategy is necessary when:
- **Infinite scroll** - Content loads as you scroll
- **Dynamic forms** - Content changes based on user interaction
+## Preload Actions
+
+For dynamic sites, rendering once is often not enough. Use `request.browserless.preload` to wait, click, or scroll before the
+HTML snapshot is taken.
+
+### Wait for JavaScript Requests
+
+```yaml
+strategy: browserless
+request:
+  browserless:
+    preload:
+      wait_for_network_idle:
+        timeout_ms: 4000
+```
+
+### Click "Load More" Buttons
+
+```yaml
+strategy: browserless
+request:
+  browserless:
+    preload:
+      click_selectors:
+        - selector: ".load-more"
+          max_clicks: 3
+          delay_ms: 250
+          wait_for_network_idle:
+            timeout_ms: 3000
+```
+
+### Scroll Infinite Lists
+
+```yaml
+strategy: browserless
+request:
+  browserless:
+    preload:
+      scroll_down:
+        iterations: 5
+        delay_ms: 200
+        wait_for_network_idle:
+          timeout_ms: 2500
+```
+
+These preload steps can be combined in a single config when a site needs several interactions before all items appear.
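+
+A combined config might look like the following sketch. It reuses only keys that appear in the examples above, with illustrative values. The order in which the steps run is not stated here, so treat that as something to verify against your html2rss version.
+
+```yaml
+# Sketch: one preload block that waits, clicks, and scrolls.
+strategy: browserless
+request:
+  browserless:
+    preload:
+      wait_for_network_idle:
+        timeout_ms: 4000
+      click_selectors:
+        - selector: ".load-more" # placeholder selector
+          max_clicks: 3
+          delay_ms: 250
+      scroll_down:
+        iterations: 5
+        delay_ms: 200
+```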
+
## Performance Considerations
The `browserless` strategy is slower than the default `faraday` strategy because it:
diff --git a/src/content/docs/ruby-gem/reference/auto-source.mdx b/src/content/docs/ruby-gem/reference/auto-source.mdx
index 33454232..82e92df0 100644
--- a/src/content/docs/ruby-gem/reference/auto-source.mdx
+++ b/src/content/docs/ruby-gem/reference/auto-source.mdx
@@ -17,16 +17,19 @@ auto_source: {}
`auto_source` uses the following strategies to find content:
-1. **`schema`:** Parses `