3 changes: 3 additions & 0 deletions .env.example
@@ -4,6 +4,9 @@ SLACK_BOT_TOKEN=xoxb-your-bot-token

# Server
PORT=3000
HEALTH_PORT=8080
# Bind address for GET /health (127.0.0.1 = localhost only). Docker Compose sets HEALTH_BIND_HOST=0.0.0.0.
HEALTH_BIND_HOST=127.0.0.1

# Database (required) — shared PostgreSQL on the host.
# When running in Docker, use host.docker.internal to reach the host:
42 changes: 25 additions & 17 deletions README.md
@@ -1,7 +1,7 @@
# paperscout-python
# paperscout

[![CI](https://github.com/cppalliance/paperscout-python/actions/workflows/ci.yml/badge.svg)](https://github.com/cppalliance/paperscout-python/actions/workflows/ci.yml)
[![CD](https://github.com/cppalliance/paperscout-python/actions/workflows/cd.yml/badge.svg)](https://github.com/cppalliance/paperscout-python/actions/workflows/cd.yml)
[![CI](https://github.com/cppalliance/paperscout/actions/workflows/ci.yml/badge.svg)](https://github.com/cppalliance/paperscout/actions/workflows/ci.yml)
[![CD](https://github.com/cppalliance/paperscout/actions/workflows/cd.yml/badge.svg)](https://github.com/cppalliance/paperscout/actions/workflows/cd.yml)

WG21 C++ paper tracker with ISO draft probing and Slack notifications.

@@ -79,7 +79,7 @@ Go to **App Home** in the left sidebar:
### 6. Configure and Start the Scout

```bash
cd paperscout-python
cd paperscout
cp .env.example .env
```

@@ -172,7 +172,7 @@ The workflow picks the environment from the branch (`refs/heads/main` → `produ

```bash
# On the production server (after Docker, PostgreSQL, and nginx are set up)
git clone https://github.com/cppalliance/paperscout-python.git /opt/paperscout
git clone https://github.com/cppalliance/paperscout.git /opt/paperscout
cd /opt/paperscout
cp .env.example .env # edit with real credentials
docker compose up -d --build
@@ -182,7 +182,7 @@ curl -sf http://localhost:9101/health
On the **staging** server (separate host or separate path on the same host; must match the `staging` environment's `DEPLOY_PATH` and expose `/health` on `HEALTH_PORT`):

```bash
git clone -b develop https://github.com/cppalliance/paperscout-python.git /opt/paperscout-staging
git clone -b develop https://github.com/cppalliance/paperscout.git /opt/paperscout-staging
cd /opt/paperscout-staging
cp .env.example .env # use staging credentials / DB / Slack app as appropriate
docker compose up -d --build
@@ -225,15 +225,23 @@ All parameters are configurable via environment variables or a `.env` file. See
| `SLACK_BOT_TOKEN` | Slack bot token (`xoxb-...`) |
| `DATABASE_URL` | PostgreSQL connection string (`postgresql://user:pass@host:5432/db`) |

### Server

| Variable | Default | Description |
| -------------------- | ----------- | ------------------------------------------------------------------------------------------------------------------------------------------- |
| `PORT` | `3000` | Slack Bolt HTTP listener |
| `HEALTH_PORT` | `8080` | GET `/health` JSON endpoint |
| `HEALTH_BIND_HOST` | `127.0.0.1` | Bind address for the health server (localhost-only). Use `0.0.0.0` inside Docker when publishing ports to the host; see `docker-compose.yml`. |
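
As a rough illustration of how the three server variables resolve, the sketch below reads them from the environment and falls back to the defaults in the table. It uses only the standard library; the project itself loads settings through pydantic-settings, and the `ServerSettings` class and `load_server_settings` helper are hypothetical names, not part of paperscout.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerSettings:
    """Hypothetical container for the server variables documented above."""

    port: int
    health_port: int
    health_bind_host: str


def load_server_settings(env=None) -> ServerSettings:
    """Resolve PORT, HEALTH_PORT and HEALTH_BIND_HOST with the documented defaults."""
    env = os.environ if env is None else env
    return ServerSettings(
        port=int(env.get("PORT", "3000")),
        health_port=int(env.get("HEALTH_PORT", "8080")),
        health_bind_host=env.get("HEALTH_BIND_HOST", "127.0.0.1"),
    )


if __name__ == "__main__":
    # Bare-metal default: the health server is reachable from localhost only.
    print(load_server_settings({}))
    # The override that docker-compose.yml applies for published ports.
    print(load_server_settings({"HEALTH_BIND_HOST": "0.0.0.0"}))
```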

### Scheduling

| Variable | Default | Description |
| ----------------------- | ------- | ------------------------------------------------------ |
| `POLL_INTERVAL_MINUTES` | `30` | Main polling cycle interval |
| `POLL_OVERRUN_COOLDOWN_SECONDS` | `300` | Minimum sleep after a poll cycle that overran the interval (avoids tight loops when work or errors stretch a cycle) |
| `ENABLE_BULK_WG21` | `true` | Fetch wg21.link/index.json each cycle |
| `ENABLE_BULK_OPENSTD` | `true` | Reserved for open-std.org scraping (not yet scheduled) |
| `ENABLE_ISO_PROBE` | `true` | Run isocpp.org HEAD probing each cycle |
| Variable | Default | Description |
| ------------------------------- | ------- | ------------------------------------------------------------------------------------------------------------------- |
| `POLL_INTERVAL_MINUTES` | `30` | Main polling cycle interval |
| `POLL_OVERRUN_COOLDOWN_SECONDS` | `300` | Minimum sleep after a poll cycle that overran the interval (avoids tight loops when work or errors stretch a cycle) |
| `ENABLE_BULK_WG21` | `true` | Fetch wg21.link/index.json each cycle |
| `ENABLE_BULK_OPENSTD` | `true` | Reserved for open-std.org scraping (not yet scheduled) |
| `ENABLE_ISO_PROBE` | `true` | Run isocpp.org HEAD probing each cycle |
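
One plausible reading of how `POLL_INTERVAL_MINUTES` and `POLL_OVERRUN_COOLDOWN_SECONDS` interact is sketched below. This is an assumption based on the descriptions in the table, not the project's actual scheduling loop, and `next_sleep_seconds` is a hypothetical helper.

```python
def next_sleep_seconds(
    interval_seconds: float,
    elapsed_seconds: float,
    overrun_cooldown_seconds: float,
) -> float:
    """Return how long to sleep before the next poll cycle (illustrative only)."""
    remaining = interval_seconds - elapsed_seconds
    if remaining > 0:
        # The cycle finished within the interval: sleep for the remainder.
        return remaining
    # The cycle overran the interval: wait for the cooldown instead of
    # starting the next cycle immediately, which would risk a tight loop.
    return overrun_cooldown_seconds


if __name__ == "__main__":
    interval = 30 * 60  # POLL_INTERVAL_MINUTES=30
    cooldown = 300      # POLL_OVERRUN_COOLDOWN_SECONDS=300
    print(next_sleep_seconds(interval, 10 * 60, cooldown))  # 1200
    print(next_sleep_seconds(interval, 35 * 60, cooldown))  # 300
```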

### Probe Prefixes / Extensions

@@ -302,7 +310,7 @@ All parameters are configurable via environment variables or a `.env` file. See
## Architecture

```
paperscout-python/
paperscout/
src/paperscout/
__main__.py Entry point; wires together all components
config.py All settings via pydantic-settings
@@ -312,7 +320,7 @@ paperscout-python/
scout.py Slack Bolt app, MessageQueue, notify_channel, notify_users
storage.py PaperCache, ProbeState, UserWatchlist (all PostgreSQL-backed)
db.py ThreadedConnectionPool init and schema DDL
health.py HTTP health-check endpoint (GET /health on port 8080)
health.py HTTP health-check endpoint (GET /health; bind via HEALTH_BIND_HOST)
data/ Log files (gitignored); all other state lives in PostgreSQL
deploy/
paperscout.conf Reference nginx site config (443 → 3000, /health → 8080)
@@ -380,8 +388,8 @@ The `Last-Modified` timestamp is shown in every notification message.
### Setup

```bash
git clone https://github.com/cppalliance/paperscout-python.git
cd paperscout-python
git clone https://github.com/cppalliance/paperscout.git
cd paperscout
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -e ".[dev]"
21 changes: 14 additions & 7 deletions deploy/SERVER_SETUP.md
@@ -3,6 +3,9 @@
Step-by-step guide for provisioning a fresh Ubuntu 22.04 server to run
paperscout alongside other apps that share the same PostgreSQL and nginx.

Substitute **`<deploy-user>`** with your SSH/deploy UNIX username wherever it
appears (file ownership, Docker group).

---

## 1. System basics
@@ -41,7 +44,7 @@ sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-compose-plugin

# Let the deploy user run docker without sudo
sudo usermod -aG docker gcp-cppdigest
sudo usermod -aG docker <deploy-user>
newgrp docker
```

@@ -136,15 +139,15 @@ sudo apt install -y nginx

# Obtain a Let's Encrypt certificate (skip if already done for this domain)
sudo apt install -y certbot python3-certbot-nginx
sudo certbot --nginx -d dev.cppdigest.org
sudo certbot --nginx -d your-domain.example.org
```

Certbot creates a server block for `dev.cppdigest.org` in the default
Certbot creates a server block for `your-domain.example.org` in the default
nginx config. Add the paperscout location blocks **inside that existing
server block** (do NOT create a separate server block -- nginx will
ignore it in favour of the first match).

Open the config and find the `dev.cppdigest.org` server block with
Open the config and find the `your-domain.example.org` server block with
`listen 443 ssl;`. Add these lines before its closing `}`:

```nginx
@@ -177,8 +180,8 @@ Clone the repo into `/opt/paperscout`:

```bash
sudo mkdir -p /opt
sudo git clone https://github.com/cppalliance/paperscout-python.git /opt/paperscout
sudo chown -R gcp-cppdigest:gcp-cppdigest /opt/paperscout
sudo git clone https://github.com/<org>/<repo>.git /opt/paperscout
sudo chown -R <deploy-user>:<deploy-user> /opt/paperscout
```

Create the `.env` file:
@@ -197,6 +200,10 @@ The `DATABASE_URL` should use `host.docker.internal`:
DATABASE_URL=postgresql://paperscout:<password>@host.docker.internal:5432/paperscout
```

The Docker Compose file sets **`HEALTH_BIND_HOST=0.0.0.0`** so the health
HTTP server listens on the container's network interface, which is where
Docker’s published ports deliver traffic (the default `127.0.0.1` bind is
for bare-metal / localhost-only use).

> **Note:** If the password contains special characters, they must be
> percent-encoded in the URL (e.g. `@` → `%40`, `!` → `%21`).
> Use `python3 -c "import urllib.parse; print(urllib.parse.quote('<password>', safe=''))"` to encode it.
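
The one-liner in the note can be extended into a small helper that assembles the whole URL. This is only a sketch: `build_database_url` is a hypothetical name, and the user, host, and database values are placeholders taken from the example `DATABASE_URL`.

```python
from urllib.parse import quote


def build_database_url(user: str, password: str, host: str, port: int, database: str) -> str:
    """Assemble a PostgreSQL URL, percent-encoding the user and password."""
    # safe="" makes quote() encode every reserved character, including "/" and "@".
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


if __name__ == "__main__":
    print(build_database_url("paperscout", "p@ss!word", "host.docker.internal", 5432, "paperscout"))
    # postgresql://paperscout:p%40ss%21word@host.docker.internal:5432/paperscout
```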
@@ -248,7 +255,7 @@ Configure these in the repo under **Settings → Secrets and variables → Actio
| Secret | Purpose |
| ---------------- | ----------------------------------- |
| `SERVER_HOST` | Server IP or hostname |
| `SERVER_USER` | SSH username (e.g. `gcp-cppdigest`) |
| `SERVER_USER` | SSH username (e.g. `<deploy-user>`) |
| `SERVER_SSH_KEY` | Private SSH key for the deploy user |
| `SERVER_PORT` | SSH port (optional, defaults to 22) |

6 changes: 3 additions & 3 deletions deploy/paperscout.conf
@@ -1,9 +1,9 @@
server {
listen 443 ssl;
server_name dev.cppdigest.org;
server_name your-domain.example.org;

ssl_certificate /etc/letsencrypt/live/dev.cppdigest.org/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/dev.cppdigest.org/privkey.pem;
ssl_certificate /etc/letsencrypt/live/your-domain.example.org/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/your-domain.example.org/privkey.pem;

# Health endpoint — must come before the general /paperscout/ block
# because nginx uses longest-prefix matching.
2 changes: 2 additions & 0 deletions docker-compose.yml
@@ -5,6 +5,8 @@ services:
- "127.0.0.1:9100:3000"
- "127.0.0.1:9101:8080"
env_file: .env
environment:
HEALTH_BIND_HOST: "0.0.0.0"
extra_hosts:
- "host.docker.internal:host-gateway"
logging:
10 changes: 6 additions & 4 deletions docs/onboarding.md
@@ -31,8 +31,8 @@ Supporting directories: [`tests/`](../tests/) (pytest), [`deploy/`](../deploy/)
### 1. Clone and virtual environment

```bash
git clone https://github.com/cppalliance/paperscout-python.git
cd paperscout-python
git clone https://github.com/cppalliance/paperscout.git
cd paperscout
python -m venv .venv
source .venv/bin/activate # Windows Git Bash: source .venv/Scripts/activate
pip install -e ".[dev]"
@@ -197,13 +197,15 @@ Every key from [`.env.example`](../.env.example) is listed below. Names in `.env
### Storage and logging

| Variable | Default | Meaning |
| -------------------- | ----------- | --------------------------------------------------------------------------- |
| `DATA_DIR` | `./data` | Log directory (and local file layout); created if missing. |
| `CACHE_TTL_HOURS` | `1` | Staleness window for cached wg21 index blob in Postgres. |
| `LOG_LEVEL` | `INFO` | Console/file log level (`DEBUG`, `INFO`, `WARNING`, `ERROR`). |
| `LOG_RETENTION_DAYS` | `7` | Days of rotated log files to retain. |
| `HEALTH_PORT` | `8080` | Port for the `GET /health` endpoint. |
| `HEALTH_BIND_HOST` | `127.0.0.1` | Bind host for the health server; use `0.0.0.0` in Docker when publishing ports. |

**Note:** `health_port` (default `8080`) exists in [Settings](../src/paperscout/config.py) but is not listed in `.env.example`. You can still set `HEALTH_PORT` in `.env` if you need to override the default.
Docker Compose sets `HEALTH_BIND_HOST=0.0.0.0` so the health endpoint accepts connections through published ports.

## Scheduling (asyncio loop)

8 changes: 7 additions & 1 deletion src/paperscout/__main__.py
@@ -116,7 +116,13 @@ def _on_poll_result(result):

    register_handlers(app, user_watchlist, state, paper_count_fn, launch_time)

    start_health_server(settings.health_port, launch_time, state, paper_count_fn)
    start_health_server(
        settings.health_port,
        launch_time,
        state,
        paper_count_fn,
        bind_host=settings.health_bind_host,
    )
    log.info("Starting Slack Bolt app on port %d", settings.port)
    bolt_thread = threading.Thread(
        target=app.start,
2 changes: 2 additions & 0 deletions src/paperscout/config.py
@@ -22,6 +22,8 @@ class Settings(BaseSettings):
    slack_bot_token: str = ""
    port: int = 3000
    health_port: int = 8080
    # Empty string means all interfaces (0.0.0.0); Docker Compose sets HEALTH_BIND_HOST=0.0.0.0.
    health_bind_host: str = "127.0.0.1"

    # -- Scheduling --
    poll_interval_minutes: int = 30
7 changes: 4 additions & 3 deletions src/paperscout/health.py
@@ -66,8 +66,9 @@ def start_health_server(
    launch_time: datetime,
    state,
    paper_count_fn: Callable[[], int],
    bind_host: str = "127.0.0.1",
) -> HTTPServer:
    """Start the ``/health`` HTTP server on *port* in a daemon thread."""
    """Start the ``/health`` HTTP server on *bind_host*:*port* in a daemon thread."""

    handler = type(
        "_BoundHealthHandler",
@@ -79,8 +80,8 @@ def start_health_server(
        },
    )

    server = HTTPServer(("", port), handler)
    server = HTTPServer((bind_host, port), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True, name="health")
    thread.start()
    log.info("Health endpoint listening on port %d", port)
    log.info("Health endpoint listening on %s:%d", bind_host, port)
    return server
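
To make the effect of the `bind_host` change concrete, here is a self-contained sketch of a health server bound to an explicit host. It is not the project's `health.py`: the handler, the JSON payload, and the `start_demo_health_server` name are assumptions for illustration, built only on the standard library's `http.server`.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHealthHandler(BaseHTTPRequestHandler):
    """Hypothetical handler; the real payload in health.py may differ."""

    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        body = json.dumps({"status": "ok"}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Silence per-request logging in this demo.
        pass


def start_demo_health_server(port: int, bind_host: str = "127.0.0.1") -> HTTPServer:
    """Serve GET /health on bind_host:port in a daemon thread."""
    server = HTTPServer((bind_host, port), DemoHealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True, name="health").start()
    return server


if __name__ == "__main__":
    # Port 0 asks the OS for a free port; server_address reports the one chosen.
    server = start_demo_health_server(0, bind_host="127.0.0.1")
    host, port = server.server_address[:2]
    conn = http.client.HTTPConnection(host, port, timeout=5)
    conn.request("GET", "/health")
    resp = conn.getresponse()
    print(resp.status, json.loads(resp.read()))
    conn.close()
    server.shutdown()
    server.server_close()
```

With `bind_host="127.0.0.1"` the socket is reachable only from the same network namespace. Passing `"0.0.0.0"` listens on all IPv4 interfaces, which is what a container needs when its ports are published to the host.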
2 changes: 1 addition & 1 deletion tests/test_health.py
@@ -15,7 +15,7 @@ def _find_free_port() -> int:
    import socket

    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("", 0))
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]

