From ca5329a369caca74541967cd194b4e865329fb10 Mon Sep 17 00:00:00 2001 From: Nick Woolmer <29717167+nwoolmer@users.noreply.github.com> Date: Mon, 23 Feb 2026 17:35:24 +0000 Subject: [PATCH 1/3] docs: add NFS replication transport guidance and tuning for GCP Expand GCP deployment docs with Filestore, NetApp Volumes, and GKE PersistentVolume guidance for NFS-based replication. Update replication tuning docs with sub-200ms config profiles and replica poll interval setting. Co-Authored-By: Claude Opus 4.6 --- documentation/deployment/gcp.md | 59 ++++++++++++++++++++--- documentation/high-availability/tuning.md | 44 +++++++++++++++-- 2 files changed, 94 insertions(+), 9 deletions(-) diff --git a/documentation/deployment/gcp.md b/documentation/deployment/gcp.md index 7e3deb987..be59034a8 100644 --- a/documentation/deployment/gcp.md +++ b/documentation/deployment/gcp.md @@ -27,9 +27,14 @@ We recommend starting with `C-Series` instances, and reviewing other instance ty You should deploy using an `x86_64` Linux distribution, such as Ubuntu. -For storage, we recommend using [Hyperdisk Balanced](https://cloud.google.com/compute/docs/disks/hyperdisks) disks, +For storage, we recommend using [Hyperdisk Balanced](https://cloud.google.com/compute/docs/disks/hyperdisks) disks, and provisioning them at `5000 IOPS/300 MBps` until you have tested your workload. +:::warning +Hyperdisk Balanced is not supported on all machine types. N2 instances do not +support Hyperdisk. Use N4, C3, or C4 series instances with Hyperdisk Balanced. +::: + `Hyperdisk Extreme` generally requires much higher `vCPU` counts - for example, it cannot be used on `C3` machines smaller than `88 vCPUs`. @@ -38,19 +43,61 @@ smaller than `88 vCPUs`. ### Google Filestore -Google Filestore is a `NAS` solution offering an `NFS` API to talk to arbitrary volumes. +Google Filestore is a managed NFS service that can be used as a replication +transport layer in QuestDB Enterprise. + +Filestore should **not** be used as primary storage for QuestDB. However, it +is well-suited for replication when low latency is required. The `fs::` +transport over NFS provides sub-200ms replication lag with +[aggressive tuning](/docs/high-availability/tuning/), compared to ~1s+ with +object store transport (GCS). + +To use Filestore for replication: + +1. Create a Filestore instance in the same region as your QuestDB VMs +2. Mount the NFS share on both primary and replica nodes +3. Configure the `fs::` transport in `server.conf`: + +```ini +replication.object.store=fs::root=/mnt/questdb-repl/final;atomic_write_dir=/mnt/questdb-repl/scratch; +``` + +Use the [backup](/docs/operations/backup/) feature to manage WAL file retention +on the NFS mount. -This should **not** be used as primary storage for QuestDB. It could be used for replication in QuestDB Enterprise, -but `Google Cloud Storage` is likely simpler and cheaper to use. +On GKE, expose the Filestore share as a `PersistentVolume` with +`ReadWriteMany` access mode using the +[Filestore CSI driver](https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/filestore-csi-driver), +so both primary and replica pods can mount it simultaneously. + +:::note +Filestore Zonal and Basic SSD tiers may require a +[quota increase](https://cloud.google.com/docs/quotas/view-manage) before use. +Basic HDD is typically available by default. +::: ### Google Cloud Storage -QuestDB supports `Google Cloud Storage` as its replication object-store in the Enterprise edition. 
+QuestDB supports Google Cloud Storage as its replication object store in the +Enterprise edition. GCS is the simplest and cheapest replication transport, but +has higher latency (~1s+) due to object store API overhead. -To get started, create a bucket for the database to use. Then follow the +To get started, create a bucket for the database to use. Then follow the [Enterprise Quick Start](/docs/getting-started/enterprise-quick-start/) steps to create a connection string and configure QuestDB. +### NetApp Volumes + +[NetApp Volumes](https://cloud.google.com/netapp/volumes/docs/discover/overview) +is a managed NFS service on GCP backed by NetApp ONTAP. Like Filestore, it can +be used as a low-latency replication transport via the `fs::` prefix. The +QuestDB configuration is identical to Filestore. + +:::note +NetApp Volumes requires enabling the `netapp.googleapis.com` API and may +require separate quota allocation. +::: + ### Minimum specification - **Instance**: `c3-standard-4` or `c3d-standard-4` `(4 vCPUs, 16 GB RAM)` diff --git a/documentation/high-availability/tuning.md b/documentation/high-availability/tuning.md index 3403c064c..a4d7badc3 100644 --- a/documentation/high-availability/tuning.md +++ b/documentation/high-availability/tuning.md @@ -25,15 +25,31 @@ reduced network traffic depending on your needs. ## Quick reference -**For low latency**: +**For low latency (sub-200ms)**: ```ini +# Primary cairo.wal.segment.rollover.size=262144 -replication.primary.throttle.window.duration=1000 +replication.primary.throttle.window.duration=50 replication.primary.sequencer.part.txn.count=5000 + +# Replica +replication.replica.poll.interval=50 +``` + +**For low latency (sub-500ms)**: +```ini +# Primary +cairo.wal.segment.rollover.size=524288 +replication.primary.throttle.window.duration=100 +replication.primary.sequencer.part.txn.count=5000 + +# Replica +replication.replica.poll.interval=100 ``` **For network efficiency**: ```ini +# Primary cairo.wal.segment.rollover.size=2097152 replication.primary.throttle.window.duration=60000 replication.primary.sequencer.part.txn.count=1000 @@ -99,7 +115,9 @@ fill up before upload, reducing redundant uploads (write amplification). | Value | Behavior | |-------|----------| -| `1000` (1s) | Lowest latency, most uploads. | +| `50` (50ms) | Ultra-low latency. Best with NFS transport. | +| `100` (100ms) | Low latency. Good balance for NFS transport. | +| `1000` (1s) | Low latency for object store transport. | | `10000` (10s) | Default. Balanced. | | `60000` (60s) | 1 minute delay OK. Fewer uploads. | | `300000` (5 min) | Cost-sensitive. Batches more data. | @@ -107,6 +125,26 @@ fill up before upload, reducing redundant uploads (write amplification). This is your **maximum replication latency tolerance**. QuestDB still actively manages replication to prevent backlogs during bursts. +### Replica poll interval + +```ini +replication.replica.poll.interval=1000 # 1 second (default) +``` + +How often the replica checks the transport layer for new data. This setting +is configured on the **replica** node. + +| Value | Behavior | +|-------|----------| +| `50` (50ms) | Ultra-low latency. Pair with aggressive primary settings. | +| `100` (100ms) | Low latency. Good for NFS transport. | +| `1000` (1s) | Default. Balanced. | + +:::note +Reducing the poll interval below the throttle window duration has diminishing +returns, since the replica cannot consume data faster than the primary produces it. 
+::: + ### Sequencer part size ```ini From 623a2f1ba643a6c3577236a928bd9e1985fd78af Mon Sep 17 00:00:00 2001 From: Nick Woolmer <29717167+nwoolmer@users.noreply.github.com> Date: Tue, 24 Feb 2026 19:01:42 +0000 Subject: [PATCH 2/3] docs: revamp replication tuning page structure and pricing guidance Restructure the page so the most actionable content is at the top: - Add "three settings that matter" summary table - Add copy-paste configuration profiles (sub-200ms, sub-500ms, default, network efficiency) - Add transport cost vs latency section with cloud-agnostic breakeven formula - Move detailed settings docs to a reference section lower on the page - Add advanced settings table for power-user knobs - Add GCS-specific latency floor note - Fix screenshot titles to match "balanced" profile naming - Fix compression section to cross-link configurable settings Co-Authored-By: Claude Opus 4.6 --- documentation/high-availability/tuning.md | 196 ++++++++++++++++------ 1 file changed, 146 insertions(+), 50 deletions(-) diff --git a/documentation/high-availability/tuning.md b/documentation/high-availability/tuning.md index a4d7badc3..23c90fcff 100644 --- a/documentation/high-availability/tuning.md +++ b/documentation/high-availability/tuning.md @@ -12,56 +12,159 @@ import { EnterpriseNote } from "@site/src/components/EnterpriseNote" Tune replication for lower latency or reduced network costs. -Replication tuning lets you balance **latency** against **network costs**. By -default, QuestDB uses balanced settings. You can tune for lower latency or -reduced network traffic depending on your needs. +Three settings control replication latency. The main decision is your transport +layer — **object store** (S3, GCS, Azure Blob) is simplest and cheapest at rest, +while **NFS** (EFS, Filestore, Azure Files, NetApp) removes per-operation costs +and unlocks sub-second latency. Pick a transport, choose a profile below, and +restart. -## When to tune +## The three settings that matter -| Goal | Approach | -|------|----------| -| **Low latency** | Smaller WAL segments, shorter throttle windows. | -| **Lower network costs** | Larger WAL segments, longer throttle windows. | +| Setting | Node | Default | What it does | +|---------|------|---------|-------------| +| `replication.primary.throttle.window.duration` | Primary | `10000` (10s) | Maximum time before an incomplete WAL segment is flushed | +| `replication.replica.poll.interval` | Replica | `1000` (1s) | How often the replica checks for new data | +| `cairo.wal.segment.rollover.size` | Primary | `2097152` (2 MiB) | Max WAL segment size before rollover | -## Quick reference +A segment is uploaded when **either** the size limit or the throttle window is +reached, whichever comes first. Under heavy write load, segments fill and flush +well before the throttle window expires. Under light load, the throttle window +controls when the partially-filled segment is flushed. 
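+
+To see which trigger dominates for your workload, here is a back-of-the-envelope
+sketch (the `flush_cadence_ms` helper and the example ingest rates are
+illustrative assumptions, not part of QuestDB):
+
+```python
+# Approximate interval between segment flushes for one table, assuming a
+# steady, uncompressed WAL byte rate. Real ingestion is bursty, so treat
+# this as an order-of-magnitude guide only.
+def flush_cadence_ms(bytes_per_sec: float,
+                     rollover_bytes: int = 2 * 1024 * 1024,  # default 2 MiB
+                     throttle_window_ms: int = 10_000) -> float:
+    ms_to_fill = rollover_bytes / bytes_per_sec * 1000
+    # Whichever limit is hit first triggers the flush.
+    return min(ms_to_fill, throttle_window_ms)
+
+print(flush_cadence_ms(5_000_000))  # ~419 ms: a 5 MB/s feed is size-triggered
+print(flush_cadence_ms(10_000))     # 10000 ms: a 10 KB/s trickle waits for the window
+```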
+ +## Configuration profiles + +### Sub-200ms latency (NFS transport) -**For low latency (sub-200ms)**: ```ini # Primary cairo.wal.segment.rollover.size=262144 replication.primary.throttle.window.duration=50 -replication.primary.sequencer.part.txn.count=5000 # Replica replication.replica.poll.interval=50 ``` -**For low latency (sub-500ms)**: +### Sub-500ms latency (NFS or object store) + ```ini # Primary cairo.wal.segment.rollover.size=524288 replication.primary.throttle.window.duration=100 -replication.primary.sequencer.part.txn.count=5000 # Replica replication.replica.poll.interval=100 ``` -**For network efficiency**: +### Default / balanced + +No configuration needed. The defaults are: + +- `replication.primary.throttle.window.duration=10000` (10s) +- `replication.replica.poll.interval=1000` (1s) +- `cairo.wal.segment.rollover.size=2097152` (2 MiB) + +### Network efficiency + ```ini # Primary cairo.wal.segment.rollover.size=2097152 replication.primary.throttle.window.duration=60000 -replication.primary.sequencer.part.txn.count=1000 ``` +## Choosing a transport: cost vs latency + +{/* Pricing sources — verify periodically against your cloud provider: + GCS: https://cloud.google.com/storage/pricing + Filestore: https://cloud.google.com/filestore/pricing + NetApp (GCP): https://cloud.google.com/netapp/volumes/pricing + AWS S3: https://aws.amazon.com/s3/pricing/ + AWS EFS: https://aws.amazon.com/efs/pricing/ + Azure Blob: https://azure.microsoft.com/en-us/pricing/details/storage/blobs/ + Azure Files: https://azure.microsoft.com/en-us/pricing/details/storage/files/ + Azure NetApp: https://azure.microsoft.com/en-us/pricing/details/netapp/ +*/} + +### Object store (S3, GCS, Azure Blob) + +- **Per-request pricing**: every WAL upload is a write op, every replica poll is + a read op +- Lower latency settings = more ops = higher cost +- Best for: simplest setup, low storage cost, moderate latency tolerance +- Storage cost: ~$20/TB/month across major clouds + +:::note[GCP users] +Replication over GCS has a latency floor of roughly 1 second. If you need +sub-second replication on GCP, use an NFS transport such as Filestore or +NetApp Volumes instead. +::: + +### NFS / managed file storage (EFS, Filestore, Azure Files, NetApp) + +- **Fixed monthly cost** regardless of how aggressively you tune +- No per-operation charges — poll every 50ms at no extra cost +- Best for: low-latency requirements, high-throughput ingestion +- Storage cost: ~$60–300/TB/month depending on service tier and provider +- NFS is usually priced by provisioned capacity, not usage — you pay for the + full volume whether it's 10% or 100% full + +### The cost tradeoff + +The storage cost gap (object store at ~$20/TB vs NFS at $60–300/TB) looks large, +but the replication working set — WAL files in transit — is typically well under +1 TB. At that scale the per-TB premium is modest in absolute terms. + +The real cost difference is **operations**. With object store, every flush and +every poll is a billable request. Each actively-written table generates one write +op per throttle window and one read op per poll interval. Across major clouds, +write ops typically cost ~$5/million and read ops ~$0.40/million. + +**Object store ops cost per active table:** + +| Throttle / poll interval | Ops cost per table per month | +|---|---| +| 50ms / 50ms | ~$280 | +| 100ms / 100ms | ~$140 | +| 1s / 1s | ~$14 | +| 10s / 1s (default) | ~$2 | + +Multiply by the number of tables being actively written to. 
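+
+The arithmetic behind these figures is easy to check. A minimal sketch (the
+`ops_cost_per_table` helper and its default per-million prices are assumptions
+taken from the rough cross-cloud figures above, not any provider's actual
+price sheet):
+
+```python
+# Rough object-store API cost per actively-written table per month:
+# one segment upload per throttle window, one poll per poll interval.
+MS_PER_MONTH = 30 * 24 * 3600 * 1000  # ~2.59e9 ms
+
+def ops_cost_per_table(throttle_ms: int, poll_ms: int,
+                       write_usd_per_million: float = 5.00,
+                       read_usd_per_million: float = 0.40) -> float:
+    writes = MS_PER_MONTH / throttle_ms  # uploads by the primary
+    reads = MS_PER_MONTH / poll_ms       # polls by the replica
+    return (writes * write_usd_per_million + reads * read_usd_per_million) / 1e6
+
+print(round(ops_cost_per_table(50, 50)))         # ~280 USD
+print(round(ops_cost_per_table(100, 100)))       # ~140 USD
+print(round(ops_cost_per_table(10_000, 1_000)))  # ~2 USD (the defaults)
+```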
+With 10 tables at 100ms intervals, that's ~$1,400/month in API charges alone.
+With NFS, that same configuration costs nothing extra.
+
+The rough breakeven:
+
+> **ops cost per month** ≈ active tables × $14,000 / interval_ms
+>
+> (assuming the throttle window and poll interval are both set to `interval_ms`)
+>
+> If that exceeds the NFS premium over object storage (typically $40–180/TB/mo
+> × your working set in TB), **NFS is cheaper**.
+
+At default settings with a handful of tables, object store wins easily. Once you
+push below ~200ms intervals or have many actively-written tables, NFS pays for
+itself on API savings alone — and you get lower latency as a bonus.
+
+:::note
+For long-term data retention (cold/archive tier), object storage is always
+significantly cheaper and should be used regardless of your replication
+transport choice.
+:::
+
+### Summary
+
+| | Object store | NFS / file storage |
+|---|---|---|
+| Pricing model | Per-request + per-GB stored | Fixed monthly (provisioned) |
+| Storage cost | ~$20/TB/mo | ~$60–300/TB/mo |
+| Cost of aggressive tuning | Higher (more ops) | No change |
+| Setup complexity | Low | Medium (mount on all nodes) |
+| Best for | Default settings, few tables | Sub-second latency, many tables |
+
 ## How replication works
 
 Understanding the data flow helps you tune effectively:
 
-1. **Ingestion** - Data is written to Write-Ahead Log (WAL) segments
-2. **Upload** - WAL segments are uploaded to object storage
-3. **Download** - Replicas download and apply WAL segments
+1. **Ingestion** — Data is written to Write-Ahead Log (WAL) segments
+2. **Upload** — WAL segments are flushed to the transport (object store or NFS)
+3. **Download** — Replicas poll the transport and apply new WAL segments
 
 The key insight: **smaller, more frequent uploads = lower latency but more
 network traffic**. Larger, less frequent uploads = higher latency but lower
@@ -69,7 +172,7 @@ costs.
 
-## Settings explained
+## Settings reference
 
 ### WAL segment size
 
@@ -91,8 +194,10 @@ costs.
 cairo.wal.segment.rollover.size=2097152
 ```
 
-Controls when WAL segments are closed and uploaded. Smaller segments upload
-sooner (lower latency) but create more files.
+Controls the size threshold at which WAL segments are closed and uploaded.
+Smaller segments upload sooner (lower latency) but create more files. Works in
+tandem with the throttle window — whichever limit is hit first triggers the
+upload.
 
 | Value | Behavior |
 |-------|----------|
@@ -110,8 +215,9 @@ Tiering requires files over 128 KiB.
 replication.primary.throttle.window.duration=10000 # 10 seconds (default)
 ```
 
-Maximum time before uploading incomplete segments. Longer windows let segments
-fill up before upload, reducing redundant uploads (write amplification).
+Maximum time before uploading an incomplete segment. If a segment hasn't reached
+the rollover size within this window, it is flushed anyway. Longer windows let
+segments fill up before upload, reducing redundant uploads (write amplification).
 
 | Value | Behavior |
 |-------|----------|
@@ -145,41 +251,31 @@ Reducing the poll interval below the throttle window duration has diminishing
 returns, since the replica cannot consume data faster than the primary produces it.
 :::
 
-### Sequencer part size
-
-```ini
-replication.primary.sequencer.part.txn.count=5000
-```
-
-Controls how many transactions are grouped into each sequencer part file.
-
-Instead of uploading the entire transaction log on every replication cycle
-(which grows indefinitely), the sequencer is split into fixed-size part files.
-Only new or changed parts are uploaded, significantly reducing network overhead. +## Advanced settings -| Value | Effect | -|-------|--------| -| Lower (e.g. `1000`) | Smaller part files, more frequent new parts, more object storage requests, faster incremental uploads. | -| Higher (e.g. `5000`) | Larger part files, fewer parts, fewer object storage requests, larger per-upload size. | +These settings are available for power users but rarely need adjustment: -Default is `5000` (each part ~34-68 KiB compressed). - -:::warning -This setting is **fixed at table creation**. You cannot change it for existing -tables. -::: +| Setting | Default | Description | +|---------|---------|-------------| +| `replication.primary.sequencer.part.txn.count` | `5000` | Transactions per sequencer part file. Lower values mean smaller parts and faster incremental uploads but more storage requests. **Fixed at table creation** — cannot be changed for existing tables. | +| `replication.primary.compression.level` | `1` | Zstd compression level for WAL uploads. Higher values reduce transfer size at the cost of CPU. | +| `replication.primary.compression.threads` | `2` | Number of threads used for compressing WAL data before upload. | +| `replication.requests.max.concurrent` | `32` | Maximum concurrent replication requests (uploads and downloads). | +| `replication.requests.retry.attempts` | `3` | Number of retry attempts for failed replication requests. | +| `replication.requests.retry.interval` | `500` | Milliseconds between retry attempts. | ## Compression (reference) -WAL data is compressed before upload. This isn't tunable, but useful for -estimating storage and network requirements: +WAL data is compressed before upload (the level and thread count are configurable +in [Advanced settings](#advanced-settings) above). The typical ratios are useful +for estimating storage and network requirements: -| Data type | Compression ratio | -|-----------|-------------------| +| Data type | Typical compression ratio | +|-----------|---------------------------| | WAL segments | ~8x | | Sequencer parts | ~6x | -For example, a 2 MiB WAL segment becomes ~256 KiB in object storage. +For example, a 2 MiB WAL segment becomes ~256 KiB in the transport layer. ## Next steps From 77bc8912380f80cef8d256075ae1346ad6fd5c81 Mon Sep 17 00:00:00 2001 From: Nick Woolmer <29717167+nwoolmer@users.noreply.github.com> Date: Tue, 24 Feb 2026 19:13:54 +0000 Subject: [PATCH 3/3] docs: escape dollar signs to prevent LaTeX rendering in tuning page Co-Authored-By: Claude Opus 4.6 --- documentation/high-availability/tuning.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/documentation/high-availability/tuning.md b/documentation/high-availability/tuning.md index 23c90fcff..5a25a20a9 100644 --- a/documentation/high-availability/tuning.md +++ b/documentation/high-availability/tuning.md @@ -109,14 +109,14 @@ NetApp Volumes instead. ### The cost tradeoff -The storage cost gap (object store at ~$20/TB vs NFS at $60–300/TB) looks large, +The storage cost gap (object store at ~\$20/TB vs NFS at \$60–300/TB) looks large, but the replication working set — WAL files in transit — is typically well under 1 TB. At that scale the per-TB premium is modest in absolute terms. The real cost difference is **operations**. With object store, every flush and every poll is a billable request. Each actively-written table generates one write op per throttle window and one read op per poll interval. 
Across major clouds, -write ops typically cost ~$5/million and read ops ~$0.40/million. +write ops typically cost ~\$5/million and read ops ~\$0.40/million. **Object store ops cost per active table:** @@ -153,7 +153,7 @@ transport choice. | | Object store | NFS / file storage | |---|---|---| | Pricing model | Per-request + per-GB stored | Fixed monthly (provisioned) | -| Storage cost | ~$20/TB/mo | ~$60–300/TB/mo | +| Storage cost | ~\$20/TB/mo | ~\$60–300/TB/mo | | Cost of aggressive tuning | Higher (more ops) | No change | | Setup complexity | Low | Medium (mount on all nodes) | | Best for | Default settings, few tables | Sub-second latency, many tables |