diff --git a/docs/enterprise/index.mdx b/docs/enterprise/index.mdx
index 62fed12..c2cd88f 100644
--- a/docs/enterprise/index.mdx
+++ b/docs/enterprise/index.mdx
@@ -94,9 +94,9 @@ Read More: [LanceDB Enterprise Architecture](/enterprise/architecture/)
### Latency of data retrieval
-With LanceDB OSS, read latency depends heavily on where the data lives. If you use local disk or shared file storage, reads can be quite fast. But if you point an embedded deployment at S3, GCS, or Azure Blob, each read still pays the latency of remote object storage, especially when data is cold.
+With LanceDB OSS, read latency depends heavily on where the data lives. If you use local disk or shared file storage, reads can be quite fast. But if you point an embedded deployment at S3, GCS, or Azure Blob, reads still incur remote object-storage latency, most noticeably when the data is cold.
-LanceDB Enterprise is designed for the object-storage-backed case. It uses NVMe SSDs as a hybrid cache and executes reads across a distributed serving layer, so repeated reads do not always pay the full object-store round trip. The first read fills the cache, subsequent reads can come from local disk, and parallel chunked reads further reduce tail latency. This matters when the application serves interactive dashboards, real-time recommendations, or other latency-sensitive workloads on top of object storage.
+LanceDB Enterprise is designed for the object-storage-backed case. It uses NVMe SSDs as a hybrid cache and spreads reads across a distributed serving layer, so repeated reads can skip the full object-store round trip. The first read fills the cache, subsequent reads can come from local disk, and parallel chunked reads further reduce tail latency. The difference is most visible in interactive dashboards, real-time recommendations, and other latency-sensitive workloads that run on top of object storage.
Read More: [LanceDB Enterprise Benchmarks](/enterprise/benchmarks/)
diff --git a/docs/integrations/lerobotdataset.mdx b/docs/integrations/lerobotdataset.mdx
index c98c77b..5919c02 100644
--- a/docs/integrations/lerobotdataset.mdx
+++ b/docs/integrations/lerobotdataset.mdx
@@ -15,7 +15,7 @@ import {
[LeRobotDataset v3.0](https://huggingface.co/docs/lerobot/lerobot-dataset-v3) standardizes robot learning data across sensorimotor time series, actions, multi-camera video, and task metadata. Its v3 layout stores high-frequency tabular signals in Parquet, visual streams as MP4 shards, and metadata that reconstructs episode-level views from larger files.
-Lance is useful next to LeRobot when you want high-performance random access, lazy multimodal blob reads, and a single table interface for curation, search, and training data preparation. The `lerobot-lancedb` package provides Lance-backed `LeRobotDataset` subclasses, and LanceDB can also open Lance-formatted LeRobot datasets on the Hub directly through `hf://` URIs.
+Lance pairs well with LeRobot when you need high-performance random access, lazy multimodal blob reads, and a single table interface for curation, search, and training data preparation. The `lerobot-lancedb` package ships Lance-backed `LeRobotDataset` subclasses, and LanceDB can open Lance-formatted LeRobot datasets on the Hub directly through `hf://` URIs.
## Install
@@ -25,7 +25,7 @@ pip install lancedb lance lerobot-lancedb
## Use Lance-backed LeRobotDataset loaders
-Use `LeRobotLanceDataset` when your Lance-backed dataset stores decoded image observations. It is intended as a drop-in replacement for `LeRobotDataset`, so existing policy training code can keep using standard PyTorch dataset and dataloader patterns.
+`LeRobotLanceDataset` is useful when your Lance-backed dataset stores decoded image observations. It's a drop-in replacement for `LeRobotDataset`, so existing policy training code keeps working with the usual PyTorch dataset and dataloader patterns.
{PyFrameworksLerobotLancedbImageDataset}
@@ -49,17 +49,17 @@ Lance-formatted LeRobot datasets published by `lance-format` expose each `.lance
{PyFrameworksLerobotOpenLanceTables}
-This is useful when you want to inspect schemas, count rows, sample metadata, or build curation workflows before handing the selected data to a training loop.
+Opening the tables directly is handy for inspecting schemas, counting rows, sampling metadata, or building curation workflows before any data reaches the training loop.
## Filter a frame window
-Robotics workflows often need deterministic slices by `episode_index`, `frame_index`, or task metadata before any model training starts. LanceDB can filter those rows without reading video blobs.
+Robotics workflows often need a deterministic slice by `episode_index`, `frame_index`, or task metadata long before training begins. LanceDB filters those rows without touching the video blobs.
{PyFrameworksLerobotFilterFrames}
-From there, you can materialize a smaller local LanceDB database, add derived columns, attach embeddings, or build vector and scalar indexes for faster repeated access.
+With the filtered set in hand, you can materialize a smaller local LanceDB database, add derived columns, attach embeddings, or build vector and scalar indexes for faster repeated access.
## Example Lance-formatted LeRobot datasets
diff --git a/docs/integrations/stable-worldmodel.mdx b/docs/integrations/stable-worldmodel.mdx
index 0c6a65e..1e884b4 100644
--- a/docs/integrations/stable-worldmodel.mdx
+++ b/docs/integrations/stable-worldmodel.mdx
@@ -15,7 +15,7 @@ import {
The LanceDB integration is built into Stable World Model's data format registry. Lance is the default backend for collected datasets, so a path ending in `.lance` gives you an append-friendly LanceDB table with episode-contiguous rows and fast indexed reads.
-That matters for world model research because the training loop repeatedly samples temporal windows from high-dimensional observations, actions, and rewards. Faster random access means more GPU time is spent training the model instead of waiting on the data loader.
+Random access speed is often a bottleneck for world model training, since the loop repeatedly samples temporal windows from high-dimensional observations, actions, and rewards. The faster those windows arrive, the more GPU time goes into training rather than waiting on the data loader.
## Install
@@ -44,7 +44,7 @@ The dataset loader autodetects the Lance format from the path.
{PyFrameworksStableWorldmodelLoadLance}
-This keeps model code focused on the world model objective while LanceDB handles the storage layout and read path.
+Your model code stays focused on the world model objective while LanceDB handles the storage layout and read path.
## Evaluate with model-predictive control
@@ -57,7 +57,7 @@ Replace `world_model` with the trained model object from your training loop.
## Convert between formats
-Stable World Model can also convert between registered dataset formats. For example, you can collect in Lance for fast training reads, then export to the video layout for compact inspection artifacts.
+Stable World Model can convert between registered dataset formats. A common workflow is to collect in Lance for fast training reads, then export to the video layout for compact inspection artifacts.
{PyFrameworksStableWorldmodelConvert}
@@ -81,7 +81,7 @@ The Stable World Model README reports the following PushT benchmark results from
In that benchmark, local LanceDB reached about **3.4x** the no-cache throughput of local HDF5, while S3-backed LanceDB reached about **350x** the no-cache throughput of S3-backed HDF5. Even with cache enabled, S3-backed LanceDB was about **4.3x** faster than S3-backed HDF5.
-These numbers are reported by the Stable World Model project for its benchmark setup. Treat them as a reproducible directional baseline, not a universal guarantee for every environment, model, or storage configuration.
+These numbers come from the Stable World Model project's own benchmark setup, so they're best read as a reproducible directional baseline that may shift across environments, models, and storage configurations.
## Storage
@@ -93,7 +93,7 @@ The same README reports these local storage sizes for the benchmark dataset:
| LanceDB | 13.31 GB |
| Video | 496.29 MB |
-LanceDB used about **69% less local storage than HDF5** in the reported benchmark, while preserving a table interface that is designed for fast training reads and append-heavy dataset collection.
+LanceDB used about **69% less local storage than HDF5** in the reported benchmark, while preserving a table interface built for fast training reads and append-heavy collection.
## More resources
diff --git a/skills/docs-writer/SKILL.md b/skills/docs-writer/SKILL.md
index 7345685..1a06fd7 100644
--- a/skills/docs-writer/SKILL.md
+++ b/skills/docs-writer/SKILL.md
@@ -14,6 +14,16 @@ Code examples on docs pages are **not** written directly into MDX. They live ins
3. **Snippets must be inside passing tests.** They're extracted from real pytest/vitest/cargo tests. If the test doesn't run, the example is wrong.
4. **Cross-check every non-trivial API claim against the source repo.** If the user names a repo (e.g., *"check in the sophon repo"*, *"verify against lancedb"*), that repo is the source of truth — grep it, cite the file + line, and let the code override prior assumptions. See [Cross-checking docs against source repos](#cross-checking-docs-against-source-repos) for resolution rules.
+## Make it sound like a human
+
+LLM writing often feels formulaic because it leans on the same few phrasings. The goal is to make the docs feel approachable and human, rather than like a dry manual written by a robot. Vary your sentence structures and word choice, and inject a bit of personality where appropriate. The content should be clear and accurate, but also engaging to read.
+
+Avoid the following extremely common patterns:
+- "It's not this, it's that."
+- "Paying the ___ tax" (e.g., "paying the import tax", "paying the setup tax") - the words "pay" and "tax" are heavily overused by AI
+- "LanceDB changes that" - the phrase "changes that" is a common AI crutch
+- "That matters" - state the consequence directly rather than using this overused phrase
+
## Pipeline at a glance
```