chcopy copies a curated slice of data from a source ClickHouse instance into a local one, driven by a YAML config. A soft guard warns and pauses when the target doesn't look local.
When you need real data on a local ClickHouse — a recent slice of production for debugging, or a stable fixture for tests — ad-hoc INSERT INTO ... SELECT ... FROM remoteSecure(...) scripts proliferate. chcopy replaces them with a single declarative YAML: named sets of tables and WHERE clauses, repeatable across machines, with a soft guard that warns and pauses when the configured target doesn't look local.
Common use cases:
- Local development — seed a local ClickHouse with a recent prod slice so feature work runs against realistic data.
- CI fixtures — pin a stable subset of prod data into a container for integration tests.
- Schema-paired data — combine with
chsyncto bring both schema and a data slice to local in two commands.
- A reachable source ClickHouse (typically prod) accessible from your local instance via
remoteSecure(). - A running local ClickHouse instance.
go install github.com/anytoe/chcopy@latestOr build from source:
git clone https://github.com/anytoe/chcopy.git
cd chcopy
go build -o chcopy .chcopy --config examples/config.yaml --name dev_importFor a full end-to-end walkthrough with two Dockerized ClickHouse instances, see examples/README.md.
connection:
dial_timeout: 30s # clickhouse-go default
read_timeout: 5m # clickhouse-go default
import_configurations:
- name: dev_import
tables:
- table: shop.users
where: ""
truncate: true
- table: shop.orders
where: "WHERE created_at >= '2026-05-01'"
truncate: trueTop-level connection (optional) — Go-client tunables for the local connection. Omit either field to use clickhouse-go defaults.
dial_timeout— TCP/TLS handshake timeout (Go default: 30s).read_timeout— max wait for a single Read from ClickHouse on an established connection. Not a query wall-clock cap; for that, set the server-sidemax_execution_timesetting.
Values use Go duration syntax: 30s, 5m, 2h.
Per import configuration:
name— unique within the file.table— fully qualifieddb.table. Must already exist in the local instance.where— rawWHERE ...clause, or empty string for full table.truncate— if true,TRUNCATE TABLEruns locally before insert.
Connection details live in env vars, not the YAML. All eight are required:
| Variable | Description |
|---|---|
CHCOPY_LOCAL_HOST |
Local ClickHouse host (must look local — see Safety guards) |
CHCOPY_LOCAL_PORT |
Local native port (9000 plain, 9440 TLS) |
CHCOPY_LOCAL_USER |
Local username |
CHCOPY_LOCAL_PASSWORD |
Local password |
CHCOPY_SOURCE_HOST |
Source ClickHouse host (typically prod) |
CHCOPY_SOURCE_PORT |
Source native port (9000 plain, 9440 TLS) |
CHCOPY_SOURCE_USER |
Source username |
CHCOPY_SOURCE_PASSWORD |
Source password |
| Flag | Description |
|---|---|
--config <path> |
YAML config file (required) |
--name <name> |
Named configuration to run (required unless --list) |
--list |
Print available config names and exit |
--dry-run |
Print SQL without executing |
Tables run sequentially in declared YAML order. For each table:
- Print source row count for the slice (with the WHERE).
- Print local row count before.
- If
truncate: true,TRUNCATE TABLElocally. INSERT INTO <table> SELECT * FROM remote(<source>, <table>, user, pw) <WHERE...>.- Print local row count after.
All SQL executes on the local server; the source is reached via remote(), or remoteSecure() when CHCOPY_SOURCE_PORT=9440 (ClickHouse's TLS native port). The same heuristic applies to the local connection — port 9440 ⇒ TLS, port 9000 ⇒ plain. Failures abort immediately — no partial-success bookkeeping.
chcopy does not block on the write target — you are responsible for pointing it at the right server. As a soft guard, if CHCOPY_LOCAL_HOST does not resolve to localhost or a private / docker-bridge address (10/8, 172.16/12, 192.168/16), chcopy prints a warning and pauses for 10 seconds before proceeding. Ctrl-C aborts.
Required env vars (CHCOPY_LOCAL_* and CHCOPY_SOURCE_*) must still be set, since without them there is nothing to connect to.
- Profiles — a top-level
profiles:section that composes named import configurations. Example: profilelocaldevbundlesusers+sales; profilesalesis just thesalesconfig. Run withchcopy --config ... --profile localdev. - Column projection & masking — per-table column allowlist/denylist, plus simple transforms (e.g. hash an
emailcolumn) so PII never reaches local. - Sampling —
limit: Norsample: 0.1per table for fast smoke imports. - Parallel table copies — opt-in worker count to copy independent tables concurrently.
- CLI table filters —
--only <table>/--skip <table>to run a subset of a named config without editing YAML. - Schema pre-flight — diff source vs local columns (name, type, order) before insert and abort with a clear message if they drift, pointing at
chsync. - Configurable query settings — pass arbitrary ClickHouse
SETTINGS(e.g.max_threads,max_memory_usage,max_execution_time) on the import query, set globally or per table. - Batch import — split a single table copy into chunks (by primary key range, date bucket, or row count) so large tables can resume on failure and avoid blowing memory on the local server.
Not battle-tested. Review your config and the printed SQL (use --dry-run) before running. No warranty of any kind.