From 958290d166f4ebccaf697e1956a3cedf6896d46e Mon Sep 17 00:00:00 2001 From: kanthi subramanian Date: Thu, 14 May 2026 13:09:54 -0500 Subject: [PATCH] Added docs on scaling from 1 node to 3. --- docs/etcd-backup-restore-upgrade-3-node.md | 294 ++++++++++ docs/etcd-upgrade-etcd-cluster.md | 225 ++++++++ docs/ice-rest-catalog-k8s-1node.yaml | 610 +++++++++++++++++++++ ice-rest-catalog/README.md | 2 + 4 files changed, 1131 insertions(+) create mode 100644 docs/etcd-backup-restore-upgrade-3-node.md create mode 100644 docs/etcd-upgrade-etcd-cluster.md create mode 100644 docs/ice-rest-catalog-k8s-1node.yaml diff --git a/docs/etcd-backup-restore-upgrade-3-node.md b/docs/etcd-backup-restore-upgrade-3-node.md new file mode 100644 index 0000000..57987dd --- /dev/null +++ b/docs/etcd-backup-restore-upgrade-3-node.md @@ -0,0 +1,294 @@ + +## Alternative: Backup & restore to a fresh 3-node cluster + +If live-scaling fails or you prefer a clean migration with downtime, use this +approach. It works well for ice-rest-catalog because etcd stores only namespace +and table metadata pointers — actual table data lives in S3/MinIO. + +**Prerequisites**: `etcdctl` and `etcdutl` installed locally (same v3.5.x as the +cluster). Install via `brew install etcd` on macOS or download from +https://github.com/etcd-io/etcd/releases. + + + ### Deploy ice-rest-catalog (3 replicas), etcd (1 node), and MinIO + +```bash +kubectl apply -f ice-rest-catalog-k8s-1node.yaml +``` + +### Set up port-forwarding for ice-rest-catalog and MinIO + +```bash +kubectl port-forward -n iceberg-system svc/ice-rest-catalog 8181:8181 & +kubectl port-forward -n iceberg-system svc/minio 9000:9000 & +``` + +### Insert data using ice CLI + +```bash +ice insert -p flowers.iris file://iris.parquet +``` + +### Verify keys in etcd + +```bash +kubectl port-forward -n iceberg-system pod/etcd-0 2379:2379 & +etcdctl --endpoints=http://127.0.0.1:2379 get '' --prefix +``` + +Expected output: + +``` +n/flowers +{} +t/flowers/iris +{"table_type":"ICEBERG","metadata_location":"s3://warehouse/flowers/iris/metadata/..."} +``` + +### Verify the 1-node cluster is healthy + +```bash +kubectl exec -n iceberg-system etcd-0 -- etcdctl endpoint health +kubectl exec -n iceberg-system etcd-0 -- etcdctl member list -w table + +127.0.0.1:2379 is healthy: successfully committed proposal: took = 1.508375ms ++------------------+---------+--------+----------------------------------------------------------+----------------------------------------------------------+------------+ +| ID | STATUS | NAME | PEER ADDRS | CLIENT ADDRS | IS LEARNER | ++------------------+---------+--------+----------------------------------------------------------+----------------------------------------------------------+------------+ +| 65f9fec08921c158 | started | etcd-0 | http://etcd-0.etcd.iceberg-system.svc.cluster.local:2380 | http://etcd-0.etcd.iceberg-system.svc.cluster.local:2379 | false | ++------------------+---------+--------+----------------------------------------------------------+----------------------------------------------------------+------------+ +``` + + +### Step 1: Take a snapshot of the 1-node cluster + +```bash +kubectl port-forward -n iceberg-system pod/etcd-0 2379:2379 & +PF_PID=$! 
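+# Give the port-forward a moment to establish before querying etcd through it.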
+sleep 2 + +etcdctl --endpoints=http://127.0.0.1:2379 endpoint health +etcdctl --endpoints=http://127.0.0.1:2379 get '' --prefix --keys-only + +etcdctl --endpoints=http://127.0.0.1:2379 snapshot save etcd-backup.db +etcdutl snapshot status etcd-backup.db -w table + +kill $PF_PID +``` + +### Step 2: Stop ice-rest-catalog (prevents writes during migration) + +```bash +kubectl scale deployment ice-rest-catalog -n iceberg-system --replicas=0 +``` + +### Step 3: Restore snapshot locally for all 3 members + +`etcdutl snapshot restore` creates a separate data directory for each member +with the correct cluster membership baked in. + +Make sure `restore` is empty. + +```bash +INITIAL_CLUSTER="etcd-0=http://etcd-0.etcd.iceberg-system.svc.cluster.local:2380,etcd-1=http://etcd-1.etcd.iceberg-system.svc.cluster.local:2380,etcd-2=http://etcd-2.etcd.iceberg-system.svc.cluster.local:2380" +CLUSTER_TOKEN="etcd-cluster-iceberg" + +etcdutl snapshot restore etcd-backup.db \ + --name etcd-0 \ + --initial-cluster "$INITIAL_CLUSTER" \ + --initial-cluster-token "$CLUSTER_TOKEN" \ + --initial-advertise-peer-urls http://etcd-0.etcd.iceberg-system.svc.cluster.local:2380 \ + --data-dir ./restore/etcd-0 + +etcdutl snapshot restore etcd-backup.db \ + --name etcd-1 \ + --initial-cluster "$INITIAL_CLUSTER" \ + --initial-cluster-token "$CLUSTER_TOKEN" \ + --initial-advertise-peer-urls http://etcd-1.etcd.iceberg-system.svc.cluster.local:2380 \ + --data-dir ./restore/etcd-1 + +etcdutl snapshot restore etcd-backup.db \ + --name etcd-2 \ + --initial-cluster "$INITIAL_CLUSTER" \ + --initial-cluster-token "$CLUSTER_TOKEN" \ + --initial-advertise-peer-urls http://etcd-2.etcd.iceberg-system.svc.cluster.local:2380 \ + --data-dir ./restore/etcd-2 +``` + +You should now have three directories: `./restore/etcd-0`, `./restore/etcd-1`, +`./restore/etcd-2`, each containing a `member/` subdirectory. + +### Step 4: Tear down the 1-node etcd + +```bash +kubectl scale statefulset etcd -n iceberg-system --replicas=0 +kubectl wait --for=delete pod/etcd-0 -n iceberg-system --timeout=60s +kubectl delete pvc etcd-data-etcd-0 -n iceberg-system +``` + +### Step 5: Update the StatefulSet for 3 nodes and create PVCs + +```bash +kubectl set env statefulset/etcd -n iceberg-system \ + ETCD_INITIAL_CLUSTER="etcd-0=http://etcd-0.etcd.iceberg-system.svc.cluster.local:2380,etcd-1=http://etcd-1.etcd.iceberg-system.svc.cluster.local:2380,etcd-2=http://etcd-2.etcd.iceberg-system.svc.cluster.local:2380" \ + ETCD_INITIAL_CLUSTER_STATE=existing + +kubectl scale statefulset etcd -n iceberg-system --replicas=3 +sleep 5 +kubectl scale statefulset etcd -n iceberg-system --replicas=0 + +kubectl wait --for=delete pod/etcd-0 pod/etcd-1 pod/etcd-2 \ + -n iceberg-system --timeout=60s + + +kubectl get pvc -n iceberg-system -l app.kubernetes.io/name=etcd +``` + +### Step 6: Copy restored data into each PVC + +For each member, run a temporary busybox pod that mounts the PVC, copy the +restored data in, then clean up. 
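+
+Before running the loop below, you can optionally sanity-check the directories
+restored in Step 3 -- each should contain a `member/` directory with `snap/` and
+`wal/` subdirectories (a minimal check, assuming the `./restore` paths used above):
+
+```bash
+# Quick look at the locally restored data directories from Step 3.
+for i in 0 1 2; do
+  echo "--- restore/etcd-${i} ---"
+  ls ./restore/etcd-${i}/member
+done
+```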
+ +```bash +for i in 0 1 2; do + echo "--- Populating PVC for etcd-${i} ---" + + kubectl run etcd-restore-${i} --namespace=iceberg-system \ + --image=busybox:1.36 --restart=Never \ + --overrides="{ + \"spec\": { + \"containers\": [{ + \"name\": \"restore\", + \"image\": \"busybox:1.36\", + \"command\": [\"sleep\", \"3600\"], + \"volumeMounts\": [{ + \"name\": \"etcd-data\", + \"mountPath\": \"/var/lib/etcd\" + }] + }], + \"volumes\": [{ + \"name\": \"etcd-data\", + \"persistentVolumeClaim\": { + \"claimName\": \"etcd-data-etcd-${i}\" + } + }] + } + }" + + kubectl wait --for=condition=Ready pod/etcd-restore-${i} \ + -n iceberg-system --timeout=60s + + # Clear any leftover data and copy restored member directory in + kubectl exec -n iceberg-system etcd-restore-${i} -- rm -rf /var/lib/etcd/member + kubectl cp ./restore/etcd-${i}/member iceberg-system/etcd-restore-${i}:/var/lib/etcd/member + + kubectl delete pod etcd-restore-${i} -n iceberg-system + echo "--- etcd-${i} PVC populated ---" +done +``` + +### Step 7: Start the 3-node cluster + +```bash +kubectl scale statefulset etcd -n iceberg-system --replicas=3 +kubectl rollout status statefulset/etcd -n iceberg-system --timeout=120s + +# Verify all 3 members are healthy +kubectl exec -n iceberg-system etcd-0 -- etcdctl member list -w table + +kubectl exec -n iceberg-system etcd-0 -- etcdctl endpoint health \ + --endpoints=http://etcd-0.etcd.iceberg-system.svc.cluster.local:2379,http://etcd-1.etcd.iceberg-system.svc.cluster.local:2379,http://etcd-2.etcd.iceberg-system.svc.cluster.local:2379 + +# Verify data is present +kubectl exec -n iceberg-system etcd-0 -- etcdctl get '' --prefix --keys-only +``` + +### Step 8: Update ice-rest-catalog config and restart + +```bash +kubectl edit configmap ice-rest-catalog-config -n iceberg-system +``` + +Change the `uri` line to include all 3 endpoints: + +```yaml +uri: etcd:http://etcd-0.etcd.iceberg-system.svc.cluster.local:2379,http://etcd-1.etcd.iceberg-system.svc.cluster.local:2379,http://etcd-2.etcd.iceberg-system.svc.cluster.local:2379 +``` + +Bring ice-rest-catalog back up: + +```bash +kubectl scale deployment ice-rest-catalog -n iceberg-system --replicas=3 +kubectl rollout status deployment/ice-rest-catalog -n iceberg-system +``` + +### Step 9: Verify end-to-end + +```bash +# Verify data is replicated across all 3 etcd nodes +kubectl exec -n iceberg-system etcd-0 -- etcdctl get '' --prefix --keys-only +kubectl exec -n iceberg-system etcd-1 -- etcdctl get '' --prefix --keys-only +kubectl exec -n iceberg-system etcd-2 -- etcdctl get '' --prefix --keys-only + +# Verify ice-rest-catalog can read the data +kubectl port-forward -n iceberg-system svc/ice-rest-catalog 8181:8181 & +curl -s http://localhost:8181/v1/namespaces | jq . +``` + +### Insert newer data +``` +ice insert -p flowers.iris2 iris.parquet +2026-05-14 12:12:56 [-5-thread-1/62876] INFO c.a.i.c.internal.cmd.Insert > iris.parquet: processing +2026-05-14 12:12:56 [-5-thread-1/62876] INFO c.a.i.c.internal.cmd.Insert > iris.parquet: copying to s3://warehouse/flowers/iris2/data/1778778776420-be3107e7c089b22d28aca3d577b1510d2920d8d0e9e096d52e09541898ebbdc1.parquet +2026-05-14 12:12:57 [-5-thread-1/62876] INFO c.a.i.c.internal.cmd.Insert > iris.parquet: adding data file (copy took 1s) +2026-05-14 12:12:57 [main/62876] INFO o.a.i.SnapshotProducer > Committed snapshot 1973195689872914758 (MergeAppend) +``` + +### Verify if the newer and existing data exists. 
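+
+In addition to checking etcd directly (output below), you can list the tables
+through the catalog REST API -- assuming the `ice-rest-catalog` port-forward from
+Step 9 is still running and the namespace is `flowers`:
+
+```bash
+# Both iris and iris2 should appear in the table list.
+curl -s http://localhost:8181/v1/namespaces/flowers/tables | jq .
+```
+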
+``` +kubectl exec -n iceberg-system etcd-0 -- etcdctl get '' --prefix --keys-only +kubectl exec -n iceberg-system etcd-1 -- etcdctl get '' --prefix --keys-only +kubectl exec -n iceberg-system etcd-2 -- etcdctl get '' --prefix --keys-only +Defaulted container "etcd" out of: etcd, backup-helper +n/flowers + +t/flowers/iris + +t/flowers/iris2 + +Defaulted container "etcd" out of: etcd, backup-helper +n/flowers + +t/flowers/iris + +t/flowers/iris2 + +Defaulted container "etcd" out of: etcd, backup-helper +n/flowers + +t/flowers/iris + +t/flowers/iris +``` + +### Cleanup + +```bash +rm -rf ./restore etcd-backup.db +``` + +## Summary (live scaling) + +``` +member add etcd-1 --> scale to 2 --> wait healthy --> member add etcd-2 --> scale to 3 --> wait healthy +``` + +The key rule: **always `member add` before the pod starts, never after.** + +## Summary (backup & restore) + +``` +snapshot save --> restore locally for 3 members --> tear down 1-node --> copy data into PVCs --> start 3-node --> update config +``` + +Preferred when live-scaling has failed or you want a clean-slate cluster. diff --git a/docs/etcd-upgrade-etcd-cluster.md b/docs/etcd-upgrade-etcd-cluster.md new file mode 100644 index 0000000..4ca3df1 --- /dev/null +++ b/docs/etcd-upgrade-etcd-cluster.md @@ -0,0 +1,225 @@ +# Scaling etcd from 1 node to 3 nodes in Kubernetes + +This guide walks through expanding a running 1-node etcd StatefulSet to a 3-node +cluster for ice-rest-catalog, without downtime or data loss. + +## Prerequisites + +- A running 1-node etcd StatefulSet (`etcd-0`) in namespace `iceberg-system` + deployed via `ice-rest-catalog-k8s-1node.yaml` +- `kubectl` access to the cluster +- The headless service `etcd` already exists with peer port 2380 + +## Initial setup (1-node) + +### Step 1: Deploy ice-rest-catalog (3 replicas), etcd (1 node), and MinIO + +```bash +kubectl apply -f ice-rest-catalog-k8s-1node.yaml +``` + +### Step 2: Set up port-forwarding for ice-rest-catalog and MinIO + +```bash +kubectl port-forward -n iceberg-system svc/ice-rest-catalog 8181:8181 & +kubectl port-forward -n iceberg-system svc/minio 9000:9000 & +``` + +### Step 3: Insert data using ice CLI + +```bash +ice insert -p flowers.iris file://iris.parquet +``` + +### Step 4: Verify keys in etcd + +```bash +kubectl port-forward -n iceberg-system pod/etcd-0 2379:2379 & +etcdctl --endpoints=http://127.0.0.1:2379 get '' --prefix +``` + +Expected output: + +``` +n/flowers +{} +t/flowers/iris +{"table_type":"ICEBERG","metadata_location":"s3://warehouse/flowers/iris/metadata/..."} +``` + +### Step 5: Verify the 1-node cluster is healthy + +```bash +kubectl exec -n iceberg-system etcd-0 -- etcdctl endpoint health +kubectl exec -n iceberg-system etcd-0 -- etcdctl member list -w table + +127.0.0.1:2379 is healthy: successfully committed proposal: took = 1.508375ms ++------------------+---------+--------+----------------------------------------------------------+----------------------------------------------------------+------------+ +| ID | STATUS | NAME | PEER ADDRS | CLIENT ADDRS | IS LEARNER | ++------------------+---------+--------+----------------------------------------------------------+----------------------------------------------------------+------------+ +| 65f9fec08921c158 | started | etcd-0 | http://etcd-0.etcd.iceberg-system.svc.cluster.local:2380 | http://etcd-0.etcd.iceberg-system.svc.cluster.local:2379 | false | 
++------------------+---------+--------+----------------------------------------------------------+----------------------------------------------------------+------------+ +``` + +You should see one member (`etcd-0`) with status `started`. + +## Scale from 1 to 3 nodes + +### Why "scale to 0, flip to existing, scale to 3" does NOT work + +etcd maintains an internal membership list. When etcd-0 is the only member, its +cluster membership is `{etcd-0}`. Setting `ETCD_INITIAL_CLUSTER_STATE=existing` +on a new pod tells etcd "I am joining an existing cluster" but the existing +cluster must already know about the new member. Without a prior `member add`, +the new pod is rejected. + +The correct sequence: **always `member add` before the pod starts, never after.** + +### Step 6: Update the StatefulSet env vars + +Update `ETCD_INITIAL_CLUSTER` to list all 3 nodes and set +`ETCD_INITIAL_CLUSTER_STATE` to `existing`: + +```bash +kubectl set env statefulset/etcd -n iceberg-system \ + ETCD_INITIAL_CLUSTER="etcd-0=http://etcd-0.etcd.iceberg-system.svc.cluster.local:2380,etcd-1=http://etcd-1.etcd.iceberg-system.svc.cluster.local:2380,etcd-2=http://etcd-2.etcd.iceberg-system.svc.cluster.local:2380" \ + ETCD_INITIAL_CLUSTER_STATE=existing +``` + +**Note:** `kubectl set env` changes the pod template, which triggers a restart of +etcd-0. This is safe -- etcd ignores `ETCD_INITIAL_CLUSTER` and +`ETCD_INITIAL_CLUSTER_STATE` after first boot (it uses its data dir / WAL). +Wait for etcd-0 to come back up before proceeding: + +```bash +kubectl rollout status statefulset/etcd -n iceberg-system +kubectl exec -n iceberg-system etcd-0 -- etcdctl endpoint health +``` + +### Step 7: Register etcd-1 in the cluster + +Register etcd-1 with the running cluster *before* its pod starts: + +```bash +kubectl exec -n iceberg-system etcd-0 -- etcdctl member add etcd-1 \ + --peer-urls=http://etcd-1.etcd.iceberg-system.svc.cluster.local:2380 +``` + +The cluster now knows about etcd-1 but it is not yet running (reported as +`unstarted`). + +### Step 8: Ensure etcd-1 has a clean PVC + +If a PVC from a previous failed attempt exists, delete it so etcd-1 starts +fresh: + +```bash +kubectl get pvc -n iceberg-system | grep etcd-data-etcd-1 +# If it exists: +kubectl delete pvc etcd-data-etcd-1 -n iceberg-system +``` + +### Step 9: Scale to 2 + +```bash +kubectl scale statefulset etcd -n iceberg-system --replicas=2 +``` + +etcd-1 starts, reads `ETCD_INITIAL_CLUSTER_STATE=existing` from its env, and +joins the cluster. Its PVC is fresh (no data dir), so it bootstraps as a new +member joining an existing cluster. + +Wait for etcd-1 to become healthy: + +```bash +kubectl exec -n iceberg-system etcd-0 -- etcdctl member list -w table +kubectl exec -n iceberg-system etcd-0 -- etcdctl endpoint health \ + --endpoints=http://etcd-0.etcd.iceberg-system.svc.cluster.local:2379,http://etcd-1.etcd.iceberg-system.svc.cluster.local:2379 +``` + +Both members should show `started` and `healthy`. 
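+
+If you prefer to script the wait instead of re-running the commands by hand, a
+small polling loop works -- a sketch using the same health check as above:
+
+```bash
+# Poll until etcd-1 reports healthy before moving on to register etcd-2.
+until kubectl exec -n iceberg-system etcd-0 -- etcdctl endpoint health \
+  --endpoints=http://etcd-1.etcd.iceberg-system.svc.cluster.local:2379 >/dev/null 2>&1; do
+  echo "etcd-1 not healthy yet, retrying..."
+  sleep 5
+done
+echo "etcd-1 is healthy"
+```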
+ +### Step 10: Register etcd-2 in the cluster + +```bash +kubectl exec -n iceberg-system etcd-0 -- etcdctl member add etcd-2 \ + --peer-urls=http://etcd-2.etcd.iceberg-system.svc.cluster.local:2380 +``` + +### Step 11: Ensure etcd-2 has a clean PVC + +```bash +kubectl get pvc -n iceberg-system | grep etcd-data-etcd-2 +# If it exists: +kubectl delete pvc etcd-data-etcd-2 -n iceberg-system +``` + +### Step 12: Scale to 3 + +```bash +kubectl scale statefulset etcd -n iceberg-system --replicas=3 +``` + +Wait for etcd-2 to join: + +```bash +kubectl exec -n iceberg-system etcd-0 -- etcdctl member list -w table +kubectl exec -n iceberg-system etcd-0 -- etcdctl endpoint health \ + --endpoints=http://etcd-0.etcd.iceberg-system.svc.cluster.local:2379,http://etcd-1.etcd.iceberg-system.svc.cluster.local:2379,http://etcd-2.etcd.iceberg-system.svc.cluster.local:2379 +``` + +All 3 members should show `started` and `healthy`. + +### Step 13: Update ice-rest-catalog config + +Update the `uri` in ice-rest-catalog's ConfigMap to include all 3 endpoints: + +```bash +kubectl edit configmap ice-rest-catalog-config -n iceberg-system +``` + +Change the `uri` line to: + +```yaml +uri: etcd:http://etcd-0.etcd.iceberg-system.svc.cluster.local:2379,http://etcd-1.etcd.iceberg-system.svc.cluster.local:2379,http://etcd-2.etcd.iceberg-system.svc.cluster.local:2379 +``` + +Then restart ice-rest-catalog pods to pick up the new config: + +```bash +kubectl rollout restart deployment ice-rest-catalog -n iceberg-system +``` + +### Step 14: Verify data is replicated + +```bash +kubectl exec -n iceberg-system etcd-0 -- etcdctl get --prefix t/ --keys-only +kubectl exec -n iceberg-system etcd-1 -- etcdctl get --prefix t/ --keys-only +kubectl exec -n iceberg-system etcd-2 -- etcdctl get --prefix t/ --keys-only +``` + +All 3 should return the same keys. + +## Recovery: if you already scaled to 3 and pods are crash-looping + +If etcd-1 and etcd-2 are stuck in CrashLoopBackOff because they were never +registered via `member add`: + +```bash +# Scale back to 1 +kubectl scale statefulset etcd -n iceberg-system --replicas=1 + +# Delete the PVCs for the failed pods (they have corrupt/empty data) +kubectl delete pvc etcd-data-etcd-1 -n iceberg-system +kubectl delete pvc etcd-data-etcd-2 -n iceberg-system + +# Verify etcd-0 is healthy +kubectl exec -n iceberg-system etcd-0 -- etcdctl endpoint health +kubectl exec -n iceberg-system etcd-0 -- etcdctl member list -w table + +# Remove any phantom members that were partially added +# (check member list for unstarted members and remove them) +# kubectl exec -n iceberg-system etcd-0 -- etcdctl member remove + +# Now follow Steps 7-14 above +``` diff --git a/docs/ice-rest-catalog-k8s-1node.yaml b/docs/ice-rest-catalog-k8s-1node.yaml new file mode 100644 index 0000000..0a0df10 --- /dev/null +++ b/docs/ice-rest-catalog-k8s-1node.yaml @@ -0,0 +1,610 @@ +# ============================================================================= +# ice-rest-catalog Kubernetes Manifests (single-node etcd) +# ============================================================================= +# Same as ice-rest-catalog-k8s.yaml but etcd runs as 1 replica (dev / small +# clusters). For a 3-node etcd cluster, use ice-rest-catalog-k8s.yaml instead. 
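+# To grow this single-node etcd to 3 nodes later, see docs/etcd-upgrade-etcd-cluster.md
+# (live scaling, no downtime) or docs/etcd-backup-restore-upgrade-3-node.md (backup & restore).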
+# Deploy with: kubectl apply -f ice-rest-catalog-k8s-1node.yaml +# +# For kind cluster access (run these commands to access services locally): +# kubectl port-forward svc/ice-rest-catalog 8181:8181 -n iceberg-system & +# kubectl port-forward svc/minio 9000:9000 -n iceberg-system & +# kubectl port-forward svc/minio 9001:9001 -n iceberg-system & +# +# Access URLs: +# - ice-rest-catalog: http://localhost:8181 +# - minio API: http://localhost:9000 +# - minio console: http://localhost:9001 +# +# For production use, consider using LoadBalancer or Ingress instead of NodePort +# ============================================================================= + +--- +# Namespace +apiVersion: v1 +kind: Namespace +metadata: + name: iceberg-system + labels: + app.kubernetes.io/name: iceberg-system + app.kubernetes.io/part-of: ice-rest-catalog + +--- +# ============================================================================= +# SECRETS & CONFIGMAPS +# ============================================================================= + +# MinIO Credentials Secret +apiVersion: v1 +kind: Secret +metadata: + name: minio-credentials + namespace: iceberg-system + labels: + app.kubernetes.io/name: minio + app.kubernetes.io/part-of: ice-rest-catalog +type: Opaque +stringData: + MINIO_ROOT_USER: "minio" + MINIO_ROOT_PASSWORD: "minio123" + +--- +# ice-rest-catalog Configuration +apiVersion: v1 +kind: ConfigMap +metadata: + name: ice-rest-catalog-config + namespace: iceberg-system + labels: + app.kubernetes.io/name: ice-rest-catalog +data: + config.yaml: | + # etcd connection (single member: etcd-0) + uri: etcd:http://etcd-0.etcd.iceberg-system.svc.cluster.local:2379 + + # S3/MinIO warehouse configuration + warehouse: s3://warehouse + + # S3 settings for MinIO + s3: + endpoint: http://minio.iceberg-system.svc.cluster.local:9000 + pathStyleAccess: true + accessKeyID: minio + secretAccessKey: minio123 + region: us-east-1 + + # Server address + addr: 0.0.0.0:8181 + + # Anonymous access for development/testing + anonymousAccess: + enabled: true + accessConfig: {} + +--- +# ice-rest-catalog Secrets +apiVersion: v1 +kind: Secret +metadata: + name: ice-rest-catalog-secrets + namespace: iceberg-system + labels: + app.kubernetes.io/name: ice-rest-catalog +type: Opaque +stringData: + S3_ACCESS_KEY_ID: "minio" + S3_SECRET_ACCESS_KEY: "minio123" + +--- +# ============================================================================= +# ETCD CLUSTER +# ============================================================================= + +# etcd Headless Service for DNS SRV discovery +apiVersion: v1 +kind: Service +metadata: + name: etcd + namespace: iceberg-system + labels: + app.kubernetes.io/name: etcd + app.kubernetes.io/part-of: ice-rest-catalog +spec: + clusterIP: None + publishNotReadyAddresses: true + ports: + - name: client + port: 2379 + targetPort: 2379 + - name: peer + port: 2380 + targetPort: 2380 + selector: + app.kubernetes.io/name: etcd + +--- +# etcd StatefulSet +apiVersion: apps/v1 +kind: StatefulSet +metadata: + name: etcd + namespace: iceberg-system + labels: + app.kubernetes.io/name: etcd + app.kubernetes.io/part-of: ice-rest-catalog +spec: + serviceName: etcd + replicas: 1 + podManagementPolicy: Parallel + selector: + matchLabels: + app.kubernetes.io/name: etcd + template: + metadata: + labels: + app.kubernetes.io/name: etcd + app.kubernetes.io/part-of: ice-rest-catalog + spec: + terminationGracePeriodSeconds: 30 + containers: + - name: etcd + image: quay.io/coreos/etcd:v3.5.12 + ports: + - name: client + 
containerPort: 2379 + - name: peer + containerPort: 2380 + env: + - name: POD_NAME + valueFrom: + fieldRef: + fieldPath: metadata.name + - name: POD_NAMESPACE + valueFrom: + fieldRef: + fieldPath: metadata.namespace + - name: ETCD_NAME + valueFrom: + fieldRef: + fieldPath: metadata.name + - name: ETCD_DATA_DIR + value: /var/lib/etcd + - name: ETCD_INITIAL_CLUSTER_STATE + value: new + - name: ETCD_INITIAL_CLUSTER_TOKEN + value: etcd-cluster-iceberg + - name: ETCD_LISTEN_PEER_URLS + value: http://0.0.0.0:2380 + - name: ETCD_LISTEN_CLIENT_URLS + value: http://0.0.0.0:2379 + - name: ETCD_ADVERTISE_CLIENT_URLS + value: http://$(POD_NAME).etcd.$(POD_NAMESPACE).svc.cluster.local:2379 + - name: ETCD_INITIAL_ADVERTISE_PEER_URLS + value: http://$(POD_NAME).etcd.$(POD_NAMESPACE).svc.cluster.local:2380 + - name: ETCD_INITIAL_CLUSTER + value: etcd-0=http://etcd-0.etcd.iceberg-system.svc.cluster.local:2380 + volumeMounts: + - name: etcd-data + mountPath: /var/lib/etcd + - name: etcd-backup + mountPath: /backup + resources: + requests: + cpu: 100m + memory: 256Mi + limits: + cpu: 500m + memory: 512Mi + livenessProbe: + httpGet: + path: /health + port: 2379 + initialDelaySeconds: 15 + periodSeconds: 10 + timeoutSeconds: 5 + failureThreshold: 3 + readinessProbe: + httpGet: + path: /health + port: 2379 + initialDelaySeconds: 5 + periodSeconds: 5 + timeoutSeconds: 3 + failureThreshold: 3 + - name: backup-helper + image: busybox:1.36 + command: ["sleep", "infinity"] + volumeMounts: + - name: etcd-backup + mountPath: /backup + resources: + requests: + cpu: 10m + memory: 16Mi + limits: + cpu: 100m + memory: 32Mi + volumes: + - name: etcd-backup + emptyDir: {} + volumeClaimTemplates: + - metadata: + name: etcd-data + labels: + app.kubernetes.io/name: etcd + spec: + accessModes: + - ReadWriteOnce + resources: + requests: + storage: 10Gi + +--- +# ============================================================================= +# MINIO (S3-Compatible Storage) +# ============================================================================= + +# MinIO NodePort Service for external access +apiVersion: v1 +kind: Service +metadata: + name: minio + namespace: iceberg-system + labels: + app.kubernetes.io/name: minio + app.kubernetes.io/part-of: ice-rest-catalog +spec: + type: NodePort + ports: + - name: api + port: 9000 + targetPort: 9000 + nodePort: 30900 + - name: console + port: 9001 + targetPort: 9001 + nodePort: 30901 + selector: + app.kubernetes.io/name: minio + +--- +# MinIO Headless Service +apiVersion: v1 +kind: Service +metadata: + name: minio-headless + namespace: iceberg-system + labels: + app.kubernetes.io/name: minio +spec: + clusterIP: None + ports: + - name: api + port: 9000 + targetPort: 9000 + selector: + app.kubernetes.io/name: minio + +--- +# MinIO StatefulSet +apiVersion: apps/v1 +kind: StatefulSet +metadata: + name: minio + namespace: iceberg-system + labels: + app.kubernetes.io/name: minio + app.kubernetes.io/part-of: ice-rest-catalog +spec: + serviceName: minio-headless + replicas: 1 + podManagementPolicy: Parallel + selector: + matchLabels: + app.kubernetes.io/name: minio + template: + metadata: + labels: + app.kubernetes.io/name: minio + app.kubernetes.io/part-of: ice-rest-catalog + spec: + terminationGracePeriodSeconds: 30 + containers: + - name: minio + image: minio/minio:RELEASE.2024-01-31T20-20-33Z + args: + - server + - /data + - --console-address + - ":9001" + ports: + - name: api + containerPort: 9000 + - name: console + containerPort: 9001 + env: + - name: MINIO_ROOT_USER + valueFrom: + 
secretKeyRef: + name: minio-credentials + key: MINIO_ROOT_USER + - name: MINIO_ROOT_PASSWORD + valueFrom: + secretKeyRef: + name: minio-credentials + key: MINIO_ROOT_PASSWORD + volumeMounts: + - name: minio-data + mountPath: /data + resources: + requests: + cpu: 100m + memory: 256Mi + limits: + cpu: 1000m + memory: 1Gi + livenessProbe: + httpGet: + path: /minio/health/live + port: 9000 + initialDelaySeconds: 30 + periodSeconds: 20 + timeoutSeconds: 10 + readinessProbe: + httpGet: + path: /minio/health/ready + port: 9000 + initialDelaySeconds: 10 + periodSeconds: 10 + timeoutSeconds: 5 + volumeClaimTemplates: + - metadata: + name: minio-data + labels: + app.kubernetes.io/name: minio + spec: + accessModes: + - ReadWriteOnce + resources: + requests: + storage: 50Gi + +--- +# MinIO Bucket Setup Job +apiVersion: batch/v1 +kind: Job +metadata: + name: minio-bucket-setup + namespace: iceberg-system + labels: + app.kubernetes.io/name: minio-setup + app.kubernetes.io/part-of: ice-rest-catalog +spec: + ttlSecondsAfterFinished: 300 + template: + metadata: + labels: + app.kubernetes.io/name: minio-setup + spec: + restartPolicy: OnFailure + initContainers: + - name: wait-for-minio + image: busybox:1.36 + command: + - sh + - -c + - | + echo "Waiting for MinIO to be ready..." + until wget -q --spider http://minio.iceberg-system.svc.cluster.local:9000/minio/health/ready; do + echo "MinIO not ready, waiting..." + sleep 5 + done + echo "MinIO is ready!" + containers: + - name: mc + image: minio/mc:RELEASE.2024-01-31T08-59-40Z + command: + - sh + - -c + - | + mc alias set myminio http://minio.iceberg-system.svc.cluster.local:9000 $MINIO_ROOT_USER $MINIO_ROOT_PASSWORD + mc mb --ignore-existing myminio/warehouse + echo "Bucket 'warehouse' created successfully!" + env: + - name: MINIO_ROOT_USER + valueFrom: + secretKeyRef: + name: minio-credentials + key: MINIO_ROOT_USER + - name: MINIO_ROOT_PASSWORD + valueFrom: + secretKeyRef: + name: minio-credentials + key: MINIO_ROOT_PASSWORD + +--- +# ============================================================================= +# ICE-REST-CATALOG +# ============================================================================= + +# ServiceAccount +apiVersion: v1 +kind: ServiceAccount +metadata: + name: ice-rest-catalog + namespace: iceberg-system + labels: + app.kubernetes.io/name: ice-rest-catalog + +--- +# ice-rest-catalog NodePort Service for external access +apiVersion: v1 +kind: Service +metadata: + name: ice-rest-catalog + namespace: iceberg-system + labels: + app.kubernetes.io/name: ice-rest-catalog + app.kubernetes.io/part-of: ice-rest-catalog +spec: + type: NodePort + ports: + - name: http + port: 8181 + targetPort: 8181 + nodePort: 30181 + protocol: TCP + selector: + app.kubernetes.io/name: ice-rest-catalog + +--- +# ice-rest-catalog Deployment +apiVersion: apps/v1 +kind: Deployment +metadata: + name: ice-rest-catalog + namespace: iceberg-system + labels: + app.kubernetes.io/name: ice-rest-catalog + app.kubernetes.io/part-of: ice-rest-catalog +spec: + replicas: 3 + strategy: + type: RollingUpdate + rollingUpdate: + maxSurge: 1 + maxUnavailable: 0 + selector: + matchLabels: + app.kubernetes.io/name: ice-rest-catalog + template: + metadata: + labels: + app.kubernetes.io/name: ice-rest-catalog + app.kubernetes.io/part-of: ice-rest-catalog + spec: + terminationGracePeriodSeconds: 30 + serviceAccountName: ice-rest-catalog + initContainers: + - name: wait-for-etcd + image: busybox:1.36 + command: + - sh + - -c + - | + echo "Waiting for etcd cluster to be ready..." 
+ until wget -q --spider http://etcd.iceberg-system.svc.cluster.local:2379/health; do + echo "etcd not ready, waiting..." + sleep 5 + done + echo "etcd is ready!" + - name: wait-for-minio + image: busybox:1.36 + command: + - sh + - -c + - | + echo "Waiting for MinIO to be ready..." + until wget -q --spider http://minio.iceberg-system.svc.cluster.local:9000/minio/health/ready; do + echo "MinIO not ready, waiting..." + sleep 5 + done + echo "MinIO is ready!" + containers: + - name: ice-rest-catalog + image: altinity/ice-rest-catalog:latest + ports: + - name: http + containerPort: 8181 + protocol: TCP + args: + - "-c" + - "/etc/ice-rest-catalog/config.yaml" + env: + - name: AWS_ACCESS_KEY_ID + valueFrom: + secretKeyRef: + name: ice-rest-catalog-secrets + key: S3_ACCESS_KEY_ID + - name: AWS_SECRET_ACCESS_KEY + valueFrom: + secretKeyRef: + name: ice-rest-catalog-secrets + key: S3_SECRET_ACCESS_KEY + - name: AWS_REGION + value: "us-east-1" + volumeMounts: + - name: config + mountPath: /etc/ice-rest-catalog + readOnly: true + resources: + requests: + cpu: 200m + memory: 512Mi + limits: + cpu: 1000m + memory: 1Gi + livenessProbe: + httpGet: + path: /v1/config + port: 8181 + initialDelaySeconds: 30 + periodSeconds: 15 + timeoutSeconds: 5 + failureThreshold: 3 + readinessProbe: + httpGet: + path: /v1/config + port: 8181 + initialDelaySeconds: 10 + periodSeconds: 10 + timeoutSeconds: 5 + failureThreshold: 3 + volumes: + - name: config + configMap: + name: ice-rest-catalog-config + +--- +# PodDisruptionBudget +apiVersion: policy/v1 +kind: PodDisruptionBudget +metadata: + name: ice-rest-catalog + namespace: iceberg-system + labels: + app.kubernetes.io/name: ice-rest-catalog +spec: + minAvailable: 0 + selector: + matchLabels: + app.kubernetes.io/name: ice-rest-catalog + +--- +# HorizontalPodAutoscaler +apiVersion: autoscaling/v2 +kind: HorizontalPodAutoscaler +metadata: + name: ice-rest-catalog + namespace: iceberg-system + labels: + app.kubernetes.io/name: ice-rest-catalog +spec: + scaleTargetRef: + apiVersion: apps/v1 + kind: Deployment + name: ice-rest-catalog + minReplicas: 3 + maxReplicas: 10 + metrics: + - type: Resource + resource: + name: cpu + target: + type: Utilization + averageUtilization: 70 + - type: Resource + resource: + name: memory + target: + type: Utilization + averageUtilization: 80 diff --git a/ice-rest-catalog/README.md b/ice-rest-catalog/README.md index 446249b..25bf295 100644 --- a/ice-rest-catalog/README.md +++ b/ice-rest-catalog/README.md @@ -29,6 +29,8 @@ If `enabled` is true but the catalog backend is not etcd, the lock is ignored (w - [Architecture](../docs/architecture.md) -- components, design principles, HA, backup/recovery - [Kubernetes Setup](../docs/k8s_setup.md) -- k8s deployment with etcd StatefulSet and replicas - [etcd Cluster Setup](../docs/etcd-cluster-setup.md) -- docker-compose setup with 3-node etcd, data insertion, replication verification, ClickHouse queries +- [etcd Cluster upgrade](../docs/etcd-backup-restore-upgrade-3-node.md) -- Includes k8s manifest and steps to backup data from 1 node etcd and restoring to 3 nodes.(with downtime) +- [Scaling etcd cluster (1 node to 3 nodes) in Kubernetes](../docs/etcd-upgrade-etcd-cluster.md) -- step-by-step guide for live scaling of etcd StatefulSet without downtime - [GCS Setup](../docs/ice-rest-catalog-gcs.md) -- configuring ice-rest-catalog with Google Cloud Storage - [etcd Backend Schema](../docs/etcd-backend-schema.md) -- etcd key/value schema (`n/`, `t/` prefixes) and mapping to SQLite - [SQLite Backend 
Schema](../docs/sqlite-backend-schema.md) -- SQLite tables (`iceberg_tables`, `iceberg_namespace_properties`)