19 changes: 18 additions & 1 deletion README.md
@@ -29,7 +29,24 @@ repo to install.
- [**Cloud Run Basics**](./skills/cloud/cloud-run-basics)
- [**Cloud SQL Basics**](./skills/cloud/cloud-sql-basics)
- [**Firebase Basics**](./skills/cloud/firebase-basics)
- [**Kubernetes Engine (GKE) Basics**](./skills/cloud/gke-basics)
- **Kubernetes Engine (GKE)**:
  - [Basics](./skills/cloud/gke-basics)
  - [App Onboarding](./skills/cloud/gke-app-onboarding)
  - [Backup & DR](./skills/cloud/gke-backup-dr)
  - [Batch & HPC](./skills/cloud/gke-batch-hpc)
  - [Cluster Creation](./skills/cloud/gke-cluster-creation)
  - [Compute Classes](./skills/cloud/gke-compute-classes)
  - [Cost Optimization](./skills/cloud/gke-cost)
  - [Golden Path](./skills/cloud/gke-golden-path)
  - [AI/ML Inference](./skills/cloud/gke-inference)
  - [Multi-tenancy](./skills/cloud/gke-multitenancy)
  - [Networking](./skills/cloud/gke-networking)
  - [Observability](./skills/cloud/gke-observability)
  - [Reliability](./skills/cloud/gke-reliability)
  - [Scaling](./skills/cloud/gke-scaling)
  - [Security](./skills/cloud/gke-security)
  - [Storage](./skills/cloud/gke-storage)
  - [Upgrades](./skills/cloud/gke-upgrades)
- [**Workload Manager Basics**](./skills/cloud/workload-manager-basics)
- [**Recipe: Onboarding to Google Cloud**](./skills/cloud/google-cloud-recipe-onboarding)
- [**Recipe: Authenticating to Google Cloud**](./skills/cloud/google-cloud-recipe-auth)
@@ -1,21 +1,34 @@
---
name: gke-app-onboarding
description: >-
  Manages GKE application onboarding, covering containerization, deployment
  manifests, and migration. Use when onboarding or deploying an application to
  GKE for the first time, or containerizing an app for GKE. Don't use for
  general GKE cluster administration or upgrades (use gke-basics or
  gke-upgrades instead).
---

# GKE App Onboarding

This reference provides workflows for containerizing and deploying applications to GKE for the first time.
This reference provides workflows for containerizing and deploying applications
to GKE for the first time.

> **MCP Tools:** `apply_k8s_manifest`, `get_k8s_resource`, `get_k8s_rollout_status`, `get_k8s_logs`, `describe_k8s_resource`
> **MCP Tools:** `apply_k8s_manifest`, `get_k8s_resource`,
> `get_k8s_rollout_status`, `get_k8s_logs`, `describe_k8s_resource`

## Workflow

### 1. App Assessment

Before containerizing, assess the application:

- **Language & Framework**: Identify the tech stack
- **Dependencies**: List required libraries and external services
- **Configuration**: How is the app configured? (env vars, config files, secrets)
- **Statefulness**: Does it need persistent storage? (databases, file storage)
- **Networking**: Port mapping and protocol (HTTP, gRPC, TCP)
- **Health endpoints**: Does the app expose health check endpoints?
- **Language & Framework**: Identify the tech stack
- **Dependencies**: List required libraries and external services
- **Configuration**: How is the app configured? (env vars, config files,
  secrets)
- **Statefulness**: Does it need persistent storage? (databases, file storage)
- **Networking**: Port mapping and protocol (HTTP, gRPC, TCP)
- **Health endpoints**: Does the app expose health check endpoints?

### 2. Containerization

@@ -38,14 +51,19 @@ ENTRYPOINT ["/server"]
```

**Best practices:**
- Use multi-stage builds to keep production images small
- Use distroless or minimal base images to reduce attack surface
- Run as non-root user
- Log to `stdout` and `stderr` for Cloud Logging collection

**Alternatives:**
- **Cloud Native Buildpacks** — auto-detect language and build without a Dockerfile: `pack build <image> --builder gcr.io/buildpacks/builder:latest`
- **Skaffold** — development workflow tool for iterating on containerized apps: `skaffold dev`
- Use multi-stage builds to keep production images small
- Use distroless or minimal base images to reduce attack surface
- Run as non-root user
- Log to `stdout` and `stderr` for Cloud Logging collection

For applications where writing a Dockerfile is not preferred, you can use
[**Cloud Native Buildpacks**](https://buildpacks.io/) to automatically detect
the language and build a container image:

```bash
pack build <image> --builder gcr.io/buildpacks/builder:latest
```
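
The "run as non-root user" practice above can be spot-checked before an image is pushed. The following is a minimal sketch rather than part of this skill: `is_nonroot_user` is a hypothetical helper that classifies the `USER` value recorded in an image's config, which can be read with `docker inspect --format '{{.Config.User}}' <image>`.

```bash
# Hypothetical helper: classify an image's configured USER as root or non-root.
# Returns 0 (success) for a non-root user, 1 for root.
is_nonroot_user() {
  local user="${1%%:*}"      # drop an optional ":group" suffix
  case "$user" in
    ''|root|0) return 1 ;;   # an empty USER means the container runs as root
    *)         return 0 ;;
  esac
}
```

For example, `is_nonroot_user "$(docker inspect --format '{{.Config.User}}' <image>)" || echo "image runs as root"` would flag an image built without a `USER` instruction. Note that this only inspects the image default; a pod-level `runAsUser` can still override it.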

### 3. Image Management

@@ -60,7 +78,8 @@ docker build -t <REGION>-docker.pkg.dev/<PROJECT>/<REPO>/<IMAGE>:<TAG> .
docker push <REGION>-docker.pkg.dev/<PROJECT>/<REPO>/<IMAGE>:<TAG>
```

**Vulnerability scanning**: Enable automatic scanning in Artifact Registry to detect issues in base images and dependencies.
**Vulnerability scanning**: Enable automatic scanning in Artifact Registry to
detect issues in base images and dependencies.

```bash
# Check scan results
@@ -127,10 +146,12 @@ spec:
```

**Checklist for manifests:**
- Resource requests and limits set
- Liveness and readiness probes configured
- At least 2 replicas for production
- Service type appropriate (ClusterIP for internal, use Gateway API for external)

- Resource requests and limits set
- Liveness and readiness probes configured
- At least 2 replicas for production
- Service type appropriate (ClusterIP for internal, use Gateway API for
  external)
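
Part of the checklist above can be approximated with a quick text scan before applying a manifest. This is a rough sketch under stated assumptions: `check_manifest` is a hypothetical helper, it expects one Deployment per file, and it uses `grep`/`sed` pattern matching rather than a real YAML parser, so it can miss or misreport unusual layouts. It does not check the Service type.

```bash
# Hypothetical helper: text-based check of a Deployment manifest.
# Prints one line per problem and returns non-zero if any were found.
check_manifest() {
  local file="$1" failed=0 replicas key

  # Resource requests/limits and both probes should appear as YAML keys.
  for key in 'requests:' 'limits:' 'livenessProbe:' 'readinessProbe:'; do
    if ! grep -q "^[[:space:]-]*${key}" "$file"; then
      echo "MISSING ${key%:}"
      failed=1
    fi
  done

  # Use the first "replicas: N" line; fall back to 1, the Kubernetes default.
  replicas="$(sed -n 's/^[[:space:]]*replicas:[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$file" | head -n 1)"
  replicas="${replicas:-1}"
  if [ "$replicas" -lt 2 ]; then
    echo "LOW replicas=${replicas}"
    failed=1
  fi

  return "$failed"
}
```

Running `check_manifest deployment.yaml && kubectl apply -f deployment.yaml` would apply the manifest only when the scan finds nothing to report.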

### 5. Deploy

@@ -154,7 +175,9 @@ kubectl get pods -l app=my-app
## Next Steps

Once the application is running on GKE:
- Configure autoscaling — see [gke-scaling.md](./gke-scaling.md)
- Set up observability — see [gke-observability.md](./gke-observability.md)
- Harden security — see [gke-security.md](./gke-security.md)
- Configure reliability (PDBs, topology spread) — see [gke-reliability.md](./gke-reliability.md)

- Configure autoscaling — see the `gke-scaling` skill
- Set up observability — see the `gke-observability` skill
- Harden security — see the `gke-security` skill
- Configure reliability (PDBs, topology spread) — see the `gke-reliability`
  skill
21 changes: 21 additions & 0 deletions skills/cloud/gke-app-onboarding/assets/Dockerfile
@@ -0,0 +1,21 @@
# Use official lightweight Node.js image
FROM node:18-slim

# Set working directory
WORKDIR /app

# Copy package files and install dependencies (none in this case, but good practice)
COPY package*.json ./
RUN npm install --production

# Copy the application code
COPY index.js .

# Expose the application port
EXPOSE 8080

# Run the application as a non-root user for security
USER node

# Start the application
CMD [ "node", "index.js" ]
64 changes: 64 additions & 0 deletions skills/cloud/gke-app-onboarding/assets/deployment.yaml
@@ -0,0 +1,64 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: node-app-deployment
  labels:
    app: node-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: node-app
  template:
    metadata:
      labels:
        app: node-app
    spec:
      automountServiceAccountToken: false
      securityContext:
        runAsNonRoot: true
        runAsUser: 10001
        seccompProfile:
          type: RuntimeDefault
      containers:
      - name: node-app
        # Replace the image below with your actual container image path, pinned by digest,
        # e.g., us-docker.pkg.dev/my-project/my-repo/node-app@sha256:0123456789abcdef...
        image: gcr.io/my-project/node-app@sha256:0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef
        ports:
        - containerPort: 8080
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
        resources:
          requests:
            cpu: "100m"
            memory: "128Mi"
          limits:
            cpu: "200m"
            memory: "256Mi"
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 5
        readinessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: node-app-service
spec:
  selector:
    app: node-app
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
  type: LoadBalancer
19 changes: 19 additions & 0 deletions skills/cloud/gke-app-onboarding/assets/index.js
@@ -0,0 +1,19 @@
const http = require('http');

const port = process.env.PORT || 8080;

const server = http.createServer((req, res) => {
  if (req.url === '/healthz') {
    res.statusCode = 200;
    res.setHeader('Content-Type', 'text/plain');
    res.end('OK');
  } else {
    res.statusCode = 200;
    res.setHeader('Content-Type', 'text/plain');
    res.end('Hello, World!\n');
  }
});

server.listen(port, () => {
  console.log(`Server running on port ${port}`);
});
8 changes: 8 additions & 0 deletions skills/cloud/gke-app-onboarding/assets/package.json
@@ -0,0 +1,8 @@
{
  "name": "simple-node-app",
  "version": "1.0.0",
  "main": "index.js",
  "scripts": {
    "start": "node index.js"
  }
}
51 changes: 51 additions & 0 deletions skills/cloud/gke-backup-dr/SKILL.md
@@ -0,0 +1,51 @@
---
name: gke-backup-dr
description: >-
  Configures GKE Backup Plans and restore workflows. Use for backup policies,
  disaster recovery, or GKE cluster restores. Don't use for database backups.
---

# GKE Backup & Disaster Recovery

Protects stateful GKE workloads using Backup for GKE.

## CLI Reference

```bash
# Enable GKE Backup addon (Slow cluster-level update)
gcloud container clusters update <CLUSTER_NAME> --enable-gke-backup --region <REGION> --quiet

# Create Backup Plan
gcloud container backup-restore backup-plans create <PLAN_NAME> \
--cluster=<CLUSTER_NAME> --location=<REGION> \
--retention-days=<DAYS> --cron-schedule="<CRON>" --all-namespaces --quiet

# Trigger Manual Backup
gcloud container backup-restore backups create <BACKUP_NAME> \
--backup-plan=<PLAN_NAME> --location=<REGION> --quiet

# Create Restore Plan
gcloud container backup-restore restore-plans create <RESTORE_PLAN_NAME> \
--cluster=<TARGET_CLUSTER_NAME> --location=<REGION> --backup-plan=<SOURCE_BACKUP_PLAN_NAME> \
--cluster-resource-conflict-policy=USE_EXISTING_VERSION --namespaced-resource-restore-mode=FAIL_ON_CONFLICT --quiet

# Execute Restore
gcloud container backup-restore restores create <RESTORE_NAME> \
--restore-plan=<RESTORE_PLAN_NAME> --backup=<BACKUP_NAME> --location=<REGION> --quiet
```

## Best Practices

1. **CMEK Encryption**: Encrypt backup plans using Customer-Managed Encryption
   Keys: `--backup-encryption-key=<KEY>`.
2. **Scope**: Prefer backing up specific namespaces rather than the entire
   cluster: `--included-namespaces=<ns1>,<ns2>`.

## Troubleshooting & Common Pitfalls (CRITICAL)

> [!IMPORTANT]
> **Slow Operations**: Enabling GKE Backup (`--enable-gke-backup`) triggers a
> slow Google Cloud control plane cluster update that takes several minutes.
>
> - **Rule**: **Do not run a terminal loop waiting for the GKE Backup addon to
>   become active.**
> - **Action**: Provide the command to enable the addon, explain that the
>   operation will proceed in the background, and immediately proceed to write
>   the backup plan configs. Do not block.
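
If a status check is needed at all, it can be a single query rather than a loop. The sketch below is illustrative only: `backup_addon_state` is a hypothetical helper, and the field path `addonsConfig.gkeBackupAgentConfig.enabled` is an assumption about the `gcloud container clusters describe` output that should be confirmed against your own cluster before relying on it.

```bash
# Hypothetical helper: query the cluster once and report the addon state.
# It never loops or sleeps; callers decide what to do with the answer.
# Returns 2 if the describe call itself fails.
backup_addon_state() {
  local cluster="$1" region="$2" enabled
  enabled="$(gcloud container clusters describe "$cluster" \
    --region "$region" \
    --format='value(addonsConfig.gkeBackupAgentConfig.enabled)')" || return 2
  case "$enabled" in
    [Tt]rue) echo "enabled" ;;
    *)       echo "pending" ;;   # not yet enabled, or update still in progress
  esac
}
```

Calling `backup_addon_state <CLUSTER_NAME> <REGION>` once after the backup plan configs are written reports whether the addon is ready, without blocking the session while the control plane update runs.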