Merged
2 changes: 2 additions & 0 deletions .env.example
@@ -5,6 +5,8 @@
 TEMPORAL_ENDPOINT=localhost:7233
 TEMPORAL_NAMESPACE=version-guard-dev
 TEMPORAL_TASK_QUEUE=version-guard-detection
+TEMPORAL_METRICS_ENABLED=true
+TEMPORAL_METRICS_LISTEN_ADDRESS=0.0.0.0:9090

 # ─── Wiz Configuration (Optional - falls back to mock data if not provided) ───
 # Get these from your Wiz Service Account
2 changes: 1 addition & 1 deletion ARCHITECTURE.md
@@ -655,7 +655,7 @@ make run-locally # One-shot

 ### Monitoring

-- **Metrics**: Expose Prometheus metrics from HTTP admin service
+- **Metrics**: Expose Temporal SDK Prometheus metrics from the worker process
 - **Logs**: Structured JSON logging via `log/slog`
   - Machine-readable JSON format for log aggregation tools (Datadog, Splunk, CloudWatch Insights)
   - Context-aware logging with typed fields for queryable log data
2 changes: 1 addition & 1 deletion Makefile
@@ -41,7 +41,7 @@ setup: ## Initial setup (install tools, setup hooks)
 build: ## Build the server binary
 	@echo "🔨 Building $(BINARY_NAME) server..."
 	@mkdir -p bin
-	@go build -o bin/$(BINARY_NAME) cmd/server/main.go
+	@go build -o bin/$(BINARY_NAME) ./cmd/server
 	@echo "✅ Build complete: bin/$(BINARY_NAME)"

 .PHONY: build-cli
6 changes: 5 additions & 1 deletion README.md
@@ -166,12 +166,16 @@ docker compose up --build
 | `temporal` | Workflow orchestration | `7233` (gRPC), `8233` (Web UI) |
 | `minio` | S3-compatible snapshot storage | `9000` (API), `9001` (Console) |
 | `endoflife` | Local EOL data override (nginx) | `8082` |
-| `version-guard` | The server | `8081` (HTTP admin) |
+| `version-guard` | The server | `8081` (HTTP admin), `9090` (Temporal SDK metrics) |

 The `endoflife` service serves patched EOL data for products with pending upstream PRs on [endoflife.date](https://endoflife.date), and proxies everything else to the live API. See [`deploy/endoflife-override/README.md`](./deploy/endoflife-override/README.md) for details on adding or updating overrides.

 Once running, open the Temporal Web UI at http://localhost:8233 to trigger and monitor workflows.

+Temporal SDK metrics are enabled by default and exposed at
+http://localhost:9090/metrics. Set `TEMPORAL_METRICS_ENABLED=false` to disable
+them, or set `TEMPORAL_METRICS_LISTEN_ADDRESS` to use a different address.
+
 ### Run Locally (manual)

 If you prefer running components individually:
22 changes: 15 additions & 7 deletions USAGE.md
@@ -348,13 +348,21 @@ temporal workflow observe --workflow-id <workflow-id> --namespace version-guard-

 #### Metrics to Track

-Version Guard emits the following metrics (if Datadog enabled):
-- `version_guard.findings.red` - Critical issues count
-- `version_guard.findings.yellow` - Warning issues count
-- `version_guard.findings.total` - Total resources scanned
-- `version_guard.compliance_percentage` - Fleet compliance %
-- `version_guard.detection.duration_ms` - Scan duration
-- `version_guard.inventory.fetch` - Inventory fetch success rate
+Version Guard enables the Temporal Go SDK metrics handler by default and exposes
+Prometheus/OpenMetrics metrics on `:9090/metrics`.
+
+Useful SDK metrics include:
+- `temporal_workflow_completed_total`
+- `temporal_workflow_failed_total`
+- `temporal_workflow_endtoend_latency_seconds`
+- `temporal_workflow_task_schedule_to_start_latency_seconds`
+- `temporal_activity_execution_failed_total`
+- `temporal_activity_execution_latency_seconds`
+- `temporal_request_failure_total`
+- `temporal_request_latency_seconds`
+
+Set `TEMPORAL_METRICS_ENABLED=false` to disable the handler, or
+`TEMPORAL_METRICS_LISTEN_ADDRESS=0.0.0.0:9091` to change the listen address.

 #### Logs

4 changes: 2 additions & 2 deletions charts/version-guard/Chart.yaml
@@ -2,8 +2,8 @@ apiVersion: v2
 name: version-guard
 description: Cloud infrastructure version drift and EOL detection
 type: application
-version: 0.5.0
-appVersion: "0.5.0"
+version: 0.5.1
+appVersion: "0.5.1"
 maintainers:
   - name: bakayolo
     url: https://github.com/bakayolo
11 changes: 11 additions & 0 deletions charts/version-guard/templates/deployment.yaml
@@ -39,9 +39,20 @@ spec:
             - name: http-admin
               containerPort: {{ .Values.adminPort }}
               protocol: TCP
+            {{- if .Values.temporalMetrics.enabled }}
+            - name: http-metrics
+              containerPort: {{ .Values.temporalMetrics.port }}
+              protocol: TCP
+            {{- end }}
           env:
             - name: HTTP_PORT
               value: {{ .Values.adminPort | quote }}
+            - name: TEMPORAL_METRICS_ENABLED
+              value: {{ .Values.temporalMetrics.enabled | quote }}
+            {{- if .Values.temporalMetrics.enabled }}
+            - name: TEMPORAL_METRICS_LISTEN_ADDRESS
+              value: {{ .Values.temporalMetrics.listenAddress | quote }}
+            {{- end }}
             {{- with .Values.env }}
             {{- toYaml . | nindent 12 }}
             {{- end }}
6 changes: 6 additions & 0 deletions charts/version-guard/templates/service.yaml
@@ -12,5 +12,11 @@ spec:
       targetPort: http-admin
       protocol: TCP
       name: http-admin
+    {{- if .Values.temporalMetrics.enabled }}
+    - port: {{ .Values.temporalMetrics.port }}
+      targetPort: http-metrics
+      protocol: TCP
+      name: http-metrics
+    {{- end }}
   selector:
     {{- include "version-guard.selectorLabels" . | nindent 4 }}
5 changes: 5 additions & 0 deletions charts/version-guard/values.yaml
@@ -15,6 +15,11 @@ podAnnotations: {}

 adminPort: 8081

+temporalMetrics:
+  enabled: true
+  port: 9090
+  listenAddress: "0.0.0.0:9090"
+
 service:
   type: ClusterIP
   adminPort: 8081
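The chart values above carry the metrics port twice: once as `temporalMetrics.port` (used for the container and Service ports) and once inside `temporalMetrics.listenAddress` (passed to the process). The diff does not show any check that the two agree, so overriding only one of them could leave the Service pointing at a port nothing listens on. The sketch below shows one way such a consistency check might look; `listenPort` and `portsAgree` are hypothetical helpers, not code from this PR.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// listenPort extracts the numeric port from a host:port listen address.
// Hypothetical helper for illustration only.
func listenPort(listenAddress string) (int, error) {
	_, portStr, err := net.SplitHostPort(listenAddress)
	if err != nil {
		return 0, fmt.Errorf("parse listen address %q: %w", listenAddress, err)
	}
	port, err := strconv.Atoi(portStr)
	if err != nil {
		return 0, fmt.Errorf("parse port %q: %w", portStr, err)
	}
	return port, nil
}

// portsAgree reports whether containerPort matches the port embedded in
// listenAddress.
func portsAgree(containerPort int, listenAddress string) (bool, error) {
	port, err := listenPort(listenAddress)
	if err != nil {
		return false, err
	}
	return port == containerPort, nil
}

func main() {
	// Values taken from the chart defaults shown in the diff.
	ok, err := portsAgree(9090, "0.0.0.0:9090")
	fmt.Println("defaults agree:", ok, "error:", err)

	// An override of listenAddress alone would be flagged.
	ok, err = portsAgree(9090, "0.0.0.0:9091")
	fmt.Println("override agrees:", ok, "error:", err)
}
```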
29 changes: 24 additions & 5 deletions cmd/server/main.go
@@ -44,9 +44,11 @@ var version = "dev"
 //nolint:govet // field alignment sacrificed for logical grouping
 type ServerCLI struct {
 	// Temporal configuration
-	TemporalEndpoint  string `help:"Temporal server endpoint" default:"localhost:7233" env:"TEMPORAL_ENDPOINT"`
-	TemporalNamespace string `help:"Temporal namespace" default:"version-guard-dev" env:"TEMPORAL_NAMESPACE"`
-	TemporalTaskQueue string `help:"Temporal task queue" default:"version-guard-detection" env:"TEMPORAL_TASK_QUEUE"`
+	TemporalEndpoint             string `help:"Temporal server endpoint" default:"localhost:7233" env:"TEMPORAL_ENDPOINT"`
+	TemporalNamespace            string `help:"Temporal namespace" default:"version-guard-dev" env:"TEMPORAL_NAMESPACE"`
+	TemporalTaskQueue            string `help:"Temporal task queue" default:"version-guard-detection" env:"TEMPORAL_TASK_QUEUE"`
+	TemporalMetricsEnabled       bool   `help:"Enable Temporal SDK metrics" default:"true" env:"TEMPORAL_METRICS_ENABLED"`
+	TemporalMetricsListenAddress string `help:"Prometheus listen address for Temporal SDK metrics" default:"0.0.0.0:9090" env:"TEMPORAL_METRICS_LISTEN_ADDRESS"`

 	// Wiz configuration (optional - falls back to mock if not provided)
 	WizClientIDSecret string `help:"Wiz client ID" env:"WIZ_CLIENT_ID_SECRET"`
@@ -138,6 +140,8 @@ func (s *ServerCLI) Run(_ *kong.Context) error {
 	fmt.Printf(" Wiz Cache TTL: %d hours\n", s.WizCacheTTLHours)
 	fmt.Printf(" AWS Region: %s\n", s.AWSRegion)
 	fmt.Printf(" S3 Prefix: %s\n", s.S3Prefix)
+	fmt.Printf(" Temporal Metrics: enabled=%t listen=%s\n",
+		s.TemporalMetricsEnabled, s.TemporalMetricsListenAddress)
 	fmt.Printf(" Tag Keys - App: %s\n", s.TagAppKeys)
 	if s.ScheduleEnabled {
 		fmt.Printf(" Schedule: enabled (cron: %s, id: %s, jitter: %s)\n",
@@ -178,15 +182,30 @@ func (s *ServerCLI) Run(_ *kong.Context) error {
 	}

 	// Initialize Temporal client
-	temporalClient, err := client.Dial(client.Options{
+	temporalClientOptions := client.Options{
 		HostPort:  s.TemporalEndpoint,
 		Namespace: s.TemporalNamespace,
 		ConnectionOptions: client.ConnectionOptions{
 			DialOptions: []grpc.DialOption{
 				grpc.WithDefaultCallOptions(grpc.MaxCallRecvMsgSize(20 * 1024 * 1024)), // 20MB for large Wiz reports
 			},
 		},
-	})
+	}
+	if s.TemporalMetricsEnabled {
+		metricsHandler, metricsCloser, metricsErr := newTemporalMetricsHandler(s.TemporalMetricsListenAddress)
+		if metricsErr != nil {
+			return metricsErr
+		}
+		defer func() {
+			if closeErr := metricsCloser.Close(); closeErr != nil {
+				slog.Warn("failed to close temporal metrics server", "error", closeErr)
+			}
+		}()
+		temporalClientOptions.MetricsHandler = metricsHandler
+		fmt.Printf("✓ Temporal SDK metrics listening on %s\n", s.TemporalMetricsListenAddress)
+	}
+
+	temporalClient, err := client.Dial(temporalClientOptions)
 	if err != nil {
 		return fmt.Errorf("failed to connect to Temporal at %s: %w", s.TemporalEndpoint, err)
 	}
31 changes: 31 additions & 0 deletions cmd/server/main_test.go
@@ -0,0 +1,31 @@
+package main
+
+import (
+	"testing"
+
+	"github.com/stretchr/testify/require"
+)
+
+func TestNewTemporalMetricsHandlerRequiresListenAddress(t *testing.T) {
+	handler, closer, err := newTemporalMetricsHandler(" ")
+	require.Error(t, err)
+	require.Nil(t, handler)
+	require.Nil(t, closer)
+	require.Contains(t, err.Error(), "listen address is required")
+}
+
+func TestNewTemporalMetricsHandlerCreatesHandler(t *testing.T) {
+	handler, closer, err := newTemporalMetricsHandler("127.0.0.1:0")
+	require.NoError(t, err)
+	require.NotNil(t, handler)
+	require.NotNil(t, closer)
+	require.NoError(t, closer.Close())
+}
+
+func TestNewTemporalMetricsHandlerReturnsListenErrors(t *testing.T) {
+	handler, closer, err := newTemporalMetricsHandler("127.0.0.1:not-a-port")
+	require.Error(t, err)
+	require.Nil(t, handler)
+	require.Nil(t, closer)
+	require.Contains(t, err.Error(), "listen for temporal metrics")
+}
84 changes: 84 additions & 0 deletions cmd/server/temporal_metrics.go
@@ -0,0 +1,84 @@
+package main
+
+import (
+	"context"
+	"errors"
+	"fmt"
+	"io"
+	"log/slog"
+	"net"
+	"net/http"
+	"strings"
+	"time"
+
+	prom "github.com/prometheus/client_golang/prometheus"
+	"github.com/uber-go/tally/v4"
+	tallyprom "github.com/uber-go/tally/v4/prometheus"
+	"go.temporal.io/sdk/client"
+	sdktally "go.temporal.io/sdk/contrib/tally"
+)
+
+type temporalMetricsCloser struct {
+	server      *http.Server
+	scopeCloser io.Closer
+}
+
+func (c *temporalMetricsCloser) Close() error {
+	var closeErr error
+	if c.server != nil {
+		ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
+		defer cancel()
+		if err := c.server.Shutdown(ctx); err != nil && !errors.Is(err, http.ErrServerClosed) {
+			closeErr = errors.Join(closeErr, err)
+		}
+	}
+	if c.scopeCloser != nil {
+		closeErr = errors.Join(closeErr, c.scopeCloser.Close())
+	}
+	return closeErr
+}
+
+func newTemporalMetricsHandler(listenAddress string) (client.MetricsHandler, io.Closer, error) {
+	listenAddress = strings.TrimSpace(listenAddress)
+	if listenAddress == "" {
+		return nil, nil, fmt.Errorf("temporal metrics listen address is required")
+	}
+
+	registry := prom.NewRegistry()
+	reporter := tallyprom.NewReporter(tallyprom.Options{
+		Registerer:       registry,
+		Gatherer:         registry,
+		DefaultTimerType: tallyprom.HistogramTimerType,
+		OnRegisterError: func(err error) {
+			slog.Warn("temporal metrics reporter error", "error", err)
+		},
+	})
+
+	listener, err := net.Listen("tcp", listenAddress)
+	if err != nil {
+		return nil, nil, fmt.Errorf("listen for temporal metrics on %s: %w", listenAddress, err)
+	}
+	mux := http.NewServeMux()
+	mux.Handle("/metrics", reporter.HTTPHandler())
+	server := &http.Server{
+		Handler:           mux,
+		ReadHeaderTimeout: 5 * time.Second,
+	}
+	go func() {
+		if err := server.Serve(listener); err != nil && !errors.Is(err, http.ErrServerClosed) {
+			slog.Warn("temporal metrics server stopped", "error", err)
+		}
+	}()
+
+	scopeOpts := tally.ScopeOptions{
+		CachedReporter:  reporter,
+		Separator:       tallyprom.DefaultSeparator,
+		SanitizeOptions: &sdktally.PrometheusSanitizeOptions,
+	}
+	scope, scopeCloser := tally.NewRootScope(scopeOpts, time.Second)
+	scope = sdktally.NewPrometheusNamingScope(scope)
+	return sdktally.NewMetricsHandler(scope), &temporalMetricsCloser{
+		server:      server,
+		scopeCloser: scopeCloser,
+	}, nil
+}
3 changes: 3 additions & 0 deletions docker-compose.yaml
@@ -55,6 +55,8 @@ services:
     environment:
       TEMPORAL_ENDPOINT: temporal:7233
       TEMPORAL_NAMESPACE: version-guard-dev
+      TEMPORAL_METRICS_ENABLED: ${TEMPORAL_METRICS_ENABLED:-true}
+      TEMPORAL_METRICS_LISTEN_ADDRESS: 0.0.0.0:9090
       S3_BUCKET: version-guard-snapshots
       AWS_REGION: us-east-1
       AWS_ACCESS_KEY_ID: minioadmin
@@ -70,3 +72,4 @@ services:
       SCHEDULE_JITTER: ${SCHEDULE_JITTER:-5m}
     ports:
       - "8081:8081"
+      - "9090:9090"
12 changes: 12 additions & 0 deletions go.mod
@@ -9,8 +9,11 @@ require (
 	github.com/aws/aws-sdk-go-v2/service/s3 v1.99.0
 	github.com/google/uuid v1.6.0
 	github.com/pkg/errors v0.9.1
+	github.com/prometheus/client_golang v1.23.2
 	github.com/stretchr/testify v1.11.1
+	github.com/uber-go/tally/v4 v4.1.17
 	go.temporal.io/sdk v1.42.0
+	go.temporal.io/sdk/contrib/tally v0.2.0
 	golang.org/x/sync v0.19.0
 	google.golang.org/grpc v1.79.3
 	gopkg.in/yaml.v3 v3.0.1
@@ -33,17 +36,26 @@ require (
 	github.com/aws/aws-sdk-go-v2/service/ssooidc v1.35.19 // indirect
 	github.com/aws/aws-sdk-go-v2/service/sts v1.41.10 // indirect
 	github.com/aws/smithy-go v1.24.2 // indirect
+	github.com/beorn7/perks v1.0.1 // indirect
+	github.com/cespare/xxhash/v2 v2.3.0 // indirect
 	github.com/davecgh/go-spew v1.1.1 // indirect
 	github.com/facebookgo/clock v0.0.0-20150410010913-600d898af40a // indirect
 	github.com/gogo/protobuf v1.3.2 // indirect
 	github.com/golang/mock v1.6.0 // indirect
 	github.com/grpc-ecosystem/go-grpc-middleware/v2 v2.3.2 // indirect
 	github.com/grpc-ecosystem/grpc-gateway/v2 v2.22.0 // indirect
+	github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822 // indirect
 	github.com/nexus-rpc/sdk-go v0.6.0 // indirect
 	github.com/pmezard/go-difflib v1.0.0 // indirect
+	github.com/prometheus/client_model v0.6.2 // indirect
+	github.com/prometheus/common v0.66.1 // indirect
+	github.com/prometheus/procfs v0.16.1 // indirect
 	github.com/robfig/cron v1.2.0 // indirect
 	github.com/stretchr/objx v0.5.2 // indirect
+	github.com/twmb/murmur3 v1.1.8 // indirect
 	go.temporal.io/api v1.62.7 // indirect
+	go.uber.org/atomic v1.11.0 // indirect
+	go.yaml.in/yaml/v2 v2.4.2 // indirect
 	golang.org/x/net v0.49.0 // indirect
 	golang.org/x/sys v0.40.0 // indirect
 	golang.org/x/text v0.33.0 // indirect