Building an Observability Module for Platform Engineering with Code to Cloud #89

@kevinevans1

Overview

As part of the Code to Cloud initiative, our platform engineering team needed to standardize and simplify how we deliver observability across internal development platforms. Observability is a critical component of any modern platform: it lets teams monitor, troubleshoot, and optimize systems proactively and at scale.

We set out to build an Observability Module that would serve as a reusable, extensible, and accessible starting point for platform teams adopting CNCF-native observability tooling.

What Is Observability in Platform Engineering?

Observability is the ability to understand the internal state of a system from the data it produces, primarily logs, metrics, and traces. In platform engineering, observability is a key design principle that helps platform teams and internal developers:
• Detect performance bottlenecks or failures before customers do
• Self-serve debugging without infrastructure access
• Maintain service level objectives (SLOs)
• Build reliable, scalable software on top of the platform
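To make the SLO bullet concrete, here is a minimal sketch of the error-budget arithmetic behind a request-based SLO. It is an illustration only, not code from the module: the function name `error_budget` and its return fields are hypothetical, and it assumes the SLO is expressed as a fraction of successful requests over a fixed window.

```python
def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Summarize error-budget consumption for a request-based SLO.

    slo_target is the required success ratio, e.g. 0.999 for "three nines".
    All names here are illustrative assumptions, not part of the module.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be between 0 and 1 (exclusive)")
    if total_requests < 0 or not 0 <= failed_requests <= total_requests:
        raise ValueError("request counts are inconsistent")

    # The budget is the number of requests that may fail without breaching the SLO.
    allowed_failures = (1.0 - slo_target) * total_requests
    consumed = failed_requests / allowed_failures if allowed_failures else 0.0
    return {
        "allowed_failures": allowed_failures,
        "budget_consumed": consumed,          # 1.0 means the budget is exhausted
        "budget_remaining": 1.0 - consumed,
        "slo_met": failed_requests <= allowed_failures,
    }


if __name__ == "__main__":
    # 250 failures out of 1,000,000 requests against a 99.9% target.
    report = error_budget(0.999, 1_000_000, 250)
    print(f"budget consumed: {report['budget_consumed']:.0%}, SLO met: {report['slo_met']}")
```

In practice the request counts would come from the metrics backend rather than being passed in by hand; the point is only that an SLO turns into a budget that telemetry can track.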

Observability is not a single tool; it is a collection of telemetry systems working together.
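One way to picture telemetry systems "working together" is a log line and a trace span that share the same trace ID, so a dashboard can jump from one to the other. The sketch below is hand-rolled with the standard library and is not the OpenTelemetry SDK. The helper names and JSON field names are assumptions for illustration. The ID sizes and the `traceparent` layout follow the W3C Trace Context format.

```python
import json
import secrets
import time


def new_trace_context() -> dict:
    """Create IDs sized per W3C Trace Context: 16-byte trace ID, 8-byte span ID."""
    return {"trace_id": secrets.token_hex(16), "span_id": secrets.token_hex(8)}


def traceparent(ctx: dict) -> str:
    """Build a traceparent header value: version-traceid-spanid-flags (01 = sampled)."""
    return f"00-{ctx['trace_id']}-{ctx['span_id']}-01"


def log_record(ctx: dict, level: str, message: str) -> str:
    """Emit a structured log line that carries the trace and span IDs.

    The field names (ts, level, message, trace_id, span_id) are hypothetical;
    use whatever your log pipeline is configured to parse.
    """
    return json.dumps({
        "ts": time.time(),
        "level": level,
        "message": message,
        "trace_id": ctx["trace_id"],
        "span_id": ctx["span_id"],
    })


if __name__ == "__main__":
    ctx = new_trace_context()
    print(traceparent(ctx))                       # propagated to downstream services
    print(log_record(ctx, "error", "checkout failed"))  # shipped by the log agent
```

With IDs attached like this, a log query that returns an error line also yields the trace ID needed to look up the corresponding trace.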

Goals of the Observability Module
1. Comply with CNCF standards
2. Support Kubernetes-first environments
3. Educate users on observability tooling and patterns
4. Be easy to deploy, extend, and integrate into GitOps workflows
5. Be accessible to platform engineers, developers, and SREs

Tools Used (Aligned with CNCF Landscape)

The module leverages these open-source, cloud-native tools:
• Metrics: Prometheus
• Tracing: OpenTelemetry with Jaeger
• Logging: Fluent Bit shipping logs to Loki
• Visualization: Grafana
• Pipeline Management: OpenTelemetry Collector

Features of the Module
• Pre-configured dashboards and alert rules
• Terraform and Helm support for automated deployments
• Clean code with inline documentation and usage examples
• Default configurations for Kubernetes, but extensible to other environments
• Diagrams and markdown guides for easy onboarding

Benefits
• Standardizes the telemetry stack across all environments
• Reduces time-to-observability for new services
• Improves incident response with richer context
• Enables self-service debugging for developers
• Encourages an observability-first mindset in platform teams

Where to Find It

Repository: https://github.com/codetocloud/platform-observability-module
Documentation: docs/observability/README.md within the repo
