Overview
As part of the Code to Cloud initiative, our platform engineering team needed to standardize and simplify how we deliver observability across internal development platforms. Observability is a critical component of any modern platform because it lets teams monitor, troubleshoot, and optimize systems proactively and at scale.
We set out to build an Observability Module that would serve as a reusable, extensible, and accessible starting point for platform teams adopting CNCF-native observability tooling.
⸻
What Is Observability in Platform Engineering?
Observability is the ability to understand the internal state of a system from the data it produces, primarily logs, metrics, and traces. In platform engineering, observability is a key design principle that helps platform teams and internal developers:
• Detect performance bottlenecks or failures before customers do
• Self-serve debugging without infrastructure access
• Maintain service level objectives (SLOs)
• Build reliable, scalable software on top of the platform
Observability is not a single tool; it is a collection of telemetry systems working together.
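To make the SLO bullet above concrete, here is a minimal sketch of the error-budget arithmetic behind an availability SLO. The 99.9% target, the 30-day window, and the function names are illustrative assumptions, not values or APIs defined by the module.

```python
from datetime import timedelta


def error_budget(slo_target: float, window: timedelta) -> timedelta:
    """Return the downtime an availability SLO allows over a window.

    slo_target is a fraction such as 0.999 (a hypothetical example value).
    """
    if not 0.0 < slo_target <= 1.0:
        raise ValueError("slo_target must be in (0, 1]")
    return window * (1.0 - slo_target)


def budget_remaining(slo_target: float, window: timedelta, downtime: timedelta) -> float:
    """Return the unspent fraction of the error budget (negative if overspent)."""
    budget = error_budget(slo_target, window)
    if budget == timedelta(0):
        raise ValueError("a 100% target leaves no error budget")
    return 1.0 - downtime / budget


if __name__ == "__main__":
    window = timedelta(days=30)
    budget = error_budget(0.999, window)
    print(f"Allowed downtime: {budget.total_seconds() / 60:.1f} minutes")
    remaining = budget_remaining(0.999, window, timedelta(minutes=10))
    print(f"Budget remaining: {remaining:.0%}")
```

Under these assumed numbers the budget is roughly 43 minutes per 30 days. In practice the downtime figure would come from the metrics backend rather than being passed in by hand.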
⸻
Goals of the Observability Module
1. Align with CNCF-hosted projects and open standards such as OpenTelemetry
2. Support Kubernetes-first environments
3. Educate users on observability tooling and patterns
4. Be easy to deploy, extend, and integrate into GitOps workflows
5. Be accessible to platform engineers, developers, and SREs
⸻
Tools Used (Aligned with CNCF Landscape)
The module leverages these open-source, cloud-native tools:
• Metrics: Prometheus
• Tracing: OpenTelemetry with Jaeger
• Logging: Fluent Bit shipping logs to Loki
• Visualization: Grafana
• Pipeline Management: OpenTelemetry Collector
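One way to picture how these tools fit together is as a per-signal path from collection to storage to visualization. The sketch below encodes that idea as plain data. The tool names come from the list above, but the exact pairing of collector and backend for each signal (for example, metrics passing through the OpenTelemetry Collector) and the `SignalPath` and `describe_path` names are assumptions made for illustration, not the module's actual wiring.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignalPath:
    collector: str  # component that gathers or forwards the signal
    backend: str    # system that stores and queries it
    viewer: str     # where users look at it


# Assumed routing: tool names are from the list above, the pairings are not
# confirmed by the module and may differ in a real deployment.
SIGNAL_PATHS = {
    "metrics": SignalPath("OpenTelemetry Collector", "Prometheus", "Grafana"),
    "traces": SignalPath("OpenTelemetry Collector", "Jaeger", "Grafana"),
    "logs": SignalPath("Fluent Bit", "Loki", "Grafana"),
}


def describe_path(signal: str) -> str:
    """Return a one-line description of where a signal type flows."""
    path = SIGNAL_PATHS.get(signal)
    if path is None:
        known = ", ".join(sorted(SIGNAL_PATHS))
        raise KeyError(f"unknown signal {signal!r}; expected one of: {known}")
    return f"{signal}: {path.collector} -> {path.backend} -> {path.viewer}"


if __name__ == "__main__":
    for name in SIGNAL_PATHS:
        print(describe_path(name))
```

Keeping the routing in one table like this makes it easy to see which component to inspect first when a given signal goes missing.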
⸻
Features of the Module
• Pre-configured dashboards and alert rules
• Terraform and Helm support for automated deployments
• Clean code with inline documentation and usage examples
• Default configurations for Kubernetes, but extensible to other environments
• Diagrams and markdown guides for easy onboarding
⸻
Benefits
• Standardizes the telemetry stack across all environments
• Reduces time-to-observability for new services
• Improves incident response with richer context
• Enables self-service debugging for developers
• Encourages an observability-first mindset in platform teams
⸻
Where to Find It
Repository: https://github.com/codetocloud/platform-observability-module
Documentation: docs/observability/README.md within the repo