diff --git a/docs/en/changes/changes.md b/docs/en/changes/changes.md index c361f813b1e7..3526512d1c0f 100644 --- a/docs/en/changes/changes.md +++ b/docs/en/changes/changes.md @@ -169,7 +169,13 @@ * Support Virtual-GenAI monitoring. * Fix on-demand pod log parsing failure by replacing invalid `DateTimeFormatter` pattern with `ISO_OFFSET_DATE_TIME`. * Fix Zipkin receiver compatibility with application/x-protobuf Content-Type. - +* Support Envoy AI Gateway observability (SWIP-10): new `ENVOY_AI_GATEWAY` layer with MAL/LAL rules + for GenAI metrics (token usage, latency, TTFT, TPOT) and access log sampling via OTLP. +* OTel metric receiver: convert data point attribute dots to underscores (consistent with resource attributes + and metric names). Label mappings are now fallback-only — explicit `job_name` in resource attributes takes + precedence over the `service.name` fallback. +* OTel log handler: prefer `service.instance.id` (OTel spec) over `service.instance` with fallback. +* Add `SampleFamily.debugDump()` for MAL debugging. #### UI * Fix the missing icon in new native trace view. diff --git a/docs/en/concepts-and-designs/lal.md b/docs/en/concepts-and-designs/lal.md index 974f29dee4d2..db86bd836508 100644 --- a/docs/en/concepts-and-designs/lal.md +++ b/docs/en/concepts-and-designs/lal.md @@ -8,6 +8,19 @@ The LAL config files are in YAML format, and are located under directory `lal`. set `log-analyzer/default/lalFiles` in the `application.yml` file or set environment variable `SW_LOG_LAL_FILES` to activate specific LAL config files. +## OTLP log attribute mapping + +When logs arrive via the OTLP receiver, resource attributes are mapped to `LogData` fields: + +| Resource attribute | LogData field | Notes | +|---|---|---| +| `service.name` | `service` | SkyWalking service name | +| `service.instance.id` | `serviceInstance` | OTel standard ([spec](https://opentelemetry.io/docs/specs/semconv/resource/#service)). 
Falls back to `service.instance` for backward compatibility. | +| `service.layer` | `layer` | Routes to the LAL rule with matching `layer` declaration | + +Log record attributes are available via `tag("attribute_name")` in LAL rules. Attribute keys +retain their original names (dots are NOT converted to underscores in log attributes). + ## Layer Layer should be declared in the LAL script to represent the analysis scope of the logs. diff --git a/docs/en/setup/backend/backend-envoy-ai-gateway-monitoring.md b/docs/en/setup/backend/backend-envoy-ai-gateway-monitoring.md new file mode 100644 index 000000000000..e8620edb9d8b --- /dev/null +++ b/docs/en/setup/backend/backend-envoy-ai-gateway-monitoring.md @@ -0,0 +1,104 @@ +# Envoy AI Gateway Monitoring + +## Envoy AI Gateway observability via OTLP + +[Envoy AI Gateway](https://aigateway.envoyproxy.io/) is a gateway/proxy for AI/LLM API traffic +(OpenAI, Anthropic, AWS Bedrock, Azure OpenAI, Google Gemini, etc.) built on top of Envoy Proxy. +It natively emits GenAI metrics and access logs via OTLP, following +[OpenTelemetry GenAI Semantic Conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/). + +SkyWalking receives OTLP metrics and logs directly on its gRPC port (11800) — no OpenTelemetry +Collector is needed between the AI Gateway and SkyWalking OAP. + +### Prerequisites +- [Envoy AI Gateway](https://aigateway.envoyproxy.io/) deployed. See the + [Envoy AI Gateway getting started](https://aigateway.envoyproxy.io/docs/getting-started/) for installation. + +### Data flow +1. Envoy AI Gateway processes LLM API requests and records GenAI metrics (token usage, latency, TTFT, TPOT). +2. The AI Gateway pushes metrics and access logs via OTLP gRPC to SkyWalking OAP. +3. SkyWalking OAP parses metrics with [MAL](../../concepts-and-designs/mal.md) rules and access logs + with [LAL](../../concepts-and-designs/lal.md) rules. 
+
+### Set up
+
+The MAL rules (`envoy-ai-gateway/*`) and LAL rules (`envoy-ai-gateway`) are enabled by default
+in SkyWalking OAP. No OAP-side configuration is needed.
+
+Configure the AI Gateway to push OTLP to SkyWalking by setting these environment variables:
+
+| Env Var | Value | Purpose |
+|---------|-------|---------|
+| `OTEL_SERVICE_NAME` | Per-deployment gateway name (e.g., `my-ai-gateway`) | SkyWalking service name |
+| `OTEL_EXPORTER_OTLP_ENDPOINT` | `http://skywalking-oap:11800` | SkyWalking OAP gRPC receiver |
+| `OTEL_EXPORTER_OTLP_PROTOCOL` | `grpc` | OTLP transport |
+| `OTEL_METRICS_EXPORTER` | `otlp` | Enable OTLP metrics push |
+| `OTEL_LOGS_EXPORTER` | `otlp` | Enable OTLP access log push |
+| `OTEL_RESOURCE_ATTRIBUTES` | See below | Routing + instance + layer |
+
+**Required resource attributes** (in `OTEL_RESOURCE_ATTRIBUTES`):
+- `job_name=envoy-ai-gateway` — Fixed routing tag for MAL/LAL rules. Same for all AI Gateway deployments.
+- `service.instance.id=<pod-name>` — Instance identity. In Kubernetes, use the pod name via the Downward API.
+- `service.layer=ENVOY_AI_GATEWAY` — Routes access logs to the AI Gateway LAL rules.
+
+**Example:**
+```bash
+OTEL_SERVICE_NAME=my-ai-gateway
+OTEL_EXPORTER_OTLP_ENDPOINT=http://skywalking-oap:11800
+OTEL_EXPORTER_OTLP_PROTOCOL=grpc
+OTEL_METRICS_EXPORTER=otlp
+OTEL_LOGS_EXPORTER=otlp
+OTEL_RESOURCE_ATTRIBUTES=job_name=envoy-ai-gateway,service.instance.id=pod-abc123,service.layer=ENVOY_AI_GATEWAY
+```
+
+### Supported Metrics
+
+SkyWalking observes the AI Gateway as a `LAYER: ENVOY_AI_GATEWAY` service. Each gateway deployment
+is a service, each pod is an instance. Metrics include per-provider and per-model breakdowns.
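Several of the panels below are streaming-latency measures (TTFT, TPOT). As a rough illustration of how they relate, here is a sketch with assumed timestamps, not the gateway's actual implementation:

```python
def streaming_latency_stats(request_start, first_token_time, last_token_time, output_tokens):
    """Illustrative TTFT/TPOT computation for one streaming LLM response.

    All times are in seconds; output_tokens is the completion token count.
    """
    # Time To First Token: delay until the stream produces its first token.
    ttft = first_token_time - request_start
    # Time Per Output Token: average inter-token gap after the first token.
    tpot = (last_token_time - first_token_time) / max(output_tokens - 1, 1)
    return ttft, tpot
```

For a request that starts at t=0, emits its first token at 0.5 s, finishes at 2.5 s, and produces 5 tokens, this gives TTFT = 0.5 s and TPOT = 0.5 s/token.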
+ +#### Service Metrics + +| Monitoring Panel | Unit | Metric Name | Description | +|---|---|---|---| +| Request CPM | calls/min | meter_envoy_ai_gw_request_cpm | Requests per minute | +| Request Latency Avg | ms | meter_envoy_ai_gw_request_latency_avg | Average request duration | +| Request Latency Percentile | ms | meter_envoy_ai_gw_request_latency_percentile | P50/P75/P90/P95/P99 | +| Input Token Rate | tokens/min | meter_envoy_ai_gw_input_token_rate | Input (prompt) tokens per minute | +| Output Token Rate | tokens/min | meter_envoy_ai_gw_output_token_rate | Output (completion) tokens per minute | +| TTFT Avg | ms | meter_envoy_ai_gw_ttft_avg | Time to First Token (streaming only) | +| TTFT Percentile | ms | meter_envoy_ai_gw_ttft_percentile | P50/P75/P90/P95/P99 TTFT | +| TPOT Avg | ms | meter_envoy_ai_gw_tpot_avg | Time Per Output Token (streaming only) | +| TPOT Percentile | ms | meter_envoy_ai_gw_tpot_percentile | P50/P75/P90/P95/P99 TPOT | + +#### Provider Breakdown Metrics + +| Monitoring Panel | Unit | Metric Name | Description | +|---|---|---|---| +| Provider Request CPM | calls/min | meter_envoy_ai_gw_provider_request_cpm | Requests by provider | +| Provider Token Rate | tokens/min | meter_envoy_ai_gw_provider_token_rate | Token rate by provider | +| Provider Latency Avg | ms | meter_envoy_ai_gw_provider_latency_avg | Latency by provider | + +#### Model Breakdown Metrics + +| Monitoring Panel | Unit | Metric Name | Description | +|---|---|---|---| +| Model Request CPM | calls/min | meter_envoy_ai_gw_model_request_cpm | Requests by model | +| Model Token Rate | tokens/min | meter_envoy_ai_gw_model_token_rate | Token rate by model | +| Model Latency Avg | ms | meter_envoy_ai_gw_model_latency_avg | Latency by model | +| Model TTFT Avg | ms | meter_envoy_ai_gw_model_ttft_avg | TTFT by model | +| Model TPOT Avg | ms | meter_envoy_ai_gw_model_tpot_avg | TPOT by model | + +#### Instance Metrics + +All service-level metrics are also available per instance 
(pod) with `meter_envoy_ai_gw_instance_` prefix, +including per-provider and per-model breakdowns. + +### Access Log Sampling + +The LAL rules apply a sampling policy to reduce storage: +- **Error responses** (HTTP status >= 400) — always persisted. +- **Upstream failures** — always persisted. +- **High token cost** (>= 10,000 total tokens) — persisted for cost anomaly detection. +- Normal successful responses with low token counts are dropped. + +The token threshold can be adjusted in `lal/envoy-ai-gateway.yaml`. diff --git a/docs/en/setup/backend/marketplace.md b/docs/en/setup/backend/marketplace.md index 874ff29b81c4..a1d0b1ab652f 100644 --- a/docs/en/setup/backend/marketplace.md +++ b/docs/en/setup/backend/marketplace.md @@ -12,6 +12,7 @@ SkyWalking provides ready-to-use monitoring capabilities for a wide range of tec - **Infrastructure** - Linux and Windows server monitoring - **Cloud Services** - AWS EKS, S3, DynamoDB, API Gateway, and more - **Gateways** - Nginx, APISIX, Kong monitoring +- **GenAI** - [Virtual GenAI](../service-agent/virtual-genai.md) for agent-based LLM call monitoring, [Envoy AI Gateway](backend-envoy-ai-gateway-monitoring.md) for infrastructure-side AI traffic observability - **Databases** - MySQL, PostgreSQL, Redis, Elasticsearch, MongoDB, ClickHouse, and more - **Message Queues** - Kafka, RabbitMQ, Pulsar, RocketMQ, ActiveMQ - **Browser** - Real user monitoring for web applications diff --git a/docs/en/setup/backend/opentelemetry-receiver.md b/docs/en/setup/backend/opentelemetry-receiver.md index e17e309b87cc..ac7924ea40ba 100644 --- a/docs/en/setup/backend/opentelemetry-receiver.md +++ b/docs/en/setup/backend/opentelemetry-receiver.md @@ -26,7 +26,26 @@ The receiver adds label with key `node_identifier_host_name` to the collected da and its value is from `net.host.name` (or `host.name` for some OTLP versions) resource attributes defined in OpenTelemetry proto, for identification of the metric data. 
-**Notice:** In the resource scope, dots (.) in the attributes' key names are converted to underscores (_), whereas in the metrics scope, they are not converted. +**Label name conversion:** Dots (`.`) in attribute key names are converted to underscores (`_`) for both +resource attributes and data point (metric-level) attributes. For example, `gen_ai.token.type` becomes +`gen_ai_token_type` in MAL rules. Metric names also undergo the same conversion (e.g., +`gen_ai.client.token.usage` becomes `gen_ai_client_token_usage`). + +**Fallback label mappings:** The following resource attributes are copied to alternative label names +if the target does not already exist. These are fallback-only — if the target label is already present +in the resource attributes, the fallback is skipped. + +| Source | Target | Notes | +|---|---|---| +| `service.name` | `job_name` | The [OTel Collector Prometheus Receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/prometheusreceiver/README.md) automatically converts the Prometheus `job` label to `service.name`. This fallback ensures it is available as `job_name` for MAL rule filtering. | +| `net.host.name` | `node_identifier_host_name` | Legacy: used by VM/Windows MAL rules | +| `host.name` | `node_identifier_host_name` | Legacy: used by VM/Windows MAL rules | + +When `job_name` is set explicitly in `OTEL_RESOURCE_ATTRIBUTES` (e.g., by Envoy AI Gateway), +it takes precedence and the `service.name` fallback is skipped. + +**Note:** The `net.host.name` and `host.name` mappings are legacy. New integrations should use +the natural dot-to-underscore conversion (e.g., `host.name` → `host_name` in MAL rules). 
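The conversion and fallback behavior can be sketched as follows. This is a hypothetical Python helper mirroring the rules above, not the OAP implementation:

```python
def to_mal_labels(resource_attrs, datapoint_attrs):
    """Sketch of the receiver's label-name handling.

    - Fallback mappings copy a source attribute to a target label only
      when the target is not already present in the resource attributes.
    - Dots in attribute keys become underscores for both resource and
      data point attributes.
    """
    fallbacks = [
        ("service.name", "job_name"),
        ("net.host.name", "node_identifier_host_name"),
        ("host.name", "node_identifier_host_name"),
    ]
    labels = {}
    for source, target in fallbacks:
        if source in resource_attrs and target not in resource_attrs:
            labels[target] = resource_attrs[source]
    for attrs in (resource_attrs, datapoint_attrs):
        for key, value in attrs.items():
            labels[key.replace(".", "_")] = value
    return labels
```

With only `service.name=aigw` present, `job_name` falls back to `aigw`; when `job_name` is set explicitly, the explicit value wins.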
| Description | Configuration File | Data Source | |-----------------------------------------|-----------------------------------------------------|------------------------------------------------------------------------------------------------------------------------| diff --git a/docs/en/swip/SWIP-10/SWIP.md b/docs/en/swip/SWIP-10/SWIP.md index 1910b140e665..1c4ca4d19600 100644 --- a/docs/en/swip/SWIP-10/SWIP.md +++ b/docs/en/swip/SWIP-10/SWIP.md @@ -62,43 +62,42 @@ This is a **normal** layer (`isNormal=true`) because the AI Gateway is a real, i #### `job_name` — Routing Tag for MAL/LAL Rules -SkyWalking's OTel receiver maps the OTLP resource attribute `service.name` to the internal tag `job_name`. -This tag is used by MAL rule filters to route metrics to the correct rule set. All Envoy AI Gateway -deployments must use a fixed `OTEL_SERVICE_NAME` value so that SkyWalking can identify the traffic: +The `job_name` resource attribute is set explicitly in `OTEL_RESOURCE_ATTRIBUTES` to a fixed value +for all AI Gateway deployments. MAL rule filters use it to route metrics to the correct rule set: -```bash -OTEL_SERVICE_NAME=envoy-ai-gateway -``` - -This becomes `job_name=envoy-ai-gateway` in MAL, and the rules filter on it: ```yaml filter: "{ tags -> tags.job_name == 'envoy-ai-gateway' }" ``` -`job_name` is NOT the SkyWalking service name — it is only used for metric/log routing. +`job_name` is NOT the SkyWalking service name — it is only used for metric/log routing. The +SkyWalking service name comes from `OTEL_SERVICE_NAME` (standard OTel env var), which is set +per deployment. 
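In other words, the same resource attributes carry two orthogonal pieces of information. A minimal sketch (hypothetical helper, not OAP code):

```python
def split_routing_from_identity(resource_attrs):
    """Sketch: job_name selects the MAL/LAL rule set; service.name names the entity."""
    routed = resource_attrs.get("job_name") == "envoy-ai-gateway"
    service_name = resource_attrs.get("service.name")
    return routed, service_name
```

Two deployments with different `service.name` values are routed by the same rules but appear as separate SkyWalking services.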
#### Service and Instance Mapping | SkyWalking Entity | Source | Example | |---|---|---| -| **Service** | `aigw.service` resource attribute (K8s Deployment/Service name, set via CRD) | `envoy-ai-gateway-basic` | -| **Service Instance** | `service.instance.id` resource attribute (pod name, set via CRD + Downward API) | `aigw-pod-7b9f4d8c5` | +| **Service** | `OTEL_SERVICE_NAME` / `service.name` (per-deployment gateway name) | `my-ai-gateway` | +| **Service Instance** | `service.instance.id` resource attribute (pod name, set via Downward API) | `aigw-pod-7b9f4d8c5` | -Each Kubernetes Gateway deployment is a separate SkyWalking **service**. Each pod (ext_proc replica) is a -**service instance**. Neither attribute is emitted by the AI Gateway by default — both must be explicitly -set via `OTEL_RESOURCE_ATTRIBUTES` in the `GatewayConfig` CRD (see below). +Each Kubernetes Gateway deployment sets its own `OTEL_SERVICE_NAME` (the standard OTel env var) as the +SkyWalking **service** name. Each pod is a **service instance** identified by `service.instance.id`. -The **layer** (`ENVOY_AI_GATEWAY`) is set by MAL/LAL rules based on the `job_name` filter, not by the -client. This follows the same pattern as other SkyWalking OTel integrations (e.g., ActiveMQ, K8s). +The `job_name` resource attribute is set explicitly to the fixed value `envoy-ai-gateway` for MAL/LAL +rule routing. This is separate from `service.name` — all AI Gateway deployments share the same +`job_name` for routing, but each has its own `service.name` for entity identity. + +The **layer** (`ENVOY_AI_GATEWAY`) is set via `service.layer` resource attribute and used by LAL for +log routing. MAL rules use `job_name` for metric routing. Provider and model are **metric-level labels**, not separate entities in this layer. 
They are used for fine-grained metric breakdowns within the gateway service dashboards rather than being modeled as separate services (unlike the agent-based `VIRTUAL_GENAI` layer where provider=service, model=instance). -The MAL `expSuffix` uses the `aigw_service` tag (dots converted to underscores by OTel receiver) as the -SkyWalking service name and `service_instance_id` as the instance name: +The MAL `expSuffix` uses the `service_name` tag as the SkyWalking service name and `service_instance_id` +as the instance name: ```yaml -expSuffix: service(['aigw_service'], Layer.ENVOY_AI_GATEWAY).instance(['aigw_service', 'service_instance_id']) +expSuffix: service(['service_name'], Layer.ENVOY_AI_GATEWAY).instance(['service_name', 'service_instance_id']) ``` #### Complete Kubernetes Setup Example @@ -127,36 +126,33 @@ spec: extProc: kubernetes: env: - # job_name — fixed value for MAL/LAL rule routing (same for ALL AI Gateway deployments) + # SkyWalking service name = Gateway CRD name (auto-resolved from pod label) + # OTEL_SERVICE_NAME is the standard OTel env var for service.name + - name: GATEWAY_NAME + valueFrom: + fieldRef: + fieldPath: metadata.labels['gateway.envoyproxy.io/owning-gateway-name'] - name: OTEL_SERVICE_NAME - value: "envoy-ai-gateway" + value: "$(GATEWAY_NAME)" # OTLP endpoint — SkyWalking OAP gRPC receiver - name: OTEL_EXPORTER_OTLP_ENDPOINT value: "http://skywalking-oap.skywalking:11800" - name: OTEL_EXPORTER_OTLP_PROTOCOL value: "grpc" - # Enable OTLP for both metrics and access logs - name: OTEL_METRICS_EXPORTER value: "otlp" - name: OTEL_LOGS_EXPORTER value: "otlp" - # Gateway name = Gateway CRD metadata.name (e.g., "my-ai-gateway") - # Read from pod label gateway.envoyproxy.io/owning-gateway-name, - # which is auto-set by the Envoy Gateway controller on every envoy pod. 
- - name: GATEWAY_NAME - valueFrom: - fieldRef: - fieldPath: metadata.labels['gateway.envoyproxy.io/owning-gateway-name'] - # Pod name (e.g., "envoy-default-my-ai-gateway-76d02f2b-xxx") + # Pod name for instance identity - name: POD_NAME valueFrom: fieldRef: fieldPath: metadata.name - # aigw.service → SkyWalking service name (= Gateway CRD name, auto-resolved) - # service.instance.id → SkyWalking instance name (= pod name, auto-resolved) - # $(VAR) substitution references the valueFrom env vars defined above. + # job_name — fixed routing tag for MAL/LAL rules (same for ALL AI Gateway deployments) + # service.instance.id — SkyWalking instance name (= pod name) + # service.layer — routes logs to ENVOY_AI_GATEWAY LAL rules - name: OTEL_RESOURCE_ATTRIBUTES - value: "aigw.service=$(GATEWAY_NAME),service.instance.id=$(POD_NAME)" + value: "job_name=envoy-ai-gateway,service.instance.id=$(POD_NAME),service.layer=ENVOY_AI_GATEWAY" --- # 3. Gateway — references the GatewayConfig via annotation apiVersion: gateway.networking.k8s.io/v1 @@ -227,15 +223,16 @@ spec: | Env Var / Resource Attribute | SkyWalking Concept | Example Value | |---|---|---| -| `OTEL_SERVICE_NAME` | `job_name` (MAL/LAL rule routing) | `envoy-ai-gateway` (fixed for all deployments) | -| `aigw.service` | Service name | `my-ai-gateway` (auto-resolved from gateway name label) | -| `service.instance.id` | Instance name | `envoy-default-my-ai-gateway-...` (auto-resolved from pod name) | +| `OTEL_SERVICE_NAME` | Service name | `my-ai-gateway` (auto-resolved from Gateway CRD name) | +| `job_name` (in `OTEL_RESOURCE_ATTRIBUTES`) | MAL/LAL rule routing | `envoy-ai-gateway` (fixed for all deployments) | +| `service.instance.id` (in `OTEL_RESOURCE_ATTRIBUTES`) | Instance name | `envoy-default-my-ai-gateway-...` (auto-resolved from pod name) | +| `service.layer` (in `OTEL_RESOURCE_ATTRIBUTES`) | LAL log routing | `ENVOY_AI_GATEWAY` (fixed) | **No manual per-gateway configuration needed** for service and instance names: 
- `GATEWAY_NAME` is auto-resolved from the pod label `gateway.envoyproxy.io/owning-gateway-name`, which is set automatically by the Envoy Gateway controller on every envoy pod. +- `OTEL_SERVICE_NAME` uses `$(GATEWAY_NAME)` substitution to set the per-deployment service name. - `POD_NAME` is auto-resolved from the pod name via the Downward API. -- Both are injected into `OTEL_RESOURCE_ATTRIBUTES` via standard Kubernetes `$(VAR)` substitution. The `GatewayConfig.spec.extProc.kubernetes.env` field accepts full `corev1.EnvVar` objects (including `valueFrom`), merged into the ext_proc container by the gateway mutator webhook. Verified on Kind @@ -247,8 +244,15 @@ The ext_proc runs in-process (not as a subprocess), so there is no env var propa ### 3. MAL Rules for OTLP Metrics -Create `oap-server/server-starter/src/main/resources/otel-rules/envoy-ai-gateway/` with MAL rules consuming -the 4 GenAI metrics from Envoy AI Gateway. +Create `oap-server/server-starter/src/main/resources/otel-rules/envoy-ai-gateway/` with 2 MAL rule files +consuming the 4 GenAI metrics from Envoy AI Gateway. Since `expSuffix` is file-level, service and +instance scopes need separate files. Provider and model breakdowns share the same `expSuffix` as their +parent scope, so they are included in the same file. 
+ +| File | `expSuffix` | Contains | +|---|---|---| +| `gateway-service.yaml` | `service(['service_name'], Layer.ENVOY_AI_GATEWAY)` | Service aggregates + per-provider breakdown + per-model breakdown | +| `gateway-instance.yaml` | `instance(['service_name'], ['service_instance_id'], Layer.ENVOY_AI_GATEWAY)` | Instance aggregates + per-provider breakdown + per-model breakdown | All MAL rule files use the `job_name` filter to match only AI Gateway traffic: ```yaml @@ -282,7 +286,7 @@ filter: "{ tags -> tags.job_name == 'envoy-ai-gateway' }" | Time Per Output Token Percentile | ms | `meter_envoy_ai_gw_tpot_percentile` | P50/P75/P90/P95/P99 inter-token latency | | Estimated Cost | cost/min | `meter_envoy_ai_gw_estimated_cost` | Estimated cost per minute (from token counts × config pricing) | -**Per-provider breakdown metrics (labeled, within gateway service):** +**Per-provider breakdown metrics (service scope):** | Monitoring Panel | Unit | Metric Name | Description | |---|---|---|---| @@ -290,7 +294,7 @@ filter: "{ tags -> tags.job_name == 'envoy-ai-gateway' }" | Provider Token Usage | tokens/min | `meter_envoy_ai_gw_provider_token_rate` | Token rate by provider and token type | | Provider Latency Avg | ms | `meter_envoy_ai_gw_provider_latency_avg` | Average latency by provider | -**Per-model breakdown metrics (labeled, within gateway service):** +**Per-model breakdown metrics (service scope):** | Monitoring Panel | Unit | Metric Name | Description | |---|---|---|---| @@ -300,6 +304,42 @@ filter: "{ tags -> tags.job_name == 'envoy-ai-gateway' }" | Model TTFT Avg | ms | `meter_envoy_ai_gw_model_ttft_avg` | Average TTFT by model | | Model TPOT Avg | ms | `meter_envoy_ai_gw_model_tpot_avg` | Average inter-token latency by model | +**Instance-level (per-pod) aggregate metrics:** + +Same metrics as service-level but scoped to individual pods via `expSuffix: service([...]).instance([...])`. 
+ +| Monitoring Panel | Unit | Metric Name | Description | +|---|---|---|---| +| Request CPM | count/min | `meter_envoy_ai_gw_instance_request_cpm` | Requests per minute per pod | +| Request Latency Avg | ms | `meter_envoy_ai_gw_instance_request_latency_avg` | Average request duration per pod | +| Request Latency Percentile | ms | `meter_envoy_ai_gw_instance_request_latency_percentile` | P50/P75/P90/P95/P99 per pod | +| Input Tokens Rate | tokens/min | `meter_envoy_ai_gw_instance_input_token_rate` | Input tokens per minute per pod | +| Output Tokens Rate | tokens/min | `meter_envoy_ai_gw_instance_output_token_rate` | Output tokens per minute per pod | +| Total Tokens Rate | tokens/min | `meter_envoy_ai_gw_instance_total_token_rate` | Total tokens per minute per pod | +| TTFT Avg | ms | `meter_envoy_ai_gw_instance_ttft_avg` | Average TTFT per pod | +| TTFT Percentile | ms | `meter_envoy_ai_gw_instance_ttft_percentile` | P50/P75/P90/P95/P99 TTFT per pod | +| TPOT Avg | ms | `meter_envoy_ai_gw_instance_tpot_avg` | Average inter-token latency per pod | +| TPOT Percentile | ms | `meter_envoy_ai_gw_instance_tpot_percentile` | P50/P75/P90/P95/P99 TPOT per pod | +| Estimated Cost | cost/min | `meter_envoy_ai_gw_instance_estimated_cost` | Estimated cost per minute per pod | + +**Per-provider breakdown metrics (instance scope):** + +| Monitoring Panel | Unit | Metric Name | Description | +|---|---|---|---| +| Provider Request CPM | count/min | `meter_envoy_ai_gw_instance_provider_request_cpm` | Requests per minute by provider per pod | +| Provider Token Usage | tokens/min | `meter_envoy_ai_gw_instance_provider_token_rate` | Token rate by provider per pod | +| Provider Latency Avg | ms | `meter_envoy_ai_gw_instance_provider_latency_avg` | Average latency by provider per pod | + +**Per-model breakdown metrics (instance scope):** + +| Monitoring Panel | Unit | Metric Name | Description | +|---|---|---|---| +| Model Request CPM | count/min | 
`meter_envoy_ai_gw_instance_model_request_cpm` | Requests per minute by model per pod | +| Model Token Usage | tokens/min | `meter_envoy_ai_gw_instance_model_token_rate` | Token rate by model per pod | +| Model Latency Avg | ms | `meter_envoy_ai_gw_instance_model_latency_avg` | Average latency by model per pod | +| Model TTFT Avg | ms | `meter_envoy_ai_gw_instance_model_ttft_avg` | Average TTFT by model per pod | +| Model TPOT Avg | ms | `meter_envoy_ai_gw_instance_model_tpot_avg` | Average inter-token latency by model per pod | + #### Cost Estimation Reuse the same `gen-ai-config.yml` pricing configuration from PR #13745. The MAL rules will: @@ -335,7 +375,7 @@ OTLP gRPC to the same endpoint as metrics. No FluentBit or external log collecto The OTLP log sink shares the same `GatewayConfig` CRD env vars as metrics (see Section 2). `OTEL_LOGS_EXPORTER=otlp` and `OTEL_EXPORTER_OTLP_ENDPOINT` enable the log sink. The -`OTEL_RESOURCE_ATTRIBUTES` (including `aigw.service` and `service.instance.id`) are injected as +`OTEL_RESOURCE_ATTRIBUTES` (including `job_name`, `service.instance.id`, and `service.layer`) are injected as resource attributes on each OTLP log record, ensuring consistency between metrics and access logs. 
Additionally, enable token metadata population in `AIGatewayRoute` so token counts appear in access logs: @@ -360,9 +400,9 @@ Each access log record is pushed as an OTLP LogRecord with the following structu | Attribute | Example | Notes | |---|---|---| -| `aigw.service` | `envoy-ai-gateway-basic` | From `OTEL_RESOURCE_ATTRIBUTES` — SkyWalking service name | +| `job_name` | `envoy-ai-gateway` | From `OTEL_RESOURCE_ATTRIBUTES` — MAL/LAL routing tag | | `service.instance.id` | `aigw-pod-7b9f4d8c5` | From `OTEL_RESOURCE_ATTRIBUTES` — SkyWalking instance name | -| `service.name` | `envoy-ai-gateway` | From `OTEL_SERVICE_NAME` — mapped to `job_name` for rule routing | +| `service.name` | `envoy-ai-gateway` | From `OTEL_SERVICE_NAME` — SkyWalking service name for logs | | `node_name` | `default-aigw-run-85f8cf28` | Envoy node identifier | | `cluster_name` | `default/aigw-run` | Envoy cluster name | @@ -450,7 +490,8 @@ The LAL rules would: - `envoy-ai-gateway-root.json` — Root list view of all AI Gateway services. - `envoy-ai-gateway-service.json` — Service dashboard: Request CPM, latency, token rates, TTFT, TPOT, estimated cost, with provider and model breakdown panels. -- `envoy-ai-gateway-instance.json` — Instance (pod) level dashboard. +- `envoy-ai-gateway-instance.json` — Instance (pod) level dashboard: Same aggregate metrics as service + dashboard but scoped to a single pod, plus per-provider and per-model breakdown panels for that pod. **UI side** — A separate PR in [skywalking-booster-ui](https://github.com/apache/skywalking-booster-ui) is needed for i18n menu entries (similar to @@ -481,14 +522,14 @@ Apply the `GatewayConfig` CRD from Section 2 to your AI Gateway deployment. 
Key | Env Var | Value | Purpose | |---|---|---| -| `OTEL_SERVICE_NAME` | `envoy-ai-gateway` | Routes metrics/logs to correct MAL/LAL rules via `job_name` (fixed for all deployments) | +| `OTEL_SERVICE_NAME` | `$(GATEWAY_NAME)` | SkyWalking service name (per-deployment, auto-resolved from Gateway CRD name) | | `OTEL_EXPORTER_OTLP_ENDPOINT` | `http://skywalking-oap:11800` | SkyWalking OAP OTLP receiver | | `OTEL_EXPORTER_OTLP_PROTOCOL` | `grpc` | OTLP transport | | `OTEL_METRICS_EXPORTER` | `otlp` | Enable OTLP metrics push | | `OTEL_LOGS_EXPORTER` | `otlp` | Enable OTLP access log push | | `GATEWAY_NAME` | (auto from label) | Auto-resolved from pod label `gateway.envoyproxy.io/owning-gateway-name` | | `POD_NAME` | (auto from Downward API) | Auto-resolved from pod name | -| `OTEL_RESOURCE_ATTRIBUTES` | `aigw.service=$(GATEWAY_NAME),service.instance.id=$(POD_NAME)` | SkyWalking service name (auto) + instance ID (auto) | +| `OTEL_RESOURCE_ATTRIBUTES` | `job_name=envoy-ai-gateway,service.instance.id=$(POD_NAME),service.layer=ENVOY_AI_GATEWAY` | Routing tag (fixed) + instance ID (auto) + layer for LAL routing | ### Step 2: Configure SkyWalking OAP @@ -529,7 +570,7 @@ With `OTEL_RESOURCE_ATTRIBUTES=service.instance.id=test-instance-456` and | `telemetry.sdk.name` | `opentelemetry` | SDK metadata | | `telemetry.sdk.version` | `1.40.0` | SDK metadata | -**Not present by default (without explicit env config):** `service.instance.id`, `aigw.service`, `host.name`. +**Not present by default (without explicit env config):** `service.instance.id`, `job_name`, `service.layer`, `host.name`. These must be explicitly set via `OTEL_RESOURCE_ATTRIBUTES` in the `GatewayConfig` CRD (see Section 2). 
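`OTEL_RESOURCE_ATTRIBUTES` is a comma-separated list of `key=value` pairs. A minimal parse of that format, as a sketch of what the SDK's env-var handling produces, looks like:

```python
def parse_otel_resource_attributes(raw):
    """Parse an OTEL_RESOURCE_ATTRIBUTES string ("k=v,k2=v2") into a dict (sketch)."""
    attrs = {}
    for pair in raw.split(","):
        if "=" in pair:
            key, _, value = pair.partition("=")
            attrs[key.strip()] = value.strip()
    return attrs
```

Applied to the value from Section 2, this yields the `job_name`, `service.instance.id`, and `service.layer` attributes on every exported metric and log record.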
`resource.WithFromEnv()` (source: `internal/metrics/metrics.go:35-94`) is called inside a conditional diff --git a/docs/en/swip/SWIP-10/kind-test-resources.yaml b/docs/en/swip/SWIP-10/kind-test-resources.yaml deleted file mode 100644 index ff5d5bd790a6..000000000000 --- a/docs/en/swip/SWIP-10/kind-test-resources.yaml +++ /dev/null @@ -1,247 +0,0 @@ -# Licensed to the Apache Software Foundation (ASF) under one -# or more contributor license agreements. See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. 
-
-# SWIP-10 Kind Test Resources
-# Deploy with: kubectl apply -f kind-test-resources.yaml
-#
-# This file contains all K8s resources for the SWIP-10 local verification:
-# - Ollama (in-cluster LLM backend)
-# - OTel Collector (debug exporter for capturing OTLP payloads)
-# - AI Gateway CRDs (GatewayClass, GatewayConfig, Gateway, AIGatewayRoute, AIServiceBackend, Backend)
-
-# --- Ollama (in-cluster) ---
-apiVersion: apps/v1
-kind: Deployment
-metadata:
-  name: ollama
-  namespace: default
-spec:
-  replicas: 1
-  selector:
-    matchLabels:
-      app: ollama
-  template:
-    metadata:
-      labels:
-        app: ollama
-    spec:
-      containers:
-        - name: ollama
-          image: ollama/ollama:latest
-          imagePullPolicy: Never
-          ports:
-            - containerPort: 11434
-          resources:
-            requests:
-              cpu: "500m"
-              memory: "2Gi"
----
-apiVersion: v1
-kind: Service
-metadata:
-  name: ollama
-  namespace: default
-spec:
-  selector:
-    app: ollama
-  ports:
-    - port: 11434
-      targetPort: 11434
----
-# --- OTel Collector (debug exporter) ---
-apiVersion: v1
-kind: ConfigMap
-metadata:
-  name: otel-collector-config
-  namespace: default
-data:
-  config.yaml: |
-    receivers:
-      otlp:
-        protocols:
-          grpc:
-            endpoint: 0.0.0.0:4317
-    exporters:
-      debug:
-        verbosity: detailed
-    service:
-      pipelines:
-        metrics:
-          receivers: [otlp]
-          exporters: [debug]
-        logs:
-          receivers: [otlp]
-          exporters: [debug]
----
-apiVersion: apps/v1
-kind: Deployment
-metadata:
-  name: otel-collector
-  namespace: default
-spec:
-  replicas: 1
-  selector:
-    matchLabels:
-      app: otel-collector
-  template:
-    metadata:
-      labels:
-        app: otel-collector
-    spec:
-      containers:
-        - name: collector
-          image: otel/opentelemetry-collector:latest
-          imagePullPolicy: Never
-          ports:
-            - containerPort: 4317
-          volumeMounts:
-            - name: config
-              mountPath: /etc/otelcol/config.yaml
-              subPath: config.yaml
-      volumes:
-        - name: config
-          configMap:
-            name: otel-collector-config
----
-apiVersion: v1
-kind: Service
-metadata:
-  name: otel-collector
-  namespace: default
-spec:
-  selector:
-    app: otel-collector
-  ports:
-    - port: 4317
-      targetPort: 4317
----
-# --- AI Gateway CRDs ---
-# 1. GatewayClass
-apiVersion: gateway.networking.k8s.io/v1
-kind: GatewayClass
-metadata:
-  name: envoy-ai-gateway
-spec:
-  controllerName: gateway.envoyproxy.io/gatewayclass-controller
----
-# 2. GatewayConfig — OTLP configuration for SkyWalking
-# Verified: GATEWAY_NAME auto-resolves from pod label
-# gateway.envoyproxy.io/owning-gateway-name via Downward API
-apiVersion: aigateway.envoyproxy.io/v1alpha1
-kind: GatewayConfig
-metadata:
-  name: sw-test-config
-  namespace: default
-spec:
-  extProc:
-    kubernetes:
-      env:
-        # job_name for MAL/LAL rule routing (fixed for all deployments)
-        - name: OTEL_SERVICE_NAME
-          value: "envoy-ai-gateway"
-        # OTLP endpoint — OTel Collector (or SkyWalking OAP in production)
-        - name: OTEL_EXPORTER_OTLP_ENDPOINT
-          value: "http://otel-collector.default:4317"
-        - name: OTEL_EXPORTER_OTLP_PROTOCOL
-          value: "grpc"
-        # Enable OTLP for both metrics and access logs
-        - name: OTEL_METRICS_EXPORTER
-          value: "otlp"
-        - name: OTEL_LOGS_EXPORTER
-          value: "otlp"
-        - name: OTEL_METRIC_EXPORT_INTERVAL
-          value: "5000"
-        # Gateway name = Gateway CRD metadata.name (e.g., "my-ai-gateway")
-        # Read from pod label gateway.envoyproxy.io/owning-gateway-name,
-        # which is auto-set by the Envoy Gateway controller on every envoy pod.
-        - name: GATEWAY_NAME
-          valueFrom:
-            fieldRef:
-              fieldPath: metadata.labels['gateway.envoyproxy.io/owning-gateway-name']
-        # Pod name (e.g., "envoy-default-my-ai-gateway-76d02f2b-xxx")
-        - name: POD_NAME
-          valueFrom:
-            fieldRef:
-              fieldPath: metadata.name
-        # aigw.service → SkyWalking service name (= Gateway CRD name, auto-resolved)
-        # service.instance.id → SkyWalking instance name (= pod name, auto-resolved)
-        # $(VAR) substitution references the valueFrom env vars defined above.
-        - name: OTEL_RESOURCE_ATTRIBUTES
-          value: "aigw.service=$(GATEWAY_NAME),service.instance.id=$(POD_NAME)"
----
-# 3. Gateway — references GatewayConfig via annotation
-apiVersion: gateway.networking.k8s.io/v1
-kind: Gateway
-metadata:
-  name: my-ai-gateway
-  namespace: default
-  annotations:
-    aigateway.envoyproxy.io/gateway-config: sw-test-config
-spec:
-  gatewayClassName: envoy-ai-gateway
-  listeners:
-    - name: http
-      protocol: HTTP
-      port: 80
----
-# 4. AIGatewayRoute — routing + token metadata for access logs
-apiVersion: aigateway.envoyproxy.io/v1alpha1
-kind: AIGatewayRoute
-metadata:
-  name: my-ai-gateway-route
-  namespace: default
-spec:
-  parentRefs:
-    - name: my-ai-gateway
-      kind: Gateway
-      group: gateway.networking.k8s.io
-  llmRequestCosts:
-    - metadataKey: llm_input_token
-      type: InputToken
-    - metadataKey: llm_output_token
-      type: OutputToken
-    - metadataKey: llm_total_token
-      type: TotalToken
-  rules:
-    - backendRefs:
-        - name: ollama-backend
----
-# 5. AIServiceBackend + Backend — Ollama in-cluster
-apiVersion: aigateway.envoyproxy.io/v1alpha1
-kind: AIServiceBackend
-metadata:
-  name: ollama-backend
-  namespace: default
-spec:
-  schema:
-    name: OpenAI
-    prefix: "/v1"
-  backendRef:
-    name: ollama-backend
-    kind: Backend
-    group: gateway.envoyproxy.io
----
-apiVersion: gateway.envoyproxy.io/v1alpha1
-kind: Backend
-metadata:
-  name: ollama-backend
-  namespace: default
-spec:
-  endpoints:
-    - fqdn:
-        hostname: ollama.default.svc.cluster.local
-        port: 11434
diff --git a/docs/en/swip/SWIP-10/kind-test-setup.sh b/docs/en/swip/SWIP-10/kind-test-setup.sh
deleted file mode 100644
index 4fd3afcc467a..000000000000
--- a/docs/en/swip/SWIP-10/kind-test-setup.sh
+++ /dev/null
@@ -1,108 +0,0 @@
-#!/bin/bash
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements.  See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership.  The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied.  See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-# SWIP-10 Local Verification: Envoy AI Gateway + SkyWalking OTLP on Kind
-#
-# Prerequisites:
-#   - kind, kubectl, helm, docker installed
-#   - Docker images pulled (or internet access for Kind to pull)
-#
-# This script sets up a Kind cluster with:
-#   - Envoy Gateway (v1.3.3) + AI Gateway controller (v0.5.0)
-#   - Ollama (in-cluster) with a small model
-#   - OTel Collector (debug exporter) to capture OTLP metrics and logs
-#   - AI Gateway configured with SkyWalking-compatible OTLP resource attributes
-#
-# Usage:
-#   ./kind-test-setup.sh           # Full setup
-#   ./kind-test-setup.sh cleanup   # Delete the cluster
-
-set -e
-
-CLUSTER_NAME="aigw-swip10-test"
-
-if [ "$1" = "cleanup" ]; then
-  echo "Cleaning up..."
-  kind delete cluster --name $CLUSTER_NAME
-  exit 0
-fi
-
-echo "=== Step 1: Create Kind cluster ==="
-kind create cluster --name $CLUSTER_NAME
-
-echo "=== Step 2: Pre-load Docker images ==="
-IMAGES=(
-  "envoyproxy/ai-gateway-controller:v0.5.0"
-  "envoyproxy/ai-gateway-extproc:v0.5.0"
-  "envoyproxy/gateway:v1.3.3"
-  "envoyproxy/envoy:distroless-v1.33.3"
-  "otel/opentelemetry-collector:latest"
-  "ollama/ollama:latest"
-)
-for img in "${IMAGES[@]}"; do
-  echo "Pulling $img..."
-  docker pull "$img"
-  echo "Loading $img into Kind..."
-  kind load docker-image "$img" --name $CLUSTER_NAME
-done
-
-echo "=== Step 3: Install Envoy Gateway ==="
-# enableBackend is required for Backend resources used by AIServiceBackend
-helm install eg oci://docker.io/envoyproxy/gateway-helm \
-  --version v1.3.3 -n envoy-gateway-system --create-namespace \
-  --set config.envoyGateway.extensionApis.enableBackend=true
-kubectl wait --for=condition=available deployment/envoy-gateway \
-  -n envoy-gateway-system --timeout=120s
-
-echo "=== Step 4: Install AI Gateway ==="
-helm upgrade -i aieg-crd oci://docker.io/envoyproxy/ai-gateway-crds-helm \
-  --namespace envoy-ai-gateway-system --create-namespace
-helm upgrade -i aieg oci://docker.io/envoyproxy/ai-gateway-helm \
-  --namespace envoy-ai-gateway-system --create-namespace
-kubectl wait --for=condition=available deployment/ai-gateway-controller \
-  -n envoy-ai-gateway-system --timeout=120s
-
-echo "=== Step 5: Deploy test resources ==="
-kubectl apply -f kind-test-resources.yaml
-
-echo "=== Step 6: Wait for pods ==="
-sleep 10
-kubectl wait --for=condition=available deployment/ollama -n default --timeout=120s
-kubectl wait --for=condition=available deployment/otel-collector -n default --timeout=60s
-
-echo "=== Step 7: Pull Ollama model ==="
-OLLAMA_POD=$(kubectl get pod -l app=ollama -o jsonpath='{.items[0].metadata.name}')
-kubectl exec "$OLLAMA_POD" -- ollama pull qwen2.5:0.5b
-
-echo "=== Step 8: Wait for Envoy pod ==="
-sleep 30
-kubectl get pods -A
-
-echo ""
-echo "=== Setup complete ==="
-echo "To test:"
-echo "  kubectl port-forward -n envoy-gateway-system svc/envoy-default-my-ai-gateway-76d02f2b 8080:80 &"
-echo "  curl -s --noproxy '*' http://localhost:8080/v1/chat/completions \\"
-echo "    -H 'Content-Type: application/json' \\"
-echo "    -d '{\"model\":\"qwen2.5:0.5b\",\"messages\":[{\"role\":\"user\",\"content\":\"Say hi\"}]}'"
-echo ""
-echo "To check OTLP output:"
-echo "  kubectl logs -l app=otel-collector | grep -A 20 'ResourceMetrics\\|ResourceLog'"
-echo ""
-echo "To cleanup:"
-echo "  ./kind-test-setup.sh cleanup"
diff --git a/docs/menu.yml b/docs/menu.yml
index bde793633aa9..2ca1d7f1cabc 100644
--- a/docs/menu.yml
+++ b/docs/menu.yml
@@ -152,6 +152,8 @@ catalog:
       catalog:
         - name: "Virtual GenAI"
           path: "/en/setup/service-agent/virtual-genai"
+        - name: "Envoy AI Gateway"
+          path: "/en/setup/backend/backend-envoy-ai-gateway-monitoring"
       - name: "Self Observability"
         catalog:
           - name: "OAP self telemetry"
diff --git a/oap-server/analyzer/meter-analyzer/src/main/java/org/apache/skywalking/oap/meter/analyzer/v2/dsl/SampleFamily.java b/oap-server/analyzer/meter-analyzer/src/main/java/org/apache/skywalking/oap/meter/analyzer/v2/dsl/SampleFamily.java
index fa392f9265d7..980b9d8a1ef6 100644
--- a/oap-server/analyzer/meter-analyzer/src/main/java/org/apache/skywalking/oap/meter/analyzer/v2/dsl/SampleFamily.java
+++ b/oap-server/analyzer/meter-analyzer/src/main/java/org/apache/skywalking/oap/meter/analyzer/v2/dsl/SampleFamily.java
@@ -98,6 +98,27 @@ static SampleFamily build(RunningContext ctx, Sample... samples) {
 
     public final RunningContext context;
 
+    @Override
+    public String toString() {
+        if (samples.length == 0) {
+            return "SampleFamily{EMPTY}";
+        }
+        final StringBuilder sb = new StringBuilder("SampleFamily{samples=[\n");
+        for (final Sample s : samples) {
+            sb.append("  ").append(s.getName()).append(s.getLabels()).append(" ").append(s.getValue()).append('\n');
+        }
+        sb.append("]}");
+        return sb.toString();
+    }
+
+    /**
+     * Dump this SampleFamily for debugging.
+     */
+    public SampleFamily debugDump() {
+        log.info("{}", this);
+        return this;
+    }
+
     /**
      * Following operations are used in DSL
      */
diff --git a/oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/analysis/Layer.java b/oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/analysis/Layer.java
index 284192ee8d73..54076b72d529 100644
--- a/oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/analysis/Layer.java
+++ b/oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/analysis/Layer.java
@@ -272,7 +272,13 @@ public enum Layer {
      * Virtual GenAI is a virtual layer used to represent and monitor remote, uninstrumented
      * Generative AI providers.
      */
-    VIRTUAL_GENAI(45, false);
+    VIRTUAL_GENAI(45, false),
+
+    /**
+     * Envoy AI Gateway is an AI/LLM traffic gateway built on Envoy Proxy,
+     * providing observability for GenAI API traffic.
+     */
+    ENVOY_AI_GATEWAY(46, true);
 
     private final int value;
     /**
diff --git a/oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/management/ui/template/UITemplateInitializer.java b/oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/management/ui/template/UITemplateInitializer.java
index 525ccf11e2fc..4651101e84e6 100644
--- a/oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/management/ui/template/UITemplateInitializer.java
+++ b/oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/management/ui/template/UITemplateInitializer.java
@@ -82,6 +82,7 @@ public class UITemplateInitializer {
         Layer.FLINK.name(),
         Layer.BANYANDB.name(),
         Layer.VIRTUAL_GENAI.name(),
+        Layer.ENVOY_AI_GATEWAY.name(),
         "custom"
     };
     private final UITemplateManagementService uiTemplateManagementService;
diff --git a/oap-server/server-receiver-plugin/otel-receiver-plugin/src/main/java/org/apache/skywalking/oap/server/receiver/otel/otlp/OpenTelemetryLogHandler.java b/oap-server/server-receiver-plugin/otel-receiver-plugin/src/main/java/org/apache/skywalking/oap/server/receiver/otel/otlp/OpenTelemetryLogHandler.java
index 9bebac107744..3b9fc38c44e2 100644
--- a/oap-server/server-receiver-plugin/otel-receiver-plugin/src/main/java/org/apache/skywalking/oap/server/receiver/otel/otlp/OpenTelemetryLogHandler.java
+++ b/oap-server/server-receiver-plugin/otel-receiver-plugin/src/main/java/org/apache/skywalking/oap/server/receiver/otel/otlp/OpenTelemetryLogHandler.java
@@ -104,7 +104,13 @@ public void export(ExportLogsServiceRequest request, StreamObserver<ExportLogsServiceResponse> responseObserver) {
-    private static final Map<String, String> LABEL_MAPPINGS =
+    /**
+     * Fallback label mappings: if the target label (value) is absent in resource attributes,
+     * copy the source label (key) value as the target. The source label is always kept as-is
+     * (with dots converted to underscores by the first pass).
+     *
+     * <p>The {@code service.name → job_name} mapping is required because the
+     * OTel Collector Prometheus Receiver automatically converts the Prometheus {@code job}
+     * label to the {@code service.name} resource attribute. All Prometheus-based monitoring
+     * integrations (VM, Nginx, Redis, etc.) depend on this being available as {@code job_name}
+     * in MAL rules. When {@code job_name} is set explicitly in resource attributes (e.g., by
+     * Envoy AI Gateway), it takes precedence via {@code putIfAbsent}.
+     *
+     * <p>Legacy: The {@code net.host.name} and {@code host.name} mappings to
+     * {@code node_identifier_host_name} are kept for backward compatibility with existing
+     * VM/Windows MAL rules. New integrations should NOT add entries here — use the natural
+     * dot-to-underscore conversion instead (e.g., {@code host.name} becomes {@code host_name}).
+     */
+    private static final Map<String, String> FALLBACK_LABEL_MAPPINGS =
         ImmutableMap.<String, String>builder()
+            // Legacy: use host_name (dot-to-underscore) for new integrations instead
             .put("net.host.name", "node_identifier_host_name")
             .put("host.name", "node_identifier_host_name")
-            .put("job", "job_name")
+            // OTel Collector Prometheus Receiver converts Prometheus `job` to `service.name`.
+            // All Prometheus-based MAL rules filter by job_name. When job_name is set explicitly
+            // in resource attributes (e.g., Envoy AI Gateway), it takes precedence via putIfAbsent.
             .put("service.name", "job_name")
             .build();
     private List converters;
@@ -99,18 +120,20 @@ public void processMetricsRequest(final ExportMetricsServiceRequest requests) {
             log.debug("Resource attributes: {}", request.getResource().getAttributesList());
         }
 
-        final Map<String, String> nodeLabels =
-            request
-                .getResource()
-                .getAttributesList()
-                .stream()
-                .collect(toMap(
-                    it -> LABEL_MAPPINGS
-                        .getOrDefault(it.getKey(), it.getKey())
-                        .replaceAll("\\.", "_"),
-                    it -> anyValueToString(it.getValue()),
-                    (v1, v2) -> v1
-                ));
+        // First pass: collect all resource attributes with dots replaced by underscores
+        final Map<String, String> nodeLabels = new HashMap<>();
+        for (final var it : request.getResource().getAttributesList()) {
+            final String key = it.getKey().replace('.', '_');
+            final String value = anyValueToString(it.getValue());
+            nodeLabels.putIfAbsent(key, value);
+        }
+        // Second pass: apply fallback mappings — only if the target key is absent
+        for (final var it : request.getResource().getAttributesList()) {
+            final String targetKey = FALLBACK_LABEL_MAPPINGS.get(it.getKey());
+            if (targetKey != null) {
+                nodeLabels.putIfAbsent(targetKey, anyValueToString(it.getValue()));
+            }
+        }
 
         ImmutableMap sampleFamilies = PrometheusMetricConverter.convertPromMetricToSampleFamily(
             request.getScopeMetricsList().stream()
@@ -154,8 +177,9 @@ private static Map<String, String> buildLabels(List<KeyValue> kvs) {
         return kvs
             .stream()
             .collect(toMap(
-                KeyValue::getKey,
-                it -> anyValueToString(it.getValue())
+                it -> it.getKey().replace('.', '_'),
+                it -> anyValueToString(it.getValue()),
+                (v1, v2) -> v1
             ));
     }
diff --git a/oap-server/server-starter/src/main/resources/application.yml b/oap-server/server-starter/src/main/resources/application.yml
index d176e04faba6..b79ff36a95c8 100644
--- a/oap-server/server-starter/src/main/resources/application.yml
+++ b/oap-server/server-starter/src/main/resources/application.yml
@@ -237,7 +237,7 @@ agent-analyzer:
 log-analyzer:
   selector: ${SW_LOG_ANALYZER:default}
   default:
-    lalFiles: ${SW_LOG_LAL_FILES:envoy-als,mesh-dp,mysql-slowsql,pgsql-slowsql,redis-slowsql,k8s-service,nginx,default}
+    lalFiles: ${SW_LOG_LAL_FILES:envoy-als,mesh-dp,mysql-slowsql,pgsql-slowsql,redis-slowsql,k8s-service,nginx,envoy-ai-gateway,default}
     malFiles: ${SW_LOG_MAL_FILES:"nginx"}
 
 event-analyzer:
@@ -390,7 +390,7 @@ receiver-otel:
   selector: ${SW_OTEL_RECEIVER:default}
   default:
     enabledHandlers: ${SW_OTEL_RECEIVER_ENABLED_HANDLERS:"otlp-traces,otlp-metrics,otlp-logs"}
-    enabledOtelMetricsRules: ${SW_OTEL_RECEIVER_ENABLED_OTEL_METRICS_RULES:"apisix,nginx/*,k8s/*,istio-controlplane,vm,mysql/*,postgresql/*,oap,aws-eks/*,windows,aws-s3/*,aws-dynamodb/*,aws-gateway/*,redis/*,elasticsearch/*,rabbitmq/*,mongodb/*,kafka/*,pulsar/*,bookkeeper/*,rocketmq/*,clickhouse/*,activemq/*,kong/*,flink/*,banyandb/*"}
+    enabledOtelMetricsRules: ${SW_OTEL_RECEIVER_ENABLED_OTEL_METRICS_RULES:"apisix,nginx/*,k8s/*,istio-controlplane,vm,mysql/*,postgresql/*,oap,aws-eks/*,windows,aws-s3/*,aws-dynamodb/*,aws-gateway/*,redis/*,elasticsearch/*,rabbitmq/*,mongodb/*,kafka/*,pulsar/*,bookkeeper/*,rocketmq/*,clickhouse/*,activemq/*,kong/*,flink/*,banyandb/*,envoy-ai-gateway/*"}
 
 receiver-zipkin:
   selector: ${SW_RECEIVER_ZIPKIN:-}
diff --git a/oap-server/server-starter/src/main/resources/lal/envoy-ai-gateway.yaml b/oap-server/server-starter/src/main/resources/lal/envoy-ai-gateway.yaml
new file mode 100644
index 000000000000..0af60377b5b1
--- /dev/null
+++ b/oap-server/server-starter/src/main/resources/lal/envoy-ai-gateway.yaml
@@ -0,0 +1,52 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# Envoy AI Gateway access log processing via OTLP.
+#
+# Sampling policy: only persist abnormal or expensive requests.
+# Normal 200 responses with low token count and no upstream failure are dropped.
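+#
+# Illustrative examples of the rule below: a 200 response with 12,000 total
+# tokens is persisted (token sum is at or above the 10,000 threshold); a 200
+# response with 300 total tokens and no upstream transport failure is dropped;
+# any response with status >= 400 is always persisted.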
+
+rules:
+  - name: envoy-ai-gateway-access-log
+    layer: ENVOY_AI_GATEWAY
+    dsl: |
+      filter {
+        // Drop normal logs: response < 400, no upstream failure, low token count
+        if (tag("response_code") != "" && tag("response_code") != "-") {
+          if (tag("response_code") as Integer < 400) {
+            if (tag("upstream_transport_failure_reason") == "" || tag("upstream_transport_failure_reason") == "-") {
+              if (tag("gen_ai.usage.input_tokens") != "" && tag("gen_ai.usage.input_tokens") != "-"
+                  && tag("gen_ai.usage.output_tokens") != "" && tag("gen_ai.usage.output_tokens") != "-") {
+                if ((tag("gen_ai.usage.input_tokens") as Integer) + (tag("gen_ai.usage.output_tokens") as Integer) < 10000) {
+                  abort {}
+                }
+              }
+            }
+          }
+        }
+
+        extractor {
+          tag 'gen_ai.request.model': tag("gen_ai.request.model")
+          tag 'gen_ai.response.model': tag("gen_ai.response.model")
+          tag 'gen_ai.provider.name': tag("gen_ai.provider.name")
+          tag 'gen_ai.usage.input_tokens': tag("gen_ai.usage.input_tokens")
+          tag 'gen_ai.usage.output_tokens': tag("gen_ai.usage.output_tokens")
+          tag 'response_code': tag("response_code")
+          tag 'duration': tag("duration")
+        }
+
+        sink {
+        }
+      }
diff --git a/oap-server/server-starter/src/main/resources/otel-rules/envoy-ai-gateway/gateway-instance.yaml b/oap-server/server-starter/src/main/resources/otel-rules/envoy-ai-gateway/gateway-instance.yaml
new file mode 100644
index 000000000000..d0509d087e5e
--- /dev/null
+++ b/oap-server/server-starter/src/main/resources/otel-rules/envoy-ai-gateway/gateway-instance.yaml
@@ -0,0 +1,98 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# Envoy AI Gateway — Instance-level (per-pod) metrics
+#
+# Same metrics as gateway-service.yaml but scoped to individual pods.
+# All durations are in seconds from the AI Gateway; multiply by 1000 for ms display.
+
+filter: "{ tags -> tags.job_name == 'envoy-ai-gateway' }"
+expSuffix: instance(['service_name'], ['service_instance_id'], Layer.ENVOY_AI_GATEWAY)
+metricPrefix: meter_envoy_ai_gw_instance
+
+metricsRules:
+  # ===================== Aggregate metrics =====================
+
+  # Request CPM
+  - name: request_cpm
+    exp: gen_ai_server_request_duration_count.sum(['service_name', 'service_instance_id']).increase('PT1M')
+
+  # Request latency average (ms)
+  - name: request_latency_avg
+    exp: gen_ai_server_request_duration_sum.sum(['service_name', 'service_instance_id']).increase('PT1M') / gen_ai_server_request_duration_count.sum(['service_name', 'service_instance_id']).increase('PT1M') * 1000
+
+  # Request latency percentile (ms)
+  - name: request_latency_percentile
+    exp: gen_ai_server_request_duration.sum(['le', 'service_name', 'service_instance_id']).increase('PT1M').histogram().histogram_percentile([50,75,90,95,99]) * 1000
+
+  # Input token rate (tokens/min)
+  - name: input_token_rate
+    exp: gen_ai_client_token_usage_sum.tagEqual('gen_ai_token_type', 'input').sum(['service_name', 'service_instance_id']).increase('PT1M')
+
+  # Output token rate (tokens/min)
+  - name: output_token_rate
+    exp: gen_ai_client_token_usage_sum.tagEqual('gen_ai_token_type', 'output').sum(['service_name', 'service_instance_id']).increase('PT1M')
+
+  # TTFT average (ms)
+  - name: ttft_avg
+    exp: gen_ai_server_time_to_first_token_sum.sum(['service_name', 'service_instance_id']).increase('PT1M') / gen_ai_server_time_to_first_token_count.sum(['service_name', 'service_instance_id']).increase('PT1M') * 1000
+
+  # TTFT percentile (ms)
+  - name: ttft_percentile
+    exp: gen_ai_server_time_to_first_token.sum(['le', 'service_name', 'service_instance_id']).increase('PT1M').histogram().histogram_percentile([50,75,90,95,99]) * 1000
+
+  # TPOT average (ms)
+  - name: tpot_avg
+    exp: gen_ai_server_time_per_output_token_sum.sum(['service_name', 'service_instance_id']).increase('PT1M') / gen_ai_server_time_per_output_token_count.sum(['service_name', 'service_instance_id']).increase('PT1M') * 1000
+
+  # TPOT percentile (ms)
+  - name: tpot_percentile
+    exp: gen_ai_server_time_per_output_token.sum(['le', 'service_name', 'service_instance_id']).increase('PT1M').histogram().histogram_percentile([50,75,90,95,99]) * 1000
+
+  # ===================== Per-provider breakdown =====================
+
+  # Provider request CPM
+  - name: provider_request_cpm
+    exp: gen_ai_server_request_duration_count.sum(['gen_ai_provider_name', 'service_name', 'service_instance_id']).increase('PT1M')
+
+  # Provider token rate
+  - name: provider_token_rate
+    exp: gen_ai_client_token_usage_sum.sum(['gen_ai_provider_name', 'gen_ai_token_type', 'service_name', 'service_instance_id']).increase('PT1M')
+
+  # Provider latency average (ms)
+  - name: provider_latency_avg
+    exp: gen_ai_server_request_duration_sum.sum(['gen_ai_provider_name', 'service_name', 'service_instance_id']).increase('PT1M') / gen_ai_server_request_duration_count.sum(['gen_ai_provider_name', 'service_name', 'service_instance_id']).increase('PT1M') * 1000
+
+  # ===================== Per-model breakdown =====================
+
+  # Model request CPM
+  - name: model_request_cpm
+    exp: gen_ai_server_request_duration_count.sum(['gen_ai_response_model', 'service_name', 'service_instance_id']).increase('PT1M')
+
+  # Model token rate
+  - name: model_token_rate
+    exp: gen_ai_client_token_usage_sum.sum(['gen_ai_response_model', 'gen_ai_token_type', 'service_name', 'service_instance_id']).increase('PT1M')
+
+  # Model latency average (ms)
+  - name: model_latency_avg
+    exp: gen_ai_server_request_duration_sum.sum(['gen_ai_response_model', 'service_name', 'service_instance_id']).increase('PT1M') / gen_ai_server_request_duration_count.sum(['gen_ai_response_model', 'service_name', 'service_instance_id']).increase('PT1M') * 1000
+
+  # Model TTFT average (ms)
+  - name: model_ttft_avg
+    exp: gen_ai_server_time_to_first_token_sum.sum(['gen_ai_response_model', 'service_name', 'service_instance_id']).increase('PT1M') / gen_ai_server_time_to_first_token_count.sum(['gen_ai_response_model', 'service_name', 'service_instance_id']).increase('PT1M') * 1000
+
+  # Model TPOT average (ms)
+  - name: model_tpot_avg
+    exp: gen_ai_server_time_per_output_token_sum.sum(['gen_ai_response_model', 'service_name', 'service_instance_id']).increase('PT1M') / gen_ai_server_time_per_output_token_count.sum(['gen_ai_response_model', 'service_name', 'service_instance_id']).increase('PT1M') * 1000
diff --git a/oap-server/server-starter/src/main/resources/otel-rules/envoy-ai-gateway/gateway-service.yaml b/oap-server/server-starter/src/main/resources/otel-rules/envoy-ai-gateway/gateway-service.yaml
new file mode 100644
index 000000000000..d4b745f9ec6d
--- /dev/null
+++ b/oap-server/server-starter/src/main/resources/otel-rules/envoy-ai-gateway/gateway-service.yaml
@@ -0,0 +1,103 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# Envoy AI Gateway — Service-level metrics
+#
+# Source OTLP metrics (dots → underscores by OTel receiver):
+#   gen_ai_client_token_usage — Histogram (Delta), labels: gen_ai_token_type, gen_ai_provider_name, gen_ai_response_model
+#   gen_ai_server_request_duration — Histogram (Delta), unit: seconds
+#   gen_ai_server_time_to_first_token — Histogram (Delta), unit: seconds, streaming only
+#   gen_ai_server_time_per_output_token — Histogram (Delta), unit: seconds, streaming only
+#
+# All durations are in seconds from the AI Gateway; multiply by 1000 for ms display.
+
+filter: "{ tags -> tags.job_name == 'envoy-ai-gateway' }"
+expSuffix: service(['service_name'], Layer.ENVOY_AI_GATEWAY)
+metricPrefix: meter_envoy_ai_gw
+
+metricsRules:
+  # ===================== Aggregate metrics =====================
+
+  # Request CPM — count of requests per minute
+  - name: request_cpm
+    exp: gen_ai_server_request_duration_count.sum(['service_name']).increase('PT1M')
+
+  # Request latency average (ms)
+  - name: request_latency_avg
+    exp: gen_ai_server_request_duration_sum.sum(['service_name']).increase('PT1M') / gen_ai_server_request_duration_count.sum(['service_name']).increase('PT1M') * 1000
+
+  # Request latency percentile (ms)
+  - name: request_latency_percentile
+    exp: gen_ai_server_request_duration.sum(['le', 'service_name']).increase('PT1M').histogram().histogram_percentile([50,75,90,95,99]) * 1000
+
+  # Input token rate (tokens/min)
+  - name: input_token_rate
+    exp: gen_ai_client_token_usage_sum.tagEqual('gen_ai_token_type', 'input').sum(['service_name']).increase('PT1M')
+
+  # Output token rate (tokens/min)
+  - name: output_token_rate
+    exp: gen_ai_client_token_usage_sum.tagEqual('gen_ai_token_type', 'output').sum(['service_name']).increase('PT1M')
+
+  # TTFT average (ms) — streaming requests only
+  - name: ttft_avg
+    exp: gen_ai_server_time_to_first_token_sum.sum(['service_name']).increase('PT1M') / gen_ai_server_time_to_first_token_count.sum(['service_name']).increase('PT1M') * 1000
+
+  # TTFT percentile (ms)
+  - name: ttft_percentile
+    exp: gen_ai_server_time_to_first_token.sum(['le', 'service_name']).increase('PT1M').histogram().histogram_percentile([50,75,90,95,99]) * 1000
+
+  # TPOT average (ms) — time per output token, streaming only
+  - name: tpot_avg
+    exp: gen_ai_server_time_per_output_token_sum.sum(['service_name']).increase('PT1M') / gen_ai_server_time_per_output_token_count.sum(['service_name']).increase('PT1M') * 1000
+
+  # TPOT percentile (ms)
+  - name: tpot_percentile
+    exp: gen_ai_server_time_per_output_token.sum(['le', 'service_name']).increase('PT1M').histogram().histogram_percentile([50,75,90,95,99]) * 1000
+
+  # ===================== Per-provider breakdown =====================
+
+  # Provider request CPM — labeled by gen_ai_provider_name
+  - name: provider_request_cpm
+    exp: gen_ai_server_request_duration_count.sum(['gen_ai_provider_name', 'service_name']).increase('PT1M')
+
+  # Provider token rate — labeled by gen_ai_provider_name and gen_ai_token_type
+  - name: provider_token_rate
+    exp: gen_ai_client_token_usage_sum.sum(['gen_ai_provider_name', 'gen_ai_token_type', 'service_name']).increase('PT1M')
+
+  # Provider latency average (ms) — labeled by gen_ai_provider_name
+  - name: provider_latency_avg
+    exp: gen_ai_server_request_duration_sum.sum(['gen_ai_provider_name', 'service_name']).increase('PT1M') / gen_ai_server_request_duration_count.sum(['gen_ai_provider_name', 'service_name']).increase('PT1M') * 1000
+
+  # ===================== Per-model breakdown =====================
+
+  # Model request CPM — labeled by gen_ai_response_model
+  - name: model_request_cpm
+    exp: gen_ai_server_request_duration_count.sum(['gen_ai_response_model', 'service_name']).increase('PT1M')
+
+  # Model token rate — labeled by gen_ai_response_model and gen_ai_token_type
+  - name: model_token_rate
+    exp: gen_ai_client_token_usage_sum.sum(['gen_ai_response_model', 'gen_ai_token_type', 'service_name']).increase('PT1M')
+
+  # Model latency average (ms) — labeled by gen_ai_response_model
+  - name: model_latency_avg
+    exp: gen_ai_server_request_duration_sum.sum(['gen_ai_response_model', 'service_name']).increase('PT1M') / gen_ai_server_request_duration_count.sum(['gen_ai_response_model', 'service_name']).increase('PT1M') * 1000
+
+  # Model TTFT average (ms) — labeled by gen_ai_response_model
+  - name: model_ttft_avg
+    exp: gen_ai_server_time_to_first_token_sum.sum(['gen_ai_response_model', 'service_name']).increase('PT1M') / gen_ai_server_time_to_first_token_count.sum(['gen_ai_response_model', 'service_name']).increase('PT1M') * 1000
+
+  # Model TPOT average (ms) — labeled by gen_ai_response_model
+  - name: model_tpot_avg
+    exp: gen_ai_server_time_per_output_token_sum.sum(['gen_ai_response_model', 'service_name']).increase('PT1M') / gen_ai_server_time_per_output_token_count.sum(['gen_ai_response_model', 'service_name']).increase('PT1M') * 1000
diff --git a/oap-server/server-starter/src/main/resources/ui-initialized-templates/envoy_ai_gateway/envoy-ai-gateway-instance.json b/oap-server/server-starter/src/main/resources/ui-initialized-templates/envoy_ai_gateway/envoy-ai-gateway-instance.json
new file mode 100644
index 000000000000..fe314a11c000
--- /dev/null
+++ b/oap-server/server-starter/src/main/resources/ui-initialized-templates/envoy_ai_gateway/envoy-ai-gateway-instance.json
@@ -0,0 +1,509 @@
+[
+  {
+    "id": "Envoy-AI-Gateway-Instance",
+    "configuration": {
+      "children": [
+        {
+          "x": 0,
+          "y": 0,
+          "w": 24,
+          "h": 42,
+          "i": "0",
+          "type": "Tab",
+          "children": [
+            {
+              "name": "Overview",
+              "children": [
+                {
+                  "x": 0,
+                  "y": 0,
+                  "w": 8,
+                  "h": 13,
+                  "i": "0",
+                  "type": "Widget",
+                  "expressions": [
+                    "meter_envoy_ai_gw_instance_request_cpm"
+                  ],
+                  "graph": {
+                    "type": "Line",
+                    "showXAxis": true,
+                    "showYAxis": true
+                  },
+                  "metricConfig": [
+                    {
+                      "label": "Request CPM",
+                      "unit": "calls/min"
+                    }
+                  ],
+                  "widget": {
+                    "title": "Request CPM",
+                    "tips": "Calls Per Minute — total requests through this pod"
+                  }
+                },
+                {
+                  "x": 8,
+                  "y": 0,
+                  "w": 8,
+                  "h": 13,
+                  "i": "1",
+                  "type": "Widget",
+                  "expressions": [
+                    "meter_envoy_ai_gw_instance_request_latency_avg"
+                  ],
+                  "graph": {
+                    "type": "Line",
+                    "showXAxis": true,
+                    "showYAxis": true
+                  },
+                  "metricConfig": [
+                    {
+                      "label": "Avg Latency",
+                      "unit": "ms"
+                    }
+                  ],
+                  "widget": {
+                    "title": "Request Latency Avg"
+                  }
+                },
+                {
+                  "x": 16,
+                  "y": 0,
+                  "w": 8,
+                  "h": 13,
+                  "i": "2",
+                  "type": "Widget",
+                  "expressions": [
+                    "meter_envoy_ai_gw_instance_request_latency_percentile"
+                  ],
+                  "graph": {
+                    "type": "Line",
+                    "showXAxis": true,
+                    "showYAxis": true
+                  },
+                  "metricConfig": [
+                    {
+                      "label": "Latency Percentile",
+                      "unit": "ms"
+                    }
+                  ],
+                  "widget": {
+                    "title": "Request Latency Percentile",
+                    "tips": "P50 / P75 / P90 / P95 / P99"
+                  }
+                },
+                {
+                  "x": 0,
+                  "y": 13,
+                  "w": 8,
+                  "h": 13,
+                  "i": "3",
+                  "type": "Widget",
+                  "expressions": [
+                    "meter_envoy_ai_gw_instance_input_token_rate"
+                  ],
+                  "graph": {
+                    "type": "Line",
+                    "showXAxis": true,
+                    "showYAxis": true
+                  },
+                  "metricConfig": [
+                    {
+                      "label": "Input Tokens",
+                      "unit": "tokens/min"
+                    }
+                  ],
+                  "widget": {
+                    "title": "Input Token Rate",
+                    "tips": "Input (prompt) tokens per minute sent to LLM providers"
+                  }
+                },
+                {
+                  "x": 8,
+                  "y": 13,
+                  "w": 8,
+                  "h": 13,
+                  "i": "4",
+                  "type": "Widget",
+                  "expressions": [
+                    "meter_envoy_ai_gw_instance_output_token_rate"
+                  ],
+                  "graph": {
+                    "type": "Line",
+                    "showXAxis": true,
+                    "showYAxis": true
+                  },
+                  "metricConfig": [
+                    {
+                      "label": "Output Tokens",
+                      "unit": "tokens/min"
+                    }
+                  ],
+                  "widget": {
+                    "title": "Output Token Rate",
+                    "tips": "Output (completion) tokens per minute generated by LLM providers"
+                  }
+                },
+                {
+                  "x": 16,
+                  "y": 13,
+                  "w": 8,
+                  "h": 13,
+                  "i": "5",
+                  "type": "Widget",
+                  "expressions": [
+                    "meter_envoy_ai_gw_instance_ttft_avg"
+                  ],
+                  "graph": {
+                    "type": "Line",
+                    "showXAxis": true,
+                    "showYAxis": true
+                  },
+                  "metricConfig": [
+                    {
+                      "label": "TTFT Avg",
+                      "unit": "ms"
+                    }
+                  ],
+                  "widget": {
+                    "title": "Time to First Token Avg (TTFT)",
+                    "tips": "Average time to first token for streaming requests"
+                  }
+                },
+                {
+                  "x": 0,
+                  "y": 26,
+                  "w": 8,
+                  "h": 13,
+                  "i": "6",
+                  "type": "Widget",
+                  "expressions": [
+                    "meter_envoy_ai_gw_instance_ttft_percentile"
+                  ],
+                  "graph": {
+                    "type": "Line",
+                    "showXAxis": true,
+                    "showYAxis": true
+                  },
+                  "metricConfig": [
+                    {
+                      "label": "TTFT Percentile",
+                      "unit": "ms"
+                    }
+                  ],
+                  "widget": {
+                    "title": "Time to First Token Percentile (TTFT)",
+                    "tips": "P50 / P75 / P90 / P95 / P99"
+                  }
+                },
+                {
+                  "x": 8,
+                  "y": 26,
+                  "w": 8,
+                  "h": 13,
+                  "i": "7",
+                  "type": "Widget",
+                  "expressions": [
+                    "meter_envoy_ai_gw_instance_tpot_avg"
+                  ],
+                  "graph": {
+                    "type": "Line",
+                    "showXAxis": true,
+                    "showYAxis": true
+                  },
+                  "metricConfig": [
+                    {
+                      "label": "TPOT Avg",
+                      "unit": "ms"
+                    }
+                  ],
+                  "widget": {
+                    "title": "Time Per Output Token Avg (TPOT)",
+                    "tips": "Average inter-token latency for streaming requests"
+                  }
+                },
+                {
+                  "x": 16,
+                  "y": 26,
+                  "w": 8,
+                  "h": 13,
+                  "i": "8",
+                  "type": "Widget",
+                  "expressions": [
+                    "meter_envoy_ai_gw_instance_tpot_percentile"
+                  ],
+                  "graph": {
+                    "type": "Line",
+                    "showXAxis": true,
+                    "showYAxis": true
+                  },
+                  "metricConfig": [
+                    {
+                      "label": "TPOT Percentile",
+                      "unit": "ms"
+                    }
+                  ],
+                  "widget": {
+                    "title": "Time Per Output Token Percentile (TPOT)",
+                    "tips": "P50 / P75 / P90 / P95 / P99"
+                  }
+                }
+              ]
+            },
+            {
+              "name": "Providers",
+              "children": [
+                {
+                  "x": 0,
+                  "y": 0,
+                  "w": 8,
+                  "h": 13,
+                  "i": "0",
+                  "type": "Widget",
+                  "expressions": [
+                    "aggregate_labels(meter_envoy_ai_gw_instance_provider_request_cpm,sum(gen_ai_provider_name))"
+                  ],
+                  "graph": {
+                    "type": "Line",
+                    "showXAxis": true,
+                    "showYAxis": true
+                  },
+                  "metricConfig": [
+                    {
+                      "label": "Provider CPM",
+                      "unit": "calls/min"
+                    }
+                  ],
+                  "widget": {
+                    "title": "Request CPM by Provider"
+                  }
+                },
+                {
+                  "x": 8,
+                  "y": 0,
+                  "w": 8,
+                  "h": 13,
+                  "i": "1",
+                  "type": "Widget",
+                  "expressions": [
+                    "aggregate_labels(meter_envoy_ai_gw_instance_provider_token_rate,sum(gen_ai_provider_name))"
+                  ],
+                  "graph": {
+                    "type": "Line",
+                    "showXAxis": true,
+                    "showYAxis": true
+                  },
+                  "metricConfig": [
+                    {
+                      "label": "Provider Tokens",
+                      "unit": "tokens/min"
+                    }
+                  ],
+                  "widget": {
+                    "title": "Token Rate by Provider"
+                  }
+                },
+                {
+                  "x": 16,
+                  "y": 0,
+                  "w": 8,
+                  "h": 13,
+                  "i": "2",
+                  "type": "Widget",
+                  "expressions": [
+                    "aggregate_labels(meter_envoy_ai_gw_instance_provider_latency_avg,avg(gen_ai_provider_name))"
+                  ],
+                  "graph": {
+                    "type": "Line",
+                    "showXAxis": true,
+                    "showYAxis": true
+                  },
+                  "metricConfig": [
+                    {
+                      "label": "Provider Latency",
+                      "unit": "ms"
+                    }
+                  ],
+                  "widget": {
+                    "title": "Latency Avg by Provider"
+                  }
+                }
+              ]
+            },
+            {
+              "name": "Models",
+              "children": [
+                {
+                  "x": 0,
+                  "y": 0,
+                  "w": 8,
+                  "h": 13,
+                  "i": "0",
+                  "type": "Widget",
+                  "expressions": [
+                    "aggregate_labels(meter_envoy_ai_gw_instance_model_request_cpm,sum(gen_ai_response_model))"
+                  ],
+                  "graph": {
+                    "type": "Line",
+                    "showXAxis": true,
+                    "showYAxis": true
+                  },
+                  "metricConfig": [
+                    {
+                      "label": "Model CPM",
+                      "unit": "calls/min"
+                    }
+                  ],
+                  "widget": {
+                    "title": "Request CPM by Model"
+                  }
+                },
+                {
+                  "x": 8,
+                  "y": 0,
+                  "w": 8,
+                  "h": 13,
+                  "i": "1",
+                  "type": "Widget",
+                  "expressions": [
+                    "aggregate_labels(meter_envoy_ai_gw_instance_model_token_rate,sum(gen_ai_response_model))"
+                  ],
+                  "graph": {
+                    "type": "Line",
+                    "showXAxis": true,
+                    "showYAxis": true
+                  },
+                  "metricConfig": [
+                    {
+                      "label": "Model Tokens",
+                      "unit": "tokens/min"
+                    }
+                  ],
+                  "widget": {
+                    "title": "Token Rate by
Model" + } + }, + { + "x": 16, + "y": 0, + "w": 8, + "h": 13, + "i": "2", + "type": "Widget", + "expressions": [ + "aggregate_labels(meter_envoy_ai_gw_instance_model_latency_avg,avg(gen_ai_response_model))" + ], + "graph": { + "type": "Line", + "showXAxis": true, + "showYAxis": true + }, + "metricConfig": [ + { + "label": "Model Latency", + "unit": "ms" + } + ], + "widget": { + "title": "Latency Avg by Model" + } + }, + { + "x": 0, + "y": 13, + "w": 12, + "h": 13, + "i": "3", + "type": "Widget", + "expressions": [ + "aggregate_labels(meter_envoy_ai_gw_instance_model_ttft_avg,avg(gen_ai_response_model))" + ], + "graph": { + "type": "Line", + "showXAxis": true, + "showYAxis": true + }, + "metricConfig": [ + { + "label": "Model TTFT", + "unit": "ms" + } + ], + "widget": { + "title": "Time to First Token Avg by Model (TTFT)" + } + }, + { + "x": 12, + "y": 13, + "w": 12, + "h": 13, + "i": "4", + "type": "Widget", + "expressions": [ + "aggregate_labels(meter_envoy_ai_gw_instance_model_tpot_avg,avg(gen_ai_response_model))" + ], + "graph": { + "type": "Line", + "showXAxis": true, + "showYAxis": true + }, + "metricConfig": [ + { + "label": "Model TPOT", + "unit": "ms" + } + ], + "widget": { + "title": "Time Per Output Token Avg by Model (TPOT)" + } + } + ] + }, + { + "name": "Log", + "children": [ + { + "x": 0, + "y": 0, + "w": 24, + "h": 48, + "i": "0", + "type": "Log" + } + ] + } + ] + } + ], + "layer": "ENVOY_AI_GATEWAY", + "entity": "ServiceInstance", + "name": "Envoy-AI-Gateway-Instance", + "id": "Envoy-AI-Gateway-Instance", + "isRoot": false, + "expressions": [ + "avg(meter_envoy_ai_gw_instance_request_latency_avg)", + "avg(meter_envoy_ai_gw_instance_request_cpm)", + "avg(meter_envoy_ai_gw_instance_input_token_rate)", + "avg(meter_envoy_ai_gw_instance_output_token_rate)" + ], + "expressionsConfig": [ + { + "unit": "ms", + "label": "Latency" + }, + { + "label": "CPM", + "unit": "calls/min" + }, + { + "label": "Input Tokens", + "unit": "tokens/min" + }, + { + "label": 
"Output Tokens", + "unit": "tokens/min" + } + ] + } + } +] diff --git a/oap-server/server-starter/src/main/resources/ui-initialized-templates/envoy_ai_gateway/envoy-ai-gateway-root.json b/oap-server/server-starter/src/main/resources/ui-initialized-templates/envoy_ai_gateway/envoy-ai-gateway-root.json new file mode 100644 index 000000000000..d23619c7eafd --- /dev/null +++ b/oap-server/server-starter/src/main/resources/ui-initialized-templates/envoy_ai_gateway/envoy-ai-gateway-root.json @@ -0,0 +1,79 @@ +[ + { + "id": "Envoy-AI-Gateway-Root", + "configuration": { + "children": [ + { + "x": 0, + "y": 0, + "w": 24, + "h": 2, + "i": "1", + "type": "Text", + "graph": { + "fontColor": "theme", + "backgroundColor": "theme", + "content": "Observe Envoy AI Gateway via OTLP metrics and access logs", + "fontSize": 14, + "textAlign": "left", + "url": "https://skywalking.apache.org/docs/main/next/en/setup/backend/backend-envoy-ai-gateway-monitoring/" + } + }, + { + "x": 0, + "y": 2, + "w": 24, + "h": 52, + "i": "0", + "type": "Widget", + "widget": { + "title": "Envoy AI Gateway" + }, + "graph": { + "type": "ServiceList", + "dashboardName": "Envoy-AI-Gateway-Service", + "fontSize": 12, + "showXAxis": false, + "showYAxis": false, + "showGroup": false + }, + "expressions": [ + "avg(meter_envoy_ai_gw_request_cpm)", + "avg(meter_envoy_ai_gw_request_latency_avg)", + "avg(meter_envoy_ai_gw_input_token_rate)", + "avg(meter_envoy_ai_gw_output_token_rate)" + ], + "subExpressions": [ + "meter_envoy_ai_gw_request_cpm", + "meter_envoy_ai_gw_request_latency_avg", + "meter_envoy_ai_gw_input_token_rate", + "meter_envoy_ai_gw_output_token_rate" + ], + "metricConfig": [ + { + "label": "CPM", + "unit": "calls/min" + }, + { + "unit": "ms", + "label": "Latency" + }, + { + "label": "Input Tokens", + "unit": "tokens/min" + }, + { + "label": "Output Tokens", + "unit": "tokens/min" + } + ] + } + ], + "id": "Envoy-AI-Gateway-Root", + "layer": "ENVOY_AI_GATEWAY", + "entity": "All", + "name": 
"Envoy-AI-Gateway-Root", + "isRoot": true + } + } +] diff --git a/oap-server/server-starter/src/main/resources/ui-initialized-templates/envoy_ai_gateway/envoy-ai-gateway-service.json b/oap-server/server-starter/src/main/resources/ui-initialized-templates/envoy_ai_gateway/envoy-ai-gateway-service.json new file mode 100644 index 000000000000..e2599eee1b0a --- /dev/null +++ b/oap-server/server-starter/src/main/resources/ui-initialized-templates/envoy_ai_gateway/envoy-ai-gateway-service.json @@ -0,0 +1,558 @@ +[ + { + "id": "Envoy-AI-Gateway-Service", + "configuration": { + "children": [ + { + "x": 0, + "y": 0, + "w": 24, + "h": 42, + "i": "0", + "type": "Tab", + "children": [ + { + "name": "Overview", + "children": [ + { + "x": 0, + "y": 0, + "w": 8, + "h": 13, + "i": "0", + "type": "Widget", + "expressions": [ + "meter_envoy_ai_gw_request_cpm" + ], + "graph": { + "type": "Line", + "showXAxis": true, + "showYAxis": true + }, + "metricConfig": [ + { + "label": "Request CPM", + "unit": "calls/min" + } + ], + "widget": { + "title": "Request CPM", + "tips": "Calls Per Minute — total requests through the AI Gateway" + } + }, + { + "x": 8, + "y": 0, + "w": 8, + "h": 13, + "i": "1", + "type": "Widget", + "expressions": [ + "meter_envoy_ai_gw_request_latency_avg" + ], + "graph": { + "type": "Line", + "showXAxis": true, + "showYAxis": true + }, + "metricConfig": [ + { + "label": "Avg Latency", + "unit": "ms" + } + ], + "widget": { + "title": "Request Latency Avg" + } + }, + { + "x": 16, + "y": 0, + "w": 8, + "h": 13, + "i": "2", + "type": "Widget", + "expressions": [ + "meter_envoy_ai_gw_request_latency_percentile" + ], + "graph": { + "type": "Line", + "showXAxis": true, + "showYAxis": true + }, + "metricConfig": [ + { + "label": "Latency Percentile", + "unit": "ms" + } + ], + "widget": { + "title": "Request Latency Percentile", + "tips": "P50 / P75 / P90 / P95 / P99" + } + }, + { + "x": 0, + "y": 13, + "w": 8, + "h": 13, + "i": "3", + "type": "Widget", + "expressions": [ + 
"meter_envoy_ai_gw_input_token_rate" + ], + "graph": { + "type": "Line", + "showXAxis": true, + "showYAxis": true + }, + "metricConfig": [ + { + "label": "Input Tokens", + "unit": "tokens/min" + } + ], + "widget": { + "title": "Input Token Rate", + "tips": "Input (prompt) tokens per minute sent to LLM providers" + } + }, + { + "x": 8, + "y": 13, + "w": 8, + "h": 13, + "i": "4", + "type": "Widget", + "expressions": [ + "meter_envoy_ai_gw_output_token_rate" + ], + "graph": { + "type": "Line", + "showXAxis": true, + "showYAxis": true + }, + "metricConfig": [ + { + "label": "Output Tokens", + "unit": "tokens/min" + } + ], + "widget": { + "title": "Output Token Rate", + "tips": "Output (completion) tokens per minute generated by LLM providers" + } + }, + { + "x": 16, + "y": 13, + "w": 8, + "h": 13, + "i": "5", + "type": "Widget", + "expressions": [ + "meter_envoy_ai_gw_ttft_avg" + ], + "graph": { + "type": "Line", + "showXAxis": true, + "showYAxis": true + }, + "metricConfig": [ + { + "label": "TTFT Avg", + "unit": "ms" + } + ], + "widget": { + "title": "Time to First Token Avg (TTFT)", + "tips": "Average time to first token for streaming requests" + } + }, + { + "x": 0, + "y": 26, + "w": 8, + "h": 13, + "i": "6", + "type": "Widget", + "expressions": [ + "meter_envoy_ai_gw_ttft_percentile" + ], + "graph": { + "type": "Line", + "showXAxis": true, + "showYAxis": true + }, + "metricConfig": [ + { + "label": "TTFT Percentile", + "unit": "ms" + } + ], + "widget": { + "title": "Time to First Token Percentile (TTFT)", + "tips": "P50 / P75 / P90 / P95 / P99" + } + }, + { + "x": 8, + "y": 26, + "w": 8, + "h": 13, + "i": "7", + "type": "Widget", + "expressions": [ + "meter_envoy_ai_gw_tpot_avg" + ], + "graph": { + "type": "Line", + "showXAxis": true, + "showYAxis": true + }, + "metricConfig": [ + { + "label": "TPOT Avg", + "unit": "ms" + } + ], + "widget": { + "title": "Time Per Output Token Avg (TPOT)", + "tips": "Average inter-token latency for streaming requests" + } + }, + { 
+ "x": 16, + "y": 26, + "w": 8, + "h": 13, + "i": "8", + "type": "Widget", + "expressions": [ + "meter_envoy_ai_gw_tpot_percentile" + ], + "graph": { + "type": "Line", + "showXAxis": true, + "showYAxis": true + }, + "metricConfig": [ + { + "label": "TPOT Percentile", + "unit": "ms" + } + ], + "widget": { + "title": "Time Per Output Token Percentile (TPOT)", + "tips": "P50 / P75 / P90 / P95 / P99" + } + } + ] + }, + { + "name": "Providers", + "children": [ + { + "x": 0, + "y": 0, + "w": 8, + "h": 13, + "i": "0", + "type": "Widget", + "expressions": [ + "aggregate_labels(meter_envoy_ai_gw_provider_request_cpm,sum(gen_ai_provider_name))" + ], + "graph": { + "type": "Line", + "showXAxis": true, + "showYAxis": true + }, + "metricConfig": [ + { + "label": "Provider CPM", + "unit": "calls/min" + } + ], + "widget": { + "title": "Request CPM by Provider" + } + }, + { + "x": 8, + "y": 0, + "w": 8, + "h": 13, + "i": "1", + "type": "Widget", + "expressions": [ + "aggregate_labels(meter_envoy_ai_gw_provider_token_rate,sum(gen_ai_provider_name))" + ], + "graph": { + "type": "Line", + "showXAxis": true, + "showYAxis": true + }, + "metricConfig": [ + { + "label": "Provider Tokens", + "unit": "tokens/min" + } + ], + "widget": { + "title": "Token Rate by Provider" + } + }, + { + "x": 16, + "y": 0, + "w": 8, + "h": 13, + "i": "2", + "type": "Widget", + "expressions": [ + "aggregate_labels(meter_envoy_ai_gw_provider_latency_avg,avg(gen_ai_provider_name))" + ], + "graph": { + "type": "Line", + "showXAxis": true, + "showYAxis": true + }, + "metricConfig": [ + { + "label": "Provider Latency", + "unit": "ms" + } + ], + "widget": { + "title": "Latency Avg by Provider" + } + } + ] + }, + { + "name": "Models", + "children": [ + { + "x": 0, + "y": 0, + "w": 8, + "h": 13, + "i": "0", + "type": "Widget", + "expressions": [ + "aggregate_labels(meter_envoy_ai_gw_model_request_cpm,sum(gen_ai_response_model))" + ], + "graph": { + "type": "Line", + "showXAxis": true, + "showYAxis": true + }, + 
"metricConfig": [ + { + "label": "Model CPM", + "unit": "calls/min" + } + ], + "widget": { + "title": "Request CPM by Model" + } + }, + { + "x": 8, + "y": 0, + "w": 8, + "h": 13, + "i": "1", + "type": "Widget", + "expressions": [ + "aggregate_labels(meter_envoy_ai_gw_model_token_rate,sum(gen_ai_response_model))" + ], + "graph": { + "type": "Line", + "showXAxis": true, + "showYAxis": true + }, + "metricConfig": [ + { + "label": "Model Tokens", + "unit": "tokens/min" + } + ], + "widget": { + "title": "Token Rate by Model" + } + }, + { + "x": 16, + "y": 0, + "w": 8, + "h": 13, + "i": "2", + "type": "Widget", + "expressions": [ + "aggregate_labels(meter_envoy_ai_gw_model_latency_avg,avg(gen_ai_response_model))" + ], + "graph": { + "type": "Line", + "showXAxis": true, + "showYAxis": true + }, + "metricConfig": [ + { + "label": "Model Latency", + "unit": "ms" + } + ], + "widget": { + "title": "Latency Avg by Model" + } + }, + { + "x": 0, + "y": 13, + "w": 12, + "h": 13, + "i": "3", + "type": "Widget", + "expressions": [ + "aggregate_labels(meter_envoy_ai_gw_model_ttft_avg,avg(gen_ai_response_model))" + ], + "graph": { + "type": "Line", + "showXAxis": true, + "showYAxis": true + }, + "metricConfig": [ + { + "label": "Model TTFT", + "unit": "ms" + } + ], + "widget": { + "title": "Time to First Token Avg by Model (TTFT)" + } + }, + { + "x": 12, + "y": 13, + "w": 12, + "h": 13, + "i": "4", + "type": "Widget", + "expressions": [ + "aggregate_labels(meter_envoy_ai_gw_model_tpot_avg,avg(gen_ai_response_model))" + ], + "graph": { + "type": "Line", + "showXAxis": true, + "showYAxis": true + }, + "metricConfig": [ + { + "label": "Model TPOT", + "unit": "ms" + } + ], + "widget": { + "title": "Time Per Output Token Avg by Model (TPOT)" + } + } + ] + }, + { + "name": "Log", + "children": [ + { + "x": 0, + "y": 0, + "w": 24, + "h": 48, + "i": "0", + "type": "Log" + } + ] + }, + { + "name": "Instances", + "children": [ + { + "x": 0, + "y": 0, + "w": 24, + "h": 17, + "i": "0", + "type": 
"Widget", + "graph": { + "type": "InstanceList", + "dashboardName": "Envoy-AI-Gateway-Instance", + "fontSize": 12 + }, + "expressions": [ + "avg(meter_envoy_ai_gw_instance_request_cpm)", + "avg(meter_envoy_ai_gw_instance_request_latency_avg)", + "avg(meter_envoy_ai_gw_instance_input_token_rate)", + "avg(meter_envoy_ai_gw_instance_output_token_rate)" + ], + "subExpressions": [ + "meter_envoy_ai_gw_instance_request_cpm", + "meter_envoy_ai_gw_instance_request_latency_avg", + "meter_envoy_ai_gw_instance_input_token_rate", + "meter_envoy_ai_gw_instance_output_token_rate" + ], + "metricConfig": [ + { + "label": "CPM", + "unit": "calls/min" + }, + { + "label": "Latency", + "unit": "ms" + }, + { + "label": "Input Tokens", + "unit": "tokens/min" + }, + { + "label": "Output Tokens", + "unit": "tokens/min" + } + ] + } + ] + } + ] + } + ], + "layer": "ENVOY_AI_GATEWAY", + "entity": "Service", + "name": "Envoy-AI-Gateway-Service", + "id": "Envoy-AI-Gateway-Service", + "isRoot": false, + "isDefault": true, + "expressions": [ + "avg(meter_envoy_ai_gw_request_latency_avg)", + "avg(meter_envoy_ai_gw_request_cpm)", + "avg(meter_envoy_ai_gw_input_token_rate)/1000000", + "avg(meter_envoy_ai_gw_output_token_rate)/1000000" + ], + "expressionsConfig": [ + { + "unit": "ms", + "label": "Latency" + }, + { + "label": "CPM", + "unit": "calls/min" + }, + { + "label": "Input Tokens", + "unit": "M tokens/min" + }, + { + "label": "Output Tokens", + "unit": "M tokens/min" + } + ] + } + } +] diff --git a/oap-server/server-starter/src/main/resources/ui-initialized-templates/menu.yaml b/oap-server/server-starter/src/main/resources/ui-initialized-templates/menu.yaml index 2057f8b74a7a..feb37405afdd 100644 --- a/oap-server/server-starter/src/main/resources/ui-initialized-templates/menu.yaml +++ b/oap-server/server-starter/src/main/resources/ui-initialized-templates/menu.yaml @@ -252,6 +252,11 @@ menus: description: Observe the virtual GenAI providers and models which are conjectured by language agents 
through various plugins. documentLink: https://skywalking.apache.org/docs/main/next/en/setup/service-agent/virtual-genai/ i18nKey: virtual_gen_ai + - title: Envoy AI Gateway + layer: ENVOY_AI_GATEWAY + description: Observe Envoy AI Gateway traffic including token usage, latency, TTFT, and per-provider/model breakdowns via OTLP metrics and access logs. + documentLink: https://skywalking.apache.org/docs/main/next/en/setup/backend/backend-envoy-ai-gateway-monitoring/ + i18nKey: envoy_ai_gateway - title: Self Observability icon: self_observability description: Self Observability provides the observabilities for running components and servers from the SkyWalking ecosystem. diff --git a/test/e2e-v2/cases/envoy-ai-gateway/docker-compose.yml b/test/e2e-v2/cases/envoy-ai-gateway/docker-compose.yml new file mode 100644 index 000000000000..89669cf6fd31 --- /dev/null +++ b/test/e2e-v2/cases/envoy-ai-gateway/docker-compose.yml @@ -0,0 +1,85 @@ +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +# Envoy AI Gateway e2e — ai-gateway-cli + Ollama + SkyWalking OAP +# +# Architecture: +# trigger → ai-gateway-cli (port 1975) → ollama (port 11434) +# ↓ OTLP gRPC +# oap (port 11800) → banyandb + +services: + banyandb: + extends: + file: ../../script/docker-compose/base-compose.yml + service: banyandb + networks: + - e2e + + oap: + extends: + file: ../../script/docker-compose/base-compose.yml + service: oap + environment: + SW_STORAGE: banyandb + ports: + - 12800 + depends_on: + banyandb: + condition: service_healthy + + ollama: + image: ollama/ollama:0.6.2 + networks: + - e2e + expose: + - 11434 + healthcheck: + test: ["CMD", "ollama", "list"] + interval: 5s + timeout: 60s + retries: 120 + + aigw: + # TODO: pin to a release version once ai-gateway-cli HTTP listener is available in a release + image: envoyproxy/ai-gateway-cli:latest + command: run --run-id=0 + environment: + OPENAI_API_KEY: "dummy-key-not-used" + OPENAI_BASE_URL: "http://ollama:11434/v1" + OTEL_SERVICE_NAME: e2e-ai-gateway + OTEL_EXPORTER_OTLP_ENDPOINT: http://oap:11800 + OTEL_EXPORTER_OTLP_PROTOCOL: grpc + OTEL_METRICS_EXPORTER: otlp + OTEL_LOGS_EXPORTER: otlp + OTEL_METRIC_EXPORT_INTERVAL: "5000" + OTEL_RESOURCE_ATTRIBUTES: "job_name=envoy-ai-gateway,service.instance.id=aigw-1,service.layer=ENVOY_AI_GATEWAY" + ports: + - 1975 + networks: + - e2e + healthcheck: + test: ["CMD", "aigw", "healthcheck"] + interval: 5s + timeout: 60s + retries: 120 + depends_on: + oap: + condition: service_healthy + ollama: + condition: service_healthy + +networks: + e2e: diff --git a/test/e2e-v2/cases/envoy-ai-gateway/e2e.yaml b/test/e2e-v2/cases/envoy-ai-gateway/e2e.yaml new file mode 100644 index 000000000000..18aaa417b839 --- /dev/null +++ b/test/e2e-v2/cases/envoy-ai-gateway/e2e.yaml @@ -0,0 +1,70 @@ +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. 
+# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# Envoy AI Gateway e2e test (docker-compose) +# +# Validates ENVOY_AI_GATEWAY layer metrics and logs via OTLP from ai-gateway-cli. +# +# Architecture: +# trigger (curl) → ai-gateway-cli (port 1975) → Ollama (port 11434) +# ↓ OTLP gRPC +# SkyWalking OAP (port 11800) +# ↓ +# BanyanDB + +setup: + env: compose + file: docker-compose.yml + timeout: 20m + init-system-environment: ../../script/env + steps: + - name: set PATH + command: export PATH=/tmp/skywalking-infra-e2e/bin:$PATH + - name: install yq + command: bash test/e2e-v2/script/prepare/setup-e2e-shell/install.sh yq + - name: install swctl + command: bash test/e2e-v2/script/prepare/setup-e2e-shell/install.sh swctl + - name: Pull Ollama model + command: docker compose -f test/e2e-v2/cases/envoy-ai-gateway/docker-compose.yml exec ollama ollama pull qwen2.5:0.5b + - name: Send error requests for log sampling verification + command: | + for i in 1 2 3; do + curl -s --max-time 15 -o /dev/null -w "%{http_code} " \ + http://${aigw_host}:${aigw_1975}/v1/chat/completions \ + -H 'Content-Type: application/json' \ + -d '{"model":"nonexistent-model","messages":[{"role":"user","content":"Hi"}]}' + sleep 1 + done + +trigger: + action: http + interval: 3s + times: 10 + url: http://${aigw_host}:${aigw_1975}/v1/chat/completions + method: POST + headers: + Content-Type: application/json + body: 
'{"model":"qwen2.5:0.5b","stream":true,"messages":[{"role":"user","content":"Say hi"}]}' + +verify: + retry: + count: 30 + interval: 10s + cases: + - includes: + - ./envoy-ai-gateway-cases.yaml + +cleanup: + on: always diff --git a/test/e2e-v2/cases/envoy-ai-gateway/envoy-ai-gateway-cases.yaml b/test/e2e-v2/cases/envoy-ai-gateway/envoy-ai-gateway-cases.yaml new file mode 100644 index 000000000000..19f11d7dc70b --- /dev/null +++ b/test/e2e-v2/cases/envoy-ai-gateway/envoy-ai-gateway-cases.yaml @@ -0,0 +1,50 @@ +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +# Envoy AI Gateway e2e verification cases +# Service name = "e2e-ai-gateway" (from OTEL_SERVICE_NAME) + +cases: + # Service exists in ENVOY_AI_GATEWAY layer + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql service ls + expected: expected/service.yml + + # Service-level aggregate metrics + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_envoy_ai_gw_request_cpm --service-name=e2e-ai-gateway + expected: expected/metrics-has-value.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_envoy_ai_gw_request_latency_avg --service-name=e2e-ai-gateway + expected: expected/metrics-has-value.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_envoy_ai_gw_request_latency_percentile --service-name=e2e-ai-gateway + expected: expected/metrics-has-value-label.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_envoy_ai_gw_input_token_rate --service-name=e2e-ai-gateway + expected: expected/metrics-has-value.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_envoy_ai_gw_output_token_rate --service-name=e2e-ai-gateway + expected: expected/metrics-has-value.yml + + # Provider breakdown + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_envoy_ai_gw_provider_request_cpm --service-name=e2e-ai-gateway + expected: expected/metrics-has-value.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_envoy_ai_gw_provider_latency_avg --service-name=e2e-ai-gateway + expected: expected/metrics-has-value.yml + + # Model breakdown + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql 
metrics exec --expression=meter_envoy_ai_gw_model_request_cpm --service-name=e2e-ai-gateway + expected: expected/metrics-has-value.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_envoy_ai_gw_model_latency_avg --service-name=e2e-ai-gateway + expected: expected/metrics-has-value.yml + + # Access logs — error requests (404) should be persisted by LAL sampling + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql logs ls --service-name=e2e-ai-gateway + expected: expected/logs.yml diff --git a/test/e2e-v2/cases/envoy-ai-gateway/expected/logs.yml b/test/e2e-v2/cases/envoy-ai-gateway/expected/logs.yml new file mode 100644 index 000000000000..37a7f56d4184 --- /dev/null +++ b/test/e2e-v2/cases/envoy-ai-gateway/expected/logs.yml @@ -0,0 +1,36 @@ +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +errorreason: null +logs: + {{- contains .logs }} + - servicename: e2e-ai-gateway + serviceid: {{ notEmpty .serviceid }} + serviceinstancename: aigw-1 + serviceinstanceid: {{ notEmpty .serviceinstanceid }} + endpointname: null + endpointid: null + traceid: null + timestamp: {{ gt .timestamp 0 }} + contenttype: TEXT + content: {{ notEmpty .content }} + tags: + {{- contains .tags }} + - key: response_code + value: "404" + - key: gen_ai.request.model + value: nonexistent-model + {{- end }} + {{- end }} diff --git a/test/e2e-v2/cases/envoy-ai-gateway/expected/metrics-has-value-label.yml b/test/e2e-v2/cases/envoy-ai-gateway/expected/metrics-has-value-label.yml new file mode 100644 index 000000000000..4b2001de51e5 --- /dev/null +++ b/test/e2e-v2/cases/envoy-ai-gateway/expected/metrics-has-value-label.yml @@ -0,0 +1,38 @@ +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +debuggingtrace: null +type: TIME_SERIES_VALUES +results: + {{- contains .results }} + - metric: + labels: + {{- contains .metric.labels }} + - key: "p" + value: {{ notEmpty .value }} + {{- end}} + values: + {{- contains .values }} + - id: {{ notEmpty .id }} + value: {{ .value }} + traceid: null + owner: null + - id: {{ notEmpty .id }} + value: null + traceid: null + owner: null + {{- end}} + {{- end}} +error: null diff --git a/test/e2e-v2/cases/envoy-ai-gateway/expected/metrics-has-value.yml b/test/e2e-v2/cases/envoy-ai-gateway/expected/metrics-has-value.yml new file mode 100644 index 000000000000..979b9b25775c --- /dev/null +++ b/test/e2e-v2/cases/envoy-ai-gateway/expected/metrics-has-value.yml @@ -0,0 +1,34 @@ +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +debuggingtrace: null +type: TIME_SERIES_VALUES +results: + {{- contains .results }} + - metric: + labels: [] + values: + {{- contains .values }} + - id: {{ notEmpty .id }} + value: {{ notEmpty .value }} + traceid: null + owner: null + - id: {{ notEmpty .id }} + value: null + traceid: null + owner: null + {{- end}} + {{- end}} +error: null diff --git a/test/e2e-v2/cases/envoy-ai-gateway/expected/service.yml b/test/e2e-v2/cases/envoy-ai-gateway/expected/service.yml new file mode 100644 index 000000000000..c97c7e299737 --- /dev/null +++ b/test/e2e-v2/cases/envoy-ai-gateway/expected/service.yml @@ -0,0 +1,24 @@ +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +{{- contains . 
}} +- id: {{ b64enc "e2e-ai-gateway" }}.1 + name: e2e-ai-gateway + group: "" + shortname: e2e-ai-gateway + layers: + - ENVOY_AI_GATEWAY + normal: true +{{- end }} diff --git a/test/e2e-v2/cases/storage/expected/config-dump.yml b/test/e2e-v2/cases/storage/expected/config-dump.yml index 118fa76f7012..5dded79d67c9 100644 --- a/test/e2e-v2/cases/storage/expected/config-dump.yml +++ b/test/e2e-v2/cases/storage/expected/config-dump.yml @@ -75,7 +75,7 @@ query.graphql.enableLogTestTool=false envoy-metric.default.gRPCSslCertChainPath= receiver-ebpf.default.gRPCPort=0 promql.default.buildInfoRevision= -receiver-otel.default.enabledOtelMetricsRules=apisix,nginx/*,k8s/*,istio-controlplane,vm,mysql/*,postgresql/*,oap,aws-eks/*,windows,aws-s3/*,aws-dynamodb/*,aws-gateway/*,redis/*,elasticsearch/*,rabbitmq/*,mongodb/*,kafka/*,pulsar/*,bookkeeper/*,rocketmq/*,clickhouse/*,activemq/*,kong/*,flink/*,banyandb/* +receiver-otel.default.enabledOtelMetricsRules=apisix,nginx/*,k8s/*,istio-controlplane,vm,mysql/*,postgresql/*,oap,aws-eks/*,windows,aws-s3/*,aws-dynamodb/*,aws-gateway/*,redis/*,elasticsearch/*,rabbitmq/*,mongodb/*,kafka/*,pulsar/*,bookkeeper/*,rocketmq/*,clickhouse/*,activemq/*,kong/*,flink/*,banyandb/*,envoy-ai-gateway/* core.default.syncPeriodHttpUriRecognitionPattern=10 core.default.enableHierarchy=true event-analyzer.provider=default @@ -196,7 +196,7 @@ promql.default.buildInfoBranch= envoy-metric.default.alsHTTPAnalysis= core.default.maxMessageSize=52428800 core.default.dataKeeperExecutePeriod=5 -log-analyzer.default.lalFiles=envoy-als,mesh-dp,mysql-slowsql,pgsql-slowsql,redis-slowsql,k8s-service,nginx,default +log-analyzer.default.lalFiles=envoy-als,mesh-dp,mysql-slowsql,pgsql-slowsql,redis-slowsql,k8s-service,nginx,envoy-ai-gateway,default promql.default.buildInfoBuildUser=****** aws-firehose.default.enableTLS=false ai-pipeline.default.baselineServerPort=18080
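
The `config-dump.yml` expectations above capture the two OAP settings this feature depends on: the OTel receiver's enabled MAL rule list gains `envoy-ai-gateway/*`, and the log analyzer's LAL file list gains `envoy-ai-gateway`. A minimal sketch of the corresponding environment overrides for a standalone OAP follows; the `SW_`-prefixed variable names are an assumption based on the usual `application.yml` conventions (verify them against your OAP version), and the rule lists are abbreviated, so append to your existing values rather than replacing them.

```yaml
# Hypothetical compose override for a standalone OAP.
# SW_OTEL_RECEIVER_ENABLED_OTEL_METRICS_RULES and SW_LOG_LAL_FILES are
# assumed to follow the standard SW_ env-var convention; the lists here
# are shortened for illustration only.
services:
  oap:
    image: apache/skywalking-oap-server
    environment:
      # Enable the Envoy AI Gateway MAL rules in the OTel metric receiver.
      SW_OTEL_RECEIVER_ENABLED_OTEL_METRICS_RULES: "oap,envoy-ai-gateway/*"
      # Activate the Envoy AI Gateway LAL script for access log sampling.
      SW_LOG_LAL_FILES: "envoy-ai-gateway,default"
```

With these set, the gateway only needs to point its OTLP gRPC exporter at the OAP's port 11800, as the e2e `docker-compose.yml` above does via `OTEL_EXPORTER_OTLP_ENDPOINT`.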