apache · wu-sheng · Mar 31, 2026
diff --git a/docs/en/changes/changes.md b/docs/en/changes/changes.md
@@ -169,7 +169,13 @@
 * Support Virtual-GenAI monitoring.
 * Fix on-demand pod log parsing failure by replacing invalid `DateTimeFormatter` pattern with `ISO_OFFSET_DATE_TIME`.
 * Fix Zipkin receiver compatibility with application/x-protobuf Content-Type.
-
+* Support Envoy AI Gateway observability (SWIP-10): new `ENVOY_AI_GATEWAY` layer with MAL/LAL rules
+  for GenAI metrics (token usage, latency, TTFT, TPOT) and access log sampling via OTLP.
+* OTel metric receiver: convert dots in data point attribute keys to underscores (consistent with resource
+  attributes and metric names). Label mappings are now fallback-only: an explicit `job_name` in resource
+  attributes takes precedence over the `service.name` fallback.
+* OTel log handler: prefer `service.instance.id` (OTel spec), falling back to `service.instance`.
+* Add `SampleFamily.debugDump()` for MAL debugging.
 
 #### UI
 * Fix the missing icon in new native trace view.
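
The OTel metric receiver entry in the changelog hunk above describes two behaviors: dot-to-underscore conversion of attribute keys, and a `service.name` to `job_name` mapping that applies only as a fallback. The following is a minimal Python sketch of that described behavior, not the OAP implementation. The function names `normalize_key` and `build_labels`, the sample attribute value `input`, and the collision rule between resource and data point attributes are assumptions made for illustration.

```python
# Illustrative sketch only; not the OAP receiver's actual code.

def normalize_key(key: str) -> str:
    """Replace dots with underscores, as described for attribute keys and metric names."""
    return key.replace(".", "_")


def build_labels(resource_attrs: dict, data_point_attrs: dict) -> dict:
    """Merge resource and data point attributes into MAL-style labels."""
    labels = {normalize_key(k): v for k, v in resource_attrs.items()}

    # Fallback-only mapping: copy service.name to job_name only when
    # job_name was not set explicitly in the resource attributes.
    if "job_name" not in labels and "service.name" in resource_attrs:
        labels["job_name"] = resource_attrs["service.name"]

    # Assumption: on a key collision, the data point attribute wins.
    for key, value in data_point_attrs.items():
        labels[normalize_key(key)] = value
    return labels


# Explicit job_name (as Envoy AI Gateway sets it) takes precedence.
explicit = build_labels(
    {"service.name": "my-ai-gateway", "job_name": "envoy-ai-gateway"},
    {"gen_ai.token.type": "input"},  # hypothetical attribute value
)
# Without an explicit job_name, service.name is used as the fallback.
fallback = build_labels({"service.name": "my-ai-gateway"}, {})
print(explicit["job_name"], fallback["job_name"])
```

In this sketch the original `service.name` attribute is still exposed as `service_name`; only the `job_name` label depends on whether an explicit value was supplied.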

diff --git a/docs/en/concepts-and-designs/lal.md b/docs/en/concepts-and-designs/lal.md
@@ -8,6 +8,19 @@ The LAL config files are in YAML format, and are located under directory `lal`.
 set `log-analyzer/default/lalFiles` in the `application.yml` file or set environment variable `SW_LOG_LAL_FILES` to
 activate specific LAL config files.
 
+## OTLP log attribute mapping
+
+When logs arrive via the OTLP receiver, resource attributes are mapped to `LogData` fields:
+
+| Resource attribute | LogData field | Notes |
+|---|---|---|
+| `service.name` | `service` | SkyWalking service name |
+| `service.instance.id` | `serviceInstance` | OTel standard ([spec](https://opentelemetry.io/docs/specs/semconv/resource/#service)). Falls back to `service.instance` for backward compatibility. |
+| `service.layer` | `layer` | Routes to the LAL rule with matching `layer` declaration |
+
+Log record attributes are available via `tag("attribute_name")` in LAL rules. Attribute keys
+retain their original names (dots are NOT converted to underscores in log attributes).
+
 ## Layer
 Layer should be declared in the LAL script to represent the analysis scope of the logs.
 

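The attribute mapping table added to `lal.md` above can be read as a small lookup with one fallback. The sketch below restates it in Python under stated assumptions: `map_resource_to_log_fields` and `tag` are hypothetical helper names, the fallback is assumed to trigger only when `service.instance.id` is absent, and `example.attr` is a made-up log record attribute key. It is not the OAP log handler.

```python
# Illustrative sketch only; not the OAP log handler's actual code.

def map_resource_to_log_fields(resource_attrs: dict) -> dict:
    """Map OTLP resource attributes to the LogData fields listed in the table."""
    # Prefer the OTel-standard service.instance.id; fall back to the
    # older service.instance key when the standard key is absent.
    instance = resource_attrs.get(
        "service.instance.id", resource_attrs.get("service.instance")
    )
    return {
        "service": resource_attrs.get("service.name"),
        "serviceInstance": instance,
        "layer": resource_attrs.get("service.layer"),
    }


def tag(record_attrs: dict, name: str):
    """Look up a log record attribute by its original (dotted) name."""
    return record_attrs.get(name)


fields = map_resource_to_log_fields(
    {
        "service.name": "my-ai-gateway",
        "service.instance": "pod-abc123",  # legacy key only
        "service.layer": "ENVOY_AI_GATEWAY",
    }
)
print(fields["serviceInstance"])

# Log record attribute keys keep their dots; "example.attr" is hypothetical.
record_attrs = {"example.attr": "value"}
print(tag(record_attrs, "example.attr"), tag(record_attrs, "example_attr"))
```
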
diff --git a/docs/en/setup/backend/backend-envoy-ai-gateway-monitoring.md b/docs/en/setup/backend/backend-envoy-ai-gateway-monitoring.md
@@ -0,0 +1,104 @@
+# Envoy AI Gateway Monitoring
+
+## Envoy AI Gateway observability via OTLP
+
+[Envoy AI Gateway](https://aigateway.envoyproxy.io/) is a gateway/proxy for AI/LLM API traffic
+(OpenAI, Anthropic, AWS Bedrock, Azure OpenAI, Google Gemini, etc.) built on top of Envoy Proxy.
+It natively emits GenAI metrics and access logs via OTLP, following
+[OpenTelemetry GenAI Semantic Conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/).
+
+SkyWalking receives OTLP metrics and logs directly on its gRPC port (11800), so no OpenTelemetry
+Collector is needed between the AI Gateway and SkyWalking OAP.
+
+### Prerequisites
+- [Envoy AI Gateway](https://aigateway.envoyproxy.io/) deployed. See the
+  [Envoy AI Gateway getting started](https://aigateway.envoyproxy.io/docs/getting-started/) for installation.
+
+### Data flow
+1. Envoy AI Gateway processes LLM API requests and records GenAI metrics (token usage, latency, TTFT, TPOT).
+2. The AI Gateway pushes metrics and access logs via OTLP gRPC to SkyWalking OAP.
+3. SkyWalking OAP parses metrics with [MAL](../../concepts-and-designs/mal.md) rules and access logs
+   with [LAL](../../concepts-and-designs/lal.md) rules.
+
+### Set up
+
+The MAL rules (`envoy-ai-gateway/*`) and LAL rules (`envoy-ai-gateway`) are enabled by default
+in SkyWalking OAP. No OAP-side configuration is needed.
+
+Configure the AI Gateway to push OTLP to SkyWalking by setting these environment variables:
+
+| Env Var | Value | Purpose |
+|---------|-------|---------|
+| `OTEL_SERVICE_NAME` | Per-deployment gateway name (e.g., `my-ai-gateway`) | SkyWalking service name |
+| `OTEL_EXPORTER_OTLP_ENDPOINT` | `http://skywalking-oap:11800` | SkyWalking OAP gRPC receiver |
+| `OTEL_EXPORTER_OTLP_PROTOCOL` | `grpc` | OTLP transport |
+| `OTEL_METRICS_EXPORTER` | `otlp` | Enable OTLP metrics push |
+| `OTEL_LOGS_EXPORTER` | `otlp` | Enable OTLP access log push |
+| `OTEL_RESOURCE_ATTRIBUTES` | See below | Routing + instance + layer |
+
+**Required resource attributes** (in `OTEL_RESOURCE_ATTRIBUTES`):
+- `job_name=envoy-ai-gateway` — Fixed routing tag for MAL/LAL rules. Same for all AI Gateway deployments.
+- `service.instance.id=<instance-id>` — Instance identity. In Kubernetes, use the pod name via Downward API.
+- `service.layer=ENVOY_AI_GATEWAY` — Routes access logs to the AI Gateway LAL rules.
+
+**Example:**
+```bash
+OTEL_SERVICE_NAME=my-ai-gateway
+OTEL_EXPORTER_OTLP_ENDPOINT=http://skywalking-oap:11800
+OTEL_EXPORTER_OTLP_PROTOCOL=grpc
+OTEL_METRICS_EXPORTER=otlp
+OTEL_LOGS_EXPORTER=otlp
+OTEL_RESOURCE_ATTRIBUTES=job_name=envoy-ai-gateway,service.instance.id=pod-abc123,service.layer=ENVOY_AI_GATEWAY
+```
+
+### Supported Metrics
+
+SkyWalking observes the AI Gateway as a `LAYER: ENVOY_AI_GATEWAY` service. Each gateway deployment
+is a service, and each pod is an instance. Metrics include per-provider and per-model breakdowns.
+
+#### Service Metrics
+
+| Monitoring Panel | Unit | Metric Name | Description |
+|---|---|---|---|
+| Request CPM | calls/min | meter_envoy_ai_gw_request_cpm | Requests per minute |
+| Request Latency Avg | ms | meter_envoy_ai_gw_request_latency_avg | Average request duration |
+| Request Latency Percentile | ms | meter_envoy_ai_gw_request_latency_percentile | P50/P75/P90/P95/P99 request duration |
+| Input Token Rate | tokens/min | meter_envoy_ai_gw_input_token_rate | Input (prompt) tokens per minute |
+| Output Token Rate | tokens/min | meter_envoy_ai_gw_output_token_rate | Output (completion) tokens per minute |
+| TTFT Avg | ms | meter_envoy_ai_gw_ttft_avg | Time to First Token (streaming only) |
+| TTFT Percentile | ms | meter_envoy_ai_gw_ttft_percentile | P50/P75/P90/P95/P99 TTFT |
+| TPOT Avg | ms | meter_envoy_ai_gw_tpot_avg | Time Per Output Token (streaming only) |
+| TPOT Percentile | ms | meter_envoy_ai_gw_tpot_percentile | P50/P75/P90/P95/P99 TPOT |
+
+#### Provider Breakdown Metrics
+
+| Monitoring Panel | Unit | Metric Name | Description |
+|---|---|---|---|
+| Provider Request CPM | calls/min | meter_envoy_ai_gw_provider_request_cpm | Requests by provider |
+| Provider Token Rate | tokens/min | meter_envoy_ai_gw_provider_token_rate | Token rate by provider |
+| Provider Latency Avg | ms | meter_envoy_ai_gw_provider_latency_avg | Latency by provider |
+
+#### Model Breakdown Metrics
+
+| Monitoring Panel | Unit | Metric Name | Description |
+|---|---|---|---|
+| Model Request CPM | calls/min | meter_envoy_ai_gw_model_request_cpm | Requests by model |
+| Model Token Rate | tokens/min | meter_envoy_ai_gw_model_token_rate | Token rate by model |
+| Model Latency Avg | ms | meter_envoy_ai_gw_model_latency_avg | Latency by model |
+| Model TTFT Avg | ms | meter_envoy_ai_gw_model_ttft_avg | TTFT by model |
+| Model TPOT Avg | ms | meter_envoy_ai_gw_model_tpot_avg | TPOT by model |
+
+#### Instance Metrics
+
+All service-level metrics are also available per instance (pod) with the `meter_envoy_ai_gw_instance_` prefix,
+including per-provider and per-model breakdowns.
+
+### Access Log Sampling
+
+The LAL rules apply a sampling policy to reduce storage:
+- **Error responses** (HTTP status >= 400) — always persisted.
+- **Upstream failures** — always persisted.
+- **High token cost** (>= 10,000 total tokens) — persisted for cost anomaly detection.
+- Normal successful responses with low token counts are dropped.
+
+The token threshold can be adjusted in `lal/envoy-ai-gateway.yaml`.
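
The access log sampling policy described at the end of the new document above reduces to a single keep-or-drop decision per log entry. The following Python sketch expresses that decision under stated assumptions: the function name `should_persist` and its parameters are hypothetical, and the real rules live in the LAL file `lal/envoy-ai-gateway.yaml`, whose syntax is not reproduced here.

```python
# Illustrative sketch only; the actual policy is defined in LAL rules.

TOKEN_THRESHOLD = 10_000  # total tokens; documented as adjustable


def should_persist(status_code: int, upstream_failed: bool, total_tokens: int) -> bool:
    """Return True when an access log entry would be kept under the described policy."""
    if status_code >= 400:  # error responses are always persisted
        return True
    if upstream_failed:  # upstream failures are always persisted
        return True
    if total_tokens >= TOKEN_THRESHOLD:  # high token cost is persisted
        return True
    return False  # normal, low-cost successes are dropped


print(should_persist(200, False, 500))     # normal success, low token count
print(should_persist(200, False, 10_000))  # at the token threshold
print(should_persist(503, False, 0))       # error response
```

The threshold comparison is inclusive, matching the documented `>= 10,000` condition.
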
diff --git a/docs/en/setup/backend/marketplace.md b/docs/en/setup/backend/marketplace.md
@@ -12,6 +12,7 @@ SkyWalking provides ready-to-use monitoring capabilities for a wide range of tec
 - **Infrastructure** - Linux and Windows server monitoring
 - **Cloud Services** - AWS EKS, S3, DynamoDB, API Gateway, and more
 - **Gateways** - Nginx, APISIX, Kong monitoring
+- **GenAI** - [Virtual GenAI](../service-agent/virtual-genai.md) for agent-based LLM call monitoring, [Envoy AI Gateway](backend-envoy-ai-gateway-monitoring.md) for infrastructure-side AI traffic observability
 - **Databases** - MySQL, PostgreSQL, Redis, Elasticsearch, MongoDB, ClickHouse, and more
 - **Message Queues** - Kafka, RabbitMQ, Pulsar, RocketMQ, ActiveMQ
 - **Browser** - Real user monitoring for web applications

diff --git a/docs/en/setup/backend/opentelemetry-receiver.md b/docs/en/setup/backend/opentelemetry-receiver.md
@@ -26,7 +26,26 @@ The receiver adds label with key `node_identifier_host_name` to the collected da
 and its value is from `net.host.name` (or `host.name` for some OTLP versions) resource attributes defined in OpenTelemetry proto,
 for identification of the metric data.
 
-**Notice:** In the resource scope, dots (.) in the attributes' key names are converted to underscores (_), whereas in the metrics scope, they are not converted.
+**Label name conversion:** Dots (`.`) in attribute key names are converted to underscores (`_`) for both
+resource attributes and data point (metric-level) attributes. For example, `gen_ai.token.type` becomes
+`gen_ai_token_type` in MAL rules. Metric names also undergo the same conversion (e.g.,
+`gen_ai.client.token.usage` becomes `gen_ai_client_token_usage`).
+
+**Fallback label mappings:** The following resource attributes are copied to alternative label names.
+These mappings are fallback-only: if the target label is already present in the resource attributes,
+the copy is skipped.
+
+| Source | Target | Notes |
+|---|---|---|
+| `service.name` | `job_name` | The [OTel Collector Prometheus Receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/prometheusreceiver/README.md) automatically converts the Prometheus `job` label to `service.name`. This fallback ensures it is available as `job_name` for MAL rule filtering. |
+| `net.host.name` | `node_identifier_host_name` | Legacy: used by VM/Windows MAL rules |
+| `host.name` | `node_identifier_host_name` | Legacy: used by VM/Windows MAL rules |
+
+When `job_name` is set explicitly in `OTEL_RESOURCE_ATTRIBUTES` (e.g., by Envoy AI Gateway),
+it takes precedence and the `service.name` fallback is skipped.
+
+**Note:** The `net.host.name` and `host.name` mappings are legacy. New integrations should use
+the natural dot-to-underscore conversion (e.g., `host.name` → `host_name` in MAL rules).
 
 | Description                             | Configuration File                                  | Data Source                                                                                                            |
 |-----------------------------------------|-----------------------------------------------------|------------------------------------------------------------------------------------------------------------------------|