diff --git a/docs/en/changes/changes.md b/docs/en/changes/changes.md
index 3ef88c95d8a0..2dd4f183aee3 100644
--- a/docs/en/changes/changes.md
+++ b/docs/en/changes/changes.md
@@ -245,6 +245,7 @@ admin-host only" entry above for the public REST retirement.
 #### OAP Server
+* Add Node.js runtime metrics via the Node.js agent **`MeterReportService`** pipeline (`meter_instance_nodejs_*`, 1s collect/report). OAP analyzes raw meters through `nodejs-runtime.yaml`. Node.js E2E asserts six `meter_instance_nodejs_*` metrics (`test/e2e-v2/cases/nodejs/e2e.yaml`).
 * Add PHP runtime PHM meter analyzer (`php-runtime.yaml`) for SkyWalking PHP agent process metrics (CPU, memory, virtual memory, thread count, open file descriptors sampled from `/proc` on Linux). Registers six `meter_instance_php_*` metrics on the General Service
@@ -333,5 +334,6 @@
 * Improve downsampling documentation
 * Fix the docker-compose quickstart: OAP healthcheck no longer calls `curl` (absent from the JRE image) and probes the query port via bash `/dev/tcp`; the Horizon UI service maps the correct container port (8081) and mounts a `horizon.yaml` (binding `0.0.0.0`, OAP URLs, demo `admin`/`admin` login) instead of non-existent `SW_*_ADDRESS` env vars.
 * Add PHP runtime metrics (PHM) dashboard documentation (agent setup, OAP `php-runtime` MAL rules, Horizon UI widgets).
+* Add Node.js runtime metrics dashboard documentation (agent setup, OAP `nodejs-runtime` MAL rules, Horizon UI widgets).
 All issues and pull requests are [here](https://github.com/apache/skywalking/issues?q=milestone:11.0.0)
diff --git a/docs/en/debugging/config_dump.md b/docs/en/debugging/config_dump.md
index 29911ef73f55..753f8dd31d8b 100644
--- a/docs/en/debugging/config_dump.md
+++ b/docs/en/debugging/config_dump.md
@@ -52,7 +52,7 @@ This API also provides the response in JSON format, which is more friendly for p
     "aws-firehose.default.port":"12801",
     "core.default.restPort":"12800",
     "receiver-sharing-server.default.gRPCSslCertChainPath":"",
-    "agent-analyzer.default.meterAnalyzerActiveFiles":"datasource,threadpool,satellite,go-runtime,python-runtime,continuous-profiling,java-agent,go-agent",
+    "agent-analyzer.default.meterAnalyzerActiveFiles":"datasource,threadpool,satellite,go-runtime,python-runtime,continuous-profiling,java-agent,go-agent,ruby-runtime,php-runtime,nodejs-runtime",
     "agent-analyzer.default.traceSamplingPolicySettingsFile":"trace-sampling-policy-settings.yml",
     "core.default.gRPCSslTrustedCAPath":"",
     "configuration-discovery.default.disableMessageDigest":"false",
diff --git a/docs/en/setup/backend/backend-meter.md b/docs/en/setup/backend/backend-meter.md
index fa1bf25f5d4b..e7c73e2acd4e 100644
--- a/docs/en/setup/backend/backend-meter.md
+++ b/docs/en/setup/backend/backend-meter.md
@@ -54,6 +54,7 @@ All following agents and components have built-in meters reporting to the OAP th
 6. Rover(eBPF) agent for metrics used continues profiling
 7. Satellite proxy self-observability metrics
 8. PHP agent for PHM (PHP Health Metrics) runtime metrics — **Linux only** (`/proc` sampling)
+9. Node.js agent for runtime metrics (`instance_nodejs_*` via `MeterReportService`). OAP exposes `meter_instance_nodejs_*` through `nodejs-runtime.yaml`.
 ## Configuration file
diff --git a/docs/en/setup/backend/dashboards-nodejs-runtime.md b/docs/en/setup/backend/dashboards-nodejs-runtime.md
new file mode 100644
index 000000000000..ae91a8a15fea
--- /dev/null
+++ b/docs/en/setup/backend/dashboards-nodejs-runtime.md
@@ -0,0 +1,71 @@
+# Node.js runtime metrics
+
+The SkyWalking Node.js agent reports **runtime metrics** (process memory and CPU) through the
+**`MeterReportService`** gRPC protocol—the same pipeline used by the Go and Python agents. OAP analyzes raw meters via
+`meter-analyzer-config/nodejs-runtime.yaml` and stores **`meter_instance_nodejs_*`** metrics at the service instance level.
+
+## Platform support
+
+Runtime meters are collected on **Linux, macOS, and Windows** via Node.js built-in APIs
+(`process.memoryUsage()`, `process.cpuUsage()`, `v8.getHeapStatistics()`).
+
+## Data flow
+
+1. On each collect interval (default **1 second**), the agent samples Node.js runtime APIs.
+2. Samples are mapped into `MeterData` protobuf messages and buffered.
+3. On each report interval (default **1 second**), buffered meters are sent to OAP over gRPC port **11800**
+   via `MeterReportService.collect` (independent from trace export on the same address).
+4. OAP applies MAL rules in `nodejs-runtime.yaml` and exposes **`meter_instance_nodejs_*`** metrics.
+5. Horizon UI renders widgets on **General Service → Instance → Dashboard** when the corresponding
+   `meter_instance_nodejs_*` metrics exist.
+
+## Agent setup
+
+Runtime metric reporting is **on by default**.
Relevant environment variables:
+
+| Variable | Description | Default |
+| :--- | :--- | :--- |
+| `SW_AGENT_NODEJS_RUNTIME_METRICS_REPORTER_ACTIVE` | Master switch for runtime metric export | `true` |
+| `SW_AGENT_NODEJS_RUNTIME_METRICS_COLLECT_PERIOD` | Sample interval (ms) | `1000` |
+| `SW_AGENT_NODEJS_RUNTIME_METRICS_REPORT_PERIOD` | Report interval (ms) | `1000` |
+| `SW_AGENT_NODEJS_RUNTIME_METRICS_BUFFER_SIZE` | Max buffered samples before dropping oldest | `600` |
+
+Deprecated aliases `SW_AGENT_RUNTIME_METRICS_*`, `SW_AGENT_NVM_METRICS_*`, and `SW_AGENT_NVM_JVM_*` are still accepted.
+
+See the [Node.js agent README](https://github.com/apache/skywalking-nodejs/blob/master/README.md#nodejs-runtime-metrics)
+for startup examples and the full field mapping from Node.js APIs to meter names.
+
+## OAP setup
+
+Ensure the gRPC receiver is reachable on the port configured in `SW_AGENT_COLLECTOR_BACKEND_SERVICES` (default `11800`).
+The `nodejs-runtime` meter analyzer file is included in the default `meterAnalyzerActiveFiles` list—no extra
+configuration is required for current Node.js agents.
+
+Meter rules live in `oap-server/server-starter/src/main/resources/meter-analyzer-config/nodejs-runtime.yaml`.
+
+## UI location
+
+**Layer:** General Service (`GENERAL`)
+
+**Path:** select a Node.js service → **Instance** → **Dashboard**
+
+Widgets appear only when runtime data is present (`visibleWhen` checks each `meter_instance_nodejs_*` expression).
+
+## Runtime metrics
+
+The agent reports raw meter names; OAP prefixes them with `meter_` when exposing queryable metrics:
+
+| Unit | Agent meter name | OAP / UI metric name | Description | Data Source |
+| :--- | :--- | :--- | :--- | :--- |
+| % | `instance_nodejs_process_cpu` | `meter_instance_nodejs_process_cpu` | Process CPU (user + system) over collect interval | SkyWalking Node.js Agent |
+| bytes | `instance_nodejs_heap_used` | `meter_instance_nodejs_heap_used` | V8 heap used | SkyWalking Node.js Agent |
+| bytes | `instance_nodejs_heap_total` | `meter_instance_nodejs_heap_total` | V8 heap total | SkyWalking Node.js Agent |
+| bytes | `instance_nodejs_heap_limit` | `meter_instance_nodejs_heap_limit` | V8 max heap size | SkyWalking Node.js Agent |
+| bytes | `instance_nodejs_rss` | `meter_instance_nodejs_rss` | Resident set size | SkyWalking Node.js Agent |
+| bytes | `instance_nodejs_external_memory` | `meter_instance_nodejs_external_memory` | External memory | SkyWalking Node.js Agent |
+
+## Customizations
+
+You can customize MAL expressions or dashboard panels. Metric definitions and expression rules are in
+`meter-analyzer-config/nodejs-runtime.yaml`. Instance dashboard widget templates ship from the
+SkyWalking Horizon UI bundle (`general.json` in apache/skywalking-horizon-ui).
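Reviewer note, not part of the patch: the metrics table gives `instance_nodejs_process_cpu` in `%` as "user + system over collect interval", without spelling out the arithmetic. The sketch below shows one plausible derivation. The names `CpuDelta`, `cpuPercent` and `oapMetricName`, the single-core normalisation, and the sample numbers are all assumptions made for illustration; the agent's actual formula lives in the Node.js agent repository and may differ, for example by dividing by the core count.

```typescript
// Hypothetical shape of a CPU-time delta between two samples. The fields mirror
// what process.cpuUsage(previous) returns in Node.js: microseconds of CPU time.
interface CpuDelta {
  user: number;   // microseconds of user CPU time since the previous sample
  system: number; // microseconds of system CPU time since the previous sample
}

// Assumed formula: busy CPU time divided by wall-clock time, as a percentage
// of a single core. This is an illustration, not the agent's confirmed method.
function cpuPercent(delta: CpuDelta, elapsedMs: number): number {
  if (elapsedMs <= 0) {
    return 0; // no elapsed time: avoid dividing by zero
  }
  const busyMicros = delta.user + delta.system;
  const elapsedMicros = elapsedMs * 1000;
  return (busyMicros / elapsedMicros) * 100;
}

// Maps a raw agent meter name to the queryable OAP name, following
// `metricPrefix: meter` in nodejs-runtime.yaml.
function oapMetricName(agentMeterName: string): string {
  return `meter_${agentMeterName}`;
}

// Example: 125 ms of CPU time spent within a 1000 ms collect interval.
const percent = cpuPercent({ user: 100000, system: 25000 }, 1000);
console.log(oapMetricName("instance_nodejs_process_cpu"), percent);
```

In a running process the delta would come from `process.cpuUsage(previous)`, which reports user and system time in microseconds, and the elapsed time from a clock reading taken at each sample.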
diff --git a/docs/menu.yml b/docs/menu.yml
index 8fee302d1416..22058a9f08b3 100644
--- a/docs/menu.yml
+++ b/docs/menu.yml
@@ -262,6 +262,8 @@ catalog:
     path: "/en/setup/backend/backend-meter"
   - name: "PHP runtime metrics (PHM)"
     path: "/en/setup/backend/dashboards-php-runtime"
+  - name: "Node.js runtime metrics"
+    path: "/en/setup/backend/dashboards-nodejs-runtime"
   - name: "Telegraf Metrics"
     path: "/en/setup/backend/telegraf-receiver"
   - name: "Apdex Threshold"
diff --git a/oap-server/analyzer/meter-analyzer-scripts-test/src/test/resources/scripts/mal/test-meter-analyzer-config/nodejs-runtime.data.yaml b/oap-server/analyzer/meter-analyzer-scripts-test/src/test/resources/scripts/mal/test-meter-analyzer-config/nodejs-runtime.data.yaml
new file mode 100644
index 000000000000..098012cc621e
--- /dev/null
+++ b/oap-server/analyzer/meter-analyzer-scripts-test/src/test/resources/scripts/mal/test-meter-analyzer-config/nodejs-runtime.data.yaml
@@ -0,0 +1,96 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+script: oap-server/server-starter/src/main/resources/meter-analyzer-config/nodejs-runtime.yaml
+input:
+  instance_nodejs_process_cpu:
+    - labels:
+        instance: test-instance
+      value: 12.5
+  instance_nodejs_heap_used:
+    - labels:
+        instance: test-instance
+      value: 1048576.0
+  instance_nodejs_heap_total:
+    - labels:
+        instance: test-instance
+      value: 2097152.0
+  instance_nodejs_heap_limit:
+    - labels:
+        instance: test-instance
+      value: 4294967296.0
+  instance_nodejs_rss:
+    - labels:
+        instance: test-instance
+      value: 3145728.0
+  instance_nodejs_external_memory:
+    - labels:
+        instance: test-instance
+      value: 65536.0
+expected:
+  meter_instance_nodejs_process_cpu:
+    entities:
+      - scope: SERVICE_INSTANCE
+        instance: test-instance
+        layer: GENERAL
+    samples:
+      - labels:
+          instance: test-instance
+        value: 12.5
+  meter_instance_nodejs_heap_used:
+    entities:
+      - scope: SERVICE_INSTANCE
+        instance: test-instance
+        layer: GENERAL
+    samples:
+      - labels:
+          instance: test-instance
+        value: 1048576.0
+  meter_instance_nodejs_heap_total:
+    entities:
+      - scope: SERVICE_INSTANCE
+        instance: test-instance
+        layer: GENERAL
+    samples:
+      - labels:
+          instance: test-instance
+        value: 2097152.0
+  meter_instance_nodejs_heap_limit:
+    entities:
+      - scope: SERVICE_INSTANCE
+        instance: test-instance
+        layer: GENERAL
+    samples:
+      - labels:
+          instance: test-instance
+        value: 4294967296.0
+  meter_instance_nodejs_rss:
+    entities:
+      - scope: SERVICE_INSTANCE
+        instance: test-instance
+        layer: GENERAL
+    samples:
+      - labels:
+          instance: test-instance
+        value: 3145728.0
+  meter_instance_nodejs_external_memory:
+    entities:
+      - scope: SERVICE_INSTANCE
+        instance: test-instance
+        layer: GENERAL
+    samples:
+      - labels:
+          instance: test-instance
+        value: 65536.0
diff --git a/oap-server/server-starter/src/main/resources/application.yml b/oap-server/server-starter/src/main/resources/application.yml
index 97cd27ee766d..eccd933d0c9b 100644
--- a/oap-server/server-starter/src/main/resources/application.yml
+++
b/oap-server/server-starter/src/main/resources/application.yml
@@ -229,7 +229,7 @@ agent-analyzer:
     # Nginx and Envoy agents can't get the real remote address.
     # Exit spans with the component in the list would not generate the client-side instance relation metrics.
     noUpstreamRealAddressAgents: ${SW_NO_UPSTREAM_REAL_ADDRESS:6000,9000}
-    meterAnalyzerActiveFiles: ${SW_METER_ANALYZER_ACTIVE_FILES:datasource,threadpool,satellite,go-runtime,python-runtime,continuous-profiling,java-agent,go-agent,ruby-runtime,php-runtime} # Which files could be meter analyzed, files split by ","
+    meterAnalyzerActiveFiles: ${SW_METER_ANALYZER_ACTIVE_FILES:datasource,threadpool,satellite,go-runtime,python-runtime,continuous-profiling,java-agent,go-agent,ruby-runtime,php-runtime,nodejs-runtime} # Which files could be meter analyzed, files split by ","
     slowCacheReadThreshold: ${SW_SLOW_CACHE_SLOW_READ_THRESHOLD:default:20,redis:10} # The slow cache read operation thresholds. Unit ms.
     slowCacheWriteThreshold: ${SW_SLOW_CACHE_SLOW_WRITE_THRESHOLD:default:20,redis:10} # The slow cache write operation thresholds. Unit ms.
diff --git a/oap-server/server-starter/src/main/resources/meter-analyzer-config/nodejs-runtime.yaml b/oap-server/server-starter/src/main/resources/meter-analyzer-config/nodejs-runtime.yaml
new file mode 100644
index 000000000000..e1f90a2fdaea
--- /dev/null
+++ b/oap-server/server-starter/src/main/resources/meter-analyzer-config/nodejs-runtime.yaml
@@ -0,0 +1,32 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.
You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# Node.js runtime metrics via MeterReportService (instance_nodejs_*).
+
+expSuffix: instance(['service'], ['instance'], Layer.GENERAL)
+metricPrefix: meter
+metricsRules:
+  - name: instance_nodejs_process_cpu
+    exp: instance_nodejs_process_cpu
+  - name: instance_nodejs_heap_used
+    exp: instance_nodejs_heap_used
+  - name: instance_nodejs_heap_total
+    exp: instance_nodejs_heap_total
+  - name: instance_nodejs_heap_limit
+    exp: instance_nodejs_heap_limit
+  - name: instance_nodejs_rss
+    exp: instance_nodejs_rss
+  - name: instance_nodejs_external_memory
+    exp: instance_nodejs_external_memory
diff --git a/test/e2e-v2/cases/nodejs/Dockerfile.nodejs b/test/e2e-v2/cases/nodejs/Dockerfile.nodejs
index e19e81f12503..3d28865d88a8 100644
--- a/test/e2e-v2/cases/nodejs/Dockerfile.nodejs
+++ b/test/e2e-v2/cases/nodejs/Dockerfile.nodejs
@@ -13,7 +13,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
-FROM node:12
+FROM node:20
 ARG SW_AGENT_NODEJS_COMMIT
diff --git a/test/e2e-v2/cases/nodejs/e2e.yaml b/test/e2e-v2/cases/nodejs/e2e.yaml
index c26660e3d652..0b348a75056e 100644
--- a/test/e2e-v2/cases/nodejs/e2e.yaml
+++ b/test/e2e-v2/cases/nodejs/e2e.yaml
@@ -109,3 +109,16 @@ verify:
       expected: expected/metrics-has-value.yml
     - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=service_instance_relation_server_cpm --instance-name=consumer-instance --service-name=consumer --dest-instance-name=consumer1 --dest-service-name=e2e-service-consumer
       expected: expected/metrics-has-value.yml
+    # nodejs runtime metrics (MeterReportService -> meter_instance_nodejs_*)
+    - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_instance_nodejs_process_cpu --instance-name=provider-instance --service-name=provider
+      expected: expected/metrics-has-value.yml
+    - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_instance_nodejs_heap_used --instance-name=provider-instance --service-name=provider
+      expected: expected/metrics-has-value.yml
+    - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_instance_nodejs_heap_total --instance-name=provider-instance --service-name=provider
+      expected: expected/metrics-has-value.yml
+    - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_instance_nodejs_heap_limit --instance-name=provider-instance --service-name=provider
+      expected: expected/metrics-has-value.yml
+    - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_instance_nodejs_rss --instance-name=provider-instance --service-name=provider
+      expected: expected/metrics-has-value.yml
+    - query: swctl --display yaml
--base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_instance_nodejs_external_memory --instance-name=provider-instance --service-name=provider
+      expected: expected/metrics-has-value.yml
diff --git a/test/e2e-v2/cases/nodejs/expected/service-instance-consumer-nodejs.yml b/test/e2e-v2/cases/nodejs/expected/service-instance-consumer-nodejs.yml
index f93d91198b2c..5cce62a74d19 100644
--- a/test/e2e-v2/cases/nodejs/expected/service-instance-consumer-nodejs.yml
+++ b/test/e2e-v2/cases/nodejs/expected/service-instance-consumer-nodejs.yml
@@ -22,8 +22,10 @@
   {{- contains .attributes }}
   - name: OS Name
     value: linux
+  - name: hostname
+    value: {{ notEmpty .value }}
   - name: "Process No."
-    value: "1"
+    value: {{ notEmpty .value }}
   - name: ipv4s
     value: ""
   {{- end}}
diff --git a/test/e2e-v2/cases/nodejs/expected/service-instance-provider-nodejs.yml b/test/e2e-v2/cases/nodejs/expected/service-instance-provider-nodejs.yml
index d7e0937e3ba5..8820aa2618b1 100644
--- a/test/e2e-v2/cases/nodejs/expected/service-instance-provider-nodejs.yml
+++ b/test/e2e-v2/cases/nodejs/expected/service-instance-provider-nodejs.yml
@@ -22,8 +22,10 @@
   {{- contains .attributes }}
   - name: OS Name
     value: linux
+  - name: hostname
+    value: {{ notEmpty .value }}
   - name: "Process No."
-    value: "1"
+    value: {{ notEmpty .value }}
   - name: ipv4s
     value: ""
   {{- end}}
diff --git a/test/e2e-v2/cases/storage/expected/config-dump.yml b/test/e2e-v2/cases/storage/expected/config-dump.yml
index f847b69d6652..6f07b0f40684 100644
--- a/test/e2e-v2/cases/storage/expected/config-dump.yml
+++ b/test/e2e-v2/cases/storage/expected/config-dump.yml
@@ -32,7 +32,7 @@
   "admin-server.default.port": "17128",
   "admin-server.provider": "default",
   "agent-analyzer.default.forceSampleErrorSegment": "true",
-  "agent-analyzer.default.meterAnalyzerActiveFiles": "datasource,threadpool,satellite,go-runtime,python-runtime,continuous-profiling,java-agent,go-agent,ruby-runtime,php-runtime",
+  "agent-analyzer.default.meterAnalyzerActiveFiles": "datasource,threadpool,satellite,go-runtime,python-runtime,continuous-profiling,java-agent,go-agent,ruby-runtime,php-runtime,nodejs-runtime",
   "agent-analyzer.default.noUpstreamRealAddressAgents": "6000,9000",
   "agent-analyzer.default.segmentStatusAnalysisStrategy": "FROM_SPAN_STATUS",
   "agent-analyzer.default.slowCacheReadThreshold": "default:20,redis:10",
diff --git a/test/e2e-v2/script/env b/test/e2e-v2/script/env
index de74d3dd6408..8642e4c68488 100644
--- a/test/e2e-v2/script/env
+++ b/test/e2e-v2/script/env
@@ -16,7 +16,7 @@
 SW_AGENT_JAVA_COMMIT=ac0df43d7140e726eba9e5e5b1b75cf364c71dff
 SW_AGENT_SATELLITE_COMMIT=ea27a3f4e126a24775fe12e2aa2695bcb23d99c3
 SW_AGENT_NGINX_LUA_COMMIT=c3cee4841798a147d83b96a10914d4ac0e11d0aa
-SW_AGENT_NODEJS_COMMIT=4f9a91dad3dfd8cfe5ba8f7bd06b39e11eb5e65e
+SW_AGENT_NODEJS_COMMIT=36df516f737bfc665dab7312703d826f9dcf527e
 SW_AGENT_GO_COMMIT=19a9fa9bf058329281aa611f176cf5b7e5cbda8f
 SW_AGENT_PYTHON_COMMIT=b91ebc46010ba6a46b251d4df54190c3b64f2db8
 SW_AGENT_CLIENT_JS_COMMIT=f08776d909eb1d9bc79c600e493030651b97e491