From c8b728c9f919069f8edbd3febfe73511812c47ae Mon Sep 17 00:00:00 2001
From: Wu Sheng
Date: Thu, 30 Apr 2026 19:48:28 +0800
Subject: [PATCH] Add mini-program monitoring blog (WeChat / Alipay)

---
 .../index.md | 124 ++++++++++++++++++
 .../index.md | 124 ++++++++++++++++++
 2 files changed, 248 insertions(+)
 create mode 100644 content/blog/2026-04-30-mini-program-monitoring-with-skywalking/index.md
 create mode 100644 content/zh/2026-04-30-mini-program-monitoring-with-skywalking/index.md

diff --git a/content/blog/2026-04-30-mini-program-monitoring-with-skywalking/index.md b/content/blog/2026-04-30-mini-program-monitoring-with-skywalking/index.md
new file mode 100644
index 000000000000..4e21862a5ec1
--- /dev/null
+++ b/content/blog/2026-04-30-mini-program-monitoring-with-skywalking/index.md
@@ -0,0 +1,124 @@
+---
+title: "Monitoring WeChat and Alipay Mini Programs with SkyWalking"
+author: "Sheng Wu"
+date: 2026-04-30
+description: "SkyAPM/mini-program-monitor and SkyWalking OAP together extend SkyWalking's end-user monitoring to WeChat and Alipay Mini Programs. This post focuses on the data path, the cross-platform abstraction, and the OAP-side integration."
+tags:
+- Mini Program
+- WeChat
+- Alipay
+- OTLP
+- End User Monitoring
+---
+
+# Monitoring WeChat and Alipay Mini Programs with SkyWalking
+
+Mini programs are a major part of the mobile experience in China, but the open-source observability ecosystem has long focused on web browsers and native apps. SkyWalking already covers browser (client-js), iOS, and the server side; mini programs and Android were the remaining gaps. With [SkyAPM/mini-program-monitor](https://github.com/SkyAPM/mini-program-monitor) joining the SkyWalking ecosystem, the mini-program half of that gap is closed: one SDK supports both WeChat and Alipay, and the matching OAP-side component IDs, MAL rules, and UI templates are merged on `main` and will ship with 10.5.0.
+
+This post is for teams that already run a SkyWalking backend and want to bring their mini programs into the same observability stack. The interesting part isn't *that* the project exists; it is how the data flows from a mini program to a SkyWalking dashboard, how the two platforms coexist, and what design trade-offs you should know about before rolling this out.
+
+## Data path
+
+The SDK uses two protocols:
+
+- **OTLP HTTP** (error logs, performance metrics, request metrics) → OAP `/v1/logs`, `/v1/metrics`
+- **SkyWalking native** (distributed tracing segments, optional) → OAP `/v3/segments`
+
+Why not a single protocol? OTLP already covers logs and metrics, so there's no point reinventing native endpoints for those. But for tracing, OAP's native `SegmentObject` maps more cleanly onto SkyWalking's trace model, and `sw8` header propagation to the backend works without any conversion. So traces go native, everything else goes OTLP, and neither side has to translate.
+
+OTLP defaults to protobuf; JSON is available for debugging. The SDK has zero runtime dependencies.
+
+## Two platforms, two independent Layers and dashboards
+
+Many teams maintain a WeChat mini program and an Alipay mini program against a shared backend. Rather than collapsing them into a single tagged service, the design promotes each platform to its own Layer (`WECHAT_MINI_PROGRAM` and `ALIPAY_MINI_PROGRAM`), each with its own dashboard set. The SDK tags every signal with a resource attribute `miniprogram.platform = wechat | alipay` and assigns each platform its own component ID (WeChat = 10002, Alipay = 10003).
+
+On the OAP side, the MAL rule's `filter` routes data into the right Layer at ingest:
+
+```yaml
+metricPrefix: meter_wechat_mp
+filter: "{ tags -> tags.miniprogram_platform == 'wechat' }"
+```
+
+The Alipay rule mirrors this with `'alipay'`. The two rules are mutually exclusive, so nothing is double counted, and they produce distinct metric prefixes (`meter_wechat_mp_*` vs `meter_alipay_mp_*`) that feed each Layer's dashboards. Even when both platforms use the same `service.name` (e.g. `mini-program-demo`), the UI exposes two completely separate entry points.
+
+## Asymmetric metric semantics
+
+This is the design choice I want to highlight. WeChat's base library exposes `PerformanceObserver`, which gives you renderer-authoritative timings: app launch, first render, route navigation, script execution, and sub-package load are all real measurements. Alipay's base library doesn't offer an equivalent, so the SDK falls back to lifecycle hooks: the `App.onLaunch → App.onShow` delta is used as an approximation of launch time, and renderer-level timings simply aren't available.
+
+So the two MAL rule sets are deliberately not the same:
+
+- **WeChat**: `app_launch_duration`, `first_render_duration`, `route_duration`, `script_duration`, `package_load_duration`, `request_duration_percentile`, `request_cpm`
+- **Alipay**: `app_launch_duration`, `first_render_duration`, `request_duration_percentile`, `request_cpm`
+
+The Alipay `app_launch_duration` is a lifecycle approximation and is not directly comparable to WeChat's renderer timing; the dashboard tooltip says so explicitly. Putting the two numbers side by side is comparing two different measurement definitions.
+
+## What the SDK does
+
+Four signals:
+
+- **Errors**: JS exceptions, unhandled promise rejections, and `pageNotFound` go out as OTLP logs, following the OTel `exception.*` semantic conventions (`exception.type`, `exception.stacktrace`). Anything downstream that speaks OTLP (SkyWalking, OTel Collector, Grafana) recognizes them.
+- **Performance**: the metrics listed above, sent as OTLP gauges.
+- **Requests**: `wx.request` / `my.request` / `downloadFile` / `uploadFile` are reported as OTLP delta histograms, one batch per `flushInterval` (default 5s). The `le` bucket labels are already in milliseconds, and the MAL rule explicitly declares `MILLISECONDS` to disable the default SECONDS→MS rescale. Failed requests (4xx / 5xx / timeout) additionally emit an error log so you can pivot from a dashboard to a concrete failure.
+- **Tracing (opt-in)**: when enabled, outbound requests get `sw8` header injection, and the resulting segments stitch together with backend traces into one end-to-end view. Trace data goes out as SkyWalking `SegmentObject`, not OTLP traces.
+
+Two details, one about reliability and one about cardinality, are worth calling out:
+
+**Persisting events on app hide.** Mini programs get killed by the framework after some time in the background, and weak networks make in-flight events easy to lose. On `onAppHide`, the SDK persists unsent events with `wx.setStorage` / `my.setStorage` and restores them on the next launch.
+
+**Avoiding cardinality explosions.** Set `serviceInstance` to the app version (e.g. `1.4.2`), not a device ID; at a million DAU the device-ID dimension blows up the OAP instance index. For request paths, the SDK exposes `urlGroupRules` regex patterns to fold parameterized URLs like `/api/user/12345` into `/api/user/{id}` so the endpoint dimension doesn't blow up either.
+
+## What OAP needs
+
+If you're on `main` or a release ≥ 10.5.0, the following are already shipped:
+
+- `config/component-libraries.yml` registers `WeChat-MiniProgram: 10002` and `AliPay-MiniProgram: 10003`
+- `config/otel-rules/miniprogram/` holds four MAL rules: service-scoped and instance-scoped for each platform
+- `config/ui-initialized-templates/wechat_mini_program/` and `alipay_mini_program/` carry root / service / instance / endpoint dashboards
+- `config/ui-initialized-templates/menu.yaml` registers both layers under the Mobile menu group
+
+The only thing left is enabling the OTel receiver and giving the SDK an OTLP HTTP port it can reach. SkyWalking OAP binds its OTLP HTTP handler onto the receiver-sharing-server port, and that port defaults to `0`, meaning it's folded into the core REST port (12800). If you want the SDK to use the standard OTLP HTTP port 4318, set the sharing port to 4318:
+
+```bash
+docker run -d --name sw-oap \
+  -p 11800:11800 -p 12800:12800 -p 4318:4318 \
+  -e SW_STORAGE=banyandb \
+  -e SW_STORAGE_BANYANDB_TARGETS=banyandb:17912 \
+  -e SW_OTEL_RECEIVER=default \
+  -e SW_RECEIVER_SHARING_REST_PORT=4318 \
+  apache/skywalking-oap-server:latest
+```
+
+All receivers (OTLP, native segment, browser perf, log report) move to 4318 together, while GraphQL stays on 12800 for the UI.
+
+Minimal SDK config:
+
+```js
+import MiniProgramMonitor from 'mini-program-monitor';
+
+MiniProgramMonitor.init({
+  service: 'mini-program-demo',
+  serviceInstance: '1.4.2', // Recommended: app version
+  collector: 'http://your-oap:4318',
+  enable: {
+    error: true,
+    perf: true,
+    request: true,
+    tracing: false, // Off by default; enable as needed
+  },
+});
+```
+
+WeChat and Alipay use the same config; the SDK detects the platform at runtime and tags the data accordingly.
+
+## Compatibility
+
+- WeChat base library ≥ 2.11
+- Alipay base library ≥ 2.0
+- Apache SkyWalking OAP `main` or ≥ 10.5.0, with the OTLP HTTP receiver enabled
+- Any other OTLP-compatible backend (OpenTelemetry Collector, Grafana, etc.) also works, but you won't get the SkyWalking-specific cross-platform dashboards
+
+## What's next
+
+To get involved, head over to [SkyAPM/mini-program-monitor](https://github.com/SkyAPM/mini-program-monitor) and open an issue or PR. The repo also ships a `make preview` target that boots OAP, the UI, and both platform simulators locally, which is handy if you want to play with it end-to-end.
+
+Android end-user experience monitoring is still a gap in the SkyWalking ecosystem; contributors interested in closing that one are very welcome.
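Note on the "Avoiding cardinality explosions" paragraph in the post above: the URL folding it describes can be illustrated with a small standalone sketch. This is not the SDK's implementation. The `{ pattern, group }` rule shape, the `exampleRules` list, and the `groupUrlPath` helper are all assumptions made up for illustration, and the real `urlGroupRules` option may be shaped differently.

```javascript
// Hypothetical rule shape (an assumption, not the SDK's actual API):
// each rule pairs a RegExp tested against the URL path with the
// grouped endpoint name to report instead of the raw path.
const exampleRules = [
  { pattern: /^\/api\/user\/\d+$/, group: '/api/user/{id}' },
  { pattern: /^\/api\/order\/[0-9a-f-]{36}$/i, group: '/api/order/{uuid}' },
];

// Fold a raw request URL into a low-cardinality endpoint name.
// Plain string handling is used instead of the URL class, on the
// assumption that a mini-program runtime may not provide it.
function groupUrlPath(rawUrl, rules) {
  // Strip "scheme://host[:port]" if present, then any query string or fragment.
  const withoutOrigin = rawUrl.replace(/^[a-z][a-z0-9+.-]*:\/\/[^/]+/i, '');
  const path = withoutOrigin.split(/[?#]/)[0] || '/';

  // First matching rule wins; unmatched paths are reported unchanged.
  for (const rule of rules) {
    if (rule.pattern.test(path)) {
      return rule.group;
    }
  }
  return path;
}

console.log(groupUrlPath('https://api.example.com/api/user/12345?x=1', exampleRules));
// prints "/api/user/{id}"
console.log(groupUrlPath('/api/health', exampleRules));
// prints "/api/health"
```

With rules like these, a million distinct user IDs collapse into one endpoint name, which is the effect the post attributes to `urlGroupRules`. Ordering the rules from most to least specific keeps the first-match behaviour predictable.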
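diff --git a/content/zh/2026-04-30-mini-program-monitoring-with-skywalking/index.md b/content/zh/2026-04-30-mini-program-monitoring-with-skywalking/index.md
new file mode 100644
index 000000000000..d069139fecf1
--- /dev/null
+++ b/content/zh/2026-04-30-mini-program-monitoring-with-skywalking/index.md
@@ -0,0 +1,124 @@
+---
+title: "Monitoring WeChat and Alipay Mini Programs with SkyWalking"
+author: "吴晟"
+date: 2026-04-30
+description: "SkyAPM/mini-program-monitor works with SkyWalking OAP to bring WeChat and Alipay mini programs into SkyWalking's end-user experience monitoring. This post focuses on the data path, the dual-platform abstraction, and the OAP-side integration."
+tags:
+- Mini Program
+- WeChat
+- Alipay
+- OTLP
+- End User Monitoring
+---
+
+# Monitoring WeChat and Alipay Mini Programs with SkyWalking
+
+Mini programs are an unavoidable part of the mobile experience in China, yet the open-source monitoring ecosystem has long leaned toward web browsers and native apps. SkyWalking itself already covers the browser (client-js), iOS, and the server side; the main gaps were mini programs and Android. Now that [SkyAPM/mini-program-monitor](https://github.com/SkyAPM/mini-program-monitor) has joined the SkyWalking ecosystem, the mini-program part of that gap is filled: a single SDK supports both WeChat and Alipay, and the OAP-side components, MAL rules, and UI templates have been merged into the `main` branch and will ship with 10.5.0.
+
+This post is written for teams that already run a SkyWalking backend and want to connect their mini programs to it. The point is not that the project exists, but which path the data takes from a mini program to a SkyWalking dashboard, how the two platforms coexist, and which design trade-offs you need to know before going live.
+
+## Data path
+
+The SDK stands on two legs:
+
+- **OTLP HTTP** (error logs, performance metrics, request metrics) → OAP's `/v1/logs` and `/v1/metrics`
+- **SkyWalking native protocol** (trace segments, optional) → OAP's `/v3/segments`
+
+Why not a single protocol? OTLP already covers the logs and metrics signals, so there is no need to build another set of native endpoints. For distributed tracing, however, OAP's native `SegmentObject` fits SkyWalking's own trace model more closely than OTLP traces do, and it needs no conversion when context is propagated to the server side through the `sw8` header. So tracing goes native and everything else goes OTLP, with no detour on either side.
+
+OTLP uses protobuf by default and can be switched to JSON for debugging. The SDK has no runtime dependencies.
+
+## Two platforms map to two independent Layers and dashboards
+
+Many teams maintain a WeChat mini program and an Alipay mini program at the same time, with the business logic sharing one backend. This design does not squeeze them into a single service distinguished by a tag; instead it makes them two independent Layers, `WECHAT_MINI_PROGRAM` and `ALIPAY_MINI_PROGRAM`, each with its own set of dashboards. The SDK stamps every signal with the resource attribute `miniprogram.platform = wechat | alipay` and assigns each platform its own component ID (WeChat 10002, Alipay 10003).
+
+On the OAP side, the `filter` of the MAL rule routes the data to the matching Layer at the ingest stage:
+
+```yaml
+metricPrefix: meter_wechat_mp
+filter: "{ tags -> tags.miniprogram_platform == 'wechat' }"
+```
+
+The Alipay rule filters on `alipay` in the same way. The two rules are mutually exclusive, so nothing is counted twice; their output metric prefixes also differ (`meter_wechat_mp_*` vs `meter_alipay_mp_*`), and each lands on the dashboards of its own Layer. Even if both platforms use the same `service.name` (for example, both are called `mini-program-demo`), the UI still shows two completely independent entry points.
+
+## Asymmetric metric semantics
+
+This is an honest choice in the design that I particularly want to emphasize. WeChat's base library provides `PerformanceObserver`, which yields authoritative timings from the rendering layer: app launch, first render, route navigation, script execution, and sub-package load are all real measurements. Alipay's base library offers no equivalent API, so the SDK can only fall back to lifecycle hooks for an approximation: the `App.onLaunch → App.onShow` delta is taken as the launch time, and rendering-related timings are unavailable.
+
+So the metric sets in the two OAP rule files are not symmetric:
+
+- WeChat: `app_launch_duration`, `first_render_duration`, `route_duration`, `script_duration`, `package_load_duration`, `request_duration_percentile`, `request_cpm`
+- Alipay: `app_launch_duration`, `first_render_duration`, `request_duration_percentile`, `request_cpm`
+
+Alipay's `app_launch_duration` is a lifecycle approximation and cannot be compared directly with WeChat's rendering-layer value; the dashboard field tooltip states this as well. Comparing the two numbers side by side amounts to comparing two different measurement definitions.
+
+## What the SDK does
+
+Four kinds of signals:
+
+- **Errors**: JS exceptions, unhandled promise rejections, and pageNotFound are sent as OTLP logs following the OTel `exception.*` semantic conventions (`exception.type`, `exception.stacktrace`), so downstream they are recognized not only by SkyWalking but also by OTel Collector and Grafana.
+- **Performance**: the metrics listed above, sent as OTLP gauges.
+- **Requests**: `wx.request` / `my.request` / `downloadFile` / `uploadFile` are all reported as OTLP delta histograms, with one increment sent per flush interval (5s by default). The `le` bucket labels are already in ms, and the OAP MAL rule explicitly declares `MILLISECONDS` to prevent the default SECONDS→MS rescaling. Failed requests (4xx / 5xx / timeout) additionally emit an error log, which makes it easy to jump from a dashboard to the specific error.
+- **Tracing (optional)**: once enabled, the `sw8` header is injected into outbound requests, and after reaching OAP the data joins the server-side trace into one complete chain. Trace segments are sent as SkyWalking `SegmentObject`, not as OTLP traces.
+
+Two details about reliability and cardinality control are worth mentioning:
+
+**Persisting to local storage on app hide.** A mini program is killed by the framework after some time in the background, and data is easily lost on weak networks. On `onAppHide`, the SDK saves unsent events with `wx.setStorage` / `my.setStorage`, then restores them on the next launch and resumes reporting.
+
+**Preventing cardinality explosion.** It is strongly recommended to set `serviceInstance` to the app version (for example `1.4.2`) rather than a device ID; with daily active users in the millions, the device-ID dimension would overwhelm OAP's instance index. For request paths, the SDK provides `urlGroupRules` regular expressions that fold parameterized paths such as `/api/user/12345` into `/api/user/{id}`, so the endpoint dimension does not explode either.
+
+## What OAP needs to do
+
+If you use the `main` branch or release 10.5.0 or later, the following are already built in:
+
+- `config/component-libraries.yml`: registers `WeChat-MiniProgram: 10002` and `AliPay-MiniProgram: 10003`
+- `config/otel-rules/miniprogram/`: four MAL rules, defined separately at the service and instance scopes
+- `config/ui-initialized-templates/wechat_mini_program/` and `alipay_mini_program/`: four dashboards (root / service / instance / endpoint)
+- `config/ui-initialized-templates/menu.yaml`: registers both layers under the Mobile menu group
+
+The only thing left to do is to make sure the OTel receiver is enabled and to give OTLP HTTP a port the SDK can connect to directly. SkyWalking OAP's OTLP HTTP handler is bound to the receiver-sharing-server port by default, and that port defaults to 0 (that is, it reuses the core REST port 12800). If you want the SDK to use the standard OTLP HTTP port 4318, set the sharing port to 4318:
+
+```bash
+docker run -d --name sw-oap \
+  -p 11800:11800 -p 12800:12800 -p 4318:4318 \
+  -e SW_STORAGE=banyandb \
+  -e SW_STORAGE_BANYANDB_TARGETS=banyandb:17912 \
+  -e SW_OTEL_RECEIVER=default \
+  -e SW_RECEIVER_SHARING_REST_PORT=4318 \
+  apache/skywalking-oap-server:latest
+```
+
+This moves all receivers (OTLP + native segment + browser perf + log report) to 4318 together, while GraphQL stays on 12800 for the UI.
+
+Minimal SDK-side configuration:
+
+```js
+import MiniProgramMonitor from 'mini-program-monitor';
+
+MiniProgramMonitor.init({
+  service: 'mini-program-demo',
+  serviceInstance: '1.4.2', // Recommended: app version number
+  collector: 'http://your-oap:4318',
+  enable: {
+    error: true,
+    perf: true,
+    request: true,
+    tracing: false, // Off by default; turn on as needed
+  },
+});
+```
+
+The configuration is identical for WeChat and Alipay; the SDK determines the platform tag automatically at runtime.
+
+## Compatibility
+
+- WeChat base library ≥ 2.11
+- Alipay base library ≥ 2.0
+- Apache SkyWalking OAP `main` branch or ≥ 10.5.0; enabling the OTLP HTTP receiver is enough
+- Any OTLP backend (OpenTelemetry Collector, Grafana, etc.) can also be used, but that route does not provide the SkyWalking-specific dual-platform dashboards
+
+## What's next
+
+To take part, go straight to [SkyAPM/mini-program-monitor](https://github.com/SkyAPM/mini-program-monitor) and open an issue or PR. The repository has a `make preview` target that brings up a local demo environment with OAP, the UI, and simulators for both platforms in one step; run it directly if you want to see the result.
+
+End-user experience monitoring on Android is still a blank spot in the SkyWalking ecosystem; anyone interested in this area is welcome to help fill it.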