Commit 64151d6

chore(webapp): reduce telemetry ingestion log volume (#3832)
## Summary

On a busy webapp, the trace/log/metric ingestion path emits several `info` logs per insert batch, and these make up the bulk of the service's log output. This change moves that per-batch chatter to `debug` and adds an opt-in to drop successful HTTP access logs, cutting log volume with no loss of error signal.

## Details

The per-batch ClickHouse insert logs, the flush scheduler's concurrency adjustments, and the event-loop utilization sample (already exported as a metric, so the log line was redundant) now log at `debug`. Error and warning logs are untouched.

New `HTTP_ACCESS_LOG_DISABLED=1` env var: when set, the HTTP access logger skips successful (2xx) requests while still logging non-2xx responses. Defaults off, so existing deployments are unchanged.

1 parent cae3dcb commit 64151d6

5 files changed

Lines changed: 21 additions & 6 deletions

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+---
+area: webapp
+type: improvement
+---
+
+Move per-batch ClickHouse event-insert logs to the debug level to cut default log volume, and add an `HTTP_ACCESS_LOG_DISABLED` env var that suppresses successful (2xx) HTTP access logs while still logging errors.

apps/webapp/app/eventLoopMonitor.server.ts

Lines changed: 1 addition & 1 deletion
@@ -124,7 +124,7 @@ function startEventLoopUtilizationMonitoring() {
 const utilization = Number.isFinite(diff.utilization) ? diff.utilization : 0;

 if (Math.random() < env.EVENT_LOOP_MONITOR_UTILIZATION_SAMPLE_RATE) {
-  logger.info("nodejs.event_loop.utilization", { utilization });
+  logger.debug("nodejs.event_loop.utilization", { utilization });
 }

 lastEventLoopUtilization = currentEventLoopUtilization;
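
The sampling gate in this hunk can be read as a small predicate. The following is a minimal sketch, not the webapp's code: `shouldSample` and its injectable `random` parameter are hypothetical names introduced here so the behavior can be checked deterministically.

```typescript
// Hypothetical helper: mirrors the `Math.random() < sampleRate` check in the hunk above.
// `random` is injectable only so the sketch can be tested without real randomness.
function shouldSample(sampleRate: number, random: () => number = Math.random): boolean {
  // Math.random() returns a value in [0, 1), so a rate of 0 never samples
  // and a rate of 1 always samples.
  return random() < sampleRate;
}

// Usage: the log line is emitted only when the sample is taken, and after this
// commit only when the logger is also configured to output `debug`.
if (shouldSample(0.05)) {
  // logger.debug("nodejs.event_loop.utilization", { utilization });
}
```

After this change a utilization log line has to pass two gates: the sample rate and the logger's level threshold.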

apps/webapp/app/v3/dynamicFlushScheduler.server.ts

Lines changed: 1 addition & 1 deletion
@@ -310,7 +310,7 @@ export class DynamicFlushScheduler<T> {
 if (newConcurrency !== currentConcurrency) {
   this.limiter = pLimit(newConcurrency);

-  this.logger.info("Adjusted flush concurrency", {
+  this.logger.debug("Adjusted flush concurrency", {
     previousConcurrency: currentConcurrency,
     newConcurrency,
     queuePressure,

apps/webapp/app/v3/eventRepository/clickhouseEventRepository.server.ts

Lines changed: 3 additions & 3 deletions
@@ -269,7 +269,7 @@ export class ClickhouseEventRepository implements IEventRepository {
   return;
 }

-logger.info("ClickhouseEventRepository.flushBatch Inserted batch into clickhouse", {
+logger.debug("ClickhouseEventRepository.flushBatch Inserted batch into clickhouse", {
   events: events.length,
   insertResult: outcome.insertResult,
   sanitized: outcome.kind === "sanitized",
@@ -302,7 +302,7 @@ export class ClickhouseEventRepository implements IEventRepository {
   return;
 }

-logger.info("ClickhouseEventRepository.flushLlmMetricsBatch Inserted LLM metrics batch", {
+logger.debug("ClickhouseEventRepository.flushLlmMetricsBatch Inserted LLM metrics batch", {
   rows: rows.length,
   sanitized: outcome.kind === "sanitized",
 });
@@ -421,7 +421,7 @@ export class ClickhouseEventRepository implements IEventRepository {
     throw insertError;
   }

-  logger.info("ClickhouseEventRepository.flushOtelMetricsBatch Inserted OTLP metrics batch", {
+  logger.debug("ClickhouseEventRepository.flushOtelMetricsBatch Inserted OTLP metrics batch", {
     rows: rows.length,
   });
 });
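
Why moving these calls from `info` to `debug` reduces output depends on the logger dropping messages below its configured level. The sketch below illustrates that idea with a generic threshold logger; it is an assumption for illustration, not the webapp's actual `logger`, and `createLogger`, `LEVEL_ORDER`, and the `sink` callback are hypothetical names.

```typescript
type Level = "debug" | "info" | "warn" | "error";

// Hypothetical severity ordering: higher numbers are more severe.
const LEVEL_ORDER: Record<Level, number> = { debug: 0, info: 1, warn: 2, error: 3 };

// Hypothetical threshold logger: messages below `minLevel` are dropped.
function createLogger(minLevel: Level, sink: (line: string) => void) {
  const emit = (level: Level, message: string, fields: Record<string, unknown> = {}) => {
    if (LEVEL_ORDER[level] < LEVEL_ORDER[minLevel]) return; // below threshold: no output
    sink(JSON.stringify({ level, message, ...fields }));
  };
  return {
    debug: (m: string, f?: Record<string, unknown>) => emit("debug", m, f),
    info: (m: string, f?: Record<string, unknown>) => emit("info", m, f),
    warn: (m: string, f?: Record<string, unknown>) => emit("warn", m, f),
    error: (m: string, f?: Record<string, unknown>) => emit("error", m, f),
  };
}

// Usage: with an assumed minimum level of "info", the per-batch message is
// dropped while an error still reaches the sink.
const lines: string[] = [];
const sketchLogger = createLogger("info", (line) => lines.push(line));
sketchLogger.debug("Inserted batch into clickhouse", { events: 500 });
sketchLogger.error("Insert failed", { reason: "timeout" });
```

Under that assumption the per-batch lines disappear at the default level and come back when the level is lowered to `debug`, which is why error and warning output is unaffected.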

apps/webapp/server.ts

Lines changed: 10 additions & 1 deletion
@@ -108,7 +108,16 @@ if (ENABLE_CLUSTER && cluster.isPrimary) {
 // more aggressive with this caching.
 app.use(express.static("public", { maxAge: "1h" }));

-app.use(morgan("tiny"));
+// On high-volume machine-ingest services (e.g. otel) the per-request access
+// log dominates log volume. HTTP_ACCESS_LOG_DISABLED suppresses successful
+// (2xx) access logs; non-2xx responses are always logged so errors stay visible.
+const suppressSuccessfulAccessLogs = process.env.HTTP_ACCESS_LOG_DISABLED === "1";
+app.use(
+  morgan("tiny", {
+    skip: (_req, res) =>
+      suppressSuccessfulAccessLogs && res.statusCode >= 200 && res.statusCode < 300,
+  })
+);

 process.title = ENABLE_CLUSTER
   ? `node webapp-worker-${cluster.isWorker ? cluster.worker?.id : "solo"}`
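
The skip rule in this hunk can be isolated as two pure functions, which makes the edge cases easy to check. This is a minimal sketch under stated assumptions: `isSuppressionEnabled` and `shouldSkipAccessLog` are hypothetical names, and only the env var name, the strict `=== "1"` comparison, and the 2xx range come from the diff.

```typescript
// Hypothetical helper: suppression is enabled only by the exact string "1",
// matching the strict comparison in the diff. Values such as "true" do not enable it.
function isSuppressionEnabled(env: Record<string, string | undefined>): boolean {
  return env.HTTP_ACCESS_LOG_DISABLED === "1";
}

// Hypothetical helper: mirrors the `skip` predicate passed to morgan in the diff.
// Returning true means the access log line for that response is not written.
function shouldSkipAccessLog(suppress: boolean, statusCode: number): boolean {
  return suppress && statusCode >= 200 && statusCode < 300;
}

// Usage: with suppression on, a 204 is skipped, while a 304 and a 500 are still logged.
const suppress = isSuppressionEnabled({ HTTP_ACCESS_LOG_DISABLED: "1" });
const skipped = [204, 304, 500].filter((status) => shouldSkipAccessLog(suppress, status));
```

Redirects and other 3xx responses fall outside the 2xx range, so they are still logged alongside 4xx and 5xx responses when the flag is set.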
