feat(posthog): add Redis-backed flag definition cache for local evaluation#3384
feat(posthog): add Redis-backed flag definition cache for local evaluation#3384kilo-code-bot[bot] wants to merge 1 commit into
Conversation
…ation Enables PostHog local evaluation with a distributed Redis cache so that multiple Next.js server processes share a single copy of flag definitions instead of each making independent API calls on every polling interval. - Upgrade posthog-node 5.10.4 → 5.34.2 to get FlagDefinitionCacheProvider - Add posthog-flag-cache.ts: implements the cache with acquire-fetch-release distributed locking (SET NX) so only one process polls PostHog at a time - Add redisSetNX to redis.ts for atomic lock acquisition - Add POSTHOG_FLAG_DEFINITIONS_REDIS_KEY and POSTHOG_FLAG_CACHE_LOCK_REDIS_KEY to redis-keys.ts - Wire flagDefinitionCacheProvider + enableLocalEvaluation + personalApiKey into PostHogClient() when POSTHOG_PERSONAL_API_KEY is set - Document POSTHOG_PERSONAL_API_KEY in .env.local.example When Redis is absent or POSTHOG_PERSONAL_API_KEY is unset the client falls back to the previous behaviour (remote evaluation per request).
| async onFlagDefinitionsReceived(data: FlagDefinitionCacheData): Promise<void> { | ||
| await redisSet(POSTHOG_FLAG_DEFINITIONS_REDIS_KEY, JSON.stringify(data), CACHE_TTL_SECONDS); | ||
| // Release the lock so the next poll cycle can re-elect a leader. | ||
| await redisDel(POSTHOG_FLAG_CACHE_LOCK_REDIS_KEY); |
There was a problem hiding this comment.
WARNING: redisDel errors propagate uncaught here, which may surface to the PostHog SDK's polling mechanism.
If redisSet at line 52 succeeds but redisDel throws (e.g. Redis timeout), the lock remains held for the full 90s TTL even though the data was already written to Redis. All other instances will wait up to 90s before the next poll cycle can elect a new leader. This is a latency issue: flag definitions will be stale until the lock expires.
Consider wrapping redisDel in a try/catch here (similar to how it's wrapped in shutdown()) so errors are silently absorbed rather than surfacing to the SDK:
try {
await redisDel(POSTHOG_FLAG_CACHE_LOCK_REDIS_KEY);
} catch {
// Lock TTL will expire it; not critical since data was already written
}| } | ||
|
|
||
| // Single shared PostHog client for the process. | ||
| // Disabled outside production to avoid sending real events during tests/dev. |
There was a problem hiding this comment.
SUGGESTION: Stale comment — disabled is no longer passed to the PostHog constructor.
The original code passed disabled: !isProduction to the PostHog constructor. The new code relies on the early return at line 17-26 to prevent instantiation in non-production. The comment now implies there is still a disabled flag being set, which is misleading.
| // Disabled outside production to avoid sending real events during tests/dev. | |
| // Single shared PostHog client for the process — only created in production. |
| // Single shared PostHog client for the process. | ||
| // Disabled outside production to avoid sending real events during tests/dev. | ||
| instance = new PostHog(isProduction ? key : key || 'disabled', { | ||
| instance = new PostHog(key, { |
There was a problem hiding this comment.
WARNING: PostHog is instantiated with an empty string key (key = '') if NEXT_PUBLIC_POSTHOG_KEY is unset.
The original code used isProduction ? key : key || 'disabled' to fall back to 'disabled' and explicitly passed disabled: !isProduction. Now, the code path only reaches this line when isProduction is true, so passing an empty-string key to PostHog in production (when the env var is missing) will result in silent failures on every event capture. Previously, the disabled option would have made this a no-op.
Consider an early guard:
if (!key) {
// misconfigured — return stub to avoid silent event loss
return { capture: () => {}, isFeatureEnabled: async () => false, getFeatureFlag: async () => undefined, debug: () => {}, getFeatureFlagPayload: async () => undefined, alias: () => {} };
}
Code Review SummaryStatus: 3 Issues Found | Recommendation: Address before merge Executive SummaryThe most significant issue is an unguarded error path in Overview
Issue Details (click to expand)WARNING
SUGGESTION
Files Reviewed (6 files)
Fix these issues in Kilo Cloud Reviewed by claude-sonnet-4.6 · 1,049,551 tokens Review guidance: REVIEW.md from base branch |
Summary
posthog-nodefrom 5.10.4 → 5.34.2 to gain theposthog-node/experimentalexport, which provides theFlagDefinitionCacheProviderinterface for distributed flag-definition caching.apps/web/src/lib/posthog-flag-cache.ts: a Redis-backed implementation ofFlagDefinitionCacheProvider. Each polling cycle one process wins aSET NXlock, fetches flag definitions from PostHog, writes them to Redis, then releases the lock — all other processes read from the shared cache and skip the API call.redisSetNXtoapps/web/src/lib/redis.tsfor atomic lock acquisition (mirrors the existingredisGet/redisSet/redisDelhelpers).POSTHOG_FLAG_DEFINITIONS_REDIS_KEYandPOSTHOG_FLAG_CACHE_LOCK_REDIS_KEYinredis-keys.ts.flagDefinitionCacheProvider,enableLocalEvaluation: true, andpersonalApiKeyintoPostHogClient()whenPOSTHOG_PERSONAL_API_KEYis set. Falls back gracefully to the previous remote-evaluation behaviour when the key is absent or Redis is unavailable.POSTHOG_PERSONAL_API_KEYin.env.local.example.Verification
POSTHOG_PERSONAL_API_KEY(a PostHog personal API key from https://us.posthog.com/settings/user-api-keys) andREDIS_URLin.env.local.getFeatureFlagPayload).posthog:flags:definitionsandposthog:flags:lockkeys appear in Redis (redis-cli KEYS posthog:*)./decidecalls to PostHog).POSTHOG_PERSONAL_API_KEY, behaviour is unchanged — remote evaluation per request.Visual Changes
N/A
Reviewer Notes
FlagDefinitionCacheProviderAPI is marked@experimentalin posthog-node and may change in minor versions — worth revisiting on the next major posthog-node upgrade.SET NX EX 90(90 s TTL > 30 s default polling interval). If the process that holds the lock crashes before callingonFlagDefinitionsReceived, a new leader is elected after 90 s.shouldFetchFlagDefinitionscatches the error and returnstrue, so all processes fall back to independent API calls — same as before this PR.JSON.parse(cached) as FlagDefinitionCacheDatacast ingetFlagDefinitionsis acceptable here: the data was serialised by our ownonFlagDefinitionsReceivedand the PostHog SDK validates it internally before use. No untrusted external input is involved.packages/db,packages/kilo-chat-hooks, etc. (present onmain) that prevented the push hook from completing. All files touched by this PR typecheck cleanly.Built for Evgeny Shurakov by Kilo for Slack