fix: bound metric exporter state growth #42

Merged
dmmulroy merged 3 commits into main from ollie/memory-leak
Jun 17, 2026

dmmulroy (Collaborator) commented Jun 16, 2026

Summary

  • expire counter series after five missed refreshes while preserving monotonic accumulation during brief gaps
  • correctly pass gauges through and export newly observed counters immediately
  • persist oversized MetricExporter snapshots in validated, bounded chunks with atomic generation switching and legacy-state migration
  • surface GraphQL access errors and retry denied product queries hourly instead of every refresh interval
  • recover the MetricExporter alarm loop after any uncaught handler failure so refreshes cannot silently stop
  • add Vitest coverage, typechecking, and pull-request CI
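
The counter-expiry rule in the first two bullets can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `CounterSeries`, `applyRefresh`, and the per-refresh `deltas` map are hypothetical names, and it assumes a series is dropped on its fifth consecutive missed refresh.

```typescript
// Hypothetical sketch; the real MetricExporter types are not shown in this PR text.
const MAX_MISSED_REFRESHES = 5; // "five missed refreshes" from the summary

interface CounterSeries {
  total: number; // monotonic accumulated value that gets exported
  missed: number; // consecutive refreshes in which the series was absent
}

// Apply one refresh. `deltas` maps a series key (metric name plus labels)
// to the increment observed during this refresh.
function applyRefresh(
  state: Map<string, CounterSeries>,
  deltas: Map<string, number>,
): void {
  for (const [key, delta] of deltas) {
    const series = state.get(key);
    if (series) {
      // A brief gap does not reset the total; accumulation stays monotonic.
      series.total += delta;
      series.missed = 0;
    } else {
      // A newly observed counter is available for export immediately.
      state.set(key, { total: delta, missed: 0 });
    }
  }
  for (const [key, series] of state) {
    if (deltas.has(key)) continue;
    series.missed += 1;
    // Assumption: expire on the fifth consecutive miss to bound state growth.
    if (series.missed >= MAX_MISSED_REFRESHES) state.delete(key);
  }
}
```

With this shape, label churn can only hold state for a bounded number of refreshes, while a series that reappears within the window continues from its previous total.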

Motivation

MetricExporter retained every counter label set it had ever observed and persisted the complete metrics/counter snapshot under one Durable Object storage value. Label churn could therefore cause unbounded memory/storage growth and eventually oversized writes.
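
One way to picture the bounded-chunk persistence with generation switching is the sketch below. Everything in it is an assumption made for illustration: the key layout, the `Manifest` shape, the chunk bound, and the in-memory `Map` standing in for Durable Object storage (whose real API is asynchronous). It does not cover the legacy-state migration.

```typescript
// Hypothetical sketch; key names and the chunk bound are assumptions.
// The bound is in UTF-16 code units here, not bytes.
const MAX_CHUNK_CHARS = 64 * 1024;

interface Manifest {
  generation: number; // which set of chunk keys is currently live
  chunkCount: number;
  totalLength: number; // used to validate the reassembled snapshot
}

type Store = Map<string, unknown>; // stand-in for Durable Object storage

function writeSnapshot(store: Store, snapshot: string, maxChunk = MAX_CHUNK_CHARS): void {
  const previous = store.get("manifest") as Manifest | undefined;
  const generation = (previous?.generation ?? 0) + 1;
  const chunkCount = Math.ceil(snapshot.length / maxChunk);
  // 1. Write the new generation's chunks next to the old ones.
  for (let i = 0; i < chunkCount; i++) {
    store.set(`chunk:${generation}:${i}`, snapshot.slice(i * maxChunk, (i + 1) * maxChunk));
  }
  // 2. Switch readers to the new generation with a single manifest write.
  store.set("manifest", { generation, chunkCount, totalLength: snapshot.length });
  // 3. Only then remove the previous generation's chunks.
  if (previous) {
    for (let i = 0; i < previous.chunkCount; i++) {
      store.delete(`chunk:${previous.generation}:${i}`);
    }
  }
}

function readSnapshot(store: Store): string | undefined {
  const manifest = store.get("manifest") as Manifest | undefined;
  if (!manifest) return undefined;
  let out = "";
  for (let i = 0; i < manifest.chunkCount; i++) {
    const chunk = store.get(`chunk:${manifest.generation}:${i}`);
    if (typeof chunk !== "string") return undefined; // missing or corrupt chunk
    out += chunk;
  }
  // Reject a snapshot whose reassembled length disagrees with the manifest.
  return out.length === manifest.totalLength ? out : undefined;
}
```

Because the manifest is written only after all chunks of the new generation exist, a reader sees either the complete old snapshot or the complete new one, and no single stored value exceeds the chunk bound.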

A failed alarm could also exhaust Cloudflare's finite platform retries before the application scheduled its next alarm. The Worker would remain available but continue serving stale metrics indefinitely. The top-level alarm recovery path now schedules a fallback alarm one minute later after any unexpected failure. If fallback scheduling itself fails, the error is rethrown so platform retries still apply.
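
The recovery path described above might look roughly like this. It is a sketch under stated assumptions: `runAlarm`, `refresh`, and the `AlarmStorage` interface are hypothetical names, with `setAlarm` modeled on the Durable Object storage method of the same name.

```typescript
// Hypothetical sketch; AlarmStorage covers only the one call used here.
interface AlarmStorage {
  setAlarm(scheduledTime: number): Promise<void>;
}

const FALLBACK_DELAY_MS = 60_000; // "one minute later" from the description

async function runAlarm(
  storage: AlarmStorage,
  refresh: () => Promise<void>, // assumed to schedule the next regular alarm itself
  now: () => number = Date.now,
): Promise<void> {
  try {
    await refresh();
  } catch (error) {
    // A real handler would log or report `error` here before recovering.
    // If setAlarm itself throws, the rejection propagates out of the
    // handler, so the platform's own alarm retries still apply.
    await storage.setAlarm(now() + FALLBACK_DELAY_MS);
  }
}
```

Swallowing the handler error only after the fallback alarm is safely scheduled is what keeps the loop alive: the failure no longer consumes platform retries, and the next refresh is always on the calendar.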

GraphQL products unavailable to an account were also treated as successful empty responses and queried every minute.
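
A minimal sketch of the hourly retry for denied products, assuming the exporter can tell an access error apart from an empty result. The names `shouldQuery`, `recordResult`, and `DeniedUntil` are hypothetical and not taken from this PR.

```typescript
// Hypothetical sketch; names and bookkeeping shape are assumptions.
const DENIED_RETRY_MS = 60 * 60 * 1000; // "hourly" from the summary

// Product name -> earliest time (ms since epoch) it may be queried again.
type DeniedUntil = Map<string, number>;

// Skip products that were denied less than an hour ago.
function shouldQuery(denied: DeniedUntil, product: string, now: number): boolean {
  const until = denied.get(product);
  return until === undefined || now >= until;
}

// Record the outcome of a query so the next refresh can decide.
function recordResult(
  denied: DeniedUntil,
  product: string,
  accessDenied: boolean,
  now: number,
): void {
  if (accessDenied) {
    denied.set(product, now + DENIED_RETRY_MS);
  } else {
    denied.delete(product); // access works (again); resume normal cadence
  }
}
```

Each refresh would call `shouldQuery` before issuing a product query and `recordResult` afterwards, so a denied product costs one request per hour rather than one per refresh.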

Testing

  • bun install --frozen-lockfile
  • bun run check
  • bun run typecheck
  • bun run test (34 tests)
  • repeated alarm-failure regression (10 consecutive failures each schedule recovery)
  • wrangler deploy --dry-run with a temporary valid KV namespace ID

Closes #28

dmmulroy merged commit c10f1a3 into main on Jun 17, 2026
4 checks passed

Development

Successfully merging this pull request may close these issues.

SQLITE_TOOBIG + unbounded memory growth from counter accumulation
