fix(store): use PASSIVE checkpoint to avoid file-shrink under concurrent readers#316

Closed
edwardmhughes wants to merge 1 commit into DeusData:main from edwardmhughes:fix/passive-checkpoint

@edwardmhughes

Contributor

Summary

cbm_store_checkpoint() used SQLITE_CHECKPOINT_TRUNCATE — the most aggressive checkpoint mode. On success it calls ftruncate(fd, 0) on the WAL file. When two codebase-memory-mcp processes share a cache dir, the truncation can shrink files while a sibling has the DB mmap'd through SQLite, raising SIGBUS on macOS (cluster_pagein past EOF).
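
The failure mode described above can be reproduced outside SQLite. The following is a minimal single-process sketch, assuming a POSIX system: it maps one page of a scratch file, shrinks the file to zero bytes, and touches the mapping again. The function name and the scratch path are hypothetical and are not part of codebase-memory-mcp.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf g_sigbus_jmp;

/* Jump back to the sigsetjmp() point instead of letting SIGBUS end the process. */
static void on_sigbus(int signo) {
    (void)signo;
    siglongjmp(g_sigbus_jmp, 1);
}

/* Returns 1 if touching the mapping after the file shrank raised SIGBUS,
 * 0 if the read went through, and -1 if the setup itself failed. */
int touch_after_shrink_raises_sigbus(void) {
    char path[] = "/tmp/sigbus_demo_XXXXXX";  /* hypothetical scratch file */
    int fd = mkstemp(path);
    if (fd < 0) return -1;
    unlink(path);                              /* file lives until fd is closed */

    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    if (ftruncate(fd, (off_t)page) != 0) { close(fd); return -1; }

    void *mem = mmap(NULL, (size_t)page, PROT_READ, MAP_SHARED, fd, 0);
    if (mem == MAP_FAILED) { close(fd); return -1; }
    volatile const unsigned char *bytes = mem;

    struct sigaction sa, old_sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigbus;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGBUS, &sa, &old_sa) != 0) {
        munmap(mem, (size_t)page);
        close(fd);
        return -1;
    }

    int raised;
    if (sigsetjmp(g_sigbus_jmp, 1) == 0) {
        unsigned char before = bytes[0];       /* page is still backed by the file */
        (void)before;
        if (ftruncate(fd, 0) != 0) {
            raised = -1;
        } else {
            unsigned char after = bytes[0];    /* page now lies past end of file */
            (void)after;
            raised = 0;
        }
    } else {
        raised = 1;                            /* arrived here via on_sigbus() */
    }

    sigaction(SIGBUS, &old_sa, NULL);
    munmap(mem, (size_t)page);
    close(fd);
    return raised;
}
```

On systems that follow the POSIX rule for mappings past end of file, `touch_after_shrink_raises_sigbus()` is expected to return 1. In the scenario from this PR the shrink and the read happen in two different processes, but the mechanism is the same.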

Switching to SQLITE_CHECKPOINT_PASSIVE closes this truncation path:

  • PASSIVE never blocks readers
  • PASSIVE never calls ftruncate() on either file
  • SQLite still autocheckpoints in PASSIVE mode at 1000-page boundaries; disk reclamation is unaffected for single-process users

From the SQLite docs on wal_checkpoint:

PASSIVE mode does not wait for writers and never invokes a blocking read/write lock.

Changes

  • src/store/store.c: SQLITE_CHECKPOINT_TRUNCATE → SQLITE_CHECKPOINT_PASSIVE with explanatory comment
  • tests/test_store_checkpoint.c (new): Verifies WAL is not truncated to zero bytes after cbm_store_checkpoint() — this test would fail on the old TRUNCATE mode under the right timing
  • Makefile.cbm + tests/test_main.c: Wire new test suite

Fixes #314
Related: #277 (orphan-process WAL pin on Windows)

…ent readers

cbm_store_checkpoint() invoked SQLITE_CHECKPOINT_TRUNCATE, the most
aggressive mode. When two cbm-mcp processes share a cache dir, one
process's TRUNCATE can shrink files while another has them mmap'd,
raising SIGBUS on macOS. PASSIVE never blocks readers and never
ftruncate()s either file; SQLite still autocheckpoints in PASSIVE
mode at 1000-page boundaries, so reclamation is unaffected for
single-process users.

Recommended by SQLite docs for shared databases:
https://www.sqlite.org/pragma.html#pragma_wal_checkpoint
@DeusData added the bug (Something isn't working) and stability/performance (Server crashes, OOM, hangs, high CPU/memory) labels on May 4, 2026
@DeusData

Owner

Landed on main as 2215356, thanks @edwardmhughes. The SQLITE_CHECKPOINT_TRUNCATE → PASSIVE switch is precisely the right move for shared-cache scenarios: the ftruncate(fd, 0) that TRUNCATE invokes is exactly what makes sibling-process mmap pages disappear underfoot, and the SQLite docs explicitly recommend PASSIVE for this case.

Two notes:

  • Because feat(store): expose mmap_size via CBM_SQLITE_MMAP_SIZE env #315 (also yours, also touched Makefile.cbm and tests/test_main.c to wire a new test suite) landed first, this branch ended up conflicting on those two files. GitHub couldn't auto-rebase, so I resolved the conflicts locally by taking the union (both tests/test_store_pragmas.c and tests/test_store_checkpoint.c are now wired in side by side) and cherry-picked your commit with author preserved. The PR shows as closed rather than merged, but the content + your authorship landed verbatim on main.
  • The test_store_checkpoint.c smoke test (asserting WAL not truncated to zero) is the right shape — it'd fail under the old TRUNCATE mode under the right timing, so it actually defends the fix.

Pairs nicely with the CBM_SQLITE_MMAP_SIZE=0 knob from #315 — together they close both halves of the macOS multi-instance SIGBUS scenario in #314.
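
As a rough illustration of how an environment knob like this could reach SQLite, the sketch below parses a byte count and formats the matching `PRAGMA mmap_size` statement. How #315 actually reads and validates `CBM_SQLITE_MMAP_SIZE` is not shown on this page, so both helper names and the validation rules are assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Parses a decimal byte count such as the value of CBM_SQLITE_MMAP_SIZE.
 * Returns 0 and stores the value on success; returns -1 if the text is
 * missing, empty, malformed, out of range, or negative. */
int parse_mmap_size(const char *text, long long *out) {
    char *end = NULL;
    assert(out != NULL);
    if (text == NULL || *text == '\0') return -1;
    errno = 0;
    long long value = strtoll(text, &end, 10);
    if (errno != 0 || *end != '\0' || value < 0) return -1;
    *out = value;
    return 0;
}

/* Formats the pragma into buf; a size of 0 turns memory-mapped I/O off.
 * Returns 0 on success, -1 if the buffer is too small. */
int format_mmap_pragma(long long bytes, char *buf, size_t buf_len) {
    int n = snprintf(buf, buf_len, "PRAGMA mmap_size=%lld;", bytes);
    return (n < 0 || (size_t)n >= buf_len) ? -1 : 0;
}
```

A caller would pass `getenv("CBM_SQLITE_MMAP_SIZE")` to `parse_mmap_size()` and, on success, hand the formatted statement to `sqlite3_exec()`. With the value `0` the statement becomes `PRAGMA mmap_size=0;`, so the process reads pages through ordinary I/O instead of a mapping that a sibling could shrink.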

Development

Successfully merging this pull request may close these issues.

SIGBUS on macOS arm64 when 2+ sessions share the same cache dir
