fix: default CODEBUFF_VERBOSE and CODEBUFF_PROMPT_LOG to disabled, fix prompt logger truncation by reillyse · Pull Request #1 · reillyse/codebuff

reillyse · 2026-04-16T18:27:51Z

Changes

CODEBUFF_VERBOSE: Default changed from TTY-based fallback to disabled (false). Only enabled when explicitly set to a non-0 value.
CODEBUFF_PROMPT_LOG: Default changed from enabled to disabled when unset. Only enabled when set to 1, true, or a custom path.
Prompt logger truncation fix: Added post-append truncateIfNeeded() call so large entries that push the log past 5MB are properly truncated.
Documentation: Updated docs/CLAUDE_OAUTH_DEPLOYMENT.md env vars table with new defaults for both CODEBUFF_VERBOSE and CODEBUFF_PROMPT_LOG.
Tests: Updated cli-lite prompt-logger tests to expect disabled-by-default behavior.
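
The new defaults and the post-append truncation can be sketched roughly as follows. This is a minimal illustration, not the cli-lite source: the helper names `isVerboseEnabled`, `resolvePromptLogPath`, and `appendEntry` are hypothetical, treating `0`/`false`/empty as "disabled" for CODEBUFF_PROMPT_LOG is an assumption, and the log is modeled as an in-memory string (length in characters standing in for bytes) that keeps its newest tail when truncated.

```typescript
// Hypothetical helpers based on the PR description, not on the real source.
type Env = Record<string, string | undefined>;

/** Verbose output is off unless CODEBUFF_VERBOSE is explicitly set to a non-0 value. */
function isVerboseEnabled(env: Env): boolean {
  const raw = env.CODEBUFF_VERBOSE;
  return raw !== undefined && raw !== "" && raw !== "0";
}

/**
 * Returns the prompt-log path, or null when logging is disabled.
 * "1" and "true" select the default path; any other non-empty value is
 * treated as a custom path. Handling of "0"/"false" is an assumption.
 */
function resolvePromptLogPath(env: Env, defaultPath: string): string | null {
  const raw = env.CODEBUFF_PROMPT_LOG;
  if (raw === undefined || raw === "" || raw === "0" || raw === "false") return null;
  if (raw === "1" || raw === "true") return defaultPath;
  return raw;
}

/** Keeps only the newest maxLen characters (assumed truncation strategy). */
function truncateIfNeeded(log: string, maxLen: number): string {
  return log.length > maxLen ? log.slice(-maxLen) : log;
}

/** Appends an entry, then re-checks the size so one large entry cannot overshoot the cap. */
function appendEntry(log: string, entry: string, maxLen: number): string {
  const before = truncateIfNeeded(log, maxLen); // pre-append check
  return truncateIfNeeded(before + entry, maxLen); // post-append check described in this PR
}
```

With only the pre-append check, a log just under the cap could grow past it by the size of one large entry; the second call closes that gap.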

Files Changed

cli-lite/src/index.ts
cli-lite/src/prompt-logger.ts
cli-lite/src/repl.ts
cli-lite/src/__tests__/prompt-logger.test.ts
cli/src/utils/prompt-logger.ts
docs/CLAUDE_OAUTH_DEPLOYMENT.md

…mitation PostgreSQL ADD VALUE for enums is not visible within the same transaction, so the UPDATE statements need to run in a separate migration after the enum value is committed. - 0039: Add referral_legacy enum value + is_legacy column (DEFAULT true) - 0040: Backfill credit_ledger with referral_legacy type

… issue drizzle-kit migrate runs all pending migrations in a single transaction, so the new enum value is not committed when the UPDATE tries to use it. Moved backfill to standalone script: scripts/backfill-referral-legacy.sql Run this manually after migration 0039 is deployed.

…r schema - Remove stripe_price_id column from user table (migration 0041) - Remove STRIPE_USAGE_PRICE_ID from env schemas and defaults - Stop auto-subscribing new users to Pure Usage on signup - Remove stripe_price_id from TypeScript types (user.ts, next-auth.d.ts, typed.d.ts) - Clean up test utilities and eval configs - Delete scripts/update-stripe-subscriptions.ts

- Remove 'review' from FREEBUFF_REMOVED_COMMAND_IDS and FREEBUFF_REMOVED_COMMANDS - Enable CHATGPT_OAUTH_ENABLED flag

Unify the OAuth connect-command surface so both commands are available in every build and only accept their fully-qualified form. - Expose /connect:chatgpt in the regular Codebuff build (previously freebuff-only) - Drop the /chatgpt short alias; /connect:chatgpt is the only name - Drop the /claude short alias; /connect:claude is the only name - Update /plan and /review freebuff-only guards and help banner to point at /connect:chatgpt Backfills an OpenSpec change documenting the invariants as a new cli-slash-commands capability, already archived at openspec/changes/archive/2026-04-23-cli-connect-commands-cleanup/ with the main spec synced to openspec/specs/cli-slash-commands/spec.md.

Adds packages/agent-runtime/src/util/normalize-conversation.ts, a 4-pass normalization that runs immediately before every LLM call (in run-agent-step.ts) and at every STEP_ALL / GENERATE_N yield boundary (in run-programmatic-step.ts, added in the previous commit).

Enforced invariants:
1. No consecutive assistant-role messages (invalid prefill shape)
2. No orphan tool_result (preceding assistant must carry matching id)
3. Every tool_use has an immediately-following matching tool_result
4. Conversation does not end with a trailing assistant before a model call

Pass ordering matters: merge consecutive assistants → drop orphan tool_results → synthesize missing tool_results → handle trailing assistant. The merge-first step protects pass 2/3 against the [assistant(A), assistant(B), tool_result A, tool_result B] shape where A and B belong to the same turn; merging produces [assistant(A+B), tool_result A, tool_result B], which is valid.

Severity split on repair emission:
- emitRepair (warn only, never throws): consecutive-assistant merge and orphan-tool_result drop. These are lossless cleanup of shapes that occur naturally at runtime (e.g., successive end_turn calls with excludeToolFromMessageHistory: true) or trivially-safe dropping with no data fabrication.
- emitSynthesis (throws in strict mode): synthesized missing tool_result and trailing-assistant continuation. These fabricate conversation data to keep the model happy, indicating an upstream bug that should surface at PR time.

Strict mode is opt-in via CODEBUFF_STRICT_CONVERSATION=1; default is 'repair'. In repair mode it returns a new array and emits a structured WARN log ('conversation.shape.repaired' event) per repair. Idempotent on already-valid conversations: the conversationHasViolation() fast path returns the original array unchanged with no telemetry.
Wiring in run-agent-step.ts: normalizeConversation runs once on agentState.messageHistory just before both LLM chokepoints (promptAiSdk n-responses path and getAgentStreamFromTemplate streaming path).
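
The four passes can be sketched in order as below. This is a simplified illustration under stated assumptions, not the contents of normalize-conversation.ts: the `Message` shape, the synthesized placeholder text, and the "Continue." continuation message are hypothetical, and the strict-mode and telemetry paths (emitRepair / emitSynthesis) are left out.

```typescript
// Simplified, hypothetical message shape; the real types differ.
type Message =
  | { role: "user"; text: string }
  | { role: "assistant"; text: string; toolUseIds: string[] }
  | { role: "tool_result"; toolUseId: string; text: string };

function normalizeConversationSketch(messages: Message[]): Message[] {
  // Pass 1: merge consecutive assistant messages into one turn.
  const merged: Message[] = [];
  for (const msg of messages) {
    const prev = merged[merged.length - 1];
    if (msg.role === "assistant" && prev?.role === "assistant") {
      merged[merged.length - 1] = {
        role: "assistant",
        text: [prev.text, msg.text].filter(Boolean).join("\n"),
        toolUseIds: [...prev.toolUseIds, ...msg.toolUseIds],
      };
    } else {
      merged.push(msg);
    }
  }

  // Pass 2: drop tool_results whose id is not carried by the preceding assistant.
  const kept: Message[] = [];
  let openIds = new Set<string>();
  for (const msg of merged) {
    if (msg.role === "tool_result") {
      if (!openIds.has(msg.toolUseId)) continue; // orphan: drop it
      openIds.delete(msg.toolUseId);
    } else {
      openIds = new Set(msg.role === "assistant" ? msg.toolUseIds : []);
    }
    kept.push(msg);
  }

  // Pass 3: synthesize a tool_result for every tool_use left unanswered.
  const complete: Message[] = [];
  let pending: string[] = [];
  const flushPending = (): void => {
    for (const id of pending) {
      complete.push({ role: "tool_result", toolUseId: id, text: "[synthesized: result missing]" });
    }
    pending = [];
  };
  for (const msg of kept) {
    if (msg.role === "tool_result") {
      pending = pending.filter((id) => id !== msg.toolUseId);
      complete.push(msg);
      continue;
    }
    flushPending(); // close out the previous assistant turn first
    complete.push(msg);
    if (msg.role === "assistant") pending = [...msg.toolUseIds];
  }
  flushPending();

  // Pass 4: never hand the model a conversation that ends on an assistant message.
  if (complete[complete.length - 1]?.role === "assistant") {
    complete.push({ role: "user", text: "Continue." });
  }
  return complete;
}
```

Running pass 1 first is what lets the [assistant(A), assistant(B), tool_result A, tool_result B] shape survive: without the merge, pass 2 would see tool_result A following assistant(B) and drop it as an orphan.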

Adds a Sparrow-only OpenTelemetry instrumentation layer targeting Honeycomb. Core module lives at common/src/sparrow/telemetry/ and emits a hierarchical span tree (prompt > agent.run > agent.step > gen_ai.chat + tool.call) with real OTel context propagation via trace.setSpan + context.with.
- tracer-provider: BasicTracerProvider + BatchSpanProcessor + OTLP HTTP exporter
- span-helpers: withSpan, withPromptSpan, withAgentRunSpan, withAgentStepSpan, recordLlmCall, recordToolCall (idempotent finish)
- context-harvester: 5s TTL cache of git/cwd/branch/Linear-key context
- cost-rollup: token and USD aggregation up the span tree
- attributes: Honeycomb attribute naming conventions
- sparrow-config: persisted JSON at ~/.config/manicode/sparrow-config.json
- 7 test files covering init/shutdown/flush/privacy/context-cache/cost-rollup/OAuth-fallback/coverage-gaps (104 passing tests)
- docs/telemetry.md and HONEYCOMB_* + SPARROW_TELEMETRY_CAPTURE_PROMPTS env var documentation

Telemetry is a silent no-op when HONEYCOMB_API_KEY is absent. Content (prompt bodies, tool args, tool output) is never captured by default.

Adds /telemetry CLI command with subcommands: enable, disable, status, dataset, capture-prompts, debug, init, shutdown, flush. Commands hot-reconfig the tracer provider in-process without a CLI restart (flush + shutdown + re-init). - cli/src/commands/telemetry.ts: command implementation - cli/src/commands/command-registry.ts: registration - cli/src/index.tsx: initTelemetry on CLI startup - cli/src/utils/renderer-cleanup.ts: flush + shutdown on exit so in-flight spans reach Honeycomb before process termination

Adds SPARROW-tagged telemetry hooks at 5 upstream-file touchpoints. All failures are swallowed so telemetry cannot break user workflows.
- main-prompt.ts: withPromptSpan wraps the top-level prompt execution; every user turn produces one root prompt span with harvested git/cwd/Linear context
- run-agent-step.ts: withAgentRunSpan wraps each agent orchestration; withAgentStepSpan wraps each runAgentStep with step_number + agent_id
- tool-executor.ts: recordToolCall around executeToolCall and executeCustomToolCall. Integrated onto upstream atomic-pair contract (commit 3d59dc9): span opened before try/catch, finished exactly once on each success/failure path. spawn_agents tool.call spans record child.agent_id linkage to spawned agent.run spans. Aborted custom-tool short-circuit skips span to avoid zero-duration noise.
- llm.ts: recordLlmCall around LLM dispatch with route/model/cost/tokens. Fixes cost-override bug: use !== undefined (not truthy check) so $0 cost overrides still emit spans with correct cost.
- run.ts: plumbs telemetry context through the SDK run entry point
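
The cost-override fix in llm.ts comes down to how a legitimate override of 0 is treated. A minimal sketch, with hypothetical function names and parameters rather than the actual recordLlmCall code:

```typescript
// Hypothetical illustration of the bug class; not the real llm.ts code.

/** Buggy variant: a truthy check treats an override of 0 as "no override". */
function resolveCostTruthy(computedCost: number, costOverride?: number): number {
  return costOverride ? costOverride : computedCost;
}

/** Fixed variant: only a missing override falls back to the computed cost. */
function resolveCost(computedCost: number, costOverride?: number): number {
  return costOverride !== undefined ? costOverride : computedCost;
}
```

With a computed cost of 0.12 and an override of 0, the truthy variant reports 0.12 while the fixed variant reports 0, which is the value a $0 override is meant to record.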

…ANGES - openspec/changes/sparrow-telemetry/: full proposal with design, tasks (42/50 complete; remaining 8 are live Honeycomb verification + full-regression), and specs. Strict-validates. - SPARROW_CHANGES.md: enumerates every upstream-file touchpoint (main-prompt, run-agent-step, tool-executor, llm, run) and new Sparrow-only files for future upstream-merge audits.

Every user turn is already wrapped in a `withPromptSpan`: the root `prompt` span that owns the full agent.run/agent.step/gen_ai.chat/tool.call hierarchy. When that span closes, the BatchSpanProcessor has the full trace queued, but previously we had to wait up to 5s for its timer (or for the CLI to exit / a manual `/telemetry flush`) before it shipped to Honeycomb.

This patch fires a non-blocking `flushTelemetry(2000)` in a finally block after the prompt span ends, so every turn reaches Honeycomb within ~1–2 seconds of the CLI finishing its response. Fire-and-forget (never awaited), so it adds zero latency to the turn result; errors are swallowed so telemetry can never break user workflows.

Changes:
- common/src/sparrow/telemetry/span-helpers.ts: withPromptSpan now wraps the withSpan call in try/finally and kicks off a fire-and-forget flush after the span has ended.
- common/src/sparrow/telemetry/__tests__/integration.test.ts: 3 new tests covering success path, throwing callback, and flush-error swallowing.
- docs/telemetry.md: new "When spans are flushed" section documenting the 5 flush triggers (turn-end + 5s timer + batch size + CLI exit + config changes).
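
The try/finally shape described above might look roughly like this. It is a sketch only: `withPromptSpanSketch` and its `flush` parameter are hypothetical stand-ins for withPromptSpan and flushTelemetry, and span creation itself is omitted.

```typescript
// Hypothetical stand-in for withPromptSpan; `flush` plays the role of flushTelemetry.
async function withPromptSpanSketch<T>(
  run: () => Promise<T>,
  flush: (timeoutMs: number) => Promise<void>,
): Promise<T> {
  try {
    // In the real helper this is where the span would wrap the callback.
    return await run();
  } finally {
    try {
      // Fire-and-forget: the promise is not awaited, so the caller gets its
      // result (or its error) without waiting for the export to finish.
      void flush(2000).catch(() => {
        // Asynchronous flush failures are swallowed.
      });
    } catch {
      // A synchronous throw from flush is swallowed as well.
    }
  }
}
```

Because the flush sits in `finally`, it is kicked off on both the success path and the throwing path, and neither path can be changed by a flush failure.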

- Add codebuff.oauth_account_id (sha256 hash of OAuth refresh/access token) so two Claude or ChatGPT subscriptions on the same machine can be distinguished in Honeycomb dashboards. - Propagate user.email, user.name, host.name from the per-prompt harvest cache onto every gen_ai.chat span at creation, so token totals can be sliced by user/machine without joining through trace IDs. - Add deriveOAuthAccountId helper + 6 unit tests covering stability, distinctness, env-var fallback (refreshToken= case), and the one-way property of the hash. - Add .agents/skills/honeycomb-usage-report/ skill that produces a daily + aggregate breakdown of LLM usage by route and model over the past 7 days.

….chat Adds two new boolean attributes on every gen_ai.chat span to enable silent-fallback observability: - codebuff.chatgpt_oauth_eligible: tri-state (true | false | unset). true when the openai/* model is in the ChatGPT OAuth allowlist; false when openai/* but not allowlisted (e.g. gpt-5-nano); unset for non-openai models. - codebuff.claude_oauth_eligible: binary (true | unset). true for any anthropic/* or claude-* model recognized by isClaudeModel. Computed centrally in recordLlmCall() so all streaming paths emit consistently. Decoupled from codebuff.route so dashboards can count 'could have used OAuth subscription but didn't' with: <attr>_eligible = true AND codebuff.route = codebuff_backend Includes tests covering allowlist hits, non-allowlist openai/*, non-OpenAI/non-Claude models, missing model, and the silent-fallback query combination. 🤖 Generated with Codebuff Co-Authored-By: Codebuff <noreply@codebuff.com>
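
The two eligibility attributes and the silent-fallback query can be sketched as below. The allowlist contents and the function names are hypothetical; the prefix checks stand in for the real allowlist lookup and isClaudeModel, whose exact rules are not shown in this commit message.

```typescript
// Hypothetical allowlist entry for illustration; the real list lives elsewhere.
const CHATGPT_OAUTH_ALLOWLIST = new Set<string>(["openai/gpt-5.2"]);

/** Tri-state: true | false for openai/* models, undefined (unset) otherwise. */
function chatgptOauthEligible(model: string | undefined): boolean | undefined {
  if (!model || !model.startsWith("openai/")) return undefined;
  return CHATGPT_OAUTH_ALLOWLIST.has(model);
}

/** Binary: true for Claude models, undefined (unset) otherwise. */
function claudeOauthEligible(model: string | undefined): true | undefined {
  if (!model) return undefined;
  return model.startsWith("anthropic/") || model.startsWith("claude-") ? true : undefined;
}

/** Mirrors the dashboard query: eligible for OAuth but routed to the backend. */
function isSilentFallback(model: string | undefined, route: string): boolean {
  const eligible = chatgptOauthEligible(model) === true || claudeOauthEligible(model) === true;
  return eligible && route === "codebuff_backend";
}
```

Keeping eligibility separate from the route is what makes the fallback countable: the attribute says what could have happened, the route says what did.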

…chat.tsx imports Post-cherry-pick fixup for the telemetry/cache-debug/ChatGPT OAuth range: - bun.lock: resync after bun install pulled in @opentelemetry/exporter-trace-otlp-http and upgraded @opentelemetry/{resources,sdk-trace-base,core,context-async-hooks} to v2 (resourceFromAttributes export) - common/src/util/messages.ts: add errorToolResult() helper used by packages/agent-runtime/src/util/normalize-conversation.ts to fabricate missing tool_result entries - cli/src/chat.tsx: add missing imports for BottomStatusLine, useClaudeQuotaQuery, getClaudeOAuthStatus that were referenced but never imported

file-picker handleSteps uses read_files before yielding STEP, but read_files was not in toolNames. The LLM would see read_files in message history and try to call it, triggering "Tool read_files is not currently available". Same issue for file-lister with read_subtree. - Add read_files to file-picker toolNames - Add read_subtree to file-lister toolNames - Regenerate cli-lite bundled agents

…t docs 4.6 - Opus: anthropic/claude-opus-4.6 → anthropic/claude-opus-4.8 - GPT-5: openai/gpt-5.1 → openai/gpt-5.2 - Grok: x-ai/grok-4.1-fast → x-ai/grok-4.3 - Haiku: anthropic/claude-3.5-haiku-20241022 → anthropic/claude-haiku-4.5 - Docs: updated claude-sonnet-4.5 → claude-sonnet-4.6 in all examples Updated model-config.ts constants, bundled-agents generated files, oauth mappings, agent-definition type unions, .agents/ files, OpenAI pricing tables, and all user-facing documentation.

- Add onBeforeSubagentPrompt/onAfterSubagentComplete hooks to AgentRuntimeDeps - Wire hooks through agent-runtime, SDK, and both CLI/cli-lite - Add getSubagentHippoContext (3s timeout) and storeSubagentResultToHippo - Enrich commander, file-picker, opus-agent, gpt-5-agent with hippo context - Fix cancelled outputs stored as success (now classified as failure) - Fix misleading error message in getSubagentHippoContext - Normalize agent IDs with getShortAgentId for fully-qualified ID support - Cap injected context at 1500 chars to prevent token bloat - Add warn-level logging for hook failures - Add hippo storage schema and accuracy eval design docs

- Add cli-lite/scripts/stamp-version.ts to stamp git SHA into package.json - Add stamp:version npm script - Revert package.json version to base 0.1.0 (script handles stamping)

The build script was installing the binary to `npm config get prefix` which could differ from where the shell resolves `codebuff` (e.g. nvm bin directory). Now detects the existing binary location via `which`, falling back to npm prefix for first-time installs.

- buildOutputDescription no longer pads --output with file lists (cli + cli-lite); files tracked via --files-changed - add buildSubagentOutputDescription: mine narrative from output/message keys, handle plain strings, errors, and cancelled (surface reason instead of "Completed (Ns)") - trim SUBAGENT_NARRATIVE_KEYS to the keys codebuff subagents actually emit (output, message) - pass actual hippo error into logHippoPrompt metadata on subagent failure - export + add 21 unit tests for buildSubagentOutputDescription

- Prune ALLOWED_MODEL_PREFIXES to ['anthropic', 'openai'] only - Remove deepseekModels/DeepseekModel and CURRENT_GROK_MODEL constants - Add CURRENT_HAIKU_MODEL ('anthropic/claude-haiku-4.5') for utility agents - Add CURRENT_GPT5_MINI_MODEL ('openai/gpt-5-mini') for research agents - Switch file-picker, file-lister, commander-lite → Haiku (Claude OAuth) - Switch researcher-web, researcher-docs → GPT-5-mini (ChatGPT OAuth) - Update FREE_MODE_AGENT_MODELS allowlist to match new model assignments - Add openai/gpt-5-mini to OPENROUTER_TO_OPENAI_MODEL_MAP - Prune ModelName type unions (3 copies) to OpenAI + Anthropic only - Switch getModelForMode experimental/ask from Gemini Pro → Claude Opus - Regenerate cli-lite bundled-agents with updated model assignments - Update file-picker tests to expect Haiku instead of Grok/Gemini 🤖 Generated with Codebuff Co-Authored-By: Codebuff <noreply@codebuff.com>

Grok is no longer used by any agent (file-lister migrated to claude-haiku-4.5). Removing the dangling x-ai/grok-4 definition so grok can never be reintroduced from this repo. Empties nonCacheableModels (its only entry was the grok model).

Print truncated request/response content for tool calls and truncated prompt+params on subagent start/finish. Make the truncation limit configurable via CODEBUFF_DEBUG_TRUNCATE (default 500, 0 disables). Fix finish-event cost display: totalCost is credits (1 credit = $0.01), now shown via formatCredits as e.g. "44 credits ($0.44)" instead of "$44.0000". Add unit tests in output.test.ts.
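
A minimal sketch of the credit formatting and the configurable truncation follows. Only the conversion rate (1 credit = $0.01), the default of 500, and the "0 disables" rule come from the description above; the function bodies, the `parseTruncateLimit` and `truncateForDebug` names, and the trailing ellipsis are assumptions.

```typescript
/** Formats a credit total with its dollar equivalent (1 credit = $0.01). */
function formatCredits(credits: number): string {
  return `${credits} credits ($${(credits / 100).toFixed(2)})`;
}

/** Reads the limit from a CODEBUFF_DEBUG_TRUNCATE-style value; falls back to 500. */
function parseTruncateLimit(raw: string | undefined): number {
  if (raw === undefined) return 500;
  const parsed = Number.parseInt(raw, 10);
  return Number.isNaN(parsed) || parsed < 0 ? 500 : parsed;
}

/** Truncates debug output to `limit` characters; a limit of 0 disables truncation. */
function truncateForDebug(text: string, limit: number): string {
  if (limit === 0 || text.length <= limit) return text;
  return text.slice(0, limit) + "…";
}
```

For a finish event with totalCost of 44, `formatCredits(44)` yields "44 credits ($0.44)", where printing the raw number as dollars would have shown "$44.0000".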

Refresh bundled-agents.generated.ts so the deployed cli-lite binary no longer references the removed google/gemini-2.0-flash-001 model. file-picker/file-lister/commander-lite use claude-haiku-4.5; researchers use gpt-5-mini. Pushing triggers weft to rebuild the agent image with the up-to-date binary.

…dError + nested cause) isTransientApiError now (A) recursively unwraps error.cause (cycle-guarded) so a transient 529/overload nested inside a wrapper error is recognized, and (B) treats AI_NoOutputGeneratedError as transient since mid-stream provider overloads surface that way. Retries remain capped by MAX_STEP_RETRIES at the call site. Adds unit tests (error-transient.test.ts) and integration tests (loop-agent-steps.test.ts).
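
The cause-chain walk can be illustrated as follows. This is a sketch, not the code in common/src/util/error.ts: the `statusCode` field name and the inclusion of 429 and 503 alongside 529 are assumptions, and the function is suffixed `Sketch` to keep it distinct from the real helper.

```typescript
// 529 comes from the description above; 429 and 503 are assumed for illustration.
const TRANSIENT_STATUS_CODES = new Set<number>([429, 503, 529]);

/**
 * Returns true if the error, or anything reachable through its `cause`
 * chain, looks transient. The `seen` set guards against cyclic causes.
 */
function isTransientApiErrorSketch(error: unknown, seen: Set<unknown> = new Set()): boolean {
  if (typeof error !== "object" || error === null || seen.has(error)) return false;
  seen.add(error);
  const e = error as { name?: unknown; statusCode?: unknown; cause?: unknown };
  // Mid-stream provider overloads surface as this error name.
  if (e.name === "AI_NoOutputGeneratedError") return true;
  if (typeof e.statusCode === "number" && TRANSIENT_STATUS_CODES.has(e.statusCode)) return true;
  return isTransientApiErrorSketch(e.cause, seen);
}
```

The retry cap stays at the call site (MAX_STEP_RETRIES), so recognizing more errors as transient widens what is retried, not how many times.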

Adds describeTransientApiError() and getTransientStatusCode() helpers in common/src/util/error.ts so the retry notice shows a meaningful reason (e.g. "Response stream interrupted (no output)" for AI_NoOutputGeneratedError, or "Transient API error (529)" walking the cause chain) instead of a vague message. run-agent-step.ts uses the helper in its retry notice. Adds unit tests and an integration assertion.

Wrap subagent execution in a per-subagent timeout (10 min) so a stalled LLM stream can no longer hang the parent's Promise.allSettled join indefinitely. On timeout the child is aborted, a subagent_finish event is emitted (so the UI never shows a dangling 'started' agent), and the active agentState is attached to SubagentTimeoutError so partial credits are still aggregated. Applies to both fan-out and inline spawns.
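
A per-subagent timeout of this kind can be sketched with an AbortController and Promise.race. The `runWithTimeout` helper and the fields on the error class are hypothetical; the real SubagentTimeoutError also carries the active agentState, and the subagent_finish event emission is omitted here.

```typescript
// Simplified stand-in for the real error class (which also carries agentState).
class SubagentTimeoutError extends Error {
  agentId: string;
  timeoutMs: number;
  constructor(agentId: string, timeoutMs: number) {
    super(`Subagent ${agentId} timed out after ${timeoutMs}ms`);
    this.name = "SubagentTimeoutError";
    this.agentId = agentId;
    this.timeoutMs = timeoutMs;
  }
}

/** Runs a subagent, aborting it and rejecting if it exceeds timeoutMs. */
async function runWithTimeout<T>(
  agentId: string,
  timeoutMs: number,
  run: (signal: AbortSignal) => Promise<T>,
): Promise<T> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      controller.abort(); // tell the child to stop
      reject(new SubagentTimeoutError(agentId, timeoutMs));
    }, timeoutMs);
  });
  try {
    return await Promise.race([run(controller.signal), timeout]);
  } finally {
    clearTimeout(timer); // no stray timer once the child settles
  }
}
```

Wrapping each child this way means a Promise.allSettled over the children always settles: a stalled child becomes a rejected entry instead of blocking the join.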

brandonkachen and others added 30 commits February 4, 2026 13:51

refactor(db): Switch from drizzle-kit push to migrate for safer produ…

d6d19fa

…ction deployments

chore: Remove backfill script (already applied manually)

92a7603

Enable invoice creation and tax id collection in stripe checkout

a07936a

fix(db): Remove trailing comma in migration journal JSON

e0435f8

The trailing comma after the last entry caused drizzle-kit migrate to fail with a JSON parse error in CI.

Revert "refactor(db): Switch from drizzle-kit push to migrate for saf…

1141a88

…er production deployments" This reverts commit d6d19fa.

fix: use dynamic WEBSITE_URL instead of hardcoded codebuff.com in usa…

d9efa47

…ge banner

Use full width for terminal command preview

119670c

fix: move Stripe webhook helpers to separate file to fix Next.js rout…

2ef223e

…e export error

fix: add missing env variable (CodebuffAI#427)

ab065a3

Try to fix some timeout errors

70d3787

New doc: What makes Codebuff unique (generated from all our marketing…

ae0f600

… material + tweaks)

Delete where codebuff shines (mostly obvious stuff b/c 1 year later, …

3946e0f

…everyone knows coding agents)

Subscription client changes (CodebuffAI#424)

2c423c3

Co-authored-by: brandonkachen <brandonchenjiacheng@gmail.com> and Codebuff!

Bump version to 1.0.610

9b50b8e

Subscription endpoint: support token bearer auth

34ac8ee

Don't show input box and subscription limit banner together

7c7710e

Don't allow escaping the subscription limit banner unless you click t…

af93b6f

…o use credits

Bump version to 1.0.611

e3ef8e5

Trigger buffbench remotely

ebd3048

Change always use a la carte to simply whether credit spending is ena…

7001603

…bled at all, which matches backend semantics

Bump version to 1.0.612

363c40f

Fix updating a la carte preference in prod: allow auth via bearer token

ecaff08

Update pricing page title/description metadata

97e94e4

Fix anthropic to open router mapping for opus 4.6

f8383bc

Opus 4.6 (CodebuffAI#428)

db5ca02

Bump version to 1.0.613

752a623

Add a step prompt to read relevant skills

e437455

jahooma and others added 30 commits April 28, 2026 12:23

More comprehensive prompt cache debugging logs

72bb6d3

Update cache debug script

31ba76d

Further cache debugging code to track usage

c20e90e

Add /connect:chatgpt

f241743

Enable /review and /connect:chatgpt in Freebuff

312524e

- Remove 'review' from FREEBUFF_REMOVED_COMMAND_IDS and FREEBUFF_REMOVED_COMMANDS - Enable CHATGPT_OAUTH_ENABLED flag

Get chatgpt oauth working

a204887

UX improvements for connecting chatgpt

ec9ed7b

chore(cli-lite): add stamp-version script for automatic git ref stamping

1c51977

- Add cli-lite/scripts/stamp-version.ts to stamp git SHA into package.json - Add stamp:version npm script - Revert package.json version to base 0.1.0 (script handles stamping)

reillyse wants to merge 214 commits into main from reillyse/hippo-integration

reillyse commented Apr 16, 2026

9 participants