fix: default CODEBUFF_VERBOSE and CODEBUFF_PROMPT_LOG to disabled, fix prompt logger truncation #1
Open
reillyse wants to merge 214 commits into
…mitation PostgreSQL ADD VALUE for enums is not visible within the same transaction, so the UPDATE statements need to run in a separate migration after the enum value is committed. - 0039: Add referral_legacy enum value + is_legacy column (DEFAULT true) - 0040: Backfill credit_ledger with referral_legacy type
…ction deployments
… issue drizzle-kit migrate runs all pending migrations in a single transaction, so the new enum value is not committed when the UPDATE tries to use it. Moved backfill to standalone script: scripts/backfill-referral-legacy.sql Run this manually after migration 0039 is deployed.
The trailing comma after the last entry caused drizzle-kit migrate to fail with a JSON parse error in CI.
…er production deployments" This reverts commit d6d19fa.
… material + tweaks)
…everyone knows coding agents)
Co-authored-by: brandonkachen <brandonchenjiacheng@gmail.com> and Codebuff!
…bled at all, which matches backend semantics
…r schema - Remove stripe_price_id column from user table (migration 0041) - Remove STRIPE_USAGE_PRICE_ID from env schemas and defaults - Stop auto-subscribing new users to Pure Usage on signup - Remove stripe_price_id from TypeScript types (user.ts, next-auth.d.ts, typed.d.ts) - Clean up test utilities and eval configs - Delete scripts/update-stripe-subscriptions.ts
- Remove 'review' from FREEBUFF_REMOVED_COMMAND_IDS and FREEBUFF_REMOVED_COMMANDS - Enable CHATGPT_OAUTH_ENABLED flag
Unify the OAuth connect-command surface so both commands are available in every build and only accept their fully-qualified form. - Expose /connect:chatgpt in the regular Codebuff build (previously freebuff-only) - Drop the /chatgpt short alias; /connect:chatgpt is the only name - Drop the /claude short alias; /connect:claude is the only name - Update /plan and /review freebuff-only guards and help banner to point at /connect:chatgpt Backfills an OpenSpec change documenting the invariants as a new cli-slash-commands capability, already archived at openspec/changes/archive/2026-04-23-cli-connect-commands-cleanup/ with the main spec synced to openspec/specs/cli-slash-commands/spec.md.
Adds packages/agent-runtime/src/util/normalize-conversation.ts — a 4-pass
normalization pass that runs immediately before every LLM call (in
run-agent-step.ts) and at every STEP_ALL / GENERATE_N yield boundary (in
run-programmatic-step.ts, added in the previous commit).
Enforced invariants:
1. No consecutive assistant-role messages (invalid prefill shape)
2. No orphan tool_result (preceding assistant must carry matching id)
3. Every tool_use has an immediately-following matching tool_result
4. Conversation does not end with a trailing assistant before a model call
Pass ordering matters: merge consecutive assistants → drop orphan
tool_results → synthesize missing tool_results → handle trailing assistant.
The merge-first step protects pass 2/3 against the [assistant(A),
assistant(B), tool_result A, tool_result B] shape where A and B belong to
the same turn; merging produces [assistant(A+B), tool_result A, tool_result
B] which is valid.
Severity split on repair emission:
- emitRepair (warn only, never throws): consecutive-assistant merge and
orphan-tool_result drop. These are lossless cleanup of shapes that
occur naturally at runtime (e.g., successive end_turn calls with
excludeToolFromMessageHistory: true) or trivially-safe dropping with no
data fabrication.
- emitSynthesis (throws in strict mode): synthesized missing tool_result
and trailing-assistant continuation. These fabricate conversation data
to keep the model happy, indicating an upstream bug that should surface
at PR time.
Strict mode is opt-in via CODEBUFF_STRICT_CONVERSATION=1; default is
'repair'. In repair mode returns a new array with structured WARN logs
('conversation.shape.repaired' event) per repair. Idempotent on
already-valid conversations — the conversationHasViolation() fast path
returns the original array unchanged with no telemetry.
Wiring in run-agent-step.ts: normalizeConversation runs once on
agentState.messageHistory just before both LLM chokepoints (promptAiSdk
n-responses path and getAgentStreamFromTemplate streaming path).
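The four passes described in this commit can be sketched roughly as below. This is a minimal illustration, not the code in normalize-conversation.ts: the `Message` shape, every function name, the `[synthesized]` and `[continue]` placeholder strings, and the error messages are assumptions made for the example. The fast path, WARN logging, and telemetry are left out.

```typescript
// Simplified, hypothetical message shape; the real runtime types are richer.
type ToolUse = { id: string; name: string }
type Message =
  | { role: 'user'; text: string }
  | { role: 'assistant'; text: string; toolUses: ToolUse[] }
  | { role: 'tool_result'; toolUseId: string; output: string }
type Mode = 'repair' | 'strict'

// Pass 1: merge consecutive assistant messages into one.
function mergeAssistants(msgs: Message[]): Message[] {
  const out: Message[] = []
  for (const m of msgs) {
    const prev = out[out.length - 1]
    if (m.role === 'assistant' && prev !== undefined && prev.role === 'assistant') {
      out[out.length - 1] = {
        role: 'assistant',
        text: [prev.text, m.text].filter((t) => t !== '').join('\n'),
        toolUses: [...prev.toolUses, ...m.toolUses],
      }
    } else {
      out.push(m)
    }
  }
  return out
}

// Pass 2: drop tool_results whose id the preceding assistant does not carry.
function dropOrphanResults(msgs: Message[]): Message[] {
  const out: Message[] = []
  let openIds = new Set<string>()
  for (const m of msgs) {
    if (m.role === 'assistant') {
      openIds = new Set(m.toolUses.map((t) => t.id))
    } else if (m.role === 'user') {
      openIds = new Set()
    } else if (!openIds.has(m.toolUseId)) {
      continue // orphan: a warn-only repair in the commit's terms
    }
    out.push(m)
  }
  return out
}

// Pass 3: fabricate a tool_result for every tool_use left unanswered.
function synthesizeMissingResults(msgs: Message[], mode: Mode): Message[] {
  const out: Message[] = []
  let pending: string[] = []
  const flush = (): void => {
    for (const id of pending) {
      if (mode === 'strict') throw new Error(`missing tool_result for ${id}`)
      out.push({ role: 'tool_result', toolUseId: id, output: '[synthesized]' })
    }
    pending = []
  }
  for (const m of msgs) {
    if (m.role === 'tool_result') {
      const answered = m.toolUseId
      pending = pending.filter((id) => id !== answered)
    } else {
      flush()
    }
    out.push(m)
    if (m.role === 'assistant') pending = m.toolUses.map((t) => t.id)
  }
  flush()
  return out
}

// Pass 4: never hand the model a conversation that ends on an assistant turn.
function handleTrailingAssistant(msgs: Message[], mode: Mode): Message[] {
  const last = msgs[msgs.length - 1]
  if (last === undefined || last.role !== 'assistant') return msgs
  if (mode === 'strict') throw new Error('trailing assistant message')
  return [...msgs, { role: 'user', text: '[continue]' }]
}

function normalizeConversationSketch(msgs: Message[], mode: Mode = 'repair'): Message[] {
  const merged = mergeAssistants(msgs)
  const withoutOrphans = dropOrphanResults(merged)
  const answered = synthesizeMissingResults(withoutOrphans, mode)
  return handleTrailingAssistant(answered, mode)
}
```

Running the merge first is what lets passes 2 and 3 treat `[assistant(A), assistant(B), tool_result A, tool_result B]` as one valid turn instead of dropping `tool_result B` as an orphan.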
Adds a Sparrow-only OpenTelemetry instrumentation layer targeting Honeycomb. Core module lives at common/src/sparrow/telemetry/ and emits a hierarchical span tree (prompt > agent.run > agent.step > gen_ai.chat + tool.call) with real OTel context propagation via trace.setSpan + context.with.
- tracer-provider: BasicTracerProvider + BatchSpanProcessor + OTLP HTTP exporter
- span-helpers: withSpan, withPromptSpan, withAgentRunSpan, withAgentStepSpan, recordLlmCall, recordToolCall (idempotent finish)
- context-harvester: 5s TTL cache of git/cwd/branch/Linear-key context
- cost-rollup: token and USD aggregation up the span tree
- attributes: Honeycomb attribute naming conventions
- sparrow-config: persisted JSON at ~/.config/manicode/sparrow-config.json
- 7 test files covering init/shutdown/flush/privacy/context-cache/cost-rollup/OAuth-fallback/coverage-gaps (104 passing tests)
- docs/telemetry.md and HONEYCOMB_* + SPARROW_TELEMETRY_CAPTURE_PROMPTS env var documentation
Telemetry is a silent no-op when HONEYCOMB_API_KEY is absent. Content (prompt bodies, tool args, tool output) is never captured by default.
Adds /telemetry CLI command with subcommands: enable, disable, status, dataset, capture-prompts, debug, init, shutdown, flush. Commands hot-reconfig the tracer provider in-process without a CLI restart (flush + shutdown + re-init). - cli/src/commands/telemetry.ts: command implementation - cli/src/commands/command-registry.ts: registration - cli/src/index.tsx: initTelemetry on CLI startup - cli/src/utils/renderer-cleanup.ts: flush + shutdown on exit so in-flight spans reach Honeycomb before process termination
Adds SPARROW-tagged telemetry hooks at 5 upstream-file touchpoints. All failures are swallowed so telemetry cannot break user workflows. - main-prompt.ts: withPromptSpan wraps the top-level prompt execution; every user turn produces one root prompt span with harvested git/cwd/Linear context - run-agent-step.ts: withAgentRunSpan wraps each agent orchestration; withAgentStepSpan wraps each runAgentStep with step_number + agent_id - tool-executor.ts: recordToolCall around executeToolCall and executeCustomToolCall. Integrated onto upstream atomic-pair contract (commit 3d59dc9) — span opened before try/catch, finished exactly once on each success/failure path. spawn_agents tool.call spans record child.agent_id linkage to spawned agent.run spans. Aborted custom-tool short-circuit skips span to avoid zero-duration noise. - llm.ts: recordLlmCall around LLM dispatch with route/model/cost/tokens. Fixes cost-override bug: use !== undefined (not truthy check) so $0 cost overrides still emit spans with correct cost. - run.ts: plumbs telemetry context through the SDK run entry point
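The cost-override fix mentioned for llm.ts comes down to the difference between a truthiness check and an explicit `undefined` check. A minimal sketch, with hypothetical function and parameter names (the real call site is not shown on this page):

```typescript
// Hypothetical helper: picks the cost to record on the span.
// Buggy variant: a $0 override is falsy, so it silently falls back.
function resolveCostBuggy(computedUsd: number, overrideUsd?: number): number {
  return overrideUsd ? overrideUsd : computedUsd
}

// Fixed variant: only a missing override falls back to the computed cost.
function resolveCostFixed(computedUsd: number, overrideUsd?: number): number {
  return overrideUsd !== undefined ? overrideUsd : computedUsd
}
```

With an override of `0`, the buggy variant reports the computed cost while the fixed one reports `0`, which is the behaviour the commit describes.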
…ANGES - openspec/changes/sparrow-telemetry/: full proposal with design, tasks (42/50 complete; remaining 8 are live Honeycomb verification + full-regression), and specs. Strict-validates. - SPARROW_CHANGES.md: enumerates every upstream-file touchpoint (main-prompt, run-agent-step, tool-executor, llm, run) and new Sparrow-only files for future upstream-merge audits.
Every user turn is already wrapped in a `withPromptSpan`, the root `prompt` span that owns the full agent.run/agent.step/gen_ai.chat/tool.call hierarchy. When that span closes, the BatchSpanProcessor has the full trace queued, but previously we had to wait up to 5s for its timer (or for the CLI to exit / a manual `/telemetry flush`) before it shipped to Honeycomb. This patch fires a non-blocking `flushTelemetry(2000)` in a finally block after the prompt span ends, so every turn reaches Honeycomb within ~1–2 seconds of the CLI finishing its response. Fire-and-forget (never awaited), so it adds zero latency to the turn result; errors are swallowed so telemetry can never break user workflows.
Changes:
- common/src/sparrow/telemetry/span-helpers.ts: withPromptSpan now wraps the withSpan call in try/finally and kicks off a fire-and-forget flush after the span has ended.
- common/src/sparrow/telemetry/__tests__/integration.test.ts: 3 new tests covering success path, throwing callback, and flush-error swallowing.
- docs/telemetry.md: new "When spans are flushed" section documenting the 5 flush triggers (turn-end + 5s timer + batch size + CLI exit + config changes).
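The try/finally plus fire-and-forget pattern in this commit might look roughly like the following. It is a sketch under assumptions: `withPromptSpanSketch` and its injected `flush` parameter are illustrative names, and the real withPromptSpan also creates and ends the span, which is omitted here.

```typescript
// Runs one turn, then kicks off a flush without awaiting it.
// `flush` stands in for flushTelemetry; it is injected so the sketch is self-contained.
async function withPromptSpanSketch<T>(
  runTurn: () => Promise<T>,
  flush: (timeoutMs: number) => Promise<void>,
): Promise<T> {
  try {
    return await runTurn()
  } finally {
    // Not awaited: the caller gets its result (or error) immediately.
    // Promise.resolve().then(...) also catches a synchronous throw from flush,
    // and the trailing catch swallows any rejection.
    void Promise.resolve()
      .then(() => flush(2000))
      .catch(() => {})
  }
}
```

Because the flush is started in `finally`, it fires on both the success path and when `runTurn` throws, matching the three test cases the commit lists.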
- Add codebuff.oauth_account_id (sha256 hash of OAuth refresh/access token) so two Claude or ChatGPT subscriptions on the same machine can be distinguished in Honeycomb dashboards. - Propagate user.email, user.name, host.name from the per-prompt harvest cache onto every gen_ai.chat span at creation, so token totals can be sliced by user/machine without joining through trace IDs. - Add deriveOAuthAccountId helper + 6 unit tests covering stability, distinctness, env-var fallback (refreshToken= case), and the one-way property of the hash. - Add .agents/skills/honeycomb-usage-report/ skill that produces a daily + aggregate breakdown of LLM usage by route and model over the past 7 days.
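A one-way account identifier of the kind described here can be derived with Node's built-in crypto module. The sketch below assumes the full hex digest is used and that a missing token yields `undefined`; the actual deriveOAuthAccountId helper may truncate, salt, or fall back to an env var differently.

```typescript
import { createHash } from 'node:crypto'

// Hypothetical stand-in for deriveOAuthAccountId: same token in, same id out,
// and the token itself cannot be read back from the id.
function deriveAccountIdSketch(token: string | undefined): string | undefined {
  if (token === undefined || token === '') return undefined
  return createHash('sha256').update(token).digest('hex')
}
```

Two different subscriptions on the same machine produce different ids, so dashboards can group by `codebuff.oauth_account_id` without the raw token ever leaving the process.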
….chat Adds two new boolean attributes on every gen_ai.chat span to enable silent-fallback observability: - codebuff.chatgpt_oauth_eligible: tri-state (true | false | unset). true when the openai/* model is in the ChatGPT OAuth allowlist; false when openai/* but not allowlisted (e.g. gpt-5-nano); unset for non-openai models. - codebuff.claude_oauth_eligible: binary (true | unset). true for any anthropic/* or claude-* model recognized by isClaudeModel. Computed centrally in recordLlmCall() so all streaming paths emit consistently. Decoupled from codebuff.route so dashboards can count 'could have used OAuth subscription but didn't' with: <attr>_eligible = true AND codebuff.route = codebuff_backend Includes tests covering allowlist hits, non-allowlist openai/*, non-OpenAI/non-Claude models, missing model, and the silent-fallback query combination. 🤖 Generated with Codebuff Co-Authored-By: Codebuff <noreply@codebuff.com>
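The tri-state and binary eligibility attributes can be modelled as below. This is an illustration only: the function names are hypothetical, the allowlist is passed in rather than read from the real config, and the Claude check approximates isClaudeModel with the two prefixes named in the commit message.

```typescript
// true: openai/* and allowlisted; false: openai/* but not allowlisted; undefined: not an OpenAI model.
function chatgptOAuthEligibleSketch(
  model: string | undefined,
  allowlist: ReadonlySet<string>,
): boolean | undefined {
  if (model === undefined || !model.startsWith('openai/')) return undefined
  return allowlist.has(model)
}

// true for anthropic/* or claude-* models, otherwise unset.
function claudeOAuthEligibleSketch(model: string | undefined): true | undefined {
  if (model === undefined) return undefined
  return model.startsWith('anthropic/') || model.startsWith('claude-') ? true : undefined
}

// The "could have used OAuth but didn't" dashboard query, expressed as a predicate.
function isSilentFallbackSketch(eligible: boolean | undefined, route: string): boolean {
  return eligible === true && route === 'codebuff_backend'
}
```

Leaving the attribute unset (rather than `false`) for unrelated models keeps the `<attr>_eligible = true` query from being diluted by rows where the question does not apply.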
…chat.tsx imports
Post-cherry-pick fixup for the telemetry/cache-debug/ChatGPT OAuth range:
- bun.lock: resync after bun install pulled in @opentelemetry/exporter-trace-otlp-http and upgraded @opentelemetry/{resources,sdk-trace-base,core,context-async-hooks} to v2 (resourceFromAttributes export)
- common/src/util/messages.ts: add errorToolResult() helper used by packages/agent-runtime/src/util/normalize-conversation.ts to fabricate missing tool_result entries
- cli/src/chat.tsx: add missing imports for BottomStatusLine, useClaudeQuotaQuery, getClaudeOAuthStatus that were referenced but never imported
file-picker handleSteps uses read_files before yielding STEP, but read_files was not in toolNames. The LLM would see read_files in message history and try to call it, triggering "Tool read_files is not currently available". Same issue for file-lister with read_subtree. - Add read_files to file-picker toolNames - Add read_subtree to file-lister toolNames - Regenerate cli-lite bundled agents
…t docs 4.6 - Opus: anthropic/claude-opus-4.6 → anthropic/claude-opus-4.8 - GPT-5: openai/gpt-5.1 → openai/gpt-5.2 - Grok: x-ai/grok-4.1-fast → x-ai/grok-4.3 - Haiku: anthropic/claude-3.5-haiku-20241022 → anthropic/claude-haiku-4.5 - Docs: updated claude-sonnet-4.5 → claude-sonnet-4.6 in all examples Updated model-config.ts constants, bundled-agents generated files, oauth mappings, agent-definition type unions, .agents/ files, OpenAI pricing tables, and all user-facing documentation.
- Add onBeforeSubagentPrompt/onAfterSubagentComplete hooks to AgentRuntimeDeps - Wire hooks through agent-runtime, SDK, and both CLI/cli-lite - Add getSubagentHippoContext (3s timeout) and storeSubagentResultToHippo - Enrich commander, file-picker, opus-agent, gpt-5-agent with hippo context - Fix cancelled outputs stored as success (now classified as failure) - Fix misleading error message in getSubagentHippoContext - Normalize agent IDs with getShortAgentId for fully-qualified ID support - Cap injected context at 1500 chars to prevent token bloat - Add warn-level logging for hook failures - Add hippo storage schema and accuracy eval design docs
- Add cli-lite/scripts/stamp-version.ts to stamp git SHA into package.json - Add stamp:version npm script - Revert package.json version to base 0.1.0 (script handles stamping)
The build script was installing the binary to `npm config get prefix` which could differ from where the shell resolves `codebuff` (e.g. nvm bin directory). Now detects the existing binary location via `which`, falling back to npm prefix for first-time installs.
- buildOutputDescription no longer pads --output with file lists (cli + cli-lite); files tracked via --files-changed - add buildSubagentOutputDescription: mine narrative from output/message keys, handle plain strings, errors, and cancelled (surface reason instead of "Completed (Ns)") - trim SUBAGENT_NARRATIVE_KEYS to the keys codebuff subagents actually emit (output, message) - pass actual hippo error into logHippoPrompt metadata on subagent failure - export + add 21 unit tests for buildSubagentOutputDescription
- Prune ALLOWED_MODEL_PREFIXES to ['anthropic', 'openai'] only
- Remove deepseekModels/DeepseekModel and CURRENT_GROK_MODEL constants
- Add CURRENT_HAIKU_MODEL ('anthropic/claude-haiku-4.5') for utility agents
- Add CURRENT_GPT5_MINI_MODEL ('openai/gpt-5-mini') for research agents
- Switch file-picker, file-lister, commander-lite → Haiku (Claude OAuth)
- Switch researcher-web, researcher-docs → GPT-5-mini (ChatGPT OAuth)
- Update FREE_MODE_AGENT_MODELS allowlist to match new model assignments
- Add openai/gpt-5-mini to OPENROUTER_TO_OPENAI_MODEL_MAP
- Prune ModelName type unions (3 copies) to OpenAI + Anthropic only
- Switch getModelForMode experimental/ask from Gemini Pro → Claude Opus
- Regenerate cli-lite bundled-agents with updated model assignments
- Update file-picker tests to expect Haiku instead of Grok/Gemini
🤖 Generated with Codebuff
Co-Authored-By: Codebuff <noreply@codebuff.com>
Grok is no longer used by any agent (file-lister migrated to claude-haiku-4.5). Removing the dangling x-ai/grok-4 definition so grok can never be reintroduced from this repo. Empties nonCacheableModels (its only entry was the grok model).
Print truncated request/response content for tool calls and truncated prompt+params on subagent start/finish. Make the truncation limit configurable via CODEBUFF_DEBUG_TRUNCATE (default 500, 0 disables). Fix finish-event cost display: totalCost is credits (1 credit = $0.01), now shown via formatCredits as e.g. "44 credits ($0.44)" instead of "$44.0000". Add unit tests in output.test.ts.
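The two behaviours in this commit, a configurable truncation limit and the credits-to-dollars display, can be sketched as follows. The function names and the truncation suffix are assumptions; only the env var semantics (default 500, 0 disables) and the "44 credits ($0.44)" format come from the commit message.

```typescript
// Parses a CODEBUFF_DEBUG_TRUNCATE-style value: default 500, 0 disables truncation.
function resolveTruncateLimitSketch(raw: string | undefined): number {
  if (raw === undefined || raw.trim() === '') return 500
  const parsed = Number(raw)
  return Number.isInteger(parsed) && parsed >= 0 ? parsed : 500
}

// Cuts text to `limit` characters; a limit of 0 means "do not truncate".
function truncateForDebugSketch(text: string, limit: number): string {
  if (limit === 0 || text.length <= limit) return text
  return `${text.slice(0, limit)}... [+${text.length - limit} chars]`
}

// 1 credit = $0.01, so 44 credits is $0.44 rather than $44.
function formatCreditsSketch(credits: number): string {
  return `${credits} credits ($${(credits / 100).toFixed(2)})`
}
```

Passing the raw env value in as a parameter, instead of reading `process.env` inside the helper, keeps the parsing rule easy to unit test.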
Refresh bundled-agents.generated.ts so the deployed cli-lite binary no longer references the removed google/gemini-2.0-flash-001 model. file-picker/file-lister/commander-lite use claude-haiku-4.5; researchers use gpt-5-mini. Pushing triggers weft to rebuild the agent image with the up-to-date binary.
…dError + nested cause) isTransientApiError now (A) recursively unwraps error.cause (cycle-guarded) so a transient 529/overload nested inside a wrapper error is recognized, and (B) treats AI_NoOutputGeneratedError as transient since mid-stream provider overloads surface that way. Retries remain capped by MAX_STEP_RETRIES at the call site. Adds unit tests (error-transient.test.ts) and integration tests (loop-agent-steps.test.ts).
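Recursive unwrapping of `error.cause` with a cycle guard might be written along these lines. This is a sketch, not the code in common/src/util/error.ts: the `statusCode` property name and the single-entry status set are assumptions (the commit only names 529), and the real isTransientApiError likely checks more conditions.

```typescript
// Assumption: only 529 is named in the commit text; a real list is probably longer.
const TRANSIENT_STATUS_CODES_SKETCH = new Set<number>([529])

function isTransientSketch(err: unknown, seen: Set<unknown> = new Set()): boolean {
  if (typeof err !== 'object' || err === null) return false
  if (seen.has(err)) return false // cycle guard: this object was already visited
  seen.add(err)
  const e = err as { name?: unknown; statusCode?: unknown; cause?: unknown }
  // Mid-stream provider overloads surface as this error name, per the commit.
  if (e.name === 'AI_NoOutputGeneratedError') return true
  if (typeof e.statusCode === 'number' && TRANSIENT_STATUS_CODES_SKETCH.has(e.statusCode)) {
    return true
  }
  // Walk down the cause chain so a wrapped 529 is still recognized.
  return isTransientSketch(e.cause, seen)
}
```

The `seen` set is what keeps a self-referential cause chain from recursing forever; retries themselves are still bounded elsewhere (MAX_STEP_RETRIES in the commit).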
Adds describeTransientApiError() and getTransientStatusCode() helpers in common/src/util/error.ts so the retry notice shows a meaningful reason (e.g. "Response stream interrupted (no output)" for AI_NoOutputGeneratedError, or "Transient API error (529)" walking the cause chain) instead of a vague message. run-agent-step.ts uses the helper in its retry notice. Adds unit tests and an integration assertion.
Wrap subagent execution in a per-subagent timeout (10 min) so a stalled LLM stream can no longer hang the parent's Promise.allSettled join indefinitely. On timeout the child is aborted, a subagent_finish event is emitted (so the UI never shows a dangling 'started' agent), and the active agentState is attached to SubagentTimeoutError so partial credits are still aggregated. Applies to both fan-out and inline spawns.
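A per-subagent timeout that aborts the child and carries partial state on the error can be sketched with `Promise.race` and an `AbortController`. All names below are hypothetical stand-ins (including the error class, which mirrors but is not the PR's SubagentTimeoutError), and emitting the subagent_finish event is left out.

```typescript
// Error carrying whatever state the subagent had accumulated when it timed out.
class SubagentTimeoutSketch extends Error {
  partialState: unknown
  constructor(partialState: unknown, timeoutMs: number) {
    super(`Subagent timed out after ${timeoutMs}ms`)
    this.name = 'SubagentTimeoutSketch'
    this.partialState = partialState
  }
}

async function runWithTimeoutSketch<T>(
  run: (signal: AbortSignal) => Promise<T>,
  getState: () => unknown,
  timeoutMs: number,
): Promise<T> {
  const controller = new AbortController()
  let timer: ReturnType<typeof setTimeout> | undefined
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      controller.abort() // tell the child to stop streaming
      reject(new SubagentTimeoutSketch(getState(), timeoutMs))
    }, timeoutMs)
  })
  try {
    // Whichever settles first wins; a stalled run loses to the timer.
    return await Promise.race([run(controller.signal), timeout])
  } finally {
    clearTimeout(timer)
  }
}
```

Since the wrapper always settles, a parent joining children with `Promise.allSettled` can no longer hang on one stalled stream, and it can read `partialState` from the rejection to aggregate credits.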
Changes
- … `false`). Only enabled when explicitly set to a non-`0` value.
- … `1`, `true`, or a custom path.
- … `truncateIfNeeded()` call so large entries that push the log past 5MB are properly truncated.
- … `docs/CLAUDE_OAUTH_DEPLOYMENT.md` env vars table with new defaults for both `CODEBUFF_VERBOSE` and `CODEBUFF_PROMPT_LOG`.
- … `cli-lite` prompt-logger tests to expect disabled-by-default behavior.
Files Changed
- cli-lite/src/index.ts
- cli-lite/src/prompt-logger.ts
- cli-lite/src/repl.ts
- cli-lite/src/__tests__/prompt-logger.test.ts
- cli/src/utils/prompt-logger.ts
- docs/CLAUDE_OAUTH_DEPLOYMENT.md
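One possible reading of the disabled-by-default rules and the truncation fix is sketched below. Everything here is an assumption layered on the PR description: the function names, which rule belongs to which variable, the treatment of `false` and empty strings, and the character-based (rather than byte-based) size check are illustrative, and the actual prompt-logger code is not shown on this page.

```typescript
type EnvSketch = Record<string, string | undefined>

// Assumed rule: verbose is off unless the variable is set to something other than "0".
function isVerboseEnabledSketch(env: EnvSketch): boolean {
  const value = env.CODEBUFF_VERBOSE
  return value !== undefined && value !== '' && value !== '0'
}

// Assumed rule: logging is off by default; "1"/"true" use a default path, anything else is a custom path.
function resolvePromptLogPathSketch(env: EnvSketch, defaultPath: string): string | undefined {
  const value = env.CODEBUFF_PROMPT_LOG
  if (value === undefined || value === '' || value === '0' || value === 'false') return undefined
  if (value === '1' || value === 'true') return defaultPath
  return value
}

// Keeps the newest tail of the log and drops a partial first line, if any.
function truncateIfNeededSketch(log: string, maxChars: number): string {
  if (log.length <= maxChars) return log
  const tail = log.slice(log.length - maxChars)
  const firstNewline = tail.indexOf('\n')
  return firstNewline >= 0 && firstNewline < tail.length - 1 ? tail.slice(firstNewline + 1) : tail
}

// Truncating after the append means one oversized entry cannot leave the log over the cap.
function appendEntrySketch(log: string, entry: string, maxChars: number = 5 * 1024 * 1024): string {
  return truncateIfNeededSketch(log + entry, maxChars)
}
```

The point of the last helper is the ordering: checking the size only before writing would let a single large entry push the log past the limit until the next write.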