Skip to content

Consolidate OpenAI server: unify implementations, add multi-protocol support#243

Open
bernardladenthin wants to merge 14 commits into
mainfrom
claude/clever-clarke-6a6vvg
Open

Consolidate OpenAI server: unify implementations, add multi-protocol support#243
bernardladenthin wants to merge 14 commits into
mainfrom
claude/clever-clarke-6a6vvg

Conversation

@bernardladenthin

Copy link
Copy Markdown
Owner

Summary

This PR consolidates the two independent OpenAI-compatible server implementations (OpenAiCompatServer from PR #240 and LlamaServer from #242) into a single unified codebase. The winning implementation is OpenAiCompatServer (JDK com.sun.net.httpserver, streaming SSE, zero extra dependencies), extended with:

  • Multi-protocol API surfaces: Ollama native (/api/chat, /api/generate, /api/tags, /api/version), Anthropic Messages (POST /v1/messages), OpenAI Responses (POST /v1/responses), and llama.cpp-native (POST /infill for fill-in-the-middle)
  • Additional OpenAI routes: POST /v1/completions (text), POST /v1/embeddings, POST /v1/rerank, GET /health (liveness probe)
  • CLI consolidation: OpenAiServerCli replaces LlamaServerArgs, with support for --embedding, --reranking, --mmproj flags
  • Streaming enhancements: stream_options.include_usage passthrough, cached_tokens safety net, CORS/OPTIONS preflight
  • Structured outputs: response_format support for JSON schema validation
  • Stateless protocol translators: Pure JSON-to-JSON mappers for Ollama, Anthropic, and Responses APIs (unit-testable, no model dependency)

The NanoHTTPD dependency is removed; the server now runs on the JDK's built-in HTTP server with zero extra runtime dependencies.

Test plan

  • Affected unit tests pass locally: OpenAiServerCliTest, OllamaApiSupportTest, AnthropicApiSupportTest, ResponsesApiSupportTest, OaiRerankSupportTest, OpenAiSseFormatterTest, OpenAiRequestMapperTest
  • Integration tests pass: OpenAiCompatServerHttpTest, OpenAiCompatServerIntegrationTest, OpenAiServerCompletionIntegrationTest, OpenAiServerEmbeddingsIntegrationTest, OpenAiServerRerankIntegrationTest
  • CI is green on this branch
  • Docs updated: README.md, TODO.md, docs/feature-investigation-ide-agent-backend.md, package-info.java

Related issues / PRs

Closes the "TWO implementations to consolidate" item in TODO.md. Consolidates PR #240 (OpenAiCompatServer) and #242 (LlamaServer / NanoHTTPD) into a single unified server.

Checklist

  • I have read CONTRIBUTING.md and CODE_OF_CONDUCT.md
  • My commits follow Conventional Commits
  • No security-sensitive changes

https://claude.ai/code/session_01JdLpWD8nedY7LwNnHefZLF

claude added 10 commits June 19, 2026 00:01
…p NanoHTTPD

Two interim OpenAI-compatible servers coexisted in net.ladenthin.llama.server
(PR #240's JDK com.sun.net.httpserver streaming server on top of #242's NanoHTTPD
blocking server). Settle on one: keep the JDK + SSE-streaming core, absorb the
NanoHTTPD server's extra routes / CLI / fat-jar entry point, then delete it.

Survivor: OpenAiCompatServer (dependency-free, embeddable, fat-jar Main-Class).
- Streaming chat via SSE with delta.tool_calls + prefill heartbeats (unchanged).
- Ported routes: POST /v1/completions, POST /v1/embeddings, GET /health.
- Broadened the model-free test seam ChatBackend -> OpenAiBackend (+ completions,
  embeddings); LlamaModelChatBackend -> LlamaModelBackend forwards the two new
  routes to handleCompletionsOai / handleEmbeddings.
- New testable CLI parser OpenAiServerCli (short/long/alias flags, --help,
  validation) replacing the inline arg map and the deleted LlamaServerArgs;
  produces ModelParameters + OpenAiServerConfig.

Deleted NanoHTTPD impl: LlamaServer, LlamaServerArgs, LlamaServerConfig,
OaiHttpServer, OaiRouter, OaiBackend, OaiResponse, LlamaModelOaiBackend
(+ OaiRouterTest, LlamaServerArgsTest, OaiHttpServerIntegrationTest).

Reconciliation:
- pom.xml: drop org.nanohttpd dependency + version; assembly Main-Class ->
  OpenAiCompatServer.
- spotbugs-exclude.xml: retarget CC_CYCLOMATIC_COMPLEXITY to OpenAiServerCli.parse;
  drop the LlamaModelOaiBackend EI_EXPOSE_REP2 entry (survivor is package-private,
  like the old LlamaModelChatBackend, which needed none).
- LlamaArchitectureTest Server layer + com.sun.net.httpserver exception and
  module-info `requires jdk.httpserver` unchanged (still correct for the survivor).
- LlamaModel javadoc link, README, CLAUDE.md, TODO.md, publish.yml comment updated;
  removed the consolidation block and the now-moot "implement SSE" TODO (its premise
  that com.sun.net.httpserver is ArchUnit-banned was wrong: it is the supported,
  exported jdk.httpserver module).

C++ (jllama.cpp / json_helpers.hpp / wrap_stream_chunk + its tests) unchanged: the
streaming path survives.

Verification (model-free): mvn compile test-compile; targeted tests
(LlamaArchitectureTest, OpenAiRequestMapperTest, OpenAiSseFormatterTest,
ChatStreamChunkParserTest, OpenAiCompatServerHttpTest, OpenAiServerCliTest) all
green; javadoc:jar clean; spotless:check clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01JdLpWD8nedY7LwNnHefZLF
…ends

Implements the XS+S recommendations from the IDE/agent backend investigation,
targeting agentic tool-calling (Qwen) and local autocomplete:

XS:
- POST /infill route (FIM autocomplete: llama.vscode/Twinny/Tabby/Continue) —
  forwards verbatim to the existing native handleInfill; FIM tokens applied
  server-side from GGUF metadata. New OpenAiBackend.infill + LlamaModelBackend.
- Tolerant routing: every route also reachable without the /v1 prefix.
- cache_prompt defaulted true in the chat mapper (KV-prefix reuse for IDE latency).
- C++ regression guard (#20198): assert tool_calls.function.arguments is a JSON
  STRING, not an object — passes against pinned b9682, so agentic tool-calling is
  wire-correct for the OpenAI SDK / Roo Code / Copilot agent.

S:
- stream_options.include_usage passthrough: OpenAiRequestMapper forwards the
  stream_options object verbatim (new InferenceParameters.withStreamOptions) so
  the native server emits the trailing usage chunk OpenAI clients expect.
- cached_tokens safety net: OpenAiSseFormatter.ensureUsageCachedTokens guarantees
  usage.prompt_tokens_details.cached_tokens is present on the streamed usage chunk,
  fixing the documented Copilot custom-endpoint crash (microsoft/vscode #273482)
  regardless of upstream. Applied in the SSE path; token-delta chunks pass through
  unparsed.
- CORS: a com.sun.net.httpserver Filter answers OPTIONS preflights with 204 +
  Access-Control-Allow-{Origin,Methods,Headers} and stamps Allow-Origin on every
  response. New OpenAiServerConfig.corsAllowOrigin (default "*").

Tests: +infill/alias/CORS HTTP tests, +stream_options mapper test, +5
ensureUsageCachedTokens unit tests, +1 C++ arguments-as-string guard. Full server
+ json + arch suite green (77 model-free tests); C++ tool-call/stream suite green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01JdLpWD8nedY7LwNnHefZLF
… docs

Continues the IDE/agent backend work (Medium items + documentation):

- POST /v1/rerank (+ /rerank, /reranking): RAG document reranking. Native
  handleRerank (made public, consistent with the other handle* methods) returns
  {document,index,score}; OaiRerankSupport reshapes it into the OpenAI rerank
  response with sorted {index, relevance_score}, top_n, and a `data` alias of
  `results` (Continue #6478). New OpenAiBackend.rerank + LlamaModelBackend.rerank.
- response_format passthrough (json_object / json_schema) for OpenAI structured
  outputs (new InferenceParameters.withResponseFormat; mapper forwards verbatim).
- Vision: --mmproj CLI flag (image_url content parts already pass through verbatim).
- CLI: --reranking (enableReranking), --mmproj (setMmproj) on OpenAiServerCli.

Docs:
- New docs/feature-investigation-ide-agent-backend.md (the deep-research report +
  an implementation-status preamble).
- README endpoints table + notes (rerank/infill, CORS, /v1-less aliases, response_
  format, the Copilot inline-completion limitation), CLAUDE.md server bullet,
  package-info, and TODO.md (DONE list + the deferred decisions: Ollama emulation,
  Anthropic /v1/messages + OpenAI /v1/responses shims, Continue native /completion,
  per-model FIM registry, /props).

Tests: +OaiRerankSupportTest (10), +rerank HTTP route, +response_format mapper test,
+--reranking/--mmproj CLI tests. Full server+json+arch suite green (138 tests);
javadoc + spotless clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01JdLpWD8nedY7LwNnHefZLF
Adds an Ollama-compatible surface so Copilot's built-in Ollama provider and
Ollama-hardcoded tools can drive the local model, translating to/from the internal
OpenAI chat path (no second inference path):

- GET /api/version, GET /api/tags, POST /api/show — discovery; /api/show advertises
  capabilities (completion/tools/insert [+vision when --mmproj]) and context length,
  which Copilot reads to enable tools/vision and size prompts.
- POST /api/chat — non-streaming (single JSON) and streaming (NDJSON, one object per
  line, terminated by a "done":true line). Request options (num_predict→max_tokens,
  temperature/top_p/top_k/seed/stop) and `format` (json / schema → response_format)
  are mapped; Ollama tool-call arguments (object) ↔ OpenAI (JSON string) are converted
  both ways.
- ToolCallDeltaAccumulator: reusable helper that reconstructs whole tool calls from
  OpenAI streaming delta.tool_calls fragments (shared by the non-OpenAI shims that
  deliver tool calls whole). Streamed tool calls are emitted on the Ollama done line.
- OpenAiServerConfig.supportsVision (set by --mmproj) feeds the /api/show vision flag.

All pure translation lives in OllamaApiSupport + ToolCallDeltaAccumulator (model-free
unit-tested); the server handlers are thin and reuse the OpenAiBackend seam.

Tests: +OllamaApiSupportTest (12), +ToolCallDeltaAccumulatorTest (3), +Ollama HTTP
route tests (version/tags/show/chat non-stream + NDJSON stream). Server+arch suite green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01JdLpWD8nedY7LwNnHefZLF
…shims

Adds two more client-protocol surfaces over the internal OpenAI chat core (pure
translation, no second inference path; tool calls reconstructed from OpenAI
delta.tool_calls via the shared ToolCallDeltaAccumulator):

Anthropic Messages (POST /v1/messages):
- Request: system string/blocks → system message; content blocks (text / tool_use /
  tool_result) flattened to OpenAI messages (a user tool_result → a role:"tool"
  message); Anthropic tools (input_schema) → OpenAI function tools; tool_choice
  auto/any mapped.
- Non-streaming response: text + tool_use content blocks, stop_reason mapping
  (tool_calls→tool_use, length→max_tokens), usage.
- Streaming: the Anthropic SSE event sequence via AnthropicStreamTranslator
  (message_start → text content block start/delta/stop → tool_use blocks → message_
  delta → message_stop), with heartbeats.

OpenAI Responses (POST /v1/responses):
- Request: instructions → system; input string/array (message / function_call /
  function_call_output items) → OpenAI messages; flat function tools → nested.
- Non-streaming response: a `response` object whose output holds a message item
  (output_text) + one function_call item per tool call, with usage.
- Streaming: the Responses SSE event sequence via ResponsesStreamTranslator
  (response.created → output_item.added/content_part.added → output_text.delta* →
  *.done → function_call items → response.completed), with monotonic
  sequence_number, and heartbeats.

Both surfaces are reachable with and without the /v1 prefix and behind the CORS
filter. Docs (README endpoints table, CLAUDE.md server bullet, package-info, TODO)
updated; the two items are moved from "deferred" to done.

Tests: +AnthropicApiSupportTest (6), +AnthropicStreamTranslatorTest (4),
+ResponsesApiSupportTest (5), +ResponsesStreamTranslatorTest (4), + Anthropic/
Responses HTTP route tests (non-stream + stream). Full server+json+arch suite green
(251 tests); javadoc + spotless clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01JdLpWD8nedY7LwNnHefZLF
Fills the remaining tractable IDE-backend endpoints (those not blocked on a native
streaming-completion path):

- GET /props (llama.cpp-native): reports default_generation_settings.n_ctx and a
  modalities block (vision from --mmproj), which autocomplete clients such as
  llama.vscode read to size their context window. OpenAiSseFormatter.propsJson;
  unauthenticated like /health.
- POST /api/generate (Ollama-native prompt completion / FIM): maps to the native
  /v1/completions handler, or to /infill when a `suffix` is present (FIM). Options
  (num_predict→max_tokens, temperature/top_p/top_k/seed/stop) are mapped. Non-streaming
  returns one JSON; stream:true returns NDJSON (a content line + a done line).
  Generation completes before emission — documented as a single content chunk, since
  there is no streaming raw-completion path (tracked in TODO as the shared blocker for
  streaming /v1/completions, token-streaming /api/generate, and Continue's native
  /completion).

Docs (README, CLAUDE.md, package-info, TODO) updated; TODO now records the streaming
raw-completion JNI path as the one remaining blocker and trims the items it gates.

Tests: +/props + /api/generate translator unit tests (OllamaApiSupportTest,
OpenAiSseFormatterTest) and HTTP route tests (props, generate non-stream/FIM/stream).
Full server+json+arch suite green (261 tests); javadoc + spotless clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01JdLpWD8nedY7LwNnHefZLF
Make TODO.md reflect the shipped state accurately and capture every remaining item:

- Intro now lists the complete surface: the OpenAI routes plus /props and the three
  alternative protocol surfaces (Ollama /api/*, Anthropic /v1/messages, OpenAI
  /v1/responses) — previously it stopped at /health.
- Add the one genuinely-open item that was missing: live end-to-end validation of the
  Ollama/Anthropic/Responses surfaces against a real model + real clients (today they
  are covered only by model-free unit + fake-backend HTTP tests; only the OpenAI chat
  path has a gated integration test).

Done items remain marked DONE; the deferred follow-ups (streaming raw-completion JNI
path and what it gates, incremental tool-call streaming, per-model FIM registry,
multi-model registry, Gemma 4 validation) are unchanged.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01JdLpWD8nedY7LwNnHefZLF
…cols

CI's test-java-linux-x86_64 job already has the three ingredients in one place —
the Linux x86_64 native lib (downloaded artifact), the models (incl. Qwen3-0.6B),
and `mvn test` — so OpenAiCompatServerIntegrationTest already round-trips the OpenAI
chat path over a real socket. Extend that same gated fixture (same loaded Qwen3
model, self-skips when absent) to smoke the new surfaces end-to-end:

- Ollama /api/chat (non-stream + NDJSON stream) and discovery (/api/version, /api/tags,
  /api/show).
- Anthropic /v1/messages (non-stream + SSE event stream).
- OpenAI /v1/responses (non-stream + SSE event stream).
- /props.

Assertions are structural only (markers the translators always emit: Ollama
"done":true, Anthropic event: message_start/message_stop, Responses
event: response.created/response.completed, response object shape) so they are robust
to the tiny model's wording — matching the existing chat round-trip's approach.

Embeddings / rerank / infill round-trips still need their own server fixtures in the
matching mode (models already in CI); tracked in TODO.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01JdLpWD8nedY7LwNnHefZLF
…_result-only turn

Low-hanging unit-test coverage on the recently-added protocol translators (all pure,
model-free), plus one correctness fix surfaced by a new test:

Fix:
- AnthropicApiSupport: a user turn carrying only tool_result blocks now emits exactly
  one role:"tool" message instead of that plus a spurious empty {"role":"user",
  "content":""}. Guarded with a hadToolResult flag; assistant tool-call turns still
  carry null content.

New / extended tests:
- OpenAiServerConfigTest (new): builder defaults, isAuthenticationEnabled (null/empty/
  set), CORS + vision knobs, and the security contract that toString() never leaks the
  API key (only authEnabled boolean).
- OpenAiServerCli: --mmproj now asserted to flip toServerConfig().supportsVision.
- AnthropicApiSupport: system-as-blocks concatenation + stop_sequences mapping,
  tool_choice "any" -> "required", and the fixed tool_result-only branch.
- OllamaApiSupport: format-as-schema -> response_format json_schema; options.stop
  forwarding.
- ResponsesApiSupport: no-instructions (no system message), assistant message item,
  and non-function tools dropped.
- OpenAiCompatServerHttpTest: GET on the new POST routes (/v1/rerank, /v1/messages,
  /v1/responses, /api/chat, /api/generate) returns 405 (shared requirePostJson preamble).

Server + arch suite green (150 model-free tests); javadoc + spotless clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01JdLpWD8nedY7LwNnHefZLF
…ompletion/infill/generate

Completes the live end-to-end coverage of the IDE-backend surfaces. Each fixture boots
a real server over a real socket in the matching model mode, reuses a model CI already
downloads, self-skips when absent, and asserts structural shapes only:

- OpenAiServerEmbeddingsIntegrationTest (CodeLlama-7B + enableEmbedding): POST
  /v1/embeddings returns an OpenAI {object:list, data:[{object:embedding, embedding:[…]}]}
  shape; also covers the bare /embeddings alias.
- OpenAiServerRerankIntegrationTest (jina-reranker + enableReranking): POST /v1/rerank
  returns sorted {index, relevance_score} results capped by top_n, with the `data` alias.
- OpenAiServerCompletionIntegrationTest (CodeLlama-7B): POST /v1/completions, /infill, and
  Ollama /api/generate (plain + FIM via `suffix`) — CodeLlama is FIM-capable per
  LlamaModelTest#testGenerateInfill.

Also: add TestConstants.RERANKING_MODEL_PATH and route RerankingModelTest through it
(removes the duplicated literal). Used Java-8-safe idioms throughout.

These run in the same CI job that already round-trips the OpenAI chat path, so the
Ollama/Anthropic/Responses/embeddings/rerank/completion surfaces are now all validated
end-to-end against real models; only manual editor-client validation remains (TODO).

Server + arch suite green (integration fixtures self-skip without models locally);
javadoc + spotless clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01JdLpWD8nedY7LwNnHefZLF
claude added 2 commits June 19, 2026 08:20
…nsolidated server

The SonarQube "Build and analyze" job runs spotbugs at effort=Max/threshold=Low with
fb-contrib + findsecbugs, which flagged 30 Low-priority findings across the new
net.ladenthin.llama.server.* classes. Resolved as follows.

Fixed in code:
- Add Lombok @tostring to the stateful infra classes (AnthropicStreamTranslator,
  ResponsesStreamTranslator, ToolCallDeltaAccumulator, LlamaModelBackend,
  OpenAiCompatServer) — the project's established remedy for
  IMC_IMMATURE_CLASS_NO_TOSTRING (see Java8CompatibilityHelper). @tostring on the
  server is leak-safe: it renders the config via OpenAiServerConfig.toString(), which
  already redacts the API key.
- Add @EqualsAndHashCode to the immutable OpenAiServerConfig value type
  (IMC_IMMATURE_CLASS_NO_EQUALS).
- printReady(): println('[')/println(']') instead of length-1 strings (UCPM).

Suppressed (documented in spotbugs-exclude.xml) as by-design / false positive on
protocol-infrastructure code, mirroring the existing server PATH_TRAVERSAL/CRLF and
OpenAiServerCli.parse entries:
- IMPROPER_UNICODE: equalsIgnoreCase on ASCII HTTP method tokens (RFC 7230/7231).
- LEST: the UncheckedIOException -> IOException unwrap rethrows the original cause.
- WEM: input-validation precondition guards (same shape as ChatMessage.requireNonNull).
- MDM: main() blocks forever to keep the JVM alive for the daemon HTTP threads.
- NOS: per-request stream write lock passed as a parameter (not a shared field).
- MRC: sseDone()/heartbeat() are self-documenting SSE protocol-token accessors.
- PRMC: ResponsesApiSupport.dataObject() is a fresh-node factory, not cacheable.
- IMC_NO_EQUALS on the identity-managed ToolCallDeltaAccumulator.

Verified locally: mvn spotbugs:check -> BUILD SUCCESS (0 bugs); spotless clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01JdLpWD8nedY7LwNnHefZLF
OpenAiServerEmbeddingsIntegrationTest loaded CodeLlama-7B with enableEmbedding() only,
which leaves the pooling type NONE (CodeLlama's GGUF reports pooling = -1). The OpenAI
/v1/embeddings path (LlamaModel.handleEmbeddings with oaicompat=true) rejects pooling
NONE, so both test methods received HTTP 500 instead of 200 (Java Tests Ubuntu job on
PR #243).

Set .setPoolingType(PoolingType.MEAN) so CodeLlama produces a single pooled sentence
vector the OAI endpoint can return (MEAN/LAST both work for decoder-only models, per
LlamaEmbeddingsTest). The low-level LlamaModel#embed path is unaffected.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01JdLpWD8nedY7LwNnHefZLF
Comment thread src/main/java/net/ladenthin/llama/server/LlamaModelBackend.java Fixed
…p opaque-field toString

Two static-analysis findings on the new server code, both surfaced once the spotbugs fix let
the Maven build reach the sonar:sonar step for the first time:

1. SonarCloud quality gate (E Reliability on New Code, S2095 BLOCKER): the four streaming
   handlers (streamChat / streamOllamaChat / streamAnthropic / streamResponses) opened
   os = exchange.getResponseBody() and closed it via the closeQuietly() helper in finally.
   Sonar's S2095 does not trace closes through a helper, so it saw a possibly-unclosed
   resource. Fixed by closing os directly in the finally, under the per-stream write lock so
   the close still never races an in-flight heartbeat write; the closeQuietly helper is
   removed. Also moved heartbeatExecutor.scheduleAtFixedRate inside the try so a scheduling
   failure can no longer leak the stream (a real, if rare, pre-existing leak).

2. CodeQL "Use of default toString()" on LlamaModelBackend (non-blocking alert): the @tostring
   added in the previous commit rendered fields whose classes only inherit Object.toString
   (the request mapper and native model handle; HttpServer and the CORS Filter on the server).
   Dropped @tostring from LlamaModelBackend and OpenAiCompatServer — opaque-resource/service
   classes where a generated toString only emits identity hashes — and suppressed
   IMC_IMMATURE_CLASS_NO_TOSTRING for them with rationale. The translator/accumulator classes
   keep @tostring (their fields render meaningful state and are CodeQL-clean).

Verified locally: spotbugs:check -> 0 bugs; 67 server unit tests pass (incl. the 32
OpenAiCompatServerHttpTest streaming-path tests over real sockets); javadoc + spotless clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01JdLpWD8nedY7LwNnHefZLF
…rCloud S2445)

The SonarCloud quality gate kept failing with "E Reliability Rating on New Code" even after the
previous commit reworked the streaming close, and the gate was unchanged across that push — the
signal that the Blocker reliability bug was the synchronization TARGET, not the close. The
streaming write helpers synchronized on a writeLock passed as a method parameter (and, after the
last commit, on a method-local Object), which SonarCloud flags as java:S2445 "Blocks should be
synchronized on read-only fields" — a Blocker reliability bug. (fb-contrib's NOS flagged the same
code; that suppression masked it from spotbugs but Sonar evaluates it independently.)

Fix: introduce a small per-request AutoCloseable ResponseStream that owns the response
OutputStream and a private final lock, and serializes writeStrict / writeQuietly / close on that
owned lock. The four streaming handlers drive it via try-with-resources, so the stream is closed
(under the lock, after the heartbeat is cancelled) on every path — which also satisfies S2095 and
lets the now-stale NOS suppression be removed. Per-request locking is preserved (each request has
its own lock), so independent concurrent streams never serialize against each other.

Verified locally: spotbugs:check -> 0 bugs; 48 server unit tests pass incl. the 32
OpenAiCompatServerHttpTest streaming-path tests over real sockets; spotless clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01JdLpWD8nedY7LwNnHefZLF
@sonarqubecloud

Copy link
Copy Markdown

Quality Gate Failed Quality Gate failed

Failed conditions
E Reliability Rating on New Code (required ≥ A)

See analysis details on SonarQube Cloud

Catch issues before they fail your Quality Gate with our IDE extension SonarQube for IDE

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants