Retry OpenAI API on internal server error #1226

Merged: sscargal merged 3 commits into MemMachine:main from edwinyyyu:transient_openai_embedder on Mar 27, 2026

Conversation

@edwinyyyu

Contributor

Purpose of the change

The OpenAI API intermittently returned internal server errors today.

Description

Add openai.InternalServerError to the retryable exception types for OpenAI API calls.

Type of change

[Please delete options that are not relevant.]

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Refactor (does not change functionality, e.g., code style improvements, linting)
  • Documentation update
  • Project Maintenance (updates to build scripts, CI, etc., that do not affect the main project)
  • Security (improves security without changing functionality)

How Has This Been Tested?

  • Unit Test
  • Integration Test
  • End-to-end Test
  • Test Script (please provide)
  • Manual verification (list step-by-step instructions)

Checklist

  • I have signed the commit(s) within this pull request
  • My code follows the style guidelines of this project (See STYLE_GUIDE.md)
  • I have performed a self-review of my own code
  • I have commented my code
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added unit tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules
  • I have checked my code and corrected any misspellings

Maintainer Checklist

  • Confirmed all checks passed
  • Contributor has signed the commit(s)
  • Reviewed the code
  • Run, Tested, and Verified the change(s) work as expected

Signed-off-by: Edwin Yu <edwinyyyu@gmail.com>
Signed-off-by: Edwin Yu <edwinyyyu@gmail.com>

Copilot AI left a comment

Contributor

Pull request overview

Adds openai.InternalServerError to the set of retryable OpenAI SDK exceptions, addressing intermittent 5xx failures by allowing the existing retry/backoff logic to kick in.

Changes:

  • Treat openai.InternalServerError as retryable for the OpenAI Responses language model.
  • Treat openai.InternalServerError as retryable for the OpenAI Chat Completions language model.
  • Treat openai.InternalServerError as retryable for the OpenAI embedder chunk-cluster embedding calls.
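The retry behavior these changes rely on can be sketched roughly as follows. This is a minimal illustration, not the project's actual retry code: `call_with_retry`, its parameters, and the delay schedule are hypothetical, and the exception classes are local stand-ins for `openai.RateLimitError`, `openai.APITimeoutError`, `openai.APIConnectionError`, and `openai.InternalServerError` so that the sketch runs without the OpenAI SDK installed.

```python
import time


# Stand-in exception classes (assumption: used only so the sketch runs without
# the OpenAI SDK). Real code would catch the openai.* classes of the same names.
class RateLimitError(Exception): ...
class APITimeoutError(Exception): ...
class APIConnectionError(Exception): ...
class InternalServerError(Exception): ...


RETRYABLE_ERRORS = (
    RateLimitError,
    APITimeoutError,
    APIConnectionError,
    InternalServerError,  # the type this PR adds to the retryable set
)


def call_with_retry(request, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call request(); on a retryable error, back off exponentially and retry.

    Hypothetical helper: name, parameters, and schedule are assumptions.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except RETRYABLE_ERRORS:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            sleep(base_delay * 2 ** (attempt - 1))  # base_delay, then 2x, 4x, ...


# Usage: a request that fails twice with a simulated 500, then succeeds.
calls = {"count": 0}


def flaky_request():
    calls["count"] += 1
    if calls["count"] < 3:
        raise InternalServerError("simulated 500")
    return "ok"


# Sleeping is stubbed out here so the example finishes immediately.
result = call_with_retry(flaky_request, sleep=lambda seconds: None)
```

Capping the number of attempts matters because a 5xx response is not always transient; once the attempts are exhausted, the sketch re-raises the last error instead of looping.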

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| packages/server/src/memmachine_server/common/language_model/openai_responses_language_model.py | Adds InternalServerError to the retryable exception tuple for response generation. |
| packages/server/src/memmachine_server/common/language_model/openai_chat_completions_language_model.py | Adds InternalServerError to the retryable exception tuple for chat completions generation. |
| packages/server/src/memmachine_server/common/embedder/openai_embedder.py | Adds InternalServerError to the retryable exception tuple for embedding requests. |

Comment on lines 264 to +267
openai.RateLimitError,
openai.APITimeoutError,
openai.APIConnectionError,
openai.InternalServerError,
Comment on lines 221 to +225
except (
openai.RateLimitError,
openai.APITimeoutError,
openai.APIConnectionError,
openai.InternalServerError,
Comment on lines 241 to +245
except (
openai.RateLimitError,
openai.APITimeoutError,
openai.APIConnectionError,
openai.InternalServerError,

@sscargal sscargal left a comment

Contributor

LGTM. Unit tests for this could be handled separately.

openai.RateLimitError,
openai.APITimeoutError,
openai.APIConnectionError,
openai.InternalServerError,

Contributor

Is it possible to put all the retryable exceptions in a common place instead of hardcoding them everywhere?
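One possible shape for this suggestion, sketched under assumptions: the exception names live in a single shared tuple, and each call site resolves them into an `except`-ready tuple. `RETRYABLE_OPENAI_ERROR_NAMES`, `retryable_openai_errors`, and `FakeOpenAI` are hypothetical names that do not appear in the PR. The stand-in class replaces the `openai` module only so the sketch runs without the SDK; with the SDK available, a plain module-level tuple of `openai.*` classes would likely be simpler.

```python
# Hypothetical shared definition: the one place that lists retryable error names.
RETRYABLE_OPENAI_ERROR_NAMES = (
    "RateLimitError",
    "APITimeoutError",
    "APIConnectionError",
    "InternalServerError",
)


def retryable_openai_errors(sdk):
    """Resolve the shared names on the given SDK object into an except-ready tuple."""
    return tuple(getattr(sdk, name) for name in RETRYABLE_OPENAI_ERROR_NAMES)


# Stand-in for the `openai` module (assumption: used only so this runs without
# the SDK). Real call sites would pass the imported `openai` module instead.
class FakeOpenAI:
    class RateLimitError(Exception): ...
    class APITimeoutError(Exception): ...
    class APIConnectionError(Exception): ...
    class InternalServerError(Exception): ...


RETRYABLE = retryable_openai_errors(FakeOpenAI)

# Each call site catches the shared tuple instead of repeating the four names.
try:
    raise FakeOpenAI.InternalServerError("simulated 500")
except RETRYABLE as error:
    caught = type(error).__name__
```

With a single definition, adding or removing a retryable type becomes a one-line change rather than an edit in each of the three files.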

@edwinyyyu

edwinyyyu commented Mar 17, 2026

Contributor Author

There is a reproducible InternalServerError where retry won't help:
https://community.openai.com/t/special-tokens-cause-500-on-text-embedding-3-small-but-not-other-models/1377013

It appears they have rolled out changes gradually that make it no longer intermittent, but consistent.

@edwinyyyu edwinyyyu changed the title from "Retry OpenAI embeddings on internal server error" to "Retry OpenAI API on internal server error" Mar 17, 2026
Signed-off-by: Edwin Yu <edwinyyyu@gmail.com>
@sscargal sscargal added this to the v0.3.3 milestone Mar 17, 2026
@sscargal sscargal merged commit 0a7662d into MemMachine:main Mar 27, 2026
43 of 44 checks passed

5 participants