Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: enhance request handling with improved retry logic and logging #6205

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

magic3007
Copy link

  • Added logging for retryable errors and retry attempts in the request module.
  • Increased the maximum number of retry attempts from 3 to 5.
  • Updated the exponential backoff strategy for retries, with a multiplier of 2 and adjusted min/max wait times.
  • Increased the default timeout for requests from 10 to 30 seconds.
  • Improved error handling to log connection and timeout errors.

These changes aim to provide better visibility into request failures and enhance the robustness of the request handling process.

End-user friendly description of the problem this fixes or functionality that this introduces

  • Include this change in the Release Notes. If checked, you must provide an end-user friendly description for your change below

Give a summary of what the PR does, explaining any non-trivial design decisions


Link of any specific issues this addresses

- Added logging for retryable errors and retry attempts in the request module.
- Increased the maximum number of retry attempts from 3 to 5.
- Updated the exponential backoff strategy for retries, with a multiplier of 2 and adjusted min/max wait times.
- Increased the default timeout for requests from 10 to 30 seconds.
- Improved error handling to log connection and timeout errors.

These changes aim to provide better visibility into request failures and enhance the robustness of the request handling process.
@enyst enyst requested review from xingyaoww and rbren January 11, 2025 17:36
Copy link
Collaborator

@xingyaoww xingyaoww left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for sending the PR!

We had some relevant discussion in this thread: adding these retry in the send_request function is going to cause issue somewhere 😓
#6175 (comment)

Maybe it is better to handle retry here:

def _send_action_server_request(self, method, url, **kwargs):
try:
return super()._send_action_server_request(method, url, **kwargs)
except requests.Timeout:
self.log(
'error',
f'No response received within the timeout period for url: {url}',
)
raise
except requests.HTTPError as e:
if e.response.status_code in (404, 502):
if e.response.status_code == 404:
raise AgentRuntimeDisconnectedError(
'Runtime is not responding. This may be temporary, please try again.'
) from e
else: # 502
raise AgentRuntimeDisconnectedError(
'Runtime is temporarily unavailable. This may be due to a restart or network issue, please try again.'
) from e
elif e.response.status_code == 503:
self.log('warning', 'Runtime appears to be paused. Resuming...')
self._resume_runtime()
return super()._send_action_server_request(method, url, **kwargs)
else:
raise e

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants