Currently, ansari.py is written so that it always expects litellm to return a streamed response (e.g., it relies on many "yield" statements).
We want to refactor this code so that it also supports non-streamed generation, which is needed for endpoints such as WhatsApp, where a message has to be sent as a whole and can't be streamed.
Additionally, the streaming-only design seems to be affecting the final response time for WhatsApp users. See this issue for more info.
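One possible shape for the refactor described above, as a rough sketch only: keep the generator-based path for streaming clients and add a separate path that returns the whole message in one piece. All names here (`fake_llm`, `stream_reply`, `complete_reply`) are hypothetical and are not taken from ansari.py; `fake_llm` is a stand-in for the real litellm call so that the example runs offline.

```python
from typing import Callable, Iterator, Union

# Hypothetical stand-in for the LLM backend. In the real code this is where
# litellm would be called, with or without streaming.
def fake_llm(prompt: str, stream: bool) -> Union[str, Iterator[str]]:
    words = f"Echo: {prompt}".split(" ")
    if stream:
        # Streaming mode: hand back the chunks one at a time.
        last = len(words) - 1
        return (w + (" " if i < last else "") for i, w in enumerate(words))
    # Non-streaming mode: hand back the complete text in a single value.
    return " ".join(words)

def stream_reply(prompt: str, llm: Callable = fake_llm) -> Iterator[str]:
    """Streaming path (e.g., for a web client): yield chunks as they arrive."""
    yield from llm(prompt, stream=True)

def complete_reply(prompt: str, llm: Callable = fake_llm) -> str:
    """Non-streaming path (e.g., for WhatsApp): return the message as a whole."""
    return llm(prompt, stream=False)
```

A WhatsApp handler would then call `complete_reply(...)` and send the result as a single message, while streaming endpoints would keep iterating over `stream_reply(...)`. Whether this split fits the actual structure of ansari.py is an assumption.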