
fix: Run Bedrock calls in executor for async #3884

Merged
merged 1 commit into from
Jul 16, 2024

Conversation

lou-k
Contributor

@lou-k lou-k commented Jul 10, 2024

Wraps synchronous calls in an executor for async support in Bedrock evals.

The current code makes a synchronous call in an async function. As a result, calling run_evals:

  1. Doesn't actually parallelize while using Bedrock, resulting in very low throughput.
  2. Doesn't properly display the progress bar, since all calls complete before the gather. This confuses users and makes them think something is wrong.

Proper async support in boto3 has been requested for years, so I don't think we can count on it any time soon. Wrapping in an executor ties up some threads on I/O, but still yields significantly higher throughput, which eval users (including myself) need.
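The idea can be sketched as follows. Note this is a minimal illustration, not the PR's actual diff: `_sync_bedrock_call` is a hypothetical stand-in for the blocking boto3 `invoke_model` call.

```python
import asyncio
import functools
import time


def _sync_bedrock_call(prompt: str) -> str:
    # Hypothetical stand-in for a blocking boto3 invoke_model call.
    time.sleep(0.1)  # simulates network I/O
    return f"response:{prompt}"


async def _async_bedrock_call(prompt: str) -> str:
    # Hand the blocking call to the default thread-pool executor so the
    # event loop stays free to schedule other coroutines in the meantime.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(
        None, functools.partial(_sync_bedrock_call, prompt)
    )


async def main() -> list:
    # Ten 0.1 s calls overlap in the thread pool instead of running back
    # to back, so the batch finishes far faster than a serial loop would.
    return await asyncio.gather(
        *(_async_bedrock_call(f"p{i}") for i in range(10))
    )


results = asyncio.run(main())
```

Without the executor, each `await` would block the loop for the full duration of the HTTP call, serializing the batch and stalling the progress bar exactly as described above.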

@dosubot dosubot bot added the size:XS This PR changes 0-9 lines, ignoring generated files. label Jul 10, 2024
@lou-k lou-k force-pushed the evals-bedrock-async branch 2 times, most recently from 691ffd2 to 7efc3e8 Compare July 10, 2024 22:35
@anticorrelator
Contributor

@lou-k Thanks so much for the contribution! Because we don't have comprehensive testing around our model wrappers, I was hoping we could check (or even explicitly test) the following cases before merging:

  • Does llm_classify successfully parallelize calls to boto in a script via a sync entry point?
  • Does llm_classify successfully parallelize calls to boto when we need a nested event loop (e.g. in a notebook environment)?
  • Do errors raised inside the boto call get properly propagated outside of the thread? (If we mock the boto call and have it return an error, will that error show up in the llm_classify "exceptions" column?)
  • Does this interact properly with the openinference Bedrock auto-instrumentation?

If we can confirm all of this behavior is preserved this seems like a great idea!

@lou-k
Contributor Author

lou-k commented Jul 15, 2024

@anticorrelator thanks for the feedback! I'll run these tests this week and share results.

@dosubot dosubot bot added size:S This PR changes 10-29 lines, ignoring generated files. and removed size:XS This PR changes 0-9 lines, ignoring generated files. labels Jul 16, 2024
@lou-k lou-k force-pushed the evals-bedrock-async branch 2 times, most recently from bc9ada5 to 82c3e82 Compare July 16, 2024 13:48
@dosubot dosubot bot added size:M This PR changes 30-99 lines, ignoring generated files. and removed size:S This PR changes 10-29 lines, ignoring generated files. labels Jul 16, 2024
@lou-k
Contributor Author

lou-k commented Jul 16, 2024

@anticorrelator :

> Does llm_classify successfully parallelize calls to boto in a script via a sync entry point?

Yes, confirmed locally by running the toxicity eval in a script.

> Does llm_classify successfully parallelize calls to boto when we need a nested event loop (e.g. in a notebook environment)?

Yes, confirmed locally by running the toxicity benchmark notebook.
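For context, the notebook case matters because an event loop is already running there, so `asyncio.run` would raise. A rough stdlib-only sketch of how a sync entry point might branch on that (`run_coro` and `classify` are invented names for illustration, not Phoenix's actual code; in practice the nested case is usually handled with `nest_asyncio`):

```python
import asyncio


def run_coro(coro):
    # Hypothetical helper: run a coroutine from synchronous code.
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # Plain script: no loop is running, so asyncio.run works.
        return asyncio.run(coro)
    # Inside a running loop (e.g. Jupyter) asyncio.run would raise;
    # real code typically applies nest_asyncio.apply() and calls
    # loop.run_until_complete instead (omitted to stay stdlib-only).
    raise RuntimeError("already inside an event loop")


async def classify():
    return "toxic"


result = run_coro(classify())
```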

> Do errors raised inside the boto call get properly propagated outside of the thread? (If we mock the boto call and have it return an error, will that error show up in the llm_classify "exceptions" column?)

I've added a unit test that confirms errors are thrown from the async call.
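The property being tested can be sketched like this: `run_in_executor` wraps the thread's result in a future, so an exception raised inside the thread re-raises at the `await`. (`FakeClient` and `call_model` are invented for illustration; the actual unit test may differ.)

```python
import asyncio


class FakeClient:
    # Invented stand-in for a boto3 client whose call always fails.
    def invoke_model(self):
        raise RuntimeError("boom")


async def call_model(client):
    # An exception raised in the worker thread is stored on the future
    # and re-raised here at the await, so normal try/except works.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, client.invoke_model)


try:
    asyncio.run(call_model(FakeClient()))
    propagated = None
except RuntimeError as exc:
    propagated = str(exc)
```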

> Does this interact properly with the openinference Bedrock auto-instrumentation?

Yes, but it's worth calling out that calls to claude2 don't currently work with Bedrock's auto-instrumentation, since that code assumes a prompt is in the request body, while this BedrockModel class uses the messages API for claude2.

I noticed tox is failing but I can't seem to find the error message -- can you see it?

@anticorrelator
Contributor

@lou-k Thanks for checking! Let me try running tox locally on your branch

@anticorrelator
Contributor

@lou-k looks like the import block needs formatting

tests/phoenix/evals/models/test_bedrock.py:1:1: I001 [*] Import block is un-sorted or un-formatted
Found 1 error.

If you run the ruff formatter everything should be good to go: ruff format .; ruff check . --fix;

Contributor

@anticorrelator anticorrelator left a comment


Thank you so much!

@lou-k
Contributor Author

lou-k commented Jul 16, 2024

Ahh thanks! I'm new to ruff -- I'll fix it up and squash the commits

@lou-k lou-k force-pushed the evals-bedrock-async branch from 1fe6dfb to 00bbf63 Compare July 16, 2024 15:54
@lou-k
Contributor Author

lou-k commented Jul 16, 2024

I believe this is ready to merge :highfive:

@anticorrelator anticorrelator merged commit 46e3b1c into Arize-ai:main Jul 16, 2024
6 checks passed