
fix: Run Bedrock calls in executor for async #3884

Merged
merged 1 commit into from
Jul 16, 2024

Conversation

lou-k
Contributor

@lou-k lou-k commented Jul 10, 2024

Wraps synchronous calls in an executor for async support in Bedrock evals.

The current code makes a synchronous call in an async function. As a result, calling run_evals:

  1. Doesn't actually parallelize while using Bedrock, resulting in very low throughput.
  2. Doesn't properly display the progress bar, since all calls complete before the gather. This confuses users and makes them think something is wrong.

Proper async support in boto3 has been requested for years, so I don't think we can count on it any time soon. Wrapping in an executor ties up some threads on I/O, but still yields significantly higher throughput, which eval users (including myself) need.
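The idea can be sketched as follows. Note this is a minimal illustration, not the PR's actual diff: `_sync_bedrock_call` is a hypothetical stand-in for the blocking boto3 `invoke_model` call.

```python
import asyncio
import functools
import time


def _sync_bedrock_call(prompt: str) -> str:
    # Hypothetical stand-in for a blocking boto3 invoke_model call.
    time.sleep(0.1)  # simulates network I/O
    return f"response:{prompt}"


async def _async_bedrock_call(prompt: str) -> str:
    # Hand the blocking call to the default thread-pool executor so the
    # event loop stays free to schedule other coroutines in the meantime.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(
        None, functools.partial(_sync_bedrock_call, prompt)
    )


async def main() -> list:
    # Ten 0.1 s calls overlap in the thread pool instead of running back
    # to back, so the batch finishes far faster than a serial loop would.
    return await asyncio.gather(
        *(_async_bedrock_call(f"p{i}") for i in range(10))
    )


results = asyncio.run(main())
```

Without the executor, each `await` would block the loop for the full duration of the HTTP call, serializing the batch and stalling the progress bar exactly as described above.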

@dosubot dosubot bot added the size:XS This PR changes 0-9 lines, ignoring generated files. label Jul 10, 2024
@lou-k lou-k force-pushed the evals-bedrock-async branch 2 times, most recently from 691ffd2 to 7efc3e8 Compare July 10, 2024 22:35
@anticorrelator
Contributor

@lou-k Thanks so much for the contribution! Because we don't have comprehensive testing around our model wrappers, I was hoping we could check (or even explicitly test) the following cases before merging:

  • Does llm_classify successfully parallelize calls to boto in a script via a sync entry point?
  • Does llm_classify successfully parallelize calls to boto when we need a nested event loop (e.g. in a notebook environment)?
  • Do errors raised inside the boto call get properly propagated outside of the thread? (If we mock the boto call and have it return an error, will that error show up in the llm_classify "exceptions" column?)
  • Does this interact properly with the openinference Bedrock auto-instrumentation?

If we can confirm all of this behavior is preserved this seems like a great idea!

@lou-k
Contributor Author

lou-k commented Jul 15, 2024

@anticorrelator thanks for the feedback! I'll run these tests this week and share results.

@dosubot dosubot bot added size:S This PR changes 10-29 lines, ignoring generated files. and removed size:XS This PR changes 0-9 lines, ignoring generated files. labels Jul 16, 2024
@lou-k lou-k force-pushed the evals-bedrock-async branch 2 times, most recently from bc9ada5 to 82c3e82 Compare July 16, 2024 13:48
@dosubot dosubot bot added size:M This PR changes 30-99 lines, ignoring generated files. and removed size:S This PR changes 10-29 lines, ignoring generated files. labels Jul 16, 2024
@lou-k
Contributor Author

lou-k commented Jul 16, 2024

@anticorrelator :

> Does llm_classify successfully parallelize calls to boto in a script via a sync entry point?

Yes, confirmed locally by running the toxicity eval in a script.

> Does llm_classify successfully parallelize calls to boto when we need a nested event loop (e.g. in a notebook environment)?

Yes, confirmed locally by running the toxicity benchmark notebook.
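For context, the notebook case matters because an event loop is already running there, so `asyncio.run` would raise. A rough stdlib-only sketch of how a sync entry point might branch on that (`run_coro` and `classify` are invented names for illustration, not Phoenix's actual code; in practice the nested case is usually handled with `nest_asyncio`):

```python
import asyncio


def run_coro(coro):
    # Hypothetical helper: run a coroutine from synchronous code.
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # Plain script: no loop is running, so asyncio.run works.
        return asyncio.run(coro)
    # Inside a running loop (e.g. Jupyter) asyncio.run would raise;
    # real code typically applies nest_asyncio.apply() and calls
    # loop.run_until_complete instead (omitted to stay stdlib-only).
    raise RuntimeError("already inside an event loop")


async def classify():
    return "toxic"


result = run_coro(classify())
```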

> Do errors raised inside the boto call get properly propagated outside of the thread? (If we mock the boto call and have it return an error, will that error show up in the llm_classify "exceptions" column?)

I've added a unit test that confirms errors are thrown from the async call.
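The property being tested can be sketched like this: `run_in_executor` wraps the thread's result in a future, so an exception raised inside the thread re-raises at the `await`. (`FakeClient` and `call_model` are invented for illustration; the actual unit test may differ.)

```python
import asyncio


class FakeClient:
    # Invented stand-in for a boto3 client whose call always fails.
    def invoke_model(self):
        raise RuntimeError("boom")


async def call_model(client):
    # An exception raised in the worker thread is stored on the future
    # and re-raised here at the await, so normal try/except works.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, client.invoke_model)


try:
    asyncio.run(call_model(FakeClient()))
    propagated = None
except RuntimeError as exc:
    propagated = str(exc)
```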

> Does this interact properly with the openinference Bedrock auto-instrumentation?

Yes, but it's worth calling out that calls to claude2 don't currently work with Bedrock's auto-instrumentation, since that code assumes a prompt is in the request body, while this BedrockModel class uses the messages API for claude2.

I noticed tox is failing but I can't seem to find the error message -- can you see it?

@anticorrelator
Contributor

@lou-k Thanks for checking! Let me try running tox locally on your branch

@anticorrelator
Contributor

@lou-k looks like the import block needs formatting

tests/phoenix/evals/models/test_bedrock.py:1:1: I001 [*] Import block is un-sorted or un-formatted
Found 1 error.

If you run the ruff formatter everything should be good to go: ruff format .; ruff check . --fix;

Contributor

@anticorrelator anticorrelator left a comment


Thank you so much!

@lou-k
Contributor Author

lou-k commented Jul 16, 2024

Ahh thanks! I'm new to ruff -- I'll fix it up and squash the commits

@lou-k lou-k force-pushed the evals-bedrock-async branch from 1fe6dfb to 00bbf63 Compare July 16, 2024 15:54
@lou-k
Contributor Author

lou-k commented Jul 16, 2024

I believe this is ready to merge :highfive:

@anticorrelator anticorrelator merged commit 46e3b1c into Arize-ai:main Jul 16, 2024
6 checks passed