Enabled native function calling for O1 + added support for the reasoning_effort option in the config. #6256
base: main
Conversation
@@ -71,6 +71,7 @@
    'claude-3-5-haiku-20241022',
    'gpt-4o-mini',
    'gpt-4o',
    'o1',
Have we tried without native function calling, to compare results with it enabled versus disabled (the prompting-based replacement)?
Just to note, strictly speaking, using native function calling is already supported; it's just not enabled by default. There's a native_function_calling setting to enable it.
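For context, a minimal sketch of what enabling that setting could look like in config.toml; the section name and exact key spelling here are assumptions based on this comment, not verified against the docs:

```toml
# Hypothetical config.toml sketch (section and key names assumed from the comment above)
[llm]
model = "o1"
# Opt in to native tool calling instead of the prompting-based replacement
native_function_calling = true
```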
With native function calling the model solves 48% of the issues; with simulated function calling, 30%.
I will make the results available soon; I still need to finish running SWE-Bench Verified (the result above is preliminary, after running 300/500 issues).
That's a good result! I'm surprised; I'm losing track of our current evals, and I thought it was much lower last time.
When using the current simulated tools from OH, O1's performance degrades significantly. It is quite interesting because 4o's performance is not impacted as much (19% vs. 12%).
That makes sense to me actually! We have seen significant differences before. That might even include Sonnet 3.5; I just don't think we know for sure why, because when it jumped from something like ~26% to over 50%, three things happened:
- switched from simulated "actions" to native tool calling
- also redefined the prompts/tools very very close to Anthropic's tools
- also went from Sonnet 3.5 (old) to Sonnet 3.5 (new) 😂
I'm not sure that we know which factor mattered how much on that one. 😅
Are these preliminary results on this branch, or on the supervisor branch?
Interesting! O1_native_tool_calls gets a higher score than Sonnet 3.5 (but not way higher, and in no way enough to justify its price), so being close to Anthropic's tools might matter, but not that much.
The results will be shared today on Hugging Face; I am currently evaluating them using the harness.
The supervisor branch will be done soon, but I will run the experiments first and then update the branch before or after the ICML deadline (30 Jan), depending on how much work I have left 😅
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
Thank you!
@AlexCuadron An alternative way to implement this in llm.py is to set the new kwarg directly in the partial function, along with the other kwargs we know at init time; then it all works the same. But I'm fine with the current PR implementation too. I haven't tested it, but if you're happy with it, we can merge it?
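For reference, a rough sketch of that alternative; the function and variable names are illustrative assumptions rather than the actual llm.py code, and it assumes the installed litellm version forwards reasoning_effort through to OpenAI:

```python
from functools import partial

import litellm


def build_completion(model: str, api_key: str, reasoning_effort: str | None = None):
    """Hypothetical sketch: bind all init-time kwargs, including the new
    reasoning_effort, directly into the partial so later calls pick them up."""
    kwargs = {"model": model, "api_key": api_key}
    # Only attach reasoning_effort for models that accept it (the o1 family);
    # this gating condition is an illustrative assumption.
    if reasoning_effort is not None and model.startswith("o1"):
        kwargs["reasoning_effort"] = reasoning_effort
    return partial(litellm.completion, **kwargs)


# Usage sketch:
# completion = build_completion("o1", api_key="...", reasoning_effort="high")
# response = completion(messages=[{"role": "user", "content": "Hello"}])
```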
Thanks for the heads up! I tested 4o and o1 and both work without any issue. I can merge it after the tests are completed.
End-user friendly description of the problem this fixes or functionality that this introduces
The reasoning_effort parameter can be defined for the o1 family!
Give a summary of what the PR does, explaining any non-trivial design decisions
Added support for native function calling for O1 and added support for specifying the reasoning_effort in the configuration file.
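As a rough illustration of the config-file side (the section and key names are assumptions for illustration, not taken from the PR diff), the reasoning_effort value would sit alongside the model selection:

```toml
# Hypothetical config.toml sketch; key names are assumptions for illustration
[llm]
model = "o1"
# OpenAI's o1 family accepts "low", "medium", or "high"
reasoning_effort = "high"
```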
Link of any specific issues this addresses