Feature/xpia sim and eval fixes #3723

Merged: 65 commits merged into main from feature/xpia-sim-and-eval-fixes on Sep 6, 2024

MilesHolland (Member) commented Sep 6, 2024

A fork of the original XPIA sim/eval branch with additional fixes for bugs discovered last night.

  • Changes the jailbreak check for combining templates to account only for UPIA, since XPIA doesn't merge templates.
  • Removes conversations as an input for XPIA evals until the default override bug is fixed.
  • Accounts for the new XPIA evaluator return fields.
  • Changes the output base name of 'reasoning' fields for label-based evaluators to just 'reason'.
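
For reviewers who want a concrete picture of the first and last bullets, here is a rough sketch of the intended behavior. It is not the code in this PR: `should_merge_templates`, `label_outputs`, the `"upia"`/`"xpia"` string values, and the `_label` key suffix are all hypothetical names chosen for illustration. Only the `reason` base name comes from the description above.

```python
from typing import Dict, Union


def should_merge_templates(jailbreak_kind: str) -> bool:
    """Hypothetical check: only UPIA combines templates; XPIA does not."""
    return jailbreak_kind.lower() == "upia"


def label_outputs(metric: str, label: bool, reasoning: str) -> Dict[str, Union[bool, str]]:
    """Hypothetical output builder for a label-based evaluator.

    The explanation is emitted under '<metric>_reason' rather than
    '<metric>_reasoning'. The '_label' suffix is an assumption.
    """
    return {
        f"{metric}_label": label,
        f"{metric}_reason": reasoning,
    }
```

Under these assumptions, a caller that previously read a `..._reasoning` key would need to read `..._reason` instead.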

Original PR: #3703

github-actions bot commented Sep 6, 2024

promptflow-evals test result

 12 files   12 suites   1h 37m 44s ⏱️
 19 tests  12 ✅  7 💤 0 ❌
228 runs  144 ✅ 84 💤 0 ❌

Results for commit b87a897.

♻️ This comment has been updated with latest results.

ninghu previously approved these changes Sep 6, 2024
luigiw previously approved these changes Sep 6, 2024
MilesHolland dismissed stale reviews from luigiw and ninghu via 6d106c1 September 6, 2024 19:30
ninghu previously approved these changes Sep 6, 2024
ninghu previously approved these changes Sep 6, 2024
MilesHolland merged commit b04e889 into main Sep 6, 2024
77 checks passed
MilesHolland deleted the feature/xpia-sim-and-eval-fixes branch September 6, 2024 21:48
4 participants