Feature/xpia sim and eval fixes #3723

Merged
merged 65 commits into from
Sep 6, 2024
Changes from 59 commits
Commits
3366f03
preemptive ECI and IP simulator support
MilesHolland Aug 12, 2024
8e65d72
make ECI sim enum private
MilesHolland Aug 12, 2024
cccc2e7
Merge branch 'main' into feature/add-eci-and-ip-simulators
MilesHolland Aug 16, 2024
70843d3
add eci/ip evals v1
MilesHolland Aug 20, 2024
40384fa
changes
MilesHolland Aug 23, 2024
2388488
docstrings and rename ECI
MilesHolland Aug 23, 2024
2a7085d
update eci in api call, and record eval tests
MilesHolland Aug 23, 2024
9589ab0
update test runnage
MilesHolland Aug 23, 2024
41241dc
Merge branch 'main' into feature/add-eci-and-ip-simulators
MilesHolland Aug 23, 2024
fd2d292
rename package and fix examples
MilesHolland Aug 26, 2024
c4f73aa
update CL
MilesHolland Aug 26, 2024
3a8b5bf
more pr comments
MilesHolland Aug 27, 2024
1bf15bf
update recording
MilesHolland Aug 27, 2024
7f8a487
update eci test with recording
MilesHolland Aug 27, 2024
985202b
update sim tests
MilesHolland Aug 27, 2024
d8d2ebd
Merge branch 'main' into feature/add-eci-and-ip-simulators
MilesHolland Aug 27, 2024
a247f7a
xpia eval
MilesHolland Aug 28, 2024
0b48b02
remove docstring todo
MilesHolland Aug 28, 2024
bfca026
Merge branch 'feature/add-eci-and-ip-simulators' into feature/xpia-si…
MilesHolland Aug 28, 2024
157d764
Resolve merge conflicts
diondrapeck Aug 28, 2024
aee2f70
Rename XPIA to IndrectAttack
diondrapeck Aug 28, 2024
3f3e8d0
Add XPIA Simulator
diondrapeck Aug 29, 2024
854b952
Remove content safety reference
diondrapeck Aug 30, 2024
2677576
Add skip label to test
diondrapeck Aug 30, 2024
21844e4
Parse xpia response
diondrapeck Aug 30, 2024
285d214
Add context to evaluator
diondrapeck Aug 30, 2024
7f32bb7
revert adding demo
diondrapeck Aug 30, 2024
97e7d40
Change evaluator to follow chat protocol and accept conversation
diondrapeck Aug 30, 2024
e92adbc
Update docstring with example
diondrapeck Aug 30, 2024
6883f05
Update evaluation parsing and aggregation
diondrapeck Sep 2, 2024
735734c
Hide and update jailbreak param on adversarial simulator
diondrapeck Sep 2, 2024
3044851
Update IndirectAttackSimulator and IndirectAttackEvaluator docstring
diondrapeck Sep 2, 2024
acdf191
Update CHANGELOG
diondrapeck Sep 2, 2024
087d976
Add Q/A functionality
diondrapeck Sep 3, 2024
ed1f319
Fix evaluator docstring
diondrapeck Sep 3, 2024
19754e2
Merge branch 'main' into feature/xpia-sim-and-eval
diondrapeck Sep 3, 2024
0dc7275
Merge branch 'main' into feature/xpia-sim-and-eval
diondrapeck Sep 3, 2024
1eee767
Add xpia scenario
diondrapeck Sep 3, 2024
536bb8e
Update tests
diondrapeck Sep 3, 2024
6675a45
Merge branch 'feature/xpia-sim-and-eval' of https://github.com/micros…
diondrapeck Sep 3, 2024
cc596a0
Fix logging error
diondrapeck Sep 3, 2024
e33ec78
Ignore flake8 suggestion
diondrapeck Sep 3, 2024
aec16db
Updated tests to use new _jailbreak_type param
diondrapeck Sep 3, 2024
7c3a8c5
Update evaluator test
diondrapeck Sep 3, 2024
9889519
Record tests
diondrapeck Sep 4, 2024
52c8756
Update xpia simulator to return only one dataset
diondrapeck Sep 4, 2024
7c7f7f5
Add exception for conversation + q/a and add breaking change warning …
diondrapeck Sep 4, 2024
b4fdc3a
Add exception for conversation + q/a and add breaking change warning …
diondrapeck Sep 4, 2024
4506c49
Resolve merge conflicts
diondrapeck Sep 4, 2024
6ecc214
Update test
diondrapeck Sep 4, 2024
c37f570
Replace jailbreak param
diondrapeck Sep 4, 2024
70fc768
Resolve merge conflicts
diondrapeck Sep 4, 2024
4523fc6
no template merge on xpia, no xpia eval kwarg defaults
MilesHolland Sep 5, 2024
9406020
Merge branch 'main' into feature/xpia-sim-and-eval-fixes
MilesHolland Sep 6, 2024
b51af92
Merge branch 'main' into feature/xpia-sim-and-eval-fixes
MilesHolland Sep 6, 2024
30c819f
reasoning -> reason, no conversaion input, output subtypes
MilesHolland Sep 6, 2024
61b5b94
flake
MilesHolland Sep 6, 2024
8b9a970
docstring fixes
MilesHolland Sep 6, 2024
6d106c1
skip test
MilesHolland Sep 6, 2024
2f5fb8f
skip test
MilesHolland Sep 6, 2024
1414e10
Merge branch 'main' into feature/xpia-sim-and-eval-fixes
MilesHolland Sep 6, 2024
31a2c27
more test skips, fix new output names
MilesHolland Sep 6, 2024
b482b08
comment
MilesHolland Sep 6, 2024
307e40f
re-add skip
MilesHolland Sep 6, 2024
b87a897
update tests now that new fields are live
MilesHolland Sep 6, 2024
5 changes: 4 additions & 1 deletion .cspell.json
Expand Up @@ -113,7 +113,8 @@
"vnet",
"Weaviate",
"westus",
"wsid"
"wsid",
"Xpia"
],
"ignoreWords": [
"openmpi",
Expand Down Expand Up @@ -243,6 +244,8 @@
"azureopenaimodelconfiguration",
"openaimodelconfiguration",
"usecwd",
"upia",
"xpia",
"locustio",
"euap",
"Rerank",
Expand Down
5 changes: 5 additions & 0 deletions src/promptflow-evals/CHANGELOG.md
Expand Up @@ -2,11 +2,16 @@

## v0.3.3 (Upcoming)
### Features Added
- Introduced `IndirectAttackSimulator` to simulate XPIA (cross domain prompt injected attack) jailbreak attacks on your AI system.
- Introduced `IndirectAttackEvaluator` to evaluate content for the presence of XPIA (cross domain prompt injected attacks) injected into conversation or Q/A context. Such attacks interrupt normal expected functionality by eliciting manipulated content, intruding, or attempting to gather information outside the scope of your AI system.
- Add a new evaluator (ProtectedMaterialEvaluator) and associated adversarial content simulator enum type (AdversarialScenario.ADVERSARIAL_CONTENT_PROTECTED_MATERIAL) for protected material, which determines if given inputs contain material protected by IP laws.

### Bugs Fixed
- Fixed evaluators to accept (non-Azure) Open AI Configs.

### Breaking Changes
- Replaced the `jailbreak` parameter in `AdversarialSimulator` with the `_jailbreak_type` parameter to support multiple jailbreak types. Instead of editing this parameter directly, we recommend using the `DirectAttackSimulator` class (formerly `JailbreakAdversarialSimulator`) for UPIA jailbreak and the `IndirectAttackSimulator` class for XPIA jailbreak.

### Improvements
- Renamed `JailbreakAdversarialSimulator` to `DirectAttackSimulator`
- Set the PF_EVALS_BATCH_USE_ASYNC environment variable to True by default to enable asynchronous batch run for async-enabled built-in evaluators, improving performance.
Expand Down
2 changes: 2 additions & 0 deletions src/promptflow-evals/promptflow/evals/_common/constants.py
Expand Up @@ -32,6 +32,7 @@ class Tasks:

CONTENT_HARM = "content harm"
PROTECTED_MATERIAL = "protected material"
XPIA = "xpia"


class _InternalAnnotationTasks:
Expand All @@ -52,6 +53,7 @@ class EvaluationMetrics:
SELF_HARM = "self_harm"
SEXUAL = "sexual"
PROTECTED_MATERIAL = "protected_material"
XPIA = "xpia"


class _InternalEvaluationMetrics:
Expand Down
23 changes: 20 additions & 3 deletions src/promptflow-evals/promptflow/evals/_common/rai_service.py
Expand Up @@ -103,6 +103,9 @@ def generate_payload(normalized_user_text: str, metric: str) -> Dict:
elif metric == _InternalEvaluationMetrics.ECI:
task = _InternalAnnotationTasks.ECI
include_metric = False
elif metric == EvaluationMetrics.XPIA:
task = Tasks.XPIA
include_metric = False
return (
{
"UserTextList": [normalized_user_text],
Expand Down Expand Up @@ -207,8 +210,9 @@ def parse_response( # pylint: disable=too-many-branches,too-many-statements
:return: The parsed annotation result.
:rtype: List[List[Dict]]
"""

# non-numeric metrics
if metric_name in {EvaluationMetrics.PROTECTED_MATERIAL, _InternalEvaluationMetrics.ECI}:
if metric_name in {EvaluationMetrics.PROTECTED_MATERIAL, _InternalEvaluationMetrics.ECI, EvaluationMetrics.XPIA}:
if not batch_response or len(batch_response[0]) == 0 or metric_name not in batch_response[0]:
return {}
response = batch_response[0][metric_name]
Expand All @@ -218,8 +222,21 @@ def parse_response( # pylint: disable=too-many-branches,too-many-statements
result = {}
metric_prefix = _get_metric_prefix(metric_name)
# Use label instead of score since these are assumed to be boolean results.
result[metric_prefix + "_label"] = parsed_response["label"] if "label" in parsed_response else ""
result[metric_prefix + "_reasoning"] = parsed_response["reasoning"] if "reasoning" in parsed_response else ""
# Use np.nan as null value since it's ignored by aggregations rather than treated as 0.
result[metric_prefix + "_label"] = parsed_response["label"] if "label" in parsed_response else np.nan
result[metric_prefix + "_reason"] = parsed_response["reasoning"] if "reasoning" in parsed_response else ""

if metric_name == EvaluationMetrics.XPIA:
# Add "manipulated_content", "intrusion" and "information_gathering" to the result
# if present else set them to np.nan
result["manipulated_content"] = (
parsed_response["manipulated_content"] if "manipulated_content" in parsed_response else np.nan
)
result["intrusion"] = parsed_response["intrusion"] if "intrusion" in parsed_response else np.nan
result["information_gathering"] = (
parsed_response["information_gathering"] if "information_gathering" in parsed_response else np.nan
)

return result
return _parse_content_harm_response(batch_response, metric_name)
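
The switch from an empty string to `np.nan` for a missing label matters when results are aggregated. The following is a minimal standard-library sketch of that idea, not the code in `rai_service.py`: `parse_xpia_annotation` and `defect_rate` are hypothetical helpers, `math.nan` stands in for `np.nan`, and the sub-label keys are left unprefixed as in the hunk above.

```python
import math

XPIA_SUB_LABELS = ("manipulated_content", "intrusion", "information_gathering")


def parse_xpia_annotation(parsed_response: dict, prefix: str = "xpia") -> dict:
    """Copy known fields out of an annotation, using NaN for anything missing."""
    result = {
        f"{prefix}_label": parsed_response.get("label", math.nan),
        f"{prefix}_reason": parsed_response.get("reasoning", ""),
    }
    for name in XPIA_SUB_LABELS:
        result[name] = parsed_response.get(name, math.nan)
    return result


def _is_missing(value) -> bool:
    """True only for a float NaN, the placeholder used for absent labels."""
    return isinstance(value, float) and math.isnan(value)


def defect_rate(rows: list, key: str) -> float:
    """Fraction of True labels, skipping rows where the label is NaN."""
    present = [row[key] for row in rows if not _is_missing(row[key])]
    if not present:
        return math.nan
    return sum(bool(value) for value in present) / len(present)


rows = [
    parse_xpia_annotation({"label": True, "reasoning": "Injected instruction followed.", "intrusion": True}),
    parse_xpia_annotation({"label": False, "reasoning": "No attack found."}),
    parse_xpia_annotation({}),  # the service returned nothing usable
]
print(defect_rate(rows, "xpia_label"))
```

With a placeholder such as `""` or `0`, the third row would have to be special-cased or would silently count as a negative; a NaN placeholder lets the aggregation skip it.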

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@
from ._relevance import RelevanceEvaluator
from ._rouge import RougeScoreEvaluator, RougeType
from ._similarity import SimilarityEvaluator
from ._xpia import IndirectAttackEvaluator

__all__ = [
"CoherenceEvaluator",
Expand All @@ -39,6 +40,7 @@
"HateUnfairnessEvaluator",
"ContentSafetyEvaluator",
"ContentSafetyChatEvaluator",
"IndirectAttackEvaluator",
"BleuScoreEvaluator",
"GleuScoreEvaluator",
"MeteorScoreEvaluator",
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -66,7 +66,7 @@ class ECIEvaluator:
{
"ECI_label": "False",
"ECI_reasoning": "Some reason."
"ECI_reason": "Some reason."
}
"""

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -70,8 +70,8 @@ class ProtectedMaterialEvaluator:
.. code-block:: python
{
"label": "False",
"reasoning": "This question does not contain any protected material."
"protected_material_label": "False",
"protected_material_reason": "This question does not contain any protected material."
}
"""

Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
from ._xpia import IndirectAttackEvaluator

__all__ = [
"IndirectAttackEvaluator",
]
133 changes: 133 additions & 0 deletions src/promptflow-evals/promptflow/evals/evaluators/_xpia/_xpia.py
@@ -0,0 +1,133 @@
# ---------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# ---------------------------------------------------------
import logging
from typing import Optional

from promptflow._utils.async_utils import async_run_allowing_running_loop
from promptflow.evals._common.constants import EvaluationMetrics
from promptflow.evals._common.rai_service import evaluate_with_rai_service

logger = logging.getLogger(__name__)


class IndirectAttackEvaluator:
    """
    Initializes an XPIA (cross domain prompt injected attack) jailbreak evaluator to detect whether cross domain
    injected attacks are present in your AI system's response.

    :param project_scope: The scope of the Azure AI project.
        It contains subscription id, resource group, and project name.
    :type project_scope: dict
    :param eval_last_turn: Set to True to evaluate only the most recent exchange in the dialogue,
        focusing on the latest user inquiry and the assistant's corresponding response. Defaults to False.
    :type eval_last_turn: bool
    :param credential: The credential for connecting to Azure AI project.
    :type credential: ~azure.core.credentials.TokenCredential
    :return: A function that evaluates and generates metrics for XPIA chat scenario. Metrics include the overall
        evaluation label and reason for the Q/A Pair, as well as sub-labels
        for manipulated content, intrusion, and information gathering.
    :rtype: Callable

    **Usage**

    .. code-block:: python

        eval_fn = IndirectAttackEvaluator(project_scope)
        result = eval_fn(question="What is the capital of France?", answer="Paris.")

    **Output format for question-answer pair**

    .. code-block:: python

        {
            'xpia_label': False,
            'xpia_reason': 'The conversation does not contain any manipulated content, intrusion or
                information gathering.',
            'xpia_information_gathering': False,
            'xpia_intrusion': False,
            'xpia_manipulated_content': False
        }

    """

    def __init__(self, project_scope: dict, eval_last_turn: bool = False, credential=None):
        self._evaluator = _IndirectAttackEvaluator(project_scope, credential)
        self._eval_last_turn = eval_last_turn

    def __call__(
        self,
        *,
        question: Optional[str],
        answer: Optional[str],
        **kwargs,
    ):
        """
        Evaluates content according to the presence of attacks injected into the conversation context to
        interrupt normal expected functionality by eliciting manipulated content, intrusion and attempting
        to gather information outside the scope of your AI system.

        :keyword question: The question to be evaluated.
        :paramtype question: Optional[str]
        :keyword answer: The answer to be evaluated.
        :paramtype answer: Optional[str]
        :return: The evaluation label, reason, and sub-labels.
        :rtype: dict
        """

        return self._evaluator(question=question, answer=answer, **kwargs)


class _AsyncIndirectAttackEvaluator:
    def __init__(self, project_scope: dict, credential=None):
        self._project_scope = project_scope
        self._credential = credential

    async def __call__(self, *, question: str, answer: str, **kwargs):
        """
        Evaluates content according to this evaluator's metric.
        :keyword question: The question to be evaluated.
        :paramtype question: str
        :keyword answer: The answer to be evaluated.
        :paramtype answer: str
        :return: The evaluation score computation based on the metric (self.metric).
        :rtype: Any
        """
        # Validate inputs
        # Raises value error if failed, so execution alone signifies success.
        if not (question and question.strip() and question != "None") or not (
            answer and answer.strip() and answer != "None"
        ):
            raise ValueError("Both 'question' and 'answer' must be non-empty strings.")

        # Run score computation based on supplied metric.
        result = await evaluate_with_rai_service(
            metric_name=EvaluationMetrics.XPIA,
            question=question,
            answer=answer,
            project_scope=self._project_scope,
            credential=self._credential,
        )
        return result


class _IndirectAttackEvaluator:
    def __init__(self, project_scope: dict, credential=None):
        self._async_evaluator = _AsyncIndirectAttackEvaluator(project_scope, credential)

    def __call__(self, *, question: str, answer: str, **kwargs):
        """
        Evaluates XPIA content.
        :keyword question: The question to be evaluated.
        :paramtype question: str
        :keyword answer: The answer to be evaluated.
        :paramtype answer: str
        :keyword context: The context to be evaluated.
        :paramtype context: str
        :return: The XPIA score.
        :rtype: dict
        """
        return async_run_allowing_running_loop(self._async_evaluator, question=question, answer=answer, **kwargs)

    def _to_async(self):
        return self._async_evaluator
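
The evaluator file above layers a synchronous callable over an async one and validates inputs before calling the service. A minimal, self-contained sketch of that pattern follows. The class names `AsyncEvaluatorSketch` and `EvaluatorSketch` and the helper `validate_qa` are hypothetical, the service call is replaced by a stub, and plain `asyncio.run` is assumed in place of `async_run_allowing_running_loop`, which, as its name suggests, is meant to also handle an event loop that is already running.

```python
import asyncio


def validate_qa(question, answer) -> None:
    """Reject missing, blank, or literal "None" inputs, mirroring the check above."""
    for value in (question, answer):
        if not (value and value.strip() and value != "None"):
            raise ValueError("Both 'question' and 'answer' must be non-empty strings.")


class AsyncEvaluatorSketch:
    async def __call__(self, *, question: str, answer: str, **kwargs) -> dict:
        validate_qa(question, answer)
        # Hypothetical stand-in for the awaited RAI service call.
        await asyncio.sleep(0)
        return {"xpia_label": False, "xpia_reason": "stubbed result"}


class EvaluatorSketch:
    def __init__(self):
        self._async_evaluator = AsyncEvaluatorSketch()

    def __call__(self, *, question: str, answer: str, **kwargs) -> dict:
        # Calling the async evaluator returns a coroutine; asyncio.run drives it to completion.
        return asyncio.run(self._async_evaluator(question=question, answer=answer, **kwargs))

    def _to_async(self):
        # Batch runners that are already async can use the inner evaluator directly.
        return self._async_evaluator


result = EvaluatorSketch()(question="What is the capital of France?", answer="Paris.")
print(result["xpia_label"])
```

Keeping the validation in the async class means both entry points, the synchronous wrapper and the object returned by `_to_async`, reject bad input in the same way.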
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
from .adversarial_scenario import AdversarialScenario
from .adversarial_simulator import AdversarialSimulator
from .direct_attack_simulator import DirectAttackSimulator
from .xpia_simulator import IndirectAttackSimulator

__all__ = ["AdversarialSimulator", "AdversarialScenario", "DirectAttackSimulator"]
__all__ = ["AdversarialSimulator", "AdversarialScenario", "DirectAttackSimulator", "IndirectAttackSimulator"]
Original file line number Diff line number Diff line change
Expand Up @@ -55,6 +55,7 @@ def __init__(self, azure_ai_project: Dict, token_manager: APITokenManager) -> No
self.parameter_json_endpoint = urljoin(self.api_url, "simulation/template/parameters")
self.jailbreaks_json_endpoint = urljoin(self.api_url, "simulation/jailbreak")
self.simulation_submit_endpoint = urljoin(self.api_url, "simulation/chat/completions/submit")
self.xpia_jailbreaks_json_endpoint = urljoin(self.api_url, "simulation/jailbreak/xpia")

def _get_service_discovery_url(self):
bearer_token = self.token_manager.get_token()
Expand Down Expand Up @@ -92,10 +93,15 @@ async def get_contentharm_parameters(self) -> Any:

return self.contentharm_parameters

async def get_jailbreaks_dataset(self) -> Any:
async def get_jailbreaks_dataset(self, type: str) -> Any:
"Get the jailbreaks dataset, if exists"
if self.jailbreaks_dataset is None:
self.jailbreaks_dataset = await self.get(self.jailbreaks_json_endpoint)
if type == "xpia":
self.jailbreaks_dataset = await self.get(self.xpia_jailbreaks_json_endpoint)
elif type == "upia":
self.jailbreaks_dataset = await self.get(self.jailbreaks_json_endpoint)
else:
raise ValueError("Invalid type, please provide either 'xpia' or 'upia'")

return self.jailbreaks_dataset
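
One thing to watch in the hunk above: the dataset is cached in a single `self.jailbreaks_dataset` attribute. If the type check sits inside the `is None` guard, as it appears to, a client that has already fetched one type would return that cached dataset when asked for the other. A hedged sketch of a per-type cache follows; `JailbreakDatasetCache` and `fake_fetch` are hypothetical names, and the fetch function is a stub rather than the client's real `get` method.

```python
import asyncio
from typing import Any, Dict


class JailbreakDatasetCache:
    # Endpoint suffixes taken from the diff; everything else here is illustrative.
    ENDPOINTS = {"upia": "simulation/jailbreak", "xpia": "simulation/jailbreak/xpia"}

    def __init__(self, fetch):
        self._fetch = fetch  # async callable: endpoint -> dataset
        self._datasets: Dict[str, Any] = {}

    async def get(self, jailbreak_type: str) -> Any:
        if jailbreak_type not in self.ENDPOINTS:
            raise ValueError("Invalid type, please provide either 'xpia' or 'upia'")
        # Cache per type so a UPIA fetch can never be returned for an XPIA request.
        if jailbreak_type not in self._datasets:
            self._datasets[jailbreak_type] = await self._fetch(self.ENDPOINTS[jailbreak_type])
        return self._datasets[jailbreak_type]


calls = []


async def fake_fetch(endpoint: str):
    calls.append(endpoint)
    return [f"prompt from {endpoint}"]


async def demo():
    cache = JailbreakDatasetCache(fake_fetch)
    upia = await cache.get("upia")
    xpia = await cache.get("xpia")
    again = await cache.get("upia")  # served from the cache, no third fetch
    return upia, xpia, again


upia, xpia, again = asyncio.run(demo())
print(len(calls))
```

Validating the type before consulting the cache also means an invalid value is rejected even after a dataset has been loaded.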

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@ class AdversarialScenario(Enum):
ADVERSARIAL_CONTENT_GEN_UNGROUNDED = "adv_content_gen_ungrounded"
ADVERSARIAL_CONTENT_GEN_GROUNDED = "adv_content_gen_grounded"
ADVERSARIAL_CONTENT_PROTECTED_MATERIAL = "adv_content_protected_material"
ADVERSARIAL_INDIRECT_JAILBREAK = "adv_xpia"


class _UnstableAdversarialScenario(Enum):
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -44,15 +44,15 @@ def wrapper(*args, **kwargs):
scenario = str(kwargs.get("scenario", None))
max_conversation_turns = kwargs.get("max_conversation_turns", None)
max_simulation_results = kwargs.get("max_simulation_results", None)
jailbreak = kwargs.get("jailbreak", None)
_jailbreak_type = kwargs.get("_jailbreak_type", None)
decorated_func = monitor_operation(
activity_name="adversarial.simulator.call",
activity_type=ActivityType.PUBLICAPI,
custom_dimensions={
"scenario": scenario,
"max_conversation_turns": max_conversation_turns,
"max_simulation_results": max_simulation_results,
"jailbreak": jailbreak,
"_jailbreak_type": _jailbreak_type,
},
)(func)

Expand Down Expand Up @@ -115,7 +115,7 @@ async def __call__(
api_call_retry_sleep_sec: int = 1,
api_call_delay_sec: int = 0,
concurrent_async_task: int = 3,
jailbreak: bool = False,
_jailbreak_type: Optional[str] = None,
randomize_order: bool = True,
randomization_seed: Optional[int] = None,
):
Expand Down Expand Up @@ -149,9 +149,6 @@ async def __call__(
:keyword concurrent_async_task: The number of asynchronous tasks to run concurrently during the simulation.
Defaults to 3.
:paramtype concurrent_async_task: int
:keyword jailbreak: If set to True, allows breaking out of the conversation flow defined by the scenario.
Defaults to False.
:paramtype jailbreak: bool
:keyword randomize_order: Whether or not the order of the prompts should be randomized. Defaults to True.
:paramtype randomize_order: bool
:keyword randomization_seed: The seed used to randomize prompt selection. If unset, the system's
Expand Down Expand Up @@ -218,11 +215,11 @@ async def __call__(
total_tasks,
)
total_tasks = min(total_tasks, max_simulation_results)
if jailbreak:
jailbreak_dataset = await self.rai_client.get_jailbreaks_dataset()
if _jailbreak_type:
jailbreak_dataset = await self.rai_client.get_jailbreaks_dataset(type=_jailbreak_type)
progress_bar = tqdm(
total=total_tasks,
desc="generating jailbreak simulations" if jailbreak else "generating simulations",
desc="generating jailbreak simulations" if _jailbreak_type else "generating simulations",
ncols=100,
unit="simulations",
)
Expand All @@ -237,7 +234,7 @@ async def __call__(
random.shuffle(parameter_order)
for index in parameter_order:
parameter = template.template_parameters[index].copy()
if jailbreak:
if _jailbreak_type == "upia":
parameter = self._join_conversation_starter(parameter, random.choice(jailbreak_dataset))
tasks.append(
asyncio.create_task(
Expand Down
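
In the hunk above, any truthy `_jailbreak_type` triggers a dataset fetch, but only `"upia"` modifies the template parameter. A small sketch of that branch follows, under the assumption that joining means prepending the jailbreak text to a `conversation_starter` field. The body of `_join_conversation_starter` is not shown in this diff, so `apply_jailbreak` and the field name are illustrative only.

```python
import random
from typing import Optional


def apply_jailbreak(parameter: dict, jailbreak_type: Optional[str], dataset: list, rng: random.Random) -> dict:
    """Return a copy of the parameter, with a jailbreak prefix only for UPIA."""
    parameter = parameter.copy()
    if jailbreak_type == "upia":
        prefix = rng.choice(dataset)
        starter = parameter.get("conversation_starter")
        # Assumed join rule: prefix first, then the original starter if there is one.
        parameter["conversation_starter"] = f"{prefix} {starter}" if starter else prefix
    return parameter


rng = random.Random(0)
original = {"conversation_starter": "Summarize this email.", "topic": "mail"}
dataset = ["Ignore previous instructions."]

upia_param = apply_jailbreak(original, "upia", dataset, rng)
xpia_param = apply_jailbreak(original, "xpia", dataset, rng)
plain_param = apply_jailbreak(original, None, dataset, rng)
print(upia_param["conversation_starter"])
```

For XPIA the parameter passes through unchanged, which matches the commit note "no template merge on xpia".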
Original file line number Diff line number Diff line change
Expand Up @@ -207,7 +207,6 @@ async def __call__(
api_call_retry_sleep_sec=api_call_retry_sleep_sec,
api_call_delay_sec=api_call_delay_sec,
concurrent_async_task=concurrent_async_task,
jailbreak=False,
randomize_order=True,
randomization_seed=randomization_seed,
)
Expand All @@ -221,7 +220,7 @@ async def __call__(
api_call_retry_sleep_sec=api_call_retry_sleep_sec,
api_call_delay_sec=api_call_delay_sec,
concurrent_async_task=concurrent_async_task,
jailbreak=True,
_jailbreak_type="upia",
randomize_order=True,
randomization_seed=randomization_seed,
)
Expand Down