Tutorial on Continuous Monitoring using ML Pipelines (#3644)
# Description

A tutorial about creating a pipeline for continuous monitoring. It describes an advanced use case of [running flows in ML pipelines](https://github.com/microsoft/promptflow/tree/main/examples/tutorials/run-flow-with-pipeline).

# All Promptflow Contribution checklist:
- [x] **The pull request does not introduce [breaking changes].**
- [x] **CHANGELOG is updated for new features, bug fixes or other significant changes.**
- [x] **I have read the [contribution guidelines](https://github.com/microsoft/promptflow/blob/main/CONTRIBUTING.md).**
- [x] **I confirm that all new dependencies are compatible with the MIT license.**
- [x] **Create an issue and link to the pull request to get dedicated review from promptflow team. Learn more: [suggested workflow](../CONTRIBUTING.md#suggested-workflow).**

## General Guidelines and Best Practices
- [x] Title of the pull request is clear and informative.
- [x] There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, [see this page](https://github.com/Azure/azure-powershell/blob/master/documentation/development-docs/cleaning-up-commits.md).

### Testing Guidelines
- [ ] Pull request includes test coverage for the included changes.

Co-authored-by: Philip Gao <yigao@microsoft.com>
Co-authored-by: Brynn Yin <24237253+brynn-code@users.noreply.github.com>
1 parent 027dbd9 · commit f9936e5
Showing 15 changed files with 1,023 additions and 0 deletions.
69 changes: 69 additions & 0 deletions
examples/flows/integrations/continuous-monitoring-with-pipeline/README.md
# Continuous Monitoring Pipeline

This tutorial describes an advanced use case of [running flows in Azure ML Pipelines](https://github.com/microsoft/promptflow/blob/main/examples/tutorials/run-flow-with-pipeline/pipeline.ipynb).
Detailed explanations of the prerequisites and principles can be found in the aforementioned article.
Continuous monitoring is necessary to maintain the quality, performance and efficiency of Generative AI applications.
These factors directly impact the user experience and operational costs.

We will run evaluations on a basic chatbot flow, then aggregate the results to export and visualize the metrics.
The flows used in this pipeline are described below:
- [Basic Chat](https://github.com/microsoft/promptflow/tree/main/examples/flows/chat/chat-basic)
- [Q&A Evaluation](https://github.com/microsoft/promptflow/tree/main/examples/flows/evaluation/eval-qna-rag-metrics)
- [Perceived Intelligence Evaluation](https://github.com/microsoft/promptflow/tree/main/examples/flows/evaluation/eval-perceived-intelligence)
- [Summarization Evaluation](https://github.com/microsoft/promptflow/tree/main/examples/flows/evaluation/eval-summarization)

Connections used in this flow:
- `azure_open_ai_connection` connection (Azure OpenAI).

## Prerequisites

### Prompt flow SDK:
- Azure cloud setup:
  - An Azure account with an active subscription - [Create an account for free](https://azure.microsoft.com/free/?WT.mc_id=A261C142F)
  - Create an Azure ML resource from the Azure portal - [Create an Azure ML workspace](https://ms.portal.azure.com/#view/Microsoft_Azure_Marketplace/MarketplaceOffersBlade/searchQuery/machine%20learning)
  - Connect to your workspace, then set up a basic compute cluster - [Configure workspace](https://github.com/microsoft/promptflow/blob/main/examples/configuration.ipynb)
- Local environment setup:
  - A Python environment
  - The Azure Machine Learning Python SDK v2 installed - [install instructions](https://github.com/microsoft/promptflow/blob/main/examples/README.md): check the getting started section and make sure the version of `azure-ai-ml` is higher than `1.12.0`

Note: when using the Prompt flow SDK, it may also be useful to install the [`Prompt flow for VS Code`](https://marketplace.visualstudio.com/items?itemName=prompt-flow.prompt-flow) extension (if using VS Code).
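
As a quick sanity check of the local setup, a short snippet along these lines can confirm the `azure-ai-ml` version and the workspace connection. This is a minimal sketch: it assumes a `config.json` downloaded from your workspace and the use of `DefaultAzureCredential`.

```python
from importlib.metadata import version

from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# azure-ai-ml must be recent enough to load prompt flow flows as pipeline components.
print("azure-ai-ml version:", version("azure-ai-ml"))

# Connect to the workspace; expects a config.json (downloadable from the Azure ML portal)
# in the current directory or a parent directory.
ml_client = MLClient.from_config(credential=DefaultAzureCredential())
print("Connected to workspace:", ml_client.workspace_name)
```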

### Azure AI/ML Studio:
Start a compute session, then follow the installation steps described in the notebook.

## Setup connections
Ensure that you have a connection to Azure OpenAI with the following deployments (a sketch for creating the connection locally follows this list):
- `gpt-35-turbo`
- `gpt-4`
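
If the connection does not exist yet, it can be created with the Prompt flow SDK. A minimal sketch, assuming the local `PFClient` and placeholder endpoint/key values that you must replace:

```python
from promptflow.client import PFClient
from promptflow.entities import AzureOpenAIConnection

pf = PFClient()

# Placeholder values: replace api_base and api_key with your Azure OpenAI resource details.
connection = AzureOpenAIConnection(
    name="azure_open_ai_connection",
    api_base="https://<your-resource-name>.openai.azure.com/",
    api_key="<your-api-key>",
)
pf.connections.create_or_update(connection)
print(pf.connections.get("azure_open_ai_connection"))
```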

## Run pipeline

Run the notebook's steps until `3.2.2 Submit the job` to start the pipeline in Azure ML Studio.
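
For orientation, the submission step follows the same pattern as the linked run-flow-with-pipeline tutorial. The sketch below is not a substitute for the notebook (which defines the full pipeline with all evaluation nodes); the flow path, dataset path, column mapping, compute name and output name are illustrative assumptions.

```python
from azure.ai.ml import Input, MLClient, dsl, load_component
from azure.ai.ml.constants import AssetTypes
from azure.identity import DefaultAzureCredential

ml_client = MLClient.from_config(credential=DefaultAzureCredential())

# A prompt flow can be loaded directly as a pipeline component from its flow.dag.yaml
# (path is illustrative; point it at the chat flow used in this tutorial).
chat_flow = load_component("../../../chat/chat-basic/flow.dag.yaml")


@dsl.pipeline()
def monitoring_pipeline(eval_data):
    # Flow components expose a `data` input; column-mapping keyword arguments such as
    # question="${data.question}" depend on the columns of the evaluation dataset.
    chat_node = chat_flow(data=eval_data, question="${data.question}")
    # `flow_outputs` is the assumed name of the flow component's output folder.
    return {"chat_output": chat_node.outputs.flow_outputs}


job = monitoring_pipeline(
    eval_data=Input(path="./monitoring/data/eval_data.jsonl", type=AssetTypes.URI_FILE)  # illustrative path
)
job.settings.default_compute = "cpu-cluster"  # assumes a compute cluster with this name
ml_client.jobs.create_or_update(job, experiment_name="continuous-monitoring")
```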

## Pipeline description
The first node reads the evaluation dataset.
The second node is the main flow that will be monitored; it takes the output of the evaluation dataset as a `data` input.
After the main flow's node has completed, its output goes to three nodes:
- Q&A Evaluation
- Perceived Intelligence Evaluation
- Simple Summarization

The outputs of the Simple Summarization node and the main flow node then become the Summarization Evaluation node's inputs.

Finally, all the evaluation metrics are aggregated and displayed in Azure ML Pipeline's interface.

![continuous_monitoring_pipeline.png](./monitoring/media/continuous_monitoring_pipeline.png)

## Metrics visualization
The aggregated metrics are displayed in the Metrics tab of the `Convert evaluation results to parquet` node.

![metrics_tab.png](./monitoring/media/metrics_tab.png)

The evolution of the metrics can be monitored by comparing multiple pipeline runs:

![compare_button.png](./monitoring/media/compare_button.png)

![compare_metrics.png](./monitoring/media/compare_metrics.png)

## Contact
Please reach out to Lou Bigard (<loubigard@microsoft.com>) with any issues.
45 changes: 45 additions & 0 deletions
...ons/continuous-monitoring-with-pipeline/flows/standard/simple-summarization/flow.dag.yaml
$schema: https://azuremlschemas.azureedge.net/promptflow/latest/Flow.schema.json
environment:
  python_requirements_txt: requirements.txt
inputs:
  answer:
    type: string
outputs:
  summary:
    type: string
    reference: ${summarize_text_content.output}
nodes:
- name: summarize_text_content
  use_variants: true
node_variants:
  summarize_text_content:
    default_variant_id: variant_0
    variants:
      variant_0:
        node:
          type: llm
          source:
            type: code
            path: summarize_text_content.jinja2
          inputs:
            deployment_name: gpt-35-turbo
            model: gpt-3.5-turbo
            max_tokens: 128
            temperature: 0.2
            text: ${inputs.answer}
          connection: open_ai_connection
          api: chat
      variant_1:
        node:
          type: llm
          source:
            type: code
            path: summarize_text_content__variant_1.jinja2
          inputs:
            deployment_name: gpt-35-turbo
            model: gpt-3.5-turbo
            max_tokens: 256
            temperature: 0.3
            text: ${inputs.answer}
          connection: open_ai_connection
          api: chat
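
(Not part of the committed files.) To compare the two prompt variants of this flow locally before wiring it into the pipeline, a sketch along these lines could be used with the Prompt flow SDK; the data file name and its `answer` column are assumptions.

```python
from promptflow.client import PFClient

pf = PFClient()

# Run the flow once per variant against the same data, then compare the two runs.
for variant_id in ("variant_0", "variant_1"):
    run = pf.run(
        flow="flows/standard/simple-summarization",   # path to this flow, relative to the tutorial folder
        data="data.jsonl",                            # hypothetical file with an "answer" column
        column_mapping={"answer": "${data.answer}"},
        variant="${summarize_text_content." + variant_id + "}",
    )
    print(run.name, run.status)
```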
2 changes: 2 additions & 0 deletions
.../continuous-monitoring-with-pipeline/flows/standard/simple-summarization/requirements.txt
promptflow[azure]>=1.7.0
promptflow-tools
7 changes: 7 additions & 0 deletions
...onitoring-with-pipeline/flows/standard/simple-summarization/summarize_text_content.jinja2
# system:
Please summarize the following text in one paragraph. 100 words.
Do not add any information that is not in the text.

# user:
Text: {{text}}
Summary:
7 changes: 7 additions & 0 deletions
...ith-pipeline/flows/standard/simple-summarization/summarize_text_content__variant_1.jinja2
# system:
Please summarize the keywords of this paragraph and give some details for each keyword.
Do not add any information that is not in the text.

# user:
Text: {{text}}
Summary:
10 changes: 10 additions & 0 deletions
...ions/continuous-monitoring-with-pipeline/monitoring/components/convert_parquet/conda.yaml
name: convert_to_parquet
channels:
  - defaults
dependencies:
  - python=3.10
  - pip=22.2
  - pip:
    - azureml-mlflow==1.56.0
    - pandas==2.2.2
    - pyarrow
116 changes: 116 additions & 0 deletions
...tinuous-monitoring-with-pipeline/monitoring/components/convert_parquet/convert_parquet.py
from pathlib import Path
import pandas as pd
import mlflow
import argparse
import datetime
from functools import reduce


def parse_args():
    # setup argparse
    parser = argparse.ArgumentParser()

    # add arguments
    parser.add_argument(
        "--eval_qna_rag_metrics_output_folder",
        type=str,
        help="path containing data for qna rag evaluation metrics",
    )
    parser.add_argument(
        "--eval_perceived_intelligence_output_folder",
        type=str,
        default="./",
        help="input path for perceived intelligence evaluation metrics",
    )

    parser.add_argument(
        "--eval_summarization_output_folder",
        type=str,
        default="./",
        help="input path for summarization evaluation metrics",
    )

    parser.add_argument(
        "--eval_results_output",
        type=str,
        default="./",
        help="output path for aggregated metrics",
    )

    # parse args
    args = parser.parse_args()

    # return args
    return args


def get_file(f):
    f = Path(f)
    if f.is_file():
        return f
    else:
        files = list(f.iterdir())
        if len(files) == 1:
            return files[0]
        else:
            raise Exception("********This path contains more than one file*******")


def convert_to_parquet(
    eval_qna_rag_metrics_output_folder,
    eval_perceived_intelligence_output_folder,
    eval_summarization_output_folder,
    eval_results_output,
):
    now = f"{datetime.datetime.now():%Y%m%d%H%M%S}"

    eval_qna_rag_metrics_file = get_file(eval_qna_rag_metrics_output_folder)
    eval_qna_rag_metrics_data = pd.read_json(eval_qna_rag_metrics_file, lines=True)

    eval_perceived_intelligence_file = get_file(
        eval_perceived_intelligence_output_folder
    )
    eval_perceived_intelligence_data = pd.read_json(
        eval_perceived_intelligence_file, lines=True
    )

    eval_summarization_file = get_file(eval_summarization_output_folder)
    eval_summarization_data = pd.read_json(eval_summarization_file, lines=True)

    all_dataframes = [
        eval_qna_rag_metrics_data,
        eval_perceived_intelligence_data,
        eval_summarization_data,
    ]
    eval_results_data = reduce(
        lambda left, right: pd.merge(left, right, on="line_number"), all_dataframes
    )

    eval_results_data["timestamp"] = pd.Timestamp("now")

    eval_results_data.to_parquet(eval_results_output + f"/{now}_eval_results.parquet")

    eval_results_data_mean = eval_results_data.mean(numeric_only=True)

    for metric, avg in eval_results_data_mean.items():
        if metric == "line_number":
            continue
        mlflow.log_metric(metric, avg)


def main(args):
    convert_to_parquet(
        args.eval_qna_rag_metrics_output_folder,
        args.eval_perceived_intelligence_output_folder,
        args.eval_summarization_output_folder,
        args.eval_results_output,
    )


# run script
if __name__ == "__main__":
    # parse args
    args = parse_args()

    # call main function
    main(args)
20 changes: 20 additions & 0 deletions
...nuous-monitoring-with-pipeline/monitoring/components/convert_parquet/convert_parquet.yaml
$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
type: command

name: convert_to_parquet
display_name: Convert evaluation results to parquet
inputs:
  eval_qna_rag_metrics_output_folder:
    type: uri_folder
  eval_perceived_intelligence_output_folder:
    type: uri_folder
  eval_summarization_output_folder:
    type: uri_folder
outputs:
  eval_results_output:
    type: uri_folder
code: ./
command: python convert_parquet.py --eval_qna_rag_metrics_output_folder ${{inputs.eval_qna_rag_metrics_output_folder}} --eval_perceived_intelligence_output_folder ${{inputs.eval_perceived_intelligence_output_folder}} --eval_summarization_output_folder ${{inputs.eval_summarization_output_folder}} --eval_results_output ${{outputs.eval_results_output}}
environment:
  conda_file: ./conda.yaml
  image: mcr.microsoft.com/azureml/inference-base-2004:latest
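
(Not part of the committed files.) In the pipeline notebook, a command component like the one above would typically be loaded with `load_component` and fed the three evaluation outputs. A hypothetical wiring sketch follows; the upstream node names and their `flow_outputs` output name are assumptions, not confirmed by this commit.

```python
from azure.ai.ml import load_component

# Load the command component defined above (path relative to the monitoring folder).
convert_parquet = load_component("components/convert_parquet/convert_parquet.yaml")

# Inside the @dsl.pipeline function it would be wired roughly like this; eval_qna_rag,
# eval_perceived_intelligence and eval_summarization are illustrative upstream evaluation nodes.
# metrics_node = convert_parquet(
#     eval_qna_rag_metrics_output_folder=eval_qna_rag.outputs.flow_outputs,
#     eval_perceived_intelligence_output_folder=eval_perceived_intelligence.outputs.flow_outputs,
#     eval_summarization_output_folder=eval_summarization.outputs.flow_outputs,
# )
```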