diff --git a/website/blog-30-days-of-ia-2024/2024-10-17/evaluate-with-ai.md b/website/blog-30-days-of-ia-2024/2024-10-17/evaluate-with-ai.md
index 0c78153d13..4b7a8a25dd 100644
--- a/website/blog-30-days-of-ia-2024/2024-10-17/evaluate-with-ai.md
+++ b/website/blog-30-days-of-ia-2024/2024-10-17/evaluate-with-ai.md
@@ -3,7 +3,7 @@ date: 2024-10-17T09:00
slug: evaluate-with-ai
title: "2.4 Evaluate with AI!"
authors: [nitya, marlene]
-draft: false
+draft: true
hide_table_of_contents: false
toc_min_heading_level: 2
toc_max_heading_level: 3
@@ -87,9 +87,9 @@ Let's take a quick look at the [**default quality metrics**](https://learn.micro
| Metric | What does it assess? | How does it work? | When should you use it? | Inputs Needed |
|:--|:--|:--|:--|:--|
| **Groundedness** <br/> 1=ungrounded <br/> 5=grounded | How well does model's generated answers align with information from source data ("context")? | Checks if response corresponds _verifiably_ to source context | When factual correctness and contextual accuracy are key - e.g., is it grounded in "my" product data? | Question, Context, Generated Response |
-| **Relevance** <br/> 1=bad <br/> 5=good | Are the model's generated responses pertinent, and directly related, to the given queries? | Assesses ability of responses to capture the key points of context that relate to the query | When evaluating your application's ability to understand the inputs and generate _contextually-relevant_ responses | |
-| **Groundedness**| Given support knowledge, does the ANSWER use the information provided by the CONTEXT? | | | |
-| **Relevance**| How well does the ANSWER address the main aspects of the QUESTION, based on the CONTEXT? | | | |
+| **Relevance** <br/> 1=bad <br/> 5=good | Are the model's generated responses pertinent, and directly related, to the given queries? | Assesses ability of responses to capture the key points of context that relate to the query | When evaluating your application's ability to understand the inputs and generate _contextually-relevant_ responses | Question, Answer |
+| **Fluency** <br/> 1=bad <br/> 5=fluent | How grammatically and linguistically correct the model's predicted answer is. | Checks the quality of individual sentences in the ANSWER. Are they well-written and grammatically correct? | When evaluating your application's ability to generate _readable_ responses | Question, Answer |
+| **Coherence** <br/> 1=bad <br/> 5=good | Measures the quality of all sentences in a model's predicted answer and how naturally they fit together. | Checks how well all sentences in the ANSWER fit together. Do they sound natural when taken as a whole? | When the _readability_ of the response is important | Question, Answer |
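+
+Here's a rough sketch of how these built-in quality evaluators can be invoked from code. It assumes the `azure-ai-evaluation` Python SDK; the exact class and parameter names come from that SDK's documentation and may vary by version (the `query`, `context`, and `response` parameters map to the Question, Context, and Generated Response inputs in the table above):
+
+```python
+# Minimal sketch: score one question/answer pair with two of the built-in
+# quality evaluators (class and parameter names may differ across SDK versions).
+from azure.ai.evaluation import GroundednessEvaluator, RelevanceEvaluator
+
+# Judge model used to grade responses - all values below are placeholders.
+model_config = {
+    "azure_endpoint": "https://<your-resource>.openai.azure.com",
+    "api_key": "<your-api-key>",
+    "azure_deployment": "<your-gpt-deployment>",
+}
+
+groundedness = GroundednessEvaluator(model_config)
+relevance = RelevanceEvaluator(model_config)
+
+query = "What is included in my Northwind Health Plus plan?"
+context = "Northwind Health Plus covers emergency services and vision care."
+response = "Your plan covers emergency services and vision care."
+
+# Each evaluator returns a dict with the metric score (1-5) and reasoning.
+print(groundedness(query=query, context=context, response=response))
+print(relevance(query=query, response=response))
+```
+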
To create these custom evaluators, we need to do three things: