Top K behavior is incompatible between index method `Economical` and `High Quality` #12144

utsumi-fj · 2024-12-27T02:56:26Z

Self Checks

This is only for bug report, if you would like to ask a question, please head to Discussions.
I have searched for existing issues, including closed ones.
I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
[FOR CHINESE USERS] Please be sure to submit the issue in English, otherwise it will be closed. Thank you! :)
Please do not modify this template :) and fill in all the required fields.

Dify version

0.14.2

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

Top K behaves inconsistently between the Economical and High Quality index methods.
Dify has two Top K settings: one on the knowledge itself and one in the Retrieval Setting of the context.
With Economical knowledge, the Top K set on the knowledge seems to be overridden by the Top K in the Retrieval Setting.
In other words, with Economical knowledge, the Retrieval Setting Top K seems to be used in actual retrieval and the knowledge Top K seems to be ignored.
With High Quality knowledge, on the other hand, the knowledge Top K seems to be used in actual retrieval and the Retrieval Setting Top K seems to be ignored.
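
The observed precedence can be summarized as a small model. This is only a sketch of the behavior reported here, not Dify's actual code; the function `observed_top_k` and its parameter names are hypothetical.

```python
def observed_top_k(index_method: str, knowledge_top_k: int, retrieval_setting_top_k: int) -> int:
    """Return the Top K that appears to take effect, per the observations in this report.

    Hypothetical helper for illustration only; it does not mirror Dify internals.
    """
    if index_method == "economical":
        # Observed: the Retrieval Setting value wins for Economical knowledge.
        return retrieval_setting_top_k
    if index_method == "high_quality":
        # Observed: the knowledge-level value wins for High Quality knowledge.
        return knowledge_top_k
    raise ValueError(f"unknown index method: {index_method}")


# Example values from the reproduction steps:
# knowledge Top K = 2, Retrieval Setting Top K = 10.
print(observed_top_k("economical", 2, 10))    # prints 10
print(observed_top_k("high_quality", 2, 10))  # prints 2
```

The same two inputs produce different effective values depending only on the index method, which is the inconsistency being reported.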

Steps to reproduce:

1. Create `Economical` knowledge and `High Quality` knowledge

Create two knowledge bases: one with the Economical index method and another with the High Quality index method.

2. Set Top K for both knowledge bases

For both knowledge bases (Economical and High Quality), set Top K to a specific value (e.g. Top K = 2).

Economical, Top K = 2:

High Quality, Top K = 2:

3. Create chatbot with `Economical` knowledge and chatbot with `High Quality` knowledge

Create two chatbots: one whose context includes only the Economical knowledge and another whose context includes only the High Quality knowledge. Then, in each chatbot's Retrieval Setting, set Top K to a value different from the one used in step 2 (e.g. Top K = 10).

4. Compare the number of chunks retrieved by two chatbots

Send a query to each chatbot (Economical and High Quality) and compare the number of chunks each one retrieves.
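
One way to make this comparison concrete is to count the retrieved chunks in each chatbot's response rather than reading them off the UI. The sketch below assumes the response has already been parsed into a dict and that the retrieved chunks are listed under `metadata.retriever_resources`; that shape and the helper name `count_retrieved_chunks` are assumptions for illustration and should be checked against what your deployment actually returns.

```python
from typing import Any, Mapping


def count_retrieved_chunks(response: Mapping[str, Any]) -> int:
    """Count retrieved chunks in a parsed chat response.

    Assumes (not confirmed by this report) that chunks are listed under
    response["metadata"]["retriever_resources"]. Missing keys count as zero.
    """
    metadata = response.get("metadata") or {}
    resources = metadata.get("retriever_resources") or []
    return len(resources)


# Hand-written sample payload for illustration; not real output from Dify.
sample_response = {
    "answer": "...",
    "metadata": {
        "retriever_resources": [
            {"position": 1},
            {"position": 2},
            {"position": 3},
        ]
    },
}

print(count_retrieved_chunks(sample_response))  # prints 3
print(count_retrieved_chunks({}))               # prints 0
```

Running the same query against both chatbots and comparing the two counts with the configured Top K values shows which setting took effect.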

✔️ Expected Behavior

The priority between the two Top K settings (one on the knowledge, the other in the Retrieval Setting) is the same for the Economical and High Quality index methods.

❌ Actual Behavior

Retrieval result with `Economical` knowledge

The number of retrieved chunks is the same as the Top K value in the Retrieval Setting.

Retrieval result with `High Quality` knowledge

The number of retrieved chunks differs from the Top K value in the Retrieval Setting and matches the Top K value set on the knowledge.


dosubot · 2024-12-27T02:58:18Z

Hey there, @utsumi-fj! I'm here to help you with any bugs, questions, or contributions you have in mind. Let's get that issue sorted out together!

The incompatibility in Top K behavior between the Economical and High Quality index methods in Dify arises from their different retrieval mechanisms. The Economical method uses a keyword-based retrieval approach, which prioritizes the Top K setting from the Retrieval Setting, effectively overriding the Top K setting for the knowledge base. In contrast, the High Quality method employs a more sophisticated retrieval process, such as semantic or vector-based search, which respects the Top K setting configured for the knowledge base itself ^[1].



dosubot bot added the 🐞 bug Something isn't working label Dec 27, 2024
