Top K behavior is incompatible between index method Economical and High Quality #12144

Open
5 tasks done
utsumi-fj opened this issue Dec 27, 2024 · 1 comment
Labels
🐞 bug Something isn't working

@utsumi-fj

Self Checks

  • This is only for bug reports; if you would like to ask a question, please head to Discussions.
  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [FOR CHINESE USERS] Please be sure to submit issues in English; otherwise they will be closed. Thank you! :)
  • Please do not modify this template :) and fill in all the required fields.

Dify version

0.14.2

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

Top K behavior is inconsistent between the Economical and High Quality index methods.
Dify has two Top K settings: one on the knowledge itself and one in the Retrieval Setting of the context.
With Economical knowledge, the knowledge's Top K appears to be overridden by the Retrieval Setting's Top K. In other words, the Retrieval Setting value is used for the actual retrieval and the knowledge value is ignored.
With High Quality knowledge, the opposite happens: the knowledge's Top K is used for the actual retrieval and the Retrieval Setting's Top K is ignored.

Steps to reproduce:

1. Create Economical knowledge and High Quality knowledge

Create two knowledges, one with the Economical index method and one with the High Quality index method.

2. Set Top K for both knowledges

For both knowledges (Economical and High Quality), set Top K to a specific value (e.g., Top K = 2).

Economical, Top K = 2:
[Screenshot: economical_knowledge_settings]

High Quality, Top K = 2:
[Screenshot: high_quality_knowledge_settings]

3. Create chatbot with Economical knowledge and chatbot with High Quality knowledge

Create two chatbots, one whose context includes only the Economical knowledge and one whose context includes only the High Quality knowledge. In the Retrieval Setting, set Top K to a value different from the one used in step 2 (e.g., Top K = 10).

4. Compare the number of chunks retrieved by two chatbots

For each chatbot (Economical and High Quality), enter a query and compare the number of chunks each one retrieves.

✔️ Expected Behavior

The two Top K settings (the knowledge's and the Retrieval Setting's) follow the same precedence for both the Economical and High Quality index methods.

❌ Actual Behavior

Retrieval result with Economical knowledge

The number of retrieved chunks equals the Top K value in the Retrieval Setting:
[Screenshot: economical_retrieval_result]

Retrieval result with High Quality knowledge

The number of retrieved chunks differs from the Top K value in the Retrieval Setting and instead equals the Top K value set on the knowledge:
[Screenshot: high_quality_retrieval_result]
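To make the inconsistency concrete, the observed precedence can be restated as a tiny model. This is a hypothetical sketch, not Dify's implementation: the function name and the `"economical"` / `"high_quality"` strings are made up for illustration, and the model only encodes which Top K value was observed to take effect in the results above.

```python
# Hypothetical model of the Top K precedence observed in Dify 0.14.2.
# NOT Dify's code: it only restates the behavior reported in this issue.

def effective_top_k_observed(index_method: str,
                             knowledge_top_k: int,
                             retrieval_setting_top_k: int) -> int:
    """Return the Top K value observed to limit retrieval for each index method."""
    if index_method == "economical":
        # Economical: the Retrieval Setting of the context wins.
        return retrieval_setting_top_k
    if index_method == "high_quality":
        # High Quality: the Top K set on the knowledge itself wins.
        return knowledge_top_k
    raise ValueError(f"unknown index method: {index_method}")


# Example values from the reproduction steps:
# knowledge Top K = 2, Retrieval Setting Top K = 10.
print(effective_top_k_observed("economical", 2, 10))    # 10
print(effective_top_k_observed("high_quality", 2, 10))  # 2
```

With identical inputs the two branches return different values, which is the inconsistency this issue reports; the expected behavior is for both index methods to pick the same setting.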


dosubot bot commented Dec 27, 2024

Hey there, @utsumi-fj! I'm here to help you with any bugs, questions, or contributions you have in mind. Let's get that issue sorted out together!

The incompatibility in Top K behavior between the Economical and High Quality index methods in Dify arises from their different retrieval mechanisms. The Economical method uses a keyword-based retrieval approach, which prioritizes the Top K setting from the Retrieval Setting, effectively overriding the Top K setting for the knowledge base. In contrast, the High Quality method employs a more sophisticated retrieval process, such as semantic or vector-based search, which respects the Top K setting configured for the knowledge base itself [1].


@dosubot dosubot bot added the 🐞 bug Something isn't working label Dec 27, 2024