
Reduce Conversation Token Count by Pruning Intermediate History #98

Open
abdullah-alnahas opened this issue Nov 22, 2024 · 1 comment
@abdullah-alnahas (Collaborator)
Currently, every conversation turn resends the full chat history, which inflates token usage. I propose replacing intermediate tool-related messages (tool calls and tool outputs) with short placeholders like "DELETED FOR CONVENIENCE".

Example:

Instead of:

[ User1, Assistant (tool call), Tool Output, Assistant (response), User2, ... ]

Use:

[ User1, "DELETED", "DELETED", Assistant (response), User2, ... ]

Benefits:

  • Lowers token count & cost
  • Speeds up processing
  • Maintains essential conversation context

Action Points:

  1. Implement placeholder pruning
  2. Test impact on conversation quality and token reduction
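
A quick way to measure the token reduction (a sketch using `tiktoken`; the sample `history` is illustrative, and `prune_history` is the sketch above):

```python
import json
import tiktoken

enc = tiktoken.get_encoding("o200k_base")  # tokenizer used by recent OpenAI models

def count_tokens(messages: list[dict]) -> int:
    # Rough estimate: serialize each message and sum token counts.
    # (Exact per-message overhead varies by model and chat format.)
    return sum(len(enc.encode(json.dumps(m))) for m in messages)

history = [
    {"role": "user", "content": "What do the sources say about intentions?"},
    {"role": "assistant", "content": None,
     "tool_calls": [{"id": "call_1", "type": "function",
                     "function": {"name": "search", "arguments": "{}"}}]},
    {"role": "tool", "tool_call_id": "call_1",
     "content": "...long search results..."},
    {"role": "assistant", "content": "The sources say that..."},
]

before = count_tokens(history)
after = count_tokens(prune_history(history))
print(f"tokens: {before} -> {after} ({100 * (before - after) / before:.1f}% saved)")
```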

I expect this will significantly reduce token usage without negatively impacting conversation quality.

@waleedkadous (Collaborator)

I think it will actually make things worse. For example, if the previous search results include hadith and other source texts, keeping them in context can lead to a better-formulated answer.

In addition, our prompt costs have come down a lot since OpenAI introduced prompt caching:

https://platform.openai.com/docs/guides/prompt-caching

Prior prompts will be cached, so we'll only be charged a small amount.
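
A quick sketch of how to confirm caching is taking effect (assuming the OpenAI Python SDK; the `prompt_tokens_details.cached_tokens` field in the usage object reports cache hits):

```python
from openai import OpenAI

client = OpenAI()

# `messages` stands in for the full, unpruned conversation history.
messages = [{"role": "user", "content": "..."}]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
)

usage = response.usage
# Caching applies automatically once the prompt prefix passes a minimum
# length, so repeated turns of a long conversation are billed at a discount.
print("prompt tokens:", usage.prompt_tokens)
print("cached tokens:", usage.prompt_tokens_details.cached_tokens)
```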
