
Reduce Conversation Token Count by Pruning Intermediate History #98

Open
abdullah-alnahas opened this issue Nov 22, 2024 · 1 comment
@abdullah-alnahas (Collaborator)
Currently, every conversation turn resends the full chat history, which inflates token usage. I propose replacing intermediate tool-related messages (tool calls and tool outputs) with short placeholders like "DELETED FOR CONVENIENCE".

Example:

Instead of:

[ User1, Assistant (tool call), Tool Output, Assistant (response), User2, ... ]

Use:

[ User1, "DELETED", "DELETED", Assistant (response), User2, ... ]

Benefits:

  • Lowers token count & cost
  • Speeds up processing
  • Maintains essential conversation context

Action Points:

  1. Implement placeholder pruning
  2. Test impact on conversation quality and token reduction
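
A quick way to measure the token reduction (a sketch using `tiktoken`; the sample `history` is illustrative, and `prune_history` is the sketch above):

```python
import json
import tiktoken

enc = tiktoken.get_encoding("o200k_base")  # tokenizer used by recent OpenAI models

def count_tokens(messages: list[dict]) -> int:
    # Rough estimate: serialize each message and sum token counts.
    # (Exact per-message overhead varies by model and chat format.)
    return sum(len(enc.encode(json.dumps(m))) for m in messages)

history = [
    {"role": "user", "content": "What do the sources say about intentions?"},
    {"role": "assistant", "content": None,
     "tool_calls": [{"id": "call_1", "type": "function",
                     "function": {"name": "search", "arguments": "{}"}}]},
    {"role": "tool", "tool_call_id": "call_1",
     "content": "...long search results..."},
    {"role": "assistant", "content": "The sources say that..."},
]

before = count_tokens(history)
after = count_tokens(prune_history(history))
print(f"tokens: {before} -> {after} ({100 * (before - after) / before:.1f}% saved)")
```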

I expect this will significantly reduce token usage without negatively impacting conversation quality.

@waleedkadous (Collaborator)

I think it will actually make things worse. For example, if the previous search results include hadith and other source texts, keeping them in context can lead to a better-formulated answer.

In addition, our prompt costs have come down a lot since OpenAI introduced prompt caching:

https://platform.openai.com/docs/guides/prompt-caching

Prior prompts will be cached, so we'll only be charged a small amount.
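
A quick sketch of how to confirm caching is taking effect (assuming the OpenAI Python SDK; the `prompt_tokens_details.cached_tokens` field in the usage object reports cache hits):

```python
from openai import OpenAI

client = OpenAI()

# `messages` stands in for the full, unpruned conversation history.
messages = [{"role": "user", "content": "..."}]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
)

usage = response.usage
# Caching applies automatically once the prompt prefix passes a minimum
# length, so repeated turns of a long conversation are billed at a discount.
print("prompt tokens:", usage.prompt_tokens)
print("cached tokens:", usage.prompt_tokens_details.cached_tokens)
```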
