Feature Description
OpenAI enables automatic prompt caching, as described here: https://platform.openai.com/docs/guides/prompt-caching
The number of cached tokens for a prompt is returned in the usage structure, in the "cached_tokens" field under "prompt_tokens_details". For example:
"usage": {
"prompt_tokens": 2006,
"completion_tokens": 300,
"total_tokens": 2306,
"prompt_tokens_details": {
"cached_tokens": 1920
},
"completion_tokens_details": {
"reasoning_tokens": 0,
"accepted_prediction_tokens": 0,
"rejected_prediction_tokens": 0
}
}
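For reference, a minimal sketch of reading this field when calling OpenAI directly with the official Python SDK (prompt_tokens_details is only populated on caching-capable models and recent SDK versions, hence the defensive check):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}],
)

usage = response.usage
print("prompt_tokens:", usage.prompt_tokens)
print("completion_tokens:", usage.completion_tokens)
print("total_tokens:", usage.total_tokens)

# prompt_tokens_details may be None or absent on older models/SDKs
details = getattr(usage, "prompt_tokens_details", None)
if details is not None:
    print("cached_tokens:", details.cached_tokens)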
The request is for the Token Counter to expose this field in addition to prompt_tokens, completion_tokens, and total_tokens.
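If implemented, usage could look something like the sketch below. The existing TokenCountingHandler counters are real; cached_llm_token_count is a hypothetical name for the requested addition:

from llama_index.core.callbacks import CallbackManager, TokenCountingHandler

token_counter = TokenCountingHandler()
callback_manager = CallbackManager([token_counter])
# ... attach callback_manager to the LLM / Settings and run some queries ...

print(token_counter.prompt_llm_token_count)
print(token_counter.completion_llm_token_count)
print(token_counter.total_llm_token_count)
# requested addition (hypothetical attribute name):
# print(token_counter.cached_llm_token_count)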
Reason
I am not aware of a way to access this field when using OpenAI indirectly through LlamaIndex.
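One partial workaround may be to read the raw provider payload off the LLM response, assuming the LlamaIndex OpenAI LLM populates ChatResponse.raw (a sketch, not verified across versions, and it does not help when the LLM is invoked deep inside a query engine):

from llama_index.core.llms import ChatMessage
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-4o-mini")
chat_response = llm.chat([ChatMessage(role="user", content="Hello!")])

# ChatResponse.raw should hold the underlying OpenAI response, if populated
raw = chat_response.raw
usage = getattr(raw, "usage", None)
if usage is not None:
    details = getattr(usage, "prompt_tokens_details", None)
    if details is not None:
        print("cached_tokens:", details.cached_tokens)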
Value of Feature
As mentioned in OpenAI's docs, having access to this value makes it possible to monitor metrics such as cache hit rates, latency, and the percentage of tokens cached, and to optimize prompt and caching strategy accordingly.
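For the usage example above, the cache hit rate would be cached_tokens / prompt_tokens = 1920 / 2006, i.e. roughly 96% of the prompt tokens were served from the cache.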