📊 Model Configuration
Number of Transformer layers in the model
Number of Key-Value heads in the attention mechanism
Dimension of each attention head
Total number of model parameters, in billions
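The three attention-related fields above are the usual inputs for sizing the KV cache per token. A minimal sketch of that arithmetic follows, assuming the standard layout of one key and one value vector per KV head per layer and 2-byte (fp16/bf16) elements. The function name and the `bytes_per_element` parameter are illustrative and not taken from this calculator.

```python
def kv_bytes_per_token(num_layers: int, num_kv_heads: int, head_dim: int,
                       bytes_per_element: int = 2) -> int:
    """Bytes of KV cache needed to store one token.

    Assumption: each layer stores one key and one value vector per KV head,
    hence the leading factor of 2. bytes_per_element=2 assumes fp16/bf16.
    """
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_element


# Example: 32 layers, 32 KV heads, head dimension 128, fp16 elements.
per_token = kv_bytes_per_token(32, 32, 128)
print(per_token / 2**20, "MiB per token")
```

Models that use grouped-query attention have fewer KV heads than query heads, which is why the KV head count, not the query head count, drives this figure.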
💻 System Configuration
Memory available on the GPU or system
💬 Conversation Mode
Average number of turns in a conversation
Rate at which new conversations start
Average time interval between consecutive requests within the same conversation
Average number of tokens (input plus output) per request
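The conversation inputs above are enough to estimate request load with simple steady-state arithmetic. The sketch below is one plausible way to derive QPS and the number of active conversations. It is not necessarily the formula this calculator uses. It assumes rates are per second, intervals are in seconds, and a conversation lasts roughly turns × interval. All function and parameter names are hypothetical.

```python
def derived_qps(new_conversations_per_sec: float,
                turns_per_conversation: float) -> float:
    """Steady-state request rate: each new conversation eventually
    issues `turns_per_conversation` requests."""
    return new_conversations_per_sec * turns_per_conversation


def active_conversations(new_conversations_per_sec: float,
                         turns_per_conversation: float,
                         request_interval_sec: float) -> float:
    """Little's law: concurrent conversations = arrival rate x lifetime.

    Assumption: a conversation's lifetime is approximated as
    turns x interval between requests.
    """
    lifetime_sec = turns_per_conversation * request_interval_sec
    return new_conversations_per_sec * lifetime_sec


# Example: 0.5 new conversations/s, 4 turns each, 30 s between requests.
print(derived_qps(0.5, 4))                  # requests per second
print(active_conversations(0.5, 4, 30.0))   # conversations in flight
```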
📈 Calculation Results
Hit Rate: - %
Cache Utilization: - %
Derived QPS: - req/s
Cache Memory: - GB
Detailed Metrics
Memory per Token: -
Maximum Cached Tokens: -
Active Conversations: -
Cache Hits per Second: -
Model Memory Usage: -
Cacheable Conversations: -
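The capacity metrics listed here can be related to the inputs with back-of-the-envelope arithmetic. The following is a rough sketch under stated assumptions, not the calculator's actual implementation. It assumes weights take 2 bytes per parameter, 1 GB is 10^9 bytes, all memory left after loading the weights is available for the cache, and a conversation occupies turns × tokens-per-request cache slots. The hit-rate estimate further assumes that the first turn of a conversation always misses and later turns hit only while the conversation is still resident. Every name below is hypothetical.

```python
GB = 10**9  # assumption: decimal gigabytes


def estimate_capacity(memory_gb: float, params_billion: float,
                      kv_bytes_per_token: int, tokens_per_conversation: int,
                      bytes_per_param: int = 2) -> dict:
    """Estimate how much KV cache fits beside the model weights."""
    model_bytes = int(params_billion * 1e9 * bytes_per_param)
    # Whatever is left after the weights is treated as cache space.
    cache_bytes = max(0, int(memory_gb * GB) - model_bytes)
    max_cached_tokens = cache_bytes // kv_bytes_per_token
    return {
        "model_bytes": model_bytes,
        "cache_bytes": cache_bytes,
        "max_cached_tokens": max_cached_tokens,
        "cacheable_conversations": max_cached_tokens // tokens_per_conversation,
    }


def estimate_hit_rate(cacheable: float, active: float, turns: float) -> float:
    """Fraction of requests expected to reuse a cached prefix.

    Assumption: turn 1 always misses; turns 2..N hit with probability
    equal to the fraction of active conversations that fit in the cache.
    """
    if active <= 0 or turns <= 0:
        return 0.0
    resident_fraction = min(1.0, cacheable / active)
    return resident_fraction * (turns - 1) / turns


# Example: 80 GB of memory, 7B parameters, 524288 bytes of KV cache per token,
# 4 turns of 1000 tokens each, 60 conversations active at once.
cap = estimate_capacity(80, 7, 524288, 4 * 1000)
print(cap)
print(estimate_hit_rate(cap["cacheable_conversations"], 60, 4))
```

Real serving systems also reserve memory for activations and fragmentation overhead, so these figures are best read as upper bounds.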
💡 Optimization Suggestions
Click the Calculate button to get optimization suggestions