📊 Model Configuration

Number of Transformer layers in the model
Number of Key-Value heads in the attention mechanism
Dimension of each attention head
Total number of model parameters, in billions
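
The layer count, KV-head count, and head dimension above are the usual inputs to a per-token KV cache size estimate. The following is a minimal sketch, assuming the standard formula (one key vector and one value vector per KV head per layer) and 2-byte FP16/BF16 elements. The function name and the `bytes_per_element` default are illustrative and are not taken from this calculator.

```python
def kv_bytes_per_token(num_layers: int, num_kv_heads: int, head_dim: int,
                       bytes_per_element: int = 2) -> int:
    """Bytes of KV cache needed to store one token (assumed formula)."""
    # Factor 2: each layer stores one key and one value vector per KV head.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_element


# Hypothetical model: 32 layers, 8 KV heads, head dimension 128.
per_token = kv_bytes_per_token(32, 8, 128)
print(per_token / 1024, "KiB per token")  # 128.0 KiB per token
```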

💻 System Configuration

Available GPU or system memory size
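
One plausible way to turn the memory size into a token budget is to subtract the model weights and divide what remains by the per-token KV size. The sketch below assumes 2 bytes per parameter (FP16/BF16), 1 GB = 10^9 bytes, and no allowance for activations or framework overhead. All names are hypothetical, and the calculator may account for memory differently.

```python
def max_cached_tokens(total_memory_gb: float, params_billions: float,
                      kv_bytes_per_token: int, bytes_per_param: int = 2) -> int:
    """Tokens that fit in the memory left over after loading the weights."""
    total_bytes = total_memory_gb * 1e9
    # Assumption: weights occupy params * bytes_per_param, nothing else is reserved.
    model_bytes = params_billions * 1e9 * bytes_per_param
    cache_bytes = max(0.0, total_bytes - model_bytes)
    return int(cache_bytes // kv_bytes_per_token)


# Hypothetical setup: 80 GB of memory, 8B parameters, 128 KiB of KV cache per token.
print(max_cached_tokens(80, 8, 131072))  # 488281
```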

💬 Conversation Mode

Average number of turns in a conversation
Rate at which new conversations start
Average time interval between consecutive requests within the same conversation
Average number of input+output tokens per request
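
The conversation parameters above are enough to derive a request rate and an estimate of concurrently active conversations. This sketch assumes the new-conversation rate is expressed per second, that every turn is one request, and that a conversation lasts from its first request to its last. It applies Little's law (active = arrival rate × lifetime). The names are illustrative, not the calculator's own.

```python
def derive_load(new_conversations_per_sec: float, turns_per_conversation: float,
                seconds_between_requests: float) -> tuple[float, float]:
    """Return (requests per second, average number of active conversations)."""
    # Every conversation contributes one request per turn.
    qps = new_conversations_per_sec * turns_per_conversation
    # Assumed lifetime: time from the first request to the last one.
    lifetime = max(0.0, turns_per_conversation - 1) * seconds_between_requests
    # Little's law: average population = arrival rate * time in system.
    active = new_conversations_per_sec * lifetime
    return qps, active


# Hypothetical workload: 0.5 new conversations/s, 10 turns, 30 s between requests.
print(derive_load(0.5, 10, 30))  # (5.0, 135.0)
```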

📈 Calculation Results

Hit Rate: - %
Cache Utilization: - %
Derived QPS: - req/s
Cache Memory: - GB

Detailed Metrics

Memory per Token: -
Maximum Cached Tokens: -
Active Conversations: -
Cache Hits per Second: -
Model Memory Usage: -
Cacheable Conversations: -
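
The page does not state how these metrics are computed. The following is one simple model, offered only as a sketch. It assumes each request adds `tokens_per_request` new tokens to its conversation's cached context, that the first turn of a conversation always misses, and that later turns hit only if the conversation is still resident in the cache. All names and the model itself are assumptions.

```python
def estimate_cache_metrics(max_cached_tokens: int, active_conversations: float,
                           turns_per_conversation: float, tokens_per_request: float,
                           qps: float) -> dict:
    """Rough hit-rate and utilization estimate under the assumptions above."""
    if min(max_cached_tokens, turns_per_conversation, tokens_per_request) <= 0:
        raise ValueError("token budget, turns and tokens per request must be positive")
    # Assumed full-conversation footprint in the cache.
    tokens_per_conversation = turns_per_conversation * tokens_per_request
    cacheable = max_cached_tokens / tokens_per_conversation
    # Fraction of active conversations that can stay resident at once.
    if active_conversations > 0:
        resident = min(1.0, cacheable / active_conversations)
    else:
        resident = 1.0
    # The first turn has nothing to reuse, so at most (turns - 1) / turns requests hit.
    hit_rate = resident * (turns_per_conversation - 1) / turns_per_conversation
    demand = active_conversations * tokens_per_conversation
    return {
        "cacheable_conversations": cacheable,
        "hit_rate": hit_rate,
        "cache_utilization": min(1.0, demand / max_cached_tokens),
        "cache_hits_per_second": qps * hit_rate,
    }


# Hypothetical inputs: 1M-token budget, 100 active conversations, 10 turns, 500 tokens/request.
print(estimate_cache_metrics(1_000_000, 100, 10, 500, qps=5.0))
```

Under this model the hit rate is capped at (turns − 1) / turns even with unlimited memory, so short conversations limit the achievable hit rate regardless of cache size.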

💡 Optimization Suggestions

Click the Calculate button to get optimization suggestions