Gemma 4 VRAM Estimator Dashboard

INPUT PANEL

Parameters (Params): 11.9B

Quantization Precision: 4-bit (QAT)

Context Window: 8,192 tokens

KV Cache Quantization: 8-bit (FP8)

PLE CPU Offloading: OFF

Host System RAM Overhead: 1.0 GB

GQA Architecture Details

Attention Layers: 42

Query Attention Heads: 32

KV Attention Heads (GQA): 16

Attention Head Dimension: 256

ACTIVE VRAM REQUIREMENT

7.50 GB

Model Weights VRAM: 5.95 GB

Optimized KV Cache VRAM: 1.30 GB

Serving Engine Overhead: 0.25 GB
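
The active VRAM figure is the sum of the three components listed above. A minimal sketch of that sum follows; the class name VramBreakdown and the fits() helper are hypothetical names introduced for illustration, not part of the dashboard.

```python
from dataclasses import dataclass


@dataclass
class VramBreakdown:
    """Hypothetical container for the three VRAM components, in decimal GB."""
    weights_gb: float
    kv_cache_gb: float
    engine_overhead_gb: float

    def total_gb(self) -> float:
        # Active VRAM = weights + allocated KV cache + serving engine overhead.
        return self.weights_gb + self.kv_cache_gb + self.engine_overhead_gb

    def fits(self, gpu_vram_gb: float, headroom_gb: float = 0.0) -> bool:
        # Assumption: a card is "compatible" when the total plus an optional
        # safety margin does not exceed its VRAM capacity.
        return self.total_gb() + headroom_gb <= gpu_vram_gb


breakdown = VramBreakdown(weights_gb=5.95, kv_cache_gb=1.30, engine_overhead_gb=0.25)
print(f"Active VRAM: {breakdown.total_gb():.2f} GB")
print("Fits in 8 GB:", breakdown.fits(8.0))
```

The headroom_gb parameter is an assumed safety margin; in practice, fragmentation and activation buffers can push real usage above the estimate.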

ACTIVE CPU RAM REQUIREMENT

1.00 GB

Offloaded PLE Weights: 0.00 GB

System Host Overhead: 1.00 GB

CALCULATION DIAGNOSTICS

1. MODEL WEIGHTS COMPUTATION

Weights_Size = Params * (Bits / 8)

11.9B Params * (4 Bits / 8) = 5.95 GB
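
The weights formula can be checked with a few lines of Python. This is a sketch that assumes decimal gigabytes (1 GB = 10^9 bytes) and ignores quantization metadata such as scales and zero-points, which a real checkpoint would add on top; the function name weights_size_gb is hypothetical.

```python
def weights_size_gb(params_billion: float, bits: int) -> float:
    """Weights_Size = Params * (Bits / 8), returned in decimal GB."""
    params = params_billion * 1e9      # parameter count
    size_bytes = params * bits / 8     # bytes at the given precision
    return size_bytes / 1e9            # decimal GB (assumption: 1 GB = 1e9 bytes)


# Compare the dashboard's 4-bit setting with higher precisions.
for bits in (4, 8, 16):
    print(f"{bits:>2}-bit: {weights_size_gb(11.9, bits):.2f} GB")
```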

2. GQA KV CACHE ALLOCATION

KV_Bytes = 2 * Layers * KV_Heads * Head_Dim * Context * (KV_Bits / 8)

2 * 42 Layers * 16 KV_Heads * 256 Head_Dim * 8,192 Context * (8 Bits / 8) = 2.82 GB (peak)

Dynamic Engine Optimization (vLLM/llama.cpp Page-Pool): 1.30 GB allocated dynamically (~54% below peak)
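
The KV cache formula can be evaluated directly as a sanity check. The sketch below assumes every attention layer caches the full context (no sliding-window layers) and uses hypothetical function names. With the inputs listed in the panel it gives 2,818,572,288 bytes, about 2.82 GB in decimal units; for comparison, an unquantized 16-bit cache of the same shape would need about 5.64 GB.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   context: int, kv_bits: int) -> int:
    """KV_Bytes = 2 * Layers * KV_Heads * Head_Dim * Context * (KV_Bits / 8)."""
    # The leading 2 accounts for storing both keys and values.
    return 2 * layers * kv_heads * head_dim * context * kv_bits // 8


def saving_vs_peak(allocated_gb: float, peak_bytes: int) -> float:
    """Fraction of the peak cache that the dynamic allocation avoids."""
    return 1.0 - (allocated_gb * 1e9) / peak_bytes


peak_8bit = kv_cache_bytes(layers=42, kv_heads=16, head_dim=256,
                           context=8192, kv_bits=8)
peak_16bit = kv_cache_bytes(layers=42, kv_heads=16, head_dim=256,
                            context=8192, kv_bits=16)

print(f"8-bit peak:  {peak_8bit / 1e9:.2f} GB")
print(f"16-bit peak: {peak_16bit / 1e9:.2f} GB")
print(f"Per token (8-bit): {peak_8bit // 8192} bytes")
# 1.30 GB is the dynamically allocated figure reported by the dashboard.
print(f"Saving vs 8-bit peak: {saving_vs_peak(1.30, peak_8bit):.0%}")
```

Note that only the KV head count enters the formula; the 32 query heads do not affect cache size, which is the memory benefit of GQA.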

HARDWARE COMPATIBILITY