GEMMA 4 VRAM ESTIMATOR

Modern Enterprise Hardware Footprint Estimator

INPUT PANEL
Parameters (Params): 11.9B
Quantization Precision: 4-bit (QAT)
Context Window: 8,192 tokens
KV Cache Quantization: 8-bit (FP8)
Host System RAM Overhead: 1.0 GB
GQA Architecture Details
Attention Layers: 42
Query Attention Heads: 32
KV Attention Heads (GQA): 16
Attention Head Dimension: 256
ACTIVE VRAM REQUIREMENT
7.50 GB
Model Weights VRAM: 5.95 GB
Optimized KV Cache VRAM: 1.30 GB
Serving Engine Overhead: 0.25 GB
ACTIVE CPU RAM REQUIREMENT
1.00 GB
Offloaded PLE Weights: 0.00 GB
System Host Overhead: 1.00 GB
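The two headline figures are plain sums of the components listed under them. A minimal Python sketch of that roll-up follows; the helper name total_gb is hypothetical and the component values are copied from the panel above.

```python
def total_gb(*components_gb: float) -> float:
    """Sum memory components (in GB) and round to two decimals for display."""
    return round(sum(components_gb), 2)

# VRAM: model weights + optimized KV cache + serving engine overhead
vram = total_gb(5.95, 1.30, 0.25)
# CPU RAM: offloaded PLE weights + system host overhead
cpu = total_gb(0.00, 1.00)

print(f"VRAM: {vram:.2f} GB, CPU RAM: {cpu:.2f} GB")
```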
CALCULATION DIAGNOSTICS
1. MODEL WEIGHTS COMPUTATION
Weights_Size = Params * (Bits / 8)
11.9B Params * (4 Bits / 8) = 5.95 GB
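A minimal Python sketch of the weights formula, assuming decimal gigabytes (1 GB = 1e9 bytes) and ignoring any per-tensor quantization metadata such as scales or zero points. The function name weights_size_gb is hypothetical.

```python
def weights_size_gb(params: float, bits: int) -> float:
    """Weights_Size = Params * (Bits / 8), returned in decimal GB (1e9 bytes)."""
    return params * (bits / 8) / 1e9

# 11.9B parameters at 4-bit precision
weights_gb = weights_size_gb(11.9e9, 4)
print(f"{weights_gb:.2f} GB")
```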
2. GQA KV CACHE ALLOCATION
KV_Bytes = 2 * Layers * KV_Heads * Head_Dim * Context * (KV_Bits / 8)
2 * 42 Layers * 16 KV_Heads * 256 Head_Dim * 8,192 Context * (8 Bits / 8) = 2.82 GB (peak)
Dynamic Engine Optimization (vLLM/llama.cpp Page-Pool): 1.30 GB allocated dynamically (saves ~54% of peak)
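A minimal Python sketch of the KV_Bytes formula, evaluated as written with the panel inputs. The function name kv_cache_bytes and the decimal-GB conversion (1 GB = 1e9 bytes) are assumptions for illustration. The paged-pool reduction is engine-specific and is not modeled here.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   context: int, kv_bits: int) -> int:
    """KV_Bytes = 2 * Layers * KV_Heads * Head_Dim * Context * (KV_Bits / 8).

    The leading factor of 2 covers the separate key and value tensors.
    Multiplying before dividing by 8 keeps sub-byte widths (e.g. 4-bit) exact.
    """
    return 2 * layers * kv_heads * head_dim * context * kv_bits // 8

peak_bytes = kv_cache_bytes(layers=42, kv_heads=16, head_dim=256,
                            context=8192, kv_bits=8)
print(f"{peak_bytes:,} bytes = {peak_bytes / 1e9:.2f} GB (peak)")
```

Doubling kv_bits to 16 doubles the result, so under this formula the peak scales linearly with both the cache precision and the context length.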
HARDWARE COMPATIBILITY