LLM Perf Model

Per-Op Layer Breakdown

Per-operation analysis of a single Transformer layer — FLOPs, IO bytes, arithmetic intensity, and bottleneck type.
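The four quantities above can be related through a basic roofline model. The following is a minimal sketch, not the tool's confirmed method: the names `OpEstimate`, `roofline`, and `matmul_cost` are hypothetical, the device numbers in the usage note are made up for illustration, and cache reuse is ignored.

```python
from dataclasses import dataclass


@dataclass
class OpEstimate:
    flops: float        # floating-point operations
    bytes_moved: float  # bytes read + written
    intensity: float    # arithmetic intensity, FLOPs per byte
    bound: str          # "compute" or "memory"
    time_s: float       # roofline time estimate in seconds


def roofline(flops: float, bytes_moved: float,
             peak_flops: float, mem_bw: float) -> OpEstimate:
    """Classify one op with a simple roofline model (an assumed model)."""
    compute_time = flops / peak_flops   # time if limited only by math units
    memory_time = bytes_moved / mem_bw  # time if limited only by memory traffic
    bound = "compute" if compute_time >= memory_time else "memory"
    return OpEstimate(flops, bytes_moved, flops / bytes_moved,
                      bound, max(compute_time, memory_time))


def matmul_cost(m: int, k: int, n: int, bytes_per_elem: float):
    """FLOPs and bytes for an (m x k) @ (k x n) matmul, ignoring cache reuse."""
    flops = 2 * m * k * n  # one multiply and one add per inner-product term
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops, bytes_moved
```

For example, on a hypothetical device with 1e14 FLOP/s and 1e12 B/s, `roofline(*matmul_cost(1, 4096, 4096, 2), 1e14, 1e12)` classifies a single-token projection as memory-bound, while the same weights applied to 2048 tokens (`m=2048`) come out compute-bound.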

🧠 Model

🖥️ Device

📊 Device Utilization

⚙️ Runtime Configuration

🧮 Quantization

Operation | FLOPs | Bytes (R+W) | AI (arithmetic intensity) | Bound (bottleneck type) | Time %
Attention: Q/K/V Proj + Attn + O Proj
FFN / MLP: Gate, Up, Down + Activation
Norm + Residual: RMSNorm, Residual connections
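As a rough illustration of how the three rows might be filled in, the sketch below estimates each group for a single decode step at batch size 1. Everything here is an assumption rather than the tool's actual formulas: the function name `layer_breakdown`, standard multi-head attention with a KV cache (no grouped-query attention), a gated FFN with three weight matrices, weight and KV-cache traffic as the dominant bytes, and a rough constant for the norm and residual cost.

```python
def layer_breakdown(d, f, ctx, bytes_per_elem, peak_flops, mem_bw):
    """Per-group roofline estimate for one decode token (hypothetical sketch).

    d: hidden size, f: FFN inner size, ctx: tokens already in the KV cache.
    """
    ops = {
        # Q/K/V/O projections: 4 matmuls of 2*d*d FLOPs; QK^T and AV: 2*ctx*d each.
        # Bytes: 4 weight matrices plus the K and V caches (activations ignored).
        "Attention": (8 * d * d + 4 * ctx * d,
                      bytes_per_elem * (4 * d * d + 2 * ctx * d)),
        # Gate, Up, Down: 3 matmuls of 2*d*f FLOPs; activation FLOPs ignored.
        "FFN / MLP": (6 * d * f, bytes_per_elem * 3 * d * f),
        # Two RMSNorms and two residual adds: rough per-element constants (assumed).
        "Norm + Residual": (10 * d, bytes_per_elem * 12 * d),
    }
    rows = []
    for name, (flops, nbytes) in ops.items():
        t_compute = flops / peak_flops
        t_memory = nbytes / mem_bw
        rows.append({
            "op": name,
            "flops": flops,
            "bytes": nbytes,
            "ai": flops / nbytes,
            "bound": "compute" if t_compute >= t_memory else "memory",
            "time_s": max(t_compute, t_memory),
        })
    total = sum(r["time_s"] for r in rows)
    for r in rows:
        r["time_pct"] = 100.0 * r["time_s"] / total  # share of layer time
    return rows


if __name__ == "__main__":
    # Illustrative inputs only: 16-bit weights, made-up device peaks.
    for r in layer_breakdown(4096, 11008, 2048, 2, 1e14, 1e12):
        print(f'{r["op"]:<16} AI={r["ai"]:.2f} {r["bound"]:<8} {r["time_pct"]:.1f}%')
```

Under these assumptions every group is memory-bound during decode, since each weight is read once per token; larger batches or prefill raise the arithmetic intensity of the matmul-heavy rows.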
