Per-Op Layer Breakdown
Per-operation analysis of a single Transformer layer: FLOPs, bytes read and written (IO), arithmetic intensity, and bottleneck type (compute- or memory-bound).
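The bottleneck type follows the roofline model. An operation whose arithmetic intensity (FLOPs per byte moved) falls below the device's ridge point (achievable FLOP/s divided by achievable memory bandwidth) is memory-bound; above it, compute-bound. The following is a minimal sketch that assumes a plain roofline with no overlap or launch overheads. The `Device` class, `classify` function, utilization factors, and device numbers are hypothetical illustrations, not this tool's presets or formulas.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """Hypothetical device description (illustrative numbers only)."""
    peak_flops: float          # peak FLOP/s
    mem_bandwidth: float       # peak memory bandwidth, bytes/s
    compute_util: float = 1.0  # assumed fraction of peak FLOP/s actually achieved
    memory_util: float = 1.0   # assumed fraction of peak bandwidth actually achieved

    @property
    def ridge_point(self) -> float:
        """Arithmetic intensity (FLOPs/byte) at which compute and memory time are equal."""
        return (self.peak_flops * self.compute_util) / (self.mem_bandwidth * self.memory_util)


def classify(flops: float, bytes_moved: float, dev: Device) -> tuple[float, str, float]:
    """Return (arithmetic intensity, bound type, estimated time in seconds)."""
    ai = flops / bytes_moved
    compute_time = flops / (dev.peak_flops * dev.compute_util)
    memory_time = bytes_moved / (dev.mem_bandwidth * dev.memory_util)
    # Roofline: whichever resource takes longer determines the op's runtime.
    bound = "compute" if compute_time >= memory_time else "memory"
    return ai, bound, max(compute_time, memory_time)


# Hypothetical device: 100 TFLOP/s and 1 TB/s, so the ridge point is 100 FLOPs/byte.
dev = Device(peak_flops=100e12, mem_bandwidth=1e12)
print(classify(2e9, 1e9, dev))  # (2.0, 'memory', 0.001)
```

Taking the maximum of the two times, rather than their sum, assumes compute and memory traffic overlap perfectly; a percentage column is then each op's time divided by the sum over all ops in the layer.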
Inputs: model, device, device utilization, runtime configuration, and quantization.
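| Operation | FLOPs | Bytes (R+W) | AI (FLOPs/byte) | Bound | Time | % of layer time |
|---|---|---|---|---|---|---|
| Attention (Q/K/V Proj + Attn + O Proj) | — | — | — | — | — | — |
| FFN / MLP (Gate, Up, Down + Activation) | — | — | — | — | — | — |
| Norm + Residual (RMSNorm, Residual connections) | — | — | — | — | — | — |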
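The FLOP and byte figures for each row can be approximated from the layer's dimensions. The sketch below is a rough model under stated assumptions, not this tool's own formulas. It assumes grouped-query attention, a gated (SwiGLU-style) MLP, two RMSNorms plus two residual adds per layer, each weight read once per step, and fused kernels that keep the MLP intermediate on-chip. The per-element FLOP constants for SiLU and RMSNorm are hypothetical. `LayerConfig` and `layer_ops` are illustrative names, and quantization is modeled only through `bytes_per_weight`.

```python
from dataclasses import dataclass


@dataclass
class LayerConfig:
    """Hypothetical layer shape; field names are illustrative, not the tool's."""
    hidden: int                    # model width d
    n_heads: int                   # query heads
    n_kv_heads: int                # key/value heads (< n_heads for grouped-query attention)
    ffn_hidden: int                # intermediate width f of the gated MLP
    bytes_per_weight: float = 2.0  # 2 for FP16/BF16, 1 for INT8, 0.5 for 4-bit weights
    bytes_per_act: float = 2.0     # activations, KV cache, and (unquantized) norm weights

    @property
    def kv_dim(self) -> int:
        return self.n_kv_heads * (self.hidden // self.n_heads)


def layer_ops(cfg: LayerConfig, tokens: int, context: int) -> dict[str, tuple[float, float]]:
    """Rough (FLOPs, bytes read + written) for each operation group of one layer.

    tokens:  new tokens processed this step (1 for a decode step)
    context: sequence length attended to, including the new tokens
    """
    d, f, kv = cfg.hidden, cfg.ffn_hidden, cfg.kv_dim
    bw, ba = cfg.bytes_per_weight, cfg.bytes_per_act
    T, S = tokens, context

    # Attention: Q and O projections are d x d, K and V projections are d x kv.
    proj_flops = 2 * T * d * (2 * d + 2 * kv)
    score_flops = 2 * T * S * d  # Q @ K^T over all heads, ignoring causal-mask savings
    mix_flops = 2 * T * S * d    # softmax(scores) @ V (softmax FLOPs ignored)
    attn_bytes = (
        (2 * d * d + 2 * d * kv) * bw  # projection weights, read once
        + 2 * S * kv * ba              # read K and V cache
        + 2 * T * kv * ba              # write new K and V entries
        + 2 * T * d * ba               # read layer input, write attention output
    )

    # Gated MLP: gate and up are d x f, down is f x d.
    act_flops = 4 * T * f  # assumption: ~4 FLOPs/element for SiLU plus the gating multiply
    ffn_flops = 3 * (2 * T * d * f) + act_flops
    ffn_bytes = 3 * d * f * bw + 2 * T * d * ba  # assumes the T x f intermediate stays on-chip

    # Two RMSNorms (read x, write y, read weight) and two residual adds (read 2, write 1).
    norm_flops = 2 * (4 * T * d) + 2 * (T * d)  # assumption: ~4 FLOPs/element per RMSNorm
    norm_bytes = 2 * (2 * T * d * ba + d * ba) + 2 * (3 * T * d * ba)

    return {
        "Attention": (proj_flops + score_flops + mix_flops, attn_bytes),
        "FFN / MLP": (ffn_flops, ffn_bytes),
        "Norm + Residual": (norm_flops, norm_bytes),
    }


# Hypothetical 8B-class shapes; one decode step against a 4096-token context.
cfg = LayerConfig(hidden=4096, n_heads=32, n_kv_heads=8, ffn_hidden=14336)
for name, (flops, nbytes) in layer_ops(cfg, tokens=1, context=4096).items():
    print(f"{name:16s} FLOPs={flops:.3e}  bytes={nbytes:.3e}  AI={flops / nbytes:.2f}")
```

In a decode step each weight is read once and used for about 2 FLOPs (one multiply-add), so with 2-byte weights the FFN lands near 1 FLOP/byte, far below typical accelerator ridge points. Processing more tokens per step (prefill or larger batches) amortizes those weight reads and raises arithmetic intensity. Shrinking `bytes_per_weight` through quantization raises it for the weight-dominated rows.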