Mabble Rabble
random ramblings & thunderous tidbits
Fundamental Methods of Prediction Speed-Ups
There are four fundamental ways to speed up prediction and reduce the memory footprint of transformer models:
Knowledge Distillation
Quantization
Pruning
Graph Optimization
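
To make two of the items above more concrete, here is a minimal sketch of quantization and pruning applied to a plain list of weights. It is a toy illustration, not how any particular library does it: the function names (`quantize_int8`, `dequantize`, `magnitude_prune`) and the sample weights are made up for this example, and it assumes symmetric per-tensor int8 quantization and simple magnitude-based pruning.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # One scale for the whole tensor; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the integer representation."""
    return [v * scale for v in quantized]


def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)
    smallest = sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:k]
    drop = set(smallest)
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


# Hypothetical weights, chosen only for illustration.
weights = [0.42, -1.27, 0.03, 0.9, -0.004, 0.31]

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
pruned = magnitude_prune(weights, sparsity=0.5)

print("int8 values:", q)
print("restored   :", restored)
print("pruned     :", pruned)
```

Quantization stores each weight in one byte instead of four, at the cost of a rounding error of at most half the scale per weight. Pruning keeps the original precision but sets the least important weights to zero, which only pays off when the storage format or the runtime can exploit the sparsity.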