Related documents: Knowledge Distillation vs. Pretraining from Scratch under a Fixed (Computation) Budget (conference proceeding)