
Model Compression Paper Collection

awesome-model-compression

This repository collects papers (mainly from arxiv.org) about model compression, covering the following categories (a short distillation sketch follows the list):
Structure;
Distillation;
Binarization;
Quantization;
Pruning;
Low-Rank Approximation.
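
To make these categories concrete, here is a minimal sketch of Hinton-style knowledge distillation in PyTorch. The names (`student_logits`, `teacher_logits`, temperature `T`, mixing weight `alpha`) are illustrative, not taken from any specific paper in this list:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Hinton-style distillation: blend soft teacher targets with hard labels.

    The KL term is scaled by T*T so its gradients stay comparable in
    magnitude to the hard-label term as the temperature changes.
    """
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```

Higher temperatures soften the teacher's distribution and expose more of its "dark knowledge"; values of T from the low single digits to around 10 are commonly used.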

Some papers and links were also collected from other awesome resources.


1990

  • 【Pruning】 LeCun Y., Denker J. S., Solla S. A. Optimal Brain Damage. In Advances in Neural Information Processing Systems, 1990: 598–605.
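
Optimal Brain Damage scores each weight with a diagonal second-order saliency, s_k ≈ ½·H_kk·w_k², and removes the least salient ones. A minimal NumPy sketch of that pruning step, assuming the diagonal Hessian `h_diag` has already been estimated elsewhere (e.g. by an extra backprop pass, as in the paper):

```python
import numpy as np

def obd_prune(weights, h_diag, prune_frac=0.3):
    """Zero out the fraction of weights with the lowest OBD saliency.

    saliency_k = 0.5 * H_kk * w_k^2  (diagonal-Hessian approximation)
    """
    saliency = 0.5 * h_diag * weights ** 2
    k = int(prune_frac * weights.size)
    drop = np.argsort(saliency)[:k]  # indices of the k least-salient weights
    pruned = weights.copy()
    pruned[drop] = 0.0
    return pruned
```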

1993

  • 【Pruning】 Hassibi B., Stork D. G. Second Order Derivatives for Network Pruning: Optimal Brain Surgeon. In Advances in Neural Information Processing Systems, 1993 (a minimal sketch follows this list);
  • Holi J. L., Hwang J. N. Finite Precision Error Analysis of Neural Network Hardware Implementations. In IJCNN-91-Seattle International Joint Conference on Neural Networks, pages 519–525, vol. 1, 1993.
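
Optimal Brain Surgeon refines OBD by using the full inverse Hessian: it removes the weight with the smallest saliency w_q² / (2·[H⁻¹]_qq) and applies a compensating update to all surviving weights. A minimal NumPy sketch of one pruning step, assuming the inverse Hessian `h_inv` is given:

```python
import numpy as np

def obs_prune_one(weights, h_inv):
    """Prune the single least-salient weight, Optimal-Brain-Surgeon style.

    saliency_q = w_q^2 / (2 * [H^-1]_qq); the survivors receive the
    compensating update dw = -(w_q / [H^-1]_qq) * H^-1[:, q].
    """
    saliency = weights ** 2 / (2.0 * np.diag(h_inv))
    q = int(np.argmin(saliency))
    updated = weights - (weights[q] / h_inv[q, q]) * h_inv[:, q]
    updated[q] = 0.0  # analytically zero already; set exactly to absorb float error
    return updated, q
```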

2012

  • Hammerstrom D. A VLSI Architecture for High-Performance, Low-Cost, On-Chip Learning. In IJCNN International Joint Conference on Neural Networks, pages 537–544, vol. 2, 2012.


Projects

  • NVIDIA TensorRT:  Programmable Inference Accelerator;  
  • Tencent/PocketFlow:  An Automatic Model Compression (AutoMC) framework for developing smaller and faster AI applications;
  • dmlc/tvm:  Open deep learning compiler stack for CPU, GPU and specialized accelerators;
  • Tencent/ncnn:  ncnn is a high-performance neural network inference framework optimized for the mobile platform;
  • pytorch/glow:  Compiler for Neural Network hardware accelerators;
  • NervanaSystems/neon:  Intel® Nervana™ reference deep learning framework committed to best performance on all hardware;
  • NervanaSystems/distiller:  Neural Network Distiller by Intel AI Lab: a Python package for neural network compression research;
  • OAID/Tengine:  Tengine is a lightweight, high-performance, modular inference engine for embedded devices;
  • fpeder/espresso:  Efficient forward propagation for BCNNs;
  • TensorFlow Lite:  TensorFlow Lite is an open-source deep learning framework for on-device inference;
  • Core ML:  Reduce the storage used by the Core ML model inside your app bundle;
  • pytorch-tensor-decompositions:  PyTorch implementation of [1412.6553] and [1511.06530] tensor decomposition methods for convolutional layers;
  • tensorflow/quantize:  quantization-aware training tooling for TensorFlow graphs;
  • mxnet/quantization:  This folder contains examples of quantizing an FP32 model with Intel® MKL-DNN or cuDNN (a minimal quantization sketch follows this list).
  • TensoRT4-Example:  
  • NAF-tensorflow:  "Continuous Deep Q-Learning with Model-based Acceleration" in TensorFlow;
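
As a rough illustration of what the quantization entries above implement, here is a minimal post-training symmetric int8 quantization sketch in NumPy. It uses per-tensor max-abs scaling only; real toolchains add calibration, per-channel scales, and fused integer kernels:

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor int8 quantization: q = round(x / scale)."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map int8 codes back to float32 (the rounding error is the quantization loss)."""
    return q.astype(np.float32) * scale

w = np.random.randn(64, 64).astype(np.float32)
q, s = quantize_int8(w)
print("max abs error:", np.abs(dequantize(q, s) - w).max())
```

Max-abs scaling is the crudest calibration; percentile or entropy-based (KL) calibration, as offered by TensorRT, usually preserves more accuracy.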

Others