This repository collects papers (mainly from arxiv.org) on model compression: Structure; Distillation; Binarization; Quantization; Pruning; Low Rank Approximation.
In addition, some papers and links are collected from the resources below; they are all awesome resources:
J. L. Holi and J. N. Hwang. [Finite precision error analysis of neural network hardware implementations]. In IJCNN-91-Seattle International Joint Conference on Neural Networks, pages 519–525 vol. 1, 1993.
【Quantization】Jegou, Herve, Matthijs Douze, and Cordelia Schmid. Product quantization for nearest neighbor search. IEEE Transactions on Pattern Analysis and Machine Intelligence 33.1 (2011): 117-128.
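A minimal sketch of the product quantization idea from the entry above: split each vector into subvectors, learn a small k-means codebook per subspace, and store only the codebook indices. numpy and scikit-learn are assumed; the subvector count and codebook size are illustrative, not the paper's settings.

```python
import numpy as np
from sklearn.cluster import KMeans

def pq_train(X, n_subvectors=4, n_codewords=16):
    """Learn one k-means codebook per subvector slice of X."""
    d = X.shape[1] // n_subvectors
    return [KMeans(n_clusters=n_codewords, n_init=10).fit(X[:, i * d:(i + 1) * d])
            for i in range(n_subvectors)]

def pq_encode(X, codebooks):
    """Replace each subvector by the index of its nearest codeword."""
    d = X.shape[1] // len(codebooks)
    return np.stack([cb.predict(X[:, i * d:(i + 1) * d])
                     for i, cb in enumerate(codebooks)], axis=1)

def pq_decode(codes, codebooks):
    """Reconstruct approximate vectors by concatenating the selected codewords."""
    return np.hstack([cb.cluster_centers_[codes[:, i]]
                      for i, cb in enumerate(codebooks)])

# Toy example: 1000 random 32-dim vectors compressed to 4 small codes each.
X = np.random.randn(1000, 32).astype(np.float32)
codebooks = pq_train(X)
codes = pq_encode(X, codebooks)
X_hat = pq_decode(codes, codebooks)
print("reconstruction MSE:", float(np.mean((X - X_hat) ** 2)))
```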
2012
D. Hammerstrom. [A VLSI architecture for high-performance, low-cost, on-chip learning]. In IJCNN International Joint Conference on Neural Networks, pages 537–544 vol. 2, 2012.
2013
M. Denil, B. Shakibi, L. Dinh, N. de Freitas, et al. Predicting parameters in deep learning. In Advances in Neural Information Processing Systems, pages 2148–2156, 2013.
2014
K. Hwang and W. Sung. [Fixed-point feedforward deep neural network design using weights +1, 0, and −1]. In 2014 IEEE Workshop on Signal Processing Systems (SiPS), pages 1–6. IEEE, 2014.
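As a rough companion to the entry above, a generic sketch of mapping float weights to {+1, 0, −1} with a symmetric magnitude threshold (the threshold rule is illustrative, not necessarily the exact quantizer used in the paper):

```python
import numpy as np

def ternarize(w, threshold_ratio=0.7):
    """Map a float weight tensor to {-1, 0, +1}; small-magnitude weights become 0."""
    delta = threshold_ratio * np.mean(np.abs(w))  # per-tensor threshold
    q = np.zeros_like(w)
    q[w > delta] = 1.0
    q[w < -delta] = -1.0
    return q

w = np.random.randn(4, 4).astype(np.float32)
print(ternarize(w))
```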
Y. Chen, N. Sun, O. Temam, T. Luo, S. Liu, S. Zhang, L. He, J. Wang, L. Li, and T. Chen. DaDianNao: A machine-learning supercomputer. In IEEE/ACM International Symposium on Microarchitecture, pages 609–622, 2014.
【Distillation】Hinton, Geoffrey, Oriol Vinyals, and Jeff Dean. Dark knowledge. [C] Presented as the keynote at BayLearn 2 (2014).
【Distillation】Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta, Yoshua Bengio. FitNets: Hints for Thin Deep Nets. [J] arXiv preprint arXiv:1412.6550.
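For the two distillation entries above, a minimal sketch of the standard soft-target distillation loss: temperature-scaled KL divergence between teacher and student logits, mixed with the usual hard-label cross-entropy. PyTorch is assumed, and the temperature and mixing weight are illustrative defaults.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Soft-target KD loss with temperature T plus hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # T^2 rescaling keeps soft-target gradients comparable to the hard loss
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Toy usage with random logits: batch of 8 examples, 10 classes.
s = torch.randn(8, 10, requires_grad=True)
t = torch.randn(8, 10)
y = torch.randint(0, 10, (8,))
loss = distillation_loss(s, t, y)
loss.backward()  # gradients flow only into the student logits
```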
2015
Zhang et al. [Optimizing FPGA-based accelerator design for deep convolutional neural networks]. In Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, FPGA '15, 2015.
【Low Rank Approximation】 Yang Z, Moczulski M, Denil M, et al. Deep fried convnets .[C]//Proceedings of the IEEE International Conference on Computer Vision. 2015: 1476-1483.
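To illustrate the Low Rank Approximation category (a generic truncated-SVD factorization of a dense layer, not the specific transform used in the paper above):

```python
import numpy as np

def low_rank_factorize(W, rank):
    """Factor a dense weight W (out x in) into thin matrices A (out x r) and B (r x in)."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]  # absorb singular values into the left factor
    B = Vt[:rank, :]
    return A, B

W = np.random.randn(256, 512).astype(np.float32)
A, B = low_rank_factorize(W, rank=32)
# 256*32 + 32*512 = 24576 parameters in the factors vs. 131072 in W.
print("relative error:", float(np.linalg.norm(W - A @ B) / np.linalg.norm(W)))
```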
2016
V. Lebedev and V. Lempitsky. Fast convnets using group-wise brain damage. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2554–2564, 2016.
S. Liu, Z. Du, J. Tao, D. Han, T. Luo, Y. Xie, Y. Chen, and T. Chen. [Cambricon: An instruction set architecture for neural networks]. SIGARCH Comput. Archit. News, 44(3), June 2016.
K. Kim, J. Kim, J. Yu, J. Seo, J. Lee, and K. Choi. [Dynamic energy-accuracy trade-off using stochastic computing in deep neural networks]. In Design Automation Conference, page 124, 2016.
H. Sharma, J. Park, D. Mahajan, E. Amaro, J. K. Kim, C. Shao, A. Mishra, and H. Esmaeilzadeh. From high-level deep neural models to FPGAs. In IEEE/ACM International Symposium on Microarchitecture, pages 1–12, 2016.
【Structure】Song Han, Jeff Pool, Sharan Narang, Huizi Mao, Enhao Gong, Shijian Tang, Erich Elsen, Peter Vajda, Manohar Paluri, John Tran, Bryan Catanzaro, William J. Dally .DSD: Dense-Sparse-Dense Training for Deep Neural Networks .[J] arXiv preprint arXiv:1607.04381
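Several entries here (the DSD paper above, and the pruning papers) build on magnitude-based sparsification; below is a minimal sketch of the prune step, with an illustrative sparsity level. In DSD-style training the mask is enforced during the sparse phase and then dropped so all weights update again in the final dense phase.

```python
import numpy as np

def magnitude_prune(w, sparsity=0.5):
    """Zero out the smallest-magnitude fraction of weights; return pruned weights and mask."""
    k = int(sparsity * w.size)
    threshold = np.sort(np.abs(w), axis=None)[k]  # k-th smallest magnitude
    mask = (np.abs(w) >= threshold).astype(w.dtype)
    return w * mask, mask

w = np.random.randn(64, 64).astype(np.float32)
w_sparse, mask = magnitude_prune(w, sparsity=0.5)
print("remaining nonzeros:", int(mask.sum()), "of", w.size)
```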
2017
D. Nguyen, D. Kim, and J. Lee. [Double MAC: doubling the performance of convolutional neural networks on modern FPGAs]. In Design, Automation and Test in Europe Conference and Exhibition, DATE 2017, Lausanne, Switzerland, March 27-31, 2017, pages 890–893, 2017.
E. H. Lee et al. [LogNet: Energy-efficient neural networks using logarithmic computation]. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 5900–5904, 2017.
H. Sim and J. Lee. [A new stochastic computing multiplier with application to deep convolutional neural networks]. In Design Automation Conference, page 29, 2017.
J. H. Ko, B. Mudassar, T. Na, and S. Mukhopadhyay. [Design of an energy-efficient accelerator for training of convolutional neural networks using frequency-domain computation]. In Design Automation Conference, page 59, 2017.
L. Chen, J. Li, Y. Chen, Q. Deng, J. Shen, X. Liang, and L. Jiang. [Accelerator-friendly neural-network training: Learning variations and defects in RRAM crossbar]. In Design, Automation and Test in Europe Conference and Exhibition, pages 19–24, 2017.
M. Price, J. Glass, and A. P. Chandrakasan. [14.4 a scalable speech recognizer with deep-neural-network acoustic models and voice-activated power gating.] In Solid-State Circuits Conference, pages 244–245, 2017.
P. Wang and J. Cheng. Fixed-point factorized networks. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.
S. Venkataramani, A. Ranjan, S. Banerjee, D. Das, S. Avancha, A. Jagannathan, A. Durg, D. Nagaraj, B. Kaul, P. Dubey, and A. Raghunathan. [Scaledeep: A scalable compute architecture for learning and evaluating deep networks]. SIGARCH Comput. Archit. News, 45(2):13–26, June 2017.
Y. Ma, M. Kim, Y. Cao, S. Vrudhula, and J. S. Seo. [End-to-end scalable FPGA accelerator for deep residual networks]. In IEEE International Symposium on Circuits and Systems, pages 1–4, 2017.
Y. Ma, Y. Cao, S. Vrudhula, and J. S. Seo. [An automatic RTL compiler for high-throughput FPGA implementation of diverse deep convolutional neural networks]. In International Conference on Field Programmable Logic and Applications, pages 1–8, 2017.
2018
G. Li, F. Li, T. Zhao, and J. Cheng. [Block convolution: Towards memory-efficient inference of large-scale CNNs on FPGA]. In Design, Automation and Test in Europe, 2018.
Q. Hu, P. Wang, and J. Cheng. From hashing to CNNs: Training binary weight networks via hashing. In AAAI, February 2018.
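For the Binarization category (the entry above trains binary weights via hashing; the sketch below is the generic sign-plus-scale binarization instead, labelled as such):

```python
import numpy as np

def binarize(w):
    """Sign binarization with a per-tensor scale; mean |w| minimizes the L2 error of sign codes."""
    alpha = np.mean(np.abs(w))
    return alpha * np.sign(w), alpha

w = np.random.randn(3, 3).astype(np.float32)
w_bin, alpha = binarize(w)
print(alpha)
print(w_bin)
```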
J. Cheng, J. Wu, C. Leng, Y. Wang, and Q. Hu. [Quantized CNN: A unified approach to accelerate and compress convolutional networks]. IEEE Transactions on Neural Networks and Learning Systems (TNNLS), PP:1–14.
2019
Wojciech Marian Czarnecki, Razvan Pascanu, Simon Osindero, Siddhant M. Jayakumar, Grzegorz Swirszcz, Max Jaderberg .Distilling Policy Distillation .[J] arXiv preprint arXiv:1902.02186.
Stefan Uhlich, Lukas Mauch, Kazuki Yoshiyama, Fabien Cardinaux, Javier Alonso Garcia, Stephen Tiedemann, Thomas Kemp, Akira Nakamura .Differentiable Quantization of Deep Neural Networks .[J] arXiv preprint arXiv:1905.11452.
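As a generic companion to the quantization-training entries above, a straight-through-estimator sketch of uniform weight quantization (PyTorch assumed; this is not the particular parameterization proposed in the paper above):

```python
import torch

class UniformQuantizeSTE(torch.autograd.Function):
    """Round to a uniform grid in forward; pass gradients straight through in backward."""
    @staticmethod
    def forward(ctx, x, n_bits=4):
        levels = 2 ** n_bits - 1
        scale = x.abs().max().clamp(min=1e-8) / (levels / 2)
        return torch.round(x / scale).clamp(-(levels // 2), levels // 2) * scale

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output, None  # identity gradient w.r.t. x, none for n_bits

w = torch.randn(4, 4, requires_grad=True)
w_q = UniformQuantizeSTE.apply(w, 4)
w_q.sum().backward()  # gradients reach the full-precision weights despite rounding
print(w.grad)
```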
Simon Wiedemann, Heiner Kirchhoffer, Stefan Matlage, Paul Haase, Arturo Marban, Talmaj Marinc, David Neumann, Tung Nguyen, Ahmed Osman, Detlev Marpe, Heiko Schwarz, Thomas Wiegand, Wojciech Samek .DeepCABAC: A Universal Compression Algorithm for Deep Neural Networks .[J] arXiv preprint arXiv:1907.11900.
Udit Gupta, Brandon Reagen, Lillian Pentecost, Marco Donato, Thierry Tambe, Alexander M. Rush, Gu-Yeon Wei, David Brooks .MASR: A Modular Accelerator for Sparse RNNs .[J] arXiv preprint arXiv:1908.08976.