Large pretrained transformer models have revolutionized modern AI applications with their state-of-the-art performance in natural language processing (NLP). However, their substantial parameter count poses challenges for real-world deployment. To address this, researchers often reduce model size by pruning parameters based on their magnitude or sensitivity. Previous research has demonstrated the limitations of magnitude pruning, especially in the context of transfer learning for modern NLP tasks. In this paper, we introduce a new magnitude-based pruning algorithm called mixture Gaussian prior pruning (MGPP), which employs a mixture Gaussian prior for regularization. MGPP prunes non-expressive weights under the guidance of the mixture Gaussian prior, aiming to retain the model’s expressive capability. Extensive evaluations across various NLP tasks, including natural language understanding, question answering, and natural language generation, demonstrate the superiority of MGPP over existing pruning methods, particularly in high sparsity settings. Additionally, we provide a theoretical justification for the consistency of the sparse transformer, shedding light on the effectiveness of the proposed pruning method.
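To make the idea concrete, the following is a minimal sketch, not the MGPP algorithm itself. It assumes the mixture Gaussian prior is a two-component, zero-mean Gaussian mixture with one narrow and one wide component; the abstract does not state this form. The function names and the hyperparameters `lam`, `sigma0`, and `sigma1` are hypothetical. The sketch shows a regularization penalty derived from such a prior, followed by plain magnitude pruning at a target sparsity.

```python
import math


def mixture_gaussian_neg_log_prior(w, lam=0.1, sigma0=0.01, sigma1=1.0):
    """Negative log density of an ASSUMED two-component zero-mean Gaussian mixture.

    lam    : weight of the wide component (hypothetical hyperparameter)
    sigma0 : std. dev. of the narrow component, which pulls weights toward zero
    sigma1 : std. dev. of the wide component, which leaves large weights nearly free
    """
    def normal_pdf(x, sigma):
        return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

    density = lam * normal_pdf(w, sigma1) + (1.0 - lam) * normal_pdf(w, sigma0)
    return -math.log(density)


def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude fraction set to zero."""
    n_prune = int(len(weights) * sparsity)
    pruned = list(weights)
    # Indices sorted by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


weights = [0.5, -0.01, 0.02, -1.2]
# Penalty that would be added to the training loss under the assumed prior.
penalty = sum(mixture_gaussian_neg_log_prior(w) for w in weights)
pruned = magnitude_prune(weights, sparsity=0.5)
```

Under this assumed prior, the penalty is lowest for weights near zero, so training would tend to separate weights into a near-zero group and a free group before the magnitude threshold is applied.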
Abstract: The change point problem has been studied extensively since the 1950s because of its broad applications in fields such as finance and biology. As a special case of the multiple change point problem, the epidemic change point problem has received considerable attention, especially in medical studies. In this paper, a nonparametric method based on the empirical likelihood is proposed to detect epidemic changes in the mean occurring at unknown change points. Under some mild conditions, the asymptotic null distribution of the empirical likelihood ratio test statistic is proved to be an extreme value distribution. The consistency of the test is also proved. Simulations indicate that the test performs comparably to other available tests while imposing fewer constraints on the data distribution. The method is applied to the Stanford heart transplant data and successfully detects the change points.
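The epidemic alternative can be illustrated with a short sketch: the mean shifts on an unknown interval and then returns to its original level. The code below is not the empirical likelihood ratio test proposed in the paper. It substitutes a simpler weighted mean-difference scan over all candidate intervals, purely to show the structure of the search. The function name `epidemic_scan` and the parameter `min_len` are hypothetical.

```python
import math


def epidemic_scan(x, min_len=2):
    """Scan all intervals [k1, k2) for an epidemic shift in the mean.

    Returns (statistic, k1, k2), where x[k1:k2] is the candidate epidemic
    segment (0-based, half-open). The statistic is a weighted absolute
    difference between the inside and outside means; it is a stand-in for,
    not an implementation of, the empirical likelihood ratio.
    """
    n = len(x)
    # Prefix sums give each segment sum in constant time.
    prefix = [0.0]
    for v in x:
        prefix.append(prefix[-1] + v)
    total = prefix[-1]

    best = (0.0, None, None)
    for k1 in range(0, n - min_len + 1):
        for k2 in range(k1 + min_len, n + 1):
            m = k2 - k1
            if n - m < min_len:
                continue  # too few points outside the segment
            inside_sum = prefix[k2] - prefix[k1]
            inside_mean = inside_sum / m
            outside_mean = (total - inside_sum) / (n - m)
            stat = abs(inside_mean - outside_mean) * math.sqrt(m * (n - m) / n)
            if stat > best[0]:
                best = (stat, k1, k2)
    return best


# Noise-free toy series: the mean is 0, shifts to 5 on indices 10..14, then returns to 0.
series = [0.0] * 10 + [5.0] * 5 + [0.0] * 10
stat, k1, k2 = epidemic_scan(series)
```

On this toy series the scan recovers the segment boundaries `k1 = 10` and `k2 = 15`. With real data the statistic would also need a variance estimate and a critical value, which this sketch omits.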