An Empirical Comparison of BERT and Lightweight Variants for IMDb Sentiment Classification

Authors

  • Xuanyu Wu

DOI:

https://doi.org/10.61173/kjazw478

Keywords:

Sentiment analysis, Transformer-based models, Model compression, IMDb movie reviews

Abstract

Pre-trained transformer models are now the most common choice for sentiment analysis and other text classification tasks. However, their large parameter counts and high inference cost make them hard to deploy in resource-constrained settings. To address this problem, several lightweight variants have been introduced. This paper presents a controlled empirical study of four transformer-based models - BERT-base, DistilBERT, TinyBERT, and ALBERT-base - on the IMDb movie review sentiment classification task. All models are fine-tuned under the same conditions. Besides standard metrics, total and per-sample inference time are measured as well. To better understand stability, the test set is divided into short and long review subsets based on word count, and model performance is compared across the subsets. Results show an accuracy-efficiency trade-off: BERT-base achieves the highest test accuracy (86.3%), followed by ALBERT (85.7%), while DistilBERT is lower (85.2%) but offers roughly twice the inference speed. TinyBERT is the fastest but least accurate (81.3%). Across all four models, performance on short reviews is higher than on long reviews, and the drop from short to long texts is most severe for the smallest model, TinyBERT, suggesting that highly compressed models struggle more with long and complex texts. The paper discusses the implications of these findings for selecting sentiment analysis models under different application requirements.
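The evaluation protocol the abstract describes - splitting the test set into short and long subsets by word count, and timing total and per-sample inference - can be sketched as follows. This is a minimal illustration, not the authors' code: the 128-word split threshold and the `predict_fn` interface are assumptions, since the abstract does not specify them.

```python
import time

def split_by_length(reviews, threshold=128):
    """Split reviews into short/long subsets by word count.
    The 128-word threshold is a hypothetical choice for illustration."""
    short = [r for r in reviews if len(r.split()) <= threshold]
    long_ = [r for r in reviews if len(r.split()) > threshold]
    return short, long_

def measure_inference(predict_fn, reviews):
    """Run predict_fn (any single-review classifier) over all reviews,
    returning total wall-clock time and mean per-sample latency."""
    start = time.perf_counter()
    for review in reviews:
        predict_fn(review)
    total = time.perf_counter() - start
    return total, total / len(reviews)

# Usage with a stand-in classifier (a fine-tuned model would go here):
reviews = ["great film", " ".join(["word"] * 200)]
short, long_ = split_by_length(reviews)
total, per_sample = measure_inference(lambda r: len(r) > 5, short)
```

With a real model, `predict_fn` would wrap tokenization and a forward pass; comparing `per_sample` across BERT-base, DistilBERT, TinyBERT, and ALBERT-base on each subset yields the accuracy-efficiency and short/long comparisons reported above.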


Published

2026-02-28

Issue

Section

Articles