Limitations of Support Vector Machine and Random Forest in Multi-Class Sentiment Analysis: Evidence from Neutral Sentiment Misclassification on Imbalanced Data

Ardy Wicaksono; Suyahman Suyahman; Muhammad Anwar Fauzi; Deny Prasetyo; Dwi Utari Iswavigra; Yulaikha Mar'atullatifah; Agatha Pricillia Sekar Tamtomo; Muhammad Adi Pratama

doi:10.54082/jiki.323

Authors

Ardy Wicaksono, Computer Science, Universitas Sugeng Hartono, Indonesia (ORCID: https://orcid.org/0000-0003-3418-2271)
Suyahman Suyahman, Computer Science, Universitas Sugeng Hartono, Indonesia
Muhammad Anwar Fauzi, Digital Business, Universitas Sugeng Hartono, Indonesia
Deny Prasetyo, Computer Science, Universitas Sugeng Hartono, Indonesia
Dwi Utari Iswavigra, Computer Science, Universitas Sugeng Hartono, Indonesia
Yulaikha Mar'atullatifah, Computer Science, Universitas Sugeng Hartono, Indonesia
Agatha Pricillia Sekar Tamtomo, Digital Business, Universitas Sugeng Hartono, Indonesia
Muhammad Adi Pratama, English Language and Culture, Universitas Sugeng Hartono, Indonesia

Keywords:

Machine Learning, Sentiment Analysis, Support Vector Machine, Random Forest, Text Classification, Imbalanced Data

Abstract

The rapid growth of mobile applications has generated large volumes of user reviews, making automated sentiment analysis essential for understanding user perceptions. Previous studies have shown that while machine learning models perform well in binary sentiment classification, they often struggle in multi-class settings, particularly in identifying neutral sentiment due to linguistic ambiguity and class imbalance. This study aims to comparatively evaluate the performance of Support Vector Machine (SVM) and Random Forest in multi-class sentiment analysis, with a specific focus on their ability to handle the neutral sentiment category. A supervised learning approach was employed using 2,112 Indonesian-language user reviews collected from the Google Play Store. The data were preprocessed using standard Natural Language Processing techniques and represented using TF-IDF features. Both models were trained and evaluated using accuracy, precision, recall, F1-score, and confusion matrices. The results indicate that SVM achieved an accuracy of 86.52%, outperforming Random Forest, which obtained 83.45%. However, both models completely failed to classify the neutral sentiment class, yielding zero precision and recall for this category. This failure highlights the dominant influence of severe class imbalance and insufficient feature discrimination for neutral sentiment. The findings underscore a critical limitation of traditional machine learning approaches in multi-class sentiment analysis and emphasize the need for improved strategies, such as data resampling, advanced feature representation, or hybrid models, to enhance neutral sentiment detection in real-world applications.
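The gap between high overall accuracy and a completely missed neutral class can be illustrated with a minimal Python sketch. It computes accuracy, per-class precision, recall, and F1-score, and macro-averaged F1 from a three-class confusion matrix. The counts in `CONFUSION`, the label order, and the helper names `per_class_metrics`, `accuracy`, and `macro_f1` are assumptions made for illustration only; they are not the study's data, results, or code.

```python
from typing import Dict, List

LABELS = ["negative", "neutral", "positive"]

# Hypothetical confusion matrix: rows are true classes, columns are predictions.
# These counts are invented for illustration and are NOT the study's results.
CONFUSION = [
    [150, 0, 30],   # true negative
    [12, 0, 18],    # true neutral: never predicted as neutral
    [20, 0, 190],   # true positive
]


def per_class_metrics(matrix: List[List[int]], labels: List[str]) -> Dict[str, Dict[str, float]]:
    """Return precision, recall, and F1 for each class of a square confusion matrix."""
    metrics = {}
    for i, label in enumerate(labels):
        true_positive = matrix[i][i]
        predicted_total = sum(row[i] for row in matrix)  # column sum
        actual_total = sum(matrix[i])                    # row sum
        # Report 0.0 when a class is never predicted (or never occurs).
        precision = true_positive / predicted_total if predicted_total else 0.0
        recall = true_positive / actual_total if actual_total else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


def accuracy(matrix: List[List[int]]) -> float:
    """Share of all samples that lie on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def macro_f1(metrics: Dict[str, Dict[str, float]]) -> float:
    """Unweighted mean of per-class F1, so each class counts equally."""
    return sum(m["f1"] for m in metrics.values()) / len(metrics)


metrics = per_class_metrics(CONFUSION, LABELS)
print(f"accuracy: {accuracy(CONFUSION):.4f}")
print(f"macro F1: {macro_f1(metrics):.4f}")
for label in LABELS:
    m = metrics[label]
    print(f"{label:>8}: precision={m['precision']:.4f} recall={m['recall']:.4f} f1={m['f1']:.4f}")
```

In this made-up matrix the neutral class holds only 30 of 420 samples, so a classifier that never predicts it still reaches an accuracy above 80%, while the macro-averaged F1 drops sharply because the neutral class contributes zero. This is why per-class metrics and the confusion matrix, rather than accuracy alone, are needed to expose this kind of failure on imbalanced data.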
