Harnessing Ensemble Techniques for Sentiment Analysis and Toxic Comment Classification
DOI: https://doi.org/10.47392/IRJAEH.2025.0119
Keywords: Machine learning, Natural language processing, Multi-label classification, FastText, Ensemble learning, Sentiment analysis, Toxic comment classification
Abstract
The rapid growth of user-generated content on digital platforms has raised concerns about toxic comments, which can disrupt online interactions. Sentiment analysis and toxic comment classification play a crucial role in moderating such content; however, traditional models often struggle with class imbalance, contextual ambiguity, and linguistic complexity, which leads to inaccurate predictions. Although machine learning and deep learning models have been widely applied, individual models frequently fail to generalize across diverse comment structures and sentiments. This research introduces FusionBoost, an ensemble learning approach that combines Logistic Regression (LR) and XGBoost to exploit their complementary strengths and improve predictive performance. The text is preprocessed with tokenization and stopword removal, and FastText embeddings provide the feature representation. Experimental results indicate that FusionBoost outperforms the individual classifiers, significantly reducing false negatives in toxicity detection and improving sentiment classification accuracy. The study underscores the effectiveness of ensemble learning in addressing contextual challenges and enhancing model interpretability. Future research may explore transformer-based architectures such as BERT to further improve classification performance. This work contributes to the development of more robust and interpretable natural language processing (NLP) models, facilitating safer and more meaningful digital interactions.
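A minimal sketch of the pipeline the abstract outlines, under stated assumptions. The abstract does not specify how FusionBoost combines LR and XGBoost, so this assumes soft voting (a weighted average of per-label probabilities, each label thresholded independently for multi-label output). The stopword list, label names, weight, threshold, and the toy vector table standing in for FastText embeddings are all hypothetical. The trained LR and XGBoost models are not shown; `fuse` simply takes their per-label probabilities as input. Note that real FastText can build vectors for unseen words from subword n-grams, whereas the toy lookup here just skips unknown tokens.

```python
import re
from typing import Dict, List, Sequence, Tuple

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"a", "an", "and", "are", "is", "of", "the", "this", "to", "you"}


def preprocess(text: str) -> List[str]:
    """Lowercase, split into word tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def embed(tokens: Sequence[str], vectors: Dict[str, List[float]], dim: int) -> List[float]:
    """Average the vectors of tokens found in `vectors` (a stand-in for FastText).

    Unknown tokens are skipped; a comment with no known tokens maps to zeros.
    """
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def fuse(p_lr: Sequence[float], p_xgb: Sequence[float],
         weight: float = 0.5, threshold: float = 0.5) -> Tuple[List[float], List[int]]:
    """Assumed soft-voting rule: weighted average of two models' per-label probabilities.

    Each label is thresholded independently, giving a multi-label prediction.
    """
    if len(p_lr) != len(p_xgb):
        raise ValueError("both models must score the same labels")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    fused = [weight * a + (1.0 - weight) * b for a, b in zip(p_lr, p_xgb)]
    return fused, [int(p >= threshold) for p in fused]


if __name__ == "__main__":
    tokens = preprocess("You are an idiot, and this is stupid!")
    print(tokens)  # ['idiot', 'stupid']
    toy_vectors = {"idiot": [1.0, 0.0], "stupid": [0.0, 1.0]}  # hypothetical 2-d vectors
    print(embed(tokens, toy_vectors, dim=2))  # [0.5, 0.5]
    labels = ["toxic", "insult", "threat"]  # hypothetical label set
    # Hypothetical per-label probabilities from already-trained LR and XGBoost models.
    scores, decisions = fuse([0.9, 0.2, 0.4], [0.7, 0.1, 0.7])
    print(dict(zip(labels, decisions)))  # {'toxic': 1, 'insult': 0, 'threat': 1}
```

With `weight=0.5` both models count equally. Lowering `threshold` for a label is one way to trade some precision for fewer false negatives, the error type the abstract emphasizes for toxicity detection.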
License
Copyright (c) 2025 International Research Journal on Advanced Engineering Hub (IRJAEH)

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.