A Comprehensive Review on Advances in Large Language Models for Topic Modeling
DOI: https://doi.org/10.47392/IRJAEH.2026.0011

Keywords: LLM, Topic Modeling, BERT, Text Summarization

Abstract
Topic Modeling (TM) is an unsupervised machine learning (ML) approach that extracts latent topics, represented as patterns of words, from large text datasets to support semantic text summarization. TM plays a vital role in extracting topics from text corpora across domains including bioinformatics, economics, healthcare, and social media analysis. This literature review analyzes ML-based approaches devised for topic modeling, with particular attention to the chosen baseline methods and to advanced approaches that use Large Language Models (LLMs). The article highlights the potential of integrating LLMs with Bidirectional Encoder Representations from Transformers (BERT), guided by clustering-based approaches, for semantic clustering of topics and their associated documents. A thorough investigation of these TM methods is performed and documented as a source of reference. In addition, the compiled details on evaluation metrics for TM may serve as a ready reference for researchers interested in the field. Finally, the article highlights the limitations of the reviewed works, which should prompt researchers to develop novel methodologies and metrics for efficient TM.
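To illustrate the clustering-based BERT pipeline referred to above, the following minimal Python sketch embeds documents with a BERT-style sentence encoder, clusters the embeddings into topics, and labels each topic with its most frequent terms. The encoder name (all-MiniLM-L6-v2), the toy corpus, and the cluster count are illustrative assumptions, not details taken from the reviewed articles.

```python
# Minimal sketch: clustering-based topic modeling over BERT-style embeddings.
# Model choice, corpus, and number of topics are illustrative assumptions.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "Gene expression analysis in cancer research",
    "Stock market volatility and interest rates",
    "Deep learning for medical image diagnosis",
    "Inflation trends in emerging economies",
]

# 1. Encode documents with a pretrained sentence encoder (BERT-style).
encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice
embeddings = encoder.encode(docs)

# 2. Cluster the document embeddings; each cluster is treated as a topic.
n_topics = 2
labels = KMeans(n_clusters=n_topics, n_init=10, random_state=0).fit_predict(embeddings)

# 3. Describe each topic by the most frequent terms in its documents.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(docs)
vocab = vectorizer.get_feature_names_out()
for topic in range(n_topics):
    rows = np.flatnonzero(labels == topic)
    term_counts = np.asarray(counts[rows].sum(axis=0)).ravel()
    top_terms = vocab[term_counts.argsort()[::-1][:5]]
    print(f"Topic {topic}: {', '.join(top_terms)}")
```

Production systems in this family (for example, BERTopic) typically refine each step, using UMAP for dimensionality reduction, HDBSCAN for density-based clustering, and class-based TF-IDF rather than raw term counts for topic descriptions.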
License
Copyright (c) 2026 International Research Journal on Advanced Engineering Hub (IRJAEH)

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.