Topic Modeling Empowered by a Deep Learning Framework Integrating BERTopic, XLM-R, and GPT
Keywords:
Topic Modeling, Language Modeling, XLM-R, GPT
Abstract
Topic modeling facilitates the identification of hidden themes and patterns in large text collections, enabling a thorough investigation of the messages those texts contain. It is a popular research subject, and several translations, including English and Arabic ones, have already been investigated. However, low-resource languages such as Urdu remain under-researched. In this study, we propose applying the BERTopic, XLM-R, and GPT frameworks to Urdu text. The proposed approach, which uses fine-tuned BERT, XLM-R, and GPT models, aims to capture the contextual nuances and grammatical intricacies of Urdu text. Using existing Urdu textual data, we compared the performance of our proposed approaches with existing techniques such as LDA and NMF on coherence and diversity measures. The results show that our proposed strategy outperforms existing methods, with an average coherence improvement of 0.05 and a diversity score of 0.87. These findings demonstrate the efficacy of the proposed approach in extracting significant topics from Urdu texts, thereby supporting scholarly comparative studies of Urdu translations. Integrating real-time Urdu topic modeling into social media and news monitoring systems can support trend analysis, misinformation detection, and sentiment-aware content moderation. Another practical application is incorporating topic modeling into Urdu search engines and recommendation systems to improve information retrieval for Urdu-speaking users.
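The abstract reports a diversity score of 0.87 without defining the measure. A common definition of topic diversity is the share of unique words among the top-k words of all topics, where 1.0 means no word is repeated across topics. The sketch below assumes that definition and a hypothetical default of k = 25; it is an illustration only, not the authors' evaluation code, and the example topic words are invented toy data.

```python
from typing import List


def topic_diversity(topics: List[List[str]], top_k: int = 25) -> float:
    """Return the fraction of unique words among the top-k words of all topics.

    Assumption: this is the common "unique top words" definition of topic
    diversity; the abstract does not say which diversity measure was used.
    """
    # Collect the top-k words of every topic (topics shorter than k are used whole).
    top_words = [word for topic in topics for word in topic[:top_k]]
    if not top_words:
        raise ValueError("need at least one topic with at least one word")
    # Words shared between topics lower the score; fully distinct topics score 1.0.
    return len(set(top_words)) / len(top_words)


# Hypothetical toy topics: "team" appears in both, so 5 of 6 top words are unique.
toy_topics = [["cricket", "match", "team"], ["election", "party", "team"]]
print(round(topic_diversity(toy_topics), 3))  # 0.833
```

Under this definition, a score of 0.87 would mean that roughly 87% of the top words across all extracted topics are distinct, i.e. the topics overlap little in vocabulary.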
License
This is an open access article published by the Research Center of Computing & Biomedical Informatics (RCBI), Lahore, Pakistan, under the CC BY 4.0 International License.