Survey on Focused Web Crawling Approaches for Ayurvedic Plant Information Retrieval

Authors

  • Sakshi Powale, UG Scholar, Department of AI&DS, Rajiv Gandhi Institute of Tech, Mumbai, Maharashtra, India
  • Asavari Desai, UG Scholar, Department of AI&DS, Rajiv Gandhi Institute of Tech, Mumbai, Maharashtra, India
  • Riya Desai, UG Scholar, Department of AI&DS, Rajiv Gandhi Institute of Tech, Mumbai, Maharashtra, India
  • Swati Sonavane, UG Scholar, Department of AI&DS, Rajiv Gandhi Institute of Tech, Mumbai, Maharashtra, India
  • Dr. Divya Tamma, Assistant Professor, Department of AI&DS, Rajiv Gandhi Institute of Tech, Mumbai, Maharashtra, India

DOI:

https://doi.org/10.47392/IRJAEH.2024.0359

Keywords:

Focused Web Crawling, Deep Learning, TRES Framework, Reinforcement Learning, Convolutional Neural Networks (CNN), Natural Language Processing (NLP)

Abstract

The Ayurvedic medical system relies heavily on medicinal plants, which makes accurate retrieval of plant information from the web essential. This survey examines focused web crawling strategies for collecting useful information about Ayurvedic botanicals, with an emphasis on deep learning methodologies. Crawlers optimized with machine learning models retrieve domain-specific content while filtering out irrelevant data. The paper reviews approaches such as the TRES framework, a reinforcement learning-based crawler that discretizes large state and action spaces in order to select the most promising URLs efficiently. Convolutional neural networks (CNN) and natural language processing (NLP) have also been used in crawlers to improve page classification, as demonstrated by successful applications to Turkish language processing. The paper "Learning to Crawl: Comparing Classification Schemes" compares traditional rule-based approaches with newer deep learning classifiers and demonstrates the latter's superiority. In addition, a Naive Bayes classifier is employed in an Ayurvedic plant-focused crawler, which uses query expansion via a carefully curated thesaurus to improve the relevance of retrieved web pages. This survey emphasizes the need for more efficient, adaptive, and focused crawlers powered by deep learning to advance Ayurvedic research.
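
The Naive Bayes crawler with thesaurus-based query expansion mentioned above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the surveyed paper: the thesaurus entries, the toy training pages, and the names `THESAURUS`, `expand_query`, and `NaiveBayes` are hypothetical, and the classifier is a plain multinomial Naive Bayes with add-one smoothing written with the Python standard library only.

```python
import math
import re
from collections import Counter

# Hypothetical thesaurus (illustrative entries only, not the curated one from the paper).
THESAURUS = {
    "tulsi": ["holy basil", "ocimum"],
    "neem": ["azadirachta", "margosa"],
}

def expand_query(terms, thesaurus=THESAURUS):
    """Return the seed terms plus their thesaurus entries, without duplicates."""
    expanded = []
    for term in terms:
        for candidate in [term, *thesaurus.get(term, [])]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for c in self.classes for w in self.word_counts[c]}
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for c in self.classes:
            total_words = sum(self.word_counts[c].values())
            # Start from the log prior, then add log likelihoods of known words.
            score = math.log(self.doc_counts[c] / total_docs)
            for w in tokenize(text):
                if w in self.vocab:
                    score += math.log(
                        (self.word_counts[c][w] + 1) / (total_words + len(self.vocab))
                    )
            scores[c] = score
        return max(scores, key=scores.get)

# Toy training pages (made up for this sketch).
train_docs = [
    "tulsi holy basil leaves are used in ayurvedic medicine",
    "neem azadirachta bark is a medicinal herb",
    "cheap flights and hotel deals",
    "football match scores and league results",
]
train_labels = ["relevant", "relevant", "irrelevant", "irrelevant"]
model = NaiveBayes().fit(train_docs, train_labels)

seed_terms = expand_query(["tulsi"])  # expanded terms a crawler could use as seed queries
page_label = model.predict("ocimum holy basil medicinal herb remedy")
```

In a crawler built along these lines, the expanded terms would be used to find seed pages, and only links from pages labelled `relevant` would be added to the crawl frontier.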

Published

2024-11-23

How to Cite

Survey on Focused Web Crawling Approaches for Ayurvedic Plant Information Retrieval. (2024). International Research Journal on Advanced Engineering Hub (IRJAEH), 2(11), 2607-2614. https://doi.org/10.47392/IRJAEH.2024.0359
