Doc Assist: Intelligent Document Processing Assistance for Enhanced Accessibility
DOI: https://doi.org/10.47392/IRJAEH.2025.0324
Keywords: OCR, BM25, Semantic Search, Word Embeddings, Document Retrieval
Abstract
This project presents a desktop assistant designed to retrieve information from non-machine-readable documents, such as scanned images and scanned PDFs. The system extracts text with Tesseract OCR and ranks documents against user-provided keywords with BM25. Word embeddings are additionally integrated to improve semantic search accuracy. The application is built with Tkinter and runs entirely offline through an intuitive interface. Its architecture is optimized for fast document retrieval with minimal resource consumption while maintaining result relevance. This paper covers the design, implementation, and challenges encountered during development.
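The abstract names BM25 for keyword ranking and word embeddings for semantic matching but does not say how the two are combined. The sketch below is one plausible arrangement, not the authors' implementation. It assumes the OCR step has already produced one plain-text string per document (for example via `pytesseract.image_to_string`). It scores those strings with Okapi BM25 (the defaults `k1=1.5` and `b=0.75` and the Lucene-style IDF are assumptions). It then mixes the normalized BM25 score with the cosine similarity of averaged word vectors using a hypothetical weight `alpha`. All helper names (`tokenize`, `embed`, `cosine`, `rank`) and the toy vectors are illustrative.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25:
    """Okapi BM25 over a small in-memory corpus of OCR-extracted texts."""

    def __init__(self, docs, k1=1.5, b=0.75):  # assumed default parameters
        self.k1, self.b = k1, b
        self.docs = [tokenize(d) for d in docs]
        self.tf = [Counter(d) for d in self.docs]
        self.n_docs = len(self.docs)
        self.avgdl = sum(len(d) for d in self.docs) / max(self.n_docs, 1)
        # Document frequency: number of documents containing each term.
        self.df = Counter(term for d in self.docs for term in set(d))

    def idf(self, term):
        # Lucene-style IDF; the +1 inside the log keeps it non-negative.
        n = self.df.get(term, 0)
        return math.log(1 + (self.n_docs - n + 0.5) / (n + 0.5))

    def score(self, query, i):
        """BM25 score of document i for the query."""
        tf, dl = self.tf[i], len(self.docs[i])
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f == 0:
                continue  # term absent: contributes nothing
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf(term) * f * (self.k1 + 1) / norm
        return total


def embed(tokens, vectors):
    """Average the vectors of known tokens; zero vector if none are known."""
    dim = len(next(iter(vectors.values())))
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is zero."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def rank(query, docs, vectors, alpha=0.5):
    """Return (doc_index, score) pairs, best first, mixing BM25 and cosine."""
    bm25 = BM25(docs)
    lexical = [bm25.score(query, i) for i in range(len(docs))]
    top = max(lexical) or 1.0  # scale BM25 scores into [0, 1]
    q_vec = embed(tokenize(query), vectors)
    scored = []
    for i, toks in enumerate(bm25.docs):
        semantic = cosine(q_vec, embed(toks, vectors))
        scored.append((i, alpha * lexical[i] / top + (1 - alpha) * semantic))
    return sorted(scored, key=lambda p: p[1], reverse=True)
```

With toy two-dimensional vectors in which "bill", "invoice" and "receipt" point in a similar direction, the query `"bill"` gets a BM25 score of zero for every document because none contains that word. The embedding term still ranks the invoice and the receipt above the meeting notes. This illustrates why the abstract pairs exact keyword ranking with embeddings. In practice the vectors would come from a pretrained model, and `alpha` would need tuning.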
License
Copyright (c) 2025 International Research Journal on Advanced Engineering Hub (IRJAEH)

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.