Voice Based Virtual Assistant Using Python
DOI:
https://doi.org/10.47392/IRJAEH.2025.0278
Keywords:
Interactive AI, Real-Time Interaction, Intelligent Assistants, Voice-Controlled Systems, Multimodal Input Integration, Context, Human-Computer Interaction (HCI), Speech Recognition, Computer Vision (CV), Natural Language Processing (NLP)
Abstract
Virtual assistants play an important part in human-computer interaction in today's digital era, giving users a convenient means of accessing technology. Most current virtual assistants, however, such as Siri, Google Assistant, and Alexa, are single-modal, typically text-based or voice-based, and cannot handle complex interactions that require multimodal support. This project proposes a speech-enabled virtual assistant capable of processing several inputs (audio, video, and text) simultaneously, giving rise to more accurate, intelligent, and context-aware responses. By integrating Natural Language Processing (NLP), Computer Vision (CV), and Speech Recognition technologies, the assistant improves the user experience by understanding not only spoken words or typed text but also visual cues such as facial expressions, gestures, and recognized objects. This multimodal strategy is intended to improve accuracy in practical applications, making the system more interactive and efficient. The motivation for this project comes from the limitation of conventional virtual assistants, which cannot process different types of input at the same time. For instance, voice interfaces often misinterpret commands spoken in noisy conditions such as rain or traffic, while text interfaces cannot sense emotion or urgency. Video input, by contrast, can offer rich contextual information, such as a user's expressions or environment, enabling the assistant to react more suitably. By integrating these modalities, the system provides better understanding and responsiveness, making human-computer interaction smoother and more natural.
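The abstract does not specify how the audio, text, and video inputs are combined. As one possible illustration only, the sketch below uses confidence-weighted late fusion: each modality reports its own intent scores, and a reliability weight lets a noisy channel (for example, speech captured in traffic) count for less. The class `ModalityReading`, the function `fuse_intents`, the intent labels, and all numeric values are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class ModalityReading:
    """One modality's guess at the user's intent (hypothetical structure)."""
    modality: str          # e.g. "speech", "text", "vision"
    intent_scores: dict    # intent label -> score in [0, 1]
    reliability: float     # current trust in this modality, in [0, 1]


def fuse_intents(readings):
    """Confidence-weighted late fusion; returns (best_intent, fused_scores)."""
    total_weight = sum(r.reliability for r in readings)
    if total_weight == 0:
        raise ValueError("no reliable modality available")

    # Accumulate each intent's score, weighted by the modality's reliability.
    fused = {}
    for r in readings:
        for intent, score in r.intent_scores.items():
            fused[intent] = fused.get(intent, 0.0) + r.reliability * score

    # Normalize by the total weight so the fused scores stay comparable.
    fused = {intent: s / total_weight for intent, s in fused.items()}
    best = max(fused, key=fused.get)
    return best, fused


# Assumed scenario: speech is unreliable (noisy street), text and vision are clearer.
readings = [
    ModalityReading("speech", {"play_music": 0.6, "call_contact": 0.4}, reliability=0.2),
    ModalityReading("text", {"call_contact": 0.9, "play_music": 0.1}, reliability=0.8),
    ModalityReading("vision", {"call_contact": 0.7, "play_music": 0.3}, reliability=0.5),
]
best_intent, fused_scores = fuse_intents(readings)
print(best_intent, fused_scores)
```

In this assumed scenario the low-reliability speech reading favors `play_music`, but the text and vision readings outweigh it, so the fused result is `call_contact`. A real system would obtain the per-modality scores from speech recognition, NLP, and computer vision models rather than from hard-coded values.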
License
Copyright (c) 2025 International Research Journal on Advanced Engineering Hub (IRJAEH)

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.