AI-Based Monitoring Of Mental Health and Work Performance

Authors

  • Bharti. P. Ahuja, Associate Professor, Dept. of CSE, Guru Gobind Singh College of Engineering and Research Centre, Maharashtra, India.
  • Ashutosh Bhagat, UG, Dept. of CO, Guru Gobind Singh College of Engineering and Research Centre, Maharashtra, India.
  • Riya Dhawale, UG, Dept. of CO, Guru Gobind Singh College of Engineering and Research Centre, Maharashtra, India.
  • Rohan Ghute, UG, Dept. of CO, Guru Gobind Singh College of Engineering and Research Centre, Maharashtra, India.
  • Sagar Pati, UG, Dept. of CO, Guru Gobind Singh College of Engineering and Research Centre, Maharashtra, India.

DOI:

https://doi.org/10.47392/IRJAEH.2026.0460

Keywords:

Audio Emotion Recognition, Wav2Vec2, Dual Head Personalization, Speaker Verification, Diarization, RAG, FastAPI, MongoDB, Qdrant, Whisper

Abstract

Emotion-aware intelligent systems increasingly rely on spoken-interaction data, but two issues hinder practical deployment: inter-speaker interference in shared conversations and a lack of user-specific adaptation. In this work, we introduce a production-level, multi-tenant backend architecture for audio emotion recognition with retrieval-augmented generation (RAG), and extend it with an incremental speaker-aware module that isolates the owner's voice to personalize predictions while preserving the full conversational context when retrieving similar tasks. The baseline pipeline consists of audio validation, preprocessing, Wav2Vec2-based embedding extraction, dual-head emotion inference (a global head plus a user head), and Whisper transcription, with data stored in MongoDB and Qdrant. A feedback loop and adaptive alpha blending control the degree of personalization: the model starts from the global head's behavior and becomes more user-specific as corrective feedback is provided. For mixed-speaker sessions, we propose an Incremental V1 module comprising VAD segmentation, segment-level speaker embeddings, clustering, owner verification, and construction of an owner-only emotion path. The accompanying enrolment structure exposes new endpoints and additive response metadata without breaking existing API contracts. The system achieves 72.69% emotion recognition accuracy on 216 test samples with practical low-latency inference, and the speaker-aware API and safety improvements are validated by a focused privacy test suite with 20 passing tests in the target environment. The resulting design addresses deployability, personalization reliability, and backward compatibility for real-world conversational AI systems.
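
The adaptive alpha blending between the global head and the user head can be illustrated with a minimal sketch. The abstract does not state the blending rule, so everything here is an assumption made for illustration: the saturating schedule, the parameters alpha_max and k, and the function names adaptive_alpha and blend_heads are hypothetical and are not taken from the paper.

```python
import math


def adaptive_alpha(n_feedback: int, alpha_max: float = 0.8, k: float = 10.0) -> float:
    """Weight given to the user head (hypothetical schedule, not from the paper).

    Returns 0.0 with no feedback and approaches alpha_max as the number of
    corrected samples grows.
    """
    if n_feedback < 0:
        raise ValueError("n_feedback must be non-negative")
    return alpha_max * n_feedback / (n_feedback + k)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def blend_heads(global_logits, user_logits, n_feedback):
    """Convex combination of the two heads' class probabilities."""
    alpha = adaptive_alpha(n_feedback)
    p_global = softmax(global_logits)
    p_user = softmax(user_logits)
    return [(1.0 - alpha) * g + alpha * u for g, u in zip(p_global, p_user)]


if __name__ == "__main__":
    # Assumed toy logits over three emotion classes.
    g = [2.0, 0.0, 0.0]
    u = [0.0, 2.0, 0.0]
    for n in (0, 10, 1000):
        print(n, [round(p, 3) for p in blend_heads(g, u, n)])
```

With no feedback the blended output equals the global head's distribution, and the user head dominates only after many corrections, which matches the behavior the abstract describes. A cap below 1.0 on alpha is one possible design choice to keep the global head as a fallback.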

Published

2026-05-12

How to Cite

AI-Based Monitoring Of Mental Health and Work Performance. (2026). International Research Journal on Advanced Engineering Hub (IRJAEH), 4(05), 3541-3544. https://doi.org/10.47392/IRJAEH.2026.0460
