AI-Based Monitoring Of Mental Health and Work Performance

Bharti. P. Ahuja; Ashutosh Bhagat; Riya Dhawale; Rohan Ghute; Sagar Pati

doi:10.47392/IRJAEH.2026.0460

Authors

Bharti. P. Ahuja, Associate Professor, Dept. of CSE, Guru Gobind Singh College of Engineering and Research Centre, Maharashtra, India.
Ashutosh Bhagat, UG, Dept. of CO, Guru Gobind Singh College of Engineering and Research Centre, Maharashtra, India.
Riya Dhawale, UG, Dept. of CO, Guru Gobind Singh College of Engineering and Research Centre, Maharashtra, India.
Rohan Ghute, UG, Dept. of CO, Guru Gobind Singh College of Engineering and Research Centre, Maharashtra, India.
Sagar Pati, UG, Dept. of CO, Guru Gobind Singh College of Engineering and Research Centre, Maharashtra, India.

Keywords:

Audio Emotion Recognition, Wav2Vec2, Dual Head Personalization, Speaker Verification, Diarization, RAG, FastAPI, MongoDB, Qdrant, Whisper

Abstract

Emotion-aware intelligent systems increasingly rely on spoken interaction data, but two major issues arise in practical deployments: inter-speaker interference in shared conversations and a lack of user-specific adaptation. In this work, we introduce a production-level, multi-tenant backend architecture for audio emotion recognition with retrieval-augmented generation (RAG), and extend it with an incremental speaker-aware module that isolates the owner's voice to personalize predictions while preserving the full conversational context when retrieving similar tasks. The baseline pipeline consists of audio validation, preprocessing, Wav2Vec2-based embedding extraction, dual-head emotion inference (a global head plus a user head), and Whisper transcription, with data stored in MongoDB and Qdrant. A feedback loop and adaptive alpha blending control the degree of personalization: the model starts from global-head behavior and becomes increasingly user-specific as corrective feedback is provided. For mixed-speaker sessions, we propose an Incremental V1 module comprising VAD segmentation, segment-level speaker embeddings, clustering, owner verification, and construction of an owner-only emotion path. The new enrolment structure exposes new endpoints and additive response metadata without breaking existing API contracts. The system achieves an emotion recognition accuracy of 72.69% on 216 test samples with practical low-latency inference, and the speaker-aware API and privacy safeguards are validated by a focused suite of 20 passing tests in the target environment. The resulting design addresses deployability, personalization reliability, and backward compatibility for real-world conversational AI systems.
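
The adaptive alpha blending mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the blending rule, so everything here is an assumption made for illustration: the saturating schedule in `blend_alpha`, the parameters `k` and `alpha_max`, and the convex combination in `blend_predictions` are hypothetical and are not taken from the paper.

```python
from typing import List


def blend_alpha(n_feedback: int, k: float = 10.0, alpha_max: float = 0.8) -> float:
    """Hypothetical schedule for the personalization weight.

    Returns 0.0 when the user has given no corrective feedback (pure
    global head) and grows toward alpha_max as feedback accumulates.
    k and alpha_max are illustrative values, not from the paper.
    """
    if n_feedback < 0:
        raise ValueError("n_feedback must be non-negative")
    return alpha_max * n_feedback / (n_feedback + k)


def blend_predictions(
    global_probs: List[float], user_probs: List[float], alpha: float
) -> List[float]:
    """Convex combination of global-head and user-head class probabilities."""
    if len(global_probs) != len(user_probs):
        raise ValueError("both heads must score the same set of classes")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [(1.0 - alpha) * g + alpha * u for g, u in zip(global_probs, user_probs)]


if __name__ == "__main__":
    # Illustrative probabilities over three hypothetical emotion classes.
    global_probs = [0.6, 0.3, 0.1]
    user_probs = [0.2, 0.7, 0.1]
    for n in (0, 5, 50):
        alpha = blend_alpha(n)
        print(n, round(alpha, 3), blend_predictions(global_probs, user_probs, alpha))
```

Under these assumptions a new user receives exactly the global-head prediction, and because the blend is a convex combination of two probability vectors, the result remains a valid probability distribution for any weight in [0, 1].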
