AI-Based Monitoring Of Mental Health and Work Performance
DOI:
https://doi.org/10.47392/IRJAEH.2026.0460
Keywords:
Audio Emotion Recognition, Wav2Vec2, Dual Head Personalization, Speaker Verification, Diarization, RAG, FastAPI, MongoDB, Qdrant, Whisper
Abstract
Emotion-aware intelligent systems increasingly rely on spoken interaction data, but practical deployments face two major issues: inter-speaker interference in shared conversations and a lack of user-specific adaptation. In this work, we introduce a production-level, multi-tenant backend architecture for audio emotion recognition with retrieval-augmented generation (RAG) and extend it with an incremental speaker-aware module that isolates the owner's voice to personalize predictions while preserving the full conversational context when retrieving similar tasks. The baseline pipeline consists of audio validation, preprocessing, Wav2Vec2-based embedding extraction, dual-head emotion inference (a global head plus a user head), and Whisper transcription, with data stored in MongoDB and Qdrant. A feedback loop and adaptive alpha blending control the degree of personalization: the model starts out with global performance and becomes user-specific as corrected feedback is provided. For mixed-speaker sessions, we propose an Incremental V1 module comprising VAD segmentation, segment-level speaker embeddings, clustering, owner verification, and the construction of an owner-only emotion path. The new enrolment structure exposes new endpoints and additive response metadata without breaking existing API contracts. The system achieves an emotion recognition accuracy of 72.69% on 216 test samples with practical low-latency inference behavior, and the speaker-aware API and safety improvements are validated by a focused suite of privacy tests, with 20 tests passing in the target environment. The resulting design addresses deployability, personalization reliability, and backward compatibility for real-world conversational AI systems.
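The adaptive alpha blending described in the abstract can be illustrated with a minimal sketch. The abstract does not give the blending formula, so everything below is an assumption made for illustration: the function names `personalization_alpha` and `blend_heads`, the saturating schedule, the parameters `half_point` and `alpha_max`, and the convention that alpha is the weight given to the user head.

```python
from typing import List, Sequence


def personalization_alpha(n_feedback: int,
                          half_point: float = 10.0,
                          alpha_max: float = 0.8) -> float:
    """Hypothetical schedule: alpha grows from 0 toward alpha_max
    as the number of user corrections increases."""
    if n_feedback < 0:
        raise ValueError("n_feedback must be non-negative")
    return alpha_max * n_feedback / (n_feedback + half_point)


def blend_heads(p_global: Sequence[float],
                p_user: Sequence[float],
                alpha: float) -> List[float]:
    """Convex combination of the global-head and user-head class
    probabilities, renormalized to sum to 1."""
    if len(p_global) != len(p_user):
        raise ValueError("both heads must score the same classes")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    mixed = [(1.0 - alpha) * g + alpha * u for g, u in zip(p_global, p_user)]
    total = sum(mixed)
    return [m / total for m in mixed]


if __name__ == "__main__":
    p_global = [0.7, 0.2, 0.1]  # illustrative scores, not from the paper
    p_user = [0.2, 0.7, 0.1]
    for n in (0, 10, 100):
        alpha = personalization_alpha(n)
        print(n, round(alpha, 3), blend_heads(p_global, p_user, alpha))
```

With no feedback the blended output equals the global head's prediction, and the user head gains weight only as corrections accumulate, which matches the behavior the abstract describes. Capping alpha below 1 is a design choice of this sketch that keeps the global head as a fallback.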
License
Copyright (c) 2026 International Research Journal on Advanced Engineering Hub (IRJAEH)

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.