Techniques de détection de l’activité vocale dans un canal de communication

Chelloug, Charaf Eddine; Farrouki, Atef

dc.contributor.author	Chelloug, Charaf Eddine
dc.contributor.author	Farrouki, Atef
dc.date.accessioned	2022-05-24T09:52:36Z
dc.date.available	2022-05-24T09:52:36Z
dc.date.issued	2020-06-16
dc.identifier.uri	http://depot.umc.edu.dz/handle/123456789/5791
dc.description.abstract	The main goal of Voice Activity Detection (VAD) techniques is to distinguish between voiced regions and silent intervals in an audio communication. This step is crucial in most speech processing systems, such as mobile communications, Voice over IP, speech recognition and hearing aids. The task is relatively easy in a weakly disturbed environment. In strong noise, however, it becomes difficult to provide accurate information about the presence of active voice. In general, a VAD enables the compression of silence intervals in modern communication systems, reducing the average bit rate via the discontinuous transmission (DTX) mode. In this thesis, we describe the main standardized VAD methods reported in the literature, namely the G.729-B VAD approved by ITU-T in 1996, the AMR (Adaptive Multi-Rate) VAD, the AFE (Advanced Front-End) VAD and SILK (developed by Skype). The G.729-B VAD generates a binary decision for each frame as a function of four parameters extracted from the audio signal. In simplified terms, these parameters are directly related to the energy and spectral components of a voiced frame. G.729-B is used in the majority of audio transmission applications, which has made it the most popular VAD technique. As a result, G.729-B is today the standard of choice for comparative studies in most scientific articles dealing with VAD. In the first contribution, we propose a VAD scheme based on an adaptive threshold that maintains the False Acceptance Rate at a nominal value. As is well known in binary decision theory, the error rate denoted "False Acceptance Rate" is related to the probability of misclassifying a silent frame as active voice. The basic idea is to perform sequential tests, based on full-band energy, in order to accept or reject the frame under investigation as an active voice region. 
The most interesting feature of the proposed algorithm is its ability to dynamically update the noise level estimator according to the current environment. Taking into account the long-term stationarity of speech, we also developed a smoothing procedure to discard discontinuities that may appear in the processed signal. The performance of the proposed approach has been evaluated and compared to the G.729-B VAD in several situations, including various environmental acoustic noises at different SNRs. The results were analyzed using the NOIZEUS experimental database as well as real recorded signals. The second contribution consists of implementing the proposed approach on a microcontroller-based system in order to (i) ensure the robustness of the algorithm, (ii) evaluate its implementation complexity, and (iii) validate real-time operation. In this context, various tests were conducted in real-time mode using the development tools available for the microcontroller system (STM32F7). These tools allowed real-time monitoring of several signal parameters in realistic situations. In this way, we were able to accurately determine the processing time (latency) required to generate a final decision for each frame. The real-time analysis yielded a global latency of 4 μs, which seems sufficient to guarantee real-time operation at the common sampling frequencies of speech processing systems (8 kHz to 16 kHz), since it is well below even a single sample period (125 μs at 8 kHz, 62.5 μs at 16 kHz).
dc.language.iso	fr
dc.publisher	Université Frères Mentouri - Constantine 1
dc.subject	Electronique: Signaux et Systèmes de Télécommunications
dc.subject	détection d'activité vocale DAV
dc.subject	énergie
dc.subject	seuil adaptatif
dc.subject	Voice Activity Detection
dc.subject	Energy
dc.subject	Adaptive threshold
dc.subject	تحديد النشاط الصوتي
dc.subject	الطاقة
dc.subject	عتبة ديناميكية
dc.title	Techniques de détection de l’activité vocale dans un canal de communication
dc.title.alternative	Comparaison aux standards de compression audio.
dc.type	Thesis
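
The abstract describes an energy-based VAD with an adaptive threshold, a noise level estimator that tracks the environment, and a smoothing step. The record does not give the algorithm itself, so the following Python sketch is only an illustration of that general idea, not the authors' method. All function names and parameters (`threshold_scale`, `adaptive_energy_vad`, `forget`, `init_frames`, `hangover`) are hypothetical. It assumes that the noise is white and Gaussian, that the first few frames contain noise only, and that the threshold factor for a nominal false acceptance rate can be obtained from a normal approximation of the frame energy.

```python
import math
from statistics import NormalDist


def frame_energies(samples, frame_len):
    """Full-band energy (mean square) of consecutive non-overlapping frames."""
    n_frames = len(samples) // frame_len
    return [
        sum(x * x for x in samples[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]


def threshold_scale(pfa, frame_len):
    """Threshold factor for a nominal false acceptance rate `pfa`.

    Assumption: for white Gaussian noise of variance s2, the mean-square
    energy of a frame of N samples has mean s2 and variance 2*s2**2/N and is
    roughly normal for large N. Real acoustic noise will deviate from this.
    """
    z = NormalDist().inv_cdf(1.0 - pfa)
    return 1.0 + z * math.sqrt(2.0 / frame_len)


def adaptive_energy_vad(energies, scale, forget=0.9, init_frames=5, hangover=2):
    """Return one binary decision per frame (1 = active voice, 0 = silence).

    The noise level is initialised from the first `init_frames` frames
    (assumed noise only) and updated recursively on every frame whose energy
    falls below the threshold. `hangover` keeps the decision at 1 for a few
    frames after the energy drops, which smooths isolated discontinuities.
    """
    noise = sum(energies[:init_frames]) / init_frames
    decisions = []
    hang = 0
    for e in energies:
        if e > scale * noise:
            decisions.append(1)
            hang = hangover
        else:
            # Frame energy is below the threshold: refresh the noise estimate.
            noise = forget * noise + (1.0 - forget) * e
            if hang > 0:
                decisions.append(1)  # smoothing: extend the active region
                hang -= 1
            else:
                decisions.append(0)
    return decisions


if __name__ == "__main__":
    frame_len = 160  # 20 ms at 8 kHz (illustrative choice)
    quiet = [0.01 * math.sin(0.3 * n) for n in range(frame_len * 10)]
    loud = [0.5 * math.sin(0.3 * n) for n in range(frame_len * 4)]
    energies = frame_energies(quiet + loud + quiet, frame_len)
    scale = threshold_scale(pfa=0.01, frame_len=frame_len)
    print(adaptive_energy_vad(energies, scale))
```

Updating the noise estimate only on frames below the threshold keeps speech energy from inflating it. One known drawback of this simple rule is that an abrupt rise in the noise floor can leave the estimate stuck, so a practical detector would need an additional recovery mechanism.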