Invention
Republic of China (Taiwan)
109115334
I 749547
Speech Enhancement System Using Deep Learning
Yuan Ze University
2021/12/11
Previous studies indicate that variation in speakers and in environmental background noise degrades the performance of conventional deep learning based speech enhancement (SE) systems. To overcome these effects, we propose a deep learning based speaker and speaking environment-aware denoising neural network system (SEaDNN), which integrates a deep neural network SE system with an embedded speaker identity code and an embedded environmental noise code.

SEaDNN consists of two parts. The first part uses a deep neural network to extract the speaker feature code and the noise feature code of the speaker's environment. The second part takes the input features augmented with the codes from the first part and predicts the enhanced speech spectra. These additional embedded features guide the SE system toward an output suited to the particular speaker and speaking environment.

We evaluated SEaDNN on the TIMIT speech corpus. Experimental results show that, for utterances corrupted by additive noise, the proposed system improves speech quality and intelligibility relative to the noisy baseline and to conventional supervised and unsupervised SE techniques. The improvement holds for unseen speakers and unseen environmental noise, and further analysis suggests that combining the speaker features makes the system robust to unseen speakers, indicating good reliability and adaptability.
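The two-part structure described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the layer sizes (`N_FREQ`, `CODE_DIM`, `HIDDEN`), the use of plain fully connected networks, averaging frame-level outputs to obtain one code per utterance, and all function names are hypothetical, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# All sizes below are assumptions for illustration; the abstract gives none.
N_FREQ = 257    # spectral bins per frame
CODE_DIM = 32   # size of each embedded code
HIDDEN = 128    # hidden-layer width


def init_mlp(sizes):
    """Random (untrained) weights for a fully connected network."""
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def mlp(params, x):
    """ReLU on hidden layers, linear output layer."""
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)
    return x


def extract_code(params, frames):
    """Part 1: map each frame to a code, then average over the utterance.

    Averaging over frames is an assumption, not stated in the abstract.
    """
    return mlp(params, frames).mean(axis=0)


def enhance(se_params, frames, speaker_code, noise_code):
    """Part 2: append both codes to every frame and predict enhanced spectra."""
    n_frames = frames.shape[0]
    codes = np.concatenate([speaker_code, noise_code])
    augmented = np.hstack([frames, np.tile(codes, (n_frames, 1))])
    return mlp(se_params, augmented)


speaker_net = init_mlp([N_FREQ, HIDDEN, CODE_DIM])
noise_net = init_mlp([N_FREQ, HIDDEN, CODE_DIM])
se_net = init_mlp([N_FREQ + 2 * CODE_DIM, HIDDEN, N_FREQ])

# Random stand-in for 100 frames of noisy spectral features.
noisy = rng.standard_normal((100, N_FREQ))
spk = extract_code(speaker_net, noisy)
env = extract_code(noise_net, noisy)
enhanced = enhance(se_net, noisy, spk, env)
print(enhanced.shape)  # (100, 257)
```

The point of the sketch is the data flow: the enhancement network sees each noisy frame together with the same speaker and noise codes, so its output can depend on who is speaking and in what environment. In a real system the three networks would be trained, for example on noisy and clean utterance pairs.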
Industry-Academia Cooperation Section
(03)4638800#2286