Speech Enhancement System Using Deep Learning | Patent Search | NSTC-Funded Research Commercialization Platform

Patent Type

Invention

Patent Country (Country of Application)

Republic of China (Taiwan)

Patent Application Number

109115334

Patent Certificate Number

I 749547

Granted Patent Title

Speech Enhancement System Using Deep Learning

Patent Owner (Applicant Institution)

Yuan Ze University

Grant Date

2021/12/11

Technical Description

Prior research shows that conventional deep learning based speech enhancement systems lose performance when the speaker or the background noise changes. To overcome these effects, this paper proposes a deep learning based speaker and speaking environment-aware denoising neural network system (SEaDNN). Evaluation results show that the system improves speech enhancement performance under variation in both the speaker and the speaker's background environment. SEaDNN has two parts. The first part uses a deep neural network to extract a speaker feature code and a noise feature code for the speaker's environment. The second part uses the feature codes from the first part to predict the enhanced speech signal. These additional feature codes let SEaDNN adapt its enhancement to the speaker and to the speaker's environment. The system was evaluated on the TIMIT speech corpus. The results indicate that it still improves speech quality and intelligibility for unseen speakers and unseen environmental noise, and that it is more reliable and adaptable than conventional unsupervised and supervised speech enhancement systems.

Previous studies indicated that noise and speaker variations can degrade the performance of deep learning based speech enhancement (SE) systems. To improve system performance under environmental variations, we propose a deep learning based speaker and speaking environment aware speech enhancement system (SEaDNN) that integrates a deep neural network speech enhancement system with an embedded speaker identity code and an environmental noise code. The overall system first extracts embedded speaker identity features and environment features using a neural network model; the deep neural network speech enhancement system then takes the augmented features as input to generate the enhanced spectra. With the additional embedded features, the SE system can be guided to generate the optimal output for the given speaker identity. We tested the proposed SE system on the TIMIT dataset. Experimental results show that the proposed SE system improves the sound quality and intelligibility of speech recovered from utterances corrupted by additive noise, compared with conventional supervised and unsupervised SE techniques and with the noisy baseline. In addition, further analyses suggest that the system is robust to unseen speakers when speaker features are included.
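The two-part data flow described above (extract speaker and environment codes, then enhance from the augmented features) can be sketched roughly as follows. This is only an illustrative sketch and not the patented implementation: the layer sizes, the use of two separate extractor networks, the per-frame codes, and every name (`speaker_net`, `env_net`, `enhance_net`, `enhance`) are assumptions, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(sizes):
    """Random weights for a fully connected network (untrained, illustration only)."""
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    """Apply ReLU hidden layers followed by a linear output layer."""
    for w, b in params[:-1]:
        x = np.maximum(x @ w + b, 0.0)
    w, b = params[-1]
    return x @ w + b

# Hypothetical dimensions; none of these come from the patent record.
N_FREQ = 257   # frequency bins in one spectral frame
SPK_DIM = 32   # size of the speaker identity code
ENV_DIM = 32   # size of the environmental noise code

# Part 1 (assumed to be two separate networks): noisy frame -> embedded codes.
speaker_net = init_mlp([N_FREQ, 128, SPK_DIM])
env_net = init_mlp([N_FREQ, 128, ENV_DIM])

# Part 2: enhancement network that takes the augmented features as input.
enhance_net = init_mlp([N_FREQ + SPK_DIM + ENV_DIM, 512, N_FREQ])

def enhance(noisy_frames):
    """Map noisy spectra of shape (T, N_FREQ) to enhanced spectra of the same shape."""
    spk_code = mlp(speaker_net, noisy_frames)   # (T, SPK_DIM)
    env_code = mlp(env_net, noisy_frames)       # (T, ENV_DIM)
    augmented = np.concatenate([noisy_frames, spk_code, env_code], axis=1)
    return mlp(enhance_net, augmented)

# Stand-in input: 100 random frames in place of real noisy spectra.
noisy = rng.standard_normal((100, N_FREQ))
enhanced = enhance(noisy)
```

In a trained system the extractor and enhancement networks would be fitted on paired noisy and clean speech; here the point is only how the two codes are appended to each input frame before enhancement.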

Remarks

Contact Unit (Responsible Office/Department)

Industry-Academia Cooperation Division

Contact Phone

(03)4638800#2286

Website

http://www.yzu.edu.tw/admin/rd/