No. 12 - Jonas Jungclaussen

Jonas Jungclaussen: Artificial Bandwidth Extension of Speech Signals using Neural Networks

PDF-based submission (freely available via the MACAU system), 2021

Commission

Prof. Dr.-Ing. Gerhard Schmidt
(first reviewer)
Prof. Dr.-Ing. Peter Jax
(second reviewer)
Prof. Dr.-Ing. Stephan Pachnicke
(examiner)
Prof. Dr.-Ing. Peter Höher
(head of the examination board)

Abstract

Although mobile wideband telephony has been standardized for over 15 years, many countries still lack a nationwide network with good coverage. As a result, many cellphone calls are still downgraded to narrowband telephony. The resulting loss of quality can be reduced by artificial bandwidth extension, a field in which the use of neural networks has brought great progress in recent years. The topic of this thesis is the enhancement of artificial bandwidth extension using neural networks. A special focus is placed on hands-free calls in a car, where the vehicle's fast movement makes it likely that the wideband connection is lost.

The bandwidth of narrowband transmission is reduced not only at higher frequencies, above 3.5 kHz, but also at lower frequencies, below 300 Hz. Methods that estimate the low-frequency components quite well already exist, so the low band is not covered in this thesis.

In most bandwidth extension algorithms, the narrowband signal is first separated into a spectral envelope and an excitation signal. Both parts are then extended separately and finally recombined. While the excitation can be extended with simple methods without reducing the speech quality compared to wideband speech, the estimation of the spectral envelope for frequencies above 3.5 kHz has not yet been solved satisfactorily. In most evaluations, current bandwidth extension algorithms reduce the quality loss caused by narrowband transmission by at most 50 %.
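The separation into envelope and excitation described above can be illustrated with linear predictive coding (LPC), a common choice for this kind of source-filter decomposition. The following is a minimal sketch under stated assumptions, not the method used in the thesis: the function names, the prediction order, and the small ridge term are hypothetical choices made for illustration, and no bandwidth extension step is included.

```python
import numpy as np

def lpc_coefficients(frame, order):
    """Autocorrelation-method LPC. Returns a = [1, a1, ..., ap] of the inverse filter A(z)."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation at lags 0..order
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    # Toeplitz normal equations R w = r[1:], with a small ridge (assumption) for stability
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R += 1e-9 * r[0] * np.eye(order)
    w = np.linalg.solve(R, r[1:])
    return np.concatenate(([1.0], -w))

def analysis(frame, a):
    """Filter the frame with A(z) to obtain the excitation (prediction residual)."""
    frame = np.asarray(frame, dtype=float)
    return np.convolve(frame, a)[:len(frame)]

def synthesis(excitation, a):
    """All-pole filter 1/A(z): recombine excitation and spectral envelope."""
    out = np.zeros(len(excitation))
    for n in range(len(excitation)):
        acc = excitation[n]
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * out[n - k]
        out[n] = acc
    return out
```

Here the coefficients `a` describe the spectral envelope and the residual returned by `analysis` is the excitation. In a bandwidth extension system, both would be extended to the wideband range before `synthesis` recombines them; in this sketch, synthesis simply restores the original frame.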

In this work, a modification of an existing method for excitation extension is proposed that achieves slight improvements without adding computational complexity. To enhance the wideband envelope estimation with neural networks, two modifications of the training process are proposed. First, the loss function is extended with a discriminative part to address the different characteristics of phoneme classes. Second, a GAN (generative adversarial network) is used for the training phase, so that a second network is added temporarily to evaluate the quality of the estimation.
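The idea of letting a second network judge the estimate during training can be sketched as a combined training objective. The code below is only an illustration under assumptions: the abstract does not give the actual loss, so the mean squared error term, the non-saturating adversarial term, the weight value, and all function names are hypothetical. The discriminative phoneme-class part is not modeled because its form is not described here.

```python
import numpy as np

def mse(estimate, target):
    """Mean squared error between estimated and true wideband envelope features."""
    estimate = np.asarray(estimate, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.mean((estimate - target) ** 2))

def adversarial_term(disc_scores, eps=1e-12):
    """Non-saturating generator term: -mean(log D(G(x))).

    disc_scores are the discriminator's outputs in (0, 1] for the estimates;
    scores close to 1 mean the estimate was judged realistic.
    """
    scores = np.clip(np.asarray(disc_scores, dtype=float), eps, 1.0)
    return float(-np.mean(np.log(scores)))

def generator_loss(estimate, target, disc_scores, weight=0.01):
    """Reconstruction error plus a weighted adversarial penalty (weight is an assumption)."""
    return mse(estimate, target) + weight * adversarial_term(disc_scores)
```

The discriminator is needed only to compute this objective during training; at inference time, only the estimating network runs, which matches the description of the second network as a temporary addition.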

The trained neural networks are compared in subjective and objective evaluations. A final listening test addressed the scenario of a hands-free call in a car, which was simulated acoustically. With the proposed approach, the quality loss caused by the missing high-frequency components could be reduced by 60 %.

Visit of the Hans Böckler Foundation

The Hans Böckler Foundation offers students not only financial support but also a wide range of seminars. We had the pleasure of taking part in an exciting seminar on the topic of ‘Data channels in the seabed - insights into the underwater infrastructure of the future’. We spent one day at the Faculty of Engineering at Kiel University, where we were given presentations on various topics, such as the geology of the seabed and the submarine cable incident between Finland and Estonia.

We were particularly impressed by the opportunity to take a look behind the scenes and experience the work of the students and researchers up close. We were allowed to visit the clean room and the special ‘paddling pool’ where experiments are tested directly in water. We were deeply impressed by the university and its diverse research opportunities. Thank you very much for your hospitality and the exciting insights!

Text and photo by Luise Artmann
