PhD Theses
No. 12 - Jonas Jungclaussen
Jonas Jungclaussen: Artificial Bandwidth Extension of Speech Signals using Neural Networks
PDF-based submission (available freely via the MACAU system), 2021
Commission
- Prof. Dr.-Ing. Gerhard Schmidt (first reviewer)
- Prof. Dr.-Ing. Peter Jax (second reviewer)
- Prof. Dr.-Ing. Stephan Pachnicke (examiner)
- Prof. Dr.-Ing. Peter Höher (head of the examination board)
Abstract
Although mobile broadband telephony has been standardized for over 15 years, many countries still do not have a nationwide network with good coverage. As a result, many cellphone calls are still downgraded to narrowband telephony. The resulting loss of quality can be reduced by artificial bandwidth extension. There has been great progress in bandwidth extension in recent years due to the use of neural networks. The topic of this thesis is the enhancement of artificial bandwidth extension using neural networks. A special focus is given to hands-free calls in a car, where the risk is high that the wideband connection is lost due to the fast movement.
The bandwidth of narrowband transmission is not only reduced towards higher frequencies above 3.5 kHz but also towards lower frequencies below 300 Hz. There are already methods that estimate the low-frequency components quite well, which will therefore not be covered in this thesis.
In most bandwidth extension algorithms, the narrowband signal is first separated into a spectral envelope and an excitation signal. The two parts are extended separately and then combined again. While the excitation can be extended with simple methods without reducing the speech quality compared to wideband speech, the estimation of the spectral envelope for frequencies above 3.5 kHz has not yet been solved satisfactorily. In most evaluations, current bandwidth extension algorithms reduce the quality loss caused by narrowband transmission by at most 50 %.
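The separation into spectral envelope and excitation can be illustrated with linear prediction. The abstract does not name the separation method, so the use of linear prediction here is an assumption, and the function names are hypothetical. The sketch estimates the envelope as an all-pole predictor and obtains the excitation as the prediction residual:

```python
import random

def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of the sequence x (no normalization)."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the normal equations for the prediction-error filter a (a[0] = 1)."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        a_new = a[:]
        for j in range(1, i):
            a_new[j] = a[j] + k * a[i - j]
        a_new[i] = k
        a = a_new
        err *= (1.0 - k * k)                # remaining prediction error power
    return a, err

def excitation(x, a):
    """Filter x with the prediction-error filter a to obtain the residual."""
    return [sum(a[k] * x[n - k] for k in range(len(a)) if n - k >= 0)
            for n in range(len(x))]

# Toy "speech" signal (assumption): white noise shaped by a one-pole filter.
random.seed(0)
x = []
prev = 0.0
for _ in range(4000):
    prev = 0.9 * prev + random.gauss(0.0, 1.0)
    x.append(prev)

a, _ = levinson_durbin(autocorrelation(x, 2), order=2)   # envelope model
e = excitation(x, a)                                      # excitation signal
```

In this toy case the estimated filter recovers the one-pole shaping, and the residual carries much less power than the input. A bandwidth extension system would extend the envelope model and the residual separately before recombining them.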
In this work, a modification of an existing method for excitation extension is proposed which achieves slight improvements without adding computational complexity. In order to enhance the wideband envelope estimation with neural networks, two modifications of the training process are proposed. On the one hand, the loss function is extended with a discriminative part to address the different characteristics of phoneme classes. On the other hand, by using a GAN (generative adversarial network) for the training phase, a second network is added temporarily to evaluate the quality of the estimation.
The neural networks that were trained are compared in subjective and objective evaluations. A final listening test addressed the scenario of a hands-free call in a car, which was simulated acoustically. The quality loss caused by the missing high frequency components could be reduced by 60 % with the proposed approach.
No. 11 - Minh H. Pham
Minh H. Pham: Axial Movements in Older Adults and Patients with Parkinson’s Disease – Algorithm Development and Validation with Inertial Measurement Units Data
To appear soon, 2019
Commission
- Prof. Dr.-Ing. Gerhard Schmidt (first reviewer)
- Prof. Dr. med. Walter Maetzler (second reviewer)
- Prof. Dr.-Ing. Andreas Bahr (examiner)
- Prof. Dr.-Ing. Michael Höft (head of the examination board)
Abstract
Movements that deviate from physiological performance are associated with many disabilities and reduce the ability to perform daily activities. These impaired movements are associated with e.g. aging and neurodegenerative diseases. An objective and quantitative evaluation of these impaired movements is of high clinical relevance, for both patients and the professional medical team that treats the patient. Moreover, assessment in the usual environment of the affected persons may be superior to assessments performed in the clinic and doctor’s practice, because the latter environments may lead to artificial results and can only be performed at certain time points.
The dynamic development of mobile technological devices has led to a new era of assessment in the medical field. The assessment of movements, in particular axial movements (i.e. movements close to the body center / trunk), benefits especially from this development, because sensors that detect movements accurately (e.g. accelerometers, gyroscopes and magnetometers) are highly developed, reasonably priced and easy to integrate into mobile technology. However, there is a substantial lack of useful and, particularly, of validated algorithms for sensors and inertial measurement units that detect the quantity and quality of specific movements in vulnerable cohorts. This work contributes to this area by presenting and discussing three algorithms that detect and evaluate specific movements recorded with an inertial measurement unit (IMU) worn on the lower back by older adults and patients with Parkinson’s disease (PD). This work includes the evaluation of data from supervised and unsupervised environments, and the validation of each algorithm.
No. 10 - Christin Baasch
Christin Baasch: Instrumentelle Analyse von Parkinson-Sprache
Shaker-Verlag, 2019
Commission
- Prof. Dr.-Ing. Gerhard Schmidt (first reviewer)
- Prof. Dr.-Ing. Sebastian Möller (second reviewer)
- Prof. Dr.-Ing. Stephan Pachnicke (examiner)
- Prof. Dr.-Ing. Jeffrey McCord (head of the examination board)
Abstract
Parkinson’s Disease is one of the most frequent neurodegenerative diseases worldwide. Besides motor disorders, patients affected by this disease mostly suffer from a speech disorder named dysarthria.
Dysarthria is treated by a speech therapist, and both the success of the therapy and the progression of the dysarthria should be documented. A multitude of methods are available for this purpose, but all of them have one thing in common: they are not completely objective, because they are not fully automatic. There is always a subjective component in which a rater or another person influences the process.
This work presents a system, named SINAS, for the fully automatic rating of dysarthria. The system contains two main components: a recording tool and an analysis tool. The recording tool allows the speech therapist to guide the patient easily through different speech tasks, with visual aid provided by HTML pages. As a result, the recordings are robust in level and independent of the position of the microphone.
In the analysis tool, acoustic measures intended to evaluate the three symptom clusters of dysarthria are calculated from the recordings. These measures form the input of a neural network, which outputs an NTID rating. The NTID scale rates the intelligibility of the recording, and therefore the dysarthria of the patient, in six steps. The tool is validated by comparing its results with a survey in which people rated the recordings of Parkinson patients according to the NTID scale; the mean value for each recording is taken as the reference. The correlation, the mean absolute error, and the variance of the error serve as cost functions for evaluating the developed system, and the system is optimized on the basis of these functions.
For further evaluation, and to take into account the uncertainty of the raters, the epsilon-insensitive RMSE is used to evaluate the performance of the system. The results clearly show that a fully automatic NTID rating of the patients is possible with the presented SINAS system.
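The evaluation measures named above can be sketched as follows. The exact definitions used in the thesis are not given in the abstract, so this is a sketch under assumptions: the epsilon-insensitive RMSE is assumed to treat any error smaller than a tolerance eps as zero and to shrink larger errors by eps, and all function names and the example ratings are hypothetical.

```python
import math

def pearson_correlation(ref, est):
    """Pearson correlation between reference and estimated ratings."""
    n = len(ref)
    mean_ref = sum(ref) / n
    mean_est = sum(est) / n
    cov = sum((r - mean_ref) * (e - mean_est) for r, e in zip(ref, est))
    var_ref = sum((r - mean_ref) ** 2 for r in ref)
    var_est = sum((e - mean_est) ** 2 for e in est)
    return cov / math.sqrt(var_ref * var_est)

def mean_absolute_error(ref, est):
    """Average magnitude of the rating error."""
    return sum(abs(e - r) for r, e in zip(ref, est)) / len(ref)

def error_variance(ref, est):
    """Variance of the rating error around its mean."""
    errors = [e - r for r, e in zip(ref, est)]
    mean_err = sum(errors) / len(errors)
    return sum((d - mean_err) ** 2 for d in errors) / len(errors)

def eps_insensitive_rmse(ref, est, eps):
    """RMSE in which errors up to eps count as zero (assumed definition)."""
    clipped = [max(0.0, abs(e - r) - eps) for r, e in zip(ref, est)]
    return math.sqrt(sum(c * c for c in clipped) / len(clipped))

# Made-up example: mean listener ratings vs. automatic ratings.
reference = [1.0, 2.5, 4.0, 5.5]
estimate = [1.5, 2.0, 4.0, 6.5]

corr = pearson_correlation(reference, estimate)
mae = mean_absolute_error(reference, estimate)
var = error_variance(reference, estimate)
rmse_eps = eps_insensitive_rmse(reference, estimate, eps=0.5)
```

A natural choice for eps would be a value derived from the spread of the human ratings, so that the system is not penalized for deviations that lie within the disagreement between raters; whether the thesis chooses it this way is not stated in the abstract.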
The developed tool can now form the basis for many applications to support the speech therapy of Parkinson patients.
No. 9 - Philipp Bulling
Philipp Bulling: Rückkopplungsunterdrückung für Innenraumkommunikationssysteme
PDF-based submission (available freely via the MACAU system), 2018
Commission
- Prof. Dr.-Ing. Gerhard Schmidt (first reviewer)
- Prof. Dr.-Ing. Jürgen Freudenberger (second reviewer)
- Prof. Dr.-Ing. habil. Thomas Meurer (examiner)
- Prof. Dr.-Ing. Michael Höft (examiner)
- Prof. Dr.-Ing. Jeffrey McCord (head of the examination board)
Abstract
Communication between the passengers inside a car can be difficult due to high background noise levels. It can be improved with so-called in-car communication systems. These systems capture the voice of talkers by means of microphones and play it back via loudspeakers close to the listeners. The challenge, however, is the electro-acoustic feedback that occurs when the microphone captures not only the local speech but also the loudspeaker signal. Without countermeasures, this feedback results in annoying howling sounds.
The problem of electro-acoustic feedback has not yet been solved for in-car communication systems. Therefore, this work presents techniques to suppress the feedback by means of digital signal processing. The main part of this work focuses on adaptive feedback cancellation. Here, the impulse response between loudspeaker and microphone is estimated with an adaptive filter. The difficulty is a strong correlation between the loudspeaker signal and the local speech, which prevents the adaptive filter from converging towards the desired solution. In order to improve convergence, a novel stepsize control is presented. As the signals are not correlated during reverberation, the stepsize control exploits reverberant signal periods to update the filter coefficients. In addition to the adaptive feedback canceler, a postfilter is presented. The task of the postfilter is to suppress the residual feedback that remains after the feedback cancellation by means of a Wiener filter. To this end, the postfilter is controlled depending on the adaptive filter's state of convergence. Finally, two techniques to improve the speech quality are presented. Firstly, an automatic equalizer is described that improves the sound quality. Secondly, it is shown that speech intelligibility can be improved by adding harmonics to a speech signal.
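The core idea of estimating the loudspeaker-to-microphone impulse response with an adaptive filter can be sketched with a normalized LMS (NLMS) update. This is a generic textbook sketch, not the algorithm of the thesis: it uses a fixed step size instead of the proposed stepsize control, and it assumes the idealized case of a white loudspeaker signal with no local speech, so the correlation problem described above does not arise. All names and the three-tap path are hypothetical.

```python
import random

def nlms_identify(x, d, num_taps, step_size=0.5, reg=1e-8):
    """Estimate an FIR path from input x to observation d with NLMS."""
    w = [0.0] * num_taps            # current estimate of the impulse response
    buf = [0.0] * num_taps          # most recent input samples, newest first
    errors = []
    for x_n, d_n in zip(x, d):
        buf = [x_n] + buf[:-1]
        y_n = sum(wi * bi for wi, bi in zip(w, buf))   # estimated feedback
        e_n = d_n - y_n                                # signal after cancellation
        norm = sum(b * b for b in buf) + reg           # input power normalization
        gain = step_size * e_n / norm
        w = [wi + gain * bi for wi, bi in zip(w, buf)]
        errors.append(e_n)
    return w, errors

# Hypothetical feedback path and white loudspeaker signal (assumptions).
random.seed(0)
true_path = [0.6, -0.3, 0.1]
loudspeaker = [random.gauss(0.0, 1.0) for _ in range(2000)]
microphone = [
    sum(true_path[k] * loudspeaker[n - k]
        for k in range(len(true_path)) if n - k >= 0)
    for n in range(len(loudspeaker))
]

estimated_path, residual = nlms_identify(loudspeaker, microphone, num_taps=3)
```

In this idealized setting the estimate converges to the true path. Once local speech that is correlated with the loudspeaker signal is added to the microphone signal, a fixed step size biases the estimate, which is the motivation for restricting or scaling the update as the abstract describes.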
Besides the theoretical investigations, this work also considers the practical realization of the algorithms. To this end, the algorithms are integrated into a specially developed real-time framework and tested in demonstration cars under realistic conditions during numerous test drives. These test drives show a significant increase in both stability and speech quality compared to existing approaches.
No. 8 - Jens Reermann
Jens Reermann: Signalverarbeitung für magnetoelektrische Sensorsysteme
Shaker-Verlag, 2017
Commission
- Prof. Dr.-Ing. Gerhard Schmidt (first reviewer)
- Prof. Dr. rer. nat. habil. Franz Faupel (second reviewer)
- Prof. Dr.-Ing. Dr.-Ing. habil. Robert Weigel (third reviewer)
- Prof. Dr.-Ing. Michael Höft (examiner)
- Prof. Dr.-Ing. habil. Eckhard Quandt (head of the examination board)
Abstract
The measurement of magnetic fields for medical diagnostics is only well-established at highly specialized centers because of the high costs involved. The reason for this is the indispensable use of highly sensitive magnetic field sensors based on Super-Conducting Quantum Interference Devices. Although such systems have met the necessary technical requirements for decades, they are nonetheless expensive and very complicated to run because of cryogenic cooling. To establish the widespread use of magnetic measurements in the field of medicine, concepts for sensors that are uncooled, and thereby less expensive and user-friendly, are being researched with detection limits sufficient for measurements. A promising area of research deals with magnetoelectric sensors (ME-sensors).
To increase the usability of such sensors in realistic measurement environments and improve their signal quality with respect to the signal-to-noise ratio (SNR), this thesis examines various methods of signal processing. First, the basic procedures for measuring magnetic signals using the ME-sensors are presented. Special attention is paid to the modelling of sensor systems, the determination of the operation point, and the reduction of the signal dynamic. Due to their cantilever design, the ME-sensors have a high mechanical cross-sensitivity. Furthermore, they also measure magnetic fields of disturbing sources. To reduce the influence of these disturbances, the work presented here investigates different approaches based on noise cancellation. The use of a magnetic reference successfully cancels magnetic disturbances. With regard to acoustic or mechanical disturbances, various reference sensors are considered.
Irrespective of the distortion type, their influence can be reduced by up to 40 dB. Additionally, combination approaches are also investigated. These approaches are based on the idea of utilizing different frequency ranges in parallel and subsequently combining the sensor readout signals. By means of such methods, the detection limit of the sensors can be improved by more than 5 dB. In addition to this static improvement, another decisive advantage is achieved by dynamically adapting the combination. For cases in which a continuous data stream is not required and the desired signal is in principle periodic, several averaging methods for an improved detection limit are discussed. In the same way, an adaptive implementation of the averaging process can reduce the cross-sensitivity.
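The benefit of averaging a periodic signal can be illustrated with plain synchronous averaging. This is only the simplest variant and an assumption for illustration; the abstract does not specify which averaging methods the thesis discusses. The sketch assumes a known period, uncorrelated noise, and hypothetical names. Under these assumptions, averaging N periods lowers the noise power by a factor of about N, i.e. by roughly 10·log10(N) dB.

```python
import math
import random

def synchronous_average(signal, period, num_periods):
    """Average num_periods consecutive periods sample by sample."""
    return [
        sum(signal[p * period + i] for p in range(num_periods)) / num_periods
        for i in range(period)
    ]

def mean_power(values):
    """Mean of the squared samples."""
    return sum(v * v for v in values) / len(values)

# Hypothetical periodic signal buried in white noise (assumption).
random.seed(1)
period, num_periods = 50, 100
template = [math.sin(2.0 * math.pi * i / period) for i in range(period)]
noisy = [template[n % period] + random.gauss(0.0, 1.0)
         for n in range(period * num_periods)]

averaged = synchronous_average(noisy, period, num_periods)

# Noise power before and after averaging, relative to the known template.
noise_before = mean_power([noisy[n] - template[n % period]
                           for n in range(len(noisy))])
noise_after = mean_power([averaged[i] - template[i] for i in range(period)])
gain_db = 10.0 * math.log10(noise_before / noise_after)
```

With 100 averaged periods the measured gain lands near the 20 dB predicted by the rule above, with some scatter because the noise power after averaging is estimated from only 50 samples.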
These methods enabled the first biomagnetic measurement with an ME-sensor by detecting the R-wave as part of a magnetocardiogram. All in all, each processing step permits continued improvement of the sensor signal with regard to its SNR. The usability of the ME-sensors in real measurement environments is thereby significantly improved.