Seminar "Selected Topics in Speech and Audio Signal Processing"
| Basic Information | |
|---|---|
| Lecturers: | Gerhard Schmidt and group |
| Semester: | Summer term |
| Language: | English or German |
| Target group: | Master's students in electrical engineering and computer engineering |
| Prerequisites: | Fundamentals of digital signal processing |
| Registration procedure: | If you want to sign up for this seminar, you need to register with the following information in the registration form. Please note that the registration period starts on 01.04.2025 at 10:00 h and ends on 13.04.2025 at 23:59 h. Applications received before or after this period will not be taken into account. Within this period, you register by sending an e-mail stating the desired seminar topic, your name, and your matriculation number. Only one student per topic is permitted (first come, first served). Registration is binding. Deregistration is only possible by sending an e-mail with your name and matriculation number. |
| Time: | Preliminary meeting: by arrangement with the individual supervisor. Written report: due on xx.xx.2025. Final presentations: xx.xx.2025 at xx:xx h, xx. |
| Contents: | Students write a scientific report on a topic closely related to the current research of the DSS group. Potential topics therefore deal with digital signal processing for speech and audio. Students also present their findings to the other participants and the DSS group. |
Topics for summer term 2025
| Topic title | Description |
|---|---|
| Real-time Formant Extraction in Pathological Speech | Formants, the resonant frequencies of the vocal tract, play a crucial role in speech production and perception. Real-time formant extraction is especially important in pathological speech processing, where it aids in diagnosing and monitoring speech disorders such as dysarthria, apraxia, and other conditions that affect speech clarity and quality. This seminar paper should compare various methods for real-time formant extraction, covering both classical signal processing techniques and modern machine-learning-based approaches. Understanding the advantages and limitations of these methods helps in selecting the most appropriate approach for a given speech processing task. |
| Speech Evaluation Metrics | Assessing speech quality and intelligibility is of great interest, both in speech therapy and in the evaluation of algorithms that improve speech signals degraded by interference such as noise and reverberation. The aim of this seminar paper is to summarize and compare various objective assessment methods. In addition to traditional measures such as STOI and PESQ, approaches based on neural networks should also be included in the comparison. |
| Auditory Feedback Modulation | Auditory feedback, which involves the real-time processing and modulation of sound signals, plays a crucial role in modifying voice quality, particularly in medical contexts. It is especially important in speech therapy and rehabilitation, where controlling and enhancing voice quality can aid in diagnosing and treating speech disorders. Real-time modification of voice quality can be achieved with a range of techniques, including signal processing and machine learning algorithms. The seminar paper explores the challenges and innovations involved in developing systems that dynamically adjust voice quality to the user's specific needs and to environmental factors. By reviewing the existing literature on voice quality modification, it aims to provide a comprehensive overview of current approaches and their effectiveness in treating speech disorders such as dysphonia or other vocal impairments. |
| Diffusion-based Neural Networks for Speech Enhancement | In many real-world applications, such as telephone conversations, video conferencing, or hearing aids, speech quality is often compromised by background noise. Conventional speech enhancement methods reach their limits, especially in the presence of severe distortion or complex background noise. Diffusion-based neural networks offer a promising alternative: they reconstruct the speech signal gradually while preserving fine acoustic details. The aim of this seminar paper is to explore how these models work and what advantages they offer over traditional approaches. |