
Journal articles on the topic 'Speech intelligibility enhancement'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 journal articles for your research on the topic 'Speech intelligibility enhancement.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles on a wide variety of disciplines and organise your bibliography correctly.

1

Nemade, Milind U., and Satish K. Shah. "Speech Enhancement Techniques: Quality vs. Intelligibility." International Journal of Future Computer and Communication 3, no. 3 (2014): 216–21. http://dx.doi.org/10.7763/ijfcc.2014.v3.299.

2

Kates, James M. "Speech intelligibility enhancement." Journal of the Acoustical Society of America 83, no. 6 (1988): 2474. http://dx.doi.org/10.1121/1.396313.

3

Yang, Yu Xiang, and Jian Fen Ma. "Speech Intelligibility Enhancement Using Distortion Control." Advanced Materials Research 912-914 (April 2014): 1391–94. http://dx.doi.org/10.4028/www.scientific.net/amr.912-914.1391.

Abstract:
To improve the intelligibility of noisy speech, a novel speech enhancement algorithm using distortion control is proposed. Current speech enhancement algorithms fail to improve intelligibility because they aim to minimize the overall distortion of the enhanced speech, whereas different types of speech distortion contribute differently to intelligibility: distortion in excess of 6.02 dB has the most detrimental effect. During noise reduction, the type of speech distortion can be determined from the signal distortion ratio, and distortion in excess of 6.02 dB can be properly controlled by tuning the gain function of the enhancement algorithm. Experimental results show that the proposed algorithm considerably improves the intelligibility of noisy speech.
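The 6.02 dB figure corresponds to a factor of two in magnitude (20 log10 2 ≈ 6.02 dB). As an illustration of the idea only, not the authors' code, here is a minimal numpy sketch (all names hypothetical) that caps a spectral gain so the enhanced magnitude never exceeds twice the estimated clean magnitude:

```python
import numpy as np

def clamp_amplification(noisy_mag, gain, clean_est):
    """Limit amplification distortion to at most 6.02 dB (a factor of 2).

    Wherever gain * noisy_mag would exceed 2 * clean_est, the gain is
    reduced so the enhanced magnitude equals exactly 2 * clean_est.
    """
    gain = gain.copy()
    limit = 2.0 * clean_est                 # 2x magnitude = +6.02 dB
    over = gain * noisy_mag > limit
    gain[over] = limit[over] / noisy_mag[over]
    return gain

noisy = np.array([1.0, 1.0, 4.0])           # noisy magnitude spectrum
g = np.array([0.5, 3.0, 1.0])               # raw enhancement gains
clean = np.array([0.8, 1.0, 1.0])           # estimated clean magnitudes
g2 = clamp_amplification(noisy, g, clean)
```

In the paper the control is driven per frequency bin by the signal distortion ratio; here the clamp is simply applied wherever the amplification bound would be violated.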
4

Liu, Peng, and Jian Fen Ma. "A Higher Intelligibility Speech-Enhancement Algorithm." Applied Mechanics and Materials 321-324 (June 2013): 1075–79. http://dx.doi.org/10.4028/www.scientific.net/amm.321-324.1075.

Abstract:
A subspace-based speech-enhancement algorithm with higher intelligibility is proposed. The majority of existing speech-enhancement algorithms cannot effectively improve the intelligibility of enhanced speech. One important reason is that they use only the minimum mean square error (MMSE) to constrain speech distortion, ignoring that differences between speech distortion regions have a significant effect on intelligibility. The a priori signal-to-noise ratio (SNR) and the gain matrix were used to determine the distortion region. The gain matrix was then modified to constrain the magnitude spectrum of amplification distortion in excess of 6.02 dB, which is the most damaging to intelligibility. Both objective evaluation and subjective listening show that the proposed algorithm does improve the intelligibility of the enhanced speech.
5

Yi, Astrid, Willy Wong, and Moshe Eizenman. "Gaze Patterns and Audiovisual Speech Enhancement." Journal of Speech, Language, and Hearing Research 56, no. 2 (2013): 471–80. http://dx.doi.org/10.1044/1092-4388(2012/10-0288).

Abstract:
Purpose: In this study, the authors sought to quantify the relationships between speech intelligibility (perception) and gaze patterns under different auditory–visual conditions. Method: Eleven subjects listened to low-context sentences spoken by a single talker while viewing the face of one or more talkers on a computer display. Subjects either maintained their gaze at a specific distance (0°, 2.5°, 5°, 10°, and 15°) from the center of the talker's mouth (CTM) or moved their eyes freely on the computer display. Eye movements were monitored with an eye-tracking system, and speech intelligibility was evaluated by the mean percentage of correctly perceived words. Results: With a single talker and a fixed point of gaze, speech intelligibility was similar for all fixations within 10° of the CTM. With visual cues from two talker faces and a speech signal from one of the talkers, speech intelligibility was similar to that of a single talker for fixations within 2.5° of the CTM. With natural viewing of a single talker, gaze strategy changed with speech-signal-to-noise ratio (SNR). For low speech-SNR, a strategy that brought the point of gaze directly to within 2.5° of the CTM was used in approximately 80% of trials, whereas in high speech-SNR it was used in only approximately 50% of trials. Conclusions: With natural viewing of a single talker and high speech-SNR, subjects can shift their gaze between points on the talker's face without compromising speech intelligibility. With low speech-SNR, subjects change their gaze patterns to fixate primarily on points that are in close proximity to the talker's mouth. The latter strategy is essential to optimize speech intelligibility in situations where there are simultaneous visual cues from multiple talkers (i.e., when some of the visual cues are distracters).
6

Giri, Mahesh, and Neela Rayavarapu. "Improving the intelligibility of dysarthric speech using a time domain pitch synchronous-based approach." International Journal of Electrical and Computer Engineering (IJECE) 13, no. 4 (2023): 4041. http://dx.doi.org/10.11591/ijece.v13i4.pp4041-4051.

Abstract:
Dysarthria is a motor speech impairment that reduces the intelligibility of speech. Observations indicate that for different types of dysarthria, the fundamental frequency, intensity, and speech rate are distinct from those of unimpaired speakers. The proposed enhancement technique therefore modifies these parameters so that they fall within the range of unimpaired speakers. The fundamental frequency and speech rate of dysarthric speech are modified using the time domain pitch synchronous overlap and add (TD-PSOLA) algorithm, and its intensity is then modified using a fast Fourier transform (FFT) and inverse fast Fourier transform (IFFT)-based approach. The technique was applied to impaired speech samples from ten dysarthric speakers, and the change in intelligibility between the impaired and enhanced speech was evaluated using the rating scale and word count methods. The improvement in intelligibility is significant for speakers whose original intelligibility was poor, whereas it is minimal for speakers whose intelligibility was already high. According to the rating scale method, the change in intelligibility across speakers ranges from 9% to 53%; according to the word count method, it ranges from 0% to 53%.
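Of the three modifications described, the intensity adjustment is the simplest to sketch. The toy example below (hypothetical names, not the authors' implementation) scales a frame's FFT magnitude uniformly while keeping the phase, then reconstructs the frame with the inverse FFT:

```python
import numpy as np

def modify_intensity(frame, target_rms):
    """Scale a speech frame to a target RMS via FFT magnitude scaling.

    The spectrum's magnitude is scaled uniformly while the phase is
    kept, and the frame is rebuilt with the inverse FFT.
    """
    spec = np.fft.rfft(frame)
    rms = np.sqrt(np.mean(frame ** 2))
    scale = target_rms / rms
    return np.fft.irfft(np.abs(spec) * scale * np.exp(1j * np.angle(spec)),
                        n=len(frame))

t = np.arange(160) / 8000.0                  # one 20 ms frame at 8 kHz
frame = 0.1 * np.sin(2 * np.pi * 200 * t)    # quiet 200 Hz tone
louder = modify_intensity(frame, 0.2)        # raise RMS to 0.2
```

Uniform magnitude scaling is equivalent to time-domain scaling; the pitch and rate modifications via TD-PSOLA are considerably more involved and are not sketched here.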
8

Ghorpade, Kalpana, and Arti Khaparde. "Single-channel speech enhancement by PSO-GSA with harmonic regeneration noise reduction." Bulletin of Electrical Engineering and Informatics 12, no. 5 (2023): 2895–902. http://dx.doi.org/10.11591/eei.v12i5.5373.

Abstract:
Speech quality significantly affects the performance of speech-dependent systems. Background noise lowers the clarity and intelligibility of speech, and speech enhancement can restore its quality. We propose a single-channel speech enhancement framework that combines particle swarm optimization (PSO), the gravitational search algorithm (GSA), and harmonic regeneration noise reduction (HRNR) to reduce noise in the speech signal and increase speech intelligibility. The proposed hybrid algorithm optimizes the amount of overlap between noisy speech frames, which helps reduce the overlapped noise; the HRNR algorithm is then applied to retain the speech harmonics. The algorithm improves speech intelligibility for babble, car, and exhibition noise, and the segmental signal-to-noise ratio (SNR) is also improved for these noise types, with minimal speech distortion.
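Segmental SNR, the measure reported above, averages frame-level SNRs in dB, conventionally clamped to the range [-10, 35] dB. A small illustrative implementation (the frame length and clamping limits are common defaults, not values from the paper):

```python
import numpy as np

def segmental_snr(clean, enhanced, frame_len=160, floor=-10.0, ceil=35.0):
    """Frame-averaged SNR in dB, with the usual [-10, 35] dB clamping."""
    n_frames = len(clean) // frame_len
    snrs = []
    for i in range(n_frames):
        s = clean[i * frame_len:(i + 1) * frame_len]
        e = enhanced[i * frame_len:(i + 1) * frame_len]
        err = np.sum((s - e) ** 2) + 1e-12          # avoid divide-by-zero
        snr = 10.0 * np.log10(np.sum(s ** 2) / err + 1e-12)
        snrs.append(min(max(snr, floor), ceil))     # clamp each frame
    return float(np.mean(snrs))

rng = np.random.default_rng(0)
clean = rng.standard_normal(1600)                   # synthetic "clean" signal
noisy = clean + 0.1 * rng.standard_normal(1600)     # roughly 20 dB SNR
```

The clamping keeps silent or perfectly reconstructed frames from dominating the average.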
9

Shahidi, Lidea K., Leslie M. Collins, and Boyla O. Mainsah. "Objective intelligibility measurement of reverberant vocoded speech for normal-hearing listeners: Towards facilitating the development of speech enhancement algorithms for cochlear implants." Journal of the Acoustical Society of America 155, no. 3 (2024): 2151–68. http://dx.doi.org/10.1121/10.0025285.

Abstract:
Cochlear implant (CI) recipients often struggle to understand speech in reverberant environments. Speech enhancement algorithms could restore speech perception for CI listeners by removing reverberant artifacts from the CI stimulation pattern. Listening studies, either with cochlear-implant recipients or normal-hearing (NH) listeners using a CI acoustic model, provide a benchmark for speech intelligibility improvements conferred by the enhancement algorithm but are costly and time consuming. To reduce the associated costs during algorithm development, speech intelligibility could be estimated offline using objective intelligibility measures. Previous evaluations of objective measures that considered CIs primarily assessed the combined impact of noise and reverberation and employed highly accurate enhancement algorithms. To facilitate the development of enhancement algorithms, we evaluate twelve objective measures in reverberant-only conditions characterized by a gradual reduction of reverberant artifacts, simulating the performance of an enhancement algorithm during development. Measures are validated against the performance of NH listeners using a CI acoustic model. To enhance compatibility with reverberant CI-processed signals, measure performance was assessed after modifying the reference signal and spectral filterbank. Measures leveraging the speech-to-reverberant ratio, cepstral distance and, after modifying the reference or filterbank, envelope correlation are strong predictors of intelligibility for reverberant CI-processed speech.
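One of the measure families named above correlates amplitude envelopes of the reference and test signals. Below is a deliberately simplified stand-in (frame-RMS envelopes and a Pearson correlation; the measures evaluated in the paper are more elaborate):

```python
import numpy as np

def envelope_correlation(ref, test, frame_len=128):
    """Pearson correlation between frame-RMS amplitude envelopes.

    Illustrative only: a crude proxy for envelope-based
    intelligibility measures, not the paper's metric.
    """
    def env(x):
        n = len(x) // frame_len
        return np.array([np.sqrt(np.mean(x[i * frame_len:(i + 1) * frame_len] ** 2))
                         for i in range(n)])
    a, b = env(ref), env(test)
    a = a - a.mean()
    b = b - b.mean()
    return float(np.sum(a * b) / np.sqrt(np.sum(a ** 2) * np.sum(b ** 2)))

rng = np.random.default_rng(1)
mod = 1.0 + 0.5 * np.sin(2 * np.pi * np.arange(4096) / 1024)  # slow envelope
speech = mod * rng.standard_normal(4096)                      # modulated noise
```

A signal correlates perfectly with itself, while unrelated noise yields a much lower envelope correlation.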
10

Majewski, Wojciech J. "Aural method of speech intelligibility enhancement." Journal of the Acoustical Society of America 103, no. 5 (1998): 2772. http://dx.doi.org/10.1121/1.421402.

11

Kollmeier, B., and J. Peissig. "Speech Intelligibility Enhancement by Interaural Magnification." Acta Oto-Laryngologica 109, sup469 (1990): 215–23. http://dx.doi.org/10.1080/00016489.1990.12088432.

12

Deux, Florent, and Mendel Kleiner. "Binaural enhancement of speech intelligibility metrics." Journal of the Acoustical Society of America 123, no. 5 (2008): 3608. http://dx.doi.org/10.1121/1.2934792.

13

Ghorpade, Kalpana, and Arti Khaparde. "SINGLE CHANNEL SPEECH ENHANCEMENT USING EVOLUTIONARY ALGORITHM WITH LOG-MMSE." ASEAN Engineering Journal 12, no. 1 (2022): 83–91. http://dx.doi.org/10.11113/aej.v12.16770.

Abstract:
Additive noise degrades speech quality and intelligibility. Speech enhancement reduces this noise to make speech more pleasant and intelligible, and it plays a significant role in speech recognition and speech-operated systems. In this paper, we propose a single-channel speech enhancement method in which the log-minimum mean square error (log-MMSE) method and a modified accelerated particle swarm optimization algorithm are used to design a filter for improving the quality and intelligibility of noisy speech. The accelerated particle swarm optimization (APSO) algorithm is modified so that a single dimension of a particle's position is changed in each iteration when obtaining the particle's new position. Using this algorithm, a filter with multiple passbands and notches is designed for speech enhancement. The modified algorithm converges faster than the standard particle swarm optimization (PSO) algorithm and APSO while giving optimum filter coefficients. The proposed speech enhancement method improves the perceptual evaluation of speech quality (PESQ) score by 17.05% for 5 dB babble noise, 33.92% for 5 dB car noise, 14.96% for 5 dB airport noise, and 39.13% for 5 dB exhibition noise. The average output PESQ for these four types of noise is improved compared to conventional speech enhancement methods, and there is an average 7.58 dB improvement in segmental SNR for these noise types. The proposed method improves speech intelligibility with minimum speech distortion.
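The single-dimension modification can be sketched as follows: in accelerated PSO, each particle is pulled toward the global best with some decaying randomness, and here only one randomly chosen coordinate moves per iteration. This is an illustrative reconstruction on a toy objective, with guessed parameter values, not the authors' filter-design code:

```python
import numpy as np

def apso_minimize(f, dim, n_particles=20, iters=300, seed=0,
                  alpha=0.2, beta=0.5):
    """Accelerated PSO in which only one randomly chosen coordinate of
    each particle is updated per iteration (sketch of the
    single-dimension modification described in the abstract)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5, 5, size=(n_particles, dim))
    best = min(x, key=f).copy()
    for t in range(iters):
        a = alpha * 0.97 ** t                 # decaying random step
        for i in range(n_particles):
            d = rng.integers(dim)             # update one dimension only
            x[i, d] = ((1 - beta) * x[i, d] + beta * best[d]
                       + a * rng.standard_normal())
            if f(x[i]) < f(best):
                best = x[i].copy()
    return best

sphere = lambda v: float(np.sum(v ** 2))      # toy objective, minimum at 0
sol = apso_minimize(sphere, dim=4)
```

In the paper, the objective would instead score candidate filter coefficients by the quality of the filtered speech.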
14

Zhang, Yunqi C., Yusuke Hioka, C. T. Justine Hui, and Catherine I. Watson. "Performance of speech enhancement algorithms on the speech intelligibility of native Mandarin listeners immersed in English-speaking environment." INTER-NOISE and NOISE-CON Congress and Conference Proceedings 268, no. 8 (2023): 397–402. http://dx.doi.org/10.3397/in_2023_0071.

Abstract:
Speech enhancement algorithms have been developed to improve speech intelligibility for listeners under noisy conditions. However, existing algorithms have been evaluated almost exclusively with native listeners; their performance for non-native listeners has rarely been investigated. This study conducts a subjective listening test with native New Zealand English listeners and native Mandarin listeners who have been immersed in a New Zealand English-speaking environment for more than one year. The participants were asked to transcribe noisy English sentences processed by five widely used single-channel speech enhancement algorithms. The speech intelligibility of the two groups was quantified and compared to investigate the effectiveness of the speech enhancement algorithms for non-native listeners who are familiar with the target language.
15

Dachasilaruk, Siriporn, Niphat Jantharamin, and Apichai Rungruang. "Speech intelligibility enhancement for Thai-speaking cochlear implant listeners." Indonesian Journal of Electrical Engineering and Computer Science 13, no. 3 (2019): 866. http://dx.doi.org/10.11591/ijeecs.v13.i3.pp866-875.

Abstract:
Cochlear implant (CI) listeners encounter difficulties in communicating with other persons in noisy listening environments. However, most CI research has been carried out using the English language. In this study, single-channel speech enhancement (SE) strategies as a pre-processing approach for the CI system were investigated in terms of Thai speech intelligibility improvement. Two SE algorithms, namely the multi-band spectral subtraction (MBSS) and Wiener filter (WF) algorithms, were evaluated. Speech signals consisting of monosyllabic and bisyllabic Thai words were degraded by speech-shaped noise and babble noise at SNR levels of 0, 5, and 10 dB. The noisy words were then enhanced using the SE algorithms, fed into the CI system to synthesize vocoded speech, and presented to twenty normal-hearing listeners. The results indicated that speech intelligibility was marginally improved by the MBSS algorithm and significantly improved by the WF algorithm in some conditions. The enhanced bisyllabic words showed a noticeably higher intelligibility improvement than the enhanced monosyllabic words in all conditions, particularly in speech-shaped noise. Such outcomes may be beneficial to Thai-speaking CI listeners.
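Multi-band spectral subtraction partitions the spectrum into bands and subtracts a scaled noise estimate in each, flooring the result to avoid negative magnitudes. A simplified sketch (in the full algorithm the over-subtraction factor varies with each band's SNR; here it is a constant):

```python
import numpy as np

def multiband_spectral_subtract(noisy_mag, noise_mag, band_edges,
                                oversub=2.0, floor=0.02):
    """Per-band spectral subtraction with over-subtraction and flooring.

    band_edges lists FFT-bin boundaries; each band's noise estimate is
    subtracted with factor `oversub`, and the result is floored at
    `floor` times the noisy magnitude (simplified sketch of MBSS).
    """
    out = np.empty_like(noisy_mag)
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        sub = noisy_mag[lo:hi] - oversub * noise_mag[lo:hi]
        out[lo:hi] = np.maximum(sub, floor * noisy_mag[lo:hi])
    return out

noisy = np.array([5.0, 4.0, 3.0, 2.0, 1.0, 0.5])   # noisy magnitudes
noise = np.array([1.0, 1.0, 1.0, 1.0, 1.0, 1.0])   # noise estimate
clean_est = multiband_spectral_subtract(noisy, noise, [0, 3, 6])
```

High-energy bins survive the subtraction, while low-energy bins fall to the spectral floor rather than going negative.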
17

Smriti, Sahu, and Rayavarapu Neela. "Compressive speech enhancement using semi-soft thresholding and improved threshold estimation." International Journal of Electrical and Computer Engineering (IJECE) 13, no. 3 (2023): 2788–800. https://doi.org/10.11591/ijece.v13i3.pp2788-2800.

Abstract:
Compressive speech enhancement is based on the compressive sensing (CS) sampling theory and utilizes the sparsity of the signal for its enhancement. To improve the performance of the discrete wavelet transform (DWT) basis-function based compressive speech enhancement algorithm, this study presents a semi-soft thresholding approach suggesting improved threshold estimation and threshold rescaling parameters. The semi-soft thresholding approach utilizes two thresholds: one threshold value is an improved universal threshold and the other is calculated based on the initial-silence-region of the signal. This study suggests that thresholding should be applied to both detail coefficients and approximation coefficients to remove noise effectively. The performances of the hard, soft, garrote and semi-soft thresholding approaches are compared based on objective quality and speech intelligibility measures. The normalized covariance measure is introduced as an effective intelligibility measure as it has a strong correlation with the intelligibility of the speech signal. A visual inspection of the output signal is used to verify the results. Experiments were conducted on the noisy speech corpus (NOIZEUS) speech database. The experimental results indicate that the proposed method of semi-soft thresholding using improved threshold estimation provides better enhancement compared to the other thresholding approaches.
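The semi-soft (firm) thresholding rule itself is compact: coefficients below the lower threshold are zeroed, those above the upper threshold are kept, and those in between are shrunk linearly. A sketch of the rule (the threshold values below are arbitrary, not the paper's estimates):

```python
import numpy as np

def semisoft_threshold(w, t1, t2):
    """Semi-soft (firm) thresholding with lower threshold t1 and upper
    threshold t2 (t1 < t2): |w| <= t1 is zeroed, |w| > t2 is kept,
    and values in between are shrunk linearly toward zero."""
    w = np.asarray(w, dtype=float)
    out = np.where(np.abs(w) <= t1, 0.0, w)
    mid = (np.abs(w) > t1) & (np.abs(w) <= t2)
    out = np.where(mid, np.sign(w) * t2 * (np.abs(w) - t1) / (t2 - t1), out)
    return out

coeffs = np.array([-3.0, -1.5, -0.5, 0.2, 1.0, 2.5])  # toy DWT coefficients
shrunk = semisoft_threshold(coeffs, t1=0.8, t2=2.0)
```

The linear transition makes the rule continuous at |w| = t2, avoiding the abrupt jump of hard thresholding while shrinking large coefficients less than soft thresholding does.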
18

Meghana, Rasamalla. "Universal Score-based Speech Enhancement with High Content Preservation." INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT 09, no. 06 (2025): 1–9. https://doi.org/10.55041/ijsrem49349.

Abstract:
Speech enhancement aims to improve the quality and intelligibility of speech signals corrupted by noise or distortions. Traditional methods often struggle to generalize across diverse noise types and acoustic conditions, limiting their real-world applicability. In this work, we propose a universal score-based speech enhancement framework that leverages recent advances in score-based generative modeling to robustly denoise speech signals while preserving critical speech content. Our approach models the complex speech distribution through a learned score function, enabling effective removal of various noise patterns without relying on explicit noise assumptions. Extensive experiments demonstrate that the proposed method achieves superior enhancement performance across multiple challenging noise scenarios, outperforming state-of-the-art baselines in both objective metrics and perceptual quality. Notably, the approach excels in preserving speech content and naturalness, making it suitable for practical applications such as telecommunication, hearing aids, and automatic speech recognition.
Key Words: Speech Enhancement, Score-based Generative Models, Noise Robustness, Content Preservation, Speech Denoising, Universal Enhancement, Deep Learning, Signal Processing, Speech Intelligibility, Noise Generalization
19

Li, Dengshi, Chenyi Zhu, and Lanxin Zhao. "D2StarGAN: A Near-Far End Noise Adaptive StarGAN for Speech Intelligibility Enhancement." Electronics 12, no. 17 (2023): 3620. http://dx.doi.org/10.3390/electronics12173620.

Abstract:
When using mobile communication, the voice output from the device is already relatively clear, but in a noisy environment it is difficult for the listener to grasp the information the speaker is expressing. Speech intelligibility enhancement (IENH) technology has emerged to help alleviate this problem by enhancing intelligibility during the reception phase. Previous research has approached IENH by converting normal speech to different levels of Lombard speech, inspired by a well-known acoustic mechanism called the Lombard effect. However, these methods often distort the speech and impair overall speech quality. To address this quality degradation, we propose an improved StarGAN-based IENH framework that combines StarGAN networks with a dual-discriminator design. This approach offers two main advantages: (1) a speech metric discriminator on top of StarGAN that optimizes multiple intelligibility- and quality-related metrics simultaneously; and (2) a framework that adapts to different far-end and near-end noise levels and noise types. Results from objective experiments and subjective preference tests show that our approach outperforms the baseline, enabling IENH to be more widely used.
20

Goldberg, Hyman. "Electroacoustic speech intelligibility enhancement method and apparatus." Journal of the Acoustical Society of America 101, no. 3 (1997): 1221. http://dx.doi.org/10.1121/1.419429.

21

Srinivasarao, V., and Umesh Ghanekar. "Speech intelligibility enhancement: a hybrid wiener approach." International Journal of Speech Technology 23, no. 3 (2020): 517–25. http://dx.doi.org/10.1007/s10772-020-09737-4.

23

Lohith, Lakkakula. "Speech Enhancement via Metric GAN and Kolmogorov-Arnold Networks: A Deep Learning Approach in Python." INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT 09, no. 05 (2025): 1–9. https://doi.org/10.55041/ijsrem49208.

Abstract:
Speech enhancement in noisy environments remains a critical challenge for robust voice communication systems. Traditional signal processing techniques and supervised deep learning models often struggle to generalize to diverse noise conditions and fail to optimize for human perceptual quality. This paper proposes a novel Metric GAN+KAN architecture, which integrates a Generative Adversarial Network (GAN) with Kolmogorov-Arnold Networks (KAN) to enhance speech signals by focusing both on perceptual fidelity and structural consistency. The GAN-based generator learns to map noisy speech spectrograms to clean counterparts, while the discriminator enforces perceptual realism. The KAN component introduces domain-aware constraints that preserve the harmonic structure and energy characteristics of speech. We train the system using perceptual metrics such as PESQ and STOI, enabling the model to directly optimize for intelligibility and clarity. Experimental results on the VoiceBank-DEMAND dataset demonstrate significant improvements over conventional methods, achieving a PESQ of 3.1, STOI of 0.88, and SDR of 15 dB. This work paves the way for real-time, intelligibility-focused speech enhancement systems in practical applications.
Key Words: Speech Enhancement, Generative Adversarial Networks (GAN), Kolmogorov-Arnold Networks (KAN), Perceptual Evaluation of Speech Quality (PESQ), Short-Time Objective Intelligibility (STOI), Signal-to-Distortion Ratio (SDR), Deep Learning, Noisy Speech, Real-Time Audio Processing, Knowledge-Aware Constraints
24

Espy-Wilson, Carol Y., Venkatesh R. Chari, Joel M. MacAuslan, Caroline B. Huang, and Michael J. Walsh. "Enhancement of Electrolaryngeal Speech by Adaptive Filtering." Journal of Speech, Language, and Hearing Research 41, no. 6 (1998): 1253–64. http://dx.doi.org/10.1044/jslhr.4106.1253.

Abstract:
Artificial larynges provide a means of verbal communication for people who have either lost or are otherwise unable to use their larynges. Although they enable adequate communication, the resulting speech has an unnatural quality and is significantly less intelligible than normal speech. One of the major problems with the widely used Transcutaneous Artificial Larynx (TAL) is the presence of a steady background noise caused by the leakage of acoustic energy from the TAL, its interface with the neck, and the surrounding neck tissue. The severity of the problem varies from speaker to speaker, partly depending upon the characteristics of the individual's neck tissue. The present study tests the hypothesis that TAL speech is enhanced in quality (as assessed through listener preference judgments) and intelligibility by removal of the inherent, directly radiated background signal. In particular, the focus is on the improvement of speech over the telephone or through some other electronic communication medium. A novel adaptive filtering architecture was designed and implemented to remove the background noise. Perceptual tests were conducted to assess speech, from two individuals with a laryngectomy and two normal speakers using the Servox TAL, before and after processing by the adaptive filter. A spectral analysis of the adaptively filtered TAL speech revealed a significant reduction in the amount of background source radiation yet preserved the acoustic characteristics of the vocal output. Results from the perceptual tests indicate a clear preference for the processed speech. In general, there was no significant improvement or degradation in intelligibility. However, the processing did improve the intelligibility of word-initial non-nasal consonants.
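The paper describes a novel adaptive-filter architecture, but the underlying principle can be illustrated with a standard normalised-LMS noise canceller: a reference pick-up of the directly radiated source is filtered and subtracted from the primary signal, and the residual drives adaptation. Everything below is synthetic (signals, path, parameters), not the authors' design:

```python
import numpy as np

def lms_cancel(primary, reference, taps=16, mu=0.05):
    """Normalised-LMS noise canceller: adapts an FIR filter so the
    filtered reference matches the noise in the primary input;
    the residual (error) is the enhanced output."""
    w = np.zeros(taps)
    out = np.zeros(len(primary))
    for n in range(taps - 1, len(primary)):
        x = reference[n - taps + 1:n + 1][::-1]      # most recent sample first
        e = primary[n] - w @ x                       # cancellation residual
        w += mu * e * x / (x @ x + 1e-8)             # normalised update
        out[n] = e
    return out

rng = np.random.default_rng(2)
noise_ref = rng.standard_normal(4000)                        # reference pick-up
leak = np.convolve(noise_ref, [0.8, 0.3], mode="full")[:4000]  # leakage path
speech = 0.3 * np.sin(2 * np.pi * 440 * np.arange(4000) / 8000)
primary = speech + leak                                      # microphone signal
cleaned = lms_cancel(primary, noise_ref)
```

After the filter converges, the residual tracks the speech while the correlated leakage is removed.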
25

Veeramakal, T., Syed Raffi Ahamed J, and Bagiyalakshmi N. "Speech Signal Enhancement with Integrated Weighted Filtering for PSNR Reduction in Multimedia Applications." Journal of Computer Allied Intelligence 2, no. 3 (2024): 1–14. http://dx.doi.org/10.69996/jcai.2024011.

Abstract:
This paper investigates the effectiveness of the Weighted Kalman Integrated Band Rejection (WKBR) method for enhancing speech signals in multimedia applications. Speech enhancement is crucial for improving the quality and intelligibility of audio in environments with varying noise types and levels. The WKBR method is evaluated across ten different noise scenarios, including white noise, babble noise, street noise, airplane cabin noise, and more. Performance metrics such as Peak Signal-to-Noise Ratio (PSNR), Mean Squared Error (MSE), and Short-Time Objective Intelligibility (STOI) are used to quantify the enhancement. The results show significant improvements, with PSNR increasing from an average of 12.8 dB before enhancement to 21.9 dB after enhancement, MSE reducing from an average of 0.0179 to 0.0053, and STOI scores improving from an average of 0.58 to 0.75. These findings highlight the potential of WKBR as a powerful tool for speech signal enhancement, making it a promising solution for real-world multimedia applications where clear and intelligible speech is essential.
APA, Harvard, Vancouver, ISO, and other styles
26

Guan, Jingjing, and Chang Liu. "Speech Perception in Noise With Formant Enhancement for Older Listeners." Journal of Speech, Language, and Hearing Research 62, no. 9 (2019): 3290–301. http://dx.doi.org/10.1044/2019_jslhr-s-18-0089.

Full text
Abstract:
Purpose Degraded speech intelligibility in background noise is a common complaint of listeners with hearing loss. The purpose of the current study is to explore whether 2nd formant (F2) enhancement improves speech perception in noise for older listeners with hearing impairment (HI) and normal hearing (NH). Method Target words (e.g., color and digit) were selected and presented based on the paradigm of the coordinate response measure corpus. Speech recognition thresholds with original and F2-enhanced speech in 2- and 6-talker babble were examined for older listeners with NH and HI. Results The thresholds for both the NH and HI groups improved for enhanced speech signals primarily in 2-talker babble, but not in 6-talker babble. The F2 enhancement benefits did not correlate significantly with listeners' age and their average hearing thresholds in most listening conditions. However, speech intelligibility index values increased significantly with F2 enhancement in babble for listeners with HI, but not for NH listeners. Conclusions Speech sounds with F2 enhancement may improve listeners' speech perception in 2-talker babble, possibly due to a greater amount of speech information available in temporally modulated noise or a better capacity to separate speech signals from background babble.
APA, Harvard, Vancouver, ISO, and other styles
27

Kates, James M. "Speech Enhancement Based on a Sinusoidal Model." Journal of Speech, Language, and Hearing Research 37, no. 2 (1994): 449–64. http://dx.doi.org/10.1044/jshr.3702.449.

Full text
Abstract:
Sinusoidal modeling is a new procedure for representing the speech signal. In this approach, the signal is divided into overlapping segments, the Fourier transform computed for each segment, and a set of desired spectral peaks is identified. The speech is then resynthesized using sinusoids that have the frequency, amplitude, and phase of the selected peaks, with the remaining spectral information being discarded. Using a limited number of sinusoids to reproduce speech in a background of multi-talker speech babble results in a speech signal that has an improved signal-to-noise ratio and enhanced spectral contrast. The more intense spectral components, assumed to be primarily the desired speech, are reproduced, whereas the less intense components, assumed to be primarily background noise, are not. To test the effectiveness of this processing approach as a noise suppression technique, both consonant recognition and perceived speech intelligibility were determined in quiet and in noise for a group of subjects with normal hearing as the number of sinusoids used to represent isolated speech tokens was varied. The results show that reducing the number of sinusoids used to represent the speech causes reduced consonant recognition and perceived intelligibility both in quiet and in noise, and suggests that similar results would be expected for listeners with hearing impairments.
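The peak-picking resynthesis Kates describes can be sketched for a single frame as follows. This is illustrative Python, not the paper's implementation; a real system would use overlapping windowed segments and peak interpolation.

```python
import numpy as np

def sinusoidal_resynth(frame, fs, k):
    """Keep the k strongest spectral peaks of one frame and resynthesize
    them as sinusoids with the peaks' frequency, amplitude, and phase."""
    n = len(frame)
    spec = np.fft.rfft(frame)
    mags = np.abs(spec)
    peaks = np.argsort(mags[1:])[-k:] + 1      # indices of the k largest non-DC bins
    t = np.arange(n) / fs
    out = np.zeros(n)
    for b in peaks:
        freq = b * fs / n
        amp = 2.0 * mags[b] / n
        out += amp * np.cos(2 * np.pi * freq * t + np.angle(spec[b]))
    return out

fs, n = 8000, 800
tone = np.cos(2 * np.pi * 400 * np.arange(n) / fs)   # pure tone at a bin frequency
recon = sinusoidal_resynth(tone, fs, k=1)
```

A single sinusoid reconstructs the pure tone almost exactly; with noisy speech, limiting k keeps the intense (presumably speech) components and discards the weaker (presumably noise) ones, which is the noise-suppression mechanism the study evaluates.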
APA, Harvard, Vancouver, ISO, and other styles
28

Graetzer, Simone, and Carl Hopkins. "Comparison of ideal mask-based speech enhancement algorithms for speech mixed with white noise at low mixture signal-to-noise ratios." Journal of the Acoustical Society of America 152, no. 6 (2022): 3458–70. http://dx.doi.org/10.1121/10.0016494.

Full text
Abstract:
The literature shows that the intelligibility of noisy speech can be improved by applying an ideal binary or soft gain mask in the time-frequency domain for signal-to-noise ratios (SNRs) between −10 and +10 dB. In this study, two mask-based algorithms are compared when applied to speech mixed with white Gaussian noise (WGN) at lower SNRs, that is, SNRs from −29 to −5 dB. These comprise an Ideal Binary Mask (IBM) with a Local Criterion (LC) set to 0 dB and an Ideal Ratio Mask (IRM). The performance of three intrusive Short-Time Objective Intelligibility (STOI) variants—STOI, STOI+, and Extended Short-Time Objective Intelligibility (ESTOI)—is compared with that of other monaural intelligibility metrics that can be used before and after mask-based processing. The results show that IRMs can be used to obtain near maximal speech intelligibility (>90% for sentence material) even at very low mixture SNRs, while IBMs with LC = 0 provide limited intelligibility gains for SNR < −14 dB. It is also shown that, unlike STOI, STOI+ and ESTOI are suitable metrics for speech mixed with WGN at low SNRs and processed by IBMs with LC = 0 even when speech is high-pass filtered to flatten the spectral tilt before masking.
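The IBM and IRM definitions compared in this study are standard and can be sketched directly; the small epsilon below is an implementation detail for numerical stability, not from the paper.

```python
import numpy as np

EPS = 1e-12  # guard against division by zero / log of zero

def ideal_binary_mask(S, N, lc_db=0.0):
    """IBM: keep a time-frequency cell when the local SNR exceeds the
    local criterion (LC), otherwise zero it out."""
    snr_db = 10.0 * np.log10(np.abs(S) ** 2 / (np.abs(N) ** 2 + EPS) + EPS)
    return (snr_db > lc_db).astype(float)

def ideal_ratio_mask(S, N):
    """IRM: soft gain equal to the square root of speech energy over
    total energy in each time-frequency cell."""
    ps, pn = np.abs(S) ** 2, np.abs(N) ** 2
    return np.sqrt(ps / (ps + pn + EPS))

S = np.array([[2.0, 0.5]])   # speech magnitudes in two T-F cells
N = np.array([[1.0, 1.0]])   # noise magnitudes in the same cells
ibm = ideal_binary_mask(S, N)
irm = ideal_ratio_mask(S, N)
```

In the first cell the local SNR is about +6 dB, so the IBM keeps it (mask = 1) while the IRM applies a soft gain near 0.89; in the second cell (−6 dB) the IBM zeroes the cell while the IRM still passes an attenuated version, which is why the soft mask degrades more gracefully at very low SNRs.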
APA, Harvard, Vancouver, ISO, and other styles
29

Ullah, Rizwan, Lunchakorn Wuttisittikulkij, Sushank Chaudhary, et al. "End-to-End Deep Convolutional Recurrent Models for Noise Robust Waveform Speech Enhancement." Sensors 22, no. 20 (2022): 7782. http://dx.doi.org/10.3390/s22207782.

Full text
Abstract:
Because of their simple design structure, end-to-end deep learning (E2E-DL) models have gained a lot of attention for speech enhancement. A number of DL models have achieved excellent results in eliminating the background noise and enhancing the quality as well as the intelligibility of noisy speech. Designing resource-efficient and compact models for real-time processing is still a key challenge. In order to enhance the accomplishment of E2E models, the sequential and local characteristics of the speech signal should be efficiently taken into consideration while modeling. In this paper, we present resource-efficient and compact neural models for end-to-end noise-robust waveform-based speech enhancement. Combining the Convolutional Encoder-Decoder (CED) and Recurrent Neural Networks (RNNs) in the Convolutional Recurrent Network (CRN) framework, we develop different speech enhancement systems. Different noise types and speakers are used to train and test the proposed models. With LibriSpeech and the DEMAND dataset, the experiments show that the proposed models lead to improved quality and intelligibility with fewer trainable parameters, notably reduced model complexity, and shorter inference time than existing recurrent and convolutional models. The quality and intelligibility are improved by 31.61% and 17.18% over the noisy speech. We further performed a cross-corpus analysis to demonstrate the generalization of the proposed E2E SE models across different speech datasets.
APA, Harvard, Vancouver, ISO, and other styles
30

Gopi Tilak, V., and S. Koteswara Rao. "Dual and joint estimation for speech enhancement." International Journal of Engineering & Technology 7, no. 2.7 (2018): 5. http://dx.doi.org/10.14419/ijet.v7i2.7.10243.

Full text
Abstract:
Maintaining good quality and intelligibility of speech is the primary constraint in mobile communications. The present work addresses the enhancement of speech in additive white and colored noise environments using the Kalman filter. Dual and joint estimation techniques were applied, and the quality of speech was analyzed through the signal-to-noise ratio. The techniques were applied in both ideal and practical cases for two different speech samples.
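Kalman-filter speech enhancement of the kind described here can be illustrated with a deliberately simplified scalar filter. The paper's dual and joint schemes additionally estimate the speech AR parameters; in this sketch the signal model is fixed to a random walk, and `q` and `r` are assumed values.

```python
import numpy as np

def kalman_denoise(y, q=1e-4, r=0.01):
    """Scalar Kalman filter with a random-walk signal model: x[n] = x[n-1] + w,
    observed as y[n] = x[n] + v, with process variance q and noise variance r."""
    x, p = 0.0, 1.0                  # initial state estimate and its variance
    out = np.empty_like(y)
    for i, obs in enumerate(y):
        p += q                       # predict: variance grows by the process noise
        k = p / (p + r)              # Kalman gain
        x += k * (obs - x)           # update with the innovation
        p *= (1 - k)                 # shrink the posterior variance
        out[i] = x
    return out

tracked = kalman_denoise(np.ones(200))   # noiseless constant input
```

On a constant input the estimate converges to the true value; with noisy speech, the same predict/update loop (with speech AR dynamics in place of the random walk) is what trades off tracking the signal against smoothing out the noise.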
APA, Harvard, Vancouver, ISO, and other styles
31

Y, Sravanthi. "LSTM - Aided Speech Enhancement with Wiener Filter Adaptation." INTERANTIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT 08, no. 04 (2024): 1–5. http://dx.doi.org/10.55041/ijsrem30882.

Full text
Abstract:
Speech enhancement plays a pivotal role in various applications, such as improving the intelligibility of spoken communication in noisy environments. With the assistance of deep learning, a novel speech signal enhancement model is introduced in this research. The proposed LSTM model estimates the tuning factor of the Wiener filter with the aid of extracted features to obtain the de-noised speech signal. This model is structured into two phases: training and testing. During the training phase, Non-negative Matrix Factorization (NMF) is employed to estimate both the noise and signal spectrum from the noisy input signal. Subsequently, Empirical Mode Decomposition (EMD) features are extracted and, via the Wiener filter, a de-noised speech signal is obtained. Additionally, bark frequency information is evaluated. In the testing phase, the LSTM model, trained on the extracted EMD features, drives a modified Wiener filter. The combination of LSTM-based temporal modeling with trained features and the adaptive Wiener filter results in significantly improved speech quality and intelligibility. Keywords— Speech Enhancement, Non-negative Matrix Factorization, Empirical Mode Decomposition, Wiener Filter.
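The Wiener gain that the paper's LSTM tunes can be sketched in its textbook per-bin form; `alpha` below stands in for the abstract's "tuning factor" and is a hypothetical parameter name, not the paper's.

```python
import numpy as np

def wiener_gain(noisy_psd, noise_psd, alpha=1.0):
    """Classic per-bin Wiener gain G = Ps / (Ps + alpha * Pn), where the
    speech PSD Ps is estimated by subtracting the noise PSD from the noisy
    PSD (floored at zero). `alpha` is a hypothetical tuning factor."""
    speech_psd = np.maximum(noisy_psd - noise_psd, 0.0)
    return speech_psd / (speech_psd + alpha * noise_psd + 1e-12)

g = wiener_gain(np.array([4.0]), np.array([1.0]))  # one frequency bin
```

With a noisy power of 4 and noise power of 1, the estimated speech power is 3 and the gain is 0.75; raising `alpha` suppresses noisy bins more aggressively, which is the knob a learned model can adapt per frame.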
APA, Harvard, Vancouver, ISO, and other styles
32

Saunders, Gabrielle H., and James M. Kates. "Speech intelligibility enhancement using hearing-aid array processing." Journal of the Acoustical Society of America 102, no. 3 (1997): 1827–37. http://dx.doi.org/10.1121/1.420107.

Full text
APA, Harvard, Vancouver, ISO, and other styles
33

Hong, Sae Mi, and Hyun Sub Sim. "Planning of Speech Intelligibility Enhancement Program for Dysarthria." AAC Research & Practice 3, no. 2 (2015): 177. http://dx.doi.org/10.14818/aac.2015.12.3.2.177.

Full text
APA, Harvard, Vancouver, ISO, and other styles
34

Abajaddi, Nesrine, Youssef Elfahm, Badia Mounir, and Abdelmajid Farchi. "A robust speech enhancement method in noisy environments." International journal of electrical and computer engineering systems 14, no. 9 (2023): 973–83. http://dx.doi.org/10.32985/ijeces.14.9.2.

Full text
Abstract:
Speech enhancement aims to eliminate or reduce undesirable noises and distortions; this processing should keep features of the speech to enhance the quality and intelligibility of degraded speech signals. In this study, we investigated a combined approach using single-frequency filtering (SFF) and a modified spectral subtraction method to enhance single-channel speech. The SFF method involves dividing the speech signal into uniform subband envelopes, and then performing spectral over-subtraction on each envelope. A smoothing parameter, determined by the a-posteriori signal-to-noise ratio (SNR), is used to estimate and update the noise without the need for explicitly detecting silence. To evaluate the performance of our algorithm, we employed objective measures such as segmental SNR (segSNR), extended short-term objective intelligibility (ESTOI), and perceptual evaluation of speech quality (PESQ). We tested our algorithm with various types of noise at different SNR levels and achieved results ranging from 4.24 to 15.41 for segSNR, 0.57 to 0.97 for ESTOI, and 2.18 to 4.45 for PESQ. Compared to other standard and existing speech enhancement methods, our algorithm produces better results and performs well in reducing undesirable noises.
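Spectral over-subtraction with an SNR-dependent noise update, as described in this abstract, can be sketched as follows. This is illustrative Python; `alpha`, `beta`, and the SNR threshold are assumed textbook values, not the paper's.

```python
import numpy as np

def over_subtract(noisy_mag, noise_mag, alpha=2.0, beta=0.01):
    """Spectral over-subtraction: remove alpha times the noise power
    estimate, flooring the result at beta * noise power to limit the
    musical-noise artifacts of plain subtraction."""
    sub = noisy_mag ** 2 - alpha * noise_mag ** 2
    floor = beta * noise_mag ** 2
    return np.sqrt(np.maximum(sub, floor))

def update_noise(noise_psd, noisy_psd, post_snr_db, snr_th_db=3.0):
    """Recursive noise update without explicit silence detection: the
    smoothing factor depends on the a-posteriori SNR, so the estimate
    tracks the noise mostly when speech is unlikely to be present."""
    a = 0.98 if post_snr_db > snr_th_db else 0.9
    return a * noise_psd + (1 - a) * noisy_psd

clean_est = over_subtract(np.array([3.0]), np.array([1.0]))   # 9 - 2 = 7 -> sqrt(7)
floored   = over_subtract(np.array([1.0]), np.array([1.0]))   # hits the beta floor
```

When the subtraction would go negative, the `beta` floor keeps a small residual rather than a hard zero, trading some noise suppression for fewer audible artifacts.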
APA, Harvard, Vancouver, ISO, and other styles
35

Blake, Helen L. "Intelligibility Enhancement via Telepractice During COVID-19 Restrictions." Perspectives of the ASHA Special Interest Groups 5, no. 6 (2020): 1797–800. http://dx.doi.org/10.1044/2020_persp-20-00133.

Full text
Abstract:
Purpose Speech-language pathologists (SLPs) may be approached by multilingual speakers seeking to improve their intelligibility in English. Intelligibility is an essential element of spoken language proficiency and is especially important for multilingual university students given their need to express complex ideas in an additional language. Intelligibility Enhancement is an assessment and intervention approach that aims to improve the intelligibility of consonants, vowels, and prosody with multilingual speakers who are learning to speak English. This article describes the student-led delivery of Intelligibility Enhancement with multilingual clients in a university clinic using a telepractice model after restrictions were imposed due to the COVID-19 pandemic. Conclusions Telepractice offered an appropriate mode of service delivery that facilitated a high-quality, best practice, and continuous service to Intelligibility Enhancement clients while also permitting student SLPs' ongoing clinical education employing an increasingly utilized technology. Previous research demonstrated the effectiveness of the Intelligibility Enhancement Assessment and Intervention Protocols in increasing English intelligibility in multilingual university students. The modifications necessary to provide this intervention via telepractice in a student-led clinic not only offer a possible solution to supporting multilingual university students' English intelligibility during the COVID-19 pandemic but also will inform the understanding of SLPs providing similar interventions in the future.
APA, Harvard, Vancouver, ISO, and other styles
36

Fallah, Ali, and Steven van de Par. "A Speech Preprocessing Method Based on Perceptually Optimized Envelope Processing to Increase Intelligibility in Reverberant Environments." Applied Sciences 11, no. 22 (2021): 10788. http://dx.doi.org/10.3390/app112210788.

Full text
Abstract:
Speech intelligibility in public places can be degraded by environmental noise and reverberation. In this study, a new near-end listening enhancement (NELE) approach is proposed in which a time-varying filter jointly enhances the onsets and reduces the overlap masking. For optimization, some look-ahead in the clean speech and prior knowledge of the room impulse response (RIR) are required. In this method, by optimizing a defined cost function, the spectro-temporal envelope of the reverberant speech is optimized to be as close as possible to that of the clean speech. In this cost function, onsets of speech are optimized with increased weight. This approach differs from the overlap-masking reduction (OMR) and onset enhancement (OE) approaches (Grosse, van de Par, 2017, J. Audio Eng. Soc., Vol. 65 (1/2), pp. 31–41), which only consider previous frames in each time slot for determining the time-variant filtering. The SRT measurements show that the new optimization framework enhances speech intelligibility by up to 2 dB more than OE.
APA, Harvard, Vancouver, ISO, and other styles
37

Park, Gyuseok, Woohyeong Cho, Kyu-Sung Kim, and Sangmin Lee. "Speech Enhancement for Hearing Aids with Deep Learning on Environmental Noises." Applied Sciences 10, no. 17 (2020): 6077. http://dx.doi.org/10.3390/app10176077.

Full text
Abstract:
Hearing aids are small electronic devices designed to improve hearing for persons with impaired hearing, using sophisticated audio signal processing algorithms and technologies. In general, the speech enhancement algorithms in hearing aids remove the environmental noise and enhance speech while still giving consideration to hearing characteristics and the environmental surroundings. In this study, a speech enhancement algorithm was proposed to improve speech quality in a hearing aid environment by applying noise reduction algorithms with deep neural network learning based on noise classification. In order to evaluate the speech enhancement in an actual hearing aid environment, ten types of noise were self-recorded and classified using convolutional neural networks. In addition, noise reduction for speech enhancement in the hearing aid were applied by deep neural networks based on the noise classification. As a result, the speech quality based on the speech enhancements removed using the deep neural networks—and associated environmental noise classification—exhibited a significant improvement over that of the conventional hearing aid algorithm. The improved speech quality was also evaluated by objective measure through the perceptual evaluation of speech quality score, the short-time objective intelligibility score, the overall quality composite measure, and the log likelihood ratio score.
APA, Harvard, Vancouver, ISO, and other styles
38

Van Engen, Kristin J., Jasmine E. B. Phelps, Rajka Smiljanic, and Bharath Chandrasekaran. "Enhancing Speech Intelligibility: Interactions Among Context, Modality, Speech Style, and Masker." Journal of Speech, Language, and Hearing Research 57, no. 5 (2014): 1908–18. http://dx.doi.org/10.1044/jslhr-h-13-0076.

Full text
Abstract:
Purpose The authors sought to investigate interactions among intelligibility-enhancing speech cues (i.e., semantic context, clearly produced speech, and visual information) across a range of masking conditions. Method Sentence recognition in noise was assessed for 29 normal-hearing listeners. Testing included semantically normal and anomalous sentences, conversational and clear speaking styles, auditory-only (AO) and audiovisual (AV) presentation modalities, and 4 different maskers (2-talker babble, 4-talker babble, 8-talker babble, and speech-shaped noise). Results Semantic context, clear speech, and visual input all improved intelligibility but also interacted with one another and with masking condition. Semantic context was beneficial across all maskers in AV conditions but only in speech-shaped noise in AO conditions. Clear speech provided the most benefit for AV speech with semantically anomalous targets. Finally, listeners were better able to take advantage of visual information for meaningful versus anomalous sentences and for clear versus conversational speech. Conclusion Because intelligibility-enhancing cues influence each other and depend on masking condition, multiple maskers and enhancement cues should be used to accurately assess individuals' speech-in-noise perception.
APA, Harvard, Vancouver, ISO, and other styles
39

Cox, Trevor, Michael Akeroyd, Jon Barker, et al. "Predicting Speech Intelligibility for People with a Hearing Loss: The Clarity Challenges." INTER-NOISE and NOISE-CON Congress and Conference Proceedings 265, no. 3 (2023): 4599–606. http://dx.doi.org/10.3397/in_2022_0662.

Full text
Abstract:
Objective speech intelligibility metrics are used to reduce the need for time-consuming listening tests. They are used in the design of audio systems, room acoustics, and signal processing algorithms. Most published speech intelligibility metrics have been developed using young adults with so-called 'normal hearing', and therefore do not work well for those with different hearing characteristics. One of the most common causes of aural diversity is sensorineural hearing loss. While partially restoring perception through hearing aids is possible, results are mixed. This has led to the Clarity Project, which is running an open series of Enhancement Challenges to improve the processing of speech-in-noise for hearing aids. To enable this, objective metrics of speech intelligibility are needed, which work from signals produced by hearing aids for diverse listeners. For this reason, Clarity is also running Prediction Challenges to improve speech intelligibility models. Competitors are given a set of audio signals produced by hearing aid algorithms and challenged to predict how many words a listener with a particular hearing characteristic will achieve. Drawing on the learning from the challenges, we outline what has been learnt about improving intelligibility metrics for those with a hearing impairment.
APA, Harvard, Vancouver, ISO, and other styles
40

Jiang, Yi, Hong Zhou, Yuan Yuan Zu, and Xiao Chen. "Energy Based Dual-Microphone Electronic Speech Segregation." Applied Mechanics and Materials 385-386 (August 2013): 1381–84. http://dx.doi.org/10.4028/www.scientific.net/amm.385-386.1381.

Full text
Abstract:
Speech segregation based on energy performs well in dual-microphone electronic speech signal processing. Applying a binary mask to an auditory mixture has been shown to yield substantial improvements in signal-to-noise ratio (SNR) and intelligibility. To evaluate the performance of a binary-mask-based dual-microphone speech enhancement algorithm, various spatial noise sources and reverberation test conditions are used, alongside two comparison dual-microphone systems based on energy difference and machine learning. Results for SNR and speech intelligibility show that the proposed system achieves more robust performance than the two comparison systems.
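An energy-difference binary mask for a dual-microphone setup, of the kind this abstract compares against, can be sketched as follows (illustrative; the 3 dB threshold is an assumed value, not from the paper).

```python
import numpy as np

def energy_difference_mask(front, rear, thresh_db=3.0):
    """Binary mask from the inter-channel energy ratio: keep time-frequency
    units where the front (target-facing) channel dominates the rear one
    by more than `thresh_db`."""
    eps = 1e-12
    ratio_db = 10.0 * np.log10(np.abs(front) ** 2 / (np.abs(rear) ** 2 + eps) + eps)
    return (ratio_db > thresh_db).astype(float)

front = np.array([[10.0, 1.0]])   # target dominates the first T-F unit
rear  = np.array([[1.0, 1.0]])
mask = energy_difference_mask(front, rear)
```

The first unit shows a 20 dB front-to-rear ratio and is kept; the second shows 0 dB (diffuse or rear-arriving energy) and is discarded, which is how spatial energy cues substitute for a clean-speech oracle.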
APA, Harvard, Vancouver, ISO, and other styles
41

Snyder, Gregory J., Molly Grace Williams, Molly E. Gough, and Paul G. Blanchet. "Fluency-Enhancing Strategies for Hypokinetic Dysarthria Exacerbated by Subthalamic Nucleus Brain Stimulation: A Case Study." Perspectives of the ASHA Special Interest Groups 3, no. 4 (2018): 4–16. http://dx.doi.org/10.1044/persp3.sig4.4.

Full text
Abstract:
Introduction Speech disorders associated with Parkinson's disease (PD) and the pharmaceutical treatments of PD are well documented. A relatively recent treatment alternative for PD is deep brain stimulation (DBS) of the subthalamic nucleus (STN), which is used to manage the symptoms of PD as the disease progresses. This case study documented the speech characteristics of a unique client with PD STN-DBS and reported initial findings on a variety of fluency- and intelligibility-enhancing strategies. Method A speech-language pathologist referred a 63-year-old man, previously diagnosed by a speech-language pathologist with neurogenic stuttering as a result of an STN-DBS battery change, for a speech evaluation, reporting lack of success with traditional stuttering treatment strategies. The client's speech was assessed, and a variety of fluency- and intelligibility-enhancing techniques were tested during trial therapy. Results The client's speech exhibited the hallmark characteristics of hypokinetic dysarthria, including speech disfluencies. A variety of pacing and prosthetic strategies were tested, revealing that auditory and tactile prosthetic speech feedback provided optimal improvements in fluency and intelligibility. Discussion These results suggest that the prosthetic speech feedback provided optimal intelligibility and fluency enhancement and could potentially improve articulation and speech volume, which are also common in cases of hypokinetic dysarthria.
APA, Harvard, Vancouver, ISO, and other styles
42

Li, Qiuying, Tao Zhang, Yanzhang Geng, and Zhen Gao. "Microphone array speech enhancement based on optimized IMCRA." Noise Control Engineering Journal 69, no. 6 (2021): 468–76. http://dx.doi.org/10.3397/1/376944.

Full text
Abstract:
Microphone array speech enhancement algorithms use temporal and spatial information to improve the performance of speech noise reduction significantly. By combining a noise estimation algorithm with microphone array speech enhancement, the accuracy of noise estimation is improved and the computation is reduced. In traditional noise estimation algorithms, the noise power spectrum is not updated in the presence of speech, which leads to delay and deviation in the noise spectrum estimate. An optimized improved minimum controlled recursive averaging speech enhancement algorithm based on a microphone array is proposed in this paper. It consists of three parts. The first part is the preprocessing, divided into two branches: the upper branch enhances the speech signal, and the lower branch gets the noise. The second part is the optimized improved minimum controlled recursive averaging. The noise power spectrum is updated not only in the non-speech segments but also in the speech segments. Finally, according to the estimated noise power spectrum, the minimum mean-square error log-spectral amplitude algorithm is used to enhance speech. Testing data are from the TIMIT and Noisex-92 databases. Short-time objective intelligibility and segmental signal-to-noise ratio are chosen as evaluation metrics. Experimental results show that the proposed speech enhancement algorithm can improve the segmental signal-to-noise ratio and short-time objective intelligibility for various noise types at different signal-to-noise ratio levels.
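The core IMCRA idea of continuing the noise-spectrum update during speech can be sketched by weighting the smoothing factor with a speech-presence probability. This is a simplified illustration of the mechanism, not the paper's optimized algorithm; `alpha` is an assumed value.

```python
def imcra_style_update(noise_psd, noisy_psd, speech_prob, alpha=0.95):
    """Noise PSD update that never fully stops: the effective smoothing
    factor rises toward 1 as the speech-presence probability rises, so
    the estimate freezes during confident speech and tracks during pauses."""
    alpha_t = alpha + (1.0 - alpha) * speech_prob
    return alpha_t * noise_psd + (1.0 - alpha_t) * noisy_psd

frozen  = imcra_style_update(2.0, 4.0, speech_prob=1.0)  # speech certain: no update
tracked = imcra_style_update(2.0, 4.0, speech_prob=0.0)  # pause: normal tracking
```

Because `alpha_t` varies continuously with the speech-presence probability, the estimate avoids both the delay of hard voice-activity gating and the bias of updating blindly through speech, which is the deviation problem the abstract describes.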
APA, Harvard, Vancouver, ISO, and other styles
43

Md. Easir Arafat, Indraneel Misra, and Md. Ekramul Hamid. "A comparative study for throat microphone speech enhancement with different approaches." International Journal of Science and Research Archive 13, no. 1 (2024): 850–59. http://dx.doi.org/10.30574/ijsra.2024.13.1.1631.

Full text
Abstract:
Throat microphones (TM) offer significant advantages in noisy environments by capturing speech signals directly from the throat, thus minimizing external noise. However, TM signals often lack clarity and intelligibility compared to conventional microphones. This paper presents a comparative study of three prominent feature extraction techniques—Mel-Frequency Cepstral Coefficients (MFCC), Linear Predictive Cepstral Coefficients (LPCC), and Perceptual Linear Prediction (PLP)—for enhancing speech captured by throat microphones. Each technique is evaluated based on its ability to enhance speech clarity and reduce noise interference. Experimental results on the ATR503 dataset, consisting of throat and close-talk microphone recordings, reveal that LPCC achieved an average Signal-to-Noise Ratio (SNR) improvement of 3 dB and Perceptual Evaluation of Speech Quality (PESQ) score increases of 1.3133 and 0.9553 over MFCC and PLP, respectively. In subjective evaluations, the highest mean rating of 8.46 for LPCC indicates it was perceived as the most intelligible and clear. LPC spectral analysis demonstrates the effectiveness of LPCC in retrieving missing frequencies in speech captured by throat microphones. These findings suggest that LPCC is a robust method for throat microphone speech enhancement, offering significant improvements in speech intelligibility and quality in noisy environments.
APA, Harvard, Vancouver, ISO, and other styles
44

Brochier, Tim J., Amanda Fullerton, Adam Hersbach, Harish Krishnamoorthi, and Zachary Smith. "Deep neural network-based speech enhancement for cochlear implants." Journal of the Acoustical Society of America 154, no. 4_supplement (2023): A28. http://dx.doi.org/10.1121/10.0022678.

Full text
Abstract:
Noisy conditions make understanding speech with a cochlear implant (CI) difficult. Speech enhancement (SE) algorithms based on signal statistics can be beneficial in stationary noise, but rarely provide benefit in modulated multi-talker babble. Current approaches using deep neural networks (DNNs) rely on a data driven approach for training and promise improvements in a wide variety of noisy conditions. In this study a DNN-based SE algorithm was evaluated in CI listeners. The network was trained on a large database of publicly available recordings. A double-blinded acute evaluation was conducted with 10 adult CI users by assessing intelligibility and quality of speech embedded in a range of different noise types. The DNN-based SE algorithm provided significant benefits in speech intelligibility and sound quality in all noise types that were evaluated. Speech reception thresholds, the SNR required to understand 50% of the speech material, improved by 1.8 to 3.5 dB depending on noise type. Benefits varied with the SNR of the input signal and the mixing ratio parameter that was used to combine the original and de-noised signals. The results demonstrate that DNN-based SE can provide benefits in natural, modulated noise conditions, which is critical to CI users in their day-to-day environment.
APA, Harvard, Vancouver, ISO, and other styles
45

Thoidis, Iordanis, Lazaros Vrysis, Dimitrios Markou, and George Papanikolaou. "Temporal Auditory Coding Features for Causal Speech Enhancement." Electronics 9, no. 10 (2020): 1698. http://dx.doi.org/10.3390/electronics9101698.

Full text
Abstract:
Perceptually motivated audio signal processing and feature extraction have played a key role in the determination of high-level semantic processes and the development of emerging systems and applications, such as mobile phone telecommunication and hearing aids. In the era of deep learning, speech enhancement methods based on neural networks have seen great success, mainly operating on the log-power spectra. Although these approaches surpass the need for exhaustive feature extraction and selection, it is still unclear whether they target the important sound characteristics related to speech perception. In this study, we propose a novel set of auditory-motivated features for single-channel speech enhancement by fusing temporal envelope and temporal fine structure information in the context of vocoder-like processing. A causal gated recurrent unit (GRU) neural network is employed to recover the low-frequency amplitude modulations of speech. Experimental results indicate that the proposed system achieves considerable gains for normal-hearing and hearing-impaired listeners, in terms of objective intelligibility and quality metrics. The proposed auditory-motivated feature set achieved better objective intelligibility results compared to the conventional log-magnitude spectrogram features, while mixed results were observed for simulated listeners with hearing loss. Finally, we demonstrate that the proposed analysis/synthesis framework provides satisfactory reconstruction accuracy of speech signals.
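The temporal envelope / temporal fine structure decomposition underlying these features can be sketched via the analytic signal. This is illustrative Python using an FFT-based Hilbert construction applied to one band, not the paper's vocoder pipeline (which would first split the signal into auditory filter bands).

```python
import numpy as np

def analytic_signal(x):
    """FFT-based analytic signal (equivalent to a Hilbert-transform pair):
    zero the negative frequencies, double the positive ones."""
    n = len(x)
    spec = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    h[1:(n + 1) // 2] = 2.0
    if n % 2 == 0:
        h[n // 2] = 1.0
    return np.fft.ifft(spec * h)

def envelope_and_tfs(x):
    """Split a band-limited signal into temporal envelope (magnitude of the
    analytic signal) and temporal fine structure (cosine of its phase)."""
    a = analytic_signal(x)
    return np.abs(a), np.cos(np.angle(a))

t = np.arange(2048) / 16000.0
band = (1.0 + 0.5 * np.cos(2 * np.pi * 4 * t)) * np.cos(2 * np.pi * 1000 * t)
env, tfs = envelope_and_tfs(band)
```

By construction, multiplying the envelope back onto the fine structure reconstructs the band signal, which is the analysis/synthesis property the study's vocoder-like framework relies on.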
APA, Harvard, Vancouver, ISO, and other styles
46

Takale, Dattatray G., Shreyas Thombal, Najim Tadvi, Sunil Sonu, Samadhan Suryawanashi, and Ashwajit Surwade. "Speech Enhancement Using Machine Learning." Journal of Electrical Engineering and Electronics Design 2, no. 1 (2024): 11–15. http://dx.doi.org/10.48001/joeeed.2024.2111-15.

Full text
Abstract:
The incorporation of machine learning, specifically deep learning, into speech enhancement algorithms represents an advanced methodology aimed at restoring original speech signals from distorted counterparts. This innovative approach incorporates the use of Charlier polynomials-based discrete transform, particularly the discrete Charlier transform (DCHT), to extract spectra from noisy signals employing a fully connected neural network. Leveraging the capabilities of deep learning, particularly in handling nonlinear mapping challenges, the system acquires contextual information from speech signals, resulting in enhanced speech characterized by improved quality and intelligibility. The proposed algorithm undergoes rigorous empirical testing through self-comparison, fine-tuning the DCHT parameter to optimize the performance of speech enhancement models. The experimentation entails the variation of DCHT parameter values, with evaluation conducted using the TIMIT database. Diverse speech measures are employed for comprehensive assessment, revealing the effectiveness of the DCHT-based trained model in enhancing speech signals within specific conditions.
APA, Harvard, Vancouver, ISO, and other styles
47

Chilakawad, Aparna, and Pandurangarao N. Kulkarni. "Spectral splitting of speech signal using time varying recursive filters for binaural hearing aids." IAES International Journal of Artificial Intelligence (IJ-AI) 13, no. 4 (2024): 4998. http://dx.doi.org/10.11591/ijai.v13.i4.pp4998-5004.

Full text
Abstract:
Speech perception in noisy environments is reduced in humans with sensorineural hearing loss (SNHL) due to masking. Moderate SNHL cannot be cured medically, hence masking effects should be reduced to enhance speech perception. To reduce masking, processing delay, and hardware complexity, the present paper proposes a scheme to partition the voice signal into two complementary signals by using the filter-bank summation method (FBSM) with a set of time-varying recursive band-pass filters. Performance of the filter is evaluated with the following measures: perceptual evaluation of speech quality (PESQ) and mean opinion score (MOS) for speech quality, and the modified rhyme test (MRT) for speech intelligibility. The test signals used for the evaluation of quality are a syllable and a word, and for the evaluation of intelligibility 300 monosyllabic words are used. The results demonstrated an increase in the quality and intelligibility of processed speech in a noisy environment. As a result, there is an enhancement in the perception of processed speech in a noisy environment.
APA, Harvard, Vancouver, ISO, and other styles
48

Chilakawad, Aparna, and Pandurangarao N. Kulkarni. "Spectral splitting of speech signal using time varying recursive filters for binaural hearing aids." IAES International Journal of Artificial Intelligence (IJ-AI) 13, no. 4 (2024): 4998–5004. https://doi.org/10.11591/ijai.v13.i4.pp4998-5004.

Full text
Abstract:
Speech perception in noisy environments is reduced in humans with sensorineural hearing loss (SNHL) due to masking. Moderate SNHL cannot be cured medically; hence, masking effects should be reduced to enhance speech perception. To reduce masking, processing delay, and hardware complexity, the present paper proposes a scheme to partition the voice signal into two mutually complementary signals using the filter-bank summation method (FBSM) with a set of time-varying recursive band-pass filters. Performance of the filter is evaluated with the following measures: perceptual evaluation of speech quality (PESQ) and mean opinion score (MOS) for speech quality, and the modified rhyme test (MRT) for speech intelligibility. The test signals used for the quality evaluation are a syllable and a word; for the intelligibility evaluation, 300 monosyllabic words are used. The results demonstrate an increase in the quality and intelligibility of processed speech in a noisy environment and, consequently, an enhancement in its perception.
APA, Harvard, Vancouver, ISO, and other styles
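A simplified sketch of the dichotic band-splitting idea behind this entry: the signal is passed through a bank of recursive band-pass filters and alternate bands are routed to the two ears, so the two outputs carry complementary spectral content. Fixed second-order resonators stand in for the paper's time-varying recursive filters, and the center frequencies and bandwidth below are illustrative, not the authors' design.

```python
import numpy as np

def resonator(x, f0, bw, fs):
    """Second-order recursive (two-pole) band-pass filter centered at f0 Hz."""
    r = np.exp(-np.pi * bw / fs)           # pole radius sets the bandwidth
    theta = 2.0 * np.pi * f0 / fs          # pole angle sets the center frequency
    b0 = 1.0 - r                           # rough level normalization near resonance
    c = 2.0 * r * np.cos(theta)
    y = np.zeros(len(x))
    y1 = y2 = 0.0
    for n, xn in enumerate(x):
        yn = b0 * xn + c * y1 - r * r * y2
        y[n] = yn
        y1, y2 = yn, y1
    return y

def dichotic_split(x, fs, centers, bw):
    """Route alternate bands to the two ears (complementary dichotic presentation)."""
    left = np.zeros(len(x))
    right = np.zeros(len(x))
    for i, f0 in enumerate(centers):
        band = resonator(x, f0, bw, fs)
        if i % 2 == 0:
            left += band
        else:
            right += band
    return left, right

fs = 16000
t = np.arange(int(0.2 * fs)) / fs
tone = np.sin(2 * np.pi * 1000 * t)        # a 1 kHz probe tone
left, right = dichotic_split(tone, fs, centers=[500, 1000, 2000, 4000], bw=200)
```

Since 1000 Hz falls in an odd-indexed band, most of the probe tone's energy ends up in the right-ear output; in a binaural hearing aid the two outputs would drive the two earpieces, reducing within-ear spectral masking.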
49

Munson, Benjamin. "Audiovisual enhancement and single-word intelligibility in children's speech." Journal of the Acoustical Society of America 148, no. 4 (2020): 2765. http://dx.doi.org/10.1121/1.5147696.

Full text
APA, Harvard, Vancouver, ISO, and other styles
50

Harris, John G., and Mark D. Skowronski. "Energy redistribution speech intelligibility enhancement, vocalic and transitional cues." Journal of the Acoustical Society of America 112, no. 5 (2002): 2305. http://dx.doi.org/10.1121/1.4808562.

Full text
APA, Harvard, Vancouver, ISO, and other styles