
Journal articles on the topic 'Speech processing systems. Signal processing'


Consult the top 50 journal articles for your research on the topic 'Speech processing systems. Signal processing.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles on a wide variety of disciplines and organise your bibliography correctly.

1

Dasarathy, Belur V. "Robust speech processing." Information Fusion 5, no. 2 (June 2004): 75. http://dx.doi.org/10.1016/j.inffus.2004.02.002.

2

Delic, Vlado, Darko Pekar, Radovan Obradovic, and Milan Secujski. "Speech signal processing in ASR&TTS algorithms." Facta universitatis - series: Electronics and Energetics 16, no. 3 (2003): 355–64. http://dx.doi.org/10.2298/fuee0303355d.

Abstract:
Speech signal processing and modeling in systems for continuous speech recognition and text-to-speech synthesis in the Serbian language are described in this paper. Both systems were fully developed by the authors and do not use any third-party software. The accuracy of the speech recognizer and the intelligibility of the TTS system are in the range of the best solutions in the world, and all conditions are met for commercial use of these solutions.
3

de Abreu, Caio Cesar Enside, Marco Aparecido Queiroz Duarte, Bruno Rodrigues de Oliveira, Jozue Vieira Filho, and Francisco Villarreal. "Regression-Based Noise Modeling for Speech Signal Processing." Fluctuation and Noise Letters 20, no. 03 (January 30, 2021): 2150022. http://dx.doi.org/10.1142/s021947752150022x.

Abstract:
Speech processing systems are very important in different applications involving speech and voice quality, such as automatic speech recognition, forensic phonetics and speech enhancement, among others. In most of them, acoustic environmental noise is added to the original signal, decreasing the signal-to-noise ratio (SNR) and, as a consequence, the speech quality. Therefore, estimating noise is one of the most important steps in speech processing, whether to reduce it before processing or to design robust algorithms. In this paper, a new approach to estimating noise from speech signals is presented and its effectiveness is tested in the speech enhancement context. For this purpose, partial least squares (PLS) regression is used to model the acoustic environment (AE), and a Wiener filter based on a priori SNR estimation is implemented to evaluate the proposed approach. Six noise types are used to create seven acoustically modeled noises. The basic idea is to consider the AE model to identify the noise type and estimate its power to be used in a speech processing system. Speech signals processed using the proposed method and classical noise estimators are evaluated through objective measures. Results show that the proposed method produces better speech quality than state-of-the-art noise estimators, enabling it to be used in real-time applications in the fields of robotics, telecommunications and acoustic analysis.
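The "Wiener filter based on a priori SNR estimation" mentioned in this abstract is usually realized with the decision-directed rule. A minimal per-frame sketch of that standard rule follows; this is not the authors' implementation, and the smoothing factor `alpha`, the SNR floor, and all names are illustrative assumptions:

```python
import numpy as np

def wiener_gain(noisy_power, noise_power, prev_clean_power, alpha=0.98, snr_floor=1e-3):
    """One frame of a decision-directed a priori SNR Wiener filter.

    noisy_power, noise_power, prev_clean_power: per-bin power spectra.
    Returns (gain, clean_power_estimate) for this frame.
    """
    post_snr = noisy_power / np.maximum(noise_power, 1e-12)        # a posteriori SNR
    ml_snr = np.maximum(post_snr - 1.0, 0.0)                       # maximum-likelihood estimate
    # Decision-directed a priori SNR: blend previous clean estimate with ML estimate
    prio_snr = alpha * prev_clean_power / np.maximum(noise_power, 1e-12) \
               + (1.0 - alpha) * ml_snr
    prio_snr = np.maximum(prio_snr, snr_floor)
    gain = prio_snr / (1.0 + prio_snr)                             # Wiener gain
    return gain, (gain ** 2) * noisy_power

g, clean_est = wiener_gain(np.array([100.0, 1.0]),   # noisy power: strong / weak bin
                           np.array([1.0, 1.0]),     # noise power
                           np.array([99.0, 0.0]))    # previous clean-power estimate
```

High-SNR bins get a gain close to 1 (signal passes), noise-only bins a gain close to 0.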
4

Hu, J., C. C. Cheng, and W. H. Liu. "Processing of speech signals using a microphone array for intelligent robots." Proceedings of the Institution of Mechanical Engineers, Part I: Journal of Systems and Control Engineering 219, no. 2 (March 1, 2005): 133–43. http://dx.doi.org/10.1243/095965105x9461.

Abstract:
For intelligent robots to interact with people, an efficient human-robot communication interface is very important (e.g. voice command). However, recognizing voice command or speech represents only part of speech communication. The physics of speech signals includes other information, such as speaker direction. Secondly, a basic element of processing the speech signal is recognition at the acoustic level. However, the performance of recognition depends greatly on the reception. In a noisy environment, the success rate can be very poor. As a result, prior to speech recognition, it is important to process the speech signals to extract the needed content while rejecting others (such as background noise). This paper presents a speech purification system for robots to improve the signal-to-noise ratio of reception and an algorithm with a multidirection calibration beamformer.
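The paper's multidirection calibration beamformer is not spelled out in the abstract, but the baseline it improves on, delay-and-sum beamforming, can be sketched as follows; integer sample delays and the wrap-around behavior of `np.roll` are simplifying assumptions:

```python
import numpy as np

def delay_and_sum(mic_signals, delays_samples):
    """Delay-and-sum beamformer: align each channel on the target direction, then average.

    mic_signals: (channels, samples) array; delays_samples: per-channel integer
    steering delays (samples by which each channel lags the reference).
    """
    out = np.zeros(mic_signals.shape[1])
    for ch, d in zip(mic_signals, delays_samples):
        out += np.roll(ch, -d)          # advance by the steering delay to align channels
    return out / len(mic_signals)
```

Signals arriving from the steered direction add coherently, while noise from other directions averages down.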
5

Tasbolatov, M., N. Mekebayev, O. Mamyrbayev, M. Turdalyuly, and D. Oralbekova. "Algorithms and architectures of speech recognition systems." Psychology and Education Journal 58, no. 2 (February 20, 2021): 6497–501. http://dx.doi.org/10.17762/pae.v58i2.3182.

Abstract:
Digital processing of the speech signal and the voice recognition algorithm are very important for fast and accurate automatic speech recognition technology. A voice is a signal carrying an infinite amount of information, and direct analysis and synthesis of a complex speech signal is difficult because the information is contained in the signal itself. Speech is the most natural way for people to communicate. The task of speech recognition is to convert speech into a sequence of words using a computer program. This article presents an algorithm for extracting MFCC features for speech recognition. The MFCC algorithm reduces the required processing power by 53% compared to the conventional algorithm. Automatic speech recognition is implemented in Matlab.
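MFCC extraction as summarized above follows a standard pipeline: pre-emphasis, framing, windowing, power spectrum, mel filterbank, log, and DCT. A compact sketch of that conventional pipeline (not the article's optimized variant; sample rate, frame sizes, and filter counts are illustrative, and the signal is assumed to be at least one frame long):

```python
import numpy as np

def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, n_fft=512, hop=160, n_mels=26, n_ceps=13):
    # Pre-emphasis boosts high frequencies
    sig = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Frame and window (assumes len(sig) >= n_fft)
    n_frames = 1 + (len(sig) - n_fft) // hop
    frames = np.stack([sig[i * hop:i * hop + n_fft] for i in range(n_frames)])
    frames *= np.hamming(n_fft)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular mel filterbank between 0 and sr/2
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        for k in range(l, c): fb[m - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r): fb[m - 1, k] = (r - k) / max(r - c, 1)
    log_mel = np.log(power @ fb.T + 1e-10)
    # DCT-II decorrelates the filterbank energies; keep the first n_ceps coefficients
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2.0 * n_mels)))
    return log_mel @ dct.T
```

Each row of the result is one frame's cepstral feature vector, the usual input to a recognizer.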
6

Zheng, Jian, and Tian De Gao. "A Dual-DSP Sonobuoy Signal Processing System." Applied Mechanics and Materials 571-572 (June 2014): 873–77. http://dx.doi.org/10.4028/www.scientific.net/amm.571-572.873.

Abstract:
A sonobuoy is used as an aviation anti-submarine device to detect submarines, and its wireless communication mechanism can introduce radio interference. The speech signal needs to be identified from the submarine noise so that the sonar signal processing system can process the signal further. This paper presents a TS-201-based dual-DSP sonobuoy signal processing system and proposes an algorithm using cubic spline interpolation and the Pearson correlation coefficient to identify the speech signal in submarine radiated noise. The article describes the system's signal processing algorithm and its hardware and software design, and uses a large amount of experimental data to test the hardware and software separately. Analysis of the test results indicates that the system performs well in identifying the speech signal in submarine radiated noise, with a correct-identification rate of 98%.
7

Hills, A., and K. Scott. "Perceived degradation effects in packet speech systems." IEEE Transactions on Acoustics, Speech, and Signal Processing 35, no. 5 (May 1987): 699–701. http://dx.doi.org/10.1109/tassp.1987.1165187.

8

Chen, Tsuhan. "Video signal processing systems and methods utilizing automated speech analysis." Journal of the Acoustical Society of America 112, no. 2 (2002): 368. http://dx.doi.org/10.1121/1.1507005.

9

Varga, A., and F. Fallside. "A technique for using multipulse linear predictive speech synthesis in text-to-speech type systems." IEEE Transactions on Acoustics, Speech, and Signal Processing 35, no. 4 (April 1987): 586–87. http://dx.doi.org/10.1109/tassp.1987.1165151.

10

LÉVY-VÉHEL, JACQUES. "FRACTAL APPROACHES IN SIGNAL PROCESSING." Fractals 03, no. 04 (December 1995): 755–75. http://dx.doi.org/10.1142/s0218348x95000679.

Abstract:
Some recent advances in the application of fractal tools for studying complex signals are presented. The first part of the paper is devoted to a brief description of the theoretical methods used. These essentially consist of generalizations of previous techniques that allow us to efficiently handle real signals. We present some results dealing with the multifractal analysis of sequences of Choquet capacities, and the possibility of constructing such capacities with prescribed spectrum. Related results concerning the pointwise irregularity of a continuous function at each point are given in the frame of iterated functions systems. Finally, some results on a particular stochastic process are sketched: the multifractional Brownian motion, which is a generalization of the classical fractional Brownian motion, where the parameter H is replaced by a function. The second part consists of the description of selected applications of current interest, in the fields of image analysis, speech synthesis and road traffic modeling. In each case we try to show how a fractal approach provides new means to solve specific problems in signal processing, sometimes with greater success than classical methods.
11

Popov, Dmitry, Artem Gapochkin, and Alexey Nekrasov. "An Algorithm of Daubechies Wavelet Transform in the Final Field When Processing Speech Signals." Electronics 7, no. 7 (July 18, 2018): 120. http://dx.doi.org/10.3390/electronics7070120.

Abstract:
A mathematical model for multiscale analysis based on the Daubechies discrete wavelet transform is developed and implemented in an algebraic system that possesses the properties of a ring and a field and is suitable for speech signal processing. Modular codes are widely used in many areas of modern information technology. The use of these non-positional codes can provide high-speed data processing. Therefore, these algebraic systems should be used in digital signal processing algorithms, which are characterized by processing large amounts of data in real time. In addition, modular codes make it possible to implement large-scale signal processing using the wavelet transform. The paper discusses examples of applying the Daubechies wavelet transform. The integer processing presented in the paper reduces the number of rounding errors when processing speech signals.
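For concreteness, one level of the ordinary real-valued periodized Daubechies db2 (D4) transform that the paper builds on can be written as below; the paper's modular-arithmetic (finite-field) version is not reproduced here, and the periodic boundary handling is an assumption:

```python
import numpy as np

SQ3, SQ2 = np.sqrt(3.0), np.sqrt(2.0)
# Daubechies db2 (D4) orthonormal scaling filter
H = np.array([1 + SQ3, 3 + SQ3, 3 - SQ3, 1 - SQ3]) / (4 * SQ2)
G = np.array([H[3], -H[2], H[1], -H[0]])   # wavelet filter: g[n] = (-1)^n h[3-n]

def dwt_periodic(x):
    """One level of the periodized db2 DWT: returns (approximation, detail)."""
    N = len(x)                                  # N must be even, N >= 4
    idx = (2 * np.arange(N // 2)[:, None] + np.arange(4)[None, :]) % N
    return x[idx] @ H, x[idx] @ G

def idwt_periodic(a, d):
    """Inverse of dwt_periodic (perfect reconstruction)."""
    N = 2 * len(a)
    x = np.zeros(N)
    for k in range(len(a)):
        pos = (2 * k + np.arange(4)) % N        # support of the k-th basis function
        x[pos] += a[k] * H + d[k] * G
    return x
```

Because the filters are orthonormal, the transform preserves energy and reconstructs exactly; the rounding errors the paper targets appear only once coefficients are quantized to integers.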
12

Ma, Lina, and Yanjie Lei. "Optimization of Computer Aided English Pronunciation Teaching System Based on Speech Signal Processing Technology." Computer-Aided Design and Applications 18, S3 (October 20, 2020): 129–40. http://dx.doi.org/10.14733/cadaps.2021.s3.129-140.

Abstract:
As speech signal processing technology has matured, various language learning tools have begun to emerge. Speech signal processing technology has many functions, such as standard read-aloud playback, making audio aids, synthesizing speech, and performing speech evaluation. Therefore, the adoption of speech signal processing technology in English pronunciation teaching can meet different teaching needs. Voice signal processing technology can present teaching information in different forms and promote multi-form communication between teachers and students, and between students and students. This helps stimulate students' interest in learning English and improves the overall teaching level of English pronunciation. This research first investigates the current level of English pronunciation mastery. After combining the relevant principles of speech signal processing technology, it identifies the areas that need to be optimized in the design of the English pronunciation teaching system. Through demand analysis and function analysis of the system, this research uses speech signal processing technology to extract a characteristic feature of the speech signal, the Mel-Frequency Cepstrum Coefficients (MFCC). The system's speech signal preprocessing, speech signal feature extraction and dynamic time warping (DTW) recognition algorithms are optimized. At the same time, this research combines multimedia teaching resources such as text, pronunciation video and excellent courses to study the realization process of each function of the system.
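The dynamic time warping (DTW) recognition step referred to above can be sketched with the classic dynamic-programming recurrence; this is the textbook version, not the paper's optimized algorithm:

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two feature sequences.

    a: (n, dim) and b: (m, dim) arrays, e.g. per-frame MFCC vectors.
    Allows local stretching/compression of the time axis when comparing.
    """
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            # best of diagonal match, insertion, deletion
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    return D[n, m]
```

A template whose frames merely repeat (a slower utterance of the same content) scores a distance of zero, which is exactly the time-warping invariance wanted for pronunciation matching.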
13

Yang, Jie. "Combining Speech Enhancement and Cepstral Mean Normalization for LPC Cepstral Coefficients." Key Engineering Materials 474-476 (April 2011): 349–54. http://dx.doi.org/10.4028/www.scientific.net/kem.474-476.349.

Abstract:
A mismatch between training and testing in noisy circumstances often causes a drastic decrease in the performance of a speech recognition system. Robust feature coefficients can suppress this sensitivity to mismatch during the recognition stage. In this paper, we investigate the noise robustness of LPC Cepstral Coefficients (LPCC) by using speech enhancement with feature post-processing. At the front end, speech enhancement in the wavelet domain is used to remove noise components from noisy signals. This enhancement adopts a combination of the discrete wavelet transform (DWT), wavelet packet decomposition (WPD), multi-threshold processing, etc., to obtain the estimated speech. The feature post-processing employs cepstral mean normalization (CMN) to compensate for the signal distortion and residual noise of the enhanced signals in the cepstral domain. The performance of digit speech recognition systems is evaluated under noisy environments based on the NOISEX-92 database. The experimental results show that the presented method exhibits performance improvements in adverse noise environments compared with the previous features.
14

Jiang, Dazhi, Zhihui He, Yingqing Lin, Yifei Chen, and Linyan Xu. "An Improved Unsupervised Single-Channel Speech Separation Algorithm for Processing Speech Sensor Signals." Wireless Communications and Mobile Computing 2021 (February 27, 2021): 1–13. http://dx.doi.org/10.1155/2021/6655125.

Abstract:
As network supporting devices and sensors in the Internet of Things leap forward, countless real-world data will be generated for human intelligent applications. Speech sensor networks, an important part of the Internet of Things, have numerous application needs. Indeed, the sensor data can further help intelligent applications to provide higher-quality services, although this data may contain considerable noise. Accordingly, speech signal processing methods should be urgently implemented to acquire low-noise and effective speech data. Blind source separation and enhancement techniques are among the representative methods. However, in an unsupervised complex environment with only a single-channel signal present, many technical challenges are imposed on achieving single-channel separation of multi-person mixed speech. For this reason, this study develops an unsupervised speech separation method, CNMF+JADE, i.e., a hybrid method combining Convolutional Non-Negative Matrix Factorization and Joint Approximative Diagonalization of Eigenmatrix. Moreover, an adaptive wavelet-transform-based speech enhancement technique is proposed, capable of adaptively and effectively enhancing the separated speech signal. The proposed method is aimed at yielding a general and efficient speech processing algorithm for the data acquired by speech sensors. As revealed by the experimental results, on the TIMIT speech sources, the proposed method can effectively extract the target speaker from the mixed speech with a tiny training sample. The algorithm is highly general and robust, capable of technically supporting the processing of speech signals acquired by most speech sensors.
15

Delić, Vlado, Zoran Perić, Milan Sečujski, Nikša Jakovljević, Jelena Nikolić, Dragiša Mišković, Nikola Simić, Siniša Suzić, and Tijana Delić. "Speech Technology Progress Based on New Machine Learning Paradigm." Computational Intelligence and Neuroscience 2019 (June 25, 2019): 1–19. http://dx.doi.org/10.1155/2019/4368036.

Abstract:
Speech technologies have been developed for decades as a typical signal processing area, while the last decade has brought a huge progress based on new machine learning paradigms. Owing not only to their intrinsic complexity but also to their relation with cognitive sciences, speech technologies are now viewed as a prime example of interdisciplinary knowledge area. This review article on speech signal analysis and processing, corresponding machine learning algorithms, and applied computational intelligence aims to give an insight into several fields, covering speech production and auditory perception, cognitive aspects of speech communication and language understanding, both speech recognition and text-to-speech synthesis in more details, and consequently the main directions in development of spoken dialogue systems. Additionally, the article discusses the concepts and recent advances in speech signal compression, coding, and transmission, including cognitive speech coding. To conclude, the main intention of this article is to highlight recent achievements and challenges based on new machine learning paradigms that, over the last decade, had an immense impact in the field of speech signal processing.
16

Puder, Henning, and Gerhard Schmidt. "Applied speech and audio processing." Signal Processing 86, no. 6 (June 2006): 1121–23. http://dx.doi.org/10.1016/j.sigpro.2005.07.034.

17

Vu, Nhat Truong Minh, Binh Hieu Nguyen, Nhat Minh Pham, Thuan Huu Huynh, Tu Trong Bui, and Quan Hai Vu. "Performing Text – To – Speech based Hidden Markov Model on Digital Signal Processing platform." Science and Technology Development Journal 17, no. 1 (March 31, 2014): 32–38. http://dx.doi.org/10.32508/stdj.v17i1.1240.

Abstract:
Text To Speech (TTS) using Hidden Markov Model (HMM) has become popular in recent years. However, because most of such systems were implemented on personal computers (PCs), it is difficult to offer these systems to real applications. In this paper, we present a hardware implementation of TTS based on DSP architecture, which is applicable for real applications. By optimizing hardware architecture, the quality of the DSP-based synthesized speech is nearly identical to that synthesized on PCs.
18

Dovgal, V. M., and Min Zo Hein. "COMPARATIVE CHARACTERISTICS OF SOFTWARE SYSTEMS FOR ANALYSIS AND PROCESSING OF SPEECH SIGNALS USING WAVELETS." Herald of Dagestan State Technical University. Technical Sciences 45, no. 3 (May 12, 2019): 103–13. http://dx.doi.org/10.21822/2073-6185-2018-45-3-103-113.

Abstract:
Objectives. This article is devoted to the problem of processing and analysis of speech signals on the basis of the wavelet transform method, which has become one of the most relevant in recent years. Method. The growing relevance and undoubted practical value of the method have led to the emergence of a large number of software systems that allow the processing of speech signals on its basis. However, each of these systems has significant differences in the interface, the provided processing tools and functions, and has a number of advantages and disadvantages. At the moment, a large number of manuals and recommendations for specific software packages have been written, but these materials are fragmented and unsystematic. Result. This article attempts to systematize the theoretical material and describe the similarities and differences, advantages and disadvantages of the three most popular software systems: 1) MATLAB 6.0/6.1/6.5 Wavelet Toolbox 2/2.1/2.2; 2) Mathcad; 3) Wavelet Explorer of Mathematica. Conclusion. This article will be useful for specialists dealing with the problem of speech signal processing using the wavelet transform method, as it contains material of practical value and will facilitate a specialist's work in selecting the software complex optimal for a specific task.
19

Järvinen, Kari. "Digital speech processing: Speech coding, synthesis, and recognition." Signal Processing 30, no. 1 (January 1993): 133–34. http://dx.doi.org/10.1016/0165-1684(93)90056-g.

20

Bengio, Samy. "Multimodal speech processing using asynchronous Hidden Markov Models." Information Fusion 5, no. 2 (June 2004): 81–89. http://dx.doi.org/10.1016/j.inffus.2003.04.001.

21

Alkalani, Fadi, and Raed Sahawneh. "Methods and Algorithms of Speech Signals Processing and Compression and Their Implementation in Computer Systems." Oriental journal of computer science and technology 10, no. 04 (December 11, 2017): 736–44. http://dx.doi.org/10.13005/ojcst/10.04.06.

Abstract:
A review and comparative analysis of methods for the compression and recognition of speech signals is carried out. The analysis of existing recognition methods indicates that all of them are based on "inflexible" algorithms, which adapt badly to the characteristic features of speech signals, thus degrading the efficiency of the whole recognition system. The necessity of using algorithms for the determination of recognition features, along with wavelet packet analysis, as one of the advanced directions in creating effective methods and principles for developing speech signal recognition systems, is substantiated. An analysis of compression methods based on orthogonal transformations with complete exclusion of minimal decomposition factors is conducted, and the maximal possible compression degree is defined. In this compression method, an orthogonal transformation of the signal segment is conducted with subsequent exclusion of the set of smallest-modulo decomposition factors, irrespective of the order of their distribution; therefore, an additional transfer of information on the factor distribution is required. As a result, two information streams appear: the first corresponds to the decomposition factors themselves, and the second transfers information on the distribution of these factors. A method for determining the recognition features of speech signals and an algorithm for nonlinear time normalization are proposed and proved. The wavelet packet transformation is adaptive, i.e., it allows adapting to the signal features more accurately by choosing the proper tree of the optimal decomposition form, which provides the minimal number of wavelet factors at the prescribed accuracy of signal reconstruction, thus eliminating information surplus and unnecessary details of the signals.
The informativeness of the set of wavelet factors is estimated by means of entropy. To obtain the recognition features, a spectral analysis operation is used. To carry out temporal normalization, a warping function is found whose use minimizes the discrepancy between the reference and a new realization of a word. The paper is also dedicated to determining admissible compression factors on the basis of orthogonal transformations with incomplete elimination of the set of minimal decomposition factors, to creating a block diagram of the method of recognition feature formation, and to practical testing of the software methods. To increase the compression factor, adaptive uniform quantization is used, where the adaptation is conducted over all the decomposition factors. Program testing of the recognition methods is carried out by estimating the classification error probability using the Mahalanobis (Gonzales) distance.
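The compression scheme described in this abstract, an orthogonal transform followed by exclusion of the smallest-modulo decomposition factors, can be illustrated with an orthonormal DCT standing in for the paper's wavelet packet transform; the transform choice, the `keep_ratio` parameter, and the test signal are assumptions for illustration:

```python
import numpy as np

def dct_matrix(N):
    """Orthonormal DCT-II matrix (satisfies C @ C.T == I)."""
    k = np.arange(N)[:, None]
    n = np.arange(N)[None, :]
    C = np.cos(np.pi * (2 * n + 1) * k / (2.0 * N)) * np.sqrt(2.0 / N)
    C[0] /= np.sqrt(2.0)
    return C

def compress(x, keep_ratio=0.25):
    """Orthogonal transform, then zero the smallest-magnitude coefficients."""
    C = dct_matrix(len(x))
    coeffs = C @ x
    k = max(1, int(keep_ratio * len(x)))
    thresh = np.sort(np.abs(coeffs))[-k]          # k-th largest magnitude
    kept = np.where(np.abs(coeffs) >= thresh, coeffs, 0.0)
    return C.T @ kept, kept                       # reconstruction, sparse coefficients
```

For a smooth signal the transform concentrates the energy in few coefficients, so keeping a quarter of them reconstructs the signal almost exactly; the positions of the surviving coefficients are the second information stream the abstract mentions.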
22

Murthy, Hema A., and B. Yegnanarayana. "Speech processing using group delay functions." Signal Processing 22, no. 3 (March 1991): 259–67. http://dx.doi.org/10.1016/0165-1684(91)90014-a.

23

Chien, Jen-Tzung, and Man-Wai Mak. "Guest Editorial: Modern Speech Processing and Learning." Journal of Signal Processing Systems 92, no. 8 (July 9, 2020): 775–76. http://dx.doi.org/10.1007/s11265-020-01577-4.

24

Savchenko, L. V., and A. V. Savchenko. "Fuzzy Phonetic Encoding of Speech Signals in Voice Processing Systems." Journal of Communications Technology and Electronics 64, no. 3 (March 2019): 238–44. http://dx.doi.org/10.1134/s1064226919030173.

25

Kropotov, Yuriy, Aleksey Belov, and Aleksandr Proskuryakov. "EFFECTIVENESS INCREASE IN AUDIO EXCHANGE TELECOMMUNICATION SYSTEMS UNDER CONDITIONS OF EXTERNAL ACOUSTIC NOISE BY METHODS OF ADAPTIVE FILTERING." Bulletin of Bryansk state technical university 2019, no. 3 (March 27, 2019): 71–77. http://dx.doi.org/10.30987/article_5c8b5cebac6217.27543313.

Abstract:
Signal processing in telecommunication systems for audio information exchange is conditioned by the requirements of separating useful speech acoustic information, increasing the fidelity of information perception by subscribers of a communication system, and increasing the stability of telecommunication systems under suppression of external acoustic interference and compensation of echo signals. Therefore, when designing telecommunication systems, in particular speakerphone systems (SS) operating under conditions of an active impact of external acoustic interference and echo signals, the problem arises of forming efficient noise suppression algorithms to maintain an adequate signal-to-noise ratio. The investigation object is design methods for adaptive algorithms for speech signal processing and acoustic interference suppression by means of a controlled change of the rejection area in the range from 0 to 300...1000 Hz, depending on the interference situation. The aim of the work is an investigation of the characteristics of speech signals and acoustic noise of different nature, and also a consideration of problems in creating algorithms for adaptive filtering and suppression of external acoustic interference and echo signals. The increase of the signal-to-acoustic-interference ratio in audio exchange telecommunication systems is carried out by means of adaptive filtering methods. The results obtained in the course of investigating the suppression of different acoustic interference show that, by means of linear filtering in a telecommunication system for speech information exchange, it is possible to ensure an essential ratio Rs/Rak.pom. > 20 dB and, accordingly, an essential syllabic legibility S ≥ 93%.
26

Sussman, Harvey M., David Fruchter, Jon Hilbert, and Joseph Sirosh. "Linear correlates in the speech signal: The orderly output constraint." Behavioral and Brain Sciences 21, no. 2 (April 1998): 241–59. http://dx.doi.org/10.1017/s0140525x98001174.

Abstract:
Neuroethological investigations of mammalian and avian auditory systems have documented species-specific specializations for processing complex acoustic signals that could, if viewed in abstract terms, have an intriguing and striking relevance for human speech sound categorization and representation. Each species forms biologically relevant categories based on combinatorial analysis of information-bearing parameters within the complex input signal. This target article uses known neural models from the mustached bat and barn owl to develop, by analogy, a conceptualization of human processing of consonant plus vowel sequences that offers a partial solution to the noninvariance dilemma – the nontransparent relationship between the acoustic waveform and the phonetic segment. Critical input sound parameters used to establish species-specific categories in the mustached bat and barn owl exhibit high correlation and linearity due to physical laws. A cue long known to be relevant to the perception of stop place of articulation is the second formant (F2) transition. This article describes an empirical phenomenon – the locus equations – that describes the relationship between the F2 of a vowel and the F2 measured at the onset of a consonant-vowel (CV) transition. These variables, F2 onset and F2 vowel within a given place category, are consistently and robustly linearly correlated across diverse speakers and languages, and even under perturbation conditions as imposed by bite blocks. A functional role for this category-level extreme correlation and linearity (the “orderly output constraint”) is hypothesized based on the notion of an evolutionarily conserved auditory-processing strategy. High correlation and linearity between critical parameters in the speech signal that help to cue place of articulation categories might have evolved to satisfy a preadaptation by mammalian auditory systems for representing tightly correlated, linearly related components of acoustic signals.
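A locus equation as described above is simply a linear regression of F2 onset on F2 vowel within one place-of-articulation category. A minimal sketch with hypothetical formant values (the numbers below are illustrative, not data from the article):

```python
import numpy as np

def locus_equation(f2_vowel, f2_onset):
    """Fit F2_onset = slope * F2_vowel + intercept; return slope, intercept, r.

    The slope/intercept pair characterizes a stop place category; r measures
    the linearity that the 'orderly output constraint' predicts to be high.
    """
    slope, intercept = np.polyfit(f2_vowel, f2_onset, 1)
    r = np.corrcoef(f2_vowel, f2_onset)[0, 1]
    return slope, intercept, r

# Hypothetical /dV/ tokens: F2 at vowel midpoint vs. F2 at CV-transition onset (Hz)
f2v = np.array([1200.0, 1500.0, 1800.0, 2100.0, 2400.0])
f2o = 0.5 * f2v + 900.0 + np.array([10.0, -8.0, 5.0, -6.0, 3.0])  # small scatter
slope, intercept, r = locus_equation(f2v, f2o)
```

Tight scatter around the fitted line (r near 1) is the category-level linearity the target article argues the auditory system exploits.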
27

Finke, Mareike, Pascale Sandmann, Hanna Bönitz, Andrej Kral, and Andreas Büchner. "Consequences of Stimulus Type on Higher-Order Processing in Single-Sided Deaf Cochlear Implant Users." Audiology and Neurotology 21, no. 5 (2016): 305–15. http://dx.doi.org/10.1159/000452123.

Abstract:
Single-sided deaf subjects with a cochlear implant (CI) provide the unique opportunity to compare central auditory processing of the electrical input (CI ear) and the acoustic input (normal-hearing, NH, ear) within the same individual. In these individuals, sensory processing differs between their two ears, while cognitive abilities are the same irrespectively of the sensory input. To better understand perceptual-cognitive factors modulating speech intelligibility with a CI, this electroencephalography study examined the central-auditory processing of words, the cognitive abilities, and the speech intelligibility in 10 postlingually single-sided deaf CI users. We found lower hit rates and prolonged response times for word classification during an oddball task for the CI ear when compared with the NH ear. Also, event-related potentials reflecting sensory (N1) and higher-order processing (N2/N4) were prolonged for word classification (targets versus nontargets) with the CI ear compared with the NH ear. Our results suggest that speech processing via the CI ear and the NH ear differs both at sensory (N1) and cognitive (N2/N4) processing stages, thereby affecting the behavioral performance for speech discrimination. These results provide objective evidence for cognition to be a key factor for speech perception under adverse listening conditions, such as the degraded speech signal provided from the CI.
28

Wszołek, W., A. Izworski, and G. Izworski. "Signal Processing and Analysis of Pathological Speech Using Artificial Intelligence and Learning Systems Methods." Acta Physica Polonica A 123, no. 5 (June 2013): 995–1000. http://dx.doi.org/10.12693/aphyspola.123.995.

29

Deng, L., Y. Wang, K. Wang, A. Acero, H. Hon, J. Droppo, C. Boulis, M. Mahajan, and X. D. Huang. "Speech and Language Processing for Multimodal Human-Computer Interaction." Journal of VLSI Signal Processing-Systems for Signal, Image, and Video Technology 36, no. 2/3 (February 2004): 161–87. http://dx.doi.org/10.1023/b:vlsi.0000015095.19623.73.

Full text
APA, Harvard, Vancouver, ISO, and other styles
30

Czap, Laszlo, and Judit Pinter. "Noise Reduction in Voice Controlled Logistic Systems." Applied Mechanics and Materials 309 (February 2013): 260–67. http://dx.doi.org/10.4028/www.scientific.net/amm.309.260.

Full text
Abstract:
The most comfortable way of human communication is speech, which is a possible channel for a human-machine interface as well. Moreover, a voice-driven system can be controlled with busy hands. The performance of a speech recognition system is strongly degraded by the presence of noise. Logistic systems typically work in noisy environments, so noise reduction is crucial in industrial speech processing systems. Traditional noise reduction procedures (e.g. Wiener and Kalman filters) are effective on stationary or Gaussian noise. The noise of a real workplace can be captured by an additional microphone: the voice microphone picks up both speech and noise, while the noise microphone picks up only the noise signal. Because of the phase shift between the two signals, simple subtraction in the time domain is ineffective. In this paper, we discuss a spectral representation modeling the noise and voice signals. A frequency-spectrum-based noise cancellation method is proposed and verified in a real industrial environment.
APA, Harvard, Vancouver, ISO, and other styles
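The abstract above describes cancelling noise captured by a reference microphone in the frequency domain rather than by time-domain subtraction. A minimal sketch of that idea, not the authors' actual algorithm; frame size, overlap, and the spectral floor are illustrative assumptions:

```python
import numpy as np

def spectral_subtract(voice, noise_ref, frame=256, hop=128, floor=0.01):
    """Subtract the noise reference's magnitude spectrum from the voice
    channel frame by frame, keeping the voice channel's phase."""
    window = np.hanning(frame)
    out = np.zeros(len(voice))
    for start in range(0, len(voice) - frame + 1, hop):
        v = voice[start:start + frame] * window
        n = noise_ref[start:start + frame] * window
        V, N = np.fft.rfft(v), np.fft.rfft(n)
        # Subtract magnitudes; clamp at a spectral floor to limit musical noise
        mag = np.maximum(np.abs(V) - np.abs(N), floor * np.abs(V))
        # Overlap-add the reconstructed frame (phase taken from the voice channel)
        out[start:start + frame] += np.fft.irfft(mag * np.exp(1j * np.angle(V)))
    return out
```

Working on magnitudes sidesteps the phase-shift problem the abstract mentions: only the spectral energy of the reference is removed, so the two channels need not be phase-aligned.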
31

HimaBindu, Gottumukkala, Gondi Lakshmeeswari, Giddaluru Lalitha, and Pedalanka P. S. Subhashini. "Recognition Using DNN with Bacterial Foraging Optimization Using MFCC Coefficients." Journal Européen des Systèmes Automatisés 54, no. 2 (April 27, 2021): 283–87. http://dx.doi.org/10.18280/jesa.540210.

Full text
Abstract:
Speech is an important mode of communication for people. For a long time, researchers have been working hard to develop conversational machines that communicate with speech technology. Voice recognition is part of a science called signal processing. Speech recognition is becoming more successful at providing user authentication, and the process of user recognition is becoming more popular nowadays for providing security by authenticating users. With the rising importance of automated information processing and telecommunications, the usefulness of recognizing an individual from the features of the user's voice is increasing. In this paper, the three stages of speech recognition processing are defined as pre-processing, feature extraction and decoding. Speech comprehension has been significantly enhanced by using foreign languages. Automatic Speech Recognition (ASR) aims to translate speech to text. Speaker recognition is the method of recognizing an individual through his/her voice signals. For speaker authentication, a new speaker first claims an identity, and the stated model is then used for verification. The identity claim is accepted when the match is above a predefined threshold. The speech used for these tasks may be either text-dependent or text-independent. The article uses the Bacterial Foraging Optimization (BFO) algorithm for accurate speech recognition through a Mel Frequency Cepstral Coefficients (MFCC) model using a DNN. Speech recognition efficiency is compared to that of the conventional system.
APA, Harvard, Vancouver, ISO, and other styles
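The MFCC front end referenced in this abstract follows a standard recipe: pre-emphasis, windowed framing, power spectrum, mel filter bank, log compression, and DCT. A minimal sketch of that pipeline (frame sizes, filter counts, and the 512-point FFT are conventional choices, not values from the paper):

```python
import numpy as np
from scipy.fftpack import dct

def mfcc(signal, fs=16000, frame=400, hop=160, n_mels=26, n_ceps=13):
    """Minimal MFCC front end sketch, not a production extractor."""
    # Pre-emphasis boosts high frequencies before analysis
    signal = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    frames = [signal[i:i + frame] * np.hamming(frame)
              for i in range(0, len(signal) - frame + 1, hop)]
    power = np.abs(np.fft.rfft(frames, n=512)) ** 2 / 512
    # Triangular mel filter bank between 0 Hz and fs/2
    mel = lambda f: 2595 * np.log10(1 + f / 700)
    imel = lambda m: 700 * (10 ** (m / 2595) - 1)
    pts = imel(np.linspace(mel(0), mel(fs / 2), n_mels + 2))
    bins = np.floor((512 + 1) * pts / fs).astype(int)
    fbank = np.zeros((n_mels, 257))
    for j in range(n_mels):
        l, c, r = bins[j], bins[j + 1], bins[j + 2]
        fbank[j, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[j, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # Log mel energies, then DCT decorrelates into cepstral coefficients
    feat = np.log(power @ fbank.T + 1e-10)
    return dct(feat, type=2, axis=1, norm='ortho')[:, :n_ceps]
```

The resulting matrix (one 13-coefficient row per 10 ms frame) is the kind of feature vector a DNN classifier would consume.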
32

Jiang, Yi, Hong Zhou, Yuan Yuan Zu, and Xiao Chen. "Energy Based Dual-Microphone Electronic Speech Segregation." Applied Mechanics and Materials 385-386 (August 2013): 1381–84. http://dx.doi.org/10.4028/www.scientific.net/amm.385-386.1381.

Full text
Abstract:
Speech segregation based on energy performs well in dual-microphone electronic speech signal processing. Applying a binary mask to an auditory mixture has been shown to yield substantial improvements in signal-to-noise ratio (SNR) and intelligibility. To evaluate the performance of a binary-mask-based dual-microphone speech enhancement algorithm, various spatial noise sources and reverberation test conditions are used, together with two comparison dual-microphone systems based on energy difference and machine learning. Results for SNR and speech intelligibility show that more robust performance is achieved than with the two comparison systems.
APA, Harvard, Vancouver, ISO, and other styles
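A binary mask marks each time-frequency unit as speech-dominated (1) or noise-dominated (0). A minimal sketch of the ideal binary mask, assuming oracle access to the premixed speech and noise (the authors' dual-microphone estimator is not reproduced here; the local criterion `lc_db` is a common convention):

```python
import numpy as np

def ideal_binary_mask(speech, noise, frame=256, hop=128, lc_db=0.0):
    """Keep a time-frequency unit when the local SNR of the premixed
    speech and noise exceeds the local criterion (lc_db)."""
    win = np.hanning(frame)

    def stft(x):
        return np.array([np.fft.rfft(x[i:i + frame] * win)
                         for i in range(0, len(x) - frame + 1, hop)])

    S, N = stft(speech), stft(noise)
    local_snr = 10 * np.log10((np.abs(S) ** 2 + 1e-12) / (np.abs(N) ** 2 + 1e-12))
    return (local_snr > lc_db).astype(float)
```

Multiplying the mixture's spectrogram by this mask and resynthesizing yields the SNR and intelligibility gains the abstract refers to; practical systems must estimate the mask from the two microphone signals instead.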
33

Hiroshige, Makoto, Yoshikazu Miyanaga, and Koji Tochinai. "An Adaptive Signal Processing System for Voiced Speech with Automatic Spectrum Selector." Systems and Computers in Japan 21, no. 4 (1990): 85–96. http://dx.doi.org/10.1002/scj.4690210409.

Full text
APA, Harvard, Vancouver, ISO, and other styles
35

Manoharan, Samuel, and Narain Ponraj. "Analysis of Complex Non-Linear Environment Exploration in Speech Recognition by Hybrid Learning Technique." December 2020 2, no. 4 (February 19, 2021): 202–9. http://dx.doi.org/10.36548/jiip.2020.4.005.

Full text
Abstract:
Recently, voice-controlled interfaces have come to play a major role in many real-time environments such as cars, smart homes and mobile phones. In signal processing, the accuracy of speech recognition remains a thought-provoking challenge. Filter designs assist speech recognition systems in improving accuracy through parameter tuning. This tuning to some degree takes the form of narrowed filter specifications, which leads to complex nonlinear problems in speech recognition. This research aims to provide an analysis of complex nonlinear environments and their exploration with recent techniques, combining statistical design and Support Vector Machine (SVM) based learning techniques. The dynamic Bayes network is a dominant technique in speech processing, characterizing stack co-occurrences. This method is derived from a mathematical and statistical formalism. It is also used to predict word sequences with the posterior probability method, with the help of phonetic word unit recognition. To address the complexities of signal processing, sentences are combined with various types of noise at different signal-to-noise ratios (SNR), together with a measure of comparison between the two techniques.
APA, Harvard, Vancouver, ISO, and other styles
36

Bamford, John, Mary Hostler, and Gareth Pont. "Digital Signal Processing Hearing Aids, Personal FM Systems, and Interference: Is There a Problem?" Ear and Hearing 26, no. 3 (June 2005): 341–49. http://dx.doi.org/10.1097/00003446-200506000-00009.

Full text
APA, Harvard, Vancouver, ISO, and other styles
37

Xu, Yang, Zhe Zhang, and Zhi Yu Huang. "Vehicle Embedded Speech Recognition and Control System Research and Implementation." Applied Mechanics and Materials 494-495 (February 2014): 104–7. http://dx.doi.org/10.4028/www.scientific.net/amm.494-495.104.

Full text
Abstract:
Because it is inconvenient for a driver to manually operate vehicle electronics while the vehicle is moving, and to address related issues such as the monopoly of foreign technology, a vehicle speech recognition and control system based on a DSP + MCU framework is designed. According to the embedded application environment, a suitable recognition algorithm and the DSP + MCU hardware architecture are chosen, in which the DSP is mainly responsible for speech signal processing, while the MCU communicates with the DSP to obtain the recognition results after speech signal processing, which serve as the final control instructions. The experimental results show that the hardware platform runs normally and controls the car body lights on an experimental bench.
APA, Harvard, Vancouver, ISO, and other styles
38

Motz, Hans, and Frank Rattay. "Signal Processing Strategies for Electrostimulated Ear Prostheses Based on Simulated Nerve Response." Perception 16, no. 6 (December 1987): 777–84. http://dx.doi.org/10.1068/p160777.

Full text
Abstract:
Improvements to the coding strategy of prostheses for the profoundly deaf which can be implemented by the speech processors of these devices are suggested. Difficulties with vowel recognition are diagnosed as being due to nerve properties. These are studied with the aid of a model in order to find ways of overcoming the difficulties.
APA, Harvard, Vancouver, ISO, and other styles
39

Agrawal, S. K., and O. P. Sahu. "Two-Channel Quadrature Mirror Filter Bank: An Overview." ISRN Signal Processing 2013 (September 3, 2013): 1–10. http://dx.doi.org/10.1155/2013/815619.

Full text
Abstract:
During the last two decades, there has been substantial progress in multirate digital filters and filter banks. This includes the design of quadrature mirror filters (QMF). A two-channel QMF bank is extensively used in many signal processing fields such as subband coding of speech signals, image processing, antenna systems, the design of wavelet bases, biomedical engineering, and the digital audio industry. Therefore, new efficient design techniques are being proposed by several authors in this area. This paper presents an overview of analysis and design techniques for the two-channel QMF bank. Applications in the area of subband coding and future research trends are also discussed.
APA, Harvard, Vancouver, ISO, and other styles
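A two-channel QMF bank splits a signal into half-rate low and high bands and can reconstruct it from them. A minimal sketch using the Haar filter pair, the simplest case with exact reconstruction (real QMF designs use longer filters with much better band separation):

```python
import numpy as np

def qmf_analysis(x):
    """Two-channel analysis with the Haar pair h0=[1,1]/sqrt(2),
    h1=[1,-1]/sqrt(2): low and high bands, each at half the input rate."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2:                      # pad to even length
        x = np.append(x, 0.0)
    low = (x[0::2] + x[1::2]) / np.sqrt(2)
    high = (x[0::2] - x[1::2]) / np.sqrt(2)
    return low, high

def qmf_synthesis(low, high):
    """Inverse of qmf_analysis: perfect reconstruction for the Haar pair."""
    x = np.empty(2 * len(low))
    x[0::2] = (low + high) / np.sqrt(2)
    x[1::2] = (low - high) / np.sqrt(2)
    return x
```

In subband coding, the two half-rate bands would be quantized separately before synthesis, which is the application the abstract highlights.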
40

Reddy, Bharathi, D. Leela Rani, and Prof S. Varadarajan. "HIGH SPEED CARRY SAVE MULTIPLIER BASED LINEAR CONVOLUTION USING VEDIC MATHAMATICS." INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY 4, no. 2 (May 15, 2013): 284–87. http://dx.doi.org/10.24297/ijct.v4i2a2.3173.

Full text
Abstract:
VLSI applications include digital signal processing, digital control systems, telecommunications, and speech and audio processing for audiology and speech-language pathology. The latest research in VLSI is the design and implementation of the DSP systems essential for the above applications. The fundamental computation in DSP systems is convolution; convolution and LTI systems are the heart and soul of DSP. The behavior of LTI systems in continuous time is described by the convolution integral, whereas the behavior in discrete time is described by linear convolution. In this paper, linear convolution is performed using a carry-save multiplier architecture based on the vertical-and-crosswise algorithm of Urdhva-Tiryagbhyam in Vedic mathematics. Coding is done using Verilog HDL (Verilog Hardware Description Language). Simulation and synthesis are performed using a Xilinx FPGA.
APA, Harvard, Vancouver, ISO, and other styles
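The vertical-and-crosswise (Urdhva-Tiryagbhyam) pattern the abstract builds its multiplier on is, column by column, exactly linear convolution: output sample k accumulates every product a[i]·b[j] with i + j = k. A software reference model of that column-sum structure (the paper's contribution is the carry-save hardware architecture, not this form):

```python
def linear_convolution(a, b):
    """Column sums of the vertical-and-crosswise pattern:
    out[k] = sum of a[i] * b[j] over all i + j == k."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out
```

In the hardware version, each column's partial products are reduced with carry-save adders so the carries propagate only once, at the end.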
41

Scollie, Susan, Charla Levy, Nazanin Pourmand, Parvaneh Abbasalipour, Marlene Bagatto, Frances Richert, Shane Moodie, Jeff Crukley, and Vijay Parsa. "Fitting Noise Management Signal Processing Applying the American Academy of Audiology Pediatric Amplification Guideline: Verification Protocols." Journal of the American Academy of Audiology 27, no. 03 (March 2016): 237–51. http://dx.doi.org/10.3766/jaaa.15060.

Full text
Abstract:
Background: Although guidelines for fitting hearing aids for children are well developed and have strong basis in evidence, specific protocols for fitting and verifying some technologies are not always available. One such technology is noise management in children’s hearing aids. Children are frequently in high-level and/or noisy environments, and many options for noise management exist in modern hearing aids. Verification protocols are needed to define specific test signals and levels for use in clinical practice. Purpose: This work aims to (1) describe the variation in different brands of noise reduction processors in hearing aids and the verification of these processors and (2) determine whether these differences are perceived by 13 children who have hearing loss. Finally, we aimed to develop a verification protocol for use in pediatric clinical practice. Study Sample: A set of hearing aids was tested using both clinically available test systems and a reference system, so that the impacts of noise reduction signal processing in hearing aids could be characterized for speech in a variety of background noises. A second set of hearing aids was tested across a range of audiograms and across two clinical verification systems to characterize the variance in clinical verification measurements. Finally, a set of hearing aid recordings that varied by type of noise reduction was rated for sound quality by children with hearing loss. Results: Significant variation across makes and models of hearing aids was observed in both the speed of noise reduction activation and the magnitude of noise reduction. Reference measures indicate that noise-only testing may overestimate noise reduction magnitude compared to speech-in-noise testing. Variation across clinical test signals was also observed, indicating that some test signals may be more successful than others for characterization of hearing aid noise reduction. Children provided different sound quality ratings across hearing aids, and for one hearing aid rated the sound quality as higher with the noise reduction system activated. Conclusions: Implications for clinical verification systems may be that greater standardization and the use of speech-in-noise test signals may improve the quality and consistency of noise reduction verification across clinics. A clinical protocol for verification of noise management in children’s hearing aids is suggested.
APA, Harvard, Vancouver, ISO, and other styles
42

Liu, Ching-Feng, Wei-Siang Ciou, Peng-Ting Chen, and Yi-Chun Du. "A Real-Time Speech Separation Method Based on Camera and Microphone Array Sensors Fusion Approach." Sensors 20, no. 12 (June 22, 2020): 3527. http://dx.doi.org/10.3390/s20123527.

Full text
Abstract:
In the context of human hearing assistance, identifying and enhancing non-stationary target speech in various noise environments, such as a cocktail party, is an important issue for real-time speech separation. Previous studies mostly used microphone signal processing to perform target speech separation and analysis, such as feature recognition through a large amount of training data and supervised machine learning. Those methods are suitable for stationary noise suppression, but relatively limited for non-stationary noise and hard pressed to meet real-time processing requirements. In this study, we propose a real-time speech separation method based on an approach that combines an optical camera and a microphone array. The method is divided into two stages. Stage 1 uses computer vision with the camera to detect and identify targets of interest and to estimate source angle and distance. Stage 2 uses beamforming with the microphone array to enhance and separate the target speech. An asynchronous update function integrates the beamforming control and speech processing to reduce the effect of processing delay. The experimental results show noise reductions of 6.1 dB and 5.2 dB in stationary and non-stationary noise environments, respectively. The response time of speech processing is less than 10 ms, which meets the requirements of a real-time system. The proposed method has high potential for application in assistive listening systems or machine language processing such as intelligent personal assistants.
APA, Harvard, Vancouver, ISO, and other styles
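Stage 2's beamforming can be illustrated with its simplest variant, a delay-and-sum beamformer that phase-aligns the channels toward the estimated source angle. This is a sketch under free-field, far-field assumptions; the paper's actual beamformer and array geometry are not specified here:

```python
import numpy as np

def delay_and_sum(channels, fs, mic_positions, angle_deg, c=343.0):
    """Steer a linear microphone array toward angle_deg by delaying each
    channel (fractional delays via FFT phase shift) and averaging.
    channels: (n_mics, n_samples); mic_positions: positions in meters."""
    angle = np.deg2rad(angle_deg)
    delays = np.asarray(mic_positions, dtype=float) * np.cos(angle) / c
    delays -= delays.min()                      # keep all delays non-negative
    n = channels.shape[1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    out = np.zeros(n)
    for ch, tau in zip(channels, delays):
        # Multiplying by exp(-j*2*pi*f*tau) delays the channel by tau seconds
        spectrum = np.fft.rfft(ch) * np.exp(-2j * np.pi * freqs * tau)
        out += np.fft.irfft(spectrum, n=n)
    return out / len(channels)
```

Signals arriving from the steered direction add coherently while noise from other directions adds incoherently, which is the source of the SNR gain.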
43

Xie, Xin, and Emily Myers. "Left Inferior Frontal Gyrus Sensitivity to Phonetic Competition in Receptive Language Processing: A Comparison of Clear and Conversational Speech." Journal of Cognitive Neuroscience 30, no. 3 (March 2018): 267–80. http://dx.doi.org/10.1162/jocn_a_01208.

Full text
Abstract:
The speech signal is rife with variations in phonetic ambiguity. For instance, when talkers speak in a conversational register, they demonstrate less articulatory precision, leading to greater potential for confusability at the phonetic level compared with a clear speech register. Current psycholinguistic models assume that ambiguous speech sounds activate more than one phonological category and that competition at prelexical levels cascades to lexical levels of processing. Imaging studies have shown that the left inferior frontal gyrus (LIFG) is modulated by phonetic competition between simultaneously activated categories, with increases in activation for more ambiguous tokens. Yet, these studies have often used artificially manipulated speech and/or metalinguistic tasks, which arguably may recruit neural regions that are not critical for natural speech recognition. Indeed, a prominent model of speech processing, the dual-stream model, posits that the LIFG is not involved in prelexical processing in receptive language processing. In the current study, we exploited natural variation in phonetic competition in the speech signal to investigate the neural systems sensitive to phonetic competition as listeners engage in a receptive language task. Participants heard nonsense sentences spoken in either a clear or conversational register as neural activity was monitored using fMRI. Conversational sentences contained greater phonetic competition, as estimated by measures of vowel confusability, and these sentences also elicited greater activation in a region in the LIFG. Sentence-level phonetic competition metrics uniquely correlated with LIFG activity as well. This finding is consistent with the hypothesis that the LIFG responds to competition at multiple levels of language processing and that recruitment of this region does not require an explicit phonological judgment.
APA, Harvard, Vancouver, ISO, and other styles
44

Wagner, Luise, Stefan K. Plontke, and Torsten Rahne. "Perception of Iterated Rippled Noise Periodicity in Cochlear Implant Users." Audiology and Neurotology 22, no. 2 (2017): 104–15. http://dx.doi.org/10.1159/000478649.

Full text
Abstract:
Pitch perception is more challenging for individuals with cochlear implants (CIs) than normal-hearing subjects because the signal processing by CIs is restricted. Processing and perceiving the periodicity of signals may contribute to pitch perception. Whether individuals with CIs can discern pitch within an iterated rippled noise (IRN) signal is still unclear. In a prospective controlled psychoacoustic study with 34 CI users and 15 normal-hearing control subjects, the difference limen between IRN signals with different numbers of iterations was measured. In 7 CI users and 15 normal-hearing control listeners with single-sided deafness, pitch matching between IRN and harmonic complex tones was measured. The pitch onset response (POR) following signal changes from white noise to IRN was measured electrophysiologically. The CI users could discriminate different numbers of iteration in IRN signals, but worse than normal-hearing listeners. A POR was measured for both normal-hearing subjects and CI users increasing with the pitch salience of the IRN. This indicates that the POR could serve as an objective measure to monitor progress during audioverbal therapy after CI surgery.
APA, Harvard, Vancouver, ISO, and other styles
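Iterated rippled noise is generated by repeatedly adding a delayed copy of the running signal to itself (the "add-same" network); the perceived pitch is roughly 1/delay, and its salience grows with the number of iterations. A minimal sketch (parameter names are illustrative, not from the study):

```python
import numpy as np

def iterated_rippled_noise(n_samples, fs, delay_ms, n_iter, gain=1.0, seed=0):
    """Add-same IRN generator: x <- x + gain * delay(x) for n_iter passes,
    starting from white noise, then peak-normalize."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_samples)
    d = int(round(fs * delay_ms / 1000))
    for _ in range(n_iter):
        delayed = np.concatenate([np.zeros(d), x[:-d]])
        x = x + gain * delayed
    return x / np.max(np.abs(x))
```

The growing autocorrelation peak at the delay lag is the physical correlate of the pitch salience varied in the study's stimuli.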
45

Xie, Lei, Tan Lee, and Man-Wai Mak. "Guest Editorial: Advances in Deep Learning for Speech Processing." Journal of Signal Processing Systems 90, no. 7 (February 17, 2018): 959–61. http://dx.doi.org/10.1007/s11265-018-1333-3.

Full text
APA, Harvard, Vancouver, ISO, and other styles
46

HOU, YU. "A COMPACTLY SUPPORTED, SYMMETRICAL AND QUASI-ORTHOGONAL WAVELET." International Journal of Wavelets, Multiresolution and Information Processing 08, no. 06 (November 2010): 931–40. http://dx.doi.org/10.1142/s0219691310003900.

Full text
Abstract:
Based on the wavelet theory and optimization method, a class of single wavelets with compact support, symmetry and quasi-orthogonality are designed and constructed. Some mathematical properties of the wavelets, such as orthogonality, linear phase property and vanishing moments and so on, are studied. A speech compression experiment is implemented in order to investigate the performance of signal reconstruction and speech compression for the proposed wavelets. Comparison with some conventional wavelets shows that the proposed wavelets have a very good performance of signal reconstruction and speech compression.
APA, Harvard, Vancouver, ISO, and other styles
47

James, Praveen Edward, Hou Kit Mun, and Chockalingam Aravind Vaithilingam. "A Hybrid Spoken Language Processing System for Smart Device Troubleshooting." Electronics 8, no. 6 (June 16, 2019): 681. http://dx.doi.org/10.3390/electronics8060681.

Full text
Abstract:
The purpose of this work is to develop a spoken language processing system for smart device troubleshooting using human-machine interaction. This system combines a software Bidirectional Long Short Term Memory Cell (BLSTM)-based speech recognizer and a hardware LSTM-based language processor for Natural Language Processing (NLP) using the serial RS232 interface. Mel Frequency Cepstral Coefficient (MFCC)-based feature vectors from the speech signal are directly input into a BLSTM network. A dropout layer is added to the BLSTM layer to reduce over-fitting and improve robustness. The speech recognition component is a combination of an acoustic modeler, pronunciation dictionary, and a BLSTM network for generating query text, and executes in real time with an 81.5% Word Error Rate (WER) and average training time of 45 s. The language processor comprises a vectorizer, lookup dictionary, key encoder, Long Short Term Memory Cell (LSTM)-based training and prediction network, and dialogue manager, and transforms query intent to generate response text with a processing time of 0.59 s, 5% hardware utilization, and an F1 score of 95.2%. The proposed system has a 4.17% decrease in accuracy compared with existing systems. The existing systems use parallel processing and high-speed cache memories to perform additional training, which improves the accuracy. However, the performance of the language processor has a 36.7% decrease in processing time and 50% decrease in hardware utilization, making it suitable for troubleshooting smart devices.
APA, Harvard, Vancouver, ISO, and other styles
48

Falk, Tiago H., and Sebastian Moller. "Towards Signal-Based Instrumental Quality Diagnosis for Text-to-Speech Systems." IEEE Signal Processing Letters 15 (2008): 781–84. http://dx.doi.org/10.1109/lsp.2008.2006709.

Full text
APA, Harvard, Vancouver, ISO, and other styles
49

Ryan, Michael J., Nicole M. Kime, and Gil G. Rosenthal. "Patterns of evolution in human speech processing and animal communication." Behavioral and Brain Sciences 21, no. 2 (April 1998): 282–83. http://dx.doi.org/10.1017/s0140525x98481172.

Full text
Abstract:
We consider Sussman et al.'s suggestion that auditory biases for processing low-noise relationships among pairs of acoustic variables are a preadaptation for human speech processing. Data from other animal communication systems, especially those involving sexual selection, also suggest that neural biases in the receiver system can generate strong selection on the form of communication signals.
APA, Harvard, Vancouver, ISO, and other styles
50

BALASA, FLORIN, FRANK H. M. FRANSSEN, FRANCKY V. M. CATTHOOR, and HUGO J. DE MAN. "TRANSFORMATION OF NESTED LOOPS WITH MODULO INDEXING TO AFFINE RECURRENCES." Parallel Processing Letters 04, no. 03 (September 1994): 271–80. http://dx.doi.org/10.1142/s0129626494000260.

Full text
Abstract:
For multi-dimensional (M-D) signal and data processing systems, transformation of algorithmic specifications is a major instrument both in code optimization and code generation for parallelizing compilers and in control flow optimization as a preprocessor for architecture synthesis. State-of-the-art transformation techniques are limited to affine index expressions. This is however not sufficient for many important applications in image, speech and numerical processing. In this paper, a novel transformation method is introduced, oriented to the subclass of algorithm specifications that contains modulo expressions of affine functions to index M-D signals. The method employs extensively the concept of Hermite normal form. The transformation method can be carried out in polynomial time, applying only integer arithmetic.
APA, Harvard, Vancouver, ISO, and other styles
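The transformation the paper formalizes with Hermite normal forms can be illustrated, in a drastically simplified one-dimensional case, by strip-mining a modulo-indexed loop so that every index becomes an affine function of the new loop counters. A hypothetical toy example, not the paper's algorithm:

```python
def copy_mod_affine(x, M):
    """Equivalent of `y[i] = x[i % N]` with the modulo removed: substitute
    i = N*q + r, so both indices (N*q + r and r) are affine in (q, r)."""
    N = len(x)
    y = [0] * M
    for q in range((M + N - 1) // N):           # outer strip counter
        for r in range(min(N, M - q * N)):      # position inside the strip
            y[N * q + r] = x[r]                 # affine indices only
    return y
```

Once all index expressions are affine, standard dependence analysis and loop transformations for parallelizing compilers apply directly.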