
Dissertations / Theses on the topic 'Speech - Signal Processing'

Consult the top 50 dissertations / theses for your research on the topic 'Speech - Signal Processing.'

1

Little, M. A. "Biomechanically informed nonlinear speech signal processing." Thesis, University of Oxford, 2007. http://ora.ox.ac.uk/objects/uuid:6f5b84fb-ab0b-42e1-9ac2-5f6acc9c5b80.

Abstract:
Linear digital signal processing based around linear, time-invariant systems theory finds substantial application in speech processing. The linear acoustic source-filter theory of speech production provides ready biomechanical justification for using linear techniques. Nonetheless, biomechanical studies surveyed in this thesis display significant nonlinearity and non-Gaussianity, casting doubt on the linear model of speech production. Therefore, in order to test the appropriateness of linear systems assumptions for speech production, surrogate data techniques can be used. This study uncovers systematic flaws in the design and use of existing surrogate data techniques and, by making novel improvements, develops a more reliable technique. Collating the largest set of speech signals to date compatible with this new technique, this study next demonstrates that the linear assumptions are not appropriate for all speech signals. Detailed analysis shows that while vowel production from healthy subjects cannot be explained within the linear assumptions, consonants can. Linear assumptions also fail for most vowel production by pathological subjects with voice disorders. Combining this new empirical evidence with information from biomechanical studies leads to the conclusion that the most parsimonious model for speech production, explaining all these findings in one unified set of mathematical assumptions, is a stochastic nonlinear, non-Gaussian model, which subsumes both Gaussian linear and deterministic nonlinear models. As a case study, to demonstrate the engineering value of nonlinear signal processing techniques based upon the proposed biomechanically informed, unified model, the study investigates the biomedical engineering application of disordered voice measurement. A new state-space recurrence measure is devised and combined with an existing measure of the fractal scaling properties of stochastic signals. Using a simple pattern classifier, these two measures outperform all combinations of linear methods for the detection of voice disorders on a large database of pathological and healthy vowels, making explicit the effectiveness of such biomechanically informed, nonlinear signal processing techniques.
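For readers unfamiliar with surrogate data testing, the following minimal Python sketch illustrates the classical phase-randomization version of the idea (not the improved technique developed in the thesis): surrogates share the signal's power spectrum but destroy any nonlinear structure, so a nonlinear statistic that is an outlier relative to the surrogate distribution rejects the linear-Gaussian null hypothesis. The logistic-map test signal and the particular statistic are illustrative choices.

```python
import numpy as np

def phase_randomized_surrogate(x, rng):
    """Surrogate with the same power spectrum as x but randomized phases."""
    X = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, len(X))
    phases[0] = 0.0                       # keep the DC bin real
    if len(x) % 2 == 0:
        phases[-1] = 0.0                  # keep the Nyquist bin real
    return np.fft.irfft(np.abs(X) * np.exp(1j * phases), n=len(x))

def time_reversal_asymmetry(x, lag=1):
    """Simple nonlinear statistic; near zero for linear Gaussian processes."""
    return np.mean(x[lag:] ** 2 * x[:-lag] - x[lag:] * x[:-lag] ** 2)

# Deterministic nonlinear test signal (logistic map) standing in for speech.
y = np.empty(2000)
y[0] = 0.3
for n in range(1, len(y)):
    y[n] = 3.8 * y[n - 1] * (1.0 - y[n - 1])
x = y - y.mean()

rng = np.random.default_rng(0)
stat = time_reversal_asymmetry(x)
null = [time_reversal_asymmetry(phase_randomized_surrogate(x, rng))
        for _ in range(99)]
p = (1 + sum(abs(s) >= abs(stat) for s in null)) / 100.0  # rank-based p-value
print(f"statistic={stat:.4f}, surrogate p-value={p:.2f}")
```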
2

Wells, Ian. "Digital signal processing architectures for speech recognition." Thesis, University of the West of England, Bristol, 1995. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.294705.

3

Morris, Robert W. "Enhancement and recognition of whispered speech." Diss., Georgia Institute of Technology, 2003. Available online: http://etd.gatech.edu/theses/available/etd-04082004-180338/unrestricted/morris%5frobert%5fw%5f200312%5fphd.pdf.

4

Coetzee, H. J. "The development of a new objective speech quality measure for speech coding applications." Diss., Georgia Institute of Technology, 1990. http://hdl.handle.net/1853/15474.

5

Rex, James Alexander. "Microphone signal processing for speech recognition in cars." Thesis, University of Southampton, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.326728.

6

Shah, Afnan Arafat. "Improving automatic speech recognition transcription through signal processing." Thesis, University of Southampton, 2017. https://eprints.soton.ac.uk/418970/.

Abstract:
Automatic speech recognition (ASR) in the educational environment could be a solution to the problem of gaining access to the spoken words of a lecture for the many students who find lectures hard to understand, such as those whose mother tongue is not English or who have a hearing impairment. In such an environment, it is difficult for ASR to provide transcripts with Word Error Rates (WER) of less than 25% for the wide range of speakers. Reducing the WER reduces the time, and therefore the cost, of correcting errors in the transcripts. To deal with the variation of acoustic features between speakers, ASR systems implement automatic vocal tract normalisation (VTN), which warps the formants (resonant frequencies) of the speaker to better match the formants of the speakers in the training set. The ASR also implements automatic dynamic time warping (DTW) to deal with variation in the speaker's rate of speaking, by aligning the time series of the new spoken words with the time series of the matching spoken words of the training set. This research investigates whether the ASR's automatic estimation of VTN and DTW can be enhanced by pre-processing the recording, manually warping the formants and speaking rate using sound processing libraries (Rubber Band and SoundTouch) before transcribing the pre-processed recordings using ASR. An initial experiment, performed with the recordings of two male and two female speakers, showed that pre-processing the recording could improve the WER by an average of 39.5% for male speakers and 36.2% for female speakers. However, the selection of the best warp factors was achieved through an iterative 'trial and error' approach that involved many hours of calculating the word error rate for each warp factor setting. A more efficient approach for selecting the warp factors for pre-processing was then investigated. The second experiment developed a modification function using, as its training set, the best warp factors from the 'trial and error' approach to estimate the modification percentage required to improve the WER of a recording. A modification function was found that on average improved the WER by 16% for female speakers and 7% for male speakers.
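As a rough illustration of the pre-processing step described above, the sketch below warps a recording's speaking rate and spectral content before sending it to an ASR engine. It uses librosa's time stretch and pitch shift as stand-ins for the Rubber Band and SoundTouch libraries named in the abstract (pitch shifting moves pitch and formants together, a cruder operation than dedicated formant warping); the warp factors and file names are placeholders.

```python
import librosa
import soundfile as sf

# Hypothetical per-speaker warp factors; the thesis found the best values by
# search and then learned a function to estimate them from the recording.
rate_warp = 0.9        # slow the speaking rate down by 10%
formant_steps = -1.0   # shift spectral content down by one semitone

y, sr = librosa.load("lecture.wav", sr=16000)
y = librosa.effects.time_stretch(y, rate=rate_warp)
y = librosa.effects.pitch_shift(y, sr=sr, n_steps=formant_steps)
sf.write("lecture_warped.wav", y, sr)   # the ASR then transcribes this file
```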
7

Wu, Ping. "Kohonen self-organising neural networks in speech signal processing." Thesis, University of Reading, 1994. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.386985.

8

Stringer, Paul David. "Binaural signal processing for the enhancement of speech perception." Thesis, University of York, 1995. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.282296.

9

Hanna, Salim Alia. "Digital signal processing algorithms for speech coding and recognition." Thesis, Imperial College London, 1987. http://hdl.handle.net/10044/1/46268.

10

Toner, Edward. "The enhancement of noise corrupted speech signals." Thesis, University of the West of Scotland, 1993. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.359727.

11

Anderson, David Verl. "Audio signal enhancement using multi-resolution sinusoidal modeling." Diss., Georgia Institute of Technology, 1999. http://hdl.handle.net/1853/15394.

12

Edwards, Richard. "Advanced signal processing techniques for pitch synchronous sinusoidal speech coders." Thesis, University of Surrey, 2007. http://epubs.surrey.ac.uk/833/.

Abstract:
Recent trends in commercial and consumer demand have led to the increasing use of multimedia applications in mobile and Internet telephony. Although audio, video and data communications are becoming more prevalent, a major application is, and will remain, the transmission of speech. Speech coding techniques suited to these new trends must be developed, not only to provide high quality speech communication but also to minimise the bandwidth required for speech, so as to maximise that available for the new audio, video and data services. The majority of current speech coders employed in mobile and Internet applications use a Code Excited Linear Prediction (CELP) model. These coders attempt to reproduce the input speech signal and can produce high quality synthetic speech at bit rates above 8 kbps. Sinusoidal speech coders tend to dominate at rates below 6 kbps, but due to limitations in the sinusoidal speech coding model, their synthetic speech quality cannot be significantly improved even if their bit rate is increased. Recent developments have seen the emergence and application of Pitch Synchronous (PS) speech coding techniques to these coders in order to remove the limitations of the sinusoidal speech coding model. The aim of the research presented in this thesis is to investigate and eliminate the factors that limit the quality of the synthetic speech produced by PS sinusoidal coders. To achieve this, innovative signal processing techniques have been developed. New parameter analysis and quantisation techniques have been produced which overcome many of the problems associated with applying PS techniques to sinusoidal coders. In sinusoidal based coders, two of the most important elements are the correct formulation of pitch and voicing values from the input speech. The techniques introduced here have greatly improved these calculations, resulting in a higher quality PS sinusoidal speech coder than was previously available. A new quantisation method which is able to reduce the distortion from quantising speech spectral information has also been developed. When these new techniques are utilised, they effectively raise the synthetic speech quality of sinusoidal coders to a level comparable to that produced by CELP based schemes, making PS sinusoidal coders a promising alternative at low to medium bit rates.
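As context for the sinusoidal coding model discussed above, here is a minimal harmonic sinusoidal synthesis sketch in Python. It reproduces only the generic model (a voiced frame as a sum of pitch-harmonic sinusoids); the thesis's pitch-synchronous analysis and quantisation techniques are not shown, and all parameter values are illustrative.

```python
import numpy as np

def synthesize_sinusoidal(f0, amps, phases, fs, dur):
    """One voiced frame as a sum of sinusoids at multiples of the pitch f0,
    the basic model underlying sinusoidal speech coders."""
    t = np.arange(int(fs * dur)) / fs
    return sum(a * np.cos(2 * np.pi * (k + 1) * f0 * t + p)
               for k, (a, p) in enumerate(zip(amps, phases)))

# Example: a 20 ms voiced frame at 120 Hz with four harmonics.
frame = synthesize_sinusoidal(120.0, [1.0, 0.6, 0.4, 0.25],
                              [0.0, 0.5, 1.0, 1.5], fs=8000, dur=0.02)
```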
13

Spittle, Gary. "An investigation into improving speech intelligibility using binaural signal processing." Thesis, University of York, 2009. http://etheses.whiterose.ac.uk/1141/.

Abstract:
This thesis sets out to improve the intelligibility of a target speech sound source when presented with simultaneous masking sounds. A summary of the human hearing system and methods for spatialising sounds is provided as background to the problem. A detailed review of relevant research in auditory masking, auditory continuity and speech intelligibility is discussed. Angular separation and sound object differentiation through amplitude modification are used to enhance a target speech sound. A novel method is developed for achieving this using only the binaural signals received at the ears of a listener. This new approach is termed an auditory lens. Psychoacoustic evaluation of the auditory lens processing has shown comparable intelligibility scores to direct spatialisation techniques which require prior knowledge of sound source spectral content and direction. The success of the auditory lens has led to a number of potential further research projects that will take the processing system closer to a real wearable product.
14

Canagarajah, Cedric Nishanthan. "Digital signal processing techniques for speech enhancement in hearing aids." Thesis, University of Cambridge, 1993. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.260433.

15

Wang, Tianyu Tom. "Toward an interpretive framework of two-dimensional speech-signal processing." Thesis, Massachusetts Institute of Technology, 2011. http://hdl.handle.net/1721.1/65520.

Abstract:
Thesis (Ph. D.)--Harvard-MIT Division of Health Sciences and Technology, 2011.
Cataloged from PDF version of thesis.
Includes bibliographical references (p. 177-179).
Traditional representations of speech are derived from short-time segments of the signal and result in time-frequency distributions of energy such as the short-time Fourier transform and spectrogram. Speech-signal models of such representations have had utility in a variety of applications such as speech analysis, recognition, and synthesis. Nonetheless, they do not capture spectral, temporal, and joint spectrotemporal energy fluctuations (or "modulations") present in local time-frequency regions of the time-frequency distribution. Inspired by principles from image processing and evidence from auditory neurophysiological models, a variety of two-dimensional (2-D) processing techniques have been explored in the literature as alternative representations of speech; however, speech-based models are lacking in this framework. This thesis develops speech-signal models for a particular 2-D processing approach in which 2-D Fourier transforms are computed on local time-frequency regions of the canonical narrowband or wideband spectrogram; we refer to the resulting transformed space as the Grating Compression Transform (GCT). We argue for a 2-D sinusoidal-series amplitude modulation model of speech content in the spectrogram domain that relates to speech production characteristics such as pitch/noise of the source, pitch dynamics, formant structure and dynamics, and offset/onset content. Narrowband- and wideband-based models are shown to exhibit important distinctions in interpretation and oftentimes "dual" behavior. In the transformed GCT space, the modeling results in a novel taxonomy of signal behavior based on the distribution of formant and onset/offset content in the transformed space via source characteristics. Our formulation provides a speech-specific interpretation of the concept of "modulation" in 2-D processing in contrast to existing approaches that have done so either phenomenologically through qualitative analyses and/or implicitly through data-driven machine learning approaches. One implication of the proposed taxonomy is its potential for interpreting transformations of other time-frequency distributions such as the auditory spectrogram, which is generally viewed as being "narrowband"/"wideband" in its low/high-frequency regions. The proposed signal model is evaluated in several ways. First, we perform analysis of synthetic speech signals to characterize its properties and limitations. Next, we develop an algorithm for analysis/synthesis of spectrograms using the model and demonstrate its ability to accurately represent real speech content. As an example application, we further apply the models in cochannel speaker separation, exploiting the GCT's ability to distribute speaker-specific content and often recover overlapping information through demodulation and interpolation in the 2-D GCT space. Specifically, in multi-pitch estimation, we demonstrate the GCT's ability to accurately estimate separate and crossing pitch tracks under certain conditions. Finally, we demonstrate the model's ability to separate mixtures of speech signals using both prior and estimated pitch information. Generalization to other speech-signal processing applications is proposed.
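The following sketch illustrates the basic GCT computation described in the abstract, under simple assumptions (a synthetic harmonic source, one hand-picked patch, Hann windowing): a narrowband spectrogram is computed, a local time-frequency patch is windowed, and a 2-D Fourier transform is applied.

```python
import numpy as np
from scipy.signal import stft

fs = 16000
t = np.arange(fs) / fs
y = np.sign(np.sin(2 * np.pi * 120 * t))        # crude harmonic source, f0 = 120 Hz

# Narrowband spectrogram: a long window resolves individual harmonics.
f, tt, Z = stft(y, fs=fs, nperseg=512, noverlap=448)
S = np.log(np.abs(Z) + 1e-8)

# GCT: 2-D Fourier transform of a windowed local patch of the spectrogram.
patch = S[0:64, 32:96]                          # 64 frequency bins x 64 frames
patch = patch - patch.mean()                    # remove the DC offset first
w2d = np.outer(np.hanning(64), np.hanning(64))
gct = np.fft.fftshift(np.fft.fft2(patch * w2d))
# Peaks in |gct| encode the local harmonic spacing (pitch) and the
# orientation of spectrotemporal structure (pitch/formant dynamics).
```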
16

Larreategui, Mikel. "High-quality text-to-speech synthesis using sinusoidal techniques." Thesis, Staffordshire University, 1996. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.309790.

17

Lucey, Simon. "Audio-visual speech processing." Thesis, Queensland University of Technology, 2002. https://eprints.qut.edu.au/36172/7/SimonLuceyPhDThesis.pdf.

Abstract:
Speech is inherently bimodal, relying on cues from the acoustic and visual speech modalities for perception. The McGurk effect demonstrates that when humans are presented with conflicting acoustic and visual stimuli, the perceived sound may not exist in either modality. This effect has formed the basis for modelling the complementary nature of acoustic and visual speech by encapsulating them into the relatively new research field of audio-visual speech processing (AVSP). Traditional acoustic-based speech processing systems have attained a high level of performance in recent years, but the performance of these systems is heavily dependent on a match between training and testing conditions. In the presence of mismatched conditions (e.g., acoustic noise) the performance of acoustic speech processing applications can degrade markedly. AVSP aims to increase the robustness and performance of conventional speech processing applications through the integration of the acoustic and visual modalities of speech, in particular for the tasks of isolated word speech recognition and text-dependent speaker recognition. Two major problems in AVSP are addressed in this thesis. The first concerns the extraction of pertinent visual features for effective speech reading and visual speaker recognition; appropriate representations of the mouth are explored for improved classification performance for speech and speaker recognition. Secondly, there is the question of how to effectively integrate the acoustic and visual speech modalities for robust and improved performance. This question is explored in depth using hidden Markov model (HMM) classifiers. The development and investigation of integration strategies for AVSP required research into a new branch of pattern recognition known as classifier combination theory. A novel framework is presented for optimally combining classifiers so that their combined performance is greater than that of any of those classifiers individually. The benefits of this framework are not restricted to AVSP, as they can be applied to any task where there is a need to combine independent classifiers.
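As an illustration of the late-integration idea behind combining acoustic and visual classifiers, the sketch below fuses two streams' class posteriors with a reliability weight. This is a generic weighted-product rule, not the specific optimal combination framework developed in the thesis; the posterior values and the weight are made up.

```python
import numpy as np

def fuse_posteriors(p_audio, p_video, alpha=0.7):
    """Weighted-product fusion of two classifiers' class posteriors.

    alpha weights the acoustic stream; under acoustic noise a smaller alpha
    shifts reliance toward the visual stream.
    """
    fused = (p_audio ** alpha) * (p_video ** (1.0 - alpha))
    return fused / fused.sum()

p_a = np.array([0.6, 0.3, 0.1])   # acoustic HMM posteriors over 3 words
p_v = np.array([0.2, 0.7, 0.1])   # visual HMM posteriors
print(fuse_posteriors(p_a, p_v))  # combined decision favours word 0 or 1
```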
18

Allred, Daniel Jackson. "Evaluation and Comparison of Beamforming Algorithms for Microphone Array Speech Processing." Thesis, Georgia Institute of Technology, 2006. http://hdl.handle.net/1853/11606.

Abstract:
Recent years have brought many new developments in the processing of speech and acoustic signals. Despite this, the process of acquiring signals has gone largely unchanged. Adding spatial diversity to the repertoire of signal acquisition has long been known to offer advantages for further processing. Until recently, mobile devices lacked the processing capability to handle the computation required by such additional streams of information, but current processors can absorb the extra workload introduced by multiple sensors on a mobile device without being over-burdened. How these extra data streams can best be handled is still an open question. The present work examines one type of spatial processing technique, known as beamforming. A microphone array test platform is constructed and verified through a number of beamforming algorithms. Issues related to speech acquisition through microphone arrays are discussed. The algorithms used for verification are presented in detail and compared to one another.
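The simplest member of the beamforming family that such a test platform would verify is the delay-and-sum beamformer, sketched below for a linear array. Frequency-domain fractional delays align the channels for a chosen steering angle before averaging; the array geometry and steering angle are supplied by the caller, and a far-field arrival model is assumed.

```python
import numpy as np

def delay_and_sum(mics, fs, mic_x, angle_deg, c=343.0):
    """Steer a linear microphone array toward angle_deg by delaying and
    averaging channels (fractional delays applied in the frequency domain).

    mics: (n_mics, n_samples) array; mic_x: mic positions along the array (m).
    """
    n = mics.shape[1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    # Per-channel compensation delay for a plane wave from angle_deg.
    delays = mic_x * np.sin(np.deg2rad(angle_deg)) / c
    spectra = np.fft.rfft(mics, axis=1)
    aligned = spectra * np.exp(2j * np.pi * freqs * delays[:, None])
    return np.fft.irfft(aligned.mean(axis=0), n=n)

# Example geometry: 4 mics with 5 cm spacing, steered 20 degrees off broadside.
mic_x = np.arange(4) * 0.05
# enhanced = delay_and_sum(recordings, fs=16000, mic_x=mic_x, angle_deg=20.0)
```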
19

Mészáros, Tomáš. "Speech Analysis for Processing of Musical Signals." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2015. http://www.nusl.cz/ntk/nusl-234974.

Abstract:
The main goal of this work is to enrich musical signals with the characteristics of human speech. The work covers the creation of an audio effect inspired by the talk-box: analysis of the vocal tract with a suitable algorithm such as linear prediction, and application of the estimated filter to a musical audio signal. Emphasis is placed on excellent output quality, low latency, and low computational cost for real-time use. The outcome of the work is a software plugin usable in professional audio-editing applications and, given a suitable hardware platform, for live performance as well. The plugin emulates a real talk-box device and provides similar output quality with a unique sound.
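A minimal frame-by-frame version of the LPC-based cross-synthesis described above might look like the following, with placeholder file names and no overlap-add smoothing (a real-time plugin would add overlapping windows and block-wise streaming):

```python
import numpy as np
import librosa
from scipy.signal import lfilter

voice, sr = librosa.load("voice.wav", sr=16000)   # the "talking" signal
music, _ = librosa.load("guitar.wav", sr=16000)   # the excitation signal
n = min(len(voice), len(music))
frame = 512
out = np.zeros(n)

for start in range(0, n - frame, frame):
    v = voice[start:start + frame] * np.hanning(frame)
    a = librosa.lpc(v, order=16)                  # all-pole vocal-tract estimate
    excitation = music[start:start + frame]
    gain = np.sqrt(np.mean(v ** 2) + 1e-12)       # match the voice's energy
    # Filter the music through the vocal-tract filter 1/A(z).
    out[start:start + frame] = gain * lfilter([1.0], a, excitation)
```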
20

Elvira, Jose M. "Neural networks for speech and speaker recognition." Thesis, Staffordshire University, 1994. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.262314.

21

Oberhofer, Robert. "Pitch adaptive variable bitrate CELP speech coding." Thesis, University of Ulster, 1998. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.264811.

22

Sepehr, H. "Advanced adaptive signal processing techniques for low complexity speech enhancement applications." Thesis, University College London (University of London), 2011. http://discovery.ucl.ac.uk/1306808/.

Abstract:
This thesis research is focused on using subband and multirate adaptive signal processing techniques to develop practical speech enhancement algorithms, and comprises research on three different speech enhancement applications. Firstly, the design of a novel method for attenuation of a siren signal in an emergency telephony system (using single-source siren noise reduction algorithms) is investigated. The proposed method is based on wavelet filter banks and a series of adaptive notch filters that detect and attenuate the siren noise signal with minimal effect on the quality of the speech signal. Test results show that this algorithm provides superior results in comparison to prior-art solutions. Secondly, the effect of the time and frequency resolution of a filter bank used in a statistical single-source noise reduction algorithm is investigated. Following this study, a novel method for improvement of a time-domain noise reduction algorithm is presented. The suggested method is based on detection of transient elements of the speech signal followed by a time-varying, signal-dependent filter bank. This structure provides high time resolution at points of transients in a noisy speech signal, so temporal smearing of the processed signal is avoided, and high frequency resolution at other times, which results in a well-performing noise reduction algorithm; benchmarking against a prior-art algorithm and a commercially available noise reduction solution shows better performance of the proposed algorithm. The time-domain nature of the algorithm provides low processing delay, making it suitable for applications with low latency requirements such as hearing aid devices. Thirdly, a low-footprint delayless subband adaptive filtering algorithm is proposed for applications with low processing delay requirements such as echo cancellation (EC) in telephony networks. The suggested algorithm saves substantial memory and MIPS and provides a significantly faster convergence rate in comparison with prior-art algorithms. Finally, challenges and issues in implementing real-time audio signal processing algorithms on DSP chipsets (especially low-power DSPs) are briefly explained, and some applications of the research conducted in this thesis are presented.
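To make the first application concrete, the sketch below attenuates a fixed-frequency siren tone with a single IIR notch filter. It is a simplification: the thesis tracks the sweeping siren frequency and retunes such notches adaptively within a wavelet filter-bank decomposition, whereas here the tone frequency, Q factor, and signals are all illustrative.

```python
import numpy as np
from scipy.signal import iirnotch, lfilter

fs = 8000
t = np.arange(2 * fs) / fs
speech_like = 0.1 * np.random.randn(len(t))   # placeholder for a speech signal
siren = 0.5 * np.sin(2 * np.pi * 960 * t)     # one siren tone component at 960 Hz
noisy = speech_like + siren

b, a = iirnotch(w0=960, Q=30, fs=fs)          # narrow notch centred at 960 Hz
cleaned = lfilter(b, a, noisy)                # tone attenuated, speech band intact
```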
23

Fallatah, Anwar. "Speech Auditory Brainstem Response Signal Processing: Estimation, Modeling, Detection, and Enhancement." Thesis, Université d'Ottawa / University of Ottawa, 2019. http://hdl.handle.net/10393/39699.

Abstract:
The speech auditory brainstem response (sABR) is a promising technique for assessing the function of the auditory system. This non-invasive technique has shown utility as a marker of central processing disorders and some types of learning difficulties in children, and potentially for fitting hearing aids. However, the sABR needs a long recording time to obtain a reliable signal due to the high background noise, which limits its clinical applicability. The objective of this work is to develop methods to detect the sABR in high background noise and enhance it, based on a modeling approach and through experimental testing. First, sABR noise estimation based on LQ/QR decomposition is derived, and its mathematical proof is shown. Second, an autoregression model is used to estimate the single-trial sABR, which is then used to test several sABR detection and enhancement methods. Third, a novel Artificial Neural Network (ANN) based detection approach is proposed and compared, using modeled and recorded data, to other detection methods in the literature: the Optimal Linear Filter (LF), Online Estimator (OE), Mutual Information (MI) and an Artificial Neural Network based on the Discrete Wavelet Transform and Approximate Entropy (ANN DA). Finally, a comprehensive evaluation of several sABR enhancement methods is performed, based on the Wiener Filter (WF), Maximum-SNR Filter (Max-SNR), and Adaptive Noise Cancellation (ANC) with the Least-Mean-Square (LMS), Affine Projection (AP) and Recursive-Least-Square (RLS) adaptation algorithms. The results show that the noise estimated by the developed LQ/QR decomposition is similar to the actual noise, and the modeled data are statistically similar to the recorded data. Moreover, the proposed ANN-based detection method is more accurate and requires less processing time than the other methods, and the comprehensive evaluation of enhancement methods shows that RLS has the best overall performance in enhancing the sABR. The methods developed and evaluated in this work therefore have the potential to reduce the required recording time for the sABR, and thus make it more practical as a clinical tool.
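Of the enhancement methods compared above, adaptive noise cancellation is the easiest to sketch. The following NLMS-based canceller (a simpler relative of the LMS/AP/RLS variants evaluated in the thesis) estimates the noise in the primary channel from a correlated reference channel and subtracts it; the filter order and step size are illustrative.

```python
import numpy as np

def nlms_anc(primary, reference, order=32, mu=0.5, eps=1e-8):
    """Adaptive noise cancellation: predict the noise in `primary` from a
    correlated `reference` channel and subtract it (NLMS weight update)."""
    w = np.zeros(order)
    out = np.zeros(len(primary))
    for n in range(order, len(primary)):
        x = reference[n - order:n][::-1]   # most recent reference samples
        y = w @ x                          # noise estimate
        e = primary[n] - y                 # enhanced sample (error signal)
        w += mu * e * x / (x @ x + eps)    # normalized step keeps it stable
        out[n] = e
    return out

# out = nlms_anc(sabr_recording, background_noise_channel)
```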
24

Trinkaus, Trevor R. "Perceptual coding of audio and diverse speech signals." Diss., Georgia Institute of Technology, 1999. http://hdl.handle.net/1853/13883.

25

Rao, Hrishikesh. "Paralinguistic event detection in children's speech." Diss., Georgia Institute of Technology, 2015. http://hdl.handle.net/1853/54332.

Abstract:
Paralinguistic events are useful indicators of the affective state of a speaker. In children's speech, these cues are used to form social bonds with caregivers, and they have also been found useful in the very early detection of developmental disorders such as autism spectrum disorder (ASD). Prior work on children's speech has relied on a limited number of subjects whose recordings lack sufficient diversity in the types of vocalizations produced. Also, the features that are necessary to understand the production of paralinguistic events are not fully understood. Since there is no off-the-shelf solution for detecting instances of laughter and crying in children's speech, the focus of this thesis is to investigate and develop signal processing algorithms to extract acoustic features and apply machine learning algorithms to various corpora. Results obtained using baseline spectral and prosodic features indicate that a combination of spectral, prosodic, and dysphonation-related features is needed to detect laughter and whining in toddlers' speech across different age groups and recording environments. Long-term features were found useful for capturing the periodic properties of laughter in adults' and children's speech and detected instances of laughter with a high degree of accuracy. Finally, the thesis examines the use of multi-modal information, combining acoustic features with computer vision-based smile-related features, to detect instances of laughter and to reduce false positives in adults' and children's speech. The fusion of the features improved the accuracy and recall rates over using either of the two modalities on its own.
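A baseline version of the feature recipe named above (short-term spectral plus prosodic descriptors, pooled over a segment) can be sketched with librosa; the file name is a placeholder, and the dysphonation-related features used in the thesis are omitted.

```python
import numpy as np
import librosa

y, sr = librosa.load("segment.wav", sr=16000)          # one candidate segment

mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)     # spectral descriptors
f0 = librosa.yin(y, fmin=80, fmax=500, sr=sr)          # prosodic: pitch track
energy = librosa.feature.rms(y=y)                      # prosodic: energy

# Pool frame-level features into one segment-level vector.
features = np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1),
                           [np.nanmean(f0), np.nanstd(f0)],
                           [energy.mean(), energy.std()]])
# `features` would feed a segment-level laughter/whine classifier.
```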
26

Lu, Nan. "Development of new digital signal processing procedures and applications to speech, electromyography and image processing." Thesis, University of Liverpool, 2008. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.445962.

27

Smith, Daniel. "An analysis of blind signal separation for real time application." Access electronically, 2006. http://www.library.uow.edu.au/adt-NWU/public/adt-NWU20070815.152400/index.html.

28

Ikram, Muhammad Zubair. "Multichannel blind separation of speech signals in a reverberant environment." Diss., Georgia Institute of Technology, 2001. http://hdl.handle.net/1853/15023.

29

Nayfeh, Taysir H. "Multi-signal processing for voice recognition in noisy environments." Thesis, This resource online, 1991. http://scholar.lib.vt.edu/theses/available/etd-10222009-125021/.

30

Chan, Arthur Yu Chung. "Robust speech recognition against unknown short-time noise." View Abstract or Full-Text, 2002. http://library.ust.hk/cgi/db/thesis.pl?ELEC%202002%20CHAN.

Abstract:
Thesis (M. Phil.)--Hong Kong University of Science and Technology, 2002.
Includes bibliographical references (leaves 119-125). Also available in electronic version. Access restricted to campus users.
31

Doukas, Nikolaos. "Voice activity detection using energy based measures and source separation." Thesis, Imperial College London, 1998. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.245220.

32

Kale, Kaustubh R. "Low complexity, narrow baseline beamformer for hand-held devices." [Gainesville, Fla.] : University of Florida, 2003. http://purl.fcla.edu/fcla/etd/UFE0001223.

33

Hild, Kenneth E. "Blind separation of convolutive mixtures using Renyi's divergence." [Gainesville, Fla.] : University of Florida, 2003. http://purl.fcla.edu/fcla/etd/UFE0002387.

34

Ertan, Ali Erdem. "Pitch-synchronous processing of speech signal for improving the quality of low bit rate speech coders." Diss., Georgia Institute of Technology, 2004. http://hdl.handle.net/1853/36534.

35

Ertan, Ali Erdem. "Pitch-synchronous processing of speech signal for improving the quality of low bit rate speech coders." Diss., Georgia Institute of Technology, 2003. Available online: http://etd.gatech.edu/theses/available/etd-06072004-131138/unrestricted/ertan%5Fali%5Fe%5F200405%5Fphd.pdf.

Abstract:
Thesis (Ph. D.)--School of Electrical and Computer Engineering, Georgia Institute of Technology, 2004. Directed by Thomas P. Barnwell, III.
Vita. Includes bibliographical references (leaves 221-226).
36

Bakheet, Mohammed. "Improving Speech Recognition for Arabic language Using Low Amounts of Labeled Data." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-176437.

Abstract:
The importance of Automatic Speech Recognition (ASR) systems, whose job is to generate text from audio, is increasing as the number of applications of these systems rapidly grows. However, training ASR systems is difficult and rather tedious, which can be attributed to the lack of training data. ASRs require huge amounts of annotated training data containing the audio files and the corresponding, accurately written transcript files. This annotated (labeled) training data is very difficult to find for most languages; producing it usually requires manual annotation which, apart from its monetary cost, is error-prone. A supervised training task is impractical for this scenario. The Arabic language is one of the languages that do not have an abundance of labeled data, which makes the accuracy of its ASR systems very low compared to resource-rich languages such as English, French, or Spanish. In this research, we take advantage of unlabeled voice data by learning general data representations from unlabeled training data (audio files only) in a self-supervised pre-training phase. This phase uses the wav2vec 2.0 framework, which masks the input in the latent space and solves a contrastive task. The model is then fine-tuned on a small amount of labeled data. We also exploit models that have been pre-trained on different languages with wav2vec 2.0, fine-tuning them on annotated Arabic data. We show that using the wav2vec 2.0 framework for pre-training on Arabic is considerably time- and resource-consuming: it took the model 21.5 days (about 3 weeks) to complete 662 epochs and reach a validation accuracy of 58%. Arabic is a right-to-left (RTL) language with many diacritics that indicate how letters should be pronounced; these two features make it difficult to fit Arabic into these models, as heavy pre-processing of the transcript files is required. We demonstrate that we can fine-tune a cross-lingual model, trained on raw waveforms of speech in multiple languages, on Arabic data and obtain a low word error rate of 36.53%. We also show that by fine-tuning the model parameters we can increase the accuracy, decreasing the word error rate from 54.00% to 36.69%.
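For orientation, the sketch below shows the wav2vec 2.0 CTC inference loop with the Hugging Face transformers API. It loads an English checkpoint purely to illustrate the mechanics; the thesis instead fine-tunes cross-lingual (XLSR) checkpoints on annotated Arabic data, and the audio file name is a placeholder.

```python
import torch
import soundfile as sf
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

# An English checkpoint with a trained CTC head, used only to show the API.
processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

speech, sr = sf.read("sample.wav")                    # 16 kHz mono expected
inputs = processor(speech, sampling_rate=sr, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits        # per-frame token scores
pred_ids = torch.argmax(logits, dim=-1)               # greedy CTC decoding
print(processor.batch_decode(pred_ids))
```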
37

Birkenes, Øystein. "A Framework for Speech Recognition using Logistic Regression." Doctoral thesis, Norwegian University of Science and Technology, Faculty of Information Technology, Mathematics and Electrical Engineering, 2007. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-1599.

Abstract:

Although discriminative approaches like the support vector machine or logistic regression have had great success in many pattern recognition applications, they have achieved only limited success in speech recognition. Two of the difficulties often encountered are that 1) speech signals typically have variable lengths, and 2) speech recognition is a sequence labeling problem, where each spoken utterance corresponds to a sequence of words or phones.

In this thesis, we present a framework for automatic speech recognition using logistic regression. We solve the difficulty of variable length speech signals by including a mapping in the logistic regression framework that transforms each speech signal into a fixed-dimensional vector. The mapping is defined either explicitly with a set of hidden Markov models (HMMs) for use in penalized logistic regression (PLR), or implicitly through a sequence kernel to be used with kernel logistic regression (KLR). Unlike previous work that has used HMMs in combination with a discriminative classification approach, we jointly optimize the logistic regression parameters and the HMM parameters using a penalized likelihood criterion.

Experiments show that joint optimization improves the recognition accuracy significantly. The sequence kernel we present is motivated by the dynamic time warping (DTW) distance between two feature vector sequences. Instead of considering only the optimal alignment path, we sum up the contributions from all alignment paths. Preliminary experiments with the sequence kernel show promising results.

A two-step approach is used for handling the sequence labeling problem. In the first step, a set of HMMs is used to generate an N-best list of sentence hypotheses for a spoken utterance. In the second step, these sentence hypotheses are rescored using logistic regression on the segments in the N-best list. A garbage class is introduced in the logistic regression framework in order to get reliable probability estimates for the segments in the N-best lists. We present results on both a connected digit recognition task and a continuous phone recognition task.
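For reference, the classic DTW distance that motivates the sequence kernel is sketched below; the kernel itself differs by summing contributions over all alignment paths rather than taking only the best one.

```python
import numpy as np

def dtw_distance(x, y):
    """Dynamic time warping between two feature-vector sequences: the cost
    of the best monotonic alignment under Euclidean frame distances."""
    nx, ny = len(x), len(y)
    D = np.full((nx + 1, ny + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, nx + 1):
        for j in range(1, ny + 1):
            cost = np.linalg.norm(x[i - 1] - y[j - 1])
            D[i, j] = cost + min(D[i - 1, j],      # insertion
                                 D[i, j - 1],      # deletion
                                 D[i - 1, j - 1])  # match
    return D[nx, ny]

# E.g., two MFCC sequences of different lengths:
a = np.random.randn(40, 13)
b = np.random.randn(55, 13)
print(dtw_distance(a, b))
```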

38

Sukittanon, Somsak. "Modulation scale analysis : theory and application for nonstationary signal classification /." Thesis, Connect to this title online; UW restricted, 2004. http://hdl.handle.net/1773/5875.

39

Wilson, Leslie. "The Music Muse." Thesis, Virginia Tech, 1996. http://hdl.handle.net/10919/36769.

Abstract:
Ever wonder why two people can sing the same note with the same loudness, but sound completely different? Middle C is middle C no matter who sings it, yet for some reason Luciano Pavarotti's middle C sounds richer and more beautiful than Bob Dylan's middle C, for example. But then again, what is beauty in singing? It is a completely biased and abstract concept. To some, Bob Dylan's voice may epitomize tonal beauty, while to others his voice may be comparable to fingernails on a chalkboard. In any case, differences in tone quality, or timbre, are due to differences in the spectral characteristics of different voices. The Music Muse is a computer program designed to help singers train their voices by showing them the individual components of their voices that combine to produce timbre. In paintings, many colors are combined to produce different hues and shades of color, and the individual colors that make up the hue are difficult to distinguish. Similarly in music, harmonics with varying amplitudes combine to create voice colors, or timbres, and these individual harmonics are difficult to distinguish by ear alone. The Music Muse splits the voice into its harmonic components by means of a Fourier transform. The transformed data is then plotted on a harmonic spectrum, from which singers can observe the number of harmonics in their tone and their amplitudes relative to one another. It is these spectral characteristics that are important to voice timbre. The amplitudes of the harmonics in a voiced tone are determined by the resonant frequencies of the vocal tract. These resonances are called formants. When a harmonic produced by the vocal cords has a frequency at or near a formant frequency, it is amplified. Formants are determined by the length, size, and shape of the vocal tract. These parameters differ from person to person and change during articulation. Optimal tonal quality during singing is obtained by placing formants at a desired frequency. The Music Muse calculates the formants of the voice by means of cepstral analysis, and the formants are then plotted. With this tool, singers can learn how to place their formants. One of the difficulties of voice training is that singing is rated on a scale of quality, which is difficult to quantify, and feedback tends to be biased, and therefore subjective in nature. The Music Muse provides singers with the technology to quantify quality to a degree that makes it less of an abstract concept, and therefore more attainable.
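The two analyses described above can be sketched in a few lines of Python: a harmonic spectrum from the Fourier transform of a windowed note, and a smoothed spectral envelope (whose peaks, for a real sung note, fall near the formants) from low-pass liftering of the real cepstrum. The synthetic note, window length, and lifter cutoff are illustrative.

```python
import numpy as np

fs, f0 = 16000, 220.0
t = np.arange(2048) / fs
# Synthetic "sung" note: four harmonics with decreasing amplitudes.
tone = sum(a * np.sin(2 * np.pi * k * f0 * t)
           for k, a in enumerate([1.0, 0.5, 0.3, 0.2], start=1))

# Harmonic spectrum via the Fourier transform of a windowed frame.
spectrum = np.abs(np.fft.rfft(tone * np.hanning(len(tone))))
log_spec = np.log(spectrum + 1e-10)

# Real cepstrum, low-pass liftered to keep only slow spectral variation.
cepstrum = np.fft.irfft(log_spec)
cepstrum[40:] = 0.0                     # one-sided lifter; shape is preserved
envelope = np.fft.rfft(cepstrum).real   # smooth log-envelope of the spectrum
```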
40

George, E. Bryan. "An analysis-by-synthesis approach to sinusoidal modeling applied to speech and music signal processing." Diss., Georgia Institute of Technology, 1991. http://hdl.handle.net/1853/15747.

41

Salvi, Giampiero. "Mining Speech Sounds : Machine Learning Methods for Automatic Speech Recognition and Analysis." Doctoral thesis, Stockholm : KTH School of Computer Science and Comunication, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-4111.

42

Skjei, Thomas. "Real-Time Fundamental Frequency Estimation Algorithm for Disconnected Speech." VCU Scholars Compass, 2011. http://scholarscompass.vcu.edu/etd/191.

Abstract:
A new algorithm is presented for real-time fundamental frequency estimation of speech signals. This method extends and alters the YIN algorithm, which uses the autocorrelation-based difference function, by adding features to reduce latency, correct predictable errors, and make it structurally appropriate for real-time processing scenarios. The algorithm is shown to reduce the error rate of its predecessor while demonstrating latencies sufficient for real-time processing. The results indicate that the algorithm can be realized as a real-time estimator of spoken pitch and pitch variation, which has applications including diagnosis and biofeedback-based therapy of many speech disorders.
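For context, the core of the baseline YIN algorithm that this work extends is sketched below: the cumulative-mean-normalized difference function with an absolute threshold. The thesis's latency reductions and error corrections are not reproduced, and the parameter values are typical defaults rather than the author's.

```python
import numpy as np

def yin_f0(frame, fs, fmin=60.0, fmax=400.0, threshold=0.15):
    """Baseline YIN: difference function, cumulative-mean normalization,
    and the first dip below an absolute threshold."""
    max_tau = int(fs / fmin)
    min_tau = int(fs / fmax)
    d = np.array([np.sum((frame[:-tau] - frame[tau:]) ** 2) if tau else 0.0
                  for tau in range(max_tau + 1)])
    cmnd = np.ones_like(d)
    running = np.cumsum(d[1:])
    cmnd[1:] = d[1:] * np.arange(1, max_tau + 1) / np.maximum(running, 1e-12)
    for tau in range(min_tau, max_tau + 1):
        if cmnd[tau] < threshold:
            while tau + 1 <= max_tau and cmnd[tau + 1] < cmnd[tau]:
                tau += 1                  # walk down to the local minimum
            return fs / tau
    return fs / (min_tau + np.argmin(cmnd[min_tau:]))  # fallback: global min

fs = 16000
frame = np.sin(2 * np.pi * 160.0 * np.arange(2048) / fs)
print(yin_f0(frame, fs))   # approximately 160 Hz
```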
43

Faubel, Friedrich [author], and Dietrich Klakow [academic supervisor]. "Statistical signal processing techniques for robust speech recognition." Saarbrücken: Saarländische Universitäts- und Landesbibliothek, 2016. http://d-nb.info/1090875703/34.

44

Chan, C. F. "Low bit-rate speech coding : A parallel processing approach using digital signal processors." Thesis, University of Essex, 1986. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.375652.

45

Meyer, Georg. "Models of neurons in the ventral cochlear nucleus : signal processing and speech recognition." Thesis, Keele University, 1993. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.334715.

46

Ota, Kenko (大田, 健紘). "Studies in signal processing for robust speech recognition in noisy and reverberant environments." Thesis, Doshisha University, 2008. https://doors.doshisha.ac.jp/opac/opac_link/bibid/BB10268908/?lang=0.

47

McLurg, Craig J. (Craig James). "Hardware and software for a speech and signal processing subsystem for a multiprocessor." Dissertation (Electrical Engineering), Carleton University, Ottawa, 1987.

48

Tryfou, Georgina. "Time-frequency reassignment for acoustic signal processing. From speech to singing voice applications." Doctoral thesis, University of Trento, 2017. http://eprints-phd.biblio.unitn.it/2562/2/PhD-Thesis.pdf.

Abstract:
The various time-frequency (TF) representations of acoustic signals share the common objective of describing the temporal evolution of the spectral content of the signal, i.e., how the energy, or intensity, of the signal changes over time. Many TF representations have been proposed in the past, and among them the short-time Fourier transform (STFT) is the one most commonly found at the core of acoustic signal processing techniques. However, certain problems that arise from the use of the STFT have been extensively discussed in the literature. These problems concern the unavoidable trade-off between time and frequency resolution, and the fact that the selected resolution is fixed over the whole spectrum. In order to improve upon the spectrogram, several variations have been proposed over time. One of these variations stems from a promising method called reassignment. According to this method, the traditional spectrogram, as obtained from the STFT, is reassigned to a sharper representation called the Reassigned Spectrogram (RS). In this thesis we elaborate on approaches that utilize the RS as the TF representation of acoustic signals, and we exploit this representation in the context of different applications, for instance speech recognition and melody extraction. The first contribution of this work is a method for speech parametrization which results in a set of acoustic features called time-frequency reassigned cepstral coefficients (TFRCC). Experimental results show the ability of TFRCC features to capture higher-level characteristics of speech, a fact that leads to advantages in phone-level speech segmentation and speech recognition. The second contribution is the use of the RS as the basis for extracting objective quality measures, in particular the reassigned cepstral distance and the reassigned point-wise distance. Both measures are used for channel selection (CS), following our proposal to perform objective quality measure based CS to improve the accuracy of speech recognition in a multi-microphone reverberant environment. The final contribution of this work is a method to detect harmonic pitch contours from singing voice signals, using a dominance weighting of the RS. This method has been exploited in the context of melody extraction from polyphonic music signals.
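A reassigned spectrogram of the kind described above can be computed directly with librosa, as sketched below on a synthetic chirp; the STFT parameters are illustrative.

```python
import numpy as np
import librosa

# Instead of placing each STFT cell's energy at its bin center, reassignment
# moves it to the estimated center of gravity in time and frequency,
# sharpening the representation of tonal and transient structure.
fs = 16000
t = np.arange(fs) / fs
y = np.sin(2 * np.pi * (200 + 100 * t) * t)          # test chirp

freqs, times, mags = librosa.reassigned_spectrogram(y, sr=fs, n_fft=512)
# freqs/times hold, per STFT cell, the reassigned coordinates; mags the
# energies. The thesis builds cepstral-style features (TFRCC) on such grids.
```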
49

Bakir, Tariq Saad. "Blind adaptive dereverberation of speech signals using a microphone array." Diss., Georgia Institute of Technology, 2004. Available online: http://etd.gatech.edu/theses/available/etd-06072004-131047/unrestricted/bakir%5Ftariq%5Fs%5F200405%5Fphd.pdf.

50

Leis, John W. "Spectral coding methods for speech compression and speaker identification." Thesis, Queensland University of Technology, 1998. https://eprints.qut.edu.au/36062/7/36062_Digitised_Thesis.pdf.

Abstract:
This thesis investigates aspects of encoding the speech spectrum at low bit rates, with extensions to the effect of such coding on automatic speaker identification. Vector quantization (VQ) is a technique for jointly quantizing a block of samples at once, in order to reduce the bit rate of a coding system. The major drawback in using VQ is the complexity of the encoder. Recent research has indicated the potential applicability of the VQ method to speech when product code vector quantization (PCVQ) techniques are utilized. The focus of this research is the efficient representation, calculation and utilization of the speech model as stored in the PCVQ codebook. In this thesis, several VQ approaches are evaluated, and the efficacy of two training algorithms is compared experimentally. It is then shown that these product-code vector quantization algorithms may be augmented with lossless compression algorithms, thus yielding an improved overall compression rate. An approach using a statistical model for the vector codebook indices for subsequent lossless compression is introduced. This coupling of lossy and lossless compression enables further compression gain. It is demonstrated that this approach is able to reduce the bit rate requirement from the current 24 bits per 20 millisecond frame to below 20, using a standard spectral distortion metric for comparison. Several fast-search VQ methods for use in speech spectrum coding have been evaluated. The usefulness of fast-search algorithms is highly dependent upon the source characteristics and, although previous research has been undertaken for coding of images using VQ codebooks trained with the source samples directly, the product-code structured codebooks for speech spectrum quantization place new constraints on the search methodology. The second major focus of the research is an investigation of the effect of low-rate spectral compression methods on the task of automatic speaker identification. The motivation for this aspect of the research arose from a need to simultaneously preserve speech quality and intelligibility and to provide for machine-based automatic speaker recognition using the compressed speech. This is important because there are several emerging applications of speaker identification where compressed speech is involved. Examples include mobile communications, where the speech has been highly compressed, or where a database of speech material has been assembled and stored in compressed form. Although these two application areas have the same objective, that of maximizing the identification rate, the starting points are quite different. On the one hand, the speech material used for training the identification algorithm may or may not be available in compressed form. On the other hand, the new test material on which identification is to be based may only be available in compressed form. Using the spectral parameters which have been stored in compressed form, two main classes of speaker identification algorithm are examined. Some studies have been conducted in the past on bandwidth-limited speaker identification, but the use of short-term spectral compression deserves separate investigation. Combining the major aspects of the research, some important design guidelines for the construction of an identification model based on the use of compressed speech are put forward.
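As a concrete picture of the VQ training step discussed above, the sketch below trains a codebook with plain k-means, a stand-in for the LBG-style algorithms compared in the thesis; the training vectors are random placeholders for 10-dimensional spectral parameter vectors, and a single full codebook is trained rather than the product-code structure the thesis studies.

```python
import numpy as np

def train_vq_codebook(vectors, bits, iters=20, rng=None):
    """Train a VQ codebook of size 2**bits with plain k-means."""
    rng = rng or np.random.default_rng(0)
    codebook = vectors[rng.choice(len(vectors), 2 ** bits, replace=False)]
    for _ in range(iters):
        # Nearest-codeword assignment via expanded squared distances.
        d = ((vectors ** 2).sum(1, keepdims=True)
             - 2.0 * vectors @ codebook.T
             + (codebook ** 2).sum(1))
        idx = d.argmin(1)
        # Centroid update, skipping empty cells.
        for k in range(len(codebook)):
            if np.any(idx == k):
                codebook[k] = vectors[idx == k].mean(0)
    return codebook, idx

# Example: an 8-bit codebook over placeholder 10-dimensional spectral vectors.
train = np.random.default_rng(1).standard_normal((5000, 10))
cb, idx = train_vq_codebook(train, bits=8)
```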