Dissertations / Theses on the topic 'Gaussian mixture model (GMM)'

Consult the top 17 dissertations / theses for your research on the topic 'Gaussian mixture model (GMM)'.

1

Zhao, David Yuheng. "Model Based Speech Enhancement and Coding." Doctoral thesis, Stockholm : Kungliga Tekniska högskolan, 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-4412.

2

Tomashenko, Natalia. "Speaker adaptation of deep neural network acoustic models using Gaussian mixture model framework in automatic speech recognition systems." Thesis, Le Mans, 2017. http://www.theses.fr/2017LEMA1040/document.

Abstract:
Differences between training and testing conditions may significantly degrade recognition accuracy in automatic speech recognition (ASR) systems. Adaptation is an efficient way to reduce the mismatch between models and data from a particular speaker or channel. There are two dominant types of acoustic models (AMs) used in ASR: Gaussian mixture models (GMMs) and deep neural networks (DNNs). The GMM hidden Markov model (GMM-HMM) approach has been one of the most common techniques in ASR systems for many decades. Speaker adaptation is very effective for these AMs, and various adaptation techniques have been developed for them. On the other hand, DNN-HMM AMs have recently achieved major advances and outperformed GMM-HMM models on various ASR tasks. However, speaker adaptation is still very challenging for these AMs. Many adaptation algorithms that work well for GMM systems cannot be easily applied to DNNs because of the different nature of these models. The main purpose of this thesis is to develop a method for efficient transfer of adaptation algorithms from the GMM framework to DNN models. A novel approach for speaker adaptation of DNN AMs is proposed and investigated. The idea of this approach is based on using so-called GMM-derived features as input to a DNN. The proposed technique provides a general framework for transferring adaptation algorithms, developed for GMMs, to DNN adaptation. It is explored for various state-of-the-art ASR systems and is shown to be effective in comparison with other speaker adaptation techniques and complementary to them.
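To make the idea of GMM-derived features concrete, here is a minimal sketch (illustrative only, not the thesis implementation; the data, dimensions and component count are placeholders): per-frame posteriors under a GMM become the DNN's input, so GMM-level adaptation, e.g. MAP-updated means, adapts the DNN's input space without retraining the network.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Stand-in for 13-dim MFCC-like acoustic frames; a real system would
# extract these from audio.
rng = np.random.default_rng(0)
frames = rng.normal(size=(5000, 13))

# Fit a GMM on the acoustic frames.
gmm = GaussianMixture(n_components=64, covariance_type="diag",
                      random_state=0).fit(frames)

# GMM-derived features: per-frame component posteriors. Swapping in a
# speaker-adapted GMM here changes the DNN inputs, not the DNN weights.
dnn_inputs = gmm.predict_proba(frames)   # shape (5000, 64)
```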
3

Miyajima, Chiyomi, Yoshihiro Nishiwaki, Koji Ozawa, et al. "Driver Modeling Based on Driving Behavior and Its Evaluation in Driver Identification." IEEE, 2007. http://hdl.handle.net/2237/9623.

4

Jose, Neenu. "SPEAKER AND GENDER IDENTIFICATION USING BIOACOUSTIC DATA SETS." UKnowledge, 2018. https://uknowledge.uky.edu/ece_etds/120.

Abstract:
Acoustic analysis of animal vocalizations has been widely used to identify the presence of individual species, classify vocalizations, identify individuals, and determine gender. In this work, automatic identification of the speaker and gender of mice from ultrasonic vocalizations, and speaker identification of meerkats from their close calls, are investigated. Feature extraction was implemented using Greenwood Function Cepstral Coefficients (GFCC), designed specifically for extracting features from animal vocalizations. Mice ultrasonic vocalizations were analyzed using Gaussian Mixture Models (GMM), which yielded an accuracy of 78.3% for speaker identification and 93.2% for gender identification. Meerkat speaker identification with close calls was implemented using Gaussian Mixture Models (GMM) and Hidden Markov Models (HMM), with accuracies of 90.8% and 94.4% respectively. These results indicate the presence of gender and identity information in vocalizations and support the possibility of robust gender and individual identification using bioacoustic data sets.
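As a sketch of the GMM classification scheme used here (synthetic stand-ins for GFCC frames; not the thesis code), one GMM is trained per speaker and a test call is assigned to the model with the highest average log-likelihood:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical 12-dim GFCC-like frames for four mice; each "speaker"
# gets its own GMM.
rng = np.random.default_rng(1)
train = {s: rng.normal(loc=s, size=(800, 12)) for s in range(4)}

models = {s: GaussianMixture(n_components=8, covariance_type="diag",
                             random_state=0).fit(X)
          for s, X in train.items()}

def identify(frames):
    # score() is the mean per-frame log-likelihood under a model.
    scores = {s: m.score(frames) for s, m in models.items()}
    return max(scores, key=scores.get)

test_call = rng.normal(loc=2, size=(200, 12))
print(identify(test_call))   # expected: 2
```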
5

Huang, Xuan. "Balance-guaranteed optimized tree with reject option for live fish recognition." Thesis, University of Edinburgh, 2014. http://hdl.handle.net/1842/9779.

Abstract:
This thesis investigates the computer vision application of live fish recognition, which is needed in scenarios where manual annotation is too expensive because there are too many underwater videos. Such a system can assist ecological surveillance research, e.g. computing fish population statistics in the open sea. Some pre-processing procedures are employed to improve the recognition accuracy, and then 69 types of features are extracted. These features are a combination of colour, shape and texture properties in different parts of the fish, such as tail/head/top/bottom, as well as the whole fish. Then, we present a novel Balance-Guaranteed Optimized Tree with Reject option (BGOTR) for live fish recognition. It improves on the normal hierarchical method by arranging more accurate classifications at a higher level and keeping the hierarchical tree balanced. BGOTR is automatically constructed based on inter-class similarities. We apply a Gaussian Mixture Model (GMM) and Bayes' rule as a reject option after the hierarchical classification to evaluate the posterior probability of being a certain species and to filter out less confident decisions. This novel classification-rejection method cleans up decisions and rejects unknown classes. After constructing the tree architecture, a novel trajectory voting method is used to eliminate accumulated errors during hierarchical classification and therefore achieves better performance. The proposed BGOTR-based hierarchical classification method is applied to recognize the 15 major species among 24150 manually labelled fish images and to detect new species in an unrestricted natural environment recorded by underwater cameras in the southern Taiwan sea. It achieves significant improvements compared to state-of-the-art techniques. Furthermore, the sequence of feature selection and constructing a multi-class SVM is investigated. We propose that an Individual Feature Selection (IFS) procedure can be applied directly to the binary One-versus-One SVMs before assembling the full multiclass SVM. The IFS method selects a different subset of features for each One-versus-One SVM inside the multiclass classifier so that each vote is optimized to discriminate the two specific classes. The proposed IFS method is tested on four different datasets, comparing performance and time cost. Experimental results demonstrate significant improvements over the normal Multiclass Feature Selection (MFS) method on all datasets.
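The GMM-plus-Bayes-rule reject option can be sketched as follows; the per-species GMMs, uniform priors and the 0.5 threshold are illustrative assumptions, not values from the thesis:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Toy per-species GMMs; in the thesis these follow the hierarchical
# classifier, here they are fitted on synthetic clusters.
rng = np.random.default_rng(2)
models = [GaussianMixture(n_components=3, random_state=0)
          .fit(rng.normal(loc=3 * i, size=(300, 5))) for i in range(3)]
priors = [1 / 3] * 3                      # assumed species priors

def posterior(x):
    # Bayes' rule in the log domain: log p(x|c) + log P(c), normalized.
    logj = np.array([m.score_samples(x[None, :])[0] + np.log(p)
                     for m, p in zip(models, priors)])
    logj -= np.logaddexp.reduce(logj)
    return np.exp(logj)

def accept(x, proposed_class, threshold=0.5):
    # Reject option: keep the hierarchical decision only if its
    # posterior probability is high enough.
    return posterior(x)[proposed_class] >= threshold

x = rng.normal(loc=0, size=5)
print(accept(x, proposed_class=0))        # confident sample -> True
```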
6

Harrysson, Mattias. "Neural probabilistic topic modeling of short and messy text." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-189532.

Abstract:
Exploring massive amounts of user-generated data with topics posits a new way to find useful information. The topics are assumed to be “hidden” and must be “uncovered” by statistical methods such as topic modeling. However, user-generated data is typically short and messy, e.g. informal chat conversations, heavy use of slang words, and “noise” such as URLs or other forms of pseudo-text. This type of data is difficult to process for most natural language processing methods, including topic modeling. This thesis attempts to find the approach that objectively gives better topics from short and messy text in a comparative study. The compared approaches are latent Dirichlet allocation (LDA), Re-organized LDA (RO-LDA), a Gaussian Mixture Model (GMM) with distributed representations of words, and a new approach based on previous work named Neural Probabilistic Topic Modeling (NPTM). It could only be concluded that NPTM has a tendency to achieve better topics on short and messy text than LDA and RO-LDA. GMM, on the other hand, could not produce any meaningful results at all. The results are less conclusive since NPTM suffers from long running times, which prevented enough samples from being obtained for a statistical test.
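For illustration, the GMM-with-distributed-representations baseline can be sketched as below (random vectors stand in for trained word embeddings; the vocabulary and sizes are placeholders): each mixture component plays the role of one topic, described by the words it most likely generated.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Random stand-ins for word2vec-style embeddings, one 20-dim vector per
# word; a real run would load trained embeddings instead.
rng = np.random.default_rng(3)
vocab = [f"word{i}" for i in range(200)]           # placeholder vocabulary
X = rng.normal(size=(len(vocab), 20))

# Each mixture component acts as one "topic".
gmm = GaussianMixture(n_components=5, covariance_type="diag",
                      random_state=0).fit(X)

# Describe a topic by the words whose embeddings it most likely generated.
resp = gmm.predict_proba(X)                        # (n_words, n_topics)
for t in range(5):
    top = np.argsort(resp[:, t])[-3:]
    print("topic", t, "->", [vocab[i] for i in top])
```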
7

Jiřík, Leoš. "Rozpoznávání pozic a gest [Recognition of positions and gestures]." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2008. http://www.nusl.cz/ntk/nusl-235932.

Abstract:
This thesis surveys existing methods in the field of image recognition with regard to gesture recognition. Some methods have been chosen for deeper study and are discussed later on. The second part develops the concept of an algorithm capable of robust gesture recognition based on data acquired within the AMI and M4 projects. New ways to obtain precise information on participants' positions are suggested, along with dynamic data processing approaches to recognition. As an alternative, recognition using Gaussian Mixture Models and periodicity analysis is brought in. The gesture class in focus is speech-supporting gestures. The last part demonstrates the results and discusses future work.
8

Larsson, Alm Kevin. "Automatic Speech Quality Assessment in Unified Communication : A Case Study." Thesis, Linköpings universitet, Programvara och system, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-159794.

Abstract:
Speech as a medium for communication has always been important in its ability to convey our ideas, personality and emotions. It is therefore not strange that Quality of Experience (QoE) becomes central to any business relying on voice communication. Using Unified Communication (UC) systems, users can communicate with each other in several ways using many different devices, making QoE an important aspect of such systems. For this thesis, automatic methods for assessing the speech quality of voice calls in Briteback's UC application are studied, including a comparison of the researched methods. Three methods, all using a Gaussian Mixture Model (GMM) as a regressor, paired with extraction of Human Factor Cepstral Coefficients (HFCC), Gammatone Frequency Cepstral Coefficients (GFCC) and Modified Mel Frequency Cepstrum Coefficients (MMFCC) features respectively, are studied. The method based on HFCC feature extraction shows better performance in general compared to the two other methods, but all methods show comparatively low performance relative to the literature. This most likely stems from implementation errors, showing the gap between theory and practice in the literature, together with the lack of reference implementations. Further work with practical aspects in mind, such as reference implementations or verification tools, could make the field more popular and increase its use in the real world.
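A common way to use a GMM as a regressor, consistent in spirit with the setup above though not necessarily the papers' exact method, is GMM regression: fit a joint GMM over feature/score pairs and predict a new score as the conditional mean E[y | x]. A sketch with random stand-ins in place of HFCC/GFCC/MMFCC features:

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

# Stand-in features and quality scores; real data would pair cepstral
# features with subjective (MOS-style) quality labels.
rng = np.random.default_rng(5)
X = rng.normal(size=(1000, 6))
y = X @ rng.normal(size=6) + rng.normal(size=1000)

# Joint GMM over [features, score].
joint = np.column_stack([X, y])
gmm = GaussianMixture(n_components=4, covariance_type="full",
                      random_state=0).fit(joint)
d = X.shape[1]

def predict_quality(x):
    """Conditional mean E[y | x] under the joint GMM."""
    num, den = 0.0, 0.0
    for w, mu, S in zip(gmm.weights_, gmm.means_, gmm.covariances_):
        mx, my = mu[:d], mu[d]
        Sxx, Sxy = S[:d, :d], S[:d, d]
        resp = w * multivariate_normal.pdf(x, mean=mx, cov=Sxx)
        cond = my + Sxy @ np.linalg.solve(Sxx, x - mx)
        num += resp * cond
        den += resp
    return num / den

print(predict_quality(X[0]), y[0])   # prediction vs. true score
```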
9

Verdet, Florian. "Exploring variabilities through factor analysis in automatic acoustic language recognition." Phd thesis, Université d'Avignon, 2011. http://tel.archives-ouvertes.fr/tel-00954255.

Abstract:
Language Recognition is the problem of discovering the language of a spoken utterance. This thesis achieves this goal by using short-term acoustic information within a GMM-UBM approach. The main problem of many pattern recognition applications is the variability of the observed data. In the context of Language Recognition (LR), this troublesome variability is due to speaker characteristics, speech evolution, and acquisition and transmission channels. In the context of Speaker Recognition, the variability problem is solved by the Joint Factor Analysis (JFA) technique. Here, we introduce this paradigm to Language Recognition. The success of JFA relies on several assumptions. The global JFA assumption is that the observed information can be decomposed into a universal global part, a language-dependent part and a language-independent variability part. The second, more technical assumption is that the unwanted variability part lives in a low-dimensional, globally defined subspace. In this work, we analyze how JFA behaves in the context of a GMM-UBM LR framework. We also introduce and analyze its combination with Support Vector Machines (SVMs). The first JFA publications put all unwanted information (hence the variability) into one and the same component, which is assumed to follow a Gaussian distribution. This handles diverse kinds of variability in a unique manner, but in practice we observe that this hypothesis is not always verified. We have, for example, the case where the data can be divided into two clearly separate subsets, namely data from telephony and from broadcast sources. In this case, our detailed investigations show that there is some benefit in handling the two kinds of data with two separate systems and then electing the output score of the system which corresponds to the source of the testing utterance. For selecting the score of one or the other system, we need a channel source detector. We propose here different novel designs for such automatic detectors. In this framework, we show that JFA's variability factors (of the subspace) can be used with success for detecting the source. This opens the interesting perspective of partitioning the data into automatically determined channel source categories, avoiding the need for source-labeled training data, which is not always available. The JFA approach results in up to 72% relative cost reduction compared to the GMM-UBM baseline system. Using source-specific systems followed by a score selector, we achieve 81% relative improvement.
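For reference, the JFA decomposition reads as follows in the standard notation of the literature (the thesis adapts this speaker-recognition formulation to language recognition; the symbols follow common usage, not necessarily the thesis):

```latex
% Session-dependent GMM mean supervector under JFA:
%   m  : universal (UBM) supervector,
%   Vy : language/speaker-dependent part (low-rank V, factors y),
%   Ux : unwanted session/channel variability in the low-dimensional
%        subspace spanned by U,
%   Dz : diagonal residual term.
\[
  M = m + V y + U x + D z
\]
```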
10

Wu, Burton. "New variational Bayesian approaches for statistical data mining : with applications to profiling and differentiating habitual consumption behaviour of customers in the wireless telecommunication industry." Thesis, Queensland University of Technology, 2011. https://eprints.qut.edu.au/46084/1/Burton_Wu_Thesis.pdf.

Abstract:
This thesis investigates profiling and differentiating customers through the use of statistical data mining techniques. The business application of our work centres on examining individuals' seldom-studied yet critical consumption behaviour over an extensive time period within the context of the wireless telecommunication industry; consumption behaviour (as opposed to purchasing behaviour) is behaviour that has been performed so frequently that it becomes habitual and involves minimal intention or decision making. Key variables investigated are the activity initiation timestamp and cell tower location, as well as the activity type and usage quantity (e.g., voice call with duration in seconds); the research focuses on customers' spatial and temporal usage behaviour. The main methodological emphasis is on the development of clustering models based on Gaussian mixture models (GMMs), which are fitted using the recently developed variational Bayesian (VB) method. VB is an efficient deterministic alternative to the popular but computationally demanding Markov chain Monte Carlo (MCMC) methods. The standard VB-GMM algorithm is extended by allowing component splitting such that it is robust to initial parameter choices and can automatically and efficiently determine the number of components. The new algorithm we propose allows more effective modelling of individuals' highly heterogeneous and spiky spatial usage behaviour, or more generally human mobility patterns; the term spiky describes data patterns with large areas of low probability mixed with small areas of high probability. Customers are then characterised and segmented based on the fitted GMM, which corresponds to how each of them uses the products/services spatially in their daily lives; this essentially reflects their likely lifestyle and occupational traits. Other significant research contributions include fitting GMMs using VB to circular data, i.e., the temporal usage behaviour, and developing clustering algorithms suitable for high-dimensional data based on the use of VB-GMM.
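The automatic determination of the number of components can be previewed with scikit-learn's variational Bayesian GMM, which is related to, though simpler than, the split-based VB-GMM developed in the thesis; the data and prior settings below are illustrative:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# Given a generous upper bound on components, the Dirichlet prior in the
# variational objective drives redundant components' weights toward zero,
# effectively selecting the model order from the data.
rng = np.random.default_rng(4)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(300, 2))
               for c in (-3, 0, 4)])          # 3 true clusters

vb = BayesianGaussianMixture(
    n_components=10,                          # deliberate over-specification
    weight_concentration_prior=1e-3,          # sparsity-inducing prior
    max_iter=500, random_state=0).fit(X)

print(np.round(vb.weights_, 3))               # ~3 non-negligible weights
```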
11

DASTHAGIRAIAH, PALLA. "CAMERA BASED TRAFFIC MONITORING." Thesis, 2011. http://dspace.dtu.ac.in:8080/jspui/handle/repository/13988.

Abstract:
Due to heavy traffic congestion on city roads, commuting has become a very difficult task for the common man. We therefore developed a traffic monitoring and classification system using visual sensors, as part of an efficient traffic management system. The approach in this project is to apply appropriate image processing and computer vision algorithms to road traffic flow and monitoring. The system involves extracting and analyzing spatial and spatio-temporal interest points. Traffic patterns are identified and classified using Gaussian Mixture Model (GMM) and Expectation Maximization (EM) algorithms. Traffic flow monitoring and analysis based on computer vision techniques in real-time mode places complex demands on efficient computer algorithms and technological solutions. Real-time traffic flow analysis then leads to reports of traffic volume counts (without identifying and tracking vehicles), which help in the efficient design of roads and transport systems. Traffic monitoring and classification using a distributed camera network is presented in this work. The activities on each road link are monitored and features are derived to identify the traffic pattern, which is then learnt, classified and communicated to neighbouring road links. We used GMM-EM based classification. The proposed method is based neither on tracking nor on vehicle detection. Apart from this, the method is flexible, adaptive, robust and computationally light. Unlike existing methods, it does not assume or draw analogies of traffic moving as particles, nor does it impose restrictions on road conditions or on road tributaries and distributaries.
12

Marczyk, Michał. "Przetwarzanie i klasyfikacja danych uzyskiwanych z użyciem wysokoprzepustowych technik pomiarowych biologii molekularnej [Processing and classification of data obtained with high-throughput measurement techniques of molecular biology]." Doctoral thesis, 2013. https://repolis.bg.polsl.pl/dlibra/docmetadata?showContent=true&id=9638.

13

Marczyk, Michał. "Przetwarzanie i klasyfikacja danych uzyskiwanych z użyciem wysokoprzepustowych technik pomiarowych biologii molekularnej [Processing and classification of data obtained with high-throughput measurement techniques of molecular biology]." Doctoral thesis, 2013. https://delibra.bg.polsl.pl/dlibra/docmetadata?showContent=true&id=9638.

14

Gorur, Pushkar. "Bitrate Reduction Techniques for Low-Complexity Surveillance Video Coding." Thesis, 2016. http://etd.iisc.ac.in/handle/2005/2681.

Abstract:
High resolution surveillance video cameras are invaluable resources for effective crime prevention and forensic investigations. However, increasing communication bandwidth requirements of high definition surveillance videos are severely limiting the number of cameras that can be deployed. Higher bitrate also increases operating expenses due to higher data communication and storage costs. Hence, it is essential to develop low complexity algorithms which reduce the data rate of the compressed video stream without affecting the image fidelity. In this thesis, a computer vision aided H.264 surveillance video encoder and four associated algorithms are proposed to reduce the bitrate. The proposed techniques are (I) Speeded up foreground segmentation, (II) Skip decision, (III) Reference frame selection and (IV) Face Region-of-Interest (ROI) coding. In the first part of the thesis, a modification to the adaptive Gaussian Mixture Model (GMM) based foreground segmentation algorithm is proposed to reduce computational complexity. This is achieved by replacing expensive floating point computations with low cost integer operations. To maintain accuracy, we compute periodic floating point updates for the GMM weight parameter using the value of an integer counter. Experiments show speedups in the range of 1.33 - 1.44 on standard video datasets where a large fraction of pixels are multimodal. In the second part, we propose a skip decision technique that uses a spatial sampler to sample pixels. The sampled pixels are segmented using the speeded up GMM algorithm. The storage pattern of the GMM parameters in memory is also modified to improve cache performance. Skip selection is performed using the segmentation results of the sampled pixels. In the third part, a reference frame selection algorithm is proposed to maximize the number of background Macroblocks (MBs), i.e. MBs that contain background image content, in the Decoded Picture Buffer. This reduces the cost of coding uncovered background regions. Distortion over foreground pixels is measured to quantify the performance of the skip decision and reference frame selection techniques. Experimental results show bit rate savings of up to 94.5% over methods proposed in the literature on video surveillance data sets. The proposed techniques also provide up to 74.5% reduction in compression complexity without increasing the distortion over the foreground regions in the video sequence. In the final part of the thesis, face and shadow region detection is combined with the skip decision algorithm to perform ROI coding for pedestrian surveillance videos. Since person identification requires high quality face images, MBs containing face image content are encoded with a low Quantization Parameter setting (i.e. high quality). Other regions of the body in the image are considered RORI (regions of reduced interest) and are encoded at low quality. The shadow regions are marked as Skip. Techniques that use only facial features to detect faces (e.g. the Viola-Jones face detector) are not robust in real world scenarios. Hence, we propose to initially detect pedestrians using deformable part models. The face region is determined using the deformed part locations. Detected pedestrians are tracked using an optical flow based tracker combined with a Kalman filter. The tracker improves the accuracy and also avoids the need to run the object detector on already detected pedestrians. Shadow and skin detector scores are computed over superpixels. Bilattice-based logic inference is used to combine multiple likelihood scores and classify the superpixels as ROI, RORI or RONI. The coding mode and QP values of the MBs are determined using the superpixel labels. The proposed techniques provide a further reduction in bitrate of up to 50.2%.
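The adaptive-GMM foreground segmentation that the first contribution accelerates is the classic per-pixel mixture background model; the sketch below uses OpenCV's standard MOG2 implementation (not the thesis's integer-arithmetic variant), with a placeholder video path:

```python
import cv2

# Baseline per-pixel GMM background subtraction; "surveillance.mp4" is
# a placeholder path, and the parameters are OpenCV defaults.
cap = cv2.VideoCapture("surveillance.mp4")
mog2 = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16,
                                          detectShadows=True)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = mog2.apply(frame)   # 255 = foreground, 127 = shadow, 0 = background
    # Macroblocks containing no foreground pixels in `mask` would be the
    # natural candidates for the encoder's Skip mode.
cap.release()
```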
15

Gorur, Pushkar. "Bitrate Reduction Techniques for Low-Complexity Surveillance Video Coding." Thesis, 2016. http://etd.iisc.ernet.in/handle/2005/2681.

Abstract:
High resolution surveillance video cameras are invaluable resources for effective crime prevention and forensic investigations. However, increasing communication bandwidth requirements of high definition surveillance videos are severely limiting the number of cameras that can be deployed. Higher bitrate also increases operating expenses due to higher data communication and storage costs. Hence, it is essential to develop low complexity algorithms which reduce the data rate of the compressed video stream without affecting the image fidelity. In this thesis, a computer vision aided H.264 surveillance video encoder and four associated algorithms are proposed to reduce the bitrate. The proposed techniques are (I) Speeded up foreground segmentation, (II) Skip decision, (III) Reference frame selection and (IV) Face Region-of-Interest (ROI) coding. In the first part of the thesis, a modification to the adaptive Gaussian Mixture Model (GMM) based foreground segmentation algorithm is proposed to reduce computational complexity. This is achieved by replacing expensive floating point computations with low cost integer operations. To maintain accuracy, we compute periodic floating point updates for the GMM weight parameter using the value of an integer counter. Experiments show speedups in the range of 1.33 - 1.44 on standard video datasets where a large fraction of pixels are multimodal. In the second part, we propose a skip decision technique that uses a spatial sampler to sample pixels. The sampled pixels are segmented using the speeded up GMM algorithm. The storage pattern of the GMM parameters in memory is also modified to improve cache performance. Skip selection is performed using the segmentation results of the sampled pixels. In the third part, a reference frame selection algorithm is proposed to maximize the number of background Macroblocks (MBs), i.e. MBs that contain background image content, in the Decoded Picture Buffer. This reduces the cost of coding uncovered background regions. Distortion over foreground pixels is measured to quantify the performance of the skip decision and reference frame selection techniques. Experimental results show bit rate savings of up to 94.5% over methods proposed in the literature on video surveillance data sets. The proposed techniques also provide up to 74.5% reduction in compression complexity without increasing the distortion over the foreground regions in the video sequence. In the final part of the thesis, face and shadow region detection is combined with the skip decision algorithm to perform ROI coding for pedestrian surveillance videos. Since person identification requires high quality face images, MBs containing face image content are encoded with a low Quantization Parameter setting (i.e. high quality). Other regions of the body in the image are considered RORI (regions of reduced interest) and are encoded at low quality. The shadow regions are marked as Skip. Techniques that use only facial features to detect faces (e.g. the Viola-Jones face detector) are not robust in real world scenarios. Hence, we propose to initially detect pedestrians using deformable part models. The face region is determined using the deformed part locations. Detected pedestrians are tracked using an optical flow based tracker combined with a Kalman filter. The tracker improves the accuracy and also avoids the need to run the object detector on already detected pedestrians. Shadow and skin detector scores are computed over superpixels. Bilattice-based logic inference is used to combine multiple likelihood scores and classify the superpixels as ROI, RORI or RONI. The coding mode and QP values of the MBs are determined using the superpixel labels. The proposed techniques provide a further reduction in bitrate of up to 50.2%.
16

Chatterjee, Saikat. "Rate-Distortion Performance And Complexity Optimized Structured Vector Quantization." Thesis, 2008. https://etd.iisc.ac.in/handle/2005/1056.

Abstract:
Although vector quantization (VQ) is an established topic in communication, its practical utility has been limited due to (i) prohibitive complexity for higher quality and bit-rate, (ii) structured VQ methods which are not analyzed for optimum performance, and (iii) the difficulty of mapping the theoretical performance of mean square error (MSE) to perceptual measures. However, the ever increasing demand for compression of various source signals points to VQ as the inevitable choice for high efficiency. This thesis addresses all three of the above issues, utilizing the power of parametric stochastic modeling of the signal source, viz. the Gaussian mixture model (GMM), and proposes new solutions. Addressing some of the new requirements of source coding in network applications, the thesis also presents solutions for scalable bit-rate, rate-independent complexity and decoder scalability. While structured VQ is a necessity to reduce the complexity, we have developed, analyzed and compared three different schemes of compensation for the loss due to structured VQ. Focusing on the widely used methods of split VQ (SVQ) and KLT based transform domain scalar quantization (TrSQ), we develop expressions for their optimum performance using high rate quantization theory. We propose the use of conditional PDF based SVQ (CSVQ) to compensate for the split loss in SVQ and analytically show that it achieves a coding gain over SVQ. Using the analytical expressions for complexity, an algorithm to choose the optimum splits is proposed. We analyze these techniques for their complexity as well as perceptual distortion measure, considering the specific case of quantizing the wideband speech line spectrum frequency (LSF) parameters. Using natural speech data, it is shown that the new conditional PDF based methods provide better perceptual distortion performance than the traditional methods. Exploring the use of GMMs for the source, we take the approach of separately estimating the GMM parameters and then using high rate quantization theory in a simplified manner to derive closed form expressions for optimum MSE performance. This has led to the development of non-linear prediction for compensating the split loss (in contrast to the linear prediction arising from a Gaussian model). We show that the GMM approach can improve the recently proposed adaptive VQ scheme of switched SVQ (SSVQ). We derive the optimum performance expressions for SSVQ, in both variable bit-rate and fixed bit-rate formats, using the simplified GMM approach in high rate theory. As a third scheme for recovering the split loss in SVQ and reducing the complexity, we propose a two stage SVQ (TsSVQ), which is analyzed for minimum complexity as well as perceptual distortion. Utilizing the low complexity of transform domain SVQ (TrSVQ) as well as the two stage approach in a universal coding framework, it is shown that we can achieve low complexity as well as better performance than SSVQ. Further, the combination of GMM and universal coding led to the development of a highly scalable coder which provides bit-rate scalability, decoder scalability and rate-independent low complexity. Also, its perceptual distortion performance is comparable to that of SSVQ. Since the GMM is a generic source model, we develop a new method of predicting the performance bound for perceptual distortion using VQ. Applying this method to LSF quantization, the minimum bit rates for quantizing telephone band LSF (TB-LSF) and wideband LSF (WB-LSF) are derived.
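To make the split VQ trade-off concrete, here is a minimal sketch with illustrative dimensions and codebook sizes (not the thesis's configuration): a 10-dimensional LSF-like vector is quantized as two independent 5-dimensional halves, so two 256-entry codebooks stand in for the 2^16-entry codebook that full VQ would need at the same 16-bit rate. CSVQ, as described above, then models the second half's PDF conditioned on the first to recover part of this split loss.

```python
import numpy as np
from sklearn.cluster import KMeans

# LSF-like training vectors: sorted values in (0, pi) as stand-ins.
rng = np.random.default_rng(6)
train = np.sort(rng.uniform(0, np.pi, size=(5000, 10)), axis=1)

# One independent codebook per half: 8 bits each, 16 bits total.
cb1 = KMeans(n_clusters=256, n_init=3, random_state=0).fit(train[:, :5])
cb2 = KMeans(n_clusters=256, n_init=3, random_state=0).fit(train[:, 5:])

def quantize(v):
    # Each half is searched in its own 256-entry codebook (2 * 256
    # comparisons) instead of one 65536-entry full-VQ search.
    i = cb1.predict(v[None, :5])[0]
    j = cb2.predict(v[None, 5:])[0]
    return np.concatenate([cb1.cluster_centers_[i],
                           cb2.cluster_centers_[j]])

v = np.sort(rng.uniform(0, np.pi, 10))
print(np.abs(quantize(v) - v).max())   # worst per-dimension error
```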
17

Chatterjee, Saikat. "Rate-Distortion Performance And Complexity Optimized Structured Vector Quantization." Thesis, 2008. http://hdl.handle.net/2005/1056.

Abstract:
Although vector quantization (VQ) is an established topic in communication, its practical utility has been limited due to (i) prohibitive complexity for higher quality and bit-rate, (ii) structured VQ methods which are not analyzed for optimum performance, and (iii) the difficulty of mapping the theoretical performance of mean square error (MSE) to perceptual measures. However, the ever increasing demand for compression of various source signals points to VQ as the inevitable choice for high efficiency. This thesis addresses all three of the above issues, utilizing the power of parametric stochastic modeling of the signal source, viz. the Gaussian mixture model (GMM), and proposes new solutions. Addressing some of the new requirements of source coding in network applications, the thesis also presents solutions for scalable bit-rate, rate-independent complexity and decoder scalability. While structured VQ is a necessity to reduce the complexity, we have developed, analyzed and compared three different schemes of compensation for the loss due to structured VQ. Focusing on the widely used methods of split VQ (SVQ) and KLT based transform domain scalar quantization (TrSQ), we develop expressions for their optimum performance using high rate quantization theory. We propose the use of conditional PDF based SVQ (CSVQ) to compensate for the split loss in SVQ and analytically show that it achieves a coding gain over SVQ. Using the analytical expressions for complexity, an algorithm to choose the optimum splits is proposed. We analyze these techniques for their complexity as well as perceptual distortion measure, considering the specific case of quantizing the wideband speech line spectrum frequency (LSF) parameters. Using natural speech data, it is shown that the new conditional PDF based methods provide better perceptual distortion performance than the traditional methods. Exploring the use of GMMs for the source, we take the approach of separately estimating the GMM parameters and then using high rate quantization theory in a simplified manner to derive closed form expressions for optimum MSE performance. This has led to the development of non-linear prediction for compensating the split loss (in contrast to the linear prediction arising from a Gaussian model). We show that the GMM approach can improve the recently proposed adaptive VQ scheme of switched SVQ (SSVQ). We derive the optimum performance expressions for SSVQ, in both variable bit-rate and fixed bit-rate formats, using the simplified GMM approach in high rate theory. As a third scheme for recovering the split loss in SVQ and reducing the complexity, we propose a two stage SVQ (TsSVQ), which is analyzed for minimum complexity as well as perceptual distortion. Utilizing the low complexity of transform domain SVQ (TrSVQ) as well as the two stage approach in a universal coding framework, it is shown that we can achieve low complexity as well as better performance than SSVQ. Further, the combination of GMM and universal coding led to the development of a highly scalable coder which provides bit-rate scalability, decoder scalability and rate-independent low complexity. Also, its perceptual distortion performance is comparable to that of SSVQ. Since the GMM is a generic source model, we develop a new method of predicting the performance bound for perceptual distortion using VQ. Applying this method to LSF quantization, the minimum bit rates for quantizing telephone band LSF (TB-LSF) and wideband LSF (WB-LSF) are derived.