Dissertations / Theses: 'Image and speech coding'

1

Yan, Ming. "VLSI architectures for speech and image coding applications." Thesis, Queen's University Belfast, 1989. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.356855.

2

So, Stephen. "Efficient Block Quantisation for Image and Speech Coding." Thesis, Griffith University, 2005. http://hdl.handle.net/10072/366625.

Abstract:

Signal coding or compression has played a significant role in the success of digital communications and multimedia. The use of signal coding pervades many aspects of our digital lifestyle, a lifestyle that has seen widespread demand for applications such as third-generation mobile telephony, portable music players, Internet-based video conferencing, and digital television. The issues that arise when dealing with the transmission and storage of digital media are the limited bandwidth of communication channels, the limited capacity of storage devices, and the limited processing ability of the encoding and decoding devices. The aim of signal coding is therefore to represent digital media, such as speech, music, images, and video, as efficiently as possible. Coding efficiency encompasses rate-distortion performance (for lossy coding), computational complexity, and static memory requirements. The fundamental operation in lossy signal coding is quantisation. Its rate-distortion efficiency is influenced by the properties of the signal source, such as its statistical dependencies and its probability density function. Vector quantisers are known to theoretically achieve the lowest distortion, at a given rate and dimension, of any quantisation scheme, though their computational complexity and memory requirements grow exponentially with rate and dimension. Structurally constrained vector quantisers, such as product code vector quantisers, alleviate these complexity issues, though at the cost of degraded rate-distortion performance. Block quantisers, or transform coders, which are a special case of product code vector quantisation, have both low computational and low memory requirements, as well as the ability to scale to any bitrate, which is termed bitrate scalability. However, the prerequisite for optimal block quantisation, namely a purely Gaussian data source with uniform correlation, is rarely met by real-world signals.
The Gaussian mixture model (GMM)-based block quantiser, which was originally developed for line spectral frequency (LSF) quantisation in speech coding, overcomes these problems of source mismatch and non-stationarity by modelling the source with a GMM. The split vector quantiser, which has also been applied successfully to LSF quantisation in the speech coding literature, is a product code vector quantiser that overcomes the complexity problem of unconstrained vector quantisers by partitioning vectors into sub-vectors and quantising each one independently. The complexity can be significantly reduced by splitting vectors further, though this inevitably degrades rate-distortion efficiency, because the structural constraint of vector splitting causes losses in several properties of vector quantisers, which are termed 'advantages'. This dissertation makes several contributions to the area of block and vector quantisation, more specifically to the GMM-based block quantiser and the split vector quantiser, with the aim of improving their rate-distortion and computational efficiency. These new quantisation schemes are evaluated and compared with existing and popular schemes in the areas of lossy image coding, LSF quantisation in narrowband speech coding, LSF and immittance spectral pair (ISP) quantisation in wideband speech coding, and Mel frequency-warped cepstral coefficient (MFCC) quantisation in distributed speech recognition. These contributions are summarised below. A novel technique for encoding fractional bits in a fixed-rate GMM-based block quantiser scheme is presented. In the GMM-based block quantiser, fractional bitrates are often assigned to each of the cluster block quantisers. This new encoding technique leads to better utilisation of the bit budget by allowing the use of, and providing for the encoding of, quantiser levels in a fixed-rate framework.
The algorithm is based on a generalised positional number system and has low complexity. A lower-complexity GMM-based block quantiser, which replaces the Karhunen-Loève transform (KLT) with the discrete cosine transform (DCT), is proposed for image coding. Owing to its source-independent nature and amenability to efficient implementation, the DCT allows a fast GMM-based block quantiser to be realised that achieves rate-distortion performance comparable to that of the KLT-based scheme in the block quantisation of images. Transform image coding often suffers from block artifacts at relatively low bitrates. We propose a scheme that minimises the block artifacts of block quantisation by pre-processing the image using the discrete wavelet transform, extracting vectors via a tree structure that exploits spatial self-similarity, and quantising these vectors using the GMM-based block quantiser. Visual examination shows that block artifacts are considerably reduced by the wavelet pre-processing step. The multi-frame GMM-based block quantiser is a modified scheme that exploits memory across successive frames or vectors. Its main advantages over the memoryless scheme in LSF and ISP quantisation are better rate-distortion efficiency, through the exploitation of correlation across multiple frames, and better computational efficiency, through a mean squared error selection criterion. The multi-frame GMM-based block quantiser is also evaluated for the quantisation of MFCC feature vectors for distributed speech recognition and is shown to be superior to all the quantisation schemes considered. A new product code vector quantiser, called the switched split vector quantiser (SSVQ), is proposed for speech LSF and ISP quantisation. SSVQ is a hybrid scheme, combining a switch vector quantiser with several split vector quantisers. It aims to overcome the loss of rate-distortion efficiency in split vector quantisers by exploiting full vector dependencies before the vector splitting.
It is shown that SSVQ alleviates the losses in two of the three vector quantiser 'advantages'. SSVQ also has a remarkably low computational complexity, though at the cost of increased memory requirements.
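The abstract says only that the fractional-bit encoder rests on a generalised positional number system. A minimal Python sketch of that idea, assuming it amounts to mixed-radix packing of the cluster quantisers' level indices into one fixed-length codeword, is shown below. The function names `pack_levels`, `unpack_levels` and `bits_needed` are hypothetical, and this is not the thesis's own code.

```python
from math import prod


def pack_levels(indices, levels):
    """Pack per-component quantiser indices into one integer using a
    mixed-radix (generalised positional) number system.

    indices[i] must satisfy 0 <= indices[i] < levels[i]. The level counts
    need not be powers of two, so each component may use a fractional
    number of bits (log2 of its level count).
    """
    code, weight = 0, 1
    for q, n_levels in zip(indices, levels):
        if not 0 <= q < n_levels:
            raise ValueError("index out of range for its quantiser")
        code += q * weight      # digit q has positional weight prod(levels[:i])
        weight *= n_levels
    return code


def unpack_levels(code, levels):
    """Inverse of pack_levels: peel off one mixed-radix digit per quantiser."""
    indices = []
    for n_levels in levels:
        code, q = divmod(code, n_levels)
        indices.append(q)
    return indices


def bits_needed(levels):
    """Bits for the packed codeword versus rounding each quantiser up separately."""
    packed = (prod(levels) - 1).bit_length()
    separate = sum((n - 1).bit_length() for n in levels)
    return packed, separate
```

For level counts (3, 5, 6) the ideal rate is log2 90, about 6.49 bits. Packing fits all 90 combinations into 7 bits, whereas giving each quantiser its own whole-bit field costs 2 + 3 + 3 = 8 bits. Encoding and decoding need only multiplications and integer divisions, which is consistent with the low complexity the abstract claims.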
Thesis (PhD Doctorate)
Doctor of Philosophy (PhD)
School of Microelectronic Engineering
3

Savvides, Vasos E. "Perceptual models in speech quality assessment and coding." Thesis, Loughborough University, 1988. https://dspace.lboro.ac.uk/2134/36273.

Abstract:

The ever-increasing demand for good communications/toll quality speech has created a renewed interest in the perceptual impact of rate compression. Two general areas are investigated in this work, namely speech quality assessment and speech coding. In the field of speech quality assessment, a model is developed that simulates the processing stages of the peripheral auditory system. At the output of the model a "running" auditory spectrum is obtained, which represents the auditory (spectral) equivalent of any acoustic sound, such as speech. Auditory spectra from coded speech segments serve as inputs to a second model, which simulates the information centre in the brain that performs the speech quality assessment.

4

Lo, Ka-Yiu. "Pitch synchronous speech coding at very low bit rates." Thesis, University of Liverpool, 1993. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.321128.

5

Farsi, Hassan. "Advanced pre-and-post processing techniques for speech coding." Thesis, University of Surrey, 2003. http://epubs.surrey.ac.uk/844491/.

Abstract:

Advances in digital technology in the last decade have motivated the development of very efficient, high-quality speech compression algorithms. Whereas in early low bit rate coding systems the main target was to produce intelligible speech at low bit rates, new applications such as mobile satellite systems increased the demand for reducing the transmission bandwidth while achieving higher speech quality. This resulted in the development of efficient parametric models of the speech production system. These models were the basis of powerful speech compression algorithms such as CELP, MBE, MELP and WI. The performance of a speech coder depends not only on the speech production model employed but also on the accurate estimation of speech parameters. Periodicity, also known as pitch, is one of the speech parameters that strongly affect the synthesised speech quality. Thus, pitch determination has attracted much research in the area of low bit rate coding. These studies assume that, over a short segment of speech called a frame, the pitch is fixed or evolves smoothly. Pitch estimation algorithms generally fail to track irregular variations, which can occur at speech onsets and offsets. To overcome this problem, a novel preprocessing method is proposed that detects irregular pitch variations and modifies the speech signal so as to improve the accuracy of pitch estimation. This method results in more regular speech while maintaining perceptual speech quality. The perceptual quality of the synthesised speech may also be improved using postfiltering techniques. Conventional postfiltering methods generally enhance the whole speech spectrum. This may broaden the first formant, which increases the quantisation noise for this formant. A new postfiltering technique, based on factorising the linear prediction synthesis filter, is proposed.
This provides more control over the formant bandwidths and the attenuation of the spectral valleys of speech. Key words: Pitch smoothing, speech pre-processor, postfiltering.
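The abstract does not give the new postfilter's equations. As a hedged illustration of what factorising the LP synthesis filter makes possible, the NumPy sketch below factors A(z) into conjugate pole pairs, reads off each formant's frequency and bandwidth using the standard approximation B = -ln(r) · fs / π, and rebuilds A(z) with one formant's bandwidth changed. The functions `formants` and `set_formant_bandwidth` are hypothetical names, and this per-formant control is an assumption for illustration, not Farsi's method.

```python
import numpy as np


def formants(a, fs):
    """Frequencies and bandwidths (Hz) of the resonances of 1/A(z).

    a = [1, a1, ..., ap] holds the LP coefficients of A(z) = 1 + sum_k a_k z^-k.
    np.roots(a) returns the roots of z^p + a1 z^(p-1) + ... + ap, which are
    the poles of the synthesis filter 1/A(z).
    """
    poles = np.roots(a)
    upper = poles[poles.imag > 0]               # one pole per conjugate pair
    upper = upper[np.argsort(np.angle(upper))]  # lowest formant first
    freqs = np.angle(upper) * fs / (2 * np.pi)
    bws = -np.log(np.abs(upper)) * fs / np.pi   # B = -ln(r) * fs / pi
    return freqs, bws


def set_formant_bandwidth(a, fs, k, new_bw_hz):
    """Return LP coefficients with formant k (0 = first) given a new bandwidth."""
    poles = np.roots(a)
    upper = np.flatnonzero(poles.imag > 0)
    upper = upper[np.argsort(np.angle(poles[upper]))]
    i = upper[k]
    j = np.argmin(np.abs(poles - np.conj(poles[i])))  # its conjugate partner
    radius = np.exp(-np.pi * new_bw_hz / fs)          # inverse of B = -ln(r) fs / pi
    new_pole = radius * np.exp(1j * np.angle(poles[i]))  # keep the frequency
    poles[i], poles[j] = new_pole, np.conj(new_pole)
    return np.real(np.poly(poles))                    # multiply the factors back
```

Because each resonance is handled as its own factor, the first formant can be sharpened or left alone independently of the others, which is the kind of control that a single whole-spectrum postfilter does not offer. How the thesis combines the factors into its actual postfilter is not stated in the abstract.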

6

Peng, Yong Kian. "Speech coding based on a pitch synchronous pattern recognition approach." Thesis, University of Ulster, 1998. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.245804.

7

Meh, Chu Chu. "Exploiting spatial and temporal redundancies for vector quantization of speech and images." Diss., Georgia Institute of Technology, 2015. http://hdl.handle.net/1853/54442.

Abstract:

The objective of the proposed research is to compress data such as speech, audio, and images using a new re-ordering vector quantization approach that exploits the transition probability between consecutive code vectors in a signal. Vector quantization is the process of encoding blocks of samples from a data sequence by replacing every input vector with a vector from a dictionary of reproduction vectors. Shannon's rate-distortion theory states that signals encoded as blocks of samples have better rate-distortion performance than when encoded on a sample-by-sample basis. As such, vector quantization can achieve a lower coding rate for a given distortion than scalar quantization. However, vector quantization does not take advantage of the inter-vector correlation between successive input vectors in data sequences. It has been demonstrated that real signals have significant inter-vector correlation, and this correlation has led to vector quantization approaches that encode input vectors based on previously encoded vectors. Several methods have been proposed in the literature to exploit the dependence between successive code vectors: predictive vector quantization, dynamic codebook re-ordering, and finite-state vector quantization are examples of vector quantization schemes that use inter-vector correlation. Predictive vector quantization and finite-state vector quantization predict the reproduction vector for a given input vector by using past input vectors. Dynamic codebook re-ordering vector quantization has the same reproduction vectors as standard vector quantization. The dynamic codebook re-ordering algorithm is based on the concept of re-ordering indices, whereby existing reproduction vectors are assigned new channel indices according to a structure that orders the reproduction vectors by increasing dissimilarity.
Hence, an input vector encoded by standard vector quantization is transmitted through the channel with a new index, such that 0 is assigned to the reproduction vector closest to the previous reproduction vector. Larger index values are assigned to reproduction vectors that are farther from the previous reproduction vector. Dynamic codebook re-ordering assumes that the reproduction vectors of two successive vectors of real signals are typically close to each other according to a distance metric. Sometimes, however, two successively encoded vectors may be relatively far apart. Our likelihood codebook re-ordering vector quantization algorithm exploits the structure within a signal through the non-uniformity of the reproduction vector transition probabilities in a data sequence. Reproduction vectors that have a higher probability of following the prior reproduction vector are assigned smaller indices: the code vectors most likely to follow a given vector are assigned indices closest to 0, while less likely ones are assigned higher indices. This re-ordering gives the reproduction dictionary a structure suited to entropy coding such as Huffman and arithmetic coding. Since such non-uniform transitions are common in real signals, the proposed algorithm, when combined with entropy coding algorithms such as binary arithmetic and Huffman coding, is expected to yield lower bit rates than standard vector quantization for the same distortion. The re-ordering approach on quantized indices can be useful in speech, image, and audio transmission. By applying our re-ordering approach to these data types, we expect to achieve lower coding rates for a given distortion or perceptual quality. This reduced coding rate makes the proposed algorithm useful for the transmission and storage of larger image and speech streams over their respective communication channels.
Applying truncation to the likelihood codebook re-ordering scheme yields much lower bit rates without significantly distorting the perceptual quality of the signals. Text and other multimedia signals may also benefit from this additional layer of likelihood re-ordering compression.
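A minimal Python sketch of likelihood re-ordering as the abstract describes it: transition counts between successive VQ indices are gathered from a training sequence, and every index after the first is transmitted as its rank among the code vectors that tend to follow the previous one. The table construction (raw counts, ties broken by original index) and the names `build_rank_tables`, `encode` and `decode` are assumptions; truncation and the entropy coder itself are not shown.

```python
from collections import Counter, defaultdict


def build_rank_tables(training_indices, codebook_size):
    """For each previous code vector, list all code vectors ordered by how
    often they followed it in the training sequence (most likely first)."""
    counts = defaultdict(Counter)
    for prev, cur in zip(training_indices, training_indices[1:]):
        counts[prev][cur] += 1
    tables = {}
    for prev in range(codebook_size):
        c = counts[prev]
        # unseen transitions count as 0; ties are broken by the original index
        tables[prev] = sorted(range(codebook_size), key=lambda k: (-c[k], k))
    return tables


def encode(indices, tables):
    """Send the first VQ index as-is, then each index as its rank given its predecessor."""
    out = [indices[0]]
    for prev, cur in zip(indices, indices[1:]):
        out.append(tables[prev].index(cur))
    return out


def decode(ranks, tables):
    """Invert encode(): look each rank up in the table of the last decoded index."""
    out = [ranks[0]]
    for r in ranks[1:]:
        out.append(tables[out[-1]][r])
    return out
```

For a strongly repetitive index sequence nearly every transmitted rank is 0, which is the skewed distribution a Huffman or arithmetic coder can exploit. The reproduction vectors themselves are unchanged, so distortion is identical to standard VQ; the decoder only needs the same rank tables, which would typically be trained offline and shared.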

8

Abboud, Karim. "Wideband CELP speech coding." Thesis, McGill University, 1992. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=56805.

Abstract:

The purpose of this thesis is to study the coding of wideband speech and to improve on previous Code-Excited Linear Prediction (CELP) coders in terms of speech quality and bit rate. To accomplish this task, improved coding techniques are introduced and the operating bit rate is reduced while maintaining and even enhancing the speech quality.
The first approach considers the quantization of Linear Predictive Coding (LPC) parameters and uses three-way split vector quantization. Both scalar and vector quantization are initially studied; the results show that, with adequate codebook training, vector quantization performs better while using fewer bits. Nevertheless, vector quantizers remain highly complex in terms of memory and number of computations. A new quantization scheme, split vector quantization (split VQ), is investigated to overcome this complexity problem. Using a new weighted distance measure as the selection criterion for split VQ, the average spectral distortion is significantly reduced, matching the results obtained with scalar quantizers.
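Split VQ searches several small codebooks instead of one large one. A minimal Python sketch, assuming a per-dimension weighted squared error as the selection criterion (the abstract does not specify Abboud's weighting, so the weights here are placeholders) and toy codebooks; `split_vq_encode` and `weighted_sq_error` are hypothetical names.

```python
def weighted_sq_error(x, y, w):
    """Per-dimension weighted squared error between vectors x and y."""
    return sum(wi * (xi - yi) ** 2 for xi, yi, wi in zip(x, y, w))


def split_vq_encode(x, w, splits, codebooks):
    """Quantise each sub-vector of x independently with its own codebook.

    splits:    sub-vector lengths, e.g. three lengths for a three-way split
    codebooks: one list of code vectors per sub-vector
    w:         per-dimension weights of the distance measure
    Returns the chosen index per sub-vector and the concatenated reconstruction.
    """
    indices, recon, start = [], [], 0
    for length, cb in zip(splits, codebooks):
        sub = x[start:start + length]
        sub_w = w[start:start + length]
        best = min(range(len(cb)), key=lambda i: weighted_sq_error(sub, cb[i], sub_w))
        indices.append(best)
        recon.extend(cb[best])
        start += length
    return indices, recon
```

The saving comes from the search size: with a hypothetical 24-bit budget split 8/8/8, the encoder compares 3 × 256 = 768 code vectors instead of the 2^24 (about 16.8 million) an unconstrained 24-bit VQ would need. The price is that dependencies between sub-vectors are ignored.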
The second approach introduces a new pitch predictor with increased temporal resolution for periodicity. This new technique has the advantage of maintaining the quality obtained with conventional multiple-coefficient predictors at a reduced bit rate. Furthermore, the conventional CELP noise weighting filter is modified to allow more freedom and better accuracy in modelling both tilt and formant structures. Throughout this process, different noise weighting schemes are evaluated, and the results show that the new filter contributes greatly to solving the problem of high-frequency distortion.
The final wideband CELP coder operates at 11.7 kbit/s and, using the fractional pitch predictor and the new perceptual noise weighting filter, produces reconstructed speech of high perceptual quality.

9

Streit, Juergen Stefan. "Digital image coding." Thesis, University of Southampton, 1996. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.361092.

10

Sturt, Christian. "Pitch synchronous speech coding techniques." Thesis, University of Surrey, 2003. http://epubs.surrey.ac.uk/843327/.

Abstract:

Efficient source coding techniques are necessary to make optimal use of the limited bandwidth available in mobile phone networks. Most current mobile telephone communication systems compress the speech waveform by using speech coders based on the Code Excited Linear Prediction (CELP) model. Such coders give high quality speech at bit rates of 8 kbps and above. Below 8 kbps, the quality of the coded speech degrades rapidly. At rates of 6 kbps and below, parametric speech coders offer better speech quality. These coders reduce the required bit rate by transmitting certain characteristics of the speech waveform to the decoder, rather than attempting to code the waveform itself. The disadvantage of parametric coders is that the maximum achievable quality is limited by assumptions made during the coding of the speech signal. The aim of the research presented is to investigate and eliminate the factors that limit the speech quality of parametric coders. A new pitch synchronous coding model is proposed that operates on individual pitch cycle waveforms of speech rather than longer, fixed length frames as used in classic techniques. In order to implement a pitch synchronous coder, new pitch cycle detection algorithms have been proposed. Pitch synchronous parameter analysis was investigated and several new techniques have been developed. A novel pitch synchronous split-band voicing estimator has been proposed that utilises only the phase of the speech harmonics rather than the periodicity used in traditional techniques. Fixed rate quantisation of pitch synchronous speech parameters has been investigated and a joint quantisation/interpolation scheme has been proposed. This scheme has been applied to the quantisation of the pitch synchronous parameters and has been shown to outperform traditional quantisation techniques. 
A comparison of a reference parametric coder with its pitch synchronous counterpart has shown that the pitch synchronous paradigm eliminates some of the main factors that limit the speech quality in parametric coders. It is expected that this will lead to the development of speech coders that can produce speech of higher quality than current parametric coders operating at the same bit rate. Key words: Speech Coding, Pitch Synchronous, Sinusoidal Coding, Split-Band LPC Coding.

11

Kaouri, Hussein Ali. "Speech coding using vector quantisation." Thesis, Queen's University Belfast, 1988. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.356934.

12

Kritzinger, Carl. "Low bit rate speech coding." Thesis, Stellenbosch : University of Stellenbosch, 2006. http://hdl.handle.net/10019.1/2078.

Abstract:

Thesis (MScIng (Electrical and Electronic Engineering))--University of Stellenbosch, 2006.
Despite enormous advances in digital communication, the voice is still the primary tool with which people exchange ideas. However, uncompressed digital speech tends to require prohibitively high data rates (upwards of 64 kbps), making it impractical for many applications. Speech coding is the process of reducing the data rate of digital voice to manageable levels. Parametric speech coders, or vocoders, use a priori information about the mechanism by which speech is produced to achieve extremely efficient compression of speech signals (as low as 1 kbps). The greater part of this thesis comprises an investigation into parametric speech coding. This consisted of a review of the mathematical and heuristic tools used in parametric speech coding, as well as the implementation of an accepted standard algorithm for parametric voice coding. To examine avenues of improvement for existing vocoders, we examined some of the mathematical structure underlying parametric speech coding. Following on from this, we developed a novel approach to parametric speech coding that obtained promising results under both objective and subjective evaluation. An additional contribution of this thesis is a comparative subjective evaluation of the effect of parametric speech coding on English and Xhosa speech, in which we investigated the performance of two different encoding algorithms on the two languages.
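As a worked check of the figures quoted above, assuming narrowband telephone speech sampled at 8 kHz with 8 bits per sample (the G.711 PCM format, which the abstract does not name), the uncompressed rate and the compression ratio implied by a 1 kbps vocoder are:

```latex
R = 8000~\text{samples/s} \times 8~\text{bits/sample}
  = 64\,000~\text{bit/s} = 64~\text{kbps},
\qquad
\frac{64~\text{kbps}}{1~\text{kbps}} = 64
```

Linear 16-bit sampling at the same rate would double the uncompressed figure to 128 kbps, which is why the abstract says "upwards of" 64 kbps.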

13

Burnett, I. S. "Hybrid techniques for speech coding." Thesis, University of Bath, 1992. https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.317353.

14

Chowdhury, Md Mahbubul Islam. "Image segmentation for coding." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2000. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape3/PQDD_0017/MQ55494.pdf.

15

Vasconcellos, Edmar da Costa. "Sub-band image coding." Pontifícia Universidade Católica do Rio de Janeiro, 1994. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=8635@1.

Abstract:

COORDENAÇÃO DE APERFEIÇOAMENTO DO PESSOAL DE ENSINO SUPERIOR
This work addresses the problem of image compression using subband coding (SBC). The basic structure, used in the first part of the work, is a uniform decomposition of the image into 16 subbands, with the aim of reproducing the results of Woods [1]. The components of the 16 subbands are quantised and coded, and bits are allocated among the subbands so as to minimise the mean squared error. The quantisers are designed to match a generalised Gaussian distribution, which models the subband components. In the coding process, the lowest-frequency subband is DPCM coded, while the remaining subbands are PCM coded. As an innovation, the use of the Lempel-Ziv algorithm (in its LZW form) is proposed for lossless coding (compaction) of the quantised subbands. In the compaction stage, the Huffman and LZW (a modification of LZA) algorithms are used. Simulation results are presented in terms of rate (bits/pixel) versus peak signal-to-noise ratio, and in terms of subjective assessment of the reconstructed images. The results indicate superior compression performance when the Huffman algorithm is used rather than LZW: with the subband decomposition technique, Huffman coding performs about 2 dB better. Nevertheless, given the advantages of the universality of the Lempel-Ziv algorithm, its performance when implemented differently from the way explored in this work should be investigated further.
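The abstract states that bits are allocated among the subbands to minimise mean squared error but does not give the rule. One common approach, sketched here in Python as an assumption rather than the thesis's procedure, is greedy marginal allocation under the high-rate model D_k(b) = σ_k² · 2^(-2b), in which each extra bit cuts a subband's distortion by a factor of 4. `allocate_bits` is a hypothetical name.

```python
import heapq


def allocate_bits(variances, total_bits):
    """Greedy integer bit allocation minimising the sum of subband MSEs.

    Assumes the high-rate model D_k(b) = variance_k * 2**(-2*b). Adding a bit
    to subband k removes 3/4 of its current distortion, so the largest
    reduction always comes from the subband with the largest current D_k.
    """
    bits = [0] * len(variances)
    # max-heap of current distortions (negated, because heapq is a min-heap)
    heap = [(-v, k) for k, v in enumerate(variances)]
    heapq.heapify(heap)
    for _ in range(total_bits):
        neg_d, k = heapq.heappop(heap)
        bits[k] += 1
        heapq.heappush(heap, (neg_d / 4, k))  # D_k shrinks by 2**-2 per bit
    return bits
```

With subband variances 64, 4 and 1 and a 4-bit budget, the sketch returns [3, 1, 0]: the high-variance band receives most of the bits, as expected for the lowest-frequency subband of a natural image. Quantisers matched to a generalised Gaussian distribution would follow a somewhat different distortion-rate curve, so the factor of 4 is only an approximation.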

16

Al-Naimi, Khaldoon Taha. "Advanced speech processing and coding techniques." Thesis, University of Surrey, 2002. http://epubs.surrey.ac.uk/843488/.

Abstract:

Over the past two decades there has been substantial growth in speech communications and new speech-related applications. Bandwidth constraints led researchers to investigate ways of compressing speech signals while maintaining speech quality and intelligibility, so as to increase the number of customers a given bandwidth can serve. As a result, a variety of speech coding techniques have been proposed over this period. At the heart of any speech coding method is the quantisation of the speech production model parameters that need to be transmitted to the decoder. Quantisation is a controlling factor in reaching the targeted bit rates and meeting quality requirements. The objectives of the research presented in this thesis are twofold. The first is to enable the development of a very low bit rate speech coder that maintains quality and intelligibility. This includes increasing robustness to various operating conditions, as well as enhancing the estimation and improving the quantisation of speech model parameters. The second objective is to provide a method for enhancing the performance of an existing speech-related application. The first objective is tackled with the aid of three techniques. Firstly, various novel estimation techniques are proposed such that the resulting speech production model parameters carry less redundant information and are highly correlated. This makes quantisation easier (owing to the higher correlation) and therefore saves bits. The second technique exploits the joint effect of quantising the spectral parameters (i.e. LSFs and spectral amplitudes), because of their large impact on the overall bit allocation. Work towards the first objective also includes a third technique, which enhances the estimation of a speech model parameter (namely the pitch) through a robust statistics-based post-processing (or tracking) method that operates in noise-contaminated environments.
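The abstract does not say which robust statistic the pitch post-processor uses. A running median is a common robust smoother for pitch tracks, so the Python sketch below uses it as a stand-in, assuming a frame-wise F0 track in Hz with unvoiced frames marked as 0. `smooth_pitch_track` and the window length are hypothetical.

```python
from statistics import median


def smooth_pitch_track(f0, window=5):
    """Running-median post-processing of a frame-wise pitch track (Hz).

    Unvoiced frames (value 0) are left untouched. Each voiced frame is
    replaced by the median of the voiced values in a centred window, which
    suppresses isolated outliers such as pitch doubling or halving.
    """
    half = window // 2
    out = []
    for i, value in enumerate(f0):
        if value == 0:
            out.append(0)
            continue
        neighbours = [v for v in f0[max(0, i - half): i + half + 1] if v > 0]
        out.append(median(neighbours))
    return out
```

In a track such as [100, 102, 204, 101, 99], the doubled estimate at the third frame is replaced by 101, while genuine voiced/unvoiced boundaries are preserved because unvoiced frames are never averaged in. A median, unlike a mean, is not pulled towards a single gross error, which is why it suits noise-contaminated conditions.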
Work towards the second objective focuses on an application where speech plays an important role, namely echo-canceller and noise-suppressor systems. A novel echo-canceller method is proposed which resolves most of the weaknesses present in existing echo-canceller systems and improves the system performance.

17

Zhao, David Yuheng. "Model Based Speech Enhancement and Coding." Doctoral thesis, Stockholm : Kungliga Tekniska högskolan, 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-4412.

18

Katugampala, Nilantha N. "Multimode speech coding below 6 kbps." Thesis, University of Surrey, 2001. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.365141.

Abstract:

The past two decades have witnessed a rapid expansion of the telecommunications industry. This growth has been fuelled primarily by the proliferation of digital communication systems and services that have become easily available through wired and wireless networks. Current research trends involving the integration and packetisation of voice, video and data channels into true multimedia communications promise a similar technological revolution in the next decade. The available bandwidth in wire-based terrestrial networks is a relatively cheap and expandable resource. In satellite and cellular radio systems, however, bandwidth is inherently limited and expensive. To accommodate ever-growing numbers of subscribers while maintaining high quality and low operational costs, it is essential to maximise spectral efficiency. The research presented in this thesis has focused on the development of new source compression algorithms, tailored to human speech, in order to improve the spectral efficiency of digital transmission systems. Recently there has been increasing interest in speech coding algorithms that combine various existing technologies to improve speech quality while maintaining the low transmission rates of existing coding techniques. The aim of the research presented in this thesis was to develop a complete hybrid coding algorithm that combines harmonic and waveform-approximating coding techniques. To integrate the two coding paradigms, novel phase synchronisation and classification techniques were developed. Speech synthesised using the unquantised hybrid model achieves nearly transparent perceptual quality. The hybrid model was used to develop variable bit rate coders, which are particularly advantageous for voice storage, Code Division Multiple Access (CDMA) wireless networks, packet-switched networks, and statistical multiplexing of speech for multichannel communications.

19

Green, Richard C. "Walsh based cepstra for speech coding." Thesis, King's College London (University of London), 1991. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.392848.

20

Ooi, James M. 1970. "Application of wavelets to speech coding." Thesis, Massachusetts Institute of Technology, 1993. http://hdl.handle.net/1721.1/12340.

21

Zolfaghari, Parham Seyed. "Sinusoidal model based segmental speech coding." Thesis, University of Cambridge, 2001. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.621177.

22

Andersson, Tomas. "On error-robust source coding with image coding applications." Licentiate thesis, Stockholm : Department of Signals, Sensors and Systems, Royal Institute of Technology, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-4046.

23

Bergström, Peter. "Eye-movement controlled image coding." Linköping : Univ, 2003. http://www.bibl.liu.se/liupubl/disp/disp2003/tek831s.pdf.

24

Silva, Eduardo Antonio Barros da. "Wavelet transforms for image coding." Thesis, University of Essex, 1995. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.282495.

25

Kubrick, Aharon H. "Image coding employing vector quantisation." Thesis, City University London, 1993. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.357009.

26

Morgan, Pamela Sheila. "Medical image coding and segmentation :." Thesis, University of Bristol, 2007. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.442206.

27

Desai, Ujjaval Yogesh. "Coding of segmented image sequences." Thesis, Massachusetts Institute of Technology, 1994. http://hdl.handle.net/1721.1/11984.

Abstract:

Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1994.
Includes bibliographical references (leaves 72-74).
by Ujjaval Yogesh.
M.Eng.

28

Frajka, Tamás. "Image coding subject to constraints." Diss., 2003. http://wwwlib.umi.com/cr/ucsd/fullcit?p3090437.

29

Batri, Nadim. "Robust spectral parameter coding in speech processing." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1998. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape11/PQDD_0005/MQ43996.pdf.

30

Asenstorfer, John A. "Source-channel coding for CELP speech coders." Title page, contents and abstract only, 1994. http://web4.library.adelaide.edu.au/theses/09PH/09pha816.pdf.

31

Soong, Michael. "Predictive split vector quantization for speech coding." Thesis, McGill University, 1994. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=68054.

Abstract:

The purpose of this thesis is to examine techniques for efficiently coding speech Linear Predictive Coding (LPC) coefficients. Vector Quantization (VQ) is an efficient approach to encoding speech at low bit rates. However, its exponentially growing complexity poses a formidable barrier, so a structured vector quantizer is normally used instead.
Summation Product Codes (SPCs) are a family of structured vector quantizers that circumvent the complexity obstacle. The performance of SPC vector quantizers can be traded off against their storage and encoding complexity. Besides these complexity factors, the design algorithm also affects the performance of the quantizer. The conventional generalized Lloyd algorithm (GLA) generates sub-optimal codebooks. For particular SPCs, such as multistage VQ, the GLA is applied to design the stage codebooks stage by stage. Joint design algorithms, on the other hand, update all the stage codebooks simultaneously.
In this thesis, a general formulation of, and an algorithmic solution to, the joint codebook design problem are provided for SPCs. The key to this algorithm is that every product code (PC) has a reference product codebook which minimizes the overall distortion. The joint design algorithm is tested with a novel SPC, namely "Predictive Split VQ (PSVQ)".
VQ of speech Line Spectral Frequencies (LSFs) using PSVQ is also presented. One result of this work is that PSVQ, designed using the joint codebook design algorithm, requires only 20 bits/frame (20 ms) for transparent coding of 10th-order LSF parameters.
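
As a point of reference for the GLA mentioned above, here is a minimal sketch of the plain generalized Lloyd algorithm for a single unstructured codebook under squared-error distortion. It alternates nearest-neighbour partitioning of the training set with centroid updates until the distortion stops falling. The function names and the naive initialisation from the first `size` training vectors are assumptions for illustration. The joint SPC design algorithm of the thesis is not reproduced here.

```python
def squared_error(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def nearest(codebook, x):
    """Index of the codevector closest to x."""
    return min(range(len(codebook)), key=lambda i: squared_error(codebook[i], x))


def average_distortion(codebook, data):
    return sum(squared_error(codebook[nearest(codebook, x)], x) for x in data) / len(data)


def generalized_lloyd(data, size, iterations=20, tol=1e-9):
    """Design a codebook of `size` vectors from training `data`.

    Each iteration applies the two Lloyd optimality conditions:
    1. nearest-neighbour partition of the training set;
    2. replace each codevector by the centroid of its cell.
    Stops early when the relative drop in distortion falls below `tol`.
    """
    codebook = [list(v) for v in data[:size]]  # naive initialisation (assumed)
    prev = float("inf")
    for _ in range(iterations):
        cells = [[] for _ in codebook]
        for x in data:
            cells[nearest(codebook, x)].append(x)
        for i, cell in enumerate(cells):
            if cell:  # an empty cell keeps its old codevector
                dim = len(cell[0])
                codebook[i] = [sum(v[d] for v in cell) / len(cell) for d in range(dim)]
        dist = average_distortion(codebook, data)
        if prev - dist <= tol * max(dist, 1e-12):
            break
        prev = dist
    return codebook
```

For a multistage VQ designed stage by stage, the same routine would be run on the residuals left by the previous stage. That is the sequential approach the abstract contrasts with joint design.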

32

Grass, John. "Quantization of predictor coefficients in speech coding." Thesis, McGill University, 1990. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=60067.

Abstract:

This thesis examines techniques of efficiently coding Linear Predictive Coding (LPC) coefficients with 20 to 30 bits per 20 ms speech frame.
Scalar quantization is the first approach evaluated. Results show that Line Spectral Frequencies require significantly fewer bits than reflection coefficients for comparable performance. The second approach investigated is vector-scalar quantization. In the first stage, vector quantization is performed. The second stage consists of a bank of scalar quantizers which code the errors between the original LPC coefficients and the corresponding components of the quantized vector.
The first innovation is to couple the vector and scalar quantization stages: every codebook vector is compared with the original LPC coefficient vector to produce error vectors. The second innovation in vector-scalar quantization is the incorporation of a small adaptive codebook alongside the large fixed codebook. This exploits the frame-to-frame correlation of the LPC coefficients at no extra cost in bits.
The performance of the vector-scalar quantization using the two new techniques is better than that of the scalar coding techniques currently used in conventional LPC coders.
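
The two-stage structure described above can be sketched as follows. A vector quantizer picks a codevector, then a uniform scalar quantizer codes each component of the remaining error. The `coupled` flag shows one plausible reading of the coupling idea: try every codevector and keep the one with the smallest final distortion. The uniform quantizer, its `step` size and all names are assumptions, and the adaptive codebook is not modelled.

```python
def uniform_quantize(value, step):
    """Mid-tread uniform scalar quantizer: returns the integer index."""
    return round(value / step)


def encode_vsq(x, codebook, step, coupled=True):
    """Two-stage vector-scalar quantization of vector x.

    Stage 1 picks a codevector c; stage 2 scalar-quantizes each component
    of the error x - c. With coupled=True every codevector is tried and the
    one giving the smallest *final* distortion is kept; otherwise the
    nearest codevector is fixed first and only its error is quantized.
    Returns (vq_index, scalar_indices).
    """
    def stage2(c):
        idx = [uniform_quantize(a - b, step) for a, b in zip(x, c)]
        recon = [b + k * step for b, k in zip(c, idx)]
        err = sum((a - r) ** 2 for a, r in zip(x, recon))
        return idx, err

    if coupled:
        candidates = range(len(codebook))
    else:
        candidates = [min(range(len(codebook)),
                          key=lambda i: sum((a - b) ** 2 for a, b in zip(x, codebook[i])))]
    best = min(((i,) + stage2(codebook[i]) for i in candidates), key=lambda t: t[2])
    return best[0], best[1]


def decode_vsq(vq_index, scalar_indices, codebook, step):
    """Codevector plus the dequantized per-component corrections."""
    c = codebook[vq_index]
    return [b + k * step for b, k in zip(c, scalar_indices)]
```

The coupled search considers every codevector, including the nearest one, so its final distortion can never exceed that of the decoupled search. The cost is running the scalar stage once per codevector.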

33

Maroun, Nabih. "Toll-quality speech coding at 8 kbs." Thesis, McGill University, 1993. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=56802.

Abstract:

There has been an ongoing effort to achieve very high quality speech coding at medium transmission bit rates. In this context, the TIA chose the Vector Sum Excited Linear Prediction (VSELP) implementation of an 8 kb/s coder as the standard for North American cellular digital telephony. However, only recently, in view of the increased research focus on toll-quality speech coding at such bit rates, has the CCITT set out specifications for standardizing low-delay coders operating at 8 kb/s. The Low-Delay Code-Excited Linear Prediction (LD-CELP) coder proposed by Chen is presently the only potential candidate for CCITT standardization, achieving a one-way coding delay of 10 ms. However, like the VSELP coding algorithm, the 8 kb/s version of LD-CELP does not quite yield toll-quality reconstructed speech. The work in this thesis aims to establish the minimum requirements for a coding structure capable of generating toll-quality coded speech at 8 kb/s. It shows that, by slightly relaxing the coding-delay constraint, perceptual enhancement techniques yield toll-quality coding once the optimization and quantization procedures of a CELP coder are redesigned and fine-tuned.
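
The abstract does not name its perceptual enhancement techniques. One widely used ingredient of CELP coders is a perceptual weighting filter W(z) = A(z/γ1)/A(z/γ2), which weights coding error less in formant regions during codebook search. The sketch below implements that filter with zero initial state. The γ defaults are illustrative assumptions, and nothing here should be read as the thesis's method.

```python
def bandwidth_expand(lpc, gamma):
    """Coefficients of A(z/gamma): a_i -> a_i * gamma**i."""
    return [a * gamma ** i for i, a in enumerate(lpc, start=1)]


def perceptual_weighting(signal, lpc, gamma1=0.9, gamma2=0.6):
    """Filter `signal` through W(z) = A(z/gamma1) / A(z/gamma2), zero state.

    A(z) = 1 + sum_{i>=1} lpc[i-1] * z^-i. The default gammas are assumed
    illustrative values, not taken from the thesis.
    """
    num = bandwidth_expand(lpc, gamma1)  # FIR part (numerator)
    den = bandwidth_expand(lpc, gamma2)  # all-pole part (denominator)
    out = []
    for n, x in enumerate(signal):
        y = x
        for i, b in enumerate(num, start=1):
            if n - i >= 0:
                y += b * signal[n - i]
        for i, a in enumerate(den, start=1):
            if n - i >= 0:
                y -= a * out[n - i]
        out.append(y)
    return out
```

Setting `gamma1 == gamma2` makes W(z) the identity, which is a convenient sanity check.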

34

Suddle, Muhammad Riaz. "Speech coding in private and broadcast networks." Thesis, University of Surrey, 1996. http://epubs.surrey.ac.uk/1019/.

35

Oberhofer, Robert. "Pitch adaptive variable bitrate CELP speech coding." Thesis, University of Ulster, 1998. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.264811.

36

Thorpe, T. F. "Performance bounds for digital coding of speech." Thesis, University of Cambridge, 1987. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.234070.

37

Gant, Nicolas Roland Noel. "The linear predictive coding of mask speech." Thesis, University of Southampton, 1986. https://eprints.soton.ac.uk/52261/.

38

Deloche, François. "Short time-scale efficient coding of speech." Thesis, Paris, EHESS, 2019. http://www.theses.fr/2019EHES0142.

Abstract:

Cochlear frequency selectivity is known to reflect the overall statistical structure of speech, in line with the hypothesis that low-level sensory processing provides efficient codes for the information contained in natural stimuli. Speech signals, however, possess a complex structure, even on short time scales, as a result of the diversity of acoustic factors involved in the generation of speech. This rich structure means that advanced coding schemes based on a nonlinear representation of speech sounds could provide more efficient codes. The first step in finding efficient strategies is to describe the statistical structure of speech at a fine level: at the level of phonemes, or even finer, at the level of acoustic events. In this thesis, I use a parametric approach to explore the fine-grained statistical structure of speech. The goal of this method is to find the sparsest representation of speech sounds among a family of dictionaries of Gabor filters whose frequency selectivity follows different power laws in the high-frequency range 1-8 kHz. I motivate the use of Gabor filters in the search for sparse time-frequency representations of speech signals, and I show that the dictionary method has a formal link with previous work based on Independent Component Analysis (ICA). The acoustic factors that affect the power law associated with the sparsest decomposition can be inferred from analyses of synthetic and real data. The results suggest that an efficient speech coding strategy is to reduce frequency selectivity with sound intensity level, reflecting the nonlinear behavior of the cochlea.
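
The dictionary search described above can be illustrated, very loosely, with greedy matching pursuit over Gabor atoms whose bandwidth follows a power law of centre frequency. The reference bandwidth `b_ref`, the reference frequency `f_ref`, the time `hop`, the envelope-width convention and the choice of matching pursuit as the sparse solver are all assumptions for this sketch. The thesis's actual decomposition and sparsity criterion may differ.

```python
import math


def gabor_atom(n, fs, center, freq, bandwidth):
    """Unit-norm Gabor atom of n samples.

    center: time centre in samples; freq: carrier in Hz; bandwidth: assumed
    effective bandwidth in Hz, giving a Gaussian spread of fs/bandwidth samples.
    """
    spread = fs / bandwidth
    atom = [
        math.exp(-math.pi * ((t - center) / spread) ** 2)
        * math.cos(2 * math.pi * freq * (t - center) / fs)
        for t in range(n)
    ]
    norm = math.sqrt(sum(a * a for a in atom))
    return [a / norm for a in atom]


def power_law_dictionary(n, fs, freqs, beta, b_ref=200.0, f_ref=1000.0, hop=16):
    """Atoms labelled (freq, centre) with bandwidth = b_ref * (f / f_ref) ** beta."""
    atoms = []
    for f in freqs:
        bw = b_ref * (f / f_ref) ** beta
        for c in range(0, n, hop):
            atoms.append(((f, c), gabor_atom(n, fs, c, f, bw)))
    return atoms


def matching_pursuit(signal, dictionary, n_atoms):
    """Greedy sparse decomposition: returns ([(label, coef)], residual)."""
    residual = list(signal)
    picks = []
    for _ in range(n_atoms):
        best_label, best_atom, best_coef = None, None, 0.0
        for label, atom in dictionary:
            coef = sum(r * a for r, a in zip(residual, atom))
            if abs(coef) > abs(best_coef):
                best_label, best_atom, best_coef = label, atom, coef
        if best_atom is None:  # residual orthogonal to every atom
            break
        residual = [r - best_coef * a for r, a in zip(residual, best_atom)]
        picks.append((best_label, best_coef))
    return picks, residual
```

Dictionaries can be built for several `beta` values. Counting how many atoms each one needs to reach a target residual energy then gives a crude sparsity comparison.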

39

Hoyle, Robert D. (Robert Douglas). "Digital speech coding for land mobile radio." Dissertation (Electrical Engineering), Carleton University, Ottawa, 1986.

40

Mason, Michael. "Hybrid coding of speech and audio signals." Thesis, Queensland University of Technology, 2001.

41

Chaiyaboonthanit, Thanit. "Image coding using wavelet transform and adaptive block truncation coding." Online version of thesis, 1991. http://hdl.handle.net/1850/10913.
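
The title names adaptive block truncation coding (BTC). For orientation, the sketch below shows only classic moment-preserving BTC on one block. Each pixel is reduced to one bit (at or above the block mean, or below it), and the decoder picks two output levels that preserve the block's mean and standard deviation. The adaptive variant and the wavelet stage studied in the thesis are not modelled, and the function names are hypothetical.

```python
import math


def btc_encode(block):
    """Encode a flat list of pixel values as (mean, std, bitmap)."""
    m = len(block)
    mean = sum(block) / m
    std = math.sqrt(sum((x - mean) ** 2 for x in block) / m)
    bitmap = [1 if x >= mean else 0 for x in block]
    return mean, std, bitmap


def btc_decode(mean, std, bitmap):
    """Rebuild the block with two levels that preserve its mean and variance."""
    m = len(bitmap)
    q = sum(bitmap)  # number of pixels at or above the mean
    if q in (0, m):
        return [mean] * m  # flat block: a single level suffices
    low = mean - std * math.sqrt(q / (m - q))
    high = mean + std * math.sqrt((m - q) / q)
    return [high if bit else low for bit in bitmap]
```

If the mean and standard deviation were each sent with 8 bits (an assumption), a 4×4 block would cost 16 + 8 + 8 = 32 bits, i.e. 2 bits/pixel.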

42

Leong, Michael. "Representing voiced speech using prototype waveform interpolation for low-rate speech coding." Thesis, McGill University, 1992. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=56796.

Abstract:

In recent years, research in narrow-band digital speech coding has produced good-quality speech coders at low rates of 4.8 to 8.0 kb/s. This thesis examines the method proposed by W. B. Kleijn, called prototype waveform interpolation (PWI), for coding the voiced sections of speech efficiently. The aim is a coder below 4.8 kb/s that maintains, or even improves, the perceptual quality of current coders.
In examining the PWI method, it was found that although the method generally works very well, there are occasional sections of the reconstructed voiced speech where audible distortion can be heard, even when the prototypes are not quantized. The research undertaken in this thesis focuses on the fundamental principles behind modelling voiced speech using PWI, rather than on bit allocation for encoding the prototypes. It identifies problems in the PWI method that might have been mistaken for encoding error had full encoding been implemented.
Kleijn uses PWI to represent voiced sections of the excitation signal, which is the residual obtained after short-term redundancies are removed by a linear predictive filter. The problem with this method is that, when the PWI-reconstructed excitation is passed through the inverse filter to synthesize the speech, undesired effects occur due to the time-varying nature of the filter. The reconstructed speech may have undesired envelope variations, which result in audible warble.
This thesis proposes an energy fixup to smooth the synthesized speech envelope when the interpolation procedure fails to provide the desired smooth, linear result. Further investigation, however, leads to the final proposal of this thesis: PWI should be performed on the clean speech signal instead of the excitation to achieve consistently reliable results for all voiced frames.
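
Prototype waveform interpolation, as described above, reconstructs voiced speech by interpolating between pitch-cycle prototypes taken at frame boundaries. A minimal sketch follows. It assumes each prototype is stored as a short list of harmonic cosine/sine coefficients, linearly interpolates those coefficients and the pitch period across the frame, and accumulates phase sample by sample. This representation and all names are hypothetical. Prototype extraction, alignment, quantization and the energy fixup proposed in the thesis are omitted.

```python
import math


def pwi_synthesize(proto0, proto1, period0, period1, n_samples, phase0=0.0):
    """Interpolate between two prototype pitch cycles over n_samples.

    proto0/proto1: lists of (a_k, b_k) cosine/sine coefficients for
    harmonics k = 1..K (same K for both); period0/period1: pitch periods in
    samples at the start and end of the frame. Returns (samples, end_phase).
    """
    if len(proto0) != len(proto1):
        raise ValueError("prototypes must have the same number of harmonics")
    out = []
    phase = phase0
    for n in range(n_samples):
        w = n / n_samples  # interpolation weight: 0 at the frame start
        period = (1 - w) * period0 + w * period1
        value = 0.0
        for k, ((a0, b0), (a1, b1)) in enumerate(zip(proto0, proto1), start=1):
            a = (1 - w) * a0 + w * a1
            b = (1 - w) * b0 + w * b1
            value += a * math.cos(k * phase) + b * math.sin(k * phase)
        out.append(value)
        phase += 2 * math.pi / period  # advance one sample at the current pitch
    return out, phase % (2 * math.pi)
```

Passing the returned phase as `phase0` for the next frame keeps the synthesized waveform phase-continuous across frame boundaries.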

43

Varga, A. P. "Multipulse excited linear predictive analysis in speech coding and constructive speech synthesis." Thesis, University of Cambridge, 1985. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.372909.

44

Accardi, Anthony J. (Anthony Joseph) 1976. "A modular approach to speech enhancement with an application to speech coding." Thesis, Massachusetts Institute of Technology, 1998. http://hdl.handle.net/1721.1/9976.

Abstract:

Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science; and, Thesis (B.S.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1998.
Includes bibliographical references (p. 98-101).
by Anthony J. Accardi.
B.S.
M.Eng.

45

Greenwood, Andrew Richard. "Articulatory speech synthesis." Thesis, University of Liverpool, 1993. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.386773.

46

Islam, Tamanna. "Interpolation of linear prediction coefficients for speech coding." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2000. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape4/PQDD_0034/MQ64229.pdf.

47

Trinkaus, Trevor R. "Perceptual coding of audio and diverse speech signals." Diss., Georgia Institute of Technology, 1999. http://hdl.handle.net/1853/13883.

48

Loo, James H. Y. (James Hung Yan). "Intraframe and interframe coding of speech spectral parameters." Thesis, McGill University, 1996. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=24065.

Abstract:

Most low bit rate speech coders employ linear predictive coding (LPC) which models the short-term spectral information within each speech frame as an all-pole filter. In this thesis, we examine various methods that can efficiently encode spectral parameters for every 20 ms frame interval. Line spectral frequencies (LSF) are found to be the most effective parametric representation for spectral coding. Product code vector quantization (VQ) techniques such as split VQ (SVQ) and multi-stage VQ (MSVQ) are employed in intraframe spectral coding, where each frame vector is encoded independently from other frames. Depending on the product code structure, "transparent coding" quality is achieved for SVQ at 26-28 bits/frame and for MSVQ at 25-27 bits/frame.
Because speech is quasi-stationary, interframe coding methods such as predictive SVQ (PSVQ) can exploit the correlation between adjacent LSF vectors. Nonlinear PSVQ (NPSVQ) is introduced, in which a nonparametric, nonlinear predictor replaces the linear predictor used in PSVQ. Regardless of predictor type, PSVQ achieves a performance gain of 5-7 bits/frame over SVQ. By interleaving intraframe SVQ with PSVQ, error propagation is limited to at most one adjacent frame. At an overall bit rate of about 21 bits/frame, NPSVQ can provide coding quality similar to that of intraframe SVQ at 24 bits/frame (an average gain of 3 bits/frame). The particular form of nonlinear prediction we use incurs virtually no additional computational complexity at the encoder. Voicing classification is used in classified NPSVQ (CNPSVQ) to obtain an additional average gain of 1 bit/frame for unvoiced frames. Furthermore, switched-adaptive predictive SVQ (SA-PSVQ) provides an improvement of 1 bit/frame over PSVQ, or 6-8 bits/frame over SVQ, but error propagation increases to 3-7 frames. We have verified our comparative performance results using subjective listening tests.
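
Interframe predictive split VQ, as summarised above, predicts the current LSF vector from the previous reconstructed one and split-quantizes only the prediction residual. The sketch below assumes a fixed first-order predictor coefficient `alpha` applied around a long-term mean vector, plus toy per-split codebooks. None of these values or names come from the thesis, and the nonlinear predictor (NPSVQ) is not modelled.

```python
def nearest_index(codebook, x):
    """Index of the codevector closest to x (squared error)."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], x)))


class PredictiveSplitVQ:
    """First-order predictive split VQ of spectral parameter vectors.

    mean: long-term mean vector; alpha: assumed predictor coefficient;
    splits: list of (start, end) index ranges; codebooks: one list of
    residual codevectors per split. Encoder and decoder must run the same
    predictor state so their reconstructions stay in step.
    """

    def __init__(self, mean, alpha, splits, codebooks):
        self.mean, self.alpha = list(mean), alpha
        self.splits, self.codebooks = splits, codebooks
        self.prev = list(mean)  # previous reconstructed vector

    def _predict(self):
        return [m + self.alpha * (p - m) for m, p in zip(self.mean, self.prev)]

    def _reconstruct(self, pred, indices):
        out = list(pred)
        for (s, e), cb, i in zip(self.splits, self.codebooks, indices):
            for d in range(s, e):
                out[d] += cb[i][d - s]
        return out

    def encode(self, x):
        """Return one codebook index per split and update the predictor state."""
        pred = self._predict()
        resid = [a - b for a, b in zip(x, pred)]
        indices = [nearest_index(cb, resid[s:e])
                   for (s, e), cb in zip(self.splits, self.codebooks)]
        self.prev = self._reconstruct(pred, indices)
        return indices

    def decode(self, indices):
        """Rebuild a vector from split indices and update the predictor state."""
        self.prev = self._reconstruct(self._predict(), indices)
        return list(self.prev)
```

Because the predictor runs on reconstructed rather than original vectors, a channel error corrupts the decoder state until it resynchronises. This is the error-propagation issue the abstract addresses by interleaving intraframe SVQ frames.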

49

Ramachandran, Ravi P. "Pitch filtering in adaptive predictive coding of speech." Thesis, McGill University, 1986. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=65345.

50

Roy, Guylain. "Low-rate analysis-by-synthesis wideband speech coding." Thesis, McGill University, 1990. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=59643.

Abstract:

This thesis studies low-rate wideband analysis-by-synthesis speech coders. The wideband speech signals have a bandwidth of up to 8 kHz and are sampled at 16 kHz, while the target operating bit rate is 16 kbits/sec. Applications for such a coder range from high-quality voice-mail services to teleconferencing. In order to achieve a low operating rate, the coding places more emphasis on the lower frequencies (0 to 4 kHz), while the higher frequencies (4 to 8 kHz) are coded less precisely but with little perceived degradation.
The study consists of three stages. First, aspects of wideband spectral envelope modeling using Line Spectral Frequencies (LSFs) are studied. Then, the underlying coder structure is derived from a basic Residual Excited Linear Predictive (RELP) coder. This structure is enhanced by the addition of a pitch prediction stage and by the development of full-band and split-band pitch parameter optimization procedures. These procedures are then applied to a Code Excited Linear Prediction (CELP) model. Finally, the performance of the full-band and split-band CELP structures is compared.
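
The analysis-by-synthesis principle behind the CELP structures mentioned above can be sketched as an exhaustive fixed-codebook search. Each candidate excitation is passed through the all-pole LPC synthesis filter 1/A(z), the optimal gain is computed in closed form, and the candidate with the smallest squared error is kept. The sketch assumes zero initial filter state, no perceptual weighting and no pitch (adaptive-codebook) contribution, and its function names are hypothetical. It is not the split-band procedure studied in the thesis.

```python
def synthesize(excitation, lpc):
    """All-pole filter 1/A(z) with A(z) = 1 + sum_{i>=1} lpc[i-1] z^-i, zero state."""
    out = []
    for n, e in enumerate(excitation):
        y = e
        for i, a in enumerate(lpc, start=1):
            if n - i >= 0:
                y -= a * out[n - i]
        out.append(y)
    return out


def celp_search(target, codebook, lpc):
    """Return (index, gain, error) of the best codevector for `target`.

    For a filtered candidate y, the gain minimising ||target - g*y||^2 is
    g = <target, y> / <y, y>, leaving error ||target||^2 - <target, y>^2 / <y, y>.
    """
    energy = sum(t * t for t in target)
    best = (None, 0.0, energy)
    for idx, code in enumerate(codebook):
        y = synthesize(code, lpc)
        cross = sum(t * v for t, v in zip(target, y))
        power = sum(v * v for v in y)
        if power == 0.0:
            continue  # an all-zero candidate cannot reduce the error
        err = energy - cross * cross / power
        if err < best[2]:
            best = (idx, cross / power, err)
    return best
```

With the closed-form gain, one pass over the codebook suffices. CELP coders typically add perceptual weighting and an adaptive (pitch) codebook on top of this basic search.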
