Dissertations / Theses on the topic 'Vocoders'

Author: Grafiati

Published: 4 June 2021

Last updated: 30 January 2023

Consult the top 50 dissertations / theses for your research on the topic 'Vocoders.'

1

Crossman, A. H. "Multipulse-excitation applied to vocoders." Thesis, University of Cambridge, 1987. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.232981.

Abstract:

Multipulse-excitation has greatly improved the speech quality achievable from linear predictive coders, which previously required speech to be classified as voiced or unvoiced for excitation purposes. Multipulse removes the need for voicing classification, improving speech quality by enhancing the excitation and offsetting errors in the vocal tract filter. An investigation of multipulse-excitation applied to a channel vocoder and a formant synthesiser was conducted. The prime objective was to improve the performance of these algorithms and achieve multipulse linear prediction speech quality, our target quality. This dissertation outlines and restates the idea of multipulse-excitation applied to a linear predictive vocoder. We then examine a high quality channel vocoder and formant synthesiser, and the use of multipulse-excitation to improve their performance. In each case time and frequency domain multipulse algorithms were used. Various modifications were made to these algorithms in order to accommodate multipulse-excitation and improve the overall speech quality. In the case of the channel vocoder this involved a novel technique, which sacrificed the inherent waveform preserving properties of the multipulse algorithm. Only by increasing both the pulse rate and the number of channels could the multipulse-excited channel vocoder achieve our target quality. With the formant synthesiser it was possible, by variation of the pulse rate alone, to achieve our target quality. Comparisons are drawn between the three multipulse algorithms and reasons given for their differing performance; this is substantiated by experimental results. These results suggested interesting improvements to the multipulse-excited formant synthesiser, and also hinted at a novel technique for formant tracking, using multipulse-excitation applied to a formant synthesiser.

2

Ma, Wei. "Multi-band excitation based vocoders and their real-time implementation." Thesis, University of Surrey, 1994. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.240182.

3

Fang, Jie. "Design of secure speech encryption systems." Thesis, Queensland University of Technology, 1990. https://eprints.qut.edu.au/36471/1/36471_Fang_1990.pdf.

Abstract:

This thesis investigates the design of digital speech encryption systems based on low bit rate vocoders. The speech quality and the cryptographic strength of the system are determined by the vocoder and the encryptor respectively. Three different low bit rate vocoders have been simulated: a 2400 bps LPC (Linear Prediction Coding) vocoder, a 9600 bps MELPC (Multipulse Excited Linear Prediction Coding) vocoder and a 4800 bps CELP (Codebook Excited Linear Prediction Coding) vocoder. The performance of these vocoders is evaluated using four objective measures. The thesis considers the following aspects of a digital encryption system: security, speech quality, robustness, and system delay. Several choices of cryptosystem for the encryption of digital speech are investigated, and the performance of the overall system is discussed. The work presented in this thesis enables a secure communication system designer to select a speech coding scheme and a cipher system that meet the required level of security and speech quality. Throughout this thesis, the design of encryption systems refers to mathematical analysis and simulation of such systems rather than the actual construction of electronic circuits.

4

Bliūdžius, Mindaugas. "Skaitmeninių kalbos įrašų glaudinimo metodai." Master's thesis, Lithuanian Academic Libraries Network (LABT), 2004. http://vddb.library.lt/obj/LT-eLABa-0001:E.02~2004~D_20040529_122424-17577.

Abstract:

The past three decades have witnessed substantial progress towards the application of low-rate speech coders to civilian and military communications as well as computer-related voice applications. Central to this progress has been the development of new speech coders capable of producing high-quality speech at low data rates. Most of these coders incorporate mechanisms to represent the spectral properties of speech, provide for speech waveform matching, and "optimize" the coder's performance for the human ear. A number of these coders have already been adopted in national and international cellular telephony standards. The objective of this paper is to provide a tutorial overview of speech coding methodologies, with emphasis on those algorithms that are part of the recent low-rate standards for voice applications. Although the emphasis is on the new low-rate coders, it attempts to provide a comprehensive survey by covering some of the traditional methodologies as well. The paper starts with a historical perspective and continues with a brief discussion of speech properties and performance measures. It then describes waveform coders, linear predictive vocoders, and analysis-by-synthesis linear predictive coders. Finally, a system for computer-based stenography is presented, together with an assessment of its quality and ways to improve it.

5

Chmayssani, Toufic. "Modulation sur les canaux vocodés." Phd thesis, Université Paris-Est, 2010. http://tel.archives-ouvertes.fr/tel-00587629.

Abstract:

Vocoded channels are communication channels dedicated to voice, in which the signal passes through various equipment designed to carry voice, such as speech coders, voice activity detectors (VAD) and discontinuous transmission (DTX) systems. They may be wired or mobile telephone systems (2G/3G cellular networks, INMARSAT satellites, and so on) or voice over IP. The speech coders in recent standards for mobile telephony networks or voice over IP rely on compression algorithms derived from the CELP (Code Excited Linear Prediction) technique, which reach bit rates on the order of ten kbit/s, well below those of the coders used in wired telephone networks (typically 64 or 32 kbit/s). These coders owe their efficiency to characteristics specific to speech signals and to human hearing. As a result, signals other than speech are generally heavily distorted by them. Transmitting data over vocoded channels can be attractive because channels dedicated to voice are widely available and because the communication can be kept discreet (security). However, the modulated signal sent over these channels is subject to the degradations caused by the speech coders, which constrains the type of modulation that can be used. This thesis addressed the design and evaluation of modulations that enable data transmission over vocoded channels. Two modulation approaches were proposed, for applications with quite different achievable bit rates. The main application targeted by the thesis is the transmission of encrypted speech, in which the speech signal is digitized, compressed at a low bit rate by a speech coder and then secured by an encryption algorithm.
For this application, we focused on communication networks using CELP coders with bit rates above about ten kbit/s, typically second- or third-generation mobile channels. The first proposed modulation approach addresses this application. It uses digital modulations whose parameters are optimized to account for the constraints imposed by the channel and to allow bit rates and error probabilities compatible with encrypted speech transmission (typically a rate above 1200 bit/s with a BER on the order of 10^-3). We showed that optimized QPSK modulation achieves this performance. A synchronization system is also studied and adapted to the needs and constraints of the vocoded channel. The performance achieved by QPSK modulation with the proposed synchronization system, and the quality of the secured speech transmitted, were evaluated by simulation and validated experimentally on a real GSM channel using a test bench developed in the thesis.
The second modulation approach favoured robustness of the modulated signal when transmitted through any speech coder, even a low-rate coder such as the MELP coders at 2400 or 1200 bit/s. To this end, we proposed a modulation built by concatenating segments of natural speech, combined with a demodulation technique that segments the received signal and identifies the speech segments by dynamic programming with a high recognition rate. This modulation was evaluated by simulation on different speech coders and was also tested on real GSM channels. The results show a very low error probability whatever the vocoded channel and the bit rate of the speech coders used, but at relatively low achievable data rates. The foreseeable applications are restricted to rates typically below 200 bit/s.
Finally, we examined voice activity detectors, whose effect can be very damaging to data signals. We proposed a method to counter the VADs used in GSM networks. Its principle is to break the stationarity of the modulated signal's spectrum, the stationarity on which the VAD relies to decide that the signal is not speech.

6

LeBlanc, Wilfrid P. "An advanced speech coder based on a rate-distortion theory framework." Dissertation, Electrical Engineering, Carleton University, Ottawa, 1988.

7

Griffin, Daniel W. "Multi-band excitation vocoder." Thesis, Massachusetts Institute of Technology, 1987. http://hdl.handle.net/1721.1/14803.

8

Martins, José Antônio. "Vocoder LPC com quantização vetorial." [s.n.], 1991. http://repositorio.unicamp.br/jspui/handle/REPOSIP/261389.

Abstract:

Advisor: Fabio Violaro. Dissertation (master's), Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica, 1991. Degree: Mestre em Engenharia Elétrica (Electronics and Communications).

This work describes the principles of the LPC vocoder and presents the methods for computing its parameters. It also presents the results of simulations of LPC vocoders using scalar quantization, vector quantization and interpolation of the quantized parameters. An unquantized LPC vocoder was designed first and served as the reference for evaluating the quantized vocoders. Using scalar quantization of the log-area ratio coefficients, a vocoder at a rate of 2200 bit/s was obtained, ensuring good quality and high intelligibility of the synthesized speech. With vector quantization, good performance was obtained at rates on the order of 1000 bit/s. These rates were reduced by 50% using linear interpolation, transmitting only the parameters of the odd-numbered frames. Vocoders with rates around 500 bit/s were thus obtained, whose synthesized speech is degraded relative to the previous systems but still ensures good intelligibility.
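
The halving of the rate described in this abstract can be illustrated with a minimal sketch. It assumes the decoder receives only every other frame and rebuilds each skipped frame as the midpoint of its two transmitted neighbours; the function name and the list-of-floats frame layout are hypothetical, and the thesis may interpolate in a different parameter domain.

```python
def interpolate_skipped_frames(sent_frames):
    """Rebuild a full frame sequence from every-other-frame transmission.

    sent_frames: list of parameter vectors (lists of floats), one per
    transmitted frame. Each skipped frame is reconstructed as the
    element-wise average of the transmitted frames on either side.
    """
    if not sent_frames:
        return []
    full = []
    for current, following in zip(sent_frames, sent_frames[1:]):
        full.append(current)
        # Linear interpolation at the midpoint between two received frames.
        full.append([(a + b) / 2 for a, b in zip(current, following)])
    full.append(sent_frames[-1])
    return full


# Sending half the frames halves the parameter rate: about 1000 bit/s -> 500 bit/s.
rate_after_interpolation = 1000 * 0.5
```

Because only half of the frames carry bits, the cost of the scheme is the interpolation error on the skipped frames, which is consistent with the degradation the abstract reports at around 500 bit/s.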

9

Vávra, Jakub. "Šifrování telefonních hovorů." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2008. http://www.nusl.cz/ntk/nusl-235954.

Abstract:

This master's thesis covers the design and implementation of land-line phone call encryption using FITkit. The ultimate goal is to find suitable compression and encryption methods, implement or adapt them for the FITkit board, and create a functional solution.

10

Hudson, Nicholaus D. W. "The self-excited vocoder for mobile telephony." Thesis, University of Bath, 1992. https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.760629.

11

Moore, James Thomas. "A mixed excitation vocoder with fuzzy logic classifier." Thesis, Monterey, California. Naval Postgraduate School, 1992. http://hdl.handle.net/10945/23960.

12

Foley, Jeffrey J. (Jeffrey Joseph). "Digital implementation of a frequency-lowering channel vocoder." Thesis, Massachusetts Institute of Technology, 1996. http://hdl.handle.net/1721.1/38798.

Abstract:

Thesis (M.Eng.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1996. Includes bibliographical references (p. 58-59). By Jeffrey J. Foley.

13

Carr, Raymond C. "Improvements to a pitch-synchronous linear predictive coding (LPC) vocoder." Thesis, University of Ottawa (Canada), 1989. http://hdl.handle.net/10393/5954.

14

Yeh, Ernest Nanjung 1975. "Advanced Vocoder Idle Slot Exploitation for TIA IS-136 standard." Thesis, Massachusetts Institute of Technology, 1998. http://hdl.handle.net/1721.1/47580.

Abstract:

Thesis (S.B. and M.Eng.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1998. Includes bibliographical references (p. 55). By Ernest Nanjung Yeh.

15

Manjunath, Sharath. "Implementation of a variable rate vocoder and its performance analysis." Thesis, This resource online, 1994. http://scholar.lib.vt.edu/theses/available/etd-06102009-063255/.

16

Iyengar, Vasu. "A low delay 16 kbit/sec coder for speech signals /." Thesis, McGill University, 1987. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=63799.

17

Karoui, Chadlia. "Neuroplasticity behind the rehabilitation of asymmetrical hearing loss and tinnitus through cochlear implantation : from psychoacoustic evaluations to neuroimaging studies." Thesis, Toulouse 3, 2019. http://www.theses.fr/2019TOU30151.

Abstract:

This thesis aimed to investigate the peripheral and central adaptations of the auditory system related to the beneficial effect of cochlear implants (CI) in subjects with asymmetrical hearing loss (AHL) and tinnitus. Our main interest was to study the possible fusion between the electric signal of the CI and the acoustic signal from the hearing ear, and to assess whether it restores the binaural integration mechanisms found in normal-hearing (NH) subjects, at both the behavioral and central levels. From the clinical standpoint, these studies on hearing recovery in AHL CI subjects will provide crucial information on the plastic abilities of the brain to adapt to electrical stimulation, and will thus guide therapeutic strategies to better recover binaural abilities and linguistic and paralinguistic perception. We combined behavioral and audiological testing, radiological analysis and neuroimaging investigation (H2O15 PET scan imaging). In addition, we were able to describe some qualitative properties of the sound perceived on the implanted side and to evaluate the central response to this spectral inconsistency when the two signals of different nature are presented, potentially informing on possible adaptive strategies. We also confirmed that the main benefit of electrical reafferentation via the CI is the decrease, and in some cases the suppression, of tinnitus. We further considered several therapeutic strategies for tinnitus masking involving not only the CI ear but also the NH ear. Overall, we believe that AHL subjects truly benefit from cochlear implantation. Our data indicate that the plastic adaptations induced by electrical reafferentation in AHL subjects may play a key role in restoring binaural hearing abilities, in adaptation to the spectral characteristics of the CI signal, and in tinnitus suppression, which may shed some light on the underlying mechanisms.

18

Huang, Ying. "Effects of vocoder distortion and packet loss on network echo cancellation." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2000. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape4/PQDD_0029/MQ66876.pdf.

19

McCree, Alan V. "A new LPC vocoder model for low bit rate speech coding." Diss., Georgia Institute of Technology, 1992. http://hdl.handle.net/1853/15053.

20

Chung, Jae H. "A new homomorphic vocoder framework using analysis-by-synthesis excitation analysis." Diss., Georgia Institute of Technology, 1991. http://hdl.handle.net/1853/15471.

21

Donaldson, Nicholas. "Extending the phase vocoder with damped sinusoid atomic decomposition of transients." Thesis, McGill University, 2011. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=104832.

Abstract:

Pitch-preserving time scale modification and time-preserving pitch modification of recorded sounds are integral effects in modern digital music production, and some implementation of these effects can be found in nearly all commercial digital audio production software. Recent research has led to improvements in the reduction of transient smearing artifacts in otherwise high-quality frequency domain time scaling (phase vocoder) algorithms, but many modern implementations still exhibit noticeable smoothing of very abrupt transients, especially for drastic time scale modifications. By using a sparse atomic decomposition method to create representations of the transients in an audio signal, the transient and steady-state content of the signal can be separated and processed separately. The phase vocoder can be used to modify only the steady-state content of the signal, preserving the fidelity of transients when using time scaling effects. Such an extension is introduced here, along with a working software implementation, which performs such feature-specific processing through the use of a damped sinusoid matching pursuit algorithm to represent and remove transients from an audio signal. A high-resolution transient onset detection algorithm is also presented, as well as a practical application of phase locking to a computationally efficient phase vocoder formulation.

22

Apel, Theodore R. "Feature preservation and negated music in a phase vocoder sound representation." Diss., University of California, San Diego, 2008. http://wwwlib.umi.com/cr/ucsd/fullcit?p3303958.

Abstract:

Thesis (Ph.D.), University of California, San Diego, 2008. Title from first page of PDF file (viewed Jun. 17, 2008). Available via ProQuest Digital Dissertations. Vita. Includes bibliographical references (p. 92-98).

23

De, Meuleneire Mickaël. "Codage imbriqué pour la parole à 8-32 KBIT/S combinant techniques CELP, ondelettes et extension de bande." Télécom Bretagne, 2007. http://www.theses.fr/2007TELB0055.

Abstract:

The quality-of-service constraints of Voice over IP applications have made it necessary to develop a new class of codecs, called embedded or scalable codecs, that can decode all or part of the generated bitstream. The wideband speech codec developed during this thesis provides an embedded bitstream that can be decoded at bitrates ranging from 8 to 32 kbit/s. To do so, the codec structure comprises three layers. First, a split band structure separates the narrowband component and the wideband component of the input signal. Then the first layer, called the core layer, encodes the narrowband component of the input signal. This layer makes use of the ITU-T G.729 coder. Next, the second layer, called the first enhancement layer, uses bandwidth extension techniques relying on a wavelet filter bank to reproduce the wideband component artificially, with an additional bitrate of 2 kbit/s. Finally, the second and last enhancement layer progressively encodes the wavelet coefficients of the difference between the original signal and the G.729 output in the narrowband part, and encodes the wavelet coefficients of the original signal in the wideband part. Hence, the decoder ensures a narrowband signal at 8 kbit/s, enables wideband rendering at 10 kbit/s and improves the quality up to 32 kbit/s. Listening tests have shown that the quality of the codec improves gracefully as the bitrate increases. For speech signals, the codec at 24 kbit/s and 32 kbit/s is shown to be equivalent to the ITU-T G.722 codec at 56 and 64 kbit/s, respectively. Moreover, the codec at 32 kbit/s is assessed to be equivalent to the recently standardized embedded codec ITU-T G.729.1 at the same bitrate.
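
The layering in this abstract can be summarised with a minimal sketch of how a decoder might decide which layers are usable when an embedded bitstream is truncated. Only the 8, 2 and 32 kbit/s figures come from the abstract; the constant names, the function and the layer labels are hypothetical.

```python
CORE_KBPS = 8    # core layer: G.729, narrowband
BWE_KBPS = 2     # first enhancement layer: bandwidth extension
MAX_KBPS = 32    # highest rate of the embedded bitstream


def decode_plan(received_kbps):
    """Return the layers a decoder could use from a truncated bitstream."""
    if received_kbps < CORE_KBPS:
        return []  # not even the core layer arrived intact
    layers = ["core"]
    remaining = min(received_kbps, MAX_KBPS) - CORE_KBPS
    if remaining >= BWE_KBPS:
        layers.append("bandwidth_extension")
        remaining -= BWE_KBPS
        if remaining > 0:
            # Progressive wavelet-coefficient refinement uses whatever is left.
            layers.append(f"wavelet_refinement({remaining} kbit/s)")
    return layers
```

For instance, `decode_plan(10)` yields the core plus bandwidth extension, matching the wideband rendering at 10 kbit/s mentioned above, while any rate between 10 and 32 kbit/s adds a partial refinement layer.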

24

Perini, Jean-Bernard. "Étude et simulation d'un procédé de codage en sous-bandes de type amplitude-phase : application à la compression numérique du signal de la parole." Nice, 1986. http://www.theses.fr/1986NICE4017.

Abstract:

A new type of coding is proposed for baseband vocoders that reduces the bit rate while preserving the quality of the compressed voice: an amplitude-phase coding method.

25

Morgenstern, Robert M. "Vector quantization applied to speech coding in the wireless environment." Thesis, This resource online, 1994. http://scholar.lib.vt.edu/theses/available/etd-07292009-090440/.

26

LeBlanc, Wilfrid P. "Speech coding at low to medium bit rates." Dissertation, Electrical Engineering, Carleton University, Ottawa, 1992.

27

Hervais-Adelman, Alexis Georges. "The perceptual learning of noise-vocoded speech." Thesis, University of Cambridge, 2008. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.611867.

28

Madre, Guillaume. "Application de la transformée en nombres entiers à l'étude et au développement d'un codeur de parole pour transmission sur réseaux IP." Brest, 2004. http://www.theses.fr/2004BRES2036.

Abstract:

Our study concerns the compression of speech signals for Voice over Internet Protocol (VoIP) transmission. With the medium-term prospect of deploying an IP telephony application, the work provides the first elements for a real-time speech coding system and its integration on a DSP. It concentrates on the G.729 CS-ACELP (Conjugate Structure Algebraic Code-Excited Linear Prediction) speech coder, selected among the International Telecommunication Union (ITU) recommendations and already recognized for its low implementation complexity. The main aim was to improve its performance and decrease its computational cost while maintaining a balance between coding quality and the complexity required. To reduce the computational cost of this coder, we examined the mathematical foundations of the Number Theoretic Transform (NTT), which is finding increasingly diverse applications in signal processing. In particular, we introduced the Fermat Number Transform (FNT), which is the best suited to digital processing operations. We found that applying it to certain coding algorithms allows a significant reduction in computational complexity. The development of new efficient algorithms for Linear Prediction (LP) of the signal and for excitation modeling thus allowed the G.729 coder to be modified with a view to its implementation on a fixed-point processor. Moreover, a new Voice Activity Detection (VAD) function enabled a more efficient silence compression procedure and a reduction of the transmission rate.
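
To make the Fermat Number Transform mentioned in this abstract concrete, here is a minimal sketch of an exact circular convolution computed with a length-16 transform modulo the Fermat prime 257, using 2 as the root of unity (2 has order 16 modulo 257). The function names are hypothetical, the transform is written in naive O(N^2) form for readability, and nothing here is taken from the thesis's own algorithms.

```python
FERMAT_PRIME = 257   # F_3 = 2**(2**3) + 1
ROOT = 2             # 2**8 = -1 (mod 257), so 2 is a primitive 16th root of unity
LENGTH = 16


def fnt(x, root=ROOT, mod=FERMAT_PRIME):
    """Naive number theoretic transform of x modulo a Fermat prime."""
    n = len(x)
    return [sum(x[j] * pow(root, j * k, mod) for j in range(n)) % mod
            for k in range(n)]


def inverse_fnt(X, root=ROOT, mod=FERMAT_PRIME):
    """Inverse transform; modular inverses via Fermat's little theorem (prime mod)."""
    n = len(X)
    inv_n = pow(n, mod - 2, mod)
    inv_root = pow(root, mod - 2, mod)
    return [(inv_n * sum(X[k] * pow(inv_root, j * k, mod) for k in range(n))) % mod
            for j in range(n)]


def circular_convolve(a, b):
    """Exact circular convolution of two length-16 integer sequences (mod 257)."""
    A, B = fnt(a), fnt(b)
    return inverse_fnt([(p * q) % FERMAT_PRIME for p, q in zip(A, B)])


a = [1, 2, 3] + [0] * (LENGTH - 3)
b = [4, 5] + [0] * (LENGTH - 2)
result = circular_convolve(a, b)   # first four terms: 4, 13, 22, 15
```

The appeal for fixed-point hardware is that the arithmetic is exact integer arithmetic and that multiplications by powers of the root 2 reduce to bit shifts; results are only meaningful while the true convolution values stay below the modulus.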

29

Coulibaly, Patrice Yefoungnigui. "Codage audio à bas débit avec synthèse sinusoïdale." Sherbrooke : Université de Sherbrooke, 2001.

Find full text

30

Stefanovic, Milos. "Vocoder model based variable rate narrowband and wideband speech coding below 9 kbps." Thesis, University of Surrey, 1999. http://epubs.surrey.ac.uk/843965/.

Full text

Abstract:

The past two decades have witnessed rapid growth and development within the telecommunications industry. This has been primarily fuelled by the proliferation of digital mobile communication applications and services which have become commonplace and easily within the financial reach of businesses and the general public. Current research trends, involving integration and packetisation of voice, video and data channels into true multimedia communications, promise a similar technological revolution in the next decade. One of the key design issues of the new high quality multimedia services is a requirement for very high data rates. Whilst the available bandwidth in wire-based terrestrial networks is a relatively cheap and expandable resource, it becomes inherently limited in satellite or cellular radio systems. In order to accommodate ever growing numbers of subscribers whilst maintaining high quality and low operational costs, it is necessary to maximise spectral efficiency and reduce power consumption. This has given rise to the rapid development of signal compression techniques, which in the speech transmission domain are known as speech coding algorithms. The research carried out for this thesis has mainly focused on the design and development of low bit rate narrowband and wideband speech coding systems which utilise a variable rate approach in order to improve their perceptual quality and reduce their transmission rates. The algorithms subsequently developed are based on the existing vocoding schemes, whose rigid fixed rate structure is a major limitation to achieving higher quality and lower rates. The variable rate schemes utilise the time-varying characteristics of the speech signal, which is classified according to the developed segmentation algorithms. 
Two main schemes were developed, a variable bit rate with an average as low as 1.35 kbps and a variable frame rate with an average of 2.1 kbps, both achieving or even surpassing the subjective quality of the existing vocoding standard at 4.15 kbps. Wideband speech exhibits characteristics which are not embodied within narrowband speech and which contribute to the superior perceived quality. A very high quality wideband vocoder operating at rates (fixed and variable) below 9 kbps is presented in this thesis, whereby particular attention is paid to preserving the information in higher frequencies in order to maximise the attainable quality.

31

Kim, Hyun Soo Electrical Engineering & Telecommunications Faculty of Engineering UNSW. "Speech analysis techniques useful for low or variable bit rate coding." Awarded by: University of New South Wales. School of Electrical Engineering and Telecommunications, 2005. http://handle.unsw.edu.au/1959.4/22050.

Full text

Abstract:

We investigate, improve and develop speech analysis techniques which can be used to enhance various speech processing systems, especially low bit rate or variable bit rate coding of speech. The coding technique based on the sinusoidal representation of speech is investigated and implemented. Based on this study of the sinusoidal model of speech, improved analysis techniques to determine voicing, pitch and spectral estimation are developed, as well as a noise reduction technique. We investigate the properties and limitations of the spectral envelope estimation vocoder (SEEVOC). We generalize, optimize and improve the SEEVOC and also compare it with LP in the presence of noise. The properties and applications of morphological filters for speech analysis are investigated. We introduce and investigate a novel nonlinear spectral envelope estimation method based on morphological operations, which is found to be very robust against noise. This method is also compared with the SEEVOC method. A simple method for the optimum selection of the structuring set size without using prior pitch information is proposed for many purposes. The morphological approach is then used for a new pitch estimation method and for the general sinusoidal analysis of speech or audio. Many of the new methods are based on a novel systematic analysis of the peak features of signals, including the study of higher order peaks. We propose a novel peak feature algorithm, which measures the peak characteristics of the speech signal in the time domain, to be used for end point detection and segmentation of speech. This nonparametric algorithm is flexible, efficient and very robust in noise. Several simple voicing measures are proposed and used in a new speech classifier. The harmonic-plus-noise decomposition technique is improved and extended to give an alternative to the methods used in the sinusoidal analysis method. Its applications to pitch estimation, speech classification and noise reduction are investigated.

32

Lee, Keebbum state. "Korean-English Bilinguals’ perception of noise-vocoded speech." The Ohio State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=osu1562004544370682.

Full text

33

Arnaud, Charles. "Nouvelles méthodes de régénération de la composante haute fréquence du signal d'excitation d'un codeur de type RELP." Nice, 1987. http://www.theses.fr/1987NICE4090.

Full text

Abstract:

Presentation of an original method for coding the high-frequency signal, aimed at improving the perceived auditory quality of the signal reconstructed by a baseband residual-excited linear prediction (RELP) vocoder.

34

Etame, Etame Thierry. "Conception de signaux de référence pour l'évaluation de la qualité perçue des codeurs de la parole et du son." Rennes 1, 2008. http://www.theses.fr/2008REN1S112.

Full text

Abstract:

Subjective listening tests remain the most reliable way to assess the perceived quality of codecs, and their methods must keep adapting to the degradations produced by new compression schemes. Such tests need reference conditions that serve as anchors, so that results from different tests can be compared. The current reference system, the Modulated Noise Reference Unit (MNRU), provides a simulated and calibrated degradation that accounts only for the quantization noise of PCM (Pulse Code Modulation) waveform codecs. New telecommunication technologies introduce new types of distortion, so the MNRU is no longer representative of current degradations. The purpose of this thesis is to propose a reference system that can simulate and calibrate the degradations of current speech and audio codecs. The approach consists in determining and characterizing the multidimensional perceptual space underlying the perception of these degradations, so that similar degradations can be simulated and calibrated.

35

Atkinson, Ian Andrew. "Advanced linear predictive speech compression at 3.0 kbits/sec and below." Thesis, University of Surrey, 1997. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.336527.

Full text

36

Fischman, Rajmil. "Musical applications of digital synthesis and processing techniques : realisation using Csound and the Phase Vocoder." Thesis, University of York, 1991. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.280530.

Full text

37

Fék, Márk. "Étude de la compression de la parole et des signaux audionumériques dans la bande élargie." Rennes 1, 2006. http://www.theses.fr/2006REN1S063.

Full text

Abstract:

This thesis presents a speech and audio coder operating in the wideband range (50-7000 Hz). The proposed coder extracts stable sinusoidal components from the input and codes them separately. The residual is coded using the wavelet packet transform and a psychoacoustic model. A comparison of three sinusoidal analysis methods (McAulay-Quatieri, Thomson and FHILN) is presented. A sinusoidal similarity measure is used to make the extraction of sinusoids more robust. Methods for quantizing and coding the sinusoid parameters are presented, and a new method is proposed for coding the sinusoid frequencies. Scalar quantization combined with entropy coding is used to code the wavelet packet transform coefficients. The method provides nearly transparent speech and audio quality at 32-62 kbps. Perceptual noise substitution is introduced to code noisy subbands more efficiently. The quality of coded music remained nearly transparent, but coded speech became noisy. We developed a lattice quantization method to code the wavelet packet transform coefficients. The method uses the Zn lattice partitioned into hyper-pyramids. The coder bit rate was reduced to 32-54 kbps without degrading quality. A listening test showed that the quality of the coder is comparable to that of the MPEG-1 Layer III (MP3) coder operating at 64 kbps.

38

Rochette, Denis. "Etude et réalisation d'un vocodeur à dictionnaire LPC 800 BITS/S." Grenoble 2 : ANRT, 1986. http://catalogue.bnf.fr/ark:/12148/cb37600727b.

Full text

39

Gouveia, Paulo D. F. "Codificação de fala por modelos variáveis no tempo." Master's thesis, Universidade de Aveiro, 1996. http://hdl.handle.net/10198/1572.

Full text

Abstract:

The work presented in this thesis aims to contribute to the optimization of speech coding. To this end, coding models based on LP (Linear Prediction) filters with time-varying parameters are used, in contrast with the fixed models used in conventional methods. In the latter, the prediction filters are adapted simply through periodic updates of their parameters, which therefore does not represent a gradual and continuous evolution over time. The technique used to implement the time-varying models is based on B-spline functions that represent the waveforms of the LP parameters. To study the viability of the proposed model, the performance of a linear predictive vocoder was analyzed with both the time-varying LP model and the conventional fixed-parameter LP model, thus enabling a comparison of their performance. From the results, we conclude that speech coding by time-varying models, although it did not show convincing advantages, can be regarded as another form of coding that competes with existing methodologies.

40

Leitner, Jakub. "Hlasové kodéry pro nízké přenosové rychlosti." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2009. http://www.nusl.cz/ntk/nusl-218173.

Full text

Abstract:

This thesis deals with coders and voice coders used in speech signal processing. The aim is to create an integral overview of coders and voice coders, including a description of their properties. In the second part of the thesis, a simulation of speech processing algorithms and methods is performed in Matlab Simulink. The basic methods of speech processing and a parametric LPC voice coder were simulated in the time domain. The LPC voice coder model implements the algorithms for obtaining speech segment parameters: classification of voiced and unvoiced speech segments, LPC analysis, and pitch detection. The output is a parametric signal that enables a receiver to synthesize a speech signal. Appendix 1 contains a list of coder names or standard numbers and their properties; Appendix 2 includes an overview of speech processing methods.

41

Ghenania, Mohamed. "Techniques de conversion de format entre codeurs CELP normalisés : Speech coding format conversion between standardized CELP coders." Rennes 1, 2005. http://www.theses.fr/2005REN11038.

Full text

Abstract:

This thesis deals with format conversion, or transcoding, in CELP speech coding. The classical solution for format conversion, called tandem, requires cascading a decoding operation and a re-encoding operation. This default approach, used today in communication networks, has drawbacks in terms of complexity, delay and quality. To overcome these problems, the idea of so-called "intelligent" transcoding has been studied. By relying on the characteristics of the common model used by CELP coders, format conversion can be made more efficient. CELP coders extract from the signal, then quantize and transmit, four sets of parameters: the LPC coefficients, the adaptive excitation, the fixed excitation and the associated gains. In this thesis, we propose to develop efficient conversion techniques for these parameters between the main CELP coders in use today.

42

Saadane, Abdelhakim. "Optimisation d'un vocodeur a canaux pour la correction de la parole hyperbare." Rennes 1, 1989. http://www.theses.fr/1989REN10090.

Full text

Abstract:

Beyond a certain depth, divers breathe synthetic gas mixtures whose inhalation alters phonation. The results of previous work are used to propose a new approach to channel vocoders. With this method, the spectral envelope is estimated by a filter bank, for which an optimization is given.

43

Markle, Blake L. "A comparative study of time-stretching algorithms for audio signals /." Thesis, McGill University, 2001. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=31119.

Full text

Abstract:

Algorithms exist which perform independent transformations on the frequency or duration of a digital audio signal. These processes give different results on different types of audio signals. A comparative study of granular and phase vocoder algorithms, their implementation, and their respective effects on audio signals was made to determine which algorithm is best suited to a particular type of audio signal.

44

SOTERO, FILHO Roberto Fernando Batista. "Novas abordagens para codificação de voz e reconhecimento automático de locutor projetadas via mascaramento pleno em frequência por oitava." Universidade Federal de Pernambuco, 2009. https://repositorio.ufpe.br/handle/123456789/26231.

Full text

Abstract:

Digital processing of speech signals (DPSS) is one of the most important areas of digital signal processing. Voice coding and automatic speaker recognition (ASR) are relevant DPSS sub-fields. This dissertation introduces a new vocoder scheme based on full frequency masking per octave (FFMO), together with a spectral filling technique based on the beta probability distribution. The FFMO method simplifies the magnitude of the frequency spectrum of the signal by retaining just one spectral sample per octave. This approach, which offers a tradeoff between bit rate (e.g., 2.7 kbits/s), complexity, intelligibility and voice quality, led to a new binary file format for the digital representation of speech, termed voz. A novel, low-complexity ASR technique based on one of the key properties of human auditory perception, acoustic frequency masking, is also presented. The feature vectors of voice frames are represented by the average amplitude of the largest spectral samples within each octave. Both text-dependent and text-independent speaker recognition are investigated. The results support a tradeoff between complexity and the rate of correct identifications (typically 85%), which makes the approach attractive for embedded systems.

45

Mesnildrey, Quentin. "Towards a better understanding of the cochlear implant-auditory nerve interface : from intracochlear electrical recordings to psychophysics." Thesis, Aix-Marseille, 2017. http://www.theses.fr/2017AIXM0007/document.

Full text

Abstract:

The cochlear implant is a neural prosthesis designed to restore an auditory sensation to people suffering from severe to profound sensorineural deafness. While satisfactory speech recognition can be achieved in silence, performance drops dramatically in more complex sound environments. One main limitation of the present device is that each electrode stimulates a wide portion of the cochlea. As a result, when several electrodes are activated, the electrical fields they produce overlap, which distorts the transmission of sound information. Several alternative stimulation modes have been proposed to overcome this issue, but the benefit in terms of speech recognition has remained limited. In this project, we first used an acoustic simulator of the cochlear implant to explain the disappointing results obtained with the bipolar stimulation mode. We then tried to better understand the electrical behavior of the implanted cochlea in order to optimize the multipolar phased array stimulation strategy (van den Honert and Kelsall 2007). To achieve efficient stimulation of the neural population, it is necessary to determine the distribution of neural survival. We therefore aim to better understand the electrode-neuron interface and to identify a possible psychophysical correlate of neural survival. Finally, we discuss the main results and the possibility of designing an optimal stimulation strategy that achieves a spatially focused electrical field at the level of the nerve fibers.

46

Disch, Sascha [Verfasser]. "Modulation vocoder for analysis, processing and synthesis of audio signals with application to frequency selective pitch transposition / Sascha Disch." Hannover : Technische Informationsbibliothek und Universitätsbibliothek Hannover (TIB), 2011. http://d-nb.info/1014323789/34.

Full text

47

McGettigan, Carolyn. "Factors affecting the perception of noise-vocoded speech : stimulus properties and listener variability." Thesis, University College London (University of London), 2008. http://discovery.ucl.ac.uk/1444460/.

Full text

Abstract:

This thesis presents an investigation of two general factors affecting speech perception in normal-hearing adults. Two sets of experiments are described, in which speakers of English are presented with degraded (noise-vocoded) speech. The first set of studies investigates the importance of linguistic rhythm as a cue for perceptual adaptation to noise-vocoded sentences. Results indicate that the presence of native English rhythmic patterns benefits speech recognition and adaptation, but not when higher-level linguistic information is absent (i.e. when the sentences are in a foreign language). It is proposed that rhythm may help in the perceptual encoding of degraded speech in phonological working memory. Experiments in this strand also present evidence against a critical role for indexical characteristics of the speaker in the adaptation process. The second set of studies concerns the issue of individual differences in speech perception. A psychometric curve-fitting approach is selected as the preferred method of quantifying variability in noise-vocoded sentence recognition. Measures of working memory and verbal IQ are identified as candidate correlates of performance with noise-vocoded sentences. When the listener is exposed to noise-vocoded stimuli from different linguistic categories (consonants and vowels, isolated words, sentences), there is evidence for the interplay of two initial listening 'modes' in response to the degraded speech signal, representing 'top-down' cognitive-linguistic processing and 'bottom-up' acoustic-phonetic analysis. Detailed analysis of segment recognition presents a perceptual role for temporal information across all the linguistic categories, and suggests that performance could be improved through training regimes that direct attention to the most informative acoustic properties of the stimulus. Across several experiments, the results also demonstrate long-term aspects of perceptual learning. 
In sum, this thesis demonstrates that consideration of both stimulus-based and listener-based factors forms a promising approach to the characterization of speech perception processes in the healthy adult listener.

48

Daniell, Paul. "A Cross-Language Acoustic-Perceptual Study of the Effects of Simulated Hearing Loss on Speech Intonation." Thesis, University of Canterbury. Department of Communication Disorders, 2012. http://hdl.handle.net/10092/7646.

Full text

Abstract:

Aim: The purpose of this study was to examine the impact of simulated hearing loss on the acoustic contrasts between declarative questions and declarative statements and on the perception of speech intonation. A further purpose of the study was to investigate whether any such effects are universal or language specific. Method: Speakers included four native speakers of English and four native speakers of Mandarin and Taiwanese, with two female and two male adults in each group. Listeners included ten native English and ten native speakers of Mandarin and Taiwanese, with five female and five male adults in each group. All participants were aged between 19 and 55 years old. The speaker groups were asked to read a list of 28 phrases, with each phrase expressed as a declarative statement or a declarative question separately. These phrases were then filtered through six types of simulated hearing loss configurations, including three levels of temporal jittering for simulating a loss in neural synchrony, a high level of temporal jittering in combination with a high-pass or a low-pass filter that simulate falling and rising audiometric hearing loss configurations, and a vocoder processing procedure to simulate cochlear implant processing. A selection of acoustic measures was derived from the sentences and from some embedded vowels, including /i/, /a/, and /u/. The listener groups were asked to listen to the tokens in their native language and indicate if they heard a statement or a question. Results: The maximum fundamental frequency (F0) of the last syllable (MaxF0-last) and the maximum F0 of the remaining sentence segment (MaxF0-rest) were found to be consistently higher in declarative questions than in declarative statements. The percent jitter measure was found to worsen with simulated hearing loss as the level of temporal jittering increased. 
The vocoder-processed signals showed the highest percent jitter measure and the spread of spectral energy around the dominant pitch. Results from the perceptual data showed that participants in all three groups performed significantly worse with vocoder-processed tokens compared to the original tokens. Tokens with temporal jitter alone did not result in significantly worse perceptual results. Perceptual results from the Taiwanese group were significantly worse than the English group under the two filtered conditions. Mandarin listeners performed significantly worse with the neutral tone on the last syllable, and Taiwanese listeners performed significantly worse with the rising tone on the last syllable. Perception of male intonation was worse than female intonation with temporal jitter and high-pass filtering, and perception of female intonation was worse than male intonation with most temporal jittering conditions, including the temporal jitter and low-pass filtering condition. Conclusion: A rise in pitch for the whole sentence, as well as that in the final syllable, was identified as the main acoustic marker of declarative questions in all of the three languages tested. Perception of intonation was significantly reduced by vocoder processing, but not by temporal jitter alone. Under certain simulated hearing loss conditions, perception of intonation was found to be significantly affected by language, lexical tone, and speaker gender.

49

Hu, Qiong. "Statistical parametric speech synthesis based on sinusoidal models." Thesis, University of Edinburgh, 2017. http://hdl.handle.net/1842/28719.

Full text

Abstract:

This study focuses on improving the quality of statistical speech synthesis based on sinusoidal models. Vocoders play a crucial role in parametrisation and reconstruction, so we first conduct an experimental comparison of a broad range of leading vocoder types. Our study shows that, for analysis/synthesis, sinusoidal models with complex amplitudes can generate higher-quality speech than source-filter models. However, the component sinusoids are correlated with each other, and the number of parameters is high and varies from frame to frame, which constrains the use of these models in statistical speech synthesis. Therefore, we first propose a perceptually based dynamic sinusoidal model (PDM) to reduce and fix the number of components typically used in the standard sinusoidal model. Then, in order to apply the proposed vocoder to an HMM-based speech synthesis system (HTS), we compare two strategies for modelling sinusoidal parameters. In the first method (DIR parameterisation), features extracted from the fixed- and low-dimensional PDM are statistically modelled directly. In the second method (INT parameterisation), we convert both the static amplitude and the dynamic slope of all the harmonics of a signal, which we term the Harmonic Dynamic Model (HDM), to intermediate parameters (regularised cepstral coefficients (RDC)) for modelling. Our results show that HDM with intermediate parameters can generate speech of comparable quality to STRAIGHT. As correlations between features in the dynamic model cannot be modelled satisfactorily by a typical HMM-based system with diagonal covariance, we have applied and tested a deep neural network (DNN) for modelling features from these two methods. To fully exploit DNN capabilities, we investigate ways to combine INT and DIR at the level of both DNN modelling and waveform generation. For DNN training, we propose to use multi-task learning to model cepstra (from INT) and log amplitudes (from DIR) as primary and secondary tasks. We conclude from our results that sinusoidal models are indeed well suited to statistical parametric synthesis. The proposed method outperforms the state-of-the-art STRAIGHT-based equivalent when used in conjunction with DNNs. To further improve the voice quality, phase features generated from the proposed vocoder also need to be parameterised and integrated into statistical modelling. Here, an alternative statistical model referred to as the complex-valued neural network (CVNN), which treats complex coefficients as a whole, is proposed to model complex amplitude explicitly. A complex-valued back-propagation algorithm, using a logarithmic minimisation criterion that includes both amplitude and phase errors, is used as the learning rule. Three parameterisation methods are studied for mapping text to acoustic features: RDC / real-valued log amplitude, complex-valued amplitude with minimum phase, and complex-valued amplitude with mixed phase. Our results show the potential of using CVNNs for modelling both real- and complex-valued acoustic features. Overall, this thesis has established competitive alternative vocoders for speech parametrisation and reconstruction. The use of the proposed vocoders with various acoustic models (HMM / DNN / CVNN) makes a compelling case for applying them to statistical parametric speech synthesis.
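
As a rough illustration of the harmonic model with complex amplitudes that the abstract describes, the following Python sketch synthesises one frame as the real part of a sum of harmonics, each with a static complex amplitude and an optional linear slope. The function name `synth_frame`, its parameters, and the exact way the slope term enters the sum are assumptions made for this sketch; they are not taken from the thesis.

```python
import cmath
import math


def synth_frame(f0, fs, amps, slopes=None, half_len=80):
    """Synthesise samples for n = -half_len..half_len as
    Re( sum_k (a_k + n * b_k) * exp(j * 2*pi * k * f0 * n / fs) ).

    amps   -- static complex amplitudes a_k, one per harmonic (k = 1, 2, ...)
    slopes -- complex slopes b_k (assumed form of the "dynamic" term);
              defaults to zero, which gives a plain harmonic model
    """
    if slopes is None:
        slopes = [0j] * len(amps)
    frame = []
    for n in range(-half_len, half_len + 1):
        total = 0j
        for k, (a, b) in enumerate(zip(amps, slopes), start=1):
            phase = 2j * math.pi * k * f0 * n / fs
            total += (a + n * b) * cmath.exp(phase)
        frame.append(total.real)  # keep only the real part of the sum
    return frame


# Two harmonics of a 100 Hz fundamental at a 16 kHz sampling rate.
frame = synth_frame(f0=100.0, fs=16000.0, amps=[1.0 + 0j, 0.5j])
```

The magnitude and phase of each harmonic are carried by a single complex number, which is why both amplitude and phase errors matter when such parameters are modelled statistically.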

50

Rahrer, Timothy J. "A digital signal processing-based hearing prosthesis and implementation of principal components analysis for a tactile aid." Dissertation, Carleton University (Electrical Engineering), Ottawa, 1990.
