Dissertations / Theses on the topic 'Prosody features'
Consult the top 39 dissertations / theses for your research on the topic 'Prosody features.'
Brierley, Claire. "Prosody resources and symbolic prosodic features for automated phrase break prediction." Thesis, University of Leeds, 2011. http://etheses.whiterose.ac.uk/2038/.
Väyrynen, E. (Eero). "Emotion recognition from speech using prosodic features." Doctoral thesis, Oulun yliopisto, 2014. http://urn.fi/urn:isbn:9789526204048.
Abstract (translated from Finnish): Emotion recognition is a central area of affective computing. It aims to identify the emotional messages contained in human communication, for example by means of visual, auditory and/or physiological cues. Speech is the most important means of human communication and therefore plays a paramount role in the correct semantic and emotional interpretation of communication. In speech, emotional information is conveyed largely as continuous paralinguistic communication, whose most important component is prosody. The lack of this affective and emotional interpretation in human-machine interaction, however, still limits the operation of technological devices and their user experience. In this doctoral thesis, prosodic and acoustic features of speech were used to recognise the emotional content of spoken Finnish. Emotion recognition methods based on the prosodic features of long speech samples were developed. To recognise the emotional content of short speech segments, a method based on information fusion was developed, using a combination of prosodic features and voice source quality features. In addition, a technological framework was developed for visualising emotional speech by means of prosodic features. The study achieved a level of automatic emotion recognition comparable to human recognition ability when a limited set of basic emotions was used (neutral, sad, happy and angry). Emotion recognition performance for spoken Finnish was found to be comparable with that for Western European languages. In recognising the emotional content of short speech segments, better performance was achieved with the fusion method. The trainable nonlinear manifold modelling technique developed for visualising emotional speech was able to produce, for the data, a visual structure resembling the dimensional model of emotion.
The low-dimensional representation could further be shown to preserve the topological structures of both the emotional classes and the emotional intensity of the research data. In this thesis, technology based on pattern recognition methods was developed for recognising emotional speech using both long and short speech samples. With the technological framework developed for visualising and classifying emotional data, speech data can also be represented according to other semantic structures.
Rask, Linnea. "Prosodic Features in Child-directed Speech during the Child's First Year." Thesis, Stockholms universitet, Avdelningen för fonetik, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-118382.
Zelenák, Martin. "Detection and handling of overlapping speech for speaker diarization." Doctoral thesis, Universitat Politècnica de Catalunya, 2012. http://hdl.handle.net/10803/72431.
Fonseca, De Sam Bento Ribeiro Manuel. "Suprasegmental representations for the modeling of fundamental frequency in statistical parametric speech synthesis." Thesis, University of Edinburgh, 2018. http://hdl.handle.net/1842/31338.
Gangireddy, Siva Reddy. "Recurrent neural network language models for automatic speech recognition." Thesis, University of Edinburgh, 2017. http://hdl.handle.net/1842/28990.
Oliveira, Miguel. "Prosodic features in spontaneous narratives." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2000. http://www.collectionscanada.ca/obj/s4/f2/dsk2/ftp02/NQ61670.pdf.
Iliev, Alexander Iliev. "Emotion Recognition Using Glottal and Prosodic Features." Scholarly Repository, 2009. http://scholarlyrepository.miami.edu/oa_dissertations/515.
Chan, Oscar. "Prosodic features for a maximum entropy language model." University of Western Australia. School of Electrical, Electronic and Computer Engineering, 2008. http://theses.library.uwa.edu.au/adt-WU2008.0244.
Bryant, Gregory Alan. "Prosodic features of verbal irony in spontaneous speech /." Diss., Digital Dissertations Database. Restricted to UC campuses, 2004. http://uclibs.org/PID/11984.
Breen, Mara E. "The identification and function of English prosodic features." Thesis, Massachusetts Institute of Technology, 2007. http://hdl.handle.net/1721.1/40974.
Includes bibliographical references (leaves 98-102).
This thesis contains three sets of studies designed to explore the identification and function of prosodic features in English. The first set of studies explores the identification of prosodic features using prosodic annotation. We compared inter-rater agreement for two current prosodic annotation schemes, ToBI (Silverman, et al., 1992) and RaP (Dilley & Brown, 2005), which provide guidelines for the identification of English prosodic features. The studies described here survey inter-rater agreement for both novice and expert raters in both systems, and for both spontaneous and read speech. The results indicate high agreement for both systems on binary classification, but only moderate agreement for categories with more than two levels. The second section explores an aspect of the function of prosody in determining the propositional content of a sentence by investigating the relationship between syntactic structure and intonational phrasing. The first study tests and refines a model designed to predict the intonational phrasing of a sentence given the syntactic structure. In further analysis, we demonstrate that specific acoustic cues (word duration and the presence of silence after a word) can give rise to the perception of intonational boundaries. The final set of experiments explores the relationship between prosody and information structure, and how this relationship is realized acoustically. In a series of four experiments, we manipulated the information status of elements of declarative sentences by varying the questions that preceded those sentences. We found that all of the acoustic features we tested (duration, f0, and intensity) were utilized by speakers to indicate the location of an accented element. However, speakers did not consistently indicate differences in information status type (wide focus, new information, contrastive information) with the acoustic features we investigated.
by Mara E. Breen.
Ph.D.
Bergqvist, Magdalena. "Detecting engagement from prosodic features in spoken dialog." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-455483.
Perera, Katharine. "The development of prosodic features in children's oral reading." Thesis, University of Manchester, 1989. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.276094.
Swart, Philippa H. "Prosodic features of imperatives in Xhosa : implications for a text-to-speech system." Thesis, Stellenbosch : Stellenbosch University, 2000. http://hdl.handle.net/10019.1/51891.
ENGLISH ABSTRACT: This study focuses on the prosodic features of imperatives and the role of prosody in the development of a text-to-speech (TTS) system for Xhosa, an African tone language. The perception of prosody is manifested in suprasegmental features such as fundamental frequency (pitch), intensity (loudness) and duration (length). Very little experimental research has been done on the prosodic features of any grammatical structures (moods and tenses) in Xhosa; therefore it has not yet been determined how and to what degree the different prosodic features are combined and utilized in the production and perception of Xhosa speech. One such grammatical structure, for which no explicit descriptive phonetic information exists, is the imperative mood expressing commands. In this study it was shown how the relationship between duration, pitch and loudness, as manifested in the production and perception of Xhosa imperatives, could be determined through acoustic analyses and perceptual experiments. An experimental phonetic approach proved to be essential for the acquisition of substantial and reliable prosodic information. An extensive acoustic analysis was conducted to acquire prosodic information on the production of imperatives by Xhosa mother tongue speakers. Subsequently, various statistical parameters were calculated on the raw acoustic data (i) to establish patterns of significance and (ii) to represent the large amount of numeric data generated in a compact manner. A perceptual experiment was conducted to investigate the perception of imperatives. The prosodic parameters that were extracted from the acoustic analysis were applied to synthesize imperatives in different contexts. A novel approach to Xhosa speech synthesis was adopted. Monotonous verbs were recorded by one speaker and the pitch and duration of these words were then manipulated with the TD-PSOLA technique.
Combining the results of the acoustic analysis and the perceptual experiment made it possible to present a prosodic model for the generation of perceptually acceptable imperatives in a practical Xhosa TTS system. Prosody generation in a natural language processing (NLP) module and its place within the larger framework of text-to-speech synthesis was discussed. It was shown that existing architectures for TTS synthesis would not be appropriate for Xhosa without some adaptation. Hence, a unique architecture was suggested and its possible application subsequently illustrated. Of particular importance was the development of an alternative algorithm for grapheme-to-phoneme conversion. Keywords: prosody, speech synthesis, speech perception, acoustic analysis, Xhosa
AFRIKAANS SUMMARY (translated): This study focuses on the prosodic features of imperatives and the role of prosody in the development of a text-to-speech system for Xhosa, an African tone language. The perception of prosody is manifested in suprasegmental features such as fundamental frequency (pitch), intensity (loudness) and duration (length). Little experimental research exists on the prosodic features of any grammatical structures (mood and tense) in Xhosa. How and to what degree the different prosodic features are combined and used in the production and perception of Xhosa speech is not yet clear. One grammatical structure for which no explicit descriptive phonetic information exists is the imperative mood, which expresses commands. This study shows how the relationship between duration, pitch and loudness, as manifested in the production and perception of Xhosa imperatives, could be determined through acoustic analyses and perceptual experiments. An experimental phonetic approach proved essential for obtaining meaningful and reliable prosodic information. An extensive acoustic analysis was carried out to obtain prosodic data on the production of imperatives by Xhosa mother-tongue speakers. Various statistical analyses were then performed on the raw acoustic data (i) to establish patterns of significance and (ii) to represent the large amount of numeric data generated more compactly. A perceptual experiment was conducted to investigate the perception of imperatives. The prosodic parameters obtained from the acoustic analysis were applied in the synthesis of commands in different contexts. A new approach to Xhosa speech synthesis was followed. Monotone verbs were recorded by one speaker, and the pitch and duration of these words were manipulated with the TD-PSOLA technique.
A combination of acoustic and perceptual results was used to develop a prosodic model for the synthesis of perceptually acceptable imperatives in a practical Xhosa text-to-speech synthesiser. Prosody generation in a natural language processing module and its place within the framework of text-to-speech synthesis were discussed. It was shown that existing architectures for text-to-speech systems would not be appropriate for Xhosa without some adaptation. A unique architecture was therefore suggested and its possible application illustrated. The development of an alternative algorithm for grapheme-to-phoneme conversion was of particular importance. Keywords: speech synthesis, speech perception, acoustic analysis, Xhosa
Wong, Jimmy Pui Fung. "The use of prosodic features in Chinese speech recognition and spoken language processing /." View Abstract or Full-Text, 2003. http://library.ust.hk/cgi/db/thesis.pl?ELEC%202003%20WONG.
Includes bibliographical references (leaves 97-101). Also available in electronic version. Access restricted to campus users.
Burov, Ivaylo. "Les phénomènes de Sandhi dans l'espace gallo-roman." Phd thesis, Université Michel de Montaigne - Bordeaux III, 2012. http://tel.archives-ouvertes.fr/tel-00807535.
Sethu, Vidhyasaharan. "Automatic emotion recognition: an investigation of acoustic and prosodic parameters." Awarded by: University of New South Wales. Electrical Engineering & Telecommunications, 2009. http://handle.unsw.edu.au/1959.4/44620.
Full textBirchwood, Aina, and Leidnert Michaela Eriksson. "Nyordsinlärning i relation till ordförråd, nonordsrepetition och prosodi hos en grupp barn i förskoleåldern med typisk språkutveckling." Thesis, Linköpings universitet, Institutionen för klinisk och experimentell medicin, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-105600.
Several studies have shown that vocabulary size and nonword repetition ability correlate with novel word learning. The impact of prosodic features on novel word learning has, however, not been studied extensively. The purpose of this study was to examine how children aged 4:5–6:0 with typical language development perform on novel word learning, vocabulary and nonword repetition, and to explore what impact prosodic features have on the ability to learn novel words. The study involved 15 children whose performance on the novel word learning task, vocabulary testing and nonword repetition was calculated. The novel word learning task consisted of six words which were connected to six different items. The novel words were matched in pairs differing by only one prosodic feature: either the number of syllables, stress or tonal word accent. No significant correlations between novel word learning, vocabulary and nonword repetition were found. However, the correlation between age and nonword repetition approached significance, which indicated that increased age gave a higher result on nonword repetition. Regarding how the prosodic features related to novel word learning, a significant difference between stress placements was detected. Novel words with stress on the final syllable were easier to learn. The children also achieved a higher result on the three-syllable words than on the two-syllable words. The study implies that stress and word length seem to play a somewhat important role in novel word learning, in contrast to tonal word accent, while there appears to be no relation between novel word learning, vocabulary and nonword repetition.
Clemens, Denise Leslie. "A study of the capability of the computerized Visi-Pitch when investigating prosodic features of motherese." PDXScholar, 1988. https://pdxscholar.library.pdx.edu/open_access_etds/3743.
Jolley, Caitlin. "The Effect of Computer-Based Pronunciation Readings on ESL Learners' Perception and Production of Prosodic Features in a Short-Term ESP Course." BYU ScholarsArchive, 2014. https://scholarsarchive.byu.edu/etd/4321.
Navrátil, Michal. "Rozpoznávání emočních stavů pomocí analýzy řečového signálu." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2008. http://www.nusl.cz/ntk/nusl-217263.
Pfeifer, Leon. "Automatické rozpoznávání emočních stavů člověka na základě analýzy řečového projevu." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2008. http://www.nusl.cz/ntk/nusl-217520.
Hanyášová, Lucie. "Metody texturní analýzy v medicínských obrazech." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2008. http://www.nusl.cz/ntk/nusl-217230.
Anderson, Jill M. "Lateralization Effects of Brainstem Responses and Middle Latency Responses to a Complex Tone and Speech Syllable." University of Cincinnati / OhioLINK, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1313687765.
Cauvin, Evelyne. "Elaboration de critères prosodiques pour une évaluation semi-automatique des apprenants francophones de l'anglais." Thesis, Sorbonne Paris Cité, 2017. http://www.theses.fr/2017USPCC097/document.
The aim of our study is to model the prosodic interlanguage of Francophone learners of English in order to provide useful criteria for a semi-automatic assessment of their prosodic level in English. Learner assessment requires rigorous and fair criteria that ensure validity, reliability, feasibility and equality, whereas English prosody is highly variable. Hence, few studies have addressed the assessment of prosody, because it represents a real challenge. To address this issue, a specific strategy has been devised to elaborate a methodology for assessing a reading task successfully. The approach relies upon the constant symbiosis between prosody and a speaker's subjective response to their environment. Our methodology, also known as "profiling", first aims at selecting relevant native perceived and acoustic prosodic features that will optimize assessment criteria by using their degree of emphasis and creating speakers' prosodic profiles. Then, using the Longdale-Charliphonia corpus, the learners' productions are analysed acoustically. The automatic classification of the learners based on acoustic or perceptual prosodic variables is then submitted to expert aural assessment, which assesses the learner evaluation criteria. This study achieves: (1) a model of non-native English prosody based on assessment grids that rely upon features of both native and non-native speakers of English, namely speech rate (with or without the inclusion of pauses), register, melody and rhythm; (2) a semi-automatic evaluation of 15 representative learners based on the above model (ranking and marking); (3) a comparison of the semi-automatic results with those of experts' auditory assessment; correspondence between the two varies from 56.83% to 59.74% when categorising the learners into three prosodic proficiency groups.
Van, Heerden Charl Johannes. "Phoneme duration modelling for speaker verification." Diss., Pretoria : [s.n.], 2009. http://upetd.up.ac.za/thesis/available/etd-06262009-150945/.
Láník, Aleš. "Detekce výrobků na pásovém dopravníku." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2008. http://www.nusl.cz/ntk/nusl-235894.
Hautala, T. (Terhi). "Ikääntyneiden kuuntelijoiden puheen ymmärtäminen kognitiivisesti vaativassa tilanteessa." Doctoral thesis, Oulun yliopisto, 2013. http://urn.fi/urn:isbn:9789526201856.
Abstract (translated from Finnish): Several factors simultaneously affect speech reception in ageing people: hearing ability, age-related changes in the auditory system, and changes in perceptual and cognitive functions. These can make speech comprehension difficult, especially in a cognitively demanding situation. The aim of this study is to examine factors related to the use of an automated telephone service system designed for elderly participants (N = 36). The aim is to determine to what extent factors related to the tested system, on the one hand, and the users' characteristics and their behaviour in the test situation, on the other, were associated with successful use of the system. The study uses quantitative and qualitative methods. Instructions recorded in the voices of four different speakers were tested in the system. The prosodic features of their speech were analysed with voice and speech analysis software. Variables related to the elderly participants (n = 30) were studied with an interview, hearing examinations (pure-tone audiometry and speech audiometry), a cognitive screening test (Mini-Mental State Examination, MMSE) and the Token Test, which measures speech comprehension. The relationship of the measurement results and variables to task performance was examined statistically. The participants' behaviour while using the system was observed with a data-driven qualitative video analysis. Features of elderly-directed speech were observed in the system's speakers. Task performance was nevertheless very similar regardless of the speaker. The semantically most complex text menu was the most difficult recording for the participants. A low Token Test score and poor speech recognition ability were associated with poor task performance.
According to the qualitative analysis, in addition to speech comprehension, the following cognitive processes were central to success in the tasks: remembering the instructions, and directing, dividing and sustaining attention. Poor performance in the tasks and in the Token Test, together with problems of executive function observed in the test situation, predicted dropping out of the second study phase the following year. Qualitative analyses of cognitively demanding language use situations can be used to assess complex linguistic-cognitive functions and to find mild linguistic changes possibly related to incipient memory disorders. The results can be used in the design of voice-based user interfaces. It is also important to consider the advantages and disadvantages of elderly-directed speech from the perspective of nursing and speech therapy.
Yu-Ping, HUNG, and 洪宇平. "Punctuation Generation Inspired Linguistic Features for Mandarin Prosody Generation." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/5jw3er.
Childs, Jacob Auburn. "Suprasegmental features and their classroom application in pronunciation instruction." 2012. http://hdl.handle.net/2152/19921.
Wu, Jung-yun, and 吳仲耘. "Pitch Prediction Using Prosody Hierarchy and Dynamic Features for HMM-based Mandarin Speech Synthesis." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/03573076744504522613.
國立成功大學
資訊工程學系碩博士班
96
Prosody is the main measure of the naturalness of speech, and pitch is the key factor known to carry prosodic information. In recent years, speech synthesis based on hidden Markov models (HMMs) has been developed; it can synthesize smooth speech and has the advantages of flexibility and a small footprint. Nevertheless, there is still room for improvement in the naturalness of synthesized speech. In our research, we take the prosody hierarchy structure as the basis of the pitch prediction model and apply dynamic features to the units of each hierarchical layer. We describe prosodic units as supra-segmental units that occur in a hierarchical structure and reflect how the brain processes speech; dynamic features preserve the time correlation between adjacent units and result in more natural connections at each conjunction point. Applying this framework to an HMM-based speech synthesis system yields more natural-sounding speech. The purpose of this thesis is to develop a pitch prediction model using the prosody hierarchy structure and dynamic features and to investigate the improvement in the naturalness of synthesized speech. More specifically, this research addresses: (1) prediction and generation of the prosody hierarchy structure; (2) dynamic features for each hierarchical layer; (3) building the pitch prediction model for each layer: CART for the prosodic word and syllable levels, HMM for the frame level; (4) feature analysis using STRAIGHT (Speech Transformation and Representation based on Adaptive Interpolation of weiGHTed spectrogram). Subjective and objective tests comparing the proposed approach with other systems show that our scheme performs better than the comparison systems and can generate more natural-sounding speech.
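As an illustration of the "dynamic features" mentioned in the abstract above, the following minimal sketch computes a first-order delta over a sequence of per-unit pitch values. The abstract does not define its dynamic features, so the central-difference formula, the edge handling and the function name `delta` are assumptions chosen for illustration, not the thesis's method.

```python
def delta(values):
    """First-order dynamic (delta) feature via a central difference.

    Assumed definition: d[t] = (x[t+1] - x[t-1]) / 2, with the first and
    last values repeated at the edges. This is one common convention,
    not necessarily the one used in the thesis.
    """
    n = len(values)
    if n == 0:
        return []
    # Repeat the boundary values so every position has two neighbors.
    padded = [values[0]] + list(values) + [values[-1]]
    return [(padded[i + 2] - padded[i]) / 2 for i in range(n)]


# Hypothetical per-syllable mean pitch values (arbitrary units).
syllable_pitch = [1, 2, 4, 7]
print(delta(syllable_pitch))
```

Appending such deltas to the static value of each unit is what lets a model account for how a unit relates to its neighbors rather than treating each unit in isolation.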
Lin, Yi-Ju, and 林奕儒. "Mispronunciation Detection and Diagnosis Combining Prosodic Features and Phonetic Features." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/2n6r3r.
國立臺灣師範大學
資訊工程學系
107
This thesis examines how a multi-task deep neural network model and prosodic characteristics can assist mispronunciation detection and diagnosis (MDD). The purpose of computer-assisted pronunciation training (CAPT) is to help second-language (L2) learners correct mispronunciations automatically. CAPT can be divided into mispronunciation detection and mispronunciation diagnosis. This work focuses on three aspects. First, we explore the benefits of combining prosodic and phonetic features in the MDD task. Second, we use multi-task learning models to help address the data imbalance problem. Finally, we combine a likelihood-based scoring method (GOP) with a classification-based scoring method to achieve better detection and diagnosis results. The experimental results show that phonetic features work better for mispronunciation detection, whereas prosodic features are more helpful for the mispronunciation diagnosis task.
"Spoken language identification with prosodic features." Thesis, 2011. http://library.cuhk.edu.hk/record=b6075120.
There are no conventional ways to model prosody. We use a large prosodic feature set which covers fundamental frequency (F0), duration and intensity. It also considers various extraction and normalization methods for each type of feature. In terms of modeling, the vector space modeling approach is adopted. We introduce a framework called the prosodic attribute model (PAM) to model the acoustic correlates of prosodic events in a flexible manner. Feature selection and preliminary LID tests are carried out to derive a preferred term-document matrix construction for modeling.
This thesis focuses on the use of prosodic features for automatic spoken language identification (LID). LID is the problem of automatically determining the language of spoken utterances. After three decades of research, the state-of-the-art LID systems seem to give a saturating performance. To meet the tight requirements on accuracy, prosody is proposed as alternative features to provide complementary information to LID.
Ng, Wai Man.
Adviser: Tan Lee.
Source: Dissertation Abstracts International, Volume: 73-04, Section: B, page: .
Thesis (Ph.D.)--Chinese University of Hong Kong, 2011.
Includes bibliographical references (leaves 112-125).
Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [201-] System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Electronic reproduction. Ann Arbor, MI : ProQuest Information and Learning Company, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Abstract also in Chinese.
Chen, Yan-Ting, and 陳彥廷. "Prosody Feature-based German Stressed/Unstressed Syllable Classification — A First Study." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/8fmt7w.
國立臺北科技大學
電腦與通訊研究所
99
Stress is an important cue for understanding the semantics of stress-timed languages. To develop the stressed/unstressed judgement module of a German computer-assisted language learning system, and because prosodic features vary with sentence content, a new normalization procedure and feature extraction method is proposed in this work. It relies mainly on the fundamental frequency decomposition provided by the Fujisaki model to remove the phrase influence. Features are then extracted by considering the difference between the target syllable and its neighbors. The performance of the method is evaluated on the Kiel Corpus of Read Speech, Vol. I, using a decision tree for feature selection. Compared with traditional feature extraction, the proposed method performs better and is promising for reducing the phrase influence.
Chang, Shih-Cheng, and 張仕承. "Emotional Voice Conversion Using Prosodic and Spectral Features." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/9yx887.
國立臺灣科技大學
資訊工程系
105
In this thesis, conversion methods for three prosodic features (pitch contour, duration and intensity) are studied, and an emotional voice conversion system is constructed that converts neutral input speech to speech with an angry, happy or sad emotion. In the training stage, F0 GMM and spectrum GMM models were trained for each of the three target emotions, using the corresponding parallel corpus of 120 sentences. Based on sentence segmentation rules, the mean and standard deviation of the prosodic features are measured across sentences for each of three segments. This measurement is performed separately for each target emotion's training sentences. In the conversion stage, the pitch contour and DCC coefficients of neutral input speech are mapped to those of a specified target emotion by means of the corresponding F0 and spectrum GMMs. When the F0 GMM is used to convert the pitch contour, we find that the resulting pitch contour fluctuates. Therefore, we study reducing the fluctuations with median smoothing and moving-average processing. Next, using the segmental tables of statistical parameters obtained in the training stage, the three prosodic features (pitch contour, duration and intensity) are converted with segmental standard deviation matching (SSDM). To bring the emotion expressed in the converted speech closer to the target emotion, we propose a dynamic speech duration adjustment method, in which the duration of a frame is determined dynamically according to its energy ratio. To evaluate the performance of our emotional voice conversion system, we conducted two subjective listening tests. The first test compares the emotional expression of speech converted by two conversion methods. The percentages of votes obtained by our method are 95% for angry emotion, 65% for happy emotion, and 67.5% for sad emotion.
In the second test, each participant is asked to recognize the emotion expressed in the speech played to them. The results show that the recognition rates obtained by our conversion method are 87.5% for angry emotion, 61.3% for happy emotion, and 77.5% for sad emotion. Therefore, the emotional voice conversion system using the studied conversion method is effective in converting neutral speech to speech of a specified target emotion.
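The abstract above mentions median smoothing, moving-average processing and matching of mean and standard deviation. The sketch below shows what those three operations could look like on a list of F0 values. It is only an illustration under stated assumptions: the window width, the edge handling, the use of the population standard deviation, and all function names are hypothetical, and the thesis's actual segmental procedure (SSDM) is not reproduced here.

```python
from statistics import mean, median, pstdev


def median_smooth(contour, width=3):
    """Replace each value with the median of a centered window.

    Assumption: near the edges the window is simply truncated.
    """
    half = width // 2
    return [median(contour[max(0, i - half): i + half + 1])
            for i in range(len(contour))]


def moving_average(contour, width=3):
    """Replace each value with the mean of a centered, edge-truncated window."""
    half = width // 2
    return [mean(contour[max(0, i - half): i + half + 1])
            for i in range(len(contour))]


def match_mean_std(values, target_mean, target_std):
    """Shift and scale values so they take on the target mean and std."""
    src_mean = mean(values)
    src_std = pstdev(values)
    if src_std == 0:
        # A flat contour carries no shape to rescale; return the target mean.
        return [target_mean for _ in values]
    return [target_mean + (v - src_mean) * target_std / src_std
            for v in values]


# Hypothetical F0 values in Hz with one spurious spike at index 1.
f0 = [100, 300, 110, 120, 115]
smoothed = moving_average(median_smooth(f0))
converted = match_mean_std(smoothed, target_mean=180.0, target_std=25.0)
print(converted)
```

In this sketch, median smoothing runs first so that an isolated spike is removed rather than averaged into its neighbors; the order is a design choice of the example, not a claim about the thesis.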
Owens, Kate. "Effects of prosodic features on judgements of intelligibility and accentedness." Thesis, 1985. http://spectrum.library.concordia.ca/3124/1/ML23158.pdf.
Schindlerová, Tereza. "Specifika prozodie českého filmového dabingu." Master's thesis, 2015. http://www.nusl.cz/ntk/nusl-391361.
Medeiros, Henrique Rodrigues Barbosa de. "Automatic detection of disfluencies in a corpus of university lectures." Master's thesis, 2014. http://hdl.handle.net/10071/8683.
Abstract (translated from Portuguese): This thesis addresses the identification of disfluent sequences and their respective structural regions. The experiments described here are based on segmentation and prosody-related information, computed from a corpus of university lectures in European Portuguese containing about 32 hours of speech and about 7.7% disfluencies. The feature set used proved discriminative in identifying the regions involved in the production of disfluencies. The best results concern the detection of the interregnum, followed by the detection of the interruption point. Several machine learning methods were tested, with Decision and Regression Trees generally achieving the best results. The most informative features for identifying and distinguishing disfluent regions include word duration ratios, the confidence level of the current word, ratios involving silences, and pitch and energy slopes. Features such as the number of phones and syllables per word proved more useful for identifying the interregnum, while pitch and energy were the most suitable for identifying the interruption point. Experiments focusing on the detection of filled pauses were also carried out. For now, these experiments used only material from forced alignment, since the automatic recognition system is not well adapted to this domain. This study represents a new step towards the automatic detection of filled pauses for European Portuguese using prosodic features. Future work aims to extend this study to automatic transcriptions and to address other domains, exploring more extensive sets of linguistic features.
Vojtěch, Albert. "Komplexní slova typu 'absobloominlutely'." Master's thesis, 2019. http://www.nusl.cz/ntk/nusl-393648.