Dissertations / Theses on the topic 'Prosody features'

Author: Grafiati

Published: 4 June 2021

Last updated: 16 February 2022

Consult the top 39 dissertations / theses for your research on the topic 'Prosody features.'

1

Brierley, Claire. "Prosody resources and symbolic prosodic features for automated phrase break prediction." Thesis, University of Leeds, 2011. http://etheses.whiterose.ac.uk/2038/.

Abstract:

It is universally recognised that humans process speech and language in chunks, each meaningful in itself. Any two renditions or assimilations of a given sentence will exhibit similarities and discrepancies in chunking, where speakers and readers use pauses and inflections to mark phrase breaks. This thesis reviews deterministic and stochastic approaches to phrase break prediction, plus datasets, evaluation metrics and feature sets. Early rule-based experimental work with a chunk parser gives rise to motivational insights, namely: the limitations of traditional features (syntax and punctuation) and deficiency of prosody in current phrasing models, and the problem of evaluating performance when the training set only represents one phrasing variant. Such insights inform resource creation in the form of ProPOSEL, a prosody and part-of-speech English lexicon, to create a domain-independent knowledge source, plus prosodic annotation and text analytics tool for corpus-based research, supported by a comprehensive software tutorial. Future applications of ProPOSEL include prosody-motivated speech-to-viseme generation for "talking heads" and expressive avatar creation. Here, ProPOSEL is used to build the ProPOSEC dataset by merging and annotating two versions of the Spoken English Corpus. Linguistic data arrays in this dataset are first mined for prosodic boundary correlates and later re-conceptualised as training instances for supervised machine learning. This thesis contends that native English speakers use certain sound patterns (e.g. diphthongs and triphthongs) as linguistic signs for phrase breaks, having observed these same patterns at rhythmic junctures in poetry. Pre-boundary lexical items bearing these complex vowels and gold-standard boundary annotations are found to be highly correlated via the chi-squared statistic in different genres, including seventeenth century English verse, and for multiple speakers. 
Complex vowels and other symbolic prosodic features are then implemented in a phrasing model to evaluate efficacy for phrase break prediction. The ultimate challenge is to better understand how sound and rhythm, as components of the linguistic sign, inform psycholinguistic chunking even during silent reading.
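
Illustrative note: the abstract reports that pre-boundary words bearing complex vowels and gold-standard boundary annotations are correlated via the chi-squared statistic. The following is a minimal sketch of that kind of test on a 2x2 contingency table. The function name and all counts are hypothetical and are not taken from the thesis, whose exact procedure may differ.

```python
def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    # Closed form for a 2x2 table, without continuity correction.
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


# Hypothetical counts (assumption, for illustration only):
#                    boundary   no boundary
# complex vowel        a=60        b=40
# other vowel          c=90        d=210
CRITICAL_VALUE_P05 = 3.841  # chi-squared critical value, 1 degree of freedom, p = 0.05

statistic = chi_squared_2x2(60, 40, 90, 210)
print(f"chi-squared = {statistic:.2f}, "
      f"significant at 0.05: {statistic > CRITICAL_VALUE_P05}")
```

A statistic above the critical value suggests that vowel type and boundary presence are not independent in the sample. It says nothing about the direction or size of the effect.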

2

Väyrynen, E. (Eero). "Emotion recognition from speech using prosodic features." Doctoral thesis, Oulun yliopisto, 2014. http://urn.fi/urn:isbn:9789526204048.

Abstract:

Emotion recognition, a key step of affective computing, is the process of decoding an embedded emotional message from human communication signals, e.g. visual, audio, and/or other physiological cues. It is well known that speech is the main channel for human communication and thus vital in the signalling of emotion and semantic cues for the correct interpretation of contexts. In the verbal channel, the emotional content is largely conveyed as constant paralinguistic information signals, of which prosody is the most important component. The lack of evaluation of affect and emotional states in human-machine interaction is, however, currently limiting the potential behaviour and user experience of technological devices. In this thesis, speech prosody and related acoustic features of speech are used for the recognition of emotion from spoken Finnish. More specifically, methods for emotion recognition from speech relying on long-term global prosodic parameters are developed. An information fusion method is developed for short segment emotion recognition using local prosodic features and vocal source features. A framework for emotional speech data visualisation is presented for prosodic features. Emotion recognition in Finnish comparable to the human reference is demonstrated using a small set of basic emotional categories (neutral, sad, happy, and angry). A recognition rate for Finnish was found comparable with those reported in the western language groups. Increased emotion recognition is shown for short segment emotion recognition using fusion techniques. Visualisation of emotional data congruent with the dimensional models of emotion is demonstrated utilising supervised nonlinear manifold modelling techniques. The low dimensional visualisation of emotion is shown to retain the topological structure of the emotional categories, as well as the emotional intensity of speech samples. 
The thesis provides pattern recognition methods and technology for the recognition of emotion from speech using long speech samples, as well as short stressed words. The framework for the visualisation and classification of emotional speech data developed here can also be used to represent speech data from other semantic viewpoints by using alternative semantic labellings if available.
Tiivistelmä (Finnish abstract, translated): Emotion recognition is a key area of affective computing. It aims to decode the emotional messages embedded in human communication, e.g. by means of visual, auditory, and/or physiological cues. Speech is the most important means of human communication and thus plays a vital role in the correct semantic and emotional interpretation of communication. In speech, emotional information is conveyed largely as continuous paralinguistic signalling, of which prosody is the most important component. The lack of this affective and emotional interpretation in human-machine interaction, however, still limits the operation of technological devices and their user experience. In this doctoral thesis, prosodic and acoustic features of speech are used to recognise the emotional content of spoken Finnish. Emotion recognition methods based on the prosodic features of long speech samples have been developed. For recognising the emotional content of short speech segments, a method based on information fusion has been developed, using a combination of prosodic features and voice source quality features. In addition, a technological framework has been developed for visualising emotional speech by means of prosodic features. The study achieved a level of automatic emotion recognition comparable to human recognition ability when using a small set of basic emotions (neutral, sad, happy, and angry). Emotion recognition performance for spoken Finnish was found to be comparable with that for Western European languages. In recognising the emotional content of short speech segments, better performance was achieved when using the fusion method. The supervised nonlinear manifold modelling technique developed for visualising emotional speech was able to produce a visual structure for the data resembling the dimensional model of emotion. 
The low-dimensional representation could further be shown to preserve the topological structures of both the emotional classes and the emotional intensity of the research data. In this thesis, technology based on pattern recognition methods was developed for recognising emotional speech using both long and short speech samples. Using the technological framework developed for the visualisation and classification of emotional data, speech data can also be represented according to other semantic structures.

3

Rask, Linnea. "Prosodic Features in Child-directed Speech during the Child's First Year." Thesis, Stockholms universitet, Avdelningen för fonetik, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-118382.

Abstract:

This study investigates prosodic features of child-directed speech during the child’s first year, using the automated prosodic annotation software Prosogram. From previous studies on first language acquisition and child-directed speech we know that speech directed to infants and small children is characterised by exaggerated use of several prosodic features, including a higher pitch, livelier pitch movement and slower speech rate. Annotation of these phenomena has previously been done manually, which is time-consuming and includes a risk of circularity. If we can use semi-automated systems to carry out this task, it would be a huge methodological gain. This study analysed recordings of 10 parent-child pairs on four occasions (3, 6, 9 and 12 months of age) for a total of 40 recordings. The audio files were analysed in Prosogram in order to detect possible differences depending on the child’s age. The results showed a noticeable change in child-directed speech over the first year of the child’s life. A change in several characteristic prosodic features was noted to occur between the ages of 6 and 9 months. Pitch levels decreased, and articulation rate increased. Additionally, parents seemed to raise their pitch further above their mean pitch when speaking to children aged 3 and 6 months than when speaking to children aged 9 and 12 months. Despite using a relatively small sample, the results show several interesting trends in the usage of child-directed speech. Furthermore, this study shows that Prosogram is a useful tool for automatic analysis of child-directed speech.

4

Zelenák, Martin. "Detection and handling of overlapping speech for speaker diarization." Doctoral thesis, Universitat Politècnica de Catalunya, 2012. http://hdl.handle.net/10803/72431.

Abstract:

For the last several years, speaker diarization has been attracting substantial research attention as one of the spoken language technologies applied for the improvement, or enrichment, of recording transcriptions. Recordings of meetings, compared to other domains, exhibit an increased complexity due to the spontaneity of speech, reverberation effects, and also due to the presence of overlapping speech. Overlapping speech refers to situations when two or more speakers are speaking simultaneously. In meeting data, a substantial portion of errors of the conventional speaker diarization systems can be ascribed to speaker overlaps, since usually only one speaker label is assigned per segment. Furthermore, simultaneous speech included in training data can eventually lead to corrupt single-speaker models and thus to a worse segmentation. This thesis concerns the detection of overlapping speech segments and its further application for the improvement of speaker diarization performance. We propose the use of three spatial cross-correlation-based parameters for overlap detection on distant microphone channel data. Spatial features from different microphone pairs are fused by means of principal component analysis, linear discriminant analysis, or by a multi-layer perceptron. In addition, we also investigate the possibility of employing long-term prosodic information. The most suitable subset from a set of candidate prosodic features is determined in two steps. Firstly, a ranking according to the mRMR criterion is obtained, and then, a standard hill-climbing wrapper approach is applied in order to determine the optimal number of features. The novel spatial as well as prosodic parameters are used in combination with spectral-based features suggested previously in the literature. In experiments conducted on AMI meeting data, we show that the newly proposed features do contribute to the detection of overlapping speech, especially on data originating from a single recording site. 
In speaker diarization, for segments including detected speaker overlap, a second speaker label is picked, and such segments are also discarded from the model training. The proposed overlap labeling technique is integrated in Viterbi decoding, a part of the diarization algorithm. During the system development it was discovered that it is favorable to do an independent optimization of overlap exclusion and labeling with respect to the overlap detection system. We report improvements over the baseline diarization system on both single- and multi-site AMI data. Preliminary experiments with NIST RT data show DER improvement on the RT '09 meeting recordings as well. The addition of beamforming and a TDOA feature stream into the baseline diarization system, which was aimed at improving the clustering process, results in slightly higher effectiveness of the overlap labeling algorithm. A more detailed analysis on the overlap exclusion behavior reveals big improvement contrasts between individual meeting recordings as well as between various settings of the overlap detection operation point. However, a high performance variability across different recordings is also typical of the baseline diarization system, without any overlap handling.

5

Fonseca, De Sam Bento Ribeiro Manuel. "Suprasegmental representations for the modeling of fundamental frequency in statistical parametric speech synthesis." Thesis, University of Edinburgh, 2018. http://hdl.handle.net/1842/31338.

Abstract:

Statistical parametric speech synthesis (SPSS) has seen improvements over recent years, especially in terms of intelligibility. Synthetic speech is often clear and understandable, but it can also be bland and monotonous. Proper generation of natural speech prosody is still a largely unsolved problem. This is relevant especially in the context of expressive audiobook speech synthesis, where speech is expected to be fluid and captivating. In general, prosody can be seen as a layer that is superimposed on the segmental (phone) sequence. Listeners can perceive the same melody or rhythm in different utterances, and the same segmental sequence can be uttered with a different prosodic layer to convey a different message. For this reason, prosody is commonly accepted to be inherently suprasegmental. It is governed by longer units within the utterance (e.g. syllables, words, phrases) and beyond the utterance (e.g. discourse). However, common techniques for the modeling of speech prosody - and speech in general - operate mainly on very short intervals, either at the state or frame level, in both hidden Markov model (HMM) and deep neural network (DNN) based speech synthesis. This thesis presents contributions supporting the claim that stronger representations of suprasegmental variation are essential for the natural generation of fundamental frequency for statistical parametric speech synthesis. We conceptualize the problem by dividing it into three sub-problems: (1) representations of acoustic signals, (2) representations of linguistic contexts, and (3) the mapping of one representation to another. The contributions of this thesis provide novel methods and insights relating to these three sub-problems. In terms of sub-problem 1, we propose a multi-level representation of f0 using the continuous wavelet transform and the discrete cosine transform, as well as a wavelet-based decomposition strategy that is linguistically and perceptually motivated. 
In terms of sub-problem 2, we investigate additional linguistic features such as text-derived word embeddings and syllable bag-of-phones and we propose a novel method for learning word vector representations based on acoustic counts. Finally, considering sub-problem 3, insights are given regarding hierarchical models such as parallel and cascaded deep neural networks.

6

Gangireddy, Siva Reddy. "Recurrent neural network language models for automatic speech recognition." Thesis, University of Edinburgh, 2017. http://hdl.handle.net/1842/28990.

Abstract:

The goal of this thesis is to advance the use of recurrent neural network language models (RNNLMs) for large vocabulary continuous speech recognition (LVCSR). RNNLMs are currently state-of-the-art and shown to consistently reduce the word error rates (WERs) of LVCSR tasks when compared to other language models. In this thesis we propose various advances to RNNLMs. The advances are: improved learning procedures for RNNLMs, enhancing the context, and adaptation of RNNLMs. We learned better parameters by a novel pre-training approach and enhanced the context using prosody and syntactic features. We present a pre-training method for RNNLMs, in which the output weights of a feed-forward neural network language model (NNLM) are shared with the RNNLM. This is accomplished by first fine-tuning the weights of the NNLM, which are then used to initialise the output weights of an RNNLM with the same number of hidden units. To investigate the effectiveness of the proposed pre-training method, we have carried out text-based experiments on the Penn Treebank Wall Street Journal data, and ASR experiments on the TED lectures data. Across the experiments, we observe small but significant improvements in perplexity (PPL) and ASR WER. Next, we present unsupervised adaptation of RNNLMs. We adapted the RNNLMs to a target domain (topic or genre or television programme (show)) at test time using ASR transcripts from first pass recognition. We investigated two approaches to adapt the RNNLMs. In the first approach the forward propagating hidden activations are scaled: learning hidden unit contributions (LHUC). In the second approach we adapt all parameters of the RNNLM. We evaluated the adapted RNNLMs by showing the WERs on multi-genre broadcast speech data. We observe small (on average 0.1% absolute) but significant improvements in WER compared to a strong unadapted RNNLM model. Finally, we present the context-enhancement of RNNLMs using prosody and syntactic features. 
The prosody features were computed from the acoustics of the context words and the syntactic features were from the surface form of the words in the context. We trained the RNNLMs with word duration, pause duration, final phone duration, syllable duration, syllable F0, part-of-speech tag and Combinatory Categorial Grammar (CCG) supertag features. The proposed context-enhanced RNNLMs were evaluated by reporting PPL and WER on two speech recognition tasks, Switchboard and TED lectures. We observed substantial improvements in PPL (5% to 15% relative) and small but significant improvements in WER (0.1% to 0.5% absolute).
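
Illustrative note: the abstract describes the first adaptation approach as scaling the forward-propagating hidden activations (LHUC). The following is a minimal sketch of that scaling step only. It assumes the commonly used LHUC parameterisation in which each hidden unit has one learned parameter r, mapped to an amplitude 2 / (1 + exp(-r)) in the range (0, 2). The abstract does not state the parameterisation used in the thesis, and the function names and values below are hypothetical.

```python
import math


def lhuc_amplitudes(r):
    """Map per-unit LHUC parameters to amplitudes in (0, 2) via 2 * sigmoid(r)."""
    return [2.0 / (1.0 + math.exp(-value)) for value in r]


def apply_lhuc(hidden, r):
    """Rescale each hidden activation by its own amplitude."""
    if len(hidden) != len(r):
        raise ValueError("need exactly one LHUC parameter per hidden unit")
    return [amp * h for amp, h in zip(lhuc_amplitudes(r), hidden)]


# Hypothetical hidden activations of a recurrent layer at one time step.
hidden = [0.5, -0.2, 0.9]

# r = 0 gives amplitude 1, so the unadapted model is recovered exactly.
print(apply_lhuc(hidden, [0.0, 0.0, 0.0]))

# After adaptation some units are amplified and others damped.
print(apply_lhuc(hidden, [1.0, -1.0, 0.0]))
```

Only the r values would be updated during adaptation in this scheme, which keeps the number of domain-specific parameters small compared with adapting all model parameters.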

7

Oliveira, Miguel. "Prosodic features in spontaneous narratives." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2000. http://www.collectionscanada.ca/obj/s4/f2/dsk2/ftp02/NQ61670.pdf.

8

Iliev, Alexander Iliev. "Emotion Recognition Using Glottal and Prosodic Features." Scholarly Repository, 2009. http://scholarlyrepository.miami.edu/oa_dissertations/515.

Abstract:

Emotion conveys the psychological state of a person. It is expressed by a variety of physiological changes, such as changes in blood pressure, heart beat rate, degree of sweating, and can be manifested in shaking, changes in skin coloration, facial expression, and the acoustics of speech. This research focuses on the recognition of emotion conveyed in speech. There were three main objectives of this study. One was to examine the role played by the glottal source signal in the expression of emotional speech. The second was to investigate whether it can provide improved robustness in real-world situations and in noisy environments. This was achieved through testing in clear and various noisy conditions. Finally, the performance of glottal features was compared to diverse existing and newly introduced emotional feature domains. A novel glottal symmetry feature is proposed and automatically extracted from speech. The effectiveness of several inverse filtering methods in extracting the glottal signal from speech has been examined. Other than the glottal symmetry, two additional feature classes were tested for emotion recognition domains. They are the Tonal and Break Indices (ToBI) of American English intonation, and Mel Frequency Cepstral Coefficients (MFCC) of the glottal signal. Three corpora were specifically designed for the task. The first two investigated the four emotions: Happy, Angry, Sad, and Neutral, and the third added Fear and Surprise in a six emotions recognition task. This work shows that the glottal signal carries valuable emotional information and using it for emotion recognition has many advantages over other conventional methods. For clean speech, in a four-emotion recognition task, classical prosodic features achieved 89.67% recognition, ToBI combined with classical features reached 84.75% recognition, and glottal symmetry alone achieved 98.74%. 
For a six-emotion task, these three methods achieved 79.62%, 90.39% and 85.37% recognition rates, respectively. Using the glottal signal also provided greater classifier robustness under noisy conditions and distortion caused by low pass filtering. Specifically, for additive white Gaussian noise at SNR = 10 dB in the six-emotion task, the classical features and the classical features with ToBI both failed to provide successful results; speech MFCCs achieved a recognition rate of 41.43% and glottal symmetry reached 59.29%. This work has shown that the glottal signal, and the glottal symmetry in particular, provides high class separation for both the four- and six-emotion cases. It confidently surpasses the performance of all other features included in this investigation in noisy speech conditions and in most clean signal conditions.

9

Chan, Oscar. "Prosodic features for a maximum entropy language model." University of Western Australia. School of Electrical, Electronic and Computer Engineering, 2008. http://theses.library.uwa.edu.au/adt-WU2008.0244.

Abstract:

A statistical language model attempts to characterise the patterns present in a natural language as a probability distribution defined over word sequences. Typically, they are trained using word co-occurrence statistics from a large sample of text. In some language modelling applications, such as automatic speech recognition (ASR), the availability of acoustic data provides an additional source of knowledge. This contains, amongst other things, the melodic and rhythmic aspects of speech referred to as prosody. Although prosody has been found to be an important factor in human speech recognition, its use in ASR has been limited. The goal of this research is to investigate how prosodic information can be employed to improve the language modelling component of a continuous speech recognition system. Because prosodic features are largely suprasegmental, operating over units larger than the phonetic segment, the language model is an appropriate place to incorporate such information. The prosodic features and standard language model features are combined under the maximum entropy framework, which provides an elegant solution to modelling information obtained from multiple, differing knowledge sources. We derive features for the model based on perceptually transcribed Tones and Break Indices (ToBI) labels, and analyse their contribution to the word recognition task. While ToBI has a solid foundation in linguistic theory, the need for human transcribers conflicts with the statistical model's requirement for a large quantity of training data. We therefore also examine the applicability of features which can be automatically extracted from the speech signal. We develop representations of an utterance's prosodic context using fundamental frequency, energy and duration features, which can be directly incorporated into the model without the need for manual labelling. 
Dimensionality reduction techniques are also explored with the aim of reducing the computational costs associated with training a maximum entropy model. Experiments on a prosodically transcribed corpus show that small but statistically significant reductions to perplexity and word error rates can be obtained by using both manually transcribed and automatically extracted features.

10

Bryant, Gregory Alan. "Prosodic features of verbal irony in spontaneous speech." Diss., Digital Dissertations Database. Restricted to UC campuses, 2004. http://uclibs.org/PID/11984.

11

Breen, Mara E. "The identification and function of English prosodic features." Thesis, Massachusetts Institute of Technology, 2007. http://hdl.handle.net/1721.1/40974.

Abstract:

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Brain and Cognitive Sciences, 2007.
Includes bibliographical references (leaves 98-102).
This thesis contains three sets of studies designed to explore the identification and function of prosodic features in English. The first set of studies explores the identification of prosodic features using prosodic annotation. We compared inter-rater agreement for two current prosodic annotation schemes, ToBI (Silverman, et al., 1992) and RaP (Dilley & Brown, 2005), which provide guidelines for the identification of English prosodic features. The studies described here survey inter-rater agreement for both novice and expert raters in both systems, and for both spontaneous and read speech. The results indicate high agreement for both systems on binary classification, but only moderate agreement for categories with more than two levels. The second section explores an aspect of the function of prosody in determining the propositional content of a sentence by investigating the relationship between syntactic structure and intonational phrasing. The first study tests and refines a model designed to predict the intonational phrasing of a sentence given the syntactic structure. In further analysis, we demonstrate that specific acoustic cues (word duration and the presence of silence after a word) can give rise to the perception of intonational boundaries. The final set of experiments explores the relationship between prosody and information structure, and how this relationship is realized acoustically. In a series of four experiments, we manipulated the information status of elements of declarative sentences by varying the questions that preceded those sentences. We found that all of the acoustic features we tested (duration, f0, and intensity) were utilized by speakers to indicate the location of an accented element. However, speakers did not consistently indicate differences in information status type (wide focus, new information, contrastive information) with the acoustic features we investigated.
by Mara E. Breen.
Ph.D.

12

Bergqvist, Magdalena. "Detecting engagement from prosodic features in spoken dialog." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-455483.

Abstract:

This thesis aims to detect engagement in humans playing a speech-driven game with an embodied agent. Prosodic features of the human speech are used to detect engagement using a k-nearest neighbours machine learning algorithm. This thesis uses a corpus consisting of games where the human and the embodied agent are playing a geography-themed game belonging to the Rapid Dialogue Game domain. The corpus was collected prior to this thesis; however, the data was annotated using a coding scheme designed specifically for this thesis. The labels in the coding scheme were defined depending on attention and affect levels in the player. A high level of attention would mean the player was actively trying to score points in the game, and affect referred to emotion of a positive, negative or neutral kind. The coding scheme achieved an inter-annotator reliability of 80% and 57% when two games were annotated by three people and the reliability was calculated for the two games separately. The results show that the algorithm was not able to accurately detect engagement in players. The algorithm achieved an accuracy of 45% when classifying four levels of engagement, and 64% when classifying two levels. One reason the system was unable to perform the task could be that the machine learning algorithm used is not able to learn the behaviors needed, or that the selected features are not enough to learn from. It could also be that the coding scheme used to annotate data was not specific enough, which could create incorrectly labeled data.
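
Illustrative note: the abstract names k-nearest neighbours as the classifier applied to prosodic features. The following is a minimal sketch of k-nearest-neighbour classification with Euclidean distance and majority voting. The two-dimensional feature vectors, the labels and the function name are hypothetical stand-ins. The actual feature set, distance measure and value of k used in the thesis are not given in the abstract.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query.

    train is a list of (feature_vector, label) pairs.
    """
    if not 0 < k <= len(train):
        raise ValueError("k must be between 1 and the number of training points")
    # Sort training points by Euclidean distance to the query and keep k of them.
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical, already normalised features: (pitch range, speech rate).
train = [
    ((0.0, 0.0), "low engagement"),
    ((0.0, 1.0), "low engagement"),
    ((1.0, 0.0), "low engagement"),
    ((5.0, 5.0), "high engagement"),
    ((5.0, 6.0), "high engagement"),
    ((6.0, 5.0), "high engagement"),
]

print(knn_predict(train, (0.2, 0.1)))
print(knn_predict(train, (5.5, 5.2)))
```

Because the prediction depends entirely on distances in feature space, features measured on different scales would normally be normalised first, and noisy labels among the neighbours directly affect the vote.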

13

Perera, Katharine. "The development of prosodic features in children's oral reading." Thesis, University of Manchester, 1989. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.276094.

14

Swart, Philippa H. "Prosodic features of imperatives in Xhosa : implications for a text-to-speech system." Thesis, Stellenbosch : Stellenbosch University, 2000. http://hdl.handle.net/10019.1/51891.

Abstract:

Thesis (MA)--University of Stellenbosch, 2000.
ENGLISH ABSTRACT: This study focuses on the prosodic features of imperatives and the role of prosodies in the development of a text-to-speech (TTS) system for Xhosa, an African tone language. The perception of prosody is manifested in suprasegmental features such as fundamental frequency (pitch), intensity (loudness) and duration (length). Very little experimental research has been done on the prosodic features of any grammatical structures (moods and tenses) in Xhosa, therefore it has not yet been determined how and to what degree the different prosodic features are combined and utilized in the production and perception of Xhosa speech. One such grammatical structure, for which no explicit descriptive phonetic information exists, is the imperative mood expressing commands. In this study it was shown how the relationship between duration, pitch and loudness, as manifested in the production and perception of Xhosa imperatives, could be determined through acoustic analyses and perceptual experiments. An experimental phonetic approach proved to be essential for the acquisition of substantial and reliable prosodic information. An extensive acoustic analysis was conducted to acquire prosodic information on the production of imperatives by Xhosa mother tongue speakers. Subsequently, various statistical parameters were calculated on the raw acoustic data (i) to establish patterns of significance and (ii) to represent the large amount of numeric data generated, in a compact manner. A perceptual experiment was conducted to investigate the perception of imperatives. The prosodic parameters that were extracted from the acoustic analysis were applied to synthesize imperatives in different contexts. A novel approach to Xhosa speech synthesis was adopted. Monotonous verbs were recorded by one speaker and the pitch and duration of these words were then manipulated with the TD-PSOLA technique. 
Combining the results of the acoustic analysis and the perceptual experiment made it possible to present a prosodic model for the generation of perceptually acceptable imperatives in a practical Xhosa TTS system. Prosody generation in a natural language processing (NLP) module and its place within the larger framework of text-to-speech synthesis was discussed. It was shown that existing architectures for TTS synthesis would not be appropriate for Xhosa without some adaptation. Hence, a unique architecture was suggested and its possible application subsequently illustrated. Of particular importance was the development of an alternative algorithm for grapheme-to-phoneme conversion. Keywords: prosody, speech synthesis, speech perception, acoustic analysis, Xhosa
AFRIKAANSE OPSOMMING: Hierdie studie fokus op die prosodiese eienskappe van imperatiewe en die rol van prosodie in die ontwikkeling van 'n teks-na-spraak-sisteem vir Xhosa, 'n Afrika-toontaal. Die persepsie van prosodie word gemanifesteer in suprasegmentele eienskappe soos fundamentele frekwensie (toonhoogte), intensiteit (luidheid) en duur (lengte). Weinig eksperimentele navorsing bestaan ten opsigte van die prosodiese eienskappe van enige grammatikale strukture (modus en tyd) in Xhosa. Hoe en tot watter mate die verskillende prosodiese kenmerke gekombineer en gebruik word in die produksie en persepsie van Xhosa-spraak is nog nie duidelik nie. 'n Grammatikale struktuur waarvoor geen eksplisiete deskriptiewe fonetiese inligting bestaan nie, is die van die imperatiewe modus wat bevele uitdruk. Hierdie studie wys hoe die verhouding tussen duur, toonhoogte en luidheid, soos gemanifesteer in die produksie en persepsie van Xhosa-imperatiewe bepaal kon word deur akoestiese analises en persepsuele eksperimente. Dit het geblyk dat 'n eksperimenteelfonetiese benadering noodsaaklik is vir die verkryging van sinvolle en betroubare prosodiese inligting. 'n Uitgebreide akoestiese analise is uitgevoer om prosodiese data omtrent die produksie van imperatiewe deur Xhosa-moedertaalsprekers te bekom. Vervolgens is verskeie statistiese analises op die rou akoestiese data uitgevoer om (i) patrone van beduidenheid te bepaal en om (ii) die groot hoeveelheid numeriese data wat gegenereer is meer kompak voor te stel. 'n Persepsuele eksperiment is uitgevoer met die doel om die persepsie van imperatiewe te ondersoek. Die prosodiese parameters soos uit die akoestiese analise bekom, is toegepas in die sintese van bevele in verskillende kontekste. 'n Nuwe benadering tot Xhosaspraaksintese is gevolg. Monotone werkwoorde is vir een spreker opgeneem en die toonhoogte en duur van hierdie woorde is met TD-PSOLA tegniek gemanipuleer. 'n Kombinasie van akoestiese en persepsuele resultate is aangewend om 'n prosodiese model te ontwikkel vir die sintese van persepsueel aanvaarbare imperatiewe in 'n praktiese Xhosa teks-na-spraaksintetiseerder. Prosodie-generering in 'n natuurlike taalprosesering-module en die plek daarvan binne die raamwerk van teks-na-spraaksintese is bespreek. Daar is gewys dat bestaande argitekture vir teks-na-spraaksisteme nie sonder sommige aanpassings toepaslik vir Xhosa sal wees nie. Derhalwe is 'n unieke argitektuur gesuggereer en die moontlike toepassing daarvan geïllustreer. Die ontwikkeling van 'n alternatiewe algoritme vir letter-na-klankomsetting was van besondere belang. Sleutelwoorde: spraaksintese, spraakpersepsie, akoestiese analise, Xhosa

15

Wong, Jimmy Pui Fung. "The use of prosodic features in Chinese speech recognition and spoken language processing." Thesis, Hong Kong University of Science and Technology, 2003. http://library.ust.hk/cgi/db/thesis.pl?ELEC%202003%20WONG.

Abstract:

Thesis (M.Phil.)--Hong Kong University of Science and Technology, 2003.
Includes bibliographical references (leaves 97-101). Also available in electronic version. Access restricted to campus users.

16

Burov, Ivaylo. "Les phénomènes de Sandhi dans l'espace gallo-roman." Phd thesis, Université Michel de Montaigne - Bordeaux III, 2012. http://tel.archives-ouvertes.fr/tel-00807535.

Abstract:

This doctoral thesis falls mainly, though not entirely, within the field of general and Romance phonology. It studies several sandhi phenomena attested in a number of Gallo-Romance varieties: French, Occitan, Walloon and Franco-Provençal. Since a large share of the postlexical phonological phenomena studied are pan-Romance, the thesis does not analyse them as isolated processes but through their diatopic and diachronic variation, that is, as concrete manifestations of tendencies common to the Romance languages, while trying to explain their motivation through universal phonological principles as well as through the methods of contrastive analysis. Three broad thematic parts can be distinguished in the thesis. The first is theoretical in scope and comprises chapters I and II, which present and analyse data from some sixty languages spoken around the world. In this part I review the various contested senses of the term sandhi with a view to proposing my own definition using the formalism of prosodic phonology. The second part is phonological in scope and comprises chapters III, IV and V, which examine three sandhi phenomena of the Gallo-Romance area, namely liaison, phonosyntactic doubling (redoublement phonosyntaxique) and vowel-zero alternations in initial syllables. The last thematic part is chapter VI, which is sociolinguistic in scope. There the three sandhi phenomena in question are compared and analysed in light of the factors behind their variation, among which the orthographic tradition occupies a privileged place.

17

Sethu, Vidhyasaharan. "Automatic emotion recognition: an investigation of acoustic and prosodic parameters." University of New South Wales. Electrical Engineering & Telecommunications, 2009. http://handle.unsw.edu.au/1959.4/44620.

Abstract:

An essential step to achieving human-machine speech communication with the naturalness of communication between humans is developing a machine that is capable of recognising emotions based on speech. This thesis presents research addressing this problem, by making use of acoustic and prosodic information. At a feature level, novel group delay and weighted frequency features are proposed. The group delay features are shown to emphasise information pertaining to formant bandwidths and are shown to be indicative of emotions. The weighted frequency feature, based on the recently introduced empirical mode decomposition, is proposed as a compact representation of the spectral energy distribution and is shown to outperform other estimates of energy distribution. Feature level comparisons suggest that detailed spectral measures are very indicative of emotions while exhibiting greater speaker specificity. Moreover, it is shown that all features are characteristic of the speaker and require some sort of normalisation prior to use in a multi-speaker situation. A novel technique for normalising speaker-specific variability in features is proposed, which leads to significant improvements in the performances of systems trained and tested on data from different speakers. This technique is also used to investigate the amount of speaker-specific variability in different features. A preliminary study of phonetic variability suggests that phoneme specific traits are not modelled by the emotion models and that speaker variability is a more significant problem in the investigated setup. Finally, a novel approach to emotion modelling that takes into account temporal variations of speech parameters is analysed. An explicit model of the glottal spectrum is incorporated into the framework of the traditional source-filter model, and the parameters of this combined model are used to characterise speech signals. An automatic emotion recognition system that takes into account the shape of the contours of these parameters as they vary with time is shown to outperform a system that models only the parameter distributions. The novel approach is also empirically shown to be on par with human emotion classification performance.

18

Birchwood, Aina, and Leidnert Michaela Eriksson. "Nyordsinlärning i relation till ordförråd, nonordsrepetition och prosodi hos en grupp barn i förskoleåldern med typisk språkutveckling." Thesis, Linköpings universitet, Institutionen för klinisk och experimentell medicin, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-105600.

Abstract:

Vid flertalet studier har det framkommit att ordförrådets storlek och förmågan till nonordsrepetition påvisar samband med nyordsinlärning. De prosodiska egenskapernas inverkan vid nyordsinlärning är emellertid inte lika studerad. Syftet med föreliggande studie var att undersöka hur barn mellan 4:5 och 6:0 år med typisk språkutveckling presterar på nyordsinlärning i relation till ordförråd och repetition av nonord samt att utforska vilken inverkan prosodiska egenskaper har på förmågan till nyordsinlärning. I studien deltog 15 barn vilkas resultat på nyorden, ordförrådstestningen och nonordsrepetitionen uträknades. Nyordsinlärningsuppgiften bestod av sex ord vilka sammankopplades med sex olika föremål. Nyorden matchades i par med avseende på en åtskiljande prosodisk egenskap mellan dem: antingen antal stavelser, betoning eller ordaccent. Inga signifikanta korrelationer mellan nyordsinlärning, ordförråd och nonordsrepetition kunde påvisas. Det framkom dock att korrelationen mellan ålder och nonordsrepetition var nära signifikans och indikerade att ökad ålder gav ett högre resultat på nonordsrepetitionen. Gällande de prosodiska egenskapernas relation till nyordsinlärning upptäcktes en signifikant skillnad i betoningsplacering, nyord med betoning på den finala stavelsen fick högst resultat. Barnen lärde sig också trestaviga ord i större utsträckning än tvåstaviga ord. Studien implicerar att betoning och ordlängd verkar ha viss betydelse för nyordsinlärning i kontrast till ordaccent, medan det inte kan påvisas några föreliggande korrelationer mellan nyordsinlärning, ordförråd och nonordsrepetition.
Several studies have shown that vocabulary size and nonword repetition ability correlate with novel word learning. The impact of prosodic features on novel word learning has, however, not been studied extensively. The purpose of this study was to examine how children aged 4:5–6:0 with typical language development perform on novel word learning in relation to vocabulary and nonword repetition, and to explore what impact prosodic features have on the ability to learn novel words. The study involved 15 children whose performance on the novel word learning task, vocabulary testing and nonword repetition was calculated. The novel word learning task consisted of six words, each paired with one of six different objects. The novel words were matched in pairs differing by only one prosodic feature: either the number of syllables, stress or tonal word accent. No significant correlations between novel word learning, vocabulary and nonword repetition were found. However, the correlation between age and nonword repetition approached significance, indicating that older children scored higher on nonword repetition. Regarding how the prosodic features related to novel word learning, a significant difference between stress placements was detected: novel words with stress on the final syllable were easier to learn. The children also scored higher on the three-syllable words than on the two-syllable words. The study implies that stress and word length, in contrast to tonal word accent, seem to play some role in novel word learning, while there appears to be no relation between novel word learning, vocabulary and nonword repetition.

19

Clemens, Denise Leslie. "A study of the capability of the computerized Visi-Pitch when investigating prosodic features of motherese." PDXScholar, 1988. https://pdxscholar.library.pdx.edu/open_access_etds/3743.

Abstract:

With commercial availability of non-real and real-time spectrum analyzers, the speech-language pathologist has the means to objectively extract and measure pitch taken from speech samples. Though both types of spectrum analyzers provide the clinician with viable methods of measuring fundamental frequency and frequency range values, pitch extraction using real time allows for greater efficiency in acoustic measurements. The Kay Elemetrics Visi-Pitch is one such real-time spectrum analyzer that is less expensive and more accessible than other real time speech science hardware. The purpose of this study was to investigate the capability of a computerized Visi-Pitch to reflect elevation of fundamental frequency and expansion of frequency range by female adults.

20

Jolley, Caitlin. "The Effect of Computer-Based Pronunciation Readings on ESL Learners' Perception and Production of Prosodic Features in a Short-Term ESP Course." BYU ScholarsArchive, 2014. https://scholarsarchive.byu.edu/etd/4321.

Abstract:

Recent studies on pronunciation teaching in ESL classrooms have found that the teaching of suprasegmentals, namely stress, pausing, and intonation, has a great effect on improving intelligibility (Derwing, Munro, & Wiebe, 1998; Kang, Rubin, & Pickering, 2010; Morley, 1991). The current project describes the development and implementation of computer-based pronunciation materials used for an English for Specific Purposes (ESP) program. The pronunciation program made use of cued pronunciation readings (CPRs) which used suprasegmentals and were developed for English as a second language (ESL) missionaries at the Provo, Utah, Missionary Training Center (MTC). Because there was no pronunciation program in place at the MTC, instructional materials that focused on prosodic features were greatly needed. Missionaries participated in the program anywhere from three to six weeks. Results from the implementation period revealed that missionaries made medium to large gains in their ability to perceive suprasegmentals after using the practice tasks and small-medium gains in their ability to produce suprasegmentals during this short time period. Recommendations for further development, implementation, and testing of similar materials are made for use with individuals in other ESP settings like these missionaries at the MTC.

21

Navrátil, Michal. "Rozpoznávání emočních stavů pomocí analýzy řečového signálu." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2008. http://www.nusl.cz/ntk/nusl-217263.

Abstract:

This diploma thesis deals with recognising a speaker's emotional state by analysing the speech signal. The thesis has two parts. The first part describes the process of speech production and the commonly used preprocessing methods, such as denoising and pre-emphasis. It also covers the major and minor prosodic features: the fundamental frequency, energy, spectral features and time-domain features such as the speech rate. The second part deals with the task of emotion recognition from the speech signal. Given a sufficient number of recordings of emotional states, the emotional state can be recognised with high probability. The whole project is designed for real-time use. The last part of the thesis describes the experiments carried out on a large number of speech recordings and presents their results.

22

Pfeifer, Leon. "Automatické rozpoznávání emočních stavů člověka na základě analýzy řečového projevu." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2008. http://www.nusl.cz/ntk/nusl-217520.

Abstract:

This diploma thesis deals with the analysis of human emotional states. The thesis consists of three parts. The first part characterises the process of speech production from a phonetic and psychological point of view. The second part covers the methods and related issues (signal preprocessing, voice activity detection). The fundamental frequency was calculated using the central clipping method; the other methods used are formant frequency analysis and a method that determines the number of "thorns and planes". The third part presents the results of the measurements obtained with the individual methods. Five emotional states were evaluated: neutral, anger, happiness, sadness and surprise. The part concludes with a discussion of the results for each method.

23

Hanyášová, Lucie. "Metody texturní analýzy v medicínských obrazech." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2008. http://www.nusl.cz/ntk/nusl-217230.

Abstract:

This thesis focuses on texture analysis methods. It contains an overview of widely used methods. The main aim of the thesis is to develop a method for texture analysis of retinal images that can be used to distinguish two patient groups, one with glaucomatous eyes and one healthy. It is observed that glaucoma patients do not have a texture on the eye ground. The images are preprocessed by converting them to different color spaces to best emphasise the eye ground texture. A co-occurrence matrix is chosen for the texture analysis of this data. The thesis contains a detailed description of the chosen solutions and a discussion of the features, and the result is a list of features that can be used to distinguish between glaucomatous and healthy eyes. The method is implemented in the Matlab environment.

24

Anderson, Jill M. "Lateralization Effects of Brainstem Responses and Middle Latency Responses to a Complex Tone and Speech Syllable." University of Cincinnati / OhioLINK, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1313687765.

25

Cauvin, Evelyne. "Elaboration de critères prosodiques pour une évaluation semi-automatique des apprenants francophones de l'anglais." Thesis, Sorbonne Paris Cité, 2017. http://www.theses.fr/2017USPCC097/document.

Abstract:

L’objectif de cette thèse est de modéliser l’interlangue prosodique des apprenants francophones de l’anglais afin de pouvoir élaborer des critères utilisables pour une évaluation semi-automatique de leur niveau prosodique. Le domaine évaluatif requiert la plus grande rigueur dans la mise en place de ses critères pour aboutir à la validité, la fiabilité, la faisabilité et l’équité maximales, alors que la prosodie anglaise de la langue cible se caractérise par son extrême variabilité. Aussi, peu d’études se sont engagées dans l’évaluation de la prosodie, qui représente une réelle gageure. Pour relever ce défi, une stratégie particulière a été mise en place pour élaborer une méthodologie permettant d’atteindre l’objectif fixé, en lecture.L’approche choisie repose sur la symbiose permanente qu’entretient la prosodie avec le monde dans lequel évolue le locuteur. Cette méthodologie, ou « profilage », est destinée à sélectionner par inclusion ou exclusion les éléments analysés tant au niveau perceptif qu’acoustique. Le profilage des réalisations sur l’axe syntagmatique permet de sélectionner les locuteurs natifs servant de modèles, et celui basé sur le phénomène d’emphase rend possible un ciblage de leurs réalisations les plus pertinentes à modéliser sur l’axe paradigmatique. Conformément à cette méthodologie d’investigation nouvelle et aux résultats perceptifs et acoustiques concordants pour la langue cible, les réalisations des apprenants francophones du corpus Longdale-Charliphonia sont analysés acoustiquement. 
Le classement automatique à partir des variables prosodiques (acoustiques et perceptives) est confronté à celui d’experts évaluant par perception classique.Les travaux de cette thèse aboutissent essentiellement à : Une modélisation de la prosodie anglaise non native par grilles évaluatives critériées s’appuyant sur critères distinctifs natifs et non natifs issus de variables temporelles (vitesse d’élocution avec ou sans pauses), de registre et de mélodie, ainsi que de rythme, À partir de ces variables, une évaluation semi-automatisée de 15 apprenants représentatifs du corpus par classement et notation, une correspondance des résultats de l’évaluation traditionnelle avec celle semi-automatique évoluant entre 56,83% et 59,74% dans une catégorisation des apprenants en 3 niveaux de maîtrise, en fonction du profilage d’experts évaluateurs
The aim of this study is to model the prosodic interlanguage of Francophone learners of English in order to provide usable criteria for a semi-automatic assessment of their prosodic level in English. Learner assessment demands great rigour in setting up criteria that ensure validity, reliability, feasibility and fairness, whereas English prosody is highly variable. Hence, few studies have attempted to assess prosody, because it represents a real challenge. To address this issue, a specific strategy has been devised to develop a methodology for successfully assessing a reading task. The approach relies upon the constant symbiosis between prosody and a speaker's subjective response to their environment. Our methodology, called "profiling", first aims at selecting relevant native perceived and acoustic prosodic features that will optimize assessment criteria by using their degree of emphasis and creating speakers' prosodic profiles. Then, using the Longdale-Charliphonia corpus, the learners' productions are analysed acoustically. The automatic classification of the learners based on acoustic or perceptual prosodic variables is then compared with expert aural assessment. This study achieves: (1) a model of non-native English prosody based on assessment grids that rely upon features of both native and non-native speakers of English, namely speech rate (with or without the inclusion of pauses), register, melody and rhythm; (2) a semi-automatic evaluation of 15 representative learners based on this model, by ranking and marking; and (3) a comparison of the semi-automatic results with those of experts' auditory assessment; correspondence between the two varies from 56.83% to 59.74% when categorising the learners into three prosodic proficiency groups.

26

Van, Heerden Charl Johannes. "Phoneme duration modelling for speaker verification." Diss., Pretoria : [s.n.], 2009. http://upetd.up.ac.za/thesis/available/etd-06262009-150945/.

27

Láník, Aleš. "Detekce výrobků na pásovém dopravníku." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2008. http://www.nusl.cz/ntk/nusl-235894.

Abstract:

This master's thesis presents the detection of objects in images and the tracking of these objects over time. First, the theoretical background of image preprocessing, image filtering, foreground extraction and various image features is described. Next, the design and implementation of the detector are covered. This part of the thesis mainly concerns the detection of objects on a belt conveyor. Finally, the results, conclusions and supplementary material, such as a photograph of the camera's location, are presented.

28

Hautala, T. (Terhi). "Ikääntyneiden kuuntelijoiden puheen ymmärtäminen kognitiivisesti vaativassa tilanteessa." Doctoral thesis, Oulun yliopisto, 2013. http://urn.fi/urn:isbn:9789526201856.

Abstract:

There are multiple factors simultaneously affecting speech perception in elderly people. These factors include hearing acuity, aging of the auditory system, and changes in both perception and cognitive processes, all of which can interfere with speech comprehension, especially in cognitively demanding situations. The aim of this study is to clarify which factors influence the use of an automatic phone service system designed for elderly (N = 36) people. More specifically, the aim is to investigate whether it is the factors connected to the system itself or the factors connected to the elderly users and their actions with the system that are the most crucial for using the system successfully. Both quantitative and qualitative methods are used in the study. There were four people who performed as speakers in the system. Analysis of the prosodic features of their speech was performed using acoustic analysis software. The variables connected to the elderly participants (n = 30) were investigated using interviews, pure-tone and speech audiometric tests, the Mini-Mental State Examination test (MMSE), and the Token Test for speech comprehension. Statistical analyses were used to explore whether there was a statistical connection between the acoustic measurements or the variables connected to participants themselves and their performance in usability test situation. In addition, the elderly participants’ actions in the test situation were observed using a material-based, qualitative video-analysis. The individuals who performed as speakers in the system were observed to use features of elderspeak in their speech. However, these speaker characteristics had little effect on the participants’ performance in the tasks. It was the voice-menu that contained the most semantically complex text structure that proved to be the most difficult for participants. Both low scores in the Token test and poor word recognition were connected to poor performance in the tasks. It was found based on the qualitative analysis that in addition to speech comprehension, there were other cognitive processes that were important for completing the tasks successfully, i.e. remembering the instructions given (memory), and the ability to direct, divide and maintain attention during the tasks. Poor performance in the tasks and in the Token Test, as well as problems in executive functions observed in the test situation, were found to be factors predicting dropping out of the next phase of the study the following year. Qualitative analysis of language use in cognitively demanding situations can be used in evaluation of high-level language performance. It may be useful for detecting mild changes in language skills that can be symptomatic of early stages of memory disorders. The results of this study can also be utilized when designing voice-based interfaces. In addition, it is important to consider both advantages and disadvantages of using elderspeak in the fields of nursing and speech therapy.
Tiivistelmä Ikääntyvien ihmisten puheen vastaanotossa vaikuttavat samanaikaisesti monet tekijät: kuulokyky, auditiivisen järjestelmän ikääntymismuutokset sekä havaintotoimintojen ja kognitiivisten toimintojen muutokset. Nämä voivat vaikeuttaa puheen ymmärtämistä erityisesti kognitiivisesti vaativassa tilanteessa. Tämän tutkimuksen tavoitteena on selvittää ikääntyneille osallistujille (N = 36) suunnitellun automaattisen puhelinpalvelujärjestelmän käyttöön liittyviä tekijöitä. Tavoitteena on selvittää se, missä määrin toisaalta kokeiltuun järjestelmään liittyvät tekijät ja toisaalta käyttäjien ominaisuudet sekä heidän toimintansa tutkimustilanteessa olivat yhteydessä järjestelmän menestykselliseen käyttöön. Tutkimuksessa käytetään kvantitatiivisia ja kvalitatiivisia menetelmiä. Järjestelmässä kokeiltiin neljän eri puhujan äänillä nauhoitettuja toimintaohjeita. Heidän puheensa prosodisia piirteitä analysoitiin äänen ja puheen analyysiohjelmilla. Ikääntyneisiin osallistujiin (n = 30) liittyviä muuttujia tutkittiin haastattelulla, kuulon tutkimuksilla (äänesaudiometria ja puheaudiometria), kognitiivisella seulontatestillä (Mini-mental state examination = MMSE) ja puheen ymmärtämistä mittaavalla Token-testillä. Mittaustulosten ja muuttujien yhteyttä tehtävistä suoriutumiseen tarkasteltiin tilastollisesti. Osallistujien toimintaa havainnoitiin järjestelmän käyttötilanteessa aineistolähtöisellä laadullisella videoanalyysillä. Järjestelmän puhujilla havaittiin ikääntyneille suunnatun puheen piirteitä. Tehtävistä suoriutuminen oli kuitenkin hyvin samanlaista puhujasta riippumatta. Semanttisesti monimutkaisin tekstivalikko oli osallistujille vaikein äänite. Matala Token-testin pistemäärä ja heikko puheen tunnistuskyky liittyivät heikkoon tehtävistä suoriutumiseen. Laadullisen analyysin perusteella puheen ymmärtämisen ohella keskeisiä kognitiivisia prosesseja tehtävissä menestymisen kannalta olivat seuraavat: ohjeiden muistaminen, huomion suuntaaminen, jakaminen ja ylläpito. 
Heikko suoriutuminen tehtävissä ja Token-testissä sekä tutkimustilanteessa havaitut toiminnan ohjauksen ongelmat ennustivat toisesta tutkimusvaiheesta poisjääntiä seuraavana vuonna. Kognitiivisesti vaativista kielen käyttötilanteista tehtävillä laadullisilla analyyseilla voidaan arvioida monimutkaisia kielellis-kognitiivisia toimintoja ja löytää mahdollisesti alkaviin muistisairauksiin liittyviä lieviä kielellisiä muutoksia. Tuloksia voidaan hyödyntää ääneen perustuvien käyttöliittymien suunnittelussa. Ikääntyneille suunnatun puheen etuja ja haittoja on tärkeää pohtia myös hoitotyön ja puheterapian näkökulmasta

29

Hung, Yu-Ping (洪宇平). "Punctuation Generation Inspired Linguistic Features for Mandarin Prosody Generation." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/5jw3er.

30

Childs, Jacob Auburn. "Suprasegmental features and their classroom application in pronunciation instruction." 2012. http://hdl.handle.net/2152/19921.

Abstract:

This Report examines the importance of suprasegmentals and how one might teach them. Drawing on the writings of experts in the field, I demonstrate the close relationship between suprasegmental features and intelligibility, and I support intelligibility as the goal of instruction with a review of the research literature. Pronunciation and suprasegmental research in pedagogy is analyzed and discussed, and teacher and learner beliefs are compared with current research-backed conclusions. Finally, this Report provides readers with sample lessons on nuclear stress to demonstrate how to incorporate a five-step pronunciation framework into a classroom or tutoring setting.

31

Wu, Jung-yun (吳仲耘). "Pitch Prediction Using Prosody Hierarchy and Dynamic Features for HMM-based Mandarin Speech Synthesis." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/03573076744504522613.

Abstract:

Master's
National Cheng Kung University
Department of Computer Science and Information Engineering (Master's and Doctoral Program)
96
Prosody is the main measure of the naturalness of speech, and pitch is the key factor known to carry prosodic information. In recent years, speech synthesis based on hidden Markov models (HMMs) has been developed; it can synthesize smooth speech and has the advantages of flexibility and a small footprint. Nevertheless, there is still room for improvement in the naturalness of the synthesized speech. In our research, we take the prosody hierarchy structure as the basis of the pitch prediction model and apply dynamic features to the units of each hierarchical layer. Prosodic units are supra-segmental units that occur in a hierarchical structure and reflect how the brain processes speech; dynamic features preserve the time correlation between adjacent units and result in more natural connections at each junction point. Applying this framework to an HMM-based speech synthesis system yields more natural-sounding speech. The purpose of this thesis is to develop a pitch prediction model using the prosody hierarchy structure and dynamic features and to investigate the improvement in naturalness of the synthesized speech. More specifically, this research addresses: (1) prediction and generation of the prosody hierarchy structure; (2) dynamic features for each hierarchical layer; (3) building the pitch prediction model for each layer: CART for the prosodic word and syllable levels, HMM for the frame level; (4) feature analysis using STRAIGHT (Speech Transformation and Representation based on Adaptive Interpolation of weiGHTed spectrogram). Experimental results from both subjective and objective tests of the proposed approach and of comparative systems show that our scheme outperforms the comparative ones and can generate more natural-sounding speech.


32

Lin, Yi-Ju, and 林奕儒. "Mispronunciation Detection and Diagnosis Combining Prosodic Features and Phonetic Features." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/2n6r3r.


Abstract:

Master's
National Taiwan Normal University
Department of Computer Science and Information Engineering
107
This thesis examines how a multi-task deep neural network model and prosodic features can assist mispronunciation detection and diagnosis (MDD). The purpose of computer-assisted pronunciation training (CAPT) is to help second-language (L2) learners correct mispronunciations automatically. CAPT can be divided into mispronunciation detection and mispronunciation diagnosis. This thesis focuses on three aspects. First, we explore the benefits of combining prosodic and phonetic features in the mispronunciation detection and diagnosis task. Second, we use multi-task learning models to help address the data imbalance problem. Third, we combine a likelihood-based scoring method (GOP) with a classification-based scoring method to achieve better detection and diagnosis results. The experimental results show that phonetic features work better for detecting mispronunciations, whereas prosodic features are more helpful for the mispronunciation diagnosis task.


33

"Spoken language identification with prosodic features." Thesis, 2011. http://library.cuhk.edu.hk/record=b6075120.


Abstract:

This thesis focuses on the use of prosodic features for automatic spoken language identification (LID). LID is the problem of automatically determining the language of spoken utterances. After three decades of research, state-of-the-art LID systems seem to have reached saturating performance. To meet tight accuracy requirements, prosody is proposed as an alternative feature type that provides complementary information for LID.
There is no conventional way to model prosody. We use a large prosodic feature set that covers fundamental frequency (F0), duration and intensity, and that considers various extraction and normalization methods for each type of feature. For modeling, the vector space modeling approach is adopted. We introduce a framework called the prosodic attribute model (PAM) to model the acoustic correlates of prosodic events in a flexible manner. Feature selection and preliminary LID tests are carried out to derive a preferred term-document matrix construction for modeling.
The PAM-based prosodic LID system is compared with other prosodic LID systems on a pairwise language identification task. The advantages of comprehensive modeling of prosodic features are clearly demonstrated. Analysis reveals the confusion patterns among the target languages, as well as the feature-language relationship. The PAM-based prosodic LID system is combined with a state-of-the-art phonotactic system by score-level fusion. Complementary effects are demonstrated between the two different feature types in the LID problem. An additional score calibration step, which further improves LID system performance, is also introduced.
Ng, Wai Man.
Adviser: Tan Lee.
Source: Dissertation Abstracts International, Volume: 73-04, Section: B, page: .
Thesis (Ph.D.)--Chinese University of Hong Kong, 2011.
Includes bibliographical references (leaves 112-125).
Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [201-] System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Electronic reproduction. Ann Arbor, MI : ProQuest Information and Learning Company, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Abstract also in Chinese.


34

Chen, Yan-Ting, and 陳彥廷. "Prosody Feature-based German Stressed/Unstressed Syllable Classification — A First Study." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/8fmt7w.


Abstract:

Master's
National Taipei University of Technology
Graduate Institute of Computer and Communication Engineering
99
Stress is important for understanding the semantics of stress-timed languages. To develop the stressed/unstressed judgement module of a German computer-assisted language learning system, and considering that prosodic features vary with sentence content, this thesis proposes a new normalization procedure and feature extraction method. The method relies mainly on the ability of the Fujisaki model to decompose the fundamental frequency contour, which is used to remove the phrase influence. Features are then extracted by considering the difference between the target syllable and its neighbors. The performance of the method is evaluated on the "The Kiel Corpus of Read Speech, Vol. I" database, using a decision tree for feature selection. Compared with traditional feature extraction, the proposed method performs better and is promising for reducing the phrase influence.


35

Chang, Shih-Cheng, and 張仕承. "Emotional Voice Conversion Using Prosodic and Spectral Features." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/9yx887.


Abstract:

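Master's
National Taiwan University of Science and Technology
Department of Computer Science and Information Engineering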
105
In this thesis, conversion methods for three prosodic features (pitch contour, duration and intensity) are studied, and an emotional voice conversion system is constructed. Neutral input speech is converted to speech with an angry, happy or sad emotion. In the training stage, an F0 GMM and a spectrum GMM are trained for each of the three target emotions, using the corresponding parallel corpus of 120 sentences. Based on sentence segmentation rules, the mean and standard deviation of the prosodic features are measured across sentences for each of three segments. This measurement is also performed on the training sentences of each target emotion. In the conversion stage, the pitch contour and DCC coefficients of the neutral input speech are mapped to the pitch contour and DCC coefficients of a specified target emotion using the corresponding F0 and spectrum GMMs. When the F0 GMM is used to convert the pitch contour, we find that the resulting pitch contour fluctuates. We therefore study how to reduce the fluctuations with median smoothing and moving-average processing. Next, using the segmental tables of statistical parameters obtained in the training stage, the three prosodic features (pitch contour, duration and intensity) are converted with the segmental standard deviation matching (SSDM) method. To bring the emotion expressed in the converted speech closer to the target emotion, we propose a dynamic speech duration adjustment method, in which the duration of a frame is determined dynamically according to its energy ratio. To evaluate the performance of our emotional voice conversion system, we conducted two subjective listening tests. The first test compares the emotional expression of two speech samples converted by two conversion methods. The percentages of votes obtained by our method are 95% for the angry emotion, 65% for the happy emotion, and 67.5% for the sad emotion. In the second test, each participant is asked to recognize the emotion expressed in the speech played to them. The results show that the recognition rates obtained by our conversion method are 87.5% for the angry emotion, 61.3% for the happy emotion, and 77.5% for the sad emotion. Therefore, the emotional voice conversion system using the studied conversion method is effective in converting neutral speech to speech with a specified target emotion.
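The smoothing and statistics-matching steps named in the abstract above can be sketched generically as follows. This is not the thesis's SSDM implementation: the window width, the edge handling, and the helper names `median_smooth`, `moving_average` and `match_mean_std` are assumptions, and the matching step is a plain mean and standard deviation mapping applied to one segment.

```python
from statistics import mean, median


def median_smooth(contour, width=3):
    """Replace each value by the median of its neighborhood (shorter at edges)."""
    half = width // 2
    n = len(contour)
    return [median(contour[max(i - half, 0):min(i + half + 1, n)])
            for i in range(n)]


def moving_average(contour, width=3):
    """Replace each value by the mean of its neighborhood (shorter at edges)."""
    half = width // 2
    n = len(contour)
    return [mean(contour[max(i - half, 0):min(i + half + 1, n)])
            for i in range(n)]


def match_mean_std(values, src_mean, src_std, tgt_mean, tgt_std):
    """Map values so that source statistics line up with target statistics."""
    if src_std == 0:
        return [tgt_mean for _ in values]
    return [tgt_mean + (v - src_mean) * tgt_std / src_std for v in values]


if __name__ == "__main__":
    # Hypothetical F0 values in Hz with one spurious spike.
    f0 = [100, 100, 300, 100, 100]
    print(median_smooth(f0))
    print(moving_average([0, 3, 6]))
    print(match_mean_std([90, 100, 110], 100, 10, 200, 20))
```

Median smoothing removes isolated spikes without blurring them into their neighbors, which is why it is commonly applied before a moving average rather than after it.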


36

Owens, Kate. "Effects of prosodic features on judgements of intelligibility and accentedness." Thesis, 1985. http://spectrum.library.concordia.ca/3124/1/ML23158.pdf.



37

Schindlerová, Tereza. "Specifika prozodie českého filmového dabingu." Master's thesis, 2015. http://www.nusl.cz/ntk/nusl-391361.


Abstract:

The aim of the thesis was to design an analytical model enabling a comparative analysis of the original and dubbed versions of film dialogues, with special regard to prosodic interference (intonation in particular); to describe and explain such interference, caused by certain differences between Czech and English; and to assess its influence on communication, considering the nature of a film character and its reception by Czech recipients. The analysis showed that a higher pitch register and an extended intonation range were the most common types of interference, creating the impression of over-emotive and over-melodious speech and also changing some of the film characters. These types of interference were caused by a different way of using intonation within the system of a language; English uses intonation to signal information structure and to express emotions as well. Interesting results were obtained when a structural approach to a character, as proposed by Jiří Levý (1971), was applied in the analysis. Surprisingly, another type of interference was discovered: a sort of "indirect" interference that develops when the dubbed version closely follows the original and its dominant prosodic features and uses them in situations where the original does not. Such interference is changing the...


38

Medeiros, Henrique Rodrigues Barbosa de. "Automatic detection of disfluencies in a corpus of university lectures." Master's thesis, 2014. http://hdl.handle.net/10071/8683.


Abstract:

This dissertation focuses on the identification of disfluent sequences and their distinct structural regions. The reported experiments are based on audio segmentation and prosodic features, calculated from a corpus of university lectures in European Portuguese containing about 32 hours of speech and about 7.7% disfluencies. The set of features automatically extracted from the force-aligned corpus proved to discriminate between the regions contained in the production of a disfluency. The best results concern the detection of the interregnum, followed by the detection of the interruption point. Several machine learning methods have been applied, but experiments show that Classification and Regression Trees usually outperform the other methods. The set of most informative features for cross-region identification encompasses word duration ratios, word confidence score, silence ratios, and pitch and energy slopes. Features such as the number of phones and syllables per word proved to be more useful for the identification of the interregnum, whereas energy slopes were most suited for identifying the interruption point. We have also conducted initial experiments on automatically detecting filled pauses, the most frequent disfluency type. For now, only force-aligned transcripts were used, since the ASR system is not well adapted to this domain. This study is a step towards automatic detection of filled pauses for European Portuguese using prosodic features. Future work will extend this study to fully automatic transcripts and will also tackle other domains, exploring extended sets of linguistic features.
This thesis addresses the identification of disfluent sequences and their structural regions. The experiments described here are based on segmentation and prosodic information, computed from a corpus of university lectures in European Portuguese containing about 32 hours of speech and about 7.7% disfluencies. The feature set used proved to be discriminative in identifying the regions contained in the production of disfluencies. The best results concern the detection of the interregnum, followed by the detection of the interruption point. Several machine learning methods were tested, with Classification and Regression Trees generally obtaining the best results. The most informative features for identifying and distinguishing disfluent regions include word duration ratios, the confidence score of the current word, ratios involving silences, and pitch and energy slopes. Features such as the number of phones and syllables per word proved more useful for identifying the interregnum, while pitch and energy were the most suitable for identifying the interruption point. Experiments focusing on the detection of filled pauses were also carried out. For now, only material from forced alignment was used in these experiments, since the automatic speech recognition system is not well adapted to this domain. This study represents a new step towards the automatic detection of filled pauses for European Portuguese using prosodic features. Future work aims to extend this study to automatic transcripts and to address other domains, exploring more extensive sets of linguistic features.
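The "pitch and energy slopes" listed among the informative features can be pictured as a least-squares line fit over the frames of a word. The sketch below is a generic illustration, not the dissertation's extraction code; the function name `slope`, the evenly spaced frames, and the `step` parameter are assumptions.

```python
def slope(values, step=1.0):
    """Least-squares slope of values sampled every `step` units.

    Returns 0.0 when fewer than two samples are available.
    """
    n = len(values)
    if n < 2:
        return 0.0
    xs = [i * step for i in range(n)]
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical per-frame energy values for one word.
    print(slope([1.0, 3.0, 5.0]))
    # Same values, frames spaced 0.5 units apart.
    print(slope([1.0, 3.0, 5.0], step=0.5))
```

A slope computed this way is a single number per word, so it can sit alongside duration ratios and confidence scores as one column of a feature vector for a tree-based classifier.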


39

Vojtěch, Albert. "Komplexní slova typu 'absobloominlutely'." Master's thesis, 2019. http://www.nusl.cz/ntk/nusl-393648.


Abstract:

The MA thesis examines the word-formation potential of expletive insertion with simple and complex words in English. It represents a linguistic phenomenon that is commonly used by native speakers, shows a certain degree of regularity and has gained popularity with the rise of the Internet, social media and the movie industry. The theoretical part introduces previous studies on the phenomenon and presents its basic features, namely the categorization of inserts and the classification of their positions in terms of the structure of the base, as outlined by McMillan (1980). The extraction of the sample is described in the methodology section. The empirical part examines the phenomenon's main principles of use, governed by prosody and morphology, and illustrates the properties and both the regularities and irregularities that the process exhibits (predictable insert position, poly-syllabicity of the base, its unchanged meaning and syntactic category, alternative categories of input bases and morphematic discontinuity of bases). The analysis comprises two main parts: the study of the inserted bases (word class, type of base, simple vs. complex, and number of syllables) and the study of the expletive insert (representation of individual inserts and their position relative to stress...

