Academic literature on the topic 'Prosody features'

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Prosody features.'

Journal articles on the topic "Prosody features"

1

Freese, Jeremy, and Douglas W. Maynard. "Prosodic features of bad news and good news in conversation." Language in Society 27, no. 2 (April 1998): 195–219. http://dx.doi.org/10.1017/s0047404500019850.

Abstract:
Recent work suggests the importance of integrating prosodic research with research on the sequential organization of ordinary conversation. This paper examines how interactants use prosody as a resource in the joint accomplishment of delivered news as good or bad. Analysis of approximately 100 naturally occurring conversational news deliveries reveals that both good and bad news are presented and received with characteristic prosodic features that are consistent with expression of joy and sorrow, respectively, as described in the existing literature on prosody. These prosodic features are systematically deployed in each of the four turns of the prototypical news delivery sequence. Proposals and ratifications of the valence of a delivery are often made prosodically in the initial turns of the prototypical four-turn news delivery, while lexical assessments of news are often made later. When prosody is used to propose the valence of an item of news, subsequent lexical assessments tend to be alignments with these earlier ascriptions of valence, rather than independent appraisals of the news. (Bad news, good news, conversation analysis, prosody, sequencing).
2

Jones, Harrison N. "Prosody in Parkinson's Disease." Perspectives on Neurophysiology and Neurogenic Speech and Language Disorders 19, no. 3 (October 2009): 77–82. http://dx.doi.org/10.1044/nnsld19.3.77.

Abstract:
Purpose: Prosodic abnormalities are commonly recognized to be present in the speech of individuals with Parkinson's disease (PD) and hypokinetic dysarthria. Emerging evidence also suggests that deficits in the receptive processing of prosody are present in individuals with PD. This paper reviews aspects of prosody in PD, including the perceptual and acoustic features and their effect on communication; receptive deficits in prosodic processing; and the effects of medical, surgical, and behavioral treatments on prosody. Methods: Published reports on the above listed aspects of prosody in PD are reviewed and reported. Results and Conclusions: The perceptual and acoustic characteristics of prosodic impairments in PD are well defined. Perceptually, the principal prosodic features include monopitch, reduced stress, monoloudness, and rate abnormalities. The most common acoustic findings are decreased variability of fundamental frequency (F0) and intensity. A growing literature also suggests that the basal ganglia are critical in receptively processing prosodic information, which is impaired in PD. The role of medical and surgical treatment of PD on speech prosody remains unclear but, overall, appears limited. Behavioral treatments for prosodic disturbance appear promising, though further study is required.
3

Corrales-Astorgano, Mario, Pastora Martínez-Castilla, David Escudero-Mancebo, Lourdes Aguilar, César González-Ferreras, and Valentín Cardeñoso-Payo. "Automatic Assessment of Prosodic Quality in Down Syndrome: Analysis of the Impact of Speaker Heterogeneity." Applied Sciences 9, no. 7 (April 5, 2019): 1440. http://dx.doi.org/10.3390/app9071440.

Abstract:
Prosody is a fundamental speech element responsible for communicative functions such as intonation, accent and phrasing, and prosodic impairments of individuals with intellectual disabilities reduce their communication skills. Yet, technological resources have paid little attention to prosody. This study aims to develop an automatic classifier to predict the prosodic quality of utterances produced by individuals with Down syndrome, and to analyse how inter-individual heterogeneity affects assessment results. A therapist and an expert in prosody judged the prosodic appropriateness of a corpus of utterances by individuals with Down syndrome, collected through a video game. The judgments of the expert were used to train an automatic classifier that predicts prosodic quality by using a set of fundamental frequency, duration and intensity features. The classifier accuracy was 79.3% and its true positive rate 89.9%. We analyzed how informative each of the features was for the assessment and studied relationships between participants’ developmental level and results: interspeaker variability conditioned the relative weight of prosodic features for automatic classification and participants’ developmental level was related to the prosodic quality of their productions. Therefore, since speaker variability is an intrinsic feature of individuals with Down syndrome, it should be considered to attain an effective automatic prosodic assessment system.
4

Shiamizadeh, Zohreh, Johanneke Caspers, and Niels O. Schiller. "Do Persian Native Speakers Prosodically Mark Wh-in-situ Questions?" Language and Speech 62, no. 2 (February 5, 2018): 229–49. http://dx.doi.org/10.1177/0023830917753237.

Abstract:
It has been shown that prosody contributes to the contrast between declarativity and interrogativity, notably in interrogative utterances lacking lexico-syntactic features of interrogativity. Accordingly, it may be proposed that prosody plays a role in marking wh-in-situ questions in which the interrogativity feature (the wh-phrase) does not move to sentence-initial position, as, for example, in Persian. This paper examines whether prosody distinguishes Persian wh-in-situ questions from declaratives in the absence of the interrogativity feature in the sentence-initial position. To answer this question, a production experiment was designed in which wh-questions and declaratives were elicited from Persian native speakers. On the basis of the results of previous studies, we hypothesize that prosodic features mark wh-in-situ questions as opposed to declaratives at both the local (pre- and post-wh part) and global level (complete sentence). The results of the current study confirm our hypothesis that prosodic correlates mark the pre-wh part as well as the complete sentence in wh-in-situ questions. The results support theoretical concepts such as the frequency code, the universal dichotomous association between relaxation and declarativity on the one hand and tension and interrogativity on the other, the relation between prosody and pragmatics, and the relation between prosody and encoding and decoding of sentence type.
5

Huhtamäki, Martina, Jan Lindström, and Anne-Marie Londen. "Other-repetition sequences in Finland Swedish: Prosody, grammar, and context in action ascription." Language in Society 49, no. 4 (March 11, 2020): 653–86. http://dx.doi.org/10.1017/s0047404520000056.

Abstract:
This study examines other-repetitions in Finland Swedish talk-in-interaction: their sequential trajectories, prosodic design, and lexicogrammatical features. The key objective is to explore how prosody can contribute to the action conveyed by a repetition turn, that is, whether it deals with a problem of hearing or understanding, a problem of expectation, or just registers receipt of information. The analysis shows that large and upgraded prosodic features (higher onset, wider pitch span than the previous turn) co-occur with repair- and expectation-oriented repetitions, whereas small, downgraded prosody (lower onset, narrower pitch span than the previous turn) is characteristic of registering. However, the distinguishing strength of prosody is mostly gradient (rather than discrete), and because of this, other concomitant cues, most notably the speakers’ epistemic positions in relation to the repeated item, are also of importance for ascribing a certain pragmatic function to a repetition. (Repetition, other-repetition, action ascription, prosody in conversation, repair, epistemics, conversation analysis, interactional linguistics, Finland Swedish)
6

Muñiz-Cachón, Carmen. "Prosody: A feature of languages or a feature of speakers?" Prosodic Issues in Language Contact Situations 16, no. 3 (December 31, 2019): 462–74. http://dx.doi.org/10.1075/sic.00047.mun.

Abstract:
Social situations of language coexistence have resulted in linguistic manifestations of bilingualism and diglossia, including linguistic interference, lexical loans and code switching. What role does prosody play in social bilingualism? In other words, when contact between different languages is not restricted to the individual but affects an entire speech community, does a dominant prosody exist? Does prosody vary among different linguistic varieties? In order to find an answer to these questions, we present the results of a research project on the prosodic features of Asturian and Castilian spoken in the centre of Asturias. This experimental study is based on the speech of four informants from Oviedo – two men and two women – two of whom speak Castilian, while the other two speak Asturian.
7

Ahrens, Barbara. "Prosodic phenomena in simultaneous interpreting." Interpreting. International Journal of Research and Practice in Interpreting 7, no. 1 (June 1, 2005): 51–76. http://dx.doi.org/10.1075/intp.7.1.04ahr.

Abstract:
This paper reports on an empirical study on prosody in English-German simultaneous interpreting. It discusses prosody with particular reference to its tonal, durational and dynamic features, such as intonation, pauses, rhythm and accent, as well as its main functions, i.e. structure and prominence. Following a review of previous studies on the topic, a conceptual approach for the analysis of prosody in terms of structure and prominence is developed and subsequently applied to an authentic corpus of professional simultaneous interpretation consisting of three German versions of a 72-minute English source text. Prosodic patterns in the corpus are analyzed by means of a computer-aided method using the software PRAAT. The findings confirm that prosodic features are interdependent and that those in the target texts show certain characteristics that are specific to simultaneous interpreting.
8

Mirzayeva, Intizar Kahraman. "The Scopes of Experimental-phonetic Analysis." Theory and Practice in Language Studies 6, no. 10 (October 1, 2016): 1912. http://dx.doi.org/10.17507/tpls.0610.03.

Abstract:
The article investigates the nature of prosodic features of speech, a problem that has interested linguists for many years. Prosodic features such as length, accent and stress, tone, intonation and others are analysed in the article. The article states that, from the beginning, the investigation of these features was based primarily on segments – vowels and consonants – and prosodic features were either ignored or forced into an inappropriate segmental mould. The author explains the meaning of the term ‘prosodic means’. She writes that the term is derived from the Greek ‘prosodia’, a musical term which appears to signify something like ‘song sung to music’ or ‘sung accompaniment’. This implies that prosody is the musical accompaniment to the words themselves. More recently, the term has come to cover such things as rhythmical patterns, rhyming schemes and verse structure. In linguistic contexts, however, it carries a different meaning, referring to characteristics of utterances such as stress and intonation.
9

Popovic, Branislav, Dragan Knezevic, Milan Secujski, and Darko Pekar. "Automatic prosody generation in a text-to-speech system for Hebrew." Facta universitatis - series: Electronics and Energetics 27, no. 3 (2014): 467–77. http://dx.doi.org/10.2298/fuee1403467p.

Abstract:
The paper presents the module for automatic prosody generation within a system for automatic synthesis of high-quality speech based on arbitrary text in Hebrew. The high quality of synthesis is due to the high accuracy of automatic prosody generation, enabling the introduction of elements of natural sentence prosody of Hebrew. Automatic morphological annotation of text is based on the application of an expert algorithm relying on transformational rules. Syntactic-prosodic parsing is also rule based, while the generation of the acoustic representation of prosodic features is based on classification and regression trees. A tree structure generated during the training phase enables accurate prediction of the acoustic representatives of prosody, namely, durations of phonetic segments as well as temporal evolution of fundamental frequency and energy. Such an approach to automatic prosody generation has led to an improvement in the quality of synthesized speech, as confirmed by listening tests.
10

Myers, Brett, Miriam Lense, and Reyna Gordon. "Pushing the Envelope: Developments in Neural Entrainment to Speech and the Biological Underpinnings of Prosody Perception." Brain Sciences 9, no. 3 (March 22, 2019): 70. http://dx.doi.org/10.3390/brainsci9030070.

Abstract:
Prosodic cues in speech are indispensable for comprehending a speaker’s message, recognizing emphasis and emotion, parsing segmental units, and disambiguating syntactic structures. While it is commonly accepted that prosody provides a fundamental service to higher-level features of speech, the neural underpinnings of prosody processing are not clearly defined in the cognitive neuroscience literature. Many recent electrophysiological studies have examined speech comprehension by measuring neural entrainment to the speech amplitude envelope, using a variety of methods including phase-locking algorithms and stimulus reconstruction. Here we review recent evidence for neural tracking of the speech envelope and demonstrate the importance of prosodic contributions to the neural tracking of speech. Prosodic cues may offer a foundation for supporting neural synchronization to the speech envelope, which scaffolds linguistic processing. We argue that prosody has an inherent role in speech perception, and future research should fill the gap in our knowledge of how prosody contributes to speech envelope entrainment.

Dissertations / Theses on the topic "Prosody features"

1

Brierley, Claire. "Prosody resources and symbolic prosodic features for automated phrase break prediction." Thesis, University of Leeds, 2011. http://etheses.whiterose.ac.uk/2038/.

Abstract:
It is universally recognised that humans process speech and language in chunks, each meaningful in itself. Any two renditions or assimilations of a given sentence will exhibit similarities and discrepancies in chunking, where speakers and readers use pauses and inflections to mark phrase breaks. This thesis reviews deterministic and stochastic approaches to phrase break prediction, plus datasets, evaluation metrics and feature sets. Early rule-based experimental work with a chunk parser gives rise to motivational insights, namely: the limitations of traditional features (syntax and punctuation) and deficiency of prosody in current phrasing models, and the problem of evaluating performance when the training set only represents one phrasing variant. Such insights inform resource creation in the form of ProPOSEL, a prosody and part-of-speech English lexicon, to create a domain-independent knowledge source, plus prosodic annotation and text analytics tool for corpus-based research, supported by a comprehensive software tutorial. Future applications of ProPOSEL include prosody-motivated speech-to-viseme generation for "talking heads" and expressive avatar creation. Here, ProPOSEL is used to build the ProPOSEC dataset by merging and annotating two versions of the Spoken English Corpus. Linguistic data arrays in this dataset are first mined for prosodic boundary correlates and later re-conceptualised as training instances for supervised machine learning. This thesis contends that native English speakers use certain sound patterns (e.g. diphthongs and triphthongs) as linguistic signs for phrase breaks, having observed these same patterns at rhythmic junctures in poetry. Pre-boundary lexical items bearing these complex vowels and gold-standard boundary annotations are found to be highly correlated via the chi-squared statistic in different genres, including seventeenth century English verse, and for multiple speakers. 
Complex vowels and other symbolic prosodic features are then implemented in a phrasing model to evaluate efficacy for phrase break prediction. The ultimate challenge is to better understand how sound and rhythm, as components of the linguistic sign, inform psycholinguistic chunking even during silent reading.
2

Väyrynen, E. (Eero). "Emotion recognition from speech using prosodic features." Doctoral thesis, Oulun yliopisto, 2014. http://urn.fi/urn:isbn:9789526204048.

Abstract:
Emotion recognition, a key step of affective computing, is the process of decoding an embedded emotional message from human communication signals, e.g. visual, audio, and/or other physiological cues. It is well-known that speech is the main channel for human communication and thus vital in the signalling of emotion and semantic cues for the correct interpretation of contexts. In the verbal channel, the emotional content is largely conveyed as constant paralinguistic information signals, of which prosody is the most important component. The lack of evaluation of affect and emotional states in human machine interaction is, however, currently limiting the potential behaviour and user experience of technological devices. In this thesis, speech prosody and related acoustic features of speech are used for the recognition of emotion from spoken Finnish. More specifically, methods for emotion recognition from speech relying on long-term global prosodic parameters are developed. An information fusion method is developed for short segment emotion recognition using local prosodic features and vocal source features. A framework for emotional speech data visualisation is presented for prosodic features. Emotion recognition in Finnish comparable to the human reference is demonstrated using a small set of basic emotional categories (neutral, sad, happy, and angry). A recognition rate for Finnish was found comparable with those reported in the western language groups. Increased emotion recognition is shown for short segment emotion recognition using fusion techniques. Visualisation of emotional data congruent with the dimensional models of emotion is demonstrated utilising supervised nonlinear manifold modelling techniques. The low dimensional visualisation of emotion is shown to retain the topological structure of the emotional categories, as well as the emotional intensity of speech samples.
The thesis provides pattern recognition methods and technology for the recognition of emotion from speech using long speech samples, as well as short stressed words. The framework for the visualisation and classification of emotional speech data developed here can also be used to represent speech data from other semantic viewpoints by using alternative semantic labellings if available.
Tiivistelmä (Finnish abstract, in English translation): Emotion recognition is a key area of affective computing. It aims to determine the emotional messages contained in human communication, e.g. by means of visual, auditory, and/or physiological cues. Speech is the most important way for humans to communicate and therefore plays a vital role in the correct semantic and emotional interpretation of communication. In speech, emotional information is conveyed largely as continuous paralinguistic communication, the most important component of which is prosody. The lack of this affective and emotional interpretation in human-machine interaction, however, still limits the functioning of technological devices and their user experience. In this doctoral thesis, prosodic and acoustic features of speech are used to recognise the emotional content of spoken Finnish. Emotion recognition methods based on the prosodic features of long speech samples are developed. For recognising the emotional content of short speech segments, a method based on information fusion is developed, using a combination of prosodic features and qualitative features of the voice source. In addition, a technological framework is developed for visualising emotional speech by means of prosodic features. The study achieved a level of automatic emotion recognition comparable to human recognition ability when using a small set of basic emotions (neutral, sad, happy, and angry). Emotion recognition performance for spoken Finnish was found to be comparable with that for Western European languages. In recognising the emotional content of short speech segments, better performance was achieved when using the fusion method. The supervised nonlinear manifold modelling technique developed for visualising emotional speech was able to produce, for the data, a visual structure resembling the dimensional model of emotion.
The low-dimensional representation could further be shown to preserve the topological structures of both the emotional classes and the emotional intensity of the research data. This thesis developed technology based on pattern recognition methods for recognising emotional speech using both long and short speech samples. The technological framework developed for visualising and classifying emotional data can also be used to represent speech data according to other semantic structures.
3

Rask, Linnea. "Prosodic Features in Child-directed Speech during the Child's First Year." Thesis, Stockholms universitet, Avdelningen för fonetik, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-118382.

Abstract:
This study investigates prosodic features of child-directed speech during the child’s first year, using the automated prosodic annotation software Prosogram. From previous studies on first language acquisition and child-directed speech we know that speech directed to infants and small children is characterised by exaggerated use of several prosodic features, including a higher pitch, livelier pitch movement and slower speech rate. Annotation of these phenomena has previously been done manually, which is time consuming and includes a risk of circularity. If we can use semi-automated systems to carry out this task, it would be a huge methodological gain. This study analysed recordings of 10 parent-child pairs on four occasions (3, 6, 9 and 12 months of age) for a total of 40 recordings. The audio files were analysed in Prosogram in order to detect possible differences depending on the child’s age. The results showed a noticeable change in child-directed speech over the first year of the child’s life. A change in several characteristic prosodic features was noted to occur between the ages of 6 and 9 months. Pitch levels decreased, and articulation rate increased. Additionally, parents seemed to use pitch values much higher than their mean pitch when speaking to children aged 3 and 6 months than to children aged 9 and 12 months. Despite using a relatively small sample, the results show several interesting trends in the usage of child-directed speech. Furthermore, this study shows that Prosogram is a useful tool for automatic analysis of child-directed speech.
4

Zelenák, Martin. "Detection and handling of overlapping speech for speaker diarization." Doctoral thesis, Universitat Politècnica de Catalunya, 2012. http://hdl.handle.net/10803/72431.

Abstract:
For the last several years, speaker diarization has been attracting substantial research attention as one of the spoken language technologies applied for the improvement, or enrichment, of recording transcriptions. Recordings of meetings, compared to other domains, exhibit an increased complexity due to the spontaneity of speech, reverberation effects, and also due to the presence of overlapping speech. Overlapping speech refers to situations when two or more speakers are speaking simultaneously. In meeting data, a substantial portion of errors of the conventional speaker diarization systems can be ascribed to speaker overlaps, since usually only one speaker label is assigned per segment. Furthermore, simultaneous speech included in training data can eventually lead to corrupt single-speaker models and thus to a worse segmentation. This thesis concerns the detection of overlapping speech segments and its further application for the improvement of speaker diarization performance. We propose the use of three spatial cross-correlation-based parameters for overlap detection on distant microphone channel data. Spatial features from different microphone pairs are fused by means of principal component analysis, linear discriminant analysis, or by a multi-layer perceptron. In addition, we also investigate the possibility of employing long-term prosodic information. The most suitable subset from a set of candidate prosodic features is determined in two steps. Firstly, a ranking according to mRMR criterion is obtained, and then, a standard hill-climbing wrapper approach is applied in order to determine the optimal number of features. The novel spatial as well as prosodic parameters are used in combination with spectral-based features suggested previously in the literature. In experiments conducted on AMI meeting data, we show that the newly proposed features do contribute to the detection of overlapping speech, especially on data originating from a single recording site.
In speaker diarization, for segments including detected speaker overlap, a second speaker label is picked, and such segments are also discarded from the model training. The proposed overlap labeling technique is integrated in Viterbi decoding, a part of the diarization algorithm. During the system development it was discovered that it is favorable to do an independent optimization of overlap exclusion and labeling with respect to the overlap detection system. We report improvements over the baseline diarization system on both single- and multi-site AMI data. Preliminary experiments with NIST RT data show DER improvement on the RT '09 meeting recordings as well. The addition of beamforming and TDOA feature stream into the baseline diarization system, which was aimed at improving the clustering process, results in somewhat higher effectiveness of the overlap labeling algorithm. A more detailed analysis of the overlap exclusion behavior reveals large contrasts in improvement between individual meeting recordings as well as between various settings of the overlap detection operation point. However, a high performance variability across different recordings is also typical of the baseline diarization system, without any overlap handling.
5

Fonseca, De Sam Bento Ribeiro Manuel. "Suprasegmental representations for the modeling of fundamental frequency in statistical parametric speech synthesis." Thesis, University of Edinburgh, 2018. http://hdl.handle.net/1842/31338.

Abstract:
Statistical parametric speech synthesis (SPSS) has seen improvements over recent years, especially in terms of intelligibility. Synthetic speech is often clear and understandable, but it can also be bland and monotonous. Proper generation of natural speech prosody is still a largely unsolved problem. This is relevant especially in the context of expressive audiobook speech synthesis, where speech is expected to be fluid and captivating. In general, prosody can be seen as a layer that is superimposed on the segmental (phone) sequence. Listeners can perceive the same melody or rhythm in different utterances, and the same segmental sequence can be uttered with a different prosodic layer to convey a different message. For this reason, prosody is commonly accepted to be inherently suprasegmental. It is governed by longer units within the utterance (e.g. syllables, words, phrases) and beyond the utterance (e.g. discourse). However, common techniques for the modeling of speech prosody - and speech in general - operate mainly on very short intervals, either at the state or frame level, in both hidden Markov model (HMM) and deep neural network (DNN) based speech synthesis. This thesis presents contributions supporting the claim that stronger representations of suprasegmental variation are essential for the natural generation of fundamental frequency for statistical parametric speech synthesis. We conceptualize the problem by dividing it into three sub-problems: (1) representations of acoustic signals, (2) representations of linguistic contexts, and (3) the mapping of one representation to another. The contributions of this thesis provide novel methods and insights relating to these three sub-problems. In terms of sub-problem 1, we propose a multi-level representation of f0 using the continuous wavelet transform and the discrete cosine transform, as well as a wavelet-based decomposition strategy that is linguistically and perceptually motivated. 
In terms of sub-problem 2, we investigate additional linguistic features such as text-derived word embeddings and syllable bag-of-phones and we propose a novel method for learning word vector representations based on acoustic counts. Finally, considering sub-problem 3, insights are given regarding hierarchical models such as parallel and cascaded deep neural networks.
6

Gangireddy, Siva Reddy. "Recurrent neural network language models for automatic speech recognition." Thesis, University of Edinburgh, 2017. http://hdl.handle.net/1842/28990.

Abstract:
The goal of this thesis is to advance the use of recurrent neural network language models (RNNLMs) for large vocabulary continuous speech recognition (LVCSR). RNNLMs are currently state-of-the-art and shown to consistently reduce the word error rates (WERs) of LVCSR tasks when compared to other language models. In this thesis we propose various advances to RNNLMs. The advances are: improved learning procedures for RNNLMs, enhancing the context, and adaptation of RNNLMs. We learned better parameters by a novel pre-training approach and enhanced the context using prosody and syntactic features. We present a pre-training method for RNNLMs, in which the output weights of a feed-forward neural network language model (NNLM) are shared with the RNNLM. This is accomplished by first fine-tuning the weights of the NNLM, which are then used to initialise the output weights of an RNNLM with the same number of hidden units. To investigate the effectiveness of the proposed pre-training method, we have carried out text-based experiments on the Penn Treebank Wall Street Journal data, and ASR experiments on the TED lectures data. Across the experiments, we observe small but significant improvements in perplexity (PPL) and ASR WER. Next, we present unsupervised adaptation of RNNLMs. We adapted the RNNLMs to a target domain (topic or genre or television programme (show)) at test time using ASR transcripts from first pass recognition. We investigated two approaches to adapt the RNNLMs. In the first approach the forward propagating hidden activations are scaled - learning hidden unit contributions (LHUC). In the second approach we adapt all parameters of the RNNLM. We evaluated the adapted RNNLMs by showing the WERs on multi genre broadcast speech data. We observe small (on average 0.1% absolute) but significant improvements in WER compared to a strong unadapted RNNLM model. Finally, we present the context-enhancement of RNNLMs using prosody and syntactic features.
The prosody features were computed from the acoustics of the context words and the syntactic features were from the surface form of the words in the context. We trained the RNNLMs with word duration, pause duration, final phone duration, syllable duration, syllable F0, part-of-speech tag and Combinatory Categorial Grammar (CCG) supertag features. The proposed context-enhanced RNNLMs were evaluated by reporting PPL and WER on two speech recognition tasks, Switchboard and TED lectures. We observed substantial improvements in PPL (5% to 15% relative) and small but significant improvements in WER (0.1% to 0.5% absolute).
7

Oliveira, Miguel. "Prosodic features in spontaneous narratives." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2000. http://www.collectionscanada.ca/obj/s4/f2/dsk2/ftp02/NQ61670.pdf.

8

Iliev, Alexander Iliev. "Emotion Recognition Using Glottal and Prosodic Features." Scholarly Repository, 2009. http://scholarlyrepository.miami.edu/oa_dissertations/515.

Abstract:
Emotion conveys the psychological state of a person. It is expressed by a variety of physiological changes, such as changes in blood pressure, heart beat rate, degree of sweating, and can be manifested in shaking, changes in skin coloration, facial expression, and the acoustics of speech. This research focuses on the recognition of emotion conveyed in speech. There were three main objectives of this study. One was to examine the role played by the glottal source signal in the expression of emotional speech. The second was to investigate whether it can provide improved robustness in real-world situations and in noisy environments. This was achieved through testing in clear and various noisy conditions. Finally, the performance of glottal features was compared to diverse existing and newly introduced emotional feature domains. A novel glottal symmetry feature is proposed and automatically extracted from speech. The effectiveness of several inverse filtering methods in extracting the glottal signal from speech has been examined. Other than the glottal symmetry, two additional feature classes were tested for emotion recognition. They are the Tones and Break Indices (ToBI) of American English intonation, and the Mel Frequency Cepstral Coefficients (MFCC) of the glottal signal. Three corpora were specifically designed for the task. The first two investigated the four emotions Happy, Angry, Sad, and Neutral, and the third added Fear and Surprise in a six-emotion recognition task. This work shows that the glottal signal carries valuable emotional information and that using it for emotion recognition has many advantages over other conventional methods. For clean speech, in a four-emotion recognition task, classical prosodic features achieved 89.67% recognition, ToBI combined with classical features reached 84.75% recognition, and glottal symmetry alone achieved 98.74%.
For the six-emotion task these three methods achieved 79.62%, 90.39% and 85.37% recognition rates, respectively. Using the glottal signal also provided greater classifier robustness under noisy conditions and distortion caused by low pass filtering. Specifically, for additive white Gaussian noise at SNR = 10 dB in the six-emotion task, the classical features and the classical features with ToBI both failed to provide successful results; speech MFCCs achieved a recognition rate of 41.43% and glottal symmetry reached 59.29%. This work has shown that the glottal signal, and the glottal symmetry in particular, provides high class separation for both the four- and six-emotion cases. It confidently surpasses the performance of all other features included in this investigation in noisy speech conditions and in most clean signal conditions.
9

Chan, Oscar. "Prosodic features for a maximum entropy language model." University of Western Australia. School of Electrical, Electronic and Computer Engineering, 2008. http://theses.library.uwa.edu.au/adt-WU2008.0244.

Abstract:
A statistical language model attempts to characterise the patterns present in a natural language as a probability distribution defined over word sequences. Typically, they are trained using word co-occurrence statistics from a large sample of text. In some language modelling applications, such as automatic speech recognition (ASR), the availability of acoustic data provides an additional source of knowledge. This contains, amongst other things, the melodic and rhythmic aspects of speech referred to as prosody. Although prosody has been found to be an important factor in human speech recognition, its use in ASR has been limited. The goal of this research is to investigate how prosodic information can be employed to improve the language modelling component of a continuous speech recognition system. Because prosodic features are largely suprasegmental, operating over units larger than the phonetic segment, the language model is an appropriate place to incorporate such information. The prosodic features and standard language model features are combined under the maximum entropy framework, which provides an elegant solution to modelling information obtained from multiple, differing knowledge sources. We derive features for the model based on perceptually transcribed Tones and Break Indices (ToBI) labels, and analyse their contribution to the word recognition task. While ToBI has a solid foundation in linguistic theory, the need for human transcribers conflicts with the statistical model's requirement for a large quantity of training data. We therefore also examine the applicability of features which can be automatically extracted from the speech signal. We develop representations of an utterance's prosodic context using fundamental frequency, energy and duration features, which can be directly incorporated into the model without the need for manual labelling. 
Dimensionality reduction techniques are also explored with the aim of reducing the computational costs associated with training a maximum entropy model. Experiments on a prosodically transcribed corpus show that small but statistically significant reductions in perplexity and word error rate can be obtained by using both manually transcribed and automatically extracted features.
10

Bryant, Gregory Alan. "Prosodic features of verbal irony in spontaneous speech." Diss., Digital Dissertations Database. Restricted to UC campuses, 2004. http://uclibs.org/PID/11984.


Books on the topic "Prosody features"

1

Fox, Anthony. Prosodic features and prosodic structure: The phonology of suprasegmentals. Oxford: Oxford University Press, 2000.

2

Rao, K. Sreenivasa, V. Ramu Reddy, and Sudhamay Maity. Language Identification Using Spectral and Prosodic Features. Cham: Springer International Publishing, 2015. http://dx.doi.org/10.1007/978-3-319-17163-0.

3

Rao, K. Sreenivasa, and Shashidhar G. Koolagudi. Robust Emotion Recognition using Spectral and Prosodic Features. New York, NY: Springer New York, 2013. http://dx.doi.org/10.1007/978-1-4614-6360-3.

4

Rao, K. Sreenivasa. Robust Emotion Recognition using Spectral and Prosodic Features. New York, NY: Springer New York, 2013.

5

Obeng, Samuel Gyasi. Conversational strategies in Akan: Prosodic features and discourse categories. Köln: Köppe, 1999.

6

Oostendorp, Marc van. Phonological projection: A theory of feature content and prosodic structure. Berlin: Mouton de Gruyter, 2000.

7

Allen, W. Sidney. Accent and rhythm: Prosodic features of Latin and Greek: A study in theory and reconstruction. Cambridge: Cambridge University Press, 2009.

8

Zhi, Minje. Studies on prosodic features of Korean: Phonetic properties of quantity in Seoul and tone in Busan. Umeå: Department of Phonetics, 1985.

9

Silva, Viola De. Quantity and quality as universal and specific features of sound systems: Experimental phonetic research on interaction of Russian and Finnish sound systems. Jyväskylä: University of Jyväskylä, 1999.

10

Geluykens, Ronald. "Questioning intonation": An empirical study into the prosodic feature "rising intonation" and its relevance for the production and recognition of questions. Wilrijk, Belgium: Universiteit Antwerpen, Universitaire Instelling Antwerpen, Departement Germaanse, Afdeling Linguïstiek, 1986.


Book chapters on the topic "Prosody features"

1

Hirose, Keikichi. "Disambiguating Recognition Results by Prosodic Features." In Computing Prosody, 327–42. New York, NY: Springer US, 1997. http://dx.doi.org/10.1007/978-1-4612-2258-3_21.

2

Nakajima, Shin’ya, and Hajime Tsukada. "Prosodic Features of Utterances in Task-Oriented Dialogues." In Computing Prosody, 81–93. New York, NY: Springer US, 1997. http://dx.doi.org/10.1007/978-1-4612-2258-3_7.

3

Sato, Hirokazu. "Interaction between Phonetic Features and Accent-Placement in Japanese Family Names." In Prosody and Syntax, 223. Amsterdam: John Benjamins Publishing Company, 2006. http://dx.doi.org/10.1075/ubli.3.13sat.

4

Kang, Okim, David O. Johnson, and Alyssa Kermad. "Computerized Systems for Measuring Suprasegmental Features." In Second Language Prosody and Computer Modeling, 87–120. New York: Routledge, 2021. http://dx.doi.org/10.4324/9781003022695-8.

5

Ding, Hongwei, and Rüdiger Hoffmann. "An Investigation of Prosodic Features in the German Speech of Chinese Speakers." In Prosody and Language in Contact, 221–41. Berlin, Heidelberg: Springer Berlin Heidelberg, 2015. http://dx.doi.org/10.1007/978-3-662-45168-7_11.

6

Rajendran, Vaibhavi, and G. Bharadwaja Kumar. "Prosody Detection from Text Using Aggregative Linguistic Features." In Communications in Computer and Information Science, 736–49. Singapore: Springer Singapore, 2018. http://dx.doi.org/10.1007/978-981-10-8657-1_57.

7

Anne, Koteswara Rao, Swarna Kuchibhotla, and Hima Deepthi Vankayalapati. "Emotion Recognition Using Prosodic Features." In SpringerBriefs in Electrical and Computer Engineering, 7–15. Cham: Springer International Publishing, 2015. http://dx.doi.org/10.1007/978-3-319-15530-2_2.

8

Fishman, Ben, Itshak Lapidot, and Irit Opher. "Prosodic Features’ Criterion for Hebrew." In Text, Speech, and Dialogue, 482–91. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-030-00794-2_52.

9

Rao, K. Sreenivasa, V. Ramu Reddy, and Sudhamay Maity. "Language Identification Using Prosodic Features." In SpringerBriefs in Electrical and Computer Engineering, 55–81. Cham: Springer International Publishing, 2015. http://dx.doi.org/10.1007/978-3-319-17163-0_4.

10

Mary, Leena. "Prosodic Features for Speaker Recognition." In Forensic Speaker Recognition, 365–88. New York, NY: Springer New York, 2011. http://dx.doi.org/10.1007/978-1-4614-0263-3_13.


Conference papers on the topic "Prosody features"

1

Dominguez, Mónica, Mireia Farrús, and Leo Wanner. "Combining acoustic and linguistic features in phrase-oriented prosody prediction." In Speech Prosody 2016. ISCA, 2016. http://dx.doi.org/10.21437/speechprosody.2016-163.

2

Magnier, Julien, Maya Gratier, and Anne Lacheret. "Expressive prosody vs neutral prosody: From descriptive binary to continuous features." In 7th International Conference on Speech Prosody 2014. ISCA, 2014. http://dx.doi.org/10.21437/speechprosody.2014-16.

3

Abburi, Harika, K. N. R. K. Raju Alluri, Anil Kumar Vuppala, Manish Shrivastava, and Suryakanth V. Gangashetty. "Sentiment analysis using relative prosody features." In 2017 Tenth International Conference on Contemporary Computing (IC3). IEEE, 2017. http://dx.doi.org/10.1109/ic3.2017.8284296.

4

Le Maguer, Sébastien, Bernd Möbius, and Ingmar Steiner. "Toward the use of information density based descriptive features in HMM based speech synthesis." In Speech Prosody 2016. ISCA, 2016. http://dx.doi.org/10.21437/speechprosody.2016-211.

5

Zhao, Yi, Chuang Ding, Nobuaki Minematsu, and Daisuke Saito. "A study on BLSTM-RNN-based Chinese prosodic structure prediction in a unified framework with character-level features." In Speech Prosody 2016. ISCA, 2016. http://dx.doi.org/10.21437/speechprosody.2016-14.

6

Pistor, Tillmann, and Carsten Keil. "VJ.PEAT: Automated measurement of prosodic features." In 9th International Conference on Speech Prosody 2018. ISCA, 2018. http://dx.doi.org/10.21437/speechprosody.2018-115.

7

Virkkunen, Päivi, Juraj Šimko, Heini Kallio, and Martti Vainio. "Prosodic features of Finnish compound words." In 9th International Conference on Speech Prosody 2018. ISCA, 2018. http://dx.doi.org/10.21437/speechprosody.2018-177.

8

Chen, Yue, and Yi Xu. "Intermediate features are not useful for tone perception." In 10th International Conference on Speech Prosody 2020. ISCA, 2020. http://dx.doi.org/10.21437/speechprosody.2020-105.

9

Sloan, Rose, Syed Sarfaraz Akhtar, Bryan Li, Ritvik Shrivastava, Agustin Gravano, and Julia Hirschberg. "Prosody Prediction from Syntactic, Lexical, and Word Embedding Features." In 10th ISCA Speech Synthesis Workshop. ISCA, 2019. http://dx.doi.org/10.21437/ssw.2019-48.

10

Hung, Yu-Ping, Han-Yun Yeh, I.-Bin Liao, Chen-Ming Pan, and Chen-Yu Chiang. "An investigation on linguistic features for Mandarin prosody generation." In 2014 17th Oriental Chapter of the International Committee for the Co-ordination and Standardization of Speech Databases and Assessment Techniques (COCOSDA). IEEE, 2014. http://dx.doi.org/10.1109/icsda.2014.7051426.


Reports on the topic "Prosody features"

1

Clemens, Denise. A study of the capability of the computerized Visi-Pitch when investigating prosodic features of motherese. Portland State University Library, January 2000. http://dx.doi.org/10.15760/etd.5627.
