Journal articles on the topic "Emotional speech database"

To see other types of publications on this topic, follow the link: Emotional speech database.

Format your source according to APA, MLA, Chicago, Harvard, and other citation styles

Consult the top 50 journal articles for research on the topic "Emotional speech database".

Next to every work in the list you will find an "Add to bibliography" button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the scholarly publication as a .pdf file and read its abstract online whenever these are available in the metadata.

Browse journal articles from a wide range of disciplines and compile your bibliography correctly.

1

Tank, Vishal P., and S. K. Hadia. "Creation of speech corpus for emotion analysis in Gujarati language and its evaluation by various speech parameters." International Journal of Electrical and Computer Engineering (IJECE) 10, no. 5 (October 1, 2020): 4752. http://dx.doi.org/10.11591/ijece.v10i5.pp4752-4758.

Abstract:
In the last couple of years, emotion recognition has proven its significance in the areas of artificial intelligence and man-machine communication. Emotion recognition can be performed from speech or from images (facial expressions); this paper deals with SER (speech emotion recognition) only. An emotional speech database is essential for emotion recognition. In this paper we propose an emotional database developed in Gujarati, one of the official languages of India. The proposed speech corpus distinguishes six emotional states: sadness, surprise, anger, disgust, fear, and happiness. To observe the effect of the different emotions, the proposed Gujarati speech database is analysed using effective speech parameters such as pitch, energy, and MFCC in MATLAB.
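The parameters this abstract names (pitch, energy, MFCC) can be reproduced outside MATLAB. Below is a minimal Python sketch using librosa; the 16 kHz WAV file name is hypothetical, not a file from the authors' Gujarati corpus.

```python
# Hedged sketch: extracting pitch, short-time energy, and MFCCs with librosa.
import librosa
import numpy as np

y, sr = librosa.load("gujarati_angry_01.wav", sr=16000)   # hypothetical input file

# Short-time energy per 25 ms frame (10 ms hop)
frames = librosa.util.frame(y, frame_length=400, hop_length=160)
energy = np.sum(frames ** 2, axis=0)

# Fundamental frequency (pitch) with the probabilistic YIN estimator
f0, voiced_flag, voiced_prob = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
)

# 13 Mel-frequency cepstral coefficients
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, n_fft=400, hop_length=160)

print(energy.shape, np.nanmean(f0), mfcc.shape)
```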
2

Byun, Sung-Woo, and Seok-Pil Lee. "A Study on a Speech Emotion Recognition System with Effective Acoustic Features Using Deep Learning Algorithms." Applied Sciences 11, no. 4 (February 21, 2021): 1890. http://dx.doi.org/10.3390/app11041890.

Abstract:
The goal of the human interface is to recognize the user's emotional state precisely. In the speech emotion recognition study, the most important issue is the effective parallel use of the extraction of proper speech features and an appropriate classification engine. Well-defined speech databases are also needed to accurately recognize and analyze emotions from speech signals. In this work, we constructed a Korean emotional speech database for speech emotion analysis and proposed a feature combination that can improve emotion recognition performance using a recurrent neural network model. To investigate the acoustic features that can reflect distinct momentary changes in emotional expression, we extracted F0, Mel-frequency cepstrum coefficients, spectral features, harmonic features, and others. Statistical analysis was performed to select an optimal combination of acoustic features that affect the emotion from speech. We used a recurrent neural network model to classify emotions from speech. The results show that the proposed system achieves more accurate performance than previous studies.
3

손남호, Hwang Hyosung, and Ho-Young Lee. "Emotional Speech Database and the Acoustic Analysis of Emotional Speech." EONEOHAG ll, no. 72 (August 2015): 175–99. http://dx.doi.org/10.17290/jlsk.2015..72.175.

4

Vicsi, Klára, and Dávid Sztahó. "Recognition of Emotions on the Basis of Different Levels of Speech Segments." Journal of Advanced Computational Intelligence and Intelligent Informatics 16, no. 2 (March 20, 2012): 335–40. http://dx.doi.org/10.20965/jaciii.2012.p0335.

Abstract:
Emotions play a very important role in human-human and human-machine communication. They can be expressed by voice, bodily gestures, and facial movements. People's acceptance of any kind of intelligent device depends, to a large extent, on how the device reflects emotions. This is the reason why automatic emotion recognition is a recent research topic. In this paper we deal with automatic emotion recognition from the human voice. Numerous papers in this field deal with database creation and with the examination of acoustic features appropriate for such recognition, but only a few attempts have been made to compare the different emotional segmentation units that are needed to recognize emotions in spontaneous speech properly. In the Laboratory of Speech Acoustics, experiments were run to examine the effect of diverse speech segment lengths on recognition performance. An emotional database was prepared on the basis of three different segmentation levels: word, intonational phrase, and sentence. Automatic recognition tests were conducted using support vector machines with four basic emotions: neutral, anger, sadness, and joy. The analysis of the results clearly shows that intonational-phrase-sized speech units give the best performance for emotion recognition in continuous speech.
5

Quan, Changqin, Bin Zhang, Xiao Sun, and Fuji Ren. "A combined cepstral distance method for emotional speech recognition." International Journal of Advanced Robotic Systems 14, no. 4 (July 1, 2017): 172988141771983. http://dx.doi.org/10.1177/1729881417719836.

Abstract:
Affective computing is not only a direction of reform in artificial intelligence but also an exemplification of advanced intelligent machines. Emotion is the biggest difference between humans and machines; if a machine behaves with emotion, it will be accepted by more people. Voice is the most natural, easily understood, and widely accepted medium of daily communication. The recognition of emotional voice is an important field of artificial intelligence. In emotion recognition, however, certain pairs of emotions are particularly vulnerable to confusion. This article presents a combined cepstral distance method in two-group multi-class emotion classification for emotional speech recognition. Cepstral distance combined with speech energy is widely used for speech-signal endpoint detection in speech recognition. In this work, the cepstral distance is used to measure the similarity between frames in emotional signals and in neutral signals. These features are input to directed acyclic graph support vector machine classification. Finally, a two-group classification strategy is adopted to resolve confusion in multi-emotion recognition. In the experiments, a Chinese Mandarin emotion database is used, and a large training set (1134 + 378 utterances) ensures powerful modelling capability for predicting emotion. The experimental results show that the cepstral distance increases the recognition rate for the sad emotion and balances the recognition results while eliminating overfitting. For the German Berlin emotional speech database, the recognition rate between sad and boring, which are very difficult to distinguish, reaches 95.45%.
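As a rough illustration of the frame-level cepstral distance idea described above, the sketch below measures the mean Euclidean distance between MFCC frames of an emotional utterance and a neutral reference. The file names are hypothetical, and the paper's exact cepstral representation and frame-pairing scheme may differ.

```python
# Hedged sketch of a frame-level cepstral distance, assuming MFCCs as the cepstrum.
import numpy as np
import librosa

def cepstral_distance(path_a: str, path_b: str, n_mfcc: int = 13) -> float:
    """Mean Euclidean distance between time-aligned MFCC frames of two signals."""
    feats = []
    for path in (path_a, path_b):
        y, sr = librosa.load(path, sr=16000)
        feats.append(librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc))
    n = min(feats[0].shape[1], feats[1].shape[1])      # truncate to the common length
    diff = feats[0][:, :n] - feats[1][:, :n]
    return float(np.mean(np.linalg.norm(diff, axis=0)))

# Hypothetical usage: distance of an emotional utterance from a neutral reference
print(cepstral_distance("sad_utterance.wav", "neutral_reference.wav"))
```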
6

Shahin, Ismail. "Employing Emotion Cues to Verify Speakers in Emotional Talking Environments." Journal of Intelligent Systems 25, no. 1 (January 1, 2016): 3–17. http://dx.doi.org/10.1515/jisys-2014-0118.

Abstract:
Usually, people talk neutrally in environments where there are no abnormal talking conditions such as stress and emotion. Other emotional conditions that might affect people's talking tone include happiness, anger, and sadness. Such emotions are directly affected by the patient's health status. In neutral talking environments, speakers can be easily verified; however, in emotional talking environments, speakers cannot be verified as easily as in neutral talking ones. Consequently, speaker verification systems do not perform as well in emotional talking environments as they do in neutral talking environments. In this work, a two-stage approach has been employed and evaluated to improve speaker verification performance in emotional talking environments. This approach employs the speaker's emotion cues (a text-independent and emotion-dependent speaker verification problem) based on both hidden Markov models (HMMs) and suprasegmental HMMs as classifiers. The approach is composed of two cascaded stages that combine and integrate an emotion recognizer and a speaker recognizer into one recognizer. The architecture has been tested on two different and separate emotional speech databases: our collected database and the Emotional Prosody Speech and Transcripts database. The results of this work show that the proposed approach gives promising results, with a significant improvement over previous studies and other approaches such as the emotion-independent speaker verification approach and the emotion-dependent speaker verification approach based completely on HMMs.
7

Caballero-Morales, Santiago-Omar. "Recognition of Emotions in Mexican Spanish Speech: An Approach Based on Acoustic Modelling of Emotion-Specific Vowels." Scientific World Journal 2013 (2013): 1–13. http://dx.doi.org/10.1155/2013/162093.

Abstract:
An approach for the recognition of emotions in speech is presented. The target language is Mexican Spanish, and for this purpose a speech database was created. The approach consists of phoneme-level acoustic modelling of emotion-specific vowels. For this, a standard phoneme-based Automatic Speech Recognition (ASR) system was built with Hidden Markov Models (HMMs), where different phoneme HMMs were built for the consonants and for the emotion-specific vowels associated with four emotional states (anger, happiness, neutral, sadness). Then, the emotional state of a spoken sentence is estimated by counting the number of emotion-specific vowels found in the ASR's output for the sentence. With this approach, an accuracy of 87–100% was achieved for the recognition of the emotional state of Mexican Spanish speech.
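The counting step described in this abstract can be illustrated in a few lines of Python. The emotion-tagged phoneme labels below (e.g. "a_anger") are hypothetical stand-ins, since the abstract does not specify the ASR output format.

```python
# Minimal sketch: pick the emotion whose emotion-specific vowels occur most often
# in a (hypothetical) emotion-tagged ASR transcription.
from collections import Counter

EMOTIONS = ("anger", "happiness", "neutral", "sadness")

def estimate_emotion(asr_phonemes: list[str]) -> str:
    counts = Counter()
    for ph in asr_phonemes:
        for emo in EMOTIONS:
            if ph.endswith("_" + emo):        # e.g. "a_anger", "e_sadness"
                counts[emo] += 1
    return counts.most_common(1)[0][0] if counts else "neutral"

print(estimate_emotion(["k", "a_anger", "s", "a_anger", "e_neutral"]))  # -> "anger"
```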
8

Sultana, Sadia, M. Shahidur Rahman, M. Reza Selim, and M. Zafar Iqbal. "SUST Bangla Emotional Speech Corpus (SUBESCO): An audio-only emotional speech corpus for Bangla." PLOS ONE 16, no. 4 (April 30, 2021): e0250173. http://dx.doi.org/10.1371/journal.pone.0250173.

Abstract:
SUBESCO is an audio-only emotional speech corpus for the Bangla language. The total duration of the corpus is in excess of 7 hours, containing 7000 utterances, and it is the largest emotional speech corpus available for this language. Twenty native speakers participated in the gender-balanced set, each recording 10 sentences simulating seven targeted emotions. Fifty university students participated in the evaluation of this corpus. Each audio clip of this corpus, except those of the Disgust emotion, was validated four times by male and female raters. Raw hit rates and unbiased rates were calculated, producing scores above the chance level of responses. The overall recognition rate was reported to be above 70% for the human perception tests. Kappa statistics and intra-class correlation coefficient scores indicated a high level of inter-rater reliability and consistency of this corpus evaluation. SUBESCO is an Open Access database, licensed under Creative Commons Attribution 4.0 International, and can be downloaded free of charge from the web link: https://doi.org/10.5281/zenodo.4526477.
9

Keshtiari, Niloofar, Michael Kuhlmann, Moharram Eslami, and Gisela Klann-Delius. "Recognizing emotional speech in Persian: A validated database of Persian emotional speech (Persian ESD)." Behavior Research Methods 47, no. 1 (May 23, 2014): 275–94. http://dx.doi.org/10.3758/s13428-014-0467-x.

10

Werner, S., and G. N. Petrenko. "Speech Emotion Recognition: Humans vs Machines." Discourse 5, no. 5 (December 18, 2019): 136–52. http://dx.doi.org/10.32603/2412-8562-2019-5-5-136-152.

Abstract:
Introduction. The study focuses on emotional speech perception and speech emotion recognition using prosodic clues alone. Theoretical problems of defining prosody, intonation, and emotion, along with the challenges of emotion classification, are discussed. An overview of acoustic and perceptual correlates of emotions found in speech is provided. Technical approaches to speech emotion recognition are also considered in the light of the latest experiments on the automatic classification of emotional speech. Methodology and sources. The typical "big six" classification commonly used in technical applications is chosen and modified to include such emotions as disgust and shame. A database of emotional speech in Russian is created under sound-laboratory conditions. A perception experiment is run using Praat software's experimental environment. Results and discussion. Cross-cultural emotion recognition possibilities are revealed, as the Finnish and international participants recognised about half of the samples correctly. Nonetheless, native speakers of Russian appear to distinguish a larger proportion of emotions correctly. The effects of foreign-language knowledge, musical training, and gender on performance in the experiment were not particularly prominent. The most commonly confused pairs of emotions, such as shame and sadness, surprise and fear, and anger and disgust, as well as confusions with the neutral emotion, were also given due attention. Conclusion. The work can contribute to psychological studies, clarifying emotion classification and the gender aspect of emotionality; to linguistic research, providing new evidence for prosodic and comparative language studies; and to language technology, deepening the understanding of possible challenges for SER systems.
11

Anvarjon, Tursunov, Mustaqeem, and Soonil Kwon. "Deep-Net: A Lightweight CNN-Based Speech Emotion Recognition System Using Deep Frequency Features." Sensors 20, no. 18 (September 12, 2020): 5212. http://dx.doi.org/10.3390/s20185212.

Abstract:
Artificial intelligence (AI) and machine learning (ML) are employed to make systems smarter. Today, speech emotion recognition (SER) systems evaluate the emotional state of the speaker by investigating his or her speech signal. Emotion recognition is a challenging task for a machine, and making the machine smart enough that emotions are efficiently recognized by AI is equally challenging. The speech signal is quite hard to examine using signal processing methods because it consists of different frequencies and features that vary according to emotions such as anger, fear, sadness, happiness, boredom, disgust, and surprise. Even though different algorithms are being developed for SER, the success rates remain low and depend on the language, the emotions, and the database. In this paper, we propose a new lightweight and effective SER model that has low computational complexity and high recognition accuracy. The suggested method uses a convolutional neural network (CNN) to learn deep frequency features by using a plain rectangular filter with a modified pooling strategy that has more discriminative power for SER. The proposed CNN model was trained on the frequency features extracted from the speech data and was then tested to predict the emotions. The proposed SER model was evaluated on two benchmarks, the Interactive Emotional Dyadic Motion Capture (IEMOCAP) and Berlin Emotional Speech Database (EMO-DB) speech datasets, and it obtained recognition rates of 77.01% and 92.02%, respectively. The experimental results demonstrate that the proposed CNN-based SER system achieves better recognition performance than state-of-the-art SER systems.
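A hedged PyTorch sketch of the general idea, a small CNN applying plain rectangular kernels to a spectrogram input, is shown below. The layer sizes and the (frequency, time) input shape are assumptions, not the authors' exact Deep-Net architecture.

```python
# Illustrative CNN with tall rectangular convolution kernels over spectrograms.
import torch
import torch.nn as nn

class RectangularCNN(nn.Module):
    def __init__(self, n_classes: int = 7):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=(9, 3), padding=(4, 1)),  # rectangular filter
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2, 2)),
            nn.Conv2d(16, 32, kernel_size=(9, 3), padding=(4, 1)),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.classifier = nn.Linear(32 * 4 * 4, n_classes)

    def forward(self, x):                      # x: (batch, 1, freq_bins, frames)
        h = self.features(x)
        return self.classifier(h.flatten(1))

logits = RectangularCNN()(torch.randn(8, 1, 128, 200))
print(logits.shape)                            # torch.Size([8, 7])
```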
12

Zhao, Hui, Yu Tai Wang, and Xing Hai Yang. "Emotion Detection System Based on Speech and Facial Signals." Advanced Materials Research 459 (January 2012): 483–87. http://dx.doi.org/10.4028/www.scientific.net/amr.459.483.

Abstract:
This paper reviews the present status of speech emotion detection. In order to improve on the emotion recognition rate of a single modality, a bimodal fusion method based on speech and facial expression is proposed. First, we establish an emotional database that includes speech and facial expressions. For the emotions calm, happy, surprise, anger, and sad, we extract ten speech parameters and use the PCA method to detect the speech emotion. We then analyse bimodal emotion detection that fuses facial expression information. The experimental results show that the emotion recognition rate with bimodal fusion is about 6 percentage points higher than the recognition rate with speech prosodic features only.
13

Arimoto, Yoshiko, Sumio Ohno, and Hitoshi Iida. "Assessment of spontaneous emotional speech database toward emotion recognition: Intensity and similarity of perceived emotion from spontaneously expressed emotional speech." Acoustical Science and Technology 32, no. 1 (2011): 26–29. http://dx.doi.org/10.1250/ast.32.26.

14

Batliner, Anton, Dino Seppi, Stefan Steidl, and Björn Schuller. "Segmenting into Adequate Units for Automatic Recognition of Emotion-Related Episodes: A Speech-Based Approach." Advances in Human-Computer Interaction 2010 (2010): 1–15. http://dx.doi.org/10.1155/2010/782802.

Abstract:
We deal with the topic of segmenting emotion-related (emotional/affective) episodes into adequate units for analysis and automatic processing/classification—a topic that has not been addressed adequately so far. We concentrate on speech and illustrate promising approaches by using a database with children's emotional speech. We argue in favour of the word as basic unit and map sequences of words on both syntactic and “emotionally consistent” chunks and report classification performances for an exhaustive modelling of our data by mapping word-based paralinguistic emotion labels onto three classes representing valence (positive, neutral, negative), and onto a fourth rest (garbage) class.
15

B.Waghmare, V., R. R. Deshmukh, P. P. Shrishrimal, and G. B. Janvale. "Development of Isolated Marathi Words Emotional Speech Database." International Journal of Computer Applications 94, no. 4 (May 16, 2014): 19–22. http://dx.doi.org/10.5120/16331-5611.

16

Moriarty, Peter M., Michelle Vigeant, Rachel Wolf, Rick Gilmore, and Pamela Cole. "Creation and characterization of an emotional speech database." Journal of the Acoustical Society of America 143, no. 3 (March 2018): 1869. http://dx.doi.org/10.1121/1.5036133.

17

Huang, Ri Sheng. "Information Technology in an Improved Supervised Locally Linear Embedding for Recognizing Speech Emotion." Advanced Materials Research 1014 (July 2014): 375–78. http://dx.doi.org/10.4028/www.scientific.net/amr.1014.375.

Abstract:
To effectively improve the performance of speech emotion recognition, nonlinear dimensionality reduction is needed for speech feature data lying on a nonlinear manifold embedded in a high-dimensional acoustic space. This paper proposes an improved SLLE algorithm, which enhances the discriminating power of the low-dimensional embedded data and possesses optimal generalization ability. The proposed algorithm is used to perform nonlinear dimensionality reduction on 48-dimensional speech emotional feature data, including prosody, so as to recognize three emotions: anger, joy, and neutral. Experimental results on a natural speech emotional database demonstrate that the proposed algorithm obtains the highest accuracy of 90.97% with only 9 embedded features, an 11.64% improvement over the SLLE algorithm.
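For orientation, the sketch below runs scikit-learn's standard (unsupervised) LLE as a stand-in for the paper's improved supervised SLLE, reducing 48-dimensional placeholder features to 9 dimensions before an SVM. The real feature data and the supervised modification are not reproduced.

```python
# Hedged sketch: manifold dimensionality reduction followed by SVM classification.
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 48))                 # placeholder 48-dim prosodic features
y = rng.integers(0, 3, size=300)               # anger / joy / neutral labels

X_low = LocallyLinearEmbedding(n_neighbors=12, n_components=9).fit_transform(X)
X_tr, X_te, y_tr, y_te = train_test_split(X_low, y, random_state=0)
clf = SVC().fit(X_tr, y_tr)
print("accuracy on placeholder data:", clf.score(X_te, y_te))
```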
18

Tursunov, Anvarjon, Soonil Kwon, and Hee-Suk Pang. "Discriminating Emotions in the Valence Dimension from Speech Using Timbre Features." Applied Sciences 9, no. 12 (June 17, 2019): 2470. http://dx.doi.org/10.3390/app9122470.

Abstract:
The most used and well-known acoustic features of a speech signal, the Mel frequency cepstral coefficients (MFCC), cannot characterize emotions in speech sufficiently when a classification is performed to classify both discrete emotions (i.e., anger, happiness, sadness, and neutral) and emotions in the valence dimension (positive and negative). The main reason for this is that some of the discrete emotions, such as anger and happiness, share similar acoustic features in the arousal dimension (high and low) but differ in the valence dimension. Timbre is a sound quality that can discriminate between two sounds even with the same pitch and loudness. In this paper, we analyzed timbre acoustic features to improve the classification performance for discrete emotions as well as emotions in the valence dimension. Sequential forward selection (SFS) was used to find the most relevant acoustic features among the timbre acoustic features. The experiments were carried out on the Berlin Emotional Speech Database and the Interactive Emotional Dyadic Motion Capture Database. A support vector machine (SVM) and a long short-term memory recurrent neural network (LSTM-RNN) were used to classify emotions. Significant classification performance improvements were achieved using a combination of the baseline and the most relevant timbre acoustic features, which were found by applying SFS to the classification of emotions for the Berlin Emotional Speech Database. From extensive experiments, it was found that timbre acoustic features can sufficiently characterize emotions in speech in the valence dimension.
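Sequential forward selection with an SVM can be sketched with scikit-learn's SequentialFeatureSelector, as below. The feature matrix here is synthetic; the timbre feature definitions from the paper are not included.

```python
# Hedged sketch of sequential forward selection (SFS) wrapping an SVM classifier.
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 30))                 # placeholder acoustic feature matrix
y = rng.integers(0, 2, size=200)               # positive / negative valence labels

sfs = SequentialFeatureSelector(
    SVC(kernel="rbf"), n_features_to_select=10, direction="forward", cv=3
)
sfs.fit(X, y)
print("selected feature indices:", np.flatnonzero(sfs.get_support()))
```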
19

Trabelsi, Imen, and Med Salim Bouhlel. "Feature Selection for GUMI Kernel-Based SVM in Speech Emotion Recognition." International Journal of Synthetic Emotions 6, no. 2 (July 2015): 57–68. http://dx.doi.org/10.4018/ijse.2015070104.

Abstract:
Speech emotion recognition is an indispensable requirement for efficient human-machine interaction. Most modern automatic speech emotion recognition systems use Gaussian mixture models (GMM) and Support Vector Machines (SVM). GMM are known for their performance and scalability in spectral modeling, while SVM are known for their discriminatory power. A GMM supervector characterizes an emotional style by the GMM parameters (mean vectors, covariance matrices, and mixture weights). The GMM-supervector SVM benefits from both the GMM and SVM frameworks. In this paper, the GMM-UBM mean interval (GUMI) kernel based on the Bhattacharyya distance is successfully used. CfsSubsetEval combined with the best-first and greedy-stepwise search algorithms was also applied to the supervector space in order to select the most important features. This framework is illustrated using Mel-frequency cepstral coefficient (MFCC) and Perceptual Linear Prediction (PLP) features on two different emotional databases, namely the Surrey Audio-Visual Expressed Emotion corpus and the Berlin Emotional Speech Database.
20

Keshtiari, Niloofar, Michael Kuhlmann, Moharram Eslami, and Gisela Klann-Delius. "Erratum to: Recognizing emotional speech in Persian: A validated database of Persian emotional speech (Persian ESD)." Behavior Research Methods 47, no. 1 (November 26, 2014): 295. http://dx.doi.org/10.3758/s13428-014-0504-9.

21

Zvarevashe, Kudakwashe, and Oludayo O. Olugbara. "Recognition of speech emotion using custom 2D-convolution neural network deep learning algorithm." Intelligent Data Analysis 24, no. 5 (September 30, 2020): 1065–86. http://dx.doi.org/10.3233/ida-194747.

Abstract:
Speech emotion recognition has become the heart of most human-computer interaction applications in the modern world. The growing need to develop emotionally intelligent devices has opened up many research opportunities. Most researchers in this field have applied handcrafted features and machine learning techniques to recognize speech emotion. However, these techniques require extra processing steps, and handcrafted features are usually not robust. They are computationally intensive because the curse of dimensionality results in low discriminating power. Research has shown that deep learning algorithms are effective at extracting robust and salient features from datasets. In this study, we developed a custom 2D-convolution neural network that performs both feature extraction and classification of vocal utterances. The neural network was evaluated against a deep multilayer perceptron neural network and a deep radial basis function neural network using the Berlin database of emotional speech, the Ryerson audio-visual emotional speech database, and the Surrey audio-visual expressed emotion corpus. The described deep learning algorithm achieves the highest precision, recall, and F1-scores when compared to other existing algorithms. It is observed that there may be a need to develop customized solutions for different language settings depending on the area of application.
22

Seo, Minji, and Myungho Kim. "Fusing Visual Attention CNN and Bag of Visual Words for Cross-Corpus Speech Emotion Recognition." Sensors 20, no. 19 (September 28, 2020): 5559. http://dx.doi.org/10.3390/s20195559.

Abstract:
Speech emotion recognition (SER) classifies emotions using low-level features or a spectrogram of an utterance. When SER methods are trained and tested using different datasets, they have shown performance reduction. Cross-corpus SER research identifies speech emotion using different corpora and languages. Recent cross-corpus SER research has been conducted to improve generalization. To improve the cross-corpus SER performance, we pretrained the log-mel spectrograms of the source dataset using our designed visual attention convolutional neural network (VACNN), which has a 2D CNN base model with channel- and spatial-wise visual attention modules. To train the target dataset, we extracted the feature vector using a bag of visual words (BOVW) to assist the fine-tuned model. Because visual words represent local features in the image, the BOVW helps VACNN to learn global and local features in the log-mel spectrogram by constructing a frequency histogram of visual words. The proposed method shows an overall accuracy of 83.33%, 86.92%, and 75.00% in the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), the Berlin Database of Emotional Speech (EmoDB), and Surrey Audio-Visual Expressed Emotion (SAVEE), respectively. Experimental results on RAVDESS, EmoDB, SAVEE demonstrate improvements of 7.73%, 15.12%, and 2.34% compared to existing state-of-the-art cross-corpus SER approaches.
23

Trabelsi, Imen, and Med Salim Bouhlel. "Comparison of Several Acoustic Modeling Techniques for Speech Emotion Recognition." International Journal of Synthetic Emotions 7, no. 1 (January 2016): 58–68. http://dx.doi.org/10.4018/ijse.2016010105.

Abstract:
Automatic Speech Emotion Recognition (SER) is a current research topic in the field of Human Computer Interaction (HCI) with a wide range of applications. The purpose of speech emotion recognition system is to automatically classify speaker's utterances into different emotional states such as disgust, boredom, sadness, neutral, and happiness. The speech samples in this paper are from the Berlin emotional database. Mel Frequency cepstrum coefficients (MFCC), Linear prediction coefficients (LPC), linear prediction cepstrum coefficients (LPCC), Perceptual Linear Prediction (PLP) and Relative Spectral Perceptual Linear Prediction (Rasta-PLP) features are used to characterize the emotional utterances using a combination between Gaussian mixture models (GMM) and Support Vector Machines (SVM) based on the Kullback-Leibler Divergence Kernel. In this study, the effect of feature type and its dimension are comparatively investigated. The best results are obtained with 12-coefficient MFCC. Utilizing the proposed features a recognition rate of 84% has been achieved which is close to the performance of humans on this database.
24

Sekkate, Sara, Mohammed Khalil, Abdellah Adib, and Sofia Ben Jebara. "An Investigation of a Feature-Level Fusion for Noisy Speech Emotion Recognition." Computers 8, no. 4 (December 13, 2019): 91. http://dx.doi.org/10.3390/computers8040091.

Abstract:
Because one of the key issues in improving the performance of Speech Emotion Recognition (SER) systems is the choice of an effective feature representation, most of the research has focused on developing a feature level fusion using a large set of features. In our study, we propose a relatively low-dimensional feature set that combines three features: baseline Mel Frequency Cepstral Coefficients (MFCCs), MFCCs derived from Discrete Wavelet Transform (DWT) sub-band coefficients that are denoted as DMFCC, and pitch based features. Moreover, the performance of the proposed feature extraction method is evaluated in clean conditions and in the presence of several real-world noises. Furthermore, conventional Machine Learning (ML) and Deep Learning (DL) classifiers are employed for comparison. The proposal is tested using speech utterances of both of the Berlin German Emotional Database (EMO-DB) and Interactive Emotional Dyadic Motion Capture (IEMOCAP) speech databases through speaker independent experiments. Experimental results show improvement in speech emotion detection over baselines.
25

Sun, Ying, Xue-Ying Zhang, Jiang-He Ma, Chun-Xiao Song, and Hui-Fen Lv. "Nonlinear Dynamic Feature Extraction Based on Phase Space Reconstruction for the Classification of Speech and Emotion." Mathematical Problems in Engineering 2020 (April 9, 2020): 1–15. http://dx.doi.org/10.1155/2020/9452976.

Abstract:
Due to the shortcomings of linear feature parameters in speech signals, and the limitations of existing time- and frequency-domain attribute features in characterizing the integrity of the speech information, in this paper, we propose a nonlinear method for feature extraction based on the phase space reconstruction (PSR) theory. First, the speech signal was analyzed using a nonlinear dynamic model. Then, the model was used to reconstruct a one-dimensional time speech signal. Finally, nonlinear dynamic (NLD) features based on the reconstruction of the phase space were extracted as the new characteristic parameters. Then, the performance of NLD features was verified by comparing their recognition rates with those of other features (NLD features, prosodic features, and MFCC features). Finally, the Korean isolated words database, the Berlin emotional speech database, and the CASIA emotional speech database were chosen for validation. The effectiveness of the NLD features was tested using the Support Vector Machine classifier. The results show that NLD features not only have high recognition rate and excellent antinoise performance for speech recognition tasks but also can fully characterize the different emotions contained in speech signals.
26

Huang, Chengwei, Guoming Chen, Hua Yu, Yongqiang Bao, and Li Zhao. "Speech Emotion Recognition under White Noise." Archives of Acoustics 38, no. 4 (December 1, 2013): 457–63. http://dx.doi.org/10.2478/aoa-2013-0054.

Abstract:
Speakers' emotional states are recognized from speech signals with additive white Gaussian noise (AWGN). The influence of white noise on a typical emotion recognition system is studied. The emotion classifier is implemented with a Gaussian mixture model (GMM). A Chinese speech emotion database is used for training and testing, which includes nine emotion classes (e.g., happiness, sadness, anger, surprise, fear, anxiety, hesitation, confidence, and the neutral state). Two speech enhancement algorithms are introduced for improved emotion classification. In the experiments, the Gaussian mixture model is trained on the clean speech data and tested under AWGN with various signal-to-noise ratios (SNRs). Both the emotion class model and the dimensional space model are adopted for the evaluation of the emotion recognition system. Regarding the emotion class model, the nine emotion classes are classified. Considering the dimensional space model, the arousal dimension and the valence dimension are classified into positive or negative regions. The experimental results show that the speech enhancement algorithms consistently improve the performance of our emotion recognition system under various SNRs, and that positive emotions are more likely to be misclassified as negative emotions in a white noise environment.
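The noisy test condition described above (clean training, AWGN testing at chosen SNRs) is easy to reproduce. The sketch below shows only the noise-mixing step on a placeholder signal, not the GMM classifier or the enhancement algorithms.

```python
# Minimal sketch: add white Gaussian noise to a clean signal at a target SNR.
import numpy as np

def add_awgn(signal: np.ndarray, snr_db: float, rng=np.random.default_rng(0)):
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

clean = np.sin(2 * np.pi * 220 * np.linspace(0, 1, 16000))   # placeholder "speech"
for snr in (20, 10, 0):
    noisy = add_awgn(clean, snr)
    print(f"SNR {snr} dB -> sample range {noisy.min():.2f}..{noisy.max():.2f}")
```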
27

Mustaqeem and Soonil Kwon. "A CNN-Assisted Enhanced Audio Signal Processing for Speech Emotion Recognition." Sensors 20, no. 1 (December 28, 2019): 183. http://dx.doi.org/10.3390/s20010183.

Abstract:
Speech is the most significant mode of communication among human beings and a potential method for human-computer interaction (HCI) using a microphone sensor. Quantifiable emotion recognition from the speech signals captured by such sensors is an emerging area of research in HCI, with applications such as human-robot interaction, virtual reality, behavior assessment, healthcare, and emergency call centers, where the speaker's emotional state must be determined from an individual's speech. In this paper, we present major contributions for (i) increasing the accuracy of speech emotion recognition (SER) compared to the state of the art and (ii) reducing the computational complexity of the presented SER model. We propose an artificial-intelligence-assisted deep stride convolutional neural network (DSCNN) architecture using the plain-nets strategy to learn salient and discriminative features from spectrograms of speech signals that are enhanced in prior steps to perform better. Local hidden patterns are learned in convolutional layers with special strides that down-sample the feature maps instead of a pooling layer, and global discriminative features are learned in fully connected layers. A SoftMax classifier is used for the classification of emotions in speech. The proposed technique is evaluated on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) and Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) datasets, improving accuracy by 7.85% and 4.5%, respectively, with the model size reduced by 34.5 MB. This proves the effectiveness and significance of the proposed SER technique and reveals its applicability in real-world applications.
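The "strided convolutions instead of pooling" idea can be illustrated with a short PyTorch sketch. The layer sizes below are assumptions and do not reproduce the authors' DSCNN.

```python
# Illustrative CNN whose convolutions down-sample feature maps via stride=2.
import torch
import torch.nn as nn

class StrideCNN(nn.Module):
    def __init__(self, n_classes: int = 4):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),  # downsample by stride
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),           # global average before the classifier head
        )
        self.head = nn.Linear(32, n_classes)

    def forward(self, spectrogram):            # (batch, 1, freq, time)
        return self.head(self.body(spectrogram).flatten(1))

print(StrideCNN()(torch.randn(2, 1, 128, 300)).shape)   # torch.Size([2, 4])
```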
28

Bang, Jaehun, Taeho Hur, Dohyeong Kim, Thien Huynh-The, Jongwon Lee, Yongkoo Han, Oresti Banos, Jee-In Kim, and Sungyoung Lee. "Adaptive Data Boosting Technique for Robust Personalized Speech Emotion in Emotionally-Imbalanced Small-Sample Environments." Sensors 18, no. 11 (November 2, 2018): 3744. http://dx.doi.org/10.3390/s18113744.

Abstract:
Personalized emotion recognition provides an individual training model for each target user in order to mitigate the accuracy problem when using general training models collected from multiple users. Existing personalized speech emotion recognition research has a cold-start problem that requires a large amount of emotionally-balanced data samples from the target user when creating the personalized training model. Such research is difficult to apply in real environments due to the difficulty of collecting numerous target user speech data with emotionally-balanced label samples. Therefore, we propose the Robust Personalized Emotion Recognition Framework with the Adaptive Data Boosting Algorithm to solve the cold-start problem. The proposed framework incrementally provides a customized training model for the target user by reinforcing the dataset by combining the acquired target user speech with speech from other users, followed by applying SMOTE (Synthetic Minority Over-sampling Technique)-based data augmentation. The proposed method proved to be adaptive across a small number of target user datasets and emotionally-imbalanced data environments through iterative experiments using the IEMOCAP (Interactive Emotional Dyadic Motion Capture) database.
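The SMOTE-based balancing step mentioned above can be sketched with the imbalanced-learn library. The feature matrix below is synthetic, and the personalization framework around it is omitted.

```python
# Hedged sketch: over-sample a minority emotion class with SMOTE.
import numpy as np
from collections import Counter
from imblearn.over_sampling import SMOTE

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 20)),    # majority class: "neutral"
               rng.normal(2, 1, (15, 20))])    # minority class: "angry"
y = np.array([0] * 100 + [1] * 15)

X_bal, y_bal = SMOTE(k_neighbors=5, random_state=0).fit_resample(X, y)
print("before:", Counter(y), "after:", Counter(y_bal))
```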
29

Zaidan, Noor Aina, and Md Sah Hj Salam. "Emotional speech feature selection using end-part segmented energy feature." Indonesian Journal of Electrical Engineering and Computer Science 15, no. 3 (September 1, 2019): 1374. http://dx.doi.org/10.11591/ijeecs.v15.i3.pp1374-1381.

Abstract:
The accuracy of human emotion detection is crucial in industry to ensure effective conversations and message delivery. The process of identifying emotions must be carried out properly, using a method that guarantees a high level of emotion recognition. The energy feature is said to be a prosodic information encoder, and ongoing studies of energy use in speech prosody motivated us to run an experiment on energy features. We conducted two sets of studies: 1) whether local or global features contribute most to emotion recognition, and 2) the effect of the end-part segment length on emotion recognition accuracy using two types of segmentation approaches. This paper discusses the Absolute Time Intervals at Relative Positions (ATIR) segmentation approach and global ATIR (GATIR) using end-part segmented global energy features extracted from the Berlin Emotional Speech Database (EMO-DB). We observed that global features contribute more to emotion recognition and that global features derived from longer segments give higher recognition accuracy than those derived from short segments. The addition of an utterance-based feature (GTI) to ATIR segmentation increases accuracy by 5% to 8%, and we conclude that GATIR outperformed the ATIR segmentation approach in terms of recognition rate. The results of this study, in which almost all the sub-tests showed an improvement, prove that global features derived from longer segment lengths capture more emotional information and enhance system performance.
30

Noh, Kyoung Ju, Chi Yoon Jeong, Jiyoun Lim, Seungeun Chung, Gague Kim, Jeong Mook Lim, and Hyuntae Jeong. "Multi-Path and Group-Loss-Based Network for Speech Emotion Recognition in Multi-Domain Datasets." Sensors 21, no. 5 (February 24, 2021): 1579. http://dx.doi.org/10.3390/s21051579.

Abstract:
Speech emotion recognition (SER) is a natural method of recognizing individual emotions in everyday life. To distribute SER models to real-world applications, some key challenges must be overcome, such as the lack of datasets tagged with emotion labels and the weak generalization of the SER model for an unseen target domain. This study proposes a multi-path and group-loss-based network (MPGLN) for SER to support multi-domain adaptation. The proposed model includes a bidirectional long short-term memory-based temporal feature generator and a transferred feature extractor from the pre-trained VGG-like audio classification model (VGGish), and it learns simultaneously based on multiple losses according to the association of emotion labels in the discrete and dimensional models. For the evaluation of the MPGLN SER as applied to multi-cultural domain datasets, the Korean Emotional Speech Database (KESD), including KESDy18 and KESDy19, is constructed, and the English-speaking Interactive Emotional Dyadic Motion Capture database (IEMOCAP) is used. The evaluation of multi-domain adaptation and domain generalization showed 3.7% and 3.5% improvements, respectively, of the F1 score when comparing the performance of MPGLN SER with a baseline SER model that uses a temporal feature generator. We show that the MPGLN SER efficiently supports multi-domain adaptation and reinforces model generalization.
31

Pramod Reddy, A., and Vijayarajan V. "Recognition of human emotion with spectral features using multi layer-perceptron." International Journal of Knowledge-based and Intelligent Engineering Systems 24, no. 3 (September 28, 2020): 227–33. http://dx.doi.org/10.3233/kes-200044.

Abstract:
For emotion recognition, the features extracted from prevalent speech samples of the Berlin emotional database are pitch, intensity, log energy, formants, and mel-frequency cepstral coefficients (MFCC) as base features, with power spectral density as an added function of frequency. In this work, seven emotions, namely anger, neutral, happy, boredom, disgust, fear, and sadness, are considered. Temporal and spectral features are used for building the AER (Automatic Emotion Recognition) model. The extracted features are analysed using a Support Vector Machine (SVM), and a multilayer perceptron (MLP), a class of feed-forward ANN classifiers, is used to classify the different emotional states. We observed 91% accuracy for the angry and boredom emotion classes using SVM and more than 96% accuracy using the ANN, with an overall accuracy of 87.17% for SVM and 94% for the ANN.
32

Jaratrotkamjorn, Apichart. "Bimodal Emotion Recognition Using Deep Belief Network." ECTI Transactions on Computer and Information Technology (ECTI-CIT) 15, no. 1 (January 14, 2021): 73–81. http://dx.doi.org/10.37936/ecti-cit.2021151.226446.

Abstract:
Emotions are very important in human daily life. Enabling a machine to recognize the human emotional state and respond intelligently to human needs is very important in human-computer interaction. The majority of existing work concentrates on the classification of the six basic emotions only. This research work proposes an emotion recognition system based on a multimodal approach, which integrates information from both facial and speech expressions. The database has eight basic emotions (neutral, calm, happy, sad, angry, fearful, disgust, and surprised). Emotions are classified using the deep belief network method. The experimental results show that the bimodal emotion recognition system achieves a clear improvement, with an overall accuracy rate of 97.92%.
33

Cai, Linqin, Yaxin Hu, Jiangong Dong, and Sitong Zhou. "Audio-Textual Emotion Recognition Based on Improved Neural Networks." Mathematical Problems in Engineering 2019 (December 31, 2019): 1–9. http://dx.doi.org/10.1155/2019/2593036.

Abstract:
With the rapid development of social media, single-modal emotion recognition can hardly satisfy the demands of current emotion recognition systems. Aiming to optimize the performance of the emotion recognition system, a multimodal emotion recognition model from speech and text is proposed in this paper. Considering the complementarity between different modes, a CNN (convolutional neural network) and LSTM (long short-term memory) were combined in the form of binary channels to learn acoustic emotion features; meanwhile, an effective Bi-LSTM (bidirectional long short-term memory) network was used to capture the textual features. Furthermore, we applied a deep neural network to learn and classify the fused features. The final emotional state was determined by the output of both the speech and the text emotion analysis. Finally, multimodal fusion experiments were carried out to validate the proposed model on the IEMOCAP database. In comparison with the single modality, the overall recognition accuracy of text increased by 6.70%, and that of speech emotion recognition rose by 13.85%. Experimental results show that the recognition accuracy of our multimodal model is higher than that of the single modality and outperforms other published multimodal models on the test datasets.
34

Jiang, Xiaoqing, Kewen Xia, Lingyin Wang, and Yongliang Lin. "Reordering Features with Weights Fusion in Multiclass and Multiple-Kernel Speech Emotion Recognition." Journal of Electrical and Computer Engineering 2017 (2017): 1–7. http://dx.doi.org/10.1155/2017/8709518.

Abstract:
The selection of a feature subset is a crucial aspect of the speech emotion recognition problem. In this paper, a Reordering Features with Weights Fusion (RFWF) algorithm is proposed for selecting a more effective and compact feature subset. The RFWF algorithm comprehensively fuses the weights reflecting the relevance, complementarity, and redundancy between features and classes and reorders the features to construct a feature subset with excellent emotional recognizability. A binary-tree structured multiple-kernel SVM classifier is adopted for emotion recognition, and different feature subsets are selected in different nodes of the classifier. The highest recognition accuracy for the five emotions in the Berlin database is 90.549% with only 15 features selected by RFWF. The experimental results show the effectiveness of RFWF in building the feature subset and that the use of different feature subsets for specific emotions can improve the overall recognition performance.
35

Lee, Sanghyun, David K. Han, and Hanseok Ko. "Fusion-ConvBERT: Parallel Convolution and BERT Fusion for Speech Emotion Recognition." Sensors 20, no. 22 (November 23, 2020): 6688. http://dx.doi.org/10.3390/s20226688.

Abstract:
Speech emotion recognition predicts the emotional state of a speaker based on the person’s speech. It brings an additional element for creating more natural human–computer interactions. Earlier studies on emotional recognition have been primarily based on handcrafted features and manual labels. With the advent of deep learning, there have been some efforts in applying the deep-network-based approach to the problem of emotion recognition. As deep learning automatically extracts salient features correlated to speaker emotion, it brings certain advantages over the handcrafted-feature-based methods. There are, however, some challenges in applying them to the emotion recognition problem, because data required for properly training deep networks are often lacking. Therefore, there is a need for a new deep-learning-based approach which can exploit available information from given speech signals to the maximum extent possible. Our proposed method, called “Fusion-ConvBERT”, is a parallel fusion model consisting of bidirectional encoder representations from transformers and convolutional neural networks. Extensive experiments were conducted on the proposed model using the EMO-DB and Interactive Emotional Dyadic Motion Capture Database emotion corpus, and it was shown that the proposed method outperformed state-of-the-art techniques in most of the test configurations.
36

Yu, Yeonguk, and Yoon-Joong Kim. "Attention-LSTM-Attention Model for Speech Emotion Recognition and Analysis of IEMOCAP Database." Electronics 9, no. 5 (April 26, 2020): 713. http://dx.doi.org/10.3390/electronics9050713.

Abstract:
We propose a speech emotion recognition (SER) model with an "attention-Long Short-Term Memory (LSTM)-attention" component to combine IS09, a commonly used feature set for SER, with the mel spectrogram, and we analyze the reliability problem of the interactive emotional dyadic motion capture (IEMOCAP) database. The attention mechanism of the model focuses on the emotion-related elements of the IS09 and mel-spectrogram features and on the emotion-related time spans within them. Thus, the model extracts emotion information from a given speech signal. The proposed model in the baseline study achieved a weighted accuracy (WA) of 68% on the improvised dataset of IEMOCAP. However, neither the proposed model of the main study nor the modified models could achieve more than 68% WA on the improvised dataset. This is because of the reliability limit of the IEMOCAP dataset: a more reliable dataset is required for a more accurate evaluation of the model's performance. Therefore, in this study, we reconstructed a more reliable dataset based on the labeling results provided by IEMOCAP. The experimental results of the model on the more reliable dataset confirmed a WA of 73%.
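A hedged PyTorch sketch of attention pooling over bidirectional LSTM outputs, the general mechanism behind such models, is given below. The dimensions and the single attention layer are simplifications, not the authors' exact attention-LSTM-attention architecture.

```python
# Illustrative attention pooling over LSTM outputs for utterance-level classification.
import torch
import torch.nn as nn

class AttentiveLSTM(nn.Module):
    def __init__(self, n_features: int = 40, hidden: int = 64, n_classes: int = 4):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)    # scores each time step
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                       # x: (batch, time, n_features)
        h, _ = self.lstm(x)                     # (batch, time, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1)  # attention weights over time
        context = (w * h).sum(dim=1)            # weighted sum -> utterance vector
        return self.head(context)

print(AttentiveLSTM()(torch.randn(8, 120, 40)).shape)   # torch.Size([8, 4])
```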
37

Lieskovská, Eva, Maroš Jakubec, Roman Jarina, and Michal Chmulík. "A Review on Speech Emotion Recognition Using Deep Learning and Attention Mechanism." Electronics 10, no. 10 (May 13, 2021): 1163. http://dx.doi.org/10.3390/electronics10101163.

Abstract:
Emotions are an integral part of human interactions and are significant factors in determining user satisfaction or customer opinion. Speech emotion recognition (SER) modules also play an important role in the development of human–computer interaction (HCI) applications. A tremendous number of SER systems have been developed over the last decades. Attention-based deep neural networks (DNNs) have been shown to be suitable tools for mining information that is unevenly distributed in time in multimedia content. The attention mechanism has recently been incorporated into DNN architectures to also emphasise emotionally salient information. This paper provides a review of recent developments in SER and examines the impact of various attention mechanisms on SER performance. An overall comparison of system accuracies is performed on the widely used IEMOCAP benchmark database.
38

Huang, Chengwei, Ruiyu Liang, Qingyun Wang, Ji Xi, Cheng Zha, and Li Zhao. "Practical Speech Emotion Recognition Based on Online Learning: From Acted Data to Elicited Data." Mathematical Problems in Engineering 2013 (2013): 1–9. http://dx.doi.org/10.1155/2013/265819.

Abstract:
We study cross-database speech emotion recognition based on online learning. How to apply a classifier trained on acted data to naturalistic data, such as elicited data, remains a major challenge for today's speech emotion recognition systems. We introduce three different types of data sources: first, a basic speech emotion dataset collected from acted speech by professional actors and actresses; second, a speaker-independent dataset containing a large number of speakers; third, an elicited speech dataset collected from a cognitive task. Acoustic features are extracted from emotional utterances and evaluated using the maximal information coefficient (MIC). A baseline valence and arousal classifier is designed based on Gaussian mixture models. The online training module is implemented using AdaBoost. While the offline recognizer is trained on the acted data, the online testing data include the speaker-independent data and the elicited data. Experimental results show that by introducing the online learning module, our speech emotion recognition system can be better adapted to new data, which is an important characteristic for real-world applications.
39

Partila, Pavol, Miroslav Voznak, and Jaromir Tovarek. "Pattern Recognition Methods and Features Selection for Speech Emotion Recognition System." Scientific World Journal 2015 (2015): 1–7. http://dx.doi.org/10.1155/2015/573068.

Abstract:
The impact of the classification method and feature selection on speech emotion recognition accuracy is discussed in this paper. Selecting the correct parameters in combination with the classifier is an important part of reducing the computational complexity of the system. This step is necessary especially for systems that will be deployed in real-time applications. The reason for the development and improvement of speech emotion recognition systems is their wide usability in today's automatic voice-controlled systems. The Berlin database of emotional recordings was used in this experiment. The classification accuracy of artificial neural networks, k-nearest neighbours, and a Gaussian mixture model is measured considering the selection of prosodic, spectral, and voice quality features. The purpose was to find an optimal combination of methods and groups of features for stress detection in human speech. The research contribution lies in the design of a speech emotion recognition system notable for its accuracy and efficiency.
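The classifier comparison this abstract describes (an ANN, k-nearest neighbours, and a GMM) can be sketched with scikit-learn on placeholder features, as below. The Berlin-database features and the reported accuracies are not reproduced.

```python
# Hedged sketch: compare an MLP, k-NN, and a per-class GMM on synthetic features.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 24))                 # placeholder prosodic/spectral features
y = rng.integers(0, 4, size=400)               # four placeholder emotion classes
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for name, clf in [("ANN", MLPClassifier(max_iter=500)),
                  ("k-NN", KNeighborsClassifier(n_neighbors=5))]:
    print(name, clf.fit(X_tr, y_tr).score(X_te, y_te))

# GMM as a classifier: fit one mixture per emotion, pick the highest likelihood.
gmms = {c: GaussianMixture(n_components=2, random_state=0).fit(X_tr[y_tr == c])
        for c in np.unique(y_tr)}
pred = np.array([max(gmms, key=lambda c: gmms[c].score_samples(x[None])[0]) for x in X_te])
print("GMM", float(np.mean(pred == y_te)))
```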
40

Cairong, Zou, Zhang Xinran, Zha Cheng, and Zhao Li. "A Novel DBN Feature Fusion Model for Cross-Corpus Speech Emotion Recognition." Journal of Electrical and Computer Engineering 2016 (2016): 1–11. http://dx.doi.org/10.1155/2016/7437860.

Abstract:
Feature fusion from separate sources is a current technical difficulty in cross-corpus speech emotion recognition. The purpose of this paper is to use the emotional information hidden in the speech spectrum diagram (spectrogram) as image features, based on Deep Belief Nets (DBN) in deep learning, and then to fuse them with traditional emotion features. First, based on spectrogram analysis with the STB/Itti model, new spectrogram features are extracted from the colour, the brightness, and the orientation, respectively; then two alternative DBN models fuse the traditional and the spectrogram features, which increases the size of the feature subset and its ability to characterize emotion. In experiments on the ABC database and Chinese corpora, the new feature subset, compared with traditional speech emotion features, distinctly improves the cross-corpus recognition result, by 8.8%. The proposed method provides a new idea for feature fusion in emotion recognition.
41

Helmiyah, Siti, Abdul Fadlil, and Anton Yudhana. "Pengenalan Pola Emosi Manusia Berdasarkan Ucapan Menggunakan Ekstraksi Fitur Mel-Frequency Cepstral Coefficients (MFCC)." CogITo Smart Journal 4, no. 2 (February 8, 2019): 372. http://dx.doi.org/10.31154/cogito.v4i2.129.372-381.

Abstract:
Human emotion recognition has become important due to its usefulness in daily life, which requires human-computer interaction. Human emotion recognition is a complex problem due to differences in customs and specific dialects across ethnic groups, regions, and communities. The problem is also exacerbated because objective assessment of emotion is difficult, since emotion happens unconsciously. This research conducts an experiment to discover patterns of emotion based on features extracted from speech. The method used for feature extraction in this experiment is the Mel-Frequency Cepstral Coefficient (MFCC), which is similar to the human hearing system. The dataset used in this experiment is the Berlin Database of Emotional Speech (Emo-DB). The emotions used for the experiments are happiness, boredom, neutral, sadness, and anger. For each of these emotions, 3 samples from Emo-DB are taken as experimental subjects. The emotion patterns become clearly visible using specific values for the MFCC parameters: 25 for frame duration, 10 for frame shift, 0.97 for the pre-emphasis coefficient, 20 for filterbank channels, and 12 for cepstral coefficients. MFCC features are then extracted and their mean values calculated. These mean values are then plotted on a time-frame graph and examined to find the specific pattern that appears for each emotion. Keywords: Emotion, Speech, Mel-Frequency Cepstral Coefficients (MFCC).
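The MFCC settings quoted in the abstract (presumably 25 ms frames, a 10 ms shift, 0.97 pre-emphasis, 20 filterbank channels, 12 cepstral coefficients) can be applied with librosa as in the sketch below. The audio here is a synthetic placeholder, not an Emo-DB clip.

```python
# Hedged sketch: MFCC extraction with the parameter values quoted above.
import numpy as np
import librosa

sr = 16000
t = np.arange(0, 1.0, 1.0 / sr)
y = 0.1 * np.sin(2 * np.pi * 220 * t)          # placeholder audio instead of Emo-DB

y = np.append(y[0], y[1:] - 0.97 * y[:-1])     # pre-emphasis, coefficient 0.97

frame_len = int(0.025 * sr)                    # 25 ms frame duration -> 400 samples
hop_len = int(0.010 * sr)                      # 10 ms frame shift    -> 160 samples
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=12, n_mels=20,
                            n_fft=frame_len, hop_length=hop_len)

print(mfcc.shape)                              # (12, n_frames)
print(mfcc.mean(axis=1))                       # per-coefficient means, the quantity the abstract plots
```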
42

Agrima, Abdellah, Ilham Mounir, Abdelmajid Farchi, Laila Elmaazouzi, and Badia Mounir. "Emotion recognition from syllabic units using k-nearest-neighbor classification and energy distribution." International Journal of Electrical and Computer Engineering (IJECE) 11, no. 6 (December 1, 2021): 5438. http://dx.doi.org/10.11591/ijece.v11i6.pp5438-5449.

Abstract:
In this article, we present an automatic technique for recognizing emotional states from speech signals. The main focus of this paper is to present an efficient, reduced set of acoustic features that allows us to recognize the four basic human emotions (anger, sadness, joy, and neutral). The proposed feature vector is composed of twenty-eight measurements corresponding to standard acoustic features such as formants and fundamental frequency (obtained with the Praat software), as well as new features based on the calculation of the energies in specific frequency bands and their distributions (computed with MATLAB code). The measurements are extracted from consonant/vowel (CV) syllabic units derived from the Moroccan Arabic dialect emotional database (MADED) corpus. The collected data is then used to train a k-nearest-neighbor (KNN) classifier for the automated recognition phase. The results reach 64.65% in multi-class classification and 94.95% for classification between positive and negative emotions.
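
The band-energy portion of such a feature vector can be sketched as follows; the band edges and the segment length are illustrative assumptions, since the abstract does not list the paper's exact bands.

```python
# Sketch: energy of a syllabic (CV) segment accumulated in a few frequency bands
# and normalized into a distribution; band edges here are assumptions.
import numpy as np

def band_energy_distribution(segment, sr, bands=((0, 500), (500, 1000),
                                                 (1000, 2000), (2000, 4000))):
    spectrum = np.abs(np.fft.rfft(segment)) ** 2
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / sr)
    energies = np.array([spectrum[(freqs >= lo) & (freqs < hi)].sum()
                         for lo, hi in bands])
    return energies / energies.sum()        # relative energy distribution per band

sr = 16000
cv_unit = np.random.randn(int(0.18 * sr))   # placeholder 180 ms CV segment
print(band_energy_distribution(cv_unit, sr).round(3))
```

Such band distributions, concatenated with the formant and fundamental-frequency measurements, would form the twenty-eight-dimensional vector passed to the KNN classifier.
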
43

Wang, Jade, Trent Nicol, Erika Skoe, Mikko Sams, and Nina Kraus. "Emotion Modulates Early Auditory Response to Speech." Journal of Cognitive Neuroscience 21, no. 11 (November 2009): 2121–28. http://dx.doi.org/10.1162/jocn.2008.21147.

Abstract:
In order to understand how emotional state influences the listener's physiological response to speech, subjects looked at emotion-evoking pictures while 32-channel EEG evoked responses (ERPs) to an unchanging auditory stimulus (“danny”) were collected. The pictures were selected from the International Affective Picture System database. They were rated by participants and differed in valence (positive, negative, neutral), but not in dominance and arousal. Effects of viewing negative emotion pictures were seen as early as 20 msec (p = .006). An analysis of the global field power highlighted a time period of interest (30.4–129.0 msec) where the effects of emotion are likely to be the most robust. At the cortical level, the responses differed significantly depending on the valence ratings the subjects provided for the visual stimuli, which divided them into the high valence intensity group and the low valence intensity group. The high valence intensity group exhibited a clear divergent bivalent effect of emotion (ERPs at Cz during viewing neutral pictures subtracted from ERPs during viewing positive or negative pictures) in the time period of interest (rΦ = .534, p < .01). Moreover, group differences emerged in the pattern of global activation during this time period. Although both groups demonstrated a significant effect of emotion (ANOVA, p = .004 and .006, low valence intensity and high valence intensity, respectively), the high valence intensity group exhibited a much larger effect. Whereas the low valence intensity group exhibited its smaller effect predominantly in frontal areas, the larger effect in the high valence intensity group was found globally, especially in the left temporal areas, with the largest divergent bivalent effects (ANOVA, p < .00001) in high valence intensity subjects around the midline. Thus, divergent bivalent effects were observed between 30 and 130 msec, and were dependent on the subject's subjective state, whereas the effects at 20 msec were evident only for negative emotion, independent of the subject's behavioral responses. Taken together, it appears that emotion can affect auditory function early in the sensory processing stream.
44

Mustaqeem and Soonil Kwon. "CLSTM: Deep Feature-Based Speech Emotion Recognition Using the Hierarchical ConvLSTM Network." Mathematics 8, no. 12 (November 30, 2020): 2133. http://dx.doi.org/10.3390/math8122133.

Abstract:
Artificial intelligence, deep learning, and machine learning are the dominant tools for making systems smarter. A smart speech emotion recognition (SER) system is now a basic necessity and an emerging research area of digital audio signal processing, and SER plays an important role in many applications related to human–computer interaction (HCI). Existing state-of-the-art SER systems have rather low prediction performance, which needs improvement to make them feasible for real-time commercial applications. The key reasons for the low accuracy and poor prediction rate are the scarcity of data and the model configuration, which make building a robust machine learning technique challenging. In this paper, we address the limitations of existing SER systems and propose a unique artificial intelligence (AI) based system structure for SER that utilizes hierarchical blocks of convolutional long short-term memory (ConvLSTM) with sequence learning. We designed four ConvLSTM blocks, called local features learning blocks (LFLBs), to extract local emotional features with hierarchical correlation. The ConvLSTM layers are adopted for the input-to-state and state-to-state transitions to extract spatial cues through convolution operations. Four LFLBs are stacked to extract spatiotemporal cues from speech signals in hierarchical correlational form using a residual learning strategy. Furthermore, we utilize a novel sequence learning strategy to extract global information and adaptively adjust the relevant global feature weights according to the correlation of the input features. Finally, we use the center loss function together with the softmax loss to produce class probabilities. The center loss improves the final classification results, ensures accurate prediction, and plays a conspicuous role in the overall proposed SER scheme. We tested the proposed system on two standard corpora, the interactive emotional dyadic motion capture (IEMOCAP) database and the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), and obtained recognition rates of 75% and 80%, respectively.
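
A heavily reduced Keras sketch of the hierarchical ConvLSTM idea is given below; it uses only two ConvLSTM2D blocks instead of the paper's four LFLBs, omits the residual connections and the center loss, and all shapes are assumptions rather than the authors' configuration.

```python
# Sketch: spectrogram chunks -> stacked ConvLSTM blocks -> GRU sequence learner -> softmax.
import tensorflow as tf

n_chunks, mel_bins, frames, n_classes = 8, 64, 32, 8   # assumed input/output shapes

inputs = tf.keras.Input(shape=(n_chunks, mel_bins, frames, 1))
x = tf.keras.layers.ConvLSTM2D(32, (3, 3), padding="same",
                               return_sequences=True)(inputs)   # local feature block 1
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.ConvLSTM2D(64, (3, 3), padding="same",
                               return_sequences=True)(x)        # local feature block 2
x = tf.keras.layers.TimeDistributed(tf.keras.layers.GlobalAveragePooling2D())(x)
x = tf.keras.layers.GRU(128)(x)                                 # global sequence learning
outputs = tf.keras.layers.Dense(n_classes, activation="softmax")(x)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```
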
45

Chang, Xin, and Władysław Skarbek. "Multi-Modal Residual Perceptron Network for Audio–Video Emotion Recognition." Sensors 21, no. 16 (August 12, 2021): 5452. http://dx.doi.org/10.3390/s21165452.

Abstract:
Emotion recognition is an important research field for human–computer interaction. Audio–video emotion recognition is now commonly addressed with deep neural network modeling tools. As a rule, published papers show only cases where multi-modality is superior to audio-only or video-only modality; however, cases where a single modality is superior can also be found. In our research, we hypothesize that for fuzzy categories of emotional events, the within-modal and inter-modal noisy information, represented indirectly in the parameters of the modeling neural network, impedes better performance of the existing late-fusion and end-to-end multi-modal network training strategies. To take advantage of both solutions and overcome their deficiencies, we define a multi-modal residual perceptron network that performs end-to-end learning from multi-modal network branches and generalizes to a better multi-modal feature representation. With the proposed multi-modal residual perceptron network and a novel time augmentation for streaming digital movies, the state-of-the-art average recognition rate was improved to 91.4% on the Ryerson Audio–Visual Database of Emotional Speech and Song dataset and to 83.15% on the Crowd-Sourced Emotional Multimodal Actors dataset. Moreover, the multi-modal residual perceptron network concept shows its potential for multi-modal applications dealing with signal sources not only of optical and acoustical types.
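
The residual-fusion idea can be sketched as follows, assuming pre-computed audio and video branch embeddings; this is a simplified stand-in for the paper's multi-modal residual perceptron network, with all dimensions chosen arbitrarily.

```python
# Sketch: a perceptron's output is added back to an averaged late-fusion baseline,
# i.e., a residual correction over the uni-modal branch embeddings.
import tensorflow as tf

emb_dim, n_classes = 256, 8
audio_emb = tf.keras.Input(shape=(emb_dim,))
video_emb = tf.keras.Input(shape=(emb_dim,))

baseline = tf.keras.layers.Average()([audio_emb, video_emb])        # late-fusion baseline
concat = tf.keras.layers.Concatenate()([audio_emb, video_emb])
correction = tf.keras.layers.Dense(emb_dim, activation="relu")(concat)
fused = tf.keras.layers.Add()([baseline, correction])               # residual combination

outputs = tf.keras.layers.Dense(n_classes, activation="softmax")(fused)
model = tf.keras.Model([audio_emb, video_emb], outputs)
model.summary()
```
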
46

Basalaeva, Elena G., Elena Yu Bulygina, and Tatiana A. Tripolskaya. "Stylistic Qualification of Colloquial Vocabulary in the Database of Pragmatically Marked Vocabulary of the Russian Language." Voprosy leksikografii, no. 20 (2021): 5–22. http://dx.doi.org/10.17223/22274200/20/1.

Abstract:
The article focuses on the issue of stylistic ranking (clarification of the system of stylistic labels) and the correlation of stylistic labels with emotional and evaluative ones in the electronic lexicographic resource Database of Pragmatically Marked Vocabulary. The resource pursues the goal of the most complete lexicographic portraiture of lexical meanings that contain pragmatic semantics (ideological, gender, national-cultural and emotional-evaluative). The macrostructure of the Database, the criteria for forming the glossary, and the ways of interpreting pragmatic content are described in the works of the participants in the lexicographic project E.G. Basalaeva, E.Yu. Bulygina, T.A. Tripolskaya. The material of the research is emotional-evaluative vocabulary (mrak, mychat' (about human speech), vkalyvat', uzhas, balbes, nytik, myamlya, etc.) in need of special stylistic (colloquial, vernacular, rude, everyday, etc.) qualification. The authors relied on definitional, component and corpus analysis in the study. The study of the main explanatory dictionaries of the Russian language from the standpoint of the placement and correlation of stylistic and semantic labels allows obtaining the following information: 1) about lexicographic traditions relevant to the development of a new dictionary resource; 2) about possible dynamic processes in the field of stylistically marked vocabulary; 3) about the hybrid nature of stylistic and semantic labels: in some cases, stylistic labels (without emotional-evaluative ones) are designed to combine information about the sphere of use and emotional-evaluative semantics, and sometimes a semantic label, for example, contemptuous or abusive, serves as an indicator of colloquial or vernacular use. One of the primary tasks in the lexicography of these lexemes is the development of a maximally consistent system of stylistic and semantic labels. In the Database of Pragmatically Marked Vocabulary, the authors propose to use semantic and stylistic labels in parallel with each other. The created Database makes it possible to present a lexicographic portrait of a pragmatically marked word, the presence and interaction of pragmatic microcomponents, and their dynamics manifested in modern communication. Thus, the article analyzes the systems of vocabulary labels. The stylistic and semantic (emotional-evaluative) labels have been correlated, considering the seme and semantic variation of a pragmatically marked word; an algorithm for the lexicographic description of emotional-evaluative semantics, which determines the stylistic characteristics of a word, has been developed.
47

Metallinou, Angeliki, Zhaojun Yang, Chi-chun Lee, Carlos Busso, Sharon Carnicke, and Shrikanth Narayanan. "The USC CreativeIT database of multimodal dyadic interactions: from speech and full body motion capture to continuous emotional annotations." Language Resources and Evaluation 50, no. 3 (April 17, 2015): 497–521. http://dx.doi.org/10.1007/s10579-015-9300-0.

48

Avdeev, Vladimir, Viktor Trushin, and Mihail Kungurov. "Unified Speech-Like Interference for Active Protection of Speech Information." Informatics and Automation 19, no. 5 (October 15, 2020): 991–1017. http://dx.doi.org/10.15622/ia.2020.19.5.4.

Abstract:
The paper considers the possibility of creating speech-like interference for vibro-acoustic protection of speech information based on tables of syllables and words of the Russian language. The choice of research directions and experimental conditions is substantiated: synthesis of sound files by random sampling of speech elements from a database, study of the spectra of the synthesized noise, an algorithm for creating interference of the "speech choir" type, and study of the autocorrelation functions of the synthesized speech-like interference as well as their probability density distributions. It is shown that the spectral and statistical characteristics of the synthesized speech-like interference of the five-voice "speech choir" type are close to the corresponding characteristics of real speech signals. The speech choir was formed by averaging the instantaneous values of the time realizations of the sound files. It is shown that the spectral power density of the "speech choir" interference practically does not change once the number of averaged "voices" reaches five. As the number of voices in the "speech choir" increases, the probability density distribution of the speech-like interference approaches the normal law (unlike a real speech signal, whose probability density is close to the Laplace distribution). Evaluation of the autocorrelation function gave a correlation interval of several milliseconds. Articulation tests of speech intelligibility using the synthesized speech-like interference at different signal-to-noise ratios showed that the integral noise level can be reduced by 12-15 dB compared to noise-like interference. The dependencies of verbal intelligibility on the integral signal-to-noise ratio are constructed on the basis of polynomial and piecewise linear approximations. A preliminary assessment of the possible impact of speech-like interference on a person's psycho-emotional state was performed. Directions for further research on increasing the efficiency of algorithms for generating speech-like interference are discussed.
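
The "speech choir" construction can be sketched as follows; random noise bursts stand in for the syllable recordings drawn from the database, so only the concatenation, averaging, and normalization steps mirror the described algorithm.

```python
# Sketch: several speech-like tracks, each built by concatenating random "syllables",
# are averaged sample-by-sample to form the speech-choir interference.
import numpy as np

rng = np.random.default_rng(0)
sr, duration, n_voices = 16000, 5.0, 5
n_samples = int(sr * duration)

def synth_voice(n_samples, rng):
    """Stand-in for one voice: concatenate random 'syllables' until long enough."""
    chunks, total = [], 0
    while total < n_samples:
        syllable = rng.standard_normal(rng.integers(int(0.08 * sr), int(0.25 * sr)))
        chunks.append(syllable)
        total += len(syllable)
    return np.concatenate(chunks)[:n_samples]

voices = np.stack([synth_voice(n_samples, rng) for _ in range(n_voices)])
choir = voices.mean(axis=0)                  # averaging instantaneous values
choir /= np.max(np.abs(choir))               # normalize before writing to a WAV file
print(choir.shape, round(choir.std(), 3))
```
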
49

Sepúlveda, Axel, Francisco Castillo, Carlos Palma, and Maria Rodriguez-Fernandez. "Emotion Recognition from ECG Signals Using Wavelet Scattering and Machine Learning." Applied Sciences 11, no. 11 (May 27, 2021): 4945. http://dx.doi.org/10.3390/app11114945.

Abstract:
Affect detection combined with a system that dynamically responds to a person’s emotional state allows an improved user experience with computers, systems, and environments and has a wide range of applications, including entertainment and health care. Previous studies on this topic have used a variety of machine learning algorithms and inputs such as audio, visual, or physiological signals. Recently, a lot of interest has been focused on the last, as speech or video recording is impractical for some applications. Therefore, there is a need to create human–computer interface systems capable of recognizing emotional states from noninvasive and nonintrusive physiological signals. Typically, the recognition task is carried out on electroencephalogram (EEG) signals, obtaining good accuracy. However, EEGs are difficult to record without interfering with daily activities, and recent studies have shown that it is possible to use electrocardiogram (ECG) signals for this purpose. This work improves the performance of emotion recognition from ECG signals using the wavelet transform for signal analysis. Features of the ECG signal are extracted from the AMIGOS database using a wavelet scattering algorithm that yields features of the signal at different time scales, which are then used as inputs for different classifiers to evaluate their performance. The results show that the proposed algorithm for extracting features and classifying the signals obtains an accuracy of 88.8% in the valence dimension, 90.2% in arousal, and 95.3% in a two-dimensional classification, which is better than the performance reported in previous studies. This algorithm is expected to be useful for classifying emotions using wearable devices.
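
A hedged sketch of the feature-extraction stage is shown below, assuming the kymatio package for the wavelet scattering transform; the ECG segment, its length, and the scattering parameters are placeholders rather than the AMIGOS pipeline used in the paper.

```python
# Sketch: wavelet scattering of one ECG segment into a fixed-length feature vector.
import numpy as np
from kymatio.numpy import Scattering1D

T = 2 ** 13                                   # segment length (power of two), placeholder
ecg = np.random.randn(T)                      # stand-in for one AMIGOS ECG segment

scattering = Scattering1D(J=8, shape=T, Q=8)  # multi-scale scattering coefficients
Sx = scattering(ecg)                          # shape: (n_coefficients, n_time_frames)
features = Sx.mean(axis=-1)                   # time-averaged coefficients per segment
print(features.shape)                         # one vector per segment, fed to a classifier
```
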
50

Reese, K., D. P. Terry, B. Maxwell, R. Zafonte, P. D. Berkner, and G. L. Iverson. "The Association Between Past Speech Therapy and Preseason Symptom Reporting in Adolescent Student Athletes." Archives of Clinical Neuropsychology 34, no. 5 (July 2019): 752. http://dx.doi.org/10.1093/arclin/acz026.22.

Abstract:
Abstract Purpose: Neurodevelopmental conditions, such as ADHD, have been shown to be associated with different baseline symptom reporting, but the relationship between a history of speech therapy and symptom reporting is not well understood. This study examined the association between prior speech therapy and baseline symptom reporting in student athletes. Methods: A preseason baseline database contained 40,378 athletes ages 13–18 who had not sustained a concussion in the past 6 months. Of these, 27,550 athletes denied having any developmental/health conditions (controls) and 1,497 reported only a history of speech therapy (total sample: age M=15.5, SD=1.26; 47% girls). Mann-Whitney U tests were used to compare baseline symptom reporting on the ImPACT® Post-Concussion Symptom Scale between athletes with prior speech therapy and controls. Individual symptoms were dichotomized (absent vs. present) and compared between groups using chi-square tests. Results: There was a higher proportion of boys in the prior speech therapy group than in the control group (62% vs. 53%; X2=41.9, p<.001). Athletes with speech therapy histories reported more overall baseline symptoms (ps<.001). The effect sizes were minimal to small (Cohen’s d: girls=0.10; boys=0.20). Slightly higher proportions of boys and girls with a history of speech therapy reported trouble falling asleep, fatigue, and difficulty concentrating/remembering compared to their control counterparts (ps<.05). Further, compared to controls, a higher proportion of boys (but not girls) with speech therapy histories reported physical and emotional symptoms. Conclusion: Adolescents with speech therapy histories report slightly more symptoms than controls during baseline testing, with a stronger effect in boys. However, the effect sizes were very small.
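
For illustration, the two statistical comparisons described in the Methods (Mann-Whitney U tests on symptom scores and chi-square tests on dichotomized symptoms) can be run with SciPy as below; all counts and scores are made-up placeholders, not the ImPACT data.

```python
# Sketch: group comparison of symptom totals and of a dichotomized symptom.
import numpy as np
from scipy.stats import mannwhitneyu, chi2_contingency

rng = np.random.default_rng(1)
controls = rng.poisson(3, size=500)            # placeholder total symptom scores
speech_tx = rng.poisson(4, size=150)
u_stat, p_val = mannwhitneyu(speech_tx, controls, alternative="two-sided")
print(f"Mann-Whitney U = {u_stat:.0f}, p = {p_val:.4f}")

# symptom present/absent counts: rows = group, columns = (absent, present)
table = np.array([[400, 100],                  # controls
                  [100, 50]])                  # prior speech therapy
chi2, p, dof, _ = chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, dof = {dof}, p = {p:.4f}")
```
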
