Journal articles: 'Perceptual linear predictive'

1

Hermansky, Hynek. "Perceptual linear predictive (PLP) analysis of speech." Journal of the Acoustical Society of America 87, no. 4 (April 1990): 1738–52. http://dx.doi.org/10.1121/1.399423.

2

Chen, Sai, Hong Cui Wang, Jia Jia, Ye Teng An, and Jian Wu Dang. "Comparison of Mel Frequency Ceptrum Coefficient and Perceptual Linear Predictive in Perceptual Measurement of Chinese Initials." Applied Mechanics and Materials 411-414 (September 2013): 291–97. http://dx.doi.org/10.4028/www.scientific.net/amm.411-414.291.

Abstract:

Much work has sought to improve performance by proposing new speech features or new perceptual measurements, but each study focuses on only one of the two aspects. In this paper, we study the relationship between them; that is, we ask which acoustic features, or combinations of features, are most consistent with the real perception of Chinese initials. We propose a method that measures the acoustic distance and keeps it monotonically related to the perceptual distance of Chinese initials. We first define the acoustic and perceptual distances between different Chinese initials, and select a suitable combination of acoustic features and two compatible distance metrics by conducting clustering analysis on samples of all types of Chinese initials using MFCC and PLP. Based on data provided by the General Hospital of the People's Liberation Army, we then calculate the acoustic distance and the perceptual distance. Finally, we calculate Spearman's rho between the two types of distance corresponding to the two calculation methods. The experimental results show a relatively strong monotonic relationship between the two types of distance for the selected acoustic features.
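The last step described in this abstract, a rank correlation between two sets of distances, can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: the function names and the distance values are hypothetical, and tied values receive average ranks, a common convention that the abstract does not specify.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical distances for five pairs of initials (not data from the paper).
acoustic = [0.12, 0.35, 0.50, 0.81, 0.95]
perceptual = [1.0, 2.5, 2.0, 4.0, 4.5]
print(spearman_rho(acoustic, perceptual))
```

A value near 1 indicates that the acoustic distance increases almost monotonically with the perceptual distance; in the made-up data above, only one pair of neighbouring ranks is swapped.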

3

Kowler, Eileen, Jason F. Rubinstein, Elio M. Santos, and Jie Wang. "Predictive Smooth Pursuit Eye Movements." Annual Review of Vision Science 5, no. 1 (September 15, 2019): 223–46. http://dx.doi.org/10.1146/annurev-vision-091718-014901.

Abstract:

Smooth pursuit eye movements maintain the line of sight on smoothly moving targets. Although often studied as a response to sensory motion, pursuit anticipates changes in motion trajectories, thus reducing harmful consequences due to sensorimotor processing delays. Evidence for predictive pursuit includes (a) anticipatory smooth eye movements (ASEM) in the direction of expected future target motion that can be evoked by perceptual cues or by memory for recent motion, (b) pursuit during periods of target occlusion, and (c) improved accuracy of pursuit with self-generated or biologically realistic target motions. Predictive pursuit has been linked to neural activity in the frontal cortex and in sensory motion areas. As behavioral and neural evidence for predictive pursuit grows and statistically based models augment or replace linear systems approaches, pursuit is being regarded less as a reaction to immediate sensory motion and more as a predictive response, with retinal motion serving as one of a number of contributing cues.

4

Lange, Elke B., and Klaus Frieler. "Challenges and Opportunities of Predicting Musical Emotions with Perceptual and Automatized Features." Music Perception 36, no. 2 (December 1, 2018): 217–42. http://dx.doi.org/10.1525/mp.2018.36.2.217.

Abstract:

Music information retrieval (MIR) is a fast-growing research area. One of its aims is to extract musical characteristics from audio. In this study, we assumed the roles of researchers without further technical MIR experience and set out to test in an exploratory way its opportunities and challenges in the specific context of musical emotion perception. Twenty sound engineers rated 60 musical excerpts from a broad range of styles with respect to 22 spectral, musical, and cross-modal features (perceptual features) and perceived emotional expression. In addition, we extracted 86 features (acoustic features) of the excerpts with the MIRtoolbox (Lartillot & Toiviainen, 2007). First, we evaluated the perceptual and extracted acoustic features. Both perceptual and acoustic features posed statistical challenges (e.g., perceptual features were often bimodally distributed, and acoustic features highly correlated). Second, we tested the suitability of the acoustic features for modeling perceived emotional content. Four nearly disjunctive feature sets provided similar results, implying a certain arbitrariness of feature selection. We compared the predictive power of perceptual and acoustic features using linear mixed effects models, but the results were inconclusive. We discuss critical points and make suggestions to further evaluate MIR tools for modeling music perception and processing.

5

Ding, Jian Li, and Yong Yang. "Automatic Recognition of Aircraft Noise with PLP Method." Applied Mechanics and Materials 160 (March 2012): 145–49. http://dx.doi.org/10.4028/www.scientific.net/amm.160.145.

Abstract:

This paper proposes a modified auditory feature extraction algorithm based on perceptual linear predictive analysis which is more suitable for automatic recognition of aircraft noise. In this algorithm, a different distribution of filter-bank is introduced in order to fit the physical characteristic of aircraft noise and the result shows that the modified method indeed performs better. The effect of Gammatone filter in improving the robustness of recognition algorithm is also demonstrated in the experiment.

6

Habeck, Christian, Qolamreza Razlighi, and Yaakov Stern. "Predictive utility of task-related functional connectivity vs. voxel activation." PLOS ONE 16, no. 4 (April 8, 2021): e0249947. http://dx.doi.org/10.1371/journal.pone.0249947.

Abstract:

Functional connectivity, both in resting state and task performance, has steadily increased its share of neuroimaging research effort in the last 1.5 decades. In the current study, we investigated the predictive utility regarding behavioral performance and task information for 240 participants, aged 20–77, for both voxel activation and functional connectivity in 12 cognitive tasks, belonging to 4 cognitive reference domains (Episodic Memory, Fluid Reasoning, Perceptual Speed, and Vocabulary). We also added a model only comprising brain-structure information not specifically acquired during performance of a cognitive task. We used a simple brain-behavioral prediction technique based on Principal Component Analysis (PCA) and regression and studied the utility of both modalities in quasi out-of-sample predictions, using split-sample simulations (= 5-fold Monte Carlo cross validation) with 1,000 iterations for which a regression model predicting a cognitive outcome was estimated in a training sample, with a subsequent assessment of prediction success in a non-overlapping test sample. The sample assignments were identical for functional connectivity, voxel activation, and brain structure, enabling apples-to-apples comparisons of predictive utility. All 3 models that were investigated included the demographic covariates age, gender, and years of education. A minimal reference model using simple linear regression with just these 3 covariates was included for comparison as well and was evaluated with the same resampling scheme as described above. Results of the comparison between voxel activation and functional connectivity were mixed and showed some dependency on cognitive outcome; however, mean differences in predictive utility between voxel activation and functional connectivity were rather small in terms of within-modality variability or predictive success. 
More notably, only in the case of Fluid Reasoning did concurrent functional neuroimaging provide compelling information about cognitive performance beyond structural brain imaging or the minimal reference model.

7

Al Mahmud, Nahyan, and Shahfida Amjad Munni. "Qualitative Analysis of PLP in LSTM for Bangla Speech Recognition." International journal of Multimedia & Its Applications 12, no. 5 (October 30, 2020): 1–8. http://dx.doi.org/10.5121/ijma.2020.12501.

Abstract:

The performance of various acoustic feature extraction methods has been compared in this work using Long Short-Term Memory (LSTM) neural network in a Bangla speech recognition system. The acoustic features are a series of vectors that represents the speech signals. They can be classified in either words or sub word units such as phonemes. In this work, at first linear predictive coding (LPC) is used as acoustic vector extraction technique. LPC has been chosen due to its widespread popularity. Then other vector extraction techniques like Mel frequency cepstral coefficients (MFCC) and perceptual linear prediction (PLP) have also been used. These two methods closely resemble the human auditory system. These feature vectors are then trained using the LSTM neural network. Then the obtained models of different phonemes are compared with different statistical tools namely Bhattacharyya Distance and Mahalanobis Distance to investigate the nature of those acoustic features.

8

Chappell, Whitney. "Phonological (in)visibility." Journal of Second Language Pronunciation 5, no. 3 (May 6, 2019): 435–63. http://dx.doi.org/10.1075/jslp.17034.cha.

Abstract:

Reduced vowels between obstruents and rhotics are durationally variable and phonologically invisible in Spanish, e.g. pərado ‘field’ as /pɾ/. The present study compares L1-Spanish speakers, English monolinguals, and L2-Spanish learners’ perceptual boundaries for reduced vowels in Spanish. A native speaker produced 70 Spanish nonce words with word-initial obstruent + vowel + flap sequences, and the duration of each vowel was manipulated from 100% to 75%, 50%, and 25% of its original duration. To determine whether these groups perceive variably reduced vowels as phonologically visible, 78 listeners counted the number of syllables perceived in 280 target audio files. Linear regression models fitted to 21,436 responses indicate that English monolinguals apply an L1 perceptual strategy, but L2-Spanish learners have shifted their perceptual boundaries. The study concludes that the perception of highly variable acoustic information becomes more native-like with greater L2 proficiency, while age of acquisition is less predictive of native-like perception.

9

Krause, Bryan M., and Geoffrey M. Ghose. "Micropools of reliable area MT neurons explain rapid motion detection." Journal of Neurophysiology 120, no. 5 (November 1, 2018): 2396–409. http://dx.doi.org/10.1152/jn.00845.2017.

Abstract:

Many models of perceptually based decisions postulate that actions are initiated when accumulated sensory signals reach a threshold level of activity. These models have received considerable neurophysiological support from recordings of individual neurons while animals are engaged in motion discrimination tasks. These experiments have found that the activity of neurons in a particular visual area strongly associated with motion processing (MT), when pooled over hundreds of milliseconds, is sufficient to explain behavioral timing and performance. However, this level of pooling may be problematic for urgent perceptual decisions in which rapid detection dictates temporally precise integration. In this paper, we explore the physiological basis of one such task in which macaques detected brief (~70 ms) transients of coherent motion within ~240 ms. We find that a simple linear summation model based on realistic stimulus responses of as few as 40 correlated neurons can predict the reliability and timing of rapid motion detection. The model naturally reproduces a distinctive physiological relationship observed in rapid detection tasks in which the individual neurons with the most reliable stimulus responses are also the most predictive of impending behavioral choices. Remarkably, we observed this relationship across our simulated neuronal populations even when all neurons within the pool were weighted equally with respect to readout. These results demonstrate that small numbers of reliable sensory neurons can dominate perceptual judgments without any explicit reliability based weighting and are sufficient to explain the accuracy, latency, and temporal precision of rapid detection. NEW & NOTEWORTHY Computational and psychophysical models suggest that performance in many perceptual tasks may be based on the preferential sampling of reliable neurons. 
Recent studies of MT neurons during rapid motion detection, in which only those neurons with the most reliable sensory responses were strongly predictive of the animals’ decisions, seemingly support this notion. Here we show that a simple threshold model without explicit reliability biases can explain both the behavioral accuracy and precision of these detections and the distribution of sensory- and choice-related signals across neurons.

10

Xue, Yuqun, Zhijiu Zhu, Jianhua Jiang, Yi Zhan, Zenghui Yu, Xiaohua Fan, and Shushan Qiao. "Fast Computation of LSP Frequencies Using the Bairstow Method." Electronics 9, no. 3 (February 26, 2020): 387. http://dx.doi.org/10.3390/electronics9030387.

Abstract:

Linear prediction is the kernel technology in speech processing. It has been widely applied in speech recognition, synthesis, and coding, and can efficiently and correctly represent the speech frequency spectrum with only a few parameters. Line Spectrum Pairs (LSPs) frequencies, as an alternative representation of Linear Predictive Coding (LPC), have the advantages of good quantization accuracy and low spectral sensitivity. However, computing the LSPs frequencies takes a long time. To address this issue, a fast computation algorithm, based on the Bairstow method for computing LSPs frequencies from linear prediction coefficients, is proposed in this paper. The algorithm first transforms the symmetric and antisymmetric polynomials to general polynomials, then extracts the polynomial roots. Associated with the short-term stationary property of speech signal, an adaptive initial method is applied to reduce the average iteration numbers by 26%, as compared to the statics in the initial method, with a Perceptual Evaluation of Speech Quality (PESQ) score reaching 3.46. Experimental results show that the proposed method can extract the polynomial roots efficiently and accurately with significantly reduced computation complexity. Compared to previous works, the proposed method is 17 times faster than the Tschirnhaus Transform, and has a 22% PESQ improvement on the Birge-Vieta method with an almost comparable computation time.
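For readers unfamiliar with the symmetric and antisymmetric polynomials mentioned in this abstract, the sketch below builds them from a set of linear prediction coefficients using the textbook definitions P(z) = A(z) + z^-(p+1) A(z^-1) and Q(z) = A(z) - z^-(p+1) A(z^-1). It covers only this construction step, not the Bairstow root extraction the paper proposes; the names `lsp_polynomials` and `evaluate`, the symbols P and Q, and the coefficient values are assumptions chosen for illustration.

```python
def lsp_polynomials(a):
    """Build the LSP polynomials from LPC coefficients.

    `a` holds [1, a1, ..., ap], the coefficients of A(z) in powers of z^-1.
    Returns (P, Q), each a list of p + 2 coefficients in powers of z^-1.
    """
    ext = list(a) + [0.0]              # A(z), padded to degree p + 1
    rev = [0.0] + list(reversed(a))    # z^-(p+1) * A(z^-1)
    P = [x + y for x, y in zip(ext, rev)]  # symmetric (palindromic)
    Q = [x - y for x, y in zip(ext, rev)]  # antisymmetric
    return P, Q


def evaluate(coeffs, z):
    """Evaluate sum(coeffs[k] * z**(-k)) at a nonzero point z."""
    return sum(c * z ** (-k) for k, c in enumerate(coeffs))


# Hypothetical second-order predictor (p = 2); not taken from the paper.
a = [1.0, -0.9, 0.2]
P, Q = lsp_polynomials(a)
print(P)
print(Q)
```

The LSP frequencies are the angles of the roots of P and Q on the unit circle. Q always has a trivial root at z = 1, and for even p, P has one at z = -1; these can be divided out before a root finder is applied to the remaining factors.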

11

Guedry, F. E., A. H. Rupert, B. J. McGrath, and C. M. Oman. "The Dynamics of Spatial Orientation During Complex and Changing Linear and Angular Acceleration." Journal of Vestibular Research 2, no. 4 (October 1, 1992): 259–83. http://dx.doi.org/10.3233/ves-1992-2402.

Abstract:

The dynamics of spatial orientation perception were examined in a series of experiments in which a total of 43 subjects were passively exposed to various combinations of linear and angular acceleration during centrifuge runs. Perceptual effects during deceleration were much stronger than effects during acceleration. The dynamics of spatial orientation perception differed substantially from changes in the vestibulo-ocular reflex (VOR). VOR was fairly well predicted by a current model, but our experiments revealed perceived change in attitude (roll, pitch, yaw tilt position in space) and perceived angular velocity in space that was not reflected by parallel changes in the plane or magnitude of the VOR. This series of experiments establishes several facts concerning spatial orientation perception beyond the predictive domain of any current model. New concepts are needed and several are suggested to deal with changing reactions to complex combinations of linear and angular accelerations.

12

Ma, Changxue, and D. O’Shaughnessy. "A perceptual study of source coding of Fourier phase and amplitude of the linear predictive coding residual of vowel sounds." Journal of the Acoustical Society of America 95, no. 4 (April 1994): 2231–39. http://dx.doi.org/10.1121/1.408683.

13

Kaur, Gurpreet, Mohit Srivastava, and Amod Kumar. "Genetic Algorithm for Combined Speaker and Speech Recognition using Deep Neural Networks." Journal of Telecommunications and Information Technology 2 (June 29, 2018): 23–31. http://dx.doi.org/10.26636/jtit.2018.119617.

Abstract:

Huge growth is observed in the speech and speaker recognition field due to many artificial intelligence algorithms being applied. Speech is used to convey messages via the language being spoken, emotions, gender and speaker identity. Many real applications in healthcare are based upon speech and speaker recognition, e.g. a voice-controlled wheelchair helps control the chair. In this paper, we use a genetic algorithm (GA) for combined speaker and speech recognition, relying on optimized Mel Frequency Cepstral Coefficient (MFCC) speech features, and classification is performed using a Deep Neural Network (DNN). In the first phase, feature extraction using MFCC is executed. Then, feature optimization is performed using GA. In the second phase training is conducted using DNN. Evaluation and validation of the proposed work model is done by setting a real environment, and efficiency is calculated on the basis of such parameters as accuracy, precision rate, recall rate, sensitivity, and specificity. Also, this paper presents an evaluation of such feature extraction methods as linear predictive coding coefficient (LPCC), perceptual linear prediction (PLP), mel frequency cepstral coefficients (MFCC) and relative spectra filtering (RASTA), with all of them used for combined speaker and speech recognition systems. A comparison of different methods based on existing techniques for both clean and noisy environments is made as well.

14

Deng, Lei, and Yong Gao. "Gammachirp Filter Banks Applied in Roust Speaker Recognition Based GMM-UBM Classifier." International Arab Journal of Information Technology 17, no. 2 (February 28, 2019): 170–77. http://dx.doi.org/10.34028/iajit/17/2/4.

Abstract:

In this paper, the authors propose an auditory feature extraction algorithm to improve the performance of speaker recognition systems in noisy environments. In this algorithm, the Gammachirp filter bank is adapted to simulate the auditory model of the human cochlea. In addition, the following three techniques are applied: the cube-root compression method, the Relative Spectral Filtering Technique (RASTA), and the Cepstral Mean and Variance Normalization algorithm (CMVN). Subsequently, a simulated experiment was conducted based on the theory of the Gaussian Mixture Model-Universal Background Model (GMM-UBM). The experimental results implied that speaker recognition systems with the new auditory feature have better robustness and recognition performance compared to Mel-Frequency Cepstral Coefficients (MFCC), Relative Spectral-Perceptual Linear Predictive (RASTA-PLP), Cochlear Filter Cepstral Coefficients (CFCC), and Gammatone Frequency Cepstral Coefficients (GFCC).

15

Murton, Olivia, Robert Hillman, and Daryush Mehta. "Cepstral Peak Prominence Values for Clinical Voice Evaluation." American Journal of Speech-Language Pathology 29, no. 3 (August 4, 2020): 1596–607. http://dx.doi.org/10.1044/2020_ajslp-20-00001.

Abstract:

Purpose: The goal of this study was to employ frequently used analysis methods and tasks to identify values for cepstral peak prominence (CPP) that can aid clinical voice evaluation. Experiment 1 identified CPP values to distinguish speakers with and without voice disorders. Experiment 2 was an initial attempt to estimate auditory-perceptual ratings of overall dysphonia severity using CPP values. Method: CPP was computed using the Analysis of Dysphonia in Speech and Voice (ADSV) program and Praat. Experiment 1 included recordings from 295 patients with medically diagnosed voice disorders and 50 vocally healthy control speakers. Speakers produced sustained /a/ vowels and the English language Rainbow Passage. CPP cutoff values that best distinguished patient and control speakers were identified. Experiment 2 analyzed recordings from 32 English speakers with varying dysphonia severity and provided preliminary validation of the Experiment 1 cutoffs. Speakers sustained the /a/ vowel and read four sentences from the Consensus Auditory-Perceptual Evaluation of Voice protocol. Trained listeners provided auditory-perceptual ratings of overall dysphonia for the recordings, which were estimated using CPP values in a linear regression model whose performance was evaluated using the coefficient of determination (r²). Results: Experiment 1 identified CPP cutoff values of 11.46 dB (ADSV) and 14.45 dB (Praat) for the sustained /a/ vowels and 6.11 dB (ADSV) and 9.33 dB (Praat) for the Rainbow Passage. CPP values below those thresholds indicated the presence of a voice disorder with up to 94.5% accuracy. In Experiment 2, CPP values estimated ratings of overall dysphonia with r² values up to .74. Conclusions: The CPP cutoff values identified in Experiment 1 provide normative reference points for clinical voice evaluation based on sustained /a/ vowels and the Rainbow Passage. Experiment 2 provides an initial predictive framework that can be used to relate CPP values to the auditory perception of overall dysphonia severity based on sustained /a/ vowels and Consensus Auditory-Perceptual Evaluation of Voice sentences.

16

Davies-Venn, Evelyn, and Pamela Souza. "The Role of Spectral Resolution, Working Memory, and Audibility in Explaining Variance in Susceptibility to Temporal Envelope Distortion." Journal of the American Academy of Audiology 25, no. 06 (June 2014): 592–604. http://dx.doi.org/10.3766/jaaa.25.6.9.

Abstract:

Background: Several studies have shown that hearing thresholds alone cannot adequately predict listeners’ success with hearing-aid amplification. Furthermore, previous studies have shown marked differences in listeners’ susceptibility to distortions introduced by certain nonlinear amplification parameters. Purpose: The purpose of this study was to examine the role of spectral resolution, working memory, and audibility in explaining perceptual susceptibility to temporal envelope and other hearing-aid compression-induced distortions for listeners with mild to moderate and moderate to severe hearing loss. Research Design: A between-subjects repeated-measures design was used to compare speech recognition scores with linear versus compression amplification, for listeners with mild to moderate and moderate to severe hearing loss. Study Sample: The study included 15 adult listeners with mild to moderate hearing loss and 13 adults with moderate to severe hearing loss. Data Collection/Analysis: Speech recognition scores were measured for vowel-consonant-vowel syllables processed with linear, moderate compression, and extreme compression amplification. Perceptual susceptibility to compression-induced temporal envelope distortion was defined as the difference in scores between linear and compression amplification. Both overall scores and consonant feature scores (i.e., place, manner, and voicing) were analyzed. Narrowband spectral resolution was measured using individual measures of auditory filter bandwidth at 2000 Hz. Working memory was measured using the reading span test. Signal audibility was quantified using the Aided Audibility Index. Multiple linear regression was used to determine the predictive role of spectral resolution, working memory, and audibility benefit on listeners’ susceptibility to compression-induced distortions. 
Results: For all listeners, spectral resolution, working memory, and audibility benefit were significant predictors of overall distortion scores. For listeners with moderate to severe hearing loss, spectral resolution and audibility benefit predicted distortion scores for consonant place and manner of articulation features, and audibility benefit predicted distortion scores for consonant voicing features. For listeners with mild to moderate hearing loss, the model did not predict distortion scores for overall or consonant feature scores. Conclusions: The results from this study suggest that when audibility is adequately controlled, measures of spectral resolution may identify the listeners who are most susceptible to compression-induced distortions. Working memory appears to modulate the negative effect of these distortions for listeners with moderate to severe hearing loss.

17

Meliza, C. Daniel, Zhiyi Chi, and Daniel Margoliash. "Representations of Conspecific Song by Starling Secondary Forebrain Auditory Neurons: Toward a Hierarchical Framework." Journal of Neurophysiology 103, no. 3 (March 2010): 1195–208. http://dx.doi.org/10.1152/jn.00464.2009.

Abstract:

The functional organization giving rise to stimulus selectivity in higher-order auditory neurons remains under active study. We explored the selectivity for motifs, spectrotemporally distinct perceptual units in starling song, recording the responses of 96 caudomedial mesopallium (CMM) neurons in European starlings (Sturnus vulgaris) under awake-restrained and urethane-anesthetized conditions. A subset of neurons was highly selective between motifs. Selectivity was correlated with low spontaneous firing rates and high spike timing precision, and all but one of the selective neurons had similar spike waveforms. Neurons were further tested with stimuli in which the notes comprising the motifs were manipulated. Responses to most of the isolated notes were similar in amplitude, duration, and temporal pattern to the responses elicited by those notes in the context of the motif. For these neurons, we could accurately predict the responses to motifs from the sum of the responses to notes. Some notes were suppressed by the motif context, such that removing other notes from motifs unmasked additional excitation. Models of linear summation of note responses consistently outperformed spectrotemporal receptive field models in predicting responses to song stimuli. Tests with randomized sequences of notes confirmed the predictive power of these models. Whole notes gave better predictions than did note fragments. Thus in CMM, auditory objects (motifs) can be represented by a linear combination of excitation and suppression elicited by the note components of the object. We hypothesize that the receptive fields arise from selective convergence by inputs responding to specific spectrotemporal features of starling notes.

18

Wolff, Wanja, Julia Schüler, Jonas Hofstetter, Lorena Baumann, Lena Wolf, and Christian Dettmers. "Trait Self-Control Outperforms Trait Fatigue in Predicting MS Patients’ Cortical and Perceptual Responses to an Exhaustive Task." Neural Plasticity 2019 (April 24, 2019): 1–10. http://dx.doi.org/10.1155/2019/8527203.

Abstract:

Patients with multiple sclerosis (PwMS) frequently suffer from fatigue, but this debilitating symptom is not yet fully understood. We propose that self-control can be conceptually and mechanistically linked to the fatigue concept and might help explain some of the diversity on how PwMS who suffer from fatigue deal with this symptom. To test this claim, we first assessed how cortical oxygenation and measures of motor and cognitive state fatigue change during a strenuous physical task, and then we tested the predictive validity of trait fatigue and trait self-control in explaining the observed changes. A sample of N=51 PwMS first completed a test battery to collect trait measures of fatigue and self-control. PwMS then performed an isometric hand contraction task at 10% of their maximum voluntary contraction until exhaustion while we repeatedly assessed ratings of perceived cognitive and motor exertion. In addition, we continuously measured oxygenation of the prefrontal cortex (PFC) using functional near-infrared spectroscopy. Linear mixed-effect models revealed significant increases in perceived motor and cognitive exertion, as well as increases in PFC oxygenation. Hierarchical stepwise regression analyses showed that higher trait self-control predicted a less steep increase in PFC oxygenation and perceived cognitive exertion, while trait fatigue did not predict change in any dependent variable. These results provide preliminary evidence for the suggested link between self-control and fatigue. As self-control can be enhanced with training, this finding possibly has important implications for devising nonpharmacological interventions to help patients deal with symptoms of fatigue.

19

Duncan, E. Susan, Neila J. Donovan, and Seyed Ahmad Sajjadi. "Clinical Assessment of Characteristics of Apraxia of Speech in Primary Progressive Aphasia." American Journal of Speech-Language Pathology 29, no. 1S (February 21, 2020): 485–97. http://dx.doi.org/10.1044/2019_ajslp-cac48-18-0225.

Abstract:

Purpose: We sought to examine interrater reliability in clinical assessment of apraxia of speech (AOS) in individuals with primary progressive aphasia and to identify speech characteristics predictive of AOS diagnosis. Method: Fifty-two individuals with primary progressive aphasia were recorded performing a variety of speech tasks. These recordings were viewed by 2 experienced speech-language pathologists, who independently rated them on the presence and severity of AOS as well as 14 associated speech characteristics. We calculated interrater reliability (percent agreement and Cohen's kappa) for these ratings. For each rater, we used stepwise regression to identify speech characteristics significantly predictive of AOS diagnosis. We used the overlap between raters to create a more parsimonious model, which we evaluated with multiple linear regression. Results: Results yielded high agreement on the presence (90%) and severity of AOS (weighted Cohen's κ = .834) but lower agreement for specific speech characteristics (weighted Cohen's κ ranging from .036 to .582). Stepwise regression identified 2 speech characteristics predictive of AOS diagnosis for both raters (articulatory groping and increased errors with increased length/complexity). These alone accounted for ≥ 50% of the variance of AOS severity in the constrained model. Conclusions: Our study adds to a growing body of research that highlights the difficulty in objective clinical characterization of AOS and perceptual characterization of speech features. It further supports the need for consensus diagnostic criteria with standardized testing tools and for the identification and validation of objective markers of AOS. Additionally, these findings underscore the need for a training protocol if diagnostic tools are to be effective when shared beyond the research teams that develop and test them and disseminated to practicing speech-language pathologists, in order to ensure consistent application.

20

Stuart, Andrew, and Emma R. Daughtrey. "On the Relationship Between Musicianship and Contralateral Suppression of Transient-Evoked Otoacoustic Emissions." Journal of the American Academy of Audiology 27, no. 04 (April 2016): 333–44. http://dx.doi.org/10.3766/jaaa.15057.

Abstract:

Background: The medial olivocochlear (MOC) efferent reflex that modulates outer hair cell function has been shown to be more robust in musicians versus nonmusicians as evidenced in greater contralateral suppression of transient-evoked otoacoustic emissions (TEOAEs). All previous research comparing musical ability and MOC efferent strength has defined musicianship dichotomously (i.e., high-level music students or professional classical musicians versus nonmusicians). Purpose: The objective of the study was to further explore contralateral suppression of TEOAEs among adults with a full spectrum of musicianship ranging from no history of musicianship to professional musicians. Musicianship was defined by both self-report and with an objective test to quantify individual differences in perceptual music skills. Research Design: A single-factor between-subjects and correlational research designs were employed. Study Sample: Forty-five normal-hearing young adults participated. Data Collection and Analysis: Participants completed a questionnaire concerning their music experience and completed the Brief Profile of Music Perception Skills (PROMS) to quantify perceptual musical skills across multiple musical domains (i.e., accent, melody, tempo, and tuning). TEOAEs were evaluated with 60 dB peak equivalent sound pressure level click stimuli with and without a contralateral 65 dB sound pressure level white noise suppressor. TEOAE suppression was expressed in two ways, absolute TEOAE suppression in dB and a normalized index of TEOAE suppression (i.e., percentage of suppression). Results: Participants who considered themselves musicians scored significantly higher on all subscales and total Brief PROMS score (p < 0.05). There was no statistically significant difference between musicians and nonmusicians in absolute TEOAE suppression or percentage of TEOAE suppression (p > 0.05). 
There were no statistically significant correlations or linear predictive relationships between subscale or total Brief PROMS scores with absolute and percentage of TEOAE suppression (p > 0.05). Conclusions: The findings do not support the notion of a graded enhancement of MOC efferent suppression among adults with varied degrees of musicianship from nonmusicians to professional musicians.

21

Qi, Yingyong, and Robert A. Fox. "Analysis of nasal consonants using perceptual linear prediction." Journal of the Acoustical Society of America 91, no. 3 (March 1992): 1718–26. http://dx.doi.org/10.1121/1.402451.

22

Guo, Feng Hua. "Haptic Data Compression Based on Linear Prediction." Advanced Materials Research 712-715 (June 2013): 2712–15. http://dx.doi.org/10.4028/www.scientific.net/amr.712-715.2712.

Abstract:

A new linear prediction-based haptic data reduction technique is presented. The prediction approach relies on the least-squares method to reduce the number of data packets. Knowledge from human haptic perception is incorporated into the architecture to assess the perceptual quality of the compressed haptic signals. Experiments prove the effectiveness of the proposed approach in data reduction rate.

23

Clemins, Patrick J., and Michael T. Johnson. "Generalized perceptual linear prediction features for animal vocalization analysis." Journal of the Acoustical Society of America 120, no. 1 (July 2006): 527–34. http://dx.doi.org/10.1121/1.2203596.

24

Tamazin, Mohamed, Ahmed Gouda, and Mohamed Khedr. "Enhanced Automatic Speech Recognition System Based on Enhancing Power-Normalized Cepstral Coefficients." Applied Sciences 9, no. 10 (May 27, 2019): 2166. http://dx.doi.org/10.3390/app9102166.

Abstract:

Many new consumer applications are based on the use of automatic speech recognition (ASR) systems, such as voice command interfaces, speech-to-text applications, and data entry processes. Although ASR systems have remarkably improved in recent decades, the speech recognition system performance still significantly degrades in the presence of noisy environments. Developing a robust ASR system that can work in real-world noise and other acoustic distorting conditions is an attractive research topic. Many advanced algorithms have been developed in the literature to deal with this problem; most of these algorithms are based on modeling the behavior of the human auditory system with perceived noisy speech. In this research, the power-normalized cepstral coefficient (PNCC) system is modified to increase robustness against the different types of environmental noises, where a new technique based on gammatone channel filtering combined with channel bias minimization is used to suppress the noise effects. The TIDIGITS database is utilized to evaluate the performance of the proposed system in comparison to the state-of-the-art techniques in the presence of additive white Gaussian noise (AWGN) and seven different types of environmental noises. In this research, one word is recognized from a set containing 11 possibilities only. The experimental results showed that the proposed method provides significant improvements in the recognition accuracy at low signal to noise ratios (SNR). In the case of subway noise at SNR = 5 dB, the proposed method outperforms the mel-frequency cepstral coefficient (MFCC) and relative spectral (RASTA)–perceptual linear predictive (PLP) methods by 55% and 47%, respectively. Moreover, the recognition rate of the proposed method is higher than the gammatone frequency cepstral coefficient (GFCC) and PNCC methods in the case of car noise. 
It is enhanced by 40% in comparison to the GFCC method at SNR 0dB, while it is improved by 20% in comparison to the PNCC method at SNR −5dB.

25

Trabelsi, Imen, and Med Salim Bouhlel. "Comparison of Several Acoustic Modeling Techniques for Speech Emotion Recognition." International Journal of Synthetic Emotions 7, no. 1 (January 2016): 58–68. http://dx.doi.org/10.4018/ijse.2016010105.

Abstract:

Automatic Speech Emotion Recognition (SER) is a current research topic in the field of Human Computer Interaction (HCI) with a wide range of applications. The purpose of a speech emotion recognition system is to automatically classify a speaker's utterances into different emotional states such as disgust, boredom, sadness, neutral, and happiness. The speech samples in this paper are from the Berlin emotional database. Mel frequency cepstrum coefficients (MFCC), linear prediction coefficients (LPC), linear prediction cepstrum coefficients (LPCC), Perceptual Linear Prediction (PLP) and Relative Spectral Perceptual Linear Prediction (Rasta-PLP) features are used to characterize the emotional utterances using a combination of Gaussian mixture models (GMM) and Support Vector Machines (SVM) based on the Kullback-Leibler Divergence Kernel. In this study, the effects of feature type and feature dimension are comparatively investigated. The best results are obtained with 12-coefficient MFCC. Using the proposed features, a recognition rate of 84% has been achieved, which is close to the performance of humans on this database.

26

Li, Guan Yu, Hong Zhi Yu, Yong Hong Li, and Ning Ma. "Features Extraction for Lhasa Tibetan Speech Recognition." Applied Mechanics and Materials 571-572 (June 2014): 205–8. http://dx.doi.org/10.4028/www.scientific.net/amm.571-572.205.

Abstract:

Speech feature extraction is discussed. The Mel frequency cepstral coefficient (MFCC) and perceptual linear prediction (PLP) coefficient methods are analyzed. These two types of features are extracted in a Lhasa large-vocabulary continuous speech recognition system, and the recognition results are then compared.

27

Cutzu, Florin, and Michael Tarr. "Inferring Perceptual Saliency Fields from Viewpoint-Dependent Recognition Data." Neural Computation 11, no. 6 (August 1, 1999): 1331–48. http://dx.doi.org/10.1162/089976699300016269.

Abstract:

We present an algorithm for computing the relative perceptual saliencies of the features of a three-dimensional object using either goodness-of-view scores measured at several viewpoints or perceptual similarities among several object views. This technique addresses the inverse, ill-posed version of the direct problem of predicting goodness-of-view scores or viewpoint similarities when the object features are known. On the basis of a linear model for the direct problem, we solve the inverse problem using the method of regularization. The critical assumption we make to regularize the solution is that perceptual salience varies slowly on the surface of the object. The salient regions derived using this assumption empirically indicate what object structures are important in human three-dimensional object perception, a domain where theories typically have been based on somewhat ad hoc features.

28

Gaballah, Amr, Vijay Parsa, Daryn Cushnie-Sparrow, and Scott Adams. "Improved Estimation of Parkinsonian Vowel Quality through Acoustic Feature Assimilation." Scientific World Journal 2021 (July 14, 2021): 1–11. http://dx.doi.org/10.1155/2021/6076828.

Abstract:

This paper investigated the performance of a number of acoustic measures, both individually and in combination, in predicting the perceived quality of sustained vowels produced by people impaired with Parkinson’s disease (PD). Sustained vowel recordings were collected from 51 PD patients before and after the administration of the Levodopa medication. Subjective ratings of the overall vowel quality were garnered using a visual analog scale. These ratings served to benchmark the effectiveness of the acoustic measures. Acoustic predictors of the perceived vowel quality included the harmonics-to-noise ratio (HNR), smoothed cepstral peak prominence (CPP), recurrence period density entropy (RPDE), Gammatone frequency cepstral coefficients (GFCCs), linear prediction (LP) coefficients and their variants, and modulation spectrogram features. Linear regression (LR) and support vector regression (SVR) models were employed to assimilate multiple features. Different feature dimensionality reduction methods were investigated to avoid model overfitting and enhance the prediction capabilities for the test dataset. Results showed that the RPDE measure performed the best among all individual features, while a regression model incorporating a subset of features produced the best overall correlation of 0.80 between the predicted and actual vowel quality ratings. This model may therefore serve as a surrogate for auditory-perceptual assessment of Parkinsonian vowel quality. Furthermore, the model may offer the clinician a tool to predict who may benefit from Levodopa medication in terms of enhanced voice quality.

29

Clemins, Patrick J., and Michael T. Johnson. "Frequency weighting in the feature extraction process: Effects of parameter choice on generalized perceptual linear prediction coefficients." Journal of the Acoustical Society of America 118, no. 3 (September 2005): 2019. http://dx.doi.org/10.1121/1.4785744.

30

Lalitha, S., and Deepa Gupta. "An Encapsulation of Vital Non-Linear Frequency Features for Various Speech Applications." Journal of Computational and Theoretical Nanoscience 17, no. 1 (January 1, 2020): 303–7. http://dx.doi.org/10.1166/jctn.2020.8666.

Abstract:

Mel Frequency Cepstral Coefficients (MFCCs) and Perceptual Linear Prediction Coefficients (PLPCs) are widely used nonlinear vocal parameters in the majority of speaker identification, speaker and speech recognition techniques, as well as in the field of emotion recognition. Since the 1980s, significant effort has been devoted to improving these features. Considerations such as the use of appropriate frequency estimation approaches, the design of appropriate filter banks, and the selection of preferred features play a vital part in the strength of models employing these features. This article presents an overview of MFCC and PLPC features for different speech applications. Aspects such as accuracy metrics, background environment, type of data, and feature size are examined and summarized with the corresponding key references. In addition, the advantages and shortcomings of these features are discussed. This background work will hopefully contribute a step toward the enhancement of MFCC and PLPC with respect to novelty, higher accuracy, and lower complexity.

31

Eringis, Deividas, and Gintautas Tamulevičius. "Improving Speech Recognition Rate through Analysis Parameters." Electrical, Control and Communication Engineering 5, no. 1 (May 1, 2014): 61–66. http://dx.doi.org/10.2478/ecce-2014-0009.

Abstract:

The speech signal is redundant and non-stationary by nature. Because of vocal tract inertia, its variations are not very rapid, and the signal can be considered stationary over short segments. The short-time magnitude spectrum is presumed to contain the most distinctive information in speech. This is the main reason for analyzing the speech signal frame by frame. For this purpose, the analyzed speech signal is divided into overlapping segments (so-called frames). Segments of 15-25 ms with an overlap of 10-15 ms are typically used. In this paper we present the results of our investigation of the influence of analysis window length and frame shift on speech recognition rate. We analyzed three different cepstral analysis approaches for this purpose: mel frequency cepstral analysis (MFCC), linear prediction cepstral analysis (LPCC) and perceptual linear prediction cepstral analysis (PLPC). The highest speech recognition rate was obtained using a 10 ms analysis window with the frame shift varying from 7.5 to 10 ms (regardless of analysis type). The highest increase in recognition rate was 2.5%.

32

Upadhya, Savitha S., A. N. Cheeran, and J. H. Nirmal. "Multitaper perceptual linear prediction features of voice samples to discriminate healthy persons from early stage Parkinson diseased persons." International Journal of Speech Technology 21, no. 3 (November 15, 2017): 391–99. http://dx.doi.org/10.1007/s10772-017-9473-6.

33

R, Thiruven Gatanadhan. "Speech/music classification using PLP and SVM." International Journal of Engineering and Computer Science 8, no. 02 (February 14, 2019): 24469–72. http://dx.doi.org/10.18535/ijecs.v8i02.4277.

Abstract:

Automatic audio classification is very useful in audio indexing, content-based audio retrieval, and online audio distribution. This paper deals with the speech/music classification problem, starting from a set of features extracted directly from audio data. The accuracy of the classification relies on the strength of the features and the classification scheme. In this work, Perceptual Linear Prediction (PLP) features are extracted from the input signal. After feature extraction, classification is carried out using a Support Vector Machine (SVM) model. The proposed feature extraction and classification models result in better accuracy in speech/music classification.

34

Zhang, Chun Ling, Sheng Hui Zhao, Hong Yuan Xiao, Jing Wang, and Jing Ming Kuang. "An Improved Method for AMR-WB Speech Codec." Advanced Materials Research 756-759 (September 2013): 1259–63. http://dx.doi.org/10.4028/www.scientific.net/amr.756-759.1259.

Abstract:

An improved method is proposed in this paper to skip the look-ahead period. The improved method uses the autocorrelation algorithm to calculate the Linear Prediction (LP) coefficients, which are then employed to extrapolate new samples that replace the look-ahead samples. To evaluate the quality of this method, perceptual evaluation of speech quality (PESQ) and the A/B listening test are used for objective and subjective evaluation, respectively. The reconstructed quality of the modified method is close to that of the original AMR codec, while the delay of the improved method is 5 ms lower than that of the original method.

35

Ijspeert, Auke Jan, Jun Nakanishi, Heiko Hoffmann, Peter Pastor, and Stefan Schaal. "Dynamical Movement Primitives: Learning Attractor Models for Motor Behaviors." Neural Computation 25, no. 2 (February 2013): 328–73. http://dx.doi.org/10.1162/neco_a_00393.

Abstract:

Nonlinear dynamical systems have been used in many disciplines to model complex behaviors, including biological motor control, robotics, perception, economics, traffic prediction, and neuroscience. While often the unexpected emergent behavior of nonlinear systems is the focus of investigations, it is of equal importance to create goal-directed behavior (e.g., stable locomotion from a system of coupled oscillators under perceptual guidance). Modeling goal-directed behavior with nonlinear systems is, however, rather difficult due to the parameter sensitivity of these systems, their complex phase transitions in response to subtle parameter changes, and the difficulty of analyzing and predicting their long-term behavior; intuition and time-consuming parameter tuning play a major role. This letter presents and reviews dynamical movement primitives, a line of research for modeling attractor behaviors of autonomous nonlinear dynamical systems with the help of statistical learning techniques. The essence of our approach is to start with a simple dynamical system, such as a set of linear differential equations, and transform those into a weakly nonlinear system with prescribed attractor dynamics by means of a learnable autonomous forcing term. Both point attractors and limit cycle attractors of almost arbitrary complexity can be generated. We explain the design principle of our approach and evaluate its properties in several example applications in motor control and robotics.

36

Pei, Yan, Qiangfu Zhao, and Yong Liu. "Kernel Method Based Human Model for Enhancing Interactive Evolutionary Optimization." Scientific World Journal 2015 (2015): 1–12. http://dx.doi.org/10.1155/2015/185860.

Abstract:

A fitness landscape presents the relationship between an individual and its reproductive success in evolutionary computation (EC). However, a discrete and approximate landscape in the original search space may not provide sufficient and accurate information for EC search, especially in interactive EC (IEC). The fitness landscape of human subjective evaluation in IEC is very difficult, if not impossible, to model, even with a hypothesis of what its definition might be. In this paper, we propose a method to establish a human model in a projected high-dimensional search space by kernel classification for enhancing IEC search. Because bivalent logic is the simplest perceptual paradigm, the human model is established according to this paradigm. In feature space, we design a linear classifier as a human model to obtain user preference knowledge, which cannot be supported linearly in the original discrete search space. The human model established by this method is used to predict potential human perceptual knowledge. With the human model, we design an evolution control method to enhance IEC search. Experimental evaluation with a pseudo-IEC user shows that our proposed model and method can enhance IEC search significantly.

37

Luo, Shijian, Yufei Zhang, Jie Zhang, and Junheng Xu. "A User Biology Preference Prediction Model Based on the Perceptual Evaluations of Designers for Biologically Inspired Design." Symmetry 12, no. 11 (November 12, 2020): 1860. http://dx.doi.org/10.3390/sym12111860.

Abstract:

Biology provides a rich and novel source of inspiration for product design. An increasing number of industrial designers are gaining inspiration from nature, producing creative products by extracting, classifying, and reconstructing biological features. However, the current process of gaining biological inspiration is still limited by the prior knowledge and experience of designers, so it is necessary to investigate the designer’s perception of biological features. Herein, we investigate designer perceptions of bionic object features based on Kansei engineering, achieving a highly comprehensive structured expression of biological features forming five dimensions—Overall Feeling, Ability and Trait, Color and Texture, Apparent Tactile Sensation, and Structural Features—using factor analysis. Further, producing creative design solutions with a biologically inspired design (BID) has a risk of failing to meet user preferences and market needs. A user preference prediction support tool may address this bottleneck. We examine user preference by questionnaire and explore its association with the perceptual evaluation of designers, obtaining a user preference prediction model by conducting multiple linear regression analysis. This provides a statistical model for identifying the relative weighting of the perception dimensions of each designer in the user preference for an animal, giving the degree of contribution to the user preference. The experiment results show that the dimension “Overall Feeling” of the designer perception is positively correlated with the “like” level of the user preference and negatively correlated with the “dislike” level of the user preference, indicating that this prediction model bridges the gap caused by the asymmetry between designers and users by matching the designer perception and user preference. 
To a certain extent, this research solves the problems associated with the cognitive limitations of designers and the differences between designers and users, facilitating the use of biological features in product design and thereby enhancing the market importance of BID schemes.

38

Chen, Jessica, Henry Milner, Ion Stoica, and Jibin Zhan. "Benchmark of Bitrate Adaptation in Video Streaming." Journal of Data and Information Quality 13, no. 4 (December 31, 2021): 1–24. http://dx.doi.org/10.1145/3468063.

Abstract:

The HTTP adaptive streaming technique opened the door to cope with the fluctuating network conditions during the streaming process by dynamically adjusting the volume of the future chunks to be downloaded. The bitrate selection in this adjustment inevitably involves the task of predicting the future throughput of a video session, owing to which various heuristic solutions have been explored. The ultimate goal of the present work is to explore the theoretical upper bounds of the QoE that any ABR algorithm can possibly reach, therefore providing an essential step to benchmarking the performance evaluation of ABR algorithms. In our setting, the QoE is defined in terms of a linear combination of the average perceptual quality and the buffering ratio. The optimization problem is proven to be NP-hard when the perceptual quality is defined by chunk size and conditions are given under which the problem becomes polynomially solvable. Enriched by a global lower bound, a pseudo-polynomial time algorithm along the dynamic programming approach is presented. When the minimum buffering is given higher priority over higher perceptual quality, the problem is shown to be also NP-hard, and the above algorithm is simplified and enhanced by a sequence of lower bounds on the completion time of chunk downloading, which, according to our experiment, brings a 36.0% performance improvement in terms of computation time. To handle large amounts of data more efficiently, a polynomial-time algorithm is also introduced to approximate the optimal values when minimum buffering is prioritized. Besides its performance guarantee, this algorithm is shown to reach 99.938% close to the optimal results, while taking only 0.024% of the computation time compared to the exact algorithm in dynamic programming.

39

Holly, Jan E. "Baselines for three-dimensional perception of combined linear and angular self-motion with changing rotational axis." Journal of Vestibular Research 10, no. 4-5 (November 1, 2000): 163–78. http://dx.doi.org/10.3233/ves-2000-104-501.

Abstract:

The laws of physics explain many human misperceptions of whole-body passive self-motion. One classic misperception occurs in a rotating chair in the dark: If the chair is decelerated to a stop after a period of counterclockwise rotation, then a subject will typically perceive clockwise rotation. The laws of physics show that, indeed, a clockwise rotation would be perceived even by a perfect processor of angular acceleration information, assuming that the processor is initialized (prior to the deceleration) with a typical subject's initial perception – of no rotation in this case. The motion perceived by a perfect acceleration processor serves as a baseline by which to judge human self-motion perception; this baseline makes a rough prediction and also forms a basis for comparison, with uniquely physiological properties of perception showing up as deviations from the baseline. These same principles, using the motion perceived by a perfect acceleration processor as a baseline, are used in the present paper to investigate complex motions that involve simultaneous linear and angular accelerations with a changing axis of rotation. Baselines – motions that would be perceived by a perfect acceleration processor, given the same initial perception (prior to the motion of interest) as that of a typical subject – are computed for the acceleration and deceleration stages of centrifuge runs in which the human carriage tilts along with the vector resultant of the centripetal and gravity vectors. The computations generate a three-dimensional picture of the motion perceived by a perfect acceleration processor, by simultaneously using all six interacting degrees of freedom (three angular and three linear) and taking into account the non-commutativity of rotations in three dimensions. 
The resulting three-dimensional baselines predict stronger perceptual effects during deceleration than during acceleration, despite the equal magnitudes (with opposite direction) of forces on the subject during acceleration and deceleration. For a centrifuge run with the subject facing tangentially in the direction of motion, the deceleration baseline shows a perception of forward tumble (pitch rotation) beginning with ascent from the earth, while the acceleration baseline does not have analogous pitch and vertical motion. These results give a three-dimensional explanation for certain puzzling acceleration-deceleration perceptual differences observed experimentally by Guedry, Rupert, McGrath, and Oman (Journal of Vestibular Research, 1992 (2).). The present analysis is consistent with, and expands upon, previous analyses of individual components of motion.

40

Trabelsi, Imen, and Med Salim Bouhlel. "Feature Selection for GUMI Kernel-Based SVM in Speech Emotion Recognition." International Journal of Synthetic Emotions 6, no. 2 (July 2015): 57–68. http://dx.doi.org/10.4018/ijse.2015070104.

Abstract:

Speech emotion recognition is an indispensable requirement for efficient human-machine interaction. Most modern automatic speech emotion recognition systems use Gaussian mixture models (GMM) and Support Vector Machines (SVM). GMM are known for their performance and scalability in spectral modeling, while SVM are known for their discriminatory power. A GMM-supervector characterizes an emotional style by the GMM parameters (mean vectors, covariance matrices, and mixture weights). GMM-supervector SVM benefits from both the GMM and SVM frameworks. In this paper, the GMM-UBM mean interval (GUMI) kernel based on the Bhattacharyya distance is successfully used. CFSSubsetEval combined with the Best first algorithm and Greedy stepwise were also applied to the supervector space in order to select the most important features. This framework is illustrated using Mel-frequency cepstral coefficients (MFCC) and Perceptual Linear Prediction (PLP) features on two different emotional databases, namely the Surrey Audio-Expressed Emotion database and the Berlin Emotional Speech Database.

41

Dua, Mohit, Rajesh Kumar Aggarwal, and Mantosh Biswas. "Optimizing Integrated Features for Hindi Automatic Speech Recognition System." Journal of Intelligent Systems 29, no. 1 (October 1, 2018): 959–76. http://dx.doi.org/10.1515/jisys-2018-0057.

Abstract:

An automatic speech recognition (ASR) system translates spoken words or utterances (isolated, connected, continuous, and spontaneous) into text format. State-of-the-art ASR systems mainly use Mel frequency (MF) cepstral coefficient (MFCC), perceptual linear prediction (PLP), and Gammatone frequency (GF) cepstral coefficient (GFCC) for extracting features in the training phase of the ASR system. Initially, the paper proposes a sequential combination of all three feature extraction methods, taking two at a time. Six combinations, MF-PLP, PLP-MFCC, MF-GFCC, GF-MFCC, GF-PLP, and PLP-GFCC, are used, and the accuracy of the proposed system using all these combinations was tested. The results show that the GF-MFCC and MF-GFCC integrations outperform all other proposed integrations. Further, these two feature vector integrations are optimized using three different optimization methods: particle swarm optimization (PSO), PSO with crossover, and PSO with quadratic crossover (Q-PSO). The results demonstrate that the Q-PSO-optimized GF-MFCC integration shows significant improvement over all other optimized combinations.

42

Bae, Sung-Ho, and Seong-Bae Park. "A Very Fast and Accurate Image Quality Assessment Method based on Mean Squared Error with Difference of Gaussians." Journal of Imaging Science and Technology 64, no. 1 (January 1, 2020): 10502–1. http://dx.doi.org/10.2352/j.imagingsci.technol.2020.64.1.010502.

Abstract:

Mean squared error (MSE) has long been the most useful objective image quality assessment (IQA) metric due to its mathematical tractability and computational simplicity, although it has shown poor correlations with the perceived visual quality for distorted images. In contrast to the MSE, recent IQA methods are more closely related to measured visual quality. However, their applications are somewhat limited due to their heavy computational costs and inapplicability in optimization processes. In order to develop a better IQA method that will be closer to the perceived visual quality, the authors aimed to incorporate simple yet powerful linear features into the form of MSE while retaining the advantages of computational simplicity and desirable mathematical properties of MSE. Through comprehensive experiments, the authors found that the Difference of Gaussians (DoG) kernel significantly improves the prediction performance while keeping the aforementioned advantages in the form of MSE. The proposed method performs better because the DoG filtering closely approximates the behaviors of neural response functions in the visual cortex of the human visual system, thus extracting perceptually important features. At the same time, it retains the computational simplicity and mathematical properties of MSE since DoG is a very simple linear kernel. Their extensive experiments showed that the proposed method provides prediction performance competitive with recent IQA methods at a significantly lower computational complexity.

43

Gfeller, Kate, Jacob Oleson, John F. Knutson, Patrick Breheny, Virginia Driscoll, and Carol Olszewski. "Multivariate Predictors of Music Perception and Appraisal by Adult Cochlear Implant Users." Journal of the American Academy of Audiology 19, no. 02 (February 2008): 120–34. http://dx.doi.org/10.3766/jaaa.19.2.3.

Abstract:

The research examined whether performance by adult cochlear implant recipients on a variety of recognition and appraisal tests derived from real-world music could be predicted from technological, demographic, and life experience variables, as well as speech recognition scores. A representative sample of 209 adults implanted between 1985 and 2006 participated. Using multiple linear regression models and generalized linear mixed models, sets of optimal predictor variables were selected that effectively predicted performance on a test battery that assessed different aspects of music listening. These analyses established the importance of distinguishing between the accuracy of music perception and the appraisal of musical stimuli when using music listening as an index of implant success. Importantly, neither device type nor processing strategy predicted music perception or music appraisal. Speech recognition performance was not a strong predictor of music perception, and primarily predicted music perception when the test stimuli included lyrics. Additionally, limitations in the utility of speech perception in predicting musical perception and appraisal underscore the utility of music perception as an alternative outcome measure for evaluating implant outcomes. Music listening background, residual hearing (i.e., hearing aid use), cognitive factors, and some demographic factors predicted several indices of perceptual accuracy or appraisal of music.

44

Healey, Dione M., David J. Marks, and Jeffrey M. Halperin. "Examining the Interplay Among Negative Emotionality, Cognitive Functioning, and Attention Deficit/Hyperactivity Disorder Symptom Severity." Journal of the International Neuropsychological Society 17, no. 3 (April 5, 2011): 502–10. http://dx.doi.org/10.1017/s1355617711000294.

Abstract:

Cognition and emotion, traditionally thought of as largely distinct, have recently begun to be conceptualized as dynamically linked processes that interact to influence functioning. This study investigated the moderating effects of cognitive functioning on the relationship between negative emotionality and attention deficit/hyperactivity disorder (ADHD) symptom severity. A total of 216 (140 hyperactive/inattentive; 76 typically developing) preschoolers aged 3–4 years were administered a neuropsychological test battery (i.e., NEPSY). To avoid method bias, child negative emotionality was rated by teachers (Temperament Assessment Battery for Children-Revised), and parents rated symptom severity on the ADHD Rating Scale (ADHD-RS-IV). Hierarchical Linear Regression analyses revealed that both negative emotionality and Perceptual-Motor & Executive Functions accounted for significant unique variance in ADHD symptom severity. Significant interactions indicated that when negative emotionality is low, but not high, neuropsychological functioning accounts for significant variability in ADHD symptoms, with lower functioning predicting more symptoms. Emotional and neuropsychological functioning, both individually and in combination, play a significant role in the expression of ADHD symptom severity. (JINS, 2011, 17, 502–510)

45

Moosavi, S. H., R. B. Banzett, and J. P. Butler. "Time course of air hunger mirrors the biphasic ventilatory response to hypoxia." Journal of Applied Physiology 97, no. 6 (December 2004): 2098–103. http://dx.doi.org/10.1152/japplphysiol.00056.2004.

Abstract:

Determining response dynamics of hypoxic air hunger may provide information of use in clinical practice and will improve understanding of basic dyspnea mechanisms. It is hypothesized that air hunger arises from projection of reflex brain stem ventilatory drive (“corollary discharge”) to forebrain centers. If perceptual response dynamics are unmodified by events between brain stem and cortical awareness, this hypothesis predicts that air hunger will exactly track ventilatory response. Thus, during sustained hypoxia, initial increase in air hunger would be followed by a progressive decline reflecting biphasic reflex ventilatory drive. To test this prediction, we applied a sharp-onset 20-min step of normocapnic hypoxia and compared dynamic response characteristics of air hunger with that of ventilation in 10 healthy subjects. Air hunger was measured during mechanical ventilation (minute ventilation = 9 ± 1.4 l/min; end-tidal PCO₂ = 37 ± 2 Torr; end-tidal PO₂ = 45 ± 7 Torr); ventilatory response was measured during separate free-breathing trials in the same subjects. Discomfort caused by “urge to breathe” was rated every 30 s on a visual analog scale. Both ventilatory and air hunger responses were modeled as delayed double exponentials corresponding to a simple linear first-order response but with a separate first-order adaptation. These models provided adequate fits to both ventilatory and air hunger data (r² = 0.88 and 0.66). Mean time constant and time-to-peak response for the average perceptual response (0.36 min⁻¹ and 3.3 min, respectively) closely matched corresponding values for the average ventilatory response (0.39 min⁻¹ and 3.1 min). Air hunger response to sustained hypoxia tracked ventilatory drive with a delay of ∼30 s. Our data provide further support for the corollary discharge hypothesis for air hunger.
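
One plausible reading of the "delayed double exponential" model named in this abstract is a first-order rise minus a slower first-order adaptation, both starting after a pure delay. The Python sketch below assumes that form. The abstract does not give the equation, so the parameterization, the function names, and the example values are hypothetical and are not fitted to the study's data.

```python
import math


def delayed_double_exponential(t, delay, gain, tau_rise, adapt_gain, tau_adapt):
    """Response at time t (min): first-order rise minus first-order adaptation.

    Assumed form: zero until `delay`, then
    gain * (1 - exp(-s / tau_rise)) - adapt_gain * (1 - exp(-s / tau_adapt)),
    where s = t - delay.
    """
    if t <= delay:
        return 0.0
    s = t - delay
    rise = gain * (1.0 - math.exp(-s / tau_rise))
    adaptation = adapt_gain * (1.0 - math.exp(-s / tau_adapt))
    return rise - adaptation


def time_to_peak(delay, gain, tau_rise, adapt_gain, tau_adapt):
    """Time at which the assumed model peaks, from setting its derivative to zero."""
    ratio = (gain * tau_adapt) / (adapt_gain * tau_rise)
    if tau_rise >= tau_adapt or ratio <= 1.0:
        raise ValueError("model has no interior peak for these parameters")
    return delay + math.log(ratio) / (1.0 / tau_rise - 1.0 / tau_adapt)
```

With hypothetical values such as `delay=0.5`, `gain=1.0`, `tau_rise=1.0`, `adapt_gain=0.6`, and `tau_adapt=5.0`, the curve rises quickly, peaks, and then declines toward `gain - adapt_gain`, which is the biphasic shape the abstract describes.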

46

Evermann, Ulrika, Simon Schmitt, Tina Meller, Julia-Katharina Pfarr, Sarah Grezellschak, and Igor Nenadić. "Distress severity in perceptual anomalies moderates the relationship between prefrontal brain structure and psychosis proneness in nonclinical individuals." European Archives of Psychiatry and Clinical Neuroscience 271, no. 6 (February 2, 2021): 1111–22. http://dx.doi.org/10.1007/s00406-020-01229-5.

Abstract:

In the general population, psychosis risk phenotypes occur independently of attenuated prodromal syndromes. Neurobiological correlates of vulnerability could help to understand their meaningfulness. Interactions between the occurrence of psychotic-like experiences (PLE) and other psychological factors, e.g., distress related to PLE, may distinguish psychosis-prone individuals from those without risk of future psychotic disorder. We aimed to investigate whether (a) correlates of total PLE and distress, and (b) symptom dimension-specific moderation effects exist at the brain structural level in non-help-seeking adults reporting PLE below and above the screening criterion for clinical high-risk (CHR). We obtained T1-weighted whole-brain MRI scans from 104 healthy adults from the community without psychosis CHR states for voxel-based morphometry (VBM). Brain structural associations with PLE and PLE distress were analysed with multiple linear regression models. Moderation of PLE by distress severity of two types of positive symptoms from the Prodromal Questionnaire (PQ-16) screening inventory was explored in regions-of-interest after VBM. Total PQ-16 score was positively associated with grey matter volume (GMV) in prefrontal regions, occipital fusiform and lingual gyri (p < 0.05, FDR peak-level corrected). Overall distress severity and GMV were not associated. Examination of distress severity on the positive symptom dimensions as moderators showed reduced strength of the association between PLE and rSFG volume with increased distress severity for perceptual PLE. In this study, brain structural variation was related to PLE level, but not distress severity, suggesting specificity. In healthy individuals, positive relationships between PLE and prefrontal volumes may indicate protective features, which supports the insufficiency of PLE for the prediction of CHR. Additional indicators of vulnerability, such as distress associated with perceptual PLE, change the positive brain structure relationship. Brain structural findings may strengthen clinical objectives through disentanglement of innocuous and risk-related PLE.

47

Cabral, Frederico Soares, Hidekazu Fukai, and Satoshi Tamura. "Feature Extraction Methods Proposed for Speech Recognition Are Effective on Road Condition Monitoring Using Smartphone Inertial Sensors." Sensors 19, no. 16 (August 9, 2019): 3481. http://dx.doi.org/10.3390/s19163481.

Abstract:

The objective of our project is to develop an automatic survey system for road condition monitoring using smartphone devices. One of the main tasks of our project is the classification of paved and unpaved roads. Because recordings in practice will be made with various types of vehicle suspension systems and at various speeds, we use the multiple sensors found in smartphones and state-of-the-art machine learning techniques for signal processing. Although it usually receives little attention, the feature extraction step strongly affects the classification results. Therefore, we have to carefully choose not only the classification method but also the feature extraction method and its parameters. Simple statistics-based features are most commonly used to extract road surface information from acceleration data. In this study, we evaluated mel-frequency cepstral coefficients (MFCC) and perceptual linear prediction (PLP) coefficients as a feature extraction step to improve the accuracy of paved and unpaved road classification. Although both MFCC and PLP were developed in the human speech recognition field, we found that modified MFCC and PLP can be used to improve on the commonly used statistical method.

48

Tang, Yiling, Shunliang Jiang, Shaoping Xu, Tingyun Liu, and Chongxi Li. "Blind Image Quality Assessment Based on Multi-Window Method and HSV Color Space." Applied Sciences 9, no. 12 (June 19, 2019): 2499. http://dx.doi.org/10.3390/app9122499.

Abstract:

To improve the evaluation accuracy for distorted images with various distortion types, an effective blind image quality assessment (BIQA) algorithm based on the multi-window method and the HSV color space is proposed in this paper. We generate multiple normalized feature maps (NFMs) by using the multi-window method to better characterize image degradation from receptive fields of different sizes. Specifically, the distribution statistics are first extracted from the multiple NFMs. Then, Pearson linear correlation coefficients between spatially adjacent pixels in the NFMs are utilized to quantify the structural changes of the distorted images. A Weibull model is utilized to capture distribution statistics of the differential feature maps between the NFMs to more precisely describe the presence of the distortions. Moreover, the entropy and gradient statistics extracted from the HSV color space are employed as a complement to the gray-scale features. Finally, a support vector regressor is adopted to map the perceptual feature vector to an image quality score. Experimental results on five benchmark databases demonstrate that the proposed algorithm achieves higher prediction accuracy and robustness against diverse synthetically and authentically distorted images than the state-of-the-art algorithms while maintaining low computational cost.

49

Dua, Mohit, Rajesh Kumar Aggarwal, and Mantosh Biswas. "Discriminative Training Using Noise Robust Integrated Features and Refined HMM Modeling." Journal of Intelligent Systems 29, no. 1 (February 20, 2018): 327–44. http://dx.doi.org/10.1515/jisys-2017-0618.

Abstract:

The classical approach to building an automatic speech recognition (ASR) system uses different feature extraction methods at the front end and various parameter classification techniques at the back end. The Mel-frequency cepstral coefficients (MFCC) and perceptual linear prediction (PLP) techniques are the conventional approaches used for many years for feature extraction, and the hidden Markov model (HMM) has been the most obvious selection for feature classification. However, the performance of MFCC-HMM and PLP-HMM-based ASR systems degrades in real-time environments. The proposed work discusses the implementation of a discriminatively trained Hindi ASR system using noise-robust integrated features and a refined HMM model. It sequentially combines MFCC with PLP and MFCC with gammatone-frequency cepstral coefficient (GFCC) to obtain MF-PLP and MF-GFCC integrated feature vectors, respectively. The HMM parameters are refined using genetic algorithm (GA) and particle swarm optimization (PSO). Discriminative training of the acoustic model using maximum mutual information (MMI) and minimum phone error (MPE) is performed to enhance the accuracy of the proposed system. The results show that discriminative training using MPE with the MF-GFCC integrated feature vector and PSO-HMM parameter refinement gives significantly better results than the other implemented techniques.

50

Scholes, Chris, Paul V. McGraw, Marcus Nyström, and Neil W. Roach. "Fixational eye movements predict visual sensitivity." Proceedings of the Royal Society B: Biological Sciences 282, no. 1817 (October 22, 2015): 20151568. http://dx.doi.org/10.1098/rspb.2015.1568.

Abstract:

During steady fixation, observers make small fixational saccades at a rate of around 1–2 per second. Presentation of a visual stimulus triggers a biphasic modulation in fixational saccade rate—an initial inhibition followed by a period of elevated rate and a subsequent return to baseline. Here we show that, during passive viewing, this rate signature is highly sensitive to small changes in stimulus contrast. By training a linear support vector machine to classify trials in which a stimulus is either present or absent, we directly compared the contrast sensitivity of fixational eye movements with individuals' psychophysical judgements. Classification accuracy closely matched psychophysical performance, and predicted individuals' threshold estimates with less bias and overall error than those obtained using specific features of the signature. Performance of the classifier was robust to changes in the training set (novel subjects and/or contrasts) and good prediction accuracy was obtained with a practicable number of trials. Our results indicate a tight coupling between the sensitivity of visual perceptual judgements and fixational eye control mechanisms. This raises the possibility that fixational saccades could provide a novel and objective means of estimating visual contrast sensitivity without the need for observers to make any explicit judgement.
