To see the other types of publications on this topic, follow the link: Adaptive multi-rate speech.

Journal articles on the topic 'Adaptive multi-rate speech'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Select a source type:

Consult the top 27 journal articles for your research on the topic 'Adaptive multi-rate speech.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse journal articles from a wide variety of disciplines and organise your bibliography correctly.

1

Abreu-Sernández, V., and C. García-Mateo. "Adaptive multi-rate speech coder for VoIP transmission." Electronics Letters 36, no. 23 (2000): 1978. http://dx.doi.org/10.1049/el:20001344.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Sun, Congcong, Hui Tian, Chin-Chen Chang, et al. "Steganalysis of Adaptive Multi-Rate Speech Based on Extreme Gradient Boosting." Electronics 9, no. 3 (2020): 522. http://dx.doi.org/10.3390/electronics9030522.

Full text
Abstract:
Steganalysis of adaptive multi-rate (AMR) speech is an active research topic, aimed at combating cybercrimes that exploit steganography in AMR speech streams. In this paper, we first present a novel AMR steganalysis model, which utilizes extreme gradient boosting (XGBoost) as the classifier, instead of the support vector machines (SVM) adopted in previous schemes. Compared with the SVM-based model, this new model can facilitate the excavation of potential information from the high-dimensional features and can avoid overfitting. Moreover, to further strengthen the preceding features based on the statistical characteristics of pulse pairs, we present a convergence feature based on the Markov chain to reflect the global characterization of pulse pairs, which is essentially the final state of the Markov transition matrix. Combining the convergence feature with the preceding features, we propose an XGBoost-based steganalysis scheme for AMR speech streams. Finally, we conducted a series of experiments to assess our presented scheme and compared it with previous schemes. The experimental results demonstrate that the proposed scheme is feasible and can provide better performance in detecting the existing steganography methods based on AMR speech streams.
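As the abstract notes, the convergence feature is essentially the final state of a Markov transition matrix over pulse-pair statistics. A minimal sketch of that idea, using an invented 3-state transition matrix rather than real AMR pulse-pair data:

```python
import numpy as np

def convergence_feature(P, tol=1e-12, max_iter=10_000):
    """Iterate the chain from the uniform distribution until the
    state distribution stops changing; return that final state."""
    n = P.shape[0]
    pi = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        nxt = pi @ P                  # one Markov step
        if np.abs(nxt - pi).max() < tol:
            break
        pi = nxt
    return pi

# Toy 3-state transition matrix (rows sum to 1); purely illustrative.
P = np.array([[0.8, 0.1, 0.1],
              [0.2, 0.7, 0.1],
              [0.1, 0.3, 0.6]])
pi = convergence_feature(P)
```

For an ergodic chain this fixed point is the stationary distribution, so the feature summarizes the chain's long-run behavior in a single low-dimensional vector.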
APA, Harvard, Vancouver, ISO, and other styles
3

Dan, Zhengjia, Yue Zhao, Xiaojun Bi, Licheng Wu, and Qiang Ji. "Multi-Task Transformer with Adaptive Cross-Entropy Loss for Multi-Dialect Speech Recognition." Entropy 24, no. 10 (2022): 1429. http://dx.doi.org/10.3390/e24101429.

Full text
Abstract:
At present, most multi-dialect speech recognition models are based on a hard-parameter-sharing multi-task structure, which makes it difficult to reveal how one task contributes to others. In addition, in order to balance multi-task learning, the weights of the multi-task objective function need to be manually adjusted. This makes multi-task learning very difficult and costly because it requires constantly trying various combinations of weights to determine the optimal task weights. In this paper, we propose a multi-dialect acoustic model that combines soft-parameter-sharing multi-task learning with Transformer, and introduce several auxiliary cross-attentions to enable the auxiliary task (dialect ID recognition) to provide dialect information for the multi-dialect speech recognition task. Furthermore, we use the adaptive cross-entropy loss function as the multi-task objective function, which automatically balances the learning of the multi-task model according to the loss proportion of each task during the training process. Therefore, the optimal weight combination can be found without any manual intervention. Finally, for the two tasks of multi-dialect (including low-resource dialect) speech recognition and dialect ID recognition, the experimental results show that, compared with single-dialect Transformer, single-task multi-dialect Transformer, and multi-task Transformer with hard parameter sharing, our method significantly reduces the average syllable error rate of Tibetan multi-dialect speech recognition and the character error rate of Chinese multi-dialect speech recognition.
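The adaptive cross-entropy objective described above balances tasks by their loss proportions during training. The weighting below is an illustrative guess at that idea (tasks with a larger share of the total loss receive larger weights, so no manual weight search is needed), not the paper's exact formula:

```python
# Hedged sketch: loss-proportion weighting for a multi-task objective.
def adaptive_weights(task_losses):
    """Weight each task by its share of the current total loss."""
    total = sum(task_losses)
    return [l / total for l in task_losses]

def combined_loss(task_losses):
    """Weighted sum of per-task losses, recomputed every step."""
    w = adaptive_weights(task_losses)
    return sum(wi * li for wi, li in zip(w, task_losses))

# e.g. ASR loss 2.0 vs dialect-ID loss 0.5: the harder task dominates.
losses = [2.0, 0.5]
```

Because the weights are recomputed at every training step, the balance between speech recognition and dialect ID recognition shifts automatically as each task converges.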
APA, Harvard, Vancouver, ISO, and other styles
4

Tian, Hui, Meilun Huang, Chin-Chen Chang, Yongfeng Huang, Jing Lu, and Yongqian Du. "Steganalysis of Adaptive Multi-Rate Speech Using Statistical Characteristics of Pitch Delay." JUCS - Journal of Universal Computer Science 25, no. 9 (2019): 1131–50. https://doi.org/10.3217/jucs-025-09-1131.

Full text
Abstract:
Steganography is a promising technique for covert communications. However, illegal usage of this technique would facilitate cybercrime activities and thereby pose a great threat to information security. Therefore, it is crucial to study its countermeasure, namely, steganalysis. In this paper, we aim to present an efficient steganalysis method for detecting adaptive-codebook-based steganography in adaptive multi-rate (AMR) speech streams. To achieve this goal, we first design a new low-dimensional feature set for steganalysis, including an improved calibrated Markov transition probability matrix for the second-order difference of pitch delay values (IC-MSDPD) and the probability distribution of the odevity (parity) of pitch delay values (PDOEPD). The dimension of the proposed feature set is 14, far smaller than that of the feature set in the state-of-the-art steganalysis method. Employing the new feature set, we further present a steganalysis scheme for AMR speech based on support vector machines. The presented scheme is evaluated with a large number of AMR-encoded speech samples and compared with the state-of-the-art one. The experimental results show that the proposed method is effective and outperforms the state-of-the-art one in both detection accuracy and computational overhead.
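The two feature families named in this abstract can be illustrated on toy pitch-delay values. The helper names below are mine, and the real IC-MSDPD feature additionally involves calibration and a Markov transition matrix, which this sketch omits:

```python
# Hedged sketch of the raw quantities behind the two feature families:
# the second-order difference of pitch-delay values (basis of IC-MSDPD)
# and the parity ("odevity") distribution of pitch delays (PDOEPD).
def second_order_diff(pitch):
    """d2[i] = p[i+2] - 2*p[i+1] + p[i] for a sequence of pitch delays."""
    return [pitch[i + 2] - 2 * pitch[i + 1] + pitch[i]
            for i in range(len(pitch) - 2)]

def parity_distribution(pitch):
    """Fraction of even vs. odd pitch-delay values."""
    odd = sum(p % 2 for p in pitch)
    return {"even": 1 - odd / len(pitch), "odd": odd / len(pitch)}

# Toy pitch-delay sequence, not real AMR codec output.
pitch = [40, 42, 41, 45]
```

Steganographic embedding perturbs pitch-delay choices, so both the smoothness of the second-order differences and the even/odd balance shift measurably between cover and stego speech.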
APA, Harvard, Vancouver, ISO, and other styles
5

Syarif, Abdusy, and Ahmad Fachril. "PENERAPAN FITUR ADAPTIVE MULTI RATE (AMR) PADA JARINGAN GSM." CommIT (Communication and Information Technology) Journal 4, no. 1 (2010): 17. http://dx.doi.org/10.21512/commit.v4i1.531.

Full text
Abstract:
Adaptive Multirate (AMR) is a feature that plays an important role in the efficient use of voice channels in GSM networks overall; it can improve sound quality dynamically based on real-time measurements between the Mobile Station (MS) and the Base Transceiver Station (BTS). The parameters used in the analysis are the SQI (Speech Quality Index), MOS (Mean Opinion Score), and the sound quality on the network with and without AMR. Measurements were made with the Test Equipment Mobile System (TEMS) while locking devices to a single channel, and the two network configurations were compared. The test results show that voice channels with AMR can increase the SQI value by approximately 40% for full-rate channels and about 60% for half-rate channels, reaching an excellent level; with further research and measurement, even better sound quality is expected. Keywords: AMR, SQI, GSM network.
APA, Harvard, Vancouver, ISO, and other styles
6

Qiu, Yiqin, Hui Tian, Lili Tang, Wojciech Mazurczyk, and Chin-Chen Chang. "Steganalysis of adaptive multi-rate speech streams with distributed representations of codewords." Journal of Information Security and Applications 68 (August 2022): 103250. http://dx.doi.org/10.1016/j.jisa.2022.103250.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Tian, Hui, Yanpeng Wu, Chin-Chen Chang, et al. "Steganalysis of adaptive multi-rate speech using statistical characteristics of pulse pairs." Signal Processing 134 (May 2017): 9–22. http://dx.doi.org/10.1016/j.sigpro.2016.11.013.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Liu, Ranran, Hongxiang Xu, Enxing Zheng, and Yifeng Jiang. "Adaptive filtering for intelligent sensing speech based on multi-rate LMS algorithm." Cluster Computing 20, no. 2 (2017): 1493–503. http://dx.doi.org/10.1007/s10586-017-0871-y.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Sun, Congcong, Azizol Abdullah, Normalia Samian, and Nuur Alifah Roslan. "Steganalysis of Adaptive Multi-Rate Speech with Unknown Embedding Rates Using Multi-Scale Transformer and Multi-Task Learning Mechanism." Journal of Cybersecurity and Privacy 5, no. 2 (2025): 29. https://doi.org/10.3390/jcp5020029.

Full text
Abstract:
As adaptive multi-rate (AMR) speech applications become increasingly widespread, AMR-based steganography presents growing security risks. Conventional steganalysis methods often assume known embedding rates, limiting their practicality in real-world scenarios where embedding rates are unknown. To overcome this limitation, we introduce a novel framework that integrates a multi-scale transformer architecture with multi-task learning for joint classification and regression. The classification task effectively distinguishes between cover and stego samples, while the regression task enhances feature representation by predicting continuous embedding values, providing deeper insights into embedding behaviors. This joint optimization strategy improves model adaptability to diverse embedding conditions and captures the underlying relationships between discrete embedding classes and their continuous distributions. The experimental results demonstrate that our approach achieves higher accuracy and robustness than existing steganalysis methods across varying embedding rates.
APA, Harvard, Vancouver, ISO, and other styles
10

Reddy, Akkireddy Mohan Kumar. "Optimized Multirate Wideband Speech Steganography for Improving Embedding Capacity Compared with Neighbor-Index-Division Codebook Division Algorithm." Revista Gestão Inovação e Tecnologias 11, no. 2 (2021): 1362–76. http://dx.doi.org/10.47059/revistageintec.v11i2.1763.

Full text
Abstract:
Aim: The main motive of this study is to perform Adaptive Multi-Rate Wideband (AMR-WB) speech steganography in network security, producing stego speech with less loss of quality while increasing embedding capacity. Materials and Methods: The TIMIT Acoustic-Phonetic Continuous Speech Corpus consists of about 16,000 speech samples, out of which 1,000 samples were taken, with 80% pretest power, for analyzing the speech steganography. AMR-WB speech steganography is performed with the Diameter Neighbor (DN) codebook partition algorithm (Group 1) and the Neighbor Index Division (NID) codebook division algorithm (Group 2). Results: AMR-WB speech steganography obtained an average quality rate of 2.8893 using DN codebook partition and 2.4196 using the NID codebook division algorithm, in the range of 300 bps embedding capacity. Conclusion: The outcomes of this study prove that the decrease in quality with NID is twice that of DN-based steganography as embedding capacity increases.
APA, Harvard, Vancouver, ISO, and other styles
11

Sheikhan, Mansour. "Modification of codebook search in adaptive multi-rate wideband speech codecs using intelligent optimization algorithms." Neural Computing and Applications 24, no. 3-4 (2012): 911–26. http://dx.doi.org/10.1007/s00521-012-1321-7.

Full text
APA, Harvard, Vancouver, ISO, and other styles
12

Sun, Congcong, Hui Tian, Wojciech Mazurczyk, Chin-Chen Chang, Hanyu Quan, and Yonghong Chen. "Steganalysis of adaptive multi-rate speech with unknown embedding rates using clustering and ensemble learning." Computers and Electrical Engineering 111 (October 2023): 108909. http://dx.doi.org/10.1016/j.compeleceng.2023.108909.

Full text
APA, Harvard, Vancouver, ISO, and other styles
13

Fu, Hongliang, Zhihao Zhuang, Yang Wang, Chen Huang, and Wenzhuo Duan. "Cross-Corpus Speech Emotion Recognition Based on Multi-Task Learning and Subdomain Adaptation." Entropy 25, no. 1 (2023): 124. http://dx.doi.org/10.3390/e25010124.

Full text
Abstract:
To solve the problem of feature distribution discrepancy in cross-corpus speech emotion recognition tasks, this paper proposes an emotion recognition model based on multi-task learning and subdomain adaptation, which alleviates the impact of that discrepancy on emotion recognition. Existing methods have shortcomings in speech feature representation and cross-corpus feature distribution alignment. The proposed model uses a deep denoising auto-encoder as a shared feature extraction network for multi-task learning, and a fully connected layer and softmax layer are added before each recognition task as task-specific layers. Subsequently, a subdomain adaptation algorithm for emotion and gender features is added to the shared network to obtain the shared emotion features and gender features of the source domain and target domain, respectively. Multi-task learning effectively enhances the representation ability of the features, while the subdomain adaptation algorithm promotes their transferability and effectively alleviates the impact of feature distribution differences on emotional features. The average results of six cross-corpus speech emotion recognition experiments show that, compared with other models, the weighted average recall rate is increased by 1.89%~10.07%; the experimental results verify the validity of the proposed model.
APA, Harvard, Vancouver, ISO, and other styles
14

Büker, Aykut, and Cemal Hanilçi. "Exploring the Effectiveness of the Phase Features on Double Compressed AMR Speech Detection." Applied Sciences 14, no. 11 (2024): 4573. http://dx.doi.org/10.3390/app14114573.

Full text
Abstract:
Determining whether an audio signal is single compressed (SC) or double compressed (DC) is a crucial task in audio forensics, as it is closely linked to the integrity of the recording. In this paper, we propose the utilization of phase spectrum-based features for detecting DC narrowband and wideband adaptive multi-rate (AMR-NB and AMR-WB) speech. To the best of our knowledge, phase spectrum features have not been previously explored for DC audio detection. In addition to introducing phase spectrum features, we propose a novel parallel LSTM system that simultaneously learns the most representative features from both the magnitude and phase spectrum of the speech signal and integrates both sets of information to further enhance its performance. Analyses demonstrate significant differences between the phase spectra of SC and DC speech signals, suggesting their potential as representative features for DC AMR speech detection. The proposed phase spectrum features are found to perform as well as magnitude spectrum features for the AMR-NB codec, while outperforming the magnitude spectrum in detecting AMR-WB speech. The proposed phase spectrum features yield an 8% performance improvement in terms of true positive rate over the magnitude spectrogram features. The proposed parallel LSTM system further improves DC AMR-WB speech detection.
APA, Harvard, Vancouver, ISO, and other styles
15

Tao, Huawei, Lei Geng, Shuai Shan, Jingchao Mai, and Hongliang Fu. "Multi-Stream Convolution-Recurrent Neural Networks Based on Attention Mechanism Fusion for Speech Emotion Recognition." Entropy 24, no. 8 (2022): 1025. http://dx.doi.org/10.3390/e24081025.

Full text
Abstract:
The quality of feature extraction plays a significant role in the performance of speech emotion recognition. In order to extract discriminative, affect-salient features from speech signals and thereby improve the performance of speech emotion recognition, in this paper a multi-stream convolution-recurrent neural network based on an attention mechanism (MSCRNN-A) is proposed. Firstly, a multi-stream sub-branches fully convolutional network (MSFCN) based on AlexNet is presented to limit the loss of emotional information. In MSFCN, sub-branches are added behind each pooling layer to retain features at different resolutions, and the different features are fused by addition. Secondly, the MSFCN and a Bi-LSTM network are combined to form a hybrid network that extracts speech emotion features, supplying the temporal structure information of the emotional features. Finally, a feature fusion model based on a multi-head attention mechanism is developed to achieve the best fused features. The proposed method uses an attention mechanism to calculate the contribution degree of different network features and thereby realizes their adaptive fusion by weighting them. To restrain gradient divergence in the network, the different network features and the fused features are connected through shortcut connections to obtain the fusion features used for recognition. The experimental results on three conventional SER corpora, CASIA, EMODB, and SAVEE, show that our proposed method significantly improves network recognition performance, with a recognition rate superior to most existing state-of-the-art methods.
APA, Harvard, Vancouver, ISO, and other styles
16

Li, Feng Lian, Xue Ying Zhang, and Xiao Lin Du. "Real-Time Implementation AMR-WB Algorithm in TMS320VC5509A DSP." Advanced Materials Research 490-495 (March 2012): 519–23. http://dx.doi.org/10.4028/www.scientific.net/amr.490-495.519.

Full text
Abstract:
The adaptive multi-rate wideband (AMR-WB) speech codec is the first standard for both wired and wireless universal broadband speech coding. Real-time implementation of the AMR-WB algorithm on a DSP has important practical significance and will accelerate its adoption in practice. The paper first designs the hardware platform and system software structure for real-time implementation of the AMR-WB algorithm, then introduces the transplantation of the AMR-WB algorithm to the designed hardware system. The paper next discusses the real-time implementation of the optimized algorithm on the hardware platform. Objective test results indicated that the decoded speech of the real-time platform was bit-exact with the test sequences. Synthetic speech from the real-time platform restored the speaker's voice characteristics very well, with good intelligibility and naturalness.
APA, Harvard, Vancouver, ISO, and other styles
17

Bhatt, Ninad, and Yogeshwar Kosta. "Overall performance evaluation of adaptive multi rate 06.90 speech codec based on code excited linear prediction algorithm using MATLAB." International Journal of Speech Technology 15, no. 2 (2012): 119–29. http://dx.doi.org/10.1007/s10772-011-9126-0.

Full text
APA, Harvard, Vancouver, ISO, and other styles
18

Taha, M., E. S. Azarov, D. S. Likhachov, and A. A. Petrovsky. "AN EFFICIENT SPEECH GENERATIVE MODEL BASED ON DETERMINISTIC/STOCHASTIC SEPARATION OF SPECTRAL ENVELOPES." Doklady BGUIR 18, no. 2 (2020): 23–29. http://dx.doi.org/10.35596/1729-7648-2020-18-2-23-29.

Full text
Abstract:
The paper presents a speech generative model that provides an efficient way of generating a speech waveform from its amplitude spectral envelopes. The model is based on a hybrid speech representation that includes deterministic (harmonic) and stochastic (noise) components. The main idea behind the approach originates from the fact that a speech signal has a determined spectral structure that is statistically bound with the deterministic/stochastic energy distribution in the spectrum. The performance of the model is evaluated using an experimental low-bitrate wide-band speech coder. The quality of reconstructed speech is evaluated using objective and subjective methods. Two objective quality characteristics were calculated: Modified Bark Spectral Distortion (MBSD) and Perceptual Evaluation of Speech Quality (PESQ). Narrow-band and wide-band versions of the proposed solution were compared with the MELP (Mixed Excitation Linear Prediction) speech coder and the AMR (Adaptive Multi-Rate) speech coder, respectively. A speech base of two female and two male speakers was used for testing. The performed tests show that the overall performance of the proposed approach is speaker-dependent and is better for male voices. Supposedly, this difference indicates the influence of pitch highness on separation accuracy. Using the proposed approach in an experimental speech compression system provides decent MBSD values and PESQ values comparable with the AMR speech coder at 6.6 kbit/s. Additional subjective listening tests demonstrate that the implemented coding system retains phonetic content and the speaker's identity, which proves the consistency of the proposed approach.
APA, Harvard, Vancouver, ISO, and other styles
19

Cherukuru, Pavani, and Mumtaz Begum Mustafa. "CNN-based noise reduction for multi-channel speech enhancement system with discrete wavelet transform (DWT) preprocessing." PeerJ Computer Science 10 (February 28, 2024): e1901. http://dx.doi.org/10.7717/peerj-cs.1901.

Full text
Abstract:
Speech enhancement algorithms are applied at multiple levels of enhancement to improve the quality of speech signals under noisy environments, in what are known as multi-channel speech enhancement (MCSE) systems. Numerous existing algorithms are used to filter noise in speech enhancement systems; they are typically employed as a pre-processor to reduce noise and improve speech quality. They may, however, be limited in performing well under low signal-to-noise ratio (SNR) conditions, and speech devices are exposed to all kinds of environmental noise, up to high noise levels. The objective of this research is to conduct a noise reduction experiment for a multi-channel speech enhancement (MCSE) system in stationary and non-stationary noisy environments with varying speech signal SNR levels. The experiments examined the performance of the existing and the proposed MCSE systems in filtering environmental noises from low to high SNRs (−10 dB to 20 dB). The experiments were conducted using the AURORA and LibriSpeech datasets, which consist of different types of environmental noises. The existing MCSE (BAV-MCSE) makes use of beamforming, adaptive noise reduction and voice activity detection algorithms (BAV) to filter the noise from speech signals. The proposed MCSE (DWT-CNN-MCSE) system was developed based on discrete wavelet transform (DWT) preprocessing and a convolutional neural network (CNN) for denoising the input noisy speech signals to improve performance accuracy. The performance of the existing BAV-MCSE and the proposed DWT-CNN-MCSE was measured using spectrogram analysis and word recognition rate (WRR). The existing BAV-MCSE reported the highest WRR of 93.77% at a high SNR (20 dB) and 5.64% on average at a low SNR (−10 dB) for different noises. The proposed DWT-CNN-MCSE system proved to perform well at a low SNR, with a WRR of 70.55% and the highest improvement (64.91% WRR) at −10 dB SNR.
APA, Harvard, Vancouver, ISO, and other styles
20

Glista, Danielle, Marianne Hawkins, Jonathan M. Vaisberg, Nazanin Pourmand, Vijay Parsa, and Susan Scollie. "Sound Quality Effects of an Adaptive Nonlinear Frequency Compression Processor with Normal-Hearing and Hearing-Impaired Listeners." Journal of the American Academy of Audiology 30, no. 07 (2019): 552–63. http://dx.doi.org/10.3766/jaaa.16179.

Full text
Abstract:
Frequency lowering (FL) technology offers a means of improving the audibility of high-frequency sounds. For some listeners, the benefit of such technology can be accompanied by a perceived degradation in sound quality, depending on the strength of the FL setting. The studies presented in this article investigate the effect of a new type of FL signal processing for hearing aids, adaptive nonlinear frequency compression (ANFC), on subjective speech quality. Listener ratings of sound quality were collected for speech stimuli processed with systematically varied fitting parameters. Study 1 included 40 normal-hearing (NH) adult and child listeners. Study 2 included 11 hearing-impaired (HI) adult and child listeners. HI listeners were fitted with laboratory-worn hearing aids for use during listening tasks. Speech quality ratings were assessed across test conditions consisting of various strengths of static nonlinear frequency compression (NFC) and ANFC speech. Test conditions included those that were fine-tuned on an individual basis per hearing aid fitting and conditions that were modified to intentionally alter the sound quality of the signal. Listeners rated speech quality using the MUlti Stimulus test with Hidden Reference and Anchor (MUSHRA) test paradigm. Ratings were analyzed for reliability and to compare results across conditions. Results show that interrater reliability is high for both studies, indicating that NH and HI listeners from both adult and child age groups can reliably complete the MUSHRA task. Results comparing sound quality ratings across experimental conditions suggest that both the NH and HI listener groups rate the stimuli intended to have poor sound quality (e.g., anchors and the strongest available parameter settings) as having below-average sound quality ratings. A different trend in the results is reported when considering the other experimental conditions across the listener groups in the studies. Speech quality ratings measured with NH listeners improve as the strength of ANFC decreases, with a range of bad to good ratings reported, on average. Speech quality ratings measured with HI listeners are similar and above-average for many of the experimental stimuli, including those with fine-tuned NFC and ANFC parameters. Overall, HI listeners provide similar sound quality ratings when comparing static and adaptive forms of frequency compression, especially when considering the individualized parameter settings. These findings suggest that a range of settings may result in above-average sound quality for adults and children with hearing impairment. Furthermore, the fitter should fine-tune FL parameters for each individual listener, regardless of the type of FL technology.
APA, Harvard, Vancouver, ISO, and other styles
21

Yeh, Cheng-Yu, and Hung-Hsun Huang. "An Upgraded Version of the Binary Search Space-Structured VQ Search Algorithm for AMR-WB Codec." Symmetry 11, no. 2 (2019): 283. http://dx.doi.org/10.3390/sym11020283.

Full text
Abstract:
Adaptive multi-rate wideband (AMR-WB) speech codecs have been widely used for high speech quality in modern mobile communication systems, e.g., handheld mobile devices. Nevertheless, a major handicap is that a remarkable computational load is required in the vector quantization (VQ) of immittance spectral frequency (ISF) coefficients of an AMR-WB coding. In view of this, a two-stage search algorithm is presented in this paper as an efficient way to reduce the computational complexity of ISF quantization in AMR-WB coding. At stage 1, an input vector is assigned to a search subspace in an efficient manner using the binary search space-structured VQ (BSS-VQ) algorithm, and a codebook search is performed over the subspace at stage 2 using the iterative triangular inequality elimination (ITIE) approach. Through the use of the codeword rejection mechanisms equipped in both stages, the computational load can be remarkably reduced. As compared with the original version of the BSS-VQ algorithm, the upgraded version provides a computational load reduction of up to 51%. Furthermore, this work is expected to satisfy the energy saving requirement when implemented on an AMR-WB codec of mobile devices.
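The triangle-inequality elimination at stage 2 can be sketched as follows, on a toy 2-D codebook rather than actual AMR-WB ISF data: distances from each codeword to a fixed reference point are precomputed, and the triangle inequality rejects codewords that provably cannot beat the current best without computing their distance to the input.

```python
import math

# Hedged sketch of triangle-inequality elimination (TIE) in a nearest-
# codeword search; the toy data and names below are illustrative only.
def tie_search(x, codebook, ref):
    """Return (index, distance) of the nearest codeword to x."""
    d_xr = math.dist(x, ref)
    d_cr = [math.dist(c, ref) for c in codebook]  # precomputable offline
    best_i, best_d = -1, float("inf")
    for i, c in enumerate(codebook):
        # Triangle inequality: d(x, c) >= |d(x, ref) - d(c, ref)|,
        # so this codeword can be rejected without computing d(x, c).
        if abs(d_xr - d_cr[i]) >= best_d:
            continue
        d = math.dist(x, c)
        if d < best_d:
            best_i, best_d = i, d
    return best_i, best_d

codebook = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (2.0, 2.0)]
best_i, best_d = tie_search((1.9, 2.1), codebook, ref=(0.0, 0.0))
```

Every rejected codeword skips a full distance computation, which is where the reported reduction in codebook-search load comes from; the BSS-VQ stage narrows the subspace first so fewer candidates reach this loop at all.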
APA, Harvard, Vancouver, ISO, and other styles
22

Wu, Yanpeng, Huiji Zhang, Yi Sun, and Minghui Chen. "Steganalysis of AMR Based on Statistical Features of Pitch Delay." International Journal of Digital Crime and Forensics 11, no. 4 (2019): 66–81. http://dx.doi.org/10.4018/ijdcf.2019100105.

Full text
Abstract:
The calibrated matrix of the second-order difference of the pitch delay (C-MSDPD) feature has been proven to be effective in detecting steganography based on pitch delay. In this article, a new steganalysis scheme based on multiple statistical features of pitch delay is presented. Analyzing the principle of the adaptive multi-rate (AMR) codec, the pitch delay values in the same frame are divided into groups, in each of which a pitch delay has a closer correlation with the other ones. To depict the characteristics of the pitch delay, two new types of statistical features are adopted in this article. The new features and the C-MSDPD feature are together employed to train a classifier based on a support vector machine (SVM). The experimental results show that the proposed scheme outperforms the existing one at different embedding bit rates and with different speech lengths.
APA, Harvard, Vancouver, ISO, and other styles
23

Xia, Xin, Yunlong Ma, Ye Luo, and Jianwei Lu. "An online intelligent electronic medical record system via speech recognition." International Journal of Distributed Sensor Networks 18, no. 11 (2022): 155013292211344. http://dx.doi.org/10.1177/15501329221134479.

Full text
Abstract:
Traditional electronic medical record systems in hospitals rely on healthcare workers to manually enter patient information, resulting in healthcare workers having to spend a significant amount of time each day filling out electronic medical records. This inefficient interaction seriously affects the communication between doctors and patients and reduces the speed at which doctors can diagnose patients' conditions. The rapid development of deep learning-based speech recognition technology promises to improve this situation. In this work, we build an online electronic medical record system based on speech interaction. The system integrates a medical linguistic knowledge base, a specialized language model, a personalized acoustic model, and a fault-tolerance mechanism. We propose and develop an advanced electronic medical record system with multi-accent adaptive technology to avoid the mistakes caused by accents, which clearly improves the accuracy of speech recognition. To test the proposed speech recognition electronic medical record system, we construct medical speech recognition datasets using audio and electronic medical records from real medical environments. On the datasets from real clinical scenarios, our proposed algorithm significantly outperforms other machine learning algorithms. Furthermore, compared to traditional electronic medical record systems that rely on keyboard input, our system is much more efficient, and its accuracy rate increases with the online time of the system. 
Our results show that the proposed electronic medical record system is expected to revolutionize the traditional working approach of clinical departments: it is more efficient in clinics, with lower time consumption than traditional keyboard-based systems, makes fewer recording mistakes, and reduces the time spent correcting medical records. Because the system is built on a knowledge base of medical terms, it generalizes and adapts well to clinical scenarios in hospitals.
APA, Harvard, Vancouver, ISO, and other styles
24

"Optimized Speech Signal-based Diagnosis of Parkinson's Disease using Machine Learning Techniques – Augmented by An Efficient Feature Selection & Hyperparameter Tuning Approach." Machine Intelligence Research 17, no. 3 (2023): 9523–47. https://doi.org/10.5281/zenodo.10165395.

Full text
Abstract:
Parkinson's disease (PD) is a neuropathological condition that deteriorates over time, particularly in elderly people. Symptoms of PD include difficulty in moving, autonomic dysfunction, depression, dementia, and visual hallucinations. Conventional diagnostic methods can be subjective, as they rely on the evaluation of motions that are frequently subtle to different eyes and hence difficult to describe, which can lead to misdiagnosis. Meanwhile, vocal abnormalities are connected to symptoms in almost 90% of PD patients at initial stages, as suggested by recent research on the diagnosis of PD. The higher efficiency and lower error rate of machine learning (ML) methods on complex, high-dimensional data problems make ML methods a suitable choice for the PD diagnosis task. This study presents the implementation of 12 ML models, i.e., Logistic Regression (LR), SVM (Linear/RBF), K-Nearest Neighbor (KNN), Naive Bayes (NB), Decision Tree (DT), Random Forest (RF), Extra Tree (ET), Gradient Boost (GBoost), Extreme Gradient Boost (XGBoost), Adaptive Boost (AdaBoost), and Multi-layer Perceptron (MLP), to obtain an efficient ML model that accurately classifies PD subjects using a PD speech dataset. The Recursive Feature Elimination (RFE) and Minimum Redundancy - Maximum Relevance (mRMR) feature selection (FS) methods were employed, along with the grid search cross-validation method of hyperparameter tuning. Experimental results identified the RF model with the RFE-generated feature subset (RFE-50) as the best classifier among all 12 ML models, with an improved accuracy of 96.46%, recall of 0.96, precision of 0.97, F1-score of 0.96, and AUC score of 0.998, observed to be the highest among the various ML models employed on the same PD dataset in the recent past.
APA, Harvard, Vancouver, ISO, and other styles
25

Teymourzadeh, Rozita. "Design an Advance computer-aided tool for Image Authentication and Classification." American Journal of Applied Sciences 10, no. 7 (2013): 696–705. https://doi.org/10.3844/ajassp.2013.696.705.

Full text
Abstract:
Over the years, advancements in the fields of digital image processing and artificial intelligence have been applied to solving many real-life problems, as seen in facial image recognition for security systems and identity registration. A bottleneck of identity registration is image processing, carried out as image preprocessing, image region extraction by cropping, feature extraction using Principal Component Analysis (PCA), and image compression using the Discrete Cosine Transform (DCT). Further processing, including filtering and histogram equalization with contrast stretching, is performed to enhance the image as part of the analytical tool. This research work presents a universal, integrated image forgery detection and analysis tool with facial image recognition using a Back Propagation Neural Network (BPNN) processor. The proposed tool is a multi-function smart tool with a novel architecture of programmable error goal and light intensity. Furthermore, its advanced dual database increases the efficiency of high-performance applications. Because facial image recognition will always return a matching output, or the closest possible output image, for every input image irrespective of its authenticity, the universal smart GUI tool is proposed and designed to perform image forgery detection with a high accuracy of ±2% error rate. Meanwhile, a novel structure that provides efficient automatic image forgery detection for all input test images for BPNN recognition is presented. Hence, an input image is authenticated before being fed into the recognition tool.
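The two feature-extraction steps the abstract names, PCA and DCT-based compression, can be illustrated in a few lines. This is a generic sketch on synthetic data, not the authors' implementation; the array shapes and the 8x8 coefficient block are arbitrary choices for demonstration.

```python
# Illustrative PCA feature extraction and 2-D DCT compression on synthetic data.
import numpy as np
from scipy.fft import dctn, idctn
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
faces = rng.random((20, 64))        # 20 flattened stand-in "face" images

# PCA: project each image onto its first 5 principal components
features = PCA(n_components=5).fit_transform(faces)

# DCT compression: keep only the low-frequency 8x8 block of coefficients
img = rng.random((32, 32))
coeffs = dctn(img, norm="ortho")
mask = np.zeros_like(coeffs)
mask[:8, :8] = 1
approx = idctn(coeffs * mask, norm="ortho")

print(features.shape, approx.shape)
```

The low-frequency DCT block carries most of the image energy, which is why truncating the remaining coefficients compresses the image while keeping a recognizable approximation.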
APA, Harvard, Vancouver, ISO, and other styles
26

Shreekant, Gurrapu, Saurabh Mehta, and Shraddha Panbude. "Case Study for Performance Analysis of VoIP Codecs in Non-Mobility Scenarios." April 30, 2018. https://doi.org/10.5281/zenodo.1237411.

Full text
Abstract:
IEEE 802.11 is the most popular standard for WLAN networks. It offers different physical transmission rates. This paper focuses on the multiple transmission rates of 802.11 WLANs and their effect on speech quality. In non-adaptive systems, when the physical layer switches from a higher transmission rate to a lower one different from the one the VoIP flow needs, the switching may result in congestion, high delay, and packet loss, and consequently in speech-quality degradation. However, some algorithms adapt the transmission parameters according to the channel conditions. In this study we demonstrate how parameter choices (different codecs and packet sizes) affect voice quality, network delay, and packet loss. Further, this study presents a comparison between adaptive and non-adaptive methods. The adaptive method has also been evaluated for different congestion levels from the perspective of perceived speech quality.
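The codec/packet-size trade-off the abstract studies comes down to fixed per-packet header overhead. A back-of-the-envelope calculation, not taken from the paper, shows how bandwidth follows from codec bitrate and packetization interval; the 40-byte figure is the standard RTP/UDP/IPv4 header total, and the codec rates are the nominal values for G.711 and G.729.

```python
# Per-stream IP-layer bandwidth as a function of codec rate and packet size.
HEADER_BYTES = 40  # RTP (12) + UDP (8) + IPv4 (20)

def voip_bandwidth_kbps(codec_kbps, packet_ms):
    """Total IP-layer bandwidth in kbps for one voice stream."""
    payload_bytes = codec_kbps * 1000 / 8 * packet_ms / 1000
    packets_per_s = 1000 / packet_ms
    return (payload_bytes + HEADER_BYTES) * 8 * packets_per_s / 1000

# Larger packets amortize the fixed header cost but add packetization delay
print(voip_bandwidth_kbps(64, 20))  # G.711 at 20 ms -> 80.0 kbps
print(voip_bandwidth_kbps(8, 20))   # G.729 at 20 ms -> 24.0 kbps
```

This is why a low-bitrate codec like G.729 pays proportionally more for headers: at 20 ms packets, overhead triples its 8 kbps payload rate to 24 kbps on the wire.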
APA, Harvard, Vancouver, ISO, and other styles
27

Wu, Yiliang, Xuliang Luo, Fengchan Guo, Tinghui Lu, and Cuimei Liu. "Research on multi-scenario adaptive acoustic encoders based on neural architecture search." Frontiers in Physics 12 (December 12, 2024). https://doi.org/10.3389/fphy.2024.1404503.

Full text
Abstract:
This paper presents the Scene Adaptive Acoustic Encoder (SAAE) method, which is tailored to diverse acoustic environments for adaptive design. Hand-crafted acoustic encoders often struggle to adapt to varying acoustic conditions, resulting in performance degradation in end-to-end speech recognition tasks. To address this challenge, the proposed SAAE method learns the differences in acoustic features across different environments and accordingly designs suitable acoustic encoders. By incorporating neural architecture search technology, the effectiveness of the encoder design is enhanced, leading to improved speech recognition performance. Experimental evaluations on three commonly used Mandarin and English datasets (Aishell-1, HKUST, and SWBD) demonstrate the effectiveness of the proposed method. The SAAE method achieves an average error rate reduction of more than 5% compared with existing acoustic encoders, highlighting its capability to deeply analyze speech features in specific scenarios and design high-performance acoustic encoders in a targeted manner.
APA, Harvard, Vancouver, ISO, and other styles