Academic literature on the topic 'Speech Activity Detection (SAD)'

Journal articles on the topic "Speech Activity Detection (SAD)"

1

Kaur, Sukhvinder, and J. S. Sohal. "Speech Activity Detection and its Evaluation in Speaker Diarization System." International Journal of Computers & Technology 16, no. 1 (2017): 7567–72. http://dx.doi.org/10.24297/ijct.v16i1.5893.

Abstract:
In speaker diarization, the speech/voice activity detection is performed to separate speech, non-speech and silent frames. Zero crossing rate and root mean square value of frames of audio clips have been used to select training data for silent, speech and nonspeech models. The trained models are used by two classifiers, Gaussian mixture model (GMM) and Artificial neural network (ANN), to classify the speech and non-speech frames of audio clip. The results of ANN and GMM classifier are compared by Receiver operating characteristics (ROC) curve and Detection Error Tradeoff (DET) graph. It is concl
2

Gimeno, Pablo, Dayana Ribas, Alfonso Ortega, Antonio Miguel, and Eduardo Lleida. "Unsupervised Adaptation of Deep Speech Activity Detection Models to Unseen Domains." Applied Sciences 12, no. 4 (2022): 1832. http://dx.doi.org/10.3390/app12041832.

Abstract:
Speech Activity Detection (SAD) aims to accurately classify audio fragments containing human speech. Current state-of-the-art systems for the SAD task are mainly based on deep learning solutions. These applications usually show a significant drop in performance when test data are different from training data due to the domain shift observed. Furthermore, machine learning algorithms require large amounts of labelled data, which may be hard to obtain in real applications. Considering both ideas, in this paper we evaluate three unsupervised domain adaptation techniques applied to the SAD task. A
3

Dutta, Satwik, Prasanna Kothalkar, Johanna Rudolph, et al. "Advancing speech activity detection for automatic speech assessment of pre-school children prompted speech using COMBO-SAD." Journal of the Acoustical Society of America 148, no. 4 (2020): 2469–67. http://dx.doi.org/10.1121/1.5146831.

4

Alghifari, Muhammad Fahreza, Teddy Surya Gunawan, Mimi Aminah binti Wan Nordin, Syed Asif Ahmad Qadri, Mira Kartiwi, and Zuriati Janin. "On the use of voice activity detection in speech emotion recognition." Bulletin of Electrical Engineering and Informatics 8, no. 4 (2019): 1324–32. https://doi.org/10.11591/eei.v8i4.1646.

Abstract:
Emotion recognition through speech has many potential applications; however, the challenge comes from achieving a high emotion recognition rate while using limited resources or under interference such as noise. In this paper we have explored the possibility of improving speech emotion recognition by utilizing the voice activity detection (VAD) concept. The emotional voice data from the Berlin Emotion Database (EMO-DB) and a custom-made database, the LQ Audio Dataset, are first preprocessed by VAD before feature extraction. The features are then passed to the deep neural network for classification. In this pap
5

Mahalakshmi, P. "A Review on Voice Activity Detection and Mel-Frequency Cepstral Coefficients for Speaker Recognition (Trend Analysis)." Asian Journal of Pharmaceutical and Clinical Research 9, no. 9 (2016): 360. http://dx.doi.org/10.22159/ajpcr.2016.v9s3.14352.

Abstract:
Objective: The objective of this review article is to give a complete review of various techniques that are used for speech recognition purposes over two decades. Methods: VAD (Voice Activity Detection) and SAD (Speech Activity Detection) techniques are discussed that are used to distinguish voiced from unvoiced signals, and the MFCC (Mel Frequency Cepstral Coefficient) technique is discussed, which detects specific features. Results: The review results show that research in MFCC has been dominant in signal processing in comparison to VAD and other existing techniques. Conclusion: A comparison of differe
6

V, Sethuram, Ande Prasad, and R. Rajeswara Rao. "Metaheuristic adapted convolutional neural network for Telugu speaker diarization." Intelligent Decision Technologies 15, no. 4 (2022): 561–77. http://dx.doi.org/10.3233/idt-211005.

Abstract:
In speech technology, a pivotal role is being played by the Speaker diarization mechanism. In general, speaker diarization is the mechanism of partitioning the input audio stream into homogeneous segments based on the identity of the speakers. The automatic transcription readability can be improved with the speaker diarization as it is good in recognizing the audio stream into the speaker turn and often provides the true speaker identity. In this research work, a novel speaker diarization approach is introduced under three major phases: Feature Extraction, Speech Activity Detection (SAD), and
7

Khalil, Driss, Amrutha Prasad, Petr Motlicek, et al. "An Automatic Speaker Clustering Pipeline for the Air Traffic Communication Domain." Aerospace 10, no. 10 (2023): 876. http://dx.doi.org/10.3390/aerospace10100876.

Abstract:
In air traffic management (ATM), voice communications are critical for ensuring the safe and efficient operation of aircraft. The pertinent voice communications—air traffic controller (ATCo) and pilot—are usually transmitted in a single channel, which poses a challenge when developing automatic systems for air traffic management. Speaker clustering is one of the challenges when applying speech processing algorithms to identify and group the same speaker among different speakers. We propose a pipeline that deploys (i) speech activity detection (SAD) to identify speech segments, (ii) an automati
8

Ghosh, Prolay. "An Improved Convolutional Neural Network For Speech Detection." Journal of Information Systems Engineering and Management 10, no. 3 (2025): 621–30. https://doi.org/10.52783/jisem.v10i3.5951.

Abstract:
The detection of emotions from speech is the aim of this paper. Speech expressing anger, joy and fear has a very high and wide pitch range, whereas speech expressing sad and tired emotions has a very low pitch. Speech emotion detection technology can recognize human emotions, helping machines better understand a user's intentions and improving human-computer interaction. A classification model, a Convolutional Neural Network (CNN) based mainly on Mel Frequency Cepstral Coefficient (MFCC) features, is presented here to detect emotion. Different approaches have been discussed and
9

Zhao, Hui, Yu Tai Wang, and Xing Hai Yang. "Emotion Detection System Based on Speech and Facial Signals." Advanced Materials Research 459 (January 2012): 483–87. http://dx.doi.org/10.4028/www.scientific.net/amr.459.483.

Abstract:
This paper introduces the present status of speech emotion detection. In order to improve the emotion recognition rate of a single mode, a bimodal fusion method based on speech and facial expression is proposed. First, we establish an emotional database including speech and facial expression. For the different emotions (calm, happy, surprise, anger, sad), we extract ten speech parameters and use the PCA method to detect the speech emotion. Then we analyze bimodal emotion detection with fused facial expression information. The experiment results show that the emotion recognition rate with bimodal fu
10

Rajdeep, Bhoomi, Hardik B. Patel, and Sailesh Iyer. "Human Emotion Identification from Speech using Neural Network." International Journal of Computers 16 (November 10, 2022): 87–103. http://dx.doi.org/10.46300/9108.2022.16.15.

Abstract:
Detection of mood and behavior by voice analysis helps to detect the speaker’s mood from the voice frequency. Here, I aim to present a device that detects mood (such as happy or sad) and behavior by voice analysis, using machine learning and artificial intelligence. The device detects the user’s mood. Moreover, it detects the frequency using a trained model and algorithm. The algorithm is trained to catch the frequency that helps to identify the speaker’s mood (happy or sad) and behavior. On the other hand, behavior can be predicted in form, it

Dissertations / Theses on the topic "Speech Activity Detection (SAD)"

1

Näslund, Anton, and Charlie Jeansson. "Robust Speech Activity Detection and Direction of Arrival Using Convolutional Neural Networks." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-297756.

Abstract:
Social robots are becoming more and more common in our everyday lives. In the field of conversational robotics, the development goes towards socially engaging robots with humanlike conversation. This project looked into one of the technical aspects of recognizing speech, namely speech activity detection (SAD). The presented solution uses a convolutional neural network (CNN) based system to detect speech in a forward azimuth area. The project used a dataset from FestVox, called CMU Arctic, which was complemented by adding recorded noises. A library called Pyroomacoustics was used to simulate
2

Wejdelind, Marcus, and Nils Wägmark. "Multi-speaker Speech Activity Detection From Video." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-297701.

Abstract:
A conversational robot will in many cases have to deal with multi-party spoken interaction in which one or more people could be speaking simultaneously. To do this, the robot must be able to identify the speakers in order to attend to them. Our project has approached this problem from a visual point of view where a Convolutional Neural Network (CNN) was implemented and trained using video stream input containing one or more faces from an already existing data set (AVA-Speech). The goal for the network has then been to for each face, and in each point in time, detect the probability of that person speak
3

Murrin, Paul. "Objective measurement of voice activity detectors." Thesis, University of York, 1999. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.325647.

4

Laverty, Stephen William. "Detection of Nonstationary Noise and Improved Voice Activity Detection in an Automotive Hands-free Environment." Link to electronic thesis, 2005. http://www.wpi.edu/Pubs/ETD/Available/etd-051105-110646/.

5

Minotto, Vicente Peruffo. "Audiovisual voice activity detection and localization of simultaneous speech sources." reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 2013. http://hdl.handle.net/10183/77231.

Abstract:
Given the trend toward human-machine interfaces that increasingly allow simple means of interaction, it is natural that research is carried out on techniques that seek to simulate the most conventional means of communication humans use: speech. In the human auditory system, the voice is automatically processed by the brain effectively and easily, often aided by visual information such as lip movement and the location of the speakers. This processing performed by the brain includes two important components that speech-based communication requires: Det
6

Ent, Petr. "Voice Activity Detection." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2009. http://www.nusl.cz/ntk/nusl-235483.

Abstract:
This thesis deals with the use of support vector machines in voice activity detection. The first part examines various kinds of features, their extraction and processing, and finds the optimal combination that gives the best results. The second part presents the voice activity detection system itself and the tuning of its parameters. Finally, the results are compared with two other systems based on different principles. The ERT broadcast news database was used for testing and tuning. The comparison between systems was then carried out on the database from the NIST06 Rich Test Evaluations.
7

Cho, Yong Duk. "Speech detection, enhancement and compression for voice communications." Thesis, University of Surrey, 2001. http://epubs.surrey.ac.uk/842991/.

Abstract:
Speech signal processing for voice communications can be characterised in terms of silence compression, noise reduction, and speech compression. The limit in the channel bandwidth of voice communication systems requires efficient compression of speech and silence signals while retaining the voice quality. Silence compression by means of both voice activity detection (VAD) and comfort noise generation could present transparent speech-quality while substantially lowering the transmission bit-rate, since pause regions between talk spurts do not include any voice information. Thus, this thesis pro
8

Doukas, Nikolaos. "Voice activity detection using energy based measures and source separation." Thesis, Imperial College London, 1998. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.245220.

9

Sinclair, Mark. "Speech segmentation and speaker diarisation for transcription and translation." Thesis, University of Edinburgh, 2016. http://hdl.handle.net/1842/20970.

Abstract:
This dissertation outlines work related to Speech Segmentation – segmenting an audio recording into regions of speech and non-speech, and Speaker Diarization – further segmenting those regions into those pertaining to homogeneous speakers. Knowing not only what was said but also who said it and when, has many useful applications. As well as providing a richer level of transcription for speech, we will show how such knowledge can improve Automatic Speech Recognition (ASR) system performance and can also benefit downstream Natural Language Processing (NLP) tasks such as machine translation and p
10

Vecchiotti, Paolo. "Deep neural networks for speech detection and speaker localization in reverberant environments." Doctoral thesis, Università Politecnica delle Marche, 2019. http://hdl.handle.net/11566/263049.

Abstract:
This thesis addresses Voice Activity Detection (VAD) and Speaker LOCalization (SLOC) in reverberant environments. A data-driven approach characterizes this work, and for this reason deep neural networks are extensively used and analyzed. Although several classical algorithms have long been used for VAD and SLOC, recent advances in machine learning applied to audio have led to encouraging results for VAD and SLOC. Consequently, this thesis proposes numerous successful strategies for VAD and SLOC based on

Books on the topic "Speech Activity Detection (SAD)"

1

Ufimtseva, Nataliya V., Iosif A. Sternin, and Elena Yu Myagkova. Russian psycholinguistics: results and prospects (1966–2021): a research monograph. Institute of Linguistics, Russian Academy of Sciences, 2021. http://dx.doi.org/10.30982/978-5-6045633-7-3.

Abstract:
The monograph reflects the problems of Russian psycholinguistics from the moment of its inception in Russia to the present day and presents its main directions that are currently developing. In addition, theoretical developments and practical results obtained in the framework of different directions and research centers are described in a concise form. The task of the book is to reflect, as far as it is possible in one edition, firstly, the history of the formation of Russian psycholinguistics; secondly, its methodology and developed methods; thirdly, the results obtained in different research

Book chapters on the topic "Speech Activity Detection (SAD)"

1

Alam, Tanvirul, and Akib Khan. "Lightweight CNN for Robust Voice Activity Detection." In Speech and Computer. Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-60276-5_1.

2

Solé-Casals, Jordi, Pere Martí-Puig, Ramon Reig-Bolaño, and Vladimir Zaiats. "Score Function for Voice Activity Detection." In Advances in Nonlinear Speech Processing. Springer Berlin Heidelberg, 2010. http://dx.doi.org/10.1007/978-3-642-11509-7_10.

3

Pertilä, Pasi, Alessio Brutti, Piergiorgio Svaizer, and Maurizio Omologo. "Multichannel Source Activity Detection, Localization, and Tracking." In Audio Source Separation and Speech Enhancement. John Wiley & Sons Ltd, 2018. http://dx.doi.org/10.1002/9781119279860.ch4.

4

Málek, Jiří, and Jindřich Žďánský. "Voice-Activity and Overlapped Speech Detection Using x-Vectors." In Text, Speech, and Dialogue. Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-58323-1_40.

5

Huang, Zhongqiang, and Mary P. Harper. "Speech Activity Detection on Multichannels of Meeting Recordings." In Machine Learning for Multimodal Interaction. Springer Berlin Heidelberg, 2006. http://dx.doi.org/10.1007/11677482_35.

6

Zelinka, Jan. "Deep Learning and Online Speech Activity Detection for Czech Radio Broadcasting." In Text, Speech, and Dialogue. Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-030-00794-2_46.

7

Gupta, Vishwa, and Gilles Boulianne. "Improvements in Language Modeling, Voice Activity Detection, and Lexicon in OpenASR21 Low Resource Languages." In Speech and Computer. Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-48312-7_6.

8

Chu, Stephen M., Etienne Marcheret, and Gerasimos Potamianos. "Automatic Speech Recognition and Speech Activity Detection in the CHIL Smart Room." In Machine Learning for Multimodal Interaction. Springer Berlin Heidelberg, 2006. http://dx.doi.org/10.1007/11677482_29.

9

Górriz, J. M., C. G. Puntonet, J. Ramírez, and J. C. Segura. "Bispectrum Estimators for Voice Activity Detection and Speech Recognition." In Nonlinear Analyses and Algorithms for Speech Processing. Springer Berlin Heidelberg, 2006. http://dx.doi.org/10.1007/11613107_15.

10

Macho, Dušan, Climent Nadeu, and Andrey Temko. "Robust Speech Activity Detection in Interactive Smart-Room Environments." In Machine Learning for Multimodal Interaction. Springer Berlin Heidelberg, 2006. http://dx.doi.org/10.1007/11965152_21.


Conference papers on the topic "Speech Activity Detection (SAD)"

1

Xu, Longting, Mingjun Zhang, Wenbin Zhang, Tianyi Wang, Jiawei Yin, and Yu Gao. "Personal Voice Activity Detection With Ultra-Short Reference Speech." In 2024 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC). IEEE, 2024. https://doi.org/10.1109/apsipaasc63619.2025.10848915.

2

Khmelev, Nikita, Alexandr Anikin, Anastasia Zorkina, et al. "Joint Voice Activity Detection and Quality Estimation for Efficient Speech Preprocessing." In 2025 27th International Conference on Digital Signal Processing and its Applications (DSPA). IEEE, 2025. https://doi.org/10.1109/dspa64310.2025.10977856.

3

Matic, A., V. Osmani, and O. Mayora. "Speech activity detection using accelerometer." In 2012 34th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). IEEE, 2012. http://dx.doi.org/10.1109/embc.2012.6346377.

4

Abdulla, Waleed H., Zhou Guan, and Hou Chi Sou. "Noise robust speech activity detection." In 2009 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT). IEEE, 2009. http://dx.doi.org/10.1109/isspit.2009.5407509.

5

Khoury, Elie, and Matt Garland. "I-Vectors for speech activity detection." In Odyssey 2016. ISCA, 2016. http://dx.doi.org/10.21437/odyssey.2016-48.

6

K, Punnoose A. "New Features for Speech Activity Detection." In SMM19, Workshop on Speech, Music and Mind 2019. ISCA, 2019. http://dx.doi.org/10.21437/smm.2019-6.

7

Laskowski, Kornel, Qin Jin, and Tanja Schultz. "Crosscorrelation-based multispeaker speech activity detection." In Interspeech 2004. ISCA, 2004. http://dx.doi.org/10.21437/interspeech.2004-350.

8

Tsai, TJ, and Nelson Morgan. "Speech activity detection: An economics approach." In ICASSP 2013 - 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2013. http://dx.doi.org/10.1109/icassp.2013.6638987.

9

Soroush, P. Z., M. Angrick, J. Shih, T. Schultz, and D. J. Krusienski. "Speech Activity Detection from Stereotactic EEG." In 2021 IEEE International Conference on Systems, Man, and Cybernetics (SMC). IEEE, 2021. http://dx.doi.org/10.1109/smc52423.2021.9659058.

10

Shahrokhian, Bahar, David Zhou, and Kurt Vanlehn. "Speech-based Automatic Classroom Activity Detection." In 17th International Conference on Computer-Supported Collaborative Learning (CSCL) 2024. International Society of the Learning Sciences, 2024. http://dx.doi.org/10.22318/cscl2024.886155.
