Dissertations / Theses on the topic 'Speech Activity Detection (SAD)'

Consult the top 23 dissertations / theses for your research on the topic 'Speech Activity Detection (SAD).'

1

Näslund, Anton, and Charlie Jeansson. "Robust Speech Activity Detection and Direction of Arrival Using Convolutional Neural Networks." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-297756.

Abstract:
Social robots are becoming more and more common in our everyday lives. In the field of conversational robotics, development is moving towards socially engaging robots with humanlike conversation. This project looked into one of the technical aspects of recognizing speech, namely speech activity detection (SAD). The presented solution uses a convolutional neural network (CNN) based system to detect speech in a forward azimuth area. The project used a dataset from FestVox called CMU Arctic, complemented with recorded noises. A library called Pyroomacoustics was used to simulate
2

Wejdelind, Marcus, and Nils Wägmark. "Multi-speaker Speech Activity Detection From Video." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-297701.

Abstract:
A conversational robot will in many cases have to deal with multi-party spoken interaction in which one or more people could be speaking simultaneously. To do this, the robot must be able to identify the speakers in order to attend to them. Our project has approached this problem from a visual point of view where a Convolutional Neural Network (CNN) was implemented and trained using video stream input containing one or more faces from an already existing data set (AVA-Speech). The goal for the network has then been to for each face, and in each point in time, detect the probability of that person speak
3

Murrin, Paul. "Objective measurement of voice activity detectors." Thesis, University of York, 1999. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.325647.

4

Laverty, Stephen William. "Detection of Nonstationary Noise and Improved Voice Activity Detection in an Automotive Hands-free Environment." Link to electronic thesis, 2005. http://www.wpi.edu/Pubs/ETD/Available/etd-051105-110646/.

5

Minotto, Vicente Peruffo. "Audiovisual voice activity detection and localization of simultaneous speech sources." Biblioteca Digital de Teses e Dissertações da UFRGS, 2013. http://hdl.handle.net/10183/77231.

Abstract:
Given the trend toward human-machine interfaces that allow ever simpler means of interaction, it is natural that research is carried out on techniques that seek to simulate the most conventional means of communication humans use: speech. In the human auditory system, voice is automatically processed by the brain effectively and easily, commonly aided by visual information such as lip movement and the location of the speakers. This processing performed by the brain includes two important components that speech-based communication requires: Det
6

Ent, Petr. "Voice Activity Detection." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2009. http://www.nusl.cz/ntk/nusl-235483.

Abstract:
This thesis deals with the use of support vector machines in voice activity detection. The first part examines various kinds of features, their extraction and processing, and finds the combination of features that gives the best results. The second part presents the voice activity detection system itself and the tuning of its parameters. Finally, the results are compared with two other systems based on different principles. The ERT broadcast news database was used for testing and tuning. The comparison between the systems was then carried out on the database from the NIST06 Rich Test Evaluations.
7

Cho, Yong Duk. "Speech detection, enhancement and compression for voice communications." Thesis, University of Surrey, 2001. http://epubs.surrey.ac.uk/842991/.

Abstract:
Speech signal processing for voice communications can be characterised in terms of silence compression, noise reduction, and speech compression. The limit in the channel bandwidth of voice communication systems requires efficient compression of speech and silence signals while retaining the voice quality. Silence compression by means of both voice activity detection (VAD) and comfort noise generation could present transparent speech-quality while substantially lowering the transmission bit-rate, since pause regions between talk spurts do not include any voice information. Thus, this thesis pro
8

Doukas, Nikolaos. "Voice activity detection using energy based measures and source separation." Thesis, Imperial College London, 1998. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.245220.

9

Sinclair, Mark. "Speech segmentation and speaker diarisation for transcription and translation." Thesis, University of Edinburgh, 2016. http://hdl.handle.net/1842/20970.

Abstract:
This dissertation outlines work related to Speech Segmentation – segmenting an audio recording into regions of speech and non-speech, and Speaker Diarization – further segmenting those regions into those pertaining to homogeneous speakers. Knowing not only what was said but also who said it and when, has many useful applications. As well as providing a richer level of transcription for speech, we will show how such knowledge can improve Automatic Speech Recognition (ASR) system performance and can also benefit downstream Natural Language Processing (NLP) tasks such as machine translation and p
10

Thorell, Hampus. "Voice Activity Detection in the Tiger Platform." Thesis, Linköping University, Department of Electrical Engineering, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-6586.

Abstract:
Sectra Communications AB has developed a terminal for encrypted communication called the Tiger platform. During voice communication delays have sometimes been experienced resulting in conversational complications. A solution to this problem, as was proposed by Sectra, would be to introduce voice activity detection, which means a separation of speech parts and non-speech parts of the input signal, to the Tiger platform. By only transferring the speech parts to the receiver, the bandwidth needed should be dramatically decreased. A lower bandwidth needed implies that the delays slowly sh
11

Cooper, Douglas. "Speech Detection using Gammatone Features and One-Class Support Vector Machine." Master's thesis, University of Central Florida, 2013. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/5923.

Abstract:
A network gateway is a mechanism which provides protocol translation and/or validation of network traffic using the metadata contained in network packets. For media applications such as Voice-over-IP, the portion of the packets containing speech data cannot be verified and can provide a means of maliciously transporting code or sensitive data undetected. One solution to this problem is Voice Activity Detection (VAD). Many VADs rely on time-domain features and simple thresholds for efficient speech detection; however, this says little about the signal being passed. More sophisticate
12

Temko, Andriy. "Acoustic event detection and classification." Doctoral thesis, Universitat Politècnica de Catalunya, 2007. http://hdl.handle.net/10803/6880.

Abstract:
The human activity that takes place in meeting rooms or classrooms is reflected in a rich variety of acoustic events, produced either by the human body or by objects that people handle. Determining the identity of the sounds and their position in time can therefore help to detect and describe the human activity taking place in the room. In addition, detecting sounds other than speech can help improve the robustness of speech technologies, such as automatic recognition, under adverse working conditions. The objective of this thesis is the detection and classificat
13

Danko, Michal. "Identifikace hudby, řeči, křiku, zpěvu v audio (video) záznamu." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2016. http://www.nusl.cz/ntk/nusl-255309.

Abstract:
This thesis follows the trend of recent decades of using neural networks to detect speech in noisy data. The text begins with basic knowledge about the topics discussed, such as audio features, machine learning and neural networks. The network parameters are examined in order to provide the most suitable background for the experiments. The main focus of the experiments is to observe the influence of various sound events on speech detection on a small, diverse database. The sound events correlated with speech proved to be the most beneficial. In addition, the accuracy of the acou
14

Podloucká, Lenka. "Identifikace pauz v rušeném řečovém signálu." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2008. http://www.nusl.cz/ntk/nusl-217266.

Abstract:
This diploma thesis deals with pause identification in degraded speech signals. It describes the characteristics of speech and the concept of speech signal processing. The aim of the work was to create a reliable method for recognizing speech and non-speech segments of a speech signal, both with and without degradation. Five empty-pause detectors were implemented in the MATLAB computing environment: an energy detector in the time domain, a two-step detector in the spectral domain, a one-step integral detector, a two-step integral detector and a differential detector in the cepstrum. The s
15

Min-Chang, Chang, and 張民昌. "Voice Activity Detection and Its Application to Speech Coding." Thesis, 2003. http://ndltd.ncl.edu.tw/handle/86106727484751378312.

Abstract:
Master's thesis, National Taipei University of Technology, Department of Electrical Engineering (master's program), academic year 91. A voice activity detector usually serves as the preprocessor of a speech encoder, determining whether the incoming signal is a speech segment or not. If it is, a normal speech coder is used to encode the speech segment. If it is not, a smaller set of parameters, called a silence insertion descriptor (SID), is transmitted to the decoder, and a comfort noise generator (CNG) is then used to mimic the background noise. According to statistics on how people talk, more than 40% and as much as 60% of the time is silence between talk spurts, so lots of bit ra
16

Lai, Chen-Wei, and 賴辰瑋. "The Research on the Voice Activity Detection and Speech Enhancement for Noisy Speech Recognition." Thesis, 2005. http://ndltd.ncl.edu.tw/handle/06070640933840270072.

Abstract:
Master's thesis, National Chi Nan University, Department of Electrical Engineering, academic year 93. When a speech recognizer is applied in a real environment, its performance is often seriously degraded by additive noise. Various approaches have been proposed to improve the robustness of the recognition system under noisy conditions. One direction is to detect the presence of noise, estimate its characteristics, and then remove or alleviate the noise in the speech signals. In the thesis, we first study several voice activity detection (endpoint detection) approaches, which
17

WU, DONG-HAN, and 吳東翰. "Reduced Computation of Speech Coder Using a Voice Activity Detection Algorithm." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/83961092562227492103.

Abstract:
Master's thesis, Southern Taiwan University of Science and Technology, Department of Computer Science and Information Engineering, academic year 105. With the explosive growth of Internet use and multimedia technology, multimedia communication is now integrated into personal information devices. Because such devices have limited computational capability, a need has arisen for a coder with low computational complexity that can match different hardware platforms and integrate the services of media sources. For an Internet or wireless speech communicator, heavy computation uses more power and contributes to a higher price or reduced battery life. In order to achieve the real-time and continuity o
18

Tu, Wen Hsiang, and 杜文祥. "Study on the Voice Activity Detection Techniques for Robust Speech Feature Extraction." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/76966247400637028949.

Abstract:
Master's thesis, National Chi Nan University, Department of Electrical Engineering, academic year 95. The performance of a speech recognition system is often degraded by the mismatch between the development and application environments. One of the major sources of this mismatch is additive noise. The approaches for handling additive noise can be divided into three classes: speech enhancement, robust speech feature extraction, and compensation of speech models. In this thesis, we focus on the second class, robust speech feature extraction. The approaches of speech robust feature extraction are often together with the
19

楊佳興. "A Real-time Speech Purification and Voice Activity Detection System Using Microphone Array." Thesis, 2005. http://ndltd.ncl.edu.tw/handle/qy6qq9.

20

Hsuei, Yan-Jung, and 許晏榮. "SOPC Implementation of Speech Purification and Voice Activity Detection System Using Microphone Array." Thesis, 2005. http://ndltd.ncl.edu.tw/handle/mkjwd4.

Abstract:
Master's thesis, National Chiao Tung University, Department of Electrical and Control Engineering, academic year 94. A real-time speech purification and voice activity detection (VAD) system for noisy indoor environments is proposed in this thesis. The system contains a real-time eight-channel microphone array signal processing platform. An adaptive spatial filter is also designed on the platform so that the system can adapt to environmental characteristics and noise. All the algorithms are realized on a Nios embedded system-on-programmable-chip (SOPC) platform. The VAD algorithm is executed by the Nios processor and the adaptive filter is accelerated by
21

Chen, Hung-Bin, and 陳鴻彬. "On the Study of Energy-Based Speech Feature Normalization and Application to Voice Activity Detection." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/41039482721804356460.

Abstract:
Master's thesis, National Taiwan Normal University, Graduate Institute of Computer Science and Information Engineering, academic year 95. This thesis considered robust speech recognition in various noise environments, with a special focus on investigating ways to reconstruct the clean time-domain log-energy features from the noise-contaminated ones. Based on the distribution characteristics of the log-energy features of each speech utterance, we aimed to develop an efficient approach to rescale the log-energy features of the noisy speech utterance so as to alleviate the mismatch caused by environmental noises for better speech recognition performance. As the time-domain phenomena of the spe
22

ZHENG, SU-XING, and 鄭素幸. "A study on wireless digital subscriber loop and the channel sharing efficiency through speech activity detection." Thesis, 1992. http://ndltd.ncl.edu.tw/handle/03260404361617659124.

23

Venter, Petrus Jacobus. "Recording and automatic detection of African elephant (Loxodonta africana) infrasonic rumbles." Diss., 2008. http://hdl.handle.net/2263/28329.

Abstract:
The value of studying elephant vocalizations lies in the abundant information that can be retrieved from them. Recordings of elephant rumbles can be used by researchers to determine the size and composition of the herd, the sexual state, as well as the emotional condition of an elephant. It is a difficult task for researchers to obtain large volumes of continuous recordings of elephant vocalizations. Recordings are normally analysed manually to identify the location of rumbles via the tedious and time-consuming methods of sped-up listening and the visual evaluation of spectrograms. The applicati