Dissertations / Theses on the topic 'Speaker diarization'

Consult the top 32 dissertations / theses for your research on the topic 'Speaker diarization.'

1

Cui, Ming. "Experiments in speaker diarization using speaker vectors." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-292217.

Abstract:
Speaker Diarization is the task of determining ‘who spoke when?’ in an audio or video recording that contains an unknown amount of speech and also an unknown number of speakers. It has emerged as an increasingly important and dedicated domain of speech research. Initially, it was proposed as a research topic related to automatic speech recognition, where speaker diarization serves as an upstream processing step. Over recent years, however, speaker diarization has become an important key technology for many tasks, such as navigation, retrieval, or higher-level inference on audio data. Our resea
2

Delgado, Flores Héctor. "Fast cross-session speaker diarization." Doctoral thesis, Universitat Autònoma de Barcelona, 2015. http://hdl.handle.net/10803/309290.

Abstract:
Large amounts of audiovisual content are currently created, stored, edited, and distributed, partly owing to practically unlimited storage capacity, worldwide access to the necessary media, and the ubiquitous connectivity provided by the Internet. In this context, adequate and sustainable management is required to enable the search and retrieval of information of interest. This is where speech processing techniques play a crucial role in the automatic labeling and annotation of audiovisual content. Speaker diarization
3

Anguera, Miró Xavier. "Robust speaker diarization for meetings." Doctoral thesis, Universitat Politècnica de Catalunya, 2006. http://hdl.handle.net/10803/6901.

Abstract:
This doctoral thesis presents research in the area of speaker diarization for meeting rooms. It studies the algorithms and the implementation of an offline speaker segmentation and clustering system for meeting recordings, where more than one microphone is usually available for processing. The main block of research was carried out during a two-year stay at the International Computer Science Institute (ICSI, Berkeley, California).

Speaker diarization has been studied extensively for the domain of radio and television recordings. The majority
4

Shum, Stephen (Stephen Hin-Chung). "Unsupervised methods for speaker diarization." Thesis, Massachusetts Institute of Technology, 2011. http://hdl.handle.net/1721.1/66478.

Abstract:
Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011. Cataloged from PDF version of thesis. Includes bibliographical references (p. 93-95).
Given a stream of unlabeled audio data, speaker diarization is the process of determining "who spoke when." We propose a novel approach to solving this problem by taking advantage of the effectiveness of factor analysis as a front-end for extracting speaker-specific features and exploiting the inherent variabilities in the data through the use of unsupervised methods. Upon initial evaluat
5

Wang, David I.-Chung. "Speaker diarization : "who spoke when"." Thesis, Queensland University of Technology, 2012. https://eprints.qut.edu.au/59624/1/David_Wang_Thesis.pdf.

Abstract:
Speaker diarization is the process of annotating an input audio with information that attributes temporal regions of the audio signal to their respective sources, which may include both speech and non-speech events. For speech regions, the diarization system also specifies the locations of speaker boundaries and assigns relative speaker labels to each homogeneous segment of speech. In short, speaker diarization systems effectively answer the question of ‘who spoke when’. There are several important applications for speaker diarization technology, such as facilitating speaker indexing systems
6

Patino, Villar José María. "Efficient speaker diarization and low-latency speaker spotting." Thesis, Sorbonne université, 2019. http://www.theses.fr/2019SORUS003/document.

Abstract:
Speaker diarization (segmentation et regroupement en locuteurs, SRL) involves detecting the speakers in an audio stream and the intervals during which each speaker is active, that is, determining ‘who speaks when’. The first part of the work presented in this thesis uses a speaker modeling approach based on binary keys (BKs) as a solution to speaker diarization. BK modeling is efficient and works without external training data, since it uses only test data. The contributions presented include BK extraction based on the analysis
7

Patino, Villar José María. "Efficient speaker diarization and low-latency speaker spotting." Electronic Thesis or Diss., Sorbonne université, 2019. http://www.theses.fr/2019SORUS003.

8

NIERO, MARCELO DE CAMPOS. "COMPARATIVE STUDY OF TECHNIQUES TO SPEAKER DIARIZATION." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2013. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=23244@1.

Abstract:
PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO
COORDENAÇÃO DE APERFEIÇOAMENTO DO PESSOAL DE ENSINO SUPERIOR
PROGRAMA DE EXCELENCIA ACADEMICA
The speaker diarization task emerged as a way to streamline the human work of retrieving information from audio recordings, with the aim of performing, for example, speech and speaker indexing. In practice, speaker diarization consists of answering the question "Who spoke when?" for a given recording of a telephone call, meeting, or news broadcast, without any prior information about the audio. The answer lets us know the
9

Li, Yi. "Speaker Diarization System for Call-center data." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-286677.

Abstract:
To answer the question "who spoke when", speaker diarization (SD) is a critical step for many speech applications in practice. The task of our project is to build an MFCC-vector based speaker diarization system on top of a speaker verification (SV) system, an existing call-center application that checks the customer’s identity from a phone call. Our speaker diarization system uses 13-dimensional MFCCs as features and performs voice activity detection (VAD), segmentation, linear clustering, and hierarchical clustering based on GMMs and the BIC score. By applying it, we decrease the Equal Error
10

Luque, Serrano Jordi. "Speaker diarization and tracking in multiple-sensor environments." Doctoral thesis, Universitat Politècnica de Catalunya, 2012. http://hdl.handle.net/10803/119777.

Abstract:
This thesis presents research on speaker recognition under real conditions such as meeting rooms, telephone-quality speech, and radio and TV broadcast news. The main objective is the automatic detection and classification of speakers in a smart-room scenario. Acoustic speaker recognition is the use of a machine to identify an individual from a spoken sentence. It aims to process the acoustic signals and convert them into symbolic descriptions corresponding to the identities of the speakers. For the last several years, speaker recognition in
11

Otterson, Scott. "Use of speaker location features in meeting diarization." Thesis, University of Washington, 2008. http://hdl.handle.net/1773/15463.

12

Fu, Rong. "Robust speaker diarization for single channel recorded meetings." Thesis, University of York, 2009. http://etheses.whiterose.ac.uk/1722/.

Abstract:
This thesis describes research into speaker diarization for recorded meetings. It explores the algorithms and the implementation of an off-line speaker segmentation and clustering system for meetings that have been recorded using one microphone. Speaker diarization is defined as a process of partitioning a spoken record into speaker-homogeneous regions. The meeting record contains different kinds of noise and the length of the noise varies significantly. The average speech-turn is short and the number of speakers is unknown. To reduce the influence of these aural characteristics on the perform
13

Yin, Ruiqing. "Steps towards end-to-end neural speaker diarization." Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLS261/document.

Abstract:
The task of speaker diarization (segmentation et regroupement en locuteurs) consists of identifying "who speaks when" in an audio stream without prior knowledge of the number of speakers or their respective speaking times. Speaker diarization systems are generally built by combining four main stages. First, regions that contain no speech, such as silence, music, and noise, are removed by voice activity detection (VAD). Next, the speech regions are divided into speaker-homogeneous segments by
14

Zelenák, Martin. "Detection and handling of overlapping speech for speaker diarization." Doctoral thesis, Universitat Politècnica de Catalunya, 2012. http://hdl.handle.net/10803/72431.

Abstract:
For the last several years, speaker diarization has been attracting substantial research attention as one of the spoken language technologies applied for the improvement, or enrichment, of recording transcriptions. Recordings of meetings, compared to other domains, exhibit an increased complexity due to the spontaneity of speech, reverberation effects, and also due to the presence of overlapping speech. Overlapping speech refers to situations when two or more speakers are speaking simultaneously. In meeting data, a substantial portion of errors of the conventional speaker diarization syst
15

Vajaria, Himanshu. "Diarization, localization and indexing of meeting archives." [Tampa, Fla] : University of South Florida, 2008. http://purl.fcla.edu/usf/dc/et/SFE0002581.

16

Zewoudie, Abraham Woubie. "Discriminative features for GMM and i-vector based speaker diarization." Doctoral thesis, Universitat Politècnica de Catalunya, 2017. http://hdl.handle.net/10803/461086.

Abstract:
Speaker diarization has received considerable research attention over the last decade. Among the different domains of speaker diarization, diarization in the meeting domain is the most challenging. It usually contains spontaneous speech and is, for example, susceptible to reverberation. The appropriate selection of speech features is one of the factors that affect the performance of speaker diarization systems. Mel Frequency Cepstral Coefficients (MFCC) are the most widely used short-term speech features in speaker diarization. Other factors that affect the performance of speaker diarization sy
17

Ghaemmaghami, Houman. "Robust automatic speaker linking and attribution." Thesis, Queensland University of Technology, 2013. https://eprints.qut.edu.au/60832/4/Houman_Ghaemmaghami_Thesis.pdf.

Abstract:
This research makes a major contribution which enables efficient searching and indexing of large archives of spoken audio based on speaker identity. It introduces a novel technique dubbed “speaker attribution”, which is the task of automatically determining ‘who spoke when?’ in recordings and then automatically linking the unique speaker identities within each recording across multiple recordings. The outcome of the research will also have a significant impact on improving the performance of automatic speech recognition systems through the extracted speaker identities.
18

Abdelraheem, Mahmoud Fakhry Mahmoud. "Exploiting spatial and spectral information for audio source separation and speaker diarization." Doctoral thesis, University of Trento, 2016. http://eprints-phd.biblio.unitn.it/1876/1/PhD_Thesis.pdf.

Abstract:
The goal of multichannel audio source separation is to produce high quality separated audio signals, observing mixtures of these signals. The difficulty of tackling the problem comes from not only the source propagation through noisy and echoing environments, but also overlapped source signals. Among the different research directions pursued around this problem, the adoption of probabilistic and advanced modeling aims at exploiting the diversity of multichannel propagation, and the redundancy of source signals. Moreover, prior information about the environments or the signals is helpful to imp
19

Sinclair, Mark. "Speech segmentation and speaker diarisation for transcription and translation." Thesis, University of Edinburgh, 2016. http://hdl.handle.net/1842/20970.

Abstract:
This dissertation outlines work related to Speech Segmentation – segmenting an audio recording into regions of speech and non-speech, and Speaker Diarization – further segmenting those regions into those pertaining to homogeneous speakers. Knowing not only what was said but also who said it and when, has many useful applications. As well as providing a richer level of transcription for speech, we will show how such knowledge can improve Automatic Speech Recognition (ASR) system performance and can also benefit downstream Natural Language Processing (NLP) tasks such as machine translation and p
20

Ishizuka, Kentaro. "Studies on Acoustic Features for Automatic Speech Recognition and Speaker Diarization in Real Environments." 京都大学 (Kyoto University), 2009. http://hdl.handle.net/2433/123834.

21

Silva, Sérgio Montazzolli. "Redução de dimensionalidade aplicada à diarização de locutor." Biblioteca Digital de Teses e Dissertações da UFRGS, 2013. http://hdl.handle.net/10183/94745.

Abstract:
A large amount of multimedia data is currently generated every day. These data come from many sources, such as radio or television broadcasts, recordings of lectures, meetings, and telephone conversations, and videos and photos captured by mobile phones, among others. As a result, interest in transcribing multimedia data has grown in recent years; within speech processing, the areas of speaker recognition, speech recognition, speaker diarization, and speaker tracking stand out. The development of these areas has been driven and directed
22

Soldi, Giovanni. "Diarisation du locuteur en temps réel pour les objets intelligents." Electronic Thesis or Diss., Paris, ENST, 2016. http://www.theses.fr/2016ENST0061.

Abstract:
Real-time speaker diarization aims to detect "who is speaking now" in a given audio stream. Most of the online diarization systems proposed so far have focused on less challenging domains, such as broadcast news and plenary speeches, which are characterized by low spontaneity. The first contribution of this thesis is the development of a fully unsupervised, adaptive online speaker diarization system for meeting data, which are more challenging and spontaneous. Because of the high diarization error rates, a semi-supervised approach
23

Tomášek, Pavel. "Kdy kdo mluví?" Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2011. http://www.nusl.cz/ntk/nusl-236946.

Abstract:
This work addresses the task of speaker diarization. The goal is to implement a system which is able to decide "who spoke when". The individual components of the implementation are described. The main parts are feature extraction, voice activity detection, speaker segmentation and clustering, and finally postprocessing. This work also presents the results of the implemented system on test data, including a description of the evaluation. The test data come from the NIST RT evaluations 2005-2007, and the lowest error rate for this dataset is 18.52% DER. The results are compared with the diarization system implemented by Ma
24

Bost, Xavier. "A storytelling machine ? : automatic video summarization : the case of TV series." Thesis, Avignon, 2016. http://www.theses.fr/2016AVIG0216/document.

Abstract:
Over the past ten years, TV series have become increasingly popular. In contrast to classical TV series made up of narratively self-contained episodes, modern TV series develop continuous plots over dozens of successive episodes. However, the narrative continuity of modern TV series conflicts directly with usual viewing conditions: because of modern viewing technologies, new seasons of TV series are watched over short periods of time. As a result, viewers
25

Dupuy, Grégor. "Les collections volumineuses de documents audiovisuels : segmentation et regroupement en locuteurs." Thesis, Le Mans, 2015. http://www.theses.fr/2015LEMA1006/document.

Abstract:
The speaker diarization task (Segmentation et Regroupement en Locuteurs, SRL), as defined by NIST, treats the processing of the recordings in a corpus as independent problems. Recordings are processed separately, and the overall error rate on the corpus is ultimately a weighted average. In this context, the speakers detected by the system are identified by anonymous labels specific to each recording. A speaker who appears in several recordings will therefore be identified by different labels depending on the recording. This situation
26

Le, Lan Gaël. "Analyse en locuteurs de collections de documents multimédia." Thesis, Le Mans, 2017. http://www.theses.fr/2017LEMA1020/document.

Abstract:
Collection-level speaker diarization (segmentation et regroupement en locuteurs, SRL) seeks to answer the question "who speaks when?" across a collection of multimedia documents. It is an essential prerequisite for indexing audiovisual content. The task consists of first segmenting each document by speaker, then clustering the speakers at the scale of the collection. The goal is to assign anonymous labels identifying the speakers, including those who appear in several documents, without knowing in advance either their identity or their number. The difficulty posed by speaker clustering
27

Mariotte, Théo. "Traitement automatique de la parole en réunion par dissémination de capteurs." Electronic Thesis or Diss., Le Mans, 2024. http://www.theses.fr/2024LEMA1001.

Abstract:
This thesis focuses on automatic speech processing, and more specifically on speaker diarization. This task requires segmenting the signal in order to identify events such as the presence of speech, overlapping speech, or speaker changes. The research focuses on the case where the signal is captured by a device placed at the center of a group of speakers, as in meetings. These conditions degrade signal quality because of the distance from the sound sources (distant speech). To mitigate this
28

"ROBUST SPEAKER DIARIZATION FOR MEETINGS." Universitat Politècnica de Catalunya, 2006. http://www.tesisenxarxa.net/TDX-0221107-130541/.

29

Rosário, João Miguel Pinto Carrilho do. "Speaker Diarization using Artificial Intelligence Techniques." Master's thesis, 2020. http://hdl.handle.net/10362/104277.

Abstract:
The goal in Speaker Diarization (SD) is to answer the question "Who spoke when?" for a given audio where two or more people speak taking turns. This task becomes paramount for Automatic Speech Recognition (ASR) applications as it provides structured data that can improve recognition accuracy. Despite having been investigated for decades, diarization still remains an unsolved problem. Current State-of-the-Art methods focus on either designing probabilistic models such as Gaussian Mixture Models (GMM), where embeddings are extracted from feature matrices, or employing Deep Neural Networks such a
30

Chang, Cheng-Jo, and 張乘若. "Speaker Verification and Speaker Diarization based on GMM-HMM Forced Alignment and Recognition." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/p4295t.

Abstract:
Master's thesis (碩士), National Taiwan University (國立臺灣大學), Graduate Institute of Computer Science and Information Engineering (資訊工程學研究所), academic year 106.
For the past several years, the PLDA i-vector scoring technique has achieved great results in speaker diarization. However, to preserve speaker characteristics well, i-vectors need to be extracted from long utterances, so it is hard to process extremely short utterances efficiently. To address this problem, we propose a new framework for speaker diarization in this thesis. First, we use K-means clustering to obtain preliminary speaker diarization results and build preliminary speaker models accordingly. Then we adopt GMM-HMM forced alignment and GMM-HMM
31

Hsu, Wu-Hua, and 許吳華. "A Preliminary Study on Speaker Diarization for Automatic Transcription of Broadcast Radio Speech." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/a3z9vr.

Abstract:
Master's thesis (碩士), National Taipei University of Technology (國立臺北科技大學), Department of Electronic Engineering (電子工程系), academic year 106.
We use a time-delay neural network (TDNN) for speaker diarization. The average DER is 27.74%, which is better than the 31.08% obtained with a GMM. We use the trained automatic speaker diarization system to label the speaker information of unmarked speakers in the NER-210 corpus, and retrain the ASR system with the speaker information timeline it outputs. The experimental results show that, with the speaker information classified by the diarization system, the ASR system reduces the original CER from 20.01% to 19.13%. In addition, the average CER of the basic LSTM model on the automatic spe
32

Tseng, Chun-han, and 曾俊翰. "Chinese Input Method Based on First Mandarin Phonetic Alphabet for Mobile Devices and an Approach in Speaker Diarization with Divide-and-Conquer." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/w2y4q4.

Abstract:
Master's thesis (碩士), National Sun Yat-sen University (國立中山大學), Department of Computer Science and Engineering (資訊工程學系研究所), academic year 96.
There are two research topics in this thesis. First, we implement a highly efficient Chinese input method. Second, we apply a divide-and-conquer scheme to the speaker diarization problem. The implemented Chinese input method transforms an input first-symbol sequence into a character string (a sentence). This means that a user only needs to input the first Mandarin phonetic symbol of each character, which is very efficient compared to current methods. The implementation is based on a dynamic programming scheme and language models. To reduce time complexity, t