Academic literature on the topic 'Speaker diarization'

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Speaker diarization.'

Journal articles on the topic "Speaker diarization"

1

Karamyan, Davit S., and Grigor A. Kirakosyan. "Building a Speaker Diarization System: Lessons from VoxSRC 2023." Mathematical Problems of Computer Science 60 (November 30, 2023): 52–62. http://dx.doi.org/10.51408/1963-0109.

Abstract:
Speaker diarization is the process of partitioning an audio recording into segments corresponding to individual speakers. In this paper, we present a robust speaker diarization system and describe its architecture. We focus on discussing the key components necessary for building a strong diarization system, such as voice activity detection (VAD), speaker embedding, and clustering. Our system emerged as the winner in the VoxCeleb Speaker Recognition Challenge (VoxSRC) 2023, a widely recognized competition for evaluating speaker diarization systems.
2

Iyer, Apoorva, Deepika Kini, and Shanthi Therese. "Speaker Diarization." International Journal of Computer Trends and Technology 67, no. 9 (2019): 50–54. http://dx.doi.org/10.14445/22312803/ijctt-v67i9p110.

3

V., Subba Ramaiah, Srinivasa Rao S., and Devaraju V.S.N.Kumar. "Speaker Diarization based on Black-Hole Entropy Fuzzy Clustering using Cepstral Features." International Journal of Engineering and Advanced Technology (IJEAT) 9, no. 4 (2020): 1055–61. https://doi.org/10.35940/ijeat.D7832.049420.

Abstract:
Speaker diarization is the process of identifying the speakers in an audio sequence. This paper proposes a speaker diarization method using Black-hole entropy fuzzy clustering (BHEFC) and multiple kernel weighted Mel frequency cepstral coefficient (MKMFCC) parameterization. Initially, the MKMFCC descriptor extracts the cepstral features from the input audio signal. These features are then used to cluster the speakers into groups with BHEFC. The feature parameter uses the audio signal containing both the high and low energy frame for speaker indexing that resulted in accurate se
4

Mr. Chaitanya Pampana, Dr. M. Vijay Reddy, and Dr. K. Jhansi Rani. "A Review on Speaker Diarization for Whispered Speech Audio." International Research Journal on Advanced Engineering and Management (IRJAEM) 3, no. 05 (2025): 1765–73. https://doi.org/10.47392/irjaem.2025.0279.

Abstract:
Speaker diarization, the process of partitioning an audio stream into segments according to the speaker identity, is crucial for various applications in speech processing and analysis. Whispered speech, characterized by its low amplitude and altered spectral properties, presents unique challenges for conventional diarization algorithms designed for clear, normal speech. In this study, I propose a novel approach for supervised speaker diarization specifically tailored to whispered speech audio streams. Supervised learning techniques, utilizing annotated data to train models capable of accuratel
5

Prabhala, Jagat Chaitanya, Venkatnareshbabu K, and Ragoju Ravi. "OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIARIZATION SYSTEMS: A MATHEMATICAL FORMULATION." Applied Mathematics and Sciences An International Journal (MathSJ) 10, no. 1/2 (2023): 1–10. http://dx.doi.org/10.5121/mathsj.2023.10201.

Abstract:
Speaker diarization is a critical task in speech processing that aims to identify "who spoke when?" in an audio or video recording that contains an unknown amount of speech from an unknown number of speakers. Diarization has numerous applications in speech recognition, speaker identification, and automatic captioning. Supervised and unsupervised algorithms are used to address speaker diarization problems, but providing exhaustive labeling for the training dataset can become costly in supervised learning, while accuracy can be compromised when using unsupervised approaches. This
6

Kshirod, Kshirod Sarmah. "Speaker Diarization with Deep Learning Techniques." Turkish Journal of Computer and Mathematics Education (TURCOMAT) 11, no. 3 (2020): 2570–82. http://dx.doi.org/10.61841/turcomat.v11i3.14309.

Abstract:
Speaker diarization is the task of identifying who is speaking when different speakers talk in an audio or video recording. Artificial intelligence (AI) fields have effectively used Deep Learning (DL) to solve a variety of real-world application challenges. With effective applications in a wide range of subdomains, such as natural language processing, image processing, computer vision, speech and speaker recognition, emotion recognition, cyber security, and many others, DL, a very innovative field of Machine Learning (ML), is quickly emerging as the most potent machine learning te
7

PARK, KYUNG-MI, JEONG-SIK PARK, JAE-HYUN BAE, and YUNG-HWAN OH. "ONLINE SPEAKER DIARIZATION FOR MULTIMEDIA DATA RETRIEVAL ON MOBILE DEVICES." International Journal of Pattern Recognition and Artificial Intelligence 26, no. 08 (2012): 1260011. http://dx.doi.org/10.1142/s0218001412600117.

Abstract:
Speaker diarization detects speaker change points in spoken data and organizes speaker clusters so that each cluster contains one speaker's segments. This study aims to develop online speaker diarization for multimedia data retrieval on mobile devices. Researchers have proposed various methods of diarization, but most approaches thus far depend on an empirically determined threshold as a criterion or work in an offline manner that requires prior knowledge, such as the overall number of speakers. There are therefore clear drawbacks with mobile devices, on which various types of spoken data are
8

V, Sethuram, Ande Prasad, and R. Rajeswara Rao. "Metaheuristic adapted convolutional neural network for Telugu speaker diarization." Intelligent Decision Technologies 15, no. 4 (2022): 561–77. http://dx.doi.org/10.3233/idt-211005.

Abstract:
Speaker diarization plays a pivotal role in speech technology. In general, speaker diarization is the mechanism of partitioning the input audio stream into homogeneous segments based on the identity of the speakers. Speaker diarization can improve the readability of automatic transcriptions, as it segments the audio stream into speaker turns and often provides the true speaker identity. In this research work, a novel speaker diarization approach is introduced under three major phases: Feature Extraction, Speech Activity Detection (SAD), and
9

Zaiets, I., V. Brydinskyi, D. Sabodashko, Yu Khoma, Kh Ruda, and M. Shved. "UTILIZATION OF VOICE EMBEDDINGS IN INTEGRATED SYSTEMS FOR SPEAKER DIARIZATION AND MALICIOUS ACTOR DETECTION." Computer systems and network 6, no. 1 (2024): 54–66. http://dx.doi.org/10.23939/csn2024.01.054.

Abstract:
This paper explores the use of diarization systems, which employ advanced machine learning algorithms for the precise detection and separation of different speakers in audio recordings, for the implementation of an intruder detection system. Several state-of-the-art diarization models, including Nvidia's NeMo, Pyannote, and SpeechBrain, are compared. The performance of these models is evaluated using typical metrics for diarization systems, such as diarization error rate (DER) and Jaccard error rate (JER). The diarization system was tested under various audio conditions including noisy environm
10

Noulas, A., G. Englebienne, and B. J. A. Krose. "Multimodal Speaker Diarization." IEEE Transactions on Pattern Analysis and Machine Intelligence 34, no. 1 (2012): 79–93. http://dx.doi.org/10.1109/tpami.2011.47.


Dissertations / Theses on the topic "Speaker diarization"

1

Cui, Ming. "Experiments in speaker diarization using speaker vectors." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-292217.

Abstract:
Speaker Diarization is the task of determining ‘who spoke when?’ in an audio or video recording that contains an unknown amount of speech and also an unknown number of speakers. It has emerged as an increasingly important and dedicated domain of speech research. Initially, it was proposed as a research topic related to automatic speech recognition, where speaker diarization serves as an upstream processing step. Over recent years, however, speaker diarization has become an important key technology for many tasks, such as navigation, retrieval, or higher-level inference on audio data. Our resea
2

Delgado, Flores Héctor. "Fast cross-session speaker diarization." Doctoral thesis, Universitat Autònoma de Barcelona, 2015. http://hdl.handle.net/10803/309290.

Abstract:
Large amounts of audiovisual content are currently created, stored, edited, and distributed, partly owing to practically unlimited storage capacity, to access to the necessary media worldwide and from anywhere, and to the ubiquitous connectivity provided by the Internet. In this context, adequate and sustainable management is required to enable the search and retrieval of the information of interest. This is where speech processing techniques play a crucial role in the automatic labeling and annotation of audiovisual content. Diarization
3

Anguera, Miró Xavier. "Robust speaker diarization for meetings." Doctoral thesis, Universitat Politècnica de Catalunya, 2006. http://hdl.handle.net/10803/6901.

Abstract:
This doctoral thesis presents research in the area of speaker diarization for meeting rooms. It studies the algorithms and the implementation of an offline speaker segmentation and clustering system for meeting recordings, where more than one microphone is usually available for processing. The main block of research was carried out during a two-year stay at the International Computer Science Institute (ICSI, Berkeley, California). Speaker diarization has been studied extensively for the domain of radio and television recordings. The majority
4

Shum, Stephen (Stephen Hin-Chung). "Unsupervised methods for speaker diarization." Thesis, Massachusetts Institute of Technology, 2011. http://hdl.handle.net/1721.1/66478.

Abstract:
Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011. Cataloged from PDF version of thesis. Includes bibliographical references (p. 93-95). Given a stream of unlabeled audio data, speaker diarization is the process of determining "who spoke when." We propose a novel approach to solving this problem by taking advantage of the effectiveness of factor analysis as a front-end for extracting speaker-specific features and exploiting the inherent variabilities in the data through the use of unsupervised methods. Upon initial evaluat
5

Wang, David I.-Chung. "Speaker diarization : "who spoke when"." Thesis, Queensland University of Technology, 2012. https://eprints.qut.edu.au/59624/1/David_Wang_Thesis.pdf.

Abstract:
Speaker diarization is the process of annotating an input audio with information that attributes temporal regions of the audio signal to their respective sources, which may include both speech and non-speech events. For speech regions, the diarization system also specifies the locations of speaker boundaries and assigns relative speaker labels to each homogeneous segment of speech. In short, speaker diarization systems effectively answer the question of ‘who spoke when’. There are several important applications for speaker diarization technology, such as facilitating speaker indexing systems
6

Patino, Villar José María. "Efficient speaker diarization and low-latency speaker spotting." Thesis, Sorbonne université, 2019. http://www.theses.fr/2019SORUS003/document.

Abstract:
Speaker diarization (segmentation et regroupement en locuteurs, SRL) involves detecting the speakers in an audio stream and the intervals during which each speaker is active, that is, determining ‘who speaks when’. The first part of the work presented in this thesis exploits a speaker modeling approach that uses binary keys (BKs) as a solution to SRL. BK modeling is efficient and works without external training data, since it uses only test data. The contributions presented include the extraction of BKs based on the ana
7

Patino, Villar José María. "Efficient speaker diarization and low-latency speaker spotting." Electronic Thesis or Diss., Sorbonne université, 2019. http://www.theses.fr/2019SORUS003.

8

NIERO, MARCELO DE CAMPOS. "COMPARATIVE STUDY OF TECHNIQUES TO SPEAKER DIARIZATION." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2013. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=23244@1.

Abstract:
PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO; COORDENAÇÃO DE APERFEIÇOAMENTO DO PESSOAL DE ENSINO SUPERIOR; PROGRAMA DE EXCELENCIA ACADEMICA. The speaker diarization task emerged as a way to optimize human work in retrieving information about audio recordings, with the goal of performing, for example, speech and speaker indexing. In fact, performing speaker diarization consists of answering, given a recording of a telephone call, meeting, or news broadcast, the question "Who spoke when?" without any prior information about the audio. The answer in question lets us know the
9

Li, Yi. "Speaker Diarization System for Call-center data." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-286677.

Abstract:
To answer the question "who spoke when", speaker diarization (SD) is a critical step for many speech applications in practice. The task of our project is to build an MFCC-vector-based speaker diarization system on top of a speaker verification (SV) system, an existing call-center application that checks a customer's identity from a phone call. Our speaker diarization system uses 13-dimensional MFCCs as features and performs Voice Activity Detection (VAD), segmentation, linear clustering, and hierarchical clustering based on GMM and the BIC score. By applying it, we decrease the Equal Error
10

Luque, Serrano Jordi. "Speaker diarization and tracking in multiple-sensor environments." Doctoral thesis, Universitat Politècnica de Catalunya, 2012. http://hdl.handle.net/10803/119777.

Abstract:
This thesis presents research on speaker recognition in real conditions such as meeting rooms, telephone-quality speech, and radio and TV broadcast news. The main objective is the automatic detection and classification of speakers in a smart-room scenario. Acoustic speaker recognition is the use of a machine to identify an individual from a spoken sentence. It aims to process the acoustic signals and convert them into symbolic descriptions corresponding to the identity of the speakers. For the last several years, speaker recognition in

Book chapters on the topic "Speaker diarization"

1

Friedland, Gerald, and David van Leeuwen. "Speaker Recognition and Diarization." In Semantic Computing. John Wiley & Sons, Inc., 2010. http://dx.doi.org/10.1002/9780470588222.ch7.

2

Nguyen, Trung Hieu, Eng Siong Chng, and Haizhou Li. "Speaker Diarization: An Emerging Research." In Speech and Audio Processing for Coding, Enhancement and Recognition. Springer New York, 2014. http://dx.doi.org/10.1007/978-1-4939-1456-2_8.

3

Nam, Nguyen Duc, and Hieu Trung Huynh. "Speaker Diarization in Vietnamese Voice." In Future Data and Security Engineering. Big Data, Security and Privacy, Smart City and Industry 4.0 Applications. Springer Singapore, 2021. http://dx.doi.org/10.1007/978-981-16-8062-5_31.

4

Avdeeva, Anastasia, and Sergey Novoselov. "Deep Speaker Embeddings Based Online Diarization." In Speech and Computer. Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-20980-2_3.

5

Pechetti, Ganesh, Anakapalli Rohini Durga Bhavani, Abhinav Dayal, and Sreenu Ponnada. "Unraveling the Techniques for Speaker Diarization." In Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering. Springer Nature Switzerland, 2024. http://dx.doi.org/10.1007/978-3-031-48888-7_25.

6

Zajíc, Zbyněk, Jan Zelinka, and Luděk Müller. "Neural Network Speaker Descriptor in Speaker Diarization of Telephone Speech." In Speech and Computer. Springer International Publishing, 2017. http://dx.doi.org/10.1007/978-3-319-66429-3_55.

7

Zhu, Xuan, Claude Barras, Lori Lamel, and Jean-Luc Gauvain. "Speaker Diarization: From Broadcast News to Lectures." In Machine Learning for Multimodal Interaction. Springer Berlin Heidelberg, 2006. http://dx.doi.org/10.1007/11965152_35.

8

Kapsouras, Ioannis, Anastasios Tefas, Nikos Nikolaidis, and Ioannis Pitas. "Multimodal Speaker Diarization Utilizing Face Clustering Information." In Lecture Notes in Computer Science. Springer International Publishing, 2015. http://dx.doi.org/10.1007/978-3-319-21963-9_50.

9

Yue, Yanyan, Jun Du, and Maokui He. "Online Neural Speaker Diarization with Core Samples." In Biometric Recognition. Springer Nature Switzerland, 2022. http://dx.doi.org/10.1007/978-3-031-20233-9_37.

10

Pande, Vinod K., Vijay K. Kale, and Sumegh Tharewal. "Audio Data Feature Extraction for Speaker Diarization." In Proceedings of the NIELIT's International Conference on Communication, Electronics and Digital Technology. Springer Nature Singapore, 2024. http://dx.doi.org/10.1007/978-981-97-3601-0_18.


Conference papers on the topic "Speaker diarization"

1

Ichikawa, Keigo, Sei Ueno, and Akinobu Lee. "Data generation for speaker diarization by speaker transition information." In 2024 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC). IEEE, 2024. https://doi.org/10.1109/apsipaasc63619.2025.10849311.

2

Wang, Siyin, and Chao Zhang. "Speaker Diarization for Unlimited Number of Speakers Using Dynamic Linear." In 2024 IEEE 14th International Symposium on Chinese Spoken Language Processing (ISCSLP). IEEE, 2024. https://doi.org/10.1109/iscslp63861.2024.10800134.

3

Rudrresh, P., P. V. Tharunn Raj, K. Hariharan, et al. "Speaker Diarization in Multispeaker and Multilingual Scenarios." In 2024 15th International Conference on Computing Communication and Networking Technologies (ICCCNT). IEEE, 2024. http://dx.doi.org/10.1109/icccnt61001.2024.10725901.

4

Han, Jiangyu, Federico Landini, Johan Rohdin, Anna Silnova, Mireia Diez, and Lukáš Burget. "Leveraging Self-Supervised Learning for Speaker Diarization." In ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2025. https://doi.org/10.1109/icassp49660.2025.10889475.

5

Raghav, Nikhil, Avisek Gupta, Md Sahidullah, and Swagatam Das. "Self-Tuning Spectral Clustering for Speaker Diarization." In ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2025. https://doi.org/10.1109/icassp49660.2025.10890194.

6

Rouvier, Mickael, Pierre-Michel Bousquet, and Benoit Favre. "Speaker diarization through speaker embeddings." In 2015 23rd European Signal Processing Conference (EUSIPCO). IEEE, 2015. http://dx.doi.org/10.1109/eusipco.2015.7362751.

7

Aronowitz, Hagai. "Trainable speaker diarization." In Interspeech 2007. ISCA, 2007. http://dx.doi.org/10.21437/interspeech.2007-518.

8

Pang, Bowen, Huan Zhao, Gaosheng Zhang, et al. "TSUP Speaker Diarization System for Conversational Short-phrase Speaker Diarization Challenge." In 2022 13th International Symposium on Chinese Spoken Language Processing (ISCSLP). IEEE, 2022. http://dx.doi.org/10.1109/iscslp57327.2022.10037846.

9

Joshi, Aishwary, Mohit Kumar, and Pradip K. Das. "Speaker Diarization: A review." In 2016 International Conference on Signal Processing and Communication (ICSC). IEEE, 2016. http://dx.doi.org/10.1109/icspcom.2016.7980574.

10

Wang, Quan, Carlton Downey, Li Wan, Philip Andrew Mansfield, and Ignacio Lopez Moreno. "Speaker Diarization with LSTM." In ICASSP 2018 - 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2018. http://dx.doi.org/10.1109/icassp.2018.8462628.


Reports on the topic "Speaker diarization"

1

Hansen, John H. Robust Speech Processing & Recognition: Speaker ID, Language ID, Speech Recognition/Keyword Spotting, Diarization/Co-Channel/Environmental Characterization, Speaker State Assessment. Defense Technical Information Center, 2015. http://dx.doi.org/10.21236/ada623029.
