Journal articles on the topic 'Speaker diarization'

Consult the top 50 journal articles for your research on the topic 'Speaker diarization.'

1

Karamyan, Davit S., and Grigor A. Kirakosyan. "Building a Speaker Diarization System: Lessons from VoxSRC 2023." Mathematical Problems of Computer Science 60 (November 30, 2023): 52–62. http://dx.doi.org/10.51408/1963-0109.

Abstract:
Speaker diarization is the process of partitioning an audio recording into segments corresponding to individual speakers. In this paper, we present a robust speaker diarization system and describe its architecture. We focus on discussing the key components necessary for building a strong diarization system, such as voice activity detection (VAD), speaker embedding, and clustering. Our system emerged as the winner in the VoxCeleb Speaker Recognition Challenge (VoxSRC) 2023, a widely recognized competition for evaluating speaker diarization systems.
2

Iyer, Apoorva, Deepika Kini, and Shanthi Therese. "Speaker Diarization." International Journal of Computer Trends and Technology 67, no. 9 (2019): 50–54. http://dx.doi.org/10.14445/22312803/ijctt-v67i9p110.

3

V., Subba Ramaiah, Srinivasa Rao S., and Devaraju V.S.N.Kumar. "Speaker Diarization based on Black-Hole Entropy Fuzzy Clustering using Cepstral Features." International Journal of Engineering and Advanced Technology (IJEAT) 9, no. 4 (2020): 1055–61. https://doi.org/10.35940/ijeat.D7832.049420.

Abstract:
Speaker diarization is the process of identification of the speaker in an audio sequence. This paper proposed a speaker diarization method using the Black-hole entropy fuzzy clustering and multiple kernel weighted Mel frequency cepstral coefficient (MKMFCC) parameterization. Initially, the MKMFCC descriptor extracted the cepstral features from the input audio signal. These features are used for clustering the speakers as groups for which the BHEFC is used. The feature parameter uses the audio signal containing both the high and low energy frame for speaker indexing that resulted in accurate se
4

Pampana, Chaitanya, M. Vijay Reddy, and K. Jhansi Rani. "A Review on Speaker Diarization for Whispered Speech Audio." International Research Journal on Advanced Engineering and Management (IRJAEM) 3, no. 05 (2025): 1765–73. https://doi.org/10.47392/irjaem.2025.0279.

Abstract:
Speaker diarization, the process of partitioning an audio stream into segments according to the speaker identity, is crucial for various applications in speech processing and analysis. Whispered speech, characterized by its low amplitude and altered spectral properties, presents unique challenges for conventional diarization algorithms designed for clear, normal speech. In this study, I propose a novel approach for supervised speaker diarization specifically tailored to whispered speech audio streams. Supervised learning techniques, utilizing annotated data to train models capable of accuratel
5

Prabhala, Jagat Chaitanya, Venkatnareshbabu K, and Ragoju Ravi. "Optimizing Similarity Threshold for Abstract Similarity Metric in Speech Diarization Systems: A Mathematical Formulation." Applied Mathematics and Sciences An International Journal (MathSJ) 10, no. 1/2 (2023): 1–10. http://dx.doi.org/10.5121/mathsj.2023.10201.

Abstract:
Speaker diarization is a critical task in speech processing that aims to identify "who spoke when?" in an audio or video recording that contains unknown amounts of speech from unknown speakers and unknown number of speakers. Diarization has numerous applications in speech recognition, speaker identification, and automatic captioning. Supervised and unsupervised algorithms are used to address speaker diarization problems, but providing exhaustive labeling for the training dataset can become costly in supervised learning, while accuracy can be compromised when using unsupervised approaches. This
6

Kshirod, Kshirod Sarmah. "Speaker Diarization with Deep Learning Techniques." Turkish Journal of Computer and Mathematics Education (TURCOMAT) 11, no. 3 (2020): 2570–82. http://dx.doi.org/10.61841/turcomat.v11i3.14309.

Abstract:
Speaker diarization is a task to identify the speaker when different speakers spoke in an audio or video recording environment. Artificial intelligence (AI) fields have effectively used Deep Learning (DL) to solve a variety of real-world application challenges. With effective applications in a wide range of subdomains, such as natural language processing, image processing, computer vision, speech and speaker recognition, and emotion recognition, cyber security, and many others, DL, a very innovative field of Machine Learning (ML), that is quickly emerging as the most potent machine learning te
7

Park, Kyung-Mi, Jeong-Sik Park, Jae-Hyun Bae, and Yung-Hwan Oh. "Online Speaker Diarization for Multimedia Data Retrieval on Mobile Devices." International Journal of Pattern Recognition and Artificial Intelligence 26, no. 08 (2012): 1260011. http://dx.doi.org/10.1142/s0218001412600117.

Abstract:
Speaker diarization detects speaker change points in spoken data and organizes speaker clusters so that each cluster contains one speaker's segments. This study aims to develop online speaker diarization for multimedia data retrieval on mobile devices. Researchers have proposed various methods of diarization, but most approaches thus far depend on an empirically determined threshold as a criterion or work in an offline manner that requires prior knowledge, such as the overall number of speakers. There are therefore clear drawbacks with mobile devices, on which various types of spoken data are
8

V, Sethuram, Ande Prasad, and R. Rajeswara Rao. "Metaheuristic adapted convolutional neural network for Telugu speaker diarization." Intelligent Decision Technologies 15, no. 4 (2022): 561–77. http://dx.doi.org/10.3233/idt-211005.

Abstract:
In speech technology, a pivotal role is being played by the Speaker diarization mechanism. In general, speaker diarization is the mechanism of partitioning the input audio stream into homogeneous segments based on the identity of the speakers. The automatic transcription readability can be improved with the speaker diarization as it is good in recognizing the audio stream into the speaker turn and often provides the true speaker identity. In this research work, a novel speaker diarization approach is introduced under three major phases: Feature Extraction, Speech Activity Detection (SAD), and
9

Zaiets, I., V. Brydinskyi, D. Sabodashko, Yu Khoma, Kh Ruda, and M. Shved. "Utilization of Voice Embeddings in Integrated Systems for Speaker Diarization and Malicious Actor Detection." Computer systems and network 6, no. 1 (2024): 54–66. http://dx.doi.org/10.23939/csn2024.01.054.

Abstract:
This paper explores the use of diarization systems which employ advanced machine learning algorithms for the precise detection and separation of different speakers in audio recordings for the implementation of an intruder detection system. Several state-of-the-art diarization models, including Nvidia’s NeMo, Pyannote, and SpeechBrain, are compared. The performance of these models is evaluated using typical metrics for diarization systems, such as diarization error rate (DER) and Jaccard error rate (JER). The diarization system was tested on various audio conditions including noisy environm
10

Noulas, A., G. Englebienne, and B. J. A. Krose. "Multimodal Speaker Diarization." IEEE Transactions on Pattern Analysis and Machine Intelligence 34, no. 1 (2012): 79–93. http://dx.doi.org/10.1109/tpami.2011.47.

11

Lyu, Ke-Ming, Ren-yuan Lyu, and Hsien-Tsung Chang. "Real-time multilingual speech recognition and speaker diarization system based on Whisper segmentation." PeerJ Computer Science 10 (March 29, 2024): e1973. http://dx.doi.org/10.7717/peerj-cs.1973.

Abstract:
This research presents the development of a cutting-edge real-time multilingual speech recognition and speaker diarization system that leverages OpenAI’s Whisper model. The system specifically addresses the challenges of automatic speech recognition (ASR) and speaker diarization (SD) in dynamic, multispeaker environments, with a focus on accurately processing Mandarin speech with Taiwanese accents and managing frequent speaker switches. Traditional speech recognition systems often fall short in such complex multilingual and multispeaker contexts, particularly in SD. This study, therefore, inte
12

Hsu, Yicheng, Ssuhan Chen, Yuhsin Lai, Chingyen Wang, and Mingsian R. Bai. "Spatial-temporal activity-informed diarization and separation." Journal of the Acoustical Society of America 157, no. 2 (2025): 1162–75. https://doi.org/10.1121/10.0035830.

Abstract:
A robust multichannel speaker diarization and separation system is proposed by exploiting the spatiotemporal activity of the speakers. The system is realized in a hybrid architecture that combines the array signal processing units and the deep learning units. For speaker diarization, a spatial coherence matrix across time frames is computed based on the whitened Relative Transfer Functions of the microphone array. This serves as a robust feature for subsequent machine learning without the need for prior knowledge of the array configuration. A computationally efficient modified End-to-End Neura
13

Astapov, Sergei, Aleksei Gusev, Marina Volkova, et al. "Application of Fusion of Various Spontaneous Speech Analytics Methods for Improving Far-Field Neural-Based Diarization." Mathematics 9, no. 23 (2021): 2998. http://dx.doi.org/10.3390/math9232998.

Abstract:
Recently developed methods in spontaneous speech analytics require the use of speaker separation based on audio data, referred to as diarization. It is applied to widespread use cases, such as meeting transcription based on recordings from distant microphones and the extraction of the target speaker’s voice profiles from noisy audio. However, speech recognition and analysis can be hindered by background and point-source noise, overlapping speech, and reverberation, which all affect diarization quality in conjunction with each other. To compensate for the impact of these factors, there are a va
14

Taha, Thaer Mufeed, Zaineb Ben Messaoud, and Mondher Frikha. "Convolutional Neural Network Architectures for Gender, Emotional Detection from Speech and Speaker Diarization." International Journal of Interactive Mobile Technologies (iJIM) 18, no. 03 (2024): 88–103. http://dx.doi.org/10.3991/ijim.v18i03.43013.

Abstract:
This paper introduces three system architectures for speaker identification that aim to overcome the limitations of diarization and voice-based biometric systems. Diarization systems utilize unsupervised algorithms to segment audio data based on the time boundaries of utterances, but they do not distinguish individual speakers. On the other hand, voice-based biometric systems can only identify individuals in recordings with a single speaker. Identifying speakers in recordings of natural conversations can be challenging, especially when emotional shifts can alter voice characteristics, making g
15

Khoma, Volodymyr, Yuriy Khoma, Vitalii Brydinskyi, and Alexander Konovalov. "Development of Supervised Speaker Diarization System Based on the PyAnnote Audio Processing Library." Sensors 23, no. 4 (2023): 2082. http://dx.doi.org/10.3390/s23042082.

Abstract:
Diarization is an important task when working with audio data, as it addresses the need to divide one analyzed call recording into several speech recordings, each of which belongs to one speaker. Diarization systems segment audio recordings by defining the time boundaries of utterances, and typically use unsupervised methods to group utterances belonging to individual speakers, but do not answer the question “who is speaking?” On the other hand, there are biometric systems that identify individuals on the basis of their voices, but such systems are
16

Viñals, Ignacio, Alfonso Ortega, Antonio Miguel, and Eduardo Lleida. "The Domain Mismatch Problem in the Broadcast Speaker Attribution Task." Applied Sciences 11, no. 18 (2021): 8521. http://dx.doi.org/10.3390/app11188521.

Abstract:
The demand of high-quality metadata for the available multimedia content requires the development of new techniques able to correctly identify more and more information, including the speaker information. The task known as speaker attribution aims at identifying all or part of the speakers in the audio under analysis. In this work, we carry out a study of the speaker attribution problem in the broadcast domain. Through our experiments, we illustrate the positive impact of diarization on the final performance. Additionally, we show the influence of the variability present in broadcast data, dep
17

Indu D. "A Methodology for Speaker Diazaration System Based on LSTM and MFCC Coefficients." Journal of Electrical Systems 20, no. 6s (2024): 2938–45. http://dx.doi.org/10.52783/jes.3299.

Abstract:
Research on speaker identification is always difficult. A speaker may be automatically identified by comparing their voice sample with their previously recorded voice, and the machine learning strategy has grown in favor in recent years. Convolutional neural networks (CNN) and deep neural networks (DNN) are some of the machine learning techniques that have been employed recently. The article will discuss a successful speaker verification system based on the d-vector to construct a new approach based on speaker diarization. In particular, in this article, we use the concept of LSTM to cluster the spe
18

Murali, Abhejay, Satwik Dutta, Meena Chandra Shekar, Dwight Irvin, Jay Buzhardt, and John H. Hansen. "Towards developing speaker diarization for parent-child interactions." Journal of the Acoustical Society of America 152, no. 4 (2022): A61. http://dx.doi.org/10.1121/10.0015551.

Abstract:
Daily interactions of children with their parents are crucial for spoken language skills and overall development. Capturing such interactions can help to provide meaningful feedback to parents as well as practitioners. Naturalistic audio capture and developing further speech processing pipeline for parent-child interactions is a challenging problem. One of the first important steps in the speech processing pipeline is Speaker Diarization—to identify who spoke when. Speaker Diarization is the method of separating a captured audio stream into analogous segments that are differentiated by the spe
19

Ahmad, Zubair, Alquhayz, and Ditta. "Multimodal Speaker Diarization Using a Pre-Trained Audio-Visual Synchronization Model." Sensors 19, no. 23 (2019): 5163. http://dx.doi.org/10.3390/s19235163.

Abstract:
Speaker diarization systems aim to find ‘who spoke when?’ in multi-speaker recordings. The dataset usually consists of meetings, TV/talk shows, telephone and multi-party interaction recordings. In this paper, we propose a novel multimodal speaker diarization technique, which finds the active speaker through audio-visual synchronization model for diarization. A pre-trained audio-visual synchronization model is used to find the synchronization between a visible person and the respective audio. For that purpose, short video segments comprised of face-only regions are acquired using a face detecti
20

Jiao, Xiaolin, Yaqi Chen, Dan Qu, and Xukui Yang. "Blueprint Separable Subsampling and Aggregate Feature Conformer-Based End-to-End Neural Diarization." Electronics 12, no. 19 (2023): 4118. http://dx.doi.org/10.3390/electronics12194118.

Abstract:
At present, a prevalent approach to speaker diarization is clustering based on speaker embeddings. However, this method encounters two primary issues. Firstly, it cannot directly minimize the diarization error during the training process; secondly, the majority of clustering-based methods struggle to handle speaker overlap in audio. A viable approach for addressing these issues involves adopting end-to-end speaker diarization (EEND). Nevertheless, training this EEND system generally requires lengthy audio inputs, which must be downsampled to allow efficient model processing. In this study, we
21

Aronowitz, Hagai. "Compensation of Intra-Speaker Variability in Speaker Diarization." Journal of the Acoustical Society of America 134, no. 5 (2013): 3967. http://dx.doi.org/10.1121/1.4828924.

22

Alvarez-Trejos, Juan Ignacio, Alicia Lozano-Diez, and Daniel Ramos. "Feature Integration Strategies for Neural Speaker Diarization in Conversational Telephone Speech." Applied Sciences 15, no. 9 (2025): 4842. https://doi.org/10.3390/app15094842.

Abstract:
This paper addresses the challenge of optimizing end-to-end neural diarization systems for conversational telephone speech, focusing on diverse acoustic features beyond traditional Mel-filterbanks. We present a methodological framework for integrating and analyzing different feature types as input to the well-known End-to-End Neural Diarization with Encoder Decoder Attractors (EEND-EDA) model, focusing on Emphasized Channel Attention, Propagation and Aggregation in Time Delay Neural Network (ECAPA-TDNN) embeddings and Geneva Minimalistic Acoustic Parameter Sets (GeMAPS). Our approach combines
23

Dargahi, Fatemeh, Costin-Alexandru Deonise, Constantin Anghel, Cătălin Negru, and Florin Pop. "Microphone Speaker Analysis: Audio Segmentation and Frequency Insights." Annals of the Academy of Romanian Scientists Series on Science and Technology of Information 17, no. 1 (2024): 5–14. https://doi.org/10.56082/annalsarsciinfo.2024.1.5.

Abstract:
Audio segmentation represents a technical process used for separating a stream of audio recordings, which frequently contain multiple speakers, into uniform sections. This paper explores the implementation of voice-dialing and recognition algorithms to examine and analyze the technology's capability to accurately identify and differentiate speakers in intricate environments. It aims to enhance our understanding of the technology's functionality, including its ability to discern speakers' emotions and gender. Additionally, a hardware simulation is conducted using a two-way microphone and an Ard
24

Vryzas, Nikolaos, Nikolaos Tsipas, and Charalampos Dimoulas. "Web Radio Automation for Audio Stream Management in the Era of Big Data." Information 11, no. 4 (2020): 205. http://dx.doi.org/10.3390/info11040205.

Abstract:
Radio is evolving in a changing digital media ecosystem. Audio-on-demand has shaped the landscape of big unstructured audio data available online. In this paper, a framework for knowledge extraction is introduced, to improve discoverability and enrichment of the provided content. A web application for live radio production and streaming is developed. The application offers typical live mixing and broadcasting functionality, while performing real-time annotation as a background process by logging user operation events. For the needs of a typical radio station, a supervised speaker classificatio
25

K. Pande, Vinod, Vijay K. Kale, and Sangramsing N. Kayte. "Feature Extraction Using I-Vector and X-Vector Methods for Speaker Diarization." ICTACT Journal on Soft Computing 15, no. 4 (2025): 3717–21. https://doi.org/10.21917/ijsc.2025.0515.

Abstract:
Speaker diarization is the process of identifying who is speaking at different times in audio recordings. This is important in various situations, such as recording meetings, monitoring calls in call centers, or analyzing media. In this paper, we examine how well different methods for speaker diarization perform in real-life scenarios. We focus on two modern techniques: I-vectors and X-vectors. I-vectors are effective for automatic speaker recognition because they create compact and efficient representations of speakers using statistical models. However, they struggle in situations involving overlap
26

Wang, Jiani, Shiran Dudy, Xinlu Hu, Zhiyong Wang, Rosy Southwell, and Jacob Whitehill. "Optimizing Speaker Diarization for the Classroom: Applications in Timing Student Speech and Distinguishing Teachers from Children." Journal of Educational Data Mining 17, no. 1 (2025): 98–125. https://doi.org/10.5281/zenodo.14871875.

Abstract:
An important dimension of classroom group dynamics & collaboration is how much each person contributes to the discussion. With the goal of distinguishing teachers' speech from children's speech and measuring how much each student speaks, we have investigated how automatic speaker diarization can be built to handle real-world classroom group discussions. We examined key design considerations such as the level of granularity of speaker assignment, speech enhancement techniques, voice activity detection, and embedding assignment methods to find an effective configuration. The best speaker dia
27

Barras, C., Xuan Zhu, S. Meignier, and J. L. Gauvain. "Multistage speaker diarization of broadcast news." IEEE Transactions on Audio, Speech and Language Processing 14, no. 5 (2006): 1505–12. http://dx.doi.org/10.1109/tasl.2006.878261.

28

Jothilakshmi, S., V. Ramalingam, and S. Palanivel. "Speaker diarization using autoassociative neural networks." Engineering Applications of Artificial Intelligence 22, no. 4-5 (2009): 667–75. http://dx.doi.org/10.1016/j.engappai.2009.01.012.

29

Xylogiannis, Paris, Nikolaos Vryzas, Lazaros Vrysis, and Charalampos Dimoulas. "Multisensory Fusion for Unsupervised Spatiotemporal Speaker Diarization." Sensors 24, no. 13 (2024): 4229. http://dx.doi.org/10.3390/s24134229.

Abstract:
Speaker diarization consists of answering the question of “who spoke when” in audio recordings. In meeting scenarios, the task of labeling audio with the corresponding speaker identities can be further assisted by the exploitation of spatial features. This work proposes a framework designed to assess the effectiveness of combining speaker embeddings with Time Difference of Arrival (TDOA) values from available microphone sensor arrays in meetings. We extract speaker embeddings using two popular and robust pre-trained models, ECAPA-TDNN and X-vectors, and calculate the TDOA values via the Genera
30

Mertens, Robert, Po-Sen Huang, Luke Gottlieb, Gerald Friedland, Ajay Divakaran, and Mark Hasegawa-Johnson. "On the Applicability of Speaker Diarization to Audio Indexing of Non-Speech and Mixed Non-Speech/Speech Video Soundtracks." International Journal of Multimedia Data Engineering and Management 3, no. 3 (2012): 1–19. http://dx.doi.org/10.4018/jmdem.2012070101.

Abstract:
A video’s soundtrack is usually highly correlated to its content. Hence, audio-based techniques have recently emerged as a means for video concept detection complementary to visual analysis. Most state-of-the-art approaches rely on manual definition of predefined sound concepts such as “engine sounds,” “outdoor/indoor sounds.” These approaches come with three major drawbacks: manual definitions do not scale as they are highly domain-dependent, manual definitions are highly subjective with respect to annotators and a large part of the audio content is omitted since the predefined concepts are usu
31

Pan, Weijun, Yidi Wang, Yumei Zhang, and Boyuan Han. "ATC-SD Net: Radiotelephone Communications Speaker Diarization Network." Aerospace 11, no. 7 (2024): 599. http://dx.doi.org/10.3390/aerospace11070599.

Abstract:
This study addresses the challenges that high-noise environments and complex multi-speaker scenarios present in civil aviation radio communications. A novel radiotelephone communications speaker diarization network is developed specifically for these circumstances. To improve the precision of the speaker diarization network, three core modules are designed: voice activity detection (VAD), end-to-end speaker separation for air–ground communication (EESS), and probabilistic knowledge-based text clustering (PKTC). First, the VAD module uses attention mechanisms to separate silence from irrelevant
32

Kone, Tenon Charly, Sebastian Ghinet, Sayed Ahmed Dana, and Anant Grewal. "Speech detection models for effective communicable disease risk assessment in air travel environments." Journal of the Acoustical Society of America 155, no. 3_Supplement (2024): A277. http://dx.doi.org/10.1121/10.0027492.

Abstract:
In environments characterized by elevated noise levels, such as airports or aircraft cabins, travelers often find themselves involuntarily speaking loudly and drawing closer to one another in an effort to enhance communication and speech intelligibility. Unfortunately, this unintentional behaviour increases the risk of respiratory particles dispersion, potentially carrying infectious agents like bacteria which makes the contagion control more challenging. The accurate characterization of the risk associated to speaking, in such a challenging noise environment with multiple overlapping speech s
33

Zhou, Yu. "Harmonic Structure Features for Robust Speaker Diarization." ETRI Journal 34, no. 4 (2012): 583–90. http://dx.doi.org/10.4218/etrij.12.0111.0455.

34

Ahmad, Rehan, Syed Zubair, and Hani Alquhayz. "Speech Enhancement for Multimodal Speaker Diarization System." IEEE Access 8 (2020): 126671–80. http://dx.doi.org/10.1109/access.2020.3007312.

35

Ferras, Marc, Srikanth Madikeri, and Herve Bourlard. "Speaker Diarization and Linking of Meeting Data." IEEE/ACM Transactions on Audio, Speech, and Language Processing 24, no. 11 (2016): 1935–45. http://dx.doi.org/10.1109/taslp.2016.2590139.

36

Xu, Yan, Ian McLoughlin, Yan Song, and Kui Wu. "Improved i-Vector Representation for Speaker Diarization." Circuits, Systems, and Signal Processing 35, no. 9 (2015): 3393–404. http://dx.doi.org/10.1007/s00034-015-0206-2.

37

Ahmad, Rehan, and Syed Zubair. "Unsupervised deep feature embeddings for speaker diarization." Turkish Journal of Electrical Engineering & Computer Sciences 27, no. 4 (2019): 3138–49. http://dx.doi.org/10.3906/elk-1901-125.

38

Tranter, S. E., and D. A. Reynolds. "An overview of automatic speaker diarization systems." IEEE Transactions on Audio, Speech and Language Processing 14, no. 5 (2006): 1557–65. http://dx.doi.org/10.1109/tasl.2006.878256.

39

Anguera, Xavier, Chuck Wooters, and Javier Hernando. "Acoustic Beamforming for Speaker Diarization of Meetings." IEEE Transactions on Audio, Speech and Language Processing 15, no. 7 (2007): 2011–22. http://dx.doi.org/10.1109/tasl.2007.902460.

40

Imseng, David, and Gerald Friedland. "Tuning-Robust Initialization Methods for Speaker Diarization." IEEE Transactions on Audio, Speech, and Language Processing 18, no. 8 (2010): 2028–37. http://dx.doi.org/10.1109/tasl.2010.2040796.

41

Barra-Chicote, R., J. M. Pardo, J. Ferreiros, and J. M. Montero. "Speaker Diarization Based on Intensity Channel Contribution." IEEE Transactions on Audio, Speech, and Language Processing 19, no. 4 (2011): 754–61. http://dx.doi.org/10.1109/tasl.2010.2062507.

42

Anguera Miro, Xavier, S. Bozonnet, N. Evans, C. Fredouille, G. Friedland, and O. Vinyals. "Speaker Diarization: A Review of Recent Research." IEEE Transactions on Audio, Speech, and Language Processing 20, no. 2 (2012): 356–70. http://dx.doi.org/10.1109/tasl.2011.2125954.

43

Friedland, G., A. Janin, D. Imseng, et al. "The ICSI RT-09 Speaker Diarization System." IEEE Transactions on Audio, Speech, and Language Processing 20, no. 2 (2012): 371–81. http://dx.doi.org/10.1109/tasl.2011.2158419.

44

Huijbregts, Marijn, David A. van Leeuwen, and Chuck Wooters. "Speaker Diarization Error Analysis Using Oracle Components." IEEE Transactions on Audio, Speech, and Language Processing 20, no. 2 (2012): 393–403. http://dx.doi.org/10.1109/tasl.2011.2162318.

45

O’Shaughnessy, Douglas. "Speaker Diarization: A Review of Objectives and Methods." Applied Sciences 15, no. 4 (2025): 2002. https://doi.org/10.3390/app15042002.

Abstract:
Recorded audio often contains speech from multiple people in conversation. It is useful to label such signals with speaker turns, noting when each speaker is talking and identifying each speaker. This paper discusses how to process speech signals to do such speaker diarization (SD). We examine the nature of speech signals, to identify the possible acoustical features that could assist this clustering task. Traditional speech analysis techniques are reviewed, as well as measures of spectral similarity and clustering. Speech activity detection requires separating speech from background noise in
46

Vaquero, C., A. Ortega, A. Miguel, and Eduardo Lleida. "Quality Assessment for Speaker Diarization and Its Application in Speaker Characterization." IEEE Transactions on Audio, Speech, and Language Processing 21, no. 4 (2013): 816–27. http://dx.doi.org/10.1109/tasl.2012.2236317.

47

Kothalkar, Prasanna V., Dwight Irvin, Jay Buzhardt, and John H. Hansen. "End-to-end child-adult speech diarization in naturalistic conditions of preschool classrooms." Journal of the Acoustical Society of America 153, no. 3_supplement (2023): A174. http://dx.doi.org/10.1121/10.0018568.

Abstract:
Speech and language development are early indicators of overall analytical and learning ability in pre-school children. Early childhood researchers are interested in analyzing naturalistic versus controlled lab recordings to assess both quality and quantity of such communication interactions between children and adults/teachers. Unfortunately, present-day speech technologies are not capable of addressing the wide dynamic scenario of early childhood classroom settings. Due to diversity of acoustic events/conditions in daylong audio streams, automated speaker diarization technology is limited and
48

Ahmed, Ahmed Isam, John P. Chiverton, David L. Ndzi, and Mahmoud M. Al-Faris. "Channel and channel subband selection for speaker diarization." Computer Speech & Language 75 (September 2022): 101367. http://dx.doi.org/10.1016/j.csl.2022.101367.

49

Rho, Jinsang, Suwon Shon, Sung Soo Kim, Jae-Won Lee, and Hanseok Ko. "Local Distribution Based Density Clustering for Speaker Diarization." Journal of the Acoustical Society of Korea 34, no. 4 (2015): 303–9. http://dx.doi.org/10.7776/ask.2015.34.4.303.

50

Sultan, Wael Ali, Mourad Samir Semary, and Sherif Mahdy Abdou. "An Efficient Speaker Diarization Pipeline for Conversational Speech." Benha Journal of Applied Sciences 9, no. 5 (2024): 141–46. http://dx.doi.org/10.21608/bjas.2024.284482.1414.
