

Academic literature on the topic 'Speaker embedding'

Author: Grafiati

Published: 28 June 2021

Last updated: 29 July 2025


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Speaker embedding.'


Journal articles on the topic "Speaker embedding"

Mridha, Muhammad Firoz, Abu Quwsar Ohi, Muhammad Mostafa Monowar, Md Abdul Hamid, Md Rashedul Islam, and Yutaka Watanobe. "U-Vectors: Generating Clusterable Speaker Embedding from Unlabeled Data." Applied Sciences 11, no. 21 (2021): 10079. http://dx.doi.org/10.3390/app112110079.


Abstract:

Speaker recognition deals with recognizing speakers by their speech. Most speaker recognition systems are built upon two stages: the first extracts low-dimensional correlation embeddings from speech, and the second performs the classification task. The robustness of a speaker recognition system mainly depends on the extraction process of speech embeddings, which are primarily pre-trained on a large-scale dataset. As the embedding systems are pre-trained, the performance of speaker recognition models greatly depends on domain adaptation policy, which may reduce if trained using inadequate


Kim, Minsoo, and Gil-Jin Jang. "Speaker-Attributed Training for Multi-Speaker Speech Recognition Using Multi-Stage Encoders and Attention-Weighted Speaker Embedding." Applied Sciences 14, no. 18 (2024): 8138. http://dx.doi.org/10.3390/app14188138.


Abstract:

Automatic speech recognition (ASR) aims at understanding naturally spoken human speech to be used as text inputs to machines. In multi-speaker environments, where multiple speakers are talking simultaneously with a large amount of overlap, a significant performance degradation may occur with conventional ASR systems if they are trained by recordings of single talkers. This paper proposes a multi-speaker ASR method that incorporates speaker embedding information as an additional input. The embedding information for each of the speakers in the training set was extracted as numeric vectors, and a


Liu, Elaine M., Jih-Wei Yeh, Jen-Hao Lu, and Yi-Wen Liu. "Speaker embedding space cosine similarity comparisons of singing voice conversion models and voice morphing." Journal of the Acoustical Society of America 154, no. 4_supplement (2023): A244. http://dx.doi.org/10.1121/10.0023424.


Abstract:

We explore the use of cosine similarity between x-vector speaker embeddings as an objective metric to evaluate the effectiveness of singing voice conversion. Our system preprocesses a source singer’s audio to obtain melody features via the F0 contour, loudness curve, and phonetic posteriorgram. These are input to a denoising diffusion probabilistic acoustic model conditioned with another target voice’s speaker embedding to generate a mel spectrogram, which is passed through a HiFi-GAN vocoder to synthesize audio of the source song in the target timbre. We use cosine similarity between the conv
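The metric named in this abstract, cosine similarity between two embedding vectors, can be illustrated with a minimal sketch. It is not the authors' code: the function name and the toy 4-dimensional vectors are assumptions made for illustration, and real x-vectors have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors standing in for speaker embeddings.
target = [0.2, -0.1, 0.7, 0.4]
converted = [0.25, -0.05, 0.65, 0.5]
unrelated = [-0.6, 0.3, -0.1, 0.2]

score_converted = cosine_similarity(target, converted)
score_unrelated = cosine_similarity(target, unrelated)
```

Under this reading, a converted voice whose embedding scores closer to 1 against the target embedding would be judged closer in timbre than one with a lower score.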


Pick, Ron Korenblum, Vladyslav Kozhukhov, Dan Vilenchik, and Oren Tsur. "STEM: Unsupervised STructural EMbedding for Stance Detection." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 10 (2022): 11174–82. http://dx.doi.org/10.1609/aaai.v36i10.21367.


Abstract:

Stance detection is an important task, supporting many downstream tasks such as discourse parsing and modeling the propagation of fake news, rumors, and science denial. In this paper, we propose a novel framework for stance detection. Our framework is unsupervised and domain-independent. Given a claim and a multi-participant discussion, we construct the interaction network from which we derive a topological embedding for each speaker. These speaker embeddings enjoy the following property: speakers with the same stance tend to be represented by similar vectors, while antipodal vectors represent


Karamyan, Davit S., and Grigor A. Kirakosyan. "Building a Speaker Diarization System: Lessons from VoxSRC 2023." Mathematical Problems of Computer Science 60 (November 30, 2023): 52–62. http://dx.doi.org/10.51408/1963-0109.


Abstract:

Speaker diarization is the process of partitioning an audio recording into segments corresponding to individual speakers. In this paper, we present a robust speaker diarization system and describe its architecture. We focus on discussing the key components necessary for building a strong diarization system, such as voice activity detection (VAD), speaker embedding, and clustering. Our system emerged as the winner in the VoxCeleb Speaker Recognition Challenge (VoxSRC) 2023, a widely recognized competition for evaluating speaker diarization systems.


Milewski, Krzysztof, Szymon Zaporowski, and Andrzej Czyżewski. "Comparison of the Ability of Neural Network Model and Humans to Detect a Cloned Voice." Electronics 12, no. 21 (2023): 4458. http://dx.doi.org/10.3390/electronics12214458.


Abstract:

The vulnerability of the speaker identity verification system to attacks using voice cloning was examined. The research project assumed creating a model for verifying the speaker’s identity based on voice biometrics and then testing its resistance to potential attacks using voice cloning. The Deep Speaker Neural Speaker Embedding System was trained, and the Real-Time Voice Cloning system was employed based on the SV2TTS, Tacotron, WaveRNN, and GE2E neural networks. The results of attacks using voice cloning were analyzed and discussed in the context of a subjective assessment of cloned voice f


Kang, Woo Hyun, Sung Hwan Mun, Min Hyun Han, and Nam Soo Kim. "Disentangled Speaker and Nuisance Attribute Embedding for Robust Speaker Verification." IEEE Access 8 (2020): 141838–49. http://dx.doi.org/10.1109/access.2020.3012893.


Poojary, Nigam R., and K. H. Ashish. "Text To Speech with Custom Voice." International Journal for Research in Applied Science and Engineering Technology 11, no. 4 (2023): 4523–30. http://dx.doi.org/10.22214/ijraset.2023.51217.


Abstract:

The Text to Speech with Custom Voice system described in this work has vast applicability in numerous industries, including entertainment, education, and accessibility. The proposed text-to-speech (TTS) system is capable of generating speech audio in custom voices, even those not included in the training data. The system comprises a speaker encoder, a synthesizer, and a WaveRNN vocoder. Multiple speakers from a dataset of clean speech without transcripts are used to train the speaker encoder for a speaker verification process. The reference speech of the target speaker is used to cre


Lee, Kong Aik, Qiongqiong Wang, and Takafumi Koshinaka. "Xi-Vector Embedding for Speaker Recognition." IEEE Signal Processing Letters 28 (2021): 1385–89. http://dx.doi.org/10.1109/lsp.2021.3091932.


Sečujski, Milan, Darko Pekar, Siniša Suzić, Anton Smirnov, and Tijana Nosek. "Speaker/Style-Dependent Neural Network Speech Synthesis Based on Speaker/Style Embedding." JUCS - Journal of Universal Computer Science 26, no. 4 (2020): 434–53. http://dx.doi.org/10.3897/jucs.2020.023.


Abstract:

The paper presents a novel architecture and method for training neural networks to produce synthesized speech in a particular voice and speaking style, based on a small quantity of target speaker/style training data. The method is based on neural network embedding, i.e. mapping of discrete variables into continuous vectors in a low-dimensional space, which has been shown to be a very successful universal deep learning technique. In this particular case, different speaker/style combinations are mapped into different points in a low-dimensional space, which enables the network to capture the sim
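The idea of an embedding as described in this abstract, a mapping from discrete variables to continuous low-dimensional vectors, can be sketched as a simple lookup table. This is an illustrative assumption, not the paper's architecture: the class name, the speaker/style labels, and the random initialization are all hypothetical, and in a trained network the vectors would be learned rather than left at their initial values.

```python
import random


class EmbeddingTable:
    """Lookup table mapping discrete keys to continuous vectors."""

    def __init__(self, keys, dim, seed=0):
        rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
        self.index = {key: row for row, key in enumerate(keys)}
        # One small random vector per key; training would adjust these values.
        self.vectors = [
            [rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in keys
        ]

    def lookup(self, key):
        """Return the vector stored for a discrete key."""
        return self.vectors[self.index[key]]


# Hypothetical speaker/style combinations, not taken from the paper.
combos = [("spk_a", "neutral"), ("spk_a", "happy"), ("spk_b", "neutral")]
table = EmbeddingTable(combos, dim=3)
vector = table.lookup(("spk_a", "happy"))
```

Each speaker/style combination gets its own point in the 3-dimensional space, so similarity between combinations can be expressed as distance between their vectors.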


Dissertations / Theses on the topic "Speaker embedding"

Cui, Ming. "Experiments in speaker diarization using speaker vectors." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-292217.


Abstract:

Speaker Diarization is the task of determining ‘who spoke when?’ in an audio or video recording that contains an unknown amount of speech and also an unknown number of speakers. It has emerged as an increasingly important and dedicated domain of speech research. Initially, it was proposed as a research topic related to automatic speech recognition, where speaker diarization serves as an upstream processing step. Over recent years, however, speaker diarization has become an important key technology for many tasks, such as navigation, retrieval, or higher-level inference on audio data. Our resea


Lukáč, Peter. "Verifikace osob podle hlasu bez extrakce příznaků." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2021. http://www.nusl.cz/ntk/nusl-445531.


Abstract:

Speaker verification is a field that is continually being modernized and improved, and that strives to meet the demands placed on it in application areas such as authorization systems, forensic analysis, etc. Improvements are driven by advances in deep learning, the creation of new training and test datasets, and various speaker verification competitions and workshops. In this thesis, we explore models for speaker verification without feature extraction. Using raw audio waveforms as model inputs simplifies input processing, which lowers computational and memory requirements and reduces the number of hyp


Fahlström Myrman, Arvid. "Increasing speaker invariance in unsupervised speech learning by partitioning probabilistic models using linear siamese networks." Thesis, KTH, Tal, musik och hörsel, TMH, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-210237.


Abstract:

Unsupervised learning of speech is concerned with automatically finding patterns such as words or speech sounds, without supervision in the form of orthographical transcriptions or a priori knowledge of the language. However, a fundamental problem is that unsupervised speech learning methods tend to discover highly speaker-specific and context-dependent representations of speech. We propose a method for improving the quality of posteriorgrams generated from an unsupervised model through partitioning of the latent classes discovered by the model. We do this by training a sparse siamese model to


Yin, Chung-Ko (尹崇珂). "Addressee Selection and Deep RL-based Dialog Act Selection with Speaker Embedding and Context Tracking for Multi-Party Conversational Systems." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/h5smz2.


Huang, Che-Ching (黃喆青). "Speaker Change Detection using Speaker and Articulatory Feature Embeddings." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/ge4d25.


Abstract:

Master's thesis, National Cheng Kung University, Department of Computer Science and Information Engineering, academic year 107. Nowadays, with the improvement and advancement of many related technologies for voice processing, voice interactive software and products have become more and more popular. For multi-person dialogue audio, speaker change point detection is needed as a pre-processing step before further analysis and processing. Most past research on speaker change point detection relies on acoustic features. The method proposed in this thesis is to provide the speaker inf


Wang, Xiaoyan. "An exploration of embedding intercultural knowledge to engage students in Chinese language learning : a bilingual beginning teacher's Xingzhi/action research project." Thesis, 2016. http://hdl.handle.net/1959.7/uws:41062.


Abstract:

As a participant in the ROSETE program (a partnership between the Ningbo Education Bureau, the University of Western Sydney and the NSW Department of Education and Communities), the teacher-researcher undertook study for a Master of Education degree while at the same time undertaking a Chinese language teaching assignment in a western Sydney public school. The teaching assignment was the context for the research which sought to explore the question: How can a teacher-researcher, implementing teacher Xingzhi/action research, design Chinese lessons through embedding intercultural knowledge to


Books on the topic "Speaker embedding"

Camp, Elisabeth. A Dual Act Analysis of Slurs. Oxford University Press, 2018. http://dx.doi.org/10.1093/oso/9780198758655.003.0003.


Abstract:

Slurs are incendiary terms—many deny that sentences containing them can ever be true. And utterances where they occur embedded within normally “quarantining” contexts, like conditionals and indirect reports, can still seem offensive. At the same time, others find that sentences containing slurs can be true; and there are clear cases where embedding does inoculate a speaker from the slur’s offensiveness. This chapter argues that four standard accounts of the “other” element that differentiates slurs from their more neutral counterparts—semantic content, perlocutionary effect, presupposition, an


Book chapters on the topic "Speaker embedding"

Karam, Z. N., and W. M. Campbell. "Graph Embedding for Speaker Recognition." In Graph Embedding for Pattern Analysis. Springer New York, 2012. http://dx.doi.org/10.1007/978-1-4614-4457-2_10.


Zhou, Kai, Qun Yang, Xiusong Sun, and Shaohan Liu. "A Deep Speaker Embedding Transfer Method for Speaker Verification." In Advances in Natural Computation, Fuzzy Systems and Knowledge Discovery. Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-32456-8_40.


Qi, Jiajun, Wu Guo, Jingjing Shi, Yafeng Chen, and Tan Liu. "Combining Universal Speech Attributes into Deep Speaker Embedding Extraction for Speaker Verification." In Advances in Natural Computation, Fuzzy Systems and Knowledge Discovery. Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-030-89698-0_110.


Alam, Jahangir, Woohyun Kang, and Abderrahim Fathan. "Neural Embedding Extractors for Text-Independent Speaker Verification." In Speech and Computer. Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-20980-2_2.


Zhou, Dao, Longbiao Wang, Kong Aik Lee, Meng Liu, and Jianwu Dang. "Deep Discriminative Embedding with Ranked Weight for Speaker Verification." In Communications in Computer and Information Science. Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-63823-8_10.


Weizman, Avishai, Yehuda Ben-Shimol, and Itshak Lapidot. "Spoofing-Robust Speaker Verification Based on Time-Domain Embedding." In Lecture Notes in Computer Science. Springer Nature Switzerland, 2024. https://doi.org/10.1007/978-3-031-76934-4_4.


Amani, Arash, Mohammad Mohammadamini, and Hadi Veisi. "Kurdish Spoken Dialect Recognition Using X-Vector Speaker Embedding." In Speech and Computer. Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-87802-3_5.


Hamouda, Meriem, and Halima Bahi. "Feature Embedding Representation for Unsupervised Speaker Diarization in Telephone Calls." In Communications in Computer and Information Science. Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-46335-8_16.


Novoselov, Sergey, Galina Lavrentyeva, Vladimir Volokhov, Marina Volkova, Nikita Khmelev, and Artem Akulov. "Investigation of Different Calibration Methods for Deep Speaker Embedding Based Verification Systems." In Speech and Computer. Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-48309-7_13.


Chen, Xiaojiao, Sheng Li, and Hao Huang. "GhostVec: Directly Extracting Speaker Embedding from End-to-End Speech Recognition Model Using Adversarial Examples." In Communications in Computer and Information Science. Springer Nature Singapore, 2023. http://dx.doi.org/10.1007/978-981-99-1645-0_40.


Conference papers on the topic "Speaker embedding"

Horiguchi, Shota, Takafumi Moriya, Atsushi Ando, et al. "Guided Speaker Embedding." In ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2025. https://doi.org/10.1109/icassp49660.2025.10887711.


Jin, Zezhong, Youzhi Tu, and Man-Wai Mak. "Joseph: phonetic-aware speaker embedding for far-field speaker verification." In 2024 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC). IEEE, 2024. https://doi.org/10.1109/apsipaasc63619.2025.10849338.


Clarke, Jason, Yoshihiko Gotoh, and Stefan Goetze. "Speaker Embedding Informed Audiovisual Active Speaker Detection for Egocentric Recordings." In ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2025. https://doi.org/10.1109/icassp49660.2025.10890414.


Wang, Yichi, Jie Zhang, Chengqian Jiang, Weitai Zhang, Zhongyi Ye, and Lirong Dai. "Leveraging Boolean Directivity Embedding for Binaural Target Speaker Extraction." In ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2025. https://doi.org/10.1109/icassp49660.2025.10888158.


Li, Lantian, Chao Xing, Dong Wang, Kaimin Yu, and Thomas Fang Zheng. "Binary speaker embedding." In 2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP). IEEE, 2016. http://dx.doi.org/10.1109/iscslp.2016.7918381.


Yi, Lu, and Man-Wai Mak. "Disentangled Speaker Embedding for Robust Speaker Verification." In ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2022. http://dx.doi.org/10.1109/icassp43922.2022.9747778.


Kottur, Satwik, Xiaoyu Wang, and Vitor Carvalho. "Exploring Personalized Neural Conversational Models." In Twenty-Sixth International Joint Conference on Artificial Intelligence. International Joint Conferences on Artificial Intelligence Organization, 2017. http://dx.doi.org/10.24963/ijcai.2017/521.


Abstract:

Modeling dialog systems is currently one of the most active problems in Natural Language Processing. Recent advancement in Deep Learning has sparked an interest in the use of neural networks in modeling language, particularly for personalized conversational agents that can retain contextual information during dialog exchanges. This work carefully explores and compares several of the recently proposed neural conversation models, and carries out a detailed evaluation on the multiple factors that can significantly affect predictive performance, such as pretraining, embedding training, data cleani


Jung, Jee-Weon, Ju-Ho Kim, Hye-Jin Shim, Seung-bin Kim, and Ha-Jin Yu. "Selective Deep Speaker Embedding Enhancement for Speaker Verification." In Odyssey 2020 The Speaker and Language Recognition Workshop. ISCA, 2020. http://dx.doi.org/10.21437/odyssey.2020-25.


Chen, Chia-Ping, Su-Yu Zhang, Chih-Ting Yeh, Jia-Ching Wang, Tenghui Wang, and Chien-Lin Huang. "Speaker Characterization Using TDNN-LSTM Based Speaker Embedding." In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2019. http://dx.doi.org/10.1109/icassp.2019.8683185.


Han, Min Hyun, Woo Hyun Kang, Sung Hwan Mun, and Nam Soo Kim. "Information Preservation Pooling for Speaker Embedding." In Odyssey 2020 The Speaker and Language Recognition Workshop. ISCA, 2020. http://dx.doi.org/10.21437/odyssey.2020-9.

