Selection of scholarly literature on the topic "Speaker embedding"

Cite a source in APA, MLA, Chicago, Harvard, and other citation styles

Select a type of source:

Familiarize yourself with the lists of current articles, books, dissertations, reports, and other scholarly sources on the topic "Speaker embedding".

Next to each work in the list of references there is an "Add to bibliography" option. Use it, and the bibliographic reference for the selected work will be formatted automatically in the required citation style (APA, MLA, Harvard, Chicago, Vancouver, etc.).

You can also download the full text of the scholarly publication as a PDF and read an online abstract of the work, provided the relevant parameters are available in the metadata.

Journal articles on the topic "Speaker embedding"

1

Mridha, Muhammad Firoz, Abu Quwsar Ohi, Muhammad Mostafa Monowar, Md Abdul Hamid, Md Rashedul Islam, and Yutaka Watanobe. "U-Vectors: Generating Clusterable Speaker Embedding from Unlabeled Data." Applied Sciences 11, no. 21 (2021): 10079. http://dx.doi.org/10.3390/app112110079.

Abstract:
Speaker recognition deals with recognizing speakers by their speech. Most speaker recognition systems are built upon two stages, the first stage extracts low dimensional correlation embeddings from speech, and the second performs the classification task. The robustness of a speaker recognition system mainly depends on the extraction process of speech embeddings, which are primarily pre-trained on a large-scale dataset. As the embedding systems are pre-trained, the performance of speaker recognition models greatly depends on domain adaptation policy, which may reduce if trained using inadequate
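
As a rough illustration of the two-stage pattern described in the abstract above (embedding extraction followed by classification), the sketch below trains a plain classifier on top of fixed speaker embeddings. It is not the authors' u-vector method: the embeddings are random placeholders standing in for the output of a pre-trained extractor, and the dimensions and speaker count are invented for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stage 1 (assumed to have run already): a pre-trained extractor maps each
# utterance to a fixed-dimensional speaker embedding. Random placeholders here.
train_embeddings = rng.normal(size=(200, 192))    # 200 utterances, 192-dim embeddings
train_speakers = rng.integers(0, 10, size=200)    # labels for 10 enrolled speakers

# Stage 2: a simple classifier operating purely on the embeddings.
clf = LogisticRegression(max_iter=1000).fit(train_embeddings, train_speakers)

test_embedding = rng.normal(size=(1, 192))
print("predicted speaker:", clf.predict(test_embedding)[0])
```
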
2

Kim, Minsoo, and Gil-Jin Jang. "Speaker-Attributed Training for Multi-Speaker Speech Recognition Using Multi-Stage Encoders and Attention-Weighted Speaker Embedding." Applied Sciences 14, no. 18 (2024): 8138. http://dx.doi.org/10.3390/app14188138.

Abstract:
Automatic speech recognition (ASR) aims at understanding naturally spoken human speech to be used as text inputs to machines. In multi-speaker environments, where multiple speakers are talking simultaneously with a large amount of overlap, a significant performance degradation may occur with conventional ASR systems if they are trained by recordings of single talkers. This paper proposes a multi-speaker ASR method that incorporates speaker embedding information as an additional input. The embedding information for each of the speakers in the training set was extracted as numeric vectors, and a
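
The core idea the abstract describes, feeding a per-speaker embedding to the recognizer as an additional input, can be sketched generically. This is not the paper's multi-stage encoder or attention-weighted scheme; it only shows the common trick of tiling a fixed speaker vector across time and concatenating it to the frame-level acoustic features, with all shapes invented for illustration.

```python
import numpy as np

def append_speaker_embedding(features: np.ndarray, spk_embedding: np.ndarray) -> np.ndarray:
    """Concatenate one speaker's embedding to every acoustic frame.

    features:      (num_frames, feat_dim), e.g. log-mel filterbank frames
    spk_embedding: (embed_dim,), a single vector characterizing the speaker
    returns:       (num_frames, feat_dim + embed_dim), the augmented encoder input
    """
    tiled = np.tile(spk_embedding, (features.shape[0], 1))
    return np.concatenate([features, tiled], axis=1)

# Hypothetical shapes: 300 frames of 80-dim features plus a 192-dim speaker vector.
frames = np.zeros((300, 80))
speaker_vector = np.ones(192)
print(append_speaker_embedding(frames, speaker_vector).shape)  # (300, 272)
```
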
3

Liu, Elaine M., Jih-Wei Yeh, Jen-Hao Lu, and Yi-Wen Liu. "Speaker embedding space cosine similarity comparisons of singing voice conversion models and voice morphing." Journal of the Acoustical Society of America 154, no. 4_supplement (2023): A244. http://dx.doi.org/10.1121/10.0023424.

Abstract:
We explore the use of cosine similarity between x-vector speaker embeddings as an objective metric to evaluate the effectiveness of singing voice conversion. Our system preprocesses a source singer’s audio to obtain melody features via the F0 contour, loudness curve, and phonetic posteriorgram. These are input to a denoising diffusion probabilistic acoustic model conditioned with another target voice’s speaker embedding to generate a mel spectrogram, which is passed through a HiFi-GAN vocoder to synthesize audio of the source song in the target timbre. We use cosine similarity between the conv
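
The objective metric mentioned in the abstract, cosine similarity between speaker embeddings, reduces to a one-line computation. The sketch below assumes the two embeddings (for example, x-vectors of the converted audio and of reference audio from the target singer) have already been extracted; the vectors here are random stand-ins.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two speaker embeddings; closer to 1.0 means more similar."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Placeholder 512-dim embeddings standing in for extracted x-vectors.
emb_converted = np.random.default_rng(1).normal(size=512)
emb_target = np.random.default_rng(2).normal(size=512)

print(f"similarity to target voice: {cosine_similarity(emb_converted, emb_target):.3f}")
```
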
4

Pick, Ron Korenblum, Vladyslav Kozhukhov, Dan Vilenchik, and Oren Tsur. "STEM: Unsupervised STructural EMbedding for Stance Detection." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 10 (2022): 11174–82. http://dx.doi.org/10.1609/aaai.v36i10.21367.

Abstract:
Stance detection is an important task, supporting many downstream tasks such as discourse parsing and modeling the propagation of fake news, rumors, and science denial. In this paper, we propose a novel framework for stance detection. Our framework is unsupervised and domain-independent. Given a claim and a multi-participant discussion, we construct the interaction network from which we derive a topological embedding for each speaker. These speaker embeddings enjoy the following property: speakers with the same stance tend to be represented by similar vectors, while antipodal vectors represent
5

Karamyan, Davit S., and Grigor A. Kirakosyan. "Building a Speaker Diarization System: Lessons from VoxSRC 2023." Mathematical Problems of Computer Science 60 (November 30, 2023): 52–62. http://dx.doi.org/10.51408/1963-0109.

Abstract:
Speaker diarization is the process of partitioning an audio recording into segments corresponding to individual speakers. In this paper, we present a robust speaker diarization system and describe its architecture. We focus on discussing the key components necessary for building a strong diarization system, such as voice activity detection (VAD), speaker embedding, and clustering. Our system emerged as the winner in the Voxceleb Speaker Recognition Challenge (VoxSRC) 2023, a widely recognized competition for evaluating speaker diarization systems.
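
The clustering stage of the pipeline sketched in the abstract (VAD, embedding extraction, clustering) can be illustrated with standard hierarchical clustering over segment embeddings. This is a generic sketch under assumed inputs, not the authors' winning VoxSRC 2023 configuration: the embeddings are random placeholders and the distance threshold is arbitrary.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

def cluster_segments(segment_embeddings: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Group VAD segments by speaker via agglomerative clustering of their embeddings."""
    distances = pdist(segment_embeddings, metric="cosine")    # pairwise cosine distances
    tree = linkage(distances, method="average")               # average-linkage hierarchy
    return fcluster(tree, t=threshold, criterion="distance")  # cut the tree into speaker labels

# Placeholder embeddings for 12 speech segments (one 192-dim vector per segment).
labels = cluster_segments(np.random.default_rng(0).normal(size=(12, 192)))
print(labels)  # segments sharing a label are attributed to the same speaker
```
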
6

Milewski, Krzysztof, Szymon Zaporowski, and Andrzej Czyżewski. "Comparison of the Ability of Neural Network Model and Humans to Detect a Cloned Voice." Electronics 12, no. 21 (2023): 4458. http://dx.doi.org/10.3390/electronics12214458.

Abstract:
The vulnerability of the speaker identity verification system to attacks using voice cloning was examined. The research project assumed creating a model for verifying the speaker’s identity based on voice biometrics and then testing its resistance to potential attacks using voice cloning. The Deep Speaker Neural Speaker Embedding System was trained, and the Real-Time Voice Cloning system was employed based on the SV2TTS, Tacotron, WaveRNN, and GE2E neural networks. The results of attacks using voice cloning were analyzed and discussed in the context of a subjective assessment of cloned voice f
7

Kang, Woo Hyun, Sung Hwan Mun, Min Hyun Han, and Nam Soo Kim. "Disentangled Speaker and Nuisance Attribute Embedding for Robust Speaker Verification." IEEE Access 8 (2020): 141838–49. http://dx.doi.org/10.1109/access.2020.3012893.

8

Poojary, Nigam R., and K. H. Ashish. "Text To Speech with Custom Voice." International Journal for Research in Applied Science and Engineering Technology 11, no. 4 (2023): 4523–30. http://dx.doi.org/10.22214/ijraset.2023.51217.

Abstract: The Text to Speech with Custom Voice system described in this work has vast applicability in numerous industries, including entertainment, education, and accessibility. The proposed text-to-speech (TTS) system is capable of generating speech audio in custom voices, even those not included in the training data. The system comprises a speaker encoder, a synthesizer, and a WaveRNN vocoder. Multiple speakers from a dataset of clean speech without transcripts are used to train the speaker encoder for a speaker verification process. The reference speech of the target speaker is used to cre
9

Lee, Kong Aik, Qiongqiong Wang, and Takafumi Koshinaka. "Xi-Vector Embedding for Speaker Recognition." IEEE Signal Processing Letters 28 (2021): 1385–89. http://dx.doi.org/10.1109/lsp.2021.3091932.

10

Sečujski, Milan, Darko Pekar, Siniša Suzić, Anton Smirnov, and Tijana Nosek. "Speaker/Style-Dependent Neural Network Speech Synthesis Based on Speaker/Style Embedding." JUCS - Journal of Universal Computer Science 26, no. 4 (2020): 434–53. http://dx.doi.org/10.3897/jucs.2020.023.

Abstract:
The paper presents a novel architecture and method for training neural networks to produce synthesized speech in a particular voice and speaking style, based on a small quantity of target speaker/style training data. The method is based on neural network embedding, i.e. mapping of discrete variables into continuous vectors in a low-dimensional space, which has been shown to be a very successful universal deep learning technique. In this particular case, different speaker/style combinations are mapped into different points in a low-dimensional space, which enables the network to capture the sim
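
The embedding idea the abstract relies on, mapping discrete speaker/style combinations to points in a continuous low-dimensional space learned jointly with the network, is illustrated below with a minimal PyTorch lookup module. The sizes and the flattened-index scheme are invented for the example; the paper's actual synthesis architecture is not reproduced.

```python
import torch
import torch.nn as nn

class SpeakerStyleEmbedding(nn.Module):
    """Maps a discrete (speaker, style) combination to a trainable low-dimensional vector."""

    def __init__(self, num_speakers: int = 10, num_styles: int = 4, dim: int = 16):
        super().__init__()
        self.num_styles = num_styles
        # One table row per (speaker, style) combination, trained with the rest of the network.
        self.table = nn.Embedding(num_speakers * num_styles, dim)

    def forward(self, speaker_id: torch.Tensor, style_id: torch.Tensor) -> torch.Tensor:
        # Flatten the pair of discrete indices into a single row index.
        return self.table(speaker_id * self.num_styles + style_id)

embedder = SpeakerStyleEmbedding()
vector = embedder(torch.tensor([3]), torch.tensor([1]))  # speaker 3, speaking style 1
print(vector.shape)  # torch.Size([1, 16])
```
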
More sources

Dissertations on the topic "Speaker embedding"

1

Cui, Ming. "Experiments in speaker diarization using speaker vectors." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-292217.

Abstract:
Speaker Diarization is the task of determining ‘who spoke when?’ in an audio or video recording that contains an unknown amount of speech and also an unknown number of speakers. It has emerged as an increasingly important and dedicated domain of speech research. Initially, it was proposed as a research topic related to automatic speech recognition, where speaker diarization serves as an upstream processing step. Over recent years, however, speaker diarization has become an important key technology for many tasks, such as navigation, retrieval, or higher-level inference on audio data. Our resea
2

Lukáč, Peter. "Verifikace osob podle hlasu bez extrakce příznaků." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2021. http://www.nusl.cz/ntk/nusl-445531.

Abstract:
Speaker verification is a field that keeps being modernized and improved in order to meet the demands placed on it in application areas such as authorization systems, forensic analysis, and so on. The improvements are driven by advances in deep learning, by the creation of new training and test datasets, and by various speaker verification challenges and workshops. In this work we examine models for speaker verification without feature extraction. Using raw audio tracks as model inputs simplifies input processing and thus reduces the computational and memory requirements and the number of hyp
3

Fahlström, Myrman Arvid. "Increasing speaker invariance in unsupervised speech learning by partitioning probabilistic models using linear siamese networks." Thesis, KTH, Tal, musik och hörsel, TMH, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-210237.

Abstract:
Unsupervised learning of speech is concerned with automatically finding patterns such as words or speech sounds, without supervision in the form of orthographical transcriptions or a priori knowledge of the language. However, a fundamental problem is that unsupervised speech learning methods tend to discover highly speaker-specific and context-dependent representations of speech. We propose a method for improving the quality of posteriorgrams generated from an unsupervised model through partitioning of the latent classes discovered by the model. We do this by training a sparse siamese model to
4

Chung-KoYin and 尹崇珂. "Addressee Selection and Deep RL-based Dialog Act Selection with Speaker Embedding and Context Tracking for Multi-Party Conversational Systems." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/h5smz2.

5

Che-ChingHuang and 黃喆青. "Speaker Change Detection using Speaker and Articulatory Feature Embeddings." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/ge4d25.

Abstract:
Master's thesis, National Cheng Kung University, Department of Computer Science and Information Engineering, 107. Nowadays, with the improvement and advancement of many related technologies for voice processing, voice-interactive software and products have become more and more popular. For multi-speaker dialogue audio, speaker change point detection is needed as a pre-processing step before further analysis and processing can be carried out. Most past research on speaker change point detection has been based on acoustic features alone. The method proposed in this thesis is to provide the speaker inf
6

Wang, Xiaoyan. "An exploration of embedding intercultural knowledge to engage students in Chinese language learning : a bilingual beginning teacher's Xingzhi/action research project." Thesis, 2016. http://hdl.handle.net/1959.7/uws:41062.

Abstract:
As a participant in the ROSETE program (a partnership between the Ningbo Education Bureau, the University of Western Sydney and the NSW Department of Education and Communities), the teacher-researcher undertook study for a Master of Education degree while at the same time undertaking a Chinese language teaching assignment in a western Sydney public school. The teaching assignment was the context for the research which sought to explore the question: How can a teacher-researcher, implementing teacher Xingzhi/action research, design Chinese lessons through embedding intercultural knowledge to

Books on the topic "Speaker embedding"

1

Camp, Elisabeth. A Dual Act Analysis of Slurs. Oxford University Press, 2018. http://dx.doi.org/10.1093/oso/9780198758655.003.0003.

Abstract:
Slurs are incendiary terms—many deny that sentences containing them can ever be true. And utterances where they occur embedded within normally “quarantining” contexts, like conditionals and indirect reports, can still seem offensive. At the same time, others find that sentences containing slurs can be true; and there are clear cases where embedding does inoculate a speaker from the slur’s offensiveness. This chapter argues that four standard accounts of the “other” element that differentiates slurs from their more neutral counterparts—semantic content, perlocutionary effect, presupposition, an

Book chapters on the topic "Speaker embedding"

1

Karam, Z. N., and W. M. Campbell. "Graph Embedding for Speaker Recognition." In Graph Embedding for Pattern Analysis. Springer New York, 2012. http://dx.doi.org/10.1007/978-1-4614-4457-2_10.

2

Zhou, Kai, Qun Yang, Xiusong Sun, and Shaohan Liu. "A Deep Speaker Embedding Transfer Method for Speaker Verification." In Advances in Natural Computation, Fuzzy Systems and Knowledge Discovery. Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-32456-8_40.

3

Qi, Jiajun, Wu Guo, Jingjing Shi, Yafeng Chen, and Tan Liu. "Combining Universal Speech Attributes into Deep Speaker Embedding Extraction for Speaker Verification." In Advances in Natural Computation, Fuzzy Systems and Knowledge Discovery. Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-030-89698-0_110.

4

Alam, Jahangir, Woohyun Kang, and Abderrahim Fathan. "Neural Embedding Extractors for Text-Independent Speaker Verification." In Speech and Computer. Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-20980-2_2.

5

Zhou, Dao, Longbiao Wang, Kong Aik Lee, Meng Liu, and Jianwu Dang. "Deep Discriminative Embedding with Ranked Weight for Speaker Verification." In Communications in Computer and Information Science. Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-63823-8_10.

6

Weizman, Avishai, Yehuda Ben-Shimol, and Itshak Lapidot. "Spoofing-Robust Speaker Verification Based on Time-Domain Embedding." In Lecture Notes in Computer Science. Springer Nature Switzerland, 2024. https://doi.org/10.1007/978-3-031-76934-4_4.

7

Amani, Arash, Mohammad Mohammadamini, and Hadi Veisi. "Kurdish Spoken Dialect Recognition Using X-Vector Speaker Embedding." In Speech and Computer. Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-87802-3_5.

8

Hamouda, Meriem, and Halima Bahi. "Feature Embedding Representation for Unsupervised Speaker Diarization in Telephone Calls." In Communications in Computer and Information Science. Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-46335-8_16.

9

Novoselov, Sergey, Galina Lavrentyeva, Vladimir Volokhov, Marina Volkova, Nikita Khmelev, and Artem Akulov. "Investigation of Different Calibration Methods for Deep Speaker Embedding Based Verification Systems." In Speech and Computer. Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-48309-7_13.

10

Chen, Xiaojiao, Sheng Li, and Hao Huang. "GhostVec: Directly Extracting Speaker Embedding from End-to-End Speech Recognition Model Using Adversarial Examples." In Communications in Computer and Information Science. Springer Nature Singapore, 2023. http://dx.doi.org/10.1007/978-981-99-1645-0_40.


Conference papers on the topic "Speaker embedding"

1

Horiguchi, Shota, Takafumi Moriya, Atsushi Ando, et al. "Guided Speaker Embedding." In ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2025. https://doi.org/10.1109/icassp49660.2025.10887711.

2

Jin, Zezhong, Youzhi Tu, and Man-Wai Mak. "Joseph: phonetic-aware speaker embedding for far-field speaker verification." In 2024 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC). IEEE, 2024. https://doi.org/10.1109/apsipaasc63619.2025.10849338.

3

Clarke, Jason, Yoshihiko Gotoh, and Stefan Goetze. "Speaker Embedding Informed Audiovisual Active Speaker Detection for Egocentric Recordings." In ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2025. https://doi.org/10.1109/icassp49660.2025.10890414.

4

Wang, Yichi, Jie Zhang, Chengqian Jiang, Weitai Zhang, Zhongyi Ye, and Lirong Dai. "Leveraging Boolean Directivity Embedding for Binaural Target Speaker Extraction." In ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2025. https://doi.org/10.1109/icassp49660.2025.10888158.

5

Li, Lantian, Chao Xing, Dong Wang, Kaimin Yu, and Thomas Fang Zheng. "Binary speaker embedding." In 2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP). IEEE, 2016. http://dx.doi.org/10.1109/iscslp.2016.7918381.

6

Yi, Lu, and Man-Wai Mak. "Disentangled Speaker Embedding for Robust Speaker Verification." In ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2022. http://dx.doi.org/10.1109/icassp43922.2022.9747778.

7

Kottur, Satwik, Xiaoyu Wang, and Vitor Carvalho. "Exploring Personalized Neural Conversational Models." In Twenty-Sixth International Joint Conference on Artificial Intelligence. International Joint Conferences on Artificial Intelligence Organization, 2017. http://dx.doi.org/10.24963/ijcai.2017/521.

Abstract:
Modeling dialog systems is currently one of the most active problems in Natural Language Processing. Recent advancement in Deep Learning has sparked an interest in the use of neural networks in modeling language, particularly for personalized conversational agents that can retain contextual information during dialog exchanges. This work carefully explores and compares several of the recently proposed neural conversation models, and carries out a detailed evaluation on the multiple factors that can significantly affect predictive performance, such as pretraining, embedding training, data cleani
8

Jung, Jee-Weon, Ju-Ho Kim, Hye-Jin Shim, Seung-bin Kim, and Ha-Jin Yu. "Selective Deep Speaker Embedding Enhancement for Speaker Verification." In Odyssey 2020 The Speaker and Language Recognition Workshop. ISCA, 2020. http://dx.doi.org/10.21437/odyssey.2020-25.

9

Chen, Chia-Ping, Su-Yu Zhang, Chih-Ting Yeh, Jia-Ching Wang, Tenghui Wang, and Chien-Lin Huang. "Speaker Characterization Using TDNN-LSTM Based Speaker Embedding." In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2019. http://dx.doi.org/10.1109/icassp.2019.8683185.

10

Han, Min Hyun, Woo Hyun Kang, Sung Hwan Mun, and Nam Soo Kim. "Information Preservation Pooling for Speaker Embedding." In Odyssey 2020 The Speaker and Language Recognition Workshop. ISCA, 2020. http://dx.doi.org/10.21437/odyssey.2020-9.
