A selection of scholarly literature on the topic "Wav2vec"

Consult the lists of relevant articles, books, dissertations, conference papers, and other scholarly sources on the topic "Wav2vec".

Next to every source in the list of references there is an "Add to bibliography" button. Press it, and we will automatically compile the bibliographic reference to the chosen source in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of a publication as a .pdf file and read its abstract online, whenever these are available in the source's metadata.

Journal articles on the topic "Wav2vec":

1. Kolesau, Aliaksei, and Dmitrij Šešok. "Unsupervised Pre-Training for Voice Activation." Applied Sciences 10, no. 23 (December 3, 2020): 8643. http://dx.doi.org/10.3390/app10238643.

Abstract:
The problem of voice activation is to find a pre-defined word in an audio stream. Solutions such as the "Ok, Google" keyword spotter for Android devices or the "Alexa" keyword spotter for Amazon devices use tens of thousands to millions of keyword examples in training. In this paper, we explore the possibility of using pre-trained audio features to build voice activation with a small number of keyword examples. The contribution of this article consists of two parts. First, we investigate how the quality of the voice activation system depends on the number of training examples for English and Russian, and show that using pre-trained audio features, such as wav2vec, increases the accuracy of the system by up to 10% when only seven examples are available per keyword during training. At the same time, the benefit of such features diminishes and eventually disappears as the dataset size increases. Second, we prepare and release for general use a dataset for training and testing voice activation for the Lithuanian language, and we report training results on this dataset.
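As a companion to the abstract above, here is a minimal sketch of the frozen-feature approach it describes: a pre-trained wav2vec-style encoder supplies clip-level embeddings, and a small classifier is trained on only a handful of labelled keyword clips. The torchaudio wav2vec 2.0 bundle stands in for the wav2vec features used in the paper; the linear head, keyword count, and file paths are illustrative assumptions, not the authors' exact setup.

```python
# Hedged sketch: frozen wav2vec 2.0 features for few-shot keyword classification.
# The torchaudio bundle is a stand-in for the paper's wav2vec features; the linear
# head, number of keywords, and the (commented) training loop are illustrative.
import torch
import torchaudio

bundle = torchaudio.pipelines.WAV2VEC2_BASE
model = bundle.get_model().eval()              # pre-trained encoder, kept frozen

def keyword_embedding(path: str) -> torch.Tensor:
    waveform, sr = torchaudio.load(path)
    if sr != bundle.sample_rate:
        waveform = torchaudio.functional.resample(waveform, sr, bundle.sample_rate)
    with torch.no_grad():
        features, _ = model.extract_features(waveform)
    # Mean-pool the last transformer layer over time -> one vector per clip.
    return features[-1].mean(dim=1).squeeze(0)

num_keywords = 7                                 # illustrative
classifier = torch.nn.Linear(768, num_keywords)  # 768 = hidden size of the base model
optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-3)

# Training loop over the few labelled clips (paths and labels are placeholders):
# for path, label in labelled_clips:
#     logits = classifier(keyword_embedding(path))
#     loss = torch.nn.functional.cross_entropy(logits.unsqueeze(0), torch.tensor([label]))
#     optimizer.zero_grad(); loss.backward(); optimizer.step()
```

Because the encoder stays frozen, only the tiny linear head is learned, which is what makes training with a few examples per keyword feasible.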
2. Tong, Haonan, Zhaohui Yang, Sihua Wang, Ye Hu, Omid Semiari, Walid Saad, and Changchuan Yin. "Federated Learning for Audio Semantic Communication." Frontiers in Communications and Networks 2 (September 10, 2021). http://dx.doi.org/10.3389/frcmn.2021.734402.

Abstract:
In this paper, the problem of audio semantic communication over wireless networks is investigated. In the considered model, wireless edge devices transmit large audio files to a server using semantic communication techniques, which allow devices to transmit only the audio semantic information that captures the contextual features of the audio signals. To extract the semantic information from audio signals, an autoencoder based on the wave-to-vector (wav2vec) architecture is proposed, consisting of convolutional neural networks (CNNs). The proposed autoencoder enables high-accuracy audio transmission with small amounts of data. To further improve the accuracy of semantic information extraction, federated learning (FL) is implemented over multiple devices and a server. Simulation results show that the proposed algorithm converges effectively and reduces the mean squared error (MSE) of audio transmission by nearly 100 times compared to a traditional coding scheme.
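The abstract above combines a convolutional autoencoder with federated learning across edge devices. A minimal sketch of one federated-averaging round follows; the toy 1-D autoencoder, the three simulated clients, and the random audio batches are illustrative assumptions, not the architecture from the paper.

```python
# Hedged sketch of one federated-averaging round for an audio autoencoder.
# The tiny 1-D convolutional autoencoder and the simulated clients are illustrative.
import copy
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    """Toy stand-in for a wav2vec-style convolutional autoencoder."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv1d(1, 16, 8, stride=8), nn.ReLU(),
                                     nn.Conv1d(16, 32, 4, stride=4), nn.ReLU())
        self.decoder = nn.Sequential(nn.ConvTranspose1d(32, 16, 4, stride=4), nn.ReLU(),
                                     nn.ConvTranspose1d(16, 1, 8, stride=8))

    def forward(self, x):
        return self.decoder(self.encoder(x))

def federated_average(client_models):
    """Server step: average the clients' parameters into one global state dict."""
    global_state = copy.deepcopy(client_models[0].state_dict())
    for key in global_state:
        global_state[key] = torch.stack(
            [m.state_dict()[key].float() for m in client_models]).mean(dim=0)
    return global_state

# One communication round with three simulated edge devices.
clients = [ConvAutoencoder() for _ in range(3)]
for client in clients:
    optimizer = torch.optim.Adam(client.parameters(), lr=1e-3)
    audio = torch.randn(4, 1, 16_000)           # stand-in for a local audio batch
    loss = nn.functional.mse_loss(client(audio), audio)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

global_model = ConvAutoencoder()
global_model.load_state_dict(federated_average(clients))
```

Only model weights (not raw audio) leave the devices in each round, which is the point of the federated setup described in the abstract.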

Dissertations on the topic "Wav2vec":

1. Bakheet, Mohammed. "Improving Speech Recognition for Arabic language Using Low Amounts of Labeled Data." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-176437.

Abstract:
The importance of Automatic Speech Recognition (ASR) systems, whose job is to generate text from audio, is increasing as the number of applications of these systems grows rapidly. However, training ASR systems is difficult and rather tedious, which can be attributed to the lack of training data. ASR systems require huge amounts of annotated training data containing audio files and the corresponding, accurately written transcript files. Such annotated (labeled) training data is very difficult to find for most languages; it usually requires manual annotation, which, apart from its monetary cost, is error-prone. A purely supervised training task is impractical in this scenario. Arabic is one of the languages that lack an abundance of labeled data, which makes the accuracy of its ASR systems very low compared to resource-rich languages such as English, French, or Spanish. In this research, we take advantage of unlabeled voice data by learning general data representations from unlabeled training data (audio files only) in a self-supervised pre-training phase. This phase uses the wav2vec 2.0 framework, which masks the input in the latent space and solves a contrastive task. The model is then fine-tuned on a small amount of labeled data. We also exploit wav2vec 2.0 models that have been pre-trained on different languages and fine-tune them on Arabic using annotated Arabic data. We show that using the wav2vec 2.0 framework for pre-training on Arabic is considerably time- and resource-consuming: it took the model 21.5 days (about 3 weeks) to complete 662 epochs and reach a validation accuracy of 58%. Arabic is a right-to-left (RTL) language with many diacritics that indicate how letters should be pronounced; these two features make it difficult to fit Arabic into these models, as heavy pre-processing of the transcript files is required. We demonstrate that we can fine-tune a cross-lingual model, trained on raw speech waveforms in multiple languages, on Arabic data and obtain a low word error rate of 36.53%. We also show that by further tuning the model parameters we can increase the accuracy, thereby decreasing the word error rate from 54.00% to 36.69%.
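For readers who want to see what the cross-lingual fine-tuning step described in this thesis looks like in practice, here is a minimal sketch following the common Hugging Face recipe: a CTC head is added to a multilingual wav2vec 2.0 checkpoint, which is then fine-tuned on labelled Arabic speech. The checkpoint name is a real public model, but vocab.json, the prepared train/eval datasets, the omitted padding collator, and all hyperparameters are placeholders rather than the exact setup from the thesis.

```python
# Hedged sketch: fine-tuning a cross-lingual wav2vec 2.0 checkpoint on Arabic with CTC.
# vocab.json, train_dataset/eval_dataset, and the hyperparameters are placeholders.
from transformers import (Trainer, TrainingArguments, Wav2Vec2CTCTokenizer,
                          Wav2Vec2FeatureExtractor, Wav2Vec2ForCTC, Wav2Vec2Processor)

tokenizer = Wav2Vec2CTCTokenizer("vocab.json", unk_token="[UNK]",
                                 pad_token="[PAD]", word_delimiter_token="|")
feature_extractor = Wav2Vec2FeatureExtractor(feature_size=1, sampling_rate=16_000,
                                             padding_value=0.0, do_normalize=True,
                                             return_attention_mask=True)
processor = Wav2Vec2Processor(feature_extractor=feature_extractor, tokenizer=tokenizer)

model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-large-xlsr-53",          # multilingual pre-trained checkpoint
    ctc_loss_reduction="mean",
    pad_token_id=processor.tokenizer.pad_token_id,
    vocab_size=len(processor.tokenizer),        # new CTC head sized to the Arabic vocab
)
model.freeze_feature_encoder()                  # keep the convolutional front end fixed

training_args = TrainingArguments(output_dir="wav2vec2-xlsr-arabic",
                                  per_device_train_batch_size=8, learning_rate=3e-4,
                                  num_train_epochs=30, fp16=True,
                                  save_steps=500, logging_steps=100)

trainer = Trainer(model=model, args=training_args,
                  train_dataset=train_dataset,  # prepared labelled Arabic split (placeholder)
                  eval_dataset=eval_dataset,    # placeholder
                  # a CTC padding data collator is omitted here for brevity
                  tokenizer=processor.feature_extractor)
trainer.train()
```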
2. Zouhair, Taha. "Automatic Speech Recognition for low-resource languages using Wav2Vec2 : Modern Standard Arabic (MSA) as an example of a low-resource language." Thesis, Högskolan Dalarna, Institutionen för information och teknik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:du-37702.

Abstract:
The need for fully automatic translation at DigitalTolk, a Stockholm-based company providing translation services, led to exploring Automatic Speech Recognition as a first step for Modern Standard Arabic (MSA). Facebook AI recently released a second version of its Wav2Vec models, dubbed Wav2Vec 2.0, which uses deep neural networks and provides several English pretrained models along with a multilingual model trained on 53 different languages, referred to as the Cross-Lingual Speech Representation (XLSR-53). The small English and the XLSR-53 pretrained models were tested on Arabic data from Mozilla Common Voice, and the resulting findings are discussed. In this research, the small model did not yield any results and may have needed more unlabelled data for training, whereas the large model proved successful in predicting the Arabic audio recordings, achieving a Word Error Rate of 24.40%, an unprecedented result. The small model turned out to be unsuitable for training, especially on languages other than English and where unlabelled data is scarce. The large model, on the other hand, gave very promising results despite the small amount of data and should be the model of choice for any future training on low-resource languages such as Arabic.
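Both theses above report Word Error Rate (WER) as their main metric. Below is a minimal inference-and-scoring sketch with Hugging Face transformers and the jiwer package; "my-org/wav2vec2-xlsr-arabic" is a hypothetical fine-tuned checkpoint, and the audio array and reference transcript are placeholders.

```python
# Hedged sketch: transcribe one clip with a fine-tuned wav2vec 2.0 CTC model and
# score it with WER. "my-org/wav2vec2-xlsr-arabic" is a hypothetical checkpoint.
import torch
from jiwer import wer
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("my-org/wav2vec2-xlsr-arabic")  # placeholder
model = Wav2Vec2ForCTC.from_pretrained("my-org/wav2vec2-xlsr-arabic").eval()  # placeholder

def transcribe(audio_array, sampling_rate=16_000):
    """Greedy CTC decoding of a single 16 kHz mono waveform (1-D float array)."""
    inputs = processor(audio_array, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]

# audio_array and reference would come from, e.g., a Common Voice test example (placeholders).
hypothesis = transcribe(audio_array)
print("WER:", wer(reference, hypothesis))
```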

Conference papers on the topic "Wav2vec":

1. Xu, Xiaoshuo, Yueteng Kang, Songjun Cao, Binghuai Lin, and Long Ma. "Explore wav2vec 2.0 for Mispronunciation Detection." In Interspeech 2021. ISCA: ISCA, 2021. http://dx.doi.org/10.21437/interspeech.2021-777.

2. Schneider, Steffen, Alexei Baevski, Ronan Collobert, and Michael Auli. "wav2vec: Unsupervised Pre-Training for Speech Recognition." In Interspeech 2019. ISCA: ISCA, 2019. http://dx.doi.org/10.21437/interspeech.2019-1873.

3. Pepino, Leonardo, Pablo Riera, and Luciana Ferrer. "Emotion Recognition from Speech Using wav2vec 2.0 Embeddings." In Interspeech 2021. ISCA: ISCA, 2021. http://dx.doi.org/10.21437/interspeech.2021-703.

4. Gris, Lucas Rafael Stefanel, Edresson Casanova, Frederico Santos de Oliveira, Anderson da Silva Soares, and Arnaldo Candido-Junior. "Desenvolvimento de um modelo de reconhecimento de voz para o Português Brasileiro com poucos dados utilizando o Wav2vec 2.0." In Brazilian e-Science Workshop. Sociedade Brasileira de Computação, 2021. http://dx.doi.org/10.5753/bresci.2021.15798.

Abstract (translated from Portuguese):
Deep learning techniques have proven to be very effective in a wide variety of tasks, in particular in the development of speech recognition systems. Despite progress in the area, their development can still be considered a difficult task, especially for languages with little openly available data, such as Brazilian Portuguese. Given this limitation, Wav2vec 2.0, an architecture that removes the need for large amounts of labeled audio, can be an interesting alternative. In this context, the goal of this work is to evaluate the development of a speech recognizer using little freely available data, by fine-tuning a Wav2vec 2.0 model pre-trained on many languages. This work shows that it is possible to build a speech recognition system for Brazilian Portuguese using only 1 h of transcribed speech. The fine-tuned model achieves a WER of only 34% on the Common Voice dataset.
5. Xie, Yang, Zhenchuan Zhang, and Yingchun Yang. "Siamese Network with wav2vec Feature for Spoofing Speech Detection." In Interspeech 2021. ISCA: ISCA, 2021. http://dx.doi.org/10.21437/interspeech.2021-847.

6. Fan, Zhiyun, Meng Li, Shiyu Zhou, and Bo Xu. "Exploring wav2vec 2.0 on Speaker Verification and Language Identification." In Interspeech 2021. ISCA: ISCA, 2021. http://dx.doi.org/10.21437/interspeech.2021-1280.

7. Sadhu, Samik, Di He, Che-Wei Huang, Sri Harish Mallidi, Minhua Wu, Ariya Rastrow, Andreas Stolcke, Jasha Droppo, and Roland Maas. "wav2vec-C: A Self-Supervised Model for Speech Representation Learning." In Interspeech 2021. ISCA: ISCA, 2021. http://dx.doi.org/10.21437/interspeech.2021-717.

8. Hsu, Wei-Ning, Anuroop Sriram, Alexei Baevski, Tatiana Likhomanenko, Qiantong Xu, Vineel Pratap, Jacob Kahn, et al. "Robust wav2vec 2.0: Analyzing Domain Shift in Self-Supervised Pre-Training." In Interspeech 2021. ISCA: ISCA, 2021. http://dx.doi.org/10.21437/interspeech.2021-236.

9. Vyas, Apoorv, Srikanth Madikeri, and Hervé Bourlard. "Comparing CTC and LFMMI for Out-of-Domain Adaptation of wav2vec 2.0 Acoustic Model." In Interspeech 2021. ISCA: ISCA, 2021. http://dx.doi.org/10.21437/interspeech.2021-1683.

10. Zhu, Youxiang, Abdelrahman Obyat, Xiaohui Liang, John A. Batsis, and Robert M. Roth. "WavBERT: Exploiting Semantic and Non-Semantic Speech Using Wav2vec and BERT for Dementia Detection." In Interspeech 2021. ISCA: ISCA, 2021. http://dx.doi.org/10.21437/interspeech.2021-332.
