Academic literature on the topic 'Wikitext-103'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Wikitext-103.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Journal articles on the topic "Wikitext-103"

1. Takase, Sho, Jun Suzuki, and Masaaki Nagata. "Character n-Gram Embeddings to Improve RNN Language Models." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 5074–82. http://dx.doi.org/10.1609/aaai.v33i01.33015074.

Abstract:
This paper proposes a novel Recurrent Neural Network (RNN) language model that takes advantage of character information. We focus on character n-grams based on research in the field of word embedding construction (Wieting et al. 2016). Our proposed method constructs word embeddings from character n-gram embeddings and combines them with ordinary word embeddings. We demonstrate that the proposed method achieves the best perplexities on the language modeling datasets: Penn Treebank, WikiText-2, and WikiText-103. Moreover, we conduct experiments on application tasks: machine translation and headline…
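
For readers who want a concrete picture, the following is a minimal Python/PyTorch sketch of the idea summarized in the abstract above: build a word representation by summing embeddings of the word's character n-grams and adding it to an ordinary word embedding before feeding it to the language model. The module name, toy vocabularies, dimensions, and the choice of addition as the combination step are illustrative assumptions, not the authors' code.

    import torch
    import torch.nn as nn


    def char_ngrams(word: str, n: int = 3) -> list:
        """Character n-grams of a word, padded with boundary markers."""
        padded = f"<{word}>"
        return [padded[i:i + n] for i in range(len(padded) - n + 1)]


    class CharNgramWordEmbedding(nn.Module):
        """Illustrative: word vector = ordinary embedding + sum of char n-gram embeddings."""

        def __init__(self, word_vocab, ngram_vocab, dim=8):
            super().__init__()
            self.word_vocab, self.ngram_vocab = word_vocab, ngram_vocab
            self.word_emb = nn.Embedding(len(word_vocab), dim)
            self.ngram_emb = nn.Embedding(len(ngram_vocab), dim)

        def forward(self, word: str) -> torch.Tensor:
            w = self.word_emb(torch.tensor(self.word_vocab[word]))
            ids = [self.ngram_vocab[g] for g in char_ngrams(word) if g in self.ngram_vocab]
            c = self.ngram_emb(torch.tensor(ids)).sum(dim=0)
            return w + c  # combined representation fed to the RNN language model


    # Toy usage with a two-word vocabulary.
    words = {"language": 0, "model": 1}
    ngrams = {g: i for i, g in enumerate(sorted({g for w in words for g in char_ngrams(w)}))}
    emb = CharNgramWordEmbedding(words, ngrams)
    print(emb("language").shape)  # torch.Size([8])
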
2. Roy, Aurko, Mohammad Saffar, Ashish Vaswani, and David Grangier. "Efficient Content-Based Sparse Attention with Routing Transformers." Transactions of the Association for Computational Linguistics 9 (February 2021): 53–68. http://dx.doi.org/10.1162/tacl_a_00353.

Abstract:
Self-attention has recently been adopted for a wide range of sequence modeling problems. Despite its effectiveness, self-attention suffers from quadratic computation and memory requirements with respect to sequence length. Successful approaches to reduce this complexity focused on attending to local sliding windows or a small set of locations independent of content. Our work proposes to learn dynamic sparse attention patterns that avoid allocating computation and memory to attend to content unrelated to the query of interest. This work builds upon two lines of research: It combines the modeling…
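
As a rough illustration of the content-based sparsity described in this abstract, the sketch below clusters queries and keys against a shared set of centroids and lets each query attend only to keys routed to the same cluster. The fixed random centroids, NumPy implementation, and absence of causal masking are simplifying assumptions; the paper's method learns the cluster assignments during training rather than using hand-picked centroids.

    import numpy as np


    def softmax(x, axis=-1):
        x = x - x.max(axis=axis, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)


    def routed_attention(q, k, v, centroids):
        """q, k, v: (seq, dim); centroids: (n_clusters, dim). Returns (seq, dim)."""
        q_cluster = np.argmax(q @ centroids.T, axis=-1)  # route each query to a cluster
        k_cluster = np.argmax(k @ centroids.T, axis=-1)  # route each key to a cluster
        out = np.zeros_like(q)
        for c in range(centroids.shape[0]):
            qi = np.flatnonzero(q_cluster == c)
            ki = np.flatnonzero(k_cluster == c)
            if qi.size == 0 or ki.size == 0:
                continue
            scores = q[qi] @ k[ki].T / np.sqrt(q.shape[-1])  # only within-cluster pairs
            out[qi] = softmax(scores) @ v[ki]
        return out


    rng = np.random.default_rng(0)
    seq, dim, n_clusters = 16, 8, 4
    q, k, v = (rng.normal(size=(seq, dim)) for _ in range(3))
    centroids = rng.normal(size=(n_clusters, dim))
    print(routed_attention(q, k, v, centroids).shape)  # (16, 8)
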

Dissertations / Theses on the topic "Wikitext-103"

1. Sagen, Markus. "Large-Context Question Answering with Cross-Lingual Transfer." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-440704.

Abstract:
Models based around the transformer architecture have become one of the most prominent for solving a multitude of natural language processing (NLP) tasks since its introduction in 2017. However, much research related to the transformer model has focused primarily on achieving high performance, and many problems remain unsolved. Two of the most prominent currently are the lack of high-performing non-English pre-trained models and the limited number of words most trained models can incorporate for their context. Solving these problems would make NLP models more suitable for real-world application

Conference papers on the topic "Wikitext-103"

1. Jiang, Nan, Wenge Rong, Min Gao, Yikang Shen, and Zhang Xiong. "Exploration of Tree-based Hierarchical Softmax for Recurrent Language Models." In Twenty-Sixth International Joint Conference on Artificial Intelligence. International Joint Conferences on Artificial Intelligence Organization, 2017. http://dx.doi.org/10.24963/ijcai.2017/271.

Abstract:
Recently, variants of neural networks for computational linguistics have been proposed and successfully applied to neural language modeling and neural machine translation. These neural models can leverage knowledge from massive corpora but they are extremely slow as they predict candidate words from a large vocabulary during training and inference. As an alternative to gradient approximation and softmax with class decomposition, we explore the tree-based hierarchical softmax method and reform its architecture, making it compatible with modern GPUs and introducing a compact tree-based loss function…
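
To make the general tree-based hierarchical softmax idea concrete, here is a small self-contained Python sketch in which a word's probability is the product of sigmoid decisions taken at the internal nodes along its root-to-leaf path, so scoring a word costs time proportional to tree depth rather than vocabulary size. The toy tree, node vectors, and paths are assumptions for illustration, not the architecture proposed in the paper.

    import math


    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))


    def word_probability(hidden, path, node_vectors):
        """hidden: context vector; path: list of (node_id, branch) pairs, where branch
        is +1 for the left child and -1 for the right child at each internal node."""
        prob = 1.0
        for node_id, branch in path:
            score = sum(h * w for h, w in zip(hidden, node_vectors[node_id]))
            prob *= sigmoid(branch * score)  # binary decision at this internal node
        return prob


    # Toy example: a vocabulary of 4 words needs a complete tree with 3 internal nodes.
    node_vectors = {0: [0.1, -0.2], 1: [0.3, 0.0], 2: [-0.1, 0.4]}
    paths = {
        "the": [(0, +1), (1, +1)],
        "cat": [(0, +1), (1, -1)],
        "sat": [(0, -1), (2, +1)],
        "mat": [(0, -1), (2, -1)],
    }
    hidden = [0.5, -0.5]
    probs = {w: word_probability(hidden, p, node_vectors) for w, p in paths.items()}
    print(probs, "sum =", round(sum(probs.values()), 6))  # leaf probabilities sum to 1
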
2. Lin, Ye, Yanyang Li, Tengbo Liu, Tong Xiao, Tongran Liu, and Jingbo Zhu. "Towards Fully 8-bit Integer Inference for the Transformer Model." In Twenty-Ninth International Joint Conference on Artificial Intelligence and Seventeenth Pacific Rim International Conference on Artificial Intelligence (IJCAI-PRICAI-20). International Joint Conferences on Artificial Intelligence Organization, 2020. http://dx.doi.org/10.24963/ijcai.2020/520.

Abstract:
8-bit integer inference, as a promising direction in reducing both the latency and storage of deep neural networks, has made great progress recently. On the other hand, previous systems still rely on 32-bit floating point for certain functions in complex models (e.g., Softmax in Transformer), and make heavy use of quantization and de-quantization. In this work, we show that after a principled modification on the Transformer architecture, dubbed Integer Transformer, an (almost) fully 8-bit integer inference algorithm Scale Propagation could be derived. De-quantization is adopted when necessary…
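
As a simplified illustration of 8-bit integer inference in general, the sketch below quantizes two matrices to int8 with symmetric per-tensor scales, multiplies them with int32 accumulation, and carries the combined scale forward instead of de-quantizing immediately. This is only an assumed, minimal reading of the quantize-and-propagate-scale idea; it is not the paper's Integer Transformer or its Scale Propagation algorithm.

    import numpy as np


    def quantize_int8(x):
        """Symmetric per-tensor quantization: returns int8 values and a float scale."""
        scale = np.abs(x).max() / 127.0 if np.abs(x).max() > 0 else 1.0
        q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
        return q, scale


    def int8_matmul(a, b):
        """Return an int32 product plus the propagated scale (dequantize = product * scale)."""
        qa, sa = quantize_int8(a)
        qb, sb = quantize_int8(b)
        acc = qa.astype(np.int32) @ qb.astype(np.int32)  # integer accumulation
        return acc, sa * sb                              # scales multiply and propagate


    rng = np.random.default_rng(0)
    a, b = rng.normal(size=(4, 8)), rng.normal(size=(8, 3))
    acc, scale = int8_matmul(a, b)
    approx = acc * scale                   # de-quantize only when a float result is needed
    print(np.max(np.abs(approx - a @ b)))  # small quantization error vs. float matmul
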