Academic literature on the topic 'Pre-training corpora'
Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles
Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Pre-training corpora.'
Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the academic publication as a PDF and read its abstract online, whenever these are available in the metadata.
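As a rough illustration of what such style-specific formatting involves, the minimal Python sketch below renders the metadata of one entry from the list (the ERNIE 2.0 paper under "Journal articles") in simplified APA-like and Chicago-like forms. The helper names (format_apa, format_chicago) and the formatting rules are illustrative assumptions, not the generator this service actually uses.

from dataclasses import dataclass
from typing import List


@dataclass
class Reference:
    authors: List[str]   # "Family, Given" order, as in the entries below
    year: int
    title: str
    container: str       # journal or proceedings name
    volume: str
    issue: str
    pages: str
    doi: str


def format_apa(ref: Reference) -> str:
    # Simplified APA-like reference (illustrative only; real APA has more rules).
    if len(ref.authors) > 1:
        authors = ", ".join(ref.authors[:-1]) + ", & " + ref.authors[-1]
    else:
        authors = ref.authors[0]
    return (f"{authors} ({ref.year}). {ref.title}. {ref.container}, "
            f"{ref.volume}({ref.issue}), {ref.pages}. https://doi.org/{ref.doi}")


def format_chicago(ref: Reference) -> str:
    # Simplified Chicago-like bibliography entry (illustrative; real Chicago
    # style inverts only the first author's name and has further rules).
    authors = ref.authors[0] + (", et al" if len(ref.authors) > 1 else "")
    return (f'{authors}. "{ref.title}." {ref.container} {ref.volume}, '
            f"no. {ref.issue} ({ref.year}): {ref.pages}. https://doi.org/{ref.doi}")


# Metadata taken from the ERNIE 2.0 entry listed below (first three authors).
ernie = Reference(
    authors=["Sun, Yu", "Wang, Shuohuan", "Li, Yukun"],
    year=2020,
    title="ERNIE 2.0: A Continual Pre-Training Framework for Language Understanding",
    container="Proceedings of the AAAI Conference on Artificial Intelligence",
    volume="34",
    issue="05",
    pages="8968-75",
    doi="10.1609/aaai.v34i05.6428",
)

print(format_apa(ernie))
print(format_chicago(ernie))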
Journal articles on the topic "Pre-training corpora"
Sun, Yu, Shuohuan Wang, Yukun Li, et al. "ERNIE 2.0: A Continual Pre-Training Framework for Language Understanding." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 05 (2020): 8968–75. http://dx.doi.org/10.1609/aaai.v34i05.6428.
Moodaley, Wayne, and Arnesh Telukdarie. "A Conceptual Framework for Subdomain Specific Pre-Training of Large Language Models for Green Claim Detection." European Journal of Sustainable Development 12, no. 4 (2023): 319. http://dx.doi.org/10.14207/ejsd.2023.v12n4p319.
Liu, Yinhan, Jiatao Gu, Naman Goyal, et al. "Multilingual Denoising Pre-training for Neural Machine Translation." Transactions of the Association for Computational Linguistics 8 (November 2020): 726–42. http://dx.doi.org/10.1162/tacl_a_00343.
Dean, Roger Thornton, and Marcus Thomas Pearce. "Algorithmically-generated Corpora that use Serial Compositional Principles Can Contribute to the Modeling of Sequential Pitch Structure in Non-tonal Music." Empirical Musicology Review 11, no. 1 (2016): 27. http://dx.doi.org/10.18061/emr.v11i1.4900.
Yuan, Sha, Hanyu Zhao, Zhengxiao Du, et al. "WuDaoCorpora: A super large-scale Chinese corpora for pre-training language models." AI Open 2 (2021): 65–68. http://dx.doi.org/10.1016/j.aiopen.2021.06.001.
Kreutzer, Julia, Isaac Caswell, Lisa Wang, et al. "Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets." Transactions of the Association for Computational Linguistics 10 (2022): 50–72. http://dx.doi.org/10.1162/tacl_a_00447.
Qian, Jing, Yong Yue, Katie Atkinson, and Gangmin Li. "Understanding Chinese Moral Stories with Further Pre-Training." International Journal on Natural Language Computing 12, no. 2 (2023): 01–12. http://dx.doi.org/10.5121/ijnlc.2023.12201.
Jiang, Xiaoze, Yaobo Liang, Weizhu Chen, and Nan Duan. "XLM-K: Improving Cross-Lingual Language Model Pre-training with Multilingual Knowledge." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 10 (2022): 10840–48. http://dx.doi.org/10.1609/aaai.v36i10.21330.
Kajiwara, Tomoyuki, Biwa Miura, and Yuki Arase. "Monolingual Transfer Learning via Bilingual Translators for Style-Sensitive Paraphrase Generation." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 05 (2020): 8042–49. http://dx.doi.org/10.1609/aaai.v34i05.6314.
Kryeziu, Labehat, and Visar Shehu. "Pre-Training MLM Using Bert for the Albanian Language." SEEU Review 18, no. 1 (2023): 52–62. http://dx.doi.org/10.2478/seeur-2023-0035.
Dissertations / Theses on the topic "Pre-training corpora"
Ortiz Suárez, Pedro. "A Data-driven Approach to Natural Language Processing for Contemporary and Historical French." PhD diss., Sorbonne Université, 2022. http://www.theses.fr/2022SORUS155.
Books on the topic "Pre-training corpora"
Humphreys, S. C. Kinship in Ancient Athens. Oxford University Press, 2018. http://dx.doi.org/10.1093/oso/9780198788249.001.0001.
Peters, Thomas A. Library Programs Online. ABC-CLIO, LLC, 2009. http://dx.doi.org/10.5040/9798400679216.
Book chapters on the topic "Pre-training corpora"
Mahamoud, Ibrahim Souleiman, Mickaël Coustaty, Aurélie Joseph, Vincent Poulain d’Andecy, and Jean-Marc Ogier. "KAP: Pre-training Transformers for Corporate Documents Understanding." In Document Analysis and Recognition – ICDAR 2023 Workshops. Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-41501-2_5.
Siva Raju, S., and Khushboo Ahire. "Enhancing the Quality of Pre-school Education Through Training of Anganwadi Workers: A CSR Initiative." In Corporate Social Responsibility in India. Springer Singapore, 2017. http://dx.doi.org/10.1007/978-981-10-3902-7_5.
Stevens, Meg, Georgina Kennedy, and Timothy Churches. "Applying and Improving a Publicly Available Medication NER Pipeline in a Clinical Cancer EMR." In Studies in Health Technology and Informatics. IOS Press, 2024. http://dx.doi.org/10.3233/shti231051.
Jiang, Eric P. "Automatic Text Classification from Labeled and Unlabeled Data." In Intelligent Data Analysis for Real-Life Applications. IGI Global, 2012. http://dx.doi.org/10.4018/978-1-4666-1806-0.ch013.
Syed, Mahanazuddin, Shaymaa Al-Shukri, Shorabuddin Syed, et al. "DeIDNER Corpus: Annotation of Clinical Discharge Summary Notes for Named Entity Recognition Using BRAT Tool." In Studies in Health Technology and Informatics. IOS Press, 2021. http://dx.doi.org/10.3233/shti210195.
Revenko, Artem, Victor Mireles, Anna Breit, et al. "Learning Ontology Classes from Text by Clustering Lexical Substitutes Derived from Language Models." In Towards a Knowledge-Aware AI. IOS Press, 2022. http://dx.doi.org/10.3233/ssw220018.
Iyer, Usha. "Introduction." In Dancing Women. Oxford University Press, 2020. http://dx.doi.org/10.1093/oso/9780190938734.003.0001.
Arya, Ali. "Content Description for Face Animation." In Encyclopedia of Information Science and Technology, First Edition. IGI Global, 2005. http://dx.doi.org/10.4018/978-1-59140-553-5.ch096.
Bier, Ada, and Elena Borsetto. "Bisogni e preoccupazioni del corpo docente impegnato in English Medium Instruction (EMI). Una prospettiva italiana post-pandemia." In La linguistica educativa tra ricerca e sperimentazione. Scritti in onore di Carmel Mary Coonan. Fondazione Università Ca’ Foscari, 2023. http://dx.doi.org/10.30687/978-88-6969-683-1/018.
Conference papers on the topic "Pre-training corpora"
Vu, Thuy-Trang, Xuanli He, Gholamreza Haffari, and Ehsan Shareghi. "Koala: An Index for Quantifying Overlaps with Pre-training Corpora." In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. Association for Computational Linguistics, 2023. http://dx.doi.org/10.18653/v1/2023.emnlp-demo.7.
Liu, Zhuang, Degen Huang, Kaiyu Huang, Zhuang Li, and Jun Zhao. "FinBERT: A Pre-trained Financial Language Representation Model for Financial Text Mining." In Twenty-Ninth International Joint Conference on Artificial Intelligence and Seventeenth Pacific Rim International Conference on Artificial Intelligence (IJCAI-PRICAI-20). International Joint Conferences on Artificial Intelligence Organization, 2020. http://dx.doi.org/10.24963/ijcai.2020/622.
Qian, Jing, Yong Yue, Katie Atkinson, and Gangmin Li. "Knowledge-Enriched Moral Understanding upon Continual Pre-training." In 10th International Conference on Computer Networks & Communications (CCNET 2023). Academy and Industry Research Collaboration Center (AIRCC), 2023. http://dx.doi.org/10.5121/csit.2023.130414.
Lu, Jinliang, Yu Lu, and Jiajun Zhang. "Take a Closer Look at Multilinguality! Improve Multilingual Pre-Training Using Monolingual Corpora Only." In Findings of the Association for Computational Linguistics: EMNLP 2023. Association for Computational Linguistics, 2023. http://dx.doi.org/10.18653/v1/2023.findings-emnlp.190.
Wang, Xin'ao, Huan Li, Ke Chen, and Lidan Shou. "FedBFPT: An Efficient Federated Learning Framework for Bert Further Pre-training." In Thirty-Second International Joint Conference on Artificial Intelligence (IJCAI-23). International Joint Conferences on Artificial Intelligence Organization, 2023. http://dx.doi.org/10.24963/ijcai.2023/483.
Qu, Yuanbin, Peihan Liu, Wei Song, Lizhen Liu, and Miaomiao Cheng. "A Text Generation and Prediction System: Pre-training on New Corpora Using BERT and GPT-2." In 2020 IEEE 10th International Conference on Electronics Information and Emergency Communication (ICEIEC). IEEE, 2020. http://dx.doi.org/10.1109/iceiec49280.2020.9152352.
Zan, Daoguang, Bei Chen, Dejian Yang, et al. "CERT: Continual Pre-training on Sketches for Library-oriented Code Generation." In Thirty-First International Joint Conference on Artificial Intelligence (IJCAI-22). International Joint Conferences on Artificial Intelligence Organization, 2022. http://dx.doi.org/10.24963/ijcai.2022/329.
Edwards, Aleksandra, Jose Camacho-Collados, Hélène De Ribaupierre, and Alun Preece. "Go Simple and Pre-Train on Domain-Specific Corpora: On the Role of Training Data for Text Classification." In Proceedings of the 28th International Conference on Computational Linguistics. International Committee on Computational Linguistics, 2020. http://dx.doi.org/10.18653/v1/2020.coling-main.481.
Florencio, Felipe de A., Matheus S. de Lacerda, Anderson P. Cavalcanti, and Vitor Rolim. "Three-Layer Denoiser: Denoising Parallel Corpora for NMT Systems." In Encontro Nacional de Inteligência Artificial e Computacional. Sociedade Brasileira de Computação - SBC, 2023. http://dx.doi.org/10.5753/eniac.2023.234268.