Journal articles on the topic 'Text tokenization'
Consult the top 50 journal articles for your research on the topic 'Text tokenization.'
Sekhar, Sowmik. "Tokenization for Text Analysis." International Journal of Scientific Research and Engineering Trends 10, no. 1 (2024): 149–52. http://dx.doi.org/10.61137/ijsret.vol.10.issue1.127.
Nnaemeka, M. Oparauwah, N. Odii Juliet, I. Ayogu Ikechukwu, and C. Iwuchukwu Vitalis. "A boundary-based tokenization technique for extractive text summarization." World Journal of Advanced Research and Reviews 11, no. 2 (2021): 303–12. https://doi.org/10.5281/zenodo.5336977.
Nnaemeka M Oparauwah, Juliet N Odii, Ikechukwu I Ayogu, and Vitalis C Iwuchukwu. "A boundary-based tokenization technique for extractive text summarization." World Journal of Advanced Research and Reviews 11, no. 2 (2021): 303–12. http://dx.doi.org/10.30574/wjarr.2021.11.2.0351.
Nazir, Shahzad, Muhammad Asif, Mariam Rehman, and Shahbaz Ahmad. "Machine learning based framework for fine-grained word segmentation and enhanced text normalization for low resourced language." PeerJ Computer Science 10 (January 31, 2024): e1704. http://dx.doi.org/10.7717/peerj-cs.1704.
BAR-HAIM, ROY, KHALIL SIMA'AN, and YOAD WINTER. "Part-of-speech tagging of Modern Hebrew text." Natural Language Engineering 14, no. 2 (2008): 223–51. http://dx.doi.org/10.1017/s135132490700455x.
Vadlapati, Praneeth. "TokEncryption: Enhanced Hashing of Text using Tokenization." INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT 08, no. 12 (2024): 1–8. https://doi.org/10.55041/ijsrem20280.
Mullen, Lincoln A., Kenneth Benoit, Os Keyes, Dmitry Selivanov, and Jeffrey Arnold. "Fast, Consistent Tokenization of Natural Language Text." Journal of Open Source Software 3, no. 23 (2018): 655. http://dx.doi.org/10.21105/joss.00655.
Bartenyev, Oleg. "Evaluating the Effectiveness of Text Tokenization Methods." Vestnik MEI, no. 6 (December 25, 2023): 144–56. http://dx.doi.org/10.24160/1993-6982-2023-6-144-156.
S, Vijayarani, and Janani R. "Text Mining: open Source Tokenization Tools – An Analysis." Advanced Computational Intelligence: An International Journal (ACII) 3, no. 1 (2016): 37–47. http://dx.doi.org/10.5121/acii.2016.3104.
Hosni Mahmoud, Hanan A., Alaaeldin M. Hafez, and Eatedal Alabdulkreem. "Language-Independent Text Tokenization Using Unsupervised Deep Learning." Intelligent Automation & Soft Computing 35, no. 1 (2023): 321–34. http://dx.doi.org/10.32604/iasc.2023.026235.
Irum, Naz Sodhar, Hussain Jalbani Akhtar, Hafeez Buller Abdul, and Naz Sodhar Anam. "TOKENIZATION OF SINDHI TEXT ON INFORMATION RETRIEVAL TOOL." PJEST 1, no. 1 (2021): 7. https://doi.org/10.5281/zenodo.4774104.
Vadlapati, Praneeth. "Tokenization Beyond NLP: Potential Applications in Data Analytics, Cybersecurity, and Beyond." INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT 08, no. 12 (2024): 1–7. https://doi.org/10.55041/ijsrem9532.
Cho, Danbi, Hyunyoung Lee, and Seungshik Kang. "An Empirical Study of Korean Sentence Representation with Various Tokenizations." Electronics 10, no. 7 (2021): 845. http://dx.doi.org/10.3390/electronics10070845.
Vadlapati, Praneeth. "TokenizedDB: Text Tokenization using NLP for Enhanced Storage Efficiency and Data Privacy." INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT 08, no. 12 (2024): 1–6. https://doi.org/10.55041/ijsrem11413.
McNamee, Paul, and James Mayfield. "Character N-Gram Tokenization for European Language Text Retrieval." Information Retrieval 7, no. 1/2 (2004): 73–97. http://dx.doi.org/10.1023/b:inrt.0000009441.78971.be.
Zalmout, Nasser, and Nizar Habash. "Optimizing Tokenization Choice for Machine Translation across Multiple Target Languages." Prague Bulletin of Mathematical Linguistics 108, no. 1 (2017): 257–69. http://dx.doi.org/10.1515/pralin-2017-0025.
Si, Chenglei, Zhengyan Zhang, Yingfa Chen, et al. "Sub-Character Tokenization for Chinese Pretrained Language Models." Transactions of the Association for Computational Linguistics 11 (May 18, 2023): 469–87. http://dx.doi.org/10.1162/tacl_a_00560.
Li, Angela W., and Konstantinos Mamouras. "Efficient Algorithms for the Uniform Tokenization Problem." Proceedings of the ACM on Programming Languages 9, OOPSLA1 (2025): 1492–518. https://doi.org/10.1145/3720498.
Vadlapati, Praneeth. "TokChat: Tokenization of Text for Secured Peer-to-Peer Communication." INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT 08, no. 12 (2024): 1–6. https://doi.org/10.55041/ijsrem10800.
Rehman, Zobia, Waqas Anwar, Usama Ijaz Bajwa, Wang Xuan, and Zhou Chaoying. "Morpheme Matching Based Text Tokenization for a Scarce Resourced Language." PLoS ONE 8, no. 8 (2013): e68178. http://dx.doi.org/10.1371/journal.pone.0068178.
MR ADEPU RAJESH and DR TRYAMBAK HIWARKAR. "Exploring Preprocessing Techniques for Natural Language Text: A Comprehensive Study Using Python Code." international journal of engineering technology and management sciences 7, no. 5 (2023): 390–99. http://dx.doi.org/10.46647/ijetms.2023.v07i05.047.
Schwarz, Carlo. "Estimating text regressions using txtreg_train." Stata Journal: Promoting communications on statistics and Stata 23, no. 3 (2023): 799–812. http://dx.doi.org/10.1177/1536867x231196349.
Alkesaiberi, Abdulelah, Ali Alkhathlan, and Ahmed Abdelali. "Ara--CANINE: Character-Based Pre-Trained Language Model for Arabic Language Understanding." International Journal on Cybernetics & Informatics 13, no. 2 (2024): 45–59. http://dx.doi.org/10.5121/ijci.2024.130204.
Qin, Honglun, Meiwen Li, Lin Wang, Youming Ge, Junlong Zhu, and Ruijuan Zheng. "A Radical-Based Token Representation Method for Enhancing Chinese Pre-Trained Language Models." Electronics 14, no. 5 (2025): 1031. https://doi.org/10.3390/electronics14051031.
Akkasi, Abbas, Ekrem Varoğlu, and Nazife Dimililer. "ChemTok: A New Rule Based Tokenizer for Chemical Named Entity Recognition." BioMed Research International 2016 (2016): 1–9. http://dx.doi.org/10.1155/2016/4248026.
N. Venkatesan and N. Arulanand. "Implications of Tokenizers in BERT Model for Low-Resource Indian Language." December 2022 4, no. 4 (2023): 264–71. http://dx.doi.org/10.36548/jscp.2022.4.005.
Ibrahim Abdelfattah Almajali. "Comprehensive Analysis of Arabic Tokenization System Preprocessing using the Matching Model." Journal of Information Systems Engineering and Management 10, no. 4 (2025): 210–16. https://doi.org/10.52783/jisem.v10i4.8981.
Almaaytah, Shahab Ahmad. "Arabic word tokenization system using the maximum matching model." Edelweiss Applied Science and Technology 8, no. 6 (2024): 3210–17. http://dx.doi.org/10.55214/25768484.v8i6.2682.
Aravind Ayyagiri, Om Goel, and Shalu Jain. "Innovative Approaches to Full-Text Search with Solr and Lucene." Universal Research Reports 11, no. 1 (2024): 209–24. http://dx.doi.org/10.36676/urr.v11.i1.1336.
Aravind Ayyagiri, Om Goel, and Shalu Jain. "Innovative Approaches to Full-Text Search with Solr and Lucene." Innovative Research Thoughts 10, no. 3 (2024): 144–59. http://dx.doi.org/10.36676/irt.v10.i3.1473.
Nafea, Ahmed Adil, Muhmmad Shihab Muayad, Russel R. Majeed, et al. "A Brief Review on Preprocessing Text in Arabic Language Dataset: Techniques and Challenges." Babylonian Journal of Artificial Intelligence 2024 (May 18, 2024): 46–53. http://dx.doi.org/10.58496/bjai/2024/007.
Sharma, Kartik. "Text to SQL Query." International Journal for Research in Applied Science and Engineering Technology 13, no. 5 (2025): 6870–73. https://doi.org/10.22214/ijraset.2025.71793.
Alzahrani, Ibrahim R. "Unlocking Potential Score Insights of Sentimental Analysis with a Deep Learning Revolutionizes." Emerging Science Journal 9, no. 1 (2025): 25–44. https://doi.org/10.28991/esj-2025-09-01-03.
Sergii V., Mashtalir, and Nikolenko Oleksandr V. "Data preprocessing and tokenization techniques for technical Ukrainian texts." Applied Aspects of Information Technology 6, no. 3 (2023): 318–26. http://dx.doi.org/10.15276/aait.06.2023.22.
Avhad, Pranjali. "WordCanvas: Text-to-Image Generation." INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT 08, no. 05 (2024): 1–5. http://dx.doi.org/10.55041/ijsrem32152.
Wiguna, Ratu Aghnia raffaidy, and Andri Irfan Rifai. "Analisis Text Clustering Masyarakat Di Twitter Mengenai Omnibus Law Menggunakan Orange Data Mining." Journal of Information Systems and Informatics 3, no. 1 (2021): 1–12. http://dx.doi.org/10.33557/journalisi.v3i1.78.
Challapalli, Srinivasa Sai Abhijit. "Sentiment Analysis of the Twitter Dataset for the Prediction of Sentiments." Journal of Sensors, IoT & Health Sciences 2, no. 4 (2024): 1–15. https://doi.org/10.69996/jsihs.2024017.
Manikkannan, Prof. "SMS Spam Detection using Machine Learning." INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT 07, no. 11 (2023): 1–11. http://dx.doi.org/10.55041/ijsrem27463.
Amiri, Amin, Alireza Ghaffarnia, Nafiseh Ghaffar Nia, Dalei Wu, and Yu Liang. "Harmonizer: A Universal Signal Tokenization Framework for Multimodal Large Language Models." Mathematics 13, no. 11 (2025): 1819. https://doi.org/10.3390/math13111819.
Bogdanov, M. R., G. R. Shakhmametova, and N. N. Oskin. "Possibility of Using the Attention Mechanism in Multimodal Recognition of Cardiovascular Diseases." Programmnaya Ingeneria 15, no. 11 (2024): 578–88. http://dx.doi.org/10.17587/prin.15.578-588.
Dr. Madhur Jain, Shilpi Jain, Shruti Daga, and Roshni. "Unveiling Text Representation with 'Bag of Words'." International Journal of Scientific Research in Computer Science, Engineering and Information Technology 10, no. 3 (2024): 146–54. http://dx.doi.org/10.32628/cseit2410314.
Somanathan Pillai, Sanjaikanth E. Vadakkethil, Srinivas A. Vaddadi, Rohith Vallabhaneni, Santosh Reddy Addula, and Bhuvanesh Ananthan. "TextBugger: an extended adversarial text attack on NLP-based text classification model." Indonesian Journal of Electrical Engineering and Computer Science 38, no. 3 (2025): 1735. https://doi.org/10.11591/ijeecs.v38.i3.pp1735-1744.
Badry Ali Mustofa and Wawan Laksito Yuly Saptomo. "Use of Natural Language Processing in Social Media Text Analysis." Journal of Artificial Intelligence and Engineering Applications (JAIEA) 4, no. 2 (2025): 1235–38. https://doi.org/10.59934/jaiea.v4i2.875.
Bakaev, Ilkhom Izatovich. "The development of stemming algorithm for the Uzbek language." Кибернетика и программирование, no. 1 (January 2021): 1–12. http://dx.doi.org/10.25136/2644-5522.2021.1.35847.
Aher, Sakshi Bhaulal. "INTELLIGENT PERSONAL MEMORY ASSISTANT." International Scientific Journal of Engineering and Management 04, no. 05 (2025): 1–7. https://doi.org/10.55041/isjem03380.
Darú, Gilsiley Henrique, Felipe Daltrozo da Motta Motta, Antonio Castelo, and Gustavo Valentim Loch. "Short text classification applied to item description: Some methods evaluation." Semina: Ciências Exatas e Tecnológicas 43, no. 2 (2022): 189–98. http://dx.doi.org/10.5433/1679-0375.2022v43n2p189.
Držík, Dávid, and Frantisek Forgac. "Slovak morphological tokenizer using the Byte-Pair Encoding algorithm." PeerJ Computer Science 10 (November 19, 2024): e2465. http://dx.doi.org/10.7717/peerj-cs.2465.
Qarah, Faisal, and Tawfeeq Alsanoosy. "A Comprehensive Analysis of Various Tokenizers for Arabic Large Language Models." Applied Sciences 14, no. 13 (2024): 5696. http://dx.doi.org/10.3390/app14135696.
Yunania, Nanda, and Yulian Findawati. "Hate Speech and Emotion Detection on Twitter Using LSTM Model." JICTE (Journal of Information and Computer Technology Education) 7, no. 1 (2023): 1–5. https://doi.org/10.21070/jicte.v7i1.1645.
Le, Duy Nguyen Minh, Huy Gia Le, Hai Thanh Hoang, and Vu Anh Hoang. "XBert - A Model for Hate Speech Detection in Vietnamese Text." International Journal of Emerging Technology and Advanced Engineering 13, no. 12 (2023): 1–5. http://dx.doi.org/10.46338/ijetae1223_01.