Academic literature on the topic 'Text dataset'

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Text dataset.'

Journal articles on the topic "Text dataset"

1

Khan, Shafiq Ur Rehman, and Muhammad Arshad Islam. "Event-Dataset: Temporal information retrieval and text classification dataset." Data in Brief 25 (August 2019): 104048. http://dx.doi.org/10.1016/j.dib.2019.104048.

2

Assad, Ali, Abdul Hadi M. Alaidi, Amjad Yousif Sahib, Haider TH Salim ALRikabi, and Ahmed Magdy. "Transformer-based automatic Arabic text diacritization." Sustainable Engineering and Innovation 6, no. 2 (2024): 285–96. https://doi.org/10.37868/sei.v6i2.id305.

Abstract:
In Arabic natural language processing (NLP), automatic text diacritization is a major obstacle, and progress has been slow when compared to other language processing tasks. Automatic diacritical marking of Arabic text is proposed in this work using the first transformer-based paradigm designed solely for this task. By taking advantage of the attention mechanism, our system is able to capture more of the innate patterns in Arabic, surpassing the performance of both rule-based alternatives and neural network techniques. The model trained with the Clean-50 dataset had a diacritic error rate (DER)
3

Assad, Ali, Abdul Hadi M. Alaidi, Amjad Yousif Sahib, Haider TH Salim ALRikabi, and Ahmed Magdy. "Transformer-based automatic Arabic text diacritization." Sustainable Engineering and Innovation 6, no. 2 (2024): 285–96. http://dx.doi.org/10.37868/sei.v6i2.id392.

Abstract:
In Arabic natural language processing (NLP), automatic text diacritization is a major obstacle, and progress has been slow when compared to other language processing tasks. Automatic diacritical marking of Arabic text is proposed in this work using the first transformer-based paradigm designed solely for this task. By taking advantage of the attention mechanism, our system is able to capture more of the innate patterns in Arabic, surpassing the performance of both rule-based alternatives and neural network techniques. The model trained with the Clean-50 dataset had a diacritic error rate (DER)
4

Васильев, А. А., and А. С. Нестеров. "APPLYING TEXT QUESTIONS GENERATION ALGORITHMS FOR AUTOMATIC TEST GENERATION." Proceedings in Cybernetics 22, no. 3 (2023): 17–22. http://dx.doi.org/10.35266/1999-7604-2023-3-17-22.

Abstract:
The article presents findings of manual, semi-automatic, and automatic approaches to generate test questions based on such methods as annotation, keyword extraction, and learning datasets for compiling tests for studying material, along with a description of each method algorithm, examples of generated questions, and their quality assessment. These examples demonstrate the advantages of an algorithm for generating a method using a dataset and a combination of methods, as well as their possible practical application.
5

Saeed, Ari M. "AN AUTOMATED NEW APPROACH IN FAST TEXT CLASSIFICATION: A CASE STUDY FOR KURDISH TEXT." Science Journal of University of Zakho 12, no. 3 (2024): 330–36. http://dx.doi.org/10.25271/sjuoz.2024.12.3.1296.

Abstract:
With the rapid development of internet technology, text classification has become a vital part of obtaining quick and accurate data. Traditional machine learning methods often suffer from poor performance and high-dimensional feature spaces, which reduce their accuracy. In this paper, the FastText model is proposed as the first-ever classifier on Kurdish text and the results are compared with traditional machine learning methods to show the effects on Kurdish Text. For evaluating the model four datasets Kurdish News Dataset Headlines (KNDH), Medical Kurdish Dataset (MKD), Kurdish-Emotional-Dat
6

O, Hyon-Gwang, Myong-Chol Kim, Il-Nam Pak, Un-Hyok Choe, and Chol-Jun O. "RanPil: New Dataset and Benchmark for Offline Handwritten Korean Text Recognition." International Journal on Data Science and Technology 11, no. 2 (2025): 27–34. https://doi.org/10.11648/j.ijdst.20251102.12.

Abstract:
In recent years, since deep learning technology has been applied to handwritten text recognition, the need for handwritten document image datasets has been growing steadily. In particular, the development of such a dataset is of great significance for improving the performance of handwritten Korean text recognition because no dataset for handwritten Korean text recognition has been published. In this paper, we present “RanPil”, a new training and performance evaluation dataset for handwritten Korean text recognition, which consists of a total of 8,600 pages of images (182,000 text lines and
7

Maekawa, Aru, Satoshi Kosugi, Kotaro Funakoshi, and Manabu Okumura. "DiLM: Distilling Dataset into Language Model for Text-level Dataset Distillation." Journal of Natural Language Processing 32, no. 1 (2025): 252–82. https://doi.org/10.5715/jnlp.32.252.

8

Kolesov, Anton, Dmitry Kamyshenkov, Maria Litovchenko, Elena Smekalova, Alexey Golovizin, and Alex Zhavoronkov. "On Multilabel Classification Methods of Incompletely Labeled Biomedical Text Data." Computational and Mathematical Methods in Medicine 2014 (2014): 1–11. http://dx.doi.org/10.1155/2014/781807.

Abstract:
Multilabel classification is often hindered by incompletely labeled training datasets; for some items of such a dataset (or even for all of them) some labels may be omitted. In this case, we cannot know if any item is labeled fully and correctly. When we train a classifier directly on an incompletely labeled dataset, it performs ineffectively. To overcome the problem, we added an extra step, training set modification, before training a classifier. In this paper, we try two algorithms for training set modification: weighted k-nearest neighbor (WkNN) and soft supervised learning (SoftSL). Both of the
9

Tian, Jing, Wushour Slamu, Miaomiao Xu, Chunbo Xu, and Xue Wang. "Research on Aspect-Level Sentiment Analysis Based on Text Comments." Symmetry 14, no. 5 (2022): 1072. http://dx.doi.org/10.3390/sym14051072.

Abstract:
Sentiment analysis is the processing of textual data and giving positive or negative opinions to sentences. In the ABSA dataset, most sentences contain one aspect of sentiment polarity, or sentences of one aspect have multiple identical sentiment polarities, which weakens the sentiment polarity of the ABSA dataset. Therefore, this paper uses the SemEval 14 Restaurant Review dataset, in which each document is symmetrically divided into individual sentences, and two versions of the datasets ATSA and ACSA are created. ATSA: Aspect Term Sentiment Analysis Dataset. ACSA: Aspect Category Sentiment A
10

Zhao, Huanhuan, Haihua Chen, Thomas A. Ruggles, Yunhe Feng, Debjani Singh, and Hong-Jun Yoon. "Improving Text Classification with Large Language Model-Based Data Augmentation." Electronics 13, no. 13 (2024): 2535. http://dx.doi.org/10.3390/electronics13132535.

Abstract:
Large Language Models (LLMs) such as ChatGPT possess advanced capabilities in understanding and generating text. These capabilities enable ChatGPT to create text based on specific instructions, which can serve as augmented data for text classification tasks. Previous studies have approached data augmentation (DA) by either rewriting the existing dataset with ChatGPT or generating entirely new data from scratch. However, it is unclear which method is better without comparing their effectiveness. This study investigates the application of both methods to two datasets: a general-topic dataset (Re

Dissertations / Theses on the topic "Text dataset"

1

Zakaria, Suliman Zubi. "Retrieving Electronic Data Interchange (EDI) Dataset using Text Mining Methods." Thesis, Сумський державний університет, 2012. http://essuir.sumdu.edu.ua/handle/123456789/28658.

Abstract:
The internet is a huge source of documents, containing a massive number of texts in multiple languages on a wide range of topics. These texts are presented in an electronic document format hosted on the web. The documents are exchanged using special forms in an Electronic Data Interchange (EDI) environment. Using web text mining approaches to mine documents in an EDI environment could offer challenging new guidelines for web text mining. Applying text-mining approaches to discover previously unknown patterns retrieved from the web documents by using partitioned clu
2

Sharma, Nabin. "Multi-lingual Text Processing from Videos." Thesis, Griffith University, 2015. http://hdl.handle.net/10072/367489.

Abstract:
Advances in digital technology have produced low-priced portable imaging devices such as digital cameras attached to mobile phones, camcorders, PDAs, etc., which are highly portable. These devices can be used to capture videos and images with ease, which can be shared through the internet and other communication media. In the commercial domain, cameras are used to create news, advertisement videos and other forms of material for information communication. The use of multiple languages to create information for targeted audiences is quite common in countries having mul
3

Milintsevich, Kirill. "Estimation of depression level from text: symptom-based approach, external knowledge, dataset validity." Electronic Thesis or Diss., Normandie, 2024. http://www.theses.fr/2024NORMC227.

Abstract:
Major depressive disorder (MDD) is one of the most widespread mental disorders in the world, often leading to disability and an increased risk of suicide. The recent coronavirus (COVID-19) pandemic has driven up depression rates worldwide. Moreover, stigma and limited access to treatment hinder proper diagnosis and care for many people. Preliminary studies have shown that depressed and non-depressed people use different vocabulary. For example, depressed people tend to use more nega
4

Wu, Yingyu. "Using Text based Visualization in Data Analysis." Kent State University / OhioLINK, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=kent1398079502.

5

Shrimpton, Luke William. "Efficient techniques for streaming cross document coreference resolution." Thesis, University of Edinburgh, 2017. http://hdl.handle.net/1842/28895.

Abstract:
Large text streams are commonplace; news organisations are constantly producing stories and people are constantly writing social media posts. These streams should be analysed in real-time so useful information can be extracted and acted upon instantly. When natural disasters occur people want to be informed, when companies announce new products financial institutions want to know and when celebrities do things their legions of fans want to feel involved. In all these examples people care about getting information in real-time (low latency). These streams are massively varied, people’s interest
6

Ryan, Elisabeth. "Towards word alignment and dataset creation for shorthand documents and transcripts." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-452278.

Abstract:
Analysing handwritten texts and creating labelled data sets can facilitate novel research on languages and advanced computerized analysis of authors' works. However, few handwritten works have word-wise labelling or data sets associated with them. More often a transcription of the text is available, but without any exact coupling between words in the transcript and word representations in the document images. Can an algorithm be created that will take only an image of handwritten text and a corresponding transcript and return a partial alignment and data set? An algorithm is developed in this t
7

Baraheem, Samah Saeed. "Text to Image Synthesis via Mask Anchor Points and Aesthetic Assessment." University of Dayton / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=dayton158800567702413.

8

Belay, Birhanu Hailu. "Deep Learning for Amharic Text-Image Recognition: Algorithm, Dataset and Application." Supervisor: Didier Stricker. Kaiserslautern: Technische Universität Kaiserslautern, 2021. http://d-nb.info/1229436308/34.

9

Monsen, Julius. "Building high-quality datasets for abstractive text summarization : A filtering‐based method applied on Swedish news articles." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-176352.

Abstract:
With an increasing amount of information on the internet, automatic text summarization could potentially make content more readily available for a larger variety of people. Training and evaluating text summarization models require datasets of sufficient size and quality. Today, most such datasets are in English, and for minor languages such as Swedish, it is not easy to obtain corresponding datasets with handwritten summaries. This thesis proposes methods for compiling high-quality datasets suitable for abstractive summarization from a large amount of noisy data through characterization and fi
10

Lewis, Jonathan Scott. "The Role of Work Experiences in College Student Leadership Development: Evidence From a National Dataset and a Text Mining Approach to Examining Beliefs About Leadership." Thesis, Boston College, 2017. http://hdl.handle.net/2345/bc-ir:107652.

Abstract:
Thesis advisor: Heather Rowan-Kenyon
Paid employment is one of the most common extracurricular activities among full-time undergraduates, and an array of studies has attempted to measure its impact. Methodological concerns with the extant literature, however, make it difficult to draw reliable conclusions. Furthermore, the research on working college students has little to say about relationships between employment and leadership development, a key student learning outcome. This study addressed these gaps in two ways, using a national sample of 77,489 students from the 2015 Multi-Institutio

Books on the topic "Text dataset"

1

Shi, Feng. Learn About Text Pre-Processing in R With Data From How ISIS Uses Twitter Dataset (2016). SAGE Publications, Ltd., 2019. http://dx.doi.org/10.4135/9781526488909.

2

Shi, Feng. Learn About Text Pre-Processing in Python With Data From How ISIS Uses Twitter Dataset (2016). SAGE Publications, Ltd., 2019. http://dx.doi.org/10.4135/9781526497864.

3

Shi, Feng. Learn About Basic Concepts in Text Analysis in R With Data From How ISIS Uses Twitter Dataset (2016). SAGE Publications, Ltd., 2019. http://dx.doi.org/10.4135/9781526488626.

4

Shi, Feng. Learn About Basic Concepts in Text Analysis in Python With Data From How ISIS Uses Twitter Dataset (2016). SAGE Publications, Ltd., 2019. http://dx.doi.org/10.4135/9781526497796.

5

Shi, Feng. Learn About Term Frequency–Inverse Document Frequency in Text Analysis in R With Data From How ISIS Uses Twitter Dataset (2016). SAGE Publications, Ltd., 2019. http://dx.doi.org/10.4135/9781526489012.

6

Shi, Feng. Learn About Term Frequency–Inverse Document Frequency in Text Analysis in Python With Data From How ISIS Uses Twitter Dataset (2016). SAGE Publications, Ltd., 2019. http://dx.doi.org/10.4135/9781526498038.

7

Wiesen, Christopher. Learn to Use the Kolmogorov–Smirnov Test in Stata With the Cardiac Catheterization Diagnostic Dataset (2018). SAGE Publications, Ltd., 2019. http://dx.doi.org/10.4135/9781526489302.

8

Scott Jones, Julie. Learn to Test for Multicollinearity in SPSS With Data From the English Health Survey (Teaching Dataset) (2002). SAGE Publications, Ltd., 2019. http://dx.doi.org/10.4135/9781526485793.

9

Scott Jones, Julie. Learn to Test for Multicollinearity in R With Data From the English Health Survey (Teaching Dataset) (2002). SAGE Publications, Ltd., 2019. http://dx.doi.org/10.4135/9781526498670.

10

Scott Jones, Julie. Learn to Use the Kaiser-Meyer-Olkin Test in SPSS With Data From the Northern Ireland Life and Times Survey: Lesbian, Gay, Bisexual, and Transgender Issues Teaching Dataset (Open Access Dataset) (2012). SAGE Publications, Ltd., 2019. http://dx.doi.org/10.4135/9781526486745.

Book chapters on the topic "Text dataset"

1

Aghaebrahimian, Ahmad. "Quora Question Answer Dataset." In Text, Speech, and Dialogue. Springer International Publishing, 2017. http://dx.doi.org/10.1007/978-3-319-64206-2_8.

2

Hao, Yanrong, Bo Chen, and Xiaobing Zhao. "TiLTS: Tibetan Long Text Summarization Dataset." In Lecture Notes in Computer Science. Springer Nature Singapore, 2024. http://dx.doi.org/10.1007/978-981-97-9440-9_21.

3

Iwamura, Masakazu, Takahiro Matsuda, Naoyuki Morimoto, Hitomi Sato, Yuki Ikeda, and Koichi Kise. "Downtown Osaka Scene Text Dataset." In Lecture Notes in Computer Science. Springer International Publishing, 2016. http://dx.doi.org/10.1007/978-3-319-46604-0_32.

4

Svoboda, Lukáš, and Tomáš Brychcín. "Czech Dataset for Semantic Textual Similarity." In Text, Speech, and Dialogue. Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-030-00794-2_23.

5

Rafique, Aftab, and M. Ishtiaq. "UOHTD: Urdu Offline Handwritten Text Dataset." In Frontiers in Handwriting Recognition. Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-21648-0_34.

6

Sowański, Marcin, and Artur Janicki. "Leyzer: A Dataset for Multilingual Virtual Assistants." In Text, Speech, and Dialogue. Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-58323-1_51.

7

Saxena, Prateek, and Soma Paul. "EPIE Dataset: A Corpus for Possible Idiomatic Expressions." In Text, Speech, and Dialogue. Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-58323-1_9.

8

Saxena, Prateek, and Soma Paul. "Labelled EPIE: A Dataset for Idiom Sense Disambiguation." In Text, Speech, and Dialogue. Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-83527-9_18.

9

Yang, Zhongliang, Jin He, Siyu Zhang, Jinshuai Yang, and Yongfeng Huang. "TStego-THU: Large-Scale Text Steganalysis Dataset." In Advances in Artificial Intelligence and Security. Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-78621-2_27.

10

Yin, Xu-Cheng, Chun Yang, and Chang Liu. "Open-Set Text Recognition: Concept, Dataset, Protocol, and Framework." In Open-Set Text Recognition. Springer Nature Singapore, 2024. http://dx.doi.org/10.1007/978-981-97-0361-6_3.

Conference papers on the topic "Text dataset"

1

Mareen, Hannes, Dimitrios Karageorgiou, Glenn Van Wallendael, Peter Lambert, and Symeon Papadopoulos. "TGIF: Text-Guided Inpainting Forgery Dataset." In 2024 IEEE International Workshop on Information Forensics and Security (WIFS). IEEE, 2024. https://doi.org/10.1109/wifs61860.2024.10810690.

2

Al-Dulaimi, Ahmed, Hala Adnan Fadel, and Maryam K. Hasan. "Ultimate Arabic News Dataset: A New Efficient Dataset for Arabic Text Classification." In 2024 10th International Engineering Conference on Advances in Computer and Civil Engineering (IEC). IEEE, 2024. https://doi.org/10.1109/iec61018.2024.11063800.

3

Wang, Hao, Zhengdong Lu, Hang Li, and Enhong Chen. "A Dataset for Research on Short-Text Conversations." In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2013. http://dx.doi.org/10.18653/v1/d13-1096.

4

Madanbhavi, Lalitha, Padmashree Desai, Neha Dhirendra Sirur, Ananya Deshpande, Risheek V. Hiremath, and Chetan M. Patil. "An Efficient Multilingual Text Classification using IndicCorp dataset." In 2024 5th IEEE Global Conference for Advancement in Technology (GCAT). IEEE, 2024. https://doi.org/10.1109/gcat62922.2024.10923964.

5

Xie, Zeyu, Xuenan Xu, Zhizheng Wu, and Mengyue Wu. "AudioTime: A Temporally-aligned Audio-text Benchmark Dataset." In ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2025. https://doi.org/10.1109/icassp49660.2025.10889879.

6

Abdiansah, Abdiansah, Novi Yusliani, Fathoni Fathoni, Muhammad Fazri Nizar, Aulia Salsabella, and Agi Agustian Davi. "IDSpider: Indonesian Standard Dataset for Text-to-SQL." In 2024 Ninth International Conference on Informatics and Computing (ICIC). IEEE, 2024. https://doi.org/10.1109/icic64337.2024.10956918.

7

Koto, Fajri, Jey Han Lau, and Timothy Baldwin. "Liputan6: A Large-scale Indonesian Dataset for Text Summarization." In Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing. Association for Computational Linguistics, 2020. http://dx.doi.org/10.18653/v1/2020.aacl-main.60.

8

Bouchiha, Djelloul, Abdelghani Bouziane, Noureddine Doumi, et al. "WiHArD: Wikipedia Based Hierarchical Arabic Dataset for Text Classification." In 2024 4th International Conference on Embedded & Distributed Systems (EDiS). IEEE, 2024. https://doi.org/10.1109/edis63605.2024.10783418.

9

Weng, Lifen, Qibing Zhu, and Jiangbin Guo. "Generating Sketch Faces from Text Descriptions: Dataset and Algorithm." In 2024 IEEE 18th International Conference on Anti-counterfeiting, Security, and Identification (ASID). IEEE, 2024. https://doi.org/10.1109/asid63618.2024.10839706.

10

Gongqu, Zhuome, Peng Luo, Dongzhou Jiayang, Jia Cairang, Jiacuo Cizhen, and Dongzhu Renqing. "A Tibetan Ancient Uchen Text Line Dataset for OCR." In 2024 International Conference on Image Processing, Computer Vision and Machine Learning (ICICML). IEEE, 2024. https://doi.org/10.1109/icicml63543.2024.10958135.

Reports on the topic "Text dataset"

1

Montiel Olea, César E., and Leonardo R. Corral. Text Analysis of Project Completion Reports. Inter-American Development Bank, 2021. http://dx.doi.org/10.18235/0003611.

Abstract:
Project Completion Reports (PCRs) are the main instrument through which different multilateral organizations measure the success of a project once it closes. PCRs are important for development effectiveness as they serve to understand achievements, failures, and challenges within the project cycle that can feed back into the design and execution of new projects. The aim of this paper is to introduce text analysis tools for the exploration of PCR documents. We describe and apply different text analysis tools to explore the content of a sample of PCRs. We seek to illustrate a way in which PCRs c
2

Hoshi Larsson, Kaori. Do LiU researchers publish data – and where? Dataset analysis using ODDPub. Linköping University Electronic Press, 2025. https://doi.org/10.3384/report-119790.

Abstract:
Swedish researchers are encouraged to share their research data, with a government goal for all publicly funded research to provide open research data by 2026. Hoshi Larsson (2023) investigated the extent and location of research data from LiU researchers. However, the search was limited to datasets with DOIs listed in DataCite Commons, suggesting that many datasets were excluded in the investigation. Therefore, the purpose of this study is to identify, through articles’ descriptions of open data, to what extent LiU’s researchers are sharing their research data and open code, and if so, which
3

Marra de Artiñano, Ignacio, Franco Riottini Depetris, and Christian Volpe Martincus. Automatic Product Classification in International Trade: Machine Learning and Large Language Models. Inter-American Development Bank, 2023. http://dx.doi.org/10.18235/0005012.

Abstract:
Accurately classifying products is essential in international trade. Virtually all countries categorize products into tariff lines using the Harmonized System (HS) nomenclature for both statistical and duty collection purposes. In this paper, we apply and assess several different algorithms to automatically classify products based on text descriptions. To do so, we use agricultural product descriptions from several public agencies, including customs authorities and the United States Department of Agriculture (USDA). We find that while traditional machine learning (ML) models tend to perform we
4

Warin, Thierry. The World Health Organization in a Post-COVID-19 Era: An Exploration of Public Engagement on Twitter. CIRANO, 2022. http://dx.doi.org/10.54932/ehuh4224.

Abstract:
This article analyses the conversations on Twitter related to the World Health Organization (WHO). We collect the text of the discussions as well as the metadata associated with each tweet. Our dataset is exhaustive as it includes all the tweets produced by WHO. Likes, retweets, and replies capture the level of engagement. The goal is to quantify the balance of likes, retweets, and replies, also known as “ratios”, and study their dynamics as a proxy for the collective engagement in response to WHO’s communications. Our results demonstrate a higher engagement of the public receiving the informati
5

Madsen, Jens, Nikhil Kuppa, and Lucas Parra. The Brain, Body, and Behaviour Dataset - Neural Engineering Lab, CCNY. Fcp-indi, 2025. https://doi.org/10.15387/fcp_indi.retro.bbbd.

Abstract:
When humans engage with video, their brain and body interact in response to sensory input. To investigate these interactions, we recorded and are releasing a dataset from N=178 participants across five experiments featuring short online educational videos. This dataset comprises approximately 110 hours of multimodal data including electrocardiogram (ECG), heart rate, respiration, breathing rate, pupil size, electrooculogram (EOG), gaze position, saccades, blinks, fixations, head movement, and electroencephalogram (EEG). Participants viewed 3-6 videos (mean total duration: 28±5 min) to test att
6

Zinilli, Antonio. Text Mining in Action: Tools and Techniques using Python. Instats Inc., 2024. http://dx.doi.org/10.61700/k4powzm518m5z1739.

Abstract:
This seminar provides a comprehensive exploration of text mining techniques using Python, tailored for academic researchers seeking to analyze large textual datasets effectively. Participants will gain hands-on experience with Python libraries and methodologies for natural language processing, sentiment analysis, topic modeling, text classification, and more, enhancing their data analysis capabilities across various disciplines.
7

Johra, Hicham, Martin Veit, Mathias Østergaard Poulsen, et al. Training and testing labelled image and video datasets of human faces for different indoor visual comfort and glare visual discomfort situations. Department of the Built Environment, 2023. http://dx.doi.org/10.54337/aau542153983.

Abstract:
The aim of this technical report is to provide a description and access to labelled image and video datasets of human faces that have been generated for different indoor visual comfort and glare visual discomfort situations. These datasets have been used to train and test a computer-vision artificial neural network detecting glare discomfort from images of human faces.
8

Kumar, Praveen. PR753-233900-R01 Enhanced Leak Detection Using Minimally Invasive Multi-Sensor Device Based Inspection. Pipeline Research Council International, Inc. (PRCI), 2024. http://dx.doi.org/10.55274/r0000078.

Abstract:
The project team investigated the feasibility of identifying pipeline leaks using novel sensing approaches that have recently been gaining popularity in the pipeline integrity assessment realm (such as multi-sensor inline inspection tools) that incorporate sensors such as audio, magnetometry, and pressure. The flow loop setup at the PRCI TDC site was leveraged to create a customized test setup, and a test execution methodology was developed and executed towards this end. Two sensing equipment vendors (hereinafter referred to as Vendor A and Vendor B) were used to collect various sensor datase
9

Stucchi, Rodolfo, Alessandro Maffioli, Sofía Rojo, and Victoria Castillo. Knowledge Spillovers of Innovation Policy through Labor Mobility: An Impact Evaluation of the FONTAR Program in Argentina. Inter-American Development Bank, 2014. http://dx.doi.org/10.18235/0011534.

Abstract:
Although knowledge spillovers are at the core of the innovation policy's justification, they have never been properly measured by any impact evaluation. This paper fills this gap by estimating the spillover effects of the FONTAR program in Argentina. We use an employer-employee matched panel dataset with the entire population of firms and workers in Argentina for the period 2002-2010. This dataset allows us to track the mobility of qualified workers from FONTAR beneficiary firms to other firms and, therefore, to identify firms that indirectly benefit from the program through knowledge diffusio
10

Meloncelli, Daniel. Foundations of Statistical Analysis in R. Instats Inc., 2025. https://doi.org/10.61700/90jrgibx52lka1460.

Abstract:
This seminar introduces researchers to analysing relational and correlational research questions in R, covering key methods for hypothesis testing, t-tests, ANOVA, and non-parametric alternatives. Participants will learn to choose the right test, check assumptions, perform analyses, interpret results, and visualise findings using R. The seminar will include live coding demonstrations and practical examples to help participants apply statistical methods to real-world datasets.