Journal articles on the topic 'SCoRE corpus'

Consult the top 50 journal articles for your research on the topic 'SCoRE corpus.'

1

Brown, Michael H. "Using the Sentence Corpus of Remedial English to introduce Data-Driven Learning tasks." Kanda Academic Review 1, no. 1 (2017): 1–14. https://doi.org/10.5281/zenodo.836107.

Abstract:
Corpus Linguistics has had major effects on English language teaching and learning in the past few decades (Huang, 2011). Its influence can be seen, for example, in the development of modern dictionaries, grammars, course books, and testing design. Data-Driven Learning (DDL), or learning driven by learner access to language data found in corpora, has seen an increase in research interest, too (Cobb & Boulton, 2015). However, this research interest has not resulted in widespread classroom adoption of DDL methods in spite of generally positive findings (Cobb & Boulton, 2015). Objections
2

Narayan, Ravi, V. P. Singh, and S. Chakraverty. "Quantum Neural Network Based Machine Translator for Hindi to English." Scientific World Journal 2014 (2014): 1–8. http://dx.doi.org/10.1155/2014/485737.

Abstract:
This paper presents the machine learning based machine translation system for Hindi to English, which learns the semantically correct corpus. The quantum neural based pattern recognizer is used to recognize and learn the pattern of corpus, using the information of part of speech of individual word in the corpus, like a human. The system performs the machine translation using its knowledge gained during the learning by inputting the pair of sentences of Devnagri-Hindi and English. To analyze the effectiveness of the proposed approach, 2600 sentences have been evaluated during simulation and eva
3

Teddiman, Laura. "Contextuality and Beyond: Investigating an Online Diary Corpus." Proceedings of the International AAAI Conference on Web and Social Media 3, no. 1 (2009): 331–33. http://dx.doi.org/10.1609/icwsm.v3i1.14004.

Abstract:
Heylighen and Dewaele’s (2002) F-score, a measure of formality developed based on categorical frequencies of word types, is used as a starting point for an investigation of an online diary corpus. Comparisons are made between results in the main corpus of diary entries, a smaller corpus of diary comments, and with previously calculated F-scores for similar types of data (Nowson, Oberlander & Gill, 2005). While the overall F-score is similar in these two corpora, results show that internal make-up of the categories upon which the calculation is based can differ. This suggests that while the
4

Crossley, Scott, Yu Tian, Perpetual Baffour, et al. "The English Language Learner Insight, Proficiency and Skills Evaluation (ELLIPSE) Corpus." International Journal of Learner Corpus Research 9, no. 2 (2023): 248–69. http://dx.doi.org/10.1075/ijlcr.22026.cro.

Abstract:
This paper introduces the open-source English Language Learning Insight, Proficiency and Skills Evaluation (ELLIPSE) corpus. The corpus comprises ~6,500 essays written by English language learners (ELLs). All essays were written during state-wide standardized annual testing in the United States. The essays were written on 29 different independent prompts that required no background knowledge on the part of the writer. Individual difference information is made available for each essay including economic status, gender, grade level (8–12), and race/ethnicity. Each essay was scored by tw
5

Blavier, Frederic, Gilles Faron, Wilfried Cools, et al. "Corpus luteum score, a simple Doppler examination to prognose early pregnancies." European Journal of Obstetrics & Gynecology and Reproductive Biology 258 (March 2021): 324–31. http://dx.doi.org/10.1016/j.ejogrb.2021.01.001.

6

Kongcharoen, Pong-ampai, Jiraporn Dhanarattigannon, and Intira Bumrungsalee. "Formality in the Academic Writing of Thai EFL English-Major Students." rEFLections 32, no. 1 (2025): 395–414. https://doi.org/10.61508/refl.v32i1.280228.

Abstract:
In recent years, there has been a growing trend of using informal styles in academic writing, including research articles. To examine the degree of formality in students’ writing, this corpus-based study aimed to analyze the formal linguistic features in the academic writing assignments of English-major students at a Thai university. The learner corpus consisted of 552 assignments, totaling 190,506 words, and was organized into five different writing patterns. TagAnt was used to identify the part of speech for each word, while the Google Colab program was utilized for frequency counting. To as
7

Yadav, Siddharth, and Tanmoy Chakraborty. "Zero-Shot Sentiment Analysis for Code-Mixed Data." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 18 (2021): 15941–42. http://dx.doi.org/10.1609/aaai.v35i18.17967.

Abstract:
Code-mixing is the practice of alternating between two or more languages. A major part of sentiment analysis research has been monolingual and they perform poorly on the code-mixed text. We introduce methods that use multilingual and cross-lingual embeddings to transfer knowledge from monolingual text to code-mixed text for code-mixed sentiment analysis. Our methods handle code-mixed text through zero-shot learning and beat state-of-the-art English-Spanish code-mixed sentiment analysis by an absolute 3% F1-score. We are able to achieve 0.58 F1-score (without a parallel corpus) and 0.62 F1-scor
8

Umanandhini, D., and S. Manimegalai. "Fuzzy Score Based Short Text Understanding from Corpus Data Using Semantic Discovery." International Journal of Engineering Sciences & Research Technology 6, no. 12 (2017): 268–73. https://doi.org/10.5281/zenodo.1116682.

Abstract:
Short text understanding and short text are always more ambiguous. These short texts are produced including Search queries, Tags, Keywords, Conversation or Social posts and containing limited context. Generally short texts do not contain sufficient collection of data to support many state-of-the-art approaches for text mining such as topic modelling. It presents a comprehensive overview of short text understanding. Here we used a novel framework are Text Feature Extraction Algorithm and Fuzzy weighted Vote algorithm. First, Text classification based on semantic feature extraction. Its go
9

Wegner, Philipp, Holger Fröhlich, and Sumit Madan. "Evaluating knowledge fusion models on detecting adverse drug events in text." PLOS Digital Health 4, no. 3 (2025): e0000468. https://doi.org/10.1371/journal.pdig.0000468.

Abstract:
Detecting adverse drug events (ADE) of drugs that are already available on the market is an essential part of the pharmacovigilance work conducted by both medical regulatory bodies and the pharmaceutical industry. Concerns regarding drug safety and economic interests serve as motivating factors for the efforts to identify ADEs. Hereby, social media platforms play an important role as a valuable source of reports on ADEs, particularly through collecting posts discussing adverse events associated with specific drugs. We aim with our study to assess the effectiveness of knowledge fusion approache
10

Achmad, Rizkial, Yokelin Tokoro, Jusuf Haurissa, and Andik Wijanarko. "Recurrent Neural Network-Gated Recurrent Unit for Indonesia-Sentani Papua Machine Translation." Journal of Information Systems and Informatics 5, no. 4 (2023): 1449–60. http://dx.doi.org/10.51519/journalisi.v5i4.597.

Abstract:
The Papuan Sentani language is spoken in the city of Jayapura, Papua. The law states the need to preserve regional languages. One of them is by building an Indonesian-Sentani Papua translation machine. The problem is how to build a translation machine and what model to choose in doing so. The model chosen is Recurrent Neural Network – Gated Recurrent Units (RNN-GRU) which has been widely used to build regional languages in Indonesia. The method used is an experiment starting from creating a parallel corpus, followed by corpus training using the RNN-GRU model, and the final step is conducting a
11

Shu, Can. "Application of Corpus Analysis Software and Error Analysis Theory on Non-English Majors’ Chinese-English Translation Features." Higher Education and Practice 1, no. 1 (2024): 92–95. http://dx.doi.org/10.62381/h241116.

Abstract:
This research focuses on non-English major students’ Chinese-English (C-E) translation features. It aims at annotating and summarizing the features and error types in students’ C-E translation by applying to corpus analysis software Readability Analyzer. This study first collects students’ C-E translation texts from a computer-based exam, then forms a self-constructed corpus and a reference corpus. The translation materials in the target corpus and the reference corpus are first analyzed and compared by Readability Analyzer to figure out the features of students C-E translation texts from the
12

González Fernández, Beatriz, and Norbert Schmitt. "How much collocation knowledge do L2 learners have?" ITL - International Journal of Applied Linguistics 166, no. 1 (2015): 94–126. http://dx.doi.org/10.1075/itl.166.1.03fer.

Abstract:
Many scholars believe that collocations are difficult to learn and use by L2 learners. However, some research suggests that learners often know more collocations than commonly thought. This study tested 108 Spanish learners of English to measure their productive knowledge of 50 collocations, which varied according to corpus frequency, t-score, and MI score. The participants produced a mean score of 56.6% correct, suggesting that our learners knew a substantial number of collocations. Knowledge of the collocations correlated moderately with corpus frequency (.45), but also with everyday engagem
13

Roy, Indradyumna, Venkata Sai Baba Reddy Velugoti, Soumen Chakrabarti, and Abir De. "Interpretable Neural Subgraph Matching for Graph Retrieval." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 7 (2022): 8115–23. http://dx.doi.org/10.1609/aaai.v36i7.20784.

Abstract:
Given a query graph and a database of corpus graphs, a graph retrieval system aims to deliver the most relevant corpus graphs. Graph retrieval based on subgraph matching has a wide variety of applications, e.g., molecular fingerprint detection, circuit design, software analysis, and question answering. In such applications, a corpus graph is relevant to a query graph, if the query graph is (perfectly or approximately) a subgraph of the corpus graph. Existing neural graph retrieval models compare the node or graph embeddings of the query-corpus pairs, to compute the relevance scores between the
14

Mathew, Preethi, Kerstin Pannek, Pamela Snow, et al. "Maturation of Corpus Callosum Anterior Midbody Is Associated with Neonatal Motor Function in Eight Preterm-Born Infants." Neural Plasticity 2013 (2013): 1–7. http://dx.doi.org/10.1155/2013/359532.

Abstract:
Background. The etiology of motor impairments in preterm infants is multifactorial and incompletely understood. Whether corpus callosum development is related to impaired motor function is unclear. Potential associations between motor-related measures and diffusion tensor imaging (DTI) of the corpus callosum in preterm infants were explored. Methods. Eight very preterm infants (gestational age of 28–32 weeks) underwent the Hammersmith neonatal neurological examination and DTI assessments at gestational age of 42 weeks. The total Hammersmith score and a motor-specific score (sum of Hammersmith m
15

Lototska, Nataliia. "Statistical Analysis of Collocations of the Concept JOY in R. Ivanychuk’s Text Corpus." Scientific Journal of Polonia University 37, no. 6 (2020): 92–98. http://dx.doi.org/10.23856/3709.

Abstract:
The paper includes a review of scientific works on the importance of corpus and quantitative methods, the problem of connectivity and the ways of collocation study. The article deals with the study of collocations of the emotion JOY in writer’s Text Corpus by the means of statistical methods in modern linguistics. From the point of view of language system described collocations are presented in various structural-semantic forms in author’s idiolect. Meanwhile statistical research represents a list of collocations organized according to absolute and relative frequency and association measures s
16

Putri, Mellati Riandi, Elvi Citraresmana, and Inu Isnaeni Sidiq. "Semantic Preference of the Word Okaa-san and Mama in Tsukuba Web Corpus: A Corpus Linguistic Analysis." Journal of Japanese Language Education and Linguistics 7, no. 2 (2023): 144–59. http://dx.doi.org/10.18196/jjlel.v7i2.19217.

Abstract:
The use of loanwords is common in Japanese people's daily lives, such as the use of 'mama' as a term for mother rather than 'okaa-san'. Using corpus linguistic analysis, this study sought to discover how okaa-san and mama are discussed in Japanese society on the internet. The mixed technique was utilized in this study, with collocate strength calculated using MI Score and subsequently categorized by using USAS semantic categories. It was discovered that the phrase okaa-san is usually used to refer to social activities, states, and processes such as the tight relationship between mother and kid.
17

Januzaj, Ylber, Edmond Beqiri, and Artan Luma. "Determining the Optimal Number of Clusters using Silhouette Score as a Data Mining Technique." International Journal of Online and Biomedical Engineering (iJOE) 19, no. 04 (2023): 174–82. http://dx.doi.org/10.3991/ijoe.v19i04.37059.

Abstract:
The identification of the same objects is very important in determining the similarity between different objects. Nowadays, there are several techniques that allow us to divide objects into different groups that differ from one to another. In order to have the best separation between the clusters, it is required that the optimal determination of the number of clusters of a corpus be made in advance. In our research, the Silhouette score technique was used in order to make the optimal determination of this number of clusters. The application of such a technique was done through the Python langu
18

Dong, Rui, Yating Yang, and Tonghai Jiang. "Spelling Correction of Non-Word Errors in Uyghur–Chinese Machine Translation." Information 10, no. 6 (2019): 202. http://dx.doi.org/10.3390/info10060202.

Abstract:
This research was conducted to solve the out-of-vocabulary problem caused by Uyghur spelling errors in Uyghur–Chinese machine translation, so as to improve the quality of Uyghur–Chinese machine translation. This paper assesses three spelling correction methods based on machine translation: 1. Using a Bilingual Evaluation Understudy (BLEU) score; 2. Using a Chinese language model; 3. Using a bilingual language model. The best results were achieved in both the spelling correction task and the machine translation task by using the BLEU score for spelling correction. A maximum F1 score of 0.72 was
19

Lesatari, Aufa Eka Putri, Arie Ardiyanti, and Ibnu Asror. "Phrase Based Statistical Machine Translation Javanese-Indonesian." JURNAL MEDIA INFORMATIKA BUDIDARMA 5, no. 2 (2021): 378. http://dx.doi.org/10.30865/mib.v5i2.2812.

Abstract:
This research aims to produce a statistical machine translation that can be implemented to perform Javanese-Indonesian translation and to know the influence of the main data sources of statistical machine translation namely parallel corpus and monolingual corpus on the quality of Javanese-Indonesian statistical machine translation. The testing was carried out by gradually adding the quantity of parallel corpus and monolingual corpus to seven configurations of Javanese-Indonesian statistical machine translation. All machine translation configuration experiments were tested with test data totali
20

Nguyen, Nhung, Roselyn Gabud, and Sophia Ananiadou. "COPIOUS: A gold standard corpus of named entities towards extracting species occurrence from biodiversity literature." Biodiversity Data Journal 7 (January 22, 2019): e29626. https://doi.org/10.3897/BDJ.7.e29626.

Abstract:
Background: Species occurrence records are very important in the biodiversity domain. While several available corpora contain only annotations of species names or habitats and geographical locations, there is no consolidated corpus that covers all types of entities necessary for extracting species occurrence from biodiversity literature. In order to alleviate this issue, we have constructed the COPIOUS corpus—a gold standard corpus that covers a wide range of biodiversity entities. Results: Two annotators manually annotated the corpus with five categories of entities, i.e. taxon na
21

Patumcharoenpol, Preecha, Narumol Doungpan, Asawin Meechai, Bairong Shen, Jonathan H. Chan, and Wanwipa Vongsangnak. "An integrated text mining framework for metabolic interaction network reconstruction." PeerJ 4 (March 21, 2016): e1811. http://dx.doi.org/10.7717/peerj.1811.

Abstract:
Text mining (TM) in the field of biology is fast becoming a routine analysis for the extraction and curation of biological entities (e.g., genes, proteins, simple chemicals) as well as their relationships. Due to the wide applicability of TM in situations involving complex relationships, it is valuable to apply TM to the extraction of metabolic interactions (i.e., enzyme and metabolite interactions) through metabolic events. Here we present an integrated TM framework containing two modules for the extraction of metabolic events (Metabolic Event Extraction module—MEE) and for the construction o
22

Alfian, Muhammad, Umi Laili Yuhana, Daniel Siahaan, and Harum Munazharoh. "Annotation Error Detection and Correction for Indonesian POS Tagging Corpus." Lontar Komputer : Jurnal Ilmiah Teknologi Informasi 16, no. 1 (2025): 41. https://doi.org/10.24843/lkjiti.2025.v16.i01.p04.

Abstract:
Linguistic Corpus is the primary material for training and evaluating machine learning models, especially for POS Tagging. However, the human-annotated corpus is not free from annotation errors. Annotation errors have a negative impact on model performance. Therefore, we propose annotation error detection and correction. We detect annotation errors in the Indonesian POS Tagging corpus using the n-gram variation method. Then, we correct the corpus using an expert-voting approach. Annotation error detection successfully collected 6,536 annotation error candidates. Each candidate has two possibil
23

Sole-Mauri, Francina, Pilar Sánchez-Gijón, and Antoni Oliver. "Cadlaws – An English–French Parallel Corpus of Legally Equivalent Documents." Mutatis Mutandis. Revista Latinoamericana de Traducción 14, no. 2 (2021): 494–508. http://dx.doi.org/10.17533/udea.mut.v14n2a10.

Abstract:
This article presents Cadlaws, a new English–French corpus built from Canadian legal documents, and describes the corpus construction process and preliminary statistics obtained from it. The corpus contains over 16 million words in each language and includes unique features since it is composed of documents that are legally equivalent in both languages but not the result of a translation. The corpus is built upon enactments co-drafted by two jurists to ensure legal equality of each version and to reflect the concepts, terms and institutions of two legal traditions. In this article the corpus
24

Li, Hui, Lin Yu, Jie Zhang, and Ming Lyu. "Fusion Deep Learning and Machine Learning for Heterogeneous Military Entity Recognition." Wireless Communications and Mobile Computing 2022 (January 17, 2022): 1–11. http://dx.doi.org/10.1155/2022/1103022.

Abstract:
With respect to the fuzzy boundaries of military heterogeneous entities, this paper improves the entity annotation mechanism for entity with fuzzy boundaries based on related research works. This paper applies a BERT-BiLSTM-CRF model fusing deep learning and machine learning to recognize military entities, and thus, we can construct a smart military knowledge base with these entities. Furthermore, we can explore many military AI applications with the knowledge base and military Internet of Things (MIoT). To verify the performance of the model, we design multiple types of experiments. Experimen
25

Haase, Christoph. "Teaching Lexico-Semantic Complexity in Student Academic Writing: Corpora and Corpus Tools." International Journal of Education and Social Science Research 07, no. 06 (2024): 299–308. https://doi.org/10.37500/ijessr.2024.7622.

Abstract:
This contribution surveys methods to enhance the teaching of English for Academic Purposes via computer-aided corpus methods. It introduces data generated from a custom-made algorithm to automatically assign a score for the lexico-semantic complexity of a given text. It also presents findings from corpus studies that investigate the linguistic parameters of academic texts. The tool utilizes the WordNet project, which is part of the semantic web initiative. The texts analyzed in the study come from the self-compiled corpus project CUJOE which includes various registers of student academic writi
26

Ganesh, Preetham, Bharat S. Rawal, Alexander Peter, and Andi Giri. "POS-Tagging based Neural Machine Translation System for European Languages using Transformers." WSEAS Transactions on Information Science and Applications 18 (May 24, 2021): 26–33. http://dx.doi.org/10.37394/23209.2021.18.5.

Abstract:
The interaction between human beings has always faced different kinds of difficulties. One of those difficulties is the language barrier. It would be a tedious task for someone to learn all the syllables in a new language in a short period and converse with a native speaker without grammatical errors. Moreover, having a language translator at all times would be intrusive and expensive. We propose a novel approach to Neural Machine Translation (NMT) system using interlanguage word similarity-based model training and Part-Of-Speech (POS) Tagging based model testing. We compare these approaches us
27

Ali, Zeshan Ali. "Research Chinese-Urdu Machine Translation Based on Deep Learning." Journal of Autonomous Intelligence 3, no. 2 (2021): 34. http://dx.doi.org/10.32629/jai.v3i2.279.

Abstract:
Urdu is Pakistan's national language. However, Chinese expertise is very negligible in Pakistan and the Asian nations. Yet fewer research has been undertaken in the area of computer translation on Chinese to Urdu. In order to solve the above problems, we designed of an electronic dictionary for Chinese-Urdu, and studied the sentence-level machine translation technology which is based on deep learning. The Design of an electronic dictionary Chinese-Urdu machine translation system we collected and constructed an electronic dictionary containing 24000 entries from Chinese to Urdu. For Sentence w
28

Afia, Nur, and Muawwinatul Laili. "Developing a Corpus-Based English Vocabulary Dictionary using The ADDIE Model." Nusantara Educational Review 1, no. 1 (2023): 56–62. http://dx.doi.org/10.55732/ner.v1i1.1024.

Abstract:
Some students are still unable to speak English, understand English texts, or write in English because they are not used to carrying a dictionary. A corpus-based dictionary can help students access English learning. The aim of this study is to develop a corpus-based English vocabulary dictionary. The research subjects were 8th-grade students and language teachers at Al Manshur Junior High School, Sidoarjo. This educational development research used the ADDIE model. The study tested the feasibility of the corpus-based dictionary through validation by design experts, material experts, and
29

Shin, Han-Sub, Hyuk-Yoon Kwon, and Seung-Jin Ryu. "A New Text Classification Model Based on Contrastive Word Embedding for Detecting Cybersecurity Intelligence in Twitter." Electronics 9, no. 9 (2020): 1527. http://dx.doi.org/10.3390/electronics9091527.

Abstract:
Detecting cybersecurity intelligence (CSI) on social media such as Twitter is crucial because it allows security experts to respond cyber threats in advance. In this paper, we devise a new text classification model based on deep learning to classify CSI-positive and -negative tweets from a collection of tweets. For this, we propose a novel word embedding model, called contrastive word embedding, that enables to maximize the difference between base embedding models. First, we define CSI-positive and -negative corpora, which are used for constructing embedding models. Here, to supplement the imb
30

Gyasi, Frederick, and Tim Schlippe. "Twi Machine Translation." Big Data and Cognitive Computing 7, no. 2 (2023): 114. http://dx.doi.org/10.3390/bdcc7020114.

Abstract:
French is a strategically and economically important language in the regions where the African language Twi is spoken. However, only a very small proportion of Twi speakers in Ghana speak French. The development of a Twi–French parallel corpus and corresponding machine translation applications would provide various advantages, including stimulating trade and job creation, supporting the Ghanaian diaspora in French-speaking nations, assisting French-speaking tourists and immigrants seeking medical care in Ghana, and facilitating numerous downstream natural language processing tasks. Since there
31

Pořízka, Petr. "A Corpus of Czech Essays from the Turn of the 1900s." Journal of Linguistics/Jazykovedný casopis 72, no. 2 (2021): 618–30. http://dx.doi.org/10.2478/jazcas-2021-0056.

Abstract:
A literary essay is an interesting unit for language analyses, as its stylistic means often exceed the boundaries of the genre of an artistic essay. The article presents a new corpus of Czech literary essays covering approximately fifty years from 1890 to 1940. Along with the characterisation of the corpus and its annotation, the paper focuses on the TxM corpus tool: In the second part of the study, we use selected texts to conduct an analysis of seven various authors through multidimensional cluster analysis, factorial correspondence analysis and a specificity score. The main paramet
32

Umar, Mahmood, Hauwa Ibrahim Binji, and Anas Tukur Balarabe. "Corpus-based Approaches for Sentiment Analysis: A Review." Asian Journal of Research in Computer Science 17, no. 7 (2024): 95–102. http://dx.doi.org/10.9734/ajrcos/2024/v17i7481.

Abstract:
The investigation studied the state of art on corpus-based approaches for sentiment analysis. Thus, detailing its methodologies, evaluation metrics, limitations, and future directions. The importance of sentiment analysis in fields such as marketing, customer feedback analysis, social media monitoring, financial analysis, and political science is emphasized. The methodology for corpus-based approaches in sentiment analysis includes the following key steps: data collection, preprocessing, feature extraction, and sentiment classification. The lexicon-based approaches include the corpus-based or
33

Asiri, Afefa, and Mostafa Saleh. "SOD: A Corpus for Saudi Offensive Language Detection Classification." Computers 13, no. 8 (2024): 211. http://dx.doi.org/10.3390/computers13080211.

Abstract:
Social media platforms like X (formerly known as Twitter) are integral to modern communication, enabling the sharing of news, emotions, and ideas. However, they also facilitate the spread of harmful content, and manual moderation of these platforms is impractical. Automated moderation tools, predominantly developed for English, are insufficient for addressing online offensive language in Arabic, a language rich in dialects and informally used on social media. This gap underscores the need for dedicated, dialect-specific resources. This study introduces the Saudi Offensive Dialectal dataset (SO
34

Sun, Changjian, Wentao Chen, Zhen Zhang, and Tian Zhang. "A Patent Keyword Extraction Method Based on Corpus Classification." Mathematics 12, no. 7 (2024): 1068. http://dx.doi.org/10.3390/math12071068.

Abstract:
The keyword extraction of patents is crucial for technicians to master the trends of technology. Traditional keyword extraction approaches only handle short text like title or claims, but ignore the comprehensive meaning of the description. This paper proposes a novel patent keyword extraction method based on corpus classification (PKECC), which simulates the patent understanding methods of human patent examiners. First of all, a corpus classification model based on multi-level attention mechanism adopts the Bert model and hierarchical attention mechanism to classify the sentences of patent de
35

Garg, Kamal Deep, Shashi Shekhar, Ajit Kumar, et al. "Framework for Handling Rare Word Problems in Neural Machine Translation System Using Multi-Word Expressions." Applied Sciences 12, no. 21 (2022): 11038. http://dx.doi.org/10.3390/app122111038.

Abstract:
Neural machine translation (NMT) is an ongoing technique used to implement machine translation (MT) systems. Natural language processing (NLP) researchers have shown that NMT systems are unable to deal with out-of-vocabulary (OOV) words and multi-word expressions (MWEs) in the text. OOV words are those that are not part of the current vocabulary of the NMT system. MWEs are phrases that consist of a minimum of two terms but are treated as a single unit. MWEs have great importance in NLP, linguistic theory, and MT systems. In this article, OOV words and MWEs are handled for the Punjabi to Englis
36

Beaumont, Thomas L., Alireza M. Mohammadi, Albert H. Kim, Gene H. Barnett, and Eric C. Leuthardt. "Magnetic Resonance Imaging-Guided Laser Interstitial Thermal Therapy for Glioblastoma of the Corpus Callosum." Neurosurgery 83, no. 3 (2018): 556–65. http://dx.doi.org/10.1093/neuros/nyx518.

Abstract:
BACKGROUND: Glioblastoma of the corpus callosum is particularly difficult to treat, as the morbidity of surgical resection generally outweighs the potential survival benefit. Laser interstitial thermal therapy (LITT) is a safe and effective treatment option for difficult to access malignant gliomas of the thalamus and insula. OBJECTIVE: To assess the safety and efficacy of LITT for the treatment of glioblastoma of the corpus callosum. METHODS: We performed a multicenter retrospective analysis of prospectively collected data. The primary endpoint was the safety and efficacy of LITT as a t
37

Allam, Ahmed, Peter J. Schulz, and Michael Krauthammer. "Toward automated assessment of health Web page quality using the DISCERN instrument." Journal of the American Medical Informatics Association 24, no. 3 (2016): 481–87. http://dx.doi.org/10.1093/jamia/ocw140.

Abstract:
Background: As the Internet becomes the number one destination for obtaining health-related information, there is an increasing need to identify health Web pages that convey an accurate and current view of medical knowledge. In response, the research community has created multicriteria instruments for reliably assessing online medical information quality. One such instrument is DISCERN, which measures health Web page quality by assessing an array of features. In order to scale up use of the instrument, there is interest in automating the quality evaluation process by building machine learning
38

Görmez, Yasin. "Customized deep learning based Turkish automatic speech recognition system supported by language model." PeerJ Computer Science 10 (April 3, 2024): e1981. http://dx.doi.org/10.7717/peerj-cs.1981.

Abstract:
Background: In today’s world, numerous applications integral to various facets of daily life include automatic speech recognition methods. Thus, the development of a successful automatic speech recognition system can significantly augment the convenience of people’s daily routines. While many automatic speech recognition systems have been established for widely spoken languages like English, there has been insufficient progress in developing such systems for less common languages such as Turkish. Moreover, due to its agglutinative structure, designing a speech recognition system for Turkish pre
39

Park, Seongsik, and Harksoo Kim. "Dual Pointer Network for Fast Extraction of Multiple Relations in a Sentence." Applied Sciences 10, no. 11 (2020): 3851. http://dx.doi.org/10.3390/app10113851.

Abstract:
Relation extraction is a type of information extraction task that recognizes semantic relationships between entities in a sentence. Many previous studies have focused on extracting only one semantic relation between two entities in a single sentence. However, multiple entities in a sentence are associated through various relations. To address this issue, we proposed a relation extraction model based on a dual pointer network with a multi-head attention mechanism. The proposed model finds n-to-1 subject–object relations using a forward object decoder. Then, it finds 1-to-n subject–object relati
40

Uddin, Farid, Yibo Chen, Zuping Zhang, and Xin Huang. "Corpus Statistics Empowered Document Classification." Electronics 11, no. 14 (2022): 2168. http://dx.doi.org/10.3390/electronics11142168.

Abstract:
In natural language processing (NLP), document classification is an important task that relies on the proper thematic representation of the documents. Gaussian mixture-based clustering is widespread for capturing rich thematic semantics but ignores emphasizing potential terms in the corpus. Moreover, the soft clustering approach causes long-tail noise by putting every word into every cluster, which affects the natural thematic representation of documents and their proper classification. It is more challenging to capture semantic insights when dealing with short-length documents where word co-o
41

Zhu, Ling, Derek F. Wong, and Lidia S. Chao. "Unsupervised Chunking Based on Graph Propagation from Bilingual Corpus." Scientific World Journal 2014 (2014): 1–10. http://dx.doi.org/10.1155/2014/401943.

Abstract:
This paper presents a novel approach for unsupervised shallow parsing model trained on the unannotated Chinese text of parallel Chinese-English corpus. In this approach, no information of the Chinese side is applied. The exploitation of graph-based label propagation for bilingual knowledge transfer, along with an application of using the projected labels as features in unsupervised model, contributes to a better performance. The experimental comparisons with the state-of-the-art algorithms show that the proposed approach is able to achieve impressive higher accuracy in terms of F-score.
42

Wijayanti, Rini, and Andria Arisal. "Automatic Indonesian Sentiment Lexicon Curation with Sentiment Valence Tuning for Social Media Sentiment Analysis." ACM Transactions on Asian and Low-Resource Language Information Processing 20, no. 1 (2021): 1–16. http://dx.doi.org/10.1145/3425632.

Abstract:
A novel Indonesian sentiment lexicon (SentIL -- Sentiment Indonesian Lexicon) is created with an automatic pipeline; from creating sentiment seed words, adding new words with slang words, emoticons, and from the given dictionary and sentiment corpus, until tuning sentiment value with tagged sentiment corpus. It begins by taking seed words from WordNet Bahasa that mapped with sentiment value from English SentiWordNet. The seed words are enriched by combining the dictionary-based method with words’ synonyms and antonyms, and corpus-based methods with word embedding for word similarity that trai
43

Weissenbacher, Davy, Abeed Sarker, Ari Klein, Karen O’Connor, Arjun Magge, and Graciela Gonzalez-Hernandez. "Deep neural networks ensemble for detecting medication mentions in tweets." Journal of the American Medical Informatics Association 26, no. 12 (2019): 1618–26. http://dx.doi.org/10.1093/jamia/ocz156.

Abstract:
Objective: Twitter posts are now recognized as an important source of patient-generated data, providing unique insights into population health. A fundamental step toward incorporating Twitter data in pharmacoepidemiologic research is to automatically recognize medication mentions in tweets. Given that lexical searches for medication names suffer from low recall due to misspellings or ambiguity with common words, we propose a more advanced method to recognize them. Materials and Methods: We present Kusuri, an Ensemble Learning classifier able to identify tweets mentioning drug products a
44

Liang, Zi, Pinghui Wang, Ruofei Zhang, et al. "Exploring Intrinsic Alignments Within Text Corpus." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 26 (2025): 27455–63. https://doi.org/10.1609/aaai.v39i26.34957.

Abstract:
Recent years have witnessed rapid advancements in the safety alignments of large language models (LLMs). Methods such as supervised instruction fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) have thus emerged as vital components in constructing LLMs. While these methods achieve robust and fine-grained alignment to human values, their practical application is still hindered by high annotation costs and incomplete human alignments. Besides, the intrinsic human values within training corpora have not been fully exploited. To address these issues, we propose ISAAC (Intrins
45

Yulita, Winda, Sigit Priyanta, and Azhari SN. "Automatic Text Summarization Based on Semantic Networks and Corpus Statistics." IJCCS (Indonesian Journal of Computing and Cybernetics Systems) 13, no. 2 (2019): 137. http://dx.doi.org/10.22146/ijccs.38261.

Abstract:
One simple automatic text summarization method that can minimize redundancy, in summary, is the Maximum Marginal Relevance (MMR) method. The MMR method has the disadvantage of having parts that are separated from each other in summary results that are not semantically connected. Therefore, this study aims to compare summary results using the MMR method based on semantic and non-semantic based MMR. Semantic-based MMR methods utilize WordNet Bahasa and corpus in processing text summaries. The MMR method is non-semantic based on the TF-IDF method. This study also carried out summary compression o
46

Gerber, Matthew, and Joyce Chai. "Semantic Role Labeling of Implicit Arguments for Nominal Predicates." Computational Linguistics 38, no. 4 (2012): 755–98. http://dx.doi.org/10.1162/coli_a_00110.

Abstract:
Nominal predicates often carry implicit arguments. Recent work on semantic role labeling has focused on identifying arguments within the local context of a predicate; implicit arguments, however, have not been systematically examined. To address this limitation, we have manually annotated a corpus of implicit arguments for ten predicates from NomBank. Through analysis of this corpus, we find that implicit arguments add 71% to the argument structures that are present in NomBank. Using the corpus, we train a discriminative model that is able to identify implicit arguments with an F1 score of 50%
47

Putra, Rheza Ramadhan, Donni Richasdy, and Aditya Firman Ihsan. "Part-of-Speech Tagging Implementation on Telkom University News using Bidirectional LSTM Method." JURNAL MEDIA INFORMATIKA BUDIDARMA 7, no. 1 (2023): 360. http://dx.doi.org/10.30865/mib.v7i1.5506.

Abstract:
News is a tool used to disseminate information through various media, one of which is the internet. Various kinds of news articles have words that are not recognized in the dictionary such as slang words and have foreign words that do not exist in the corpus. How can a POS tagging model built on the corpus be able to handle word class labeling in Indonesian news. The research was conducted to check the results of POS tagging on a collection of news about Telkom University which was selected manually. By using the bidirectional LSTM model, three test scenarios were attempted to improve the perf
48

Anokhina, T. O. "Multilingual corpus as resource for working with political speeches by European public figures." MESSENGER of Kyiv National Linguistic University. Series Philology 26, no. 2 (2024): 9–19. http://dx.doi.org/10.32589/2311-0821.2.2023.297658.

Abstract:
The article examines the use of the multilingual corpus as a tool for analyzing political speeches by European public figures. The novelty of the article lies in the automatic and semi-automatic compilation of the corpus, focusing on speeches delivered by European leaders, key figures and economists. It introduces a novel approach to the analysis of political speeches by European public figures through the compilation of a multilingual corpus. The semi-automatic compilation mode, facilitated by Sketch Engine functionalities, allowed for efficient processing and visualization, offering course p
49

Jyoti, Malhotra, and Bakal Jagdish. "A Deterministic Eviction Model for Removing Redundancies in Video Corpus." International Journal of Electrical and Computer Engineering (IJECE) 8, no. 5 (2018): 3221–31. https://doi.org/10.11591/ijece.v8i5.pp3221-3231.

Abstract:
The traditional storage approaches are being challenged by huge data volumes. In multimedia content, every file does not necessarily get tagged as an exact duplicate; rather they are prone to editing and resulting in similar copies of the same file. This paper proposes the similarity-based deduplication approach to evict similar duplicates from the archive storage, which compares the samples of binary hashes to identify the duplicates. This eviction is done by initially dividing the query video into dynamic key frames based on the video length. Binary hash codes of these frames are then compar
50

Muhammad Ahsan Thoriq, Mohammad Ahsanuddin, Nurul Murtadho, and Walaa Ali Omar Abdelzaher. "Taṭwīr Masyrū' al-Mudawwanah As-Sam'iyyah al-Baṣariyyah li Kafā`ati at-Tafkīr an-Nāqid wamā warā`a al-Ma'rifī li Firaqi al-Munaẓarah al-‘Arabiyyah fī Indonesia." LISANIA: Journal of Arabic Education and Literature 7, no. 2 (2023): 217–29. http://dx.doi.org/10.18326/lisania.v7i2.217-229.

Abstract:
This research aims to develop an Arabic audio-visual debate corpus based on the Qatar debate guidelines. This development is intended to assist students in enhancing critical thinking and metacognitive skills through Arabic debate learning. The research utilizes the Lee & Owens development approach and is implemented using experimental and control groups. The choice of this approach is because it focuses solely on multimedia learning development. The development procedure consists of five stages: analysis, design, development, implementation, and evaluation. The subjects in this study cons