Journal articles on the topic 'Long Document Classification and Explanation'

Consult the top 50 journal articles for your research on the topic 'Long Document Classification and Explanation.'

Next to every source in the list of references is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse journal articles in a wide variety of disciplines and organise your bibliography correctly.

1

Beckh, Katharina, Joann Rachel Jacob, Adrian Seeliger, Stefan Rüping, and Najmeh Mousavi Nejad. "Limitations of Feature Attribution in Long Text Classification of Standards." Proceedings of the AAAI Symposium Series 4, no. 1 (2024): 10–17. http://dx.doi.org/10.1609/aaaiss.v4i1.31765.

Abstract:
Managing complex AI systems requires insight into a model's decision-making processes. Understanding how these systems arrive at their conclusions is essential for ensuring reliability. In the field of explainable natural language processing, many approaches have been developed and evaluated. However, experimental analysis of explainability for text classification has been largely constrained to short text and binary classification. In this applied work, we study explainability for a real-world task where the goal is to assess the technological suitability of standards. This prototypical use case is characterized by large documents, technical language, and a multi-label setting, making it a complex modeling challenge. We provide an analysis of approx. 1000 documents with human-annotated evidence. We then present experimental results with two explanation methods evaluating plausibility and runtime of explanations. We find that the average runtime for explanation generation is at least 5 minutes and that the model explanations do not overlap with the ground truth. These findings reveal limitations of current explanation methods. In a detailed discussion, we identify possible reasons and how to address them on three different dimensions: task, model and explanation method. We conclude with risks and recommendations for the use of feature attribution methods in similar settings.
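The plausibility evaluation this abstract describes (the overlap between a model's explanation and human-annotated evidence) can be sketched in a few lines. The token scores, the top-k cutoff, and the Jaccard measure below are illustrative assumptions, not the authors' implementation:

```python
# Hedged sketch: plausibility as token overlap between a feature-attribution
# ranking and human-annotated evidence. Illustrative only, not the paper's code.

def top_k_tokens(attributions, k=10):
    """Return the k tokens with the highest absolute attribution score."""
    ranked = sorted(attributions, key=lambda t: abs(t[1]), reverse=True)
    return {token for token, _ in ranked[:k]}

def plausibility_overlap(attributions, evidence_tokens, k=10):
    """Jaccard overlap between top-k attributed tokens and annotated evidence."""
    predicted = top_k_tokens(attributions, k)
    evidence = set(evidence_tokens)
    union = predicted | evidence
    return len(predicted & evidence) / len(union) if union else 0.0

# Toy example: the model attributes its label mostly to generic terms while the
# annotator marked domain-specific evidence, giving the low overlap the paper reports.
attr = [("standard", 0.9), ("shall", 0.7), ("voltage", 0.2), ("the", 0.1)]
print(plausibility_overlap(attr, ["voltage", "frequency", "converter"], k=3))
```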
2

Sitompul, Anita, Kammer Tuahman Sipayung, and Jubil Sihite. "The Analysis of Reading Exercise in English Textbook Entitled Pathway to English for Senior High School Grade X." Jurnal Suluh Pendidikan 7, no. 1 (2019): 10–13. http://dx.doi.org/10.36655/jsp.v7i1.111.

Abstract:
This study analyzes the types of reading exercises in the English textbook used by first-year students of SMA SWASTA METHODIST 7 Medan. The design of the study is descriptive qualitative research. The qualitative data were obtained through the steps described in the research procedure: reading, identification, classification, and simplification. The researcher analyzed the exercises in three steps: identifying the topic, clustering the topic, and drawing an explanation. The object of the study is the reading exercises in Pathway to English by Th. M. Sudarwati and Eudia Grace, published by Erlangga in 2017. The data were collected solely through documentary analysis; that is, the writer documented the reading exercises in the students' English textbook and analyzed their types. The final result shows five types of reading exercises in the textbook: matching tests, true/false reading tests, multiple-choice item tests, completion item tests, and long- and short-answer questions. The analysis further distinguishes controlled exercises from guided exercises. The researcher found no controlled exercises in the Pathway to English textbook. Among guided exercises, the researcher found 2 multiple-choice vocabulary exercises; 3 cued-word matching, 1 picture-cued sentence matching, 5 vocabulary matching, and 7 selected-response fill-in vocabulary exercises; 3 true/false exercises; 1 completion item following a text and 9 completion items within the text itself; and 6 short-answer and 8 long-answer questions. However, the researcher found no multiple-choice contextualized vocabulary/grammar or multiple-choice vocabulary/grammar exercises.
3

Pfau-Effinger, Birgit, and Marcel Sebastian. "Institutional persistence despite cultural change: a historical case study of the re-categorization of dogs in Germany." Agriculture and Human Values 39, no. 1 (2021): 473–85. http://dx.doi.org/10.1007/s10460-021-10272-4.

Abstract:
Human–animal relations in post-industrial societies are characterized by a system of cultural categories that distinguishes between different types of animals based on their function in human society, such as “farm animals” or “pets.” The system of cultural categories, and the allocation of animal species within this cultural classification system can change. Options for change include re-categorizing a specific animal species within the categorical system. The paper argues that attempts by political actors to adapt the institutional system to cultural change that calls for re-categorization of certain animal species can start a contradictory process that may lead to long-term survival of the respective institution despite the cultural change. It is common to explain the persistence of political institutions with institutional path dependency or policy preferences of the governing parties. This paper introduces a new institutional theoretical approach to the explanation, the approach of “rejecting changing a part for fear of undermining the whole.” This paper uses a case study of a series of failed political efforts to change the treatment of dogs in the framework of the agricultural human–animal policy in the Federal Republic of Germany in the second half of the twentieth century, to evaluate its theoretical argument, using analyses of historical political documents, mass media, and communication documents between civil society actors and policymakers. This paper makes an innovative contribution to the theory and research on institutional change, the sociology of agriculture and food, and the sociology of human–animal relations.
4

Shi, Tian, Xuchao Zhang, Ping Wang, and Chandan K. Reddy. "Corpus-level and Concept-based Explanations for Interpretable Document Classification." ACM Transactions on Knowledge Discovery from Data 16, no. 3 (2022): 1–17. http://dx.doi.org/10.1145/3477539.

Abstract:
Using attention weights to identify information that is important for models’ decision making is a popular approach to interpret attention-based neural networks. This is commonly realized in practice through the generation of a heat-map for every single document based on attention weights. However, this interpretation method is fragile and it is easy to find contradictory examples. In this article, we propose a corpus-level explanation approach, which aims at capturing causal relationships between keywords and model predictions via learning the importance of keywords for predicted labels across a training corpus based on attention weights. Based on this idea, we further propose a concept-based explanation method that can automatically learn higher level concepts and their importance to model prediction tasks. Our concept-based explanation method is built upon a novel Abstraction-Aggregation Network (AAN), which can automatically cluster important keywords during an end-to-end training process. We apply these methods to the document classification task and show that they are powerful in extracting semantically meaningful keywords and concepts. Our consistency analysis results based on an attention-based Naïve Bayes classifier (NBC) also demonstrate that these keywords and concepts are important for model predictions.
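The corpus-level idea, aggregating per-document attention weights into a keyword-importance table per label, can be illustrated with a minimal sketch. The input format and the averaging scheme below are assumptions for illustration, not the paper's Abstraction-Aggregation Network:

```python
# Hedged sketch of corpus-level explanation: aggregate per-document attention
# weights into a ranked keyword-importance table for each predicted label.
from collections import defaultdict

def corpus_level_importance(docs):
    """docs: iterable of (predicted_label, [(token, attention_weight), ...])."""
    totals = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for label, token_weights in docs:
        for token, w in token_weights:
            totals[label][token] += w
            counts[label][token] += 1
    # Average attention a token receives whenever it appears under a label.
    return {
        label: sorted(((tok, totals[label][tok] / counts[label][tok])
                       for tok in totals[label]), key=lambda x: -x[1])
        for label in totals
    }

docs = [("sports", [("match", 0.6), ("team", 0.3)]),
        ("sports", [("match", 0.5), ("price", 0.1)]),
        ("finance", [("price", 0.7), ("market", 0.4)])]
for label, ranking in corpus_level_importance(docs).items():
    print(label, ranking[:3])  # top keywords per label across the corpus
```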
5

Uddin, Farid, Yibo Chen, Zuping Zhang, and Xin Huang. "Corpus Statistics Empowered Document Classification." Electronics 11, no. 14 (2022): 2168. http://dx.doi.org/10.3390/electronics11142168.

Abstract:
In natural language processing (NLP), document classification is an important task that relies on the proper thematic representation of the documents. Gaussian mixture-based clustering is widespread for capturing rich thematic semantics but neglects to emphasize potential terms in the corpus. Moreover, the soft clustering approach causes long-tail noise by putting every word into every cluster, which affects the natural thematic representation of documents and their proper classification. It is more challenging to capture semantic insights when dealing with short documents, where word co-occurrence information is limited. In this context, for long texts, we propose the Weighted Sparse Document Vector (WSDV), which performs clustering on weighted data that emphasizes vital terms and moderates the soft clustering by removing outliers from the converged clusters. Besides the removal of outliers, WSDV utilizes corpus statistics at different steps of the vectorial representation of the document. For short texts, we propose the Weighted Compact Document Vector (WCDV), which captures better semantic insights in building document vectors by emphasizing potential terms and capturing uncertainty information while measuring the affinity between distributions of words. Using available corpus statistics, WCDV sufficiently handles the data sparsity of short texts without depending on external knowledge sources. To evaluate the proposed models, we performed multiclass document classification using standard performance measures (precision, recall, F1-score, and accuracy) on three long-text and two short-text benchmark datasets, outperforming some state-of-the-art models. The experimental results demonstrate that in long-text classification, WSDV reached 97.83% accuracy on the AgNews dataset, 86.05% accuracy on the 20Newsgroup dataset, and 98.67% accuracy on the R8 dataset. In short-text classification, WCDV reached 72.7% accuracy on the SearchSnippets dataset and 89.4% accuracy on the Twitter dataset.
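A minimal sketch of the WSDV recipe, clustering weighted word vectors with a Gaussian mixture and pruning low-likelihood outliers from the converged clusters, is shown below. The TF-IDF-style weights, random vectors, and percentile cutoff are all assumed for illustration:

```python
# Hedged sketch: weighted Gaussian-mixture clustering with outlier removal.
# Stand-in data; the weighting and cutoff are illustrative assumptions.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
word_vecs = rng.normal(size=(200, 16))           # stand-in word embeddings
tfidf_weight = rng.uniform(0.1, 1.0, size=200)   # stand-in corpus statistics

gmm = GaussianMixture(n_components=5, random_state=0)
gmm.fit(word_vecs * tfidf_weight[:, None])       # emphasize vital terms

log_lik = gmm.score_samples(word_vecs * tfidf_weight[:, None])
keep = log_lik > np.percentile(log_lik, 10)      # moderate the soft clustering:
print(f"kept {keep.sum()} of {len(keep)} words") # drop long-tail outlier words
```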
6

Isha, Bharti Bhardwaj, and Bal Ram Bhardwaj Er. "SMART CLOUD WITH DOCUMENT CLUSTERING." International Journal of Advances in Engineering & Scientific Research 3, no. 2 (2016): 18–31. https://doi.org/10.5281/zenodo.10749526.

Abstract:
This research paper describes the results of an experimental study of conventional document clustering techniques implemented in commercial spaces so far. In particular, we compare the main approaches to document clustering: agglomerative hierarchical clustering and K-means. Through this paper, we design and implement a checker algorithm that detects duplication between a document's content and the rest of the documents in the cloud. We also present an algorithm for classifying the cloud data. Classification in this algorithm is based on the date the data was uploaded and how often that data is accessed by the client. We take the ratio of both factors and generate a score that ranks the document in the classification. We propose an explanation for these results based on an analysis of the specifics of the clustering algorithms and the nature of document data. Keywords: algorithm, commercial, classification, hierarchical, nature, etc.
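The ranking score described above combines upload recency with access frequency. The paper does not specify the exact ratio, so the formula below is an assumed illustration:

```python
# Hedged sketch of the classification score: the exact ratio of upload date to
# access count is not given in the paper, so this formula is an assumption.
from datetime import date

def document_score(upload_date: date, access_count: int, today: date) -> float:
    age_days = max((today - upload_date).days, 1)  # avoid division by zero
    return access_count / age_days  # recent, frequently accessed docs rank high

print(document_score(date(2016, 1, 1), access_count=120, today=date(2016, 4, 1)))
```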
7

Liu, Liu, Kaile Liu, Zhenghai Cong, Jiali Zhao, Yefei Ji, and Jun He. "Long Length Document Classification by Local Convolutional Feature Aggregation." Algorithms 11, no. 8 (2018): 109. http://dx.doi.org/10.3390/a11080109.

Abstract:
The exponential increase in online reviews and recommendations makes document classification and sentiment analysis a hot topic in academic and industrial research. Traditional deep-learning-based document classification methods require the use of full textual information to extract features. In this paper, in order to tackle long documents, we propose three methods that use local convolutional feature aggregation to implement document classification. The first proposed method randomly draws blocks of continuous words from the full document. Each block is fed into a convolutional neural network to extract features, which are then concatenated to output the classification probability through a classifier. The second model improves on the first by capturing the contextual order information of the sampled blocks with a recurrent neural network. The third model is inspired by the recurrent attention model (RAM), in which a reinforcement learning module is introduced to act as a controller for selecting the next block position based on the recurrent state. Experiments on our collected four-class arXiv paper dataset show that the three proposed models all perform well, and the RAM model achieves the best test accuracy with the least information.
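The first of the three models, random block sampling followed by a shared CNN and feature concatenation, can be sketched as follows. All hyperparameters (block length, number of blocks, filter sizes) are illustrative assumptions, not the paper's settings:

```python
# Hedged sketch of the first model: sample random blocks of consecutive words,
# encode each block with a shared CNN, concatenate block features, classify.
import torch
import torch.nn as nn

class LocalBlockCNN(nn.Module):
    def __init__(self, vocab=10000, emb=64, block_len=50, n_blocks=4, n_classes=4):
        super().__init__()
        self.n_blocks, self.block_len = n_blocks, block_len
        self.embed = nn.Embedding(vocab, emb)
        self.conv = nn.Conv1d(emb, 128, kernel_size=5, padding=2)
        self.fc = nn.Linear(128 * n_blocks, n_classes)

    def forward(self, doc):                        # doc: (batch, doc_len) token ids
        feats = []
        for _ in range(self.n_blocks):             # randomly draw word blocks
            start = torch.randint(0, doc.size(1) - self.block_len + 1, (1,)).item()
            block = self.embed(doc[:, start:start + self.block_len])
            h = self.conv(block.transpose(1, 2))   # (batch, 128, block_len)
            feats.append(h.max(dim=2).values)      # max-pool each block's features
        return self.fc(torch.cat(feats, dim=1))    # concatenate, then classify

logits = LocalBlockCNN()(torch.randint(0, 10000, (2, 3000)))
print(logits.shape)  # torch.Size([2, 4])
```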
8

Almonayyes, Ahmad. "Multiple Explanations Driven Naïve Bayes Classifier." JUCS - Journal of Universal Computer Science 12, no. 2 (2006): 127–39. https://doi.org/10.3217/jucs-012-02-0127.

Abstract:
Exploratory data analysis over foreign-language text presents a virtually untapped opportunity. This work combines a Naïve Bayes classifier with Case-Based Reasoning in order to classify and analyze Arabic texts related to fanaticism. The Arabic vocabulary is converted to equivalent English words using a conceptual hierarchy structure. The understanding process operates in two phases. In the first phase, a discrimination network of multiple questions is used to retrieve explanatory knowledge structures, each of which gives an interpretation of a text according to a particular aspect of fanaticism. Explanation structures organize past documents of fanatic content. Similar documents are retrieved to generate additional valuable information about the new document. In the second phase, a document classification process based on Naïve Bayes is used to classify documents into their fanatic class. The results show that classification accuracy is improved by incorporating the explanation patterns with the Naïve Bayes classifier.
9

Mariyam, Ayesha, SK Althaf Hussain Basha, and S. Viswanadha Raju. "On Optimality of Long Document Classification using Deep Learning." International Journal on Recent and Innovation Trends in Computing and Communication 10, no. 12 (2022): 51–58. http://dx.doi.org/10.17762/ijritcc.v10i12.5866.

Abstract:
Document classification is effective with elegant models of word numerical distributions. Word embeddings are one category of numerical distributions of words from WordNet. Modern machine learning algorithms aim to classify documents based on categorical data. The context of interest in the categorical data is weighted, and the sense and quality of the sentences are estimated for sensible classification of documents. The focus of the current work is on legal and criminal documents extracted from popular news channels, particularly the classification of long legal and criminal documents. Optimization is the essential instrument for bringing quality inputs to the document classification model. The existing models are studied, and a feasible model for efficient document classification is proposed. The experiments are carried out with meticulous filtering and extraction of legal and criminal records from popular news websites, preprocessed with WordNet and text-processing contingencies for efficient input to the learning framework.
10

Wang, Bohan, Rui Qi, Jinhua Gao, Jianwei Zhang, Xiaoguang Yuan, and Wenjun Ke. "Mining the Frequent Patterns of Named Entities for Long Document Classification." Applied Sciences 12, no. 5 (2022): 2544. http://dx.doi.org/10.3390/app12052544.

Abstract:
Nowadays, a large amount of information is stored as text, and numerous text mining techniques have been developed for various applications, such as event detection, news topic classification, public opinion detection, and sentiment analysis. Although significant progress has been achieved for short text classification, document-level text classification requires further exploration. Long documents always contain irrelevant noisy information that obscures indicative features, limiting the interpretability of classification results. To alleviate this problem, a model called MIPELD (mining the frequent patterns of named entities for long document classification) is demonstrated, which mines the frequent patterns of named entities as features. Discovered patterns allow semantic generalization among documents and provide clues for verifying the results. Experiments on several datasets resulted in good accuracy and macro-F1 values, meeting the requirements for practical application. Further analysis validated the effectiveness of MIPELD in mining interpretable information for text classification.
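Mining frequent named-entity patterns as document features can be sketched with simple support counting. Entity extraction is assumed to have happened upstream, and the support threshold and pattern size are illustrative choices, not the paper's settings:

```python
# Hedged sketch: count the support of entity sets across documents and keep
# the frequent ones as interpretable document features.
from collections import Counter
from itertools import combinations

def frequent_entity_patterns(docs_entities, min_support=2, max_size=2):
    """docs_entities: list of entity sets, one per document."""
    counts = Counter()
    for entities in docs_entities:
        for size in range(1, max_size + 1):
            counts.update(combinations(sorted(entities), size))
    return {pat: c for pat, c in counts.items() if c >= min_support}

docs = [{"WHO", "Geneva"}, {"WHO", "Geneva", "UN"}, {"UN", "New York"}]
print(frequent_entity_patterns(docs))  # frequent patterns become features
```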
11

Zheng, Jianming, Yupu Guo, Chong Feng, and Honghui Chen. "A Hierarchical Neural-Network-Based Document Representation Approach for Text Classification." Mathematical Problems in Engineering 2018 (2018): 1–10. http://dx.doi.org/10.1155/2018/7987691.

Abstract:
Document representation is widely used in practical application, for example, sentiment classification, text retrieval, and text classification. Previous work is mainly based on the statistics and the neural networks, which suffer from data sparsity and model interpretability, respectively. In this paper, we propose a general framework for document representation with a hierarchical architecture. In particular, we incorporate the hierarchical architecture into three traditional neural-network models for document representation, resulting in three hierarchical neural representation models for document classification, that is, TextHFT, TextHRNN, and TextHCNN. Our comprehensive experimental results on two public datasets, that is, Yelp 2016 and Amazon Reviews (Electronics), show that our proposals with hierarchical architecture outperform the corresponding neural-network models for document classification, resulting in a significant improvement ranging from 4.65% to 35.08% in terms of accuracy with a comparable (or substantially less) expense of time consumption. In addition, we find that the long documents benefit more from the hierarchical architecture than the short ones as the improvement in terms of accuracy on long documents is greater than that on short documents.
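The hierarchical architecture shared by TextHFT, TextHRNN, and TextHCNN encodes words into sentence vectors and sentence vectors into a document vector. Below is a minimal TextHRNN-style sketch; the dimensions and GRU choice are assumptions for illustration:

```python
# Hedged sketch of a hierarchical document encoder: a word-level RNN produces
# sentence vectors, and a sentence-level RNN produces the document vector.
import torch
import torch.nn as nn

class HierarchicalRNN(nn.Module):
    def __init__(self, vocab=10000, emb=64, hidden=64, n_classes=5):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.word_rnn = nn.GRU(emb, hidden, batch_first=True)     # words -> sentence
        self.sent_rnn = nn.GRU(hidden, hidden, batch_first=True)  # sentences -> document
        self.fc = nn.Linear(hidden, n_classes)

    def forward(self, doc):              # doc: (batch, n_sents, sent_len) token ids
        b, s, w = doc.shape
        words = self.embed(doc.view(b * s, w))
        _, sent_vec = self.word_rnn(words)           # final state per sentence
        sent_seq = sent_vec.squeeze(0).view(b, s, -1)
        _, doc_vec = self.sent_rnn(sent_seq)         # final state per document
        return self.fc(doc_vec.squeeze(0))

print(HierarchicalRNN()(torch.randint(0, 10000, (2, 12, 20))).shape)  # (2, 5)
```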
12

Bai, Juho, Inwook Shim, and Seog Park. "MEXN: Multi-Stage Extraction Network for Patent Document Classification." Applied Sciences 10, no. 18 (2020): 6229. http://dx.doi.org/10.3390/app10186229.

Abstract:
A patent document has different content in each paragraph, and the overall document is very long. Moreover, patent documents are classified hierarchically with multiple labels. Many works have employed deep neural architectures to classify patent documents. Traditional document classification methods have not represented the characteristics of entire patent document contents well because they usually require a fixed input length. To address this issue, we propose a neural-network-based document classification for patent documents built on a novel multi-stage feature extraction network (MEXN), which comprises a paragraph encoder and a summarizer over all paragraphs. MEXN analyzes whole documents hierarchically and provides multi-label outputs, while only marginally increasing computing cost. We demonstrate that the proposed method outperforms current state-of-the-art models in patent document classification tasks with multi-label classification experiments on USPD datasets.
13

Yan, Yi-Fan, Sheng-Jun Huang, Shaoyi Chen, Meng Liao, and Jin Xu. "Active Learning with Query Generation for Cost-Effective Text Classification." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 04 (2020): 6583–90. http://dx.doi.org/10.1609/aaai.v34i04.6133.

Abstract:
Labeling a text document is usually time consuming because it requires the annotator to read the whole document and check its relevance with each possible class label. It thus becomes rather expensive to train an effective model for text classification when it involves a large dataset of long documents. In this paper, we propose an active learning approach for text classification with lower annotation cost. Instead of scanning all the examples in the unlabeled data pool to select the best one for query, the proposed method automatically generates the most informative examples based on the classification model, and thus can be applied to tasks with large-scale or even infinite unlabeled data. Furthermore, we propose to approximate the generated example with a few summary words by sparse reconstruction, which allows the annotators to easily assign the class label by reading a few words rather than the long document. Experiments on different datasets demonstrate that the proposed approach can effectively improve the classification performance while significantly reducing the annotation cost.
14

He, Jun, Liqun Wang, Liu Liu, Jiao Feng, and Hao Wu. "Long Document Classification From Local Word Glimpses via Recurrent Attention Learning." IEEE Access 7 (2019): 40707–18. http://dx.doi.org/10.1109/access.2019.2907992.

15

Zhao, Ke, Lan Huang, Rui Song, Qiang Shen, and Hao Xu. "A Sequential Graph Neural Network for Short Text Classification." Algorithms 14, no. 12 (2021): 352. http://dx.doi.org/10.3390/a14120352.

Abstract:
Short text classification is an important problem of natural language processing (NLP), and graph neural networks (GNNs) have been successfully used to solve different NLP problems. However, few studies employ GNN for short text classification, and most of the existing graph-based models ignore sequential information (e.g., word orders) in each document. In this work, we propose an improved sequence-based feature propagation scheme, which fully uses word representation and document-level word interaction and overcomes the limitations of textual features in short texts. On this basis, we utilize this propagation scheme to construct a lightweight model, sequential GNN (SGNN), and its extended model, ESGNN. Specifically, we build individual graphs for each document in the short text corpus based on word co-occurrence and use a bidirectional long short-term memory network (Bi-LSTM) to extract the sequential features of each document; therefore, word nodes in the document graph retain contextual information. Furthermore, two different simplified graph convolutional networks (GCNs) are used to learn word representations based on their local structures. Finally, word nodes combined with sequential information and local information are incorporated as the document representation. Extensive experiments on seven benchmark datasets demonstrate the effectiveness of our method.
16

Wang, Yifei, Yongwei Wang, Hao Hu, Shengnan Zhou, and Qinwu Wang. "Knowledge-Graph- and GCN-Based Domain Chinese Long Text Classification Method." Applied Sciences 13, no. 13 (2023): 7915. http://dx.doi.org/10.3390/app13137915.

Abstract:
In order to solve the current problems in domain long text classification tasks, namely, the long length of a document, which makes it difficult for the model to capture key information, and the lack of expert domain knowledge, which leads to insufficient classification accuracy, a domain long text classification model based on a knowledge graph and a graph convolutional neural network is proposed. BERT is used to encode the text, and each word's corresponding vector is used as a node for the graph convolutional neural network so that the initialized vectors contain rich semantic information. Using a trained entity-relationship extraction model, the entity-to-entity relationships in the document are extracted and used as the edges of the graph convolutional neural network, together with syntactic dependency information. A graph structure mask is used to learn edge relationships and edge types to further enhance the model's ability to learn semantic dependencies between words. The method further improves the accuracy of domain long text classification by fusing knowledge features and data features. Experiments on three long text classification datasets (IFLYTEK, THUCNews, and the Chinese corpus of Fudan University) show accuracy improvements of 8.8%, 3.6%, and 2.6%, respectively, relative to the BERT model.
17

Nikesh, M., D. Rohini, M. Bharathi, and Syeda Hifsa Naaz. "Deep Dive into Document Classification: Fusion of RNN and LSTM Approach." Journal of Knowledge in Data Science and Information Management 2, no. 1 (2025): 9–20. https://doi.org/10.46610/jokdsim.2025.v02i01.002.

Abstract:
Deep learning models have transformed text classification, outperforming traditional machine learning in tasks like sentiment analysis, news categorization, and question answering. Among them, Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks excel at handling sequential data. While RNNs capture context well, they struggle with long-term dependencies due to the vanishing gradient problem. LSTMs overcome this with memory cells that retain essential information over time, making them more effective for tasks requiring long-range context understanding. In this paper, we explore different deep learning models, particularly a merged RNN and LSTM, to identify the most accurate approach for text and document classification.
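A minimal sketch of the merged RNN + LSTM classifier the paper explores is shown below. The layer sizes and the stacking order (RNN first, LSTM on top) are assumptions for illustration:

```python
# Hedged sketch: an RNN layer captures local context, and an LSTM layer on top
# retains long-range dependencies via its memory cells before classification.
import torch
import torch.nn as nn

class RNNLSTMClassifier(nn.Module):
    def __init__(self, vocab=20000, emb=100, hidden=128, n_classes=4):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.rnn = nn.RNN(emb, hidden, batch_first=True)      # short-range context
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True) # long-range memory
        self.fc = nn.Linear(hidden, n_classes)

    def forward(self, tokens):          # tokens: (batch, seq_len) ids
        x = self.embed(tokens)
        x, _ = self.rnn(x)
        _, (h, _) = self.lstm(x)        # final hidden state summarizes the document
        return self.fc(h.squeeze(0))

print(RNNLSTMClassifier()(torch.randint(0, 20000, (3, 200))).shape)  # (3, 4)
```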
18

Wu, Tiandeng, Qijiong Liu, Yi Cao, Yao Huang, Xiao-Ming Wu, and Jiandong Ding. "Continual Graph Convolutional Network for Text Classification." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 11 (2023): 13754–62. http://dx.doi.org/10.1609/aaai.v37i11.26611.

Abstract:
Graph convolutional network (GCN) has been successfully applied to capture global non-consecutive and long-distance semantic information for text classification. However, while GCN-based methods have shown promising results in offline evaluations, they commonly follow a seen-token-seen-document paradigm by constructing a fixed document-token graph and cannot make inferences on new documents. It is a challenge to deploy them in online systems to infer streaming text data. In this work, we present a continual GCN model (ContGCN) to generalize inferences from observed documents to unobserved documents. Concretely, we propose a new all-token-any-document paradigm to dynamically update the document-token graph in every batch during both the training and testing phases of an online system. Moreover, we design an occurrence memory module and a self-supervised contrastive learning objective to update ContGCN in a label-free manner. A 3-month A/B test on Huawei public opinion analysis system shows ContGCN achieves 8.86% performance gain compared with state-of-the-art methods. Offline experiments on five public datasets also show ContGCN can improve inference quality. The source code will be released at https://github.com/Jyonn/ContGCN.
19

Chang, Charles, and Michael Masterson. "Using Word Order in Political Text Classification with Long Short-term Memory Models." Political Analysis 28, no. 3 (2019): 395–411. http://dx.doi.org/10.1017/pan.2019.46.

Abstract:
Political scientists often wish to classify documents based on their content to measure variables, such as the ideology of political speeches or whether documents describe a Militarized Interstate Dispute. Simple classifiers often serve well in these tasks. However, if words occurring early in a document alter the meaning of words occurring later in the document, using a more complicated model that can incorporate these time-dependent relationships can increase classification accuracy. Long short-term memory (LSTM) models are a type of neural network model designed to work with data that contains time dependencies. We investigate the conditions under which these models are useful for political science text classification tasks with applications to Chinese social media posts as well as US newspaper articles. We also provide guidance for the use of LSTM models.
20

Doussot, Sylvain. "Lincoln et l’esclavage : étude d’un cas de problématisation en histoire scolaire." Didactica Historica 4, no. 1 (2018): 1–11. http://dx.doi.org/10.33055/didacticahistorica.2018.004.01.99.long.

Abstract:
French students in a graduate training program to become history teachers have been studying a document in a typical classroom activity about an unknown topic (The American Civil War). This paper shows how close their positivist school habits are to school students, and how data and explanation models dealing with another question have arisen from the struggle to answer the initial question. This gives access to the conditions of possibility of a historical problem-building process in a school environment.
21

Desai, Poorav, Tanmoy Chakraborty, and Md Shad Akhtar. "Nice Perfume. How Long Did You Marinate in It? Multimodal Sarcasm Explanation." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 10 (2022): 10563–71. http://dx.doi.org/10.1609/aaai.v36i10.21300.

Abstract:
Sarcasm is a pervading linguistic phenomenon and highly challenging to explain due to its subjectivity, lack of context and deeply-felt opinion. In the multimodal setup, sarcasm is conveyed through the incongruity between the text and visual entities. Although recent approaches deal with sarcasm as a classification problem, it is unclear why an online post is identified as sarcastic. Without proper explanation, end users may not be able to perceive the underlying sense of irony. In this paper, we propose a novel problem -- Multimodal Sarcasm Explanation (MuSE) -- given a multimodal sarcastic post containing an image and a caption, we aim to generate a natural language explanation to reveal the intended sarcasm. To this end, we develop MORE, a new dataset with explanation of 3510 sarcastic multimodal posts. Each explanation is a natural language (English) sentence describing the hidden irony. We benchmark MORE by employing a multimodal Transformer-based architecture. It incorporates a cross-modal attention in the Transformer's encoder which attends to the distinguishing features between the two modalities. Subsequently, a BART-based auto-regressive decoder is used as the generator. Empirical results demonstrate convincing results over various baselines (adopted for MuSE) across five evaluation metrics. We also conduct human evaluation on predictions and obtain Fleiss' Kappa score of 0.4 as a fair agreement among 25 evaluators.
22

Xu, Pengyu, Lin Xiao, Bing Liu, Sijin Lu, Liping Jing, and Jian Yu. "Label-Specific Feature Augmentation for Long-Tailed Multi-Label Text Classification." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 9 (2023): 10602–10. http://dx.doi.org/10.1609/aaai.v37i9.26259.

Abstract:
Multi-label text classification (MLTC) involves tagging a document with its most relevant subset of labels from a label set. In real applications, labels usually follow a long-tailed distribution, where most labels (called tail labels) cover only a small number of documents and limit the performance of MLTC. To alleviate this low-resource problem, researchers introduced a simple but effective strategy, data augmentation (DA). However, most existing DA approaches struggle in multi-label settings. The main reason is that the augmented documents for one label may inevitably influence the other co-occurring labels and further exaggerate the long-tailed problem. To mitigate this issue, we propose a new pair-level augmentation framework for MLTC, called Label-Specific Feature Augmentation (LSFA), which merely augments positive feature-label pairs for the tail labels. LSFA contains two main parts. The first is label-specific document representation learning in the high-level latent space; the second augments tail-label features in latent space by transferring the documents' second-order statistics (intra-class semantic variations) from head labels to tail labels. Finally, we design a new loss function for adjusting classifiers based on augmented datasets. The whole learning procedure can be effectively trained. Comprehensive experiments on benchmark datasets have shown that the proposed LSFA outperforms state-of-the-art counterparts.
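The second-order statistics transfer at the heart of LSFA can be sketched in a few lines of NumPy. The feature dimensions, the toy distributions, and the sampling scheme are illustrative assumptions, not the paper's implementation:

```python
# Hedged sketch: transfer intra-class variation (second-order statistics) from
# a head label's features to a tail label to synthesize augmented features.
import numpy as np

rng = np.random.default_rng(0)
head = rng.normal(0, 2.0, size=(500, 8))   # many documents for a head label
tail = rng.normal(5, 0.1, size=(5, 8))     # few documents for a tail label

head_var = head - head.mean(axis=0)        # head label's semantic variations
samples = head_var[rng.integers(0, len(head_var), size=50)]
augmented = tail[rng.integers(0, len(tail), size=50)] + samples
print(augmented.shape)  # synthetic tail-label features for classifier tuning
```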
23

Sitaula, Chiranjibi, Anish Basnet, and Sunil Aryal. "Vector representation based on a supervised codebook for Nepali documents classification." PeerJ Computer Science 7 (March 3, 2021): e412. http://dx.doi.org/10.7717/peerj-cs.412.

Abstract:
Document representation with outlier tokens degrades classification performance due to the uncertain orientation of such tokens. Most existing document representation methods, in Nepali as in other languages, ignore strategies to filter outliers from documents before learning their representations. In this article, we propose a novel document representation method based on a supervised codebook to represent Nepali documents, where our codebook contains only semantic tokens without outliers. Our codebook is domain-specific, as it is based on tokens in a given corpus that have higher similarities with the class labels in the corpus. Our method adopts a simple yet prominent representation method for each word, called probability-based word embedding. To show the efficacy of our method, we evaluate its performance on the document classification task using a Support Vector Machine and validate it against widely used document representation methods such as Bag of Words, Latent Dirichlet Allocation, Long Short-Term Memory, Word2Vec, Bidirectional Encoder Representations from Transformers, and so on, using four Nepali text datasets (denoted A1, A2, A3 and A4). The experimental results show that our method produces state-of-the-art classification performance (77.46% accuracy on A1, 67.53% accuracy on A2, 80.54% accuracy on A3 and 89.58% accuracy on A4) compared to the widely used existing document representation methods. It yields the best classification accuracy on three datasets (A1, A2 and A3) and comparable accuracy on the fourth dataset (A4). Furthermore, we introduce the largest Nepali document dataset (A4), called the NepaliLinguistic dataset, to the linguistic community.
24

Arnold, Sebastian, Rudolf Schneider, Philippe Cudré-Mauroux, Felix A. Gers, and Alexander Löser. "SECTOR: A Neural Model for Coherent Topic Segmentation and Classification." Transactions of the Association for Computational Linguistics 7 (November 2019): 169–84. http://dx.doi.org/10.1162/tacl_a_00261.

Abstract:
When searching for information, a human reader first glances over a document, spots relevant sections, and then focuses on a few sentences for resolving her intention. However, the high variance of document structure complicates the identification of the salient topic of a given section at a glance. To tackle this challenge, we present SECTOR, a model to support machine reading systems by segmenting documents into coherent sections and assigning topic labels to each section. Our deep neural network architecture learns a latent topic embedding over the course of a document. This can be leveraged to classify local topics from plain text and segment a document at topic shifts. In addition, we contribute WikiSection, a publicly available data set with 242k labeled sections in English and German from two distinct domains: diseases and cities. From our extensive evaluation of 20 architectures, we report a highest score of 71.6% F1 for the segmentation and classification of 30 topics from the English city domain, scored by our SECTOR long short-term memory model with Bloom filter embeddings and bidirectional segmentation. This is a significant improvement of 29.5 points F1 over state-of-the-art CNN classifiers with baseline segmentation.
25

Berard, Raymond, Etienne Fleuret, Jacqueline Gillet, and Jean-Yves Mougel. "Academia and document supply: unsustainable contradictions at INIST?" Interlending & Document Supply 43, no. 3 (2015): 131–37. http://dx.doi.org/10.1108/ilds-05-2015-0017.

Abstract:
Purpose – The purpose of the paper is to describe the current situation at the Institute for Scientific and Technical Information (INIST) the French document supply centre after their difficulties with open access articles during 2013. Design/methodology/approach – A narrative and analytical explanation by the director of INIST. Findings – That INIST will no longer service the commercial sector and will provide documents to researchers in CNRS for free and priced to French higher education establishments. The growth in open access will mean that INIST is ceasing to be an industrial scale operation and will be concerned primarily with “long tail” requests. Originality/value – Certainly, the only account in English of the difficulties that INIST has faced in the past three years and how they have been dealt with.
26

Basili, Roberto, and Alessandro Moschitti. "Intelligent NLP-Driven Text Classification." International Journal on Artificial Intelligence Tools 11, no. 3 (2002): 389–423. http://dx.doi.org/10.1142/s0218213002000952.

Abstract:
Information Retrieval (IR) and NLP-driven Information Extraction (IE) are complementary activities. IR helps in locating specific documents within a huge search space (localization), while IE supports the localization of specific information within a document (extraction or explanation). In application scenarios, both capabilities are usually needed. IE is important here, as it can enrich IR inferences with motivating information. Work on Web-based IR suggests that embedding linguistic information (e.g., sense distinctions) at a suitable level within traditional quantitative approaches (e.g., query expansion as in [26]) is a promising approach. "Which linguistic level is best suited to which IR mechanism" is the interesting representational problem posed at the current research stage. It is also the central concern of this paper. A traditional method for efficient text categorization is presented here. Original features of the proposed model are a self-adapting parameterized weighting model and the use of linguistic information. The key idea is the integration of NLP methods within a robust and efficient TC framework, which combines the benefits of large-scale, efficient IR with the richer expressivity closer to IE. In this paper we capitalize on the systematic benchmarking resources available in TC to extensively derive empirical evidence about the above representational problem. The positive experimental results confirm that the proposed TC framework is a viable approach to intelligent text categorization on a large scale.
27

Lee, Kangwook, Sanggyu Han, and Sung-Hyon Myaeng. "A discourse-aware neural network-based text model for document-level text classification." Journal of Information Science 44, no. 6 (2017): 715–35. http://dx.doi.org/10.1177/0165551517743644.

Abstract:
Capturing semantics scattered across entire text is one of the important issues for Natural Language Processing (NLP) tasks. It is particularly critical with long text embodying a flow of themes. This article proposes a new text modelling method that can handle thematic flows of text with Deep Neural Networks (DNNs) in such a way that discourse information and distributed representations of text are incorporated. Unlike previous DNN-based document models, the proposed model enables discourse-aware analysis of text and composition of sentence-level distributed representations guided by the discourse structure. More specifically, our method identifies Elementary Discourse Units (EDUs) and their discourse relations in a given document by applying Rhetorical Structure Theory (RST)-based discourse analysis. The result is fed into a tree-structured neural network that reflects the discourse information, including the structure of the document and the discourse roles and relation types. We evaluate the document model on two document-level text classification tasks, sentiment analysis and sarcasm detection, with comparisons against reference systems that also utilise discourse information. In addition, we conduct further experiments to evaluate the impact of neural network types and adopted discourse factors on modelling documents vis-à-vis the two classification tasks. Furthermore, we investigate the effects of various learning methods and input units on the quality of the proposed discourse-aware document model.
28

Shpinev, Yury. "Classification of Investments by Terms: Short-Term and Long-Term." Gaps in Russian Legislation 14, no. 4 (2021): 229–35. http://dx.doi.org/10.33693/2072-3164-2021-14-4-229-235.

Abstract:
Currently, there are many options for classifying investments in the scientific community, and almost all authors classify by term. At the same time, scientists do not share a single approach to classification by time attribute. In addition, the proposed classification options (short-term, long-term, medium-term), as well as the terms of the investments themselves, are usually not justified anywhere and are presented as a given. According to the author, such an arbitrary and unjustified classification does not meet the requirements of scientific classification. Furthermore, the differing approaches to classifying investments by term in investment textbooks do not contribute to unified collection, analysis, accounting and reporting in the field of investment, or to the unification of financial documents in accordance with international standards; on the contrary, they contribute to ambiguity in law-enforcement practice. Based on the classification options proposed by the scientific community and an analysis of regulatory acts and international standards in finance and accounting, the author concludes that it is advisable to use a single classification by investment period into short-term (up to one year) and long-term (over one year), and to fix such a classification in a regulatory document.
29

Nakajima, Hiromu, and Minoru Sasaki. "Text Classification Based on the Heterogeneous Graph Considering the Relationships between Documents." Big Data and Cognitive Computing 7, no. 4 (2023): 181. http://dx.doi.org/10.3390/bdcc7040181.

Abstract:
Text classification is the task of estimating the genre of a document based on information such as word co-occurrence and frequency of occurrence, and it has been studied through various approaches. In this study, we focus on text classification using graph-structured data. Conventional graph-based methods express relationships between words, and between words and documents, as weights between nodes, and then use a graph neural network for learning. However, conventional methods cannot represent relationships between documents on the graph. In this paper, we propose a graph structure that considers the relationships between documents. In the proposed method, the cosine similarity of document vectors is set as the weight between document nodes, completing a graph that considers document-to-document relationships. The graph is then input into a graph convolutional neural network for training. The aim of this study is therefore to improve the text classification performance of conventional methods by using this graph. We conducted evaluation experiments using five different corpora of English documents. The results show that the proposed method outperformed the conventional method by up to 1.19%, indicating that using relationships between documents is effective. In addition, the proposed method was shown to be particularly effective in classifying long documents.
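The document-document edges the abstract describes can be sketched directly: weight pairs of document nodes by the cosine similarity of their vectors. The TF-IDF vectors and the similarity cutoff below are assumed for illustration:

```python
# Hedged sketch: build weighted document-document edges from cosine similarity
# of document vectors, for use alongside word-word / word-document edges.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["stock markets fell sharply today",
        "markets rallied after the stock report",
        "the team won the championship match"]

doc_vecs = TfidfVectorizer().fit_transform(docs)
sim = cosine_similarity(doc_vecs)
np.fill_diagonal(sim, 0.0)                 # no self-loops
edges = [(i, j, sim[i, j]) for i in range(len(docs))
         for j in range(i + 1, len(docs)) if sim[i, j] > 0.1]
print(edges)  # weighted document-document edges for the heterogeneous graph
```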
30

Antognini, Diego, Claudiu Musat, and Boi Faltings. "Multi-Dimensional Explanation of Target Variables from Documents." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 14 (2021): 12507–15. http://dx.doi.org/10.1609/aaai.v35i14.17483.

Abstract:
Automated predictions require explanations to be interpretable by humans. Past work used attention and rationale mechanisms to find words that predict the target variable of a document. Often, though, they result in a tradeoff between noisy explanations and a drop in accuracy. Furthermore, rationale methods cannot capture the multi-faceted nature of justifications for multiple targets, because of the non-probabilistic nature of the mask. In this paper, we propose the Multi-Target Masker (MTM) to address these shortcomings. The novelty lies in the soft multi-dimensional mask that models a relevance probability distribution over the set of target variables to handle ambiguities. Additionally, two regularizers guide MTM to induce long, meaningful explanations. We evaluate MTM on two datasets and show, using standard metrics and human annotations, that the resulting masks are more accurate and coherent than those generated by the state-of-the-art methods. Moreover, MTM is the first to also achieve the highest F1 scores for all the target variables simultaneously.
31

Vu, Huy Hien, Hidetaka Kamigaito, and Taro Watanabe. "Context-Aware Machine Translation with Source Coreference Explanation." Transactions of the Association for Computational Linguistics 12 (2024): 856–74. http://dx.doi.org/10.1162/tacl_a_00677.

Abstract:
Despite significant improvements in enhancing the quality of translation, context-aware machine translation (MT) models underperform in many cases. One of the main reasons is that they fail to utilize the correct features from context when the context is too long or their models are overly complex. This can lead to the explain-away effect, wherein the models only consider features that more easily explain predictions, resulting in inaccurate translations. To address this issue, we propose a model that explains the decisions made for translation by predicting coreference features in the input. We construct a model for input coreference by exploiting contextual features from both the input and translation output representations on top of an existing MT model. We evaluate and analyze our method on the WMT document-level translation task with the English-German dataset, the English-Russian dataset, and the multilingual TED talk dataset, demonstrating an improvement of over 1.0 BLEU score when compared with other context-aware models.
32

Lv, Shaoqing, Jungang Dong, Chichi Wang, Xuanhong Wang, and Zhiqiang Bao. "RB-GAT: A Text Classification Model Based on RoBERTa-BiGRU with Graph ATtention Network." Sensors 24, no. 11 (2024): 3365. http://dx.doi.org/10.3390/s24113365.

Abstract:
With the development of deep learning, several graph neural network (GNN)-based approaches have been utilized for text classification. However, GNNs encounter challenges when capturing contextual text information within a document sequence. To address this, a novel text classification model, RB-GAT, is proposed by combining RoBERTa-BiGRU embedding and a multi-head Graph ATtention Network (GAT). First, the pre-trained RoBERTa model is exploited to learn word and text embeddings in different contexts. Second, the Bidirectional Gated Recurrent Unit (BiGRU) is employed to capture long-term dependencies and bidirectional sentence information from the text context. Next, the multi-head graph attention network is applied to analyze this information, which serves as a node feature for the document. Finally, the classification results are generated through a Softmax layer. Experimental results on five benchmark datasets demonstrate that our method achieves accuracies of 71.48%, 98.45%, 80.32%, 90.84%, and 95.67% on Ohsumed, R8, MR, 20NG and R52, respectively, superior to nine existing text classification approaches.
33

Abid, Azal Minshed. "Arabic text classification using deep feature and bidirectional long-short-term memory." Global Journal of Engineering and Technology Advances 13, no. 1 (2022): 98–107. http://dx.doi.org/10.30574/gjeta.2022.13.1.0179.

Abstract:
Due to the increased demand for automatic document organization, text classification is essential on both academic and commercial platforms. The aim of text classification is to automatically group text documents into one or more predefined categories, which helps to solve a variety of challenges, many of them related to data management. In this paper, we propose a new model for Arabic text classification. The model consists of two main phases. The first phase extracts three sets of features: statistical features, Latent Semantic Analysis (LSA) features, and a combination of both. The second phase introduces these features separately to a Bidirectional Long Short-Term Memory (BiLSTM) network for classification. The performance of the proposed model was evaluated using the CNN Arabic corpus. The experimental results showed solid performance, especially for the combined features, where the averages of precision, recall, and F-measure reached 94, 91, and 91.94, respectively.
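The feature-extraction phase described here (statistical, LSA, and combined features) can be sketched as follows. The toy documents and dimensionalities are assumed; in the paper these feature vectors are then fed to a BiLSTM classifier:

```python
# Hedged sketch of the three feature sets: statistical TF-IDF features, LSA
# features via truncated SVD, and their combination. Illustrative only.
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["economic news about markets", "football match results",
        "markets and trading report", "match highlights and scores"]

stat_features = TfidfVectorizer().fit_transform(docs)                     # statistical
lsa_features = TruncatedSVD(n_components=2).fit_transform(stat_features)  # LSA
combined = np.hstack([stat_features.toarray(), lsa_features])             # combination
print(stat_features.shape, lsa_features.shape, combined.shape)
```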
34

Apandi, Siti Hawa, Jamaludin Sallim, Rozlina Mohamed, and Norkhairi Ahmad. "Automatic Topic-Based Web Page Classification Using Deep Learning." JOIV: International Journal on Informatics Visualization 7, no. 3-2 (2023): 2108. http://dx.doi.org/10.30630/joiv.7.3-2.1616.

Abstract:
People frequently surf the internet using smartphones, laptops, or computers to search for information online. The growth of information on the web has made the number of web pages grow day by day. Automatic topic-based web page classification manages this excessive number of web pages by classifying them into different categories based on their content. Different machine learning algorithms have been employed as web page classifiers to categorise web pages. However, there is a lack of studies reviewing the classification of web pages using deep learning. In this study, the automatic topic-based classification of web pages utilising deep learning, as proposed by many key researchers, is reviewed. The relevant research papers were selected from reputable research databases. The review process examined the dataset, features, algorithm, and pre-processing used in web page classification, the document representation technique, and the performance of the classification model. The document representation technique used to represent web page features is an important aspect of web page classification, as it affects the performance of the classification model. The integral web page feature is the textual content. Based on the review, image-based web page classification showed higher performance than text-based web page classification. Due to the lack of a matrix representation that can effectively handle long web page text content, a new document representation technique, the word cloud image, can be used to visualize the words extracted from a web page's text content.
APA, Harvard, Vancouver, ISO, and other styles
38

Arief, Rifiana, Suryarini Widodo, Ary Bima Kurniawan, Hustinawaty Hustinawaty, and Faisal Arkan. "Advanced content-based retrieval for digital correspondence documents with ontology classification." Bulletin of Electrical Engineering and Informatics 11, no. 3 (2022): 1665–77. http://dx.doi.org/10.11591/eei.v11i3.3376.

Full text
Abstract:
The growth of digital correspondence documents with various types and different naming rules, together with the lack of a sufficient search system, complicates searching for specific content; when documents are unclassified, the search becomes inaccurate and time-consuming. This research proposes an archiving method with automatic hierarchical classification and a content-based search method that displays ontology classification information as a solution to these content-based search problems. The method consists of preprocessing (creation of an automatic hierarchical classification model using a combination of a convolutional neural network (CNN) and regular expressions), archiving (document archiving with automatic classification), and retrieval (content-based search displaying ontology relationships from the document classification). Archiving 100 documents with the automatic hierarchical classification was 79% accurate, as indicated by 99% accuracy for the CNN and 80% for the regular expressions. Moreover, the search results for classified content-based documents through the display of ontology relationships were 100% accurate. This research improved the quality of search results for digital correspondence documents, as indicated by higher specificity, accuracy, and speed compared to conventional methods based on file names, annotations, and unclassified content.
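A rough sketch of how a regular-expression stage can sit alongside a learned classifier is given below; the pattern, the tiny character-level CNN (untrained here), and the four classes are illustrative assumptions, not the paper's actual architecture.

```python
# Sketch: rule-based (regex) stage plus a learned (CNN) stage.
import re
import numpy as np
from tensorflow.keras import layers, models

def regex_stage(text):
    """Pull out a hypothetical letter/number document code if present."""
    m = re.search(r"[A-Z]{2,5}/\d{4}/\d{1,4}", text)
    return m.group(0) if m else None

vocab, max_len = 128, 200
model = models.Sequential([
    layers.Embedding(vocab, 16),
    layers.Conv1D(32, 5, activation="relu"),
    layers.GlobalMaxPooling1D(),
    layers.Dense(4, activation="softmax"),   # 4 hypothetical categories
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

doc = "INV/2022/017 Invoice for office supplies"
code = regex_stage(doc)                      # regex sub-classification
chars = [min(ord(c), vocab - 1) for c in doc[:max_len].ljust(max_len)]
probs = model.predict(np.array([chars]), verbose=0)  # CNN class scores
```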
APA, Harvard, Vancouver, ISO, and other styles
39

Rifiana, Arief, Widodo Suryarini, Bima Kurniawan Ary, Hustinawaty, and Arkan Faisal. "Advanced content-based retrieval for digital correspondence documents with ontology classification." Bulletin of Electrical Engineering and Informatics 11, no. 3 (2022): 1665–77. https://doi.org/10.11591/eei.v11i3.3376.

Full text
Abstract:
The growth of digital correspondence documents with various types and different naming rules, together with the lack of a sufficient search system, complicates searching for specific content; when documents are unclassified, the search becomes inaccurate and time-consuming. This research proposes an archiving method with automatic hierarchical classification and a content-based search method that displays ontology classification information as a solution to these content-based search problems. The method consists of preprocessing (creation of an automatic hierarchical classification model using a combination of a convolutional neural network (CNN) and regular expressions), archiving (document archiving with automatic classification), and retrieval (content-based search displaying ontology relationships from the document classification). Archiving 100 documents with the automatic hierarchical classification was 79% accurate, as indicated by 99% accuracy for the CNN and 80% for the regular expressions. Moreover, the search results for classified content-based documents through the display of ontology relationships were 100% accurate. This research improved the quality of search results for digital correspondence documents, as indicated by higher specificity, accuracy, and speed compared to conventional methods based on file names, annotations, and unclassified content.
APA, Harvard, Vancouver, ISO, and other styles
40

Dickins, Benjamin James Alexander, David William Dickins, and Thomas Edmund Dickins. "Is this conjectural phenotypic dichotomy a plausible outcome of genomic imprinting?" Behavioral and Brain Sciences 31, no. 3 (2008): 267–68. http://dx.doi.org/10.1017/s0140525x08004287.

Full text
Abstract:
What is the status of the dichotomy proposed and the nosological validity of the contrasting pathologies described in the target article? How plausibly can dysregulated imprinting explain the array of features described, compared with other genetic models? We believe that considering alternative models is more likely to lead in the long term to the correct classification and explanation of the component behaviours.
APA, Harvard, Vancouver, ISO, and other styles
41

Doddy Setiawan, Taufiq Arifin, Y Anni Aryani, and Josephine Tan-Hwang Yau. "How Has the Indonesian Stock Market Performed During Covid-19 Outbreaks?" International Journal of Business and Society 22, no. 3 (2021): 1420–28. http://dx.doi.org/10.33736/ijbs.4312.2021.

Full text
Abstract:
This paper analyzes the stock market reaction to the Covid-19 pandemic using a sample of Indonesian listed firms. In general, we document significant negative cumulative abnormal returns around the Indonesian President's announcement of the first Covid-19 case in Indonesia. This effect persists, although weaker, ten days after the announcement. However, we find only a short-term effect on the finance industry. While the explanation is still unclear, investors may anticipate that the economic impact on the finance industry will arise in the long run.
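For readers unfamiliar with the event-study arithmetic, the sketch below shows how cumulative abnormal returns are computed under a market model; the toy return series, window lengths, and coefficients are assumptions for illustration only.

```python
# Sketch: abnormal return = actual - (alpha + beta * market); CAR = cumsum.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
market = pd.Series(rng.normal(0.0, 0.01, 20))              # market returns
stock = 0.001 + 1.2 * market + rng.normal(0.0, 0.005, 20)  # firm returns

# Estimate alpha and beta on a pre-event window (first 10 days here).
beta, alpha = np.polyfit(market[:10], stock[:10], 1)
expected = alpha + beta * market[10:]   # market-model expected returns
abnormal = stock[10:] - expected        # abnormal returns (AR)
car = abnormal.cumsum()                 # cumulative abnormal returns (CAR)
print(round(car.iloc[-1], 4))
```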
APA, Harvard, Vancouver, ISO, and other styles
42

Alfarizi, Muhammad Ibnu, Lailis Syafaah, and Merinda Lestandy. "Emotional Text Classification Using TF-IDF (Term Frequency-Inverse Document Frequency) And LSTM (Long Short-Term Memory)." JUITA : Jurnal Informatika 10, no. 2 (2022): 225. http://dx.doi.org/10.30595/juita.v10i2.13262.

Full text
Abstract:
Humans can express their feelings in communication either verbally or non-verbally. Verbal communication can be oral or written. A person's feelings or emotions can usually be seen in their behavior, tone of voice, and expression, but not everyone can detect emotion through writing alone, whether in words, sentences, or paragraphs. Therefore, a classification system is needed to help determine the emotions contained in a piece of writing. The novelty of this study is that it develops previous research using a similar method, LSTM, but improves the word weighting process by applying the TF-IDF method before LSTM classification; the approach falls within Natural Language Processing (NLP). The purpose of this study was to compare an LSTM (Long Short-Term Memory) model with TF-IDF (Term Frequency-Inverse Document Frequency) word weighting against a LinearSVC model, and to increase accuracy in determining the emotions (sadness, anger, fear, love, joy, and surprise) contained in text. The dataset consists of 18,000 samples, divided into 16,000 training and 2,000 test samples across the six emotion classes. The emotion classification accuracy using the LSTM method reached 97.50%, while the LinearSVC method yielded an accuracy of 89%.
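A minimal version of the TF-IDF + LinearSVC baseline can be put together in a few lines of scikit-learn; the toy texts below are assumptions, while the emotion labels follow the paper.

```python
# Sketch: TF-IDF word weighting feeding a LinearSVC emotion classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = ["i feel so alone tonight", "this is wonderful news",
         "that noise terrified me", "stop lying to me"]
labels = ["sadness", "joy", "fear", "anger"]  # 4 of the paper's 6 classes

clf = make_pipeline(TfidfVectorizer(), LinearSVC())
clf.fit(texts, labels)
print(clf.predict(["what a lovely surprise"]))
```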
APA, Harvard, Vancouver, ISO, and other styles
43

Jasmir, Jasmir, Willy Riyadi, Silvia Rianti Agustini, Yulia Arvita, Despita Meisak, and Lies Aryani. "Bidirectional Long Short-Term Memory and Word Embedding Feature for Improvement Classification of Cancer Clinical Trial Document." Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) 6, no. 4 (2022): 505–10. http://dx.doi.org/10.29207/resti.v6i4.4005.

Full text
Abstract:
In recent years, the application of deep learning methods has become increasingly popular, especially for big data, because big data has a very large size and needs to be predicted accurately. One such source of big data is the text of cancer clinical trial documents. Clinical trials are studies with human participation that help protect people's safety and health. The aim of this paper is to classify cancer clinical trial texts from a public dataset. The proposed algorithms are Bidirectional Long Short-Term Memory (BiLSTM) and Word Embedding features (WE). This study contributes a new classification model for clinical trial documents and improves classification performance. Two experiments are conducted: BiLSTM without WE and BiLSTM with WE. The experimental results for BiLSTM without WE were accuracy = 86.2, precision = 85.5, recall = 87.3, and F-1 score = 86.4, while BiLSTM with WE showed outstanding performance in text classification, especially on clinical trial texts, with accuracy = 92.3, precision = 92.2, recall = 92.9, and F-1 score = 92.5.
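The BiLSTM-plus-word-embeddings setup can be sketched as follows; vocabulary size, embedding dimension, and the random toy batch are assumptions, not values from the paper.

```python
# Sketch: learned word embeddings (WE) feeding a bidirectional LSTM.
import numpy as np
from tensorflow.keras import layers, models

vocab_size, seq_len = 5000, 100
x = np.random.randint(1, vocab_size, size=(8, seq_len))  # toy token ids
y = np.random.randint(0, 2, size=(8,))                   # binary labels

model = models.Sequential([
    layers.Embedding(vocab_size, 64),        # word embedding layer (WE)
    layers.Bidirectional(layers.LSTM(32)),   # BiLSTM over token sequence
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x, y, epochs=1, verbose=0)
```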
APA, Harvard, Vancouver, ISO, and other styles
44

Vaikhanskaya, T. G., T. V. Kurushko, Yu A. Persianskikh, and L. N. Sivitskaya. "Atrial cardiomyopathy — a new concept with a long history." Russian Journal of Cardiology 25, no. 11 (2020): 3942. http://dx.doi.org/10.15829/29/1560-4071-2020-3942.

Full text
Abstract:
Atrial cardiomyopathy (ACM) is a relatively common but clinically underestimated disorder characterized by increased atrial size and dysfunction. Previously, ACM was considered a primary disorder, but in 2016 this concept was revised by the European Heart Rhythm Association (EHRA) working group to include secondary atrial remodeling. The EHRA document details aspects of atrial anatomy and pathophysiology, proposes definitions of ACM and a histological classification, and outlines the molecular mechanisms of atrial arrhythmia, the problems of personalized treatment, and the optimization of indications for catheter ablation. This article presents the practical application of the proposed ACM classification system, the clinical significance of the new ACM concept, and the potential role of this information for practitioners. Two clinical cases of ACM are described: one with "primary" atrial remodeling (a familial form of ACM due to an NPPA gene mutation, with a primary defect in atrial structure and function) and one with "secondary" atrial remodeling (ACM caused by long-term supraventricular tachyarrhythmias due to an SCN1B gene mutation).
APA, Harvard, Vancouver, ISO, and other styles
45

Tukino, Paryono, Sediyono Eko, Hendry Hendry, Huda Baenil, Lia Hananto April, and Yuniar Rahman Aviv. "Intelligent classification and performance prediction of multi text assessment with recurrent neural networks-long short-term memory." IAES International Journal of Artificial Intelligence (IJ-AI) 13, no. 3 (2024): 3350–63. https://doi.org/10.11591/ijai.v13.i3.pp3350-3363.

Full text
Abstract:
The assessment document produced during study program accreditation shows performance achievements that will affect the development of the study program in the future. The descriptions in the assessment document contain unstructured data, making it difficult to identify target indicators. In addition, the number of Indonesian-language assessment documents is quite large, and there has been no research on these documents. Therefore, this research aims to classify and predict target indicator categories into 4 categories: deficient, enough, good, and very. Testing the Indonesian-language assessment sentence classification model built with recurrent neural networks-long short-term memory (RNN-LSTM), using 5 layers and 3 parameters, produces an accuracy of 94.24% and a loss of 10%. In the optimizer evaluation, Adamax had the highest accuracy at 79%, followed by stochastic gradient descent (SGD) at 78%, while Adam, Adadelta, and root mean squared propagation (RMSProp) each reached 77%.
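The optimizer comparison reported above can be reproduced in outline as below; the toy data and the small single-LSTM network are assumptions standing in for the paper's 5-layer model.

```python
# Sketch: training the same LSTM classifier under different optimizers.
import numpy as np
from tensorflow.keras import layers, models

x = np.random.randint(0, 2000, size=(16, 50))  # toy token ids
y = np.random.randint(0, 4, size=(16,))        # 4 target-indicator classes

for opt in ["adamax", "sgd", "adam", "adadelta", "rmsprop"]:
    model = models.Sequential([
        layers.Embedding(2000, 32),
        layers.LSTM(32),
        layers.Dense(4, activation="softmax"),
    ])
    model.compile(optimizer=opt, loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    history = model.fit(x, y, epochs=1, verbose=0)
    print(opt, round(history.history["accuracy"][-1], 3))
```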
APA, Harvard, Vancouver, ISO, and other styles
46

Alotaibi, Naif D., Hadi Jahanshahi, Qijia Yao, Jun Mou, and Stelios Bekiros. "An Ensemble of Long Short-Term Memory Networks with an Attention Mechanism for Upper Limb Electromyography Signal Classification." Mathematics 11, no. 18 (2023): 4004. http://dx.doi.org/10.3390/math11184004.

Full text
Abstract:
Advancing cutting-edge techniques to accurately classify electromyography (EMG) signals is of paramount importance given their extensive implications and uses. While recent studies in the literature present promising findings, significant potential still exists for substantial enhancement. Motivated by this need, our current paper introduces a novel ensemble neural network approach for time series classification, specifically focusing on the classification of upper limb EMG signals. Our proposed technique integrates long short-term memory (LSTM) networks and attention mechanisms, leveraging their capabilities to achieve accurate classification. We provide a thorough explanation of the architecture and methodology, considering the unique characteristics and challenges posed by EMG signals. Furthermore, we outline the preprocessing steps employed to transform raw EMG signals into a suitable format for classification. To evaluate the effectiveness of our proposed technique, we compare its performance with a baseline LSTM classifier. The obtained numerical results demonstrate the superiority of our method. Remarkably, the method we propose attains an average accuracy of 91.5%, with all motion classifications surpassing the 90% threshold.
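One ensemble member of the kind described can be sketched as an LSTM whose per-timestep outputs pass through a self-attention layer; window length, channel count, and pooling are assumptions of this illustration.

```python
# Sketch: LSTM + attention over EMG windows; an ensemble would average
# the softmax outputs of several such models.
import numpy as np
from tensorflow.keras import layers, models

x = np.random.randn(8, 200, 4).astype("float32")  # 8 windows, 4 EMG channels
y = np.random.randint(0, 5, size=(8,))            # 5 hypothetical motions

inp = layers.Input(shape=(200, 4))
h = layers.LSTM(32, return_sequences=True)(inp)   # per-timestep features
att = layers.Attention()([h, h])                  # self-attention over time
pooled = layers.GlobalAveragePooling1D()(att)
out = layers.Dense(5, activation="softmax")(pooled)

model = models.Model(inp, out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(x, y, epochs=1, verbose=0)
```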
APA, Harvard, Vancouver, ISO, and other styles
47

Abdul Haseeb, Mohammed Aqeel, Deepak Patil, and Mahima V. "Grammatical Error Checker for Text and Speech using NLP and Speech Recognition in LLMs." JOURNAL OF INTELLIGENT SYSTEMS AND COMPUTING 5, no. 2 (2024): 19–24. https://doi.org/10.51682/jiscom.v5i2.47.

Full text
Abstract:
With the rapid growth of Artificial Intelligence (AI) and Machine Learning (ML), the ways in which we perform many of our day-to-day tasks have changed. Documentation has become simpler thanks to new-age tools that leverage AI and ML to automate tasks such as finding grammatical errors in sentences and spelling errors in words within a text document. Speech recognition is being used to automate tasks and reduce dependency on humans. Natural Language Processing (NLP) is used by Large Language Models (LLMs) to process inputs in the form of text or speech for a multitude of tasks, such as text classification and language translation. This paper begins with an explanation of NLP and later provides an overview of speech recognition.
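The text-checking half of such a pipeline can be approximated with an off-the-shelf rule-based checker; the snippet below is a sketch under the assumption that the `language_tool_python` package is acceptable as a stand-in for the paper's LLM-based checker.

```python
# Sketch: detect and correct grammar errors in a sentence (stand-in tool).
import language_tool_python

tool = language_tool_python.LanguageTool("en-US")
text = "She go to school every days."
matches = tool.check(text)        # detected grammar/spelling issues
for m in matches:
    print(m.ruleId, "-", m.message)
print(language_tool_python.utils.correct(text, matches))
```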
APA, Harvard, Vancouver, ISO, and other styles
48

Paryono, Tukino, Eko Sediyono, Hendry Hendry, Baenil Huda, April Lia Hananto, and Aviv Yuniar Rahman. "Intelligent classification and performance prediction of multi-text assessment with recurrent neural networks-long short-term memory." IAES International Journal of Artificial Intelligence (IJ-AI) 13, no. 3 (2024): 3350. http://dx.doi.org/10.11591/ijai.v13.i3.pp3350-3363.

Full text
Abstract:
The assessment document produced during study program accreditation shows performance achievements that will affect the development of the study program in the future. The descriptions in the assessment document contain unstructured data, making it difficult to identify target indicators. In addition, the number of Indonesian-language assessment documents is quite large, and there has been no research on these documents. Therefore, this research aims to classify and predict target indicator categories into 4 categories: deficient, enough, good, and very. Testing the Indonesian-language assessment sentence classification model built with recurrent neural networks-long short-term memory (RNN-LSTM), using 5 layers and 3 parameters, produces an accuracy of 94.24% and a loss of 10%. In the optimizer evaluation, Adamax had the highest accuracy at 79%, followed by stochastic gradient descent (SGD) at 78%, while Adam, Adadelta, and root mean squared propagation (RMSProp) each reached 77%.
APA, Harvard, Vancouver, ISO, and other styles
49

Nuser, Maryam, and Enas Al-Horani. "Medical documents classification using topic modeling." Indonesian Journal of Electrical Engineering and Computer Science 17, no. 3 (2020): 1524. http://dx.doi.org/10.11591/ijeecs.v17.i3.pp1524-1530.

Full text
Abstract:
The number of digital medical documents is increasing continuously, and many medical websites share unclassified articles. These articles have very long texts that must be read to determine each document's topic. Classifying these documents is important so that researchers can use them easily and spend less time and effort reading and searching for a specific topic. Therefore, an automatic way to extract latent topics from these text documents is needed, and topic modeling is one technique for addressing this problem. In this paper, a collection of medical documents covering three types of widespread diseases (Heart Diseases, Blood Pressure, and Cholesterol) is used. The LDA topic modeling technique is applied to classify these documents into the aforementioned topics. An evaluation of the algorithm's results shows that LDA achieves a good level of classification accuracy.
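In outline, the LDA step looks like the following scikit-learn sketch; the three-topic setting mirrors the paper's disease categories, while the toy documents are assumptions.

```python
# Sketch: LDA topic extraction over a toy medical corpus.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["heart attack cardiac artery chest pain",
        "blood pressure hypertension systolic reading",
        "cholesterol ldl hdl statin diet"]

vec = CountVectorizer()
counts = vec.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=3, random_state=0).fit(counts)

terms = vec.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[-3:]]
    print(f"topic {k}:", top)
```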
APA, Harvard, Vancouver, ISO, and other styles
50

Luo, Siyin, Youjian Gu, Xingxing Yao, and Wei Fan. "Research on Text Sentiment Analysis Based on Neural Network and Ensemble Learning." Revue d'Intelligence Artificielle 35, no. 1 (2021): 63–70. http://dx.doi.org/10.18280/ria.350107.

Full text
Abstract:
Because a single sentiment classification model may be unstable, this paper proposes a sentiment analysis method that combines neural networks and ensemble learning. After data preprocessing such as word segmentation, a document vectorization method is used for feature extraction. Four base classifiers are then trained: a long short-term memory network, a convolutional neural network, a serial model combining the convolutional neural network and the long short-term memory network, and a support vector machine. Finally, their outputs are integrated by stacking ensemble learning. The experimental results show that the integrated model significantly improves the accuracy of text sentiment analysis and can effectively predict the sentiment polarity of text.
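The stacking step itself can be shown compactly with scikit-learn; here simple base learners stand in for the paper's LSTM and CNN models so the sketch stays self-contained.

```python
# Sketch: stacking ensemble with a meta-learner over base-model outputs.
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=20, random_state=0)

stack = StackingClassifier(
    estimators=[("svm", SVC(probability=True)),
                ("mlp", MLPClassifier(max_iter=500, random_state=0))],
    final_estimator=LogisticRegression(),  # meta-learner
)
stack.fit(X, y)
print(round(stack.score(X, y), 3))
```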
APA, Harvard, Vancouver, ISO, and other styles
