Academic literature on the topic 'Long Document Classification and Explanation'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Long Document Classification and Explanation.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Press it, and we will automatically generate the bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "Long Document Classification and Explanation"

1

Beckh, Katharina, Joann Rachel Jacob, Adrian Seeliger, Stefan Rüping, and Najmeh Mousavi Nejad. "Limitations of Feature Attribution in Long Text Classification of Standards." Proceedings of the AAAI Symposium Series 4, no. 1 (2024): 10–17. http://dx.doi.org/10.1609/aaaiss.v4i1.31765.

Abstract:
Managing complex AI systems requires insight into a model's decision-making processes. Understanding how these systems arrive at their conclusions is essential for ensuring reliability. In the field of explainable natural language processing, many approaches have been developed and evaluated. However, experimental analysis of explainability for text classification has been largely constrained to short text and binary classification. In this applied work, we study explainability for a real-world task where the goal is to assess the technological suitability of standards. This prototypical use case is characterized by large documents, technical language, and a multi-label setting, making it a complex modeling challenge. We provide an analysis of approx. 1000 documents with human-annotated evidence. We then present experimental results with two explanation methods evaluating plausibility and runtime of explanations. We find that the average runtime for explanation generation is at least 5 minutes and that the model explanations do not overlap with the ground truth. These findings reveal limitations of current explanation methods. In a detailed discussion, we identify possible reasons and how to address them on three different dimensions: task, model and explanation method. We conclude with risks and recommendations for the use of feature attribution methods in similar settings.
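To make the plausibility evaluation mentioned above concrete, the sketch below (an illustration only, not the authors' implementation; the function names, the top-k cut-off, and the toy data are assumptions) scores an attribution method by how much of the human-annotated evidence its top-ranked tokens recover.

```python
from typing import List, Set

def top_k_tokens(attributions: List[float], tokens: List[str], k: int = 20) -> Set[int]:
    """Return the indices of the k tokens with the highest attribution scores."""
    ranked = sorted(range(len(tokens)), key=lambda i: attributions[i], reverse=True)
    return set(ranked[:k])

def plausibility_overlap(attributions: List[float], tokens: List[str],
                         evidence_indices: Set[int], k: int = 20) -> float:
    """Fraction of human-annotated evidence tokens recovered by the top-k attributed tokens."""
    if not evidence_indices:
        return 0.0
    top_k = top_k_tokens(attributions, tokens, k)
    return len(top_k & evidence_indices) / len(evidence_indices)

# Toy example: three annotated evidence tokens, two of which land in the top-3 attribution set.
tokens = ["the", "standard", "defines", "encryption", "for", "smart", "meters"]
attributions = [0.01, 0.40, 0.05, 0.90, 0.02, 0.70, 0.30]
evidence = {1, 3, 6}  # "standard", "encryption", "meters"
print(plausibility_overlap(attributions, tokens, evidence, k=3))  # ~0.667
```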
2

Sitompul, Anita, Kammer Tuahman Sipayung, and Jubil Sihite. "The Analysis of Reading Exercise in English Textbook Entitled Pathway to English for The SENIOR High School Grade X." Jurnal Suluh Pendidikan 7, no. 1 (2019): 10–13. http://dx.doi.org/10.36655/jsp.v7i1.111.

Abstract:
This study aims to analyze the types of reading exercises in the English textbook used by first-year students of SMA SWASTA METHODIST 7 Medan. The design of the study is descriptive qualitative research. The qualitative data were obtained through the steps described in the research procedure, i.e. reading, identification, classification, and simplification. The researcher analyzed the exercises in three steps: identifying the topic, clustering the topic, and drawing an explanation. The object of the study is the set of reading exercises created by Th. M. Sudarwati and Eudia Grace, entitled Pathway To English and published by Erlangga in 2017. The data were collected by documentary analysis alone, meaning that the writer documented the reading exercises in the students' English textbook and analyzed their types. The final result shows that there are five types of reading exercises in the textbook: matching tests, true/false reading tests, multiple-choice item tests, completion item tests, and long- and short-answer questions. The analysis further distinguishes controlled exercises from guided exercises. The researcher found no controlled exercises in the Pathway to English textbook. Among guided exercises there are 2 multiple-choice vocabulary exercises, 3 cued-word matching exercises, 1 picture-cued sentence matching exercise, 5 vocabulary matching exercises, 7 selected-response fill-in vocabulary matching exercises, 3 true-or-false exercises, 1 completion item following the text, 9 completion items within the text itself, 6 short-answer questions, and 8 long-answer questions. The researcher found no multiple-choice contextualized vocabulary/grammar or multiple-choice vocabulary/grammar exercises.
3

Pfau-Effinger, Birgit, and Marcel Sebastian. "Institutional persistence despite cultural change: a historical case study of the re-categorization of dogs in Germany." Agriculture and Human Values 39, no. 1 (2021): 473–85. http://dx.doi.org/10.1007/s10460-021-10272-4.

Abstract:
Human–animal relations in post-industrial societies are characterized by a system of cultural categories that distinguishes between different types of animals based on their function in human society, such as “farm animals” or “pets.” The system of cultural categories, and the allocation of animal species within this cultural classification system can change. Options for change include re-categorizing a specific animal species within the categorical system. The paper argues that attempts by political actors to adapt the institutional system to cultural change that calls for re-categorization of certain animal species can start a contradictory process that may lead to long-term survival of the respective institution despite the cultural change. It is common to explain the persistence of political institutions with institutional path dependency or policy preferences of the governing parties. This paper introduces a new institutional theoretical approach to the explanation, the approach of “rejecting changing a part for fear of undermining the whole.” This paper uses a case study of a series of failed political efforts to change the treatment of dogs in the framework of the agricultural human–animal policy in the Federal Republic of Germany in the second half of the twentieth century, to evaluate its theoretical argument, using analyses of historical political documents, mass media, and communication documents between civil society actors and policymakers. This paper makes an innovative contribution to the theory and research on institutional change, the sociology of agriculture and food, and the sociology of human–animal relations.
4

Shi, Tian, Xuchao Zhang, Ping Wang, and Chandan K. Reddy. "Corpus-level and Concept-based Explanations for Interpretable Document Classification." ACM Transactions on Knowledge Discovery from Data 16, no. 3 (2022): 1–17. http://dx.doi.org/10.1145/3477539.

Abstract:
Using attention weights to identify information that is important for models’ decision making is a popular approach to interpret attention-based neural networks. This is commonly realized in practice through the generation of a heat-map for every single document based on attention weights. However, this interpretation method is fragile and it is easy to find contradictory examples. In this article, we propose a corpus-level explanation approach, which aims at capturing causal relationships between keywords and model predictions via learning the importance of keywords for predicted labels across a training corpus based on attention weights. Based on this idea, we further propose a concept-based explanation method that can automatically learn higher level concepts and their importance to model prediction tasks. Our concept-based explanation method is built upon a novel Abstraction-Aggregation Network (AAN), which can automatically cluster important keywords during an end-to-end training process. We apply these methods to the document classification task and show that they are powerful in extracting semantically meaningful keywords and concepts. Our consistency analysis results based on an attention-based Naïve Bayes classifier (NBC) also demonstrate that these keywords and concepts are important for model predictions.
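The corpus-level aggregation idea can be illustrated with a minimal sketch (the paper's Abstraction-Aggregation Network is not reproduced here; the data structures and toy corpus are assumptions): attention mass is accumulated per token and per predicted label across documents.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def corpus_level_keyword_importance(
    examples: List[Tuple[List[str], List[float], str]]
) -> Dict[str, Dict[str, float]]:
    """Accumulate per-token attention mass for each predicted label across a corpus.

    Each example is (tokens, attention_weights, predicted_label).
    """
    importance: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for tokens, weights, label in examples:
        total = sum(weights) or 1.0
        for token, weight in zip(tokens, weights):
            importance[label][token] += weight / total  # normalize per document
    return importance

corpus = [
    (["refund", "was", "denied"], [0.7, 0.1, 0.2], "complaint"),
    (["please", "refund", "me"], [0.1, 0.8, 0.1], "complaint"),
    (["great", "service"], [0.6, 0.4], "praise"),
]
scores = corpus_level_keyword_importance(corpus)
print(max(scores["complaint"], key=scores["complaint"].get))  # "refund"
```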
5

Uddin, Farid, Yibo Chen, Zuping Zhang, and Xin Huang. "Corpus Statistics Empowered Document Classification." Electronics 11, no. 14 (2022): 2168. http://dx.doi.org/10.3390/electronics11142168.

Abstract:
In natural language processing (NLP), document classification is an important task that relies on the proper thematic representation of the documents. Gaussian mixture-based clustering is widespread for capturing rich thematic semantics but ignores emphasizing potential terms in the corpus. Moreover, the soft clustering approach causes long-tail noise by putting every word into every cluster, which affects the natural thematic representation of documents and their proper classification. It is more challenging to capture semantic insights when dealing with short-length documents where word co-occurrence information is limited. In this context, for long texts, we proposed Weighted Sparse Document Vector (WSDV), which performs clustering on the weighted data that emphasizes vital terms and moderates the soft clustering by removing outliers from the converged clusters. Besides the removal of outliers, WSDV utilizes corpus statistics in different steps for the vectorial representation of the document. For short texts, we proposed Weighted Compact Document Vector (WCDV), which captures better semantic insights in building document vectors by emphasizing potential terms and capturing uncertainty information while measuring the affinity between distributions of words. Using available corpus statistics, WCDV sufficiently handles the data sparsity of short texts without depending on external knowledge sources. To evaluate the proposed models, we performed a multiclass document classification using standard performance measures (precision, recall, f1-score, and accuracy) on three long- and two short-text benchmark datasets that outperform some state-of-the-art models. The experimental results demonstrate that in the long-text classification, WSDV reached 97.83% accuracy on the AgNews dataset, 86.05% accuracy on the 20Newsgroup dataset, and 98.67% accuracy on the R8 dataset. In the short-text classification, WCDV reached 72.7% accuracy on the SearchSnippets dataset and 89.4% accuracy on the Twitter dataset.
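A hedged sketch of the general "soft clustering plus outlier removal" idea described above, using a standard Gaussian mixture from scikit-learn (this is not the WSDV algorithm itself; the dimensionality, threshold, and synthetic data are assumptions):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Fake word embeddings: two tight clusters plus a few scattered "long-tail" points.
word_vectors = np.vstack([
    rng.normal(0.0, 0.1, size=(50, 8)),
    rng.normal(3.0, 0.1, size=(50, 8)),
    rng.normal(1.5, 2.0, size=(5, 8)),
])

gmm = GaussianMixture(n_components=2, random_state=0).fit(word_vectors)
log_density = gmm.score_samples(word_vectors)            # per-word log-likelihood under the mixture
keep = log_density > np.percentile(log_density, 10)      # drop the lowest-density 10% as outliers
print(int(keep.sum()), "of", len(word_vectors), "words kept for the thematic representation")
```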
6

Isha, Bharti Bhardwaj, and Bal Ram Bhardwaj Er. "SMART CLOUD WITH DOCUMENT CLUSTERING." International Journal of Advances in Engineering & Scientific Research 3, no. 2 (2016): 18–31. https://doi.org/10.5281/zenodo.10749526.

Abstract:
This research paper describes the results of an experimental study of conventional document clustering techniques implemented in commercial settings so far. In particular, we compare the main approaches to document clustering: agglomerative hierarchical clustering and K-means. Through this paper, we generate and implement a checker algorithm that deals with duplication of document content against the rest of the documents in the cloud. We also present an algorithm for the classification of cloud data. Classification in this algorithm is based on the upload date of the data and how often that data is accessed by the client; we take the ratio of both vectors and generate a score that rates the document within the classification. We propose an explanation for these results based on an analysis of the specifics of the clustering algorithms and the nature of the document data. Keywords: algorithm, commercial, classification, hierarchical, nature, etc.
7

Liu, Liu, Kaile Liu, Zhenghai Cong, Jiali Zhao, Yefei Ji, and Jun He. "Long Length Document Classification by Local Convolutional Feature Aggregation." Algorithms 11, no. 8 (2018): 109. http://dx.doi.org/10.3390/a11080109.

Abstract:
The exponential increase in online reviews and recommendations makes document classification and sentiment analysis a hot topic in academic and industrial research. Traditional deep learning based document classification methods require the use of full textual information to extract features. In this paper, in order to tackle long documents, we propose three methods that use local convolutional feature aggregation to implement document classification. The first proposed method randomly draws blocks of continuous words from the full document. Each block is then fed into a convolutional neural network to extract features, which are concatenated together to output the classification probability through a classifier. The second model improves the first by capturing the contextual order information of the sampled blocks with a recurrent neural network. The third model is inspired by the recurrent attention model (RAM), in which a reinforcement learning module is introduced to act as a controller for selecting the next block position based on the recurrent state. Experiments on our collected four-class arXiv paper dataset show that the three proposed models all perform well, and the RAM model achieves the best test accuracy with the least information.
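The block-sampling-and-aggregation idea can be sketched as follows (a generic illustration; block size, block count, and mean pooling are assumptions rather than the paper's exact configuration, and the CNN encoder is replaced by fake per-block features):

```python
import random
from typing import List

def sample_blocks(tokens: List[str], block_size: int = 50, n_blocks: int = 8) -> List[List[str]]:
    """Randomly draw blocks of consecutive tokens from a long document."""
    if len(tokens) <= block_size:
        return [tokens]
    starts = [random.randrange(0, len(tokens) - block_size) for _ in range(n_blocks)]
    return [tokens[s:s + block_size] for s in starts]

def aggregate_block_features(block_features: List[List[float]]) -> List[float]:
    """Mean-pool per-block feature vectors into a single document representation."""
    dim = len(block_features[0])
    return [sum(f[d] for f in block_features) / len(block_features) for d in range(dim)]

# Usage: each sampled block would normally be encoded by a small CNN; here we fake 4-dim features.
doc = ("word " * 500).split()
blocks = sample_blocks(doc)
fake_features = [[float(len(b)), 1.0, 0.5, 0.0] for b in blocks]
doc_vector = aggregate_block_features(fake_features)
print(len(blocks), doc_vector)
```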
8

Almonayyes, Ahmad. "Multiple Explanations Driven Naïve Bayes Classifier." JUCS - Journal of Universal Computer Science 12, no. 2 (2006): 127–39. https://doi.org/10.3217/jucs-012-02-0127.

Abstract:
Exploratory data analysis over foreign language text presents virtually untapped opportunity. This work incorporates Naïve Bayes classifier with Case-Based Reasoning in order to classify and analyze Arabic texts related to fanaticism. The Arabic vocabularies are converted to equivalent English words using conceptual hierarchy structure. The understanding process operates at two phases. At the first phase, a discrimination network of multiple questions is used to retrieve explanatory knowledge structures each of which gives an interpretation of a text according to a particular aspect of fanaticism. Explanation structures organize past documents of fanatic content. Similar documents are retrieved to generate additional valuable information about the new document. In the second phase, the document classification process based on Naïve Bayes is used to classify documents into their fanatic class. The results show that the classification accuracy is improved by incorporating the explanation patterns with the Naïve Bayes classifier.
9

Mariyam, Ayesha, SK Althaf Hussain Basha, and S. Viswanadha Raju. "On Optimality of Long Document Classification using Deep Learning." International Journal on Recent and Innovation Trends in Computing and Communication 10, no. 12 (2022): 51–58. http://dx.doi.org/10.17762/ijritcc.v10i12.5866.

Abstract:
Document classification is effective with elegant models of word numerical distributions. The word embeddings are one of the categories of numerical distributions of words from the WordNet. The modern machine learning algorithms yearn on classifying documents based on the categorical data. The context of interest on the categorical data is posed with weights and the sense and quality of the sentences is estimated for sensible classification of documents. The focus of the current work is on legal and criminal documents extracted from the popular news channels, particularly on classification of long length legal and criminal documents. Optimization is the essential instrument to bring the quality inputs to the document classification model. The existing models are studied and a feasible model for the efficient document classification is proposed. The experiments are carried out with meticulous filtering and extraction of legal and criminal records from the popular news web sites and preprocessed with WordNet and Text Processing contingencies for efficient inward for the learning framework.
10

Wang, Bohan, Rui Qi, Jinhua Gao, Jianwei Zhang, Xiaoguang Yuan, and Wenjun Ke. "Mining the Frequent Patterns of Named Entities for Long Document Classification." Applied Sciences 12, no. 5 (2022): 2544. http://dx.doi.org/10.3390/app12052544.

Abstract:
Nowadays, a large amount of information is stored as text, and numerous text mining techniques have been developed for various applications, such as event detection, news topic classification, public opinion detection, and sentiment analysis. Although significant progress has been achieved for short text classification, document-level text classification requires further exploration. Long documents always contain irrelevant noisy information that shelters the prominence of indicative features, limiting the interpretability of classification results. To alleviate this problem, a model called MIPELD (mining the frequent pattern of a named entity for long document classification) is demonstrated, which mines the frequent patterns of named entities as features. Discovered patterns allow semantic generalization among documents and provide clues for verifying the results. Experiments on several datasets resulted in good accuracy and macro-F1 values, meeting the requirements for practical application. Further analysis validated the effectiveness of MIPELD in mining interpretable information in text classification.
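As a rough illustration of mining frequent named-entity patterns as interpretable document features (not the MIPELD model itself; the minimum-support threshold and toy documents are assumptions):

```python
from collections import Counter
from itertools import combinations
from typing import List, Set

def frequent_entity_patterns(docs_entities: List[Set[str]],
                             min_support: int = 2,
                             max_size: int = 2) -> Counter:
    """Count entity combinations (up to max_size) that occur in at least min_support documents."""
    counts: Counter = Counter()
    for entities in docs_entities:
        for size in range(1, max_size + 1):
            for pattern in combinations(sorted(entities), size):
                counts[pattern] += 1
    return Counter({p: c for p, c in counts.items() if c >= min_support})

docs = [
    {"European Union", "Brussels", "tariff"},
    {"European Union", "tariff", "WTO"},
    {"NASA", "Mars"},
]
patterns = frequent_entity_patterns(docs)
print(patterns[("European Union", "tariff")])  # 2 -> usable as an interpretable feature
```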

Dissertations / Theses on the topic "Long Document Classification and Explanation"

1

Prasad, Nishchal. "Modèles de langage volumineux et leur adaptation hiérarchique sur de longs documents pour la classification et leur explication : un cas de TALN juridique." Electronic Thesis or Diss., Université de Toulouse (2023-....), 2024. http://www.theses.fr/2024TLSES244.

Abstract:
Legal judgment prediction poses significant challenges due to the length and non-uniform structure of case documents, which can exceed tens of thousands of words. These complexities are further exacerbated when documents lack structural annotations. To address these issues, we propose a deep-learning-based hierarchical framework called MESc (Multi-stage Encoder-based Supervised with Clustering) for judgment prediction. MESc divides lengthy legal documents into smaller parts, extracting their embeddings from the last four layers of a custom fine-tuned Large Language Model (LLM). We approximate document structure using unsupervised clustering, feeding the clustered embeddings into transformer encoder layers to learn inter-chunk representations. Our approach leverages multi-billion parameter LLMs, such as GPT-Neo and GPT-J, within this hierarchical framework and demonstrates their adaptability and intra-domain transfer learning capabilities. In experiments using legal texts from India, the European Union, and the United States, sourced from the ILDC and LexGLUE datasets, MESc achieves at least a 2-point performance improvement over state-of-the-art methods.
Despite the success of hierarchical frameworks in processing long legal documents, their black-box nature often limits the explainability of their predictions, which is critical for real-world legal applications. To address this, we develop Ob-HEx (Occlusion-based Hierarchical Explanation-extractor), an algorithm that provides extractive explanations for hierarchical models by assessing the sensitivity of predictions to input perturbations. Specifically, we use occlusion to perturb input sequences and analyze the resulting predictions, thereby generating explanations. We adapt Ob-HEx to Hierarchical Transformer models trained on Indian legal texts, demonstrating its effectiveness on the ILDC-Expert dataset with a minimum gain of 1 point over previous benchmarks across most evaluation metrics.
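The occlusion idea behind Ob-HEx can be sketched generically: mask one chunk of the document at a time and rank chunks by how much the predicted score drops. The classifier interface, chunking, and toy scorer below are assumptions, not the thesis implementation.

```python
from typing import Callable, List, Tuple

def occlusion_explanation(chunks: List[str],
                          score_fn: Callable[[List[str]], float],
                          mask_token: str = "[MASK]") -> List[Tuple[int, float]]:
    """Rank chunks by the drop in the predicted class score when each chunk is occluded."""
    base_score = score_fn(chunks)
    drops = []
    for i in range(len(chunks)):
        occluded = chunks[:i] + [mask_token] + chunks[i + 1:]
        drops.append((i, base_score - score_fn(occluded)))
    return sorted(drops, key=lambda x: x[1], reverse=True)

# Toy scorer: pretends the judgment hinges on the word "breach" appearing somewhere.
def toy_score(chunks: List[str]) -> float:
    return 0.9 if any("breach" in c for c in chunks) else 0.2

chunks = ["procedural history", "the defendant admits the breach", "costs and orders"]
print(occlusion_explanation(chunks, toy_score)[0])  # (1, ~0.7) -> most influential chunk
```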
2

Velloso, Rodrigues Karina. "Essai de modélisation du processus d’innovation des biens d’équipement : le cas d’un produit de haute technologie à long cycle de vie : application aux moteurs électriques dans un groupe industriel international." Thesis, Vandoeuvre-les-Nancy, INPL, 2011. http://www.theses.fr/2011INPL015N/document.

Abstract:
Mastering the innovation process is a critical and major issue for a wide range of companies in this rapidly changing world. Especially in the capital goods industries, the design and development process is very long, complex, and involves many different departments working under severe constraints (limited financial and technological resources, time, standards and rules to follow, etc.). Innovation in these industries is no longer casual but has become a pivotal activity of the company. Thus, new product development activity is organized into "innovation projects". Based on observations of how innovation projects are steered within a large industrial group called Converteam, this thesis proposes a method for classifying and segmenting capital goods innovation projects. In order to develop this classification method, an observation methodology for data collection was designed. The experimental procedure was performed on six innovation projects developed at Converteam. We started by analysing the evolution and dynamics of electric motor technology. Then, we analyzed the literature in order to find the inputs and outputs needed to develop a classification of innovation projects. The study of these elements resulted in a classification method that takes into account marketing and technological aspects and company expectations to classify projects into three categories: differentiation, neutralization, and optimization. This method was tested on six innovation projects, which allowed us to identify good practices for each project category.
3

Mathieu, Jordane. "Modèles d'impact statistiques en agriculture : de la prévision saisonnière à la prévision à long terme, en passant par les estimations annuelles." Thesis, Paris Sciences et Lettres (ComUE), 2018. http://www.theses.fr/2018PSLEE006/document.

Abstract:
In agriculture, weather is the main factor of variability between two consecutive years. This thesis aims to build large-scale statistical models that estimate the impact of weather conditions on agricultural yields. The scarcity of available agricultural data makes it necessary to construct simple models with few predictors, and to adapt model selection methods to avoid overfitting. Careful validation of statistical models is a major concern of this thesis. Neural networks and mixed effects models are compared, showing the importance of local specificities. Estimates of US corn yield at the end of the year show that temperature and precipitation information account for an average of 28% of yield variability. In several more weather-sensitive states, this score increases to nearly 70%. These results are consistent with recent studies on the subject.
Mid-season maize crop yield forecasts are possible from July: as of July, the meteorological information available accounts for an average of 25% of the variability in final yield in the United States and close to 60% in more weather-sensitive states like Virginia. The northern and southeastern regions of the United States are the least well predicted. Predicting years for which extremely low yields are encountered is an important task. We use a specific method of classification, and show that with only 4 weather predictors, 71% of the very low yields are well detected on average. The impact of climate change on yields up to 2060 is also studied: the model we build provides information on the speed of evolution of yields in different counties of the United States. This highlights areas that will be most affected. For the most affected states (south and east coast), and with constant agricultural practice, the model predicts yields nearly divided by two in 2060, under the IPCC RCP 4.5 scenario. The northern states would be less affected. The statistical models we build can help with short-term management (seasonal forecasts) or quantify the quality of harvests before post-harvest surveys are done, as an aid to monitoring (end-of-year estimates). Estimations for the next 50 years help to anticipate the consequences of climate change on agricultural yields, and to define adaptation or mitigation strategies. The methodology used in this thesis is easily generalized to other crops and other regions of the world.
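In the spirit of (but far simpler than) the statistical models described above, a toy regression shows how weather covariates can explain a share of yield variability; all numbers below are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy county-level data: July temperature anomaly (°C), summer precipitation anomaly (mm), yield (t/ha).
X = np.array([[0.5, -10], [1.2, -40], [-0.3, 20], [2.1, -60], [0.0, 5], [1.5, -30]])
y = np.array([9.8, 9.1, 10.4, 8.2, 10.1, 8.9])

model = LinearRegression().fit(X, y)
print(round(model.score(X, y), 2))   # share of yield variability explained (R^2) on this toy sample
print(model.predict([[1.0, -20]]))   # projected yield for a given weather scenario
```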
4

Schulz, Sebastian. "Ein Hochschulschriftenserver für die SLUB Dresden - Weboberfläche für Browsing und Recherche." Master's thesis, Technische Universität Dresden, 2001. https://tud.qucosa.de/id/qucosa%3A24847.

Abstract:
At the beginning of the 21st century, university libraries in Germany in particular face the great challenge of recognizing the rapid changes and emerging technical possibilities, of seizing them as an opportunity to shed the image of dusty archiving institutions, and of gradually developing into "university information and service centres." This assessment also applies to the Sächsische Landesbibliothek – Staats- und Universitätsbibliothek (SLUB, hereafter simply SLUB). From about 1999 onwards, increasing thought was given there to how academic theses and publications (Hochschulschriften) could be managed and archived digitally in the future.
5

Burzlaff, Marcus. "Aircraft Fuel Consumption - Estimation and Visualization." Aircraft Design and Systems Group (AERO), Department of Automotive and Aeronautical Engineering, Hamburg University of Applied Sciences, 2017. http://d-nb.info/1148997490.

Abstract:
In order to uncover the best kept secret in today's commercial aviation, this project deals with the calculation of fuel consumption of aircraft. With only the reference of the aircraft manufacturer's information, given within the airport planning documents, a method is established that allows computing values for the fuel consumption of every aircraft in question. The aircraft's fuel consumption per passenger and 100 flown kilometers decreases rapidly with range, until a near constant level is reached around the aircraft's average range. At longer range, where payload reduction becomes necessary, fuel consumption increases significantly. Numerical results are visualized, explained, and discussed. With regard to today's increasing number of long-haul flights, the results are investigated in terms of efficiency and viability. The environmental impact of burning fuel is not considered in this report. The presented method allows calculating aircraft type specific fuel consumption based on publicly available information. In this way, the fuel consumption of every aircraft can be investigated and can be discussed openly.
6

Noordhuis-Fairfax, Sarina. "Field | Guide: John Berger and the diagrammatic exploration of place." PhD thesis, Canberra, ACT: The Australian National University, 2018. http://hdl.handle.net/1885/154278.

Abstract:
Positioned between writing and drawing, the diagram is proposed by John Berger as an alternative strategy for articulating encounters with landscape. A diagrammatic approach offers a schematic vocabulary that can compress time and offer a spatial reading of information. Situated within the contemporary field of direct data visualisation, my practice-led research interprets Berger’s ‘Field’ essay as a guide to producing four field | studies within a suburban park in Canberra. My seasonal investigations demonstrate how applying the conventions of the pictorial list, dot-distribution map, routing diagram and colour-wheel reveals subtle ecological and biographical narratives.
7

Rahm, Erhard, Katja Schwippner, and Dieter Sosna. "Dienste für Online-Dokumente gestartet." 1998. https://ul.qucosa.de/id/qucosa%3A31964.


Book chapters on the topic "Long Document Classification and Explanation"

1

Mittal, Saloni, Vidula Magdum, Sharayu Hiwarkhedkar, Omkar Dhekane, and Raviraj Joshi. "L3Cube-MahaNews: News-Based Short Text and Long Document Classification Datasets in Marathi." In Communications in Computer and Information Science. Springer Nature Switzerland, 2024. http://dx.doi.org/10.1007/978-3-031-58495-4_4.

2

D’Cruz, Célia, Jean-Marc Bereder, Frédéric Precioso, and Michel Riveill. "Domain-Specific Long Text Classification from Sparse Relevant Information." In Frontiers in Artificial Intelligence and Applications. IOS Press, 2024. http://dx.doi.org/10.3233/faia240967.

Abstract:
Large Language Models have undoubtedly revolutionized the Natural Language Processing field, the current trend being to promote one-model-for-all tasks (sentiment analysis, translation, etc.). However, the statistical mechanisms at work in the larger language models struggle to exploit the relevant information when it is very sparse, when it is a weak signal. This is the case, for example, for the classification of long domain-specific documents, when the relevance relies on a single relevant word or on very few relevant words from technical jargon. In the medical domain, it is essential to determine whether a given report contains critical information about a patient’s condition. This critical information is often based on one or few specific isolated terms. In this paper, we propose a hierarchical model which exploits a short list of potential target terms to retrieve candidate sentences and represent them into the contextualized embedding of the target term(s) they contain. A pooling of the term(s) embedding(s) entails the document representation to be classified. We evaluate our model on one public medical document benchmark in English and on one private French medical dataset. We show that our narrower hierarchical model is better than larger language models for retrieving relevant long documents in a domain-specific context.
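To make the retrieval step concrete, a minimal sketch of selecting candidate sentences that contain any term from a short target list (the chapter's embedding and pooling stages are not reproduced, and the sentence splitting below is deliberately simplistic):

```python
from typing import List

def candidate_sentences(document: str, target_terms: List[str]) -> List[str]:
    """Return sentences that mention at least one (case-insensitive) target term."""
    sentences = [s.strip() for s in document.split(".") if s.strip()]
    lowered_terms = [t.lower() for t in target_terms]
    return [s for s in sentences if any(t in s.lower() for t in lowered_terms)]

report = ("Patient admitted for routine follow-up. "
          "CT scan shows peritoneal carcinomatosis. "
          "Discharge planned for Friday.")
print(candidate_sentences(report, ["carcinomatosis"]))
# ['CT scan shows peritoneal carcinomatosis']
```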
3

Prgomet, Mirela, Abbish Kamalakkannan, Judith Thomas, et al. "Identifying Long COVID Patients Using General Practice Data: Challenges, Classification and Long COVID Patterns." In Studies in Health Technology and Informatics. IOS Press, 2025. https://doi.org/10.3233/shti250477.

Abstract:
General practice data, extracted from electronic medical records, holds immense potential to generate a wealth of public health knowledge. But it is not without challenges. Our aim in this study was to identify long COVID patients within a large general practice dataset through data classification. We discuss the classification and its validation, and present initial data patterns for the identified long COVID cohort. We found significant variation in how general practitioners document and describe long COVID presentations. Less than half of the identified long COVID patients had a documented acute COVID infection. The highest proportion of long COVID patients were female and those 40–49 years of age. Overall, this study highlights key lessons for researchers utilizing general practice data, particularly in the context of long COVID, and underscores the vital importance of collaboration between researchers, general practitioners, and data custodians to ensure the robustness of data underpinning knowledge translation.
4

Sidhu, Jaspreet. "Text Document Preprocessing and Classification Using SVM and Improved CNN." In Demystifying Emerging Trends in Machine Learning. Bentham Science Publishers, 2025. https://doi.org/10.2174/9789815305395125020023.

Abstract:
Text categorization is a crucial technology in data mining as well as data retrieval that has been extensively investigated and is developing at a rapid pace. Convolutional neural networks (CNNs) are a kind of deep learning model that may reduce model complexity while accurately extracting characteristics from input text. Support vector machine (SVM) results have long been more trustworthy than, and superior to, those of other traditional artificial intelligence approaches. Using enhanced convolutional neural networks (CNNs) as well as support vector machines (SVMs), we offer a novel approach to online text categorization in this study. Our approach begins with text attribute identification and prediction using a CNN-based model with a five-layer network structure. Databases that include both text and images are expected to benefit from it in the long run.
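For context, a minimal TF-IDF plus linear SVM baseline of the kind such chapters build on (scikit-learn; the toy corpus and labels are made up for illustration):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Tiny illustrative corpus; a real experiment would use a proper train/test split.
texts = ["cheap meds online now", "meeting agenda attached", "win a free prize", "quarterly report enclosed"]
labels = ["spam", "ham", "spam", "ham"]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(texts, labels)
print(model.predict(["free meds prize"]))  # likely ['spam']
```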
5

Koertge, Noretta. "Contingencies in Chemical Explanation." In Essays in the Philosophy of Chemistry. Oxford University Press, 2016. http://dx.doi.org/10.1093/oso/9780190494599.003.0013.

Abstract:
“Chemistry has a position in the center of the sciences, bordering onto physics, which provides its theoretical foundation, on one side, and onto biology on the other, living organisms being the most complex of all chemical systems” (Malmström et al.). Thus begins a recent essay on the development of modern chemistry. Philosophers have long wrestled with how best to describe the exact relationship between chemistry and physics. Is it an example of a classic reduction? But before we ask whether chemistry could in principle be derived from physics, there is a prior question: How well integrated is the science of chemistry itself? This chapter argues that although there is a coherent explanatory core within chemical theory, contingency plays a larger role than is usually recognized. Furthermore, these phenomena at the boundaries of traditional chemistry education are where some of the most important current research is occurring. I will first adopt a quasi-historical approach in this essay, including anecdotes from my own educational trajectory. I then briefly discuss how our current understanding of the explanatory structure of chemistry should be reflected in education today. The professor of quantum chemistry at the University of Illinois in the 1950s told us a story from his PhD defense. His director, Linus Pauling, walked into the room and said something to this effect: “Well, Karplus, you’ve done a bunch of calculations on the hydrogen molecule ion (H2 +). Very nice. But you claim to be a chemist. So please write the Periodic Table on the board for us.” Who knows exactly what point Pauling was actually trying to make, but it reminds us of this basic point. The periodic table with its horizontal and vertical trends is still the basis of the classification of enormous amounts of information about the formulae and properties of chemical compounds. Mendeleev would not have understood talk of strontium-90, but he would have realized immediately that this product of nuclear testing would enter the body in a manner similar to calcium.
6

Jana, Enakshi, and V. Uma. "Opinion Mining and Product Review Summarization in E-Commerce." In Trends and Applications of Text Summarization Techniques. IGI Global, 2020. http://dx.doi.org/10.4018/978-1-5225-9373-7.ch008.

Abstract:
With the immense increase in the number of internet users and the simultaneous, massive expansion of e-commerce platforms, millions of products are sold online. To improve user experience and satisfaction, online shopping platforms enable every user to give reviews for each product they buy online. Reviews are long and contain only a few sentences that relate to a particular feature of the product, so it becomes very difficult for the user to understand other customers' views about its different features. Accurate opinion-based review summarization is therefore needed to help both customers and product manufacturers understand and focus on a particular aspect of the product. In this chapter, the authors discuss the abstractive document summarization method for summarizing e-commerce product reviews. The chapter gives an in-depth explanation of different types of document summarization and how they can be applied to e-commerce product reviews.
7

Zvinyatskovsky, Vladimir Ya. "A Historical Anecdote about a Diplomatic Dispatch." In The Non-Euclidean Geometry of Yuri Mann: In Memoriam. A.M. Gorky Institute of World Literature of the Russian Academy of Sciences, 2024. http://dx.doi.org/10.22455/978-5-9208-0754-0-160-167.

Abstract:
This article analyzes Gogol’s tale “The Missing Document” as a historical anecdote – one of those that were told during those long evenings in the homestead of Vasilyevka (Yanovshchina). This story could have been told by the master of the house — Vasily Gogol-Yanovsky, and its source might have been his father Afanasy Yanovsky, the former trusted companion of the hetman Kirill Razumovsky and, most likely, himself the courier sent to the tsarina. The real historical facts mentioned in the tale allow us to date the events described in it to 1764, the year Catherine abolished the hetmanate as a political institution. For this reason, the events recounted in the tale were understood by contemporary readers as probably being historically true: the dispatch by the hetman of some document with the goal of convincing the tsarina not to deprive him of the office of hetman. The failure of the narrator’s grandfather’s diplomatic mission thereupon received a fantastic, but believable explanation: “When the devil or a muscovite steals something — then kiss it goodbye.”
8

Bayeh, Jumana, Helen Groth, and Julian Murphet. "Writing and Rioting." In Writing the Global Riot. Oxford University Press, 2023. http://dx.doi.org/10.1093/oso/9780192862594.003.0001.

Abstract:
Abstract This chapter contends that the literary archive models ways of understanding this new era of riots in which we currently live. The prevalence of rioting as a tactic for expressing collective dissent may seem to epitomize the volatility of contemporary global politics, but literary writers have long registered the riot’s insurrectionary appeal. The writers that feature in this collection exemplify literature’s relationship to the quasi-political form of the riot; a relationship which has been complex and varied, sometimes participatory, reactive, incendiary, but above all archival. As this introduction details, assuming representational responsibility for popular activities that the nation state perceives as ephemeral and destructive, literature has fashioned instead a parallel archive of a style of collective practice that offers unique opportunities to the creative writer. The spontaneous logic of riotous activity presents especially demanding experiential conditions to the artist who would faithfully record it all. Tested in the innermost resources of their artistry, this introduction elaborates on how writers have not merely documented riots over the years, but developed diverse ways of seeing them, ways that attend more radically to the phenomenology and expressive nature of the riot, than historiographical explanations, sociological classifications, or political denunciations ever could.
9

Bhimavarapu, Usharani. "Combating Misinformation in Social Media and News." In Digital Citizenship and the Future of AI Engagement, Ethics, and Privacy. IGI Global, 2025. https://doi.org/10.4018/979-8-3693-9015-3.ch010.

Abstract:
Combating misinformation has become a critical challenge in today's information-driven society, particularly with the proliferation of fake news, propaganda, and biased content across various domains. This study explores advanced natural language processing (NLP) techniques, including feature extraction and selection, to analyze and classify datasets such as Q-Prop, ISOT, GRAFN, and PubHealth. The relief algorithm is employed for feature selection to identify the most relevant attributes, enhancing the efficiency of machine learning models. XLNet, a powerful transformer-based model, is utilized for document representation and classification due to its ability to capture bidirectional and long-term contextual dependencies. The proposed methodology demonstrates how robust embeddings, combined with domain-specific datasets and optimized feature selection, can accurately classify content across news, politics, and public health domains.
10

Zielinski, P. A. "Individual life safety risk criteria." In NATO Science for Peace and Security Series - E: Human and Societal Dynamics. IOS Press, 2012. https://doi.org/10.3233/978-1-61499-131-1-82.

Abstract:
The goal of safety management is to ensure that every infrastructure presents a tolerable level of risk and that such risk be as low as reasonably practicable. This document comments on dam safety decision making, focusing on risk evaluation criteria. More detailed discussion of the philosophy of dam safety decision making is found in Hartford et al. (2004). The objective of dam safety management is based on the principle that the standard of care should be commensurate with risk and should reflect society's values in allocating resources to protect life and property. 'Risk' incorporates both the consequences of an adverse event and the probability of the event occurring, taken here as the product, Risk = [Probability] × [Consequence]. In practice, however, the traditional approach to dam safety management has been to apply classification schemes in which consequences alone are used as a proxy for risk and probability is not considered. Because data are often limited, the assessor tends to be conservative in estimating consequences and the result is a "Maximum Loss" approach unrelated to risk. A quantified risk analysis is preferable to such classification schemes as long as scientific tools are available.

Conference papers on the topic "Long Document Classification and Explanation"

1

Pimparkhede, Sameer, and Pushpak Bhattacharyya. "Main Predicate and Their Arguments as Explanation Signals For Intent Classification." In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). Association for Computational Linguistics, 2025. https://doi.org/10.18653/v1/2025.naacl-long.539.

2

Singha Roy, Sudipta, Xindi Wang, Robert Mercer, and Frank Rudzicz. "Graph-tree Fusion Model with Bidirectional Information Propagation for Long Document Classification." In Findings of the Association for Computational Linguistics: EMNLP 2024. Association for Computational Linguistics, 2024. http://dx.doi.org/10.18653/v1/2024.findings-emnlp.257.

3

Jemila, S. Jacily, S. J. Grace Shoba, Shanker MC, A. Sathishkumar, and K. Rubitha. "Harnessing Hierarchical Attention Networks for Effective Document Classification in Handling Long Texts." In 2024 International Conference on Cybernation and Computation (CYBERCOM). IEEE, 2024. https://doi.org/10.1109/cybercom63683.2024.10803150.

4

Rafieian, Bardia, and Pere-Pau Vázquez. "Evaluating the Suitability of Long Document Embeddings for Classification Tasks: A Comparative Analysis." In 16th International Conference on Knowledge Discovery and Information Retrieval. SCITEPRESS - Science and Technology Publications, 2024. http://dx.doi.org/10.5220/0012950400003838.

5

Prasad, Nishchal, Taoufiq Dkaki, and Mohand Boughanem. "Explanation Extraction from Hierarchical Classification Frameworks for Long Legal Documents." In Findings of the Association for Computational Linguistics: NAACL 2024. Association for Computational Linguistics, 2024. http://dx.doi.org/10.18653/v1/2024.findings-naacl.76.

6

Pappagari, Raghavendra, Piotr Zelasko, Jesus Villalba, Yishay Carmiel, and Najim Dehak. "Hierarchical Transformers for Long Document Classification." In 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). IEEE, 2019. http://dx.doi.org/10.1109/asru46091.2019.9003958.

7

Wagh, Vedangi, Snehal Khandve, Isha Joshi, Apurva Wani, Geetanjali Kale, and Raviraj Joshi. "Comparative Study of Long Document Classification." In TENCON 2021 - 2021 IEEE Region 10 Conference (TENCON). IEEE, 2021. http://dx.doi.org/10.1109/tencon54134.2021.9707465.

8

Bambroo, Purbid, and Aditi Awasthi. "LegalDB: Long DistilBERT for Legal Document Classification." In 2021 International Conference on Advances in Electrical, Computing, Communication and Sustainable Technologies (ICAECT). IEEE, 2021. http://dx.doi.org/10.1109/icaect49130.2021.9392558.

9

Hu, Yongli, Puman Chen, Tengfei Liu, Junbin Gao, Yanfeng Sun, and Baocai Yin. "Hierarchical Attention Transformer Networks for Long Document Classification." In 2021 International Joint Conference on Neural Networks (IJCNN). IEEE, 2021. http://dx.doi.org/10.1109/ijcnn52387.2021.9533869.

10

Khandve, Snehal Ishwar, Vedangi Kishor Wagh, Apurva Dinesh Wani, Isha Mandar Joshi, and Raviraj Bhuminand Joshi. "Hierarchical Neural Network Approaches for Long Document Classification." In ICMLC 2022: 2022 14th International Conference on Machine Learning and Computing. ACM, 2022. http://dx.doi.org/10.1145/3529836.3529935.


Reports on the topic "Long Document Classification and Explanation"

1

Ulloa-Suarez, Carolina, and Oscar Valencia. Sustaining Compliance with Fiscal Rules: A Future at Risk? Inter-American Development Bank, 2024. https://doi.org/10.18235/0013256.

Abstract:
Fiscal rules in Latin American and Caribbean (LAC) countries rarely have fixed objectives, exhibiting significant heterogeneity in design and implementation. This document analyzes how different factors influence the ease or difficulty of compliance with these rules and introduces an update of the Compliance dataset, highlighting new trends. The updated dataset shows that fiscal rule compliance in the region is sensitive to economic conditions, with a notable peak in 2022 due to favorable circumstances. However, the seemingly high compliance in 2022 was largely driven by favorable conditions, masking underlying vulnerabilities, as evidenced by the decline in 2023 and the emerging risks of non-compliance in 2024. This pattern suggests that compliance can sometimes misrepresent both short-term and long-term fiscal health. The analysis underscores the need to rethink fiscal rule frameworks to ensure long-term discipline and coherence with fiscal commitments. A new classification of rules in LAC can help understand compliance dynamics, close the gap between short- and long-term fiscal objectives, and identify specific characteristics to improve. This document emphasizes the necessity for targeted reforms to enhance the resilience and effectiveness of fiscal frameworks in the LAC region.
2

Boyle, M., M. Gregory, Michael Byrne, Paula Capece, Sarah Corbett, and Wendy Wright. Terrestrial vegetation monitoring in Southeast Coast Network parks: Protocol implementation plan. National Park Service, 2019. https://doi.org/10.36967/2263392.

Abstract:
The Southeast Coast Network conducts long-term terrestrial vegetation monitoring as part of the nationwide Inventory and Monitoring Program of the National Park Service. Vegetation in parks is monitored as a key vital sign and indicator of overall ecosystem health because changes in vegetation condition reflect effects of stressors such as extreme weather, disease, invasive species, fire, and land use change. Plants also provide the structured habitat and food resources on which other species depend. Monitoring plants and their associated communities over time allows for targeted understanding of ecosystems within the SECN geography, which provides managers information about the degree of change within their parks’ natural vegetation. The Southeast Coast Network adheres to the definition of “natural” vegetation proposed by the National Vegetation Classification System as “vegetation which appears to be unmodified by human activities”, which differs from “cultural” vegetation “which is planted or actively maintained by humans such as annual croplands, orchards, and vineyards (Grossman et al. 1998).” Terrestrial vegetation monitoring takes place within natural vegetation areas of 15 national park units within the Southeast Coast Network. Parks include Canaveral National Seashore (CANA), Cape Hatteras National Seashore (CAHA), Cape Lookout National Seashore (CALO), Chattahoochee River National Recreation Area (CHAT), Congaree National Park (CONG), Cumberland Island National Seashore (CUIS), Fort Frederica National Monument (FOFR), Fort Matanzas National Monument (FOMA), Fort Pulaski National Monument (FOPU), Fort Sumter National Monument (FOSU), Horseshoe Bend National Military Park (HOBE), Kennesaw Mountain National Battlefield Park (KEMO), Moores Creek National Battlefield (MOCR), Ocmulgee Mounds National Historical Park (OCMU), and Timucuan Ecological and Historic Preserve (TIMU). The Southeast Coast Network monitors between nine and seventy-seven randomly located plots within each park. The number of plots depends on several factors, including the total terrestrial area and coverage of broadly defined habitat types within the park or its respective management unit. Monitored habitat types include tidal and nontidal maritime wetlands, alluvial wetlands, nonalluvial wetlands, upland forests, open upland woodlands, and natural to semi natural successional communities. Plots are 20 × 20 meters (65.6 × 65.6 feet [ft]) in size. Data collected in each plot include species richness, species-specific cover and constancy, species-specific woody stem seedling/sapling counts and adult tree (greater than 10 centimeters [3.9 inches (in)]) diameter at breast height (DBH), and site conditions and environmental covariates. The Southeast Coast Network’s approach, rationale, and required resources for terrestrial vegetation monitoring are described in this document, the protocol implementation plan narrative. Ten associated Standard Operating Procedures (SOPs) provide detailed instructions on how to collect, manage, analyze, and disseminate the project’s findings. The network’s narrative and some SOPs are derived, in large part, from Vegetation Monitoring Protocol for the Cumberland Piedmont Network, Version 1 (White et al. 2011). Any differences in approach between the two networks is documented throughout this Southeast Coast Network narrative and the SOP documents.