Academic literature on the topic 'Classification de document'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Select a source type:

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Classification de document.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Journal articles on the topic "Classification de document"

1

Shvetsova-Vodka, G. N. "Document classification as a theoretical problem of documentology." Scientific and Technical Libraries, no. 9 (October 4, 2022): 147–68. http://dx.doi.org/10.33186/1027-3689-2022-9-147-168.

Full text
Abstract:
The changes and amendments to the general document classification, as a theoretical problem of document studies and documentology, are discussed. The document in the sense of "recorded information" is the subject of classification. The author examines the faceted-block classification of documents based on various characteristics united into six clusters: "Types of documents by physical carrier", "Types of documents by document acquisition circumstances", "Types of documents by information representation and transfer tools (by signative component)", "Types of documents by information reception (perceptive component)", "Types of documents by their environmental circumstances". The document classification within each facet is independent, which makes it possible to characterize any document along various parameters. The changes and amendments are made to the general document classification based on the increasing number and weight of electronic documents, including digital versions of originally non-digital documents. The refined general classification is intended to facilitate special classifications of particular types (classes, groups) of documents. The general document classification is applicable to various scientific disciplines within documentology, and to the teaching of documentological disciplines.
APA, Harvard, Vancouver, ISO, and other styles
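Since the abstract above treats the facets as independent axes, a small data-model sketch may help make the idea concrete. This is an illustration only; the field names are paraphrases of the facets named in the abstract, and the example values are placeholders, not terms taken from the article.

```python
from dataclasses import dataclass

@dataclass
class FacetedDocument:
    """A document characterised independently along the facets named in the abstract above."""
    physical_carrier: str       # "types of documents by physical carrier"
    acquisition: str            # "... by document acquisition circumstances"
    signative_component: str    # "... by information representation and transfer tools"
    perceptive_component: str   # "... by information reception"
    environment: str            # "... by environmental circumstances"

# Because each facet is independent, any combination characterises a valid document,
# e.g. a digital version of an originally non-digital document (placeholder values):
digitised_original = FacetedDocument(
    physical_carrier="electronic",
    acquisition="digitised copy of a paper original",
    signative_component="textual",
    perceptive_component="visually perceived",
    environment="published",
)
print(digitised_original)
```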
2

Chen, Chung-Hsing, and Ko-Wei Huang. "Document Classification Using Lightweight Neural Network." 網際網路技術學刊 (Journal of Internet Technology) 24, no. 7 (2023): 1505–11. http://dx.doi.org/10.53106/160792642023122407012.

Full text
Abstract:
In recent years, OCR data has been used for learning and analyzing document classification. In addition, some neural networks originally trained for image recognition, such as networks from the ImageNet Large Scale Visual Recognition Challenge (AlexNet, GoogleNet, and MobileNet), have been applied to document image training. Document image classification is important in data extraction processes and often requires significant computing power. Furthermore, it is difficult to implement image classification on general computers without a graphics processing unit (GPU). Therefore, this study proposes a lightweight neural network application that can perform document image classification on general computers or Internet of Things (IoT) devices without a GPU. Plustek Inc. provided 3065 receipts belonging to 58 categories. Three datasets were used as test samples, while the remaining were used as training samples to train the network and obtain a classifier. After the experiments, the classifier achieved 98.26% accuracy, and only 3 out of 174 samples showed errors.
APA, Harvard, Vancouver, ISO, and other styles
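As a rough illustration of the kind of lightweight document-image classifier entry 2 describes, here is a minimal Keras sketch; the input resolution, filter counts, and random stand-in data are assumptions, since the paper's architecture and the Plustek receipt set are not reproduced here.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

NUM_CLASSES = 58          # the abstract mentions 58 receipt categories
IMG_H, IMG_W = 128, 96    # assumed input resolution, small enough for CPU-only use

# Random stand-in data; replace with grayscale document images scaled to [0, 1].
x_train = np.random.rand(256, IMG_H, IMG_W, 1).astype("float32")
y_train = np.random.randint(0, NUM_CLASSES, size=256)

model = keras.Sequential([
    layers.Input(shape=(IMG_H, IMG_W, 1)),
    layers.Conv2D(8, 3, activation="relu"),    # few, small filters keep the model light
    layers.MaxPooling2D(2),
    layers.Conv2D(16, 3, activation="relu"),
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=2, batch_size=32, verbose=0)
print(model.count_params(), "parameters")      # total parameter count of the sketch
```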
3

HAO, XIAOLONG, JASON T. L. WANG, MICHAEL P. BIEBER, and PETER A. NG. "HEURISTIC CLASSIFICATION OF OFFICE DOCUMENTS." International Journal on Artificial Intelligence Tools 03, no. 02 (1994): 233–65. http://dx.doi.org/10.1142/s0218213094000121.

Full text
Abstract:
Document Processing Systems (DPSs) support office workers in managing information. Document classification is a major function of DPSs. By analyzing a document's layout and conceptual structures, we present in this paper a sample-based approach to document classification. We represent a document's layout structure by an ordered labeled tree through a procedure known as nested segmentation, and represent the document's conceptual structure by a set of attribute-type pairs. The layout similarities between the document to be classified and sample documents are determined by a previously developed approximate tree matching toolkit. The conceptual similarities between the documents are determined by analyzing their contents and by calculating the degree of conceptual closeness. The document type is identified by computing both the layout and conceptual similarities between the document to be classified and the samples in the document sample base. Some experimental results are presented, which demonstrate the effectiveness of the proposed techniques.
APA, Harvard, Vancouver, ISO, and other styles
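The sample-based scheme in entry 3 boils down to scoring a new document against stored samples on two axes (layout and conceptual similarity) and taking the best match. A toy sketch follows; the similarity functions and the 0.5 weighting are placeholders, not the paper's tree-matching toolkit or closeness measure.

```python
def classify_by_samples(doc, samples, layout_sim, concept_sim, alpha=0.5):
    """Return the document type of the best-matching sample.

    layout_sim and concept_sim are callables returning scores in [0, 1];
    alpha weights layout against conceptual similarity (placeholder value).
    """
    best_type, best_score = None, float("-inf")
    for sample in samples:
        score = alpha * layout_sim(doc, sample) + (1 - alpha) * concept_sim(doc, sample)
        if score > best_score:
            best_type, best_score = sample["type"], score
    return best_type

# Toy stand-ins: conceptual similarity as attribute overlap, layout similarity as a stub.
concept_sim = lambda d, s: len(d["attrs"] & s["attrs"]) / max(len(d["attrs"] | s["attrs"]), 1)
layout_sim = lambda d, s: 1.0 if d["blocks"] == s["blocks"] else 0.5

samples = [
    {"type": "invoice", "attrs": {"total", "vat", "date"}, "blocks": 4},
    {"type": "memo", "attrs": {"subject", "date"}, "blocks": 2},
]
doc = {"attrs": {"total", "date"}, "blocks": 4}
print(classify_by_samples(doc, samples, layout_sim, concept_sim))  # -> invoice
```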
4

D’Silva, Suzanne, Neha Joshi, Sudha Rao, Sangeetha Venkatraman, and Seema Shrawne. "Improved Algorithms for Document Classification & Query-based Multi-Document Summarization." International Journal of Engineering and Technology 3, no. 4 (2011): 404–9. http://dx.doi.org/10.7763/ijet.2011.v3.261.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Calvo, Rafael A., and H. A. Ceccatto. "Intelligent document classification." Intelligent Data Analysis 4, no. 5 (2000): 411–20. http://dx.doi.org/10.3233/ida-2000-4503.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Lee, Jae-Moon, and Rafael A. Calvo. "Scalable document classification." Intelligent Data Analysis 9, no. 4 (2005): 365–80. http://dx.doi.org/10.3233/ida-2005-9404.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Jaiswal, Babita. "Automatic Document Classification." DESIDOC Bulletin of Information Technology 19, no. 3 (1999): 23–28. http://dx.doi.org/10.14429/dbit.19.3.3486.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Surovtseva, Nataliya G. "Classification of documents as a theoretical problem in office work and in the archive." Herald of an archivist, no. 3 (2022): 756–71. http://dx.doi.org/10.28995/2073-0101-2022-3-756-771.

Full text
Abstract:
For archival and documentary studies, classification issues are of key importance to the development of research methodology, the results of which should be applied in practice. Questions of document classification were raised by scientists and specialists in the field of working with documents throughout the Soviet period. K. G. Mityaev paid considerable attention to the problem of document classification and was the first to point out the need for conjugation between the classification of documents in office work and in the archive. In archival science and archival work, the question of document classification has traditionally been associated with the development of lists of documents indicating their storage periods. The purpose of selecting documents for archival storage shaped an approach to classification as the formation of complexes of document systems and subsystems arising from the documentation of various areas of society. In document science, the problem of document classification resulted in the development of the concept of "document type" and the construction of a specific classification scheme. The problem of classification was actualized in connection with the development of unified documentation systems and was raised in A. N. Sokova's research on document systematics. Her attempt to deepen the classification scheme to the level of an individual document was unsuccessful, and in document science the concept of "document type" became fixed as the classification unit for documents sharing a common self-designation, functional purpose, and design features. Modern possibilities of using information technologies in working with documents allow a different approach to solving the problem of document classification. It is necessary to abandon the idea of constructing a universal hierarchical classification, which is a flat, linear model, and instead to create an integral classification structure based on numerous internal relationships between meanings. The model of such an integral classification is the database of an information system. The classification scheme of the database is built on specific measurements (details, attributes, indicators), whose values are determined using a reference book. The composition of classification measurements for all documents in one information system will be the same, but the values of these measurements will differ for each document. The practical possibilities for classifying documents depend on solving a number of theoretical problems: it is necessary to determine the composition of measurements for constructing a classification scheme and to establish those that are uniform for information systems in office work and in the archive.
APA, Harvard, Vancouver, ISO, and other styles
9

Varghese, Jijo, and P. Tamil Selvan. "A Novel Clustering and Matrix Based Computation for Big Data Dimensionality Reduction and Classification." Journal of Advanced Research in Applied Sciences and Engineering Technology 32, no. 1 (2023): 238–51. http://dx.doi.org/10.37934/araset.32.1.238251.

Full text
Abstract:
For higher dimensional or "Big Data (BD)" clustering and classification, the dimensions of documents have to be considered. The overhead of classifying methods might also be reduced by resolving the volumetric issue of documents. However, the dimensions of the shortened collection of documents might potentially generate noise and abnormalities. Several different approaches to removing noise and abnormal information have already been established over time. To increase classification accuracy, existing classifiers or newly created classification methods must deal with some of the most difficult issues in BD document categorization and clustering. Hence, the goals of this research are derived from issues that can be solved only by improving the classification accuracy of classifiers. Superior clusters may also be achieved by using effective "Dimensionality Reduction (DR)". As the first step in this research, we introduce a unique DR approach that preserves word frequency in the document collection, allowing the classification algorithm to obtain improved (or at least equal) levels of classification accuracy with a lower dimensionality set of documents. When clustering "Word Patterns (WPs)" during "WP Clustering (WPC)", we introduce a new WP "Similarity Function (SF)" for "Similarity Computation (SC)" to be used as part of WPC. DR of the document collection is accomplished with the use of information gained from the various WP clusters. Finally, we provide "Similarity Measures" for SC of high dimensional texts and deliver SF for document classification. With assessment criteria like "Information-Ratio for Dimension-Reduction", "Accuracy", and "Recall", we found that the proposed method, WP paired with SC (WP-SC), scales extremely effectively to higher dimensional "Datasets (DS)" and surpasses the existing technique AFO-MKSVM. According to the findings, the WP-SC approach produced more favorable outcomes than the LDA-SVM and AFO-MKSVM approaches.
APA, Harvard, Vancouver, ISO, and other styles
10

Uddin, Farid, Yibo Chen, Zuping Zhang, and Xin Huang. "Corpus Statistics Empowered Document Classification." Electronics 11, no. 14 (2022): 2168. http://dx.doi.org/10.3390/electronics11142168.

Full text
Abstract:
In natural language processing (NLP), document classification is an important task that relies on the proper thematic representation of the documents. Gaussian mixture-based clustering is widespread for capturing rich thematic semantics but does not emphasize potential terms in the corpus. Moreover, the soft clustering approach causes long-tail noise by putting every word into every cluster, which affects the natural thematic representation of documents and their proper classification. It is more challenging to capture semantic insights when dealing with short-length documents where word co-occurrence information is limited. In this context, for long texts, we proposed the Weighted Sparse Document Vector (WSDV), which performs clustering on weighted data that emphasizes vital terms and moderates the soft clustering by removing outliers from the converged clusters. Besides the removal of outliers, WSDV utilizes corpus statistics in different steps of the vectorial representation of the document. For short texts, we proposed the Weighted Compact Document Vector (WCDV), which captures better semantic insights in building document vectors by emphasizing potential terms and capturing uncertainty information while measuring the affinity between distributions of words. Using available corpus statistics, WCDV sufficiently handles the data sparsity of short texts without depending on external knowledge sources. To evaluate the proposed models, we performed multiclass document classification using standard performance measures (precision, recall, f1-score, and accuracy) on three long-text and two short-text benchmark datasets, outperforming some state-of-the-art models. The experimental results demonstrate that in long-text classification, WSDV reached 97.83% accuracy on the AgNews dataset, 86.05% accuracy on the 20Newsgroup dataset, and 98.67% accuracy on the R8 dataset. In short-text classification, WCDV reached 72.7% accuracy on the SearchSnippets dataset and 89.4% accuracy on the Twitter dataset.
APA, Harvard, Vancouver, ISO, and other styles
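The general recipe behind entry 10 (cluster word representations with a Gaussian mixture and build weighted document vectors from the soft assignments) can be approximated with scikit-learn. The LSA word vectors, the tiny corpus, and the absence of outlier removal are simplifications; this is not the WSDV/WCDV models themselves.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.mixture import GaussianMixture
from sklearn.linear_model import LogisticRegression

docs = ["stock markets fall on rate fears", "team wins the championship final",
        "central bank raises interest rates", "striker scores twice in cup match"]
labels = ["business", "sport", "business", "sport"]

tfidf = TfidfVectorizer()
X = tfidf.fit_transform(docs)                      # documents x terms

# Word vectors via LSA on the transposed matrix (a stand-in for richer embeddings).
word_vecs = TruncatedSVD(n_components=2, random_state=0).fit_transform(X.T)

gmm = GaussianMixture(n_components=2, random_state=0).fit(word_vecs)
word_topics = gmm.predict_proba(word_vecs)         # terms x clusters (soft assignments)

# Document vector = tf-idf-weighted sum of its words' cluster posteriors.
doc_vecs = X @ word_topics

clf = LogisticRegression().fit(doc_vecs, labels)
print(clf.predict(doc_vecs))
```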
More sources

Dissertations / Theses on the topic "Classification de document"

1

Lovegrove, Will. "Advanced document analysis and automatic classification of PDF documents." Thesis, University of Nottingham, 1996. http://eprints.nottingham.ac.uk/13967/.

Full text
Abstract:
This thesis explores the domain of document analysis and document classification within the PDF document environment. The main focus is the creation of a document classification technique which can identify the logical class of a PDF document and so provide the necessary information to document-class-specific algorithms (such as document understanding techniques). The thesis describes a page decomposition technique which is tailored to render the information contained in an unstructured PDF file into a set of blocks. The new technique is based on published research but contains many modifications which enable it to competently analyse the internal document model of PDF documents. A new level of document processing is presented: advanced document analysis. The aim of advanced document analysis is to extract information from the PDF file which can be used to help identify the logical class of that PDF file. A blackboard framework is used in a process of block labelling in which the blocks created from earlier segmentation techniques are classified into one of eight basic categories. The blackboard's knowledge sources are programmed to find recurring patterns amongst the document's blocks and formulate document-specific heuristics which can be used to tag those blocks. Meaningful document features are found from three information sources: a statistical evaluation of the document's aesthetic components; a logic-based evaluation of the labelled document blocks; and an appearance-based evaluation of the labelled document blocks. The features are used to train and test a neural net classification system which identifies the recurring patterns amongst these features for four basic document classes: newspapers, brochures, forms and academic documents. In summary, this thesis shows that it is possible to classify a PDF document (which is logically unstructured) into a basic logical document class. This has important ramifications for document processing systems which have traditionally relied upon a priori knowledge of the logical class of the document they are processing.
APA, Harvard, Vancouver, ISO, and other styles
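The final stage of the thesis above, where features derived from labelled blocks feed a neural classifier over four document classes, can be mimicked with a small scikit-learn network; the 12-dimensional feature vector and the random training data are assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

CLASSES = ["newspaper", "brochure", "form", "academic"]

# Assumed per-document feature vector: e.g. block counts per label, statistical and
# appearance measures of the labelled blocks. Random stand-in values here.
rng = np.random.default_rng(0)
X = rng.random((200, 12))
y = rng.integers(0, len(CLASSES), size=200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
net = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0).fit(X_tr, y_tr)
print("held-out accuracy on random stand-in data:", net.score(X_te, y_te))
```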
2

Augereau, Olivier. "Reconnaissance et classification d’images de documents." Thesis, Bordeaux 1, 2013. http://www.theses.fr/2013BOR14764/document.

Full text
Abstract:
The aim of this research is to contribute to the document image classification problem. More specifically, it addresses the needs of digitization companies whose objective is to provide their clients with a digital version of paper documents together with related information. Given the diversity of documents to be digitized, information extraction can be complex, which is why document classification and indexing are very often performed manually. This research provides several solutions depending on the knowledge the user in charge of annotating the documents has about the images. The first contribution of this thesis is an interactive method for classifying document images whose content and classes are unknown. The second contribution is a technique for document image retrieval by example, based on the extraction and matching of interest points. The last contribution is a method for classifying document images using bag-of-visual-words techniques.
APA, Harvard, Vancouver, ISO, and other styles
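The bag-of-visual-words route mentioned in the abstract above usually follows the same pattern: detect local descriptors, quantise them against a learned codebook, and classify the resulting histograms. A sketch with OpenCV and scikit-learn follows; the synthetic pages, labels, and codebook size are placeholders for a real labelled set of scanned documents.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

def synthetic_page(rng):
    """Stand-in for a scanned page: white canvas with dark text-like blocks."""
    img = np.full((400, 300), 255, dtype=np.uint8)
    for _ in range(int(rng.integers(5, 15))):
        x, y = int(rng.integers(0, 250)), int(rng.integers(0, 350))
        w, h = int(rng.integers(20, 50)), int(rng.integers(5, 15))
        cv2.rectangle(img, (x, y), (x + w, y + h), 0, -1)
    return img

rng = np.random.default_rng(0)
pages = [synthetic_page(rng) for _ in range(6)]
labels = ["invoice", "letter"] * 3              # placeholder labels

orb = cv2.ORB_create(nfeatures=300)
descs = []
for img in pages:
    _, d = orb.detectAndCompute(img, None)      # interest points + binary descriptors
    descs.append(d if d is not None else np.empty((0, 32), dtype=np.uint8))

K = 32                                          # codebook size (assumed)
codebook = KMeans(n_clusters=K, n_init=10, random_state=0).fit(
    np.vstack(descs).astype(np.float32))

def bovw_histogram(d):
    hist = np.zeros(K)
    if len(d):
        for w in codebook.predict(d.astype(np.float32)):
            hist[w] += 1
    return hist / max(hist.sum(), 1.0)          # L1-normalised visual-word histogram

X = np.array([bovw_histogram(d) for d in descs])
print(LinearSVC().fit(X, labels).predict(X))
```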
3

Mondal, Abhro Jyoti. "Document Classification using Characteristic Signatures." University of Cincinnati / OhioLINK, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1511793852923472.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Sandsmark, Håkon. "Spoken Document Classification of Broadcast News." Thesis, Norges teknisk-naturvitenskapelige universitet, Institutt for elektronikk og telekommunikasjon, 2012. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-19226.

Full text
Abstract:
Two systems for spoken document classification are implemented by combining an automatic speech recognizer with the two classification algorithms naive Bayes and logistic regression. The focus is on how to handle the inherent uncertainty in the output of the speech recognizer. Feature extraction is performed by computing expected word counts from speech recognition lattices, and subsequently removing words that are found to carry little or noisy information about the topic label, as determined by the information gain metric. The systems are evaluated by performing cross-validation on broadcast news stories, and the classification accuracy is measured with different configurations and on recognition output with different word error rates. The results show that a relatively high classification accuracy can be obtained with word error rates around 50%, and that the benefit of extracting features from lattices instead of 1-best transcripts increases with increasing word error rates.
APA, Harvard, Vancouver, ISO, and other styles
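The classification side of entry 4 (term features filtered by information gain, then naive Bayes or logistic regression) maps onto standard scikit-learn components; here mutual information stands in for the thesis's information-gain metric, and the tiny corpus is only a placeholder for recogniser output.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Placeholder 1-best transcripts; in the thesis, expected word counts come from lattices.
transcripts = [
    "the government announced a new budget for schools",
    "the home team scored late to win the derby",
    "parliament debated the proposed tax reform",
    "the striker was injured before the cup final",
]
topics = ["politics", "sport", "politics", "sport"]

for clf in (MultinomialNB(), LogisticRegression(max_iter=1000)):
    pipe = Pipeline([
        ("counts", CountVectorizer()),
        ("select", SelectKBest(mutual_info_classif, k=10)),  # stand-in for information gain
        ("clf", clf),
    ])
    pipe.fit(transcripts, topics)
    print(type(clf).__name__, pipe.predict(["the minister proposed a budget"]))
```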
5

Chantar, Hamouda Khalifa Hamouda. "New techniques for Arabic document classification." Thesis, Heriot-Watt University, 2013. http://hdl.handle.net/10399/2669.

Full text
Abstract:
Text classification (TC) concerns automatically assigning a class (category) label to a text document, and has increasingly many applications, particularly in organizing large document collections for browsing. It is typically achieved via machine learning, where a model is built on the basis of a typically large collection of document features. Feature selection is critical in this process, since there are typically several thousand potential features (distinct words or terms). In text classification, feature selection aims to improve computational efficiency and classification accuracy by removing irrelevant and redundant terms (features), while retaining features (words) that contain sufficient information to help with the classification task. This thesis proposes binary particle swarm optimization (BPSO) hybridized with either K Nearest Neighbour (KNN) or Support Vector Machines (SVM) for feature selection in Arabic text classification tasks. Comparison between feature selection approaches is done on the basis of using the selected features in conjunction with SVM, Decision Trees (C4.5), and Naive Bayes (NB) to classify a held-out test set. Using publicly available Arabic datasets, results show that BPSO/KNN and BPSO/SVM techniques are promising in this domain. The sets of selected features (words) are also analyzed to consider the differences between the types of features that BPSO/KNN and BPSO/SVM tend to choose. This leads to speculation concerning the appropriate feature selection strategy, based on the relationship between the classes in the document categorization task at hand. The thesis also investigates the use of statistically extracted phrases of length two as terms in Arabic text classification. In comparison with bag-of-words text representation, results show that using phrases alone as terms in the Arabic TC task decreases the classification accuracy of Arabic TC classifiers significantly, while combining bag-of-words and phrase-based representations may increase the classification accuracy of the SVM classifier slightly.
APA, Harvard, Vancouver, ISO, and other styles
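A compact sketch of binary PSO wrapped around a KNN fitness function, in the spirit of the BPSO/KNN approach described above; the swarm parameters, sigmoid transfer rule, and synthetic data are assumptions rather than the thesis's exact setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=120, n_features=30, n_informative=6, random_state=0)
n_particles, n_features, iters = 10, X.shape[1], 15
w, c1, c2 = 0.7, 1.5, 1.5                      # assumed inertia / acceleration constants

def fitness(mask):
    cols = mask.astype(bool)
    if not cols.any():
        return 0.0                             # empty feature subsets score zero
    knn = KNeighborsClassifier(n_neighbors=3)
    return cross_val_score(knn, X[:, cols], y, cv=3).mean()

pos = rng.integers(0, 2, size=(n_particles, n_features))       # binary positions
vel = rng.normal(0, 1, size=(n_particles, n_features))
pbest, pbest_fit = pos.copy(), np.array([fitness(p) for p in pos])
gbest = pbest[pbest_fit.argmax()].copy()

for _ in range(iters):
    r1, r2 = rng.random(vel.shape), rng.random(vel.shape)
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = (rng.random(vel.shape) < 1 / (1 + np.exp(-vel))).astype(int)  # sigmoid transfer
    fit = np.array([fitness(p) for p in pos])
    improved = fit > pbest_fit
    pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
    gbest = pbest[pbest_fit.argmax()].copy()

print("selected features:", int(gbest.sum()), "cv accuracy:", round(pbest_fit.max(), 3))
```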
6

Calabrese, Stephen. "Nonnegative Matrix Factorization and Document Classification." DigitalCommons@CalPoly, 2015. https://digitalcommons.calpoly.edu/theses/1462.

Full text
Abstract:
Applications of Non-negative Matrix Factorization are ubiquitous, and there are several well known algorithms available. This paper is concerned with the preprocessing of the documents and how the preprocessing affects document classification. The classification is run on a variety of inner dimensions to see how the proposed initialization compares to random initialization across an assortment of inner dimensions. The document classification is accomplished by using Non-negative Matrix Factorization and a Support Vector Machine. Several of the well known algorithms call for a random initialization of matrices before starting an iterative process to a locally best solution. Not only is the initialization often random, but choosing the size of the inner dimension also remains a difficult and mysterious task. This paper explores the possible gains in categorization accuracy given a more intelligently chosen initialization, as opposed to a random initialization, through the use of the Reuters-21578 document collection. This paper presents two new and different approaches for initialization of the data matrix. The first approach uses the most important words for a given document that are least important to all the other documents. The second approach incorporates the words that appear in the title and header of the documents that are not stop words. The motivation for this is that the title usually tells the reader what the document is about; as a result, the words should be relevant to the category of the document. This paper also presents an entire framework for testing and comparing different Non-negative Matrix Factorization initialization methods. A thorough overview of the implementation and results is presented to ease interfacing with future work.
APA, Harvard, Vancouver, ISO, and other styles
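The abstract above pairs a custom NMF initialisation with an SVM on the factorised documents, which maps directly onto scikit-learn's NMF with init='custom'. The sketch below substitutes scaled uniform noise for the thesis's word- and title-based initialisations and uses a small public corpus instead of Reuters-21578.

```python
import numpy as np
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split

# A small public corpus as a stand-in for Reuters-21578 (downloads on first use).
data = fetch_20newsgroups(subset="train", categories=["sci.space", "rec.autos"],
                          remove=("headers", "footers", "quotes"))
X = TfidfVectorizer(max_features=2000, stop_words="english").fit_transform(data.data)
k = 20                                            # inner dimension (assumed)

# Custom initialisation: here simply uniform noise; the thesis would seed W/H from
# document-specific or title words instead.
rng = np.random.default_rng(0)
W0 = rng.random((X.shape[0], k)).astype(X.dtype)
H0 = rng.random((k, X.shape[1])).astype(X.dtype)

W = NMF(n_components=k, init="custom", max_iter=300).fit_transform(X, W=W0, H=H0)

X_tr, X_te, y_tr, y_te = train_test_split(W, data.target, test_size=0.3, random_state=0)
svm = LinearSVC().fit(X_tr, y_tr)
print("accuracy with this initialisation:", round(svm.score(X_te, y_te), 3))
```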
7

McElroy, Jonathan David. "Automatic Document Classification in Small Environments." DigitalCommons@CalPoly, 2012. https://digitalcommons.calpoly.edu/theses/682.

Full text
Abstract:
Document classification is used to sort and label documents. This gives users quicker access to relevant data. Users who work with a large inflow of documents spend time filing and categorizing them to allow for easier procurement. The Automatic Classification and Document Filing (ACDF) system proposed here is designed to allow users working with files or documents to rely on the system to classify and store them with little manual attention. By using a system built on Hidden Markov Models, the documents in a smaller desktop environment are categorized with better results than the traditional Naive Bayes implementation of classification.
APA, Harvard, Vancouver, ISO, and other styles
8

Blein, Florent. "Automatic Document Classification Applied to Swedish News." Thesis, Linköping University, Department of Computer and Information Science, 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-3065.

Full text
Abstract:
The first part of this paper briefly presents the ELIN[1] system, an electronic newspaper project. ELIN is a framework that stores news and displays them to the end user. Such news is formatted using the xml[2] format. The project partner Corren[3] provided ELIN with xml articles; however, the format used was not the same. My first task was to develop software that converts the news from one xml format (Corren) to another (ELIN).

The second and main part addresses the problem of automatic document classification and tries to find a solution for a specific issue. The goal is to automatically classify news articles from a Swedish newspaper company (Corren) into the IPTC[4] news categories.

This work has been carried out by implementing several classification algorithms, testing them and comparing their accuracy with existing software. The training and test documents were 3 weeks of the Corren newspaper that had to be classified into 2 categories.

The last tests were run with only one algorithm (Naïve Bayes) over a larger amount of data (7, then 10 weeks) and more categories (12) to simulate a more realistic environment.

The results show that the Naïve Bayes algorithm, although the oldest, was the most accurate in this particular case. An issue raised by the results is that feature selection improves speed but can seldom reduce accuracy by removing too many features.
APA, Harvard, Vancouver, ISO, and other styles
9

SHEN, TONG. "Document and Image Classification with Topic Ngram Model." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-155771.

Full text
Abstract:
Latent Dirichlet Allocation (LDA) is a popular probabilistic model for information retrieval. Many extended models based on LDA have been introduced during the past 10 years. In LDA, a data point is represented as a bag (multiset) of words. In the text case, a word is a regular text word, but other types of data can also be represented as words (e.g. visual words). Due to the bag-of-words assumption, the original LDA neglects the structure of the data, i.e., all the relationships between words, which leads to information loss. As a matter of fact, the spatial relationship is important and useful. In order to explore the importance of this relationship, we focus on an extension of LDA called the Topic Ngram Model, which models the relationship among adjacent words. In this thesis, we first implement the model and use it for text classification. Furthermore, we propose a 2D extension, which enables us to model spatial relationships of features in images.
APA, Harvard, Vancouver, ISO, and other styles
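The LDA component of the thesis above is straightforward to reproduce with scikit-learn: fit a topic model on bag-of-words counts and use each document's topic distribution as its classification features. Only plain LDA is sketched here, not the Topic Ngram extension; the corpus, topic count, and classifier are assumptions.

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

data = fetch_20newsgroups(subset="train", categories=["talk.politics.misc", "sci.med"],
                          remove=("headers", "footers", "quotes"))  # downloads on first use
counts = CountVectorizer(max_features=3000, stop_words="english").fit_transform(data.data)

lda = LatentDirichletAllocation(n_components=15, random_state=0)
theta = lda.fit_transform(counts)        # document-topic distributions (bag-of-words LDA)

X_tr, X_te, y_tr, y_te = train_test_split(theta, data.target, test_size=0.3, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("topic-feature accuracy:", round(clf.score(X_te, y_te), 3))
```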
10

Gupta, Anjum. "New framework for cross-domain document classification." Monterey, California. Naval Postgraduate School, 2011. http://hdl.handle.net/10945/10786.

Full text
Abstract:
Automatic text document classification is a fundamental problem in machine learning. Given the dynamic nature and the exponential growth of the World Wide Web, one needs the ability to classify not only a massive number of documents, but also documents that belong to a wide variety of domains. Some examples of the domains are e-mails, blogs, Wikipedia articles, news articles, newsgroups, online chats, etc. It is the difference in writing style that differentiates these domains. Text documents are usually classified using supervised learning algorithms that require a large set of pre-labeled data. This requirement of labeled data poses a challenge in classifying documents that belong to different domains. Our goal is to classify text documents in the testing domain without requiring any labeled documents from the same domain. Our research develops specialized cross-domain learning algorithms based on the distributions over words obtained from a collection of text documents by topic models such as Latent Dirichlet Allocation (LDA). Our major contributions include (1) empirically showing that conventional supervised learning algorithms fail to generalize their learned models across different domains and (2) development of novel and specialized cross-domain classification algorithms that show an appreciable and consistent improvement, across different datasets, over conventional methods used for cross-domain classification. Our research addresses many real-world needs. Since a massive number of new types of text documents is generated daily, it is crucial to have the ability to transfer learned information from one domain to another. Cross-domain classification lets us leverage information learned from one domain for use in the classification of documents in a new domain.
APA, Harvard, Vancouver, ISO, and other styles
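A rough sketch of the cross-domain setup the abstract describes, using a common 20 Newsgroups proxy for domain shift (train on one pair of subgroups, test on a different pair sharing the same top-level labels) and LDA topic distributions as the shared representation. The thesis's specialised algorithms are not reproduced here.

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import LogisticRegression

def load(categories):                        # downloads on first use
    d = fetch_20newsgroups(subset="train", categories=categories,
                           remove=("headers", "footers", "quotes"))
    labels = [d.target_names[t].split(".")[0] for t in d.target]  # top-level label: rec / sci
    return d.data, labels

src_texts, src_y = load(["rec.sport.baseball", "sci.space"])   # source domain (labeled)
tgt_texts, tgt_y = load(["rec.sport.hockey", "sci.med"])       # unseen target domain

vec = CountVectorizer(max_features=3000, stop_words="english").fit(src_texts + tgt_texts)
lda = LatentDirichletAllocation(n_components=20, random_state=0).fit(
    vec.transform(src_texts + tgt_texts))      # topics fit on unlabeled text from both domains

clf = LogisticRegression(max_iter=1000).fit(lda.transform(vec.transform(src_texts)), src_y)
print("accuracy on the unseen domain:",
      round(clf.score(lda.transform(vec.transform(tgt_texts)), tgt_y), 3))
```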
More sources

Books on the topic "Classification de document"

1

Commission, United States Nuclear Regulatory. Public document room file classification system. 2nd ed. U.S. Nuclear Regulatory Commission, Office of the Secretary, 1995.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
2

United States. Nuclear Regulatory Commission. Public document room file classification system. 2nd ed. U.S. Nuclear Regulatory Commission, Office of the Secretary, 1995.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
3

Leskiw, L. A. Agricultural capability classification for reclamation: Working document. s.n., 1993.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
4

Leskiw, L. A. Agricultural capability classification for reclamation: Working document. Alberta Conservation and Reclamation Council (Reclamation Research Technical Advisory Committee), 1993.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
5

Robert Peccia & Associates. 1992 Montana Highway functional reclassification: Final report document. Robert Peccia & Associates, 1992.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
6

Zhou, Shun. Incremental document classification in a knowledge management environment. National Library of Canada, 2001.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
7

Library, New York State. Nydocs: New York State document classification system: the arrangement of New York State documents in libraries. University of the State of New York, State Education Dept., New York State Library, 1995.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
8

Automatic indexing and abstracting of document texts. Kluwer Academic Publishers, 2000.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
9

Museum Documentation Association (Great Britain), ed. Facts & artefacts: How to document a museum collection. 2nd ed. MDA, 1998.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
10

Museum Documentation Association (Great Britain) and Great Britain. Museums and Galleries Commission., eds. Facts & artefacts: How to document a museum collection. Museum Documentation Association, with the support of the Museums & Galleries Commission, 1991.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
More sources

Book chapters on the topic "Classification de document"

1

Webb, Geoffrey I., Johannes Fürnkranz, et al. "Document Classification." In Encyclopedia of Machine Learning. Springer US, 2011. http://dx.doi.org/10.1007/978-0-387-30164-8_230.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Mladenić, Dunja, Janez Brank, and Marko Grobelnik. "Document Classification." In Encyclopedia of Machine Learning and Data Mining. Springer US, 2016. http://dx.doi.org/10.1007/978-1-4899-7502-7_75-1.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Mladenić, Dunja, Janez Brank, and Marko Grobelnik. "Document Classification." In Encyclopedia of Machine Learning and Data Mining. Springer US, 2017. http://dx.doi.org/10.1007/978-1-4899-7687-1_75.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Liu, Bing. "Document Sentiment Classification." In Sentiment Analysis and Opinion Mining. Springer International Publishing, 2012. http://dx.doi.org/10.1007/978-3-031-02145-9_3.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Bhowmik, Showmik. "Document Region Classification." In SpringerBriefs in Computer Science. Springer Nature Singapore, 2023. http://dx.doi.org/10.1007/978-981-99-4277-0_4.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Li, Rumeng, and Hiroyuki Shindo. "Distributed Document Representation for Document Classification." In Advances in Knowledge Discovery and Data Mining. Springer International Publishing, 2015. http://dx.doi.org/10.1007/978-3-319-18038-0_17.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Guthrie, Louise, Joe Guthrie, and James Leistensnider. "Document Classification and Routing." In Text, Speech and Language Technology. Springer Netherlands, 1999. http://dx.doi.org/10.1007/978-94-017-2388-6_12.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Bakus, Jan, and Mohamed Kamel. "Document Classification Using Phrases." In Lecture Notes in Computer Science. Springer Berlin Heidelberg, 2002. http://dx.doi.org/10.1007/3-540-70659-3_58.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Assainar Hafnan, P. P., and Anuraj Mohan. "Summary-Based Document Classification." In Advances in Intelligent Systems and Computing. Springer Singapore, 2018. http://dx.doi.org/10.1007/978-981-10-8633-5_16.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Maio, Paulo, Nuno Silva, Ricardo Brandão, Jorge Vasconcelos, and Fábio Loureiro. "Multi-classification Document Manager." In Lecture Notes in Electrical Engineering. Springer Berlin Heidelberg, 2012. http://dx.doi.org/10.1007/978-3-642-28798-5_14.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Conference papers on the topic "Classification de document"

1

Chanda, Sukalpa, Katrin Franke, and Umapada Pal. "Document-Zone Classification in Torn Documents." In 2010 International Conference on Frontiers in Handwriting Recognition (ICFHR). IEEE, 2010. http://dx.doi.org/10.1109/icfhr.2010.12.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Alshamari, Fatimah, and Abdou Youssef. "A Study into Math Document Classification using Deep Learning." In 8th International Conference on Computational Science and Engineering (CSE 2020). AIRCC Publishing Corporation, 2020. http://dx.doi.org/10.5121/csit.2020.101702.

Full text
Abstract:
Document classification is a fundamental task for many applications, including document annotation, document understanding, and knowledge discovery. This is especially true in STEM fields, where the growth rate of scientific publications is exponential and where document processing and understanding are essential to technological advancement. Classifying a new publication into a specific domain based on the content of the document is an expensive process in terms of cost and time. Therefore, there is a high demand for a reliable document classification system. In this paper, we focus on classification of mathematics documents, which consist of English text and mathematics formulas and symbols. The paper addresses two key questions. The first question is whether math-document classification performance is impacted by math expressions and symbols, either alone or in conjunction with the text contents of documents. Our investigations show that Text-Only embedding produces better classification results. The second question we address is the optimization of a deep learning (DL) model, an LSTM combined with a one-dimensional CNN, for math document classification. We examine the model with several input representations, key design parameters, and decision choices, and identify the best input representation for math document classification.
APA, Harvard, Vancouver, ISO, and other styles
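A minimal Keras sketch of an LSTM combined with a one-dimensional CNN, the model family examined above; the vocabulary size, sequence length, class count, and random stand-in data are assumptions.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

VOCAB, MAXLEN, NUM_CLASSES = 20000, 400, 12   # assumed sizes

# Random stand-in token sequences; real input would be tokenised math-document text.
x = np.random.randint(1, VOCAB, size=(128, MAXLEN))
y = np.random.randint(0, NUM_CLASSES, size=128)

model = keras.Sequential([
    layers.Input(shape=(MAXLEN,)),
    layers.Embedding(VOCAB, 128),
    layers.Conv1D(64, 5, activation="relu"),   # local n-gram features
    layers.MaxPooling1D(2),
    layers.LSTM(64),                           # longer-range sequence structure
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(x, y, epochs=1, batch_size=32, verbose=0)
model.summary()
```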
3

Guruprakash, K. S., K. Valli Priyadharshini, G. Pavithra, et al. "Document vector extension for document classification." In INTERNATIONAL CONFERENCE ON SCIENCE, ENGINEERING, AND TECHNOLOGY 2022: Conference Proceedings. AIP Publishing, 2023. http://dx.doi.org/10.1063/5.0173197.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Nyberg, Katariina, Tapani Raiko, Teemu Tiinanen, and Eero Hyvönen. "Document classification utilising ontologies and relations between documents." In the Eighth Workshop. ACM Press, 2010. http://dx.doi.org/10.1145/1830252.1830264.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Blosseville, M. J., G. Hébrail, M. G. Monteil, and N. Pénot. "Automatic document classification." In the 15th annual international ACM SIGIR conference. ACM Press, 1992. http://dx.doi.org/10.1145/133160.133175.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Parker, Kevin, Robert Williams, Philip Nitse, and Albert Tay. "Use of the Normalized Word Vector Approach in Document Classification for an LKMC." In InSITE 2008: Informing Science + IT Education Conference. Informing Science Institute, 2008. http://dx.doi.org/10.28945/3259.

Full text
Abstract:
In order to realize the objective of expanding library services to provide knowledge management support for small businesses, a series of requirements must be met. This particular phase of a larger research project focuses on one of the requirements: the need for a document classification system to rapidly determine the content of digital documents. Document classification techniques are examined to assess the available alternatives for realization of Library Knowledge Management Centers (LKMCs). After evaluating prominent techniques the authors opted to investigate a less well-known method, the Normalized Word Vector (NWV) approach, which has been used successfully in classifying highly unstructured documents, i.e., student essays. The authors propose utilizing the NWV approach for LKMC automatic document classification with the goal of developing a system whereby unfamiliar documents can be quickly classified into existing topic categories. This conceptual paper will outline an approach to test NWV’s suitability in this area.
APA, Harvard, Vancouver, ISO, and other styles
7

"Document Classification of Accreditation Documents Using Machine Learning Algorithm." In Jan. 29-30, 2019 Cebu (Philippines). Emirates Research Publishing, 2019. http://dx.doi.org/10.17758/erpub3.er01192016.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Tian, Bing, Yong Zhang, Jin Wang, and Chunxiao Xing. "Hierarchical Inter-Attention Network for Document Classification with Multi-Task Learning." In Twenty-Eighth International Joint Conference on Artificial Intelligence {IJCAI-19}. International Joint Conferences on Artificial Intelligence Organization, 2019. http://dx.doi.org/10.24963/ijcai.2019/495.

Full text
Abstract:
Document classification is an essential task in many real-world applications. Existing approaches adopt both text semantics and document structure to obtain the document representation. However, these models usually require a large collection of annotated training instances, which is not always feasible, especially in low-resource settings. In this paper, we propose a multi-task learning framework to jointly train multiple related document classification tasks. We devise a hierarchical architecture to make use of the shared knowledge from all tasks to enhance the document representation of each task. We further propose an inter-attention approach to improve the task-specific modeling of documents with global information. Experimental results on 15 public datasets demonstrate the benefits of our proposed model.
APA, Harvard, Vancouver, ISO, and other styles
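For readers who want a concrete starting point for multi-task document classification, here is a hedged Keras sketch using a shared encoder with task-specific heads (plain hard parameter sharing, not the paper's hierarchical inter-attention); shapes, labels, and data are placeholders.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

VOCAB, MAXLEN = 10000, 200
x = np.random.randint(1, VOCAB, size=(64, MAXLEN))            # stand-in token ids
y_task_a = np.random.randint(0, 4, size=64)                   # e.g. topic labels
y_task_b = np.random.randint(0, 2, size=64)                   # e.g. a related binary task

inputs = keras.Input(shape=(MAXLEN,))
shared = layers.Embedding(VOCAB, 64)(inputs)                  # representation shared by tasks
shared = layers.Bidirectional(layers.LSTM(32))(shared)

out_a = layers.Dense(4, activation="softmax", name="task_a")(shared)   # task-specific heads
out_b = layers.Dense(2, activation="softmax", name="task_b")(shared)

model = keras.Model(inputs, [out_a, out_b])
model.compile(optimizer="adam",
              loss={"task_a": "sparse_categorical_crossentropy",
                    "task_b": "sparse_categorical_crossentropy"})
model.fit(x, {"task_a": y_task_a, "task_b": y_task_b}, epochs=1, verbose=0)
model.summary()
```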
9

An, C., H. Baird, and P. Xiu. "Iterated Document Content Classification." In Ninth International Conference on Document Analysis and Recognition (ICDAR 2007). IEEE, 2007. http://dx.doi.org/10.1109/icdar.2007.4378714.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Jain, Rajiv, and Curtis Wigington. "Multimodal Document Image Classification." In 2019 International Conference on Document Analysis and Recognition (ICDAR). IEEE, 2019. http://dx.doi.org/10.1109/icdar.2019.00021.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Reports on the topic "Classification de document"

1

Celmins, Aivars. Document Classification by Fuzzy Attribute Evaluation. Defense Technical Information Center, 2000. http://dx.doi.org/10.21236/ada391375.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Sheehan, Kathleen, Kikumi Tatsuoka, and Charles Lewis. A Diagnostic Classification Model for Document Processing Skills. Defense Technical Information Center, 1993. http://dx.doi.org/10.21236/ada273790.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Han, Euihong, and George Karypis. Centroid-Based Document Classification Algorithms: Analysis & Experimental Results. Defense Technical Information Center, 2000. http://dx.doi.org/10.21236/ada439538.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Masci, Pietro, Bernardo Barros Weaver, and André Medici. The Relationship Between Insurance and Entrepreneurship in Brazil: Concepts and Basic Data. Inter-American Development Bank, 2007. http://dx.doi.org/10.18235/0006581.

Full text
Abstract:
This document presents data related to insurance and entrepreneurship in the states of Brazil over a period of 10 years. The data is accompanied by brief discussions of definitions of terms, ranging from the definition of startups and Small and Medium-Sized Enterprises (SMEs) to the classification of insurance contracts by types of risk.
APA, Harvard, Vancouver, ISO, and other styles
5

Huff, Karen, Robert McDougall, Terrie Walmsley, and Angel Aguiar. Contributing Input-Output Tables to the GTAP Data Base. GTAP Technical Paper, 2000. http://dx.doi.org/10.21642/gtap.tp01.

Full text
Abstract:
This document is written for those who wish to contribute to the GTAP data base, whether by providing an input-output table for a country not separately represented in the data base, or by updating the table for a region that is already represented. It provides specifications and advice on the structure of the table, sectoral classification, treatment of imports, and other key points. It also describes what we at the Center for Global Trade Analysis do once we receive your table. This version has been revised for use by contributors to release 11 of the GTAP data base. In particular, all concordances are to the revised GTAP sectoral classification.
APA, Harvard, Vancouver, ISO, and other styles
6

Chepeliev, Maksym. The GTAP 10A Data Base with Agricultural Production Targeting Based on the Food and Agricultural Organization (FAO) Data. GTAP Research Memoranda, 2020. http://dx.doi.org/10.21642/gtap.rm35.

Full text
Abstract:
This document describes a new source of inputs, based on FAO data, that allows us to estimate agricultural output targets for 133 regions of the GTAP 10A Data Base. This approach overcomes several limitations present under the current agricultural production targeting (APT) processing. First, a significant expansion in the regional coverage is achieved, as the number of regions undergoing APT more than doubles. Second, the detailed commodity classification of the FAO dataset allows for a more accurate mapping to the GTAP Data Base sectors. Third, better commodity coverage in the FAO data prevents the issue of mapping processed commodities to the corresponding primary sector. Finally, reliance on the FAO agricultural output data provides a better opportunity for further incorporation of nutritional accounts into the GTAP Data Base, by lowering inconsistencies between GTAP and FAO agricultural accounting. Comparisons between OECD-based agricultural output (currently used in the GTAP Data Base) and FAO-derived estimates are provided in the document. FAO-based agricultural production targets are incorporated into the GTAP 10A Data Base build stream to produce a special release of the GTAP Data Base. JEL classification: C68, D57, D58, Q10, Q11. Keywords: Agricultural production targeting, GTAP Data Base, Computable general equilibrium.
APA, Harvard, Vancouver, ISO, and other styles
7

Ackerley, N., A. L. Bird, M. Kolaj, H. Kao, and M. Lamontagne. Procedures for seismic event type discrimination at the Canadian Hazards Information Service. Natural Resources Canada/CMSS/Information Management, 2022. http://dx.doi.org/10.4095/329613.

Full text
Abstract:
Within a catalogue of seismic events, it is necessary to distinguish natural tectonic earthquakes from seismic events due to human activity or other natural processes. This becomes very important when the data are incorporated into models of seismic hazard, since natural and anthropogenic events follow different recurrence and scaling laws. This document outlines a two-step procedure whereby first, a most likely event type is identified, and second, confirmation or refutation is sought. The procedure is intended to be compatible with current and past practices at the Canadian Hazards Information Service and the Geological Survey of Canada in assigning event types in the National Earthquake Database (NEDB). Furthermore, this document presents a new nomenclature and coding system for event types and their certainty, one that is compatible with QuakeML. Detailed classification criteria are given for all common event types; for rare event types, only definitions and examples are given.
APA, Harvard, Vancouver, ISO, and other styles
8

Zio, Enrico, and Nicola Pedroni. Uncertainty characterization in risk analysis for decision-making practice. Fondation pour une culture de sécurité industrielle, 2012. http://dx.doi.org/10.57071/155chr.

Full text
Abstract:
This document provides an overview of sources of uncertainty in probabilistic risk analysis. For each phase of the risk analysis process (system modeling, hazard identification, estimation of the probability and consequences of accident sequences, risk evaluation), the authors describe and classify the types of uncertainty that can arise. The document provides: a description of the risk assessment process, as used in hazardous industries such as nuclear power and offshore oil and gas extraction; a classification of sources of uncertainty (both epistemic and aleatory) and a description of techniques for uncertainty representation; a description of the different steps involved in a Probabilistic Risk Assessment (PRA) or Quantitative Risk Assessment (QRA), and an analysis of the types of uncertainty that can affect each of these steps; annexes giving an overview of a number of tools used during probabilistic risk assessment, including the HAZID technique, fault trees and event tree analysis.
APA, Harvard, Vancouver, ISO, and other styles
9

Evans, Julie, Kendra Sikes, and Jamie Ratchford. Vegetation classification at Lake Mead National Recreation Area, Mojave National Preserve, Castle Mountains National Monument, and Death Valley National Park: Final report (Revised with Cost Estimate). National Park Service, 2020. http://dx.doi.org/10.36967/nrr-2279201.

Full text
Abstract:
Vegetation inventory and mapping is a process to document the composition, distribution and abundance of vegetation types across the landscape. The National Park Service’s (NPS) Inventory and Monitoring (I&M) program has determined vegetation inventory and mapping to be an important resource for parks; it is one of 12 baseline inventories of natural resources to be completed for all 270 national parks within the NPS I&M program. The Mojave Desert Network Inventory & Monitoring (MOJN I&M) began its process of vegetation inventory in 2009 for four park units as follows: Lake Mead National Recreation Area (LAKE), Mojave National Preserve (MOJA), Castle Mountains National Monument (CAMO), and Death Valley National Park (DEVA). Mapping is a multi-step and multi-year process involving skills and interactions of several parties, including NPS, with a field ecology team, a classification team, and a mapping team. This process allows for compiling existing vegetation data, collecting new data to fill in gaps, and analyzing the data to develop a classification that then informs the mapping. The final products of this process include a vegetation classification, ecological descriptions and field keys of the vegetation types, and geospatial vegetation maps based on the classification. In this report, we present the narrative and results of the sampling and classification effort. In three other associated reports (Evens et al. 2020a, 2020b, 2020c) are the ecological descriptions and field keys. The resulting products of the vegetation mapping efforts are, or will be, presented in separate reports: mapping at LAKE was completed in 2016, mapping at MOJA and CAMO will be completed in 2020, and mapping at DEVA will occur in 2021. The California Native Plant Society (CNPS) and NatureServe, the classification team, have completed the vegetation classification for these four park units, with field keys and descriptions of the vegetation types developed at the alliance level per the U.S. National Vegetation Classification (USNVC). We have compiled approximately 9,000 existing and new vegetation data records into digital databases in Microsoft Access. The resulting classification and descriptions include approximately 105 alliances and landform types, and over 240 associations. CNPS also has assisted the mapping teams during map reconnaissance visits, follow-up on interpreting vegetation patterns, and general support for the geospatial vegetation maps being produced. A variety of alliances and associations occur in the four park units. Per park, the classification represents approximately 50 alliances at LAKE, 65 at MOJA and CAMO, and 85 at DEVA. Several riparian alliances or associations that are somewhat rare (ranked globally as G3) include shrublands of Pluchea sericea, meadow associations with Distichlis spicata and Juncus cooperi, and woodland associations of Salix laevigata and Prosopis pubescens along playas, streams, and springs. Other rare to somewhat rare types (G2 to G3) include shrubland stands with Eriogonum heermannii, Buddleja utahensis, Mortonia utahensis, and Salvia funerea on rocky calcareous slopes that occur sporadically in LAKE to MOJA and DEVA. Types that are globally rare (G1) include the associations of Swallenia alexandrae on sand dunes and Hecastocleis shockleyi on rocky calcareous slopes in DEVA.
Two USNVC vegetation groups hold the highest number of alliances: 1) Warm Semi-Desert Shrub & Herb Dry Wash & Colluvial Slope Group (G541) has nine alliances, and 2) Mojave Mid-Elevation Mixed Desert Scrub Group (G296) has thirteen alliances. These two groups contribute significantly to the diversity of vegetation along alluvial washes and mid-elevation transition zones.
APA, Harvard, Vancouver, ISO, and other styles
10

Bekkerman, Ron, Koji Eguchi, and James Allan. Unsupervised Non-topical Classification of Documents. Defense Technical Information Center, 2006. http://dx.doi.org/10.21236/ada478733.

Full text
APA, Harvard, Vancouver, ISO, and other styles