Academic literature on the topic 'Working with a text document'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Working with a text document.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Journal articles on the topic "Working with a text document"

1

Patil, Harsha, and Ramjeevan Singh Thakur. "A semantic approach for text document clustering using frequent itemsets and WordNet." International Journal of Engineering & Technology 7, no. 2.18 (2018): 102. http://dx.doi.org/10.14419/ijet.v7i2.9.10220.

Full text
Abstract:
Document clustering is an unsupervised method for grouping documents into clusters on the basis of their similarity. A document is placed in a specific cluster according to a membership score, which is calculated through a membership function. However, many traditional clustering algorithms are based only on a bag-of-words (BOW) representation, which ignores the semantic similarity between a document and a cluster. In this research, we consider the semantic association between a cluster and a text document when calculating the membership score of a document for a specific cluster. Several researchers are working on semantic aspects of document clustering to improve clustering performance, and external knowledge bases such as WordNet, Wikipedia, and Lucene are utilized for this purpose. The proposed approach exploits WordNet to improve the cluster membership function. The experimental results show that clustering quality improves significantly under the proposed semantic framework.
APA, Harvard, Vancouver, ISO, and other styles
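
As a rough illustration of the membership-scoring idea described in this abstract, the sketch below (not code from the paper; the libraries, the synonym-expansion strategy and the example data are assumptions) scores documents against cluster descriptions with cosine similarity over TF-IDF vectors after expanding both with WordNet synonyms.

```python
# Hedged sketch (not the paper's code): cosine-based cluster membership over
# TF-IDF vectors, with WordNet synonym expansion to soften lexical mismatch.
from nltk.corpus import wordnet as wn                      # needs nltk + wordnet data
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def expand_with_synonyms(text):
    """Append WordNet synonyms so semantically related texts overlap lexically."""
    tokens = text.lower().split()
    extra = []
    for tok in tokens:
        for syn in wn.synsets(tok)[:2]:                    # a couple of senses per word
            extra.extend(lemma.name().replace("_", " ") for lemma in syn.lemmas())
    return " ".join(tokens + extra)

docs = ["the court delivered its verdict", "judges ruled on the case"]   # toy documents
clusters = {"legal": "law judge court trial verdict"}                    # toy cluster description

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform([expand_with_synonyms(d) for d in docs] +
                                  [expand_with_synonyms(c) for c in clusters.values()])
doc_vectors, cluster_vectors = matrix[:len(docs)], matrix[len(docs):]

# Membership score of each document for each cluster (higher = stronger membership).
print(cosine_similarity(doc_vectors, cluster_vectors))
```

A bag-of-words-only baseline would simply drop the expand_with_synonyms step, which is the lexical-mismatch problem the authors address.
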
2

Ülker, Mehtap, and A. Bedri Özer. "The Bart-based Model for Scientific Articles Summarization." JUCS - Journal of Universal Computer Science 30, no. 13 (2024): 1807–28. https://doi.org/10.3897/jucs.115121.

Full text
Abstract:
With the development of deep learning techniques, many models have been proposed for abstractive text summarization. However, the problem of summarizing source documents while preserving their integrity persists due to token restrictions and the inability to adequately extract semantic word relations between different sentences. To overcome this problem, a fine-tuned BART-based model was proposed, which generates a scientific summary by selecting important words contained in the input document. The input text consists of terminology and keywords from the source document. The proposed model is based on the working principle of graph-based methods. Thus, the proposed model can summarize the source document with as few words as possible that are relevant to the content. The proposed model was compared with baseline models and with the results of human evaluation. The experimental results demonstrate that the proposed model outperforms the baseline methods with a ROUGE-L score of 37.60.
APA, Harvard, Vancouver, ISO, and other styles
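
For readers who want to experiment with the kind of BART-based generation the abstract refers to, here is a minimal, hedged sketch using the Hugging Face transformers library; the checkpoint, the keyword-prefixed input and all example strings are assumptions, not the authors' fine-tuned model.

```python
# Illustrative sketch only: off-the-shelf BART summarization with Hugging Face
# transformers. The paper fine-tunes its own model; the checkpoint and the
# keyword-prefixed input below are assumptions made for demonstration.
from transformers import BartForConditionalGeneration, BartTokenizer

model_name = "facebook/bart-large-cnn"                     # generic summarization checkpoint
tokenizer = BartTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name)

keywords = "abstractive summarization; BART; scientific articles"   # terminology from the source
source_text = "..."                                        # full text of the article to summarize
inputs = tokenizer(keywords + " " + source_text, return_tensors="pt",
                   truncation=True, max_length=1024)

summary_ids = model.generate(**inputs, num_beams=4, min_length=40, max_length=160)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```
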
3

Zagorodnikov, Mikhail Viktorovich, and Andrey Anatolevich Mikhaylov. "Recovering Text Layer from PDF Documents with Complex Background." Proceedings of the Institute for System Programming of the RAS 36, no. 3 (2024): 189–202. http://dx.doi.org/10.15514/ispras-2024-36(3)-13.

Full text
Abstract:
The article considers PDF as a tool for storing and transferring documents. Special attention is paid to the problem of converting data from PDF back to its original format. The relevance of the study is due to the widespread use of PDF in electronic document management of modern organizations. However, despite the convenience of using PDF, extracting information from such documents can be difficult due to the peculiarities of information storage in the format and the lack of effective tools for reverse conversion. The paper proposes a solution based on the analysis of the text information from the output stream of the PDF format. This allows automatic recognition of text in PDF documents, even if they contain non-standard fonts, complex backgrounds, or damaged encoding. The research is of interest to specialists in the field of electronic document management, as well as software developers involved in creating tools for working with PDF.
APA, Harvard, Vancouver, ISO, and other styles
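
A baseline for the text-layer recovery discussed above can be sketched with an off-the-shelf extractor; the paper goes further by analysing the PDF content stream itself, so this snippet (library choice and file name are assumptions) only shows the starting point such work improves on.

```python
# Hedged sketch: recovering the text layer of a PDF with pdfminer.six.
# The article analyzes the PDF content stream itself; this only shows the
# baseline behaviour that breaks down with complex backgrounds and fonts.
from pdfminer.high_level import extract_text

text = extract_text("report.pdf")   # hypothetical input file
print(text[:500])                   # first 500 characters of the recovered text layer
```
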
4

Hasan, Ismael, Javier Parapar, and Álvaro Barreiro. "Improving the Extraction of Text in PDFs by Simulating the Human Reading Order." JUCS - Journal of Universal Computer Science 18, no. 5 (2012): 623–49. https://doi.org/10.3217/jucs-018-05-0623.

Full text
Abstract:
Text preprocessing and segmentation are critical tasks in search and text mining applications. Due to the huge number of documents that are available exclusively in PDF format, most Data Mining (DM) and Information Retrieval (IR) systems must extract content from PDF files. On some occasions this is a difficult task: the result of the extraction process from a PDF file is plain text, and it should be returned in the same order as a human would read the original PDF file. However, current tools for PDF text extraction fail in this objective when working with complex documents with multiple columns. For instance, this is the case of official government bulletins with legal information. In this task, it is mandatory to obtain correct and ordered text as a result of applying the PDF extractor. It is very common that a legal article in a document refers to a previous article, so they should be offered in the right sequential order. To overcome these difficulties we have designed a new method for the extraction of text in PDFs that simulates the human reading order. We evaluated our method and compared it against other PDF extraction tools and algorithms. Evaluation of our approach shows that it significantly outperforms the results of the existing tools and algorithms.
APA, Harvard, Vancouver, ISO, and other styles
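
The reading-order problem described in this abstract can be illustrated with a naive two-column heuristic built on pdfminer.six layout objects; this is a hedged toy sketch (the column split, file name and library choice are assumptions), not the authors' method.

```python
# Hedged toy heuristic for two-column reading order using pdfminer.six layout
# objects; the paper's method is more sophisticated, this only illustrates why
# plain extraction order fails on multi-column documents.
from pdfminer.high_level import extract_pages
from pdfminer.layout import LTTextContainer

def page_text_in_reading_order(page, column_split=300):
    """Group text boxes into left/right columns, then read each top to bottom."""
    boxes = [b for b in page if isinstance(b, LTTextContainer)]
    left = [b for b in boxes if b.x0 < column_split]
    right = [b for b in boxes if b.x0 >= column_split]
    ordered = sorted(left, key=lambda b: -b.y1) + sorted(right, key=lambda b: -b.y1)
    return "\n".join(b.get_text() for b in ordered)

for page in extract_pages("bulletin.pdf"):                 # hypothetical two-column bulletin
    print(page_text_in_reading_order(page))
```
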
5

Bakar, Abu, Raheem Sarwar, Saeed-Ul Hassan, and Raheel Nawaz. "Extracting Algorithmic Complexity in Scientific Literature for Advance Searching." Journal of Computational and Applied Linguistics 1 (July 18, 2023): 39–65. https://doi.org/10.33919/jcal.23.1.2.

Full text
Abstract:
Non-textual document elements such as charts, diagrams, algorithms and tables play an important role in presenting key information in scientific documents. Recent advances in information retrieval systems tap this information to answer more complex user queries by mining text pertaining to non-textual document elements from full text. Algorithms are critically important in computer science. Researchers are working on existing algorithms to improve them for critical applications. Moreover, new algorithms for unsolved and newly faced problems are under development. These enhanced and new algorithms are mostly published in scholarly documents. The complexity of these algorithms is also discussed in the same documents by the authors. The complexity of an algorithm is likewise an important factor for information retrieval (IR) systems. In this paper, we mine the relevant complexities of algorithms from full-text documents by comparing the metadata of the algorithm, such as caption and function name, with the context of the paragraph in which the authors discuss complexity. Using a dataset of 256 documents downloaded from the CiteSeerX repository, we manually annotate 417 links between algorithms and their complexities. Further, we apply our novel rule-based approach that identifies the desired links with 81% precision, 75% recall, 78% F1-score and 65% accuracy. Overall, our method of identifying the links has the potential to improve information retrieval systems that tap the advancements of full text and, more specifically, non-textual document elements.
APA, Harvard, Vancouver, ISO, and other styles
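
A minimal, hedged sketch of the kind of rule-based linking the abstract describes (the regular expression, the linking rule and the example strings are assumptions, not the paper's actual rules) might look like this:

```python
# Hedged sketch of rule-based linking between an algorithm's caption metadata
# and Big-O complexity expressions found in a paragraph.
import re

COMPLEXITY = re.compile(r"O\(\s*[^)]+\s*\)")               # e.g. O(n log n), O(n^2)

paragraph = ("Algorithm 2 (FastMatch) runs in O(n log n) time and O(n) space, "
             "improving on the O(n^2) baseline.")
algorithm_caption = "Algorithm 2: FastMatch"

# Link rule: a complexity mention counts if the paragraph also names the algorithm.
if any(token in paragraph for token in algorithm_caption.replace(":", " ").split()):
    print(algorithm_caption, "->", COMPLEXITY.findall(paragraph))
```
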
6

Hahnel, Carolin, Cornelia Schoor, Ulf Kroehne, Frank Goldhammer, Nina Mahlow, and Cordula Artelt. "The role of cognitive load in university students' comprehension of multiple documents." Zeitschrift für Pädagogische Psychologie 33, no. 2 (2019): 105–18. http://dx.doi.org/10.1024/1010-0652/a000238.

Full text
Abstract:
The study investigates the cognitive load of students working on tasks that require the comprehension of multiple documents (Multiple Document Comprehension, MDC). In a sample of 310 students, perceived task difficulty (PD) and mental effort (ME) were examined in terms of task characteristics, individual characteristics, and students' processing behavior. Moreover, it was investigated whether PD and ME can still contribute to MDC while controlling for these variables. The perceived difficulty of the task was shown to be related to the number of documents, text length, study level, and sourcing. Mental effort was predicted by text length, study level, and processing time. When including these variables as covariates, cognitive load was incrementally predictive of MDC. The results are discussed in terms of how working memory resources can shape the process of comprehending multiple documents.
APA, Harvard, Vancouver, ISO, and other styles
7

Javed, Hira, Nadeem Akhtar, and M. M. Sufyan Beg. "Multimodal news document summarization." Journal of Information and Optimization Sciences 45, no. 4 (2024): 959–68. http://dx.doi.org/10.47974/jios-1619.

Full text
Abstract:
With the increase in multimedia content, the domain of multimodal processing is experiencing constant growth. The question of whether combining these modalities is beneficial may come up. In this work, we investigate this by working on multimodal content to obtain quality summaries. We have conducted several experiments on the extractive summarization process employing asynchronous text, audio, image, and video. Information present in the multimedia content has been leveraged to bridge the semantic gaps between different modes. Vision Transformers and BERT have been used for the image-matching and similarity-checking tasks. Furthermore, audio transcriptions have been used for incorporating the audio information in the summaries. The obtained news summaries have been evaluated with the ROUGE score and a comparative analysis has been done.
APA, Harvard, Vancouver, ISO, and other styles
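
The ROUGE evaluation mentioned in this abstract can be reproduced in miniature with the rouge-score package (an assumption; the paper does not name its tooling), as in this hedged sketch:

```python
# Hedged sketch: scoring a generated summary against a reference with the
# rouge-score package; the example sentences are invented.
from rouge_score import rouge_scorer

reference = "Floods displaced thousands of residents across the coastal region."
generated = "Thousands of residents were displaced by floods along the coast."

scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
scores = scorer.score(reference, generated)
print(scores["rouge1"].fmeasure, scores["rougeL"].fmeasure)
```
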
8

K. Adi Narayana Reddy. "Multi-Document Summarization using Discrete Bat Optimization." Journal of Electrical Systems 20, no. 7s (2024): 831–42. http://dx.doi.org/10.52783/jes.3457.

Full text
Abstract:
With the World Wide Web, we now have a wide range of data that was previously unavailable. Therefore, finding useful information in large datasets has become a complex problem. In recent years, text summarization has emerged as a viable option for mining relevant data from massive collections of texts. We may classify summarization as either "single document" or "multi document" depending on how many source documents we are working with. Finding an accurate summary from a collection of documents is more difficult for researchers than doing so from a single document. For this reason, this research proposes a Discrete Bat Algorithm Optimization based multi-document summarizer (DBAT-MDS) to tackle the issue of multi-document summarization. Comparisons are made between the proposed DBAT-MDS based model and three different summarization algorithms that take their inspiration from the natural world. All methods are evaluated on the benchmark Document Understanding Conference (DUC) datasets using a variety of criteria, such as the ROUGE score and the F score. Compared to the other summarizers used in the experiment, the suggested method performs much better.
APA, Harvard, Vancouver, ISO, and other styles
9

Ovchinnikova, Irina Germanovna. "Working on Computer-Assisted Translation platforms: New advantages and new mistakes." Russian Journal of Linguistics 23, no. 2 (2019): 544–61. http://dx.doi.org/10.22363/2312-9182-2019-23-2-544-561.

Full text
Abstract:
The paper presents an analysis of errors in translation on the CAT platform Smartcat, which accumulates all tools for computer-assisted translation (CAT), including a machine translation (MT) system and translation memory (TM). The research is conducted on the material of translation on the Smartcat platform (a joint project of a tourist guide translation (35,000 words) from Hebrew to Russian, English, and French). The errors on the CAT platform disclose difficulties in mastering text semantic coherence and stylistic features. The influence of English as a lingua franca appears in peculiar orthographic and punctuation errors in the target text in Russian. Peculiar errors in translation on the CAT platform reveal the necessity of advanced technological competence in translators. The peculiar errors uncover problems associated with source text segmentation into sentences. The segmentation can trigger a translator to preserve the sentence boundaries and use a complicated Russian compound sentence that provokes punctuation errors. Difficulties of anaphora resolution in distant semantically coherent segments are also associated with the source text segmentation and working window formatting. A joint project presupposes that different translators translate different files of the source document. To ensure the coherence, contiguity and integrity of the whole document, the files have to be revised by a third-party editor to avoid conflict of interest. The editor-reviser is also responsible for improving the target text's pragmatic and genre characteristics while applying a top-down strategy to target text analysis. Thus, the translator's errors while applying CAT tools reveal the effect of bottom-up text processing alongside cross-language interference.
APA, Harvard, Vancouver, ISO, and other styles
More sources

Dissertations / Theses on the topic "Working with a text document"

1

Tomitch, Leda Maria Braga. "Reading: text organization perception and working memory capacity." Repositório Institucional da UFSC, 1995. https://repositorio.ufsc.br/xmlui/handle/123456789/157902.

Full text
Abstract:
Doctoral thesis - Universidade Federal de Santa Catarina, Centro de Comunicação e Expressão, 1995. An analysis of the processing of more proficient and less proficient readers during the reading of complete and incomplete texts organized in terms of Problem/Solution (Hoey, 1979) and Prediction (Tadros, 1985). The main argument is that more proficient readers are better able to perceive aspects of textual organization and use those aspects to organize the flow of information during reading, thereby not overloading working memory. Two experiments were conducted. The first investigates the correlation between working memory capacity and reading comprehension. The second investigates the use of textual features by more and less proficient readers. In the first experiment, readers were divided into two groups, more proficient and less proficient, according to the mean of the results obtained in the comprehension tasks. In the second experiment, participants read five texts: 'complete problem/solution', 'complete prediction', 'no solution', 'no problem' and 'distorted prediction'. Regarding the first experiment, significant correlations were found between working memory capacity and the comprehension tasks. Regarding the second experiment, the results indicated that the more proficient readers, who also had greater memory capacity, were better able to make use of aspects of textual organization than the less proficient readers, who also had lower memory capacity. The present study indicates that processing efficiency is an important component in the relationship between working memory capacity and reading.
APA, Harvard, Vancouver, ISO, and other styles
2

El-Haj, Mahmoud. "Multi-document Arabic text summarisation." Thesis, University of Essex, 2012. http://eprints.lancs.ac.uk/71279/.

Full text
Abstract:
Multi-document summarisation is the process of producing a single summary of a collection of related documents. Much of the current work on multi-document text summarisation is concerned with the English language; relevant resources are numerous and readily available. These resources include human generated (gold-standard) and automatic summaries. Arabic multi-document summarisation is still in its infancy. One of the obstacles to progress is the limited availability of Arabic resources to support this research. When we started our research there were no publicly available Arabic multi-document gold-standard summaries, which are needed to automatically evaluate system generated summaries. The Document Understanding Conference (DUC) and Text Analysis Conference (TAC) at that time provided resources such as gold-standard extractive and abstractive summaries (both human and system generated) that were only available in English. Our aim was to push forward the state-of-the-art in Arabic multi-document summarisation. This required advancements in at least two areas. The first area was the creation of Arabic test collections. The second area was concerned with the actual summarisation process to find methods that improve the quality of Arabic summaries. To address both points we created single and multi-document Arabic test collections both automatically and manually using a commonly used English dataset and by having human participants. We developed extractive language dependent and language independent single and multi-document summarisers, both for Arabic and English. In our work we provided state-of-the-art approaches for Arabic multi-document summarisation. We succeeded in including Arabic in one of the leading summarisation conferences the Text Analysis Conference (TAC). Researchers on Arabic multi-document summarisation now have resources and tools that can be used to advance the research in this field.
APA, Harvard, Vancouver, ISO, and other styles
3

Li, Yanjun. "High Performance Text Document Clustering." Wright State University / OhioLINK, 2007. http://rave.ohiolink.edu/etdc/view?acc_num=wright1181005422.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Sendur, Zeynel. "Text Document Categorization by Machine Learning." Scholarly Repository, 2008. http://scholarlyrepository.miami.edu/oa_theses/209.

Full text
Abstract:
Because of the explosion of digital and online text information, automatic organization of documents has become a very important research area. There are mainly two machine learning approaches to enhance the organization of digital documents. One is the supervised approach, where pre-defined category labels are assigned to documents based on the likelihood suggested by a training set of labeled documents; the other is the unsupervised approach, where there is no need for human intervention or labeled documents at any point in the whole process. In this thesis, we concentrate on the supervised learning task, which deals with document classification. One of the most important tasks of information retrieval is to induce classifiers capable of categorizing text documents. The same document can belong to two or more categories, and this situation is referred to by the term multi-label classification. Multi-label classification domains have been encountered in diverse fields. Most of the existing machine learning techniques for multi-label classification are extremely expensive since the documents are characterized by an extremely large number of features. In this thesis, we try to reduce these computational costs by applying different types of algorithms to documents characterized by a large number of features. Another important goal of this thesis is to achieve the highest possible accuracy while maintaining high computational performance on text document categorization.
APA, Harvard, Vancouver, ISO, and other styles
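
A hedged, minimal example of the multi-label text categorization setting discussed in this abstract, using scikit-learn (the models, features and toy data are assumptions and do not reflect the thesis experiments):

```python
# Hedged sketch of multi-label text categorization with scikit-learn; the toy
# data, features and classifier are assumptions, not the thesis experiments.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

docs = ["stock markets fell sharply today",
        "the striker scored twice in the final",
        "the central bank raised rates before the world cup"]
labels = [{"finance"}, {"sports"}, {"finance", "sports"}]   # a document may carry several labels

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)

clf = make_pipeline(TfidfVectorizer(),
                    OneVsRestClassifier(LogisticRegression(max_iter=1000)))
clf.fit(docs, Y)
print(mlb.inverse_transform(clf.predict(["rate hike worries investors"])))
```
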
5

Cripwell, Liam. "Controllable and Document-Level Text Simplification." Electronic Thesis or Diss., Université de Lorraine, 2023. http://www.theses.fr/2023LORR0186.

Full text
Abstract:
Text simplification is a task that involves rewriting a text to make it easier to read and understand for a wider audience, while still expressing the same core meaning. This has potential benefits for disadvantaged end-users (e.g. non-native speakers, children, the reading impaired), while also showing promise as a preprocessing step for downstream NLP tasks. Recent advancements in neural generative models have led to the development of systems that are capable of producing highly fluent outputs. However, these end-to-end systems often rely on training corpora to implicitly learn how to perform the necessary rewrite operations. In the case of simplification, these datasets are lacking in both quantity and quality, with most corpora either being very small, automatically constructed, or subject to strict licensing agreements. As a result, many systems tend to be overly conservative, often making no changes to the original text or being limited to the paraphrasing of short word sequences without substantial structural modifications. Furthermore, most existing work on text simplification is limited to sentence-level inputs, with attempts to iteratively apply these approaches to document-level simplification failing to coherently preserve the discourse structure of the document. This is problematic, as most real-world applications of text simplification concern document-level texts. In this thesis, we investigate strategies for mitigating the conservativity of simplification systems while promoting a more diverse range of transformation types. This involves the creation of new datasets containing instances of under-represented operations and the implementation of controllable systems capable of being tailored towards specific transformations and simplicity levels. We later extend these strategies to document-level simplification, proposing systems that are able to consider surrounding document context and use similar controllability techniques to plan which sentence-level operations to perform ahead of time, allowing for both high performance and scalability. Finally, we analyze current evaluation processes and propose new strategies that can be used to better evaluate both controllable and document-level simplification systems.
APA, Harvard, Vancouver, ISO, and other styles
6

Linhares, Pontes Elvys. "Compressive Cross-Language Text Summarization." Thesis, Avignon, 2018. http://www.theses.fr/2018AVIG0232/document.

Full text
Abstract:
The popularization of social networks and digital documents quickly increased the information available on the Internet. However, this huge amount of data cannot be analyzed manually. Natural Language Processing (NLP) analyzes the interactions between computers and human languages in order to process and to analyze natural language data. NLP techniques incorporate a variety of methods, including linguistics, semantics and statistics, to extract entities and relationships and to understand a document. Among several NLP applications, we are interested, in this thesis, in cross-language text summarization, which produces a summary in a language different from the language of the source documents. We also analyzed other NLP tasks (word encoding representation, semantic similarity, sentence and multi-sentence compression) to generate more stable and informative cross-lingual summaries. Most NLP applications (including all types of text summarization) use a kind of similarity measure to analyze and to compare the meaning of words, chunks, sentences and texts in their approaches. A way to analyze this similarity is to generate a representation for these sentences that contains their meaning. The meaning of sentences is defined by several elements, such as the context of words and expressions, the order of words and the previous information. Simple metrics, such as the cosine metric and the Euclidean distance, provide a measure of similarity between two sentences; however, they do not analyze the order of words or multi-word expressions. Analyzing these problems, we propose a neural network model that combines recurrent and convolutional neural networks to estimate the semantic similarity of a pair of sentences (or texts) based on the local and general contexts of words. Our model predicted better similarity scores than baselines by better analyzing the local and general meanings of words and multi-word expressions. In order to remove redundancies and non-relevant information from similar sentences, we propose a multi-sentence compression method that compresses similar sentences by fusing them into correct and short compressions that contain the main information of these similar sentences. We model clusters of similar sentences as word graphs. Then, we apply an integer linear programming model that guides the compression of these clusters based on a list of keywords. We look for a path in the word graph that has good cohesion and contains the maximum number of keywords. Our approach outperformed baselines by generating more informative and correct compressions for the French, Portuguese and Spanish languages. Finally, we combine these previous methods to build a cross-language text summarization system. Our system is an {English, French, Portuguese, Spanish}-to-{English, French} cross-language text summarization framework that analyzes the information in both languages to identify the most relevant sentences. Inspired by the compressive text summarization methods in monolingual analysis, we adapt our multi-sentence compression method for this problem to keep just the main information. Our system proves to be a good alternative to compress redundant information and to preserve relevant information. Our system improves informativeness scores without losing grammatical quality for French-to-English cross-lingual summaries. Analyzing {English, French, Portuguese, Spanish}-to-{English, French} cross-lingual summaries, our system significantly outperforms extractive baselines in the state of the art for all these languages. In addition, we analyze the cross-language text summarization of transcript documents. Our approach achieved better and more stable scores even for these documents, which have grammatical errors and inaccurate or missing information.
APA, Harvard, Vancouver, ISO, and other styles
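
The word-graph view of multi-sentence compression described in this abstract can be illustrated with a toy sketch (hedged: it omits the integer linear programming and keyword constraints the thesis uses, and the example sentences are invented):

```python
# Hedged toy word-graph compression: fuse similar sentences by taking the
# cheapest start-to-end path; frequent word transitions get cheap edges.
# The thesis additionally uses ILP and keyword constraints, omitted here.
from collections import Counter
import networkx as nx

sentences = ["the prime minister announced new measures on monday",
             "new measures were announced by the prime minister"]

transition_counts = Counter()
for sentence in sentences:
    tokens = ["<start>"] + sentence.split() + ["<end>"]
    transition_counts.update(zip(tokens, tokens[1:]))

G = nx.DiGraph()
for (a, b), count in transition_counts.items():
    G.add_edge(a, b, weight=1.0 / count)       # shared wording becomes cheap to traverse

path = nx.shortest_path(G, "<start>", "<end>", weight="weight")
print(" ".join(path[1:-1]))                    # one short fused "compression"
```
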
7

Tran, Charles. "Intelligent document format, a text encoding scheme." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1997. http://www.collectionscanada.ca/obj/s4/f2/dsk3/ftp04/mq20956.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Langiu, Alessio. "Optimal Parsing for dictionary text compression." Thesis, Paris Est, 2012. http://www.theses.fr/2012PEST1091/document.

Full text
Abstract:
Dictionary-based compression algorithms include a parsing strategy to transform the input text into a sequence of dictionary phrases. Given a text, such a process usually is not unique and, for compression purposes, it makes sense to find, among the possible parsings, one that minimizes the final compression ratio. This is the parsing problem. An optimal parsing is a parsing strategy or a parsing algorithm that solves the parsing problem while taking account of all the constraints of a compression algorithm or of a class of homogeneous compression algorithms. Compression algorithm constraints are, for instance, the dictionary itself, i.e. the dynamic set of available phrases, and how much a phrase weighs on the compressed text, i.e. the length of the codeword that represents such a phrase, also denoted as the cost of a dictionary pointer encoding. In more than 30 years of history of dictionary-based text compression, while plenty of algorithms, variants and extensions appeared and while this approach to text compression became one of the most appreciated and utilized in almost all storage and communication processes, only a few optimal parsing algorithms were presented. Many compression algorithms still lack optimality of their parsing or, at least, a proof of optimality. This happens because there is no general model of the parsing problem that includes all the dictionary-based algorithms and because the existing optimal parsings work under too restrictive hypotheses. This work focuses on the parsing problem and presents both a general model for dictionary-based text compression, called the Dictionary-Symbolwise theory, and a general parsing algorithm that is proved to be optimal under some realistic hypotheses. This algorithm is called Dictionary-Symbolwise Flexible Parsing and it covers almost all the cases of dictionary-based text compression algorithms together with the large class of their variants where the text is decomposed into a sequence of symbols and dictionary phrases. In this work we further consider the case of a free mixture of a dictionary compressor and a symbolwise compressor. Our Dictionary-Symbolwise Flexible Parsing covers this case as well. We have indeed an optimal parsing algorithm in the case of dictionary-symbolwise compression where the dictionary is prefix-closed and the cost of encoding a dictionary pointer is variable. The symbolwise compressor is any classical one that works in linear time, as many common variable-length encoders do. Our algorithm works under the assumption that a special graph, which will be described in the following, is well defined. Even if this condition is not satisfied, it is possible to use the same method to obtain almost optimal parses. In detail, when the dictionary is LZ78-like, we show how to implement our algorithm in linear time. When the dictionary is LZ77-like, our algorithm can be implemented in time O(n log n), where n is the length of the text. Both have O(n) space complexity. Even if the main aim of this work is of a theoretical nature, some experimental results are presented to underline some practical effects of parsing optimality on compression performance, and some more detailed experiments are hosted in a devoted appendix.
APA, Harvard, Vancouver, ISO, and other styles
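
The notion of parsing a text into dictionary phrases, on which the abstract's optimal-parsing problem builds, can be made concrete with a toy greedy LZ78 parse (a hedged illustration of greedy, not optimal, parsing):

```python
# Hedged toy example of greedy LZ78 parsing: the input is split into dictionary
# phrases encoded as (index of longest known prefix, next character). Optimal
# parsing, the subject of the thesis, would instead search over all such parses.
def lz78_parse(text):
    dictionary = {"": 0}
    phrases, current = [], ""
    for ch in text:
        if current + ch in dictionary:
            current += ch                      # keep extending the known phrase
        else:
            phrases.append((dictionary[current], ch))
            dictionary[current + ch] = len(dictionary)
            current = ""
    if current:                                # leftover input equals an existing phrase
        phrases.append((dictionary[current], ""))
    return phrases

print(lz78_parse("abababc"))                   # [(0, 'a'), (0, 'b'), (1, 'b'), (3, 'c')]
```
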
9

Cankaya, Zeynep. "Influence of working memory capacity and reading purpose on young readers' text comprehension." Thesis, McGill University, 2008. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=19247.

Full text
Abstract:
Reading comprehension processes are assumed to be influenced by reading purpose and working memory capacity (WMC). However, it is still unknown how these factors affect comprehension processes in young readers. The aim of this study was to explore whether cognitive processes varied as a function of reading purpose (test versus game) and WMC (high versus low) in young readers. The 39 participants completed the Working Memory Test Battery for Children (WMTB-C), a verbal protocol and a free-recall task. Separate ANOVAs on cognitive process response categories detected medium effect sizes. In the free-recall task, readers in the test condition exhibited more paraphrasing and recalled more idea units than readers in the game condition. In the verbal protocol task, readers in the game condition uttered more evaluative comments than in the test condition. Furthermore, low-WMC readers produced more predictive inferences than the high-WMC group. Possible contributions of reading purpose and WMC to text comprehension for educational practice are discussed.
APA, Harvard, Vancouver, ISO, and other styles
10

Bouayad-Agha, Nadjet. "The role of document structure in text generation." Thesis, University of Brighton, 2001. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.366234.

Full text
APA, Harvard, Vancouver, ISO, and other styles
More sources

Books on the topic "Working with a text document"

1

Cavanaugh, Sean. Digital type design guide: The page designer's guide to working with type. Hayden Books, 1995.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
2

Expo '98 (1998 Lisbon, Portugal). Working document. Expo'98, 1993.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
3

Dudley Mathematics Centre, ed. Calculator guide: Working document. Dudley Mathematics Centre, 1990.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
4

Trent Regional Health Authority. Cancer Registration Bureau, ed. Cancer mortality 1986: Working document. Cancer Registration Bureau, Trent Regional Health Authority, 1988.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
5

Great Britain. Management and Personnel Office. Working patterns: A study document. Cabinet Office, Management and Personnel Office, 1987.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
6

European Community Action Programme: Transition from Education to Working Life, ed. Youth information 1985: Working document. European Community Action Programme, 1985.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
7

Swaziland. First draft forest policy: Working document. Forestry Section, Ministry of Agriculture and Cooperatives, Forest Policy and Legislation Project, 2001.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
8

Papua New Guinea Forest Authority, General Woods & Veneers Consultants International Ltd., Nawitka Resource Consultants Canada, and Papua New Guinea National Forest Service, eds. Forest industries development studies: Working document. General Woods & Veneers Consultants International, 1994.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
9

Organisation for Economic Co-operation and Development. Working Group on Accounting Standards, ed. Availability of financial statements: Working document. Organisation for Economic Co-operation and Development, 1987.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
10

European Community Action Programme: Transition from Education to Working Life, ed. Guidance and the school: Working document. European Community Action Programme, 1987.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
More sources

Book chapters on the topic "Working with a text document"

1

Mortelmans, Dimitri. "Working with Memos." In Springer Texts in Social Sciences. Springer Nature Switzerland, 2024. http://dx.doi.org/10.1007/978-3-031-66014-6_7.

Full text
Abstract:
Memos are personal instruments that a qualitative researcher generally uses for themself or in team research, whereby ideas are written down when they arise during the analytical process. Memos enhance the analytical depth of qualitative research by allowing researchers to record insights, reflect on data, and document the analysis process. NVivo contains two memoing tools discussed in this chapter: memos and annotations. The creation of memos and annotations is demonstrated, and the connection of memos to data is shown with the memo link.
APA, Harvard, Vancouver, ISO, and other styles
2

Atkinson-Abutridy, John. "Document Representation." In Text Analytics. Chapman and Hall/CRC, 2022. http://dx.doi.org/10.1201/9781003280996-4.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Atkinson-Abutridy, John. "Document Clustering." In Text Analytics. Chapman and Hall/CRC, 2022. http://dx.doi.org/10.1201/9781003280996-7.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Atkinson-Abutridy, John. "Document Categorization." In Text Analytics. Chapman and Hall/CRC, 2022. http://dx.doi.org/10.1201/9781003280996-9.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Anandarajan, Murugan, Chelsey Hill, and Thomas Nolan. "Term-Document Representation." In Practical Text Analytics. Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-95663-3_5.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Torres-Moreno, Juan-Manuel. "Single-Document Summarization." In Automatic Text Summarization. John Wiley & Sons, Inc., 2014. http://dx.doi.org/10.1002/9781119004752.ch3.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Torres-Moreno, Juan-Manuel. "Evaluating Document Summaries." In Automatic Text Summarization. John Wiley & Sons, Inc., 2014. http://dx.doi.org/10.1002/9781119004752.ch8.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Banchs, Rafael E. "Document Categorization." In Text Mining with MATLAB®. Springer New York, 2012. http://dx.doi.org/10.1007/978-1-4614-4151-9_10.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Banchs, Rafael E. "Document Search." In Text Mining with MATLAB®. Springer New York, 2012. http://dx.doi.org/10.1007/978-1-4614-4151-9_11.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Banchs, Rafael E. "Document Search." In Text Mining with MATLAB®. Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-87695-1_12.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Conference papers on the topic "Working with a text document"

1

Bharadwaj, Tushar, Swarit Ajay, and Sartaj Ahmad. "Harnessing Text Analysis for Automated Document Understanding." In 2024 Second International Conference on Advances in Information Technology (ICAIT). IEEE, 2024. http://dx.doi.org/10.1109/icait61638.2024.10690743.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Kovyazina, Elena. "Working with digital publications: Problems and solutions." In The Book. Culture. Education. Innovations. Russian National Public Library for Science and Technology, 2020. http://dx.doi.org/10.33186/978-5-85638-223-4-2020-117-121.

Full text
Abstract:
The full-text archive of employee publications at a research center makes it possible to keep a record of the publications and to promote them, as well as to ensure the openness of research results to the global community. However, digital documents have a number of specific qualities compared to traditional printed documents. An attempt is made to define the problems of linking and content integrity of digital documents and to find possible solutions.
APA, Harvard, Vancouver, ISO, and other styles
3

Hall, Sawyer, Calahan Mollan, Vijitashwa Pandey, and Zissimos Mourelatos. "TRIZ Mapping and Novelty Detection of Engineering Design Patents Using Machine Learning." In ASME 2022 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference. American Society of Mechanical Engineers, 2022. http://dx.doi.org/10.1115/detc2022-89746.

Full text
Abstract:
Published resources such as technical literature and patent documents are extremely useful in engineering design and form an important input to methods such as TRIZ. Often, design engineers will investigate these resources when working on new design problems. Aside from getting technical information and even direct design solutions, they may find the design principles used in each patent document a useful design stimulus. Unfortunately, patents are not classified based on such "design useful" characterizations. Using unsupervised clustering and Latent Dirichlet Allocation, this paper investigates four hypotheses using engineering patents in informing TRIZ based design. It first investigates the optimal number of TRIZ topics present in a corpus. Using this information, it attempts to map the TRIZ methods to the individual patents using unsupervised machine learning. Both rejected and accepted patents are then tested to determine if an autoencoder can successfully differentiate between the two, just from the text of the document. The autoencoder reconstruction errors of "Vehicle Brake Control" patents are also examined for possible correlation between reconstruction error and patent citation count. Finally, by combining the TRIZ clustering and the trained autoencoder, we show that high reconstruction error patents may be harder to assign to TRIZ methods than low reconstruction error patents.
APA, Harvard, Vancouver, ISO, and other styles
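
As a hedged analogue of the topic-modelling step mentioned in this abstract, the following sketch fits Latent Dirichlet Allocation to a few patent-like sentences with scikit-learn; the corpus, topic count and vectorizer settings are assumptions, not the paper's setup.

```python
# Hedged sketch: Latent Dirichlet Allocation over patent-like snippets with
# scikit-learn; corpus, topic count and vectorizer settings are assumptions.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

patents = ["a brake control unit modulates hydraulic pressure to each wheel",
           "the composite panel is cured under vacuum to reduce weight",
           "a valve assembly regulates brake fluid during emergency stops"]

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(patents)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(X)              # per-document topic mixtures
terms = vectorizer.get_feature_names_out()
for k, component in enumerate(lda.components_):
    top_terms = [terms[i] for i in component.argsort()[-5:]]
    print(f"topic {k}:", top_terms)
```
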
4

Du Bernard, Xavier, Jonathan Gallon, and Jérôme Massot. "The Gaia Explorer, a Powerful Search Platform." In Abu Dhabi International Petroleum Exhibition & Conference. SPE, 2021. http://dx.doi.org/10.2118/207837-ms.

Full text
Abstract:
After two years of development, the GAIA Explorer is now ready to assist geoscientists at Total. This knowledge platform works like a little Google, but with a focus solely on geosciences for the time being. The main goal of the GAIA Explorer is to save time finding the right information. It is therefore particularly useful for datarooms or after business acquisitions to quickly digest the knowledge, but also for feeding databases, exploration syntheses, reservoir studies, or even staff onboarding, especially when working remotely. With this additional time, geoscientists can focus on tasks with added value, such as synthesizing, finding analogies or proposing alternative scenarios. This new companion automatically organizes and extracts knowledge from a large number of unstructured technical documents by using machine learning (ML). All the models rely on the Google Cloud Platform (GCP) and have been trained on our own datasets, which cover the main petroleum domains such as geosciences and operations. First, the layout of more than 75,000 document pages was analyzed to train a segmentation model, which extracts three types of content (text, images and tables). Secondly, the text content extracted from about 6,500 documents labelled amongst 30 classes was used to train a model for document classification. Thirdly, more than 55,000 images were categorized amongst 45 classes to customize a model of image classification covering a large panel of figures such as maps, logs, seismic sections, or core pictures. Finally, all the terms (n-grams) extracted from objects are compared with an in-house thesaurus to automatically tag related topics such as basin, field, geological formation, acquisition, and measure. All these elementary bricks are connected and used to feed a knowledge database that can be searched quickly and exhaustively. Today, the GAIA Explorer searches within texts, images and tables from a corpus (document collection), which can be made up of both technical and operational reports, meeting presentations and academic publications. By combining queries (keywords or natural language) with a large array of filters (by classes and topics), the outcomes are easily refined and exploitable. Since the release of a production version in February 2021 at Total, about 180 users across 30 projects regularly use the tool for exploration and development purposes. This first version follows a continuous training cycle including active learning, and preliminary user feedback is good, acknowledging that some information would have been difficult to locate without the GAIA Explorer. In the future, the GAIA Explorer could be significantly improved by implementing a knowledge graph based on an ontology dedicated to specific petroleum domains. With the help of specialists in related activities such as drilling, projects or contracts, the tool could cover the complete range of upstream topics and become useful for other businesses over time.
APA, Harvard, Vancouver, ISO, and other styles
5

Wang, Shenji, Xiaofeng Fu, Minyou Ye, Jia Li, and Meijing Gong. "The Document Management of Major Scientific Projection of Fusion." In 2013 21st International Conference on Nuclear Engineering. American Society of Mechanical Engineers, 2013. http://dx.doi.org/10.1115/icone21-16853.

Full text
Abstract:
Nuclear energy is becoming increasingly important, both in our current lives and in the future. Considering radiation protection, nuclear safety and security, fusion attracts more and more attention from scientists compared with fission. The International Thermonuclear Experimental Reactor (ITER) is the world's biggest energy research project, aiming to prove the feasibility of fusion power as a possible source of safe, sustainable and abundant energy. China, together with the EU, India, Japan, the Russian Federation, South Korea and the USA, is working on this major research facility. It is an example of international scientific collaboration on an unprecedented scale that will provide the link between plasma physics and engineering and future commercial fusion-based power plants. ITER design and construction have raised a new issue: how the seven members can collaborate across the globe. Document management has therefore become indispensable. This article examines the document management of ITER through its IDM system, analysing the key points of scanning and searching, document operations, security settings, process control and other information. To advance the document management of China's major fusion projects, for example the China Fusion Engineering Test Reactor (CFETR), the first step is to learn from the IDM of ITER and then to address its current problems, such as response speed and operating difficulty. This paper gives a conceptual design solution for the document management system of a major scientific project like CFETR, including document storage, document classification and nomenclature, access control, workflow, and role settings of the system.
APA, Harvard, Vancouver, ISO, and other styles
6

Yadav, Kalpana, Sanjay Yadav, and P. K. Dubey. "Calibration and Uncertainty Assessment of V2 Reference Standard Block." In ASNT Research Symposium 2023. The American Society for Nondestructive Testing Inc., 2023. http://dx.doi.org/10.32548/rs.2023.067.

Full text
Abstract:
In industries and laboratories, reference standard blocks (RSBs) are used to test and ascertain the accuracy of measurement equipment. Before being used in practical applications and testing, an ultrasonic NDT instrument must be verified to be working properly and precisely. These RSBs are manufactured and shaped according to national or international standards. In ultrasonic non-destructive testing (NDT), the RSBs may contain drilled holes, notches and reference dimensions in order to inspect the instruments used in ultrasonic testing (e.g., an ultrasonic flaw detector). In this paper, we focus in particular on the calibration of the International Institute of Welding (IIW) V2 block. The IIW V2 block is used to verify the horizontal linearity and vertical sensitivity of NDT equipment. The calibration of the IIW V2 block is performed as per the technical standard document ISO 7963, and the calibration process also incorporates the clauses of IS 4904, an Indian standard. The measurement uncertainties of various parameters are also estimated in accordance with the specified documents.
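As a rough illustration of the final step mentioned in this abstract, the following sketch combines assumed uncertainty components in quadrature and applies a coverage factor of k = 2, following the general GUM approach; the component names and values are invented and do not reproduce the paper's uncertainty budget.

```python
# Generic, illustrative uncertainty combination; not the paper's actual budget.
import math

components_mm = {
    "reference dimension (certificate)": 0.010,  # assumed values in millimetres
    "measurement repeatability":         0.006,
    "instrument resolution":             0.003,
    "temperature variation":             0.004,
}
u_c = math.sqrt(sum(u**2 for u in components_mm.values()))  # combined standard uncertainty
U = 2 * u_c                                                 # expanded uncertainty, k = 2
print(f"u_c = {u_c:.4f} mm, U (k=2) = {U:.4f} mm")
```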
APA, Harvard, Vancouver, ISO, and other styles
7

Philipps, Axel. "How to sort out uncategorisable documents for interpretive social science? On limits of currently employed text mining techniques." In CARMA 2018 - 2nd International Conference on Advanced Research Methods and Analytics. Universitat Politècnica València, 2018. http://dx.doi.org/10.4995/carma2018.2018.8301.

Full text
Abstract:
Current text mining applications work statistically on the basis of linguistic models and theories and certain parameter settings. This enables researchers to classify, group and rank a large textual corpus – a useful feature for scholars who study all forms of written text. However, these underlying conditions differ from the way interpretively oriented social scientists approach textual data. They aim to understand the meaning of text by heuristically using known categorisations, concepts and other formal methods. More importantly, they are primarily interested in documents that are incomprehensible with our current knowledge, because these documents offer a chance to formulate new empirically grounded typifications, hypotheses and theories. In this paper, therefore, I propose a text mining technique with different aims and procedures. It includes a shift away from methods of grouping and clustering the whole text corpus to a process that sorts out uncategorisable documents. Such an approach is demonstrated using a simple example. While more elaborate text mining techniques might become tools for more complex tasks, the given example merely presents the essence of a possible working principle. As such, it supports social inquiries that search for and examine unfamiliar patterns and regularities.
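The working principle the author describes, setting aside documents that fit no known category instead of forcing them into clusters, can be sketched roughly as follows. This is a minimal illustration under assumed inputs (toy category prototypes, documents and threshold), not the procedure used in the paper.

```python
# Minimal sketch of sorting out uncategorisable documents; inputs are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

category_texts = [
    "grant application budget funding period",   # known category: funding
    "lecture seminar curriculum examination",    # known category: teaching
]
documents = [
    "budget plan for the grant funding period",
    "seminar schedule and examination dates",
    "field notes on an unexpected laboratory ritual",  # fits no known category
]

# Represent categories and documents in one TF-IDF space
vectors = TfidfVectorizer().fit_transform(category_texts + documents)
cat_vecs, doc_vecs = vectors[:len(category_texts)], vectors[len(category_texts):]
similarities = cosine_similarity(doc_vecs, cat_vecs)

THRESHOLD = 0.1  # assumed cut-off below which a document counts as uncategorisable
for doc, best in zip(documents, similarities.max(axis=1)):
    label = "categorisable" if best >= THRESHOLD else "uncategorisable"
    print(f"{best:.2f}  {label}: {doc}")
```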
APA, Harvard, Vancouver, ISO, and other styles
8

Jang, Hyunchul, Dae-Hyun Kim, Madhusuden Agrawal, et al. "A Joint-Industry Effort to Develop and Verify CFD Modeling Practice for Vortex-Induced Motion of a Deep-Draft Semi-Submersible." In ASME 2021 40th International Conference on Ocean, Offshore and Arctic Engineering. American Society of Mechanical Engineers, 2021. http://dx.doi.org/10.1115/omae2021-63785.

Full text
Abstract:
Platform Vortex Induced Motion (VIM) is an important cause of fatigue damage to risers and mooring lines connected to deep-draft semi-submersible floating platforms. The VIM design criteria have typically been obtained from towing tank model testing. Recently, computational fluid dynamics (CFD) analysis has been used to assess the VIM response and to augment the understanding of physical model test results. A joint industry effort has been conducted to develop and verify a CFD modeling practice for semi-submersible VIM through a working group of the Reproducible Offshore CFD JIP. The objectives of the working group are to write a CFD modeling practice document based on existing practices validated against model test data, and to verify the written practice through blind calculations with five CFD practitioners acting as verifiers. This paper presents the working group's verification process, which consists of two stages. In the initial verification stage, the verifiers independently performed free-decay tests for 3-DOF motions (surge, sway, yaw) to check that the mechanical system in the CFD model is the same as in the benchmark test. Additionally, VIM simulations were conducted at two current headings with a reduced velocity within the lock-in range, where large sway motion responses are expected. In the final verification stage, the verifiers performed a complete set of test cases with small revisions of their CFD models based on the results from the initial verification. The VIM responses from these blind calculations are presented, showing close agreement with the model test data.
APA, Harvard, Vancouver, ISO, and other styles
9

Berbey, Pierre, François Hedin, and Luc VanHoenacker. "Status and Near-Term Objectives of the Works on the EUR Document." In 17th International Conference on Nuclear Engineering. ASMEDC, 2009. http://dx.doi.org/10.1115/icone17-75530.

Full text
Abstract:
In 2007–2008, the European Utility Requirements (EUR) work focused on volume 3 (evaluation of the available Gen 3 designs) and volume 4 (conventional island). The work on the AP1000 and AES92 subsets of EUR volume 3 was concluded at the end of 2007. The texts have been published and are now available to the EUR members and other utilities. The work on the EPR subset of volume 3 resumed in 2007. A revision B is being produced, for which representatives from ten EUR utilities and from Areva NP have been involved in revising the analysis of compliance. Meetings of the specific EUR coordination group in charge of this task were organized every 4–5 weeks throughout 2008. The revised version of the EPR subset of EUR volume 3 should be finalized around mid-2009. Revision C of EUR volume 4 is now available after a thorough review was performed within the EUR organization to make it consistent with revision C of EUR volume 2 published in 2001. A great deal of preparatory material for a possible revision D of EUR volumes 1 and 2 has been produced since 2002; since important contributions are not yet available, the decision to proceed with this revision D is still pending. The EUR organization has kept enlarging: Energoatom, ENEL and Endesa have been welcomed as full members, while CEZ and MVM are now EUR associated members. New LWR projects of potential interest to the EUR utilities are being considered; for instance, a preliminary assessment of compliance of MHI's APWR project was worked out in the first months of 2008. Recently, the EUR and ENISS organizations decided to join their efforts in a collaboration scheme in which they will coordinate their positions and actions on nuclear safety with respect to the LWR Gen 3 designs. The two organizations will cooperate in their relations with the other stakeholders, in particular with the IAEA and WENRA organizations. In addition, EUR and CORDEL (Cooperation in Reactor Design Evaluation and Licensing), a WNA (World Nuclear Association) working group, have also decided to coordinate their efforts for the benefit of the industry, in relation to the MDEP (Multinational Design Evaluation Program) initiative of nuclear safety regulators.
APA, Harvard, Vancouver, ISO, and other styles
10

Keane, Michael, and Markus Hofmann. "An Investigation into Third Level Module Similarities and Link Analysis." In Third International Conference on Higher Education Advances. Universitat Politècnica València, 2017. http://dx.doi.org/10.4995/head17.2017.5528.

Full text
Abstract:
The focus of this paper is on the extraction of knowledge from the content of web pages relating to module descriptors published on http://courses.itb.ie and delivered within the School of Business at the Institute of Technology Blanchardstown. We show an automated similarity analysis highlighting visual exploration options. This analysis raised three issues of note. Firstly, modules that were coded as being different and unique to their particular programme of study showed substantial similarity. Secondly, substantial content overlap with a lack of clear differentiation between sequential modules was identified. Thirdly, the document similarity statistics point to the existence of modules with very high similarity scores delivered in different years and at different National Framework of Qualifications (NFQ) levels of different programmes. These issues can be raised within the management structure of the School of Business and disseminated to the relevant programme boards for further consideration and action. Working within a climate of constrained resources, with limited numbers of academic staff and lecture theatres, the potential savings beyond the obvious quality assurance benefits illustrate a practical application of how text mining can be used to elicit new knowledge and provide business intelligence to support the quality assurance and decision-making process within a higher education environment.
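A minimal sketch of the kind of pairwise similarity screening described above might look as follows. It uses simple Jaccard overlap of descriptor word sets rather than the authors' pipeline, and the module codes, descriptor texts and threshold are invented for illustration.

```python
# Illustrative screening of module-descriptor overlap; not the authors' method.
from itertools import combinations

def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the word sets of two descriptors."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if (wa | wb) else 0.0

modules = {
    "BUS101": "introduction to marketing principles and consumer behaviour",
    "BUS201": "marketing principles consumer behaviour and market research",
    "ACC110": "fundamentals of financial accounting and reporting",
}
THRESHOLD = 0.5  # assumed screening cut-off
for (m1, t1), (m2, t2) in combinations(modules.items(), 2):
    score = jaccard(t1, t2)
    if score >= THRESHOLD:
        print(f"Review overlap: {m1} vs {m2} (Jaccard = {score:.2f})")
```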
APA, Harvard, Vancouver, ISO, and other styles

Reports on the topic "Working with a text document"

1

Griffiths, Rachael. Transkribus in Practice: Improving CER. Verlag der Österreichischen Akademie der Wissenschaften, 2022. http://dx.doi.org/10.1553/tibschol_erc_cog_101001002_griffiths_cer.

Full text
Abstract:
This paper documents ongoing efforts to enhance the accuracy of Handwritten Text Recognition (HTR) models using Transkribus, focusing on the transcription of Tibetan cursive (dbu med) manuscripts from the 11th to 13th centuries within the framework of the ERC-funded project, The Dawn of Tibetan Buddhist Scholasticism (11th-13th C.) (TibSchol). It presents the steps taken to improve the Character Error Rate (CER) of the HTR models, the results achieved so far, and considerations for those working on similar projects.
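For readers unfamiliar with the metric, the Character Error Rate that the report aims to reduce is conventionally computed as the edit distance between the recognised text and the ground-truth transcription, normalised by the length of the ground truth. The sketch below illustrates this general definition with an invented example string; it is not the project's evaluation code.

```python
# Minimal CER computation; the example strings are invented, not project data.
def levenshtein(ref: str, hyp: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: edit distance normalised by reference length."""
    return levenshtein(reference, hypothesis) / max(len(reference), 1)

reference, hypothesis = "bka' 'gyur", "bka gyur"  # invented Wylie-style example
print(f"CER = {cer(reference, hypothesis):.2%}")
```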
APA, Harvard, Vancouver, ISO, and other styles
2

Mokate, Karen Marie, and José Jorge Saavedra. Management for Social Development: An Integrated Approach to the Management of Social Policies and Programs. Inter-American Development Bank, 2006. http://dx.doi.org/10.18235/0012204.

Full text
Abstract:
Management for Social Development is a field of action (or practice) and knowledge focused strategically on the promotion of social development. Its objective lies in the creation of public value, thus contributing to the reduction of poverty and inequality, as well as to the strengthening of democratic states and citizenship. The present document attempts to define and characterize the field of Management for Social Development and to propose a conceptual framework that orients its strategic action. We consider these objectives relevant to the degree that they may contribute to creating awareness of the importance of effective management practices in the promotion of social development and to strengthening those practices. This text highlights the creation of public value as a central element of Management for Social Development. It also emphasizes the importance of working with the multiple actors interested or involved in promoting development. It recommends that management consist of simultaneous and strategic efforts in the areas of programmatic, organizational and political management in order to achieve effectiveness, evidenced by impacts on the quality of life and living conditions of the target population.
APA, Harvard, Vancouver, ISO, and other styles
3

Kumfert, G., T. Dahlgren, T. Epperly, and J. Leek. Babel 1.0 Release Criteria: A Working Document. Office of Scientific and Technical Information (OSTI), 2004. http://dx.doi.org/10.2172/15014783.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Juskevicius, E. Definition of IETF Working Group Document States. RFC Editor, 2011. http://dx.doi.org/10.17487/rfc6174.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Zheng, Yefeng, Huiping Li, and David Doermann. Machine Printed Text and Handwriting Identification in Noisy Document Images. Defense Technical Information Center, 2003. http://dx.doi.org/10.21236/ada459230.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Cobeen, Kelly, Vahid Mahdavifar, Tara Hutchinson, et al. Large-Component Seismic Testing for Existing and Retrofitted Single-Family Wood-Frame Dwellings (PEER-CEA Project). Pacific Earthquake Engineering Research Center, University of California, Berkeley, CA, 2020. http://dx.doi.org/10.55461/hxyx5257.

Full text
Abstract:
This report is one of a series of reports documenting the methods and findings of a multi-year, multi-disciplinary project coordinated by the Pacific Earthquake Engineering Research Center (PEER) and funded by the California Earthquake Authority (CEA). The overall project is titled “Quantifying the Performance of Retrofit of Cripple Walls and Sill Anchorage in Single-Family Wood-Frame Buildings,” henceforth referred to as the “PEER–CEA Project.” The overall objective of the PEER–CEA Project is to provide scientifically based information (e.g., testing, analysis, and resulting loss models) that measures and assesses the effectiveness of seismic retrofit in reducing the risk of damage and associated losses (repair costs) of wood-frame houses with cripple wall and sill anchorage deficiencies, as well as retrofitted conditions that address those deficiencies. Tasks that support and inform the loss-modeling effort are: (1) collecting and summarizing existing information and results of previous research on the performance of wood-frame houses; (2) identifying construction features to characterize alternative variants of wood-frame houses; (3) characterizing earthquake hazard and ground motions at representative sites in California; (4) developing cyclic loading protocols and conducting laboratory tests of cripple wall panels, wood-frame wall subassemblies, and sill anchorages to measure and document their response (strength and stiffness) under cyclic loading; and (5) computer modeling, simulations, and the development of loss models as informed by a workshop with claims adjustors. Quantifying the difference in seismic performance between un-retrofitted and retrofitted single-family wood-frame houses has become increasingly important in California due to the high seismicity of the state. Inadequate lateral bracing of cripple walls and inadequate sill bolting are the primary reasons for damage to residential homes, even in the event of moderate earthquakes. Physical testing tasks were conducted by Working Group 4 (WG4), with testing carried out at the University of California San Diego (UCSD) and the University of California Berkeley (UCB). The primary objectives of the testing were as follows: (1) development of descriptions of load-deflection behavior of components and connections for use by Working Group 5 in the development of numerical models; and (2) collection of descriptions of damage at varying levels of peak transient drift for use by Working Group 6 in the development of fragility functions. Both UCSD and UCB testing included companion specimens tested with and without retrofit. This report documents the portions of the WG4 testing conducted at UCB: two large-component cripple wall tests (Tests AL-1 and AL-2), one test of cripple wall load-path connections (Test B-1), and two tests of dwelling superstructure construction (Tests C-1 and C-2). Included in this report are details of specimen design and construction, instrumentation, loading protocols, test data, testing observations, discussion, and conclusions.
APA, Harvard, Vancouver, ISO, and other styles
7

Levkowetz, H., D. Meyer, L. Eggert, and A. Mankin. Document Shepherding from Working Group Last Call to Publication. RFC Editor, 2007. http://dx.doi.org/10.17487/rfc4858.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

EFFC/DFI Working Platforms Task Group. Guide to Working Platforms. European Federation of Foundation Contractors and Deep Foundations Institute, 2020. https://doi.org/10.37308/effc-dfi-wptg-guide-e1-2020.

Full text
Abstract:
On a typical construction site, the provision of a safe surface to work on involves and affects a number of the contracting parties (the client; principal designer; general contractor; specialty contractor; platform designer; platform installer or earthworks contractor; platform tester and platform maintainer), and as a consequence, the organisation of its design, installation and maintenance can be complex. As it concerns money and liability it is often a contentious issue, but nonetheless one that needs to be addressed. This document takes each step in turn and describes good practice, with reference to documents and resources that have been made available through the EFFC and DFI. In compiling this information, responses have been collated from foundation contractors in France, the United Kingdom, the Czech Republic, Germany, the Netherlands, Poland, Portugal, Romania, Sweden, Austria, Belgium, Denmark, Hungary, Italy, the USA and Canada.
APA, Harvard, Vancouver, ISO, and other styles
9

Barr, Giles, Elwyn Baynham, Edgar Black, et al. MICE -- Absorber and focus coil safety working group design document: Preliminary design and assessments. Office of Scientific and Technical Information (OSTI), 2003. http://dx.doi.org/10.2172/842369.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Nikolova, Nikolina, Pencho Mihnev, Temenuzhka Zafirova-Malcheva, et al. Intellectual Output 4: Evaluation kit for inclusion-oriented collaborative learning activities. PLEIADE Project, 2023. http://dx.doi.org/10.60063/nn.2023.0089.95.

Full text
Abstract:
This document is intended as a text file annexed to IO4 and is distributed as an accompanying document to the PLEIADE Evaluation kit, which can be accessed online on the PLEIADE website: https://moodle.pleiade-project.eu/. The main purpose of this text file is to document in detail the process of the Evaluation kit's development and the intermediate and supporting products, and to provide scientific reasoning for its validity and reliability. The document describes the main activities carried out during the Evaluation kit's development and the responsibilities taken on by the partners. It provides a description of the developed Evaluation kit and use cases, supporting its usage by external users in their attempts to develop and enact inclusive collaborative learning designs.
APA, Harvard, Vancouver, ISO, and other styles