Dissertations / Theses on the topic 'Unstructured text data'

Consult the top 24 dissertations / theses for your research on the topic 'Unstructured text data.'

1

Popescu, Ana-Maria. "Information extraction from unstructured web text." Thesis, University of Washington, 2007. http://hdl.handle.net/1773/6935.

2

Olsson, Jenny. "Using Elasticsearch for full-text searches on unstructured data." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-395654.

Abstract:
To perform effective searches on large amounts of data, it is not viable to simply scan through all of the data. A well-established solution to this problem is to generate an index based on the data. This report compares different libraries for establishing such an index, and a prototype was implemented to enable full-text searches on an existing database. The libraries considered include Elasticsearch, Solr, Sphinx and Xapian. The database in question consists of audit logs generated by software for online management of financial trade. The author implemented a prototype using the
3

Bojduj, Brett N. "Extraction of Causal-Association Networks from Unstructured Text Data." DigitalCommons@CalPoly, 2009. https://digitalcommons.calpoly.edu/theses/138.

Abstract:
Causality is an expression of the interactions between variables in a system. Humans often explicitly express causal relations through natural language, so extracting these relations can provide insight into how a system functions. This thesis presents a system that uses a grammar parser to extract causes and effects from unstructured text through a simple, pre-defined grammar pattern. By filtering out non-causal sentences before the extraction process begins, the presented methodology is able to achieve a precision of 85.91% and a recall of 73.99%. The polarity of the extracted relations is
4

Sequeira, José Francisco Rodrigues. "Automatic knowledge base construction from unstructured text." Master's thesis, Universidade de Aveiro, 2016. http://hdl.handle.net/10773/17910.

Abstract:
Master's in Computer and Telematics Engineering. Taking into account the overwhelming number of biomedical publications being produced, the effort required for a user to efficiently explore those publications in order to establish relationships between a wide range of concepts is staggering. This dissertation presents GRACE, a web-based platform that provides an advanced graphical exploration interface that allows users to traverse the biomedical domain in order to find explicit and latent associations between annotated biomedical concepts belonging to a variety of semantic types
5

Vikholm, Oskar. "Dealing with unstructured data : A study about information quality and measurement." Thesis, Uppsala universitet, Institutionen för informatik och media, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-255214.

Abstract:
Many organizations have realized that the growing amount of unstructured text may contain information that can be used for different purposes, such as making decisions. By using so-called text mining tools, organizations can extract information from text documents. For example, within military and intelligence activities it is important to go through reports and look for entities such as names of people, events, and the relationships between them when criminal or other interesting activities are being investigated and mapped. This study explores how information quality can be measured and wh
6

Coetsee, Dirko. "Conditional random fields for noisy text normalisation." Thesis, Stellenbosch : Stellenbosch University, 2014. http://hdl.handle.net/10019.1/96064.

Abstract:
Thesis (MScEng) -- Stellenbosch University, 2014. ENGLISH ABSTRACT: The increasing popularity of microblogging services such as Twitter means that more and more unstructured data is available for analysis. The informal language usage in these media presents a problem for traditional text mining and natural language processing tools. We develop a pre-processor to normalise this noisy text so that useful information can be extracted with standard tools. A system consisting of a tokeniser, out-of-vocabulary token identifier, correct candidate generator, and N-gram language model is propo
7

Hill, Geoffrey. "Sensemaking in Big Data: Conceptual and Empirical Approaches to Actionable Knowledge Generation from Unstructured Text Streams." Kent State University / OhioLINK, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=kent1433597354.

8

Alshaer, Mohammad. "An Efficient Framework for Processing and Analyzing Unstructured Text to Discover Delivery Delay and Optimization of Route Planning in Realtime." Thesis, Lyon, 2019. http://www.theses.fr/2019LYSE1105/document.

Abstract:
The Internet of Things (IoT) is driving a paradigm shift in the logistics sector. The advent of the IoT has changed the ecosystem of logistics service management. Logistics service providers now use sensor technologies such as GPS or telemetry to collect data in real time during delivery. Real-time data collection allows service providers to track and manage their shipping process efficiently. The main advantage of data collection in
9

Xiong, Hui. "Combining Subject Expert Experimental Data with Standard Data in Bayesian Mixture Modeling." The Ohio State University, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=osu1312214048.

10

Minhas, Saliha Z. "A corpus driven computational intelligence framework for deception detection in financial text." Thesis, University of Stirling, 2016. http://hdl.handle.net/1893/25345.

Abstract:
Financial fraud rampages onwards seemingly uncontained. The annual cost of fraud in the UK is estimated to be as high as £193bn a year [1]. From a data science perspective, and hitherto less explored, this thesis demonstrates how the use of linguistic features to drive data mining algorithms can aid in unravelling fraud. To this end, the spotlight is turned on Financial Statement Fraud (FSF), known to be the costliest type of fraud [2]. A new corpus of 6.3 million words is composed of 102 annual reports/10-K (narrative sections) from firms formally indicted for FSF juxtaposed with 306 non-fraud
11

Бабич, Микола Валерійович. "Дослідження систем аналізу великих масивів неструктурованих даних". Master's thesis, Київ, 2018. https://ela.kpi.ua/handle/123456789/26806.

Abstract:
The master's dissertation comprises 85 pages, including 20 illustrations, 14 tables, 6 formulas and … information sources. Relevance of the topic. According to experts, more than 85% of data is generated in unstructured form. Unstructured data includes text and multimedia (video, voice, images), that is, data that has no predefined structure or is not organized in an established order. All of this makes analysis difficult, especially when using traditional software designed to work with structured data. All the while, in unstruct
12

Andrade, Junior Valter Lacerda de. "Utilização de técnicas de dados não estruturados para desenvolvimento de modelos aplicados ao ciclo de crédito." Pontifícia Universidade Católica de São Paulo, 2014. https://tede2.pucsp.br/handle/handle/18150.

Abstract:
The need for expert assessment of data mining in textual data fields and other unstructured information is increasingly present in the public and private sectors. Through probabilistic models and analytical studies, it is possible to broaden the understanding of a particular information source. In recent years, technological progress has caused exponential growth of the information pr
13

Carvalho, André Silva de. "Analytics como uma ferramenta para Consumer Insights." Escola Superior de Propaganda e Marketing, 2017. http://tede2.espm.br/handle/tede/267.

14

Dail, Mathias. "Clustering unstructured life sciences experiments with unsupervised machine learning : Natural language processing for unstructured life sciences texts." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-265549.

Abstract:
The purpose of this master’s thesis is to analyse different types of document representations in the context of improving, in an unsupervised manner, the searchability of unstructured textual life sciences experiments by clustering similar experiments together. The challenge is to produce, analyse and compare different representations of the life sciences data by using traditional and advanced unsupervised Machine learning models. The text data analysed in this work is noisy and very heterogeneous, as it comes from a real-world Electronic Lab Notebook. Clustering unstructured and unlabeled tex
15

Valentin, Sarah. "Extraction et combinaison d’informations épidémiologiques à partir de sources informelles pour la veille des maladies infectieuses animales." Thesis, Montpellier, 2020. http://www.theses.fr/2020MONTS067.

Abstract:
Epidemic intelligence aims to detect, analyse and monitor potential health threats over time. This surveillance process relies on so-called formal sources, such as official health organizations, and so-called informal sources, such as the media. Informal sources are monitored through event-based surveillance. This type of monitoring requires the development of tools dedicated to collecting and processing unstructured textual data published on the Web. This
16

Kóša, Peter. "Structured Data Extraction from Unstructured Text." Master's thesis, 2013. http://www.nusl.cz/ntk/nusl-330279.

Abstract:
Title: Structured Data Extraction from Unstructured Text
Author: Bc. Peter Kóša
Department: Department of Software Engineering
Supervisor: Mgr. Martin Nečaský, Ph.D., Department of Software Engineering
Abstract: In the last 20 years, there has been an ever-growing amount of information present on the Internet and in published texts. However, this information is often in a non-structured format and this causes various problems such as the inability to efficiently search in diverse collections of texts (medical reports, ads, etc.). To overcome these problems, we need efficient tools capable of a
17

Lukšová, Ivana. "Ontology Enrichment Based on Unstructured Text Data." Master's thesis, 2013. http://www.nusl.cz/ntk/nusl-324590.

Abstract:
Title: Ontology Enrichment Based on Unstructured Text Data
Author: Ivana Lukšová
Department: Department of Software Engineering
Supervisor: Mgr. Martin Nečaský, Ph.D., Department of Software Engineering
Abstract: Semantic annotation, attaching semantic information to text data, is a fundamental task in knowledge extraction. Several ontology-based semantic annotation platforms have been proposed in recent years. However, the process of automated ontology engineering is still a challenging problem. In this paper, a new semi-automatic method for ontology enrichment based on unstructured tex
18

Felgueiras, Marco Filipe Madeira. "Multilabel classification of unstructured data using Crunchbase." Master's thesis, 2020. http://hdl.handle.net/10071/22188.

Abstract:
Our work compares different methods and models for multilabel text classification using information collected from Crunchbase, a large database that holds information on more than 600,000 companies. Each company is labeled with one or more categories, from a subset of 46 possible, and the proposed models predict the categories based solely on the company's textual description. A number of natural language processing strategies have been tested for feature extraction, including stemming, lemmatization, and part-of-speech tagging. This is a highly unbalanced dataset, where the frequency of each ca
19

Wu, Tsung-Ying, and 吳宗穎. "A Stock Risk Alert System Based On Unstructured Text Data." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/8r4xbh.

Abstract:
Master's thesis. Yuan Ze University, Department of Information Management, academic year 106. Getting information about the stock market is now easier than before. Investors can easily collect stock trading information and news, but effectively combining structured trading data with unstructured news remains a difficult question for them. Therefore, this study provides a way to combine structured and unstructured data. First, the news for each individual stock is segmented. Next, the frequency of risky words is calculated and converted into structured data. Then, individual stock risk indicators are established from the daily trading volume, the turnover ratio, and
20

Goeva, Aleksandrina. "Complexity penalized methods for structured and unstructured data." Thesis, 2017. https://hdl.handle.net/2144/27072.

Abstract:
A fundamental goal of statisticians is to make inferences from the sample about characteristics of the underlying population. This is an inverse problem, since we are trying to recover a feature of the input with the availability of observations on an output. Towards this end, we consider complexity penalized methods, because they balance goodness of fit and generalizability of the solution. The data from the underlying population may come in diverse formats - structured or unstructured - such as probability distributions, text tokens, or graph characteristics. Depending on the defining fe
21

Eickhoff, Matthias. "The Information Value of Unstructured Analyst Opinions." Doctoral thesis, 2017. http://hdl.handle.net/11858/00-1735-0000-0023-3EA0-5.

22

Wolowiec, Martin. "Using text-mining-assisted analysis to examine the applicability of unstructured data in the context of customer complaint management." Master's thesis, 2015. http://hdl.handle.net/10362/17534.

Abstract:
Double Degree. In quest of gaining a more holistic picture of customer experiences, many companies are starting to consider textual data due to the richer insights on customer experience touch points it can provide. Meanwhile, recent trends point towards an emerging integration of customer relationship management and customer experience management and thereby availability of additional sources of textual data. Using text-mining-assisted analysis, this study demonstrates the practicality of the arising opportunity by means of perceived justice theory in the context of customer complai
23

Rampula, Ilana. "Extrakce sémantických vztahů z nestrukturovaných dat v komerční sféře." Master's thesis, 2016. http://www.nusl.cz/ntk/nusl-348035.

Abstract:
Text analytics in the business domain is a growing field in research and practical applications. We chose to concentrate on Relation Extraction from unstructured data which was provided by a corporate partner. Analyzing text from this domain requires a different approach, accounting for irregularities and domain-specific attributes. In this thesis, we present two methods for relation extraction. The Snowball system and the Distant Supervision method were both adapted for the unique data. The methods were implemented to use both structured and unstructured data from the database of the company.
24

Alic, Irina. "Decision Support Systems for Financial Market Surveillance." Doctoral thesis, 2016. http://hdl.handle.net/11858/00-1735-0000-002B-7D04-4.

Abstract:
Decision support systems in finance are of great interest not only to academia but also to practitioners. To ensure financial market surveillance, financial supervisory authorities are confronted, on the one hand, with the growing amount of information available online, such as financial blogs and news. On the other hand, rapidly emerging trends, such as the steadily growing volume of data available online and the development of data mining methods, pose challenges for research. Decision