
Dissertations / Theses on the topic 'Daga language'



Consult the top 50 dissertations / theses for your research on the topic 'Daga language.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses in a wide variety of disciplines and organise your bibliography correctly.

1

Newton, Alan R. "A formal data fusion language." Thesis, Cranfield University, 1995. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.481233.

2

Jarman, Jay. "Combining Natural Language Processing and Statistical Text Mining: A Study of Specialized Versus Common Languages." Scholar Commons, 2011. http://scholarcommons.usf.edu/etd/3166.

Abstract:
This dissertation focuses on developing and evaluating hybrid approaches for analyzing free-form text in the medical domain. This research draws on natural language processing (NLP) techniques that are used to parse and extract concepts based on a controlled vocabulary. Once important concepts are extracted, additional machine learning algorithms, such as association rule mining and decision tree induction, are used to discover classification rules for specific targets. This multi-stage pipeline approach is contrasted with traditional statistical text mining (STM) methods based on term counts and term-by-document frequencies. The aim is to create effective text analytic processes by adapting and combining individual methods. The methods are evaluated on an extensive set of real clinical notes annotated by experts to provide benchmark results. There are two main research questions for this dissertation. First, can information (specialized language) be extracted from clinical progress notes that will represent the notes without loss of predictive information? Second, can classifiers be built for clinical progress notes that are represented by specialized language? Three experiments were conducted to answer these questions by investigating specific challenges in extracting information from unstructured clinical notes and classifying documents that are so important in the medical domain. The first experiment addresses the first research question by focusing on whether relevant patterns within clinical notes reside more in the highly technical, medically relevant terminology or in the passages expressed in common language. The results from this experiment informed the subsequent experiments. It also shows that predictive patterns are preserved by preprocessing text documents with a grammatical NLP system that separates specialized language from common language, and that this is an acceptable method of data reduction for the purposes of STM. Experiments two and three address the second research question. Experiment two focuses on applying rule-mining techniques to the output of the information extraction effort from experiment one, with the ultimate goal of creating rule-based classifiers. This experiment makes several contributions. First, it uses a novel approach to create classification rules from specialized language and to build a classifier: the data is split by class and then rules are generated. Second, several toolkits were assembled to create the automated process by which the rules were created. Third, this automated process created interpretable rules; finally, the resulting model provided good accuracy. The resulting performance was slightly lower than that of the classifier from experiment one, but had the benefit of interpretable rules. Experiment three focuses on using decision tree induction (DTI) as a rule-discovery approach to classification, which also addresses the second research question. DTI is another rule-centric method for creating a classifier. This experiment shows that DTI can be used to create an accurate and interpretable classifier using specialized language. Additionally, the resulting rule sets are simple and easily interpretable, and were created using a highly automated process.
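As a rough, hypothetical sketch of the STM-style baseline that the dissertation contrasts with its hybrid pipeline (the toy notes, labels and parameters below are invented; the actual toolkits and clinical data are not public), term counts feeding decision tree induction might look like this:

```python
# Illustrative only: a term-count (STM-style) representation with
# decision tree induction, loosely mirroring the kind of classifier
# discussed above. Notes and labels are toy data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

notes = [
    "patient reports chest pain and dyspnea on exertion",
    "routine follow-up, no acute complaints today",
    "elevated troponin, possible myocardial infarction",
    "annual physical exam, patient feels well",
]
labels = [1, 0, 1, 0]  # 1 = cardiac target, 0 = other (toy labels)

clf = Pipeline([
    ("counts", CountVectorizer()),  # term-by-document counts
    ("tree", DecisionTreeClassifier(max_depth=3, random_state=0)),
])
clf.fit(notes, labels)
print(clf.predict(["follow-up visit for chest pain"]))
```

A decision tree over such counts yields rules that can be read off the tree paths, which is the interpretability property the experiments above trade a little accuracy for.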
3

Huang, Lizhong. "Express query language and templates and rules : two languages for advanced software system integrations." Ohio : Ohio University, 1999. http://www.ohiolink.edu/etd/view.cgi?ohiou1181162850.

4

Hellmann, Sebastian. "Integrating Natural Language Processing (NLP) and Language Resources Using Linked Data." Doctoral thesis, Universitätsbibliothek Leipzig, 2015. http://nbn-resolving.de/urn:nbn:de:bsz:15-qucosa-157932.

Abstract:
This thesis is a compendium of scientific works and engineering specifications that have been contributed to a large community of stakeholders to be copied, adapted, mixed, built upon and exploited in any way possible to achieve a common goal: Integrating Natural Language Processing (NLP) and Language Resources Using Linked Data. The explosion of information technology in the last two decades has led to a substantial growth in the quantity, diversity and complexity of web-accessible linguistic data. These resources become even more useful when linked with each other, and the last few years have seen the emergence of numerous approaches in various disciplines concerned with linguistic resources and NLP tools. It is the challenge of our time to store, interlink and exploit this wealth of data, accumulated in more than half a century of computational linguistics, of empirical, corpus-based study of language, and of computational lexicography in all its heterogeneity. The vision of the Giant Global Graph (GGG) was conceived by Tim Berners-Lee, aiming to connect all data on the Web and to allow the discovery of new relations within this openly accessible data. This vision has been pursued by the Linked Open Data (LOD) community, where the cloud of published datasets comprises 295 data repositories and more than 30 billion RDF triples (as of September 2011). RDF is based on globally unique and accessible URIs, and it was specifically designed to establish links between such URIs (or resources). This is captured in the Linked Data paradigm, which postulates four rules: (1) referred entities should be designated by URIs, (2) these URIs should be resolvable over HTTP, (3) data should be represented by means of standards such as RDF, and (4) a resource should include links to other resources. Although it is difficult to precisely identify the reasons for the success of the LOD effort, advocates generally argue that open licenses as well as open access are key enablers for the growth of such a network, as they provide a strong incentive for collaboration and contribution by third parties. In his keynote at BNCOD 2011, Chris Bizer argued that with RDF the overall data integration effort can be “split between data publishers, third parties, and the data consumer”, a claim that can be substantiated by observing the evolution of many large data sets constituting the LOD cloud. As noted in the acknowledgement section, parts of this thesis have received extensive feedback from other scientists, practitioners and industry in many different ways. The main contributions of this thesis are summarized here: Part I – Introduction and Background. During his keynote at the Language Resource and Evaluation Conference in 2012, Sören Auer stressed the decentralized, collaborative, interlinked and interoperable nature of the Web of Data. The keynote provides strong evidence that Semantic Web technologies such as Linked Data are on their way to becoming mainstream for the representation of language resources. The jointly written companion publication for the keynote was later extended as a book chapter in The People’s Web Meets NLP and serves as the basis for “Introduction” and “Background”, outlining some stages of the Linked Data publication and refinement chain. Both chapters stress the importance of open licenses and open access as enablers for collaboration and the ability to interlink data on the Web as a key feature of RDF, and provide a discussion of scalability issues and decentralization.
Furthermore, we elaborate on how conceptual interoperability can be achieved by (1) re-using vocabularies, (2) agile ontology development, (3) meetings to refine and adapt ontologies and (4) tool support to enrich ontologies and match schemata. Part II – Language Resources as Linked Data. “Linked Data in Linguistics” and “NLP & DBpedia, an Upward Knowledge Acquisition Spiral” summarize the results of the Linked Data in Linguistics (LDL) Workshop in 2012 and the NLP & DBpedia Workshop in 2013 and give a preview of the MLOD special issue. In total, five proceedings – three published at CEUR (OKCon 2011, WoLE 2012, NLP & DBpedia 2013), one Springer book (Linked Data in Linguistics, LDL 2012) and one journal special issue (Multilingual Linked Open Data, MLOD, to appear) – have been (co-)edited to create incentives for scientists to convert and publish Linked Data and thus to contribute open and/or linguistic data to the LOD cloud. Based on the disseminated call for papers, 152 authors contributed one or more accepted submissions to our venues, and 120 reviewers were involved in peer-reviewing. “DBpedia as a Multilingual Language Resource” and “Leveraging the Crowdsourcing of Lexical Resources for Bootstrapping a Linguistic Linked Data Cloud” contain this thesis’ contribution to the DBpedia Project, made in order to further increase the size and inter-linkage of the LOD Cloud with lexical-semantic resources. Our contribution comprises extracted data from Wiktionary (an online, collaborative dictionary similar to Wikipedia) in more than four languages (now six) as well as language-specific versions of DBpedia, including a quality assessment of inter-language links between Wikipedia editions and internationalized content negotiation rules for Linked Data. In particular, this work created the foundation for a DBpedia Internationalisation Committee, with members from over 15 different languages, with the common goal of pushing DBpedia as a free and open multilingual language resource. Part III – The NLP Interchange Format (NIF). “NIF 2.0 Core Specification”, “NIF 2.0 Resources and Architecture” and “Evaluation and Related Work” constitute one of the main contributions of this thesis. The NLP Interchange Format (NIF) is an RDF/OWL-based format that aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations. The core specification describes which URI schemes and RDF vocabularies must be used for (parts of) natural language texts and annotations in order to create an RDF/OWL-based interoperability layer, with NIF built upon Unicode code points in Normal Form C. Classes and properties of the NIF Core Ontology are described to formally define the relations between text, substrings and their URI schemes. The evaluation of NIF is based on a questionnaire answered by 13 developers using NIF. UIMA, GATE and Stanbol are extensible NLP frameworks, and NIF was not yet able to provide off-the-shelf NLP domain ontologies for all possible domains, but only for the plugins used in this study. After inspecting the software, the developers agreed, however, that NIF is adequate for providing generic RDF output using literal objects for annotations. All developers were able to map the internal data structure to NIF URIs to serialize RDF output (adequacy).
The development effort in hours (ranging between 3 and 40) as well as the number of code lines (ranging between 110 and 445) suggest that the implementation of NIF wrappers is easy and fast for an average developer. Furthermore, the evaluation contains a comparison to other formats and an evaluation of the available URI schemes for web annotation. In order to collect input from the wider group of stakeholders, a total of 16 presentations were given with extensive discussions and feedback, which led to constant improvement of NIF from 2010 until 2013. After the release of NIF (version 1.0) in November 2011, a total of 32 vocabulary employments and implementations for different NLP tools and converters were reported (8 by the (co-)authors, including the Wiki-link corpus; 13 by people participating in our survey; and 11 more that we have heard of). Several roll-out meetings and tutorials were held (e.g. in Leipzig and Prague in 2013) and are planned (e.g. at LREC 2014). Part IV – The NLP Interchange Format in Use. “Use Cases and Applications for NIF” and “Publication of Corpora using NIF” describe 8 concrete instances where NIF has been successfully used. One major contribution is the usage of NIF as the recommended RDF mapping in the Internationalization Tag Set (ITS) 2.0 W3C standard, together with the conversion algorithms from ITS to NIF and back. The discussions in the standardization meetings and telephone conferences for ITS 2.0 led to the conclusion that there was no alternative RDF format or vocabulary other than NIF with the required features to fulfill the working group charter. Five further uses of NIF are described for the Ontology of Linguistic Annotations (OLiA), the RDFaCE tool, the Tiger Corpus Navigator, the OntosFeeder and visualisations of NIF using the RelFinder tool. These 8 instances provide an implemented proof of concept of the features of NIF. The latter chapter starts by describing the conversion and hosting of the huge Google Wikilinks corpus, with 40 million annotations for 3 million web sites; the resulting RDF dump contains 477 million triples in a 5.6 GB compressed dump file in Turtle syntax. It then describes how NIF can be used to publish extracted facts from news feeds as Linked Data in the RDFLiveNews tool. Part V – Conclusions provides lessons learned for NIF, conclusions and an outlook on future work. Most of the contributions are already summarized above. One particular aspect worth mentioning is the increasing number of NIF-formatted corpora for Named Entity Recognition (NER) that have come into existence after the publication of the main NIF paper, Integrating NLP using Linked Data, at ISWC 2013. These include the corpora converted by Steinmetz, Knuth and Sack for the NLP & DBpedia workshop and an OpenNLP-based CoNLL converter by Brümmer. Furthermore, we are aware of three LREC 2014 submissions that leverage NIF: NIF4OGGD – NLP Interchange Format for Open German Governmental Data; N^3 – A Collection of Datasets for Named Entity Recognition and Disambiguation in the NLP Interchange Format; and Global Intelligent Content: Active Curation of Language Resources using Linked Data; as well as an early implementation of a GATE-based NER/NEL evaluation framework by Dojchinovski and Kliegr. Further funding for the maintenance, interlinking and publication of Linguistic Linked Data, as well as support and improvements of NIF, is available via the expiring LOD2 EU project as well as the CSA EU project called LIDER, which started in November 2013.
Based on the evidence of successful adoption presented in this thesis, we can expect a good chance that both Linked Data technology and the NIF standard will reach critical mass in the field of Natural Language Processing and Language Resources.
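A minimal sketch of the NIF idea summarized above (offset-based URIs for strings, with annotations attached as RDF triples), written with the Python rdflib library. The document URL and offsets are invented, and the property names follow the description here rather than the normative NIF 2.0 specification:

```python
# Sketch of NIF-style annotation: URIs are minted from character offsets
# into the text (Unicode code points, Normal Form C), and annotations are
# ordinary RDF triples. Exact vocabulary URIs here are assumptions.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, XSD

NIF = Namespace("http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#")
text = "Leipzig is a city."

g = Graph()
g.bind("nif", NIF)

doc = URIRef("http://example.org/doc#char=0,18")     # the whole context
mention = URIRef("http://example.org/doc#char=0,7")  # the substring "Leipzig"

g.add((doc, RDF.type, NIF.Context))
g.add((doc, NIF.isString, Literal(text)))
g.add((mention, RDF.type, NIF.String))
g.add((mention, NIF.referenceContext, doc))
g.add((mention, NIF.beginIndex, Literal(0, datatype=XSD.nonNegativeInteger)))
g.add((mention, NIF.endIndex, Literal(7, datatype=XSD.nonNegativeInteger)))
g.add((mention, NIF.anchorOf, Literal("Leipzig")))

print(g.serialize(format="turtle"))
```

Because the annotation layer is plain RDF, any tool that emits or consumes triples of this shape can interoperate, which is essentially the wrapper exercise the 13 surveyed developers performed.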
5

Touma, Rizkallah. "Computer-language based data prefetching techniques." Doctoral thesis, Universitat Politècnica de Catalunya, 2019. http://hdl.handle.net/10803/665207.

Abstract:
Data prefetching has long been used as a technique to improve access times to persistent data. It is based on retrieving data records from persistent storage to main memory before the records are needed. Data prefetching has been applied to a wide variety of persistent storage systems, from file systems to Relational Database Management Systems and NoSQL databases, with the aim of reducing access times to the data maintained by the system and thus improving the execution times of the applications using this data. However, most existing solutions to data prefetching have been based on information that can be retrieved from the storage system itself, whether in the form of heuristics based on the data schema or data access patterns detected by monitoring access to the system. These approaches have multiple disadvantages in terms of the rigidity of the heuristics they use, the accuracy of the predictions they make and/or the time they need to make these predictions, a process often performed while the applications are accessing the data, causing considerable overhead. In light of the above, this thesis proposes two novel approaches to data prefetching based on predictions made by analyzing the instructions and statements of the computer languages used to access persistent data. The proposed approaches take into consideration how the data is accessed by the higher-level applications, make accurate predictions and are performed without causing any additional overhead. The first of the proposed approaches analyzes instructions of applications written in object-oriented languages in order to prefetch data from Persistent Object Stores. The approach is based on static code analysis that is done prior to the application execution and hence does not add any overhead. It also includes various strategies to deal with cases that require runtime information unavailable prior to the execution of the application. We integrate this analysis approach into an existing Persistent Object Store and run a series of extensive experiments to measure the improvement obtained by prefetching the objects predicted by the approach. The second approach analyzes statements and historic logs of the declarative query language SPARQL in order to prefetch data from RDF triplestores. The approach measures two types of similarity between SPARQL queries in order to detect recurring query patterns in the historic logs. Afterwards, it uses the detected patterns to predict subsequent queries and launch them before they are requested, prefetching the data they need. Our evaluation of the proposed approach shows that it achieves high-accuracy prediction and can achieve a high cache hit rate when caching the results of the predicted queries.
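To make the second approach concrete, here is a toy sketch (ours, not the thesis implementation) of mining a historic query log for recurring successor patterns and prefetching the predicted next query's results; the query templates and the fetch() stub are invented:

```python
# Toy log-based prefetching: learn which query template tends to follow
# which, then prefetch the predicted successor's results into a cache.
from collections import Counter, defaultdict

historic_log = ["Q_author", "Q_books", "Q_author", "Q_books",
                "Q_author", "Q_prizes", "Q_author", "Q_books"]

# Count successor frequencies for each query template in the log.
successors = defaultdict(Counter)
for current, nxt in zip(historic_log, historic_log[1:]):
    successors[current][nxt] += 1

cache = {}

def fetch(query):
    """Stand-in for actually running a SPARQL query on the triplestore."""
    return f"results({query})"

def run(query):
    result = cache.pop(query, None) or fetch(query)
    # Predict the most likely follow-up query and prefetch its results.
    if successors[query]:
        predicted = successors[query].most_common(1)[0][0]
        cache[predicted] = fetch(predicted)
    return result

print(run("Q_author"))  # also prefetches the results of "Q_books"
print(run("Q_books"))   # served from the prefetch cache
```

The thesis goes well beyond this caricature, e.g. by measuring structural similarity between SPARQL queries instead of requiring exact template repeats.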
6

Jeelani, Ashfaq Ahmed. "A data layout descriptor language (LADEL)." [Johnson City, Tenn. : East Tennessee State University], 2001. http://etd-submit.etsu.edu/etd/theses/available/etd-0301101-022022/unrestricted/Thesis.pdf.

7

Hsu, Bo-June (Bo-June Paul). "Language Modeling for limited-data domains." Thesis, Massachusetts Institute of Technology, 2009. http://hdl.handle.net/1721.1/52796.

Abstract:
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student submitted PDF version of thesis.
Includes bibliographical references (p. 99-109).
With the increasing focus of speech recognition and natural language processing applications on domains with limited amounts of in-domain training data, enhanced system performance often relies on approaches involving model adaptation and combination. In such domains, language models are often constructed by interpolating component models trained from partially matched corpora. Instead of simple linear interpolation, we introduce a generalized linear interpolation technique that computes context-dependent mixture weights from features that correlate with the component confidence and relevance for each n-gram context. Since the n-grams from partially matched corpora may not be of equal relevance to the target domain, we propose an n-gram weighting scheme to adjust the component n-gram probabilities based on features derived from readily available corpus segmentation and metadata to de-emphasize out-of-domain n-grams. In scenarios without any matched data for a development set, we examine unsupervised and active learning techniques for tuning the interpolation and weighting parameters. Results on a lecture transcription task using the proposed generalized linear interpolation and n-gram weighting techniques yield up to a 1.4% absolute word error rate reduction over a linearly interpolated baseline language model. As more sophisticated models are only as useful as they are practical, we developed the MIT Language Modeling (MITLM) toolkit, designed for efficient iterative parameter optimization, and released it to the research community. With a compact vector-based n-gram data structure and optimized algorithm implementations, the toolkit not only improves the running time of common tasks by up to 40x, but also enables efficient parameter tuning for language modeling techniques that were previously deemed impractical.
by Bo-June (Paul) Hsu.
Ph.D.
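The core interpolation idea can be sketched in a few lines; the fixed-weight case is standard, and the context-dependent variant below only gestures at the generalized technique (the confidence feature is a hypothetical stand-in for the features described in the abstract):

```python
# Sketch of interpolating component n-gram models: p(w|h) is a weighted
# mix of the component probabilities. The thesis's generalized linear
# interpolation computes the weights per n-gram context from features.
def interpolate(p_components, lambdas):
    """p(w|h) = sum_i lambda_i(h) * p_i(w|h), with the lambdas summing to 1."""
    assert abs(sum(lambdas) - 1.0) < 1e-9
    return sum(lam * p for lam, p in zip(lambdas, p_components))

# Two component models' probabilities for the same word in context h:
p_in_domain, p_out_domain = 0.02, 0.005

# Fixed-weight linear interpolation.
print(interpolate([p_in_domain, p_out_domain], [0.7, 0.3]))  # 0.0155

def context_weights(context_count):
    # Hypothetical confidence feature: trust the in-domain model more in
    # contexts it has seen frequently.
    lam = context_count / (context_count + 10.0)
    return [lam, 1.0 - lam]

print(interpolate([p_in_domain, p_out_domain], context_weights(40)))
```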
8

Kim, Edward Soo. "Data-mining natural language materials syntheses." Thesis, Massachusetts Institute of Technology, 2019. https://hdl.handle.net/1721.1/122075.

Abstract:
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Materials Science and Engineering, 2019
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references.
Discovering, designing, and developing a novel material is an arduous task, involving countless hours of human effort and ingenuity. While some aspects of this process have been vastly accelerated by the advent of first-principles-based computational techniques and high throughput experimental methods, a vast ocean of untapped historical knowledge lies dormant in the scientific literature. Namely, the precise methods by which many inorganic compounds are synthesized are recorded only as text within journal articles. This thesis aims to realize the potential of this data for informing the syntheses of inorganic materials through the use of data-mining algorithms. Critically, the methods used and produced in this thesis are fully automated, thus maximizing the impact for accelerated synthesis planning by human researchers.
There are three primary objectives of this thesis: 1) aggregate and codify synthesis knowledge contained within scientific literature, 2) identify synthesis "driving factors" for different synthesis outcomes (e.g., phase selection) and 3) autonomously learn synthesis hypotheses from the literature and extend these hypotheses to predicted syntheses for novel materials. Towards the first goal of this thesis, a pipeline of algorithms is developed in order to extract and codify materials synthesis information from journal articles into a structured, machine readable format, analogous to existing databases for materials structures and properties. To efficiently guide the extraction of materials data, this pipeline leverages domain knowledge regarding the allowable relations between different types of information (e.g., concentrations often correspond to solutions).
Both unsupervised and supervised machine learning algorithms are also used to rapidly extract synthesis information from the literature. To examine the autonomous learning of driving factors for morphology selection during hydrothermal syntheses, TiO₂ nanotube formation is found to be correlated with NaOH concentrations and reaction temperatures, using models that are given no internal chemistry knowledge. Additionally, the capacity for transfer learning is shown by predicting phase symmetry in materials systems unseen by models during training, outperforming heuristic physically-motivated baseline strategies, and again with chemistry-agnostic models. These results suggest that synthesis parameters possess some intrinsic capability for predicting synthesis outcomes. The nature of this linkage between synthesis parameters and synthesis outcomes is then further explored by performing virtual synthesis parameter screening using generative models.
Deep neural networks (variational autoencoders) are trained to learn low-dimensional representations of synthesis routes on augmented datasets, created by aggregated synthesis information across materials with high structural similarity. This technique is validated by predicting ion-mediated polymorph selection effects in MnO₂, using only data from the literature (i.e., without knowledge of competing free energies). This method of synthesis parameter screening is then applied to suggest a new hypothesis for solvent-driven formation of the rare TiO₂ phase, brookite. To extend the capability of synthesis planning with literature-based generative models, a sequence-based conditional variational autoencoder (CVAE) neural network is developed. The CVAE allows a materials scientist to query the model for synthesis suggestions of arbitrary materials, including those that the model has not observed before.
In a demonstrative experiment, the CVAE suggests the correct precursors for literature-reported syntheses of two perovskite materials using training data published more than a decade prior to the target syntheses. Thus, the CVAE is used as an additional materials synthesis screening utility that is complementary to techniques driven by density functional theory calculations. Finally, this thesis provides a broad commentary on the status quo for the reporting of written materials synthesis methods, and suggests a new format which improves both human and machine readability. The thesis concludes with comments on promising future directions which may build upon the work described in this document.
by Edward Soo Kim.
Ph. D.
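As a toy illustration of the extraction step (the pipeline described above uses trained NLP models and domain knowledge far beyond this; the sentence and patterns are invented), a pattern-based pass might pull reaction conditions out of a synthesis sentence:

```python
# Toy extraction of synthesis parameters with regular expressions; the
# actual pipeline uses learned models rather than hand-written patterns.
import re

sentence = ("TiO2 powder was treated in 10 M NaOH solution "
            "at 150 C for 24 h to form nanotubes.")

concentrations = re.findall(r"(\d+(?:\.\d+)?)\s*M\s+(\w+)", sentence)
temperatures = re.findall(r"(\d+(?:\.\d+)?)\s*C\b", sentence)
durations = re.findall(r"(\d+(?:\.\d+)?)\s*h\b", sentence)

print(concentrations)  # [('10', 'NaOH')]
print(temperatures)    # ['150']
print(durations)       # ['24']
```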
9

Gutti, Praveen. "Semistructured probabilistic object query language : a query language for semistructured probabilistic data." Lexington, Ky. : [University of Kentucky Libraries], 2007. http://hdl.handle.net/10225/701.

Abstract:
Thesis (M.S.)--University of Kentucky, 2007.
Title from document title page (viewed on April 2, 2008). Document formatted into pages; contains: vii, 42 p. : ill. (some col.). Includes abstract and vita. Includes bibliographical references (p. 39-40).
10

Swain, Bradley Andrew. "Path understanding using geospatial natural language." [Pensacola, Fla.] : University of West Florida, 2009. http://purl.fcla.edu/fcla/etd/WFE0000182.

Abstract:
Thesis (M.S.)--University of West Florida, 2009.
Submitted to the Dept. of Computer Science. Title from title page of source document. Document formatted into pages; contains 45 pages. Includes bibliographical references.
11

余銘龍 and Ming-lung Yu. "Automatic processing of Chinese language bank cheques." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2002. http://hub.hku.hk/bib/B31225548.

12

Aksoy, Baybora, and Ilker Sahin. "Implementation of Data Flow Query Language (DFQL)." Monterey, Calif. : Springfield, Va. : Naval Postgraduate School ; Available from National Technical Information Service, 2001. http://handle.dtic.mil/100.2/ADA389404.

13

Botting, Richard. "Iterative construction of data modelling language semantics." Thesis, Coventry University, 1997. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.362076.

14

Smith, Derrell R. "A Corpus of Second Language Attrition Data." Diss., Brigham Young University, 2007. http://contentdm.lib.byu.edu/ETD/image/etd2196.pdf.

15

Sewczwicz, Richard P. "Form definition language for intelligent data objects." Thesis, Kansas State University, 1986. http://hdl.handle.net/2097/9953.

16

Grinman, Alex J. "Natural language processing on encrypted patient data." Thesis, Massachusetts Institute of Technology, 2016. http://hdl.handle.net/1721.1/113438.

Abstract:
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 85-86).
While many industries can benefit from machine learning techniques for data analysis, they often have neither the technical expertise nor the computational power to do so. Therefore, many organizations would benefit from outsourcing their data analysis. Yet stringent data privacy policies prevent outsourcing sensitive data and may stop the delegation of data analysis in its tracks. In this thesis, we put forth a two-party system where one party capable of powerful computation can run certain machine learning algorithms from the natural language processing domain on the second party's data, where the first party is limited to learning only specific functions of the second party's data and nothing else. Our system provides simple cryptographic schemes for locating keywords, matching approximate regular expressions, and computing frequency analysis on encrypted data. We present a full implementation of this system in the form of an extensible software library and a command line interface. Finally, we discuss a medical case study where we used our system to run a suite of unmodified machine learning algorithms on encrypted free-text patient notes.
by Alex J. Grinman.
M. Eng.
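The keyword-location scheme can be pictured roughly as follows, as a simplified stand-in for the constructions in the thesis: the data owner replaces each token with a keyed pseudorandom tag, so the computing party can locate a queried keyword without seeing any plaintext. This toy version leaks token positions and repetitions; real schemes are more careful:

```python
# Rough sketch of keyword matching on protected text: each token is
# replaced by an HMAC tag under the data owner's key, and the server
# matches the tag of the queried keyword against the tagged note.
import hashlib
import hmac

KEY = b"data-owner-secret-key"  # held by the data owner only

def tag(token: str) -> str:
    return hmac.new(KEY, token.lower().encode(), hashlib.sha256).hexdigest()

note = "patient denies chest pain but reports shortness of breath"
tagged_note = [tag(tok) for tok in note.split()]  # what the server sees

# Server side: locate occurrences of a keyword given only its tag.
query_tag = tag("pain")  # computed by the key holder
positions = [i for i, t in enumerate(tagged_note) if t == query_tag]
print(positions)  # [3]
```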
17

Mårtensson, Christian. "Managing language learning data in mobile apps." Thesis, Luleå tekniska universitet, Institutionen för system- och rymdteknik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-81078.

Abstract:
On the journey of learning a language we are exposed to countless words that are easily forgotten and subsequently difficult to find again. This study investigates how to design a personal data management system that enables its users to efficiently organize, find and input the words and phrases that they encounter on their journey. Using DSRM, an artifact was iteratively developed and tested in usability tests and interviews by a total of 10 participants. The feedback from the respondents indicates a strong demand for this type of app and also uncovered design knowledge in this new context. The contribution of this study is a set of 14 design principles for making data management in language learning apps more user-friendly and efficient.
18

Sahin, Ilker, and Baybora Aksoy. "Implementation of Data Flow Query Language (DFQL)." Thesis, Monterey, California. Naval Postgraduate School, 2001. http://hdl.handle.net/10945/2772.

Abstract:
Approved for public release; distribution is unlimited.
A relational database management system (RDBMS) is a software product that structures data in accordance with the relational data model and permits data manipulation based on relational algebra. There are two widely used query languages for relational database management systems (RDBMSs): Structured Query Language (SQL) and Query By Example (QBE). Although these languages are powerful, they both have drawbacks concerning ease of use, especially in expressing universal quantification and specifying complex nested queries. In order to eliminate these problems, Data Flow Query Language (DFQL) has been proposed. DFQL offers an easy-to-use graphical user interface to the relational model based on a data flow diagram, while maintaining all of the strengths of SQL and QBE. The purpose of this thesis is to implement DFQL, allowing users to log in to one or more relational databases through JDBC, view the structure of the connected databases graphically, and implement inquiries in SQL and DFQL to retrieve the data.
Lieutenant Junior Grade, Turkish Navy
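The ease-of-use drawback mentioned in the abstract is easy to see with universal quantification: asking for suppliers who supply every part requires a double-negated NOT EXISTS in SQL (relational division). A small demonstration using Python's sqlite3, with an invented toy schema:

```python
# "Suppliers who supply EVERY part" in SQL: relational division via
# double negation. Schema and data are invented for illustration.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE part(pid INTEGER PRIMARY KEY);
    CREATE TABLE supplies(sid INTEGER, pid INTEGER);
    INSERT INTO part VALUES (1), (2);
    INSERT INTO supplies VALUES (10, 1), (10, 2), (20, 1);
""")

rows = con.execute("""
    SELECT DISTINCT s.sid
    FROM supplies s
    WHERE NOT EXISTS (          -- no part exists ...
        SELECT * FROM part p
        WHERE NOT EXISTS (      -- ... that this supplier does not supply
            SELECT * FROM supplies s2
            WHERE s2.sid = s.sid AND s2.pid = p.pid))
""").fetchall()
print(rows)  # [(10,)] -- only supplier 10 supplies every part
```

DFQL's claim is that the same query can be expressed as a data flow diagram without this double negation.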
19

Graul, Michael, Ronald Fernandes, John L. Hamilton, Charles H. Jones, and Jon Morgan. "ENHANCEMENTS TO THE DATA DISPLAY MARKUP LANGUAGE." International Foundation for Telemetering, 2006. http://hdl.handle.net/10150/604103.

Abstract:
ITC/USA 2006 Conference Proceedings / The Forty-Second Annual International Telemetering Conference and Technical Exhibition / October 23-26, 2006 / Town and Country Resort & Convention Center, San Diego, California
This paper presents a description of the updated Data Display Markup Language (DDML), a neutral format for data display configurations. The development of DDML is motivated by the fact that in joint service program systems, there is a critical need for common data displays to support distributed T&E missions, irrespective of the test location, data acquisition system, and display system. DDML enables standard data displays to be specified for any given system under test, irrespective of the display vendor or system in which they will be implemented. Version 3.0 of DDML represents a more mature language than version 1.0, presented at the 2003 ITC. The updated version has been validated for completeness and robustness by developing translators between DDML and numerous vendor formats. The DDML schema has been presented to the Range Commanders Council (RCC) Data Multiplex Committee for consideration for inclusion in the IRIG 106 standard. The DDML model will be described in terms of both the XML schema and the UML model, and various examples of DDML models will be presented. The intent of this paper is to solicit specific input from the community on this potential RCC standard.
20

Balunda, Stephanie A. "Teaching academic vocabulary with corpora : student perceptions of data-driven learning." 2009. http://hdl.handle.net/1805/2049.

Abstract:
Thesis (M.A.)--Indiana University, 2009.
Title from screen (viewed on February 1, 2009). Department of English, Indiana University-Purdue University Indianapolis (IUPUI). Advisor(s): Julie A. Belz, Ulla M. Connor, Thomas A. Upton. Includes vitae. Includes bibliographical references (leaves 65-67).
21

羅憲璋 and Hin-cheung Hubert Law. "A language model for mandarin Chinese." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 1997. http://hub.hku.hk/bib/B29913391.

22

張少能 and Siu-nang Bruce Cheung. "A theory of automatic language acquisition." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 1994. http://hub.hku.hk/bib/B31233521.

23

Shutova, Ekaterina. "Computational approaches to figurative language." Thesis, University of Cambridge, 2011. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.609681.

24

Erozel, Guzen. "Natural Language Interface On A Video Data Model." Master's thesis, METU, 2005. http://etd.lib.metu.edu.tr/upload/12606251/index.pdf.

Abstract:
The video databases and retrieval of data from these databases have become popular in various business areas with the improvements in technology. As a kind of video database, video archive systems need user-friendly interfaces to retrieve video frames. In this thesis, an NLP-based user interface to a video database system is developed using a content-based spatio-temporal video data model. The data model is focused on the semantic content, which includes objects, activities, and spatial properties of objects. Spatio-temporal relationships between video objects and also trajectories of moving objects can be queried with this data model. In this video database system, the NL interface enables flexible querying. The queries, which are given as English sentences, are parsed using Link Parser. Not only exact matches but also similar objects and activities are returned from the database with the help of the conceptual ontology module, so that all related frames are returned to the user. This module is implemented using a distance-based method of semantic similarity search on the semantic domain-independent ontology WordNet. The semantic representations of the given queries are extracted from their syntactic structures using information extraction techniques. The extracted semantic representations are used to call the related parts of the underlying spatio-temporal video data model to calculate the results of the queries.
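The conceptual ontology module's distance-based similarity can be pictured with NLTK's WordNet interface; this is a stand-in sketch, not the thesis code, and it assumes the WordNet corpus has been downloaded:

```python
# Sketch of distance-based semantic similarity on WordNet, in the spirit
# of the conceptual ontology module described above.
# One-time setup: import nltk; nltk.download("wordnet")
from nltk.corpus import wordnet as wn

query_word, candidate = "car", "truck"
s1 = wn.synsets(query_word, pos=wn.NOUN)[0]
s2 = wn.synsets(candidate, pos=wn.NOUN)[0]

# Path similarity shrinks with distance in the hypernym hierarchy; a
# threshold decides whether frames about "truck" answer a "car" query.
score = s1.path_similarity(s2)
print(score)
if score and score > 0.3:  # threshold chosen arbitrarily for the sketch
    print(f"treat '{candidate}' as similar to '{query_word}'")
```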
25

Massey, Kiran Angelina. "Standardizing our perinatal language to facilitate data sharing." Thesis, University of British Columbia, 2008. http://hdl.handle.net/2429/3231.

Abstract:
Our ultimate goal as obstetric and neonatal care providers is to improve care for mothers and their babies. Continuous quality improvement (CQI) involves iterative cycles of practice change and audit of ongoing clinical care, identifying practices that are associated with good outcomes. A vital prerequisite to this evidence-based medicine is data collection. In Canada, much of the country is covered by separate, fragmented silos known as regional reproductive care databases or perinatal health programs. A more centralized system that includes collaborative efforts is required. Moving in this direction would serve many purposes: efficiency, economy in the setting of limited resources and shrinking budgets, and, lastly, interaction among data collection agencies. This interaction may facilitate the translation and transfer of knowledge to care-givers and patients. There are, however, many barriers to such collaborative efforts, including privacy, ownership and the standardization of both digital technologies and semantics. After a thorough examination of existing perinatal data collection among Perinatal Health Programs (PHPs) and the Canadian Perinatal Network (CPN) database, it was evident that there is little standardization of definitions, which is one of the most important barriers to data sharing. To communicate effectively and share data, researchers and clinicians alike must construct a common perinatal language. Communicative tools and programs such as SNOMED CT® offer a potential solution, but still require much work due to their infancy. A standardized perinatal language would not only lay the definitional foundation in women’s health and obstetrics but also serve as a major contribution towards a universal electronic health record.
26

Morgan, Juston. "Visual language for exploring massive RDF data sets." Pullman, Wash. : Washington State University, 2010. http://www.dissertations.wsu.edu/Thesis/Spring2010/J_Morgan_041210.pdf.

Abstract:
Thesis (M.S. in computer science)--Washington State University, May 2010.
Title from PDF title page (viewed on July 12, 2010). "School of Engineering and Computer Science." Includes bibliographical references (p. 33-34).
27

Duran, Randall E. (Randall Eugene). "Reengineering using a data abstraction based specification language." Thesis, Massachusetts Institute of Technology, 1992. http://hdl.handle.net/1721.1/33155.

Abstract:
Thesis (M.S.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1992.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Includes bibliographical references (leaves 86-88).
by Randall E. Duran.
M.S.
28

Wolf, Florian 1975. "Coherence in natural language : data structures and applications." Thesis, Massachusetts Institute of Technology, 2004. http://hdl.handle.net/1721.1/28854.

Abstract:
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Brain and Cognitive Sciences, February 2005.
Includes bibliographical references (leaves [143]-148).
The general topic of this thesis is coherence in natural language, where coherence refers to informational relations that hold between segments of a discourse. More specifically, this thesis aims to (1) develop criteria for a descriptively adequate data structure for representing discourse coherence; (2) test the influence of coherence on psycholinguistic processes, in particular, pronoun processing; (3) test the influence of coherence on the relative saliency of discourse segments in a text. In order to address the first aim, a method was developed for hand-annotating a database of naturally occurring texts for coherence structures. The database of coherence structures obtained in this way was used to test assumptions about descriptively adequate data structures for representing discourse coherence. In particular, the assumption that discourse coherence can be represented in trees was tested, and results suggest that more powerful data structures than trees are needed (labeled chain graphs, where the labels represent types of coherence relations, and an ordered array of nodes represents the temporal order of discourse segments in a text). The second aim was addressed in an on-line comprehension and an off-line production experiment. Results from both experiments suggest that only a coherence-based account predicted the full range of observed data. In that account, the observed preferences in pronoun processing are not a result of pronoun-specific mechanisms, but a byproduct of more general cognitive mechanisms that operate when establishing coherence. In order to address the third aim, layout-, word-, and coherence-based approaches to discourse segment ranking were compared to human rankings. Results suggest that word-based accounts provide a strong baseline, and that some coherence-based approaches best predict the human data. However, coherence-based algorithms that operate on trees did not perform as well as coherence-based algorithms that operate on more general graphs. It is suggested that this might in part be due to the fact that more general graphs are more descriptively adequate than trees for representing discourse coherence.
by Florian Wolf.
Ph.D.
29

Mrkšić, Nikola. "Data-driven language understanding for spoken dialogue systems." Thesis, University of Cambridge, 2018. https://www.repository.cam.ac.uk/handle/1810/276689.

Abstract:
Spoken dialogue systems provide a natural conversational interface to computer applications. In recent years, the substantial improvements in the performance of speech recognition engines have helped shift the research focus to the next component of the dialogue system pipeline: the one in charge of language understanding. The role of this module is to translate user inputs into accurate representations of the user goal in the form that can be used by the system to interact with the underlying application. The challenges include the modelling of linguistic variation, speech recognition errors and the effects of dialogue context. Recently, the focus of language understanding research has moved to making use of word embeddings induced from large textual corpora using unsupervised methods. The work presented in this thesis demonstrates how these methods can be adapted to overcome the limitations of language understanding pipelines currently used in spoken dialogue systems. The thesis starts with a discussion of the pros and cons of language understanding models used in modern dialogue systems. Most models in use today are based on the delexicalisation paradigm, where exact string matching supplemented by a list of domain-specific rephrasings is used to recognise users' intents and update the system's internal belief state. This is followed by an attempt to use pretrained word vector collections to automatically induce domain-specific semantic lexicons, which are typically hand-crafted to handle lexical variation and account for a plethora of system failure modes. The results highlight the deficiencies of distributional word vectors which must be overcome to make them useful for downstream language understanding models. The thesis next shifts focus to overcoming the language understanding models' dependency on semantic lexicons. To achieve that, the proposed Neural Belief Tracking (NBT) model forsakes the use of standard one-hot n-gram representations used in Natural Language Processing in favour of distributed representations of user utterances, dialogue context and domain ontologies. The NBT model makes use of external lexical knowledge embedded in semantically specialised word vectors, obviating the need for domain-specific semantic lexicons. Subsequent work focuses on semantic specialisation, presenting an efficient method for injecting external lexical knowledge into word vector spaces. The proposed Attract-Repel algorithm boosts the semantic content of existing word vectors while simultaneously inducing high-quality cross-lingual word vector spaces. Finally, NBT models powered by specialised cross-lingual word vectors are used to train multilingual belief tracking models. These models operate across many languages at once, providing an efficient method for bootstrapping language understanding models for lower-resource languages with limited training data.
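The semantic specialisation step can be caricatured in a few lines. This is only a cartoon of the Attract-Repel idea, not the published algorithm (which optimises a margin-based objective): pull synonym vectors together, push antonym vectors apart, and renormalise:

```python
# Cartoon of semantic specialisation: nudge synonyms together, antonyms
# apart, keep vectors on the unit sphere. Random vectors stand in for
# pretrained embeddings; the real Attract-Repel algorithm differs.
import numpy as np

rng = np.random.default_rng(0)
vecs = {w: rng.normal(size=8) for w in ["cheap", "inexpensive", "expensive"]}

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

synonyms = [("cheap", "inexpensive")]
antonyms = [("cheap", "expensive")]

for _ in range(50):
    for a, b in synonyms:  # attract: move the pair together
        vecs[a] += 0.1 * (vecs[b] - vecs[a])
        vecs[b] += 0.1 * (vecs[a] - vecs[b])
    for a, b in antonyms:  # repel: move a away from b
        vecs[a] -= 0.05 * (vecs[b] - vecs[a])
    for w in vecs:         # renormalise
        vecs[w] /= np.linalg.norm(vecs[w])

print(cos(vecs["cheap"], vecs["inexpensive"]))  # pushed towards 1
print(cos(vecs["cheap"], vecs["expensive"]))    # pushed down
```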
30

Jäkel, Tobias, Thomas Kühn, Hannes Voigt, and Wolfgang Lehner. "RSQL - a query language for dynamic data types." ACM, 2014. https://tud.qucosa.de/id/qucosa%3A75118.

Abstract:
Database Management Systems (DBMS) are used by software applications to store, manipulate, and retrieve large sets of data. However, the requirements of current software systems pose various challenges to established DBMS. First, most software systems organize their data by means of objects rather than relations, leading to increased maintenance, redundancy, and transformation overhead when persisting objects to relational databases. Second, complex objects are separated into several objects, resulting in Object Schizophrenia and hard-to-persist Distributed State. Last but not least, current software systems have to cope with increased complexity and changes. These challenges have led to a general paradigm shift in the development of software systems. Unfortunately, classical DBMS will become intractable if they are not adapted to the new requirements imposed by these software systems. As a result, we propose an extension of DBMS with roles to represent complex objects within a relational database and support the flexibility required by current software systems. To achieve this goal, we introduce RSQL, an extension to SQL with the concept of objects playing roles when interacting with other objects. Additionally, we present a formal model for the logical representation of roles in the extended DBMS.
31

Akrin, Christoffer, and Simon Tham. "A Natural Language Interface for Querying Linked Data." Thesis, Karlstads universitet, Institutionen för matematik och datavetenskap (from 2013), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kau:diva-78921.

Abstract:
The thesis introduces a proof of concept idea that could spark great interest from many industries. The idea consists of a remote Natural Language Interface (NLI), for querying Knowledge Bases (KBs). The system applies natural language technology tools provided by the Stanford CoreNLP, and queries KBs with the use of the query language SPARQL. Natural Language Processing (NLP) is used to analyze the semantics of a question written in natural language, and generates relational information about the question. With correctly defined relations, the question can be queried on KBs containing relevant Linked Data. The Linked Data follows the Resource Description Framework (RDF) model by expressing relations in the form of semantic triples: subject-predicate-object. With our NLI, any KB can be understood semantically. By providing correct training data, the AI can learn to understand the semantics of the RDF data stored in the KB. The ability to understand the RDF data allows for the process of extracting relational information from questions about the KB. With the relational information, questions can be translated to SPARQL and be queried on the KB.
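The last stage (turning an extracted relation into SPARQL and running it over subject-predicate-object triples) can be shown with rdflib on toy data; the thesis system targets remote knowledge bases and uses Stanford CoreNLP for the analysis stage:

```python
# Toy end stage of the NLI: an extracted relation becomes a SPARQL query
# over RDF triples. Data and namespace are invented for the example.
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/")
g = Graph()
g.add((EX.Stockholm, EX.capitalOf, EX.Sweden))
g.add((EX.Oslo, EX.capitalOf, EX.Norway))

# "What is the capital of Sweden?" -> relation capitalOf(?x, ex:Sweden)
q = """
    PREFIX ex: <http://example.org/>
    SELECT ?x WHERE { ?x ex:capitalOf ex:Sweden . }
"""
for row in g.query(q):
    print(row.x)  # http://example.org/Stockholm
```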
32

Berman, Sonia. "P-Pascal : a data-oriented persistent programming language." Doctoral thesis, University of Cape Town, 1991. http://hdl.handle.net/11427/17084.

Abstract:
Bibliography: pages 187-199.
Persistence is measured by the length of time an object is retained and is usable in a system. Persistent languages extend general purpose languages by providing the full range of persistence for data of any type. Moreover, data which remains on disk after program termination is manipulated in the same way as transient data. As these languages are based on general purpose programming languages, they tend to be program-centred rather than data-centred. This thesis investigates the inclusion of data-oriented features in a persistent programming language. P-Pascal, a Persistent Pascal, has been designed and implemented to develop techniques for data clustering, metadata maintenance, security enforcement and bulk data management. It introduces type completeness to Pascal and in particular shows how a type-complete set constructor can be provided. This type is shown to be a practical and versatile mechanism for handling bulk data collections in a persistent environment. Relational algebra operators are provided and the automatic optimisation of set expressions is performed by the compiler and the runtime system. The P-Pascal Abstract Machine incorporates two complementary data placement strategies, automatic updating of type information, and metadata query facilities. The protection of data types, primary (named) objects and their individual components is supported. The challenges and opportunities presented by the persistent store organisation are discussed, and techniques for efficiently exploiting these properties are proposed. We also describe the effects on a data-oriented system of treating persistent and transient data alike, so that they cannot be distinguished statically. We conclude that object clustering, metadata maintenance and security enforcement can and should be incorporated in persistent programming languages. The provision of a built-in, type-complete bulk data constructor and its non-procedural operators is demonstrated. We argue that this approach is preferable to engineering such objects on top of a language, because of greater ease of use and considerable opportunity for automatic optimisation. The existence of such a type does not preclude programmers from constructing their own bulk objects using other types - this is but one advantage of a persistent language over a database system.
33

Eccles, Lee H. "DESCRIPTION LANGUAGE FOR A SELF-DESCRIBING DATA SYSTEM." International Foundation for Telemetering, 2004. http://hdl.handle.net/10150/605340.

Abstract:
International Telemetering Conference Proceedings / October 18-21, 2004 / Town & Country Resort, San Diego, California
Flight test data systems have in the past been set up by experts using ground-based computer systems. In the future it will be possible to give the system a list of parameters to be measured on a given test and have the data acquisition system return the information necessary to process the data. Several things are leading systems in this direction. Recorders are beginning to record metadata along with the data on the same media. The IRIG 106 Chapter 10 recorder specification requires that a TMATS file be stored on the media with the data so that the data can be processed by any system. The TMATS file is metadata. However, the TMATS file still needs to be generated by conventional means. Another factor leading us in this direction is the advent of network-based data acquisition systems. This will allow much simpler algorithms to be used to format the data and remove some of the reliance on experts to accomplish this task. What this paper discusses is preliminary work toward using an XML-based approach to having the system generate the setup information. The result will be an XML Schema. This can then be used by microprocessors in the data acquisition system to create a record for each measurement that can then be used to process the data.
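A flavor of the idea, a data acquisition system emitting a self-describing record per measurement, in a few lines of XML generation; the element and attribute names are invented for illustration, not taken from the proposed schema:

```python
# Sketch of a self-describing measurement record as XML. The paper's
# actual contribution is an XML Schema; these element names are made up.
import xml.etree.ElementTree as ET

meas = ET.Element("Measurement", name="LeftWingStrain01")
ET.SubElement(meas, "Units").text = "microstrain"
ET.SubElement(meas, "SampleRate", units="Hz").text = "256"
ET.SubElement(meas, "Location").text = "PCM frame 3, word 17"

ET.indent(meas)  # pretty-print (Python 3.9+)
print(ET.tostring(meas, encoding="unicode"))
```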
APA, Harvard, Vancouver, ISO, and other styles
34

Charbonneau-Gowdy, Paula. "Forbidden fruit : identity, power and investment issues in learning a second language through computer mediated communication." Thesis, McGill University, 2005. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=100334.

Full text
Abstract:
In this inquiry, I use ethnographic research methods to uncover the tensions that a selected group of military officers and students from Central and Eastern Europe and Asia experienced learning English in Canada and in Europe. In both settings, I use a Participatory Action Research (PAR) approach to the inquiry to critically explore with the participants their experiences using computers for second language learning. We negotiate changes to their current perceptions of computer-assisted language learning (CALL) through the use of computer-mediated communication (CMC). This communication involved writing-based exchanges at the Canadian site and the use of state-of-the-art audio-video transfer technology in a multi-site videoconferencing setting with Europe. The study took place between 2001 and 2004. During the four phases of the study, I collected data through observations of online interchanges, collaborative dialogic interviews and participants' written texts in the form of journals and e-mails. Other important data sources included videotapes and field notes taken at the Canadian site and during three field trips to the European sites. I draw on Vygotsky's socio-cultural approach to language, Bakhtin's concept of learning as dialogic and Weedon's notion of identity as dynamic, constructed and contested through Discourses. The work of these three theorists helps to frame my understanding of the historical, political, cultural, pedagogical and personal influences on this multicultural group of English language learners as they negotiated their learning in a unique setting. The participants' stories suggest that video-based computer technology not only supported some of their investment in using their second language orally but also enabled them to construct more powerful subjectivities. The identity construction that took place in English online is an important consideration for these individuals from evolving democracies that are struggling for international connection and recognition. I argue that more stories need to be told so that SL researchers can re-examine their understanding and theories of language learning and communicative practices to include computer technology. I suggest that stories such as these also have important implications for learners, educators and policy makers as they consider their teaching and learning practices with computers in their second language learning contexts.
APA, Harvard, Vancouver, ISO, and other styles
35

Alahmadi, Marwan Ibrahim. "Optimizing data parallelism in applicative languages." Diss., Georgia Institute of Technology, 1990. http://hdl.handle.net/1853/8457.

Full text
APA, Harvard, Vancouver, ISO, and other styles
36

Parsons, M. S. "Applicative languages and graphical data structures." Thesis, University of Kent, 1987. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.379988.

Full text
APA, Harvard, Vancouver, ISO, and other styles
37

Harris, Earl Stavis. "Data Constraints in Function-Oriented Languages." W&M ScholarWorks, 1990. https://scholarworks.wm.edu/etd/1539625592.

Full text
APA, Harvard, Vancouver, ISO, and other styles
38

Jouret, Guido Karel. "Exploiting data-parallelism in functional languages." Thesis, Imperial College London, 1991. http://hdl.handle.net/10044/1/46852.

Full text
APA, Harvard, Vancouver, ISO, and other styles
39

Maksimovic, Gordana. "Query Languages for Semi-structured Data." Thesis, Blekinge Tekniska Högskola, Institutionen för programvaruteknik och datavetenskap, 2003. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-4332.

Full text
Abstract:
Semi-structured data is defined as irregular data whose structure may change rapidly or unpredictably. An example of such data can be found inside the World-Wide Web. Since the data is irregular, the user may not know the complete structure of the database, so querying such data becomes difficult. In order to write meaningful queries on semi-structured data, there is a need for a query language that supports the features presented by this data. Standard query languages, such as SQL for relational databases and OQL for object databases, are too constraining for querying semi-structured data, because they require data to conform to a fixed schema before any data is stored in the database. This paper introduces Lorel, a query language developed particularly for querying semi-structured data. Furthermore, it investigates whether the standardised query languages support any of the criteria presented for semi-structured data. The result is an evaluation of three query languages (SQL, OQL and Lorel) against these criteria.
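To make the contrast concrete, here is a toy Python evaluator, not Lorel itself (Lorel's actual syntax and semantics are richer), for Lorel-flavoured path queries over schemaless objects. Note how it tolerates records that lack a field or have a different shape, which a fixed SQL schema would reject; the sample data is invented:

```python
# Sketch of path-based querying over irregular data, in the spirit of a
# Lorel query such as: select X.restaurant.name where X.restaurant.zip = 92310

def follow(obj, path):
    """Yield values reached by a path; silently skip missing steps."""
    if not path:
        yield obj
        return
    head, *rest = path
    if isinstance(obj, dict) and head in obj:
        yield from follow(obj[head], rest)
    elif isinstance(obj, list):
        for item in obj:
            yield from follow(item, path)

db = [
    {"restaurant": {"name": "Chef Chu", "zip": 92310}},
    {"restaurant": {"name": "Saigon", "address": "170 Main"}},  # no zip field
    {"cafe": {"name": "Blue Cup"}},                             # different shape
]

names = [
    name
    for entry in db
    for name in follow(entry, ["restaurant", "name"])
    if any(z == 92310 for z in follow(entry, ["restaurant", "zip"]))
]
print(names)  # -> ['Chef Chu']
```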
APA, Harvard, Vancouver, ISO, and other styles
40

Woo, Ka-hei Michelle, and 胡嘉熙. "An analysis of gender and discourse with reference to data from the Hong Kong International Corpus of English." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 1999. http://hub.hku.hk/bib/B31952495.

Full text
APA, Harvard, Vancouver, ISO, and other styles
41

Hjelm, Hans. "Cross-language Ontology Learning : Incorporating and Exploiting Cross-language Data in the Ontology Learning Process." Doctoral thesis, Stockholms universitet, Institutionen för lingvistik, 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-8414.

Full text
Abstract:
An ontology is a knowledge-representation structure, where words, terms or concepts are defined by their mutual hierarchical relations. Ontologies are becoming ever more prevalent in the world of natural language processing, where we currently see a tendency towards using semantics for solving a variety of tasks, particularly tasks related to information access. Ontologies, taxonomies and thesauri (all related notions) are also used in various forms by humans, to standardize business transactions or to find conceptual relations between terms in, e.g., the medical domain. The acquisition of machine-readable, domain-specific semantic knowledge is time-consuming and prone to inconsistencies. The field of ontology learning therefore provides tools for automating the construction of domain ontologies (ontologies describing the entities and relations within a particular field of interest) by analyzing large quantities of domain-specific texts. This thesis studies three main topics within the field of ontology learning. First, we examine which sources of information are useful within an ontology learning system and how the information sources can be combined effectively. Secondly, we do this with a special focus on cross-language text collections, to see if we can learn more from studying several languages at once than we can from a single-language text collection. Finally, we investigate new approaches to formal and automatic evaluation of the quality of a learned ontology. We demonstrate how to combine information sources from different languages and use them to train automatic classifiers to recognize lexico-semantic relations. The cross-language data is shown to have a positive effect on the quality of the learned ontologies. We also give theoretical and experimental results showing that our ontology evaluation method is a good complement to, and in some aspects improves on, the evaluation measures in use today.
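The classifier-training step can be pictured with a minimal sketch under assumed inputs: evidence for a candidate term pair is computed per language, concatenated into one feature vector, and fed to a standard classifier. The feature values and term pairs below are fabricated for illustration; the thesis's real features and corpora are far richer:

```python
# Sketch: combining per-language evidence to classify lexico-semantic
# relations (here, hypernymy). All numbers are invented.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row: [cooccurrence_en, pattern_hits_en, cooccurrence_sv, pattern_hits_sv]
X = np.array([
    [0.8, 3, 0.7, 2],   # ("dog", "animal")  -> hypernym pair
    [0.1, 0, 0.2, 0],   # ("dog", "carrot")  -> unrelated
    [0.9, 4, 0.8, 3],   # ("car", "vehicle") -> hypernym pair
    [0.2, 0, 0.1, 1],   # ("car", "river")   -> unrelated
])
y = np.array([1, 0, 1, 0])  # 1 = hypernymy holds

clf = LogisticRegression().fit(X, y)
print(clf.predict([[0.7, 2, 0.6, 2]]))  # likely classified as a hypernym pair
```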
To order the book, send an e-mail to exp@ling.su.se
APA, Harvard, Vancouver, ISO, and other styles
42

Guven, Ahmet. "Speeding up a path-based policy language compiler." Thesis, Monterey, Calif. : Springfield, Va. : Naval Postgraduate School ; Available from National Technical Information Service, 2003. http://library.nps.navy.mil/uhtbin/hyperion-image/03Mar%5FGuven.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
43

Tatarinov, Igor. "Semantic data sharing with a peer data management system /." Thesis, Connect to this title online; UW restricted, 2004. http://hdl.handle.net/1773/6942.

Full text
APA, Harvard, Vancouver, ISO, and other styles
44

Kakavandy, Hanna, and John Landeholt. "How natural language processing can be used to improve digital language learning." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-281693.

Full text
Abstract:
The world is facing globalization, and with it companies are growing and need to hire according to their needs. A great obstacle to this is the language barrier between job applicants and employers who want to hire competent candidates. One spark of light in this challenge is Lingio, which provides a digital product that teaches profession-specific Swedish. Lingio intends to make its existing product more interactive, and this paper researches aspects involved in that. The study evaluates system utterances that are planned to be used in Lingio's product for language learners to practise with, and studies the feasibility of using the natural language processing measure cosine similarity to classify the correctness of answers to these utterances. The report also examines whether it is best to use crowd-sourced material or a golden standard as the benchmark for a correct answer. The results indicate that a number of improvements and developments need to be made to the model for it to classify answers accurately, owing to its formulation and the complexity of human language. It is also concluded that the utterances by Lingio might need to be further developed in order to be efficient for language learning, and that crowd-sourced material works better than a golden standard. The study makes several interesting observations from the collected data and analysis, aiming to contribute to further research in natural language engineering when it comes to text classification and digital language learning.
Globalization brings several consequences for growing companies. One of the challenges companies face is hiring enough competent staff. For many companies, the language barrier stands between them and the competence they want to hire; job seekers often lack the language skills needed to manage the job. Lingio works on exactly this problem: its product is a digital application that teaches profession-specific Swedish, an effective solution for those who want to focus their language learning toward a job. The aim is to help Lingio develop its product, more specifically to make it more interactive. This is done by examining the effectiveness of the utterances the application uses for learning purposes and by using a natural language processing model to classify a user's answer to an utterance. Furthermore, it is analysed whether it is best to use a golden standard or material collected through surveys as the reference point for a correct utterance. The results show that the model has several weaknesses and needs further development to classify answers correctly, and that there is room for improvement in the utterances. It is also shown that material collected through surveys works better than a golden standard.
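A minimal sketch of the classification approach under study, with assumed details: the learner's answer and one or more reference answers are vectorized with TF-IDF, and the answer is accepted if its cosine similarity to any reference clears a threshold. The reference sentences and the 0.5 threshold below are invented for illustration, not taken from the study:

```python
# Sketch: cosine-similarity-based answer checking against reference answers.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

references = [
    "Jag hjälper patienten att resa sig ur sängen.",
    "Jag hjälper patienten upp ur sängen.",
]
answer = "Jag hjälper patienten att komma upp ur sängen."

# Fit one vector space over references and answer; last row is the answer.
vectors = TfidfVectorizer().fit_transform(references + [answer])
scores = cosine_similarity(vectors[-1], vectors[:-1])
print("correct" if scores.max() >= 0.5 else "incorrect", scores)
```

One weakness the thesis's results hint at is visible even here: a surface-overlap measure like this rewards shared words rather than shared meaning, so a fluent paraphrase can score lower than a wrong answer that reuses the reference vocabulary.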
APA, Harvard, Vancouver, ISO, and other styles
45

Widerberg, Ernst. "A Modeling Language for Timed Automata." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-291554.

Full text
Abstract:
This work details the design and implementation of a modeling language for timed automata. The primary intended use of the language, TML, is as an interface to the controller synthesis system m2mc, which is being developed in a current KTH/Chalmers research project. TML is evaluated by a qualitative comparison with the modeling languages of two well-known model checking tools: Uppaal and Kronos. Two example systems (Fischer's mutual exclusion protocol and CSMA/CD) are implemented in all three languages to discover the relative merits of each language. Although not as feature-rich as Uppaal, TML brings some new language features which are found to be potentially useful for modeling timed automata systems. These features are largely adopted from the general graph description language Dot, used by programs in the Graphviz software package. As m2mc is still in early development and liable to change, an intermediate JSON representation for timed automata is defined. A compiler targeting this intermediate representation is implemented using Miking, a new compiler tool under development in a separate KTH project. Further compilation from JSON to Uppaal is implemented as a proof of concept.
This work covers the design and implementation of a modeling language for timed automata. The primary intended application of the language TML is to serve as a user interface for the controller synthesis system m2mc, which is being developed in an ongoing research project at KTH and Chalmers. TML is evaluated through a qualitative comparison with the modeling languages of two well-known model checking tools: Uppaal and Kronos. Two example systems (Fischer's mutual exclusion protocol and CSMA/CD) are implemented in each modeling language to examine the languages' relative advantages and disadvantages. Although TML is not as extensive in functionality as Uppaal, it contributes some new features which, based on the evaluation, are considered potentially useful for modeling timed automaton systems. These features are largely drawn from the language Dot, used in the Graphviz software package to model general graphs. Since m2mc is in early development, direct integration with TML would not be practically useful; instead, an intermediate format for timed automata is defined in JSON. A TML compiler producing this intermediate format is implemented using Miking, a new compiler tool under development in a separate KTH project. As a proof of concept, further compilation from JSON to Uppaal is implemented.
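The abstract does not spell out the intermediate format, so the following is only a guessed-at JSON encoding of a tiny timed automaton (one clock, one guarded edge with a reset), meant to make the compilation target concrete; all field names are hypothetical, not the thesis's actual schema:

```python
# Sketch of a plausible JSON intermediate representation for a timed
# automaton. Field names are assumptions for illustration only.
import json

automaton = {
    "clocks": ["x"],
    "locations": [
        {"id": "idle", "initial": True,  "invariant": None},
        {"id": "busy", "initial": False, "invariant": "x <= 5"},
    ],
    "edges": [
        # Fire when clock x has reached 2; reset x on entry to "busy".
        {"from": "idle", "to": "busy", "guard": "x >= 2",
         "action": "start", "resets": ["x"]},
    ],
}
print(json.dumps(automaton, indent=2))
```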
APA, Harvard, Vancouver, ISO, and other styles
46

Smith, Sydney. "Approaches to Natural Language Processing." Scholarship @ Claremont, 2018. http://scholarship.claremont.edu/cmc_theses/1817.

Full text
Abstract:
This paper explores topic modeling through the example text of Alice in Wonderland. It explores both singular value decomposition (SVD) and non-negative matrix factorization (NMF) as methods for feature extraction. The paper goes on to explore methods for partially supervised implementation of topic modeling through introducing themes. A large portion of the paper also focuses on implementing these techniques in Python, as well as visualizing the results using a combination of Python, HTML and JavaScript along with the D3 framework. The paper concludes by presenting a mixture of SVD, NMF and partially supervised NMF as a possible way to improve topic modeling.
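As a compact illustration of the two factorizations the paper compares, both can be run over the same TF-IDF document-term matrix. The toy corpus below stands in for the Alice in Wonderland text, which is not reproduced here, and the component counts are chosen arbitrarily:

```python
# Sketch: extracting topics with SVD and NMF from a small corpus.
from sklearn.decomposition import NMF, TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "alice followed the white rabbit down the hole",
    "the queen of hearts shouted off with her head",
    "alice drank from the bottle and grew smaller",
    "the cards painted the white roses red for the queen",
]
vec = TfidfVectorizer(stop_words="english")
dtm = vec.fit_transform(docs)           # document-term matrix
terms = vec.get_feature_names_out()

for name, model in [("SVD", TruncatedSVD(n_components=2)),
                    ("NMF", NMF(n_components=2, init="nndsvd"))]:
    model.fit(dtm)
    for i, comp in enumerate(model.components_):
        top = terms[comp.argsort()[-3:][::-1]]  # three strongest terms
        print(name, "topic", i, ":", ", ".join(top))
```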
APA, Harvard, Vancouver, ISO, and other styles
47

Strycharz, Theodore M. "Analysis of Defense Language Institute automated student questionnaire data." Thesis, Monterey, Calif. : Springfield, Va. : Naval Postgraduate School ; Available from National Technical Information Service, 1996. http://handle.dtic.mil/100.2/ADA319856.

Full text
Abstract:
Thesis (M.S. in Operations Research), Naval Postgraduate School, September 1996.
Thesis advisor: H.J. Larson. Includes bibliographical references (p. 39). Also available online.
APA, Harvard, Vancouver, ISO, and other styles
48

Wendelborn, Andrew Lawrence. "Data flow implementations of a lucid-like programming language." Title page, contents and summary only, 1985. http://web4.library.adelaide.edu.au/theses/09PH/09phw471.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
49

Marquez, Gabriel L. "Refactoring for paradigm change in the interactive data language." To access this resource online via ProQuest Dissertations and Theses @ UTEP, 2007. http://0-proquest.umi.com.lib.utep.edu/login?COPT=REJTPTU0YmImSU5UPTAmVkVSPTI=&clientId=2515.

Full text
APA, Harvard, Vancouver, ISO, and other styles
50

Sadeghi, R. "A database query language for operations on historical data." Thesis, University of Abertay Dundee, 1987. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.378932.

Full text
APA, Harvard, Vancouver, ISO, and other styles