Dissertations / Theses on the topic 'Language Technology (Computational Linguistics)'

Consult the top 50 dissertations / theses for your research on the topic 'Language Technology (Computational Linguistics).'

1

Stanley, Theban. "A robust architecture for human language technology systems." Master's thesis, Mississippi State : Mississippi State University, 2006. http://sun.library.msstate.edu/ETD-db/ETD-browse/browse.

2

Holmqvist, Maria. "Word Alignment by Re-using Parallel Phrases." Licentiate thesis, Linköping University, Linköping University, NLPLAB - Natural Language Processing Laboratory, 2008. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-15463.

Abstract:
In this thesis we present the idea of using parallel phrases for word alignment. Each parallel phrase is extracted from a set of manual word alignments and contains a number of source and target words and their corresponding alignments. If a parallel phrase matches a new sentence pair, its word alignments can be applied to the new sentence. There are several advantages of using phrases for word alignment. First, longer text segments include more context and will be more likely to produce correct word alignments than shorter segments or single words. More importantly, the use of longer phra
3

Jansche, Martin. "Inference of string mappings for speech technology." Columbus, Ohio : Ohio State University, 2003. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1061209163.

Abstract:
Thesis (Ph. D.)--Ohio State University, 2003. Title from first page of PDF file. Document formatted into pages; contains xv, 268 p.; also includes graphics. Includes abstract and vita. Advisor: Chris Brew, Dept. of Linguistics. Includes bibliographical references (p. 252-266) and index.
4

Laws, Mark R. "Maori language integration in the age of information technology: a computational approach." University of Otago. Department of Information Science, 2001. http://adt.otago.ac.nz./public/adt-NZDU20070517.123300.

Abstract:
A multidisciplinary approach combining language universals, linguistic discourse analysis and computer information technology supports the descriptive nature of this research dissertation. Comparative methods are used to determine rudimentary language structures that reflect both the scientific and historic parameters embedded in all languages. From a hypothesis to the proof of concept, a multitude of computer applications have been used to test these language models, templates and frameworks. To encapsulate this entire approach, it is best described as "designing
5

Casula, Camilla. "Transfer Learning for Multilingual Offensive Language Detection with BERT." Thesis, Uppsala universitet, Institutionen för lingvistik och filologi, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-412450.

Abstract:
The popularity of social media platforms has led to an increase in user-generated content being posted on the Internet. Users, masked behind what they perceive as anonymity, can express offensive and hateful thoughts on these platforms, creating a need to detect and filter abusive content. Since the amount of data available on the Internet is impossible to analyze manually, automatic tools are the most effective choice for detecting offensive and abusive messages. Academic research on the detection of offensive language on social media has been on the rise in recent years, with more and more s
6

Kotsifas, Dimitrios. "Intonation and sentence type interpretation in Greek : A production and perception approach." Thesis, University of Skövde, School of Humanities and Informatics, 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-2960.

Abstract:
This thesis examines the intonation patterns of Modern Greek with regard to different interpretations of the sentence types (declarative, interrogative, imperative). 14 utterances are produced by Greek native speakers (2 men and 2 women) so as to express various speech acts: STATEMENT, QUESTION, COMMAND and REQUEST. The acquisition of the F0 curve for each utterance by means of the Wavesurfer tool leads to an analysis of the pitch movements and their alignments. After the F0 curves are analyzed and illustrated using the Excel program, we are able to compare and group them.
7

Mollevik, Johan. "Natural language interfaces over spatial data : investigations in scalability, extensibility and reliability." Licentiate thesis, Umeå universitet, Institutionen för datavetenskap, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-87705.

8

Perkins, Drew. "Separating the Signal from the Noise: Predicting the Correct Entities in Named-Entity Linking." Thesis, Uppsala universitet, Institutionen för lingvistik och filologi, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-412556.

Abstract:
In this study, I constructed a named-entity linking system that maps between contextual word embeddings and knowledge graph embeddings to predict correct entities. To establish a named-entity linking system, I first applied named-entity recognition to identify the entities of interest. I then performed candidate generation via locality-sensitive hashing (LSH), where a candidate group of potential entities was created for each identified entity. Afterwards, my named-entity disambiguation component was applied to select the most probable candidate. By concatenating contextual word embedding
9

Yoon, Hyunsook. "An investigation of students' experiences with corpus technology in second language academic writing." Ohio State University, 2005. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1109806353.

Abstract:
Thesis (Ph. D.)--Ohio State University, 2005. Document formatted into pages; contains 307 p. Includes bibliographical references. Abstract available online via OhioLINK's ETD Center; full text release delayed at author's request until 2006 March 7.
10

Svanfeldt, Gunilla. "Expressiveness in virtual talking faces." Licentiate thesis, Stockholm : Department of Speech, Music and Hearing, School of Computer Science and Communication, Kungliga Tekniska högskolan (KTH), 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-4210.

11

Yusupujiang, Zulipiye. "Using Unsupervised Morphological Segmentation to Improve Dependency Parsing for Morphologically Rich Languages." Thesis, Uppsala universitet, Institutionen för lingvistik och filologi, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-354459.

Abstract:
In this thesis, we mainly investigate the influence of using unsupervised morphological segmentation as features on the dependency parsing of morphologically rich languages such as Finnish, Estonian, Hungarian, Turkish, Uyghur, and Kazakh. Studying the morphology of these languages is of great importance for the dependency parsing of morphologically rich languages since dependency relations in a sentence of these languages mostly rely on morphemes rather than word order. In order to investigate our research questions, we have conducted a large number of parsing experiments both on MaltParser a
12

Veeman, Hartger. "A comparative study of the grammatical gender systems of languages by means of analysing word embeddings." Thesis, Uppsala universitet, Institutionen för lingvistik och filologi, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-425635.

Abstract:
The creation of word embeddings is one of the key breakthroughs in natural language processing. Word embeddings allow for words to be represented semantically, opening the way to many new deep learning methods. Understanding what information is in word embeddings will help explain the behaviour of embeddings in natural language processing tasks, and also allows for the quantitative study of linguistic features such as grammatical gender. This thesis attempts to explore how grammatical gender is encoded in word embeddings, through analysing the performance of a neural network classifi
13

Zhang, Yifei. "The Influence of M-BERT and Sizes on the Choice of Transfer Languages in Parsing." Thesis, Uppsala universitet, Institutionen för lingvistik och filologi, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-446094.

Abstract:
In this thesis, we explore the impact of M-BERT and different transfer sizes on the choice of different transfer languages in dependency parsing. To investigate our research questions, we conduct a series of experiments on the treebanks in Universal Dependencies with UUParser. The main conclusions and contributions of this study are as follows: First, we train a variety of languages in several different scripts with M-BERT, a state-of-the-art deep learning model based on the Transformer architecture, added into the parsing framework. In general, we g
14

Abrahamsson, Peder. "Mer lättläst : Påbyggnad av ett automatiskt omskrivningsverktyg till lätt svenska." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-70156.

Abstract:
The Swedish language should be accessible to everyone who lives and works in Sweden. It is therefore important that easy-to-read alternatives exist for those who have difficulty reading Swedish text. This work builds further on showing that it is possible to create an automatic rewriting program that makes texts easier to read. The work is based on CogFLUX, a tool for automatic rewriting into easy Swedish. CogFLUX contains functions for syntactically rewriting texts into more easily readable Swedish. The rewrites are performed using rewriting rules developed in an earlier project. In
15

Papadopoulou, Anthi. "Automatic Error Detection and Correction in Neural Machine Translation : A comparative study of Swedish to English and Greek to English." Thesis, Uppsala universitet, Institutionen för lingvistik och filologi, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-385085.

Abstract:
Automatic detection and automatic correction of machine translation output are important steps to ensure an optimal quality of the final output. In this work, we compared the output of neural machine translation of two different language pairs, Swedish to English and Greek to English. This comparison was made using common machine translation metrics (BLEU, METEOR, TER) and syntax-related ones (POSBLEU, WPF, WER on POS classes). It was found that neither common metrics nor purely syntax-related ones were able to capture the quality of the machine translation output accurately, but the decomposi
16

Johansson, Kajsa. "Transcription of Historical Encrypted Manuscripts : Evaluation of an automatic interactive transcription tool." Thesis, Uppsala universitet, Institutionen för lingvistik och filologi, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-385254.

Abstract:
Countless historical sources are saved in national libraries and archives all over the world and contain important information about our history. Some of these sources are encrypted to prevent people from reading them. This thesis examines a semi-automated interactive transcription tool, based on unsupervised learning without any labelled training data, that has been developed for transcription of encrypted sources, and compares it to manual transcription. The interactive transcription tool is based on handwritten text recognition techniques, and the system identifies clusters of symbols based on
17

Olof, Löfving. "Sentiment Analysis of Equity Analyst Research Reports using Convolutional Neural Networks." Thesis, Uppsala universitet, Avdelningen för beräkningsvetenskap, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-388586.

Abstract:
Natural language processing, a subfield of artificial intelligence and computer science, has recently been of great research interest due to the vast amount of information created on the internet in the modern era. One of the main natural language processing areas concerns sentiment analysis. This is a field that studies the polarity of human natural language and generally tries to categorize it as either positive, negative or neutral. In this thesis, sentiment analysis has been applied to research reports written by equity analysts. The objective has been to investigate if there exists a disti
18

Jonasson, Michael. "Fördomsfulla associationer i en svenskvektorbaserad semantisk modell." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-159027.

Abstract:
Semantic vector models are a powerful technique in which the meaning of words can be represented by vectors consisting of numbers. The vectors permit geometric operations that capture semantically important relations between the words they represent. In this study, the WEAT method is implemented and applied to investigate whether statistical relations between words that can be perceived as prejudiced exist in a Swedish semantic vector model of a Swedish newspaper. The results indicate that word relations in the vector model are able to reflect several of the previously IAT-documented biases that
19

Koschwitz, Joana. "The effect of speaking style on the performance of a forensic voice comparison system." Thesis, Uppsala universitet, Institutionen för lingvistik och filologi, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-355736.

20

You, Huiling. "Unsupervised Lexical Semantic Change Detection with Context-Dependent Word Representations." Thesis, Uppsala universitet, Institutionen för lingvistik och filologi, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-444871.

Abstract:
In this work, we explore the usefulness of contextualized embeddings from language models on lexical semantic change (LSC) detection. With diachronic corpora spanning two time periods, we construct word embeddings for a selected set of target words, aiming at detecting potential LSC of each target word across time. We explore different systems of embeddings to cover three topics: contextualized vs static word embeddings, token- vs type-based embeddings, and multilingual vs monolingual language models. We use a multilingual dataset covering three languages (English, German, Swedish) and explore
21

Gotting, Olof. "Generating Conceptual Metaphoric Paraphrases." Thesis, Stockholms universitet, Avdelningen för datorlingvistik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-193931.

Abstract:
Metaphoric Paraphrase generation is a relatively new and unexplored Natural Language Generation task. The aim of the task is to develop computational systems that paraphrase literal sentences into cogent metaphoric ones. Challenges in the field include representation of common sense knowledge and ensuring meaning retention when dealing with phrases that are dissimilar in their literal sense. This thesis will deal with the specific task of paraphrasing literal adjective phrases into metaphoric noun phrases, taking into consideration the preceding context of the adjective phrase. Two different s
22

Yako, Mary. "Emotional Content in Novels for Literary Genre Prediction : And Impact of Feature Selection on Text Classification Models." Thesis, Uppsala universitet, Institutionen för lingvistik och filologi, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-447148.

Abstract:
Automatic literary genre classification presents a challenging task for Natural Language Processing (NLP) systems, mainly because literary texts have deeper levels of meanings, hold distinctive themes, and communicate certain messages and emotions. We conduct a study where we experiment with building literary genre classifiers based on emotions in novels, to investigate the effects that features pertinent to emotions have on models of genre prediction. We begin by performing an analysis of emotions describing emotional composition and density in the dataset. The experiments are carried out on
23

Eriksson, Adam. "Evaluation Of Methods For Automatically Deciding Article Type For Newspapers." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-177788.

24

Zhang, Yaxi. "Named Entity Recognition for Social Media Text." Thesis, Uppsala universitet, Institutionen för lingvistik och filologi, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-395978.

Abstract:
This thesis aims to perform named entity recognition for English social media texts. Named Entity Recognition (NER) is applied in many NLP tasks as an important preprocessing procedure. Social media texts contain lots of real-time data and therefore serve as a valuable source for information extraction. Nevertheless, NER for social media texts is a rather challenging task due to the noisy context. Traditional approaches to deal with this task use hand-crafted features but prove to be both time-consuming and very task-specific. As a result, they fail to deliver satisfactory performance. The goa
25

Wang, Tonghe. "Identifying Base Noun Phrases by Means of Recurrent Neural Networks : Using Morphological and Dependency Features." Thesis, Uppsala universitet, Institutionen för lingvistik och filologi, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-412778.

Abstract:
Noun phrases convey key information in communication and are of interest in NLP tasks. A base NP is defined as the headword and left-hand side modifiers of a noun phrase. In this thesis, we identify base NPs in Universal Dependencies treebanks in English and French using an RNN architecture. The data of this thesis consist of three multi-layered treebanks in which each sentence is annotated in both constituency and dependency formalisms. To build our training data, we find base NPs in the constituency layers and project them onto the dependency layer by labeling corresponding tokens. For input fea
26

He, Tiantian. "Specificity Prediction for Sentences in Press Releases." Thesis, Uppsala universitet, Institutionen för lingvistik och filologi, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-413515.

Abstract:
Specificity is an important factor in text analysis. While much research on sentence specificity focuses on news, very little is known about press releases. Our study is devoted to specificity in press releases, which are journalistic documents that companies share with the press and other media outlets. In this research, we analyze press releases about digital transformation written by pump companies, and develop tools for automatic measurement of sentence specificity. The goal of the research is to 1) explore the effects of data combination, 2) analyze features for specificity predicti
27

Renfei, Han. "Using Attention-based Sequence-to-Sequence Neural Networks for Transcription of Historical Cipher Documents." Thesis, Uppsala universitet, Institutionen för lingvistik och filologi, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-420322.

Abstract:
Encrypted historical manuscripts (also called ciphers), containing encoded information, provide a useful resource for giving new insight into our history. Transcribing these manuscripts from image format to computer-readable format is a necessary step for decrypting them. In this thesis project, we explore automatic approaches to Handwritten Text Recognition (HTR) for line-by-line transcription of cipher images. We applied an attention-based Sequence-to-Sequence (Seq2Seq) model for the automatic transcription of ciphers with three different writing systems. We tested/devel
28

Cai, Xuemei. "A Lexical Comparison Using Word Embedding Mapping from an Academic Word Usage Perspective." Thesis, Uppsala universitet, Institutionen för lingvistik och filologi, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-425266.

Abstract:
This thesis applies the word embedding mapping approach to make a lexical comparison from an academic word usage perspective. We aim to demonstrate the differences in academic word usage between a corpus of student writings and a corpus of academic English, as well as between a corpus of student writings and social media texts. The Vecmap mapping algorithm, commonly used in solving cross-language mapping problems, was used to map academic English vector space and social media text vector space into the common student writing vector space to facilitate the comparison of word representations from different
29

Hantosi, Albertsson Sarah. "Textuella särdrag som kvalitet : En studie om att automatiskt mäta kvalitet i teknisk dokumentation." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-120970.

Abstract:
This thesis investigated which textual features are perceived as breaches of quality in the internal technical documentation at Saab, and how features selected according to experts' judgment can be evaluated automatically. Using data generated from a participatory design, the thesis proposed a new automatic method for examining quality in technical documentation. Technical writers and editors took part to answer the thesis's first question, and the result is a collection of textual features that can be quantified. From the collection, four textual features were selected, which were then
30

Sagemo, Oscar. "Estimating Post-Editing Effort with Translation Quality Features." Thesis, Uppsala universitet, Institutionen för lingvistik och filologi, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-299143.

31

Tengstrand, Lisa. "Abbreviation Expansion in Swedish Clinical Text : Using Distributional Semantic Models and Levenshtein Distance Normalization." Thesis, Uppsala universitet, Institutionen för lingvistik och filologi, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-226235.

Abstract:
In the medical domain, especially in clinical texts, non-standard abbreviations are prevalent, which impairs readability for patients. To ease the understanding of the physicians' notes, abbreviations need to be identified and expanded into their original forms. This thesis presents a distributional semantic approach to find candidates of the original form of the abbreviation, which is combined with Levenshtein distance to choose the correct candidate among the semantically related words. The method is applied to radiology reports and medical journal texts, and a comparison is made to general
32

Rydberg, Jonatan. "Detektion av handskrivna ordobjekt i inskannade dokument." Thesis, Linköpings universitet, Interaktiva och kognitiva system, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-87856.

Abstract:
This report presents a way of detecting handwritten word objects in scanned documents. The report also highlights some of the problems that arise when detecting handwritten word objects. Detection is done by dividing the image into rectangular regions. A machine learning algorithm is then used to classify the regions as either handwritten text or other. To classify a region, measurements of the region, such as area, are needed that an algorithm can use. Most of those tested and used in this report have previously been used to detect handwritten tex
33

Nilsson, Karin. "Språkteknologi för myndigheters hemsidor : En studie av verktyg som kan underlätta för personer som inte har svenska som modersmål att självständigt använda e-tjänster." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-15512.

Abstract:
Swedish government agencies are actively working to offer their customers ways of handling their errands via the Internet. However, Försäkringskassan's own studies indicate that people whose native language is not Swedish are a group that instead chooses to visit the offices to carry out their errands, even when the errands are relatively simple. This study examines how language technology aids could make it easier for this group to use services on the Internet. To find out how new Swedes themselves view their contact with government agencies, focus groups were held in which their experiences were discussed.
34

Nilsson, Beatrice. "Digitala verktyg och läsmotivation : Hur digitala verktyg används i högstadiet för att stimulera elevernas vilja att läsa." Thesis, Högskolan Dalarna, Svenska språket, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:du-26653.

35

Sandelius, Hugo. "Creating Knowledge Graphs using Distributional Semantic Models." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-199702.

Abstract:
This report researches a method for creating knowledge graphs, a specific way of structuring information, using distributional semantic models. Two different algorithms for selecting graph edges and two different algorithms for labelling edges are tried, and variations of those are evaluated. We perform experiments comparing our knowledge graphs with existing manually constructed knowledge graphs of high quality, with respect to graph structure and edge labels. We find that the algorithms usually produce graphs with a structure similar to that of manually constructed knowledge graphs, as long
36

Hang, Sijia. "Clustering Short Texts: Categorizing Initial Utterances from Customer Service Dialogue Agents." Thesis, Uppsala universitet, Institutionen för lingvistik och filologi, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-453814.

Abstract:
Text classification involves labeled data, which is not always available, or requires expensive manual labour. User-generated short texts are being produced in abundance in customer service sectors through transcripts of phone calls or chats online. This kind of unstructured textual data can be noisy and thus poses challenges to unsupervised classification methods developed for standard documents such as news articles. This thesis project explores some possible methods of unsupervised classification of user-generated short texts in Swedish on a real-world dataset of short texts collected from fi
37

Gambardella, Maria-Elena. "Cleartext detection and language identification in ciphers." Thesis, Uppsala universitet, Institutionen för lingvistik och filologi, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-446439.

Abstract:
In historical cryptology, cleartext represents text written in a known language in a cipher (a hand-written manuscript aiming at hiding the content of a message). Cleartext can give us an historical interpretation and contextualisation of the manuscript and could help researchers in cryptanalysis, but to this day there is still no research on how to automatically detect cleartext and identify its language. In this paper, we investigate to what extent we can automatically distinguish cleartext from ciphertext in transcribed historical ciphers and to what extent we are able to identify its languag
38

Feldman, Anna. "Portable language technology: a resource-light approach to morpho-syntactic tagging." Columbus, Ohio : Ohio State University, 2006. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1153344391.

39

Luo, Ziyang. "Analyzing the Anisotropy Phenomenon in Transformer-based Masked Language Models." Thesis, Uppsala universitet, Institutionen för lingvistik och filologi, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-445537.

Abstract:
In this thesis, we examine the anisotropy phenomenon in popular masked language models, BERT and RoBERTa, in detail. We propose a possible explanation for this unreasonable phenomenon. First, we demonstrate that the contextualized word vectors derived from pretrained masked language model-based encoders share a common, perhaps undesirable pattern across layers. Namely, we find cases of persistent outlier neurons within BERT and RoBERTa's hidden state vectors that consistently bear the smallest or largest values in said vectors. In an attempt to investigate the source of this information, we in
40

Woldemariam, Yonas Demeke. "Natural language processing in cross-media analysis." Licentiate thesis, Umeå universitet, Institutionen för datavetenskap, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-147640.

Abstract:
A cross-media analysis framework is an integrated multi-modal platform where a media resource containing different types of data such as text, images, audio and video is analyzed with metadata extractors, working jointly to contextualize the media resource. It generally provides cross-media analysis and automatic annotation, metadata publication and storage, searches and recommendation services. For on-line content providers, such services allow them to semantically enhance a media resource with the extracted metadata representing the hidden meanings and make it more efficiently searchable. Wi
41

Jones, Warwick Alfred. "A corpus-linguistic approach to foreign/second language learning: an experimental study of a new pedagogic model for integrating linguistic knowledge with corpus technology." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2011. http://hub.hku.hk/bib/B46053372.

42

Stymne, Sara. "Compound Processing for Phrase-Based Statistical Machine Translation." Licentiate thesis, Linköping : Department of Computer and Information Science, Linköpings universitet, 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-51416.

43

Axelsson, Robin. "Implementation och utvärdering av termlänkare i Java." Thesis, Linköpings universitet, Interaktiva och kognitiva system, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-92732.

Abstract:
Aligning parallel terms in a parallel corpus can be done by aligning all words and phrases in the corpus and then performing term extraction on the aligned set of word pairs. Alternatively, term extraction in the source and target text can be performed separately and then the resulting term candidates can be aligned, forming aligned parallel terms. This thesis describes an implementation of a word aligner that is applied to extracted term candidates in both the source and the target texts. The term aligner uses statistical measures, the tool Giza++ and heuristics in the search for alignments. Th
44

Koo, Kyosung. "Effects of using corpora and online reference tools on foreign language writing: a study of Korean learners of English as a second language." Thesis supplement (Stimulated recall data, Korean):, 2006. http://ir.uiowa.edu/etd/65.

45

Steensland, Henrik, and Dina Dervisevic. "Controlled Languages in Software User Documentation." Thesis, Linköping University, Department of Computer and Information Science, 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-4637.

Abstract:
In order to facilitate comprehensibility and translation, the language used in software user documentation must be standardized. If the terminology and language rules are standardized and consistent, the time and cost of translation will be reduced. For this reason, controlled languages have been developed. Controlled languages are subsets of other languages, purposely limited by restricting the terminology and grammar that is allowed. The purpose and goal of this thesis is to investigate how using a controlled language can improve comprehensibility and translatability of software use
46

Rosell, Magnus. "Clustering in Swedish : The Impact of some Properties of the Swedish Language on Document Clustering and an Evaluation Method." Licentiate thesis, Stockholm, 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-438.

47

Sjöholm, Johan. "Probability as readability : A new machine learning approach to readability assessment for written Swedish." Thesis, Linköpings universitet, NLPLAB - Laboratoriet för databehandling av naturligt språk, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-78107.

Abstract:
This thesis explores the possibility of assessing the degree of readability of written Swedish using machine learning. An application using four levels of linguistic analysis has been implemented and tested with four different established algorithms for machine learning. The new approach has then been compared to established readability metrics for Swedish. The results indicate that the new method works significantly better for readability classification of both sentences and documents. The system has also been tested with so-called soft classification which returns a probability for the degree of re
48

Rennes, Evelina. "Improved Automatic Text Simplification by Manual Training." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-120001.

Abstract:
The purpose of this thesis was the further development of a rule set used in an automatic text simplification system, and the exploration of whether it is possible to improve the performance of a rule-based text simplification system by manual training. A first rule set was developed from a thorough literature review, and the rule refinement was performed by manually adapting the first rule set to a set of training texts. When there was no more change added to the set of rules, the training was considered to be completed, and the two sets were applied to a test set, for evaluation. This thes
49

Lameris, Harm. "Homograph Disambiguation and Diacritization for Arabic Text-to-Speech Using Neural Networks." Thesis, Uppsala universitet, Institutionen för lingvistik och filologi, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-446509.

Abstract:
Pre-processing Arabic text for Text-to-Speech (TTS) systems poses major challenges, as Arabic omits short vowels in writing. This omission leads to a large number of homographs, and means that Arabic text needs to be diacritized to disambiguate these homographs, in order to be matched up with the intended pronunciation. Diacritizing Arabic has generally been achieved by using rule-based, statistical, or hybrid methods that combine rule-based and statistical methods. Recently, diacritization methods involving deep learning have shown promise in reducing error rates. These deep-learning methods
50

Askarieh, Sona. "Cohesion and Comprehensibility in Swedish-English Machine Translated Texts." Thesis, Linköpings universitet, Institutionen för kultur och kommunikation, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-108468.

Abstract:
Access to various texts in different languages causes an increasing demand for fast, multi-purpose, and cheap translators. Pervasive internet use intensifies the necessity for intelligent and cheap translators, since traditional translation methods are excessively slow to translate different texts. During the past years, scientists carried out much research in order to add human and artificial intelligence into the old machine translation systems and the idea of developing a machine translation system came into existence during the days of World War (Kohenn, 2010). The new invention was useful