Dissertations / Theses on the topic 'Datorlingvistik'
Consult the top 27 dissertations / theses for your research on the topic 'Datorlingvistik.'
Eklund, Robert, and Mats Wirén. "Effects of open and directed prompts on filled pauses and utterance production." Stockholms universitet, Avdelningen för datorlingvistik, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-40230.
von Kartaschew, Filip. "Grundtonsstrategier vid tonlösa segment." Thesis, Uppsala University, Department of Linguistics and Philology, 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-8270.
Prosody models that can be used in, among other things, speech synthesis are often based on analyses of speech consisting solely of voiced segments. Before a voiceless consonant, the fundamental frequency (F0) contour of a vowel segment has no possible continuation and also becomes shorter. This is usually handled by truncating the F0 contour. Earlier studies have shown, in brief, that differences other than truncation can arise in a vowel's F0 contour depending on whether the following segment is voiced or voiceless. Building on these studies, this thesis examines the F0 contour in Swedish sentences. The results of this study also show that different strategies are used in the F0 contour, and that truncation is not sufficient to explain what happens to the F0 contour in these contexts. In general, the results suggest that it seems important for the subjects to preserve the information that the F0 contour carries in the form of maximum and minimum values, and that falls and rises are retained as far as possible.
Tiedemann, Jörg. "Recycling Translations : Extraction of Lexical Data from Parallel Corpora and their Application in Natural Language Processing." Doctoral thesis, Uppsala University, Department of Linguistics, 2003. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-3791.
The focus of this thesis is on re-using translations in natural language processing. It involves the collection of documents and their translations in an appropriate format, the automatic extraction of translation data, and the application of the extracted data to different tasks in natural language processing.
Five parallel corpora containing more than 35 million words in 60 languages have been collected within co-operative projects. All corpora are sentence aligned and parts of them have been analyzed automatically and annotated with linguistic markup.
Lexical data are extracted from the corpora by means of word alignment. Two automatic word alignment systems have been developed, the Uppsala Word Aligner (UWA) and the Clue Aligner. UWA implements an iterative "knowledge-poor" word alignment approach using association measures and alignment heuristics. The Clue Aligner provides an innovative framework for the combination of statistical and linguistic resources in aligning single words and multi-word units. Both aligners have been applied to several corpora. Detailed evaluations of the alignment results have been carried out for three of them using fine-grained evaluation techniques.
A corpus processing toolbox, Uplug, has been developed. It includes the implementation of UWA and is freely available for research purposes. A new version, Uplug II, includes the Clue Aligner. It can be used via an experimental web interface (UplugWeb).
Lexical data extracted by the word aligners have been applied to different tasks in computational lexicography and machine translation. The use of word alignment in monolingual lexicography has been investigated in two studies. In a third study, the feasibility of using the extracted data in interactive machine translation has been demonstrated. Finally, extracted lexical data have been used for enhancing the lexical components of two machine translation systems.
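The abstract above mentions word alignment based on association measures computed over sentence-aligned text. As a rough illustration only, the sketch below scores word pairs with the Dice coefficient, which is one common association measure; the abstract does not say which measures UWA uses, so this choice, the function names, and the toy English-Swedish corpus are all assumptions.

```python
from collections import Counter
from itertools import product


def dice_scores(sentence_pairs):
    """Dice coefficient for every source/target word pair that co-occurs
    in at least one aligned sentence pair (counted once per pair)."""
    src_freq, tgt_freq, pair_freq = Counter(), Counter(), Counter()
    for src, tgt in sentence_pairs:
        src_types, tgt_types = set(src.split()), set(tgt.split())
        src_freq.update(src_types)
        tgt_freq.update(tgt_types)
        pair_freq.update(product(src_types, tgt_types))
    return {
        (s, t): 2 * c / (src_freq[s] + tgt_freq[t])
        for (s, t), c in pair_freq.items()
    }


def best_link(scores, source_word):
    """Target word with the highest association score for source_word."""
    candidates = {t: v for (s, t), v in scores.items() if s == source_word}
    return max(candidates, key=candidates.get)


# Hypothetical toy corpus, invented for this illustration.
corpus = [
    ("small house", "litet hus"),
    ("old house", "gammalt hus"),
    ("old car", "gammal bil"),
    ("new car", "ny bil"),
]
scores = dice_scores(corpus)
print(best_link(scores, "house"), best_link(scores, "car"))
```

A real aligner would combine such scores with alignment heuristics and iterate, as the abstract describes; this sketch shows only the scoring step.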
Axelsson, Hans, and Oskar Blom. "Utveckling av ett svensk-engelskt lexikon inom tåg- och transportdomänen." Thesis, Uppsala University, Department of Linguistics and Philology, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-8269.
This paper describes the process of building a machine translation lexicon for use in the train and transport domain with the machine translation system MATS. The lexicon will consist of a Swedish part, an English part and links between them, and is derived from a Trados translation memory which is split into a training (90%) part and a testing (10%) part. The task is carried out mainly by using existing word linking software and recycling previous machine translation lexicons from other domains. In order to do this, a method is developed where the focus lies on automation by means of both existing and self-developed software, in combination with manual interaction. The domain-specific lexicon is then extended with a domain-neutral core lexicon and a less domain-neutral general lexicon. The different lexicons are automatically and manually evaluated through machine translation on the test corpus. The automatic evaluation of the largest lexicon yielded a NEVA score of 0.255 and a BLEU score of 0.190. In the manual evaluation, 34% of the segments were correctly translated, 37% were not correct but perfectly understandable, and 29% were difficult to understand.
Sandgren, Frida. "Creation of a customised character recognition application." Thesis, Uppsala University, Department of Linguistics and Philology, 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-4801.
This master’s thesis describes the work of creating a customised optical character recognition (OCR) application, intended for use in the digitisation of theses submitted to Uppsala University in the 18th and 19th centuries. For this purpose, an open source software package called Gamera has been used for recognition and classification of the characters in the documents. The software provides specific algorithms for analysis of heritage documents and is designed to be used as a tool for creating domain-specific (i.e. customised) recognition applications.
By using the Gamera classifier training interface, classifier data was created which reflects the characters in the particular theses. The data can then be used in automatic recognition of ‘new’ characters, by loading it into one of Gamera’s classifiers. Gamera outputs sets of classified glyphs (i.e. small images of characters), stored in an XML-based format.
However, as OCR typically involves translation of images of text into a machine-readable format, a complementary OCR-module was needed. For this purpose, an external Gamera module for page segmentation was modified and used.
In addition, a script for control of the OCR-process was created, which initiates the page segmentation on Gamera classified glyphs. The result is written to text files.
Finally, in a test of recognition accuracy, one of the theses was used both to create training data and as test data. The results of the test show an average accuracy rate of 82% and indicate a need for a better pre-processing module that removes more noise from the images and recognises different character sizes in the images before they are run through the OCR process.
Larsson, Patrik. "Classification into Readability Levels : Implementation and Evaluation." Thesis, Uppsala University, Department of Linguistics and Philology, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-7132.
A readability classification model is mainly used as an integrated part of an information retrieval system. By matching the user's readability demands to documents with the corresponding readability, the classification model can further improve the results of, for example, a search engine. This thesis presents a new solution for classification into readability levels for Swedish. The results of the thesis are a number of classification models. The models were induced by training a Support Vector Machines classifier on features that previous research has established as good measures of readability. The features were extracted from a corpus annotated with three readability levels. Natural language processing tools for tagging and parsing were used to analyze the corpus and enable the extraction of the features. Empirical tests of different feature combinations were performed to optimize the classification model. The classification models render a good and stable classification. The best model obtained a precision of 90.21% and a recall of 89.56% on the test set, which corresponds to an F-score of 89.88. The thesis also presents suggestions for further development and potential areas of application.
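The F-score reported in this abstract can be checked from the stated precision and recall. The sketch below assumes the balanced F-score (harmonic mean, beta = 1), which the abstract does not state explicitly; the function name is illustrative.

```python
def f_score(precision, recall, beta=1.0):
    """F-beta score; beta = 1 gives the harmonic mean of precision and recall."""
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Values taken from the abstract (in percent).
print(round(f_score(90.21, 89.56), 2))  # 89.88
```

Under that assumption the result agrees with the 89.88 given in the abstract.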
Kotsifas, Dimitrios. "Intonation and sentence type interpretation in Greek : A production and perception approach." Thesis, University of Skövde, School of Humanities and Informatics, 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-2960.
This thesis examines the intonation patterns of Modern Greek with regard to different interpretations of the sentence types (declarative, interrogative, imperative).
14 utterances are produced by Greek native speakers (2 men and 2 women) so as to express various speech acts: STATEMENT, QUESTION, COMMAND and REQUEST.
The acquisition of the F0 curve for each utterance by means of the Wavesurfer tool leads to an analysis of the pitch movements and their alignments.
After the F0 curves are analyzed and plotted in Excel, we are able to compare and group them, arriving at 5 different intonation patterns. After a second-level comparison, based on the fact that some of the F0 curves were similar and differed only in the final pitch movement, we ended up with 3 fundamental categories of intonation patterns. The main feature of Category I is a rising pitch movement aligned to the onset of the stressed syllables. This category includes only sentences that denote Statement, so we can call it the STATEMENT category. Category II’s main characteristic is a dipping pitch movement aligned to the head of the utterance, that is, the stress of the verb or of a particle that signifies negation (/min/, /den/). Sentences meaning Command or Request belong to this category. Lastly, Category III’s intonation pattern consists of peaking pitch movements aligned to the initial and final stressed syllables. Interrogative sentences belong to this category regardless of their interpretation.
A secondary goal of the thesis is to examine to what extent intonation can be a safe criterion for the “correct” interpretation of a sentence. The presumption that misunderstandings among speakers can always occur, since the ratio between the number of utterances (14) and the number of intonation patterns (5) is not 1:1, is largely confirmed by the results of our perception test with Greek native speakers. They were able to identify most of the speech acts expressed by the most common (default) sentence type (i.e. an imperative sentence for COMMAND and an interrogative for QUESTION); however, there were combinations that they had difficulty identifying, such as interrogative sentences denoting something other than QUESTION, e.g. REQUEST or STATEMENT. Finally, a perception test with Flemish speakers (subjects whose native language was not Greek) showed that they were more successful with sentences that meant STATEMENT and QUESTION, but they could hardly identify an interrogative sentence that meant something other than QUESTION, and they also confused COMMAND and REQUEST. This implies that the intonation used to convey different interpretations is basically language-dependent.
In conclusion, this study offers a description of the intonation patterns (based on pitch movements) for the 3 sentence types with 4 different interpretations. Our findings show that intonation seems to be structure-independent in some cases (i.e. for sentences that express COMMAND or STATEMENT) and structure-dependent in others (cf. the interrogative sentences). Additionally, the fact that negation can play an important role in the choice of intonation pattern (as shown for COMMAND and STATEMENT) could be considered a structure-dependent feature of intonation. This approach contrasts with the approach used for many years in traditional grammar, according to which the structure alone (sentence type) defines the meaning that is to be conveyed.
Hjelm, Hans. "Cross-language Ontology Learning : Incorporating and Exploiting Cross-language Data in the Ontology Learning Process." Doctoral thesis, Stockholms universitet, Institutionen för lingvistik, 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-8414.
To order the book, send an e-mail to exp@ling.su.se
Nilsson, Kristina. "Hybrid Methods for Coreference Resolution in Swedish." Doctoral thesis, Stockholm : Department of Linguistics, Stockholm University, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-38395.
Öquist, Gustav. "Evaluating Readability on Mobile Devices." Doctoral thesis, Uppsala University, Department of Linguistics and Philology, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-7378.
The thesis presents findings from five readability studies performed on mobile devices. The dynamic Rapid Serial Visual Presentation (RSVP) format has been enhanced with regard to linguistic adaptation and segmentation as well as eye movement modeling. The novel formats have been evaluated against other common presentation formats including Paging, Scrolling, and Leading in latin-square balanced repeated-measurement studies with 12-16 subjects. Apart from monitoring Reading speed, Comprehension, and Task load (NASA-TLX), Eye movement tracking has been used to learn more about how the text presentation affects reading.
The Page format generally offered best readability. Reading on a mobile phone decreased reading speed by 10% compared to reading on a Personal Digital Assistant (PDA), an interesting finding given that the display area of the mobile phone was 50% smaller. Scrolling, the most commonly used presentation format on mobile devices today, proved inferior to both Paging and RSVP. Leading, the most widely known dynamic format, caused very unnatural eye movements for reading. This seems to have increased task load, but not affected reading speed to a similar extent. The RSVP format displaying one word at time was found to reduce eye movements significantly, but contrary to common claims, this resulted in decreased reading speed and increased task load. In the last study, Predictive Text Presentation (PTP) was introduced. The format is based on RSVP and combines linguistic chunking and adaptation with eye movement modeling to achieve a reading experience that can rival traditional text presentation.
It is explained why readability on mobile devices is important, how it may be evaluated in an efficient and yet reliable manner, and PTP is pinpointed as the format with greatest potential for improvement. The methodology used in the evaluations and the shortcomings of the studies are discussed. Finally, a hyper-graeco-latin-square experimental design is proposed for future evaluations.
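The studies above use latin-square balanced repeated-measurement designs, in which each subject sees every presentation format and each format appears once in every serial position. The sketch below builds a simple cyclic Latin square for four of the formats named in the abstract; it is only an illustration of the general idea, not the thesis's actual design, and a cyclic square balances position but not carry-over between adjacent conditions.

```python
def latin_square(conditions):
    """Cyclic Latin square: row i is the condition list rotated left by i."""
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)] for row in range(n)]


formats = ["Paging", "Scrolling", "Leading", "RSVP"]
square = latin_square(formats)

# Each row is the presentation order for one subject (or one group of subjects).
for subject, order in enumerate(square, start=1):
    print(subject, order)
```

With 12 or 16 subjects, each row of a 4 x 4 square could be assigned to three or four subjects respectively.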
Holmqvist, Maria. "Word Alignment by Re-using Parallel Phrases." Licentiate thesis, Linköping University, NLPLAB - Natural Language Processing Laboratory, 2008. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-15463.
In this thesis we present the idea of using parallel phrases for word alignment. Each parallel phrase is extracted from a set of manual word alignments and contains a number of source and target words and their corresponding alignments. If a parallel phrase matches a new sentence pair, its word alignments can be applied to the new sentence. There are several advantages of using phrases for word alignment. First, longer text segments include more context and will be more likely to produce correct word alignments than shorter segments or single words. More importantly, the use of longer phrases makes it possible to generalize words in the phrase by replacing words by parts-of-speech or other grammatical information. In this way, the number of words covered by the extracted phrases can go beyond the words and phrases that were present in the original set of manually aligned sentences. We present experiments with phrase-based word alignment on three types of English–Swedish parallel corpora: a software manual, a novel and proceedings of the European Parliament. In order to find a balance between improved coverage and high alignment accuracy we investigated different properties of generalised phrases to identify which types of phrases are likely to produce accurate alignments on new data. Finally, we have compared phrase-based word alignments to state-of-the-art statistical alignment with encouraging results. We show that phrase-based word alignments can be used to enhance statistical word alignment. To evaluate word alignments an English–Swedish reference set for the Europarl corpus was constructed. The guidelines for producing this reference alignment are presented in the thesis.
Wärnestål, Pontus. "Dialogue Behavior Management in Conversational Recommender Systems." Doctoral thesis, Linköpings universitet, NLPLAB - Laboratoriet för databehandling av naturligt språk, 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-9624.
This thesis investigates recommendation dialogue with respect to the design of dialogue strategies for conversational recommender systems. The purpose of a recommender system is to generate personalized recommendations from potentially useful domain items in large information spaces. A conversational recommender system approaches this problem by using natural language and dialogue to model user preferences as well as to deliver recommendations. The basic idea of conversational recommender systems is to use dialogue sessions to detect, update, and exploit a user's preferences in order to predict the user's interest in the domain items modeled in a system. The design of dialogue strategy management is therefore one of the most important tasks for such systems. Based on empirical studies, as well as on the design and implementation of conversational recommender systems, a behavior-based dialogue model called bcorn is presented. bcorn rests on three constructs, all of which are presented in this thesis. bcorn uses a preference modeling framework (preflets) that supports and makes use of natural language in dialogue and allows descriptive, comparative, and superlative preference expressions in different situations. The second component of bcorn is its internal message formalism pcql, a notation that can describe preferential and factual statements and questions. bcorn is designed as a general recommendation management strategy with conventional, information-providing, and recommending capabilities, each of which describes natural parts of a recommender agent's dialogue strategy. These parts are modeled in dialogue behavior diagrams that are executed in parallel to give rise to coherent, flexible, and effective dialogue in conversational recommender systems.
Three empirical studies have been carried out to explore the problem complex that constitutes recommendation dialogue and to verify the solutions developed within this work. Study I is a corpus study in the movie recommendation domain. The study results in a characterization of recommendation dialogue and forms the basis for a first prototype of a dialogue management strategy for human-computer recommendation dialogue. Study II is an end-user evaluation of the system acorn, which implements this dialogue management strategy, and results in a verification of the strategy's efficiency and usability. The study also yields implications that influence the design of the model used in bcorn. Study III is an overhearer evaluation of the functional conversational recommender system CoreSong, which implements the bcorn model. The results of the study indicate that the behavior-based approach is functional and that the different dialogue behaviors in bcorn give rise to high information quality, naturalness, and coherence in recommendation dialogue.
Svensson, Henrik, and Kalle Lindqvist. "Rättssäker Textanalys." Thesis, Malmö universitet, Fakulteten för teknik och samhälle (TS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-20396.
Natural language processing is a research area in which new advances are constantly being made. A significant portion of the text analyses that take place in this field aim to achieve a satisfactory application in the dialogue between human and computer. In this study, we instead want to focus on what impact natural language processing can have on the human learning process. Simultaneously, the context for our research has a future impact on one of the most basic principles for a legally secure society, namely the writing of the police report. By creating a theoretical foundation of ideas that combines aspects of natural language processing as well as official police report writing and then implementing them in an educational web platform intended for police students, we are of the opinion that our research adds something new in the computer science and sociological fields. The purpose of this work is to take the first steps towards a web application that supports the Swedish police documentation.
Gorrell, Genevieve. "Generalized Hebbian Algorithm for Dimensionality Reduction in Natural Language Processing." Doctoral thesis, Linköping : Department of Computer and Information Science, Linköpings universitet, 2006. http://www.bibl.liu.se/liupubl/disp/disp2006/tek1045s.pdf.
Glant, Oliver. "Attitydanalys av svenska produktomdömen – behövs språkspecifika verktyg?" Thesis, Stockholms universitet, Institutionen för lingvistik, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-157094.
Sentiment analysis of Swedish data is in many cases performed by machine-translating the data into English in order to use available analysis tools. This thesis investigated the difference between using a neural network trained on Swedish data and a corresponding neural network trained on English data. Two datasets were used: approximately 200,000 non-neutral Swedish product reviews from Prisjakt Sverige AB, one of the largest annotated datasets used for Swedish sentiment analysis, and 1,000,000 non-neutral English product reviews from Amazon.com. Both versions of the neural network were evaluated on 11,638 randomly selected Swedish product reviews, in the original and machine-translated into English. The test set had the same overrepresentation of positive reviews as the Swedish dataset (84% positive reviews). The results suggest that English tools, with the help of machine translation, can be used for sentiment analysis of Swedish product reviews with retained classification performance, although approximately 33% more training data was required for the English tool to reach maximum classification performance. Evaluation on the imbalanced dataset turned out to place particular demands on the statistical measures used. The F1 score worked satisfactorily only when it was computed for the underrepresented class. It then correlated strongly with the Matthews correlation coefficient, which has previously been found to be a more reliable measure. If the correlation holds at all class balances, comparisons between the results of different studies would be easier, something that should be investigated.
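The point about evaluation on imbalanced data can be illustrated with a small sketch. It compares the F1 score computed for each class with the Matthews correlation coefficient (MCC) for a trivial classifier that labels every review as positive. The counts are hypothetical, chosen only to mirror the 84% positive share mentioned in the abstract; they are not results from the thesis.

```python
import math


def f1(tp, fp, fn):
    """F1 score for the class treated as the target class."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; defined as 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical test set: 840 positive and 160 negative reviews,
# and a classifier that predicts "positive" for everything.
f1_positive = f1(tp=840, fp=160, fn=0)   # positive as the target class
f1_negative = f1(tp=0, fp=0, fn=160)     # negative (minority) as the target class
mcc_value = mcc(tp=840, tn=0, fp=160, fn=0)

print(round(f1_positive, 3), f1_negative, mcc_value)  # 0.913 0.0 0.0
```

In this constructed case, F1 for the majority class looks strong although the classifier has learned nothing, while F1 for the minority class and MCC both report zero.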
Steensland, Henrik, and Dina Dervisevic. "Controlled Languages in Software User Documentation." Thesis, Linköping University, Department of Computer and Information Science, 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-4637.
In order to facilitate comprehensibility and translation, the language used in software user documentation must be standardized. If the terminology and language rules are standardized and consistent, the time and cost of translation will be reduced. For this reason, controlled languages have been developed. Controlled languages are subsets of other languages, purposely limited by restricting the terminology and grammar that is allowed.
The purpose and goal of this thesis is to investigate how using a controlled language can improve comprehensibility and translatability of software user documentation written in English. In order to reach our goal, we have performed a case study at IFS AB. We specify a number of research questions that help satisfy some of the goals of IFS and, when generalized, fulfill the goal of this thesis.
A major result of our case study is a list of sixteen controlled language rules. Some examples of these rules are control of the maximum allowed number of words in a sentence, and control of when the author is allowed to use past participles. We have based our controlled language rules on existing controlled languages, style guides, research reports, and the opinions of technical writers at IFS.
When we applied these rules to different user documentation texts at IFS, we managed to increase the readability score for each of the texts. Also, during an assessment test of readability and translatability, the rewritten versions were chosen in 85 % of the cases by experienced technical writers at IFS.
Another result of our case study is a prototype application that shows that it is possible to develop and use a software checker for helping the authors when writing documentation according to our suggested controlled language rules.
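One of the rule types mentioned in this abstract, a maximum number of words per sentence, lends itself to a small automated check. The sketch below is an assumption-laden illustration, not the prototype described in the thesis: the limit of 20 words, the naive sentence splitting, and the function name are all invented here.

```python
import re


def long_sentences(text, max_words=20):
    """Return the sentences in text that exceed max_words.

    Sentences are split naively after '.', '!' or '?' followed by whitespace,
    which is an approximation that ignores abbreviations.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [s for s in sentences if len(s.split()) > max_words]


sample = (
    "Click Save. The system stores the record and then updates every "
    "related view in the background."
)
for sentence in long_sentences(sample, max_words=10):
    print("Too long:", sentence)
```

A fuller checker would add further rules, such as restrictions on past participles, which would require part-of-speech information rather than simple pattern matching.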
Olavison, Jari. "Tolkning av spansk känsloprosodi." Thesis, University of Skövde, Department of Computer Science, 2003. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-39.
Text-to-speech systems are becoming increasingly common in everyday life, and a good deal of research is also being done on the development of speech-to-speech translation systems. Many companies increasingly use telephone services in which automatic systems with synthetic speech and speech recognition replace humans. For us as consumers to feel comfortable using these services and to understand the messages, it is important that these synthetic voices sound as natural as possible. What makes a voice natural is its prosody, i.e. its non-segmental aspects such as the intonation, intensity, and tempo of the voice, to name a few. Prosody has not only linguistic functions; it also signals the speaker's emotions and attitudes. Who wants to listen to a synthetic voice that sounds very sad or angry, for example when the car's GPS navigator mournfully tells us to take the next exit to the right?
Emotions are normally signalled both auditorily and visually: a happy person often has a smile on their lips and speaks in a way that gives us as listeners the impression that the person is happy. This study is concerned with the auditory signalling of emotions, which I call emotional prosody.
It is not self-evident that speakers of different languages signal emotions in the same way, even though many linguists, myself included, are convinced that there is a certain universality, and this should be taken into account in speech-to-speech translation systems. For this reason, I have chosen in my study to compare Swedish auditory expressions of emotion with Spanish ones.
I have done this by carrying out perception tests of Spanish voices and comparing the results with an earlier study by Åsa Abelin and Jens Allwood at Göteborgs universitet (1999), who conducted a similar study using Swedish voices. Comparisons of misinterpretations of intended emotions indicate, among other things, that certain emotions seem to be expressed differently in Spanish and Swedish. This is clearest for "surprise", which in both studies was to a large extent misinterpreted by informants with a native language different from the speaker's; "disgust" also seems to be expressed somewhat differently. Other results that emerged are that Swedish speakers often misinterpret "anger" (Spanish) as "joy", which can be compared with Spanish speakers misinterpreting "joy" (Swedish) as "sadness". The study also shows that emotions that are confused with each other are often acoustically similar in expression and also share some semantic similarities.
Sundblad, Håkan. "Question Classification in Question Answering Systems." Licentiate thesis, Linköping University, NLPLAB - Natural Language Processing Laboratory, 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-9014.
Question answering systems can be seen as the next step in information retrieval, allowing users to pose questions in natural language and receive succinct answers. In order for a question answering system as a whole to be successful, research has shown that the correct classification of questions with regard to the expected answer type is imperative. Question classification has two components: a taxonomy of answer types, and a machinery for making the classifications.
This thesis focuses on five different machine learning algorithms for the question classification task. The algorithms are k nearest neighbours, naïve bayes, decision tree learning, sparse network of winnows, and support vector machines. These algorithms have been applied to two different corpora, one of which has been used extensively in previous work and has been constructed for a specific agenda. The other corpus is drawn from a set of users' questions posed to a running online system. The results showed that the performance of the algorithms on the different corpora differs both in absolute terms, as well as with regards to the relative ranking of them. On the novel corpus, naïve bayes, decision tree learning, and support vector machines perform on par with each other, while on the biased corpus there is a clear difference between them, with support vector machines being the best and naïve bayes being the worst.
The thesis also presents an analysis of questions that are problematic for all learning algorithms. The errors can roughly be divided as due to categories with few members, variations in question formulation, the actual usage of the taxonomy, keyword errors, and spelling errors. A large portion of the errors were also hard to explain.
Report code: LiU-Tek-Lic-2007:29.
Eklund, Robert. "Disfluency in Swedish human-human and human-machine travel booking dialogues." Doctoral thesis, Linköping : Univ, 2004. http://www.ep.liu.se/diss/science_technology/08/82/index.html.
Lilliehöök, Hampus. "Extraction of word senses from bilingual resources using graph-based semantic mirroring." Thesis, Linköpings universitet, Interaktiva och kognitiva system, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-91880.
In this work we extract semantic information that exists implicitly in bilingual data. We collect input data by repeating the semantic mirroring procedure. The data is represented as vectors in a large vector space. We then build a resource of synonym clusters by applying the K-means algorithm to the vectors. We examine the result manually using dictionaries, and against WordNet, and discuss possibilities and applications for the method.
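The clustering step mentioned in this abstract (applying K-means to word vectors to obtain synonym clusters) can be sketched as follows. This is only an illustrative toy under stated assumptions: the two-dimensional vectors in `WORD_VECTORS` are invented for the example and do not come from semantic mirroring, and the function names `kmeans`, `mean`, and `squared_distance` are hypothetical helpers, not the thesis's code.

```python
import random

# Hypothetical 2-d word vectors; real vectors would come from the
# semantic mirroring procedure and have far more dimensions.
WORD_VECTORS = {
    "car": (0.9, 0.1),
    "automobile": (1.0, 0.0),
    "vehicle": (0.8, 0.2),
    "shore": (0.1, 0.9),
    "bank": (0.0, 1.0),
    "coast": (0.2, 0.8),
}


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return tuple(sum(column) / n for column in zip(*vectors))


def kmeans(vectors, k, iterations=100, seed=0):
    """Plain Lloyd-style K-means; returns one cluster index per vector."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, k)  # initialise from k distinct data points
    assignment = None
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_assignment == assignment:
            break  # converged: no vector changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean(members)
    return assignment


words = list(WORD_VECTORS)
labels = kmeans([WORD_VECTORS[w] for w in words], k=2)
clusters = {}
for word, label in zip(words, labels):
    clusters.setdefault(label, []).append(word)
print(list(clusters.values()))
```

The number of clusters `k` has to be chosen in advance, which is one reason a manual check of the resulting clusters, such as the dictionary and WordNet comparison the abstract describes, is useful.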
Saers, Markus. "Translation as Linear Transduction : Models and Algorithms for Efficient Learning in Statistical Machine Translation." Doctoral thesis, Uppsala universitet, Institutionen för lingvistik och filologi, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-135704.
Andersson, Karin. "'Consider' and its Swedish equivalents in relation to machine translation." Thesis, University of Skövde, School of Humanities and Informatics, 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-771.
This study describes the English verb ’consider’ and the characteristics of some of its senses. Such an investigation may be useful, since the machine translation program SYSTRAN has invariably translated ’consider’ with the Swedish verbs ’betrakta’ (Eng: ’view’, ’regard’) and ’anse’ (Eng: ’regard’). This handling of ’consider’ is not satisfactory in all contexts.
Since ’consider’ is a cogitative verb, it is fascinating to observe that both the theory of semantic primes and universals and conceptual semantics are concerned with cogitation in various ways. Anna Wierzbicka, who is one of the advocates of semantic primes and universals, argues that THINK should be considered as a semantic prime. Moreover, one of the prime issues of conceptual semantics is to describe how thoughts are constructed by virtue of e.g. linguistic components, perception and experience.
In order to define and clarify the distinctions between the different senses, we have taken advantage of the theory of mental spaces.
This thesis is structured according to the meanings of ’consider’ listed in WordNet. Accordingly, the senses of ’consider’ have been organized into the following groups: ’Observation’; ’Opinion’, with its sub-group ’Likelihood’; and ’Cogitation’, with its sub-group ’Attention/Consideration’.
A concordance tool, http://www.nla.se/culler, provided us with 90 literary quotations that were collected in a corpus. Afterwards, these citations were distributed between the groups mentioned above and translated into Swedish by SYSTRAN.
Furthermore, the meanings of ’consider’ have also been related to the senses recorded by the FrameNet scholars. Here, ’consider’ is regarded as a verb of ’Cogitation’ and ’Categorization’.
When this study was completed, it could be inferred that certain senses are connected to specific syntactic constructions. In other cases, however, the distinctions between various meanings can only be explained by semantics.
To conclude, it appears likely that implementation is facilitated if a specific syntactic construction can be tied to a particular sense. This may be the case for some meanings of ’consider’. Machine translation is presumably a much more laborious task if one is governed solely by semantic conditions.
Persson, Hans. "Persons with functional difficulties as resources in ICT design processes." Licentiate thesis, KTH, Human - Computer Interaction, MDI, 2008. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-4676.
This thesis has its roots in my experiences of working with people who have disabilities. Usually this group is the last one that producers consider as their customers. It is quite common for producers to make separate products (and services) for individuals with disabilities and for others. If the design work instead starts from the view that most people have functional difficulties at some point in time or place, the potential customer group for the product becomes larger.
The origin of this thesis is a project run by the Swedish Post and Telecom Agency (PTS), which aimed to identify what kinds of support or adaptation people with intellectual disabilities need when using broadband-based services. The project identified a number of areas of difficulty, most of which were not unique to this group.
From the results of the PTS project, a test, evaluation, and design model (the TED model) was built, in which one of the steps involves the use of an “indicator group”. The aim of this step is to identify, and provide a basis for prioritizing, the areas of difficulty that the continued design work should focus on. The indicator group consists of individuals with functional difficulties that are relevant in the given context. The model uses the possibilities of “design for all” to design better products for more people.
The empirical studies in this thesis were carried out in two areas. The first was a design project in which five different web sites were to be designed; the second was a study of three different business workplaces, where the cashier function was in focus.
The results of this thesis point to a possible direction for a design methodology whose objective is to create better products for a larger group of people. The starting point is to treat people's differences as an opportunity for design and not as a problem.
Individuals with functional difficulties constitute a resource for finding new innovations, which I have termed “the Lead of Need”. By this I mean individuals with functional difficulties who have a need and an idea for a solution, but not the possibility to realize it. If we can organise a meeting ground for individuals with “the Lead of Need”, designers, and developers, we will have created a “living lab” for new innovations.
Stymne, Sara. "Compound Processing for Phrase-Based Statistical Machine Translation." Licentiate thesis, Linköping : Department of Computer and Information Science, Linköpings universitet, 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-51416.
Laveborn, Joel. "Video Game Vocabulary : The effect of video games on Swedish learners' word comprehension." Thesis, Karlstad University, Karlstad University, 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:kau:diva-5487.
Video games are very popular among children in the Western world. This study investigated whether video games had an effect on the comprehension of English words among 49 Swedish students in grades 7-8. The investigation was based on questionnaire and word test data. The questionnaire aimed to measure how frequently students played video games, and the word test aimed to measure their word comprehension in general. In addition, data from the word test were used to investigate how students explained the words. Depending on their explanations, students were categorized as using either a “video game approach” or a “dictionary approach”.
The results showed a gender difference, both in the frequency of playing and in the types of games played. Playing video games seemed to increase the students' comprehension of English words, though there was no clear connection between how frequently students played video games and their choice of a dictionary or video game approach when explaining words.
Karlsson, Jessica. "Den offentliga dagboken : Vilka uttrycksmedel använder sig gymnasieungdomar av på dagboksbloggar?" Thesis, Jönköping University, School of Education and Communication, 2008. http://urn.kb.se/resolve?urn=urn:nbn:se:hj:diva-8734.
Since its beginnings, the Internet has opened new channels for communication. One of the most popular right now is blogging. Expressing oneself linguistically has come to mean much more than just using words. Blogs make it possible to add pictures, video, and colour, and to use typographic devices such as italic and bold type. All of these elements contribute to how the text is read and interpreted.
Based on fourteen diary blogs written by high school students, with 289 posts in total, my thesis aims to examine how content is presented on these blogs.
The research questions I have started from are:
- How do high school students use different means of expression to create blog posts aesthetically and creatively on so-called diary blogs?
- How are headlines, pictures, video, colour, and different text styles used to create communication and different expressions in the posts?
- How does the high school students' diary blog relate to the traditional diary with regard to design and possibilities for communication?
Through a structuralist analysis based on the work of Jurij Lotman, I have approached the blog posts on different levels, examining both details in the text and the overall design. I have found that the diary blog and the diary differ on several levels, above all in the communication, which takes place openly on the diary blog. Linguistically, the blog stands out mainly in that words and sentences are emphasized with bold and italic text, both to make the text easier to take in and to stress expressions. Smileys and other expressions of emotion show in turn how the young people avoid misunderstandings in a way that does not require reworking the text.
The thesis shows that a widened view of language and communication is necessary today, given the new media that have emerged in today's IT society.
Bjerva, Johannes. "Predicting the N400 Component in Manipulated and Unchanged Texts with a Semantic Probability Model." Thesis, Stockholms universitet, Avdelningen för datorlingvistik, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-82654.
Within computational linguistics, previous research has shown promising results in combining word space models and n-gram models. This is of particular interest when a model that captures both semantic and syntactic information is desired. A potential application of such a model lies in psycholinguistics, where a neural response called the N400 has been shown to occur in contexts of semantic incongruence. Previous research has found a strong correlation between cloze probabilities and the N400, and a more recent study has found a correlation between cloze probabilities and probabilistic models from computational linguistics. This thesis aims to investigate whether a more direct link exists between such combined models and the N400, with the hypothesis that low probabilities lead to large N400 responses and vice versa. A number of participants read a text manipulated with the help of a probabilistic model, and a natural text, in an EEG experiment. The analysis of the results shows that the manipulations to some extent yielded results that support the hypothesis. Corresponding results were found in the analysis of the responses to the natural text. No significant correlations were found between the N400 and the combined model. Improvements for further research include, among other things, refining the experimental paradigm so that large-scale EEG recording can be carried out in order to construct an EEG corpus.