Academic literature on the topic 'Datorlingvistik'
Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Datorlingvistik.'
Dissertations / Theses on the topic "Datorlingvistik"
Eklund, Robert, and Mats Wirén. "Effects of open and directed prompts on filled pauses and utterance production." Stockholms universitet, Avdelningen för datorlingvistik, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-40230.
von Kartaschew, Filip. "Grundtonsstrategier vid tonlösa segment." Thesis, Uppsala University, Department of Linguistics and Philology, 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-8270.
Prosody models, which can be used in speech synthesis among other things, are often based on analyses of speech consisting solely of voiced segments. Before a voiceless consonant, the fundamental frequency (F0) contour of a vowel segment has no possible continuation and also becomes shorter. This is usually handled by truncating the F0 contour. Earlier studies have briefly shown that differences other than truncation can arise in a vowel's F0 contour depending on whether the following segment is voiced or voiceless. Building on these studies, this degree project examines the F0 contour in Swedish sentences. The results of this study also show that different strategies are used in the F0 contour, and that truncation is not enough to explain what happens to the F0 contour in these contexts. In general, the results indicate that it seems important for the subjects to retain the information that the F0 contour provides in the form of maximum and minimum values, and that falls and rises are preserved as far as possible.
Tiedemann, Jörg. "Recycling Translations : Extraction of Lexical Data from Parallel Corpora and their Application in Natural Language Processing." Doctoral thesis, Uppsala University, Department of Linguistics, 2003. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-3791.
The focus of this thesis is on re-using translations in natural language processing. It involves the collection of documents and their translations in an appropriate format, the automatic extraction of translation data, and the application of the extracted data to different tasks in natural language processing.
Five parallel corpora containing more than 35 million words in 60 languages have been collected within co-operative projects. All corpora are sentence aligned and parts of them have been analyzed automatically and annotated with linguistic markup.
Lexical data are extracted from the corpora by means of word alignment. Two automatic word alignment systems have been developed, the Uppsala Word Aligner (UWA) and the Clue Aligner. UWA implements an iterative "knowledge-poor" word alignment approach using association measures and alignment heuristics. The Clue Aligner provides an innovative framework for the combination of statistical and linguistic resources in aligning single words and multi-word units. Both aligners have been applied to several corpora. Detailed evaluations of the alignment results have been carried out for three of them using fine-grained evaluation techniques.
A corpus processing toolbox, Uplug, has been developed. It includes the implementation of UWA and is freely available for research purposes. A new version, Uplug II, includes the Clue Aligner. It can be used via an experimental web interface (UplugWeb).
Lexical data extracted by the word aligners have been applied to different tasks in computational lexicography and machine translation. The use of word alignment in monolingual lexicography has been investigated in two studies. In a third study, the feasibility of using the extracted data in interactive machine translation has been demonstrated. Finally, extracted lexical data have been used for enhancing the lexical components of two machine translation systems.
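The abstract above describes "knowledge-poor" word alignment driven by association measures. As a rough illustration only, the sketch below scores word pairs in a sentence-aligned bitext with the Dice coefficient. The choice of Dice, the function names `dice_scores` and `best_links`, the threshold, and the toy Swedish-English sentences are all assumptions made for this example; they are not taken from UWA or the Clue Aligner.

```python
from collections import Counter
from itertools import product


def dice_scores(bitext):
    """Return the Dice score of every co-occurring (source, target) word pair.

    Dice(s, t) = 2 * cooc(s, t) / (freq(s) + freq(t)), counted per sentence pair.
    """
    src_freq, tgt_freq, pair_freq = Counter(), Counter(), Counter()
    for src_sentence, tgt_sentence in bitext:
        src_words = set(src_sentence.lower().split())
        tgt_words = set(tgt_sentence.lower().split())
        src_freq.update(src_words)
        tgt_freq.update(tgt_words)
        pair_freq.update(product(src_words, tgt_words))
    return {
        (s, t): 2 * n / (src_freq[s] + tgt_freq[t])
        for (s, t), n in pair_freq.items()
    }


def best_links(scores, threshold=0.8):
    """Keep, for each source word, its highest-scoring target at or above the threshold."""
    best = {}
    for (s, t), score in scores.items():
        if score >= threshold and score > best.get(s, ("", 0.0))[1]:
            best[s] = (t, score)
    return best


# Toy bitext invented for illustration (hypothetical data).
bitext = [
    ("tåget är sent", "the train is late"),
    ("tåget är här", "the train is here"),
    ("brevet är sent", "the letter is late"),
]
links = best_links(dice_scores(bitext))
print(links["tåget"])  # ('train', 1.0)
```

In this toy corpus the content words link cleanly, but the function word "är" scores equally well against "the" and "is". Ties of this kind are one reason an association measure alone is usually combined with further alignment heuristics.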
Axelsson, Hans, and Oskar Blom. "Utveckling av ett svensk-engelskt lexikon inom tåg- och transportdomänen." Thesis, Uppsala University, Department of Linguistics and Philology, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-8269.
This paper describes the process of building a machine translation lexicon for use in the train and transport domain with the machine translation system MATS. The lexicon will consist of a Swedish part, an English part and links between them, and is derived from a Trados translation memory which is split into a training part (90%) and a testing part (10%). The task is carried out mainly by using existing word linking software and recycling previous machine translation lexicons from other domains. In order to do this, a method is developed where the focus lies on automation by means of both existing and self-developed software, in combination with manual interaction. The domain-specific lexicon is then extended with a domain-neutral core lexicon and a less domain-neutral general lexicon. The different lexicons are evaluated automatically and manually through machine translation of the test corpus. The automatic evaluation of the largest lexicon yielded a NEVA score of 0.255 and a BLEU score of 0.190. In the manual evaluation, 34% of the segments were correctly translated, 37% were not correct but perfectly understandable, and 29% were difficult to understand.
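The 90%/10% division of the translation memory mentioned in the abstract above can be sketched as follows. The abstract does not say how the split was made, so the seeded random shuffle, the function name `split_segments`, and the placeholder segment pairs are assumptions for illustration.

```python
import random


def split_segments(segments, test_fraction=0.1, seed=0):
    """Shuffle translation-memory segments and split them into (train, test)."""
    shuffled = list(segments)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical (Swedish, English) segment pairs standing in for a translation memory.
segments = [(f"svensk mening {i}", f"English sentence {i}") for i in range(50)]
train, test = split_segments(segments)
print(len(train), len(test))  # 45 5
```

Holding the test part out before any lexicon is built keeps the evaluation segments unseen during extraction.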
Sandgren, Frida. "Creation of a customised character recognition application." Thesis, Uppsala University, Department of Linguistics and Philology, 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-4801.
This master’s thesis describes the creation of a customised optical character recognition (OCR) application, intended for use in the digitisation of theses submitted to Uppsala University in the 18th and 19th centuries. For this purpose, the open-source software Gamera has been used for recognition and classification of the characters in the documents. The software provides specific algorithms for analysis of heritage documents and is designed to be used as a tool for creating domain-specific (i.e. customised) recognition applications.
By using the Gamera classifier training interface, classifier data was created which reflects the characters in the particular theses. The data can then be used in automatic recognition of ‘new’ characters, by loading it into one of Gamera’s classifiers. The output of Gamera consists of sets of classified glyphs (i.e. small images of characters), stored in an XML-based format.
However, as OCR typically involves translation of images of text into a machine-readable format, a complementary OCR-module was needed. For this purpose, an external Gamera module for page segmentation was modified and used.
In addition, a script for control of the OCR-process was created, which initiates the page segmentation on Gamera classified glyphs. The result is written to text files.
Finally, in a test of recognition accuracy, one of the theses was used both to create training data and to provide test data. The results of the test show an average accuracy rate of 82% and a need for a better pre-processing module, one which removes more noise from the images and recognises different character sizes in the images before they are run through the OCR-process.
Larsson, Patrik. "Classification into Readability Levels : Implementation and Evaluation." Thesis, Uppsala University, Department of Linguistics and Philology, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-7132.
A readability classification model is mainly of use as an integrated part of an information retrieval system. By matching the user's readability demands to documents with the corresponding readability, the classification model can further improve the results of, for example, a search engine. This thesis presents a new solution for classification into readability levels for Swedish. The results of the thesis are a number of classification models. The models were induced by training a Support Vector Machines classifier on features that previous research has established as good measures of readability. The features were extracted from a corpus annotated with three readability levels. Natural Language Processing tools for tagging and parsing were used to analyze the corpus and enable the extraction of the features. Empirical tests of different feature combinations were performed to optimize the classification model. The classification models render a good and stable classification. The best model obtained a precision of 90.21% and a recall of 89.56% on the test set, which corresponds to an F-score of 89.88.
The thesis describes the development of a model for classifying Swedish texts according to their readability. A readability classification model is mainly of use within information retrieval systems. The model can increase the precision of the documents that a search engine considers relevant by matching the user's readability requirements against the readability of the indexed documents. The result of the thesis is a number of models for classifying text according to readability. The models were produced by training a Support Vector Machines classifier on a number of features that previous research has established as good measures of readability. The features were extracted from a corpus annotated with three readability levels. Language technology tools for tagging and parsing were used to enable the extraction of the features. The features were evaluated empirically in different feature combinations to optimize the models. The models were tested and evaluated with good results. The best model had a precision of 90.21 and a recall of 89.56, which gives an F-score of 89.88. The thesis presents suggestions for further development as well as potential areas of application.
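The F-score quoted in this entry can be checked from the reported precision and recall. The sketch below assumes the balanced F1 measure (the harmonic mean of precision and recall), which is consistent with the reported figures; the function name `f_score` and its `beta` parameter are illustrative.

```python
def f_score(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall; beta=1 gives the balanced F1."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall (in percent) as reported in the abstract.
f1 = f_score(90.21, 89.56)
print(round(f1, 2))  # 89.88
```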
Kotsifas, Dimitrios. "Intonation and sentence type interpretation in Greek : A production and perception approach." Thesis, University of Skövde, School of Humanities and Informatics, 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-2960.
This thesis examines the intonation patterns of Modern Greek with regard to different interpretations of the sentence types (declarative, interrogative, imperative).
Fourteen utterances were produced by native speakers of Greek (2 men and 2 women) to express various speech acts: STATEMENT, QUESTION, COMMAND and REQUEST.
The F0 curve of each utterance was extracted with the Wavesurfer tool, which made it possible to analyze the pitch movements and their alignment.
After the F0 curves are analyzed and plotted in Excel, we are able to compare and group them, which yields 5 different intonation patterns. A second-level comparison, based on the fact that some of the F0 curves were similar and differed only in the final pitch movement, reduced these to 3 fundamental categories of intonation patterns. The main feature of Category I is a rising pitch movement aligned to the onset of the stressed syllables. This category includes only sentences that denote Statement, so we can call it the STATEMENT category. The main characteristic of Category II is a dipping pitch movement aligned to the head of the utterance, that is, the stress of the verb or of a particle that signifies negation (/min/, /den/). Sentences meaning Command or Request belong to this category. Lastly, the intonation pattern of Category III consists of peaking pitch movements aligned to the initial and final stressed syllables. Interrogative sentences belong to this category regardless of their interpretation.
A secondary goal of the thesis is to examine to what extent intonation can be a safe criterion for the “correct” interpretation of a sentence. The working presumption is that, since the ratio between the number of utterances (14) and the number of intonation patterns (5) is not 1:1, misunderstandings among speakers can always occur. This is largely borne out by the results of our perception test with Greek native speakers: they were able to identify most of the speech acts that were expressed by the most common (default) sentence type (i.e. imperative sentence for COMMAND and interrogative for QUESTION); however, there were combinations that they had difficulty identifying, such as interrogative sentences that denoted something other than QUESTION, e.g. REQUEST or STATEMENT. Finally, a perception test with Flemish speakers (subjects who were native speakers of a language other than Greek) showed that they were more successful with sentences that meant STATEMENT and QUESTION, but they could hardly identify an interrogative sentence that meant something other than QUESTION, and they also confused COMMAND and REQUEST. This implies that the intonation used to convey different interpretations is basically language-dependent.
In conclusion, this study offers a description of the intonation patterns (based on pitch movements) for the 3 sentence types with 4 different interpretations. Our findings indicate that intonation is structure-independent in some cases (i.e. for sentences that express COMMAND or STATEMENT) and structure-dependent in others (cf. the interrogative sentences). Additionally, the fact that negation can play an important role in the choice of intonation pattern (as shown for COMMAND and STATEMENT) could be considered a structure-dependent feature of intonation. This approach contrasts with the approach long used in traditional grammar, according to which the structure alone (sentence type) defines the meaning that is to be conveyed.
Hjelm, Hans. "Cross-language Ontology Learning : Incorporating and Exploiting Cross-language Data in the Ontology Learning Process." Doctoral thesis, Stockholms universitet, Institutionen för lingvistik, 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-8414.
To order the book, send an e-mail to exp@ling.su.se
Nilsson, Kristina. "Hybrid Methods for Coreference Resolution in Swedish." Doctoral thesis, Stockholm : Department of Linguistics, Stockholm University, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-38395.
Öquist, Gustav. "Evaluating Readability on Mobile Devices." Doctoral thesis, Uppsala University, Department of Linguistics and Philology, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-7378.
The thesis presents findings from five readability studies performed on mobile devices. The dynamic Rapid Serial Visual Presentation (RSVP) format has been enhanced with regard to linguistic adaptation and segmentation as well as eye movement modeling. The novel formats have been evaluated against other common presentation formats including Paging, Scrolling, and Leading in Latin-square balanced repeated-measurement studies with 12-16 subjects. Apart from monitoring Reading speed, Comprehension, and Task load (NASA-TLX), Eye movement tracking has been used to learn more about how the text presentation affects reading.
The Paging format generally offered the best readability. Reading on a mobile phone decreased reading speed by 10% compared to reading on a Personal Digital Assistant (PDA), an interesting finding given that the display area of the mobile phone was 50% smaller. Scrolling, the most commonly used presentation format on mobile devices today, proved inferior to both Paging and RSVP. Leading, the most widely known dynamic format, caused very unnatural eye movements for reading. This seems to have increased task load, but not affected reading speed to a similar extent. The RSVP format displaying one word at a time was found to reduce eye movements significantly, but contrary to common claims, this resulted in decreased reading speed and increased task load. In the last study, Predictive Text Presentation (PTP) was introduced. The format is based on RSVP and combines linguistic chunking and adaptation with eye movement modeling to achieve a reading experience that can rival traditional text presentation.
The thesis explains why readability on mobile devices is important and how it may be evaluated in an efficient yet reliable manner, and it identifies PTP as the format with the greatest potential for improvement. The methodology used in the evaluations and the shortcomings of the studies are discussed. Finally, a hyper-Graeco-Latin-square experimental design is proposed for future evaluations.
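The Latin-square balancing mentioned in this abstract assigns presentation orders so that each format appears once in every serial position. The abstract does not give the actual squares used, so the cyclic construction below is an assumption chosen for simplicity, and the function name `latin_square` is illustrative.

```python
def latin_square(conditions):
    """Build a cyclic Latin square: row i is the condition list rotated by i."""
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)] for row in range(n)]


# Presentation formats named in the abstract; each row is one subject group's order.
formats = ["Paging", "Scrolling", "Leading", "RSVP"]
orders = latin_square(formats)
for group, order in enumerate(orders, start=1):
    print(group, order)
```

A cyclic square balances the position of each condition but not which condition precedes which, so carry-over effects between particular formats are not controlled by this construction alone.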
Books on the topic "Datorlingvistik"
Väyrynen, Pertti. Perspectives on the utility of linguistic knowledge in English word prediction. Oulu: Oulun yliopisto, 2005.