Theses on the topic "Statistical linguistics"
Browse the top 50 theses for your research on the topic "Statistical linguistics".
Onnis, Luca. "Statistical language learning". Thesis, University of Warwick, 2003. http://wrap.warwick.ac.uk/54811/.
Zhang, Lidan and 张丽丹. "Exploiting linguistic knowledge for statistical natural language processing". Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2011. http://hub.hku.hk/bib/B46506299.
White, Christopher Wm. "Some Statistical Properties of Tonality, 1650-1900". Thesis, Yale University, 2014. http://pqdtopen.proquest.com/#viewpdf?dispub=3578472.
This dissertation investigates the statistical properties present within corpora of common practice music, involving a data set of more than 8,000 works spanning from 1650 to 1900, and focusing specifically on the properties of the chord progressions contained therein.
In the first chapter, methodologies concerning corpus analysis are presented and contrasted with text-based methodologies. It is argued that corpus analyses not only can show large-scale trends within data, but can empirically test and formalize traditional or inherited music theories, while also modeling corpora as a collection of discursive and communicative materials. Concerning the idea of corpus analysis as an analysis of discourse, literature concerning musical communication and learning is reviewed, and connections between corpus analysis and statistical learning are explored. After making this connection, we explore several problems with models of musical communication (e.g., music's composers and listeners likely use different cognitive models for their respective production and interpretation) and several implications of connecting corpora to cognitive models (e.g., a model's dependency on a particular historical situation).
Chapter 2 provides an overview of literature concerning computational musical analysis. The divide between top-down systems and bottom-up systems is discussed, and examples of each are reviewed. The chapter ends with an examination of more recent applications of information theory in music analysis.
Chapter 3 considers various ways corpora can be grouped as well as the implications those grouping techniques have on notions of musical style. It is hypothesized that the evolution of musical style can be modeled through the interaction of corpus statistics, chronological eras, and geographic contexts. This idea is tested by quantifying the probabilities of various composers' chord progressions, and cluster analyses are performed on these data. Various ways to divide and group corpora are considered, modeled, and tested.
In the fourth chapter, this dissertation investigates notions of harmonic vocabulary and syntax, hypothesizing that music involves syntactic regularity in much the same way as occurs in spoken languages. This investigation first probes this hypothesis through a corpus analysis of the Bach chorales, identifying potential syntactic/functional categories using a Hidden Markov Model. The analysis produces a three-function model as well as models with higher numbers of functions. In the end, the data suggest that music does indeed involve regularities, while also arguing for a definition of chord function that adds subtlety to models used by traditional music theory. A number of implications are considered, including the interaction of chord frequency and chord function, and the preeminence of triads in the resulting syntactic models.
Chapter 5 considers a particularly difficult problem of corpus analysis as it relates to musical vocabulary and syntax: the variegated and complex musical surface. One potential algorithm for vocabulary reduction is presented. This algorithm attempts to change each chord within a trigram to the subset or superset that maximizes the probability of that trigram occurring. When a corpus of common-practice music is processed using this algorithm, a standard tertian chord vocabulary results, along with a bigram chord syntax that adheres to our intuitions concerning standard chord function.
In the sixth chapter, this study probes the notion of musical key as it concerns communication, suggesting that if musical practice is constrained by its point in history and progressions of chords exhibit syntactic regularities, then one should be able to build a key-finding model that learns to identify key by observing some historically situated corpus. Such a model is presented, and is trained on the music of a variety of different historical periods. The model then analyzes two famous moments of musical ambiguity: the openings of Beethoven's Eroica and Wagner's prelude to Tristan und Isolde. The results confirm that different corpus-trained models produce subtly different behavior.
The dissertation ends by considering several general and summarizing issues, for instance the notion that there are many historically-situated tonal models within Western music history, and that the difference between listening and compositional models likely accounts for the gap between the complex statistics of the tonal tradition and traditional concepts in music theory.
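The vocabulary-reduction idea described for Chapter 5 of this abstract can be illustrated with a small sketch. It is not the dissertation's algorithm: it assumes chords are sets of pitch classes, uses raw trigram counts as a stand-in for probabilities, and greedily considers one chord position at a time. The names `related` and `reduce_trigram`, and the toy counts, are hypothetical.

```python
from collections import Counter

def related(a, b):
    """True when chord a is a subset or a superset of chord b."""
    return a <= b or a >= b

def reduce_trigram(trigram, counts):
    """Greedily swap each chord for the related chord that makes the trigram most frequent."""
    # Assumption: the candidate vocabulary is every chord seen in the counted trigrams.
    vocabulary = sorted({chord for tri in counts for chord in tri}, key=sorted)
    best = list(trigram)
    for i in range(3):
        # The original chord comes first, so ties leave the chord unchanged.
        candidates = [best[i]] + [c for c in vocabulary if related(c, best[i])]
        best[i] = max(
            candidates,
            key=lambda c: counts[tuple(best[:i] + [c] + best[i + 1:])],
        )
    return tuple(best)

# Toy data (invented): pitch-class sets for C, G, G7 and a C triad with an added D.
C, G, G7 = frozenset({0, 4, 7}), frozenset({2, 7, 11}), frozenset({2, 5, 7, 11})
C_add9 = frozenset({0, 2, 4, 7})
counts = Counter({(C, G, C): 10, (C, G7, C): 3})
print(reduce_trigram((C_add9, G7, C), counts))
```

With these invented counts, the surface chords collapse onto the more frequent triadic trigram, which is the kind of tertian vocabulary the abstract reports.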
Arad, Iris. "A quasi-statistical approach to automatic generation of linguistic knowledge". Thesis, University of Manchester, 1991. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.358872.
McMahon, John George Gavin. "Statistical language processing based on self-organising word classification". Thesis, Queen's University Belfast, 1994. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.241417.
Clark, Stephen. "Class-based statistical models for lexical knowledge acquisition". Thesis, University of Sussex, 2001. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.341541.
Lakeland, Corrin. "Lexical approaches to backoff in statistical parsing". University of Otago. Department of Computer Science, 2006. http://adt.otago.ac.nz./public/adt-NZDU20060913.134736.
Stymne, Sara. "Compound Processing for Phrase-Based Statistical Machine Translation". Licentiate thesis, Linköping : Department of Computer and Information Science, Linköpings universitet, 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-51416.
Yamangil, Elif. "Rich Linguistic Structure from Large-Scale Web Data". Thesis, Harvard University, 2013. http://dissertations.umi.com/gsas.harvard:11162.
Engineering and Applied Sciences
Phillips, Aaron B. "Modeling Relevance in Statistical Machine Translation: Scoring Alignment, Context, and Annotations of Translation Instances". Research Showcase @ CMU, 2012. http://repository.cmu.edu/dissertations/134.
Madden, Joshua. "A statistical analysis of high-traffic websites". Thesis, Kansas State University, 2014. http://hdl.handle.net/2097/17650.
Department of Journalism and Mass Communications
Steven Smethers
Although scholars have increasingly recognized the important role of the Internet within the field of mass communications, little research has been done analyzing the behavior of individuals online. The success or failure of a site is often dependent on the number of visitors it receives (often called “traffic”) and this includes newspapers that are attempting to direct larger audiences to their websites. Theoretical arguments have been made for certain factors (region, social media presence, backlinks, etc.) having a positive correlation with traffic, but few, if any, statistical analyses have been done on traffic patterns. This study looks at a sample of approximately 300 high-traffic websites and forms several regression models in order to analyze which factors are most highly correlated with Internet traffic and what the nature of that correlation is.
Tweedie, Fiona Jane. "A statistical investigation into the provenance of De Doctrina Christiana, attributed to John Milton". Thesis, University of the West of England, Bristol, 1997. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.364078.
Joelsson, Jakob. "Translationese and Swedish-English Statistical Machine Translation". Thesis, Uppsala universitet, Institutionen för lingvistik och filologi, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-305199.
Dehdari, Jonathan. "A Neurophysiologically-Inspired Statistical Language Model". The Ohio State University, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=osu1399071363.
Filali, Karim. "Multi-dynamic Bayesian networks for machine translation and NLP". Thesis, 2007. http://hdl.handle.net/1773/6857.
Ntantiso, Mzamo. "Exploring the statistical equivalence of the English and Xhosa versions of the Woodcock-Muñoz Language Survey". Thesis, Nelson Mandela Metropolitan University, 2009. http://hdl.handle.net/10948/d1018620.
Su, Kim Nam. "Statistical modeling of multiword expressions". 2008. http://repository.unimelb.edu.au/10187/3147.
Our goals in this research are: to use computational techniques to shed light on the underlying linguistic processes giving rise to MWEs across constructions and languages; to generalize existing techniques by abstracting away from individual MWE types; and finally to exemplify the utility of MWE interpretation within general NLP tasks.
In this thesis, we target English MWEs due to resource availability. In particular, we focus on noun compounds (NCs) and verb-particle constructions (VPCs) due to their high productivity and frequency.
Challenges in processing noun compounds are: (1) interpreting the semantic relation (SR) that represents the underlying connection between the head noun and modifier(s); (2) resolving syntactic ambiguity in NCs comprising three or more terms; and (3) analyzing the impact of word sense on noun compound interpretation. Our basic approach to interpreting NCs relies on the semantic similarity of the NC components using firstly a nearest-neighbor method (Chapter 5), then verb semantics based on the observation that it is often an underlying verb that relates the nouns in NCs (Chapter 6), and finally semantic variation within NC sense collocations, in combination with bootstrapping (Chapter 7).
Challenges in dealing with verb-particle constructions are: (1) identifying VPCs in raw text data (Chapter 8); and (2) modeling the semantic compositionality of VPCs (Chapter 5). We place particular focus on identifying VPCs in context, and measuring the compositionality of unseen VPCs in order to predict their meaning. Our primary approach to the identification task is to adapt localized context information derived from linguistic features of VPCs to distinguish between VPCs and simple verb-PP combinations. To measure the compositionality of VPCs, we use semantic similarity among VPCs by testing the semantic contribution of each component.
Finally, we conclude the thesis with a chapter-by-chapter summary and outline of the findings of our work, suggestions of potential NLP applications, and a presentation of further research directions (Chapter 9).
Bijleveld, Henny. "Linguistiche analysis van neurogeen stotteren". Doctoral thesis, Universite Libre de Bruxelles, 1999. http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/211864.
Robinson, Cory S. "A Statistical Approach to Syllabic Alliteration in the Odyssean Aeneid". BYU ScholarsArchive, 2014. https://scholarsarchive.byu.edu/etd/4199.
Urieli, Assaf. "Robust French syntax analysis : reconciling statistical methods and linguistic knowledge in the Talismane toolkit". Phd thesis, Université Toulouse le Mirail - Toulouse II, 2013. http://tel.archives-ouvertes.fr/tel-01058143.
Wright, Christopher M. "Using Statistical Methods to Determine Geolocation Via Twitter". TopSCHOLAR®, 2014. http://digitalcommons.wku.edu/theses/1372.
Chan, Oscar. "Prosodic features for a maximum entropy language model". University of Western Australia. School of Electrical, Electronic and Computer Engineering, 2008. http://theses.library.uwa.edu.au/adt-WU2008.0244.
Jarman, Jay. "Combining Natural Language Processing and Statistical Text Mining: A Study of Specialized Versus Common Languages". Scholar Commons, 2011. http://scholarcommons.usf.edu/etd/3166.
Corradini, Ryan Arthur. "A Hybrid System for Glossary Generation of Feature Film Content for Language Learning". BYU ScholarsArchive, 2010. https://scholarsarchive.byu.edu/etd/2238.
Packer, Thomas L. "Surface Realization Using a Featurized Syntactic Statistical Language Model". Diss., 2006. http://contentdm.lib.byu.edu/ETD/image/etd1195.pdf.
Murakami, Akira. "Individual variation and the role of L1 in the L2 development of English grammatical morphemes : insights from learner corpora". Thesis, University of Cambridge, 2014. https://www.repository.cam.ac.uk/handle/1810/254430.
Bernhardsson, Sebastian. "Structures in complex systems : Playing dice with networks and books". Doctoral thesis, Umeå universitet, Institutionen för fysik, 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-27694.
Complex systems are neither perfectly ordered nor completely random. They consist of a multitude of actors that in many cases act together in such a way that their combined strength is greater than their individual performances. It is often effective to represent these systems as networks in which the actual connections between the actors play a decisive role. Networks are everywhere around us and are an important part of our world, from the protein machinery inside our cells to social interactions and man-made communication systems. Many of these systems have developed over a long time and are constantly undergoing changes driven by complicated small-scale events. These events are often too complicated for us to analyze accurately, which makes our world appear random and unpredictable. There are, however, ways to use this unpredictability to our advantage by replacing the real events with much simpler rules based on probabilities, which give effectively the same outcome. This allows us to capture the overall behavior of the system, to extract important information about its dynamics, and to gain knowledge about the reason for what we observe. Statistical mechanics deals with large systems driven by such underlying random processes under various constraints, in much the same way as networks inside cells are driven by random mutations under the constraints of natural selection. This similarity makes it interesting to combine the two and to apply the tools provided by statistical mechanics to biological systems. This thesis presents several null models that, based on this view, capture and explain different types of structural properties of real biological networks. The most recent major evolutionary transition is the development of language, both spoken and written. This thesis also addresses the subject of quantitative linguistics through the eyes of a physicist, here called linguaphysics. In this case too, data are analyzed under an assumption of underlying randomness. It is demonstrated that certain statistical properties of books, previously thought to be universal, actually depend on the length of the book and on the author. A meta-book theory is put forward which explains this dependence by describing the writing of a text as pulling a section out of a large, individual, abstract mother book.
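The claim that a book's statistics depend on its length can be illustrated with a toy simulation. This is only a sketch and not the thesis's meta-book model: it assumes the "mother book" is a fixed Zipf-like word distribution, and the vocabulary size, exponent, and function names are all hypothetical choices made for illustration.

```python
import random

def mother_book(vocab_size=5000, exponent=1.0):
    """Zipf-like word weights standing in for an author's abstract mother book."""
    words = list(range(vocab_size))
    weights = [1.0 / (rank + 1) ** exponent for rank in words]
    return words, weights

def type_token_ratio(length, words, weights, rng):
    """Draw a text of `length` tokens and return distinct words / total words."""
    text = rng.choices(words, weights=weights, k=length)
    return len(set(text)) / length

rng = random.Random(0)  # fixed seed so the run is repeatable
words, weights = mother_book()
for n in (100, 1000, 10000):
    print(n, round(type_token_ratio(n, words, weights, rng), 3))
```

Under these assumptions, the share of distinct words shrinks as the sampled section grows, so a statistic of this kind cannot be compared across books of different lengths without correcting for length.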
Botha, Gerrti Reinier. "Text-based language identification for the South African languages". Pretoria : [s.n.], 2007. http://upetd.up.ac.za/thesis/available/etd-090942008-133715/.
Eklund, Robert. "A Probabilistic Tagging Module Based on Surface Pattern Matching". Thesis, Stockholm University, Department of Computational Linguistics, Institute of Linguistics, 1993. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-135294.
Saers, Markus. "Translation as Linear Transduction : Models and Algorithms for Efficient Learning in Statistical Machine Translation". Doctoral thesis, Uppsala universitet, Institutionen för lingvistik och filologi, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-135704.
Pettersson, Eva. "Spelling Normalisation and Linguistic Analysis of Historical Text for Information Extraction". Doctoral thesis, Uppsala universitet, Institutionen för lingvistik och filologi, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-269753.
Barnhart, Zachary. "A Comparative Analysis of Web-based Machine Translation Quality: English to French and French to English". Thesis, University of North Texas, 2012. https://digital.library.unt.edu/ark:/67531/metadc177176/.
Williams, Jake Ryland. "Lexical mechanics: Partitions, mixtures, and context". ScholarWorks @ UVM, 2015. http://scholarworks.uvm.edu/graddis/346.
Gispert, Ramis Adrià. "Introducing linguistic knowledge into statistical machine translation". Doctoral thesis, Universitat Politècnica de Catalunya, 2007. http://hdl.handle.net/10803/6902.
This Ph.D. dissertation addresses the use of morphosyntactic information to improve the performance of Statistical Machine Translation (SMT) systems, providing them with additional linguistic information beyond the surface level of words from parallel corpora.
The statistical machine translation system used in this work follows a tuple-based approach, modelling joint-probability translation models via a log-linear combination of bilingual n-grams with additional feature functions. A detailed study of the approach is conducted. This includes its initial development from a speech-oriented Finite-State Transducer architecture implementing X-grams towards a large-vocabulary text-oriented n-gram implementation, training and decoding particularities, portability across language pairs and tasks, and the main difficulties revealed in error analyses.
The use of linguistic knowledge to improve word alignment quality is also studied. A cooccurrence-based one-to-one word alignment algorithm is extended with verb form classification, with successful results. Additionally, we evaluate the impact on word alignment and translation quality of Part-Of-Speech, base form, verb form classification and stemming on state-of-the-art word alignment tools.
Furthermore, the thesis proposes a translation model tackling verb form generation through an additional verb instance model, reporting experiments in English-to-Spanish tasks. Disagreement is addressed by incorporating a target Part-Of-Speech language model. Finally, we study the impact of morphology derivation on Ngram-based SMT formulation, empirically evaluating the quality gain obtainable via morphology reduction.
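The log-linear combination of feature functions mentioned in this abstract can be sketched in a few lines. This is a generic illustration and not the thesis's system: the feature names, weights, probabilities, and candidate translations below are invented, and a real decoder would search over hypotheses rather than score a fixed list.

```python
import math

def loglinear_score(features, weights):
    """Weighted sum of feature values: sum over i of lambda_i * h_i."""
    return sum(weights[name] * value for name, value in features.items())

def best_hypothesis(hypotheses, weights):
    """Return the translation hypothesis with the highest log-linear score."""
    return max(hypotheses, key=lambda h: loglinear_score(h["features"], weights))

# Hypothetical weights and feature values (log-probabilities plus a word bonus).
weights = {"bilingual_ngram": 1.0, "target_pos_lm": 0.5, "word_bonus": 0.1}
hypotheses = [
    {"text": "ellos cantan bien",
     "features": {"bilingual_ngram": math.log(0.02),
                  "target_pos_lm": math.log(0.1),
                  "word_bonus": 3}},
    {"text": "ellos canta bien",
     "features": {"bilingual_ngram": math.log(0.03),
                  "target_pos_lm": math.log(0.01),
                  "word_bonus": 3}},
]
print(best_hypothesis(hypotheses, weights)["text"])
```

In this made-up case the second candidate has the better translation-model score, but the Part-Of-Speech language model feature penalizes its agreement error enough for the first to win, which is the role the abstract assigns to that feature.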
Hoang, Hieu. "Improving statistical machine translation with linguistic information". Thesis, University of Edinburgh, 2011. http://hdl.handle.net/1842/5781.
Zbib, Rabih M. (Rabih Mohamed) 1974. "Using linguistic knowledge in statistical machine translation". Thesis, Massachusetts Institute of Technology, 2010. http://hdl.handle.net/1721.1/62391.
Cataloged from PDF version of thesis.
Includes bibliographical references (p. 153-162).
In this thesis, we present methods for using linguistically motivated information to enhance the performance of statistical machine translation (SMT). One of the advantages of the statistical approach to machine translation is that it is largely language-agnostic. Machine learning models are used to automatically learn translation patterns from data. SMT can, however, be improved by using linguistic knowledge to address specific areas of the translation process, where translations would be hard to learn fully automatically. We present methods that use linguistic knowledge at various levels to improve statistical machine translation, focusing on Arabic-English translation as a case study. In the first part, morphological information is used to preprocess the Arabic text for Arabic-to-English and English-to-Arabic translation, which reduces the gap in the complexity of the morphology between Arabic and English. The second method addresses the issue of long-distance reordering in translation to account for the difference in the syntax of the two languages. In the third part, we show how additional local context information on the source side is incorporated, which helps reduce lexical ambiguity. Two methods are proposed for using binary decision trees to control the amount of context information introduced. These methods are successfully applied to the use of diacritized Arabic source in Arabic-to-English translation. The final method combines the outputs of an SMT system and a Rule-based MT (RBMT) system, taking advantage of the flexibility of the statistical approach and the rich linguistic knowledge embedded in the rule-based MT system.
by Rabih M. Zbib.
Ph.D. in Information Technology
Lindgren, Anna. "Semi-Automatic Translation of Medical Terms from English to Swedish : SNOMED CT in Translation". Thesis, Linköpings universitet, Medicinsk informatik, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-69736.
The international clinical terminology SNOMED CT has been translated from English to Swedish under the responsibility of Socialstyrelsen (the Swedish National Board of Health and Welfare). This study was carried out to determine whether semi-automatic translation methods could produce sufficiently good translations with fewer resources than manual translation. The English-Swedish medical glossary TermColl was used as the basis for translating subsets of SNOMED CT via translation memory and via statistical translation. With Socialstyrelsen's translations as the reference, the semi-automatic translations were scored using BLEU. The results showed that statistical translation gave considerably better results than translation with translation memory, but overall the results were too poor for semi-automatic translation to be recommended in this case.
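Because the study above scores candidate term translations against reference translations with BLEU, a simplified version of the metric may help make the comparison concrete. This is a sketch under stated assumptions, not the study's evaluation setup: it handles one candidate and one reference, uses n-grams up to length 2 because medical terms are short, and skips n-gram orders the candidate is too short for (standard BLEU does not). The Swedish example terms are invented and are not taken from SNOMED CT or TermColl.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def simple_bleu(candidate, reference, max_n=2):
    """Sentence-level BLEU sketch: clipped n-gram precisions times a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        total = sum(cand_counts.values())
        if total == 0:
            continue  # assumption: ignore orders longer than the candidate
        # Clip each candidate n-gram count by its count in the reference.
        matched = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        if matched == 0:
            return 0.0
        log_precisions.append(math.log(matched / total))
    # Penalize candidates shorter than the reference.
    brevity = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity * math.exp(sum(log_precisions) / len(log_precisions))

print(simple_bleu("akut bronkit hos barn", "akut bronkit hos vuxna"))
```

An exact match scores 1.0 and a candidate sharing no words with the reference scores 0.0; partial matches fall in between, which is how the translation-memory and statistical outputs could be ranked against the reference.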
Linardaki, Evita. "Linguistic and statistical extensions of data oriented parsing". Thesis, University of Essex, 2006. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.434401.
Panesar, Kulvinder. "Conversational artificial intelligence - demystifying statistical vs linguistic NLP solutions". Universitat Politècnica de València, 2020. http://hdl.handle.net/10454/18121.
This paper aims to demystify the hype and attention around chatbots and their association with conversational artificial intelligence. Both are slowly emerging as a real presence in our lives, driven by impressive technological developments in machine learning, deep learning and natural language understanding solutions. However, our question is what is under the hood, and how far and to what extent chatbots and conversational artificial intelligence solutions can work. Natural language is the most easily understood knowledge representation for people, but certainly not the best for computers because of its inherently ambiguous, complex and dynamic nature. We will critique the knowledge representation of heavily statistical chatbot solutions against linguistic alternatives. In order to react intelligently to the user, natural language solutions must critically consider other factors such as context, memory, intelligent understanding, previous experience, and personalized knowledge of the user. We will delve into the spectrum of conversational interfaces and focus on a strong artificial intelligence concept. This is explored via a text-based conversational software agent with a deep strategic role: to hold a conversation, to enable the mechanisms needed to plan and to decide what to do next, and to manage the dialogue to achieve a goal. To demonstrate this, a deeply linguistically aware and knowledge-aware text-based conversational agent (LING-CSA) presents a proof of concept of a non-statistical conversational AI solution.
Osika, Anton. "Statistical analysis of online linguistic sentiment measures with financial applications". Thesis, KTH, Matematisk statistik, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-177106.
The company Gavagai uses various measures to estimate, in real time, sentiment from diverse streams of public documents. Gavagai wants to find a procedure that determines which measures are best suited to a given context. This work discusses which criteria are desirable for measuring sentiment, and derives and evaluates procedures for selecting "optimal" sentiment measures. Three methods are proposed for selecting a group of measures that describe independent polarizations in text. These are based on: selecting measures for which principal component analysis shows high dimensionality among the measures; selecting measures that maximize the total estimated differential entropy; and selecting a measure that has high conditional variance given other polarizations. When exogenous time-varying data about a topic are available, these data can be used to compute which sentiment measures best describe them. To investigate the potential of selecting sentiment measures in this way, the hypotheses that public sentiment measures can predict financial volatility and political opinion polls are tested. The null hypothesis cannot be rejected. A summary of how to aggregate sentiment in a thoroughly mathematically coherent way is presented, together with recommendations for future research.
Herrmann, Teresa [author] and A. [academic supervisor] Waibel. "Linguistic Structure in Statistical Machine Translation / Teresa Herrmann. Supervisor: A. Waibel". Karlsruhe : KIT-Bibliothek, 2015. http://d-nb.info/1102250155/34.
Kliegl, Reinhold. "Publication Statistics Show Collaboration, Not Competition". Universität Potsdam, 2008. http://opus.kobv.de/ubp/volltexte/2011/5719/.
Rayson, Paul Edward. "Matrix : a statistical method and software tool for linguistic analysis through corpus comparison". Thesis, Lancaster University, 2003. http://eprints.lancs.ac.uk/12287/.
Kearsley, Logan R. "A Hybrid Approach to Cross-Linguistic Tokenization: Morphology with Statistics". BYU ScholarsArchive, 2016. https://scholarsarchive.byu.edu/etd/5984.
Tolle, Kristin M. "Domain-independent semantic concept extraction using corpus linguistics, statistics and artificial intelligence techniques". Diss., The University of Arizona, 2003. http://hdl.handle.net/10150/280502.
Xu, Yushi Ph D. Massachusetts Institute of Technology. "Combining linguistics and statistics for high-quality limited domain English-Chinese machine translation". Thesis, Massachusetts Institute of Technology, 2008. http://hdl.handle.net/1721.1/44726.
Includes bibliographical references (p. 86-87).
Second language learning is a compelling activity in today's global markets. This thesis focuses on critical technology necessary to produce a computer spoken translation game for learning Mandarin Chinese in a relatively broad travel domain. Three main aspects are addressed: efficient Chinese parsing, high-quality English-Chinese machine translation, and how these technologies can be integrated into a translation game system. In the language understanding component, the TINA parser is enhanced with bottom-up and long-distance constraint features. The results showed that with these features, the Chinese grammar ran ten times faster and covered 15% more of the test set. In the machine translation component, a method combining linguistic and statistical systems is introduced. The English-Chinese translation is done via an intermediate language, "Zhonglish", where the English-Zhonglish translation is accomplished by a parse-and-paraphrase paradigm using hand-coded rules, mainly for structural reconstruction. Zhonglish-Chinese translation is accomplished by a standard phrase-based statistical machine translation system, mostly performing word sense disambiguation and lexicon mapping. We evaluated on an independent test set from the IWSLT travel-domain spoken language corpus. Substantial improvements were achieved for GIZA alignment crossover: we obtained a 45% decrease in crossovers compared to a traditional phrase-based statistical MT system. Furthermore, the BLEU score improved by 2 points. Finally, a framework of the translation game system is described, and the feasibility of integrating the components to produce reference translations and to automatically assess students' translations is verified.
by Yushi Xu.
S.M.
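The alignment "crossovers" reported in the abstract above can be made concrete with a short sketch. The abstract does not define the measure, so this assumes one common reading: two word-alignment links cross when their source order and target order disagree. The function name and the example alignments are hypothetical.

```python
from itertools import combinations

def count_crossovers(links):
    """Count pairs of alignment links (source_index, target_index) that cross."""
    return sum(
        1
        for (s1, t1), (s2, t2) in combinations(links, 2)
        # Opposite signs mean the two links appear in reversed order on one side.
        if (s1 - s2) * (t1 - t2) < 0
    )

monotone = [(0, 0), (1, 1), (2, 2)]    # word order preserved
reordered = [(0, 2), (1, 0), (2, 1)]   # first source word moves to the end
print(count_crossovers(monotone), count_crossovers(reordered))
```

Under this reading, restructuring English into a Chinese-like word order before statistical translation would make the alignments closer to monotone, which is consistent with the drop in crossovers the abstract describes.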
Xu, Jia [author]. "Sequence segmentation for statistical machine translation / Jia Xu". Aachen : Hochschulbibliothek der Rheinisch-Westfälischen Technischen Hochschule Aachen, 2010. http://d-nb.info/1015180108/34.
Baker, David Ian. "Shepherd of Hermas : a socio-rhetorical and statistical-linguistic study of authorship and community concerns". Thesis, Cardiff University, 2006. http://orca.cf.ac.uk/56076/.
FRAGOSO, LUANE DA COSTA PINTO LINS. "INTEGRATION OF LINGUISTIC AND GRAPHIC INFORMATION IN MULTIMODAL COMPREHENSION OF STATISTICAL GRAPHS: A PSYCHOLINGUISTIC ASSESSMENT". PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2015. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=25595@1.
COORDENAÇÃO DE APERFEIÇOAMENTO DO PESSOAL DE ENSINO SUPERIOR
PROGRAMA DE SUPORTE À PÓS-GRADUAÇÃO DE INSTS. DE ENSINO
This thesis investigates the mapping between the content of sentences and the content presented in graphs during multimodal comprehension. It takes an experimental approach, grounded in the theoretical and methodological contributions of Cognitive Psychology and Psycholinguistics, together with discussions from Mathematics Education and studies of multimodality and literacy. Two proposals concerning the integration of linguistic and visual information are considered: one linked to Jackendoff's (1996) representational modularity hypothesis, which posits interface modules of a hybrid nature; and an alternative, adopted in this work, according to which both linguistic and visual processing generate abstract/propositional representations that are integrated at a conceptual interface. The study examined (i) whether top-down factors such as prior knowledge of the subject affect this integration and (ii) to what extent linguistic information sets up expectations about the information expressed in the graph. Two sentence-picture comparison experiments were conducted with column and line graphs using the PsyScope software, and a third with line graphs using eye tracking. No evidence of top-down effects was found in the column graph experiment. However, significant response-time effects were obtained for other factors, namely graph correctness, the lexical expression used to compare graph items (e.g., larger vs. smaller), and the number of items mentioned in the sentence that had to be located in the graph. In both line graph experiments, the independent variables were (i) congruency (line congruent/incongruent with the verb; for example, a line sloping upward or downward vs. the verb "rise") and (ii) correctness of the graph in expressing the content of the sentence, manipulated through changes in the line and in the ordering (ascending/descending) of temporal information on the x axis. In the PsyScope experiment, participants had no difficulty judging sentence/graph compatibility when congruency and correctness did not diverge. For response time, there were main effects of congruency and correctness, with shorter times associated, respectively, with conditions in which the line was congruent with the verb and the graph was correct. There was also an interaction between the variables. In the eye-tracking experiment, accuracy rates, number of fixations, total fixation duration, and scanpath in the marked areas of interest were analysed. As in the PsyScope experiment, accuracy rates showed greater processing difficulty in the incongruent-correct condition, in which expectations are violated regarding both the position of the line (vs. the verb) and the usual organisation of graphs along the x axis. As for eye movements, in the graph area the number and total duration of fixations were higher in the correct conditions; in the sentence area, these conditions showed the opposite pattern. As for the scanpath, the data suggest that linguistic information is accessed first and guides the reading of the graph. Taken together, the results indicate that the cost of integration is determined by the compatibility (or incompatibility) between the propositions generated by the linguistic and visual modules.
Hasan, Saša [Verfasser]. "Triplet lexicon models for statistical machine translation / Sasa Hasan". Aachen : Hochschulbibliothek der Rheinisch-Westfälischen Technischen Hochschule Aachen, 2012. http://d-nb.info/1028004060/34.