
Dissertations / Theses on the topic 'Science of language (Linguistics)'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 dissertations / theses for your research on the topic 'Science of language (Linguistics).'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Fellers, J., and Theresa McGarry. "Language and Linguistics." Digital Commons @ East Tennessee State University, 2009. https://dc.etsu.edu/etsu-works/6151.

2

Bubalo, Kurtis J. "Bilingual Advantage Reassessed Using Hard Science Linguistics." University of Toledo / OhioLINK, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=toledo1321470740.

3

Sun, Muye. "Hard Science Linguistics and Brain-based Teaching: The implications for Second Language Teaching." University of Toledo / OhioLINK, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=toledo1333767256.

4

Farrar, Scott O. "An ontology for linguistics on the Semantic Web." Diss., The University of Arizona, 2003. http://hdl.handle.net/10150/289879.

Abstract:
The current research presents an ontology for linguistics useful for an implementation on the Semantic Web. By adhering to this model, it is shown that data of the kind routinely collected by field linguists may be represented so as to facilitate automatic analysis and semantic search. The literature concerning typological databases, knowledge engineering, and the Semantic Web is reviewed. It is argued that the time is right for the integration of these three areas of research. Linguistic knowledge is discussed in the overall context of common-sense knowledge representation. A three-layer approach to meaning is assumed, one that includes conceptual, semantic, and linguistic levels of knowledge. In particular the level of semantics is shown to be crucial for a notional account of grammatical categories such as tense, aspect, and case. The level of semantics is viewed as an encoding of common-sense reality. To develop the ontology an upper model based on the Suggested Upper Merged Ontology (SUMO) is adopted, though elements from other ontologies are utilized as well. A brief comparison of available upper models is presented. It is argued that any ontology for linguistics should provide an account of at least (1) linguistic expressions, (2) mental linguistic units, (3) linguistic categories, and (4) discrete semantic units. The concepts and relations concerning these four domains are motivated as part of the ontology. Finally, an implementation for the Semantic Web is given by discussing the various data constructs necessary for markup (interlinear text, lexicons, paradigms, grammatical descriptions). It is argued that a characterization of the data constructs should not be included in the general ontology, but should be left up to the individual data provider to implement in XML Schema. A search scenario for linguistic data is discussed.
It is shown that an ontology for linguistics provides the machinery for pure semantic search, that is, an advanced search framework whereby the user may use linguistic concepts, not just simple strings, as the search query.
5

Abuklaish, Abdelhafied. "Investigating the language needs of undergraduate science students in Libya." Thesis, University of Southampton, 2014. https://eprints.soton.ac.uk/374763/.

Abstract:
Although the English for Specific Purposes (ESP) approach is widely applied in science to many non-native speakers around the world, higher education institutions in Libya are striving to remain competitive amid on-going changes in the science field. There is an ever-increasing demand for communication in English in study and in workplaces, and some institutions have taken steps to develop newer academic programs as a means to meet students’ needs. However, few studies have been carried out to customise ESP courses to suit the Libyan scientific environment. The primary focus of this study is to explore the language needs of undergraduate science students in Libya. The Needs Analysis Framework was used to investigate the extent of English use among computer science, chemistry and physics undergraduates. For this purpose, multiple instruments were used, including questionnaires, semi-structured interviews, classroom observations and teaching materials. The questionnaires were completed by 127 science students, while the semi-structured interviews were conducted with 7 faculty members. The classroom observations were conducted with three classes, namely Computer Science, Chemistry and ESP, and teaching materials were collected from each of these subjects. The study reveals that the English language is generally needed in science settings. Moreover, it plays a significant role in computer science in particular, as most of its discourse is conducted in English. However, it plays only a limited role in the teaching of Chemistry and Physics. The study suggests that collaboration between the science disciplines and English teachers is needed in the ESP programme if such programmes are to be successful.
6

Matsuoka, Warren Eiji. "The vocabulary of L1 senior secondary science textbooks: creating word lists to inform EFL teaching of science-oriented students." Thesis, The University of Sydney, 2015. http://hdl.handle.net/2123/12937.

Abstract:
L2 studies examining the relationship between vocabulary knowledge and reading comprehension have found that many EFL students entering university lack the vocabulary knowledge to comprehend L1 academic texts even after at least six years of English language study (e.g., Hui, 2004; Joyce, 2003; Li, 2008). Science-oriented undergraduate students and other non-English major students in particular have been found to have relatively small vocabulary sizes: knowledge of only the first 1,000 to 2,000 most frequent words of English (e.g., Cobb & Horst, 2001; Hsu, 2014; Nurweni & Read, 1999). However, this difficulty in comprehending texts at the tertiary level may be due not only to the students’ poor vocabulary size but also specifically to the types of English words they had been exposed to and learned in the EFL secondary school classroom. Therefore, in order to inform EFL teaching of science-oriented, university-bound students, the present study aimed to 1) determine the vocabulary demands of L1 senior secondary (i.e., Year 11 and Year 12; for students aged 16 to 18 years) biology, chemistry and physics textbooks written to prepare students in Australia for Year 12 exams; 2) identify the most frequent, wide-range words occurring across and within the biology, chemistry and physics textbooks (also referred to as pure science textbooks in the present study) in order to create one science-specific and three subject-specific word lists; 3) evaluate the coverage of the lists over various pure science and non-pure science text types; and 4) compare the lists to existing academic and science-specific word lists made for use in TESOL. This study found, inter alia, that knowledge of the words making up the science-specific and subject-specific word lists may enable the L2 reader to obtain the minimal lexical coverage needed for assisted comprehension of pure science textbooks at the senior secondary level and, to a lesser extent, of those at the tertiary level.
7

Botha, Gerrit Reinier. "Text-based language identification for the South African languages." Pretoria : [s.n.], 2007. http://upetd.up.ac.za/thesis/available/etd-090942008-133715/.

8

Berman, Lucy. "Lewisian Properties and Natural Language Processing: Computational Linguistics from a Philosophical Perspective." Scholarship @ Claremont, 2019. https://scholarship.claremont.edu/cmc_theses/2200.

Abstract:
Nothing seems more obvious than that our words have meaning. When people speak to each other, they exchange information through the use of a particular set of words. The words they say to each other, moreover, are about something. Yet this relation of “aboutness,” known as “reference,” is not quite as simple as it appears. In this thesis I will present two opposing arguments about the nature of our words and how they relate to the things around us. First, I will present Hilary Putnam’s argument, in which he examines the indeterminacy of reference, forcing us to conclude that we must abandon metaphysical realism. While Putnam considers his argument to be a refutation of non-epistemicism, David Lewis takes it to be a reductio, claiming Putnam’s conclusion is incredible. I will present Lewis’s response to Putnam, in which he accepts the challenge of demonstrating how Putnam’s argument fails and rescuing us from the abandonment of realism. In order to explain the determinacy of reference, Lewis introduces the concept of “natural properties.” In the final chapter of this thesis, I will propose another use for Lewisian properties. Namely, that of helping to minimize the gap between natural language processing and human communication.
9

Fountain, Trevor Michael. "Modelling the acquisition of natural language categories." Thesis, University of Edinburgh, 2013. http://hdl.handle.net/1842/7875.

Abstract:
The ability to reason about categories and category membership is fundamental to human cognition, and as a result a considerable amount of research has explored the acquisition and modelling of categorical structure from a variety of perspectives. These range from feature norming studies involving adult participants (McRae et al. 2005) to long-term infant behavioural studies (Bornstein and Mash 2010) to modelling experiments involving artificial stimuli (Quinn 1987). In this thesis we focus on the task of natural language categorisation, modelling the cognitively plausible acquisition of semantic categories for nouns based on purely linguistic input. Focusing on natural language categories and linguistic input allows us to make use of the tools of distributional semantics to create high-quality representations of meaning in a fully unsupervised fashion, a property not commonly seen in traditional studies of categorisation. We explore how natural language categories can be represented using distributional models of semantics; we construct concept representations for corpora and evaluate their performance against psychological representations based on human-produced features, and show that distributional models can provide a high-quality substitute for equivalent feature representations. Having shown that corpus-based concept representations can be used to model category structure, we turn our focus to the task of modelling category acquisition and exploring how category structure evolves over time. We identify two key properties necessary for cognitive plausibility in a model of category acquisition, incrementality and non-parametricity, and construct a pair of models designed around these constraints. Both models are based on a graphical representation of semantics in which a category represents a densely connected subgraph. 
The first model identifies such subgraphs and uses these to extract a flat organisation of concepts into categories; the second uses a generative approach to identify implicit hierarchical structure and extract an hierarchical category organisation. We compare both models against existing methods of identifying category structure in corpora, and find that they outperform their counterparts on a variety of tasks. Furthermore, the incremental nature of our models allows us to predict the structure of categories during formation and thus to more accurately model category acquisition, a task to which batch-trained exemplar and prototype models are poorly suited.
10

Cooper, Adam. "Co-Teaching Science Courses for English Language Learners." University of Cincinnati / OhioLINK, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=ucin149122539833232.

11

Terra, Egidio. "Lexical Affinities and Language Applications." Thesis, University of Waterloo, 2004. http://hdl.handle.net/10012/1071.

Abstract:
Understanding interactions among words is fundamental for natural language applications. However, many statistical NLP methods still ignore this important characteristic of language. For example, information retrieval models still assume word independence. This work focuses on the creation of lexical affinity models and their applications to natural language problems. The thesis develops two approaches for computing lexical affinity. In the first, the co-occurrence frequency is calculated by point estimation. The second uses parametric models for co-occurrence distances. For the point estimation approach, we study several alternative methods for computing the degree of affinity by making use of point estimates for co-occurrence frequency. We propose two new point estimators for co-occurrence and evaluate the measures and the estimation procedures with synonym questions. In our evaluation, synonyms are checked directly by their co-occurrence and also by comparing them indirectly, using other lexical units as supporting evidence. For the parametric approach, we address the creation of lexical affinity models by using two parametric models for co-occurrence distance: an independence model and an affinity model. The independence model is based on the geometric distribution; the affinity model is based on the gamma distribution. Both fit the data by maximizing likelihood. Two measures of affinity are derived from these parametric models and applied to the synonym questions, resulting in the best absolute performance on these questions by a method not trained to the task. We also explore the use of lexical affinity in information retrieval tasks. A new method to score missing terms by using lexical affinities is proposed. In particular, we adapt two probabilistic scoring functions for information retrieval to allow all query terms to be scored. One is a document retrieval method and the other is a passage retrieval method.
Our new method, using replacement terms, shows significant improvement over the original methods.
12

Graller, Matthew. "DEVELOPMENT AND APPLICATION OF A THREE-TIERED APPROACH TO SCHIZOPHRENIC LANGUAGE: FROM NEUROPATHOLOGY TO SPEECH." Case Western Reserve University School of Graduate Studies / OhioLINK, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=case1435579787.

13

Gow, Francie. "Metrics for evaluating translation memory software." Thesis, University of Ottawa (Canada), 2003. http://hdl.handle.net/10393/26375.

Abstract:
Translation memory (TM) tools help human translators recycle portions of their previous work by storing previously translated material. In conventional TM tools, the aligned texts are divided into sentence-level source and target translation units for storage in the database. Each sentence of a new source text is compared with the units stored in the database, and the tool proposes matches that are exact or similar. This is referred to as a sentence-based approach to search and retrieval. A different and more recently developed approach involves storing full source- and target-text pairs (known as bitexts) in the database and identifying identical character strings of any length. This is referred to as a character-string-within-a-bitext (CSB)-based approach to search and retrieval. Because the second approach is more recent, traditional techniques for evaluating TM tools do not take into account this fundamental difference. Therefore, the goal of this thesis is to design and develop a new evaluation methodology that can be used to compare the two approaches to search and retrieval fairly and systematically, first by defining "usefulness" as a measurable attribute, then by measuring the usefulness of the output of each approach in an identical translation context. (Abstract shortened by UMI.)
14

Doyle, Sean. "Progressive word hypotheses reduction for very large vocabulary, continuous speech recognition." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1997. http://www.collectionscanada.ca/obj/s4/f2/dsk2/tape16/PQDD_0015/MQ37123.pdf.

15

Ling, Yong 1973. "Keyword spotting in continuous speech utterances." Thesis, McGill University, 1999. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=21595.

Abstract:
This thesis describes the construction of a word-spotting system that detects a set of pre-defined keywords in unconstrained conversational speech utterances. The development and experiments are based on the Credit Card subset of the SWITCHBOARD speech corpus. The techniques are applied in the context of a Hidden Markov Model (HMM) based Continuous Speech Recognition (CSR) approach to keyword spotting. The word-spotting system uses context-dependent acoustic triphones to model both keyword and non-keyword speech utterances. To enhance the true keyword-spotting rate, keyword-filler network topology models are defined in two different orthographic ways: individual phonemic filler models and individual syllabic filler models. To introduce more lexical constraints, a bigram language model is used; better performance is obtained in the system with more lexical constraints. A background acoustic model is placed in parallel with the system network to account for acoustic variety. The experimental results show that the overall word-spotting rate increased by 84% when more lexical constraints were applied, and that merging in the background model increased the spotting rate by a further 5.73%.
16

Israel, Ross. "Building a Korean particle error detection system from the ground up." Thesis, Indiana University, 2015. http://pqdtopen.proquest.com/#viewpdf?dispub=3672873.

Abstract:

This dissertation describes an approach to automatically detecting and correcting grammatical errors in text produced by Korean language learners. Specifically, we focus on Korean particles, which have a range of functions including case marking and indicate properties similar to English prepositions. There are two main goals for this research: to establish reliable data sources that can serve as a foundation for Korean language learning research endeavors, and to develop an accurate error detection system. The machine learning-based system is built to detect errors of particle omission and substitution, then to select the best particle to produce grammatical output. The resources and results outlined in this work should prove useful in aiding other researchers working on Korean error detection and in moving the field one step closer to robust multi-lingual methods.

17

Martinson, Anna M. "Identifying gender ideology in web content debates about feminism /." [Bloomington, Ind.] : Indiana University, 2009. http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqdiss&rft_dat=xri:pqdiss:3354915.

Abstract:
Thesis (Ph.D.)--Indiana University, School of Library and Information Science, 2009.
Title from PDF t.p. (viewed on Feb. 4, 2010). Source: Dissertation Abstracts International, Volume: 70-04, Section: A, page: 1075. Adviser: Susan C. Herring.
18

Jarmasz, Mario. ""Roget's Thesaurus" as a lexical resource for natural language processing." Thesis, University of Ottawa (Canada), 2003. http://hdl.handle.net/10393/26493.

Abstract:
This dissertation presents an implementation of an electronic lexical knowledge base that uses the 1987 Penguin edition of Roget's Thesaurus as the source for its lexical material---the first implementation of a computerized Roget's to use an entire current edition. It explains the steps necessary for taking a machine-readable file and transforming it into a tractable system. Roget's organization is studied in detail and contrasted with WordNet's. We show two applications of the computerized Thesaurus: computing semantic similarity between words and phrases, and building lexical chains in a text. The experiments are performed using well-known benchmarks and the results are compared to those of other systems that use Roget's, WordNet and statistical techniques. Roget's has turned out to be an excellent resource for measuring semantic similarity; lexical chains are easily built but more difficult to evaluate. We also explain ways in which Roget's Thesaurus and WordNet can be combined.
19

Corney, Jeffrey R. "Influence of textual hedging and framing variations on decision making choices pertaining to the climate change issue." The Ohio State University, 2001. http://rave.ohiolink.edu/etdc/view?acc_num=osu1384523204.

20

Salinas, Barrios Ivan Eduardo. "Embodied experiences for science learning| A cognitive linguistics exploration of middle school students' language in learning about water." Thesis, The University of Arizona, 2014. http://pqdtopen.proquest.com/#viewpdf?dispub=3634266.

Abstract:

I investigated linguistic patterns in middle school students' writing to understand the embodied experiences relevant to their learning of science. Embodied experiences are those limited by the perceptual and motor constraints of the human body. Recent research indicates that student understanding of science requires embodied experiences. Recent emphases by science education researchers on the practices of science suggest that systems and their structure, scale, size, representations, and causality are crosscutting concepts that unify all scientific disciplinary areas. To discern the relationship between linguistic patterns and embodied experiences, I relied on Cognitive Linguistics, a field within the cognitive sciences that pays attention to language organization and use, assuming that language reflects the human cognitive system. In particular, I investigated the embodied experiences that 268 middle school students learning about water brought to understanding: i) systems and system structure; ii) scale, size and representations; and iii) causality. Using content analysis, I explored students' language in search of patterns regarding linguistic phenomena described within cognitive linguistics: image schemas, conceptual metaphors, event schemas, semantic roles, and force dynamics. I found several common embodied experiences organizing students' understanding of crosscutting concepts. Perception of boundaries and change in location, and perception of spatial organization in the vertical axis, are relevant embodied experiences for students' understanding of systems and system structure. Direct object manipulation and perception of size with and without locomotion are relevant for understanding scale, size and representations. Direct applications of force and consequential perception of movement or change in form are relevant for understanding causality. I discuss implications of these findings for research and science teaching.

21

Doyle, Katherine Mary. "Mapping the language of science and science teaching practices : a case study of early childhood school science." Thesis, Queensland University of Technology, 2011. https://eprints.qut.edu.au/45941/1/Katherine_Doyle_Thesis.pdf.

Abstract:
Concerns raised in educational reports about school science, in terms of students' outcomes and attitudes as well as science teaching practices, prompted investigation into science learning and teaching practices at the foundational level of school science. Without science content and process knowledge, understanding issues of modern society and active participation in decision-making is difficult. This study contended that a focus on the development of the language of science could enable learners to engage more effectively in learning science and enhance their interest and attitudes towards science. Furthermore, it argued that explicit teaching practices where science language is modelled and scaffolded would facilitate the learning of science by young children at the beginning of their formal schooling. This study aimed to investigate science language development at the foundational level of school science learning in the preparatory school with students aged five and six years. It focussed on the language of science and science teaching practices in early childhood. In particular, the study focussed on the capacity for young students to engage with and understand science language. Previous research suggests that students have difficulty with the language of science, most likely because of the complexities and ambiguities of science language. Furthermore, literature indicates that tensions transpire between traditional science teaching practices and accepted early childhood teaching practices. This contention prompted investigation into means and models of pedagogy for learning foundational science language, knowledge and processes in early childhood. This study was positioned within qualitative assumptions of research and reported via descriptive case study. It was located in a preparatory-school classroom with the class teacher, teacher-aide, and nineteen students aged four and five years who participated with the researcher in the study. 
Basil Bernstein's pedagogical theory coupled with Halliday's Systemic Functional Linguistics (SFL) framed an examination of science pedagogical practices for early childhood science learning. Students' science learning outcomes were gauged by focussing a Hallidayan lens on their oral and reflective language during 12 science-focussed episodes of teaching. Data were collected throughout the 12 episodes. Data included video and audio-taped science activities, student artefacts, journal and anecdotal records, semi-structured interviews and photographs. Data were analysed according to Bernstein's visible and invisible pedagogies and performance and competence models. Additionally, Halliday's SFL provided the resource to examine teacher and student language to determine teacher/student interpersonal relationships as well as specialised science and everyday language used in teacher and student science talk. This analysis established the socio-linguistic characteristics that promoted science competencies in young children. An analysis of the data identified those teaching practices that facilitate young children's acquisition of science meanings. Positive indications for modelling science language and science text types to young children have emerged. Teaching within the studied setting diverged from perceived notions of common early childhood practices, and the benefits of dynamically shifting pedagogies were validated. Significantly, young students demonstrated use of particular specialised components of school-science language in terms of science language features and vocabulary. As well, their use of language demonstrated the students' knowledge of science concepts, processes and text types. The young students made sense of science phenomena through their incorporation of a variety of science language and text types in explanations during both teacher-directed and independent situations. 
The study informs early childhood science practices as well as practices for foundational school science teaching and learning. It has exposed implications for science education policy, curriculum and practices. It supports other findings in relation to the capabilities of young students. The study contributes to Systemic Functional Linguistic theory through the development of a specific resource to determine the technicality of teacher language used in teaching young students. Furthermore, the study contributes to methodology practices relating to Bernsteinian theoretical perspectives and has demonstrated new ways of depicting and reporting teaching practices. It provides an analytical tool which couples Bernsteinian and Hallidayan theoretical perspectives. Ultimately, it defines directions for further research in terms of foundational science language learning, ongoing learning of the language of science and learning science, science teaching and learning practices, specifically in foundational school science, and relationships between home and school science language experiences.
22

Keller, Thomas Anderson. "Comparison and Fine-Grained Analysis of Sequence Encoders for Natural Language Processing." Thesis, University of California, San Diego, 2017. http://pqdtopen.proquest.com/#viewpdf?dispub=10599339.

Abstract:

Most machine learning algorithms require a fixed-length input to be able to perform commonly desired tasks such as classification, clustering, and regression. For natural language processing, the inherently unbounded and recursive nature of the input poses a unique challenge when deriving such fixed-length representations. Although today there is a general consensus on how to generate fixed-length representations of individual words which preserve their meaning, the same cannot be said for sequences of words in sentences, paragraphs, or documents. In this work, we study the encoders commonly used to generate fixed-length representations of natural language sequences, and analyze their effectiveness across a variety of high- and low-level tasks including sentence classification and question answering. Additionally, we propose novel improvements to the existing Skip-Thought and End-to-End Memory Network architectures and study their performance on both the original and auxiliary tasks. Ultimately, we show that the setting in which the encoders are trained, and the corpus used for training, have a greater influence on the final learned representation than the underlying sequence encoders themselves.

23

Yamani, Ahmed A. S. "An intelligent question : answering system for natural language." Thesis, University of Greenwich, 1998. http://gala.gre.ac.uk/8253/.

Abstract:
As applications of information storage and retrieval systems become more widespread, there is an increased need to be able to communicate with these systems in a natural way. Natural Language applications in the 1990s, as well as in the foreseeable future, have more demanding requirements. Current Natural Language Processing approaches alone have proven to be insufficient, as they fail to achieve linguistic understanding. A more suitable approach would be to adopt Computational Linguistics theories, such as the Lexical-Functional Grammar (LFG) theory, complemented with Artificial Intelligence representation and processing techniques. A prototype Question-Answering System has been developed. It takes parsed Natural Language interrogatives and produces the Functional and Semantic structures according to the LFG representation. It compares the functional behaviour of verbs and their linguistic associations in a given query with a general Object Model in that specific domain. It will then attempt to deduce more information from the given processed text and represent it for possible queries. The structural rules of the LFG and the deduced common-sense domain-specific information resolve most of the common ambiguities found in Natural Languages and enhance the understanding ability of the proposed prototype. The LFG theory has been adopted and extended: (i) to examine the theoretical, syntactic and semantic constituents of Arabic interrogatives, an area which has not been thoroughly investigated; (ii) to represent the Functional and Semantic Structures of Arabic interrogatives; (iii) to overcome the word-order problem associated with some natural languages such as Arabic; and (iv) to add understanding capabilities by capturing common-sense domain-specific knowledge within a specific domain.
APA, Harvard, Vancouver, ISO, and other styles
24

Van, Leeuwen Theo. "Language and representation : the recontextualisation of participants, activities and reactions." Thesis, The University of Sydney, 1993. http://hdl.handle.net/2123/1615.

Full text
Abstract:
This thesis proposes a model for the description of social practice which analyses social practices into the following elements: (1) the participants of the practice; (2) the activities which constitute the practice; (3) the performance indicators which stipulate how the activities are to be performed; (4) the dress and body grooming for the participants; (5) the times when, and (6) the locations where, the activities take place; (7) the objects, tools and materials required for performing the activities; and (8) the eligibility conditions for the participants and their dress, the objects, and the locations, that is, the characteristics these elements must have to be eligible to participate in, or be used in, the social practice.
APA, Harvard, Vancouver, ISO, and other styles
25

Van, Leeuwen Theo. "Language and representation : the recontextualisation of participants, activities and reactions." University of Sydney, 1993. http://hdl.handle.net/2123/1615.

Full text
Abstract:
Doctor of Philosophy
This thesis proposes a model for the description of social practice which analyses social practices into the following elements: (1) the participants of the practice; (2) the activities which constitute the practice; (3) the performance indicators which stipulate how the activities are to be performed; (4) the dress and body grooming for the participants; (5) the times when, and (6) the locations where, the activities take place; (7) the objects, tools and materials required for performing the activities; and (8) the eligibility conditions for the participants and their dress, the objects, and the locations, that is, the characteristics these elements must have to be eligible to participate in, or be used in, the social practice.
APA, Harvard, Vancouver, ISO, and other styles
26

Schäfer, Ulrich. "Integrating deep and shallow natural language processing components : representations and hybrid architectures /." Saarbrücken : German Reseach Center for Artificial Intelligence : Saarland University, Dept. of Computational Linguistics and Phonetics, 2007. http://www.loc.gov/catdir/toc/fy1001/2008384333.html.

Full text
APA, Harvard, Vancouver, ISO, and other styles
27

Taing, Austin. "Application of Boolean Logic to Natural Language Complexity in Political Discourse." UKnowledge, 2019. https://uknowledge.uky.edu/cs_etds/77.

Full text
Abstract:
Press releases serve as a major influence on public opinion of a politician, since they are a primary means of communicating with the public and directing discussion. Thus, the public’s ability to digest them is an important factor for politicians to consider. This study employs several well-studied measures of linguistic complexity and proposes a new one to examine whether politicians change their language to become more or less difficult to parse in different situations. This study uses 27,500 press releases from the US Senate from 2004 to 2008 and examines election cycles and natural disasters, namely hurricanes, as situations where politicians’ language may change. We calculate the syntactic complexity measures clauses per sentence, T-unit length, and complex-T ratio, as well as the Automated Readability Index and Flesch Reading Ease of each press release. We also propose a proof-of-concept measure called logical complexity to find whether classical Boolean logic can be applied as a practical linguistic complexity measure. We find that language becomes more complex in coastal senators’ press releases concerning hurricanes, but see no significant change during election cycles. Our measure shows similar results to the well-established ones, showing that logical complexity is a useful lens for measuring linguistic complexity.
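The readability formulas named in this abstract are standard closed-form measures; as a rough illustration (not the author's actual pipeline), a minimal sketch of the Automated Readability Index, with a deliberately naive sentence splitter:

```python
import re

def automated_readability_index(text: str) -> float:
    """ARI = 4.71*(chars/words) + 0.5*(words/sentences) - 21.43.
    Characters count letters and digits only; sentences are split
    naively on terminal punctuation (a simplification)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = text.split()
    chars = sum(1 for ch in text if ch.isalnum())
    return (4.71 * (chars / len(words))
            + 0.5 * (len(words) / len(sentences))
            - 21.43)

sample = "The senator issued a statement. It addressed the hurricane response."
score = automated_readability_index(sample)
```

Flesch Reading Ease follows the same pattern but additionally requires a syllable counter, which is why ARI is the simpler formula to sketch.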
APA, Harvard, Vancouver, ISO, and other styles
28

Gehr, Susan. "Breath of life: Revitalizing California's Native languages through archives." Thesis, San Jose State University, 2014. http://pqdtopen.proquest.com/#viewpdf?dispub=1552255.

Full text
Abstract:

This thesis presents an oral history of the Advocates for Indigenous California Language Survival (AICLS) and its Breath of Life Workshop. Held every other year since 1996, the workshop is designed to meet the language revitalization needs of California Indian people whose languages have no living fluent speakers. Breath of Life Workshop organizers arrange visits to four archives on the University of California, Berkeley, campus and connect participants with linguistic mentors to read and interpret archival documents in their language for the purpose of bringing their language back into use.

Through interviews with AICLS founders, Breath of Life Workshop participants, and University of California, Berkeley, linguists and archivists, this study uncovers the role archivists play in the Breath of Life Workshops and in the care of Native language collections more generally. Topics addressed include the selection and use of archival documents in the program and the changes to archival practice and policies that have resulted from archivists’ work with Breath of Life participants. The thesis also examines issues involved in the collection, arrangement, description, preservation, and access to the documentation of California Indian languages. The study concludes with recommendations for future language revitalization programs.

APA, Harvard, Vancouver, ISO, and other styles
29

Nagao, Kyoko. "Cross-language study of age perception." [Bloomington, Ind.] : Indiana University, 2006. http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqdiss&rft_dat=xri:pqdiss:3232572.

Full text
Abstract:
Thesis (Ph.D.)--Indiana University, Dept. of Linguistics and the Dept. of Speech and Hearing Sciences, 2006.
"Title from dissertation home page (viewed July 10, 2007)." Source: Dissertation Abstracts International, Volume: 67-08, Section: A, page: 2962. Advisers: Kenneth de Jong; Diane Kewley-Port.
APA, Harvard, Vancouver, ISO, and other styles
30

Kelly-Lopez, Catherine Ann. "The Reality of This and That." See Full Text at OhioLINK ETD Center (Requires Adobe Acrobat Reader for viewing), 2005. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=toledo1113844791.

Full text
APA, Harvard, Vancouver, ISO, and other styles
31

Heider, Paul M. "The Semantics of Optionality." Thesis, State University of New York at Buffalo, 2015. http://pqdtopen.proquest.com/#viewpdf?dispub=3683040.

Full text
Abstract:

For every participant role filler in an utterance, speakers must choose to leave it bare (e.g., "the interviewer") or to modify it (e.g., "the interviewer on Fresh Air"). Their decision is the end result of a combination of complex factors ranging from the original message to how distracted the speaker is. When we use corpora to create language models, part of our job is understanding the observable properties in and around an event description that allow us to predict these decisions. A considerable body of work on language production and discourse pragmatics concentrates on measuring noun phrase predictability and other forms of shared knowledge that help determine the balance point between over- and under-specification of a participant role filler. Although the importance of predictability as measured by long-term probabilities has long been recognized, I present a novel quantitative analysis of participant role filler predictability, the structure of the mental lexicon, and how the interaction of these two informs a speaker's internal perception of informativity. Standard Gricean assumptions tend to be efficiency-oriented: speakers will be informative enough but not wastefully so. Using these assumptions to model corpus distributions predicts that noun phrase modification rates are directly proportional to predictability, in order to satisfy the speaker's obligation to always be informative. In contrast, standard Firthian models (built around the idea that "you know a word by the company it keeps") assume spreading activation—and not efficiency—is the dominant predictor of usage. Sensitivity to activation's effect predicts that noun phrase modification rates are inversely proportional to predictability. Strongly connected participant role fillers could be easily activated for production, while weakly connected participant role fillers would either be mentioned less often or themselves trigger strongly connected features (not normally associated with the head verb) to be primed for production.

To distinguish between these competing assumptions, I analyze participant role filler modification rates in event descriptions with respect to three indicators: the syntactic and semantic optionality of the role filler, the general predictability of the verb's role fillers, and the predictability of individual pairs of verb/participant role fillers. First, I use insights from linguistic theory to classify verbs and their participant roles into classes of syntactic optionality and semantic optionality. Second, I quantify over a large corpus the general predictability of a verb's participant roles and the specific predictability of each pair of verb/participant role filler. Finally, I model the relationship between the three indicators and modification in order to ascertain whether speakers have a stronger tendency to modify the more predictable participant role fillers, as Grice's Maxim of Relevancy predicts, or a tendency to modify the less predictable participant role fillers, as a Firthian activation-based model predicts.

I present descriptive statistical models to chart the relationship between predictability, syntactic optionality of a participant role, and semantic optionality of a participant role. In general, verb classes with stronger mental lexicon connections to their participant role fillers according to theory also have more predictable participant role fillers in the British National Corpus. Specifically, syntactically optional direct object verbs and semantically obligatory instrument verbs have more predictable participant role fillers than the opposite, comparable verb class. I also present several linear mixed-effect models to determine how predictive of modification the independent variables of syntactic verb class, semantic verb class, and verb/participant role filler predictability are. According to these models, speakers are significantly more likely to modify the less predicted participant role fillers even when taking into account individual verb and verb class differences. I conclude that mental lexicon accessibility modulates noun phrase realization according to a Firthian activation-based model. For each factor, I discuss possible explanations for the correlations between modification, predictability, and optionality and how these correlations make sense within a larger production model.

APA, Harvard, Vancouver, ISO, and other styles
32

Swamy, Sandesh. "Forecasting event outcomes from user predictions on Twitter." The Ohio State University, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=osu1492692142585459.

Full text
APA, Harvard, Vancouver, ISO, and other styles
33

Shayan, Shakila. "Emergence of roles in English canonical transitive construction." [Bloomington, Ind.] : Indiana University, 2008. http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqdiss&rft_dat=xri:pqdiss:3324519.

Full text
Abstract:
Thesis (Ph.D.)--Indiana University, Dept. of Computer Science and the Dept. of Cognitive Science, 2008.
Title from PDF t.p. (viewed on May 13, 2009). Source: Dissertation Abstracts International, Volume: 69-08, Section: B, page: 5071. Advisers: Mike Gasser; Lisa Gershkoff-Stowe.
APA, Harvard, Vancouver, ISO, and other styles
34

Sherry, John William 1961. "Conversational analysis of microcomputer software: The role of customer support." Thesis, The University of Arizona, 1990. http://hdl.handle.net/10150/291327.

Full text
Abstract:
User-friendliness is a common goal of microcomputer software design, yet little attention has been paid to the importance of many conversation-like features of user interface. Computers are incapable of accessing the vast amount of contextual information that humans routinely employ in conversation. Through other means, microcomputers imitate features of conversation, often establishing in users false expectations of communicative competence. Such means usually fail to meet what Goffman (1976) has characterized as the "systemic" and "ritual" constraints of interaction. The increasing ubiquity of microcomputers in our society has been accompanied by a number of attempts to facilitate better human-computer interaction. Customer support provides one type of solution. Support personnel go beyond simply providing technical information to end users. They must additionally act as interactional "surrogates" for software, attending to communicative functions of which software is incapable or neglectful. Additionally, evidence suggests that this type of situation may intensify in the future.
APA, Harvard, Vancouver, ISO, and other styles
35

Nilsson, Fredrik. "A comparative analysis of word use in popular science and research articles in the natural sciences: A corpus linguistic investigation." Thesis, Mälardalens högskola, Akademin för utbildning, kultur och kommunikation, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-44626.

Full text
Abstract:
Within the realm of the natural sciences there are different written genres for interested readers to explore. Popular science articles aim to explain advanced scientific research to a non-expert audience, while research articles target the science experts themselves. This study explores these genres in some detail in order to identify linguistic differences between them. Using two corpora consisting of over 200,000 words each, a corpus linguistic analysis was used to perform both quantitative and qualitative examinations of the two genres. The methods of analysis included word frequency, keyword, concordance, cluster and collocation analyses. Also, part-of-speech tagging was used as a complement to distinguish word class use between the two genres. The results show that popular science articles feature personal pronouns to a much greater extent compared to research articles, which contain more noun repetition and specific terminology overall. In addition, the keywords proved to be significant for the respective genres, both in and out of their original context as well as in word clusters, forming word constructions typical of each genre. Overall, the study showed that while both genres are very much related through their roots in natural science research, they accomplish the task of disseminating scientific information using different linguistic approaches.
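The keyword analysis this abstract mentions typically compares a word's frequency in a target corpus against a reference corpus. A minimal sketch using Dunning's log-likelihood (G2) statistic, a common choice in corpus linguistics, assumed here rather than taken from the thesis; the counts are invented for illustration:

```python
import math

def log_likelihood(freq_target: int, size_target: int,
                   freq_ref: int, size_ref: int) -> float:
    """Dunning's G2 keyword statistic: how unexpectedly often a word
    occurs in the target corpus relative to the reference corpus."""
    total = freq_target + freq_ref
    expected_t = size_target * total / (size_target + size_ref)
    expected_r = size_ref * total / (size_target + size_ref)
    g2 = 0.0
    if freq_target:
        g2 += freq_target * math.log(freq_target / expected_t)
    if freq_ref:
        g2 += freq_ref * math.log(freq_ref / expected_r)
    return 2 * g2

# hypothetical: "we" occurring 120 times per 10,000 words in popular
# science vs. 15 per 10,000 in research articles
score = log_likelihood(120, 10_000, 15, 10_000)
```

A higher G2 marks a stronger keyword for the target genre; values above ~3.84 are conventionally taken as significant at p < 0.05.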
APA, Harvard, Vancouver, ISO, and other styles
36

Al-Khonaizi, Mohammed Taqi. "Natural Arabic language text understanding." Thesis, University of Greenwich, 1999. http://gala.gre.ac.uk/6096/.

Full text
Abstract:
The most challenging part of natural language understanding is the representation of meaning. Current representation techniques are not sufficient to resolve ambiguities, especially when the meaning is to be used for interrogation at a later stage. The Arabic language represents a challenging field for Natural Language Processing (NLP) because of its rich eloquence and free word order, but at the same time it is a good platform for capturing understanding because of its rich computational, morphological and grammatical rules. Among different representation techniques, Lexical Functional Grammar (LFG) theory is found to be best suited for this task because of its structural approach. LFG lays down a computational approach towards NLP, especially the constituent and the functional structures, and models the completeness of relationships among the contents of each structure internally, as well as among the structures externally. The introduction of Artificial Intelligence (AI) techniques, such as knowledge representation and inferencing, enhances the capture of meaning by utilising domain-specific common-sense knowledge embedded in the model of the domain of discourse and the linguistic rules that have been captured from Arabic grammar. This work has achieved the following results: (i) It is the first attempt to apply the LFG formalism to a full Arabic declarative text consisting of more than one paragraph. (ii) It extends the semantic structure of the LFG theory by incorporating a representation based on thematic-role frames theory. (iii) It extends the LFG theory to represent domain-specific common-sense knowledge. (iv) It automates the production process of the functional and semantic structures. (v) It automates the production process of the domain-specific common-sense knowledge structure, which enhances the understanding ability of the system and resolves most ambiguities in subsequent question-answer sessions.
APA, Harvard, Vancouver, ISO, and other styles
37

Lee, Kelvin Kien-Hoanh. "Language and Character Identity: A Study of First-Person Pronouns in a Corpus of Science Fiction Anime Dialogue." Thesis, The University of Sydney, 2021. https://hdl.handle.net/2123/28687.

Full text
Abstract:
Like a number of other Asian languages, Japanese is a language with a large number of pronouns. These do not vary according to the grammatical role of the referent/s (e.g. subject, object, etc.) but rather index social meanings. This thesis investigates how the indexical meanings commonly associated with Japanese first-person pronouns (1PPs) are recontextualised in the dialogue of anime television series to construct characters and convey aspects of their identity. This is a mixed-methods study which combines statistical analysis and quantitative frequency-based analyses of word lists and collocation with qualitative analyses of concordances and scenes to examine a newly constructed corpus, the corpus of Science-Fiction Anime dialogue (SciFAn corpus), which comprises the dialogue from five science-fiction anime television series. Combining corpus linguistic methodologies with a sociolinguistic approach, this study draws primarily on the sociolinguistic concept of indexicality in the discussion of the target forms (i.e. 1PPs) and their relationship to characterisation. Previous studies of Japanese 1PPs, particularly those examining anime dialogue, mainly discuss their usage in relation to the construction of gender identity. The link between 1PP use and gender is examined using descriptive statistics and statistical tests. The results show that gender is a significant factor for the use of some 1PPs but not others. As a follow-up, this study shows that the use of 1PPs can convey other aspects of a character’s identity in addition to gender, such as their personality traits and presentational personae. Additionally, examinations of shifts or switches between different 1PPs by an individual character show that different 1PPs are used in the data examined to convey the fluidity of character identity as well as their complexity as characters.
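A statistical test of the kind this abstract describes can be illustrated with a 2x2 chi-square test of independence between pronoun use and character gender. This is a generic sketch, not the thesis's analysis, and the counts below are invented for illustration:

```python
def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic for a 2x2 contingency table
    [[a, b], [c, d]], e.g. tokens of one 1PP vs. all other 1PP
    tokens, by male vs. female characters."""
    n = a + b + c + d
    observed = [a, b, c, d]
    expected = [
        (a + b) * (a + c) / n, (a + b) * (b + d) / n,
        (c + d) * (a + c) / n, (c + d) * (b + d) / n,
    ]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# hypothetical counts: uses of "ore" vs. other 1PPs,
# by male vs. female characters
stat = chi_square_2x2(180, 420, 12, 388)
```

With one degree of freedom, a statistic above 3.84 indicates a significant association at p < 0.05, i.e. the pronoun's use depends on gender.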
APA, Harvard, Vancouver, ISO, and other styles
38

Mahamood, Saad Ali. "Generating affective natural language for parents of neonatal infants." Thesis, University of Aberdeen, 2010. http://digitool.abdn.ac.uk:80/webclient/DeliveryManager?pid=158569.

Full text
Abstract:
The thesis presented here describes original research in the field of Natural Language Generation (NLG). NLG is the subfield of artificial intelligence that is concerned with the automatic production of documents from underlying data. This thesis in particular focuses on developing new and novel methods for generating text that take into consideration the recipient’s level of stress as a factor in adapting the resultant textual output. This consideration was particularly salient given the domain in which this research was conducted: providing information for parents of pre-term infants during neonatal intensive care (NICU), a highly technical and stressful environment for parents, where emotional sensitivity must be shown in the nature of the information presented. We have investigated the emotional and informational needs of these parents through an extensive literature review and two separate research studies with former and current NICU parents. The NLG system built for this research is called BabyTalk Family (BT-Family), a system that can produce for parents a textual summary of the medical events that have occurred for a baby in NICU in the last twenty-four hours. The novelty of this system is that it is capable of estimating the level of stress of the recipient and, by using several affective NLG strategies, is able to tailor its output for a stressed audience, unlike traditional NLG systems, whose output would remain unchanged regardless of the emotional state of the recipient. The key innovation in this system is the integration of several affective strategies in the Document Planner for tailoring textual output for stressed recipients. BT-Family’s output was evaluated with thirteen parents who had previously had a baby in neonatal care. We developed a methodology for an evaluation that involved a direct comparison between stressed and unstressed text for the same given medical scenario for variables such as preference, understandability, helpfulness, and emotional appropriateness. The results obtained showed that the parents overwhelmingly preferred the stressed text for all of the variables measured.
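The idea of tailoring a realisation to the recipient's stress level can be sketched as a rule that selects among alternative phrasings. This is a toy illustration with invented rules and wording, far simpler than BT-Family's actual document-planning strategies:

```python
def describe_event(event: str, severity: str, stressed: bool) -> str:
    """Choose a softened or plain realisation of a medical event
    depending on the recipient's estimated stress level (toy rules;
    the real system's affective strategies are far richer)."""
    if stressed and severity == "serious":
        # hedged, reassurance-first phrasing for a stressed reader
        return f"The team is watching {event} closely and is ready to respond."
    if severity == "serious":
        return f"{event} occurred and required immediate intervention."
    return f"{event} occurred; this is common and not a cause for concern."

msg = describe_event("a brief drop in oxygen levels", "serious", stressed=True)
```

The same underlying event data thus yields different surface text, which is the contrast the evaluation with parents compared.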
APA, Harvard, Vancouver, ISO, and other styles
39

Cabral, Hayashida Sandra Raquel de Almeida 1963. "Periódicos científicos = a produção e a circulação da ciência da linguagem no Brasil = Scientific journals: production and circulation of the science of language in Brazil." [s.n.], 2012. http://repositorio.unicamp.br/jspui/handle/REPOSIP/270530.

Full text
Abstract:
Orientador: Claudia Regina Castellanos Pfeiffer
Tese (doutorado) - Universidade Estadual de Campinas, Instituto de Estudos da Linguagem
Resumo: Essa tese, inscrita no domínio da História das Ideias Linguísticas em uma articulação com a Análise de Discurso, apresenta um estudo sobre a produção e a circulação da Ciência da Linguagem no Brasil no espaço dos periódicos científicos. Inicialmente foi constituída uma listagem contendo um levantamento de periódicos científicos, instituições, editores, sumários e acontecimentos relacionados ao domínio do saber linguístico, entre os séculos XIX e XXI. A partir dessa relação foi possível propor uma periodização para as revistas científicas de linguagem dividindo-as em quatro partes: o primeiro período inicia-se em 1808 com a liberação da imprensa no Brasil, quando ainda não havia periódicos específicos de linguagem; o segundo período inicia-se com o surgimento das revistas de linguagem em 1910 com ênfase para os estudos filológicos; o terceiro período inicia-se na década de 1960 com o surgimento das primeiras revistas de Linguística; e o quarto período surge com o fortalecimento da pós-graduação, na década de 1990, que vai desencadear o aumento expressivo de revistas especializadas da área. Essa reflexão sobre as revistas mostra que elas surgem inicialmente ligadas a nome de pessoas, depois passam a ser editadas por academias, editoras, cursos de Letras, universidades, centros, associações e, hoje, a grande maioria está ligada a programas de pós-graduação e grupos de pesquisa. Como a política científica está necessariamente implicada na produção do conhecimento, foram analisados alguns programas desenvolvidos por órgãos de fomento para circulação do conhecimento, procurando pensar o lugar da circulação da Linguística na política científica. Concebendo os periódicos, assim como os congressos (Orlandi, 2002) e as associações (Pfeiffer, 2007) como lugar de representação da Linguística, foi possível perceber que até a década de 1950 a Linguística se mostra em defesa de um idioma nacional, construindo uma normatização para a Língua Portuguesa. 
Os professores, os estudiosos da língua estão preocupados com a formação do cidadão, com isso pôde-se ver uma linguística comprometida em ensinar à sociedade a gramática dessa língua, e isso representa na época a "arte" de falar e escrever corretamente. Pode-se dizer que a Linguística, introduzida no Brasil por Mattoso Câmara na década de 1940, começa a ganhar força e prestígio perante os estudos gramaticais a partir de 1960. Alguns acontecimentos colaboraram para o desenvolvimento da Linguística como a aprovação da NGB e a inclusão da Linguística nos cursos de Letras. Levando-se em conta ainda a aprovação da LDB, pode-se ver nos periódicos científicos uma linguística preocupada em formar um professor capaz de refletir sobre a língua. Com o surgimento e fortalecimento de associações e cursos de pós-graduação em Linguística percebe-se, dentre outras coisas, um deslizamento da Linguística para outros domínios, constituindo nessas articulações novos métodos, teorias e objetos de estudo que propõem à Linguística diversos desdobramentos e subdivisões, que disputam por um lugar autorizado/científico para dizer
Abstract: This thesis, inscribed in the field of the History of Linguistic Ideas in articulation with Discourse Analysis, presents a study of the production and circulation of the Science of Language in Brazil in the space of scientific journals. Initially, a research file was established containing scientific journals, institutions, publishing houses, abstracts, and events related to the domain of linguistic knowledge between the XIX and XXI centuries. From this file it was possible to propose a timeline for scientific journals of language, dividing it into four parts: the first period begins in 1808 with the liberation of the press in Brazil, when there were still no specific language journals; the second period begins with the emergence of language journals in 1910, with emphasis on philological studies; the third period begins in the 1960s with the emergence of the first Linguistics journals; and the fourth period appears with the strengthening of postgraduate education in the 1990s, which triggered a significant increase in specialized journals. This reflection on the journals shows that they arise initially linked to personal names, then go on to be edited by academies, publishing houses, Letras courses, universities, associations, and today the vast majority are linked to postgraduate programs and research groups. As science policy is necessarily implicated in the production of knowledge, some programs developed by funding agencies for the circulation of knowledge were analyzed, in order to think about the place of the circulation of Linguistics in science policy. Conceiving of journals, as well as congresses (Orlandi, 2002) and associations (Pfeiffer, 2007), as places of representation of Linguistics, it was revealed that until the 1950s Linguistics presented itself in defense of a national language, building a normatization for the Portuguese language. 
Teachers and students of language were concerned with the formation of the citizen; thus we can see a linguistics committed to teaching society the grammar of that language, which at the time represented the "art" of speaking and writing correctly. Linguistics, introduced in Brazil by Mattoso Câmara in the 1940s, began to gain strength and prestige over grammatical studies from 1960 onwards. Some events contributed to the development of Linguistics, such as the approval of the NGB and the inclusion of Linguistics in Letras courses. Taking into account also the approval of the LDB, one can see in the scientific journals a linguistics concerned with educating teachers capable of reflecting on the language. With the emergence and strengthening of associations and postgraduate courses in Linguistics one can see, among other things, a sliding of Linguistics into other fields, constituting through these articulations new methods, theories and objects of study which offer Linguistics several developments and subdivisions, vying for an authorized/scientific place from which to speak.
Doctorate
Linguistics
Doctor of Linguistics
APA, Harvard, Vancouver, ISO, and other styles
40

Pon-Barry, Heather Roberta. "Inferring Speaker Affect in Spoken Natural Language Communication." Thesis, Harvard University, 2012. http://dissertations.umi.com/gsas.harvard:10710.

Full text
Abstract:
The field of spoken language processing is concerned with creating computer programs that can understand human speech and produce human-like speech. Regarding the problem of understanding human speech, there is currently growing interest in moving beyond speech recognition (the task of transcribing the words in an audio stream) and towards machine listening—interpreting the full spectrum of information in an audio stream. One part of machine listening, the problem that this thesis focuses on, is the task of using information in the speech signal to infer a person’s emotional or mental state. In this dissertation, our approach is to assess the utility of prosody, or manner of speaking, in classifying speaker affect. Prosody refers to the acoustic features of natural speech: rhythm, stress, intonation, and energy. Affect refers to a person’s emotions and attitudes such as happiness, frustration, or uncertainty. We focus on one specific dimension of affect: level of certainty. Our goal is to automatically infer whether a person is confident or uncertain based on the prosody of his or her speech. Potential applications include conversational dialogue systems (e.g., in educational technology) and voice search (e.g., smartphone personal assistants). There are three main contributions of this thesis. The first contribution is a method for eliciting uncertain speech that binds a speaker’s uncertainty to a single phrase within the larger utterance, allowing us to compare the utility of contextually-based prosodic features. Second, we devise a technique for computing prosodic features from utterance segments that both improves uncertainty classification and can be used to determine which phrase a speaker is uncertain about. The level of certainty classifier achieves an accuracy of 75%. 
Third, we examine the differences between perceived, self-reported, and internal level of certainty, concluding that perceived certainty is aligned with internal certainty for some but not all speakers and that self-reports are a good proxy for internal certainty.
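Prosodic features of the kind this thesis classifies over (energy, rhythm) can be approximated directly from a waveform. A toy, pure-Python sketch on a synthetic frame; real systems use dedicated pitch trackers and richer feature sets:

```python
import math

def rms_energy(samples):
    """Root-mean-square energy of a frame of audio samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs that change sign; a crude
    correlate of noisiness/pitch, used here only as an illustration."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    return crossings / (len(samples) - 1)

# synthetic 100 Hz sine frame, 0.1 s sampled at 8 kHz
frame = [math.sin(2 * math.pi * 100 * n / 8000) for n in range(800)]
energy = rms_energy(frame)
zcr = zero_crossing_rate(frame)
```

Frame-level statistics like these, pooled over an utterance (or over the segment a speaker is uncertain about), form the feature vectors fed to a classifier.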
Engineering and Applied Sciences
APA, Harvard, Vancouver, ISO, and other styles
41

Ziegler, Nathan E. "Task Based Assessment: Evaluating Communication in the Real World." University of Toledo / OhioLINK, 2007. http://rave.ohiolink.edu/etdc/view?acc_num=toledo1192757581.

Full text
APA, Harvard, Vancouver, ISO, and other styles
42

Kočiský, Tomáš. "Deep learning for reading and understanding language." Thesis, University of Oxford, 2017. http://ora.ox.ac.uk/objects/uuid:cc45e366-cdd8-495b-af42-dfd726700ff0.

Full text
Abstract:
This thesis presents novel tasks and deep learning methods for machine reading comprehension and question answering, with the goal of achieving natural language understanding. First, we consider a semantic parsing task where the model understands sentences and translates them into a logical form or instructions. We present a novel semi-supervised sequential autoencoder that considers language as a discrete sequential latent variable and semantic parses as the observations. This model allows us to leverage synthetically generated unpaired logical forms, and thereby alleviate the lack of supervised training data. We show the semi-supervised model outperforms a supervised model when trained with the additional generated data. Second, reading comprehension requires integrating information and reasoning about events, entities, and their relations across a full document. Question answering is conventionally used to assess reading comprehension ability, in both artificial agents and children learning to read. We propose a new, challenging, supervised reading comprehension task. We gather a large-scale dataset of news stories from the CNN and Daily Mail websites, with Cloze-style questions created from the highlights. This dataset allows, for the first time, the training of deep learning models for reading comprehension. We also introduce novel attention-based models for this task and present a qualitative analysis of the attention mechanism. Finally, following the recent advances in reading comprehension in both models and task design, we further propose a new task for understanding complex narratives, NarrativeQA, consisting of full texts of books and movie scripts. We collect human-written questions and answers based on high-level plot summaries.
This task is designed to encourage development of models for language understanding; it is designed so that successfully answering their questions requires understanding the underlying narrative rather than relying on shallow pattern matching or salience. We show that although humans solve the tasks easily, standard reading comprehension models struggle on the tasks presented here.
APA, Harvard, Vancouver, ISO, and other styles
43

Nefdt, Ryan Mark. "The foundations of linguistics : mathematics, models, and structures." Thesis, University of St Andrews, 2016. http://hdl.handle.net/10023/9584.

Full text
Abstract:
The philosophy of linguistics is a rich philosophical domain which encompasses various disciplines. One of the aims of this thesis is to unite theoretical linguistics, the philosophy of language, the philosophy of science (particularly mathematics and modelling) and the ontology of language. Each part of the research presented here targets separate but related goals with the unified aim of bringing greater clarity to the foundations of linguistics from a philosophical perspective. Part I is devoted to the methodology of linguistics in terms of scientific modelling. I argue against both the Conceptualist and Platonist (as well as Pluralist) interpretations of linguistic theory by means of three grades of mathematical involvement for linguistic grammars. Part II explores the specific models of syntax and semantics by an analogy with the harder sciences. In Part III, I develop a novel account of linguistic ontology and in the process comment on the type-token distinction, the role of and connection with mathematics, and the nature of linguistic objects. In this research, I offer a structural realist interpretation of linguistic methodology with a nuanced structuralist picture for its ontology. This proposal is informed by historical and current work in theoretical linguistics as well as philosophical views on ontology, scientific modelling and mathematics.
APA, Harvard, Vancouver, ISO, and other styles
44

Ringland, Nicola. "Structured Named Entities." Thesis, The University of Sydney, 2015. http://hdl.handle.net/2123/14558.

Full text
Abstract:
The names of people, locations, and organisations play a central role in language, and named entity recognition (NER) has been widely studied, and successfully incorporated, into natural language processing (NLP) applications. The most common variant of NER involves identifying and classifying proper noun mentions of these and miscellaneous entities as linear spans in text. Unfortunately, this version of NER is no closer to a detailed treatment of named entities than chunking is to a full syntactic analysis. NER, so construed, reflects neither the syntactic nor semantic structure of NE mentions, and provides insufficient categorical distinctions to represent that structure. Representing this nested structure, where a mention may contain mention(s) of other entities, is critical for applications such as coreference resolution. The lack of this structure creates spurious ambiguity in the linear approximation. Research in NER has been shaped by the size and detail of the available annotated corpora. The existing structured named entity corpora are either small, in specialist domains, or in languages other than English. This thesis presents our Nested Named Entity (NNE) corpus of named entities and numerical and temporal expressions, taken from the WSJ portion of the Penn Treebank (PTB, Marcus et al., 1993). We use the BBN Pronoun Coreference and Entity Type Corpus (Weischedel and Brunstein, 2005a) as our basis, manually annotating it with a principled, fine-grained, nested annotation scheme and detailed annotation guidelines. The corpus comprises over 279,000 entities over 49,211 sentences (1,173,000 words), including 118,495 top-level entities. Our annotations were designed using twelve high-level principles that guided the development of the annotation scheme and difficult decisions for annotators. We also monitored the semantic grammar that was being induced during annotation, seeking to identify and reinforce common patterns to maintain consistent, parsimonious annotations. The result is a scheme of 118 hierarchical fine-grained entity types and nesting rules, covering all capitalised mentions of entities, and numerical and temporal expressions. Unlike many corpora, we have developed detailed guidelines, including extensive discussion of the edge cases, in an ongoing dialogue with our annotators, which is critical for consistency and reproducibility. We annotated independently from the PTB bracketing, allowing annotators to choose spans which were inconsistent with the PTB conventions and errors, and only refer back to it to resolve genuine ambiguity consistently. We merged our NNE with the PTB, requiring some systematic and one-off changes to both annotations. This allows the NNE corpus to complement other PTB resources, such as PropBank, and inform PTB-derived corpora for other formalisms, such as CCG and HPSG. We compare this corpus against BBN. We consider several approaches to integrating the PTB and NNE annotations, which affect the sparsity of grammar rules and visibility of syntactic and NE structure. We explore their impact on parsing the NNE and merged variants using the Berkeley parser (Petrov et al., 2006), which performs surprisingly well without specialised NER features. We experiment with flattening the NNE annotations into linear NER variants with stacked categories, and explore the ability of a maximum entropy and a CRF NER system to reproduce them. The CRF performs substantially better, but is infeasible to train on the enormous stacked category sets. The flattened output of the Berkeley parser is almost competitive with the CRF. Our results demonstrate that the NNE corpus is feasible for statistical models to reproduce. We invite researchers to explore new, richer models of (joint) parsing and NER on this complex and challenging task. Our nested named entity corpus will improve a wide range of NLP tasks, such as coreference resolution and question answering, allowing automated systems to understand and exploit the true structure of named entities.
APA, Harvard, Vancouver, ISO, and other styles
45

Pham, Son Bao, Computer Science & Engineering, Faculty of Engineering, UNSW. "Incremental knowledge acquisition for natural language processing." Awarded by: University of New South Wales. School of Computer Science and Engineering, 2006. http://handle.unsw.edu.au/1959.4/26299.

Full text
Abstract:
Linguistic patterns have been used widely in shallow methods to develop numerous NLP applications. Approaches for acquiring linguistic patterns can be broadly categorised into three groups: supervised learning, unsupervised learning and manual methods. In supervised learning approaches, a large annotated training corpus is required for the learning algorithms to achieve decent results. However, annotated corpora are expensive to obtain and usually available only for established tasks. Unsupervised learning approaches usually start with a few seed examples and gather some statistics based on a large unannotated corpus to detect new examples that are similar to the seed ones. Most of these approaches either populate lexicons for predefined patterns or learn new patterns for extracting general factual information; hence they are applicable to only a limited number of tasks. Manually creating linguistic patterns has the advantage of utilising an expert's knowledge to overcome the scarcity of annotated data. In tasks with no annotated data available, the manual way seems to be the only choice. One typical problem that occurs with manual approaches is that the combination of multiple patterns, possibly being used at different stages of processing, often causes unintended side effects. Existing approaches, however, do not focus on the practical problem of acquiring those patterns but rather on how to use linguistic patterns for processing text. A systematic way to support the process of manually acquiring linguistic patterns in an efficient manner is long overdue. This thesis presents KAFTIE, an incremental knowledge acquisition framework that strongly supports experts in creating linguistic patterns manually for various NLP tasks. KAFTIE addresses difficulties in manually constructing knowledge bases of linguistic patterns, or rules in general, often faced in existing approaches by: (1) offering a systematic way to create new patterns while ensuring they are consistent; (2) alleviating the difficulty in choosing the right level of generality when creating a new pattern; (3) suggesting how existing patterns can be modified to improve the knowledge base's performance; (4) making the effort in creating a new pattern, or modifying an existing pattern, independent of the knowledge base's size. KAFTIE, therefore, makes it possible for experts to efficiently build large knowledge bases for complex tasks. This thesis also presents the KAFDIS framework for discourse processing using new representation formalisms: the level-of-detail tree and the discourse structure graph.
APA, Harvard, Vancouver, ISO, and other styles
46

Zhao, Yifan. "Language Learning through Dialogs: Mental Imagery and Parallel Sensory Input in Second Language Learning." University of Toledo / OhioLINK, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=toledo1396634043.

Full text
APA, Harvard, Vancouver, ISO, and other styles
47

Casaregola, Laura. "How Our Music Tastes Relate to Language Attitudes with Standard and Non-standard Varieties of English." Scholarship @ Claremont, 2017. http://scholarship.claremont.edu/scripps_theses/1044.

Full text
Abstract:
Sociolinguistic studies on language perception have shown that listeners form different attitudes toward speakers based on the speakers' language varieties (Lukes and Wiley 1996, Lippi-Green 2012, Thompson, Craig, and Washington 2004). Just from hearing a voice, listeners form opinions, and these opinions are often informed by societal archetypes, as well as societal stereotypes. For example, Standard American English is generally perceived with more prestige and respect than non-standard varieties. Unfavorable perceptions of non-standard varieties can, and in many documented cases do, lead to inequitable and/or discriminatory situations (Baugh 2003). Non-standard and standard varieties are found in language use in music. The emergence of the Internet and music-playing platforms, as well as more diverse musicians getting mainstream radio play and pay, leads to non-standard varieties reaching new listeners in a new format. In this thesis, I survey the types of music to which people listen, and their perceptions of speakers of Standard American English, Southern American English, and African American English, to investigate how the music people listen to connects to their language attitudes. The results show that overall, listeners of any genre have more favorable attitudes toward Standard American English, and that listeners of rap and/or hip-hop have more favorable attitudes than other groups of listeners toward the non-standard varieties.
APA, Harvard, Vancouver, ISO, and other styles
48

Ong, Toan C. "Product reputation manipulation: The characteristics and impact of shill reviews." Thesis, University of Colorado at Denver, 2013. http://pqdtopen.proquest.com/#viewpdf?dispub=3562656.

Full text
Abstract:
Online reviews have become a popular method for consumers to express personal evaluations of products. E-commerce firms have invested heavily in review systems because of the impact of product reviews on product sales and shopping behavior. However, the usefulness of product reviews is undermined by the increasing appearance of shill, or fake, reviews. As initial steps to deter and detect shill reviews, this study attempts to understand the characteristics of shill reviews and their influence on perceived product quality and shopping behavior. To reveal the linguistic characteristics of shill reviews, this study compares shill reviews and normal reviews on informativeness, readability and subjectivity level. The results show that these features can be used as reliable indicators to separate shill reviews from normal reviews. An experiment was conducted to measure the impact of shill reviews on perceived product quality. The results showed that positive shill reviews significantly increased quality perceptions of consumers for thinly reviewed products. This finding provides strong evidence about the risks of shill reviews and emphasizes the need to develop effective detection and prevention methods.
APA, Harvard, Vancouver, ISO, and other styles
49

Kozlowski, Raymond. "Uniform multilingual sentence generation using flexible lexico-grammatical resources." Access to citation, abstract and download form provided by ProQuest Information and Learning Company; downloadable PDF file 0.93 Mb., 213 p, 2006. http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&res_dat=xri:pqdiss&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&rft_dat=xri:pqdiss:3200536.

Full text
APA, Harvard, Vancouver, ISO, and other styles
50

Carpuat, Marine Jacinthe. "Word sense alignment using bilingual corpora /." View Abstract or Full-Text, 2002. http://library.ust.hk/cgi/db/thesis.pl?ELEC%202002%20CARPUA.

Full text
Abstract:
Thesis (M. Phil.)--Hong Kong University of Science and Technology, 2002.
Includes bibliographical references (leaves 43-44). Also available in electronic version. Access restricted to campus users.
APA, Harvard, Vancouver, ISO, and other styles