Journal articles on the topic 'Natural language processing (Computer science) Computational linguistics'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 journal articles for your research on the topic 'Natural language processing (Computer science) Computational linguistics.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles across a wide variety of disciplines and organise your bibliography correctly.

1

Søgaard, Anders. "Explainable Natural Language Processing." Synthesis Lectures on Human Language Technologies 14, no. 3 (September 21, 2021): 1–123. http://dx.doi.org/10.2200/s01118ed1v01y202107hlt051.

2

Kim, Jin-Dong. "Biomedical Natural Language Processing." Computational Linguistics 43, no. 1 (April 2017): 265–67. http://dx.doi.org/10.1162/coli_r_00281.

3

Dale, Robert, Hermann Moisl, and Harold Somers. "Handbook of Natural Language Processing." Computational Linguistics 27, no. 4 (December 2001): 602–3. http://dx.doi.org/10.1162/coli.2000.27.4.602.

4

GAIZAUSKAS, R., P. J. RODGERS, and K. HUMPHREYS. "Visual Tools for Natural Language Processing." Journal of Visual Languages & Computing 12, no. 4 (August 2001): 375–412. http://dx.doi.org/10.1006/jvlc.2000.0203.

5

Bird, Steven. "Natural Language Processing and Linguistic Fieldwork." Computational Linguistics 35, no. 3 (September 2009): 469–74. http://dx.doi.org/10.1162/coli.35.3.469.

6

Louis, Annie. "Natural Language Processing for Social Media." Computational Linguistics 42, no. 4 (December 2016): 833–36. http://dx.doi.org/10.1162/coli_r_00270.

7

Duh, Kevin. "Bayesian Analysis in Natural Language Processing." Computational Linguistics 44, no. 1 (March 2018): 187–89. http://dx.doi.org/10.1162/coli_r_00310.

8

Armstrong, Susan, Kenneth Church, Pierre Isabelle, Sandra Manzi, Evelyne Tzoukermann, and David Yarowsky. "Natural Language Processing Using Very Large Corpora." Computational Linguistics 26, no. 2 (June 2000): 294. http://dx.doi.org/10.1162/coli.2000.26.2.294a.

9

Liu, Yang, and Meng Zhang. "Neural Network Methods for Natural Language Processing." Computational Linguistics 44, no. 1 (March 2018): 193–95. http://dx.doi.org/10.1162/coli_r_00312.

10

Amaral, Luiz, Detmar Meurers, and Ramon Ziai. "Analyzing learner language: towards a flexible natural language processing architecture for intelligent language tutors." Computer Assisted Language Learning 24, no. 1 (January 25, 2011): 1–16. http://dx.doi.org/10.1080/09588221.2010.520674.

11

Huang, Fei, Arun Ahuja, Doug Downey, Yi Yang, Yuhong Guo, and Alexander Yates. "Learning Representations for Weakly Supervised Natural Language Processing Tasks." Computational Linguistics 40, no. 1 (March 2014): 85–120. http://dx.doi.org/10.1162/coli_a_00167.

Abstract:
Finding the right representations for words is critical for building accurate NLP systems when domain-specific labeled data for the task is scarce. This article investigates novel techniques for extracting features from n-gram models, Hidden Markov Models, and other statistical language models, including a novel Partial Lattice Markov Random Field model. Experiments on part-of-speech tagging and information extraction, among other tasks, indicate that features taken from statistical language models, in combination with more traditional features, outperform traditional representations alone, and that graphical model representations outperform n-gram models, especially on sparse and polysemous words.
12

Ponti, Edoardo Maria, Helen O’Horan, Yevgeni Berzak, Ivan Vulić, Roi Reichart, Thierry Poibeau, Ekaterina Shutova, and Anna Korhonen. "Modeling Language Variation and Universals: A Survey on Typological Linguistics for Natural Language Processing." Computational Linguistics 45, no. 3 (September 2019): 559–601. http://dx.doi.org/10.1162/coli_a_00357.

Abstract:
Linguistic typology aims to capture structural and semantic variation across the world’s languages. A large-scale typology could provide excellent guidance for multilingual Natural Language Processing (NLP), particularly for languages that suffer from the lack of human labeled resources. We present an extensive literature survey on the use of typological information in the development of NLP techniques. Our survey demonstrates that to date, the use of information in existing typological databases has resulted in consistent but modest improvements in system performance. We show that this is due to both intrinsic limitations of databases (in terms of coverage and feature granularity) and under-utilization of the typological features included in them. We advocate for a new approach that adapts the broad and discrete nature of typological categories to the contextual and continuous nature of machine learning algorithms used in contemporary NLP. In particular, we suggest that such an approach could be facilitated by recent developments in data-driven induction of typological knowledge.
13

Belov, Serey, Daria Zrelova, Petr Zrelov, and Vladimir Korenkov. "Overview of methods for automatic natural language text processing." System Analysis in Science and Education, no. 3 (September 30, 2020): 8–22. http://dx.doi.org/10.37005/2071-9612-2020-3-8-22.

Abstract:
This paper provides a brief overview of modern methods and approaches used for automatic processing of text information. In English-language literature, this area of science is called NLP (Natural Language Processing). The very name suggests that the subject of analysis (and, for many tasks, of synthesis) is material presented in one of the natural languages (and, for a number of tasks, in several languages simultaneously), i.e., the national languages of communication between people. Programming languages are not included in this group. In Russian-language literature, this area is called computer (or mathematical) linguistics. NLP (computational linguistics) usually includes speech analysis along with text analysis, but this review does not consider speech analysis. The review used materials from original works, monographs, and a number of articles published in the «Open Systems.DBMS» journal.
14

Nissan, Ephraim. "Review of Starkey (1992): Connectionist Natural Language Processing: Readings from ‘Connection Science’." Pragmatics and Cognition 5, no. 2 (January 1, 1997): 383–84. http://dx.doi.org/10.1075/pc.5.2.13nis.

15

Nicolov, Nicolas, and Ruslan Mitkov. "Recent Advances in Natural Language Processing II: Selected Papers from RANLP '97." Computational Linguistics 27, no. 4 (December 2001): 603. http://dx.doi.org/10.1162/coli.2000.27.4.603b.

16

Costa-jussà, Marta R., Cristina España-Bonet, Pascale Fung, and Noah A. Smith. "Multilingual and Interlingual Semantic Representations for Natural Language Processing: A Brief Introduction." Computational Linguistics 46, no. 2 (June 2020): 249–55. http://dx.doi.org/10.1162/coli_a_00373.

Abstract:
We introduce the Computational Linguistics special issue on Multilingual and Interlingual Semantic Representations for Natural Language Processing. We situate the special issue’s five articles in the context of our fast-changing field, explaining our motivation for this project. We offer a brief summary of the work in the issue, which includes developments on lexical and sentential semantic representations, from symbolic and neural perspectives.
17

Sun, Xu, Wenjie Li, Houfeng Wang, and Qin Lu. "Feature-Frequency–Adaptive On-line Training for Fast and Accurate Natural Language Processing." Computational Linguistics 40, no. 3 (September 2014): 563–86. http://dx.doi.org/10.1162/coli_a_00193.

Abstract:
Training speed and accuracy are two major concerns of large-scale natural language processing systems. Typically, we need to make a tradeoff between speed and accuracy. It is trivial to improve the training speed via sacrificing accuracy or to improve the accuracy via sacrificing speed. Nevertheless, it is nontrivial to improve the training speed and the accuracy at the same time, which is the target of this work. To reach this target, we present a new training method, feature-frequency–adaptive on-line training, for fast and accurate training of natural language processing systems. It is based on the core idea that higher frequency features should have a learning rate that decays faster. Theoretical analysis shows that the proposed method is convergent with a fast convergence rate. Experiments are conducted based on well-known benchmark tasks, including named entity recognition, word segmentation, phrase chunking, and sentiment analysis. These tasks consist of three structured classification tasks and one non-structured classification task, with binary features and real-valued features, respectively. Experimental results demonstrate that the proposed method is faster and at the same time more accurate than existing methods, achieving state-of-the-art scores on the tasks with different characteristics.
18

Wiebe, Janyce, Theresa Wilson, Rebecca Bruce, Matthew Bell, and Melanie Martin. "Learning Subjective Language." Computational Linguistics 30, no. 3 (September 2004): 277–308. http://dx.doi.org/10.1162/0891201041850885.

Abstract:
Subjectivity in natural language refers to aspects of language used to express opinions, evaluations, and speculations. There are numerous natural language processing applications for which subjectivity analysis is relevant, including information extraction and text categorization. The goal of this work is learning subjective language from corpora. Clues of subjectivity are generated and tested, including low-frequency words, collocations, and adjectives and verbs identified using distributional similarity. The features are also examined working together in concert. The features, generated from different data sets using different procedures, exhibit consistency in performance in that they all do better and worse on the same data sets. In addition, this article shows that the density of subjectivity clues in the surrounding context strongly affects how likely it is that a word is subjective, and it provides the results of an annotation study assessing the subjectivity of sentences with high-density features. Finally, the clues are used to perform opinion piece recognition (a type of text categorization and genre detection) to demonstrate the utility of the knowledge acquired in this article.
19

Musthofa, Musthofa. "COMPUTATIONAL LINGUISTICS (Model Baru Kajian Linguistik dalam Perspektif Komputer)." Adabiyyāt: Jurnal Bahasa dan Sastra 9, no. 2 (December 31, 2010): 247. http://dx.doi.org/10.14421/ajbs.2010.09203.

Abstract:
This paper describes a new discipline in applied linguistics studies, computational linguistics. It’s a new model of applied linguistics which is influenced by computer technology. Computational linguistics is a discipline straddling applied linguistics and computer science that is concerned with the computer processing of natural languages on all levels of linguistic description. Traditionally, computational linguistics was usually performed by computer scientists who had specialized in the application of computers to the processing of a natural language. Computational linguists often work as members of interdisciplinary teams, including linguists (specifically trained in linguistics), language experts (persons with some level of ability in the languages relevant to a given project), and computer scientists. The several areas of computational linguistics study encompasses such practical applications as speech recognition systems, speech synthesis, automated voice response systems, web search engines, text editors, grammar checking, text to speech, corpus linguistics, machine translation, text data mining, and others. This paper presents the definition of computational linguistics, relation between language and computer, and area of computational linguistics studies.
20

Dou, Jinhua, Jingyan Qin, Zanxia Jin, and Zhuang Li. "Knowledge graph based on domain ontology and natural language processing technology for Chinese intangible cultural heritage." Journal of Visual Languages & Computing 48 (October 2018): 19–28. http://dx.doi.org/10.1016/j.jvlc.2018.06.005.

21

Shutova, Ekaterina. "Design and Evaluation of Metaphor Processing Systems." Computational Linguistics 41, no. 4 (December 2015): 579–623. http://dx.doi.org/10.1162/coli_a_00233.

Abstract:
System design and evaluation methodologies receive significant attention in natural language processing (NLP), with the systems typically being evaluated on a common task and against shared data sets. This enables direct system comparison and facilitates progress in the field. However, computational work on metaphor is considerably more fragmented than similar research efforts in other areas of NLP and semantics. Recent years have seen a growing interest in computational modeling of metaphor, with many new statistical techniques opening routes for improving system accuracy and robustness. However, the lack of a common task definition, shared data set, and evaluation strategy makes the methods hard to compare, and thus hampers our progress as a community in this area. The goal of this article is to review the system features and evaluation strategies that have been proposed for the metaphor processing task, and to analyze their benefits and downsides, with the aim of identifying the desired properties of metaphor processing systems and a set of requirements for their evaluation.
22

Reiter, Ehud, and Somayajulu Sripada. "Human Variation and Lexical Choice." Computational Linguistics 28, no. 4 (December 2002): 545–53. http://dx.doi.org/10.1162/089120102762671981.

Abstract:
Much natural language processing research implicitly assumes that word meanings are fixed in a language community, but in fact there is good evidence that different people probably associate slightly different meanings with words. We summarize some evidence for this claim from the literature and from an ongoing research project, and discuss its implications for natural language generation, especially for lexical choice, that is, choosing appropriate words for a generated text.
23

Sprugnoli, Rachele, and Sara Tonelli. "Novel Event Detection and Classification for Historical Texts." Computational Linguistics 45, no. 2 (June 2019): 229–65. http://dx.doi.org/10.1162/coli_a_00347.

Abstract:
Event processing is an active area of research in the Natural Language Processing community, but resources and automatic systems developed so far have mainly addressed contemporary texts. However, the recognition and elaboration of events is a crucial step when dealing with historical texts, particularly in the current era of massive digitization of historical sources: research in this domain can lead to the development of methodologies and tools that can assist historians in enhancing their work, while also having an impact on the field of Natural Language Processing. Our work aims at shedding light on the complex concept of events when dealing with historical texts. More specifically, we introduce new annotation guidelines for event mentions and types, categorized into 22 classes. Then, we annotate a historical corpus accordingly, and compare two approaches for automatic event detection and classification following this novel scheme. We believe that this work can foster research in a field of inquiry as yet underestimated in the area of Temporal Information Processing. To this end, we release new annotation guidelines, a corpus, and new models for automatic annotation.
24

Shutova, Ekaterina, Lin Sun, Elkin Darío Gutiérrez, Patricia Lichtenstein, and Srini Narayanan. "Multilingual Metaphor Processing: Experiments with Semi-Supervised and Unsupervised Learning." Computational Linguistics 43, no. 1 (April 2017): 71–123. http://dx.doi.org/10.1162/coli_a_00275.

Abstract:
Highly frequent in language and communication, metaphor represents a significant challenge for Natural Language Processing (NLP) applications. Computational work on metaphor has traditionally evolved around the use of hand-coded knowledge, making the systems hard to scale. Recent years have witnessed a rise in statistical approaches to metaphor processing. However, these approaches often require extensive human annotation effort and are predominantly evaluated within a limited domain. In contrast, we experiment with weakly supervised and unsupervised techniques—with little or no annotation—to generalize higher-level mechanisms of metaphor from distributional properties of concepts. We investigate different levels and types of supervision (learning from linguistic examples vs. learning from a given set of metaphorical mappings vs. learning without annotation) in flat and hierarchical, unconstrained and constrained clustering settings. Our aim is to identify the optimal type of supervision for a learning algorithm that discovers patterns of metaphorical association from text. In order to investigate the scalability and adaptability of our models, we applied them to data in three languages from different language groups—English, Spanish, and Russian—achieving state-of-the-art results with little supervision. Finally, we demonstrate that statistical methods can facilitate and scale up cross-linguistic research on metaphor.
25

Recasens, Marta, and Marta Vila. "On Paraphrase and Coreference." Computational Linguistics 36, no. 4 (December 2010): 639–47. http://dx.doi.org/10.1162/coli_a_00014.

Abstract:
By providing a better understanding of paraphrase and coreference in terms of similarities and differences in their linguistic nature, this article delimits what the focus of paraphrase extraction and coreference resolution tasks should be, and to what extent they can help each other. We argue for the relevance of this discussion to Natural Language Processing.
26

Lyons, Michael J., Kazunori Morikawa, and Shigeru Akamatsu. "A linked aggregate code for processing faces." Facial Information Processing 8, no. 1 (May 17, 2000): 63–81. http://dx.doi.org/10.1075/pc.8.1.04lyo.

Abstract:
A model of face representation, inspired by known biology of the visual system, is compared to experimental data on the perception of facial similarity. The face representation model uses aggregate primary visual cortex (V1) cell responses topographically linked to a grid covering the face, allowing comparison of shape and texture at corresponding points in two facial images. When a set of relatively similar faces was used as stimuli, this “linked aggregate code” (LAC) predicted human performance in similarity judgment experiments. When faces of different categories were used, natural facial dimensions such as sex and race emerged from the LAC model without training. The dimensional structure of the LAC similarity measure for the mixed-category task displayed some psychologically plausible features, but also highlighted shortcomings of the proposed representation. The results suggest that the LAC based similarity measure may be useful as an interesting starting point for further modeling studies of face representation in higher visual areas.
27

Li, Liuqing, Jack Geissinger, William A. Ingram, and Edward A. Fox. "Teaching Natural Language Processing through Big Data Text Summarization with Problem-Based Learning." Data and Information Management 4, no. 1 (March 24, 2020): 18–43. http://dx.doi.org/10.2478/dim-2020-0003.

Abstract:
Natural language processing (NLP) covers a large number of topics and tasks related to data and information management, leading to a complex and challenging teaching process. Meanwhile, problem-based learning is a teaching technique specifically designed to motivate students to learn efficiently, work collaboratively, and communicate effectively. With this aim, we developed a problem-based learning course for both undergraduate and graduate students to teach NLP. We provided student teams with big data sets, basic guidelines, cloud computing resources, and other aids to help different teams in summarizing two types of big collections: Web pages related to events, and electronic theses and dissertations (ETDs). Student teams then deployed different libraries, tools, methods, and algorithms to solve the task of big data text summarization. Summarization is an ideal problem to address learning NLP since it involves all levels of linguistics, as well as many of the tools and techniques used by NLP practitioners. The evaluation results showed that all teams generated coherent and readable summaries. Many summaries were of high quality and accurately described their corresponding events or ETD chapters, and the teams produced them along with NLP pipelines in a single semester. Further, both undergraduate and graduate students gave statistically significant positive feedback, relative to other courses in the Department of Computer Science. Accordingly, we encourage educators in the data and information management field to use our approach or similar methods in their teaching and hope that other researchers will also use our data sets and synergistic solutions to approach the new and challenging tasks we addressed.
28

Horacek, Helmut. "Building Natural Language Generation Systems Ehud Reiter and Robert Dale (University of Aberdeen and Macquarie University) Cambridge University Press (Studies in natural language processing), 2000, xxi+248 pp; hardbound, ISBN 0-521-62036-8, $59.95." Computational Linguistics 27, no. 2 (June 2001): 298–300. http://dx.doi.org/10.1162/coli.2000.27.2.298.

29

Benamara, Farah, Diana Inkpen, and Maite Taboada. "Introduction to the Special Issue on Language in Social Media: Exploiting Discourse and Other Contextual Information." Computational Linguistics 44, no. 4 (December 2018): 663–81. http://dx.doi.org/10.1162/coli_a_00333.

Abstract:
Social media content is changing the way people interact with each other and share information, personal messages, and opinions about situations, objects, and past experiences. Most social media texts are short online conversational posts or comments that do not contain enough information for natural language processing (NLP) tools, as they are often accompanied by non-linguistic contextual information, including meta-data (e.g., the user’s profile, the social network of the user, and their interactions with other users). Exploiting such different types of context and their interactions makes the automatic processing of social media texts a challenging research task. Indeed, simply applying traditional text mining tools is clearly sub-optimal, as, typically, these tools take into account neither the interactive dimension nor the particular nature of this data, which shares properties with both spoken and written language. This special issue contributes to a deeper understanding of the role of these interactions to process social media data from a new perspective in discourse interpretation. This introduction first provides the necessary background to understand what context is from both the linguistic and computational linguistic perspectives, then presents the most recent context-based approaches to NLP for social media. We conclude with an overview of the papers accepted in this special issue, highlighting what we believe are the future directions in processing social media texts.
30

Nivre, Joakim, and Daniel Fernández-González. "Arc-Eager Parsing with the Tree Constraint." Computational Linguistics 40, no. 2 (June 2014): 259–67. http://dx.doi.org/10.1162/coli_a_00185.

Abstract:
The arc-eager system for transition-based dependency parsing is widely used in natural language processing despite the fact that it does not guarantee that the output is a well-formed dependency tree. We propose a simple modification to the original system that enforces the tree constraint without requiring any modification to the parser training procedure. Experiments on multiple languages show that the method on average achieves 72% of the error reduction possible and consistently outperforms the standard heuristic in current use.
31

Jiménez-Zafra, Salud María, Roser Morante, María Teresa Martín-Valdivia, and L. Alfonso Ureña-López. "Corpora Annotated with Negation: An Overview." Computational Linguistics 46, no. 1 (March 2020): 1–52. http://dx.doi.org/10.1162/coli_a_00371.

Abstract:
Negation is a universal linguistic phenomenon with a great qualitative impact on natural language processing applications. The availability of corpora annotated with negation is essential to training negation processing systems. Currently, most corpora have been annotated for English, but the presence of languages other than English on the Internet, such as Chinese or Spanish, is greater every day. In this study, we present a review of the corpora annotated with negation information in several languages with the goal of evaluating what aspects of negation have been annotated and how compatible the corpora are. We conclude that it is very difficult to merge the existing corpora because we found differences in the annotation schemes used, and most importantly, in the annotation guidelines: the way in which each corpus was tokenized and the negation elements that have been annotated. Differently than for other well established tasks like semantic role labeling or parsing, for negation there is no standard annotation scheme nor guidelines, which hampers progress in its treatment.
32

Cohn, Trevor, Chris Callison-Burch, and Mirella Lapata. "Constructing Corpora for the Development and Evaluation of Paraphrase Systems." Computational Linguistics 34, no. 4 (December 2008): 597–614. http://dx.doi.org/10.1162/coli.08-003-r1-07-044.

Abstract:
Automatic paraphrasing is an important component in many natural language processing tasks. In this article we present a new parallel corpus with paraphrase annotations. We adopt a definition of paraphrase based on word alignments and show that it yields high inter-annotator agreement. As Kappa is suited to nominal data, we employ an alternative agreement statistic which is appropriate for structured alignment tasks. We discuss how the corpus can be usefully employed in evaluating paraphrase systems automatically (e.g., by measuring precision, recall, and F1) and also in developing linguistically rich paraphrase models based on syntactic structure.
33

Padó, Sebastian, and Mirella Lapata. "Dependency-Based Construction of Semantic Space Models." Computational Linguistics 33, no. 2 (June 2007): 161–99. http://dx.doi.org/10.1162/coli.2007.33.2.161.

Abstract:
Traditionally, vector-based semantic space models use word co-occurrence counts from large corpora to represent lexical meaning. In this article we present a novel framework for constructing semantic spaces that takes syntactic relations into account. We introduce a formalization for this class of models, which allows linguistic knowledge to guide the construction process. We evaluate our framework on a range of tasks relevant for cognitive science and natural language processing: semantic priming, synonymy detection, and word sense disambiguation. In all cases, our framework obtains results that are comparable or superior to the state of the art.
34

Тарабань, Роман, Кодуру Лакшмоджі, Марк ЛаКур, and Філіп Маршалл. "Finding a Common Ground in Human and Machine-Based Text Processing." East European Journal of Psycholinguistics 5, no. 1 (June 30, 2018): 83–91. http://dx.doi.org/10.29038/eejpl.2018.5.1.tar.

Abstract:
Language makes human communication possible. Apart from everyday applications, language can provide insights into individuals’ thinking and reasoning. Machine-based analyses of text are becoming widespread in business applications, but their utility in learning contexts is a neglected area of research. The goal of the present work is therefore to explore machine-assisted approaches to aid in the analysis of students’ written compositions. A method for extracting common topics from written text is applied to 78 student papers on technology and ethics. The primary tool for analysis is the Latent Dirichlet Allocation algorithm. The results suggest that this machine-based topic extraction method is effective and holds promise for enhancing classroom learning and instruction. The method may also prove beneficial in other applied settings, such as clinical and counseling practice.
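The topic-extraction step this abstract describes (Latent Dirichlet Allocation, Blei et al., 2003) can be sketched with a minimal collapsed Gibbs sampler. The toy corpus, topic count, and hyperparameters below are illustrative stand-ins, not the study's data, and Blei et al.'s original inference is variational rather than sampling-based:

```python
import random
from collections import defaultdict

# Toy corpus: tokenized snippets in the spirit of student essays on
# technology and ethics (hypothetical data, not from the study).
docs = [
    "safety design welfare safety engineer".split(),
    "privacy data surveillance privacy ethics".split(),
    "engineer design safety testing design".split(),
    "ethics data privacy responsibility data".split(),
]

K = 2                    # number of topics (tuned in practice)
alpha, beta = 0.1, 0.01  # Dirichlet hyperparameters
vocab = sorted({w for d in docs for w in d})
V = len(vocab)

random.seed(0)
# z[d][i]: topic assignment of token i in document d, random init.
z = [[random.randrange(K) for _ in d] for d in docs]
ndk = [[0] * K for _ in docs]                # document-topic counts
nkw = [defaultdict(int) for _ in range(K)]   # topic-word counts
nk = [0] * K                                 # topic totals
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        k = z[d][i]
        ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1

# Collapsed Gibbs sampling: resample each token's topic from the
# conditional given all other assignments.
for _ in range(200):
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            ndk[d][k] -= 1; nkw[k][w] -= 1; nk[k] -= 1
            weights = [(ndk[d][j] + alpha) * (nkw[j][w] + beta)
                       / (nk[j] + V * beta) for j in range(K)]
            k = random.choices(range(K), weights=weights)[0]
            z[d][i] = k
            ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1

# Report the most probable words per topic, as a topic-extraction
# analysis of essays would.
for k in range(K):
    top = sorted(vocab, key=lambda w: -nkw[k][w])[:3]
    print(f"topic {k}: {top}")
```

In a real analysis the number of topics and hyperparameters would be tuned, and a library implementation (e.g., scikit-learn or gensim) would be used instead of this readable stand-in.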
APA, Harvard, Vancouver, ISO, and other styles
35

Mihov, Stoyan, and Klaus U. Schulz. "Fast Approximate Search in Large Dictionaries." Computational Linguistics 30, no. 4 (December 2004): 451–77. http://dx.doi.org/10.1162/0891201042544938.

Full text
Abstract:
The need to correct garbled strings arises in many areas of natural language processing. If a dictionary is available that covers all possible input tokens, a natural set of candidates for correcting an erroneous input P is the set of all words in the dictionary for which the Levenshtein distance to P does not exceed a given (small) bound k. In this article we describe methods for efficiently selecting such candidate sets. After introducing as a starting point a basic correction method based on the concept of a “universal Levenshtein automaton,” we show how two filtering methods known from the field of approximate text search can be used to improve the basic procedure in a significant way. The first method, which uses standard dictionaries plus dictionaries with reversed words, leads to very short correction times for most classes of input strings. Our evaluation results demonstrate that correction times for fixed-distance bounds depend on the expected number of correction candidates, which decreases for longer input words. Similarly, the choice of an optimal filtering method depends on the length of the input words.
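The candidate set described here is easy to state in code. The sketch below selects candidates by computing the full Levenshtein distance against every dictionary word; the article's universal Levenshtein automaton and filtering methods exist precisely to avoid this brute-force scan, so this only illustrates what is being selected, over a made-up lexicon:

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance, one row at a time.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def candidates(p: str, dictionary, k: int):
    """All dictionary words within Levenshtein distance k of input p."""
    return sorted(w for w in dictionary if levenshtein(p, w) <= k)

# Hypothetical lexicon; the garbled input P is "langage", bound k = 1.
lexicon = {"language", "langue", "lineage", "luggage", "leverage"}
print(candidates("langage", lexicon, 1))  # → ['language']
```

The brute-force scan costs one distance computation per dictionary word, which is exactly the expense the automaton-based method amortizes away.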
APA, Harvard, Vancouver, ISO, and other styles
36

Argamon, Shlomo Engelson. "Register in computational language research." Register Studies 1, no. 1 (April 26, 2019): 100–135. http://dx.doi.org/10.1075/rs.18015.arg.

Full text
Abstract:
Abstract Shlomo Argamon is Professor of Computer Science and Director of the Master of Data Science Program at the Illinois Institute of Technology (USA). In this article, he reflects on the current and potential relationship between register and the field of computational linguistics. He applies his expertise in computational linguistics and machine learning to a variety of problems in natural language processing. These include stylistic variation, forensic linguistics, authorship attribution, and biomedical informatics. He is particularly interested in the linguistic structures used by speakers and writers, including linguistic choices that are influenced by social variables such as age, gender, and register, as well as linguistic choices that are unique or distinctive to the style of individual authors. Argamon has been a pioneer in computational linguistics and NLP research in his efforts to account for and explore register variation. His computational linguistic research on register draws inspiration from Systemic Functional Linguistics, Biber’s multi-dimensional approach to register variation, as well as his own extensive experience accounting for variation within and across text types and authors. Argamon has applied computational methods to text classification and description across registers – including blogs, academic disciplines, and news writing – as well as the interaction between register and other social variables, such as age and gender. His cutting-edge research in these areas is certain to have a lasting impact on the future of computational linguistics and NLP.
APA, Harvard, Vancouver, ISO, and other styles
37

Gao, Jianfeng, Mu Li, Chang-Ning Huang, and Andi Wu. "Chinese Word Segmentation and Named Entity Recognition: A Pragmatic Approach." Computational Linguistics 31, no. 4 (December 2005): 531–74. http://dx.doi.org/10.1162/089120105775299177.

Full text
Abstract:
This article presents a pragmatic approach to Chinese word segmentation. It differs from most previous approaches mainly in three respects. First, while theoretical linguists have defined Chinese words using various linguistic criteria, Chinese words in this study are defined pragmatically as segmentation units whose definition depends on how they are used and processed in realistic computer applications. Second, we propose a pragmatic mathematical framework in which segmenting known words and detecting unknown words of different types (i.e., morphologically derived words, factoids, named entities, and other unlisted words) can be performed simultaneously in a unified way. These tasks are usually conducted separately in other systems. Finally, we do not assume the existence of a universal word segmentation standard that is application-independent. Instead, we argue for the necessity of multiple segmentation standards due to the pragmatic fact that different natural language processing applications might require different granularities of Chinese words. These pragmatic approaches have been implemented in an adaptive Chinese word segmenter, called MSRSeg, which will be described in detail. It consists of two components: (1) a generic segmenter that is based on the framework of linear mixture models and provides a unified approach to the five fundamental features of word-level Chinese language processing: lexicon word processing, morphological analysis, factoid detection, named entity recognition, and new word identification; and (2) a set of output adaptors for adapting the output of (1) to different application-specific standards. Evaluation on five test sets with different standards shows that the adaptive system achieves state-of-the-art performance on all the test sets.
APA, Harvard, Vancouver, ISO, and other styles
38

Prud'hommeaux, Emily, and Brian Roark. "Graph-Based Word Alignment for Clinical Language Evaluation." Computational Linguistics 41, no. 4 (December 2015): 549–78. http://dx.doi.org/10.1162/coli_a_00232.

Full text
Abstract:
Among the more recent applications for natural language processing algorithms has been the analysis of spoken language data for diagnostic and remedial purposes, fueled by the demand for simple, objective, and unobtrusive screening tools for neurological disorders such as dementia. The automated analysis of narrative retellings in particular shows potential as a component of such a screening tool since the ability to produce accurate and meaningful narratives is noticeably impaired in individuals with dementia and its frequent precursor, mild cognitive impairment, as well as other neurodegenerative and neurodevelopmental disorders. In this article, we present a method for extracting narrative recall scores automatically and highly accurately from a word-level alignment between a retelling and the source narrative. We propose improvements to existing machine translation–based systems for word alignment, including a novel method of word alignment relying on random walks on a graph that achieves alignment accuracy superior to that of standard expectation maximization–based techniques for word alignment in a fraction of the time required for expectation maximization. In addition, the narrative recall score features extracted from these high-quality word alignments yield diagnostic classification accuracy comparable to that achieved using manually assigned scores and significantly higher than that achieved with summary-level text similarity metrics used in other areas of NLP. These methods can be trivially adapted to spontaneous language samples elicited with non-linguistic stimuli, thereby demonstrating the flexibility and generalizability of these methods.
APA, Harvard, Vancouver, ISO, and other styles
39

Shaw, David. "A New Call Laboratory." ReCALL 3, no. 4 (May 1991): 2–4. http://dx.doi.org/10.1017/s0958344000002482.

Full text
Abstract:
After attending the 1989 Exeter CALL Conference, David Shaw and John Partridge, two teachers from the University of Kent, recommended to the School of European and Modern Language Studies that the School should establish its own Computer Assisted Language Learning Laboratory. Several of us had been ‘keeping an eye’ on CALL for quite a few years, from the days when BBC micros were innovative marvels. The Applied Languages Board had acquired a BBC and some software and had gained some experience with it in postgraduate courses. David Shaw had been supervising practical programming projects for MSc students in Computing in the area of CALL and natural language processing. Our recommendation was that, with a new generation of microcomputers supplanting the trusty but limited BBC micro, a point had been reached where it would be realistic for the School to establish a CALL teaching laboratory on a more ambitious scale.
APA, Harvard, Vancouver, ISO, and other styles
40

Resnik, Philip, and Noah A. Smith. "The Web as a Parallel Corpus." Computational Linguistics 29, no. 3 (September 2003): 349–80. http://dx.doi.org/10.1162/089120103322711578.

Full text
Abstract:
Parallel corpora have become an essential resource for work in multilingual natural language processing. In this article, we report on our work using the STRAND system for mining parallel text on the World Wide Web, first reviewing the original algorithm and results and then presenting a set of significant enhancements. These enhancements include the use of supervised learning based on structural features of documents to improve classification performance, a new content-based measure of translational equivalence, and adaptation of the system to take advantage of the Internet Archive for mining parallel text from the Web on a large scale. Finally, the value of these techniques is demonstrated in the construction of a significant parallel corpus for a low-density language pair.
APA, Harvard, Vancouver, ISO, and other styles
41

Tsarfaty, Reut, Djamé Seddah, Sandra Kübler, and Joakim Nivre. "Parsing Morphologically Rich Languages: Introduction to the Special Issue." Computational Linguistics 39, no. 1 (March 2013): 15–22. http://dx.doi.org/10.1162/coli_a_00133.

Full text
Abstract:
Parsing is a key task in natural language processing. It involves predicting, for each natural language sentence, an abstract representation of the grammatical entities in the sentence and the relations between these entities. This representation provides an interface to compositional semantics and to the notions of “who did what to whom.” The last two decades have seen great advances in parsing English, leading to major leaps also in the performance of applications that use parsers as part of their backbone, such as systems for information extraction, sentiment analysis, text summarization, and machine translation. Attempts to replicate the success of parsing English for other languages have often yielded unsatisfactory results. In particular, parsing languages with complex word structure and flexible word order has been shown to require non-trivial adaptation. This special issue reports on methods that successfully address the challenges involved in parsing a range of morphologically rich languages (MRLs). This introduction characterizes MRLs, describes the challenges in parsing MRLs, and outlines the contributions of the articles in the special issue. These contributions present up-to-date research efforts that address parsing in varied, cross-lingual settings. They show that parsing MRLs addresses challenges that transcend particular representational and algorithmic choices.
APA, Harvard, Vancouver, ISO, and other styles
42

Madnani, Nitin, and Bonnie J. Dorr. "Generating Phrasal and Sentential Paraphrases: A Survey of Data-Driven Methods." Computational Linguistics 36, no. 3 (September 2010): 341–87. http://dx.doi.org/10.1162/coli_a_00002.

Full text
Abstract:
The task of paraphrasing is inherently familiar to speakers of all languages. Moreover, the task of automatically generating or extracting semantic equivalences for the various units of language—words, phrases, and sentences—is an important part of natural language processing (NLP) and is being increasingly employed to improve the performance of several NLP applications. In this article, we attempt to conduct a comprehensive and application-independent survey of data-driven phrasal and sentential paraphrase generation methods, while also conveying an appreciation for the importance and potential use of paraphrases in the field of NLP research. Recent work done in manual and automatic construction of paraphrase corpora is also examined. We also discuss the strategies used for evaluating paraphrase generation techniques and briefly explore some future trends in paraphrase generation.
APA, Harvard, Vancouver, ISO, and other styles
43

Chiang, David, Frank Drewes, Daniel Gildea, Adam Lopez, and Giorgio Satta. "Weighted DAG Automata for Semantic Graphs." Computational Linguistics 44, no. 1 (March 2018): 119–86. http://dx.doi.org/10.1162/coli_a_00309.

Full text
Abstract:
Graphs have a variety of uses in natural language processing, particularly as representations of linguistic meaning. A deficit in this area of research is a formal framework for creating, combining, and using models involving graphs that parallels the frameworks of finite automata for strings and finite tree automata for trees. A possible starting point for such a framework is the formalism of directed acyclic graph (DAG) automata, defined by Kamimura and Slutzki and extended by Quernheim and Knight. In this article, we study the latter in depth, demonstrating several new results, including a practical recognition algorithm that can be used for inference and learning with models defined on DAG automata. We also propose an extension to graphs with unbounded node degree and show that our results carry over to the extended formalism.
APA, Harvard, Vancouver, ISO, and other styles
44

Higgins, Derrick, and Jerrold M. Sadock. "A Machine Learning Approach to Modeling Scope Preferences." Computational Linguistics 29, no. 1 (March 2003): 73–96. http://dx.doi.org/10.1162/089120103321337449.

Full text
Abstract:
This article describes a corpus-based investigation of quantifier scope preferences. Following recent work on multimodular grammar frameworks in theoretical linguistics and a long history of combining multiple information sources in natural language processing, scope is treated as a distinct module of grammar from syntax. This module incorporates multiple sources of evidence regarding the most likely scope reading for a sentence and is entirely data-driven. The experiments discussed in this article evaluate the performance of our models in predicting the most likely scope reading for a particular sentence, using Penn Treebank data both with and without syntactic annotation. We wish to focus attention on the issue of determining scope preferences, which has largely been ignored in theoretical linguistics, and to explore different models of the interaction between syntax and quantifier scope.
APA, Harvard, Vancouver, ISO, and other styles
45

Stamatatos, Efstathios, Nikos Fakotakis, and George Kokkinakis. "Automatic Text Categorization in Terms of Genre and Author." Computational Linguistics 26, no. 4 (December 2000): 471–95. http://dx.doi.org/10.1162/089120100750105920.

Full text
Abstract:
The two main factors that characterize a text are its content and its style, and both can be used as a means of categorization. In this paper we present an approach to text categorization in terms of genre and author for Modern Greek. In contrast to previous stylometric approaches, we attempt to take full advantage of existing natural language processing (NLP) tools. To this end, we propose a set of style markers including analysis-level measures that represent the way in which the input text has been analyzed and capture useful stylistic information without additional cost. We present a set of small-scale but reasonable experiments in text genre detection, author identification, and author verification tasks and show that the proposed method performs better than the most popular distributional lexical measures, i.e., functions of vocabulary richness and frequencies of occurrence of the most frequent words. All the presented experiments are based on unrestricted text downloaded from the World Wide Web without any manual text preprocessing or text sampling. Various performance issues regarding the training set size and the significance of the proposed style markers are discussed. Our system can be used in any application that requires fast and easily adaptable text categorization in terms of stylistically homogeneous categories. Moreover, the procedure of defining analysis-level markers can be followed in order to extract useful stylistic information using existing text processing tools.
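The distributional lexical baseline the authors compare against (frequencies of occurrence of the most frequent words) can be sketched in a few lines. The texts and the short function-word list below are invented for illustration; real stylometric studies use much longer lists:

```python
from collections import Counter

def frequent_word_profile(text: str, top_words: list[str]) -> list[float]:
    """Relative frequencies of a fixed list of very frequent words,
    a classic stylometric feature vector."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    n = len(tokens) or 1  # guard against empty input
    return [counts[w] / n for w in top_words]

# A tiny, hypothetical function-word list.
FUNCTION_WORDS = ["the", "of", "and", "to", "in"]

a = "the cat sat on the mat and looked to the door"
b = "of arms and the man I sing in exile driven of fate"
print(frequent_word_profile(a, FUNCTION_WORDS))
print(frequent_word_profile(b, FUNCTION_WORDS))
```

Such vectors are what a distance metric or classifier would consume; the article's point is that analysis-level style markers from NLP tools outperform this kind of purely lexical feature.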
APA, Harvard, Vancouver, ISO, and other styles
46

Kaiser, Ed. "Extended Finite State Models of Language András Kornai (editor) (BBN Technologies) Cambridge University Press (Studies in natural language processing), 1999, xii+278 pp and CD-ROM; hardbound, ISBN 0-521-63198-X, $59.95." Computational Linguistics 26, no. 2 (June 2000): 282–85. http://dx.doi.org/10.1162/coli.2000.26.2.282.

Full text
APA, Harvard, Vancouver, ISO, and other styles
47

Vadas, David, and James R. Curran. "Parsing Noun Phrases in the Penn Treebank." Computational Linguistics 37, no. 4 (December 2011): 753–809. http://dx.doi.org/10.1162/coli_a_00076.

Full text
Abstract:
Noun phrases (NPs) are a crucial part of natural language, and can have a very complex structure. However, this NP structure is largely ignored by the statistical parsing field, as the most widely used corpus is not annotated with it. This lack of gold-standard data has restricted previous efforts to parse NPs, making it impossible to perform the supervised experiments that have achieved high performance in so many Natural Language Processing (NLP) tasks. We comprehensively solve this problem by manually annotating NP structure for the entire Wall Street Journal section of the Penn Treebank. The inter-annotator agreement scores that we attain dispel the belief that the task is too difficult, and demonstrate that consistent NP annotation is possible. Our gold-standard NP data is now available for use in all parsers. We experiment with this new data, applying the Collins (2003) parsing model, and find that its recovery of NP structure is significantly worse than its overall performance. The parser's F-score is up to 5.69% lower than a baseline that uses deterministic rules. Through much experimentation, we determine that this result is primarily caused by a lack of lexical information. To solve this problem we construct a wide-coverage, large-scale NP Bracketing system. With our Penn Treebank data set, which is orders of magnitude larger than those used previously, we build a supervised model that achieves excellent results. Our model performs at 93.8% F-score on the simple task that most previous work has undertaken, and extends to bracket longer, more complex NPs that are rarely dealt with in the literature. We attain 89.14% F-score on this much more difficult task. Finally, we implement a post-processing module that brackets NPs identified by the Bikel (2004) parser. Our NP Bracketing model includes a wide variety of features that provide the lexical information that was missing during the parser experiments, and as a result, we outperform the parser's F-score by 9.04%. These experiments demonstrate the utility of the corpus, and show that many NLP applications can now make use of NP structure.
APA, Harvard, Vancouver, ISO, and other styles
48

Wang, Chunlin, Irene Castellón, and Elisabet Comelles. "Linguistic analysis of datasets for semantic textual similarity." Digital Scholarship in the Humanities 35, no. 2 (April 27, 2019): 471–84. http://dx.doi.org/10.1093/llc/fqy076.

Full text
Abstract:
Abstract Semantic Textual Similarity (STS), which measures the equivalence of meanings between two textual segments, is an important and useful task in Natural Language Processing. In this article, we have analyzed the datasets provided by the Semantic Evaluation (SemEval) 2012–2014 campaigns for this task in order to find out appropriate linguistic features for each dataset, taking into account the influence that linguistic features at different levels (e.g. syntactic constituents and lexical semantics) might have on the sentence similarity. Results indicate that a linguistic feature may have a different effect on different corpus due to the great difference in sentence structure and vocabulary between datasets. Thus, we conclude that the selection of linguistic features according to the genre of the text might be a good strategy for obtaining better results in the STS task. This analysis could be a useful reference for measuring system building and linguistic feature tuning.
APA, Harvard, Vancouver, ISO, and other styles
49

Surdeanu, Mihai, Massimiliano Ciaramita, and Hugo Zaragoza. "Learning to Rank Answers to Non-Factoid Questions from Web Collections." Computational Linguistics 37, no. 2 (June 2011): 351–83. http://dx.doi.org/10.1162/coli_a_00051.

Full text
Abstract:
This work investigates the use of linguistically motivated features to improve search, in particular for ranking answers to non-factoid questions. We show that it is possible to exploit existing large collections of question–answer pairs (from online social Question Answering sites) to extract such features and train ranking models which combine them effectively. We investigate a wide range of feature types, some exploiting natural language processing such as coarse word sense disambiguation, named-entity identification, syntactic parsing, and semantic role labeling. Our experiments demonstrate that linguistic features, in combination, yield considerable improvements in accuracy. Depending on the system settings we measure relative improvements of 14% to 21% in Mean Reciprocal Rank and Precision@1, providing some of the most compelling evidence to date that complex linguistic features such as word senses and semantic roles can have a significant impact on large-scale information retrieval tasks.
APA, Harvard, Vancouver, ISO, and other styles
50

Oflazer, Kemal, Sergei Nirenburg, and Marjorie McShane. "Bootstrapping Morphological Analyzers by Combining Human Elicitation and Machine Learning." Computational Linguistics 27, no. 1 (March 2001): 59–85. http://dx.doi.org/10.1162/089120101300346804.

Full text
Abstract:
This paper presents a semiautomatic technique for developing broad-coverage finite-state morphological analyzers for use in natural language processing applications. It consists of three components—elicitation of linguistic information from humans, a machine learning bootstrapping scheme, and a testing environment. The three components are applied iteratively until a threshold of output quality is attained. The initial application of this technique is for the morphology of low-density languages in the context of the Expedition project at NMSU Computing Research Laboratory. This elicit-build-test technique compiles lexical and inflectional information elicited from a human into a finite-state transducer lexicon and combines this with a sequence of morphographemic rewrite rules that is induced using transformation-based learning from the elicited examples. The resulting morphological analyzer is then tested against a test set, and any corrections are fed back into the learning procedure, which then builds an improved analyzer.
APA, Harvard, Vancouver, ISO, and other styles