Journal articles on the topic 'Corpora analysis'


1

Park, Chanjun, Midan Shim, Sugyeong Eo, et al. "Empirical Analysis of Parallel Corpora and In-Depth Analysis Using LIWC." Applied Sciences 12, no. 11 (2022): 5545. http://dx.doi.org/10.3390/app12115545.

Abstract:
A machine translation (MT) system aims to translate a source language into a target language. Recent studies on MT systems mainly focus on neural machine translation (NMT). One factor that significantly affects the performance of NMT is the availability of high-quality parallel corpora. However, high-quality parallel corpora involving Korean are relatively scarce compared with those for high-resource languages such as German or Italian. To address this problem, AI Hub recently released seven types of parallel corpora for Korean. In this study, we conduct an in-depth verification of the quality of these parallel corpora through Linguistic Inquiry and Word Count (LIWC) and several related experiments. LIWC is a dictionary-based word-counting program that can analyze corpora in multiple ways and extract linguistic features. To the best of our knowledge, this study is the first to use LIWC to analyze parallel corpora in the field of NMT. Based on a correlation analysis between LIWC features and NMT performance, our findings suggest directions for further research toward obtaining higher-quality parallel corpora.
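LIWC itself is a proprietary program with its own dictionaries, but the kind of dictionary-based counting it performs can be sketched in a few lines. This is a minimal illustration, not LIWC: the `DICTIONARY` categories and entries are hypothetical, tokenization is a naive lowercase whitespace split with edge punctuation stripped, and a trailing `*` is assumed to mark a stem that matches any word beginning with it. Each category score is the percentage of tokens matching at least one of that category's entries.

```python
from typing import Dict, List

# Hypothetical mini-dictionary for illustration only; real LIWC dictionaries are far larger.
# A trailing "*" marks a stem that matches any word beginning with it.
DICTIONARY: Dict[str, List[str]] = {
    "posemo": ["happ*", "good", "love*"],
    "negemo": ["sad*", "bad", "hate*"],
    "pronoun": ["i", "we", "you", "they"],
}


def matches(word: str, entry: str) -> bool:
    """True if `word` equals `entry`, or starts with its stem when `entry` ends in '*'."""
    return word.startswith(entry[:-1]) if entry.endswith("*") else word == entry


def category_percentages(text: str, dictionary: Dict[str, List[str]] = DICTIONARY) -> Dict[str, float]:
    """Percentage of tokens in `text` that fall into each dictionary category."""
    tokens = [t.strip(".,!?;:\"'") for t in text.lower().split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return {cat: 0.0 for cat in dictionary}
    return {
        cat: 100.0 * sum(any(matches(t, e) for e in entries) for t in tokens) / len(tokens)
        for cat, entries in dictionary.items()
    }
```

For example, `category_percentages("We love this good book.")` scores 40.0 for `posemo` (two of five tokens), 20.0 for `pronoun`, and 0.0 for `negemo`. Comparing such per-category profiles between the source and target sides of a parallel corpus is one way feature-level differences could be surfaced.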
2

Ablakulova, Orzigul. "Enhancing English Language Teaching through Corpora Analysis." Texas Journal of Philology, Culture and History 28 (March 6, 2024): 24–26. http://dx.doi.org/10.62480/tjpch.2024.vol28.pp24-26.

Abstract:
This article discusses the integration of corpora analysis into English language teaching. It highlights the benefits, methodologies, and potential challenges associated with incorporating corpora analysis in the classroom. The author explains that corpora, which are large collections of written and spoken texts, provide teachers and learners with authentic language data to facilitate language learning and teaching. The article presents various methods for implementing corpora analysis, including selecting suitable corpora, introducing learners to corpus tools and software, and incorporating corpus-based activities into the curriculum.
3

Juhary, Jowati, Erda Wati Bakar, Mardziah Shamsudin, and Asniah Alias. "Understanding Malay Corpora: A Content Analysis of 15 Malay Corpora." Journal of Advances in Linguistics 12 (October 8, 2021): 18–26. http://dx.doi.org/10.24297/jal.v12i.9122.

Abstract:
Corpus research has recently become an important area of research, especially in Malaysia and for its national language, Malay. A corpus includes texts and transcriptions of speech from a variety of situations. This short paper focuses on Malay, the national and official language of Malaysia. Its purposes are to identify the features and types of Malay corpora and to determine the need for a military-oriented Malay corpus. Because this is a short paper, the methodology involves only a content analysis of relevant documents on the development of Malay language corpora. Preliminary findings suggest that there are at least 15 Malay corpora in existence and that some of their features overlap. Further, the researchers argue for the need for a Malay Corpus for Military Operations, since the existing corpora do not fully cater for this type of corpus.
4

Adolphs, Svenja, Dawn Knight, and Ronald Carter. "Capturing context for heterogeneous corpus analysis." International Journal of Corpus Linguistics 16, no. 3 (2011): 305–24. http://dx.doi.org/10.1075/ijcl.16.3.02ado.

Abstract:
Heterogeneous corpora are emergent multi-modal datasets which comprise a variety of different records of everyday communication, from SMS/MMS messages to interactions in virtual environments, and from GPS data to phone and video calls. By tracking a person’s specific (inter)actions over time and place, the analysis of such “ubiquitous” corpora enables more detailed investigations of the interface between different communicative modes. This paper outlines some of the ways in which multi-modal, heterogeneous corpora can be utilised in corpus-based analyses of language-in-use and how we can construct richer descriptions of language use in relation to context. The paper further illustrates how the compilation of such corpora may enable us to extrapolate further information about communication across different speakers, media and environments, helping to generate useful insights into the extent to which everyday language and communicative choices are determined by different spatial, temporal and social contexts.
5

Van Thin, Dang, Ngan Luu-Thuy Nguyen, Tri Minh Truong, Lac Si Le, and Duy Tin Vo. "Two New Large Corpora for Vietnamese Aspect-based Sentiment Analysis at Sentence Level." ACM Transactions on Asian and Low-Resource Language Information Processing 20, no. 4 (2021): 1–22. http://dx.doi.org/10.1145/3446678.

Abstract:
Aspect-based sentiment analysis has been studied in both research and industrial communities in recent years. For low-resource languages, standard benchmark corpora play an important role in the development of methods. In this article, we introduce the two largest sentence-level benchmark corpora for two tasks in Vietnamese: Aspect Category Detection and Aspect Polarity Classification. Our corpora are annotated with high inter-annotator agreement for the restaurant and hotel domains. The release of our corpora should help advance the low-resource language processing community. In addition, we deploy and compare the effectiveness of supervised learning methods using single-task and multi-task approaches based on deep learning architectures. Experimental results on our corpora show that the multi-task approach based on the BERT architecture outperforms the other neural network architectures and the single-task approach. Our corpora and source code are publicly available at the site given in the article's footnote.
6

Hodková, Kateřina. "Les relations sémantiques au carrefour des champs conceptuels du droit" [Semantic relations at the crossroads of the conceptual fields of law]. Studia Romanistica 22, no. 1 (2022): 57–71. http://dx.doi.org/10.15452/sr.2022.22.0004.

Abstract:
The present study concerns the analysis of the semantic relationships that exist between legal concepts in Czech and French law. The study combines a textual approach, which is necessary for identifying relationships, with the construction of conceptual fields by applying the theory of semic analysis, which helps to distinguish terminological and conceptual units from other linguistic units in the texts. Two corpora of legal texts serve as the source of legal concepts. These corpora concern the same thematic domain and were established for the purpose of this study. After the theoretical and methodological delimitation of the key notions (the definition of concept and term, conceptual field, semic analysis, and the content of the corpora), the study proceeds to a detailed description of linguistic relations within the corpora. This paper focuses on semantic relationships and analyses the following: synonymy, opposition (antonymy and contrastivity), and hierarchical relationships (hyperonymy, meronymy, and the hierarchy of conceptual fields). Because the analysis covers two languages and two legal systems, the data from the two corpora can be compared. For each relationship, this study offers a short explanation of its nature, its frequency in the two corpora, examples drawn from the corpora and, where present, a description of other phenomena encountered during the research. These phenomena include, among other things, different types of synonymy, the absence of a hyperonym or holonym in some hierarchical structures, and different types of meronymy.
7

Ledinek, Nina. "Skladenjska analiza slovenščine in slovenski jezikoslovno označeni korpusi" [Syntactic analysis of Slovenian and linguistically annotated corpora of Slovenian]. Jezik in slovstvo 63, no. 2-3 (2024): 103–16. http://dx.doi.org/10.4312/jis.63.2-3.103-116.

Abstract:
The article deals with the possibilities of using linguistically annotated corpora of Slovenian for syntactic analysis. Slovenian language infrastructure is inadequately developed: at least eight syntactically annotated corpora of Slovenian are available to users, but their small size allows only a limited scope of syntactic analysis. As a result, there are few systematic and comprehensive corpus-based studies of Slovenian syntax, and most of them rely on the analysis of morphosyntactically annotated corpora of Slovenian.
8

Beigman Klebanov, Beata, Chaitanya Ramineni, David Kaufer, Paul Yeoh, and Suguru Ishizaki. "Advancing the validity argument for standardized writing tests using quantitative rhetorical analysis." Language Testing 36, no. 1 (2017): 125–44. http://dx.doi.org/10.1177/0265532217740752.

Abstract:
Essay writing is a common type of constructed-response task used frequently in standardized writing assessments. However, the impromptu, timed nature of essay writing tests has drawn increasing criticism for its lack of authenticity relative to real-world writing in classroom and workplace settings. The goal of this paper is to contribute evidence to a validity argument for standardized writing tests. Using measurements of distances between rhetorical profiles in the corpora of interest, we examined connections between argumentative writing on standardized assessments and in external writing situations, namely opinionated writing in academic and real-life settings. The results show that the test corpora, which focus on argumentation in two standardized tests, are rhetorically similar to academic argumentative writing in a graduate-school setting, and about as similar as a corpus of civic writing in the same genre. The proximity between the test corpora and the corpora representing external criteria of interest supports the assessment use argument. The argumentative writing skills employed on the test are similar to those employed in academic and civic settings, despite differences in the conditions under which the writing samples for these corpora were produced.
9

Newman, John. "Corpora and cognitive linguistics." Revista Brasileira de Linguística Aplicada 11, no. 2 (2011): 521–59. http://dx.doi.org/10.1590/s1984-63982011000200010.

Abstract:
Corpora are a natural source of data for cognitive linguists, since corpora, more than any other source of data, reflect "usage" - a notion which is often claimed to be of critical importance to the field of cognitive linguistics. Corpora are relevant to all the main topics of interest in cognitive linguistics: metaphor, polysemy, synonymy, prototypes, and constructional analysis. I consider each of these topics in turn and offer suggestions about which methods of analysis can be profitably used with available corpora to explore these topics further. In addition, I consider how the design and content of currently used corpora need to be rethought if corpora are to provide all the types of usage data that cognitive linguists require.
10

Crossley, Scott, and Max M. Louwerse. "Multi-dimensional register classification using bigrams." International Journal of Corpus Linguistics 12, no. 4 (2007): 453–78. http://dx.doi.org/10.1075/ijcl.12.4.02cro.

Abstract:
A corpus linguistic analysis investigated register classification using frequency of bigrams in nine spoken and two written corpora. Four dimensions emerged from a factor analysis using bigram frequencies shared across corpora: (1) Scripted vs. Unscripted Discourse, (2) Deliberate vs. Unplanned Discourse, (3) Spatial vs. Non-Spatial Discourse, and (4) Directional vs. Non-Directional Discourse. These findings were replicated in a second analysis. Both analyses demonstrate the strength of bigrams for classifying spoken and written registers, especially in locating distinct collocations among spoken corpora, as well as revealing syntactic and discourse features through a data-driven approach.
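The first step described above, deriving bigram frequencies that are shared across corpora, can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes lowercase whitespace tokenization and normalizes counts per 1,000 bigrams (the `per` parameter is a hypothetical choice). The resulting corpus-by-bigram matrix is the kind of input a factor analysis would then take; the factor analysis itself is not shown.

```python
from collections import Counter
from typing import Dict, List, Tuple

Bigram = Tuple[str, str]


def bigram_counts(text: str) -> Counter:
    """Count adjacent word pairs, assuming lowercase whitespace tokenization."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


def shared_bigram_matrix(
    corpora: Dict[str, str], per: int = 1000
) -> Tuple[List[Bigram], Dict[str, List[float]]]:
    """Return the bigrams found in every corpus and, for each corpus,
    their frequencies normalized per `per` bigrams in that corpus."""
    counts = {name: bigram_counts(text) for name, text in corpora.items()}
    # Keep only bigrams attested in all corpora, in a stable sorted order.
    shared = sorted(set.intersection(*(set(c) for c in counts.values())))
    matrix: Dict[str, List[float]] = {}
    for name, c in counts.items():
        total = sum(c.values())
        matrix[name] = [c[b] / total * per for b in shared]
    return shared, matrix
```

With two toy corpora, `{"spoken": "you know what i mean you know", "written": "as you know the results show"}`, the only shared bigram is `("you", "know")`, with normalized frequencies of about 333.3 (2 of 6 bigrams) and 200.0 (1 of 5). Restricting the matrix to shared bigrams keeps every column comparable across corpora, at the cost of discarding register-specific pairs.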
11

Bert, Michel, Sylvie Bruxelles, Carole Etienne, Lorenza Mondada, and Véronique Traverso. "Tool-assisted analysis of interactional corpora: voilà in the CLAPI database." Journal of French Language Studies 18, no. 1 (2008): 121–45. http://dx.doi.org/10.1017/s0959269507003195.

Abstract:
The aim of this paper is to show how databases and computer tools can be used for archiving and browsing corpora of social interactions. The development of specific search engines allows for both qualitative analysis of naturally occurring interactions and quantitative exploration of larger corpora. The paper is based on the CLAPI Workbench <http://clapi.univ-lyon2.fr>, an interfaced ensemble of analytic tools which operates on a consistent body of corpora to facilitate their description and theoretical reconstruction. The analytical part of the paper focuses on the uses of a discourse particle in French interaction: voilà.
12

Alamri, Basim, and Assem Alqarni. "Syntactic complexity in applied linguistics research article abstracts: A corpus-based comparative analysis between MENA and international." Ibérica, no. 48 (December 16, 2024): 221–46. https://doi.org/10.17398/2340-2784.48.221.

Abstract:
The study explored syntactic complexity variations in English research article (RA) abstracts written by native Arabic authors residing in the Arab world and by international authors from various contexts. The study analyzed three specialized corpora totalling 600 English abstracts. Each corpus comprised 200 abstracts, written respectively by Middle Eastern Arab authors, North African Arab authors (together, MENA), and international authors, with a total of 111,645 words across all three corpora. Using the L2SCA developed by Lu (2010), several procedures were undertaken to analyze the data from the three corpora. Fourteen measures of syntactic complexity grouped in five categories were implemented, and the data were then entered into SPSS Statistics software, with the measures as dependent variables and the three corpora as independent variables, to identify any differences via analysis of variance (ANOVA) tests. The findings revealed significant differences among the three corpora in their scores on the five aspects of syntactic complexity. For instance, the international corpus showed a longer length of production unit and more subordination than the MENA corpora. Overall, the sentence complexity of international abstracts was higher than that of the MENA abstracts. Practical implications for Arabic learners and other learners with respect to writing pedagogy in English for research publication purposes are discussed.
13

Kuziboyeva, Sevinch. "Corpus Linguistics: Analyzing Large Text Corpora in English." Multidisciplinary Journal of Science and Technology 4, no. 10 (2024): 142–50. https://doi.org/10.5281/zenodo.13926216.

Abstract:
This paper delves into the field of corpus linguistics, focusing on the analysis of large text corpora in the English language. Corpus linguistics involves the systematic study of language through large collections of texts, known as corpora. This research explores the methodologies used in corpus analysis, the types of corpora available, and their applications in various linguistic studies. It highlights the significance of corpus linguistics in understanding language patterns, usage, and evolution. The paper also discusses the advantages and limitations of using large text corpora for linguistic analysis.
14

Boltayeva, Dilfuza Shukhrat qizi. "Using Corpora for Literary Analysis: Methodologies, Applications, and Case Studies." Multidisciplinary Journal of Science and Technology 5, no. 3 (2025): 71–74. https://doi.org/10.5281/zenodo.14993631.

Abstract:
The utilization of digital corpora for literary analysis has revolutionized the study of literature, providing a quantitative approach that complements traditional qualitative analysis. By harnessing the power of large text databases, researchers can detect patterns, trends, and linguistic features across genres, authors, and historical periods. This paper explores the integration of corpora into literary studies, outlining key methodologies, applications, and case studies. We delve into how corpus-based approaches can be employed to analyze themes, authorial style, and historical language change, emphasizing their potential to uncover new insights in literary analysis.
15

Kobzová, Jana. "Kinship Terminology in Western Slavic Languages Based on Corpora Analysis." Journal of Linguistics/Jazykovedný časopis 70, no. 2 (2019): 289–98. http://dx.doi.org/10.2478/jazcas-2019-0059.

Abstract:
This paper discusses kinship arrangements, and more generally the families of Western Slavs, on the basis of linguistic and corpus data. It argues that there is a correlation between lexicon and society, and that studying the lexicon can provide supporting data for examining society. We use corpus data, which provides reliable information about the lexicon actually used by speakers of Western Slavic languages, and offer possible explanations for changes occurring in this part of the vocabulary. The paper is divided into three main parts: the first discusses the relations between social reality and kinship terminology, the second discusses data from corpora, and the third draws conclusions.
16

Waldenberger, Sandra, Stefanie Dipper, and Ilka Lemke. "Towards a broad-coverage graphemic analysis of large historical corpora." Zeitschrift für Sprachwissenschaft 40, no. 3 (2021): 401–20. http://dx.doi.org/10.1515/zfs-2021-2037.

Abstract:
This paper presents a method which we are developing to explore graphemic variation in large historical corpora of German. Historical corpora provide an amount of data at the level of graphemics which cannot be handled exhaustively using common methods of manual evaluation. To deal with this challenge, we apply methods from computational linguistics to pave the way for a broad-coverage graph(em)ic analysis of large historical corpora. In this paper, we show how our approach can be applied to the Reference Corpus of Middle High German. Illustrating our method and linguistic analysis, we present findings from our investigations into diatopic and/or diachronic variation as documented in 13th and 14th century charters (Urkunden) from the corpus.
17

Al Zahran, Aladdin, and Rafik Jamoussi. "Oman Royal Speeches Corpus: Compilation and Analysis." Arab World English Journal 14, no. 4 (2023): 150–68. http://dx.doi.org/10.24093/awej/vol14no4.9.

Abstract:
For many years, researchers have directed their attention primarily toward developing written corpora, with the consequence that spoken corpora have consistently remained rare compared to written ones. The laborious transcription and annotation tasks make creating and maintaining spoken corpora a challenging endeavor. This project aims to build a transcribed corpus of Oman Royal Speeches and make it available online through a custom-made concordance tool. The study also aims to test the corpus for fundamental corpus-based lexical, stylistic, and discourse-analytical applications. Compiling the Oman Royal Speeches Corpus is meant to fill a gap by contributing to the development of Arabic spoken language corpora and by making available a research tool that can facilitate corpus-based research, uses, and applications in various areas of investigation. The corpus was built in five stages: data capture, data processing, concordance tool development, testing and evaluation, and online deployment. With 98,511 tokens, the resulting corpus is a searchable archive of Royal Speeches with a built-in online concordance tool that supports multiple search types and Keyword-in-Context (KWIC) display of query results. The corpus has been tested for various corpus-analytic uses and has been found to yield significant findings in these areas. It therefore has the potential to serve as a reliable and authentic record and source of information for researchers and specialists in various fields, as well as a research tool for various applications and analyses in language-related topics.
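A Keyword-in-Context display of the kind described above can be sketched in a few lines. This is not the Oman Royal Speeches Corpus tool: it assumes the text has already been tokenized, matches tokens case-insensitively, and uses a hypothetical `window` of context tokens on each side. Real concordancers also support multi-word, wildcard, and regular-expression queries, which are omitted here.

```python
from typing import List, Tuple

Hit = Tuple[str, str, str]


def kwic(tokens: List[str], keyword: str, window: int = 3) -> List[Hit]:
    """Return (left context, keyword, right context) for each case-insensitive match."""
    target = keyword.lower()
    hits: List[Hit] = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits


def print_concordance(hits: List[Hit], width: int = 30) -> None:
    """Print hits with the keyword aligned in a central column."""
    for left, kw, right in hits:
        print(f"{left:>{width}}  [{kw}]  {right}")
```

On a hypothetical token list such as `"the people of Oman and the future of Oman".split()`, `kwic(tokens, "oman", window=2)` yields `("people of", "Oman", "and the")` and `("future of", "Oman", "")`. Right-aligning the left context is what makes the keyword line up vertically in the classic KWIC view.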
18

Danielewicz-Betz, A., H. Kaneda, M. Mozgovoy, and M. Purgina. "Creating English and Japanese Twitter Corpora for Emotion Analysis." International Journal of Knowledge Engineering-IACSIT 1, no. 2 (2015): 120–24. http://dx.doi.org/10.7763/ijke.2015.v1.20.

19

Bouziri, Basma. "A corpus-assisted genre analysis of the Tunisian Lecture Corpus: An exploratory study." Research in Corpus Linguistics 8, no. 2 (2020): 103–32. http://dx.doi.org/10.32714/ricl.08.02.06.

Abstract:
Multimodal, specialized corpora of academic lectures represent authentic classroom data that practitioners can draw on to design academic listening resources that would help students attend lectures. These corpora can also act as reflective practice corpora for teacher training or professional development programs with the objective of raising awareness of lecturing practices. Despite their contribution in shaping the type and quality of the learning that takes place in classrooms, multimodal lecture corpora are scarce, particularly in the Arab world. This paper addresses this research gap by designing and collecting a corpus of academic lectures delivered in English in Tunisia. The corpus was explored using a Systemic Functional Linguistics and English for Specific Purposes integrated genre analysis framework. A three-layered model of analysis was used to manually code various rhetorical functions as well as their realizations. Major findings include the pervasiveness of metadiscursive functions when compared to discourse functions, the identification of context-specific metadiscursive strategies, and the absence of verbal or non-verbal signaling of some rhetorical functions. Implications relate to the necessity of compiling and/or using lecture corpora that are multimodal, the value of adopting function-first approaches to explore these, particularly in non-native contexts, and the design of professional development programs and learning materials that would better account for local academic needs.
20

Shin, Gyu-Ho. "Automatic analysis of caregiver input and child production." Korean Linguistics 18, no. 2 (2022): 125–58. http://dx.doi.org/10.1075/kl.20002.shi.

Abstract:
The present study explores the applicability of Natural Language Processing (NLP) techniques to investigate child corpora in Korean. We employ caregiver input and child production data in the CHILDES database, currently the largest open-access Korean child corpus data, and apply NLP techniques to the data in two ways: automatic Part-of-Speech tagging by adapting a machine learning algorithm, and (semi-)automatic extraction of constructional patterns expressing a transitive event (active transitive and suffixal passive). As the first empirical report on NLP-assisted analysis of Korean child corpora, this study is expected to reveal its advantages and drawbacks, thereby opening the window to furthering corpus-mediated research on child language development in Korean. The implications of this study's findings will also contribute to research practice in developmental studies of Korean through child corpora, ensuring the reproducibility of procedures and results, which is often lacking in previous corpus-based research on child language development in Korean.
21

Kozak, Ivan, and Nataliia Kunanets. "Information Systems for Working with Text Corpora: Classification and Comparative Analysis." Vìsnik Nacìonalʹnogo unìversitetu "Lʹvìvsʹka polìtehnìka". Serìâ Ìnformacìjnì sistemi ta merežì 16 (November 21, 2024): 273–89. https://doi.org/10.23939/sisn2024.16.273.

Abstract:
The article examines information systems for working with text corpora, particularly their application for linguistic analysis and management of large text data. Information systems for supporting text corpora are analyzed, classified, and compared based on their historical development and functional capabilities. The main focus is comparing the two most common systems that can be distinguished by functionality as corpus managers: ‘AntConc’ and ‘Sketch Engine’. These are evaluated based on key criteria: corpus creation, text processing, annotation, storage and export, data analysis and visualization, interface intuitiveness, support for the Ukrainian language, as well as the presence of an open license. The research aimed to conduct a comparative analysis of these systems using the analytic hierarchy process method to determine their strengths and weaknesses under different usage conditions. It was found that ‘Sketch Engine’ provides advanced capabilities for creating and managing large corpora, annotating and visualizing data, making it a better choice for large research projects. At the same time, ‘AntConc’ is a more accessible and efficient system for individual or small-scale research due to its simplicity, lack of licensing costs, and support for specific parameters for text analysis. The research findings can be useful for corpus and applied linguists when choosing systems for creating and working with text corpora. The conclusions will contribute to making decisions regarding the selection of appropriate tools based on specific research needs, workload, and budget constraints. In addition, the research results can be applied to improving existing and developing new information systems to support corpora in future scientific projects by the authors.
22

Park, Shinjae. "Trend Analysis of Fluency-Related Research Articles: Using Data Mining." Forum for Linguistic Studies 6, no. 4 (2024): 449–62. http://dx.doi.org/10.30564/fls.v6i4.6797.

Abstract:
This research seeks to identify recent research trends in analysing fluency in English language studies since 2010. The importance of studying corpora that compile registered research in linguistics is growing; thus, it is time to consider the significance of data mining analysis on the corpora. Considering this emerging research topic, this analysis intends to generate and compare word clouds and word metric charts from speaking and writing abstract corpora in linguistics research centred on the keyword ‘fluency’. This comparison is minimally addressed in extant research; hence, this investigation aims to fill this gap. The corpus contains 50 speaking and 15 writing abstracts from linguistics journals. To create the word clouds, AntConc 4.1.4 was used to analyse the corpora, while TF-IDF and matrix analyses were conducted in R. As a result, in spoken English, ‘fluency’ studies were largely related to ‘teaching’ and tended to be dominated by studies focusing on ‘fluency’ alone. In contrast, writing studies were mostly related to learner proficiency and assessment, and many studies analysed the relationship between linguistic abilities by expanding the scope of ‘fluency’ to include ‘accuracy’ and ‘complexity’. This study will benefit researchers in deciding on topics, as it provides research directions and trends in analysing fluency in two interconnected but differently expressed fields (i.e., speaking and writing).
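TF-IDF, mentioned above, weights a term up when it is frequent in one document and down when it appears in many documents. The study ran its analysis in R; the following Python sketch only illustrates the weighting and is not the author's code. It assumes raw term counts and the unsmoothed variant idf = log(N / df), where N is the number of documents and df the number containing the term; R packages and other tools may apply smoothing or normalization, so their numbers can differ.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """For each tokenized document, weight every term by
    count_in_document * log(N / document_frequency)."""
    n = len(docs)
    # Document frequency: in how many documents each term occurs at least once.
    df = Counter(term for doc in docs for term in set(doc))
    weights: List[Dict[str, float]] = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return weights
```

For two hypothetical toy abstracts, `[["fluency", "teaching", "fluency"], ["fluency", "accuracy", "complexity"]]`, "fluency" gets weight 0.0 in both because it occurs everywhere, while "teaching" and "accuracy" each get log 2. This is why a keyword shared by every abstract in the corpus, like ‘fluency’ here, contributes little to distinguishing subgroups.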
23

Wang, Jianxin. "Recent Progress in Corpus Linguistics in China." International Journal of Corpus Linguistics 6, no. 2 (2001): 281–304. http://dx.doi.org/10.1075/ijcl.6.2.05wan.

Abstract:
This paper discusses some of the new developments in corpus linguistics in China. In the area of Chinese corpus compilation it presents large-scale text databases, representative corpora, annotated corpora, lexical databases for information processing, phonological, dialectal, spoken and other specialized corpora. In connection with the analysis and annotation of Chinese corpora, the characteristics of the Chinese language, word segmentation, tagging, parsing, and some corpus analytical systems are described. Concerning English corpus studies, some corpora of English as a Foreign Language and corpus-based research are depicted. On this basis tentative conclusions are drawn.
24

Jung, Boo Kyung, and Gyu-Ho Shin. "Use of locative postposition-verb construction in Korean: analysis of L1-Korean corpora and L2-Korean textbooks." Corpora 18, no. 1 (2023): 15–47. http://dx.doi.org/10.3366/cor.2023.0271.

Abstract:
A usage-based account argues that the actual experience of language use greatly affects the course of language acquisition. On the basis of this premise, this study investigates how language textbooks, as a main input source for L2 learners, reflect the properties of a target language. We focus on a Korean locative postposition–verb construction, consisting of one of three locative postpositions (-ey, -eyse, and -(u)lo) and particular sets of verbs. For this purpose, we adopt representative L1 written and spoken corpora of Korean and two L2-Korean textbook corpora, and analyse three aspects of them: the frequency of these postpositions and verbs, their association strength, and change in the use of the construction's components across the corpora. We find that (i) verb types co-occurring with each postposition follow a Zipfian distribution in both datasets, (ii) L1 corpora and L2 textbooks are inconsistent in their manifestation of postposition–verb associations within the construction, and (iii) the two textbook types differ in the verbs they use with the postpositions. The implications of these findings are discussed in terms of the acquisitional benefits of constructional frames and of using corpora for L2 learning and teaching.
25

Ding, Qing. "Analysis of the Impact of the Traditional Literature Environment Based on Big Data Technology on Overseas Literature." Journal of Environmental and Public Health 2022 (September 30, 2022): 1–10. http://dx.doi.org/10.1155/2022/2834363.

Abstract:
To study the influence of conventional literature on foreign literature driven by big data, this essay begins with surveys and interviews. Chinese big data-driven corpora are distinct from other Chinese corpora, as is widely known. Its main objective is to categorize professional corpora that are unknown and fall within the category of professional corpora. In order to provide a straightforward and useful domain partitioning model for corpus texts, this research makes use of text clustering and big data-driven methodologies. We can easily determine the domain of the aligned text, making it easier to do machine translation research in the future. The research findings demonstrate that the accuracy rate of the approach suggested in this article is essentially above 89.79%, demonstrating the viability of the way of automatically building a corpus suggested in this paper in the experiment.
26

Loredo-Pong, Virginia, María Lucila Morales-Rodríguez, Nancy Patricia Díaz-Zavala, Nelson Rangel-Valdez, and Jaime E. Sosa-Sevilla. "Design and analysis of classification models for the gelification of alkoxybenzoates using the kNN algorithm." International Journal of Combinatorial Optimization Problems and Informatics 13, no. 2 (2022): 58–64. https://doi.org/10.61467/2007.1558.2022.v13i2.266.

Abstract:
The classification models of the states produced by the gelation tests of alkoxybenzoates require designing several corpora of data based on their characteristics. This work studies a series of alkoxybenzoates and 15 solvents characterized by Hansen Solubility Parameters (HSP) and by the number of carbons on the alkyl tail as a distinctive structural feature of the molecules. These properties were evaluated as attributes of the corpora used with the kNN algorithm. Different configurations were analyzed, using three corpora whose content varied according to their attributes. The study shows that some attributes are more relevant than others for predicting the class of the products obtained. The significant number of samples correctly classified in corpora containing HSP and the number of carbons on the alkyl ether tail of the alkoxybenzoates indicates the influence of these properties on the classification. The most suitable kNN configuration (distance metric, k value, and attribute weights) was also determined for each corpus.
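The abstract describes tuning kNN by distance metric, k value, and attribute weights. The sketch below is a minimal, generic attribute-weighted kNN classifier, not the authors' implementation; the feature layout (three Hansen Solubility Parameter components plus the alkyl-tail carbon count), the class labels, and all numeric values are hypothetical and made up purely for illustration.

```python
import math
from collections import Counter
from typing import Callable, Optional, Sequence

Vector = Sequence[float]
Metric = Callable[[Vector, Vector, Vector], float]


def euclidean(a: Vector, b: Vector, w: Vector) -> float:
    """Attribute-weighted Euclidean distance."""
    return math.sqrt(sum(wi * (ai - bi) ** 2 for ai, bi, wi in zip(a, b, w)))


def manhattan(a: Vector, b: Vector, w: Vector) -> float:
    """Attribute-weighted Manhattan (city-block) distance."""
    return sum(wi * abs(ai - bi) for ai, bi, wi in zip(a, b, w))


def knn_predict(
    train_x: Sequence[Vector],
    train_y: Sequence[str],
    query: Vector,
    k: int = 3,
    weights: Optional[Vector] = None,
    metric: Metric = euclidean,
) -> str:
    """Return the majority label among the k nearest training samples."""
    if weights is None:
        weights = [1.0] * len(query)  # equal attribute weights by default
    # Sort training indices by their weighted distance to the query point.
    order = sorted(range(len(train_x)), key=lambda i: metric(train_x[i], query, weights))
    votes = Counter(train_y[i] for i in order[:k])
    # Ties keep first-encountered order, i.e. the label of the closer neighbour wins.
    return votes.most_common(1)[0][0]


# Hypothetical samples: [dD, dP, dH, alkyl-tail carbons]; values and labels are invented.
train_x = [[15.0, 2.0, 3.0, 8], [15.5, 2.5, 3.5, 10], [16.0, 12.0, 20.0, 8], [15.8, 11.0, 22.0, 12]]
train_y = ["gel", "gel", "solution", "solution"]
print(knn_predict(train_x, train_y, [15.2, 3.0, 4.0, 9], k=3))  # gel
```

Setting a weight to zero removes an attribute from the distance entirely, which is one simple way to compare configurations that use different attribute subsets.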
27

Geluso, Joe, and Roz Hirch. "The reference corpus matters." Register Studies 1, no. 2 (2019): 209–42. http://dx.doi.org/10.1075/rs.18001.gel.

Abstract:
This study investigates the effect that reference corpora of different registers have on the content of keyword lists. The study focusses on two target corpora and the keyword lists generated for each when using three distinct reference corpora. The two target corpora consist of published research by faculty at two PhD-granting programs in applied linguistics in North America. The reference corpora comprise published research in applied linguistics, newspaper and magazine articles, and fiction texts, respectively. The findings suggest that while common keywords representing each target corpus emerge regardless of the reference corpus used in the analysis, there are also substantial differences. Primarily, using a reference corpus of the same sub-register as the target corpus better highlights content unique to each target corpus while using a reference corpus of a different register better uncovers words that reflect the register that the target corpora represent. Implications for conducting keyword analysis are discussed.
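Keyword lists are computed by comparing word frequencies in a target corpus against a reference corpus, so changing the reference changes the list. The abstract does not name the keyness statistic used; the sketch below assumes log-likelihood (G2), a widely used choice in keyword analysis, and keeps only positive keywords. The function names and the toy token lists are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """G2 for a word seen a times in a target corpus of c tokens
    and b times in a reference corpus of d tokens."""
    total = a + b
    e1 = c * total / (c + d)  # expected frequency in the target corpus
    e2 = d * total / (c + d)  # expected frequency in the reference corpus
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:  # 0 * log(0) is treated as 0
        g2 += b * math.log(b / e2)
    return 2.0 * g2


def keywords(target: Iterable[str], reference: Iterable[str], top: int = 10) -> List[Tuple[str, float]]:
    """Rank words over-represented in `target` relative to `reference` by G2."""
    t, r = Counter(target), Counter(reference)
    c, d = sum(t.values()), sum(r.values())
    scored = []
    for word, a in t.items():
        b = r.get(word, 0)
        # Positive keywords only: relatively more frequent in the target corpus.
        if a / c > b / d:
            scored.append((word, log_likelihood(a, b, c, d)))
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top]


# Toy data: one target corpus compared against two different reference corpora.
target = "the corpus corpus keyword the".split()
ref_general = "the the the cat sat".split()
ref_same = "corpus corpus corpus the cat".split()
print([w for w, _ in keywords(target, ref_general)])  # ['corpus', 'keyword']
print([w for w, _ in keywords(target, ref_same)])     # ['keyword', 'the']
```

The same target yields different keyword lists depending on the reference corpus, which is the effect the study examines at scale.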
28

McEnery, Tony, Vaclav Brezina, and Helen Baker. "Usage Fluctuation Analysis." International Journal of Corpus Linguistics 24, no. 4 (2019): 413–44. http://dx.doi.org/10.1075/ijcl.18096.mce.

Abstract:
This article introduces Usage Fluctuation Analysis (UFA), a methodology for the diachronic analysis of large historical corpora. UFA looks at the fluctuation in the usage of a word as observed through collocation. It presupposes neither a commitment to a specific semantic theory nor that the results will focus solely on semantics. We focus, rather, on a word’s usage. UFA considers large amounts of evidence about usage through time, as made available by historical corpora, and displays fluctuation in word usage in the form of a graph. The paper provides guidelines for interpreting UFA graphs and presents three short case studies applying the technique to (i) the word "its" and (ii) two words related to social actors, "whore" and "harlot". These case studies relate UFA to prior, labour-intensive corpus and historical analyses. They also highlight the novel observations that the technique affords.
29

Belica, Cyril. "Analysis of Temporal Changes in Corpora." International Journal of Corpus Linguistics 1, no. 1 (1996): 61–73. http://dx.doi.org/10.1075/ijcl.1.1.05bel.

Abstract:
This paper describes an experiment in statistical analysis of corpora with respect to the temporal changes in language use. The technique approximates the notion of temporal relevance of usage evolution by analysing and evaluating the frequency distribution of a set of indicators over time and by isolating string configurations with unlikely temporal distribution.
30

López-Rodríguez, Clara Inés, and María Isabel Tercedor-Sánchez. "Corpora and Students' Autonomy in Scientific and Technical Translation training." Journal of Specialised Translation, no. 9 (January 25, 2008): 2–19. https://doi.org/10.26034/cm.jostrans.2008.681.

Abstract:
The use of corpora in the translation classroom has been explored by several researchers (Bowker 1998, 2000; Faber et al. 2001; Zanettin, Bernardini & Stewart 2003). The aims of this paper are to explore the applications of corpora in the teaching of scientific and technical translation at university level, and to design activities that increase learner autonomy while developing translation strategies and evaluation skills. These activities include the analysis of three types of corpora (DIY corpora, learner corpora, and a quality corpus), the introduction of tags for didactic purposes, and the generation of lemmatised wordlists and concordances. We also explore how corpora contribute to the acquisition of knowledge of the subject field and its conventions, and to the production of adequate texts in the target language.
31

Faigenbaum-Golovin, Shira, Alon Kipnis, Axel Bühler, Eli Piasetzky, Thomas Römer, and Israel Finkelstein. "Critical biblical studies via word frequency analysis: Unveiling text authorship." PLOS One 20, no. 6 (2025): e0322905. https://doi.org/10.1371/journal.pone.0322905.

Abstract:
The Bible is the product of a complex process of oral and written transmissions that stretched across centuries and traditions. This implies ongoing revision of the “original” or oldest textual layers over the course of hundreds of years. Although critical scholarship recognizes this fact, debates abound regarding the reconstruction of the different layers, their date of composition and their historical backgrounds. Traditional methodologies have grappled with these challenges through textual and diachronic criticism, employing linguistic, stylistic, inner-biblical, archaeological and historical criteria. In this study, we use computer-assisted methods to address the question of authorship of biblical texts by employing statistical analysis that is particularly sensitive to deviations in word frequencies. Here, the term “word” may be generalized to “n-gram” (a sequence of words) or other countable text features. This paper consists of two parts. In the first part, we focus on differentiating between three distinct scribal corpora across numerous chapters in the Enneateuch, the first nine books of the Bible. Specifically, we examine 50 chapters, labeled on the basis of biblical exegesis as belonging to three corpora: the old layer in Deuteronomy (D), texts belonging to the “Deuteronomistic History” in Joshua-to-Kings (DtrH), and the Priestly writings (P). For pragmatic reasons, we chose entire chapters, in which the number of verses potentially attributed to different authors or redactors is negligible. Without prior assumptions about author identity, our approach leverages subtle differences in word frequencies to distinguish among the three corpora and identify author-dependent linguistic properties. Our analysis indicated that the first two scribal corpora (D, the oldest layers of Deuteronomy, and DtrH, the so-called Deuteronomistic History) are much more closely related to each other than to the third (P).
This observation aligns with scholarly consensus. In addition, we attained high accuracy in attributing authorship by evaluating the similarity of each chapter to the reference corpora. In the second part of the paper, we report on our use of the three corpora as ground truth to examine other biblical texts whose authorship is disputed by biblical experts. Here, we demonstrate the potential contribution of insights achieved in the first part. Our paper sheds new light on the question of authorship of biblical texts by offering interpretable, statistically significant evidence of the existence of linguistic characteristics in the writing of biblical authors/redactors that can be identified automatically. Our methodology thus provides a new tool to address disputed matters in biblical studies.
32

Fu, Rongbo. "Multilingual Corpora and Multilingual Corpus Analysis." Australian Journal of Linguistics 37, no. 1 (2016): 105–9. http://dx.doi.org/10.1080/07268602.2016.1156466.

33

Mechulam, Nicolás, Damián Salvia, Aiala Rosá, and Mathias Etcheverry. "Building Dynamic Lexicons for Sentiment Analysis." Inteligencia Artificial 22, no. 64 (2019): 1–13. http://dx.doi.org/10.4114/intartif.vol22iss64pp1-13.

Abstract:
Nowadays, many approaches to Sentiment Analysis (SA) rely on affective lexicons to identify the emotions conveyed in opinions. However, most of these lexicons do not consider that a word can express different sentiments in different domains, which introduces errors into sentiment inference. To address this problem, we present a model based on a context graph that can be used to build domain-specific sentiment lexicons (DL: Dynamic Lexicons) by propagating the valence of a few seed words. For different corpora, we compare the results of a simple rule-based sentiment classifier using the corresponding DL with the results obtained using a general affective lexicon. For most corpora containing domain-specific opinions, the DL achieves better results than the general lexicon.
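The abstract describes building a domain-specific lexicon by propagating seed-word valence over a context graph, then scoring opinions with a simple rule-based classifier, but it does not give the propagation rule. The sketch below swaps in a generic technique (iterative weighted-average label propagation with fixed seeds), not the authors' model; the function names, the co-occurrence edges, their weights, and the seed values are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def propagate_valence(
    edges: Iterable[Tuple[str, str, float]],
    seeds: Dict[str, float],
    iterations: int = 20,
) -> Dict[str, float]:
    """Spread seed-word valences over an undirected, weighted context graph.

    Generic label propagation (an assumed technique): seed words keep their
    valence, every other word takes the weighted average of its neighbours.
    """
    graph: Dict[str, Dict[str, float]] = defaultdict(dict)
    for u, v, w in edges:
        graph[u][v] = graph[u].get(v, 0.0) + w
        graph[v][u] = graph[v].get(u, 0.0) + w
    nodes = set(graph) | set(seeds)
    valence = {node: seeds.get(node, 0.0) for node in nodes}
    for _ in range(iterations):
        updated = {}
        for node in nodes:
            if node in seeds:
                updated[node] = seeds[node]  # seed words stay fixed
                continue
            neighbours = graph.get(node, {})
            total = sum(neighbours.values())
            updated[node] = (
                sum(w * valence[n] for n, w in neighbours.items()) / total if total else 0.0
            )
        valence = updated  # synchronous update from the previous iteration
    return valence


def classify(tokens: List[str], lexicon: Dict[str, float]) -> str:
    """Rule-based polarity: the sign of the summed valence of known words."""
    score = sum(lexicon.get(token, 0.0) for token in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Hypothetical restaurant-domain graph and seeds, for illustration only.
edges = [("excellent", "tasty", 2.0), ("awful", "slow", 2.0), ("tasty", "cheap", 1.0), ("slow", "cheap", 1.0)]
seeds = {"excellent": 1.0, "awful": -1.0}
lexicon = propagate_valence(edges, seeds)
print(classify(["tasty", "but", "cheap"], lexicon))  # positive
```

Because the graph is built from a given domain's corpus, rerunning the propagation on another domain's graph yields a different lexicon from the same seeds.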
34

Ojanguren López, Ana Elvira. "Inflectional Variation in the Old English Participle. A Corpus-based Analysis." Journal of English Studies 16 (December 18, 2018): 237. http://dx.doi.org/10.18172/jes.3434.

Abstract:
This article deals with the coexistence of verbal and adjectival inflection in the Old English past participle. Its aim is to assess the degree of variation in the inflection of the participle in order to determine whether or not the change begins in the Old English period. The analysis is based on two corpora, the “York Corpus of Old English” and the “Dictionary of Old English Corpus”. Using these corpora, the inflection of the participle is analysed according to the following variables: genre (prose and verse), tense (present and past), morphological class (weak vs. strong) and case (nominative, accusative, genitive, dative and instrumental). The main conclusion of the article is that the quantitative evidence from the corpora shows a degree of variation in the Old English participle which indicates that diachronic change is underway. Overall, the past participle and poetic texts clearly reflect the loss of inflection, while the adjectival inflection of the participle co-occurs with its adjectival function.
35

Huang, Yinxia. "On the Validity of Corpus for Contrastive Analysis: Focusing on Korean-Chinese Contrast Analysis." Korean Society of Bilingualism 82 (March 31, 2021): 259–86. https://doi.org/10.17296/korbil.2021..82.259.

Abstract:
This study aims to analyze and verify the bilingual corpora used in Korean-Chinese contrastive analysis. To this end, Chapter 2 describes the problems of corpus construction and of the corpora used in contrastive linguistics. Chapter 3 analyzes the characteristics of the two kinds of bilingual corpus used in Korean-Chinese contrastive analysis: parallel corpora and comparative corpora. In Chapter 4, to examine the validity of the parallel corpora used in unidirectional Korean-Chinese contrastive studies, the corpora used in five different studies on the Korean particles ‘e(에)’ and ‘eseo(에서)’ and their Chinese correspondences were compared, as were the results of those studies. The five papers used different corpora and obtained different results for the same study subjects, which demonstrates the effect of parallel corpus structure on contrastive studies. Furthermore, we conducted a replication study focusing on the comparative study of Korean-Chinese suffixes. Given the large difference in size between the Korean and Chinese sub-corpora of the comparative corpus, we resized the Korean sub-corpus and replicated the study of the suffix. Increasing the size of the Korean corpus confirmed that the results on the Korean suffix depend significantly on corpus size. In summary, strict standards for representativeness, balance, and size must be applied when constructing parallel and comparative corpora for contrastive linguistics studies.
36

Ruseva, Petranka. "Application of Wordsmith 6.0 in English language teaching." Proceedings. College Dobrich VIII (December 25, 2015): 55–63. https://doi.org/10.5281/zenodo.10033273.

Abstract:
WordSmith 6.0 is lexical analysis software comprising programs that facilitate the examination of different corpora. These can be ready-made corpora (BNC, COCA, etc.) or texts selected by the individual researcher. The paper considers the application of WordSmith 6.0 to finding and easily processing examples of useful structures found in native speakers' language, which college students are then presented with and practise. The approach is partly based on the data-driven methodology in English language teaching. In general, three main steps are followed: text selection for the corpora, corpus processing, and suggestions for exercises based on the results from the processed corpora.
37

Titova, S. V., and S. D. Ignatova. "The technology of application of multimodal linguistic corpora for foreign language interaction skill development." Tambov University Review. Series: Humanities 29, no. 6 (2024): 1539–49. https://doi.org/10.20310/1810-0201-2024-29-6-1539-1549.

Abstract:
Importance. Linguistic corpora are a valuable resource for teaching foreign languages. Recently, there has been a surge in publications highlighting the didactic potential of these corpora for teaching not only lexico-grammatical skills but also productive skills. This trend is linked to the development of multimodal and learner corpora, the simplification of user interfaces, and the increased accessibility of corpus technologies. Consequently, there is a pressing need to develop and test new methodological models for integrating language corpora into teaching practices. Materials and Methods. During the study, a thorough analysis of the scientific literature has been conducted to establish the theoretical framework for the research. Methods of comparison, contrast, generalization are applied. Results and Discussion. The key phases in the data-driven approach to foreign language teaching development have been identified. The advantages of utilizing linguistic corpora in the language classroom have been outlined. A methodological model for working with multimodal corpora to enhance interaction skills is proposed, using the French-language corpus FLEURON as an example. Conclusion. Linguistic corpora have been utilized in foreign language teaching for over 40 years; however, their application in teaching practice remains limited. The didactic functions of linguistic corpora can optimize and enhance the educational process. Future research should focus on developing methodologies for integrating multimodal corpora into foreign language instruction.
38

Osipenko, Maria. "Directed Topic Extraction with Side Information for Sustainability Analysis." Analytics 3, no. 3 (2024): 389–405. http://dx.doi.org/10.3390/analytics3030021.

Abstract:
Topic analysis represents each document in a text corpus in a low-dimensional latent topic space. In some cases, the desired topic representation is subject to specific requirements or guidelines constituting side information. For instance, sustainability-aware investors might be interested in automatically assessing aspects of firm sustainability based on the textual content of its corporate reports, focusing on the established 17 UN sustainability goals. The main corpus consists of the corporate report texts, while the texts containing the definitions of the 17 UN sustainability goals represent the side information. Under the assumption that both text corpora share a common low-dimensional subspace, we propose representing them in such a space via directed topic extraction using matrix co-factorization. Both the main and the side text corpora are first represented as term–context matrices, which are then jointly decomposed into word–topic and topic–context matrices. The word–topic matrix is common to both text corpora, whereas the topic–context matrices contain specific representations in the shared topic space. A nuisance parameter, which allows us to shift the focus between the error minimization of individual factorization terms, controls the extent to which the side information is taken into account. With our approach, documents from the main and the side corpora can be related to each other in the resulting latent topic space. That is, the corporate reports are represented in the same latent topic space as the descriptions of the 17 UN sustainability goals, enabling a structured automatic sustainability assessment of the textual report’s content. We provide an algorithm for such directed topic extraction and propose techniques for visualizing and interpreting the results.
39

Wang, Huanyu. "Online Corpus Construction of English Text Collection, Data Cleaning, and Similarity Analysis." Mobile Information Systems 2022 (September 15, 2022): 1–8. http://dx.doi.org/10.1155/2022/3105790.

Abstract:
Corpora are used to analyze and study the characteristics of the target language. In language education, corpora play an increasingly important role owing to their large capacity, authenticity, rapid and accurate retrieval, and quick and easy statistics. At present, many universities are trying to apply textbook corpora to English teaching. However, most existing corpora suffer from poor sharing. In addition, these corpora may be limited to a specific textbook, so their retrieval and analysis results lack broad coverage. It is therefore necessary to develop a set of English corpora that is highly relevant, well shared, and easy to use by fully integrating existing teaching resources according to the characteristics of English courses in universities. In recent years, as computers have become increasingly widespread, corpus-assisted English language teaching has attracted widespread attention and exploration. A corpus-based teaching model can effectively eliminate the various drawbacks of traditional vocabulary teaching. A corpus contains a large amount of authentic language, and this authenticity and practicality help students master and use English vocabulary in real contexts. Moreover, the new model of corpus-assisted English vocabulary teaching can greatly increase independent learning and cooperative activities, thereby raising students' intrinsic motivation to learn. This study begins with a brief introduction to the concept and characteristics of corpora and explains the advantages of applying corpora in foreign language teaching.
At the same time, this research analyzes the shortcomings of existing corpora in university English education from the perspective of the current development and application of English corpora, and clarifies the importance of building a corpus of university English teaching materials. The system's operating environment and main development techniques are then determined according to the specific requirements of a corpus for university English textbooks. The overall and detailed design of the corpus and its management system were then carried out on the basis of the chosen technology platform. In addition, the structure of the database tables is analyzed, and the basic components and operating procedures of the system are introduced. The functional modules of the system are also designed. The automatic word and sentence segmentation methods for the original corpus, the corpus entry process, the cross-distance search of the corpus, and the statistical analysis of search results are discussed in detail. In conclusion, this study uses English text collection and data cleaning techniques to build an online corpus.
40

Li, Qing, Junwei Wang, and Yongbi Zhi. "A Corpus-based Critical Analysis of American Foreign Policies on China." Journal of Higher Education Research 5, no. 5 (2024): 389. https://doi.org/10.32629/jher.v5i5.2973.

Abstract:
Modality reflects the speaker's position, attitude or evaluation of a situation. This study collected American foreign policy discourse on human rights in China to create two small corpora. Three types of modality are compared to reveal the hidden ideology behind political discourse. It is found that: (1) the proportion of modal expressions in Obama's corpus is larger than in Trump's; (2) epistemic modality accounts for the largest proportion and volitional modality for the smallest; (3) the three types of modality contribute to different communicative purposes in American human rights diplomacy toward China.
41

Mengliye, B. R., Sh Hamroyeva, and O. Abdullayeva. "Scopus-based bibliometric analysis on corpus linguistics for the period of 2017-2021." E3S Web of Conferences 413 (2023): 03008. http://dx.doi.org/10.1051/e3sconf/202341303008.

Abstract:
This article aims to survey the latest scientific theories in the field of corpus linguistics and to analyze recent research trends in corpus linguistics and the creation of language corpora. Our results are based on a bibliometric analysis of research results and review articles from universities, research centers and well-known scientists in different countries where scientific and practical work is being carried out in corpus linguistics. We analyzed publications in the Scopus database in the field of corpus linguistics from 2017 to 2021 and identified research aimed at solving various problems in language corpora, including speech recognition, syntactic parsing, semantic tagging, and automatic tokenization and lemmatization. This is the first study in Uzbek linguistics to report on the landscape of corpus linguistics in recent years. It contributes to the general scientific understanding of corpus linguistics and provides insight into the past, present, and future of linguistics. The article analyzes 1,353 publications. Although corpus linguistics originated in the 1960s and 1970s, its fields of study have expanded and changed over time, and it remains a dynamic area of linguistics. In recent years, national corpora and target corpora have been created in various languages, and solutions to complex linguistic problems have been found.
42

Neves, Mariana. "An analysis on the entity annotations in biological corpora." F1000Research 3 (April 25, 2014): 96. http://dx.doi.org/10.12688/f1000research.3216.1.

Abstract:
Collections of documents annotated with semantic entities and relationships are crucial resources for supporting the development and evaluation of text mining solutions for the biomedical domain. Here I present an overview of 36 corpora and an analysis of the semantic annotations they contain. Annotations for entity types were classified into six semantic groups, and an overview of the semantic entities found in each corpus is given. The results show that while some semantic entities, such as genes, proteins and chemicals, are consistently annotated in many collections, corpora for diseases, variations and mutations are still few, despite their importance in the biological domain.
43

Bakić, Mirjana, Ivan Jovanović, Slađana Ugrenović, et al. "Parahippocampal corpora amylacea and neuronal lipofuscin in human aging." Open Medicine 8, no. 6 (2013): 749–61. http://dx.doi.org/10.2478/s11536-013-0214-1.

Abstract:
The aim of this research was to quantify the number of corpora amylacea and lipofuscin-bearing neurons in the parahippocampal region of the brain. Right parahippocampal gyrus specimens from 30 cadavers were used for histological and morphometric analyses. A combined Alcian Blue and Periodic Acid-Schiff technique was used to identify and quantify corpora amylacea and lipofuscin-bearing neurons. Immunohistochemistry was performed using S100 polyclonal, neuron-specific enolase and glial fibrillary acidic protein monoclonal antibodies to differentiate corpora amylacea from other spherical inclusions of the aging brain. Cluster analysis of the data showed three age groups (median age: I = 41.5, II = 68, III = 71.5). The second group had a significantly higher numerical density of subcortical corpora amylacea and number of lipofuscin-bearing neurons than the other two groups. In the third group, these parameters were higher than in the first (younger) group, but not significantly. Linear regression showed that the number of parahippocampal lipofuscin-bearing neurons significantly predicts the numerical density of subcortical corpora amylacea. These results suggest that the more numerous corpora amylacea and lipofuscin-bearing neurons in the parahippocampal region of some older individuals might be signs of quantitatively altered neuronal metabolism.
44

Clancy, Brian. "Conflict in corpora." Journal of Language Aggression and Conflict 6, no. 2 (2018): 228–47. http://dx.doi.org/10.1075/jlac.00011.cla.

Abstract:
The analysis of conflict in family discourse has often been characterised by ethnographic approaches and/or fine-grained analysis of unique conflict episodes. This article, by contrast, uses a c. 175,000-word spoken corpus of Irish family discourse, in conjunction with a corpus pragmatic approach, to explore specific linguistic aspects of conflict discourse. Conflict episodes are identified and analysed in the corpus using a range of linguistic “hooks” (Rühlemann 2010) that have previously been associated with prefacing disagreement, such as the marker well, mitigators (I think, I mean, I guess) or the counterargument strategy yes but. The analysis reveals that in conflict episodes family members most frequently use the yeah but strategy, which facilitates immediate disagreement. This strategy is often accompanied by a range of mitigators, predominantly in turn-final position, some of which have not previously been identified as indexing conflict sequences.
45

Schröter, Melani, and Marie Veniard. "Contrastive analysis of keywords in discourses." International Journal of Language and Culture 3, no. 1 (2016): 1–33. http://dx.doi.org/10.1075/ijolc.3.1.01sch.

Abstract:
This article suggests a theoretical and methodological framework for a systematic contrastive discourse analysis across languages and discourse communities through keywords. This constitutes a lexical approach to discourse analysis which is considered to be particularly fruitful for comparative analysis. We use a corpus-assisted methodology, presuming meaning to be constituted, revealed, and constrained by collocation environment. We compare the use of the keywords intégration/Integration in French and German public discourses about migration on the basis of newspaper corpora built from two French and German newspapers from 1998 to 2011. We look at the frequency of these keywords over the given time span, group collocates into thematic categories, and discuss indicators of discursive salience by comparing the development of collocation profiles over time in both corpora as well as the occurrence of neologisms and compounds based on intégration/Integration.
46

Джуманиязова, Интизор. "Corpus Linguistics as the Newest Direction in Linguistics" [in Russian: КОРПУСНАЯ ЛИНГВИСТИКА КАК НОВЕЙШЕЕ НАПРАВЛЕНИЕ В ЯЗЫКОЗНАНИИ]. TAMADDUN NURI JURNALI 8, no. 59 (2024): 65–67. http://dx.doi.org/10.69691/27gybh22.

Abstract:
Currently, there is a fairly large number of language corpora, including Russian-language corpora, which differ from each other in a variety of ways. The article briefly analyzes the history of corpus linguistics, defines its tasks and methods, identifies its interaction with other linguistic disciplines, gives a general description of the corpus as the basic concept of corpus linguistics, and presents a classification of text corpora.
47

Soloviev, F. N. "Embedding Additional Natural Language Processing Tools into the TXM Platform." Vestnik NSU. Series: Information Technologies 18, no. 1 (2020): 74–82. http://dx.doi.org/10.25205/1818-7900-2020-18-1-74-82.

Abstract:
We describe the integration of natural language processing tools (pseudostem extraction, noun phrase extraction, verb government analysis) that extend the analytic facilities of the TXM corpus analysis platform. The tools introduced in the paper are combined into a single software package that provides the TXM platform with an effective tool for preparing specialized corpora for further analysis.
48

Klochikhin, Vitaliy V. "Psychological and pedagogical conditions for the development of students’ collocational competence based on corpora." Tambov University Review. Series: Humanities, no. 2 (2023): 395–404. http://dx.doi.org/10.20310/1810-0201-2023-28-2-395-404.

Abstract:
Importance. The development of linguistics students’ collocational competence is one of the main goals of foreign language teaching at university. Corpora are among the modern digital technologies that can be used to achieve this goal. However, the methods of developing collocational competence based on corpora face a number of problems, and addressing them will affect the effectiveness of these teaching methods. The purpose of the study is to determine the psychological and pedagogical conditions for effective corpus-based development of collocational competence. Research methods. The psychological and pedagogical conditions were identified on the basis of a theoretical analysis of the scientific and methodological literature and a systematization of researchers' conceptual approaches. Using comparative analysis and synthesis of empirical data, the effectiveness of taking these conditions into account in corpus-based collocational competence development was theoretically substantiated. Results and Discussion. Psychological and pedagogical conditions for corpus-based collocational competence development are identified and substantiated; they must be taken into account when developing a method for teaching collocations. These conditions include: a) students' motivation to develop collocational competence based on corpora; b) the foreign language teacher's competence in information and communication technologies; c) students' foreign language proficiency at level B1; d) adherence to the identified stages of project activities. Conclusions. Corpus-based collocational competence development should be carried out in compliance with the entire set of identified psychological and pedagogical conditions. The results can be used to improve the effectiveness of corpus-based collocation teaching methods.
49

Yao, Jiayi, Hui Chen, and Yuan Liu. "Research on Constructing “Parallel Contrast Corpus of Grammatical Errors”." Journal of Language Teaching and Research 11, no. 5 (2020): 756. http://dx.doi.org/10.17507/jltr.1105.10.

Abstract:
Error analysis and interlanguage are two core topics in second language acquisition research. Researchers have conducted studies and built corpora from various perspectives based on Big Data. However, most existing interlanguage corpora provide no feedback to students, which hinders the efficiency of their self-study. Additionally, interlanguage systems are influenced by learners’ nationalities, yet nationality-specific interlanguage corpora have not been constructed. Based on previous studies and an error analysis of BNU-Cardiff Chinese College students, this study proposes an idea and model of a “Parallel Contrast Corpus of Grammatical Errors” for native English speakers learning Chinese.
50

Fantinuoli, Claudio. "Revisiting corpus creation and analysis tools for translation tasks." Cadernos de Tradução 36, no. 1 (2016): 62. http://dx.doi.org/10.5007/2175-7968.2016v36nesp1p62.

Abstract:
Many translation scholars have proposed the use of corpora to help professional translators produce high-quality texts that read like originals. Yet the diffusion of this methodology has been modest, one reason being that software for corpus analysis has been developed with the linguist in mind: such tools are generally complex and cumbersome, offering many advanced features but lacking the usability and the specific features that meet translators’ needs. To overcome this shortcoming, we have developed TranslatorBank, a free corpus creation and analysis tool designed for translation tasks. TranslatorBank supports the creation of specialized monolingual corpora from the web; it includes a concordancer with a query system similar to a search engine; it uses basic statistical measures to indicate the reliability of results; it accesses the original documents directly for more contextual information; and it includes a statistical and linguistic terminology extraction utility that extracts the relevant terminology of the domain and the typical collocations of a given term. Designed to be easy and intuitive to use, the tool may help translation students as well as professionals improve their translation quality by adhering to the specific linguistic variety of the target text corpus.