To see the other types of publications on this topic, follow the link: Statistical Language Model.

Journal articles on the topic 'Statistical Language Model'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the top 50 journal articles for your research on the topic 'Statistical Language Model.'

Next to every source in the list of references there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles on a wide variety of disciplines and organise your bibliography correctly.

1

Bellegarda, Jerome R. "Statistical language model adaptation: review and perspectives." Speech Communication 42, no. 1 (2004): 93–108. http://dx.doi.org/10.1016/j.specom.2003.08.002.

2

Shu, Peng, and Sun Cuiqin. "A Statistical English Syntax Analysis Model Based on Linguistic Evaluation Information." Security and Communication Networks 2022 (July 30, 2022): 1–7. http://dx.doi.org/10.1155/2022/3766417.

Abstract:
Language evaluation research currently focuses on the analysis of scholars from various native language backgrounds, whereas the local grammatical characteristics of other groups, particularly English language learners, are discussed less frequently. Local grammar offers a new perspective for analyzing the meaning characteristics of evaluation languages from the point of view of the people who employ them. In order to provide context for this paper, past research on local syntax is reviewed. The language model generates text that can be analyzed to determine the model’s aggressiveness when perturbed. To evaluate the method’s precision and efficacy, we compared the aggressiveness of pretrained models under various conditions using an English database. The results demonstrate that the method is capable of automatically and effectively evaluating the aggressiveness of language models. We then examine the scales of model parameters and the relationships between words in the training corpus.
3

Sennrich, Rico. "Modelling and Optimizing on Syntactic N-Grams for Statistical Machine Translation." Transactions of the Association for Computational Linguistics 3 (December 2015): 169–82. http://dx.doi.org/10.1162/tacl_a_00131.

Abstract:
The role of language models in SMT is to promote fluent translation output, but traditional n-gram language models are unable to capture fluency phenomena between distant words, such as some morphological agreement phenomena, subcategorisation, and syntactic collocations with string-level gaps. Syntactic language models have the potential to fill this modelling gap. We propose a language model for dependency structures that is relational rather than configurational and thus particularly suited for languages with a (relatively) free word order. It is trainable with Neural Networks, and not only improves over standard n-gram language models, but also outperforms related syntactic language models. We empirically demonstrate its effectiveness in terms of perplexity and as a feature function in string-to-tree SMT from English to German and Russian. We also show that using a syntactic evaluation metric to tune the log-linear parameters of an SMT system further increases translation quality when coupled with a syntactic language model.
4

Bahl, L. R., P. F. Brown, P. V. de Souza, and R. L. Mercer. "A tree-based statistical language model for natural language speech recognition." IEEE Transactions on Acoustics, Speech, and Signal Processing 37, no. 7 (1989): 1001–8. http://dx.doi.org/10.1109/29.32278.

5

Coloma, Germán. "Towards a Synergetic Statistical Model of Language Phonology." Journal of Quantitative Linguistics 21, no. 2 (2014): 100–122. http://dx.doi.org/10.1080/09296174.2014.882184.

6

Bril, Isabelle, Achraf Lassoued, and Michel de Rougemont. "A Statistical Model for Morphology Inspired by the Amis Language." International journal of Web & Semantic Technology 13, no. 02 (2022): 1–17. http://dx.doi.org/10.5121/ijwest.2022.13201.

Abstract:
We introduce a statistical model for analysing the morphology of natural languages based on their affixes. The model was inspired by the analysis of Amis, an Austronesian language with a rich morphology. As words contain a root and potential affixes, we associate three vectors with each word: one for the root, one for the prefixes, and one for the suffixes. The morphology captures semantic notions and we show how to approximately predict some of them, for example the type of simple sentences using prefixes and suffixes only. We then define a Sentence vector s associated with each sentence, built from the prefixes and suffixes of the sentence and show how to approximately predict a derivation tree in a grammar.
7

Xiong, Deyi, and Min Zhang. "Backward and trigger-based language models for statistical machine translation." Natural Language Engineering 21, no. 2 (2013): 201–26. http://dx.doi.org/10.1017/s1351324913000168.

Abstract:
The language model is one of the most important knowledge sources for statistical machine translation. In this article, we present two extensions to standard n-gram language models in statistical machine translation: a backward language model that augments the conventional forward language model, and a mutual information trigger model which captures long-distance dependencies that go beyond the scope of standard n-gram language models. We introduce algorithms to integrate the two proposed models into two kinds of state-of-the-art phrase-based decoders. Our experimental results on Chinese/Spanish/Vietnamese-to-English show that both models are able to significantly improve translation quality in terms of BLEU and METEOR over a competitive baseline.
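
As a rough illustration of the backward idea described in this abstract (a sketch over a hypothetical toy corpus, not the authors' implementation): a backward n-gram model is ordinary n-gram machinery trained and applied on reversed sentences.

    import math
    from collections import Counter

    corpus = [["we", "translate", "text"], ["we", "score", "text"]]  # toy data

    def train_bigram(sentences):
        # collect unigram and bigram counts with sentence boundary markers
        uni, bi = Counter(), Counter()
        for s in sentences:
            toks = ["<s>"] + s + ["</s>"]
            uni.update(toks)
            bi.update(zip(toks, toks[1:]))
        return uni, bi

    def logprob(sentence, uni, bi):
        # add-one smoothed bigram log-probability of a sentence
        toks = ["<s>"] + sentence + ["</s>"]
        v = len(uni)
        return sum(math.log((bi[(a, b)] + 1) / (uni[a] + v))
                   for a, b in zip(toks, toks[1:]))

    fwd_uni, fwd_bi = train_bigram(corpus)
    bwd_uni, bwd_bi = train_bigram([s[::-1] for s in corpus])  # reversed corpus

    s = ["we", "translate", "text"]
    print(logprob(s, fwd_uni, fwd_bi))        # forward score
    print(logprob(s[::-1], bwd_uni, bwd_bi))  # backward score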
8

Kipyatkova, Irina Sergeevna, and Alexey Anatolyevich Karpov. "Development and Research of a Statistical Russian Language Model." SPIIRAS Proceedings 1, no. 12 (2014): 35. http://dx.doi.org/10.15622/sp.12.3.

9

Mnih, Andriy, Zhang Yuecheng, and Geoffrey Hinton. "Improving a statistical language model through non-linear prediction." Neurocomputing 72, no. 7-9 (2009): 1414–18. http://dx.doi.org/10.1016/j.neucom.2008.12.025.

10

O’Boyle, P., M. Owens, and F. J. Smith. "A study of a statistical model of natural language." Irish Journal of Psychology 14, no. 3 (1993): 382–96. http://dx.doi.org/10.1080/03033910.1993.10557945.

11

Deligne, Sabine, and Yoshinori Sagisaka. "Statistical language modeling with a class-based n-multigram model." Computer Speech & Language 14, no. 3 (2000): 261–79. http://dx.doi.org/10.1006/csla.2000.0146.

12

Carter, Simon, and Christof Monz. "Syntactic discriminative language model rerankers for statistical machine translation." Machine Translation 25, no. 4 (2011): 317–39. http://dx.doi.org/10.1007/s10590-011-9108-7.

13

Dorado, Rubén. "Statistical models for languaje representation." Revista Ontare 1, no. 1 (2015): 29. http://dx.doi.org/10.21158/23823399.v1.n1.2013.1208.

Abstract:
This paper discusses several models for the computational representation of language. First, some n-gram models based on Markov models are introduced. Second, the family of exponential models is considered; this family in particular allows the incorporation of several features into the model. Third, a recent line of research, the probabilistic Bayesian approach, is discussed. In this kind of model, language is modelled as a probability distribution, and several distributions and probabilistic processes, such as the Dirichlet distribution and the Pitman-Yor process, are used to approximate linguistic phenomena. Finally, the problem of the sparseness of language and its common solution, known as smoothing, is discussed.
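
To make the smoothing problem mentioned at the end concrete, here is a minimal sketch (toy counts and an illustrative interpolation weight, not taken from the paper) of linearly interpolating bigram and unigram estimates so that unseen bigrams keep nonzero probability:

    from collections import Counter

    tokens = "the cat sat on the mat the cat ate".split()  # toy corpus
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    N = len(tokens)
    LAMBDA = 0.7  # illustrative interpolation weight

    def p_interp(word, prev):
        # interpolate the bigram MLE with the unigram MLE;
        # the unigram term redistributes mass to unseen bigrams
        p_bi = bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0
        p_uni = unigrams[word] / N
        return LAMBDA * p_bi + (1 - LAMBDA) * p_uni

    print(p_interp("sat", "cat"))  # seen bigram: high probability
    print(p_interp("ate", "mat"))  # unseen bigram: small but nonzero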
14

Sulistyan, Darryl Yunus. "Factored Statistical Machine Translation for German-English." Journal of Applied Information, Communication and Technology 5, no. 1 (2018): 37–45. http://dx.doi.org/10.33555/ejaict.v5i1.47.

Abstract:
Machine translation automatically translates sentences from one language into another. This paper tests the effectiveness of a new model of machine translation, factored machine translation. We compare the performance of an unfactored system, as our baseline, against the factored model in terms of BLEU score. We test the model on the German-English language pair using the Europarl corpus, with the freely downloadable MOSES toolkit. We found, however, that the unfactored model scored over 24 BLEU and outperformed the factored model, which scored below 24 BLEU in all cases. In terms of words translated, however, all factored models outperformed the unfactored model.
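
For reference, the BLEU scores reported here are built from modified (clipped) n-gram precisions; below is a minimal sketch up to bigrams with a brevity penalty (real BLEU uses up to 4-grams and possibly multiple references):

    import math
    from collections import Counter

    def ngrams(tokens, n):
        return Counter(zip(*(tokens[i:] for i in range(n))))

    def bleu2(candidate, reference):
        # geometric mean of clipped 1-gram and 2-gram precisions
        precisions = []
        for n in (1, 2):
            cand, ref = ngrams(candidate, n), ngrams(reference, n)
            clipped = sum(min(c, ref[g]) for g, c in cand.items())
            precisions.append(clipped / max(1, sum(cand.values())))
        if 0 in precisions:
            return 0.0
        # brevity penalty discourages very short candidates
        bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
        return bp * math.exp(sum(map(math.log, precisions)) / 2)

    print(bleu2("the cat sat on the mat".split(),
                "the cat is on the mat".split()))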
15

Zhang, Jing, Changhai Wang, Annamalai Muthu, and V. M. Varatharaju. "Computer multimedia assisted language and literature teaching using Heuristic hidden Markov model and statistical language model." Computers & Electrical Engineering 98 (March 2022): 107715. http://dx.doi.org/10.1016/j.compeleceng.2022.107715.

16

Sujaini, Herry. "Improving the role of language model in statistical machine translation (Indonesian-Javanese)." International Journal of Electrical and Computer Engineering (IJECE) 10, no. 2 (2020): 2102. http://dx.doi.org/10.11591/ijece.v10i2.pp2102-2109.

Abstract:
Statistical machine translation (SMT) has been widely used by researchers and practitioners in recent years. The quality of SMT is determined by several important factors, two of which are the language model and the translation model. Research on improving the translation model has been extensive, but optimizing the language model for machine translation has received less attention. Machine translation systems usually use a trigram language model as standard. In this paper, we conduct experiments with four strategies to analyze the role of the language model in an Indonesian-Javanese translation system and show improvement over a baseline system with the standard language model. The results of this research indicate that a 3-gram language model is highly recommended in SMT.
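
The effect of n-gram order that this paper investigates can be sketched by measuring held-out perplexity under different orders; the toy data below is hypothetical, and add-one smoothing stands in for the stronger smoothing real toolkits use:

    import math
    from collections import Counter

    train = "ini adalah contoh kalimat ini adalah contoh lain".split()
    test = "ini adalah kalimat lain".split()

    def perplexity(test_toks, train_toks, n):
        # add-one smoothed n-gram model of the given order
        grams = Counter(zip(*(train_toks[i:] for i in range(n))))
        hists = Counter(zip(*(train_toks[i:] for i in range(n - 1))))
        vocab = len(set(train_toks))
        logp, count = 0.0, 0
        for i in range(len(test_toks) - n + 1):
            g = tuple(test_toks[i:i + n])
            denom = (hists[g[:-1]] if n > 1 else len(train_toks)) + vocab
            logp += math.log((grams[g] + 1) / denom)
            count += 1
        return math.exp(-logp / max(1, count))

    for order in (2, 3):
        print(order, perplexity(test, train, order))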
17

Nazar, Rogelio. "A statistical approach to term extraction." International Journal of English Studies 11, no. 2 (2011): 159. http://dx.doi.org/10.6018/ijes/2011/2/149691.

Abstract:
This paper argues in favor of a statistical approach to terminology extraction, general to all languages but with language-specific parameters. In contrast to many application-oriented terminology studies, which are focused on a particular language and domain, this paper adopts some general principles of the statistical properties of terms and a method to obtain the corresponding language-specific parameters. This method is used for the automatic identification of terminology and is quantitatively evaluated in an empirical study of English medical terms. The proposal is theoretically and computationally simple and disregards resources such as linguistic or ontological knowledge. The algorithm learns to identify terms during a training phase where it is shown examples of both terminological and non-terminological units. With these examples, the algorithm creates a model of the terminology that accounts for the frequency of lexical, morphological and syntactic elements of the terms in relation to the non-terminological vocabulary. The model is then used for the later identification of new terminology in previously unseen text. The comparative evaluation shows that performance is significantly higher than that of other well-known systems.
18

Nikitina, Larisa, Fumitaka Furuoka, and Nurliana Kamaruddin. "Language Attitudes and L2 Motivation of Korean Language Learners in Malaysia." Journal of Language and Education 6, no. 2 (2020): 132–46. http://dx.doi.org/10.17323/jle.2020.10716.

Abstract:
This study examined relationships between language attitudes and L2 motivation of learners of Korean as a Foreign Language (KFL) in a large public university in Malaysia. It employed the socio-educational model of L2 motivation and focused on the relationship between the language learners’ attitudes toward speakers of the target language and their motivation to learn Korean. A systematic statistical analysis was performed to analyse the data collected from 19 (N=19) students. A robust statistical procedure adopted in this study allowed some worthwhile insights into the language attitudes–L2 motivation nexus. The findings indicated that there existed a statistically significant relationship between the language learners’ instrumental orientation and their attitudes toward the speakers of Korean language.
19

Takahashi, Shuntaro, and Kumiko Tanaka-Ishii. "Evaluating Computational Language Models with Scaling Properties of Natural Language." Computational Linguistics 45, no. 3 (2019): 481–513. http://dx.doi.org/10.1162/coli_a_00355.

Abstract:
In this article, we evaluate computational models of natural language with respect to the universal statistical behaviors of natural language. Statistical mechanical analyses have revealed that natural language text is characterized by scaling properties, which quantify the global structure in the vocabulary population and the long memory of a text. We study whether five scaling properties (given by Zipf’s law, Heaps’ law, Ebeling’s method, Taylor’s law, and long-range correlation analysis) can serve for evaluation of computational models. Specifically, we test n-gram language models, a probabilistic context-free grammar, language models based on Simon/Pitman-Yor processes, neural language models, and generative adversarial networks for text generation. Our analysis reveals that language models based on recurrent neural networks with a gating mechanism (i.e., long short-term memory; a gated recurrent unit; and quasi-recurrent neural networks) are the only computational models that can reproduce the long memory behavior of natural language. Furthermore, through comparison with recently proposed model-based evaluation methods, we find that the exponent of Taylor’s law is a good indicator of model quality.
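
One of the scaling properties named here, Zipf's law, can be checked with a few lines: fit log frequency against log rank by least squares (the text below is a toy stand-in; a real corpus is needed for a meaningful exponent):

    import math
    from collections import Counter

    text = ("the quick brown fox jumps over the lazy dog " * 50 +
            "the cat sat on the mat " * 30).split()  # toy stand-in corpus

    freqs = sorted(Counter(text).values(), reverse=True)
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    # Zipf's law: log f(r) ~ c - alpha * log r; estimate alpha by least squares
    alpha = -(sum((x - mx) * (y - my) for x, y in zip(xs, ys))
              / sum((x - mx) ** 2 for x in xs))
    print(f"estimated Zipf exponent: {alpha:.2f}")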
20

Kadir, R. A., and R. A. Yauri. "Resource description framework triples entity formations using statistical language model." Journal of Fundamental and Applied Sciences 9, no. 4S (2018): 710. http://dx.doi.org/10.4314/jfas.v9i4s.40.

21

Sudoh, Katsuhito, and Mikio Nakano. "Post-dialogue confidence scoring for unsupervised statistical language model training." Speech Communication 45, no. 4 (2005): 387–400. http://dx.doi.org/10.1016/j.specom.2004.10.017.

22

Wang, Rui, Hai Zhao, Bao-Liang Lu, Masao Utiyama, and Eiichiro Sumita. "Bilingual Continuous-Space Language Model Growing for Statistical Machine Translation." IEEE/ACM Transactions on Audio, Speech, and Language Processing 23, no. 7 (2015): 1209–20. http://dx.doi.org/10.1109/taslp.2015.2425220.

23

Lee, Jung-Hun, Minho Kim, and Hyuk-Chul Kwon. "Improved Statistical Language Model for Context-sensitive Spelling Error Candidates." Journal of Korea Multimedia Society 20, no. 2 (2017): 371–81. http://dx.doi.org/10.9717/kmms.2017.20.2.371.

24

Fang, Gang, Wenbin Liu, and Shemin Zhang. "Automated DNA Assembly Based on Four-Gram Statistical Language Model." Chinese Journal of Electronics 27, no. 6 (2018): 1200–1205. http://dx.doi.org/10.1049/cje.2018.09.007.

25

Huang, Bo, and Xijun Lan. "English corpus and literary analysis based on statistical language model." Cluster Computing 22, S6 (2018): 14897–903. http://dx.doi.org/10.1007/s10586-018-2454-y.

26

Hu, Xianyao, Richard Xiao, and Andrew Hardie. "How do English translations differ from non-translated English writings? A multi-feature statistical model for linguistic variation analysis." Corpus Linguistics and Linguistic Theory 15, no. 2 (2019): 347–82. http://dx.doi.org/10.1515/cllt-2014-0047.

Abstract:
This paper discusses the debatable hypotheses of "Translation Universals", i.e. the recurring common features of translated texts in relation to original utterances. We propose that, if translational language does have some distinctive linguistic features in contrast to non-translated writings in the same language, those differences should be statistically significant, consistently distributed and systematically co-occurring across registers and genres. Based on the balanced Corpus of Translational English (COTE) and its non-translated English counterpart, the Freiburg-LOB corpus of British English (FLOB), and by deploying a multi-feature statistical analysis on 96 lexical, syntactic and textual features, we try to pinpoint those distinctive features in translated English texts. We also propose that the stylo-statistical model developed in this study will be effective not only in analysing the translational variation of English but will also be capable of clustering those variational features into a "translational" dimension, which will facilitate a crosslinguistic comparison of translational languages (e.g. translational Chinese) to test the Translation Universals hypotheses.
27

Tür, Gökhan, Dilek Hakkani-Tür, and Kemal Oflazer. "A statistical information extraction system for Turkish." Natural Language Engineering 9, no. 2 (2003): 181–210. http://dx.doi.org/10.1017/s135132490200284x.

Abstract:
This paper presents the results of a study on information extraction from unrestricted Turkish text using statistical language processing methods. In languages like English, there is a very small number of possible word forms with a given root word. However, languages like Turkish have very productive agglutinative morphology. Thus, it is an issue to build statistical models for specific tasks using the surface forms of the words, mainly because of the data sparseness problem. In order to alleviate this problem, we used additional syntactic information, i.e. the morphological structure of the words. We have successfully applied statistical methods using both the lexical and morphological information to sentence segmentation, topic segmentation, and name tagging tasks. For sentence segmentation, we have modeled the final inflectional groups of the words and combined it with the lexical model, and decreased the error rate to 4.34%, which is 21% better than the result obtained using only the surface forms of the words. For topic segmentation, stems of the words (especially nouns) have been found to be more effective than using the surface forms of the words and we have achieved 10.90% segmentation error rate on our test set according to the weighted TDT-2 segmentation cost metric. This is 32% better than the word-based baseline model. For name tagging, we used four different information sources to model names. Our first information source is based on the surface forms of the words. Then we combined the contextual cues with the lexical model, and obtained some improvement. After this, we modeled the morphological analyses of the words, and finally we modeled the tag sequence, and reached an F-Measure of 91.56%, according to the MUC evaluation criteria. Our results are important in the sense that, using linguistic information, i.e. morphological analyses of the words, and a corpus large enough to train a statistical model significantly improves these basic information extraction tasks for Turkish.
28

Xu, Weifeng, Dianxiang Xu, Abdulrahman Alatawi, Omar El Ariss, and Yunkai Liu. "Statistical Unigram Analysis for Source Code Repository." International Journal of Semantic Computing 12, no. 02 (2018): 237–60. http://dx.doi.org/10.1142/s1793351x18400123.

Abstract:
The unigram is a fundamental element of the n-gram in natural language processing. However, unigrams collected from a natural language corpus are unsuitable for solving problems in the domain of computer programming languages. In this paper, we analyze the properties of unigrams collected from an ultra-large source code repository. Specifically, we have collected 1.01 billion unigrams from 0.7 million open source projects hosted at GitHub.com. By analyzing these unigrams, we have discovered statistical properties regarding (1) how developers name variables, methods, and classes, and (2) how developers choose abbreviations. We describe a probabilistic model which relies on these properties for solving a well-known problem in source code analysis: how to expand a given abbreviation to its original intended word. Our empirical study shows that using the unigrams extracted from the source code repository outperforms using a natural language corpus by 21% when solving these domain-specific problems.
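
As a sketch of the kind of probabilistic expansion this abstract describes (hypothetical counts and a simple subsequence matcher, not the authors' model), candidate expansions can be ranked by relative unigram frequency:

    from collections import Counter

    # hypothetical unigram counts harvested from source code identifiers
    counts = Counter({"message": 950, "manager": 700, "messenger": 120,
                      "merge": 400})

    def is_subsequence(abbrev, word):
        # an abbreviation matches if its letters appear in order in the word
        it = iter(word)
        return all(ch in it for ch in abbrev)

    def expand(abbrev):
        # rank matching candidates by relative unigram frequency
        cands = {w: c for w, c in counts.items() if is_subsequence(abbrev, w)}
        total = sum(cands.values()) or 1
        return sorted(((w, c / total) for w, c in cands.items()),
                      key=lambda pair: -pair[1])

    print(expand("msg"))  # 'message' should rank first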
29

Scarinci, Janice Lee, and Edward Howell. "Implementing a Cultural Model to Increase English-Language Proficiency at an International College." Management:Journal of Sustainable Business and Management Solutions in Emerging Economies 23, no. 2 (2018): 49. http://dx.doi.org/10.7595/management.fon.2018.0014.

Abstract:
Research Question: The purpose of this study was to determine whether the addition of an American Cultural Model to an existing English as a Second Language (ESL) program improved the performance of international students. Idea: English language proficiency is essential for students in global emerging economies to be competitive, and our study can be generalized to learning other languages within the respective cultural model. Motivation: The results of our study can be applied to higher education worldwide, since the current international business language is English. Data: The data collected were analyzed and interpreted to determine whether cultural training improved scores on the Test of English as a Foreign Language (TOEFL). Tools: Two groups of incoming students were compared as the treatment and control groups, using the t-test with an appropriate statistical package. Findings: Data analysis showed a statistically significant difference in TOEFL scores between the control group and the experimental group, which benefited from the introduction of the American Cultural Model. Contribution: English language proficiency is essential for students in global emerging economies to be competitive, and our study can be generalized to learning other languages within a respective cultural model.
30

Zhou, Deyu, and Yulan He. "Semi-Supervised Learning of Statistical Models for Natural Language Understanding." Scientific World Journal 2014 (2014): 1–11. http://dx.doi.org/10.1155/2014/121650.

Abstract:
Natural language understanding aims to specify a computational model that maps sentences to their semantic meaning representations. In this paper, we propose a novel framework to train statistical models without using expensive fully annotated data. In particular, the input of our framework is a set of sentences labeled with abstract semantic annotations. These annotations encode the underlying embedded semantic structural relations without explicit word/semantic tag alignment. The proposed framework can automatically induce derivation rules that map sentences to their semantic meaning representations. The learning framework is applied to two statistical models, conditional random fields (CRFs) and hidden Markov support vector machines (HM-SVMs). Our experimental results on the DARPA communicator data show that both CRFs and HM-SVMs outperform the baseline approach, the previously proposed hidden vector state (HVS) model, which is also trained on abstract semantic annotations. In addition, the proposed framework shows performance superior to two other baseline approaches, a hybrid framework combining HVS and HM-SVMs and discriminative training of HVS, with relative error reduction rates of about 25% and 15% being achieved in F-measure.
31

Mukhamadiyev, Abdinabi, Mukhriddin Mukhiddinov, Ilyos Khujayarov, Mannon Ochilov, and Jinsoo Cho. "Development of Language Models for Continuous Uzbek Speech Recognition System." Sensors 23, no. 3 (2023): 1145. http://dx.doi.org/10.3390/s23031145.

Abstract:
Automatic speech recognition systems with a large vocabulary and other natural language processing applications cannot operate without a language model. Most studies on pre-trained language models have focused on more popular languages such as English, Chinese, and various European languages, but there is no publicly available Uzbek speech dataset. Therefore, language models of low-resource languages need to be studied and created. The objective of this study is to address this limitation by developing a low-resource language model for the Uzbek language and understanding linguistic occurrences. We proposed the Uzbek language model named UzLM by examining the performance of statistical and neural-network-based language models that account for the unique features of the Uzbek language. Our Uzbek-specific linguistic representation allows us to construct more robust UzLM, utilizing 80 million words from various sources while using the same or fewer training words, as applied in previous studies. Roughly sixty-eight thousand different words and 15 million sentences were collected for the creation of this corpus. The experimental results of our tests on the continuous recognition of Uzbek speech show that, compared with manual encoding, the use of neural-network-based language models reduced the character error rate to 5.26%.
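
The character error rate reported here is edit distance divided by reference length; a compact sketch of the metric (the Uzbek strings are made up for illustration):

    def cer(hypothesis, reference):
        # Levenshtein distance between character sequences, normalized
        # by the reference length
        prev = list(range(len(hypothesis) + 1))
        for i, rc in enumerate(reference, 1):
            cur = [i]
            for j, hc in enumerate(hypothesis, 1):
                cur.append(min(prev[j] + 1,                  # deletion
                               cur[j - 1] + 1,               # insertion
                               prev[j - 1] + (rc != hc)))    # substitution
            prev = cur
        return prev[-1] / max(1, len(reference))

    print(cer("salom dunyo", "salom dunya"))  # one substitution -> ~0.09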
32

Buk, Solomiya. "Lexical base as a compressed language model of the world (on material from the Ukrainian language)." Psychology of Language and Communication 13, no. 2 (2009): 35–44. http://dx.doi.org/10.2478/v10057-009-0008-3.

Abstract:
The article verifies that a list of words selected by formal statistical methods (frequency and functional genre unrestrictedness) is not a conglomerate of unrelated words: it forms a system of interrelated items and can be called the "lexical base of the language". This selected list of words covers all spheres of human activity. To verify this statement, an invariant synoptic scheme common to the ideographic dictionaries of different languages was determined.
33

Yang, Jianfeng, Bruce D. McCandliss, Hua Shu, and Jason D. Zevin. "Simulating language-specific and language-general effects in a statistical learning model of Chinese reading." Journal of Memory and Language 61, no. 2 (2009): 238–57. http://dx.doi.org/10.1016/j.jml.2009.05.001.

34

Shen, Libin, Jinxi Xu, and Ralph Weischedel. "String-to-Dependency Statistical Machine Translation." Computational Linguistics 36, no. 4 (2010): 649–71. http://dx.doi.org/10.1162/coli_a_00015.

Abstract:
We propose a novel string-to-dependency algorithm for statistical machine translation. This algorithm employs a target dependency language model during decoding to exploit long distance word relations, which cannot be modeled with a traditional n-gram language model. Experiments show that the algorithm achieves significant improvement in MT performance over a state-of-the-art hierarchical string-to-string system on NIST MT06 and MT08 newswire evaluation sets.
35

Wu, H. C., R. W. P. Luk, K. F. Wong, and J. Y. Nie. "Binary Independence Language Model in a Relevance Feedback Environment." International Journal of Software Engineering and Knowledge Engineering 29, no. 06 (2019): 873–95. http://dx.doi.org/10.1142/s021819401950030x.

Abstract:
Model construction is a kind of knowledge engineering, and building retrieval models is critical to the success of search engines. This article proposes a new (retrieval) language model, called binary independence language model (BILM). It integrates two document-context based language models together into one by the log-odds ratio where these two are language models applied to describe document-contexts of query terms. One model is based on relevance information while the other is based on the non-relevance information. Each model incorporates link dependencies and multiple query term dependencies. The probabilities are interpolated between the relative frequency and the background probabilities. In a simulated relevance feedback environment of top 20 judged documents, our BILM performed statistically significantly better than the other highly effective retrieval models at 95% confidence level across four TREC collections using fixed parameter values for the mean average precision. For the less stable performance measure (i.e. precision at the top 10), no statistical significance is shown between the different models for the individual test collections although numerically our BILM is better than two other models with a confidence level of 95% based on a paired sign test across the test collections of both relevance feedback and retrospective experiments.
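
The central scoring idea, a log-odds ratio of two document-context language models, can be sketched as follows (all probabilities and the interpolation weight below are made-up illustrations, not the paper's estimates):

    import math

    # hypothetical term probabilities from relevant and non-relevant
    # document contexts, plus a background (collection) model
    p_rel = {"retrieval": 0.08, "model": 0.05, "music": 0.001}
    p_non = {"retrieval": 0.01, "model": 0.03, "music": 0.02}
    p_bg = {"retrieval": 0.005, "model": 0.02, "music": 0.01}
    MU = 0.8  # weight on the relative frequency vs. the background

    def log_odds(term):
        # interpolate each model with the background, then take log-odds
        rel = MU * p_rel.get(term, 0.0) + (1 - MU) * p_bg[term]
        non = MU * p_non.get(term, 0.0) + (1 - MU) * p_bg[term]
        return math.log(rel / non)

    for t in ("retrieval", "model", "music"):
        print(t, round(log_odds(t), 2))  # positive favours relevance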
36

Park, Youngki. "Automatic Generation of Multiple-Choice Questions Based on Statistical Language Model." Journal of The Korean Association of Information Education 20, no. 2 (2016): 197–206. http://dx.doi.org/10.14352/jkaie.20.2.197.

37

Hull, J. J. "Incorporating language syntax in visual text recognition with a statistical model." IEEE Transactions on Pattern Analysis and Machine Intelligence 18, no. 12 (1996): 1251–55. http://dx.doi.org/10.1109/34.546261.

38

Huang, Fei, Arun Ahuja, Doug Downey, Yi Yang, Yuhong Guo, and Alexander Yates. "Learning Representations for Weakly Supervised Natural Language Processing Tasks." Computational Linguistics 40, no. 1 (2014): 85–120. http://dx.doi.org/10.1162/coli_a_00167.

Abstract:
Finding the right representations for words is critical for building accurate NLP systems when domain-specific labeled data for the task is scarce. This article investigates novel techniques for extracting features from n-gram models, Hidden Markov Models, and other statistical language models, including a novel Partial Lattice Markov Random Field model. Experiments on part-of-speech tagging and information extraction, among other tasks, indicate that features taken from statistical language models, in combination with more traditional features, outperform traditional representations alone, and that graphical model representations outperform n-gram models, especially on sparse and polysemous words.
39

Dwivedi, Pankaj. "Review on Machine Translation from English to Kannada." International Journal for Research in Applied Science and Engineering Technology 10, no. 7 (2022): 3908–13. http://dx.doi.org/10.22214/ijraset.2022.45888.

Abstract:
Interlingual machine translation uses an artificial language to convey the meaning of real languages. The process of converting text from one language to another is known as machine translation. This study provides a model of a machine translation system for English-to-Kannada sentence translation that employs statistically based techniques, using the Moses approach. Moses is a statistical machine translation system; in statistical machine translation, systems are trained on large amounts of parallel data as well as even larger amounts of monolingual data. Parallel data is a set of sentences in two languages that are sentence-aligned, meaning that each sentence in one language is matched with its translated counterpart in the other. The Moses training technique takes the parallel data and infers translation correspondences between the two languages of interest by looking for co-occurrences of words and segments. The two main components of Moses are the training pipeline and the decoder. The training pipeline consists of a set of tools that take raw data and convert it into a machine translation model. The Moses decoder determines the highest-scoring sentence in the target language that matches a given source sentence.
40

Shutova, Ekaterina, Simone Teufel, and Anna Korhonen. "Statistical Metaphor Processing." Computational Linguistics 39, no. 2 (2013): 301–53. http://dx.doi.org/10.1162/coli_a_00124.

Abstract:
Metaphor is highly frequent in language, which makes its computational processing indispensable for real-world NLP applications addressing semantic tasks. Previous approaches to metaphor modeling rely on task-specific hand-coded knowledge and operate on a limited domain or a subset of phenomena. We present the first integrated open-domain statistical model of metaphor processing in unrestricted text. Our method first identifies metaphorical expressions in running text and then paraphrases them with their literal paraphrases. Such a text-to-text model of metaphor interpretation is compatible with other NLP applications that can benefit from metaphor resolution. Our approach is minimally supervised, relies on the state-of-the-art parsing and lexical acquisition technologies (distributional clustering and selectional preference induction), and operates with a high accuracy.
41

Basirat, A., H. Faili, and J. Nivre. "A statistical model for grammar mapping." Natural Language Engineering 22, no. 2 (2015): 215–55. http://dx.doi.org/10.1017/s1351324915000017.

Abstract:
The two main classes of grammars are (a) hand-crafted grammars, which are developed by language experts, and (b) data-driven grammars, which are extracted from annotated corpora. This paper introduces a statistical method for mapping the elementary structures of a data-driven grammar onto the elementary structures of a hand-crafted grammar in order to combine their advantages. The idea is employed in the context of Lexicalized Tree-Adjoining Grammars (LTAG) and tested on two LTAGs of English: the hand-crafted LTAG developed in the XTAG project, and the data-driven LTAG, which is automatically extracted from the Penn Treebank and used by the MICA parser. We propose a statistical model for mapping any elementary tree sequence of the MICA grammar onto a proper elementary tree sequence of the XTAG grammar. The model has been tested on three subsets of the WSJ corpus that have average lengths of 10, 16, and 18 words, respectively. The experimental results show that full-parse trees with average F1-scores of 72.49, 64.80, and 62.30 points could be built from 94.97%, 96.01%, and 90.25% of the XTAG elementary tree sequences assigned to the subsets, respectively. Moreover, by reducing the amount of syntactic lexical ambiguity of sentences, the proposed model significantly improves the efficiency of parsing in the XTAG system.
42

Gotoh, Yoshihiko, and Steve Renals. "Topic-based mixture language modelling." Natural Language Engineering 5, no. 4 (1999): 355–75. http://dx.doi.org/10.1017/s1351324900002278.

Abstract:
This paper describes an approach for constructing a mixture of language models based on simple statistical notions of semantics using probabilistic models developed for information retrieval. The approach encapsulates corpus-derived semantic information and is able to model varying styles of text. Using such information, the corpus texts are clustered in an unsupervised manner and a mixture of topic-specific language models is automatically created. The principal contribution of this work is to characterise the document space resulting from information retrieval techniques and to demonstrate the approach for mixture language modelling. A comparison is made between manual and automatic clustering in order to elucidate how the global content information is expressed in the space. We also compare (in terms of association with manual clustering and language modelling accuracy) alternative term-weighting schemes and the effect of singular value decomposition dimension reduction (latent semantic analysis). Test set perplexity results using the British National Corpus indicate that the approach can improve the potential of statistical language modelling. Using an adaptive procedure, the conventional model may be tuned to track text data with a slight increase in computational cost.
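
The mixture construction described here has a simple form, P(w) = sum over topics t of lambda_t * P(w | t); a sketch with two hypothetical topic models and illustrative weights:

    import math

    # hypothetical topic-specific unigram models (toy values)
    topics = {
        "sport":   {"match": 0.05,  "goal": 0.04,  "market": 0.001},
        "finance": {"match": 0.002, "goal": 0.001, "market": 0.06},
    }
    weights = {"sport": 0.3, "finance": 0.7}  # could be adapted per text
    FLOOR = 1e-6  # tiny floor for words missing from a topic model

    def p_mixture(word):
        # P(w) = sum_t lambda_t * P(w | t)
        return sum(weights[t] * topics[t].get(word, FLOOR) for t in topics)

    sentence = ["market", "goal"]
    logp = sum(math.log(p_mixture(w)) for w in sentence)
    print(math.exp(logp))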
43

Venkataraman, Anand. "A Statistical Model for Word Discovery in Transcribed Speech." Computational Linguistics 27, no. 3 (2001): 351–72. http://dx.doi.org/10.1162/089120101317066113.

Abstract:
A statistical model for segmentation and word discovery in continuous speech is presented. An incremental unsupervised learning algorithm to infer word boundaries based on this model is described. Results are also presented of empirical tests showing that the algorithm is competitive with other models that have been used for similar tasks.
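
A much-simplified sketch of the segmentation task this paper addresses: given word probabilities (here a fixed hypothetical lexicon, whereas the paper's algorithm learns them incrementally and without supervision), dynamic programming recovers the most probable word sequence:

    import math

    # hypothetical lexicon; the paper's model infers these probabilities
    lexicon = {"the": 0.06, "dog": 0.01, "do": 0.02, "barks": 0.004}
    UNSEEN = 1e-9  # penalty probability for unknown chunks

    def segment(chars, max_len=8):
        # Viterbi-style search for the highest-probability segmentation
        best = [(0.0, [])] + [(-math.inf, None)] * len(chars)
        for end in range(1, len(chars) + 1):
            for start in range(max(0, end - max_len), end):
                word = chars[start:end]
                score = best[start][0] + math.log(lexicon.get(word, UNSEEN))
                if score > best[end][0]:
                    best[end] = (score, best[start][1] + [word])
        return best[-1][1]

    print(segment("thedogbarks"))  # -> ['the', 'dog', 'barks']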
44

Karrupusamy, P. "Analysis of Neural Network Based Language Modeling." Journal of Artificial Intelligence and Capsule Networks 2, no. 1 (2020): 53–63. http://dx.doi.org/10.36548/jaicn.2020.1.006.

Abstract:
Language modelling, usually referred to as statistical language modelling, is a fundamental and core process of natural language processing. It is vital to tasks such as sentence completion, automatic speech recognition, statistical machine translation, and text generation, and the success of practical natural language processing relies heavily on the quality of the language model. Language modelling has previously been studied in fields such as linguistics, psychology, speech recognition, data compression, neuroscience, and machine translation. Since neural networks are very good choices for high-quality language modelling, this paper presents an analysis of neural networks for language modelling. Using datasets such as the Penn Treebank, the Billion Word Benchmark, and WikiText, the neural network models are evaluated on word error rate, perplexity, and bilingual evaluation understudy (BLEU) scores to identify the optimal model.
45

Karrupusamy, P. "Analysis of Neural Network Based Language Modeling." Journal of Artificial Intelligence and Capsule Networks 2, no. 1 (2020): 53–63. http://dx.doi.org/10.36548/jaicn.2020.3.006.

46

Sharma, Deepak. "EARLY DETECTION OF FACTORS, INCLUDING PANDEMICS AND DISASTERS, LEADING TO LANGUAGE ENDANGERMENT: THINKING STATISTICALLY." IARS' International Research Journal 11, no. 1 (2021): 31–35. http://dx.doi.org/10.51611/iars.irj.v11i1.2021.153.

Abstract:
The target of this research is to apply a statistical technique to different languages to identify significant factors of endangered languages with similar characteristics and to build a model of language endangerment. Factor analysis is used to identify the factors, which are then used to construct a model with and without interaction terms. First, three variables (speakers, longitude, and latitude) are analyzed to identify two factors; then these three variables and three interaction terms are used to construct the model. The results show that the model has significant predictive power. The predictors were retrieved from the dataset. The outcome encourages future studies towards defining techniques for predicting language endangerment and for analyzing its factors.
47

Riezler, Stefan, and Yi Liu. "Query Rewriting Using Monolingual Statistical Machine Translation." Computational Linguistics 36, no. 3 (2010): 569–82. http://dx.doi.org/10.1162/coli_a_00010.

Abstract:
Long queries often suffer from low recall in Web search due to conjunctive term matching. The chances of matching words in relevant documents can be increased by rewriting query terms into new terms with similar statistical properties. We present a comparison of approaches that deploy user query logs to learn rewrites of query terms into terms from the document space. We show that the best results are achieved by adopting the perspective of bridging the “lexical chasm” between queries and documents by translating from a source language of user queries into a target language of Web documents. We train a state-of-the-art statistical machine translation model on query-snippet pairs from user query logs, and extract expansion terms from the query rewrites produced by the monolingual translation system. We show in an extrinsic evaluation in a real-world Web search task that the combination of a query-to-snippet translation model with a query language model achieves improved contextual query expansion compared to a state-of-the-art query expansion model that is trained on the same query log data.
48

Onnis, Luca, Win Ee Chun, and Matthew Lou-Magnuson. "Improved statistical learning abilities in adult bilinguals." Bilingualism: Language and Cognition 21, no. 2 (2017): 427–33. http://dx.doi.org/10.1017/s1366728917000529.

Abstract:
Using multiple languages may confer distinct advantages in cognitive control, yet it is unclear whether bilingualism is associated with better implicit statistical learning, a core cognitive ability underlying language. We tested bilingual adults on a challenging task requiring simultaneous learning of two miniature grammars characterized by different statistics. We found that participants learned each grammar significantly better than chance and both grammars equally well. Crucially, a validated continuous measure of bilingual dominance predicted accuracy scores for both artificial grammars in a generalized linear model. The study thus demonstrates the first graded advantage in learning novel statistical relations in adult bilinguals.
49

Ferrández-Tordera, Jorge, Sergio Ortiz-Rojas, and Antonio Toral. "CloudLM: a Cloud-based Language Model for Machine Translation." Prague Bulletin of Mathematical Linguistics 105, no. 1 (2016): 51–61. http://dx.doi.org/10.1515/pralin-2016-0002.

Abstract:
Language models (LMs) are an essential element in statistical approaches to natural language processing for tasks such as speech recognition and machine translation (MT). The advent of big data leads to the availability of massive amounts of data to build LMs, and in fact, for the most prominent languages, using current techniques and hardware, it is not feasible to train LMs on all the data available nowadays. At the same time, it has been shown that the more data is used for an LM the better the performance, e.g. for MT, without any indication yet of reaching a plateau. This paper presents CloudLM, an open-source cloud-based LM intended for MT, which makes it possible to query distributed LMs. CloudLM relies on Apache Solr and provides the functionality of state-of-the-art language modelling (it builds upon KenLM), while allowing massive LMs to be queried (as the use of local memory is drastically reduced), at the expense of slower decoding speed.
50

Lal, Gend, and Rekha Saha. "A Statistical Approach for Estimating Language Model Reliability with Effective Smoothing Technique." International Journal of Computer Applications 123, no. 16 (2015): 31–35. http://dx.doi.org/10.5120/ijca2015905763.
