Dissertations / Theses on the topic 'N-gram language models'
Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles
Consult the top 19 dissertations / theses for your research on the topic 'N-gram language models.'
Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.
Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.
Kulhanek, Raymond Daniel. "A Latent Dirichlet Allocation/N-gram Composite Language Model." Wright State University / OhioLINK, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=wright1379520876.
Zhou, Hanqing. "DBpedia Type and Entity Detection Using Word Embeddings and N-gram Models." Thesis, Université d'Ottawa / University of Ottawa, 2018. http://hdl.handle.net/10393/37324.
Mehay, Dennis Nolan. "Bean Soup Translation: Flexible, Linguistically-motivated Syntax for Machine Translation." The Ohio State University, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=osu1345433807.
Ebadat, Ali-Reza. "Toward Robust Information Extraction Models for Multimedia Documents." Phd thesis, INSA de Rennes, 2012. http://tel.archives-ouvertes.fr/tel-00760383.
Jiang, Yuandong. "Large Scale Distributed Semantic N-gram Language Model." Wright State University / OhioLINK, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=wright1316200173.
Hannemann, Mirko. "Rozpoznávácí sítě založené na konečných stavových převodnících pro dopředné a zpětné dekódování v rozpoznávání řeči" [Recognition networks based on finite-state transducers for forward and backward decoding in speech recognition]. Doctoral thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2016. http://www.nusl.cz/ntk/nusl-412550.
O'Boyle, Peter L. "A study of an N-gram language model for speech recognition." Thesis, Queen's University Belfast, 1993. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.333827.
Civera Saiz, Jorge. "Novel statistical approaches to text classification, machine translation and computer-assisted translation." Doctoral thesis, Universitat Politècnica de València, 2008. http://hdl.handle.net/10251/2502.
Full textCivera Saiz, J. (2008). Novel statistical approaches to text classification, machine translation and computer-assisted translation [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/2502
Randák, Richard. "N-Grams as a Measure of Naturalness and Complexity." Thesis, Linnéuniversitetet, Institutionen för datavetenskap och medieteknik (DM), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-90006.
Gangireddy, Siva Reddy. "Recurrent neural network language models for automatic speech recognition." Thesis, University of Edinburgh, 2017. http://hdl.handle.net/1842/28990.
Susman, Derya. "Turkish Large Vocabulary Continuous Speech Recognition By Using Limited Audio Corpus." Master's thesis, METU, 2012. http://etd.lib.metu.edu.tr/upload/12614207/index.pdf.
Packer, Thomas L. "Surface Realization Using a Featurized Syntactic Statistical Language Model." Diss., CLICK HERE for online access, 2006. http://contentdm.lib.byu.edu/ETD/image/etd1195.pdf.
Veselý, Michal. "Dynamický dekodér pro rozpoznávání řeči" [Dynamic decoder for speech recognition]. Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2017. http://www.nusl.cz/ntk/nusl-363846.
Doubek, Milan. "Inteligentní nákupní lístek" [Intelligent shopping list]. Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2012. http://www.nusl.cz/ntk/nusl-236518.
Jančík, Zdeněk. "Modelování dynamiky prosodie pro rozpoznávání řečníka" [Modeling prosody dynamics for speaker recognition]. Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2008. http://www.nusl.cz/ntk/nusl-235929.
Wei, Zhihua. "The research on chinese text multi-label classification." Thesis, Lyon 2, 2010. http://www.theses.fr/2010LYO20025/document.
Full textLa thèse est centrée sur la Classification de texte, domaine en pleine expansion, avec de nombreuses applications actuelles et potentielles. Les apports principaux de la thèse portent sur deux points : Les spécificités du codage et du traitement automatique de la langue chinoise : mots pouvant être composés de un, deux ou trois caractères ; absence de séparation typographique entre les mots ; grand nombre d’ordres possibles entre les mots d’une phrase ; tout ceci aboutissant à des problèmes difficiles d’ambiguïté. La solution du codage en «n-grams »(suite de n=1, ou 2 ou 3 caractères) est particulièrement adaptée à la langue chinoise, car elle est rapide et ne nécessite pas les étapes préalables de reconnaissance des mots à l’aide d’un dictionnaire, ni leur séparation. La classification multi-labels, c'est-à-dire quand chaque individus peut être affecté à une ou plusieurs classes. Dans le cas des textes, on cherche des classes qui correspondent à des thèmes (topics) ; un même texte pouvant être rattaché à un ou plusieurs thème. Cette approche multilabel est plus générale : un même patient peut être atteint de plusieurs pathologies ; une même entreprise peut être active dans plusieurs secteurs industriels ou de services. La thèse analyse ces problèmes et tente de leur apporter des solutions, d’abord pour les classifieurs unilabels, puis multi-labels. Parmi les difficultés, la définition des variables caractérisant les textes, leur grand nombre, le traitement des tableaux creux (beaucoup de zéros dans la matrice croisant les textes et les descripteurs), et les performances relativement mauvaises des classifieurs multi-classes habituels
Text classification is an important research area in information science with substantial practical value. As the content to be classified grows more complex and diverse, and classification targets multiply, designing effective classification techniques that meet real application needs has become a challenging task, giving rise to research on multi-label classification. Building on an analysis of many single-label and multi-label text classification algorithms, this thesis applies rough set theory from several angles to the problems of high-dimensional feature spaces, data sparseness, and the high complexity but low accuracy of multi-label classification. The main contributions are as follows. (1) To address the curse of dimensionality that arises when n-grams are used as features for Chinese text, a two-step feature selection method is proposed, combining the removal of features rare within a class with between-class feature selection; extensive experiments on a large Chinese corpus examine the choice of n, feature weighting, and feature correlation, yielding several useful conclusions. (2) To reduce the inefficiency and cost of high-dimensional text representations, a multi-label classification algorithm based on the LDA model is proposed, using topics extracted by LDA as features to build an efficient classifier; under the PT3 multi-label transformation, it performs well on both Chinese and English datasets, on par with the best-known multi-label methods. (3) To remedy the arbitrariness of existing LDA smoothing strategies, a smoothing strategy for the LDA language model based on tolerance rough sets is proposed: tolerance classes of words are first built over the global vocabulary, and smoothing values for the unseen words of each document class are then assigned according to word frequencies within the tolerance classes. Extensive experiments on Chinese and English, balanced and imbalanced corpora show that this smoothing significantly improves the classification performance of the LDA model, especially on imbalanced corpora. (4) To tackle the high complexity and low accuracy of multi-label classification, a composite multi-label classification framework based on variable-precision rough sets is proposed: the framework partitions the text feature space with the variable-precision rough set method and decomposes the multi-label problem into several binary single-label problems and several multi-label problems with fewer labels. That is, when an unseen text falls into the lower approximation region of a class, a simple single-label classifier can decide its class directly; when it falls into a boundary region, the multi-label classifier of that region is used. Experiments show that under this framework both accuracy and efficiency improve considerably. The thesis also designs and implements a web search result visualisation system based on multi-label classification (MLWC), which takes the results returned by a search engine and classifies them in real time with an improved Naïve Bayes multi-label algorithm, letting users quickly locate the texts of interest among the search results.
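The character n-gram coding described in the abstract above (sequences of n = 1, 2, or 3 characters, requiring neither a dictionary nor word segmentation) can be sketched as follows; the function name and sample string are illustrative, not taken from the thesis:

```python
from collections import Counter

def char_ngrams(text, n_values=(1, 2, 3)):
    """Extract overlapping character n-grams from unsegmented text.

    Chinese text carries no typographic word boundaries, so n-grams
    over raw characters sidestep dictionary lookup and segmentation.
    """
    counts = Counter()
    for n in n_values:
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts

features = char_ngrams("语言模型")
# unigrams 语/言/模/型, bigrams 语言/言模/模型, trigrams 语言模/言模型
```

The resulting counts would then feed the feature selection and weighting steps the abstract discusses.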
Kubalík, Jakub. "Mining of Textual Data from the Web for Speech Recognition." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2010. http://www.nusl.cz/ntk/nusl-237170.
Prendi, Gonçalo Queiroga. "Stepwise API usage assistance based on N-gram language models." Master's thesis, 2015. http://hdl.handle.net/10071/10910.
Full textO desenvolvimento de software requer a utilização de Application Programming Interfaces (APIs) externas com o objectivo de reutilizar bibliotecas e frameworks. Muitas vezes, os programadores têm dificuldade em utilizar APIs desconhecidas, devido à falta de recursos ou desenho fora do comum. Essas dificuldades provocam inúmeras vezes sequências incorrectas de chamadas às APIs que poderão não produzir o resultado desejado. Os modelos de língua mostraram-se capazes de capturar regularidades em texto, bem como em código. Neste trabalho é explorada a utilização de modelos de língua de n-gramas e a sua capacidade de capturar regularidades na utilização de APIs, através de uma avaliação intrínseca e extrínseca destes modelos em algumas das APIs mais utilizadas na linguagem de programação Java. Para alcançar este objectivo, vários modelos foram treinados sobre repositórios de código do GitHub, contendo centenas de projectos Java que utilizam estas APIs. Com o objectivo de ter uma avaliação completa do desempenho dos modelos de língua, foram seleccionadas APIs de múltiplos domínios e tamanhos de vocabulário. Este trabalho permite concluir que os modelos de língua de n-gramas são capazes de capturar padrões de utilização de APIs devido aos seus baixos valores de perplexidade e a sua alta cobertura, chegando a atingir 100% em alguns casos, o que levou à criação de uma ferramenta de code completion para guiar os programadores na utilização de uma API desconhecida, mas mantendo a possibilidade de a explorar.
Hung, Ta-Hung, and 洪大弘. "Automatic Chinese Character Error Detecting System Based on N-gram Language Model and Pragmatics Knowledge Base." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/52207006755338310956.
Chaoyang University of Technology
Master's Program, Department of Information Engineering
Academic year 97 (2008–2009)
Essay error detection is an important function of computer-aided essay composition. Systems that can detect spelling errors and usage errors are very helpful to students. Previous systems, based on confusion sets for each Chinese character, tended to raise false alarms and did not explain the errors. To overcome these drawbacks, we implement an error detection system for Chinese essays based on statistical methods and a knowledge base. It can label errors and give suggestions. Previous work focuses on all possible errors arising from words with similar shape or pronunciation. In addition to these common error patterns, we collect a corpus of correct usages such as idioms, maxims, and slang, which provides context for potential errors. Our system makes decisions based on an n-gram language model; once a word is labeled as an error, the system gives an explanation based on the correct context. Thus, our system can offer students information to improve their essays. Traditionally, there are two difficulties in applying language models: data sparseness and data adaptability. To deal with the data sparseness problem of the n-gram language model, we adopt several smoothing methods in our system. To overcome the adaptability problem, our system combines two language models to fit students' usage. With a large knowledge base containing thousands of common error patterns, our system can better identify error candidates. In the experiments, simulated data and a real essay corpus are used. We report the recall and precision of our system, give an error analysis, and discuss the possible benefits of our system. We believe the system can help students and teachers not only in class but also in distance learning via the Internet.
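The core detection idea in the abstract above (flagging a character whose smoothed n-gram probability is suspiciously low) can be sketched minimally; the corpus, threshold, and function names here are illustrative assumptions, not the thesis's actual system:

```python
from collections import Counter

def train(corpus):
    # Character bigram counts plus left-context unigrams, for smoothing.
    bi, uni, vocab = Counter(), Counter(), set()
    for sent in corpus:
        for a, b in zip(sent, sent[1:]):
            bi[(a, b)] += 1
            uni[a] += 1
            vocab.update((a, b))
    return bi, uni, vocab

def flag_errors(sent, bi, uni, vocab, threshold=0.05):
    # Mark positions whose add-one-smoothed bigram probability
    # falls below the threshold: candidate character errors.
    flags = []
    for i, (a, b) in enumerate(zip(sent, sent[1:])):
        p = (bi[(a, b)] + 1) / (uni[a] + len(vocab))
        if p < threshold:
            flags.append((i + 1, b, p))
    return flags
```

In the full system described above, flagged positions would then be checked against the knowledge base of correct usages (idioms, maxims, slang) to produce an explanation rather than a bare alarm.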