Academic literature on the topic 'CLASSIFICATION OF RESEARCH'


Dissertations / Theses on the topic "CLASSIFICATION OF RESEARCH"

1. Francis, Paul John. "The classification of quasar spectra." Thesis, University of Cambridge, 1991. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.239185.
2. Wei, Zhihua. "The research on Chinese text multi-label classification." Thesis, Lyon 2, 2010. http://www.theses.fr/2010LYO20025/document.

Abstract:
Text Classification (TC), an important field in information technology, has many valuable applications. Faced with a sea of information resources, the objects of TC have become more complicated and diverse, and research in pursuit of effective, practical TC technology is fairly challenging. More and more researchers consider multi-label TC better suited to many applications. This thesis analyses the difficulties and problems in multi-label TC and in Chinese text representation, building on a large body of algorithms for single-label and multi-label TC. Addressing the high dimensionality of the feature space, the sparse distribution of text representations, and the poor performance of multi-label classifiers, it puts forward corresponding algorithms from different angles. For the curse of dimensionality that arises when Chinese texts are represented using n-grams, a two-step feature selection algorithm is constructed, combining the filtering of rare features within classes with the selection of discriminative features across classes. The proper value of n, the feature-weighting strategy, and the correlation among features are examined in a variety of experiments, yielding conclusions useful to research on n-gram representation of Chinese texts. In view of a disadvantage of the Latent Dirichlet Allocation (LDA) model, namely the arbitrary revision of a variable during smoothing, a new smoothing strategy based on Tolerance Rough Sets (TRS) is put forward: it first constructs tolerance classes over the global vocabulary and then assigns values to out-of-vocabulary (OOV) words in each class according to the tolerance class. To improve the performance of multi-label classifiers and reduce computational complexity, a TC method based on the LDA model is applied to Chinese text representation: topics are extracted statistically from the texts, which are then represented by topic vectors. It shows competitive performance on both English and Chinese corpora. To further enhance multi-label classifier performance, a compound classification framework is proposed. It partitions the text space by computing upper and lower approximations, decomposing a multi-label TC problem into several single-label TC problems and several multi-label TC problems with fewer labels than the original: an unknown text is classified by a single-label classifier when it falls into the lower approximation space of some class, and by the corresponding multi-label classifier otherwise. An application system, TJ-MLWC (Tongji Multi-label Web Classifier), was designed. It calls results from search engines directly and classifies them in real time using an improved Naïve Bayes classifier, making browsing more convenient: users can immediately locate the texts they are interested in from the class information given by TJ-MLWC.

Abstract (translated from French):
The thesis centres on text classification, a rapidly expanding field with numerous current and potential applications. Its main contributions concern two points. First, the specifics of encoding and automatically processing the Chinese language: words may be composed of one, two, or three characters; there is no typographic separation between words; and many word orders are possible within a sentence, all of which leads to difficult ambiguity problems. N-gram encoding (sequences of n = 1, 2, or 3 characters) is particularly well suited to Chinese, as it is fast and requires neither prior dictionary-based word recognition nor word segmentation. Second, multi-label classification, in which each individual may be assigned to one or several classes. For texts, the classes sought correspond to topics, and a single text may be attached to one or several topics. This multi-label approach is more general: a single patient may suffer from several pathologies; a single company may be active in several industrial or service sectors. The thesis analyses these problems and proposes solutions, first for single-label classifiers and then for multi-label ones. Among the difficulties are the definition of the variables characterising the texts, their large number, the handling of sparse matrices (many zeros in the texts-by-descriptors matrix), and the relatively poor performance of the usual multi-class classifiers.

Abstract (translated from Chinese):
Text classification is an important research field in information science, rich in practical applications. As the content handled by text classification becomes more complex and varied, classification targets have also diversified, and research into effective, practically useful classification techniques has become a challenging task, from which research on multi-label classification has emerged. Building on an analysis of a large number of single-label and multi-label text classification algorithms, this thesis addresses the high dimensionality of features in text representation, data sparsity, and the high complexity and low accuracy of multi-label classification, attempting solutions from different angles using rough set theory and proposing corresponding algorithms. For the curse of dimensionality arising when n-grams are used as features of Chinese text, a two-step feature selection method is proposed, combining the removal of rare within-class features with between-class feature selection; extensive experiments on a large-scale Chinese corpus concerning the choice of n, feature weighting, and feature correlation yield several useful conclusions. For the low efficiency and high cost of classification with high-dimensional text representations, a multi-label text classification algorithm based on the LDA model is proposed, using the topics extracted by LDA as text features to build an efficient classifier; under the PT3 multi-label transformation method it performs well on both Chinese and English datasets, on a par with the best-recognised multi-label classification methods. For the arbitrariness of existing LDA smoothing strategies, an LDA language-model smoothing strategy based on tolerance rough sets is proposed: it first constructs tolerance classes of words over the global vocabulary, then assigns smoothing values to the out-of-vocabulary words of each document class according to word frequencies within the tolerance classes; extensive experiments on Chinese and English, balanced and unbalanced corpora show that this smoothing method significantly improves the classification performance of the LDA model, especially on unbalanced corpora. For the high complexity and low accuracy of multi-label classification, a compound multi-label text classification framework based on variable-precision rough sets is proposed: it partitions the text feature space with the variable-precision rough set method, decomposing the multi-label classification problem into several binary single-label problems and several multi-label problems with fewer labels; when an unknown text is assigned to the lower approximation region of some class, a simple single-label classifier determines its class directly, and when it falls into a boundary region, the multi-label classifier of the corresponding region is used. Experiments show that under this framework both classification accuracy and algorithmic efficiency improve considerably. The thesis also designs and implements a web search result visualisation system based on multi-label classification (MLWC), which calls the results returned by a search engine directly and classifies them in real time with an improved Naïve Bayes multi-label algorithm, enabling users to quickly locate texts of interest among the search results.
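The character n-gram representation this abstract describes can be illustrated with a short sketch (character n-grams for n = 1 to 3, which need no dictionary or word segmentation; the function names and example text are illustrative, not taken from the thesis):

```python
def char_ngrams(text, n):
    """Return the sequence of character n-grams of `text`."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def ngram_features(text, max_n=3):
    """Collect the set of all character n-grams for n = 1..max_n.

    For Chinese text this works directly on characters, avoiding the
    word-segmentation step that dictionary-based features require.
    """
    feats = set()
    for n in range(1, max_n + 1):
        feats.update(char_ngrams(text, n))
    return feats
```

For example, `char_ngrams("文本分类", 2)` yields `["文本", "本分", "分类"]`; feature selection (such as the two-step method in the abstract) would then prune this set before classification.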
3. Mayo, Robert William. "An evaluation of social grade as a classification scheme." Thesis, City University London, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.312902.
4. Dunlap, James. "Classification and analysis of longwall delays." Thesis, Virginia Tech, 1990. http://hdl.handle.net/10919/42403.
5. Egan, Shaun Peter. "A framework for high speed lexical classification of malicious URLs." Thesis, Rhodes University, 2014. http://hdl.handle.net/10962/d1011933.

Abstract:
Phishing attacks employ social engineering to target end users with the goal of stealing identifying or sensitive information, which is then used in activities such as identity theft or financial fraud. During a phishing campaign, attackers distribute URLs which, along with false information, point to fraudulent resources in an attempt to deceive users into requesting them. These URLs are obscured through several techniques that make automated detection difficult. Current methods for detecting malicious URLs face several problems that attackers use to their advantage: the time required to react to new attacks, shifts in URL obfuscation trends, and usability problems caused by the latency of the lookups these approaches require. A method of identifying malicious URLs using Artificial Neural Networks (ANNs) has been shown to be effective by several authors. The simple form of classification performed by ANNs results in very high classification speeds with little impact on usability. Samples for training, validating, and testing these ANNs are gathered from PhishTank and the Open Directory. Words selected from different sections of the samples are used to create a 'bag of words' (BOW), used as a binary input vector indicating the presence of a word in a given sample. Twenty additional features measuring lexical attributes of the sample are used to increase classification accuracy. A framework capable of generating these classifiers in an automated fashion is implemented. The classifiers are automatically stored on a remote update distribution service built to supply updates to classifier implementations. An example browser plugin is created that uses ANNs provided by this service; it can classify URLs requested by a user in real time and block those requests. The framework is tested in terms of training time and classification accuracy, and classification speed and the effectiveness of compression algorithms on the data required to distribute updates are evaluated. It is concluded that these ANNs can be generated frequently, in a form small enough to distribute easily, and that classifications are made at high speed with high accuracy, resulting in little impact on usability.
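The lexical URL features the abstract describes can be sketched as follows. This is a minimal illustration of the general idea (a binary bag-of-words vector plus a handful of lexical attributes); the vocabulary, the tokenisation rule, and the particular attributes chosen are assumptions, not the thesis's actual feature set of twenty:

```python
import re

def url_bow_vector(url, vocabulary):
    """Binary bag-of-words vector: 1 if the vocabulary word occurs in the URL."""
    words = set(re.split(r"[\W_]+", url.lower()))
    return [1 if w in words else 0 for w in vocabulary]

def lexical_features(url):
    """A few lexical attributes of the kind the abstract mentions (illustrative)."""
    return [
        len(url),                       # total URL length
        url.count("."),                 # number of dots
        sum(c.isdigit() for c in url),  # digit count
        url.count("-"),                 # hyphens, common in phishing hostnames
    ]
```

A classifier such as an ANN would then be trained on the concatenation of the two vectors; because both are computed from the URL string alone, no network lookup (and hence no added latency) is needed at classification time.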
6. Qi, Wang. "Studies in the Dynamics of Science: Exploring emergence, classification, and interdisciplinarity." Doctoral thesis, KTH, Industriell ekonomi och organisation (Inst.), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-184724.

Abstract:
The dynamic nature of science is embodied in the growth of knowledge in magnitude and the transformation of knowledge in structure. The growth in magnitude is indicated by a sharp increase in the number of scientific publications in recent decades; the transformation occurs as the boundaries of scientific disciplines become increasingly indistinct, resulting in a complicated situation in which disciplines and interdisciplinary research topics coexist and co-evolve. Knowledge production in such a context creates challenges for the measurement of science. This thesis aims to develop more flexible bibliometric methodologies to address some of the challenges of measuring science effectively. Specifically, it 1) proposes a new approach for identifying emerging research topics; 2) measures the interdisciplinarity of research topics; 3) explores the accuracy of the journal classification systems of the Web of Science and Scopus; 4) examines the role of cognitive distance in grant decisions; and 5) investigates the effect of cognitive distance between collaborators on their research output. The data used in this thesis are mainly from the in-house Web of Science and Scopus databases of the Centre for Science and Technology Studies (CWTS) at Leiden University. Quantitative analyses, in particular bibliometric analyses, are the main research methodologies employed. The thesis primarily offers methodological contributions, proposing a series of approaches designed to tackle the challenges created by the dynamics of science. While its major contribution lies in the improvement of certain bibliometric approaches, it also enhances understanding of the current system of science. In particular, the approaches and research findings presented here have implications for various stakeholders, including publishing organizations, bibliographic database producers, research policy makers, and research funding agencies. Indeed, these approaches could be built into a software tool and thereby be made available to researchers beyond the field of bibliometric studies.
7. Vaidya, Priyanka S. "Artificial Intelligence Approach to Breast Cancer Classification." University of Akron / OhioLINK, 2009. http://rave.ohiolink.edu/etdc/view?acc_num=akron1240957599.
8. Monaghan, Mark Peter. "The turmoil of evidence: research utilisation in UK drug classification." Thesis, University of Leeds, 2008. http://etheses.whiterose.ac.uk/701/.

Abstract:
This thesis investigates the thorny relationship between evidence utilisation and policy making in a heavily politicised policy area. Expectations for the conflux of research and policy formulation have been consolidated in the last decade under the banner of 'evidence-based policy'. In recent times, the debate over the nature and utility of evidence-based policy has become much more sophisticated. No longer can the connection between evidence utilisation and policy formulation be conceived simply in terms of evidence shaping policy outcomes or, conversely, policy being evidence-free, with evidence having no impact. Such conceptualisations persist, however, in heavily politicised policy areas, where there is intense media scrutiny of decision making, a lack of consensus on its direction, prolonged conflict between competing interest and stakeholder groups, and a permeating sense of crisis. These tend to be 'macro' policy areas, not usually the remit of evidence-based policy making and evaluative research. Using recent and ongoing developments in UK drug classification policy as a case study, an explanatory framework of the complex role and nature of evidence in heavily politicised policy areas is developed. Central to this is a methodological approach that can account for the role of conflict in the policy process; a modified version of the Advocacy Coalition Framework is employed to this end. This, in turn, allows a range of data-collection methods to be used, including observation and documentary analysis of Parliamentary Select Committee hearings alongside qualitative interviews with a wide range of key policy actors involved in the decision-making process. From this a nuanced account of the evidence-policy relationship in such contexts is ascertained, one which departs from the more established models explaining the evidence-policy nexus. Traditionally, such explanations have been conceived as models of research utilisation. This research suggests that these do not translate effectively into models of evidence-based policy making, because they are beset with some or all of the following problems: a) they focus on 'research' rather than the broader concept of 'evidence'; b) they operate with a static view of the policy process in which there is a direct connection between research and policy; c) they restrict the role of evidence to policy outcomes, rather than viewing its role in the process of decision making; d) they assume that research is the defining influence on the decision-making process; e) they operate at a high level of abstraction, offering little account of how research is selected for use in decision making. Consequently, a newer addition to the literature is developed which, it is claimed, avoids these shortcomings.
9. Dutta, Bidyarthi, Krishnapada Majumder, and B. K. Sen. "Classification of Keywords Extracted from Research Articles Published in Science Journals." National Institute of Science Communication and Information Resources, 2008. http://hdl.handle.net/10150/105938.

Abstract:
This paper is based on an analytical study of 335 keywords extracted from the titles and abstracts of 70 research articles, ten from each year from 2000 to 2006, taken in decreasing order of relevance, on the subject of Fermi liquids, a specific subject within the broad area of condensed matter physics. The research articles were collected from the INSPEC bibliographic database. The keywords are indexed to critically examine their physical structure, which is composed of three fundamental kernels: the keyphrase, the modulator, and the qualifier. The keyphrase reflects the central concept, which is usually post-coordinated by the modulator to amend the central concept in accordance with the relevant context. The qualifier comes after the modulator to describe the particular state of the central and/or amended concept. The keywords are further classified into 16 classes on the basis of four parameters: associativeness, chronological appearance, frequency of occurrence, and category. This taxonomy of keywords makes it possible to analyse the research trends of a subject and to identify its potential research areas.
10. Hajdu Barat, Agnes. "Multilevel education, training, traditions and research on UDC in Hungary." UDC Consortium, 2007. http://hdl.handle.net/10150/105607.

Abstract:
This paper explores the theory and practice of education in schools and in further education as two levels of the Information Society in Hungary, with LIS education considered a third level above them. The curriculum and content of different school subjects and their relationship to libraries are summarized, as are the training programmes for librarians, especially concerning knowledge organization. The long history of UDC usage in Hungary is surveyed, highlighting the principal milestones and people. The paper provides a brief overview of recent developments, the situation after the new Hungarian edition, and current UDC usage and research directions.