Academic literature on the topic '20 newsgroup'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic '20 newsgroup.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "20 newsgroup"

1

Nurdin, Arliyanti, Bernadus Anggo Seno Aji, Anugrayani Bustamin, and Zaenal Abidin. "PERBANDINGAN KINERJA WORD EMBEDDING WORD2VEC, GLOVE, DAN FASTTEXT PADA KLASIFIKASI TEKS." Jurnal Tekno Kompak 14, no. 2 (2020): 74. http://dx.doi.org/10.33365/jtk.v14i2.732.

Abstract:
The unstructured nature of text is a challenge for feature extraction in the field of text processing. This study aims to compare the performance of word embeddings such as Word2Vec, GloVe, and FastText, with classification performed by a Convolutional Neural Network algorithm. These three methods were chosen because, compared with traditional feature engineering such as Bag of Words, they can capture semantic and syntactic meaning, word order, and even the context surrounding a word. The word embeddings produced by these methods are compared on news classification using the 20 Newsgroups dataset …
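The comparison described above can be prototyped with off-the-shelf tools. The sketch below is a minimal, assumed setup: it loads 20 Newsgroups from scikit-learn, trains a Word2Vec model with gensim, and classifies averaged document vectors with logistic regression. The paper itself classifies with a Convolutional Neural Network and also evaluates GloVe and FastText; the classifier, tokenization, and parameter values here are illustrative stand-ins, not the authors' configuration.

```python
# Minimal sketch: Word2Vec document vectors for 20 Newsgroups classification.
# Simplifies the paper's setup (CNN classifier, multiple embedding methods)
# to averaged word vectors plus logistic regression, just to show the pipeline.
import numpy as np
from gensim.models import Word2Vec
from sklearn.datasets import fetch_20newsgroups
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

data = fetch_20newsgroups(subset="all", remove=("headers", "footers", "quotes"))
tokenized = [doc.lower().split() for doc in data.data]

# Train Word2Vec on the corpus itself (GloVe or FastText vectors could be swapped in).
w2v = Word2Vec(sentences=tokenized, vector_size=100, window=5, min_count=2, workers=4)

def doc_vector(tokens, model):
    """Average the vectors of in-vocabulary tokens; zero vector if none are known."""
    vecs = [model.wv[t] for t in tokens if t in model.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(model.vector_size)

X = np.vstack([doc_vector(toks, w2v) for toks in tokenized])
X_train, X_test, y_train, y_test = train_test_split(
    X, data.target, test_size=0.2, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("accuracy:", clf.score(X_test, y_test))
```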
2

Zhou, Hongfang, Jie Guo, Yinghui Wang, and Minghua Zhao. "A Feature Selection Approach Based on Interclass and Intraclass Relative Contributions of Terms." Computational Intelligence and Neuroscience 2016 (2016): 1–8. http://dx.doi.org/10.1155/2016/1715780.

Abstract:
Feature selection plays a critical role in text categorization. During feature selection, high-frequency terms and the interclass and intraclass relative contributions of terms all have significant effects on classification results. In this paper we therefore put forward IIRCT, a feature selection approach based on the interclass and intraclass relative contributions of terms. The proposed algorithm jointly considers three critical factors: term frequency, the interclass relative contribution, and the intraclass relative contribution of terms. Finally, experiments are made
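The abstract does not give the IIRCT scoring formula, so the sketch below is only an assumed stand-in that shows the general shape of term-level feature selection on 20 Newsgroups: terms are ranked with scikit-learn's chi-squared score and the top k are kept before classification.

```python
# Illustrative stand-in for term-contribution-based feature selection on 20 Newsgroups:
# rank terms with a chi-squared score (not the paper's IIRCT measure) and keep the top k.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train = fetch_20newsgroups(subset="train", remove=("headers", "footers", "quotes"))
test = fetch_20newsgroups(subset="test", remove=("headers", "footers", "quotes"))

pipeline = make_pipeline(
    CountVectorizer(stop_words="english"),
    SelectKBest(chi2, k=5000),      # keep the 5000 highest-scoring terms
    MultinomialNB(),
)
pipeline.fit(train.data, train.target)
print("accuracy with top-5000 terms:", pipeline.score(test.data, test.target))
```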
3

Ghanem, Khadoudja. "Local and Global Latent Semantic Analysis for Text Categorization." International Journal of Information Retrieval Research 4, no. 3 (2014): 1–13. http://dx.doi.org/10.4018/ijirr.2014070101.

Abstract:
In this paper the authors propose a semantic approach to document categorization. The idea is to create for each category a semantic index (a representative term vector) by performing a local Latent Semantic Analysis (LSA) followed by a clustering process. A second use of LSA (global LSA) is applied to a term-class matrix in order to retrieve the class most similar to the query (the document to classify), in the same way that LSA is used in Information Retrieval to retrieve the documents most similar to a query. The proposed system is evaluated on a popular dataset, the 20 Newsgroups corpus …
4

Borrajo, L., A. Seara Vieira, and E. L. Iglesias. "An HMM-based synthetic view generator to improve the efficiency of ensemble systems." Logic Journal of the IGPL 28, no. 1 (2019): 4–18. http://dx.doi.org/10.1093/jigpal/jzz067.

Abstract:
One of the most active areas of research in semi-supervised learning has been the study of methods for constructing good ensembles of classifiers. Ensemble systems are techniques that create multiple models and then combine them to produce improved results. These systems usually produce more accurate solutions than a single model would. In particular, multi-view ensemble systems improve the accuracy of text classification because they optimize their functions to exploit different views of the same input data. However, despite being more promising than single-view approaches, document datase
5

Li, Qin, Shaobo Li, Jie Hu, Sen Zhang, and Jianjun Hu. "Tourism Review Sentiment Classification Using a Bidirectional Recurrent Neural Network with an Attention Mechanism and Topic-Enriched Word Vectors." Sustainability 10, no. 9 (2018): 3313. http://dx.doi.org/10.3390/su10093313.

Abstract:
Sentiment analysis of online tourist reviews is playing an increasingly important role in tourism. Accurately capturing the attitudes of tourists regarding different aspects of the scenic sites or the overall polarity of their online reviews is key to tourism analysis and application. However, the performances of current document sentiment analysis methods are not satisfactory as they either neglect the topics of the document or do not consider that not all words contribute equally to the meaning of the text. In this work, we propose a bidirectional gated recurrent unit neural network model (B
6

Kaur, Bipanjyot, and Gourav Bathla. "An efficient technique for hybrid classification and feature extraction using normalization." International Journal of Engineering & Technology 7, no. 2.27 (2018): 156. http://dx.doi.org/10.14419/ijet.v7i2.27.14534.

Abstract:
Text classification is a technique for assigning a class or label to a particular document from a set of predefined class labels. Examples of predefined classes are sports, business, technical, education, science, etc. Classification is a supervised learning technique, i.e., these classes are trained with certain features and a document is then classified based on a similarity measure against the trained document set. Text classification is used in many applications, such as assigning labels to documents, separating spam messages from genuine ones, text filtering, natural language processing, etc.
7

Chirra, Venkata RamiReddy, Hoolda Daniel Maddiboyina, Yakobu Dasari, and Ranganadhareddy Aluru. "Performance Evaluation of Email Spam Text Classification Using Deep Neural Networks." Review of Computer Engineering Studies 7, no. 4 (2020): 91–95. http://dx.doi.org/10.18280/rces.070403.

Abstract:
Spam arrives in email inboxes for advertising, for collecting personal information, or to deliver malware through websites or scripts. Most often, spammers send junk mail with the intention of committing email fraud. Today spam accounts for 45% of all email, and hence there is an ever-increasing need to build efficient spam filters to identify and block spam mail. However, most spam filters in use today are built using traditional approaches such as statistical and content-based techniques. These techniques do not improve their performance when handling huge volumes of data, and they need a lot
8

Vidyadhari, Ch, N. Sandhya, and P. Premchand. "Particle Grey Wolf Optimizer (PGWO) Algorithm and Semantic Word Processing for Automatic Text Clustering." International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 27, no. 02 (2019): 201–23. http://dx.doi.org/10.1142/s0218488519500090.

Abstract:
Text mining refers to the process of extracting high-quality information from text. It is broadly used in applications such as text clustering, text categorization, text classification, etc. Text clustering has recently become a useful yet challenging task for grouping text documents. Irrelevant terms and high dimensionality reduce the accuracy of text clustering. In this paper, semantic word processing and a novel Particle Grey Wolf Optimizer (PGWO) are proposed for automatic text clustering. Initially, the text documents are given as input to the pre-process
9

Kjelgren, Roger, and Larry Rupp. "461 Multimedia Dissemination On and Off Campus of Two Landscape Horticulture Courses." HortScience 35, no. 3 (2000): 473C—473. http://dx.doi.org/10.21273/hortsci.35.3.473c.

Abstract:
We developed two courses, sustainable landscaping and landscape water conservation, to reach time-constrained students on campus and place-bound students off campus. Lecture material consisting of text, slides, drawings, and some video was assembled digitally using presentation software. Each course was broken into nine to 10 units by topic, and each unit consisted of 50 to 100 individual “slides” containing visuals, text, and audio narration. The lecture material was then packaged for student consumption on videotape and CD-ROM, on the Web (without audio), and as hard copy. Student
10

Ogura, Hiroshi, Hiromi Amano, and Masato Kondo. "Gamma-Poisson Distribution Model for Text Categorization." ISRN Artificial Intelligence 2013 (April 4, 2013): 1–17. http://dx.doi.org/10.1155/2013/829630.

Abstract:
We introduce a new model for describing word frequency distributions in documents for automatic text classification tasks. In the model, the gamma-Poisson probability distribution is used to achieve better text modeling. The framework of the modeling and its application to text categorization are demonstrated with practical techniques for parameter estimation and vector normalization. To investigate the efficiency of our model, text categorization experiments were performed on 20 Newsgroups, Reuters-21578, Industry Sector, and TechTC-100 datasets. The results show that the model allows perform
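As a small worked illustration of the underlying idea (a Poisson rate with a Gamma prior marginalizes to a negative binomial), the sketch below fits both a Poisson and a negative binomial to the per-document counts of one assumed term from 20 Newsgroups and compares log-likelihoods; the paper's full categorization framework, parameter estimation, and vector normalization are not reproduced.

```python
# Minimal sketch of the gamma-Poisson (negative binomial) view of word frequency:
# per-document counts of a single term are Poisson with a Gamma-distributed rate,
# which marginalizes to a negative binomial. We compare Poisson vs. negative
# binomial fits on counts of one illustrative term ("space") from 20 Newsgroups.
import numpy as np
from scipy import stats
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer

docs = fetch_20newsgroups(subset="train").data
counts = CountVectorizer(vocabulary=["space"]).fit_transform(docs).toarray().ravel()

# Poisson fit: a single rate parameter.
lam = counts.mean()
poisson_ll = stats.poisson.logpmf(counts, lam).sum()

# Negative binomial via method of moments (r = size, p = success probability).
mean, var = counts.mean(), counts.var()
r = mean**2 / max(var - mean, 1e-9)
p = r / (r + mean)
nb_ll = stats.nbinom.logpmf(counts, r, p).sum()

print(f"Poisson log-likelihood: {poisson_ll:.1f}, negative binomial: {nb_ll:.1f}")
```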

Dissertations / Theses on the topic "20 newsgroup"

1

Jönsson, Mattias, and Lucas Borg. "How to explain graph-based semi-supervised learning for non-mathematicians?" Thesis, Malmö universitet, Fakulteten för teknik och samhälle (TS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-20339.

Abstract:
The large amount of data available on the internet can be used to improve predictions through machine learning. The problem is that such data is often in an unprocessed format and requires someone to manually assign labels to the collected data before it can be used by the algorithm. Semi-supervised learning (SSL) is a technique in which the algorithm uses a small number of pre-labeled examples and then automatically determines labels for the remaining data. One approach within SSL is to represent the data as a graph, which is called graph-based semi-supervised learning (GSSL), and then to find …
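A minimal sketch of graph-based semi-supervised learning in this spirit, assuming scikit-learn's LabelSpreading and a small two-category slice of 20 Newsgroups: most labels are hidden (marked -1) and then propagated over a k-nearest-neighbour similarity graph. The thesis's own choice of graph construction and algorithms may differ.

```python
# Minimal sketch of graph-based SSL: only a handful of documents keep their labels,
# and LabelSpreading propagates labels over a kNN graph built from TF-IDF/LSA features.
import numpy as np
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.semi_supervised import LabelSpreading

cats = ["rec.sport.hockey", "sci.space"]          # illustrative two-class subset
data = fetch_20newsgroups(subset="train", categories=cats)

X = TruncatedSVD(n_components=100, random_state=0).fit_transform(
    TfidfVectorizer(stop_words="english").fit_transform(data.data))

rng = np.random.default_rng(0)
y = np.full(len(data.target), -1)                  # -1 marks unlabeled documents
labeled = rng.choice(len(y), size=20, replace=False)
y[labeled] = data.target[labeled]

model = LabelSpreading(kernel="knn", n_neighbors=10).fit(X, y)
acc = (model.transduction_ == data.target).mean()
print("transductive accuracy:", acc)
```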

Book chapters on the topic "20 newsgroup"

1

Ghanem, Khadoudja. "Local and Global Latent Semantic Analysis for Text Categorization." In Information Retrieval and Management. IGI Global, 2018. http://dx.doi.org/10.4018/978-1-5225-5191-1.ch060.

Abstract:
In this paper the authors propose a semantic approach to document categorization. The idea is to create for each category a semantic index (a representative term vector) by performing a local Latent Semantic Analysis (LSA) followed by a clustering process. A second use of LSA (global LSA) is applied to a term-class matrix in order to retrieve the class most similar to the query (the document to classify), in the same way that LSA is used in Information Retrieval to retrieve the documents most similar to a query. The proposed system is evaluated on a popular dataset, the 20 Newsgroups corpus. The obtained results show the effectiveness of the method compared with the classic KNN and SVM classifiers, as well as with methods presented in the literature. Experimental results show that the new method has high precision and recall rates, and classification accuracy is significantly improved.
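A rough sketch of the global-LSA step described above, assuming scikit-learn: TF-IDF vectors are projected with truncated SVD, each class is summarized by the centroid of its training documents in the latent space, and test documents are assigned to the most similar centroid. The local LSA and clustering stage that builds richer per-category indexes is omitted.

```python
# Minimal LSA-style categorization on 20 Newsgroups: project TF-IDF into a latent
# semantic space and assign each test document to the nearest class centroid.
import numpy as np
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

train = fetch_20newsgroups(subset="train")
test = fetch_20newsgroups(subset="test")

vectorizer = TfidfVectorizer(stop_words="english", max_features=20000)
X_train = vectorizer.fit_transform(train.data)
X_test = vectorizer.transform(test.data)

svd = TruncatedSVD(n_components=200, random_state=0)   # the LSA projection
Z_train = svd.fit_transform(X_train)
Z_test = svd.transform(X_test)

# One "semantic index" per class: the centroid of its training documents in LSA space.
centroids = np.vstack([Z_train[train.target == c].mean(axis=0)
                       for c in range(len(train.target_names))])

pred = cosine_similarity(Z_test, centroids).argmax(axis=1)
print("accuracy:", (pred == test.target).mean())
```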

Conference papers on the topic "20 newsgroup"

1

Lima, João Marcos Carvalho, and José Everardo Bessa Maia. "A Topical Word Embeddings for Text Classification." In XV Encontro Nacional de Inteligência Artificial e Computacional. Sociedade Brasileira de Computação - SBC, 2018. http://dx.doi.org/10.5753/eniac.2018.4401.

Abstract:
This paper presents an approach that uses topic models based on LDA to represent documents in text categorization problems. The document representation is achieved through the cosine similarity between document embeddings and embeddings of topic words, creating a Bag-of-Topics (BoT) variant. The performance of this approach is compared against those of two other representations: BoW (Bag-of-Words) and Topic Model, both based on standard tf-idf. Also, to reveal the effect of the classifier, we compared the performance of the nonlinear classifier SVM against that of the linear classifier Naive B
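A minimal sketch of the Bag-of-Topics (BoT) idea described above: each document feature is the cosine similarity between the document's embedding and the embedding of one topic's top words. The names word_vecs and topic_top_words are assumed inputs (from a trained word-embedding model and an LDA model, respectively); only the similarity step is shown.

```python
# Minimal Bag-of-Topics sketch: one feature per topic, computed as the cosine
# similarity between a document embedding and that topic's word embedding.
import numpy as np

def average_vector(tokens, word_vecs, dim=100):
    """Average the embeddings of known tokens; zero vector if none are known."""
    vecs = [word_vecs[t] for t in tokens if t in word_vecs]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def bag_of_topics(doc_tokens, topic_top_words, word_vecs, dim=100):
    """Build the BoT feature vector: cosine similarity of the document embedding
    to the embedding of each topic's top words (e.g. 10 top LDA words per topic)."""
    d = average_vector(doc_tokens, word_vecs, dim)
    feats = []
    for words in topic_top_words:
        t = average_vector(words, word_vecs, dim)
        denom = np.linalg.norm(d) * np.linalg.norm(t)
        feats.append(float(d @ t / denom) if denom else 0.0)
    return np.array(feats)          # this BoT vector is what feeds the classifier
```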
2

Albishre, Khaled, Mubarak Albathan, and Yuefeng Li. "Effective 20 Newsgroups Dataset Cleaning." In 2015 IEEE / WIC / ACM International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT). IEEE, 2015. http://dx.doi.org/10.1109/wi-iat.2015.90.

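The cleaning problem this paper targets can be previewed with scikit-learn's built-in option for stripping newsgroup headers, signature footers, and quoted replies, which otherwise leak label information into classifiers; the paper's own cleaning procedure goes further than this assumed baseline.

```python
# Quick look at raw vs. cleaned 20 Newsgroups text using scikit-learn's `remove` option.
from sklearn.datasets import fetch_20newsgroups

raw = fetch_20newsgroups(subset="train")
clean = fetch_20newsgroups(subset="train", remove=("headers", "footers", "quotes"))

print(raw.data[0][:300])    # includes From:/Subject: headers and signatures
print("-" * 40)
print(clean.data[0][:300])  # body text only
```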
3

Krivosheev, Nikolay, and Vladimir Spicyn. "Machine Learning Methods for Classification Textual Information." In 29th International Conference on Computer Graphics, Image Processing and Computer Vision, Visualization Systems and the Virtual Environment GraphiCon'2019. Bryansk State Technical University, 2019. http://dx.doi.org/10.30987/graphicon-2019-1-266-269.

Abstract:
A method for classifying textual information based on convolutional neural networks is considered. The text preprocessing algorithm is presented: lemmatizing words, removing stop words, processing text characters, etc. The text is then converted word by word into dense vectors. Testing is carried out on the text data of "The 20 Newsgroups", a collection of approximately 20,000 news stories in English, divided (approximately) evenly among 20 different categories. The accuracy of the best c
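A minimal sketch of the preprocessing steps listed above (stop-word removal, lemmatization, character cleanup), assuming NLTK; the paper's exact pipeline and the subsequent convolutional network are not reproduced.

```python
# Minimal preprocessing sketch: lowercase, strip non-letters, drop stop words, lemmatize.
# The NLTK corpora below must be downloaded once before use.
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

nltk.download("stopwords", quiet=True)
nltk.download("wordnet", quiet=True)
nltk.download("omw-1.4", quiet=True)

STOP = set(stopwords.words("english"))
LEMMA = WordNetLemmatizer()

def preprocess(text):
    text = re.sub(r"[^a-z\s]", " ", text.lower())        # drop punctuation and digits
    return [LEMMA.lemmatize(t) for t in text.split() if t not in STOP]

print(preprocess("The 20 Newsgroups collection contains roughly 20,000 news stories."))
```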
4

Xun, Guangxu, Yaliang Li, Wayne Xin Zhao, Jing Gao, and Aidong Zhang. "A Correlated Topic Model Using Word Embeddings." In Twenty-Sixth International Joint Conference on Artificial Intelligence. International Joint Conferences on Artificial Intelligence Organization, 2017. http://dx.doi.org/10.24963/ijcai.2017/588.

Abstract:
Conventional correlated topic models are able to capture correlation structure among latent topics by replacing the Dirichlet prior with the logistic normal distribution. Word embeddings have been proven to be able to capture semantic regularities in language. Therefore, the semantic relatedness and correlations between words can be directly calculated in the word embedding space, for example, via cosine values. In this paper, we propose a novel correlated topic model using word embeddings. The proposed model enables us to exploit the additional word-level correlation information in word embed
5

Jiang, He, Yangqiu Song, Chenguang Wang, Ming Zhang, and Yizhou Sun. "Semi-supervised Learning over Heterogeneous Information Networks by Ensemble of Meta-graph Guided Random Walks." In Twenty-Sixth International Joint Conference on Artificial Intelligence. International Joint Conferences on Artificial Intelligence Organization, 2017. http://dx.doi.org/10.24963/ijcai.2017/270.

Abstract:
Heterogeneous information networks (HINs) are a general representation of many real-world applications. The difference between HINs and traditional homogeneous graphs is that the nodes and edges in a HIN have types. In many applications, we need to consider these types to make the approach more semantically meaningful. For applications where annotation is expensive, one natural way is to consider semi-supervised learning over HINs. In this paper, we present a semi-supervised learning algorithm constrained by the types of HINs. We first decompose the original HIN into several semantical