To see other types of publications on this topic, follow the link: Multivariate analysis. Natural language processing (Computer science).

Journal articles on the topic "Multivariate analysis. Natural language processing (Computer science)"

Create a correct citation in APA, MLA, Chicago, Harvard, and several other styles

Choose a source type:

Consult the top 50 journal articles for your research on the topic "Multivariate analysis. Natural language processing (Computer science)".

Next to every source in the list of references there is an "Add to bibliography" button. Click on it, and we will automatically generate the bibliographic citation for the chosen work in the citation style you prefer: APA, MLA, Harvard, Vancouver, Chicago, etc.

You can also download the full text of the scholarly publication as a PDF and read its abstract online whenever this information is included in the metadata.

Browse journal articles on a wide variety of disciplines and organize your bibliography correctly.

1

Duh, Kevin. "Bayesian Analysis in Natural Language Processing." Computational Linguistics 44, no. 1 (March 2018): 187–89. http://dx.doi.org/10.1162/coli_r_00310.

2

Zhao, Liping, Waad Alhoshan, Alessio Ferrari, Keletso J. Letsholo, Muideen A. Ajagbe, Erol-Valeriu Chioasca, and Riza T. Batista-Navarro. "Natural Language Processing for Requirements Engineering." ACM Computing Surveys 54, no. 3 (June 2021): 1–41. http://dx.doi.org/10.1145/3444689.

Abstract:
Natural Language Processing for Requirements Engineering (NLP4RE) is an area of research and development that seeks to apply natural language processing (NLP) techniques, tools, and resources to the requirements engineering (RE) process, to support human analysts to carry out various linguistic analysis tasks on textual requirements documents, such as detecting language issues, identifying key domain concepts, and establishing requirements traceability links. This article reports on a mapping study that surveys the landscape of NLP4RE research to provide a holistic understanding of the field. Following the guidance of systematic review, the mapping study is directed by five research questions, cutting across five aspects of NLP4RE research, concerning the state of the literature, the state of empirical research, the research focus, the state of tool development, and the usage of NLP technologies. Our main results are as follows: (i) we identify a total of 404 primary studies relevant to NLP4RE, which were published over the past 36 years and from 170 different venues; (ii) most of these studies (67.08%) are solution proposals, assessed by a laboratory experiment or an example application, while only a small percentage (7%) are assessed in industrial settings; (iii) a large proportion of the studies (42.70%) focus on the requirements analysis phase, with quality defect detection as their central task and requirements specification as their commonly processed document type; (iv) 130 NLP4RE tools (i.e., RE specific NLP tools) are extracted from these studies, but only 17 of them (13.08%) are available for download; (v) 231 different NLP technologies are also identified, comprising 140 NLP techniques, 66 NLP tools, and 25 NLP resources, but most of them—particularly those novel NLP techniques and specialized tools—are used infrequently; by contrast, commonly used NLP technologies are traditional analysis techniques (e.g., POS tagging and tokenization), general-purpose tools (e.g., Stanford CoreNLP and GATE) and generic language lexicons (WordNet and British National Corpus). The mapping study not only provides a collection of the literature in NLP4RE but also, more importantly, establishes a structure to frame the existing literature through categorization, synthesis and conceptualization of the main theoretical concepts and relationships that encompass both RE and NLP aspects. Our work thus produces a conceptual framework of NLP4RE. The framework is used to identify research gaps and directions, highlight technology transfer needs, and encourage more synergies between the RE community, the NLP one, and the software and systems practitioners. Our results can be used as a starting point to frame future studies according to a well-defined terminology and can be expanded as new technologies and novel solutions emerge.
3

Li, Yong, Xiaojun Yang, Min Zuo, Qingyu Jin, Haisheng Li, and Qian Cao. "Deep Structured Learning for Natural Language Processing." ACM Transactions on Asian and Low-Resource Language Information Processing 20, no. 3 (July 9, 2021): 1–14. http://dx.doi.org/10.1145/3433538.

Abstract:
The real-time and dissemination characteristics of network information make net-mediated public opinion an increasingly important resource for food safety early warning, but data growing at the petabyte (PB) scale also bring great difficulties to the research and judgment of network public opinion, especially how to extract the event roles of network public opinion from these data and how to analyze the sentiment tendency of public opinion comments. First, this article takes food safety network public opinion as its research point, and a BLSTM-CRF model for automatically labeling event roles is proposed by combining BLSTM with a conditional random field. Second, an attention mechanism based on food safety domain vocabulary is introduced: distance-related sequence semantic features are extracted by BLSTM, and sentiment classification of these sequence semantic features is realized using a CNN. An Att-BLSTM-CNN model for the analysis of public opinion and sentiment tendency in the field of food safety is proposed. Finally, based on the time series, this article combines the role extraction of food safety events with the analysis of sentiment tendency and constructs a net-mediated public opinion early warning model for the food safety field according to the heat of the event and the public's emotional intensity toward food safety public opinion events.
4

Wang, Dongyang, Junli Su, and Hongbin Yu. "Feature Extraction and Analysis of Natural Language Processing for Deep Learning English Language." IEEE Access 8 (2020): 46335–45. http://dx.doi.org/10.1109/access.2020.2974101.

5

Taskin, Zehra, and Umut Al. "Natural language processing applications in library and information science." Online Information Review 43, no. 4 (August 12, 2019): 676–90. http://dx.doi.org/10.1108/oir-07-2018-0217.

Abstract:
Purpose: With the recent developments in information technologies, natural language processing (NLP) practices have made tasks in many areas easier and more practical. Nowadays, especially when big data are used in most research, NLP provides fast and easy methods for processing these data. The purpose of this paper is to identify subfields of library and information science (LIS) where NLP can be used and to provide a guide based on bibliometrics and social network analyses for researchers who intend to study this subject. Design/methodology/approach: Within the scope of this study, 6,607 publications, including NLP methods published in the field of LIS, are examined and visualized by social network analysis methods. Findings: After evaluating the obtained results, the subject categories of publications, frequently used keywords in these publications and the relationships between these words are revealed. Finally, the core journals and articles are classified thematically for researchers working in the field of LIS and planning to apply NLP in their research. Originality/value: The results of this paper draw a general framework for the LIS field and guide researchers on new techniques that may be useful in the field.
6

Fairie, Paul, Zilong Zhang, Adam G. D'Souza, Tara Walsh, Hude Quan, and Maria J. Santana. "Categorising patient concerns using natural language processing techniques." BMJ Health & Care Informatics 28, no. 1 (June 2021): e100274. http://dx.doi.org/10.1136/bmjhci-2020-100274.

Abstract:
Objectives: Patient feedback is critical to identify and resolve patient safety and experience issues in healthcare systems. However, large volumes of unstructured text data can pose problems for manual (human) analysis. This study reports the results of using a semiautomated, computational topic-modelling approach to analyse a corpus of patient feedback. Methods: Patient concerns were received by Alberta Health Services between 2011 and 2018 (n=76 163), regarding 806 care facilities in 163 municipalities, including hospitals, clinics, community care centres and retirement homes, in a province of 4.4 million. Their existing framework requires manual labelling of pre-defined categories. We applied an automated latent Dirichlet allocation (LDA)-based topic modelling algorithm to identify the topics present in these concerns, and thereby produce a framework-free categorisation. Results: The LDA model produced 40 topics which, following manual interpretation by researchers, were reduced to 28 coherent topics. The most frequent topics identified were communication issues causing delays (frequency: 10.58%), community care for elderly patients (8.82%), interactions with nurses (8.80%) and emergency department care (7.52%). Many patient concerns were categorised into multiple topics. Some were more specific versions of categories from the existing framework (eg, communication issues causing delays), while others were novel (eg, smoking in inappropriate settings). Discussion: LDA-generated topics were more nuanced than the manually labelled categories. For example, LDA found that concerns with community care were related to concerns about nursing for seniors, providing opportunities for insight and action. Conclusion: Our findings outline the range of concerns patients share in a large health system and demonstrate the usefulness of using LDA to identify categories of patient concerns.
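The LDA-based categorisation this abstract describes can be illustrated with a minimal collapsed Gibbs sampler. This is a sketch of the general technique, not the authors' pipeline; the toy "patient feedback" corpus, the topic count, and the hyperparameters are all hypothetical.

```python
import random
from collections import defaultdict

def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Minimal collapsed Gibbs sampler for LDA over tokenized documents."""
    rng = random.Random(seed)
    vocab = {w for d in docs for w in d}
    V = len(vocab)
    # Count tables: document-topic, topic-word, topic totals.
    ndk = [[0] * n_topics for _ in docs]
    nkw = [defaultdict(int) for _ in range(n_topics)]
    nk = [0] * n_topics
    # Random initial topic assignment for every token.
    z = [[rng.randrange(n_topics) for _ in d] for d in docs]
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                ndk[d][k] -= 1; nkw[k][w] -= 1; nk[k] -= 1
                # Full conditional: p(k) is proportional to
                # (ndk + alpha) * (nkw + beta) / (nk + V*beta).
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + V * beta)
                           for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
    # Report the three highest-count words per topic.
    top_words = [sorted(nkw[t], key=nkw[t].get, reverse=True)[:3] for t in range(n_topics)]
    return z, top_words

# Hypothetical mini-corpus of tokenized complaints.
docs = [["wait", "delay", "phone"], ["nurse", "kind", "care"],
        ["delay", "wait", "wait"], ["care", "nurse", "nurse"]]
assignments, top_words = lda_gibbs(docs, n_topics=2)
```

In practice one would run a library implementation (e.g., gensim or scikit-learn) over the full corpus and, as the authors did, manually interpret and merge the resulting topics.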
7

Wei, Wei, Jinsong Wu, and Chunsheng Zhu. "Special issue on deep learning for natural language processing." Computing 102, no. 3 (January 9, 2020): 601–3. http://dx.doi.org/10.1007/s00607-019-00788-3.

8

Georgescu, Tiberiu-Marian. "Natural Language Processing Model for Automatic Analysis of Cybersecurity-Related Documents." Symmetry 12, no. 3 (March 2, 2020): 354. http://dx.doi.org/10.3390/sym12030354.

Abstract:
This paper describes the development and implementation of a natural language processing model based on machine learning which performs cognitive analysis for cybersecurity-related documents. A domain ontology was developed using a two-step approach: (1) the symmetry stage and (2) the machine adjustment. The first stage is based on the symmetry between the way humans represent a domain and the way machine learning solutions do. Therefore, the cybersecurity field was initially modeled based on the expertise of cybersecurity professionals. A dictionary of relevant entities was created; the entities were classified into 29 categories and later implemented as classes in a natural language processing model based on machine learning. After running successive performance tests, the ontology was remodeled from 29 to 18 classes. Using the ontology, a natural language processing model based on a supervised learning model was defined. We trained the model using sets of approximately 300,000 words. Remarkably, our model obtained an F1 score of 0.81 for named entity recognition and 0.58 for relation extraction, showing superior results compared to other similar models identified in the literature. Furthermore, in order to be easily used and tested, a web application that integrates our model as the core component was developed.
9

Gong, Yunlu, Nannan Lu, and Jiajian Zhang. "Application of deep learning fusion algorithm in natural language processing in emotional semantic analysis." Concurrency and Computation: Practice and Experience 31, no. 10 (October 2, 2018): e4779. http://dx.doi.org/10.1002/cpe.4779.

10

Mills, Michael T., and Nikolaos G. Bourbakis. "Graph-Based Methods for Natural Language Processing and Understanding—A Survey and Analysis." IEEE Transactions on Systems, Man, and Cybernetics: Systems 44, no. 1 (January 2014): 59–71. http://dx.doi.org/10.1109/tsmcc.2012.2227472.

11

Belov, Serey, Daria Zrelova, Petr Zrelov, and Vladimir Korenkov. "Overview of methods for automatic natural language text processing." System Analysis in Science and Education, no. 3 (2020) (September 30, 2020): 8–22. http://dx.doi.org/10.37005/2071-9612-2020-3-8-22.

Abstract:
This paper provides a brief overview of modern methods and approaches used for the automatic processing of text information. In the English-language literature this area of science is called NLP (Natural Language Processing). The name itself indicates that the subject of analysis (and, for many tasks, of synthesis) is material presented in one of the natural languages (and, for a number of tasks, in several languages simultaneously), i.e., the national languages people use to communicate with one another. Programming languages are not included in this group. In the Russian-language literature, this area is called computational (or mathematical) linguistics. NLP (computational linguistics) usually includes speech analysis along with text analysis, but speech analysis is not considered in this review. The review draws on original works, monographs, and a number of articles published in the "Open Systems.DBMS" journal.
12

Alarifi, Abdulaziz, and Ayed Alwadain. "An optimized cognitive-assisted machine translation approach for natural language processing." Computing 102, no. 3 (July 12, 2019): 605–22. http://dx.doi.org/10.1007/s00607-019-00741-4.

13

Kitson, Ezra, and Curtis A. Suttle. "VHost-Classifier: virus-host classification using natural language processing." Bioinformatics 35, no. 19 (March 1, 2019): 3867–69. http://dx.doi.org/10.1093/bioinformatics/btz151.

Abstract:
Motivation: When analyzing viral metagenomic sequences, it is often desired to filter the results of a BLAST analysis by the host species of the virus. VHost-Classifier automates this procedure using a natural language processing algorithm written in Python 3, which takes a list of taxonomic identifiers (taxids) returned from a BLAST query using viral sequences as input. The taxid output is binned by the evolutionary lineage of their host, based on string matching the words in their English names. If VHost-Classifier cannot identify a host, it attempts to bin the sequences by the environment from which the sample originated. VHost-Classifier predicts the evolutionary lineage of the host from the virus name and does not rely on referencing taxids against a database; therefore, it is not constrained by the size of a database and can host classify newly characterized viruses. Results: Benchmarked on a test dataset of 1000 randomly selected viral taxids on the NCBI taxonomy database, VHost-Classifier assigned, with 100% accuracy, a host to the rank of Class for >93% of viruses, and to the rank of Family for >37% of viruses. Availability and implementation: For more information about VHost-Classifier as well as implementation instructions, visit https://github.com/Kzra/VHost-Classifier. Supplementary information: Supplementary data are available at Bioinformatics online.
14

Chen, Jinying, Huaigu Cao, and Premkumar Natarajan. "Integrating natural language processing with image document analysis: what we learned from two real-world applications." International Journal on Document Analysis and Recognition (IJDAR) 18, no. 3 (May 28, 2015): 235–47. http://dx.doi.org/10.1007/s10032-015-0247-x.

15

Chen, Jinyan, Susanne Becken, and Bela Stantic. "Lexicon based Chinese language sentiment analysis method." Computer Science and Information Systems 16, no. 2 (2019): 639–55. http://dx.doi.org/10.2298/csis181015013c.

Abstract:
The growing number of social media users and the vast volume of posts could provide valuable information about sentiment toward different locations, services, and people. Recent advances in Big Data analytics and natural language processing often make it possible to automatically calculate the sentiment in these posts. Sentiment analysis is a challenging and computationally demanding task due to the volume of data, misspellings, emoticons, and abbreviations. While significant work has been directed toward the sentiment analysis of English text, there has been limited attention in the literature to sentiment analysis of the Chinese language. In this work we propose a method to identify the sentiment in Chinese social media posts; to test our method we rely on posts about the Great Barrier Reef sent by users of the most popular Chinese social media platform, Sina Weibo. We elaborate the process of capturing weibo posts, describe the creation of a lexicon, and develop and explain an algorithm for sentiment calculation. In a case study related to sentiment toward different GBR destinations, we demonstrate that the proposed method is effective in obtaining the information and is suitable for monitoring visitors' opinions.
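The lexicon-based scoring this abstract describes can be sketched in a few lines: sum the polarity of each token found in a sentiment lexicon, flipping the sign after a negator. The mini-lexicon and example sentences below are hypothetical (and in English for readability); the paper's actual lexicon is Chinese and far larger.

```python
# Hypothetical mini-lexicon; a real lexicon has thousands of scored entries.
LEXICON = {"beautiful": 2, "amazing": 3, "crowded": -1, "dirty": -2, "terrible": -3}
NEGATORS = {"not", "never", "no"}

def lexicon_sentiment(tokens):
    """Sum lexicon polarities over tokens, flipping the sign after a negator."""
    score = 0
    for i, tok in enumerate(tokens):
        polarity = LEXICON.get(tok, 0)
        if polarity and i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity  # e.g., "not amazing" becomes negative
        score += polarity
    return score

print(lexicon_sentiment("the reef is beautiful but crowded".split()))  # 2 - 1 = 1
print(lexicon_sentiment("not amazing and terrible".split()))           # -3 - 3 = -6
```

For Chinese text a word-segmentation step would precede scoring, since tokens are not whitespace-delimited.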
16

Chen, Xieling, Ruoyao Ding, Kai Xu, Shan Wang, Tianyong Hao, and Yi Zhou. "A Bibliometric Review of Natural Language Processing Empowered Mobile Computing." Wireless Communications and Mobile Computing 2018 (June 28, 2018): 1–21. http://dx.doi.org/10.1155/2018/1827074.

Abstract:
Natural Language Processing (NLP) empowered mobile computing is the use of NLP techniques in the context of mobile environments. Research in this field has drawn much attention given the continually increasing number of publications in the last five years. This study presents the status and development trend of the research field through an objective, systematic, and comprehensive review of relevant publications available from Web of Science. Analysis techniques including descriptive statistics, geographic visualization, social network analysis, latent Dirichlet allocation, and affinity propagation clustering are used. We quantitatively analyze the publications in terms of statistical characteristics, geographical distribution, cooperation relationships, and topic discovery and distribution. This systematic analysis of the field illustrates the publications' evolution over time and identifies current research interests and potential directions for future research. Our work can potentially assist researchers in keeping abreast of the research status. It can also help monitor new scientific and technological developments in the research field.
17

Nawab, Khalid, Gretchen Ramsey, and Richard Schreiber. "Natural Language Processing to Extract Meaningful Information from Patient Experience Feedback." Applied Clinical Informatics 11, no. 02 (March 2020): 242–52. http://dx.doi.org/10.1055/s-0040-1708049.

Abstract:
Background: Due to reimbursement tied in part to patients' perception of their care, hospitals continue to stress obtaining patient feedback and understanding it to plan interventions to improve patients' experience. We demonstrate the use of natural language processing (NLP) to extract meaningful information from patient feedback obtained through Press Ganey surveys. Methods: The first step was to standardize textual data programmatically using NLP libraries. This included correcting spelling mistakes, converting text to lowercase, and removing words that most likely did not carry useful information. Next, we converted numeric data pertaining to each category based on sentiment and care aspect into charts. We selected care aspect categories where there were more negative comments for more in-depth study. Using NLP, we made tables of most frequently appearing words, adjectives, and bigrams. Comments with frequent words/combinations underwent further study manually to understand factors contributing to negative patient feedback. We then used the positive and negative comments as the training dataset for a neural network to perform sentiment analysis on sentences obtained by splitting mixed reviews. Results: We found that most of the comments were about doctors and nurses, confirming the important role patients ascribed to these two in patient care. "Room," "discharge" and "tests and treatments" were the three categories that had more negative than positive comments. We then tabulated commonly appearing words, adjectives, and two-word combinations. We found that climate control, housekeeping and noise levels in the room, time delays in discharge paperwork, conflicting information about discharge plan, frequent blood draws, and needle sticks were major contributors to negative patient feedback. None of this information was available from numeric data alone. Conclusion: NLP is an effective tool to gain insight from raw textual patient feedback to extract meaningful information, making it a powerful tool in processing large amounts of patient feedback efficiently.
18

Pujeri, Bhagyashree P., and Jagadeesh Sai D. "An Anatomization of Language Detection and Translation using NLP Techniques." International Journal of Innovative Technology and Exploring Engineering 10, no. 2 (December 10, 2020): 69–77. http://dx.doi.org/10.35940/ijitee.b8265.1210220.

Abstract:
Language identification is the process of determining the natural language in which a given text is written. It is one of the big difficulties in natural language processing, and it also poses a multiclass classification problem in this area. Language detection and translation are significant language identification tasks. If the source language is known, language analysis may be carried out with the tools available for that particular language. A successful language detection algorithm determines the success of sentiment analysis and other identification tasks. Natural language processing and machine learning techniques require knowledge that is annotated with its language, and natural language processing algorithms must be adapted to the grammar of the language. This paper proposes a secure language detection and translation technique to address security in natural language processing. A language detection algorithm based on a char n-gram statistical detector is used, with the Yandex API for translation. During translation, encryption and decryption are performed using the AES algorithm.
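A char n-gram statistical detector of the kind this abstract mentions can be sketched by comparing the character-trigram profile of an input against per-language profiles. The two tiny training strings below are hypothetical stand-ins for real corpora; a production detector trains on far more text per language.

```python
from collections import Counter
from math import sqrt

def char_ngrams(text, n=3):
    """Character n-gram counts, with padding so word boundaries are captured."""
    text = f" {text.lower()} "
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[g] * b[g] for g in a if g in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical mini training samples; real profiles come from large corpora.
PROFILES = {
    "en": char_ngrams("the quick brown fox jumps over the lazy dog and the cat"),
    "fr": char_ngrams("le renard brun saute par dessus le chien et le chat paresseux"),
}

def detect_language(text):
    """Return the language whose trigram profile is closest to the input's."""
    grams = char_ngrams(text)
    return max(PROFILES, key=lambda lang: cosine(grams, PROFILES[lang]))

print(detect_language("the dog and the fox"))
print(detect_language("le chien et le renard"))
```

With profiles this small the detector only works on inputs sharing vocabulary with the training strings; scale, smoothing, and rank-based distance measures are what make real detectors robust.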
19

Kovács, László. "Classification Method for Learning Morpheme Analysis." Journal of Information Technology Research 5, no. 4 (October 2012): 85–98. http://dx.doi.org/10.4018/jitr.2012100106.

Abstract:
The morpheme analysis module is an important component in natural language processing engines. The parser modules are usually based on rule systems created by human experts. In the paper, a novel approach is tested for implementation of the morpheme analyzer module. The proposed structure is based on the theory of formal concept analysis. The word inflection can be considered as a classification problem, where the class label denotes the corresponding transformation rule. The main benefit of the proposed method is the efficient generalization feature. The proposed morpheme analyzer module was implemented in a prototype question generation application.
20

Al-Makhadmeh, Zafer, and Amr Tolba. "Automatic hate speech detection using killer natural language processing optimizing ensemble deep learning approach." Computing 102, no. 2 (August 1, 2019): 501–22. http://dx.doi.org/10.1007/s00607-019-00745-0.

21

Sun, Xu, Wenjie Li, Houfeng Wang, and Qin Lu. "Feature-Frequency–Adaptive On-line Training for Fast and Accurate Natural Language Processing." Computational Linguistics 40, no. 3 (September 2014): 563–86. http://dx.doi.org/10.1162/coli_a_00193.

Abstract:
Training speed and accuracy are two major concerns of large-scale natural language processing systems. Typically, we need to make a tradeoff between speed and accuracy. It is trivial to improve the training speed via sacrificing accuracy or to improve the accuracy via sacrificing speed. Nevertheless, it is nontrivial to improve the training speed and the accuracy at the same time, which is the target of this work. To reach this target, we present a new training method, feature-frequency–adaptive on-line training, for fast and accurate training of natural language processing systems. It is based on the core idea that higher frequency features should have a learning rate that decays faster. Theoretical analysis shows that the proposed method is convergent with a fast convergence rate. Experiments are conducted based on well-known benchmark tasks, including named entity recognition, word segmentation, phrase chunking, and sentiment analysis. These tasks consist of three structured classification tasks and one non-structured classification task, with binary features and real-valued features, respectively. Experimental results demonstrate that the proposed method is faster and at the same time more accurate than existing methods, achieving state-of-the-art scores on the tasks with different characteristics.
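The core idea stated in the abstract, that higher-frequency features should have a faster-decaying learning rate, can be illustrated with a mistake-driven linear classifier whose per-feature step size shrinks with that feature's observed frequency. This is an illustrative sketch in the spirit of the idea, not the paper's exact update rule, and the toy feature data are hypothetical.

```python
import math
from collections import defaultdict

class FreqAdaptiveClassifier:
    """Binary linear classifier with per-feature learning rates that decay
    with feature frequency (sketch of feature-frequency-adaptive training)."""

    def __init__(self, eta=0.5):
        self.w = defaultdict(float)    # sparse weight vector
        self.freq = defaultdict(int)   # how often each feature triggered an update
        self.eta = eta

    def score(self, feats):
        return sum(self.w[f] for f in feats)

    def update(self, feats, label):
        """Mistake-driven update; label is -1 or +1."""
        if label * self.score(feats) <= 0:
            for f in feats:
                self.freq[f] += 1
                # Frequent features take smaller, faster-decaying steps.
                self.w[f] += label * self.eta / math.sqrt(self.freq[f])

clf = FreqAdaptiveClassifier()
data = [(["good", "great"], 1), (["bad", "awful"], -1),
        (["good"], 1), (["awful"], -1)]
for _ in range(5):                 # a few online passes over hypothetical data
    for feats, y in data:
        clf.update(feats, y)

print(clf.score(["good", "great"]) > 0)  # True
print(clf.score(["bad", "awful"]) < 0)   # True
```

The same decay schedule (step proportional to 1/sqrt of accumulated frequency) is what AdaGrad-style optimizers apply per parameter; the paper's contribution is analyzing and tuning this behavior for sparse NLP feature sets.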
22

Jang, Hyejin, and Byungun Yoon. "TechWordNet: Development of semantic relation for technology information analysis using F-term and natural language processing." Information Processing & Management 58, no. 6 (November 2021): 102752. http://dx.doi.org/10.1016/j.ipm.2021.102752.

23

Karbab, ElMouatez Billah, and Mourad Debbabi. "MalDy: Portable, data-driven malware detection using natural language processing and machine learning techniques on behavioral analysis reports." Digital Investigation 28 (April 2019): S77–S87. http://dx.doi.org/10.1016/j.diin.2019.01.017.

24

Khanbhai, Mustafa, Patrick Anyadi, Joshua Symons, Kelsey Flott, Ara Darzi, and Erik Mayer. "Applying natural language processing and machine learning techniques to patient experience feedback: a systematic review." BMJ Health & Care Informatics 28, no. 1 (March 2021): e100262. http://dx.doi.org/10.1136/bmjhci-2020-100262.

Abstract:
Objectives: Unstructured free-text patient feedback contains rich information, and analysing these data manually would require a lot of personnel resources which are not available in most healthcare organisations. To undertake a systematic review of the literature on the use of natural language processing (NLP) and machine learning (ML) to process and analyse free-text patient experience data. Methods: Databases were systematically searched to identify articles published between January 2000 and December 2019 examining NLP to analyse free-text patient feedback. Due to the heterogeneous nature of the studies, a narrative synthesis was deemed most appropriate. Data related to the study purpose, corpus, methodology, performance metrics and indicators of quality were recorded. Results: Nineteen articles were included. The majority (80%) of studies applied language analysis techniques on patient feedback from social media sites (unsolicited) followed by structured surveys (solicited). Supervised learning was frequently used (n=9), followed by unsupervised (n=6) and semisupervised (n=3). Comments extracted from social media were analysed using an unsupervised approach, and free-text comments held within structured surveys were analysed using a supervised approach. Reported performance metrics included the precision, recall and F-measure, with support vector machine and Naïve Bayes being the best performing ML classifiers. Conclusion: NLP and ML have emerged as important tools for processing unstructured free text. Both supervised and unsupervised approaches have their role depending on the data source. With the advancement of data analysis tools, these techniques may be useful to healthcare organisations to generate insight from the volumes of unstructured free-text data.
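Naive Bayes, reported in this review as one of the best-performing classifiers, is simple enough to sketch from scratch. The multinomial variant with add-one smoothing below is a generic illustration; the labelled "patient comment" tokens are hypothetical.

```python
import math
from collections import Counter

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, c in zip(docs, labels):
            self.counts[c].update(doc)
        self.vocab = {w for c in self.classes for w in self.counts[c]}
        self.total = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        def log_posterior(c):
            denom = self.total[c] + len(self.vocab)  # smoothed denominator
            return self.prior[c] + sum(
                math.log((self.counts[c][w] + 1) / denom) for w in doc)
        return max(self.classes, key=log_posterior)

# Hypothetical labelled patient comments (tokenized).
docs = [["long", "wait", "rude"], ["kind", "helpful", "staff"],
        ["wait", "delay"], ["helpful", "caring"]]
labels = ["neg", "pos", "neg", "pos"]
nb = NaiveBayes().fit(docs, labels)
print(nb.predict(["rude", "wait"]))    # "neg"
print(nb.predict(["kind", "caring"]))  # "pos"
```

The supervised pipelines the review covers wrap exactly this kind of classifier in feature extraction (TF-IDF, n-grams) and cross-validation, typically via a library such as scikit-learn.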
25

Mahendhiran, P. D., and S. Kannimuthu. "Deep Learning Techniques for Polarity Classification in Multimodal Sentiment Analysis." International Journal of Information Technology & Decision Making 17, no. 03 (May 2018): 883–910. http://dx.doi.org/10.1142/s0219622018500128.

Abstract:
Contemporary research in Multimodal Sentiment Analysis (MSA) using deep learning is becoming popular in Natural Language Processing. Enormous amounts of data are obtainable every day from social media such as Facebook, WhatsApp, YouTube, Twitter, and microblogs. When dealing with such large multimodal data, it is difficult to identify the relevant information from social media websites; hence, there is a need for improved, intelligent MSA. Here, deep learning is used to improve the understanding and performance of MSA. Deep learning delivers automatic feature extraction and helps achieve the best performance for a combined model that integrates linguistic, acoustic, and video information extraction methods. This paper focuses on the various techniques used for classifying a given portion of natural language text, audio, and video according to the thoughts, feelings, or opinions expressed in it, i.e., whether the general attitude is neutral, positive, or negative. From the results, it is observed that the deep learning classification algorithm gives better results than other machine learning classifiers such as KNN, Naive Bayes, Random Forest, Random Tree, and Neural Net models. The proposed deep learning MSA identifies sentiment in web videos; in preliminary proof-of-concept experiments using the ICT-YouTube dataset, our proposed multimodal system achieved an accuracy of 96.07%.
26

Mahmoud, Adnen, and Mounir Zrigui. "Semantic Similarity Analysis for Corpus Development and Paraphrase Detection in Arabic." International Arab Journal of Information Technology 18, no. 1 (December 31, 2020): 1–7. http://dx.doi.org/10.34028/iajit/18/1/1.

Abstract:
Paraphrase detection determines whether original and suspect documents convey the same meaning. It has attracted attention from researchers in many Natural Language Processing (NLP) tasks such as plagiarism detection, question answering, and information retrieval. Traditional methods (e.g., Term Frequency-Inverse Document Frequency (TF-IDF), Latent Dirichlet Allocation (LDA), and Latent Semantic Analysis (LSA)) cannot efficiently capture hidden semantic relations when sentences share no common words or when the co-occurrence of words is rare. Therefore, we propose a deep learning model based on Global Word embedding (GloVe) and a Recurrent Convolutional Neural Network (RCNN), which efficiently captures contextual dependencies between word vectors with precise semantic meanings. Given the lack of publicly available resources for the Arabic language, we developed a paraphrase corpus automatically; it preserves the syntactic and semantic structures of Arabic sentences using a word2vec model and Part-Of-Speech (POS) annotation. Overall, experiments showed that our proposed model outperforms state-of-the-art methods in terms of precision and recall.
27

Ko, Ching-Ru, and Hsien-Tsung Chang. "LSTM-based sentiment analysis for stock price forecast". PeerJ Computer Science 7 (11 March 2021): e408. http://dx.doi.org/10.7717/peerj-cs.408.

Full text
Abstract:
Investing in stocks is an important tool for modern people’s financial management, and how to forecast stock prices has become an important issue. In recent years, deep learning methods have successfully solved many forecast problems. In this paper, we utilized multiple factors for the stock price forecast. News articles and PTT forum discussions are taken as the fundamental analysis, and the stock’s historical transaction information is treated as technical analysis. The state-of-the-art natural language processing tool BERT is used to recognize the sentiments of text, and the long short-term memory neural network (LSTM), which is good at analyzing time series data, is applied to forecast the stock price with stock historical transaction information and text sentiments. According to experimental results using our proposed models, the average root mean square error (RMSE) shows an accuracy improvement of 12.05.
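The RMSE metric reported above is straightforward to compute; a minimal sketch (the series in the usage note are made-up numbers, not the paper's data):

```python
import math

def rmse(actual, predicted):
    """Root mean square error between two equal-length price series."""
    assert len(actual) == len(predicted) and actual
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))
```

For example, `rmse([100.0, 101.0], [100.0, 101.0])` is `0.0`, and larger deviations between forecast and realized prices raise the score quadratically before the square root.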
28

Wiebe, Janyce, Theresa Wilson, Rebecca Bruce, Matthew Bell, and Melanie Martin. "Learning Subjective Language". Computational Linguistics 30, no. 3 (September 2004): 277–308. http://dx.doi.org/10.1162/0891201041850885.

Full text
Abstract:
Subjectivity in natural language refers to aspects of language used to express opinions, evaluations, and speculations. There are numerous natural language processing applications for which subjectivity analysis is relevant, including information extraction and text categorization. The goal of this work is learning subjective language from corpora. Clues of subjectivity are generated and tested, including low-frequency words, collocations, and adjectives and verbs identified using distributional similarity. The features are also examined working together in concert. The features, generated from different data sets using different procedures, exhibit consistency in performance in that they all do better and worse on the same data sets. In addition, this article shows that the density of subjectivity clues in the surrounding context strongly affects how likely it is that a word is subjective, and it provides the results of an annotation study assessing the subjectivity of sentences with high-density features. Finally, the clues are used to perform opinion piece recognition (a type of text categorization and genre detection) to demonstrate the utility of the knowledge acquired in this article.
29

Garnsey, Margaret R., and Ingrid E. Fisher. "Appearance of New Terms in Accounting Language: A Preliminary Examination of Accounting Pronouncements and Financial Statements". Journal of Emerging Technologies in Accounting 5, no. 1 (1 January 2008): 17–36. http://dx.doi.org/10.2308/jeta.2008.5.1.17.

Full text
Abstract:
ABSTRACT: Accounting language evolves as the transactions and organizations it provides guidance for change. We provide a preliminary analysis of terms used in official accounting pronouncements and annual corporate financial statements. Initial results show statistical natural language-processing techniques provide a means of identifying new terms as they enter the lexicon. These techniques should be valuable in deriving a complete accounting lexicon as well as in constructing and maintaining an accounting thesaurus to support information retrieval.
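One simple statistical technique for spotting candidate new terms, in the spirit of the study above, is a frequency-filtered vocabulary difference between an older and a newer corpus. The tokenizer and threshold below are illustrative assumptions, not the authors' method:

```python
import re
from collections import Counter

def new_terms(old_corpus, new_corpus, min_count=2):
    """Return terms that occur at least `min_count` times in the new
    period but never in the old one -- candidates for lexicon updates."""
    tokenize = lambda text: re.findall(r"[a-z]+", text.lower())
    old_vocab = set(tokenize(old_corpus))
    new_counts = Counter(tokenize(new_corpus))
    return sorted(t for t, c in new_counts.items()
                  if c >= min_count and t not in old_vocab)
```

The frequency floor filters out one-off typos and names, which a thesaurus maintainer would not want surfaced as lexicon candidates.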
30

Leeson, William, Adam Resnick, Daniel Alexander, and John Rovers. "Natural Language Processing (NLP) in Qualitative Public Health Research: A Proof of Concept Study". International Journal of Qualitative Methods 18 (1 January 2019): 160940691988702. http://dx.doi.org/10.1177/1609406919887021.

Full text
Abstract:
Qualitative data-analysis methods provide thick, rich descriptions of subjects’ thoughts, feelings, and lived experiences but may be time-consuming, labor-intensive, or prone to bias. Natural language processing (NLP) is a machine learning technique from computer science that uses algorithms to analyze textual data. NLP allows processing of large amounts of data almost instantaneously. As researchers become conversant with NLP, it is becoming more frequently employed outside of computer science and shows promise as a tool to analyze qualitative data in public health. This is a proof of concept paper to evaluate the potential of NLP to analyze qualitative data. Specifically, we ask if NLP can support conventional qualitative analysis, and if so, what its role is. We compared a qualitative method of open coding with two forms of NLP, Topic Modeling, and Word2Vec to analyze transcripts from interviews conducted in rural Belize querying men about their health needs. All three methods returned a series of terms that captured ideas and concepts in subjects’ responses to interview questions. Open coding returned 5–10 words or short phrases for each question. Topic Modeling returned a series of word-probability pairs that quantified how well a word captured the topic of a response. Word2Vec returned a list of words for each interview question ordered by which words were predicted to best capture the meaning of the passage. For most interview questions, all three methods returned conceptually similar results. NLP may be a useful adjunct to qualitative analysis. NLP may be performed after data have undergone open coding as a check on the accuracy of the codes. Alternatively, researchers can perform NLP prior to open coding and use the results to guide their creation of their codebook.
31

Shi, Lei, Yulin Zhu, Youpeng Zhang, and Zhongji Su. "Fault Diagnosis of Signal Equipment on the Lanzhou-Xinjiang High-Speed Railway Using Machine Learning for Natural Language Processing". Complexity 2021 (28 July 2021): 1–13. http://dx.doi.org/10.1155/2021/9126745.

Full text
Abstract:
The Lanzhou-Xinjiang (Lan-Xin) high-speed railway is one of the principal sections of the railway network in western China, and signal equipment is of great importance in ensuring the safe and efficient operation of the high-speed railway. Over a long period, in the railway operation and maintenance process, the railway signaling and communications department has recorded a large amount of unstructured text information about equipment faults in the form of natural language. However, due to irregularities in the recording methods of these data, it is difficult to use directly. In this paper, a method based on natural language processing (NLP) was adopted to analyze and classify this information. First, the Latent Dirichlet Allocation (LDA) topic model was used to extract the semantic features of the text, which were then expressed in the corresponding topic feature space. Next, the Support Vector Machine (SVM) algorithm was used to construct a signal equipment fault diagnostic model that reduced the impact of sample data imbalance on the classification accuracy. This was compared and analyzed with the traditional Naive Bayes (NB), Logistic Regression (LR), Random Forest (RF), and K-Nearest Neighbor (KNN) algorithms. This study used signal equipment failure text data from the Lan-Xin high-speed railway to conduct experimental analysis and verify the effectiveness of the proposed method. Experiments showed that the accuracy of the SVM classification algorithm could reach 0.84 after being combined with the LDA topic model, which verifies that the natural language processing method can effectively realize the fault diagnosis of signal equipment and has certain guiding significance for the maintenance of field signal equipment.
32

Liu, Guangfeng, Xianying Huang, Xiaoyang Liu, and Anzhi Yang. "A Novel Aspect-based Sentiment Analysis Network Model Based on Multilingual Hierarchy in Online Social Network". Computer Journal 63, no. 3 (3 May 2019): 410–24. http://dx.doi.org/10.1093/comjnl/bxz031.

Full text
Abstract:
Abstract In recent years, aspect-based sentiment analysis has become one of the research hotspots in the field of natural language processing. Because existing network models cannot fully capture the interrelationship between sentences in the same comment or the long-distance dependence of specific aspects across the whole comment, a multilingual deep hierarchical model combining a regional convolutional neural network and a bidirectional LSTM network is proposed. The model obtains the temporal relationship of different sentences in the comments through the regional CNN, and obtains the local features of specific aspects in a sentence and the long-distance dependence in the whole comment through a hierarchical attention network. In addition, the model improves the word vector representation with a gate mechanism to make the model language-independent. Experimental results on multilingual datasets from different domains show that the proposed model achieves better classification results than the traditional deep network model and the network model that combines the attention mechanism and considers the relationship between sentences.
33

Hammad, Mahmoud, Mohammad Al-Smadi, Qanita Baker, Muntaha D, Nour Al-Khdour, Mutaz Younes, and Enas Khwaileh. "Question to Question Similarity Analysis Using Morphological, Syntactic, Semantic, and Lexical Features". JUCS - Journal of Universal Computer Science 26, no. 6 (28 June 2020): 671–97. http://dx.doi.org/10.3897/jucs.2020.036.

Full text
Abstract:
In the digitally connected world that we are living in, people expect to get answers to their questions spontaneously. This expectation increased the burden on Question/Answer platforms such as Stack Overflow and many others. A promising solution to this problem is to detect if a question being asked is similar to a question in the database, then present the answer of the detected question to the user. To address this challenge, we propose a novel Natural Language Processing (NLP) approach that detects if two Arabic questions are similar or not using their extracted morphological, syntactic, semantic, lexical, overlapping, and semantic lexical features. Our approach involves several phases including Arabic text processing, novel feature extraction, and text classification. Moreover, we conducted a comparison between seven different machine learning classifiers: Support Vector Machine (SVM), Decision Tree (DT), Logistic Regression (LR), Extreme Gradient Boosting (XGB), Random Forests (RF), Adaptive Boosting (AdaBoost), and Multilayer Perceptron (MLP). To conduct our experiments, we used a real-world questions dataset consisting of around 19,136 questions (9,568 pairs of questions), on which our approach achieved 82.93% accuracy using our XGB model on the best features selected by the Random Forest feature selection technique. This high accuracy shows the ability of our approach to correctly detect similar Arabic questions and hence increases user satisfaction.
34

Giménez, Maite, Javier Palanca, and Vicent Botti. "Semantic-based padding in convolutional neural networks for improving the performance in natural language processing. A case of study in sentiment analysis". Neurocomputing 378 (February 2020): 315–23. http://dx.doi.org/10.1016/j.neucom.2019.08.096.

Full text
35

Vlahovic, Nikola, Andrija Brljak, and Mirjana Pejic-Bach. "Ontology-Based Analysis of Website Structure for Benchmarking in Retail Business". International Journal of Web Portals 13, no. 1 (January 2021): 1–19. http://dx.doi.org/10.4018/ijwp.2021010101.

Full text
Abstract:
With the growing trend of digital transformation, electronic business has become a crucial part of business operations for many companies. Some of the most critical steps in successful transformation pertain not only to data and information acquisition and digitalization but also to adequate publishing and organization of these information resources. Companies share information through their websites, so the structuring of the web content becomes critical during the digital transformation of business. In this paper, the authors present a conceptual model of the managerial tool that is based on semantic analysis of web sites in order to obtain information about the structure of web sites in a particular domain. The resulting ontological model contains information about best practices in web information organization and can be a valuable resource for management when deciding on the organization of their own web content. The system is based on grounded theory and uses current information retrieval methods, natural language processing, semantic networks, and ontologies.
36

Hancox, Peter, and Nikolaos Polatidis. "An evaluation of keyword, string similarity and very shallow syntactic matching for a university admissions processing infobot". Computer Science and Information Systems 10, no. 4 (2013): 1703–26. http://dx.doi.org/10.2298/csis121202065h.

Full text
Abstract:
"Infobots" are small-scale natural language question answering systems drawing inspiration from ELIZA-type systems. Their key distinguishing feature is the extraction of meaning from users' queries without the use of syntactic or semantic representations. Three approaches to identifying the users' intended meanings were investigated: keyword-based systems, Jaro-based string similarity algorithms and matching based on very shallow syntactic analysis. These were measured against a corpus of queries contributed by users of a WWW-hosted infobot for responding to questions about applications to MSc courses. The most effective system was Jaro with stemmed input (78.57%). It also was able to process ungrammatical input and offer scalability.
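The Jaro string similarity behind the best-performing system can be sketched as follows. This is the standard formulation (match window, one-to-one character matching, half-counted transpositions), not necessarily the authors' exact implementation:

```python
def jaro(s1, s2):
    """Jaro string similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if not len1 or not len2:
        return 0.0
    window = max(len1, len2) // 2 - 1          # how far apart matches may sit
    match1 = [False] * len1
    match2 = [False] * len2
    matches = 0
    for i, c in enumerate(s1):                 # greedy one-to-one matching
        lo, hi = max(0, i - window), min(len2, i + window + 1)
        for j in range(lo, hi):
            if not match2[j] and s2[j] == c:
                match1[i] = match2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # transpositions: matched characters that appear out of relative order
    seq1 = (s1[i] for i in range(len1) if match1[i])
    seq2 = (s2[j] for j in range(len2) if match2[j])
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (matches / len1 + matches / len2 +
            (matches - transpositions) / matches) / 3
```

On the textbook example, `jaro("MARTHA", "MARHTA")` yields 17/18 ≈ 0.944: all six characters match, and the swapped "TH"/"HT" pair counts as one transposition.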
37

Chao, Min-Hua, Amy J. C. Trappey, and Chun-Ting Wu. "Emerging Technologies of Natural Language-Enabled Chatbots: A Review and Trend Forecast Using Intelligent Ontology Extraction and Patent Analytics". Complexity 2021 (24 May 2021): 1–26. http://dx.doi.org/10.1155/2021/5511866.

Full text
Abstract:
Natural language processing (NLP) is a critical part of the digital transformation. NLP enables user-friendly interactions between machine and human by making computers understand human languages. Intelligent chatbot is an essential application of NLP to allow understanding of users’ utterance and responding in understandable sentences for specific applications simulating human-to-human conversations and interactions for problem solving or Q&As. This research studies emerging technologies for NLP-enabled intelligent chatbot development using a systematic patent analytic approach. Some intelligent text-mining techniques are applied, including document term frequency analysis for key terminology extractions, clustering method for identifying the subdomains, and Latent Dirichlet Allocation for finding the key topics of patent set. This research utilizes the Derwent Innovation database as the main source for global intelligent chatbot patent retrievals.
38

Almuqren, Latifah, and Alexandra Cristea. "AraCust: a Saudi Telecom Tweets corpus for sentiment analysis". PeerJ Computer Science 7 (20 May 2021): e510. http://dx.doi.org/10.7717/peerj-cs.510.

Full text
Abstract:
Comparing Arabic to other languages, Arabic lacks large corpora for Natural Language Processing (Assiri, Emam & Al-Dossari, 2018; Gamal et al., 2019). A number of scholars have depended on translation from one language to another to construct their corpus (Rushdi-Saleh et al., 2011). This paper presents how we have constructed, cleaned, pre-processed, and annotated our 20,000-tweet Gold Standard Corpus (GSC) AraCust, the first Telecom GSC for Arabic Sentiment Analysis (ASA) for Dialectal Arabic (DA). AraCust contains Saudi dialect tweets, processed from a self-collected Arabic tweets dataset, and has been annotated for sentiment analysis, i.e., manually labelled (k=0.60). In addition, we have illustrated AraCust’s power by performing an exploratory data analysis of the features sourced from the nature of our corpus, to assist with choosing the right ASA methods for it. To evaluate our Gold Standard corpus AraCust, we first applied a simple experiment, using a supervised classifier, to offer benchmark outcomes for forthcoming works. In addition, we applied the same supervised classifier to a publicly available Arabic dataset created from Twitter, ASTD (Nabil, Aly & Atiya, 2015). The result shows that our dataset AraCust outperforms the ASTD result with 91% accuracy and an 89% F1avg score. The AraCust corpus will be released, together with code useful for its exploration, via GitHub as a part of this submission.
39

Iqbal, Sehrish, Saeed-Ul Hassan, Naif Radi Aljohani, Salem Alelyani, Raheel Nawaz, and Lutz Bornmann. "A decade of in-text citation analysis based on natural language processing and machine learning techniques: an overview of empirical studies". Scientometrics 126, no. 8 (23 June 2021): 6551–99. http://dx.doi.org/10.1007/s11192-021-04055-1.

Full text
40

Smadi, Mohammad Al, Islam Obaidat, Mahmoud Al-Ayyoub, Rami Mohawesh, and Yaser Jararweh. "Using Enhanced Lexicon-Based Approaches for the Determination of Aspect Categories and Their Polarities in Arabic Reviews". International Journal of Information Technology and Web Engineering 11, no. 3 (July 2016): 15–31. http://dx.doi.org/10.4018/ijitwe.2016070102.

Full text
Abstract:
Sentiment Analysis (SA) is the process of determining the sentiment of a text written in a natural language to be positive, negative or neutral. It is one of the most interesting subfields of natural language processing (NLP) and Web mining due to its diverse applications and the challenges associated with applying it on the massive amounts of textual data available online (especially, on social networks). Most of the current work on SA focus on the English language and work on the sentence-level or the document-level. This work focuses on the less studied version of SA, which is aspect-based SA (ABSA) for the Arabic language. Specifically, this work considers two ABSA tasks: aspect category determination and aspect category polarity determination, and makes use of the publicly available human annotated Arabic dataset (HAAD) along with its baseline experiments conducted by HAAD providers. In this work, several lexicon-based approaches are presented for the two tasks at hand and show that some of the presented approaches significantly outperforms the best-known result on the given dataset. An enhancement of 9% and 46% were achieved in the tasks aspect category determination and aspect category polarity determination respectively.
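A minimal lexicon-based polarity scorer in the spirit of the approaches above can be sketched as follows. The mini-lexicon, the context window, and the English example are illustrative assumptions; the paper works with Arabic lexicons and the HAAD dataset:

```python
# Hypothetical mini-lexicon; a real system would use a full sentiment lexicon.
LEXICON = {"excellent": 1, "good": 1, "tasty": 1, "bad": -1, "slow": -1, "dirty": -1}

def aspect_polarity(tokens, aspect_terms, lexicon=LEXICON, window=2):
    """Score an aspect by summing lexicon polarities of words within
    `window` tokens of any aspect mention; the sign gives the polarity."""
    score = 0
    for i, tok in enumerate(tokens):
        if tok in aspect_terms:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            score += sum(lexicon.get(t, 0) for t in tokens[lo:hi])
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

On "the food was tasty but the service was slow", the aspect `{"food"}` scores positive and `{"service"}` negative, because each aspect only sees the sentiment words in its local window.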
41

Asatani, Kimitaka, Haruo Takeda, Hiroko Yamano, and Ichiro Sakata. "Scientific Attention to Sustainability and SDGs: Meta-Analysis of Academic Papers". Energies 13, no. 4 (21 February 2020): 975. http://dx.doi.org/10.3390/en13040975.

Full text
Abstract:
Scientific research plays an important role in the achievement of a sustainable society. However, grasping the trends in sustainability research is difficult because studies are not devised and conducted in a top-down manner with Sustainable Development Goals (SDGs). To understand the bottom-up research activities, we analyzed over 300,000 publications concerned with sustainability by using citation network analysis and natural language processing. The results suggest that sustainability science’s diverse and dynamic changes have been occurring over the last few years; several new topics, such as nanocellulose and global health, have begun to attract widespread scientific attention. We further examined the relationship between sustainability research subjects and SDGs and found significant correspondence between the two. Moreover, we extracted SDG topics that were discussed following a convergent approach in academic studies, such as “inclusive society” and “early childhood development”, by observing the convergence of terms in the citation network. These results are valuable for government officials, private companies, and academic researchers, empowering them to understand current academic progress along with research attention devoted to SDGs.
42

Ye, Huanzhuo, and Yuan Li. "Fuzzy Cloud Evaluation of Service Quality Based on DP-FastText". WSEAS TRANSACTIONS ON COMPUTERS 20 (2 August 2021): 149–67. http://dx.doi.org/10.37394/23205.2021.20.16.

Full text
Abstract:
This study proposes a service quality evaluation model framework which integrates automatic data acquisition, intelligent data processing and real-time data analysis, with online comment data as the data source, by introducing natural language processing technology into management methods to break the traditional over-reliance on human resources for service quality evaluation. The framework is divided into text data preparation, fine-grained sentiment analysis and fuzzy cloud evaluation modules. The data preparation module is responsible for preparing the initial data, and the fine-grained sentiment analysis module is responsible for pre-training a fine-grained sentiment classification model. The fuzzy cloud evaluation module uses the data obtained from the first two modules to evaluate service quality. By applying the model to the catering industry, the feasibility of the model is proved, and its individuality, efficiency, dynamicity and intelligence give it an advantage in the practice of service quality evaluation.
43

Wang, Chunlin, Irene Castellón, and Elisabet Comelles. "Linguistic analysis of datasets for semantic textual similarity". Digital Scholarship in the Humanities 35, no. 2 (27 April 2019): 471–84. http://dx.doi.org/10.1093/llc/fqy076.

Full text
Abstract:
Abstract Semantic Textual Similarity (STS), which measures the equivalence of meaning between two textual segments, is an important and useful task in Natural Language Processing. In this article, we have analyzed the datasets provided by the Semantic Evaluation (SemEval) 2012–2014 campaigns for this task in order to find appropriate linguistic features for each dataset, taking into account the influence that linguistic features at different levels (e.g., syntactic constituents and lexical semantics) might have on sentence similarity. Results indicate that a linguistic feature may have a different effect on different corpora due to the great differences in sentence structure and vocabulary between datasets. Thus, we conclude that selecting linguistic features according to the genre of the text might be a good strategy for obtaining better results in the STS task. This analysis could be a useful reference for building similarity-measuring systems and tuning linguistic features.
44

Zulqarnain, Muhammad, Ahmed Khalaf Zager Alsaedi, Rozaida Ghazali, Muhammad Ghulam Ghouse, Wareesa Sharif, and Noor Aida Husaini. "A comparative analysis on question classification task based on deep learning approaches". PeerJ Computer Science 7 (3 August 2021): e570. http://dx.doi.org/10.7717/peerj-cs.570.

Full text
Abstract:
Question classification is one of the essential tasks for automatic question answering implementation in natural language processing (NLP). Recently, several text-mining problems such as text classification, document categorization, web mining, sentiment analysis, and spam filtering have been successfully tackled by deep learning approaches. In this study, we investigated certain deep learning approaches for question classification tasks in the highly inflected Turkish language, training and testing the architectures on a Turkish questions dataset. We used three main deep learning approaches (Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM), Convolutional Neural Networks (CNN)) and also applied two combined architectures, CNN-GRU and CNN-LSTM. Furthermore, we applied the Word2vec technique with both skip-gram and CBOW methods for word embedding, with various vector sizes, on a large corpus composed of user questions. In a comparative analysis, we evaluated the deep learning architectures based on test and 10-fold cross-validation accuracy. The experimental results illustrate the effectiveness of the various Word2vec techniques, which have a considerable impact on the accuracy rate under different deep learning approaches. We attained an accuracy of 93.7% by using these techniques on the question dataset.
45

BENSON, MARK L., RICHARD D. SMITH, NICKOLAY A. KHAZANOV, HEATHER A. CARLSON, BRANDON DIMCHEFF, and PETER DRESSLAR. "UPDATING BINDING MOAD — DATA MANAGEMENT AND INFORMATION WORKFLOW". New Mathematics and Natural Computation 06, no. 01 (March 2010): 49–56. http://dx.doi.org/10.1142/s1793005710001608.

Full text
Abstract:
Binding MOAD (Mother of All Databases) is the largest collection of high-quality, protein-ligand complexes available from the Protein Data Bank. It has grown to 9837 hand-curated entries. Here, we describe our semi-annual updating procedures and BUDA (Binding Unstructured Data Analysis), a custom workflow tool that incorporates natural language processing technologies to facilitate the annotation process.
46

Murfi, Hendri, Furida Lusi Siagian, and Yudi Satria. "Topic features for machine learning-based sentiment analysis in Indonesian tweets". International Journal of Intelligent Computing and Cybernetics 12, no. 1 (28 February 2019): 70–81. http://dx.doi.org/10.1108/ijicc-04-2018-0057.

Full text
Abstract:
Purpose The purpose of this paper is to analyze topics as alternative features for sentiment analysis in Indonesian tweets. Design/methodology/approach Given Indonesian tweets, the processes of sentiment analysis start by extracting features from the tweets. The features are words or topics. The authors use non-negative matrix factorization to extract the topics and apply a support vector machine to classify the tweets into its sentiment class. Findings The authors analyze the accuracy using the two-class and three-class sentiment analysis data sets. Both data sets are about sentiments of candidates for Indonesian presidential election. The experiments show that the standard word features give better accuracies than the topics features for the two-class sentiment analysis. Moreover, the topic features can slightly improve the accuracy of the standard word features. The topic features can also improve the accuracy of the standard word features for the three-class sentiment analysis. Originality/value The standard textual data representation for sentiment analysis using machine learning is bag of word and its extensions mainly created by natural language processing. This paper applies topics as novel features for the machine learning-based sentiment analysis in Indonesian tweets.
47

Long, Congjun, Xuewen Zhou, and Maoke Zhou. "Recognition of Tibetan Maximal-length Noun Phrases Based on Syntax Tree". ACM Transactions on Asian and Low-Resource Language Information Processing 20, no. 2 (30 March 2021): 1–13. http://dx.doi.org/10.1145/3423324.

Full text
Abstract:
Frequently corresponding to syntactic components, the Maximal-length Noun Phrase (MNP) possesses abundant syntactic and semantic information and plays a certain semantic role in sentences. Recognition of MNPs plays an important role in Natural Language Processing and lays the foundation for analyzing and understanding sentence structure and semantics. By comparing the essence of different MNPs, this article defines the MNP in the Tibetan language from the perspective of the syntax tree. A total of 6,038 sentences are extracted from the syntax tree corpus; the structure types, boundary features, and frequencies of MNPs are analyzed, and the MNPs are recognized by applying a sequence tagging model and a syntactic analysis model. The accuracy, recall, and F1 score of the sequence tagging model are 87.14%, 84.72%, and 85.92%, respectively; those of the syntactic analysis model are 87.66%, 87.63%, and 87.65%, respectively.
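Scores like those above are conventionally derived from true-positive, false-positive, and false-negative counts over recognized phrases; a minimal sketch of the precision/recall/F1 computation (the counts in the usage note are made up for illustration):

```python
def prf1(tp, fp, fn):
    """Precision, recall, and F1 from true-positive, false-positive,
    and false-negative counts, as used to score phrase recognition."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

For instance, a system that recognizes 8 phrases correctly, emits 2 spurious ones, and misses 2 gold phrases scores 0.8 on all three measures.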
48

Shi, Shumin, Dan Luo, Xing Wu, Congjun Long, and Heyan Huang. "Multi-level Chunk-based Constituent-to-Dependency Treebank Transformation for Tibetan Dependency Parsing". ACM Transactions on Asian and Low-Resource Language Information Processing 20, no. 2 (30 March 2021): 1–12. http://dx.doi.org/10.1145/3424247.

Full text
Abstract:
Dependency parsing is an important task for Natural Language Processing (NLP). However, a mature parser requires a large treebank for training, which is still extremely costly to create. Tibetan is an extremely low-resource language for NLP: no Tibetan dependency treebank is available, and such resources are currently obtained by manual annotation. Furthermore, there is little related research on treebank construction. We propose a novel method of multi-level chunk-based syntactic parsing to complete constituent-to-dependency treebank conversion for Tibetan under scarce conditions. Our method mines more dependencies in Tibetan sentences, builds a high-quality Tibetan dependency tree corpus, and makes fuller use of the inherent laws of the language itself. We train dependency parsing models on the dependency treebank obtained by the preliminary transformation. The model achieves 86.5% accuracy, 96% LAS, and 97.85% UAS, which exceeds the optimal results of existing conversion methods. The experimental results show that our method has potential in a low-resource setting, meaning we not only address the scarcity of Tibetan dependency treebanks but also avoid needless manual annotation. The method embodies the regularity of strong knowledge-guided linguistic analysis methods, which is of great significance for promoting research on Tibetan information processing.
49

Gong, Renxi, Siqiang Li, and Weiyu Peng. "Research on Multi-Attribute Decision-Making in Condition-Based Maintenance for Power Transformers Based on Cloud and Kernel Vector Space Models". Energies 13, no. 22 (14 November 2020): 5948. http://dx.doi.org/10.3390/en13225948.

Full text
Abstract:
Decision-making for the condition-based maintenance (CBM) of power transformers is critical to their sustainable operation. Existing research exhibits significant shortcomings; neither group decision-making nor maintenance intention is considered, which does not satisfy the needs of smart grids. Thus, a multivariate assessment system, which includes the consideration of technology, cost-effectiveness, and security, should be created, taking into account current research findings. In order to address the uncertainty of maintenance strategy selection, this paper proposes a maintenance decision-making model composed of cloud and vector space models. The optimal maintenance strategy is selected in a multivariate assessment system. Cloud models allow for the expression of natural language evaluation information and are used to transform qualitative concepts into quantitative expressions. The subjective and objective weights of the evaluation index are derived from the analytic hierarchy process and the grey relational analysis method, respectively. The kernel vector space model is then used to select the best maintenance strategy through the close degree calculation. Finally, an optimal maintenance strategy is determined. A comparison and analysis of three different representative maintenance strategies resulted in the following findings: The proposed model is effective; it provides a new decision-making method for power transformer maintenance decision-making; it is simple, practical, and easy to combine with the traditional state assessment method, and thus should play a role in transformer fault diagnosis.
50

Pérez-Guadarramas, Yamel, Manuel Barreiro-Guerrero, Alfredo Simón-Cuevas, Francisco P. Romero, and José A. Olivas. "Analysis of OWA operators for automatic keyphrase extraction in a semantic context". Intelligent Data Analysis 24 (4 December 2020): 43–62. http://dx.doi.org/10.3233/ida-200008.

Full text
Abstract:
Automatic keyphrase extraction from texts is useful for many computational systems in the fields of natural language processing and text mining. Although a number of solutions to this problem have been described, semantic analysis is one of the least exploited linguistic features in the most widely-known proposals, causing the results obtained to have low accuracy and performance rates. This paper presents an unsupervised method for keyphrase extraction, based on the use of lexico-syntactic patterns for extracting information from texts, and a fuzzy topic modeling. An OWA operator combining several semantic measures was applied to the topic modeling process. This new approach was evaluated with Inspec and 500N-KPCrowd datasets. Several approaches within our proposal were evaluated against each other. A statistical analysis was performed to substantiate the best approach of the proposal. This best approach was also compared with other reported systems, giving promising results.
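An OWA (Ordered Weighted Averaging) operator, the aggregation studied above, applies its weights to the sorted arguments rather than to fixed positions, so one weight vector can interpolate between max, mean, and min behavior. A minimal sketch (the weight vectors in the usage note are illustrative, not those tuned in the paper):

```python
def owa(values, weights):
    """Ordered Weighted Averaging: weights apply to the values sorted
    in descending order, not to particular inputs. Weights must sum to 1."""
    assert len(values) == len(weights)
    assert abs(sum(weights) - 1.0) < 1e-9
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))
```

With weights `[1, 0, 0]` the operator returns the maximum of the semantic measures being combined, with `[0, 0, 1]` the minimum, and with uniform weights the plain average.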