To see the other types of publications on this topic, follow the link: Big Data, Hadoop, Business Intelligence, MapReduce.

Journal articles on the topic 'Big Data, Hadoop, Business Intelligence, MapReduce'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Select a source type:

Consult the top 35 journal articles for your research on the topic 'Big Data, Hadoop, Business Intelligence, MapReduce.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles on a wide variety of disciplines and organise your bibliography correctly.

1

Xu, Yi Qiao. "Massive Data Analysis Based MapReduce Structure on Hadoop System." Advanced Materials Research 981 (July 2014): 262–66. http://dx.doi.org/10.4028/www.scientific.net/amr.981.262.

Full text
Abstract:
Massive data analysis is becoming increasingly prominent in a variety of application fields ranging from scientific studies to business research. In this paper, we demonstrate the necessity and possibility of using the MapReduce [1] module on a Hadoop system [2]. Furthermore, we used the MapReduce module to implement clustering algorithms [3] on our Hadoop system [4] and sharply improved their efficiency. We show how to design parallel clustering algorithms based on the Hadoop system. Experiments with different data sizes demonstrate that our proposed clustering algorithms perform well in terms of speed-up, scale-up and size-up, so they are suitable for big data mining and analysis.
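The abstract does not spell out how the clustering is parallelized, so the following is a minimal, hypothetical Java sketch of one K-means iteration expressed as a Hadoop MapReduce job: the map step assigns each point to its nearest centroid and the reduce step recomputes centroids. The Configuration key "kmeans.centroids" and the comma-separated point format are illustrative assumptions, not details from the paper.

```java
// Minimal sketch of one K-means iteration as a Hadoop MapReduce job.
// Assumption: current centroids are broadcast via the job Configuration
// under the hypothetical key "kmeans.centroids" as "x1,y1;x2,y2;...".
import java.io.IOException;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class KMeansIteration {

  public static class AssignMapper extends Mapper<LongWritable, Text, IntWritable, Text> {
    private double[][] centroids;

    @Override
    protected void setup(Context ctx) {
      String[] parts = ctx.getConfiguration().get("kmeans.centroids").split(";");
      centroids = new double[parts.length][];
      for (int i = 0; i < parts.length; i++) {
        String[] dims = parts[i].split(",");
        centroids[i] = new double[dims.length];
        for (int j = 0; j < dims.length; j++) centroids[i][j] = Double.parseDouble(dims[j]);
      }
    }

    @Override
    protected void map(LongWritable key, Text value, Context ctx)
        throws IOException, InterruptedException {
      String[] dims = value.toString().split(",");
      double[] p = new double[dims.length];
      for (int i = 0; i < dims.length; i++) p[i] = Double.parseDouble(dims[i]);
      int best = 0;
      double bestDist = Double.MAX_VALUE;
      for (int c = 0; c < centroids.length; c++) {          // find the nearest centroid
        double d = 0;
        for (int j = 0; j < p.length; j++) d += (p[j] - centroids[c][j]) * (p[j] - centroids[c][j]);
        if (d < bestDist) { bestDist = d; best = c; }
      }
      ctx.write(new IntWritable(best), value);              // emit (clusterId, point)
    }
  }

  public static class RecomputeReducer extends Reducer<IntWritable, Text, IntWritable, Text> {
    @Override
    protected void reduce(IntWritable cluster, Iterable<Text> points, Context ctx)
        throws IOException, InterruptedException {
      double[] sum = null;
      long n = 0;
      for (Text t : points) {                               // average all points of the cluster
        String[] dims = t.toString().split(",");
        if (sum == null) sum = new double[dims.length];
        for (int j = 0; j < dims.length; j++) sum[j] += Double.parseDouble(dims[j]);
        n++;
      }
      StringBuilder sb = new StringBuilder();
      for (int j = 0; j < sum.length; j++) sb.append(sum[j] / n).append(j < sum.length - 1 ? "," : "");
      ctx.write(cluster, new Text(sb.toString()));          // emit the new centroid
    }
  }
}
```

A driver would run this job repeatedly, feeding each iteration's output centroids back into the next job's configuration until they stabilize.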
APA, Harvard, Vancouver, ISO, and other styles
2

Meddah, Ishak H. A., Khaled Belkadi, and Mohamed Amine Boudia. "Parallel Mining Small Patterns from Business Process Traces." International Journal of Software Science and Computational Intelligence 8, no. 1 (January 2016): 32–45. http://dx.doi.org/10.4018/ijssci.2016010103.

Full text
Abstract:
Hadoop MapReduce was introduced to address the processing of big data in parallel: with this framework the authors analyze and process large volumes of data by distributing the work across a cluster of machines in two main steps, map and reduce. They apply the MapReduce framework to problems in process mining, which provides a bridge between data mining and business process analysis and consists of mining information from process traces. Process mining involves two steps: correlation definition and process inference. The work first mines small patterns, the workflows of the process, from execution traces; each pattern represents the behaviour of one part of the process and is expressed as a finite state automaton or its regular expression. Only two patterns are used, to keep the process simple, and the general representation of the process is the combination of the mined small patterns. The patterns are represented by the regular expressions (ab)* and (ab*c)*. Second, the authors compute and combine these patterns using the Hadoop MapReduce framework: in the map step they mine small patterns (small models) from the business process traces, and in the reduce step they combine these models. They use the business process traces of two web applications, Skype and Viber. The overall results show that the parallel distributed approach using the Hadoop MapReduce framework is scalable and minimizes execution time.
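To illustrate the map/reduce split the abstract describes (mining small patterns in the map step, combining counts in the reduce step), here is a minimal, hypothetical Java sketch that checks each encoded execution trace against the two regular-expression patterns (ab)* and (ab*c)*. The assumption that every input line is one trace over the alphabet {a, b, c} is illustrative; the paper's actual trace encoding is not given in the abstract.

```java
// Minimal sketch: map matches each encoded trace against the two small
// patterns (ab)* and (ab*c)*, reduce combines the per-pattern counts.
import java.io.IOException;
import java.util.regex.Pattern;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class PatternMining {

  private static final Pattern P1 = Pattern.compile("(ab)*");
  private static final Pattern P2 = Pattern.compile("(ab*c)*");

  public static class TraceMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);

    @Override
    protected void map(LongWritable key, Text trace, Context ctx)
        throws IOException, InterruptedException {
      String t = trace.toString().trim();                   // one encoded trace per line (assumed)
      if (P1.matcher(t).matches()) ctx.write(new Text("(ab)*"), ONE);
      if (P2.matcher(t).matches()) ctx.write(new Text("(ab*c)*"), ONE);
    }
  }

  public static class CombineReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text pattern, Iterable<IntWritable> counts, Context ctx)
        throws IOException, InterruptedException {
      int total = 0;
      for (IntWritable c : counts) total += c.get();
      ctx.write(pattern, new IntWritable(total));           // how many traces fit each small pattern
    }
  }
}
```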
APA, Harvard, Vancouver, ISO, and other styles
3

Srinivasan, Sujatha, and T. Thirumalai Kumari. "Big data analytics tools a review." International Journal of Engineering & Technology 7, no. 3.3 (June 8, 2018): 685. http://dx.doi.org/10.14419/ijet.v7i2.33.15476.

Full text
Abstract:
Big data is the hottest trending term all over the globe and the internet. Big organizations are trying to make use of the large amounts of data collected and stored by them in large storage systems. Further, large amounts of data are being produced every millisecond all over the world by users of computing devices, by satellites of all kinds, by scientific research, by governments, and by big organizations that deal with huge numbers of customers, especially financial institutions, and many more. These data lie there for exploration and exploitation to gain more knowledge, or rather intelligence, and for turning them into wisdom for better decision making. Traditional data mining tools are not able to handle this big data. Hadoop and MapReduce are among the first tools being used to handle big data. Additional data mining and machine learning capabilities have been added to Hadoop and MapReduce through various plug-ins by different open source as well as vendor tools for big data analytics (BDA). Furthermore, big organizations have created, or are in the process of creating, BDA tools, most of which come with a price tag. This study gives a short review of the available BDA tools, taking into consideration different characteristics of these tools. Possible solutions for existing challenges related to big data analytics are discussed.
APA, Harvard, Vancouver, ISO, and other styles
4

Chiang, Dai-Lun, Sheng-Kuan Wang, Yu-Ying Wang, Yi-Nan Lin, Tsang-Yen Hsieh, Cheng-Ying Yang, Victor R. L. Shen, and Hung-Wei Ho. "Modeling and Analysis of Hadoop MapReduce Systems for Big Data Using Petri Nets." Applied Artificial Intelligence 35, no. 1 (November 14, 2020): 80–104. http://dx.doi.org/10.1080/08839514.2020.1842111.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Meddah, Ishak H. A., Khaled Belkadi, and Mohamed Amine Boudia. "Efficient Implementation of Hadoop MapReduce based Business Process Dataflow." International Journal of Decision Support System Technology 9, no. 1 (January 2017): 49–60. http://dx.doi.org/10.4018/ijdsst.2017010104.

Full text
Abstract:
Hadoop MapReduce is one of the solutions for processing large and big data: with it the authors can analyze and process data by distributing the computation across a large set of machines. Process mining provides an important bridge between data mining and business process analysis; its techniques allow information to be mined from event logs. First, the work mines small patterns from log traces; these patterns are the workflows of the execution traces of the business process. The authors' work improves on existing techniques, which mine only one general workflow representing the overall traces of two web applications. The patterns are represented by finite state automata, and the final model is the combination of only two types of patterns, represented by regular expressions. Second, the authors compute these patterns in parallel and then combine them using MapReduce: in the map step they mine patterns from execution traces, and in the reduce step they combine these small patterns. The results are promising; they show that the approach is scalable, general and precise, and that it reduces execution time through the use of the Hadoop MapReduce framework.
APA, Harvard, Vancouver, ISO, and other styles
6

Wang, C., F. Hu, X. Hu, S. Zhao, W. Wen, and C. Yang. "A HADOOP-BASED DISTRIBUTED FRAMEWORK FOR EFFICIENT MANAGING AND PROCESSING BIG REMOTE SENSING IMAGES." ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences II-4/W2 (July 10, 2015): 63–66. http://dx.doi.org/10.5194/isprsannals-ii-4-w2-63-2015.

Full text
Abstract:
Various sensors from airborne and satellite platforms are producing large volumes of remote sensing images for mapping, environmental monitoring, disaster management, military intelligence, and other uses. However, it is challenging to efficiently store, query and process such big data due to data- and computing-intensive issues. In this paper, a Hadoop-based framework is proposed to manage and process big remote sensing data in a distributed and parallel manner. In particular, remote sensing data can be directly fetched from other data platforms into the Hadoop Distributed File System (HDFS). The Orfeo toolbox, a ready-to-use tool for large image processing, is integrated into MapReduce to provide a rich set of image processing operations. With the integration of HDFS, the Orfeo toolbox and MapReduce, these remote sensing images can be processed directly and in parallel in a scalable computing environment. The experimental results show that the proposed framework can efficiently manage and process such big remote sensing data.
APA, Harvard, Vancouver, ISO, and other styles
7

Tyagi, Adhishtha, and Sonia Sharma. "A Framework of Security and Performance Enhancement for Hadoop." International Journal of Advanced Research in Computer Science and Software Engineering 7, no. 7 (July 30, 2017): 437. http://dx.doi.org/10.23956/ijarcsse/v7i6/0171.

Full text
Abstract:
The Hadoop framework has emerged as the most effective and widely adopted framework for Big Data processing. The MapReduce programming model is used for processing as well as generating large data sets. Data security has become an important issue as far as storage is concerned. By default there is no security mechanism in Hadoop, yet it is the first choice of business analysts and industrialists for storing and managing data, so there is a need to introduce security solutions to Hadoop in order to secure important data in the Hadoop environment. We implemented and evaluated the Dynamic Task Splitting Scheduler (DTSS), which explores the trade-offs between fairness and performance by splitting tasks dynamically before processing in Hadoop, along with AES-MR (an Advanced Encryption Standard based encryption using MapReduce) in the MapReduce paradigm. This paper should be useful to beginners and researchers for understanding DTSS scheduling along with security.
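The abstract names AES-MR, an AES-based encryption performed inside MapReduce. The sketch below shows the general shape such a map task could take in Java using the standard javax.crypto API; the key-distribution property "aesmr.key" and the cipher mode are illustrative assumptions rather than the authors' design.

```java
// Minimal sketch of AES encryption applied inside a map task, in the spirit
// of the AES-MR idea in the abstract (not the authors' code).
// Assumption: a key is distributed via the hypothetical Configuration
// property "aesmr.key" as a Base64 string.
import java.io.IOException;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.Mapper;

public class EncryptMapper extends Mapper<LongWritable, Text, LongWritable, Text> {
  private Cipher cipher;

  @Override
  protected void setup(Context ctx) throws IOException {
    try {
      byte[] key = Base64.getDecoder().decode(ctx.getConfiguration().get("aesmr.key"));
      cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");           // sketch only; a real system
      cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES")); // should prefer an authenticated mode
    } catch (Exception e) {
      throw new IOException("AES initialisation failed", e);
    }
  }

  @Override
  protected void map(LongWritable offset, Text record, Context ctx)
      throws IOException, InterruptedException {
    try {
      byte[] enc = cipher.doFinal(record.toString().getBytes("UTF-8"));
      // Emit the record offset and its Base64-encoded ciphertext.
      ctx.write(offset, new Text(Base64.getEncoder().encodeToString(enc)));
    } catch (Exception e) {
      throw new IOException("Encryption failed", e);
    }
  }
}
```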
APA, Harvard, Vancouver, ISO, and other styles
8

Song, Miao Miao, Zhe Li, Bin Zhou, and Chao Ling Li. "Cloud Computing Model for Big Geological Data Processing." Applied Mechanics and Materials 475-476 (December 2013): 306–11. http://dx.doi.org/10.4028/www.scientific.net/amm.475-476.306.

Full text
Abstract:
Geological data come in many types and in huge, complex formats; the analysis of geological data is mainly divided into three parts: mine forecasting, mine evaluation and mine positioning. The traditional geological data analysis model is limited by storage space and computational efficiency and cannot meet the need for fast operations on large amounts of geological data. "Big data technology" provides an ideal solution for the management, information extraction and comprehensive analysis of vast amounts of geological data. To provide the mass storage capacity and high-speed computing power that "big data technology" requires, we built an intelligent system for the analysis of geological data based on a cloud computing model with double parallel processing using MapReduce and GPUs. For large amounts of geological data, a Hadoop cluster system is used to solve the problem of storing large data volumes, and an efficient parallel processing method based on the GPU (Graphics Processing Unit) is designed and applied within the MapReduce framework, completing the MapReduce and GPU double parallel processing cloud computing model and improving the operating speed of the system. Theoretical modeling and experimental verification indicate that the system can meet the requirements of geological data analysis in terms of operation precision, the amount of data handled and operation speed.
APA, Harvard, Vancouver, ISO, and other styles
9

Manogaran, Gunasekaran, and Daphne Lopez. "Disease Surveillance System for Big Climate Data Processing and Dengue Transmission." International Journal of Ambient Computing and Intelligence 8, no. 2 (April 2017): 88–105. http://dx.doi.org/10.4018/ijaci.2017040106.

Full text
Abstract:
Ambient intelligence is an emerging platform that leverages advances in sensors and sensor networks, pervasive computing, and artificial intelligence to capture real-time climate data. This continuously generates several exabytes of unstructured sensor data, often called big climate data. Nowadays, researchers are trying to use big climate data to monitor and predict climate change and possible diseases. Traditional data processing techniques and tools are not capable of handling such a huge amount of climate data. Hence, there is a need to develop an advanced big data architecture for processing real-time climate data. The purpose of this paper is to propose a big-data-based surveillance system that analyzes spatial climate big data and performs continuous monitoring of the correlation between climate change and dengue. The proposed disease surveillance system has been implemented with the help of Apache Hadoop MapReduce and its supporting tools.
APA, Harvard, Vancouver, ISO, and other styles
10

Bu, Lingrui, Hui Zhang, Haiyan Xing, and Lijun Wu. "Research on parallel data processing of data mining platform in the background of cloud computing." Journal of Intelligent Systems 30, no. 1 (January 1, 2021): 479–86. http://dx.doi.org/10.1515/jisys-2020-0113.

Full text
Abstract:
The efficient processing of large-scale data has very important practical value. In this study, a data mining platform based on the Hadoop distributed file system was designed, and the K-means algorithm was improved with the idea of max-min distance. On the Hadoop distributed file system platform, parallelization was realized with MapReduce. Finally, the data processing performance of the algorithm was analyzed on the Iris data set. The results showed that the parallel algorithm classified more samples correctly than the traditional algorithm; in a single-machine environment, the parallel algorithm ran longer; when facing large data sets, the traditional algorithm ran out of memory, while the parallel algorithm completed the calculation task; and the speed-up of the parallel algorithm rose with the expansion of cluster size and data set size, showing a good parallel effect. The experimental results verify the reliability of the parallel algorithm in big data processing, which makes some contribution to further improving the efficiency of data mining.
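The improvement mentioned in the abstract is seeding K-means with the max-min distance idea: the first centroid is an arbitrary point, and each further centroid is the point whose distance to its nearest already-chosen centroid is largest. A plain-Java sketch of that seeding step, independent of the authors' Hadoop implementation, might look like this:

```java
// Minimal sketch of max-min distance seeding for K-means (plain Java,
// not the authors' MapReduce code).
import java.util.ArrayList;
import java.util.List;

public class MaxMinSeeding {

  static double dist(double[] a, double[] b) {
    double d = 0;
    for (int i = 0; i < a.length; i++) d += (a[i] - b[i]) * (a[i] - b[i]);
    return Math.sqrt(d);
  }

  static List<double[]> chooseCentroids(List<double[]> points, int k) {
    List<double[]> centroids = new ArrayList<>();
    centroids.add(points.get(0));                       // arbitrary first centroid
    while (centroids.size() < k) {
      double[] farthest = null;
      double best = -1;
      for (double[] p : points) {
        double nearest = Double.MAX_VALUE;              // distance to the closest chosen centroid
        for (double[] c : centroids) nearest = Math.min(nearest, dist(p, c));
        if (nearest > best) { best = nearest; farthest = p; }
      }
      centroids.add(farthest);                          // pick the point farthest from all centroids
    }
    return centroids;
  }
}
```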
APA, Harvard, Vancouver, ISO, and other styles
11

Manjula, Aakunuri, and G. Narsimha. "Using an Efficient Optimal Classifier for Soil Classification in Spatial Data Mining Over Big Data." Journal of Intelligent Systems 29, no. 1 (January 10, 2018): 172–88. http://dx.doi.org/10.1515/jisys-2017-0209.

Full text
Abstract:
Abstract This article proposes an effectual process for soil classification. The input data of the proposed procedure is the Harmonized World Soil Database. Preprocessing aids to generate enhanced representation and will use minimum time. Then, the MapReduce framework divides the input dataset into a complimentary portion that is held by the map task. In the map task, principal component analysis is used to reduce the data and the outputs of the maps are then contributed to reduce the tasks. Lastly, the proposed process is employed to categorize the soil kind by means of an optimal neural network (NN) classifier. Here, the conventional NN is customized using the optimization procedure. In an NN, the weights are optimized using the grey wolf optimization (GWO) algorithm. Derived from the classifier, we categorize the soil category. The performance of the proposed procedure is assessed by means of sensitivity, specificity, accuracy, precision, recall, and F-measure. The analysis results illustrate that the recommended artificial NN-GWO process has an accuracy of 90.46%, but the conventional NN and k-nearest neighbor classifiers have an accuracy value of 75.3846% and 75.38%, respectively, which is the least value compared to the proposed procedure. The execution is made by Java within the MapReduce framework using Hadoop.
APA, Harvard, Vancouver, ISO, and other styles
12

Maté, Alejandro, Hector Llorens, Elisa de Gregorio, Roberto Tardío, David Gil, Rafa Muñoz-Terol, and Juan Trujillo. "A Novel Multidimensional Approach to Integrate Big Data in Business Intelligence." Journal of Database Management 26, no. 2 (April 2015): 14–31. http://dx.doi.org/10.4018/jdm.2015040102.

Full text
Abstract:
The huge amount of information available and its heterogeneity has surpassed the capacity of current data management technologies. Dealing with huge amounts of structured and unstructured data, often referred as Big Data, is a hot research topic and a technological challenge. In this paper, the authors present an approach aimed to enable OLAP queries over different, heterogeneous, data sources. Their approach is based on a MapReduce paradigm, which integrates different formats into the recent RDF Data Cube format. The benefits of their approach are that it is capable of querying different sources of information, while maintaining at the same time, an integrated, comprehensive view of the data available. The paper discusses the advantages and disadvantages, as well as the implementation challenges that such approach presents. Furthermore, the approach is evaluated in detail by means of a case study.
APA, Harvard, Vancouver, ISO, and other styles
13

Mouyassir, Kawtar, Mohamed Hanine, and Hassan Ouahmane. "Business Intelligence Model to analyze Social Media through Big Data analytics." SHS Web of Conferences 119 (2021): 07006. http://dx.doi.org/10.1051/shsconf/202111907006.

Full text
Abstract:
Business Intelligence (BI) is a collection of tools, technologies, and practices that covers the entire process of collecting, processing, and analyzing qualitative information to help entrepreneurs better understand their business and marketplace. Every day, social networks expand at a faster rate and pace, making them a major source of Big Data. Therefore, BI is likewise applied to the VoC (Voice of Customer) expressed in social media as qualitative data for company decision-makers, who want a clear perception of customers' behaviour. In this article, we present a comparative study between traditional BI and social BI, then examine an approach to social business intelligence. Next, we demonstrate the power of Big Data integrated into BI, and finally describe in detail how Big Data technologies, like Apache Flume, help collect unstructured data from sources such as social media networks and store it in Hadoop storage.
APA, Harvard, Vancouver, ISO, and other styles
14

Murthy, Uday S., and Guido L. Geerts. "An REA Ontology-Based Model for Mapping Big Data to Accounting Information Systems Elements." Journal of Information Systems 31, no. 3 (May 1, 2017): 45–61. http://dx.doi.org/10.2308/isys-51803.

Full text
Abstract:
ABSTRACT The term “Big Data” refers to massive volumes of data that grow at an increasing rate and encompass complex data types such as audio and video. While the applications of Big Data and analytic techniques for business purposes have received considerable attention, it is less clear how external sources of Big Data relate to the transaction processing-oriented world of accounting information systems. This paper uses the Resource-Event-Agent Enterprise Ontology (REA) (McCarthy 1982; International Standards Organization [ISO] 2007) to model the implications of external Big Data sources on business transactions. The five-phase REA-based specification of a business transaction as defined in ISO (2007) is used to formally define associations between specific Big Data elements and business transactions. Using Big Data technologies such as Apache Hadoop and MapReduce, a number of information extraction patterns are specified for extracting business transaction-related information from Big Data. We also present a number of analytics patterns to demonstrate how decision making in accounting can benefit from integrating specific external Big Data sources and conventional transactional data. The model and techniques presented in this paper can be used by organizations to formalize the associations between external Big Data elements in their environment and their accounting information artifacts, to build architectures that extract information from external Big Data sources for use in an accounting context, and to leverage the power of analytics for more effective decision making.
APA, Harvard, Vancouver, ISO, and other styles
15

Verma, Neha, and Jatinder Singh. "An intelligent approach to Big Data analytics for sustainable retail environment using Apriori-MapReduce framework." Industrial Management & Data Systems 117, no. 7 (August 14, 2017): 1503–20. http://dx.doi.org/10.1108/imds-09-2016-0367.

Full text
Abstract:
Purpose The purpose of this paper is to explore various limitations of conventional mining systems in extracting useful buying patterns from retail transactional databases flooded with Big Data. The key objective is to assist retail business owners to better understand the purchase needs of their customers and hence to attract customers to physical retail stores and away from competitor e-commerce websites. Design/methodology/approach This paper employs a systematic, category-based review of relevant literature to explore the challenges posed by Big Data for the retail industry, followed by discussion and implementation of the combination of MapReduce-based Apriori association mining and a Hadoop-based intelligent cloud architecture. Findings The findings reveal that conventional mining algorithms have not evolved to support Big Data analysis as required by modern retail businesses. They require a lot of resources such as memory and computational engines. This study aims to develop the MR-Apriori algorithm in the form of an IRM tool to address all these issues in an efficient manner. Research limitations/implications The paper suggests that a lot of research is yet to be done in market basket analysis if the full potential of cloud-based Big Data frameworks is to be utilized. Originality/value This research arms retail business owners with an innovative IRM tool to easily extract comprehensive knowledge of customers' buying patterns to increase profits. This study experimentally verifies the effectiveness of the proposed algorithm.
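As a rough illustration of what an Apriori-style support-counting pass looks like on MapReduce (the MR-Apriori tool itself is not described in detail in the abstract), the following hypothetical Java job emits every single item and item pair of each transaction in the map step and filters by minimum support in the reduce step; the input format and the "apriori.minsup" property are assumptions.

```java
// Minimal sketch of one support-counting pass of an Apriori-on-MapReduce approach.
// Assumption: each input line is one comma-separated transaction.
import java.io.IOException;
import java.util.Arrays;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class AprioriPass {

  public static class CandidateMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);

    @Override
    protected void map(LongWritable key, Text transaction, Context ctx)
        throws IOException, InterruptedException {
      String[] items = transaction.toString().split(",");
      Arrays.sort(items);                                        // canonical order for pair keys
      for (int i = 0; i < items.length; i++) {
        ctx.write(new Text(items[i]), ONE);                      // 1-itemset candidates
        for (int j = i + 1; j < items.length; j++) {
          ctx.write(new Text(items[i] + "|" + items[j]), ONE);   // 2-itemset candidates
        }
      }
    }
  }

  public static class SupportReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text itemset, Iterable<IntWritable> counts, Context ctx)
        throws IOException, InterruptedException {
      int minSup = ctx.getConfiguration().getInt("apriori.minsup", 2);
      int support = 0;
      for (IntWritable c : counts) support += c.get();
      if (support >= minSup) ctx.write(itemset, new IntWritable(support)); // keep frequent itemsets only
    }
  }
}
```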
APA, Harvard, Vancouver, ISO, and other styles
16

Bo, Yang, and Wang Chunli. "Health data analysis based on multi-calculation of big data during COVID-19 pandemic." Journal of Intelligent & Fuzzy Systems 39, no. 6 (December 4, 2020): 8775–82. http://dx.doi.org/10.3233/jifs-189274.

Full text
Abstract:
Under the influence of COVID-19, the analysis of physical health data helps grasp physical condition in time and improve the level of prevention and control of the epidemic. Especially for asymptomatic infections with the novel coronavirus, initial analysis of physical health data can help detect the possibility of virus infection to some extent. The digital information systems of traditional hospitals and other medical institutions are not well developed. For the large volumes of health data generated by smart medical technology, there is a lack of an effective platform for storage, management, query and analysis, and in particular the ability to mine valuable information from big data is missing. To address these problems, this paper proposes combining Struts 2 and Hadoop in the system architecture of the platform. A data mining association algorithm is adopted and improved based on MapReduce. A service platform for college students' physical health is designed to handle the storage, processing and mining of health big data. The experimental results show that the system can effectively process and analyze the big data of college students' physical health, which has a certain reference value for monitoring college students' physical health during the COVID-19 epidemic.
APA, Harvard, Vancouver, ISO, and other styles
17

Anand, L., K. Senthilkumar, N. Arivazhagan, and V. Sivakumar. "Analysis for guaranteeing performance in map reduce systems with hadoop and R." International Journal of Engineering & Technology 7, no. 3.3 (June 8, 2018): 445. http://dx.doi.org/10.14419/ijet.v7i2.33.14207.

Full text
Abstract:
Corporations have fast-growing amounts of data to process and store; a data explosion is going on around us. Currently, one of the most common ways to treat these huge data quantities is based on the MapReduce parallel programming paradigm. Although its use is widespread in industry, guaranteeing performance constraints while at the same time minimizing costs still poses considerable challenges. We propose a coarse-grained control-theoretic approach, based on techniques that have already proven their quality in the control community. We introduce a first formulation of dynamic models for big data MapReduce systems running a concurrent workload. Furthermore, we study two central control use cases: relaxed performance with minimal resources and strict performance. For the first case we develop two feedback control mechanisms: a classical feedback controller and an event-based feedback that also minimizes the number of cluster reconfigurations. Moreover, to handle strict performance requirements, a feedforward controller that quickly suppresses the effects of large workload size variations is developed. All the controllers are validated online on a benchmark running on a real sixty-node MapReduce cluster, using a data-intensive Business Intelligence workload. Our experiments demonstrate the success of the control strategies in meeting service time constraints.
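The control idea described above can be illustrated with a toy proportional-integral loop that resizes the cluster so the measured job service time tracks a target. The gains and the resize interface in the sketch below are illustrative assumptions, not the controllers evaluated in the paper.

```java
// Minimal sketch of a coarse-grained PI feedback loop for cluster sizing.
// Gains and the way the new size is applied are illustrative assumptions.
public class ClusterSizeController {
  private final double kp, ki;        // controller gains (tuning assumptions)
  private double integral = 0;

  public ClusterSizeController(double kp, double ki) {
    this.kp = kp;
    this.ki = ki;
  }

  /** Returns the node count to use for the next control period. */
  public int nextClusterSize(double targetSeconds, double measuredSeconds, int currentNodes) {
    double error = measuredSeconds - targetSeconds;   // positive error: the job is too slow
    integral += error;
    double adjustment = kp * error + ki * integral;   // PI control law
    int nodes = (int) Math.round(currentNodes + adjustment);
    return Math.max(1, nodes);                        // never scale below one node
  }
}
```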
APA, Harvard, Vancouver, ISO, and other styles
18

Bathla, Gourav, Himanshu Aggarwal, and Rinkle Rani. "Migrating From Data Mining to Big Data Mining." International Journal of Engineering & Technology 7, no. 3.4 (June 25, 2018): 13. http://dx.doi.org/10.14419/ijet.v7i3.4.14667.

Full text
Abstract:
Data mining is one of the most researched fields in computer science. Much research has been carried out to extract and analyse important information from raw data. Traditional data mining algorithms such as classification, clustering and statistical analysis can process small-scale data with great efficiency and accuracy. Social networking interactions, business transactions and other communications result in Big Data, data at a scale that is beyond the competency of traditional data mining techniques. It is observed that traditional data mining algorithms are not capable of storing and processing such large-scale data, and when some algorithms are capable, their response time is very high. Big Data contains hidden information that, if analysed intelligently, can be highly beneficial to business organizations. In this paper, we analyse the advancement from traditional data mining algorithms to Big Data mining algorithms. Applications of traditional data mining algorithms can be incorporated straightforwardly into Big Data mining algorithms. Several studies have compared traditional data mining with Big Data mining, but very few have analysed the most important algorithms within one piece of research, which is the core motive of our paper; readers can easily observe the differences between these algorithms along with their pros and cons. Mathematical concepts are applied in data mining algorithms: means and Euclidean distance calculation in K-means, vectors and margins in SVM, and Bayes' theorem and conditional probability in the Naïve Bayes algorithm are real examples. Classification and clustering are the most important applications of data mining. In this paper, the K-means, SVM and Naïve Bayes algorithms are analysed in detail to observe accuracy and response time from both a conceptual and an empirical perspective. Big Data technologies such as Hadoop and MapReduce are used for implementing Big Data mining algorithms. Performance evaluation metrics such as speedup, scaleup and response time are used to compare traditional mining with Big Data mining.
APA, Harvard, Vancouver, ISO, and other styles
19

Huang, Mingxia, Xuebo Yan, Zhu Bai, Haiqiang Zhang, and Zeen Xu. "Key Technologies of Intelligent Transportation Based on Image Recognition and Optimization Control." International Journal of Pattern Recognition and Artificial Intelligence 34, no. 10 (January 9, 2020): 2054024. http://dx.doi.org/10.1142/s0218001420540245.

Full text
Abstract:
With the development of digital image processing technology, the application scope of image recognition is more and more wide, involving all aspects of life. In particular, the rapid development of urbanization and the popularization and application of automobiles in recent years have led to a sharp increase in traffic problems in various countries, resulting in intelligent transportation technology based on image processing optimization control becoming an important research field of intelligent systems. Aiming at the application demand analysis of intelligent transportation system, this paper designs a set of high-definition bayonet systems for intelligent transportation. It combines data mining technology and distributed parallel Hadoop technology to design the architecture and analysis of intelligent traffic operation state data analysis. The mining algorithm suitable for the system proves the feasibility of the intelligent traffic operation state data analysis system with the actual traffic big data experiment, and aims to provide decision-making opinions for the traffic state. Using the deployed Hadoop server cluster and the AdaBoost algorithm of the improved MapReduce programming model, the example runs large traffic data, performs traffic analysis and speed–overspeed analysis, and extracts information conducive to traffic control. It proves the feasibility and effectiveness of using Hadoop platform to mine massive traffic information.
APA, Harvard, Vancouver, ISO, and other styles
20

Vidisha Sharma, Satish Kumar Alaria. "Improving the Performance of Heterogeneous Hadoop Clusters Using Map Reduce." International Journal on Recent and Innovation Trends in Computing and Communication 7, no. 2 (February 28, 2019): 11–17. http://dx.doi.org/10.17762/ijritcc.v7i2.5225.

Full text
Abstract:
The key issue that emerges from the tremendous growth of connectivity among devices and systems is that data is being created at an exponential rate, so that a feasible solution for processing it is becoming more difficult day by day. Thus, to build a platform for such an advanced level of data processing, hardware as well as software improvements need to be made to keep pace with such substantial data. To enhance the efficiency of Hadoop clusters in storing and analysing big data, we have proposed an algorithmic approach that caters to the needs of the heterogeneous data stored over Hadoop clusters and improves performance as well as efficiency. The paper aims to determine the adequacy of the new algorithm through comparison, recommendations, and a competitive approach to finding the best solution for improving the big data situation. The MapReduce technique from Hadoop helps keep a close watch over unstructured or heterogeneous Hadoop clusters, with insights into the results obtained from the algorithm. In this paper we propose generating a new algorithm to tackle these issues for business as well as non-business uses, which can help the development of the community. The proposed algorithm can help improve the standing of the MapReduce data indexing approach in heterogeneous Hadoop clusters. The work and experiments conducted here have yielded impressive results, among them the selection of schedulers to schedule jobs, the arrangement of data in a similarity matrix, clustering before scheduling queries, iterative mapping and reducing, and binding the internal dependencies together to avoid query stalling and long execution times. The experiments conducted also establish that, if a procedure is defined to handle the different use-case scenarios, one can greatly reduce the cost of computing and benefit from relying on distributed systems for fast execution.
APA, Harvard, Vancouver, ISO, and other styles
21

Vera-Baquero, Alejandro, Ricardo Colomo Palacios, Vladimir Stantchev, and Owen Molloy. "Leveraging big-data for business process analytics." Learning Organization 22, no. 4 (May 11, 2015): 215–28. http://dx.doi.org/10.1108/tlo-05-2014-0023.

Full text
Abstract:
Purpose – This paper aims to present a solution that enables organizations to monitor and analyse the performance of their business processes by means of Big Data technology. Business process improvement can drastically influence in the profit of corporations and helps them to remain viable. However, the use of traditional Business Intelligence systems is not sufficient to meet today ' s business needs. They normally are business domain-specific and have not been sufficiently process-aware to support the needs of process improvement-type activities, especially on large and complex supply chains, where it entails integrating, monitoring and analysing a vast amount of dispersed event logs, with no structure, and produced on a variety of heterogeneous environments. This paper tackles this variability by devising different Big-Data-based approaches that aim to gain visibility into process performance. Design/methodology/approach – Authors present a cloud-based solution that leverages (BD) technology to provide essential insights into business process improvement. The proposed solution is aimed at measuring and improving overall business performance, especially in very large and complex cross-organisational business processes, where this type of visibility is hard to achieve across heterogeneous systems. Findings – Three different (BD) approaches have been undertaken based on Hadoop and HBase. We introduced first, a map-reduce approach that it is suitable for batch processing and presents a very high scalability. Secondly, we have described an alternative solution by integrating the proposed system with Impala. This approach has significant improvements in respect with map reduce as it is focused on performing real-time queries over HBase. Finally, the use of secondary indexes has been also proposed with the aim of enabling immediate access to event instances for correlation in detriment of high duplication storage and synchronization issues. This approach has produced remarkable results in two real functional environments presented in the paper. Originality/value – The value of the contribution relies on the comparison and integration of software packages towards an integrated solution that is aimed to be adopted by industry. Apart from that, in this paper, authors illustrate the deployment of the architecture in two different settings.
APA, Harvard, Vancouver, ISO, and other styles
22

Yu, Rongrui, Chunqiong Wu, Bingwen Yan, Baoqin Yu, Xiukao Zhou, Yanliang Yu, and Na Chen. "Analysis of the Impact of Big Data on E-Commerce in Cloud Computing Environment." Complexity 2021 (May 26, 2021): 1–12. http://dx.doi.org/10.1155/2021/5613599.

Full text
Abstract:
This article starts with an analysis of an existing electronic commerce system, summarizes its characteristics, and analyzes and solves its existing problems. First, the characteristics of the relational database MySQL and the distributed database HBase are analyzed, their respective advantages and disadvantages are summarized, and the strengths of each are taken into account when storing data: MySQL is used to store the structured business data in the system, while HBase is used to store unstructured data such as pictures. These two storage mechanisms together constitute the data storage subsystem. Second, considering the large amount of data in the e-commerce system and the complex calculations of the data mining algorithm, the paper uses MapReduce to parallelize the data mining algorithm and builds a Hadoop-based commodity recommendation subsystem on this basis. JavaEE technology is used to design a full-featured web mall system. Finally, mobile e-commerce under the impact of cloud computing is analyzed, including relevant theories, service modes, architecture, core technology, and applications in e-commerce, which enable precision marketing, finding optimal logistics paths, and effective security measures to avoid transaction risks. This method avoids the drawback of traditional e-commerce that large-scale data cannot be processed in a timely manner, realizes the value hidden in the data, and enables precision marketing for e-commerce enterprises.
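The hybrid storage design described in the abstract keeps structured records in MySQL and unstructured content in HBase. A minimal sketch of the HBase side in Java, using the standard HBase client API with an illustrative table name and column family, could look like this:

```java
// Minimal sketch: writing unstructured content (a product image) to HBase,
// while structured order data would stay in MySQL. Table and column names
// are illustrative assumptions.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class ImageStore {

  /** Stores the raw bytes of a product image under the product id as row key. */
  public static void saveImage(String productId, byte[] imageBytes) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("product_images"))) {
      Put put = new Put(Bytes.toBytes(productId));
      put.addColumn(Bytes.toBytes("img"), Bytes.toBytes("jpeg"), imageBytes);
      table.put(put);   // unstructured payload goes to HBase; relational facts stay in MySQL
    }
  }
}
```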
APA, Harvard, Vancouver, ISO, and other styles
23

Hayadi, B. Herawan, and Edy Victor Haryanto. "Data Encryption and Decryption Techniques for a High Secure Dataset using Artificial Intelligence." International Innovative Research Journal of Engineering and Technology 6, no. 1 (September 30, 2020): CS—27—CS—37. http://dx.doi.org/10.32595/iirjet.org/v6i1.2020.133.

Full text
Abstract:
Data analytics is the science of extracting patterns, trends, and actionable detail from large data sets. Data is accumulating on organizations' servers in structured, semi-structured, and unstructured formats, and its demands are not met by conventional IT infrastructure, giving rise to the modern landscape of data analysis. For these reasons, several companies are turning to Hadoop (an open-source project) as a possible solution to this unmet commercial need. As the amount of data collected by organizations, especially unstructured data, explodes, Hadoop is increasingly emerging as one of the primary alternatives for storing that data and executing operations on it. A secondary concern of data analysis is security: the rapid increase in internet use and the dramatic shift towards social media applications that allow users to generate content freely intensify the already enormous volume of data. Today's firms have several things to bear in mind when starting innovation ventures for big data and analytics, and in the business environment the need for secure data analytics tools is mandatory. In previous work, a high-profile dataset was protected using an encryption technique alone; encryption by itself cannot secure data to a high degree, and there remains a chance of the original data becoming known to a third party. To reduce these issues, this paper introduces artificial intelligence: by using both encryption and decryption models together with artificial intelligence, the drawbacks of the existing work can be addressed, providing the data with a significant degree of authentication. Data extraction is further restricted on an attribute basis. The proposed model is expected to work better than the present model for secure and sensitive data analytics.
APA, Harvard, Vancouver, ISO, and other styles
24

Lu, You, Qiming Fu, Xuefeng Xi, and Zhenping Chen. "Cloud data acquisition and processing model based on blockchain." Journal of Intelligent & Fuzzy Systems 39, no. 4 (October 21, 2020): 5027–36. http://dx.doi.org/10.3233/jifs-179988.

Full text
Abstract:
Data outsourcing has gradually become a mainstream solution, but once data is outsourced, data owners no longer control the data hardware, so there is a possibility that the integrity of the data will be compromised. Many current studies have achieved cloud data set verification with low network overhead by designing algorithmic structures (e.g., hashing, Merkle verification trees); however, cloud service providers may not acknowledge the incompleteness of cloud data, in order to avoid liability or for business reasons. There is therefore a need to build a secure, reliable, tamper-proof, and non-forgeable verification system for accountability. Blockchain is a chain-like data structure constructed using data signatures, timestamps, hash functions, and proof-of-work mechanisms, and using blockchain technology to build an integrity verification system can achieve fault accountability. This paper uses the Hadoop framework to implement data collection and storage in an HBase system based on a big data architecture. In summary, building on research into blockchain cloud data collection and storage technology and on existing big data storage middleware, a high-throughput, highly concurrent and highly available data collection and processing system has been realized.
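To make the integrity mechanism concrete, here is a minimal, hypothetical Java sketch of a hash-chained block: each block stores the hash of its predecessor, so any tampering with stored data breaks all subsequent hashes. It illustrates the general idea only and is not the paper's verification protocol.

```java
// Minimal sketch of a hash-chained record used for integrity verification.
// Each block links to its predecessor through the predecessor's hash.
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

public class IntegrityBlock {
  final long timestamp;
  final String dataDigest;    // digest of the stored cloud data item
  final String previousHash;  // hash of the preceding block
  final String hash;          // hash over this block's contents

  IntegrityBlock(String dataDigest, String previousHash) throws NoSuchAlgorithmException {
    this.timestamp = System.currentTimeMillis();
    this.dataDigest = dataDigest;
    this.previousHash = previousHash;
    this.hash = sha256(timestamp + dataDigest + previousHash);
  }

  static String sha256(String s) throws NoSuchAlgorithmException {
    MessageDigest md = MessageDigest.getInstance("SHA-256");
    return Base64.getEncoder().encodeToString(md.digest(s.getBytes(StandardCharsets.UTF_8)));
  }

  /** A chain is valid only if every block's stored hash still matches its contents. */
  static boolean verify(IntegrityBlock prev, IntegrityBlock current) throws NoSuchAlgorithmException {
    return current.previousHash.equals(prev.hash)
        && current.hash.equals(sha256(current.timestamp + current.dataDigest + current.previousHash));
  }
}
```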
APA, Harvard, Vancouver, ISO, and other styles
25

Concolato, Claude E., and Li M. Chen. "Data Science: A New Paradigm in the Age of Big-Data Science and Analytics." New Mathematics and Natural Computation 13, no. 02 (July 2017): 119–43. http://dx.doi.org/10.1142/s1793005717400038.

Full text
Abstract:
As an emergent field of inquiry, Data Science serves both the information technology world and the applied sciences. Data Science is a known term that tends to be synonymous with the term Big-Data; however, Data Science is the application of solutions found through mathematical and computational research while Big-Data Science describes problems concerning the analysis of data with respect to volume, variation, and velocity (3V). Even though there is not much developed in theory from a scientific perspective for Data Science, there is still great opportunity for tremendous growth. Data Science is proving to be of paramount importance to the IT industry due to the increased need for understanding the insurmountable amount of data being produced and in need of analysis. In short, data is everywhere with various formats. Scientists are currently using statistical and AI analysis techniques like machine learning methods to understand massive sets of data, and naturally, they attempt to find relationships among datasets. In the past 10 years, the development of software systems within the cloud computing paradigm using tools like Hadoop and Apache Spark have aided in making tremendous advances to Data Science as a discipline [Z. Sun, L. Sun and K. Strang, Big data analytics services for enhancing business intelligence, Journal of Computer Information Systems (2016), doi: 10.1080/08874417.2016.1220239]. These advances enabled both scientists and IT professionals to use cloud computing infrastructure to process petabytes of data on daily basis. This is especially true for large private companies such as Walmart, Nvidia, and Google. This paper seeks to address pragmatic ways of looking at how Data Science — with respect to Big-Data Science — is practiced in the modern world. We also examine how mathematics and computer science help shape Big-Data Science’s terrain. We will highlight how mathematics and computer science have significantly impacted the development of Data Science approaches, tools, and how those approaches pose new questions that can drive new research areas within these core disciplines involving data analysis, machine learning, and visualization.
APA, Harvard, Vancouver, ISO, and other styles
26

Wang, Jing-Doo. "A Novel Approach to Improve Quality Control by Comparing the Tagged Sequences of Product Traceability." MATEC Web of Conferences 201 (2018): 05002. http://dx.doi.org/10.1051/matecconf/201820105002.

Full text
Abstract:
Quality control is an essential issue for manufacture, especially when the manufacture is towards intelligent manufacturing that is associated with “Internet of thing”(IOT) and “Artificial Intelligence”(AI) to speed up the rate of product line automatically nowadays. To monitor product quality automatically, it is necessary to collect and monitor the data generated by sensors, or to record parameters by machine operators, or to save the types (brands) of materials used when producing products. In this study, it is assumed that the sequences of the traceability of unqualified products are different from that of qualified ones, and these different values (or points) within the sequences result in these products qualified or unqualified. This approach extracts maximal repeats from the tagged sequences of product traceability, and meanwhile computes the class frequency distribution of these repeats, where the classes, e.g. “qualified” or “unqualified”, are derived from the tags. Instead of inspecting all of the sequences of product traceability aimlessly, quality control engineers can filter out those maximal repeats whose frequency distributions are unique to specific classes and then just check the corresponding processes of these repeats. However, from the practical point of view, it should be estimated as a big-data problem to extract these maximal repeats and meanwhile compute their corresponding class frequency distribution from a huge amount of tagged sequential data. To have this work practical, this study uses one previous work that is based on Hadoop MapReduce programming model. and has been applied for an U.S.A patent (US Patent App. 15/208,994). Therefore, it is expected to be able to handle a huge amount of sequences of product traceability. With this approach that can narrow down the range for identifying false points (processes) within product line, it is expected to improve quality control by comparing tagged sequences of product traceability in the future.
APA, Harvard, Vancouver, ISO, and other styles
27

Tigua Moreira, Sonia, Edison Cruz Navarrete, and Geovanny Cordova Perez. "Big Data: paradigm in construction in the face of the challenges and challenges of the financial sector in the 21st century." Universidad Ciencia y Tecnología 25, no. 110 (August 26, 2021): 127–37. http://dx.doi.org/10.47460/uct.v25i110.485.

Full text
Abstract:
The world of finance is immersed in multiple controversies, laden with contradictions and uncertainties typical of a social ecosystem, generating dynamic changes that lead to significant transformations, where the thematic discussion of Big Data becomes crucial for real-time logical decision-making. In this field of knowledge is located this article, which reports as a general objective to explore the strengths, weaknesses and future trends of Big Data in the financial sector, using as a methodology for exploration a scientific approach with the bibliographic tools scopus and scielo, using as a search equation the Big Data, delimited to the financial sector. The findings showed the growing importance of gaining knowledge from the huge amount of financial data generated daily globally, developing predictive capacity towards creating scenarios inclined to find solutions and make timely decisions. Keywords: Big Data, financial sector, decision-making. References [1]D. Reinsel, J. Gantz y J. Rydning, «Data Age 2025: The Evolution of Data to Life-Critical,» IDC White Pape, 2017. [2]R. Barranco Fragoso, «Que es big data IBM Developer works,» 18 Junio 2012. [Online]. Available: https://developer.ibm.com/es/articles/que-es-big-data/. [3]IBM, «IBM What is big data? - Bringing big data to the enterprise,» 2014. [Online]. Available: http://www.ibm.com/big-data/us/en/. [4]IDC, «Resumen Ejecutivo -Big Data: Un mercado emergente.,» Junio 2012. [Online]. Available: https://www.diarioabierto.es/wp-content/uploads/2012/06/Resumen-Ejecutivo-IDC-Big-Data.pdf. [5]Factor humano Formación, «Factor humano formación escuela internacional de postgrado.,» 2014. [Online]. Available: http//factorhumanoformación.com/big-data-ii/. [6]J. Luna, «Las tecnologías Big Data,» 23 Mayo 2018. [Online]. Available: https://www.teldat.com/blog/es/procesado-de-big-data-base-de-datos-de-big-data-clusters-nosql-mapreduce/#:~:text=Tecnolog%C3%ADas%20de%20procesamiento%20Big%20Data&text=De%20este%20modo%20es%20posible,las%20necesidades%20de%20procesado%20disminuyan. [7]T.A.S Foundation, "Apache cassandra 2015", The apache cassandra project, 2015. [8]E. Dede, B. Sendir, P. Kuzlu, J. Hartog y M. Govindaraju, «"An Evaluation of Cassandra for Hadoop",» de 2013 IEEE Sixth International Conference on Cloud Computing, Santa Clara, CA, USA, 2013. [9]The Apache Software Foundation, «"Apache HBase",» 04 Agosto 2017. [Online]. Available: http://hbase.apache.org/. [10]G. Deka, «"A Survey of Cloud Database Systems",» IT Professional, vol. 16, nº 02, pp. 50-57, 2014. [11]P. Dueñas, «Introducción al sistema financiero y bancario,» Bogotá. Politécnico Grancolombiano, 2008. [12]V. Mesén Figueroa, «Contabilización de CONTRATOS de FUTUROS, OPCIONES, FORWARDS y SWAPS,» Tec Empresarial, vol. 4, nº 1, pp. 42-48, 2010. [13] A. Castillo, «Cripto educación es lo que se necesita para entender el mundo de la Cripto-Alfabetización,» Noticias Artech Digital , 04 Junio 2018. [Online].Available: https://www.artechdigital.net/cripto-educacion-cripto-alfabetizacion/. [14]Conceptodefinicion.de, «Definicion de Cienciometría,» 16 Diciembre 2020. [Online]. Available: https://conceptodefinicion.de/cienciometria/. [15]Elsevier, «Scopus The Largest database of peer-reviewed literature» https//www.elsevier.com/solutions/scopus., 2016. [16]J. Russell, «Obtención de indicadores bibliométricos a partir de la utilización de las herramientas tradicionales de información,» de Conferencia presentada en el Congreso Internacional de información-INFO 2004, La Habana, Cuba, 2004. [17]J. 
Durán, Industrialized and Ready for Digital Transformation?, Barcelona: IESE Business School, 2015. [18]P. Orellana, «Omnicanalidad,» 06 Julio 2020. [Online]. Available: https://economipedia.com/definiciones/omnicanalidad.html. [19]G. Electrics, «Innovation Barometer,» 2018. [20]D. Chicoma y F. Casafranca, Interviewees, Entrevista a Daniel Chicoma y Fernando Casafranca, docentes del PADE Internacional en Gerencia de Tecnologías de la Información en ESAN. [Entrevista]. 2018. [21]L.R. La república, «La importancia del mercadeo en la actualidad,» 21 Junio 2013. [Online]. Available: https://www.larepublica.co/opinion/analistas/la-importancia-del-mercadeo-en-la-actualidad-2041232#:~:text=El%20mercadeo%20es%20cada%20d%C3%ADa,en%20los%20mercados%20(clientes). [22]UNED, «Acumulación de datos y Big data: Las preguntas correctas,» 10 Noviembre 2017. [Online]. Available: https://www.masterbigdataonline.com/index.php/en-el-blog/150-el-big-data-y-las-preguntas-correctas. [23]J. García, Banca aburrida: el negocio bancario tras la crisis económica, Fundacion Funcas - economía y sociedad, 2015, pp. 101 - 150. [24]G. Cutipa, «Las 5 principales ventajas y desventajas de bases de datos relacionales y no relacionales: NoSQL vs SQL,» 20 Abril 2020. [Online]. Available: https://guidocutipa.blog.bo/principales-ventajas-desventajas-bases-de-datos-relacionales-no-relacionales-nosql-vs-sql/. [25]R. Martinez, «Jornadas Big Data ANALYTICS,»19 Septiembre 2019. [Online]. Available: https://www.cfp.upv.es/formacion-permanente/curso/jornada-big-data-analytics_67010.html. [26]J. Rifkin, The End of Work: The Decline of the Global Labor Force and the Dawn of the Post-Market Era, Putnam Publishing Group, 1995. [27]R. Conde del Pozo, «Los 5 desafíos a los que se enfrenta el Big Data,» 13 Agosto 2019. [Online]. Available: https://diarioti.com/los-5-desafios-a-los-que-se-enfrenta-el-big-data/110607.
APA, Harvard, Vancouver, ISO, and other styles
28

Wahid, Ali, Steven Munkeby, and Samuel Sambasivam. "Machine Learning-based Flu Forecasting Study Using the Official Data from the Centers for Disease Control and Prevention and Twitter Data." Issues in Informing Science and Information Technology 18 (2021): 063–81. http://dx.doi.org/10.28945/4796.

Full text
Abstract:
Aim/Purpose: In the United States, the Centers for Disease Control and Prevention (CDC) tracks the disease activity using data collected from medical practice's on a weekly basis. Collection of data by CDC from medical practices on a weekly basis leads to a lag time of approximately 2 weeks before any viable action can be planned. The 2-week delay problem was addressed in the study by creating machine learning models to predict flu outbreak. Background: The 2-week delay problem was addressed in the study by correlation of the flu trends identified from Twitter data and official flu data from the Centers for Disease Control and Prevention (CDC) in combination with creating a machine learning model using both data sources to predict flu outbreak. Methodology: A quantitative correlational study was performed using a quasi-experimental design. Flu trends from the CDC portal and tweets with mention of flu and influenza from the state of Georgia were used over a period of 22 weeks from December 29, 2019 to May 30, 2020 for this study. Contribution: This research contributed to the body of knowledge by using a simple bag-of-word method for sentiment analysis followed by the combination of CDC and Twitter data to generate a flu prediction model with higher accuracy than using CDC data only. Findings: The study found that (a) there is no correlation between official flu data from CDC and tweets with mention of flu and (b) there is an improvement in the performance of a flu forecasting model based on a machine learning algorithm using both official flu data from CDC and tweets with mention of flu. Recommendations for Practitioners: In this study, it was found that there was no correlation between the official flu data from the CDC and the count of tweets with mention of flu, which is why tweets alone should be used with caution to predict a flu out-break. Based on the findings of this study, social media data can be used as an additional variable to improve the accuracy of flu prediction models. It is also found that fourth order polynomial and support vector regression models offered the best accuracy of flu prediction models. Recommendations for Researchers: Open-source data, such as Twitter feed, can be mined for useful intelligence benefiting society. Machine learning-based prediction models can be improved by adding open-source data to the primary data set. Impact on Society: Key implication of this study for practitioners in the field were to use social media postings to identify neighborhoods and geographic locations affected by seasonal outbreak, such as influenza, which would help reduce the spread of the disease and ultimately lead to containment. Based on the findings of this study, social media data will help health authorities in detecting seasonal outbreaks earlier than just using official CDC channels of disease and illness reporting from physicians and labs thus, empowering health officials to plan their responses swiftly and allocate their resources optimally for the most affected areas. Future Research: A future researcher could use more complex deep learning algorithms, such as Artificial Neural Networks and Recurrent Neural Networks, to evaluate the accuracy of flu outbreak prediction models as compared to the regression models used in this study. 
A future researcher could apply other sentiment analysis techniques, such as natural language processing and deep learning techniques, to identify context-sensitive emotion, concept extraction, and sarcasm detection for the identification of self-reporting flu tweets. A future researcher could expand the scope by continuously collecting tweets on a public cloud and applying big data applications, such as Hadoop and MapReduce, to perform predictions using several months of historical data or even years for a larger geographical area.
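The study's screening step is a simple bag-of-words check on tweet text. A minimal Java sketch of that idea, with an illustrative keyword list rather than the study's actual vocabulary, is shown below; weekly counts produced this way can then be aligned with the CDC series as the study describes.

```java
// Minimal sketch of bag-of-words screening for flu-related tweets.
// The keyword list is an illustrative assumption.
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class FluTweetFilter {
  private static final List<String> FLU_TERMS =
      Arrays.asList("flu", "influenza", "fever", "chills", "sick");

  /** Returns true when the tweet mentions at least one flu-related term. */
  public static boolean mentionsFlu(String tweet) {
    String lower = tweet.toLowerCase(Locale.ROOT);
    return FLU_TERMS.stream().anyMatch(lower::contains);
  }

  /** Weekly counts of such tweets can then be correlated with the official CDC series. */
  public static long weeklyCount(List<String> tweetsInWeek) {
    return tweetsInWeek.stream().filter(FluTweetFilter::mentionsFlu).count();
  }
}
```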
APA, Harvard, Vancouver, ISO, and other styles
29

Hanif, Muhammad, and Choonhwa Lee. "Jargon of Hadoop MapReduce scheduling techniques: a scientific categorization." Knowledge Engineering Review 34 (March 15, 2019). http://dx.doi.org/10.1017/s0269888918000371.

Full text
Abstract:
Recently, the valuable knowledge that can be retrieved from huge volumes of data (called Big Data) has set in motion the development of frameworks that process data through parallel and distributed computing, including Apache Hadoop, Facebook Corona, and Microsoft Dryad. Apache Hadoop is an open-source implementation of Google MapReduce that has attracted strong attention from the research community in both academia and industry. Hadoop MapReduce scheduling algorithms play a critical role in the management of large commodity clusters, controlling QoS requirements by supervising the execution of users, jobs, and tasks. Hadoop MapReduce ships with three schedulers: FIFO, Fair, and Capacity. However, the research community has developed new optimizations to account for advances and dynamic changes in hardware and operating environments. Numerous efforts have been made in the literature to address issues of network congestion, straggling, data locality, heterogeneity, resource under-utilization, and skew mitigation in Hadoop scheduling. Recently, the volume of research published in journals and conferences about Hadoop scheduling has consistently increased, which makes it difficult for researchers to grasp the overall view of the field and the areas that require further investigation. A scientific literature review was conducted in this study to assess preceding research contributions to the Apache Hadoop scheduling mechanism. We classify and quantify the main issues addressed in the literature based on their jargon and the areas they address. Moreover, we explain and discuss the various challenges and open issues in Hadoop scheduling optimization.
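As a small illustration of where these schedulers come into play, the sketch below submits a Hadoop Streaming job to a named queue managed by the Fair or Capacity scheduler via the standard `mapreduce.job.queuename` property. The queue name, jar path, HDFS paths, and script names are placeholders, not values from the survey.

```python
# Minimal sketch: hand a MapReduce job to a specific scheduler queue at
# submission time. All paths and the queue name are illustrative placeholders.
import subprocess

cmd = [
    "hadoop", "jar", "/usr/lib/hadoop-mapreduce/hadoop-streaming.jar",
    "-D", "mapreduce.job.queuename=analytics",  # queue managed by Fair/Capacity scheduler
    "-input", "/data/raw/events",
    "-output", "/data/out/event-counts",
    "-mapper", "mapper.py",
    "-reducer", "reducer.py",
    "-file", "mapper.py",
    "-file", "reducer.py",
]
subprocess.run(cmd, check=True)
```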
APA, Harvard, Vancouver, ISO, and other styles
30

"Indian Premier League Dataset Analytics using Hadoop-Hive." International Journal of Engineering and Advanced Technology 9, no. 2 (December 30, 2019): 3999–4004. http://dx.doi.org/10.35940/ijeat.b4579.129219.

Full text
Abstract:
Big Data is a term used to represent huge volumes of both unstructured and structured data that cannot be processed by traditional data processing techniques. This data is too huge, grows exponentially, and does not fit into the structure of traditional database systems. Analyzing Big Data is a very challenging task since it involves processing huge amounts of data. As an industry or its business grows, the data related to it also tends to grow on a larger scale. Powerful data analysis tools are required to analyze the data in order to gain value from it. Hadoop is a sought-after open-source framework that uses MapReduce techniques to store and process huge datasets. However, programs written using MapReduce techniques are not flexible and also require maintenance. This problem is overcome by making use of HiveQL. To execute HiveQL queries, the required platform is Hive, an open-source data warehousing system built on Hadoop. HiveQL queries are compiled into MapReduce jobs that are executed using Hadoop. In this paper we analyze the Indian Premier League dataset using HiveQL and compare its execution time with that of traditional SQL queries. It was found that HiveQL provided better performance with larger datasets, while SQL performed better with smaller datasets.
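To illustrate the kind of HiveQL aggregation described here, a minimal sketch follows that submits a query through the standard `hive -e` command line. The table name and columns (ipl_matches, season, winner) are assumptions for illustration; the paper's actual schema may differ.

```python
# Hypothetical HiveQL aggregation over an IPL matches table, submitted via
# the "hive -e" CLI. Hive compiles the query into MapReduce jobs on Hadoop.
import subprocess

query = """
SELECT winner, COUNT(*) AS wins
FROM ipl_matches
WHERE season = 2019
GROUP BY winner
ORDER BY wins DESC;
"""

subprocess.run(["hive", "-e", query], check=True)
```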
APA, Harvard, Vancouver, ISO, and other styles
31

"Big data Performance Evalution of Map-Reduce Pig and Hive." International Journal of Engineering and Advanced Technology 8, no. 6 (August 30, 2019): 2982–85. http://dx.doi.org/10.35940/ijeat.f9002.088619.

Full text
Abstract:
Big data refers to unstructured and structured data that cannot be processed by traditional systems; it is characterized not only by the volume of data but also by its velocity and variety. Processing means storing and analyzing the data to extract knowledge and support decision making. Every living and non-living thing, and each and every device, generates a tremendous amount of data every fraction of a second. Hadoop is a software framework for processing big data, extracting knowledge from stored data to enhance business and solve societal problems. Hadoop has two main components: HDFS for storage and MapReduce for processing. HDFS comprises a name node and data nodes for storage, while MapReduce comprises the Job Tracker and Task Tracker frameworks. Whenever a client asks Hadoop to store data, the name node responds with data nodes that have free space, the client writes data to those data nodes, and Hadoop's replication factor then copies the data blocks to other data nodes to provide fault tolerance, since HDFS uses commodity hardware for storage. The name node stores the metadata of the data nodes and has a backup secondary name node, because it is a single point of failure in Hadoop. Whenever a client wants to process data, it contacts the name node and Job Tracker, and the name node communicates with the Task Trackers to get the tasks done. All of these Hadoop components are frameworks on top of the operating system that manage and efficiently utilize system resources for big data processing. Big data processing performance is measured with benchmark programs. In our research work we compared the execution time of the word-count benchmark, with the same input file big.txt, implemented as Hadoop MapReduce Python code, a Pig script, and a Hive query. The MapReduce execution time was 1 min 29 s, the Pig execution time was 57 s, and the Hive execution time was 31 s, so Hive is much faster than Pig and the MapReduce Python code.
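For context, the word-count benchmark compared above is typically written for Hadoop Streaming as a mapper and a reducer reading from standard input. The following is a minimal sketch of that pattern; the file names (mapper.py, reducer.py) are the usual convention, not the paper's exact code.

```python
#!/usr/bin/env python3
# mapper.py -- Hadoop Streaming word-count mapper: emit "<word>\t1" for
# every word read from stdin.
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print(f"{word}\t1")
```

```python
#!/usr/bin/env python3
# reducer.py -- sum the counts per word; Hadoop Streaming delivers lines
# with the same key contiguously after the shuffle/sort phase.
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t")
    if word == current_word:
        current_count += int(count)
    else:
        if current_word is not None:
            print(f"{current_word}\t{current_count}")
        current_word, current_count = word, int(count)
if current_word is not None:
    print(f"{current_word}\t{current_count}")
```

Such a pair would be run over big.txt with the hadoop-streaming jar, while the equivalent Pig script and Hive query express the same aggregation declaratively.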
APA, Harvard, Vancouver, ISO, and other styles
32

Demirbaga, Umit. "HTwitt: a hadoop-based platform for analysis and visualization of streaming Twitter data." Neural Computing and Applications, May 5, 2021. http://dx.doi.org/10.1007/s00521-021-06046-y.

Full text
Abstract:
Twitter produces a massive amount of data due to its popularity, which is one of the sources of big data problems. One of those problems is the classification of tweets, whose sophisticated and complex language makes current tools insufficient. We present our framework HTwitt, built on top of the Hadoop ecosystem, which consists of a MapReduce algorithm and a set of machine learning techniques embedded within a big data analytics platform to efficiently address the following problems: (1) traditional data processing techniques are inadequate to handle big data; (2) data preprocessing needs substantial manual effort; (3) domain knowledge is required before classification; (4) semantic explanation is ignored. In this work, these challenges are overcome by using different algorithms combined with a Naïve Bayes classifier to ensure reliability and highly precise recommendations in virtualization and cloud environments. These features make HTwitt different from others in terms of having an effective and practical design for text classification in big data analytics. The main contribution of the paper is to propose a framework for building landslide early warning systems by pinpointing useful tweets and visualizing them along with the processed information. We demonstrate the results of experiments that quantify the level of overfitting in the training stage of the model using real-world datasets of different sizes in the machine learning phases. Our results demonstrate that the proposed system provides high-quality results, with a score of nearly 95%, and meets the requirements of a Hadoop-based classification system.
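To illustrate the Naïve Bayes classification step described here, the following is a minimal, self-contained sketch using scikit-learn. The example tweets and labels are invented for illustration; HTwitt itself embeds this kind of model inside a Hadoop/MapReduce pipeline rather than a standalone script.

```python
# Minimal sketch of bag-of-words + Naïve Bayes tweet classification.
# Training tweets and labels are invented, not the HTwitt dataset.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_tweets = [
    "heavy rain again, the hillside near our village is sliding",
    "mudslide blocked the road after last night's storm",
    "great match last night, what a goal",
    "new phone arrived today, battery life is amazing",
]
train_labels = ["landslide", "landslide", "other", "other"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_tweets, train_labels)

print(model.predict(["the slope behind the school collapsed after the rain"]))
```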
APA, Harvard, Vancouver, ISO, and other styles
33

Bai, Xujing, and Jiajun Li. "Intelligent platform for real-time page view statistics using educational big data digital resource sharing." Journal of Intelligent & Fuzzy Systems, September 25, 2020, 1–10. http://dx.doi.org/10.3233/jifs-189325.

Full text
Abstract:
In order to meet the rapid growth of educational data, automate the processing of educational data business, and improve operational efficiency and scientific decision-making, a statistical analysis platform for educational data is designed. A Hadoop-based educational data warehouse is designed from the conceptual, logical, and physical models, and the storage of a multidimensional educational data model is designed and researched. The query efficiency and storage space of HBase and Hive in the Hadoop ecosystem are then compared and tested on educational big data, and an integrated HBase+Hive architecture is used to complete the statistical analysis tasks for educational data. The functions of the educational data statistical analysis platform are migrated to the Hadoop-based educational big data platform, and a performance test of the conversion efficiency of educational big data in the ETL stage illustrates the effectiveness of the Hadoop-based platform. An object-oriented analysis and design method is used to analyze and design the business requirements of teaching resource sharing services. From the perspectives of managers and teachers, use case diagrams and use case description tables define the system's business requirements. The teacher role is further refined into participants in subject teaching and research, initiators of and participants in simulation teaching research and development, famous teachers, and high-quality course judges and experts. The recording, accumulation, statistics, and analysis of students' learning behaviors will provide more valuable applications for school education.
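As a sketch of the HBase side of an HBase+Hive architecture like the one described, the following stores and reads back a per-resource page-view record keyed by student and timestamp, using the happybase client. The host, table name, column family, and row-key scheme are assumptions for illustration, not the platform's actual design.

```python
# Hypothetical sketch: write a page-view event to HBase and read it back.
# Requires a running HBase Thrift server; names below are placeholders.
import happybase

connection = happybase.Connection("hbase-master.example.org")
table = connection.table("edu_page_views")

# Row key combines student id and timestamp so per-student scans stay cheap.
table.put(b"student_42#2020-09-25T10:15:00", {
    b"view:resource_id": b"course-101/lesson-3",
    b"view:duration_sec": b"184",
})

row = table.row(b"student_42#2020-09-25T10:15:00")
print(row[b"view:resource_id"], row[b"view:duration_sec"])
```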
APA, Harvard, Vancouver, ISO, and other styles
34

Banica, Logica, Persefoni Polychronidou, Cristian Stefan, and Alina Hagiu. "Empowering IT Operations through Artificial Intelligence – A New Business Perspective." KnE Social Sciences, January 12, 2020. http://dx.doi.org/10.18502/kss.v4i1.6003.

Full text
Abstract:
This paper aims to describe the concept of applying Artificial Intelligence to IT Operations (AIOps) and its main components: Big Data, Machine Learning, and Trend Analysis. The concept is implemented as a multi-layered fusion of the technologies that power the components of the AIOps platforms present on the IT market. The core of an AIOps platform is represented by the Big Data organization structure and by a massively parallel data processing platform such as Apache Hadoop. The ML component of the platform is able to infer future behaviour and the regular operations performed from the large volume of collected data, in order to develop the ability to automate activities. AIOps platforms find their place especially in very complex IT infrastructures that require constant monitoring and quick decisions in case of failures. The case study is based on the Moogsoft AIOps platform, and its features are presented in detail using the cloud trial version, clearly showing the potential of such an advanced tool for infrastructure monitoring and reporting. The experiment focused on the way Moogsoft monitors computing resources, handles events, and records alerts over a defined timespan, with alerts grouped by category (such as web services, social media, and networking). The platform is also able to display at any given moment the unresolved situations and their type of origin, and it includes automated remediation tools. The study presents the features of this software category, consisting of benefits for the business environment and their integration into the Internet-of-Things model. Keywords: Big Data, Machine Learning, AIOps, business performance.
APA, Harvard, Vancouver, ISO, and other styles
35

"Improved Hadoop Cluster Performance by Dynamic Load and Resource Aware Speculative Execution and Straggler Node Detection." International Journal of Engineering and Advanced Technology 9, no. 4 (April 30, 2020): 2370–77. http://dx.doi.org/10.35940/ijeat.d8017.049420.

Full text
Abstract:
Big data is one of the fastest-growing technologies; it can handle huge amounts of data from various sources, such as social media, web logs, and the banking and business sectors. In order to keep pace with changes in data patterns and to accommodate the requirements of big data analytics, storage and processing platforms such as Hadoop also require great advancements. Hadoop, an open-source project, executes big data processing jobs in map and reduce phases and follows a master-slave architecture. A Hadoop MapReduce job can be delayed if one of its many tasks is assigned to an unreliable or congested machine. To solve this straggler problem, a novel algorithm design of speculative execution schemes for parallel processing clusters, from an optimization perspective under different loading conditions, is proposed. For the lightly loaded case, a task cloning scheme, namely the combined-file task cloning algorithm based on maximizing overall system utility, is proposed, together with a straggler detection algorithm based on a workload threshold. Detecting stragglers and cloning only the tasks assigned to them is not enough to enhance performance unless the cloned tasks are allocated in a resource-aware manner. Therefore, a method is proposed that identifies and optimizes resource allocation by considering all possible aspects of cluster performance balancing. One main issue arises from the pre-configuration of distinct map and reduce slots based on the number of files in the input folder: this can cause severe under-utilization of slots, as map slots might not be fully utilized with respect to the input splits. To solve this issue, an alternative Hadoop slot allocation technique is introduced in this paper while keeping the efficient slot management model. The combined-file task cloning algorithm combines files that are smaller than a single data block and executes them on the best-performing data node. Implementing these cloning and combining techniques on a heavily loaded cluster after detecting the stragglers is found to reduce the elapsed execution time to an average of 40%. The detection algorithm improves the overall performance of the heavily loaded cluster by 20% of the total elapsed time in comparison with the native Hadoop algorithm.
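To make the threshold-based straggler detection idea more concrete, here is a simplified sketch that flags running tasks whose progress rate falls well below the job average, which is the general heuristic behind speculative execution. The threshold value, task records, and structure are illustrative assumptions, not the paper's exact algorithm.

```python
# Simplified straggler detection: a task is a straggler candidate when its
# progress rate drops below slow_ratio * the mean rate of all running tasks;
# the scheduler would then clone (speculatively re-execute) it elsewhere.
from dataclasses import dataclass

@dataclass
class TaskStatus:
    task_id: str
    progress: float      # fraction of work completed, 0.0 .. 1.0
    elapsed_sec: float   # time since the task attempt started

def find_stragglers(tasks, slow_ratio=0.5):
    rates = [t.progress / t.elapsed_sec for t in tasks if t.elapsed_sec > 0]
    if not rates:
        return []
    mean_rate = sum(rates) / len(rates)
    return [t for t in tasks
            if t.elapsed_sec > 0 and t.progress / t.elapsed_sec < slow_ratio * mean_rate]

tasks = [
    TaskStatus("map_0001", 0.90, 60),
    TaskStatus("map_0002", 0.85, 58),
    TaskStatus("map_0003", 0.20, 61),   # likely running on a congested node
]
print([t.task_id for t in find_stragglers(tasks)])
```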
APA, Harvard, Vancouver, ISO, and other styles