Academic literature on the topic 'Huge datasets'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Huge datasets.'

Next to every source in the list of references is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "Huge datasets"

1

Liu, Wantao, Brian Tieman, Rajkumar Kettimuthu, and Ian Foster. "Moving huge scientific datasets over the Internet." Concurrency and Computation: Practice and Experience 23, no. 18 (July 6, 2011): 2404–20. http://dx.doi.org/10.1002/cpe.1779.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Mohan, Shyam, and Shanmugapriya P. "Clustering of huge datasets using Machine Intelligence Techniques." International Journal of Computer Applications 181, no. 18 (September 18, 2018): 8–14. http://dx.doi.org/10.5120/ijca2018917856.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Mohan, Shyam, and Shanmugapriya P. "Clustering Algorithms for Huge Datasets: A Mathematical Approach." International Journal of Computer Applications 181, no. 49 (April 11, 2019): 58–62. http://dx.doi.org/10.5120/ijca2019918724.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Peng, Mingyuan, Lifu Zhang, Xuejian Sun, Yi Cen, and Xiaoyang Zhao. "A Fast Three-Dimensional Convolutional Neural Network-Based Spatiotemporal Fusion Method (STF3DCNN) Using a Spatial-Temporal-Spectral Dataset." Remote Sensing 12, no. 23 (November 27, 2020): 3888. http://dx.doi.org/10.3390/rs12233888.

Full text
Abstract:
With the growing development of remote sensors, huge volumes of remote sensing data are being utilized in related applications, bringing new challenges to the efficiency and capability of processing huge datasets. Spatiotemporal remote sensing data fusion can restore high spatial and high temporal resolution remote sensing data from multiple remote sensing datasets. However, the current methods require long computing times and are of low efficiency, especially the newly proposed deep learning-based methods. Here, we propose a fast three-dimensional convolutional neural network-based spatiotemporal fusion method (STF3DCNN) using a spatial-temporal-spectral dataset. This method is able to fuse low-spatial high-temporal resolution data (HTLS) and high-spatial low-temporal resolution data (HSLT) in a four-dimensional spatial-temporal-spectral dataset with increasing efficiency, while simultaneously ensuring accuracy. The method was tested using three datasets, and discussions of the network parameters were conducted. In addition, this method was compared with commonly used spatiotemporal fusion methods to verify our conclusion.
APA, Harvard, Vancouver, ISO, and other styles
5

Kamala, Rosita, and Ranjit Jeba Thangaiah. "An Improved Hybrid Feature Selection Method for Huge Dimensional Datasets." IAES International Journal of Artificial Intelligence (IJ-AI) 8, no. 1 (March 1, 2019): 77. http://dx.doi.org/10.11591/ijai.v8.i1.pp77-86.

Full text
Abstract:
Variable selection is the most essential function in predictive analytics: it reduces dimensionality, without losing relevant information, by selecting a few significant features of machine learning problems. The major techniques involved in this process are filter and wrapper methodologies. While filters measure the weight of features based on an attribute-weighting criterion, the wrapper approach computes the competence of the variable selection algorithms. The wrapper approach selects feature subgroups by pruning the feature space in its search space. The objective of this paper is to choose the most favourable attribute subset from the original set of features, using a combination method that unites the merits of filters and wrappers. To achieve this objective, an Improved Hybrid Feature Selection (IMFS) method is performed to create well-organized learners. The results of this study show that the IMFS algorithm can build competent business applications with better precision than those constructed by previously reported hybrid variable selection algorithms. Experimentation with UCI (University of California, Irvine) repository datasets affirms that this method has better prediction performance, is more robust to input noise and outliers, and scales well with the available features when compared with the existing algorithms in the literature.
APA, Harvard, Vancouver, ISO, and other styles
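The filter-plus-wrapper hybrid that the abstract above describes can be illustrated with a short sketch. This is not the authors' IMFS implementation: the correlation-based filter, the 1-nearest-neighbour wrapper, and all names and parameters here are illustrative assumptions.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def filter_stage(X, y, keep):
    """Filter step: rank features by |correlation| with the label, keep the best."""
    scores = [(abs(pearson([row[j] for row in X], y)), j) for j in range(len(X[0]))]
    return [j for _, j in sorted(scores, reverse=True)[:keep]]

def loo_accuracy(X, y, feats):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on `feats`."""
    hits = 0
    for i in range(len(X)):
        nn = min((j for j in range(len(X)) if j != i),
                 key=lambda j: sum((X[i][f] - X[j][f]) ** 2 for f in feats))
        hits += y[nn] == y[i]
    return hits / len(X)

def wrapper_stage(X, y, candidates):
    """Wrapper step: greedy forward selection driven by classifier accuracy."""
    chosen, best = [], 0.0
    improved = True
    while improved and candidates:
        improved, pick = False, None
        for f in candidates:
            acc = loo_accuracy(X, y, chosen + [f])
            if acc > best:
                best, pick, improved = acc, f, True
        if improved:
            chosen.append(pick)
            candidates.remove(pick)
    return chosen, best
```

The point of the hybrid, as in the abstract, is that the cheap filter discards weakly relevant features first, so the expensive classifier-in-the-loop wrapper only searches the survivors.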
6

Fu, Yu, and Jun Rui Yang. "Association Rules Optimization Algorithm Based on Fuzzy Clustering." Applied Mechanics and Materials 602-605 (August 2014): 3536–39. http://dx.doi.org/10.4028/www.scientific.net/amm.602-605.3536.

Full text
Abstract:
Frequent pattern mining has been an important research direction in association rules. This paper preprocesses the original dataset using fuzzy clustering, which maps quantitative datasets into linguistic datasets. We then propose an algorithm based on a fuzzy frequent pattern tree for extracting fuzzy frequent itemsets from the mapped linguistic datasets. Experimental results show that our algorithm has a shorter computing time than F-Apriori on huge databases. For large databases, the algorithm presented in this paper is shown to have good prospects.
APA, Harvard, Vancouver, ISO, and other styles
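The preprocessing step in the abstract above, mapping quantitative values to linguistic terms via fuzzy memberships, can be sketched as follows. This is an illustrative reconstruction using fixed triangular membership functions; the authors derive the mapping from fuzzy clustering, so the term boundaries here are assumptions.

```python
def tri(x, a, b, c):
    """Triangular membership function rising from a, peaking at b, falling to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fuzzify(value, lo, hi):
    """Fuzzy memberships of a quantitative value in three linguistic terms."""
    mid, span = (lo + hi) / 2, hi - lo
    return {
        "low":    tri(value, lo - span / 2, lo, mid),
        "medium": tri(value, lo, mid, hi),
        "high":   tri(value, mid, hi, hi + span / 2),
    }

def to_linguistic(value, lo, hi):
    """Map a quantitative value to the linguistic term with maximum membership."""
    m = fuzzify(value, lo, hi)
    return max(m, key=m.get)
```

Once every quantitative attribute is replaced by such linguistic labels, a frequent-pattern-tree miner can operate on the transformed transactions.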
7

Prakash, R. Vijaya, S. S. V. N. Sarma, and M. Sheshikala. "Generating Non-redundant Multilevel Association Rules Using Min-max Exact Rules." International Journal of Electrical and Computer Engineering (IJECE) 8, no. 6 (December 1, 2018): 4568. http://dx.doi.org/10.11591/ijece.v8i6.pp4568-4576.

Full text
Abstract:
Association rule mining plays an important role in the discovery of knowledge and information. Association rule mining discovers a huge number of rules for any dataset at different support and confidence values, many of which are redundant, especially in the case of multi-level datasets. Mining non-redundant association rules in multi-level datasets is a big concern in the field of data mining. In this paper, we present a definition of redundancy and a concise representation called the Reliable Exact basis for representing non-redundant association rules from multi-level datasets. The given non-redundant association rules are a lossless representation for any dataset.
APA, Harvard, Vancouver, ISO, and other styles
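The notion of redundancy in the abstract above can be made concrete: a rule X → Y adds nothing if a more general rule X' → Y' (with X' ⊆ X and Y ⊆ Y') has identical support and confidence. The sketch below illustrates that definition only; it is not the paper's Reliable Exact basis or min-max exact rule algorithm.

```python
def support(itemset, transactions):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs, transactions):
    """Confidence of the rule lhs -> rhs."""
    return support(lhs | rhs, transactions) / support(lhs, transactions)

def prune_redundant(rules, transactions):
    """Drop rule (X -> Y) when a more general rule (X' -> Y'), with X' a subset
    of X and Y a subset of Y', has identical support and confidence."""
    kept = []
    for lhs, rhs in rules:
        s = support(lhs | rhs, transactions)
        c = confidence(lhs, rhs, transactions)
        redundant = any(
            (l2, r2) != (lhs, rhs) and l2 <= lhs and rhs <= r2
            and abs(support(l2 | r2, transactions) - s) < 1e-12
            and abs(confidence(l2, r2, transactions) - c) < 1e-12
            for l2, r2 in rules)
        if not redundant:
            kept.append((lhs, rhs))
    return kept
```

For example, if every transaction containing {a} also contains {b}, the rule {a, b} → {c} is pruned because {a} → {c} already carries the same support and confidence.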
8

Thaseen, Ikram Sumaiya, Vanitha Mohanraj, Sakthivel Ramachandran, Kishore Sanapala, and Sang-Soo Yeo. "A Hadoop Based Framework Integrating Machine Learning Classifiers for Anomaly Detection in the Internet of Things." Electronics 10, no. 16 (August 13, 2021): 1955. http://dx.doi.org/10.3390/electronics10161955.

Full text
Abstract:
In recent years, different botnet variants have been targeting government and private organizations, and there is a crucial need to develop a robust framework for securing the IoT (Internet of Things) network. In this paper, a Hadoop-based framework is proposed to identify malicious IoT traffic using modified Tomek-link under-sampling integrated with automated hyper-parameter tuning of machine learning classifiers. The novelty of this paper is to utilize a big data platform for benchmark IoT datasets to minimize computational time. The IoT benchmark datasets are loaded in the Hadoop Distributed File System (HDFS) environment. Three machine learning approaches, namely naive Bayes (NB), K-nearest neighbor (KNN), and support vector machine (SVM), are used for categorizing IoT traffic. Artificial immune network optimization is deployed during cross-validation to obtain the best classifier parameters. Experimental analysis is performed on the Hadoop platform. Average accuracies of 99% and 90% are obtained for the BoT-IoT and ToN-IoT datasets, respectively. The accuracy difference on the ToN-IoT dataset is due to the huge number of data samples captured at the edge layer and fog layer, whereas for the BoT-IoT dataset only 5% of the training and test samples from the complete dataset are considered for experimental analysis, as released by the dataset developers. The overall accuracy is improved by 19% in comparison with state-of-the-art techniques. The computational times for the huge datasets are reduced by 3–4 hours through MapReduce in HDFS.
APA, Harvard, Vancouver, ISO, and other styles
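The modified Tomek-link under-sampling mentioned above builds on the classic Tomek-link rule: a pair of opposite-class points that are mutual nearest neighbours marks a borderline or noisy region, and the majority-class member is removed. The following is a plain single-machine sketch of that classic rule; the paper's modification and its Hadoop deployment are not reproduced here.

```python
def tomek_undersample(X, y, majority):
    """Remove majority-class points that form Tomek links with minority points.

    A Tomek link is a pair (i, j) of opposite-class samples that are each
    other's nearest neighbour.  X is a list of numeric tuples, y the labels.
    """
    def nearest(i):
        # index of the nearest other sample by squared Euclidean distance
        return min((j for j in range(len(X)) if j != i),
                   key=lambda j: sum((a - b) ** 2 for a, b in zip(X[i], X[j])))

    drop = set()
    for i in range(len(X)):
        j = nearest(i)
        if y[i] != y[j] and nearest(j) == i:   # mutual NN, opposite classes
            drop.add(i if y[i] == majority else j)
    keep = [k for k in range(len(X)) if k not in drop]
    return [X[k] for k in keep], [y[k] for k in keep]
```

On imbalanced intrusion-detection data, this thins the majority class exactly where it crowds the minority class, which is why it is a common companion to classifiers such as NB, KNN, and SVM.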
9

de Alfonso, C., V. Hernández, and I. Blanquer. "Large Medical Datasets on the Grid." Methods of Information in Medicine 44, no. 02 (2005): 172–76. http://dx.doi.org/10.1055/s-0038-1633940.

Full text
Abstract:
Summary Objective: This paper shows the use of the emerging Grid technology for gathering underused resources that are distributed across a corporate network. The work of these resources is coordinated to face tasks that are not affordable by the individual usage of each of them. Methods: This paper shows an application for the projection, using volume rendering techniques, of huge medical volumes obtained from CTs and MRIs, adapted to Grid computing. Results: The article shows the feasibility of creating an application based upon Grid technology which solves problems that cannot be addressed using common techniques. As an example, the article describes the projection of a huge medical dataset, which exceeds the resources of most common PCs, carried out by taking advantage of idle CPU cycles from the computers of an organization. Conclusions: Grid technology is emerging as a new framework that allows gathering and coordinating resources distributed across a network (LAN or WAN) to address problems which cannot be solved through the single use of any of these resources. Medical imaging is a clear application area for this technology.
APA, Harvard, Vancouver, ISO, and other styles
10

Kamdar, Apexa B., and Jay M. Jagani. "A survey: classification of huge cloud Datasets with efficient Map - Reduce policy." International Journal of Engineering Trends and Technology 18, no. 2 (December 25, 2014): 103–7. http://dx.doi.org/10.14445/22315381/ijett-v18p218.

Full text
APA, Harvard, Vancouver, ISO, and other styles
More sources

Dissertations / Theses on the topic "Huge datasets"

1

Lundgren, Therese. "Digitizing the Parthenon using 3D Scanning : Managing Huge Datasets." Thesis, Linköping University, Department of Science and Technology, 2004. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-2636.

Full text
Abstract:

Digitizing objects and environments from the real world has become an important part of creating realistic computer graphics. Through the use of structured lighting and laser time-of-flight measurements, the capturing of geometric models is now a common process. The results are visualizations where viewers gain new possibilities for both visual and intellectual experiences.

This thesis presents the reconstruction of the Parthenon temple and its environment in Athens, Greece by using a 3D laser-scanning technique.

In order to reconstruct a realistic model using 3D scanning techniques, there are various phases in which the acquired datasets have to be processed. The data has to be organized, registered, and integrated, in addition to pre- and post-processing. This thesis describes the development of a suitable and efficient data-processing pipeline for the given data.

The approach differs from previous scanning projects in that it digitizes this large-scale object at very high resolution. In particular, the issue of managing and processing huge datasets is described.

Finally, the processing of the datasets in the different phases and the resulting 3D model of the Parthenon are presented and evaluated.

APA, Harvard, Vancouver, ISO, and other styles
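The registration phase of such a scanning pipeline aligns overlapping scans into one coordinate frame. As a hedged illustration only, the sketch below solves the far simpler closed-form 2-D rigid alignment with known point correspondences; the thesis's actual 3-D pipeline for huge datasets is much more involved.

```python
import math

def register_2d(src, dst):
    """Closed-form 2-D rigid registration (rotation + translation) aligning
    src onto dst, assuming known one-to-one point correspondences."""
    n = len(src)
    cs = [sum(p[k] for p in src) / n for k in (0, 1)]   # src centroid
    cd = [sum(p[k] for p in dst) / n for k in (0, 1)]   # dst centroid
    # accumulate dot- and cross-products of centred point pairs
    sxx = sum((s[0] - cs[0]) * (d[0] - cd[0]) + (s[1] - cs[1]) * (d[1] - cd[1])
              for s, d in zip(src, dst))
    sxy = sum((s[0] - cs[0]) * (d[1] - cd[1]) - (s[1] - cs[1]) * (d[0] - cd[0])
              for s, d in zip(src, dst))
    theta = math.atan2(sxy, sxx)                        # optimal rotation angle
    c, s_ = math.cos(theta), math.sin(theta)
    tx = cd[0] - (c * cs[0] - s_ * cs[1])               # translation that maps
    ty = cd[1] - (s_ * cs[0] + c * cs[1])               # the rotated src centroid

    def apply(p):
        return (c * p[0] - s_ * p[1] + tx, s_ * p[0] + c * p[1] + ty)

    return theta, apply
```

In a real scan-registration setting the correspondences are unknown and must be estimated iteratively (as in ICP), with the 3-D analogue of this closed-form step solved at each iteration.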
2

Zhang, Hang. "Distributed Support Vector Machine With Graphics Processing Units." ScholarWorks@UNO, 2009. http://scholarworks.uno.edu/td/991.

Full text
Abstract:
Training a Support Vector Machine (SVM) requires the solution of a very large quadratic programming (QP) optimization problem. Sequential Minimal Optimization (SMO) is a decomposition-based algorithm which breaks this large QP problem into a series of smallest possible QP problems. However, it still costs O(n²) computation time. In our SVM implementation, we can train on huge datasets in a distributed manner (by breaking the dataset into chunks, then using the Message Passing Interface (MPI) to distribute each chunk to a different machine and performing SVM training within each chunk). In addition, we moved the kernel calculation part of SVM classification to a graphics processing unit (GPU), which has zero scheduling overhead for creating concurrent threads. In this thesis, we take advantage of this GPU architecture to improve the classification performance of SVM.
APA, Harvard, Vancouver, ISO, and other styles
3

Sung, Chih-Hsuan, and 宋芝萱. "Aleatory Variability of Ground-motion Prediction Equations Deduced from a Huge Dataset in Taiwan." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/shut33.

Full text
Abstract:
Ph.D. dissertation
National Central University
Graduate Institute of Applied Geology
Academic year 105 (2016)
In this study, we use 19,887 records from 150 crustal earthquakes with moment magnitudes greater than 4.0, obtained from the Taiwan Strong-Motion Instrumentation Program network, to build Taiwan ground-motion prediction equations (GMPEs) for peak ground acceleration and spectral accelerations. The nonlinear regression analysis of the ground-motion prediction model is a mixed-effects model fitted with the maximum likelihood method; through this regression analysis we examine the relationships of source, path, and site. This work describes approaches for presenting the components of the error in ground-motion estimates for future earthquakes: (1) a spatial-correlation moving window, (2) a path diagram, (3) a semi-variogram, (4) a closeness index, and (5) the epicentral distance. Comparing the results with those obtained from the same data using the closeness index, semi-variogram, and epicentral-distance approaches shows that we obtain a lower path-to-path sigma with the combination of the spatial-correlation moving window and path diagram methods. For peak ground acceleration and spectral accelerations at periods of 0.3 s, 1.0 s, and 3.0 s, the path-to-path standard deviations obtained with the new approaches are 40%–55% smaller than the total standard deviation. We also set up ground-motion prediction equations for a single station, a single source, and a single source to an array. When we use these condition-specific GMPEs to analyze the variance, we obtain smaller single-station sigma, single-path sigma, and intra-event aleatory variability than with general GMPEs. If we use only aleatory variability in PSHA, the resultant hazard level would be 20% lower than the traditional one at the 2475-year return period.
APA, Harvard, Vancouver, ISO, and other styles
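The split between between-event and within-event variability that underlies sigma estimates like those above can be sketched in a few lines. This is an illustrative decomposition by per-event mean residuals, not the study's full mixed-effects maximum-likelihood fit.

```python
def decompose_residuals(residuals, event_ids):
    """Split total GMPE residuals into between-event terms (one mean per
    earthquake) and within-event residuals -- the usual mixed-effects split."""
    by_event = {}
    for r, e in zip(residuals, event_ids):
        by_event.setdefault(e, []).append(r)
    event_terms = {e: sum(v) / len(v) for e, v in by_event.items()}
    within = [r - event_terms[e] for r, e in zip(residuals, event_ids)]
    return event_terms, within
```

The standard deviation of the event terms approximates the between-event sigma, and that of the remaining residuals the within-event (intra-event) sigma that the dissertation further decomposes by path and site.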

Books on the topic "Huge datasets"

1

Marsh, Michael, David Farrell, and Theresa Reidy, eds. The post-crisis Irish voter. Manchester University Press, 2018. http://dx.doi.org/10.7228/manchester/9781526122643.001.0001.

Full text
Abstract:
This is the definitive study of the Irish general election of 2016 – the most dramatic election in a generation, which among other things resulted in the worst electoral outcome for Ireland's established parties, the most fractionalized party system in the history of the state, and the emergence of new parties and groups, some of these of a 'populist' hue. This was one of the most volatile elections in Ireland (and among the most volatile elections in Europe), with one of the lowest election turnouts in the state's history. These outcomes follow a pattern seen across a number of Western Europe's established democracies, in which the 'deep crisis' of the Great Recession has wreaked havoc on party systems. The objective of this book is to assess this most extraordinary of Irish elections in both its Irish and wider cross-national contexts. With contributions from leading scholars on Irish elections and parties, and using a unique dataset – the Irish National Election Study (INES) 2016 – this volume explores voting patterns at Ireland's first post-crisis election and considers the implications for the electoral landscape and politics in Ireland. This book will be of interest to scholars of parties and elections. It should provide important supplementary reading for any university course on Irish politics, and it should also be of interest to general readers interested in contemporary Irish affairs.
APA, Harvard, Vancouver, ISO, and other styles

Book chapters on the topic "Huge datasets"

1

Parvin, Hamid, Behrouz Minaei, and Hosein Alizadeh. "A Heuristic Classifier Ensemble for Huge Datasets." In Active Media Technology, 29–38. Berlin, Heidelberg: Springer Berlin Heidelberg, 2011. http://dx.doi.org/10.1007/978-3-642-23620-4_7.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Parvin, Hamid, Behrouz Minaei-Bidgoli, and Sajad Parvin. "A Scalable Heuristic Classifier for Huge Datasets: A Theoretical Approach." In Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications, 380–90. Berlin, Heidelberg: Springer Berlin Heidelberg, 2011. http://dx.doi.org/10.1007/978-3-642-25085-9_45.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Díaz-Pacheco, Angel, and Carlos Alberto Reyes-García. "Full Model Selection in Huge Datasets and for Proxy Models Construction." In Advances in Soft Computing, 171–82. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-030-04491-6_13.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

de Haro-García, Aida, Javier Pérez-Rodríguez, and Nicolás García-Pedrajas. "A Comparison of Two Strategies for Scaling Up Instance Selection in Huge Datasets." In Advances in Artificial Intelligence, 64–73. Berlin, Heidelberg: Springer Berlin Heidelberg, 2011. http://dx.doi.org/10.1007/978-3-642-25274-7_7.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Makridis, Michail, Raúl Fidalgo-Merino, José-Antonio Cotelo-Lema, Aris Tsois, and Enrico Checchi. "A Quality Assessment Framework for Large Datasets of Container-Trips Information." In Computer Information Systems and Industrial Management, 729–40. Cham: Springer International Publishing, 2016. http://dx.doi.org/10.1007/978-3-319-45378-1_63.

Full text
Abstract:
Abstract Customs administrations worldwide face the challenge of supervising huge volumes of containerized trade arriving in their country with resources allowing them to inspect only a minimal fraction of it. Risk assessment procedures can support them in selecting the containers to inspect. Container-Trip information (CTI) is an important element for that evaluation, but it is usually not available with the needed quality. Therefore, the quality of CTI records computed from whatever data sources may be used (e.g. Container Status Messages) needs to be assessed. This paper presents a quality assessment framework that combines quantitative and qualitative domain-specific metrics to evaluate the quality of large datasets of CTI records and to provide more complete feedback on which aspects need to be revised to improve the quality of the output data. The experimental results show the robustness of the framework in highlighting the weak points of the datasets and in efficiently identifying cases of potentially wrong CTI records.
APA, Harvard, Vancouver, ISO, and other styles
6

Quicke, Donald L. J., Buntika A. Butcher, and Rachel A. Kruft Welton. "Very basic R syntax." In Practical R for biologists: an introduction, 9–12. Wallingford: CABI, 2021. http://dx.doi.org/10.1079/9781789245349.0009.

Full text
Abstract:
Abstract R is a programming language that has a huge range of inbuilt statistical and graphical functions. Firstly, this chapter shows how R works by talking you through a number of exercises, often producing graphical output, so you will get to know how to write simple code and become familiar with some of the most commonly used R functions for manipulating data and doing simple calculations. For ease, the chapter will firstly use a non-biological type of example. Thereafter, it will enter, display and analyse a number of real biological or medical datasets as might be obtained in student class experiments or fieldwork projects. Further on, it will present an outline of statistical tests appropriate to various types of data that you will come across.
APA, Harvard, Vancouver, ISO, and other styles
7

Quicke, Donald L. J., Buntika A. Butcher, and Rachel A. Kruft Welton. "Very basic R syntax." In Practical R for biologists: an introduction, 9–12. Wallingford: CABI, 2021. http://dx.doi.org/10.1079/9781789245349.0003a.

Full text
Abstract:
Abstract R is a programming language that has a huge range of inbuilt statistical and graphical functions. Firstly, this chapter shows how R works by talking you through a number of exercises, often producing graphical output, so you will get to know how to write simple code and become familiar with some of the most commonly used R functions for manipulating data and doing simple calculations. For ease, the chapter will firstly use a non-biological type of example. Thereafter, it will enter, display and analyse a number of real biological or medical datasets as might be obtained in student class experiments or fieldwork projects. Further on, it will present an outline of statistical tests appropriate to various types of data that you will come across.
APA, Harvard, Vancouver, ISO, and other styles
8

Parvin, Hamid, Behrouz Minaei, Hosein Alizadeh, and Akram Beigi. "A Novel Classifier Ensemble Method Based on Class Weightening in Huge Dataset." In Advances in Neural Networks – ISNN 2011, 144–50. Berlin, Heidelberg: Springer Berlin Heidelberg, 2011. http://dx.doi.org/10.1007/978-3-642-21090-7_17.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Kirci, Pinar. "Intelligent Techniques for Analysis of Big Data About Healthcare and Medical Records." In Handbook of Research on Promoting Business Process Improvement Through Inventory Control Techniques, 559–82. IGI Global, 2018. http://dx.doi.org/10.4018/978-1-5225-3232-3.ch029.

Full text
Abstract:
The term 'big data' is used to describe huge datasets. The '4 V' characterization of such datasets implies volume, variety, velocity, and value for many areas, especially medical images, electronic medical records (EMR), and biometrics data. Processing and managing such datasets at the storage, analysis, and visualization stages are challenging processes. Recent improvements in communication and transmission technologies provide efficient solutions. Big data solutions should be multithreaded, and data access approaches should be tailored to big amounts of semi-structured/unstructured data. Software programming frameworks with a distributed file system (DFS), which uses larger storage units than the disk blocks of an operating system, are utilized to multithread computing tasks and cope with these difficulties. Huge datasets in data storage and analysis for the healthcare industry need new solutions, because old-fashioned and traditional analytic tools are becoming useless.
APA, Harvard, Vancouver, ISO, and other styles
10

Sakri, Sapiah, Jaizah Othman, and Noreha Halid. "Hybridisation of Feature Selection and Classification Techniques in Credit Risk Assessment Modelling." In Knowledge Innovation Through Intelligent Software Methodologies, Tools and Techniques. IOS Press, 2020. http://dx.doi.org/10.3233/faia200581.

Full text
Abstract:
In recent years, the use of artificial intelligence techniques to manage credit risk has represented an improvement over conventional methods. Furthermore, small improvements to credit scoring systems and default forecasting can support huge profits. Accordingly, banks and financial institutions have a high interest in any such changes. The literature shows that the use of feature selection techniques can reduce the dimensionality problems in most credit risk datasets and thus improve the performance of the credit risk model. Many other works also indicate that the choice of classification approach affects the performance of credit risk assessment modelling. In this research, based on a newly proposed framework, we investigated the effect of various filter-based feature selection techniques with various classification approaches, namely single and ensemble classifiers, on three credit datasets (the German, Australian, and Japanese credit risk datasets), with the aim of improving the performance of the credit risk model. All single- and ensemble-classifier-based models were evaluated using four of the most widely used performance metrics for assessing financial stress models. From the comparison across the three credit datasets, with and without feature selection, the Random Forest + Information Gain model achieved the best trade-off, improving the model's accuracy to 96% on the Australian credit dataset. This model also obtained the lowest Type I error (4%) and the lowest Type II error (2%) on the German credit dataset, and the highest G-mean (95%) on the Australian credit dataset. The results clearly indicate that the Random Forest + Information Gain model is an excellent predictor for credit risk cases.
APA, Harvard, Vancouver, ISO, and other styles

Conference papers on the topic "Huge datasets"

1

Papa, Joao P., Fabio A. M. Cappabianco, and Alexandre Xavier Falcao. "Optimizing Optimum-Path Forest Classification for Huge Datasets." In 2010 20th International Conference on Pattern Recognition (ICPR). IEEE, 2010. http://dx.doi.org/10.1109/icpr.2010.1012.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Konopko, Joanna. "Distributed and parallel approach for handle and perform huge datasets." In INTERNATIONAL CONFERENCE OF COMPUTATIONAL METHODS IN SCIENCES AND ENGINEERING 2015 (ICCMSE 2015). AIP Publishing LLC, 2015. http://dx.doi.org/10.1063/1.4938794.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Angiulli, Fabrizio, and Gianluigi Folino. "A grid-based architecture for nearest neighbor based condensation of huge datasets." In the third international workshop. New York, New York, USA: ACM Press, 2008. http://dx.doi.org/10.1145/1384209.1384213.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Wang, Jun, Qiang Tang, Afonso Arriaga, and Peter Y. A. Ryan. "Novel Collaborative Filtering Recommender Friendly to Privacy Protection." In Twenty-Eighth International Joint Conference on Artificial Intelligence {IJCAI-19}. California: International Joint Conferences on Artificial Intelligence Organization, 2019. http://dx.doi.org/10.24963/ijcai.2019/668.

Full text
Abstract:
Nowadays, the recommender system is an indispensable tool in many information services, and a large number of algorithms have been designed and implemented. However, fed with very large datasets, state-of-the-art recommendation algorithms often face an efficiency bottleneck, i.e., it takes a huge amount of computing resources to train a recommendation model. In order to satisfy the needs of privacy-savvy users who do not want to disclose their information to the service provider, the complexity of most existing solutions becomes prohibitive. As such, it is an interesting research question to design simple and efficient recommendation algorithms that achieve reasonable accuracy and facilitate privacy protection at the same time. In this paper, we propose an efficient recommendation algorithm, named CryptoRec, which has two nice properties: (1) it can estimate a new user's preferences by directly using a model pre-learned from an expert dataset, and the new user's data is not required to train the model; (2) it can compute recommendations with only addition and multiplication operations. For the evaluation, we first test the recommendation accuracy on three real-world datasets and show that CryptoRec is competitive with state-of-the-art recommenders. Then, we evaluate the performance of the privacy-preserving variants of CryptoRec and show that predictions can be computed in seconds on a PC. In contrast, existing solutions need tens or hundreds of hours on more powerful computers.
APA, Harvard, Vancouver, ISO, and other styles
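The claim that predictions can be computed with only additions and multiplications (which makes them friendly to homomorphic encryption) can be illustrated with a toy item-feature model. This is a hypothetical sketch, not CryptoRec itself; the rating-weighted profile construction is an assumption made for illustration.

```python
def predict_ratings(user_ratings, item_features):
    """Score every item for a new user using a pre-learned item-feature model,
    with nothing but additions and multiplications (no comparisons, no
    divisions), which is what makes the computation HE-friendly."""
    dim = len(next(iter(item_features.values())))
    # derive a user profile: rating-weighted sum of the features of rated items
    profile = [0.0] * dim
    for item, r in user_ratings.items():
        for k in range(dim):
            profile[k] += r * item_features[item][k]
    # score every item as the dot product of the profile and its features
    return {item: sum(profile[k] * f[k] for k in range(dim))
            for item, f in item_features.items()}
```

Note that the item features are pre-learned offline (from an "expert" dataset, in the paper's terms), so the new user's data never enters model training, only this additive/multiplicative scoring step.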
5

Xu, Ziru, Yunbo Wang, Mingsheng Long, and Jianmin Wang. "PredCNN: Predictive Learning with Cascade Convolutions." In Twenty-Seventh International Joint Conference on Artificial Intelligence {IJCAI-18}. California: International Joint Conferences on Artificial Intelligence Organization, 2018. http://dx.doi.org/10.24963/ijcai.2018/408.

Full text
Abstract:
Predicting future frames in videos remains an unsolved yet challenging problem. Mainstream recurrent models suffer from huge memory usage and computation cost, while convolutional models are unable to effectively capture the temporal dependencies between consecutive video frames. To tackle this problem, we introduce an entirely CNN-based architecture, PredCNN, that models the dependencies between the next frame and the sequential video inputs. Inspired by the core idea of recurrent models that previous states have more transition operations than future states, we design a cascade multiplicative unit (CMU) that provides relatively more operations for previous video frames. This newly proposed unit enables PredCNN to predict future spatiotemporal data without any recurrent chain structures, which eases gradient propagation and enables fully parallel optimization. We show that PredCNN outperforms the state-of-the-art recurrent models for video prediction on the standard Moving MNIST dataset and two challenging crowd flow prediction datasets, and achieves a faster training speed and lower memory footprint.
APA, Harvard, Vancouver, ISO, and other styles
6

Luo, Chuan, Bo Qiao, Xin Chen, Pu Zhao, Randolph Yao, Hongyu Zhang, Wei Wu, Andrew Zhou, and Qingwei Lin. "Intelligent Virtual Machine Provisioning in Cloud Computing." In Twenty-Ninth International Joint Conference on Artificial Intelligence and Seventeenth Pacific Rim International Conference on Artificial Intelligence {IJCAI-PRICAI-20}. California: International Joint Conferences on Artificial Intelligence Organization, 2020. http://dx.doi.org/10.24963/ijcai.2020/208.

Full text
Abstract:
Virtual machine (VM) provisioning is a common and critical problem in cloud computing. In industrial cloud platforms, there are a huge number of VMs provisioned per day. Due to the complexity and resource constraints, it needs to be carefully optimized to make cloud platforms effectively utilize the resources. Moreover, in practice, provisioning a VM from scratch requires fairly long time, which would degrade the customer experience. Hence, it is advisable to provision VMs ahead for upcoming demands. In this work, we formulate the practical scenario as the predictive VM provisioning (PreVMP) problem, where upcoming demands are unknown and need to be predicted in advance, and then the VM provisioning plan is optimized based on the predicted demands. Further, we propose Uncertainty-Aware Heuristic Search (UAHS) for solving the PreVMP problem. UAHS first models the prediction uncertainty and then utilizes the prediction uncertainty in optimization. Moreover, UAHS leverages Bayesian optimization to make prediction and optimization interact, improving its practical performance. Extensive experiments show that UAHS performs much better than state-of-the-art competitors on two public datasets and an industrial dataset. UAHS has been successfully applied in Microsoft Azure and brought practical benefits in real-world applications.
APA, Harvard, Vancouver, ISO, and other styles
7

Rahman, Tahleen, Bartlomiej Surma, Michael Backes, and Yang Zhang. "Fairwalk: Towards Fair Graph Embedding." In Twenty-Eighth International Joint Conference on Artificial Intelligence {IJCAI-19}. California: International Joint Conferences on Artificial Intelligence Organization, 2019. http://dx.doi.org/10.24963/ijcai.2019/456.

Full text
Abstract:
Graph embeddings have gained huge popularity in recent years as a powerful tool for analyzing social networks. However, no prior work has studied the potential bias issues inherent in graph embedding. In this paper, we make a first attempt in this direction. In particular, we concentrate on the fairness of node2vec, a popular graph embedding method. Our analyses on two real-world datasets demonstrate the existence of bias in node2vec when it is used for friendship recommendation. We therefore propose a fairness-aware embedding method, namely Fairwalk, which extends node2vec. Experimental results demonstrate that Fairwalk reduces bias under multiple fairness metrics while still preserving utility.
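The core modification can be sketched in a few lines of Python (the toy graph and the uniform in-group sampling are illustrative; the actual Fairwalk builds on node2vec's biased walks): at each step the walker first picks a sensitive-attribute group uniformly among those present in the neighborhood, and only then a neighbor within that group, so minority-group neighbors are no longer under-sampled.

```python
import random

def fairwalk_step(graph, group_of, node, rng):
    # Partition neighbors by sensitive attribute, pick a group uniformly,
    # then pick a neighbor uniformly inside that group.
    by_group = {}
    for nbr in graph[node]:
        by_group.setdefault(group_of[nbr], []).append(nbr)
    chosen_group = rng.choice(sorted(by_group))
    return rng.choice(by_group[chosen_group])

def fairwalk(graph, group_of, start, length, seed=0):
    rng = random.Random(seed)
    walk = [start]
    for _ in range(length):
        walk.append(fairwalk_step(graph, group_of, walk[-1], rng))
    return walk

# Toy graph: node "a" has three group-0 friends and one group-1 friend, yet
# each group is sampled with probability 1/2 at every step away from "a".
graph = {"a": ["b", "c", "d", "e"], "b": ["a"], "c": ["a"], "d": ["a"], "e": ["a"]}
group_of = {"a": 0, "b": 0, "c": 0, "d": 0, "e": 1}
walk = fairwalk(graph, group_of, "a", 6)
```

Feeding such group-balanced walks into the usual skip-gram training is what lets the embedding reduce recommendation bias while keeping the walk-based utility of node2vec.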
APA, Harvard, Vancouver, ISO, and other styles
8

Yang, Chengcheng, Lisi Chen, Shuo Shang, Fan Zhu, Li Liu, and Ling Shao. "Toward Efficient Navigation of Massive-Scale Geo-Textual Streams." In Twenty-Eighth International Joint Conference on Artificial Intelligence {IJCAI-19}. California: International Joint Conferences on Artificial Intelligence Organization, 2019. http://dx.doi.org/10.24963/ijcai.2019/672.

Full text
Abstract:
With the popularization of portable devices, numerous applications continuously produce huge streams of geo-tagged textual data, posing the challenge of indexing geo-textual streaming data efficiently, an important task in both data management and AI applications such as real-time stream mining and targeted advertising. This is not possible with state-of-the-art indexing methods, as they focus on search optimizations for static datasets and incur high index maintenance costs. In this paper, we present NQ-tree, which combines new structural designs with self-tuning methods to navigate between update and search efficiency. Our contributions include: (1) the design of multiple stores, each with a different emphasis on write-friendliness and read-friendliness; (2) the use of data compression techniques to reduce I/O cost; (3) the exploitation of both spatial and keyword information to improve pruning efficiency; and (4) an analytical cost model together with an online self-tuning method that adapts efficiently to different workloads. Experiments on two real-world datasets show that NQ-tree outperforms two well-designed baselines by up to 10×.
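Contributions (1) and (3) can be illustrated with a minimal two-store index (the class name, threshold, and flat-run layout are invented for illustration; NQ-tree's actual structures are more elaborate): a small append-only buffer absorbs fast inserts and is periodically sealed into a read-friendly run that carries a bounding box and a keyword summary, letting searches prune whole runs at once.

```python
class GeoTextIndex:
    """Write-friendly buffer plus read-friendly sealed runs with prune summaries."""

    def __init__(self, buffer_cap=4):
        self.buffer = []      # append-only: cheap updates for streaming inserts
        self.runs = []        # sealed runs: (bbox, keyword summary, records)
        self.buffer_cap = buffer_cap

    def insert(self, x, y, keywords):
        self.buffer.append((x, y, frozenset(keywords)))
        if len(self.buffer) >= self.buffer_cap:
            self._seal()

    def _seal(self):
        records = sorted(self.buffer)
        xs = [r[0] for r in records]
        ys = [r[1] for r in records]
        summary = frozenset().union(*(r[2] for r in records))
        self.runs.append(((min(xs), min(ys), max(xs), max(ys)), summary, records))
        self.buffer = []

    def search(self, rect, keyword):
        x1, y1, x2, y2 = rect
        def match(r):
            return x1 <= r[0] <= x2 and y1 <= r[1] <= y2 and keyword in r[2]
        hits = [r for r in self.buffer if match(r)]
        for (bx1, by1, bx2, by2), summary, records in self.runs:
            # Prune a whole run when its keyword summary or bounding box
            # cannot possibly intersect the query.
            if keyword not in summary or bx1 > x2 or bx2 < x1 or by1 > y2 or by2 < y1:
                continue
            hits.extend(r for r in records if match(r))
        return hits

index = GeoTextIndex()
for point in [(1, 1, {"cafe"}), (2, 2, {"bar"}), (8, 8, {"cafe"}), (9, 9, {"bar"}),
              (3, 3, {"cafe"})]:
    index.insert(*point)
found = index.search((0, 0, 5, 5), "cafe")
```

Shifting records between the write-friendly and read-friendly sides (here, via `buffer_cap`) is the knob a cost model and self-tuner would adjust per workload.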
APA, Harvard, Vancouver, ISO, and other styles
9

De Moraes, Matheus B., and André L. S. Gradvohl. "Performance Evaluation of Feature Selection Algorithms Applied to Online Learning in Concept Drift Environments." In XV Encontro Nacional de Inteligência Artificial e Computacional. Sociedade Brasileira de Computação - SBC, 2018. http://dx.doi.org/10.5753/eniac.2018.4438.

Full text
Abstract:
Data streams arrive at high speed and in huge volume, and may contain critical information that needs to be processed in real time. Hence, to reduce computational cost and time, a system may apply a feature selection algorithm. However, this is not a trivial task, due to concept drift. In this work, we show that two feature selection algorithms, Information Gain and Online Feature Selection, perform worse than classification without feature selection in most settings. Each algorithm produced more relevant results in one distinct scenario, with final accuracies up to 14% higher. The experiments on both real and artificial datasets indicate the potential of these methods, owing to their better adaptability in some concept drift situations.
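Information Gain, one of the two evaluated selectors, can be sketched for categorical features (a batch version over one window of the stream; a streaming variant would maintain these counts incrementally, and all names here are illustrative):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, feature):
    # IG(Y; X_f) = H(Y) - H(Y | X_f) for a categorical feature.
    grouped = {}
    for row, y in zip(rows, labels):
        grouped.setdefault(row[feature], []).append(y)
    remainder = sum(len(ys) / len(labels) * entropy(ys) for ys in grouped.values())
    return entropy(labels) - remainder

def select_top_k(rows, labels, k):
    # Re-running this on each new window lets the selected feature set
    # adapt after a concept drift shifts which features are informative.
    scores = {f: information_gain(rows, labels, f) for f in range(len(rows[0]))}
    return sorted(scores, key=scores.get, reverse=True)[:k]

rows = [(0, 0), (0, 1), (1, 0), (1, 1)]   # feature 0 determines the label,
labels = [0, 0, 1, 1]                     # feature 1 is pure noise
top = select_top_k(rows, labels, 1)
```

After a drift, a feature that previously scored an IG near 1 can drop toward 0, which is why window-based re-selection (rather than a one-time ranking) matters in these experiments.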
APA, Harvard, Vancouver, ISO, and other styles
10

Candao, Jhonatan, and Lilian Berton. "Combining active learning and graph-based semi-supervised learning." In Encontro Nacional de Inteligência Artificial e Computacional. Sociedade Brasileira de Computação - SBC, 2019. http://dx.doi.org/10.5753/eniac.2019.9326.

Full text
Abstract:
The scarcity of labeled data is a common problem in many applications. Semi-supervised learning (SSL) aims to minimize the need for human annotation by combining a small set of labeled data with a huge amount of unlabeled data. Similarly, Active Learning (AL) reduces annotation effort by selecting the most informative points for annotation. Few works explore AL together with graph-based SSL. In this work, we combine both strategies and explore different techniques: two graph-based SSL methods and two AL query strategies in a pool-based scenario. Experimental results on artificial and real datasets indicate that our approach requires significantly fewer labeled instances to reach the same performance as random label selection.
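One concrete pairing of the two ideas (this particular label-propagation update and margin-based query are illustrative choices, not necessarily the exact techniques the paper evaluates): propagate the few known labels over the graph, then query the node whose resulting class distribution has the smallest margin between its top two classes.

```python
import numpy as np

def label_propagation(W, y_init, labeled, iters=50):
    # Repeatedly replace each node's class distribution with the average of
    # its neighbors', clamping labeled nodes to their known labels.
    y = y_init.copy()
    inv_degree = 1.0 / W.sum(axis=1, keepdims=True)
    for _ in range(iters):
        y = inv_degree * (W @ y)
        y[labeled] = y_init[labeled]
    return y

def query_most_uncertain(y, labeled):
    # Margin-based AL query: smallest gap between the top two class scores.
    top_two = np.sort(y, axis=1)[:, -2:]
    margins = top_two[:, 1] - top_two[:, 0]
    margins[labeled] = np.inf          # never re-query labeled nodes
    return int(np.argmin(margins))

# Path graph 0-1-2-3-4 with the two endpoints labeled with opposite classes.
W = np.zeros((5, 5))
for i in range(4):
    W[i, i + 1] = W[i + 1, i] = 1.0
y_init = np.full((5, 2), 0.5)
y_init[0] = [1.0, 0.0]   # node 0: class 0
y_init[4] = [0.0, 1.0]   # node 4: class 1
labeled = [0, 4]
y = label_propagation(W, y_init, labeled)
query = query_most_uncertain(y, labeled)
```

On this path graph the propagation settles into a gradient between the two labeled endpoints, and the query picks the midpoint node, exactly the point where an extra human annotation is most informative.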
APA, Harvard, Vancouver, ISO, and other styles