
Journal articles on the topic 'Large data handling'


Consult the top 50 journal articles for your research on the topic 'Large data handling.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles across a wide variety of disciplines and organise your bibliography correctly.

1

Zupan, Jure. "Handling large amounts of chemical data." Mikrochimica Acta 89, no. 1-6 (1986): 243–60. http://dx.doi.org/10.1007/bf01207319.

2

Hexemer, Alexander, Dula Parkinson, and Craig Tull. "Information Technology/Large-Scale Data Handling." Synchrotron Radiation News 28, no. 2 (2015): 2–3. http://dx.doi.org/10.1080/08940886.2015.1013412.

3

Figueroa, S. J. A., and C. Prestipino. "PrestoPronto: a code devoted to handling large data sets." Journal of Physics: Conference Series 712 (May 2016): 012012. http://dx.doi.org/10.1088/1742-6596/712/1/012012.

4

Henry, Antonia J., Nathanael D. Hevelone, Stuart Lipsitz, and Louis L. Nguyen. "Comparative methods for handling missing data in large databases." Journal of Vascular Surgery 58, no. 5 (2013): 1353–59. http://dx.doi.org/10.1016/j.jvs.2013.05.008.

5

Samip, Raut, Jaiswal Kamlesh, Kale Vaibhav, Mote Akshay, Soudamini Pawar Ms., and Suvarna Kadam Mrs. "DATA DISTRIBUTION HANDLING ON CLOUD FOR DEPLOYMENT OF BIG DATA." International Journal on Cloud Computing: Services and Architecture (IJCCSA) 6, no. 3 (2019): 15–22. https://doi.org/10.5281/zenodo.3558397.

Abstract:
Cloud computing is a new emerging model in the field of computer science. For varying workloads, cloud computing presents a large-scale, on-demand infrastructure. The primary usage of clouds in practice is to process massive amounts of data. Processing large datasets has become crucial in research and business environments. The big challenge associated with processing large datasets is the vast infrastructure required. Cloud computing provides vast infrastructure to store and process big data. VMs can be provisioned on demand in the cloud to process the data by forming a cluster of VMs. Map Reduce pa
6

Chakradhar, Avinash Devarapalli. "Exploring Large Datasets: Techniques for Managing and Using Data Efficiently." European Journal of Advances in Engineering and Technology 6, no. 8 (2019): 51–55. https://doi.org/10.5281/zenodo.13253259.

Abstract:
With the evolution of the digital world, more and more data is being generated which is making a commodity. Most of this data is still raw and not being utilized to date. In other words, only a limited amount of data is being utilized and there is a need for a proper system that allows interesting individuals and organizations to efficiently convert this data into meaningful form where insights can be generated. There are numerous challenges associated with the use of this data which mainly include, storage limitations, displaying data properly, scalability issues, security concerns, and perfo
7

V. Patil, Dipak, and R. S. Bichkar. "An Optimistic Data Mining Approach for Handling Large Data Set using Data Partitioning Techniques." International Journal of Computer Applications 24, no. 3 (2011): 29–33. http://dx.doi.org/10.5120/2930-3878.

8

Nguyen, Hung Son. "On Efficient Handling of Continuous Attributes in Large Data Bases." Fundamenta Informaticae 48, no. 1 (2001): 61–81. https://doi.org/10.3233/fun-2001-48105.

Abstract:
Some data mining techniques, like discretization of continuous attributes or decision tree induction, are based on searching for an optimal partition of data with respect to some optimization criteria. We investigate the problem of searching for optimal binary partition of continuous attribute domain in case of large data sets stored in relational data bases (RDB). The critical for time complexity of algorithms solving this problem is the number of I/O database operations necessary to construct such partitions. In our approach the basic operators are defined by queries on the number of objects
9

Sanchez, Luis, Yossi Mosbacher, Aaron Higuera, and Christopher Tunnell. "Handling Detector Characterization Data (Metadata) in XENONnT." EPJ Web of Conferences 295 (2024): 01033. http://dx.doi.org/10.1051/epjconf/202429501033.

Abstract:
Effective metadata management is a consistent challenge faced by many scientific experiments. These challenges are magnified by the evolving needs of the experiment, the intricacies of seamlessly integrating a new system with existing analytical frameworks, and the crucial mandate to maintain database integrity. In this work we present the various challenges faced by experiments that produce a large amount of metadata and describe the solution used by the XENON experiment for metadata management.
10

Richter, Matthias, Mikolaj Krzewicki, and Giulio Eulisse. "Data Handling In The Alice O2 Event Processing." EPJ Web of Conferences 214 (2019): 01035. http://dx.doi.org/10.1051/epjconf/201921401035.

Abstract:
The ALICE experiment at the Large Hadron Collider (LHC) at CERN is planned to be operated in a continuous data-taking mode in Run 3. This will allow to inspect data from all Pb-Pb collisions at a rate of 50 kHz, giving access to rare physics signals embedded in a large background. Based on experience with real-time reconstruction of particle trajectories and event properties in the ALICE High Level Trigger, the ALICE O2 facility is currently designed and developed to support processing of a continuous, triggerless stream of data segmented into entities referred to as timeframes. Both raw data
11

Yusof, Mohd Kamir. "Efficiency of JSON for Data Retrieval in Big Data." Indonesian Journal of Electrical Engineering and Computer Science 7, no. 1 (2017): 250. http://dx.doi.org/10.11591/ijeecs.v7.i1.pp250-262.

Abstract:
Big data is the latest industry buzzword to describe large volume of structured and unstructured data that can be difficult to process and analyze. Most of organization looking for the best approach to manage and analyze the large volume of data especially in making a decision. XML is chosen by many organization because of powerful approach during retrieval and storage processes. However, XML approach, the execution time for retrieving large volume of data are still considerably inefficient due to several factors. In this contribution, two databases approaches namely Extensible Markup Language
12

van Beek, J. H. G. M. "Channeling the Data Flood: Handling Large-Scale Biomolecular Measurements in Silico." Proceedings of the IEEE 94, no. 4 (2006): 692–709. http://dx.doi.org/10.1109/jproc.2006.871779.

13

Chaudhry, Omair Z., and William A. Mackaness. "DTM Generalisation: Handling Large Volumes of Data for Multi-Scale Mapping." Cartographic Journal 47, no. 4 (2010): 360–70. http://dx.doi.org/10.1179/000870410x12911342853948.

14

Langkamp, Diane L., Amy Lehman, and Stanley Lemeshow. "Techniques for Handling Missing Data in Secondary Analyses of Large Surveys." Academic Pediatrics 10, no. 3 (2010): 205–10. http://dx.doi.org/10.1016/j.acap.2010.01.005.

15

HEGEDŰS, ISTVÁN, RÓBERT ORMÁNDI, and MÁRK JELASITY. "MASSIVELY DISTRIBUTED CONCEPT DRIFT HANDLING IN LARGE NETWORKS." Advances in Complex Systems 16, no. 04n05 (2013): 1350021. http://dx.doi.org/10.1142/s0219525913500215.

Abstract:
Massively distributed data mining in large networks such as smart device platforms and peer-to-peer systems is a rapidly developing research area. One important problem here is concept drift, where global data patterns (movement, preferences, activities, etc.) change according to the actual set of participating users, the weather, the time of day, or as a result of events such as accidents or even natural catastrophes. In an important case — when the network is very large but only a few training samples can be obtained at each node locally — no efficient distributed solution is known that coul
16

Yusof, Mohd Kamir, and Mustafa Man. "Efficiency of Flat File Database Approach in Data Storage and Data Extraction for Big Data." Indonesian Journal of Electrical Engineering and Computer Science 9, no. 2 (2018): 460. http://dx.doi.org/10.11591/ijeecs.v9.i2.pp460-473.

Abstract:
Big data is the latest industry buzzword to describe large volume of structured and unstructured data that can be difficult to process and analyze. Most of organization looking for the best approach to manage and analyze the large volume of data especially in making a decision. XML and JSON are chosen by many organization because of powerful approach during retrieval and storage processes. However, these approaches, the execution time for retrieving large volume of data are still considerably inefficient due to several factors. In this contribution, three databases approaches namely Extensible
18

Basireddy, Maheswara Reddy. "Leveraging Prolog for Large Data Processing with Permutations and Combinations in the Travel Domain." Journal of Scientific and Engineering Research 5, no. 2 (2018): 458–63. https://doi.org/10.5281/zenodo.11216426.

Abstract:
The travel industry is characterized by vast volumes of data, from flight bookings and hotel reservations to customer preferences and pricing information. Efficiently processing and analyzing this data is crucial for travel companies to make informed decisions, optimize their operations, and provide personalized services to customers. This paper explores the use of the Prolog programming language as a powerful tool for tackling large data processing challenges, particularly in the domain of combinatorics and permutations, which are prevalent in the travel industry. Through a comprehensive revi
19

Neater, Chifamba, and Pedzisai Constantino. "Challenges faced by lecturers in handling large classes." GPH-International Journal of Educational Research 05, no. 10 (2022): 16–29. https://doi.org/10.5281/zenodo.7221425.

Abstract:
This case study employed qualitative methods to collect data using open-ended questionnaires on a census sample of 38 junior lecturers and face-to-face interviews with seven purposively selected senior lecturers in the School of Entrepreneurship and Business Sciences at Chinhoyi University of Technology in Zimbabwe. Data were tabulated, analysed, and presented narratively using emerging themes permeating the study. The study found that class size does matter as it affects the performance and quality of student learning. Hence, large classes correlate with low student performance.
20

Yerra, Srikanth. "Reducing ETL processing time with SSIS optimizations for large-scale data pipelines." International Journal of Data Science and Machine Learning 05, no. 01 (2025): 61–69. https://doi.org/10.55640/ijdsml-05-01-12.

Abstract:
Extract, Transform, Load (ETL) processes form the backbone of data management and consolidation in today’s data-driven enterprises with prevalent large-scale data pipelines. One of the widely used ETL tools is Microsoft SQL Server Integration Services (SSIS), yet its optimization for performance for large-scale data loads remains a challenge. As the volumes of data grow exponentially, inefficient ETL processes create bottlenecks, increased processing time, and exhaustion of system resources. This work discusses major SSIS optimizations that minimize ETL processing time, allowing for effec
21

Pham, D. T., and A. A. Afify. "Rules-6: A Simple Rule Induction Algorithm for Handling Large Data Sets." Proceedings of the Institution of Mechanical Engineers, Part C: Journal of Mechanical Engineering Science 219, no. 10 (2005): 1119–37. http://dx.doi.org/10.1243/095440605x31931.

Abstract:
RULES-3 Plus is a member of the RULES family of simple inductive learning algorithms with successful engineering applications. However, it requires modification in order to be a practical tool for problems involving large data sets. In particular, efficient mechanisms are needed for handling continuous attributes and noisy data. This article presents a new rule induction algorithm called RULES-6, which is derived from the RULES-3 Plus algorithm. The algorithm employs a fast and noise-tolerant search method for extracting IF-THEN rules from examples. It also uses simple and effective methods fo
22

Siramgari, Dayakar Reddy, and Vijay Kartik Sikha. "From Raw Data to Actionable Insights: Leveraging LLMs for Automation." International Journal on Recent and Innovation Trends in Computing and Communication 12, no. 2 (2024): 1018–29. https://doi.org/10.5281/zenodo.14128827.

Abstract:
This paper explores the transformative role of Large Language Models (LLMs) in automating the data processing lifecycle, from ingestion to insights generation. LLMs streamline data handling by automating ingestion, transformation, and modeling processes, offering efficient, reliable, and timely insights critical for sectors such as healthcare, finance, and telecommunications. This study details the technical architecture of LLM-driven data workflows, addresses challenges in integrating diverse data sources, and emphasizes the necessity of governance frameworks to mitigate ethica
23

Lorenz, Kilian, Pascal Bürklin, Klemens Schnattinger, et al. "Refinetuning Decentralized Large Language Model for Privacy-Sensitive University Data." Journal of Robotics and Automation Research 6, no. 2 (2025): 01–11. https://doi.org/10.33140/jrar.06.02.01.

Abstract:
This work focuses on refining a decentralized large language model (LLM) tailored for fine-tuning on privacy-sensitive university data. Devolved AI models, designed to operate across multiple distributed nodes, offer a promising solution for handling sensitive information by ensuring data remains localized at its source while collaboratively training a global model. The key challenge addressed in this study is the adaptation and fine-tuning of a decentralized LLM to work effectively with heterogeneous, privacy-restricted datasets typical in university environments, such as student records, resea
24

Maroto-Molina, F., A. Gómez-Cabrera, J. E. Guerrero-Ginel, et al. "Handling of missing data to improve the mining of large feed databases." Journal of Animal Science 91, no. 1 (2013): 491–500. http://dx.doi.org/10.2527/jas.2012-5491.

25

Henry, Antonia J., Nathanael D. Hevelone, Stuart R. Lipsitz, and Louis L. Nguyen. "Comparative methods for handling missing data in large databases: An empirical simulation." Journal of the American College of Surgeons 213, no. 3 (2011): S111—S112. http://dx.doi.org/10.1016/j.jamcollsurg.2011.06.262.

26

Sassi Hidri, Minyar, Mohamed Ali Zoghlami, and Rahma Ben Ayed. "Speeding up the large-scale consensus fuzzy clustering for handling Big Data." Fuzzy Sets and Systems 348 (October 2018): 50–74. http://dx.doi.org/10.1016/j.fss.2017.11.003.

27

Moënne-Loccoz, Nicolas, Bruno Janvier, Stéphane Marchand-Maillet, and Eric Bruno. "Handling temporal heterogeneous data for content-based management of large video collections." Multimedia Tools and Applications 31, no. 3 (2006): 309–25. http://dx.doi.org/10.1007/s11042-006-0042-2.

28

Burnett, T. H., A. Chekhtman, E. do Couto e Silva, et al. "Gamma-ray Large-Area Space Telescope (GLAST) balloon flight data handling overview." IEEE Transactions on Nuclear Science 49, no. 4 (2002): 1904–8. http://dx.doi.org/10.1109/tns.2002.801536.

29

Fan, Hongjie, Zhiyi Ma, Dianhui Wang, and Junfei Liu. "Handling distributed XML queries over large XML data based on MapReduce framework." Information Sciences 453 (July 2018): 1–20. http://dx.doi.org/10.1016/j.ins.2018.04.028.

30

Makris, Christos, Yannis Panagis, Evangelos Sakkopoulos, and Athanasios Tsakalidis. "Efficient and adaptive discovery techniques of Web Services handling large data sets." Journal of Systems and Software 79, no. 4 (2006): 480–95. http://dx.doi.org/10.1016/j.jss.2005.06.002.

31

Baron, Steve, and Rebekah Russell-Bennett. "Editorial: the changing nature of data." Journal of Services Marketing 30, no. 7 (2016): 673–75. http://dx.doi.org/10.1108/jsm-08-2016-0292.

Abstract:
Purpose The purpose of this paper is to highlight the challenges of managing and handling data for services marketers that have been brought about by the contemporary environment and emerging schools of thought. Design/methodology/approach A comparison is made between conventional data collection and statistical analysis, and the need to glean information from large, pre-existing data sets for future contributions to service research. Findings For service marketers to tackle real world, large problem areas, there will be a need to develop methods of dealing with data which pre-exist in many fo
32

Chen, S., Z. Wang, L. Bai, et al. "LARGE VECTOR SPATIAL DATA STORAGE AND QUERY PROCESSING USING CLICKHOUSE." International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLVIII-M-1-2023 (April 21, 2023): 65–72. http://dx.doi.org/10.5194/isprs-archives-xlviii-m-1-2023-65-2023.

Abstract:
The exponential growth of geospatial data resulting from the development of earth observation technology has created significant challenges for traditional relational databases. While NoSQL databases based on distributed file systems can handle massive data storage, they often struggle to cope with real-time queries. Column-storage databases, on the other hand, are highly effective at both storage and query processing for large-scale datasets. In this paper, we propose a spatial version of ClickHouse that leverages R-Tree indexing to enable efficient storage and real-time analysis of massi
33

Alaa, Hussein Al-Hamami, and Adel Flayyih Ali. "Enhancing Big Data Analysis by Using Map-reduce Technique." Bulletin of Electrical Engineering and Informatics 7, no. 1 (2018): 113–16. https://doi.org/10.11591/eei.v7i1.895.

Abstract:
Database is defined as a set of data that is organized and distributed in a manner that permits the user to access the data being stored in an easy and more convenient manner. However, in the era of big-data the traditional methods of data analytics may not be able to manage and process the large amount of data. In order to develop an efficient way of handling big-data, this work enhances the use of Map-Reduce technique to handle big-data distributed on the cloud. This approach was evaluated using Hadoop server and applied on Electroencephalogram (EEG) Big-data as a case study. The proposed ap
34

Varun, Garg. "Operational Effectiveness Using Cloud-Based ETL Pipelines on Large-Scale Data Platforms." INTERNATIONAL JOURNAL OF INNOVATIVE RESEARCH AND CREATIVE TECHNOLOGY 7, no. 1 (2021): 1–5. https://doi.org/10.5281/zenodo.14945046.

Abstract:
Processing big amounts of data across several sources in real-time depends critically on cloud-based ETL (Extract, Transform, Load) pipelines. Maintaining operational efficiency, meantime, when handling multi-source data intake creates major difficulties. These involve control of scalability, handling of data variance, low latency assurance, and error recovery automation. This work points out the primary difficulties keeping operating efficiency in cloud-based ETL pipelines and suggests solutions like using real-time processing systems and automation. We show how automatic scaling, resource op
35

Joseph, Sethunya R., Hlomani Hlomani, and Keletso Letsholo. "Data Mining Algorithms: An Overview." INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY 15, no. 6 (2016): 6806–13. http://dx.doi.org/10.24297/ijct.v15i6.1615.

Abstract:
The research on data mining has successfully yielded numerous tools, algorithms, methods and approaches for handling large amounts of data for various purposeful use and problem solving. Data mining has become an integral part of many application domains such as data warehousing, predictive analytics, business intelligence, bio-informatics and decision support systems. The prime objective of data mining is to effectively handle large scale data, extract actionable patterns, and gain insightful knowledge. Data mining is part and parcel of the knowledge discovery in databases (KDD) process. Success
36

Kuznetsov, Valentin. "Gaining insight from large data volumes with ease." EPJ Web of Conferences 214 (2019): 04027. http://dx.doi.org/10.1051/epjconf/201921404027.

Abstract:
Efficient handling of large data-volumes becomes a necessity in today’s world. It is driven by the desire to get more insight from the data and to gain a better understanding of user trends which can be transformed into economic incentives (profits, cost-reduction, various optimization of data workflows, and pipelines). In this paper, we discuss how modern technologies are transforming well established patterns in HEP communities. The new data insight can be achieved by embracing Big Data tools for a variety of use cases, from analytics and monitoring to training Machine Learning models on a t
37

Ramadhan, Insan, and Gladhi Guarddin. "Large-scale integrated infrastructure for asynchronous microservices architecture." Jurnal Teknologi dan Sistem Komputer 10, no. 2 (2022): 60–66. http://dx.doi.org/10.14710/jtsiskom.2022.14120.

Abstract:
Integrated large-scale business activities increasingly rely on the use of remote resources and services across multi-platform applications. Microservice in previous research has become a solution, but this approach still leaves a data loss problem. This research methodology proposed an architecture of data transmission managed by messaging service to prevent data loss in handling many requests to deliver a multiplatform architecture, handling the plugin services, and enabling escalation based on the requirement. As a result, this research successfully implements large-scale multiplatform Sing
38

Preyaa, Atri. "Efficiently Handling Streaming JSON Data: A Novel Library for GCS-to-BigQuery Ingestion." European Journal of Advances in Engineering and Technology 8, no. 10 (2021): 96–99. https://doi.org/10.5281/zenodo.11408124.

Abstract:
This paper examines a Python library designed to efficiently stream JSON data from Google Cloud Storage (GCS) to BigQuery tables. The library offers functionalities for handling large datasets, enriching data with timestamps and filenames, and ensuring compatibility with BigQuery's schema requirements. Additionally, the library demonstrates potential use cases not only in the realm of data engineering but also in advancing AI development by facilitating robust data ingestion pipelines. We analyze the library's features, potential use cases, and its impact on both data engineering workflows and
39

Venkata Siva Reddy, D., and R. Vasanth Kumar Mehta. "Cloud based computational intelligence approaches to machine learning and big data analytics: literature survey." International Journal of Engineering & Technology 7, no. 1.9 (2018): 186. http://dx.doi.org/10.14419/ijet.v7i1.9.9817.

Abstract:
Today there are many sources through which we can access information from internet and based on the dependency now there is an overflow of data either in refined form or unrefined form. Handling large information is a complicated task. It has to overcome many challenges. There are some challenges like drawing useful information from undefined patterns which we can overcome by using data mining techniques but certain challenges like scalability, easy accessing of large data, time, or cost are to be handled in better sense. Machine learning helps in learning patterns from data automatically and c
40

Amaro, F. D., M. Antonacci, R. Antonietti, et al. "Data handling of CYGNO experiment using INFN-Cloud solution." EPJ Web of Conferences 295 (2024): 07013. http://dx.doi.org/10.1051/epjconf/202429507013.

Abstract:
The INFN Cloud project was launched at the beginning of 2020, aiming to build a distributed Cloud infrastructure and provide advanced services for the INFN scientific communities. A Platform as a Service (PaaS) was created inside INFN Cloud that allows the experiments to develop and access resources as a Software as a Service (SaaS), and CYGNO is the betatester of this system. The aim of the CYGNO experiment is to realize a large gaseous Time Projection Chamber based on the optical readout of the photons produced in the avalanche multiplication of ionization electrons in a GEM stack. To this e
41

Furukawa, Kyoji, Dale L. Preston, Munechika Misumi, and Harry M. Cullings. "Handling incomplete smoking history data in survival analysis." Statistical Methods in Medical Research 26, no. 2 (2014): 707–23. http://dx.doi.org/10.1177/0962280214556794.

Abstract:
While data are unavoidably missing or incomplete in most observational studies, consequences of mishandling such incompleteness in analysis are often overlooked. When time-varying information is collected irregularly and infrequently over a long period, even precisely obtained data may implicitly involve substantial incompleteness. Motivated by an analysis to quantitatively evaluate the effects of smoking and radiation on lung cancer risks among Japanese atomic-bomb survivors, we provide a unique application of multiple imputation to incompletely observed smoking histories under the assumption
42

Karthikeyani Visalakshi N., Shanthi S., and Lakshmi K. "MapReduce-Based Crow Search-Adopted Partitional Clustering Algorithms for Handling Large-Scale Data." International Journal of Cognitive Informatics and Natural Intelligence 15, no. 4 (2021): 1–23. http://dx.doi.org/10.4018/ijcini.20211001.oa32.

Abstract:
Cluster analysis is the prominent data mining technique in knowledge discovery and it discovers the hidden patterns from the data. The K-Means, K-Modes and K-Prototypes are partition based clustering algorithms and these algorithms select the initial centroids randomly. Because of its random selection of initial centroids, these algorithms provide the local optima in solutions. To solve these issues, the strategy of Crow Search algorithm is employed with these algorithms to obtain the global optimum solution. With the advances in information technology, the size of data increased in a drastic
43

Zaki, Ummu Hani’ Hair, Izyan Izzati Kamsani, Ahmad Firdaus Ahmad Fadzil, Zainura Idrus, and Eser Kandogan. "Big Data: Issues and Challenges in Clustering Data Visualization." Journal of Advanced Research in Applied Sciences and Engineering Technology 51, no. 1 (2024): 150–59. http://dx.doi.org/10.37934/araset.51.1.150159.

Abstract:
In the era of big data, the continuous generation of data from various fields has resulted in large and complex datasets. These datasets often come in diverse formats and structures, including unstructured or semi-structured data. Despite the wide availability of big data, high dimensionality remains a significant challenge for analysing and understanding the data for various purposes. Clustering analysis plays a crucial role in data analysis and visualization by uncovering hidden patterns and structures within datasets. However, several challenges hinder the effectiveness of clustering analys
44

McTeer, Matthew, Robin Henderson, Quentin M. Anstee, and Paolo Missier. "Handling Overlapping Asymmetric Data Sets—A Twice Penalized P-Spline Approach." Mathematics 12, no. 5 (2024): 777. http://dx.doi.org/10.3390/math12050777.

Abstract:
Aims: Overlapping asymmetric data sets are where a large cohort of observations have a small amount of information recorded, and within this group there exists a smaller cohort which have extensive further information available. Missing imputation is unwise if cohort size differs substantially; therefore, we aim to develop a way of modelling the smaller cohort whilst considering the larger. Methods: Through considering traditionally once penalized P-Spline approximations, we create a second penalty term through observing discrepancies in the marginal value of covariates that exist in both coho
45

Zheng, Min Juan, Guo Jian Cheng, and Fei Zhao. "Large-Scale Data Classification Based on Ball Vector Machine." Applied Mechanics and Materials 312 (February 2013): 771–76. http://dx.doi.org/10.4028/www.scientific.net/amm.312.771.

Abstract:
The quadratic programming problem in the standard support vector machine (SVM) algorithm has high time complexity and space complexity in solving the large-scale problems which becomes a bottleneck in the SVM applications. Ball Vector Machine (BVM) converts the quadratic programming problem of the traditional SVM into the minimum enclosed ball problem (MEB). It can indirectly get the solution of quadratic programming through solving the MEB problem which significantly reduces the time complexity and space complexity. The experiments show that when handling five large-scale and high-dimensional
APA, Harvard, Vancouver, ISO, and other styles
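The abstract credits BVM's speedup to replacing the SVM quadratic programme with a minimum enclosing ball computation. As a rough illustration of the MEB idea only (not the paper's implementation), a Badoiu–Clarkson-style approximation repeatedly steps the centre toward the current farthest point with a shrinking step size:

```python
def approx_meb(points, iters=200):
    """Approximate minimum enclosing ball (Badoiu-Clarkson style):
    move the centre toward the farthest point by 1/(t+1) each round."""
    c = list(points[0])
    for t in range(1, iters + 1):
        far = max(points,
                  key=lambda p: sum((a - b) ** 2 for a, b in zip(p, c)))
        c = [a + (b - a) / (t + 1) for a, b in zip(c, far)]
    radius = max(sum((a - b) ** 2 for a, b in zip(p, c))
                 for p in points) ** 0.5
    return c, radius

# Four corners of the unit square: centre ~(0.5, 0.5), radius ~sqrt(0.5).
centre, radius = approx_meb([(0, 0), (0, 1), (1, 0), (1, 1)])
```

The appeal, as the abstract notes, is that this touches one point per iteration instead of solving a dense quadratic programme over all training samples.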
46

Gupta, Himanshu. "Cost-Effective Large Data Batch Processing for Call Center Transcripts Using AWS Lambda Functions." International Journal for Research in Applied Science and Engineering Technology 12, no. 9 (2024): 102–4. http://dx.doi.org/10.22214/ijraset.2024.64137.

Full text
Abstract:
As enterprises increasingly rely on cloud services for scalable data processing, optimizing cost and efficiency in handling large datasets has become a priority. This paper explores the use of AWS Lambda for large-scale batch processing of call center transcripts, where data is stored in partitioned S3 buckets. We design a fault-tolerant and cost-effective architecture that leverages Lambda functions to process these datasets during off-peak hours, taking advantage of AWS’s pay-as-you-go pricing model. Our approach includes retry logic for handling failures, ensuring the robustness of the system…
APA, Harvard, Vancouver, ISO, and other styles
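The retry logic the abstract mentions can be sketched generically. The helper below is a minimal pattern for retrying a flaky batch step with exponential backoff; `flaky` and the delay values are hypothetical stand-ins, not the paper's actual AWS code.

```python
import time

def with_retries(fn, max_attempts=4, base_delay=1.0):
    """Retry a transient-failure-prone step with exponential backoff:
    delays of base_delay * 1, 2, 4, ... between attempts."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # exhausted attempts: surface the error
            time.sleep(base_delay * 2 ** attempt)

# Hypothetical transient failure: succeeds on the third attempt.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient S3 read error")
    return "processed"

result = with_retries(flaky, base_delay=0.01)
```

In a real Lambda deployment much of this can also be delegated to the platform's built-in asynchronous-invocation retries and dead-letter queues.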
47

Yogeswaran, Claudia, and Kearsy Cormier. "Archiving Large-Scale Legacy Multimedia Research Data: A Case Study." International Journal of Digital Curation 12, no. 2 (2018): 157–76. http://dx.doi.org/10.2218/ijdc.v12i2.484.

Full text
Abstract:
In this paper we provide a case study of the creation of the DCAL Research Data Archive at University College London. In doing so, we assess the various challenges associated with archiving large-scale legacy multimedia research data, given the lack of literature on archiving such datasets. We address issues such as the anonymisation of video research data, the ethical challenges of managing legacy data and historic consent, ownership considerations, the handling of large-size multimedia data, as well as the complexity of multi-project data from a number of researchers and legacy data…
APA, Harvard, Vancouver, ISO, and other styles
48

Yuan, Shuangshuang, Peng Wu, Yuehui Chen, and Qiang Li. "A Survey of Methods for Handling Disk Data Imbalance." International Journal on Cybernetics & Informatics 12, no. 6 (2023): 83–93. http://dx.doi.org/10.5121/ijci.2023.120607.

Full text
Abstract:
Class imbalance exists in many classification problems, and because classifiers are typically optimised for overall accuracy, imbalanced data classes can lead to classification challenges in which the minority classes carry higher misclassification costs. The Backblaze dataset, a widely used dataset related to hard discs, has a small amount of failure data and a large amount of health data, which exhibits a serious class imbalance. This paper provides a comprehensive overview of research in the field of imbalanced data classification. The discussion is organized into three main aspects: data-level methods, algorithmic-level methods…
APA, Harvard, Vancouver, ISO, and other styles
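The simplest data-level method the survey's taxonomy covers is random oversampling: duplicate minority samples until the classes balance. The sketch below is a generic illustration (the tiny disk-failure toy data is invented for the example); SMOTE-style methods would instead interpolate new synthetic samples.

```python
import random

def random_oversample(X, y, minority_label, seed=0):
    """Data-level rebalancing: duplicate random minority samples
    until both classes have the same number of records."""
    rng = random.Random(seed)
    minority = [(x, l) for x, l in zip(X, y) if l == minority_label]
    majority = [(x, l) for x, l in zip(X, y) if l != minority_label]
    extra = [rng.choice(minority)
             for _ in range(len(majority) - len(minority))]
    pairs = minority + majority + extra
    rng.shuffle(pairs)  # avoid a long run of duplicated minority rows
    return [x for x, _ in pairs], [l for _, l in pairs]

# 1 failure record vs 5 healthy records, mimicking disk-failure data.
X = [[0.1], [0.9], [0.8], [0.7], [0.95], [0.85]]
y = ["fail", "ok", "ok", "ok", "ok", "ok"]
Xb, yb = random_oversample(X, y, "fail")
```

Oversampling must be applied only to the training split; duplicating minority rows before a train/test split leaks data into evaluation.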
49

Jesmeen, M. Z. H., Hossen J., Sayeed S., et al. "A Survey on Cleaning Dirty Data Using Machine Learning Paradigm for Big Data Analytics." Indonesian Journal of Electrical Engineering and Computer Science 10, no. 3 (2018): 1234–43. https://doi.org/10.11591/ijeecs.v10.i3.pp1234-1243.

Full text
Abstract:
Recently, Big Data has become one of the important new factors in the business field. This requires strategies to manage large volumes of structured, unstructured and semi-structured data. It is challenging to analyze data at such a scale to extract its meaning and handle uncertain outcomes. Almost all big data sets are dirty, i.e. the set may contain inaccuracies, missing data, miscoding and other issues that influence the strength of big data analytics. One of the biggest challenges in big data analytics is to discover and repair dirty data; failure to do this can lead to inaccurate…
APA, Harvard, Vancouver, ISO, and other styles
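A toy version of the dirty-data repairs this survey covers can be shown in a few lines. This is a generic sketch, not any specific pipeline from the paper: it drops records missing required fields, normalises whitespace-only strings, and imputes missing numeric values with the column mean (one of many possible repair strategies).

```python
def clean_records(records, required, fill_numeric=None):
    """Toy dirty-data pass: normalise, filter, and impute."""
    # Strip whitespace and treat empty strings as missing values.
    rows = [{k: (v.strip() or None) if isinstance(v, str) else v
             for k, v in r.items()} for r in records]
    # Drop records missing any required field.
    rows = [r for r in rows if all(r.get(k) is not None for k in required)]
    # Mean-impute the named numeric columns.
    for col in (fill_numeric or []):
        vals = [r[col] for r in rows if r.get(col) is not None]
        mean = sum(vals) / len(vals) if vals else 0.0
        for r in rows:
            if r.get(col) is None:
                r[col] = mean
    return rows

recs = [
    {"id": "a", "age": 30},
    {"id": "  ", "age": 40},   # whitespace-only id -> record dropped
    {"id": "c", "age": None},  # missing age -> imputed with column mean
]
clean = clean_records(recs, required=["id"], fill_numeric=["age"])
```

The machine-learning approaches the survey discusses replace these hand-written rules with models that learn which repairs to apply.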
50

R, Stalin. "Feature Analysis of Big Data Usage in Different Platform." Shanlax International Journal of Arts, Science and Humanities 6, S1 (2018): 61–66. https://doi.org/10.5281/zenodo.1410985.

Full text
Abstract:
Big data refers to huge data sets. It plays a main role in handling large and complex data where traditional data processing application software is inadequate. Big data focuses on capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying, updating and information privacy. Relational database management systems and desktop statistics and visualization packages often have difficulty handling big data. The work may require "massively parallel software running on tens…
APA, Harvard, Vancouver, ISO, and other styles