Academic literature on the topic 'Data-centric ml'

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Data-centric ml.'

Journal articles on the topic "Data-centric ml"

1

Böther, Maximilian, Ties Robroek, Viktor Gsteiger, et al. "Modyn: Data-Centric Machine Learning Pipeline Orchestration." Proceedings of the ACM on Management of Data 3, no. 1 (2025): 1–30. https://doi.org/10.1145/3709705.

Abstract:
In real-world machine learning (ML) pipelines, datasets are continuously growing. Models must incorporate this new training data to improve generalization and adapt to potential distribution shifts. The cost of model retraining is proportional to how frequently the model is retrained and how much data it is trained on, which makes the naive approach of retraining from scratch each time impractical. We present Modyn, a data-centric end-to-end machine learning platform. Modyn's ML pipeline abstraction enables users to declaratively describe policies for continuously training a model on a growing
2

Kakkar, Gaurav Tarlok, Jiashen Cao, Aubhro Sengupta, Joy Arulraj, and Hyesoon Kim. "Aero: Adaptive Query Processing of ML Queries." Proceedings of the ACM on Management of Data 3, no. 3 (2025): 1–27. https://doi.org/10.1145/3725408.

Abstract:
Query optimization is critical in relational database management systems (DBMSs) for ensuring efficient query processing. The query optimizer relies on precise selectivity and cost estimates to generate optimal query plans for execution. However, this static query optimization approach falls short for DBMSs handling machine learning (ML) queries. ML-centric DBMSs face distinct challenges in query optimization. First, performance bottlenecks shift to user-defined functions (UDFs), often encapsulating deep learning models, making it difficult to estimate UDF statistics without profiling the quer
3

Zhdanovskaya, Anastasia, Daria Baidakova, and Dmitry Ustalov. "Data Labeling for Machine Learning Engineers: Project-Based Curriculum and Data-Centric Competitions." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 13 (2023): 15886–93. http://dx.doi.org/10.1609/aaai.v37i13.26886.

Abstract:
The process of training and evaluating machine learning (ML) models relies on high-quality and timely annotated datasets. While a significant portion of academic and industrial research is focused on creating new ML methods, these communities rely on open datasets and benchmarks. However, practitioners often face issues with unlabeled and unavailable data specific to their domain. We believe that building scalable and sustainable processes for collecting data of high quality for ML is a complex skill that needs focused development. To fill the need for this competency, we created a semester co
4

Orr, Laurel, Atindriyo Sanyal, Xiao Ling, Karan Goel, and Megan Leszczynski. "Managing ML pipelines." Proceedings of the VLDB Endowment 14, no. 12 (2021): 3178–81. http://dx.doi.org/10.14778/3476311.3476402.

Abstract:
The industrial machine learning pipeline requires iterating on model features, training and deploying models, and monitoring deployed models at scale. Feature stores were developed to manage and standardize the engineer's workflow in this end-to-end pipeline, focusing on traditional tabular feature data. In recent years, however, model development has shifted towards using self-supervised pretrained embeddings as model features. Managing these embeddings and the downstream systems that use them introduces new challenges with respect to managing embedding training data, measuring embedding qual
5

Atughara John Chukwuebuka. "Adaptive machine learning in federated cloud environments: Advancing data-centric AI." International Journal of Science and Research Archive 6, no. 2 (2022): 361–76. https://doi.org/10.30574/ijsra.2022.6.2.0171.

Abstract:
This article examines the integration of adaptive machine learning (ML) within federated cloud environments, with a particular focus on its potential to advance data-centric AI. The study reviews the current landscape of federated learning, analyses the challenges and opportunities it presents, and evaluates adaptive ML techniques designed to enhance data privacy and model performance. Combining theoretical analysis with practical case studies, the paper offers valuable insights into the implementation of adaptive ML in federated cloud settings. The findings emphasise the significance of adapt
6

Khan, Tahseen, Wenhong Tian, Shashikant Ilager, and Rajkumar Buyya. "Workload forecasting and energy state estimation in cloud data centres: ML-centric approach." Future Generation Computer Systems 128 (March 2022): 320–32. http://dx.doi.org/10.1016/j.future.2021.10.019.

7

Bhowmik, Pritom, and Arabinda Saha Partha. "A Data-Centric Approach to Improve Machine Learning Model's Performance in Production." International Journal of Engineering and Advanced Technology (IJEAT) 11, no. 1 (2021): 240–43. https://doi.org/10.35940/ijeat.A3201.1011121.

Abstract:
Machine learning teaches computers to think in a similar way to how humans do. An ML models work by exploring data and identifying patterns with minimal human intervention. A supervised ML model learns by mapping an input to an output based on labeled examples of input-output (X, y) pairs. Moreover, an unsupervised ML model works by discovering patterns and information that was previously undetected from unlabelled data. As an ML project is an extensively iterative process, there is always a need to change the ML code/model and datasets. However, when an ML model achieves 70-75% of accuracy, t
9

Dani, Sourabh, Akhlaqur Rahman, Jiong Jin, and Ambarish Kulkarni. "Cloud-Empowered Data-Centric Paradigm for Smart Manufacturing." Machines 11, no. 4 (2023): 451. http://dx.doi.org/10.3390/machines11040451.

Abstract:
In the manufacturing industry, there are claims about a novel system or paradigm to overcome current data interpretation challenges. Anecdotally, these studies have not been completely practical in real-world applications (e.g., data analytics). This article focuses on smart manufacturing (SM), proposed to address the inconsistencies within manufacturing that are often caused by reasons such as: (i) data realization using a general algorithm, (ii) no accurate methods to overcome the actual inconsistencies using anomaly detection modules, or (iii) real-time availability of insights of the data
10

Grafberger, Stefan, Paul Groth, and Sebastian Schelter. "Automating and Optimizing Data-Centric What-If Analyses on Native Machine Learning Pipelines." Proceedings of the ACM on Management of Data 1, no. 2 (2023): 1–26. http://dx.doi.org/10.1145/3589273.

Abstract:
Software systems that learn from data with machine learning (ML) are used in critical decision-making processes. Unfortunately, real-world experience shows that the pipelines for data preparation, feature encoding and model training in ML systems are often brittle with respect to their input data. As a consequence, data scientists have to run different kinds of data centric what-if analyses to evaluate the robustness and reliability of such pipelines, e.g., with respect to data errors or preprocessing techniques. These what-if analyses follow a common pattern: they take an existing ML pipeline

Book chapters on the topic "Data-centric ml"

1

Owoyemi, Ayomide, Eugeniah Arthur, Tope Ladi-Akinyemi, Yemisi Babalola, and Damian Okaibedi Eke. "Trustworthy AI in Healthcare: Exploring Ethics in Digital Health Technologies in Nigeria." In Trustworthy AI. Springer Nature Switzerland, 2025. https://doi.org/10.1007/978-3-031-75674-0_9.

Abstract:
The rapid expansion of digital health solutions in Africa, encompassing telemedicine, AI, and other technologies, aligns with WHO’s Goals for Sustainable Development and Universal Health Coverage. Despite its benefits, this growth raises ethical concerns regarding deploying these technologies. A cross-sectional survey targeting executives of Nigerian digital health startups was conducted using Google Forms. The survey focused on startup characteristics, data management, ethical/legal governance, and user engagement. Data analysis employed descriptive statistics and cross-tabulation in
2

Kotios, Dimitrios, Georgios Makridis, Silvio Walser, Dimosthenis Kyriazis, and Vittorio Monferrino. "Personalized Finance Management for SMEs." In Big Data and Artificial Intelligence in Digital Finance. Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-030-94590-9_12.

Abstract:
This chapter presents Business Financial Management (BFM) tools for Small Medium Enterprises (SMEs). The presented tools represent a game changer as they shift away from a one-size-fits-all approach to banking services and put emphasis on delivering a personalized SME experience and an improved bank client’s digital experience. An SME customer-centric approach, which ensures that the particularities of the SME are taken care of as much as possible, is presented. Through a comprehensive view of SMEs’ finances and operations, paired with state-of-the-art ML/DL models, the presented BFM t
3

Charanya, J., G. Aruna, K. Akshitha, and S. Aswathy. "Novel ML Algorithms for Healthcare." In Synergizing Data Envelopment Analysis and Machine Learning for Performance Optimization in Healthcare. IGI Global, 2025. https://doi.org/10.4018/979-8-3373-0081-8.ch006.

Abstract:
The integration of novel machine learning (ML) algorithms in healthcare is revolutionizing patient monitoring, personalized treatments, and robotic surgery. Advanced ML techniques enable real-time analysis of patient data, facilitating continuous monitoring of vital signs and health metrics. This proactive approach allows for early detection of potential complications, ensuring timely interventions and improved patient outcomes. In the realm of personalized treatments, ML algorithms analyze datasets, including genetic information and treatment responses, to tailor therapies to individual patie
4

Swathi, N. L., and Achukutla Kumar. "Advanced Technologies in Clinical Research and Drug Development." In The Ethical Frontier of AI and Data Analysis. IGI Global, 2024. http://dx.doi.org/10.4018/979-8-3693-2964-1.ch001.

Abstract:
This chapter explores the synergistic potential of decentralized trials, gene editing (e.g., CRISPR-Cas9), and the integration of artificial intelligence (AI) and machine learning (ML) in clinical trials and drug development. Decentralized trials enhance diversity and expedite timelines, while gene editing ensures precision in treating genetic diseases, necessitating robust ethical guidelines. AI and ML streamline processes, improving efficiency from patient recruitment to data analysis. Digital biomarkers and real-time monitoring systems provide rich data streams. This confluence marks a tran
5

Londhe, Sanket, and Sushila Palwe. "Hybrid Customer-Centric Sales Forecasting Model Using AI ML Approaches." In Recent Trends in Intensive Computing. IOS Press, 2021. http://dx.doi.org/10.3233/apc210210.

Abstract:
Business Intelligence is a process of preparing, analyzing, presenting, and maintaining the data to gain insights for the decision-makers to make informed decisions. While there are many approaches to predict the growth based on the sales figures a very few consider the influence of customer data on the forecasting and the relevance of the same while making the predictions. So, in this study, we will look at some of the existing techniques used so far to make predictions and studies used to understand the customer data. With the analysis, we shall try to devise a hybrid approach to the traditi
6

Swapna, H. R., Emmanuel Bigirimana, R. Geetha, et al. "New Insights Into Strategic Consumer Behavior From the Field of Operations Management." In Utilization of AI Technology in Supply Chain Management. IGI Global, 2024. http://dx.doi.org/10.4018/979-8-3693-3593-2.ch019.

Abstract:
This study emphasizes the importance of adopting a consumer-centric approach to supply chain management, highlighting the role of data-driven analytics, including artificial intelligence and machine learning (AI/ML), in extracting actionable insights from consumer data. Such insights can enhance demand forecasting, personalization strategies, supply chain efficiency, customer satisfaction, and risk mitigation. This chapter looks into the developing landscape of supply chain management, emphasizing the importance of adopting a consumer-centric approach. It examines the role of data-driven analy
7

Soni, Anand, Zakir Hossen Shaikh, Mohammad Taqi, Vishwanath Bijja, and Satya Pavan Kumar Ratnakaram. "Enhancing Education in Bahrain With AI and ML." In Advances in Educational Technologies and Instructional Design. IGI Global, 2025. https://doi.org/10.4018/979-8-3693-7817-5.ch005.

Abstract:
The study explores the transformative potential of integrating AI and ML in wearable technologies to enhance Bahrain's education sector. It emphasizes the personalized learning experiences and holistic student support such technologies can offer, promoting academic and emotional well-being. Additionally, AI-powered wearables can aid teachers in classroom management and professional development. By preparing students for digital-centric work environments, Bahrain aims to align with global workforce trends and Sustainable Development Goals. The chapter discusses integration challenges, benefits,
8

Chava, Karthik. "Developing machine learning algorithms for improved diagnosis and prognosis." In Revolutionizing Healthcare Systems with Next-Generation Technologies: The Role of Artificial Intelligence, Cloud Infrastructure, and Big Data in Driving Patient-Centric Innovation. Deep Science Publishing, 2025. https://doi.org/10.70593/978-81-988918-5-3_3.

Abstract:
Machine learning research has advanced quickly within the last decade, utilizing the availability of large data sets and data storage and processing advancements to develop state-of-the-art algorithms that rival and often outperform traditional statistical methods. Yet, while ML has been successfully applied in many fields of research from varying disciplines, including neuroscience, politics, criminology, ecology, and remote sensing, among many others, few biomedical researchers have explored the use of ML for improved disease diagnosis and prognosis. This is surprising, considering the fasci
9

Mostofi, Fatemeh, and Vedat Toğan. "Explainable Safety Risk Management in Construction With Unsupervised Learning." In Advances in Civil and Industrial Engineering. IGI Global, 2023. http://dx.doi.org/10.4018/978-1-6684-5643-9.ch011.

Abstract:
The success of Machine Learning (ML) approaches as promising solutions has encouraged their widespread implementation across different fields. Owing to the high accident rate, the construction industry embraced ML in the risk assessment procedure. What if the machine produces knowledge of the relationship between the risk features and accident outcomes contained in the safety dataset? What if machines can explain an accident dataset without human intervention? Unsupervised ML techniques offer several advantages over supervised approaches, including their explainability to analyze and understan
10

Raju, M., Kimsy Gulhane, Banu A. Gulshan, Saurabh Chandra, Pravin A. Dwaramwar, and Sampath Boopathi. "Architectures of High-Performance Computing Systems for Machine Learning Workloads." In Advances in Systems Analysis, Software Engineering, and High Performance Computing. IGI Global, 2023. https://doi.org/10.4018/978-1-6684-3795-7.ch017.

Abstract:
This chapter explores the architectures of high-performance computing (HPC) systems designed for machine learning (ML) workloads with special attention paid to advanced hardware and software optimizations intended to accelerate computational efficiency. It discusses the state-of-the-art use of GPUs, TPUs, and FPGAs in parallelizing operations, optimizing deep learning models during training and inference periods. The final section covers distributed computing frameworks, such as Apache Spark and Hadoop, that also provide support for big data processing in clusters. It also examines the challen

Conference papers on the topic "Data-centric ml"

1

Fathollahzadeh, Saeed, Essam Mansour, and Matthias Boehm. "Demonstrating CatDB: LLM-based Generation of Data-centric ML Pipelines." In SIGMOD/PODS '25: International Conference on Management of Data. ACM, 2025. https://doi.org/10.1145/3722212.3725097.

2

Trivedi, Rohit, Sandipan Patra, and Shafi Khadem. "Data-centric Cyber-attack Detection in Community Microgrids Using ML Techniques." In 2022 IEEE Global Conference on Computing, Power and Communication Technologies (GlobConPT). IEEE, 2022. http://dx.doi.org/10.1109/globconpt57482.2022.9938333.

3

Omar, Rafiullah. "Energy-Efficient Development of ML-Enabled Systems: A Data-Centric Approach." In CAIN 2024: IEEE/ACM 3rd International Conference on AI Engineering - Software Engineering for AI. ACM, 2024. http://dx.doi.org/10.1145/3644815.3644974.

4

Valeriano, M. G., C. R. V. Kiffer, and A. C. Lorena. "Improving models performance in a data-centric approach applied to the healthcare domain." In Symposium on Knowledge Discovery, Mining and Learning. Sociedade Brasileira de Computação - SBC, 2024. http://dx.doi.org/10.5753/kdmile.2024.244519.

Abstract:
Machine learning systems heavily rely on training data, and any biases or limitations in datasets can significantly impair the performance and trustworthiness of these models. This paper proposes an instance hardness data-centric approach to enhance ML systems, leveraging the potential of contrasting the profiles of groups of easy and hard instances on a dataset to design classification problems more effectively. We present a case study with a COVID dataset sourced from a public repository that was utilized to predict aggravated conditions based on parameters collected on the patient’s initial
5

Alexopoulos, Andreas, Jean Vanderdonckt, Luis Leiva, Ioannis Arapakis, Michalis Vakalellis, and Vassilis Prevelakis. "Adaptive Visualization Framework for Human-Centric Data Interaction in Time-Critical Environments." In Human Interaction and Emerging Technologies (IHIET-AI 2024). AHFE International, 2024. http://dx.doi.org/10.54941/ahfe1004578.

Abstract:
In today's data-driven era, handling information overload in time-sensitive scenarios poses a significant challenge. Visualization is a valuable tool for comprehending vast amounts of data. However, it's crucial to have self-adapting visualizations that are tailored to the user's cognitive level and grow with their expertise. Existing solutions often fall short in this regard. This paper introduces a framework integrating Artificial Intelligence (AI) techniques for context awareness and emotion sensing, offering visualizations that adjust to user requirements. The framework makes use of cross-m
6

Han, Bo. "Trustworthy Machine Learning under Imperfect Data." In Thirty-Third International Joint Conference on Artificial Intelligence (IJCAI-24). International Joint Conferences on Artificial Intelligence Organization, 2024. http://dx.doi.org/10.24963/ijcai.2024/978.

Abstract:
Trustworthy machine learning (TML) under imperfect data has recently brought much attention in the data-centric fields of machine learning (ML) and artificial intelligence (AI). Specifically, there are mainly three types of imperfect data along with their challenges for ML, including i) label-level imperfection: noisy labels; ii) feature-level imperfection: adversarial examples; iii) distribution-level imperfection: out-of-distribution data. Therefore, in this paper, we systematically share our insights and solutions of TML to handle three types of imperfect data. More importantly, we discuss
7

Patel, Harsh, and Jonathan Chong. "How to Design a Modular, Effective, and Interpretable Machine Learning-Based Real-Time System: Lessons from Automated Electrical Submersible Pump Surveillance." In ADIPEC. SPE, 2023. http://dx.doi.org/10.2118/216761-ms.

Abstract:
Many machine learning (ML) projects do not progress beyond the proof-of-concept phase into real-world operations and remain economical at scale. Commonly discussed challenges revolve around digitalization, data, and infrastructure/tooling. However, there are other non-ML aspects that are equally if not more important towards building a successful system. This paper presents a general framework and lessons learned for building a robust, practical, and modular domain-centric ML-based system in contrast to purely "data-centric" or "model-centric" approaches. This paper presents the case
8

Cagincan, Can, Juliane Balder, Roland Jochem, and Rainer Stark. "IT Tool Stack Optimization in Collaborative Projects: An Evaluation and Recommendation Framework." In 16th International Conference on Applied Human Factors and Ergonomics (AHFE 2025). AHFE International, 2025. https://doi.org/10.54941/ahfe1006391.

Abstract:
In modern industrial engineering, configuring IT tool systems is fundamental to ensuring productivity and quality in collaborative projects. Small and medium-sized enterprises (SMEs) face distinctive challenges in selecting and adopting these systems, not only due to limited IT and AI expertise but also because of insufficient consideration of human perceptual factors such as technology acceptance and subjective practical experience. These challenges adversely affect overall quality, productivity, and technology adoption. This study proposes a user-centric framework for evaluating and recommendi
9

Lambay, Arsalan, Phillip Morgan, Ying Liu, and Ze Ji. "Model Training Through Synthetic Data Generation: Investigating the Impact on Human Physical Fatigue." In 15th International Conference on Applied Human Factors and Ergonomics (AHFE 2024). AHFE International, 2024. http://dx.doi.org/10.54941/ahfe1005349.

Abstract:
Collaborative robots, or cobots, are one of the Industry 4.0 technologies that have and continue to change many industrial procedures. However, amid this technological advancement, the persisting physical strain on human workers remains a significant concern. Even with the advent of cobots aimed at alleviating burdensome tasks, certain physical jobs continue to induce fatigue in human workers. Addressing this challenge necessitates the development of robust solutions that combine technological innovation with human-centric considerations. One critical aspect in mitigating physical fatigue in h
10

Tze Fung Lam, Steven, and Alan H.S. Chan. "Application of Artificial Intelligence, Machine Learning and Deep Learning in Piloted Aircraft Operations: Systematic Review." In 15th International Conference on Applied Human Factors and Ergonomics (AHFE 2024). AHFE International, 2024. http://dx.doi.org/10.54941/ahfe1004666.

Abstract:
Aviation research on artificial intelligence (AI), machine learning (ML), and deep learning (DL) has seen significant growth as these emerging technologies hold immense potential for supporting both human-centred and technology-centred aspects of civil aircraft operations. This systematic review, following the guidelines of Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020, was registered on the Open Science Framework (DOI 10.17605/OSF.IO/ZR7A3) and focused specifically on the use of AI, ML, and DL in human-centric flight operations. The review conducted a compre