To see the other types of publications on this topic, follow the link: Benchmarking and High-Performance Computing.

Journal articles on the topic 'Benchmarking and High-Performance Computing'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the top 50 journal articles for your research on the topic 'Benchmarking and High-Performance Computing.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse journal articles from a wide variety of disciplines and organise your bibliography correctly.

1

Vogels, W. "Benchmarking the CLI for high performance computing." IEE Proceedings - Software 150, no. 5 (2003): 266. http://dx.doi.org/10.1049/ip-sen:20030987.

2

Wright, Steven A. "Performance Modeling, Benchmarking and Simulation of High Performance Computing Systems." Future Generation Computer Systems 92 (March 2019): 900–902. http://dx.doi.org/10.1016/j.future.2018.11.020.

3

Sexton-Kennedy, E., P. Gartung, and C. D. Jones. "Benchmarking high performance computing architectures with CMS’ skeleton framework." Journal of Physics: Conference Series 898 (October 2017): 042045. http://dx.doi.org/10.1088/1742-6596/898/4/042045.

4

Jarvis, S. A. "Editorial Performance Modelling, Benchmarking and Simulation of High-Performance Computing Systems." Computer Journal 55, no. 2 (2011): 136–37. http://dx.doi.org/10.1093/comjnl/bxr113.

5

Pathak, Purvi, and Kumar R. "The Feasibility Study of Running HPC Workloads on Computational Clouds." Asian Journal of Pharmaceutical and Clinical Research 10, no. 13 (2017): 445. http://dx.doi.org/10.22159/ajpcr.2017.v10s1.20507.

Abstract:
High-performance computing (HPC) applications require high-end computing systems, but not all scientists have access to such powerful systems. Cloud computing provides an opportunity to run these applications without investing in high-end parallel computing systems. We can analyze the performance of HPC applications on private as well as public clouds. The performance of a workload on the cloud can be measured using benchmarking tools such as the NAS Parallel Benchmarks and Rally. Running HPC workloads on a physical setup requires many parallel computing systems, whereas a cloud computing environment provides this capacity without the need to invest in physical machines. We aim to analyze how well the cloud performs when running HPC workloads. We obtain detailed performance measurements for these applications on a private cloud and identify the pros and cons of running HPC workloads in a cloud environment.
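As a minimal illustration of how such cloud benchmark runs are usually reduced to comparable figures, the sketch below computes speedup and parallel efficiency from wall-clock times collected at several VM counts. It is not code from the paper, and the timing values are placeholders.

```python
# Illustrative only: summarize benchmark wall-clock times gathered at several
# VM counts (e.g., from NAS Parallel Benchmark runs on a private cloud).
# The timing values below are placeholders, not results from the paper.

def speedup_and_efficiency(times_by_vm_count):
    """Return {vm_count: (speedup, efficiency)} relative to the smallest run."""
    base_count = min(times_by_vm_count)
    base_time = times_by_vm_count[base_count]
    summary = {}
    for count, t in sorted(times_by_vm_count.items()):
        speedup = base_time / t
        efficiency = speedup / (count / base_count)
        summary[count] = (speedup, efficiency)
    return summary

measured = {1: 812.0, 2: 430.5, 4: 238.1, 8: 141.7}   # seconds (placeholder)
for vms, (s, e) in speedup_and_efficiency(measured).items():
    print(f"{vms:>2} VMs: speedup {s:5.2f}x, efficiency {e:5.1%}")
```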
6

Dotti, Andrea, V. Daniel Elvira, Gunter Folger, et al. "Geant4 Computing Performance Benchmarking and Monitoring." Journal of Physics: Conference Series 664, no. 6 (2015): 062021. http://dx.doi.org/10.1088/1742-6596/664/6/062021.

7

Seth, Dhruv, and Pradeep Chintale. "Performance Benchmarking of Serverless Computing Platforms." International Journal of Computer Trends and Technology 72, no. 6 (2024): 160–67. http://dx.doi.org/10.14445/22312803/ijctt-v72i6p121.

8

Sugiarto, Indar, Doddy Prayogo, Henry Palit, et al. "Custom Built of Smart Computing Platform for Supporting Optimization Methods and Artificial Intelligence Research." Proceedings of the Pakistan Academy of Sciences: A. Physical and Computational Sciences 58, S (2021): 59–64. http://dx.doi.org/10.53560/ppasa(58-sp1)733.

Abstract:
This paper describes a prototype of a computing platform dedicated to artificial intelligence explorations. The platform, dubbed PakCarik, is essentially a high-throughput computing platform with GPU (graphics processing unit) acceleration. PakCarik is an Indonesian acronym for Platform Komputasi Cerdas Ramah Industri Kreatif, which can be translated as “Creative Industry friendly Intelligence Computing Platform”. This platform aims to provide a complete development and production environment for AI-based projects, especially those that rely on machine learning and multiobjective optimization paradigms. The method for constructing PakCarik was based on a computer hardware assembling technique that uses commercial off-the-shelf hardware and was tested on several AI-related application scenarios. The testing methods in this experiment include High-Performance Linpack (HPL) benchmarking, message passing interface (MPI) benchmarking, and TensorFlow (TF) benchmarking. From the experiment, the authors observe that PakCarik's performance is quite similar to commonly used cloud computing services such as Google Compute Engine and Amazon EC2, even though it falls slightly behind dedicated AI platforms such as the Nvidia DGX-1 used in the benchmarking experiment. Its maximum computing performance was measured at 326 Gflops. The authors conclude that PakCarik is ready to be deployed in real-world applications and that it can be made even more powerful by adding more GPU cards to it.
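For context, the quoted 326 Gflops figure is the kind of number an HPL-style run reports. The sketch below is not the authors' benchmark harness; it simply estimates sustained Gflop/s from a timed dense matrix multiplication.

```python
# Minimal sketch (not the authors' HPL setup): estimate sustained Gflop/s
# from a timed dense matrix multiplication, the same kind of quantity an
# HPL run reports as its performance figure.
import time
import numpy as np

def matmul_gflops(n: int = 4096, repeats: int = 3) -> float:
    a = np.random.rand(n, n)
    b = np.random.rand(n, n)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        np.dot(a, b)
        best = min(best, time.perf_counter() - start)
    flops = 2.0 * n ** 3          # multiply-add count for an n x n GEMM
    return flops / best / 1e9

print(f"~{matmul_gflops():.1f} Gflop/s sustained on this machine")
```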
9

Islam, Riadul, Patrick Majurski, Jun Kwon, Anurag Sharma, and Sri Ranga Sai Krishna Tummala. "Benchmarking Artificial Neural Network Architectures for High-Performance Spiking Neural Networks." Sensors 24, no. 4 (2024): 1329. http://dx.doi.org/10.3390/s24041329.

Abstract:
Organizations managing high-performance computing systems face a multitude of challenges, including overarching concerns such as overall energy consumption, microprocessor clock frequency limitations, and the escalating costs associated with chip production. Evidently, processor speeds have plateaued over the last decade, persisting within the range of 2 GHz to 5 GHz. Scholars assert that brain-inspired computing holds substantial promise for mitigating these challenges. The spiking neural network (SNN) particularly stands out for its commendable power efficiency when juxtaposed with conventional design paradigms. Nevertheless, our scrutiny has brought to light several pivotal challenges impeding the seamless implementation of large-scale neural networks (NNs) on silicon. These challenges encompass the absence of automated tools, the need for multifaceted domain expertise, and the inadequacy of existing algorithms to efficiently partition and place extensive SNN computations onto hardware infrastructure. In this paper, we posit the development of an automated tool flow capable of transmuting any NN into an SNN. This undertaking involves the creation of a novel graph-partitioning algorithm designed to strategically place SNNs on a network-on-chip (NoC), thereby paving the way for future energy-efficient and high-performance computing paradigms. The presented methodology showcases its effectiveness by successfully transforming ANN architectures into SNNs with a marginal average error penalty of merely 2.65%. The proposed graph-partitioning algorithm enables a 14.22% decrease in inter-synaptic communication and an 87.58% reduction in intra-synaptic communication, on average, underscoring the effectiveness of the proposed algorithm in optimizing NN communication pathways. Compared to a baseline graph-partitioning algorithm, the proposed approach exhibits an average decrease of 79.74% in latency and a 14.67% reduction in energy consumption. Using existing NoC tools, the energy-latency product of SNN architectures is, on average, 82.71% lower than that of the baseline architectures.
10

Wu, Dazhong, Xi Liu, Steve Hebert, Wolfgang Gentzsch, and Janis Terpenny. "Democratizing digital design and manufacturing using high performance cloud computing: Performance evaluation and benchmarking." Journal of Manufacturing Systems 43 (April 2017): 316–26. http://dx.doi.org/10.1016/j.jmsy.2016.09.005.

11

Varghese, Blesson, Nan Wang, David Bermbach, et al. "A Survey on Edge Performance Benchmarking." ACM Computing Surveys 54, no. 3 (2021): 1–33. http://dx.doi.org/10.1145/3444692.

Abstract:
Edge computing is the next Internet frontier that will leverage computing resources located near users, sensors, and data stores to provide more responsive services. Therefore, it is envisioned that a large-scale, geographically dispersed, and resource-rich distributed system will emerge and play a key role in the future Internet. However, given the loosely coupled nature of such complex systems, their operational conditions are expected to change significantly over time. In this context, the performance characteristics of such systems will need to be captured rapidly, which is referred to as performance benchmarking, for application deployment, resource orchestration, and adaptive decision-making. Edge performance benchmarking is a nascent research avenue that has started gaining momentum over the past five years. This article first reviews articles published over the past three decades to trace the history of performance benchmarking from tightly coupled to loosely coupled systems. It then systematically classifies previous research to identify the system under test, techniques analyzed, and benchmark runtime in edge performance benchmarking.
12

Kouatli, Issam. "People-process-performance benchmarking technique in cloud computing environment." International Journal of Productivity and Performance Management 69, no. 9 (2019): 1955–72. http://dx.doi.org/10.1108/ijppm-04-2017-0083.

Abstract:
Purpose: Cloud computing is a relatively new type of technology demanding new management techniques to attain the security and privacy that lead to customer satisfaction regarding the “Business Protection” measure. As cloud computing businesses are usually composed of multiple colocation sites/departments, the purpose of this paper is to propose a benchmarking operation to measure and compare the overall integrated people-process-performance (PPP) among different departments within a cloud computing organization, and to motivate staff/units to improve process performance and meet standards in a competitive approach among business units. Design/methodology/approach: The research was conducted at Cirrus Ltd, a cloud computing service provider, with a focus group consisting of six IT professionals/managers. The objective of the focus group was to investigate the proposed technique by selecting the relevant best-practice criteria, with the relevant sub-criteria, as a benchmarking performance tool to measure PPP via an analytic hierarchy process (AHP) approach. The standard pairwise comparative AHP scale was used to measure the performance of three different teams, defined as the production team, the user acceptance testing team and the development team. Findings: Based on the best-practice performance measurement of cloud computing reviewed in this paper, the proposed AHP model was implemented at a local medium-sized cloud service provider named “Cirrus” with a single-site data center. The actual criteria relevant to Cirrus were an adaptation of the “Best practice” described in the literature. The main reason for the adaptation of criteria was that the principle of PPP assumes multiple departments/datacenters located in different geographical areas in large service providers. As Cirrus is an SME, the adaptation of performance measurement was based on teams within the same data center location. Irrespective of this adaptation, the objective of measuring vendors' KPIs using the AHP technique as a specific output of PPP is also valid. Practical implications: This study provides guidance for achieving cloud computing performance measurement using the AHP technique. Hence, the proposed technique is an integrated model to measure PPP in a monitored cloud environment. Originality/value: The proposed technique measures and manages the performance of cloud service providers and also implicitly acts as a catalyst to attain trust in such a highly information-sensitive environment, leading to organizational effectiveness in managing cloud organizations.
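The core AHP step referred to above (priority weights from pairwise comparisons plus a consistency check) can be sketched as follows. The comparison matrix is illustrative, not the focus group's actual judgements.

```python
# Minimal AHP sketch (illustrative matrix, not the paper's data): derive
# priority weights for three teams from a pairwise comparison matrix and
# check consistency, as in a standard analytic hierarchy process step.
import numpy as np

# Saaty-scale pairwise judgements: how much team i outperforms team j.
A = np.array([[1.0, 3.0, 5.0],
              [1/3, 1.0, 2.0],
              [1/5, 1/2, 1.0]])

eigvals, eigvecs = np.linalg.eig(A)
k = np.argmax(eigvals.real)                    # principal eigenvalue
weights = np.abs(eigvecs[:, k].real)
weights /= weights.sum()                       # normalized priority weights

n = A.shape[0]
ci = (eigvals[k].real - n) / (n - 1)           # consistency index
cr = ci / 0.58                                 # Saaty random index for n = 3
print("weights:", np.round(weights, 3), "consistency ratio:", round(cr, 3))
```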
13

Hussain, Tassadaq, Muhammad Wasay Tahir, Manahil Mushtaq, and Sidra Khalid. "Design and benchmarking of a low-cost RISC-V-based high-performance computing cluster for edge computing." IET Conference Proceedings 2025, no. 3 (2025): 579–86. https://doi.org/10.1049/icp.2025.1168.

14

Montero, R. S., E. Huedo, and I. M. Llorente. "Benchmarking of high throughput computing applications on Grids." Parallel Computing 32, no. 4 (2006): 267–79. http://dx.doi.org/10.1016/j.parco.2005.12.001.

15

Mungai, Joseph, and Wanjiku Nganga. "Benchmarking of Undergraduate Computing Curricula in Kenya." INTERNATIONAL JOURNAL OF MANAGEMENT & INFORMATION TECHNOLOGY 6, no. 1 (2013): 727–37. http://dx.doi.org/10.24297/ijmit.v6i1.754.

Abstract:
This study investigated the quality of undergraduate computing curricula at Kenyan universities, how they compare locally and regionally with equivalent programs, and how closely they meet ICT sector needs. It was guided by four objectives, i.e., to undertake an ontological mapping of computing curricula, to identify appropriate benchmarking criteria, to develop and test a benchmarking tool, and to investigate the alignment of these curricula to computing skills requirements. The study was deemed important because of the plethora of academic computing programs of varying degrees of utility and credibility, which are a product of the escalating demand for computing education in Kenya given the development of Vision 2030 and the rapid growth of the ICT industry. To achieve its objectives, the study adopted a quantitative and qualitative cross-sectional descriptive survey of computing curricula offered locally (in Kenya) and regionally (from best-practicing countries, USA and India). A sample of 70.3% was drawn from the target population for ontological mapping. Two research instruments, i.e. a questionnaire and a document analysis framework, were administered to a cross-section of 11 public/private universities. The study established that there are 24 undergraduate computing programs under 6 titles, viz. BSc., BCom., BTech., BB., BEd. and BEng. The two most populous programs are BSc. Computer Science (CS) and BSc. Information Technology (IT), which were selected to help identify two benchmarking criteria: percent weight allocation of core hours within ACM knowledge areas, and relative performance capabilities of computing graduates. Using these criteria, a benchmarking tool was developed and tested, which depicted disparities among the respondents in the percent weight allocation of core hours in CS programs. Similarly, it portrayed overlaps in the relative performance capabilities of CS and IT graduates, an outcome that called into question the uniqueness of these programs. As such, its results indicate that the quality of the two computing programs is relatively insufficient. However, it further establishes that the computing curricula are aligned to meet the top 3 most demanded computing skills, i.e. networking, software development and Internet skills, albeit with insufficient percent weight allocation of core hours in software development. It therefore recommends further testing and refining of the established benchmarking tool and the need to re-focus the computing programs, and supports the call to institute a regulatory body and qualifications framework for computing education and skills.
16

Laudan, Janek, Paul Heinrich, and Kai Nagel. "High-Performance Mobility Simulation: Implementation of a Parallel Distributed Message-Passing Algorithm for MATSim." Information 16, no. 2 (2025): 116. https://doi.org/10.3390/info16020116.

Abstract:
Striving for better simulation results, transport planners want to simulate larger domains with increased levels of detail. Achieving fast execution times for these complex traffic simulations requires the parallel computing power of modern hardware. This paper presents an architectural update to the MATSim traffic simulation framework, introducing a prototype that adapts the existing traffic flow model to a distributed parallel algorithm. The prototype is capable of scaling across multiple compute nodes, utilizing the parallel computing power of modern hardware. Benchmarking reveals a 119-fold improvement in execution speed over the current implementation, and a 43 times speedup when compared to single-core performance. The prototype can simulate 24 h of large-scale traffic in just 3.5 s. Based on these results, we advocate for integrating a distributed simulation approach into MATSim and outline steps for further optimizing the prototype for large-scale applications.
17

Lala, Septem Riza, Farrah Dhiba Tyas, Setiawan Wawan, Hidayat Topik, and Fahs Mahmoud. "Parallel random projection using R high performance computing for planted motif search." TELKOMNIKA Telecommunication, Computing, Electronics and Control 17, no. 3 (2019): 1352–59. https://doi.org/10.12928/TELKOMNIKA.v17i3.11750.

Abstract:
Motif discovery in DNA sequences is one of the most important issues in bioinformatics, so algorithms that deal with the problem accurately and quickly have always been a goal of research in the field. This study modifies the random projection algorithm to be implemented on R high-performance computing (i.e., the R package pbdMPI). Several steps are needed to achieve this objective: preprocessing the data, splitting the data into a number of batches, modifying and implementing random projection in the pbdMPI package, and then aggregating the results. To validate the proposed approach, experiments were conducted on several benchmark datasets, with a sensitivity analysis on the number of cores and batches. Experimental results show that computational cost can be reduced: the computation with 6 cores is around 34 times faster than standalone mode. Thus, the proposed approach can be used for motif discovery effectively and efficiently.
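The split-compute-aggregate pattern described above can be sketched in Python with mpi4py; the paper itself uses R's pbdMPI, and the projection step below is a stand-in, not the authors' code.

```python
# Python analogue (via mpi4py) of the split-compute-aggregate pattern the
# abstract describes; the paper itself uses R's pbdMPI. The data and the
# "projection" stub below are illustrative placeholders.
#
# Run with: mpirun -n 4 python random_projection_mpi.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

if rank == 0:
    sequences = [f"seq-{i}" for i in range(1000)]            # placeholder data
    batches = [sequences[i::size] for i in range(size)]      # one batch per rank
else:
    batches = None

my_batch = comm.scatter(batches, root=0)                     # distribute batches
local_hits = [s for s in my_batch if hash(s) % 7 == 0]       # stand-in for projection
all_hits = comm.gather(local_hits, root=0)                   # aggregate results

if rank == 0:
    merged = [h for part in all_hits for h in part]
    print(f"{len(merged)} candidate motifs collected from {size} ranks")
```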
18

Mills, Daniel, Seyon Sivarajah, Travis L. Scholten, and Ross Duncan. "Application-Motivated, Holistic Benchmarking of a Full Quantum Computing Stack." Quantum 5 (March 22, 2021): 415. http://dx.doi.org/10.22331/q-2021-03-22-415.

Abstract:
Quantum computing systems need to be benchmarked in terms of practical tasks they would be expected to do. Here, we propose 3 "application-motivated" circuit classes for benchmarking: deep (relevant for state preparation in the variational quantum eigensolver algorithm), shallow (inspired by IQP-type circuits that might be useful for near-term quantum machine learning), and square (inspired by the quantum volume benchmark). We quantify the performance of a quantum computing system in running circuits from these classes using several figures of merit, all of which require exponential classical computing resources and a polynomial number of classical samples (bitstrings) from the system. We study how performance varies with the compilation strategy used and the device on which the circuit is run. Using systems made available by IBM Quantum, we examine their performance, showing that noise-aware compilation strategies may be beneficial, and that device connectivity and noise levels play a crucial role in the performance of the system according to our benchmarks.
19

Mao, Hongyan, Zhengwei Qi, Jiangang Duan, and Xinni Ge. "Cost-Performance Modeling with Automated Benchmarking on Elastic Computing Clouds." Journal of Grid Computing 15, no. 4 (2017): 557–72. http://dx.doi.org/10.1007/s10723-017-9412-4.

20

Bahudaila, Subhi Abdul-rahim, and Waddah Ahmed Munasser. "Document classification in parallel environments using Java bindings in open MPI." University of Aden Journal of Natural and Applied Sciences 21, no. 2 (2017): 299–309. http://dx.doi.org/10.47372/uajnas.2017.n2.a09.

Abstract:
This paper describes high-performance computing (HPC) for document search engines that vectorize each classified document. Parallelization is achieved by exploiting parallel and distributed computing environments with multiple cooperating processes, implemented using the Java message passing interface (MPI) bindings of Open MPI. A manager/worker parallelism model is implemented to obtain load balancing, and analysis and benchmarking are carried out with a parallelism profiling model designed for the implementation. Two experimental outputs are shown: the parallel processing performance, with an efficiency of 80%, and the profiling results, which show the utilizations and overheads in our parallelism model.
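A manager/worker scheme of the kind described above might look like the following mpi4py sketch; the paper implements the equivalent pattern with Open MPI's Java bindings, so this Python version is only an illustration of the load-balancing idea, not the authors' code.

```python
# Hedged manager/worker sketch in mpi4py: the manager hands out documents on
# demand, so faster workers automatically receive more work (load balancing).
#
# Run with: mpirun -n 4 python manager_worker.py
from mpi4py import MPI

TAG_WORK, TAG_STOP = 1, 2
comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

if rank == 0:                                   # manager
    docs = [f"doc-{i}.txt" for i in range(20)]  # placeholder work items
    status = MPI.Status()
    active = size - 1
    while active > 0:
        msg = comm.recv(source=MPI.ANY_SOURCE, tag=MPI.ANY_TAG, status=status)
        worker = status.Get_source()
        if docs:
            comm.send(docs.pop(), dest=worker, tag=TAG_WORK)
        else:
            comm.send(None, dest=worker, tag=TAG_STOP)
            active -= 1
else:                                           # worker: request, process, repeat
    comm.send("ready", dest=0)
    status = MPI.Status()
    while True:
        doc = comm.recv(source=0, tag=MPI.ANY_TAG, status=status)
        if status.Get_tag() == TAG_STOP:
            break
        comm.send(f"vectorized {doc}", dest=0)  # stand-in for vectorization
```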
21

Decker, Jonathan, Piotr Kasprzak, and Julian Martin Kunkel. "Performance Evaluation of Open-Source Serverless Platforms for Kubernetes." Algorithms 15, no. 7 (2022): 234. http://dx.doi.org/10.3390/a15070234.

Abstract:
Serverless computing has grown massively in popularity over the last few years, and has provided developers with a way to deploy function-sized code units without having to take care of the actual servers or deal with logging, monitoring, and scaling of their code. High-performance computing (HPC) clusters can profit from improved serverless resource sharing capabilities compared to reservation-based systems such as Slurm. However, before running self-hosted serverless platforms in HPC becomes a viable option, serverless platforms must be able to deliver a decent level of performance. Other researchers have already pointed out that there is a distinct lack of studies in the area of comparative benchmarks on serverless platforms, especially for open-source self-hosted platforms. This study takes a step towards filling this gap by systematically benchmarking two promising self-hosted Kubernetes-based serverless platforms in comparison. While the resulting benchmarks signal potential, they demonstrate that many opportunities for performance improvements in serverless computing are being left on the table.
22

Nancy, Arya, Choudhary Sunita, and S.Taruna. "ENERGY EFFICIENT COMPUTING FOR SMART PHONES IN CLOUD ASSISTED ENVIRONMENT." International Journal of Computer Networks & Communications (IJCNC) 11, no. 5 (2019): 59–78. https://doi.org/10.5281/zenodo.3517912.

Abstract:
In recent years, the use of smart mobile phones has increased enormously, and they have become an integral part of human life. Smartphones can support an immense range of complicated and intensive applications, which results in reduced battery capacity and lower performance. Mobile cloud computing is a newly emerging paradigm that integrates the features of cloud computing and mobile computing to overcome the constraints of mobile devices. It employs computational offloading, which migrates computations from mobile devices to remote servers. In this paper, a novel model is proposed for dynamic task offloading to achieve energy optimization and better performance for mobile applications in the cloud environment. The paper proposes an optimal offloading algorithm by introducing new criteria, such as benchmarking, for offloading decision making. It also supports partitioning, which divides the computing problem into various sub-problems that can be executed in parallel on the mobile device and the cloud. Performance evaluation results show that the proposed model can reduce energy consumption by around 20% to 53% for low-complexity problems and by up to 98% for high-complexity problems.
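As a rough illustration of an offloading decision of the kind discussed above, the sketch below compares estimated local-execution energy against offloading energy using a generic, textbook-style model; the parameters and the model itself are assumptions, not the criteria proposed in the paper.

```python
# Illustrative offloading decision only (a generic energy model, not the
# model proposed in the paper): offload a task when the estimated energy of
# remote execution plus data transfer is lower than running it locally.

def should_offload(cycles, data_bytes,
                   local_speed=1e9,      # device CPU cycles/s   (assumed)
                   cloud_speed=8e9,      # server CPU cycles/s   (assumed)
                   p_compute=0.9,        # W while computing locally
                   p_idle=0.3,           # W while waiting for the cloud
                   p_transmit=1.3,       # W while sending data
                   bandwidth=5e6):       # bytes/s uplink
    e_local = p_compute * (cycles / local_speed)
    e_offload = (p_transmit * (data_bytes / bandwidth)
                 + p_idle * (cycles / cloud_speed))
    return e_offload < e_local, e_local, e_offload

offload, e_l, e_r = should_offload(cycles=5e10, data_bytes=2e6)
print(f"offload={offload}  local={e_l:.1f} J  remote={e_r:.1f} J")
```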
23

Hajder, Piotr, and Łukasz Rauch. "Moving Multiscale Modelling to the Edge: Benchmarking and Load Optimization for Cellular Automata on Low Power Microcomputers." Processes 9, no. 12 (2021): 2225. http://dx.doi.org/10.3390/pr9122225.

Abstract:
Numerical computations are usually associated with high-performance computing. Nevertheless, both industry and science increasingly involve lower-power devices in computations. This is especially true when the data-collecting devices are able to partially process the data in place, thus increasing system reliability. This paradigm is known as edge computing. In this paper, we propose the use of lower-power devices at the edge for multi-scale modelling calculations. A system was created consisting of a high-power device (a two-processor workstation), 8 Raspberry Pi 4B microcomputers and 8 NVIDIA Jetson Nano units equipped with GPU processors. As part of this research, benchmarking was performed, on the basis of which the computational capabilities of the devices were classified. Two parameters were considered: the number and performance of computing units (CPUs and GPUs) and the energy consumption of the loaded machines. Then, using the calculated weak scalability and energy consumption, a min–max-based load optimization algorithm was proposed. The system was tested in laboratory conditions, giving similar computation time at the same power consumption for 24 physical workstation cores vs. 8 Raspberry Pi 4B and 8 Jetson Nano units. The work ends with a proposal to use this solution in industrial processes, using hot rolling of flat products as an example.
24

Meng, Li, Jinlong Zhu, and Liying Wang. "Classroom Teaching Performance Evaluation Model Guided by Big Data and Mobile Computing." Wireless Communications and Mobile Computing 2022 (March 24, 2022): 1–9. http://dx.doi.org/10.1155/2022/2084423.

Abstract:
Performance management has evolved rapidly in recent years and has become increasingly dominant in enterprise applications, whereas its application in the field of education has progressed slowly. Because performance management focuses on improving business performance and empowering employees, implementing it in schools helps students develop practical skills. This research focuses on evaluating classroom teaching performance using a big data and mobile computing-driven model. In addition, in the era of educational big data, this paper investigates the general process by which teachers acquire, analyze, and use educational data to improve teaching performance. The data mining method and mobile data capture are organically integrated into the benchmarking analysis to evaluate the classroom teaching performance of local universities, enriching the teaching management theories and methods of local universities. The findings show that benchmarking analysis can produce more meaningful results and provide new data for improving teaching management quality.
25

Kösters, Dominique J., Bryan A. Kortman, Irem Boybat, et al. "Benchmarking energy consumption and latency for neuromorphic computing in condensed matter and particle physics." APL Machine Learning 1, no. 1 (2023): 016101. http://dx.doi.org/10.1063/5.0116699.

Abstract:
The massive use of artificial neural networks (ANNs), increasingly popular in many areas of scientific computing, rapidly increases the energy consumption of modern high-performance computing systems. An appealing and possibly more sustainable alternative is provided by novel neuromorphic paradigms, which directly implement ANNs in hardware. However, little is known about the actual benefits of running ANNs on neuromorphic hardware for use cases in scientific computing. Here, we present a methodology for measuring the energy cost and compute time for inference tasks with ANNs on conventional hardware. In addition, we have designed an architecture for these tasks and estimate the same metrics based on a state-of-the-art analog in-memory computing (AIMC) platform, one of the key paradigms in neuromorphic computing. Both methodologies are compared for a use case in quantum many-body physics in two-dimensional condensed matter systems and for anomaly detection at 40 MHz rates at the Large Hadron Collider in particle physics. We find that AIMC can achieve up to one order of magnitude shorter computation times than conventional hardware at an energy cost that is up to three orders of magnitude smaller. This suggests great potential for faster and more sustainable scientific computing with neuromorphic hardware.
26

Hey, Tony, and Juri Papay. "Performance Engineering, PSEs and the GRID." Scientific Programming 10, no. 1 (2002): 3–17. http://dx.doi.org/10.1155/2002/354024.

Abstract:
Performance Engineering is concerned with the reliable prediction and estimation of the performance of scientific and engineering applications on a variety of parallel and distributed hardware. This paper reviews the present state of the art in 'Performance Engineering' for both parallel computing and meta-computing environments and attempts to look forward to the application of these techniques in the wider context of Problem Solving Environments and the Grid. The paper compares various techniques such as benchmarking, performance measurements, analytical modelling and simulation, and highlights the lessons learned in the related projects. The paper concludes with a discussion of the challenges of extending such methodologies to computational Grid environments.
27

Zhang, Qiyang, and Mengwei Xu. "Benchmarking Mobile Deep Learning Software." GetMobile: Mobile Computing and Communications 28, no. 3 (2024): 5–8. http://dx.doi.org/10.1145/3701701.3701703.

Abstract:
Deploying deep learning (DL) on mobile devices has become increasingly prevalent. DL software libraries are crucial for efficient on-device inference, alongside algorithms and hardware. However, there has been limited understanding on the performance of modern DL libraries. We fill this gap by benchmarking 6 popular DL libraries and 15 diverse models across 10 mobile devices, which reveal an unsatisfactory landscape of mobile DL: their performance is highly disparate and fragmented across different models and hardware, and the impacts often surpass algorithm or hardware optimizations, such as model quantization and GPU/NPU-based computing. Finally, we provide practical implications for stakeholders in the DL library ecosystem, and envision a more ambitious picture of future mobile AI landscape in the LLM era.
28

Verma, Ankur, Ayush Goyal, Soundar Kumara, and Thomas Kurfess. "Edge-cloud computing performance benchmarking for IoT based machinery vibration monitoring." Manufacturing Letters 27 (January 2021): 39–41. http://dx.doi.org/10.1016/j.mfglet.2020.12.004.

29

Chen, Jwo-Sy, Erik Nielsen, Matthew Ebert, et al. "Benchmarking a trapped-ion quantum computer with 30 qubits." Quantum 8 (November 7, 2024): 1516. http://dx.doi.org/10.22331/q-2024-11-07-1516.

Abstract:
Quantum computers are rapidly becoming more capable, with dramatic increases in both qubit count and quality. Among different hardware approaches, trapped-ion quantum processors are a leading technology for quantum computing, with established high-fidelity operations and architectures with promising scaling. Here, we demonstrate and thoroughly benchmark the IonQ Forte system: configured as a single-chain 30-qubit trapped-ion quantum computer with all-to-all operations. We assess the performance of our quantum computer operation at the component level via direct randomized benchmarking (DRB) across all (30 choose 2) = 435 gate pairs. We then show the results of application-oriented benchmarks and show that the system passes the suite of algorithmic qubit (AQ) benchmarks up to #AQ 29. Finally, we use our component-level benchmarking to build a system-level model to predict the application benchmarking data through direct simulation. While we find that the system-level model correlates with the experiment in predicting application circuit performance, we note quantitative discrepancies indicating significant out-of-model errors, leading to higher predicted performance than what is observed. This highlights that as quantum computers move toward larger and higher-quality devices, characterization becomes more challenging, suggesting future work required to push performance further.
30

Solymosi, Bence, Nathalie Favretto-cristini, Vadim Monteiller, et al. "Numerical modeling with high performance computing of seismic waves for complex marine environments: Benchmarking with laboratory experiments." Journal of the Acoustical Society of America 143, no. 3 (2018): 1926. http://dx.doi.org/10.1121/1.5036290.

31

Brower, R. C., C. Rebbi, P. Tamayo, K. J. M. Moriarty, and S. Sanielevici. "Benchmarking High-Performance Computing Systems By Means of Local-Creutz Simulations of the d = 2 Ising Model." International Journal of Supercomputing Applications 6, no. 3 (1992): 281–87. http://dx.doi.org/10.1177/109434209200600305.

32

Blanke, William J., and Imtiyaz Hussein. "Supercomputing in the South Pacific: performance of a parallel cluster using existing USP facilities." South Pacific Journal of Natural and Applied Sciences 22, no. 1 (2004): 67. http://dx.doi.org/10.1071/sp04016.

Abstract:
This paper presents the details of a parallel computing cluster built using existing computing resources at the University of the South Pacific. Benchmarking tests using the High Performance Linpack Benchmark were done in order to measure the gigaflops (billions of floating point operations per second) ratings for solving large systems of linear equations while varying the number of computers and Ethernet switches used. These tests provided an overall maximum gigaflops rating, which allowed comparison of USP's cluster with leading-edge clusters from around the world. Efficiency results also provided insight into how improving the existing network infrastructure might improve the performance of USP's cluster and increase its gigaflops rating. Further tests revealed that the number of Ethernet switches used in USP's current network layout is a definite contributor to the low efficiency of the system as a whole.
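For reference, an HPL gigaflops rating is derived from the problem size and the measured wall time using the standard Linpack operation count; the sketch below shows that calculation together with a simple parallel-efficiency figure. The numbers are placeholders, not the USP cluster's results.

```python
# Minimal sketch of how an HPL (High Performance Linpack) Gflops rating is
# derived from a run: the standard operation count for solving a dense
# N x N system is 2/3*N^3 + 2*N^2 floating-point operations. The values
# below are placeholders, not the USP cluster's results.

def hpl_gflops(problem_size_n: int, wall_time_s: float) -> float:
    ops = (2.0 / 3.0) * problem_size_n ** 3 + 2.0 * problem_size_n ** 2
    return ops / wall_time_s / 1e9

def parallel_efficiency(gflops_measured: float, n_nodes: int,
                        gflops_per_node_peak: float) -> float:
    return gflops_measured / (n_nodes * gflops_per_node_peak)

rating = hpl_gflops(problem_size_n=20000, wall_time_s=1250.0)
print(f"rating: {rating:.2f} Gflops, "
      f"efficiency: {parallel_efficiency(rating, 16, 1.2):.1%}")
```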
33

Mammadov, Elshen, Annagi Asgarov, and Aysen Mammadova. "The Role of Artificial Intelligence in Modern Computer Architecture: From Algorithms to Hardware Optimization." Porta Universorum 1, no. 2 (2025): 65–71. https://doi.org/10.69760/portuni.010208.

Abstract:
The rapid advancement of artificial intelligence (AI) has significantly influenced the design and evolution of modern computer architectures. This article explores the dynamic relationship between AI algorithms and hardware, focusing on how neural networks have driven the development of specialized processors such as GPUs, TPUs, and neuromorphic chips. Through comparative analysis, performance benchmarking, and model-hardware interaction, the study highlights the transition from general-purpose computing systems to AI-optimized platforms. It also addresses emerging challenges related to scalability, energy efficiency, and security. The findings call for deeper interdisciplinary collaboration between AI researchers and hardware engineers to build systems that are both high-performing and sustainable in the age of intelligent computing.
34

Liu, Zhouding, and Jia Li. "Scalable and Distributed Mathematical Modeling Algorithm Design and Performance Evaluation in Heterogeneous Computing Clusters." Scalable Computing: Practice and Experience 25, no. 5 (2024): 3812–21. http://dx.doi.org/10.12694/scpe.v25i5.3001.

Abstract:
As computing needs in the research and industrial sectors keep growing, more scalable and distributed methods are required to simulate complicated phenomena effectively. To meet this challenge, this study proposes a novel approach for developing and assessing mathematical modeling methods in heterogeneous computing clusters. The suggested methodology uses a DRL-based parallel computational model for the evaluation of heterogeneous computing clusters. The algorithm makes use of parallelization methods to split the processing burden among several nodes, supporting the variety of topologies seen in contemporary computing clusters. By utilizing heterogeneous hardware parts such as CPUs, GPUs, and acceleration devices, the architecture seeks to maximize speed and minimize resource usage. To evaluate the effectiveness of the proposed approach, a comprehensive performance assessment is conducted, encompassing scalability analysis, benchmarking, and comparisons against traditional homogeneous computing setups. The research also investigates the impact of algorithm design choices on the efficiency and speed achieved in diverse computing environments.
35

Huang, Xuanteng, Xianwei Zhang, Panfei Yang, and Nong Xiao. "Benchmarking GPU Tensor Cores on General Matrix Multiplication Kernels through CUTLASS." Applied Sciences 13, no. 24 (2023): 13022. http://dx.doi.org/10.3390/app132413022.

Abstract:
GPUs have been broadly used to accelerate big data analytics, scientific computing and machine intelligence. Particularly, matrix multiplication and convolution are two principal operations that use a large proportion of steps in modern data analysis and deep neural networks. These performance-critical operations are often offloaded to the GPU to obtain substantial improvements in end-to-end latency. In addition, multifarious workload characteristics and complicated processing phases in big data demand a customizable yet performant operator library. To this end, GPU vendors, including NVIDIA and AMD, have proposed template and composable GPU operator libraries to conduct specific computations on certain types of low-precision data elements. We formalize a set of benchmarks via CUTLASS, NVIDIA’s templated library that provides high-performance and hierarchically designed kernels. The benchmarking results show that, with the necessary fine tuning, hardware-level ASICs like tensor cores could dramatically boost performance in specific operations like GEMM offloading to modern GPUs.
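A quick way to obtain the usual figure of merit for such GEMM kernels (achieved TFLOP/s) is to time a half-precision matrix multiplication; the sketch below uses PyTorch rather than CUTLASS and assumes a CUDA-capable GPU, so it is only a stand-in for the paper's templated benchmarks.

```python
# Hedged sketch (not CUTLASS itself): time a half-precision GEMM with PyTorch
# and report achieved TFLOP/s. Requires a CUDA-capable GPU.
import time
import torch

def gemm_tflops(m=4096, n=4096, k=4096, iters=20):
    a = torch.randn(m, k, device="cuda", dtype=torch.float16)
    b = torch.randn(k, n, device="cuda", dtype=torch.float16)
    torch.matmul(a, b)                       # warm-up launch
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        torch.matmul(a, b)
    torch.cuda.synchronize()
    elapsed = (time.perf_counter() - start) / iters
    return 2.0 * m * n * k / elapsed / 1e12  # 2*M*N*K flops per GEMM

print(f"achieved ~{gemm_tflops():.1f} TFLOP/s")
```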
36

Mas Magre, Isidre, Rogeli Grima Torres, José María Cela Espín, and José Julio Gutierrez Moreno. "The NOMAD mini-apps: A suite of kernels from ab initio electronic structure codes enabling co-design in high-performance computing." Open Research Europe 4 (May 29, 2024): 35. http://dx.doi.org/10.12688/openreseurope.16920.2.

Abstract:
This article introduces a suite of mini-applications (mini-apps) designed to optimise computational kernels in ab initio electronic structure codes. The suite is developed from flagship applications participating in the NOMAD Center of Excellence, such as the ELPA eigensolver library and the GW implementations of the exciting, Abinit, and FHI-aims codes. The mini-apps were identified by targeting functions that significantly contribute to the total execution time in the parent applications. This strategic selection allows for concentrated optimisation efforts. The suite is designed for easy deployment on various High-Performance Computing (HPC) systems, supported by an integrated CMake build system for straightforward compilation and execution. The aim is to harness the capabilities of emerging (post)exascale systems, which necessitate concurrent hardware and software development — a concept known as co-design. The mini-app suite serves as a tool for profiling and benchmarking, providing insights that can guide both software optimisation and hardware design. Ultimately, these developments will enable more accurate and efficient simulations of novel materials, leveraging the full potential of exascale computing in material science research.
37

Mas Magre, Isidre, Rogeli Grima Torres, José María Cela Espín, and Julio Gutierrez Moreno. "The NOMAD mini-apps: A suite of kernels from ab initio electronic structure codes enabling co-design in high-performance computing." Open Research Europe 4 (February 19, 2024): 35. http://dx.doi.org/10.12688/openreseurope.16920.1.

Abstract:
This article introduces a suite of mini-applications (mini-apps) designed to optimise computational kernels in ab initio electronic structure codes. The suite is developed from flagship applications participating in the NOMAD Center of Excellence, such as the ELPA eigensolver library and the GW implementations of the exciting, Abinit, and FHI-aims codes. The mini-apps were identified by targeting functions that significantly contribute to the total execution time in the parent applications. This strategic selection allows for concentrated optimisation efforts. The suite is designed for easy deployment on various High-Performance Computing (HPC) systems, supported by an integrated CMake build system for straightforward compilation and execution. The aim is to harness the capabilities of emerging (post)exascale systems, which necessitate concurrent hardware and software development — a concept known as co-design. The mini-app suite serves as a tool for profiling and benchmarking, providing insights that can guide both software optimisation and hardware design. Ultimately, these developments will enable more accurate and efficient simulations of novel materials, leveraging the full potential of exascale computing in material science research.
38

Mas Magre, Isidre, Rogeli Grima Torres, José María Cela Espín, and José Julio Gutierrez Moreno. "The NOMAD mini-apps: A suite of kernels from ab initio electronic structure codes enabling co-design in high-performance computing." Open Research Europe 4 (April 10, 2025): 35. https://doi.org/10.12688/openreseurope.16920.3.

Abstract:
This article introduces a suite of mini-applications (mini-apps) designed to optimise computational kernels in ab initio electronic structure codes. The suite is developed from flagship applications participating in the NOMAD Center of Excellence, such as the ELPA eigensolver library and the GW implementations of the exciting, Abinit, and FHI-aims codes. The mini-apps were identified by targeting functions that significantly contribute to the total execution time in the parent applications. This strategic selection allows for concentrated optimisation efforts. The suite is designed for easy deployment on various High-Performance Computing (HPC) systems, supported by an integrated CMake build system for straightforward compilation and execution. The aim is to harness the capabilities of emerging (post)exascale systems, which necessitate concurrent hardware and software development — a concept known as co-design. The mini-app suite serves as a tool for profiling and benchmarking, providing insights that can guide both software optimisation and hardware design. Ultimately, these developments will enable more accurate and efficient simulations of novel materials, leveraging the full potential of exascale computing in material science research.
39

Xu, Teng, Sinan Xiao, Sebastian Reuschen, Nils Wildt, Harrie-Jan Hendricks Franssen, and Wolfgang Nowak. "Towards a community-wide effort for benchmarking in subsurface hydrological inversion: benchmarking cases, high-fidelity reference solutions, procedure, and first comparison." Hydrology and Earth System Sciences 28, no. 24 (2024): 5375–400. https://doi.org/10.5194/hess-28-5375-2024.

Abstract:
Inversion in subsurface hydrology refers to estimating spatial distributions of (typically hydraulic) properties often associated with quantified uncertainty. Many methods are available, each characterized by a set of assumptions, approximations, and numerical implementations. Only a few intercomparison studies have been performed (in the remote past) amongst different approaches (e.g., Zimmerman et al., 1998; Hendricks Franssen et al., 2009). These intercomparisons guarantee broad participation to push forward research efforts of the entire subsurface hydrological inversion community. However, from past studies until now, comparisons have been made among approximate methods without firm reference solutions. Note that the reference solutions are the best possible solutions with the best estimate and posterior standard deviation and so forth. Without reference solutions, one can only compare competing best estimates and their associated uncertainties in an intercomparison sense, and absolute statements on accuracy are unreachable. Our current initiative defines benchmarking scenarios for groundwater model inversion. These are targeted for community-wide use as test cases in intercomparison scenarios. Here, we develop five synthetic, open-source benchmarking scenarios for the inversion of hydraulic conductivity from pressure data. We also provide highly accurate reference solutions produced with massive high-performance computing efforts and with a high-fidelity Markov chain Monte Carlo (MCMC)-type solution algorithm. Our high-end reference solutions are publicly available along with the benchmarking scenarios, the reference algorithm, and the suggested benchmarking metrics. Thus, in comparison studies, one can test against high-fidelity reference solutions rather than discussing different approximations. To demonstrate how to use these benchmarking scenarios, reference solutions, and suggested metrics, we provide a blueprint comparison of a specific ensemble Kalman filter (EnKF) version. We invite the community to use our benchmarking scenarios and reference solutions now and into the far future in a community-wide effort towards clean and conclusive benchmarking. For now, we aim at an article collection in an appropriate journal, where such clean comparison studies can be submitted together with an editorial summary that provides an overview.
40

Resch, Salonik, and Ulya R. Karpuzcu. "Benchmarking Quantum Computers and the Impact of Quantum Noise." ACM Computing Surveys 54, no. 7 (2021): 1–35. http://dx.doi.org/10.1145/3464420.

Abstract:
Benchmarking is how the performance of a computing system is determined. Surprisingly, even for classical computers this is not a straightforward process. One must choose the appropriate benchmark and metrics to extract meaningful results. Different benchmarks test the system in different ways, and each individual metric may or may not be of interest. Choosing the appropriate approach is tricky. The situation is even more open ended for quantum computers, where there is a wider range of hardware, fewer established guidelines, and additional complicating factors. Notably, quantum noise significantly impacts performance and is difficult to model accurately. Here, we discuss benchmarking of quantum computers from a computer architecture perspective and provide numerical simulations highlighting challenges that suggest caution.
41

Chen, Xinyu, Jiannan Tian, Ian Beaver, et al. "FCBench: Cross-Domain Benchmarking of Lossless Compression for Floating-Point Data." Proceedings of the VLDB Endowment 17, no. 6 (2024): 1418–31. http://dx.doi.org/10.14778/3648160.3648180.

Abstract:
While both the database and high-performance computing (HPC) communities utilize lossless compression methods to minimize floating-point data size, a disconnect persists between them. Each community designs and assesses methods in a domain-specific manner, making it unclear if HPC compression techniques can benefit database applications or vice versa. With the HPC community increasingly leaning towards in-situ analysis and visualization, more floating-point data from scientific simulations are being stored in databases like Key-Value Stores and queried using in-memory retrieval paradigms. This trend underscores the urgent need for a collective study of these compression methods' strengths and limitations, not only based on their performance in compressing data from various domains but also on their runtime characteristics. Our study extensively evaluates the performance of eight CPU-based and five GPU-based compression methods developed by both communities, using 33 real-world datasets assembled in the Floating-point Compressor Benchmark (FCBench). Additionally, we utilize the roofline model to profile their runtime bottlenecks. Our goal is to offer insights into these compression methods that could assist researchers in selecting existing methods or developing new ones for integrated database and HPC applications.
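The basic metrics such a benchmark reports (compression ratio and throughput) can be illustrated with a general-purpose codec from the standard library; this stand-in is not one of the specialized floating-point compressors evaluated in FCBench.

```python
# Illustrative stand-in only: FCBench evaluates specialized float compressors,
# but the basic metrics (compression ratio and throughput) can be shown with a
# general-purpose lossless codec from the standard library.
import time
import zlib
import numpy as np

data = np.cumsum(np.random.randn(2_000_000)).astype(np.float64)  # smooth-ish series
raw = data.tobytes()

start = time.perf_counter()
compressed = zlib.compress(raw, 6)
elapsed = time.perf_counter() - start

ratio = len(raw) / len(compressed)
throughput = len(raw) / elapsed / 2**20                          # MiB/s
restored = np.frombuffer(zlib.decompress(compressed), dtype=np.float64)
assert restored.shape == data.shape                               # lossless round trip
print(f"ratio {ratio:.2f}x, compression throughput {throughput:.0f} MiB/s")
```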
42

Liu, Zhengchun, Rajkumar Kettimuthu, Joaquin Chung, Rachana Ananthakrishnan, Michael Link, and Ian Foster. "Design and Evaluation of a Simple Data Interface for Efficient Data Transfer across Diverse Storage." ACM Transactions on Modeling and Performance Evaluation of Computing Systems 6, no. 1 (2021): 1–25. http://dx.doi.org/10.1145/3452007.

Abstract:
Modern science and engineering computing environments often feature storage systems of different types, from parallel file systems in high-performance computing centers to object stores operated by cloud providers. To enable easy, reliable, secure, and performant data exchange among these different systems, we propose Connector, a plug-able data access architecture for diverse, distributed storage. By abstracting low-level storage system details, this abstraction permits a managed data transfer service (Globus, in our case) to interact with a large and easily extended set of storage systems. Equally important, it supports third-party transfers: that is, direct data transfers from source to destination that are initiated by a third-party client but do not engage that third party in the data path. The abstraction also enables management of transfers for performance optimization, error handling, and end-to-end integrity. We present the Connector design, describe implementations for different storage services, evaluate tradeoffs inherent in managed vs. direct transfers, motivate recommended deployment options, and propose a model-based method that allows for easy characterization of performance in different contexts without exhaustive benchmarking.
43

Chia, Harmon Lee Bruce. "Quantum computing and its revolutionary potential." Advances in Engineering Innovation 4, no. 1 (2023): 26–32. http://dx.doi.org/10.54254/2977-3903/4/2023022.

Abstract:
The rapid emergence of quantum computing offers the potential to revolutionize numerous domains, promising computational advantages over classical counterparts. This study aimed to evaluate the performance, efficiency, and robustness of selected quantum algorithms, namely the Quantum Variational Eigensolver (VQE), the Quantum Fourier Transform (QFT), and Quantum Phase Estimation (QPE), on near-term quantum devices. Our benchmarking revealed that, despite promising theoretical benefits, the practical deployment of these algorithms remains challenged by noise, error rates, and hardware limitations. The VQE showed promise in molecular modeling, while the utility of QFT and QPE in cryptography and optimization became evident. Nevertheless, their practical efficiency is contingent upon specific quantum hardware and employed error mitigation techniques. The findings underscore the transformative potential of quantum computing, but also emphasize the ongoing challenges that need addressing to make quantum computing practically advantageous.
44

Caba, Julián, María Díaz, Jesús Barba, Raúl Guerra, Jose A. de la Torre, and Sebastián López. "FPGA-Based On-Board Hyperspectral Imaging Compression: Benchmarking Performance and Energy Efficiency against GPU Implementations." Remote Sensing 12, no. 22 (2020): 3741. http://dx.doi.org/10.3390/rs12223741.

Abstract:
Remote-sensing platforms, such as Unmanned Aerial Vehicles, are characterized by limited power budget and low-bandwidth downlinks. Therefore, handling hyperspectral data in this context can jeopardize the operational time of the system. FPGAs have been traditionally regarded as the most power-efficient computing platforms. However, there is little experimental evidence to support this claim, which is especially critical since the actual behavior of the solutions based on reconfigurable technology is highly dependent on the type of application. In this work, a highly optimized implementation of an FPGA accelerator of the novel HyperLCA algorithm has been developed and thoughtfully analyzed in terms of performance and power efficiency. In this regard, a modification of the aforementioned lossy compression solution has also been proposed to be efficiently executed into FPGA devices using fixed-point arithmetic. Single and multi-core versions of the reconfigurable computing platforms are compared with three GPU-based implementations of the algorithm on as many NVIDIA computing boards: Jetson Nano, Jetson TX2 and Jetson Xavier NX. Results show that the single-core version of our FPGA-based solution fulfils the real-time requirements of a real-life hyperspectral application using a mid-range Xilinx Zynq-7000 SoC chip (XC7Z020-CLG484). Performance levels of the custom hardware accelerator are above the figures obtained by the Jetson Nano and TX2 boards, and power efficiency is higher for smaller sizes of the image block to be processed. To close the performance gap between our proposal and the Jetson Xavier NX, a multi-core version is proposed. The results demonstrate that a solution based on the use of various instances of the FPGA hardware compressor core achieves similar levels of performance than the state-of-the-art GPU, with better efficiency in terms of processed frames by watt.
45

Grenier, Antoine, Jie Lei, Hans Jakob Damsgaard, et al. "Hard SyDR: A Benchmarking Environment for Global Navigation Satellite System Algorithms." Sensors 24, no. 2 (2024): 409. http://dx.doi.org/10.3390/s24020409.

Abstract:
A Global Navigation Satellite System (GNSS) is widely used today for both positioning and timing purposes. Many distinct receiver chips are available as Application-Specific Integrated Circuit (ASIC)s off-the-shelf, each tailored to the requirements of various applications. These chips deliver good performance and low energy consumption but offer customers little-to-no transparency about their internal features. This prevents modification, research in GNSS processing chain enhancement (e.g., application of Approximate Computing (AxC) techniques), and design space exploration to find the optimal receiver for a use case. In this paper, we review the GNSS processing chain using SyDR, our open-source GNSS Software-Defined Radio (SDR) designed for algorithm benchmarking, and highlight the limitations of a software-only environment. In return, we propose an evolution to our system, called Hard SyDR to become closer to the hardware layer and access new Key Performance Indicator (KPI)s, such as power/energy consumption and resource utilization. We use High-Level Synthesis (HLS) and the PYNQ platform to ease our development process and provide an overview of their advantages/limitations in our project. Finally, we evaluate the foreseen developments, including how this work can serve as the foundation for an exploration of AxC techniques in future low-power GNSS receivers.
APA, Harvard, Vancouver, ISO, and other styles
46

Kaklamani, Dimitra I., and Andy Marsh. "Benchmarking high-performance computing platforms in analyzing electrically large planar conducting structures via a parallel computed method of moments technique." Radio Science 31, no. 5 (1996): 1281–90. http://dx.doi.org/10.1029/96rs00582.

Full text
APA, Harvard, Vancouver, ISO, and other styles
47

Agarwal, Pankaj, and Kouros Owzar. "Next Generation Distributed Computing for Cancer Research." Cancer Informatics 13s7 (January 2014): CIN.S16344. http://dx.doi.org/10.4137/cin.s16344.

Full text
Abstract:
Advances in next generation sequencing (NGS) and mass spectrometry (MS) technologies have provided many new opportunities and angles for extending the scope of translational cancer research while creating tremendous challenges in data management and analysis. The resulting informatics challenge is invariably not amenable to the use of traditional computing models. Recent advances in scalable computing and associated infrastructure, particularly distributed computing for Big Data, can provide solutions for addressing these challenges. In this review, the next generation of distributed computing technologies that can address these informatics problems is described from the perspective of three key components of a computational platform, namely computing, data storage and management, and networking. A broad overview of scalable computing is provided to set the context for a detailed description of Hadoop, a technology that is being rapidly adopted for large-scale distributed computing. A proof-of-concept Hadoop cluster, set up for performance benchmarking of NGS read alignment, is described as an example of how to work with Hadoop. Finally, Hadoop is compared with a number of other current technologies for distributed computing.
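To illustrate the MapReduce pattern underlying such a read-alignment benchmark (this is a local toy simulation, not the authors' cluster setup), the following sketch counts aligned versus unaligned reads from SAM-style records; on a real Hadoop cluster, the map and reduce steps would run as Streaming jobs over HDFS-resident data:

    # Local simulation of a map/reduce job over SAM records (illustrative only).
    from collections import defaultdict

    # A few fake SAM records: field 2 is the bitwise FLAG; bit 0x4 means "unmapped".
    sam_lines = [
        "read1\t0\tchr1\t100\t60\t50M\t*\t0\t0\tACGT\tIIII",
        "read2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
        "read3\t16\tchr2\t500\t60\t50M\t*\t0\t0\tACGT\tIIII",
    ]

    def mapper(line):
        """Emit ('aligned', 1) or ('unaligned', 1) for each read."""
        flag = int(line.split("\t")[1])
        yield ("unaligned" if flag & 0x4 else "aligned", 1)

    def reducer(pairs):
        """Sum the counts per key, as a Hadoop reducer would."""
        totals = defaultdict(int)
        for key, count in pairs:
            totals[key] += count
        return dict(totals)

    print(reducer(kv for line in sam_lines for kv in mapper(line)))
    # Expected output: {'aligned': 2, 'unaligned': 1}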
APA, Harvard, Vancouver, ISO, and other styles
48

Valcke, Sophie, Andrea Piacentini, and Gabriel Jonville. "Benchmarking Regridding Libraries Used in Earth System Modelling." Mathematical and Computational Applications 27, no. 2 (2022): 31. http://dx.doi.org/10.3390/mca27020031.

Full text
Abstract:
Components of Earth system models (ESMs) usually use different numerical grids because of the different environments they represent. Therefore, a coupling field sent by a source model has to be regridded before it can be used by a target model. The regridding has to be accurate and, in some cases, conservative, in order to ensure the consistency of the coupled model. Here, we present work done to benchmark the quality of four regridding libraries currently used in ESMs, i.e., SCRIP, YAC, ESMF and XIOS. We evaluated five regridding algorithms for different combinations of six grids used in real ocean or atmosphere models, using four analytical functions to define the coupling fields to be regridded. The benchmark calculated some of the metrics proposed by the CANGA project, including the mean, maximum and RMS misfit and the global conservation. The results show that, besides a few very specific cases that present anomalous values, the regridding functionality in YAC, ESMF and XIOS can be considered of high quality and does not present the specific problems observed for the conservative SCRIP remapping. The evaluation of the computing performance of those libraries is not included in the current work but is planned for the coming months. This exercise shows that benchmarking can be a great opportunity to favour interactions between users and developers of regridding libraries.
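As a worked illustration of the reported metrics (the field values and cell areas below are placeholders, not benchmark data), a short numpy sketch computing the mean, maximum and RMS misfit together with a global-conservation check might look like this:

    # Illustrative computation of regridding-quality metrics of the kind reported
    # in such benchmarks. All arrays are made-up placeholders.
    import numpy as np

    f_target_analytic = np.array([1.00, 0.80, 0.60, 0.40])   # analytical function on target grid
    f_target_regrid   = np.array([0.98, 0.82, 0.61, 0.39])   # field after regridding (assumed)
    area_target       = np.array([1.0, 1.0, 1.0, 1.0])       # target cell areas (assumed)

    f_source          = np.array([0.95, 0.65])                # field on source grid (assumed)
    area_source       = np.array([2.0, 2.0])                  # source cell areas (assumed)

    misfit = f_target_regrid - f_target_analytic
    print(f"mean misfit: {misfit.mean():+.4f}")
    print(f"max  misfit: {np.abs(misfit).max():.4f}")
    print(f"RMS  misfit: {np.sqrt(np.mean(misfit ** 2)):.4f}")

    # Global conservation: area-weighted integrals should match for a conservative scheme.
    src_integral = np.sum(f_source * area_source)
    dst_integral = np.sum(f_target_regrid * area_target)
    print(f"relative conservation error: {abs(dst_integral - src_integral) / abs(src_integral):.3e}")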
APA, Harvard, Vancouver, ISO, and other styles
49

Sonuga, Ayodele Emmanuel, Kingsley David Onyewuchi Ofoegbu, Chidiebere Somadina Ike, and Samuel Olaoluwa Folorunsho. "Deploying large language models on diverse computing architectures: A performance evaluation framework." Global Journal of Research in Engineering and Technology 2, no. 1 (2024): 018–36. http://dx.doi.org/10.58175/gjret.2024.2.1.0026.

Full text
Abstract:
Deploying large language models (LLMs) across diverse computing architectures is a critical challenge in the field of artificial intelligence, particularly as these models become increasingly complex and resource-intensive. This review presents a performance evaluation framework designed to systematically assess the deployment of LLMs on various computing architectures, including CPUs, GPUs, TPUs, and specialized accelerators. The framework is structured around key performance metrics such as computational efficiency, latency, throughput, energy consumption, and scalability. It considers the trade-offs associated with different hardware configurations, optimizing the deployment to meet specific application requirements. The evaluation framework employs a multi-faceted approach, integrating both theoretical and empirical analyses to offer comprehensive insights into the performance dynamics of LLMs. This includes benchmarking LLMs under varying workloads, data batch sizes, and precision levels, enabling a nuanced understanding of how these factors influence model performance across different hardware environments. Additionally, the framework emphasizes the importance of model parallelism and distribution strategies, which are critical for efficiently scaling LLMs on high-performance computing clusters. A significant contribution of this framework is its ability to guide practitioners in selecting the optimal computing architecture for LLM deployment based on application-specific needs, such as low-latency inference for real-time applications or energy-efficient processing for large-scale deployments. The framework also provides insights into cost-performance trade-offs, offering guidance for balancing the financial implications of different deployment strategies with their performance benefits. Overall, this performance evaluation framework is a valuable tool for researchers and engineers, facilitating the efficient deployment of LLMs on diverse computing architectures. By offering a systematic approach to evaluating and optimizing LLM performance, the framework supports the ongoing development and application of these models across various domains. Specifically, the paper evaluates the deployment of LLMs on x86, ARM, and RISC-V platforms and discusses strategies for optimizing LLM performance, such as dynamic frequency scaling, core scaling, and memory optimization, contributing to an understanding of best practices for deploying AI applications on different architectures and supporting technological innovation and competitiveness.
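As a hedged illustration of the kind of measurement loop such a framework implies (the model call is a stand-in, and the batch sizes, warm-up and iteration counts are assumptions), a backend-agnostic latency/throughput sweep over batch sizes might look like this:

    # Backend-agnostic sketch of a latency/throughput sweep over batch sizes.
    # `run_inference` is a placeholder for an actual model call (PyTorch, TensorRT,
    # ONNX Runtime, etc.); the sleep merely fakes work so the script is runnable.
    import time

    def run_inference(batch_size: int) -> None:
        """Stand-in for a forward pass; cost grows with batch size (assumed model)."""
        time.sleep(0.002 * batch_size)

    def benchmark(batch_sizes=(1, 4, 16, 64), warmup=3, iters=10):
        results = []
        for bs in batch_sizes:
            for _ in range(warmup):                # warm-up runs excluded from timing
                run_inference(bs)
            start = time.perf_counter()
            for _ in range(iters):
                run_inference(bs)
            elapsed = time.perf_counter() - start
            latency_ms = 1000.0 * elapsed / iters
            throughput = bs * iters / elapsed      # samples processed per second
            results.append((bs, latency_ms, throughput))
        return results

    for bs, lat, thr in benchmark():
        print(f"batch={bs:3d}  latency={lat:7.2f} ms  throughput={thr:8.1f} samples/s")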
APA, Harvard, Vancouver, ISO, and other styles
50

Ponzetto, S. P., and M. Strube. "Knowledge Derived From Wikipedia For Computing Semantic Relatedness." Journal of Artificial Intelligence Research 30 (October 10, 2007): 181–212. http://dx.doi.org/10.1613/jair.2308.

Full text
Abstract:
Wikipedia provides a semantic network for computing semantic relatedness in a more structured fashion than a search engine and with more coverage than WordNet. We present experiments on using Wikipedia for computing semantic relatedness and compare it to WordNet on various benchmarking datasets. Existing relatedness measures perform better using Wikipedia than a baseline given by Google counts, and we show that Wikipedia outperforms WordNet on some datasets. We also address the question whether and how Wikipedia can be integrated into NLP applications as a knowledge base. Including Wikipedia improves the performance of a machine learning based coreference resolution system, indicating that it represents a valuable resource for NLP applications. Finally, we show that our method can be easily used for languages other than English by computing semantic relatedness for a German dataset.
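As a toy illustration of path-based relatedness over a category graph of the kind Wikipedia provides (the mini-graph and the inverse path-length measure are deliberate simplifications, not the specific measures evaluated in the paper):

    # Toy category graph as an adjacency list (made-up subset, for illustration only).
    from collections import deque

    graph = {
        "Computing": ["Computer hardware", "Software"],
        "Computer hardware": ["Computing", "Graphics processing units"],
        "Software": ["Computing", "Operating systems"],
        "Graphics processing units": ["Computer hardware"],
        "Operating systems": ["Software"],
    }

    def path_length(start, goal):
        """Breadth-first search for the shortest path length between two categories."""
        seen, queue = {start}, deque([(start, 0)])
        while queue:
            node, dist = queue.popleft()
            if node == goal:
                return dist
            for nxt in graph[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))
        return None

    def relatedness(a, b):
        """Simple inverse path-length relatedness (1.0 when the categories coincide)."""
        d = path_length(a, b)
        return None if d is None else 1.0 / (1.0 + d)

    print(relatedness("Graphics processing units", "Operating systems"))  # -> 0.2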
APA, Harvard, Vancouver, ISO, and other styles