Dissertations / Theses on the topic 'Sparse matrix multiplication'
Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles
Consult the top 25 dissertations / theses for your research on the topic 'Sparse matrix multiplication.'
Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.
Browse dissertations / theses in a wide variety of disciplines and organise your bibliography correctly.
Kunchum, Rakshith. "On Improving Sparse Matrix-Matrix Multiplication on GPUs." The Ohio State University, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=osu1492694387445938.
Ashari, Arash. "Sparse Matrix-Vector Multiplication on GPU." The Ohio State University, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=osu1417770100.
Ramachandran, Shridhar. "Incremental PageRank acceleration using Sparse Matrix-Sparse Vector Multiplication." The Ohio State University, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=osu1462894358.
Balasubramanian, Deepan Karthik. "Efficient Sparse Matrix Vector Multiplication for Structured Grid Representation." The Ohio State University, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=osu1339730490.
Thumma, Vineeth Reddy. "Optimizing Sparse Matrix-Matrix Multiplication for Graph Computations on GPUs and Multi-Core Systems." The Ohio State University, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=osu1524113772955789.
Mansour, Ahmad [author]. "Sparse Matrix-Vector Multiplication Based on Network-on-Chip / Ahmad Mansour." München: Verlag Dr. Hut, 2015. http://d-nb.info/1075409470/34.
Singh, Kunal. "High-Performance Sparse Matrix-Multi Vector Multiplication on Multi-Core Architecture." The Ohio State University, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=osu1524089757826551.
El-Kurdi, Yousef M. "Sparse Matrix-Vector floating-point multiplication with FPGAs for finite element electromagnetics." Thesis, McGill University, 2006. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=98958.
Godwin, Jeswin Samuel. "High-Performance Sparse Matrix-Vector Multiplication on GPUs for Structured Grid Computations." The Ohio State University, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=osu1357280824.
Kuang, Da. "Nonnegative matrix factorization for clustering." Diss., Georgia Institute of Technology, 2014. http://hdl.handle.net/1853/52299.
Boyer, Brice. "Multiplication matricielle efficace et conception logicielle pour la bibliothèque de calcul exact LinBox" [Efficient matrix multiplication and software design for the LinBox exact computation library]. PhD thesis, Université de Grenoble, 2012. http://tel.archives-ouvertes.fr/tel-00767915.
Niu, Qingpeng. "Characterization and Enhancement of Data Locality and Load Balancing for Irregular Applications." The Ohio State University, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=osu1420811652.
Gonon, Antoine. "Harnessing symmetries for modern deep learning challenges: a path-lifting perspective." Electronic Thesis or Diss., Lyon, École normale supérieure, 2024. http://www.theses.fr/2024ENSL0043.
Neural networks have demonstrated impressive practical success, but the theoretical tools for analyzing them are often limited to simple cases that do not capture the complexity of real-world applications. This thesis seeks to narrow this gap by making theoretical tools more applicable to practical scenarios.

The first focus of this work is generalization: can a given network perform well on previously unseen data? This thesis improves generalization guarantees based on the path-norm and extends their applicability to ReLU networks incorporating pooling or skip connections. By reducing the gap between theoretically analyzable networks and those used in practice, this work provides the first empirical evaluation of these guarantees on practical ReLU networks, such as ResNets.

The second focus is resource optimization (time, energy, memory). This thesis introduces a novel pruning method based on the path-norm which not only retains the accuracy of traditional magnitude pruning but is also robust to parameter symmetries. Additionally, this work presents a new GPU matrix multiplication algorithm that advances the state of the art for sparse matrices with Kronecker-structured support, achieving gains in both time and energy. Finally, this thesis makes approximation guarantees for neural networks more concrete by establishing sufficient bit-precision conditions to ensure that quantized networks maintain the same approximation speed as their unconstrained real-weight counterparts.
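The Kronecker-structured multiplication mentioned in the abstract rests on a classical identity: for an m×n matrix A and a p×q matrix B, (A ⊗ B) vec(X) = vec(B X Aᵀ), so the product can be applied without ever materializing A ⊗ B. The following minimal dense C sketch illustrates only this underlying algebra, not the thesis's sparse GPU algorithm; all names are illustrative:

    #include <stddef.h>

    /* y = (A kron B) * x, computed via (A kron B) vec(X) = vec(B X A^T).
       A: m x n, B: p x q, both row-major; x = vec(X) column-major with
       X of size q x n; tmp: p*n scratch; y: length p*m. */
    void kron_matvec(size_t m, size_t n, size_t p, size_t q,
                     const double *A, const double *B,
                     const double *x, double *tmp, double *y)
    {
        /* tmp = B * X, where X(k,j) = x[j*q + k] */
        for (size_t i = 0; i < p; i++)
            for (size_t j = 0; j < n; j++) {
                double s = 0.0;
                for (size_t k = 0; k < q; k++)
                    s += B[i*q + k] * x[j*q + k];
                tmp[i*n + j] = s;
            }
        /* y = vec(tmp * A^T), stored column-major: y[c*p + r] = Z(r,c) */
        for (size_t r = 0; r < p; r++)
            for (size_t c = 0; c < m; c++) {
                double s = 0.0;
                for (size_t k = 0; k < n; k++)
                    s += tmp[r*n + k] * A[c*n + k];
                y[c*p + r] = s;
            }
    }

This costs O(pqn + mnp) operations instead of the O(mnpq) required when A ⊗ B is formed explicitly.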
Hong, Changwan. "Code Optimization on GPUs." The Ohio State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=osu1557123832601533.
Full textTsai, Minzong, and 蔡旻容. "Implementing Simple Parallel Sparse Matrix-Matrix Multiplication Using OpenMP." Thesis, 2012. http://ndltd.ncl.edu.tw/handle/69501441604898008038.
Providence University
In-service Master's Program in Informatics
Academic year 100 (2011-12)
Parallel programming has recently become more popular with the current trend toward multi-core technology. In this thesis, we design a simple parallel sparse matrix-matrix multiplication using common sparse formats. We analyze the sparse matrix data structures CRS (Compressed Row Storage) and CCS (Compressed Column Storage) in order to eliminate unnecessary arithmetic operations. Our algorithms are written in C, and the parallel versions are implemented with OpenMP. We present four parallel sparse matrix multiplication algorithms, named CRSxCRS, CRSxCCS, CCSxCCS, and CCSxCRS, and run them on a Dell 6950. Finally, we compare their performance. The preliminary experimental results show that CRSxCRS and CCSxCCS perform best among the four algorithms.
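Of the four variants, CRSxCCS is the most direct to sketch: each entry C(i,j) is the dot product of a compressed row of A and a compressed column of B, computed by merging two sorted index lists. A hedged C/OpenMP sketch, assuming zero-based arrays with indices sorted within each row and column (names are illustrative, not the thesis's code):

    /* C = A * B, with A in CRS (a_ptr/a_idx/a_val per row) and B in
       CCS (b_ptr/b_idx/b_val per column); C is dense n x k, row-major. */
    void spmm_crs_ccs(int n, int k,
                      const int *a_ptr, const int *a_idx, const double *a_val,
                      const int *b_ptr, const int *b_idx, const double *b_val,
                      double *C)
    {
        #pragma omp parallel for schedule(dynamic)
        for (int i = 0; i < n; i++)
            for (int j = 0; j < k; j++) {
                int p = a_ptr[i], pe = a_ptr[i + 1];
                int q = b_ptr[j], qe = b_ptr[j + 1];
                double s = 0.0;
                while (p < pe && q < qe) {            /* two-pointer merge */
                    if (a_idx[p] < b_idx[q])      p++;
                    else if (a_idx[p] > b_idx[q]) q++;
                    else s += a_val[p++] * b_val[q++];
                }
                C[i * k + j] = s;
            }
    }

The merge skips every pairing in which one operand is zero, which is how the unnecessary arithmetic the abstract mentions is avoided.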
Wu, Xiaolong. "Optimizing Sparse Matrix-Matrix Multiplication on a Heterogeneous CPU-GPU Platform." 2015. http://scholarworks.gsu.edu/cs_theses/83.
Full textLin, Yi-sheng, and 林于勝. "The Design of a NoC-Based Sparse Matrix Multiplication System." Thesis, 2012. http://ndltd.ncl.edu.tw/handle/17795356751665658959.
National Taiwan University of Science and Technology
Department of Electronic Engineering
Academic year 100 (2011-12)
With the advance of deep-submicron technologies, a huge number of IP blocks can be integrated into a single chip. However, the bus-based architecture becomes a performance bottleneck for multicore SoCs. In this thesis, a NoC-based architecture for sparse matrix multiplication is proposed to improve the performance of parallel computation. The architecture consists of routers, network interfaces, and processing elements. Our router is made efficient through pipelining and wormhole switching with the XY routing algorithm. We also present a mapping and partitioning method for large matrices that improves load balancing and the efficiency of packet distribution. In addition, the mesh-based network is fully parameterized, making it flexible. Various sizes of the NoC-based architecture have been implemented, including 2×2, 2×4, 4×4, 4×8, and 8×8, on a Xilinx Virtex 5 FPGA and with the TSMC 0.18 um cell library. In the FPGA implementation, the performance of our design is evaluated on a number of random and real-application matrices, and the effects of network size, matrix size, and sparsity on system performance are examined. Compared with MicroBlaze and Intel processors, our design achieves up to 40x and 2x speedup, respectively. In the ASIC implementation, the core area of the 4×4 NoC architecture is 1,986.5 um × 1,985.4 um, equivalent to 259,026 gates. The average power consumption is 417 mW at an operating frequency of 166 MHz.
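The XY routing the router relies on is simple enough to state in a few lines: a packet first travels along the X dimension until its column matches the destination, then along Y, which yields deterministic, deadlock-free routes on a mesh. An illustrative C sketch (port names and coordinate orientation are assumptions, not the thesis's RTL):

    typedef enum { LOCAL, EAST, WEST, NORTH, SOUTH } port_t;

    /* Next output port under XY routing on a 2D mesh. */
    port_t xy_route(int cur_x, int cur_y, int dst_x, int dst_y)
    {
        if (dst_x > cur_x) return EAST;   /* correct the X coordinate first */
        if (dst_x < cur_x) return WEST;
        if (dst_y > cur_y) return NORTH;  /* then the Y coordinate */
        if (dst_y < cur_y) return SOUTH;
        return LOCAL;                     /* arrived: deliver to the attached PE */
    }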
Tsai, Sung-Han, and 蔡松翰. "Optimization for sparse matrix-vector multiplication based on NVIDIA CUDA platform." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/qw23p7.
National Changhua University of Education
Department of Computer Science and Information Engineering
Academic year 105 (2016-17)
In recent years, large sparse matrices have been widely used in fields such as science and engineering, typically in computations involving linear models. Storing a sparse matrix in the ELLPACK format reduces its storage space, but if some row of the original matrix contains too many nonzero elements, the format still wastes a great deal of memory. Much research has focused on sparse matrix-vector multiplication (SpMV) with the ELLPACK format on the graphics processing unit (GPU). The purpose of our research is instead to reduce the amount of matrix data accessed when the matrix is stored in the Compressed Sparse Row (CSR) format after reordering with the Reverse Cuthill-McKee (RCM) algorithm, in order to accelerate SpMV on the GPU. Because SpMV performs little computation per element accessed, its performance is restricted by memory bandwidth. Our approach, based on the CSR format, has two aspects: (1) reduce cache misses to enhance the locality of vector accesses and raise performance, and (2) reduce the matrix data accessed through index reduction.
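The kernel in question is the standard CSR SpMV, whose gather from x through the column-index array is the irregular, bandwidth-bound part; RCM reordering narrows the matrix bandwidth so that those indices cluster near the diagonal and the corresponding x entries stay cache-resident. A minimal C sketch of the kernel (array names are conventional, not the author's):

    /* y = A * x with A in CSR format. The x[col_idx[k]] gather is the
       part whose locality RCM reordering improves. */
    void spmv_csr(int n, const int *row_ptr, const int *col_idx,
                  const double *val, const double *x, double *y)
    {
        for (int i = 0; i < n; i++) {
            double s = 0.0;
            for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++)
                s += val[k] * x[col_idx[k]];
            y[i] = s;
        }
    }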
Ramesh, Chinthala. "Hardware-Software Co-Design Accelerators for Sparse BLAS." Thesis, 2017. http://etd.iisc.ac.in/handle/2005/4276.
Piccolo, Alessandro, and Johan Soodla. "Performance of parallel sparse matrix-matrix multiplication." Thesis, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-255134.
Full textJheng, Hong-Yuan, and 鄭弘元. "FPGA Acceleration of Sparse Matrix-Vector Multiplication Based on Network-on-Chip." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/y884tf.
National Taiwan University of Science and Technology
Department of Electronic Engineering
Academic year 99 (2010-11)
Sparse Matrix-Vector Multiplication (SMVM) is a pervasive operation in many scientific and engineering applications. Moreover, SMVM is a computationally intensive operation that dominates the performance of most iterative linear system solvers. Computations involving SMVM pose optimization challenges due to its high memory access rate and irregular memory access pattern. In this thesis, a new design concept for SMVM on an FPGA using a Network-on-Chip (NoC) is presented. In traditional circuit design, on-chip communication has been implemented with dedicated point-to-point interconnections or shared buses, so regular data transfer is the major concern of many parallel implementations. When dealing with the SMVM operation, however, the required data transfers usually depend on the sparsity structure of the matrix and can be extremely irregular. Using an NoC architecture makes it possible to deal with arbitrarily structured data transfers, i.e. with arbitrarily structured sparse matrices. In addition, the size of the pipelined SMVM calculator based on the NoC architecture can be customized to 2×2, 4×4, ..., p×p (p∈N) thanks to its high scalability and flexibility. The implementation is done in IEEE-754 single-precision floating point on the Xilinx Virtex-6 FPGA. The experimental results show that the proposed NoC-based implementation achieves approximately a 2.3-5.6x speedup over the MATLAB-based software implementation on Matrix Market benchmark applications.
Hsu, Wei-chun, and 徐偉郡. "Sparse Matrix-Vector Multiplication: A Low Communication Cost Data Mapping-Based Architecture." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/09761233547687794389.
National Taiwan University of Science and Technology
Department of Electronic Engineering
Academic year 103 (2014-15)
The performance of sparse matrix-vector multiplication (SMVM) on a parallel system is strongly conditioned by the distribution of data among its components. Two costs arise from the chosen data mapping method: arithmetic and communication. The communication cost of an algorithm often dominates its arithmetic cost, and the gap between the two tends to increase, so finding a mapping method that reduces the communication cost is of high importance. On the other hand, the load balance among the processing units must not be sacrificed either. In this thesis, a data mapping method is proposed for SMVM on a Network-on-Chip (NoC) which achieves a balanced workload and reduces the communication cost. An FPGA-based architecture designed to fit the proposed data mapping method is then introduced. The experimental results show that the communication cost of the proposed design is 40% lower than that of the previous work.
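A common way to obtain the balanced workload such a mapping targets is to split rows so that every processing element receives roughly the same number of nonzeros, which the CSR row pointer yields directly as a prefix sum. A hedged C sketch of such a partitioner (an illustration of the load-balancing goal only; the thesis's actual mapping additionally accounts for NoC communication):

    /* Split n rows into nparts so each part holds about nnz/nparts
       nonzeros. row_ptr is the CSR row pointer (row_ptr[n] == nnz);
       rows part_start[p] .. part_start[p+1]-1 belong to part p. */
    void balance_rows(int n, const int *row_ptr, int nparts, int *part_start)
    {
        long nnz = row_ptr[n];
        int row = 0;
        part_start[0] = 0;
        for (int p = 1; p < nparts; p++) {
            long target = nnz * p / nparts;   /* ideal nonzero boundary */
            while (row < n && row_ptr[row] < target)
                row++;
            part_start[p] = row;
        }
        part_start[nparts] = n;
    }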
Tsai, Nian-Ying, and 蔡念穎. "On Job Allocation Strategies for Running Sparse Matrix-Vector Multiplication on GPUs." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/mpwmh4.
National Changhua University of Education
Department of Computer Science and Information Engineering
Academic year 105 (2016-17)
In the era of big data, the graphics processing unit (GPU) has been widely used for many parallelization problems as the amount of data to be processed keeps growing. Sparse matrix-vector multiplication (SpMV) is a basic and important operation in many fields, and there is still considerable room for improving its performance on GPUs. This thesis concerns job allocation strategies for running SpMV on GPUs. The LightSpMV algorithm is based on the standard CSR format, a common sparse matrix storage format that is more flexible than the alternatives. LightSpMV uses two dynamic scheduling methods, distributing matrix rows at either vector or warp granularity; both obtain row indices through atomic operations. Because these atomic operations consume too much execution time, we propose three strategies for this part of the workload allocation: (1) using the warp as the basic unit and doubling the number of rows fetched at each allocation, so that the number of atomic operations is reduced; (2) using the block as the basic unit with rows allocated dynamically, which needs fewer atomic operations than warp-level dynamic allocation; and (3) using the block as the basic unit with a static assignment of rows to blocks, inside which warps are again the basic unit and are scheduled dynamically without atomic operations. In experiments on a GTX 980 GPU, the third strategy gave the best performance, with an improvement of nearly 100%.
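Strategy (1) amortizes the shared row counter: instead of paying one atomic per row, each worker fetches a chunk of rows and doubles the chunk size after every fetch, so the number of atomics per worker drops from linear to logarithmic in the row count. LightSpMV itself does this with CUDA atomics at warp granularity; the following C11/OpenMP sketch is only a CPU analogue of the idea, with illustrative names:

    #include <stdatomic.h>

    /* Stand-in for the per-row SpMV work (illustrative only). */
    static void process_row(int row) { (void)row; }

    /* Dynamic self-scheduling with a doubling chunk size: each grab
       costs one atomic, and chunks grow 1, 2, 4, ... up to a cap. */
    static void run_rows(int n)
    {
        atomic_int next = 0;
        #pragma omp parallel
        {
            int chunk = 1;
            for (;;) {
                int start = atomic_fetch_add(&next, chunk);
                if (start >= n) break;
                int end = start + chunk < n ? start + chunk : n;
                for (int r = start; r < end; r++)
                    process_row(r);
                if (chunk < 1024) chunk *= 2;  /* fewer atomics over time */
            }
        }
    }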
Lin, Ang-Hsuan, and 林昂萱. "Implementing OpenCL Sparse Matrix Multiplication and Transposition Using DIA Format on Intel HD Graphics 5000 mobile device." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/86833354306613661461.
Providence University
In-service Master's Program in Information Application and Technology Management
Academic year 105 (2016-17)
Sparse matrix-matrix multiplication (SpMM) is a basic operation in mathematics, linear algebra, and statistics. For many years, researchers have tried to enhance its performance using multi-threading, grids, and GPU (graphics processing unit) architectures; their achievements have led us to a new age of high-performance computing. In this thesis, we implement parallel sparse matrix multiplication and transposition in the DIA format using OpenCL, running on the Intel HD Graphics 5000, a mobile graphics chip built into the Intel Core i5-4260U CPU. Our experimental results show that speedups between 2 and 12 times can be achieved even though the device has comparatively few execution units. Hence, it is also beneficial to use a mobile device for scientific computing.
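DIA stores a matrix as a dense block of its nonzero diagonals plus one offset per diagonal, so a multiplication needs no per-element indices and reads x contiguously; this is what makes the format attractive for banded matrices on SIMD-style devices like the GPU used here. A plain-C sketch of DIA y = A*x (the thesis parallelizes in OpenCL; names here are illustrative):

    /* y = A * x, A given in DIA format: ndiag diagonals of an n x n
       matrix, with diag[d*n + i] holding A(i, i + offset[d]) and 0
       where that column falls outside the matrix. */
    void spmv_dia(int n, int ndiag, const int *offset,
                  const double *diag, const double *x, double *y)
    {
        for (int i = 0; i < n; i++) {
            double s = 0.0;
            for (int d = 0; d < ndiag; d++) {
                int j = i + offset[d];
                if (j >= 0 && j < n)
                    s += diag[d * n + i] * x[j];
            }
            y[i] = s;
        }
    }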
Mirza, Salma. "Scalable, Memory-Intensive Scientific Computing on Field Programmable Gate Arrays." 2010. https://scholarworks.umass.edu/theses/404.