Journal articles on the topic 'Network processors Computer architecture. Computer networks'

Author: Grafiati

Published: 4 June 2021

Last updated: 16 February 2022

Consult the top 50 journal articles for your research on the topic 'Network processors Computer architecture. Computer networks.'

1

OMONDI, AMOS R. "Letter to the Editor: NEUROCOMPUTERS: A DEAD END?" International Journal of Neural Systems 10, no. 06 (December 2000): 475–81. http://dx.doi.org/10.1142/s0129065700000375.

Abstract:

The last decade saw a proliferation of research into the design of neurocomputers. Although such work still continues, much of it never gets beyond the prototype-machine stage. In this paper, we argue that, on the whole, neurocomputers are no longer viable; like, say, database computers before them, their time has passed before they became a common reality. We consider the implementation of hardware neural networks, from the level of arithmetic to complete individual processors and parallel processors, and show that current trends in computer architecture and implementation are not supportive of a case for custom neurocomputers. We argue that in the future, neural-network processing ought to be mostly restricted to general-purpose processors or to processors that have been designed for other widely-used applications. There are just one or two, rather narrow, exceptions to this.

2

GONZALEZ, TEOFILO F. "Improved Communication Schedules with Buffers." Parallel Processing Letters 19, no. 01 (March 2009): 129–39. http://dx.doi.org/10.1142/s0129626409000110.

Abstract:

We consider the multimessage multicasting over the n-processor complete (or fully connected) static network when there are l incoming (message) buffers on every processor. We present an efficient algorithm to route the messages for every degree d problem instance in d^2/l + l - 1 total communication rounds, where d is the maximum number of messages that each processor may send (or receive). Our algorithm takes linear time with respect to the input length, i.e. O(n + q) where q is the total number of messages that all processors must receive. For l = d we present a lower bound for the total communication time. The lower bound matches the upper bound for the schedules generated by our algorithm. For convenience we assume that the network is completely connected. However, it is important to note that each communication round can be automatically translated into one communication round for processors interconnected via a replication network followed by a permutation network (e.g., two adjacent Benes networks), because in these networks all possible one-to-many communications can be performed in a single communication round.
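
As a worked reading of the round bound d^2/l + l - 1 quoted in this abstract, the sketch below evaluates it for a few buffer counts. The function name `round_bound` and its parameter names are hypothetical, and the expression is taken literally from the abstract using exact rational arithmetic; any rounding the paper applies when l does not divide d^2 is not modelled here.

```python
from fractions import Fraction


def round_bound(d: int, l: int) -> Fraction:
    """Evaluate d^2/l + l - 1, the round count stated in the abstract.

    d: maximum number of messages a processor may send or receive.
    l: number of incoming message buffers per processor.
    Both names are illustrative; the routing algorithm itself is not shown.
    """
    if d < 1 or l < 1:
        raise ValueError("d and l must be positive integers")
    return Fraction(d * d, l) + l - 1


if __name__ == "__main__":
    d = 4
    for l in (1, 2, 4):
        print(f"d={d}, l={l}: {round_bound(d, l)} rounds")
```

With l = d the expression collapses to 2d - 1, which is the case for which the abstract reports a matching lower bound.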

3

PETIT, FRANCK, and VINCENT VILLAIN. "OPTIMALITY AND SELF-STABILIZATION IN ROOTED TREE NETWORKS." Parallel Processing Letters 10, no. 01 (March 2000): 3–14. http://dx.doi.org/10.1142/s0129626400000032.

Abstract:

In this paper, we consider arbitrary tree networks where every processor, except one, called the root, executes the same program. We show that, to design a depth-first token circulation protocol in such networks, it is necessary to have at least [Formula: see text] configurations, where n is the number of processors in the network and Δ_i is the degree of processor p_i. We then propose a depth-first token circulation algorithm which matches the above minimal number of configurations. We show that the proposed algorithm is self-stabilizing, i.e., the system eventually recovers itself to a legitimate state after any perturbation modifying the state of the processors. Hence, the proposed algorithm is optimal in terms of the number of configurations and no extra cost is involved in making it stabilizing.

4

PETIT, FRANCK, and VINCENT VILLAIN. "OPTIMALITY AND SELF-STABILIZATION IN ROOTED TREE NETWORKS." Parallel Processing Letters 09, no. 03 (September 1999): 313–23. http://dx.doi.org/10.1142/s0129626499000293.

Abstract:

In this paper, we consider arbitrary tree networks where every processor, except one, called the root, executes the same program. We show that, to design a depth-first token circulation protocol in such networks, it is necessary to have at least [Formula: see text] configurations, where n is the number of processors in the network and Δ_i is the degree of processor p_i. We then propose a depth-first token circulation algorithm which matches the above minimal number of configurations. We show that the proposed algorithm is self-stabilizing, i.e., the system eventually recovers itself to a legitimate state after any perturbation modifying the state of the processors. Hence, the proposed algorithm is optimal in terms of the number of configurations and no extra cost is involved in making it stabilizing.

5

Summers, Kenneth L., Thomas Preston Caudell, Kathryn Berkbigler, Brian Bush, Kei Davis, and Steve Smith. "Graph Visualization for the Analysis of the Structure and Dynamics of Extreme-Scale Supercomputers." Information Visualization 3, no. 3 (July 8, 2004): 209–22. http://dx.doi.org/10.1057/palgrave.ivs.9500079.

Abstract:

We are exploring the development and application of information visualization techniques for the analysis of new massively parallel supercomputer architectures. Modern supercomputers typically comprise very large clusters of commodity SMPs interconnected by possibly dense and often non-standard networks. The scale, complexity, and inherent non-locality of the structure and dynamics of this hardware, and the operating systems and applications distributed over them, challenge traditional analysis methods. As part of the á la carte (A Los Alamos Computer Architecture Toolkit for Extreme-Scale Architecture Simulation) team at Los Alamos National Laboratory, who are simulating these new architectures, we are exploring advanced visualization techniques and creating tools to enhance analysis of these simulations with intuitive three-dimensional representations and interfaces. This work complements existing and emerging algorithmic analysis tools. In this paper, we give background on the problem domain, a description of a prototypical computer architecture of interest (on the order of 10,000 processors connected by a quaternary fat-tree communications network), and a presentation of three classes of visualizations that clearly display the switching fabric and the flow of information in the interconnecting network.

6

FERREIRA, A., A. GOLDMAN, and S. W. SONG. "BROADCASTING IN BUS INTERCONNECTION NETWORKS." Journal of Interconnection Networks 01, no. 02 (June 2000): 73–94. http://dx.doi.org/10.1142/s0219265900000068.

Abstract:

In most distributed memory MIMD multiprocessors, processors are connected by a point-to-point interconnection network, usually modeled by a graph where processors are nodes and communication links are edges. Since interprocessor communication frequently constitutes serious bottlenecks, several architectures were proposed that enhance point-to-point topologies with the help of multiple bus systems so as to improve the communication efficiency. In this paper we study parallel architectures where the communication means are constituted solely by buses. These architectures can use the power of bus technologies, providing a way to interconnect many more processors in a simple and efficient manner. We present the hyperpath, hypergrid, hyperring, and hypertorus architectures, which are the bus-based versions of the widely used point-to-point interconnection networks. Using (hyper) graph theoretic concepts to model inter-processor communication in such networks, we give optimal algorithms for broadcasting a message from one processor to all the others. For deriving high performance communication patterns we developed a new tool called simplification. The idea is to construct a graph, to be called representative graph, from the original hyper-topology, in such a way that it will become easy to describe and perform communication schemes to the former that will fit to the latter, because the simplification concept also allows us to partially use some already known communication algorithms for usual networks.

7

Sánchez Couso, José Ramón, José Angel Sanchez Martín, Victor Mitrana, and Mihaela Păun. "Simulations between Three Types of Networks of Splicing Processors." Mathematics 9, no. 13 (June 28, 2021): 1511. http://dx.doi.org/10.3390/math9131511.

Abstract:

Networks of splicing processors (NSP for short) embody a subcategory among the new computational models inspired by natural phenomena with theoretical potential to handle unsolvable problems efficiently. Current literature considers three variants in the context of networks managed by random-context filters. Despite the divergences on system complexity and control degree over the filters, the three variants were proved to hold the same computational power through the simulations of two computationally complete systems: Turing machines and 2-tag systems. However, the conversion between the three models by means of a Turing machine is unattainable because of the huge computational costs incurred. This research paper addresses this issue with the proposal of direct and efficient simulations between the aforementioned paradigms. The information about the nodes and edges (i.e., splicing rules, random-context filters, and connections between nodes) composing any network of splicing processors belonging to one of the three categories is used to design equivalent networks working under the other two models. We demonstrate that these new networks are able to replicate any computational step performed by the original network in a constant number of computational steps and, consequently, we prove that any outcome achieved by the original architecture can be accomplished by the constructed architectures without worsening the time complexity.

8

Ferreira de Lima, Thomas, Alexander N. Tait, Armin Mehrabian, Mitchell A. Nahmias, Chaoran Huang, Hsuan-Tung Peng, Bicky A. Marquez, et al. "Primer on silicon neuromorphic photonic processors: architecture and compiler." Nanophotonics 9, no. 13 (August 10, 2020): 4055–73. http://dx.doi.org/10.1515/nanoph-2020-0172.

Abstract:

Microelectronic computers have encountered challenges in meeting all of today’s demands for information processing. Meeting these demands will require the development of unconventional computers employing alternative processing models and new device physics. Neural network models have come to dominate modern machine learning algorithms, and specialized electronic hardware has been developed to implement them more efficiently. A silicon photonic integration industry promises to bring manufacturing ecosystems normally reserved for microelectronics to photonics. Photonic devices have already found simple analog signal processing niches where electronics cannot provide sufficient bandwidth and reconfigurability. In order to solve more complex information processing problems, they will have to adopt a processing model that generalizes and scales. Neuromorphic photonics aims to map physical models of optoelectronic systems to abstract models of neural networks. It represents a new opportunity for machine information processing on sub-nanosecond timescales, with application to mathematical programming, intelligent radio frequency signal processing, and real-time control. The strategy of neuromorphic engineering is to externalize the risk of developing computational theory alongside hardware. The strategy of remaining compatible with silicon photonics externalizes the risk of platform development. In this perspective article, we provide a rationale for a neuromorphic photonics processor, envisioning its architecture and a compiler. We also discuss how it can be interfaced with a general purpose computer, i.e. a CPU, as a coprocessor to target specific applications. This paper is intended for a wide audience and provides a roadmap for expanding research in the direction of transforming neuromorphic photonics into a viable and useful candidate for accelerating neuromorphic computing.

9

Wohl, Peter. "EFFICIENCY THROUGH REDUCED COMMUNICATION IN MESSAGE PASSING SIMULATION OF NEURAL NETWORKS." International Journal on Artificial Intelligence Tools 02, no. 01 (March 1993): 133–62. http://dx.doi.org/10.1142/s0218213093000096.

Abstract:

Neural algorithms require massive computation and very high communication bandwidth and are naturally expressed at a level of granularity finer than parallel systems can exploit efficiently. Mapping Neural Networks onto parallel computers has traditionally implied a form of clustering neurons and weights to increase the granularity. SIMD simulations may exceed a million connections per second using thousands of processors, but are often tailored to particular networks and learning algorithms. MIMD simulations required an even larger granularity to run efficiently and often trade flexibility for speed. An alternative technique based on pipelining fewer but larger messages through parallel “broadcast/accumulate trees” is explored. “Lazy” allocation of messages reduces communication and memory requirements, curbing excess parallelism at run time. The mapping is flexible to changes in network architecture and learning algorithm and is suited for a variety of computer configurations. The method pushes the limits of parallelizing backpropagation and feed-forward type algorithms. Results exceed a million connections per second already on 30 processors and are up to ten times superior to previous results on similar hardware. The implementation techniques can also be applied in conjunction with others, including systolic and VLSI.

10

Amodu, Oluwatosin Ahmed, Mohamed Othman, Nur Arzilawati Md Yunus, and Zurina Mohd Hanapi. "A Primer on Design Aspects and Recent Advances in Shuffle Exchange Multistage Interconnection Networks." Symmetry 13, no. 3 (February 26, 2021): 378. http://dx.doi.org/10.3390/sym13030378.

Abstract:

Interconnection networks provide an effective means by which components of a system such as processors and memory modules communicate to provide reliable connectivity. This facilitates the realization of a highly efficient network design suitable for computational-intensive applications. Particularly, the use of multistage interconnection networks has unique advantages as the addition of extra stages helps to improve the network performance. However, this comes with challenges and trade-offs, which motivates researchers to explore various design options and architectural models to improve on its performance. A particular class of these networks is shuffle exchange network (SEN) which involves a symmetric N-input and N-output architecture built in stages of N/2 switching elements each. This paper presents recent advances in multistage interconnection networks with emphasis on SENs while discussing pertinent issues related to its design aspects, and taking lessons from the past and current literature. To achieve this objective, applications, motivating factors, architectures, shuffle exchange networks, and some of the performance evaluation techniques as well as their merits and demerits are discussed. Then, to capture the latest research trends in this area not covered in contemporary literature, this paper reviews very recent advancements in shuffle exchange multistage interconnection networks within the last few years and provides design guidelines as well as recommendations for future consideration.
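
To make the shuffle exchange structure mentioned in this abstract concrete, the sketch below computes the perfect-shuffle wiring between stages and the number of switching elements per stage. It assumes N is a power of two and that each switching element has two inputs, which is the common textbook arrangement rather than something the abstract states; the function names are hypothetical.

```python
def perfect_shuffle(i: int, n_bits: int) -> int:
    """Map line i to its shuffled position by rotating its n_bits-bit
    address left by one bit (the classic perfect shuffle)."""
    top_bit = (i >> (n_bits - 1)) & 1
    mask = (1 << n_bits) - 1
    return ((i << 1) & mask) | top_bit


def switches_per_stage(n_inputs: int) -> int:
    """Each stage has N/2 switching elements, as stated in the abstract
    (two lines per element is assumed here)."""
    return n_inputs // 2


if __name__ == "__main__":
    n_bits = 3                 # N = 8 inputs and outputs
    n_inputs = 1 << n_bits
    wiring = [perfect_shuffle(i, n_bits) for i in range(n_inputs)]
    print("shuffle wiring:", wiring)
    print("switching elements per stage:", switches_per_stage(n_inputs))
```

For N = 8 the shuffle interleaves the two halves of the input lines, so lines 0 to 3 land on even positions and lines 4 to 7 on odd positions.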

11

Shahsavari, Mahyar, Jonathan Beaumont, David Thomas, and Andrew D. Brown. "POETS: A Parallel Cluster Architecture for Spiking Neural Network." International Journal of Machine Learning and Computing 11, no. 4 (August 2021): 281–85. http://dx.doi.org/10.18178/ijmlc.2021.11.4.1048.

Abstract:

Spiking Neural Networks (SNNs) are known as a branch of neuromorphic computing and are currently used in neuroscience applications to understand and model the biological brain. SNNs could also potentially be used in many other application domains such as classification, pattern recognition, and autonomous control. This work presents a highly-scalable hardware platform called POETS, and uses it to implement SNN on a very large number of parallel and reconfigurable FPGA-based processors. The current system consists of 48 FPGAs, providing 3072 processing cores and 49152 threads. We use this hardware to implement up to four million neurons with one thousand synapses. Comparison to other similar platforms shows that the current POETS system is twenty times faster than the Brian simulator, and at least two times faster than SpiNNaker.
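
The system sizes quoted in this abstract can be broken down with simple arithmetic. The sketch below assumes that cores and threads are spread evenly across the FPGAs, which the abstract does not state explicitly; the variable names are illustrative.

```python
# Figures quoted in the abstract.
FPGAS = 48
CORES = 3072
THREADS = 49152

# Even-distribution assumption: every FPGA hosts the same number of cores,
# and every core runs the same number of threads.
cores_per_fpga = CORES // FPGAS
threads_per_core = THREADS // CORES
threads_per_fpga = THREADS // FPGAS

if __name__ == "__main__":
    print("cores per FPGA:", cores_per_fpga)
    print("threads per core:", threads_per_core)
    print("threads per FPGA:", threads_per_fpga)
```

All three divisions are exact, which is consistent with (though not proof of) a uniform layout.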

12

Lirkov, Ivan. "Performance Analysis of a Scalable Algorithm for 3D Linear Transforms on Supercomputer with Intel Processors/Co-Processors." Cybernetics and Information Technologies 20, no. 6 (December 1, 2020): 94–104. http://dx.doi.org/10.2478/cait-2020-0064.

Abstract:

Practical realizations of 3D forward/inverse separable discrete transforms, such as Fourier transform, cosine/sine transform, etc. are frequently the principal limiters that prevent many practical applications from scaling to a large number of processors. Existing approaches, which are based primarily on 1D or 2D data decompositions, prevent the 3D transforms from effectively scaling to the maximum (possible/available) number of computer nodes. A highly scalable approach to realize forward/inverse 3D transforms has been proposed. It is based on a 3D decomposition of data and geared towards a torus network of computer nodes. The proposed algorithm requires compute-and-roll time-steps, where each step consists of an execution of multiple GEMM operations and concurrent movement of cubical data blocks between nearest neighbors. The aim of this paper is to present an experimental performance study of an implementation on high performance computer architecture.

13

DRĂGOI, CEZARA, and FLORIN MANEA. "ON THE DESCRIPTIONAL COMPLEXITY OF ACCEPTING NETWORKS OF EVOLUTIONARY PROCESSORS WITH FILTERED CONNECTIONS." International Journal of Foundations of Computer Science 19, no. 05 (October 2008): 1113–32. http://dx.doi.org/10.1142/s0129054108006170.

Abstract:

In this paper we consider, from the descriptional complexity point of view, a model of computation introduced in [1], namely accepting network of evolutionary processors with filtered connections (ANEPFCs). First we show that for each morphism h : V → W*, with V ∩ W = ∅, one can effectively construct an ANEPFC, of size 6 + |W|, which accepts every input word w and, at the end of the computation on this word, obtains h(w) in its output node. This result can be applied in constructing two different ANEPFCs, with 27 and, respectively, 26 processors, recognizing a given recursively enumerable language. The first architecture, based on the construction of a universal ANEPFC, has the property that only 7 of its 27 processors depend on the accepted language. On the other hand, all the 26 processors of the second architecture depend on the accepted language, but, differently from the first one, this network simulates efficiently (from both time and space perspectives) a nondeterministic Turing machine accepting the given language.

14

Latifi, Shahram, and Ramesh Gajjala. "Reliability Evaluation of Braided Networks Using A Recursive Method." Parallel Processing Letters 07, no. 01 (March 1997): 77–88. http://dx.doi.org/10.1142/s0129626497000103.

Abstract:

The global reliability of a connected network reflects the probability of having a connected path between every pair of processors. Given any network, there are several ways of finding this reliability measure such as path set and cut set methods. These methods are computation extensive for large networks and do not utilize the symmetry of the networks. In this paper, we present a generalization of a special class of networks called Braided-Line networks and derive the reliability expressions for these networks using recursive techniques. These recursive techniques utilize the symmetry of the networks thus reducing the computation time of the global reliability substantially.
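
For orientation, the sketch below computes the global reliability defined in this abstract by brute force: it enumerates every combination of working and failed links and sums the probability of the connected ones. This is the kind of exponential-cost baseline the abstract contrasts with, not the recursive method of the paper. It assumes perfectly reliable processors and links that work independently with the same probability p; the function names are hypothetical.

```python
from itertools import product


def is_connected(nodes, edges):
    """Return True if the undirected graph (nodes, edges) is connected."""
    nodes = list(nodes)
    if not nodes:
        return True
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen = {nodes[0]}
    stack = [nodes[0]]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(nodes)


def global_reliability(nodes, edges, p):
    """Probability that every pair of nodes is joined by a working path.

    Enumerates all 2^len(edges) link states, so it is only usable on
    very small networks.
    """
    total = 0.0
    for state in product((True, False), repeat=len(edges)):
        working = [e for e, ok in zip(edges, state) if ok]
        if is_connected(nodes, working):
            k = len(working)
            total += p ** k * (1 - p) ** (len(edges) - k)
    return total


if __name__ == "__main__":
    triangle = [(0, 1), (1, 2), (0, 2)]
    print(global_reliability([0, 1, 2], triangle, 0.9))
```

For a triangle the result equals p^3 + 3p^2(1 - p), since the network stays connected when at most one link fails.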

15

Shorfuzzaman, Mohammad, Rasit Eskicioglu, and Peter Graham. "In-Network Adaptation of Video Streams Using Network Processors." Advances in Multimedia 2009 (2009): 1–20. http://dx.doi.org/10.1155/2009/905890.

Abstract:

The increasing variety of networks and end systems, especially wireless devices, poses new challenges in communication support for, particularly, multicast-based collaborative applications. In traditional multicasting, the sender transmits video at the same rate and resolution to all receivers independent of their network characteristics, end system equipment, and users' preferences about video quality and significance. Such an approach results in resources being wasted and may also result in some receivers having their quality expectations unsatisfied. This problem can be addressed, near the network edge, by applying dynamic, in-network adaptation (e.g., transcoding) of video streams to meet available connection bandwidth, machine characteristics, and client preferences. In this paper, we extrapolate from earlier work of Shorfuzzaman et al. (2006), in which we implemented and assessed an MPEG-1 transcoding system on the Intel IXP1200 network processor, to consider the feasibility of in-network transcoding for other video formats and network processor architectures. The use of “on-the-fly” video adaptation near the edge of the network offers the promise of simpler support for a wide range of end devices with different display, and so forth, characteristics that can be used in different types of environments.

16

GASTALDO, MICHEL, MICHEL MORVAN, and J. MIKE ROBSON. "TRANSITIVE CLOSURE IN PARALLEL ON A LINEAR NETWORK OF PROCESSORS." Parallel Processing Letters 02, no. 02n03 (September 1992): 195–203. http://dx.doi.org/10.1142/s0129626492000325.

Abstract:

In this paper, we propose a linear time parallel algorithm (in the number of edges of the transitive closure) that computes the transitive closure of a directed graph on a linear network of n processors. The underlying architecture is a linear network of processors with neighbouring communications, where the number of processors is equal to the number of vertices of the graph.
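
As a reference point for what this abstract computes, the sketch below gives a plain sequential transitive closure using Warshall's algorithm. It is not the parallel linear-network algorithm of the paper, only a small baseline that defines the expected output; the function name and the edge-list input format are assumptions.

```python
def transitive_closure(n, edges):
    """Return a matrix reach where reach[i][j] is True iff vertex j can be
    reached from vertex i by a path of at least one edge.

    n: number of vertices, labelled 0..n-1.
    edges: iterable of directed (u, v) pairs.
    """
    reach = [[False] * n for _ in range(n)]
    for u, v in edges:
        reach[u][v] = True
    # Warshall: allow vertex k as an intermediate hop, one k at a time.
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return reach


if __name__ == "__main__":
    closure = transitive_closure(4, [(0, 1), (1, 2)])
    for row in closure:
        print(row)
```

In the setting of the abstract there is one processor per vertex, so a natural (assumed) data layout would give processor i the row reach[i].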

17

Jan, Yahya, and Lech Jóźwiak. "Communication and Memory Architecture Design of Application-Specific High-End Multiprocessors." VLSI Design 2012 (March 25, 2012): 1–20. http://dx.doi.org/10.1155/2012/794753.

Abstract:

This paper is devoted to the design of communication and memory architectures of massively parallel hardware multiprocessors necessary for the implementation of highly demanding applications. We demonstrated that for the massively parallel hardware multiprocessors the traditionally used flat communication architectures and multi-port memories do not scale well, and the memory and communication network influence on both the throughput and circuit area dominates the processors' influence. To resolve the problems and ensure scalability, we proposed to design highly optimized application-specific hierarchical and/or partitioned communication and memory architectures through exploring and exploiting the regularity and hierarchy of the actual data flows of a given application. Furthermore, we proposed some data distribution and related data mapping schemes in the shared (global) partitioned memories with the aim to eliminate the memory access conflicts, as well as to ensure that our communication design strategies will be applicable. We incorporated these architecture synthesis strategies into our quality-driven model-based multi-processor design method and related automated architecture exploration framework. Using this framework, we performed a large series of experiments that demonstrate various important features of the synthesized memory and communication architectures. They also demonstrate that our method and related framework are able to efficiently synthesize well scalable memory and communication architectures even for the high-end multiprocessors. Gains as high as 12 times in performance and 25 times in area can be obtained when using the hierarchical communication networks instead of the flat networks. However, for the high parallelism levels only the partitioned approach ensures the scalability in performance.

18

Kim, Sungi, Namjun Kim, Jinyoung Seo, Jeong-Eun Park, Eun Ho Song, So Young Choi, Ji Eun Kim, Seungsang Cha, Ha H. Park, and Jwa-Min Nam. "Nanoparticle-based computing architecture for nanoparticle neural networks." Science Advances 6, no. 35 (August 2020): eabb3348. http://dx.doi.org/10.1126/sciadv.abb3348.

Abstract:

The lack of a scalable nanoparticle-based computing architecture severely limits the potential and use of nanoparticles for manipulating and processing information with molecular computing schemes. Inspired by the von Neumann architecture (VNA), in which multiple programs can be operated without restructuring the computer, we realized the nanoparticle-based VNA (NVNA) on a lipid chip for multiple executions of arbitrary molecular logic operations in the single chip without refabrication. In this system, nanoparticles on a lipid chip function as the hardware that features memory, processors, and output units, and DNA strands are used as the software to provide molecular instructions for the facile programming of logic circuits. NVNA enables a group of nanoparticles to form a feed-forward neural network, a perceptron, which implements functionally complete Boolean logic operations, and provides a programmable, resettable, scalable computing architecture and circuit board to form nanoparticle neural networks and make logical decisions.

19

Das, Sajal K., Dirk H. Hohndel, Maximilian Ibel, and Sabine R. Öhring. "Efficient Communication in Folded Petersen Networks." International Journal of Foundations of Computer Science 08, no. 02 (June 1997): 163–85. http://dx.doi.org/10.1142/s0129054197000136.

Abstract:

Fast and efficient communication is one of the most important requirements in today's multicomputers. When reaching a larger scale of processors, the probability of faults in the network increases, hence communication must be robust and fault tolerant. The recently introduced family of folded Petersen networks, constructed by iteratively applying the cartesian product operation on the well-known Petersen graph, provides a regular, node- and edge-symmetric architecture with optimal connectivity (hence maximal fault-tolerance), and logarithmic diameter. Compared to the closest sized hypercube, the folded Petersen network has a smaller diameter, lower node degree and higher packing density. In this paper, we study fundamental communication primitives like single routing, permutation routing, one-to-all broadcasting, multinode-broadcasting (gossiping), personalized communications like scattering, and total exchange on the folded Petersen networks, considering two communication models, namely single link availability (SLA) and multiple link availability (MLA). We derive lower bounds for these problems and design optimal algorithms in terms of both time and the number of message transmissions. The results are based on the construction of minimal height spanning trees in the fault-free folded Petersen network. We further analyze these communication primitives in faulty networks, where processing nodes and transmission links cease working. This analysis is based on multiple arc-disjoint spanning trees, a construct also useful for analyzing other families of multicomputer networks.
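
The construction named in this abstract, an iterated Cartesian product of the Petersen graph, can be checked on small cases. The sketch below builds the n-fold product implicitly and measures its size, degree, and diameter by breadth-first search. The vertex labelling of the Petersen graph and all function names are illustrative choices; the routing and broadcasting algorithms of the paper are not reproduced.

```python
from collections import deque
from itertools import product


def petersen_adjacency():
    """Petersen graph: outer 5-cycle 0..4, inner pentagram 5..9, five spokes."""
    adj = {v: set() for v in range(10)}

    def link(u, v):
        adj[u].add(v)
        adj[v].add(u)

    for i in range(5):
        link(i, (i + 1) % 5)            # outer cycle
        link(5 + i, 5 + (i + 2) % 5)    # inner pentagram
        link(i, 5 + i)                  # spoke
    return adj


def neighbors(node, base):
    """Neighbours in the Cartesian product: move along a Petersen edge in
    exactly one coordinate of the tuple."""
    for pos, x in enumerate(node):
        for y in base[x]:
            yield node[:pos] + (y,) + node[pos + 1:]


def eccentricity(src, base):
    """Largest BFS distance from src to any reachable node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in neighbors(u, base):
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return max(dist.values())


def summarize(n):
    """Return (node count, set of node degrees, diameter) for n factors."""
    base = petersen_adjacency()
    nodes = list(product(range(10), repeat=n))
    degrees = {sum(1 for _ in neighbors(v, base)) for v in nodes}
    diameter = max(eccentricity(v, base) for v in nodes)
    return len(nodes), degrees, diameter


if __name__ == "__main__":
    for n in (1, 2):
        print(n, summarize(n))
```

For n = 2 this gives 100 nodes of degree 6 with diameter 4, against degree 7 and diameter 7 for the 128-node hypercube, which illustrates the comparison made in the abstract.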

20

HU, WEI, TIANZHOU CHEN, QINGSONG SHI, and SHA LIU. "CRITICAL-PATH DRIVEN ROUTERS FOR ON-CHIP NETWORKS." Journal of Circuits, Systems and Computers 19, no. 07 (November 2010): 1543–57. http://dx.doi.org/10.1142/s021812661000689x.

Abstract:

Multithreaded programming has become the dominant paradigm in computer architecture, mainly in the form of multi-core processors. The performance bottleneck of a multithreaded program is its critical path, whose length is its total execution time. As the number of cores within a processor increases, Network-on-Chip (NoC) has been proposed as a promising approach for inter-core communication. In order to optimize the performance of a multithreaded program running on an NoC based multi-core platform, we design and implement the critical-path driven router, which prioritizes inter-thread communication on the critical path when routing packets. The experimental results show that the critical-path driven router improves the execution time of the test case by 14.8% compared to the ordinary router.
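
The notion of a critical path used in this abstract can be illustrated on a small task graph: the critical path is the longest chain of dependent work, and its length bounds the execution time. The sketch below computes that length for a dependency DAG. The task model, the names, and the durations are hypothetical and unrelated to the router design or the benchmark in the paper.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def critical_path_length(duration, deps):
    """Length of the longest dependency chain in a task DAG.

    duration: dict mapping task -> execution time.
    deps: dict mapping task -> iterable of tasks that must finish first
          (every task named in deps is assumed to appear in duration).
    """
    graph = {task: set(deps.get(task, ())) for task in duration}
    finish = {}
    # static_order() yields each task after all of its prerequisites.
    for task in TopologicalSorter(graph).static_order():
        start = max((finish[d] for d in graph[task]), default=0)
        finish[task] = start + duration[task]
    return max(finish.values(), default=0)


if __name__ == "__main__":
    duration = {"a": 3, "b": 2, "c": 4, "d": 1}
    deps = {"c": ["a", "b"], "d": ["c"]}
    print(critical_path_length(duration, deps))
```

In this example the chain a, c, d determines the result; shortening task b would not change it, which is the intuition behind prioritising traffic that lies on the critical path.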

21

Monien, Burkhard, Ralf Diekmann, and Reinhard Lüling. "The Construction of Large Scale Reconfigurable Parallel Computing Systems (The Architecture of the SC320)." International Journal of Foundations of Computer Science 08, no. 03 (September 1997): 347–61. http://dx.doi.org/10.1142/s0129054197000227.

Abstract:

Reconfigurable communication networks for massively parallel multiprocessor systems offer the possibility to realize a number of application demands like special communication patterns or real-time requirements. This paper presents the design principle of a reconfigurable network which is able to realize any graph of maximal degree four. The architecture is based on a special multistage Clos network, constructed out of a number of static routing switches of equal size. Upper bounds on the cut size of 4-regular graphs, if split into a number of clusters, allow minimizing the number of switches and connections while still offering the desired reconfiguration capabilities as well as large scalability and flexible multi-user access. Efficient algorithms configuring the architecture are based on an old result by Petersen [27] about the decomposition of regular graphs. The concept presented here is the basis for the Parsytec SC series of reconfigurable MPP-systems. The currently largest realization with 320 processors is presented in greater detail.


22

DEVISMES, STÉPHANE. "A SILENT SELF-STABILIZING ALGORITHM FOR FINDING CUT-NODES AND BRIDGES." Parallel Processing Letters 15, no. 01n02 (March 2005): 183–98. http://dx.doi.org/10.1142/s0129626405002143.


Abstract:

In this paper, we present a self-stabilizing algorithm for finding cut-nodes and bridges in arbitrary rooted networks with a low memory requirement (O(log n) bits per processor, where n is the number of processors). Our algorithm is silent and must be composed with a silent self-stabilizing algorithm computing a Depth-First Search (DFS) Spanning Tree of the network. So, in the paper, we will prove that the composition of our algorithm with any silent self-stabilizing DFS algorithm is self-stabilizing. Finally, we will show that our algorithm needs O(n^2) moves to reach a terminal configuration once the DFS spanning tree is computed. Note that this time complexity is equivalent to the best proposed solutions.
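
For orientation only, the following is a minimal sketch of the classical centralized, sequential DFS low-link method for finding cut-nodes and bridges. It is not the self-stabilizing distributed algorithm of the paper; it only shows how both notions fall out of a DFS spanning tree. The function name, the edge-list input, and the assumption of a simple connected undirected graph are illustrative choices, not taken from the source.

```python
from collections import defaultdict


def cut_nodes_and_bridges(n, edges, root=0):
    """Return (cut_nodes, bridges) for a simple connected undirected graph.

    Nodes are 0..n-1. Assumes no parallel edges and no self-loops.
    """
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    disc = [-1] * n  # DFS discovery time, -1 means unvisited
    low = [0] * n    # smallest discovery time reachable from the subtree
    cut_nodes, bridges = set(), set()
    timer = 0

    def dfs(u, parent):
        nonlocal timer
        disc[u] = low[u] = timer
        timer += 1
        children = 0
        for v in adj[u]:
            if v == parent:
                continue
            if disc[v] != -1:
                # Back edge to an already visited node.
                low[u] = min(low[u], disc[v])
            else:
                children += 1
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] > disc[u]:
                    # No back edge bypasses the tree edge (u, v).
                    bridges.add((min(u, v), max(u, v)))
                if parent is not None and low[v] >= disc[u]:
                    cut_nodes.add(u)
        # The root is a cut-node only if it has several DFS children.
        if parent is None and children > 1:
            cut_nodes.add(u)

    dfs(root, None)
    return cut_nodes, bridges
```

On a triangle 0-1-2 with a tail 2-3-4, this sketch reports nodes 2 and 3 as cut-nodes and the two tail edges as bridges. The recursion makes it suitable only for small graphs.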


23

Gade, Sri Harsha, and Sujay Deb. "A Novel Hybrid Cache Coherence with Global Snooping for Many-core Architectures." ACM Transactions on Design Automation of Electronic Systems 27, no. 1 (January 31, 2022): 1–31. http://dx.doi.org/10.1145/3462775.


Abstract:

Cache coherence ensures correctness of cached data in multi-core processors. Traditional implementations of existing protocols make them unscalable for many-core architectures. While snoopy coherence requires unscalable ordered networks, directory coherence is weighed down by high area and energy overheads. In this work, we propose Wireless-enabled Share-aware Hybrid (WiSH) to provide scalable coherence in many-core processors. WiSH implements a novel Snoopy over Directory protocol using on-chip wireless links and hierarchical, clustered Network-on-Chip to achieve low-overhead and highly efficient coherence. A local directory protocol maintains coherence within a cluster of cores, while coherence among such clusters is achieved through a global snoopy protocol. The ordered network for global snooping is provided through low-latency and low-energy broadcast wireless links. The overheads are further reduced through share-aware cache segmentation to eliminate coherence for private blocks. Evaluations show that WiSH reduces traffic and runtime, while requiring smaller storage and lower energy as compared to existing hierarchical and hybrid coherence protocols. Owing to its modularity, WiSH provides highly efficient and scalable coherence for many-core processors.


24

Ben-Asher, Yosi, and Assaf Schuster. "Single Step Undirected Reconfigurable Networks." VLSI Design 9, no. 1 (January 1, 1999): 17–28. http://dx.doi.org/10.1155/1999/71739.


Abstract:

The reconfigurable mesh (RN-MESH) can solve a large class of problems in constant time, including problems that require logarithmic time by other, even shared memory, models such as the PRAM with a similar number of processors [3]. In this work we show that for the RN-MESH these constants can always be reduced to one, still using a polynomial number of processors. Given a reconfigurable mesh that computes a set of values in constant time, we show that it can be simulated by a single step reconfigurable mesh with maximum size that is polynomial in the size of the original mesh. The proof is constructive, where the construction of the single step RN-MESH holds for the relatively weak undirected RN-MESH model. In this model broadcasts made on buses arrive at all nodes that belong to the undirected connected component of the transmitting processor. A result similar to the one that is obtained in this work was previously obtained for the directed reconfigurable mesh model (DRN) [5]. However, the construction for the DRN-MESH relies on the fact that the buses are directed, and thus cannot be applied to the undirected case. In addition, the construction presented here is simpler and uses significantly fewer processors than the one obtained for the DRN-MESH.


25

Vin, H., and R. Yavatkar. "Network processors [Guest editorial]." IEEE Network 17, no. 4 (July 2003): 10–11. http://dx.doi.org/10.1109/mnet.2003.1220690.


26

Raghunath, M. T., and Abhiram Ranade. "Designing Interconnection Networks for Multi-level Packaging." VLSI Design 2, no. 4 (January 1, 1995): 375–88. http://dx.doi.org/10.1155/1995/57617.


Abstract:

A central problem in building large scale parallel machines is the design of the interconnection network. Interconnection network design is largely constrained by packaging technology. We start with a generic set of packaging restrictions and evaluate different network organizations under a random traffic model. Our results indicate that customizing the network topology to the packaging constraints is useful. Some of the general principles that arise out of this study are: 1) Making the networks denser at the lower levels of the packaging hierarchy has a significant positive impact on global communication performance, 2) It is better to organize a fixed amount of communication bandwidth as a smaller number of high bandwidth channels, 3) Providing the processors with the ability to tolerate latencies (by using multithreading) is very useful in improving performance.


27

Nikolaidis, I. "Network Systems Design Using Network Processors [Book Review]." IEEE Network 18, no. 3 (May 2004): 5. http://dx.doi.org/10.1109/mnet.2004.1301013.


28

GEWALI, LAXMI P., and IVAN STOJMENOVIC. "COMPUTING EXTERNAL WATCHMAN ROUTES ON PRAM, BSR, AND INTERCONNECTION NETWORK MODELS OF PARALLEL COMPUTATION." Parallel Processing Letters 04, no. 01n02 (June 1994): 83–93. http://dx.doi.org/10.1142/s0129626494000107.


Abstract:

An external watchman route in the presence of a polygonal obstacle is a closed path such that each point in the exterior of the polygon is visible to some point along the route. We adapt the merging slopes technique of parallel computational geometry to develop a parallel algorithm for computing a shortest external watchman route in the presence of a convex polygon of n sides. The algorithm runs in O(log n) time using [Formula: see text] processors in the CREW-PRAM computational model; this is optimal within a constant factor. The algorithm can be easily adapted to compute a shortest watchman route in O(log n) time on a hypercube with O(n) processors. We also discuss the computation of a shortest external watchman route on star and pancake networks. Finally, a constant time algorithm for solving the merging slopes problem on a BSR with n processors is described. This leads to algorithms with the same time and processor count for solving the external watchman route problem, for computing Minkowski sum, critical support lines, and separability of two convex polygons, for finding the maximum distance between two convex polygons, and for computing the smallest enclosing box, diameter, and width of a convex polygon.


29

Foag, Jurgen, and Thomas Wild. "Queuing algorithm for speculative Network Processors." International Journal of High Performance Computing and Networking 4, no. 5/6 (2006): 241. http://dx.doi.org/10.1504/ijhpcn.2006.013479.


30

Wu, Xiaoban, Peilong Li, Yongyi Ran, and Yan Luo. "Network measurement for 100 GbE network links using multicore processors." Future Generation Computer Systems 79 (February 2018): 180–89. http://dx.doi.org/10.1016/j.future.2017.04.038.


31

Ryabko, Boris, and Anton Rakitskiy. "Theoretical Approach to Performance Evaluation of Supercomputers." Journal of Circuits, Systems and Computers 27, no. 04 (December 6, 2017): 1850062. http://dx.doi.org/10.1142/s0218126618500627.


Abstract:

In this paper, we extend an information-theoretic approach to computer performance evaluation to supercomputers. This approach is based on the notion of Computer Capacity, which can be estimated relying solely on the description of the computer architecture. We describe the method of calculating Computer Capacity for supercomputers, including the influence of the architecture of the communication network. The suggested approach is applied to estimate the performance of three of the top 10 supercomputers (according to the TOP500 June 2016 list) which are based on Haswell processors. For greater objectivity of results, we compared them relative to the values of another supercomputer which is based on Ivy Bridge processors (this microarchitecture differs from Haswell). The obtained results are compared with values of the TOP500 LINPACK benchmark and theoretical peak, and we arrive at conclusions about the applicability of the presented theoretical (nonexperimental) approach for performance evaluation of real supercomputers. In particular, it means that the estimations of the Computer Capacity can be used at the design stage of the development of supercomputers.


32

SEO, KYUNG-RYONG, and KYU-HO PARK. "TASK ASSIGNMENT IN HOST-SATELLITE SYSTEMS." Journal of Circuits, Systems and Computers 06, no. 03 (June 1996): 213–25. http://dx.doi.org/10.1142/s0218126696000170.


Abstract:

This paper deals with the problem of assigning task modules of a program over a multiple computer system such that the sum of execution and communication costs is minimized. If the number of processors is two, this problem can be solved efficiently using the network flow approach pioneered by Stone [13]. However, the general n-processor problem (n>3) in a fully connected system is known to be NP-complete [14]. A host-satellite system considered in this paper is composed of a powerful host processor p0 and N homogeneous satellite processors pk, 1≤k≤N, in which each satellite processor pk is connected to the host processor p0 through a communication link. When any two satellite processors are to communicate with each other, the host processor p0 must participate in the communication. Therefore, the interprocessor communication cost per unit of information transferred between any two satellite processors is twice as much as that between a satellite processor and the host processor p0. In this paper, we propose an algorithm which finds an optimal assignment on a host-satellite system in polynomial time. A task assignment problem for a host-satellite system is first transformed into a network flow problem, and then solved by applying the well known network flow algorithm in time no worse than O(NM^3), where N and M are the number of satellites and the number of modules, respectively.
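
To make the cost model in this abstract concrete, here is a small sketch that evaluates an assignment and finds the cheapest one by exhaustive search. This is deliberately not the polynomial-time network-flow algorithm the paper proposes; it is an exponential brute force meant only for tiny instances. The names `assignment_cost` and `best_assignment`, the data layout, and the unit link cost of 1 are assumptions made for illustration.

```python
from itertools import product


def assignment_cost(assign, exec_cost, comm):
    """Total cost of an assignment in a host-satellite system.

    assign[m]       processor of module m (0 = host, 1..N = satellites)
    exec_cost[m][p] execution cost of module m on processor p
    comm[(i, j)]    amount of information exchanged by modules i and j
    """
    total = sum(exec_cost[m][p] for m, p in enumerate(assign))
    for (i, j), volume in comm.items():
        pi, pj = assign[i], assign[j]
        if pi == pj:
            continue  # same processor: no interprocessor communication
        # Satellite-to-satellite traffic is relayed by the host, so it
        # crosses two links; host-to-satellite traffic crosses one.
        links = 1 if 0 in (pi, pj) else 2
        total += links * volume
    return total


def best_assignment(num_modules, num_satellites, exec_cost, comm):
    """Exhaustively search all (N + 1) ** M assignments (tiny inputs only)."""
    candidates = product(range(num_satellites + 1), repeat=num_modules)
    return min(candidates, key=lambda a: assignment_cost(a, exec_cost, comm))
```

With two modules that are cheap on satellites and expensive on the host, the search places both on the same satellite, since splitting them across satellites pays the doubled communication cost.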


33

Ying-Dar Lin, Yi-Neng Lin, Shun-Chin Yang, and Yu-Sheng Lin. "DiffServ over network processors: implementation and evaluation." IEEE Network 17, no. 4 (July 2003): 28–34. http://dx.doi.org/10.1109/mnet.2003.1220693.


34

LAVAULT, CHRISTIAN. "EMBEDDINGS INTO THE PANCAKE INTERCONNECTION NETWORK." Parallel Processing Letters 12, no. 03n04 (September 2002): 297–310. http://dx.doi.org/10.1142/s0129626402001002.


Abstract:

Owing to its nice properties, the pancake is one of the Cayley graphs that were proposed as alternatives to the hypercube for interconnecting processors in parallel computers. In this paper, we present embeddings of rings, grids and hypercubes into the pancake with constant dilation and congestion. We also extend the results to similar efficient embeddings into the star graph.
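
As background for readers unfamiliar with the topology, the sketch below builds the pancake graph under its standard definition: nodes are the permutations of 1..n, and two nodes are adjacent when one is obtained from the other by reversing a prefix of length 2 to n. It does not implement any of the embeddings from the paper; the function names are illustrative.

```python
from itertools import permutations


def pancake_neighbors(perm):
    """Neighbors of a permutation: reverse each prefix of length 2..n."""
    n = len(perm)
    return [perm[:k][::-1] + perm[k:] for k in range(2, n + 1)]


def pancake_graph(n):
    """Adjacency lists of the n-dimensional pancake graph (n! nodes)."""
    return {p: pancake_neighbors(p) for p in permutations(range(1, n + 1))}
```

Each node has n - 1 neighbors, and because a prefix reversal undoes itself, the adjacency relation is symmetric. The graph has n! nodes, so this construction is practical only for small n.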


35

Chuan Xu, Weiren Shi, and Qingyu Xiong. "An Architecture for Parallelizing Network Monitoring Based on Multi-Core Processors." Journal of Convergence Information Technology 6, no. 4 (April 30, 2011): 246–52. http://dx.doi.org/10.4156/jcit.vol6.issue4.27.


36

CHEN, KEVIN F., and EDWIN H. M. SHA. "UNIVERSAL ROUTING AND PERFORMANCE ASSURANCE FOR DISTRIBUTED NETWORKS." Journal of Interconnection Networks 08, no. 01 (March 2007): 1–28. http://dx.doi.org/10.1142/s0219265907001886.


Abstract:

We show that universal routing can be achieved with low overhead in distributed networks. The validity of our results rests on a new network called the fat-stack. We show that from a routing perspective the fat-stack is efficient and is suitable for use as a baseline distributed network and as a crucial benchmark architecture for evaluating the performance of specific distributed networks. We show that the fat-stack is efficient by proving it is universal. A requirement for the fat-stack to be universal is that link capacities double up the levels of the network. We use methods developed in the areas of VLSI and processor interconnect for much of our analysis. We then show how to scale the fat-stack from a VLSI graph layout to a large-scale distributed topology and how the network can be an effective benchmark architecture. Our universality proofs show that a fat-stack of area Θ(A) can simulate any competing network of area A with [Formula: see text] overhead independently of wire delay. The universality result implies that the fat-stack of a given size is nearly the best routing network of that size. The fat-stack is also the minimal universal network for an [Formula: see text] overhead in terms of number of links. Actual simulations show that the fat-stack outperforms a mesh-based distributed network of comparable hardware usage. Our work helps explain why some deployed networks function in the way they do in terms of routing. It also provides an exemplary network of proven efficiency and scalability for building new distributed systems.


37

Engel, Jacob, and Taskin Kocak. "Off-chip communication architectures for high throughput network processors." Computer Communications 32, no. 5 (March 2009): 867–79. http://dx.doi.org/10.1016/j.comcom.2008.12.043.


38

Palmer, R. P., and P. A. Rounce. "An architecture for application specific neural network processors." Computing & Control Engineering Journal 5, no. 6 (December 1, 1994): 260–64. http://dx.doi.org/10.1049/cce:19940603.


39

Elkateeb, Ali, Paul Richardson, Adnan Shaout, Afzal Hussain, and Mohammed Elbeshti. "Scalable ATM network interface design using parallel RISC processors architecture." Microprocessors and Microsystems 28, no. 9 (November 2004): 499–507. http://dx.doi.org/10.1016/j.micpro.2004.04.001.


40

Roka, Sanjay, and Santosh Naik. "SURVEY ON SIGNATURE BASED INTRUCTION DETECTION SYSTEM USING MULTITHREADING." International Journal of Research -GRANTHAALAYAH 5, no. 4RACSIT (April 30, 2017): 58–62. http://dx.doi.org/10.29121/granthaalayah.v5.i4racsit.2017.3352.


Abstract:

The traditional way of protecting networks with firewalls and encryption software is no longer sufficient and effective. Many intrusion detection techniques have been developed for fixed wired networks but have turned out to be inapplicable in this new environment. We need to search for new architectures and mechanisms to protect computer networks. A signature-based Intrusion Detection System (IDS) matches network packets against a pre-configured set of intrusion signatures. Current implementations of IDS employ only a single thread of execution and as a consequence benefit very little from multi-processor hardware platforms. A multi-threaded technique would allow more efficient and scalable exploitation of these multi-processor machines.


41

Zulfin, Muhammad, S. Suherman, Rahmad Fauzi, M. Razali, and Maksum Pinem. "Cross-Point Comparison of Multistage Non-Blocking Technologies." International Journal of Engineering & Technology 7, no. 3.2 (June 20, 2018): 703. http://dx.doi.org/10.14419/ijet.v7i3.2.15348.


Abstract:

Multistage switching networks play an important role in communication and computer networks, where they connect communication nodes to each other. In computer hardware, switches connect processors and memories. Initially, switches were arranged as a single-stage interconnection. As the number of clients grows, multiple stages become necessary. The invention of Clos multistage switching initiated multistage technologies. Benes improves on Clos by reducing the number of cross-points, using 2 x 2 switch elements and call re-routing. Batcher improves the technology in another way, by sorting destination addresses. Banyan is then joined to Batcher to simplify routing control. This paper analyses the number of cross-points required in Clos, Benes and Batcher Banyan networks to build multistage switching architectures with 16, 64, 256, 1024 and 2048 input/output ports. The results show that the Clos cross-point count is on average 495.24% higher than Benes and 160.30% higher than Batcher Banyan. Clos blocking probabilities are close to zero. Benes blocking probabilities are conditionally zero. Batcher Banyan blocking probabilities are zero.
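
The kind of cross-point counting described here can be sketched with common textbook formulas. These may differ from the exact counting rules used in the paper, so the sketch is not expected to reproduce the percentages quoted above. It assumes a three-stage strictly non-blocking Clos network with n inputs per ingress switch and m = 2n - 1 middle switches, and a Benes network built from 2 x 2 elements counted as 4 cross-points each. The function names and parameters are illustrative.

```python
def clos_crosspoints(num_ports, n):
    """Cross-points of a 3-stage strictly non-blocking Clos network.

    num_ports: total inputs (= outputs), n: inputs per ingress switch.
    Uses m = 2n - 1 middle switches (the strict non-blocking condition).
    """
    if n <= 0 or num_ports % n != 0:
        raise ValueError("n must be positive and divide num_ports")
    r = num_ports // n          # number of ingress (and egress) switches
    m = 2 * n - 1               # number of middle-stage switches
    ingress_and_egress = 2 * r * (n * m)  # r switches of size n x m, twice
    middle = m * (r * r)                  # m switches of size r x r
    return ingress_and_egress + middle


def benes_crosspoints(num_ports):
    """Cross-points of a Benes network of 2 x 2 elements (4 each)."""
    if num_ports < 2 or num_ports & (num_ports - 1):
        raise ValueError("num_ports must be a power of two, at least 2")
    log2_ports = num_ports.bit_length() - 1
    stages = 2 * log2_ports - 1
    elements_per_stage = num_ports // 2
    return stages * elements_per_stage * 4
```

For example, with 16 ports and n = 4, these formulas give 336 cross-points for the Clos network and 224 for the Benes network.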


42

Cascón, Pablo, Julio Ortega, Yan Luo, Eric Murray, Antonio Díaz, and Ignacio Rojas. "Improving IPS by network processors." Journal of Supercomputing 57, no. 1 (February 4, 2011): 99–108. http://dx.doi.org/10.1007/s11227-011-0558-8.


43

Misko, Joshua, Shrikant S. Jadhav, and Youngsoo Kim. "Extensible Embedded Processor for Convolutional Neural Networks." Scientific Programming 2021 (April 21, 2021): 1–12. http://dx.doi.org/10.1155/2021/6630552.


Abstract:

Convolutional neural networks (CNNs) require significant computing power during inference. Smart phones, for example, may not run a facial recognition system or search algorithm smoothly due to the lack of resources and supporting hardware. Methods for reducing memory size and increasing execution speed have been explored, but choosing effective techniques for an application requires extensive knowledge of the network architecture. This paper proposes a general approach to preparing a compressed deep neural network processor for inference with minimal additions to existing microprocessor hardware. To show the benefits of the proposed approach, an example CNN for synthetic aperture radar target classification is modified and complementary custom processor instructions are designed. The modified CNN is examined to show the effects of the modifications, and the custom processor instructions are profiled to illustrate the potential performance increase from the new extended instructions.


44

Zheng, Bo. "A Queue Management Algorithm Fit for Network Processors." Journal of Computer Research and Development 42, no. 10 (2005): 1698. http://dx.doi.org/10.1360/crad20051009.


45

Rehman, Saeed ur, Rizwan Akhtar, Zuhaib Ashfaq Khan, and Changda Wang. "Architecture for Collision-Free Communication Using Relaxation Technique." Wireless Communications and Mobile Computing 2018 (November 4, 2018): 1–8. http://dx.doi.org/10.1155/2018/2839797.


Abstract:

In today’s world we are surrounded by smart handheld devices such as smart phones, tablets, netbooks, and others. These devices are based on advanced technologies such as multiple-input and multiple-output, Orthogonal Frequency Division Multiplexing (OFDM), and advanced data reliability techniques such as forward error correction. High data rates are among the requirements of these technologies, for which turbo and low-density parity-check (LDPC) codes are widely used in these standards. To achieve high speed, multiple parallel processors are needed to implement such codes. However, parallel processing introduces a collision problem, which increases latency and hardware complexity. In this work, an approach to the collision problem is presented that uses a network relaxation technique based on fast clique detection. The proposed approach results in high throughput in terms of latency and complexity. Furthermore, the proposed solution is able to solve the collision problem by connecting network optimization for achieving high throughput.


46

Mattes, J., D. Trystram, and J. Demongeot. "Parallel Image Processing Using Neural Networks: Applications in Contrast Enhancement of Medical Images." Parallel Processing Letters 08, no. 01 (March 1998): 63–76. http://dx.doi.org/10.1142/s0129626498000092.


Abstract:

This paper describes the implementation of a parallel image processing algorithm, the aim of which is to give good contrast enhancement in real time, especially on the boundaries of an object of interest defined by a grey homogeneity (for example, an object of medical interest having a functional or morphologic homogeneity, like a bone or tumor). The implementation of a neural network algorithm which does this contrast enhancement has been done on a SIMD massively parallel machine (a MasPar of 8192 processors) and the communication between its processors has been optimized.


47

CARLE, JEAN, JEAN-FREDERIC MYOUPO, and DAVID SEME. "ALL-TO-ALL BROADCASTING ALGORITHMS ON HONEYCOMB NETWORKS AND APPLICATIONS." Parallel Processing Letters 09, no. 04 (December 1999): 539–50. http://dx.doi.org/10.1142/s0129626499000505.


Abstract:

This paper presents two simple all-to-all broadcasting algorithms on the honeycomb mesh. In a network with n processors, the first uses a personalized routing strategy at each node and has a communication time complexity of 3n. This communication time can be reduced to n because the computation time is always assumed to be much lower than the communication time. The second is based on a Hamiltonian path and has a communication time complexity of 2n. We show how they can be used to get parallel solutions to a class of problems on honeycomb networks, among others Prefix Sums, Maximal Vectors, Maximal Sum Subsegment, Parenthesis Matching, Decoding Binary Tree, and Sorting. To our knowledge, these all-to-all broadcast algorithms are the only ones exhibited so far on a honeycomb.


48

Lin, Yi-Neng, Ying-Dar Lin, and Yuan-Cheng Lai. "Thread allocation in CMP-based multithreaded network processors." Parallel Computing 36, no. 2-3 (February 2010): 104–16. http://dx.doi.org/10.1016/j.parco.2010.01.001.


49

Grosse, E., and Y. N. Lakshman. "Network processors applied to IPv4/IPv6 transition." IEEE Network 17, no. 4 (July 2003): 35–39. http://dx.doi.org/10.1109/mnet.2003.1220694.


50

DUATO, JOSÉ. "A THEORY TO INCREASE THE EFFECTIVE REDUNDANCY IN WORMHOLE NETWORKS." Parallel Processing Letters 04, no. 01n02 (June 1994): 125–38. http://dx.doi.org/10.1142/s0129626494000144.


Abstract:

Fault-tolerant systems aim at providing continuous operations in the presence of faults. Multicomputers rely on an interconnection network between processors to support the message-passing mechanism. Therefore, the reliability of the interconnection network is very important for the reliability of the whole system. This paper analyses the effective redundancy available in a wormhole network by combining connectivity and deadlock freedom. Redundancy is defined at the channel level, giving a sufficient condition for a channel to be redundant and computing the set of redundant channels. The redundancy level of the network is also defined, proposing a theorem that supplies a lower bound for it. Finally, a fault-tolerant routing algorithm based on the former theory is proposed.
