To see the other types of publications on this topic, follow the link: Network-on-chips.

Dissertations / Theses on the topic 'Network-on-chips'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Select a source type:

Consult the top 38 dissertations / theses for your research on the topic 'Network-on-chips.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Press on it, and we will generate automatically the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as pdf and read online its abstract whenever available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Xiang, Xiyue. "Contention Alleviation in Network-on-Chips." Thesis, University of Louisiana at Lafayette, 2017. http://pqdtopen.proquest.com/#viewpdf?dispub=10272587.

Full text
Abstract:

In a network-on-chip (NoC) based system, the NoC is a shared resource among multiple processor cores. Requests generated by different applications running on different cores can create severe contention in NoCs. This contention can jeopardize the system performance and power efficiency in many different formats. First and foremost, we discover that the contention in NoCs can induce inter-application interference, leading to overall system performance degradation, prevent fair-progress of different applications, and cause starvation of unfairly-treated applications. We propose the NoC Application Slowdown (NAS) Model, the first online model that accurately estimates how much network delays due to interference contribute to the overall stall time of each application. We use NAS to develop Fairness-Aware Source Throttling (FAST), a mechanism that employs slowdown predictions to control the network injection rates of applications in a way that minimizes system unfairness. Furthermore, although removing buffers from the constituent routers can reduce power consumption and hardware complexity, the bufferless NoC is subject to the growing deflection caused by contention, leading to severe performance degradation and squandering power-saving potential. we then propose Deflection Containment (DeC) for the bufferless NoC to address its notorious shortcoming of excessive deflection for performance improvement and power reduction. With a link added to each router for bridging subnetworks (whose aggregated link width equals a give value, say, 128b), DeC lets a contending flit in one subnetwork be forwarded to another subnetwork instead of deflected, yielding extraordinary deflection reduction and greatly enriching path diversity. In addition, router microarchitecture under DeC is rectified to shorten the critical path and lift network bandwidth. Last but not least, beside 1-to-1 flow, the growing core counts urgently requires effective hardware support to alleviate the contention caused by 1-to-many and many-to-1 flow. We propose Carpool, the very first bufferless NoC optimized for 1-to-many and many-to-1 traffic. Carpool adaptively forks new flit replicas and performs traffic aggregation at appropriate intermediate routers to lessen bandwidth demands and reduce contention. We propose the microarchitecture of Carpool routers and develop parallel port allocation which supports multicast and reduces critical paths to improve network bandwidth.

APA, Harvard, Vancouver, ISO, and other styles
2

Kwa, Jimmy Williamchingyuan. "Optimizing network-on-chips for FPGAs." Thesis, University of British Columbia, 2013. http://hdl.handle.net/2429/44343.

Full text
Abstract:
As larger System-on-Chip (SoC) designs are attempted on Field Programmable Gate Arrays (FPGAs), the need for a low cost and high performance Network-on-Chip (NoC) grows. Virtual Channel (VC) routers provide desirable traits for an NoC such as higher throughput and deadlock prevention but at significant resource cost when implemented on an FPGA. This thesis presents an FPGA specific optimization to reduce resource utilization. We propose sharing Block RAMs between multiple router ports to store the high logic resource consuming VC buffers and present the Block RAM Split (BRS) router architecture that implements the proposed optimization. We evaluate the performance of the modifications using synthetic traffic patterns on mesh and torus networks and synthesize the NoCs to determine overall resource usage and maximum clock frequency. We find that the additional logic to support sharing Block RAMs has little impact on Adaptive Logic Module (ALM) usage in designs that currently use Block RAMs while at the same time decreasing Block RAM usage by as much as 40%. In comparison to CONNECT, a router design that does not use Block RAMs, a 71% reduction in ALM usage is shown to be possible. This resource reduction comes at the cost of a 15% reduction in the saturation throughput for uniform random traffic and a 50% decrease in the worst case neighbour traffic pattern on a mesh network. The throughput penalty from the neighbour traffic pattern can be reduced to 3% if a torus network is used. In all cases, there is little change in network latency at low load. BRS is capable of running at 161.71 MHz which is a decrease of only 4% from the base VC router design. Determining the optimum NoC topology is a challenging task. This thesis also proposes initial work towards the creation of an analytical model to assist with finding the best topology to use in an FPGA NoC.
APA, Harvard, Vancouver, ISO, and other styles
3

Bakhoda, Ali. "Designing network-on-chips for throughput accelerators." Thesis, University of British Columbia, 2014. http://hdl.handle.net/2429/46423.

Full text
Abstract:
Physical limits of power usage for integrated circuits have steered the microprocessor industry towards parallel architectures in the past decade. Modern Graphics Processing Units (GPU) are a form of parallel processor that harness chip area more effectively compared to traditional single threaded architectures by favouring application throughput over latency. Modern GPUs can be used as throughput accelerators: accelerating massively parallel non-graphics applications. As the number of compute cores in throughput accelerators increases, so does the importance of efficient memory subsystem design. In this dissertation, we present system-level microarchitectural analysis and optimizations with an emphasis on the memory subsystem of throughput accelerators that employ Bulk-Synchronous-Parallel programming models such as CUDA and OpenCL. We model the whole throughput accelerator as a closed-loop system in order to capture the effects of complex interactions of microarchitectural components: we simulate components such as compute cores, on-chip network and memory controllers with cycle-level accuracy. For this purpose, the first version of GPGPU-Sim simulator that was capable of running unmodified applications by emulating NVIDIA's virtual instruction set was developed. We use this simulator to model and analyze several applications and explore various microarchitectural tradeoffs for throughput accelerators to better suit these applications. Based on our observations, we identify the Network-on-Chip (NoC) component of memory subsystem as our main optimization target and set out to design throughput effective NoCs for future throughput accelerators. We provide a new framework for NoC researchers to ensure the optimizations are "throughput effective", namely, parallel application-level performance improves per unit chip area. We then use this framework to guide the development of several optimizations. Accelerator workloads demand high off-chip memory bandwidth resulting in a many-to-few-to-many traffic pattern. Leveraging this observation, we reduce NoC area by proposing a checkerboard NoC which utilizes routers with limited connectivity. Additionally, we improve performance by increasing the terminal bandwidth of memory controller nodes to better handle frequent read-reply traffic. Furthermore, we propose a double checkerboard inverted NoC organization which maintains the benefits of these optimizations while having a simpler routing mechanism and smaller area and results in a 24.3% improvement in average application throughput per unit area.
APA, Harvard, Vancouver, ISO, and other styles
4

Pattabiraman, Aishwariya. "Heterogeneous Cache Architecture in Network-on-Chips." University of Cincinnati / OhioLINK, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1321371508.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Sarkar, Souradip. "Multiple clock domain synchronization for network on chips." Online access for everyone, 2007. http://www.dissertations.wsu.edu/Thesis/Fall2007/S_Sarkar_112907.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Alshraiedeh, Juman. "Wear-out Leveling in Network on Chips (NoCs)." Ohio University / OhioLINK, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1492677926079357.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Bhardwaj, Kshitij. "Aging-Aware Routing Algorithms for Network-on-Chips." DigitalCommons@USU, 2012. https://digitalcommons.usu.edu/etd/1319.

Full text
Abstract:
Network-on-Chip (NoC) architectures have emerged as a better replacement of the traditional bus-based communication in the many-core era. However, continuous technology scaling has made aging mechanisms, such as Negative Bias Temperature Instability (NBTI) and electromigration, primary concerns in NoC design. In this work, a novel system-level aging model is proposed to model the effects of aging in NoCs, caused due to (a) asymmetric communication patterns between the network nodes, and (b) runtime traffic variations due to routing policies. This work observes a critical need of a holistic aging analysis, which when combined with power-performance optimization, poses a multi-objective design challenge. To solve this problem, two different aging-aware routing algorithms are proposed: (a) congestion-oblivious Mixed Integer Linear Programming (MILP)-based routing algorithm, and (b) congestion-aware adaptive routing algorithm and router micro-architecture. After extensive experimental evaluations, proposed routing algorithms reduce aging-induced power-performance overheads while also improving the system robustness.
APA, Harvard, Vancouver, ISO, and other styles
8

Zhang, Yixuan. "High-Performance Crossbar Designs for Network-on-Chips (NoCs)." Ohio University / OhioLINK, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1282056856.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Boraten, Travis Henry. "Hardware Security Threat and Mitigation Techniques for Network-on-Chips." Ohio University / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1596031630118173.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Boraten, Travis H. "Runtime Adaptive Scrubbing in Fault-Tolerant Network-on-Chips (NoC) Architectures." Ohio University / OhioLINK, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1397488496.

Full text
APA, Harvard, Vancouver, ISO, and other styles
11

Neel, Brian. "High Performance Shared Memory Networking in Future Many-core Architectures UsingOptical Interconnects." Ohio University / OhioLINK, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1397488118.

Full text
APA, Harvard, Vancouver, ISO, and other styles
12

Dalirsani, Atefe [Verfasser], and Hans-Joachim [Akademischer Betreuer] Wunderlich. "Self-diagnosis in Network-on-Chips / Atefe Dalirsani. Betreuer: Hans-Joachim Wunderlich." Stuttgart : Universitätsbibliothek der Universität Stuttgart, 2015. http://d-nb.info/1076503330/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
13

Friederich, Stephanie [Verfasser], and J. [Akademischer Betreuer] Becker. "Automated Hardware Prototyping for 3D Network on Chips / Stephanie Friederich ; Betreuer: J. Becker." Karlsruhe : KIT-Bibliothek, 2017. http://d-nb.info/1149522488/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
14

Liu, Meng. "Real-Time Communication over Wormhole-Switched On-Chip Networks." Doctoral thesis, Mälardalens högskola, Inbyggda system, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-35316.

Full text
Abstract:
In a modern industrial system, the requirement on computational capacity has increased dramatically, in order to support a higher number of functionalities, to process a larger amount of data or to make faster and safer run-time decisions. Instead of using a traditional single-core processor where threads can only be executed sequentially, multi-core and many-core processors are gaining more and more attentions nowadays. In a multi-core processor, software programs can be executed in parallel, which can thus boost the computational performance. Many-core processors are specialized multi-core processors with a larger number of cores which are designed to achieve a higher degree of parallel processing. An on-chip communication bus is a central intersection used for data-exchange between cores, memory and I/O in most multi-core processors. As the number of cores increases, more contention can occur on the communication bus which raises a bottleneck of the overall performance. Therefore, in order to reduce contention incurred on the communication bus, a many-core processor typically employs a Network-on-Chip (NoC) to achieve data-exchange. Real-time embedded systems have been widely utilized for decades. In addition to the correctness of functionalities, timeliness is also an important factor in such systems. Violation of specific timing requirements can result in performance degradation or even fatal problems. While executing real-time applications on many-core processors, the timeliness of a NoC, as a communication subsystem, is essential as well. Unfortunately, many real-time system designs over-provision resources to guarantee the fulfillment of timing requirements, which can lead to significant resource waste. For example, analysis of a NoC design yields that the network is already saturated (i.e. accepting more traffic can incur requirement violation), however, in reality the network actually has the capacity to admit more traffic. In this thesis, we target such resource wasting problems related to design and analysis of NoCs that are used in real-time systems. We propose a number of solutions to improve the schedulability of real-time traffic over wormhole-switched NoCs in order to further improve the resource utilization of the whole system. The solutions focus mainly on two aspects: (1) providing more accurate and efficient time analyses; (2) proposing more cost-effective scheduling methods.
APA, Harvard, Vancouver, ISO, and other styles
15

DiTomaso, Dominic F. "Improving Energy Efficiency of Network-on-Chips Using Emerging Wireless Technology and Router Optimizations." Ohio University / OhioLINK, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1337627400.

Full text
APA, Harvard, Vancouver, ISO, and other styles
16

Vangal, Sriram. "Performance and Energy Efficient Network-on-Chip Architectures." Doctoral thesis, Linköpings universitet, Institutionen för systemteknik, 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-11439.

Full text
Abstract:
The scaling of MOS transistors into the nanometer regime opens the possibility for creating large Network-on-Chip (NoC) architectures containing hundreds of integrated processing elements with on-chip communication. NoC architectures, with structured on-chip networks are emerging as a scalable and modular solution to global communications within large systems-on-chip. NoCs mitigate the emerging wire-delay problem and addresses the need for substantial interconnect bandwidth by replacing today’s shared buses with packet-switched router networks. With on-chip communication consuming a significant portion of the chip power and area budgets, there is a compelling need for compact, low power routers. While applications dictate the choice of the compute core, the advent of multimedia applications, such as three-dimensional (3D) graphics and signal processing, places stronger demands for self-contained, low-latency floating-point processors with increased throughput. This work demonstrates that a computational fabric built using optimized building blocks can provide high levels of performance in an energy efficient manner. The thesis details an integrated 80- Tile NoC architecture implemented in a 65-nm process technology. The prototype is designed to deliver over 1.0TFLOPS of performance while dissipating less than 100W. This thesis first presents a six-port four-lane 57 GB/s non-blocking router core based on wormhole switching. The router features double-pumped crossbar channels and destinationaware channel drivers that dynamically configure based on the current packet destination. This enables 45% reduction in crossbar channel area, 23% overall router area, up to 3.8X reduction in peak channel power, and 7.2% improvement in average channel power. In a 150-nm sixmetal CMOS process, the 12.2 mm2 router contains 1.9-million transistors and operates at 1 GHz at 1.2 V supply. We next describe a new pipelined single-precision floating-point multiply accumulator core (FPMAC) featuring a single-cycle accumulation loop using base 32 and internal carry-save arithmetic, with delayed addition techniques. A combination of algorithmic, logic and circuit techniques enable multiply-accumulate operations at speeds exceeding 3GHz, with singlecycle throughput. This approach reduces the latency of dependent FPMAC instructions and enables a sustained multiply-add result (2FLOPS) every cycle. The optimizations allow removal of the costly normalization step from the critical accumulation loop and conditionally powered down using dynamic sleep transistors on long accumulate operations, saving active and leakage power. In a 90-nm seven-metal dual-VT CMOS process, the 2 mm2 custom design contains 230-K transistors. Silicon achieves 6.2-GFLOPS of performance while dissipating 1.2 W at 3.1 GHz, 1.3 V supply. We finally present the industry's first single-chip programmable teraFLOPS processor. The NoC architecture contains 80 tiles arranged as an 8×10 2D array of floating-point cores and packet-switched routers, both designed to operate at 4 GHz. Each tile has two pipelined singleprecision FPMAC units which feature a single-cycle accumulation loop for high throughput. The five-port router combines 100 GB/s of raw bandwidth with low fall-through latency under 1ns. The on-chip 2D mesh network provides a bisection bandwidth of 2 Tera-bits/s. The 15-FO4 design employs mesochronous clocking, fine-grained clock gating, dynamic sleep transistors, and body-bias techniques. In a 65-nm eight-metal CMOS process, the 275 mm2 custom design contains 100-M transistors. The fully functional first silicon achieves over 1.0TFLOPS of performance on a range of benchmarks while dissipating 97 W at 4.27 GHz and 1.07-V supply. It is clear that realization of successful NoC designs require well balanced decisions at all levels: architecture, logic, circuit and physical design. Our results demonstrate that the NoC architecture successfully delivers on its promise of greater integration, high performance, good scalability and high energy efficiency.
APA, Harvard, Vancouver, ISO, and other styles
17

Runge, Armin [Verfasser], Reiner [Gutachter] Kolla, and Martin [Gutachter] Radetzki. "Advances in Deflection Routing based Network on Chips / Armin Runge ; Gutachter: Reiner Kolla, Martin Radetzki." Würzburg : Universität Würzburg, 2017. http://d-nb.info/1141054353/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
18

Karlsson, Erik. "Analysis and Development of Error-Job Mapping and Scheduling for Network-on-Chips with Homogeneous Processors." Thesis, Linköping University, Department of Computer and Information Science, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-54633.

Full text
Abstract:

Due to increased complexity of today’s computer systems, which are manufactured in recent semiconductor technologies, and the fact that recent semiconductor technologies are more liable to soft errors (non-permanent errors) it is inherently difficult to ensure that the systems are and will remain error-free. Depending on the application, a soft error can have serious consequences for the system. It is therefore important to detect the presence of soft errors as early as possible and recover from the erroneous state and maintain correct operation. There is an entire research area devoted on proposing, implementing and analyzing techniques that can detect and recover from these errors, known as fault tolerance. The drawback of using faulttolerance is that it usually introduces some overhead. This overhead may be for instance redundant hardware, which increases the cost of the system, or it may be a time overhead that negatively impacts on system performance. Thus a main concern when applying fault tolerance is to minimize the imposed overhead while the system is still able to deliver the correct error-free operation. In this thesis we have analyzed one well known fault tolerant technique, Rollback-Recovery with Checkpointing (RRC). This technique is able to detect and recover from errors by taking and storing checkpoints during the execution of a job.Therefore we can think as if a job is divided into a number of execution segments and a checkpoint is taken after executing each execution segment. This technique requires the job to be concurrently executed on two processors. At each checkpoint, both processors exchange data, which contains enough information for the job’s state. The exchanged data are then compared. If the data differ, it means that an error is detected in the previous execution segment and it is therefore re-executed. If the exchanged data are the same, it means that no errors are detected and the data are stored as a safe point from which the job can be restarted later. A time overhead due to exchanging data between processors is therefore introduced, and it increases the average execution time of a job, i.e. the average time required for a given job to complete. The overhead depends on the number of links that has to be traversed (due to data exchange) after each execution segment and the number of execution segments that are needed for the given job. The number of links that has to be traversed after each execution segment is twice the distance between the processors that are executing the same job concurrently. However, this is only true if all the links are fully functional. A link failure can result in a longer route for communication between the processors. Even though all links arefully functional, the number of execution segments still depends on error-free probabilities of the processors, and these error-free probabilities can vary between processors. This implies that the choice of processors affects the total number of links the communication has to traverse. Choosing two processors with higher error-free probability further away from eachother increases the distance, but decreases the number of execution segments, which can result in a lower overhead. By carefully determining the mapping for a given job, one can decrease the overhead, hence decreasing the average execution time. Since it is very common to have a larger number of jobs than available resources, it is not only important to find a good mapping to decrease the average execution time for a whole system, but also a good order of execution for a given set jobs (scheduling of the jobs). We propose in this thesis several mapping and scheduling algorithms that aim to reduce the average execution time in a fault-tolerant multiprocessor System-on-Chip, which uses Network-on-Chip as an underlying interconnect architecture, so that the fault-tolerant technique (RRC) can perform efficiently.

APA, Harvard, Vancouver, ISO, and other styles
19

Qi, Ji. "System-level design automation and optimisation of network-on-chips in terms of timing and energy." Thesis, University of Southampton, 2015. https://eprints.soton.ac.uk/386210/.

Full text
Abstract:
As system complexity constantly increases, traditional bus-based architectures are less adaptable to the increasing design demands. Specifically in on-chip digital system designs, Network-on-Chip (NoC) architectures are promising platforms that have distributed multi-core co-operation and inter-communication. Since the design cost and time cycles of NoC systems are growing rapidly with higher integration, systemlevel Design Automation (DA) techniques are used to abstract models at early design stages for functional validation and performance prediction. Yet precise abstractions and efficient simulations are critical challenges for modern DA techniques to improve the design efficiency. This thesis makes several contributions to address these challenges. We have firstly extended a backbone simulator, NIRGAM, to offer accurate system level models and performance estimates. A case study of developing a one-to-one transmission system using asynchronous FIFOs as buffers in both the NIRGAM simulator and a synthesised gate-level design is given to validate the model accuracy by comparing their power and timing performance. Then we have made a second contribution to improve DA techniques by proposing a novel method to efficiently emulate non-rectangular NoC topologies in NIRGAM and generating accurate energy and timing performance. Our proposed method uses time regulated models to emulate virtual non-rectangular topologies based on a regular Mesh. The performance accuracy of virtual topologies is validated by comparing with corresponding real NoC topologies. The third contribution of our research is a novel task-mapping scheme that generates optimal mappings to tile-based NoC networks with accurate performance prediction and increased execution speed. A novel Non-Linear Programming (NLP) based mapping problem has been formulated and solved by a modified Branch and Bound (BB) algorithm. The proposed method predicts the performance of optimised mappings and compares it with NIRGAM simulations for accuracy validation.
APA, Harvard, Vancouver, ISO, and other styles
20

Rajamanikkam, Chidhambaranathan. "Understanding Security Threats of Emerging Computing Architectures and Mitigating Performance Bottlenecks of On-Chip Interconnects in Manycore NTC System." DigitalCommons@USU, 2019. https://digitalcommons.usu.edu/etd/7453.

Full text
Abstract:
Emerging computing architectures such as, neuromorphic computing and third party intellectual property (3PIP) cores, have attracted significant attention in the recent past. Neuromorphic Computing introduces an unorthodox non-von neumann architecture that mimics the abstract behavior of neuron activity of the human brain. They can execute more complex applications, such as image processing, object recognition, more efficiently in terms of performance and energy than the traditional microprocessors. However, focus on the hardware security aspects of the neuromorphic computing at its nascent stage. 3PIP core, on the other hand, have covertly inserted malicious functional behavior that can inflict range of harms at the system/application levels. This dissertation examines the impact of various threat models that emerges from neuromorphic architectures and 3PIP cores. Near-Threshold Computing (NTC) serves as an energy-efficient paradigm by aggressively operating all computing resources with a supply voltage closer to its threshold voltage at the cost of performance. Therefore, STC system is scaled to many-core NTC system to reclaim the lost performance. However, the interconnect performance in many-core NTC system pose significant bottleneck that hinders the performance of many-core NTC system. This dissertation analyzes the interconnect performance, and further, propose a novel technique to boost the interconnect performance of many-core NTC system.
APA, Harvard, Vancouver, ISO, and other styles
21

Vangal, Sriram R. "Performance and Energy Efficient Building Blocks for Network-on-Chip Architectures." Licentiate thesis, Linköping : Linköpings universitet, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-7845.

Full text
APA, Harvard, Vancouver, ISO, and other styles
22

Runge, Armin. "Advances in Deflection Routing based Network on Chips." Doctoral thesis, 2017. https://nbn-resolving.org/urn:nbn:de:bvb:20-opus-149700.

Full text
Abstract:
The progress which has been made in semiconductor chip production in recent years enables a multitude of cores on a single die. However, due to further decreasing structure sizes, fault tolerance and energy consumption will represent key challenges. Furthermore, an efficient communication infrastructure is indispensable due to the high parallelism at those systems. The predominant communication system at such highly parallel systems is a Network on Chip (NoC). The focus of this thesis is on NoCs which are based on deflection routing. In this context, contributions are made to two domains, fault tolerance and dimensioning of the optimal link width. Both aspects are essential for the application of reliable, energy efficient, and deflection routing based NoCs. It is expected that future semiconductor systems have to cope with high fault probabilities. The inherently given high connectivity of most NoC topologies can be exploited to tolerate the breakdown of links and other components. In this thesis, a fault-tolerant router architecture has been developed, which stands out for the deployed interconnection architecture and the method to overcome complex fault situations. The presented simulation results show, all data packets arrive at their destination, even at high fault probabilities. In contrast to routing table based architectures, the hardware costs of the herein presented architecture are lower and, in particular, independent of the number of components in the network. Besides fault tolerance, hardware costs and energy efficiency are of great importance. The utilized link width has a decisive influence on these aspects. In particular, at deflection routing based NoCs, over- and under-sizing of the link width leads to unnecessary high hardware costs and bad performance, respectively. In the second part of this thesis, the optimal link width at deflection routing based NoCs is investigated. Additionally, a method to reduce the link width is introduced. Simulation and synthesis results show, the herein presented method allows a significant reduction of hardware costs at comparable performance
Die Fortschritte der letzten Jahre bei der Fertigung von Halbleiterchips ermöglichen eine Vielzahl an Rechenkernen auf einem einzelnen Chip. Die in diesem Zusammenhang immer weiter sinkenden Strukturgrößen führen jedoch dazu, dass Fehlertoleranz und Energieverbrauch zentrale Herausforderungen darstellen werden. Aufgrund der hohen Parallelität in solchen Systemen, ist außerdem eine leistungsfähige Kommunikationsinfrastruktur unabdingbar. Das in diesen hochgradig parallelen Systemen überwiegend eingesetzte System zur Datenübertragung ist ein Netzwerk auf einem Chip (engl. Network on Chip (NoC)). Der Fokus dieser Dissertation liegt auf NoCs, die auf dem Prinzip des sog. Deflection Routing basieren. In diesem Kontext wurden Beiträge zu zwei Bereichen geleistet, der Fehlertoleranz und der Dimensionierung der optimalen Breite von Verbindungen. Beide Aspekte sind für den Einsatz zuverlässiger, energieeffizienter, Deflection Routing basierter NoCs essentiell. Es ist davon auszugehen, dass zukünftige Halbleiter-Systeme mit einer hohen Fehlerwahrscheinlichkeit zurecht kommen müssen. Die hohe Konnektivität, die in den meisten NoC Topologien inhärent gegeben ist, kann ausgenutzt werden, um den Ausfall von Verbindungen und anderen Komponenten zu tolerieren. Im Rahmen dieser Arbeit wurde vor diesem Hintergrund eine fehlertolerante Router-Architektur entwickelt, die sich durch das eingesetzte Verbindungsnetzwerk und das Verfahren zur Überwindung komplexer Fehlersituationen auszeichnet. Die präsentierten Simulations-Ergebnisse zeigen, dass selbst bei sehr hohen Fehlerwahrscheinlichkeiten alle Datenpakete ihr Ziel erreichen. Im Vergleich zu Router-Architekturen die auf Routing-Tabellen basieren, sind die Hardware-Kosten der hier vorgestellten Router-Architektur gering und insbesondere unabhängig von der Anzahl an Komponenten im Netzwerk, was den Einsatz in sehr großen Netzen ermöglicht. Neben der Fehlertoleranz sind die Hardware-Kosten sowie die Energieeffizienz von NoCs von großer Bedeutung. Einen entscheidenden Einfluss auf diese Aspekte hat die verwendete Breite der Verbindungen des NoCs. Insbesondere bei Deflection Routing basierten NoCs führt eine Über- bzw. Unterdimensionierung der Breite der Verbindungen zu unnötig hohen Hardware-Kosten bzw. schlechter Performanz. Im zweiten Teil dieser Arbeit wird die optimale Breite der Verbindungen eines Deflection Routing basierten NoCs untersucht. Außerdem wird ein Verfahren zur Reduzierung der Breite dieser Verbindungen vorgestellt. Simulations- und Synthese-Ergebnisse zeigen, dass dieses Verfahren eine erhebliche Reduzierung der Hardware-Kosten bei ähnlicher Performanz ermöglicht
APA, Harvard, Vancouver, ISO, and other styles
23

Deshpande, Hrishikesh. "Multipath Router Architectures to Reduce Latency in Network-on-Chips." Thesis, 2012. http://hdl.handle.net/1969.1/ETD-TAMU-2012-05-11208.

Full text
Abstract:
The low latency is a prime concern for large Network-on-Chips (NoCs) typically used in chip-multiprocessors (CMPs) and multiprocessor system-on-chips (MPSoCs). A significant component of overall latency is the serialization delay for applications which have long packets such as typical video stream traffic. To address the serialization latency, we propose to exploit the inherent path diversity available in a typical 2-D Mesh with our two novel router architectures, Dual-path router and Dandelion router. We observe that, in a 2-D mesh, for any source-destination pair, there are two minimal paths along the edges of the bounding box. We call it XY Dimension Order Routing (DOR) and YX DOR. There are also two non-minimal paths which are non-coinciding and out of the bounding box created by XY and YX DOR paths. Dual-path Router implements two injection and two ejection ports for parallel packet injection through two minimal paths. Packets are split into two halves and injected simultaneously into the network. Dandelion router implements four injection and ejection ports for parallel packet injection. Packets are split into smaller sub-packets and are injected simultaneously in all possible directions which typically include two minimal paths and two non-minimal paths. When all the sub-packets reach the destination, they are eventually recombined. We find that our technique significantly increases the throughput and reduces the serialization latency and hence overall latency of long packets. We explore the impact of Dual-path and Dandelion on various packet lengths in order to prove the advantage of our routers over the baseline. We further implement different deadlock free disjoint path models for Dandelion and develop a switching mechanism between Dual-path and Dandelion based on the traffic congestion.
APA, Harvard, Vancouver, ISO, and other styles
24

Malave-Bonet, Javier. "A Benchmarking Platform For Network-On-Chip (NOC) Multiprocessor System-On- Chips." Thesis, 2010. http://hdl.handle.net/1969.1/ETD-TAMU-2010-12-8662.

Full text
Abstract:
Network-on-Chip (NOC) based designs have garnered significant attention from both researchers and industry over the past several years. The analysis of these designs has focused on broad topics such as NOC component micro-architecture, fault-tolerant communication, and system memory architecture. Nonetheless, the design of lowlatency, high-bandwidth, low-power and area-efficient NOC is extremely complex due to the conflicting nature of these design objectives. Benchmarks are an indispensable tool in the design process; providing thorough measurement and fair comparison between designs in order to achieve optimal results (i.e performance, cost, quality of service). This research proposes a benchmarking platform called NoCBench for evaluating the performance of Network-on-chip. Although previous research has proposed standard guidelines to develop benchmarks for Network-on-Chip, this work moves forward and proposes a System-C based simulation platform for system-level design exploration. It will provide an initial set of synthetic benchmarks for on-chip network interconnection validation along with an initial set of standardized processing cores, NOC components, and system-wide services. The benchmarks were constructed using synthetic applications described by Task Graphs For Free (TGFF) task graphs extracted from the E3S benchmark suite. Two benchmarks were used for characterization: Consumer and Networking. They are characterized based on throughput and latency. Case studies show how they can be used to evaluate metrics beyond throughput and latency (i.e. traffic distribution). The contribution of this work is two-fold: 1) This study provides a methodology for benchmark creation and characterization using NoCBench that evaluates important metrics in NOC design (i.e. end-to-end packet delay, throughput). 2) The developed full-system simulation platform provides a complete environment for further benchmark characterization on NOC based MpSoC as well as system-level design space exploration.
APA, Harvard, Vancouver, ISO, and other styles
25

Jheng, Cyun-Yi, and 鄭群逸. "On-line Real-Time Task Management for Three-Dimensional Network on Chips." Thesis, 2012. http://ndltd.ncl.edu.tw/handle/25059078362492973281.

Full text
Abstract:
碩士
國立臺灣科技大學
電機工程系
100
Three-dimensional network on chips (3D NoCs) have been developed to achieve high computation and communication throughput for the growing demands of applications. But with increased applications and cores, real-time task management for 3D NoCs becomes a major challenge. The difficulty of this process is due to the precedence constraint between applications and the communication overhead between cores. In this paper, we propose an on-line real-time task management framework for 3D NoCs to minimize the makespan of each application and minimize the communication overhead of systems, in order to meet the quality of service requirements and maximize system performance. Schedulability and scalability of the proposed methodology is evaluated by a series of experiments, with encouraging results.
APA, Harvard, Vancouver, ISO, and other styles
26

Kansal, Rohan. "A Pure STT-MRAM Design for High-bandwidth Low-power On-chip Interconnects." Thesis, 2013. http://hdl.handle.net/1969.1/151161.

Full text
Abstract:
Network-on-Chip (NoC) is a de facto inter-core communication infrastructure for future Chip Multiprocessors (CMPs). NoC should be designed to provide both low latency and high bandwidth considering limited on-chip power and area budgets. The use of a high density and low leakage memory, Spin-Torque Transfer Magnetic RAM (STT-MRAM), in NoC routers has been proposed as it increases network throughput by providing more buffer capacities with the same die footprint. However, the inevitable use of SRAM to hide the long write latencies of STT-MRAM sacrifices buffer area and also wastes significant leakage and dynamic power in migrating flits between the disparate memories. In this thesis, the first NoC router designs that use only STT-MRAM is proposed. This allows for a much larger buffer space with the least power consumptions. To overcome the multi-cycle writes, a multi-banked STT-MRAM buffer is employed, which is a logically divided virtual channel where every incoming flit is seamlessly pipelined to each bank alternately every clock cycle simple latches inside the router links. Our STT-MRAM has aggressively reduced retention time, resulting in a significant reduction in latency and power overheads of write operations. We observe flit losses in our STT-MRAM buffer, and propose cost-efficient dynamic buffer refresh schemes to minimize unnecessary refreshes with minimum hardware overheads. Simulation results show that our STT-MRAM NoC router enhances the throughput by 21.6% and achieves 61% savings in dynamic power and 18% savings in total router power, respectively compared to a conventional SRAM based NoC router of same area.
APA, Harvard, Vancouver, ISO, and other styles
27

Jain, Tushar Naveen Kumar. "Asynchronous Bypass Channels Improving Performance for Multi-synchronous Network-on-chips." Thesis, 2010. http://hdl.handle.net/1969.1/ETD-TAMU-2010-08-8457.

Full text
Abstract:
Dr. Paul V. Gratz Network-on-Chip (NoC) designs have emerged as a replacement for traditional shared-bus designs for on-chip communications. As with all current VLSI design, however, reducing power consumption in NoCs is a critical challenge. One approach to reduce power is to dynamically scale the voltage and frequency of each network node or groups of nodes (DVFS). Another approach to reduce power consumption is to replace the balanced clock tree with a globally-asynchronous, locally-synchronous (GALS) clocking scheme. NoCs implemented with either of these schemes, however, tend to have high latencies as packets must be synchronized at the intermediate nodes between source and destination. In this work, we propose a novel router microarchitecture which offers superior performance versus typical synchroniz- ing router designs. Our approach features Asynchronous Bypass Channels (ABCs) at intermediate nodes thus avoiding synchronization delay. We also propose a new network topology and routing algorithm that leverage the advantages of the bypass channel offered by our router design. Our experiments show that our design improves the performance of a conventional synchronizing design with similar resources by up to 26 percent at low loads and increases saturation throughput by up to 50 percent.
APA, Harvard, Vancouver, ISO, and other styles
28

Liu, Hsiang-Ning, and 劉祥甯. "Infrastructure IPs for Testing and Repairing RAMs in Network-on-Chips." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/40985457839375557301.

Full text
Abstract:
碩士
國立中央大學
電機工程研究所
96
With the advent CMOS technology, more and more transistors can be integrated in a single chip. To cope with the bottleneck of performance, power, reliability, etc. of a complicated chip, the chip designer tends to design the chip using a regular architecture with on-chip communication network. Network-on-Chip (NoC) is one popular on-chip communication approach for large-scale chips. Also, memory core usually is the most used component in such complex chips. This thesis proposes a packet-based built-in self-test (BIST) scheme and a packet-based built-in self-repair (BISR) scheme for testing and repairing random access memories (RAMs) in mesh-based NoCs. The BIST and BISR schemes reuse the NoC to transport test patterns such that the number of RAMs tested and repaired by the BIST and BISR circuits are not limited by the routing issue. Therefore, the area overhead of the BIST and BISR circuits can be drastically reduced. Moreover, the proposed BIST and BISR schemes can reduce the memory test time and increase the yield of the chip. Experimental results show that the area overhead of the packet-based BIST circuit and the packet-based BISR circuit is only about 0.92% and 1.38% for fifteen 8Kx64-bit RAMs, respectively. Also, the BISR scheme can efficiently boost the yield of the chip. For example, the yield of fifteen 8Kx64-bit RAMs can be boosted from 80% to 94%.
APA, Harvard, Vancouver, ISO, and other styles
29

Chao, Wen-Shuo, and 趙文碩. "Design of an Express Router for Mesh-Based Network-on-Chips." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/80910400624104792012.

Full text
Abstract:
碩士
國立雲林科技大學
資訊工程研究所
99
Network-on-Chip (NoC) is a better solution to fulfill modern on chip interconnection design. It includes routers, links, and intellectual properties (IPs). The router is an important component for delivering packets in NoCs. In the literature, the common design of a router is based on a 5x5 crossbar interconnect to pass packets between IPs. In this study, we propose a novel router architecture that can reduce the distance of a hop in the transmission path of each packet, receive a lower average packet delay, and avoid the congestion at the terminal node of the path. The proposed router is an express router, which make packets bypass a destination router to its destination IP directly. In comparison with a traditional router, experimental results show that the performance of NoCs with E-Routers can be enhanced a lot.
APA, Harvard, Vancouver, ISO, and other styles
30

Lee, Wan-Yu, and 李婉毓. "Topology Generation and Floorplanning for Low Power Application-Specific Network-on-Chips." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/19177299131868851848.

Full text
APA, Harvard, Vancouver, ISO, and other styles
31

Jao, Zhi-Chun, and 饒智淳. "Design of a Partially Buffered Crossbar Router for Mesh-based Network-on-Chips." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/24177252714651534860.

Full text
Abstract:
碩士
國立雲林科技大學
資訊工程研究所
99
With an increase in the number of transistors on-chip, the complexity of the system also increases. In order to cope with the growing interconnection infrastructure, the Networks-on-chip (NoC) concept was introduced. The router plays an important role since it can affect the overall performance of the NoC. In the literature, the NoC employs input-queued routers to receive and transmit packets. In this study, we employ a Partially Buffered Crossbar (PBC) router architecture which consists of virtual channels (VCs) and a small number of separate internal buffers that are maintained per fabric column output for the NoC. In our experiment, results show that the performance of a PBC router can improve a lot compared with a traditional virtual channel (TVC) router.
APA, Harvard, Vancouver, ISO, and other styles
32

Hsin, Hsien-Kai, and 辛賢楷. "Ant-Colony Optimization-based Adaptive Routing Algorithms and Architectures in Network-on-Chips Systems." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/58611715011937821057.

Full text
Abstract:
博士
國立臺灣大學
電子工程學研究所
102
As semiconductor technology continues to advance, increasing complexity and interconnection delay are becoming limiting factors in system-on-chip (SoC) designs. To increase the efficiency of interconnections and meet data transfer requirements, network-on-chip (NoC) systems have proven to be a flexible, scalable, and reusable solution for chip multiprocessor (CMP) systems. To achieve a high system throughput rate, the packet-switched NoC multiplexes packets on channels and shares network resources among these packet flows. However, the packet congestion problem in channels results in unpredictable delays for each packet flow. As the system size increases, the network traffic load tends to become unbalanced with various applications. The congestion in channels increases queuing delays in the routing path, which not only causes network congestion but also dissipates additional energy. Congestion problem cause severely degradation on the overall system performance, especially in real-time applications with strict latency requirements. Therefore, to overcome the problem of traffic congestion, packet routing is a critical design challenge for high-performance NoC. An effective adaptive routing algorithm can help minimize network congestion through load balancing. However, conventional adaptive routing schemes only use current channel-based information to detect the congestion status. Because of the lack of historical network information, channel-based information has difficulty showing the real congestion status under time-variant traffic patterns. To predict temporal network congestion, we apply a bio-inspired approach, Ant Colony Optimization (ACO), to identify the near-future non-congested path to a desired target according to historical network information. Distributed artificial ant agents migrate from node to node and emulate laying of pheromone by updating the corresponding entries in the routing (or pheromone) tables in different nodes which record. However, conventional ACO-based adaptive routing have not consider the integration of spatial/temporal information and multiple congestion factors includes deadlock and faulty nodes. There are three main topics in this work. First, we use additional temporal and spatial information provides better approximation of network status for global load-balancing. In spatial domain, to acquire the spatial range of congestion information, we record historical buffer information from routers within two-hop of distances, which helps to extend spatial pheromone coverage. In temporal domain, we adopt the concept of Exponential Moving Average (EMA) from stock market to use multiple pheromone for capturing hidden-state dependencies of upcoming congestion status In the second part of this dissertation, we establish a framework on analyzing the network information and showed how to integrate the spatial and temporal network information. The proposed framework can indicate arbitrary combinations of network information and corresponding routing algorithms. Based on this framework, we use the concept of diffusive pheromone to integration the information and show that we can reconfigure the ACO-PhD algorithm to each routing algorithm in its subsets by adjusting the parameter settings. In the third part of this dissertation, we integrate the congestion-awareness, deadlock-awareness, and fault-awareness information in channel evaluation function to avoid the hotspot around the faulty router. The three steps behavior of an ant colony while facing an obstacle (failure in NoC) as 1) encounter, 2) search, and 3) select is transformed into effective detouring mechanisms to increase the system throughput and faulty tolerance under faulty network. In summary, the proposed routing schemes can effectively mitigate the spatial and temporal traffic congestion in NoC and achieve good performance with feasible cost.
APA, Harvard, Vancouver, ISO, and other styles
33

Chang, En-Jui, and 張恩瑞. "Congestion-Aware High-Efficiency Adaptive Routing Algorithms and Architectures in Network-on-Chips Systems." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/34194738558208248837.

Full text
Abstract:
博士
國立臺灣大學
電子工程學研究所
102
As semiconductor technology continues to advance, increasing complexity and interconnection delay are becoming limiting factors in system-on-chip (SoC) designs. To increase the efficiency of interconnections and meet data transfer requirements, network-on-chip (NoC) systems have proven to be a flexible, scalable, and reusable solution for chip multiprocessor (CMP) systems. To achieve a high system throughput rate, the packet-switched NoC multiplexes packets on channels and shares network resources among these packet flows. However, the packet contention problem in switches results in unpredictable delays for each packet flow. As the system size increases, the network traffic load tends to become unbalanced with various applications. Switches and channels are prone to congestion, which increases queuing delays in the routing path. It not only causes network congestion but also dissipates additional energy. Congestion problem severely degrades the overall system performance, especially in real-time applications with strict latency requirements. Therefore, to overcome the problem of traffic congestion, packet routing is a critical design challenge for high-performance NoC. An effective adaptive routing algorithm can help minimize network congestion through load balancing. However, conventional adaptive routing schemes only use current channel-based information to detect the congestion status. Because of the lack of fine-grained path-congestion model and historical network information, channel-based information has difficulty showing the real congestion status under non-uniform and time-variant traffic patterns. To effectively relieve spatial traffic congestion and predict traffic-flow trends, we propose model-based and bio-inspired routing schemes. In this dissertation, our goal is to both improve the selection efficiency and cost efficiency of adaptive routing algorithms in the resource-limited NoC system. There are two main topics in this work. First, to figure out hidden spatial congestion information, we analyze the router latency and propose a model-based approach to improve the selection efficiency of adaptive routing algorithms. Namely, it simultaneously considers two congestion situations, switch congestion and channel congestion. Moreover, to overcome imbalance traffic problem, we propose a path-diversity-aware adaptive routing scheme to evenly distribute traffic load on the available channels. In the second part of this dissertation, to predict temporal network congestion, we apply a bio-inspired approach, Ant Colony Optimization (ACO), to identify the near-future non-congested path to a desired target according to historical network information. However, the cost of ACO-based adaptive routing is too high for implementation in resource-limited NoCs. Therefore, in considering the NoC topology, router dependency, and pheromone characteristics, we propose a cost-efficient ACO-based adaptive routing with a regional routing table, which has potential to reduce routing table cost. Moreover, to further improve the selection efficiency of ACO-based routing, we propose the early backward-ant mechanism to provide extra feedback congestion to enhance the learning process, which improves the system performance. In summary, the proposed routing schemes can effectively mitigate the spatial and temporal traffic congestion in NoC and achieve a good trade-off between cost and performance.
APA, Harvard, Vancouver, ISO, and other styles
34

Narayanasetty, Bhargavi. "Analysis of high performance interconnect in SoC with distributed switches and multiple issue bus protocols." Thesis, 2011. http://hdl.handle.net/2152/ETD-UT-2011-05-3325.

Full text
Abstract:
In a System on a Chip (SoC), interconnect is the factor limiting Performance, Power, Area and Schedule (PPAS). Distributed crossbar switches also called as Switching Central Resources (SCR) are often used to implement high performance interconnect in a SoC – Network on a Chip (NoC). Multiple issue bus protocols like AXI (from ARM), VBUSM (from TI) are used in paths critical to the performance of the whole chip. Experimental analysis of effects on PPAS by architectural modifications to the SCRs is carried out, using synthesis tools and Texas Instruments (TI) in house power estimation tools. The effects of scaling of SCR sizes are discussed in this report. These results provide a quick means of estimation for architectural changes in the early design phase. Apart from SCR design, the other major domain, which is a concern, is deadlocks. Deadlocks are situations where the network resources are suspended waiting for each other. In this report various kinds of deadlocks are classified and their respective mitigations in such networks are provided. These analyses are necessary to qualify distributed SCR interconnect, which uses multiple issue protocols, across all scenarios of transactions. The entire analysis in this report is carried out using a flagship product of Texas Instruments. This ASIC SoC is a complex wireless base station developed in 2010- 2011, having 20 major cores. Since the parameters of crossbar switches with multiple issue bus protocols are commonly used in SoCs across the semiconductor industry, this reports provides us a strong basis for architectural/design selection and validation of all such high performance device interconnects. This report can be used as a seed for the development of an interface tool for architects. For a given architecture, the tool suggests architectural modifications, and reports deadlock situations. This new tool will aid architects to close design problems and bring provide a competitive specification very early in the design cycle. A working algorithm for the tool development is included in this report.
text
APA, Harvard, Vancouver, ISO, and other styles
35

Biswas, Arnab Kumar. "Securing Multiprocessor Systems-on-Chip." Thesis, 2016. https://etd.iisc.ac.in/handle/2005/2554.

Full text
Abstract:
With Multiprocessor Systems-on-Chips (MPSoCs) pervading our lives, security issues are emerging as a serious problem and attacks against these systems are becoming more critical and sophisticated. We have designed and implemented different hardware based solutions to ensure security of an MPSoC. Security assisting modules can be implemented at different abstraction levels of an MPSoC design. We propose solutions both at circuit level and system level of abstractions. At the VLSI circuit level abstraction, we consider the problem of presence of noise voltage in input signal coming from outside world. This noise voltage disturbs the normal circuit operation inside a chip causing false logic reception. If the disturbance is caused intentionally the security of a chip may be compromised causing glitch/transient attack. We propose an input receiver with hysteresis characteristic that can work at voltage levels between 0.9V and 5V. The circuit can protect the MPSoC from glitch/transient attack. At the system level, we propose solutions targeting Network-on-Chip (NoC) as the on-chip communication medium. We survey the possible attack scenarios on present-day MPSoCs and investigate a new attack scenario, i.e., router attack targeted toward NoC enabled MPSoC. We propose different monitoring-based countermeasures against routing table-based router attack in an MPSoC having multiple Trusted Execution Environments (TEEs). Software attacks, the most common type of attacks, mainly exploit vulnerabilities like buffer overflow. This is possible if proper access control to memory is absent in the system. We propose four hardware based mechanisms to implement Role Based Access Control (RBAC) model in NoC based MPSoC.
MHRD PhD scholarship
APA, Harvard, Vancouver, ISO, and other styles
36

Biswas, Arnab Kumar. "Securing Multiprocessor Systems-on-Chip." Thesis, 2016. http://etd.iisc.ernet.in/handle/2005/2554.

Full text
Abstract:
MHRD PhD scholarship
With Multiprocessor Systems-on-Chips (MPSoCs) pervading our lives, security issues are emerging as a serious problem and attacks against these systems are becoming more critical and sophisticated. We have designed and implemented different hardware based solutions to ensure security of an MPSoC. Security assisting modules can be implemented at different abstraction levels of an MPSoC design. We propose solutions both at circuit level and system level of abstractions. At the VLSI circuit level abstraction, we consider the problem of presence of noise voltage in input signal coming from outside world. This noise voltage disturbs the normal circuit operation inside a chip causing false logic reception. If the disturbance is caused intentionally the security of a chip may be compromised causing glitch/transient attack. We propose an input receiver with hysteresis characteristic that can work at voltage levels between 0.9V and 5V. The circuit can protect the MPSoC from glitch/transient attack. At the system level, we propose solutions targeting Network-on-Chip (NoC) as the on-chip communication medium. We survey the possible attack scenarios on present-day MPSoCs and investigate a new attack scenario, i.e., router attack targeted toward NoC enabled MPSoC. We propose different monitoring-based countermeasures against routing table-based router attack in an MPSoC having multiple Trusted Execution Environments (TEEs). Software attacks, the most common type of attacks, mainly exploit vulnerabilities like buffer overflow. This is possible if proper access control to memory is absent in the system. We propose four hardware based mechanisms to implement Role Based Access Control (RBAC) model in NoC based MPSoC.
APA, Harvard, Vancouver, ISO, and other styles
37

Basavaraj, T. "NoC Design & Optimization of Multicore Media Processors." Thesis, 2013. http://etd.iisc.ac.in/handle/2005/3296.

Full text
Abstract:
Network on Chips[1][2][3][4] are critical elements of modern System on Chip(SoC) as well as Chip Multiprocessor(CMP)designs. Network on Chips (NoCs) help manage high complexity of designing large chips by decoupling computation from communication. SoCs and CMPs have a multiplicity of communicating entities like programmable processing elements, hardware acceleration engines, memory blocks as well as off-chip interfaces. With power having become a serious design constraint[5], there is a great need for designing NoC which meets the target communication requirements, while minimizing power using all the tricks available at the architecture, microarchitecture and circuit levels of the de-sign. This thesis presents a holistic, QoS based, power optimal design solution of a NoC inside a CMP taking into account link microarchitecture and processor tile configurations. Guaranteeing QoS by NoCs involves guaranteeing bandwidth and throughput for connections and deterministic latencies in communication paths. Label Switching based Network-on-Chip(LS-NoC) uses a centralized LS-NoC Management framework that engineers traffic into QoS guaranteed routes. LS-NoC uses label switching, enables band-width reservation, allows physical link sharing and leverages advantages of both packet and circuit switching techniques. A flow identification algorithm takes into account band-width available in individual links to establish QoS guaranteed routes. LS-NoC caters to the requirements of streaming applications where communication channels are fixed over the lifetime of the application. The proposed NoC framework inherently supports heterogeneous and ad-hoc SoC designs. A multicast, broadcast capable label switched router for the LS-NoC has been de-signed, verified, synthesized, placed and routed and timing analyzed. A 5 port, 256 bit data bus, 4 bit label router occupies 0.431 mm2 in 130nm and delivers peak band-width of80Gbits/s per link at312.5MHz. LS Router is estimated to consume 43.08 mW. Bandwidth and latency guarantees of LS-NoC have been demonstrated on streaming applications like Hiper LAN/2 and Object Recognition Processor, Constant Bit Rate traffic patterns and video decoder traffic representing Variable Bit Rate traffic. LS-NoC was found to have a competitive figure of merit with state-of-the-art NoCs providing QoS. We envision the use of LS-NoC in general purpose CMPs where applications demand deterministic latencies and hard bandwidth requirements. Design variables for interconnect exploration include wire width, wire spacing, repeater size and spacing, degree of pipelining, supply, threshold voltage, activity and coupling factors. An optimal link configuration in terms of number of pipeline stages for a given length of link and desired operating frequency is arrived at. Optimal configurations of all links in the NoC are identified and a power-performance optimal NoC is presented. We presents a latency, power and performance trade-off study of NoCs using link microarchitecture exploration. The design and implementation of a framework for such a design space exploration study is also presented. We present the trade-off study on NoCs by varying microarchitectural(e.g. pipelining) and circuit level(e.g. frequency and voltage) parameters. A System-C based NoC exploration framework is used to explore impacts of various architectural and microarchitectural level parameters of NoC elements on power and performance of the NoC. The framework enables the designer to choose from a variety of architectural options like topology, routing policy, etc., as well as allows experimentation with various microarchitectural options for the individual links like length, wire width, pitch, pipelining, supply voltage and frequency. The framework also supports a flexible traffic generation and communication model. Latency, power and throughput results using this framework to study a 4x4 CMP are presented. The framework is used to study NoC designs of a CMP using different classes of parallel computing benchmarks[6]. One of the key findings is that the average latency of a link can be reduced by increasing pipeline depth to a certain extent, as it enables link operation at higher link frequencies. Abstract There exists an optimum degree of pipelining which minimizes the energy-delay product of the link. In a 2D Torus when the longest link is pipelined by 4 stages at which point least latency(1.56 times minimum) is achieved and power(40% of max) and throughput (64%of max) are nominal. Using frequency scaling experiments, power variations of up to40%,26.6% and24% can be seen in 2D Torus, Reduced 2D Torus and Tree based NoC between various pipeline configurations to achieve same frequency at constant voltages. Also in some cases, we find that switching to a higher pipelining configuration can actually help reduce power as the links can be designed with smaller repeaters. We also find that the overall performance of the ICNs is determined by the lengths of the links needed to support the communication patterns. Thus the mesh seems to perform the best amongst the three topologies(Mesh, Torus and Folded Torus) considered in case studies. The effects of communication overheads on performance, power and energy of a multiprocessor chip using L1,L2 cache sizes as primary exploration parameters using accurate interconnect, processor, on-chip and off-chip memory modelling are presented. On-chip and off-chip communication times have significant impact on execution time and the energy efficiency of CMPs. Large cache simply larger tile area that result in longer inter-tile communication link lengths and latencies, thus adversely impacting communication time. Smaller caches potentially have higher number of misses and frequent of off-tile communication. Energy efficient tile design is a configuration exploration and trade-off study using different cache sizes and tile areas to identify a power-performance optimal configuration for the CMP. Trade-offs are explored using a detailed, cycle accurate, multicore simulation frame-work which includes superscalar processor cores, cache coherent memory hierarchies, on-chip point-to-point communication networks and detailed interconnect model including pipelining and latency. Sapphire, a detailed multiprocessor execution environment integrating SESC, Ruby and DRAM Sim was used to run applications from the Splash2 benchmark(64KpointFFT).Link latencies are estimated for a16 core CMP simulation on Sapphire. Each tile has a single processor, L1 and L2 caches and a router. Different sizesofL1 andL2lead to different tile clock speeds, tile miss rates and tile area and hence interconnect latency. Simulations across various L1, L2 sizes indicate that the tile configuration that maximizes energy efficiency is related to minimizing communication time. Experiments also indicate different optimal tile configurations for performance, energy and energy efficiency. Clustered interconnection network, communication aware cache bank mapping and thread mapping to physical cores are also explored as potential energy saving solutions. Results indicate that ignoring link latencies can lead to large errors in estimates of program completion times, of up to 17%. Performance optimal configurations are achieved at lower L1 caches and at moderateL2 cache sizes due to higher operating frequencies and smaller link lengths and comparatively lesser communication. Using minimal L1 cache size to operate at the highest frequency may not always be the performance-power optimal choice. Larger L1 sizes, despite a drop in frequency, offer a energy advantage due to lesser communication due to misses. Clustered tile placement experiments for FFT show considerable performance per watt improvement (1.2%). Remapping most accessed L2 banks by a process in the same core or neighbouring cores after communication traffic analysis offers power and performance advantages. Remapped processes and banks in clustered tile placement show a performance per watt improvement of5.25% and energy reductionof2.53%. This suggests that processors could execute a program in multiple modes, for example, minimum energy, maximum performance.
APA, Harvard, Vancouver, ISO, and other styles
38

Basavaraj, T. "NoC Design & Optimization of Multicore Media Processors." Thesis, 2013. http://etd.iisc.ernet.in/2005/3296.

Full text
Abstract:
Network on Chips[1][2][3][4] are critical elements of modern System on Chip(SoC) as well as Chip Multiprocessor(CMP)designs. Network on Chips (NoCs) help manage high complexity of designing large chips by decoupling computation from communication. SoCs and CMPs have a multiplicity of communicating entities like programmable processing elements, hardware acceleration engines, memory blocks as well as off-chip interfaces. With power having become a serious design constraint[5], there is a great need for designing NoC which meets the target communication requirements, while minimizing power using all the tricks available at the architecture, microarchitecture and circuit levels of the de-sign. This thesis presents a holistic, QoS based, power optimal design solution of a NoC inside a CMP taking into account link microarchitecture and processor tile configurations. Guaranteeing QoS by NoCs involves guaranteeing bandwidth and throughput for connections and deterministic latencies in communication paths. Label Switching based Network-on-Chip(LS-NoC) uses a centralized LS-NoC Management framework that engineers traffic into QoS guaranteed routes. LS-NoC uses label switching, enables band-width reservation, allows physical link sharing and leverages advantages of both packet and circuit switching techniques. A flow identification algorithm takes into account band-width available in individual links to establish QoS guaranteed routes. LS-NoC caters to the requirements of streaming applications where communication channels are fixed over the lifetime of the application. The proposed NoC framework inherently supports heterogeneous and ad-hoc SoC designs. A multicast, broadcast capable label switched router for the LS-NoC has been de-signed, verified, synthesized, placed and routed and timing analyzed. A 5 port, 256 bit data bus, 4 bit label router occupies 0.431 mm2 in 130nm and delivers peak band-width of80Gbits/s per link at312.5MHz. LS Router is estimated to consume 43.08 mW. Bandwidth and latency guarantees of LS-NoC have been demonstrated on streaming applications like Hiper LAN/2 and Object Recognition Processor, Constant Bit Rate traffic patterns and video decoder traffic representing Variable Bit Rate traffic. LS-NoC was found to have a competitive figure of merit with state-of-the-art NoCs providing QoS. We envision the use of LS-NoC in general purpose CMPs where applications demand deterministic latencies and hard bandwidth requirements. Design variables for interconnect exploration include wire width, wire spacing, repeater size and spacing, degree of pipelining, supply, threshold voltage, activity and coupling factors. An optimal link configuration in terms of number of pipeline stages for a given length of link and desired operating frequency is arrived at. Optimal configurations of all links in the NoC are identified and a power-performance optimal NoC is presented. We presents a latency, power and performance trade-off study of NoCs using link microarchitecture exploration. The design and implementation of a framework for such a design space exploration study is also presented. We present the trade-off study on NoCs by varying microarchitectural(e.g. pipelining) and circuit level(e.g. frequency and voltage) parameters. A System-C based NoC exploration framework is used to explore impacts of various architectural and microarchitectural level parameters of NoC elements on power and performance of the NoC. The framework enables the designer to choose from a variety of architectural options like topology, routing policy, etc., as well as allows experimentation with various microarchitectural options for the individual links like length, wire width, pitch, pipelining, supply voltage and frequency. The framework also supports a flexible traffic generation and communication model. Latency, power and throughput results using this framework to study a 4x4 CMP are presented. The framework is used to study NoC designs of a CMP using different classes of parallel computing benchmarks[6]. One of the key findings is that the average latency of a link can be reduced by increasing pipeline depth to a certain extent, as it enables link operation at higher link frequencies. Abstract There exists an optimum degree of pipelining which minimizes the energy-delay product of the link. In a 2D Torus when the longest link is pipelined by 4 stages at which point least latency(1.56 times minimum) is achieved and power(40% of max) and throughput (64%of max) are nominal. Using frequency scaling experiments, power variations of up to40%,26.6% and24% can be seen in 2D Torus, Reduced 2D Torus and Tree based NoC between various pipeline configurations to achieve same frequency at constant voltages. Also in some cases, we find that switching to a higher pipelining configuration can actually help reduce power as the links can be designed with smaller repeaters. We also find that the overall performance of the ICNs is determined by the lengths of the links needed to support the communication patterns. Thus the mesh seems to perform the best amongst the three topologies(Mesh, Torus and Folded Torus) considered in case studies. The effects of communication overheads on performance, power and energy of a multiprocessor chip using L1,L2 cache sizes as primary exploration parameters using accurate interconnect, processor, on-chip and off-chip memory modelling are presented. On-chip and off-chip communication times have significant impact on execution time and the energy efficiency of CMPs. Large cache simply larger tile area that result in longer inter-tile communication link lengths and latencies, thus adversely impacting communication time. Smaller caches potentially have higher number of misses and frequent of off-tile communication. Energy efficient tile design is a configuration exploration and trade-off study using different cache sizes and tile areas to identify a power-performance optimal configuration for the CMP. Trade-offs are explored using a detailed, cycle accurate, multicore simulation frame-work which includes superscalar processor cores, cache coherent memory hierarchies, on-chip point-to-point communication networks and detailed interconnect model including pipelining and latency. Sapphire, a detailed multiprocessor execution environment integrating SESC, Ruby and DRAM Sim was used to run applications from the Splash2 benchmark(64KpointFFT).Link latencies are estimated for a16 core CMP simulation on Sapphire. Each tile has a single processor, L1 and L2 caches and a router. Different sizesofL1 andL2lead to different tile clock speeds, tile miss rates and tile area and hence interconnect latency. Simulations across various L1, L2 sizes indicate that the tile configuration that maximizes energy efficiency is related to minimizing communication time. Experiments also indicate different optimal tile configurations for performance, energy and energy efficiency. Clustered interconnection network, communication aware cache bank mapping and thread mapping to physical cores are also explored as potential energy saving solutions. Results indicate that ignoring link latencies can lead to large errors in estimates of program completion times, of up to 17%. Performance optimal configurations are achieved at lower L1 caches and at moderateL2 cache sizes due to higher operating frequencies and smaller link lengths and comparatively lesser communication. Using minimal L1 cache size to operate at the highest frequency may not always be the performance-power optimal choice. Larger L1 sizes, despite a drop in frequency, offer a energy advantage due to lesser communication due to misses. Clustered tile placement experiments for FFT show considerable performance per watt improvement (1.2%). Remapping most accessed L2 banks by a process in the same core or neighbouring cores after communication traffic analysis offers power and performance advantages. Remapped processes and banks in clustered tile placement show a performance per watt improvement of5.25% and energy reductionof2.53%. This suggests that processors could execute a program in multiple modes, for example, minimum energy, maximum performance.
APA, Harvard, Vancouver, ISO, and other styles
We offer discounts on all premium plans for authors whose works are included in thematic literature selections. Contact us to get a unique promo code!

To the bibliography