To see the other types of publications on this topic, follow the link: Set of instructions.

Dissertations / Theses on the topic 'Set of instructions'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Select a source type:

Consult the top 50 dissertations / theses for your research on the topic 'Set of instructions.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Press on it, and we will generate automatically the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as pdf and read online its abstract whenever available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Necsulescu, Philip I. "Automatic Generation of Hardware for Custom Instructions." Thèse, Université d'Ottawa / University of Ottawa, 2011. http://hdl.handle.net/10393/20153.

Full text
Abstract:
The Software/Hardware Implementation and Research Architecture (SHIRA) is a C to hardware toolchain developed by the Computer Architecture Research Group (CARG) of the University of Ottawa. The framework and algorithms to generate the hardware from an Intermediate Representation (IR) of the C code is needed. This dissertation presents the conceiving, design, and development of a module that generates the hardware for custom instructions identified by specialized SHIRA components without the need for any user interaction. The module is programmed in Java and takes a Data Flow Graph (DFG) as an IR for input. It then generates VHDL code that targets the Altera FPGAs. It is possible to use separate components for each operation or to set a maximum number for each component which leads to component reuse and reduces chip area use. The performance improvement of the generated code is compared to using only the processor’s standard instruction set.
APA, Harvard, Vancouver, ISO, and other styles
2

Nagpal, Radhika. "Store Buffers : implementing single cycle store instructions in write-through, write-back and set associative caches." Thesis, Massachusetts Institute of Technology, 1994. http://hdl.handle.net/1721.1/36678.

Full text
Abstract:
Thesis (B.S. and M.S.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1994.
Includes bibliographical references (p. 87).
This thesis proposes a new mechanism, called Store Buffers, for implementing single cycle store instructions in a pipelined processor. Single cycle store instructions are difficult to implement because in most cases the tag check must be performed before the data can be written into the data cache. Store buffers allow a store instruction to read the cache tag as it. passes through the pipe while keeping the store instruction data buffered in a backup register until the data cache is free. This strategy guarantees single cycle store execution without increasing the hit access time or degrading the performance of the data cache for simple direct-mapped caches, as well as for more complex set associative and write-back caches. As larger caches are incorporated on-chip, the speed of store instructions becomes an increasingly important part of the overall performance. The first part of the thesis describes the design and implementation of store buffers in write through, write-back, direct-mapped and set associative caches. The second part describes the implementation and simulation of store buffers in a 6-stage pipeline with a direct mapped write-through pipelined cache. The performance of this method is compared to other cache write techniques. Preliminary results show that store buffers perform better than other store strategies under high IO latencies and cache thrashing. With as few as three buffers, they significantly reduce the number of cycles per instruction.
by Radhika Nagpal.
B.S.and M.S.
APA, Harvard, Vancouver, ISO, and other styles
3

Schneider, Nicole. "Parameters: Suites of unique abstract prints generated through the use of a limited set of form-making instructions." Kent State University / OhioLINK, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=kent1334282036.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Zmily, Ahmad Darweesh. "Block-aware instruction set architecture /." May be available electronically:, 2007. http://proquest.umi.com/login?COPT=REJTPTU1MTUmSU5UPTAmVkVSPTI=&clientId=12498.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Schoepke, Olaf S. "Dense instruction set computer architecture." Thesis, University of Bath, 1992. https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.332540.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Shi, Xiaomu. "Certification of an Instruction Set Simulator." Phd thesis, Université de Grenoble, 2013. http://tel.archives-ouvertes.fr/tel-00937524.

Full text
Abstract:
Cette thèse expose nos travaux de certification d'une partie d'un programme C/C++ nommé SimSoC (Simulation of System on Chip), qui simule le comportement d'archi- tectures basées sur des processeurs tels que ARM, PowerPC, MIPS ou SH4. Un simulateur de System on Chip peut être utilisé pour developper le logiciel d'un système embarqué spécifique, afin de raccourcir les phases des développement et de test, en particulier quand la vitesse de simulation est réaliste (environ 100 millions d'instructions par seconde par cœur dans le cas de SimSoC). Les réductions de temps et de coût de développement obtenues se traduisent par des cycles de conception interactifs et rapides, en évitant la lourdeur d'un système de développement matériel. SimSoC est un logiciel complexe, comprenant environ 60 000 de C++, intégrant des parties écrites en SystemC et des optimisations non triviales pour atteindre une grande vitesse de simulation. La partie de SimSoC dédiée au processeur ARM, l'un des plus répandus dans le domaine des SoC, transcrit les informations contenues dans un manuel épais de plus de 1000 pages. Les erreurs sont inévitables à ce niveau de complexité, et certaines sont passées au travers des tests intensifs effectués sur la version précédente de SimSoC pour l'ARMv5, qui réussissait tout de même à simuler l'amorçage complet de linux. Un problème critique se pose alors : le simulateur simule-t-il effectivement le matériel réel ? Pour apporter des éléments de réponse positifs à cette question, notre travail vise à prouver la correction d'une partie significative de SimSoC, de sorte à augmenter la confiance de l'utilisateur en ce similateur notamment pour des systèmes critiques. Nous avons concentré nos efforts sur un composant particulièrement sensible de SimSoC : le simulateur du jeu d'instructions de l'ARMv6, faisant partie de la version actuelle de SimSoC. Les approches basées sur une sémantique axiomatique (logique de Hoare par exemple) sont les plus répandues en preuve de programmes impératifs. Cependant, nous avons préféré essayer une approche moins classique mais plus directe, basée sur la sémantique opérationnelle de C : cela était rendu possible en théorie depuis la formalisation en Coq d'une telle sémantique au sein du projet CompCert et mettait à notre disposition toute la puissance de Coq pour gérer la complexitité de la spécification. À notre connaissance, au delà de la certification d'un simulateur, il s'agit de la première expérience de preuve de correction de programmes C à cette échelle basée sur la sémantique opérationnelle. Nous définissons une représentation du jeu d'instruction ARM et de ses modes d'adressage formalisée en Coq, grâce à un générateur automatique prenant en entrée le pseudo-code des instructions issu du manuel de référence ARM. Nous générons égale- ment l'arbre syntaxique abstrait CompCert du code C simulant les mêmes instructions au sein de Simlight, une version allégée de SimSoC. À partir de ces deux représentations Coq, nous pouvons énoncer et démontrer la correction de Simlight, en nous appuyant sur la sémantique opérationnelle définie dans CompCert. Cette méthodologie a été appliquée à au moins une instruction de chaque catégorie du jeu d'instruction de l'ARM. Au passage, nous avons amélioré la technologie disponible en Coq pour effectuer des inversions, une forme de raisonnement utilisée intensivement dans ce type de situation.
APA, Harvard, Vancouver, ISO, and other styles
7

Wright, Stephen. "Formal construction of Instruction Set Architectures." Thesis, University of Bristol, 2009. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.508307.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Bennett, Richard Vincent. "Increasing the efficacy of automated instruction set extension." Thesis, University of Edinburgh, 2011. http://hdl.handle.net/1842/5789.

Full text
Abstract:
The use of Instruction Set Extension (ISE) in customising embedded processors for a specific application has been studied extensively in recent years. The addition of a set of complex arithmetic instructions to a baseline core has proven to be a cost-effective means of meeting design performance requirements. This thesis proposes and evaluates a reconfigurable ISE implementation called “Configurable Flow Accelerators” (CFAs), a number of refinements to an existing Automated ISE (AISE) algorithm called “ISEGEN”, and the effects of source form on AISE. The CFA is demonstrated repeatedly to be a cost-effective design for ISE implementation. A temporal partitioning algorithm called “staggering” is proposed and demonstrated on average to reduce the area of CFA implementation by 37% for only an 8% reduction in acceleration. This thesis then turns to concerns within the ISEGEN AISE algorithm. A methodology for finding a good static heuristic weighting vector for ISEGEN is proposed and demonstrated. Up to 100% of merit is shown to be lost or gained through the choice of vector. ISEGEN early-termination is introduced and shown to improve the runtime of the algorithm by up to 7.26x, and 5.82x on average. An extension to the ISEGEN heuristic to account for pipelining is proposed and evaluated, increasing acceleration by up to an additional 1.5x. An energyaware heuristic is added to ISEGEN, which reduces the energy used by a CFA implementation of a set of ISEs by an average of 1.6x, up to 3.6x. This result directly contradicts the frequently espoused notion that “bigger is better” in ISE. The last stretch of work in this thesis is concerned with source-level transformation: the effect of changing the representation of the application on the quality of the combined hardwaresoftware solution. A methodology for combined exploration of source transformation and ISE is presented, and demonstrated to improve the acceleration of the result by an average of 35% versus ISE alone. Floating point is demonstrated to perform worse than fixed point, for all design concerns and applications studied here, regardless of ISEs employed.
APA, Harvard, Vancouver, ISO, and other styles
9

Lee, Vinson 1978. "Instruction set and simulation framework for transactional memory." Thesis, Massachusetts Institute of Technology, 2003. http://hdl.handle.net/1721.1/87369.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Moreira, João Carlos Peralta. "An instruction set simulator for VLIW DSP architectures." Master's thesis, Universidade de Aveiro, 2015. http://hdl.handle.net/10773/18675.

Full text
Abstract:
Engenharia Eletrónica e Telecomunicações
Dissertação apresentada a Universidade de Aveiro para cumprimento dos requisitos necessários a obtenção do grau de Mestre em Engenharia Eletrónica e Telecomunicações, realizada sob a orientação científica do Professor Doutor Manuel Bernardo Salvador Cunha, Professor Auxiliar do Departamento de Eletrónica, Telecomunicações e Informática da Universidade de Aveiro e Doutor Mohamed Bamakhrama, Hardware Tools Engineer na equipa "Processor and Compiler Tools" no grupo "Imaging and Camera Technologies", Intel Eindhoven, Países Baixos.
Dissertation presented to Universidade de Aveiro with the goal of achieving a Master's Degree in Electronics and Telecommunications, made with the scienti c orientation of Professor Manuel Bernardo Salvador Cunha PhD, Professor at the Department of Electronic, Telecommunications and Informatics from Universidade de Aveiro and Mohamed Bamakhrama, Hardware Tools Engineer at Processor and Compiler Tools Team of Intel's Imaging and Camera Technologies Group, Eindhoven.
APA, Harvard, Vancouver, ISO, and other styles
11

Kim, Jang Dae. "An instruction-set process calculus for synchronous hardware composition." Related electronic resource: Current Research at SU : database of SU dissertations, recent titles available full text, 2002. http://wwwlib.umi.com/cr/syr/main.

Full text
APA, Harvard, Vancouver, ISO, and other styles
12

Saghir, Mazen A. R. "Application-specific instruction-set architectures for embedded DSP applications." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1998. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape11/PQDD_0021/NQ53899.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
13

Glökler, Tilman Meyr Heinrich. "Design of energy-efficient application-specific instruction set processors /." Boston, Mass. [u.a.] : Kluwer Acad. Publ, 2004. http://www.loc.gov/catdir/enhancements/fy0820/2004041376-d.html.

Full text
APA, Harvard, Vancouver, ISO, and other styles
14

Dittmann, Gero [Verfasser]. "On Instruction-Set Generation for Specialized Processors / Gero Dittmann." Aachen : Shaker, 2006. http://d-nb.info/1170532837/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
15

Zuluaga, Marcela. "Efficient design-space exploration of custom instruction-set extensions." Thesis, University of Edinburgh, 2010. http://hdl.handle.net/1842/4630.

Full text
Abstract:
Customization of processors with instruction set extensions (ISEs) is a technique that improves performance through parallelization with a reasonable area overhead, in exchange for additional design effort. This thesis presents a collection of novel techniques that reduce the design effort and cost of generating ISEs by advancing automation and reconfigurability. In addition, these techniques maximize the perfomance gained as a function of the additional commited resources. Including ISEs into a processor design implies development at many levels. Most prior works on ISEs solve separate stages of the design: identification, selection, and implementation. However, the interations between these stages also hold important design trade-offs. In particular, this thesis addresses the lack of interaction between the hardware implementation stage and the two previous stages. Interaction with the implementation stage has been mostly limited to accurately measuring the area and timing requirements of the implementation of each ISE candidate as a separate hardware module. However, the need to independently generate a hardware datapath for each ISE limits the flexibility of the design and the performance gains. Hence, resource sharing is essential in order to create a customized unit with multi-function capabilities. Previously proposed resource-sharing techniques aggressively share resources amongst the ISEs, thus minimizing the area of the solution at any cost. However, it is shown that aggressively sharing resources leads to large ISE datapath latency. Thus, this thesis presents an original heuristic that can be parameterized in order to control the degree of resource sharing amongst a given set of ISEs, thereby permitting the exploration of the existing implementation trade-offs between instruction latency and area savings. In addition, this thesis introduces an innovative predictive model that is able to quickly expose the optimal trade-offs of this design space. Compared to an exhaustive exploration of the design space, the predictive model is shown to reduce by two orders of magnitude the number of executions of the resource-sharing algorithm that are required in order to find the optimal trade-offs. This thesis presents a technique that is the first one to combine the design spaces of ISE selection and resource sharing in ISE datapath synthesis, in order to offer the designer solutions that achieve maximum speedup and maximum resource utilization using the available area. Optimal trade-offs in the design space are found by guiding the selection process to favour ISE combinations that are likely to share resources with low speedup losses. Experimental results show that this combined approach unveils new trade-offs between speedup and area that are not identified by previous selection techniques; speedups of up to 238% over previous selection thecniques were obtained. Finally, multi-cycle ISEs can be pipelined in order to increase their throughput. However, it is shown that traditional ISE identification techniques do not allow this optimization due to control flow overhead. In order to obtain the benefits of overlapping loop executions, this thesis proposes to carefully insert loop control flow statements into the ISEs, thus allowing the ISE to control the iterations of the loop. The proposed ISEs broaden the scope of instruction-level parallelism and obtain higher speedups compared to traditional ISEs, primarily through pipelining, the exploitation of spatial parallelism, and reducing the overhead of control flow statements and branches. A detailed case study of a real application shows that the proposed method achieves 91% higher speedups than the state-of-the-art, with an area overhead of less than 8% in hardware implementation.
APA, Harvard, Vancouver, ISO, and other styles
16

Pajak, Dominic. "Specification of microprocessor instruction set architectures : ARM case study." Thesis, University of Leeds, 2005. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.422038.

Full text
APA, Harvard, Vancouver, ISO, and other styles
17

Rockers, Daniel M. "A Revised Instruction Set for the Booklet Category Test." Thesis, University of North Texas, 1996. https://digital.library.unt.edu/ark:/67531/metadc278025/.

Full text
Abstract:
Eighty-eight (N = 88) non-brain-injured adults were tested with one of two versions of the Booklet Category Test (BCT). Forty-four (N = 44) individuals were tested with the standard version of the BCT, and forty-four (N = 44) were tested with a revised BCT in which between-subtest cueing was removed, called the Noncued Category Test (NCT). The results of this study indicate that removal of cueing instructions changes the Category test significantly. Subjects administered the NCT scored significantly more errors than those who were administered the standard Category test. While BCT scores correlated significantly with nonverbal intelligence scores, NCT scores did not. However, the difference in these correlations was not significant, indicating that the intelligence aspect measured in the two versions is not different. Neither the BCT nor the NCT correlated significantly with the Wisconsin Card Sort, Word Fluency, Stroop, or Trail Making Test. It is recommended that the NCT be administered to circumscribed clinical populations in order to best utilize present findings.
APA, Harvard, Vancouver, ISO, and other styles
18

Bauer, Heiner. "Dynamic instruction set extension of microprocessors with embedded FPGAs." Master's thesis, Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2017. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-222858.

Full text
Abstract:
Increasingly complex applications and recent shifts in technology scaling have created a large demand for microprocessors which can perform tasks more quickly and more energy efficient. Conventional microarchitectures exploit multiple levels of parallelism to increase instruction throughput and use application specific instruction sets or hardware accelerators to increase energy efficiency. Reconfigurable microprocessors adopt the same principle of providing application specific hardware, however, with the significant advantage of post-fabrication flexibility. Not only does this offer similar gains in performance but also the flexibility to configure each device individually. This thesis explored the benefit of a tight coupled and fine-grained reconfigurable microprocessor. In contrast to previous research, a detailed design space exploration of logical architectures for island-style field programmable gate arrays (FPGAs) has been performed in the context of a commercial 22nm process technology. Other research projects either reused general purpose architectures or spent little effort to design and characterize custom fabrics, which are critical to system performance and the practicality of frequently proposed high-level software techniques. Here, detailed circuit implementations and a custom area model were used to estimate the performance of over 200 different logical FPGA architectures with single-driver routing. Results of this exploration revealed similar tradeoffs and trends described by previous studies. The number of lookup table (LUT) inputs and the structure of the global routing network were shown to have a major impact on the area delay product. However, results suggested a much larger region of efficient architectures than before. Finally, an architecture with 5-LUTs and 8 logic elements per cluster was selected. Modifications to the microprocessor, whichwas based on an industry proven instruction set architecture, and its software toolchain provided access to this embedded reconfigurable fabric via custom instructions. The baseline microprocessor was characterized with estimates from signoff data for a 28nm hardware implementation. A modified academic FPGA tool flow was used to transform Verilog implementations of custom instructions into a post-routing netlist with timing annotations. Simulation-based verification of the system was performed with a cycle-accurate processor model and diverse application benchmarks, ranging from signal processing, over encryption to computation of elementary functions. For these benchmarks, a significant increase in performance with speedups from 3 to 15 relative to the baseline microprocessor was achieved with the extended instruction set. Except for one case, application speedup clearly outweighed the area overhead for the extended system, even though the modeled fabric architecturewas primitive and contained no explicit arithmetic enhancements. Insights into fundamental tradeoffs of island-style FPGA architectures, the developed exploration flow, and a concrete cost model are relevant for the development of more advanced architectures. Hence, this work is a successful proof of concept and has laid the basis for further investigations into architectural extensions and physical implementations. Potential for further optimizationwas identified on multiple levels and numerous directions for future research were described
Zunehmend komplexere Anwendungen und Besonderheiten moderner Halbleitertechnologien haben zu einer großen Nachfrage an leistungsfähigen und gleichzeitig sehr energieeffizienten Mikroprozessoren geführt. Konventionelle Architekturen versuchen den Befehlsdurchsatz durch Parallelisierung zu steigern und stellen anwendungsspezifische Befehlssätze oder Hardwarebeschleuniger zur Steigerung der Energieeffizienz bereit. Rekonfigurierbare Prozessoren ermöglichen ähnliche Performancesteigerungen und besitzen gleichzeitig den enormen Vorteil, dass die Spezialisierung auf eine bestimmte Anwendung nach der Herstellung erfolgen kann. In dieser Diplomarbeit wurde ein rekonfigurierbarer Mikroprozessor mit einem eng gekoppelten FPGA untersucht. Im Gegensatz zu früheren Forschungsansätzen wurde eine umfangreiche Entwurfsraumexploration der FPGA-Architektur im Zusammenhang mit einem kommerziellen 22nm Herstellungsprozess durchgeführt. Bisher verwendeten die meisten Forschungsprojekte entweder kommerzielle Architekturen, die nicht unbedingt auf diesen Anwendungsfall zugeschnitten sind, oder die vorgeschlagenen FGPA-Komponenten wurden nur unzureichend untersucht und charakterisiert. Jedoch ist gerade dieser Baustein ausschlaggebend für die Leistungsfähigkeit des gesamten Systems. Deshalb wurden im Rahmen dieser Arbeit über 200 verschiedene logische FPGA-Architekturen untersucht. Zur Modellierung wurden konkrete Schaltungstopologien und ein auf den Herstellungsprozess zugeschnittenes Modell zur Abschätzung der Layoutfläche verwendet. Generell wurden die gleichen Trends wie bei vorhergehenden und ähnlich umfangreichen Untersuchungen beobachtet. Auch hier wurden die Ergebnisse maßgeblich von der Größe der LUTs (engl. "Lookup Tables") und der Struktur des Routingnetzwerks bestimmt. Gleichzeitig wurde ein viel breiterer Bereich von Architekturen mit nahezu gleicher Effizienz identifiziert. Zur weiteren Evaluation wurde eine FPGA-Architektur mit 5-LUTs und 8 Logikelementen ausgewählt. Die Performance des ausgewählten Mikroprozessors, der auf einer erprobten Befehlssatzarchitektur aufbaut, wurde mit Ergebnissen eines 28nm Testchips abgeschätzt. Eine modifizierte Sammlung von akademischen Softwarewerkzeugen wurde verwendet, um Spezialbefehle auf die modellierte FPGA-Architektur abzubilden und eine Netzliste für die anschließende Simulation und Verifikation zu erzeugen. Für eine Reihe unterschiedlicher Anwendungs-Benchmarks wurde eine relative Leistungssteigerung zwischen 3 und 15 gegenüber dem ursprünglichen Prozessor ermittelt. Obwohl die vorgeschlagene FPGA-Architektur vergleichsweise primitiv ist und keinerlei arithmetische Erweiterungen besitzt, musste dabei, bis auf eine Ausnahme, kein überproportionaler Anstieg der Chipfläche in Kauf genommen werden. Die gewonnen Erkenntnisse zu den Abhängigkeiten zwischen den Architekturparametern, der entwickelte Ablauf für die Exploration und das konkrete Kostenmodell sind essenziell für weitere Verbesserungen der FPGA-Architektur. Die vorliegende Arbeit hat somit erfolgreich den Vorteil der untersuchten Systemarchitektur gezeigt und den Weg für mögliche Erweiterungen und Hardwareimplementierungen geebnet. Zusätzlich wurden eine Reihe von Optimierungen der Architektur und weitere potenziellen Forschungsansätzen aufgezeigt
APA, Harvard, Vancouver, ISO, and other styles
19

Mapes, Glenn. "An instruction set simulator for the 8086 16-bit microprocessor." Virtual Press, 1985. http://liblink.bsu.edu/uhtbin/catkey/416976.

Full text
Abstract:
The intent of this thesis is to show the usefulness simulating of an instruction set in software and to demonstrate the feasibility of doing so by providing the framework of a simulation program.The design of new computer architectures and computer based control systems is a trial and error process. Normal design practice is to design and build a prototype of the new system and then evaluate the performance of the prototype. Designing complex systems in this manner is very time consuming and expensive; using a software program to simulate the operation of the new system can help solve certain design problems and shorten the development time and effort.The instruction set simulator executes a subset of the 8086 instruction set and contains routines that are useful in debugging the target software.The feasibility of implementing an instruction set simulator to solve certain design problems has been demonstrated by implementing the most commonly used op codes from the 8086 instruction set.Ball State UniversityMuncie, IN 47306
APA, Harvard, Vancouver, ISO, and other styles
20

Andersson, Olof, and Karl Bengtsson. "Adapting an FPGA-optimized microprocessor to the MIPS32 instruction set." Thesis, Linköping University, Computer Engineering, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-54680.

Full text
Abstract:

Nowadays, FPGAs are large enough to host entire system-on-chip designs, wherein a soft core processor is often an integral part. High performance of the processor is always desirable, so there is an interest in finding faster solutions.This report aims to describe the work and results performed by Karl Bengtson and Olof Andersson at ISY. The task was to continue the development of a soft core microprocessor, originally created by Andreas Ehliar. The first step was to decide a more widely adopted instruction set for the processor. The choice fell upon the MIPS32 instruction set. The main work of the project has been focused on implementing support for MIPS32, allowing the processor to execute MIPS assembly language programs. The development has been done with speed optimization in mind. For every new function, the effects on the maximum frequency has been considered, and solutions not satisfying the speed requirements has been abandoned or revised.The performance has been measured by running a benchmark program—Coremark. Comparison has also been made to the main competitors among soft core processors. The results were positive, and reported a higher Coremark score than the other processors inthe study. The processor described herein still lacks many essential features. Nevertheless, the conclusion is that it may be possible to create a competitive alternative to established soft processors.


FPGAer används idag ofta för stora inbyggda system, i vilka en mjuk processor ofta spelar en viktig roll. Hög prestanda hos processorn är alltid önskvärt, så det finns ett intresse i att hitta snabbare lösningar. Denna rapport skall beskriva det arbete och de resultat som uppnåtts av Karl Bengtson och Olof Andersson på ISY. Uppgiften var att fortsätta utvecklandet av en mjuk processor, som ursprungligen skapats av Andreas Ehliar. Första steget var att välja ut en mer allmänt använd instruktionsuppsättning för processorn. Valet föll på instruktionsuppsättningsarkitekturen MIPS32. Projektets huvutarbete har varit fokuserat på att implementera stöd för MIPS32, vilket ger processorn möjlighet att köra assemblerprogram för MIPS.Utvecklingen har gjorts med hastighetsoptimering i beaktning. För varje ny funktion har dess effekter på maxfrekvensen undersökts,och lösningar som inte uppfyllt hastighetskraven har förkastats eller reviderats. Prestandan har mätts med programmet Coremark. Det har också gjorts jämförelser med huvudkonkurrenterna bland mjuka processorer. Resultaten var positiva, och rapporterade ett högre Coremarkpoäng än de andra processorerna i studien. Slutsatsen är att det ärmöjligt att skapa ett alternativ till de etablerade mjuka processorerna, men att denna processor fortfarande saknar väsentliga funktioner som behövs för att utgöra en mogen produkt.

APA, Harvard, Vancouver, ISO, and other styles
21

Williams, Fleur Liane. "The impact of instruction set orthogonality on compiler code generation." Thesis, University of Hertfordshire, 1989. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.252688.

Full text
APA, Harvard, Vancouver, ISO, and other styles
22

Wagstaff, Harry. "From high level architecture descriptions to fast instruction set simulators." Thesis, University of Edinburgh, 2015. http://hdl.handle.net/1842/14162.

Full text
Abstract:
As computer systems become increasingly complex and diverse, so too do the architectures they implement. This leads to an increase in complexity in the tools used to design new hardware and software. One particularly important tool in hardware and software design is the Instruction Set Simulator, which is used to prototype new architectures and hardware features, verify hardware, and test and debug software. Many Architecture Description Languages exist which facilitate the description of new architectural or hardware features, and generate a tools such as simulators. However, these typically suffer from poor performance, are difficult to test effectively, and may be limited in functionality. This thesis considers three objectives when developing Instruction Set Simulators: performance, correctness, and completeness, and presents techniques which contribute to each of these. Performance is obtained by combining Dynamic Binary Translation techniques with a novel analysis of high level architecture descriptions. This makes use of partial evaluation techniques in order to both improve the translation system, and to improve the quality of the translated code, leading a performance improvement of over 2.5x compared to a naïve implementation. This thesis also presents techniques which contribute to the correctness objective. Each possible behaviour of each described instruction is used to guide the generation of a test case. Constraint satisfaction techniques are used to determine the necessary instruction encoding and context for each behaviour to be produced. It is shown that this is a significant improvement over benchmark-driven testing, and this technique has led to the discovery of several bugs and inconsistencies in multiple state of the art instruction set simulators. Finally, several challenges in ‘Full System’ simulation are addressed, contributing to both the performance and completeness objectives. Full System simulation generally carries significant performance costs compared with other simulation strategies. Crucially, instructions which access memory require virtual to physical address translation and can now cause exceptions. Both of these processes must be correctly and efficiently handled by the simulator. This thesis presents novel techniques to address this issue which provide up to a 1.65x speedup over a state of the art solution.
APA, Harvard, Vancouver, ISO, and other styles
23

Seeds, Michael A. "THE ATIS INSTRUCTION SET FOR COMMUNICATION WITH ROBOTIC ASTRONOMICAL TELESCOPES." International Foundation for Telemetering, 1997. http://hdl.handle.net/10150/607396.

Full text
Abstract:
International Telemetering Conference Proceedings / October 27-30, 1997 / Riviera Hotel and Convention Center, Las Vegas, Nevada
Astronomers now communicate over Internet with robotic astronomical telescopes using a specially designed instruction set. ATIS, Automatic Telescope Instruction Set, is designed to communicate specific, technical instructions to a robotic telescope, facilitate data retrieval and analysis, support a wide range of data formats, and also convey preference information that describe the astronomers general needs for data acquisition. Over a dozen telescopes now use ATIS and more are under construction.
APA, Harvard, Vancouver, ISO, and other styles
24

Min, Byoung Woo. "Adding the modal mu-calculus to the instruction-set process calculus." Related electronic resource: Current Research at SU : database of SU dissertations, recent titles available full text, 2005. http://wwwlib.umi.com/cr/syr/main.

Full text
APA, Harvard, Vancouver, ISO, and other styles
25

Radhakrishnan, Swarnalatha Computer Science &amp Engineering Faculty of Engineering UNSW. "Heterogeneous multi-pipeline application specific instruction-set processor design and implementation." Awarded by:University of New South Wales. Computer Science and Engineering, 2006. http://handle.unsw.edu.au/1959.4/29161.

Full text
Abstract:
Embedded systems are becoming ubiquitous, primarily due to the fast evolution of digital electronic devices. The design of modern embedded systems requires systems to exhibit, high performance and reliability, yet have short design time and low cost. Application Specific Instruction set processors (ASIPs) are widely used in embedded system since they are economical to use, flexible, and reusable (thus saves design time). During the last decade research work on ASIPs have been carried out in mainly for single pipelined processors. Improving performance in processors is possible by exploring the available parallelism in the program. Designing of multiple parallel execution paths for parallel execution of the processor naturally incurs additional cost. The methodology presented in this dissertation has addressed the problem of improving performance in ASIPs, at minimal additional cost. The devised methodology explores the available parallelism of an application to generate a multi-pipeline heterogeneous ASIP. The processor design is application specific. No pre-defined IPs are used in the design. The generated processor contains multiple standalone pipelined data paths, which are not necessarily identical, and are connected by the necessary bypass paths and control signals. Control unit are separate for each pipeline (though with the same clock) resulting in a simple and cost effective design. By using separate instruction and data memories (Harvard architecture) and by allowing memory access by two separate pipes, the complexity of the controller and buses are reduced. The impact of higher memory latencies is nullified by utilizing parallel pipes during memory access. Efficient bypass network selection and encoding techniques provide a better implementation. The initial design approach with only two pipelines without bypass paths show speed improvements of up to 36% and switching activity reductions of up to 11%. The additional area costs around 16%. An improved design with different number of pipelines (more than two) based on applications show on average of 77% performance improvement with overheads of: 49% on area; 51% on leakage power; 17% on switching activity; and 69% on code size. The design was further trimmed, with bypass path selection and encoding techniques, which show a saving of up to 32% of area and 34% of leakage power with 6% performance improvement and 69% of code size reduction compared to the design approach without these techniques in the multi pipeline design.
APA, Harvard, Vancouver, ISO, and other styles
26

Ponnala, Kalyan. "DESIGN AND IMPLEMENTATION OF THE INSTRUCTION SET ARCHITECTURE FOR DATA LARS." UKnowledge, 2010. http://uknowledge.uky.edu/gradschool_theses/58.

Full text
Abstract:
The ideal memory system assumed by most programmers is one which has high capacity, yet allows any word to be accessed instantaneously. To make the hardware approximate this performance, an increasingly complex memory hierarchy, using caches and techniques like automatic prefetch, has evolved. However, as the gap between processor and memory speeds continues to widen, these programmer-visible mechanisms are becoming inadequate. Part of the recent increase in processor performance has been due to the introduction of programmer/compiler-visible SWAR (SIMD Within A Register) parallel processing on increasingly wide DATA LARs (Line Associative Registers) as a way to both improve data access speed and increase efficiency of SWAR processing. Although the base concept of DATA LARs predates this thesis, this thesis presents the first instruction set architecture specification complete enough to allow construction of a detailed prototype hardware design. This design was implemented and tested using a hardware simulator.
APA, Harvard, Vancouver, ISO, and other styles
27

Chatterjee, Aakriti. "Development of an RSA Algorithm using Reduced RISC V instruction Set." University of Cincinnati / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1617104502129937.

Full text
APA, Harvard, Vancouver, ISO, and other styles
28

Rawat, Hemendra Kumar. "Vector Instruction Set Extensions for Efficient and Reliable Computation of Keccak." Thesis, Virginia Tech, 2016. http://hdl.handle.net/10919/72857.

Full text
Abstract:
Recent processor architectures such as Intel Westmere (and later) and ARMv8 include instruction-level support for the Advanced Encryption Standard (AES), for the Secure Hashing Standard (SHA-1, SHA2) and for carry-less multiplication. These crypto-instructions are optimized for a single algorithm and provide significant performance improvements over software written using general-purpose instruction set. However, today's secure systems and protocols do not rely on just one, but a suite of many cryptographic applications that are expected to work in a correct and reliable manner. In this work, we propose a new instruction set for supporting efficient and reliable cryptography on modern processors. For efficiency, we propose flexible instruction set extensions for Keccak, a cryptographic kernel for hashing, authenticated encryption, key-stream generation and random-number generation. Keccak is the basis of the SHA-3 standard and the newly proposed Keyak and Ketje authenticated ciphers. For reliability, we propose a set of trusted instructions to verify the integrity of a cryptographic software library. These instructions are aimed at detecting tamper in the software or in the configurable hardware. We develop the instruction extensions for a 128-bit interface, commonly available in the vector processing unit of many modern processors. Simulation results on GEM5 architectural simulator show that the proposed instructions not only improves the performance of Keccak applications by 2 times (over NEON programming) and 6 times (over assembly programming), but also improves the reliability of applications at a performance overhead of just 6%.
Master of Science
APA, Harvard, Vancouver, ISO, and other styles
29

Montcalm, Michael R. "Scheduling Algorithms for Instruction Set Extended Symmetrical Homogeneous Multiprocessor Systems-on-Chip." Thèse, Université d'Ottawa / University of Ottawa, 2011. http://hdl.handle.net/10393/20056.

Full text
Abstract:
Embedded system designers face multiple challenges in fulfilling the runtime requirements of programs. Effective scheduling of programs is required to extract as much parallelism as possible. These scheduling algorithms must also improve speedup after instruction-set extensions have occurred. Scheduling of dynamic code at run time is made more difficult when the static components of the program are scheduled inefficiently. This research aims to optimize a program’s static code at compile time. This is achieved with four algorithms designed to schedule code at the task and instruction level. Additionally, the algorithms improve scheduling using instruction set extended code on symmetrical homogeneous multiprocessor systems. Using these algorithms, we achieve speedups up to 3.86X over sequential execution for a 4-issue 2-processor system, and show better performance than recent heuristic techniques for small programs. Finally, the algorithms generate speedup values for a 64-point FFT that are similar to the test runs.
APA, Harvard, Vancouver, ISO, and other styles
30

Vogt, Timo. "A reconfigurable application-specific instruction-set processor for trellis-based channel decoding /." Kaiserslautern : Techn. Univ. Kaiserslautern, 2008. http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&doc_number=016537958&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA.

Full text
APA, Harvard, Vancouver, ISO, and other styles
31

Shee, Seng Lin Computer Science &amp Engineering Faculty of Engineering UNSW. "ADAPT : architectural and design exploration for application specific instruction-set processor technologies." Awarded by:University of New South Wales, 2007. http://handle.unsw.edu.au/1959.4/35404.

Full text
Abstract:
This thesis presents design automation methodologies for extensible processor platforms in application specific domains. The work presents first a single processor approach for customization; a methodology that can rapidly create different processor configurations by the removal of unused instructions sets from the architecture. A profile directed approach is used to identify frequently used instructions and to eliminate unused opcodes from the available instruction pool. A coprocessor approach is next explored to create an SoC (System-on-Chip) to speedup the application while reducing energy consumption. Loops in applications are identified and accelerated by tightly coupling a coprocessor to an ASIP (Application Specific Instruction-set Processor). Latency hiding is used to exploit the parallelism provided by this architecture. A case study has been performed on a JPEG encoding algorithm; comparing two different coprocessor approaches: a high-level synthesis approach and our custom coprocessor approach. The thesis concludes by introducing a heterogenous multi-processor system using ASIPs as processing entities in a pipeline configuration. The problem of mapping each algorithmic stage in the system to an ASIP configuration is formulated. We proposed an estimation technique to calculate runtimes of the configured multiprocessor system without running cycle-accurate simulations, which could take a significant amount of time. We present two heuristics to efficiently search the design space of a pipeline-based multi ASIP system and compare the results against an exhaustive approach. In our first approach, we show that, on average, processor size can be reduced by 30%, energy consumption by 24%, while performance is improved by 24%. In the coprocessor approach, compared with the use of a main processor alone, a loop performance improvement of 2.57x is achieved using the custom coprocessor approach, as against 1.58x for the high level synthesis method, and 1.33x for the customized instruction approach. Energy savings are 57%, 28% and 19%, respectively. Our multiprocessor design provides a performance improvement of at least 4.03x for JPEG and 3.31x for MP3, for a single processor design system. The minimum cost obtained using our heuristic was within 0.43% and 0.29% of the optimum values for the JPEG and MP3 benchmarks respectively.
APA, Harvard, Vancouver, ISO, and other styles
32

Curtis, Bryce Allen. "A special instruction set multiple chip computer for DSP : architecture and compiler design." Diss., Georgia Institute of Technology, 1992. http://hdl.handle.net/1853/15736.

Full text
APA, Harvard, Vancouver, ISO, and other styles
33

Sohl, Joar. "Efficient Compilation for Application Specific Instruction set DSP Processors with Multi-bank Memories." Doctoral thesis, Linköpings universitet, Datorteknik, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-113702.

Full text
Abstract:
Modern signal processing systems require more and more processing capacity as times goes on. Previously, large increases in speed and power efficiency have come from process technology improvements. However, lately the gain from process improvements have been greatly reduced. Currently, the way forward for high-performance systems is to use specialized hardware and/or parallel designs. Application Specific Integrated Circuits (ASICs) have long been used to accelerate the processing of tasks that are too computationally heavy for more general processors. The problem with ASICs is that they are costly to develop and verify, and the product life time can be limited with newer standards. Since they are very specific the applicable domain is very narrow. More general processors are more flexible and can easily adapt to perform the functions of ASIC based designs. However, the generality comes with a performance cost that renders general designs unusable for some tasks. The question then becomes, how general can a processor be while still being power efficient and fast enough for some particular domain? Application Specific Instruction set Processors (ASIPs) are processors that target a specific application domain, and can offer enough performance  with power efficiency and silicon cost that is comparable to ASICs. The flexibility allows for the same hardware design to be used over several system designs, and also for multiple functions in the same system, if some functions are not used simultaneously. One problem with ASIPs is that they are more difficult to program than a general purpose processor, given that we want efficient software. Utilizing all of the features that give an ASIP its performance advantage can be difficult at times, and new tools and methods for programming them are needed. This thesis will present ePUMA (embedded Parallel DSP platform with Unique Memory Access), an ASIP architecture that targets algorithms with predictable data access. These kinds of algorithms are very common in e.g. baseband processing or multimedia applications. The primary focus will be on the specific features of ePUMA that are utilized to achieve high performance, and how it is possible to automatically utilize them using tools. The most significant features include data permutation for conflict-free data access, and utilization of address generation features for overhead free code execution. This sometimes requires specific information; for example the exact sequences of addresses in memory that are accessed, or that some operations may be performed in parallel. This is not always available when writing code using the traditional way with traditional languages, e.g. C, as extracting this information is still a very active research topic. In the near future at least, the way that software is written needs to change to exploit all hardware features, but in many cases in a positive way. Often the problem with current methods is that code is overly specific, and that a more general abstractions are actually easier to generate code from.
APA, Harvard, Vancouver, ISO, and other styles
34

Afuah, Allan Nembo. "Strategic adoption of innovation--the case of reduced instruction set computer (RISC) technology." Thesis, Massachusetts Institute of Technology, 1994. http://hdl.handle.net/1721.1/11613.

Full text
APA, Harvard, Vancouver, ISO, and other styles
35

Degenbaev, Ulan [Verfasser], and Wolfgang J. [Akademischer Betreuer] Paul. "Formal specification of the x86 instruction set architecture / Ulan Degenbaev. Betreuer: Wolfgang J. Paul." Saarbrücken : Saarländische Universitäts- und Landesbibliothek, 2012. http://d-nb.info/105227885X/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
36

Johnston, Erin. "Shared instruction-set extensions for soft multiprocessor systems implemented on field-programmable gate arrays." Thesis, University of British Columbia, 2012. http://hdl.handle.net/2429/43695.

Full text
Abstract:
Soft-core embedded systems implemented on FPGAs offer a high level of flexibility. Application specific customizations can be added in the form of extensions to the processor’s regular instruction-set. These custom instructions benefit run-time performance, but come at the cost of increased resource usage. Reducing the overall FPGA area required to implement a system will decrease static power consumption and allow a smaller, cheaper device to be used. There is a constant effort to reduce area and power consumption while maintaining performance benefits attained through customizations. This thesis presents a new architecture to share custom instruction units among multiple processors in a system. This implementation allows run-time performance benefits to be maintained while decreasing the overall resource usage. The shared architecture is implemented using an arbitrator to determine processor access to each custom instruction in a set. Custom instruction inputs and outputs are controlled using additional multiplexors and selection hardware. Results for a sample system using fine-grained custom instructions show that sharing can reduce the implementation area by up to 24% with minimal impact to the critical path delay. This reduction remains high at 19% for a coarse-grained case study of an encryption algorithm called SHA. The custom instruction configuration depends on the application being performed. A benchmark generator and simulator are also developed to evaluate candidates for custom instruction implementation and efficiently explore the design space. The overall run-time performance of the candidate systems can also be evaluated using these tools. The simulator can also be used with an input trace to determine cycle accurate run-time performance for a real application, without requiring the entire system to be designed and implemented in hardware. The simulator shows up to 53% run-time improvement for a shared fine-grained system over a system with no custom instructions. Hardware run-time results for the coarsegrained case study improve run-time up to 13.5% over a system with no custom instructions.
APA, Harvard, Vancouver, ISO, and other styles
37

Lim, Wei Ming. "Design of application specific instruction set processors for the domain of GF(2'm)." Thesis, University of Sheffield, 2004. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.412439.

Full text
APA, Harvard, Vancouver, ISO, and other styles
38

Martin, Susan Marie. "A comprehensive curriculum for drum set in the college percussion studio." Diss., The University of Arizona, 1994. http://hdl.handle.net/10150/186837.

Full text
Abstract:
The purpose of this study is to develop a comprehensive curriculum for drum set in the college percussion studio. The main emphasis of the paper is to provide information addressing the needs of the percussion student over a four-year course of drum set study. In addition, I will show how these needs can best be met through the use of both existing instructional materials and original supplemental materials written by the author. The need for a drum set curriculum is defined and an in-depth review made of selected extant materials. After defining the general guidelines for the curriculum, a limited number of instructional materials were chosen from the extant materials which could adequately and affordably fulfill these guidelines. Recommended studies for the freshman through senior years are outlined and instructional objectives defined.
APA, Harvard, Vancouver, ISO, and other styles
39

Yassin, Yahya H. "ULTRA LOW POWER APPLICATION SPECIFIC INSTRUCTION-SET PROCESSOR DESIGN : for a cardiac beat detector algorithm." Thesis, Norwegian University of Science and Technology, Department of Electronics and Telecommunications, 2009. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-9914.

Full text
Abstract:

High efficiency and low power consumption are among the main topics in embedded systems today. For complex applications, off-the-shelf processor cores might not provide the desired goals in terms of power consumption. By optimizing the processor for the application, or a set of applications, one could improve the computing power by introducing special purpose hardware units. The execution cycle count of the application would in this case be reduced significantly, and the resulting processor would consume less power. In this thesis, some research is done in how to optimize a software and hardware development for ultra low power consumption. A cardiac beat detector algorithm is implemented in ANSI C, and optimized for low power consumption, by using several software power optimization techniques. The resulting application is mapped on a basic processor architecture provided by Target Compiler Technologies. This processor is optimized further for ultra low power consumption by applying application specific hardware, and by using several hardware power optimization techniques. A general processor and the optimized processor has been mapped on a chip, using a 90 nm low power TSMC process. Information about power dissipation is extracted through netlist simulation, and the results of both processors have been compared. The optimized processor consume 55% less average power, and the duty cycle of the processor, i.e., the time in which the processor executes its task with respect to the time budget available, has been reduced from 14% to 2.8%. The reduction in the total execution cycle count is 81%. The possibilities of applying power gating, or voltage and frequency scaling are discussed, and it is concluded that further reduction in power consumption is possible by applying these power optimization techniques. For a given case, the average leakage power dissipation is estimated to be reduced by 97.2%.

APA, Harvard, Vancouver, ISO, and other styles
40

Zheng, Yi Hong, and 鄭一鴻. "Optimal instruction set design." Thesis, 1994. http://ndltd.ncl.edu.tw/handle/37593217098049161237.

Full text
APA, Harvard, Vancouver, ISO, and other styles
41

Yu-Ru, Yang. "Instruction Set Extension for Interpolation." 2006. http://www.cetd.com.tw/ec/thesisdetail.aspx?etdun=U0016-1303200709302045.

Full text
APA, Harvard, Vancouver, ISO, and other styles
42

Yang, Yu-Ru, and 楊侑儒. "Instruction Set Extension for Interpolation." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/65039211993127491600.

Full text
Abstract:
碩士
國立清華大學
資訊工程學系
94
H.264/AVC is the state-of-the-art video coding standard. One of the basic implementation issues of H.264 standard isthat its computational complexity is very high. By the profiling result, we focus on the interpolation procedure which consumes up to 22% of the execution time. In this thesis, we propose a number of new instructions for interpolation and hardware architecture to support these new instructions. The experimental result shows that by these new instructions, computation time is reduced 30% in average.
APA, Harvard, Vancouver, ISO, and other styles
43

Yo-Ray, Lee. "Instruction Set Extension For Deblocking Filter." 2006. http://www.cetd.com.tw/ec/thesisdetail.aspx?etdun=U0016-1303200709264754.

Full text
APA, Harvard, Vancouver, ISO, and other styles
44

Tseng, Cheng-Pin, and 曾成濱. "Instruction Set Extension Exploration on VLIW." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/00656234462118202289.

Full text
Abstract:
碩士
逢甲大學
資訊工程所
97
Both Instruction Set Extension(ISE) on base processor and VLIW architectures are two of the most popular methodology to achieve the required computing power in embedded system design. Selecting operations on critical path as ISE as well as scheduling parallel operations on a multiple-issue architecture can obtain the better performance with efficient hardware utilization. However, manually explore ISE may be very complex and time cost. This thesis presents an automatic design flow for ISE exploration on VLIW architectures using a Force-Directed Scheduling Algorithm and a proposed heuristic. Firstly the C source code will be translated into partial order data flow graph representation, and then to explore efficient ISE by proposed heuristic and cooperating with instruction scheduling for VLIW. Finally, the application code with ISE will be estimated by proposed Estimation Model to estimate the performance improvement. Results indicates that our approach gives average 12% improvement even the maximum ILP is been satisfied by base architecture.
APA, Harvard, Vancouver, ISO, and other styles
45

Lee, Yo-Ray, and 李岳叡. "Instruction Set Extension For Deblocking Filter." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/91596609321006376554.

Full text
Abstract:
碩士
國立清華大學
資訊工程學系
94
H.264/AVC is a new codec standard for video. A traditional DSP architecture is not e±cient enough for this new standard. The pro¯ling results show that Deblocking Filter consumes up to 33% running time of H.264 decoder. In this thesis, we focus on designing new instructions to speed up Deblocking Filter. Based on a standard Star¯sh DSP, we propose novel instructions and architecture for Deblocking Filter. In experiment result, we show that 28.5% improvement in computation time is achieved on Filter Processing part of Deblocking Filter by our new instructions.
APA, Harvard, Vancouver, ISO, and other styles
46

Akkaş, Ahmet. "Instruction set enhancements for reliable computations /." Diss., 2001. http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqdiss&rft_dat=xri:pqdiss:3036247.

Full text
APA, Harvard, Vancouver, ISO, and other styles
47

"Reducing a complex instruction set computer." Chinese University of Hong Kong, 1988. http://library.cuhk.edu.hk/record=b5885967.

Full text
APA, Harvard, Vancouver, ISO, and other styles
48

(10645670), Christopher M. Wright. "EMULATION FOR MULTIPLE INSTRUCTION SET ARCHITECTURES." Thesis, 2021.

Find full text
Abstract:

System emulation and firmware re-hosting are popular techniques to answer various security and performance related questions, such as, does a firmware contain security vulnerabilities or meet timing requirements when run on a specific hardware platform. While this motivation for emulation and binary analysis has previously been explored and reported, starting to work or research in the field is difficult. Further, doing the actual firmware re-hosting for various Instruction Set Architectures(ISA) is usually time consuming and difficult, and at times may seem impossible. To this end, I provide a comprehensive guide for the practitioner or system emulation researcher, along with various tools that work for a large number of ISAs, reducing the challenges of getting re-hosting working or porting previous work for new architectures. I layout the common challenges faced during firmware re-hosting and explain successive steps and survey common tools to overcome these challenges. I provide emulation classification techniques on five different axes, including emulator methods, system type, fidelity, emulator purpose, and control. These classifications and comparison criteria enable the practitioner to determine the appropriate tool for emulation. I use these classifications to categorize popular works in the field and present 28 common challenges faced when creating, emulating and analyzing a system, from obtaining firmware to post emulation analysis. I then introduce a HALucinator [1 ]/QEMU [2 ] tracer tool named HQTracer, a binary function matching tool PMatch, and GHALdra, an emulator that works for more than 30 different ISAs and enables High Level Emulation.

APA, Harvard, Vancouver, ISO, and other styles
49

Wang, Yi-Chieh, and 王繹傑. "Instruction Set Extension for Java Bytecode." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/41889087975284590298.

Full text
Abstract:
碩士
國立成功大學
工程科學系碩博士班
97
This thesis is to define a subset of Java bytecodes that are suitable for instruction set extension as the basis of CPU executing Java bytecode directly. The first step is to analyze the usage of each Java bytecode from benchmark programs, and then identify the bytecodes suitable for instruction set extension. These experiments are performed on the open-source Java Virtual Machine, the JamVM and GNU Classpath. These experiments collect the usage of Java bytecodes of the DaCapo benchmark suite. We implemented a profiler inside the JVM to help tracing and analyzing Java bytecodes executed. This thesis used ARM instruction’s clock cycles to evaluate each Java bytecode cycle counts. The evaluation method is to multiply the execution times of each bytecode by the implementation’s clock cycles. After evaluation, we propose 28 Java bytecodes for instruction set extension. The cycle counts of these 28 bytecodes are accounted for all of the 81.42% of the benchmark programs.
APA, Harvard, Vancouver, ISO, and other styles
50

Κάργας, Χρήστος. "Energy efficient instruction decoding in application: Specific instruction - set processors." Thesis, 2012. http://hdl.handle.net/10889/6295.

Full text
Abstract:
With commercial processor design tools, a designer can quickly design a C- programmable ASIP for a specific application domain. There are several such ASIPs available for both wireless (UWB baseband processing), encryption, and biomedical processing (particularly for ECG beat detection). In traditional CPUs and DSPs the impact of the instruction-set definition and the complexity of the instruction decoder can be substantial, especially in terms of power consumption. Fully orthogonal VLIW processors, do not incur the cost of an instruction decoder that severely. Instead the instruction word becomes very large, thereby shifting the (power-)cost to the program memory or instruction cache. For the purposes of this thesis a SIMD processor is developed and is compared to a soft-SIMD to observe its area, performance and energy efficiency for a bioimaging benchmark and how the processor description in the ASIP language nML, defines the generated HDL. This SIMD processor is turned into orthogonal and using iterative experiments it is investigated, what is the impact on power while manipulating the instruction-set architecture in combination with the program memory size. It is also investigated how instruction-set re-configuration can be exploited to improve power efficiency. Using this investigation guidelines for low-power ASIP design can be produced.
Με τη σύγχρονη τεχνολογία σχεδιασμού επεξεργαστών, ο σχεδιαστής μπορεί με ευκολία να σχεδιάσει ένα προγραμματιζόμενο Επεξεργαστή Συνόλου Εντολών Ειδικού Σκοπού (ASIP - Application-Specific Instruction-set Processor) για ένα συγκεκριμένο εύρος εφαρμογών. Υπάρχουν διάφοροι τέτοιοι επεξεργαστές διαθέσιμοι για ασύρματες εφαρμογές, κρυπτογράφηση και βιοϊατρικές εφαρμογές (π.χ. στον αλγόριθμο εντοπισμού χτύπου ηλεκτροκαρδιογραφήματος). Στους παραδοσιακούς επεξεργαστές και επεξεργαστές σήματος (DSP - Digital Signal Processor) ο ορισμός του συνόλου εντολών και η πολυπλοκότητα έχουν μεγάλη επίδραση, ειδικά στην κατανάλωση ισχύος. Μία πιθανή λύση σε αυτό το πρόβλημα είναι οι ορθογώνιοι επεξεργαστές μεγάλου μεγέθους λέξης εντολής (VLIW - Very Large Instruction Word). Με τον όρο ορθογώνιο επεξεργαστή, ορίζεται ένας επεξεργαστής οριζόντιου σύνολου εντολών, άρα ένας επεξεργαστής στον οποίο μπορεί να υπάρξει κάθε διαθέσιμος συνδυασμός μεταξύ των διαθέσιμων εντολών και των μεθόδων διευθυνσιοδότησης για πρόσβαση στη μνήμη και το αρχείο καταχωρητών. Οι ορθογώνιοι επεξεργαστές δεν επιβαρύνουν τόσο τον αποκωδικοποιητή εντολών. Αντί αυτού το μέγεθος της λέξης της εντολής γίνεται πολύ μεγάλο, και έτσι μετατίθεται το ενεργειακό κόστος στην μνήμη εντολών προγράμματος (program memory )ή την κρυφή μνήμη εντολών προγράμματος (instruction cache). Για τους σκοπούς αυτής της διπλωματικής εργασίας, αναπτύχθηκε ένας επεξεργαστής SIMD, ο οποίος συγκρίνεται με έναν soft-SIMD για να μελετηθούν η απαιτούμενη περιοχή στο ενσωματωμένο, επιδόσεις και κατανάλωση ενέργειας για μία βιοϊατρική εφαρμογή, καθώς και το πως η περιγραφή ενός επεξεργαστή στη γλώσσα περιγραφής επεξεργαστών ASIP nML ορίζει την παραγούμενη γλώσσα περιγραφής υλικού (HDL - Hardware Description Language). Ο επεξεργαστής αυτός μετατρέπεται σε ορθογώνιο, και με τη χρήση επαναληπτικών πειραμάτων μελετάται η επίδραση στην κατανάλωση ενέργειας κατά τη διάρκεια αλλαγών στην αρχιτεκτονική του συνόλου εντολών και του μεγέθους της μνήμης εντολών προγράμματος. Ακόμη μελετάται πως μπορεί να εκμεταλλευτεί ο σχεδιαστής την αναδιάρθρωση του συνόλου εντολών για να βελτιώσει την κατανάλωση ενέργειας.
APA, Harvard, Vancouver, ISO, and other styles
We offer discounts on all premium plans for authors whose works are included in thematic literature selections. Contact us to get a unique promo code!

To the bibliography