Dissertations / Theses: 'Instruction set architecture'

1

Zmily, Ahmad Darweesh. "Block-aware instruction set architecture /." 2007. http://proquest.umi.com/login?COPT=REJTPTU1MTUmSU5UPTAmVkVSPTI=&clientId=12498.

2

Schoepke, Olaf S. "Dense instruction set computer architecture." Thesis, University of Bath, 1992. https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.332540.

3

Glökler, Tilman Meyr Heinrich. "Design of energy-efficient application-specific instruction set processors /." Boston, Mass. [u.a.] : Kluwer Acad. Publ, 2004. http://www.loc.gov/catdir/enhancements/fy0820/2004041376-d.html.

4

Wagstaff, Harry. "From high level architecture descriptions to fast instruction set simulators." Thesis, University of Edinburgh, 2015. http://hdl.handle.net/1842/14162.

Abstract:

As computer systems become increasingly complex and diverse, so too do the architectures they implement. This leads to an increase in complexity in the tools used to design new hardware and software. One particularly important tool in hardware and software design is the Instruction Set Simulator, which is used to prototype new architectures and hardware features, verify hardware, and test and debug software. Many Architecture Description Languages exist which facilitate the description of new architectural or hardware features, and generate tools such as simulators. However, these typically suffer from poor performance, are difficult to test effectively, and may be limited in functionality. This thesis considers three objectives when developing Instruction Set Simulators: performance, correctness, and completeness, and presents techniques which contribute to each of these. Performance is obtained by combining Dynamic Binary Translation techniques with a novel analysis of high-level architecture descriptions. This makes use of partial evaluation techniques in order to improve both the translation system and the quality of the translated code, leading to a performance improvement of over 2.5x compared to a naïve implementation. This thesis also presents techniques which contribute to the correctness objective. Each possible behaviour of each described instruction is used to guide the generation of a test case. Constraint satisfaction techniques are used to determine the necessary instruction encoding and context for each behaviour to be produced. It is shown that this is a significant improvement over benchmark-driven testing, and this technique has led to the discovery of several bugs and inconsistencies in multiple state-of-the-art instruction set simulators. Finally, several challenges in ‘Full System’ simulation are addressed, contributing to both the performance and completeness objectives. Full System simulation generally carries significant performance costs compared with other simulation strategies. Crucially, instructions which access memory require virtual to physical address translation and can now cause exceptions. Both of these processes must be correctly and efficiently handled by the simulator. This thesis presents novel techniques to address this issue which provide up to a 1.65x speedup over a state-of-the-art solution.
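
To make the address translation cost concrete, the following is a minimal Python sketch of how a full-system simulator might translate a virtual address on each memory access, with a small cache of recent translations and a guest exception on a missing mapping. It is a generic illustration, not the technique of the thesis; the 4 KiB page size, the single-level page table, and the names `Mmu` and `PageFault` are all assumptions.

```python
PAGE_SHIFT = 12                      # assumed 4 KiB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1


class PageFault(Exception):
    """Raised when a virtual page has no mapping (hypothetical guest exception)."""


class Mmu:
    def __init__(self, page_table):
        self.page_table = page_table  # virtual page number -> physical page number
        self.tlb = {}                 # cache of recent translations
        self.walks = 0                # count of slow-path lookups, for illustration

    def translate(self, vaddr):
        vpn, offset = vaddr >> PAGE_SHIFT, vaddr & PAGE_MASK
        ppn = self.tlb.get(vpn)
        if ppn is None:               # slow path: consult the page table
            self.walks += 1
            if vpn not in self.page_table:
                raise PageFault(hex(vaddr))
            ppn = self.page_table[vpn]
            self.tlb[vpn] = ppn
        return (ppn << PAGE_SHIFT) | offset


mmu = Mmu({0x1: 0x80, 0x2: 0x91})
```

A second access to the same page takes the fast path and skips the page table lookup, which is the kind of per-access cost a full-system simulator has to keep low.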

5

Bennett, Richard Vincent. "Increasing the efficacy of automated instruction set extension." Thesis, University of Edinburgh, 2011. http://hdl.handle.net/1842/5789.

Abstract:

The use of Instruction Set Extension (ISE) in customising embedded processors for a specific application has been studied extensively in recent years. The addition of a set of complex arithmetic instructions to a baseline core has proven to be a cost-effective means of meeting design performance requirements. This thesis proposes and evaluates a reconfigurable ISE implementation called “Configurable Flow Accelerators” (CFAs), a number of refinements to an existing Automated ISE (AISE) algorithm called “ISEGEN”, and the effects of source form on AISE. The CFA is demonstrated repeatedly to be a cost-effective design for ISE implementation. A temporal partitioning algorithm called “staggering” is proposed and demonstrated on average to reduce the area of CFA implementation by 37% for only an 8% reduction in acceleration. This thesis then turns to concerns within the ISEGEN AISE algorithm. A methodology for finding a good static heuristic weighting vector for ISEGEN is proposed and demonstrated. Up to 100% of merit is shown to be lost or gained through the choice of vector. ISEGEN early-termination is introduced and shown to improve the runtime of the algorithm by up to 7.26x, and 5.82x on average. An extension to the ISEGEN heuristic to account for pipelining is proposed and evaluated, increasing acceleration by up to an additional 1.5x. An energy-aware heuristic is added to ISEGEN, which reduces the energy used by a CFA implementation of a set of ISEs by an average of 1.6x, and by up to 3.6x. This result directly contradicts the frequently espoused notion that “bigger is better” in ISE. The last stretch of work in this thesis is concerned with source-level transformation: the effect of changing the representation of the application on the quality of the combined hardware-software solution. A methodology for combined exploration of source transformation and ISE is presented, and demonstrated to improve the acceleration of the result by an average of 35% versus ISE alone. Floating point is demonstrated to perform worse than fixed point, for all design concerns and applications studied here, regardless of ISEs employed.

6

Ponnala, Kalyan. "DESIGN AND IMPLEMENTATION OF THE INSTRUCTION SET ARCHITECTURE FOR DATA LARS." UKnowledge, 2010. http://uknowledge.uky.edu/gradschool_theses/58.

Abstract:

The ideal memory system assumed by most programmers is one which has high capacity, yet allows any word to be accessed instantaneously. To make the hardware approximate this performance, an increasingly complex memory hierarchy, using caches and techniques like automatic prefetch, has evolved. However, as the gap between processor and memory speeds continues to widen, these programmer-visible mechanisms are becoming inadequate. Part of the recent increase in processor performance has been due to the introduction of programmer/compiler-visible SWAR (SIMD Within A Register) parallel processing on increasingly wide DATA LARs (Line Associative Registers) as a way to both improve data access speed and increase efficiency of SWAR processing. Although the base concept of DATA LARs predates this thesis, this thesis presents the first instruction set architecture specification complete enough to allow construction of a detailed prototype hardware design. This design was implemented and tested using a hardware simulator.
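
As a brief illustration of the SWAR idea mentioned in this abstract, the Python sketch below adds four 8-bit lanes packed into one 32-bit word using ordinary integer operations, without letting carries cross lane boundaries. It shows generic SWAR arithmetic only and does not model DATA LARs; the lane width, the masks, and the name `swar_add8` are choices made for this example.

```python
H = 0x80808080  # top bit of each 8-bit lane
L = 0x7F7F7F7F  # low seven bits of each 8-bit lane


def swar_add8(x, y):
    """Add four packed 8-bit lanes modulo 256, with no carry between lanes."""
    low = (x & L) + (y & L)            # carries stop at bit 7 of their own lane
    return (low ^ ((x ^ y) & H)) & 0xFFFFFFFF  # fold the top bits back in


packed_sum = swar_add8(0x01FF7F10, 0x01010101)
```

Here the lane holding 0xFF wraps to 0x00 without disturbing its neighbour, so `packed_sum` is 0x02008011.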

7

Curtis, Bryce Allen. "A special instruction set multiple chip computer for DSP : architecture and compiler design." Diss., Georgia Institute of Technology, 1992. http://hdl.handle.net/1853/15736.

8

Mapes, Glenn. "An instruction set simulator for the 8086 16-bit microprocessor." Virtual Press, 1985. http://liblink.bsu.edu/uhtbin/catkey/416976.

Abstract:

The intent of this thesis is to show the usefulness of simulating an instruction set in software and to demonstrate the feasibility of doing so by providing the framework of a simulation program. The design of new computer architectures and computer-based control systems is a trial-and-error process. Normal design practice is to design and build a prototype of the new system and then evaluate the performance of the prototype. Designing complex systems in this manner is very time-consuming and expensive; using a software program to simulate the operation of the new system can help solve certain design problems and shorten the development time and effort. The instruction set simulator executes a subset of the 8086 instruction set and contains routines that are useful in debugging the target software. The feasibility of implementing an instruction set simulator to solve certain design problems has been demonstrated by implementing the most commonly used op codes from the 8086 instruction set. Ball State University, Muncie, IN 47306
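
The core of such a simulator is a fetch-decode-execute loop. The Python sketch below is a hedged illustration rather than the program from the thesis: it interprets five real 8086 encodings (NOP, HLT, MOV r16, imm16, INC r16 and ADD AX, imm16) and deliberately ignores flags, segments and memory operands. The function name `run` and the step limit are assumptions.

```python
REG_NAMES = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]  # 8086 16-bit register order


def run(code, max_steps=1000):
    """Execute a tiny 8086 subset until HLT and return the register file."""
    regs = dict.fromkeys(REG_NAMES, 0)
    ip = 0
    for _ in range(max_steps):
        op = code[ip]
        if op == 0xF4:                                   # HLT
            return regs
        if op == 0x90:                                   # NOP
            ip += 1
        elif 0xB8 <= op <= 0xBF:                         # MOV r16, imm16
            regs[REG_NAMES[op - 0xB8]] = code[ip + 1] | (code[ip + 2] << 8)
            ip += 3
        elif 0x40 <= op <= 0x47:                         # INC r16
            name = REG_NAMES[op - 0x40]
            regs[name] = (regs[name] + 1) & 0xFFFF
            ip += 1
        elif op == 0x05:                                 # ADD AX, imm16
            imm = code[ip + 1] | (code[ip + 2] << 8)
            regs["AX"] = (regs["AX"] + imm) & 0xFFFF
            ip += 3
        else:
            raise ValueError(f"unsupported opcode {op:#04x} at offset {ip}")
    raise RuntimeError("step limit reached without HLT")


# MOV AX, 0xFFFE ; ADD AX, 1 ; INC AX ; MOV CX, 0x1234 ; HLT
program = bytes([0xB8, 0xFE, 0xFF, 0x05, 0x01, 0x00, 0x40, 0xB9, 0x34, 0x12, 0xF4])
state = run(program)
```

After the run, AX has wrapped around to 0 and CX holds 0x1234. A debugging aid of the kind the abstract mentions could be added by printing `regs` on every iteration.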

9

Degenbaev, Ulan [Verfasser], and Wolfgang J. [Akademischer Betreuer] Paul. "Formal specification of the x86 instruction set architecture / Ulan Degenbaev. Betreuer: Wolfgang J. Paul." Saarbrücken : Saarländische Universitäts- und Landesbibliothek, 2012. http://d-nb.info/105227885X/34.

10

Bauer, Heiner. "Dynamic instruction set extension of microprocessors with embedded FPGAs." Master's thesis, Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2017. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-222858.

Abstract:

Increasingly complex applications and recent shifts in technology scaling have created a large demand for microprocessors that can perform tasks more quickly and more energy-efficiently. Conventional microarchitectures exploit multiple levels of parallelism to increase instruction throughput and use application-specific instruction sets or hardware accelerators to increase energy efficiency. Reconfigurable microprocessors adopt the same principle of providing application-specific hardware, but with the significant advantage of post-fabrication flexibility. Not only does this offer similar gains in performance, but also the flexibility to configure each device individually. This thesis explored the benefit of a tightly coupled, fine-grained reconfigurable microprocessor. In contrast to previous research, a detailed design space exploration of logical architectures for island-style field programmable gate arrays (FPGAs) was performed in the context of a commercial 22nm process technology. Other research projects either reused general-purpose architectures, which are not necessarily tailored to this use case, or spent little effort on designing and characterizing custom fabrics, which are critical to system performance and to the practicality of frequently proposed high-level software techniques. Here, detailed circuit implementations and a custom area model were used to estimate the performance of over 200 different logical FPGA architectures with single-driver routing. The results of this exploration revealed tradeoffs and trends similar to those described by previous studies. The number of lookup table (LUT) inputs and the structure of the global routing network were shown to have a major impact on the area-delay product. However, the results suggested a much larger region of efficient architectures than before. Finally, an architecture with 5-LUTs and 8 logic elements per cluster was selected. Modifications to the microprocessor, which was based on an industry-proven instruction set architecture, and to its software toolchain provided access to this embedded reconfigurable fabric via custom instructions. The baseline microprocessor was characterized with estimates from signoff data for a 28nm hardware implementation. A modified academic FPGA tool flow was used to transform Verilog implementations of custom instructions into a post-routing netlist with timing annotations. Simulation-based verification of the system was performed with a cycle-accurate processor model and diverse application benchmarks, ranging from signal processing and encryption to the computation of elementary functions. For these benchmarks, a significant increase in performance, with speedups from 3 to 15 relative to the baseline microprocessor, was achieved with the extended instruction set. Except for one case, application speedup clearly outweighed the area overhead of the extended system, even though the modeled fabric architecture was primitive and contained no explicit arithmetic enhancements. Insights into fundamental tradeoffs of island-style FPGA architectures, the developed exploration flow, and a concrete cost model are relevant for the development of more advanced architectures. Hence, this work is a successful proof of concept and has laid the basis for further investigations into architectural extensions and physical implementations. Potential for further optimization was identified on multiple levels, and numerous directions for future research were described.

11

Moustakas, Evangelos. "Design and simulation of a primitive RISC architecture using VHDL /." Online version of thesis, 1991. http://hdl.handle.net/1850/11229.

12

Chen, Wan-Fu. "A high speed 16-bit RISC processor chip /." Online version of thesis, 1994. http://hdl.handle.net/1850/11754.

13

Ong, Jia Jan. "Hardware realization of Discrete Wavelet Transform Cauchy Reed Solomon Minimal Instruction Set Computer architecture for Wireless Visual Sensor Networks." Thesis, University of Nottingham, 2016. http://eprints.nottingham.ac.uk/32583/.

Abstract:

Transmitting large amounts of image data across Wireless Visual Sensor Networks (WVSNs) increases the data transmission rate and thus the power spent on transmission. This inevitably decreases the operating lifespan of the sensor nodes and affects the overall operation of WVSNs. Limiting power consumption to prolong battery lifespan is one of the most important goals in WVSNs. To achieve this goal, this thesis presents a novel low-complexity Discrete Wavelet Transform (DWT) Cauchy Reed Solomon (CRS) Minimal Instruction Set Computer (MISC) architecture that performs data compression and data encoding (encryption) in a single architecture. Four programme instructions were developed to programme the MISC processor: Subtract and Branch if Negative (SBN), Galois Field Multiplier (GF MULT), XOR and 11TO8. With these instructions, the developed DWT CRS MISC was programmed to perform DWT image compression to reduce the image size and then to encode the DWT coefficients with CRS code to ensure data security and reliability. Both compression and CRS encoding were performed by a single architecture rather than by two separate modules, which would require a lot of hardware resources (logic slices). By reducing the number of logic slices, the power consumption can subsequently be reduced. Results show that the proposed DWT CRS MISC architecture implementation requires 142 slices (Xilinx Virtex-II), 129 slices (Xilinx Spartan-3E), 144 slices (Xilinx Spartan-3L) and 66 slices (Xilinx Spartan-6). The developed DWT CRS MISC architecture has lower hardware complexity than other existing systems, such as Crypto-Processor in Xilinx Spartan-6 (4828 slices), Low-Density Parity-Check in Xilinx Virtex-II (870 slices) and ECBC in Xilinx Spartan-3E (1691 slices). Using the RC10 development board, the developed DWT CRS MISC architecture can be implemented on the Xilinx Spartan-3L FPGA to simulate an actual visual sensor node. This verifies the feasibility of developing a joint compression, encryption and error correction processing framework in WVSNs.
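
To show why an instruction such as SBN can carry general computation, the following Python sketch interprets a sequence of SBN operations and uses two of them to perform an addition. The operand convention (subtract `mem[a]` from `mem[b]`, then branch to `target` if the result is negative), the separate program and data memories, and the name `run_sbn` are assumptions made for this example; the thesis may define the instruction differently.

```python
def run_sbn(program, mem, max_steps=10_000):
    """Run (a, b, target) triples: mem[b] -= mem[a]; jump to target if the result is negative."""
    pc = 0
    for _ in range(max_steps):
        if not 0 <= pc < len(program):   # falling off the program halts the machine
            return mem
        a, b, target = program[pc]
        mem[b] -= mem[a]
        pc = target if mem[b] < 0 else pc + 1
    raise RuntimeError("step limit reached")


# Hypothetical data layout: mem[0] = A, mem[1] = B, mem[2] = scratch cell Z (starts at 0).
A, B, Z = 0, 1, 2
add_program = [
    (A, Z, 1),   # Z = Z - A = -A; either branch outcome continues at instruction 1
    (Z, B, 2),   # B = B - Z = B + A; either branch outcome leaves the program
]
result = run_sbn(add_program, [7, 5, 0])
```

With A = 7 and B = 5 the data memory ends as `[7, 12, -7]`, so B holds the sum. Moves, comparisons and loops can be built from the same primitive in a similar way.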

14

Yuan, Fangfang. "Assessing the impact of processor design decisions on simulation based verification complexity using formal modeling with experiments at instruction set architecture level." Thesis, University of Bristol, 2012. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.566838.

Abstract:

The Instruction Set Architecture (ISA) describes the key functionalities of a processor design and is the most comprehensible format for enabling humans to understand the structure of the entire processor design. This thesis first introduces the construction of a generic ISA formal model with mathematical notations rather than programming languages, and demonstrates the extensions towards specific ISA designs. The stepwise refinement modeling technique gives rise to the hierarchically structured model, which eases the overall comprehensibility of the ISA and reduces the effort required for modeling similar designs. The ISA models serve as self-consistent, complete, and unambiguous specifications for coding, while helping engineers explore different design options beforehand. In the design phase, a selection of features is available to architects in order for the design to be trimmed towards a particular optimization target, e.g. low power consumption or fast computation, which can be assessed before implementation. However, taking verification into consideration, there is to my knowledge no way to estimate the difficulty of verifying a design before coding it. There needs to be a platform and a metric, from which both functional and non-functional properties can be quantitatively represented and then compared before implementation. Hence, this thesis secondly proposes a metric, based on the formally reasoned extension of the generic ISA models, as an estimator of some non-functional property, i.e. the verification complexity for achieving verification goals. The main claim of this thesis is that the verification complexity in simulation-based verification can be accurately retrieved from a hierarchically constructed ISA formal model in which the functionalities are fully specified with the correctness preserved. The modeling structure allows relative comparisons at a reasonably high level of abstraction brought by the hierarchically constructed formalization. The analysis on the experimental ISA emulator assesses the quality of the metric and concludes the applicability of the proposed metric.

15

Varanasi, Archana. "Course grained low power design flow using UPF /." Online version of thesis, 2009. http://hdl.handle.net/1850/11768.

16

Aminot, Alexandre. "Placement de tâches dynamique et flexible sur processeur multicoeur asymétrique en fonctionnalités." Thesis, Université Grenoble Alpes (ComUE), 2015. http://www.theses.fr/2015GREAM047/document.

Abstract:

To meet the increasingly heterogeneous needs of applications (in terms of power and energy efficiency), this thesis focuses on emerging functionally asymmetric multi-core processor (FAMP) architectures. These architectures are characterized by a non-uniform implementation of hardware extensions across the cores (e.g. the Floating Point Unit (FPU)). The area savings are apparent, but what about the impact on software, energy and performance? To answer these questions, the thesis investigates how extensions are used in state-of-the-art applications and compares various existing methods. To optimize task mapping and thus increase efficiency, the thesis proposes a dynamic solution at the scheduler level, called the relaxed scheduler. Hardware extensions are valuable because they provide speedups that are hard to reach through parallelization on a multi-core. However, applications under-exploit them, and their cost in terms of area and power consumption is significant. Based on these observations, the following contributions were developed. First, we present a detailed study of the use of the vector and FPU extensions in state-of-the-art applications. Second, we compare multiple extension management solutions acting at different levels of temporal granularity, to understand the limitations of these solutions and thus define at which level to act. Few studies address the issue of the granularity of action for managing extensions. Third, we propose a solution for estimating online the performance degradation incurred by running a task on a core without a given extension. To allow multi-cores to scale, the operating system must have flexibility in the placement of tasks. Placing a task on a core without an extension can have significant consequences for energy and performance, but to date there is no way to estimate this potential degradation. Fourth, we propose a relaxed scheduler, based on our degradation estimation model, which maps tasks onto a set of heterogeneous cores efficiently. We study the flexibility gained and the consequences for performance and energy. Existing solutions propose methods to map tasks onto a heterogeneous set of cores, but they do not study the tradeoff between quality of service and consumption gain for FAMP architectures. Our experiments on a simulator have shown that the scheduler can achieve significant mapping flexibility with a performance degradation of less than 2%. Compared to a symmetric multi-core, our solution enables an average energy gain at core level of 11%. These results are very encouraging and contribute to the development of a complete FAMP platform. This thesis has been the subject of a patent application and three international scientific communications (plus one under submission), and has contributed to two European projects.
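
The placement idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the scheduler or the estimation model of the thesis: the linear degradation formula, the `emulation_cost` factor, the per-task threshold and the names `Task`, `estimated_degradation` and `place` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    ext_fraction: float   # share of instructions using the extension (assumed measurable)


def estimated_degradation(task, emulation_cost=20.0):
    """Hypothetical linear model: relative slowdown on a core lacking the extension."""
    return task.ext_fraction * (emulation_cost - 1.0)


def place(tasks, full_cores, basic_cores, max_degradation=0.02):
    """Map each task name to 'full', 'basic', or 'wait' (no acceptable core is free)."""
    placement = {}
    # Serve the tasks that would suffer most first, so they get the full cores.
    for task in sorted(tasks, key=estimated_degradation, reverse=True):
        if full_cores > 0:
            placement[task.name] = "full"
            full_cores -= 1
        elif basic_cores > 0 and estimated_degradation(task) <= max_degradation:
            placement[task.name] = "basic"
            basic_cores -= 1
        else:
            placement[task.name] = "wait"
    return placement


tasks = [Task("fft", 0.30), Task("parser", 0.0), Task("logger", 0.001), Task("filter", 0.10)]
plan = place(tasks, full_cores=1, basic_cores=2)
```

In this run the FPU-heavy `fft` task takes the only full core, `parser` and `logger` run on cores without the extension, and `filter` waits because its estimated slowdown exceeds the threshold. The flexibility comes from letting lightly affected tasks run on cheaper cores.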

17

Wright, Stephen. "Formal construction of Instruction Set Architectures." Thesis, University of Bristol, 2009. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.508307.

18

Hedayati, Mohammad Hadi. "Visualization of microprocessor execution in computer architecture courses: a case study at Kabul University." Thesis, University of the Western Cape, 2010. http://etd.uwc.ac.za/index.php?module=etd&action=viewtitle&id=gen8Srv25Nme4_4960_1362394106.

Abstract:

Computer architecture and assembly language programming, including microprocessor execution, are basic courses taught in every computer science department. Generally, however, students have difficulties in mastering many of the concepts in the courses, particularly students whose first language is not English. In addition to their difficulties in understanding the purpose of given instructions, students struggle to mentally visualize the data movement, control and processing operations. To address this problem, this research proposed a graphical visualization approach and investigated the visual illustration of such concepts and of instruction execution by implementing a graphical visualization simulator as a teaching aid. The graphical simulator developed during the course of this research was applied in a computer architecture course at Kabul University, Afghanistan. Results obtained from student evaluation of the simulator show significant levels of success using the visual simulation teaching aid. The results showed that improved learning was achieved, suggesting that this approach could be useful in other computer science departments in Afghanistan, and elsewhere where similar challenges are experienced.

19

Ministr, Martin. "Virtuální platformy pro simulaci instrukčních sad." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2014. http://www.nusl.cz/ntk/nusl-235424.

Abstract:

This master's thesis deals with the creation of code generators for the existing virtual platforms QEMU and OVP. The work includes a study of the techniques that virtual machines rely on. The main part of the work is the design of a process that transforms input instruction sets into the code used by these virtual platforms. The result of this work is a set of functional programs that generate the code for these virtual platforms.

20

Tell, Eric. "Design of Programmable Baseband Processors." Doctoral thesis, Linköping : Univ, 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-4377.

21

Moreira, João Carlos Peralta. "An instruction set simulator for VLIW DSP architectures." Master's thesis, Universidade de Aveiro, 2015. http://hdl.handle.net/10773/18675.

Abstract:

Electronics and Telecommunications Engineering
Dissertation presented to the Universidade de Aveiro in fulfilment of the requirements for the degree of Master in Electronics and Telecommunications Engineering, carried out under the scientific supervision of Professor Manuel Bernardo Salvador Cunha, PhD, Assistant Professor at the Department of Electronics, Telecommunications and Informatics of the Universidade de Aveiro, and Dr. Mohamed Bamakhrama, Hardware Tools Engineer in the Processor and Compiler Tools team of Intel's Imaging and Camera Technologies group, Eindhoven, the Netherlands.

22

Barták, Jiří. "Model procesoru RISC-V." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2016. http://www.nusl.cz/ntk/nusl-255393.

Full text

Abstract:

The number of application-specific instruction set processors is rapidly increasing because of growing demand for low-power, small-area designs. Many new instruction sets are being created, but they are usually confidential. The University of California, Berkeley, took the opposite approach: the RISC-V instruction set is completely free. This master's thesis analyzes the RISC-V instruction set and two languages used to model instruction sets and microarchitectures, CodAL and Chisel. The main goals of the thesis are an implementation of the RISC-V base instruction set, together with the multiplication, division, and 64-bit address space extensions, and a cycle-accurate model of a Rocket Core-like microarchitecture in CodAL. The instruction set model is used to generate a C compiler and the cycle-accurate model is used to generate an RTL representation, both with Codasip Studio. The generated compiler is compared against a manually implemented one, and the results are used for instruction set optimizations. The RTL is synthesized for an Artix 7 FPGA and compared with the Rocket Core synthesis.

APA, Harvard, Vancouver, ISO, and other styles

23

Musasa, Mutombo Mike. "Evaluation of embedded processors for next generation asic : Evaluation of open source Risc-V processors and tools ability to perform packet processing operations compared to Arm Cortex M7 processors." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-299656.

Full text

Abstract:

Network processors are now an integral part of information technology. With the deployment of 5G networks ramping up around the world, numerous new devices are going to take advantage of their processing power and programming flexibility. Information technology providers such as Ericsson spend a great amount of financial resources on licensing deals to use processors with proprietary instruction set architectures from companies like Arm Holdings. A new non-proprietary instruction set architecture, known as Risc-V, is being developed. There are many open-source processors based on the Risc-V architecture, but it is still unclear how well an open-source Risc-V processor performs network packet processing tasks compared to an Arm-based processor. The main purpose of this thesis is to design a test model that simulates and evaluates how well an open-source Risc-V processor performs packet processing compared to an Arm Cortex M7 processor. This was done by writing C code that simulates some key packet processing functions operating on 50 randomly generated 72-byte data packets. The following functions were tested: framing, parsing, pattern matching, and classification. The code was ported to and executed on both an Arm Cortex M7 processor and an emulated open-source Risc-V processor. A working packet processing test code was built and evaluated on an Arm Cortex M7 processor. Three different open-source Risc-V processors were tested: Arianne, SweRV core, and Rocket-chip. The execution times in both cases were analyzed and compared. The execution time of the test code on Arm was 67.5 ns. Based on the results, it can be argued that open-source Risc-V processor tools are not yet fully reliable or ready to be used for packet processing applications. Further evaluation should be performed on this topic, with a more in-depth look at the SweRV core processor and at physical open-source Risc-V hardware instead of emulators.
Network processors are an important building block of information technology today. As 5G networks are rolled out around the world, many more devices will be able to take advantage of their powerful performance and programming flexibility. Information technology companies such as Ericsson spend substantial financial resources on licenses to use processors based on proprietary instruction set architectures from Arm Holdings. Continuing to buy licenses is very costly, since these architectures are a building block in the design of many processors and other components. Today there is a promising new, license-free instruction set architecture known as Risc-V. Thanks to Risc-V, many proprietary and open-source processors have been developed. However, very little is known today about how well they perform in network applications. Can an open-source Risc-V processor perform network data processing functions as well as a proprietary Arm Cortex M7 processor? The main purpose of this work is to build a test model that examines how well an open-source Risc-V based processor performs processing operations on network data packets compared to an Arm Cortex M7 processor. This was done by developing C code that simulates the reception and processing of 72-byte data packets. The following functions were tested: framing, parsing, pattern matching, and classification. The code was compiled and tested on both an Arm Cortex M7 processor and three different emulated open-source Risc-V processors: Arianne, SweRV core, and Rocket-chip. After testing several open-source Risc-V processors and running the test code on an Arm Cortex M7 processor, it can be argued that the open-source Risc-V processor tools are not yet sufficiently reliable. This report indicates that open-source Risc-V emulators and tools need further development before they can be used in network applications.
There is a need for further investigation of this topic in the future, for example a deeper study of the SweRV core processor or of physical hardware built on open-source Risc-V.

APA, Harvard, Vancouver, ISO, and other styles

24

Saghir, Mazen A. R. "Application-specific instruction-set architectures for embedded DSP applications." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1998. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape11/PQDD_0021/NQ53899.pdf.

Full text

APA, Harvard, Vancouver, ISO, and other styles

25

Pajak, Dominic. "Specification of microprocessor instruction set architectures : ARM case study." Thesis, University of Leeds, 2005. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.422038.

Full text

APA, Harvard, Vancouver, ISO, and other styles

26

Bocco, Andrea. "A variable precision hardware acceleration for scientific computing." Thesis, Lyon, 2020. http://www.theses.fr/2020LYSEI065.

Full text

Abstract:

Most floating-point (FP) hardware arithmetic units support the formats and operations specified in the IEEE 754 standard. These formats have a fixed bit length and are defined on 16, 32, 64, and 128 bits. However, some applications, for example linear system solvers or computational geometry, could benefit from different formats that represent floating-point numbers in different sizes, with different trade-offs between the exponent and mantissa fields. The class of variable precision (VP) formats meets these requirements. The objective of this research is to propose a VP computing system able to increase the precision or the computational efficiency of problems by offering a finer granularity of FP operations. This work proposes a VP FP computing system based on three computation layers. The external layer supports existing IEEE formats for input and output variables. The internal layer uses variable-length registers for high-precision multiply-add operations. Finally, an intermediate layer handles loading and storing intermediate results in cache memory without loss of precision, with a dynamically adjustable VP format. Supporting different formats for the internal representation and for storage in nearby memory makes it possible to consider "large vectors" in VP while retaining high computing precision in the internal layer. The VP unit exploits the UNUM type I FP format and proposes solutions to some of its intrinsic difficulties, such as the variable latency of the internal operation and the variable memory footprint of intermediate variables. Unlike the formats defined by IEEE 754, in UNUM type I the size of a number is stored in the representation itself.
This work proposes an instruction set architecture (ISA) for programming the VP computing system that follows the structure of the computation layers described above. The goal of this ISA is to establish a clear separation between the memory format and the format inside the coprocessor. With this ISA, the programmer can write VP programs such that the generated assembly instructions are decoupled from the size and formats of the program's variables. This decoupling is achieved by storing information about the size, precision, and format of the program's variables in dedicated status registers inside the VP unit. These status registers are used by a Load and Store Unit (LSU), tightly coupled to the VP computing unit, which handles data conversion between the computation layers.
Most Floating-Point (FP) hardware units support the formats and operations specified in the IEEE 754 standard. These formats have a fixed bit length and are defined on 16, 32, 64, and 128 bits. However, some applications, such as linear system solvers and computational geometry, benefit from different formats that can express FP numbers in different sizes and with different trade-offs between the exponent and mantissa fields. The class of Variable Precision (VP) formats meets these requirements. This research proposes a VP FP computing system based on three computation layers. The external layer supports legacy IEEE formats for input and output variables. The internal layer uses variable-length internal registers for inner-loop multiply-add. Finally, an intermediate layer supports loads and stores of intermediate results to cache memory without losing precision, with a dynamically adjustable VP format. The VP unit exploits the UNUM type I FP format and proposes solutions to address some of its pitfalls, such as the variable latency of the internal operation and the variable memory footprint of the intermediate variables. Unlike IEEE 754, in UNUM type I the size of a number is stored within its representation. The unit implements a fully pipelined architecture, and it supports up to 512 bits of precision, internally and in memory, for both interval and scalar computing. The user can configure the storage format and the internal computing precision at 8-bit and 64-bit granularity. This system is integrated as a RISC-V coprocessor. The system has been prototyped on an FPGA (Field-Programmable Gate Array) platform and also synthesized for a 28 nm FDSOI process technology. The respective working frequencies of the FPGA and ASIC implementations are 50 MHz and 600 MHz. Synthesis results show that the estimated chip area is 1.5 mm², and the estimated power consumption is 95 mW.
The experiments emulated in an FPGA environment show that the latency and the computation accuracy of this system scale linearly with the memory format length set by the user. In cases where legacy IEEE 754 formats do not converge, this architecture can achieve up to 130 decimal digits of precision, increasing the chances of obtaining output data with an accuracy similar to that of the input data. This high accuracy opens the possibility of using direct methods, which are more sensitive to computational error, instead of iterative methods, which always converge. However, their latency is ten times higher than the direct ones. Compared to low-precision FP formats, in iterative methods the use of high-precision VP formats helps to drastically reduce the number of iterations the algorithm needs to converge, reducing the application latency by up to 50%. Compared with the MPFR software library, the proposed unit achieves speedups between 3.5x and 18x, with comparable accuracy.

APA, Harvard, Vancouver, ISO, and other styles

27

Fick, David. "A virtual machine framework for domain-specific languages." Diss., Pretoria : [S.n.], 2007. http://upetd.up.ac.za/thesis/available/etd-10192007-163559/.

Full text

APA, Harvard, Vancouver, ISO, and other styles

28

Williams, Fleur Liane. "The impact of instruction set orthogonality on compiler code generation." Thesis, University of Hertfordshire, 1989. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.252688.

Full text

APA, Harvard, Vancouver, ISO, and other styles

29

Shee, Seng Lin Computer Science & Engineering Faculty of Engineering UNSW. "ADAPT : architectural and design exploration for application specific instruction-set processor technologies." Awarded by: University of New South Wales, 2007. http://handle.unsw.edu.au/1959.4/35404.

Full text

Abstract:

This thesis presents design automation methodologies for extensible processor platforms in application-specific domains. The work first presents a single-processor approach to customization: a methodology that can rapidly create different processor configurations by removing unused instruction sets from the architecture. A profile-directed approach is used to identify frequently used instructions and to eliminate unused opcodes from the available instruction pool. A coprocessor approach is explored next, creating an SoC (System-on-Chip) that speeds up the application while reducing energy consumption. Loops in applications are identified and accelerated by tightly coupling a coprocessor to an ASIP (Application Specific Instruction-set Processor). Latency hiding is used to exploit the parallelism provided by this architecture. A case study has been performed on a JPEG encoding algorithm, comparing two different coprocessor approaches: a high-level synthesis approach and our custom coprocessor approach. The thesis concludes by introducing a heterogeneous multiprocessor system that uses ASIPs as processing entities in a pipeline configuration. The problem of mapping each algorithmic stage in the system to an ASIP configuration is formulated. We propose an estimation technique to calculate runtimes of the configured multiprocessor system without running cycle-accurate simulations, which could take a significant amount of time. We present two heuristics to efficiently search the design space of a pipeline-based multi-ASIP system and compare the results against an exhaustive approach. In our first approach, we show that, on average, processor size can be reduced by 30% and energy consumption by 24%, while performance is improved by 24%.
In the coprocessor approach, compared with the use of a main processor alone, a loop performance improvement of 2.57x is achieved using the custom coprocessor approach, as against 1.58x for the high-level synthesis method and 1.33x for the customized instruction approach. Energy savings are 57%, 28%, and 19%, respectively. Our multiprocessor design provides a performance improvement of at least 4.03x for JPEG and 3.31x for MP3 over a single-processor system. The minimum cost obtained using our heuristic was within 0.43% and 0.29% of the optimum values for the JPEG and MP3 benchmarks, respectively.
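
As a rough illustration of the profile-directed step described in this abstract, the sketch below counts how often each opcode appears in an execution trace and reports the opcodes that never occur, which would be candidates for removal. The instruction pool, the trace, and the function name `unused_opcodes` are all hypothetical; the thesis's actual tooling is not described in this listing.

```python
from collections import Counter


def unused_opcodes(instruction_pool, trace):
    """Return (usage counts, sorted list of opcodes never seen in the trace)."""
    counts = Counter(trace)
    unused = sorted(op for op in instruction_pool if counts[op] == 0)
    return counts, unused


# Hypothetical instruction pool and profiled trace (illustrative only).
POOL = {"add", "sub", "mul", "div", "ld", "st", "br"}
TRACE = ["ld", "add", "add", "st", "br", "ld", "mul", "st"]

counts, unused = unused_opcodes(POOL, TRACE)
print("usage:", dict(counts))
print("removal candidates:", unused)
```

A real flow would profile several representative workloads before removing anything, since an opcode absent from one trace may still be needed by another application.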

APA, Harvard, Vancouver, ISO, and other styles

30

Dolíhal, Luděk. "Testování generovaných překladačů jazyka c pro procesory ve vestavěných systémech." Doctoral thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2017. http://www.nusl.cz/ntk/nusl-412583.

Full text

Abstract:

Embedded systems have become indispensable to our everyday life. They are usually narrowly focused, highly optimized, single-purpose devices. The core of an embedded device is usually formed by one or more application-specific instruction-set processors. This dissertation focuses on testing the tools used to design application-specific processors and, subsequently, the application-specific processors themselves. The aim was to create a system in which individual tools, such as the compiler, assembler, disassembler, and debugger, can be tested. However, there is also a need for more complex tests, for example integration tests, which ensure that no incompatibility arises between the individual tools. With the support of a continuous integration server, the author created an environment that helps detect and remove errors in the design of application-specific processors and that is, moreover, largely automated.

APA, Harvard, Vancouver, ISO, and other styles

31

Husár, Adam. "Implementace obecného assembleru." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2007. http://www.nusl.cz/ntk/nusl-412779.

Full text

Abstract:

This thesis describes the design of a universal assembler that forms part of the Lissom project. It describes assembler architectures and their usual tasks, paying special attention to the GNU assembler. The designed assembler consists of a fixed part and a generated part. The generated part is created automatically from a description of the instruction set, written in ISAC, an architecture and instruction set description language. With this approach, the assembler's target architecture can be changed automatically. The second part of the thesis describes the implementation of the Parserlib2 library, which is part of the Lissom project and provides information about the target instruction set to the assembler generator.

APA, Harvard, Vancouver, ISO, and other styles

32

Grad, Mariusz [Verfasser], and Marco [Akademischer Betreuer] Platzner. "Just-in-time processor customization on the feasibility and limitations of FPGA-based dynamically reconfigurable instruction set architectures / Mariusz Grad. Betreuer: Marco Platzner." Paderborn : Universitätsbibliothek, 2011. http://d-nb.info/1036423565/34.

Full text

APA, Harvard, Vancouver, ISO, and other styles

33

Schwarz, Oliver. "No Hypervisor Is an Island : System-wide Isolation Guarantees for Low Level Code." Doctoral thesis, KTH, Teoretisk datalogi, TCS, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-192466.

Full text

Abstract:

The times when malware was mostly written by curious teenagers are long gone. Nowadays, threats come from criminals, competitors, and government agencies. Some of them are very skilled and very targeted in their attacks. At the same time, our devices – for instance mobile phones and TVs – have become more complex, connected, and open for the execution of third-party software. Operating systems should separate untrusted software from confidential data and critical services. But their vulnerabilities often allow malware to break the separation and isolation they are designed to provide. To strengthen protection of select assets, security research has started to create complementary machinery such as security hypervisors and separation kernels, whose sole task is separation and isolation. The reduced size of these solutions allows for thorough inspection, both manual and automated. In some cases, formal methods are applied to create mathematical proofs on the security of these systems. The actual isolation solutions themselves are carefully analyzed and included software is often even verified on binary level. The role of other software and hardware for the overall system security has received less attention so far. The subject of this thesis is to shed light on these aspects, mainly on (i) unprivileged third-party code and its ability to influence security, (ii) peripheral devices with direct access to memory, and (iii) boot code and how we can selectively enable and disable isolation services without compromising security. The papers included in this thesis are both design and verification oriented, however, with an emphasis on the analysis of instruction set architectures. With the help of a theorem prover, we implemented various types of machinery for the automated information flow analysis of several processor architectures. The analysis is guaranteed to be both sound and accurate.
Malware used to be written mostly by curious teenagers. Today our computers are under constant threat from government organizations, criminal groups, and perhaps even our business competitors. Some possess great expertise and can carry out focused attacks. At the same time, the technology around us (such as mobile phones and TV sets) has become more complex, connected, and open to executing third-party software. Operating systems should really isolate sensitive data and critical services from untrusted software. But their vulnerabilities usually make it possible for malware to get past the operating systems' security mechanisms. This has led to the development of complementary tools whose sole function is to improve the isolation of selected sensitive resources. Special virtualization software and separation kernels are examples of such tools. Because such solutions can be developed with a relatively small code base, they can be analyzed thoroughly, both manually and automatically. In some cases, formal methods are used to generate mathematical proofs that the system is secure. The isolation software itself is usually extensively verified, sometimes even at the assembly level. However, the impact of other components on system security has so far received less attention, both for hardware and for other software. This thesis attempts to shed light on these aspects, mainly (i) unprivileged third-party code and how it can affect security, (ii) peripheral devices with direct access to memory, and (iii) boot code, as well as how isolation services can be securely enabled and disabled without restarting the system. The thesis is based on six previous publications that address both design and verification aspects, but mostly the security analysis of instruction sets. Based on a theorem prover, we have developed various tools for the automated information flow analysis of processors.
We have used these tools to clarify which registers unprivileged software can access on ARM and MIPS machines. This analysis is guaranteed to be both sound and precise. To the best of our knowledge, we are the first to publish a solution for the automated analysis and proof of information flow properties in standard instruction sets.

QC 20160919

PROSPER
HASPOC

APA, Harvard, Vancouver, ISO, and other styles

34

"Reducing a complex instruction set computer." Chinese University of Hong Kong, 1988. http://library.cuhk.edu.hk/record=b5885967.

Full text

APA, Harvard, Vancouver, ISO, and other styles

35

Su, Heng-I., and 蘇恆毅. "An Instruction Set Architecture Simulator for Embedded Processor Design." Thesis, 2002. http://ndltd.ncl.edu.tw/handle/05833976379712231803.

Full text

Abstract:

Master's thesis
National Tsing Hua University
Department of Electrical Engineering
90
Evaluating the design of an embedded processor is important at every level, and especially at the architecture level. Accurate evaluation at the architecture level is the key to improving system performance, but it is not easy to fix the complete design at that level. Designers need to spend a lot of time exploring different architectures based on the target applications. Without an appropriate simulation tool for performance evaluation, exploring different processor architectures would be painful, if possible at all. An instruction set architecture simulator is a simulation tool that attempts to simplify this work. In this thesis, we propose an instruction-accurate and cycle-accurate instruction set architecture simulator for embedded processor design. It helps us describe different embedded processors easily and quickly, using a simple architecture description method that we developed. Based on the simulation results, it is easy to choose the highest-performance architecture with an acceptable area overhead. A debugging environment is also provided, which is important for application software development. It allows easy modification of the source code: if there are special opcodes that our simulator does not support, one can revise the source code within the proposed environment. In our experiments, we simulated and evaluated the performance of several processor architectures. Based on the results, we were able to modify the architectures to improve their performance. The performance improvement varies from 19% to 42% in these cases.

APA, Harvard, Vancouver, ISO, and other styles

36

Hu, Ya-Lun, and 胡亞倫. "Design and Evaluation of Advanced RISC Instruction Set Architecture." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/03080070015891954595.

Full text

Abstract:

Master's thesis
National Chung Cheng University
Graduate Institute of Computer Science and Information Engineering
94
In the embedded domain, performance and power consumption are usually the design constraints, and a good instruction set architecture plays a key role in meeting them. A successful embedded processor must be accompanied by an excellent instruction set, as is the case for ARM, the most popular processor in the embedded domain. In this paper we propose sub-computing instructions, load and store mask instructions, a prefetch instruction, and a repeat instruction to improve performance. We also propose compression instructions to improve code density. In addition, we develop an instruction-level, cycle-accurate C simulator for evaluating and refining our design. Finally, we compare our design with ARM using the MiBench benchmark suite.

APA, Harvard, Vancouver, ISO, and other styles

37

Chen, Jeng-Hung, and 陳政宏. "ARM Cortex-A8 NEON Instruction Set and Architecture Study." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/93pyw4.

Full text

Abstract:

Master's thesis
National Taiwan University of Science and Technology
Department of Electronic Engineering
99
ARM processors continue to advance steadily in the market and have undoubtedly become the most popular and most important processors in embedded systems. More and more chip makers adopt ARM processor cores and integrate hardware accelerators, DSPs, and other peripherals according to their individual needs to differentiate their products. ARM-core SoCs have become the mainstream standard for SoC design. ARM Cortex is the newest generation of ARM processors, replacing the earlier ARM7, ARM9, and ARM11 of the V4/V5/V6 architectures. It offers high-efficiency, low-power processors in the A, R, and M profiles, which together cover the application needs of all kinds of systems. Multimedia is widely used in embedded environments, but the computational complexity of multimedia processing is still a heavy burden for the processor, and achieving high performance with low power consumption is an important topic. Starting with the A8 and A9, the ARM A profile provides the NEON SIMD instruction set to support portable, low-power multimedia software. In this thesis we study the new technology supported by the ARM Cortex-A8 core: SIMD (Single Instruction, Multiple Data), implemented as NEON. We find ways to improve performance through compiler optimization options and program structure. The analysis shows that our method can speed up multimedia programs by roughly 2 to 4 times by applying SIMD (NEON).

APA, Harvard, Vancouver, ISO, and other styles

38

Chiu, Tai-En, and 邱泰恩. "An extensible instruction set architecture design and its toolchain implementation." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/31263591761985977538.

Full text

Abstract:

Master's thesis
National Cheng Kung University
Institute of Computer and Communication Engineering
96
The design methodology of embedded processors can adopt the design flow of an Application-Specific Instruction-Set Processor (ASIP) to perform various types of operations more efficiently. In this thesis, we present the design of an extensible instruction set architecture (ISA) for ASIP systems. By removing the less frequently used functionality of the ARMv4 ISA and rearranging its binary encoding, we obtain an extended instruction encoding space. Special-purpose instructions can be added to this extended space without any constraint. To make this extensible ISA usable, we also implement the corresponding software toolchain, which includes an assembler, a linker, and some basic libraries. To verify the software toolchain, we modify our RISC32 processor. We first use our toolchain to generate an executable binary image, and then execute this image on an HDL simulation of our RISC32 processor. Finally, we compare the simulator's output with reference results to check correctness.

APA, Harvard, Vancouver, ISO, and other styles

39

"Application-specific instruction set processor for speech recognition." 2005. http://library.cuhk.edu.hk/record=b5892381.

Full text

Abstract:

Cheung Man Ting.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2005.
Includes bibliographical references (leaves 69-71).
Abstracts in English and Chinese.
Chapter 1 --- Introduction
Chapter 1.1 --- The Emergence of ASIP
Chapter 1.1.1 --- Related Work
Chapter 1.2 --- Motivation
Chapter 1.3 --- ASIP Design Methodologies
Chapter 1.4 --- Fundamentals of Speech Recognition
Chapter 1.5 --- Thesis outline
Chapter 2 --- Automatic Speech Recognition
Chapter 2.1 --- Overview of ASR system
Chapter 2.2 --- Theory of Front-end Feature Extraction
Chapter 2.3 --- Theory of HMM-based Speech Recognition
Chapter 2.3.1 --- Hidden Markov Model (HMM)
Chapter 2.3.2 --- The Typical Structure of the HMM
Chapter 2.3.3 --- Discrete HMMs and Continuous HMMs
Chapter 2.3.4 --- The Three Basic Problems for HMMs
Chapter 2.3.5 --- Probability Evaluation
Chapter 2.4 --- The Viterbi Search Engine
Chapter 2.5 --- Isolated Word Recognition (IWR)
Chapter 3 --- Design of ASIP Platform
Chapter 3.1 --- Instruction Fetch
Chapter 3.2 --- Instruction Decode
Chapter 3.3 --- Datapath
Chapter 3.4 --- Register File Systems
Chapter 3.4.1 --- Memory Hierarchy
Chapter 3.4.2 --- Register File Organization
Chapter 3.4.3 --- Special Registers
Chapter 3.4.4 --- Address Generation
Chapter 3.4.5 --- Load and Store
Chapter 4 --- Implementation of Speech Recognition on ASIP
Chapter 4.1 --- Hardware Architecture Exploration
Chapter 4.1.1 --- Floating Point and Fixed Point
Chapter 4.1.2 --- Multiplication and Accumulation
Chapter 4.1.3 --- Pipelining
Chapter 4.1.4 --- Memory Architecture
Chapter 4.1.5 --- Saturation Logic
Chapter 4.1.6 --- Specialized Addressing Modes
Chapter 4.1.7 --- Repetitive Operation
Chapter 4.2 --- Software Algorithm Implementation
Chapter 4.2.1 --- Implementation Using Base Instruction Set
Chapter 4.2.2 --- Implementation Using Refined Instruction Set
Chapter 5 --- Simulation Results
Chapter 6 --- Conclusions and Future Work
Appendices
Appendix A --- Base Instruction Set
Appendix B --- Special Registers
Appendix C --- Chip Microphotograph of ASIP
Appendix D --- The Testing Board of ASIP
Bibliography

APA, Harvard, Vancouver, ISO, and other styles

40

Wang, Albert, and 王伯文. "Improving instruction set design of embedded microcontroller architecture based on Transport-Triggered Architecture and VLIW." Thesis, 2005. http://ndltd.ncl.edu.tw/handle/74214506110234397499.

Full text

Abstract:

Master's thesis
National Taiwan University of Science and Technology
Department of Electronic Engineering
93
In this paper, we propose a new instruction set design concept based on Very Long Instruction Word (VLIW) and Transport Triggered Architecture (TTA). VLIW offers a high degree of parallelism and is easy to implement in hardware, but it suffers from poor code density and poor binary compatibility. Unlike traditional architectures, TTA achieves operations through data movement. Because the only operation is the move, TTA implementations are simpler than those of other architectures and are easy to extend for specific applications. But TTA has the same disadvantages as VLIW. We analyze and propose improvements for VLIW and TTA in two respects. For the disadvantages of VLIW, we propose an instruction tag to improve flexibility and binary compatibility. For TTA, we propose a multiple-source instruction format, positioned between TTA and the traditional RISC architecture, to address code density. We also present an instruction set implementation that combines the two concepts.
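
The idea that a TTA performs operations as a side effect of data movement can be sketched in a few lines. The example below is a minimal toy model, not the design proposed in this thesis: the `AddUnit` class, the port names (`operand`, `trigger`, `result`), and the move syntax are all assumptions made for illustration.

```python
class AddUnit:
    """Hypothetical function unit: one operand port, one trigger port, one result port."""

    def __init__(self):
        self.operand = 0
        self.result = 0

    def write(self, port, value):
        if port == "operand":
            self.operand = value                 # latch only; nothing executes
        elif port == "trigger":
            self.result = self.operand + value   # the move itself starts the add
        else:
            raise KeyError(f"unknown port: {port}")


def run(moves, registers, units):
    """Execute (source, destination) pairs; 'move' is the only instruction."""
    for src, dst in moves:
        # Resolve the source: an immediate, a unit's result port, or a register.
        if isinstance(src, int):
            value = src
        elif src.endswith(".result"):
            value = units[src.split(".")[0]].result
        else:
            value = registers[src]
        # Resolve the destination: a unit port or a register.
        if "." in dst:
            unit, port = dst.split(".")
            units[unit].write(port, value)
        else:
            registers[dst] = value
    return registers


regs = run(
    moves=[
        (2, "r0"),                 # r0 <- 2
        (3, "r1"),                 # r1 <- 3
        ("r0", "add.operand"),     # latch the first operand
        ("r1", "add.trigger"),     # writing the trigger port performs 2 + 3
        ("add.result", "r2"),      # read the sum back into a register
    ],
    registers={},
    units={"add": AddUnit()},
)
print(regs["r2"])
```

Note that one addition costs three moves here, which hints at the code density problem the abstract mentions.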

41

Fang, Jhih-Jhong, and 方志中. "The Design and Verification of an ARMv4T Instruction Set Architecture Compatible Microprocessor IP." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/82114362652112642892.

Abstract:

Master's
National Taiwan University of Science and Technology
Department of Electronic Engineering
97
In this thesis, an ARMv4T instruction set architecture compatible microprocessor IP (Intellectual Property), Proto3-ARM9TM, is proposed. To improve on the performance of the Proto-ARM9M processor [25], we redesign the architecture of the processor and its major modules. In the processor design, the number of pipeline stages is increased from 5 to 9, the register file is built from SRAMs instead of D flip-flops, the multiply-accumulator uses a pipelined parallel multiplier instead of an iterative multiplier, and a barrel shifter replaces the DesignWare shifter block. We also employ the following mechanisms to reduce the effects of hazards: two groups of forwarding paths, two stages for exception detection, a 128x60 branch target cache, and a 2-bit branch prediction scheme. For the instruction set, we implement both the coprocessor and the Thumb instruction sets. In addition, a coprocessor interface for the Proto3-ARM9TM processor is designed and implemented. Compared with the Proto-ARM9M processor, the operating frequency increases from 21 MHz to 45.3 MHz on the same FPGA platform, the IPC increases from 0.47 to 0.7 on the same set of test programs, and the performance increases by 221.58%. The Proto3-ARM9TM system, along with AMBA and its related peripherals, is implemented and verified on a Xilinx Spartan-3 XC3S1500-4FG676 FPGA and with the TSMC 0.18 μm cell library. When realized on the FPGA, the Proto3-ARM9TM system consumes 9728 LUTs and operates at a maximum frequency of 31 MHz. When realized with the cell library, the Proto3-ARM9TM has a core area of 1.400×1.393 mm² and a whole-chip area of 2.280×2.274 mm². The average power consumption is 83 mW at an operating frequency of 134 MHz.
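The abstract names a 2-bit branch prediction scheme without describing its implementation. The following is a minimal sketch of the textbook 2-bit saturating-counter predictor, not the Proto3-ARM9TM design itself; the table size, the indexing by low PC bits, and the initial counter value are assumptions made for illustration.

```python
class TwoBitPredictor:
    """Table of 2-bit saturating counters (0..3); values 2 and 3 predict taken."""

    def __init__(self, entries=128):
        self.entries = entries               # illustrative table size (assumption)
        self.counters = [1] * entries        # start in the weakly not-taken state

    def _index(self, pc):
        # Drop the byte offset of a 4-byte instruction, then wrap into the table.
        return (pc >> 2) % self.entries

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)   # saturate at 3
        else:
            self.counters[i] = max(0, self.counters[i] - 1)   # saturate at 0


def accuracy(predictor, pc, outcomes):
    """Fraction of correct predictions for one branch over a sequence of outcomes."""
    hits = 0
    for taken in outcomes:
        if predictor.predict(pc) == taken:
            hits += 1
        predictor.update(pc, taken)
    return hits / len(outcomes)
```

For a loop branch that is taken nine times and then falls through, the counter mispredicts the loop exit but stays in a taken state, so the next run of the loop is predicted correctly from its first iteration.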

42

Chang, Tzu-Mu, and 張慈牧. "The Design and Implementation of a 6502 Instruction Set Architecture Compatible Microprocessor IP." Thesis, 2003. http://ndltd.ncl.edu.tw/handle/59486858963263400232.

Abstract:

Master's
National Taiwan University of Science and Technology
Department of Electronic Engineering
91
In this thesis, a 6502 instruction set architecture compatible microprocessor IP (Intellectual Property), Proto-6502, is proposed. Since the reusability of an IP depends on the completeness of its verification, a reconfigurable verification environment with automatic result comparison is also proposed to ensure the consistency of every implementation stage and the completeness of the verification. Based on the behavior of every 6502 instruction, we design the ASM chart and the microoperations of Proto-6502. To balance speed and area, the microoperations and their corresponding datapath are adjusted according to the results of a delay-cost analysis. All of the behavioral-level and register transfer level (RTL) designs have been verified in the verification environment, and the behavior of all instructions is consistent with the 6502 microprocessor; in addition, the average statement coverage of every module in Proto-6502 is 94.4%. In the synthesis stage, a suitable state encoding style for Proto-6502's FSM is chosen according to area, speed, and power analyses. To reduce the power consumption of the circuit, we use clock gating to eliminate unnecessary switching activity in the registers. The reduction in power consumption is 20.81%. Proto-6502 has been implemented and verified with a Xilinx Virtex 400 FPGA and the TSMC 0.25 μm cell library. In the FPGA implementation, it takes 2010 LUTs and operates at an internal working frequency of 12.9 MHz. In the cell-based implementation, the core occupies an area of 333 μm × 333 μm, which is approximately equivalent to 3700 gates, and consumes about 68 mW under typical operating conditions at an internal working frequency of 80 MHz.

43

Lin, Jin-Ho, and 林晉禾. "The Design and Verification of an ARM v4 Instruction Set Architecture Compatible Microprocessor IP." Thesis, 2005. http://ndltd.ncl.edu.tw/handle/35790359121880667696.

Abstract:

Master's
National Taiwan University of Science and Technology
Department of Electronic Engineering
93
In this thesis, an ARM v4 instruction set architecture compatible microprocessor IP (Intellectual Property), Proto-ARM9M, is proposed. Since the reusability of an IP depends on the completeness of its verification, we also develop a test environment to demonstrate the accuracy of the IP at each step of the ASIC design flow: the behavioral-level, register-transfer-level, post-synthesis gate-level, and post-layout gate-level models. Based on the behavior of the ARM v4 instruction set architecture, we design the behavioral-level model of Proto-ARM9M and establish a test environment to verify it. After the behavioral-level model is verified, we design the register-transfer-level model of Proto-ARM9M. A typical five-stage pipeline is used in the Proto-ARM9M datapath. The individual datapath modules, such as the instruction decoder, register file, shifter, arithmetic and logic unit, multiply-accumulator, and program status register, are designed carefully to improve performance and reduce area. The register-transfer-level simulation results in the testbench match those of the ARM instruction simulator in ADS (ARM Developer Suite), and the average code coverage of every module in Proto-ARM9M is 96.58%. Proto-ARM9M has been implemented and verified with a Xilinx Spartan-3 XC3S1500-4FG676 FPGA and the TSMC 0.35 μm cell library. In the FPGA implementation, it takes 9304 LUTs and operates at a maximum working frequency of 21 MHz. Furthermore, all of the test programs run successfully on the FPGA development board. In the cell-based implementation, the core occupies 3420.8 μm × 3212.5 μm, which is approximately equivalent to 55450 gates, and the whole chip occupies 5251.8 μm × 5087.4 μm. Proto-ARM9M consumes about 151.2 mW to 192.8 mW in the SS (Slow NMOS Slow PMOS model) simulation condition at the maximum working frequency of 33 MHz.

44

Hung, Yu-Ting, and 洪毓廷. "X86-64 Instruction Set Architecture Supports for an LLVM-Based Retargetable Hybrid Binary Translator." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/15113686460618532792.

Abstract:

Master's
National Chiao Tung University
Institute of Computer Science and Engineering
105
Hybrid binary translation (HBT) is a binary translation technology that combines static binary translation and dynamic binary translation. HBT-86 is an LLVM-based retargetable HBT system for the x86 instruction set architecture (ISA). In the previous HBT-86, the front-end supports only the x86 integer, x87 floating-point, and part of the SSE SIMD integer instruction sets, and the back-end supports the x86 and x64 target platforms. However, compared with a 32-bit ISA, a 64-bit ISA can access larger memory and larger registers. As a result, 64-bit application executables have become increasingly common in recent years. In this thesis, we extend the previous HBT-86 to support the x64 source ISA. Moreover, to validate the retargetability of HBT-86, we extend it to support the ARM-64 target ISA. In the x64-to-x64 emulation experiments, our HBT-86 is about 2.30 and 2.14 times faster than QEMU on the SPEC2006 CINT and SPEC2006 CFP benchmarks, respectively. In the x64-to-ARM-64 emulation experiments, our HBT-86 is about 3.68 and 9.27 times faster than QEMU on the SPEC2006 CINT and SPEC2006 CFP benchmarks, respectively.

45

江建德. "Efficient Two-Layered Cycle-Accurate Modeling Technique for Processor Family with Same Instruction Set Architecture." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/54733771635248826664.

46

Wu, Jin-You, and 巫謹佑. "The design and implementation of an SoC based on the RISC-V Instruction Set Architecture." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/qb64q4.

Abstract:

Master's
National Chiao Tung University
Institute of Computer Science and Engineering
107
The purpose of this research is to design and implement a system-on-chip (SoC) based on the RISC-V instruction set architecture (ISA). RISC-V is an open source ISA based on the Reduced Instruction Set Computing (RISC) principle. Unlike most RISC ISAs, the RISC-V ISA is freely available for any purpose, allowing anyone to design, manufacture, and sell RISC-V chips and software. In recent years, RISC-V has risen rapidly, driven by the booming development of the Internet of Things (IoT) and by the licensing and patent terms of commercial RISC ISAs. Because IoT scenarios are fragmented, there is high demand for low power, low cost, and customization, which are features of the RISC-V ISA. Therefore, in this research, we design and implement a RISC-V processor that supports RV32IM. After the design and implementation, we run the official RISC-V ISA tests and use the standard Dhrystone benchmark to evaluate the performance (DMIPS/MHz) of our proposed RISC-V processor in a full-system simulation environment. In addition, we compile the binary file with the open source GNU Compiler Toolchain and verify the correctness of the proposed processor SoC running on the Xilinx KC705 FPGA development platform.
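The DMIPS/MHz metric mentioned above can be computed from a simulation's cycle count. The sketch below uses the conventional Dhrystone reference of 1757 Dhrystones per second for 1 DMIPS (the VAX 11/780 score); the function name and its parameters are hypothetical and are not taken from the thesis.

```python
# Conventional reference: 1757 Dhrystones/second is defined as 1 DMIPS.
VAX_DHRYSTONES_PER_SECOND = 1757


def dmips_per_mhz(runs, cycles):
    """DMIPS/MHz from the number of Dhrystone iterations and the clock cycles they took.

    Dhrystones/second = runs * f_hz / cycles, and DMIPS = that / 1757.
    Dividing by the clock in MHz (f_hz / 1e6) cancels f_hz, so the
    metric depends only on iterations per cycle.
    """
    if runs <= 0 or cycles <= 0:
        raise ValueError("runs and cycles must be positive")
    return runs * 1_000_000 / (cycles * VAX_DHRYSTONES_PER_SECOND)


if __name__ == "__main__":
    # Hypothetical measurement: 2000 iterations in 1,000,000 cycles.
    print(f"{dmips_per_mhz(2000, 1_000_000):.3f} DMIPS/MHz")
```

Because the clock frequency cancels, the figure can be taken from a cycle-accurate simulation before a target frequency is known.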

47

Gribble, Donald L. "A new RISC architecture for high speed data acquisition." Thesis, 1991. http://hdl.handle.net/1957/37001.

Abstract:

This thesis describes the design of a RISC architecture for high speed data acquisition. The structure of existing data acquisition systems is first examined. An instruction set is created to allow the data acquisition system to serve a wide variety of applications. The architecture is designed to allow the execution of an instruction each clock cycle. The utility of the RISC system is illustrated by implementing several representative applications. Performance of the system is analyzed and future enhancements discussed.
Graduation date: 1992

48

Syu, Dong-Fong, and 許東豐. "The Design and Implementation of a Soft-Core Processor based on the MicroBlaze Instruction Set Architecture." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/f85auq.

Abstract:

Master's
National Chiao Tung University
Institute of Computer Science and Engineering
107
This research aims to design and implement a 32-bit soft-core RISC processor based on the Xilinx MicroBlaze Instruction Set Architecture (ISA). MicroBlaze is a 32-bit RISC soft-core processor that provides a series of configurable features, complete software toolchain support, and flexible interfaces for communicating with peripherals, memory, and other IPs. As a result, much application-level research has been conducted on the MicroBlaze processor. However, the synthesizable RTL model of the MicroBlaze processor is not in the public domain. This proprietary nature prevents developers from gaining deeper insight into their designs. At present, only a few research projects, such as OpenFire, MB-Lite, and SecretBlaze, have investigated and implemented the MicroBlaze processor's microarchitecture. Although these are open-source projects, the proposed synthesizable processors are not well tested with complex software systems, and their technical documentation is not thorough enough for them to serve as practical replacements for the MicroBlaze processor from Xilinx. Therefore, in this research, we focus on the design and implementation of a 32-bit soft-core processor, KernelBlaze, based on the MicroBlaze ISA and the DLX architecture. We use Dhrystone, an integer benchmark program running on bare-metal systems, to evaluate the performance (DMIPS/MHz) of the proposed KernelBlaze processor. In addition, a FreeRTOS application is used to verify the correctness of the KernelBlaze processor running on the Xilinx KC705 development board. Finally, we compare the design trade-offs of KernelBlaze with those of the OpenFire processor in terms of DMIPS/MHz, resource utilization, and maximum synthesizable circuit frequency.

49

Hsu, Chung-yang, and 徐昌陽. "The Design and Verification of an ARM v4 Instruction Set Architecture Compatible Memory-Management-Unit IP." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/60592735992821194446.

Abstract:

Master's
National Taiwan University of Science and Technology
Department of Electronic Engineering
96
An MMU, sometimes called a "paged memory management unit" (PMMU), is a computer hardware component responsible for handling memory accesses requested by the CPU. Its functions include address translation, access permission checks for instruction and data addresses, and memory sharing. By translating virtual addresses to physical addresses, it gives the operating system hardware support for managing virtual memory. In this thesis, an MMU IP (Intellectual Property) compatible with the ARM v4 architecture is proposed. The MMU consists of an FSM (Finite State Machine) control unit, a TLB (Translation Look-aside Buffer), a calculation and protection module, and an AMBA (Advanced Microcontroller Bus Architecture) interface for reading the translation table in main memory. Proto-ARM922, which combines Proto-ARM9M with a cache, a system coprocessor, the MMU, and an AMBA interface, has been implemented and verified with a Xilinx Spartan-3 XC3S1500-4FG676 FPGA and the TSMC 0.18 μm cell library. In the FPGA implementation, it takes 21211 LUTs and operates at a maximum working frequency of 11 MHz. Furthermore, all of the test programs run successfully on the FPGA development board. In the cell-based implementation, the core occupies 3183.26 μm × 3423.08 μm, which is approximately equivalent to 481533 gates, and the whole chip occupies 4088 μm × 4081 μm. In the SS (Slow NMOS Slow PMOS model) simulation condition, it operates at a maximum working frequency of 40 MHz and consumes about 167.1 mW.
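The translation path described above (TLB lookup, then a table walk in main memory on a miss) can be sketched in a few lines. This is a simplified model, not the thesis's MMU: it handles only 1 MB section mappings in the style of the ARM v4 first-level table, omits permission and domain checks, and the `SimpleMMU` class and dictionary-based memory are hypothetical. The field positions (table index from address bits 31:20, descriptor type in bits 1:0) are stated as assumptions and should be checked against the ARM Architecture Reference Manual.

```python
SECTION_SHIFT = 20                          # 1 MB sections (assumed ARM v4 style)
SECTION_MASK = (1 << SECTION_SHIFT) - 1     # offset within a section
SECTION_DESCRIPTOR = 0b10                   # assumed type code in descriptor bits 1:0


class TranslationFault(Exception):
    """Raised when the table holds no section descriptor for the address."""


class SimpleMMU:
    def __init__(self, table_base, memory):
        self.table_base = table_base & ~0x3FFF   # table assumed 16 KB aligned
        self.memory = memory                     # hypothetical memory: address -> 32-bit word
        self.tlb = {}                            # section index -> physical section base
        self.walks = 0                           # table walks performed (TLB misses)

    def translate(self, va):
        index = va >> SECTION_SHIFT
        base = self.tlb.get(index)
        if base is None:
            # TLB miss: fetch the first-level descriptor from the table in memory.
            self.walks += 1
            descriptor = self.memory.get(self.table_base + (index << 2), 0)
            if descriptor & 0b11 != SECTION_DESCRIPTOR:
                raise TranslationFault(hex(va))
            base = descriptor & ~SECTION_MASK
            self.tlb[index] = base               # cache the translation
        return base | (va & SECTION_MASK)
```

A second access to the same section is served from the `tlb` dictionary without touching memory, which is the reason a hardware TLB sits in front of the table walk.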

50

"Very large register file for BLAS-3 operations." Chinese University of Hong Kong, 1995. http://library.cuhk.edu.hk/record=b5888541.

Abstract:

by Aylwin Chung-Fai, Yu.
Thesis (M.Phil.)--Chinese University of Hong Kong, 1995.
Includes bibliographical references (leaves 117-118).
Abstract
Acknowledgement
List of Tables
List of Figures
Chapter 1 --- Introduction
Chapter 1.1 --- BLAS-3 Operations
Chapter 1.2 --- Organization of Thesis
Chapter 1.3 --- Contribution
Chapter 2 --- Background Studies
Chapter 2.1 --- Registers & Cache Memory
Chapter 2.2 --- Previous Research
Chapter 2.3 --- Problem of Register & Cache
Chapter 2.4 --- BLAS-3 Operations On RISC Microprocessor
Chapter 3 --- Compiler Optimization Techniques for BLAS-3 Operations
Chapter 3.1 --- One-Dimensional Q-Way J-Loop Unrolling
Chapter 3.2 --- Two-Dimensional P×Q -Ways I×J-Loops Unrolling
Chapter 3.3 --- Addition of Code to Remove Redundant Code
Chapter 3.4 --- Simulation Result
Chapter 3.5 --- Summary
Chapter 4 --- Architectural Model of Very Large Register File
Chapter 4.1 --- Architectural Model
Chapter 4.2 --- Traditional Register File vs. Very Large Register File
Chapter 5 --- Ideal Case Study of Very Large Register File
Chapter 5.1 --- Matrix Multiply
Chapter 5.2 --- LU Decomposition
Chapter 5.3 --- Convolution
Chapter 6 --- Worst Case Study of Very Large Register File
Chapter 6.1 --- Matrix Multiply
Chapter 6.2 --- LU Decomposition
Chapter 6.3 --- Convolution
Chapter 7 --- Proposed Case Study of Very Large Register File
Chapter 7.1 --- Matrix Multiply
Chapter 7.2 --- LU Decomposition
Chapter 7.3 --- Convolution
Chapter 7.4 --- Comparison
Chapter 8 --- Conclusion & Future Work
Chapter 8.1 --- Summary
Chapter 8.2 --- Future Work
Bibliography
