Dissertations / Theses on the topic 'Instruction set architecture'
Consult the top 50 dissertations / theses for your research on the topic 'Instruction set architecture.'
Zmily, Ahmad Darweesh. "Block-aware instruction set architecture /." May be available electronically:, 2007. http://proquest.umi.com/login?COPT=REJTPTU1MTUmSU5UPTAmVkVSPTI=&clientId=12498.
Schoepke, Olaf S. "Dense instruction set computer architecture." Thesis, University of Bath, 1992. https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.332540.
Glökler, Tilman Meyr Heinrich. "Design of energy-efficient application-specific instruction set processors /." Boston, Mass. [u.a.] : Kluwer Acad. Publ, 2004. http://www.loc.gov/catdir/enhancements/fy0820/2004041376-d.html.
Wagstaff, Harry. "From high level architecture descriptions to fast instruction set simulators." Thesis, University of Edinburgh, 2015. http://hdl.handle.net/1842/14162.
Bennett, Richard Vincent. "Increasing the efficacy of automated instruction set extension." Thesis, University of Edinburgh, 2011. http://hdl.handle.net/1842/5789.
Ponnala, Kalyan. "DESIGN AND IMPLEMENTATION OF THE INSTRUCTION SET ARCHITECTURE FOR DATA LARS." UKnowledge, 2010. http://uknowledge.uky.edu/gradschool_theses/58.
Curtis, Bryce Allen. "A special instruction set multiple chip computer for DSP : architecture and compiler design." Diss., Georgia Institute of Technology, 1992. http://hdl.handle.net/1853/15736.
Mapes, Glenn. "An instruction set simulator for the 8086 16-bit microprocessor." Virtual Press, 1985. http://liblink.bsu.edu/uhtbin/catkey/416976.
Degenbaev, Ulan [Verfasser], and Wolfgang J. [Akademischer Betreuer] Paul. "Formal specification of the x86 instruction set architecture / Ulan Degenbaev. Betreuer: Wolfgang J. Paul." Saarbrücken : Saarländische Universitäts- und Landesbibliothek, 2012. http://d-nb.info/105227885X/34.
Bauer, Heiner. "Dynamic instruction set extension of microprocessors with embedded FPGAs." Master's thesis, Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2017. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-222858.
Increasingly complex applications and the particularities of modern semiconductor technologies have created strong demand for microprocessors that are both powerful and highly energy-efficient. Conventional architectures try to raise instruction throughput through parallelization and provide application-specific instruction sets or hardware accelerators to improve energy efficiency. Reconfigurable processors enable similar performance gains while offering the major advantage that specialization for a particular application can take place after manufacturing. This diploma thesis investigated a reconfigurable microprocessor with a tightly coupled FPGA. In contrast to earlier research approaches, an extensive design space exploration of the FPGA architecture was carried out in the context of a commercial 22 nm manufacturing process. Until now, most research projects either used commercial architectures that are not necessarily tailored to this use case, or the proposed FPGA components were insufficiently investigated and characterized. Yet precisely this building block is decisive for the performance of the overall system. Therefore, more than 200 different logical FPGA architectures were examined in this work. For modeling, concrete circuit topologies and a layout area estimation model tailored to the manufacturing process were used. In general, the same trends were observed as in previous, similarly extensive studies. Here too, the results were largely determined by the size of the LUTs (lookup tables) and the structure of the routing network. At the same time, a much broader range of architectures with nearly equal efficiency was identified.
For further evaluation, an FPGA architecture with 5-LUTs and 8 logic elements was selected. The performance of the selected microprocessor, which builds on a proven instruction set architecture, was estimated using results from a 28 nm test chip. A modified collection of academic software tools was used to map custom instructions onto the modeled FPGA architecture and to generate a netlist for subsequent simulation and verification. For a range of different application benchmarks, a relative performance increase of between 3 and 15 over the original processor was determined. Although the proposed FPGA architecture is comparatively primitive and has no arithmetic extensions whatsoever, with one exception no disproportionate increase in chip area had to be accepted. The insights gained into the dependencies between the architecture parameters, the exploration flow that was developed, and the concrete cost model are essential for further improvements of the FPGA architecture. This work has thus successfully demonstrated the advantage of the investigated system architecture and paved the way for possible extensions and hardware implementations. In addition, a number of architecture optimizations and further potential research approaches were identified.
Moustakas, Evangelos. "Design and simulation of a primitive RISC architecture using VHDL /." Online version of thesis, 1991. http://hdl.handle.net/1850/11229.
Chen, Wan-Fu. "A high speed 16-bit RISC processor chip /." Online version of thesis, 1994. http://hdl.handle.net/1850/11754.
Ong, Jia Jan. "Hardware realization of Discrete Wavelet Transform Cauchy Reed Solomon Minimal Instruction Set Computer architecture for Wireless Visual Sensor Networks." Thesis, University of Nottingham, 2016. http://eprints.nottingham.ac.uk/32583/.
Yuan, Fangfang. "Assessing the impact of processor design decisions on simulation based verification complexity using formal modeling with experiments at instruction set architecture level." Thesis, University of Bristol, 2012. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.566838.
Varanasi, Archana. "Course grained low power design flow using UPF /." Online version of thesis, 2009. http://hdl.handle.net/1850/11768.
Aminot, Alexandre. "Placement de tâches dynamique et flexible sur processeur multicoeur asymétrique en fonctionnalités." Thesis, Université Grenoble Alpes (ComUE), 2015. http://www.theses.fr/2015GREAM047/document.
To meet the increasingly heterogeneous needs of applications (in terms of power and efficiency), this thesis focuses on emerging functionally asymmetric multi-core processor (FAMP) architectures. These architectures are characterized by a non-uniform implementation of hardware extensions across the cores (e.g., the Floating Point Unit (FPU)). The area savings are apparent, but what about the impact on software, energy, and performance? To answer these questions, the thesis investigates how extensions are used in state-of-the-art applications and compares various existing methods. To optimize task mapping and increase efficiency, the thesis proposes a dynamic solution at the scheduler level, called the relaxed scheduler. Hardware extensions are valuable because they speed up parts of the code where parallelization across multiple cores is not efficient. However, hardware extensions are under-exploited by applications, and their cost in terms of area and power consumption is significant. Based on these observations, the following contributions are proposed. We present a detailed study of the use of vector and FPU extensions in state-of-the-art applications. We compare multiple extension management solutions at different levels of temporal granularity of action, to understand the limitations of these solutions and thus define at which level we must act. Few studies address the issue of the granularity of action for managing extensions. We offer a solution for estimating online the performance degradation incurred by running a task on a core without a given extension. To allow multi-core systems to scale, the operating system must have flexibility in the placement of tasks. Placing a task on a core without an extension can have significant consequences for energy and performance, but to date there is no way to estimate this potential degradation. We offer a relaxed scheduler, based on our degradation estimation model, which maps tasks onto a set of heterogeneous cores effectively.
We study the flexibility gained and the implications for performance and energy. Existing solutions propose methods to map tasks onto a heterogeneous set of cores, but they do not study the tradeoff between quality of service and the gain in energy consumption for FAMP architectures. Our experiments with simulators have shown that the scheduler can achieve significantly higher mapping flexibility with a performance degradation of less than 2%. Compared to a symmetric multi-core, our solution enables an average energy gain at core level of 11%. These results are very encouraging and contribute to the development of a comprehensive FAMP platform. This thesis has been the subject of a patent application and three international scientific communications (plus one submission), and contributes to two active European projects.
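The idea of a relaxed scheduler described in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: the `Core` and `Task` types, the per-task `est_degradation` value, and the `max_degradation` threshold rule are hypothetical and do not reproduce the thesis's estimation model or its actual placement policy.

```python
from dataclasses import dataclass


@dataclass
class Core:
    name: str
    has_fpu: bool
    load: int = 0  # number of tasks currently mapped to this core


@dataclass
class Task:
    name: str
    # Hypothetical estimate of the slowdown when run without the FPU (0.10 = 10 %).
    est_degradation: float


def place(task, cores, max_degradation):
    """Pick the least-loaded core the task is allowed to run on.

    Cores without the extension are eligible only when the estimated
    degradation stays within `max_degradation` (an assumed policy).
    """
    allowed = [
        c for c in cores
        if c.has_fpu or task.est_degradation <= max_degradation
    ]
    if not allowed:
        raise ValueError("no eligible core for task " + task.name)
    chosen = min(allowed, key=lambda c: c.load)
    chosen.load += 1
    return chosen


# One busy core with an FPU and one idle core without it.
cores = [Core("big0", has_fpu=True, load=2), Core("little0", has_fpu=False)]
int_task = Task("parser", est_degradation=0.01)  # barely uses the FPU
fp_task = Task("fft", est_degradation=0.60)      # FPU-heavy
```

With a threshold of 0.05, `place(int_task, cores, 0.05)` may use the idle core without an FPU, while `place(fp_task, cores, 0.05)` is restricted to the FPU core even though it is busier.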
Wright, Stephen. "Formal construction of Instruction Set Architectures." Thesis, University of Bristol, 2009. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.508307.
Hedayati, Mohammad Hadi. "Visualization of microprocessor execution in computer architecture courses: a case study at Kabul University." Thesis, University of the Western Cape, 2010. http://etd.uwc.ac.za/index.php?module=etd&action=viewtitle&id=gen8Srv25Nme4_4960_1362394106.
Computer architecture and assembly language programming, which cover microprocessor execution, are basic courses taught in every computer science department. Generally, however, students have difficulties in mastering many of the concepts in these courses, particularly students whose first language is not English. In addition to their difficulties in understanding the purpose of given instructions, students struggle to mentally visualize the data movement, control, and processing operations. To address this problem, this research proposed a graphical visualization approach and investigated visual illustrations of such concepts and of instruction execution by implementing a graphical visualization simulator as a teaching aid. The graphical simulator developed during the course of this research was applied in a computer architecture course at Kabul University, Afghanistan. Results obtained from student evaluation of the simulator show significant levels of success using the visual simulation teaching aid. The results showed that improved learning was achieved, suggesting that this approach could be useful in other computer science departments in Afghanistan, and elsewhere where similar challenges are experienced.
Ministr, Martin. "Virtuální platformy pro simulaci instrukčních sad." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2014. http://www.nusl.cz/ntk/nusl-235424.
Tell, Eric. "Design of Programmable Baseband Processors." Doctoral thesis, Linköping : Univ, 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-4377.
Moreira, João Carlos Peralta. "An instruction set simulator for VLIW DSP architectures." Master's thesis, Universidade de Aveiro, 2015. http://hdl.handle.net/10773/18675.
Dissertation presented to the Universidade de Aveiro in fulfillment of the requirements for the degree of Master in Electronics and Telecommunications Engineering, carried out under the scientific supervision of Professor Manuel Bernardo Salvador Cunha, PhD, Assistant Professor (Professor Auxiliar) at the Department of Electronics, Telecommunications and Informatics of the Universidade de Aveiro, and Dr. Mohamed Bamakhrama, Hardware Tools Engineer in the Processor and Compiler Tools team of Intel's Imaging and Camera Technologies group, Eindhoven, the Netherlands.
Barták, Jiří. "Model procesoru RISC-V." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2016. http://www.nusl.cz/ntk/nusl-255393.
Musasa, Mutombo Mike. "Evaluation of embedded processors for next generation asic : Evaluation of open source Risc-V processors and tools ability to perform packet processing operations compared to Arm Cortex M7 processors." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-299656.
Network processors are an important building block of information technology today. As 5G networks are rolled out around the world, many more devices will be able to benefit from their powerful performance and programming flexibility. Information technology companies such as Ericsson spend considerable financial resources on licenses to use processors based on proprietary instruction set architecture technology from ARM Holdings. Continuing to buy licenses is very costly, since these architectures are a building block in the design of many processors and other components. Today there is a promising new processor instruction set architecture technology that is not licensed, called RISC-V. Thanks to RISC-V, many proprietary and open-source processors have been developed. However, very little is known today about how well they perform in network applications. Can an open-source RISC-V processor perform network data processing functions as well as a proprietary Arm Cortex M7 processor? The main purpose of this work is to build a test model that examines how well an open-source RISC-V based processor performs processing operations on network data packets compared with an Arm Cortex M7 processor. This was done by developing C code that simulates the reception and processing of 72-byte data packets. The following functions were tested: framing, parsing, pattern matching, and classification. The code was compiled and tested on both an Arm Cortex M7 processor and three different emulated open-source RISC-V processors: Arianne, SweRV core, and Rocket-chip. After testing several open-source RISC-V processors and running the test code on an Arm Cortex M7 processor, it can be argued that the open-source RISC-V processor tools are not yet reliable enough. This report indicates that open-source RISC-V emulators and tools need further development before they can be used in network applications.
There is a need for further investigation of this topic in the future. For example, a deeper investigation of the SweRV core processor, or hardware built on an open-source RISC-V design, is required.
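To make the parsing, pattern matching, and classification steps named in the abstract above more concrete, here is a minimal sketch. The thesis used C code whose packet format is not given in the abstract, so everything specific here is an assumption: Ethernet II framing, classification by EtherType, and the names `parse_ethernet`, `classify`, and `ETHERTYPES`. Only the 72-byte packet size comes from the abstract.

```python
import struct

PACKET_LEN = 72  # packet size mentioned in the abstract

# Assumed classification table: a few well-known EtherType values.
ETHERTYPES = {0x0800: "ipv4", 0x86DD: "ipv6", 0x0806: "arp"}


def parse_ethernet(packet: bytes):
    """Split a frame into destination MAC, source MAC, EtherType and payload."""
    if len(packet) != PACKET_LEN:
        raise ValueError("expected a 72-byte packet")
    # Network byte order: two 6-byte addresses followed by a 16-bit type field.
    dst, src, ethertype = struct.unpack("!6s6sH", packet[:14])
    return dst, src, ethertype, packet[14:]


def classify(packet: bytes, pattern: bytes):
    """Return (class name, whether `pattern` occurs in the payload)."""
    _, _, ethertype, payload = parse_ethernet(packet)
    return ETHERTYPES.get(ethertype, "other"), pattern in payload


# Synthetic frame: broadcast destination, made-up source, IPv4 EtherType.
header = b"\xff" * 6 + b"\x02\x00\x00\x00\x00\x01" + struct.pack("!H", 0x0800)
frame = header + b"hello".ljust(PACKET_LEN - len(header), b"\x00")
```

Calling `classify(frame, b"hello")` classifies the synthetic frame as IPv4 and reports that the pattern is present in its payload.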
Saghir, Mazen A. R. "Application-specific instruction-set architectures for embedded DSP applications." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1998. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape11/PQDD_0021/NQ53899.pdf.
Pajak, Dominic. "Specification of microprocessor instruction set architectures : ARM case study." Thesis, University of Leeds, 2005. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.422038.
Bocco, Andrea. "A variable precision hardware acceleration for scientific computing." Thesis, Lyon, 2020. http://www.theses.fr/2020LYSEI065.
Most Floating-Point (FP) hardware units support the formats and operations specified in the IEEE 754 standard. These formats have a fixed bit length and are defined on 16, 32, 64, and 128 bits. However, some applications, such as linear system solvers and computational geometry, benefit from different formats that can express FP numbers in different sizes and with different tradeoffs between the exponent and mantissa fields. The class of Variable Precision (VP) formats meets these requirements. This research proposes a VP FP computing system based on three computation layers. The external layer supports legacy IEEE formats for input and output variables. The internal layer uses variable-length internal registers for inner-loop multiply-add. Finally, an intermediate layer supports loads and stores of intermediate results to cache memory without losing precision, with a dynamically adjustable VP format. The VP unit exploits the UNUM type I FP format and proposes solutions to address some of its pitfalls, such as the variable latency of the internal operation and the variable memory footprint of the intermediate variables. Unlike IEEE 754, in UNUM type I the size of a number is stored within its representation. The unit implements a fully pipelined architecture, and it supports up to 512 bits of precision, internally and in memory, for both interval and scalar computing. The user can configure the storage format and the internal computing precision at 8-bit and 64-bit granularity. This system is integrated as a RISC-V coprocessor. The system has been prototyped on an FPGA (Field-Programmable Gate Array) platform and also synthesized for a 28 nm FDSOI process technology. The respective working frequencies of the FPGA and ASIC implementations are 50 MHz and 600 MHz. Synthesis results show that the estimated chip area is 1.5 mm² and the estimated power consumption is 95 mW.
The experiments emulated in an FPGA environment show that the latency and the computation accuracy of this system scale linearly with the memory format length set by the user. In cases where legacy IEEE 754 formats do not converge, this architecture can achieve up to 130 decimal digits of precision, increasing the chances of obtaining output data with an accuracy similar to that of the input data. This high accuracy opens the possibility of using direct methods, which are more sensitive to computational error, instead of iterative methods, which always converge. Iterative methods, however, have a latency ten times higher than direct ones. Compared to low-precision FP formats, the use of high-precision VP formats in iterative methods drastically reduces the number of iterations the algorithm needs to converge, reducing the application latency by up to 50%. Compared with the MPFR software library, the proposed unit achieves speedups between 3.5x and 18x, with comparable accuracy.
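Why extra precision matters can be shown with a small software analogy. The sketch below uses Python's standard `decimal` module, not the UNUM format or the hardware unit described in the abstract above; the `cancel` helper is a hypothetical name, and only the figure of 130 decimal digits is borrowed from the abstract.

```python
from decimal import Decimal, getcontext


def cancel(one, big):
    """Compute (big + one) - big, which equals `one` in exact arithmetic."""
    return (big + one) - big


# IEEE 754 binary64: 1e20 + 1 rounds back to 1e20, so the added 1 is lost.
float_result = cancel(1.0, 1e20)

# Software arithmetic with 130 significant decimal digits keeps every digit.
getcontext().prec = 130
decimal_result = cancel(Decimal(1), Decimal(10) ** 20)
```

Here `float_result` is `0.0` because of cancellation in double precision, while `decimal_result` equals 1. Linear solvers accumulate many such small losses, which is one reason a wider format can help a computation converge.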
Fick, David. "A virtual machine framework for domain-specific languages." Diss., Pretoria : [S.n.], 2007. http://upetd.up.ac.za/thesis/available/etd-10192007-163559/.
Williams, Fleur Liane. "The impact of instruction set orthogonality on compiler code generation." Thesis, University of Hertfordshire, 1989. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.252688.
Shee, Seng Lin Computer Science & Engineering Faculty of Engineering UNSW. "ADAPT : architectural and design exploration for application specific instruction-set processor technologies." Awarded by:University of New South Wales, 2007. http://handle.unsw.edu.au/1959.4/35404.
Dolíhal, Luděk. "Testování generovaných překladačů jazyka c pro procesory ve vestavěných systémech." Doctoral thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2017. http://www.nusl.cz/ntk/nusl-412583.
Husár, Adam. "Implementace obecného assembleru." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2007. http://www.nusl.cz/ntk/nusl-412779.
Grad, Mariusz [Verfasser], and Marco [Akademischer Betreuer] Platzner. "Just-in-time processor customization on the feasibility and limitations of FPGA-based dynamically reconfigurable instruction set architectures / Mariusz Grad. Betreuer: Marco Platzner." Paderborn : Universitätsbibliothek, 2011. http://d-nb.info/1036423565/34.
Schwarz, Oliver. "No Hypervisor Is an Island : System-wide Isolation Guarantees for Low Level Code." Doctoral thesis, KTH, Teoretisk datalogi, TCS, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-192466.
In the past, malicious software was mostly written by curious teenagers. Today our computers are under constant threat from government organizations, criminal groups, and perhaps even our business competitors. Some possess great expertise and can carry out focused attacks. At the same time, the technology around us (such as mobile phones and television sets) has become more complex, more connected, and open to executing third-party software. Operating systems should really isolate sensitive data and critical services from untrusted software. But their vulnerabilities usually make it possible for malicious software to get past the operating systems' security mechanisms. This has led to the development of complementary tools whose sole function is to improve the isolation of selected sensitive resources. Special virtualization software and separation kernels are examples of such tools. Because such solutions can be developed with a relatively small code base, it is possible to analyze them thoroughly, both manually and automatically. In some cases, formal methods are used to generate mathematical proofs that the system is secure. The isolation software itself is usually extensively verified, sometimes even at the assembly level. However, the impact of other components on the system's security has so far received less attention, with regard to both hardware and other software. This thesis attempts to shed light on these aspects, mainly (i) unprivileged third-party code and how it can affect security, (ii) peripheral devices with direct memory access, and (iii) the boot code, as well as how isolation services can be activated and deactivated securely without rebooting the system. The thesis is based on six earlier publications that deal with both design and verification aspects, but mostly with the security analysis of instruction sets. Based on a theorem prover, we have developed various tools for the automatic information flow analysis of processors.
We have used these tools to clarify which registers unprivileged software has access to on ARM and MIPS machines. This analysis is guaranteed to be both correct and precise. To the best of our knowledge, we are the first to publish a solution for automatic analysis and proof of information flow properties in standard instruction sets.
PROSPER
HASPOC
"Reducing a complex instruction set computer." Chinese University of Hong Kong, 1988. http://library.cuhk.edu.hk/record=b5885967.
Su, Heng-I., and 蘇恆毅. "An Instruction Set Architecture Simulator for Embedded Processor Design." Thesis, 2002. http://ndltd.ncl.edu.tw/handle/05833976379712231803.
National Tsing Hua University
Department of Electrical Engineering
90
Evaluating the design of embedded processors at each level is an important issue, especially at the architecture level. Accurate evaluation at the architecture level is the key to improving system performance, but it is not easy to settle the complete design at that level. Designers need to spend a lot of time exploring different architectures based on the applications. Without an appropriate simulation tool for performance evaluation, exploring different processor architectures would be painful, if possible at all. An instruction set architecture simulator is a simulation tool that attempts to simplify this work. In this thesis, we propose an instruction-accurate and cycle-accurate instruction set architecture simulator for embedded processor design. It helps us describe different embedded processors easily and quickly, using a simple architecture description method that we developed. Based on the simulation results, it is easy to choose the highest-performance architecture with an acceptable area overhead. A debugging environment is also provided, which is important for application software development. It allows easy modification of the source code: if there are special opcodes that our simulator does not support, one can revise the source code within the proposed environment. In our experiments, we simulated and evaluated the performance of several processor architectures. Based on the results, we were able to modify the architectures to improve their performance. The performance improvement varies from 19% to 42% in these cases.
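As a rough illustration of what an instruction-accurate simulator with cycle counting involves, the sketch below interprets a tiny register machine. The ISA (`LI`, `ADD`, `SUBI`, `BNEZ`, `HALT`), the `CYCLES` cost table, and the `run` function are all hypothetical; they are not the architecture description method or the simulator from the thesis.

```python
# Hypothetical per-opcode cycle costs; a taken or untaken branch costs the same here.
CYCLES = {"LI": 1, "ADD": 1, "SUBI": 1, "BNEZ": 2, "HALT": 1}


def run(program, max_steps=10_000):
    """Execute `program` (a list of tuples) and return (registers, cycles)."""
    regs = {"r0": 0, "r1": 0, "r2": 0}
    pc = 0
    cycles = 0
    for _ in range(max_steps):
        op, *args = program[pc]
        if op not in CYCLES:
            raise ValueError(f"unsupported opcode: {op}")
        cycles += CYCLES[op]
        if op == "LI":        # load immediate: rd <- imm
            regs[args[0]] = args[1]
        elif op == "ADD":     # rd <- rs1 + rs2
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "SUBI":    # rd <- rs - imm
            regs[args[0]] = regs[args[1]] - args[2]
        elif op == "BNEZ":    # jump to an absolute index if rs != 0
            if regs[args[0]] != 0:
                pc = args[1]
                continue
        elif op == "HALT":
            return regs, cycles
        pc += 1
    raise RuntimeError("step limit reached without HALT")


# Sum 3 + 2 + 1 into r0 using a countdown loop in r1.
PROGRAM = [
    ("LI", "r0", 0),
    ("LI", "r1", 3),
    ("ADD", "r0", "r0", "r1"),  # loop body starts at index 2
    ("SUBI", "r1", "r1", 1),
    ("BNEZ", "r1", 2),
    ("HALT",),
]
```

Changing an entry in `CYCLES`, or adding an opcode to the table and the dispatch chain, is the kind of quick architectural variation such a tool is meant to make cheap to evaluate.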
Hu, Ya-Lun, and 胡亞倫. "Design and Evaluation of Advanced RISC Instruction Set Architecture." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/03080070015891954595.
Full text國立中正大學
資訊工程所
94
In the embedded domain, performance and power consumption are usually the design constraints, and a good instruction set architecture plays a key role in meeting them. A successful embedded processor must be accompanied by an excellent instruction set, as is the case for ARM, the most popular processor in the embedded domain. In this paper we propose a sub-computing instruction, a load and store mask instruction, a prefetch instruction, and a repeat instruction to improve performance. We also propose compression instructions to improve code density. In addition, we develop an instruction-level, cycle-accurate C simulator for evaluating and refining our design. Finally, we compare our design with ARM using the MiBench benchmark suite.
Chen, Jeng-Hung, and 陳政宏. "ARM Cortex-A8 NEON Instruction Set and Architecture Study." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/93pyw4.
Full text國立臺灣科技大學
電子工程系
99
ARM processors are advancing steadily in the market and have undoubtedly become the most popular and most important processors in embedded systems. More and more chip makers adopt an ARM processor core and integrate hardware accelerators, DSPs, and other peripherals according to their individual needs in order to differentiate their products. ARM-core SoCs have become the mainstream standard for SoCs. ARM Cortex is the newest ARM generation, replacing the earlier ARM7, ARM9, and ARM11 of the V4/V5/V6 architectures. It comes with high-efficiency, low-power processors in the A, R, and M profiles, which cover the application needs of all kinds of systems. Multimedia is widely used in embedded environments, but the computational complexity of multimedia is still a heavy burden for the processor, so achieving high performance with low power consumption is an important topic. Since the A8 and A9, the ARM A profile has provided the NEON SIMD instruction set to support portable, low-power multimedia software. In this thesis we study the new technology supported by the ARM Cortex-A8 core: NEON SIMD (Single Instruction, Multiple Data). We find ways to improve performance through compiler optimization options and program structure. The analysis shows that our method can speed up multimedia programs by a factor of about 2 to 4 by applying NEON SIMD.
Chiu, Tai-En, and 邱泰恩. "An extensible instruction set architecture design and its toolchain implementation." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/31263591761985977538.
Full text國立成功大學
電腦與通信工程研究所
96
The design methodology of embedded processors can adapt to the design flow of an Application-Specific Instruction-Set Processor (ASIP) to perform various types of operations more efficiently. In this thesis, we present the design of an extensible instruction set architecture (ISA) for ASIP systems. By removing the less frequently used functionality of the ARMv4 ISA and rearranging its binary encoding, we obtain an extended instruction encoding space. Special-purpose instructions can be added to this extended space without any constraint. To use this extensible ISA, we also implement the corresponding software toolchain, which includes an assembler, a linker, and some basic libraries. To verify the software toolchain, we modify our RISC32 processor. We first use our toolchain to generate an executable binary image and then execute this image on an HDL simulation of our RISC32 processor. Finally, we compare the simulator's output with reference results to check correctness.
"Application-specific instruction set processor for speech recognition." 2005. http://library.cuhk.edu.hk/record=b5892381.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2005.
Includes bibliographical references (leaves 69-71).
Abstracts in English and Chinese.
Chapter 1 --- Introduction
Chapter 1.1 --- The Emergence of ASIP
Chapter 1.1.1 --- Related Work
Chapter 1.2 --- Motivation
Chapter 1.3 --- ASIP Design Methodologies
Chapter 1.4 --- Fundamentals of Speech Recognition
Chapter 1.5 --- Thesis outline
Chapter 2 --- Automatic Speech Recognition
Chapter 2.1 --- Overview of ASR system
Chapter 2.2 --- Theory of Front-end Feature Extraction
Chapter 2.3 --- Theory of HMM-based Speech Recognition
Chapter 2.3.1 --- Hidden Markov Model (HMM)
Chapter 2.3.2 --- The Typical Structure of the HMM
Chapter 2.3.3 --- Discrete HMMs and Continuous HMMs
Chapter 2.3.4 --- The Three Basic Problems for HMMs
Chapter 2.3.5 --- Probability Evaluation
Chapter 2.4 --- The Viterbi Search Engine
Chapter 2.5 --- Isolated Word Recognition (IWR)
Chapter 3 --- Design of ASIP Platform
Chapter 3.1 --- Instruction Fetch
Chapter 3.2 --- Instruction Decode
Chapter 3.3 --- Datapath
Chapter 3.4 --- Register File Systems
Chapter 3.4.1 --- Memory Hierarchy
Chapter 3.4.2 --- Register File Organization
Chapter 3.4.3 --- Special Registers
Chapter 3.4.4 --- Address Generation
Chapter 3.4.5 --- Load and Store
Chapter 4 --- Implementation of Speech Recognition on ASIP
Chapter 4.1 --- Hardware Architecture Exploration
Chapter 4.1.1 --- Floating Point and Fixed Point
Chapter 4.1.2 --- Multiplication and Accumulation
Chapter 4.1.3 --- Pipelining
Chapter 4.1.4 --- Memory Architecture
Chapter 4.1.5 --- Saturation Logic
Chapter 4.1.6 --- Specialized Addressing Modes
Chapter 4.1.7 --- Repetitive Operation
Chapter 4.2 --- Software Algorithm Implementation
Chapter 4.2.1 --- Implementation Using Base Instruction Set
Chapter 4.2.2 --- Implementation Using Refined Instruction Set
Chapter 5 --- Simulation Results
Chapter 6 --- Conclusions and Future Work
Appendices
Chapter A --- Base Instruction Set
Chapter B --- Special Registers
Chapter C --- Chip Microphotograph of ASIP
Chapter D --- The Testing Board of ASIP
Bibliography
Wang, Albert, and 王伯文. "Improving instruction set design of embedded microcontroller architecture based on Transport-Triggered Architecture and VLIW." Thesis, 2005. http://ndltd.ncl.edu.tw/handle/74214506110234397499.
Full text國立臺灣科技大學
電子工程系
93
In this paper, we propose a new instruction set design concept based on Very Long Instruction Word (VLIW) and Transport Triggered Architecture (TTA). VLIW offers a high degree of parallelism and is easy to implement in hardware, but it suffers from poor code density and poor binary compatibility. Unlike traditional architectures, TTA performs operations through data movement. Because the only operation is a move, TTA implementations are simpler than those of other architectures and easy to extend for specific applications. But TTA has the same disadvantages as VLIW. We analyze and propose improvements for VLIW and TTA in two respects. For the disadvantages of VLIW, we propose an instruction tag to improve flexibility and binary compatibility. For TTA, we propose a multiple-source instruction format that lies between TTA and the traditional RISC architecture to address code density. We then present an instruction set implementation that combines the two concepts.
Fang, Jhih-Jhong, and 方志中. "The Design and Verification of an ARMv4T Instruction Set Architecture Compatible Microprocessor IP." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/82114362652112642892.
National Taiwan University of Science and Technology
Department of Electronic Engineering
97
In this thesis, an ARMv4T instruction set architecture compatible microprocessor IP (Intellectual Property), Proto3-ARM9TM, is proposed. To improve on the performance of the Proto-ARM9M processor [25], we redesign the architecture of the processor and its major modules. In the processor design, the number of pipeline stages is increased from 5 to 9, the register file is built from SRAMs instead of D flip-flops, the multiply-accumulator uses a pipelined parallel multiplier instead of an iterative multiplier, and a barrel shifter replaces the DesignWare shifter block. We also employ the following mechanisms to reduce the effects of hazards: two groups of forwarding paths, two stages for exception detection, a 128×60 branch target cache, and a 2-bit branch prediction scheme. For the implemented instruction set, we also support both the coprocessor and the Thumb instruction sets. In addition, a coprocessor interface for the Proto3-ARM9TM processor is designed and implemented. Compared with the Proto-ARM9M processor, the operating frequency increases from 21 MHz to 45.3 MHz on the same FPGA platform, the IPC increases from 0.47 to 0.7 on the same set of test programs, and the performance increases by 221.58%. The Proto3-ARM9TM system, along with AMBA and its related peripherals, is implemented and verified on a Xilinx Spartan-3 XC3S1500-4FG676 FPGA and with the TSMC 0.18 μm cell library. When realized on the FPGA, the Proto3-ARM9TM system consumes 9728 LUTs and operates at a maximum frequency of 31 MHz. When realized with the cell library, the Proto3-ARM9TM has a core area of 1.400 × 1.393 mm² and a whole-chip area of 2.280 × 2.274 mm². The average power consumption is 83 mW at an operating frequency of 134 MHz.
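The reported performance gain can be roughly cross-checked from the frequency and IPC figures. The sketch below assumes that performance is proportional to clock frequency times IPC, which the abstract does not state explicitly; the function names are illustrative.

```python
def relative_performance(freq_mhz, ipc):
    """Instructions per microsecond: clock rate in MHz times IPC."""
    return freq_mhz * ipc


def percent_increase(old, new):
    """Relative gain of new over old, in percent."""
    return (new / old - 1.0) * 100.0


old_perf = relative_performance(21.0, 0.47)   # Proto-ARM9M figures
new_perf = relative_performance(45.3, 0.70)   # Proto3-ARM9TM figures
gain = percent_increase(old_perf, new_perf)
print(f"{gain:.1f}% increase")
```

With the rounded figures quoted in the abstract this gives a gain of about 221%, close to the reported 221.58%. The small gap presumably comes from rounding in the published IPC values.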
Chang, Tzu-Mu, and 張慈牧. "The Design and Implementation of a 6502 Instruction Set Architecture Compatible Microprocessor IP." Thesis, 2003. http://ndltd.ncl.edu.tw/handle/59486858963263400232.
National Taiwan University of Science and Technology
Department of Electronic Engineering
91
In this thesis, a 6502 instruction set architecture compatible microprocessor IP (Intellectual Property), Proto-6502, is proposed. Since the reusability of an IP depends on the completeness of its verification, a reconfigurable verification environment with automatic comparison is also proposed to ensure the consistency of every implementation stage and the completeness of the verification. Based on the behavior of every 6502 instruction, we design the ASM chart and the microoperations of Proto-6502. To balance speed and area, the microoperations and their corresponding datapath are adjusted according to the results of a delay-cost analysis. All of the behavioral-level and register-transfer-level (RTL) designs have been verified in the verification environment, and the behavior of all instructions is consistent with the 6502 microprocessor; in addition, the average statement coverage of every module in Proto-6502 is 94.4%. In the synthesis stage, a suitable state encoding style for Proto-6502's FSM is chosen according to area, speed, and power analyses. To reduce the power consumption of the circuit, we use clock gating to suppress unnecessary switching activity in the registers. The reduction in power consumption is 20.81%. Proto-6502 has been implemented and verified with a Xilinx Virtex 400 FPGA and the TSMC 0.25 μm cell library. In the FPGA part, it takes 2010 LUTs and operates at an internal working frequency of 12.9 MHz. In the cell-based part, the core occupies an area of 333 μm × 333 μm, which is approximately equivalent to 3700 gates, and consumes about 68 mW under typical operating conditions at an internal working frequency of 80 MHz.
Lin, Jin-Ho, and 林晉禾. "The Design and Verification of an ARM v4 Instruction Set Architecture Compatible Microprocessor IP." Thesis, 2005. http://ndltd.ncl.edu.tw/handle/35790359121880667696.
National Taiwan University of Science and Technology
Department of Electronic Engineering
93
In this thesis, an ARM v4 instruction set architecture compatible microprocessor IP (Intellectual Property), Proto-ARM9M, is proposed. Since the reusability of an IP depends on the completeness of its verification, we also develop a test environment to demonstrate IP accuracy at each step of the ASIC design flow: the behavioral-level, register-transfer-level, post-synthesis gate-level, and post-layout gate-level models. Based on the behavior of the ARM v4 instruction set architecture, we design the behavioral-level model of Proto-ARM9M and establish a test environment to verify it. Once the behavioral-level model is verified, we design the register-transfer-level model of Proto-ARM9M. A typical five-stage pipeline is used in the Proto-ARM9M datapath. The individual datapath modules, such as the instruction decoder, register file, shifter, arithmetic and logic unit, multiply-accumulator, and program status register, are designed carefully to improve performance and reduce area. The register-transfer-level simulation results in the testbench match those of the ARM instruction simulator in ADS (ARM Developer Suite), and the average code coverage of every module in Proto-ARM9M is 96.58%. Proto-ARM9M has been implemented and verified with a Xilinx Spartan-3 XC3S1500-4FG676 FPGA and the TSMC 0.35 μm cell library. In the FPGA part, it takes 9304 LUTs and operates at a maximum working frequency of 21 MHz. Furthermore, all of the test programs run successfully on the FPGA development board. In the cell-based part, the core occupies 3420.8 μm × 3212.5 μm, which is approximately equivalent to 55450 gates, and the whole chip occupies 5251.8 μm × 5087.4 μm. Proto-ARM9M consumes about 151.2 mW to 192.8 mW in the SS (Slow NMOS Slow PMOS model) simulation condition at a maximum working frequency of 33 MHz.
Hung, Yu-Ting, and 洪毓廷. "X86-64 Instruction Set Architecture Supports for an LLVM-Based Retargetable Hybrid Binary Translator." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/15113686460618532792.
National Chiao Tung University
Institute of Computer Science and Engineering
105
Hybrid binary translation (HBT) is a binary translation technology that combines static binary translation and dynamic binary translation. HBT-86 is an LLVM-based retargetable HBT system for the x86 instruction set architecture (ISA). In the previous HBT-86, the front end supports only the x86 integer, x87 floating-point, and part of the SSE SIMD integer instruction sets, and the back end supports the x86 and x64 target platforms. However, compared with a 32-bit ISA, a 64-bit ISA can address more memory and provides larger registers, so more and more applications have been released as 64-bit executables in recent years. In this thesis, we extend the previous HBT-86 to support the x64 source ISA. Moreover, to validate the retargetability of HBT-86, we extend it to support the ARM-64 target ISA. In x64-to-x64 emulation experiments, our HBT-86 is about 2.30 and 2.14 times faster than QEMU on the SPEC2006 CINT and SPEC2006 CFP benchmarks, respectively. In x64-to-ARM-64 emulation, our HBT-86 is about 3.68 and 9.27 times faster than QEMU on the SPEC2006 CINT and SPEC2006 CFP benchmarks, respectively.
江建德. "Efficient Two-Layered Cycle-Accurate Modeling Technique for Processor Family with Same Instruction Set Architecture." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/54733771635248826664.
Wu, Jin-You, and 巫謹佑. "The design and implementation of an SoC based on the RISC-V Instruction Set Architecture." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/qb64q4.
National Chiao Tung University
Institute of Computer Science and Engineering
107
The purpose of this research is to design and implement a system-on-chip (SoC) based on the RISC-V instruction set architecture (ISA). RISC-V is an open-source ISA based on the Reduced Instruction Set Computing (RISC) principle. Unlike most RISC ISAs, the RISC-V ISA is freely available for any purpose, allowing anyone to design, manufacture, and sell RISC-V chips and software. In recent years, RISC-V has risen rapidly, driven by the booming development of the Internet of Things (IoT) and by the licensing and patent constraints of commercial RISC ISAs. Given the fragmentation of IoT scenarios, there is high demand for low power, low cost, and customization, which are features of the RISC-V ISA. Therefore, in this research, we design and implement a RISC-V processor that supports RV32IM. After the design and implementation, we run the official RISC-V ISA tests and use the standard Dhrystone benchmark to evaluate the performance (DMIPS/MHz) of our proposed RISC-V processor in a full-system simulation environment. In addition, we compile the binary file with the open-source GNU compiler toolchain and verify the correctness of the proposed processor SoC running on the Xilinx KC705 FPGA development platform.
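As a reminder of how the DMIPS/MHz metric is conventionally derived, the following Python sketch divides the Dhrystone rate by 1757, the score traditionally attributed to the VAX 11/780 reference machine, and then by the clock frequency. The function name and the sample numbers are made up for illustration and are not results from the thesis.

```python
VAX_11_780_DHRYSTONES_PER_SEC = 1757  # conventional reference score for 1 DMIPS


def dmips_per_mhz(runs, elapsed_seconds, clock_mhz):
    """Convert a raw Dhrystone run into DMIPS/MHz.

    runs            -- number of Dhrystone iterations executed
    elapsed_seconds -- wall-clock time for those iterations
    clock_mhz       -- processor clock frequency in MHz
    """
    dhrystones_per_sec = runs / elapsed_seconds
    dmips = dhrystones_per_sec / VAX_11_780_DHRYSTONES_PER_SEC
    return dmips / clock_mhz


# Hypothetical run: 1,757,000 iterations in 10 s on a 100 MHz core.
score = dmips_per_mhz(1_757_000, 10.0, 100.0)
```

Normalizing by clock frequency makes the figure comparable across implementations that run at different speeds, such as a simulated core and an FPGA prototype.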
Gribble, Donald L. "A new RISC architecture for high speed data acquisition." Thesis, 1991. http://hdl.handle.net/1957/37001.
Graduation date: 1992
Syu, Dong-Fong, and 許東豐. "The Design and Implementation of a Soft-Core Processor based on the MicroBlaze Instruction Set Architecture." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/f85auq.
National Chiao Tung University
Institute of Computer Science and Engineering
107
This research aims to design and implement a 32-bit soft-core RISC processor based on the Xilinx MicroBlaze Instruction Set Architecture (ISA). MicroBlaze is a 32-bit RISC soft-core processor that provides a range of configurable features, complete software toolchain support, and flexible interfaces for communicating with peripherals, memory, and other IPs. As a result, much application-level research has been conducted on the MicroBlaze processor. However, the synthesizable RTL model of the MicroBlaze processor is not in the public domain, and this proprietary nature prevents developers from gaining deeper insight into their designs. At present, only a few research projects, such as OpenFire, MB-Lite, and SecretBlaze, have investigated and implemented the MicroBlaze processor's microarchitecture. Although these are open-source projects, the proposed synthesizable processors are not well tested with complex software systems, and their technical documentation is not thorough enough for them to serve as practical replacements for the Xilinx MicroBlaze processor. Therefore, in this research, we focus on the design and implementation of a 32-bit soft-core processor, KernelBlaze, based on the MicroBlaze ISA and the DLX architecture. We use Dhrystone, an integer benchmark program running on bare-metal systems, to evaluate the performance (DMIPS/MHz) of the proposed KernelBlaze processor. In addition, a FreeRTOS application is used to verify the correctness of the KernelBlaze processor running on the Xilinx KC705 development board. Finally, we compare the design trade-offs of KernelBlaze with those of the OpenFire processor in terms of DMIPS/MHz, resource utilization, and maximum synthesizable circuit frequency.
Hsu, Chung-yang, and 徐昌陽. "The Design and Verification of an ARM v4 Instruction Set Architecture Compatible Memory-Management-Unit IP." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/60592735992821194446.
National Taiwan University of Science and Technology
Department of Electronic Engineering
96
The MMU, sometimes called a "paged memory management unit" (PMMU), is a computer hardware component responsible for handling memory accesses requested by the CPU. Its functions include address translation, access permission checks for instruction and data addresses, and memory sharing. By translating virtual addresses to physical addresses, it gives the operating system hardware support for managing virtual memory. In this thesis, an MMU IP (Intellectual Property) compatible with the ARM v4 architecture is proposed. The MMU consists of an FSM (Finite State Machine) control unit, a TLB (Translation Look-aside Buffer), a calculation and protection module, and an AMBA (Advanced Microcontroller Bus Architecture) interface for reading the translation table in main memory. Proto-ARM922, which combines Proto-ARM9M with a cache, system coprocessor, MMU, and AMBA interface, has been implemented and verified with a Xilinx Spartan-3 XC3S1500-4FG676 FPGA and the TSMC 0.18 μm cell library. In the FPGA part, it takes 21211 LUTs and operates at a maximum working frequency of 11 MHz. Furthermore, all of the test programs run successfully on the FPGA development board. In the cell-based part, the core occupies 3183.26 μm × 3423.08 μm, which is approximately equivalent to 481533 gates, and the whole chip occupies 4088 μm × 4081 μm. In the SS (Slow NMOS Slow PMOS model) simulation condition, it operates at a maximum working frequency of 40 MHz and consumes about 167.1 mW.
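The translation and permission-check flow described above can be sketched in a few lines of Python. This is a deliberately simplified model that assumes 4 KB pages, a flat single-level page table, and a single writable flag per page; it does not reproduce the ARM v4 descriptor format or the design in the thesis. All class, method, and field names are hypothetical.

```python
PAGE_SHIFT = 12                       # assumed 4 KB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1


class TranslationFault(Exception):
    """Raised when no mapping exists for the virtual page."""


class PermissionFault(Exception):
    """Raised when a write targets a read-only page."""


class SimpleMMU:
    def __init__(self, page_table):
        # page_table maps virtual page number -> (physical frame, writable)
        self.page_table = page_table
        self.tlb = {}                 # cache of recently used translations
        self.tlb_hits = 0
        self.tlb_misses = 0

    def translate(self, vaddr, write=False):
        vpn = vaddr >> PAGE_SHIFT
        offset = vaddr & PAGE_MASK
        entry = self.tlb.get(vpn)
        if entry is None:
            # TLB miss: consult the translation table in "main memory".
            self.tlb_misses += 1
            entry = self.page_table.get(vpn)
            if entry is None:
                raise TranslationFault(hex(vaddr))
            self.tlb[vpn] = entry
        else:
            self.tlb_hits += 1
        frame, writable = entry
        if write and not writable:
            raise PermissionFault(hex(vaddr))
        return (frame << PAGE_SHIFT) | offset


mmu = SimpleMMU({0x10: (0x80, False)})   # one read-only page, made up
paddr = mmu.translate(0x10ABC)           # first access misses the TLB
```

A second access to the same page is served from the TLB without touching the page table, which is the reason a hardware MMU keeps such a buffer.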
"Very large register file for BLAS-3 operations." Chinese University of Hong Kong, 1995. http://library.cuhk.edu.hk/record=b5888541.
Thesis (M.Phil.)--Chinese University of Hong Kong, 1995.
Includes bibliographical references (leaves 117-118).
Abstract --- p.i
Acknowledgement --- p.iii
List of Tables --- p.v
List of Figures --- p.vi
Chapter 1 --- Introduction --- p.1
Chapter 1.1 --- BLAS-3 Operations --- p.2
Chapter 1.2 --- Organization of Thesis --- p.2
Chapter 1.3 --- Contribution --- p.3
Chapter 2 --- Background Studies --- p.4
Chapter 2.1 --- Registers & Cache Memory --- p.4
Chapter 2.2 --- Previous Research --- p.6
Chapter 2.3 --- Problem of Register & Cache --- p.8
Chapter 2.4 --- BLAS-3 Operations On RISC Microprocessor --- p.10
Chapter 3 --- Compiler Optimization Techniques for BLAS-3 Operations --- p.12
Chapter 3.1 --- One-Dimensional Q-Way J-Loop Unrolling --- p.13
Chapter 3.2 --- Two-Dimensional P×Q -Ways I×J-Loops Unrolling --- p.15
Chapter 3.3 --- Addition of Code to Remove Redundant Code --- p.17
Chapter 3.4 --- Simulation Result --- p.17
Chapter 3.5 --- Summary --- p.23
Chapter 4 --- Architectural Model of Very Large Register File --- p.25
Chapter 4.1 --- Architectural Model --- p.26
Chapter 4.2 --- Traditional Register File vs. Very Large Register File --- p.32
Chapter 5 --- Ideal Case Study of Very Large Register File --- p.35
Chapter 5.1 --- Matrix Multiply --- p.36
Chapter 5.2 --- LU Decomposition --- p.41
Chapter 5.3 --- Convolution --- p.50
Chapter 6 --- Worst Case Study of Very Large Register File --- p.58
Chapter 6.1 --- Matrix Multiply --- p.59
Chapter 6.2 --- LU Decomposition --- p.65
Chapter 6.3 --- Convolution --- p.74
Chapter 7 --- Proposed Case Study of Very Large Register File --- p.81
Chapter 7.1 --- Matrix Multiply --- p.82
Chapter 7.2 --- LU Decomposition --- p.91
Chapter 7.3 --- Convolution --- p.102
Chapter 7.4 --- Comparison --- p.111
Chapter 8 --- Conclusion & Future Work --- p.114
Chapter 8.1 --- Summary --- p.114
Chapter 8.2 --- Future Work --- p.115
Bibliography --- p.117