
Dissertations / Theses on the topic 'Data processors'



Consult the top 50 dissertations / theses for your research on the topic 'Data processors.'

Next to every source in the list of references there is an 'Add to bibliography' button. Press it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses from a wide variety of disciplines and organise your bibliography correctly.

1

Chen, Tien-Fu. "Data prefetching for high-performance processors /." Thesis, Connect to this title online; UW restricted, 1993. http://hdl.handle.net/1773/6871.

Full text
2

García, Almiñana Jordi. "Automatic data distribution for massively parallel processors." Doctoral thesis, Universitat Politècnica de Catalunya, 1997. http://hdl.handle.net/10803/5981.

Full text
Abstract:
Massively Parallel Processor systems provide the computational power required to solve most large-scale High Performance Computing applications. Machines with physically distributed memory offer a cost-effective way to achieve this performance; however, these systems are very difficult to program and tune. In a distributed-memory organization each processor has direct access to its local memory and indirect access to the remote memories of other processors, but accessing a local memory location can be more than an order of magnitude faster than accessing a remote one. In these systems, the choice of a good data distribution strategy can dramatically improve performance, although different parts of the data distribution problem have been proved to be NP-complete.
The selection of an optimal data placement depends on the program structure, the program's data sizes, the compiler capabilities, and some characteristics of the target machine. In addition, there is often a trade-off between minimizing interprocessor data movement and load balancing across processors. Automatic data distribution tools can assist the programmer in the selection of a good data layout strategy. These are usually source-to-source tools that annotate the original program with data distribution directives.
Crucial aspects such as data movement, parallelism, and load balance have to be taken into consideration in a unified way to efficiently solve the data distribution problem.
In this thesis a framework for automatic data distribution is presented, in the context of a parallelizing environment for massively parallel processor (MPP) systems. The applications considered for parallelization are usually regular problems, in which data structures are dense arrays. The data mapping strategy generated is optimal for a given problem size and target MPP architecture, according to our current cost and compilation model.
A single data structure, named the Communication-Parallelism Graph (CPG), which holds symbolic information related to the data movement and parallelism inherent in the whole program, is the core of our approach. This data structure allows the estimation of the data movement and parallelism effects of any data distribution strategy supported by our model. Assuming that some program characteristics have been obtained by profiling and that some specific target machine features have been provided, the symbolic information included in the CPG can be replaced by constant values expressed in seconds, representing data movement time overhead and time saved due to parallelization. The CPG is then used to model a minimal path problem which is solved by a general-purpose linear 0-1 integer programming solver. Linear programming techniques guarantee that the solution provided is optimal, and they are highly efficient at solving this kind of problem.
The data mapping capabilities provided by the tool include alignment of the arrays, one- or two-dimensional distribution in BLOCK or CYCLIC fashion, a set of remapping actions to be performed between phases if profitable, plus the associated parallelization strategy.
The effects of control flow statements between phases are taken into account in order to improve the accuracy of the model. The novelty of the approach resides in handling all stages of the data distribution problem, which have traditionally been treated in several independent phases, in a single step, and in providing an optimal solution according to our model.
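To make the BLOCK and CYCLIC distribution styles mentioned above concrete, here is a minimal Python sketch (illustrative only; the function names are invented and not taken from the thesis) of how each style assigns the elements of a one-dimensional array to processors:

    def block_owner(i, n, p):
        # BLOCK: split the n elements into p contiguous chunks.
        chunk = (n + p - 1) // p              # ceiling of n / p
        return i // chunk

    def cyclic_owner(i, p):
        # CYCLIC: deal elements out round-robin across the p processors.
        return i % p

    n, p = 16, 4
    print([block_owner(i, n, p) for i in range(n)])  # [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
    print([cyclic_owner(i, p) for i in range(n)])    # [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3]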
3

Agarwal, Virat. "Algorithm design on multicore processors for massive-data analysis." Diss., Georgia Institute of Technology, 2010. http://hdl.handle.net/1853/34839.

Full text
Abstract:
Analyzing massive data sets and streams is computationally very challenging. Data sets in systems biology, network analysis and security use network abstraction to construct large-scale graphs. Graph algorithms such as traversal and search are memory-intensive and typically require very little computation, with access patterns that are irregular and fine-grained. The increasing streaming data rates in various domains such as security, mining, and finance leave algorithm designers with only a handful of clock cycles (with current general-purpose computing technology) to process every incoming byte of data in-core in real time. This, along with the increasing complexity of mining patterns and other analytics, puts further pressure on the already high computational requirements. Processing streaming data in finance comes with the additional constraint of low latency, which restricts the algorithm from using common techniques such as batching to obtain high throughput. The primary contributions of this dissertation are the design of novel parallel data analysis algorithms for graph traversal on large-scale graphs, pattern recognition and keyword scanning on massive streaming data, financial market data feed processing and analytics, and data transformation. These algorithms capture the machine-independent aspects of the problems, to guarantee portability with performance to future processors, and come with high-performance implementations on multicore processors that embed processor-specific optimizations. Our breadth-first search graph traversal algorithm demonstrates a capability to process massive graphs with billions of vertices and edges on commodity multicore processors at rates that are competitive with supercomputing results in the recent literature. We also present high-performance scalable keyword scanning on streaming data using a novel automata compression algorithm, a model of computation based on small software content-addressable memories (CAMs), and a unique data layout that forces data re-use and minimizes memory traffic. Using a high-level algorithmic approach to process financial feeds, we present a solution that decodes and normalizes option market data at rates an order of magnitude higher than the current needs of the market, yet remains portable and flexible for other feeds in this domain. In this dissertation we discuss in detail the algorithm design challenges of processing massive data and present solutions and techniques that we believe can be used and extended to solve future research problems in this domain.
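As a rough illustration of the level-synchronous breadth-first search that underlies graph traversal work of this kind (a minimal sequential sketch, not the dissertation's multicore implementation), each frontier below is the set of vertices that a parallel version could expand concurrently:

    def bfs_levels(adj, source):
        # adj maps each vertex to a list of neighbours.
        level = {source: 0}
        frontier = [source]
        while frontier:
            next_frontier = []
            for u in frontier:          # a parallel version splits the frontier across cores
                for v in adj[u]:
                    if v not in level:
                        level[v] = level[u] + 1
                        next_frontier.append(v)
            frontier = next_frontier
        return level

    graph = {0: [1, 2], 1: [3], 2: [3], 3: []}
    print(bfs_levels(graph, 0))  # {0: 0, 1: 1, 2: 1, 3: 2}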
4

Dreibelbis, Harold N., Dennis Kelsch, and Larry James. "REAL-TIME TELEMETRY DATA PROCESSING and LARGE SCALE PROCESSORS." International Foundation for Telemetering, 1991. http://hdl.handle.net/10150/612912.

Full text
Abstract:
International Telemetering Conference Proceedings / November 04-07, 1991 / Riviera Hotel and Convention Center, Las Vegas, Nevada
Real-time processing of telemetry data has evolved from a highly centralized single large-scale computer system to multiple mini-computers or super mini-computers tied together in a loosely coupled distributed network, with each mini-computer or super mini-computer essentially performing a single function in the real-time processing sequence of events. The reasons for this evolution are many and varied. This paper reviews some of the more significant factors in that evolution and presents some alternatives to a fully distributed mini-computer network that appear to offer significant real-time data processing advantages.
5

Bartlett, Viv A. "Exploiting data dependencies in low power asynchronous VLSI signal processors." Thesis, University of Westminster, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.252037.

Full text
6

Revuelta, Fernández Borja. "Study of Scalable Architectures on FPGA for Space Data Processors." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-254909.

Full text
Abstract:
Spacecraft are notably complex systems designed and constructed by multidisciplinary teams. The on-board computer of a spacecraft is part of the on-board data systems in charge of the on-board processing and handling of payload data collected from the instruments, which require high-performance radiation-hardened-by-design (RHBD) processing devices. Over the last decades, the demand for high-performance systems used for on-board payload data processing has been increasing steadily due to new mission requirements, such as flexibility, faster development time, and new applications. At the same time, this user trend creates a need for higher-performance components operating in radiation environments. The architecture proposed in this thesis is motivated by the results of recent activities supported by the European Space Agency (ESA) in the fields of Networks-on-Chip (NoC) and floating-point VLIW Digital Signal Processors (DSPs). This architecture aims to study scaling aspects of VLIW-enabled DSP SoC designs using FPGAs. The project performs the necessary pre-study activities required for the SoC design, such as synthesis of the IPs on the target FPGA technology. Lastly, using several DSPs for processing, a LEON3 processor for control, and several components from the GRLIB IP Library, the implemented architecture provides the user with an early version of a platform for further software development on multi-DSP platforms. This architecture consists of a quad-core DSP system, providing a high-performance platform.
7

Mullins, Robert D. "Dynamic instruction scheduling and data forwarding in asynchronous superscalar processors." Thesis, University of Edinburgh, 2001. http://hdl.handle.net/1842/12701.

Full text
Abstract:
Improvements in semiconductor technology have supported an exponential growth in microprocessor performance for many years. The ability to continue this trend throughout the current decade poses serious challenges as feature sizes enter the deep sub-micron range. The problems due to increasing power consumption, clock distribution and the growing complexity of both design and verification may soon limit the extent to which the underlying technological advances can be exploited. One approach which may ease these problems is the adoption of an asynchronous design style, one in which the global clock signal is omitted. Commonly cited advantages include the ability to exploit local variations in processing speed, the absence of a clock signal and its distribution network, and the ease of reuse and composability provided through the use of delay-insensitive module interfaces. While the techniques to design such circuits have matured over the past decade, studies of the impact of asynchrony on processor architecture have been less common. One challenge in particular is to develop multiple-issue architectures that are able to fully exploit asynchronous operation. Multiple-issue architectures have traditionally exploited the determinism and predictability ensured by synchronous operation; unfortunately, this limits the effectiveness of the architecture when the clock is removed. The work presented in this dissertation describes in detail the problems of exploiting asynchrony in the design of superscalar processors. A number of techniques are presented for implementing both data forwarding and dynamic scheduling mechanisms, techniques that are central to exploiting instruction-level parallelism and achieving high performance. A technique called instruction compounding is introduced, which appends dependency information to instructions during compilation so that it can be exploited at run-time. This simplifies the implementation of both the dynamic scheduling and data-forwarding mechanisms.
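The idea behind instruction compounding, attaching dependency information to instructions at compile time, can be pictured with a toy Python model (the three-operand instruction format and register names are assumptions, not the dissertation's encoding):

    def compound_dependencies(instrs):
        # Each instruction is (dest, src1, src2); record, for every source operand,
        # the index of the instruction that last wrote it. A compiler would encode
        # this alongside the instruction for the hardware to exploit at run-time.
        last_writer = {}
        annotated = []
        for idx, (dest, src1, src2) in enumerate(instrs):
            deps = tuple(last_writer.get(s) for s in (src1, src2))
            annotated.append((idx, (dest, src1, src2), deps))
            last_writer[dest] = idx
        return annotated

    prog = [("r1", "r0", "r0"), ("r2", "r1", "r0"), ("r3", "r1", "r2")]
    print(compound_dependencies(prog))  # dependencies: (None, None), (0, None), (0, 1)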
8

Åström, Fransson Donny. "Utilizing Multicore Processors with Streamed Data Parallel Applications for Mobile Platforms." Thesis, KTH, Elektronik- och datorsystem, ECS, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-125822.

Full text
Abstract:
Performance has always been a major concern for computers and microprocessors, as has the quality of the applications executed by these processors. When the multicore evolution began, programmers faced a shift in the programming paradigms required to utilize the full potential of these processors. Now that the same evolution has reached the market of mobile devices, developers focusing on mobile platforms are facing the same challenge. This thesis focuses on assessing some of the possible application quality gains that can be achieved by adopting parallel programming techniques for mobile platforms, in particular throughput performance, low-latency performance and power consumption. A proof-of-concept application was developed to measure these specific qualities using a streamed data-parallel approach. By adopting proper parallel programming techniques to utilize the multicore processor architecture, it was possible to achieve 90% better throughput performance, or the same latency quality at 60% lower CPU frequency. Unfortunately, the power consumption could not be accurately measured using the available hardware.
9

Duric, Milovan. "Specialization and reconfiguration of lightweight mobile processors for data-parallel applications." Doctoral thesis, Universitat Politècnica de Catalunya, 2016. http://hdl.handle.net/10803/386568.

Full text
Abstract:
The worldwide utilization of mobile devices makes the segment of low-power mobile processors the leading one in the entire computer industry. Customers demand low-cost, high-performance and energy-efficient mobile devices, which execute sophisticated mobile applications such as multimedia and 3D games. State-of-the-art mobile devices already utilize chip multiprocessors (CMP) with dedicated accelerators that exploit data-level parallelism (DLP) in these applications. Such a heterogeneous system design enables the mobile processors to deliver the desired performance and efficiency. The heterogeneity, however, increases the processors' complexity and manufacturing cost when adding extra special-purpose hardware for the accelerators. In this thesis, we propose new hardware techniques that leverage the available resources of a mobile CMP to achieve cost-effective acceleration of DLP workloads. Our techniques are inspired by classic vector architectures and the latest reconfigurable architectures, both of which achieve high power efficiency when running DLP workloads. The high requirement for additional resources of these two architectures limits their applicability beyond high-performance computers. To achieve their advantages in mobile devices, we propose techniques that: 1) specialize the lightweight mobile cores for classic vector execution of DLP workloads; 2) dynamically tune the number of cores for the specialized execution; and 3) reconfigure a bulk of the existing general-purpose execution resources into a compute hardware accelerator. Specialization enables one or more cores to process large vector operands of configurable size with new special-purpose vector instructions. Reconfiguration goes one step further and allows the compute hardware in mobile cores to dynamically implement the entire functionality of diverse compute algorithms. The proposed specialization and reconfiguration techniques are applicable to a diverse range of general-purpose processors available in mobile devices nowadays. However, we chose to implement and evaluate them on a lightweight processor based on the Explicit Data Graph Execution architecture, which we find promising for research on low-power processors. The implemented techniques improve the mobile processor's performance and the efficiency of its existing general-purpose resources. With the specialization and reconfiguration techniques enabled, the processor efficiently exploits DLP without the extra cost of special-purpose accelerators.
10

Picciau, Andrea. "Concurrency and data locality for sparse linear algebra on modern processors." Thesis, Imperial College London, 2017. http://hdl.handle.net/10044/1/58884.

Full text
Abstract:
Graphics processing units (GPUs) are used as accelerators for algorithms in which the same instructions are carried out on different data. Algorithms for sparse linear algebra can achieve good performance on GPUs, although they tend to have an irregular pattern of accesses to memory. The performance of these algorithms is highly dependent on the input data. In fact, the parallelism these algorithms can achieve is limited by the opportunities for concurrency given by the data. Focusing on the solution of sparse triangular linear systems of equations, this thesis shows that a good partitioning of the data and a good scheduling of the computation can greatly improve performance on GPUs. For this class of algorithms, a partition of the data that maximises concurrency in the execution does not necessarily achieve the best performance. Instead, improving data locality by reducing concurrency reduces the latency of memory access and consequently the execution time. First, this work characterises the problem formally using graph theory and performance models. Then, algorithms that can be used effectively to partition the data are described. These algorithms aim to balance concurrency and data locality automatically. This approach is evaluated experimentally on the solution of linear equations with the preconditioned conjugate gradient method. The thesis also shows that the proposed approach can be used when a matrix changes from one iteration to the next during the execution of an algorithm, as in the simplex method; in this case, the approach allows the partition of the matrix to be updated between iterations. Finally, the algorithms and performance models developed in the thesis are used to discuss the limitations of accelerating the simplex method with GPUs.
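A common way to expose the concurrency available in a sparse triangular solve is to group rows into dependency levels, so that rows within a level can be processed in parallel. The sketch below computes such levels for forward substitution (a simplification: the thesis goes further and trades some of this concurrency for data locality):

    def level_sets(deps):
        # deps[i] is the set of earlier rows j < i that row i depends on, i.e. the
        # columns holding nonzeros in row i of the lower-triangular matrix.
        n = len(deps)
        level = [0] * n
        for i in range(n):
            if deps[i]:
                level[i] = 1 + max(level[j] for j in deps[i])
        buckets = {}
        for i, l in enumerate(level):
            buckets.setdefault(l, []).append(i)
        return [buckets[l] for l in sorted(buckets)]

    # Rows 0 and 1 are independent; row 2 needs row 0; row 3 needs rows 1 and 2.
    print(level_sets([set(), set(), {0}, {1, 2}]))  # [[0, 1], [2], [3]]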
11

Erici, Michael. "A processor in control : a study of whether processors face increased liability under the General Data Protection Regulation." Thesis, Stockholms universitet, Juridiska institutionen, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-142776.

Full text
12

Yu, Jason Kwok Kwun. "Vector processing as a soft-core processor accelerator." Thesis, University of British Columbia, 2008. http://hdl.handle.net/2429/2394.

Full text
Abstract:
Soft processors simplify hardware design by being able to implement complex control strategies using software. However, they are not fast enough for many intensive data-processing tasks, such as highly data-parallel embedded applications. This thesis suggests adding a vector processing core to the soft processor as a general-purpose accelerator for these types of applications. The approach has the benefits of a purely software-oriented development model, a fixed ISA allowing parallel software and hardware development, a single accelerator that can accelerate multiple functions in an application, and scalable performance with a single source code. With no hardware design experience needed, a software programmer can make area-versus-performance tradeoffs by scaling the number of functional units and register file bandwidth with a single parameter. The soft vector processor can be further customized by a number of secondary parameters to add and remove features for the specific application to optimize resource utilization. This thesis shows that a vector processing architecture maps efficiently into an FPGA and provides a scalable amount of performance for a reasonable amount of area. Configurations of the soft vector processor with different performance levels are estimated to achieve speedups of 2-24x for 5-26x the area of a Nios II/s processor on three benchmark kernels.
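A rough software analogy for how performance scales with the number of vector lanes (purely illustrative; the parameter name is an assumption, not taken from the thesis):

    def vector_add(a, b, num_lanes=4):
        # Process the operands in groups of num_lanes elements, the way a soft
        # vector processor issues one group of functional units per step;
        # doubling the lane count roughly halves the number of groups.
        out = [0] * len(a)
        for base in range(0, len(a), num_lanes):
            for lane in range(min(num_lanes, len(a) - base)):
                out[base + lane] = a[base + lane] + b[base + lane]
        return out

    print(vector_add(list(range(8)), list(range(8)), num_lanes=4))  # [0, 2, 4, 6, 8, 10, 12, 14]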
13

Davis, Edward L., and William E. Grahame. "HELICOPTER FLIGHT TESTING and REAL TIME ANALYSIS with DATA FLOW ARRAY PROCESSORS." International Foundation for Telemetering, 1986. http://hdl.handle.net/10150/615414.

Full text
Abstract:
International Telemetering Conference Proceedings / October 13-16, 1986 / Riviera Hotel, Las Vegas, Nevada
When flight testing helicopters, it is essential to process and analyze many parameters spontaneously and accurately for instantaneous feedback in order to make spot decisions on the safety and integrity of the aircraft. As various maneuvers stress the airframe or load oscillatory components, the absolute limits as well as interrelated limits including average and cumulative cycle loading must be continuously monitored. This paper presents a complete acquisition and analysis system (LDF/ADS) that contains modularly expandable array processors which provide real time acquisition, processing and analysis of multiple concurrent data streams and parameters. Simple limits checking and engineering units conversions are performed as well as more complex spectrum analyses, correlations and other high level interprocessing interactively with the operator. An example configuration is presented herein which illustrates how the system interacts with the operator during an actual flight test. The processed and derived parameters are discussed and the part they play in decision making is demonstrated. The LDF/ADS system may perform vibration analyses on many structural components during flight. Potential problems may also be isolated and reported during flight. Signatures or frequency domain representations of past problems or failures may be stored in nonvolatile memory and the LDF/ADS system will perform real time convolutions to determine the degrees of correlation of a present problem with all known past problems and reply instantly. This real time fault isolation is an indispensable tool for potential savings in lives and aircraft as well as eliminating unnecessary down time.
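The real-time matching of a current signature against stored failure signatures can be pictured with a small correlation sketch (a NumPy toy under assumed data shapes, not the LDF/ADS implementation):

    import numpy as np

    def best_match(current, library):
        # Normalise each signature and score it against the current one with a
        # simple correlation; the highest score names the most similar known fault.
        cur = (current - current.mean()) / (current.std() + 1e-12)
        scores = {}
        for name, ref in library.items():
            r = (ref - ref.mean()) / (ref.std() + 1e-12)
            scores[name] = float(np.dot(cur, r)) / len(cur)
        return max(scores, key=scores.get), scores

    t = np.linspace(0.0, 2.0 * np.pi, 64)
    faults = {"bearing": np.sin(t), "gearbox": np.cos(t)}
    print(best_match(np.sin(t), faults)[0])  # bearing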
14

Swenson, Kim Christian. "Exploiting network processors for low latency, high throughput, rate-based sensor update delivery." Pullman, Wash. : Washington State University, 2009. http://www.dissertations.wsu.edu/Thesis/Fall2009/k_swenson_121109.pdf.

Full text
Abstract:
Thesis (M.S. in computer science)--Washington State University, December 2009.
Title from PDF title page (viewed on Feb. 9, 2010). "School of Electrical Engineering and Computer Science." Includes bibliographical references (p. 92-94).
15

DUTT, Nikil D., Hiroaki TAKADA, and Hiroyuki TOMIYAMA. "Memory Data Organization for Low-Energy Address Buses." Institute of Electronics, Information and Communication Engineers, 2004. http://hdl.handle.net/2237/15042.

Full text
16

Lindener, Tobias. "Enabling Arbitrary Memory Constraint Standing Queries on Distributed Stream Processors using Approximate Algorithms." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-237459.

Full text
Abstract:
Relational algebra and SQL have been a standard in declarative analytics for decades. Yet, at web scale, even simple analytics queries can prove challenging within distributed stream processing environments. Two examples of such queries are "count" and "count distinct". Since the aforementioned queries require persistence of all keys (the values identifying the elements), they would result in continuously increasing memory demand. Through approximation techniques with fixed-size memory layouts, such tasks become feasible and potentially more resource-efficient within streaming systems. Within this thesis, (1) the advantages of approximate queries in distributed stream processing are demonstrated. Furthermore, (2) the resource efficiency as well as (3) the challenges of approximation techniques, and (4) dataset-dependent optimizations are presented. The prototype is implemented using the Yahoo Data Sketch library on Apache Flink. Based on the evaluation results and the experience with the prototype, potential improvements such as deeper integration into the streaming framework are presented. Throughout the analysis, the combination of approximate algorithms and distributed stream processing shows promising results, depending on the dataset and the required accuracy.
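To illustrate how a fixed-memory sketch can answer a 'count distinct' standing query, here is a minimal k-minimum-values estimator in Python (a toy stand-in for the idea, not the Yahoo Data Sketch library used in the thesis):

    import hashlib
    import heapq

    def estimate_distinct(stream, k=256):
        heap = []     # max-heap (values negated) holding the k smallest hashes seen
        kept = set()  # the same k hash values, used to skip duplicates cheaply
        for item in stream:
            h = int.from_bytes(hashlib.sha1(str(item).encode()).digest()[:8], "big")
            x = h / 2.0 ** 64                      # normalise the hash to [0, 1)
            if x in kept:
                continue
            if len(heap) < k:
                heapq.heappush(heap, -x)
                kept.add(x)
            elif x < -heap[0]:                     # smaller than the current k-th smallest
                kept.discard(-heapq.heappushpop(heap, -x))
                kept.add(x)
        if len(heap) < k:
            return len(heap)                       # exact when fewer than k distinct hashes
        return (k - 1) / -heap[0]                  # KMV estimate: (k - 1) / k-th smallest hash

    print(estimate_distinct(i % 10_000 for i in range(100_000)))  # roughly 10,000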
17

Pflieger, Mark Eugene. "A theory of optimal event-related brain signal processors applied to omitted stimulus data /." The Ohio State University, 1991. http://rave.ohiolink.edu/etdc/view?acc_num=osu1487757723995598.

Full text
18

Zhuang, Xiaotong. "Compiler Optimizations for Multithreaded Multicore Network Processors." Diss., Georgia Institute of Technology, 2006. http://hdl.handle.net/1853/11566.

Full text
Abstract:
Network processors are new types of multithreaded multicore processors geared towards achieving both fast processing speed and flexibility of programming. The architecture of network processors considers many special properties for packet processing, including multiple threads, multiple processor cores on the same chip, special functional units, a simplified ISA and a simplified pipeline. The architectural peculiarities of network processors raise new challenges for compiler design and optimization. Due to very high clocking speeds, the CPU-memory gap on such processors is huge, making registers extremely precious. Moreover, the register file is split into two banks, and for any ALU instruction, the two source operands must come from different banks. We present and compare three different approaches to register allocation and bank assignment. We also address the problem of sharing registers across threads in order to maximize the utilization of hardware resources. Context switches on the IXP network processor only happen when long-latency operations are encountered. As a result, context switches are highly frequent. Therefore, the designer of the IXP network processor decided to make context switches extremely lightweight, i.e. only the program counter (PC) is stored together with the context. Since registers are not saved and restored during context switches, it becomes difficult to share registers across threads. For a conventional processor, each thread can assume that it can use the entire register file, because registers are always part of the context. However, with lightweight context switches, each thread must take a separate piece of the register file, making register usage inefficient. Programs executing on network processors typically have runtime constraints. Scheduling of multiple programs sharing a CPU must be orchestrated by the OS and the hardware using certain sharing policies. Real-time applications demand a real-time-aware OS kernel to meet their specified deadlines. However, due to stringent performance requirements on network processors, neither OS nor hardware mechanisms are typically feasible. In this work, we demonstrate that a compiler approach could achieve some of the OS scheduling and real-time scheduling functionality without introducing a hefty overhead.
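The dual-bank constraint, namely that the two source operands of an ALU instruction must come from different banks, can be viewed as a two-colouring problem over a register conflict graph. A minimal sketch of that view (not one of the dissertation's three allocators; register names are made up):

    from collections import deque

    def assign_banks(instructions):
        # instructions: list of (src1, src2) register pairs. Build a conflict graph
        # and two-colour it so that conflicting registers land in different banks.
        graph = {}
        for src1, src2 in instructions:
            graph.setdefault(src1, set()).add(src2)
            graph.setdefault(src2, set()).add(src1)
        bank = {}
        for start in graph:
            if start in bank:
                continue
            bank[start] = 0
            queue = deque([start])
            while queue:
                u = queue.popleft()
                for v in graph[u]:
                    if v not in bank:
                        bank[v] = 1 - bank[u]
                        queue.append(v)
                    elif bank[v] == bank[u]:
                        return None  # odd conflict cycle: a register copy or spill is needed
        return bank

    print(assign_banks([("a", "b"), ("b", "c"), ("c", "d")]))  # {'a': 0, 'b': 1, 'c': 0, 'd': 1}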
19

Silva, João Paulo Sá da. "Data processing in Zynq APSoC." Master's thesis, Universidade de Aveiro, 2014. http://hdl.handle.net/10773/14703.

Full text
Abstract:
Master's degree in Computer and Telematics Engineering
Field-Programmable Gate Arrays (FPGAs) were invented by Xilinx in 1985, i.e. less than 30 years ago. The influence of FPGAs on many directions in engineering is growing continuously and rapidly. There are many reasons for such progress, and the most important are the inherent reconfigurability of FPGAs and their relatively low development cost. Recent field-configurable micro-chips combine the capabilities of software and hardware by incorporating multi-core processors and reconfigurable logic, enabling the development of highly optimized computational systems for a vast variety of practical applications, including high-performance computing, data, signal and image processing, embedded systems, and many others. In this context, the main goals of the thesis are to study these new micro-chips, namely the Zynq-7000 family, and to apply them to two selected case studies: data sort and Hamming weight calculation for long vectors.
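One of the two case studies named above, Hamming weight of a long vector, reduces to summing per-word population counts. A small software reference (not the Zynq hardware implementation) using the usual bit-slicing trick on 32-bit words:

    def hamming_weight(words):
        # Population count of a long bit vector stored as a list of 32-bit words.
        total = 0
        for w in words:
            w &= 0xFFFFFFFF
            w = w - ((w >> 1) & 0x55555555)                 # 2-bit partial sums
            w = (w & 0x33333333) + ((w >> 2) & 0x33333333)  # 4-bit partial sums
            w = (w + (w >> 4)) & 0x0F0F0F0F                 # 8-bit partial sums
            total += ((w * 0x01010101) & 0xFFFFFFFF) >> 24  # add the four byte counts
        return total

    print(hamming_weight([0xFFFFFFFF, 0x00000001, 0xF0F0F0F0]))  # 32 + 1 + 16 = 49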
20

BenDor, Jonathan, and J. D. Baker. "Processing Real-Time Telemetry with Multiple Embedded Processors." International Foundation for Telemetering, 1994. http://hdl.handle.net/10150/611671.

Full text
Abstract:
International Telemetering Conference Proceedings / October 17-20, 1994 / Town & Country Hotel and Conference Center, San Diego, California
This paper describes a system in which multiple embedded processors are used for real-time processing of telemetry streams from satellites and radars. Embedded EPC-5 modules are plugged into VME slots in a Loral System 550. Telemetry streams are acquired and decommutated by the System 550, and selected parameters are packetized and appended to a mailbox which resides in VME memory. A Windows-based program continuously fetches packets from the mailbox, processes the data, writes to log files, displays processing results on screen, and sends messages via a modem connected to a serial port.
21

Baumstark, Lewis Benton Jr. "Extracting Data-Level Parallelism from Sequential Programs for SIMD Execution." Diss., Georgia Institute of Technology, 2004. http://hdl.handle.net/1853/4823.

Full text
Abstract:
The goal of this research is to retarget multimedia programs written in sequential languages (e.g., C) to architectures with data-parallel execution capabilities. Image processing algorithms often have a high potential for data-level parallelism, but the artifacts imposed by the sequential programming language (e.g., loops, pointer variables) can obscure the parallelism and prohibit generation of efficient parallel code. This research presents a program representation and recognition approach for generating a data-parallel program specification from sequential source code and retargeting it to data-parallel execution mechanisms. The representation is based on an extension of the multi-dimensional synchronous dataflow model of computation. A partial recognition approach identifies and transforms only those program elements that hinder parallelization while leaving other computational elements intact. This permits flexibility in the types of programs that can be retargeted, while avoiding the complexity of complete program recognition. This representation and recognition process is implemented in the PARRET system, which is used to extract the high-level specification of a set of image-processing programs. From this specification, code is generated for Intel's SSE2 instruction set and for the SIMPil processor. The results demonstrate that PARRET can exploit, given sufficient parallel resources, the maximum available parallelism in the retargeted applications. Similarly, the results show PARRET can also exploit parallelism on architectures with hardware-limited parallel resources. It is desirable to estimate potential parallelism before undertaking the expensive process of reverse engineering and retargeting. The goal is to narrow down the search space to a select set of loops which have a high likelihood of being data-parallel. This work also presents a hybrid static/dynamic approach, called DLPEST, for estimating the data-level parallelism in sequential program loops. We demonstrate the correctness of DLPEST's estimates, show that estimates for programs of 25 to 5000 lines of code can be performed in under 10 minutes, and show that estimation time scales sub-linearly with input program size.
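The kind of transformation targeted here, turning a sequential loop into data-parallel whole-array operations, can be pictured with a small NumPy sketch (an illustrative kernel, not one of the benchmarks used with PARRET):

    import numpy as np

    def scale_and_clip_scalar(pixels, gain, lo, hi):
        # Sequential form: the loop and element-at-a-time updates hide the parallelism.
        out = []
        for p in pixels:
            v = p * gain
            out.append(lo if v < lo else hi if v > hi else v)
        return out

    def scale_and_clip_simd(pixels, gain, lo, hi):
        # Data-parallel form: one whole-array multiply and clip, which maps
        # naturally onto packed SIMD instructions such as SSE2.
        return np.clip(np.asarray(pixels, dtype=np.float64) * gain, lo, hi)

    data = [10, 20, 200, 300]
    print(scale_and_clip_scalar(data, 1.5, 0, 255))          # [15.0, 30.0, 255, 255]
    print(scale_and_clip_simd(data, 1.5, 0, 255).tolist())   # [15.0, 30.0, 255.0, 255.0]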
22

Tidball, John E. "REAL-TIME HIGH SPEED DATA COLLECTION SYSTEM WITH ADVANCED DATA LINKS." International Foundation for Telemetering, 1997. http://hdl.handle.net/10150/609754.

Full text
Abstract:
International Telemetering Conference Proceedings / October 27-30, 1997 / Riviera Hotel and Convention Center, Las Vegas, Nevada
The purpose of this paper is to describe the development of a very high-speed instrumentation and digital data recording system. The system converts multiple asynchronous analog signals to digital data, forms the data into packets, transmits the packets across fiber-optic lines and routes the data packets to destinations such as high-speed recorders, hard disks, Ethernet, and data processing. This system is capable of collecting approximately one hundred megabytes per second of filtered packetized data. The significant system features are its design methodology, system configuration, decoupled interfaces, data as packets, the use of RACEway data and VME control buses, distributed processing on mixed-vendor PowerPCs, real-time resource management objects, and an extendible and flexible configuration.
23

Lee, Yu-Heng George. "DYNAMIC KERNEL FUNCTION FOR HIGH-SPEED REAL-TIME FAST FOURIER TRANSFORM PROCESSORS." Wright State University / OhioLINK, 2009. http://rave.ohiolink.edu/etdc/view?acc_num=wright1260821902.

Full text
24

Johnson, Carl E. "AN APPLICATION OF ETHERNET TECHNOLOGY AND PC TELEMETRY DATA PROCESSORS IN REAL-TIME RANGE SAFETY DECISION MAKING." International Foundation for Telemetering, 1992. http://hdl.handle.net/10150/608910.

Full text
Abstract:
International Telemetering Conference Proceedings / October 26-29, 1992 / Town and Country Hotel and Convention Center, San Diego, California
Ethernet technology has vastly improved the capability to make real-time decisions during the flight of a vehicle. This asset, combined with a PC telemetry data processor and the power of a high-resolution graphics workstation, gives the decision makers a highly reliable graphical display of information on which to make vehicle-related safety decisions in real time.
25

Bierman, Cathy. "Revision and writing quality of seventh graders composing with and without word processors." Diss., Virginia Polytechnic Institute and State University, 1988. http://hdl.handle.net/10919/53912.

Full text
Abstract:
This experimental study examined the effects of word processing on revision and writing quality of expository compositions produced by seventh-graders. Thirty-six students in two accelerated English classes served as subjects. Prior to the experimental period, all students completed a handwritten composition (pretest) and received identical instruction in (a) composing and revising and (b) using a word processor. One intact class was randomly assigned as the experimental group. During the six-week treatment period all students wrote six compositions (three drafts per composition). The experimental group completed all composing and revising on the computer and the control group completed their compositions with pen and paper. Posttest 1 (produced on computer in the experimental group and by hand in the control group) and posttest 2 (handwritten in both groups) were analyzed for the frequency and types of revisions made between first and second drafts. The pretest and three posttests were analyzed for writing quality of final drafts. There were no significant differences: (a) between groups in the number of revisions in posttest 1 (computer written by experimental subjects and handwritten by control subjects), (b) in percentage of high-level revisions made with and without the word processor, and (c) in quality of compositions produced with and without the computer. There was a significant difference between groups in the number of revisions in handwritten compositions (posttest 2) produced by both groups after the treatment; the word processing group revised more frequently than did the group not exposed to six weeks of word processing. The experimental subjects also significantly increased in frequency of revisions from the time of posttest 1 (computer written) to posttest 2 (handwritten). A significant difference across time in writing quality scores was found. The findings suggested that students who compose and revise on computer can make substantially more revisions when they resume pen and paper composing and revising; however, use of the word processor does not differentially affect types of revisions attempted or writing quality. Word processors increase motivation, and adequate systems may increase the ability to detect and eliminate textual problems. Recommendations for research, theory, and instruction are discussed.
Ed. D.
26

Ungureanu, George. "Automatic Software Synthesis from High-Level ForSyDe Models Targeting Massively Parallel Processors." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-127832.

Full text
Abstract:
In the past decade we have witnessed an abrupt shift to parallel computing, driven by an increasing demand for performance and functionality that can no longer be satisfied by conventional paradigms. As a consequence, the abstraction gap between the applications and the underlying hardware has increased, prompting both industry and academia to pursue several research directions. This thesis project aims at analyzing some of these directions in order to offer a solution for bridging the abstraction gap between the description of a problem at a functional level and the implementation on a heterogeneous parallel platform using ForSyDe, a formal design methodology. The report treats applications employing data-parallel and time-parallel computation and regards NVIDIA CUDA-enabled GPGPUs as the main backend platform. The report proposes a heuristic transformation-and-refinement process based on analysis methods and design decisions to automate and aid in a correct-by-design backend code synthesis. Its purpose is to identify potential data parallelism and time parallelism in a high-level system. Furthermore, based on a basic platform model, the algorithm load-balances and maps the execution onto the best computation resources in an automated design flow. This design flow will be embedded into an already existing tool, f2cc (ForSyDe-to-CUDA C), and tested for correctness on an industrial-scale image processing application aimed at monitoring inkjet print-head reliability.
27

Majd, Farjam. "Two new parallel processors for real time classification of 3-D moving objects and quad tree generation." PDXScholar, 1985. https://pdxscholar.library.pdx.edu/open_access_etds/3421.

Full text
Abstract:
Two related image processing problems are addressed in this thesis. First, the problem of identification of 3-D objects in real time is explored. An algorithm to solve this problem and a hardware system for parallel implementation of this algorithm are proposed. The classification scheme is based on the "Invariant Numerical Shape Modeling" (INSM) algorithm originally developed for 2-D pattern recognition such as alphanumeric characters. This algorithm is then extended to 3-D and is used for general 3-D object identification. The hardware system is an SIMD parallel processor, designed in bit slice fashion for expandability. It consists of a library of images coded according to the 3-D INSM algorithm and the SIMD classifier which compares the code of the unknown image to the library codes in a single clock pulse to establish its identity. The output of this system consists of three signals: U, for unique identification; M, for multiple identification; and N, for non-identification of the object. Second, the problem of real time image compaction is addressed. The quad tree data structure is described. Based on this structure, a parallel processor with a tree architecture is developed which is independent of the data entry process, i.e., data may be entered pixel by pixel or all at once. The hardware consists of a tree processor containing a tree generator and three separate memory arrays, a data transfer processor, and a main memory unit. The tree generator generates the quad tree of the input image in tabular form, using the memory arrays in the tree processor for storage of the table. This table can hold one picture frame at a given time. Hence, for processing multiple picture frames the data transfer processor is used to transfer their respective quad trees from the tree processor memory to the main memory. An algorithm is developed to facilitate the determination of the connections in the circuit.
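The quad tree construction that the second processor parallelises can be written compactly in sequential form (a minimal recursive sketch for a square binary image whose side is a power of two; the thesis stores the tree in tabular form instead):

    def quadtree(image, x=0, y=0, size=None):
        # A uniform region collapses to a leaf holding its pixel value; otherwise
        # the node splits into NW, NE, SW and SE quadrants.
        if size is None:
            size = len(image)
        first = image[y][x]
        if all(image[y + i][x + j] == first for i in range(size) for j in range(size)):
            return first
        h = size // 2
        return [quadtree(image, x,     y,     h),   # NW
                quadtree(image, x + h, y,     h),   # NE
                quadtree(image, x,     y + h, h),   # SW
                quadtree(image, x + h, y + h, h)]   # SE

    img = [[1, 1, 0, 0],
           [1, 1, 0, 0],
           [0, 0, 0, 0],
           [0, 1, 0, 0]]
    print(quadtree(img))  # [1, 0, [0, 0, 0, 1], 0]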
28

Su, Chun-Yi. "Energy-aware Thread and Data Management in Heterogeneous Multi-Core, Multi-Memory Systems." Diss., Virginia Tech, 2015. http://hdl.handle.net/10919/51255.

Full text
Abstract:
By 2004, microprocessor design focused on multicore scaling, increasing the number of cores per die in each generation, as the primary strategy for improving performance. These multicore processors are typically equipped with multiple memory subsystems to improve data throughput. In addition, these systems employ heterogeneous processors such as GPUs and heterogeneous memories like non-volatile memory to improve performance, capacity, and energy efficiency. With the increasing volume of hardware resources and system complexity caused by heterogeneity, future systems will require intelligent ways to manage hardware resources. Early research to improve performance and energy efficiency on heterogeneous, multi-core, multi-memory systems focused on tuning a single primitive or at best a few primitives in the systems. The key limitation of past efforts is their lack of a holistic approach to resource management that balances the tradeoff between performance and energy consumption. In addition, the shift from simple, homogeneous systems to these heterogeneous, multicore, multi-memory systems requires in-depth understanding of efficient resource management for scalable execution, including new models that capture the interchange between performance and energy, smarter resource management strategies, and novel low-level performance/energy tuning primitives and runtime systems. Tuning an application to control available resources efficiently has become a daunting challenge; managing resources automatically is still a dark art since the tradeoffs among programming, energy, and performance remain insufficiently understood. In this dissertation, I have developed theories, models, and resource management techniques to enable energy-efficient execution of parallel applications through thread and data management in these heterogeneous multi-core, multi-memory systems. I study the effect of dynamic concurrent throttling on the performance and energy of multi-core, non-uniform memory access (NUMA) systems. I use critical path analysis to quantify memory contention in the NUMA memory system and determine thread mappings. In addition, I implement a runtime system that combines concurrent throttling and a novel thread mapping algorithm to manage thread resources and improve energy-efficient execution in multi-core, NUMA systems. In addition, I propose an analytical model based on the queuing method that captures important factors in multi-core, multi-memory systems to quantify the tradeoff between performance and energy. The model considers the effect of these factors in a holistic fashion that provides a general view of performance and energy consumption in contemporary systems. Finally, I focus on resource management of future heterogeneous memory systems, which may combine two heterogeneous memories to scale out memory capacity while maintaining reasonable power use. I present a new memory controller design that combines the best aspects of two baseline heterogeneous page management policies to migrate data between two heterogeneous memories so as to optimize performance and energy.
Ph. D.
29

Guney, Murat Efe. "High-performance direct solution of finite element problems on multi-core processors." Diss., Georgia Institute of Technology, 2010. http://hdl.handle.net/1853/34662.

Full text
Abstract:
A direct solution procedure is proposed and developed which exploits the parallelism that exists in current symmetric multiprocessing (SMP) multi-core processors. Several algorithms are proposed and developed to improve the performance of the direct solution of FE problems. A high-performance sparse direct solver is developed which allows experimentation with the newly developed and existing algorithms. The performance of the algorithms is investigated using a large set of FE problems. Furthermore, operation count estimations are developed to further assess various algorithms. An out-of-core version of the solver is developed to reduce the memory requirements for the solution. I/O is performed asynchronously without blocking the thread that makes the I/O request. Asynchronous I/O allows overlapping factorization and triangular solution computations with I/O. The performance of the developed solver is demonstrated on a large number of test problems. A problem with nearly 10 million degrees of freedom is solved on a low-priced desktop computer using the out-of-core version of the direct solver. Furthermore, the developed solver usually outperforms a commonly used shared memory solver.
30

Weston, Mindy. "The Right to Be Forgotten: Analyzing Conflicts Between Free Expression and Privacy Rights." BYU ScholarsArchive, 2017. https://scholarsarchive.byu.edu/etd/6453.

Full text
Abstract:
As modern technology continues to affect civilization, the issue of electronic rights grows in a global conversation. The right to be forgotten is a data protection regulation specific to the European Union, but its consequences are creating an international stir in the fields of mass communication and law. Freedom of expression and privacy rights are both founding values of the United States which are protected by constitutional amendments written before the internet also changed those fields. In a study that analyzes the legal process of when these two fundamental values collide, this research offers insight into both personal and judicial views of informational priority. This thesis conducts a legal analysis of cases that cite the infamous precedents of Melvin v. Reid and Sidis v. F-R Pub. Corp. to examine the factors on which U.S. courts of law determine whether freedom or privacy rules.
31

Huo, Jiale. "On testing concurrent systems through contexts of queues." Thesis, McGill University, 2006. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=102987.

Full text
Abstract:
Concurrent systems, including asynchronous circuits, computer networks, and multi-threaded programs, have important applications, but they are also very complex and expensive to test. This thesis studies how to test concurrent systems through contexts consisting of queues. Queues, modeling buffers and communication delays, are an integral part of the test settings for concurrent systems. However, queues can also distort the behavior of the concurrent system as observed by the tester, so one should take into account the queues when defining conformance relations or deriving tests. On the other hand, queues can cause state explosion, so one should avoid testing them if they are reliable or have already been tested. To solve these problems, we propose two different solutions. The first solution is to derive tests using some test selection criteria such as test purposes, fault coverage, and transition coverage. The second solution is to compensate for the problems caused by the queues so that testers do not discern the presence of the queues in the first place. Unifying the presentation of the two solutions, we consider in a general testing framework partial specifications, various contexts, and a hierarchy of conformance relations. Case studies on test derivation for asynchronous circuits, communication protocols, and multi-threaded programs are presented to demonstrate the applications of the results.
32

Hayes, Timothy. "Novel vector architectures for data management." Doctoral thesis, Universitat Politècnica de Catalunya, 2015. http://hdl.handle.net/10803/397645.

Full text
Abstract:
As the rate of annual data generation grows exponentially, there is a demand to manage, query and summarise vast amounts of information quickly. In the past, frequency scaling was relied upon to push application throughput. Today, Dennard scaling has ceased, and further performance must come from exploiting parallelism. Vector architectures offer a highly efficient and scalable way of exploiting data-level parallelism (DLP) through sophisticated single instruction-multiple data (SIMD) instruction sets. Traditionally, vector machines were used to accelerate scientific workloads rather than business-domain applications. In this thesis, we design innovative vector extensions for a modern superscalar microarchitecture that are optimised for data management workloads. Based on extensive analysis of these workloads, we propose new algorithms, novel instructions and microarchitectural optimisations. We first profile a leading commercial decision support system to better understand where the execution time is spent. We find that the hash join operator is responsible for a significant portion of the time. Based on our profiling, we develop lightweight integer-based pipelined vector extensions to capture the DLP in the operator. We then proceed to implement and evaluate these extensions using a custom simulation framework based on PTLsim and DRAMSim2. We motivate key design decisions based on the structure of the algorithm and compare these choices against alternatives experimentally. We discover that relaxing the base architecture's memory model is very beneficial when executing a vectorised implementation of the algorithm. This relaxed model serves as a powerful mechanism to execute indexed vector memory instructions out of order without requiring complex associative hardware. We find that our vectorised implementation shows good speedups. Furthermore, the vectorised version exhibits better scalability compared to the original scalar version run on a microarchitecture with larger superscalar and out-of-order structures. We then make a detailed study of SIMD sorting algorithms. Using our simulation framework, we evaluate the strengths, weaknesses and scalability of three diverse vectorised sorting algorithms: quicksort, bitonic mergesort and radix sort. We find that each of these algorithms has its unique set of bottlenecks. Based on these findings, we propose VSR sort, a novel vectorised non-comparative sorting algorithm that is based on radix sort but without its drawbacks. VSR sort, however, cannot be implemented directly with typical vector instructions due to the irregularity of its DLP. To facilitate the implementation of this algorithm, we define two new vector instructions and propose a complementary hardware structure for their execution. We find that VSR sort significantly outperforms each of the other vectorised algorithms. Next, we propose and evaluate five different ways of vectorising GROUP BY data aggregations. We find that although data aggregation algorithms are abundant in DLP, it is often too irregular to be expressed efficiently using typical vector instructions. By extending the hardware used for VSR sort, we propose a set of vector instructions and novel algorithms to better capture this irregular DLP. Furthermore, we discover that the best algorithm is highly dependent on the characteristics of the input. Finally, we evaluate the area, energy and power of these extensions using McPAT.
Our results show that our proposed vector extensions come with a modest area overhead, even when using a large maximum vector length with lockstepped parallel lanes. Using sorting as a case study, we find that all of the vectorised algorithms consume much less energy than their scalar counterpart. In particular, our novel VSR sort requires an order of magnitude less energy than the scalar baseline. With respect to power, we discover that our vector extensions present a very reasonable increase in wattage.
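The hash join probe that dominates the profiled decision support system is, at its core, an indexed gather followed by a vector compare, which is exactly the kind of irregular DLP the extensions target. The sketch below is only an illustration of that structure, not the thesis's instruction set: NumPy fancy indexing stands in for indexed vector (gather/scatter) memory instructions, the function name probe_hash_join is invented here, and hash collisions are ignored for brevity.

import numpy as np

def probe_hash_join(build_keys, build_payload, probe_keys, table_size):
    """Illustrative data-parallel hash-join probe. NumPy fancy indexing
    stands in for vector indexed (gather/scatter) memory instructions."""
    # Build phase: one slot per bucket; collisions simply overwrite (toy model).
    table_keys = np.full(table_size, -1, dtype=np.int64)
    table_vals = np.zeros(table_size, dtype=np.int64)
    buckets = build_keys % table_size
    table_keys[buckets] = build_keys          # vector scatter
    table_vals[buckets] = build_payload

    # Probe phase: hash a whole vector of keys, gather the candidate entries,
    # and compare them in one data-parallel step.
    probe_buckets = probe_keys % table_size   # vector hash
    candidate_keys = table_keys[probe_buckets]  # vector gather
    match = candidate_keys == probe_keys        # vector compare
    return match, table_vals[probe_buckets]     # gathered payloads

keys = np.array([3, 7, 11], dtype=np.int64)
payload = np.array([30, 70, 110], dtype=np.int64)
hits, vals = probe_hash_join(keys, payload, np.array([7, 5, 11]), table_size=64)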
APA, Harvard, Vancouver, ISO, and other styles
33

Low, Douglas Wai Kok. "Network processor memory hierarchy designs for IP packet classification /." Thesis, Connect to this title online; UW restricted, 2005. http://hdl.handle.net/1773/6973.

Full text
APA, Harvard, Vancouver, ISO, and other styles
34

Bousselham, Abdel Kader. "FPGA based data acquisition and digital pulse processing for PET and SPECT." Doctoral thesis, Stockholm University, Department of Physics, 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-6618.

Full text
Abstract:

The most important aspects of nuclear medicine imaging systems such as Positron Emission Tomography (PET) or Single Photon Emission Computed Tomography (SPECT) are the spatial resolution and the sensitivity (detector efficiency in combination with the geometric efficiency). Considerable efforts have been spent during the last two decades on improving the resolution and the efficiency by developing new detectors. Our proposed improvement technique focuses instead on the readout and electronics. Instead of using traditional pulse height analysis techniques, we propose using free-running digital sampling, replacing the analog readout and acquisition electronics with fully digital programmable systems.

This thesis describes a fully digital data acquisition system for KS/SU SPECT, new algorithms for high-resolution timing for PET, and a modular FPGA-based decentralized data acquisition system with optimal timing and energy assessment. The necessary signal processing algorithms for energy assessment and high-resolution timing are developed and evaluated. The implementation of the algorithms in field programmable gate arrays (FPGAs) and digital signal processors (DSPs) is also covered. Finally, modular decentralized digital data acquisition systems based on FPGAs and Ethernet are described.
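The thesis's specific timing and energy algorithms are not reproduced here, but the general idea of extracting both quantities from free-running ADC samples can be sketched as follows; the function name, the eight-sample baseline window and the linear-interpolation timing are illustrative assumptions only.

import numpy as np

def pulse_energy_and_time(samples, dt, threshold):
    """Generic sketch of energy and timing extraction from free-running ADC
    samples (not the specific algorithms developed in the thesis)."""
    baseline = samples[:8].mean()             # baseline from pre-trigger samples (assumed window)
    pulse = samples - baseline
    energy = pulse.sum() * dt                 # simple integral as an energy measure
    above = np.nonzero(pulse > threshold)[0]
    if len(above) == 0 or above[0] == 0:
        return energy, None                   # no clean threshold crossing found
    i = above[0]
    # Linear interpolation between samples i-1 and i gives sub-sample timing.
    frac = (threshold - pulse[i - 1]) / (pulse[i] - pulse[i - 1])
    return energy, (i - 1 + frac) * dt

t = np.arange(32)
samples = 5.0 + 40.0 * np.exp(-0.5 * (t - 12) ** 2 / 4.0)   # synthetic pulse on a flat baseline
e, t0 = pulse_energy_and_time(samples, dt=1.0, threshold=10.0)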

APA, Harvard, Vancouver, ISO, and other styles
35

Kim, Jongmyon. "Architectural Enhancements for Color Image and Video Processing on Embedded Systems." Diss., Georgia Institute of Technology, 2005. http://hdl.handle.net/1853/6948.

Full text
Abstract:
As emerging portable multimedia applications demand more and more computational throughput with limited energy consumption, the need for high-efficiency, high-throughput embedded processing is becoming an important challenge in computer architecture. In this regard, this dissertation addresses application-, architecture-, and technology-level issues in existing processing systems to provide efficient processing of multimedia in many, or ideally all, of its forms. In particular, this dissertation explores color imaging in multimedia while focusing on two architectural enhancements for memory- and performance-hungry embedded applications: (1) a pixel-truncation technique and (2) a color-aware instruction set (CAX) for embedded multimedia systems. The pixel-truncation technique differs from previous techniques (e.g., 4:2:2 and 4:2:0 subsampling) used in image and video compression applications (e.g., JPEG and MPEG) in that it reduces the information content in individual pixel word sizes rather than in each dimension. Thus, this technique drastically reduces the bandwidth and memory required to transport and store color images without perceivable distortion in color. At the same time, it maintains the pixel storage format of color image processing in which each pixel computation is performed simultaneously on 3-D YCbCr components, which are widely used in the image and video processing community. CAX supports parallel operations on two-packed 16-bit (6:5:5) YCbCr data in a 32-bit datapath processor, providing greater concurrency and efficiency for processing color image sequences. This dissertation presents the impact of CAX on processing performance and on both area and energy efficiency for color imaging applications in three major processor architectures: dynamically scheduled (superscalar), statically scheduled (very long instruction word, VLIW), and embedded single instruction multiple data (SIMD) array processors. Unlike typical multimedia extensions, CAX obtains substantial performance and code density improvements through direct support for color data processing rather than depending solely on generic subword parallelism. In addition, the ability to reduce data format size reduces system cost. The reduction in data bandwidth also simplifies system design. In summary, CAX, coupled with the pixel-truncation technique, provides an efficient mechanism that meets the computational requirements and cost goals for future embedded multimedia products.
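As a rough illustration of the pixel-truncation idea, the packing below keeps the 6 most significant bits of Y and the 5 most significant bits of Cb and Cr, so that one pixel fits a 16-bit word and two pixels fit the 32-bit datapath CAX operates on; the assignment of the 6-bit field to luma and the helper names are assumptions made here, not details taken from the dissertation.

def pack_ycbcr_655(y, cb, cr):
    """Illustrative 6:5:5 pixel truncation: keep the 6 MSBs of Y and the
    5 MSBs of Cb and Cr, packed into a single 16-bit word."""
    return ((y >> 2) << 10) | ((cb >> 3) << 5) | (cr >> 3)

def unpack_ycbcr_655(p):
    """Expand back to 8-bit components (the truncated low bits are lost)."""
    y  = ((p >> 10) & 0x3F) << 2
    cb = ((p >> 5)  & 0x1F) << 3
    cr = ( p        & 0x1F) << 3
    return y, cb, cr

packed = pack_ycbcr_655(200, 120, 96)      # one pixel in 16 bits
print(hex(packed), unpack_ycbcr_655(packed))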
APA, Harvard, Vancouver, ISO, and other styles
36

El, Moussawi Ali Hassan. "SIMD-aware word length optimization for floating-point to fixed-point conversion targeting embedded processors." Thesis, Rennes 1, 2016. http://www.theses.fr/2016REN1S150/document.

Full text
Abstract:
In order to cut down their cost and/or their power consumption, many embedded processors do not provide hardware support for floating-point arithmetic. However, applications in many domains, such as signal processing, are generally specified using floating-point arithmetic for the sake of simplicity. Porting these applications on such embedded processors requires a software emulation of floating-point arithmetic, which can greatly degrade performance. To avoid this, the application is converted to use fixed-point arithmetic instead. Floating-point to fixed-point conversion involves a subtle tradeoff between performance and precision; it enables the use of narrower data word lengths at the cost of degrading the computation accuracy. Besides, most embedded processors provide support for SIMD (Single Instruction Multiple Data) as a means to improve performance. In fact, this allows the execution of one operation on multiple data in parallel, thus ultimately reducing the execution time. However, the application should usually be transformed in order to take advantage of the SIMD instruction set. This transformation, known as Simdization, is affected by the data word lengths; narrower word lengths enable a higher SIMD parallelism rate. Hence the tradeoff between precision and Simdization. Much existing work has aimed at providing or improving methodologies for automatic floating-point to fixed-point conversion on the one hand, and for Simdization on the other. In the state of the art, both transformations are considered separately even though they are strongly related. In this context, we study the interactions between these transformations in order to better exploit the performance/accuracy tradeoff. First, we propose an improved extraction algorithm for SLP (Superword Level Parallelism), a Simdization technique. Then, we propose a new methodology to jointly perform floating-point to fixed-point conversion and SLP extraction. Finally, we implement this work as a fully automated source-to-source compiler flow. Experimental results, targeting four different embedded processors, show the validity of our approach in efficiently exploiting the performance/accuracy tradeoff compared to a typical approach, which considers both transformations independently.
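The conversion side of the tradeoff can be pictured with a minimal quantizer: a narrower fixed-point word length loses accuracy but lets more values share one SIMD register. The sketch below (the function names and the 16-bit word with 13 fractional bits are arbitrary illustrations, not the thesis's word-length optimization) shows the basic round-and-saturate step such a flow has to reason about.

import numpy as np

def to_fixed(x, frac_bits, word_length):
    """Quantize floating-point values to signed fixed-point of the given
    word length (round to nearest, saturate on overflow)."""
    scale = 1 << frac_bits
    lo, hi = -(1 << (word_length - 1)), (1 << (word_length - 1)) - 1
    return np.clip(np.rint(x * scale), lo, hi).astype(np.int32)

def to_float(q, frac_bits):
    """Convert a fixed-point array back to floating point."""
    return q.astype(np.float64) / (1 << frac_bits)

x = np.array([0.7071, -0.25, 1.5])
q16 = to_fixed(x, frac_bits=13, word_length=16)   # two such values fit one 32-bit word
err = np.abs(x - to_float(q16, 13))               # rounding error bounded by 2**-14 unless saturated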
APA, Harvard, Vancouver, ISO, and other styles
37

Chai, Sek Meng. "Real time image processing on parallel arrays for gigascale integration." Diss., Georgia Institute of Technology, 1999. http://hdl.handle.net/1853/15513.

Full text
APA, Harvard, Vancouver, ISO, and other styles
38

Stanić, Milan. "Design of energy-efficient vector units for in-order cores." Doctoral thesis, Universitat Politècnica de Catalunya, 2017. http://hdl.handle.net/10803/405647.

Full text
Abstract:
In the last 15 years, power dissipation and energy consumption have become crucial design concerns for almost all computer systems. Technology feature size scaling leads to higher power density and therefore to complex and costly cooling. While power dissipation is critical for high-performance systems such as data centers due to large power usage, for mobile systems battery life is a primary concern. In the low-end mobile processor market, power, energy and area budgets are significantly lower than in the server/desktop/laptop/high-end mobile markets. The ultimate goal in low-end systems is also to increase performance, but only if the area/energy budget is not compromised. Vector architectures have been traditionally applied to the supercomputing domain with many successful incarnations. The energy efficiency and high performance of vector processors, as well as their applicability in other emerging domains, encourage pursuing further research on vector architectures. However, adding support for them using a conventional design incurs area and power overheads that would not be acceptable for low-end mobile processors, and there is also a lack of appropriate tools to perform this research. In this thesis, we propose an integrated vector-scalar design for the low-power ARM architecture that mostly reuses scalar hardware to support the execution of vector instructions. The key element of the design is our proposed block-based model of execution that groups vector computational instructions together to execute them in a coordinated manner. We complement this with an advanced integrated design which features three energy-performance efficient ideas: (1) chaining from the memory hierarchy, (2) direct result forwarding and (3) memory shape instructions. This thesis also presents two tools for measuring and analyzing an application's suitability for vector microarchitectures. The first tool is VALib, a library that enables hand-crafted vectorization of applications; its main purpose is to collect data for detailed instruction-level characterization and to generate input traces for the second tool. The second tool is SimpleVector, a fast trace-driven simulator that is used to estimate the execution time of a vectorized application on a candidate vector microarchitecture. The thesis also evaluates the characteristics of the Knights Corner processor with simple in-order SIMD cores. The acquired knowledge is applied in the integrated design.
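As a very loose picture of what a "memory shape" access describes, the snippet below gathers a small two-dimensional block with one call; the function name, arguments and semantics are assumptions made here for illustration and do not correspond to the actual instructions proposed in the thesis.

import numpy as np

def shape_load(mem, base, rows, cols, row_stride):
    """Sketch of a 'shape' vector load: gather a rows x cols block whose
    elements are contiguous along a row and row_stride apart between rows."""
    idx = base + np.arange(rows)[:, None] * row_stride + np.arange(cols)[None, :]
    return mem[idx.ravel()]                    # one vector register worth of data

mem = np.arange(64)
v = shape_load(mem, base=5, rows=4, cols=2, row_stride=8)   # gathers 5,6, 13,14, 21,22, 29,30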
APA, Harvard, Vancouver, ISO, and other styles
39

McKenzie, Donald John. "An investigation of the effects which using the word processor has on the writing of standard six pupils." Thesis, Rhodes University, 1994. http://hdl.handle.net/10962/d1003531.

Full text
Abstract:
In order to discover to what extent the use of the word processor affects the motivation of high school students when engaged in writing tasks, and to determine the effects of word processing on the length and quality of their work and editing, two groups, carefully matched in terms of prior computer experience, intelligence and language ability, were given eight writing tasks. The test group used word processors while the control group used pen and paper. Their behaviour was closely observed and their writing was subsequently compared. It was found that while the test group were more motivated and spent longer both writing and editing their work, the quality of the work of both groups was similar. The degree of editing was greater for the test group. The conclusion is that there is a place for the use of the word processor in the English classroom, but specific strategies need to be developed to optimise its benefits.
APA, Harvard, Vancouver, ISO, and other styles
40

Colombet, Quentin. "Decoupled (SSA-based) register allocators : from theory to practice, coping with just-in-time compilation and embedded processors constraints." Phd thesis, Ecole normale supérieure de lyon - ENS LYON, 2012. http://tel.archives-ouvertes.fr/tel-00764405.

Full text
Abstract:
My thesis deals with register allocation. During this phase, the compiler has to assign the variables of the source program, which can be arbitrarily numerous, to the actual registers of the processor, which are limited to some number k. Recent works, for instance the theses of F. Bouchez and S. Hack, have shown that it is possible to split this phase into two decoupled steps: the spill (storing variables into memory to release registers) followed by register assignment. These works demonstrate the feasibility of this decoupling relying on a theoretical framework and some assumptions. In particular, it is sufficient to ensure after the spill step that the number of variables simultaneously live is below k. My thesis follows these works by showing how to apply this kind of approach when real-world constraints come into play: instruction encoding, the ABI (application binary interface), and register aliasing. Different approaches are proposed. They allow either to sidestep these problems or to tackle them directly within the theoretical framework. The hypotheses of the models and the proposed solutions are evaluated and validated through a thorough experimental study with the compiler of STMicroelectronics. Finally, all of this work has been done with the constraints of modern compilers in mind, in particular JIT (just-in-time) compilation, where the compilation time and the memory footprint of the compiler are key factors. We strive to offer solutions that cope with these criteria or improve the result until a given budget is reached. In particular, we used the SSA (static single assignment) form to define algorithms such as tree scan, which generalizes the linear-scan approaches proposed for JIT compilation.
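For readers unfamiliar with the scan-based allocators the thesis builds on, the classic linear-scan idea of Poletto and Sarkar fits in a few lines; the toy version below is only that classic algorithm, not the tree scan developed in the thesis, which generalizes the idea over the dominance tree of an SSA-form program.

def linear_scan(intervals, k):
    """Toy linear-scan register allocation (Poletto & Sarkar style).
    intervals: list of (start, end, name) live intervals.
    k: number of physical registers.
    Returns {name: register index or 'spill'}."""
    free = list(range(k))
    active = []                                    # (end, name, reg), the live intervals holding a register
    alloc = {}
    for start, end, name in sorted(intervals):
        # Expire intervals that ended before this one starts and free their registers.
        still_active = []
        for a_end, a_name, a_reg in active:
            if a_end <= start:
                free.append(a_reg)
            else:
                still_active.append((a_end, a_name, a_reg))
        active = still_active
        if free:
            reg = free.pop()
            alloc[name] = reg
            active.append((end, name, reg))
        else:
            # No register left: spill whichever live interval ends last.
            active.sort()
            last_end, last_name, last_reg = active[-1]
            if last_end > end:
                alloc[name] = last_reg
                alloc[last_name] = 'spill'
                active[-1] = (end, name, last_reg)
            else:
                alloc[name] = 'spill'
    return alloc

print(linear_scan([(0, 10, 'a'), (2, 4, 'b'), (3, 12, 'c')], k=2))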
APA, Harvard, Vancouver, ISO, and other styles
41

Meyer, Andreas, Sergey Smirnov, and Mathias Weske. "Data in business processes." Universität Potsdam, 2011. http://opus.kobv.de/ubp/volltexte/2011/5304/.

Full text
Abstract:
Process and data are equally important for business process management. Process data is especially relevant in the context of automated business processes, process controlling, and the representation of organizations' core assets. There are many process modeling languages, each with a specific set of data modeling capabilities and a particular level of data awareness; both vary significantly from one language to another. This paper evaluates several process modeling languages with respect to the role of data. To find a common ground for comparison, we develop a framework which systematically organizes process- and data-related aspects of the modeling languages, elaborating on the data aspects. Once the framework is in place, we compare twelve process modeling languages against it. We generalize the results of the comparison and identify clusters of similar languages with respect to data awareness.
APA, Harvard, Vancouver, ISO, and other styles
42

Bharadwaj, V. "Distributed Computation With Communication Delays: Design And Analysis Of Load Distribution Strategies." Thesis, Indian Institute of Science, 1994. http://hdl.handle.net/2005/161.

Full text
Abstract:
Load distribution problems in distributed computing networks have attracted much attention in the literature. A major objective in these studies is to distribute the processing load so as to minimize the time of processing of the entire load. In general, the processing load can be indivisible or divisible. An indivisible load has to be processed in its entirety on a single processor. On the other hand, a divisible load can be partitioned and processed on more than one processor. Divisible loads are either modularly divisible or arbitrarily divisible. Modularly divisible loads can be divided into pre-defined modules and cannot be further sub-divided. Further, precedence relations between modules may exist. Arbitrarily divisible loads can be divided into several fractions of arbitrary lengths which usually do not have any precedence relations. Such types of load are characterized by their large volume and the property that each data element requires an identical and independent processing. One of the important problems here is to obtain an optimal load distribution, which minimizes the processing time when the distribution is subject to communication delays in the interconnecting links. A specific application in which such loads are encountered is in edge-detection of images. Here the given image frame can be arbitrarily divided into many sub-frames and each of these can be independently processed. Other applications include processing of massive experimental data. The problems associated with the distribution of such arbitrarily divisible loads are usually analysed in the framework of what is known as divisible job theory. The research work reported in this thesis is a contribution in the area of distributing arbitrarily divisible loads in distributed computing systems subject to communication delays. The main objective in this work is to design and analyse load distribution strategies to minimize the processing time of the entire load in a given network. Two types of networks are considered, namely (i) single-level tree (or star) networks and (ii) linear networks. In both networks we assume that there is a non-zero delay associated with load transfer. Further, the processors in the network may or may not be equipped with front-ends (i.e., communication co-processors). The main contributions in this thesis are summarized below. First, a mathematical formulation of the load distribution problem in single-level tree and linear networks is presented. In both networks, it is assumed that there are (m + 1) processors and m communication links. In the case of single-level tree networks, the load to be processed is assumed to originate at the root processor, which divides the load into (m + 1) fractions, keeps its own share of the load for processing, and distributes the rest to the child processors one at a time and in a fixed sequence. In all the earlier studies in the literature, it had been assumed that for a load distribution to be optimal, it should be such that all the processors must stop computing at the same time. In this thesis, it is shown that this assumption is in general not true, and holds only for a restricted class of single-level tree networks which satisfy a certain condition. The concept of an equivalent network is introduced to obtain a precise formulation of this condition in terms of the processor and link speed parameters.
It is shown that this condition can be used to identify processor-link pairs which can be eliminated from a given network (i.e., these processors need not be given any computational load) without degrading its time performance. It is proved that the resultant reduced network (a network from which these inefficient processor-link pairs have been removed) gives the optimal time performance if and only if the load distribution is such that all the processors stop computing at the same time instant. These results are first proved for the case when the root processor is equipped with a front-end and then extended to the case when it is not. In the latter case, an additional condition between the speed of the root processor and the speed of each of the links, to be satisfied by the network, is specified. An optimal sequence for applying these conditions is also obtained. In the case of linear networks the processing load is assumed to originate at the processor situated at one end of the network. Each processor in the network keeps its own load fraction for computing and transmits the rest to its successor. Here too, in all the earlier studies in the literature, it has been assumed that for the processing time to be a minimum, the load distribution must be such that all the processors must stop computing at the same instant in time. Though this condition has been proved by others to be both necessary and sufficient, a different and more rigorous proof, similar to that for the single-level tree network, is presented here. Finally, the effect of inaccurate modelling on the processing time and on the above conditions is discussed through an illustrative example, and it is shown that the model adopted in this thesis gives reasonably accurate results. In the case of single-level tree networks, so far it has been assumed that the root processor distributes the processing load in a fixed sequence. However, since there are m child processors, a total of m! different sequences of load distribution are possible. Using the closed-form expression derived for the processing time, it is proved here that the optimal sequence of load distribution follows the decreasing order of link speeds. Further, if physical rearrangement of processors and links is allowed, then it is shown that the optimal arrangement follows a decreasing order of link and processor speeds with the fastest processor at the root. The entire analysis is first done for the case when the root processor is equipped with a front-end, and then extended to the case when it is not. In the case without a front-end, it is shown that the same optimal sequencing result holds. However, in an optimal arrangement, the root processor need not be the fastest. In this case an algorithm has been proposed for obtaining an optimal arrangement. Illustrative examples are given for all the cases considered. Next, a new strategy of load distribution is proposed by which the processing time obtained in earlier studies can be further minimized. Here the load is distributed by the root processor to a child processor in more than one installment (instead of in a single installment) such that the processing time is further minimized. First, the case in which all the processors are equipped with front-ends is considered. Recursive equations are obtained for a heterogeneous network and these are solved for the special case of a homogeneous network (having identical processors and identical links).
Using this closed-form solution, the ultimate limits of performance are explored through an asymptotic analysis with respect to the number of installments and the number of processors in the network. Trade-off relationships between the number of installments and the number of processors in the network are also presented. These results are then extended to the case when the processors are not equipped with front-ends. Finally, the efficiency of this new strategy of load distribution is demonstrated by comparing it with the existing single-installment strategy in the literature. The multi-installment strategy explained above is then applied to linear networks. Here, the processing load is assumed to originate at one extreme end of the network. First, the case when all the processors are equipped with front-ends is considered. Recursive equations for a heterogeneous network are obtained and these are solved for the special case of a homogeneous network. Using this closed-form solution, an asymptotic analysis is performed with respect to the number of installments. However, the asymptotic results with respect to the number of processors were obtained computationally, since analytical results could not be obtained. It is found that for a given network, once the number of installments is fixed, there is an optimum number of processors to be used in the network, beyond which the time performance degrades. Trade-off relationships between the number of installments and the number of processors are obtained. These results are then extended to the case when the processors are not equipped with front-ends. Comparisons with the existing single-installment strategy are also made. The single-installment strategy discussed in the literature has the disadvantage that the front-ends of the processors are not utilized efficiently in a linear network. This is due to the fact that a processor starts computing its own load fraction only after the entire load to be communicated through its front-end has been received. In this thesis, a new strategy is proposed in which a processor starts computing as soon as it receives its load fraction, simultaneously allowing its front-end to receive and transmit load to its successors. Recursive equations are developed and solved for the special case of a heterogeneous network in which the processors and links are arranged in the decreasing order of speeds. Further, it is shown that in this strategy, if the processing load originates in the interior of the network, the sequence of load distribution should be such that the load is first distributed to the side with the smaller number of processors. An expression for the optimal load origination point in the network is derived. A comparative study of this strategy with an earlier strategy is also presented. Finally, it is shown that even though the analysis is carried out for a special case of a heterogeneous network, this load distribution strategy can also be applied to a linear network in which the processors and links are arbitrarily arranged and still obtain a significant improvement in the time performance.
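For the single-level tree case with a front-end at the root, the equal-finish-time condition reduces to a simple chain of timing equations that can be solved directly; the sketch below uses the standard divisible-load notation (w_i and z_i for processor and link time constants, T_cp and T_cm for computation and communication intensities) with made-up example values, and is meant only to illustrate the form of the solution discussed above, not to reproduce any result from the thesis.

def optimal_fractions(w, z, Tcp=1.0, Tcm=1.0):
    """Load fractions for a single-level tree (root p0 with front-end, children
    p1..pm fed one at a time) under the equal-finish-time condition:
        alpha_0 * w[0] * Tcp = alpha_1 * (z[1]*Tcm + w[1]*Tcp)
        alpha_i * w[i] * Tcp = alpha_{i+1} * (z[i+1]*Tcm + w[i+1]*Tcp),  i >= 1
    w[i] is the inverse speed of processor i, z[i] the inverse speed of the
    link to child i (z[0] is unused). Returns normalized alpha_0..alpha_m."""
    m = len(w) - 1
    alpha = [1.0]                                   # unnormalized alpha_0
    for i in range(m):
        alpha.append(alpha[i] * (w[i] * Tcp) / (z[i + 1] * Tcm + w[i + 1] * Tcp))
    total = sum(alpha)
    return [a / total for a in alpha]

# Homogeneous example: a root plus 3 identical children over identical links.
fracs = optimal_fractions(w=[1.0, 1.0, 1.0, 1.0], z=[0.0, 0.2, 0.2, 0.2])
finish_time = fracs[0] * 1.0 * 1.0                  # processing time = alpha_0 * w_0 * Tcp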
APA, Harvard, Vancouver, ISO, and other styles
43

Henriksson, Tomas. "Intra-packet data-flow protocol processor /." Linköping : Univ, 2003. http://www.bibl.liu.se/liupubl/disp/disp2003/tek813s.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
44

Bai, Shuanghua. "Data reconciliation for dynamic processes." Thesis, University of Ottawa (Canada), 2006. http://hdl.handle.net/10393/29279.

Full text
Abstract:
In a modern chemical plant, the implementation of a distributed control system leads to a large number of measurements that are available online for process monitoring, control, optimization and management decision making. Unfortunately, these measurements often contain errors that degrade the information quality obtained from the raw data. This thesis is dedicated to the development of dynamic data reconciliation (DDR) algorithms for the optimal estimation of variables in dynamic processes. More importantly, the DDR algorithms were implemented within the structures of feedback control loops, and the performance of the DDR algorithms as well as the controllers was quantitatively assessed via a series of process simulations. The DDR algorithms, acting as digital filters, were compared to commonly used filters, such as the exponentially weighted moving average (EWMA), moving average (MA), Kalman and extended Kalman filters. Methodologies to use the DDR algorithms to deal with autocorrelated noise were also investigated. The DDR algorithms integrate information from both measurements and process dynamic models such that, at each sampling time, the estimates obtained by the DDR algorithms provide more precise representations of the current state of the process. Three DDR algorithms were developed, namely, nonlinear programming (NLP) based DDR, predictor-corrector based DDR, and autoassociative neural network (AANN) based DDR. Evaluations of these DDR algorithms were conducted via simulations of three chemical processes, namely a cylindrical storage tank, a spherical storage tank and a binary distillation column. Results demonstrated that the DDR algorithms are efficient and effective tools for the estimation of dynamic processes. They perform significantly better than the EWMA and MA filters. Furthermore, compared to the Kalman filter, the DDR algorithm is easier to understand and to implement. Studies also showed that the structure of process models has considerable impact on the performance of the DDR. The use of the DDR algorithms embedded in feedback control loops significantly enhanced the controller performance. For example, the cost function of the control system in the distillation column was reduced by 28∼39% when linear, adaptive linear and nonlinear DDR algorithms were used. The cost function of the controller in the cylindrical storage tank was reduced by 46% using DDR, while it was reduced by 33% when using a EWMA filter.
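Before any dynamics are added, reconciliation is a constrained weighted least-squares adjustment of the measurements; the steady-state linear example below (a hypothetical flow split F1 = F2 + F3 with assumed measurement values and variances) is the standard textbook formulation rather than one of the DDR algorithms developed in the thesis.

import numpy as np

# Measurements y for three flows and their error covariance V (assumed values).
y = np.array([10.3, 6.1, 3.9])
V = np.diag([0.25, 0.16, 0.16])

# Linear balance constraint A x = 0, here F1 - F2 - F3 = 0.
A = np.array([[1.0, -1.0, -1.0]])

# Classical reconciled estimate: x_hat = y - V A' (A V A')^{-1} A y
correction = V @ A.T @ np.linalg.solve(A @ V @ A.T, A @ y)
x_hat = y - correction
assert np.abs(A @ x_hat).max() < 1e-9     # the reconciled flows satisfy the balance exactly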
APA, Harvard, Vancouver, ISO, and other styles
45

Jasovský, Filip. "Realizace superpočítače pomocí grafické karty." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2014. http://www.nusl.cz/ntk/nusl-220617.

Full text
Abstract:
This master's thesis deals with the realization of a supercomputer using a graphics card with CUDA technology. The theoretical part describes the capabilities of graphics cards and desktop computers and the processes involved in performing calculations on them. The practical part deals with the creation of a system for calculations on the graphics card using an artificial intelligence algorithm, more specifically artificial neural networks. The generated program is then used for the classification of a large input data file. Finally, the results are compared.
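As a minimal stand-in for the kind of neural-network classification the thesis offloads to the graphics card, the snippet below runs a two-layer forward pass over a batch of inputs; NumPy is used here purely to illustrate the matrix products a CUDA implementation would execute on the GPU, and all names and sizes are invented.

import numpy as np

def forward(x, w1, b1, w2, b2):
    """Minimal two-layer feed-forward classifier; the two matrix products are
    the operations a GPU implementation would run as parallel kernels."""
    h = np.tanh(x @ w1 + b1)                  # hidden layer
    logits = h @ w2 + b2
    return logits.argmax(axis=1)              # predicted class per input row

rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 8))                # a batch of feature vectors
w1, b1 = rng.normal(size=(8, 16)), np.zeros(16)
w2, b2 = rng.normal(size=(16, 3)), np.zeros(3)
labels = forward(x, w1, b1, w2, b2)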
APA, Harvard, Vancouver, ISO, and other styles
46

Lari, Kamran A. "Sparse data estimation for knowledge processes." Thesis, McGill University, 2004. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=86073.

Full text
Abstract:
During recent years, industry has increasingly focused on knowledge processes. Similar to traditional or manufacturing processes, knowledge processes need to be managed and controlled in order to provide the expected results for which they were designed. During the last decade, the principles of process management have evolved, especially through work done in software engineering and workflow management.
Process monitoring is one of the major components for any process management system. There have been efforts to design process control and monitoring systems; however, no integrated system has yet been developed as a "generic intelligent system shell". In this dissertation, an architecture for an integrated process monitoring system (IPMS) is developed, whereby the end-to-end activities of a process can be automatically measured and evaluated. In order to achieve this goal, various components of the IPMS and the interrelationship among these components are designed.
Furthermore, a comprehensive study of the available methodologies and techniques revealed that sparse data estimation (SDE) is the one key component of the IPMS that does not yet exist. Consequently, a series of algorithms and methodologies are developed as the basis for the sparse data estimation of knowledge-based processes. Finally, a series of computer programs demonstrate the feasibility and functionality of the proposed approach when applied to a sample process. The sparse data estimation method is successful not only for knowledge-based processes, but for any process, and indeed for any set of activities that can be modeled as a network.
APA, Harvard, Vancouver, ISO, and other styles
47

Lozano, Albalate Maria Teresa. "Data Reduction Techniques in Classification Processes." Doctoral thesis, Universitat Jaume I, 2007. http://hdl.handle.net/10803/10479.

Full text
Abstract:
The learning process consists of different steps: building a Training Set (TS), training the system, testing its behaviour and finally classifying unknown objects. When using a distance-based rule as a classifier, such as 1-Nearest Neighbour (1-NN), the first step (building a training set) includes editing and condensing the data. The main reason is that distance-based rules need considerable time to classify each unlabelled sample, x, since the distance from x to each point in the training set must be calculated. So, the smaller the training set, the shorter the time needed for each new classification. This thesis is mainly focused on building a training set from some already given data, and especially on condensing it; however, different classification techniques are also compared.
The aim of any condensing technique is to obtain a reduced training set in order to spend as little time as possible in classification, all without a significant loss in classification accuracy. Some new approaches to training set size reduction based on prototypes are presented. These schemes basically consist of defining a small number of prototypes that represent all the original instances. This includes approaches that select among the already existing examples (selective condensing algorithms), and those which generate new representatives (adaptive condensing algorithms).
These new reduction techniques are experimentally compared to some traditional ones, for data represented in feature spaces. In order to test them, the classical 1-NN rule is applied. However, other (fast) classifiers have also been considered, such as linear and quadratic ones constructed in dissimilarity spaces based on prototypes, in order to see how editing and condensing concepts work for this different family of classifiers.
Although the goal of the algorithms proposed in this thesis is to obtain a strongly reduced set of representatives, the performance is empirically evaluated over eleven real data sets by comparing not only the reduction rate but also the classification accuracy with those of other condensing techniques. Therefore, the ultimate aim is not only to find a strongly reduced set, but also a balanced one.
Several ways to solve the same problem can be found. So, when using a distance-based rule as a classifier, reducing the training set is not the only available option; a different family of approaches consists of applying efficient search methods. Therefore, the results obtained by the algorithms presented here are also compared, in terms of classification accuracy and time, to several efficient search techniques.
Finally, the main contributions of this PhD report can be briefly summarised in four principal points. Firstly, two selective algorithms based on the idea of surrounding neighbourhood: they obtain better results than the other algorithms presented here, as well as than other traditional schemes. Secondly, a generative approach based on mixtures of Gaussians: it presents better results in classification accuracy and size reduction than traditional adaptive algorithms, and results similar to those of LVQ. Thirdly, it is shown that classification rules other than the 1-NN can be used, even leading to better results. And finally, it is deduced from the experiments carried out that, with some databases (such as the ones used here), the approaches presented here execute the classification processes in less time than the efficient search techniques.
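For reference, the oldest selective condensing scheme, Hart's Condensed Nearest Neighbour, against which proposals of this kind are usually measured, can be written in a few lines; this is the classic algorithm only, not the surrounding-neighbourhood or mixture-of-Gaussians methods proposed in the thesis.

import numpy as np

def condense_cnn(X, y):
    """Hart's Condensed Nearest Neighbour: keep a subset of the training set
    such that every remaining sample is correctly classified by 1-NN over it."""
    keep = [0]                                     # seed with the first sample
    changed = True
    while changed:
        changed = False
        for i in range(len(X)):
            if i in keep:
                continue
            d = np.linalg.norm(X[keep] - X[i], axis=1)
            if y[keep][int(d.argmin())] != y[i]:   # misclassified by the current subset
                keep.append(i)
                changed = True
    return np.array(keep)

X = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0], [0.9, 1.2]])
y = np.array([0, 0, 1, 1, 1])
subset = condense_cnn(X, y)    # indices of the retained prototypes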
APA, Harvard, Vancouver, ISO, and other styles
48

Hoenicke, Jochen. "Combination of processes, data, and time /." Oldenburg : Univ., Fak. II, Dep. für Informatik, 2006. http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&doc_number=014970023&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA.

Full text
APA, Harvard, Vancouver, ISO, and other styles
49

Tusche, Marco. "Empirical processes of multiple mixing data." Thesis, Tours, 2013. http://www.theses.fr/2013TOUR4033/document.

Full text
Abstract:
The present thesis studies weak convergence of empirical processes of multiple mixing data. It is based on the articles Durieu and Tusche (2012), Dehling, Durieu, and Tusche (2012), and Dehling, Durieu, and Tusche (2013). We follow the approximating class approach introduced by Dehling, Durieu, and Volný (2009) and Dehling and Durieu (2011), who established empirical central limit theorems for dependent R- and R^d-valued random variables, respectively. Extending their technique, we generalize their results to arbitrary state spaces and to empirical processes indexed by classes of functions. Moreover, we study sequential empirical processes. Our results apply to B-geometrically ergodic Markov chains, iterative Lipschitz models, dynamical systems with a spectral gap on the Perron-Frobenius operator, and ergodic torus automorphisms. We establish conditions under which the empirical process of such processes converges weakly to a Gaussian process.
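In the notation that is standard for this line of work, the objects under study take roughly the following form, with F the common marginal distribution function; these are the generic definitions and the generic shape of the limit, not a statement of the thesis's specific mixing conditions or theorems.

% Empirical process indexed by a point x (or by a function f from a class):
U_n(x) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \bigl( \mathbf{1}_{\{X_i \le x\}} - F(x) \bigr),
\qquad
U_n(f) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \bigl( f(X_i) - \mathbb{E}\, f(X_0) \bigr).

% Sequential version, adding a time parameter s in [0,1]:
W_n(s, x) = \frac{1}{\sqrt{n}} \sum_{i=1}^{\lfloor ns \rfloor} \bigl( \mathbf{1}_{\{X_i \le x\}} - F(x) \bigr).

% Under suitable (multiple) mixing conditions one shows weak convergence
% U_n \Rightarrow W, a centered Gaussian process whose covariance
% \Gamma(x, y) = \sum_{k \in \mathbb{Z}} \mathrm{Cov}\bigl( \mathbf{1}_{\{X_0 \le x\}}, \mathbf{1}_{\{X_k \le y\}} \bigr)
% sums the autocovariances over all lags.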
APA, Harvard, Vancouver, ISO, and other styles
50

Malayattil, Sarosh Aravind. "Design of a Multibus Data-Flow Processor Architecture." Thesis, Virginia Tech, 2012. http://hdl.handle.net/10919/31379.

Full text
Abstract:
General purpose microcontrollers have been used as computational elements in various spheres of technology. Because of the distinct requirements of specific application areas, however, general purpose microcontrollers are not always the best solution. There is a need for specialized processor architectures for specific application areas. This thesis discusses the design of such a specialized processor architecture targeted towards event driven sensor applications. This thesis presents an augmented multibus dataflow processor architecture and an automation framework suitable for executing a range of event driven applications in an energy efficient manner. The energy efficiency of the multibus processor architecture is demonstrated by comparing the energy usage of the architecture with that of a PIC12F675 microcontroller.
Master of Science
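The data-flow execution model behind such an architecture (a node fires only when all of its inputs hold tokens, so the fabric can stay idle between sensor events) can be sketched with a toy interpreter; the scheduler below is illustrative only and does not model the multibus interconnect, the sensor front-end, or the energy accounting compared against the PIC12F675.

from collections import deque

def run_dataflow(nodes, initial_tokens):
    """Toy dataflow interpreter. nodes: {name: (input_names, function)}.
    A node fires when every input queue holds a token; its result is pushed
    onto its own queue, from which downstream nodes consume."""
    queues = {name: deque(tokens) for name, tokens in initial_tokens.items()}
    for name in nodes:
        queues.setdefault(name, deque())
    fired = True
    while fired:
        fired = False
        for name, (inputs, fn) in nodes.items():
            if all(queues[i] for i in inputs):
                args = [queues[i].popleft() for i in inputs]
                queues[name].append(fn(*args))
                fired = True
    return queues

# Event-driven style: a sensor sample flows through a scaling node into a threshold node.
nodes = {
    "scale": (["sensor"], lambda v: v * 2),
    "alarm": (["scale"],  lambda v: v > 100),
}
print(run_dataflow(nodes, {"sensor": [30, 80]}))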
APA, Harvard, Vancouver, ISO, and other styles