
Dissertations / Theses on the topic 'Analytics Computing'



Consult the top 50 dissertations / theses for your research on the topic 'Analytics Computing.'

Next to every source in the list of references there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses in a wide variety of disciplines and organise your bibliography correctly.

1

Singh, Vivek Kumar. "Essays on Cloud Computing Analytics." Scholar Commons, 2019. https://scholarcommons.usf.edu/etd/7943.

Full text
Abstract:
This dissertation research focuses on two key aspects of cloud computing research, pricing and security, using data-driven techniques such as deep learning and econometrics. The first dissertation essay (Chapter 1) examines the adoption of the spot market in cloud computing and builds IT investment estimation models for organizations adopting the cloud spot market. The second dissertation essay (Chapters 2 and 3) studies proactive threat detection and prediction in cloud computing. The final dissertation essay (Chapter 4) develops a secure cloud file system that protects organizations using cloud computing from accidental data leaks.
2

Chakrabarti, Aniket. "Scaling Analytics via Approximate and Distributed Computing." The Ohio State University, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=osu1500473400586782.

Full text
3

Le, Quoc Do. "Approximate Data Analytics Systems." Doctoral thesis, Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2018. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-234219.

Full text
Abstract:
Today, most modern online services make use of big data analytics systems to extract useful information from raw digital data. The data normally arrive as a continuous stream at high speed and in huge volumes, and the cost of handling this massive data can be significant. Providing interactive latency in processing the data is often impractical because the data are growing exponentially, even faster than Moore's law predicts. To overcome this problem, approximate computing has recently emerged as a promising solution. Approximate computing is based on the observation that many modern applications are amenable to an approximate, rather than exact, output. Unlike traditional computing, approximate computing tolerates lower accuracy to achieve lower latency by computing over a partial subset instead of the entire input data. Unfortunately, advances in approximate computing are primarily geared towards batch analytics and cannot provide low-latency guarantees in the context of stream processing, where new data continuously arrive as an unbounded stream. In this thesis, we design and implement approximate computing techniques for processing and interacting with high-speed and large-scale stream data to achieve low latency and efficient utilization of resources. To achieve these goals, we have designed and built the following approximate data analytics systems:
• StreamApprox, a data stream analytics system for approximate computing. This system supports approximate computing for low-latency stream analytics in a transparent way and is able to adapt to rapid fluctuations of input data streams. For this system, we designed an online adaptive stratified reservoir sampling algorithm to produce approximate output with bounded error.
• IncApprox, a data analytics system for incremental approximate computing. This system adopts approximate and incremental computing in stream processing to achieve high throughput and low latency with efficient resource utilization. For this system, we designed an online stratified sampling algorithm that uses self-adjusting computation to produce an incrementally updated approximate output with bounded error.
• PrivApprox, a data stream analytics system for privacy-preserving and approximate computing. This system supports high-utility, low-latency data analytics while preserving users' privacy. It is based on the combination of privacy-preserving data analytics and approximate computing.
• ApproxJoin, an approximate distributed join system. This system improves the performance of joins, which are critical but expensive operations in big data systems. In this system, we employ a sketching technique (Bloom filters) to avoid shuffling non-joinable data items through the network, and we propose a novel sampling mechanism that executes during the join to obtain an unbiased representative sample of the join output.
Our evaluation based on micro-benchmarks and real-world case studies shows that these systems achieve significant performance speedups compared to state-of-the-art systems while tolerating negligible accuracy loss in the analytics output. In addition, our systems allow users to systematically trade off accuracy against throughput/latency and require no or minor modifications to existing applications.
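To make the sampling idea concrete, here is a minimal Python sketch of per-stratum reservoir sampling with a stratum-weighted estimate. It only illustrates the general technique; it is not the adaptive algorithm from the thesis, and the class name, parameters, and toy stream are invented for the example.

```python
import random
from collections import defaultdict

class StratifiedReservoirSampler:
    """Keep a fixed-size uniform reservoir per stratum (e.g., per sub-stream)."""

    def __init__(self, reservoir_size):
        self.reservoir_size = reservoir_size
        self.reservoirs = defaultdict(list)   # stratum -> sampled items
        self.counts = defaultdict(int)        # stratum -> items seen so far

    def add(self, stratum, item):
        self.counts[stratum] += 1
        reservoir = self.reservoirs[stratum]
        if len(reservoir) < self.reservoir_size:
            reservoir.append(item)
        else:
            # Classic reservoir sampling: replace an entry with decreasing probability.
            j = random.randrange(self.counts[stratum])
            if j < self.reservoir_size:
                reservoir[j] = item

    def estimate_mean(self):
        """Weight each stratum's sample mean by the number of items seen in it."""
        total = sum(self.counts.values())
        return sum(
            (self.counts[s] / total) * (sum(r) / len(r))
            for s, r in self.reservoirs.items() if r
        )

# Usage: a stream of (sub-stream id, numeric value) pairs.
sampler = StratifiedReservoirSampler(reservoir_size=100)
for stratum, value in [("s1", 1.0), ("s2", 3.5), ("s1", 2.0)] * 1000:
    sampler.add(stratum, value)
print(sampler.estimate_mean())
```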
4

Katzenbach, Alfred, and Holger Frielingsdorf. "Big Data Analytics für die Produktentwicklung." Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2016. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-214517.

Full text
Abstract:
From the introduction: "The term 'Industrie 4.0' was first presented to the public at the Hannover Messe trade fair in 2011. A working group of the Academy of Engineering Sciences developed this basic idea of a fourth revolution in industrial production further and published it in 2013 in a final report entitled 'Umsetzungsempfehlungen für das Zukunftsprojekt Industrie 4.0' (BmBF, 2013). The basic idea is to develop adaptable and efficient factories by exploiting modern information technology. The base technologies for implementing these smart factories are: Cyber-Physical Systems (CPS); Internet of Things (IoT) and Internet of Services (IoS); Big Data Analytics and Prediction; Social Media; and Mobile Computing. The final report focuses on the production step of the value chain, while questions of product development remain largely unaddressed. However, a smart factory for manufacturing smart products also presupposes further development of product development methods. Here, too, there is a great need for action, which goes hand in hand with the methods of model-based systems engineering. ..."
5

Flatt, Taylor. "CrowdCloud: Combining Crowdsourcing with Cloud Computing for SLO Driven Big Data Analysis." OpenSIUC, 2017. https://opensiuc.lib.siu.edu/theses/2234.

Full text
Abstract:
The evolution of structured data from simple rows and columns in a spreadsheet to more complex unstructured data such as tweets, videos, and voice has created a need for more adaptive analytical platforms. It is estimated that upwards of 80% of the data on the Internet today is unstructured, and crowdsourcing platforms must perform markedly better to cope with this flood of data. We investigated the use of a monitoring service that allows the system to take corrective action when results are trending away from meeting the accuracy, budget, and time SLOs. Initial implementation and system validation showed that taking corrective action generally leads to a better success rate in reaching the SLOs. A system that can dynamically adjust its internal parameters to perform better can lead to more harmonious interactions between humans and machine algorithms and to more efficient use of resources.
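As a rough illustration of SLO-driven corrective action, the sketch below projects accuracy, cost, and time from the work completed so far and suggests adjustments when a target would be missed. The field names, thresholds, and suggested actions are hypothetical; they are not taken from CrowdCloud.

```python
def check_slos(progress, accuracy_so_far, spent, elapsed, slo):
    """Return corrective actions when the projected accuracy, total cost, or
    total time would miss the service level objectives.
    progress is the completed fraction of the job (0..1)."""
    actions = []
    if accuracy_so_far < slo["accuracy"]:
        actions.append("recruit more crowd workers / raise answer redundancy")
    if progress > 0 and spent / progress > slo["budget"]:
        actions.append("shift work from the crowd to cheaper machine algorithms")
    if progress > 0 and elapsed / progress > slo["deadline"]:
        actions.append("relax per-task quality checks to raise throughput")
    return actions

# Hypothetical targets: 90% accuracy, $100 budget, one-hour deadline.
slo = {"accuracy": 0.9, "budget": 100.0, "deadline": 3600.0}
print(check_slos(progress=0.25, accuracy_so_far=0.82,
                 spent=40.0, elapsed=1200.0, slo=slo))
```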
6

Rossi, Tisbeni Simone. "Big data analytics towards predictive maintenance at the INFN-CNAF computing centre." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2019. http://amslaurea.unibo.it/18430/.

Full text
Abstract:
High Energy Physics (HEP) has long been among the pioneers in managing and processing enormous scientific datasets and in operating some of the largest data centres for scientific applications. HEP developed a computing Grid for the computing needs of the Large Hadron Collider (LHC) at CERN in Geneva, which currently coordinates daily computing operations on more than 800k processors across 170 computing centres and manages half an Exabyte of data on disk distributed over 5 continents. In the next phases of the LHC, especially in view of Run-4, the amount of data handled by the computing centres will grow considerably. In this context, the HEP Software Foundation has produced a Community White Paper (CWP) that sets out the path for the evolution of modern software and computing models in preparation for the so-called High Luminosity phase of the LHC. This work identified enormous potential in Big Data Analytics techniques for tackling HEP's future challenges. One line of development concerns so-called Operational Intelligence, i.e. the pursuit of a higher level of automation within the workflows. Such approaches could enable the transition from a reactive maintenance regime to a more advanced predictive, or even prescriptive, one. This thesis presents work done in collaboration with the INFN-CNAF computing centre to introduce a system for ingesting, organising, and processing the centre's logs on a unified Big Data Analytics platform, with the aim of prototyping a predictive maintenance model for the centre. The thesis contributes to this project with the development of a clustering algorithm for log messages based on similarity measures between text fields, in order to overcome the limits posed by the verbosity and heterogeneity of the logs collected from the various services operating 24/7 at the centre.
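A minimal sketch of the general idea of grouping log messages by textual similarity, here with Python's difflib and a greedy assignment to cluster representatives. It does not reproduce the thesis's actual similarity measure, threshold, or algorithm; the log lines are invented.

```python
from difflib import SequenceMatcher

def cluster_log_messages(messages, threshold=0.7):
    """Greedy clustering: a message joins the first cluster whose representative
    is sufficiently similar, otherwise it starts a new cluster."""
    clusters = []   # list of (representative, members)
    for msg in messages:
        for rep, members in clusters:
            if SequenceMatcher(None, rep, msg).ratio() >= threshold:
                members.append(msg)
                break
        else:
            clusters.append((msg, [msg]))
    return clusters

logs = [
    "ERROR transfer of file A failed after 3 retries",
    "ERROR transfer of file B failed after 5 retries",
    "WARNING disk pool /data almost full (91%)",
]
for rep, members in cluster_log_messages(logs):
    print(len(members), "x", rep)
```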
7

Parikh, Nidhi Kiranbhai. "Behavior Modeling and Analytics for Urban Computing: A Synthetic Information-based Approach." Diss., Virginia Tech, 2017. http://hdl.handle.net/10919/84967.

Full text
Abstract:
The rapid increase in urbanization poses challenges in diverse areas such as energy, transportation, pandemic planning, and disaster response. Planning for urbanization is a big challenge because cities are complex systems consisting of human populations, infrastructures, and the interactions and interdependence among them. This dissertation focuses on a synthetic information-based approach for modeling human activities and behaviors for two urban science applications, epidemiology and disaster planning, along with associated analytics. Synthetic information is a data-driven approach to creating a detailed, high-fidelity representation of human populations, infrastructural systems, and their behavioral and interaction aspects. It is used in developing large-scale simulations to model what-if scenarios and for policy making. Big cities have a large number of visitors every day. These visitors often go to crowded areas in the city and come into contact with each other and with area residents. However, most epidemiological studies have ignored their role in spreading epidemics. We extend the synthetic population model of the Washington DC metro area to include transient populations, consisting of tourists and business travelers, along with their demographics and activities, by combining data from multiple sources. We evaluate the effect of including this population in epidemic forecasts, and the potential benefits of multiple interventions that target transients. In the next study, we model human behavior in the aftermath of the detonation of an improvised nuclear device in Washington DC. Previous studies of this scenario have mostly focused on modeling physical impact and simple behaviors like sheltering and evacuation. However, these models have focused on optimal behavior, not naturalistic behavior. In other words, prior work focuses on whether it is better to shelter-in-place or evacuate, but has not been informed by the literature on what people actually do in the aftermath of disasters. Natural human behaviors in disasters, such as looking for family members or seeking healthcare, are supported by infrastructures such as cell-phone communication and transportation systems. We model a range of behaviors such as looking for family members, evacuation, sheltering, healthcare-seeking, worry, and search and rescue, and their interactions with infrastructural systems. Large-scale and complex agent-based simulations generate a large amount of data in each run, making it hard to make sense of the results. This leads us to formulate two new problems in simulation analytics. First, we develop algorithms to summarize simulation results by extracting causally relevant state sequences: state sequences that have a measurable effect on the outcome of interest. Second, in order to develop effective interventions, it is important to understand which behaviors lead to positive and negative outcomes. The same behavior may lead to different outcomes depending upon the context. Hence, we develop an algorithm for contextual behavior ranking. In addition to the context mentioned in the query, our algorithm also identifies any additional context that may affect the behavioral ranking.
8

Worthy, William Tuley. "Aligning Social Media, Mobile, Analytics, and Cloud Computing Technologies and Disaster Response." ScholarWorks, 2018. https://scholarworks.waldenu.edu/dissertations/4696.

Full text
Abstract:
After nearly 2 decades of advances in information and communications technologies (ICT), including social media, mobile, analytics, and cloud computing, disaster response agencies in the United States have not been able to improve alignment between ICT-based information and disaster response actions. This grounded theory study explored emergency response ICT managers' understanding of how social media, mobile, analytics, and cloud computing (SMAC) technologies are related to and can inform disaster response strategies. Sociotechnical theory served as the conceptual framework to ground the study. Data were collected from document reviews and semistructured interviews with 9 ICT managers from emergency management agencies in the state of Hawaii who had experience in responding to major disasters. The data were analyzed using open, axial, and selective coding. Three elements of a theory emerged from the findings: (a) the ICT managers were hesitant about SMAC technologies replacing first responders' radios for interoperation between emergency response agencies during major disasters, (b) the ICT managers were receptive to converging conventional ICT with SMAC technologies, and (c) the ICT managers were receptive to joining legacy information sharing strategies with new information sharing strategies based on SMAC technologies. The emergent theory offers a framework for aligning SMAC technologies and disaster response strategies. The implications for positive social change include reduced interoperability failures between disaster agencies during major catastrophes, which may lower the risk of casualties and deaths among emergency responders and disaster victims, thus benefiting them and their communities.
9

Panneerselvam, John. "A prescriptive analytics approach for energy efficiency in datacentres." Thesis, University of Derby, 2018. http://hdl.handle.net/10545/622460.

Full text
Abstract:
Given the evolution of Cloud Computing in recent years, users and clients adopting Cloud Computing for both personal and business needs have increased at an unprecedented scale. This has naturally led to increased deployments and implementations of Cloud datacentres across the globe. As a consequence of this increasing adoption of Cloud Computing, Cloud datacentres have become massive energy consumers and environmental polluters. Whilst the energy implications of Cloud datacentres are being addressed from various research perspectives, predicting the future trend and behaviour of workloads at the datacentres, and thereby reducing the active server resources, is one particular dimension of green computing gaining the interest of researchers and Cloud providers. However, this involves various practical and analytical challenges imposed by the increased dynamism of Cloud systems. The behavioural characteristics of Cloud workloads and users are still not perfectly understood, which limits the reliability of the prediction accuracy of existing research in this context. To this end, this thesis presents a comprehensive descriptive analytics of Cloud workload and user behaviours, uncovering the causes and energy-related implications of Cloud Computing. Furthermore, the characteristics of Cloud workloads and users, including latency levels, job heterogeneity, user dynamicity, straggling task behaviours, the energy implications of stragglers, job execution and termination patterns, and the inherent periodicity among Cloud workload and user behaviours, have been empirically presented. Driven by the descriptive analytics, a novel user behaviour forecasting framework has been developed, aimed at a tri-fold forecast of user behaviours: the session duration of users, the anticipated number of submissions, and the arrival trend of the incoming workloads. Furthermore, a novel resource optimisation framework has been proposed to provision the optimum level of resources for executing jobs with reduced server energy expenditure and fewer job terminations. This optimisation framework encompasses a resource estimation module to predict the anticipated resource consumption level for arriving jobs and a classification module to classify tasks based on their resource intensiveness. Both of the proposed frameworks have been verified theoretically and tested experimentally based on Google Cloud trace logs. Experimental analysis demonstrates the effectiveness of the proposed frameworks in terms of the reliability of the forecast results and in reducing the server energy expenditure spent on executing jobs at the datacentres.
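As a small illustration of trend forecasting over workload traces, the sketch below applies simple exponential smoothing to hourly job arrival counts. It is a stand-in that is far simpler than the forecasting framework described in the thesis; the trace values and smoothing factor are invented.

```python
def exponential_smoothing_forecast(arrivals_per_hour, alpha=0.3):
    """One-step-ahead forecast of job arrivals using simple exponential
    smoothing over a cluster trace of hourly arrival counts."""
    level = arrivals_per_hour[0]
    for observed in arrivals_per_hour[1:]:
        level = alpha * observed + (1 - alpha) * level
    return level

trace = [120, 135, 150, 143, 160, 170, 168]   # hypothetical hourly job counts
print(f"forecast for next hour: {exponential_smoothing_forecast(trace):.0f}")
```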
10

Spruth, Wilhelm G. "Enterprise Computing." Universitätsbibliothek Leipzig, 2013. http://nbn-resolving.de/urn:nbn:de:bsz:15-qucosa-126859.

Full text
Abstract:
This book grew out of a two-semester lecture course, 'Enterprise Computing', which we taught together for many years as part of the Bachelor's and Master's programmes at the University of Leipzig. The book introduces the world of the mainframe and is intended to give the reader an introductory overview. Volume 1 is devoted to an introduction to z/OS, while Volume 2 deals with Internet integration. Volume 3 complements these with practical exercises under z/OS.
11

Saker, Vanessa. "Automated feature synthesis on big data using cloud computing resources." Master's thesis, University of Cape Town, 2020. http://hdl.handle.net/11427/32452.

Full text
Abstract:
The data analytics process has many time-consuming steps. Combining data that sits in a relational database warehouse into a single relation, while aggregating important information in a meaningful way and preserving relationships across relations, is complex and time-consuming. This step is exceptionally important because many machine learning algorithms require a single file format as input (e.g. for supervised and unsupervised learning, feature representation and feature learning, etc.). An analyst is required to manually combine relations while generating new, more impactful information points from the data during the feature synthesis phase of the feature engineering process that precedes machine learning. Furthermore, the entire process is complicated by Big Data factors such as processing power and distributed data storage. There is an open-source package, Featuretools, that uses an innovative algorithm called Deep Feature Synthesis to accelerate the feature engineering step. However, when working with Big Data, there are two major limitations. The first is the curse of modularity: Featuretools stores data in memory to process it, and thus, if the data is large, it requires a processing unit with a large memory. Secondly, the package is dependent on data stored in a Pandas DataFrame. This makes the use of Featuretools with Big Data tools such as Apache Spark a challenge. This dissertation examines the viability and effectiveness of using Featuretools for feature synthesis with Big Data on the cloud computing platform AWS. Exploring the impact of generated features is a critical first step in solving any data analytics problem. If this can be automated in a distributed Big Data environment with a reasonable investment of time and funds, data analytics exercises will benefit considerably. In this dissertation, a framework for automated feature synthesis with Big Data is proposed and an experiment conducted to examine its viability. Using this framework, an infrastructure was built to support the process of feature synthesis on AWS that made use of S3 storage buckets, Elastic Compute Cloud (EC2) services, and an Elastic MapReduce cluster. A dataset of 95 million customers, 34 thousand fraud cases, and 5.5 million transactions across three different relations was then loaded into the distributed relational database on the platform. The infrastructure was used to show how the dataset could be prepared to represent a business problem, and Featuretools was used to generate a single feature matrix suitable for inclusion in a machine learning pipeline. The results show that the approach is viable. The feature matrix produced 75 features from 12 input variables and was time efficient, with a total end-to-end run time of 3.5 hours and a cost of approximately R 814 (approximately $52). The framework can be applied to a different set of data and allows analysts to experiment on a small section of the data until a final feature set is decided, after which the feature matrix can easily be scaled to the full dataset. This ability to automate feature synthesis, iterate, and scale up will save time in the analytics process while providing a richer feature set for better machine learning results.
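A minimal sketch of Deep Feature Synthesis with Featuretools on toy customer/transaction tables (the table and column names are invented, not the thesis's dataset). The calls follow the Featuretools 1.x interface (add_dataframe, target_dataframe_name); older releases use a different entity-based API, so adjust to the installed version.

```python
import featuretools as ft
import pandas as pd

# Hypothetical tables standing in for the customer/transaction relations.
customers = pd.DataFrame({
    "customer_id": [1, 2],
    "joined": pd.to_datetime(["2019-01-01", "2019-02-01"]),
})
transactions = pd.DataFrame({
    "transaction_id": [10, 11, 12],
    "customer_id": [1, 1, 2],
    "amount": [25.0, 40.0, 15.0],
    "time": pd.to_datetime(["2019-03-01", "2019-03-05", "2019-03-02"]),
})

es = ft.EntitySet(id="retail")
es = es.add_dataframe(dataframe_name="customers", dataframe=customers,
                      index="customer_id")
es = es.add_dataframe(dataframe_name="transactions", dataframe=transactions,
                      index="transaction_id", time_index="time")
es.add_relationship("customers", "customer_id", "transactions", "customer_id")

# Deep Feature Synthesis: aggregates transactions up to the customer level.
feature_matrix, feature_defs = ft.dfs(entityset=es,
                                      target_dataframe_name="customers",
                                      agg_primitives=["sum", "mean", "count"])
print(feature_matrix.head())
```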
12

Winberg, André, and Ramin Alberto Golrang. "Analytics as a Service : Analysis of services in Microsoft Azure." Thesis, Karlstads universitet, Institutionen för matematik och datavetenskap, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kau:diva-47655.

Full text
13

Soukup, Petr. "High-Performance Analytics (HPA)." Master's thesis, Vysoká škola ekonomická v Praze, 2012. http://www.nusl.cz/ntk/nusl-165252.

Full text
Abstract:
The aim of this thesis on High-Performance Analytics is to gain a structured overview of high-performance methods for data analysis. The introduction deals with definitions of primary and secondary data analysis and with primary systems, which are not appropriate for analytical data processing. The use of mobile devices, modern information technologies, and other factors has rapidly changed the character of data. The major part of the thesis is devoted to the recent turn towards new approaches to analytical data processing brought about by Big Data, a very frequent term these days. Towards the end of the thesis, the system resources that play a major part in these new approaches, as well as in the technological solutions of High-Performance Analytics themselves, are discussed. The second, practical part of the thesis compares the performance of conventional data analysis methods with one of the high-performance methods of High-Performance Analytics (specifically, In-Memory Analytics). The individual solutions are compared in an identical High-Performance Analytics server environment. The methods are applied to a data sample whose volume is increased after every round of measurement. The conclusion evaluates the test results and discusses the possible uses of the individual High-Performance Analytics methods.
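To illustrate the kind of measurement the practical part describes, here is a small, self-contained Python harness that times a fully in-memory aggregation against a chunked pass over the same data while the sample grows each round. It does not reproduce the thesis's High-Performance Analytics server environment; the data, sizes, and chunking scheme are arbitrary.

```python
import time
import numpy as np
import pandas as pd

def in_memory_aggregate(df):
    # Entire sample held in memory, single groupby.
    return df.groupby("key")["value"].mean()

def chunked_aggregate(df, chunk_size=100_000):
    # Simulates a conventional pass over data that does not fit in memory.
    sums, counts = {}, {}
    for start in range(0, len(df), chunk_size):
        chunk = df.iloc[start:start + chunk_size]
        grouped = chunk.groupby("key")["value"].agg(["sum", "count"])
        for key, row in grouped.iterrows():
            sums[key] = sums.get(key, 0.0) + row["sum"]
            counts[key] = counts.get(key, 0) + row["count"]
    return pd.Series({k: sums[k] / counts[k] for k in sums})

for n in (10_000, 100_000, 1_000_000):   # grow the sample each round
    df = pd.DataFrame({"key": np.random.randint(0, 100, n),
                       "value": np.random.rand(n)})
    for fn in (in_memory_aggregate, chunked_aggregate):
        t0 = time.perf_counter()
        fn(df)
        print(f"n={n:>9,} {fn.__name__:>20}: {time.perf_counter() - t0:.3f}s")
```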
14

Koza, Jacob. "Active Analytics: Suggesting Navigational Links to Users Based on Temporal Analytics Data." UNF Digital Commons, 2019. https://digitalcommons.unf.edu/etd/892.

Full text
Abstract:
Front-end developers are tasked with keeping websites up to date while optimizing user experiences and interactions. Tools and systems have been developed to give these individuals granular analytic insight into who is interacting with their sites, with what, and how. These systems maintain a historical record of user interactions that can be leveraged for design decisions. Developing a framework to aggregate those historical usage records and using it to anticipate user interactions on a webpage could automate the task of optimizing web pages. In this research, a system called Active Analytics was created that takes Google Analytics historical usage data and provides a dynamic front-end system for automatically updating web page navigational elements. The previous year's data is extracted from Google Analytics and transformed into a summarization of top navigation steps. Once these are stored, a responsive front-end system selects from the data a three-week span from the previous year: the current, previous, and next weeks. The most frequently reached pages, or their parent pages, have their navigational UI elements highlighted on a top-level or landing page to reduce the effort needed to reach those pages. The Active Analytics framework was evaluated by recruiting volunteers and randomly assigning them one of two versions of a site, one with the framework and one without. It was found that users of the framework-enabled site were able to navigate the site more easily than users of the original.
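A simplified sketch of the temporal selection step: given exported analytics records, pick a three-week window around the same calendar date one year earlier and return the most visited pages, whose navigation links a front end could then highlight. The function and field names are hypothetical and do not reflect the Active Analytics implementation.

```python
from datetime import date, timedelta
from collections import Counter

def top_pages_for_window(history, today, top_n=5):
    """history: list of (date, page_path) visits exported from an analytics tool.
    Returns the most visited pages in a three-week window (previous, current,
    and next week) around the same calendar date one year earlier."""
    anchor = today.replace(year=today.year - 1)
    start, end = anchor - timedelta(days=7), anchor + timedelta(days=14)
    counts = Counter(page for day, page in history if start <= day <= end)
    return [page for page, _ in counts.most_common(top_n)]

# Usage: highlight the returned paths' navigation links on the landing page.
history = [(date(2023, 5, 2), "/admissions"), (date(2023, 5, 3), "/admissions"),
           (date(2023, 5, 4), "/library")]
print(top_pages_for_window(history, today=date(2024, 5, 3)))
```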
15

Rodríguez, Pupo Luis Enrique. "An Analytics Platform for Integrating and Computing Spatio-Temporal Metrics in Location-aware Games." Doctoral thesis, Universitat Jaume I, 2021. http://hdl.handle.net/10803/671588.

Full text
Abstract:
This thesis presents an analytics platform for calculating spatio-temporal metrics in the context of geogames and context-based applications. It is based on an underlying conceptual model for spatio-temporal metrics, which consists of dimensions and variables to describe spatial and temporal phenomena, metrics functions to calculate application-relevant information and conditions using these data models, and actions to be triggered when certain conditions are met. The analytics platform is implemented as a cloud-based, distributed application that allows developers to define data requirements, collect the required (client-generated) data, and define and execute spatio-temporal metrics. It is designed to handle large amounts of (streaming) data and to scale well under increasing amounts of data and metrics computations. The platform is validated in two experiments, a location-aware game for collecting noise data in a city and a mobile application for location-based mental health treatments, demonstrating its usability, versatility, and feasibility in real-world scenarios.
16

Carle, William R. II. "Active Analytics: Adapting Web Pages Automatically Based on Analytics Data." UNF Digital Commons, 2016. http://digitalcommons.unf.edu/etd/629.

Full text
Abstract:
Web designers are expected to perform the difficult task of adapting a site's design to fit changing usage trends. Web analytics tools give designers a window into website usage patterns, but those patterns must be analyzed and applied to a website's user interface design manually. A framework for marrying live analytics data with user interface design could allow for interfaces that adapt dynamically to usage patterns, with little or no action from the designers. The goal of this research is to create a framework that utilizes web analytics data to automatically update and enhance web user interfaces. In this research, we present a solution for extracting analytics data via web services from Google Analytics and transforming them into reporting data that will inform user interface improvements. Once data are extracted and summarized, we expose the summarized reports via our own web services in a form that can be used by our client-side User Interface (UI) framework. This client-side framework dynamically updates the content and navigation on the page to reflect the data mined from the web usage reports. The resulting system reacts to changing usage patterns of a website and updates the user interface accordingly. We evaluated our framework by assigning navigation tasks to users on the UNF website and measuring the time it took them to complete those tasks, with one group using our framework-enabled version and one group using the original website. We found that the group that used the modified version of the site with our framework enabled was able to navigate the site more quickly and effectively.
17

Straub, Kayla Marie. "Data Mining Academic Emails to Model Employee Behaviors and Analyze Organizational Structure." Thesis, Virginia Tech, 2016. http://hdl.handle.net/10919/71320.

Full text
Abstract:
Email correspondence has become the predominant method of communication for businesses. If not for the inherent privacy concerns, this electronically searchable data could be used to better understand how employees interact. After the Enron dataset was made available, researchers were able to provide great insight into employee behaviors based on the available data, despite the many challenges with that dataset. The work in this thesis applies a suite of methods to an appropriately anonymized academic email dataset created from volunteers' email metadata. This new dataset, from an internal email server, is first used to validate feature extraction and machine learning algorithms in order to generate insight into the interactions within the center. Based solely on email metadata, a random forest approach models behavior patterns and predicts employee job titles with 96% accuracy. This result represents classifier performance not only on participants in the study but also on other members of the center who were connected to participants through email. Furthermore, the data revealed relationships not present in the center's formal operating structure. The culmination of this work is an organic organizational chart, which contains a fuller understanding of the center's internal structure than can be found in the official organizational chart.
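A toy sketch of the classification setup: a scikit-learn random forest trained on metadata-only features to predict job titles. The features, values, and titles are invented; the thesis's actual feature extraction and evaluation protocol are not reproduced here.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical per-person features derived from email metadata only
# (no message bodies): volumes, timing, and breadth of correspondence.
features = pd.DataFrame({
    "msgs_sent_per_day":     [12, 3, 45, 7, 30, 5],
    "msgs_received_per_day": [40, 10, 80, 15, 60, 12],
    "distinct_contacts":     [25, 8, 90, 12, 70, 10],
    "after_hours_fraction":  [0.1, 0.05, 0.4, 0.2, 0.3, 0.1],
})
job_titles = ["engineer", "student", "director", "student", "manager", "student"]

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(features, job_titles)

# Predict the title of a previously unseen person from their metadata profile.
new_person = pd.DataFrame({"msgs_sent_per_day": [20], "msgs_received_per_day": [50],
                           "distinct_contacts": [40], "after_hours_fraction": [0.25]})
print(clf.predict(new_person))
```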
18

Zheng, Fang. "Middleware for online scientific data analytics at extreme scale." Diss., Georgia Institute of Technology, 2014. http://hdl.handle.net/1853/51847.

Full text
Abstract:
Scientific simulations running on High End Computing machines in domains like Fusion, Astrophysics, and Combustion now routinely generate terabytes of data in a single run, and these data volumes are only expected to increase. Since such massive simulation outputs are key to scientific discovery, the ability to rapidly store, move, analyze, and visualize data is critical to scientists' productivity. Yet there are already serious I/O bottlenecks on current supercomputers, and movement toward the Exascale is further accelerating this trend. This dissertation is concerned with the design, implementation, and evaluation of middleware-level solutions to enable high performance and resource efficient online data analytics to process massive simulation output data at large scales. Online data analytics can effectively overcome the I/O bottleneck for scientific applications at large scales by processing data as it moves through the I/O path. Online analytics can extract valuable insights from live simulation output in a timely manner, better prepare data for subsequent deep analysis and visualization, and gain improved performance and reduced data movement cost (both in time and in power) compared to the conventional post-processing paradigm. The thesis identifies the key challenges for online data analytics based on the needs of a variety of large-scale scientific applications, and proposes a set of novel and effective approaches to efficiently program, distribute, and schedule online data analytics along the critical I/O path. In particular, its solution approach i) provides a high performance data movement substrate to support parallel and complex data exchanges between simulation and online data analytics, ii) enables placement flexibility of analytics to exploit distributed resources, iii) uses fine-grained scheduling to harvest idle resources for running online analytics co-placed with simulation codes on the same nodes, with minimal interference to the simulation, and finally, iv) supports scalable, efficient online spatial indices to accelerate data analytics and visualization on the deep memory hierarchies of high end machines. Our middleware approach is evaluated with leadership scientific applications in domains like Fusion, Combustion, and Molecular Dynamics, and on different High End Computing platforms. Substantial improvements are demonstrated in end-to-end application performance and in resource efficiency at scales of up to 16384 cores, for a broad range of analytics and visualization codes. The outcome is a useful and effective software platform for online scientific data analytics facilitating large-scale scientific data exploration.
19

Fan, Qi. "Multi-Objective Optimization for Data Analytics in the Cloud." Electronic Thesis or Diss., Institut polytechnique de Paris, 2024. http://www.theses.fr/2024IPPAX069.

Full text
Abstract:
Big data query processing has become increasingly important, prompting the development and cloud deployment of numerous systems. However, automatically tuning the numerous parameters in these big data systems introduces growing complexity in meeting users' performance goals and budgetary constraints. Determining optimal configurations is challenging due to the need to address: 1) multiple competing performance goals and budgetary constraints, such as low latency and low cost, 2) a high-dimensional parameter space with complex parameter control, and 3) the requirement for high computational efficiency in cloud use, typically within 1-2 seconds. To address the above challenges, this thesis proposes efficient multi-objective optimization (MOO) algorithms for a cloud optimizer to meet various user objectives. It computes Pareto optimal configurations for big data queries within a high-dimensional parameter space while adhering to stringent solving time requirements. More specifically, this thesis introduces the following contributions. The first contribution of this thesis is a benchmarking analysis of existing MOO methods and solvers, identifying their limitations, particularly in terms of efficiency and the quality of Pareto solutions, when applied to cloud optimization. The second contribution introduces MOO algorithms designed to compute Pareto optimal solutions for query stages, which are units defined by shuffle boundaries. In production-scale big data processing, each stage operates within a high-dimensional parameter space, with thousands of parallel instances. Each instance requires resource parameters determined upon assignment to one of thousands of machines, as exemplified by systems like MaxCompute. To achieve Pareto optimality for each query stage, we propose a novel hierarchical MOO approach. This method decomposes the stage-level MOO problem into multiple parallel instance-level MOO problems and efficiently derives stage-level MOO solutions from instance-level MOO solutions. Evaluation results using production workloads demonstrate that our hierarchical MOO approach outperforms existing MOO methods by 4% to 77% in terms of performance and up to 48% in cost reduction while operating within 0.02 to 0.23 seconds compared to current optimizers and schedulers. Our third contribution aims to achieve Pareto optimality for the entire query with finer-granularity control of parameters. In big data systems like Spark, some parameters can be tuned independently for each query stage, while others are shared across all stages, introducing a high-dimensional parameter space and complex constraints. To address this challenge, we propose a new approach called Hierarchical MOO with Constraints (HMOOC). This method decomposes the optimization problem of a large parameter space into smaller subproblems, each constrained to use the same shared parameters. Given that these subproblems are not independent, we develop techniques to generate a sufficiently large set of candidate solutions and efficiently aggregate them to form global Pareto optimal solutions. Evaluation results using TPC-H and TPC-DS benchmarks demonstrate that HMOOC outperforms existing MOO methods, achieving a 4.7% to 54.1% improvement in hypervolume and an 81% to 98.3% reduction in solving time.
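As a minimal illustration of Pareto optimality over competing objectives, the sketch below filters a set of candidate configurations down to the non-dominated ones for two objectives (latency and cost, both minimized). This is a brute-force filter, not the hierarchical MOO or HMOOC algorithms described in the thesis; the configurations are invented.

```python
def pareto_front(configs):
    """configs: list of (name, latency_seconds, dollar_cost).
    A configuration is Pareto optimal if no other configuration is at least as
    good on both objectives and strictly better on at least one."""
    front = []
    for name, lat, cost in configs:
        dominated = any(
            (l2 <= lat and c2 <= cost) and (l2 < lat or c2 < cost)
            for _, l2, c2 in configs
        )
        if not dominated:
            front.append((name, lat, cost))
    return front

configs = [("A", 10.0, 5.0), ("B", 8.0, 7.0), ("C", 12.0, 4.0), ("D", 9.0, 9.0)]
print(pareto_front(configs))   # A, B, and C survive; D is dominated by B
```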
20

Amur, Hrishikesh. "Storage and aggregation for fast analytics systems." Diss., Georgia Institute of Technology, 2013. http://hdl.handle.net/1853/50397.

Full text
Abstract:
Computing in the last decade has been characterized by the rise of data-intensive scalable computing (DISC) systems. In particular, recent years have witnessed a rapid growth in the popularity of fast analytics systems. These systems exemplify a trend where queries that previously involved batch-processing (e.g., running a MapReduce job) on a massive amount of data are increasingly expected to be answered in near real-time with low latency. This dissertation addresses the problem that existing designs for various components used in the software stack for DISC systems do not meet the requirements demanded by fast analytics applications. In this work, we focus specifically on two components:
1. Key-value storage: Recent work has focused primarily on supporting reads with high throughput and low latency. However, fast analytics applications require that new data entering the system (e.g., new web pages crawled, currently trending topics) be quickly made available to queries and analysis codes. This means that along with supporting reads efficiently, these systems must also support writes with high throughput, which current systems fail to do. In the first part of this work, we solve this problem by proposing a new key-value storage system, called the WriteBuffer (WB) Tree, that provides up to 30× higher write performance and similar read performance compared to current high-performance systems.
2. GroupBy-Aggregate: Fast analytics systems require support for fast, incremental aggregation of data with low-latency access to results. Existing techniques are memory-inefficient and do not support incremental aggregation efficiently when aggregate data overflows to disk. In the second part of this dissertation, we propose a new data structure called the Compressed Buffer Tree (CBT) to implement memory-efficient in-memory aggregation. We also show how the WB Tree can be modified to support efficient disk-based aggregation.
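To illustrate the general buffering-and-merge idea behind disk-backed aggregation (not the actual WB Tree or CBT data structures), here is a toy Python GroupBy-Aggregate that keeps partial sums in a bounded in-memory buffer and spills partial aggregates to temporary files, merging everything when the result is requested.

```python
import json
import tempfile
from collections import defaultdict

class BufferedAggregator:
    """Toy incremental GroupBy-Aggregate: sums values per key in a bounded
    in-memory buffer and spills partial aggregates to disk when it fills up."""

    def __init__(self, max_keys_in_memory=1000):
        self.max_keys = max_keys_in_memory
        self.buffer = defaultdict(float)
        self.spill_files = []

    def add(self, key, value):
        self.buffer[key] += value
        if len(self.buffer) >= self.max_keys:
            self._spill()

    def _spill(self):
        f = tempfile.NamedTemporaryFile("w", delete=False, suffix=".json")
        json.dump(self.buffer, f)
        f.close()
        self.spill_files.append(f.name)
        self.buffer = defaultdict(float)

    def result(self):
        # Merge in-memory partials with all spilled partials.
        totals = defaultdict(float, self.buffer)
        for path in self.spill_files:
            with open(path) as f:
                for key, value in json.load(f).items():
                    totals[key] += value
        return dict(totals)

agg = BufferedAggregator(max_keys_in_memory=2)
for key, value in [("a", 1.0), ("b", 2.0), ("a", 3.0), ("c", 4.0)]:
    agg.add(key, value)
print(agg.result())   # {'a': 4.0, 'b': 2.0, 'c': 4.0}
```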
21

Green, Oded. "High performance computing for irregular algorithms and applications with an emphasis on big data analytics." Diss., Georgia Institute of Technology, 2014. http://hdl.handle.net/1853/51860.

Full text
Abstract:
Irregular algorithms such as graph algorithms, sorting, and sparse matrix multiplication present numerous programming challenges, including scalability, load balancing, and efficient memory utilization. In this age of Big Data we face additional challenges, since the data is often streaming at a high velocity and we wish to make near real-time decisions for real-world events. For instance, we may wish to track Twitter for the pandemic spread of a virus. Analyzing such data sets requires combining algorithmic optimizations with the utilization of massively multithreaded architectures, accelerators such as GPUs, and distributed systems. My research focuses upon designing new analytics and algorithms for the continuous monitoring of dynamic social networks. Achieving high performance computing for irregular algorithms such as Social Network Analysis (SNA) is challenging, as the instruction flow is highly data dependent and requires domain expertise. The rapid changes in the underlying network necessitate understanding real-world graph properties such as the small world property, shrinking network diameter, power law distribution of edges, and the rate at which updates occur. These properties, with respect to a given analytic, can help design load-balancing techniques, avoid wasteful (redundant) computations, and create streaming algorithms. In the course of my research I have considered several parallel programming paradigms for a wide range of multithreaded platforms: x86, NVIDIA's CUDA, Cray XMT2, SSE-SIMD, and Plurality's HyperCore. These unique programming models require examination of parallel programming at multiple levels: algorithmic design, cache efficiency, fine-grain parallelism, memory bandwidth, data management, load balancing, scheduling, control flow models, and more. This thesis deals with these issues and more.
22

Dash, Sajal. "Exploring the Landscape of Big Data Analytics Through Domain-Aware Algorithm Design." Diss., Virginia Tech, 2020. http://hdl.handle.net/10919/99798.

Full text
Abstract:
Experimental and observational data emerging from various scientific domains necessitate fast, accurate, and low-cost analysis. While exploring the landscape of big data analytics, multiple challenges arise from three characteristics of big data: the volume, the variety, and the velocity. Here volume represents the data's size, variety the various sources and formats of the data, and velocity the data arrival rate. High volume and velocity of the data warrant a large amount of storage, memory, and compute power, while a large variety of data demands cognition across domains. Addressing domain-intrinsic properties of data can help us analyze the data efficiently through the frugal use of high-performance computing (HPC) resources. In this thesis, we present our exploration of the data analytics landscape with domain-aware approximate and incremental algorithm design. We propose three guidelines targeting three properties of big data for domain-aware big data analytics: (1) explore geometric (pair-wise distance and distribution-related) and domain-specific properties of high dimensional data for succinct representation, which addresses the volume property, (2) design domain-aware algorithms through mapping of domain problems to computational problems, which addresses the variety property, and (3) leverage incremental arrival of data through incremental analysis and invention of problem-specific merging methodologies, which addresses the velocity property. We demonstrate these three guidelines through the solution approaches of three representative domain problems. We present Claret, a fast and portable parallel weighted multi-dimensional scaling (WMDS) tool, to demonstrate the application of the first guideline. It combines algorithmic concepts extended from stochastic force-based multi-dimensional scaling (SF-MDS) and Glimmer. Claret computes approximate weighted Euclidean distances by combining a novel data mapping called stretching with the Johnson-Lindenstrauss lemma to reduce the complexity of WMDS from O(f(n)d) to O(f(n) log d). In demonstrating the second guideline, we map the problem of identifying multi-hit combinations of genetic mutations responsible for cancers to the weighted set cover (WSC) problem by leveraging the semantics of cancer genomic data obtained from cancer biology. Solving the mapped WSC with an approximate algorithm, we identified a set of multi-hit combinations that differentiate between tumor and normal tissue samples. To identify three- and four-hit combinations, which require orders of magnitude more computational power, we scaled the WSC algorithm out on a hundred nodes of the Summit supercomputer, solving the problem in less than two hours instead of an estimated hundred years. In demonstrating the third guideline, we developed a tool, iBLAST, to perform incremental sequence similarity search. Developing new statistics to combine search results over time makes incremental analysis feasible. iBLAST performs (1+δ)/δ times faster than NCBI BLAST, where δ represents the fraction of database growth. We also explored various approaches to mitigate catastrophic forgetting in the incremental training of deep learning models, where a model forgets how to perform machine learning tasks efficiently on older data in a streaming setting.
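A small sketch of the Johnson-Lindenstrauss idea referenced above: projecting high-dimensional points through a scaled Gaussian random matrix approximately preserves pairwise Euclidean distances. This is not Claret's stretching/WMDS pipeline; the dimensions and data are arbitrary.

```python
import numpy as np

def random_projection(points, target_dim, seed=0):
    """Johnson-Lindenstrauss-style random projection: a k x d Gaussian matrix
    scaled by 1/sqrt(k) approximately preserves pairwise Euclidean distances."""
    rng = np.random.default_rng(seed)
    d = points.shape[1]
    proj = rng.normal(size=(target_dim, d)) / np.sqrt(target_dim)
    return points @ proj.T

rng = np.random.default_rng(1)
high_dim = rng.normal(size=(5, 10_000))          # 5 points in 10,000 dimensions
low_dim = random_projection(high_dim, target_dim=256)

orig = np.linalg.norm(high_dim[0] - high_dim[1])
approx = np.linalg.norm(low_dim[0] - low_dim[1])
print(f"original distance {orig:.1f}, projected distance {approx:.1f}")
```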
23

Lopez, Inga Milton Elvis, and Huaranga Ricardo Martín Guerrero. "Modelo de business intelligence y analytics soportado por la tecnologia cloud computing para pymes del sector retail." Bachelor's thesis, Universidad Peruana de Ciencias Aplicadas (UPC), 2017. http://hdl.handle.net/10757/622650.

Full text
Abstract:
The main challenge for Peruvian SMEs is obtaining information for decision making. In this context, traditional data analysis technologies such as Business Intelligence and Analytics are not very accessible to these companies, due to economic and human capital limitations. The main objective of the project is to implement a technological model that combines Business Intelligence and Analytics with Cloud Computing, to allow retail SMEs to integrate and process their data and make informed and timely decisions regarding inventory planning and management, with a low cost of implementation and deployment. For the development of the project, a preliminary investigation of Business Intelligence, Analytics, and Cloud Computing technologies is carried out, covering the applications of each technology and success cases of retail SMEs worldwide. Then, a technological model oriented to the needs of Peruvian retail SMEs is designed, accompanied by an implementation plan based on Business Intelligence and Cloud Computing methodologies and an analysis of the cloud service providers that best suit this type of business. The model is validated through its implementation in a Peruvian retail SME. To accomplish this, a business and technological infrastructure analysis is carried out and the key information requirements are identified. The result is the integration of previously isolated information from multiple stores, a 94% reduction in data consolidation time, and a 20% reduction in costs. Finally, a continuity plan is proposed that allows the functionalities of the model to be scaled, oriented to the technological trends of the retail sector.
APA, Harvard, Vancouver, ISO, and other styles
24

Dahlberg, Oskar. "Analytics as a service : Utvärdering över funktionalitet inom Cloud BI." Thesis, Högskolan i Skövde, Institutionen för informationsteknologi, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-18795.

Full text
Abstract:
Business Intelligence (BI) helps organisations reach their goals through more effective decision making. New technology is constantly emerging, and applying Cloud Computing (CC) together with BI offers many advantages. This is a very new technology with a low level of maturity that is constantly evolving, and this report describes the functionality that is available when CC is used to perform BI. Interviews were conducted to collect qualitative data describing this phenomenon, in order to give organisations more insight into how it works. By conducting semi-structured interviews with plenty of room for discussion, many interesting patterns were identified. A total of four respondents who have worked or are working with BI and have experience of CC participated. The results focus on providing suggestions and ideas for organisations considering a CC solution for performing BI. The maturity level of this new technology is still very low, and areas of use, data types and visualisation functions are discussed.
APA, Harvard, Vancouver, ISO, and other styles
25

Maguire, Eamonn James. "Systematising glyph design for visualization." Thesis, University of Oxford, 2014. http://ora.ox.ac.uk/objects/uuid:b98ccce1-038f-4c0a-a259-7f53dfe06ac7.

Full text
Abstract:
The digitalisation of information now affects most fields of human activity. From the social sciences to biology to physics, the volume, velocity, and variety of data exhibit exponential growth trends. With such rates of expansion, efforts to understand and make sense of datasets of such scale, however driven and directed, progress only at an incremental pace. The challenges are significant. For instance, the ability to display an ever-growing amount of data is physically and naturally bound by the dimensions of the average-sized display. A synergistic interplay between statistical analysis and visualisation approaches outlines a path for significant advances in the field of data exploration. We can turn to statistics to provide principled guidance for prioritisation of information to display. Using statistical results, and combining knowledge from the cognitive sciences, visual techniques can be used to highlight salient data attributes. The purpose of this thesis is to explore the link between computer science, statistics, visualization, and the cognitive sciences, to define and develop more systematic approaches towards the design of glyphs. Glyphs represent the variables of multivariate data records by mapping those variables to one or more visual channels (e.g., colour, shape, and texture). They offer a unique, compact solution to the presentation of a large amount of multivariate information. However, composing a meaningful, interpretable, and learnable glyph can pose a number of problems. The first of these problems exists in the subjectivity involved in the process of data-to-visual-channel mapping, and in the organisation of those visual channels to form the overall glyph. Our first contribution outlines a computational technique to help systematise many of these otherwise subjective elements of the glyph design process. For visual information compression, common patterns (motifs) in time series or graph data, for example, may be replaced with more compact visual representations. Glyph-based techniques can provide such representations that can help users find common patterns more quickly, and at the same time, bring attention to anomalous areas of the data. However, replacing any data with a glyph is not going to make tasks such as visual search easier. A key problem is the selection of semantically meaningful motifs with the potential to compress large amounts of information. A second contribution of this thesis is a computational process for systematic design of such glyph libraries and their subsequent glyphs. A further problem in the glyph design process is in their evaluation. Evaluation is typically a time-consuming, highly subjective process. Moreover, domain experts are not always plentiful, therefore obtaining statistically significant evaluation results is often difficult. A final contribution of this work is to investigate if there are areas of evaluation that can be performed computationally.
APA, Harvard, Vancouver, ISO, and other styles
26

Ens, Barrett. "Spatial Analytic Interfaces." ACM, 2014. http://hdl.handle.net/1993/31595.

Full text
Abstract:
We propose the concept of spatial analytic interfaces (SAIs) as a tool for performing in-situ, everyday analytic tasks. Mobile computing is now ubiquitous and provides access to information at nearly any time or place. However, current mobile interfaces do not easily enable the type of sophisticated analytic tasks that are now well-supported by desktop computers. Conversely, desktop computers, with large available screen space to view multiple data visualizations, are not always available at the ideal time and place for a particular task. Spatial user interfaces, leveraging state-of-the-art miniature and wearable technologies, can potentially provide intuitive computer interfaces to deal with the complexity needed to support everyday analytic tasks. These interfaces can be implemented with versatile form factors that provide mobility for doing such taskwork in-situ, that is, at the ideal time and place. We explore the design of spatial analytic interfaces for in-situ analytic tasks that leverage the benefits of an upcoming generation of light-weight, see-through, head-worn displays. We propose how such a platform can meet the five primary design requirements for personal visual analytics: mobility, integration, interpretation, multiple views and interactivity. We begin with a design framework for spatial analytic interfaces based on a survey of existing designs of spatial user interfaces. We then explore how to best meet these requirements through a series of design concepts, user studies and prototype implementations. Our result is a holistic exploration of the spatial analytic concept on a head-worn display platform.
APA, Harvard, Vancouver, ISO, and other styles
27

Morabito, Andrea. "Utilizzo di Scala e Spark per l'esecuzione di programmi Data-Intensive in ambiente cloud." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2017. http://amslaurea.unibo.it/14843/.

Full text
Abstract:
This document provides an introduction to the world of big data and aims to give a clear and complete overview of a programming language and a framework useful for manipulating large datasets, and of how they are able to interoperate: Scala, a programming language based on two programming paradigms, Object-Oriented and Functional; and Spark, which can be seen as a framework for distributed computing and Big Data analysis. After an introduction to the context in Chapter 1, Chapter 2 describes the main constructs of the Scala programming language, which exploits an actor-based message-passing communication model. Chapter 3 then describes the Spark framework, with its architecture and its RDD-based programming subsystem. Chapter 4 concludes with the demonstration of a use case of the two technologies, in which a dataset from SNAP is taken, PageRank is applied to it (also thanks to the GraphX library), and the computation is executed on the Amazon Web Services EC2 platform.
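The thesis applies PageRank via Scala and GraphX; as a hedged illustration of the kind of RDD-based computation involved (not the thesis code), the sketch below shows a plain-RDD PageRank in PySpark over a SNAP-style edge list, with a hypothetical input file name.

```python
from pyspark import SparkContext

sc = SparkContext(appName="pagerank-sketch")

# Hypothetical SNAP-style edge list: one "src dst" pair per line.
edges = (sc.textFile("edges.txt")
           .map(lambda line: tuple(line.split()))
           .distinct())

# Adjacency lists and initial ranks.
links = edges.groupByKey().cache()
ranks = links.mapValues(lambda _: 1.0)

for _ in range(10):  # fixed number of PageRank iterations
    # Each page distributes its rank evenly over its out-links.
    contribs = links.join(ranks).flatMap(
        lambda kv: [(dst, kv[1][1] / len(kv[1][0])) for dst in kv[1][0]])
    # Damped sum of incoming contributions.
    ranks = (contribs.reduceByKey(lambda a, b: a + b)
                     .mapValues(lambda s: 0.15 + 0.85 * s))

print(ranks.takeOrdered(10, key=lambda kv: -kv[1]))
```

Submitted with `spark-submit`, this mirrors the structure of the classic Spark PageRank example; GraphX itself is only available from Scala/Java, which is why the sketch stays at the RDD level.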
APA, Harvard, Vancouver, ISO, and other styles
28

McClurg, Josiah. "Fast demand response with datacenter loads: a green dimension of big data." Diss., University of Iowa, 2017. https://ir.uiowa.edu/etd/5811.

Full text
Abstract:
Demand response is one of the critical technologies necessary for allowing large-scale penetration of intermittent renewable energy sources in the electric grid. Data centers are especially attractive candidates for providing flexible, real-time demand response services to the grid because they are capable of fast power ramp-rates, large dynamic range, and finely-controllable power consumption. This thesis makes a contribution toward implementing load shaping with server clusters through a detailed experimental investigation of three broadly-applicable datacenter workload scenarios. We experimentally demonstrate the eminent feasibility of datacenter demand response with a distributed video transcoding application and a simple distributed power controller. We also show that while some software power capping interfaces performed better than others, all the interfaces we investigated had the high dynamic range and low power variance required to achieve high quality power tracking. Our next investigation presents an empirical performance evaluation of algorithms that replace arithmetic operations with low-level bit operations for power-aware Big Data processing. Specifically, we compare two different data structures in terms of execution time and power efficiency: (a) a baseline design using arrays, and (b) a design using bit-slice indexing (BSI) and distributed BSI arithmetic. Across three different datasets and three popular queries, we show that the bit-slicing queries consistently outperform the array algorithm in both power efficiency and execution time. In the context of datacenter power shaping, this performance optimization enables additional power flexibility -- achieving the same or greater performance than the baseline approach, even under power constraints. The investigation of read-optimized index queries leads up to an experimental investigation of the tradeoffs among power constraint, query freshness, and update aggregation size in a dynamic big data environment. We compare several update strategies, presenting a bitmap update optimization that allows improved performance over both a baseline approach and an existing state-of-the-art update strategy. Performing this investigation in the context of load shaping, we show that read-only range queries can be served without performance impact under power cap, and index updates can be tuned to provide a flexible base load. This thesis concludes with a brief discussion of control implementation and summary of our findings.
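The bit-slice indexing (BSI) approach mentioned above replaces per-row arithmetic with bitwise operations over per-bit columns. The following is a minimal, self-contained sketch of that idea (not the thesis code, nor its distributed BSI arithmetic): it packs a small integer column into bit-slices and answers a "greater than" filter using only AND/OR/NOT.

```python
def to_bit_slices(values, width):
    """Pack a column of non-negative ints into `width` bit-slices.
    Bit-slice b has bit i set iff bit b of values[i] is set."""
    slices = [0] * width
    for i, v in enumerate(values):
        for b in range(width):
            if (v >> b) & 1:
                slices[b] |= 1 << i
    return slices


def count_greater_than(slices, threshold, n_rows):
    """Count rows with value > threshold using only bitwise operations
    (MSB-to-LSB scan over the bit-slices)."""
    gt = 0                     # bitmap of rows already known to be greater
    eq = (1 << n_rows) - 1     # bitmap of rows still matching the threshold prefix
    for b in reversed(range(len(slices))):
        if (threshold >> b) & 1:
            eq &= slices[b]            # a 0 bit here means the value is smaller
        else:
            gt |= eq & slices[b]       # a 1 bit where the threshold has 0 => greater
            eq &= ~slices[b]
    return bin(gt).count("1")


values = [5, 2, 7, 4, 9]
slices = to_bit_slices(values, width=4)
assert count_greater_than(slices, 4, len(values)) == sum(v > 4 for v in values)
```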
APA, Harvard, Vancouver, ISO, and other styles
29

Lambert, Glenn M. II. "Security Analytics: Using Deep Learning to Detect Cyber Attacks." UNF Digital Commons, 2017. http://digitalcommons.unf.edu/etd/728.

Full text
Abstract:
Security attacks are becoming more prevalent as cyber attackers exploit system vulnerabilities for financial gain. The resulting loss of revenue and reputation can have deleterious effects on governments and businesses alike. Signature recognition and anomaly detection are the most common security detection techniques in use today. These techniques provide a strong defense. However, they fall short of detecting complicated or sophisticated attacks. Recent literature suggests using security analytics to differentiate between normal and malicious user activities. The goal of this research is to develop a repeatable process to detect cyber attacks that is fast, accurate, comprehensive, and scalable. A model was developed and evaluated using several production log files provided by the University of North Florida Information Technology Security department. This model uses security analytics to complement existing security controls to detect suspicious user activity occurring in real time by applying machine learning algorithms to multiple heterogeneous server-side log files. The process is linearly scalable and comprehensive; as such it can be applied to any enterprise environment. The process is composed of three steps. The first step is data collection and transformation, which involves identifying the source log files and selecting a feature set from those files. The resulting feature set is then transformed into a time series dataset using a sliding time window representation. Each instance of the dataset is labeled as green, yellow, or red using three different unsupervised learning methods, one of which is Partitioning around Medoids (PAM). The final step uses Deep Learning to train and evaluate the model that will be used for detecting abnormal or suspicious activities. Experiments using datasets of varying sizes and time granularities resulted in very high accuracy and performance. The time required to train and test the model was surprisingly short even for large datasets. This is the first research paper that develops a model to detect cyber attacks using security analytics; hence this research builds a foundation on which to expand for future research in this subject area.
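As a hedged sketch of the sliding-time-window step described above (not the author's code; the feature names and window length are hypothetical), the following shows how per-interval log features can be turned into a time-series dataset ready for unsupervised labeling or model training:

```python
import numpy as np

def sliding_windows(features, window):
    """Turn a (time, n_features) array of per-interval log features into
    overlapping windows of shape (n_windows, window * n_features)."""
    t, _ = features.shape
    return np.stack([features[i:i + window].ravel()
                     for i in range(t - window + 1)])

# Hypothetical per-minute features extracted from server-side logs:
# [failed_logins, distinct_ips, bytes_out]
log_features = np.random.poisson(lam=[2, 5, 300], size=(120, 3)).astype(float)
X = sliding_windows(log_features, window=10)
print(X.shape)  # (111, 30): each row summarises one 10-minute sliding window
```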
APA, Harvard, Vancouver, ISO, and other styles
30

Kapitán, Lukáš. "Vliv vývojových trendů na řešení projektu BI." Master's thesis, Vysoká škola ekonomická v Praze, 2012. http://www.nusl.cz/ntk/nusl-150006.

Full text
Abstract:
The aim of this thesis is to analyse the trends occurring in Business Intelligence. It examines, summarises and assesses each of the trends from the point of view of their usability in the real world and their influence on, and modification of, each phase of a Business Intelligence implementation. Each of these trends has its positives and negatives, which can influence the statements in the evaluation; these factors are taken into consideration and analysed as well. The advantages and disadvantages of the trends appear especially in the areas of economic demands and technical difficulty. The main aim is to compare the methods of implementation of Business Intelligence with current trends in BI. In order to achieve this, a few crucial points were set: to investigate recent trends in BI and to define the methods of implementation in the broadest terms. The expected benefit of this thesis is the aforementioned investigation and analysis of trends in the area of Business Intelligence and their use in implementation.
APA, Harvard, Vancouver, ISO, and other styles
31

Nalluri, Joseph Jayakar. "NETWORK ANALYTICS FOR THE MIRNA REGULOME AND MIRNA-DISEASE INTERACTIONS." VCU Scholars Compass, 2017. http://scholarscompass.vcu.edu/etd/5012.

Full text
Abstract:
miRNAs are non-coding RNAs of approx. 22 nucleotides in length that inhibit gene expression at the post-transcriptional level. By virtue of this gene regulation mechanism, miRNAs play a critical role in several biological processes and patho-physiological conditions, including cancers. miRNA behavior is a result of a multi-level complex interaction network involving miRNA-mRNA, TF-miRNA-gene, and miRNA-chemical interactions; hence the precise patterns through which a miRNA regulates certain disease(s) are still elusive. Herein, I have developed an integrative genomics pipeline to (i) build a miRNA regulomics and data analytics repository, and (ii) create/model these interactions as networks and use optimization techniques, motif-based analyses, network inference strategies and influence diffusion concepts to predict miRNA regulations and their roles in diseases, especially those related to cancers. By these methods, we are able to determine the regulatory behavior of miRNAs and potential causal miRNAs in specific diseases, as well as potential biomarkers/targets for drug and medicinal therapeutics.
APA, Harvard, Vancouver, ISO, and other styles
32

Talevi, Iacopo. "Big Data Analytics and Application Deployment on Cloud Infrastructure." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2017. http://amslaurea.unibo.it/14408/.

Full text
Abstract:
This dissertation describes a project begun in October 2016. It was born from the collaboration between Mr. Alessandro Bandini and me, and has been developed under the supervision of professor Gianluigi Zavattaro. The main objective was to study, and in particular to experiment with, cloud computing in general and its potential in the field of data elaboration. Cloud computing is a utility-oriented and Internet-centric way of delivering IT services on demand. The first chapter is a theoretical introduction on cloud computing, analyzing the main aspects, the keywords, and the technologies behind clouds, as well as the reasons for the success of this technology and its problems. After the introduction section, I briefly describe the three main cloud platforms in the market. During this project we developed a simple Social Network. Consequently, in the third chapter I analyze the social network development, with the initial solution realized through Amazon Web Services and the steps we took to obtain the final version using Google Cloud Platform with its characteristics. To conclude, the last section is dedicated to data elaboration and contains an initial theoretical part that describes MapReduce and Hadoop, followed by a description of our analysis. We used Google App Engine to execute these elaborations on a large dataset. I explain the basic idea, the code and the problems encountered.
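Since the final chapter of the abstract centres on MapReduce, a minimal sketch may help make the model concrete. This is a generic Hadoop-Streaming-style word count (not the thesis code, which runs on Google App Engine): the mapper emits key/value pairs and the reducer aggregates the values for each key.

```python
# Word count in the MapReduce style. In Hadoop Streaming the mapper and
# reducer would run as separate processes connected by a sort on the key;
# here the shuffle is simulated with an in-process sort for brevity.
import sys
from itertools import groupby

def mapper(lines):
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def reducer(pairs):
    # pairs must arrive sorted by key, as the shuffle phase guarantees
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

if __name__ == "__main__":
    mapped = sorted(mapper(sys.stdin))          # simulate the shuffle/sort
    for word, total in reducer(mapped):
        print(f"{word}\t{total}")
```

Usage: `cat input.txt | python wordcount.py`.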
APA, Harvard, Vancouver, ISO, and other styles
33

Sharma, Rahil. "Shared and distributed memory parallel algorithms to solve big data problems in biological, social network and spatial domain applications." Diss., University of Iowa, 2016. https://ir.uiowa.edu/etd/2277.

Full text
Abstract:
Big data refers to information which cannot be processed and analyzed using traditional approaches and tools, due to 4 V's - sheer Volume, Velocity at which data is received and processed, and data Variety and Veracity. Today massive volumes of data originate in domains such as geospatial analysis, biological and social networks, etc. Hence, designing scalable algorithms for efficient processing of this massive data is a significant challenge in the field of computer science. One way to achieve such efficient and scalable algorithms is by using shared & distributed memory parallel programming models. In this thesis, we present a variety of such algorithms to solve problems in the various above-mentioned domains. We solve five problems that fall into two categories. The first group of problems deals with the issue of community detection. Detecting communities in real world networks is of great importance because they consist of patterns that can be viewed as independent components, each of which has distinct features and can be detected based upon network structure. For example, communities in social networks can help target users for marketing purposes, provide user recommendations to connect with and join communities or forums, etc. We develop a novel sequential algorithm to accurately detect community structures in biological protein-protein interaction networks, where a community corresponds with a functional module of proteins. Generally, such sequential algorithms are computationally expensive, which makes them impractical to use for large real world networks. To address this limitation, we develop a new highly scalable Symmetric Multiprocessing (SMP) based parallel algorithm to detect high quality communities in large subsections of social networks like Facebook and Amazon. Due to the SMP architecture, however, our algorithm cannot process networks whose size is greater than the size of the RAM of a single machine. With the increasing size of social networks, community detection has become even more difficult, since network size can reach up to hundreds of millions of vertices and edges. Processing such massive networks requires several hundred gigabytes of RAM, which is only possible by adopting distributed infrastructure. To address this, we develop a novel hybrid (shared + distributed memory) parallel algorithm to efficiently detect high quality communities in massive Twitter and .uk domain networks. The second group of problems deals with the issue of efficiently processing spatial Light Detection and Ranging (LiDAR) data. LiDAR data is widely used in forest and agricultural crop studies, landscape classification, 3D urban modeling, etc. Technological advancements in building LiDAR sensors have enabled highly accurate and dense LiDAR point clouds resulting in massive data volumes, which pose computing issues with processing and storage. We develop the first published landscape driven data reduction algorithm, which uses the slope-map of the terrain as a filter to reduce the data without sacrificing its accuracy. Our algorithm is highly scalable and adopts shared memory based parallel architecture. We also develop a parallel interpolation technique that is used to generate highly accurate continuous terrains, i.e. Digital Elevation Models (DEMs), from discrete LiDAR point clouds.
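As a rough illustration of the community-detection task described above (the thesis develops its own sequential, SMP and hybrid algorithms; the sketch below is a generic, single-threaded label-propagation pass, not those algorithms):

```python
import random
from collections import Counter, defaultdict

def label_propagation(edges, max_iters=20, seed=0):
    """Assign each vertex the most frequent label among its neighbours,
    iterating until labels stabilise. Returns {vertex: community_label}."""
    rng = random.Random(seed)
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    labels = {v: v for v in adj}              # start with one community per vertex
    for _ in range(max_iters):
        changed = False
        order = list(adj)
        rng.shuffle(order)                     # random update order each sweep
        for v in order:
            counts = Counter(labels[u] for u in adj[v])
            best = max(counts.items(), key=lambda kv: kv[1])[0]
            if best != labels[v]:
                labels[v] = best
                changed = True
        if not changed:
            break
    return labels

# Two triangles joined by a bridge edge; label propagation usually
# resolves this into two communities.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(label_propagation(edges))
```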
APA, Harvard, Vancouver, ISO, and other styles
34

Björnbom, Willie, and Alexander Eklöf. "VISUELL PRESENTATION AV VÄDERDATA OCH ELPRISERE TT ARBETE OM DATABASMODELLERING I MOLNET MED BUSINESS INTELLIGENCE." Thesis, Örebro universitet, Institutionen för naturvetenskap och teknik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:oru:diva-79491.

Full text
Abstract:
In an environment where data flows everywhere and in all forms, it can be difficult to extract something valuable from it. Business Intelligence, also known as BI, is a technology used to transform information into a valuable resource, primarily for companies with large amounts of information. But what opportunities does BI offer? In this essay, we use standardized techniques, popular tools and cloud services to carry out a complete BI project. We generate a report in which we analyze whether there is any correlation between electricity prices and different types of weather data. After the practical part of the work, we use our experience of the cloud to dig deeper into how safe the cloud really is. We compare the concerns that ordinary users have about the cloud with how the cloud service provider (CSP) Azure addresses them.
APA, Harvard, Vancouver, ISO, and other styles
35

Cayllahua, Huaman Erick Eduardo, and Arias Felipe Anthonino Ramos. "Desarrollo de un modelo de BI & Analytics usando infraestructura Cloud para la Gestión de PMO en una consultora de TI." Bachelor's thesis, Universidad Peruana de Ciencias Aplicadas (UPC), 2020. http://hdl.handle.net/10757/652806.

Full text
Abstract:
This thesis project aims to analyze, design and model the software architecture for the PMO management process. This architectural model will be used as a support base for the call and ticket management processes of the IT consultancy "NECSIA". The purpose of the project is to resolve the problematic situation of this process, which is examined through an in-depth analysis detailed later in the document. The critical point of this problematic situation is that many data extraction, transformation and homologation activities are carried out manually, which prevents a correct centralization of the data flow in the company. The project proposes a BI and Analytics solution built around a 4C architectural model that integrates the various sources of information in a unified repository in the Cloud. In this way, adequate data management and governance can be achieved, especially for the historical calculations of the projects involved in the PMO management process. In this context, the document proposes the use of the Zachman framework to carry out an in-depth analysis of the business in order to align the evaluated process with the strategic objectives of the business. For the modeling of the business processes, the BPMN notation was used; this standard allows improving the decomposition and modularization of the activities involved in the processes. Finally, the present BI & Analytics solution seeks to be part of continuous change and to stay aligned with the strategic objectives of the company.
APA, Harvard, Vancouver, ISO, and other styles
36

Lemon, Alexander Michael. "A Shared-Memory Coupled Architecture to Leverage Big Data Frameworks in Prototyping and In-Situ Analytics for Data Intensive Scientific Workflows." BYU ScholarsArchive, 2019. https://scholarsarchive.byu.edu/etd/7545.

Full text
Abstract:
There is a pressing need for creative new data analysis methods which can sift through scientific simulation data and produce meaningful results. The types of analyses and the amount of data handled by current methods are still quite restricted, and new methods could provide scientists with a large productivity boost. New methods could be simple to develop in big data processing systems such as Apache Spark, which is designed to process many input files in parallel while treating them logically as one large dataset. This distributed model, combined with the large number of analysis libraries created for the platform, makes Spark ideal for processing simulation output. Unfortunately, the filesystem becomes a major bottleneck in any workflow that uses Spark in such a fashion. Faster transports are not intrinsically supported by Spark, and its interface almost denies the possibility of maintainable third-party extensions. By leveraging the semantics of Scala and Spark's recent scheduler upgrades, we force co-location of Spark executors with simulation processes and enable fast local inter-process communication through shared memory. This provides a path for bulk data transfer into the Java Virtual Machine, removing the current Spark ingestion bottleneck. Besides showing that our system makes this transfer feasible, we also demonstrate a proof-of-concept system integrating traditional HPC codes with bleeding-edge analytics libraries. This provides scientists with guidance on how to apply our libraries to gain a new and powerful tool for developing new analysis techniques in large scientific simulation pipelines.
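The coupling described above hinges on handing simulation output to analytics through shared memory rather than the filesystem. A minimal sketch of that idea in Python (the thesis works with Scala/Spark executors and the JVM; the segment name and array sizes here are hypothetical):

```python
import numpy as np
from multiprocessing import shared_memory

# --- "simulation" side: publish a timestep without touching the filesystem ---
field = np.random.rand(1024, 1024)                    # hypothetical simulation output
shm = shared_memory.SharedMemory(create=True, size=field.nbytes, name="sim_step_042")
np.ndarray(field.shape, dtype=field.dtype, buffer=shm.buf)[:] = field

# --- "analytics" side: attach to the same segment and compute without copying files ---
reader = shared_memory.SharedMemory(name="sim_step_042")
view = np.ndarray((1024, 1024), dtype=np.float64, buffer=reader.buf)
print("mean of shared timestep:", view.mean())

reader.close()
shm.close()
shm.unlink()   # free the segment once both sides are done
```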
APA, Harvard, Vancouver, ISO, and other styles
37

Veras, Richard Michael. "A Systematic Approach for Obtaining Performance on Matrix-Like Operations." Research Showcase @ CMU, 2017. http://repository.cmu.edu/dissertations/1011.

Full text
Abstract:
Scientific Computation provides a critical role in the scientific process because it allows us to ask complex queries and test predictions that would otherwise be unfeasible to perform experimentally. Because of its power, Scientific Computing has helped drive advances in many fields ranging from Engineering and Physics to Biology and Sociology to Economics and Drug Development and even to Machine Learning and Artificial Intelligence. Common among these domains is the desire for timely computational results, thus a considerable amount of human expert effort is spent towards obtaining performance for these scientific codes. However, this is no easy task because each of these domains presents its own unique set of challenges to software developers, such as domain specific operations, structurally complex data and ever-growing datasets. Compounding these problems are the myriad constantly changing, complex and unique hardware platforms that an expert must target. Unfortunately, an expert is typically forced to reproduce their effort across multiple problem domains and hardware platforms. In this thesis, we demonstrate the automatic generation of expert-level high-performance scientific codes for Dense Linear Algebra (DLA), Structured Mesh (Stencil), Sparse Linear Algebra and Graph Analytics. In particular, this thesis seeks to address the issue of obtaining performance on many complex platforms for a certain class of matrix-like operations that span across many scientific, engineering and social fields. We do this by automating a method used for obtaining high performance in DLA and extending it to structured, sparse and scale-free domains. We argue that it is through the use of the underlying structure found in the data from these domains that enables this process. Thus, obtaining performance for most operations does not occur in isolation of the data being operated on, but instead depends significantly on the structure of the data.
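As a toy illustration of the kind of structure-aware optimization the abstract refers to for dense linear algebra (a generic cache-blocking sketch, not the generated code the thesis produces):

```python
import numpy as np

def blocked_matmul(A, B, block=64):
    """Cache-blocked dense matrix multiply: operate on block x block tiles so
    that each tile of A, B and C stays resident in cache while it is reused."""
    n, k = A.shape
    k2, m = B.shape
    assert k == k2
    C = np.zeros((n, m))
    for i in range(0, n, block):
        for j in range(0, m, block):
            for p in range(0, k, block):
                C[i:i+block, j:j+block] += A[i:i+block, p:p+block] @ B[p:p+block, j:j+block]
    return C

A, B = np.random.rand(256, 256), np.random.rand(256, 256)
assert np.allclose(blocked_matmul(A, B), A @ B)
```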
APA, Harvard, Vancouver, ISO, and other styles
38

Ibidunmoye, Olumuyiwa. "Performance anomaly detection and resolution for autonomous clouds." Doctoral thesis, Umeå universitet, Institutionen för datavetenskap, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-142033.

Full text
Abstract:
Fundamental properties of cloud computing such as resource sharing and on-demand self-servicing are driving a growing adoption of the cloud for hosting both legacy and new application services. A consequence of this growth is that the increasing scale and complexity of the underlying cloud infrastructure, as well as the fluctuating service workloads, is inducing performance incidents at a higher frequency than ever before, with far-reaching impact on revenue, reliability, and reputation. Hence, effectively managing performance incidents, with emphasis on timely detection, diagnosis and resolution, has become a necessity rather than a luxury. While other aspects of cloud management such as monitoring and resource management are experiencing greater automation, automated management of performance incidents remains a major concern. Given the volume of operational data produced by cloud datacenters and services, this thesis focuses on how data analytics techniques can be used for cloud performance management. In particular, this work investigates techniques and models for automated performance anomaly detection and prevention in cloud environments. To familiarize with developments in the research area, we present the outcome of an extensive survey of existing research contributions addressing various aspects of performance problem management in diverse systems domains. We discuss the design and evaluation of analytics models and algorithms for detecting performance anomalies in the real-time behaviour of cloud datacenter resources and hosted services at different resolutions. We also discuss the design of a semi-supervised machine learning approach for mitigating performance degradation by actively driving quality of service from undesirable states to a desired target state via incremental capacity optimization. The research methods used in this thesis include experiments on real virtualized testbeds to evaluate aspects of the proposed techniques, while other aspects are evaluated using performance traces from real-world datacenters. Insights and outcomes from this thesis can be used by both cloud and service operators to enhance the automation of performance problem detection, diagnosis and resolution. They also have the potential to spur further research in the area, while being applicable in related domains such as Internet of Things (IoT) and industrial sensors, as well as in edge and mobile clouds.
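A hedged sketch of the kind of real-time anomaly detection the abstract refers to (a generic exponentially weighted moving average detector over a resource metric, not the thesis's models; the trace and thresholds are illustrative):

```python
def ewma_anomalies(samples, alpha=0.3, k=3.0):
    """Flag samples lying more than k smoothed standard deviations away from
    an exponentially weighted moving average of the metric."""
    mean, var, flagged = samples[0], 0.0, []
    for i, x in enumerate(samples[1:], start=1):
        std = var ** 0.5
        if std > 0 and abs(x - mean) > k * std:
            flagged.append((i, x))
        # update the smoothed mean/variance after the test
        diff = x - mean
        mean += alpha * diff
        var = (1 - alpha) * (var + alpha * diff * diff)
    return flagged

cpu_util = [22, 24, 23, 25, 24, 26, 25, 95, 24, 23]   # hypothetical CPU % trace
print(ewma_anomalies(cpu_util))                        # the spike (95) should be flagged
```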
APA, Harvard, Vancouver, ISO, and other styles
39

Peiro, Sajjad Hooman. "Towards Unifying Stream Processing over Central and Near-the-Edge Data Centers." Licentiate thesis, KTH, Programvaruteknik och Datorsystem, SCS, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-193582.

Full text
Abstract:
In this thesis, our goal is to enable and achieve effective and efficient real-time stream processing in a geo-distributed infrastructure, by combining the power of central data centers and micro data centers. Our research focus is to address the challenges of distributing the stream processing applications and placing them closer to data sources and sinks. We enable applications to run in a geo-distributed setting and provide solutions for the network-aware placement of distributed stream processing applications across geo-distributed infrastructures. First, we evaluate Apache Storm, a widely used open-source distributed stream processing system, in the community network Cloud, as an example of a geo-distributed infrastructure. Our evaluation exposes new requirements for stream processing systems to function in a geo-distributed infrastructure. Second, we propose a solution to facilitate the optimal placement of the stream processing components on geo-distributed infrastructures. We present a novel method for partitioning a geo-distributed infrastructure into a set of computing clusters, each called a micro data center. According to our results, we can increase the minimum available bandwidth in the network and likewise reduce the average latency to less than 50%. Next, we propose a parallel and distributed graph partitioner, called HoVerCut, for fast partitioning of streaming graphs. Since a lot of data can be represented in the form of a graph, graph partitioning can be used to assign the graph elements to different data centers to provide data locality for efficient processing. Last, we provide an approach, called SpanEdge, that enables stream processing systems to work on a geo-distributed infrastructure. SpanEdge unifies stream processing over the central and near-the-edge data centers (micro data centers). As a proof of concept, we implement SpanEdge by extending Apache Storm to enable it to run across multiple data centers.
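A hedged sketch of streaming graph partitioning in the spirit described above (a generic greedy edge-placement heuristic, not the HoVerCut algorithm itself): each incoming edge is assigned to the partition that already hosts its endpoints where possible, with load as a tie-breaker.

```python
from collections import defaultdict

def greedy_edge_partition(edge_stream, k):
    """Assign each edge of a streaming graph to one of k partitions, preferring
    partitions that already contain the edge's endpoints (fewer vertex replicas)."""
    loads = [0] * k
    replicas = defaultdict(set)          # vertex -> set of partitions holding it
    assignment = {}
    for u, v in edge_stream:
        # score = how many endpoints the partition already has, minus load pressure
        def score(p):
            hits = (p in replicas[u]) + (p in replicas[v])
            return (hits, -loads[p])
        best = max(range(k), key=score)
        assignment[(u, v)] = best
        loads[best] += 1
        replicas[u].add(best)
        replicas[v].add(best)
    return assignment, loads

edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (0, 3)]
assignment, loads = greedy_edge_partition(edges, k=2)
print(assignment, loads)
```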
APA, Harvard, Vancouver, ISO, and other styles
40

Dinter, Barbara, Lisa Frenzel, and Peter Gluchowski. "Tagungsband zum 20. Interuniversitären Doktorandenseminar Wirtschaftsinformatik." Technische Universität Chemnitz, 2017. https://monarch.qucosa.de/id/qucosa%3A20623.

Full text
Abstract:
The inter-university PhD seminar Business Information Systems ("Interuniversitäres Doktorandenseminar Wirtschaftsinformatik") is an annual one-day event organized by the Business Information Systems chairs of the universities of Chemnitz, Dresden, Freiberg, Halle, Ilmenau, Jena and Leipzig. It serves as a platform for PhD students to present their PhD topic and the current status of their thesis. The seminar is therefore a good opportunity to gain further knowledge and inspiration based on the feedback and questions of the participating professors and students. The 20th Interuniversitäre Doktorandenseminar Wirtschaftsinformatik took place in Chemnitz in October 2016. The resulting proceedings include five selected articles within the following topic areas: service engineering, cloud computing, business process management, requirements engineering, analytics and data quality. They illustrate the relevance as well as the broad range of topics in current business information systems research. In case of questions and comments, please use the contact details at the end of the articles.
APA, Harvard, Vancouver, ISO, and other styles
41

Gales, Mathis. "Collaborative map-exploration around large table-top displays: Designing a collaboration interface for the Rapid Analytics Interactive Scenario Explorer toolkit." Thesis, Ludwig-Maximilians-University Munich, 2018. https://eprints.qut.edu.au/115909/1/Master_Thesis_Mathis_Gales_final_opt.pdf.

Full text
Abstract:
Sense-making of spatial data on an urban level and large-scale decisions on new infrastructure projects need teamwork from experts with varied backgrounds. Technology can facilitate this collaboration process and magnify the effect of collective intelligence. Therefore, this work explores new useful collaboration interactions and visualizations for map-exploration software with a strong focus on usability. Additionally, for same-time and same-place group work, interactive table-top displays serve as a natural platform. Thus, the second aim of this project is to develop a user-friendly concept for integrating table-top displays with collaborative map-exploration. To achieve these goals, we continuously adapted the user-interface of the map-exploration software RAISE. We adopted a user-centred design approach and a simple iterative interaction design lifecycle model. Alternating between quick prototyping and user-testing phases, new design concepts were assessed and consequently improved or rejected. The necessary data was gathered through continuous dialogue with users and experts, a participatory design workshop, and a final observational study. Adopting a cross-device concept, our final prototype supports sharing information between a user’s personal device and table-top display(s). We found that this allows for a comfortable and practical separation between private and shared workspaces. The tool empowers users to share the current camera-position, data queries, and active layers between devices and with other users. We generalized further findings into a set of recommendations for designing user-friendly tools for collaborative map-exploration. The set includes recommendations regarding the sharing behaviour, the user-interface design, and the idea of playfulness in collaboration.
APA, Harvard, Vancouver, ISO, and other styles
42

Tosson, Amir [Verfasser], and Ullrich [Gutachter] Pietsch. "The way to a smarter community: exploring and exploiting data modeling, big data analytics, high-performance computing and artificial intelligence techniques for applications of 2D energy-dispersive detectors in the crystallography community / Amir Tosson ; Gutachter: Ullrich Pietsch." Siegen : Universitätsbibliothek der Universität Siegen, 2020. http://d-nb.info/1216332282/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
43

ATTANASIO, ANTONIO. "Mining Heterogeneous Urban Data at Multiple Granularity Layers." Doctoral thesis, Politecnico di Torino, 2018. http://hdl.handle.net/11583/2709888.

Full text
Abstract:
The recent development of urban areas and of the new advanced services supported by digital technologies has generated big challenges for people and city administrators, like air pollution, high energy consumption, traffic congestion, management of public events. Moreover, understanding the perception of citizens about the provided services and other relevant topics can help devising targeted actions in the management. With the large diffusion of sensing technologies and user devices, the capability to generate data of public interest within the urban area has rapidly grown. For instance, different sensors networks deployed in the urban area allow collecting a variety of data useful to characterize several aspects of the urban environment. The huge amount of data produced by different types of devices and applications brings a rich knowledge about the urban context. Mining big urban data can provide decision makers with knowledge useful to tackle the aforementioned challenges for a smart and sustainable administration of urban spaces. However, the high volume and heterogeneity of data increase the complexity of the analysis. Moreover, different sources provide data with different spatial and temporal references. The extraction of significant information from such diverse kinds of data depends also on how they are integrated, hence alternative data representations and efficient processing technologies are required. The PhD research activity presented in this thesis was aimed at tackling these issues. Indeed, the thesis deals with the analysis of big heterogeneous data in smart city scenarios, by means of new data mining techniques and algorithms, to study the nature of urban related processes. The problem is addressed focusing on both infrastructural and algorithmic layers. In the first layer, the thesis proposes the enhancement of the current leading techniques for the storage and elaboration of Big Data. The integration with novel computing platforms is also considered to support parallelization of tasks, tackling the issue of automatic scaling of resources. At algorithmic layer, the research activity aimed at innovating current data mining algorithms, by adapting them to novel Big Data architectures and to Cloud computing environments. Such algorithms have been applied to various classes of urban data, in order to discover hidden but important information to support the optimization of the related processes. This research activity focused on the development of a distributed framework to automatically aggregate heterogeneous data at multiple temporal and spatial granularities and to apply different data mining techniques. Parallel computations are performed according to the MapReduce paradigm and exploiting in-memory computing to reach near-linear computational scalability. By exploring manifold data resolutions in a relatively short time, several additional patterns of data can be discovered, allowing to further enrich the description of urban processes. Such framework is suitably applied to different use cases, where many types of data are used to provide insightful descriptive and predictive analyses. In particular, the PhD activity addressed two main issues in the context of urban data mining: the evaluation of buildings energy efficiency from different energy-related data and the characterization of people's perception and interest about different topics from user-generated content on social networks. 
For each use case within the considered applications, a specific architectural solution was designed to obtain meaningful and actionable results and to optimize the computational performance and scalability of algorithms, which were extensively validated through experimental tests.
APA, Harvard, Vancouver, ISO, and other styles
44

Payne, Pepita. "Women and computing : a discourse analytic study /." Title page, contents and abstract only, 1995. http://web4.library.adelaide.edu.au/theses/09AR.PS/09ar.psp346.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
45

Collet, Julien. "Exploration of parallel graph-processing algorithms on distributed architectures." Thesis, Compiègne, 2017. http://www.theses.fr/2017COMP2391/document.

Full text
Abstract:
With the advent of ever-increasing graph datasets in a large number of domains, parallel graph-processing applications deployed on distributed architectures are more and more needed to cope with the growing demand for memory and compute resources. Though large-scale distributed architectures are available, notably in the High-Performance Computing (HPC) domain, the programming and deployment complexity of such graph-processing algorithms, whose parallelization and complexity are highly data-dependent, hampers usability. Moreover, the difficult evaluation of the performance behaviour of these applications complicates the assessment of the relevance of the chosen architecture. With this in mind, this thesis deals with the exploration of graph-processing algorithms on distributed architectures, notably using GraphLab, a state-of-the-art graph-processing framework. Two use-cases are considered. For each, a parallel implementation is proposed and deployed on several distributed architectures of varying scales. This study highlights operating ranges, which can eventually be leveraged to appropriately select a relevant operating point with respect to the datasets processed and the cluster nodes used. A further study enables a performance comparison of commodity cluster architectures and higher-end compute servers using the two use-cases previously developed. This study highlights the particular relevance of using clustered commodity workstations, which are considerably cheaper and simpler with respect to node architecture, over higher-end systems in this applicative context. Then, this thesis explores how performance studies are helpful in cluster design for graph-processing. In particular, studying the throughput performance of a graph-processing system gives fruitful insights for further node architecture improvements. Moreover, this work shows that a more in-depth performance analysis can lead to guidelines for the appropriate sizing of a cluster for a given workload, paving the way toward resource allocation for graph-processing. Finally, hardware improvements for next generations of graph-processing servers are proposed and evaluated. A flash-based victim-swap mechanism is proposed for the mitigation of unwanted overloaded operations. Then, the relevance of ARM-based microservers for graph-processing is investigated with a port of GraphLab on an NVIDIA TX2-based architecture.
APA, Harvard, Vancouver, ISO, and other styles
46

Borke, Lukas. "Dynamic Clustering and Visualization of Smart Data via D3-3D-LSA." Doctoral thesis, Humboldt-Universität zu Berlin, 2017. http://dx.doi.org/10.18452/18307.

Full text
Abstract:
With its growing popularity, GitHub, the largest host of source code and the largest collaboration platform in the world, has evolved into a Big Data resource offering a variety of Open Source repositories (OSR). At present, there are more than one million organizations on GitHub, among them Google, Facebook, Twitter, Yahoo, CRAN, RStudio, D3, Plotly and many more. GitHub provides an extensive REST API, which enables scientists to retrieve valuable information about software and research development life cycles. Our research pursues two main objectives: (I) to provide an automatic OSR categorization system for data science teams and software developers, promoting discoverability, technology transfer and coexistence; (II) to establish visual data exploration and topic-driven navigation of GitHub organizations for collaborative reproducible research and web deployment. To transform Big Data into value, in other words into Smart Data, storing and processing the data semantics and metadata is essential. Furthermore, the choice of an adequate text mining (TM) model is important. The dynamic calibration of metadata configurations, TM models (VSM, GVSM, LSA), clustering methods and clustering quality indices is abbreviated as "smart clusterization". Data-Driven Documents (D3) and Three.js (3D) are JavaScript libraries for producing dynamic, interactive data visualizations, featuring hardware acceleration for rendering complex 2D or 3D computer animations of large data sets. Both techniques enable visual data mining (VDM) in web browsers and are abbreviated as D3-3D. Latent Semantic Analysis (LSA) measures semantic information through co-occurrence analysis of the text corpus. Its properties and applicability for Big Data analytics are demonstrated. "Smart clusterization", combined with the dynamic VDM capabilities of D3-3D, is summarized under the term "Dynamic Clustering and Visualization of Smart Data via D3-3D-LSA".
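The pipeline named in the abstract — a vector space model, LSA via a low-rank decomposition, and clustering scored by a quality index — can be illustrated with a minimal sketch. The sketch below assumes scikit-learn and an invented four-document corpus; it is not the author's D3-3D-LSA code, and the numbers of latent components and clusters are arbitrary.

```python
# Minimal LSA + clustering sketch (illustrative only, not the thesis code).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Hypothetical corpus: README texts of GitHub repositories.
docs = [
    "interactive data visualization with d3 and javascript",
    "statistical computing and graphics with r and cran packages",
    "deep learning models for image classification in python",
    "web framework for building reactive javascript applications",
]

# Vector space model (VSM): TF-IDF term-document matrix.
tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(docs)

# Latent Semantic Analysis: truncated SVD of the TF-IDF matrix.
lsa = TruncatedSVD(n_components=2, random_state=0)
X_lsa = lsa.fit_transform(X)

# Cluster the documents in the latent semantic space and score the clustering.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_lsa)
print("cluster labels:", km.labels_)
print("silhouette score:", silhouette_score(X_lsa, km.labels_))
```

In a full "smart clusterization" setting, the number of components, the clustering method and the quality index would be calibrated dynamically rather than fixed as above.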
APA, Harvard, Vancouver, ISO, and other styles
47

Nigam, Atish 1981. "Analytical techniques for debugging pervasive computing environments." Thesis, Massachusetts Institute of Technology, 2004. http://hdl.handle.net/1721.1/17962.

Full text
Abstract:
Thesis (M.Eng.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2004. Includes bibliographical references (p. 63-65). User-level debugging of pervasive environments is important as it provides the ability to observe changes that occur in a pervasive environment and to fix problems that result from these changes, especially since pervasive environments may from time to time exhibit unexpected behavior. Simple keepalive messages cannot always uncover the source of this behavior because systems can be in an incorrect state while continuing to output information or respond to basic queries. The traditional approach to debugging distributed systems is to instrument the entire environment. This does not work when the environments are cobbled together from systems built around different operating systems, programming languages or platforms. With systems from such disparate backgrounds, it is hard to create a stable pervasive environment. We propose to solve this problem by requiring each system and component to provide a health metric that gives an indication of its current status. Our work has shown that, when monitored at a reasonable rate, simple and cheap metrics can reveal the cause of many problems within pervasive environments. The two metrics focused on in this thesis are transmission rate and transmission data analysis. Algorithms for implementing these metrics, within the stated assumptions of pervasive environments, are explored, along with an analysis of these implementations and the results they provided. Furthermore, a system design is described in which the tools used to analyze the metrics compose an out-of-band monitoring system that retains a level of autonomy from the pervasive environment. The described system provides many advantages and additionally operates under the given assumptions regarding the resources available within a pervasive environment.
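As a rough illustration of the transmission-rate health metric described above — not the thesis's actual algorithm — the following sketch keeps a sliding window of observed message timestamps and flags a component whose rate drops well below a previously recorded baseline. The window length and threshold are invented.

```python
# Sliding-window transmission-rate health check (illustrative sketch).
import time
from collections import deque

class TransmissionRateMonitor:
    def __init__(self, window_seconds=60.0, min_ratio=0.5):
        self.window_seconds = window_seconds   # length of the sliding window
        self.min_ratio = min_ratio             # fraction of baseline still considered healthy
        self.timestamps = deque()              # arrival times of observed messages
        self.baseline_rate = None              # messages/second recorded during normal operation

    def record_message(self, t=None):
        """Record one observed transmission from the monitored component."""
        t = time.time() if t is None else t
        self.timestamps.append(t)
        # Drop observations that have fallen out of the sliding window.
        while self.timestamps and self.timestamps[0] < t - self.window_seconds:
            self.timestamps.popleft()

    def current_rate(self):
        """Messages per second observed within the current window."""
        return len(self.timestamps) / self.window_seconds

    def set_baseline(self):
        """Freeze the current rate as the 'healthy' reference rate."""
        self.baseline_rate = self.current_rate()

    def is_healthy(self):
        """Healthy if the observed rate is at least min_ratio of the baseline."""
        if not self.baseline_rate:
            return True  # no baseline yet; nothing to judge against
        return self.current_rate() >= self.min_ratio * self.baseline_rate
```

A monitor like this can run out-of-band, observing traffic without instrumenting the heterogeneous systems themselves, which is the property the abstract emphasizes.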
APA, Harvard, Vancouver, ISO, and other styles
48

Henwood, Cynthia E. (Cynthia Elsie). "An analytical model for computing weld microstructures." Dissertation (Mechanical Engineering), Carleton University, Ottawa, 1987.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
49

Slavětínský, Radek. "Analýza cloudových řešení Business Intelligence pro SME." Master's thesis, Vysoká škola ekonomická v Praze, 2017. http://www.nusl.cz/ntk/nusl-358847.

Full text
Abstract:
The thesis focuses on the analysis of currently offered Business Intelligence (BI) products that are affordable for small and medium-sized enterprises (SMEs). Current BI solutions available to SMEs are mostly offered via cloud computing, specifically as Software as a Service (SaaS), since this model requires low initial acquisition costs. The objectives of this thesis are to analyse how cloud BI applications suitable for SMEs are used and to compare in detail the widely adopted reporting tools distributed as SaaS in the lower price category. The theoretical part describes cloud computing and BI systems. The practical part covers the following selected products: IBM Watson Analytics, Qlik Sense Cloud, Zoho Reports, Tableau Public and Microsoft Power BI. Practical testing of these applications was based on an evaluation of selected metrics, with weights calculated using Fuller's triangle. These analyses form the basis for the comparison of the selected applications. The contribution of this thesis lies in identifying the strengths and weaknesses of these BI solutions. The output of this thesis can serve as a source for the selection of BI applications by SMEs.
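Fuller's triangle, the weighting scheme mentioned in the abstract, compares every pair of criteria once, gives the preferred criterion of each pair a point, and normalizes the scores into weights. A minimal sketch with invented criteria and preferences (not the metrics actually used in the thesis) could look like this:

```python
# Fuller's triangle pairwise-comparison weighting (illustrative example).
from itertools import combinations

criteria = ["price", "ease of use", "visualization", "data connectors"]

# For every pair (a, b) the evaluator names the more important criterion.
# These preferences are purely illustrative.
preferences = {
    ("price", "ease of use"): "ease of use",
    ("price", "visualization"): "visualization",
    ("price", "data connectors"): "price",
    ("ease of use", "visualization"): "ease of use",
    ("ease of use", "data connectors"): "ease of use",
    ("visualization", "data connectors"): "visualization",
}

# Each criterion scores one point per pairwise comparison it wins.
scores = {c: 0 for c in criteria}
for pair in combinations(criteria, 2):
    scores[preferences[pair]] += 1

total = sum(scores.values())  # equals n*(n-1)/2 comparisons
weights = {c: scores[c] / total for c in criteria}
print(weights)  # here "ease of use" receives the largest weight
```

A variant sometimes used in practice adds one point to every criterion's score before normalizing, so that no criterion ends up with a weight of exactly zero.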
APA, Harvard, Vancouver, ISO, and other styles
50

Reda, Roberto. "A Semantic Web approach to ontology-based system: integrating, sharing and analysing IoT health and fitness data." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2017. http://amslaurea.unibo.it/14645/.

Full text
Abstract:
With the rapid development of the fitness industry, Internet of Things (IoT) technology is becoming one of the most popular trends in the health and fitness areas. IoT technologies have revolutionised the fitness and sport industry by giving users the ability to monitor their health status and keep track of their training sessions. More and more sophisticated wearable devices, fitness trackers, smart watches and health mobile applications will appear in the near future. These systems collect data continuously from sensors and upload it to the cloud. However, from a data-centric perspective, the landscape of IoT fitness devices and wellness appliances is characterised by a plethora of representation and serialisation formats. The high heterogeneity of IoT data representations and the lack of commonly accepted standards keep data isolated within each single system, preventing users and health professionals from having an integrated view of the various information collected. Moreover, in order to fully exploit the potential of the large amounts of data, it is also necessary to enable advanced analytics over them, thus achieving actionable knowledge. Therefore, the aim of this thesis project is to design and implement an ontology-based system to (1) allow data interoperability among heterogeneous IoT fitness and wellness devices, (2) facilitate the integration and sharing of information, and (3) enable advanced analytics over the collected data (Cognitive Computing). The novelty of the proposed solution lies in exploiting Semantic Web technologies to formally describe the meaning of the data collected by the IoT devices and to define a common communication strategy for information representation and exchange.
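As a rough sketch of the Semantic Web approach described above — using rdflib and an invented placeholder vocabulary rather than the ontology actually developed in the thesis — a single heart-rate reading from a wearable could be lifted into RDF as follows:

```python
# Lifting one IoT fitness reading into RDF (placeholder vocabulary, not the thesis ontology).
from rdflib import Graph, Namespace, Literal, URIRef
from rdflib.namespace import RDF, XSD

FIT = Namespace("http://example.org/fitness#")  # hypothetical shared fitness vocabulary

g = Graph()
g.bind("fit", FIT)

obs = URIRef("http://example.org/data/obs-001")
g.add((obs, RDF.type, FIT.HeartRateObservation))
g.add((obs, FIT.measuredBy, FIT.WristTracker42))                      # source device
g.add((obs, FIT.beatsPerMinute, Literal(72, datatype=XSD.integer)))   # measured value
g.add((obs, FIT.observedAt, Literal("2017-03-01T10:15:00",
                                    datatype=XSD.dateTime)))          # timestamp

print(g.serialize(format="turtle"))
```

Once readings from heterogeneous devices are expressed against one shared vocabulary like this, they can be integrated into a single graph and queried or analysed together, which is the interoperability goal the abstract describes.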
APA, Harvard, Vancouver, ISO, and other styles