Dissertations / Theses on the topic 'Big data analysis'
Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles
Consult the top 50 dissertations / theses for your research on the topic 'Big data analysis.'
Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.
Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.
Uřídil, Martin. "Big data - použití v bankovní sféře." Master's thesis, Vysoká škola ekonomická v Praze, 2012. http://www.nusl.cz/ntk/nusl-149908.
Magnusson, Jonathan. "Social Network Analysis Utilizing Big Data Technology." Thesis, Uppsala universitet, Avdelningen för datalogi, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-170926.
Šoltýs, Matej. "Big Data v technológiách IBM." Master's thesis, Vysoká škola ekonomická v Praze, 2014. http://www.nusl.cz/ntk/nusl-193914.
Kumar, Abhinav. "SensAnalysis: A Big Data Platform for Vibration-Sensor Data Analysis." Thesis, Virginia Tech, 2019. http://hdl.handle.net/10919/101529.
Santos, Lúcio Fernandes Dutra. "Similaridade em big data." Universidade de São Paulo, 2017. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-07022018-104929/.
The data being collected and generated nowadays increase not only in volume, but also in complexity, requiring new query operators. Health care centers collecting image exams and remote sensing from satellites and from earth-based stations are examples of application domains where more powerful and flexible operators are required. Storing, retrieving and analyzing data that are huge in volume, structure, complexity and distribution are now referred to as big data. Representing and querying big data using only the traditional scalar data types is not enough anymore. Similarity queries are the most pursued resources to retrieve complex data, but until recently they were not available in database management systems. Now that they are starting to become available, their first use to develop real systems makes it clear that the basic similarity query operators are not enough to meet the requirements of the target applications. The main reason is that similarity is a concept formulated considering only small amounts of data elements. Nowadays, researchers target handling big data mainly using parallel architectures, and only a few studies target the efficacy of the query answers. This Ph.D. work develops variations of the basic similarity operators that are better suited to handle big data, presenting a holistic vision of the database and increasing the effectiveness of the provided answers without hurting the efficiency of the search algorithms. To achieve this goal, four main contributions are presented: the first is a result diversification model that can be applied to any comparison criterion and similarity search operator; the second defines sampling and grouping techniques with the proposed diversification model to speed up the analysis of result sets; the third concentrates on evaluation methods for measuring the quality of diversified result sets; the last defines an approach to integrate the concepts of visual data mining and similarity with diversity searches in content-based retrieval systems, allowing a better understanding of how the diversity property is applied in the query process.
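The dissertation's diversified similarity operators are not spelled out in this abstract. As a rough, generic illustration of result diversification only, the sketch below re-ranks a k-nearest-neighbor shortlist with a Maximal Marginal Relevance (MMR) style trade-off between closeness to the query and diversity among the chosen results; the function name, the lam parameter and the data are illustrative assumptions, not the author's operators.

```python
# A minimal sketch of diversity-aware similarity search (MMR-style
# re-ranking), standing in for the dissertation's actual operators.
import numpy as np

def diversified_knn(query, data, k=5, candidates=50, lam=0.7):
    """Pick k items balancing closeness to `query` against mutual diversity."""
    d_q = np.linalg.norm(data - query, axis=1)   # distance of every item to query
    pool = list(np.argsort(d_q)[:candidates])    # similarity shortlist
    chosen = [pool.pop(0)]                       # seed with the nearest item
    while len(chosen) < k and pool:
        best, best_score = None, -np.inf
        for i in pool:
            d_sel = min(np.linalg.norm(data[i] - data[j]) for j in chosen)
            score = -lam * d_q[i] + (1 - lam) * d_sel   # similarity vs. diversity
            if score > best_score:
                best, best_score = i, score
        chosen.append(best)
        pool.remove(best)
    return chosen

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))
print(diversified_knn(X[0], X, k=5))
```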
Zhu, Shuxiang. "Big Data System to Support Natural Disaster Analysis." Case Western Reserve University School of Graduate Studies / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=case1592404690195316.
Islam, Md Zahidul. "A Cloud Based Platform for Big Data Science." Thesis, Linköpings universitet, Programvara och system, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-103700.
Santos Rivera, Juan De Dios. "Data Analysis on Hadoop - finding tools and applications for Big Data challenges." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-260557.
Pragarauskaitė, Julija. "Frequent pattern analysis for decision making in big data." Doctoral thesis, Lithuanian Academic Libraries Network (LABT), 2013. http://vddb.laba.lt/obj/LT-eLABa-0001:E.02~2013~D_20130701_092451-80961.
Huge amounts of information are accumulated every day around the world, and they are growing rapidly. Approximate data mining algorithms are very important for analyzing such large volumes of data, because algorithm speed is crucial in many fields, whereas exact methods are typically slow and are used only for tasks that require an exact answer. This dissertation analyzes several data mining areas: frequent pattern mining and visualization for decision making. For frequent pattern mining, three new approximate methods were proposed and tested on real and artificially generated databases: the Random Sampling Method (RSM) draws a random sample from the original database and identifies frequent patterns from the analysis of that sample; its advantage is a theoretical estimate of the error probabilities using standard statistical methods. The Multiple Re-sampling Method (MRM) is an improvement of RSM that draws several random samples from the original database and thus reduces the error probabilities. The Markov Property Based Method (MPBM) reads the original database several times, depending on the order of the Markov process, and computes empirical frequencies based on the Markov property. For visualizing large amounts of data, online shoppers' behavioral data were used and analyzed with... [see the full text]
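As a hedged sketch of the Random Sampling Method idea described above (estimate pattern frequencies from a random sample and bound the error with standard statistics), assuming simple uniform sampling and a normal approximation:

```python
# Sketch: estimating the support of an itemset from a random sample of
# transactions, with a normal-approximation error bound (the RSM idea).
import math, random

def sample_support(transactions, itemset, n_sample, z=1.96):
    sample = random.sample(transactions, n_sample)
    hits = sum(1 for t in sample if itemset <= t)
    p = hits / n_sample                           # estimated support
    err = z * math.sqrt(p * (1 - p) / n_sample)   # ~95% half-width
    return p, err

db = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}] * 2500  # 10,000 transactions
est, err = sample_support(db, {"a", "b"}, n_sample=1000)
print(f"support of {{a, b}} ≈ {est:.3f} ± {err:.3f}")          # true value: 0.5
```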
Kinuthia, Charles, and Ming Peng. "Hardware Implementation and Analysis of a Big Data Algorithm." Thesis, KTH, Skolan för elektro- och systemteknik (EES), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-200611.
Su, Yu. "Big Data Management Framework based on Virtualization and Bitmap Data Summarization." The Ohio State University, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=osu1420738636.
Sohangir, Soroosh. "MACHINE LEARNING ALGORITHM PERFORMANCE OPTIMIZATION: SOLVING ISSUES OF BIG DATA ANALYSIS." OpenSIUC, 2015. https://opensiuc.lib.siu.edu/dissertations/1111.
Lu, Feng. "Big data scalability for high throughput processing and analysis of vehicle engineering data." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-207084.
Bin Saip, Mohamed A. "Big Social Data Analytics: A Model for the Public Sector." Thesis, University of Bradford, 2019. http://hdl.handle.net/10454/18352.
Chaudhuri, Abon. "Geometric and Statistical Summaries for Big Data Visualization." The Ohio State University, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=osu1382235351.
Rivetti di Val Cervo, Nicolo. "Efficient Stream Analysis and its Application to Big Data Processing." Thesis, Nantes, 2016. http://www.theses.fr/2016NANT4046/document.
Nowadays stream analysis is used in many contexts where the amount of data and/or the rate at which it is generated rules out other approaches (e.g., batch processing). The data streaming model provides randomized and/or approximated solutions to compute specific functions over (distributed) streams of data items in worst-case scenarios, while striving for small resource usage. In particular, we look into two classical and related data streaming problems: frequency estimation and (distributed) heavy hitters. A less common field of application is stream processing, which is somewhat complementary and more practical, providing efficient and highly scalable frameworks to perform soft real-time generic computation on streams, relying on cloud computing. This duality allows us to apply data streaming solutions to optimize stream processing systems. In this thesis, we provide a novel algorithm to track heavy hitters in distributed streams and two extensions of a well-known algorithm to estimate the frequencies of data items. We also tackle two related problems and their solutions: providing an even partitioning of the item universe based on item weights, and estimating the values carried by the items of the stream. We then apply these results to both network monitoring and stream processing. In particular, we leverage these solutions to perform load shedding as well as to load-balance parallelized operators in stream processing systems.
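The thesis's own distributed algorithms are not reproduced here; as a generic illustration of the heavy-hitters problem it tackles, the following is the classic Space-Saving algorithm, which approximates item frequencies in one pass with a fixed number of counters (the capacity and demo stream are made up):

```python
# Sketch: the classic Space-Saving algorithm for heavy hitters in a stream,
# a standard baseline rather than the thesis's distributed solution.
def space_saving(stream, capacity):
    counters = {}                                   # item -> (count, overestimate)
    for x in stream:
        if x in counters:
            c, e = counters[x]
            counters[x] = (c + 1, e)
        elif len(counters) < capacity:
            counters[x] = (1, 0)
        else:                                       # evict the smallest counter
            victim = min(counters, key=lambda k: counters[k][0])
            c, _ = counters.pop(victim)
            counters[x] = (c + 1, c)                # inherit count; error <= c
    return counters

stream = ["a"] * 50 + ["b"] * 30 + list("cdefg") * 4
print(space_saving(stream, capacity=4))             # "a" and "b" dominate
```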
Aring, Danielle C. "Integrated Real-Time Social Media Sentiment Analysis Service Using a Big Data Analytic Ecosystem." Cleveland State University / OhioLINK, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=csu1494359605127555.
Abu Salih, Bilal Ahmad Abdal Rahman. "Trustworthiness in Social Big Data Incorporating Semantic Analysis, Machine Learning and Distributed Data Processing." Thesis, Curtin University, 2018. http://hdl.handle.net/20.500.11937/70285.
Nhlabano, Valentine Velaphi. "Fast Data Analysis Methods For Social Media Data." Diss., University of Pretoria, 2018. http://hdl.handle.net/2263/72546.
Giannini, Andrea. "Social Network Analysis: Architettura Streaming Big Data di Raccolta e Analisi Dati da Twitter." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2022. http://amslaurea.unibo.it/25378/.
Abounia Omran, Behzad. "Application of Data Mining and Big Data Analytics in the Construction Industry." The Ohio State University, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=osu148069742849934.
Dourado, Jonas Rossi. "Delayed Transfer Entropy applied to Big Data." Universidade de São Paulo, 2018. http://www.teses.usp.br/teses/disponiveis/18/18153/tde-19022019-134228/.
The recent popularization of technologies such as smartphones, wearables, the Internet of Things, social networks and video streaming has increased data creation. Handling large amounts of data gave rise to the term Big Data, often defined as the situation in which the volume, the acquisition rate, or the representation of the data demands non-traditional approaches for analysis, or requires horizontal scaling for data processing. Analysis is the most important Big Data stage, aiming to extract relevant and sometimes hidden information. One example of hidden information is causality, which can be inferred with Delayed Transfer Entropy (DTE). Although DTE has wide applicability, it has a high computational cost, which is aggravated by large databases such as those found in Big Data. This research optimized and modified the existing code to allow the execution of DTE on a computer cluster. With the Big Data trend in view, this result may enable larger databases or better statistical evidence.
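As a very small, hedged sketch of what delayed transfer entropy measures (how much a past value of Y helps predict the next value of X beyond X's own past), here is a plug-in estimator for binary series with history length 1; the dissertation's cluster-parallel implementation is far more elaborate:

```python
# Sketch: delayed transfer entropy TE(Y -> X) for binary series, history 1,
# estimated from empirical counts (illustrative, not the thesis's code).
from collections import Counter
import math, random

def delayed_te(x, y, delay=1):
    triples = Counter()                       # (x_next, x_now, y_delayed)
    for t in range(delay, len(x) - 1):
        triples[(x[t + 1], x[t], y[t - delay])] += 1
    n = sum(triples.values())
    def p(*key):                              # marginal of a partially fixed key
        return sum(v for k, v in triples.items()
                   if all(a is None or a == b for a, b in zip(key, k))) / n
    te = 0.0
    for (xn, xc, yd), c in triples.items():
        p_xyz = c / n
        te += p_xyz * math.log2(p_xyz * p(None, xc, None)
                                / (p(None, xc, yd) * p(xn, xc, None)))
    return te

random.seed(1)
y = [random.randint(0, 1) for _ in range(5000)]
x = [0, 0, 0] + y[:-3]                        # x copies y with a 3-step lag
print(round(delayed_te(x, y, delay=2), 3))    # ~1 bit: x[t+1] equals y[t-2]
```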
Wei, Jinliang. "Parallel Analysis of Aspect-Based Sentiment Summarization from Online Big-Data." Thesis, University of North Texas, 2019. https://digital.library.unt.edu/ark:/67531/metadc1505264/.
Cao, Hongfei. "High-throughput Visual Knowledge Analysis and Retrieval in Big Data Ecosystems." Thesis, University of Missouri - Columbia, 2019. http://pqdtopen.proquest.com/#viewpdf?dispub=13877134.
Visual knowledge plays an important role in many highly skilled applications, such as medical diagnosis, geospatial image analysis and pathology diagnosis. Medical practitioners are able to interpret and reason about diagnostic images based on not only primitive-level image features such as color, texture, and spatial distribution but also their experience and tacit knowledge which are seldom articulated explicitly. This reasoning process is dynamic and closely related to real-time human cognition. Due to a lack of visual knowledge management and sharing tools, it is difficult to capture and transfer such tacit and hard-won expertise to novices. Moreover, many mission-critical applications require the ability to process such tacit visual knowledge in real time. Precisely how to index this visual knowledge computationally and systematically still poses a challenge to the computing community.
My dissertation research results in novel computational approaches for high-throughput visual knowledge analysis and retrieval from large-scale databases using the latest technologies in big data ecosystems. To provide a better understanding of visual reasoning, human gaze patterns are qualitatively measured spatially and temporally to model observers' cognitive processes. These gaze patterns are then indexed in a NoSQL distributed database as a visual knowledge repository, which is accessed using various unique retrieval methods developed through this dissertation work. To provide meaningful retrievals in real time, deep-learning methods for automatic annotation of visual activities and streaming similarity comparisons are developed under a gaze-streaming framework using Apache Spark.
This research has several potential applications that offer a broader impact among the scientific community and in the practical world. First, the proposed framework can be adapted for different domains, such as fine arts, life sciences, etc. with minimal effort to capture human reasoning processes. Second, with its real-time visual knowledge search function, this framework can be used for training novices in the interpretation of domain images, by helping them learn experts’ reasoning processes. Third, by helping researchers to understand human visual reasoning, it may shed light on human semantics modeling. Finally, integrating reasoning process with multimedia data, future retrieval of media could embed human perceptual reasoning for database search beyond traditional content-based media retrievals.
Sahasrabudhe, Aditya. "NBA 2020 Finals: Big Data Analysis of Fans’ Sentiments on Twitter." Ohio University / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1619784186362291.
Li, Yen-de, and 李彥德. "Data Visualization Analysis of Big Data." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/rbkvqe.
I-Shou University
Department of Information Management
105
Everything in daily life evolves with time; advances in science and technology and innovations in materials are important drivers of that progress. The Internet promotes more interaction and produces ever more data. Because the transactional data accumulated from business operations are increasing rapidly, analyzing big data to discover the information hidden in the data at hand becomes more and more worthwhile. In general, data visualization techniques can quickly give users a better way to understand the data and avoid the inconvenience caused by overly complicated information. In this thesis, two business intelligence platforms, Opview and Tableau, are applied to analyze big data produced by the telecom industry. First, this work explores the top five telecom companies in Taiwan (Chunghwa Telecom, Taiwan Mobile, Far EasTone Telecommunications, Taiwan Star Telecom, and Asia Pacific Telecom) using the Opview social listening platform, with five key factors set for analysis: broadband, reception, network, Wi-Fi, and hotspots. The Opview analyses show that the factors end users care about most are network, broadband, and reception. This study then adopts Tableau to analyze the broadband data provided by Chunghwa Telecom, examining demand, the uptake of different bandwidths in recent years, and users' concerns through visual graphics. Moreover, the study analyzes sixty thousand records provided by Chunghwa Telecom on the Tableau business intelligence platform. The results can provide a valuable visualization-research reference for follow-up researchers.
Lima, Luciana Cristina Barbosa de. "Big data for data analysis in financial industry." Master's thesis, 2014. http://hdl.handle.net/1822/34919.
Technological evolution and the consequent increase in society's and organizations' dependency on it have been important drivers of the escalation in the volume and variety of data. At the same time, market evolution requires the capability to find new paths to improve products/services and client satisfaction while avoiding the associated cost increases. Big Data comes with huge power, not only through the ability to process large amounts and varieties of data at high velocity, but also through the capability to create value for the organizations that include it in their operational and decision-making processes. The relevance of Big Data for different industries and how to implement a Big Data solution still raise many doubts and much discussion. Thus, this work takes a business orientation so that it is possible to understand what Big Data actually means for organizations. The project follows a top-down approach: first, an overview of what defines and distinguishes Big Data; as it evolves, the focus turns to Big Data's contribution at an organizational level and the existing market offers. The decomposition of the problem closes with two main contributions: a framework that helps identify a problem as a Big Data problem, and a case-study trial that identifies the correlation between financial news articles and changes in the stock exchange. The outcome of this trial was a platform with analytic and predictive capabilities in this new Big Data context.
Pereira, Flávia Patricia Alves. "Big data e data analysis: visualização de informação." Master's thesis, 2015. http://hdl.handle.net/1822/40106.
The revolution of information reaches all organizations of modern society, forcing experts in the field of Information Technology (IT) to transform their learning processes to create more value. Technology produces and stores large quantities of data from which information is later produced. Understanding heterogeneous data and recognizing the data that matter is the ultimate goal of the Big Data concept. The need to understand and extract information from a large set of data is a hard but essential process for organizations that deal with information. In this context comes the need to analyze, clean and transform data, a process called Data Analysis. The process guides the user to the most suitable technique for the purpose of the analysis; the technique studied in this thesis is Information Visualization (IV). IV is studied here with the main purpose of transmitting information in a clear and effective way through the use of graphic representations. The mapping of data into visual structures (graphic representations) provides a detailed view of the data context and its relations. The methods and techniques of IV have evolved over the last decades, in line with rampant technological progress, hence the need to redesign the Model of the Visualization Process to ease the making of a visual representation. The main goal focuses on optimizing visualization methods: producing a clear and efficient representation whose main purpose is to enhance the appropriation of data via graphic representations. To attain this purpose, a classification was formulated, "Visual Representation: what I intend to transmit", which contemplates the study of charts and of the analyses that arise when one wants to find or communicate patterns and trends in data. The classification was built as an artifact, with the specific purpose of helping the user decide which chart is the most suitable to evidence a given type of analysis. Design Science Research was chosen as the methodological approach for the systematic classification of concepts and the construction of the classification. The user is a key agent during the Visualization Process Method: he or she should be able to recognize which analysis is most appropriate for the data and which type of chart is most profitable for the work.
WANG, SHAO-SIANG, and 王紹祥. "Comparative analysis of Big Data and Data Mining." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/r23d77.
Ming Chuan University
Master's Program, Department of Information Management
106
With the rapid development of information technology, data are generated at high speed every day, and finding useful information within large amounts of data is an important issue for both industrial development and academic research. The main purpose of this study is to explore the applicable scenarios of data mining and big data. Through a review of relevant research, this study aggregates comparative information on big data and data mining, covering architecture, usability, ease of use, economics, enterprise scale, industry type, and the ability to process data. The study first compares the similarities and differences between the two, and then applies the modified Delphi method and the analytic hierarchy process (AHP) to big data and data mining. Through a comparative analysis of six facets (architecture, usability, ease of use, economics, industry characteristics, and the ability to process data), it derives the applicable scenarios for data mining and big data analysis.
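The study's actual judgment matrices are not given in this abstract. As a generic sketch of the AHP step it mentions, criterion weights are the principal eigenvector of a pairwise-comparison matrix, and the consistency ratio (CR < 0.1 by convention) checks the judgments; the matrix below is an invented example over three of the six facets:

```python
# Sketch: AHP priority weights and consistency check (illustrative judgments,
# not the study's actual six-facet comparisons).
import numpy as np

A = np.array([[1,   3,   5 ],     # e.g., architecture vs. usability vs. economics
              [1/3, 1,   3 ],
              [1/5, 1/3, 1 ]])

vals, vecs = np.linalg.eig(A)
k = np.argmax(vals.real)
w = np.abs(vecs[:, k].real)
w /= w.sum()                       # priority weights

lam_max = vals.real[k]
n = A.shape[0]
ci = (lam_max - n) / (n - 1)       # consistency index
cr = ci / 0.58                     # random index RI = 0.58 for n = 3
print("weights:", w.round(3), "CR:", round(cr, 3))   # CR < 0.1 is acceptable
```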
Han, Meng. "INFLUENCE ANALYSIS TOWARDS BIG SOCIAL DATA." 2017. http://scholarworks.gsu.edu/cs_diss/121.
TRIPATHI, ASHISH KUMAR. "BIG DATA ANALYSIS USING METAHEURISTIC ALGORITHMS." Thesis, 2018. http://dspace.dtu.ac.in:8080/jspui/handle/repository/16597.
Pai, Fu-Tzu, and 白馥慈. "Big Data Analysis for National Health Insurance Research Data." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/43894559097858528541.
National Yang-Ming University
Institute of Biomedical Informatics
101
The use of big data in medical research has increased tremendously over the years. One important kind of big data is longitudinal medical claims data. Usually, medical claims data are held by patients themselves or by health insurance companies, but Taiwan is different: the National Health Insurance (NHI) Administration of Taiwan was established in March 1995; in other words, the NHI has managed over 99% of citizens' medical claims data for the past 18 years. All those data are stored in the National Health Insurance Research Database (NHIRD), which has become an important data source for evidence-based medicine (EBM) studies. According to statistics, more and more studies are based on the NHIRD. Due to information overload and the lack of domain-specific analysis tools for the NHIRD, it is hard for researchers to extract valuable information from the database without learning a structured query language (SQL). To improve the quality and efficiency of NHIRD-related research, this study designs a friendly, reusable web-based user interface that allows users to interact with the NHIRD directly without any prerequisites. The interface is built on the Ruby on Rails web framework and runs on Ruby for cross-platform compatibility, against the data of the Longitudinal Health Insurance Database 2005 under PostgreSQL in production mode. We present a flexible web interface with which users can easily query the database and do elementary analysis without programming expertise. It also dynamically draws statistical charts, calculates the estimated number of total entries for every query result, provides several pre-built query conditions for a variety of purposes, and generates a download link for the result data set for advanced analysis. It greatly simplifies data access to the NHIRD and helps associated studies proceed more effectively.
Hidayati, Shintami Chusnul, and Shintami Chusnul Hidayati. "Fashion Style Analysis towards Multimedia Big Data." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/34383422410852321225.
National Taiwan University of Science and Technology
Department of Computer Science and Information Engineering
105
Driven by the huge profit potential of the fashion industry, intelligent fashion analysis may become an important subject in multimedia and computer vision research. Traditional vision-based clothing research methods focused on analyzing fashion items based on either keywords given by users or low-level features specified by preferred samples. Instead of using less-discriminative low-level features or ambiguous keywords, this study proposes novel approaches that focus on clothing genre recognition and fashion trend analysis based on visually differentiable fashion style elements. A set of style elements that are crucial for recognizing clothing genres and analyzing fashion trends is identified based on fashion design theory. In addition, the corresponding salient visual features of each style element are identified and formulated as variables that can be computationally derived with various computer vision algorithms. For clothing genre recognition, we propose a novel classification technique to identify the genres of upperwear and lowerwear from full-body pictures by recognizing fundamental style elements of clothing design, such as collars, front buttons, and sleeves. We extract representative features for describing style elements based on the spatial layout of body parts, and we go one step further toward automatically classifying clothing genres by formulating the integration of local multimodal features as instances of the prize-collecting Steiner tree (PCST) problem to discover clothing regions, and by exploiting visual style elements to discover the clothing genre. Recognition results show that our clothing genre recognition frameworks perform significantly better than state-of-the-art recognition methods, and a set of experiments involving different sets of style elements or features demonstrates the effectiveness of each style element and its visual features. On the topic of fashion trend spotting, we present a novel algorithm that automatically discovers the visual style elements representing fashion trends for a given season of fashion week events. Five major elements of fashion style (head decoration, color, silhouette, pattern, and footwear) are investigated in this framework. Trending styles are discovered based on the stylistically coherent and unique characteristics of fashion style elements. Experimental evaluations on a large number of catwalk show videos demonstrate the effectiveness of the proposed method.
Hsieh, Cheng-Hsien, and 謝政賢. "Investment Analysis under the Big Data Algorithm." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/32032084438052470526.
National Chung Hsing University
Department of Finance
103
This paper adopts the Apriori algorithm, a big data method, to mine rules and invest in the Taiwan stock market. First, we construct four types of samples: "today's daily return is 6% and the next day's return is 6%", "today's daily return is 6% and the next day's return is -6%", "today's daily return is -6% and the next day's return is -6%", and "today's daily return is -6% and the next day's return is 6%". In the empirical results, only the "today's daily return is 6% and the next day's return is 6%" type yields rules. Investing by the rules found by the Apriori algorithm earns a positive cumulative annual return in the Taiwan stock market; furthermore, compared with Taiwan's market index, the previous three years show positive abnormal returns. Finally, compared with a benchmark strategy based on the mean-variance model, the annual returns of the Apriori rules beat the benchmark. We surmise that the mean-variance model suffers from estimation error, hence its performance is worse than that of the Apriori rules.
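The paper's exact samples are not reproduced here; the sketch below hand-rolls the Apriori-style support/confidence computation for two-day return events of the kind described, on made-up transactions:

```python
# Sketch: Apriori-style support/confidence over two-day return events.
# The transactions and thresholds are invented for illustration.
from itertools import combinations

transactions = [
    {"up6_today", "up6_next"}, {"up6_today", "up6_next"},
    {"up6_today", "down6_next"}, {"down6_today", "down6_next"},
    {"up6_today", "up6_next"}, {"down6_today", "up6_next"},
]

def support(itemset):
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

items = sorted({i for t in transactions for i in t})
frequent = [frozenset(c) for r in (1, 2)
            for c in combinations(items, r) if support(set(c)) >= 0.3]

for fs in (f for f in frequent if len(f) == 2):       # report pair rules
    a, b = sorted(fs)
    conf = support(fs) / support({a})
    print(f"{a} -> {b}: support={support(fs):.2f}, confidence={conf:.2f}")
```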
Vilares, António Alberto Legoinha. "Big data analytics : predictive consumer behaviour analysis." Master's thesis, 2017. http://hdl.handle.net/10362/24457.
This work analyzes the performance of Big Data tools for data preparation and for implementing a data mining algorithm, namely FP-Growth, to extract association rules from records of product transactions in the retail market. The extracted data cover the transactions of a supermarket chain's consumers, in order to understand which products are purchased together, an analysis known as Market Basket Analysis. One year of records with each customer's purchase history was extracted; each record contains all products purchased within one year. The information obtained is used to identify correlated products and determine which products are frequently bought together, so that the results can support new business strategies that adapt the supermarkets' offer to consumer preferences. Using several tools of the Hadoop ecosystem, the data were processed to eliminate any inconsistency present in the database and to generate new variables for consumer-profile segmentation and association rule extraction. During preprocessing, SQL tools were used to create a set of KPIs describing the current state of the supermarket business. In the cluster analysis, three groups were defined: the first cluster consists of immediate-needs customers, the second of current-account customers, and the third of compulsive shoppers. For each cluster, a set of association rules was identified that reveals the consumption habits of each customer type. The analytical component was implemented in Spark MLlib, programmed in Scala. Using Hadoop together with Spark allowed the integrated execution of a set of functionalities, making it possible to use languages such as SQL, HiveQL, Pig Latin, Python, or Scala on a single platform.
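As a minimal market-basket sketch in the spirit of this thesis (FP-Growth on transactions), using Spark's Python API rather than the Scala code the thesis describes; the baskets and thresholds are invented:

```python
# Sketch: FP-Growth market-basket mining with Spark (pyspark), a small
# stand-in for the thesis's Spark MLlib / Scala implementation.
from pyspark.sql import SparkSession
from pyspark.ml.fpm import FPGrowth

spark = SparkSession.builder.master("local[*]").appName("basket").getOrCreate()
baskets = spark.createDataFrame(
    [(0, ["milk", "bread"]), (1, ["milk", "diapers", "beer"]),
     (2, ["bread", "diapers", "beer"]), (3, ["milk", "bread", "diapers", "beer"]),
     (4, ["milk", "bread", "diapers"])],
    ["id", "items"])

fp = FPGrowth(itemsCol="items", minSupport=0.4, minConfidence=0.7)
model = fp.fit(baskets)
model.freqItemsets.show()         # frequent itemsets and their counts
model.associationRules.show()     # antecedent -> consequent rules
spark.stop()
```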
LIN, BO-JHEN, and 林帛箴. "The analysis of big data processing systems." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/44963153090913666880.
National Chi Nan University
Department of Information Management
104
This study first reviews the literature on big data technologies and systems to compare the strengths and weaknesses of various systems. To provide users with a more efficient operating environment for big data systems, it introduces a big data processing system capable of setting up systems semi-automatically and of helping users solve system problems. The proposed system includes two major functions: (a) setting up big data systems, and (b) providing solutions to system problems. The designed instructions can effectively assist users in setting up big data systems such as Spark and Hadoop, decreasing the time users spend on system setup. When users experience system-related problems, such as "NameNode not running", "DataNode not running", "ssh password-free login failure", "no such file or directory", and "command not found", they may employ the problem-solution function to solve these commonly encountered problems. In future work, more functions and problem solutions can be developed for the system; such improvements could benefit future big data system development and serve as recommendations for related studies.
Chen, Zhen-Hua, and 陳珍華. "Big Data:Open Data and Realty Website Analysis." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/01934670164456410617.
National Chiao Tung University
Degree Program of Computer Science, College of Computer Science
102
For most people, information for buying a house comes from friends, real-estate agency websites, and registered actual-price data, but those data sit in different places with no direct comparison. I build a model using the registered actual-price data of Hsinchu County. First, I inspect the data and delete irrelevant records, for example office buildings used for commercial purposes. Second, I apply K-means clustering and conclude that the average price on agency websites is higher than the average registered actual price. Third, I calculate the ratios of agency-website prices to registered actual prices under the conditions of floor area and building age. Fourth, I find real instances that support the experiment. Fifth, I install Apache and MySQL on Ubuntu and write the HTML and PHP, using the UTF-8 character set to process the Chinese text in the house-price data; a shell script fetches data from data.gov.tw and tw.house.yahoo.com periodically and automatically, and Python code processes the data and imports them into the database automatically. I applied for web space in order to provide the house-price analysis service: the system compares house-price information from agency websites and registered actual prices for the same county, floor area, and building age, and shows the mean and standard deviation of the price ratios. I use Google Analytics to observe users' browsing behavior and collect user feedback through questionnaires. In conclusion, the analysis of house prices is useful for consumers.
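A hedged sketch of the K-means step on listing data of the kind described (unit price, floor area, building age); the numbers are synthetic, not the Hsinchu County records:

```python
# Sketch: K-means grouping of housing listings; synthetic columns stand in
# for the registered actual-price data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
# columns: unit price, floor area, building age (invented units and values)
listings = np.column_stack([rng.normal(15, 4, 300),
                            rng.normal(40, 12, 300),
                            rng.integers(0, 40, 300)])

X = StandardScaler().fit_transform(listings)   # scale features before clustering
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

for c in range(3):
    seg = listings[km.labels_ == c]
    print(f"cluster {c}: n={len(seg)}, mean unit price={seg[:, 0].mean():.1f}")
```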
HUANG, HSIANG-YUN, and 黃湘芸. "Big data analysis of M503 route events." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/q24623.
Ming Chuan University
Master's Program, Department of New Media and Communication Administration
107
The relationship between the two sides of the Taiwan Strait has always been unstable and irregular, and exchanges did not stop even as the Chinese Nationalist government relocated to Taiwan, through the eras of Hong Kong transshipment, the Mini Three Links, and finally direct cross-strait transportation. In 2018, China launched new northbound flights on the M503 route without prior consultation with Taiwan, which brought the cross-strait relationship to a low point and caused subsequent problems. The most influential was the cross-strait Spring Festival flight incident, which left more than 50,000 people unable to return home to celebrate Chinese New Year and created great public-opinion pressure. This research explores how, after Taiwan's several political turnovers and with Tsai Ing-wen currently in power, the Taiwanese government's rejection of the 1992 Consensus has left cross-strait relations in a stalemate; China's continued pressure on Taiwan, including opening the northbound M503 flights without consultation, made the Taiwanese government and public deeply dissatisfied. The purpose of this dissertation is to explore netizens' views and public opinion on the internet. The research therefore uses OpView and Keypo to analyze trends, positive and negative sentiment, opinion leaders, word clouds, and keywords around the M503 route events. By understanding the impact and aftereffects of the M503 route events and clarifying the main issues, suggestions on potential problems can be provided to government decision-making units and academics.
葛喬丹. "Big Data Analysis of Travel YouTube Channels." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/632zjt.
Fu Jen Catholic University
Executive Master's Program in International Entrepreneurship and Management
106
Video content is becoming king. While there are hundreds of ways to get content out there, YouTube is the leading social media platform for video. Following the trends of the modern world, there is a growing connection between video content and travel, especially among the newer generations. As a result, travel channels on YouTube such as Sam Kolder, JR Alli and Beautiful Destinations are becoming more and more popular. But why and how? This is important for travel YouTubers and travel agencies to know when building a stronger content strategy. Put simply, the main goal of this study is to figure out why and how famous personal travel channels became famous on YouTube. The study aims to help future YouTube travel channels, whether personal or business accounts, understand the trends that could support growing their channel according to certain key factors and analyses. Using a social big data analysis approach to famous travel YouTubers, we can see all the key factors involved: a combination of knowing your audience, the style of video you produce, and how you present the video are all key aspects to consider when creating video content.
Lin, Chia-Cheng, and 林家正. "Big Data Analysis on Government Open Data to Establish Virtual Data Set." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/a7ujtc.
National Chung Cheng University
Executive Master's Program, Department of Information Management
107
Demographic structure is one of the basic factors for measuring the competitiveness of a country. However, because short-term changes in demographic structure are not obvious, demographics were often not given priority in past discussions of national development and social welfare policy. In addition, the difficulty of obtaining government administrative data in the past raised doubts about administrative operations and restricted related research. With the changes of the times, the administrative statistics accumulated over many years have become an important asset, and the law has evolved accordingly, requiring the government to open administrative information to the public to promote civic participation in public affairs. Meanwhile, information hardware and software are highly popular, the infrastructure is complete, and computing and storage costs have dropped greatly, which makes it more feasible for outsiders to explore social issues by analyzing open government data. Based on open government data and the Executive Yuan's cohort ('annual ring') calculation method, this research uses big data techniques to build a virtual data set of the population for the next 20 years. Finally, the results are presented with visualization tools, in the hope of bringing advantages and contributions to applications in related fields.
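The Executive Yuan's projection method has many more components (age-specific fertility, mortality and migration); as a bare illustration of the cohort-survival recursion such projections iterate, with all rates invented:

```python
# Sketch: one bare-bones cohort-survival step over a 20-year horizon with
# four 20-year age bands; every number is a made-up assumption.
def project_once(pop, survival, births):
    """Shift each cohort up one band with survival applied; add newborns."""
    new = [births]                            # incoming 0-19 cohort
    for i in range(len(pop) - 1):
        new.append(pop[i] * survival[i])      # band i survives into band i+1
    new[-1] += pop[-1] * survival[-1]         # survivors remaining in 60+
    return new

pop      = [4.2, 6.8, 7.3, 5.1]      # millions in 0-19, 20-39, 40-59, 60+
survival = [0.99, 0.97, 0.90, 0.35]  # hypothetical 20-year survival rates
births   = 3.0                       # hypothetical newborn cohort over 20 years

print([round(p, 2) for p in project_once(pop, survival, births)])
```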
Lu, Jui-nan, and 呂瑞男. "Marketing Strategy Application Research of Big Data Analysis." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/523jvh.
National Sun Yat-sen University
Executive Master of Business Administration
103
IDC estimates that the big data market will exhibit strong growth over the next five years. Market activities can create demand for big data analysis, where big data refers to the ever-increasing volume, variety, velocity, variability and complexity of information. For marketing organizations, big data is the fundamental consequence of the new marketing landscape, born from the digital world we now live in. In business, big data analysis can help market analysts distinguish different consumer groups in a consumer database and summarize the consumption patterns or spending habits of each group; these patterns can be regarded as information-picking modules and aggregated into categories or clusters, whose specific characteristics are then analyzed and summarized. Through this case study, we found direct and indirect effects of big data marketing strategies on consumers, and we summarize, from an analysis of market-demand-creating behavior, the strategies that stimulate consumer behavior and meet marketing needs with market potential.
黃日佳. "Designing of Virtual Matrix of Big Data Analysis." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/wn6zev.
KAO, SHIH-YAN, and 高式彥. "Constructing Kano Model by Using Big Data Analysis." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/79061541046176865069.
Feng Chia University
Department of Industrial Engineering and Systems Management
105
Big data has long been a hot topic and can potentially be applied in many organizations that collect enormous amounts of data. In data mining nowadays, various types of analytical methods have been developed for solving specific mining problems that fit the data type. This study uses big data analysis to construct the Kano model: the association rules mined from the collected product dataset with the Apriori algorithm replace the questionnaire step in traditional Kano model analysis. Finally, the study uses a wine product dataset to illustrate the proposed method, and a comparison with the traditional method is also discussed.
Chuang, Che-Wei, and 莊哲偉. "A Scalable Storage System for Big Data Analysis." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/35852997098724933407.
National Chiao Tung University
Department of Electronics Engineering, Institute of Electronics
104
Recently, machine learning has been widely used in various areas. Since machine learning is basically big data analysis, which requires large amounts of computation and storage, it can be efficiently accelerated only if computing capability and storage equipment are both properly optimized. We explore a hardware/software co-design platform for big data analysis with machine learning capability and storage scalability, to attack the two major problems in machine learning: power and speed. To verify this concept, we built a scalable storage system on Hadoop that adopts a heterogeneous architecture (CPU+FPGA) for acceleration and power reduction. This thesis introduces the platform's parameter settings in detail, ports the well-known K-means clustering algorithm onto the platform, and profiles CPU-only versus CPU+FPGA clustering in speed and power. Based on the profiling results, we can claim that this architecture really works. It differs from the CPU+GPU multi-core cluster solution proposed by Microsoft in 2009; besides its acceleration capability, another advantage is that the architecture can be implemented as an ASIC prototype and offers a rather accurate prediction of the acceleration after tape-out. On this platform we implement the circuit on an FPGA at 120 MHz, yet the same circuit passes a 200 MHz test simulated in UMC 90 nm technology, which gives a prediction of the achievable speed. The final acceleration is around 25 times faster than the unaccelerated A9 CPU cluster.
Tsai, Ming-Chun, and 蔡明純. "Big Semiconductor Manufacturing Data Analysis Using Cloud Technique." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/36458920854256695872.
National Sun Yat-sen University
Department of Computer Science and Engineering
101
In the semiconductor manufacturing industry, one of the key factors in improving wafer quality is to analyze the existing logs and find the probable causative parameters affecting wafer yield. Due to the huge amount of data and the large number of parameters recorded in the logs, it is difficult for traditional statistical and relational analysis to process such big data and find the critical parameters affecting yield. To conquer this analysis bottleneck, we take advantage of the high-performance computing of MapReduce and design a novel cloud technique named the island-based cloud genetic algorithm (ICGA) to mine the critical information. ICGA integrates a cloud genetic algorithm with a k-nearest-neighbor (KNN) classifier. In addition, we adopt statistical outlier detection to find the sensitive parameters. Eventually, the critical parameters discovered by ICGA and the sensitive parameters detected by outlier detection are cross-verified to obtain the most discriminative parameters, which are used to classify good and bad wafers. Experimental results show that these parameters discriminate between good wafers and bad ones with 100% accuracy. In addition, compared with a standalone GA, ICGA speeds up the computation by more than 4 times.
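ICGA itself is not public in this abstract; as a single-population, single-machine sketch of the GA-plus-KNN idea (a bit string selects parameters and KNN cross-validation accuracy is the fitness), on synthetic data:

```python
# Sketch: GA-based parameter selection with a KNN fitness, a serial stand-in
# for the island-based cloud genetic algorithm (ICGA).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=30, n_informative=5,
                           random_state=0)

def fitness(mask):
    if mask.sum() == 0:
        return 0.0
    knn = KNeighborsClassifier(n_neighbors=5)
    return cross_val_score(knn, X[:, mask.astype(bool)], y, cv=3).mean()

pop = rng.integers(0, 2, size=(20, X.shape[1]))      # 20 random bit strings
for gen in range(15):
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[-10:]]          # top half survives
    kids = []
    for _ in range(10):
        a, b = parents[rng.integers(10)], parents[rng.integers(10)]
        cut = rng.integers(1, X.shape[1])            # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        flip = rng.random(X.shape[1]) < 0.02         # bit-flip mutation
        kids.append(np.where(flip, 1 - child, child))
    pop = np.vstack([parents, kids])

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("selected parameters:", np.flatnonzero(best))
```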
YEH, JOBA, and 葉佩峰. "Research on Big Data Analysis Platform and Services." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/m9d8qw.
Lunghwa University of Science and Technology
Master's Program, Department of Information Management
106
With the development of Internet technology and the innovation of the Internet of Things, businesses have accumulated huge amounts of data of different types, and traditional data processing techniques are insufficient to handle such increasingly diverse data. Confronting these huge amounts of data and creating high added value from them in business activities are new challenges many companies have faced in recent years. Compared with other countries, Taiwan has had fewer teaching cases and academic studies of big data analysis platforms; this thesis makes an effort in that direction. Hadoop, started as an Apache Foundation subproject in 2005, opened the door for big data techniques research. Among the various commercial distributions, the release known for operating on a business model while providing high compatibility and stability is Cloudera. Based on Cloudera, this study explores Hadoop techniques and the corresponding virtualization and backup strategies, divided into two application parts. The first part explores building an Open Virtualization Format (.OVF) appliance for teaching and personal use. The second part takes Lunghwa University of Science and Technology as a case study, exploring a strategy for building a powerful backup cluster with VMware ESXi and MKSBackup under limited resources. This study can serve as a reference for most SMEs, SOHO groups, or colleges with scarce resources; its results make a self-built big data platform easy to implement and elevate the technical level of big data analysis platforms.
Chang, Fu-Chi, and 張富祺. "Trend Forecasting of Influenza Using Big Data Analysis." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/c22hst.
National Taiwan University of Science and Technology
Department of Electrical Engineering
105
Accurately tracking the outbreak of an infectious disease like influenza helps public health authorities make timely and significant decisions that can calm public fear and save lives. A traditional disease-tracking system based on confirmed cases typically reports an outbreak with at least a one-week lag. Therefore, surveillance systems that monitor indirect signals about influenza have been proposed to provide earlier detection. The volume of such signals is huge, and they can be picked out of social networks or search databases; Yahoo and Google, the top two internet search providers who own such big data, have conducted disease-tracking research before. In this study, we first draw influenza signals from the CDC (Centers for Disease Control, Taiwan), Google Trends, and KingNet databases, and then investigate the linear and nonlinear relations between the three. We found a high correlation between the series drawn from the three databases for the years under survey (2011-2016), regardless of linear or nonlinear analysis. Furthermore, we propose a nonlinear tracking model to capture changes in the epidemic trend, which detects influenza outbreaks earlier in years with heavy infection. These results prove that signals exposed on networks can provide rich material for tracking trends in human society.
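As a small sketch of the lag analysis such tracking rests on, the following computes lagged Pearson correlations between an official case series and a search-trend series; the synthetic data merely stand in for the CDC and Google Trends series:

```python
# Sketch: lagged correlation between confirmed-case counts and a search
# signal; synthetic series stand in for the CDC / Google Trends data.
import numpy as np

rng = np.random.default_rng(7)
weeks = 260
trend = rng.gamma(2.0, 1.0, weeks)                    # proxy: search volume
cdc = np.roll(trend, 1) + rng.normal(0, 0.2, weeks)   # cases lag by one week

def lagged_corr(a, b, lag):
    """corr(a[t], b[t - lag]); positive lag means b leads a."""
    if lag > 0:
        a, b = a[lag:], b[:-lag]
    elif lag < 0:
        a, b = a[:lag], b[-lag:]
    return np.corrcoef(a, b)[0, 1]

for lag in range(-2, 3):
    print(f"lag {lag:+d}: r = {lagged_corr(cdc, trend, lag):.2f}")
# r peaks at lag +1: the search signal leads the official counts
```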
WU, CHUN-YI, and 吳俊逸. "Improve Warehouse Carousel Utilization via Big Data Analysis." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/7pgmj3.
National Yunlin University of Science and Technology
Department of Information Management
107
Large warehouses usually comprise both an automatic picking system and a manual picking system. Automatic picking equipment is expensive but efficient, so picking time can be significantly reduced; however, the storage space of the equipment is limited. It is therefore necessary to consider which SKUs, and in what quantities, should be allocated to the automatic system to achieve optimal utilization. In this study, we propose SKU allocation models based on big data analysis of historical data to improve the picking rate of the carousel system. Through ABC classification of SKUs and PCB analysis of outbound packages, we design models for determining the maximum stock and safety stock of individual SKUs in the carousel. Safety stock is determined by two strategies, moving average and weighted moving average, and maximum stock is set based on the safety stock. In essence, SKUs with a high purchase rate in the form of carton packages should be stored in the carousel. By analyzing a large amount of historical data and simulating the storage models, we compare model performance and determine the combination of model parameters that achieves the best utilization of the carousel and improves picking efficiency.
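The models' actual parameters are determined from historical data; as a toy sketch of the two safety-stock strategies named above (window sizes, weights, safety factor and the demand series are all assumptions):

```python
# Sketch: safety stock and maximum stock from (weighted) moving averages of
# daily pick quantities for one SKU; every parameter is an assumption.
def moving_avg(xs, n):
    return sum(xs[-n:]) / n

def weighted_moving_avg(xs, weights):        # newest day gets the largest weight
    tail = xs[-len(weights):]
    return sum(w * x for w, x in zip(weights, tail)) / sum(weights)

daily_picks = [120, 90, 160, 140, 150, 170, 130]   # last 7 days, one SKU

sma = moving_avg(daily_picks, 7)
wma = weighted_moving_avg(daily_picks, [1, 1, 2, 2, 3, 3, 4])
safety_stock = max(sma, wma) * 1.5                 # assumed safety factor
max_stock = safety_stock * 2                       # assumed maximum multiple

print(f"SMA={sma:.0f}, WMA={wma:.0f}, "
      f"safety={safety_stock:.0f}, max={max_stock:.0f}")
```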
LI, YVEN-YANG, and 李岳揚. "Big Data Analysis of Process Equipment Quality Factors." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/94ew3s.
National Chung Cheng University
Executive Master's Program, Department of Business Administration
107
Under the trend of Industry 4.0, guiding the manufacturing industry's continuous transformation toward smart manufacturing, turning the data produced in production into useful information, and implementing big data analysis to uncover process variations will help enterprises survive and profit and gain customers' appreciation. Accordingly, data mining tools can deliver timely, real-time data analysis, and combining academic techniques with industrial applications will be a key subject for enterprise upgrading in the future. The purpose of this study is to explore how to find the parameter variation points of machines from production data with the support of decision tree models. We investigate how X factors influence Y factors in the manufacturing process, based on the results of the production process. By mining the data collected from manufacturing behavior, the manufacturing processes are expected to be optimized and improved to reduce manufacturing variation. The results of this series of explorations can help firms and managers improve the quality of other manufacturing processes in the future.
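As a hedged sketch of the decision-tree step (process parameters X explaining a quality outcome Y), with an invented variation rule hidden in synthetic data:

```python
# Sketch: a decision tree relating process parameters (X) to a binary quality
# outcome (Y); data, thresholds and column names are invented.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(3)
n = 500
X = np.column_stack([rng.normal(200, 10, n),   # temperature
                     rng.normal(5, 1, n),      # pressure
                     rng.normal(30, 5, n)])    # speed
y = ((X[:, 0] > 205) & (X[:, 1] < 4.5)).astype(int)   # hidden variation rule

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["temperature", "pressure", "speed"]))
print("importances:", tree.feature_importances_.round(2))
```

The printed tree recovers the planted thresholds, which is exactly the kind of variation point the study looks for in machine logs.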
Luh, Chien-huan, and 陸建寰. "Big Data Analysis Applied To Retailers Recommended System." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/urgka7.
Tatung University
Department of Information Management
106
With the rapid development of information and network technology, huge amounts of data have become a new trend in the global information and services field. Given the complexity and diversity of information, finding truly useful information within huge amounts of data through big data analysis is the key for companies to win the business competition. Although e-commerce is convenient and fast, there are still many customers who like to personally touch, try on, and purchase goods in retail stores. This study applies a recommendation system to the retail store, bringing the situation closer to the real one, and proposes a modified collaborative filtering method with weight distribution to increase its accuracy. With the aid of the personalized recommendation system, consumers can find the products they need more quickly. Moreover, the system can analyze consumers' past purchase information and thus predict their preferences for the products they buy, so companies can provide consumers with more appropriate services and reduce wasted shopping time. In addition to improving customer loyalty and realizing industrial intelligence, big data analysis can bring more profit to the industry.
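The thesis's weight distribution scheme is not detailed in this abstract; the sketch below shows plain user-based collaborative filtering where cosine similarities act as the weights, on an invented purchase matrix:

```python
# Sketch: user-based collaborative filtering with similarity weights; cosine
# weights stand in for the thesis's modified weight distribution.
import numpy as np

# rows: customers, columns: products, values: purchase scores (invented)
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)

def predict(R, user, item):
    weighted = []
    for v in range(R.shape[0]):
        if v != user and R[v, item] > 0:               # users who bought `item`
            cos = (R[user] @ R[v]) / (np.linalg.norm(R[user])
                                      * np.linalg.norm(R[v]))
            weighted.append((cos, R[v, item]))
    if not weighted:
        return 0.0
    w = np.array([c for c, _ in weighted])
    r = np.array([x for _, x in weighted])
    return float(w @ r / w.sum())                      # weighted average score

print(round(predict(R, user=0, item=2), 2))            # score for unseen product
```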