To see the other types of publications on this topic, follow the link: Big data analysis.

Dissertations / Theses on the topic 'Big data analysis'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 dissertations / theses for your research on the topic 'Big data analysis.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Uřídil, Martin. "Big data - použití v bankovní sféře." Master's thesis, Vysoká škola ekonomická v Praze, 2012. http://www.nusl.cz/ntk/nusl-149908.

Full text
Abstract:
There is a growing volume of global data, which offers new possibilities for those market participants who know how to take advantage of it. Data, information and knowledge are a new, highly regarded commodity, especially in the banking industry. Traditional data analytics is intended for processing data with a known structure and meaning. But how can we get knowledge from data with no such structure? The thesis focuses on Big Data analytics and its use in the banking and financial industry. The main goals of the thesis are to define specific applications in this area and to describe the benefits for international and Czech banking institutions. The thesis is divided into four parts. The first part defines the Big Data trend, and the second part specifies activities and tools in banking. The purpose of the third part is to apply Big Data analytics to those activities and show its possible benefits. The last part focuses on the particularities of Czech banking and shows the actual situation regarding Big Data in Czech banks. The thesis gives a comprehensive description of the possibilities of using Big Data analytics. I see my personal contribution in the detailed characterization of its application in real banking activities.
APA, Harvard, Vancouver, ISO, and other styles
2

Magnusson, Jonathan. "Social Network Analysis Utilizing Big Data Technology." Thesis, Uppsala universitet, Avdelningen för datalogi, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-170926.

Full text
Abstract:
Of late, there has been an immense increase in data within modern society. This is evident within the field of telecommunications, where the amount of mobile data is growing fast. For a telecommunication operator, this provides a means of getting more information about specific subscribers. The applications of this are many, such as segmentation for marketing purposes or detection of churners, people about to switch operators. The analysis and information extraction is therefore of great value. One approach to this analysis is social network analysis. Utilizing such methods yields ways of finding the importance of each individual subscriber in the network. This thesis aims at investigating the usefulness of social network analysis in telecommunication networks. As these networks can be very large, the methods used to study them must scale linearly when the network size increases. Thus, an integral part of the study is to determine which social network analysis algorithms have this scalability. Moreover, comparisons of software solutions are performed to find products suitable for these specific tasks. Another important part of using social network analysis is being able to interpret the results, which can be cumbersome without expert knowledge. For that reason, a complete process flow for finding influential subscribers in a telecommunication network has been developed. The flow uses input easily available to the telecommunication operator. In addition to using social network analysis, machine learning is employed to uncover what behavior is associated with influence and to pinpoint subscribers behaving accordingly.
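As a minimal illustration of the kind of linearly scaling network measure such a process flow might start from (the thesis does not publish its algorithms or data here, so the call-record edge list and the choice of degree centrality below are assumptions), subscriber importance can be approximated in a single pass over the interaction edges:

```python
from collections import Counter

def degree_centrality(edges):
    """Count interactions per subscriber in one pass over the edge list.

    Runs in O(number of edges) time, i.e. it scales linearly as the
    network grows, which is the property the thesis requires.
    """
    degree = Counter()
    for caller, callee in edges:
        degree[caller] += 1
        degree[callee] += 1
    return degree

# Hypothetical call-detail records as (caller, callee) pairs.
calls = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "A"), ("E", "A")]
print(degree_centrality(calls).most_common(3))  # most connected subscribers first
```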
APA, Harvard, Vancouver, ISO, and other styles
3

Šoltýs, Matej. "Big Data v technológiách IBM." Master's thesis, Vysoká škola ekonomická v Praze, 2014. http://www.nusl.cz/ntk/nusl-193914.

Full text
Abstract:
This diploma thesis presents Big Data technologies and their possible use cases and applications. The theoretical part initially focuses on the definition of the term Big Data and afterwards on Big Data technology, particularly the Hadoop framework. It describes the principles of Hadoop, such as distributed storage and data processing, and its individual components. Furthermore, the largest vendors of Big Data technologies are presented. At the end of this part, possible use cases of Big Data technologies are described, along with some case studies. The practical part describes the implementation of a demo example of Big Data technologies and is divided into two chapters. The first chapter of the practical part deals with the conceptual design of the demo example, the products used and the architecture of the solution. The implementation of the demo example is then described in the second chapter, from the preparation of the demo environment to the creation of the applications. The goals of this thesis are the description and characterization of Big Data, the presentation of the largest vendors and their Big Data products, the description of possible use cases of Big Data technologies and, above all, the implementation of a demo example using Big Data tools from IBM.
APA, Harvard, Vancouver, ISO, and other styles
4

Kumar, Abhinav. "SensAnalysis: A Big Data Platform for Vibration-Sensor Data Analysis." Thesis, Virginia Tech, 2019. http://hdl.handle.net/10919/101529.

Full text
Abstract:
The Goodwin Hall building on the Virginia Tech campus is the most instrumented building for vibration monitoring. It houses 225 hard-wired accelerometers which record vibrations arising from internal as well as external activities. The recorded vibration data can be used to develop real-time applications for monitoring the health of the building or detecting human activity in the building. However, the lack of infrastructure to handle the massive scale of the data, and the steep learning curve of the tools required to store and process the data, are major deterrents to researchers performing their experiments. Additionally, researchers want to explore the data to determine the type of experiments they can perform. This work addresses these problems by providing a system to store and process the data using existing big data technologies. The system simplifies the process of big data analysis by supporting code re-usability and multiple programming languages. The effectiveness of the system was demonstrated by four case studies. Additionally, three visualizations were developed to help researchers in initial data exploration.
Master of Science
APA, Harvard, Vancouver, ISO, and other styles
5

Santos, Lúcio Fernandes Dutra. "Similaridade em big data." Universidade de São Paulo, 2017. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-07022018-104929/.

Full text
Abstract:
The data being collected and generated nowadays increase not only in volume but also in complexity, requiring new query operators. Health care centers collecting image exams and remote sensing from satellites and from earth-based stations are examples of application domains where more powerful and flexible operators are required. Storing, retrieving and analyzing data that are huge in volume, structure, complexity and distribution are now referred to as big data. Representing and querying big data using only the traditional scalar data types is no longer enough. Similarity queries are the most pursued resource to retrieve complex data, but until recently they were not available in Database Management Systems. Now that they are starting to become available, their first uses in real systems make it clear that the basic similarity query operators are not enough to meet the requirements of the target applications. The main reason is that similarity is a concept usually formulated considering only small amounts of data elements. Nowadays, researchers target handling big data mainly using parallel architectures, and only a few studies exist on the efficacy of the query answers. This Ph.D. work develops variations of the basic similarity operators that are better suited to handling big data, presenting a more holistic vision of the database and increasing the effectiveness of the provided answers without a considerable impact on the efficiency of the search algorithms, while enabling their scalable execution over large volumes of data. To achieve this goal, four main contributions are presented. The first is a result diversification model that can be applied with any comparison criterion and similarity search operator. The second focuses on defining sampling and grouping techniques with the proposed diversification model, aiming at speeding up the analysis of the result sets. The third contribution concentrates on evaluation methods for measuring the quality of diversified result sets. Finally, the last one defines an approach to integrate the concepts of visual data mining and similarity-with-diversity searches in content-based retrieval systems, allowing a better understanding of how the diversity property is applied in the query process.
APA, Harvard, Vancouver, ISO, and other styles
6

Zhu, Shuxiang. "Big Data System to Support Natural Disaster Analysis." Case Western Reserve University School of Graduate Studies / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=case1592404690195316.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Islam, Md Zahidul. "A Cloud Based Platform for Big Data Science." Thesis, Linköpings universitet, Programvara och system, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-103700.

Full text
Abstract:
With the advent of cloud computing, resizable, scalable infrastructures for data processing are now available to everyone. Software platforms and frameworks that support data-intensive distributed applications, such as Amazon Web Services and Apache Hadoop, provide users with the necessary tools and infrastructure to work with thousands of scalable computers and process terabytes of data. However, writing scalable applications that run on top of these distributed frameworks is still a demanding and challenging task. The thesis aimed to advance the core scientific and technological means of managing, analyzing, visualizing, and extracting useful information from large data sets, collectively known as "big data". The term "big data" in this thesis refers to large, diverse, complex, longitudinal and/or distributed data sets generated from instruments, sensors, internet transactions, email, social networks, Twitter streams, and/or all digital sources available today and in the future. We introduced architectures and concepts for implementing a cloud-based infrastructure for analyzing large volumes of semi-structured and unstructured data. We built and evaluated an application prototype for collecting, organizing, processing, visualizing and analyzing data from the retail industry gathered from indoor navigation systems and social networks (Twitter, Facebook, etc.). Our finding was that developing a large-scale data analysis platform is often quite complex when the processed data are expected to grow continuously in the future. The architecture varies depending on the requirements. If we want to build a data warehouse and analyze the data afterwards (batch processing), the best choices are Hadoop clusters with Pig or Hive; this architecture has been proven at Facebook and Yahoo for years. On the other hand, if the application involves real-time data analytics, the recommendation is Hadoop clusters with Storm, which has been used successfully at Twitter. After evaluating the developed prototype, we introduced a new architecture able to handle large-scale batch and real-time data. We also proposed an upgrade of the existing prototype to handle real-time indoor navigation data.
APA, Harvard, Vancouver, ISO, and other styles
8

Santos, Rivera Juan De Dios. "Data Analysis on Hadoop - finding tools and applications for Big Data challenges." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-260557.

Full text
Abstract:
With the increasing amount of data generated each day, recent developments in software provide the tools needed to tackle the challenges of the so-called Big Data era. This project introduces some of these platforms; in particular, it focuses on platforms for data analysis and query tools that work alongside Hadoop. In the first part of this project, the Hadoop framework and its main components, MapReduce, YARN and HDFS, are introduced. This is followed by an overview of seven platforms that are part of the Hadoop ecosystem, presenting their key features, components, programming models and architectures. The following chapter introduces 12 parameters that are used to compare these platforms side by side, and it ends with a summary and discussion where they are divided into several classes according to their usage, use cases and data environment. In the last part of this project, an analysis of web logs belonging to one of Sweden's top newspapers was carried out using Apache Spark, one of the platforms analyzed. The purpose of this analysis was to showcase some of the features of Spark while performing an exploratory data analysis.
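As a sketch of the kind of exploratory web-log analysis described above (the newspaper's actual log format and fields are not given, so the whitespace-delimited layout and URL position assumed below are illustrative), counting requests per URL in PySpark might look like:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("weblog-eda").getOrCreate()

# Assumed layout: whitespace-separated fields with the requested URL
# in the seventh position, as in common/combined log formats.
logs = spark.sparkContext.textFile("hdfs:///data/weblogs/*.log")

url_counts = (
    logs.map(lambda line: line.split())
        .filter(lambda fields: len(fields) > 6)
        .map(lambda fields: (fields[6], 1))
        .reduceByKey(lambda a, b: a + b)
)

# Ten most requested URLs, computed in parallel across the cluster.
for url, count in url_counts.takeOrdered(10, key=lambda kv: -kv[1]):
    print(url, count)

spark.stop()
```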
APA, Harvard, Vancouver, ISO, and other styles
9

Pragarauskaitė, Julija. "Frequent pattern analysis for decision making in big data." Doctoral thesis, Lithuanian Academic Libraries Network (LABT), 2013. http://vddb.laba.lt/obj/LT-eLABa-0001:E.02~2013~D_20130701_092451-80961.

Full text
Abstract:
Huge amounts of digital information are stored in the world today, and the amount is increasing by quintillions of bytes every day. Approximate data mining algorithms are very important for dealing efficiently with such amounts of data due to the computation speed required by various real-world applications, whereas exact data mining methods tend to be slow and are best employed where precise results are of the highest importance. This thesis focuses on several data mining tasks related to the analysis of big data: frequent pattern mining and visual representation. For mining frequent patterns in big data, three novel approximate methods are proposed and evaluated on real and artificial databases: • The Random Sampling Method (RSM) creates a random sample from the original database and makes assumptions about the frequent and rare sequences based on the analysis of the random sample. A significant benefit is a theoretical estimate of the classification errors made by this method using standard statistical methods. • The Multiple Re-sampling Method (MRM) is an improved version of RSM with a re-sampling strategy that decreases the probability of incorrectly classifying sequences as frequent or rare. • The Markov Property Based Method (MPBM) relies upon the Markov property. MPBM requires reading the original database several times (the number equals the order of the Markov process) and then calculates the empirical frequencies using the Markov property. For visual representation... [to full text]
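A minimal sketch of the random-sampling idea behind RSM follows; the single-item transactions, sample size and support threshold are illustrative assumptions rather than the thesis's actual parameters:

```python
import random
from collections import Counter

def estimate_frequent_items(database, sample_size, min_support):
    """Classify items as frequent or rare from a random sample.

    Items whose relative frequency in the sample reaches min_support are
    assumed frequent in the full database; everything else is assumed rare.
    """
    sample = random.sample(database, sample_size)
    counts = Counter(sample)
    return {item for item, c in counts.items() if c / sample_size >= min_support}

# Toy database of single-item transactions (illustrative only).
db = ["a"] * 500 + ["b"] * 300 + ["c"] * 10 + ["d"] * 5
print(estimate_frequent_items(db, sample_size=200, min_support=0.1))  # likely {'a', 'b'}
```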
APA, Harvard, Vancouver, ISO, and other styles
10

Kinuthia, Charles, and Ming Peng. "Hardware Implementation and Analysis of a Big Data Algorithm." Thesis, KTH, Skolan för elektro- och systemteknik (EES), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-200611.

Full text
APA, Harvard, Vancouver, ISO, and other styles
11

Su, Yu. "Big Data Management Framework based on Virtualization and Bitmap Data Summarization." The Ohio State University, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=osu1420738636.

Full text
APA, Harvard, Vancouver, ISO, and other styles
12

Sohangir, Soroosh. "MACHINE LEARNING ALGORITHM PERFORMANCE OPTIMIZATION: SOLVING ISSUES OF BIG DATA ANALYSIS." OpenSIUC, 2015. https://opensiuc.lib.siu.edu/dissertations/1111.

Full text
Abstract:
Because of the high complexity in time and space, generating machine learning models for big data is difficult. This research introduces a novel approach to optimizing the performance of learning algorithms, with a particular focus on big data manipulation. To implement this method, a machine learning platform using eighteen machine learning algorithms was built. This platform was tested using four different use cases, and the results are illustrated and analyzed.
APA, Harvard, Vancouver, ISO, and other styles
13

Lu, Feng. "Big data scalability for high throughput processing and analysis of vehicle engineering data." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-207084.

Full text
Abstract:
"Sympathy for Data" is a platform that is utilized for Big Data automation analytics. It is based on visual interface and workflow configurations. The main purpose of the platform is to reuse parts of code for structured analysis of vehicle engineering data. However, there are some performance issues on a single machine for processing a large amount of data in Sympathy for Data. There are also disk and CPU IO intensive issues when the data is oversized and the platform need fits comfortably in memory. In addition, for data over the TB or PB level, the Sympathy for data needs separate functionality for efficient processing simultaneously and scalable for distributed computation functionality. This paper focuses on exploring the possibilities and limitations in using the Sympathy for Data platform in various data analytic scenarios within the Volvo Cars vision and strategy. This project re-writes the CDE workflow for over 300 nodes into pure Python script code and make it executable on the Apache Spark and Dask infrastructure. We explore and compare both distributed computing frameworks implemented on Amazon Web Service EC2 used for 4 machine with a 4x type for distributed cluster measurement. However, the benchmark results show that Spark is superior to Dask from performance perspective. Apache Spark and Dask will combine with Sympathy for Data products for a Big Data processing engine to optimize the system disk and CPU IO utilization. There are several challenges when using Spark and Dask to analyze large-scale scientific data on systems. For instance, parallel file systems are shared among all computing machines, in contrast to shared-nothing architectures. Moreover, accessing data stored in commonly used scientific data formats, such as HDF5 is not tentatively supported in Spark. This report presents research carried out on the next generation of Big Data platforms in the automotive industry called "Sympathy for Data". The research questions focusing on improving the I/O performance and scalable distributed function to promote Big Data analytics. During this project, we used the Dask.Array parallelism features for interpretation the data sources as a raster shows in table format, and Apache Spark used as data processing engine for parallelism to load data sources to memory for improving the big data computation capacity. The experiments chapter will demonstrate 640GB of engineering data benchmark for single node and distributed computation mode to evaluate the Sympathy for Data Disk CPU and memory metrics. Finally, the outcome of this project improved the six times performance of the original Sympathy for data by developing a middleware SparkImporter. It is used in Sympathy for Data for distributed computation and connected to the Apache Spark for data processing through the maximum utilization of the system resources. This improves its throughput, scalability, and performance. It also increases the capacity of the Sympathy for data to process Big Data and avoids big data cluster infrastructures.
APA, Harvard, Vancouver, ISO, and other styles
14

Bin, Saip Mohamed A. "Big Social Data Analytics: A Model for the Public Sector." Thesis, University of Bradford, 2019. http://hdl.handle.net/10454/18352.

Full text
Abstract:
The influence of Information and Communication Technologies (ICTs), particularly internet technology, has had a fundamental impact on the way government is administered, provides services and interacts with citizens. Currently, the use of social media is no longer limited to informal environments but is an increasingly important medium of communication between citizens and governments. The extensive and increasing use of social media will continue to generate huge amounts of user-generated content known as Big Social Data (BSD). The growing body of BSD presents innumerable opportunities as well as challenges for local government planning, management and delivery of public services to citizens. However, governments have not yet utilised the potential of BSD to better understand the public and gain new insights from this new form of interaction. Among the reasons are the lack of mechanisms and guidance for analysing this new format of data. Thus, the aim of this study is to evaluate how the body of BSD can be mined, analysed and applied in the context of local government in the UK. The objective is to develop a Big Social Data Analytics (BSDA) model that can be applied in the case of local government. Data generated from social media over a year were collected, collated and analysed using a range of social media analytics and network analysis tools and techniques. The final BSDA model was applied to a local council case to evaluate its impact in real practice. This study allows a better understanding of the methods of analysing BSD in the public sector and extends the literature related to e-government, social media, and social network theory.
Universiti Utara Malaysia
APA, Harvard, Vancouver, ISO, and other styles
15

Chaudhuri, Abon. "Geometric and Statistical Summaries for Big Data Visualization." The Ohio State University, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=osu1382235351.

Full text
APA, Harvard, Vancouver, ISO, and other styles
16

Rivetti, di Val Cervo Nicolo. "Efficient Stream Analysis and its Application to Big Data Processing." Thesis, Nantes, 2016. http://www.theses.fr/2016NANT4046/document.

Full text
Abstract:
Nowadays stream analysis is used in many contexts where the amount of data and/or the rate at which it is generated rules out other approaches (e.g., batch processing). The data streaming model provides randomized and/or approximated solutions to compute specific functions over (distributed) streams of data items in worst-case scenarios, while striving for small resource usage. In particular, we look into two classical and related data streaming problems: frequency estimation and (distributed) heavy hitters. A less common field of application is stream processing, which is somewhat complementary and more practical, providing efficient and highly scalable frameworks to perform soft real-time generic computation on streams, relying on cloud computing. This duality allows us to apply data streaming solutions to optimize stream processing systems. In this thesis, we provide a novel algorithm to track heavy hitters in distributed streams and two extensions of a well-known algorithm to estimate the frequencies of data items. We also tackle two related problems and their solutions: providing an even partitioning of the item universe based on item weights, and estimating the values carried by the items of the stream. We then apply these results to both network monitoring and stream processing. In particular, we leverage these solutions to perform load shedding as well as to load-balance parallelized operators in stream processing systems.
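The abstract does not name the well-known frequency-estimation algorithm it extends, so as a representative example of this class of small-memory stream summaries, a basic Count-Min Sketch is shown below; the width and depth values are illustrative assumptions:

```python
import hashlib

class CountMinSketch:
    """Approximate per-item frequencies in bounded memory (never under-estimates)."""

    def __init__(self, width=272, depth=5):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        digest = hashlib.sha256(f"{row}:{item}".encode()).hexdigest()
        return int(digest, 16) % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        return min(self.table[row][self._index(item, row)] for row in range(self.depth))

cms = CountMinSketch()
for token in ["x"] * 100 + ["y"] * 3:
    cms.add(token)
print(cms.estimate("x"), cms.estimate("y"))  # close to 100 and 3
```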
APA, Harvard, Vancouver, ISO, and other styles
17

Aring, Danielle C. "Integrated Real-Time Social Media Sentiment Analysis Service Using a Big Data Analytic Ecosystem." Cleveland State University / OhioLINK, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=csu1494359605127555.

Full text
APA, Harvard, Vancouver, ISO, and other styles
18

Abu, Salih Bilal Ahmad Abdal Rahman. "Trustworthiness in Social Big Data Incorporating Semantic Analysis, Machine Learning and Distributed Data Processing." Thesis, Curtin University, 2018. http://hdl.handle.net/20.500.11937/70285.

Full text
Abstract:
This thesis presents several state-of-the-art approaches constructed for the purpose of (i) studying the trustworthiness of users in Online Social Network platforms, (ii) deriving concealed knowledge from their textual content, and (iii) classifying and predicting the domain knowledge of users and their content. The developed approaches are refined through proof-of-concept experiments, several benchmark comparisons, and appropriate and rigorous evaluation metrics to verify and validate their effectiveness and efficiency, and hence, those of the applied frameworks.
APA, Harvard, Vancouver, ISO, and other styles
19

Nhlabano, Valentine Velaphi. "Fast Data Analysis Methods For Social Media Data." Diss., University of Pretoria, 2018. http://hdl.handle.net/2263/72546.

Full text
Abstract:
The advent of Web 2.0 technologies, which support the creation and publishing of various social media content in a collaborative and participatory way by all users in the form of user-generated content and social networks, has led to the creation of vast amounts of structured, semi-structured and unstructured data. The sudden rise of social media has led to its wide adoption by organisations of various sizes worldwide in order to take advantage of this new way of communicating and engaging with their stakeholders in ways that were unimaginable before. Data generated from social media are highly unstructured, which makes them challenging for most organisations, which are normally used to handling and analysing structured data from business transactions. The research reported in this dissertation was carried out to investigate fast and efficient methods available for retrieving, storing and analysing unstructured data from social media in order to make crucial and informed business decisions on time. Sentiment analysis was conducted on Twitter data, called tweets. Twitter, one of the most widely adopted social network services, provides an API (Application Programming Interface) for researchers and software developers to connect to and collect public data sets from the Twitter database. A Twitter application was created and used to collect streams of real-time public data via a Twitter source provided by Apache Flume, storing this data efficiently in the Hadoop Distributed File System (HDFS). Apache Flume is a distributed, reliable, and available system used to efficiently collect, aggregate and move large amounts of log data from many different sources to a centralized data store such as HDFS. Apache Hadoop is an open-source software library that runs on low-cost commodity hardware and has the ability to store, manage and analyse large amounts of both structured and unstructured data quickly, reliably, and flexibly at low cost. A lexicon-based sentiment analysis approach was taken, and the AFINN-111 lexicon was used for scoring. The Twitter data were analysed from HDFS using a Java MapReduce implementation. MapReduce is a programming model and an associated implementation for processing and generating big data sets with a parallel, distributed algorithm on a cluster. The results demonstrate that it is fast, efficient and economical to use this approach to analyse unstructured data from social media in real time.
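The dissertation implements the scoring step as a Java MapReduce job; the sketch below expresses the same lexicon-based map/reduce logic in Python, with a tiny stand-in dictionary rather than the real AFINN-111 word list:

```python
from collections import defaultdict

# Tiny stand-in for the AFINN-111 lexicon: word -> integer valence score.
AFINN_SAMPLE = {"good": 3, "great": 3, "love": 3, "bad": -3, "terrible": -3}

def map_phase(tweets):
    """Map: emit (tweet_id, score) for every sentiment-bearing word."""
    for tweet_id, text in tweets:
        for word in text.lower().split():
            if word in AFINN_SAMPLE:
                yield tweet_id, AFINN_SAMPLE[word]

def reduce_phase(pairs):
    """Reduce: sum the partial scores for each tweet."""
    totals = defaultdict(int)
    for tweet_id, score in pairs:
        totals[tweet_id] += score
    return dict(totals)

tweets = [(1, "I love this great phone"), (2, "terrible service and bad signal")]
print(reduce_phase(map_phase(tweets)))  # {1: 6, 2: -6}
```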
Dissertation (MSc)--University of Pretoria, 2019.
National Research Foundation (NRF) - Scarce skills
Computer Science
MSc
Unrestricted
APA, Harvard, Vancouver, ISO, and other styles
20

Giannini, Andrea. "Social Network Analysis: Architettura Streaming Big Data di Raccolta e Analisi Dati da Twitter." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2022. http://amslaurea.unibo.it/25378/.

Full text
Abstract:
In recent years, social media such as Facebook, Twitter, WhatsApp and YouTube have spread like wildfire. By now almost everyone logs into at least one of them daily to stay informed, express opinions and interact with other users. For this reason they have become fundamental for companies' marketing departments, being not only an excellent communication channel but also a source of information about customers and potential customers. This thesis focuses precisely on the latter aspect. The Social Network Analysis (SNA) project is intended as a tool through which the networks of interaction between users can be viewed and analysed in their entirety. The goal was to build SNA so that it collects data and updates itself in real time, keeping up with the latest developments given how dynamic information within social media is. A project like SNA involves facing several obstacles. Besides building an architecture that accommodates a continuous flow of information, one of the most important obstacles is managing the large volume of data. To do so, a distributed and easily scalable architecture was adopted, comprising cluster processing, serverless functions and NoSQL databases provisioned through Microsoft's cloud service, Azure. In this thesis, SNA was designed and implemented on the basis of Twitter, but the same idea can be applied to many other social media platforms.
APA, Harvard, Vancouver, ISO, and other styles
21

Abounia, Omran Behzad. "Application of Data Mining and Big Data Analytics in the Construction Industry." The Ohio State University, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=osu148069742849934.

Full text
APA, Harvard, Vancouver, ISO, and other styles
22

Dourado, Jonas Rossi. "Delayed Transfer Entropy applied to Big Data." Universidade de São Paulo, 2018. http://www.teses.usp.br/teses/disponiveis/18/18153/tde-19022019-134228/.

Full text
Abstract:
The recent popularization of technologies such as smartphones, wearables, the Internet of Things, social networks and video streaming has increased data creation. Dealing with such extensive data sets led to the creation of the term big data, often defined as data whose volume, acquisition rate or representation demands non-traditional approaches to data analysis or requires horizontal scaling for data processing. Analysis is the most important Big Data phase, with the objective of extracting meaningful and often hidden information. One example of hidden information in Big Data is causality, which can be inferred with Delayed Transfer Entropy (DTE). Despite DTE's wide applicability, it demands high processing power, which is aggravated by large datasets such as those found in big data. This research optimized DTE performance and modified existing code to enable DTE execution on a computer cluster. With the big data trend in sight, these results may enable the analysis of bigger datasets or better statistical evidence.
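For reference, transfer entropy from a source series X to a target series Y is commonly defined as below, with the delayed variant introducing a lag d on the source; this is the standard textbook form, not necessarily the exact formulation used in the thesis:

```latex
T_{X \to Y}(d) \;=\; \sum p\!\left(y_{t+1},\, y_t^{(k)},\, x_{t-d}^{(l)}\right)\,
\log \frac{p\!\left(y_{t+1} \mid y_t^{(k)},\, x_{t-d}^{(l)}\right)}
          {p\!\left(y_{t+1} \mid y_t^{(k)}\right)}
```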
APA, Harvard, Vancouver, ISO, and other styles
23

Wei, Jinliang. "Parallel Analysis of Aspect-Based Sentiment Summarization from Online Big-Data." Thesis, University of North Texas, 2019. https://digital.library.unt.edu/ark:/67531/metadc1505264/.

Full text
Abstract:
Consumers' opinions and sentiments about products can reflect the performance of those products in general or in various aspects. Analyzing these data is becoming feasible, considering the availability of immense data and the power of natural language processing. However, retailers have not taken full advantage of online comments. This work is dedicated to a solution for automatically analyzing and summarizing these valuable data at both the product and category levels. In this research, a system was developed to retrieve and analyze extensive data from public online resources. A parallel framework was created to make this system extensible and efficient. In this framework, a star topological network was adopted in which each computing unit was assigned to retrieve a fraction of the data and to assess sentiment. Finally, the preprocessed data were collected and summarized by the central machine, which generates the final result that can be rendered through a web interface. The system was designed to have sound performance, robustness, manageability, extensibility, and accuracy.
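A minimal sketch of the star-shaped division of work described above, using Python's multiprocessing pool as a stand-in for the actual cluster; the keyword-based scoring function and review texts are placeholders for the real sentiment model and data:

```python
from multiprocessing import Pool

def score_reviews(batch):
    """Worker: assess sentiment for its assigned fraction of the reviews.

    The keyword count below is only a placeholder scoring rule.
    """
    positive = {"good", "great", "excellent"}
    return [(text, sum(w in positive for w in text.lower().split())) for text in batch]

if __name__ == "__main__":
    reviews = ["Great battery life", "Screen is good", "Poor speaker", "Excellent camera"]
    batches = [reviews[i::2] for i in range(2)]  # split the data across 2 workers
    with Pool(processes=2) as pool:
        partial_results = pool.map(score_reviews, batches)
    # Central machine: collect and summarize the workers' partial results.
    summary = [item for part in partial_results for item in part]
    print(summary)
```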
APA, Harvard, Vancouver, ISO, and other styles
24

Cao, Hongfei. "High-throughput Visual Knowledge Analysis and Retrieval in Big Data Ecosystems." Thesis, University of Missouri - Columbia, 2019. http://pqdtopen.proquest.com/#viewpdf?dispub=13877134.

Full text
Abstract:

Visual knowledge plays an important role in many highly skilled applications, such as medical diagnosis, geospatial image analysis and pathology diagnosis. Medical practitioners are able to interpret and reason about diagnostic images based not only on primitive-level image features such as color, texture, and spatial distribution, but also on their experience and tacit knowledge, which are seldom articulated explicitly. This reasoning process is dynamic and closely related to real-time human cognition. Due to a lack of visual knowledge management and sharing tools, it is difficult to capture and transfer such tacit and hard-won expertise to novices. Moreover, many mission-critical applications require the ability to process such tacit visual knowledge in real time. Precisely how to index this visual knowledge computationally and systematically still poses a challenge to the computing community.

My dissertation research results in novel computational approaches for high-throughput visual knowledge analysis and retrieval from large-scale databases using the latest technologies in big data ecosystems. To provide a better understanding of visual reasoning, human gaze patterns are qualitatively measured spatially and temporally to model observers' cognitive process. These gaze patterns are then indexed in a NoSQL distributed database as a visual knowledge repository, which is accessed using various unique retrieval methods developed through this dissertation work. To provide meaningful retrievals in real time, deep-learning methods for automatic annotation of visual activities and streaming similarity comparisons are developed under a gaze-streaming framework using Apache Spark.

This research has several potential applications that offer a broader impact among the scientific community and in the practical world. First, the proposed framework can be adapted for different domains, such as fine arts, life sciences, etc. with minimal effort to capture human reasoning processes. Second, with its real-time visual knowledge search function, this framework can be used for training novices in the interpretation of domain images, by helping them learn experts’ reasoning processes. Third, by helping researchers to understand human visual reasoning, it may shed light on human semantics modeling. Finally, integrating reasoning process with multimedia data, future retrieval of media could embed human perceptual reasoning for database search beyond traditional content-based media retrievals.

APA, Harvard, Vancouver, ISO, and other styles
25

Sahasrabudhe, Aditya. "NBA 2020 Finals: Big Data Analysis of Fans’ Sentiments on Twitter." Ohio University / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1619784186362291.

Full text
APA, Harvard, Vancouver, ISO, and other styles
26

Li, Yen-de, and 李彥德. "Data Visualization Analysis of Big Data." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/rbkvqe.

Full text
Abstract:
Master's thesis
I-Shou University
Department of Information Management
105
The passage of time brings change to everything in daily life; advances in science and technology and innovations in materials are important factors in this progress. The Internet promotes more interaction and produces large amounts of data. Because the transactional data accumulated from business operations are increasing rapidly, studies that analyze big data to discover the information hidden in the huge volumes of data at hand become more and more worthwhile. In general, data visualization techniques can quickly provide users with a better way to understand data and avoid the inconvenience caused by overly complicated information. In this thesis, two business intelligence platforms, Opview and Tableau, are applied to analyze big data produced by the telecom industry. First, this work explores the top five telecom companies in Taiwan, namely Chunghwa Telecom, Taiwan Mobile, Far EasTone Telecommunications, Taiwan Star Telecom, and Asia Pacific Telecom, using the Opview social listening platform. For the five telecommunications companies, five key factors are set for analysis: broadband, reception, network, Wi-Fi and hotspots. From the results of our Opview analyses, we conclude that the significant factors of concern to end users are the network, broadband and reception. Thereafter, this study adopts Tableau to analyze broadband data provided by Chunghwa Telecom, examining demand, the volume of different bandwidths in recent years, and users' concerns through visual graphics. Moreover, this study analyzes a data set of sixty thousand records provided by Chunghwa Telecom using the Tableau business intelligence platform. The results of this study can provide a valuable visualization-research reference for follow-up researchers.
APA, Harvard, Vancouver, ISO, and other styles
27

Lima, Luciana Cristina Barbosa de. "Big data for data analysis in financial industry." Master's thesis, 2014. http://hdl.handle.net/1822/34919.

Full text
Abstract:
Integrated master's dissertation in Engineering and Management of Information Systems
Technological evolution and the consequent increase in society's and organizations' dependency on it have been important drivers of the escalation in the volume and variety of data. At the same time, market evolution requires the capability to find new ways to improve products/services and client satisfaction while avoiding the associated increase in costs. Big Data arrives with huge power, not only through the ability to process large amounts and varieties of data at high velocity, but also through the capability to create value for the organizations that include it in their operational and decision-making processes. The relevance of Big Data for different industries, and how to implement a Big Data solution, still raise many doubts and much discussion. Thus, this work takes a business orientation so that it is possible to understand what Big Data actually means for organizations. The project follows a top-down approach: in the first instance, an overview of what defines Big Data and what distinguishes it is given. As the project evolves, the focus is directed to the contribution of Big Data at the organizational level and the existing market offerings. The decomposition of the problem closes with two main contributions: a framework that helps to identify a problem as a Big Data problem, and a trial case study that identifies the correlation between financial news articles and changes in the stock exchange. The outcome of this trial was a platform with analytic and predictive capabilities in this new Big Data context.
APA, Harvard, Vancouver, ISO, and other styles
28

Pereira, Flávia Patricia Alves. "Big data e data analysis: visualização de informação." Master's thesis, 2015. http://hdl.handle.net/1822/40106.

Full text
Abstract:
Integrated master's dissertation in Engineering and Management of Information Systems
The information revolution is reaching all organizations of modern society, forcing experts in the field of Information Technology (IT) to transform their learning processes in order to create more value. Technology produces and stores large quantities of data so that information can be produced from them afterwards. Understanding heterogeneous data and recognizing the data that matter is the ultimate goal of the Big Data concept. The need to understand and extract information from a large set of data is a hard but essential process for organizations that deal with information. In this context comes the need to analyze, clean and transform data, a process called Data Analysis. The process guides the user to the most suitable technique depending on the purpose of the analysis. The technique studied in this thesis is Information Visualization (IV). IV is studied here with the main purpose of transmitting information in a clear and effective way through the use of graphic representations. The mapping of data into visual structures (graphic representations) provides a detailed view of the data context and its relations. The methods and techniques of IV have evolved over the last decades, in line with rampant technological progress; hence the need to redesign the Visualization Process Model to facilitate the creation of a visual representation. The main goal focuses on the optimization of visualization methods: the main purpose of producing a clear and efficient representation is to enhance the appropriation of data via graphic representations. To attain this purpose, a classification was formulated, "Visual Representation: what I intend to transmit", which contemplates the study of charts and of the analyses that arise when one wants to find or communicate patterns and trends in data. The classification was built as an artefact, with the specific purpose of helping the user decide which chart is the most suitable to bring out a given type of analysis. For this study, Design Science Research was chosen as the methodological approach, for the systematic classification of concepts and the construction of the classification. The user is a key agent in the Visualization Process Method: they should be able to recognize which analysis is most appropriate for their data and which type of chart is most useful for their work.
APA, Harvard, Vancouver, ISO, and other styles
29

WANG, SHAO-SIANG, and 王紹祥. "Comparative analysis of Big Data and Data Mining." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/r23d77.

Full text
Abstract:
Master's thesis
Ming Chuan University
Master's Program, Department of Information Management
106
With the rapid development of information technology, data are generated at high speed every day, and finding useful information within a large amount of information is an important issue for both industrial development and academic research. The main purpose of this study is to explore the scenarios to which data mining and big data are applicable. Through a review of the relevant research, this study aggregates comparative information on big data and data mining, covering architecture, usability, ease of use, economic aspects, enterprise scale, industry type and the ability to process data. The study first compares the similarities and differences between the two, and then applies the Modified Delphi method and the Analytic Hierarchy Process (AHP) to big data and data mining. Through a comparative analysis of the six facets of architecture, usability, ease of use, economics, industrial characteristics, and ability to process data, we derive the scenarios to which data mining and big data analysis are applicable.
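As a brief illustration of how AHP turns pairwise comparisons into facet weights (the comparison matrix below is invented for illustration and is not taken from the study):

```python
import numpy as np

# Pairwise comparison matrix for three illustrative facets, e.g.
# architecture vs. usability vs. economics, on Saaty's 1-9 scale:
# entry [i][j] states how much more important facet i is than facet j.
A = np.array([
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 2.0],
    [1 / 5, 1 / 2, 1.0],
])

# Approximate the principal eigenvector with the geometric-mean method.
geometric_means = A.prod(axis=1) ** (1.0 / A.shape[0])
weights = geometric_means / geometric_means.sum()
print(weights)  # relative importance of each facet, summing to 1
```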
APA, Harvard, Vancouver, ISO, and other styles
30

Han, Meng. "INFLUENCE ANALYSIS TOWARDS BIG SOCIAL DATA." 2017. http://scholarworks.gsu.edu/cs_diss/121.

Full text
Abstract:
Large-scale social data from online social networks, instant messaging applications, and wearable devices have recently seen exponential growth in the number of users and activities. The rapid proliferation of social data provides rich information and infinite possibilities for us to understand and analyze the complex inherent mechanism which governs the evolution of the new technology age. Influence, as a natural product of information diffusion (or propagation), representing the change in an individual's thoughts, attitudes, and behaviors resulting from interaction with others, is one of the fundamental processes in social worlds. Therefore, influence analysis occupies a very prominent place in social data analysis, theory, models, and algorithms. In this dissertation, we study influence analysis in the setting of big social data. Firstly, we investigate the uncertainty of influence relationships in social networks. A novel sampling scheme is proposed which enables the development of an efficient algorithm to measure uncertainty. Considering the practicality of neighborhood relationships in real social data, a framework is introduced to transform uncertain networks into deterministic weighted networks where the weight on an edge can be measured as a Jaccard-like index. Secondly, focusing on the dynamics of social data, a practical framework is proposed that probes only partial communities to explore the real changes in social network data. Our probing framework minimizes the possible difference between the observed topology and the actual network through several representative communities. We also propose an algorithm that takes full advantage of our divide-and-conquer strategy, which reduces the computational overhead. Thirdly, if we let the number of users who are influenced be the depth of propagation and the area covered by influenced users be the breadth, most research results focus only on the influence depth instead of the influence breadth. Timeliness, acceptance ratio, and breadth are three important factors that significantly affect the result of influence maximization in reality, but they are neglected by researchers most of the time. To fill the gap, a novel algorithm that incorporates time delay for timeliness, opportunistic selection for acceptance ratio, and broad diffusion for influence breadth has been investigated. In our model, the breadth of influence is measured by the number of covered communities, and the tradeoff between depth and breadth of influence can be balanced by a specific parameter. Furthermore, the problem of privacy-preserving influence maximization in both physical location networks and online social networks is addressed. We merge both the sensed location information collected from the cyber-physical world and the relationship information gathered from online social networks into a unified framework with a comprehensive model, and then propose a resolution of the influence maximization problem with an efficient algorithm. At the same time, a privacy-preserving mechanism is proposed to protect the cyber-physical location and link information at the application level. Last but not least, to address the challenge of large-scale data, we take the lead in designing an efficient influence maximization framework based on two new models which incorporate the dynamism of networks with consideration of time constraints during the influence spreading process in practice.
All proposed problems and models of influence analysis have been empirically studied and verified by different, large-scale, real-world social data in this dissertation.
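As a minimal illustration of the kind of influence-maximization primitive this dissertation builds on, the sketch below (not the author's algorithms) performs greedy seed selection under the independent cascade model, with the expected spread estimated by Monte Carlo simulation; the toy graph, propagation probability, and run counts are hypothetical.

```python
import random

def simulate_ic(graph, seeds, prob=0.1, runs=200):
    """Estimate the expected spread of `seeds` under the independent cascade model.
    `graph` maps a node to the list of its out-neighbours (toy, hypothetical input)."""
    total = 0
    for _ in range(runs):
        active = set(seeds)
        frontier = list(seeds)
        while frontier:
            nxt = []
            for u in frontier:
                for v in graph.get(u, []):
                    # each newly active node gets one chance to activate a neighbour
                    if v not in active and random.random() < prob:
                        active.add(v)
                        nxt.append(v)
            frontier = nxt
        total += len(active)
    return total / runs

def greedy_seeds(graph, k, prob=0.1, runs=200):
    """Pick k seeds one at a time by largest estimated marginal gain."""
    seeds = []
    for _ in range(k):
        base = simulate_ic(graph, seeds, prob, runs) if seeds else 0.0
        best, best_gain = None, -1.0
        for node in graph:
            if node in seeds:
                continue
            gain = simulate_ic(graph, seeds + [node], prob, runs) - base
            if gain > best_gain:
                best, best_gain = node, gain
        seeds.append(best)
    return seeds

if __name__ == "__main__":
    toy = {1: [2, 3], 2: [3, 4], 3: [4], 4: [5], 5: []}
    print(greedy_seeds(toy, k=2, prob=0.3))
```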
APA, Harvard, Vancouver, ISO, and other styles
31

TRIPATHI, ASHISH KUMAR. "BIG DATA ANALYSIS USING METAHEURISTIC ALGORITHMS." Thesis, 2018. http://dspace.dtu.ac.in:8080/jspui/handle/repository/16597.

Full text
Abstract:
Big Data has attracted huge attention from researchers in academia and industry for decision and strategy making. Thus, efficient data analysis methods are required for managing big datasets. Data clustering, a prominent analysis method in data mining, is widely employed in big data analysis since it does not require labeled datasets, which are not easily available for big data problems. K-means, one of the simplest and most popular algorithms, has been employed for solving various clustering problems. However, the results of the K-means algorithm are highly dependent on the initial cluster centroids, and it easily gets trapped in local optima. To mitigate this issue, a novel metaheuristic algorithm named the Military Dog Based Optimizer (MDBO) is introduced and validated against 17 benchmark functions. The proposed algorithm is also tested on 8 benchmark clustering datasets and compared with 5 recent state-of-the-art algorithms. Although the proposed algorithm achieves better clustering accuracy than the conventional methods, it fails to perform efficiently on big datasets in terms of memory and time complexity, due to its sequential execution. To overcome this issue, four novel methods are developed for efficient clustering of big datasets. The first is a hybrid of K-means and the bat algorithm that runs in parallel over a cluster of computers; it outperforms K-means, PSO, and the bat algorithm on 5 benchmark datasets. The second is a novel variant of the grey wolf optimizer for clustering big datasets, in which the exploration and exploitation ability of the grey wolf optimizer is enhanced using Lévy flight and binomial crossover; it performs efficiently on the 8 benchmark clustering datasets compared with the conventional methods, and its parallel performance is also analyzed using the speedup measure. Third, a hybrid method named K-BBO is developed, which combines the search ability of the biogeography-based optimizer with K-means for a better initial population. Fourth, a novel parallel method using MDBO is introduced and tested on four large-scale datasets. Furthermore, to test the applicability of the proposed methods in real-world scenarios, two real-world problems, namely Twitter sentiment analysis and fake review detection, are solved in the big data environment using the proposed methods.
APA, Harvard, Vancouver, ISO, and other styles
32

Pai, Fu-Tzu, and 白馥慈. "Big Data Analysis for National Health Insurance Research Data." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/43894559097858528541.

Full text
Abstract:
Master's thesis
國立陽明大學
生物醫學資訊研究所
101
The use of big data in medical research has increased tremendously over the years. One such source of big data is longitudinal medical claims data. Usually, medical claims data are held by patients themselves or by health insurance companies, but Taiwan is different. The National Health Insurance (NHI) Administration of Taiwan was established in March 1995; in other words, the NHI has managed over 99% of citizens' medical claims data for the past 18 years. All those data are stored in the National Health Insurance Research Database (NHIRD), which has become an important data source for evidence-based medicine (EBM) studies. According to statistics, more and more studies are based on the NHIRD. Due to information overload and the lack of domain-specific analysis tools for the NHIRD, it is hard for researchers to extract valuable information from the database without learning Structured Query Language (SQL). To improve the quality and efficiency of NHIRD-related research, this study aims to design a friendly and reusable web-based user interface that allows users to interact with the NHIRD directly without any prerequisites. The user interface is built on the Ruby on Rails web framework and runs on Ruby for cross-platform compatibility. It runs with the data of the Longitudinal Health Insurance Database 2005 under PostgreSQL in production mode. We present a flexible web interface with which users can easily query the database and do elementary analysis without programming expertise. It also dynamically draws statistical charts and calculates an estimated number of total entries for every query result. Furthermore, it provides several pre-built query conditions for a variety of purposes and generates a download link for the result data set, which can be used for advanced analysis. It greatly simplifies data access to the NHIRD and assists associated studies more effectively.
APA, Harvard, Vancouver, ISO, and other styles
33

Hidayati, Shintami Chusnul, and Shintami Chusnul Hidayati. "Fashion Style Analysis towards Multimedia Big Data." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/34383422410852321225.

Full text
Abstract:
Doctoral dissertation
國立臺灣科技大學
資訊工程系
105
Driven by the huge profit potential in the fashion industry, intelligent fashion analysis may become an important subject in multimedia and computer vision research. Traditional vision-based clothing research focused on analyzing fashion items based either on keywords given by users or on low-level features specified by preferred samples. Instead of using less-discriminative low-level features or ambiguous keywords, this study proposes novel approaches that focus on clothing genre recognition and fashion trend analysis based on visually differentiable fashion style elements. A set of style elements that are crucial for recognizing clothing genres and analyzing fashion trends is identified based on fashion design theory. In addition, the corresponding salient visual features of each style element are identified and formulated as variables that can be computationally derived with various computer vision algorithms. For clothing genre recognition, we propose a novel classification technique to identify the genres of upperwear and lowerwear from full-body pictures by recognizing fundamental style elements of clothing design, such as collars, front buttons, and sleeves. We extract representative features for describing style elements based on the spatial layout of body parts. In addition, we go one step further toward automatically classifying clothing genres by integrating multimodal local features as instances of the prize-collecting Steiner tree (PCST) problem to discover clothing regions, and by exploiting visual style elements to determine the clothing genre. Recognition results show that our clothing genre recognition frameworks perform significantly better than state-of-the-art recognition methods. Moreover, the effectiveness of each style element and its visual features for recognizing clothing genres is demonstrated through a set of experiments involving different sets of style elements or features. On the topic of fashion trend spotting, we present a novel algorithm that automatically discovers the visual style elements representing fashion trends for a given season of fashion week events. Five major elements of fashion style (head decoration, color, silhouette, pattern, and footwear) are investigated in this framework. The trending styles are discovered based on the stylistically coherent and unique characteristics of fashion style elements. Experimental evaluations and analysis on a large number of catwalk show videos demonstrate the effectiveness of the proposed method.
APA, Harvard, Vancouver, ISO, and other styles
34

Hsieh, Cheng-Hsien, and 謝政賢. "Investment Analysis under the Big Data Algorithm." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/32032084438052470526.

Full text
Abstract:
Master's thesis
國立中興大學
財務金融學系所
103
This paper adopts the Apriori algorithm, a big data method, to mine rules and invest in the Taiwan stock market. First of all, we construct four types of samples: "today's daily return is 6% and the next day's return is 6%", "today's daily return is 6% and the next day's return is -6%", "today's daily return is -6% and the next day's return is -6%", and "today's daily return is -6% and the next day's return is 6%", and search each for rules. In the empirical results, only the "today's daily return is 6% and the next day's return is 6%" sample yields rules. After the rules are calculated by the Apriori algorithm, an investment strategy based on them earns a positive cumulative annual return in the Taiwan stock market. Furthermore, compared with Taiwan's market index, the previous three years show positive abnormal returns. Finally, compared with the benchmark strategy of the mean-variance model, the annual returns of the Apriori rules beat the benchmark. We surmise that the mean-variance model suffers from estimation error, hence its performance is worse than that of the Apriori rules.
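A minimal sketch of the rule-mining step described above, under the assumption (hypothetical) that each trading day is treated as a transaction containing the stocks that met the "+6% today and +6% next day" condition; it counts pairwise rules by support and confidence in the Apriori spirit. Tickers, events, and thresholds are invented for illustration.

```python
from collections import Counter
from itertools import combinations

# each "transaction" = stocks satisfying the +6%/+6% condition on one trading day (toy data)
transactions = [
    {"2330", "2317"},
    {"2330", "2317", "2454"},
    {"2317", "2454"},
    {"2330"},
]

n = len(transactions)
single = Counter(s for t in transactions for s in t)
pairs = Counter(frozenset(p) for t in transactions for p in combinations(sorted(t), 2))

min_support, min_confidence = 0.5, 0.6
for pair, count in pairs.items():
    if count / n < min_support:
        continue
    a, b = tuple(pair)
    for x, y in ((a, b), (b, a)):
        confidence = count / single[x]
        if confidence >= min_confidence:
            print(f"{x} -> {y}  support={count / n:.2f}  confidence={confidence:.2f}")
```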
APA, Harvard, Vancouver, ISO, and other styles
35

Vilares, António Alberto Legoinha. "Big data analytics : predictive consumer behaviour analysis." Master's thesis, 2017. http://hdl.handle.net/10362/24457.

Full text
Abstract:
Project Work presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence
This work analyzes the performance of Big Data tools for the data-preparation component and for the implementation of a data mining algorithm, namely FP-Growth, used to extract association rules from records of product transactions in the retail market. The extracted data are used to analyze the transactions made by consumers of a supermarket chain in order to understand which products are purchased together, an analysis known as Market Basket Analysis. One year of records was extracted, containing the purchase history of each customer; each record contains all the products purchased within that year. The information obtained is used to identify correlated products, in order to determine which products are frequently purchased together. The results are then analyzed to implement new business strategies, adapting the supermarkets' offer to consumer preferences. Using several tools from the Hadoop ecosystem, the data were analyzed to eliminate any inconsistencies present in the database and to generate new variables for consumer-profile segmentation and for association-rule extraction. During data pre-processing, SQL tools were used to create a set of KPIs that made it possible to understand the current state of the supermarket business. In the cluster analysis, three groups were defined: the first cluster consists of immediate-need customers, the second of current-account customers, and the third of compulsive shoppers. For each of the generated clusters, a set of association rules was identified that made it possible to understand the consumption habits of each type of customer. The analytical component was implemented in Spark MLlib, programmed in Scala. Using Hadoop together with Spark allowed a set of functionalities to be executed in an integrated way, making it possible to use languages such as SQL, HiveQL, Pig Latin, Python, or Scala on a single platform.
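A minimal sketch of the market-basket step described above, using Spark's FP-Growth implementation (pyspark.ml.fpm) from Python rather than the project's Scala code; the baskets and thresholds are invented for illustration.

```python
from pyspark.sql import SparkSession
from pyspark.ml.fpm import FPGrowth

spark = SparkSession.builder.appName("market-basket-sketch").getOrCreate()

# toy baskets: one row per customer transaction
baskets = spark.createDataFrame([
    (0, ["milk", "bread", "butter"]),
    (1, ["milk", "bread"]),
    (2, ["bread", "beer"]),
    (3, ["milk", "butter"]),
], ["id", "items"])

fp = FPGrowth(itemsCol="items", minSupport=0.4, minConfidence=0.6)
model = fp.fit(baskets)

model.freqItemsets.show()       # frequent itemsets with their counts
model.associationRules.show()   # rules with antecedent, consequent, confidence

spark.stop()
```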
APA, Harvard, Vancouver, ISO, and other styles
36

LIN, BO-JHEN, and 林帛箴. "The analysis of big data processing systems." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/44963153090913666880.

Full text
Abstract:
Master's thesis
國立暨南國際大學
資訊管理學系
104
This study first reviewed the literature on big data technologies and systems to compare the strengths and weaknesses of various systems. To provide users with a more efficient operating environment for big data systems, this study introduced a big data processing system capable of setting up systems semi-automatically and helping users solve system problems. The proposed system includes two major functions: (a) setting up big data systems, and (b) providing solutions to system problems. The designed instructions can effectively assist users in setting up big data systems such as Spark and Hadoop, and thus the time users spend on system setup is reduced. When users experience system-related problems, such as "NameNode not running", "DataNode not running", "ssh password-free login failure", "no such file or directory", and "command not found", they may employ the problem-solution function to solve these commonly encountered problems. For future study, more functions and problem solutions can be developed for the system. Such improvements could benefit future big data system development and provide recommendations for related studies.
APA, Harvard, Vancouver, ISO, and other styles
37

Chen, Zhen-Hua, and 陳珍華. "Big Data:Open Data and Realty Website Analysis." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/01934670164456410617.

Full text
Abstract:
Master's thesis
國立交通大學
資訊學院資訊學程
102
For most people, information for buying a house comes from friends, real-estate agency websites, and registered actual-price data. However, these data are scattered across different places, with no direct comparison. I build a model using the registered actual-price data of Hsinchu County. First, I inspect the data and delete irrelevant records; for example, office buildings for commercial purposes are removed. Second, I use K-means clustering and conclude that the average price on agency websites is higher than the average registered actual price. Third, I calculate the ratios of agency-website prices to registered actual prices under the conditions of square footage and building age. Fourth, I find real instances to support the experiment. Fifth, I install Apache and MySQL on Ubuntu and write HTML and PHP, using the UTF-8 character set to process Chinese words in the house-price data. I write a shell script that fetches data from data.gov.tw and tw.house.yahoo.com regularly and automatically, and Python code that processes the data and imports it into the database automatically. I apply for web space in order to provide a house-price analysis service. The system compares the house-price information of agency websites and registered actual prices for the same county, square footage, and building age, and it also shows the mean and standard deviation of the price ratios. I use Google Analytics to observe users' browsing behavior and obtain user feedback through questionnaires. In conclusion, the analysis of house prices is useful for consumers.
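A minimal pandas sketch of the ratio analysis described above: group listing-site prices and registered actual prices by floor-area and building-age bands, then report the mean and standard deviation of the price ratio. Column names, bands, and figures are hypothetical, not the thesis's data.

```python
import pandas as pd

# toy records: listing price vs. registered actual price (million NTD), area in ping, age in years
df = pd.DataFrame({
    "listing_price":    [12.0, 9.5, 15.2, 8.8, 20.1, 11.3],
    "registered_price": [10.5, 8.9, 13.0, 8.1, 17.5, 10.2],
    "area_ping":        [25,   18,  40,   15,  52,   28],
    "age_years":        [5,    12,  3,    20,  8,    15],
})

df["ratio"] = df["listing_price"] / df["registered_price"]
df["area_band"] = pd.cut(df["area_ping"], bins=[0, 20, 35, 60])
df["age_band"] = pd.cut(df["age_years"], bins=[0, 10, 20, 40])

# mean and standard deviation of the listing/registered ratio per band
print(df.groupby(["area_band", "age_band"], observed=True)["ratio"].agg(["mean", "std"]))
```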
APA, Harvard, Vancouver, ISO, and other styles
38

HUANG, HSIANG-YUN, and 黃湘芸. "Big data analysis of M503 route events." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/q24623.

Full text
Abstract:
Master's thesis
銘傳大學
新媒體暨傳播管理學系碩士班
107
The relationship between the two sides of the Taiwan Strait has always been unstable and irregular, and it did not settle even after the Chinese Nationalist government relocated to Taiwan, through the periods of Hong Kong transshipment, the Mini Three Links, and finally direct cross-strait transportation links. In 2018, China launched new northbound flights on the M503 route without prior consultation with Taiwan, which drove the cross-strait relationship to a low point and caused subsequent problems. The most influential incident was the cross-strait Spring Festival flight event, which left more than 50,000 people unable to return home to celebrate Chinese New Year and created great pressure in public opinion. This research mainly explores the current situation: after many political rotations in Taiwan, with Tsai Ing-wen now in power, the Taiwan government's refusal to accept the 1992 Consensus has left the relationship between Taiwan and China in a stalemate, and China's continuous pressure on Taiwan, including opening northbound flights on the M503 route without consultation, has made the Taiwanese government and public extremely dissatisfied. The purpose of this dissertation is to explore netizens' opinions and public opinion on the internet. Therefore, the research uses OpView and Keypo to analyze the trends, positive and negative evaluations, opinion leadership, word clouds, and keywords of the M503 route events. By understanding the impact and subsequent effects of the M503 route events and clarifying the main issues, suggestions for potential problems can be provided to government decision-making units and academics.
APA, Harvard, Vancouver, ISO, and other styles
39

葛喬丹. "Big Data Analysis of Travel YouTube Channels." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/632zjt.

Full text
Abstract:
Master's thesis
輔仁大學
國際創業與經營管理學程碩士在職專班
106
Video content is becoming king in today's world. While there are hundreds of ways to get content out there, YouTube is the leading social media platform for video. Following the trends of the modern world, there is a growing connection between video content and travel, especially among the younger generations. As a result, travel channels on YouTube such as Sam Kolder, JR Alli, and Beautiful Destinations are becoming more and more popular. But why and how? This is important information for travel YouTubers and travel agencies when building a stronger content strategy. Put simply, the main goal of this study is to figure out why and how personal travel channels became famous on YouTube. This study aims to help future YouTube travel channels, whether personal or business accounts, understand the trends that could support them in growing their channel according to certain key factors and analyses. Using a social big data analysis approach for famous travel YouTubers, this study identifies the key factors involved: a combination of knowing your audience, the style of video you produce, and how you present the video are all key aspects to consider when creating video content.
APA, Harvard, Vancouver, ISO, and other styles
40

Lin, Chia-Cheng, and 林家正. "Big Data Analysis on Government Open Data to Establish Virtual Data Set." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/a7ujtc.

Full text
Abstract:
Master's thesis
國立中正大學
資訊管理學系碩士在職專班
107
Demographic structure is one of the basic factors for measuring the competitiveness of a country. However, because short-term changes in demographic structure are not obvious, demographics were often not given priority in the past when implementation policies for national development and social welfare were discussed. In addition, because government administrative data were difficult to obtain in the past, doubts arose about administrative operations and related research activities were restricted. With the changes of the times, the accumulation of many years of government statistics has become an important asset. Regulations have also evolved in response to the current situation, requiring the government to open its administrative information to the public to promote participation in civil public affairs. Meanwhile, information software and hardware are highly popular, the infrastructure is complete, and computing and storage costs have been greatly reduced, which increases the feasibility of exploring social issues by analyzing open government data. This research is based on publicly available government data and the annual-ring formation calculation method of the Executive Yuan, using big data technology to construct a virtual data set of the population for the next 20 years. Finally, the results of the research are presented with visualization tools, in the hope of bringing advantages and contributions to applications in related fields.
APA, Harvard, Vancouver, ISO, and other styles
41

Lu, Jui-nan, and 呂瑞男. "Marketing Strategy Application Research of Big Data Analysis." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/523jvh.

Full text
Abstract:
Master's thesis
國立中山大學
高階經營碩士班
103
IDC estimates that the Big Data market will exhibit strong growth over the next five years. Market activities can create demand for big data analysis, which deals with the ever-increasing volume, variety, velocity, variability, and complexity of information. For marketing organizations, big data is the fundamental consequence of the new marketing landscape, born from the digital world we now live in. In business, big data analysis can help market analysts distinguish different consumer groups within a consumer database and summarize the consumption patterns or spending habits of each group; these can be treated as extracted information modules and aggregated into categories or clusters, whose specific characteristics are then analyzed and summarized with classification algorithms. Through this case study we examine the direct and indirect effects of big data marketing strategies on consumers, and we summarize strategies, based on consumer behavior analysis, for creating market demand, stimulating consumer behavior, and meeting marketing needs with market potential.
APA, Harvard, Vancouver, ISO, and other styles
42

黃日佳. "Designing of Virtual Matrix of Big Data Analysis." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/wn6zev.

Full text
APA, Harvard, Vancouver, ISO, and other styles
43

KAO, SHIH-YAN, and 高式彥. "Constructing Kano Model by Using Big Data Analysis." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/79061541046176865069.

Full text
Abstract:
Master's thesis
逢甲大學
工業工程與系統管理學系
105
Big data has long been a hot topic and can potentially be applied in many organizations that collect enormous amounts of data. In data mining today, various types of analytical methods have been developed for solving specific mining problems that fit the data type. This study uses big data analysis to construct the Kano model. The association rules mined from the collected product dataset with the Apriori algorithm are used to replace the questionnaire step in the traditional Kano model analysis. Finally, this study uses a wine product dataset to illustrate the proposed method, and a comparison with the traditional method is also discussed.
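A minimal sketch of the idea above, not the thesis's exact procedure: compute support and confidence for "attribute present implies customer satisfied" rules from transaction-like product records; the resulting rule strengths would then be interpreted against Kano categories. The attributes and records are invented.

```python
from collections import Counter

# each record: product attributes observed together with a satisfaction tag (toy data)
records = [
    {"fruity", "dry", "satisfied"},
    {"fruity", "sweet", "satisfied"},
    {"dry", "oaky"},
    {"fruity", "oaky", "satisfied"},
    {"sweet", "dry"},
]

n = len(records)
item_counts = Counter(item for r in records for item in r)

for attr in sorted(item_counts):
    if attr == "satisfied":
        continue
    both = sum(1 for r in records if attr in r and "satisfied" in r)
    support = both / n
    confidence = both / item_counts[attr]
    print(f"{attr} -> satisfied  support={support:.2f}  confidence={confidence:.2f}")
```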
APA, Harvard, Vancouver, ISO, and other styles
44

Che-Wei, Chuang, and 莊哲偉. "A Scalable Storage System for Big Data Analysis." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/35852997098724933407.

Full text
Abstract:
Master's thesis
國立交通大學
電子工程學系 電子研究所
104
Recently, machine learning has been widely used in various areas. Since machine learning is essentially big data analysis, which requires large amounts of computation and storage, it can be efficiently accelerated only if both the computation ability and the storage equipment are properly optimized. We explore a hardware/software co-design platform for big data analysis with machine learning capability and storage scalability, addressing the two major problems in machine learning: power and speed. To verify this concept, we built a scalable storage system on Hadoop which adopts a heterogeneous architecture (CPU + FPGA) for acceleration and power reduction. This thesis introduces the platform's parameter settings in detail, ports the well-known clustering algorithm K-means onto the platform, and finally presents a profiled comparison between CPU clustering and CPU+FPGA clustering in terms of speed and power. Based on the profiling results, we can claim that this architecture really works. The architecture differs from the CPU+GPU multi-core cluster solution proposed by Microsoft in 2009. Besides acceleration, another advantage is that the architecture can serve as an ASIC prototype and offer a rather accurate prediction of the acceleration after tape-out. For example, we implement the circuit on an FPGA at 120 MHz, while the same circuit passes a 200 MHz simulation test with UMC 90 nm technology, which gives a prediction of the achievable speed. The final acceleration is around 25 times faster than the unaccelerated A9 CPU cluster.
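The software half of the design described above is ordinary K-means; a minimal NumPy sketch is given below (the FPGA acceleration itself is hardware and is not reproduced here). The data and cluster count are illustrative.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's algorithm: alternate nearest-centre assignment and centre update."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # assignment step: index of the nearest centre for every point
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        # update step: mean of the points assigned to each cluster
        centers = np.array([X[labels == j].mean(axis=0) if (labels == j).any() else centers[j]
                            for j in range(k)])
    return labels, centers

X = np.vstack([np.random.randn(100, 2), np.random.randn(100, 2) + [5, 5]])
labels, centers = kmeans(X, k=2)
print(centers)
```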
APA, Harvard, Vancouver, ISO, and other styles
45

Tsai, Ming-Chun, and 蔡明純. "Big Semiconductor Manufacturing Data Analysis Using Cloud Technique." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/36458920854256695872.

Full text
Abstract:
Master's thesis
國立中山大學
資訊工程學系研究所
101
In the semiconductor manufacturing industry, one of the key factors for improving wafer quality is to analyze the existing logs and find the probable causative parameters affecting wafer yield. Due to the huge amount of data and the large number of parameters recorded in the logs, it is difficult for traditional statistical analysis and relational analysis to process such big data and find the critical parameters affecting the yields. To conquer this analysis bottleneck, we take advantage of the high-performance computing of MapReduce and design a novel cloud technique named the island-based cloud genetic algorithm (ICGA) to mine the critical information. ICGA integrates a cloud genetic algorithm with a k-nearest-neighbor (KNN) classifier. In addition, we adopt statistical outlier detection to find the sensitive parameters. Eventually, the critical parameters discovered by ICGA and the sensitive parameters detected by outlier detection are cross-verified to obtain the most discriminative parameters, which are then used to classify good and bad wafers. Experimental results show that these parameters can discriminate between good wafers and bad ones with 100% accuracy. In addition, compared with the standalone GA, ICGA speeds up the computation by more than 4 times.
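A minimal sketch of the GA-plus-KNN pairing described above (a toy serial version, not the island-based MapReduce ICGA): a genetic algorithm searches parameter subsets and scores each candidate with a k-nearest-neighbour classifier. The dataset, population size, and operators are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X, y = load_wine(return_X_y=True)          # stand-in for wafer-log parameters
n_features, pop_size, generations = X.shape[1], 20, 10

def fitness(mask):
    """Cross-validated KNN accuracy using only the selected parameters."""
    if not mask.any():
        return 0.0
    knn = KNeighborsClassifier(n_neighbors=5)
    return cross_val_score(knn, X[:, mask], y, cv=3).mean()

population = rng.integers(0, 2, size=(pop_size, n_features)).astype(bool)
for _ in range(generations):
    scores = np.array([fitness(ind) for ind in population])
    parents = population[np.argsort(scores)[::-1][: pop_size // 2]]   # selection
    children = []
    for _ in range(pop_size - len(parents)):
        a, b = parents[rng.integers(len(parents), size=2)]
        cut = rng.integers(1, n_features)                              # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        flip = rng.random(n_features) < 0.05                           # mutation
        children.append(np.where(flip, ~child, child))
    population = np.vstack([parents, np.array(children)])

best = population[np.argmax([fitness(ind) for ind in population])]
print("selected parameter indices:", np.flatnonzero(best))
```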
APA, Harvard, Vancouver, ISO, and other styles
46

YEH, JOBA, and 葉佩峰. "Research on Big Data Analysis Platform and Services." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/m9d8qw.

Full text
Abstract:
Master's thesis
龍華科技大學
資訊管理系碩士班
106
With the development of Internet technology and the innovation of the Internet of Things, businesses have accumulated huge amounts of data of different types. Traditional data processing techniques are insufficient to handle such increasingly diverse data. Confronting these huge amounts of data and creating high added value from them in business activities are new challenges for many companies in recent years. Compared with other countries, there has been less research in Taiwan, whether teaching cases or academic studies, on big data analysis platforms. This thesis contributes to research on big data analysis platforms. Hadoop, started as an Apache Foundation subproject in 2005, opened the door for big data technique research. Among the various commercial distributions, the release known for operating under a business model and providing high compatibility and stability is Cloudera. Based on Cloudera, this study explores Hadoop techniques and the corresponding virtualization and backup strategies, and the applications are divided into two parts. The first part explores the construction of an Open Virtualization Format (.OVF) image for teaching and personal use. The second part takes Lunghwa University of Science and Technology as a case study to explore the strategy of building a robust backup cluster using VMware ESXi and MKSBackup with limited resources. This study can serve as a reference for most SMEs, SOHO groups, or colleges with scarce resources. The results make a self-established big data platform easy to implement and elevate the technical level of big data analysis platforms.
APA, Harvard, Vancouver, ISO, and other styles
47

Chang, Fu-Chi, and 張富祺. "Trend Forecasting of Influenza Using Big Data Analysis." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/c22hst.

Full text
Abstract:
Master's thesis
國立臺灣科技大學
電機工程系
105
Accurately tracking the outbreak of an infectious disease like influenza helps public health authorities make timely and significant decisions that can calm people's fears and save lives. A traditional disease surveillance system based on confirmed cases typically reports an outbreak with at least a one-week lag. Therefore, surveillance systems that monitor indirect signals about influenza have been proposed to provide faster detection. The volume of those signals is huge, and they can be picked out from social networks or search databases. Yahoo and Google, the top two internet search providers that own such big data, have previously conducted research on disease tracking. In this study, we first draw the influenza signals from the CDC (Centers for Disease Control, Taiwan) database, the Google Trends database, and the King Net database. Then, linear and nonlinear analyses between the three databases are investigated. We found a high correlation between the series drawn from the three databases in the years under survey (2011-2016), regardless of linear or nonlinear analysis. Furthermore, we propose a nonlinear tracking model to capture changes in the epidemic trend, allowing the outbreak of influenza to be detected earlier in years with heavy infections. These results prove that the signals exposed on networks can provide rich material for tracking trend events in human society.
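A minimal sketch of the linear/nonlinear comparison mentioned above, assuming two hypothetical weekly series (CDC influenza visit counts and a Google Trends index): Pearson correlation captures the linear relation, and a rank-based Spearman correlation captures a monotone nonlinear one.

```python
import pandas as pd

weeks = pd.date_range("2016-01-03", periods=8, freq="W")
df = pd.DataFrame({
    "cdc_visits":    [120, 150, 180, 260, 400, 350, 280, 200],  # toy outpatient counts
    "google_trends": [30,  42,  55,  70,  95,  88,  60,  45],   # toy search-volume index
}, index=weeks)

print("Pearson :", df["cdc_visits"].corr(df["google_trends"]))
print("Spearman:", df["cdc_visits"].corr(df["google_trends"], method="spearman"))
```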
APA, Harvard, Vancouver, ISO, and other styles
48

WU, CHUN-YI, and 吳俊逸. "Improve Warehouse Carousel Utilization via Big Data Analysis." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/7pgmj3.

Full text
Abstract:
Master's thesis
國立雲林科技大學
資訊管理系
107
Large warehouses usually consist of both an automatic picking system and a manual picking system. Automatic picking equipment is expensive but efficient, so picking time can be significantly reduced; however, its storage space is limited. It is therefore necessary to consider which SKUs, and in what quantities, should be allocated to the automatic system so as to achieve optimal utilization. In this study, we propose SKU allocation models based on big data analysis of historical data to improve the picking rate of the carousel system. Through ABC classification of SKUs and analysis of PCB for outbound packages, we design models for determining the maximum stock and safety stock of individual SKUs in the carousel. Safety stock is determined by two strategies: moving average and weighted moving average. Maximum stock is set based on the safety stock. In essence, the SKUs with a high purchase rate in the form of carton packages should be stored in the carousel. By analyzing a large amount of historical data and simulating the storage models, we compare model performance and determine the combination of model parameters that achieves the best utilization of the carousel and improves picking efficiency.
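A minimal sketch of the two stock-setting strategies named above, for a single hypothetical SKU: a simple moving average and a weighted moving average over recent daily demand, with the maximum stock set as a multiple of the safety stock. All figures and the multiplier are illustrative.

```python
import numpy as np

daily_demand = np.array([32, 41, 28, 35, 50, 44, 38])      # last 7 days of picks for one SKU

window = 5
sma = daily_demand[-window:].mean()                          # simple moving average
weights = np.arange(1, window + 1)                           # more recent days weigh more
wma = np.average(daily_demand[-window:], weights=weights)    # weighted moving average

safety_stock = int(round(max(sma, wma)))
max_stock = 2 * safety_stock                                 # illustrative multiplier
print("safety stock:", safety_stock, " max stock:", max_stock)
```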
APA, Harvard, Vancouver, ISO, and other styles
49

LI, YVEN-YANG, and 李岳揚. "Big Data Analysis of Process Equipment Quality Factors." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/94ew3s.

Full text
Abstract:
Master's thesis
國立中正大學
企業管理學系碩士在職專班
107
Under the trend of Industry 4.0, guiding the continuous transformation of manufacturing toward smart manufacturing, turning the data produced in production into useful information, and implementing big data analysis to identify process variations will help enterprises survive and profit and gain the appreciation of customers. Accordingly, adopting data mining tools can help obtain data analysis results in a timely manner. In addition, how to combine academic techniques with industrial applications will be a key subject for enterprise upgrading in the future. The purpose of this study is to explore how to find the points of parameter variation in the machines from production data, supported by the application of decision tree models. We investigate how X factors influence Y factors in the manufacturing process using the results from production. With data mining on the data collected from manufacturing, it is expected that the manufacturing processes can be optimized and improved by the analysis, reducing manufacturing variations. In addition, the results and explorations can benefit firms or managers in improving the quality of other manufacturing processes in the future.
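A minimal sketch of the decision-tree idea described above, not the thesis's model: fit a tree that maps process parameters (X factors) to a pass/fail quality label (Y factor) and inspect which parameters the tree relies on. The data and parameter names are synthetic.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
names = ["temperature", "pressure", "speed", "time"]        # hypothetical X factors
X = rng.normal(size=(500, len(names)))
y = (X[:, 0] + 0.5 * X[:, 2] > 0).astype(int)               # hypothetical pass/fail rule

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
for name, importance in zip(names, tree.feature_importances_):
    print(f"{name:12s} importance={importance:.2f}")
```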
APA, Harvard, Vancouver, ISO, and other styles
50

Luh, Chien-huan, and 陸建寰. "Big Data Analysis Applied To Retailers Recommended System." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/urgka7.

Full text
Abstract:
Master's thesis
大同大學
資訊經營學系(所)
106
With the rapid development of information and network technology, huge amounts of data have become a new trend in the field of global information and services. Because of the complexity and diversity of information, finding truly useful information from huge amounts of data through big data analysis is the key for companies to win the business competition. Although e-commerce is convenient and fast, many customers still like to personally touch, try on, and purchase goods in retail stores. This study applies a recommendation system to the retail store, bringing the shopping situation closer to the real one. Additionally, a modified collaborative filtering method with weight distribution is proposed to increase accuracy. With the aid of the personalized recommendation system, consumers can find the products they need more quickly. Moreover, the system can be used to analyze consumers' past consumption information and thus predict their preferences for the products they purchase. Therefore, companies can provide consumers with more appropriate services and reduce wasted shopping time. In addition to improving customer loyalty and realizing industrial intelligence, big data analysis can bring more profit to the industry.
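A minimal sketch of similarity-weighted, user-based collaborative filtering (a generic formulation, not the thesis's specific weight distribution): a customer's score for an unbought product is predicted from similar customers' purchases, weighted by cosine similarity. The purchase matrix is invented.

```python
import numpy as np

# rows = customers, columns = products; values = purchase counts (toy data)
R = np.array([[3, 0, 1, 2],
              [2, 1, 0, 3],
              [0, 2, 4, 0],
              [3, 1, 1, 2]], dtype=float)

def cosine_sim(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

target, item = 2, 3     # predict product 3 for customer 2
others = [u for u in range(len(R)) if u != target]
weights = np.array([cosine_sim(R[target], R[u]) for u in others])
ratings = np.array([R[u, item] for u in others])

prediction = weights @ ratings / (weights.sum() + 1e-9)     # similarity-weighted average
print(round(float(prediction), 2))
```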
APA, Harvard, Vancouver, ISO, and other styles
