Dissertations / Theses on the topic 'Clickstream'
Kliegr, Tomáš. "Clickstream Analysis." Master's thesis, Vysoká škola ekonomická v Praze, 2007. http://www.nusl.cz/ntk/nusl-2065.
Jamalzadeh, Mohammadamin. "Analysis of clickstream data." Thesis, Durham University, 2011. http://etheses.dur.ac.uk/3366/.
Ekberg, Fredrik. "Jämförelse av analysmetoder för clickstream-data" [A comparison of analysis methods for clickstream data]. Thesis, University of Skövde, School of Humanities and Informatics, 2004. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-873.
This work compares different analysis methods for clickstream data so that it can serve as guidance when a method is to be implemented. The comparison was carried out as a literature study, because the analysis methods under investigation have already been developed and knowledge about them is obtained by studying the literature in which they appear. A set of criteria is then used for the comparison itself, so that the methods are compared on a common basis.
The methods that best met the requirements of the various criteria were the page events fact model and the subsession fact model. The subsession fact model may seem the best choice in every situation, but it is perhaps excessive if the clickstream data is only to be used to see how visitors use each individual page for design-support purposes. This shows that the purpose determines which method is most suitable.
Hotle, Susan Lisa. "Applications of clickstream information in estimating online user behavior." Diss., Georgia Institute of Technology, 2015. http://hdl.handle.net/1853/53507.
Li, Richard D. (Richard Ding) 1978. "Web clickstream data analysis using a dimensional data warehouse." Thesis, Massachusetts Institute of Technology, 2000. http://hdl.handle.net/1721.1/86671.
Includes bibliographical references (leaves 83-84).
M.Eng.
Wong, Mark Alan. "Logging clickstream data into a database on a consolidated system." 2002. http://content.ohsu.edu/u?/etd,274.
Johansson, Henrik. "Using clickstream data as implicit feedback in information retrieval systems." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-233870.
The aim of this degree project is to investigate whether Wikipedia's clickstream data can be used to improve the search performance of information retrieval systems. The work assumes that a transition between two Wikipedia articles connects the articles' content and is of interest to the user. To exploit the clickstream data, it must be structured so that, given an article, one can see how readers have moved out of or into that article. We chose to exploit the dataset through automatic query expansion. Two methods were developed: the first expands the query with whole article titles, while the second expands it with individual words from an article title. The results show that the word-based expansion method performs better than the method that expands with whole article titles, achieving an 11.21% improvement in MAP. The work also shows that the expansion method improves performance only when the coverage of the original query is low. Regarding the structure of the clickstream data, the outgoing structure performed better than the incoming one. The thesis concludes that this clickstream data is well suited to improving the search performance of an information retrieval system.
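The word-based expansion over the outgoing structure can be illustrated with a minimal sketch. It assumes clickstream rows simplified to `(prev, curr, n)` triples (the public Wikipedia clickstream dumps also carry a link-type column), invented counts, and a query that has already been matched to a source article; `build_outgoing` and `expand_query` are hypothetical names, and the thesis's own matching and weighting steps may differ.

```python
from collections import defaultdict

# Toy rows shaped like (prev, curr, n) from the Wikipedia clickstream dumps.
# The counts are invented for illustration only.
CLICKS = [
    ("Machine_learning", "Deep_learning", 120),
    ("Machine_learning", "Support_vector_machine", 80),
    ("Machine_learning", "Statistics", 15),
    ("Statistics", "Machine_learning", 40),
]

def build_outgoing(rows):
    """Map each article to its outgoing (target, count) pairs, most-clicked first."""
    out = defaultdict(list)
    for prev, curr, n in rows:
        out[prev].append((curr, n))
    for targets in out.values():
        targets.sort(key=lambda t: t[1], reverse=True)
    return out

def expand_query(query, article, outgoing, k=2):
    """Word-based expansion: append words from the titles of the top-k
    articles that readers navigated to from `article` (hypothetical helper)."""
    terms = query.lower().split()
    for target, _ in outgoing.get(article, [])[:k]:
        for word in target.lower().split("_"):
            if word not in terms:
                terms.append(word)
    return terms
```

Here `expand_query("machine learning", "Machine_learning", build_outgoing(CLICKS))` keeps the original terms and appends the new words from the two most-clicked outgoing titles. Whole-title expansion would instead append each target title as a single phrase.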
Collin, Sara, and Ingrid Möllerberg. "Designing an Interactive tool for Cluster Analysis of Clickstream Data." Thesis, Uppsala universitet, Avdelningen för visuell information och interaktion, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-414237.
Neville, Kevin. "Channel attribution modelling using clickstream data from an online store." Thesis, Linköpings universitet, Statistik och maskininlärning, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-139318.
Bača, Roman. "Sběr sémanticky obohacených clickstreamů" [Collecting semantically enriched clickstreams]. Master's thesis, Vysoká škola ekonomická v Praze, 2009. http://www.nusl.cz/ntk/nusl-76722.
MacGibbon, David George. "An investigation into the effects of perceptions of person-team fit during online recruitment; and the uses of clickstream data associated with this medium." Thesis, University of Canterbury. Psychology, 2012. http://hdl.handle.net/10092/7007.
El-Gharib, Najah Mary. "Using Process Mining Technology to Understand User Behavior in SaaS Applications." Thesis, Université d'Ottawa / University of Ottawa, 2019. http://hdl.handle.net/10393/39963.
Olanyk, Luís Roberto Zart. "Um modelo para a implantação de um Data Mart de Clickstream para empresas provedoras de acesso à internet de pequeno e médio porte" [A model for implementing a clickstream data mart for small and medium-sized internet access providers]. Florianópolis, SC, 2002. http://repositorio.ufsc.br/xmlui/handle/123456789/83561.
Full textMade available in DSpace on 2012-10-20T01:09:10Z (GMT). No. of bitstreams: 1 189333.pdf: 663186 bytes, checksum: 23c0867066721833aa2a4272629e3570 (MD5)
Data warehousing is one of the fastest-growing fields of Decision Support Systems (DSS) in recent Information Technology (IT). The Internet, despite its youth, has become an overcrowded information environment with a high degree of competitiveness. With the aim of strengthening the relationship with customers who use websites, this work lays the foundations for building a DSS tool that supports this relationship. It describes the concepts referenced in the literature for building a clickstream data warehouse, presenting the necessary requirements and pointing out the main areas where different solutions apply, so that the best options for implementing the project can be identified on a solid basis. Based on the physical structure of the organization under study, a model for implementing a clickstream data mart is proposed. To address navigation problems, and with a focus on improving the service provided to the organization's customers, a prototype is implemented, which proved important in supporting these tasks. Some of the results obtained are presented, demonstrating the capabilities of the prototype. Finally, some recommendations for future work are discussed.
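To make the data mart idea concrete, here is a minimal star-schema sketch using Python's built-in `sqlite3` module: one page-event fact table surrounded by page, date and customer dimensions, plus a typical star-join query. All table names, columns and rows are hypothetical and are not the model proposed in the dissertation.

```python
import sqlite3

# Hypothetical clickstream star schema: one fact table keyed to three dimensions.
SCHEMA = """
CREATE TABLE dim_page (page_key INTEGER PRIMARY KEY, url TEXT, page_type TEXT);
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, day TEXT);
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, plan TEXT);
CREATE TABLE fact_page_event (
    date_key INTEGER REFERENCES dim_date(date_key),
    page_key INTEGER REFERENCES dim_page(page_key),
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    session_id TEXT,
    seconds_on_page INTEGER
);
"""

def build_mart():
    """Create an in-memory mart and load a few invented rows."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany("INSERT INTO dim_page VALUES (?, ?, ?)",
                    [(1, "/home", "landing"), (2, "/support/faq", "support"), (3, "/plans", "product")])
    con.executemany("INSERT INTO dim_date VALUES (?, ?)", [(20020101, "2002-01-01")])
    con.executemany("INSERT INTO dim_customer VALUES (?, ?)", [(1, "dial-up"), (2, "broadband")])
    con.executemany("INSERT INTO fact_page_event VALUES (?, ?, ?, ?, ?)", [
        (20020101, 1, 1, "s1", 10),
        (20020101, 2, 1, "s1", 95),
        (20020101, 1, 2, "s2", 5),
        (20020101, 2, 2, "s2", 40),
        (20020101, 3, 2, "s2", 30),
    ])
    return con

def time_by_page_type(con):
    """Total seconds spent per page type: join the fact table to one dimension."""
    return con.execute("""
        SELECT p.page_type, SUM(f.seconds_on_page)
        FROM fact_page_event f JOIN dim_page p ON f.page_key = p.page_key
        GROUP BY p.page_type ORDER BY p.page_type
    """).fetchall()
```

Keeping measures such as time on page in the narrow fact table and descriptive attributes in the dimensions is what lets one query answer questions like "where do customers spend time on support pages?" by swapping the grouping dimension.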
Wang, Yufei 1981. "An analysis of different data base structures and management systems on Clickstream data collected for advocacy based marketing strategies experiments for Intel and GM." Thesis, Massachusetts Institute of Technology, 2005. http://hdl.handle.net/1721.1/33387.
Includes bibliographical references (leaves 82-83).
Marketing on the Internet is the next big field in marketing research. Clickstream data makes a great contribution to analyzing the effects of advocacy-based marketing strategies, but handling clickstream data has become a big issue. This paper looks at the problems caused by clickstream data from a database perspective and considers several theories to alleviate the difficulties. Applications of modern database optimization techniques are discussed, and the paper details the implementation of these techniques for the Intel and GM project.
M.Eng. and S.B.
Mccart, James A. "Goal Attainment On Long Tail Web Sites: An Information Foraging Approach." Scholar Commons, 2009. http://scholarcommons.usf.edu/etd/3686.
Forslund, John, and Jesper Fahlén. "Predicting customer purchase behavior within Telecom: How Artificial Intelligence can be collaborated into marketing efforts." Thesis, KTH, Skolan för industriell teknik och management (ITM), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-279575.
This study examines the implementation of an AI model that predicts customer purchases in the telecom industry. It also aims to show how such a model can support decision-making in marketing strategy. By designing the AI model as a Recurrent Neural Network (RNN) with a Long Short-Term Memory (LSTM) layer, the study concludes that such a design enables a successful implementation with satisfactory model performance. Step-by-step instructions for constructing the model are given in the methodology section. The RNN-LSTM model can be used as a tool that helps marketers assess quantitatively how a customer's behavior patterns on a website affect their purchasing behavior over time, by observing the framework the authors call the Customer Purchase Propensity Journey (CPPJ; Swedish: Kundköpbenägenhetsresan). The empirical basis of the CPPJ can help organizations improve the allocation of marketing resources and strengthen their digital presence by enabling more relevant personalization of the customer experience.
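The RNN-LSTM design named in the abstract can be sketched as a forward pass in plain Python. This is a minimal illustration, assuming each website visit is encoded as a small numeric feature vector (the encoding is hypothetical) and using random, untrained weights; the study's actual architecture, features and training procedure are not reproduced, and a real model would be built and trained with a deep-learning framework.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(w, v):
    """Matrix-vector product for lists of lists."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]

GATES = ("input", "forget", "output", "candidate")

class TinyLSTM:
    """Single LSTM layer followed by a logistic output unit.
    Weights are random and untrained: this only shows the forward pass."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        def mat(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
        self.n_hidden = n_hidden
        # One weight matrix per gate, applied to [x_t, h_{t-1}] concatenated.
        self.W = {name: mat(n_hidden, n_in + n_hidden) for name in GATES}
        self.b = {name: [0.0] * n_hidden for name in GATES}
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]

    def step(self, x, h, c):
        z = list(x) + h  # concatenate the input with the previous hidden state
        pre = {name: [s + b for s, b in zip(matvec(self.W[name], z), self.b[name])]
               for name in GATES}
        i = [sigmoid(v) for v in pre["input"]]
        f = [sigmoid(v) for v in pre["forget"]]
        o = [sigmoid(v) for v in pre["output"]]
        g = [math.tanh(v) for v in pre["candidate"]]
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]  # new cell state
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]              # new hidden state
        return h, c

    def purchase_probability(self, visits):
        """Run the visit sequence through the LSTM and map the last hidden state to (0, 1)."""
        h = [0.0] * self.n_hidden
        c = [0.0] * self.n_hidden
        for x in visits:
            h, c = self.step(x, h, c)
        return sigmoid(sum(w * hj for w, hj in zip(self.w_out, h)))
```

The final hidden state summarises the whole visit sequence, which is what lets such a model relate on-site behaviour over time to a later purchase.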
Berg, Marcus. "Evaluating Quality of Online Behavior Data." Thesis, Stockholms universitet, Statistiska institutionen, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-97524.
Chen, Ting-Rui (陳廷睿). "Extended Clickstream: an analysis of the missing user behaviors in the Clickstream." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/9nrd3r.
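National Central University, Department of Computer Science and Information Engineering, academic year 107 (Republic of China calendar).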
Nowadays, people often use the clickstream to represent the behavior of online users. However, we found that the clickstream represents only part of users' browsing behavior; for instance, it does not include tab switching or browser window switching. We collect these kinds of behavior and call them the "extended clickstream". This thesis builds a service that captures both the clickstream and the extended clickstream, and analyzes the differences between them. We use a multi-task learning model with GRU components to make multi-objective predictions of "what kind of website the user will visit next" and "how long the interval between clicks will be" from the time series of clickstreams and extended clickstreams. Our experimental results show that combining the clickstream and the extended clickstream can improve prediction performance. In addition, this thesis finds that the clickstream records unintended clicks caused by the operating mechanisms of certain websites. Moreover, by combining the clickstream and the extended clickstream, we can distinguish a single user across several devices.
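The gap that the extended clickstream fills can be shown with a small sketch: merging click events with tab and window switches reveals activity hidden inside what the plain clickstream records as one long inter-click interval. The `Event` record and its event kinds are hypothetical, and the thesis's capture service and GRU-based multi-task model are not shown.

```python
import heapq
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    t: float      # seconds since the start of recording
    kind: str     # "click", "tab_switch" or "window_switch" (hypothetical labels)
    target: str   # URL or tab identifier

def extended_stream(clicks, switches):
    """Merge click events with tab/window-switch events into one time-ordered stream."""
    return list(heapq.merge(sorted(clicks, key=lambda e: e.t),
                            sorted(switches, key=lambda e: e.t),
                            key=lambda e: e.t))

def click_intervals(stream):
    """Gaps between consecutive clicks, each paired with the number of
    non-click events that occurred inside that gap."""
    result, last_click, hidden = [], None, 0
    for e in stream:
        if e.kind == "click":
            if last_click is not None:
                result.append((e.t - last_click, hidden))
            last_click, hidden = e.t, 0
        elif last_click is not None:
            hidden += 1
    return result
```

An interval paired with a non-zero count is one where the plain clickstream would suggest the user was idle on a single page, while the extended stream shows they were active elsewhere.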
Moe, Wendy W., and Peter S. Fader. "Capturing Evolving Visit Behavior in Clickstream Data." 2001. http://hdl.handle.net/10150/105085.
Su, Ching-Lun (蘇敬倫). "Predicting Online Purchasing Behavior Using Clickstream Data." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/nnmd7f.
National Taiwan University, Graduate Institute of Economics, academic year 106 (Republic of China calendar).
Online shopping has boomed over the past ten years. How to make good use of the rich data generated during online shopping is now a critical issue for online retailers. Online retailers cannot observe customers' physical characteristics, such as gender and age, but they can use browsing data to analyze customers' preferences and predict purchasing behavior. This study explores the relationships between browsing behavior, customer characteristics, and purchase results using clickstream data from the website of an online wine retailer. I use a K-means model to cluster customers based on the filters they chose when browsing the website. I find that the clustering results are significantly correlated with customers' location and gender. Also, the more filters a customer chooses before a purchase, the more wines they buy and the higher their order total. The results of logistic regressions show that customers who filter products by a low price range are the most likely to buy.
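The clustering step can be sketched with plain Lloyd's k-means, assuming each customer is encoded as a binary vector of the filters used. The filter columns and data below are hypothetical, and the logistic regressions reported in the abstract are not reproduced.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's algorithm with squared Euclidean distance."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k),
                      key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical columns: [low_price_filter, high_price_filter, region_filter, grape_filter]
customers = [
    [1, 0, 0, 0], [1, 0, 1, 0], [1, 0, 0, 0],
    [0, 1, 0, 1], [0, 1, 1, 1], [0, 1, 0, 1],
]
labels, centroids = kmeans(customers, k=2)
```

Each centroid is the share of its members that used each filter, which makes the resulting clusters easy to describe and to cross-tabulate against attributes such as location or gender.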
Teixeira, Ricardo Filipe Fernandes e. Costa Magalhães. "Using clickstream data to analyze online purchase intentions." Master's thesis, 2015. https://repositorio-aberto.up.pt/handle/10216/83497.
Nowadays, traditional business techniques are becoming obsolete with the rise of online shopping, the so-called e-commerce. This new and in many ways uncharted territory poses difficult challenges for applying marketing techniques, especially traditional methods, which are not effective with online customers. In this context, companies need a complete, in-depth understanding of online behavior to succeed in the complex environment in which they compete. The Web server logs of each customer are the main source of potentially useful information for online stores. These logs record how each customer visited the online store, and from them it is possible to reconstruct the sequence of accessed pages, the so-called clickstream data. This data is fundamental for depicting each customer's behavior, and analyzing and exploring that behavior is key to improving customer relationship management. The analysis of clickstream data makes it possible to understand customer intentions. One of the most studied measures is customer conversion, that is, the percentage of customers who actually make a purchase during a specific online session. In this dissertation we investigate other relevant intentions, namely customer purchasing engagement and real-time purchase likelihood. Actual data from a major European online grocery retailer are used to support and evaluate different data mining models.
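Reconstructing sessions from server logs and computing a session-level conversion rate might look like the following minimal sketch. The 30-minute inactivity timeout is a common heuristic, not necessarily the one used in the dissertation, and the purchase-confirmation path `/checkout/confirm` is hypothetical.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed inactivity cut-off

def sessionize(hits, timeout=SESSION_TIMEOUT):
    """Split each customer's hits into sessions at gaps longer than `timeout`.
    `hits` is an iterable of (customer_id, timestamp, path) tuples."""
    by_customer = {}
    for cust, ts, path in sorted(hits, key=lambda h: (h[0], h[1])):
        sessions = by_customer.setdefault(cust, [])
        # Continue the current session if the previous hit is recent enough.
        if sessions and ts - sessions[-1][-1][0] <= timeout:
            sessions[-1].append((ts, path))
        else:
            sessions.append([(ts, path)])
    return [s for sessions in by_customer.values() for s in sessions]

def conversion_rate(sessions, purchase_path="/checkout/confirm"):
    """Share of sessions that reach the (hypothetical) purchase-confirmation page."""
    if not sessions:
        return 0.0
    converted = sum(any(path == purchase_path for _, path in s) for s in sessions)
    return converted / len(sessions)
```

Measures such as purchasing engagement or real-time purchase likelihood would be computed from the same reconstructed sessions, for example by scoring a session while it is still open rather than after it ends.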
Chen, Po Chu (陳伯駒). "Predicting Consumers’ Purchase Decision by Clickstream Data: A Machine Learning Approach." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/cdk6y2.
National Taiwan University, Graduate Institute of Economics, academic year 106 (Republic of China calendar).
In recent years, many businesses have gradually shifted from physical stores to web shops, the so-called e-commerce. These online stores keep many log files in the back end that record the pages accessed by visitors, namely the clickstream data. In this study, we predict consumers' purchase decisions by analyzing clickstream data from an online wine retailer. We apply two machine learning models, decision tree and random forest, to predict consumers' final purchase intention. Besides the usual features based on visitors' activities on the website, we construct a new feature that clusters visitors into groups according to the sequence of page types they accessed. After re-sampling to remedy the unbalanced data, both models show high predictive accuracy of up to 90% and provide new insight for retailers to target specific visitors on the website.
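The abstract does not say which re-sampling technique was used. Random oversampling of the minority class is one common option and is assumed in this sketch; `random_oversample` is a hypothetical helper.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class matches the majority-class count."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, lab in zip(rows, labels) if lab == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels
```

Oversampling should be applied to the training split only: duplicating rows before the train/test split lets copies of the same visitor appear on both sides and inflates the measured accuracy.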
Chang, Peishih. "Sifting customers from the clickstream: behavior pattern discovery in a virtual shopping environment." Thesis, 2007. http://library1.njit.edu/etd/fromwebvoyage.cfm?id=njit-etd2007-043.
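"Um Modelo Para A Implantação de Um Data Mart de Clickstream Para Empresas Provedoras de Acesso À Internet de Pequeno E Médio Porte" [A model for implementing a clickstream data mart for small and medium-sized internet access providers]. Thesis, Programa de Pós Graduação em Engenharia de Produção (Graduate Program in Production Engineering), 2002. http://teses.eps.ufsc.br/defesa/pdf/7993.pdf.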
Camacho, Pedro André Freitas. "Sistema de recomendação em real-time para reserva de transfers" [Real-time recommendation system for transfer bookings]. Master's thesis, 2020. http://hdl.handle.net/10071/22131.
The continued growth in the number of tourists in recent years is proportional to the increased use of transfer services, and offering this type of service is becoming a trend. Today's customers are more demanding and require a more streamlined and personalized online experience, which can be achieved through techniques that anticipate customer behaviour. In contemporary society, mechanisms that recommend or help in choosing products or services are increasingly sought after, fostering cross-selling and upselling in companies. The purchase of private transfer services through website reservations generates a large amount of data that can be used to segment customers and build a recommendation system that suggests other products or services to the customer. In this dissertation, we present and develop a hybrid classification model for a transfer company based in the Algarve that intends to increase sales of its parallel services (experiences/tours). An exploratory analysis was carried out to identify the behaviour and patterns of the company's customers and to apply customer segmentation techniques. The proposed recommendation system works with a classification model that, in a first stage, identifies potential buyers of experiences and, in a second stage, suggests which of the available experiences is best suited to each client. Only a small percentage of customers who buy transfer services also buy experiences, and the system is intended to increase this percentage.
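A hypothetical two-stage pipeline in the spirit of the one described: stage 1 scores how likely a transfer customer is to buy an experience, and stage 2 suggests the experiences most often bought by past customers in the same segment. The weights, feature names and segments are invented for illustration (in practice stage 1 would be a trained classifier), and segment popularity is a deliberately simple stand-in for the dissertation's hybrid model.

```python
import math
from collections import Counter

# Hypothetical stage-1 weights; a real system would learn these from bookings.
WEIGHTS = {"bias": -2.0, "party_size": 0.3, "nights": 0.15, "first_visit": 0.8}

def buyer_probability(booking):
    """Stage 1: logistic score for 'this transfer customer buys an experience'."""
    z = WEIGHTS["bias"] + sum(WEIGHTS[k] * booking[k]
                              for k in ("party_size", "nights", "first_visit"))
    return 1.0 / (1.0 + math.exp(-z))

def suggest(booking, history, threshold=0.5, top_n=2):
    """Stage 2: for likely buyers only, return the experiences most often bought
    by past customers in the same (hypothetical) segment."""
    if buyer_probability(booking) < threshold:
        return []
    counts = Counter(exp for seg, exp in history if seg == booking["segment"])
    return [exp for exp, _ in counts.most_common(top_n)]
```

Gating stage 2 behind stage 1 keeps recommendations away from customers who are unlikely to buy, which matters when, as the abstract notes, only a small share of transfer customers purchase experiences at all.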
Mota, Gabriel Ivan da Silva Rosa Neco da. "Detection of fraud patterns in electronic commerce environments." Master's thesis, 2014. http://hdl.handle.net/1822/33392.
Electronic transactions (e-commerce) have revolutionized the way consumers shop, making small and local retailers, which were being affected by the worldwide crisis, accessible to the entire world. As the e-commerce market expands, credit card transactions of the Card or Customer Not Present (CNP) type also increase. This growing relationship, quite natural and expected, has clear advantages, facilitating e-commerce transactions and opening new trading possibilities. At the same time, however, a big and serious problem emerges: fraudulent payments. Fraud imposes severe financial losses that deeply affect e-commerce companies and their revenue. To minimize losses, these companies spend considerable effort (and money) trying to establish the most satisfactory solutions for detecting and counteracting fraud in a timely manner. In the e-commerce domain, fraud analysts are typically interested in subject-oriented customer data, frequently extracted from each order placed on an e-commerce site. Besides transactional data, all of the customers' behavior data, e.g. clickstream data, are traced and recorded, enriching the means of detection with profiling data and providing a way to trace customers' behavior over time. In this work, a signature-based method was used to establish the characteristics of user behavior and detect potential fraud cases. Signatures have already been used successfully for anomaly detection in many areas, such as credit card usage, network intrusion and, in particular, telecommunications fraud. A signature is defined by a set of attributes that take a diverse range of variables related to a user's behavior in an e-commerce application scenario, e.g. the average number of orders, time spent per order, number of payment attempts and number of days since the last visit, among many others.
By analyzing deviations in user behavior, detected by comparing a user's recent activity with the user's behavior data as expressed in their signature, it is possible to detect potential fraud situations (deviant behaviors) in useful time, giving fraud analysts a more robust and accurate decision support system for their daily work.
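The signature idea can be sketched as a per-user running profile of attributes like those listed, with new activity compared against it before it is absorbed. This minimal version assumes an exponentially weighted mean and variance per attribute and a z-score threshold; the attribute names, update weight, variance floor and threshold are hypothetical choices, not values from the dissertation.

```python
from dataclasses import dataclass, field

# Hypothetical attribute names modelled on the examples in the abstract.
ATTRIBUTES = ("orders_per_visit", "seconds_per_order", "payment_attempts", "days_since_last_visit")
VAR_FLOOR = 1.0  # assumed minimum variance, so a very stable user is not hypersensitive

@dataclass
class Signature:
    """Per-user profile: exponentially weighted mean and variance per attribute."""
    alpha: float = 0.1  # assumed weight given to newly absorbed activity
    mean: dict = field(default_factory=dict)
    var: dict = field(default_factory=dict)

    def deviation(self, activity):
        """Largest absolute z-score of the activity against the signature."""
        score = 0.0
        for a in ATTRIBUTES:
            if a in self.mean:
                sd = max(self.var[a], VAR_FLOOR) ** 0.5
                score = max(score, abs(activity[a] - self.mean[a]) / sd)
        return score

    def update(self, activity):
        """Fold activity into the profile (incremental EW mean/variance)."""
        for a in ATTRIBUTES:
            x = activity[a]
            if a not in self.mean:
                self.mean[a], self.var[a] = x, VAR_FLOOR
            else:
                d = x - self.mean[a]
                self.mean[a] += self.alpha * d
                self.var[a] = (1 - self.alpha) * (self.var[a] + self.alpha * d * d)

def check(sig, activity, threshold=4.0):
    """Flag the activity if it deviates too much; otherwise absorb it."""
    suspicious = sig.deviation(activity) > threshold
    if not suspicious:
        sig.update(activity)
    return suspicious
```

Flagged activity is kept out of the signature so that an anomalous session does not shift the user's baseline before an analyst has reviewed it.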
Borges, Eurico Alexandre Teixeira. "Sistemas de Data Webhousing: análise, desenho, implementação e exploração de sistemas reais" [Data Webhousing systems: analysis, design, implementation and exploitation of real systems]. Master's thesis, 2004. http://hdl.handle.net/1822/2787.
The Web is becoming one of the most appealing environments for many organisations, both as a means of promoting their businesses and activities and as a commercialisation channel. However, a Web user can easily leave an organisation's Web site for a competitor's if they don't find what they are looking for or find something unpleasant on the site. Knowing the site's users and making sure that the products, services or information the site provides are what the users want is nowadays a must. That is why many organisations have started to study how users browse their Web sites, where and why they leave, how frequently they return, which products and services are most appealing and, in general terms, anything that may be used to improve the Web site and attract new users. Every user move can be tracked by recording the clicks users make on the different Web pages during their visit. This flow of clicks is called the clickstream. It is the data logged by the Web server on the user's selections that enables the organisation to study users' moves and behaviour. However, the Web server log keeps only the bare bones of the user's activity, and this data has to be enriched with data collected by other systems that provide the Web site with content or additional functionality. Traditionally, data from heterogeneous sources is gathered and integrated inside a Data Warehouse; adding clickstream data to it creates a Data Webhouse. However, Web technology and the volume, heterogeneity and incompleteness of the data make it difficult to extract, transform and load data into the Data Webhouse. In this document we present a dimensional model for a Data Webhouse whose purpose is to analyse a commercial Web site. Several data sources are presented and analysed in detail.
Some of the techniques used to eliminate or reduce clickstream data problems are also described. The Data Webhouse extraction, cleaning, transformation and loading process is described, with special attention paid to clickstream processing tasks such as user and robot identification and user session reconstruction. A new decision support system prototype, named Webuts - Web Usage Tracking Statistics, is presented. Its purpose is to track and analyse the moves and activities of a Web site's users and to generate statistical data on the Web site's operation. It is built on a Data Webhouse, and its development incorporated some of the elements, techniques and best practices studied and described.
Sonae, Indústria Consultoria e Gestão - Information Systems Department
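Two of the clickstream processing tasks mentioned, robot identification and user identification, are often approached with simple heuristics. This sketch assumes a hit record with IP, user agent and path, flags visitors that request `/robots.txt` or send a bot-like user agent, and uses the (IP, user agent) pair as a stand-in user identifier; the marker list is illustrative, and this is not the Webuts implementation.

```python
from collections import namedtuple

Hit = namedtuple("Hit", "ip user_agent path")

# Illustrative keyword list; real deployments also consult published bot lists.
BOT_MARKERS = ("bot", "crawler", "spider", "slurp")

def robot_visitors(hits):
    """Visitor keys (ip, user agent) that look automated: a request for
    /robots.txt or a user agent containing a bot-like keyword."""
    robots = set()
    for h in hits:
        if h.path == "/robots.txt" or any(m in h.user_agent.lower() for m in BOT_MARKERS):
            robots.add((h.ip, h.user_agent))
    return robots

def human_hits_by_visitor(hits):
    """Group the remaining hits per visitor, using (ip, user agent) as a
    heuristic user identifier when no login or cookie is available."""
    robots = robot_visitors(hits)
    visitors = {}
    for h in hits:
        key = (h.ip, h.user_agent)
        if key not in robots:
            visitors.setdefault(key, []).append(h.path)
    return visitors
```

Robot filtering runs before session reconstruction because crawler traffic would otherwise produce long, artificial sessions that distort visit statistics.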
Cavalcanti, Fábio Torres. "Incremental mining techniques." Master's thesis, 2005. http://hdl.handle.net/1822/3965.
The increasing need to explore and analyse organisational data, seeking new knowledge that may be implicit in operational systems, has given a huge impulse to the study of data mining techniques. This impulse is clearly noticeable in the e-commerce domain, where analysing clients' past behaviour is extremely valuable and may eventually provide important instruments for determining their future behaviour. It is therefore possible to predict what a Web site visitor might be looking for and restructure the Web site to meet their needs. The visitor then stays longer on the Web site, which increases the probability of being attracted by some product and purchasing it. To achieve this goal, Web site adaptation has to be fast enough to happen while the visitor navigates, and must also follow the most recent navigation behaviour patterns of visitors, which requires a mining algorithm with a response time good enough to update the patterns frequently. Typical databases change continuously over time, which can invalidate some patterns or introduce new ones. Conventional data mining techniques have thus proved inefficient, as they must be re-executed to update the mining results with the latest database changes. Incremental mining techniques emerged to avoid re-executing the algorithm, updating the mining results when incremental data are added or old data are removed and ensuring better performance of the data mining process. In this work, we analyse some existing incremental mining strategies and models, with particular emphasis on their application to Web sites, in order to develop models that discover Web user behaviour patterns and automatically generate recommendations for restructuring sites in useful time.
To accomplish this task, we designed and implemented Spottrigger, a system responsible for the whole data life cycle in a Web site restructuring process. This life cycle includes tasks that extract the raw data stored in Web servers, pass these data through intermediate cleansing and preparation phases, apply an incremental data mining technique to extract users' navigation patterns and, finally, suggest new locations for spots on the Web site according to the patterns found and the visitor's profile. We applied Spottrigger in our case study, based on data gathered from a real online newspaper. Our main goal was to collect, in useful time, information about the users consulting the site at a given moment and thus restructure the Web site in the short term, delivering scheduled advertisements activated according to the user's profile. Basically, our idea is to classify advertisements into levels and restructure the Web site so that higher-level advertisements appear on the pages the visitor will most probably access. To do that, we construct a page ranking for the visitor based on results obtained through the incremental mining technique. Since visitors' navigation behaviour may change over time, the incremental mining algorithm is responsible for catching these behaviour changes and quickly updating the patterns. Using Spottrigger as a decision support system for advertising, a newspaper company may significantly improve the merchandising of its advertising spots, ensuring that a given advertisement reaches a higher number of visitors, even if visitors change their behaviour and visit pages they did not usually visit.
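Spottrigger's incremental mining algorithm is not specified in the abstract. As a simplified stand-in, this sketch maintains first-order page-transition counts that are updated as sessions are added or removed, so the next-page ranking never needs a full re-scan of the log; the class and method names are hypothetical.

```python
from collections import Counter, defaultdict

class IncrementalTransitions:
    """Page-to-page transition counts kept current as sessions are added
    or removed, without re-mining the whole log."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    @staticmethod
    def _pairs(session):
        return zip(session, session[1:])

    def add_session(self, session):
        for a, b in self._pairs(session):
            self.counts[a][b] += 1

    def remove_session(self, session):
        for a, b in self._pairs(session):
            self.counts[a][b] -= 1
            if self.counts[a][b] <= 0:
                del self.counts[a][b]  # drop patterns that no longer occur

    def rank_next_pages(self, current_page, min_support=1):
        """Pages most often visited right after `current_page`, best first."""
        nxt = self.counts.get(current_page, Counter())
        return [page for page, n in nxt.most_common() if n >= min_support]
```

Removing old sessions as well as adding new ones keeps the ranking tied to recent behaviour, which is the property the abstract asks of the incremental algorithm when readers start visiting pages they did not usually visit.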
The growing need to explore and analyze data, in search of new knowledge about an organization's business held in its operational systems, has given a strong impetus to the study of data mining techniques. This is clearly visible in the e-commerce domain. There, the analysis of customers' past behaviour is extremely valuable and may eventually yield new and highly useful elements for determining their future behaviour. It thus becomes possible to predict what a Web site visitor may be looking for and then prepare the site to better meet their needs. The visitor stays longer browsing the site, which naturally increases the chance of being attracted by new products and, eventually, buying them. To achieve this goal, site adaptation must be fast enough to keep pace with the visitor's navigation while reflecting the most recent visitor navigation behaviour patterns. This requires a data mining algorithm that performs well enough for the patterns to be updated frequently. Databases change constantly over time, invalidating existing patterns or introducing new ones. Conventional data mining techniques have therefore proved inefficient, since they must be re-executed to bring the mining results in line with the changes made to the database. Incremental mining techniques emerged to avoid this re-execution when new (incremental) data are added or old data are removed. This makes data mining processes more efficient.
In this work, we analyze some of the different strategies and models for incremental data mining, with particular emphasis on their application to Web sites. We aim to develop models that discover the behaviour patterns of these sites' visitors and automatically generate recommendations for restructuring the sites in a timely manner. To this end, we designed and implemented the Spottrigger system, which covers the entire life cycle of the Web site restructuring process. This cycle consists essentially of tasks that:

- extract the raw data stored on Web servers;
- pass these data through intermediate cleansing and preparation phases;
- run an incremental mining technique to extract users' navigation patterns;
- finally, restructure the Web site according to the navigation patterns found and the user's own profile.

Spottrigger was applied in our case study, which is based on real data from an online newspaper. Our main goal was to collect, in a timely manner, information about the profile of the users consulting the site at a given moment. The site could then be restructured as quickly as possible, displaying the desired advertisements, activated according to the user's profile. The advertisements in the system are classified by level. Sites are restructured so that higher-level advertisements are placed on the pages most likely to be visited. To this end, a page ranking was defined for the user, based on the frequent patterns obtained through the incremental mining process. Visitors' navigation behaviour may change over time, so the incremental mining algorithm is also responsible for capturing these changes and quickly updating the patterns.
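The ranking and placement step can be sketched as follows. The sketch assumes that the mined patterns are single pages and contiguous page pairs, each with its support (the fraction of sessions containing it). It ranks the pages a visitor may go to next from the current page by the confidence of the transition, support((current, page)) / support((current,)). Advertisements are then assigned in descending level order to pages in descending rank order, one per page. The abstract does not state the exact ranking formula or how the visitor's profile enters it, so the profile is left out. The functions `rank_pages` and `place_ads` and the ad identifiers are hypothetical.

```python
from typing import Dict, List, Tuple

Pattern = Tuple[str, ...]


def rank_pages(current_page: str,
               supports: Dict[Pattern, float]) -> List[Tuple[str, float]]:
    """Rank candidate next pages by the confidence of the transition
    current_page -> page: support((current_page, page)) / support((current_page,))."""
    base = supports.get((current_page,))
    if not base:
        return []  # no frequent pattern starts from this page
    scored = [
        (pattern[1], support / base)
        for pattern, support in supports.items()
        if len(pattern) == 2 and pattern[0] == current_page
    ]
    # Highest confidence first; ties broken alphabetically for determinism.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


def place_ads(ranking: List[Tuple[str, float]],
              ads: Dict[str, int]) -> Dict[str, str]:
    """Assign ads to pages, highest ad level to highest-ranked page.
    `ads` maps a hypothetical ad identifier to its level (larger = higher)."""
    by_level = sorted(ads, key=lambda ad: (-ads[ad], ad))
    # zip stops at the shorter list: surplus pages or ads stay unassigned.
    return {page: ad for (page, _), ad in zip(ranking, by_level)}
```

When the mined patterns are refreshed incrementally, re-running this step on the updated supports moves the higher-level advertisements to the pages the visitor is now most likely to reach.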