To see the other types of publications on this topic, follow the link: Data Scraping.

Dissertations / Theses on the topic 'Data Scraping'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Select a source type:

Consult the top 41 dissertations / theses for your research on the topic 'Data Scraping.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Press on it, and we will generate automatically the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as pdf and read online its abstract whenever available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Carle, Victor. "Web Scraping using Machine Learning." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-281344.

Full text
Abstract:
This thesis explores the possibilities of creating a robust Web Scraping algorithm, designed to continously scrape a specific website even though the HTML code is altered. The algorithm is intended to be used on websites that have a repetitive HTML structure containing data that can be scraped. A repetitive HTML structure often displays; news articles, videos, books, etc. This creates code in the HTML which is repeated many times, as the only thing different between the things displayed are for example titles. A good examplewould be Youtube. The scraper works through using text classification of words in the code of the HTML, training a Support Vector Machine to recognize the words or variable names. Classification of the words surrounding the sought-after data is done with the assumption that the future HTML ofa website will be similar to the current HTML, this in turn allows for robust scraping to be performed. To evaluate its performance a web archive is used in which the performance of the algorithm is back-tested on past versions of the site to hopefully get an idea of what the performance in the future might look like. The algorithm achieves varying results depending on a large variety of variables within the websites themselves as well as the past versions of the websites. The best performance was achieved on Yahoo news achieving an accuracy of 90 % dating back three months from the time the scraper stopped working.
Den här rapporten undersöker vad som krävs för att skapa en robust webbskrapare, designad för att kontinuerligt kunna skrapa en specifik hemsida trots att den underliggande HTML-koden förändras. En algoritm presenteras som är lämplig för hemsidor med en repetitiv HTML-struktur. En repetitiv HTML struktur innebär ofta att det visas saker såsom nyhetsartiklar, videos, böcker och så vidare. Det innebär att samma HTML-kod återanvänds ett flertal gånger, då det enda som skiljer de här sakerna åt är exempelvis deras titlar. Ett bra exempel är hemsidan Youtube. Skraparen funkar genom att använda textklassificering av ord som finns i HTML-koden, på så sätt kan maskinlärningsalgoritmen, support vector machine, känna igen den kod som omger datan som är eftersökt på hemsidan. För att möjliggöra detta så förvandlas HTML-koden, samt relevant metadata, till vektorer med hjälp av bag-of-words-modellen. Efter omvandlingen kan vektorerna matas in i maskinlärnings-modellen och klassifiera datan. Algoritmen testas på äldre versioner utav hemsidan tagna från ett webarkiv för att förhoppningsvis få en bra bild utav vad framtida prestationer skulle kunna vara. Algoritmen uppnår varierande resultat baserat på en stor mängd variabler inom hemsidan samt de äldre versionerna av hemsidorna. Algoritmen presterade bäst på Yahoo news där den uppnådde 90 % träffsäkerhet på äldre sidor.
APA, Harvard, Vancouver, ISO, and other styles
2

Färholt, Fredric. "Less Detectable Web Scraping Techniques." Thesis, Linnéuniversitetet, Institutionen för datavetenskap och medieteknik (DM), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-104887.

Full text
Abstract:
Web scraping is an efficient way of gathering data, and it has also become much eas- ier to perform and offers a high success rate. People no longer need to be tech-savvy when scraping data since several easy-to-use platform services exist. This study conducts experiments to see if people can scrape in an undetectable fashion using a popular and intelligent JavaScript library (Puppeteer). Three web scraper algorithms, where two of them use movement patterns from real-world web users, demonstrate how to retrieve information automatically from the web. They operate on a website built for this research that utilizes known semi-security mechanisms, honeypot, and activity logging, making it possible to collect and evaluate data from the algorithms and the website. The result shows that it may be possible to construct a web scraper algorithm with less detectability using Puppeteer. One of the algorithms reveals that it is possible to control computer performance using built-in methods in Puppeteer.
Webbskrapning är ett effektivt sätt att hämta data på, det har även blivit en aktivitet som är enkel att genomföra och chansen att en lyckas är hög. Användare behöver inte längre vara fantaster inom teknik när de skrapar data, det finns idag mängder olika och lättanvändliga plattformstjänster. Den här studien utför experi- ment för att se hur personer kan skrapa på ett oupptäckbart sätt med ett populärt och intelligent JavaScript bibliotek (Puppeteer). Tre webbskrapningsalgoritmer, där två av dem använder rörelsemönster från riktiga webbanvändare, demonstrerar hur en kan samla information. Webbskrapningsalgoritmerna har körts på en hemsida som ingått i experimentet med kännbar säkerhet, honeypot, och aktivitetsloggning, nå- got som gjort det möjligt att samla och utvärdera data från både algoritmerna och hemsidan. Resultatet visar att det kan vara möljligt att skrapa på ett oupptäckbart sätt genom att använda Puppeteer. En av algoritmerna avslöjar även möjligheten att kontrollera prestanda genom att använda inbyggda metoder i Puppeteer.
APA, Harvard, Vancouver, ISO, and other styles
3

Legaspi, Ramos Xurxo. "Scraping Dynamic Websites for Economical Data : A Framework Approach." Thesis, Linnéuniversitetet, Institutionen för datavetenskap (DV), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-57070.

Full text
Abstract:
Internet is a source of live data that is constantly updating with data of almost anyfield we can imagine. Having tools that can automatically detect these updates andcan select that information that we are interested in are becoming of utmost importancenowadays. That is the reason why through this thesis we will focus on someeconomic websites, studying their structures and identifying a common type of websitein this field: Dynamic Websites. Even when there are many tools that allow toextract information from the internet, not many tackle these kind of websites. Forthis reason we will study and implement some tools that allow the developers to addressthese pages from a different perspective.
APA, Harvard, Vancouver, ISO, and other styles
4

Oucif, Kadday. "Evaluation of web scraping methods : Different automation approaches regarding web scraping using desktop tools." Thesis, KTH, Data- och elektroteknik, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-188418.

Full text
Abstract:
A lot of information can be found and extracted from the semantic web in different forms through web scraping, with many techniques emerging throughout time. This thesis is written with the objective to evaluate different web scraping methods in order to develop an automated, performance reliable, easy implemented and solid extraction process. A number of parameters are set to better evaluate and compare consisting techniques. A matrix of desktop tools are examined and two were chosen for evaluation. The evaluation also includes the learning of setting up the scraping process with so called agents. A number of links gets scraped by using the presented techniques with and without executing JavaScript from the web sources. Prototypes with the chosen techniques are presented with Content Grabber as a final solution. The result is a better understanding around the subject along with a cost-effective extraction process consisting of different techniques and methods, where a good understanding around the web sources structure facilitates the data collection. To sum it all up, the result is discussed and presented with regard to chosen parameters.
En hel del information kan bli funnen och extraherad i olika format från den semantiska webben med hjälp av webbskrapning, med många tekniker som uppkommit med tiden. Den här rapporten är skriven med målet att utvärdera olika webbskrapnings metoder för att i sin tur utveckla en automatiserad, prestandasäker, enkelt implementerad och solid extraheringsprocess. Ett antal parametrar är definierade för att utvärdera och jämföra befintliga webbskrapningstekniker. En matris av skrivbords verktyg är utforskade och två är valda för utvärdering. Utvärderingen inkluderar också tillvägagångssättet till att lära sig sätta upp olika webbskrapnings processer med så kallade agenter. Ett nummer av länkar blir skrapade efter data med och utan exekvering av JavaScript från webbsidorna. Prototyper med de utvalda teknikerna testas och presenteras med webbskrapningsverktyget Content Grabber som slutlig lösning. Resultatet utav det hela är en bättre förståelse kring ämnet samt en prisvärd extraheringsprocess bestående utav blandade tekniker och metoder, där en god vetskap kring webbsidornas uppbyggnad underlättar datainsamlingen. Sammanfattningsvis presenteras och diskuteras resultatet med hänsyn till valda parametrar.
APA, Harvard, Vancouver, ISO, and other styles
5

Rodrigues, Lanny Anthony, and Srujan Kumar Polepally. "Creating Financial Database for Education and Research: Using WEB SCRAPING Technique." Thesis, Högskolan Dalarna, Mikrodataanalys, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:du-36010.

Full text
Abstract:
Our objective of this thesis is to expand the microdata database of publicly available corporate information of the university by web scraping mechanism. The tool for this thesis is a web scraper that can access and concentrate information from websites utilizing a web application as an interface for client connection. In our comprehensive work we have demonstrated that the GRI text files approximately consist of 7227 companies; from the total number of companies the data is filtered with “listed” companies. Among the filtered 2252 companies some do not have income statements data. Hence, we have finally collected data of 2112 companies with 36 different sectors and 13 different countries in this thesis. The publicly available information of income statements between 2016 to 2020 have been collected by GRI of microdata department. Collecting such data from any proprietary database by web scraping may cost more than $ 24000 a year were collecting the same from the public database may cost almost nil, which we will discuss further in our thesis.In our work we are motivated to collect the financial data from the annual financial statement or financial report of the business concerns which can be used for the purpose to measure and investigate the trading costs and changes of securities, common assets, futures, cryptocurrencies, and so forth. Stock exchange, official statements and different business-related news are additionally sources of financial data that individuals will scrape. We are helping those petty investors and students who require financial statements from numerous companies for several years to verify the condition of the economy and finance concerning whether to capitalise or not, which is not possible in a conventional way; hence they use the web scraping mechanism to extract financial statements from diverse websites and make the investment decisions on further research and analysis.Here in this thesis work, we have indicated the outcome of the web scraping is to keep the extracted data in a database. The gathered data of the resulted database can be implemented for the required goal of further research, education, and other purposes with the further use of the web scraping technique.
APA, Harvard, Vancouver, ISO, and other styles
6

Cosman, Vadim, and Kailash Chowdary. "End user interface for collecting and evaluating company data : Real-time data collection through web-scraping." Thesis, Högskolan Dalarna, Institutionen för information och teknik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:du-37740.

Full text
Abstract:
The demand of open and reliable data, in the Era of Big Data is constantly increasing as thediversity of research and the need of trustworthy data as high-quality data is increasesconsiderably the quality of the findings . However, it is very hard to get reliable data for free witha small effort. With an immense progress of tools, on one hand for data scraping, data cleansing,data storing, and on the other hand so many platforms with data that can be scrapped, it isabsolutely crucial to make use of them and easily build data sets with real and trustworthy data,for free and in a user-friendly way. Using several available tools, an application with a graphicaluser interface (GUI) was developed. The possibilities of the applications are: collecting financialdata for any given list of companies, updating an existent data set, build a data set out of thewhole data warehouse(DW), based on several filters, make the data sets available to anyone whouses the application, and build simple visualization of the existent data. To make sure that‘garbage data in – garbage data out’ concept is avoided, a constant analysis of the data quality isperformed, and the quality of the data is adjusted so that it is ready for use in a research project.The work provides a viable solution for collecting data and making it borderless while respectingthe standards of data sharing. The application can collect data from 2 sources, with more than250 features per company. The application is updated with more functionalities and more sourcesof data.
APA, Harvard, Vancouver, ISO, and other styles
7

Ceccaroni, Giacomo. "Raccolta di dati eterogenei e multi-sorgente per data visualization dei rapporti internazionali dell'Ateneo di Bologna." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2017. http://amslaurea.unibo.it/13940/.

Full text
Abstract:
Il caso di studio descritto all'interno di questo documento, analizza la raccolta dei dati sul Web e la visualizzazione di tali attraverso tecniche di Data Visualization. Il sistema risultante si pone come obiettivo quello di poter essere utilizzato dal personale dell'Area Relazioni Internazionali dell'Università di Bologna per ottenere informazioni utili alla mappatura delle relazioni internazionali mantenute da professori e ricercatori. L'obiettivo è stato raggiunto partendo dai requisiti specificati, utilizzati poi nella successiva fase di analisi del problema e del dominio. L'utilizzo pratico che verrà fatto dell'applicazione Web finita, è descritto attraverso la scrittura di scenari previsti e casi d'uso. La parte implementativa del progetto si sviluppa iniziando con una panoramica delle tecnologie utilizzate per raggiungere l'obiettivo e delle ragioni che hanno portato a tali scelte. Tra gli elementi tecnologici trattati vi sono Couchbase Server, Scopus API, moduli Python e framework Javascript. In particolare, per mettere in atto la visualizzazione dei dati nel progetto sono stati utilizzati: D3.js e Leaflet.js.
APA, Harvard, Vancouver, ISO, and other styles
8

Franchini, Giulia. "Associazioni non profit e linked open data: un esperimento." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2015. http://amslaurea.unibo.it/8350/.

Full text
Abstract:
Le Associazioni Non Profit giocano un ruolo sempre più rilevante nella vita dei cittadini e rappresentano un'importante realtà produttiva del nostro paese; molto spesso però risulta difficile trovare informazioni relative ad eventi, attività o sull'esistenza stessa di queste associazioni. Per venire in contro alle esigenze dei cittadini molte Regioni e Province mettono a disposizione degli elenchi in cui sono raccolte le informazioni relative alle varie organizzazioni che operano sul territorio. Questi elenchi però, presentano spesso grossi problemi, sia per quanto riguarda la correttezza dei dati, sia per i formati utilizzati per la pubblicazione. Questi fattori hanno portato all'idea e alla necessità di realizzare un sistema per raccogliere, sistematizzare e rendere fruibili le informazioni sulle Associazioni Non Profit presenti sul territorio, in modo che questi dati possano essere utilizzati liberamente da chiunque per scopi diversi. Il presente lavoro si pone quindi due obiettivi principali: il primo consiste nell'implementazione di un tool in grado di recuperare le informazioni sulle Associazioni Non Profit sfruttando i loro Siti Web; questo avviene per mezzo dell'utilizzo di tecniche di Web Crawling e Web Scraping. Il secondo obiettivo consiste nel pubblicare le informazioni raccolte, secondo dei modelli che ne permettano un uso libero e non vincolato; per la pubblicazione e la strutturazione dei dati è stato utilizzato un modello basato sui principi dei linked open data.
APA, Harvard, Vancouver, ISO, and other styles
9

Holm, Andreas, and Oscar Ahlm. "Skrapa Facebook : En kartläggning över hur data kan samlas in från Facebook." Thesis, Malmö universitet, Institutionen för datavetenskap och medieteknik (DVMT), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-43326.

Full text
Abstract:
På sociala medier delas det varje dag en stor mängd data. Om denna data kan samlas in ochsorteras, kan den vara värdefull som underlag för forskningsarbete. Särskilt för forskning iländer där sociala medier kan vara enda platsen för medborgare att göra sin röst hörd. Fa-cebook är en av världens mest använda sociala medieplattformar och är därför en potentiellrik källa att samla data ifrån. Dock har Facebook på senare år valt att vara mer restrik-tiv kring vem som får tillgång till data på deras plattform. Detta har öppnat ett intresseför hur man kan få tillgång till den data som delas på Facebooks plattform utan explicittillstånd från Facebook. Det öppnar samtidigt för frågor kring etik och legalitet gällandedetsamma. Detta arbete ämnade därför undersöka olika aspekter, så som tekniska, etiska,lagliga, kring att samla data från Facebooks plattform genom att utföra en litteraturstudiesamt experiment. Litteraturstudien visade att det var svårt att hitta material om vilkatekniska åtgärder som Facebook tar för att förhindra webbskrapning. Experimenten somgenomfördes visade en del av dessa, bland annat att HTML-strukturen förändras och attid för HTML-element förändras vid vissa händelser, vilket försvårar webbskrapningspro-cessen. Litteraturstudien visade även att det är besvärligt att veta vad som är lagligt attskrapa från Facebook och vad som är olagligt. Detta dels för att olika länder har olika lagaratt förhålla sig till när det kommer till webbskrapning, dels för att det kan vara svårt attveta vad som räknas som personlig data och som då skyddas av bland annat GDPR.
A vast amount of data is shared daily on social media platforms. Data that if it can becollected and sorted can prove valueable as a basis for research work. Especially in countrieswhere social media constitutes the only possible place for citizens to make their voicesheard. Facebook is one of the most frequently used social media platforms and thus can bea potential rich source from which data can be collected. But Facebook has become morerestrictive about who gets access to the data on their platform. This has created an interestin ways how to get access to the data that is shared on Facebooks platform without gettingexplicit approval from Facebook. At the same time it creates questions about the ethicsand the legality of it. This work intended to investigate different aspects, such as technical,ethical, legal, related to the collecting of data from Facebooks platform by performing aliterary review and experiments. The literary review showed that it was difficult to findmaterial regarding technical measures taken by Facebook to prevent web scraping. Theexperiments that were performed identified some of these measures, among others thatthe structure of the HTML code changes and that ids of HTML elements updates whendifferent events occur on the web page, which makes web scraping increasingly difficult.The literary review also showed that it is troublesome to know which data is legal to scrapefrom Facebook and which is not. This is partly due to the fact that different countries havedifferent laws to which one must conform when scraping web data, and partly that it canbe difficult to know what counts as personal data and thus is protected by GDPR amongother laws.
APA, Harvard, Vancouver, ISO, and other styles
10

Mascellaro, Maria Maddalena. "Integrazione di sorgenti eterogenee per un sistema di Data Visualization." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2018. http://amslaurea.unibo.it/16818/.

Full text
Abstract:
Il caso di studio descritto all'interno del volume di tesi analizza le tecniche di raccolta dei dati sul Web e il modo in cui essi possono essere rappresentati attraverso le tecniche di Data Visualization. L’applicazione web realizzata permette all'utente di visionare le collaborazioni internazionali dei professori e dei ricercatori dell'Università di Bologna con istituzioni estere. L'obiettivo di questa tesi è quello di raccogliere le informazioni relative alle collaborazioni contenute in una banca dati chiamata Web of Science. Questi dati sono poi stati integrati con quelli già presenti all'interno dell’applicazione web. Per questo motivo sono state individuate due macro-fasi di lavoro: la raccolta dei dati e l’integrazione di essi con quelli già raccolti in precedenza nella banca dati Scopus. La prima fase è stata la più corposa all’interno di questo progetto di tesi, è stata effettuata con script Python che, attraverso la libreria WOS e le API di Web of Science, hanno estrapolato i dati dalla banca dati. Durante la seconda fase è stata modificata l’interfaccia del sito, permettendo all’utente di individuare quale fosse l’origine delle pubblicazioni esaminate. Un'altra funzionalità implementata è stata la versione multilingua del sito (italiano-inglese).
APA, Harvard, Vancouver, ISO, and other styles
11

Ventura, Pedro Côrte-Real Machado. "Dashboard de tráfego nos websites de empresas europeias." Master's thesis, Instituto Superior de Economia e Gestão, 2019. http://hdl.handle.net/10400.5/19475.

Full text
Abstract:
Mestrado em Gestão de Sistemas de Informação
Os serviços de Business Intelligence oferecem várias formas de processar e analisar a riqueza de um conjunto de dados nas empresas nos dias de hoje. Neste trabalho de projeto, foi utilizado o Microsoft Power BI para desenvolver uma solução de business intelligence com um conjunto de dados no âmbito de tráfego de websites. Para fornecer contexto teórico em como desenvolver uma solução de business intelligence através de um conjunto de dados semiestruturados, os princípios chave de modelação multidimensional são introduzidos. Os resultados do processo de desenvolvimento são evidenciados na parte da descrição e discussão dos resultados, de exemplos de dashboards criados para a solução. Sendo esta uma ferramenta com utilização diária, o desempenho técnico e a leitura de documentação de apoio é importante para uma adopção positiva desta ferramenta por parte dos utilizadores. Os aspectos de funcionalidade e desempenho foram analisados e otimizados com base na pesquisa da revisão de literatura, e a ferramenta foi colocada a funcionar corretamente.
Business intelligence services offer many ways to process and analyze the richness of a data set in business today. In this project work, Microsoft Power BI and a web scraping tool were used to develop a business intelligence solution with a data set within the scope of website traffic. To provide theoretical context on how to develop a business intelligence solution through a semi-structured data set, key principles of multidimensional modeling are introduced. The results of the development process are shown in the description and discussion of the results, examples of dashboards created for the solution. As this is a daily tool, technical performance and reading supporting documentation is important for the positive adoption of this tool by users. Functionality and performance aspects were analyzed and optimized based on literature review research, and the tool was put to work correctly.
info:eu-repo/semantics/publishedVersion
APA, Harvard, Vancouver, ISO, and other styles
12

Hidén, Filip, and Magnus Qvarnström. "En jämförelse av prestanda mellan centraliserad och decentraliserad datainsamling." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-291266.

Full text
Abstract:
In the modern world, data and information is used on a larger scale than ever before. Much of this information is stored on the internet in many different shapes, like articles, files and webpages, among others. If you try to start a new project or company that depends on this data there is a need for a way to efficiently search for, sort and gather what you need to process. A common method to achieve this is called Web scraping, that can be implemented in several different ways to search and gather data. This can be an expensive investment for smaller companies, as Web scraping is an intensive process that requires that you pay for a powerful enough server to manage everything. The purpose of this report is to investigate whether there exist other cheaper alternatives to implement Web scraping, that don’t require access to expensive servers. To find an answer to this, it was necessary to research the subject of Web scraping further along with different system architectures that are used in the industry to implement it. This research was then used to develop a Web scraping application that was implemented on both a centralised server and as a decentralised implementation on an Android device. Finally all the summarized research and results from performance tests of the two applications were used in order to provide a result. The conclusion drawn from these results was that decentralised android implementations is a valid and functional solution for Web scraping today, however the difference in performance means it’s not always useful for every situation. Instead it must be handled based on the specifications and requirements of the particular company. There is also a very limited amount of research done on this topic, which means it needs further investigation in order to keep developing implementations and knowledge on this particular subject.
I den moderna världen används data och information i en större skala än någonsin tidigare. Mycket av denna information och data kan hittas på internet i många olika former som artiklar, filer, webbsidor med mera. Om man försöker att starta ett nytt projekt eller företag som är beroende av delar av denna data behövs det ett sätt att effektivt söka igenom den, sortera ut det som söks och samla in den för att hanteras. Ett vanligt sätt att göra detta är en metod som kallas Web scraping, som kan implementeras på flera olika sätt för att söka och samla in den funna datan. För små företag kan detta bli en kostsam satsning, då Web scraping är en intensiv process som vanligtvis kräver att man måste betala för att driva en tillräckligt kraftfull server som kan hantera datan. Syftet med denna rapport är att undersöka om det finns giltiga och billigare alternativ för att implementera Web scraping lösningar, som inte kräver tillgång till kostsamma serverlösningar. För att svara på detta utfördes en undersökning runt Web scraping, samt olika systemarkitekturer som används för att utveckla dessa system i den nuvarande marknaden samt hur de kan implementeras. Med denna kunskap utvecklades en Web scraping applikation som anpassades för att samla in ingredienser från recept artiklar på internet. Denna implementation anpassades sedan för två olika lösningar, en centraliserad på en server och en decentraliserad, för Android enheter. Till slut summerades all den insamlade faktan, tillsammans med enhetstester utförda på test implementationerna för att få ut ett resultat. Slutsatsen som drogs av detta resultat var att decentraliserade Android implementationer är en giltig och funktionell lösning för Web scraping idag, men skillnaden i prestanda innebär att det inte alltid är en användbar lösning, istället måste det bestämmas beroende på ett företags behov och specifikationer. Dessutom är forskningen runt detta ämne begränsat, och kräver vidare undersökning och fördjupning för att förbättra kunskaper och implementationer av detta område i framtiden.
APA, Harvard, Vancouver, ISO, and other styles
13

Michalakidis, Georgios. "Appreciation of structured and unstructured content to aid decision making : from web scraping to ontologies and data dictionaries in healthcare." Thesis, University of Surrey, 2016. http://epubs.surrey.ac.uk/812261/.

Full text
Abstract:
A systematic approach to the extraction of data from disparate data sources is proposed. The World Wide Web is a most diverse dataset; identifying ways in which this large database provides means for data quality verification with concepts such as data lineage and provenance allows to follow the same approach as a means to aid decision-making in sensitive domains such as healthcare. Through lessons learned from research in the UK and internationally, we conclude that emphasis on interoperable and model-based support of the data syndication can enhance data quality, an issue still current (American Hospital Association, 2015) and with data barriers in healthcare due to governance concerns. To improve on the above, we start by proposing a system for solution-orientated reporting of errors associated with the extraction of routinely collected clinical data. We then explore key concepts to assess the readiness of data for research and define an ontology-driven approach to create data dictionaries for quality improvement in healthcare. Finally, we apply this research to facilitate the enablement of consistent data recording across a health system to allow for service quality comparisons. Work deriving from this research and built by the author commissioned and aided by the UK NHS, University of Surrey, Green Cross Medical, particularly in creating and testing software systems in real-world scenarios, has facilitated: quality improvement in healthcare data extraction from GP practices in the UK, a state-of-art system for Web-enabling Hospital Episode Statistics (HES) data for dermatology and, finally, an online system designed to enable cancer Multi-Disciplinary Teams (MDTs) to self-assess and receive feedback on how their team performs against the standards set out in ‘The Characteristics of an Effective MDT’ provided by NHS IQ, formerly part of National Cancer Action Team (NCAT), which in 2016 won the Quality in Care Programme’s “Digital Innovation in the Treatment of Cancer” award. Further experimentation shows there is potential for the methods proposed to be applicable in other sectors such as the investment sector (initial investigation has happened through the early stages of this research) but it is suggested that this potential be explored further.
APA, Harvard, Vancouver, ISO, and other styles
14

Jakupovic, Edin. "Alternative Information Gathering on Mobile Devices." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-210712.

Full text
Abstract:
Searching and gathering information about specific topics is a time wasting, but vital practise. With the continuous growth and surpassing of desktop devices, the mobile market is becoming a more important area to consider. Due to the portability of mobile devices, certain tasks are more difficult to perform, compared to on a desktop device. Searching for information online is generally slower on mobile devices than on desktop devices, even though the majority of searches are performed on mobile devices. The largest challenges with searching for information online using mobile devices, are the smaller screen sizes, and the time spent jumping between sources and search results in a browser. These challenges could be solved by using an application that focuses on the relevancy of search results, summarizes the content of them, and presents them on a single screen. The aim of this study was to find an alternative data gathering method with a faster and simpler searching experience. This data gathering method was able to quickly find and gather data requested through a search term by a user. The data was then analyzed and presented to the user in a summarized form, to eliminate the need to visit the source of the content. A survey was performed by having a smaller target group of users answer a questionnaire. The results showed that the method was quick, results were often relevant, and the summaries reduced the need to visit the source page. But while the method had potential for future development, it is hindered by ethical issues related to the use of web scrapers.
Sökning och insamling av information om specifika ämnen är en tidskrävande, men nödvändig praxis. Med den kontinuerliga tillväxten som gått förbi stationära enheters andel, blir mobilmarknaden ett viktigt område att överväga. Med tanke på rörligheten av bärbara enheter, så blir vissa uppgifter svårare att utföra, jämfört med på stationära enheter. Att söka efter information på Internet är generellt långsammare på mobila enheter än på stationära. De största utmaningarna med att söka efter information på Internet med mobila enheter, är de mindre skärmstorlekarna, och tiden spenderad på att ta sig mellan källor och sökresultat i en webbläsare. Dessa utmaningar kan lösas genom att använda en applikation som fokuserar på relevanta sökresultat och sammanfattar innehållet av dem, samt presenterar dem på en enda vy. Syftet med denna studie är att hitta en alternativ datainsamlingsmetod för attskapa en snabbare och enklare sökupplevelse. Denna datainsamlingsmetod kommer snabbt att kunna hitta och samla in data som begärts via en sökterm av en användare. Därefter analyseras och presenteras data för användaren i en sammanfattad form för att eliminera behovet av att besöka innehållets källa. En undersökning utfördes genom att en mindre målgrupp av användare svarade på ett formulär av frågor. Resultaten visade att metoden var snabb, resultaten var ofta relevanta och sammanfattningarna minskade behovet av att besöka källsidan. Men medan metoden hade potential för framtida utveckling, hindras det av de etiska problemen som associeras med användningen av web scrapers.
APA, Harvard, Vancouver, ISO, and other styles
15

Blázquez, Soriano María Desamparados. "Design and Evaluation of Web-Based Economic Indicators: A Big Data Analysis Approach." Doctoral thesis, Universitat Politècnica de València, 2020. http://hdl.handle.net/10251/116836.

Full text
Abstract:
[ES] En la Era Digital, el creciente uso de Internet y de dispositivos digitales está transformando completamente la forma de interactuar en el contexto económico y social. Miles de personas, empresas y organismos públicos utilizan Internet en sus actividades diarias, generando de este modo una enorme cantidad de datos actualizados ("Big Data") accesibles principalmente a través de la World Wide Web (WWW), que se ha convertido en el mayor repositorio de información del mundo. Estas huellas digitales se pueden rastrear y, si se procesan y analizan de manera apropiada, podrían ayudar a monitorizar en tiempo real una infinidad de variables económicas. En este contexto, el objetivo principal de esta tesis doctoral es generar indicadores económicos, basados en datos web, que sean capaces de proveer regularmente de predicciones a corto plazo ("nowcasting") sobre varias actividades empresariales que son fundamentales para el crecimiento y desarrollo de las economías. Concretamente, tres indicadores económicos basados en la web han sido diseñados y evaluados: en primer lugar, un indicador de orientación exportadora, basado en un modelo que predice si una empresa es exportadora; en segundo lugar, un indicador de adopción de comercio electrónico, basado en un modelo que predice si una empresa ofrece la posibilidad de venta online; y en tercer lugar, un indicador de supervivencia empresarial, basado en dos modelos que indican la probabilidad de supervivencia de una empresa y su tasa de riesgo. Para crear estos indicadores, se han descargado una diversidad de datos de sitios web corporativos de forma manual y automática, que posteriormente se han procesado y analizado con técnicas de análisis Big Data. Los resultados muestran que los datos web seleccionados están altamente relacionados con las variables económicas objeto de estudio, y que los indicadores basados en la web que se han diseñado en esta tesis capturan en un alto grado los valores reales de dichas variables económicas, siendo por tanto válidos para su uso por parte del mundo académico, de las empresas y de los decisores políticos. Además, la naturaleza online y digital de los indicadores basados en la web hace posible proveer regularmente y de forma barata de predicciones a corto plazo. Así, estos indicadores son ventajosos con respecto a los indicadores tradicionales. Esta tesis doctoral ha contribuido a generar conocimiento sobre la viabilidad de producir indicadores económicos con datos online procedentes de sitios web corporativos. Los indicadores que se han diseñado pretenden contribuir a la modernización en la producción de estadísticas oficiales, así como ayudar a los decisores políticos y los gerentes de empresas a tomar decisiones informadas más rápidamente.
[CAT] A l'Era Digital, el creixent ús d'Internet i dels dispositius digitals està transformant completament la forma d'interactuar al context econòmic i social. Milers de persones, empreses i organismes públics utilitzen Internet a les seues activitats diàries, generant d'aquesta forma una enorme quantitat de dades actualitzades ("Big Data") accessibles principalment mitjançant la World Wide Web (WWW), que s'ha convertit en el major repositori d'informació del món. Aquestes empremtes digitals poden rastrejar-se i, si se processen i analitzen de forma apropiada, podrien ajudar a monitoritzar en temps real una infinitat de variables econòmiques. En aquest context, l'objectiu principal d'aquesta tesi doctoral és generar indicadors econòmics, basats en dades web, que siguen capaços de proveïr regularment de prediccions a curt termini ("nowcasting") sobre diverses activitats empresarials que són fonamentals per al creixement i desenvolupament de les economies. Concretament, tres indicadors econòmics basats en la web han sigut dissenyats i avaluats: en primer lloc, un indicador d'orientació exportadora, basat en un model que prediu si una empresa és exportadora; en segon lloc, un indicador d'adopció de comerç electrònic, basat en un model que prediu si una empresa ofereix la possibilitat de venda online; i en tercer lloc, un indicador de supervivència empresarial, basat en dos models que indiquen la probabilitat de supervivència d'una empresa i la seua tasa de risc. Per a crear aquestos indicadors, s'han descarregat una diversitat de dades de llocs web corporatius de forma manual i automàtica, que posteriorment s'han analitzat i processat amb tècniques d'anàlisi Big Data. Els resultats mostren que les dades web seleccionades estan altament relacionades amb les variables econòmiques objecte d'estudi, i que els indicadors basats en la web que s'han dissenyat en aquesta tesi capturen en un alt grau els valors reals d'aquestes variables econòmiques, sent per tant vàlids per al seu ús per part del món acadèmic, de les empreses i dels decisors polítics. A més, la naturalesa online i digital dels indicadors basats en la web fa possible proveïr regularment i de forma barata de prediccions a curt termini. D'aquesta forma, són avantatjosos en comparació als indicadors tradicionals. Aquesta tesi doctoral ha contribuït a generar coneixement sobre la viabilitat de produïr indicadors econòmics amb dades online procedents de llocs web corporatius. Els indicadors que s'han dissenyat pretenen contribuïr a la modernització en la producció d'estadístiques oficials, així com ajudar als decisors polítics i als gerents d'empreses a prendre decisions informades més ràpidament.
[EN] In the Digital Era, the increasing use of the Internet and digital devices is completely transforming the way of interacting in the economic and social framework. Myriad individuals, companies and public organizations use the Internet for their daily activities, generating a stream of fresh data ("Big Data") principally accessible through the World Wide Web (WWW), which has become the largest repository of information in the world. These digital footprints can be tracked and, if properly processed and analyzed, could help to monitor in real time a wide range of economic variables. In this context, the main goal of this PhD thesis is to generate economic indicators, based on web data, which are able to provide regular, short-term predictions ("nowcasting") about some business activities that are basic for the growth and development of an economy. Concretely, three web-based economic indicators have been designed and evaluated: first, an indicator of firms' export orientation, which is based on a model that predicts if a firm is an exporter; second, an indicator of firms' engagement in e-commerce, which is based on a model that predicts if a firm offers e-commerce facilities in its website; and third, an indicator of firms' survival, which is based on two models that indicate the probability of survival of a firm and its hazard rate. To build these indicators, a variety of data from corporate websites have been retrieved manually and automatically, and subsequently have been processed and analyzed with Big Data analysis techniques. Results show that the selected web data are highly related to the economic variables under study, and the web-based indicators designed in this thesis are capturing to a great extent their real values, thus being valid for their use by the academia, firms and policy-makers. Additionally, the digital and online nature of web-based indicators makes it possible to provide timely, inexpensive predictions about the economy. This way, they are advantageous with respect to traditional indicators. This PhD thesis has contributed to generating knowledge about the viability of producing economic indicators with data coming from corporate websites. The indicators that have been designed are expected to contribute to the modernization of official statistics and to help in making earlier, more informed decisions to policy-makers and business managers.
Blázquez Soriano, MD. (2019). Design and Evaluation of Web-Based Economic Indicators: A Big Data Analysis Approach [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/116836
TESIS
APA, Harvard, Vancouver, ISO, and other styles
16

Wöldern, Lars. "Discovery and Analysis of Social Media Data : How businesses can create customized filters to more effectively use public data." Thesis, Linnéuniversitetet, Institutionen för datavetenskap och medieteknik (DM), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-75275.

Full text
Abstract:
The availability of prospective customer information present on social media platforms has led to many marketing and customer-facing departments utilizing social media data in processes such as demographics research, and sales and campaign planning. However, if your business needs require further filtration of data, beyond what is provided by existing filters, the volume and rate at which data can be manually sifted, is constrained by the speed and accuracy of employees, and their digital competency. The repetitive nature of filtration work, lends itself to automation, that ultimately has the potential to alleviate large productivity bottlenecks, enabling organizations to distill larger volumes of unfiltered data, faster and with greater precision. This project employs automation and artificial intelligence, to filter Linkedin profiles using customized selection criteria, beyond what is currently available, such as nationality and age. By introducing the ability to produce tailored indices of social media data, automated filtration offers organizations the opportunity to better utilize rich prospective data for more efficient customer review and targeting.
APA, Harvard, Vancouver, ISO, and other styles
17

Wu, Yongliang. "Aggregating product reviews for the Chinese market." Thesis, KTH, Kommunikationssystem, CoS, 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-91484.

Full text
Abstract:
As of December 2007, the number of Internet users in China had increased to 210 million people. The annual growth rate reached 53.3 percent in 2008, with the average number of Internet users increasing every day by 200,000 people. Currently, China's Internet population is slightly lower than the 215 million internet users in the United States. [1] Despite the rapid growth of the Chinese economy in the global Internet market, China’s e-commerce is not following the traditional pattern of commerce, but instead has developed based on user demand. This growth has extended into every area of the Internet. In the west, expert product reviews have been shown to be an important element in a user’s purchase decision. The higher the quality of product reviews that customers received, the more products they buy from on-line shops. As the number of products and options increase, Chinese customers need impersonal, impartial, and detailed products reviews. This thesis focuses on on-line product reviews and how they affect Chinese customer’s purchase decisions. E-commerce is a complex system. As a typical model of e-commerce, we examine a Business to Consumer (B2C) on-line retail site and consider a number of factors; including some seemingly subtitle factors that may influence a customer’s eventually decision to shop on website. Specifically this thesis project will examine aggregated product reviews from different on-line sources by analyzing some existing western companies. Following this the thesis demonstrates how to aggregate product reviews for an e-business website. During this thesis project we found that existing data mining techniques made it straight forward to collect reviews. These reviews were stored in a database and web applications can query this database to provide a user with a set of relevant product reviews. One of the important issues, just as with search engines is providing the relevant product reviews and determining what order they should be presented in. In our work we selected the reviews based upon matching the product (although in some cases there are ambiguities concerning if two products are actually identical or not) and ordering the matching reviews by date - with the most recent reviews present first. Some of the open questions that remain for the future are: (1) improving the matching - to avoid the ambiguity concerning if the reviews are about the same product or not and (2) determining if the availability of product reviews actually affect a Chinese user's decision to purchase a product.
I december 2007 uppgick antalet internetanvändare i Kina har ökat till 210 miljoner människor. Den årliga tillväxttakten nådde 53,3 procent 2008, med den genomsnittliga Antalet Internet-användare ökar för varje dag av 200.000 människor. Närvarande Kinas Internet befolkningen är något lägre än de 215 miljoner Internetanvändare i USA Staterna.[1] Trots den snabba tillväxten i den kinesiska ekonomin i den globala Internetmarknaden, Kinas e-handel inte följer det traditionella mönstret av handel, men i stället har utvecklats baserat på användarnas efterfrågan. Denna tillväxt har utvidgas till alla områden I Internet. I väst har expert recensioner visat sig vara en viktig del I användarens köpbeslut. Ju högre kvalitet på produkten recensioner som kunderna mottagna fler produkter de köper från on-line butiker. Eftersom antalet produkter och alternativen ökar, kinesiska kunderna behöver opersonlig, opartisk och detaljerade produkter recensioner. Denna avhandling fokuserar på on-line recensioner och hur de påverkar Kinesiska kundens köpbeslut.</p> E-handel är ett komplext system. Som en typisk modell för e-handel, vi undersöka ett Business to Consumer (B2C) on-line-försäljning plats och överväga ett antal faktorer; inklusive några till synes subtitle faktorer som kan påverka kundens småningom Beslutet att handla på webbplatsen. Uttryckligen detta examensarbete kommer att undersöka aggregerade recensioner från olika online-källor genom att analysera vissa befintliga västra företag. Efter den här avhandlingen visar hur samlade produkt recensioner för en e-affärer webbplats. Under detta examensarbete fann vi att befintliga data mining tekniker gjort det rakt fram för att samla recensioner. Dessa översyner har lagrats i en databas och webb program kan söka denna databas för att ge en användare med en rad relevanta product recensioner. En av de viktiga frågorna, precis som med sökmotorer är att tillhandahålla relevanta produkt recensioner och bestämma vilken ordning de ska presenteras i. vårt arbete har vi valt recensioner baserat på matchning produkten (men i vissa fall det finns oklarheter i fråga om två produkter verkligen identiska eller inte) och beställa matchande recensioner efter datum - med den senaste recensioner närvarande första. Några av de öppna frågorna som kvarstår för framtiden är: (1) förbättra matchning - För att undvika oklarheter rörande om Gästrecensionerna om samma produkt eller inte och (2) avgöra om det finns recensioner faktiskt påverka en kinesisk användarens val att köpa en produkt.
APA, Harvard, Vancouver, ISO, and other styles
18

Pettersson, Emeli, and Albin Carlson. "Att hitta en nål i en höstack: Metoder och tekniker för att sålla och gradera stora mängder ostrukturerad textdata." Thesis, Malmö universitet, Fakulteten för teknik och samhälle (TS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-20105.

Full text
Abstract:
Big Data är i dagsläget ett populärt ämne som kan användas för en mängd olika syften. Bland annat kan det användas för att analysera data på webben i hopp om att identifiera brott mot mänskliga rättigheter. Genom att tillämpa tekniker inom områden som Artificiell Intelligens (AI), Information Retrieval (IR) samt data- visualisering, hoppas företaget Globalworks AB kunna identifiera röster vilka uttrycker sig om förtryck och kränkningar i social media. Artificiell intelligens och informationshämtning är dock breda områden och forskning som behandlar dem kan finnas långt tillbaka i tiden. Vi har därför valt att utföra en systematisk litteraturstudie i syfte att kartlägga existerande forskning inom dessa områden. Med en litterär sammanställning bistår vi med en ontologisk överblick i hur ett system som använder dessa tekniker är strukturerat, med vilka metoder och teknologier ett sådant system kan utvecklas, samt hur dessa kan kombineras.
Big Data is a popular topic these days which can be utilized for numerous purposes. It can, for instance, be used in order to analyse data made available online in hopes of identifying violations against human rights. By applying techniques within such areas as Artificial Intelligence (AI), Information Retrieval (IR), and Visual Analytics, the company Globalworks Ltd. aims to identify single voices in social media expressing grievances concerning such violations. Artificial Intelligence and Information Retrieval are broad topics however, and have been an active area of research for quite some time. We have therefore chosen to conduct a systematic literature review in hopes of mapping together existing research covering these areas. By presenting a literary compilation, we provide an ontological view of how an information system utilizing techniques within these areas could be structured, in addition to how such a system could deploy said techniques.
APA, Harvard, Vancouver, ISO, and other styles
19

De, Luca Gabriele. "PARLEN: uno strumento modulare per l’analisi di articoli e il riconoscimento di entità." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2016. http://amslaurea.unibo.it/10905/.

Full text
Abstract:
La tesi descrive PARLEN, uno strumento che permette l'analisi di articoli, l'estrazione e il riconoscimento delle entità - ad esempio persone, istituzioni, città - e il collegamento delle stesse a risorse online. PARLEN è inoltre in grado di pubblicare i dati estratti in un dataset basato su principi e tecnologie del Semantic Web.
APA, Harvard, Vancouver, ISO, and other styles
20

Andersson, Pontus. "Developing a Python based web scraper : A study on the development of a web scraper for TimeEdit." Thesis, Mittuniversitetet, Institutionen för informationssystem och –teknologi, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-43140.

Full text
Abstract:
I en värld där alltmer information lagras på internet är det svårt för en vanlig användare att hänga med. Även när informationen finns tillgänglig på en och samma hemsida kan den hemsidan sakna funktioner eller vara svår att läsa av. Idén bakom att skrapa hemsidor, tidningar eller spel på information är inte ny och detta examensarbete fokuserar på att bygga en web scraper med tillhörande hemsida där användare kan ladda upp sitt schema skrapat från TimeEdit. Hemsidan ska sedan presentera denna skrapade data på ett visuellt tilltalande sett. När system är färdigutvecklade utvärderas dem för att se om examensarbetets mål har uppnåtts samt om systemen har förbättrat det befintliga sättet att hantera schemaläggning i TimeEdit hos lärare och studenter. I sammanfattningen finns sedan framtida forskning och arbeten presenterat.
The concept of scraping the web is not new, however, with modern programming languages it is possible to build web scrapers that can collect unstructured data and save this in a structured way. TimeEdit, a scheduling platform used by Mid Sweden University, has no feasible way to count how many hours has been scheduled at any given week to a specific course, student, or professor. The goal of this thesis is to build a python-based web scraper that collects data from TimeEdit and saves this in a structured manner. Users can then upload this text file to a dynamic website where it is extracted from the file and saved into a predetermined database and unique to that user. The user can then get this data presented in a fast, efficient, and user-friendly way. This platform is developed and evaluated with the resulting platform being a good and fast way to scan a TimeEdit schedule and evaluate the extracted data. With the platform built future work is recommended to make it a finishes product ready for live use by all types of users.
APA, Harvard, Vancouver, ISO, and other styles
21

Morgan, Justin L. "Clustering Web Users By Mouse Movement to Detect Bots and Botnet Attacks." DigitalCommons@CalPoly, 2021. https://digitalcommons.calpoly.edu/theses/2304.

Full text
Abstract:
The need for website administrators to efficiently and accurately detect the presence of web bots has shown to be a challenging problem. As the sophistication of modern web bots increases, specifically their ability to more closely mimic the behavior of humans, web bot detection schemes are more quickly becoming obsolete by failing to maintain effectiveness. Though machine learning-based detection schemes have been a successful approach to recent implementations, web bots are able to apply similar machine learning tactics to mimic human users, thus bypassing such detection schemes. This work seeks to address the issue of machine learning based bots bypassing machine learning-based detection schemes, by introducing a novel unsupervised learning approach to cluster users based on behavioral biometrics. The idea is that, by differentiating users based on their behavior, for example how they use the mouse or type on the keyboard, information can be provided for website administrators to make more informed decisions on declaring if a user is a human or a bot. This approach is similar to how modern websites require users to login before browsing their website; which in doing so, website administrators can make informed decisions on declaring if a user is a human or a bot. An added benefit of this approach is that it is a human observational proof (HOP); meaning that it will not inconvenience the user (user friction) with human interactive proofs (HIP) such as CAPTCHA, or with login requirements
APA, Harvard, Vancouver, ISO, and other styles
22

Johansson, Richard, and Heino Otto Engström. "Topic propagation over time in internet security conferences : Topic modeling as a tool to investigate trends for future research." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-177748.

Full text
Abstract:
When conducting research, it is valuable to find highly ranked papers closely related to the specific research area without spending too much time reading insignificant ones. An automated way to extract topics from documents would make this process more effective, and this is possible using topic modeling. Topic modeling can also be used to reveal topic trends: where a topic is first mentioned and who the original author was. In this paper, over 5000 articles are scraped from four different top-ranked internet security conferences using a web scraper built in Python. Fourteen topics are extracted from the articles using the topic modeling library Gensim and LDA Mallet, and the topics are visualized in graphs showing which topics are emerging and which are fading away over twenty years. The research finds that topic modeling is a powerful tool for extracting topics and, when put into a time perspective, makes it possible to identify topic trends that can be explained when placed in a bigger context.
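A minimal version of the Gensim workflow mentioned above might look as follows. The thesis pairs Gensim with LDA Mallet; this sketch substitutes Gensim's built-in LdaModel and toy documents, so it illustrates the pipeline rather than reproducing the study:

```python
# Sketch of the topic-modeling step with Gensim's built-in LDA.
# The toy documents stand in for tokenized, cleaned conference abstracts.
from gensim import corpora
from gensim.models import LdaModel

docs = [
    "malware detection using machine learning".split(),
    "tls certificate validation study".split(),
    "phishing email detection classifier".split(),
]

dictionary = corpora.Dictionary(docs)
corpus = [dictionary.doc2bow(doc) for doc in docs]
lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2,
               passes=10, random_state=0)
for topic_id, words in lda.print_topics():
    print(topic_id, words)  # top words per extracted topic
```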
APA, Harvard, Vancouver, ISO, and other styles
23

Mulazzani, Alberto. "Social media sensing: Twitter e Reddit come casi di studio e comparazione applicati ai test prenatali." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2018.

Find full text
Abstract:
For many people, having a child can be the greatest joy of their life, but pregnancy is one of the most delicate moments in a woman's life and as such must be carefully monitored in every aspect. This process is not always free of risks, and what should be a moment of happiness can turn into a difficult one. This study aims to provide a sentiment-oriented graphical view of a set of keywords related to the world of prenatal diagnoses and prenatal tests. Data obtained from two social networking platforms, Reddit and Twitter, over the period from 01/01/2011 to 31/03/2018 are presented in order to answer two fundamental questions: how much the volume of data changed over the analyzed period, at monthly granularity, and how much the sentiment, or opinion, of the data changed over the same period, again at monthly granularity.
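A hypothetical sketch of the monthly volume-and-sentiment aggregation described above, using pandas with NLTK's VADER as a stand-in scorer (the abstract does not name the exact tooling, and the sample posts are invented):

```python
# Hypothetical sketch: volume of posts and mean sentiment per month.
import pandas as pd
from nltk.sentiment import SentimentIntensityAnalyzer  # needs nltk.download("vader_lexicon")

posts = pd.DataFrame({
    "created": pd.to_datetime(["2017-03-02", "2017-03-19", "2017-04-05"]),
    "text": ["great news about the test", "worried about the results", "so relieved today"],
})  # in the study: posts scraped from Reddit and Twitter, 2011-2018

sia = SentimentIntensityAnalyzer()
posts["sentiment"] = posts["text"].apply(lambda t: sia.polarity_scores(t)["compound"])

monthly = posts.set_index("created").resample("MS").agg({"text": "size", "sentiment": "mean"})
monthly = monthly.rename(columns={"text": "volume"})
print(monthly)
```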
APA, Harvard, Vancouver, ISO, and other styles
24

Yu, Andrew Seohwan. "NBA ON-BALL SCREENS: AUTOMATIC IDENTIFICATION AND ANALYSIS OF BASKETBALL PLAYS." Cleveland State University / OhioLINK, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=csu14943636475232.

Full text
APA, Harvard, Vancouver, ISO, and other styles
25

Kefurt, Pavel. "Získávání znalostí z veřejných semistrukturovaných dat na webu." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2016. http://www.nusl.cz/ntk/nusl-255386.

Full text
Abstract:
The first part of the thesis deals with methods and tools that can be used to retrieve data from websites, and with tools used for data mining. The second part is devoted to a practical demonstration of the entire process. The website of the Czech Dance Sport Federation, available at www.csts.cz, is used as the data source.
APA, Harvard, Vancouver, ISO, and other styles
26

Kolečkář, David. "Systém pro integraci webových datových zdrojů." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2020. http://www.nusl.cz/ntk/nusl-417239.

Full text
Abstract:
The thesis aims at designing and implementing a web application for integrating web data sources. For data integration, a method using a domain model of the target information system was applied. The work describes the individual methods used for extracting information from web pages, as well as the process of designing the system architecture, including the chosen technologies and tools. The main part of the work is implementing and testing the final web application, written in Java and the Angular framework. The outcome is a web application that allows its users to define web data sources and save the data in a target database.
APA, Harvard, Vancouver, ISO, and other styles
27

Bonde-Hansen, Martin. "The Dynamics of Rent Gap Formation in Copenhagen : An empirical look into international investments in the rental market." Thesis, Malmö universitet, Malmö högskola, Institutionen för Urbana Studier (US), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-41157.

Full text
APA, Harvard, Vancouver, ISO, and other styles
28

Dorka, Moritz. "On the domain-specific formalization of requirement specifications - a case study of ETCS." Master's thesis, Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2015. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-182866.

Full text
Abstract:
This paper presents a piece of software that automatically extracts requirements captured in Microsoft Word files while using domain knowledge. In a subsequent step, these requirements are enhanced for implementation purposes and ultimately saved to ReqIF, an XML-based file format for the exchange of specification documents. ReqIF can be processed by a wide range of industry-standard requirements management tools. By way of this enhancement, a formalization of both the document structure and selected elements of its natural-language contents is achieved. In its current version, the software was specifically developed for processing the Subset-026, a conceptually demanding specification document covering the core functionality of the pan-European train protection system ETCS. Despite this initial focus, the two-part design of the thesis facilitates a generic applicability of its findings: Section 2 presents the fundamental challenges of weakly structured specification documents and devotes a large part to the computation of unique but human-readable requirement identifiers. Section 3 delves into the more domain-specific features, the text-processing capabilities, and the actual implementation of this novel software. Due to the open-source nature of the application, adaptation to other use cases can be achieved with comparatively little effort.
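As a loose illustration of the extraction idea, here is a Python sketch using python-docx; the numbering convention, file name, and ID scheme are hypothetical, whereas the actual tool handles Subset-026-specific structure and exports to ReqIF:

```python
# Rough sketch: treat numbered Word paragraphs like "3.5.1.2 The train shall ..."
# as requirements, reusing the section number as a human-readable identifier.
from docx import Document  # pip install python-docx

def extract_requirements(path):
    """Yield (identifier, text) pairs for numbered paragraphs."""
    for para in Document(path).paragraphs:
        parts = para.text.strip().split(" ", 1)
        if len(parts) == 2 and all(p.isdigit() for p in parts[0].split(".")):
            yield parts[0], parts[1]

for req_id, req_text in extract_requirements("specification.docx"):  # placeholder file
    print(req_id, "->", req_text[:60])
```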
APA, Harvard, Vancouver, ISO, and other styles
29

Jílek, Radim. "Služba pro ověření spolehlivosti a pečlivosti českých advokátů." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2017. http://www.nusl.cz/ntk/nusl-363772.

Full text
Abstract:
This thesis deals with the design and implementation of an Internet service that makes it possible to objectively assess and verify the reliability and diligence of Czech lawyers based on publicly available data from several courts. The aim of the thesis is to create and put this service into operation. The results of the work are the programs that carry out the individual steps needed to realize this goal.
APA, Harvard, Vancouver, ISO, and other styles
30

Vivès, Rémi. "Three essays on the role of expectations in business cycles." Thesis, Aix-Marseille, 2019. http://www.theses.fr/2019AIXM0453.

Full text
Abstract:
In this thesis, I investigate the role of expectations in business cycles by studying three different kinds of expectations. First, I focus on a theoretical explanation of business cycles generated by changes in expectations which turn out to be self-fulfilling. This chapter makes progress on a puzzle from the sunspot literature, thereby giving more evidence for an interpretation of business cycles based on self-fulfilling prophecies. Second, I empirically analyze the propagation mechanisms of central bank announcements through changes in market participants' beliefs. This chapter shows that credible announcements about future unconventional monetary policies can be used as a coordination device in a sovereign debt crisis framework. Third, I study a broader concept of expectations and investigate the predictive power of political climate on the pricing of sovereign risk. This chapter shows that political climate provides additional predictive power beyond the traditional determinants of sovereign bond spreads. In order to interrogate the role of expectations in business cycles from multiple angles, I use a variety of methodologies in this thesis, including theoretical and empirical analyses, web scraping, machine learning, and textual analysis. In addition, this thesis uses innovative data from the social media platform Twitter. Regardless of my methodology, all my results convey the same message: expectations matter, both for economic research and for economically sound policy-making.
APA, Harvard, Vancouver, ISO, and other styles
31

Tadisetty, Srikanth. "Prediction of Psychosis Using Big Web Data in the United States." Kent State University / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=kent1532962079970169.

Full text
APA, Harvard, Vancouver, ISO, and other styles
32

Santos, João Manuel Azevedo. "Real Estate Market Data Scraping and Analysis for Financial Investments." Dissertação, 2018. https://repositorio-aberto.up.pt/handle/10216/116510.

Full text
APA, Harvard, Vancouver, ISO, and other styles
33

Santos, João Manuel Azevedo. "Real Estate Market Data Scraping and Analysis for Financial Investments." Master's thesis, 2018. https://repositorio-aberto.up.pt/handle/10216/116510.

Full text
APA, Harvard, Vancouver, ISO, and other styles
34

Hsu, Ning, and 徐寧. "Intellectual Property Law and Competition Law Regimes on Data Collection in the Era of Big Data: Focusing on Web Scraping." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/qtavez.

Full text
Abstract:
Master's thesis
National Chengchi University
Institute of Technology Management and Intellectual Property
Academic year 106 (2017)
In the era of Big Data, an unprecedented scale of digital data is being generated, which leads to an explosion of "publicly available" content on websites. In order to obtain those data from the Web, an automatic and efficient data extraction technology, commonly referred to as "web scraping," has been created. It has become one of the indispensable technologies for gaining access to data sources outside of a firm. Web scraping, however, often involves unauthorized use of scraped data for commercial purposes. Data scrapers thus face potential legal liability for copyright infringement or may be found in contravention of unfair competition law. As the lawfulness of web scraping is highly fact-sensitive, legal uncertainty might hinder innovative data-driven business models. This paper examines the commercial use of web scraping technologies that retrieve data from public websites. It examines copyright infringement claims in cases such as Kelly v. Arriba, Field v. Google, and AP v. Meltwater. It then reviews the leading cases in the United States, China, and Taiwan involving famous digital companies such as Google, Yelp, and Baidu. Lastly, the paper explores and provides recommendations on how to govern web scraping to better balance the free flow of information and the interests of different market participants.
APA, Harvard, Vancouver, ISO, and other styles
35

Fabrício, Gustavo de Souza Machado. "Does sacking a coach really help? Evidence from a Difference-in-Differences approach." Master's thesis, 2022. http://hdl.handle.net/10362/136015.

Full text
Abstract:
Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science
This project evaluates whether football clubs should change their coach in order to improve their performance in the national league. For this analysis I selected three of the most important European football leagues: La Liga (Spain), Serie A (Italy), and the Premier League (England). The data used in this project was taken from the transfermarkt website, a large football platform. The data covers seasons 2005-06 through 2019-20 and contains individual game results and squad value by player. The steps before the analysis were cleaning and consolidating the data, creating new features as performance measures, and selecting cases of interest based on club and coach profile. Numeric variables were standardized to put different seasons on the same scale and make them comparable. K-means clustering was applied to group clubs according to their investment, which is proportionally correlated with performance. Finally, a difference-in-differences analysis was applied to evaluate whether a club obtains a performance gain if it sacks its coach between games twelve and twenty-six of the season, after underperforming relative to its squad value. As a general conclusion, clubs in both the treatment and comparison groups on average recover their performance after a period of underperforming, but the recovery of the clubs that sack their coach is lower than that of the clubs that keep them.
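A minimal difference-in-differences sketch on synthetic data, illustrating the estimator described above; the column names ('treated', 'post', 'perf') and the numbers are invented for illustration:

```python
# Difference-in-differences on synthetic data:
# 'treated' = club sacked its coach in the window, 'post' = games after it,
# 'perf' = standardized performance. The DiD effect is the interaction term.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({"treated": rng.integers(0, 2, n), "post": rng.integers(0, 2, n)})
df["perf"] = (0.2 * df["treated"] + 0.5 * df["post"]
              + 0.1 * df["treated"] * df["post"] + rng.normal(0, 1, n))

model = smf.ols("perf ~ treated * post", data=df).fit()
print(model.params["treated:post"])  # estimated effect of sacking the coach
```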
APA, Harvard, Vancouver, ISO, and other styles
36

Cunha, Paulo Ricardo Gonçalves da. "Strategies for extracting web data: practical case." Master's thesis, 2018. http://hdl.handle.net/1822/59299.

Full text
Abstract:
Integrated master's dissertation in Engineering and Management of Information Systems
Nowadays, the task of collecting data from Web sources is becoming increasingly complex. This complexity arises in part from the large (and still growing) volume of data, as well as from the proliferation of platforms that make it available. Based on this premise, the main objective of this dissertation project was the identification of strategies for extracting data from Web sources. To reach this goal, the following tasks were defined: identifying tools and frameworks that aid in the data extraction process, testing the identified tools and frameworks, developing a framework that illustrates possible extraction strategies, and finally applying the proposed framework in a practical case. The proposed framework consists of a methodology with possible strategies for extracting data from Web sources. The practical case was carried out on the ALGORITMI Research Centre of the University of Minho. First, data on the authors affiliated with the ALGORITMI Research Centre are collected. Other data, such as their publications, are then collected from other sources and stored in a relational database. The collection steps and decisions taken during the case study are based on the application of the proposed framework. Inserting the data obtained from different sources into a single location creates a single entry point for reading data, that is, a single data source. This single data source allows users to access all the data they need without spending time trying to locate it. The present work is organized in five chapters: introduction (a brief description of the problem and objectives), literature review (concepts, methodologies, and strategies for obtaining data from Web sources), framework proposal, application of the proposed framework in a practical case focusing on the ALGORITMI Research Centre, and conclusion (with some final considerations and proposals for future work).
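As a tiny illustration of the "single entry point" idea, a sketch in which records from several sources land in one relational table; the schema, table name, and sample records are hypothetical, and the dissertation's actual model covers authors and publications in more detail:

```python
# Minimal sketch: consolidate records from several sources into one
# SQLite table that acts as the single entry point for reads.
import sqlite3

records = [
    {"author": "A. Silva", "title": "Paper X", "source": "scopus"},
    {"author": "A. Silva", "title": "Paper Y", "source": "dblp"},
]

conn = sqlite3.connect("single_entry_point.db")
conn.execute("""CREATE TABLE IF NOT EXISTS publications
                (author TEXT, title TEXT, source TEXT,
                 UNIQUE(author, title))""")
conn.executemany(
    "INSERT OR IGNORE INTO publications VALUES (:author, :title, :source)",
    records,  # the UNIQUE constraint deduplicates across sources
)
conn.commit()
print(conn.execute("SELECT * FROM publications").fetchall())
conn.close()
```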
APA, Harvard, Vancouver, ISO, and other styles
37

Freire, Filipe Manuel Leitão Gonçalves. "Recolha de contratos de despesa pública e segmentação dos perfis de despesa a nível municipal." Master's thesis, 2020. http://hdl.handle.net/10362/97480.

Full text
Abstract:
Dissertation presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence
Due to the need to analyze how public capital is invested by Portuguese municipalities in the various types of contracts for the acquisition of goods and services, it is essential to create tools that support the understanding of these investments. In particular, it is desirable to understand how these investments vary with the size of the population. The objective of this project is to collect contract data available on the web and to create a segmentation of the various types of public expenditure that allows the detection of anomalous deviations in the relationship between municipal public expenditure and population size. For this purpose, a web crawler was developed in the Python programming language to automatically extract public contracts from the site http://www.base.gov.pt/. Analysis of the collected data revealed a log-log relationship between population and public expenditure. A segmentation analysis based on the residuals of this relationship was then performed using data mining techniques. Several clustering algorithms were used, in particular K-Medoids, which produced two distinct groups of expense types.
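A sketch of the log-log fit and residual-based K-Medoids segmentation described above, run on synthetic municipality figures; the real analysis works on the scraped base.gov.pt contracts, and the constants below are invented:

```python
# Log-log fit of expenditure vs. population, then K-Medoids on the residuals.
import numpy as np
from sklearn_extra.cluster import KMedoids  # pip install scikit-learn-extra

rng = np.random.default_rng(1)
population = rng.uniform(2e3, 5e5, 300)
expenditure = 50 * population ** 0.9 * np.exp(rng.normal(0, 0.4, 300))

x, y = np.log(population), np.log(expenditure)
slope, intercept = np.polyfit(x, y, 1)      # linear fit in log-log space
residuals = y - (slope * x + intercept)     # anomalous deviations show up here

labels = KMedoids(n_clusters=2, random_state=0).fit_predict(residuals.reshape(-1, 1))
print(slope, np.bincount(labels))           # fit slope and cluster sizes
```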
APA, Harvard, Vancouver, ISO, and other styles
38

Fiorani, Matteo. "Mixed-input second-hand car price estimation model based on scraped data." Master's thesis, 2022. http://hdl.handle.net/10362/134276.

Full text
Abstract:
Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science
The number of second-hand cars is growing year by year. More and more people prefer to buy a second-hand car rather than a new one due to the increasing cost of new cars and their fast depreciation. Consequently, there has also been an increase in online marketplaces for peer-to-peer (P2P) second-hand car trades. A robust price estimate is needed both for dealers, to have a good idea of how to price their cars, and for buyers, to understand whether a listing is overpriced. Price estimation for second-hand cars has, to my knowledge, so far only been explored with numerical and categorical features such as mileage, brand, or production year. An approach that also uses image data has yet to be developed. This work investigates a multi-input price estimation model for second-hand cars that combines a convolutional neural network (CNN), which extracts features from car images, with an artificial neural network (ANN), which handles the categorical and numerical features, and assesses whether this method improves price estimation accuracy over more traditional single-input methods. To train and evaluate the model, a dataset of second-hand car images and textual features is scraped from a marketplace and curated so that more than 700 images can be used for training.
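A multi-input model of the kind described might be sketched in Keras as follows; the layer sizes, image resolution, and feature count are illustrative assumptions, not the dissertation's architecture:

```python
# Minimal multi-input sketch: a small CNN branch for the car image and a
# dense branch for tabular features, merged for price regression.
from tensorflow import keras
from tensorflow.keras import layers

image_in = keras.Input(shape=(128, 128, 3), name="image")
x = layers.Conv2D(16, 3, activation="relu")(image_in)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(32, 3, activation="relu")(x)
x = layers.GlobalAveragePooling2D()(x)

tabular_in = keras.Input(shape=(10,), name="tabular")  # e.g. mileage, year, brand one-hot
t = layers.Dense(32, activation="relu")(tabular_in)

merged = layers.concatenate([x, t])
price = layers.Dense(1, name="price")(merged)  # single regression output

model = keras.Model(inputs=[image_in, tabular_in], outputs=price)
model.compile(optimizer="adam", loss="mae")
model.summary()
```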
APA, Harvard, Vancouver, ISO, and other styles
39

(8086355), Ryan Merrill Dailey. "Automated Discovery of Real-Time Network Camera Data from Heterogeneous Web Pages." Thesis, 2021.

Find full text
Abstract:
Reduction in the cost of Network Cameras, along with a rise in connectivity, enables entities all around the world to deploy vast arrays of camera networks. Network cameras offer real-time visual data that can be used for studying traffic patterns, emergency response, security, and other applications. Although many sources of Network Camera data are available, collecting the data remains difficult due to variations in programming interfaces and website structures. Previous solutions rely on manually parsing the target website, taking many hours to complete. We create a general and automated solution for indexing Network Camera data spread across thousands of uniquely structured web pages. We analyze heterogeneous web page structures and identify common characteristics among 73 sample Network Camera websites (each website has multiple web pages). These characteristics are then used to build an automated camera discovery module that crawls and indexes Network Camera data. Our system successfully extracts 57,364 Network Cameras from 237,257 unique web pages.
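As a toy illustration of such a discovery heuristic, a Python sketch that collects links resembling camera endpoints from a single page; the hint strings and the URL are placeholders, whereas the actual module is built from an analysis of 73 camera websites:

```python
# Toy discovery heuristic: crawl one page and collect links that look like
# still-image or MJPEG camera endpoints.
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

CAMERA_HINTS = (".jpg", ".jpeg", "mjpg", "mjpeg", "axis-cgi", "snapshot")

def find_camera_urls(page_url):
    soup = BeautifulSoup(requests.get(page_url, timeout=10).text, "html.parser")
    candidates = [tag.get("src") or tag.get("href") for tag in soup.find_all(["img", "a"])]
    return [urljoin(page_url, u) for u in candidates
            if u and any(h in u.lower() for h in CAMERA_HINTS)]

print(find_camera_urls("https://example.org/traffic-cameras"))  # placeholder URL
```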
APA, Harvard, Vancouver, ISO, and other styles
40

Fejfar, Petr. "Interaktivní procházení webu a extrakce dat." Master's thesis, 2018. http://www.nusl.cz/ntk/nusl-389671.

Full text
Abstract:
Title: Interactive crawling and data extraction
Author: Bc. Petr Fejfar
Author's e-mail address: pfejfar@gmail.com
Department: Department of Distributed and Dependable Systems
Supervisor: Mgr. Pavel Ježek, Ph.D., Department of Distributed and Dependable Systems
Abstract: The subject of this thesis is Web crawling and data extraction from Rich Internet Applications (RIAs). The thesis starts with an analysis of modern Web pages along with techniques used for crawling and data extraction. Based on this analysis, we designed a tool which crawls RIAs according to instructions defined by the user via a graphical interface. In contrast with other currently popular tools for RIAs, our solution is targeted at users with no programming experience, including business and analyst users. The designed solution is itself implemented as an RIA, using the WebDriver protocol to automate multiple browsers according to user-defined instructions. Our tool allows the user to inspect browser sessions by displaying pages as they are being crawled, which enables the user to troubleshoot the crawlers. The outcome of this thesis is a fully designed and implemented tool enabling business users to extract data from RIAs. This opens new opportunities for this type of user to collect data from Web pages for use...
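The tool automates browsers over the WebDriver protocol; a bare-bones Python equivalent of one user-defined instruction ("open page, click 'load more', collect titles") might look like the sketch below, with the URL and CSS selectors as placeholders:

```python
# Minimal WebDriver-style automation with Selenium (requires a local
# WebDriver-capable browser; Selenium 4.6+ manages the driver itself).
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
try:
    driver.get("https://example.org/listing")                       # placeholder URL
    driver.find_element(By.CSS_SELECTOR, "button.load-more").click()  # placeholder selector
    titles = [el.text for el in driver.find_elements(By.CSS_SELECTOR, "h2.title")]
    print(titles)
finally:
    driver.quit()
```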
APA, Harvard, Vancouver, ISO, and other styles
41

Botelho, Miguel Tavares. "Unfolding the influencing factors and dynamics of overall hotel scores." Master's thesis, 2019. http://hdl.handle.net/10071/19456.

Full text
Abstract:
The hospitality and tourism industry has been boosted by hotel review sites, which in turn face increasing demands from tourists. We extracted more than thirty thousand reviews from Tripadvisor to understand the variations in customers' perceptions of high-end versus low-end and chain versus independent hotels, and to determine in which aspects this variation is most evident. We used sentiment analysis to assign a score to the aspects of each review. We compared machine learning algorithms, namely random forest, decision tree, and decision tree with AdaBoost, to predict the overall score. We then used the Gini index to understand which aspects most influence the overall score. Finally, we compared reviews across temporal windows over time with the Jaccard index to characterize the dynamics of customer satisfaction, focusing on three aspects: "Service", "Location", and "Sleep". By correlating the hotels' responses with the users' reviews, we sought to demonstrate the impact on customers' perception of hotel quality. The best performances were achieved by the decision trees, which indicated that "Service" is the most influential aspect for satisfaction, while "Location" and "Sleep" were considered less important. By identifying the moments of drastic change, we verified that "Service" is also the aspect most related to the overall score. These analyses allow hotel management to track trends in tourists' assessments in each category. Generally speaking, the focus should be on "Service"; however, for a particular hotel, an analysis of the dynamics of its overall score compared with its category would be advantageous.
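A toy sketch of the score-prediction and Gini-importance step described above; the data is synthetic (the thesis works on per-aspect sentiment scores from real reviews), and the coefficients are chosen so that "Service" dominates by construction:

```python
# Fit a decision tree on per-aspect sentiment scores and read the
# Gini-based feature importances.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(2)
aspects = ["Service", "Location", "Sleep"]
X = rng.uniform(-1, 1, (500, 3))                  # per-aspect sentiment scores
raw = 3.5 + 1.2 * X[:, 0] + 0.3 * X[:, 1] + 0.2 * X[:, 2]
y = np.clip(np.round(raw), 1, 5).astype(int)      # 1-5 overall scores

tree = DecisionTreeClassifier(criterion="gini", max_depth=5, random_state=0).fit(X, y)
for name, importance in zip(aspects, tree.feature_importances_):
    print(name, round(float(importance), 3))      # "Service" ranks highest here
```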
APA, Harvard, Vancouver, ISO, and other styles