Academic literature on the topic 'Data Scraping'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Data Scraping.'

Next to every source in the list of references there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "Data Scraping"

1

Khder, Moaiad. "Web Scraping or Web Crawling: State of Art, Techniques, Approaches and Application." International Journal of Advances in Soft Computing and its Applications 13, no. 3 (November 28, 2021): 145–68. http://dx.doi.org/10.15849/ijasca.211128.11.

Abstract:
Web scraping or web crawling refers to the procedure of automatically extracting data from websites using software. It is a process that is particularly important in fields such as Business Intelligence in the modern age. Web scraping is a technology that allows us to extract structured data from text such as HTML. It is extremely useful in situations where data is not provided in a machine-readable format such as JSON or XML. Using web scraping to gather data allows us to collect near-real-time prices from retail store sites along with further details; web scraping can also be used to gather intelligence on illicit businesses such as darknet drug marketplaces, providing law enforcement and researchers with valuable data, such as drug prices and varieties, that would be unavailable through conventional methods. It has been found that using a web scraping program yields data that is far more thorough, accurate, and consistent than manual entry. Based on these results, it is concluded that web scraping is a highly useful tool in the information age and an essential one in many modern fields. Several technologies are required to implement web scraping properly, such as spidering and pattern matching, which are discussed. This paper examines what web scraping is, how it works, its stages and technologies, how it relates to Business Intelligence, artificial intelligence, data science, big data, and cyber security, how it can be done with the Python language, some of the main benefits of web scraping, and what the future of web scraping may look like, with a special degree of emphasis placed on the ethical and legal issues. Keywords: Web Scraping, Web Crawling, Python Language, Business Intelligence, Data Science, Artificial Intelligence, Big Data, Cloud Computing, Cybersecurity, legal, ethical.
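The stages the abstract names, fetching a page and then pattern-matching structured data out of its HTML, can be illustrated with a minimal sketch using only Python's standard library. The page fragment and class names below are hypothetical; a real scraper would first fetch the HTML over HTTP.

```python
from html.parser import HTMLParser

# A hypothetical retail-page fragment standing in for a fetched HTML document.
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="name">Coffee</span><span class="price">4.50</span></li>
  <li class="product"><span class="name">Tea</span><span class="price">3.20</span></li>
</ul>
"""

class PriceScraper(HTMLParser):
    """Pattern-matching stage: collect the text inside .name and .price spans."""
    def __init__(self):
        super().__init__()
        self.field = None   # which labeled field the parser is currently inside
        self.rows = []      # extracted records

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "span" and cls in ("name", "price"):
            self.field = cls

    def handle_data(self, data):
        if self.field == "name":
            self.rows.append({"name": data, "price": None})
        elif self.field == "price":
            self.rows[-1]["price"] = float(data)
        self.field = None

scraper = PriceScraper()
scraper.feed(SAMPLE_HTML)
print(scraper.rows)  # [{'name': 'Coffee', 'price': 4.5}, {'name': 'Tea', 'price': 3.2}]
```

The output is already structured data, ready to be stored or compared across scraping runs, which is what makes near-real-time price monitoring feasible.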
2

Padghan, Sameer, Satish Chigle, and Rahul Handoo. "Web Scraping-Data Extraction Using Java Application and Visual Basics Macros." Journal of Advances and Scholarly Researches in Allied Education 15, no. 2 (April 1, 2018): 691–95. http://dx.doi.org/10.29070/15/56996.

3

Scassa, Teresa. "Ownership and control over publicly accessible platform data." Online Information Review 43, no. 6 (October 14, 2019): 986–1002. http://dx.doi.org/10.1108/oir-02-2018-0053.

Abstract:
Purpose: The purpose of this paper is to examine how claims to “ownership” are asserted over publicly accessible platform data and critically assess the nature and scope of rights to reuse these data. Design/methodology/approach: Using Airbnb as a case study, this paper examines the data ecosystem that arises around publicly accessible platform data. It analyzes current statute and case law in order to understand the state of the law around the scraping of such data. Findings: This paper demonstrates that there is considerable uncertainty about the practice of data scraping, and that there are risks in allowing the law to evolve in the context of battles between business competitors without a consideration of the broader public interest in data scraping. It argues for a data ecosystem approach that can keep the public dimension issues more squarely within the frame when data scraping is judicially considered. Practical implications: The nature of some sharing economy platforms requires that a large subset of their data be publicly accessible. These data can be used to understand how platform companies operate, to assess their compliance with laws and regulations, and to evaluate their social and economic impacts. They can also be used in different kinds of data analytics. Such data are therefore sought after by civil society organizations, researchers, entrepreneurs, and regulators. This paper considers who has a right to control access to and use of these data, addresses current uncertainties in how the law will apply to scraping activities, and builds an argument for a consideration of the public interest in data scraping. Originality/value: The issue of ownership/control over publicly accessible information is of growing importance; this paper offers a framework for approaching these legal questions.
4

Maślankowski, Jacek. "The collection and analysis of the data on job advertisements with the use of big data." Wiadomości Statystyczne. The Polish Statistician 64, no. 9 (September 30, 2019): 60–74. http://dx.doi.org/10.5604/01.3001.0013.7590.

Abstract:
The goal of this paper is to present, on the one hand, the benefits for official statistics (labour market) resulting from the use of web scraping methods to gather data on job advertisements from websites belonging to big data compilations, and on the other, the challenges connected to this process. The paper introduces the results of experimental research in which web scraping and text mining methods were adopted. The analysis was based on data from 2017–2018 obtained from the most popular job-searching websites, which was then collated with Statistics Poland's data obtained from Z-05 forms. The above-mentioned analysis demonstrated that web scraping methods can be adopted by public statistics services to obtain statistical data from alternative sources complementing the already-existing databases, provided the findings of such research remain coherent with the results of the already-existing studies.
5

Wang, Yuguang, Dengyun Zhu, Bin Zhang, Qi Guo, Fucheng Wan, and Ning Ma. "Review of data scraping and data mining research." Journal of Physics: Conference Series 1982, no. 1 (July 1, 2021): 012161. http://dx.doi.org/10.1088/1742-6596/1982/1/012161.

6

Maulana, Afrizal Aziz, Ajib Susanto, and Desi Purwanti Kusumaningrum. "Rancang Bangun Web Scraping Pada Marketplace di Indonesia." JOINS (Journal of Information System) 4, no. 1 (July 1, 2019): 41–53. http://dx.doi.org/10.33633/joins.v4i1.2544.

Abstract:
E-commerce and marketplaces are closely tied to the dropship system. Dropshipping is a form of trade in which the drop shipper (retailer) does not hold the goods. Drop shippers still use manual methods to obtain product data from suppliers and upload it, taking product data one item at a time and uploading it manually one by one, which takes considerable time. In this study, a new application was built to help drop shippers obtain product data and upload it automatically. The development method used was the waterfall model, with a process flow of requirements analysis, system design, system implementation, testing, and system maintenance. The research produced an application that can scrape product data from a supplier's store and save the results as a .csv file. The upload is then performed automatically by simply entering the name of the .csv file containing the scraped data, after which the data is automatically uploaded to the drop shipper's store. Testing showed that web scraping was successfully performed by retrieving product data from the Tokopedia and Shopee marketplaces and uploading it to the Afrizal22hop e-commerce store. Keywords: Marketplace, E-commerce, Dropship, Drop shipper, Web Scraping
7

Speckmann, Felix. "Web Scraping." Zeitschrift für Psychologie 229, no. 4 (December 2021): 241–44. http://dx.doi.org/10.1027/2151-2604/a000470.

Abstract:
When people use the Internet, they leave traces of their activities: blog posts, comments, articles, social media posts, etc. These traces represent behavior that psychologists can analyze. A method that makes downloading those sometimes very large datasets feasible is web scraping, which involves writing a program to automatically download specific parts of a website. The obtained data can be used to exploratorily generate new hypotheses, test existing ones, or extend existing research. The present Research Spotlight explains web scraping and discusses the possibilities, limitations, and ethical and legal challenges associated with the approach.
8

Krotov, Vlad, and Matthew Tennyson. "Research Note: Scraping Financial Data from the Web Using the R Language." Journal of Emerging Technologies in Accounting 15, no. 1 (February 1, 2018): 169–81. http://dx.doi.org/10.2308/jeta-52063.

Abstract:
The main goal of this research note is to educate business researchers on how to automatically scrape financial data from the World Wide Web using the R programming language. This paper is organized into the following main parts. The first part provides a conceptual overview of the web scraping process. The second part educates the reader about the Rvest package—a popular tool for browsing and downloading web data in R. The third part educates the reader about the main functions of the XBRL package. The XBRL package was developed specifically for working with financial data distributed using the XBRL format in the R environment. The fourth part of this paper presents an example of a relatively complex web scraping task implemented using the R language. This complex web scraping task involves using both the Rvest and XBRL packages for the purposes of retrieving, preprocessing, and organizing financial and nonfinancial data related to a company from various sources and using different data forms. The paper ends with some concluding remarks on how the web scraping approach presented in this paper can be useful in other research projects involving financial and nonfinancial data.
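The note's Rvest/XBRL workflow is R-specific, but the core of the XBRL step, walking an XML instance document and collecting each reported fact into a table, can be sketched language-neutrally with Python's standard library. The instance document, element names, and values below are hypothetical, not the R XBRL package's actual output.

```python
import xml.etree.ElementTree as ET

# A tiny, hypothetical XBRL-like instance document; real filings carry many
# more namespaces, contexts, and facts.
XBRL_SAMPLE = """<xbrl xmlns:us-gaap="http://fasb.org/us-gaap/2023">
  <us-gaap:Revenues contextRef="FY2023" unitRef="USD">500000</us-gaap:Revenues>
  <us-gaap:NetIncomeLoss contextRef="FY2023" unitRef="USD">75000</us-gaap:NetIncomeLoss>
</xbrl>"""

root = ET.fromstring(XBRL_SAMPLE)

# Collect each reported fact as concept -> numeric value, roughly mirroring
# the fact table that XBRL-processing tools assemble from a filing. The tag
# is "{namespace}LocalName", so we split off the namespace part.
facts = {elem.tag.split("}")[1]: float(elem.text) for elem in root}
print(facts)  # {'Revenues': 500000.0, 'NetIncomeLoss': 75000.0}
```

In practice the `contextRef` and `unitRef` attributes would also be kept, since the same concept is reported for multiple periods and units.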
9

Rao, M. Kameswara, Rohit Lagisetty, M. S. V. K. Maniraj, K. N. S. Dattu, and B. Sneha Ganga. "Commodity Price Data Analysis Using Web Scraping." International Journal of Advances in Applied Sciences 4, no. 4 (December 1, 2015): 146. http://dx.doi.org/10.11591/ijaas.v4.i4.pp146-150.

Abstract:
Today, analysis of data available on the web has become increasingly popular; using such data, we can address many issues. Our project deals with the analysis of commodity price data available on the web. In general, commodity price data analysis is performed to determine the inflation rate prevailing in a country and to compute the consumer price index (CPI). At present, in some countries this analysis is done manually by collecting data from different cities and then calculating inflation and the CPI using predefined formulae. We developed this project to make the entire process automatic. Nowadays, most customers depend on online websites for their day-to-day purchases, which is why we implemented a system to collect the data available on various e-commerce sites for commodity price analysis. We introduce a data scraping technique that enables us to collect data on various products available online, store it in a database, and then perform analysis on it. This process reduces the burden of collecting data manually by visiting various cities. The system includes a web module that performs analysis and visualization of the data in the database.
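The "predefined formulae" the abstract mentions can be illustrated with a Laspeyres-style fixed-basket price index computed over scraped prices. The basket, quantities, and prices below are invented for illustration; the paper's exact formulae are not given in the abstract.

```python
# Illustrative CPI computation over scraped commodity prices (hypothetical data).
base_prices = {"rice": 40.0, "oil": 110.0, "sugar": 45.0}     # base period prices
current_prices = {"rice": 44.0, "oil": 121.0, "sugar": 45.0}  # prices scraped today
quantities = {"rice": 10, "oil": 2, "sugar": 3}               # fixed consumption basket

def laspeyres_index(base, current, qty):
    # CPI = 100 * (cost of the fixed basket at current prices)
    #           / (cost of the same basket at base prices)
    cost_now = sum(current[g] * qty[g] for g in qty)
    cost_base = sum(base[g] * qty[g] for g in qty)
    return 100 * cost_now / cost_base

cpi = laspeyres_index(base_prices, current_prices, quantities)
inflation_rate = cpi - 100  # percent change relative to the base period
print(round(cpi, 2), round(inflation_rate, 2))  # 108.21 8.21
```

Automating the scraping step simply keeps `current_prices` fresh; the index arithmetic itself is unchanged from the manual procedure.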
10

Gallagher, John R., and Aaron Beveridge. "Project-Oriented Web Scraping in Technical Communication Research." Journal of Business and Technical Communication 36, no. 2 (December 13, 2021): 231–50. http://dx.doi.org/10.1177/10506519211064619.

Abstract:
This article advocates for web scraping as an effective method to augment and enhance technical and professional communication (TPC) research practices. Web scraping is used to create consistently structured and well-sampled data sets about domains, communities, demographics, and topics of interest to TPC scholars. After providing an extended description of web scraping, the authors identify technical considerations of the method and provide practitioner narratives. They then describe an overview of project-oriented web scraping. Finally, they discuss implications for the concept as a sustainable approach to developing web scraping methods for TPC research.

Dissertations / Theses on the topic "Data Scraping"

1

Carle, Victor. "Web Scraping using Machine Learning." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-281344.

Abstract:
This thesis explores the possibilities of creating a robust web scraping algorithm, designed to continuously scrape a specific website even though the HTML code is altered. The algorithm is intended for websites that have a repetitive HTML structure containing data that can be scraped. A repetitive HTML structure often displays news articles, videos, books, etc., which means the same HTML code is repeated many times, since the only thing differing between the displayed items is, for example, their titles. A good example would be YouTube. The scraper works by applying text classification to words in the HTML code, training a Support Vector Machine to recognize the words or variable names. Classification of the words surrounding the sought-after data is done under the assumption that the future HTML of a website will be similar to the current HTML, which in turn allows robust scraping to be performed. To evaluate its performance, a web archive is used in which the algorithm is back-tested on past versions of the site, to get an idea of what future performance might look like. The algorithm achieves varying results depending on a large variety of variables within the websites themselves as well as the past versions of the websites. The best performance was achieved on Yahoo News, with an accuracy of 90% dating back three months from the time the scraper stopped working.
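The thesis's idea, classifying the HTML tokens that surround the sought-after data so the scraper survives markup changes, can be sketched with a toy stand-in for the SVM: a bag-of-words nearest-centroid classifier in pure Python. The training fragments and class names are hypothetical; a real implementation would train an actual SVM on fragments harvested from archived versions of the site.

```python
from collections import Counter

# Hypothetical labeled HTML fragments: "target" wraps the sought-after data,
# "other" is surrounding page furniture.
TRAIN = [
    ('<div class="video-title"><a href="/watch">', "target"),
    ('<span class="video-title-text">', "target"),
    ('<div class="sidebar-ad banner">', "other"),
    ('<footer class="site-footer">', "other"),
]

def tokens(html):
    # Split markup into word-like tokens (tag names, class names, attributes).
    return Counter("".join(c if c.isalnum() else " " for c in html).split())

# Build one bag-of-words "centroid" per class from the training fragments.
centroids = {}
for html, label in TRAIN:
    centroids.setdefault(label, Counter()).update(tokens(html))

def classify(html):
    vec = tokens(html)
    # Score by token overlap with each class centroid (a crude inner product).
    return max(centroids, key=lambda label: sum(vec[t] * centroids[label][t] for t in vec))

# A slightly altered future version of the markup is still recognized,
# because most surrounding tokens are unchanged.
print(classify('<div class="video-title new-layout">'))  # target
```

This is the robustness argument in miniature: as long as future HTML shares most tokens with the HTML the model was trained on, the classifier keeps locating the data-bearing elements.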
2

Färholt, Fredric. "Less Detectable Web Scraping Techniques." Thesis, Linnéuniversitetet, Institutionen för datavetenskap och medieteknik (DM), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-104887.

Abstract:
Web scraping is an efficient way of gathering data, and it has also become much easier to perform and offers a high success rate. People no longer need to be tech-savvy when scraping data, since several easy-to-use platform services exist. This study conducts experiments to see whether people can scrape in an undetectable fashion using a popular and intelligent JavaScript library (Puppeteer). Three web scraper algorithms, two of which use movement patterns from real-world web users, demonstrate how to retrieve information automatically from the web. They operate on a website built for this research that utilizes known semi-security mechanisms, a honeypot, and activity logging, making it possible to collect and evaluate data from the algorithms and the website. The result shows that it may be possible to construct a web scraper algorithm with lower detectability using Puppeteer. One of the algorithms also reveals that it is possible to control computer performance using built-in methods in Puppeteer.
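One ingredient of "less detectable" scraping of the kind the thesis studies is pacing: drawing pauses from a skewed distribution so request timing resembles a human reader rather than a fixed-interval bot. The distribution and parameters below are illustrative assumptions, not taken from the thesis, and the thesis implements its behavior inside Puppeteer rather than in Python.

```python
import random

# Illustrative pacing helper: a log-normal draw yields mostly short pauses
# with an occasional long one, unlike a bot's fixed interval.
# scale_s and sigma are assumed values, not the thesis's parameters.
def human_pause(rng, scale_s=2.0, sigma=0.6):
    return rng.lognormvariate(0.0, sigma) * scale_s

rng = random.Random(42)  # seeded only so the sketch is reproducible
pauses = [human_pause(rng) for _ in range(5)]
print([round(p, 2) for p in pauses])
```

In a real scraper each action (navigation, click, scroll) would sleep for one such draw before firing.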
3

Legaspi, Ramos Xurxo. "Scraping Dynamic Websites for Economical Data : A Framework Approach." Thesis, Linnéuniversitetet, Institutionen för datavetenskap (DV), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-57070.

Abstract:
The internet is a source of live data that is constantly updated with data from almost any field we can imagine. Tools that can automatically detect these updates and select the information we are interested in are becoming of utmost importance nowadays. That is why this thesis focuses on several economic websites, studying their structures and identifying a common type of website in this field: dynamic websites. Even though many tools allow information to be extracted from the internet, not many tackle this kind of website. For this reason, we study and implement tools that allow developers to approach these pages from a different perspective.
4

Oucif, Kadday. "Evaluation of web scraping methods : Different automation approaches regarding web scraping using desktop tools." Thesis, KTH, Data- och elektroteknik, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-188418.

Abstract:
A lot of information can be found and extracted from the semantic web in different forms through web scraping, with many techniques having emerged over time. This thesis is written with the objective of evaluating different web scraping methods in order to develop an automated, performance-reliable, easily implemented, and solid extraction process. A number of parameters are set to better evaluate and compare existing techniques. A matrix of desktop tools is examined and two were chosen for evaluation. The evaluation also includes learning to set up the scraping process with so-called agents. A number of links get scraped using the presented techniques, with and without executing JavaScript from the web sources. Prototypes with the chosen techniques are presented, with Content Grabber as the final solution. The result is a better understanding of the subject along with a cost-effective extraction process consisting of different techniques and methods, where a good understanding of the web sources' structure facilitates the data collection. To sum up, the result is discussed and presented with regard to the chosen parameters.
5

Rodrigues, Lanny Anthony, and Srujan Kumar Polepally. "Creating Financial Database for Education and Research: Using WEB SCRAPING Technique." Thesis, Högskolan Dalarna, Mikrodataanalys, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:du-36010.

Abstract:
The objective of this thesis is to expand the university's microdata database of publicly available corporate information using a web scraping mechanism. The tool for this thesis is a web scraper that can access and collect information from websites, using a web application as an interface for client interaction. In our comprehensive work we have demonstrated that the GRI text files comprise approximately 7227 companies; from this total, the data is filtered to "listed" companies. Among the filtered 2252 companies, some do not have income statement data. Hence, we have finally collected data on 2112 companies across 36 different sectors and 13 different countries in this thesis. The publicly available income statement information for 2016 to 2020 was collected by the GRI microdata department. Collecting such data from a proprietary database may cost more than $24,000 a year, whereas collecting the same from a public database may cost almost nothing, which we discuss further in our thesis. In our work we are motivated to collect financial data from the annual financial statements or financial reports of businesses, which can be used to measure and investigate the trading costs and price changes of securities, mutual funds, futures, cryptocurrencies, and so forth. Stock exchanges, official statements, and various business-related news are additional sources of financial data that individuals may scrape.
We are helping small investors and students who require financial statements from numerous companies over several years to assess the condition of the economy and of companies' finances when deciding whether or not to invest, which is not feasible in a conventional way; hence they use web scraping to extract financial statements from diverse websites and base their investment decisions on further research and analysis. In this thesis work, we show that the outcome of the web scraping is to keep the extracted data in a database. The gathered data in the resulting database can be used for further research, education, and other purposes with further use of the web scraping technique.
6

Cosman, Vadim, and Kailash Chowdary. "End user interface for collecting and evaluating company data : Real-time data collection through web-scraping." Thesis, Högskolan Dalarna, Institutionen för information och teknik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:du-37740.

Abstract:
The demand for open and reliable data in the era of Big Data is constantly increasing, as research diversifies and trustworthy, high-quality data considerably improves the quality of findings. However, it is very hard to obtain reliable data for free with little effort. With the immense progress of tools for data scraping, data cleansing, and data storage on the one hand, and the many platforms with data that can be scraped on the other, it is crucial to make use of them to easily build datasets of real and trustworthy data, for free and in a user-friendly way. Using several available tools, an application with a graphical user interface (GUI) was developed. The application makes it possible to collect financial data for any given list of companies, update an existing dataset, build a dataset from the whole data warehouse (DW) based on several filters, make the datasets available to anyone who uses the application, and build simple visualizations of the existing data. To make sure the 'garbage in, garbage out' problem is avoided, the data quality is constantly analyzed and adjusted so that it is ready for use in a research project. The work provides a viable solution for collecting data and making it borderless while respecting the standards of data sharing. The application can collect data from two sources, with more than 250 features per company, and is being updated with more functionality and more data sources.
7

Ceccaroni, Giacomo. "Raccolta di dati eterogenei e multi-sorgente per data visualization dei rapporti internazionali dell'Ateneo di Bologna." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2017. http://amslaurea.unibo.it/13940/.

Abstract:
The case study described in this document analyzes the collection of data on the Web and its visualization through Data Visualization techniques. The resulting system aims to be usable by the staff of the International Relations Area of the University of Bologna to obtain information useful for mapping the international relations maintained by professors and researchers. The goal was reached starting from the specified requirements, which were then used in the subsequent analysis of the problem and the domain. The practical use that will be made of the finished web application is described through envisioned scenarios and use cases. The implementation part of the project begins with an overview of the technologies used to reach the goal and the reasons behind these choices. The technologies covered include Couchbase Server, the Scopus API, Python modules, and JavaScript frameworks. In particular, D3.js and Leaflet.js were used to implement the data visualization in the project.
8

Franchini, Giulia. "Associazioni non profit e linked open data: un esperimento." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2015. http://amslaurea.unibo.it/8350/.

Abstract:
Non-profit associations play an increasingly relevant role in citizens' lives and represent an important productive reality in our country; very often, however, it is difficult to find information about the events, activities, or even the very existence of these associations. To meet citizens' needs, many regions and provinces provide lists collecting information about the various organizations operating in their territory. These lists, however, often have serious problems, both regarding the correctness of the data and the formats used for publication. These factors led to the idea of, and the need for, a system to collect, systematize, and make available the information on the non-profit associations present in the territory, so that this data can be used freely by anyone for different purposes. This work therefore has two main objectives: the first is the implementation of a tool able to retrieve information on non-profit associations from their websites, by means of Web Crawling and Web Scraping techniques. The second objective is to publish the collected information according to models that allow free and unconstrained use; a model based on linked open data principles was used for publishing and structuring the data.
9

Holm, Andreas, and Oscar Ahlm. "Skrapa Facebook : En kartläggning över hur data kan samlas in från Facebook." Thesis, Malmö universitet, Institutionen för datavetenskap och medieteknik (DVMT), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-43326.

Abstract:
A vast amount of data is shared daily on social media platforms. Data that if it can becollected and sorted can prove valueable as a basis for research work. Especially in countrieswhere social media constitutes the only possible place for citizens to make their voicesheard. Facebook is one of the most frequently used social media platforms and thus can bea potential rich source from which data can be collected. But Facebook has become morerestrictive about who gets access to the data on their platform. This has created an interestin ways how to get access to the data that is shared on Facebooks platform without gettingexplicit approval from Facebook. At the same time it creates questions about the ethicsand the legality of it. This work intended to investigate different aspects, such as technical,ethical, legal, related to the collecting of data from Facebooks platform by performing aliterary review and experiments. The literary review showed that it was difficult to findmaterial regarding technical measures taken by Facebook to prevent web scraping. Theexperiments that were performed identified some of these measures, among others thatthe structure of the HTML code changes and that ids of HTML elements updates whendifferent events occur on the web page, which makes web scraping increasingly difficult.The literary review also showed that it is troublesome to know which data is legal to scrapefrom Facebook and which is not. This is partly due to the fact that different countries havedifferent laws to which one must conform when scraping web data, and partly that it canbe difficult to know what counts as personal data and thus is protected by GDPR amongother laws.
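The volatile element ids the experiments describe are one reason id-based scrapers break. A minimal sketch of one mitigation: key the extraction on an attribute assumed to be more stable than auto-generated ids (the markup, the id values, and the `data-ad-preview` attribute here are invented stand-ins, not Facebook's actual HTML):

```python
import re

# Auto-generated ids change between page loads, so this sketch matches a
# hypothetical data-* attribute instead of the volatile id. Regex is used only
# to keep the example self-contained; a real scraper would parse the DOM.
def extract_posts(html):
    pattern = re.compile(
        r'<div[^>]*data-ad-preview="message"[^>]*>(.*?)</div>', re.S
    )
    return [m.strip() for m in pattern.findall(html)]

# Same content, but the id differs between the two simulated page loads.
load_1 = '<div id="jsc_c_8q" data-ad-preview="message">First post</div>'
load_2 = '<div id="jsc_c_zx" data-ad-preview="message">First post</div>'

print(extract_posts(load_1))  # → ['First post']
print(extract_posts(load_2))  # → ['First post']
```

An id-keyed selector would find the content in only one of the two loads; the attribute-keyed one finds it in both.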
APA, Harvard, Vancouver, ISO, and other styles
10

Mascellaro, Maria Maddalena. "Integrazione di sorgenti eterogenee per un sistema di Data Visualization." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2018. http://amslaurea.unibo.it/16818/.

Full text
Abstract:
The case study described in this thesis analyzes techniques for collecting data on the Web and the ways such data can be represented through Data Visualization techniques. The web application developed allows the user to view the international collaborations between professors and researchers of the University of Bologna and foreign institutions. The goal of this thesis is to collect information on the collaborations contained in a database called Web of Science. These data were then integrated with those already present in the web application. Two macro-phases of work were therefore identified: data collection, and its integration with the data previously collected from the Scopus database. The first phase was the most substantial part of this thesis project and was carried out with Python scripts that extracted the data from the database through the WOS library and the Web of Science APIs. During the second phase the site's interface was modified, allowing the user to identify the origin of the publications examined. Another feature implemented was a multilingual (Italian-English) version of the site.
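The integration phase, merging Web of Science records into data already collected from Scopus, can be sketched as a DOI-keyed merge that tracks each record's provenance (the record fields and DOIs below are illustrative, not the thesis's actual schema or the WOS library's API):

```python
# Merge two bibliographic record lists on DOI, recording which source(s)
# each publication came from so the interface can display its origin.
def merge_by_doi(scopus, wos):
    merged = {}
    for rec in scopus:
        merged[rec["doi"]] = {**rec, "sources": ["Scopus"]}
    for rec in wos:
        if rec["doi"] in merged:
            merged[rec["doi"]]["sources"].append("WoS")
        else:
            merged[rec["doi"]] = {**rec, "sources": ["WoS"]}
    return merged

scopus = [{"doi": "10.1/a", "title": "Paper A"}]
wos = [{"doi": "10.1/a", "title": "Paper A"},
       {"doi": "10.1/b", "title": "Paper B"}]

result = merge_by_doi(scopus, wos)
print(result["10.1/a"]["sources"])  # → ['Scopus', 'WoS']
print(result["10.1/b"]["sources"])  # → ['WoS']
```

Keying on DOI is a common deduplication choice because it is source-independent, whereas internal record ids differ between Scopus and Web of Science.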
APA, Harvard, Vancouver, ISO, and other styles

Books on the topic "Data Scraping"

1

vanden Broucke, Seppe, and Bart Baesens. Practical Web Scraping for Data Science. Berkeley, CA: Apress, 2018. http://dx.doi.org/10.1007/978-1-4842-3582-9.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

MacDonald, Allyson, ed. Web Scraping with Python: Collecting More Data from the Modern Web. 2nd ed. Beijing: O’Reilly Media, 2018.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
3

Python Web Scraping: Hands-on data scraping and crawling using PyQt, Selenium, HTML and Python. Packt Publishing, 2017.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
4

Web Scraping with Python: Collecting Data from the Modern Web. O’Reilly Media, 2015.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
5

Broucke, Seppe vanden. Practical Web Scraping for Data Science: Best Practices and Examples with Python. Apress, 2018.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
6

Automated Data Collection with R: A Practical Guide to Web Scraping and Text Mining. Wiley, 2015.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
7

Web Scraping with Python: Successfully scrape data from any website with the power of Python. Packt Publishing, 2015.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
8

Python Automation Cookbook: 75 Python Automation Ideas for Web Scraping, Data Wrangling, and Processing Excel, Reports, Emails, and More, 2nd Edition. Packt Publishing, Limited, 2020.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
9

Oliva Abarca, Jesús Eduardo. Cultura y Big Data. Métodos y técnicas para el análisis cultural en una sociedad datificada. Ediciones Comunicación Científica, 2021. http://dx.doi.org/10.52501/cc.014.

Full text
Abstract:
The purpose of this book is to illustrate the concepts, methods, and applications of cultural analytics, an approach formulated by Lev Manovich that consists of the systematic use of data science tools and techniques for the analysis of massive data on cultural phenomena. To that end, three studies are presented that exemplify different workflows within this approach. The first is an analysis of cultural and creative crowdfunding in Mexico and Latin America, together with the construction of two recommender systems based on automated data collection via web scraping, with the aim of providing relevant information for financing the projects of independent artists and creatives. The second study applies natural language processing techniques to a corpus of tweets, from which an automatic text classification model is built. Through the automated examination of syntactic and semantic attributes, structural differences emerge between tweets classified as news, phrases or reflections, and fiction. The third study addresses the possibilities and uses of computer vision for analyzing and modeling classification systems for images of works of visual art, starting from the iconographic-iconological method and from the automated processing of the visual attributes of artistic pieces. The development of these investigations confirms the need to foster interdisciplinary approaches to the analysis of culture.
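The second study's text-classification workflow can be sketched with a hand-rolled naive Bayes classifier over word features. This toy stands in for the book's actual NLP pipeline; the training tweets, the two categories, and the feature choice are all invented for illustration:

```python
from collections import Counter, defaultdict
import math

# Train: count class priors and per-class word frequencies.
def train(samples):
    priors = Counter(label for _, label in samples)
    word_counts = defaultdict(Counter)
    for text, label in samples:
        word_counts[label].update(text.lower().split())
    return priors, word_counts

# Classify: pick the class maximizing log P(class) + sum of log P(word|class),
# with Laplace smoothing so unseen words don't zero out a class.
def classify(text, priors, word_counts):
    vocab = {w for counts in word_counts.values() for w in counts}
    best, best_score = None, float("-inf")
    for label in priors:
        total = sum(word_counts[label].values())
        score = math.log(priors[label])
        for word in text.lower().split():
            score += math.log((word_counts[label][word] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best

samples = [
    ("breaking report government announces new policy", "news"),
    ("officials confirm election results today", "news"),
    ("the dragon slept beneath the silver mountain", "fiction"),
    ("she whispered a spell and the forest listened", "fiction"),
]
priors, word_counts = train(samples)
print(classify("government confirms new election policy", priors, word_counts))  # → news
print(classify("the mountain dragon whispered", priors, word_counts))  # → fiction
```

A production pipeline would add richer syntactic and semantic features and far more training data, but the structure (feature extraction, training, scoring) is the same.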
APA, Harvard, Vancouver, ISO, and other styles
10

Bélair-Gagnon, Valérie, and Nikki Usher, eds. Journalism Research That Matters. Oxford University Press, 2021. http://dx.doi.org/10.1093/oso/9780197538470.001.0001.

Full text
Abstract:
Despite the looming crisis in journalism, a research–practice gap plagues the news industry. This volume seeks to close that gap with timely scholarly research on the most pressing problems facing the news industry today, translated for a non-specialist audience. Contributions from academics and journalists are brought together in order to advance a conversation about how to do the kind of journalism research that matters, meaning research that changes journalism for the better for the public and helps make journalism more financially sustainable. The book covers important concerns such as the financial survival of quality news and information, how news audiences consume (or don't consume) journalism, and how issues such as race, inequality, and diversity must be addressed by journalists and researchers alike. The book addresses needed interventions in policy research and provides a guide to understanding buzzwords like "news literacy," "data literacy," and "data scraping" that are more complicated than they might initially seem. Practitioners provide suggestions for working together with scholars, from focusing on product and human-centered design to understanding the different priorities that media professionals and scholars can have, even when approaching collaborative projects. This book provides valuable insights for media professionals and scholars about news business models, audience research, misinformation, diversity and inclusivity, and news philanthropy. It offers journalists a guide on what they need to know, and a call to action for what kind of research journalism scholars can do to best help the news industry reckon with disruption.
APA, Harvard, Vancouver, ISO, and other styles

Book chapters on the topic "Data Scraping"

1

Boehmke, Bradley C. "Scraping Data." In Use R!, 129–62. Cham: Springer International Publishing, 2016. http://dx.doi.org/10.1007/978-3-319-45599-0_16.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Bressoud, Thomas, and David White. "Web Scraping." In Introduction to Data Systems, 681–714. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-54371-6_22.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Zhao, Bo. "Web Scraping." In Encyclopedia of Big Data, 1–3. Cham: Springer International Publishing, 2017. http://dx.doi.org/10.1007/978-3-319-32001-4_483-1.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Zhao, Bo. "Web Scraping." In Encyclopedia of Big Data, 951–53. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-319-32010-6_483.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Egger, Roman, Markus Kroner, and Andreas Stöckl. "Web Scraping." In Applied Data Science in Tourism, 67–82. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-030-88389-8_5.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Patel, Jay M. "Introduction to Web Scraping." In Getting Structured Data from the Internet, 1–30. Berkeley, CA: Apress, 2020. http://dx.doi.org/10.1007/978-1-4842-6576-5_1.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Nolan, Deborah, and Duncan Temple Lang. "Scraping Data from HTML Forms." In Use R!, 315–38. New York, NY: Springer New York, 2013. http://dx.doi.org/10.1007/978-1-4614-7900-0_9.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

vanden Broucke, Seppe, and Bart Baesens. "From Web Scraping to Web Crawling." In Practical Web Scraping for Data Science, 155–72. Berkeley, CA: Apress, 2018. http://dx.doi.org/10.1007/978-1-4842-3582-9_6.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

vanden Broucke, Seppe, and Bart Baesens. "Introduction." In Practical Web Scraping for Data Science, 3–23. Berkeley, CA: Apress, 2018. http://dx.doi.org/10.1007/978-1-4842-3582-9_1.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

vanden Broucke, Seppe, and Bart Baesens. "The Web Speaks HTTP." In Practical Web Scraping for Data Science, 25–48. Berkeley, CA: Apress, 2018. http://dx.doi.org/10.1007/978-1-4842-3582-9_2.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Conference papers on the topic "Data Scraping"

1

Beno, Miloslav, Jakub Misek, and Filip Zavoral. "AgentMat: Framework for data scraping and semantization." In 2009 Third International Conference on Research Challenges in Information Science (RCIS). IEEE, 2009. http://dx.doi.org/10.1109/rcis.2009.5089286.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Thomas, David Mathew, and Sandeep Mathur. "Data Analysis by Web Scraping using Python." In 2019 3rd International conference on Electronics, Communication and Aerospace Technology (ICECA). IEEE, 2019. http://dx.doi.org/10.1109/iceca.2019.8822022.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Diouf, Rabiyatou, Edouard Ngor Sarr, Ousmane Sall, Babiga Birregah, Mamadou Bousso, and Seny Ndiaye Mbaye. "Web Scraping: State-of-the-Art and Areas of Application." In 2019 IEEE International Conference on Big Data (Big Data). IEEE, 2019. http://dx.doi.org/10.1109/bigdata47090.2019.9005594.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

"A SEMANTIC SCRAPING MODEL FOR WEB RESOURCES - Applying Linked Data to Web Page Screen Scraping." In 3rd International Conference on Agents and Artificial Intelligence. SciTePress - Science and and Technology Publications, 2011. http://dx.doi.org/10.5220/0003185704510456.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Chaulagain, Ram Sharan, Santosh Pandey, Sadhu Ram Basnet, and Subarna Shakya. "Cloud Based Web Scraping for Big Data Applications." In 2017 IEEE International Conference on Smart Cloud (SmartCloud). IEEE, 2017. http://dx.doi.org/10.1109/smartcloud.2017.28.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

PRATIBA, D., ABHAY M.S., AKHIL DUA, Giridhar K. SHANBHAG, NEEL BHANDARI, and UTKARSH SINGH. "Web Scraping And Data Acquisition Using Google Scholar." In 2018 3rd International Conference on Computational Systems and Information Technology for Sustainable Solutions (CSITSS). IEEE, 2018. http://dx.doi.org/10.1109/csitss.2018.8768777.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Ertam, Fatih. "Deep learning based text classification with Web Scraping methods." In 2018 International Conference on Artificial Intelligence and Data Processing (IDAP). IEEE, 2018. http://dx.doi.org/10.1109/idap.2018.8620790.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

You, Jaebeom, Jaekyu Lee, and Hyuk-Yoon Kwon. "A Complete and Fast Scraping Method for Collecting Tweets." In 2021 IEEE International Conference on Big Data and Smart Computing (BigComp). IEEE, 2021. http://dx.doi.org/10.1109/bigcomp51126.2021.00014.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Park, Andrew J., Ruhi Naaz Quadari, and Herbert H. Tsang. "Phishing website detection framework through web scraping and data mining." In 2017 8th IEEE Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON). IEEE, 2017. http://dx.doi.org/10.1109/iemcon.2017.8117212.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Fatmasari, Yesi Novaria Kunang, and Susan Dian Purnamasari. "Web Scraping Techniques to Collect Weather Data in South Sumatera." In 2018 International Conference on Electrical Engineering and Computer Science (ICECOS). IEEE, 2018. http://dx.doi.org/10.1109/icecos.2018.8605202.

Full text
APA, Harvard, Vancouver, ISO, and other styles