
Journal articles on the topic 'Data Scraping'


Consult the top 50 journal articles for your research on the topic 'Data Scraping.'


1

Khder, Moaiad. "Web Scraping or Web Crawling: State of Art, Techniques, Approaches and Application." International Journal of Advances in Soft Computing and its Applications 13, no. 3 (November 28, 2021): 145–68. http://dx.doi.org/10.15849/ijasca.211128.11.

Abstract:
Web scraping, or web crawling, refers to the automatic extraction of data from websites using software. It is a process that is particularly important in fields such as Business Intelligence in the modern age. Web scraping is a technology that allows us to extract structured data from text such as HTML, and it is extremely useful in situations where data are not provided in a machine-readable format such as JSON or XML. Web scraping allows us to gather near-real-time prices and further details from retail store sites; it can also be used to gather intelligence on illicit businesses, such as drug marketplaces on the darknet, providing law enforcement and researchers with valuable data, such as drug prices and varieties, that would be unavailable through conventional methods. It has been found that a web scraping program yields data that is far more thorough, accurate, and consistent than manual entry. Based on these results, it is concluded that web scraping is a highly useful tool in the information age and an essential one in many modern fields. Implementing web scraping properly requires multiple technologies, such as spidering and pattern matching, which are discussed. This paper looks into what web scraping is, how it works, its stages and technologies, how it relates to Business Intelligence, artificial intelligence, data science, big data, and cybersecurity, how it can be done with the Python language, some of the main benefits of web scraping, and what the future of web scraping may look like, with a special degree of emphasis placed on the ethical and legal issues. Keywords: Web Scraping, Web Crawling, Python Language, Business Intelligence, Data Science, Artificial Intelligence, Big Data, Cloud Computing, Cybersecurity, legal, ethical.
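The extraction stage this survey describes — pulling structured values out of HTML text — can be sketched with nothing but Python's standard library. The HTML snippet and the `price` class below are illustrative, not taken from the paper:

```python
from html.parser import HTMLParser

class PriceParser(HTMLParser):
    """Collects the text inside <span class="price"> elements."""
    def __init__(self):
        super().__init__()
        self._in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag
        if tag == "span" and ("class", "price") in attrs:
            self._in_price = True

    def handle_data(self, data):
        if self._in_price:
            self.prices.append(data.strip())
            self._in_price = False

html = ('<ul><li><span class="price">$9.99</span></li>'
        '<li><span class="price">$4.50</span></li></ul>')
parser = PriceParser()
parser.feed(html)
print(parser.prices)  # ['$9.99', '$4.50']
```

In practice the HTML would come from an HTTP request rather than a literal string, and a production scraper would typically use a tolerant parser library instead of hand-written callbacks.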
2

Padghan, Sameer, Satish Chigle, and Rahul Handoo. "Web Scraping-Data Extraction Using Java Application and Visual Basics Macros." Journal of Advances and Scholarly Researches in Allied Education 15, no. 2 (April 1, 2018): 691–95. http://dx.doi.org/10.29070/15/56996.

3

Scassa, Teresa. "Ownership and control over publicly accessible platform data." Online Information Review 43, no. 6 (October 14, 2019): 986–1002. http://dx.doi.org/10.1108/oir-02-2018-0053.

Abstract:
Purpose: The purpose of this paper is to examine how claims to “ownership” are asserted over publicly accessible platform data and critically assess the nature and scope of rights to reuse these data.
Design/methodology/approach: Using Airbnb as a case study, this paper examines the data ecosystem that arises around publicly accessible platform data. It analyzes current statute and case law in order to understand the state of the law around the scraping of such data.
Findings: This paper demonstrates that there is considerable uncertainty about the practice of data scraping, and that there are risks in allowing the law to evolve in the context of battles between business competitors without a consideration of the broader public interest in data scraping. It argues for a data ecosystem approach that can keep the public dimension issues more squarely within the frame when data scraping is judicially considered.
Practical implications: The nature of some sharing economy platforms requires that a large subset of their data be publicly accessible. These data can be used to understand how platform companies operate, to assess their compliance with laws and regulations and to evaluate their social and economic impacts. They can also be used in different kinds of data analytics. Such data are therefore sought after by civil society organizations, researchers, entrepreneurs and regulators. This paper considers who has a right to control access to and use of these data, addresses current uncertainties in how the law will apply to scraping activities, and builds an argument for a consideration of the public interest in data scraping.
Originality/value: The issue of ownership/control over publicly accessible information is of growing importance; this paper offers a framework for approaching these legal questions.
4

Maślankowski, Jacek. "The collection and analysis of the data on job advertisements with the use of big data." Wiadomości Statystyczne. The Polish Statistician 64, no. 9 (September 30, 2019): 60–74. http://dx.doi.org/10.5604/01.3001.0013.7590.

Abstract:
The goal of this paper is to present, on the one hand, the benefits for official statistics (labour market) resulting from the use of web scraping methods to gather data on job advertisements from websites belonging to big data compilations, and on the other, the challenges connected to this process. The paper introduces the results of experimental research in which web scraping and text mining methods were adopted. The analysis was based on data from 2017–2018 obtained from the most popular job-searching websites, which was then collated with Statistics Poland’s data obtained from Z-05 forms. The above-mentioned analysis demonstrated that web scraping methods can be adopted by public statistics services to obtain statistical data from alternative sources complementing the already-existing databases, provided the findings of such research remain coherent with the results of the already-existing studies.
5

Wang, Yuguang, Dengyun Zhu, Bin Zhang, Qi Guo, Fucheng Wan, and Ning Ma. "Review of data scraping and data mining research." Journal of Physics: Conference Series 1982, no. 1 (July 1, 2021): 012161. http://dx.doi.org/10.1088/1742-6596/1982/1/012161.

6

Maulana, Afrizal Aziz, Ajib Susanto, and Desi Purwanti Kusumaningrum. "Rancang Bangun Web Scraping Pada Marketplace di Indonesia." JOINS (Journal of Information System) 4, no. 1 (July 1, 2019): 41–53. http://dx.doi.org/10.33633/joins.v4i1.2544.

Abstract:
E-commerce and marketplaces are closely tied to the dropship system. Dropshipping is a form of trade in which the drop shipper (reseller) does not hold the goods. Drop shippers still obtain product data from suppliers and upload it manually, retrieving and uploading items one by one, which takes considerable time. This study builds a new application to help drop shippers obtain product data and upload it automatically. The development method used is the waterfall model, proceeding from requirements analysis through system design, implementation, testing, and system maintenance. The research produces an application that can scrape product data from a supplier’s store and deliver the results as a .csv file. The upload process then runs automatically: entering the name of the scraped .csv file uploads the data to the drop shipper’s store. Testing showed that web scraping succeeded in retrieving product data from the Tokopedia and Shopee marketplaces and uploading it to the Afrizal22hop e-commerce store. Keywords: Marketplace, E-commerce, Dropship, Drop shipper, Web Scraping
7

Speckmann, Felix. "Web Scraping." Zeitschrift für Psychologie 229, no. 4 (December 2021): 241–44. http://dx.doi.org/10.1027/2151-2604/a000470.

Abstract:
When people use the Internet, they leave traces of their activities: blog posts, comments, articles, social media posts, etc. These traces represent behavior that psychologists can analyze. A method that makes downloading those sometimes very large datasets feasible is web scraping, which involves writing a program to automatically download specific parts of a website. The obtained data can be used to exploratorily generate new hypotheses, test existing ones, or extend existing research. The present Research Spotlight explains web scraping and discusses the possibilities, limitations as well as ethical and legal challenges associated with the approach.
8

Krotov, Vlad, and Matthew Tennyson. "Research Note: Scraping Financial Data from the Web Using the R Language." Journal of Emerging Technologies in Accounting 15, no. 1 (February 1, 2018): 169–81. http://dx.doi.org/10.2308/jeta-52063.

Abstract:
The main goal of this research note is to educate business researchers on how to automatically scrape financial data from the World Wide Web using the R programming language. This paper is organized into the following main parts. The first part provides a conceptual overview of the web scraping process. The second part educates the reader about the Rvest package—a popular tool for browsing and downloading web data in R. The third part educates the reader about the main functions of the XBRL package. The XBRL package was developed specifically for working with financial data distributed using the XBRL format in the R environment. The fourth part of this paper presents an example of a relatively complex web scraping task implemented using the R language. This complex web scraping task involves using both the Rvest and XBRL packages for the purposes of retrieving, preprocessing, and organizing financial and nonfinancial data related to a company from various sources and using different data forms. The paper ends with some concluding remarks on how the web scraping approach presented in this paper can be useful in other research projects involving financial and nonfinancial data.
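The note itself works in R with the Rvest and XBRL packages. As a rough Python analogue of the XBRL step, one might pull tagged facts out of an XBRL-like XML document with the standard library; the tag names, namespace URI, and values below are invented for illustration:

```python
import xml.etree.ElementTree as ET

# A tiny XBRL-like fragment; real filings are far larger and use
# registered taxonomy namespaces.
xbrl = """<xbrl xmlns:us-gaap="http://fasb.org/us-gaap/2023">
  <us-gaap:Revenues contextRef="FY2023" unitRef="usd">500000</us-gaap:Revenues>
  <us-gaap:Assets contextRef="FY2023" unitRef="usd">1200000</us-gaap:Assets>
</xbrl>"""

root = ET.fromstring(xbrl)

# Each child's tag is '{namespace}LocalName'; keep the local name and
# coerce the text to a number to get a plain dict of facts.
facts = {el.tag.split("}")[1]: float(el.text) for el in root}
print(facts)  # {'Revenues': 500000.0, 'Assets': 1200000.0}
```

A real pipeline would first download the filing (the Rvest stage in the paper) and would keep the `contextRef`/`unitRef` attributes so facts from different periods are not conflated.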
9

Rao, M. Kameswara, Rohit Lagisetty, M. S. V. K. Maniraj, K. N. S. Dattu, and B. Sneha Ganga. "Commodity Price Data Analysis Using Web Scraping." International Journal of Advances in Applied Sciences 4, no. 4 (December 1, 2015): 146. http://dx.doi.org/10.11591/ijaas.v4.i4.pp146-150.

Abstract:
Today, analysis of data available on the web has become more popular; using such data, we can solve many issues. Our project deals with the analysis of commodity price data available on the web. In general, commodity price data analysis is performed to determine the inflation rate prevailing in a country and the consumer price index (CPI). In some countries this analysis is presently done manually by collecting data from different cities and then calculating inflation and the CPI using predefined formulae. This project was developed to make the entire process automatic. Nowadays most customers depend on online websites for their day-to-day purchases, which is why we implemented a system to collect the data available on various e-commerce sites for commodity price analysis. We introduce a data scraping technique that enables us to collect data on various products available online, store it in a database, and thereafter perform analysis on it. This process reduces the burden of collecting data manually by travelling to various cities. The system includes a web module that performs analysis and visualization of the data available in the database.
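A toy version of the pipeline this abstract describes — store scraped prices in a database, then compute a simple price index — might look like the sketch below; the item names, months, and prices are made up:

```python
import sqlite3

# Illustrative rows as a scraper might collect them from store sites.
rows = [
    ("rice", "2015-01", 10.0), ("rice", "2015-02", 10.5),
    ("oil",  "2015-01", 20.0), ("oil",  "2015-02", 21.0),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE prices (item TEXT, month TEXT, price REAL)")
con.executemany("INSERT INTO prices VALUES (?, ?, ?)", rows)

# A crude index: average price in the current month relative to the
# base month, scaled to 100.
(base,) = con.execute(
    "SELECT AVG(price) FROM prices WHERE month = '2015-01'").fetchone()
(cur,) = con.execute(
    "SELECT AVG(price) FROM prices WHERE month = '2015-02'").fetchone()
index = round(100 * cur / base, 1)
print(index)  # 105.0
```

A real CPI weights items by expenditure shares rather than averaging raw prices, so this is only the shape of the computation, not the statistical method.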
10

Gallagher, John R., and Aaron Beveridge. "Project-Oriented Web Scraping in Technical Communication Research." Journal of Business and Technical Communication 36, no. 2 (December 13, 2021): 231–50. http://dx.doi.org/10.1177/10506519211064619.

Abstract:
This article advocates for web scraping as an effective method to augment and enhance technical and professional communication (TPC) research practices. Web scraping is used to create consistently structured and well-sampled data sets about domains, communities, demographics, and topics of interest to TPC scholars. After providing an extended description of web scraping, the authors identify technical considerations of the method and provide practitioner narratives. They then describe an overview of project-oriented web scraping. Finally, they discuss implications for the concept as a sustainable approach to developing web scraping methods for TPC research.
11

Hillen, Judith. "Web scraping for food price research." British Food Journal 121, no. 12 (November 21, 2019): 3350–61. http://dx.doi.org/10.1108/bfj-02-2019-0081.

Abstract:
Purpose: The purpose of this paper is to discuss web scraping as a method for extracting large amounts of data from online sources. The author wants to raise awareness of the method’s potential in the field of food price research, hoping to enable fellow researchers to apply this method.
Design/methodology/approach: The author explains the technical procedure of web scraping, reviews the existing literature, and identifies areas of application and limitations for food price research.
Findings: The author finds that web scraping is a promising method to collect customised, high-frequency data in real time, overcoming several limitations of currently used food price data sources. With today’s applications mostly focussing on (online) consumer prices, the scope of applications for web scraping broadens as more and more price data are published online.
Research limitations/implications: To better deal with the technical and legal challenges of web scraping and to exploit its scalability, joint data collection projects in the field of agricultural and food economics should be considered.
Originality/value: In agricultural and food economics, web scraping as a data collection technique has received little attention. This is one of the first articles to address this topic with particular focus on food price analysis.
12

Djufri, Mohammad. "PENERAPAN TEKNIK WEB SCRAPING UNTUK PENGGALIAN POTENSI PAJAK (STUDI KASUS PADA ONLINE MARKET PLACE TOKOPEDIA, SHOPEE DAN BUKALAPAK)." Jurnal BPPK : Badan Pendidikan dan Pelatihan Keuangan 13, no. 2 (December 23, 2020): 65–75. http://dx.doi.org/10.48108/jurnalbppk.v13i2.636.

Abstract:
Currently, millions of transaction records are available on the internet, which can be retrieved and analyzed to uncover potential taxes. This article examines whether data gathered through web scraping techniques can be applied by Account Representatives in efforts to identify potential tax revenue. The paper uses an informetric approach, examining quantitative information in the form of sellers’ transaction data recorded on three online marketplaces (OMP), namely Tokopedia, Shopee and Bukalapak. The results show that web scraping techniques can be used for extracting potential taxes, and that the best web scraping approach for the Directorate General of Taxation (DJP) is to develop its own integrated web scraping application as a Business Intelligence system. The results of this research are expected to contribute academically regarding the use of web scraping in data extraction for identifying potential taxes, and to have policy implications for internet data searches by the Directorate General of Taxation.
14

Midhu Bala, G., and K. Chitra. "Data Extraction and Scratching Information Using R." Shanlax International Journal of Arts, Science and Humanities 8, no. 3 (January 1, 2021): 140–44. http://dx.doi.org/10.34293/sijash.v8i3.3588.

Abstract:
Web scraping is the process of automatically extracting data from multiple web pages on the World Wide Web. It is a field with active developments that shares a common goal with text processing, the semantic web vision, semantic understanding, machine learning, artificial intelligence and human-computer interaction. Current web scraping solutions range from ad hoc approaches requiring human effort to fully automated systems that are able to extract the required unstructured information and convert it into structured information, with limitations. This paper describes a method for developing a web scraper using R programming that locates files on a website and then extracts the filtered data and stores it. The modules used and the algorithm for automating the navigation of a website via links are presented in this paper. The output can further be used for data analytics.
15

Tong, Hui Fen, Wei Liu, Yan Chi, and Wei Wang. "Design of Automatic Scraping System." Advanced Materials Research 853 (December 2013): 625–30. http://dx.doi.org/10.4028/www.scientific.net/amr.853.625.

Abstract:
Based on the principle of hand scraping, an automatic scraping system is designed in this paper. The machine tool bed consists of a supporting chassis and cast iron sidewalls, which exhibit little deformation, good rigidity and high mechanical strength. The operating floor features three degrees of freedom with high precision. The guideway adopts an imported linear guideway, featuring high precision, high load-bearing redundancy and long life. The scraping tool holder is stably installed in the hole at the end of the piston rod. Through the electrical control system, the cylinder controls the scraping tool to perform stable reciprocal motion. Scraping points on the surface of workpieces are captured by a CCD camera with high detection efficiency and accurate test data. The results show that the system greatly reduces labor intensity and improves scraping efficiency while maintaining good scraping quality.
16

Bradley, Alex, and Richard J. E. James. "Web Scraping Using R." Advances in Methods and Practices in Psychological Science 2, no. 3 (July 30, 2019): 264–70. http://dx.doi.org/10.1177/2515245919859535.

Abstract:
The ubiquitous use of the Internet in daily life means that there are now large reservoirs of data that can provide fresh insights into human behavior. One of the key barriers preventing more researchers from utilizing online data is that they do not have the skills to access the data. This Tutorial addresses this gap by providing a practical guide to scraping online data using the popular statistical language R. Web scraping is the process of automatically collecting information from websites. Such information can take the form of numbers, text, images, or videos. This Tutorial shows readers how to download web pages, extract information from those pages, store the extracted information, and do so across multiple pages of a website. A website has been created to assist readers in learning how to web-scrape. This website contains a series of examples that illustrate how to scrape a single web page and how to scrape multiple web pages. The examples are accompanied by videos describing the processes involved and by exercises to help readers increase their knowledge and practice their skills. Example R scripts have been made available at the Open Science Framework.
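The tutorial's multi-page examples are written in R. The same looping pattern — request page 1, 2, 3, … and stop when a page comes back empty — can be sketched in Python, with a canned `fetch_page` standing in for the actual HTTP download so the example is self-contained:

```python
def fetch_page(url):
    # Stand-in for a real download (e.g. with urllib); returns the
    # items extracted from each illustrative results page.
    pages = {
        "https://example.com/reviews?page=1": ["great", "good"],
        "https://example.com/reviews?page=2": ["poor"],
        "https://example.com/reviews?page=3": [],
    }
    return pages.get(url, [])

def scrape_all(base="https://example.com/reviews?page={}"):
    """Walk numbered pages, accumulating items until a page is empty."""
    results, page = [], 1
    while True:
        items = fetch_page(base.format(page))
        if not items:          # first empty page ends the crawl
            break
        results.extend(items)
        page += 1
    return results

print(scrape_all())  # ['great', 'good', 'poor']
```

In a live scraper the loop should also pause between requests and respect the site's robots.txt, points the methodological literature above repeatedly stresses.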
17

Munro, Ken. "Android scraping: accessing personal data on mobile devices." Network Security 2014, no. 11 (November 2014): 5–9. http://dx.doi.org/10.1016/s1353-4858(14)70111-4.

18

Bonifacio, Charmaine, Thomas E. Barchyn, Chris H. Hugenholtz, and Stefan W. Kienzle. "CCDST: A free Canadian climate data scraping tool." Computers & Geosciences 75 (February 2015): 13–16. http://dx.doi.org/10.1016/j.cageo.2014.10.010.

19

Chiapponi, Elisa, Marc Dacier, Onur Catakoglu, Olivier Thonnard, and Massimiliano Todisco. "Scraping Airlines Bots: Insights Obtained Studying Honeypot Data." International Journal of Cyber Forensics and Advanced Threat Investigations 2, no. 1 (May 22, 2021): 3–28. http://dx.doi.org/10.46386/ijcfati.v2i1.23.

Abstract:
Airline websites are the victims of unauthorised online travel agencies and aggregators that use armies of bots to scrape prices and flight information. These so-called Advanced Persistent Bots (APBs) are highly sophisticated. On top of the valuable information taken away, these huge quantities of requests consume a very substantial amount of resources on the airlines’ websites. In this work, we propose a deceptive approach to counter scraping bots. We present a platform capable of mimicking airlines’ sites and changing prices at will, and we provide results from the case studies we performed with it. We lured bots for almost two months, feeding them indistinguishable, inaccurate information. Studying the collected requests, we found behavioural patterns that could be used for complementary bot detection. Moreover, based on the gathered empirical evidence, we propose a method to investigate the common claim that proxy services used by web scraping bots have millions of residential IPs at their disposal. Our mathematical models indicate that the number of IPs is likely two to three orders of magnitude smaller than claimed. This finding suggests that an IP reputation-based blocking strategy could be effective, contrary to what operators of these websites believe today.
20

Rico, Noelia, Susana Montes, and Irene Díaz. "Scraping Relative Chord Progressions Data for Genre Classification." Advances in Artificial Intelligence and Machine Learning 01, no. 01 (2021): 66–81. http://dx.doi.org/10.54364/aaiml.2021.1105.

21

Paramartha, Dede Yoga, Ana Lailatul Fitriyani, and Setia Pramana. "Development of Automated Environmental Data Collection System and Environment Statistics Dashboard." Indonesian Journal of Statistics and Its Applications 5, no. 2 (June 30, 2021): 314–25. http://dx.doi.org/10.29244/ijsa.v5i2p314-325.

Abstract:
Environmental data such as pollutants, temperature, and humidity play a role in the agricultural sector in predicting rainfall conditions, and pollutant data are commonly used as a proxy for the density of industry and transportation. Given this need, automated collection from external websites can provide data faster than satellite confirmation. Data sourced from IQAir can be used as a benchmark or confirmatory data for weather and environmental statistics in Indonesia. The data are taken by scraping the API available on the website, in two stages: the first determines the locations in Indonesia, and the second collects statistics such as temperature, humidity, and pollutant (AQI) data. The Python module used is Scrapy, and crawling has run effectively since May 2020. The data are recorded every three hours for all regions of Indonesia and displayed directly on a Power BI-based dashboard. We also illustrate that AQI data can be used as a proxy for socio-economic activity and as an indicator for monitoring green growth in Indonesia.
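A collector like the one described — polling a JSON API on a schedule and flattening each response into a record — reduces to a small parsing step once the request succeeds. The field names below are hypothetical and do not match the real IQAir API, which also requires an access key:

```python
import json

# Hypothetical shape of one air-quality API response.
raw = json.dumps({
    "city": "Jakarta",
    "current": {"aqi": 132, "temperature": 31, "humidity": 70},
})

def parse_measurement(payload):
    """Flatten a nested API response into one flat record for storage."""
    data = json.loads(payload)
    cur = data["current"]
    return {
        "city": data["city"],
        "aqi": cur["aqi"],
        "temp_c": cur["temperature"],
        "humidity_pct": cur["humidity"],
    }

record = parse_measurement(raw)
print(record["aqi"])  # 132
```

The three-hourly cadence in the paper would come from the scheduler (a cron job or Scrapy's own scheduling) wrapped around this parsing step, with each record appended to the dashboard's backing store.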
22

Annisa, Cholifa Fitri, and Setia Pramana. "KAJIAN PEMANFAATAN DATA GOOGLE MAPS DALAM OFFICIAL STATISTICS." Seminar Nasional Official Statistics 2020, no. 1 (January 5, 2021): 328–37. http://dx.doi.org/10.34123/semnasoffstat.v2020i1.614.

Abstract:
The statistics on food and beverage service businesses published by BPS cannot help business actors identify areas with the potential to develop businesses in the food and beverage service sector. In addition, limits on time, cost, and personnel in data collection by the BPS Tourism Subdirectorate for the VREST survey mean that food and beverage service statistics cannot be published annually as the methodology requires. This study uses the web scraping method to obtain data on food and beverage providers from the Google Maps website. A total of 34,526 food and beverage businesses in Java and Bali were collected. Matching the web-scraped data against the BPS data frame showed a similarity (match) rate of 68.22%. Bali province has the potential to develop food and beverage service businesses, especially in the cities/regencies of Jembrana, Buleleng, Tabanan, Karangasem, and Klungkung, while Central Java province has the potential to develop accommodation businesses, especially in Cilacap, Blora, Grobogan, Batang, and Kendal.
23

A. Yani, Dhita Deviacita, Helen Sasty Pratiwi, and Hafiz Muhardi. "Implementasi Web Scraping untuk Pengambilan Data pada Situs Marketplace." Jurnal Sistem dan Teknologi Informasi (JUSTIN) 7, no. 4 (October 30, 2019): 257. http://dx.doi.org/10.26418/justin.v7i4.30930.

Abstract:
Electronic commerce, or e-commerce, is the process of distributing, buying, selling, and marketing goods and services through electronic systems such as the internet. E-commerce can involve electronic funds transfer, electronic data interchange, automated inventory management systems, and automated data collection systems. In Indonesia, several companies operate in the e-commerce sector, including Bukalapak, Elevenia, and JD.id. Each of these companies has a website where prospective buyers can search for, select, and purchase the products they want. To make the best decision when shopping, a prospective buyer would otherwise have to search several marketplace sites manually for the product to be bought, and may also want to find the best product based on the highest number of sales. With the help of this best-search web application and web scraping techniques, searches can be run across several marketplace sites and the results displayed together. According to the test results using white box and black box testing, the system is able to present the best products from the combined search results of three marketplace websites according to the keywords entered by the user.
24

Liu, Wei, and Ping Chen. "Justification of the behavior regulatory pattern on data scraping." Computer Law & Security Review 43 (November 2021): 105578. http://dx.doi.org/10.1016/j.clsr.2021.105578.

26

Gausling, Tina. "Kommerzialisierung öffentlich-zugänglicher Informationen im Wege des Data Scraping." Computer und Recht 37, no. 9 (September 1, 2021): 609–14. http://dx.doi.org/10.9785/cr-2021-370913.

27

Himawan, Arif, Adri Priadana, and Aris Murdiyanto. "Implementation of Web Scraping to Build a Web-Based Instagram Account Data Downloader Application." IJID (International Journal on Informatics for Development) 9, no. 2 (December 31, 2020): 59–65. http://dx.doi.org/10.14421/ijid.2020.09201.

Abstract:
Instagram is used by many groups, from business people and academics to politicians, who take advantage of the insights gained by processing and analyzing Instagram data for various purposes. Before processing and analyzing the data, however, users must first collect or download it from Instagram. The problem is that most data collection is still done manually, while many parties offer Instagram account data download services at various prices. This research applies a web scraping method to build a web-based application that downloads Instagram account data automatically, so that it can be used by several parties. The web scraping method was chosen because it does not require Instagram’s Application Programming Interface (API), which imposes access restrictions on retrieving Instagram data. In this study, the application was tested on 15 Instagram accounts with publication counts between 100 and 11,000. Based on the analysis of the downloaded data, the web scraping approach successfully downloaded data for a maximum of 2,412 accounts. With this application, users can download Instagram account data into a Data Collection and then manage it, for example deleting entries or exporting the collection as CSV, Excel, or JSON.
28

Black, Michael L. "The World Wide Web as Complex Data Set: Expanding the Digital Humanities into the Twentieth Century and Beyond through Internet Research." International Journal of Humanities and Arts Computing 10, no. 1 (March 2016): 95–109. http://dx.doi.org/10.3366/ijhac.2016.0162.

Abstract:
While intellectual property protections effectively frame digital humanities text mining as a field primarily for the study of the nineteenth century, the Internet offers an intriguing object of study for humanists working in later periods. As a complex data source, the World Wide Web presents its own methodological challenges for digital humanists, but lessons learned from projects studying large nineteenth century corpora offer helpful starting points. Complicating matters further, legal and ethical questions surrounding web scraping, or the practice of large scale data retrieval over the Internet, will require humanists to frame their research to distinguish it from commercial and malicious activities. This essay reviews relevant research in the digital humanities and new media studies in order to show how web scraping might contribute to humanities research questions. In addition to recommendations for addressing the complex concerns surrounding web scraping this essay also provides a basic overview of the process and some recommendations for resources.
APA, Harvard, Vancouver, ISO, and other styles
29

Mancosu, Moreno, and Federico Vegetti. "What You Can Scrape and What Is Right to Scrape: A Proposal for a Tool to Collect Public Facebook Data." Social Media + Society 6, no. 3 (July 2020): 205630512094070. http://dx.doi.org/10.1177/2056305120940703.

Full text
Abstract:
In reaction to the Cambridge Analytica scandal, Facebook has restricted the access to its Application Programming Interface (API). This new policy has damaged the possibility for independent researchers to study relevant topics in political and social behavior. Yet, much of the public information that the researchers may be interested in is still available on Facebook, and can be still systematically collected through web scraping techniques. The goal of this article is twofold. First, we discuss some ethical and legal issues that researchers should consider as they plan their collection and possible publication of Facebook data. In particular, we discuss what kind of information can be ethically gathered about the users (public information), what published data should look like to comply with privacy regulations (like the GDPR), and what consequences violating Facebook’s terms of service may entail for the researcher. Second, we present a scraping routine for public Facebook posts, and discuss some technical adjustments that can be performed for the data to be ethically and legally acceptable. The code employs screen scraping to collect the list of reactions to a Facebook public post, and performs a one-way cryptographic hash function on the users’ identifiers to pseudonymize their personal information, while still keeping them traceable within the data. This article contributes to the debate around freedom of internet research and the ethical concerns that might arise by scraping data from the social web.
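The pseudonymization step described in the abstract, hashing each user's identifier one way so users remain traceable within the dataset but not directly identifiable, can be sketched in a few lines. This is a minimal illustration, not the authors' actual routine; the salt and user IDs below are invented.

```python
# One-way hashing of scraped user identifiers: the same user always maps
# to the same pseudonym, but the original ID cannot be recovered.
# The salt and user IDs below are invented for illustration.
import hashlib

SALT = b"project-specific-secret"  # keep out of the published dataset

def pseudonymize(user_id: str) -> str:
    return hashlib.sha256(SALT + user_id.encode("utf-8")).hexdigest()

reactions = [("user.one", "LIKE"), ("user.two", "ANGRY"), ("user.one", "LOVE")]
pseudonymized = [(pseudonymize(uid), r) for uid, r in reactions]

# Same user -> same pseudonym, so repeat reactions stay linkable.
assert pseudonymized[0][0] == pseudonymized[2][0]
assert pseudonymized[0][0] != pseudonymized[1][0]
```

Salting prevents an attacker from rebuilding the mapping by hashing a list of known account names, which is why the salt itself must never be released with the data.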
APA, Harvard, Vancouver, ISO, and other styles
30

Chanda, Siddhant Vinayak, and Arivoli A. "Web Scraping in Finance using Python." International Journal of Engineering and Advanced Technology 9, no. 5 (June 30, 2020): 255–62. http://dx.doi.org/10.35940/ijeat.e9457.069520.

Full text
Abstract:
The objective of this paper is to highlight different ways to extract financial data (balance sheet, income statement, and cash flow) of different companies from Yahoo Finance, and to present an elaborate model that provides an economical, reliable, and time-efficient tool for this purpose. It aims to aid business analysts who are not well versed in coding but need quantitative outputs to analyse, predict, and make market decisions, by automating the generation of financial data. A Python model is used, which scrapes the required data from Yahoo Finance and presents it in a precise and concise manner in the form of an Excel sheet. A web application is built using Python, with a minimalistic and simple user interface, to facilitate this process. The proposed method not only removes the chance of human error caused by manual extraction of data but also improves the overall productivity of analysts by drastically reducing the time it takes to generate the data, saving a substantial amount of human hours for the consumer. We also discuss the importance of data mining and scraping technologies in the finance industry, which is highly dependent on generated data for analysis and decision-making, as well as different methods of scraping online data and the legal aspects of web scraping.
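The structuring-and-export step such a pipeline ends with can be sketched as follows. The figures are invented and the live scraping of Yahoo Finance is omitted (it requires network access and a page layout that changes over time); the sketch only shows turning scraped rows into an Excel-readable CSV file.

```python
# Sketch of the export step: rows as they might come back from scraping an
# income statement are written out as Excel-readable CSV.
# All figures below are invented for illustration.
import csv, io

scraped = [
    ("Total Revenue", "365,817,000", "274,515,000"),
    ("Gross Profit", "152,836,000", "104,956,000"),
    ("Net Income", "94,680,000", "57,411,000"),
]

def to_csv(rows, header=("Breakdown", "FY2021", "FY2020")) -> str:
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    # strip thousands separators so spreadsheets treat figures as numbers
    for name, *figures in rows:
        writer.writerow([name] + [f.replace(",", "") for f in figures])
    return buf.getvalue()

print(to_csv(scraped).splitlines()[1])  # Total Revenue,365817000,274515000
```

Writing to a `StringIO` buffer keeps the function testable; in practice the same writer would target a file the analyst opens directly in Excel.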
APA, Harvard, Vancouver, ISO, and other styles
31

Victoriano, Jayson, Jaime Pulumbarit, Luisito Lacatan, Richard Albert Salivio, and Rica Louise Barawid. "Data Analysis of Bulacan State University Faculty Scientific Publication Based on Google Scholar using Web Data Scraping Technique." International Journal of Computing Sciences Research 6 (January 31, 2022): 976–87. http://dx.doi.org/10.25147/ijcsr.2017.001.1.85.

Full text
Abstract:
Purpose – The paper aims to analyze and monitor the research publication productivity of the faculty members of Bulacan State University. It compiles all the scientific publications of Bulacan State University (BulSU) and its external campuses as indexed in Google Scholar, in order to track and monitor the scientific productivity of the faculty members and of each college and campus. Method – Using web data scraping techniques, metadata were gathered from faculty research records in the Google Scholar database. Web scraping is the process of extracting useful information from web pages (HTML pages) and saving it in a specified format, such as Excel (.xls), CSV (comma-separated values), or another structured format, in this case using ParseHub. Several stages of data extraction were completed before descriptive insight was provided to the university. The researchers used web data scraping techniques to extract the necessary data from the Google Scholar database, specifically the research outputs of the faculty members of Bulacan State University. Results – The results show that the significant increase in the number of publications over the covered years was made possible by the scholarships and study breaks that faculty members were able to obtain, while the significant increase in the number of citations was brought about by the benefits of collaborating with researchers from other HEIs. Conclusion – Bulacan State University has seen both large improvements and periods of stagnation in the growth of the number of publications produced by its faculty members. Based on the results of the study, BulSU's publications have increased significantly due to the impact of scholarships and the benefits they bring to faculty productivity.
On the other hand, the significant increase in citations is driven by collaborations with researchers from other academic institutions. Recommendation – The administration should take the necessary steps to encourage faculty members who engage in research activities to create Google Scholar accounts for easier monitoring of publications and citations. This would make it easier to navigate the research studies produced by the faculty, helping the university monitor the progress of research activity effectively; the researchers also recommend that the administration provide competitive incentives and positive reinforcement. The faculty, for their part, are encouraged to broaden their horizons and collaborate with researchers from other academic institutions for a wider perspective. Practical Implication – Maximizing the use of data scraping in gathering information about research activity from vast networks makes navigating and organizing research activity easier. With this, monitoring the progress of faculty research productivity can also be done with ease.
APA, Harvard, Vancouver, ISO, and other styles
32

Desstya, Anatri, Zuhdan K. Prasetyo, Suyanta ., and Fitri April Yanti. "SCIENCE CONCEPT IN KEROKAN." Humanities & Social Sciences Reviews 7, no. 3 (April 30, 2019): 374–81. http://dx.doi.org/10.18510/hssr.2019.7355.

Full text
Abstract:
Purpose: The aim of the study was to describe the scientific concepts behind scraping (kerokan). The first step in this study was a literature review of various sources. Methodology: This is qualitative research conducted through preparation, data collection, and data analysis phases. Data were collected by conducting interviews in the Javanese community. Results: The process of scraping embodies a scientific concept: motion generates heat, which opens the pores of the body, resulting in better metabolism. Vasodilation is marked by an enlargement of vascular diameter and by the migration of white blood cells, the immune agents, which are deceived into thinking the body has been wounded. The function of these blood cells is to attack any viruses and bacteria that may be present, so that the attacker can be eradicated. Implications: Scraping also triggers cardiovascular reactions. Body temperature rises slightly, by about 0.5 to 2 degrees Celsius. The increased temperature causes faster chemical reactions in the cardiovascular system. This research is expected to open up public knowledge and address doubts about scraping (kerokan).
APA, Harvard, Vancouver, ISO, and other styles
33

Xia, Guoqing, Chun Liu, Chong Xu, and Tiancheng Le. "Dynamic Analysis of the High-Speed and Long-Runout Landslide Movement Process Based on the Discrete Element Method: A Case Study of the Shuicheng Landslide in Guizhou, China." Advances in Civil Engineering 2021 (February 28, 2021): 1–16. http://dx.doi.org/10.1155/2021/8854194.

Full text
Abstract:
On July 23, 2019, a high-speed and long-runout landslide occurred in Jichang Town, Shuicheng County, Guizhou Province, China, causing 42 deaths and leaving 9 people missing. This paper used the discrete element software MatDEM to construct a three-dimensional discrete element model based on digital elevation data and then simulated and analyzed the movement and accumulation process of the landslide. The maximum average velocity of the source area elements reached 14 m/s when passing through the scraping area; meanwhile, the velocity of the scraping area elements increased rapidly. At 90 s, the maximum displacement of the source area elements reached 1358.5 m. The heat generated during the movement of the landslide was mainly frictional heat, and the frictional heat increased sharply when the source area elements passed through the scraping area. The change in frictional heat is positively correlated with the velocity of the scraping area elements. Finally, the volume of the scraping area elements in the deposits was 2.4 times that of the source area elements. The scraping effect increases the volume of the sliding body and expands the impact area of the landslide disaster. Additionally, different compressive and tensile strengths as well as internal friction coefficients were set to analyze their influence on the landslide movement process; the results show that the smaller the strengths and internal friction coefficient of the model, the greater the depth and area of the scraping area, resulting in a thicker accumulation, while the average displacement, average velocity, and heat also increase.
APA, Harvard, Vancouver, ISO, and other styles
34

Dameani, Tiara. "Analisis Panel Data Web Scraping Artikel Kekerasan Dalam Rumah Tangga Tahun 2019- 2020 di DKI Jakarta." Jurnal Teknologi Informasi 7, no. 1 (June 30, 2021): 43–49. http://dx.doi.org/10.52643/jti.v7i1.1321.

Full text
Abstract:
Domestic violence is a crime committed by individuals that occurs within a small circle and by people known to the victim. In this study, data on domestic violence were obtained using a web scraping method applied to articles in online media for 2019-2020 in five cities of DKI Jakarta Province. In addition to the scraped data, this study uses two additional data sources: (1) National Socio-Economic Survey data, to measure the level of alcohol consumption and the average price of alcohol in the cities of Jakarta, and (2) official publication data from the website of the Central Bureau of Statistics, to measure the unemployment rate and per capita expenditure. The results of this study (1) describe how to scrape online media using the Kapow application and (2) model the relationship between domestic violence and city-level environmental factors with a pooled least squares panel data model. The conclusions of this study are (1) that with the web scraping method researchers can obtain data easily and in closer to real time than by waiting for official court reports, and (2) that alcohol consumption, alcohol prices, unemployment rates, and per capita expenditure show no identifiable effect on domestic violence, which may be driven by psychological rather than environmental factors. A suggestion for further research is to expand the set of factors that cause domestic violence and to extend the panel data beyond just two periods.
APA, Harvard, Vancouver, ISO, and other styles
35

Srividhya, V., and P. Megala. "Scraping and Visualization of Product Data from E-commerce Websites." International Journal of Computer Sciences and Engineering 7, no. 5 (May 31, 2019): 1403–7. http://dx.doi.org/10.26438/ijcse/v7i5.14031407.

Full text
APA, Harvard, Vancouver, ISO, and other styles
36

Rahmatulloh, Alam, and Rohmat Gunawan. "Web Scraping with HTML DOM Method for Data Collection of Scientific Articles from Google Scholar." Indonesian Journal of Information Systems 2, no. 2 (February 26, 2020): 16. http://dx.doi.org/10.24002/ijis.v2i2.3029.

Full text
Abstract:
Google Scholar is a web-based service for searching a broad range of academic literature. Various types of references can be accessed, such as peer-reviewed papers, theses, books, abstracts, and articles from academic publishers, professional communities, preprint repositories, universities, and other academic organizations. Google Scholar provides a profile-creation feature for every researcher, expert, and lecturer, and the quantity of publications from an academic institution, along with detailed data on each scientific article, can be accessed through it. A recap of the scientific publications of each researcher in an institution or organization is needed to determine collective research performance. The problem is that no recap service covering the publications of every researcher in an institution or organization is available. So that scientific article publication data can be utilized by academic institutions or organizations, this research takes data from Google Scholar and builds a recap of publication data by applying web scraping technology. Implementing web scraping makes it possible to take resources available on the web and reuse the results in other applications. By scraping Google Scholar, collective publication data can be obtained, so that recaps of scientific publication data can be produced quickly. The experiments in this study succeeded in retrieving data on 236 researchers from Google Scholar, with 9 attributes and 2,420 articles.
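The HTML DOM method named in the title amounts to walking the parsed document tree and picking out elements by tag and class. A minimal stdlib-only sketch on a simplified, well-formed profile fragment (the class names `gsc_a_at` and `gsc_a_ac` mirror Scholar's markup but are assumptions here, and the sample rows are invented):

```python
# Minimal sketch of the HTML DOM approach: parse a simplified, well-formed
# Google Scholar profile fragment and collect title/citation pairs.
# Class names and sample rows are assumptions for illustration.
import xml.etree.ElementTree as ET

SAMPLE = """
<table>
  <tr><td><a class="gsc_a_at">Web Scraping with HTML DOM</a></td>
      <td><a class="gsc_a_ac">42</a></td></tr>
  <tr><td><a class="gsc_a_at">Indonesian Journal of IS paper</a></td>
      <td><a class="gsc_a_ac">7</a></td></tr>
</table>
"""

def scrape_publications(html: str) -> list[dict]:
    root = ET.fromstring(html)
    titles = [a.text for a in root.iter("a") if a.get("class") == "gsc_a_at"]
    cites = [int(a.text) for a in root.iter("a") if a.get("class") == "gsc_a_ac"]
    return [{"title": t, "citations": c} for t, c in zip(titles, cites)]

records = scrape_publications(SAMPLE)
print(records[0])  # {'title': 'Web Scraping with HTML DOM', 'citations': 42}
```

Real-world pages are rarely well-formed XML, so a production scraper would use a tolerant parser; the DOM traversal itself stays the same.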
APA, Harvard, Vancouver, ISO, and other styles
37

Barcaroli, Guilio, Alessandra Nurra, Sergio Salamone, Monica Scannapieco, Marco Scarnò, and Donato Summa. "Internet as Data Source in the Istat Survey on ICT in Enterprises." Austrian Journal of Statistics 44, no. 2 (April 30, 2015): 31–43. http://dx.doi.org/10.17713/ajs.v44i2.53.

Full text
Abstract:
The Istat sampling survey on ICT in enterprises aims at producing information on the use of ICT, and in particular on the use of the Internet by Italian enterprises for various purposes (e-commerce, e-recruitment, advertisement, e-tendering, e-procurement, e-government). To this end, data are collected by means of the traditional instrument of the questionnaire. Istat began to explore the possibility of using web scraping techniques, associated in the estimation phase with text and data mining algorithms, with the aim of replacing the traditional instruments of data collection and estimation, or of combining them in an integrated approach. The 8,600 websites indicated by the 19,000 enterprises responding to the ICT survey of 2013 were scraped, and the acquired texts were processed in order to try to reproduce the same information collected via the questionnaire. Preliminary results are encouraging, showing in some cases a satisfactory predictive capability of the fitted models (mainly those obtained with the Naive Bayes algorithm). The method known as Content Analysis has also been applied, and its results compared with those obtained with classical learners. To improve overall performance, an advanced system for scraping and mining is being adopted, based on the open source Apache suite Nutch-Solr-Lucene. On the basis of the final results of this test, an integrated system harnessing both survey data and data collected from the Internet will be implemented to produce the required estimates, based on systematic scraping of the nearly 100,000 websites of the whole population of Italian enterprises with 10 or more persons employed, operating in industry and services. This new approach, based on Internet as a Data source (IaD), is characterized by advantages and drawbacks that need to be carefully analysed.
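The estimation step described in the abstract, predicting a questionnaire answer (for example, whether a site offers e-commerce) from scraped website text with a Naive Bayes learner, can be sketched as a toy multinomial Naive Bayes in pure Python. The training snippets and labels below are invented stand-ins for the survey variables; the actual pipeline uses Nutch-Solr-Lucene and far larger corpora.

```python
# Toy multinomial Naive Bayes with add-one smoothing: classify scraped
# website text as "ecommerce" vs "other". Training snippets are invented.
from collections import Counter
import math

TRAIN = [
    ("add to cart checkout secure payment", "ecommerce"),
    ("buy online shipping cart discount", "ecommerce"),
    ("company history mission team contact", "other"),
    ("news press releases annual report", "other"),
]

def fit(samples):
    word_counts = {}          # label -> Counter of word frequencies
    label_counts = Counter()  # label -> number of documents
    vocab = set()
    for text, label in samples:
        words = text.split()
        word_counts.setdefault(label, Counter()).update(words)
        label_counts[label] += 1
        vocab.update(words)
    return word_counts, label_counts, vocab

def predict(text, word_counts, label_counts, vocab):
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        # log prior + sum of log likelihoods with add-one smoothing
        score = math.log(label_counts[label] / total_docs)
        denom = sum(counts.values()) + len(vocab)
        for w in text.split():
            score += math.log((counts[w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = fit(TRAIN)
print(predict("secure checkout and cart", *model))  # ecommerce
print(predict("press contact and team", *model))    # other
```

Add-one (Laplace) smoothing keeps unseen words from zeroing out a class, which matters when short scraped pages share little vocabulary with the training set.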
APA, Harvard, Vancouver, ISO, and other styles
38

Levin, Boris A., Aleksandra S. Komarova, Oksana L. Rozanova, and Alexander S. Golubtsov. "Unexpected Diversity of Feeding Modes among Chisel-Mouthed Ethiopian Labeobarbus (Cyprinidae)." Water 13, no. 17 (August 26, 2021): 2345. http://dx.doi.org/10.3390/w13172345.

Full text
Abstract:
Trophic resource partitioning is one of the main drivers of adaptive radiation. The evolutionary diversification of large African barbs, the genus Labeobarbus, seems to be related to mouth polymorphism. The chisel-mouthed or scraping phenotype has repeatedly evolved within Labeobarbus. At least five ecomorphs with a scraping mouth morphology were detected in the waters of the Ethiopian Highlands and can be provisionally classified into two groups: (i) “Varicorhinus”-like, and (ii) “Smiling”-like. Previously, all Labeobarbus with a scraping mouth morphology were considered to be periphyton feeders. Using data on morphology, diet, and stable isotope ratios (C and N), we addressed the question: does a scraping mouth morphology predict feeding on periphyton? Our study revealed that the five scraper ecomorphs exhibited three main feeding modes: (i) periphyton-eating, (ii) herbivory–detritivory, and (iii) insectivory. Two cases of parallel divergence of sympatric ecomorphs with distinct feeding modes (herbivory–detritivory vs. insectivory) were revealed in two geographically isolated basins. A significant difference in δ15N values was detected among sympatric scraper ecomorphs. A periphytonophagous scraper was enriched in δ15N, with values comparable to those of sympatric piscivorous fish. These data shed light on the possibility that fishes utilize periphyton as a protein-rich food.
APA, Harvard, Vancouver, ISO, and other styles
39

McDonald, Sarah, and Nicholas J. Horton. "Data Scraping, Ingestation, and Modeling: Bringing Data from cars.com into the Intro Stats Class." CHANCE 32, no. 4 (October 2, 2019): 57–64. http://dx.doi.org/10.1080/09332480.2019.1695443.

Full text
APA, Harvard, Vancouver, ISO, and other styles
40

Mulyani, Asri, Dede Kurniadi, and Ikbal Lukmanul Hakim. "Web Scraping Pada Web Media Digital untuk Membangun Aplikasi Android." Jurnal Algoritma 18, no. 1 (November 26, 2021): 313–22. http://dx.doi.org/10.33364/algoritma/v.18-1.949.

Full text
Abstract:
The development of technology for automating particular tasks is extremely rapid in the current era, and the time efficiency of work has become very high; copying information from websites is no exception. If data collection is done conventionally, it takes a long time and involves many steps, whereas with web scraping techniques the collection of data or copying of information from a website is done automatically, cutting the required time dramatically compared with conventional copying. One use of web scraping techniques is to build a REST API without a database and integrate it into mobile apps. The goal of this research is therefore to implement web scraping techniques to build an Android-based application. The methodology used in this research is the Rational Unified Process (RUP), whose stages run from inception and elaboration through construction to transition. The result of this research is an Android-based digital media application as the client and a REST API as its server.
APA, Harvard, Vancouver, ISO, and other styles
41

Ensari, Elif, and Bilge Kobaş. "Web scraping and mapping urban data to support urban design decisions." A/Z : ITU journal of Faculty of Architecture 15, no. 1 (2018): 5–21. http://dx.doi.org/10.5505/itujfa.2018.40360.

Full text
APA, Harvard, Vancouver, ISO, and other styles
42

Al Walid, Md Hosne. "Data Analysis and Visualization of Continental Cancer Situation by Twitter Scraping." International Journal of Modern Education and Computer Science 11, no. 7 (July 8, 2019): 23–31. http://dx.doi.org/10.5815/ijmecs.2019.07.03.

Full text
APA, Harvard, Vancouver, ISO, and other styles
43

Vela, Belen, Jose Maria Cavero, Paloma Caceres, and Carlos E. Cuesta. "A Semi-Automatic Data–Scraping Method for the Public Transport Domain." IEEE Access 7 (2019): 105627–37. http://dx.doi.org/10.1109/access.2019.2932197.

Full text
APA, Harvard, Vancouver, ISO, and other styles
44

Yang, Y., L. T. Wilson, and J. Wang. "Development of an automated climatic data scraping, filtering and display system." Computers and Electronics in Agriculture 71, no. 1 (April 2010): 77–87. http://dx.doi.org/10.1016/j.compag.2009.12.006.

Full text
APA, Harvard, Vancouver, ISO, and other styles
45

Paisal, Paisal, Erli Haryati, Dwi Candra Arianti, Muhammad Rasyid Ridha, and Annida Annida. "Microfilaria Detection on Giemsa Blood Smears using Real-Time PCR." Medical Laboratory Technology Journal 5, no. 1 (June 17, 2019): 24. http://dx.doi.org/10.31964/mltj.v5i1.210.

Full text
Abstract:
Filariasis is an infectious disease caused by filarial worms and is found in many tropical and subtropical regions. In 2017, 12,677 cases of chronic filariasis were recorded in Indonesia, 132 of them in Central Kalimantan province. Data from the Kapuas District Health Office show 17 cases of filariasis in 2015. Filariasis patients frequently show no symptoms of the disease, especially when the level of microfilariae in the blood is very low. On the other hand, microscopic examination of Giemsa blood smears is still the gold standard for diagnosing filariasis, so a false negative result may occur when the microfilaria level in the blood is low. In this study, we developed a real-time PCR method targeting the HhaI gene of filaria to detect filarial worms in stored Giemsa blood smears taken from filariasis patients, using both dry and wet scraping methods. Our results show that real-time PCR can detect Brugia malayi in all scraped samples, with Ct values from wet-scraped samples tending to be higher than from dry-scraped ones. In conclusion, the real-time PCR method can be used to diagnose filariasis, especially when Giemsa blood smears cannot determine a patient's filariasis status.
APA, Harvard, Vancouver, ISO, and other styles
46

Satriajati, Salim, Satria Bagus Panuntun, and Setia Pramana. "IMPLEMENTASI WEB SCRAPING DALAM PENGUMPULAN BERITA KRIMINAL PADA MASA PANDEMI COVID-19." Seminar Nasional Official Statistics 2020, no. 1 (January 5, 2021): 300–308. http://dx.doi.org/10.34123/semnasoffstat.v2020i1.578.

Full text
Abstract:
Many news sites now provide information about events and phenomena. At the same time, the Covid-19 pandemic has created a multidimensional crisis, one aspect of which is the emergence of crime in society. This study aims to collect reports of crimes that occurred during the Covid-19 pandemic from a news site, using a web scraping technique. Web scraping is a technique for extracting information from websites. The collected news can then be analyzed for a possible trend in criminal incidents running alongside the trend of the Covid-19 pandemic in Indonesia. The news site used in this study is detik.com: according to Alexa Internet (alexa.com), detik.com is one of the most frequently accessed news sites and is among the 10 websites with the highest traffic in Indonesia. Covid-19 data for Indonesia were sourced from the KawalCOVID19.id site. The results show that the number of crime reports and the number of confirmed Covid-19 cases follow the same daily trend, both increasing. Based on this research, it can be concluded that web scraping can be implemented to collect news, and the scraping results can then be used to observe the daily trend in the number of crime reports, which can be compared with the daily trend in confirmed Covid-19 cases in Indonesia.
APA, Harvard, Vancouver, ISO, and other styles
47

Chalk, Stuart J. "Leveraging Web 2.0 technologies to add value to the IUPAC Solubility Data Series: development of a REST style website and application programming interface (API)." Pure and Applied Chemistry 87, no. 11-12 (December 1, 2015): 1127–37. http://dx.doi.org/10.1515/pac-2015-0403.

Full text
Abstract:
This paper details an approach to re-purposing scientific data, as presented on a web page, for the sole purpose of making the data more available for searching and integration into other websites. Data ‘scraping’ is used to extract metadata from a set of pages on the National Institute of Standards and Technology (NIST) website, and to clean, organize, and store the metadata in a MySQL database. The metadata are then used to create a new website at the author's institution, using the CakePHP framework to create a representational state transfer (REST) style application program interface (API). The processes used for website analysis, schema development, database construction, metadata scraping, REST API development, and remote data integration are discussed. Lessons learned and tips and tricks on how to get the most out of the process are also included.
APA, Harvard, Vancouver, ISO, and other styles
48

Kumar, Swarn Avinash, Moustafa M. Nasralla, Iván García-Magariño, and Harsh Kumar. "A machine-learning scraping tool for data fusion in the analysis of sentiments about pandemics for supporting business decisions with human-centric AI explanations." PeerJ Computer Science 7 (September 17, 2021): e713. http://dx.doi.org/10.7717/peerj-cs.713.

Full text
Abstract:
The COVID-19 pandemic is changing daily routines for many citizens, with a high impact on the economy in some sectors. Small and medium-sized enterprises in some sectors need to be aware of both the evolution of the pandemic and the corresponding sentiments of customers in order to determine the best commercialization techniques. This article proposes an expert system based on the combination of machine learning and sentiment analysis in order to support business decisions with data fusion through web scraping. The system uses human-centric artificial intelligence to automatically generate explanations. The expert system is fed with online content from different sources by a scraping module. Users can interact with the expert system by providing feedback, which the system uses to improve its recommendations through supervised learning.
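The sentiment-analysis stage such a system combines with its learner can be sketched as a tiny lexicon scorer. This is an illustration only, not the paper's method; the word lists and example sentences below are invented.

```python
# Tiny lexicon-based sentiment scorer of the kind such a pipeline might
# feed into its learner. Word lists and examples are invented.
POSITIVE = {"recovering", "reopening", "growth", "safe", "improving"}
NEGATIVE = {"lockdown", "losses", "outbreak", "closed", "declining"}

def sentiment_score(text: str) -> float:
    """Return a score in [-1, 1]: net fraction of opinionated words that are positive."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos + neg == 0:
        return 0.0  # no opinionated words found: neutral
    return (pos - neg) / (pos + neg)

print(sentiment_score("Shops reopening, sales growth improving"))   # 1.0
print(sentiment_score("New outbreak forces lockdown, shops closed"))  # -1.0
```

A production system would replace the hand-picked lexicon with a trained model, but the interface, text in and bounded score out, stays the same for the fusion step.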
APA, Harvard, Vancouver, ISO, and other styles
49

Karthikeyan T., Karthik Sekaran, Ranjith D., Vinoth Kumar V., and Balajee J M. "Personalized Content Extraction and Text Classification Using Effective Web Scraping Techniques." International Journal of Web Portals 11, no. 2 (July 2019): 41–52. http://dx.doi.org/10.4018/ijwp.2019070103.

Full text
Abstract:
Web scraping is a technique to extract information from various web documents automatically. It retrieves related content based on a query, then aggregates the data and transforms it from an unstructured format into a structured representation. Text classification is a vital phase in summarizing the data and categorizing the web pages adequately. In this article, the data is first extracted from websites using effective web scraping methodologies and transformed into a structured form. The documents are then classified and labeled based on keywords in the data. A recursive feature elimination technique is applied to the data to select the best candidate feature subset, and the final data set is trained with standard machine learning algorithms. The proposed model performs well, classifying the documents from the extracted data with a better accuracy rate.
APA, Harvard, Vancouver, ISO, and other styles
50

Lisikh, Alla, and Sergey Kobyakov. "Analysis of Innovative Technologies for Mechanical Processing of Textile Raw Materials." National Interagency Scientific and Technical Collection of Works. Design, Production and Exploitation of Agricultural Machines, no. 50 (2020): 164–72. http://dx.doi.org/10.32515/2414-3820.2020.50.164-172.

Full text
Abstract:
The article is devoted to new technical solutions, developed and implemented for processing hemp stems in order to obtain bast. It presents the rationale for new approaches to the problem of obtaining hemp bast for various purposes, depending on the stiffness indicator. The stiffness indicator is controlled by changing the number of technological transitions that the processing mechanisms comprise. The article analyzes new technical solutions, on the basis of which a technological scheme for isolating hemp bast is proposed. Using the proposed scheme for the extraction of hemp bast, individual parts of the equipment were designed and manufactured, covering processes such as scutching with simultaneous scraping, scutching with combing, and shaking with vibration. The scutching-and-scraping process is carried out in an experimental scutching-and-scraping section, whose design and technological parameters provide a gradual increase in the intensity of the scutching process. The combined vibrating and shaking action on the material is performed simultaneously by the needles of the combing field and the strips of the needle conveyor, where a layer of material is periodically thrown in a vertical plane. Using scutching with combing and shaking with vibration in the technology of obtaining hemp bast over several passes makes it possible to obtain bast whose fire content and mass-length vary over a wide range, and this combination of shaking and vibration processes increases the efficiency of removing fire from the bast.
The experimental data obtained show that the proposed process for isolating hemp bast, consisting of an alternation of several main processes (crushing with grooved slat-type rollers with a speed difference between pairs of rollers, scutching with simultaneous scraping, and shaking combined with vibration), provides bast with a fire content and mass-length in a wide range, while the fire content of the resulting bast may vary depending on how many transitions are used to process the hemp straw.
APA, Harvard, Vancouver, ISO, and other styles