Academic literature on the topic 'Hidden web crawling'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Hidden web crawling.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Journal articles on the topic "Hidden web crawling"

1. Liakos, Panagiotis, Alexandros Ntoulas, Alexandros Labrinidis, and Alex Delis. "Focused crawling for the hidden web." World Wide Web 19, no. 4 (2015): 605–31. http://dx.doi.org/10.1007/s11280-015-0349-x.
2. Gupta, Sonali, and Komal Kumar Bhatia. "Optimal Query Generation for Hidden Web Extraction through Response Analysis." International Journal of Information Retrieval Research 4, no. 2 (2014): 1–18. http://dx.doi.org/10.4018/ijirr.2014040101.

Abstract:
A huge number of Hidden Web databases exists over the WWW forming a massive source of high quality information. Retrieval of this information for enriching the repository of the search engine is the prime target of a Hidden web crawler. Besides this, the crawler should perform this task at an affordable cost and resource utilization. This paper proposes a Random ranking mechanism whereby the queries to be raised by the hidden web crawler have been ranked. By ranking the queries according to the proposed mechanism, the Hidden Web crawler is able to make an optimal choice among the candidate queries and efficiently retrieve the Hidden web databases. The Hidden Web crawler proposed here also possesses an extensible and scalable framework to improve the efficiency of crawling. The proposed approach has also been compared with other methods of Hidden Web crawling existing in the literature.
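For illustration, the query-selection problem this abstract describes can be reduced to a tiny sketch: rank candidate queries by how many new records each would contribute. This is a generic greedy scheme, not the authors' Random ranking mechanism; the function names and the toy database below are hypothetical.

```python
# Sketch: pick the next query for a hidden-web crawler by ranking
# candidate keywords by estimated harvest (new records per query).
# A real crawler would estimate these counts from pages it has
# already downloaded rather than querying everything up front.

def rank_queries(candidates, seen_records, issue_query):
    """Return candidates ordered by how many *new* records each returns."""
    scored = []
    for query in candidates:
        records = issue_query(query)          # set of record ids returned
        new = len(records - seen_records)     # records not harvested yet
        scored.append((new, query))
    return [q for new, q in sorted(scored, reverse=True)]

# Usage with a toy in-memory "hidden database":
db = {"crawler": {1, 2, 3}, "hidden": {3, 4}, "web": {2, 3, 4, 5}}
seen = {2, 3}
print(rank_queries(db.keys(), seen, lambda q: db[q]))
# -> ['web', 'hidden', 'crawler']
```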
3. Soulemane, Moumie, Mohammad Rafiuzzaman, and Hasan Mahmud. "Crawling the Hidden Web: An Approach to Dynamic Web Indexing." International Journal of Computer Applications 55, no. 1 (2012): 7–15. http://dx.doi.org/10.5120/8717-7290.
4. Deshmukh, Mayuri Anantrao. "2 Way Crawling." International Journal of Applied Evolutionary Computation 10, no. 3 (2019): 34–39. http://dx.doi.org/10.4018/ijaec.2019070105.

Abstract:
As we know that the deep web grows at very fast pace, there has been increased interest in techniques which help efficiently locate and check deep web interfaces. So, it is important to achieve wide coverage and high efficiency on the large volume of web resources. For this we propose a multistage framework, Smart crawler. Smart crawler is a two-stage crawler used to efficiently harvest deep web interfaces. In the first stage, the crawler performs site-based searching for center pages and avoids visiting non-relevant sites. In the second stage, an adaptive link ranking technique is used which…
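The two-stage design sketched in the abstract can be outlined roughly as follows. This is a hedged approximation: site_is_relevant and link_score are invented placeholders standing in for Smart Crawler's actual classifiers and adaptive link ranking.

```python
# Sketch of a two-stage deep-web crawler in the spirit of the abstract:
# stage 1 keeps only relevant sites, stage 2 visits in-site links in
# order of how likely they are to lead to a searchable form.

from collections import deque

def site_is_relevant(site_url):
    return "library" in site_url or "search" in site_url  # toy heuristic

def link_score(link):
    # Favor links whose path hints at a query interface.
    hints = ("search", "query", "find", "advanced")
    return sum(h in link.lower() for h in hints)

def crawl(seed_sites, in_site_links, has_form, budget=100):
    found_forms = []
    sites = deque(s for s in seed_sites if site_is_relevant(s))  # stage 1
    while sites and budget > 0:
        site = sites.popleft()
        # Stage 2: most promising links within the site first.
        for link in sorted(in_site_links(site), key=link_score, reverse=True):
            budget -= 1
            if has_form(link):
                found_forms.append(link)
            if budget <= 0:
                break
    return found_forms

# Toy usage with an in-memory site graph:
links = {"http://library.example": ["http://library.example/search",
                                    "http://library.example/about"]}
forms = crawl(["http://library.example", "http://blog.example"],
              in_site_links=lambda s: links.get(s, []),
              has_form=lambda l: "search" in l)
print(forms)  # ['http://library.example/search']
```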
5. Sheng, Cheng, Nan Zhang, Yufei Tao, and Xin Jin. "Optimal algorithms for crawling a hidden database in the web." Proceedings of the VLDB Endowment 5, no. 11 (2012): 1112–23. http://dx.doi.org/10.14778/2350229.2350232.
6. Sharma, Dilip Kumar, and A. K. Sharma. "A Novel Architecture for Deep Web Crawler." International Journal of Information Technology and Web Engineering 6, no. 1 (2011): 25–48. http://dx.doi.org/10.4018/jitwe.2011010103.

Abstract:
A traditional crawler picks up a URL, retrieves the corresponding page and extracts various links, adding them to the queue. A deep Web crawler, after adding links to the queue, checks for forms. If forms are present, it processes them and retrieves the required information. Various techniques have been proposed for crawling deep Web information, but much remains undiscovered. In this paper, the authors analyze and compare important deep Web information crawling techniques to find their relative limitations and advantages. To minimize limitations of existing deep Web crawlers, a novel architecture is proposed based on QIIIEP specifications (Sharma & Sharma, 2009). The proposed architecture is cost effective and has features of privatized search and general search for deep Web data hidden behind html forms.
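The fetch-extract-check-forms loop described in the first sentences of the abstract might look roughly like this. It is a generic sketch (assuming the requests and beautifulsoup4 packages), not the QIIIEP-based architecture the paper proposes.

```python
# Sketch: a crawler loop that, unlike a traditional crawler, also
# checks each fetched page for HTML forms (deep-web entry points).

from collections import deque
from urllib.parse import urljoin
import requests
from bs4 import BeautifulSoup

def crawl_for_forms(seed, max_pages=50):
    queue, seen, forms = deque([seed]), {seed}, []
    pages = 0
    while queue and pages < max_pages:
        url = queue.popleft()
        pages += 1
        try:
            html = requests.get(url, timeout=10).text
        except requests.RequestException:
            continue
        soup = BeautifulSoup(html, "html.parser")
        for a in soup.find_all("a", href=True):      # traditional part
            link = urljoin(url, a["href"])
            if link not in seen:
                seen.add(link)
                queue.append(link)
        for form in soup.find_all("form"):           # deep-web part
            forms.append((url, form.get("action")))
    return forms

# crawl_for_forms("http://example.com")  # hypothetical seed URL
```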
7

TIAN, JIAN-WEI, WEN-HUI QI, and XIAO-XIAO LIU. "RETRIEVING DEEP WEB DATA THROUGH MULTI-ATTRIBUTES INTERFACES WITH STRUCTURED QUERIES." International Journal of Software Engineering and Knowledge Engineering 21, no. 04 (2011): 523–42. http://dx.doi.org/10.1142/s0218194011005396.

Full text
Abstract:
A great deal of data on the Web lies in the hidden databases, or the deep Web. Most of the deep Web data is not directly available and can only be accessed through the query interfaces. Current research on deep Web search has focused on crawling the deep Web data via Web interfaces with keywords queries. However, these keywords-based methods have inherent limitations because of the multi-attributes and top-k features of the deep Web. In this paper we propose a novel approach for siphoning structured data with structured queries. Firstly, in order to retrieve all the data non-repeatedly in hidd
APA, Harvard, Vancouver, ISO, and other styles
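The contrast with keyword queries can be made concrete: a structured query assigns a value to each interface attribute, so candidate queries come from enumerating attribute-value combinations. The attribute domains below are invented for illustration; the paper's approach to non-repetitive retrieval is more sophisticated than plain enumeration.

```python
# Sketch: generating structured queries for a multi-attribute search
# interface by enumerating attribute-value combinations, instead of
# firing free-text keyword queries. Attribute domains are invented.

from itertools import product

attributes = {
    "make": ["ford", "toyota"],
    "year": ["2010", "2011"],
    "color": ["red", "blue"],
}

def structured_queries(attrs):
    names = list(attrs)
    for values in product(*(attrs[n] for n in names)):
        yield dict(zip(names, values))   # one form submission each

for q in structured_queries(attributes):
    print(q)   # {'make': 'ford', 'year': '2010', 'color': 'red'}, ...
```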
8

Prieto, Víctor, Manuel Álvarez, Rafael López-García, and Fidel Cacheda. "A scale for crawler effectiveness on the client-side hidden web." Computer Science and Information Systems 9, no. 2 (2012): 561–83. http://dx.doi.org/10.2298/csis111215015p.

Full text
Abstract:
The main goal of this study is to present a scale that classifies crawling systems according to their effectiveness in traversing the ?clientside? Hidden Web. First, we perform a thorough analysis of the different client-side technologies and the main features of the web pages in order to determine the basic steps of the aforementioned scale. Then, we define the scale by grouping basic scenarios in terms of several common features, and we propose some methods to evaluate the effectiveness of the crawlers according to the levels of the scale. Finally, we present a testing web site and we show t
APA, Harvard, Vancouver, ISO, and other styles
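One plausible way to operationalize such a scale is to order sets of client-side technologies into levels and score a crawler by the highest level it fully supports. The levels below are invented placeholders, not the scale the paper actually defines.

```python
# Sketch: a toy effectiveness scale for client-side hidden-web crawling.
# The level definitions are invented; the paper derives its levels from
# an analysis of real client-side technologies.

SCALE = [
    ("L1", {"static html"}),
    ("L2", {"static html", "redirects"}),
    ("L3", {"static html", "redirects", "javascript links"}),
    ("L4", {"static html", "redirects", "javascript links", "flash"}),
]

def crawler_level(supported):
    level = "L0"
    for name, required in SCALE:
        if required <= supported:   # handles everything at this level
            level = name
    return level

print(crawler_level({"static html", "redirects"}))  # L2
```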
9

Niu, Beibei, Jinzheng Ren, Ansa Zhao, and Xiaotao Li. "Lender Trust on the P2P Lending: Analysis Based on Sentiment Analysis of Comment Text." Sustainability 12, no. 8 (2020): 3293. http://dx.doi.org/10.3390/su12083293.

Full text
Abstract:
Lender trust is important to ensure the sustainability of P2P lending. This paper uses web crawling to collect more than 240,000 unique pieces of comment text data. Based on the mapping relationship between emotion and trust, we use the lexicon-based method and deep learning to check the trust of a given lender in P2P lending. Further, we use the Latent Dirichlet Allocation (LDA) topic model to mine topics concerned with this research. The results show that lenders are positive about P2P lending, though this tendency fluctuates downward with time. The security, rate of return, and compliance o
APA, Harvard, Vancouver, ISO, and other styles
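The lexicon-based part of the method is straightforward to sketch; the toy word lists below are hypothetical stand-ins for a real sentiment lexicon, not the lexicon used in the paper.

```python
# Sketch: lexicon-based sentiment scoring of lender comments, in the
# spirit of the abstract. The tiny lexicon below is a toy stand-in.

POSITIVE = {"safe", "reliable", "good", "return", "trust"}
NEGATIVE = {"fraud", "risk", "default", "loss", "scam"}

def sentiment(comment):
    words = comment.lower().split()
    score = (sum(w in POSITIVE for w in words)
             - sum(w in NEGATIVE for w in words))
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(sentiment("good platform reliable return"))   # positive
print(sentiment("high risk of default and fraud"))  # negative
```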
10. Koop, Martin, Erik Tews, and Stefan Katzenbeisser. "In-Depth Evaluation of Redirect Tracking and Link Usage." Proceedings on Privacy Enhancing Technologies 2020, no. 4 (2020): 394–413. http://dx.doi.org/10.2478/popets-2020-0079.

Abstract:
In today’s web, information gathering on users’ online behavior takes a major role. Advertisers use different tracking techniques that invade users’ privacy by collecting data on their browsing activities and interests. To prevent this threat, various privacy tools are available that try to block third-party elements. However, there exist various tracking techniques that are not covered by those tools, such as redirect link tracking. Here, tracking is hidden in ordinary website links pointing to further content. By clicking those links, or by automatic URL redirects, the user is bei…
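HTTP-level redirect chains of the kind the paper evaluates can be observed with a few lines of Python (assuming the requests package); JavaScript-driven redirects, which the paper also considers, are not visible this way, and the domain extraction here is naive.

```python
# Sketch: recording the intermediate hosts a link passes through via
# HTTP redirects, which is where redirect trackers hide.

from urllib.parse import urlparse
import requests

def redirect_chain(url):
    resp = requests.get(url, timeout=10, allow_redirects=True)
    hops = [r.url for r in resp.history] + [resp.url]
    return [urlparse(h).netloc for h in hops]

chain = redirect_chain("http://example.com")
print(chain)                      # hosts the request passed through
third_parties = set(chain[1:-1])  # hosts between the link and its target
```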

Dissertations / Theses on the topic "Hidden web crawling"

1. Antelius, Daniel. "Link Extraction for Crawling Flash on the Web." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-117604.

Abstract:
The set of web pages not reachable using conventional web search engines is usually called the hidden or deep web. One client-side hurdle for crawling the hidden web is Flash files. This thesis presents a tool for extracting links from Flash files up to version 8 to enable web crawling. The files are both parsed and selectively interpreted to extract links. The purpose of the interpretation is to simulate the normal execution of Flash in the Flash runtime of a web browser. The interpretation is a low level approach that allows the extraction to occur offline and without involving automation of…
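The thesis parses and selectively interprets Flash; a much cruder baseline, shown here only for orientation, is to decompress an SWF body and scan it for URL-like strings. The filename in the usage comment is hypothetical.

```python
# Sketch: a crude baseline for pulling links out of a Flash (SWF) file
# by scanning the decompressed body for URL-like byte strings. The
# thesis goes much further, interpreting ActionScript to recover links
# that never appear as plain strings.

import re
import zlib

def swf_urls(path):
    data = open(path, "rb").read()
    if data[:3] == b"CWS":                 # zlib-compressed SWF body
        data = data[:8] + zlib.decompress(data[8:])
    elif data[:3] != b"FWS":               # FWS = uncompressed SWF
        raise ValueError("not an SWF file")
    return [m.decode("ascii", "replace")
            for m in re.findall(rb"https?://[\x21-\x7e]+", data)]

# print(swf_urls("banner.swf"))  # hypothetical file
```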
2. Moraes, Tiago Guimarães. "Seleção de valores para preenchimento de formulários web." Biblioteca Digital de Teses e Dissertações da UFRGS, 2013. http://hdl.handle.net/10183/77762.

Abstract:
Traditional search engines use techniques that crawl Web pages through HTML links. However, most of the Web is not reached by these techniques. The portion of the Web that is not reached is called the hidden Web. An enormous amount of structured information, of better quality than that on the traditional Web, is available behind search interfaces, the forms that serve as entry points to the hidden Web. This portion of the Web is hard for search engines to reach, since correctly filling in the forms poses a great challenge, given that they were built…
3. Kantorski, Gustavo Zanini. "Preenchimento automático de formulários na web oculta." Biblioteca Digital de Teses e Dissertações da UFRGS, 2014. http://hdl.handle.net/10183/107988.

Abstract:
Much of the information available on the Web is stored in online databases and is accessible only after a user submits a query through a search interface. This information lives in a part of the Web known as the Hidden Web or Deep Web and is generally inaccessible to traditional search engines. Since the data on the Hidden Web is reached through query submissions, many works have focused on how to automatically fill in form fields. This thesis presents a methodology for filling in form…
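Mechanically, the task reduces to parsing a form, choosing a value for each field, and submitting it. In the sketch below (assuming the requests and beautifulsoup4 packages), the candidate-value table is a hypothetical stand-in for the value-selection methodology the thesis develops.

```python
# Sketch: filling in and submitting a hidden-web search form. Choosing
# *good* values per field is the hard part the thesis addresses, faked
# here with a hypothetical candidate-value table.

from urllib.parse import urljoin
import requests
from bs4 import BeautifulSoup

CANDIDATES = {"title": "database", "author": "smith"}  # hypothetical values

def submit_first_form(page_url):
    soup = BeautifulSoup(requests.get(page_url, timeout=10).text,
                         "html.parser")
    form = soup.find("form")
    if form is None:
        raise ValueError("no form on page")
    data = {}
    for inp in form.find_all("input"):
        name = inp.get("name")
        if not name:
            continue
        if inp.get("type") == "hidden":      # keep server-supplied defaults
            data[name] = inp.get("value", "")
        else:                                # pick a value for visible fields
            data[name] = CANDIDATES.get(name, "")
    action = urljoin(page_url, form.get("action") or page_url)
    if (form.get("method") or "get").lower() == "post":
        return requests.post(action, data=data, timeout=10)
    return requests.get(action, params=data, timeout=10)
```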
4. Moraes, Maurício Coutinho. "Towards completely automatized HTML form discovery on the web." Biblioteca Digital de Teses e Dissertações da UFRGS, 2013. http://hdl.handle.net/10183/70194.

Abstract:
The forms discovered by our proposal can be directly used as training data by some form classifiers. Our experimental validation used thousands of real Web forms, divided into six domains, including a representative subset of the publicly available DeepPeep form base (DEEPPEEP, 2010; DEEPPEEP REPOSITORY, 2011). Our results show that it is feasible to mitigate the demanding manual work required by two cutting-edge form classifiers (i.e., GFC and DSFC (BARBOSA; FREIRE, 2007a)), at the cost of a relatively small loss in effectiveness.
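A minimal stand-in for form discovery is a heuristic filter that keeps likely searchable forms and drops login-style forms. The rules below are invented and far weaker than the GFC/DSFC classifiers mentioned in the abstract.

```python
# Sketch: a heuristic filter that keeps forms likely to be *searchable*
# (hidden-web entry points) and drops login/registration forms.
# Assumes the beautifulsoup4 package.

from bs4 import BeautifulSoup

def searchable_forms(html):
    keep = []
    for form in BeautifulSoup(html, "html.parser").find_all("form"):
        types = [i.get("type", "text").lower() for i in form.find_all("input")]
        if "password" in types:          # login/signup, not a search form
            continue
        if types.count("text") >= 1:     # at least one free-text field
            keep.append(form)
    return keep

html = """<form action="/login"><input type="text" name="u">
<input type="password" name="p"></form>
<form action="/search"><input type="text" name="q"></form>"""
print(len(searchable_forms(html)))  # 1
```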

Book chapters on the topic "Hidden web crawling"

1. Liakos, Panagiotis, and Alexandros Ntoulas. "Topic-Sensitive Hidden-Web Crawling." In Web Information Systems Engineering - WISE 2012. Springer Berlin Heidelberg, 2012. http://dx.doi.org/10.1007/978-3-642-35063-4_39.
2. Rawat, Romil, Anand Singh Rajawat, Vinod Mahor, Rabindra Nath Shaw, and Ankush Ghosh. "Dark Web—Onion Hidden Service Discovery and Crawling for Profiling Morphing, Unstructured Crime and Vulnerabilities Prediction." In Lecture Notes in Electrical Engineering. Springer Singapore, 2021. http://dx.doi.org/10.1007/978-981-16-0749-3_57.
3. Gupta, Sonali, and Komal Kumar Bhatia. "Optimal Query Generation for Hidden Web Extraction Through Response Analysis." In The Dark Web. IGI Global, 2018. http://dx.doi.org/10.4018/978-1-5225-3163-0.ch005.

Abstract:
A huge number of Hidden Web databases exists over the WWW forming a massive source of high quality information. Retrieval of this information for enriching the repository of the search engine is the prime target of a Hidden web crawler. Besides this, the crawler should perform this task at an affordable cost and resource utilization. This paper proposes a Random ranking mechanism whereby the queries to be raised by the hidden web crawler have been ranked. By ranking the queries according to the proposed mechanism, the Hidden Web crawler is able to make an optimal choice among the candidate queries and efficiently retrieve the Hidden web databases. The Hidden Web crawler proposed here also possesses an extensible and scalable framework to improve the efficiency of crawling. The proposed approach has also been compared with other methods of Hidden Web crawling existing in the literature.
4. Sharma, Dilip Kumar, and A. K. Sharma. "A Novel Architecture for Deep Web Crawler." In The Dark Web. IGI Global, 2018. http://dx.doi.org/10.4018/978-1-5225-3163-0.ch015.

Abstract:
A traditional crawler picks up a URL, retrieves the corresponding page and extracts various links, adding them to the queue. A deep Web crawler, after adding links to the queue, checks for forms. If forms are present, it processes them and retrieves the required information. Various techniques have been proposed for crawling deep Web information, but much remains undiscovered. In this paper, the authors analyze and compare important deep Web information crawling techniques to find their relative limitations and advantages. To minimize limitations of existing deep Web crawlers, a novel architecture is proposed based on QIIIEP specifications (Sharma & Sharma, 2009). The proposed architecture is cost effective and has features of privatized search and general search for deep Web data hidden behind html forms.
5. Elangovan, Ramanujam. "The Dark Web." In Encyclopedia of Criminal Activities and the Deep Web. IGI Global, 2020. http://dx.doi.org/10.4018/978-1-5225-9715-5.ch008.

Abstract:
The deep web (also called deepnet, the invisible web, dark web, or the hidden web) refers to World Wide Web content that is not part of the surface web, which is indexed by standard search engines. The more familiar "surface" web contains only a small fraction of the information available on the internet. The deep web contains much of the valuable data on the web, but is largely invisible to standard web crawling techniques. Besides being a huge source of information, it also provides a rostrum for cybercrime, such as providing download links for movies, music, games, etc. without their copyrights. This article aims to provide context and policy recommendations pertaining to the dark web. The dark web's complete history, from its creation to the latest incidents, and the ways to access it and its subforums are briefly discussed from the user's perspective.
6. Hai-Jew, Shalin. "Multimodal Mapping of a University’s Formal and Informal Online Brand." In Packaging Digital Information for Enhanced Learning and Analysis. IGI Global, 2014. http://dx.doi.org/10.4018/978-1-4666-4462-5.ch007.

Abstract:
With the popularization of the Social Web (or Read-Write Web) and millions of participants in these interactive spaces, institutions of higher education have found it necessary to create online presences to promote their university brands, presence, and reputation. An important aspect of that engagement involves being aware of how their brand is represented informally (and formally) on social media platforms. Universities have traditionally maintained thin channels of formalized communications through official media channels, but in this participatory new media age, the user-generated contents and communications are created independent of the formal public relations offices. The university brand is evolving independently of official controls. Ex-post interventions to protect university reputation and brand may be too little, too late, and much of the contents are beyond the purview of the formal university. Various offices and clubs have institutional accounts on Facebook as well as wide representation of their faculty, staff, administrators, and students online. There are various microblogging accounts on Twitter. Various photo and video contents related to the institution may be found on photo- and video-sharing sites, like Flickr, and there are video channels on YouTube. All this digital content is widely available and may serve as points-of-contact for the close-in to more distal stakeholders and publics related to the institution. A recently available open-source tool enhances the capability for crawling (extracting data) these various social media platforms (through their Application Programming Interfaces or “APIs”) and enables the capture, analysis, and social network visualization of broadly available public information. Further, this tool enables the analysis of previously hidden information. This chapter introduces the application of Network Overview, Discovery and Exploration for Excel (NodeXL) to the empirical and multimodal analysis of a university’s electronic presence on various social media platforms and offers some initial ideas for the analytical value of such an approach.
7. Hai-Jew, Shalin. "Multimodal Mapping of a University's Formal and Informal Online Brand." In Social Media Marketing. IGI Global, 2018. http://dx.doi.org/10.4018/978-1-5225-5637-4.ch052.

Conference papers on the topic "Hidden web crawling"

1. Wang, Xin, Luhua Wang, Gengyu Wei, Dongmei Zhang, and Yixian Yang. "Hidden web crawling for SQL injection detection." In 2010 3rd IEEE International Conference on Broadband Network & Multimedia Technology (IC-BNMT 2010). IEEE, 2010. http://dx.doi.org/10.1109/icbnmt.2010.5704860.
2. Rahayuda, I. Gede Surya, and Ni Putu Linda Santiari. "Crawling and cluster hidden web using crawler framework and fuzzy-KNN." In 2017 5th International Conference on Cyber and IT Service Management (CITSM). IEEE, 2017. http://dx.doi.org/10.1109/citsm.2017.8089225.
3. El-desouky, Ali, Hesham Ali, and Sally El-ghamrawy. "An Automatic Label Extraction Technique for Domain-Specific Hidden Web Crawling (LEHW)." In 2006 International Conference on Computer Engineering and Systems. IEEE, 2006. http://dx.doi.org/10.1109/icces.2006.320490.
4. El-Desouky, Ali I., Hesham A. Ali, and Sally M. El-Ghamrawy. "A New Framework for Domain-Specific Hidden Web Crawling Based on Data Extraction Techniques." In 2006 ITI 4th International Conference on Information & Communications Technology. IEEE, 2006. http://dx.doi.org/10.1109/itict.2006.358295.