
Journal articles on the topic 'Web crawler behavior'


Consult the top 50 journal articles for your research on the topic 'Web crawler behavior.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles from a wide variety of disciplines and organise your bibliography correctly.

1

张, 晓燕. "Research on the Crime of Web Crawler Behavior." Dispute Settlement 10, no. 02 (2024): 1130–37. http://dx.doi.org/10.12677/ds.2024.102155.

2

Dikaiakos, Marios D., Athena Stassopoulou, and Loizos Papageorgiou. "An investigation of web crawler behavior: characterization and metrics." Computer Communications 28, no. 8 (2005): 880–97. http://dx.doi.org/10.1016/j.comcom.2005.01.003.

3

Bai, Quan, Gang Xiong, Yong Zhao, and Longtao He. "Analysis and Detection of Bogus Behavior in Web Crawler Measurement." Procedia Computer Science 31 (2014): 1084–91. http://dx.doi.org/10.1016/j.procs.2014.05.363.

4

Xing-Hua, Lu, Ye Wen-Quan, and Liu Ming-Yuan. "Personalized Recommendation Algorithm for Web Pages Based on Association Rule Mining." MATEC Web of Conferences 173 (2018): 03020. http://dx.doi.org/10.1051/matecconf/201817303020.

Abstract:
To improve users' access to websites and web pages, a personalized recommendation design is carried out according to each user's interest preferences, and a personalized recommendation model for web page visits is established to meet users' individual interests when browsing. A personalized web page recommendation algorithm based on association rule mining is proposed. Starting from the semantic features of web pages, user browsing behavior is measured by similarity computation, and a web crawler algorithm is constructed to extract the semantic features of web pages. An autocorrelation matching method matches web page features against user browsing behavior, and association-rule features of users' website browsing behavior are mined. Fuzzy matching is applied according to the semantic relevance and semantic information of users' search terms, and personalized web recommendations are produced to meet users' browsing needs. The simulation results show that the method is accurate and yields higher user satisfaction.
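The association-rule-mining step summarized in this abstract can be illustrated with a minimal sketch. The snippet below is not the authors' algorithm; it mines "visited X, so likely to visit Y" rules from toy browsing sessions with the mlxtend library, and the session data, support, and confidence thresholds are all invented for illustration.

```python
# Hypothetical sketch: association rules over toy browsing sessions (not the paper's code).
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Toy browsing sessions: each list is the set of pages one user visited.
sessions = [
    ["/news", "/sports", "/finance"],
    ["/news", "/finance"],
    ["/sports", "/travel"],
    ["/news", "/finance", "/travel"],
]

# One-hot encode sessions into a transaction matrix.
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(sessions).transform(sessions), columns=te.columns_)

# Mine frequent page sets and derive "visited X -> likely to visit Y" rules.
frequent = apriori(onehot, min_support=0.5, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.6)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```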
5

李, 朝阳. "Research on the Legal Regulation of Personal Information Tort of Web Crawler Behavior." E-Commerce Letters 13, no. 02 (2024): 3745–52. http://dx.doi.org/10.12677/ecl.2024.132458.

6

Jing, Xu, Dong Jian He, Lin Sen Zan, Jian Liang Li, and Wang Yao. "Automatic Detection System of Web-Based Malware for Management-Type SaaS." Advanced Materials Research 129-131 (August 2010): 670–74. http://dx.doi.org/10.4028/www.scientific.net/amr.129-131.670.

Abstract:
In management-type SaaS, users must be permitted to submit a tenant's business data to the SP's server, and that data may carry embedded web-based malware. In this paper, we propose an automatic detection method for web-based malware based on behavior analysis, which helps meet the SLA by detecting web-based malware proactively. First, the tenant's update is downloaded to a bastion host by a web crawler. Second, the system observes the behavior that occurs when the tenant's update is opened in IE. To interrupt malicious behavior during detection, a DLL is injected into IE. Finally, if sensitive operations occur, the URL is appended to a malicious-address database and the system administrator is notified by SMS at the same time. Test results show that our method detects web-based malware accurately. It helps to improve the service level of management-type SaaS.
7

章, 徐南. "On the Application of Restrictions on the Characterization of Unfair Competition in Web Crawler Behavior." Dispute Settlement 10, no. 01 (2024): 245–50. http://dx.doi.org/10.12677/ds.2024.101035.

8

Leithner, Manuel, and Dimitris E. Simos. "CHIEv." ACM SIGAPP Applied Computing Review 21, no. 1 (2021): 5–23. http://dx.doi.org/10.1145/3477133.3477134.

Abstract:
Researchers and practitioners in the fields of testing, security assessment and web development seeking to evaluate a given web application often have to rely on the existence of a model of the respective system, which is then used as input to task-specific tools. Such models may include information on HTTP endpoints and their parameters, available user actions/event listeners and required assets. Unfortunately, this data is often unavailable in practice, as only rigorous development practices or manual analysis guarantee their existence and correctness. Crawlers based on static analysis have traditionally been used to extract required information from existing sites. Regrettably, these tools can not accurately account for the dynamic behavior introduced by technologies such as JavaScript that are prevalent on modern sites. While methods based on dynamic analysis exist, they are often not fully capable of identifying event listeners and their effects. In an earlier work, we presented XIEv, an approach for dynamic analysis of web applications that produces an execution trace usable for the extraction of navigation graphs, identification of bugs at runtime and enumeration of resources. It offers improved recognition and selection of event listeners as well as a greater range of observed effects compared to existing approaches. While the evaluation of our research prototype implementation confirmed the capabilities of XIEv, it was generally out-performed by static crawlers in terms of speed. This work introduces CHIEv, an approach that augments XIEv by enabling concurrent processing as well as incorporating the results of a static crawler in real-time. Our results indicate a significant increase in performance, particularly when applied to larger sites.
9

Muneeb Ahmed Farooqi, Muhammad Arslan Ashraf, and Muhammad Umer Shaukat. "Google Page Rank Site Structure Strategies for Marketing Web Pages." Journal of Computing & Biomedical Informatics 2, no. 02 (2021): 140–57. http://dx.doi.org/10.56979/202/2021/30.

Abstract:
There are several search engines that categorize web content and present it to us based on our search queries. These search engines continuously visit pages and sites and gather information using techniques known as crawling or spidering. Based on the content collected daily, every search engine maintains its own indexes for searches. Every business needs to make its pages top-ranked by improving them both structurally and in terms of content, so that any crawler can easily crawl them and rank them among the top 10 results. This work discusses only structural behavior, covering the internal graphical relationships between pages and the loading time of pages including all supporting content. The structural overview covers the HTML tag structure, which forms a tree starting from a root tag and moving toward child nodes. Page speed measures the loading time of a page and helps the search engine categorize pages for mobile devices as well; sometimes a lightning symbol is shown next to a ranked result in mobile search, indicating that the page loads very quickly. Page loading includes the loading of all content except Ajax-based content. According to Google, page rank is based on page content, page structure, and page loading time. Since the discussion covers only page structure and page loading time, Google already provides some instructions on both, but those instructions are not sufficient, and new dimensions need to be explored to place business pages among the top-ranked results.
10

M., Bharathi, Aditya Sai Srinivas T., and Sri K. Teja. "From Pages to Citations: Optimizing Your Journal's Google Scholar Indexing Strategy." Journal of Advancement in Parallel Computing 7, no. 2 (2024): 13–19. https://doi.org/10.5281/zenodo.10862890.

Abstract:
Embark on a journey through the intricacies of Google Scholar indexing, where patience meets promise. This abstract navigates the path from compliance with inclusion guidelines to troubleshooting metadata discrepancies. Learn the art of optimizing article-level metadata and harnessing dedicated hosting platforms for enhanced indexing success. Discover strategies to mitigate indexing unpredictability and maintain consistency during website migrations. With insights into web crawler behavior and citation impact, unlock the gateway to expanded research accessibility. Join us in illuminating the scholarly sphere, where every indexed article contributes to the global dialogue of knowledge dissemination. Explore the transformative potential of Google Scholar indexing for your journal's impact.
11

Li, Chuan, Wen-qiang Li, Yan Li, Hui-zhen Na, and Qian Shi. "Research and Application of Knowledge Resources Network for Product Innovation." Scientific World Journal 2015 (2015): 1–12. http://dx.doi.org/10.1155/2015/495309.

Abstract:
In order to enhance the knowledge service capabilities of a product innovation design service platform, a method is proposed for acquiring knowledge resources that support product innovation from the Internet and providing active knowledge push. Through ontology-based knowledge modeling for product innovation, an integrated architecture of the knowledge resources network is put forward. The technology for acquiring network knowledge resources based on a focused crawler and web services is studied. Active knowledge push is provided to users through user behavior analysis and knowledge evaluation in order to improve users' enthusiasm for participating in the platform. Finally, an application example is presented to demonstrate the effectiveness of the method.
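As a hedged illustration of the focused-crawler idea mentioned above (not the platform's actual implementation), the sketch below fetches pages with requests and BeautifulSoup and prioritizes outgoing links whose anchor text overlaps a set of topic keywords; the seed URL, keyword list, and scoring rule are all placeholder assumptions.

```python
# Illustrative focused-crawler loop: score links by keyword overlap, visit best-first.
import heapq
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

KEYWORDS = {"innovation", "design", "knowledge", "product"}  # placeholder topic terms

def relevance(text: str) -> int:
    """Count topic keywords appearing in the link text (a crude relevance score)."""
    return len(set(text.lower().split()) & KEYWORDS)

def focused_crawl(seed: str, max_pages: int = 20):
    frontier = [(0, seed)]            # min-heap of (-score, url)
    seen = {seed}
    collected = []
    while frontier and len(collected) < max_pages:
        _, url = heapq.heappop(frontier)
        try:
            resp = requests.get(url, timeout=10)
        except requests.RequestException:
            continue
        soup = BeautifulSoup(resp.text, "html.parser")
        collected.append((url, soup.title.string if soup.title else ""))
        for a in soup.find_all("a", href=True):
            link = urljoin(url, a["href"])
            if link.startswith("http") and link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-relevance(a.get_text()), link))
    return collected

# Example with a placeholder seed: focused_crawl("https://example.com")
```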
12

Althunibat, Ahmad, Wael Alzyadat, Siti Sarah Maidin, Adnan Hnaif, and Basem Alokush. "Prediction of accessibility testing using a generalized linear model for e-government." Journal of Infrastructure, Policy and Development 8, no. 7 (2024): 3520. http://dx.doi.org/10.24294/jipd.v8i7.3520.

Abstract:
Using the United Nations’ Online Services Indicator (OSI) as a benchmark, the study analyzes Jordan’s e-government performance trends from 2008 to 2022, revealing temporal variations and areas of discontent. The research incorporates diverse testing strategies, considering technological, organizational, and environmental factors, and aligns with global frameworks emphasizing usability, accessibility, and security. The proposed model unfolds in three stages: data collection, performing data operations, and target selection using the Generalized Linear Model (GLM). Leveraging web crawling techniques, the data collection process extracts structured information from the Jordanian e-government portal. Results demonstrate the model’s efficacy in assessing accessibility and predicting web crawler behavior, providing valuable insights for policymakers and officials. This model serves as a practical tool for the enhancement of e-government services, addressing citizen concerns and improving overall service quality in Jordan and beyond.
13

Song, Haojing. "Analysis of Chengdu Luxury Market Based on Big Data Analysis." Frontiers in Business, Economics and Management 6, no. 3 (2022): 245–48. http://dx.doi.org/10.54097/fbem.v6i3.3629.

Abstract:
At present, luxury consumers are increasingly eager to gain social recognition, so tangible goods are favored. This paper analyzes the current situation of the luxury goods market in Chengdu based on big data analysis. Raw logs obtained with software such as a web crawler go through web-record preprocessing operations in turn, including data cleaning, identification of the same user, and identification of independent sessions, so as to obtain data that can be used directly and to extract the data relevant to user behavior analysis. The big data analysis shows that the Chengdu luxury goods market has great potential, but irrational consumption is a serious problem. Therefore, the government and relevant departments should set up specialized agencies for luxury goods management as soon as possible, formulate corresponding management measures, and strengthen the supervision of luxury goods.
14

Long, Junyu. "Analysis of Investor Sentiment and Trading Behavior." International Journal of Global Economics and Management 5, no. 3 (2024): 169–76. https://doi.org/10.62051/ijgem.v5n3.19.

Abstract:
This work focuses on analyzing the non-ferrous metal stock market by selecting specific indicators to reflect the sector's overall performance, considering the prevalent market conditions. In this paper, we preprocessed the index data and applied principal component analysis to obtain five principal components, accounting for 95.263% of the total variance, thereby establishing an investor sentiment measurement model. Further, to establish a correlation between trading volume and investor sentiment, we conducted logistic regression analysis based on the sentiment measurement model. The tested model achieved an accuracy rate of 87.3%. Additionally, using a Python web crawler, we collected and standardized the "greed and fear index" as an emotional indicator, matched it with daily data, and incorporated it into the logistic regression model. This enhanced model exhibited an accuracy rate of 82.5%. Our findings reveal that the greed and fear index has a more significant impact on emotional index changes than trading volume and turnover rate in logistic regression. This work not only contributes to a deeper understanding of investor sentiment in the non-ferrous metal stock market but also provides insights into potential market behaviors, guiding investors and policymakers in making informed decisions.
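The two-stage analysis described in this abstract (a PCA-based sentiment index followed by logistic regression) can be sketched as below. This is a synthetic-data illustration, not the paper's model: the proxy column names, the 95% variance target, and the target construction are assumptions made only to show the workflow.

```python
# Hedged sketch: PCA sentiment index + logistic regression on synthetic data.
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
# Invented sentiment proxies (column names are illustrative only).
proxies = pd.DataFrame(
    rng.normal(size=(n, 6)),
    columns=["turnover", "new_accounts", "margin_balance", "ipo_returns", "discount", "volume"],
)
# Synthetic next-day direction label.
direction = (proxies["turnover"] + 0.5 * proxies["volume"] + rng.normal(size=n) > 0).astype(int)

# Keep enough components to explain ~95% of the variance in the proxies.
pca = PCA(n_components=0.95)
components = pca.fit_transform(proxies)

X_train, X_test, y_train, y_test = train_test_split(components, direction, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("explained variance:", pca.explained_variance_ratio_.sum())
print("test accuracy:", model.score(X_test, y_test))
```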
15

Rawat, Romil, Sonali Gupta, S. Sivaranjani, Om Kumar C.U., Megha Kuliha, and K. Sakthidasan Sankaran. "Malevolent Information Crawling Mechanism for Forming Structured Illegal Organisations in Hidden Networks." International Journal of Cyber Warfare and Terrorism 12, no. 1 (2022): 1–14. http://dx.doi.org/10.4018/ijcwt.311422.

Abstract:
Terrorist groups like ISIS have made considerable use of the dark web (DW) to carry out their malicious objectives such as spreading propaganda, recruiting and radicalizing new recruits, and secretly raising finances. Al-Hayat Media Center, an ISIS media agency, released a link on their forum describing how to access their DW website. It also sent out the identical message over Telegram, which included links to a Tor service with a “.onion” address. This study develops an analytical framework for scraping and analyzing the DW on the internet. The authors successfully tested a web crawler to collect account information for thousands of merchants and their related marketplace listings using a case study marketplace. The paper explains how to scrape DW marketplaces in the most viable and effective way possible. The findings of the case study support the validity of the proposed analytical framework, which is useful for academics researching this emerging phenomena as well as investigators looking into illegal behavior on the DW.
16

Travers, Nicolas, Zeinab Hmedeh, Nelly Vouzoukidou, Cedric du Mouza, Vassilis Christophides, and Michel Scholl. "RSS feeds behavior analysis, structure and vocabulary." International Journal of Web Information Systems 10, no. 3 (2014): 291–320. http://dx.doi.org/10.1108/ijwis-06-2014-0023.

Abstract:
Purpose – The purpose of this paper is to present a thorough analysis of three complementary features of real-scale really simple syndication (RSS)/Atom feeds, namely, publication activity, items characteristics and their textual vocabulary, that the authors believe are crucial for emerging Web 2.0 applications. Previous works on RSS/Atom statistical characteristics do not provide a precise and updated characterization of feeds’ behavior and content, characterization that can be used to successfully benchmark the effectiveness and efficiency of various Web syndication processing/analysis techniques. Design/methodology/approach – The authors empirical study relies on a large-scale testbed acquired over an eight-month campaign from 2010. They collected a total number of 10,794,285 items originating from 8,155 productive feeds. The authors deeply analyze feeds productivity (types and bandwidth), content (XML, text and duplicates) and textual content (vocabulary and buzz-words). Findings – The findings of the study are as follows: 17 per cent of feeds produce 97 per cent of the items; a formal characterization of feeds publication rate conducted by using a modified power law; most popular textual elements are the title and description, with the average size of 52 terms; cumulative item size follows a lognormal distribution, varying greatly with feeds type; 47 per cent of the feed-published items share the same description; the vocabulary does not belong to Wordnet terms (4 per cent); characterization of vocabulary growth using Heaps’ laws and the number of occurrences by a stretched exponential distribution conducted; and ranking of terms does not significantly vary for frequent terms. Research limitations/implications – Modeling dedicated Web applications capacities, Defining benchmarks, optimizing Publish/Subscribe index structures. Practical implications – It especially opens many possibilities for tuning Web applications, like an RSS crawler designed with a resource allocator and a refreshing strategy based on the Gini values and evolution to predict bursts for each feed, according to their category and class for targeted feeds; an indexing structure which matches textual items’ content, which takes into account item size according to targeted feeds, size of the vocabulary and term occurrences, updates of the vocabulary and evolution of term ranks, typos and misspelling correction; filtering by pruning items for content duplicates of different feeds and correlation of terms to easily detect replicates. Originality/value – A content-oriented analysis of dynamic Web information.
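For readers who want to compute this kind of per-feed measurement on their own data, the hedged sketch below uses the feedparser library to report item counts, average item size in terms, and a running vocabulary for a single feed; the feed URL is a placeholder and the statistics are far simpler than the paper's testbed analysis.

```python
# Illustrative per-feed statistics with feedparser (not the authors' testbed code).
from collections import Counter
import feedparser

def feed_stats(feed_url: str):
    parsed = feedparser.parse(feed_url)
    vocabulary = Counter()
    sizes = []
    for entry in parsed.entries:
        # Title and description are the most commonly populated textual elements.
        text = f'{entry.get("title", "")} {entry.get("summary", "")}'
        terms = text.lower().split()
        vocabulary.update(terms)
        sizes.append(len(terms))
    return {
        "items": len(parsed.entries),
        "avg_terms_per_item": sum(sizes) / len(sizes) if sizes else 0,
        "vocabulary_size": len(vocabulary),
        "top_terms": vocabulary.most_common(10),
    }

# Example with a placeholder URL: print(feed_stats("https://example.com/feed.rss"))
```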
17

Gaigalas, Di, and Sun. "Advanced Cyberinfrastructure to Enable Search of Big Climate Datasets in THREDDS." ISPRS International Journal of Geo-Information 8, no. 11 (2019): 494. http://dx.doi.org/10.3390/ijgi8110494.

Abstract:
Understanding the past, present, and changing behavior of the climate requires close collaboration of a large number of researchers from many scientific domains. At present, the necessary interdisciplinary collaboration is greatly limited by the difficulties in discovering, sharing, and integrating climatic data due to the tremendously increasing data size. This paper discusses the methods and techniques for solving the inter-related problems encountered when transmitting, processing, and serving metadata for heterogeneous Earth System Observation and Modeling (ESOM) data. A cyberinfrastructure-based solution is proposed to enable effective cataloging and two-step search on big climatic datasets by leveraging state-of-the-art web service technologies and crawling the existing data centers. To validate its feasibility, the big dataset served by UCAR THREDDS Data Server (TDS), which provides Petabyte-level ESOM data and updates hundreds of terabytes of data every day, is used as the case study dataset. A complete workflow is designed to analyze the metadata structure in TDS and create an index for data parameters. A simplified registration model which defines constant information, delimits secondary information, and exploits spatial and temporal coherence in metadata is constructed. The model derives a sampling strategy for a high-performance concurrent web crawler bot which is used to mirror the essential metadata of the big data archive without overwhelming network and computing resources. The metadata model, crawler, and standard-compliant catalog service form an incremental search cyberinfrastructure, allowing scientists to search the big climatic datasets in near real-time. The proposed approach has been tested on UCAR TDS and the results prove that it achieves its design goal by at least boosting the crawling speed by 10 times and reducing the redundant metadata from 1.85 gigabytes to 2.2 megabytes, which is a significant breakthrough for making the current most non-searchable climate data servers searchable.
18

Yu, Yan, Peiyu Xu, Shuo Liu, Taiming He, Lu Yang, and Junqiang Zhang. "An unsupervised machine learning-based profile system of Chinese researchers." Journal of Infrastructure, Policy and Development 8, no. 11 (2024): 7281. http://dx.doi.org/10.24294/jipd.v8i11.7281.

Abstract:
The construction of researcher profiles is crucial for modern research management and talent assessment. Given the decentralized nature of researcher information and the challenges of evaluation, we propose a profile system for Chinese researchers based on unsupervised machine learning algorithms. This system builds comprehensive profiles from researchers' basic information and behavior dimensions. It employs Selenium and a web crawler for real-time data retrieval from academic platforms, uses TF-IDF and BERT for expertise recognition, DTM for academic dynamics, and K-means clustering for profiling. The experimental results demonstrate that these methods can more accurately mine researchers' academic expertise and perform domain clustering and scoring, thereby providing a scientific basis for the selection and academic evaluation of research talent. This interactive analysis system aims to provide an intuitive platform for profile construction and analysis.
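A minimal sketch of the TF-IDF plus K-means profiling step mentioned above is given below; it runs on toy publication texts, omits the Selenium crawling, BERT, and DTM stages, and its sample texts and cluster count are purely illustrative.

```python
# Minimal TF-IDF + K-means clustering sketch on toy publication texts.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

publications = [
    "deep learning for image recognition and convolutional networks",
    "graph neural networks for molecular property prediction",
    "crop yield estimation with remote sensing imagery",
    "soil moisture monitoring using satellite remote sensing",
]

vectorizer = TfidfVectorizer(stop_words="english")
features = vectorizer.fit_transform(publications)   # documents x terms matrix

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features)
for text, label in zip(publications, kmeans.labels_):
    print(label, "-", text[:50])
```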
19

Ju, Chunhua, and Shuangzhu Zhang. "Influencing Factors of Continuous Use of Web-Based Diagnosis and Treatment by Patients With Diabetes: Model Development and Data Analysis." Journal of Medical Internet Research 22, no. 9 (2020): e18737. http://dx.doi.org/10.2196/18737.

Abstract:
Background The internet has become a major source of health care information for patients and has enabled them to obtain continuous diagnosis and treatment services. However, the quality of web-based health care information is mixed, which raises concerns about the credibility of physician advice obtained on the internet and markedly affects patients’ choices and decision-making behavior with regard to web-based diagnosis and treatment. Therefore, it is important to identify the influencing factors of continuous use of web-based diagnosis and treatment from the perspective of trust. Objective The objective of our study was to investigate the influencing factors of patients’ continuous use of web-based diagnosis and treatment based on the elaboration likelihood model and on trust theory in the face of a decline in physiological conditions and the lack of convenient long-term professional guidance. Methods Data on patients with diabetes in China who used an online health community twice or more from January 2018 to June 2019 were collected by developing a web crawler. A total of 2437 valid data records were obtained and then analyzed using correlation factor analysis and regression analysis to validate our research model and hypotheses. Results The timely response rate (under the central route), the reference group (under the peripheral route), and the number of thank-you letters and patients’ ratings that measure physicians’ electronic word of mouth are all positively related with the continuous use of web-based diagnosis and treatment by patients with diabetes. Moreover, the physician’s professional title and hospital’s ranking level had weak effects on the continuous use of web-based diagnosis and treatment by patients with diabetes, and the effect size of the physician’s professional title was greater than that of the hospital’s ranking level. Conclusions From the patient's perspective, among all indicators that measure physicians’ service quality, the effect size of a timely response rate is much greater than those of effect satisfaction and attitude satisfaction; thus, the former plays an essential role in influencing the patients’ behavior of continuous use of web-based diagnosis and treatment services. In addition, the effect size of electronic word of mouth was greater than that of the physician’s offline reputation. Physicians who provide web-based services should seek clues to patients’ needs and preferences for receiving health information during web-based physician-patient interactions and make full use of their professionalism and service reliability to communicate effectively with patients. Furthermore, the platform should improve its electronic word of mouth mechanism to realize its full potential in trust transmission and motivation, ultimately promoting the patient’s information-sharing behavior and continuous use of web-based diagnosis and treatment.
20

Li, Lin, and Sang-Bing Tsai. "An Empirical Study on the Precise Employment Situation-Oriented Analysis of Digital-Driven Talents with Big Data Analysis." Mathematical Problems in Engineering 2022 (January 4, 2022): 1–11. http://dx.doi.org/10.1155/2022/8758898.

Abstract:
This paper conducts an in-depth research analysis on the precise employment of college graduates in the context of big data using a number-driven approach. The textual information of the study is obtained by using in-depth interviews, and the evaluation index system of college students’ employment quality is constructed by combining the step-by-step coding method with rooting theory. The research on the current situation of employment recommendation platform research and the application status of big data in the employment recommendation platform is explored by using a bibliometric approach. And the innovative use of web crawler technology is used to comprehensively understand the recommendation function and status quo of the same type of recommendation platform, which provides a reference for the research of this platform. Based on the preliminary analysis of platform requirements and overall design, the overall design and functional implementation of the big data employment recommendation platform are carried out by using big data crawler technology, big data architecture technology, text mining technology, database technology, etc. The construction of a recommendation module based on user history information, a recommendation based on real-time user online behavior data, and hybrid recommendation carried out on the recommendation module to grasp all-round the platform is built based on a stakeholder perspective. Based on the platform construction, the initial platform operation and maintenance management mechanism was established from the stakeholder’s perspective. The Pearson correlation coefficient is used to objectively evaluate the current situation of talent supply in universities and talent demand in enterprises from the perspective of image and data. In the research on the development status of the big data education industry, the Lorenz curve and Gini coefficient are used to match the status of new big data majors with their college construction volume in each province and provide data support for the reasonable adjustment of majors setting in each province according to the education level.
21

Jose, Jeeva, and Lal P. Sojan. "Analysis of the Temporal Behaviour of Search Engine Crawlers at Web Sites." COMPUSOFT: An International Journal of Advanced Computer Technology 02, no. 06 (2013): 136–42. https://doi.org/10.5281/zenodo.14602851.

Abstract:
Web log mining is the extraction of web logs to analyze user behaviour at web sites. In addition to user information, web logs provide a wealth of information about search engine traffic and behaviour. Search engine crawlers are highly automated programs that periodically visit a web site to collect information. The behaviour of search engines can be used to analyze server load, search engine quality, crawler dynamics, search engine ethics, and so on. The time spent by various crawlers is significant for identifying server load, as a major proportion of the server load is generated by search engine crawlers. A temporal analysis of search engine crawlers was performed to identify their behaviour. It was found that there is a significant difference in the total time spent by various crawlers. The presence of search engine crawlers at web sites was also examined on an hourly basis to identify the dynamics of search engine crawlers at web sites.
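As an illustration of the kind of log analysis this paper describes (and not its actual code), the sketch below parses Combined Log Format lines, keeps requests from a few well-known crawler user agents, and estimates the time each crawler spends from the gaps between its consecutive requests; the regular expression, bot list, and 30-minute session gap are simplifying assumptions.

```python
# Illustrative crawler-time estimation from Combined Log Format access logs.
import re
from datetime import datetime
from collections import defaultdict

LOG_RE = re.compile(
    r'\S+ \S+ \S+ \[(?P<time>[^\]]+)\] "[^"]*" \d+ \S+ "[^"]*" "(?P<agent>[^"]*)"'
)
BOTS = {"Googlebot": "Googlebot", "bingbot": "Bingbot", "YandexBot": "YandexBot"}
SESSION_GAP = 30 * 60  # seconds; gaps longer than this are treated as separate visits

def crawler_time(log_lines):
    visits = defaultdict(list)
    for line in log_lines:
        m = LOG_RE.match(line)
        if not m:
            continue
        agent = m.group("agent")
        for marker, name in BOTS.items():
            if marker in agent:
                ts = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
                visits[name].append(ts)
    totals = {}
    for name, times in visits.items():
        times.sort()
        totals[name] = sum(
            (b - a).total_seconds()
            for a, b in zip(times, times[1:])
            if (b - a).total_seconds() <= SESSION_GAP
        )
    return totals  # estimated seconds spent per crawler
```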
22

Ma, Zirui, and Bin Gu. "The influence of firm-Generated video on user-Generated video: Evidence from China." International Journal of Engineering Business Management 14 (January 2022): 184797902211186. http://dx.doi.org/10.1177/18479790221118628.

Abstract:
We examined the impact of firm-generated content on firm-related user-generated content. Research has shown that firm-related user-generated content impacts firm revenue. Therefore, it is necessary to determine what factors stimulate the creation of firm-related user-generated content. Creating user-generated videos related to companies (e.g. e-sports) on Chinese online video platforms often involves the use of firm-generated videos as material. This suggests that firm-generated content may play an essential role in influencing Internet users to create firm-related user-generated content. We collected a unique dataset using a Python web crawler and used the LSDV model for empirical analysis, covering 2,977 firm-generated videos and 49,860 user-generated videos, to explore the impact of firm-generated video attributes on user-generated videos. The results show that some attributes of firm-generated video have a significant impact on user-generated video. The numbers of comments and coins on firm-generated videos positively affect user-generated videos, while the number of favorites on firm-generated videos negatively affects user-generated videos. We also found that differences in how users feel about firm-generated videos affect the likelihood that users will create their own original videos. User engagement and user brand identity in firm-generated videos positively stimulate user-generated videos. The authors suggest that companies can maximize the impact on user-generated content by targeting the creation of firm-generated content based on these video attributes. The authors also suggest that firms further investigate theories related to motivational factors, such as the impact of consumer engagement, brand identity, and perceived usefulness on user-generated content.
23

Menshchikov, Aleksandr, Antonina Komarova, and Yurij Gatchin. "A Study of Web-Crawlers Behaviour." Voprosy kiberbezopasnosti, no. 3(21) (2017): 49–54. http://dx.doi.org/10.21681/2311-3456-2017-3-49-54.

24

Menshchikov, A. A., A. V. Komarova, Y. A. Gatchin, M. E. Kalinkina, V. L. Tkalich, and O. I. Pirozhnikova. "Modeling the behavior of web crawlers on a web resource." Journal of Physics: Conference Series 1679 (November 2020): 032043. http://dx.doi.org/10.1088/1742-6596/1679/3/032043.

25

Chittiprolu, Vinay, Nagaraj Samala, and Raja Shekhar Bellamkonda. "Heritage hotels and customer experience: a text mining analysis of online reviews." International Journal of Culture, Tourism and Hospitality Research 15, no. 2 (2021): 131–56. http://dx.doi.org/10.1108/ijcthr-02-2020-0050.

Abstract:
Purpose In business, online reviews have an economic impact on firm performance. Customers’ data in the form of online reviews was used to understand the appreciation and service complaints written by previous customers. The study is an analysis of the online reviews written by the customers about Indian heritage hotels. This study aims to understand the dimensions of service appreciation and service complaints by comparing positive- and negative-rated reviews and find the patterns in the determinants of the satisfaction and dissatisfaction of the customers. Design/methodology/approach A total of 23,643 online reviews about heritage hotels were collected from the TripAdvisor website by using a Web crawler developed in Python. A total of 1000 reviews were randomly selected for further analysis to eliminate the bandwagon effect. Unsupervised text mining techniques were used to analyze reviews and find out the interesting patterns in text data. Findings Based on Herzberg two-factor theory, this study found satisfied and dissatisfied determinants separately. The study revealed some common categories discussed by satisfied and dissatisfied customers. The factors which satisfy the customers may also dissatisfy the customers if not delivered properly. Satisfied customers mentioned about tangible features of the hotel stay, which includes physical signifiers, traditional services, staff behavior and professionalism and core products (rooms, food). However, most of the customers complained about intangible service problems, such as staff attitude, services failure, issues with reservation and food, value for money and room condition. The results are contradicting with commercial hotels-based studies owing to the unique services provided by heritage hotels. Practical implications The dimensions for satisfaction and dissatisfaction among customer of heritage hotels provide marketers to understand the real emotion and perception of the customers. As these dimensions were extracted through text mining of the reviews written by the customer of heritage hotels, the results would certainly give better insights to the hotel marketers. Originality/value The study is a rare attempt to study online reviews of customers on heritage hotels through a text mining approach and find the patterns in the behavior and the determinants of satisfaction and dissatisfaction of customers.
26

Wang, Qi, Yi Yang, Zhengren Li, Na Liu, and Xiaohang Zhang. "Research on the influence of balance patch on players' character preference." Internet Research 30, no. 3 (2020): 995–1018. http://dx.doi.org/10.1108/intr-04-2019-0148.

Abstract:
PurposeThe balance patch is an important but not well studied area to maintain game fairness and improve player entertainment. In this paper, we examine the effect of balance patch on player's character preference and further explore the moderating effect of psychological distance and character selection pattern.Design/methodology/approachIn study 1, a web crawler was used to get server-side data of 40, 974 multi-player online battle arena (MOBA) players through official application programming interfaces (APIs). A paired-T test and a stepwise regression were performed to verify the hypothesis. In study 2, a 2-patch type (buff vs nerf) × 2 psychological distance (near vs distant) × 2 character selection pattern (stable vs variable) between-subjects design was adopted to confirm the empirical conclusions through questionnaire survey design and further explored the mediating effect of patch adjustment perception.FindingsThe analyzed results showed that the buff patch led to an increase in players' character preference, while the nerf patch led to a decrease in players' character preference. Moreover, the main effect was mediated by patch adjustment perception. Furthermore, psychological distance and character selection pattern both moderated the relationship between balance patch and character preference changes. The character preference of the near psychological distance increased more significantly elicited by buff patches and decreased more significantly in an adverse situation. Similarly, players with variable selection pattern of characters were more sensitive to the stimuli, and the character preference of the variable group changed more significantly than that of the stable group caused by balance patch release.Originality/valueThis paper studies the influence of a patch on the balance of character strength on player preference, which expands the research on game balance and fairness. The present results contribute to the theoretical research on consumer behavior of psychological distance and character selection pattern elicited by balance patches. Meanwhile, the results indicate that psychological distance theory can apply to the study of the relationship between players and virtual characters.
27

Wang, Junze, Ying Zhou, Wei Zhang, Richard Evans, and Chengyan Zhu. "Concerns Expressed by Chinese Social Media Users During the COVID-19 Pandemic: Content Analysis of Sina Weibo Microblogging Data." Journal of Medical Internet Research 22, no. 11 (2020): e22152. http://dx.doi.org/10.2196/22152.

Abstract:
Background The COVID-19 pandemic has created a global health crisis that is affecting economies and societies worldwide. During times of uncertainty and unexpected change, people have turned to social media platforms as communication tools and primary information sources. Platforms such as Twitter and Sina Weibo have allowed communities to share discussion and emotional support; they also play important roles for individuals, governments, and organizations in exchanging information and expressing opinions. However, research that studies the main concerns expressed by social media users during the pandemic is limited. Objective The aim of this study was to examine the main concerns raised and discussed by citizens on Sina Weibo, the largest social media platform in China, during the COVID-19 pandemic. Methods We used a web crawler tool and a set of predefined search terms (New Coronavirus Pneumonia, New Coronavirus, and COVID-19) to investigate concerns raised by Sina Weibo users. Textual information and metadata (number of likes, comments, retweets, publishing time, and publishing location) of microblog posts published between December 1, 2019, and July 32, 2020, were collected. After segmenting the words of the collected text, we used a topic modeling technique, latent Dirichlet allocation (LDA), to identify the most common topics posted by users. We analyzed the emotional tendencies of the topics, calculated the proportional distribution of the topics, performed user behavior analysis on the topics using data collected from the number of likes, comments, and retweets, and studied the changes in user concerns and differences in participation between citizens living in different regions of mainland China. Results Based on the 203,191 eligible microblog posts collected, we identified 17 topics and grouped them into 8 themes. These topics were pandemic statistics, domestic epidemic, epidemics in other countries worldwide, COVID-19 treatments, medical resources, economic shock, quarantine and investigation, patients’ outcry for help, work and production resumption, psychological influence, joint prevention and control, material donation, epidemics in neighboring countries, vaccine development, fueling and saluting antiepidemic action, detection, and study resumption. The mean sentiment was positive for 11 topics and negative for 6 topics. The topic with the highest mean of retweets was domestic epidemic, while the topic with the highest mean of likes was quarantine and investigation. Conclusions Concerns expressed by social media users are highly correlated with the evolution of the global pandemic. During the COVID-19 pandemic, social media has provided a platform for Chinese government departments and organizations to better understand public concerns and demands. Similarly, social media has provided channels to disseminate information about epidemic prevention and has influenced public attitudes and behaviors. Government departments, especially those related to health, can create appropriate policies in a timely manner through monitoring social media platforms to guide public opinion and behavior during epidemics.
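As a hedged illustration of the LDA step described above (not the authors' pipeline), the sketch below runs scikit-learn's LatentDirichletAllocation on a toy corpus of already-tokenized posts; a real Weibo pipeline would first segment the Chinese text (e.g. with jieba) and tune the topic count, both of which are omitted here.

```python
# Compact LDA topic-extraction sketch on a toy, pre-tokenized corpus.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

posts = [
    "mask shortage hospital supplies donation",
    "quarantine travel restriction city lockdown",
    "vaccine trial research laboratory progress",
    "hospital beds medical staff supplies",
    "lockdown quarantine stay home order",
    "vaccine development research funding",
]

vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(posts)

lda = LatentDirichletAllocation(n_components=3, random_state=0).fit(counts)
terms = vectorizer.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = [terms[i] for i in topic.argsort()[-4:][::-1]]  # four highest-weight terms
    print(f"topic {k}:", ", ".join(top))
```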
28

Chen, Yulin. "Analysis of Community Interaction Modules of European and American Universities." Journalism and Media 2, no. 2 (2021): 129–54. http://dx.doi.org/10.3390/journalmedia2020009.

Abstract:
Purpose—Using a sample of universities from Europe and North America the research herein seeks to understand the content trends of university brand pages through an exploration of the social community and the measurement of user participation and behavior. The analysis relies on an artificial intelligence approach. Through the verification of interactions between users and content on the university brand pages, recommendations are made, which aim to ensure the pages meet the needs of users in the future. Design/methodology/approach—The study sample was drawn from six well-known universities in Europe and North America. The content of 23,158 posts made over the course of nine years between 1 January 2011 to 31 December 2019 was obtained by a web crawler. Concepts in the fields of computer science, data mining, big data and ensemble learning (Random Decision Forests, eXtreme Gradient Boosting and AdaBoost) were combined to analyze the results obtained from social media exploration. Findings—By exploring community content and using artificial intelligence analysis, the research identified key information on the university brand pages that significantly affected the cognition and behavior of users. The results suggest that distinct levels of user participation arise from the use of different key messages on the university fan page. The interactive characteristics identified within the study sample were classified as one of the following module-types: (a) information and entertainment satisfaction module, (b) compound identity verification module or (c) compound interactive satisfaction module. Research limitations/implications—The study makes a contribution to the literature by developing a university community information interaction model, which explains different user behaviors, and by examining the impact of common key (image) clues contained within community information. This work also confirms that the behavioral involvement of users on the university’s brand page is closely related to the information present within the university community. A limitation of the study was the restriction of the sample to only European and North American cultural and economic backgrounds and the use of Facebook as the sole source of information about the university community. Practical implications—Practically, the research contributes to our understanding of how, in official community interactions, user interactions can be directed by features such as information stimuli and brand meanings. In addition, the work clarifies the relationship between information and user needs, explaining how the information characteristics and interaction rules particular to a given school can be strengthened in order to better manage the university brand page and increase both the attention and interaction of page users. Originality/value—This research provides an important explanation of the role of key information on the university fan pages and verifies the importance of establishing key (image) clues in the brand community, which in turn affect user cognition and interaction. Although related research exists on information manipulation and the importance of online communities, few studies have directly discussed the influence of key information on the fan pages of university brands. 
Therefore, this research will help to fill gaps in the literature and practice by examining a specific context, while at the same time providing a valuable and specific reference for the community operation and management of other related university brands.
29

ANNA LÁZÁR, KATALIN. "A FORMAL LANGUAGE THEORETIC APPROACH TO DISTRIBUTED COMPUTING ON DYNAMIC NETWORKS." Advances in Complex Systems 13, no. 03 (2010): 253–80. http://dx.doi.org/10.1142/s0219525910002608.

Abstract:
In this paper, we present a formal language theoretic approach to the behavior of complex systems of cooperating and communicating agents performing distributed computation on dynamic networks. In particular, we model peer-to-peer networks and the information harvest of Internet crawlers on the World Wide Web, employing grammar systems theoretical constructions. In grammar systems theory, the grammars can be interpreted as agents, whilst the generated language describes the behavior of the system. To characterize the various phenomena that may arise in peer-to-peer networks, we apply networks of parallel multiset string processors. The multiset string processors form teams, send and receive information through collective and individual filters. We deal with the dynamics of the string collections. To describe the information harvest of the crawlers, we employ certain regulated rewriting devices in eco-grammar systems. We illustrate the wide range of applicability of the regulated rewriting devices in the field of web crawling techniques. We demonstrate that these eco-grammar systems with rather simple component grammars suffice to identify any recursively enumerable language.
30

Ravichandran, S., M. Umamaheswari, and S. Lakshminarayanan. "Design and Development of an Improved Scheme for Automated Analysis of User Behaviour Profiles on Web Search Engine." Asian Journal of Science and Applied Technology 6, no. 1 (2017): 22–27. http://dx.doi.org/10.51983/ajsat-2017.6.1.941.

Abstract:
All commercial web search engines return similar results for the same query, regardless of the user's actual interest. Since queries submitted to search engines tend to be short and ambiguous, they are unlikely to express the user's exact needs. Search engines make discovering information on the web fast and simple. A notable shortcoming of generic search engines is that they follow the "one size fits all" model and are not adaptive to individual users. Different users have different backgrounds and interests. However, effective personalization cannot be accomplished without accurate user profiles. Various clustering algorithms have been used to organize user-related information into accurate user profiles. This paper presents an approach that builds user behaviour profiles automatically as a means of improving search engine performance, aimed at building online, adaptive intelligent systems whose structure and functionality both evolve over time.
31

Zhao, Xu, Wenju Zhang, Weijun He, and Chuanchao Huang. "Research on customer purchase behaviors in online take-out platforms based on semantic fuzziness and deep web crawler." Journal of Ambient Intelligence and Humanized Computing 11, no. 8 (2019): 3371–85. http://dx.doi.org/10.1007/s12652-019-01533-6.

32

Wang, Juying, and Xiaoqing Yu. "The Driving Path of Customer Sustainable Consumption Behaviors in the Context of the Sharing Economy—Based on the Interaction Effect of Customer Signal, Service Provider Signal, and Platform Signal." Sustainability 13, no. 7 (2021): 3826. http://dx.doi.org/10.3390/su13073826.

Abstract:
The sharing economy, based on collaboration, sharing, and innovation, has brought about a disruptive revolution in the transformation of the economy and provided a new operating mechanism for promoting sustainable consumption. Therefore, exploring which signals in the sharing economy can effectively stimulate customer consumption behaviors is of great significance. The research uses the signal-interpretation-response (I-I-R) model to build a research framework for customer sustainable consumption behaviors in the context of the sharing economy. With the help of web crawler technology, we captured customer online review data on Airbnb, the sharing accommodation platform, to study the driving path to interpret how multiple signals from different sources influence sustainable consumption behaviors. Regression research shows that the scores in the customer signal, the sustainable services provided in the service provider signal, the super-host certification in the platform signal, and the interactive effects of the three signals have a significant positive impact on customer sustainable consumption behaviors. Consequently, the increase of customer sustainable consumption behaviors improves sales performance. Furthermore, the fuzzy-set qualitative comparative analysis (fsQCA) found five configurations for customer sustainable consumption behaviors based on different property types. The research results provide a reference for strengthening customer sustainable consumption behaviors and improving the service quality of platforms and service providers.
33

Más-Bleda, Amalia, Mike Thelwall, Kayvan Kousha, and Isidro F. Aguillo. "Successful researchers publicizing research online." Journal of Documentation 70, no. 1 (2014): 148–72. http://dx.doi.org/10.1108/jd-12-2012-0156.

Abstract:
Purpose – This study aims to explore the link creating behaviour of European highly cited scientists based upon their online lists of publications and their institutional personal websites. Design/methodology/approach – A total of 1,525 highly cited scientists working at European institutions were first identified. Outlinks from their online lists of publications and their personal websites pointing to a pre-defined collection of popular academic websites and file types were then gathered by a personal web crawler. Findings – Perhaps surprisingly, a larger proportion of social scientists provided at least one outlink compared to the other disciplines investigated. By far the most linked-to file type was PDF and the most linked-to type of target website was scholarly databases, especially the Digital Object Identifier website. Health science and life science researchers mainly linked to scholarly databases, while scientists from engineering, hard sciences and social sciences linked to a wider range of target websites. Both book sites and social network sites were rarely linked to, especially the former. Hence, whilst successful researchers frequently use the Web to point to online copies of their articles, there are major disciplinary and other differences in how they do this. Originality/value – This is the first study to analyse the outlinking patterns of highly cited researchers' institutional web presences in order to identify which web resources they use to provide access to their publications.
34

Wang, Hao, Zonghu Wang, Bin Zhang, and Jun Zhou. "Information collection for fraud detection in P2P financial market." MATEC Web of Conferences 189 (2018): 06006. http://dx.doi.org/10.1051/matecconf/201818906006.

Abstract:
Fintech companies have long faced challenges from fraudulent behavior. The fraud rate in the Chinese P2P financial market can reach as high as 10%. It is crucial to collect sufficient information about each user as input to the anti-fraud process. The data collection framework for fintech companies differs from that of conventional internet firms: with individual-based crawling requests, we need to deal with new challenges that are negligible elsewhere. In this paper, we outline how we collect data from the web to facilitate our anti-fraud process, and we review the challenges we face and our solutions. Our team at HC Financial Service Group is one of the few that is capable of developing full-fledged crawlers on its own.
35

Zhang, Bo, Pan Xiao, and Xiaohong Yu. "The Influence of Prosocial and Antisocial Emotions on the Spread of Weibo Posts: A Study of the COVID-19 Pandemic." Discrete Dynamics in Nature and Society 2021 (September 11, 2021): 1–9. http://dx.doi.org/10.1155/2021/8462264.

Abstract:
This study investigates the influence of the prosocial and antisocial tendencies of Weibo users on post transmission during the COVID-19 pandemic. To overcome the deficiencies of existing research on prosocial and antisocial emotions, we employ web crawler technology to obtain post data from Weibo and identify texts with prosocial or antisocial emotions. We use SnowNLP to construct semantic dictionaries and training models. Our major findings are as follows. First, through correlation analysis and negative binomial regression, we find that user posts with high-intensity prosocial emotion can trigger comments or forwarding behaviour. Second, the influence of antisocial emotion on Weibo comments, likes, and retweets is insignificant. Third, the general emotion of prosocial comments on Weibo also shows the emotional trend of prosocial comments. Overall, a major contribution of this paper is our focus on prosocial and antisocial emotions in cyberspace, providing a new perspective on emotion communication.
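The SnowNLP scoring mentioned in this abstract can be sketched briefly. The snippet below is only an illustration: SnowNLP's sentiments method returns a score in [0, 1], and the 0.5 cut-off used here to label two toy posts as prosocial- or antisocial-leaning is a simplifying assumption, not the paper's procedure.

```python
# Brief SnowNLP sentiment-scoring sketch on two toy posts.
from snownlp import SnowNLP

posts = [
    "感谢所有医护人员的无私奉献，一起加油！",   # supportive / prosocial tone
    "这些人真让人失望，太自私了。",             # hostile / antisocial tone
]

for text in posts:
    score = SnowNLP(text).sentiments   # closer to 1 means more positive
    label = "prosocial-leaning" if score >= 0.5 else "antisocial-leaning"
    print(f"{score:.2f}  {label}  {text}")
```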
36

Ramos, Célia M. Q., Daniel Jorge Martins, Francisco Serra, et al. "Framework for a Hospitality Big Data Warehouse." International Journal of Information Systems in the Service Sector 9, no. 2 (2017): 27–45. http://dx.doi.org/10.4018/ijisss.2017040102.

Abstract:
In order to increase the hotel's competitiveness, to maximize its revenue, to meliorate its online reputation and improve customer relationship, the information about the hotel's business has to be managed by adequate information systems (IS). Those IS should be capable of returning knowledge from a necessarily large quantity of information, anticipating and influencing the consumer's behaviour. One way to manage the information is to develop a Big Data Warehouse (BDW), which includes information from internal sources (e.g., Data Warehouse) and external sources (e.g., competitive set and customers' opinions). This paper presents a framework for a Hospitality Big Data Warehouse (HBDW). The framework includes a (1) Web crawler that periodically accesses targeted websites to automatically extract information from them, and a (2) data model to organize and consolidate the collected data into a HBDW. Additionally, the usefulness of this HBDW to the development of the business analytical tools is discussed, keeping in mind the implementation of the business intelligence (BI) concepts.
37

Chandan, Srinath. "Dynamic GEN AI-Powered Web Crawling on Azure Using Automation Account and GPT-3.5." International Journal of Engineering and Advanced Technology (IJEAT) 14, no. 2 (2024): 6–10. https://doi.org/10.35940/ijeat.B4556.14021224.

Abstract:
The integration of AI-powered automation in web crawling marks a significant advancement over traditional methods, which were often labor-intensive, inflexible, and prone to security risks. This paper presents a case study on the implementation of a dynamic web crawling solution using Azure Automation Account, leveraging GPT-3.5 from Azure OpenAI services. This new approach allows for parameterized execution via automation variables, enabling user-defined requirements to guide the crawler's behavior in a more flexible and intelligent manner. Unlike previous static methods that required constant manual adjustments, our system uses GPT-3.5's Natural Language Processing (NLP) capabilities to interpret complex instructions and dynamically adapt to various web structures. Post-crawling, the data undergoes a security scan using ClamAV, ensuring its integrity before storage in Azure Blob Storage. SendGrid is employed for user alerts regarding the scan results and storage status. The system is scheduled to run at regular intervals, fully automating the process while maintaining robust security protocols. This paper includes a detailed comparison between traditional web crawling techniques and this AI-driven approach, demonstrating the improvements in efficiency, security, and adaptability.
APA, Harvard, Vancouver, ISO, and other styles
38

Srinath, Chandan, and Sakshi Srivastava. "Dynamic GEN AI-Powered Web Crawling on Azure Using Automation Account and GPT-3.5." International Journal of Engineering and Advanced Technology 14, no. 2 (2024): 6–10. https://doi.org/10.35940/ijeat.b4556.14021224.

Full text
Abstract:
The integration of AI-powered automation in web crawling marks a significant advancement over traditional methods, which were often labor-intensive, inflexible, and prone to security risks. This paper presents a case study on the implementation of a dynamic web crawling solution using Azure Automation Account, leveraging GPT-3.5 from Azure OpenAI services. This new approach allows for parameterized execution via automation variables, enabling user-defined requirements to guide the crawler's behavior in a more flexible and intelligent manner. Unlike previous static methods that required constant manual adjustments, our system uses GPT-3.5's Natural Language Processing (NLP) capabilities to interpret complex instructions and dynamically adapt to various web structures. Post-crawling, the data undergoes a security scan using ClamAV, ensuring its integrity before storage in Azure Blob Storage. SendGrid is employed for user alerts regarding the scan results and storage status. The system is scheduled to run at regular intervals, fully automating the process while maintaining robust security protocols. This paper includes a detailed comparison between traditional web crawling techniques and this AI-driven approach, demonstrating the improvements in efficiency, security, and adaptability.
APA, Harvard, Vancouver, ISO, and other styles
39

Chen, Yulin. "A Social Media Mining and Ensemble Learning Model: Application to Luxury and Fast Fashion Brands." Information 12, no. 4 (2021): 149. http://dx.doi.org/10.3390/info12040149.

Full text
Abstract:
This research proposes a framework for fashion brand communities to explore the public participation behaviors triggered by brand information and to understand the importance of key image cues and brand positioning. In addition, it reviews different participation responses (likes, comments, and shares) to build systematic image and theme modules that detail planning requirements for community information. The sample includes luxury fashion brands (Chanel, Hermès, and Louis Vuitton) and fast fashion brands (Adidas, Nike, and Zara). Using a web crawler, a total of 21,670 posts made from 2011 to 2019 were obtained. A fashion brand image model is constructed to determine the key image cues in each brand's posts. Drawing on the findings of the ensemble analysis, the research divides the cues used by the six major fashion brands into two modules, an image cue module and an image and theme cue module, to understand participation responses in the form of likes, comments, and shares. The resulting systematic image and theme modules serve as a critical reference for administrators exploring the characteristics of public participation for each brand and the main factors motivating it.
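The ensemble-learning step can be illustrated with a small sketch. Below is a minimal scikit-learn example in Python; the binary image-cue features and synthetic labels are illustrative stand-ins for the crawled posts and participation responses analyzed in the paper.

```python
# Minimal sketch: a soft-voting ensemble predicting high vs. low participation
# from hypothetical binary image-cue features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# Hypothetical binary image-cue features: [product_shot, celebrity, logo, event]
X = rng.integers(0, 2, size=(500, 4))
# Hypothetical label: 1 = high participation (likes/comments/shares), 0 = low
y = (X[:, 0] + X[:, 1] + rng.normal(0, 0.5, 500) > 1).astype(int)

ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier(max_depth=4)),
        ("rf", RandomForestClassifier(n_estimators=100)),
    ],
    voting="soft",  # average predicted probabilities across base learners
)
print(cross_val_score(ensemble, X, y, cv=5).mean())
```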
APA, Harvard, Vancouver, ISO, and other styles
40

Aiello, Luca Maria, Martina Deplano, Rossano Schifanella, and Giancarlo Ruffo. "People Are Strange When You're a Stranger: Impact and Influence of Bots on Social Networks." Proceedings of the International AAAI Conference on Web and Social Media 6, no. 1 (2021): 10–17. http://dx.doi.org/10.1609/icwsm.v6i1.14236.

Full text
Abstract:
Bots are, for many Web and social media users, the source of dangerous attacks or the carrier of unwanted messages, such as spam. Nevertheless, crawlers and software agents are a precious tool for analysts, and they are continuously executed to collect data or to test distributed applications. However, no one knows the real potential of a bot whose purpose is to control a community, manipulate consensus, or influence user behavior. It is commonly believed that the better an agent simulates human behavior in a social network, the more impact it can generate in that community. We contribute to shedding light on this issue through an online social experiment aimed at studying to what extent a bot with no trust, no profile, and no aim to reproduce human behavior can become popular and influential on social media. Results show that basic social probing activity can be used to acquire social relevance on the network and that the popularity acquired in this way can be effectively leveraged to drive users in their social connectivity choices. We also observe that our bot's activity unveiled hidden social polarization patterns in the community and triggered an emotional response in individuals that brings to light subtle privacy hazards perceived by the user base.
APA, Harvard, Vancouver, ISO, and other styles
41

Kang, Minhyung. "Dual paths to continuous online knowledge sharing: a repetitive behavior perspective." Aslib Journal of Information Management 72, no. 2 (2019): 159–78. http://dx.doi.org/10.1108/ajim-05-2019-0127.

Full text
Abstract:
Purpose Continuous knowledge sharing by active users, who are highly active in answering questions, is crucial to the sustenance of social question-and-answer (Q&A) sites. The purpose of this paper is to examine such knowledge sharing considering both reason-based elaborate decision processes and habit-based automated cognitive processes. Design/methodology/approach To verify the research hypotheses, survey data on subjective intentions and web-crawled data on objective behavior are utilized. The sample size is 337, with a response rate of 27.2 percent. Negative binomial and hierarchical linear regressions are used given the skewed distribution of the dependent variable (i.e. the number of answers). Findings Both the elaborate decision process (linking satisfaction, intentions, and continuance behavior) and the automated cognitive process (linking past and continuance behavior) are significant and substitutable. Research limitations/implications By measuring both subjective intentions and objective behavior, the study verifies a detailed mechanism linking continuance intentions, past behavior, and continuous knowledge sharing. The significant influence of automated cognitive processes implies that online knowledge sharing is habitual for active users. Practical implications Understanding that online knowledge sharing is habitual is imperative to maintaining continuous knowledge sharing by active users. Knowledge sharing trends should be monitored to check whether the frequency of sharing decreases, and social Q&A sites should intervene to restore knowledge sharing behavior through personalized incentives. Originality/value This is the first study utilizing both subjective intention and objective behavior data in the context of online knowledge sharing. It also introduces habit-based automated cognitive processes to this context. This approach extends the current understanding of continuous online knowledge sharing behavior.
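The regression setup described above can be sketched with standard tooling. Below is a minimal Python example using statsmodels on synthetic data; the variable names and simulated effects are illustrative assumptions, not the survey constructs or estimates reported in the paper.

```python
# Minimal sketch: negative binomial regression of a skewed count outcome
# (number of answers) on intention and past behavior, with synthetic data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 337  # sample size reported in the abstract
satisfaction = rng.normal(0, 1, n)
intention = 0.5 * satisfaction + rng.normal(0, 1, n)
past_behavior = rng.poisson(3, n)

# Simulated skewed count outcome driven by intention and past behavior
mu = np.exp(0.2 + 0.3 * intention + 0.15 * past_behavior)
answers = rng.negative_binomial(n=2, p=2 / (2 + mu))

X = sm.add_constant(np.column_stack([intention, past_behavior]))
result = sm.GLM(answers, X, family=sm.families.NegativeBinomial()).fit()
print(result.summary())
```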
APA, Harvard, Vancouver, ISO, and other styles
42

Ding, Lilan, and Nurul Hanim Romainoor. "A study on the perception of Sichuan Museum tourism experience based on web text analysis." Journal of Social Science and Humanities 5, no. 5 (2022): 1–9. http://dx.doi.org/10.26666/rmp.jssh.2022.5.1.

Full text
Abstract:
Museum tourism forms a key element of cultural tourism. Museums are a microcosm of local culture, offering tourists a window into local history, culture, and character within a single time and physical space. Using the Sichuan Museum as a case study, this paper applies Python data mining techniques to crawl a total of 4332 visitor web reviews. Text content analysis was used to explore the characteristics of visitors' perceptions of their experience during the Sichuan Museum tour. The results reveal that visitor behavior is mainly characterized by four aspects: "visiting, feeling, learning and taking photos". Of the visitor reviews, 73.12% expressed positive emotions, 18.32% expressed neutral emotions, and only 8.56% contained negative emotions.
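The web-text content analysis step can be illustrated with a short sketch. Below is a minimal Python example using jieba for word segmentation and frequency counting; the review strings are invented examples, not the crawled Sichuan Museum reviews.

```python
# Minimal sketch: segment Chinese review text and count high-frequency terms
# as a proxy for the behavioral themes mentioned in the abstract.
from collections import Counter

import jieba

reviews = [
    "参观了整个下午，感受到四川深厚的历史文化",
    "学习了很多文物知识，还拍照留念",
    "展厅很大，值得参观和拍照",
]

counter = Counter()
for text in reviews:
    # Keep tokens of length >= 2 to drop most function words
    counter.update(tok for tok in jieba.lcut(text) if len(tok) >= 2)

print(counter.most_common(10))
```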
APA, Harvard, Vancouver, ISO, and other styles
43

Wang, Ru, Shuhui Xu, Shugang Li, and Qiwei Pang. "Research on Influence Mechanism of Consumer Satisfaction Evaluation Behavior Based on Grounded Theory in Social E-Commerce." Systems 12, no. 12 (2024): 572. https://doi.org/10.3390/systems12120572.

Full text
Abstract:
For enterprises, exploring the influence mechanism of consumer satisfaction evaluation behavior (CSEB) holds significant research value for the advancement and further development of social e-commerce platforms. The existing literature primarily focuses on quantitative methods in investigating the influence mechanism of CSEB within social e-commerce platforms. This study endeavors to expand the theoretical boundaries of CSEB through qualitative research. This study adopts a mixed-methods approach, combining primary data collected through in-depth interviews with 32 participants and secondary data gathered from 1000 users via web crawlers. Utilizing grounded theory as an analytical framework, the study meticulously summarizes, concludes, and refines the influencing factors of CSEB. Based on these findings, a robust CSEB model is constructed to provide a deeper understanding of the phenomenon. The study reveals that in the decision-making process of consumer evaluation, behavior is primarily driven by evaluation motivations. These motivations are intricately intertwined with product perception, social influence, and perceived behavior control. The interplay among these factors significantly shapes the manner in which consumers engage in satisfaction evaluation on social e-commerce platforms. This study complements existing quantitative research by providing nuanced insights into the complex interplay of factors, which drive consumer evaluation behavior. Furthermore, the study proposes actionable countermeasures and suggestions for businesses and platform managers to effectively promote and enhance consumer satisfaction evaluation activities, thereby contributing to the sustained growth and development of social e-commerce platforms.
APA, Harvard, Vancouver, ISO, and other styles
44

Chen, Jizhuo. "Empirical Study on the Regulation of Data Crawling Behavior under the Anti-Unfair Competition Law." Advances in Social Behavior Research 16, no. 2 (2025): 1–11. https://doi.org/10.54254/2753-7102/2025.21604.

Full text
Abstract:
Data crawling refers to the automated process of acquiring and storing web information, with data crawlers being one of its most widely used forms. The webpage acquisition, webpage filtering, and webpage storage workflow of crawling, along with data transactions, often involves breaches of contract, infringements, unfair competition disputes, and other compliance-related legal risks. When courts handle such cases, they have generally adopted the Anti-Unfair Competition Law as the legal basis for regulating data crawling and its subsequent applications, achieving widespread legal consensus. In assessing the scope of unfair competition behavior, the judicial community has widely accepted a moderate extension of the criteria for identifying competitive relationships, and more cases are being adjudicated under the second article of the Anti-Unfair Competition Law. Courts generally use the Anti-Unfair Competition Law as the legal framework when reviewing data crawling behavior, while also emphasizing the balancing of multiple interests. At the same time, challenges arise in case rulings regarding the identification of competitive relationships, competitive harm, and the determination of business ethics, which necessitate the optimization of existing criteria and the introduction of new standards to enhance their recognizability and operability. Furthermore, when applying the general provisions of the Anti-Unfair Competition Law, judicial difficulties arise, calling for a return to the law's competition-oriented nature, a regulatory model suited to dynamic competition, and the introduction of economic analysis standards to enhance the predictability of business ethics judgments.
APA, Harvard, Vancouver, ISO, and other styles
45

Gautam, Shivani, Rajesh Bhatia, and Shaily Jain. "Classification and analysis for Focused Crawled Textual Dataset for retrieving Indian origin scientists." International Journal of Experimental Research and Review 34, Special Vo (2023): 72–85. http://dx.doi.org/10.52756/ijerr.2023.v34spl.008.

Full text
Abstract:
Text classification (also called text categorization or text tagging) is a crucial and extensively used approach in Natural Language Processing (NLP) for assigning unseen documents to predefined categories. In this paper, we describe dataset construction and evaluation as a component of text classification. First, we produced a new dataset of Indian-origin scientists for text classification, collected by applying focused crawling and web scraping techniques. We then present an extensive evaluation of numerous models on this newly constructed dataset. Our evaluations show that the Random Forest model outperforms the other supervised models. Our results provide a solid starting point for further research on the text classification of Indian-origin scientists. Experiments with K-Nearest Neighbor, Logistic Regression, and Support Vector Machine showed that Random Forest performed considerably better when combined with SMOTE and K-fold cross-validation techniques. We use the area under the ROC curve to measure the effectiveness of the chosen models. Overall, the Random Forest classifier exhibited the best performance, with a 90% micro-average AUC.
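The Random Forest + SMOTE + K-fold setup mentioned above can be sketched as follows. This is a minimal Python example using scikit-learn and imbalanced-learn on synthetic data standing in for the crawled text features; the parameters are illustrative, not the paper's configuration.

```python
# Minimal sketch: oversample the minority class with SMOTE inside each
# cross-validation fold and evaluate a Random Forest by AUC.
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(
    n_samples=1000, n_features=50, weights=[0.85, 0.15], random_state=42
)  # imbalanced classes, the situation SMOTE is meant to address

pipe = Pipeline([
    ("smote", SMOTE(random_state=42)),        # oversample only within each training fold
    ("rf", RandomForestClassifier(n_estimators=200, random_state=42)),
])

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(pipe, X, y, cv=cv, scoring="roc_auc")
print(f"Mean AUC: {scores.mean():.3f}")
```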
APA, Harvard, Vancouver, ISO, and other styles
46

Min, Wang, and Zhilong You. "Attitudes of online users towards personal information leakage: based on Sina Weibo database." E3S Web of Conferences 251 (2021): 01044. http://dx.doi.org/10.1051/e3sconf/202125101044.

Full text
Abstract:
With the rapid development of the internet, people are paying more attention to personal information security. Drawing upon the three components of attitude, this study was designed to understand online users' attitudes toward personal information leakage. A web crawler program was used to collect blog data from Sina Weibo. The results show that the main channels for personal information leakage include mobile phones, telephones, media, and networks. Verified users published more blogs on the topic than unverified users, and users pay particular attention to account numbers. Pioneer and unverified users express more negative affect, while personal-account, VIP, and organization users express more positive affect. Blogs with higher interaction also show more positive affect. Media blogs exert a subtle influence on people's behavior.
APA, Harvard, Vancouver, ISO, and other styles
47

Liang, Jialing, Peiquan Jin, Lin Mu, and Jie Zhao. "An Experimental Study of Spammer Detection on Chinese Microblogs." International Journal of Software Engineering and Knowledge Engineering 30, no. 11n12 (2020): 1759–77. http://dx.doi.org/10.1142/s021819402040029x.

Full text
Abstract:
With the development of Web 2.0, social media platforms such as Twitter and Sina Weibo have become essential for disseminating hot events. Simultaneously, due to the open posting policy of microblogging services, users can publish user-generated content freely on microblogging platforms. Accordingly, more and more hot events on microblogging platforms have been labeled as spammer-driven. Spammers not only hurt the healthy development of social media but also introduce many economic and social problems. Therefore, governments and enterprises must distinguish whether a hot event on a microblogging platform is spammer-driven or naturally developing. In this paper, we focus on the hot event list on Sina Weibo and collect the relevant microblogs of each hot event to study methods for detecting spammers. Notably, we develop an integrated feature set consisting of user-profile, user-behavior, and user-relationship features to reflect the various factors affecting spammer detection. We then employ typical machine learning methods to conduct extensive experiments on detecting spammers. We use a real data set crawled from the most prominent Chinese microblogging platform, Sina Weibo, and evaluate the performance of 10 machine learning models with five sampling methods. The results, measured across various metrics, show that the Random Forest model combined with over-sampling achieves the best accuracy in distinguishing spammers from non-spammers.
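Comparing several classifiers on an integrated feature set can be sketched briefly. Below is a minimal Python example using scikit-learn; the synthetic features stand in for the user-profile, user-behavior, and user-relationship features described in the paper, and the model list is a small illustrative subset of the ten models evaluated.

```python
# Minimal sketch: cross-validated comparison of several classifiers on an
# imbalanced, synthetic spammer-detection dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

X, y = make_classification(n_samples=2000, n_features=20, weights=[0.8, 0.2], random_state=7)

models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "GaussianNB": GaussianNB(),
    "SVC": SVC(),
    "RandomForest": RandomForestClassifier(n_estimators=200, random_state=7),
}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    print(f"{name}: {acc:.3f}")
```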
APA, Harvard, Vancouver, ISO, and other styles
48

Kang, Yuhao. "Research on Unfair Competition in the Digital Economy: Innovation, Regulation, and Balancing Strategies." SHS Web of Conferences 200 (2024): 01007. http://dx.doi.org/10.1051/shsconf/202420001007.

Full text
Abstract:
Online unfair competition is on the rise, disrupting market order and violating consumer rights in the context of the fast-developing digital economy. The purpose of this essay is therefore to investigate the nature and forms of unfair competition in the internet economy and how it affects innovation. In addition, it examines the shortcomings of existing regulatory frameworks and suggests balanced approaches and effective techniques to encourage the digital economy's healthy expansion. Defamation, misleading advertising, and web crawlers are common examples of unfair competition on the Internet. Another kind of unfair competition is data mining, which involves acquiring rivals' customer information or trade secrets. These behaviors not only undermine the principle of fair competition in the market but also mislead consumers and harm their interests. There is a certain relationship between unfair competition on the Internet and innovation. On the one hand, unfair competition on the Internet may stem from companies' excessive pursuit of innovation: to stand out in fierce market competition, companies may resort to unfair means to gain competitive advantages. On the other hand, unfair competition on the Internet may also inhibit innovation, as companies may reduce their innovation investment out of concern that competitors will take unfair measures. The current regulatory mechanism has certain shortcomings in addressing unfair competition on the Internet. On the one hand, its definition of unfair competition on the Internet is not clear enough, which makes it difficult for regulatory agencies to judge and handle such conduct accurately during enforcement. On the other hand, its punishments for unfair competition on the Internet are not strong enough to effectively deter illegal behavior. To address unfair competition on the Internet, it is necessary to establish effective regulatory mechanisms and balancing measures. First, the definition of unfair competition on the Internet should be clarified and specific regulatory rules and standards established. Second, the enforcement efforts of regulatory agencies should be strengthened and penalties increased to ensure the effectiveness of regulatory mechanisms. At the same time, corporate self-discipline should be strengthened to promote compliance with market rules and jointly maintain market order. In short, unfair competition on the Internet has had a negative impact on the development of the digital economy, and effective regulatory mechanisms and balancing measures are needed to promote its healthy development.
APA, Harvard, Vancouver, ISO, and other styles
49

Mishra, Vikas, Pierre Laperdrix, Walter Rudametkin, and Romain Rouvoy. "Déjà vu: Abusing Browser Cache Headers to Identify and Track Online Users." Proceedings on Privacy Enhancing Technologies 2021, no. 2 (2021): 391–406. http://dx.doi.org/10.2478/popets-2021-0033.

Full text
Abstract:
Many browser cache attacks have been proposed in the literature to sniff the user's browsing history. All of them rely on specific time measurements to infer if a resource is in the cache or not. Unlike the state-of-the-art, this paper reports on a novel cache-based attack that is not a timing attack but that abuses the HTTP cache-control and expires headers to extract the exact date and time when a resource was cached by the browser. The privacy implications are serious as this information can not only be utilized to detect if a website was visited by the user but it can also help build a timeline of the user's visits. This goes beyond traditional history sniffing attacks as we can observe patterns of visits and model the user's behavior on the web. To evaluate the impact of our attack, we tested it on all major browsers and found that all of them, except the ones based on WebKit, are vulnerable to it. Since our attack requires specific HTTP headers to be present, we also crawled the Tranco Top 100K websites and identified that 12,970 of them can be detected with our approach. Among them, 1,910 deliver resources that have expiry dates greater than 100 days, enabling long-term user tracking. Finally, we discuss possible defenses at both the browser and standard levels to prevent users from being tracked.
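The header-based crawl described above can be sketched briefly. Below is a minimal Python example using requests to check whether a site's response carries a far-future Expires header, which is the precondition the attack abuses; the URL list and threshold handling are illustrative, not the paper's measurement pipeline.

```python
# Minimal sketch: flag sites whose responses expire more than 100 days in the
# future, i.e. resources that could enable long-term cache-based tracking.
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

import requests

LONG_TERM_DAYS = 100  # threshold mentioned in the abstract

def expiry_horizon_days(url: str) -> float | None:
    resp = requests.get(url, timeout=15)
    expires = resp.headers.get("Expires")
    if not expires:
        return None
    try:
        delta = parsedate_to_datetime(expires) - datetime.now(timezone.utc)
    except (TypeError, ValueError):
        return None  # e.g. "Expires: 0" or a malformed date
    return delta.total_seconds() / 86400

for site in ["https://example.com"]:  # illustrative URL list
    days = expiry_horizon_days(site)
    if days is not None and days > LONG_TERM_DAYS:
        print(f"{site}: expires {days:.0f} days ahead -> long-term tracking possible")
```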
APA, Harvard, Vancouver, ISO, and other styles
50

Bracciale, Lorenzo, Pierpaolo Loreti, Andrea Detti, and Nicola Blefari Melazzi. "Analysis of Data Persistence in Collaborative Content Creation Systems: The Wikipedia Case." Information 10, no. 11 (2019): 330. http://dx.doi.org/10.3390/info10110330.

Full text
Abstract:
A very common problem in designing caching/prefetching systems, distribution networks, search engines, and web-crawlers is determining how long a given content lasts before being updated, i.e., its update frequency. Indeed, while some content is not frequently updated (e.g., videos), in other cases revisions periodically invalidate contents. In this work, we present an analysis of Wikipedia, currently the 5th most visited website in the world, evaluating the statistics of updates of its pages and their relationship with page view statistics. We discovered that the number of updates of a page follows a lognormal distribution. We provide fitting parameters as well as a goodness of fit analysis, showing the statistical significance of the model to describe the empirical data. We perform an analysis of the views–updates relationship, showing that in a time period of a month, there is a lack of evident correlation between the most updated pages and the most viewed pages. However, observing specific pages, we show that there is a strong correlation between the peaks of views and updates, and we find that in more than 50% of cases, the time difference between the two peaks is less than a week. This reflects the underlying process whereby an event causes both an update and a visit peak that occurs with different time delays. This behavior can pave the way for predictive traffic analysis applications based on content update statistics. Finally, we show how the model can be used to evaluate the performance of an in-network caching scenario.
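The distribution-fitting step can be sketched with standard tooling. Below is a minimal Python example using SciPy on synthetic update counts standing in for the Wikipedia data; the parameters are illustrative, and the goodness-of-fit test shown (Kolmogorov-Smirnov) is a common choice rather than necessarily the one used in the paper.

```python
# Minimal sketch: fit a lognormal distribution to per-page update counts and
# check goodness of fit with a Kolmogorov-Smirnov test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
updates_per_page = rng.lognormal(mean=2.0, sigma=1.0, size=5000)  # synthetic stand-in

shape, loc, scale = stats.lognorm.fit(updates_per_page, floc=0)
ks_stat, p_value = stats.kstest(updates_per_page, "lognorm", args=(shape, loc, scale))
print(f"shape={shape:.3f}, scale={scale:.3f}, KS statistic={ks_stat:.4f}, p={p_value:.3f}")
```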
APA, Harvard, Vancouver, ISO, and other styles