
Journal articles on the topic 'DOM (Document Object Model)'


Consult the top 50 journal articles for your research on the topic 'DOM (Document Object Model).'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse journal articles from a wide variety of disciplines and organise your bibliography correctly.

1

Role, François, and Philippe Verdret. "Le Document Object Model (DOM)." Cahiers GUTenberg, no. 33-34 (1999): 155–71. http://dx.doi.org/10.5802/cg.265.

2

Wang, Yanlong, and Jinhua Liu. "Object-oriented Design based Comprehensive Experimental Development of Document Object Model." Advances in Engineering Technology Research 3, no. 1 (December 7, 2022): 390. http://dx.doi.org/10.56028/aetr.3.1.390.

Abstract:
JavaScript code using the Document Object Model (DOM) enables dynamic control of Web pages, an important part of the Web development technology course. The application of the DOM is very flexible and involves many knowledge points, so it is difficult for students to master. To help students understand each knowledge point and improve their engineering ability to solve practical problems, a comprehensive DOM experiment project, styled as a blind box, is designed and implemented. The project integrates knowledge points such as DOM events, DOM operations, and communication between objects. Practice has shown that running and debugging the project helps students understand and master the relevant knowledge points.
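As a rough illustration of the kind of DOM operations and DOM events such an exercise integrates (a hypothetical sketch in TypeScript, not code from the paper; the element ids are invented):

```typescript
// DOM operation: build a list and attach it to the page.
const prizeList = document.createElement("ul");
prizeList.id = "prize-list";
document.body.appendChild(prizeList);

// DOM event: react to a user action and update the page without a reload.
const openButton = document.querySelector<HTMLButtonElement>("#open-box");
openButton?.addEventListener("click", () => {
  const item = document.createElement("li");
  item.textContent = `Box opened at ${new Date().toLocaleTimeString()}`;
  prizeList.appendChild(item);
});
```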
3

Radilova, Martina, Patrik Kamencay, Robert Hudec, Miroslav Benco, and Roman Radil. "Tool for Parsing Important Data from Web Pages." Applied Sciences 12, no. 23 (November 24, 2022): 12031. http://dx.doi.org/10.3390/app122312031.

Abstract:
This paper discusses a tool for extracting and parsing the main text and images (the important data) from a web document. It describes our proposed algorithm, based on the Document Object Model (DOM) and natural language processing (NLP) techniques, together with approaches for extracting information from web pages using classification techniques such as support vector machines, decision trees, naive Bayes, and K-nearest neighbors. The main aim of the developed algorithm was to identify and extract the main block of a web document, which contains the article text and the relevant images. The algorithm was applied to a sample of 45 web documents of different types. In addition, the handling of web pages, from the structure of the document to the use of the Document Object Model (DOM) for their processing, was analyzed. The Document Object Model was used to load and navigate the document, and it also plays an important role in correctly identifying the main block of a web document. The paper also discusses the levels of natural language; these methods of automatic natural language processing help to identify the main block of the web document. In this way, all textual parts and images from the main content of the web document were extracted. The experimental results show that our method achieved a final classification accuracy of 88.18%.
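A much-simplified sketch of the DOM side of this idea, selecting a candidate main block by how much text its immediate children hold (the paper combines DOM features with NLP and trained classifiers; this TypeScript heuristic is only illustrative):

```typescript
// Text held directly by an element, ignoring text nested deeper in its descendants.
function ownTextLength(el: Element): number {
  let length = 0;
  el.childNodes.forEach((node) => {
    if (node.nodeType === Node.TEXT_NODE) {
      length += (node.textContent ?? "").trim().length;
    }
  });
  return length;
}

// Pick the container whose immediate children carry the most text.
function findMainBlock(doc: Document): Element | null {
  let best: Element | null = null;
  let bestScore = 0;
  doc.querySelectorAll("article, section, div").forEach((el) => {
    const score = Array.from(el.children).reduce((s, c) => s + ownTextLength(c), 0);
    if (score > bestScore) {
      bestScore = score;
      best = el;
    }
  });
  return best;
}
```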
4

Ahmad Sabri, Ily Amalina, and Mustafa Man. "Improving Performance of DOM in Semi-structured Data Extraction using WEIDJ Model." Indonesian Journal of Electrical Engineering and Computer Science 9, no. 3 (March 1, 2018): 752. http://dx.doi.org/10.11591/ijeecs.v9.i3.pp752-763.

Abstract:
Web data extraction is the process of extracting user-required information from web pages. The information consists of semi-structured data that is not in a structured format, and the extraction involves web documents in HTML format. Nowadays, most people use web data extractors because the extraction involves large amounts of information, which makes manual information extraction time-consuming and complicated. We present in this paper the WEIDJ approach to extract images from the web, whose goal is to harvest images as objects from template-based HTML pages. WEIDJ (Web Extraction Image using DOM (Document Object Model) and JSON (JavaScript Object Notation)) applies DOM theory to build the structure and uses JSON as the programming environment. The extraction process takes as input both a web address and an extraction structure. WEIDJ then splits the DOM tree into small subtrees and applies a visual-block search algorithm to each web page to find images. Our approach focuses on three levels of extraction: a single web page, multiple web pages and the whole web page. Extensive experiments on several biodiversity web pages have been conducted to compare the time performance of image extraction using DOM, JSON and WEIDJ on a single web page. The experimental results show that, with our WEIDJ model, image extraction can be done quickly and effectively.
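A minimal sketch of the harvesting step, collecting image sources from a DOM block into JSON-friendly objects (the block selector is hypothetical; WEIDJ adds visual-block splitting and multi-page handling on top of this):

```typescript
// Collect image URLs and alt text from one visual block of a template-based page.
const images = Array.from(
  document.querySelectorAll<HTMLImageElement>("div.gallery img")
).map((img) => ({ src: img.src, alt: img.alt }));

console.log(JSON.stringify(images, null, 2));   // one JSON object per harvested image
```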
5

Sankari, S., and S. Bose. "Efficient Identification of Structural Relationships for XML Queries using Secure Labeling Schemes." International Journal of Intelligent Information Technologies 12, no. 4 (October 2016): 63–80. http://dx.doi.org/10.4018/ijiit.2016100104.

Abstract:
XML has emerged as a de facto standard for data representation and information exchange over the World Wide Web. By utilizing the Document Object Model (DOM), an XML document can be viewed as an XML DOM tree. The nodes of an XML tree are labeled, following a labeling scheme, so that every node is uniquely identified. This paper proposes a method to efficiently identify two structural relationships, namely document order (DO) and the sibling relationship, that exist between XML nodes, using two secure labeling schemes: enhanced Dewey coding (EDC) and secure Dewey coding (SDC). These structural relationships influence the performance of XML queries, so they need to be identified efficiently. This paper implements the method to identify DO and the sibling relationship using EDC and SDC labels for various real-time XML documents. Experimental results show that identifying DO and the sibling relationship using SDC labels performs better than using EDC labels when processing XML queries.
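Plain Dewey-style labels already make both relationships decidable from the labels alone, which is the idea the EDC and SDC schemes build on; a small TypeScript sketch (not the paper's schemes, just the underlying comparison):

```typescript
type DeweyLabel = number[];   // e.g. [1, 2, 3] for the node labelled "1.2.3"

// Negative if a precedes b in document order, positive if it follows.
function documentOrder(a: DeweyLabel, b: DeweyLabel): number {
  for (let i = 0; i < Math.min(a.length, b.length); i++) {
    if (a[i] !== b[i]) return a[i] - b[i];   // first differing component decides
  }
  return a.length - b.length;                // an ancestor precedes its descendants
}

// Siblings share the parent prefix and sit at the same depth.
function areSiblings(a: DeweyLabel, b: DeweyLabel): boolean {
  return a.length === b.length && a.slice(0, -1).every((v, i) => v === b[i]);
}
```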
6

Feng, Jian, Ying Zhang, and Yuqiang Qiao. "A Detection Method for Phishing Web Page Using DOM-Based Doc2Vec Model." Journal of Computing and Information Technology 28, no. 1 (July 10, 2020): 19–31. http://dx.doi.org/10.20532/cit.2020.1004899.

Abstract:
Detecting phishing web pages is a challenging task. Existing DOM-based (Document Object Model) detection methods for phishing web pages mainly aim at obtaining structural characteristics but ignore the overall representation of web pages and the semantic information that HTML tags may carry. This paper treats DOMs as a natural language, applies the Doc2Vec model, and learns the structural semantics automatically to detect phishing web pages. First, the DOM structure of the obtained web page is parsed to construct the DOM tree; then the Doc2Vec model is used to vectorize the DOM tree and to measure the semantic similarity between web pages by the distance between different DOM vectors. Finally, a hierarchical clustering method is used to cluster the web pages. Experiments show that the proposed method achieves higher recall and precision for phishing classification than a DOM-based structural clustering method and a TF-IDF-based semantic clustering method. The results show that applying Paragraph Vector to the DOM in a linguistic manner is effective.
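One plausible form of the preprocessing step is a depth-first serialization of the DOM tree into tag tokens, which can then be fed to a Doc2Vec model; the TypeScript sketch below covers only that serialization (the embedding and clustering happen in a separate model, and the exact tokenization is our assumption, not the paper's):

```typescript
// Flatten a DOM subtree into a sequence of tag-name tokens, depth-first.
function domToTokens(node: Element, out: string[] = []): string[] {
  out.push(node.tagName.toLowerCase());
  for (const child of Array.from(node.children)) {
    domToTokens(child, out);
  }
  return out;
}

// domToTokens(document.documentElement) -> ["html", "head", "title", "body", ...]
```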
7

Sabri, Ily Amalina Ahmad, and Mustafa Man. "A performance of comparative study for semi-structured web data extraction model." International Journal of Electrical and Computer Engineering (IJECE) 9, no. 6 (December 1, 2019): 5463. http://dx.doi.org/10.11591/ijece.v9i6.pp5463-5470.

Abstract:
The extraction of information from multiple web sources is an essential yet complicated step for data analysis in multiple domains. In this paper, we present a data extraction model based on visual segmentation, the DOM tree and a JSON approach, known as Wrapper Extraction of Image using DOM and JSON (WEIDJ), for extracting semi-structured data from biodiversity websites. Image information from multiple web sources is extracted using three different approaches: the Document Object Model (DOM), the Wrapper image using Hybrid DOM and JSON (WHDJ) and the Wrapper Extraction of Image using DOM and JSON (WEIDJ). Experiments were conducted on several biodiversity websites. The results show that the WEIDJ approach yields promising results with respect to time analysis values. The WEIDJ wrapper successfully extracted more than 100 images from the multi-source biodiversity web across over 15 different websites.
8

Ahmad Sabri, Ily Amalina, and Mustafa Man. "A deep web data extraction model for web mining: a review." Indonesian Journal of Electrical Engineering and Computer Science 23, no. 1 (July 1, 2021): 519. http://dx.doi.org/10.11591/ijeecs.v23.i1.pp519-528.

Abstract:
The World Wide Web has become a large pool of information. Extracting structured data from published web pages has drawn attention in the last decade. The process of web data extraction (WDE) faces many challenges, due to the variety of web data and the unstructured data in hypertext markup language (HTML) files. The aim of this paper is to provide a comprehensive overview of current web data extraction techniques in terms of the quality of the extracted data. The paper focuses on data extraction using wrapper approaches and compares them to identify the best approach for extracting data from online sites. To observe the efficiency of the proposed model, we compare the performance of single-web-page data extraction across different models: the document object model (DOM), the wrapper using hybrid DOM and JSON (WHDJ), the wrapper extraction of image using DOM and JSON (WEIDJ) and WEIDJ (no rules). Finally, the experiments showed that WEIDJ extracts data fastest and with the lowest time consumption compared to the other proposed methods.
9

Liu, Shuai, Ling Li Zhao, and Jun Sheng Li. "A Kind of Integrated Model for Panorama, Terrain and 3D Data Based on GML." Advanced Materials Research 955-959 (June 2014): 3850–53. http://dx.doi.org/10.4028/www.scientific.net/amr.955-959.3850.

Abstract:
A panorama image provides a 360-degree view from one hotspot, which can address the shortcomings of traditional three-dimensional representation: inadequate authenticity, difficult data acquisition, and laborious, time-consuming modeling. However, other geographic information is still needed. We therefore propose an integrated model based on GML, which contains a set of data structures for rapidly obtaining panorama, terrain and 3D data from a GML file, after analyzing the GML file structure and parsing it with the Document Object Model (DOM). The experiment shows that the integrated model works well in web applications using PTViewer, Java 3D and related Web technologies.
10

Ran, Peipei, Wenjie Yang, Zhongyue Da, and Yuke Huo. "Work orders management based on XML file in printing." ITM Web of Conferences 17 (2018): 03009. http://dx.doi.org/10.1051/itmconf/20181703009.

Abstract:
Extensible Markup Language (XML) technology is increasingly used in various fields; using it to express work order information can improve the efficiency of management and production. Based on these features, this paper introduces a management technology for work orders and generates an XML file through the Document Object Model (DOM). When the information is needed for production, the XML file is parsed and the information is saved in a database, which facilitates preserving and modifying the information.
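A small sketch of reading a work order from XML with the DOM API (the element names are hypothetical; the paper defines its own work-order schema):

```typescript
const xml = `
  <workOrder id="WO-001">
    <product>Brochure</product>
    <quantity>5000</quantity>
  </workOrder>`;

// Parse the XML into a DOM tree and pull out the fields needed for production.
const doc = new DOMParser().parseFromString(xml, "application/xml");
const order = {
  id: doc.documentElement.getAttribute("id"),
  product: doc.querySelector("product")?.textContent ?? "",
  quantity: Number(doc.querySelector("quantity")?.textContent ?? 0),
};
// `order` can now be validated and written to the production database.
```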
11

Xia, Xiang, Zhi Shu Li, and Yi Xiang Fan. "The Advanced "Rich-Client" Method Based on DOM for the Dynamic and Configurable Web Application." Advanced Materials Research 756-759 (September 2013): 1691–95. http://dx.doi.org/10.4028/www.scientific.net/amr.756-759.1691.

Abstract:
To meet user requirements for dynamic customization and configuration of changeable and complicated page functionality on the client when constructing a web application platform, an advanced rich-client method and technology based on the DOM (Document Object Model) was designed and used to develop the client module. The rich-client module sits in a traditional J2EE (Java 2 Enterprise Edition) architecture using the client-centric MVC (Model-View-Control) mode. On the client side, following a dynamic page generation algorithm, developers wrote JavaScript based on the DOM and Ajax (Asynchronous JavaScript and XML) for user customization and chose some of the third-party open-source Extjs (Extendable JavaScript) components as page elements to generate a client-side, dynamically configurable interface. From a user experience perspective, the good performance test results of the advanced rich-client method demonstrate the distinguishing features of the new method.
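A bare-bones sketch of configuration-driven page generation with the DOM and an asynchronous request (the endpoint and configuration shape are invented, and the paper builds its interface from ExtJS components rather than raw elements):

```typescript
interface PanelConfig {
  id: string;
  title: string;
}

// Fetch the user's layout configuration and build the interface from it.
async function buildConfigurableUI(): Promise<void> {
  const response = await fetch("/api/user-layout");      // Ajax-style request
  const panels: PanelConfig[] = await response.json();
  const root = document.getElementById("workspace")!;
  root.innerHTML = "";
  for (const panel of panels) {
    const section = document.createElement("section");   // DOM-based generation
    section.id = panel.id;
    section.textContent = panel.title;
    root.appendChild(section);
  }
}
```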
12

Uçar, Erdem, Erdinç Uzun, and Pınar Tüfekci. "A novel algorithm for extracting the user reviews from web pages." Journal of Information Science 43, no. 5 (September 1, 2016): 696–712. http://dx.doi.org/10.1177/0165551516666446.

Abstract:
Extracting the user reviews in websites such as forums, blogs, newspapers, commerce, trips, etc. is crucial for text processing applications (e.g. sentiment analysis, trend detection/monitoring and recommendation systems) which are needed to deal with structured data. Traditional algorithms have three processes consisting of Document Object Model (DOM) tree creation, extraction of features obtained from this tree and machine learning. However, these algorithms increase time complexity of extraction process. This study proposes a novel algorithm that involves two complementary stages. The first stage determines which HTML tags correspond to review layout for a web domain by using the DOM tree as well as its features and decision tree learning. The second stage extracts review layout for web pages in a web domain using the found tags obtained from the first stage. This stage is more time-efficient, being approximately 21 times faster compared to the first stage. Moreover, it achieves a relatively high accuracy of 96.67% in our experiments of review block extraction.
13

Mironov, Valeriy V., Artem S. Gusarenko, and Nafisa I. Yusupova. "Software extract data from word-based documents situationally-oriented approach." Journal Of Applied Informatics 16, no. 96 (December 24, 2021): 66–83. http://dx.doi.org/10.37791/2687-0649-2021-16-6-66-83.

Abstract:
The article discusses the use of a situation-oriented approach to the software processing of Word documents. The documents under consideration are prepared by the user in Microsoft Word or its analogs and are subsequently used as data sources. The openness of Office Open XML and the Open Document Format made it possible to apply the concept of virtual documents mapped to ZIP archives for programmatic access to the XML components of Word documents in a situational environment. The importance of establishing preliminary agreements on the placement of information in the document for subsequent search and retrieval, for example, using pre-prepared templates, is substantiated. For the DOCX and ODT formats, the article discusses the use of key phrases, bookmarks, content controls and custom XML components to organize the extraction of entered data. For each option, tree-like models of access to the extracted data, as well as the corresponding XPath expressions, are built. It is noted that the choice of option depends on the functionality and limitations of the word processor and is characterized by varying complexity of developing a blank template, entering data by the user and programming the data extraction. The applied solution is based on entering metadata into the article using content controls placed in a stub template and bound to elements of a custom XML component. The developed hierarchical situational model (HSM) provides extraction of an XML component, loading it into a DOM object and applying XSLT transformations to obtain the resulting data: an error report and JavaScript code for subsequent use of the extracted metadata.
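Once an XML component has been pulled out of the DOCX/ODT ZIP archive, it can be loaded into a DOM and queried with XPath; a browser-side TypeScript sketch (the component content and element names are invented for illustration):

```typescript
const componentXml = `
  <articleMeta>
    <author>Ivanov I. I.</author>
    <doi>10.0000/example</doi>
  </articleMeta>`;

// Load the extracted component into a DOM object and evaluate an XPath expression.
const meta = new DOMParser().parseFromString(componentXml, "application/xml");
const author = meta.evaluate(
  "/articleMeta/author/text()", meta, null, XPathResult.STRING_TYPE, null
);
console.log(author.stringValue);   // "Ivanov I. I."
```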
14

Yu, Lehe, and Zhengxiu Gui. "Analysis of Enterprise Social Media Intelligence Acquisition Based on Data Crawler Technology." Entrepreneurship Research Journal 11, no. 2 (February 22, 2021): 3–23. http://dx.doi.org/10.1515/erj-2020-0267.

Abstract:
Abstract There are generally hundreds of millions of nodes in social media, and they are connected to a huge social network through attention and fan relationships. The news is spread through this huge social network. This paper studies the acquisition technology of social media topic data and enterprise data. The topic positioning technology based on Sina meta search and topic related keywords is introduced, and the crawling efficiency of topic crawlers is analyzed. Aiming at the factors of diverse and variable webpage structure on the Internet, this paper proposes a new Web information extraction algorithm by studying the general laws existing in the webpage structure, combining DOM (Document Object Model) tree and DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm. Several links in the algorithm are introduced in detail, including Web page processing, DOM tree construction, segmented text content acquisition, and web content extraction based on the DBSCAN algorithm. The simulation results show that the intelligence culture, intelligence system, technology platform and intelligence organization ecological collaboration strategy under the extraction of DOM tree and DBSCAN information can improve the level of intelligence participation of all employees. There is a significant positive correlation between the level of participation and the level of the intelligence environment of all employees. According to the research results, the DOM tree and DBSCAN information proposed in this paper can extract the enterprise’s employee intelligence and the effective implementation of relevant collaborative strategies, which can provide guidance for the effective implementation of the employee intelligence.
15

He, Zecheng, Srinivas Sunkara, Xiaoxue Zang, Ying Xu, Lijuan Liu, Nevan Wichers, Gabriel Schubiner, Ruby Lee, and Jindong Chen. "ActionBert: Leveraging User Actions for Semantic Understanding of User Interfaces." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 7 (May 18, 2021): 5931–38. http://dx.doi.org/10.1609/aaai.v35i7.16741.

Abstract:
As mobile devices are becoming ubiquitous, regularly interacting with a variety of user interfaces (UIs) is a common aspect of daily life for many people. To improve the accessibility of these devices and to enable their usage in a variety of settings, building models that can assist users and accomplish tasks through the UI is vitally important. However, there are several challenges to achieve this. First, UI components of similar appearance can have different functionalities, making understanding their function more important than just analyzing their appearance. Second, domain-specific features like Document Object Model (DOM) in web pages and View Hierarchy (VH) in mobile applications provide important signals about the semantics of UI elements, but these features are not in a natural language format. Third, owing to a large diversity in UIs and absence of standard DOM or VH representations, building a UI understanding model with high coverage requires large amounts of training data. Inspired by the success of pre-training based approaches in NLP for tackling a variety of problems in a data-efficient way, we introduce a new pre-trained UI representation model called ActionBert. Our methodology is designed to leverage visual, linguistic and domain-specific features in user interaction traces to pre-train generic feature representations of UIs and their components. Our key intuition is that user actions, e.g., a sequence of clicks on different UI components, reveals important information about their functionality. We evaluate the proposed model on a wide variety of downstream tasks, ranging from icon classification to UI component retrieval based on its natural language description. Experiments show that the proposed ActionBert model outperforms multi-modal baselines across all downstream tasks by up to 15.5%.
16

Miyashita, Hisashi, and Hironobu Takagi. "Multimedia Content Formats in Depth; How Do They Make Interactive Broadcast/Communication Services Possible? (4); Declarative Data Format (3) -Document Object Model (DOM)/Scripting Language-." Journal of The Institute of Image Information and Television Engineers 61, no. 4 (2006): 453–58. http://dx.doi.org/10.3169/itej.61.453.

17

Sulikowski, Piotr, Tomasz Zdziebko, Kristof Coussement, Krzysztof Dyczkowski, Krzysztof Kluza, and Karina Sachpazidu-Wójcicka. "Gaze and Event Tracking for Evaluation of Recommendation-Driven Purchase." Sensors 21, no. 4 (February 16, 2021): 1381. http://dx.doi.org/10.3390/s21041381.

Abstract:
Recommendation systems play an important role in e-commerce turnover by presenting personalized recommendations. Due to the vast amount of marketing content online, users are less susceptible to these suggestions. In addition to the accuracy of a recommendation, its presentation, layout, and other visual aspects can improve its effectiveness. This study evaluates the visual aspects of recommender interfaces. Vertical and horizontal recommendation layouts are tested, along with different visual intensity levels of item presentation, and conclusions obtained with a number of popular machine learning methods are discussed. Results from the implicit feedback study of the effectiveness of recommending interfaces for four major e-commerce websites are presented. Two different methods of observing user behavior were used, i.e., eye-tracking and document object model (DOM) implicit event tracking in the browser, which allowed collecting a large amount of data related to user activity and physical parameters of recommending interfaces. Results have been analyzed in order to compare the reliability and applicability of both methods. Observations made with eye tracking and event tracking led to similar results regarding recommendation interface evaluation. In general, vertical interfaces showed higher effectiveness compared to horizontal ones, with the first and second positions working best, and the worse performance of horizontal interfaces probably being connected with banner blindness. Neural networks provided the best modeling results of the recommendation-driven purchase (RDP) phenomenon.
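A sketch of the implicit DOM event tracking half of such a setup: record pointer and click events on recommendation items with timestamps (the selector, data attribute and collection endpoint are hypothetical, not taken from the study):

```typescript
interface InteractionEvent {
  item: string;
  type: string;
  t: number;
}
const buffer: InteractionEvent[] = [];

// Attach listeners to every recommendation box on the page.
document.querySelectorAll<HTMLElement>(".recommendation").forEach((el) => {
  for (const type of ["mouseenter", "mouseleave", "click"] as const) {
    el.addEventListener(type, () => {
      buffer.push({ item: el.dataset.itemId ?? "", type, t: performance.now() });
    });
  }
});

// Periodically flush the buffered events to the collection endpoint.
setInterval(() => {
  if (buffer.length) {
    navigator.sendBeacon("/track", JSON.stringify(buffer.splice(0)));
  }
}, 5000);
```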
18

Qi, Huiyu, Nobuo Funabiki, Khaing Hsu Wai, Xiqin Lu, Htoo Htoo Sandi Kyaw, and Wen-Chung Kao. "An Implementation of Element Fill-in-Blank Problems for Code Understanding Study of JavaScript-Based Web-Client Programming." International Journal of Information and Education Technology 12, no. 11 (2022): 1179–84. http://dx.doi.org/10.18178/ijiet.2022.12.11.1736.

Abstract:
At present, web-client programming using HTML, CSS, and JavaScript is essential in web application systems to offer dynamic behaviors in web pages. With rich libraries and short coding features, it becomes common in developing user interfaces. However, the teaching course is not common in universities due to limited time. Therefore, self-study tools are strongly desired to promote it in societies. Previously, we have studied the programming learning assistant system (PLAS) as a programming self-study platform. In PLAS, among several types of programming problems, the element fill-in-blank problem (EFP) has been implemented for code understanding study of C and Java programming. In an EFP instance, the blank elements in a source code should be filled in with the proper words, where the correctness is checked by string matching. In this paper, we implement EFP for web-client programming in PLAS. In a web page, HTML and CSS define the components with tags in the document object model (DOM), and JavaScript offers their dynamic changes with libraries, which are blanked in EFP. Besides, a set of web page screenshots are given to help the solution. For evaluations, the generated 21 EFP instances were assigned to 20 master students in Okayama University. By analyzing their solution results, the effectiveness was confirmed for JavaScript programming learning.
19

Gupta, Shashank, and B. B. Gupta. "Smart XSS Attack Surveillance System for OSN in Virtualized Intelligence Network of Nodes of Fog Computing." International Journal of Web Services Research 14, no. 4 (October 2017): 1–32. http://dx.doi.org/10.4018/ijwsr.2017100101.

Abstract:
This article introduces a distributed intelligence network of Fog computing nodes and Cloud data centres that protects smart devices against XSS vulnerabilities in Online Social Networks (OSN). The cloud data centres compute the features of JavaScript, inject them in the form of comments and save them in the script nodes of the Document Object Model (DOM) tree. The network of Fog devices re-executes the feature computation and comment injection process on the HTTP response message and compares the resulting comments with those calculated in the cloud data centres. Any divergence observed raises an alarm signalling the injection of XSS worms at the fog nodes located at the edge of the network. Such worms are mitigated by executing nested context-sensitive sanitization on the malicious JavaScript variables embedded in them. The prototype of the authors' work was developed in a Java development framework and installed on the virtual machines of Cloud data centres (typically located at the core of the network) and on the nodes of Fog devices (positioned at the edge of the network). Vulnerable OSN-based web applications were utilized to evaluate the XSS worm detection capability of the authors' framework, and the evaluation results revealed that their work detects the injection of XSS worms with a high precision rate and low rates of false positives and false negatives.
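A rough sketch of the comparison step on the fog side: each script node is expected to carry a feature comment injected upstream, and the locally recomputed features must match (the feature used here, a bare character count, is a placeholder for the paper's feature set):

```typescript
// Placeholder feature computation; the real framework derives richer JavaScript features.
function localFeatures(code: string): string {
  return `len=${code.length}`;
}

document.querySelectorAll("script").forEach((scriptNode) => {
  const code = scriptNode.textContent ?? "";
  const injected = code.match(/^\/\*features:(.*?)\*\//);   // comment added in the cloud
  const body = injected ? code.replace(injected[0], "") : code;
  if (!injected || injected[1] !== localFeatures(body)) {
    console.warn("Divergence: possible XSS worm injected into script node", scriptNode);
  }
});
```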
20

Qi, Nian, and Ji Hong Ye. "Nonlinear Dynamic Analysis of Space Frame Structures by Discrete Element Method." Applied Mechanics and Materials 638-640 (September 2014): 1716–19. http://dx.doi.org/10.4028/www.scientific.net/amm.638-640.1716.

Abstract:
This document explores the possibility of the discrete element method (DEM) being applied in nonlinear dynamic analysis of space frame structures. The method models the analyzed object to be composed by finite particles and the Newton’s second law is applied to describe each particle’s motion. The parallel-bond model is adopted during the calculation of internal force and moment arising from the deformation. The procedure of analysis is vastly simple, accurate and versatile. Numerical examples are given to demonstrate the accuracy and applicability of this method in handling the large deflection and dynamic behaviour of space frame structures. Besides, the method does not need to form stiffness matrix or iterations, so it is more advantageous than traditional nonlinear finite element method.
21

Al-Dailami, Abdulrahman, Chang Ruan, Zhihong Bao, and Tao Zhang. "QoS3: Secure Caching in HTTPS Based on Fine-Grained Trust Delegation." Security and Communication Networks 2019 (December 28, 2019): 1–16. http://dx.doi.org/10.1155/2019/3107543.

Abstract:
With the ever-increasing concern in network security and privacy, a major portion of Internet traffic is encrypted now. Recent research shows that more than 70% of Internet content is transmitted using HyperText Transfer Protocol Secure (HTTPS). However, HTTPS encryption eliminates the advantages of many intermediate services like the caching proxy, which can significantly degrade the performance of web content delivery. We argue that these restrictions lead to the need for other mechanisms to access sites quickly and safely. In this paper, we introduce QoS3, which is a protocol that can overcome such limitations by allowing clients to explicitly and securely re-introduce in-network caching proxies using fine-grained trust delegation without compromising the integrity of the HTTPS content and modifying the format of Transport Layer Security (TLS). In QoS3, we classify web page contents into two types: (1) public contents that are common for all users, which can be stored in the caching proxies, and (2) private contents that are specific for each user. Correspondingly, QoS3 establishes two separate TLS connections between the client and the web server for them. Specifically, for private contents, QoS3 just leverages the original HTTPS protocol to deliver them, without involving any middlebox. For public contents, QoS3 allows clients to delegate trust to specific caching proxy along the path, thereby allowing the clients to use the cached contents in the caching proxy via a delegated HTTPS connection. Meanwhile, to prevent Man-in-the-Middle (MitM) attacks on public contents, QoS3 validates the public contents by employing Document object Model (DoM) object-level checksums, which are delivered through the original HTTPS connection. We implement a prototype of QoS3 and evaluate its performance in our testbed. Experimental results show that QoS3 provides acceleration on page load time ranging between 30% and 64% over traditional HTTPS with negligible overhead. Moreover, QoS3 is deployable since it requires just minor software modifications to the server, client, and the middlebox.
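A sketch of what a DOM object-level checksum could look like on the client: hash the serialized markup of a public content block and compare it with the value received over the original HTTPS connection (SHA-256 is our assumption; the abstract does not name a digest):

```typescript
// Compute a hex digest over the serialized markup of one DOM object.
async function domChecksum(el: Element): Promise<string> {
  const bytes = new TextEncoder().encode(el.outerHTML);
  const digest = await crypto.subtle.digest("SHA-256", bytes);
  return Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
}

// The client accepts a cached block only if this value matches the checksum
// delivered through the non-cached HTTPS connection.
```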
22

Fernandez-Tudela, Elisa, Luis C. Zambrano, Lázaro G. Lagóstena, and Manuel Bethencourt. "Documentación y análisis de un cepo de ancla romano y sus elementos iconográficos y epigráficos sellados." Virtual Archaeology Review 13, no. 26 (January 21, 2022): 147–62. http://dx.doi.org/10.4995/var.2022.15349.

Abstract:
This paper aims to present the documentation and analysis methodology carried out on a lead trap from the ancient period, which belongs to the collection of traps in the Museum of Cádiz (Andalusia, Spain). The anchor stock had some interesting characteristics for this research. On the one hand, from the point of view of conservation and restoration, due to the alterations it presented. On the other hand, from a historical and archaeological point of view, it showed signs of reliefs on its surface hidden under the alteration products. The removal of the different layers of alteration that covered the surface during conservation and restoration treatments revealed an unpublished iconographic and epigraphic programme, as well as possible marks of use and manufacture. The poor state of conservation of the original surface made it impossible to visualise the details as a whole, so we applied photogrammetric methods, and subsequently processed models using various GIS analysis and point cloud processing softwares.Two photogrammetric models (in Agisoft PhotoScan) were made to document the trap in general: one prior to the conservation and restoration process; and a second three-dimensional (3D) model once the surface had been cleaned. The purpose of the second model was to visualise the reliefs programme in general, as well as the different surface details. The first complete 3D model of the object was used to perform a virtual reconstruction of the anchor including the elements that did not preserve, using a 3D modelling program (Blender).Nine areas of the stock surface were selected for the analyses of the various iconographic and epigraphic features, which were documented and processed in Agisoft PhotoScan. The Digital Elevation Model (DEM) and point cloud models were then processed with different analyses tools in Geographic Information System (GIS) (such as QGIS) and point cloud processing software (CloudCompare). Our results document a piece of highly interesting information from its surface consisting of reliefs of four dolphins; at least four rectangular stamps: two of them with possible inscriptions, and an anthropomorphic figure. Thanks to the comparative data, we conclude that the four dolphins were made with the same stamp during the stock manufacturing process. Further, we were able to reconstruct the dolphin stamp, partially preserved in each of the reliefs, by unifying the 3D models, thus revealing the original set. This system of stamping by means of reusable dies is well known in other elements such as amphorae but has not been studied in the specific case of lead traps.In the case of the epigraphic elements, the 3D documentation methodology revealed numerous micro-surface details, not visible under conventional documentation techniques, which could help specialists to interpret these inscriptions. Although they have not been analysed in this research, its documentation has promoted the appreciation of surface details that could refer to the manufacturing processes (moulds and tools) or the traces of use, providing historical information on this object. At the same time, the virtual reconstruction of the anchor has aided the formation of hypotheses on the dimensions and original appearance of the anchor. The different tools used, such as raster analysis using shadow mapping and point cloud alignment, proved to be very effective. 
They have fulfilled the established objectives and have helped to establish a possible analysis methodology for future lead traps with decorative elements. These types of artefacts recovered from underwater sites are very common in museum collections. In many cases, their state of conservation and the difficulty in handling them due to their size and weight make it difficult to document surface details. In this case, the multidisciplinary work of conservation and 3D documentation allows for high-quality documentation that is easy to access and exchange between researchers. The combined use of photogrammetric techniques with virtual RTI provides a non-invasive method for the object, low cost and easy processing compared to other conventional methods.
23

Chrismanto, Antonius Rachmat, Willy Sudiarto Raharjo, and Yuan Lukito. "Firefox Extension untuk Klasifikasi Komentar Spam pada Instagram Berbasis REST Services." Jurnal Edukasi dan Penelitian Informatika (JEPIN) 5, no. 2 (August 6, 2019): 146. http://dx.doi.org/10.26418/jp.v5i2.33010.

Abstract:
Spam comment classification on Instagram (IG) can only be used by users through a system that runs on the client side, because IG data cannot be manipulated from outside IG. A system is therefore needed that can manipulate data on the client side in the form of a browser extension. This research focuses on developing a Firefox browser extension that uses REST web services on a cloud service built on the Amazon Web Services (AWS) platform. The developed browser extension uses two classification algorithms, namely KNN and Distance-Weighted KNN (DW-KNN). The extension is able to mark spam comments by changing the Instagram Document Object Model (DOM) so that they appear in red with a strikethrough. The extension was developed using the Rapid Application Development (RAD) method. Testing covered the implemented browser extension and the measured accuracy of the web service (the KNN and DW-KNN algorithms). The browser extension implementation was tested with functional testing, in which each implemented feature was checked against the previously defined specifications. The accuracy of the web service was tested with the help of the SOAPUI tool. The extension test results were: (1) testing the extension on arbitrary web pages was 100% successful, (2) testing on the IG start (default) page was 100% successful, (3) testing on the profile page of an IG account was 100% successful, (4) testing on an IG post and its comments was not always successful, because it depends on the capability of the algorithm in the web services, (5) testing for languages other than Indonesian was not always successful, because it depends on the language library, (6) testing the 'load more comments' feature on IG was not always successful, because it depends on the algorithm in the web services, and (7) testing the algorithm selection in the extension options was 100% successful. The highest average accuracy of the KNN algorithm was 80% for k=1, while that of DW-KNN was 90% for k=2.
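A content-script sketch of the marking step: once the REST service classifies a comment as spam, its DOM node is coloured red and struck through (the comment selector and endpoint are hypothetical; Instagram's real markup differs and changes often):

```typescript
async function markSpamComments(): Promise<void> {
  const comments = document.querySelectorAll<HTMLElement>("ul li span.comment-text");
  for (const el of Array.from(comments)) {
    // Ask the cloud-hosted classifier (KNN or DW-KNN) whether this comment is spam.
    const res = await fetch("https://classifier.example.com/classify", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ text: el.textContent, algorithm: "dw-knn" }),
    });
    const { spam } = await res.json();
    if (spam) {
      el.style.color = "red";
      el.style.textDecoration = "line-through";   // strikethrough in the IG DOM
    }
  }
}
```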
24

Fang, Xiu Susie, Quan Z. Sheng, Xianzhi Wang, Anne H. H. Ngu, and Yihong Zhang. "GrandBase: generating actionable knowledge from Big Data." PSU Research Review 1, no. 2 (August 14, 2017): 105–26. http://dx.doi.org/10.1108/prr-01-2017-0005.

Abstract:
Purpose This paper aims to propose a system for generating actionable knowledge from Big Data and use this system to construct a comprehensive knowledge base (KB), called GrandBase. Design/methodology/approach In particular, this study extracts new predicates from four types of data sources, namely, Web texts, Document Object Model (DOM) trees, existing KBs and query stream to augment the ontology of the existing KB (i.e. Freebase). In addition, a graph-based approach to conduct better truth discovery for multi-valued predicates is also proposed. Findings Empirical studies demonstrate the effectiveness of the approaches presented in this study and the potential of GrandBase. The future research directions regarding GrandBase construction and extension has also been discussed. Originality/value To revolutionize our modern society by using the wisdom of Big Data, considerable KBs have been constructed to feed the massive knowledge-driven applications with Resource Description Framework triples. The important challenges for KB construction include extracting information from large-scale, possibly conflicting and different-structured data sources (i.e. the knowledge extraction problem) and reconciling the conflicts that reside in the sources (i.e. the truth discovery problem). Tremendous research efforts have been contributed on both problems. However, the existing KBs are far from being comprehensive and accurate: first, existing knowledge extraction systems retrieve data from limited types of Web sources; second, existing truth discovery approaches commonly assume each predicate has only one true value. In this paper, the focus is on the problem of generating actionable knowledge from Big Data. A system is proposed, which consists of two phases, namely, knowledge extraction and truth discovery, to construct a broader KB, called GrandBase.
25

Ратов, Д. В. "Object adaptation of Drag and Drop technology for web-system interface components." ВІСНИК СХІДНОУКРАЇНСЬКОГО НАЦІОНАЛЬНОГО УНІВЕРСИТЕТУ імені Володимира Даля, no. 4(268) (June 10, 2021): 7–12. http://dx.doi.org/10.33216/1998-7927-2021-268-4-7-12.

Abstract:
Today, cloud technologies are often used in the development of information systems for remote computing and data processing. On the basis of web technologies, libraries and frameworks have been developed for creating web applications and user interfaces that allow information systems to run in browsers. Ready-made JavaScript libraries exist for adding drag-and-drop functionality to a web application. However, in some situations a library may not be available, or it may bring overhead or dependencies that the project does not need. In such situations, an alternative solution is the functionality of the APIs available in modern browsers. The article reviews the current state of drag-and-drop methods and proposes a programmatic way to improve the interface by creating a class for dragging and dropping elements when organizing work in multi-user information web systems. Drag and drop is a convenient way to improve an interface: grabbing an element with the mouse and moving it visually simplifies many operations, from copying and moving documents, as in file managers, to placing orders in online store services. The HTML drag-and-drop API uses the DOM event model to retrieve information about a dragged element and to update that element after the drag. Using JavaScript event handlers, any element of the web system can be turned into a draggable element or a drop target. To solve this problem, a JavaScript object was developed with methods that allow a copy of any object to be created and all of that object's events related to the drag-and-drop mechanism to be handled. The basic algorithm of the drag-and-drop technique is based on processing mouse events. The software implementation is described, and the results of the practical use of the object adaptation of drag-and-drop technology are presented for the interface components of a web system, the medical information system MedSystem, whose application modules implement a dispatcher and an interactive window interface. In the "Outpatient clinic" module, the drag-and-drop mechanism is used when working with the "Appointment sheet". In the "Hospital" module of the MedSystem medical information system, the drag-and-drop mechanism is used in the "List of doctor's appointments". The results of using the object adaptation of drag-and-drop technology have shown that this mechanism fits organically into existing technologies for building web applications and has sufficient potential to facilitate and automate work in multi-user information systems and web services.
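A minimal sketch of the DOM drag-and-drop pattern the article builds on: one element is made draggable and another accepts drops through three event handlers (the element ids are invented; MedSystem wraps this logic in a reusable class):

```typescript
const card = document.getElementById("appointment-1") as HTMLElement;
const slot = document.getElementById("schedule-slot") as HTMLElement;

card.draggable = true;
card.addEventListener("dragstart", (e: DragEvent) => {
  e.dataTransfer?.setData("text/plain", card.id);          // remember what is dragged
});

slot.addEventListener("dragover", (e: DragEvent) => e.preventDefault()); // allow drop
slot.addEventListener("drop", (e: DragEvent) => {
  e.preventDefault();
  const id = e.dataTransfer?.getData("text/plain");
  if (id) slot.appendChild(document.getElementById(id)!);  // move the node in the DOM
});
```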
26

Klochkov, Denys, and Jan Mulawka. "Improving Ruby on Rails-Based Web Application Performance." Information 12, no. 8 (August 9, 2021): 319. http://dx.doi.org/10.3390/info12080319.

Abstract:
The evolution of web development and web applications has resulted in creation of numerous tools and frameworks that facilitate the development process. Even though those frameworks make web development faster and more efficient, there are certain downsides to using them. A decrease in application performance when using an “off the shelf” framework might be a crucial disadvantage, especially given the vital role web application response time plays in user experience. This contribution focuses on a particular framework—Ruby on Rails. Once the most popular framework, it has now lost its leading position, partially due to slow performance metrics and response times, especially in larger applications. Improving and expanding upon the previous work in this field, an attempt to improve the response time of a specially developed benchmark application is made. This is achieved by performing optimizations that can be roughly divided into two groups. The first group concerns the frontend improvements, which include: adopting the client-side rendering, JavaScript Document Object Model (DOM) manipulation and asynchronous requests. Another group can be described as the backend improvements, which include implementing intelligent, granular caching, disabling redundant modules, as well as profiling and optimizing database requests and reducing database access inefficiencies. Those improvements resulted in overall up to 74% decreased page loading times, with perceived application performance being improved above this mark due to the adoption of a client-side rendering strategy. Using the different metrics of application performance measurements, each of the improvement steps is evaluated with regards to its effect on different aspects of overall performance. In conclusion, this work presents a way to significantly decrease the response time of a particular Ruby on Rails application and simultaneously provide a better user experience. Even though the majority of this process is specific to Rails, similar steps can be taken to improve applications implemented with the use of other similar frameworks. As the result of the work, a groundwork is laid for the development of the tool that could assist the developers in improving their applications as well.
27

Tse, William T., Kevin K. Duh, and Morris Kletzel. "A Low-Cost, Open-Source Informatics Framework for Clinical Trials and Outcomes Research." Blood 118, no. 21 (November 18, 2011): 4763. http://dx.doi.org/10.1182/blood.v118.21.4763.4763.

Abstract:
Abstract Abstract 4763 Data collection and analysis in clinical studies in hematology often require the use of specialized databases, which demand extensive information technology (IT) support and are expensive to maintain. With the goal of reducing the cost of clinical trials and promoting outcomes research, we have devised a new informatics framework that is low-cost, low-maintenance, and adaptable to both small- and large-scale clinical studies. This framework is based on the idea that most clinical data are hierarchical in nature: a clinical protocol typically entails the creation of sequential patient files, each of which documents multiple encounters, during which clinical events and data are captured and tagged for later retrieval and analysis. These hierarchical trees of clinical data can be easily stored in a hypertext mark-up language (HTML) document format, which is designed to represent similar hierarchical data on web pages. In this framework, the stored clinical data will be structured according to a web standard called Document Object Model (DOM), for which powerful informatics techniques have been developed to allow efficient retrieval and collation of data from the HTML documents. The proposed framework has many potential advantages. The data will be stored in plain text files in the HTML format, which is both human and machine readable, hence facilitating data exchange between collaborative groups. The framework requires only a regular web browser to function, thereby easing its adoption in multiple institutions. There will be no need to set up or maintain a relational database for data storage, thus minimizing data fragmentation and reducing the demand for IT support. Data entry and analysis will be performed mostly on the client computer, requiring the use of a backend server only for central data storage. Utility programs for data management and manipulation will be written in Javascript and JQuery, computer languages that are free, open-source and easy to maintain. Data can be captured, retrieved, and analyzed on different devices, including desktop computers, tablets or smart phones. Encryption and password protection can be applied in document storage and data transmission to ensure data security and HIPPA compliance. In a pilot project to implement and test this informatics framework, we designed prototype programming modules to perform individual tasks commonly encountered in clinical data management. The functionalities of these modules included user-interface creation, patient data entry and retrieval, visualization and analysis of aggregate results, and exporting and reporting of extracted data. These modules were used to access simulated clinical data stored in a remote server, employing standard web browsers available on all desktop computers and mobile devices. To test the capability of these modules, benchmark tests were performed. Simulated datasets of complete patient records, each with 1000 data items, were created and stored in the remote server. Data were retrieved via the web using a gzip compressed format. Retrieval of 100, 300, 1000 such records took only 1.01, 2.45, and 6.67 seconds using a desktop computer via a broadband connection, or 3.67, 11.39, and 30.23 seconds using a tablet computer via a 3G connection. Filtering of specific data from the retrieved records was equally speedy. Automated extraction of relevant data from 300 complete records for a two-sample t-test analysis took 1.97 seconds. 
A similar extraction of data for a Kaplan-Meier survival analysis took 4.19 seconds. The program allowed the data to be presented separately for individual patients or in aggregation for different clinical subgroups. A user-friendly interface enabled viewing of the data in either tabular or graphical forms. Incorporation of a new web browser technique permitted caching of the entire dataset locally for off-line access and analysis. Adaptable programming allowed efficient export of data in different formats for regulatory reporting purposes. Once the system was set up, no further intervention from IT department was necessary. In summary, we have designed and implemented a prototype of a new informatics framework for clinical data management, which should be low-cost and highly adaptable to various types of clinical studies. Field-testing of this framework in real-life clinical studies will be the next step to demonstrate its effectiveness and potential benefits. Disclosures: No relevant conflicts of interest to declare.
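A toy sketch of the underlying idea: a patient record stored as a hierarchical HTML fragment can be filtered and aggregated with ordinary DOM queries, with no relational database involved (the element structure and attribute names are invented for illustration):

```typescript
const record = new DOMParser().parseFromString(`
  <div class="patient" data-id="P-001">
    <div class="encounter" data-date="2011-03-02">
      <span class="event" data-code="WBC" data-value="5.4"></span>
    </div>
  </div>`, "text/html");

// Pull every white-blood-cell value out of the record with a DOM selector.
const wbcValues = Array.from(
  record.querySelectorAll<HTMLElement>('.event[data-code="WBC"]')
).map((el) => Number(el.dataset.value));

console.log(wbcValues);   // [5.4] — ready for client-side analysis or plotting
```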
28

Pakhmutova, N. "Differential object marking in Ibero-Romance languages: Explanatory models in domestic and Russian educational literature." Rhema, no. 2, 2019 (2019): 61–76. http://dx.doi.org/10.31862/2500-2953-2019-2-61-76.

Abstract:
Differential object marking / dom is the term for the phenomenon of distinguishing two classes of direct objects, one bearing a special marker, while the other lacking it. In modern linguistics, the marker licensing is partially or fully attributed to the features of a direct object: Animacy/Inanimacy and referential status. Russian didactic literature generally contains a reduced explanatory model of Spanish dom, based on the grammar of the Royal Spanish Academy. For Catalan, the explanatory model is complicated by the usus/norm split, the latter reducing the phenomenon’s scope. The paper focuses on the improvement of dom explanatory models for Spanish and Catalan.
29

Ma, Jie, Dongyan Pei, Xuhan Zhang, Qiuying Lai, Fei He, Chao Fu, Jianhui Liu, and Weixin Li. "The Distribution of DOM in the Wanggang River Flowing into the East China Sea." International Journal of Environmental Research and Public Health 19, no. 15 (July 28, 2022): 9219. http://dx.doi.org/10.3390/ijerph19159219.

Abstract:
Dissolved organic matter (DOM) is a central component in the biogeochemical cycles of marine and terrestrial carbon pools, and its structural features greatly impact the function and behavior of ecosystems. In this study, the Wanggang River, which is a seagoing river that passes through Yancheng City, was selected as the research object. Three-dimensional (3D) fluorescence spectral data and UV–visible spectral data were used for component identification and source analysis of DOM based on the PARAFAC model. The results showed that the DOM content of the Wanggang River during the dry season was significantly higher than during the wet season; the DOM content increased gradually from the upper to lower reaches; the proportion of terrigenous components was higher during the wet season than during the dry. UV–Vis spectral data a280 and a355 indicated that the relative concentrations of protein-like components in the DOM of the Wanggang River were higher than those of humic-like components, and the ratio of aromatic substances in the DOM of the Wanggang River water was higher during the wet season. The DOM in the Wanggang River was dominated by protein-like components (>60%), and the protein-like components were dominated by tryptophan proteins (>40%). This study showed that the temporal and spatial distributions of DOM in rivers can be accurately determined using 3D fluorescence spectroscopy combined with the PARAFAC model. This provides useful insight into the biogeochemical process of DOM in rivers of coastal areas.
30

Lim, Taesoo, Hoontae Kim, Minsoo Kim, and Suk-Ho Kang. "Object-oriented XML document meta-model for B2B collaborations." Production Planning & Control 14, no. 8 (December 2003): 810–26. http://dx.doi.org/10.1080/09537280310001647887.

31

Tavakkoli Piralilou, Sepideh, Hejar Shahabi, Ben Jarihani, Omid Ghorbanzadeh, Thomas Blaschke, Khalil Gholamnia, Sansar Meena, and Jagannath Aryal. "Landslide Detection Using Multi-Scale Image Segmentation and Different Machine Learning Models in the Higher Himalayas." Remote Sensing 11, no. 21 (November 2, 2019): 2575. http://dx.doi.org/10.3390/rs11212575.

Abstract:
Landslides represent a severe hazard in many areas of the world. Accurate landslide maps are needed to document the occurrence and extent of landslides and to investigate their distribution, types, and the pattern of slope failures. Landslide maps are also crucial for determining landslide susceptibility and risk. Satellite data have been widely used for such investigations—next to data from airborne or unmanned aerial vehicle (UAV)-borne campaigns and Digital Elevation Models (DEMs). We have developed a methodology that incorporates object-based image analysis (OBIA) with three machine learning (ML) methods, namely, the multilayer perceptron neural network (MLP-NN) and random forest (RF), for landslide detection. We identified the optimal scale parameters (SP) and used them for multi-scale segmentation and further analysis. We evaluated the resulting objects using the object pureness index (OPI), object matching index (OMI), and object fitness index (OFI) measures. We then applied two different methods to optimize the landslide detection task: (a) an ensemble method of stacking that combines the different ML methods for improving the performance, and (b) Dempster–Shafer theory (DST), to combine the multi-scale segmentation and classification results. Through the combination of three ML methods and the multi-scale approach, the framework enhanced landslide detection when it was tested for detecting earthquake-triggered landslides in Rasuwa district, Nepal. PlanetScope optical satellite images and a DEM were used, along with the derived landslide conditioning factors. Different accuracy assessment measures were used to compare the results against a field-based landslide inventory. All ML methods yielded the highest overall accuracies ranging from 83.3% to 87.2% when using objects with the optimal SP compared to other SPs. However, applying DST to combine the multi-scale results of each ML method significantly increased the overall accuracies to almost 90%. Overall, the integration of OBIA with ML methods resulted in appropriate landslide detections, but using the optimal SP and ML method is crucial for success.
32

Sokolov, A. V. "The document as a cognitive object." Scientific and Technical Libraries, no. 8 (August 30, 2021): 13–38. http://dx.doi.org/10.33186/1027-3689-2021-8-13-38.

Abstract:
The problem of scholarly knowledge strata is one of the key subjects in modern philosophy of science and epistemology (theory of knowledge). The threelevel model is the most popular in the document sphere as a sociocultural domain; it comprises empirical, theoretical and philosophical scientific knowledge. The author describes collision and contradictions between theoretical documentology and the empirical disciplines of document studies and book studies. He appraises achievements of empirical knowledge in the documentosphere and compares theoretical and methodological ways of three types: informational, conventional and medialogical ones. The author is skeptical about the following statement: the concept of document is relative, conventional and conditional in its essence. The author examines “The Strategy for Librarianship in the Russian Federation for the Period up to the Year 2030”. He insists that the Strategy has to be supported by the scientific strategy aimed to promote each level of empirical, theoretical and philosophical knowledge. He is also concerned about lacking philosophical level in document studies. The wisdom of philosophy lies in its comprising both rational and irrational sides of objects being studied. With document rational and irrational attributes in view, he offers the philosophical definition of document: document is the statement of humanness within social environment and historical time. This definition comprises anthropological and national variations of homo sapiens and philosophical concept of humanness as the unity of the opposite universal principles – the material and ideal.
APA, Harvard, Vancouver, ISO, and other styles
33

Zhang, Fang Fang. "The Simulation of English Character Encryption Encoding Optimized Communication Model." Applied Mechanics and Materials 602-605 (August 2014): 3261–64. http://dx.doi.org/10.4028/www.scientific.net/amm.602-605.3261.

Full text
Abstract:
To enhance the security of English-character documents, data-level information security issues are studied for such documents. Object-oriented programming is used to obtain the character information in an English-character document together with its encoding, and the "long key stream" algorithm is used for encryption. During encryption and decryption, the document's format information is preserved without loss. The configuration of encryption and decryption is also introduced. An English-character document encrypted with this method cannot be cracked illegally. Research on this simulation model guarantees the information safety of English-character documents at the bottom level.
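The abstract does not spell out the "long key stream" algorithm, so the sketch below only illustrates the general pattern it suggests: extract the character content, XOR it with a keystream derived from a key, and leave the file-format information untouched so the document structure survives losslessly. The keystream construction and all names are assumptions for illustration only.

```python
import hashlib
from itertools import count

def keystream(key: bytes):
    """Derive an effectively unbounded keystream from a short key (illustrative only)."""
    for counter in count():
        block = hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        yield from block

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR data with the keystream; applying it twice restores the original."""
    ks = keystream(key)
    return bytes(b ^ next(ks) for b in data)

# Only the extracted character content is transformed; the file's format
# information would be stored untouched alongside the ciphertext.
key = b"demo-key"
ciphertext = xor_bytes("Hello, document!".encode("utf-8"), key)
recovered = xor_bytes(ciphertext, key).decode("utf-8")
assert recovered == "Hello, document!"
```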
APA, Harvard, Vancouver, ISO, and other styles
34

Hung-Yu Kao, Jan-Ming Ho, and Ming-Syan Chen. "WISDOM: Web intrapage informative structure mining based on document object model." IEEE Transactions on Knowledge and Data Engineering 17, no. 5 (May 2005): 614–27. http://dx.doi.org/10.1109/tkde.2005.84.

Full text
APA, Harvard, Vancouver, ISO, and other styles
35

Chillon, Alberto Hernandez, Diego Sevilla Ruiz, Jesus Garcia Molina, and Severino Feliciano Morales. "A Model-Driven Approach to Generate Schemas for Object-Document Mappers." IEEE Access 7 (2019): 59126–42. http://dx.doi.org/10.1109/access.2019.2915201.

Full text
APA, Harvard, Vancouver, ISO, and other styles
36

LALMAS, M., and T. ROLLEKE. "FOUR-VALUED KNOWLEDGE AUGMENTATION FOR STRUCTURED DOCUMENT RETRIEVAL." International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 11, no. 01 (February 2003): 67–86. http://dx.doi.org/10.1142/s0218488503001953.

Full text
Abstract:
Structured documents are composed of objects with a content and a logical structure. The effective retrieval of structured documents requires models that provide for a content-based retrieval of objects that takes into account their logical structure, so that the relevance of an object is not solely based on its content, but also on the logical structure among objects. This paper proposes a formal model for representing structured documents where the content of an object is viewed as the knowledge contained in that object, and the logical structure among objects is captured by a process of knowledge augmentation: the knowledge contained in an object is augmented with that of its structurally related objects. The knowledge augmentation process takes into account the fact that knowledge can be incomplete and become inconsistent.
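One way to picture the augmentation step is with Belnap-style four-valued truth values, where combining contradictory knowledge from structurally related objects yields the value "inconsistent". The Python sketch below is only a loose illustration of that idea, not the paper's formal model; the propositions and the join operation are assumptions.

```python
from enum import Enum

class Truth(Enum):
    UNKNOWN = "unknown"        # no information
    TRUE = "true"
    FALSE = "false"
    INCONSISTENT = "both"      # contradictory information

def join(a: Truth, b: Truth) -> Truth:
    """Knowledge-ordering join of four-valued logic."""
    if a == b or b == Truth.UNKNOWN:
        return a
    if a == Truth.UNKNOWN:
        return b
    return Truth.INCONSISTENT   # TRUE meets FALSE, or anything meets INCONSISTENT

def augment(own: dict, children: list) -> dict:
    """Augment an object's knowledge with that of its structural children."""
    merged = dict(own)
    for child in children:
        for prop, value in child.items():
            merged[prop] = join(merged.get(prop, Truth.UNKNOWN), value)
    return merged

section = {"about_sailing": Truth.TRUE}
para_1 = {"about_sailing": Truth.TRUE, "about_weather": Truth.TRUE}
para_2 = {"about_weather": Truth.FALSE}
print(augment(section, [para_1, para_2]))
# about_weather becomes INCONSISTENT because the children contradict each other
```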
APA, Harvard, Vancouver, ISO, and other styles
37

Kallempudi, Goutham, Khurram Azeem Hashmi, Alain Pagani, Marcus Liwicki, Didier Stricker, and Muhammad Zeshan Afzal. "Toward Semi-Supervised Graphical Object Detection in Document Images." Future Internet 14, no. 6 (June 8, 2022): 176. http://dx.doi.org/10.3390/fi14060176.

Full text
Abstract:
The graphical page object detection task classifies and localizes objects such as Tables and Figures in a document. As deep learning techniques for object detection become increasingly successful, many supervised deep neural network-based methods have been introduced to recognize graphical objects in documents. However, these models necessitate a substantial amount of labeled data for the training process. This paper presents an end-to-end semi-supervised framework for graphical object detection in scanned document images to address this limitation. Our method is based on the recently proposed Soft Teacher mechanism and examines the effect of small percentages of labeled data on the classification and localization of graphical objects. On both the PubLayNet and the IIIT-AR-13K datasets, the proposed approach outperforms the supervised models by a significant margin at all labeling ratios (1%, 5%, and 10%). Furthermore, the 10% PubLayNet Soft Teacher model improves the average precision of Table, Figure, and List by +5.4, +1.2, and +3.2 points, respectively, with a total mAP similar to the Faster R-CNN baseline. Moreover, our model trained on 10% of the IIIT-AR-13K labeled data beats the previous fully supervised method by +4.5 points.
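The Soft Teacher mechanism referenced here broadly follows a teacher-student pseudo-labelling scheme: an exponential-moving-average teacher labels unlabelled images, and only confident pseudo-labels contribute to the student's loss. The PyTorch sketch below replaces the detector with a toy classifier to keep it self-contained; the threshold, momentum, and model are assumptions, not the authors' code.

```python
import copy
import torch
import torch.nn as nn

# Toy stand-in for a detector: a small classifier over 16-dim "region features".
student = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 3))
teacher = copy.deepcopy(student)          # teacher starts as a copy of the student
for p in teacher.parameters():
    p.requires_grad_(False)

optimizer = torch.optim.SGD(student.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

def ema_update(teacher, student, momentum=0.999):
    """Teacher weights track the student as an exponential moving average."""
    with torch.no_grad():
        for tp, sp in zip(teacher.parameters(), student.parameters()):
            tp.mul_(momentum).add_(sp, alpha=1.0 - momentum)

for step in range(100):
    labelled_x, labelled_y = torch.randn(8, 16), torch.randint(0, 3, (8,))
    unlabelled_x = torch.randn(32, 16)

    # Teacher produces pseudo-labels; keep only the confident ones.
    with torch.no_grad():
        probs = teacher(unlabelled_x).softmax(dim=1)
        conf, pseudo_y = probs.max(dim=1)
        keep = conf > 0.9

    loss = criterion(student(labelled_x), labelled_y)
    if keep.any():
        loss = loss + criterion(student(unlabelled_x[keep]), pseudo_y[keep])

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    ema_update(teacher, student)
```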
APA, Harvard, Vancouver, ISO, and other styles
38

Shulginov, V. A. "COGNITIVE MODEL OF HYPERTEXT." Bulletin of Kemerovo State University, no. 4 (November 26, 2016): 233–38. http://dx.doi.org/10.21603/2078-8975-2016-4-233-238.

Full text
Abstract:
With digital technology, reading and writing manifest themselves as an extensively multi-sensory activity entailing perceptual, cognitive, and motor interactions with digital text. The paper studies a cognitive model of electronic hypertext, in which the connectivity between documents plays an important role in determining the user's communicative and cognitive activity. For the analysis we collected RuNet links with a high citation index and built a corpus of 2,242 samples. We analyzed the links and found that in most cases (>45%) the topic of the document becomes the object of the author's reception. Links form a semantic network around the document and realize a bidirectional associative connection with the topic. We found that, besides the topic of the document, the genre, the tone of communication, and spatio-temporal relationships in the structure of the electronic hypertext may also be objects of the author's reception.
APA, Harvard, Vancouver, ISO, and other styles
39

Sinha, Sankalp, Khurram Azeem Hashmi, Alain Pagani, Marcus Liwicki, Didier Stricker, and Muhammad Zeshan Afzal. "Rethinking Learnable Proposals for Graphical Object Detection in Scanned Document Images." Applied Sciences 12, no. 20 (October 20, 2022): 10578. http://dx.doi.org/10.3390/app122010578.

Full text
Abstract:
In the age of deep learning, researchers have looked at domain adaptation under the pre-training and fine-tuning paradigm to leverage the gains made in the natural image domain. These backbones and subsequent networks are designed for object detection in the natural image domain and do not consider some of the critical characteristics of document images. Document images are sparse in contextual information, and the graphical page objects are logically clustered. This paper investigates the effectiveness of deep and robust backbones in the document image domain. Further, it explores the idea of learnable object proposals through Sparse R-CNN. This paper shows that simple domain adaptation of top-performing object detectors to the document image domain does not lead to better results. Furthermore, it shows empirically that detectors based on dense object priors, such as Faster R-CNN, Mask R-CNN, and Cascade Mask R-CNN, are perhaps not best suited for graphical page object detection. Detectors that reduce the number of object candidates while making them learnable are a step towards a better approach. We formulate and evaluate the Sparse R-CNN (SR-CNN) model on the IIIT-AR-13K, PubLayNet, and DocBank datasets and hope to inspire a rethinking of object proposals in the domain of graphical page object detection.
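The "learnable proposals" idea can be sketched as a small, fixed set of proposal boxes and features stored as trainable parameters instead of being generated densely from anchors. The PyTorch fragment below is illustrative only and is not the Sparse R-CNN implementation; the box parameterization, dimensions, and names are assumptions.

```python
import torch
import torch.nn as nn

class LearnableProposals(nn.Module):
    """A small, fixed set of proposals learned end-to-end (illustrative sketch)."""
    def __init__(self, num_proposals: int = 100, feat_dim: int = 256):
        super().__init__()
        # Boxes in normalised (cx, cy, w, h) form, initialised to cover the whole image.
        self.proposal_boxes = nn.Parameter(
            torch.tensor([[0.5, 0.5, 1.0, 1.0]]).repeat(num_proposals, 1))
        # One learnable feature vector per proposal, refined by later heads.
        self.proposal_feats = nn.Parameter(torch.randn(num_proposals, feat_dim))

    def forward(self, batch_size: int):
        boxes = self.proposal_boxes.unsqueeze(0).expand(batch_size, -1, -1)
        feats = self.proposal_feats.unsqueeze(0).expand(batch_size, -1, -1)
        return boxes, feats

proposals = LearnableProposals()
boxes, feats = proposals(batch_size=2)
print(boxes.shape, feats.shape)   # torch.Size([2, 100, 4]) torch.Size([2, 100, 256])
```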
APA, Harvard, Vancouver, ISO, and other styles
40

Gao, Yu Wei, Xia Hou, and Ning Li. "An Implementation of Documents Interoperability Measuring System." Applied Mechanics and Materials 385-386 (August 2013): 1764–70. http://dx.doi.org/10.4028/www.scientific.net/amm.385-386.1764.

Full text
Abstract:
To measure document interoperability, a Feature Data Model (FDM) of open office document formats is proposed. Feature Data is defined as a document container that holds a number of document features, where each data object in the container conforms to a different document standard or specification. Using the FDM, instance documents establish mapping relationships between the features of different formats; the Feature Data then serve as an aid to measure interoperability and to calculate the statistical results automatically. The Documents Interoperability Measuring System (DIMS) described in this paper is implemented in Java to prove the feasibility of this model and architecture.
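A minimal way to picture the measurement is to treat Feature Data as a named feature container and to report how many features of a source document survive conversion to another format. The Python sketch below is a toy illustration under that assumption; the feature names and the preservation ratio are not the FDM specification from the paper.

```python
from dataclasses import dataclass, field

@dataclass
class FeatureData:
    """A container of document features keyed by feature name."""
    features: dict = field(default_factory=dict)

def interoperability_ratio(source: FeatureData, converted: FeatureData) -> float:
    """Fraction of source features preserved (same name and value) after conversion."""
    if not source.features:
        return 1.0
    preserved = sum(
        1 for name, value in source.features.items()
        if converted.features.get(name) == value
    )
    return preserved / len(source.features)

odf_doc = FeatureData({"bold_runs": 12, "tables": 2, "footnotes": 3})
ooxml_doc = FeatureData({"bold_runs": 12, "tables": 2, "footnotes": 0})
print(f"Preserved features: {interoperability_ratio(odf_doc, ooxml_doc):.0%}")  # 67%
```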
APA, Harvard, Vancouver, ISO, and other styles
41

Naik, Shivam, Khurram Azeem Hashmi, Alain Pagani, Marcus Liwicki, Didier Stricker, and Muhammad Zeshan Afzal. "Investigating Attention Mechanism for Page Object Detection in Document Images." Applied Sciences 12, no. 15 (July 26, 2022): 7486. http://dx.doi.org/10.3390/app12157486.

Full text
Abstract:
Page object detection in scanned document images is a complex task due to varying document layouts and diverse page objects. In the past, traditional methods such as Optical Character Recognition (OCR)-based techniques have been employed to extract textual information. However, these methods fail to comprehend complex page objects such as tables and figures. This paper addresses the localization and classification of graphical objects that visually summarize vital information in documents. Furthermore, this work examines the benefit of incorporating attention mechanisms into different object detection networks to perform page object detection on scanned document images. The model is designed with the PyTorch-based framework Detectron2. The proposed pipelines can be optimized end-to-end and are exhaustively evaluated on publicly available datasets such as DocBank, PubLayNet, and IIIT-AR-13K. The achieved results reflect the effectiveness of incorporating the attention mechanism for page object detection in documents.
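As a rough illustration of adding attention to a detection backbone, the PyTorch snippet below applies self-attention over a flattened CNN feature map so every spatial location can attend to every other. It is a generic sketch, not the paper's Detectron2 configuration; the channel size, head count, and residual layout are assumptions.

```python
import torch
import torch.nn as nn

class SelfAttentionBlock(nn.Module):
    """Self-attention over a CNN feature map (illustrative, not the paper's code)."""
    def __init__(self, channels: int = 256, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim=channels, num_heads=heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        b, c, h, w = feat.shape
        tokens = feat.flatten(2).transpose(1, 2)          # (B, H*W, C)
        attended, _ = self.attn(tokens, tokens, tokens)   # global context between locations
        tokens = self.norm(tokens + attended)             # residual connection
        return tokens.transpose(1, 2).reshape(b, c, h, w)

feature_map = torch.randn(2, 256, 32, 32)                 # backbone output for two pages
print(SelfAttentionBlock()(feature_map).shape)             # torch.Size([2, 256, 32, 32])
```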
APA, Harvard, Vancouver, ISO, and other styles
42

WANG, JASON T. L., and PETER A. NG. "TEXPROS: AN INTELLIGENT DOCUMENT PROCESSING SYSTEM." International Journal of Software Engineering and Knowledge Engineering 02, no. 02 (June 1992): 171–96. http://dx.doi.org/10.1142/s0218194092000099.

Full text
Abstract:
This paper presents the design of an intelligent document processing system called TEXPROS. The system is a combination of filing and retrieval systems, which supports storing, extracting, classifying, categorizing, retrieving, and browsing information from a variety of documents. TEXPROS is built using object-oriented programming and rule-based specification techniques. In this paper, we describe the main design goals of the system, its data model, its logical file structure, and its strategies for document classification and categorization. We also illustrate various retrieval methods and query processing techniques through examples. Finally, applications of TEXPROS are presented, where we suggest ways in which the use of the system may alter the software process model.
APA, Harvard, Vancouver, ISO, and other styles
43

Wang, Meimei, and Jiayuan Lin. "Retrieving individual tree heights from a point cloud generated with optical imagery from an unmanned aerial vehicle (UAV)." Canadian Journal of Forest Research 50, no. 10 (October 2020): 1012–24. http://dx.doi.org/10.1139/cjfr-2019-0418.

Full text
Abstract:
Individual tree height (ITH) is one of the most important vertical structure parameters of a forest. Field measurement and laser scanning are very expensive for large forests. In this paper, we propose a cost-effective method to acquire ITHs in a forest using overlapping optical images captured by an unmanned aerial vehicle (UAV). The data sets, including a point cloud, a digital surface model (DSM), and a digital orthorectified map (DOM), were produced from the UAV imagery. The canopy height model (CHM) was obtained by subtracting the digital elevation model (DEM) from the DSM with low vegetation removed. Object-based image analysis was used to extract individual tree crowns (ITCs) from the DOM, and ITHs were initially extracted by overlaying the ITC outlines on the CHM. As the extracted ITHs were generally slightly shorter than the measured ITHs, a linear relationship was established between them. The final ITHs of the test site were retrieved by feeding the extracted ITHs into the linear regression model. As a result, the coefficient of determination (R2), the root mean square error (RMSE), the mean absolute error (MAE), and the mean relative error (MRE) of the retrieved ITHs against the measured ITHs were 0.92, 1.08 m, 0.76 m, and 0.08, respectively.
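The workflow summarized in the abstract (CHM = DSM - DEM, a linear correction of the extracted heights, then error metrics) can be sketched in a few lines of NumPy. The arrays below are synthetic placeholders, not the paper's data, and the fitted coefficients are illustrative only.

```python
import numpy as np

# Synthetic DSM/DEM rasters: the difference is the canopy height model (CHM).
dsm = np.random.uniform(500, 540, size=(100, 100))
dem = dsm - np.random.uniform(0, 25, size=(100, 100))
chm = dsm - dem

# Extracted ITHs (read from the CHM) versus field-measured ITHs.
extracted = np.random.uniform(5, 25, size=50)
measured = 1.05 * extracted + 0.8 + np.random.normal(0, 1.0, size=50)

# Linear correction model between extracted and measured heights.
slope, intercept = np.polyfit(extracted, measured, deg=1)
retrieved = slope * extracted + intercept

rmse = np.sqrt(np.mean((retrieved - measured) ** 2))
mae = np.mean(np.abs(retrieved - measured))
mre = np.mean(np.abs(retrieved - measured) / measured)
print(f"RMSE={rmse:.2f} m, MAE={mae:.2f} m, MRE={mre:.2f}")
```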
APA, Harvard, Vancouver, ISO, and other styles
44

Dykhanov, Stanyslav, and Natalia Guk. "Analysis of the structure of web resources using the object model." Eastern-European Journal of Enterprise Technologies 5, no. 2(119) (October 30, 2022): 6–13. http://dx.doi.org/10.15587/1729-4061.2022.265961.

Full text
Abstract:
A methodology for analyzing the structure of a web resource using an object model, based on the HTML description of a page and its style sheets, is proposed. The object of research is a web resource page, whose model is represented as a DOM tree. Data on the structural elements of the tree are supplemented with information about the styles used to render the pages. To determine the similarity of pages, a criterion is proposed that takes into account the structural and stylistic similarity of pages with corresponding weighting coefficients. To compare page models with each other, tree alignment is used: the edit distance serves as the metric, with node renaming, deletion, and insertion as the edit operations. To determine similarity in styles, the Jaccard metric is used. To cluster web pages, the k-means method with a cosine distance measure is applied. Intra-cluster analysis is carried out using a modification of the Zhang-Shasha algorithm. The proposed approach is implemented as an algorithm and software using the Python programming language and related libraries. A computational experiment was performed to analyze the structure of individual websites on the Internet, as well as to group pages from different web resources. The structure of the formed clusters was analyzed, and the root-mean-square similarity of elements within the clusters was calculated. To assess the quality of the developed approach for the tasks under consideration, an expert partitioning was built, and the values of the precision and recall metrics were calculated. The results of the analysis of the structure of a web resource can be used to improve the structure of its components, to understand how users navigate the site, and to reengineer the web resource.
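The combined similarity criterion can be illustrated by weighting a structural term (a normalized tree edit distance, e.g. computed elsewhere by a Zhang-Shasha implementation) against a stylistic term (the Jaccard index over the style classes of two pages). The Python sketch below is a simplified illustration; the weights and example values are assumptions, not the paper's parameters.

```python
def jaccard(styles_a: set, styles_b: set) -> float:
    """Jaccard similarity between the sets of CSS classes used on two pages."""
    if not styles_a and not styles_b:
        return 1.0
    return len(styles_a & styles_b) / len(styles_a | styles_b)

def page_similarity(edit_distance: int, max_nodes: int,
                    styles_a: set, styles_b: set,
                    w_struct: float = 0.6, w_style: float = 0.4) -> float:
    """Weighted combination of structural and stylistic similarity."""
    structural = 1.0 - min(edit_distance / max_nodes, 1.0)
    stylistic = jaccard(styles_a, styles_b)
    return w_struct * structural + w_style * stylistic

styles_page1 = {"nav", "article", "footer", "btn-primary"}
styles_page2 = {"nav", "article", "footer", "sidebar"}
# 12 edit operations to align two DOM trees of at most 80 nodes (hypothetical values).
print(round(page_similarity(12, 80, styles_page1, styles_page2), 3))
```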
APA, Harvard, Vancouver, ISO, and other styles
45

Jovanovic, Nenad, Ranko Popovic, and Zoran Jovanovic. "Defining a general object model of distributed systems entities in Java." Facta universitatis - series: Electronics and Energetics 16, no. 2 (2003): 185–94. http://dx.doi.org/10.2298/fuee0302185j.

Full text
Abstract:
This paper deals with the modeling and simulation of distributed systems in the Java programming language. Distributed systems consist of components that communicate and coordinate their actions by passing messages. Components of distributed systems that define some functionality are entities determined by static attributes. The paper presents a general model for modeling, in a Java environment, distributed-system entities that communicate using a message-passing protocol. We describe a way to represent model components and their functional links in the form of an XML document. A model can be executed as an application and/or an applet. The presented model can be used for modeling heterogeneous systems.
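A minimal sketch of the entity model described here: entities own mailboxes, exchange messages through them, and the components and their links can also be serialized as an XML document. The Python code below is illustrative only; the queue-based transport and the XML layout are assumptions rather than the paper's Java model.

```python
import queue
import xml.etree.ElementTree as ET

class Entity:
    def __init__(self, name: str):
        self.name = name
        self.mailbox = queue.Queue()      # each entity owns a mailbox

    def send(self, target: "Entity", payload: str):
        target.mailbox.put((self.name, payload))

    def receive(self):
        sender, payload = self.mailbox.get()
        print(f"{self.name} received '{payload}' from {sender}")

client, server = Entity("client"), Entity("server")
client.send(server, "request")
server.receive()

# The model components and their link described as an XML document.
model = ET.Element("model")
for e in (client, server):
    ET.SubElement(model, "entity", name=e.name)
ET.SubElement(model, "link", source="client", target="server")
print(ET.tostring(model, encoding="unicode"))
```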
APA, Harvard, Vancouver, ISO, and other styles
46

Talandová, Petra, and Jiří Rybička. "Method specification for automated evaluation of documents formal quality." Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis 57, no. 6 (2009): 305–14. http://dx.doi.org/10.11118/actaun200957060305.

Full text
Abstract:
Automated document processing allows the production of large amounts of documents. The formal quality of the documents is very important, as it contributes to better understanding and information transmission. The paper deals with automated evaluation of document quality, which requires the design of a document model. The model contains the objects of which the pages are compiled, the types of these objects and, most importantly, the objects' parameters. The parameters are essential because they are the inputs for evaluating the document according to typographical rules, and they form an important part of a model that should reliably describe the document. A set of criteria is designed to describe the requirements on methods suitable for model formation. From a large number of methods, those that meet the criteria can be applied to the document. The result is a model of a real document which can be used for automatic evaluation based on typographical rules.
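The evaluation idea can be sketched as objects with parameters checked against typographical rules expressed as predicates. The Python example below is a toy illustration under assumed rules and thresholds, not the model proposed in the paper.

```python
from dataclasses import dataclass

@dataclass
class PageObject:
    kind: str              # e.g. "paragraph", "heading"
    font_size_pt: float
    line_spacing: float    # multiple of the font size
    chars_per_line: int

# Typographical rules as (message, predicate) pairs over object parameters.
RULES = [
    ("body text should be at least 9 pt",
     lambda o: o.kind != "paragraph" or o.font_size_pt >= 9),
    ("line spacing should be 1.0-1.5 of the font size",
     lambda o: 1.0 <= o.line_spacing <= 1.5),
    ("lines should not exceed 75 characters",
     lambda o: o.chars_per_line <= 75),
]

def evaluate(objects):
    """Return the violated rules for each object of the document model."""
    return {id(o): [msg for msg, ok in RULES if not ok(o)] for o in objects}

doc = [PageObject("paragraph", 8.5, 1.2, 90), PageObject("heading", 14, 1.1, 40)]
for obj, violations in zip(doc, evaluate(doc).values()):
    print(obj.kind, "->", violations or "OK")
```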
APA, Harvard, Vancouver, ISO, and other styles
47

Yoon, Chae-Eun, Hye-hyeon Jeoung, and Chang-Jin Seo. "Detection for Document-Type Malware Code using Deep Learning Model and PDF Object Analysis." TRANSACTION OF THE KOREAN INSTITUTE OF ELECTRICAL ENGINEERS P 70P, no. 1 (March 31, 2021): 44–49. http://dx.doi.org/10.5370/kieep.2021.70.1.044.

Full text
APA, Harvard, Vancouver, ISO, and other styles
48

Schauerhuber, A., E. Kapsammer, W. Schwinger, W. Retschitzegger, and M. Wimmer. "Bridging WebML to model-driven engineering: from document type definitions to meta object facility." IET Software 1, no. 3 (June 1, 2007): 81–97. http://dx.doi.org/10.1049/iet-sen:20060066.

Full text
APA, Harvard, Vancouver, ISO, and other styles
49

Ajam, Maher, Mustafa Alshawi, and Toufic Mezher. "Augmented process model for e-tendering: Towards integrating object models with document management systems." Automation in Construction 19, no. 6 (October 2010): 762–78. http://dx.doi.org/10.1016/j.autcon.2010.04.001.

Full text
APA, Harvard, Vancouver, ISO, and other styles
50

Kapochkin, S. V., E. V. Zaripova, and N. V. Maksimov. "A Document Object Model for Solving the Problem of Identification and Structurization of Documentary Flows of Rosfinmonitoring." KnE Social Sciences 3, no. 2 (February 15, 2018): 1. http://dx.doi.org/10.18502/kss.v3i2.1517.

Full text
Abstract:
The paper considers the issues of building an information retrieval system using an algorithm for the automated classification and recognition of the structure of full-text documents. It describes the selected approaches, as well as the algorithm for identifying the document type and the algorithm for recognizing its logical structure, developed on the basis of these approaches with the aim of further semantic processing. It introduces a multi-stage method for the automated recognition and formation of a model of the logical structure of a document. Experimental studies of this method have been conducted on an array of Rosfinmonitoring reporting documents.
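A very reduced illustration of the two stages, document-type identification followed by recognition of the logical structure, is sketched below in Python. The keyword lists and the heading heuristic are assumptions for demonstration and do not reflect the multi-stage method developed for the Rosfinmonitoring documents.

```python
import re

# Hypothetical keyword lists used to score candidate document types.
TYPE_KEYWORDS = {
    "quarterly_report": ["quarterly report", "reporting period"],
    "transaction_notice": ["transaction", "notification"],
}

def identify_type(text: str) -> str:
    lowered = text.lower()
    scores = {t: sum(k in lowered for k in kws) for t, kws in TYPE_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"

def logical_structure(text: str):
    """Split into (level, line) pairs: numbered lines become headings."""
    structure = []
    for line in filter(None, map(str.strip, text.splitlines())):
        level = "heading" if re.match(r"^\d+(\.\d+)*\s", line) else "body"
        structure.append((level, line))
    return structure

sample = "Quarterly report\n1 General information\nThe reporting period is Q1.\n"
print(identify_type(sample), logical_structure(sample))
```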
APA, Harvard, Vancouver, ISO, and other styles