Journal articles on the topic 'Malware similarity'

1

Chen, Yu-Hung, Jiann-Liang Chen, and Ren-Feng Deng. "Similarity-Based Malware Classification Using Graph Neural Networks." Applied Sciences 12, no. 21 (October 26, 2022): 10837. http://dx.doi.org/10.3390/app122110837.

Abstract:
This work proposes a novel malware identification model that is based on a graph neural network (GNN). The function call relationship and function assembly content obtained by analyzing the malware are used to generate a graph that represents the functional structure of a malware sample. In addition to establishing a multi-classification model for predicting malware family, this work implements a similarity model that is based on Siamese networks, measuring the distance between two samples in the feature space to determine whether they belong to the same malware family. The distance between the samples is gradually adjusted during the training of the model to improve the performance. A Malware Bazaar dataset analysis reveals that the proposed classification model has an accuracy and area under the curve (AUC) of 0.934 and 0.997, respectively. The proposed similarity model has an accuracy and AUC of 0.92 and 0.92, respectively. Further, the proposed similarity model identifies the unseen malware family with approximately 70% accuracy. Hence, the proposed similarity model exhibits better performance and scalability than the pure classification model and previous studies.
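
The pairwise decision described in this abstract (measure the distance between two samples in feature space, then decide whether they share a family) can be illustrated with a minimal sketch. It assumes the embedding vectors have already been produced by some encoder; the paper's GNN is not reproduced here. The function names, the Euclidean metric, and the threshold value are illustrative assumptions, not details taken from the paper.

```python
import math
from typing import Sequence


def euclidean_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Distance between two embedding vectors of equal dimension."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def same_family(u: Sequence[float], v: Sequence[float],
                threshold: float = 1.0) -> bool:
    """Treat two samples as the same family when their embeddings are close.

    The threshold is a hypothetical value; in practice it would be tuned
    on validation pairs.
    """
    return euclidean_distance(u, v) < threshold
```

A threshold-based decision like this does not need a fixed list of classes, which is one plausible reason a similarity model can be applied to families that were absent from training.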
2

YANG, Yi, Pu-Rui SU, Ling-Yun YING, and Deng-Guo FENG. "Dependency-Based Malware Similarity Comparison Method." Journal of Software 22, no. 10 (October 25, 2011): 2438–53. http://dx.doi.org/10.3724/sp.j.1001.2011.03888.

3

Jang, Jae-wook, Hyunjae Kang, Jiyoung Woo, Aziz Mohaisen, and Huy Kang Kim. "Andro-AutoPsy: Anti-malware system based on similarity matching of malware and malware creator-centric information." Digital Investigation 14 (September 2015): 17–35. http://dx.doi.org/10.1016/j.diin.2015.06.002.

4

Jang, Jae-wook, Hyunjae Kang, Jiyoung Woo, Aziz Mohaisen, and Huy Kang Kim. "Andro-Dumpsys: Anti-malware system based on the similarity of malware creator and malware centric information." Computers & Security 58 (May 2016): 125–38. http://dx.doi.org/10.1016/j.cose.2015.12.005.

5

Pavithra, J., and S. Selvakumara Samy. "An Adaptive Feature Centric XG Boost Ensemble Classifier Model for Improved Malware Detection and Classification." International Journal on Recent and Innovation Trends in Computing and Communication 10, no. 2s (December 31, 2022): 208–17. http://dx.doi.org/10.17762/ijritcc.v10i2s.5930.

Abstract:
Machine learning (ML) is often used for malware detection and classification, and various ML approaches have been adapted to the problem; however, they still perform poorly because of weak feature selection and classification. To address this issue, this paper presents an efficient Adaptive Feature Centric XG Boost Ensemble Learner Classifier (“AFC-XG Boost”) algorithm. The proposed model is designed to handle varying malware detection data sets obtained from Kaggle. The model splits the XG Boost classification process into several stages to optimize performance. At the preprocessing stage, the data set is cleaned of noise and tampering and is normalized using the Feature Base Optimizer (“FBO”) algorithm. FBO normalizes the data points and removes noise according to the feature values and their base information. Similarly, the performance of standard XG Boost is optimized by adopting feature selection with the Class Based Principle Component Analysis (“CBPCA”) algorithm, which selects features according to the fitness of each feature for the different classes. Based on the selected features, the method generates a regression tree for each feature considered. Based on the generated trees, the method performs classification by computing the Tree Level Ensemble Similarity (“TLES”) and the Class Level Ensemble Similarity (“CLES”). Using both, the method computes the Class Match Similarity (“CMS”) value, on which the malware classification is based. The proposed approach achieves 97% accuracy in malware detection and classification, with a lower time cost of 34 seconds for 75,000 samples.
6

Venkatraman, Sitalakshmi, and Mamoun Alazab. "Use of Data Visualisation for Zero-Day Malware Detection." Security and Communication Networks 2018 (December 2, 2018): 1–13. http://dx.doi.org/10.1155/2018/1728303.

Abstract:
With the explosion of Internet of Things (IoT) worldwide, there is an increasing threat from malicious software (malware) attackers that calls for efficient monitoring of vulnerable systems. Large amounts of data collected from computer networks, servers, and mobile devices need to be analysed for malware proliferation. Effective analysis methods are needed to match with the scale and complexity of such a data-intensive environment. In today’s Big Data contexts, visualisation techniques can support malware analysts going through the time-consuming process of analysing suspicious activities thoroughly. This paper takes a step further in contributing to the evolving realm of visualisation techniques used in the information security field. The aim of the paper is twofold: (1) to provide a comprehensive overview of the existing visualisation techniques for detecting suspicious behaviour of systems and (2) to design a novel visualisation using similarity matrix method for establishing malware classification accurately. The prime motivation of our proposal is to identify obfuscated malware using visualisation of the extended x86 IA-32 (opcode) similarity patterns, which are hard to detect with the existing approaches. Our approach uses hybrid models wherein static and dynamic malware analysis techniques are combined effectively along with visualisation of similarity matrices in order to detect and classify zero-day malware efficiently. Overall, the high accuracy of classification achieved with our proposed method can be visually observed since different malware families exhibit significantly dissimilar behaviour patterns.
7

Shi, Hongbo, Tomoki Hamagami, Katsunari Yoshioka, Haoyuan Xu, Kazuhiro Tobe, and Shigeki Goto. "Structural classification and similarity measurement of malware." IEEJ Transactions on Electrical and Electronic Engineering 9, no. 6 (September 27, 2014): 621–32. http://dx.doi.org/10.1002/tee.22018.

8

Chen, Chia-Mei, and Shi-Hao Wang. "Advancing Malware Classification With an Evolving Clustering Method." International Journal of Applied Metaheuristic Computing 9, no. 3 (July 2018): 1–12. http://dx.doi.org/10.4018/ijamc.2018070101.

Abstract:
This article describes how honeypots and intrusion detection systems serve as major mechanisms for security administrators to collect a variety of sample viruses and malware for further analysis, classification, and system protection. However, increased variety and complexity of malware makes the analysis and classification challenging, especially when efficiency and timely response are two contradictory yet equally significant criteria in malware classification. Besides, similarity-based classifications exhibit insufficiency because the mutation and fuzzification of malware exacerbate classification difficulties. In order to improve malware classification speed and attend to mutation, this research proposes the ameliorated progressive classification that integrates static analysis and improved k-means algorithm. This proposed classification aims at assisting network administrators to have a malware classification preprocess and make efficient malware classifications upon the capture of new malware, thus enhancing the defense against malware.
9

Frenklach, Tatiana, Dvir Cohen, Asaf Shabtai, and Rami Puzis. "Android malware detection via an app similarity graph." Computers & Security 109 (October 2021): 102386. http://dx.doi.org/10.1016/j.cose.2021.102386.

10

Park, Chan-Kyu, Hyong-Shik Kim, Tae Jin Lee, and Jae-Cheol Ryou. "Function partitioning methods for malware variant similarity comparison." Journal of the Korea Institute of Information Security and Cryptology 25, no. 2 (April 30, 2015): 321–30. http://dx.doi.org/10.13089/jkiisc.2015.25.2.321.

11

ZHAO, Bing-lin, Fu-dong LIU, Zheng SHAN, Yi-hang CHEN, and Jian LIU. "Graph Similarity Metric Using Graph Convolutional Network: Application to Malware Similarity Match." IEICE Transactions on Information and Systems E102.D, no. 8 (August 1, 2019): 1581–85. http://dx.doi.org/10.1587/transinf.2018edl8259.

12

Choi, Sunoh. "Combined kNN Classification and Hierarchical Similarity Hash for Fast Malware Detection." Applied Sciences 10, no. 15 (July 28, 2020): 5173. http://dx.doi.org/10.3390/app10155173.

Abstract:
Every day, hundreds of thousands of new malicious files are created. Existing pattern-based antivirus solutions have difficulty detecting these new malicious files. Artificial intelligence (AI)–based malware detection has been proposed to solve the problem; however, it takes a long time. Similarity hash–based detection has also been proposed; however, it has a low detection rate. To solve these problems, we propose k-nearest-neighbor (kNN) classification for malware detection with a vantage-point (VP) tree using a similarity hash. When we use kNN classification, we reduce the detection time by 67% and increase the detection rate by 25%. With a VP tree using a similarity hash, we reduce the similarity-hash search time by 20%.
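
The combination this abstract describes (kNN classification over a vantage-point tree) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the similarity hash is stood in for by a plain integer, the distance is assumed to be Hamming distance (which satisfies the triangle inequality a VP tree relies on), and the names `VPNode`, `knn`, and `classify` are hypothetical.

```python
import heapq
from collections import Counter
from typing import List, Optional, Tuple

Item = Tuple[int, str]  # (hash value as an integer, class label)


def hamming(a: int, b: int) -> int:
    """Number of differing bits; assumed stand-in for a hash distance."""
    return bin(a ^ b).count("1")


class VPNode:
    """Vantage-point tree node: splits the remaining items by median distance."""

    def __init__(self, items: List[Item]):
        self.point = items[0]          # vantage point for this node
        rest = items[1:]
        self.radius = 0
        self.inside: Optional["VPNode"] = None
        self.outside: Optional["VPNode"] = None
        if rest:
            dists = sorted(hamming(self.point[0], it[0]) for it in rest)
            self.radius = dists[len(dists) // 2]
            inner = [it for it in rest
                     if hamming(self.point[0], it[0]) <= self.radius]
            outer = [it for it in rest
                     if hamming(self.point[0], it[0]) > self.radius]
            self.inside = VPNode(inner) if inner else None
            self.outside = VPNode(outer) if outer else None


def knn(root: VPNode, query: int, k: int) -> List[Tuple[int, str]]:
    """Return the k nearest (distance, label) pairs, closest first."""
    heap: List[Tuple[int, str]] = []   # max-heap via negated distances

    def tau() -> float:
        # Current k-th best distance, or infinity until k items are found.
        return -heap[0][0] if len(heap) == k else float("inf")

    def visit(node: Optional[VPNode]) -> None:
        if node is None:
            return
        d = hamming(query, node.point[0])
        if len(heap) < k:
            heapq.heappush(heap, (-d, node.point[1]))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, node.point[1]))
        # Search the more promising side first, then the other side only
        # if the triangle inequality cannot rule it out.
        if d <= node.radius:
            visit(node.inside)
            if d + tau() > node.radius:
                visit(node.outside)
        else:
            visit(node.outside)
            if d - tau() <= node.radius:
                visit(node.inside)

    visit(root)
    return sorted((-neg, label) for neg, label in heap)


def classify(root: VPNode, query: int, k: int = 3) -> str:
    """Majority vote over the labels of the k nearest neighbours."""
    votes = Counter(label for _, label in knn(root, query, k))
    return votes.most_common(1)[0][0]
```

The pruning step is what saves search time: whole subtrees are skipped when no point in them can be closer than the current k-th best candidate.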
13

Black, Paul, Iqbal Gondal, Peter Vamplew, and Arun Lakhotia. "Function Similarity Using Family Context." Electronics 9, no. 7 (July 17, 2020): 1163. http://dx.doi.org/10.3390/electronics9071163.

Abstract:
Finding changed and similar functions between a pair of binaries is an important problem in malware attribution and for the identification of new malware capabilities. This paper presents a new technique called Function Similarity using Family Context (FSFC) for this problem. FSFC trains a Support Vector Machine (SVM) model using pairs of similar functions from two program variants. This method improves upon previous research called Cross Version Contextual Function Similarity (CVCFS) by representing a function using features extracted not just from the function itself, but also from other functions with which it has a caller and callee relationship. We present the results of an initial experiment that shows that the use of additional features from the context of a function significantly decreases the false positive rate, obviating the need for a separate pass for cleaning false positives. The more surprising and unexpected finding is that the SVM model produced by FSFC can abstract function similarity features from one pair of program variants to find similar functions in an unrelated pair of program variants. If validated by a larger study, this new property leads to the possibility of creating generic similar function classifiers that can be packaged and distributed in reverse engineering tools such as IDA Pro and Ghidra.
14

Rinaldi, Aditia. "Implementasi Fuzzy Hashing untuk Signature Malware." Jurnal ULTIMA Computing 6, no. 1 (June 1, 2014): 33–38. http://dx.doi.org/10.31937/sk.v6i1.293.

Abstract:
Cryptographic hash values have long been used as signature databases to identify malware. The most widely used are MD5 and SHA256. In addition, there is fuzzy hashing, which differs slightly from traditional hashing: the length of the hash value is not fixed, and the hash value can be used to calculate the degree of similarity between malware samples that may be variants of one another. This research uses the ssdeep tool to calculate fuzzy hashes. A signature database built from fuzzy hashes is smaller than one built from SHA256 and larger than one built from MD5. The detection accuracy for script-based malware variants is higher than for executable-based malware variants. Index Terms: file signature, fuzzy hashing, malware signature, rolling hashing, sha
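
To make the contrast with fixed-length cryptographic hashes concrete, here is a toy sketch of the general idea behind fuzzy (context-triggered piecewise) hashing. It is explicitly not the ssdeep algorithm and its output is not compatible with ssdeep; the window size, the trigger rule, the alphabet mapping, and the function names `toy_fuzzy_hash` and `fuzzy_similarity` are all assumptions made for illustration.

```python
import hashlib
from difflib import SequenceMatcher

ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "abcdefghijklmnopqrstuvwxyz0123456789+/")
WINDOW = 7  # bytes of local context that decide where a chunk ends


def _chunk_char(chunk: bytes) -> str:
    """Map a chunk to one signature character via its SHA-256 digest."""
    return ALPHABET[hashlib.sha256(chunk).digest()[0] % len(ALPHABET)]


def toy_fuzzy_hash(data: bytes, block_size: int = 16) -> str:
    """Variable-length signature: one character per content-defined chunk."""
    signature = []
    chunk_start = 0
    for i in range(len(data)):
        # Window sum over the last WINDOW bytes (recomputed each step for
        # clarity rather than updated incrementally).
        window = data[max(0, i - WINDOW + 1): i + 1]
        if sum(window) % block_size == block_size - 1:
            signature.append(_chunk_char(data[chunk_start: i + 1]))
            chunk_start = i + 1
    if chunk_start < len(data):
        # Trailing bytes that did not hit a trigger form the last chunk.
        signature.append(_chunk_char(data[chunk_start:]))
    return "".join(signature)


def fuzzy_similarity(a: bytes, b: bytes) -> float:
    """Similarity in [0, 1] between the two signatures."""
    return SequenceMatcher(None, toy_fuzzy_hash(a), toy_fuzzy_hash(b)).ratio()
```

Because chunk boundaries depend only on nearby bytes, appending data to a file leaves the signature characters of the earlier chunks unchanged, whereas a cryptographic hash of the whole file would change completely.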
15

Wrench, P., and B. Irwin. "Detecting Derivative Malware Samples Using Deobfuscation-Assisted Similarity Analysis." SAIEE Africa Research Journal 107, no. 2 (June 2016): 65–77. http://dx.doi.org/10.23919/saiee.2016.8531543.

16

Joe, Woo-Jin, and Hyong-Shik Kim. "A Malware Variants Detection Method based on Behavior Similarity." Korean Institute of Smart Media 8, no. 4 (December 31, 2019): 25–32. http://dx.doi.org/10.30693/smj.2019.8.4.25.

17

Wang, Changguang, Ziqiu Zhao, Fangwei Wang, and Qingru Li. "A Novel Malware Detection and Family Classification Scheme for IoT Based on DEAM and DenseNet." Security and Communication Networks 2021 (January 5, 2021): 1–16. http://dx.doi.org/10.1155/2021/6658842.

Abstract:
With the rapid increase in the amount and type of malware, traditional methods of malware detection and family classification for IoT applications through static and dynamic analysis have been greatly challenged. In this paper, a new simple and effective attention module of Convolutional Neural Networks (CNNs), named as Depthwise Efficient Attention Module (DEAM), is proposed and combined with a DenseNet to propose a new malware detection and family classification model. Based on the good effect of the DenseNet in the field of image classification and the visual similarity of the malware family on images, the gray-scale image transformed from malware is input into the model combined with the DEAM and DenseNet for malware detection, and then the family classification is carried out. The DEAM is a general lightweight attention module improved based on the Convolutional Block Attention Module (CBAM), which can strengthen the attention to the characteristics of malware and improve the model effect. We use the MalImg dataset, Microsoft malware classification challenge dataset (BIG 2015), and our dataset constructed by the two above-mentioned datasets to verify the effectiveness of the proposed model in family classification and malware detection. Experimental results show that the proposed model achieves 99.3% in terms of accuracy for malware detection on our dataset and achieves 98.5% and 97.3% in terms of accuracy for family classification on the MalImg dataset and BIG 2015 dataset, respectively. The model can reliably detect IoT malware and classify its families.
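
The gray-scale transformation mentioned in this abstract can be sketched as a simple reshaping of a file's bytes into rows of pixel intensities. This is a minimal illustration under assumptions: the fixed row width, the zero padding of the last row, and the function name `bytes_to_grayscale` are choices made here, not details from the paper.

```python
from typing import List


def bytes_to_grayscale(data: bytes, width: int = 16) -> List[List[int]]:
    """Arrange raw bytes as rows of 0-255 intensities, one byte per pixel."""
    if width <= 0:
        raise ValueError("width must be positive")
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])
        row.extend([0] * (width - len(row)))  # zero-pad the final row
        rows.append(row)
    return rows
```

The resulting matrix can then be fed to any image classifier; samples that share code tend to produce visually similar textures, which is the property such image-based approaches rely on.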
18

Gopika, Bhardwaj, and Yadav Rashi. "Predicting the Spread of Malware Outbreaks Using Autoencoder Based Neutral Networks." MENDEL 25, no. 1 (June 24, 2019): 157–64. http://dx.doi.org/10.13164/mendel.2019.1.157.

Abstract:
Malware Outbreaks are pervasive in today's digital world. However, there is a lack of awareness on part of general public on how to safeguard against such attacks and a need for increased cooperation between various national and international research as well as governmental organizations to combat the threat. On the positive side, cyber security websites, blogs and newsletters post articles outlining the working and spread of a malware outbreak and steps to recover from the same as well. In this project, an effective approach to predicting the spread of malware outbreaks is presented. The scope of the project is 15 Malware Outbreaks and the approach involves collecting these cyber aware articles from the web, assigning them to the 15 Malware Outbreaks using Topic Modeling and Similarity Analysis and along with Spread information of the Malware Outbreaks, this is input to auto encoder neural network for learning latent space representations which are further used to predict the spread of malware outbreak as either high or low spread outbreak, achieving a prediction accuracy of 75.56. This work can be used to process large amount of cyber aware content for effective and accurate prediction in the era of much-needed cyber security.
19

Ndibanje, Bruce, Ki Kim, Young Kang, Hyun Kim, Tae Kim, and Hoon Lee. "Cross-Method-Based Analysis and Classification of Malicious Behavior by API Calls Extraction." Applied Sciences 9, no. 2 (January 10, 2019): 239. http://dx.doi.org/10.3390/app9020239.

Abstract:
Data-driven public security networking and computer systems are always under threat from malicious codes known as malware; therefore, a large amount of research and development is taking place to find effective countermeasures. These countermeasures are mainly based on dynamic and statistical analysis. Because of the obfuscation techniques used by malware authors, security researchers and the anti-virus industry face a colossal issue in extracting hidden payloads from packed executables. Based on this understanding, we first propose a method to de-obfuscate and unpack the malware samples. Additionally, a cross-method-based big data analysis is proposed to extract features from malware dynamically and statistically. The Application Programming Interface (API) call sequences that reflect the malware behavior of its code have been used to detect behavior such as network traffic, modifying a file, writing to stderr or stdout, modifying a registry value, and creating a process. Furthermore, we include a similarity analysis and machine learning algorithms to profile and classify malware behaviors. The experimental results of the proposed method show that malware detection accuracy is very useful to discover potential threats and can help the decision-maker to deploy appropriate countermeasures.
20

Daeef, Ammar Yahya, Ali Al-Naji, Ali K. Nahar, and Javaan Chahl. "Features Engineering to Differentiate between Malware and Legitimate Software." Applied Sciences 13, no. 3 (February 3, 2023): 1972. http://dx.doi.org/10.3390/app13031972.

Abstract:
Malware is the primary attack vector against the modern enterprise. Therefore, it is crucial for businesses to exclude malware from their computer systems. The most responsive solution to this issue would operate in real time at the edge of the IT system using artificial intelligence. However, a lightweight solution is crucial at the edge because these options are restricted by the lack of available memory and processing power. The best contender to offer such a solution is application programming interface (API) calls. However, creating API call characteristics that offer a high malware detection rate with quick execution is a significant challenge. This work uses visualisation analysis and Jaccard similarity to uncover the hidden patterns produced by different API calls in order to accomplish this goal. This study also compared neural networks which use long sequences of API calls with shallow machine learning classifiers. Three classifiers are used: support vector machine (SVM), k-nearest neighbourhood (KNN), and random forest (RF). The benchmark data set comprises 43,876 examples of API call sequences, divided into two categories: malware and legitimate. The results showed that RF performed similarly to long short-term memory (LSTM) and deep graph convolutional neural networks (DGCNNs). They also suggest the potential for performing inference on edge devices in a real-time setting.
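
The Jaccard similarity named in this abstract is the size of the intersection of two sets divided by the size of their union. A minimal sketch applied to API call traces follows; the function name `jaccard` and the convention that two empty traces are fully similar are assumptions, and any traces passed in are illustrative data rather than data from the paper.

```python
from typing import Iterable


def jaccard(calls_a: Iterable[str], calls_b: Iterable[str]) -> float:
    """Jaccard similarity of the sets of distinct API calls in two traces."""
    a, b = set(calls_a), set(calls_b)
    if not a and not b:
        return 1.0  # assumed convention: two empty traces are identical
    return len(a & b) / len(a | b)
```

For example, two traces that share two distinct calls out of four distinct calls overall score 2/4 = 0.5. Note that this set-based form ignores call order and repetition.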
21

Daeef, Ammar Yahya, Ali Al-Naji, and Javaan Chahl. "Features Engineering for Malware Family Classification Based API Call." Computers 11, no. 11 (November 11, 2022): 160. http://dx.doi.org/10.3390/computers11110160.

Abstract:
Malware is used to carry out malicious operations on networks and computer systems. Consequently, malware classification is crucial for preventing malicious attacks. Application programming interfaces (APIs) are ideal candidates for characterizing malware behavior. However, the primary challenge is to produce API call features for classification algorithms to achieve high classification accuracy. To achieve this aim, this work employed the Jaccard similarity and visualization analysis to find the hidden patterns created by various malware API calls. Traditional machine learning classifiers, i.e., random forest (RF), support vector machine (SVM), and k-nearest neighborhood (KNN), were used in this research as alternatives to existing neural networks, which use millions of length API call sequences. The benchmark dataset used in this study contains 7107 samples of API call sequences (labeled to eight different malware families). The results showed that RF with the proposed API call features outperformed the LSTM (long short-term memory) and gated recurrent unit (GRU)-based methods against overall evaluation metrics.
22

Kumar, Rajesh, Xiaosong Zhang, Riaz Khan, and Abubakar Sharif. "Research on Data Mining of Permission-Induced Risk for Android IoT Devices." Applied Sciences 9, no. 2 (January 14, 2019): 277. http://dx.doi.org/10.3390/app9020277.

Abstract:
With the growing era of the Internet of Things (IoT), more and more devices are connecting with the Internet using android applications to provide various services. The IoT devices are used for sensing, controlling and monitoring of different processes. Most of IoT devices use Android applications for communication and data exchange. Therefore, a secure Android permission privileged mechanism is required to increase the security of apps. According to a recent study, a malicious Android application is developed almost every 10 s. To resist this serious malware campaign, we need effective malware detection approaches to identify malware applications effectively and efficiently. Most of the studies focused on detecting malware based on static and dynamic analysis of the applications. However, to analyse the risky permission at runtime is a challenging task. In this study, first, we proposed a novel approach to distinguish between malware and benign applications based on permission ranking, similarity-based permission feature selection, and association rule for permission mining. Secondly, the proposed methodology also includes the enhancement of the random forest algorithm to improve the accuracy for malware detection. The experimental outcomes demonstrate high proficiency of the accuracy for malware detection, which is pivotal for android apps aiming for secure data exchange between IoT devices.
23

Yousefi-Azar, Mahmood, Len Hamey, Vijay Varadharajan, and Shiping Chen. "Byte2vec: Malware Representation and Feature Selection for Android." Computer Journal 63, no. 8 (November 17, 2019): 1125–38. http://dx.doi.org/10.1093/comjnl/bxz121.

Abstract:
Malware detection based on static features and without code disassembling is a challenging path of research. Obfuscation makes the static analysis of malware even more challenging. This paper extends static malware detection beyond byte level n-grams and detecting important strings. We propose a model (Byte2vec) with the capabilities of both binary file feature representation and feature selection for malware detection. Byte2vec embeds the semantic similarity of byte level codes into a feature vector (byte vector) and also into a context vector. The learned feature vectors of Byte2vec, using skip-gram with negative-sampling topology, are combined with byte-level term-frequency (tf) for malware detection. We also show that the distance between a feature vector and its corresponding context vector provides a useful measure to rank features. The top ranked features are successfully used for malware detection. We show that this feature selection algorithm is an unsupervised version of mutual information (MI). We test the proposed scheme on four freely available Android malware datasets including one obfuscated malware dataset. The model is trained only on clean APKs. The results show that the model outperforms MI in a low-dimensional feature space and is competitive with MI and other state-of-the-art models in higher dimensions. In particular, our tests show very promising results on a wide range of obfuscated malware with a false negative rate of only 0.3% and a false positive rate of 2.0%. The detection results on obfuscated malware show the advantage of the unsupervised feature selection algorithm compared with the MI-based method.
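
The byte-level n-gram and term-frequency features this abstract refers to can be sketched without any disassembly. The snippet below covers only that baseline feature extraction; it does not attempt the Byte2vec embedding itself, and the function name `byte_ngram_tf` and the normalisation by total n-gram count are assumptions.

```python
from collections import Counter
from typing import Dict


def byte_ngram_tf(data: bytes, n: int = 2) -> Dict[bytes, float]:
    """Relative frequency of each overlapping byte n-gram in a file."""
    total = len(data) - n + 1
    if n <= 0 or total <= 0:
        return {}
    counts = Counter(data[i:i + n] for i in range(total))
    return {gram: count / total for gram, count in counts.items()}
```

With `n=1` this reduces to a plain byte-level term frequency; larger values of `n` capture more local context at the cost of a much larger feature space.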
24

Han, KyoungSoo, BooJoong Kang, and Eul Gyu Im. "Malware Analysis Using Visualized Image Matrices." Scientific World Journal 2014 (2014): 1–15. http://dx.doi.org/10.1155/2014/132713.

Abstract:
This paper proposes a novel malware visual analysis method that contains not only a visualization method to convert binary files into images, but also a similarity calculation method between these images. The proposed method generates RGB-colored pixels on image matrices using the opcode sequences extracted from malware samples and calculates the similarities for the image matrices. Particularly, our proposed methods are available for packed malware samples by applying them to the execution traces extracted through dynamic analysis. When the images are generated, we can reduce the overheads by extracting the opcode sequences only from the blocks that include the instructions related to staple behaviors such as functions and application programming interface (API) calls. In addition, we propose a technique that generates a representative image for each malware family in order to reduce the number of comparisons for the classification of unknown samples and the colored pixel information in the image matrices is used to calculate the similarities between the images. Our experimental results show that the image matrices of malware can effectively be used to classify malware families both statically and dynamically with accuracy of 0.9896 and 0.9732, respectively.
25

Qasem, Abdullah, Sami Zhioua, and Karima Makhlouf. "Finding a Needle in a Haystack: The Traffic Analysis Version." Proceedings on Privacy Enhancing Technologies 2019, no. 2 (April 1, 2019): 270–90. http://dx.doi.org/10.2478/popets-2019-0030.

Abstract:
Traffic analysis is the process of extracting useful/sensitive information from observed network traffic. Typical use cases include malware detection and website fingerprinting attacks. High accuracy traffic analysis techniques use machine learning algorithms (e.g. SVM, kNN) and require splitting the traffic into correctly separated blocks. Inspired by digital forensics techniques, we propose a new network traffic analysis approach based on similarity digest. The approach features several advantages compared to existing techniques, namely, fast signature generation, compact signature representation using Bloom filters, efficient similarity detection between packet traces of arbitrary sizes, and in particular dropping the traffic splitting requirement altogether. Experimental results show very promising results on VPN and malware traffic, but low results on Tor traffic due mainly to the single-size cells feature.
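
The idea of a compact Bloom-filter signature that can be compared across inputs of arbitrary size can be sketched as follows. This is a simplified illustration, not the similarity digest scheme used in the paper: the feature choice (overlapping byte n-grams), the filter size, the number of hash positions, the bitwise Jaccard comparison, and the names `bloom_digest` and `digest_similarity` are all assumptions.

```python
import hashlib


def bloom_digest(data: bytes, ngram: int = 4, bits: int = 1024,
                 hashes: int = 3) -> int:
    """Insert every overlapping n-gram into a Bloom filter held as an int."""
    bloom = 0
    for i in range(max(0, len(data) - ngram + 1)):
        digest = hashlib.sha256(data[i:i + ngram]).digest()
        for k in range(hashes):
            # Derive each bit position from a separate 4-byte digest slice.
            pos = int.from_bytes(digest[4 * k: 4 * k + 4], "big") % bits
            bloom |= 1 << pos
    return bloom


def digest_similarity(a: int, b: int) -> float:
    """Share of set bits the two filters have in common (bitwise Jaccard)."""
    union = bin(a | b).count("1")
    if union == 0:
        return 1.0  # assumed convention for two empty filters
    return bin(a & b).count("1") / union
```

The signature has a fixed size regardless of input length, so two traces of very different sizes can still be compared with a single pass of bitwise operations.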
26

He, Gaofeng, Bingfeng Xu, Lu Zhang, and Haiting Zhu. "On-Device Detection of Repackaged Android Malware via Traffic Clustering." Security and Communication Networks 2020 (May 31, 2020): 1–19. http://dx.doi.org/10.1155/2020/8630748.

Abstract:
Malware has become a significant problem on the Android platform. To defend against Android malware, researchers have proposed several on-device detection methods. Typically, these on-device detection methods are composed of two steps: (i) extracting the apps’ behavior features from the mobile devices and (ii) sending the extracted features to remote servers (such as a cloud platform) for analysis. By monitoring the behaviors of the apps that are running on mobile devices, available methods can detect suspicious applications (simply, apps) accurately. However, mobile devices are typically resource limited. The feature extraction and massive data transmission might consume substantial power and CPU resources; thus, the performance of mobile devices will be degraded. To address this issue, we propose a novel method for detecting Android malware by clustering apps’ traffic at the edge computing nodes. First, a new integrated architecture of the cloud, edge, and mobile devices for Android malware detection is presented. Then, for repackaged Android malware, the network traffic content and statistics are extracted at the edge as detection features. Finally, in the cloud, similarities between apps are calculated, and the similarity values are automatically clustered to separate the original apps and the malware. The experimental results demonstrate that the proposed method can detect repackaged Android malware with high precision and with a minimal impact on the performance of mobile devices.
27

Niu, Wei-Na, Jiao Xie, Xiao-Song Zhang, Chong Wang, Xin-Qiang Li, Rui-Dong Chen, and Xiao-Lei Liu. "HTTP-Based APT Malware Infection Detection Using URL Correlation Analysis." Security and Communication Networks 2021 (April 7, 2021): 1–12. http://dx.doi.org/10.1155/2021/6653386.

Abstract:
APT malware exploits HTTP to establish communication with a C&C server and hide its malicious activities. Thus, HTTP-based APT malware infection can be discovered by analyzing HTTP traffic. Recent methods have depended on extracting statistical features from HTTP traffic, which is suitable for machine learning. However, the features they extract from the limited HTTP-based APT malware traffic dataset are too simple to adequately detect APT malware with strong randomness. In this paper, we propose an innovative approach which could uncover APT malware traffic related to data exfiltration and other suspect APT activities by analyzing the header fields of HTTP traffic. We use the Referer field in the HTTP header to construct a web request graph. Then, we optimize the web request graph by combining URL similarity and redirect reconstruction. We also use a normal uncorrelated request filter to filter the remaining unrelated legitimate requests. We have evaluated the proposed method using 1.48 GB normal HTTP flow from clickminer and 280 MB APT malware HTTP flow from Stratosphere Lab, Contagiodump, and pcapanalysis. The experimental results have shown that the URL-correlation-based APT malware traffic detection method can correctly detect 96.08% APT malware traffic, and its recall rate is 98.87%. We have also conducted experiments to compare our approach against Jiang’s method, MalHunter, and BotDet, and the experimental results have confirmed that our detection approach has a better performance, the accuracy of which reached 96.08% and the F1 value increased by more than 5%.
28

Wei, Chaoxian, Qiang Li, Dong Guo, and Xiangyu Meng. "Toward Identifying APT Malware through API System Calls." Security and Communication Networks 2021 (December 9, 2021): 1–14. http://dx.doi.org/10.1155/2021/8077220.

Abstract:
Advanced persistent threat (APT) attackers usually use self-developed malware to launch APT attacks. Therefore, we can improve our understanding of APT attacks by comprehending the behavior of APT malware. Unfortunately, current research cannot effectively explain the relationship between the recognition, detection, and defense of APT, and the models of similar studies also lack such an explanation. To defend against APT attacks and examine the similarity of different APT attacks, this study proposes an APT malware classification method based on a combination of multiple deep learning algorithms and transfer learning, using publicly available malware collected from several well-known APT groups. By extracting application programming interface (API) system calls and building a vector representation of the features with a combination of dynamic LSTM and an attention algorithm, we obtain the dynamically trained contribution of each API to the classification of different APT families. We then use transfer learning to perform multi-class classification of APT families. This study aims to reduce the burden on network security staff of reviewing a large number of suspicious files when defending against APT attacks. Additionally, by combining public threat intelligence, it can effectively intercept such attacks at the initial intrusion stage and support targeted defense against specific APT attacks. The experimental results show that the proposed method achieves 99.2% in distinguishing common malware from APT malware and assigns APT malware to different APT families with an accuracy of 95.5%.
29

Xu, Ming, Lingfei Wu, Shuhui Qi, Jian Xu, Haiping Zhang, Yizhi Ren, and Ning Zheng. "A similarity metric method of obfuscated malware using function-call graph." Journal of Computer Virology and Hacking Techniques 9, no. 1 (January 22, 2013): 35–47. http://dx.doi.org/10.1007/s11416-012-0175-y.

30

Haq, Irfan Ul, and Juan Caballero. "A Survey of Binary Code Similarity." ACM Computing Surveys 54, no. 3 (June 2021): 1–38. http://dx.doi.org/10.1145/3446371.

Abstract:
Binary code similarity approaches compare two or more pieces of binary code to identify their similarities and differences. The ability to compare binary code enables many real-world applications in scenarios where source code may not be available, such as patch analysis, bug search, and malware detection and analysis. Over the past 22 years, numerous binary code similarity approaches have been proposed, but the research area has not yet been systematically analyzed. This article presents the first survey of binary code similarity. It analyzes 70 binary code similarity approaches, which are systematized on four aspects: (1) the applications they enable, (2) their approach characteristics, (3) how the approaches are implemented, and (4) the benchmarks and methodologies used to evaluate them. In addition, the survey discusses the scope and origins of the area, its evolution over the past two decades, and the challenges that lie ahead.
31

Chu, Sung-Taek, HeeSeok Kim, Kwang-Hyuk Im, Kyu-Il Kim, and Chang-Ho Seo. "Development of a Performance Evaluation Model on Similarity Measurement Method of Malware." Journal of the Korea Contents Association 14, no. 10 (October 28, 2014): 32–40. http://dx.doi.org/10.5392/jkca.2014.14.10.032.

32

Cho, In Kyeom, and Eul Gyu Im. "Improvement of Performance of Malware Similarity Analysis by the Sequence Alignment Technique." KIISE Transactions on Computing Practices 21, no. 3 (March 15, 2015): 263–68. http://dx.doi.org/10.5626/ktcp.2015.21.3.263.

33

Jang, Eun-Gyeom, Sang Jun Lee, and Joong In Lee. "A Study on Similarity Comparison for File DNA-Based Metamorphic Malware Detection." Journal of the Korea Society of Computer and Information 19, no. 1 (January 29, 2014): 85–94. http://dx.doi.org/10.9708/jksci.2014.19.1.085.

34

Taheri, Rahim, Meysam Ghahramani, Reza Javidan, Mohammad Shojafar, Zahra Pooranian, and Mauro Conti. "Similarity-based Android malware detection using Hamming distance of static binary features." Future Generation Computer Systems 105 (April 2020): 230–47. http://dx.doi.org/10.1016/j.future.2019.11.034.

35

Rahim Khan, M. A., R. C. Tripathi, and Ajit Kumar. "Repacked android application detection using image similarity." Nexo Revista Científica 33, no. 01 (July 20, 2020): 190–99. http://dx.doi.org/10.5377/nexo.v33i01.10058.

Abstract:
The popularity of Android brings many functionalities to its users, but it also brings many threats. Repacked Android applications are one such threat and are the root of many other threats, such as malware, phishing, adware, and economic loss. Many techniques have previously been proposed for the detection of repacked applications, but they have their limitations and bottlenecks. In this work, we propose an image-similarity-based technique for detecting repacked applications. The proposed work builds on the main idea behind repacking, namely that "the attacker wants to create a fake application that looks visually similar to the original". We convert each APK file into a grayscale image and then use perceptual hashing to create a hash of each image. A string distance algorithm, such as the Hamming distance, is used to calculate the distance and search for repacked applications. The proposed work also applies distance calculation to binary features extracted from the app. The proposed technique performs well in terms of detection accuracy and scanning speed, achieving 96% accuracy.
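The hash-and-compare idea in this abstract can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the file's bytes are treated directly as grayscale pixel values, and a simple average-hash variant stands in for whichever perceptual hash the authors used. The function names and the synthetic byte strings are hypothetical.

```python
def average_hash(data: bytes, hash_bits: int = 64) -> int:
    """Compute a simple average hash over raw bytes.

    The bytes are read as grayscale pixel values (0-255), split into
    `hash_bits` equal chunks, and each chunk contributes one bit: 1 if its
    mean intensity is above the mean of all chunk means, else 0.
    This is an illustrative stand-in, not the hash used in the paper.
    """
    if len(data) < hash_bits:
        raise ValueError("input is shorter than the requested hash length")
    chunk = len(data) // hash_bits
    means = [
        sum(data[i * chunk:(i + 1) * chunk]) / chunk
        for i in range(hash_bits)
    ]
    overall = sum(means) / hash_bits
    value = 0
    for mean in means:
        value = (value << 1) | (1 if mean > overall else 0)
    return value


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


# Synthetic stand-ins for file contents (not real APK data).
original = bytes(range(256)) * 4
tampered = bytearray(original)
tampered[0] = 255                      # a small local modification
unrelated = bytes(reversed(original))  # a very different byte layout

d_near = hamming_distance(average_hash(original), average_hash(bytes(tampered)))
d_far = hamming_distance(average_hash(original), average_hash(unrelated))
```

A small distance suggests the two inputs look alike at the chosen resolution, while a large distance suggests they do not; any decision threshold would have to be tuned on real data.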
36

Berta, Katarina, Sasa Stojanovic, Milos Cvetanovic, and Zaharije Radivojevic. "Estimation of similarity between functions extracted from x86 executable files." Serbian Journal of Electrical Engineering 12, no. 2 (2015): 253–62. http://dx.doi.org/10.2298/sjee1502253b.

Abstract:
Comparison of functions is required in various domains of software engineering. In most domains, comparison is done using source code, but in some domains, such as license violation or malware analysis, only binary code is available. The goal of this paper is to evaluate whether the existing solution meant for the ARM architecture can be applied to the x86 architecture. The existing solution encompasses multiple approaches, but for the purpose of this paper three representative approaches are implemented; two are based on machine learning, and the third does not require previous knowledge. Results show that the best recalls obtained for the first ten positions on both architectures are comparable and do not differ significantly. The results confirm that adapting all approaches of the existing solution is not only possible but also promising, and that it represents an adequate basis for future research.
37

Yamany, Bahaa, Mahmoud Said Elsayed, Anca D. Jurcut, Nashwa Abdelbaki, and Marianne A. Azer. "A New Scheme for Ransomware Classification and Clustering Using Static Features." Electronics 11, no. 20 (October 14, 2022): 3307. http://dx.doi.org/10.3390/electronics11203307.

Abstract:
Ransomware is a strain of malware that disables access to the user’s resources after infiltrating a victim’s system. It is one of the most dangerous types of malware that organizations face, because it blocks data access or publishes private data over the internet. The major challenge for any affected entity is how to decrypt the files encrypted by ransomware. Binary analysis of ransomware can provide a means to characterize the relationships between different features used by ransomware families and to track the ransomware encryption routine. In this paper, we compare the different ransomware detection approaches and techniques. We investigate the criteria, parameters, and tools used in the ransomware detection ecosystem. We present the main recommendations and best practices for ransomware mitigation. In addition, we propose an efficient ransomware indexing system that provides search functionalities, similarity checking, sample classification, and clustering. The new scheme mainly targets native ransomware binaries, and the indexing engine depends on hybrid data from the static analyzer system. Our scheme tracks and classifies ransomware based on static features to find the similarity between different ransomware samples. This is done by calculating the absolute Jaccard index. Results have shown that the Import Address Table (IAT) feature can be used to classify different ransomware more accurately than the Strings feature.
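A Jaccard comparison of static feature sets, as mentioned in this abstract, might look like the following Python sketch. It uses the standard Jaccard index (intersection size divided by union size); how the paper's "absolute" variant differs, if at all, is not stated here, so that is an assumption. The two import lists are made-up examples, not data from the study.

```python
def jaccard_index(features_a, features_b) -> float:
    """Return |A ∩ B| / |A ∪ B| for two collections of features."""
    a, b = set(features_a), set(features_b)
    if not a and not b:
        # Convention chosen here: two empty feature sets count as identical.
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical imported-function sets for two samples.
sample_one_imports = {"CreateFileW", "WriteFile", "CryptEncrypt"}
sample_two_imports = {"WriteFile", "CryptEncrypt", "DeleteFileW"}

similarity = jaccard_index(sample_one_imports, sample_two_imports)
```

The same function could be applied to string sets instead of import sets, which is the kind of feature-by-feature comparison the abstract reports.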
38

Namanya, Anitta Patience, Irfan U. Awan, Jules Pagna Disso, and Muhammad Younas. "Similarity hash based scoring of portable executable files for efficient malware detection in IoT." Future Generation Computer Systems 110 (September 2020): 824–32. http://dx.doi.org/10.1016/j.future.2019.04.044.

39

Turnip, Togu Novriansyah, Pratiwi Okuli Manik, Jhon Harry Tampubolon, and Patota Adi Petro Siahaan. "Klasifikasi Aplikasi Android menggunakan Algoritme K-Means dan Convolutional Neural Network berdasarkan Permission." Jurnal Teknologi Informasi dan Ilmu Komputer 7, no. 2 (February 18, 2020): 399. http://dx.doi.org/10.25126/jtiik.2020702641.

Abstract:
Convolutional Neural Network (CNN) is a multilayer perceptron method that can classify apps into more than two classes. This study classifies apps into three classes: benign (no malware), less harmful, and harmful. The dataset used in this study was constructed from the Androsec and Koodous datasets, with a total of 37,289 apps. The dataset contains undetected apps (no malware) and detected apps (containing malware). The detected apps are clustered with the k-means algorithm into less harmful and harmful groups based on the similarity of the apps' permission features. The framework includes dataset preprocessing, a learning and classification algorithm using CNN, and "check APK to Model". The best accuracy obtained in this study is 92.23%, and the model can classify apps into the benign, less harmful, and harmful classes.
40

Torabi, Sadegh, Mirabelle Dib, Elias Bou-Harb, Chadi Assi, and Mourad Debbabi. "A Strings-Based Similarity Analysis Approach for Characterizing IoT Malware and Inferring Their Underlying Relationships." IEEE Networking Letters 3, no. 3 (September 2021): 161–65. http://dx.doi.org/10.1109/lnet.2021.3076600.

41

Parres-Peredo, Alvaro, Ivan Piza-Davila, and Francisco Cervantes. "Unexpected-Behavior Detection Using TopK Rankings for Cybersecurity." Applied Sciences 9, no. 20 (October 17, 2019): 4381. http://dx.doi.org/10.3390/app9204381.

Abstract:
Anomaly-based intrusion detection systems use profiles to characterize the expected behavior of network users. Most of these systems characterize the entire network traffic within a single profile. This work proposes a user-level anomaly-based intrusion detection methodology using only the user’s network traffic. The proposed profile is a collection of TopK rankings of the services reached by the user. To detect unexpected behaviors, the real-time traffic is organized into TopK rankings and compared to the profile using similarity measures. The experiments demonstrated that the proposed methodology was capable of detecting a particular kind of malware attack for all the users tested.
42

NICHEPORUK, A., A. NICHEPORUK, I. NEGA, Y. NICHEPORUK, and A. KAZANTSEV. "INFORMATION TECHNOLOGY FOR DETECTING METAMORPHIC VIRUSES BASED ON THE ANALYSIS OF THE BEHAVIOR OF APPLICATIONS IN THE CORPORATE NETWORK." Computer Systems and Information Technologies 1, no. 1 (September 2, 2020): 60–67. http://dx.doi.org/10.31891/csit-2020-1-8.

Abstract:
The problem of cybercrime is one of the greatest threats to the modern information world. Among a wide range of different types of malware, the leading place is occupied by viral programs that mutate their own software code, i.e., polymorphic and metamorphic viruses. By transforming the code, attackers try to make their previous malware different (in terms of syntax, not in terms of semantics) with each new infection. According to a study conducted by Webroot in 2018, about 94% of all malware mutates its software code. In addition, the prevalence of mutated software is aggravated by free access to metamorphic generators, which allow a metamorphic component to be imported into malware. Therefore, the relevance of developing new methods and information technologies focused on detecting polymorphic and metamorphic software leaves no doubt. The paper proposes an information technology for detecting metamorphic viruses based on the analysis of the behavior of applications in the corporate network. The detection process is based on the analysis of API calls that describe the potentially dangerous behavior of the software application. After suspicious behavior of the application has been established, the disassembled code of the functional blocks of the suspicious application is compared with the code of the functional blocks of its modified version. Modified emulators are installed on network hosts to create a modified version of the software application. In order to increase the overall efficiency of detecting metamorphic viruses, the information technology searches for a match between the functional blocks of the metamorphic virus and its modified version. A fuzzy inference system is used to form a conclusion about the similarity of a suspicious program to a metamorphic virus. If harmful behavior is insufficiently manifested, other network hosts are involved in order to increase the reliability of metamorphic virus detection.
43

Yavneh, Amir, Roy Lothan, and Dan Yamin. "Co-similar malware infection patterns as a predictor of future risk." PLOS ONE 16, no. 3 (March 29, 2021): e0249273. http://dx.doi.org/10.1371/journal.pone.0249273.

Abstract:
The internet is flooded with malicious content that can come in various forms and lead to information theft and monetary losses. From the ISP to the browser itself, many security systems act to defend the user from such content. However, most systems have at least one of three major limitations: 1) they are not personalized and do not account for the differences between users, 2) their defense mechanism is reactive and unable to predict upcoming attacks, and 3) they extensively track and use the user’s activity, thereby invading her privacy in the process. We developed a methodological framework to predict future exposure to malicious content. Our framework accounts for three factors: the user’s previous exposure history, her co-similarity to other users based on their previous exposures in a conceptual network, and how the network evolves. Utilizing over 20,000 users’ browsing data, our approach succeeds in achieving accurate results on the infection-prone portion of the population, surpassing common methods, and doing so with as little as 1/1000 of the personal information it requires.
44

Pan, Zulie, Taiyan Wang, Lu Yu, and Yintong Yan. "Position Distribution Matters: A Graph-Based Binary Function Similarity Analysis Method." Electronics 11, no. 15 (August 5, 2022): 2446. http://dx.doi.org/10.3390/electronics11152446.

Abstract:
Binary function similarity analysis evaluates the similarity of functions at the binary level to aid program analysis, which is popular in many fields, such as vulnerability detection, binary clone detection, and malware detection. Graph-based methods have relatively good performance in practice, but currently, they cannot capture similarity in the aspect of the graph position distribution and lose information in graph processing, which leads to low accuracy. This paper presents PDM, a graph-based method to increase the accuracy of binary function similarity detection, by considering position distribution information. First, an enhanced Attributed Control Flow Graph (ACFG+) of a function is constructed based on a control flow graph, assisted by the instruction embedding technique and data flow analysis. Then, ACFG+ is fed to a graph embedding model using the CapsGNN and DiffPool mechanisms, to enrich information in graph processing by considering the position distribution. The model outputs the corresponding embedding vector, and we can calculate the similarity between different function embeddings using the cosine distance. Similarity detection is completed in the Siamese network. Experiments show that compared with VulSeeker and PalmTree+VulSeeker, PDM can stably obtain three-times and two-times higher accuracy, respectively, in binary function similarity detection and can detect up to six-times more results in vulnerability detection. When comparing with some state-of-the-art tools, PDM has comparable Top-5, Top-10, and Top-20 ranking results with respect to BinDiff, Diaphora, and Kam1n0 and significant advantages in the Top-50, Top-100, and Top-200 detection results.
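The final comparison step described in this abstract, scoring two function embeddings with the cosine measure, can be sketched as follows. The embedding values below are invented for illustration; in the paper they would come from the graph embedding model, which is not reproduced here.

```python
import math


def cosine_similarity(u, v) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def cosine_distance(u, v) -> float:
    """Return 1 - cosine similarity, so identical directions give 0."""
    return 1.0 - cosine_similarity(u, v)


# Hypothetical function embeddings.
embedding_a = [1.0, 2.0, 3.0]
embedding_b = [2.0, 4.0, 6.0]   # same direction as embedding_a
embedding_c = [3.0, 0.0, -1.0]  # orthogonal to embedding_a

same_direction = cosine_similarity(embedding_a, embedding_b)
orthogonal = cosine_similarity(embedding_a, embedding_c)
```

Because the measure depends only on direction, scaling an embedding does not change its score against another vector.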
45

Choi, Sunoh. "Malicious Powershell Detection Using Graph Convolution Network." Applied Sciences 11, no. 14 (July 12, 2021): 6429. http://dx.doi.org/10.3390/app11146429.

Abstract:
The internet’s rapid growth has resulted in an increase in the number of malicious files. Recently, PowerShell scripts and Windows portable executable (PE) files have been used in malicious behaviors. To address this problem, artificial intelligence (AI)-based malware detection methods have been widely studied. Among AI techniques, the graph convolution network (GCN) was recently introduced. Here, we propose a malicious PowerShell detection method using a GCN. A GCN requires an adjacency matrix, so we propose an adjacency matrix generation method using the Jaccard similarity. In addition, we show that the malicious PowerShell detection rate is increased by approximately 8.2% using the GCN.
46

Botacin, Marcus, Vitor Hugo Galhardo Moia, Fabricio Ceschin, Marco A. Amaral Henriques, and André Grégio. "Understanding uses and misuses of similarity hashing functions for malware detection and family clustering in actual scenarios." Forensic Science International: Digital Investigation 38 (September 2021): 301220. http://dx.doi.org/10.1016/j.fsidi.2021.301220.

47

Hai, Nguyen Minh. "A STATISTICAL APPROACH FOR PACKER IDENTIFICATION." Vietnam Journal of Science and Technology 54, no. 3A (March 20, 2018): 129. http://dx.doi.org/10.15625/2525-2518/54/3a/11966.

Abstract:
Most modern malware is packed by packers, which automatically apply many obfuscation techniques to defeat anti-virus software. To identify the packer, most industry approaches still adopt the well-known technique of signature matching, which can be easily evaded. This paper studies a statistical approach to tackle this problem. We propose a new weight for determining which obfuscation techniques are favoured by particular packers. We call it obfuscation technique frequency-inverse packer frequency. As the term implies, this weight calculates a value for each obfuscation technique in a packer through an inverse proportion of the frequency of the obfuscation technique in a particular packer to the percentage of packers the obfuscation technique appears in. Obfuscation techniques with a high value show a strong relationship with the packer they appear in. Based on this weight, each packer is represented by a vector of these values. The packer used is then identified by measuring the similarity between the vectors of the packer and the targeted file. To check the accuracy of our approach, we performed packer identification experiments on 200 real-world malware samples, comparing our approach with the binary signature technique adopted in CFF Explorer. The results show that our technique produces better detection.
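The weighting scheme in this abstract reads like the TF-IDF weighting from text retrieval, with packers in place of documents and obfuscation techniques in place of terms. The Python sketch below assumes that standard form, (technique frequency within a packer) × log(number of packers / number of packers using the technique); the abstract does not give the paper's exact formula, so this is an assumption. The packer and technique names are hypothetical.

```python
import math
from collections import Counter


def technique_weights(packers):
    """Compute a TF-IDF-style weight for each technique in each packer.

    `packers` maps a packer name to a list of observed obfuscation
    technique labels, with repeats (assumed input format).
    """
    n_packers = len(packers)
    # In how many packers does each technique appear at least once?
    packer_freq = Counter()
    for techniques in packers.values():
        packer_freq.update(set(techniques))

    weights = {}
    for name, techniques in packers.items():
        counts = Counter(techniques)
        total = sum(counts.values())
        weights[name] = {
            technique: (count / total)
            * math.log(n_packers / packer_freq[technique])
            for technique, count in counts.items()
        }
    return weights


# Hypothetical observations for two packers.
observations = {
    "packer_a": ["anti_debug", "anti_debug", "overlapping_code"],
    "packer_b": ["overlapping_code", "self_modification"],
}
weights = technique_weights(observations)
```

Under this form, a technique shared by every packer gets weight zero, while a technique concentrated in one packer gets a high weight, which matches the behaviour the abstract describes.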
48

Bukhanov, D. G., V. M. Polyakov, and M. A. Redkina. "Detection of malware using an artificial neural network based on adaptive resonant theory." Prikladnaya Diskretnaya Matematika, no. 52 (2021): 69–82. http://dx.doi.org/10.17223/20710410/52/4.

Abstract:
The process of detecting malicious code by anti-virus systems is considered. The main part of this process is the procedure for analyzing a file or process. Artificial neural networks based on adaptive resonance theory are proposed as a method of analysis. The graph2vec vectorization algorithm is used to represent the analyzed program codes in numerical format. Although this vectorization method ignores the semantic relationships within the sequence of executable commands, it reduces the analysis time without significant loss of accuracy. The use of an ART-2m artificial neural network with a hierarchical memory structure made it possible to reduce the classification time for a malicious file. Reducing the classification time makes it possible to set more memory levels and increase the similarity parameter, which leads to improved classification quality. Experiments show that with this approach to detecting malicious software, similar files can be recognized by both size and behavior.
49

Arslan, Recep Sinan. "FG-Droid: Grouping based feature size reduction for Android malware detection." PeerJ Computer Science 8 (July 14, 2022): e1043. http://dx.doi.org/10.7717/peerj-cs.1043.

Abstract:
Background: The number of applications prepared for use on mobile devices has increased rapidly with the widespread use of the Android OS. This has resulted in the undesired installation of Android application packages (APKs) that violate user privacy or are malicious. The increasing similarity between Android malware and benign applications makes it difficult to distinguish them from each other and is a cause of concern for users. Methods: In this study, FG-Droid, a machine-learning-based classifier that groups the features obtained by static analysis, was proposed. It was developed through experiments with machine learning (ML), deep neural network (DNN), recurrent neural network (RNN), long short-term memory (LSTM), and gated recurrent unit (GRU)-based models using the Drebin, Genome, and Arslan datasets. Results: The experimental results revealed that FG-Droid achieved a 97.7% area under the receiver operating characteristic (ROC) curve (AUC) score with a vector including only 11 static features and the ExtraTree algorithm. While reaching a high classification rate, only 0.063 seconds were needed for analysis per application. This means that the proposed feature selection method is faster than all traditional feature selection methods, and FG-Droid is one of the tools with the shortest analysis time per application to date. As a result, an efficient classifier with few features, low analysis time, and high classification success was developed using a unique feature grouping method.
50

Wang, Yan, Peng Jia, Cheng Huang, Jiayong Liu, and Peisong He. "Hierarchical Attention Graph Embedding Networks for Binary Code Similarity against Compilation Diversity." Security and Communication Networks 2021 (August 10, 2021): 1–19. http://dx.doi.org/10.1155/2021/9954520.

Abstract:
Binary code similarity comparison is the technique that determines whether two functions are similar by considering only their compiled form. It has many applications, including clone detection, malware classification, and vulnerability discovery. However, it is challenging to design a robust code similarity comparison engine, since different compilation settings make logically similar assembly functions appear very different. Moreover, existing approaches suffer from high performance overheads, low robustness, or poor scalability. In this paper, a novel solution, HBinSim, is proposed that employs the multiview features of the function to address these challenges. It first extracts the syntactic and semantic features of each basic block by static analysis. HBinSim further analyzes the function and constructs a syntactic attribute control flow graph and a semantic attribute control flow graph for each function. Then, a hierarchical attention graph embedding network is designed for graph-structured data processing. The network model has a hierarchical structure that mirrors the hierarchical structure of the function. It has three levels of attention mechanisms applied at the instruction, basic block, and function level, enabling it to attend differentially to more and less critical content when constructing the function representation. We conduct extensive experiments to evaluate its effectiveness and efficiency. The results show that our tool outperforms the state-of-the-art binary code similarity comparison tools by a large margin in clone searching against compilation diversity. A real-world vulnerability search case further demonstrates the usefulness of our system.