Novel active learning methods for enhanced PC malware detection in windows OS Academic Article uri icon


  • The formation of new malwares every day poses a significant challenge to anti-virus vendors since antivirus tools, using manually crafted signatures, are only capable of identifying known malware instances and their relatively similar variants. To identify new and unknown malwares for updating their anti-virus signature repository, anti-virus vendors must daily collect new, suspicious files that need to be analyzed manually by information security experts who then label them as malware or benign. Analyzing suspected files is a time-consuming task and it is impossible to manually analyze all of them. Consequently, anti-virus vendors use machine learning algorithms and heuristics in order to reduce the number of suspect files that must be inspected manually. These techniques, however, lack an essential element – they cannot be daily updated. In this work we introduce a solution for this updatability gap. We present an active learning (AL) framework and introduce two new AL methods that will assist anti-virus vendors to focus their analytical efforts by acquiring those files that are most probably malicious. Those new AL methods are designed and oriented towards new malware acquisition. To test the capability of our methods for acquiring new malwares from a stream of unknown files, we conducted a series of experiments over a ten-day period. A comparison of our methods to existing high performance AL methods and to random selection, which is the naïve method, indicates that the AL methods outperformed random selection for all performance measures. Our AL methods outperformed existing AL method in two respects, both related to the number of new malwares acquired daily, the core measure in this study. First, our best performing AL method, termed “Exploitation”, acquired on the 9th day of the experiment about 2.6 times more malwares than the existing AL method and 7.8 more times than the random selection. Secondly, while the existing AL method showed a decrease in the number of new malwares acquired over 10 days, our AL methods showed an increase and a daily improvement in the number of new malwares acquired. Both results point towards increased efficiency that can possibly assist anti-virus vendors.

publication date

  • October 1, 2014