Boosting the detection of malicious documents using designated active learning methods (Conference Paper)


  • Most organizations create, send and receive huge numbers of documents daily. Attackers increasingly take advantage of innocent users who casually open email messages, assumed to be benign, that carry malicious documents. Recent targeted attacks aimed at organizations use the newer Microsoft Word document format (*.docx). Anti-virus software fails to detect new, unknown malicious files, including malicious docx files. In this study, we present the SFEM feature extraction methodology and designated Active Learning (AL) methods aimed at accurately detecting new, unknown malicious docx files while efficiently enhancing the detection model's capabilities over time. Our AL methods identify and acquire only a small set of new docx files that are most likely to be malicious, as well as informative benign files. These files are used to enhance the knowledge stores of both the detection model and the anti-virus software. Results show that our active learning methods used only 14% of the labeled docx files within the organization, which led to a 95.5% reduction in labeling effort compared to passive learning and SVM-Margin (an existing active learning method). Our AL methods also showed a significant improvement of 91% in the acquisition of unknown docx malware compared to passive learning and SVM-Margin, thus providing an improved updating solution for the detection model, as well as for the anti-virus software widely used within organizations.
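The abstract contrasts SVM-Margin, which queries the files the classifier is least certain about, with selection that favors files most likely to be malicious. The sketch below illustrates both criteria under stated assumptions. It assumes each unlabeled docx file already has a signed SVM decision score, where a positive score means "malicious". The function names and the half-and-half budget split in `combined_select` are hypothetical. They are not the authors' SFEM features or their designated AL methods.

```python
from typing import List, Sequence


def svm_margin_select(scores: Sequence[float], k: int) -> List[int]:
    """SVM-Margin style: return indices of the k samples closest to the
    hyperplane (smallest |score|), i.e. the most uncertain ones."""
    order = sorted(range(len(scores)), key=lambda i: abs(scores[i]))
    return order[:k]


def most_likely_malicious_select(scores: Sequence[float], k: int) -> List[int]:
    """Return indices of up to k samples on the malicious side (score > 0),
    furthest from the hyperplane first."""
    malicious = [i for i, s in enumerate(scores) if s > 0]
    malicious.sort(key=lambda i: scores[i], reverse=True)
    return malicious[:k]


def combined_select(scores: Sequence[float], k: int) -> List[int]:
    """Hypothetical mix: spend about half the budget on likely-malicious
    files, then fill the rest with the most uncertain remaining files."""
    picked = most_likely_malicious_select(scores, k - k // 2)
    uncertain = [i for i in svm_margin_select(scores, len(scores)) if i not in picked]
    return picked + uncertain[: k - len(picked)]


if __name__ == "__main__":
    # Hypothetical decision scores for six unlabeled docx files.
    scores = [2.1, -0.1, 0.05, -1.7, 1.2, -0.4]
    print(svm_margin_select(scores, 2))             # [2, 1]
    print(most_likely_malicious_select(scores, 2))  # [0, 4]
    print(combined_select(scores, 3))               # [0, 4, 2]
```

With these scores, SVM-Margin picks the two borderline files. The malicious-first criterion picks the two files the model is most confident are malicious. This reflects the trade-off the abstract describes between acquiring new malware and refining the decision boundary.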

publication date

  • December 9, 2015