Defense Methods Against Adversarial Examples for Recurrent Neural Networks. Academic Article

abstract

  • Adversarial examples are known to mislead deep learning models into incorrect classifications, even in domains where such models achieve state-of-the-art performance. Until recently, research on both attack and defense methods focused on image recognition, mostly using convolutional neural networks. In recent years, adversarial example generation methods for recurrent neural networks (RNNs) have been published, demonstrating that RNN classifiers are vulnerable as well. In this paper, we present five novel defense methods to make RNN classifiers more robust against such attacks, in contrast to previous defense methods, which were designed only for non-sequence-based models. We evaluate our methods against state-of-the-art attacks in the cyber security domain, where real adversaries (malware developers) exist, but our methods can be applied against any sequence-based adversarial attack, e.g., in the NLP domain. Using our methods, we decrease the attack effectiveness from 99.9% to about 15%.
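  • To illustrate the general setting the abstract describes, the sketch below shows a generic adversarial-training loop for an LSTM sequence classifier. This is an assumption-laden toy example, not the paper's five defense methods: it assumes PyTorch, random toy token sequences (standing in for, e.g., API-call traces), and a hypothetical FGSM-style perturbation applied in embedding space.

    ```python
    # Illustrative sketch only -- NOT the paper's defense methods.
    # Assumes PyTorch; the perturbation function and toy data are hypothetical.
    import torch
    import torch.nn as nn

    class LSTMClassifier(nn.Module):
        def __init__(self, vocab_size=100, embed_dim=32, hidden_dim=64, n_classes=2):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.fc = nn.Linear(hidden_dim, n_classes)

        def forward(self, x=None, embedded=None):
            # Accept either token ids or pre-computed embeddings so the attack
            # can back-propagate through the embedding space.
            if embedded is None:
                embedded = self.embed(x)
            _, (h, _) = self.lstm(embedded)
            return self.fc(h[-1])

    def fgsm_embedding_perturbation(model, tokens, labels, eps=0.1):
        """FGSM-style perturbation of the embedded sequence (illustrative)."""
        embedded = model.embed(tokens).detach().requires_grad_(True)
        loss = nn.functional.cross_entropy(model(embedded=embedded), labels)
        loss.backward()
        return (embedded + eps * embedded.grad.sign()).detach()

    # Toy data: a batch of 16 token sequences of length 20 with binary labels.
    model = LSTMClassifier()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    tokens = torch.randint(0, 100, (16, 20))
    labels = torch.randint(0, 2, (16,))

    for epoch in range(3):
        # Mix clean and adversarially perturbed examples in each update --
        # the generic idea behind adversarial (re)training defenses.
        adv_embedded = fgsm_embedding_perturbation(model, tokens, labels)
        loss = (nn.functional.cross_entropy(model(tokens), labels)
                + nn.functional.cross_entropy(model(embedded=adv_embedded), labels))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        print(f"epoch {epoch}: loss {loss.item():.4f}")
    ```

    In practice, sequence-based defenses must also handle the discrete nature of the input (tokens cannot be perturbed continuously at inference time), which is one reason image-domain defenses do not transfer directly to RNN classifiers.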

publication date

  • January 1, 2019