TUTORIAL ON ROBUST DEEP LEARNING MODELS
Abstract
This tutorial introduces the fundamentals of adversarial robustness of deep learning, presenting a well-structured review of up-to-date techniques to assess the vulnerability of various types of deep learning models to adversarial examples. It particularly highlights state-of-the-art techniques in adversarial attacks and robustness verification of deep neural networks (DNNs). We also introduce effective countermeasures to improve the robustness of deep learning models, with a particular focus on generalisable adversarial training. We aim to provide a comprehensive picture of this emerging direction and make the community aware of the urgency and importance of designing robust deep learning models in safety-critical data analytical applications, ultimately enabling end-users to trust deep learning classifiers. We also summarise potential research directions concerning the adversarial robustness of deep learning and its potential benefits for accountable and trustworthy deep learning-based data analytical systems and applications.
Content
The content of the tutorial is planned as follows:
- Introduction: this part will introduce the concept of adversarial robustness by showing examples from computer vision, natural language processing, malware detection, and autonomous systems. Specifically, we will demonstrate the vulnerability of various types of deep learning models to different adversarial examples. We will also highlight the differences in research focus on adversarial robustness across communities, i.e., attack, defense and verification.
- Falsification: this part will detail well-known adversarial attack methods, with the aim of providing insights into why adversarial examples exist and how to generate adversarial perturbations effectively and efficiently. Specifically, we will present five well-established works: FGSM [1], C&W [2], DeepFool [3], JMSA [4], and ZeroAttack [20]; a minimal FGSM sketch is given after this outline. At the end of this part, we will also briefly touch on some novel adversarial examples that have emerged recently, including universal adversarial examples [21], spatially transformed attacks [7], adversarial patches [22], etc.
- Rectification: this part will present an overview of state-of-the-art robust optimisation techniques for adversarial defense, with an emphasis on generalisable adversarial training and regularisation methods. In particular, adversarial training with the Fast Gradient Method (FGM) [14], the Projected Gradient Method (PGM) [15], and Wasserstein Risk Minimization (WRM) [18] will be analysed with respect to generalisation guarantees; regularisation techniques that promote training stability and robustness against adversarial examples, such as spectral normalisation [13] and Lipschitz regularisation [23], will also be discussed. A sketch of projected-gradient adversarial training is given after this outline.
- Verification: this part will review state-of-the-art formal verification techniques for checking whether a deep learning model is robust. Techniques to be discussed include constraint-solving-based techniques (MILP, Reluplex [8]), approximation techniques (MaxSens [23], AI2 [10], DeepSymbol), and global-optimisation-based techniques (DLV [11], DeepGO [12], DeepGame [17]). An illustrative bound-propagation sketch is given after this outline.
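To make the single-step idea behind FGSM [1] concrete, the following is a minimal sketch in PyTorch. It assumes a differentiable classifier model, an input batch x with values in [0, 1], labels y, and a perturbation budget epsilon; these names are illustrative placeholders rather than the exact setup of the cited paper.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon):
    """One-step FGSM: perturb x by epsilon in the sign of the loss gradient."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # Move each input dimension by epsilon in the direction that increases the loss.
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    # Keep the adversarial example in the valid input range.
    return x_adv.clamp(0.0, 1.0).detach()
```

Iterative attacks repeat this step with a smaller step size and project back onto the epsilon-ball after each update, which leads directly to the adversarial training sketch below.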
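The next sketch illustrates the min-max robust optimisation discussed in the Rectification part, using an iterative projected-gradient inner attack (the PGM-style variant). The hyperparameters epsilon, alpha, and steps are illustrative assumptions, not values prescribed by [14], [15] or [18].

```python
import torch
import torch.nn.functional as F

def pgd_perturb(model, x, y, epsilon, alpha, steps):
    """Inner maximisation: gradient ascent projected onto the L-infinity epsilon-ball."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        # Project back onto the epsilon-ball around the clean input and the valid range.
        x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon).clamp(0.0, 1.0)
    return x_adv

def adversarial_training_step(model, optimizer, x, y, epsilon=8/255, alpha=2/255, steps=7):
    """Outer minimisation: one optimiser step on the adversarially perturbed batch."""
    x_adv = pgd_perturb(model, x, y, epsilon, alpha, steps)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```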
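As a small taste of the approximation-style reasoning behind the verification tools above, the sketch below propagates an L-infinity input box through a fully-connected ReLU network using simple interval arithmetic. This interval bound propagation is a deliberate simplification chosen for brevity, not one of the listed tools, and the layers, epsilon, and true_class names are illustrative assumptions.

```python
import numpy as np

def interval_bounds(layers, x, epsilon):
    """Propagate the box [x - epsilon, x + epsilon] through affine layers
    (with ReLU on all but the last) and return element-wise bounds on the logits."""
    lower, upper = x - epsilon, x + epsilon
    for i, (W, b) in enumerate(layers):
        W_pos, W_neg = np.clip(W, 0.0, None), np.clip(W, None, 0.0)
        # Positive weights take the bounds directly; negative weights swap them.
        new_lower = W_pos @ lower + W_neg @ upper + b
        new_upper = W_pos @ upper + W_neg @ lower + b
        if i < len(layers) - 1:
            # ReLU is monotone, so it can be applied to the bounds directly.
            new_lower, new_upper = np.maximum(new_lower, 0.0), np.maximum(new_upper, 0.0)
        lower, upper = new_lower, new_upper
    return lower, upper

def certified_robust(layers, x, epsilon, true_class):
    """Sound certificate: the true logit's lower bound must exceed every other
    logit's upper bound over the whole perturbation box."""
    lower, upper = interval_bounds(layers, x, epsilon)
    others = np.delete(upper, true_class)
    return lower[true_class] > others.max()
```

Because the bounds are over-approximations, a True answer is a sound robustness certificate while a False answer may be a false alarm; exact techniques such as MILP and Reluplex [8] tighten or eliminate this gap at a higher computational cost.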