TUTORIAL ON ROBUST DEEP LEARNING MODELS
Abstract
This tutorial introduces the fundamentals of adversarial robustness of deep learning, presenting a well-structured review of up-to-date techniques to assess the vulnerability of various types of deep learning models to adversarial examples. It particularly highlights state-of-the-art techniques in adversarial attacks and robustness verification of deep neural networks (DNNs). We also introduce effective countermeasures to improve the robustness of deep learning models, with a particular focus on generalisable adversarial training. We aim to provide a comprehensive picture of this emerging direction and make the community aware of the urgency and importance of designing robust deep learning models in safety-critical data analytical applications, ultimately enabling end-users to trust deep learning classifiers. We also summarise potential research directions concerning the adversarial robustness of deep learning and its potential benefits for accountable and trustworthy deep learning-based data analytical systems and applications.
Content
The content of the tutorial is planned as follows:
- Introduction: this part will introduce the concept of adversarial robustness by showing examples from computer vision, natural language processing, malware detection, and autonomous systems. Specifically, we will demonstrate the vulnerability of various types of deep learning models to different adversarial examples. We will also highlight the differences in research focus on adversarial robustness across communities, i.e., attack, defense and verification.
- Falsification: this part will detail well-known adversarial attack methods, with the aim of providing insights into why adversarial examples exist and how to generate adversarial perturbations effectively and efficiently. Specifically, we will present five well-established works: FGSM [1], C&W [2], DeepFool [3], JMSA [4], and ZeroAttack [20]; a minimal FGSM sketch is given after this outline. At the end of this part, we will also briefly touch on some novel adversarial examples that have emerged recently, including universal adversarial examples [21], spatially transformed attacks [7], adversarial patches [22], etc.
- Rectification: this part will present an overview of state-of-the-art robust optimisation techniques for adversarial defense, with an emphasis on generalisable adversarial training and regularisation methods. In particular, adversarial training with the Fast Gradient Method (FGM) [14], the Projected Gradient Method (PGM) [15], and Wasserstein Risk Minimization (WRM) [18] will be analysed with respect to generalisation guarantees; regularisation techniques that promote training stability and robustness against adversarial examples, such as spectral normalisation [13] and Lipschitz regularisation [23], will also be discussed. A sketch of projected-gradient adversarial training is given after this outline.
- Verification: this part will review state-of-the-art formal verification techniques for checking whether a deep learning model is robust. Techniques to be discussed include constraint-solving-based techniques (MILP, Reluplex [8]), approximation techniques (MaxSens [23], AI2 [10], DeepSymbol), and global-optimisation-based techniques (DLV [11], DeepGO [12], DeepGame [17]). An illustrative bound-propagation sketch is given after this outline.
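To make the single-step idea behind FGSM [1] concrete, the following is a minimal sketch in PyTorch. It assumes a differentiable classifier model, an input batch x with values in [0, 1], labels y, and a perturbation budget epsilon; these names are illustrative placeholders rather than the exact setup of the cited paper.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon):
    """One-step FGSM: perturb x by epsilon in the sign of the loss gradient."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # Move each input dimension by epsilon in the direction that increases the loss.
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    # Keep the adversarial example in the valid input range.
    return x_adv.clamp(0.0, 1.0).detach()
```

Iterative attacks repeat this step with a smaller step size and project back onto the epsilon-ball after each update, which leads directly to the adversarial training sketch below.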
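The next sketch illustrates the min-max robust optimisation discussed in the Rectification part, using an iterative projected-gradient inner attack (the PGM-style variant). The hyperparameters epsilon, alpha, and steps are illustrative assumptions, not values prescribed by [14], [15] or [18].

```python
import torch
import torch.nn.functional as F

def pgd_perturb(model, x, y, epsilon, alpha, steps):
    """Inner maximisation: gradient ascent projected onto the L-infinity epsilon-ball."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        # Project back onto the epsilon-ball around the clean input and the valid range.
        x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon).clamp(0.0, 1.0)
    return x_adv

def adversarial_training_step(model, optimizer, x, y, epsilon=8/255, alpha=2/255, steps=7):
    """Outer minimisation: one optimiser step on the adversarially perturbed batch."""
    x_adv = pgd_perturb(model, x, y, epsilon, alpha, steps)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```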
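As a small taste of the approximation-style reasoning behind the verification tools above, the sketch below propagates an L-infinity input box through a fully-connected ReLU network using simple interval arithmetic. This interval bound propagation is a deliberate simplification chosen for brevity, not one of the listed tools, and the layers, epsilon, and true_class names are illustrative assumptions.

```python
import numpy as np

def interval_bounds(layers, x, epsilon):
    """Propagate the box [x - epsilon, x + epsilon] through affine layers
    (with ReLU on all but the last) and return element-wise bounds on the logits."""
    lower, upper = x - epsilon, x + epsilon
    for i, (W, b) in enumerate(layers):
        W_pos, W_neg = np.clip(W, 0.0, None), np.clip(W, None, 0.0)
        # Positive weights take the bounds directly; negative weights swap them.
        new_lower = W_pos @ lower + W_neg @ upper + b
        new_upper = W_pos @ upper + W_neg @ lower + b
        if i < len(layers) - 1:
            # ReLU is monotone, so it can be applied to the bounds directly.
            new_lower, new_upper = np.maximum(new_lower, 0.0), np.maximum(new_upper, 0.0)
        lower, upper = new_lower, new_upper
    return lower, upper

def certified_robust(layers, x, epsilon, true_class):
    """Sound certificate: the true logit's lower bound must exceed every other
    logit's upper bound over the whole perturbation box."""
    lower, upper = interval_bounds(layers, x, epsilon)
    others = np.delete(upper, true_class)
    return lower[true_class] > others.max()
```

Because the bounds are over-approximations, a True answer is a sound robustness certificate while a False answer may be a false alarm; exact techniques such as MILP and Reluplex [8] tighten or eliminate this gap at a higher computational cost.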