Blackbox Interpretability: Next Frontier in Adversarial ML Evasion

Abstract

Blackbox interpretability is an emerging field of study in AI and ML. ML models often incorporate millions of features, which makes it nearly impossible to interpret why a model reached a given decision. Blackbox interpretability promises to answer that question, but the bad news is that attackers can also use it to identify weak spots in your defenses. Learn how you can use the same techniques to identify when attackers have discovered your secret sauce.

Learning Objectives:
1: Understand blackbox interpretability.
2: See how attackers can leverage blackbox interpretability to evade detection (feature evasion).
3: See how defenders can leverage blackbox interpretability to identify feature evasion.