Dangerous Skills: Understanding and Mitigating Security Risks of Voice-Controlled Third-Party Functions on Virtual Personal Assistant Systems

Abstract

Virtual personal assistants (VPA) (e.g., Amazon Alexa and Google Assistant) today mostly rely on the voice channel to communicate with their users, which however is known to be vulnerable, lacking proper authentication (from the user to the VPA). A new authentication challenge, from the VPA service to the user, has emerged with the rapid growth of the VPA ecosystem, which allows a third party to publish a function (called skill) for the service and therefore can be exploited to spread malicious skills to a large audience during their interactions with smart speakers like Amazon Echo and Google Home. In this paper, we report a study that concludes such remote, large-scale attacks are indeed realistic. We discovered two new attacks: voice squatting in which the adversary exploits the way a skill is invoked (e.g., ``open capital one''), using a malicious skill with a similarly pronounced name (e.g., ``capital won'') or a paraphrased name (e.g., ``capital one please'') to hijack the voice command meant for a legitimate skill (e.g., ``capital one''), and voice masquerading in which a malicious skill impersonates the VPA service or a legitimate skill during the user's conversation with the service to steal her personal information. These attacks aim at the way VPAs work or the user's misconceptions about their functionalities, and are found to pose a realistic threat by our experiments (including user studies and real-world deployments) on Amazon Echo and Google Home. The significance of our findings has already been acknowledged by Amazon and Google, and further evidenced by the risky skills found on Alexa and Google markets by the new squatting detector we built. We further developed a technique that automatically captures an ongoing masquerading attack and demonstrated its efficacy.

Abstract

Slides

Videos