Abstract

Security teams must address the many vulnerabilities in popular document formats such as PDF, Office files, and legacy text formats. This session covers best practices for building a document analysis pipeline, including the pros and cons of true type detection, sandboxing, signatures, dynamic and static content inspection, isolation, and content disarm and reconstruction (CDR). We also cover the attacker's view and the evasion techniques that malicious payloads use when passing through a carefully designed document analysis pipeline. We suggest mandatory building blocks for designing such a pipeline: a mapping component that classifies byte arrays, a prepare component that morphs the input into a more accurate file representation, an analysis component that runs the different heuristics, an isolation component, and a CDR component. Finally, a workflow orchestrates these components and ties them together to yield low false positive and false negative rates. Real war stories will be shared, including how to define the right amount of tolerance when balancing productivity, performance, vendor integration, and success rates, as well as the future adaptability of the pipeline and practical implementation details.
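
To make the building blocks named above more concrete, here is a minimal Python sketch of how the five components and the orchestrating workflow might fit together. It is an illustration under stated assumptions, not the pipeline presented in the session: every name (`Document`, `map_type`, `prepare`, `analyze`, `isolate`, `disarm`, `run_pipeline`), the magic-byte table, and the two PDF tokens used as heuristics are hypothetical. Isolation is reduced to a quarantine decision and CDR to a toy byte overwrite.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical magic-byte table for the mapping component (illustrative only).
MAGIC = {
    b"%PDF-": "pdf",
    b"PK\x03\x04": "zip-container",   # e.g. OOXML Office files
    b"\xd0\xcf\x11\xe0": "ole2",      # e.g. legacy Office files
}

# Hypothetical static heuristic: PDF names that often indicate active content.
ACTIVE_TOKENS = (b"/JavaScript", b"/OpenAction")


@dataclass
class Document:
    data: bytes
    detected_type: str = "unknown"
    findings: List[str] = field(default_factory=list)
    verdict: str = "pending"


def map_type(doc: Document) -> Document:
    """Mapping: classify the byte array by content, not by file extension."""
    for magic, name in MAGIC.items():
        if doc.data.startswith(magic):
            doc.detected_type = name
            break
    return doc


def prepare(doc: Document) -> Document:
    """Prepare: placeholder for unpacking or normalising; returns input unchanged."""
    return doc


def analyze(doc: Document) -> Document:
    """Analysis: run static heuristics and record what they find."""
    if doc.detected_type == "pdf":
        for token in ACTIVE_TOKENS:
            if token in doc.data:
                doc.findings.append("contains " + token.decode())
    return doc


def isolate(doc: Document) -> Document:
    """Isolation (simplified): quarantine anything the mapper could not classify."""
    if doc.detected_type == "unknown":
        doc.verdict = "quarantined"
    return doc


def disarm(doc: Document) -> Document:
    """Toy CDR: overwrite flagged tokens with same-length filler bytes."""
    if doc.findings:
        for token in ACTIVE_TOKENS:
            doc.data = doc.data.replace(token, b"/" + b"X" * (len(token) - 1))
        doc.verdict = "disarmed"
    else:
        doc.verdict = "clean"
    return doc


def run_pipeline(data: bytes) -> Document:
    """Workflow: run the stages in order and stop early on quarantine."""
    doc = Document(data)
    for stage in (map_type, prepare, analyze, isolate, disarm):
        doc = stage(doc)
        if doc.verdict == "quarantined":
            break
    return doc
```

For example, `run_pipeline(b"%PDF-1.7 /OpenAction /JavaScript")` yields a document whose verdict is `disarmed`, while input with no recognised magic bytes is quarantined before the CDR stage runs. Keeping each component as a separate function with the same signature is one way to let the workflow reorder or swap stages, which is where the tolerance trade-offs mentioned above would be tuned.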