MalwareScore is a machine-learning-based antivirus solution included in Endgame's enterprise security platform. It is fast, lightweight, and frequently updated, and it has been steadily extended to cover more file types. MalwareScore's journey from Kaggle competition code built in 2015, to brittle proof of concept, to robust production model running on customer workstations has taken many twists and turns.
I'll talk about how a small team of data scientists built the original data pipeline and ran into many problems as the volume of data grew. I'll explore the tradeoff between development speed and the maintainability of a data pipeline, and stress the importance of defining interfaces between research and engineering for handing off completed projects.
As a data scientist at Endgame, Phil develops products that help security analysts find and respond to threats. This work has ranged from tuning a machine learning algorithm to best identify malware to building a data exploration platform for HTTP request data. Previously, he developed image processing algorithms for a small defense contractor. While earning a PhD in physics, Phil used a machine learning algorithm and the IceCube detector at the South Pole to search for neutrinos from other galaxies.