Case Study

Malwarebytes Case Study


Today, the scale of Malwarebytes' threat data processing and analysis is massive. The company focuses on helping its consumer and business customers protect against threats, not just remediate malware infections. It uses this data to identify anomalies based on endpoint variations and critical detections. But this data is not only used for predicting potential threats. Malwarebytes adopted Qubole in concert with Kafka (for ingesting data streams) and an AWS S3 data lake (for data storage).

The virtually unlimited compute power in the cloud at first seemed like the solution to its performance problems, but it soon proved cost-prohibitive. Compared to on-premises, adding compute was fast, as administrators could do it themselves at any point. But manually releasing it when it was no longer needed was almost impossible due to the bursty and unpredictable nature of big data workloads. For example, a ransomware outbreak somewhere in the world could cause data volumes to surge by 5x to 25x or more. This resulted in constant cost overruns and fire drills for overloaded systems administrators.

Malwarebytes' data processing paradigm changed with Qubole. First, it decoupled compute and storage. Second, "playing by the rules of the game of the cloud," as Kulkarni puts it, significantly improved the efficiency of the data platform. That meant leveraging autoscaling (scaling out and scaling up, and being elastic and ephemeral in nature), low-cost compute instances (AWS Spot), and storage (an AWS S3 data lake).

Today, Malwarebytes uses Qubole to process its data. About 60 to 70% of it consists of logs, telemetry, and other types of unstructured and semi-structured data, all of which is processed in Qubole. "Qubole has really mastered the elasticity component of the cloud," says Malwarebytes Director of Data Science and Engineering Manju Vasishta. "Qubole helped us run our ETL at night, spinning up and spinning down clusters when we needed them."
This ability to add and remove compute resources on demand in a matter of minutes, based on the workload or SLA and without human intervention, has greatly increased the speed at which Malwarebytes processes critical data, directly affecting the company's ability to detect, predict, and remediate emerging threats.

Qubole isn't just quick. It's highly efficient, too. "You really have to aggregate the heck out of our data to make any sense of it or to bring it to a level where it can guide us in our decision-making process," says Sujay. He cites one key project for which Qubole aggregates and processes between 20 and 48 terabytes of raw data per day but delivers just 2 to 3 terabytes of meaningful and actionable data. Qubole provides a single framework for processing data more quickly, whether for use in ML models for predictions, in BI applications for business reporting, or for GDPR compliance, all with just one full-time administrator, plus three senior engineers a few times per quarter. The result is more powerful insights, because they are based on better data.

Finally, Qubole is cost-effective. Malwarebytes pays Qubole only for the resources it uses.

Qubole's Elasticity Improves Processing Speed and Lowers Costs

"We really needed to work with a company that had mastered the elasticity of the cloud, and that company is Qubole."
Manju Vasishta, Director of Data Science and Engineering, Malwarebytes
