Explore a Modern Data Lake in AWS for Agile Analytics
  • Learn about the latest trends in Big Data in the Cloud
  • Find out how Qubole is helping companies like Oracle, Pinterest, and MediaMath deploy Big Data in the cloud for hundreds of users
  • Hear from speakers from 47Lining, AWS, and Qubole

About the Talks

Mick Bass, CEO of 47Lining, walks through best practices for data lake reference architecture in AWS and shares real-world customer use cases such as detecting fraud, determining propensity to buy, predicting customer churn, optimizing industrial processes, and offering content recommendations.

Ashish Dubey, Qubole’s Technical Director of Solutions Architecture, dives into Hadoop, Presto, and Spark use cases and shows how Qubole built significant application-level automation to leverage EC2 and Spot Instances while maintaining reliability, performance, and cost efficiency across different workloads (ML, ETL, concurrent interactive analysis, etc.). Ashish also highlights industry trends in how modern data teams consume, read, and analyze data from S3, Redshift, and other data sources. He discusses when to use Presto, Spark, or Hive, and how Qubole can help achieve 80–90% cost savings through autoscaling and Spot Instances.

Michael Stubblefield, Principal Engineer at SendGrid, presents a summary of SendGrid’s data flow and gives an example of using SendGrid’s Webhook feature to add vital mail-send data to your company’s data lake or data warehouse. Michael also discusses how SendGrid integrates its logging systems, Kafka, S3, Hadoop, and various AWS services at scale.

Brian Nelson, SendGrid’s Senior Data Warehouse Architect, follows with a deep dive into how SendGrid integrates its many traditional databases and Big Data technologies across Amazon EC2 and Redshift to drive the dashboards its leaders use to track company performance and make decisions.