Internet of Things in the World of ML & AI from Qubole & Google Cloud
Bill Vorhies: Good morning, good afternoon, and good evening to all of our attendees joining us today for this latest Data Science Central Webinar. This is Bill Vorhies, your host. I’m the editorial director with Data Science Central and also chief data scientist for Data-Magnum. I’d like to start off our event today by thanking Qubole for sponsoring today’s event. Qubole is a valued supporter of the Data Science Central community, and we’re honored to have them sponsoring our event today. I’d also like to take this opportunity to mention and show our appreciation for some other recent sponsors, including Dell Statistica, Alteryx, Tableau, Microsoft, Executive, and Pivotal, to name just a few.
Now past webinars are available on demand at DataScienceCentral.com. If you haven’t had an opportunity to view them, I encourage you to take a look. They provide some very useful insight into a wide variety of topics of interest to our data science community.
Today’s webinar is entitled “An Internet of Things Case Study: Using Big Data to Create Operational Efficiencies.” Before we begin, I’d like to briefly review the format for today’s webinar. Today’s webinar will be an hour long. We’ll have two presenters that I’ll introduce in just a minute. There’ll be a 10- to 15-minute question-and-answer period following the presentation. This event is being recorded and will be available on DataScienceCentral.com later this afternoon following today’s live event. I’d also like to encourage our attendees to submit questions throughout the presentation; we’ll be reviewing them and presenting them on your behalf during the Q&A portion of today’s event.
I’m very pleased to introduce today’s speakers, Dr. Mohan Krishnamurthy with Qubole and Paul Asoyan with Google Cloud Platform. Mohan has a background in mechanical engineering and controls and has worked with auto manufacturers and trucking fleets on improving fuel efficiency and reducing emissions. He was part of the research group at West Virginia University that conducted the emissions study exposing the Volkswagen scandal. Mohan has a PhD in mechanical engineering from West Virginia University and an MBA from UC Berkeley’s Haas School of Business.
Paul Asoyan is a strategic partnership manager with Google Cloud Platform, where he helps shape the future of how Google’s cloud technology is used. His role is to develop and cultivate services and system integrator relationships with strategic partners in the United States and Canada. Paul’s particular focus is the Internet of Things and big data. Prior to joining Google Cloud Platform, Paul managed strategic partnerships on Google Maps. He’s been with Google for eight years, where he previously helped build high-performing sales and channel businesses for Google Apps. Paul is a graduate of the University of California, Berkeley, with a degree in political science.
Thanks for being with us today Mohan and Paul. We’re looking forward to your presentation.
The Internet of Things is touted for its potential to glean new insights and efficiencies by tapping into massive streams of data from sensors and machines. Driven by industry standardization, regulation, and the need for competitive advantage, the trucking industry was an early adopter of onboard telematics and engine sensor data. Though this has generated volumes of data, most trucking companies have not fully capitalized on the business value in telematics data. This webinar will look at a case study that examines how IoT-based data can be analyzed and turned into insights that reduce cost and create operational efficiencies across the business using Google Cloud Platform and Qubole.
In today’s webinar, you’ll hear from Mohan Krishnamurthy about data derived from the automobile and trucking industries and how that data was used to gain insights into fuel efficiency.
Additionally, Paul Asoyan will share why Google Cloud Platform is the best place to build an IoT initiative taking advantage of Google’s heritage of web-scale processing, analytics, and machine intelligence. Mohan, with that, I’m going to turn it over to you. You can begin as soon as you’re ready to go.
Mohan Krishnamurthy: Thanks, Bill. Thank you, everyone, for taking the time to be with us on this webinar. Let me give you a quick introduction about myself. This is what I look like, as you see in the picture. I am Mohan Krishnamurthy, a senior product manager at Qubole. As Bill briefly mentioned, I have spent a big part of my career on data analytics and data science, focused on the auto industry and the trucking industry in particular: helping fleets become more fuel-efficient and helping engine manufacturers design engines and trucks that are more fuel-efficient and cleaner in terms of emissions.
Data has played a big part in coming up with those better engines. Today at Qubole, I help build a self-service data analytics platform that helps not only data scientists but also data engineers and analysts get easy access to data, so that they can perform the analytics that turn data into business insights.
Today we are going to talk about IoT. IoT spans a wide spectrum; we are going to focus on IoT analytics in general and on how analytics can turn IoT data into business value. The business value of IoT covers a very wide range, and IoT is touted as a revolution in itself because it brings about so many changes. It has created new revenue streams, and new products are coming to market. It has the potential to reduce your operational costs and help you make better use of your assets.
To me, this is also personal. Last night, I was on my way back from work and I was in an accident. Two minutes later, I got a call from my wife asking if I was okay. This was possible because my car had an IoT device plugged into its OBD port. The device called my wife, who was on the emergency call list, so she could make sure I was okay. This is the kind of change that IoT can bring in a B2C environment.
Is this change so easy to produce? There are definitely going to be challenges, and those challenges are specific to the large volume of data that IoT is going to generate. Let’s take a look at some of them. If you are in the IoT business providing a service, you are going to face this large volume of data. You have to choose the right technology, and there are numerous technologies to choose from. You have to make some upfront investment, and that cost becomes a barrier to leveraging this data. It’s also going to change your business processes. Those are some of the barriers that stand in the way of IoT delivering its business value.
Once those barriers are overcome, how do you leverage the data? You have to look at the right infrastructure for the way IoT data comes in. It is going to grow exponentially. We are not talking about megabytes of data; we are now talking about petabytes of data, so choosing the right infrastructure becomes a very important decision. You need an architecture that is scalable and can be optimized for cost, but not just that: it should also be easy to implement and easy to administer.
Once you have made the choice, you also don’t want to be locked into a particular vendor, and that is where the cloud becomes a great choice and a perfect pairing for IoT and its data. Now that you have the right infrastructure in place, what do you do next? How do you turn this data into value, into insights? You ask the right questions. Questions come in various forms.
As a data scientist, I have had my share of trouble dealing with infrastructure, but once the infrastructure part was solved, I had to look at what kinds of questions I could ask. I had to write machine learning algorithms to understand, and be able to predict, fuel economy for trucks. I was looking into the parameters that affect fuel efficiency, and that is a data mining question. Questions come in various forms, and your infrastructure should be capable of handling the variety of questions that go into determining the value in the data. You ask the right questions and you get the answers, but does it end there? I believe you have to operationalize the data, and operationalizing the data is possible only if you develop the right metrics.
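To make the fuel-economy question concrete, here is a minimal sketch, not the model used in the case study, that fits an ordinary least-squares line relating one hypothetical predictor (average trip speed) to fuel economy. The sample values are made up for illustration; a real model would draw on many engine parameters from the telematics feed.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor has zero variance")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Hypothetical trip summaries: average speed (mph) vs. fuel economy (mpg).
avg_speed = [55, 60, 65, 70, 75]
mpg = [7.4, 7.1, 6.8, 6.5, 6.2]

slope, intercept = fit_line(avg_speed, mpg)
predicted = slope * 62 + intercept  # predicted mpg for a trip averaging 62 mph
print(f"slope={slope:.3f} mpg per mph, prediction at 62 mph={predicted:.2f} mpg")
```

With these made-up points the fitted slope is -0.06 mpg per mph; the point is the workflow (fit on history, predict for a new trip), not the numbers.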
In the past, I’ve worked with fleets to define what the fuel efficiency of a truck should be, and that becomes a metric to track. In terms of driver behavior, is your driver driving at a particular speed? Speed may be a metric that needs to be watched, and acceleration could be another. Hard braking was another parameter that has helped fleets in the past, based on the data I’ve looked at. Once you have these metrics, you have to build the right dashboard so that the user understands what changes can be made. Don’t just take my word for it. Let’s take a look at Carrus Mobile, a company that is revolutionizing the trucking industry.
The trucking industry has been around for a very long time and is now ripe for disruption. Carrus Mobile is bringing that disruption to the market, but how? They provide a plug-and-play hardware device that serves as the telematics platform for trucking fleets. You simply plug this device into a truck’s OBD port, and it collects the engine data and sends it on for further analysis.
Why is this important? The trucking industry has been collecting telematics data for a very long time, but the only piece of information being used was geolocation data. Today, with connectivity to the engine and the standard protocols that are in place, more engine information is being collected. Now comes the challenge of turning that information into insight. It is not just geolocation; it is understanding the health of the truck. How is driver behavior affecting fuel efficiency? Can I improve certain operations in a way that helps my fleet? If you’re a manufacturer, can I redesign my engine to be more fuel-efficient?
Taking this data and tailoring solutions to the particular user and the specific needs of the industry becomes very critical. One distinction I see is that if you are a large fleet, your data-mining needs are very different from those of a smaller fleet. As a solution provider, you need to understand the difference and be able to tailor very focused solutions, so that the end user can derive the benefits of your insights.
Carrus Mobile started as a telematics provider, but they decided to revamp their business model. They started asking what additional value Carrus Mobile could provide. They looked at how to productize predictive analytics offerings that save fuel costs for their customers and that help fleet managers better predict maintenance frequency for their trucking fleets; those are two examples of what Carrus Mobile provides.
They also started with an operational question: how can we minimize the time and effort it takes to deliver these insights? They had chosen to use open-source software, but as many of you know, maintaining open-source software is not an easy task, and it is not user-friendly. They wanted to improve the productivity of their engineering and data science teams by eliminating the time spent on DevOps. They turned to Qubole as a partner, because Qubole provides a scalable architecture that leverages the elasticity of the cloud and makes the platform easy to implement and administer through a well-developed user interface.
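The speed, acceleration, and hard-braking metrics Mohan mentions can be sketched as below. This is an illustration under stated assumptions: speed samples arrive at a fixed interval, and the 65 mph speed threshold and 7 mph/s hard-braking threshold are hypothetical values, not figures from the talk or from Carrus Mobile.

```python
from dataclasses import dataclass

@dataclass
class DriverMetrics:
    speeding_fraction: float  # share of samples above the speed threshold
    hard_brake_events: int    # distinct episodes of steep deceleration
    max_accel_mph_s: float    # largest speed increase per second (0 if none)

def driver_metrics(speeds_mph, interval_s=1.0,
                   speed_limit_mph=65.0, hard_brake_mph_s=7.0):
    """Compute simple driver-behavior metrics from evenly spaced speed samples.

    The thresholds are hypothetical defaults for illustration only.
    """
    if len(speeds_mph) < 2:
        raise ValueError("need at least two speed samples")
    speeding = sum(1 for s in speeds_mph if s > speed_limit_mph)
    # Rate of change between consecutive samples, in mph per second.
    rates = [(b - a) / interval_s for a, b in zip(speeds_mph, speeds_mph[1:])]
    hard_brakes, in_episode = 0, False
    for rate in rates:
        if rate <= -hard_brake_mph_s:
            if not in_episode:  # count each continuous hard-braking episode once
                hard_brakes += 1
            in_episode = True
        else:
            in_episode = False
    return DriverMetrics(
        speeding_fraction=speeding / len(speeds_mph),
        hard_brake_events=hard_brakes,
        max_accel_mph_s=max(0.0, max(rates)),
    )

# Hypothetical 1 Hz speed trace from one short stretch of a trip.
trip = [60, 64, 68, 70, 69, 60, 50, 48, 48, 55]
metrics = driver_metrics(trip)
print(metrics)
```

For this trace, 3 of 10 samples exceed the threshold, the drop from 69 to 60 to 50 mph counts as one hard-braking episode, and the largest acceleration is 7 mph/s.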
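Because the device reads engine data from the diagnostic port, a sketch of decoding a few standard OBD-II mode 01 responses may help. The formulas below are the standard SAE J1979 ones for engine RPM, vehicle speed, and coolant temperature. Note that heavy-duty trucks commonly expose engine data over SAE J1939 instead, and the talk does not describe the protocol or message format Carrus Mobile’s device actually uses; the response frames here are made-up examples.

```python
def decode_mode01(response_hex):
    """Decode a few standard OBD-II mode 01 (current data) responses.

    Uses the SAE J1979 formulas; only three PIDs are handled in this sketch.
    """
    data = bytes.fromhex(response_hex.replace(" ", ""))
    if len(data) < 3 or data[0] != 0x41:  # 0x41 = positive response to mode 0x01
        raise ValueError("not a mode 01 response")
    pid, payload = data[1], data[2:]
    if pid == 0x0C:  # engine RPM = (256*A + B) / 4
        if len(payload) < 2:
            raise ValueError("RPM response needs two data bytes")
        return "engine_rpm", (256 * payload[0] + payload[1]) / 4
    if pid == 0x0D:  # vehicle speed = A km/h
        return "vehicle_speed_kmh", payload[0]
    if pid == 0x05:  # engine coolant temperature = A - 40 degrees C
        return "coolant_temp_c", payload[0] - 40
    raise ValueError(f"PID 0x{pid:02X} is not handled in this sketch")

for frame in ["41 0C 1A F8", "41 0D 3C", "41 05 7B"]:
    print(frame, "->", decode_mode01(frame))
```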
Not only that: because Qubole leverages cloud storage and compute, they can deploy the product in days instead of months. And because Qubole is a managed platform, Carrus no longer has to worry about upgrades and changes in technology. Upgrades are applied automatically by Qubole, so Carrus, as an end user, does not have to handle them itself.
How does Qubole do this? Let’s take a look under the hood at how Qubole leverages the elasticity of the cloud and separates storage and compute. The first part of this slide is the event producer, which in this example is the truck generating data. Data from the truck’s ECU is sent via a mobile device to a server. From there, the data is transported via an open-source technology, in this case Kafka, to a storage location on the public cloud. On Google Cloud Platform, that is Google Cloud Storage.
Now, as you can see, the elasticity of the storage is handled by the cloud provider. As your data grows, you don’t have to worry about adding new resources to store it; the public cloud takes care of that. For the analytics part, this is where Qubole’s separation of storage and compute becomes very important. As a data scientist, if I’m going to run a machine learning algorithm, I need to spin up a Spark cluster. I can come into Qubole’s UI and, with the click of a button, start a Spark cluster, read the data from my public cloud storage, and run my query, and Qubole will automatically terminate the cluster when my workload or query is done.
Once I have the results, I can share them through a user-friendly notebook where I can generate graphs to share with the rest of the group. In Carrus Mobile’s case, they can share these as reports with their end customers. That is the power of the separation of storage and compute that Qubole provides. Using this infrastructure built on Qubole, Carrus now provides an essential fleet analytics solution that helps fleet managers identify when they have to perform maintenance operations.
As a driver, it gives me metrics such as: Am I speeding? What in my driving behavior is affecting fuel efficiency? What steps can I take to be a safe and fuel-efficient driver?
The benefits go beyond just these. If you are in senior management looking at fleet operations, you’re going to look for ways to reduce costs further. One item that stands out is insurance, which is about 30% of fleet costs. By tracking your drivers and monitoring their driving behavior, you now have a better understanding of the operations that lead to a safer fleet, which can, in turn, reduce your insurance costs. Fleets now work with insurance companies and, with this wealth of information, are able to reduce their insurance costs and hence save money on fleet operations.
This is just one use case in one specific industry. The value of IoT is boundless, and the value of IoT data is even greater, but you have to choose the right infrastructure and ask the right questions. Qubole provides that infrastructure and the ability to leverage the data to derive the insights that turn it into business value.
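The pipeline on the slide lands ECU events from Kafka in cloud storage. A common way to lay out such data for later Spark or SQL queries is Hive-style date and hour partitions; the sketch below only computes that layout in memory. The bucket name, the `ecu_events` prefix, the event fields, and the choice of newline-delimited JSON are all hypothetical, and the sketch does not use a real Kafka or Cloud Storage client.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone

BUCKET = "gs://example-telematics-bucket"  # hypothetical bucket name

def partition_path(event):
    """Hive-style date/hour partition for an event's UTC epoch timestamp."""
    ts = datetime.fromtimestamp(event["ts"], tz=timezone.utc)
    return f"{BUCKET}/ecu_events/dt={ts:%Y-%m-%d}/hour={ts:%H}/"

def batch_by_partition(events):
    """Group events into newline-delimited JSON payloads, one per partition."""
    batches = defaultdict(list)
    for event in events:
        batches[partition_path(event)].append(json.dumps(event, sort_keys=True))
    return {path: "\n".join(lines) + "\n" for path, lines in batches.items()}

# Hypothetical events as they might arrive from a Kafka topic.
events = [
    {"truck_id": "T-17", "ts": 1475280000, "rpm": 1450, "speed_kmh": 96},
    {"truck_id": "T-17", "ts": 1475280060, "rpm": 1480, "speed_kmh": 98},
    {"truck_id": "T-42", "ts": 1475283600, "rpm": 900, "speed_kmh": 0},
]
batches = batch_by_partition(events)
for path, payload in batches.items():
    print(path, payload.count("\n"), "events")
```

Partitioning by time lets a later query read only the hours it needs instead of scanning the whole bucket.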
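The spin-up, run, terminate pattern described here can be expressed as a context manager that guarantees the compute is released even if the job fails, while the data stays in storage. `FakeCluster` and `ephemeral_cluster` are hypothetical stand-ins written for this illustration; they are not Qubole or Google Cloud APIs.

```python
from contextlib import contextmanager

class FakeCluster:
    """Hypothetical stand-in for a compute cluster (not a Qubole or GCP API)."""

    def __init__(self, name, nodes):
        self.name, self.nodes, self.running = name, nodes, False
        self.log = []

    def start(self):
        self.running = True
        self.log.append(f"start {self.name} ({self.nodes} nodes)")

    def run(self, job, data):
        if not self.running:
            raise RuntimeError("cluster is not running")
        self.log.append(f"run {job.__name__}")
        return job(data)

    def terminate(self):
        self.running = False
        self.log.append(f"terminate {self.name}")

@contextmanager
def ephemeral_cluster(name, nodes=4):
    """Start a cluster, hand it to the caller, and always terminate it."""
    cluster = FakeCluster(name, nodes)
    cluster.start()
    try:
        yield cluster
    finally:
        cluster.terminate()  # compute is released even if the job raises

# Storage outlives the cluster: a dict stands in for objects in cloud storage.
storage = {"ecu_events": [{"rpm": 1400}, {"rpm": 1000}, {"rpm": 1200}]}

def mean_rpm(events):
    return sum(e["rpm"] for e in events) / len(events)

with ephemeral_cluster("spark-adhoc") as cluster:
    result = cluster.run(mean_rpm, storage["ecu_events"])

print(result, cluster.running, cluster.log)
```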
Now Paul will talk to you in detail about Google Cloud and how it helps companies in the IoT space. Paul?
Paul Asoyan: Wonderful. Thank you, Mohan, thank you very much. My name is Paul, and I work for Google on the Google Cloud Platform team. What I want to talk to you about is IoT: the Internet of Toasters, as we jokingly say, but also, of course, the Internet of Things.
The reason we’re a little bit humorous about it is that IoT is everywhere, and we feel that it may have reached the peak of the hype curve. Everybody’s talking about IoT. IoT is one of the hottest things out there, and everybody wants to know more about it, but we argue that there is no consensus on what IoT actually is. Our working definition is this: if a device is connected to the internet, but we don’t ordinarily think of that kind of device as connected (watches, washing machines, or even toasters are not connected to the internet by default), then it is an IoT device.
We don’t really say Internet of Computers, because computers are connected to the internet by default. That’s our working definition of IoT, and it’s understandable that it’s a confusing concept. It’s also understandable that it’s an exciting concept, because the numbers support it. We expect that there are going to be at least 20 billion devices connected by the year 2020. The economic impact of that is in the trillions. Once you’re dealing with trillions, it’s very difficult to be precise, but it’s very significant. 54% of top-performing companies will invest more in sensors this year; the sources are cited on the slide. Any time we’re talking about billions, humorously, we should have Dr. Evil there.
It is a period of transition. As you remember, flip phones were not connected to the internet, then they became connected, and now, in our opinion, phones are out of the IoT period of transition, because all phones are connected to the internet.
Once you become connected, you remove yourself from the realm of IoT, but wearables, watches, and cars are only sort of connected, so we are in a very interesting period of transition. We argue that it is conceivable that we’re in the midst of the fourth Industrial Revolution, and McKinsey agrees with us. They also say that there will no longer be a difference between information and materials, because products will be inextricably linked to their information, and there is a lot of information there.
Why build IoT? Information is everywhere, but it’s not data yet. It’s shocking how much information is out there that we’re not leveraging, or that we’re unable to leverage for various reasons. Everything from the information within our houses to information in cities, retail, transportation, manufacturing, and healthcare has a tremendous amount of potential if we know how to wrangle it and perhaps make predictions about the future based on the past.
How do you collect and process this analog information and transform it into useful business intelligence? What’s interesting is that IoT comes to us through a different lens. We’re used to information being digitally native, originating in digital format, whereas IoT information is analog and needs to be translated into the digital realm, and the Internet of Things is the way to do this.
IoT is a new graph of how we connect with data as individuals in our day-to-day lives. The internet created the information graph, which changed how we produce, access, share, and generate knowledge. Social media created the social graph, which changed how we establish and foster relationships with others. The Internet of Things creates a physical graph that changes how we interact with objects and our environment.
With all this hype and these tremendous opportunities, what has been holding people back? Well, it’s obviously a very complex and challenging environment. There are a lot of moving parts: device hardware and its manufacturing, the device operating system, networking, mobile app development, data scale, security, and cloud app development. All of this needs to work in concert, and each one of those circles by itself is quite the challenge.
The World Economic Forum’s 2014 Industrial Internet survey asked what was holding people back. Interoperability, of course, was number one, followed very closely by security, the business case, legacy hardware, and so forth. What we’re also seeing on our side is exponential interest in developing IoT solutions. We feel that IoT is going to be an absolutely critical part of most businesses in the very near future. To recap what’s holding companies back: technical complexity, security and privacy, data challenges, cost, and lack of standards. All of these are absolutely valid concerns.
Why is now a great time for IoT? Well, tiny, cheap devices are more available: complete Wi-Fi plus logic for less than $3. There is cloud-backed data and processing. Large machine learning systems and petabyte-scale systems are now available on demand. We’re talking about storage costs of a cent per gigabyte. Foundationally, we feel that we’re at an interesting time when all of these things are possible. IoT at Google basically revolves around Google Cloud Platform and includes Nest, Brillo, Weave, Eddystone, OnHub, Glass, Android Auto, and Android itself. We’re going to chat about how all of these things connect with each other.
Many of you are familiar with Nest, whether it’s the thermostat, the camera, or the smoke alarm; it’s basically a solution for the connected and thoughtful home. It’s interesting that these very disparate devices are going to be connected with each other. We also have Brillo and Weave, which we built to help device makers building for IoT create open ecosystems and great opportunities for services.
What is Brillo? Brillo is an OS tailored for IoT for building secure connected devices, and it comes with services, analytics, and updates. If you’re interested in learning more, there is a URL at the bottom of the slide that I encourage you to check out. Weave is a communications platform for IoT devices that standardizes device commands and state, facilitates user interaction from mobile devices and the web, and integrates with Google services; there is also a URL at the bottom for that one.
Eddystone is essentially a Bluetooth beacon format for driving IoT solutions. There is a lot of interest in that, for everything from mobile payments to tracking, so this is something we’re also very interested in. There’s also Google ATAP, a division within Google that’s working on other IoT projects as well. There is also Google X, which is working on related things. One of the things we can talk about is a contact lens that actually sends signals to your phone about your state of health. Google Fit is an application I encourage you to check out as well; basically, it measures how fast you’ve gone when you run and walk.
One of our newest things is OnHub, a router for the new way to Wi-Fi, and I encourage you to check that out too. Actually, I apologize: the newest device that we have is Google Home, a device that you can control with your voice.
Google, in our opinion, is uniquely positioned to take advantage of the IoT opportunity. We have software engineering, security, and site reliability engineering, and we have people who work in privacy and in UX. Our focus is on the user. We have a bias toward an open culture of innovation, and we have world-class infrastructure. I think all of this is a wonderful foundation for us to be successful. We also collaborate with a lot of universities, and the resulting open ecosystem should facilitate experimentation with applications and user experience, ensure privacy and security, and develop systems that guarantee interoperability.
How IoT relates to Google Cloud Platform is something that I’m particularly excited to talk about. We feel that the data center is not a collection of computers; the data center is a computer, and we’re very well known for our technical infrastructure. One of our edges is that we run an incredible distributed technical infrastructure. We have 70 edge locations in 33 countries, the broadest-reaching network of any cloud provider. Google Cloud Platform gives you pretty comprehensive coverage of what is important: compute, networking, big data, operations, storage, mobile, and developer tools.
Google’s research in data technologies is by no means new. Many of you know that we published the MapReduce paper, which later inspired Hadoop. We also have a lot of interesting technologies that should be familiar to most of you, including Bigtable, for example, as well as perhaps Pub/Sub. For products on Google Cloud Platform, we have taken internal tools that we’ve been using for a long time and made them into enterprise-grade solutions. Bigtable, BigQuery, Dataflow, and Cloud Storage are some of the names that should ring a bell.
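Since Eddystone comes up as the beacon format, here is a sketch of parsing an Eddystone-UID frame, one of the Eddystone frame types, from the service data a BLE scanner reports under the 16-bit service UUID 0xFEAA. The layout follows the published Eddystone-UID format (frame type 0x00, a signed TX-power byte, a 10-byte namespace, a 6-byte instance, then two reserved bytes); the beacon values below are made up, and the BLE scanning itself is out of scope.

```python
from typing import NamedTuple

class EddystoneUID(NamedTuple):
    tx_power_dbm: int  # calibrated TX power at 0 m, signed dBm
    namespace: str     # 10-byte namespace ID, as lowercase hex
    instance: str      # 6-byte instance ID, as lowercase hex

def parse_eddystone_uid(service_data: bytes) -> EddystoneUID:
    """Parse the service data of an Eddystone-UID frame (service UUID 0xFEAA)."""
    if len(service_data) < 18 or service_data[0] != 0x00:  # 0x00 = UID frame
        raise ValueError("not an Eddystone-UID frame")
    tx_power = int.from_bytes(service_data[1:2], "big", signed=True)
    namespace = service_data[2:12].hex()
    instance = service_data[12:18].hex()
    return EddystoneUID(tx_power, namespace, instance)

# Made-up frame: type, TX power (-18 dBm), namespace, instance, 2 reserved bytes.
frame = (bytes([0x00, 0xEE])
         + bytes.fromhex("edd1ebeac04e5defa017")
         + bytes.fromhex("0000000000a1")
         + b"\x00\x00")
beacon = parse_eddystone_uid(frame)
print(beacon)
```

A tracking application would typically map the namespace to an organization and the instance to a specific beacon, for example one attached to a shipment.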
I want to focus for just a second on BigQuery, because it’s center stage not just for collecting a tremendous amount of data but for analyzing that data, which is one of the more challenging things out there. What is it? It’s a fully managed, no-ops data warehouse. It is petabyte scale and fast, it offers the convenience of SQL, and it’s essentially an externalization of Google’s internal Dremel. It’s something we have worked on for many, many years, and it is now available to our customers. Why do we feel that it’s a compelling solution in the IoT realm? It has independently scaling compute and storage, it scales in seconds, it offers easy multi-tenancy and unlimited resources, and it’s a consumption-based model, so you’re not prepaying.
Unlimited compute is really critical here because of the vast amount of data. As Mohan alluded to multiple times, there is a tremendous amount of data in the IoT realm, and workloads cannot be competing for finite resources if you want to be successful. There is no starvation of resources due to a bad job, and no resizing of clusters for more resources.
Unlimited storage is also incredibly important. The first question we get asked is: fine, we’re going to be able to collect all this information, but how do we store it securely? How do we store petabytes upon petabytes of data? You don’t need to throw anything away or archive old data, because you can make amazing predictions based on the corpus of data. All things being equal, the bigger that data is, the better off you are.
Then everything you store is immediately accessible by SQL. It’s basically a flat 2 cents per gigabyte per month, and that price may be a tad old, because we change our pricing all the time and it’s getting cheaper with time. It’s a consumption-based model, obviously, so you’re not prepaying for anything, and it scales with your use.
Let’s look at some stats for a complex query. These may also be slightly dated, because our computing resources keep getting faster. Execution time: 11 seconds. Data scanned: 4 terabytes. Rows read: 100 billion. Data shuffled: 278 gigabytes. It’s pretty incredible, what is happening now and what can happen in the next two or three years.
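Paul’s point that you never need to throw data away can be checked with simple arithmetic. The sketch below uses the flat $0.02 per GB per month quoted in the talk (which he notes may be out of date) and a hypothetical fleet that adds data at a constant rate and deletes nothing; the fleet size and per-truck volume are assumptions for illustration.

```python
PRICE_PER_GB_MONTH = 0.02  # flat rate quoted in the talk; actual pricing changes

def storage_costs(gb_added_per_month, months, price=PRICE_PER_GB_MONTH):
    """Return (final monthly bill, cumulative spend) when nothing is deleted."""
    retained_gb = 0.0
    total = 0.0
    for _ in range(months):
        retained_gb += gb_added_per_month  # data only accumulates
        total += retained_gb * price       # each month bills everything retained
    return retained_gb * price, total

# Hypothetical fleet: 1,000 trucks x 0.5 GB of engine data per truck per month.
final_bill, cumulative = storage_costs(1_000 * 0.5, months=36)
print(f"month-36 bill: ${final_bill:,.2f}; 3-year total: ${cumulative:,.2f}")
```

Under these assumptions the month-36 bill is $360 and the three-year total is $6,660: retained data grows linearly, so cumulative spend grows roughly quadratically, but from a small base.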
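As a rough reading of those figures, assuming decimal units (1 TB = 1000 GB) and that the 100 billion figure refers to rows read, the implied throughput over the 11-second run is:

```latex
\frac{4\ \text{TB}}{11\ \text{s}} = \frac{4000\ \text{GB}}{11\ \text{s}} \approx 364\ \text{GB/s},
\qquad
\frac{100 \times 10^{9}\ \text{rows}}{11\ \text{s}} \approx 9.1 \times 10^{9}\ \text{rows/s},
\qquad
\frac{278\ \text{GB}}{11\ \text{s}} \approx 25\ \text{GB/s shuffled}
```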
Security is one of the most important aspects here, and we feel that ours is certainly a standout. As far as people are concerned, we have more than 500 top security experts on staff. We care a lot about the physical security piece, and we offer the full stack. We are among the top server manufacturers. We build everything: hardware and a custom software stack. Full-stack ownership greatly reduces the attack surface, of course, and live migration keeps you running while we patch.
Here are some quick case studies of Google enabling IoT. Nest is something you’re already familiar with, and it sits on Google Cloud Platform, with a Shield architecture and Google Container Engine. There’s Acuima, and there is monitoring of goods in transit using Pub/Sub with Eddystone beacons.
If I had to briefly say why Google for IoT: it’s the speed of insight, Google’s global network, a broad footprint across the IoT reference stack, innovation, security, pricing, and, of course, 15 years of know-how building the world’s largest supercomputers and processing enormous amounts of data. That’s all I have. Thank you.
Bill: Okay. Well, Mohan, Paul, thank you so much for that excellent presentation. We’ll get started with today’s Q&A session. I want to thank the audience for their participation. A great many questions came in during the presentation, and we’ll do our best to get through them all in the time remaining.
During the Q&A session, I’ll leave up this screen with contact information for Mohan and Paul if you’d like to contact them following today’s webinar. Let’s get started. Mohan, Paul, if I am a company that is contemplating getting into IoT, I have a choice to make about whether this is going to be in the cloud or on-premises. Can you tell us what the decision criteria would be for making that call, and what the advantages are of running this in either environment?
Paul: This is Paul. I can point out a few things, and I would love for Mohan to chime in as well. We firmly believe that the computing power of Google, or frankly of another cloud provider, is something that cannot be matched on-premises. Secondly, the pace of innovation that Google and other cloud providers are known for will enable you to leverage the latest technology. Then there’s the security piece. We feel that security is a moving target, and in the cloud, security improvements can happen much faster, which will make your solution more secure. And perhaps the last thing is scale.
If you want to scale your solution on-premises, you’re going to have to plan for data center expansion, servers, and so forth, whereas in the cloud, it scales as you go.
Mohan: One thing I would like to add to Paul’s comment is the ad hoc nature of analysis. As a data scientist, I think of a query and I want to be able to run it, and when the query is done, I want my compute instances to be turned off. If I am on-prem, I have to plan my workload and make the infrastructure decision up front. With the cloud, you have on-demand instances that you can create to run your query and bring down when your query is done. I think that’s another powerful feature the cloud provides.
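Mohan’s point about on-demand compute can be illustrated with a back-of-the-envelope comparison between a cluster that runs all month and one that exists only while ad hoc queries run. Every number here (node price, cluster size, query count, query duration, spin-up overhead) is hypothetical; real cloud pricing and Qubole’s billing are not described in the talk.

```python
HOURS_PER_MONTH = 730   # average hours in a month
NODE_PRICE_USD = 0.50   # hypothetical price per node-hour
NODES = 10              # hypothetical cluster size

def cluster_cost(hours, nodes=NODES, price=NODE_PRICE_USD):
    """Cost of running `nodes` nodes for `hours` hours."""
    return hours * nodes * price

always_on = cluster_cost(HOURS_PER_MONTH)

# Hypothetical ad hoc workload: 40 queries a month, 1.5 h each, plus 0.1 h spin-up.
queries, query_hours, spin_up_hours = 40, 1.5, 0.1
on_demand = cluster_cost(queries * (query_hours + spin_up_hours))

print(f"always-on ${always_on:,.0f} vs on-demand ${on_demand:,.0f} "
      f"({1 - on_demand / always_on:.0%} lower)")
```

The gap narrows as utilization rises: a cluster that is busy most of the month gains little from being torn down between jobs.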
Bill: Terrific. Thank you for that answer. Here is a point of confusion that we’ve probably brought on ourselves in this era of talking about big data and streaming analytics. If I can already do real-time analytics, why do I need big data analytics?
Paul: Bill, could you repeat the question, please?
Bill: Sure. If we are already doing real-time analytics, why do I need big data analytics? Why are big data analytics needed if I can do real-time analytics already?
Mohan: The way I understand is real-time analytics provides definitely a lot of value in our team. I gave you an example of a real-time alert. I was with a solar company where monitoring the system was very important. For example, if the solar panel or your power plant is not producing the right amount of output, you generate that alert, you’re continuously monitoring, but there’s also value in, what I refer to as the cold data. You’re collecting this large amount of data that is stored in essential place. It gives you a historical analysis that is equally important.
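The solar example suggests a simple streaming rule. The sketch below consumes readings one at a time and raises one alert per episode when actual output stays below a fraction of expected output for several consecutive readings. The 80% ratio, the three-reading window, and the readings are hypothetical, and expected output is assumed to come from elsewhere (for example, a model of panel capacity and irradiance).

```python
from typing import Iterable, Iterator, Tuple

Reading = Tuple[str, float, float]  # (timestamp, expected_kw, actual_kw)

def underperformance_alerts(readings: Iterable[Reading],
                            min_ratio: float = 0.8,
                            consecutive: int = 3) -> Iterator[str]:
    """Yield a timestamp when output stays below min_ratio * expected for
    `consecutive` readings in a row; at most one alert per low episode."""
    streak = 0
    for ts, expected_kw, actual_kw in readings:
        low = expected_kw > 0 and actual_kw < min_ratio * expected_kw
        streak = streak + 1 if low else 0
        if streak == consecutive:  # fires once, when the streak first reaches N
            yield ts

# Hypothetical 5-minute readings from one inverter.
readings = [
    ("10:00", 100, 95), ("10:05", 100, 70), ("10:10", 100, 72),
    ("10:15", 100, 60), ("10:20", 100, 50), ("10:25", 100, 90),
    ("10:30", 100, 75), ("10:35", 100, 70), ("10:40", 100, 65),
]
alerts = list(underperformance_alerts(readings))
print(alerts)
```

Because it is a generator, the same function works on an unbounded stream; requiring several consecutive low readings keeps a single passing cloud from paging anyone.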
In the case of a solar power plant, it was very critical to know the weather conditions, the weather patterns in a particular location before deciding on where the new power plant should go to. Yes, real-time offers benefit, but there’s other information that can be mined out of the data that is collected over time. Hence, having this infrastructure in place becomes very important.
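By contrast, the site-selection question is a batch aggregation over historical (cold) data. A minimal sketch, assuming hypothetical records of daily solar irradiance per candidate site, ranks sites by their long-run average; a real analysis would use far more history and weigh other factors.

```python
from collections import defaultdict
from statistics import mean

def rank_sites(history):
    """Rank candidate sites by mean historical daily irradiance (kWh/m^2)."""
    by_site = defaultdict(list)
    for site, _date, irradiance in history:
        by_site[site].append(irradiance)
    averages = ((site, mean(values)) for site, values in by_site.items())
    return sorted(averages, key=lambda pair: pair[1], reverse=True)

# Hypothetical historical records: (site, date, daily irradiance in kWh/m^2).
history = [
    ("site_a", "2015-06-01", 6.8), ("site_a", "2015-12-01", 3.1),
    ("site_b", "2015-06-01", 7.4), ("site_b", "2015-12-01", 4.2),
    ("site_c", "2015-06-01", 5.9), ("site_c", "2015-12-01", 2.5),
]
ranking = rank_sites(history)
print(ranking)
```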
Paul: I agree with Mohan. This is Paul, by the way. We feel that not only big data analytics is certainly the future of IoT, but especially, it’s powerful. Just to drive Mohan’s point is if you harness the power of machine learning on top of it, that will be able to help you predict future events with tremendous accuracy, and, of course, for that, you need big data.
Bill: Thank you. Thank you for those answers. I’d like to get at the broad question of what are the major benefits of running big data analytics on IoT events, but let me give you some specifics as well because we’ve had questions come in about whether or not or how to get those benefits, for example, for a recommenders or for studying usage patterns for upselling. Perhaps you could focus on those as well as just talking about the broader benefits. Anybody wants to take that one?
Paul: This is Paul. I just want to make sure that I understand the question fully. How do you harness the power of IoT to generate more business? Is that the question?
Bill: Generally, what are the benefits of running big data analytics on IoT events, and then the audience has asked for specific examples relating to upselling and recommenders.
Mohan: I can cite an example. I may be biased in this, but I'm going to use automotive as an example. There are companies such as Verizon Hum, Automatic, and another company called MileIQ. These companies represent new business models in the automotive space. They provide you with dongles that you connect to your car, and they collect data as you drive: information about your driving behavior and the routes that you normally use. Using this data, they are able to create data products that they can sell to multiple customers.
For example, Automatic now works with insurance companies, providing information about their customers and how drivers are driving in various geolocations. This helps insurance companies build risk patterns for certain locations, information that the insurance companies did not have earlier. Using this data, they get a better understanding of their customers' driving behavior. These vendors also now have information about the cars themselves, such as their fuel economy, and they can package this information and sell it to the car manufacturers.
It is just one source of data, but it now caters to different customers and generates revenue streams for those customers as well as for the vendors.
Bill: Okay. Thank you for that answer. Mohan, as a fellow data scientist, I have to ask a simple question: is there any restriction on the analytic languages that I can use in Qubole? Can I use R, for example, to take a really simple one?
Mohan: As a data scientist, I would say yes, absolutely. Qubole provides R through our notebook interface, so data scientists, or anyone who wants to run analysis in R, can use R on Qubole. I'd like to add that it is not just R: Qubole provides access to other languages of your choice. A growing number of data scientists use Python as their language of choice, and Python can also be run within the Qubole environment. Scala is also becoming a choice for data scientists and analysts, and it too can be run within the Qubole environment. We provide that choice, so you can run your analysis the way you normally do within the Qubole environment.
Bill: Terrific. Really no practical restrictions at all, and it'll handle all of my data science needs and make it simple for me. There are still some barriers that people have in their minds about moving their data onto the cloud. Certainly one of them is data security, so what guarantees can you provide for the security of the data? Are there certifications or transparency? Are there established compliance practices? How do you deal with questions about data security?
Mohan: This is Mohan. Qubole runs strictly in the cloud; we leverage the public cloud. So we take security very seriously, we are constantly improving our security features, and we are SOC 2 compliant. We're also working towards HIPAA compliance, as we see a growing customer base in the healthcare industry.
Paul: I just wanted to point out that on the Google Cloud Platform side of the house, we have a plethora of certifications, but the ones that are perhaps most relevant are SSAE 16 and ISAE 3402 Type II, and SOC Type II. We have the SOC 3 public audit report available to anybody, as well as ISO 27001 and others, including PCI DSS version 3.1. Google Cloud Platform will also support HIPAA-covered customers by entering into a business associate agreement, the so-called BAA. The Cloud Platform BAA currently covers Compute Engine, Cloud Storage, Cloud SQL, Genomics, and BigQuery.
Bill: Paul, thank you. I noticed that on one of your slides you mentioned concerns about standardization as one of the barriers, not a real barrier but a perceived one. Is there a particular organization or set of standards that's in the lead right now? Where's the industry headed on standardizing inputs?
Paul: I think that this is perhaps the most contentious point. We have not seen a strong leader there, but we do feel that in the very foreseeable future, in my opinion six to 12 months, we're going to see some sort of standardization. It's difficult to predict who is going to emerge victorious, so to speak, but it is a great question, and one that is not fully resolved yet. We do feel that it's going to be resolved very soon.
Bill: Yes. With such fast-evolving technology in this industry, I certainly agree with you; standards are going to have to emerge. Mohan, we have a question. You talked about your work optimizing fuel efficiency with your truck manufacturer. How did it turn out? Were you able to reduce maintenance? Can you give some quantitative sense of the results, and can you talk a little bit about the machine learning algorithms and statistical models that you used to get them?
Mohan: Yes, sure. One example I can think of is when I worked with a fleet in Reno. They make daily trips to the Port of Oakland in California. The optimization we looked at was, given the route that they take, and it's a daily route, we were able to see differences between their drivers in what contributed to higher fuel consumption. As I briefly mentioned during my talk, the parameter that stood out was vehicle speed itself, and I think it's fairly straightforward to assume that vehicle speed has an impact on fuel efficiency.
What speed should the driver drive? Is it 55 mph? Is it 60 mph? We looked at the kind of engines in the trucks and the load that they carry. They're most likely fully loaded when they come from Reno to Oakland. They empty their trucks and sometimes run empty from the Port of Oakland back to Reno. By looking at these various operating conditions, we noticed that for condition one, which is when they're fully loaded, if they kept their speed between 50 mph and 55 mph, they got the best fuel efficiency.
When they were driving empty, the best speed was slightly different. When they were carrying a different load, for example a half-loaded truck, the best speed was very different. It's easy to use averages, but it's only when you go down to that level of granularity that you get the most benefit. You asked an earlier question: why is historical analysis important? To gain this kind of information, you need to look at the historical data. As a result of this analysis, the fleet in Reno was able to increase their fuel efficiency by 5%. To a regular car driver, 5% is a very small number, but for a single truck that drives about 100,000 to 200,000 miles a year, that 5% translates to huge cost savings.
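The segmentation Mohan describes, finding the most fuel-efficient speed separately for each load condition instead of relying on a single fleet-wide average, can be sketched as a simple group-by. This is an illustrative sketch only: the trip record layout, the hypothetical function names `speed_band`, `best_band_per_load`, and `gallons_saved`, the 5 mph band width, and all numbers (trip data, a 6 mpg baseline, 100,000 miles per year) are assumptions, not data or methods from the Reno study. The last helper also shows a detail worth keeping in mind: a 5% gain in miles per gallon reduces fuel burned by 1 - 1/1.05, or roughly 4.8%.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical trip record: (load condition, average speed in mph, observed mpg).
Trip = Tuple[str, float, float]
Band = Tuple[int, int]


def speed_band(speed_mph: float, width: int = 5) -> Band:
    """Bucket a speed into a half-open band, e.g. 52 mph -> (50, 55)."""
    low = int(speed_mph // width) * width
    return (low, low + width)


def best_band_per_load(trips: List[Trip], width: int = 5) -> Dict[str, Tuple[Band, float]]:
    """For each load condition, return the speed band with the highest mean mpg."""
    groups: Dict[str, Dict[Band, List[float]]] = defaultdict(lambda: defaultdict(list))
    for load, speed, mpg in trips:
        groups[load][speed_band(speed, width)].append(mpg)
    best: Dict[str, Tuple[Band, float]] = {}
    for load, bands in groups.items():
        band_means = {band: mean(values) for band, values in bands.items()}
        top = max(band_means, key=lambda b: band_means[b])
        best[load] = (top, band_means[top])
    return best


def gallons_saved(miles_per_year: float, baseline_mpg: float, improvement: float) -> float:
    """Fuel saved per year when mpg rises by `improvement` (0.05 means 5%)."""
    return miles_per_year / baseline_mpg - miles_per_year / (baseline_mpg * (1 + improvement))


if __name__ == "__main__":
    # Synthetic numbers for illustration only, not data from the Reno fleet.
    trips = [
        ("full", 52, 6.1), ("full", 54, 6.3), ("full", 62, 5.4),
        ("empty", 52, 7.9), ("empty", 58, 8.0), ("empty", 63, 8.4),
    ]
    for load, (band, avg) in best_band_per_load(trips).items():
        print(f"{load}: best band {band[0]}-{band[1]} mph, mean {avg:.2f} mpg")
    # Assumed 6 mpg baseline and 100,000 miles per year.
    print(f"{gallons_saved(100_000, 6.0, 0.05):.0f} gallons saved per truck per year")
```

A real analysis would also control for route, grade, weather, and engine type, and might fit a regression model rather than comparing band means; the group-by simply shows why the answer changes once the data is split by operating condition.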
Bill: Well, it's always great to hear real outcomes from these studies, particularly when they're as positive as that one was. Obviously, that was a huge cost savings for those folks. Mohan, Paul, thanks for some great answers to some very good questions. For those of you whose questions weren't answered today, we'll be sending all the unanswered questions to Mohan and the Qubole team so they can follow up with you after today's webinar.
I have just a few quick announcements. Please mark your calendars for June 21st. That's our next DSC webinar, Enhanced Predictive Modeling with Better Data Preparation, which will be sponsored by Alteryx. Also, today's recording will be available for on-demand viewing later today; you can find it on the homepage of datasciencecentral.com under the webinar tab, located at the top of the page.
Well, this brings today's webinar to a close. I'd like to thank our audience for their attendance and thoughtful questions, and a special thanks again to Qubole for their sponsorship and to our speakers today, Mohan Krishnamurthy and Paul Asoyan, for their insight into today's topic. My name is Bill Vorhies, and I'm very pleased to have been your host for today's event. I look forward to seeing you all again on June 21st. Have a great day.
Paul: Thank you. Goodbye.