Dale Treece – Data Platforms 2017

Dale Treece: My name is Dale Treece. I’m a solutions architect and engineering lead for the Digital Data Services group at Scripps Networks Interactive. Who is Scripps Networks Interactive? HGTV, Food Network, GAC, Cooking Channel, Do-It-Yourself Network, those kinds of things that you should recognize, hopefully. I started out my career in the instrumentation and controls field, in nuclear power, and also did a stint at Oak Ridge National Laboratory for about 14 years.

Moved to Scripps Networks, been there 17 years, and done a lot of different things, from sysadmin to DBA, that kind of thing. The job I had prior to this position was leading the search platform, a platform team that provides services for all of digital delivery. We were successful in migrating to the cloud 100% and implemented a fully embedded DevOps team. I was asked to take this Digital Data Services gig, and that was the goal: to implement a fully embedded DevOps team.

Today we’re going to talk about our journey as we started creating this team. We’re going to talk about the creation of the culture, the evolution of the organization and some of the challenges we had, and the emerging architecture. I’ve got some diagrams to go through. I’m a dry-erase-board, dry-erase-marker kind of guy; I’m not real comfortable when we have text up there that we have to step through, so I’m going to power through the first three or four slides. Then we’ll get into the diagrams and I’ll show you architecturally how we moved and iterated through to our current state.

We’re also going to talk a little bit about an initiative that’s about two months old, where we’re actually starting to implement a self-service environment for our customers. We’ve rolled out about four environments so far and it has been successful. There are some challenges we’re going through and some bumps in the road here or there, and Qubole has been really helpful in getting us through some of those. Then we’ll summarize at the end, and I’ll talk a little bit about some of the things we’re trying to do looking forward.

These are the slides I’m going to power through. You’ll hear this a lot: break down the walls, support comfortable and open communications. It’s very important. What are the walls? The walls are dev and ops, dev and business, dev and product; there are a lot of walls, especially if you’re in the old pattern of the development life cycle. The next three are really artifacts of the current day. We’re moving quickly, the velocity has increased, and we’re trying to provide value to the product team as fast as we can.

There are some issues there when you’re handed a team, but let’s talk a little bit about this first. My first search team, I hired them all. It’s easy to do a DevOps-type team when you’re hiring them. You know what you’re looking for. I’m not saying it’s super easy, you’ve got to find them, but it’s a little easier.

In this particular situation, I was given a team of a DBA, a sysadmin, a couple of Java developers, and a couple of BI folks, and now I’m being asked to build this DevOps team, which should incorporate everything from cloud IAM roles and access to network VPCs and security groups; that was a challenge in and of itself. A lot of these folks came from that period of time when we would plan for six months, build for three years, and then finally get to the point where we were able to execute and use the application that we developed.

You’re going to get things like change and volatility. You’re trying to get to the value as quickly as possible so that they can make decisions based on it. There’s a lot of change and volatility that you need to embrace, and you have to help your folks get through that. Be able to pivot, and I think we’ve gotten pretty good at that. We’ve talked about delivering value quickly.

Minimize the impact of failure. I’m not talking so much about the development process; I’m talking about the psyche of the folks. A lot of these folks are used to this: plan for every scenario, every little thing, “Let’s get everything perfect, and then we’re going to start building this thing.” Failure is our friend in this new day, right? We want to get to that point as fast as possible, so that we can make decisions on what we’re going to do next.

Education is extremely important. It is a priority component of our development life cycle. We try to embed learning into our projects. There are cases where you can’t embed it, but it’s very important, so we’ll actually make it a task in our PI, or program increment.

The last thing, but the most important thing, is to have some fun with it. Get out there and learn some new things, dig into some new technologies, innovate, create, and celebrate, even the failures.

The next thing I want to talk about is some of the organizational challenges: the legacy infrastructure, frameworks, and applications. We’ve all got that. If you’re totally in the cloud, you may not have the legacy infrastructure anymore, but you still have legacy frameworks and applications to deal with. That’s going to vary, depending on how large a system you’re talking about. We were very fortunate in this case that when we were asked to become this big data services team for Scripps, they gave us really one charter, and that is to take our audience data warehouse and get it into the cloud.

Make communication and collaboration with the product owner a priority. That is one of the most important things in your group. We broke down those barriers and those walls, and when we’re developing, we’re in constant contact with that product owner so that we can move quickly, and when changes need to be made, we’re aware of it quickly. We’re teaming with our product owner.

Emerging architecture: you may hear things like iterative architecture, architectural runway, just-in-time architecture. It is a challenge. It’s something we try to do; we’ve done well sometimes and sometimes we have not. That’s where you’re looking out ahead, and that’s primarily my responsibility, or that of a solutions architect or engineer at the program level: looking at the program’s backlog, staying aware of the status of the current infrastructure, and looking out ahead to make sure we have runway to land some of the products that are in that backlog.

Fragmentation and just-in-time, that’s the challenge: sometimes it was just in time, and sometimes it actually was not there and we had to embed it in the project. We would see an impact from that, because we’re having to build the runway before we can even start building the project. Then sometimes we were pretty good: we looked out ahead and we had some things in place that really helped us when some of the products they wanted to build came into our queue.

Testing is hugely important: the coverage, the patterns, the automation of it. You’ve got unit testing, and integration testing, and all these different flavors of testing. What we decided to do coming out of the gate is focus more on the product and the behavior of that product. We follow a behavior-driven development process, where we’re working in conjunction with the product owner: we’re building the acceptance criteria, we’re building the tests, and we’re developing against those tests. When all your tests pass, the product is functioning like it should per the specifications and requirements.

The reason we chose that: if we’re doing unit testing, the lower-level method, or class, or whatever you’re writing those tests for may function great, but your product could still not function the way the product owner wants it to function.

Hire or develop T-shaped skill sets. What do I mean by T-shaped? I’m a generalist, I am not a T shape. I just hit my microphone, right? Sorry. I am not a T-shaped resource, I am a generalist, I am a dash, I am the top bar. I have some depth because of the experience and the things that I’ve gotten into, and I’m a little bit risk-averse, and I’m okay with that, but what we want for our DevOps team is T-shaped resources. They’re broad, they have some depth, and they have an area in which they’re an expert.

We were so fortunate. I didn’t get to choose this team, but I am so blessed when I look back, and thankful that I have them. We had most areas covered: release engineers, data engineers, security engineers, build-and-deploy and test engineers. We had a lot of these things covered. What happens is that this resource is now an expert and a reference for your team, so people can come to them with questions in that particular field, and yet we have the bar across the top, with some depth that we’re constantly trying to build across the team.

[sighs]

Now I can relax; this is where I feel comfortable. My team throws me a dry erase marker when I come into meetings sometimes. I want to talk about the emerging architecture. We started about as close to from scratch as you can start. The very first thing, like I said, we wanted to get our audience data warehouse into the cloud. Here we are, we’re doing all these evaluations, we’re an AWS cloud customer enterprise-wide, we’re digging into different things, we’re doing our due diligence, and we settled on Redshift at the time. We’re talking two years ago; all the things I’m showing you happened in a period of about two years.

Things are happening really quickly. It feels like a lot more than that, maybe because of all the stuff we crammed into it. We evaluated, we chose Redshift. This is our initial migration to the cloud: we had our audience data warehouse in Redshift. We did make some modifications to the structure, because in the on-prem warehouse we only captured a single point in time for a particular customer, but we expanded this. Now we have how many times they’ve moved, how many times they’ve interacted with sweeps, and so on.

We moved that to the cloud. We had one near-real-time process that was pretty important to integrate into the audience data warehouse, and that is our sweeps: Dream Home, Smart Home, those kinds of things, where we take those entries in real time. What we ended up doing was taking the framework we developed in search, which had a lot of success, and doing what I’d call rack-them-and-stack-them in the cloud. We were in the cloud, but it was still a custom application and we were managing the whole thing.

This orange box is much bigger than that orange box; there are a lot of things in there, but it was all custom, based on a Felix OSGi framework. We moved out to the cloud, we got it all tied together, and we’re feeling pretty good. We’re streaming Dream Home at those throughput rates, it’s going into the audience data warehouse, and we’ve got this view of our customer that we’re pretty happy about. We’re thinking Redshift is it, we’re done, we’re there.

Next thing... actually, one thing I forgot, let me go back one time. The green line up at the top: we started out with one data science team that was our primary customer. They’ve got access to Redshift, so they can get in there, look at things, and do some stuff for the customer. It’s great. They can write and run reports, all that kind of stuff. They’re pretty happy. They come up with their first product. Boom. Great. Guess what? We left it in Redshift. We created this product and put it in another schema in Redshift, and here we go, we’re cooking with gas.

Then after that, things started happening that really drove us in another direction. We were asked to load some data; they wanted to get a first look at it. What are we going to do? We’re going to load it in Redshift. We develop a process, we take those documents, we load them into Redshift. Hey, we’re doing great. Our analysts have access to it. They can see and do what they want.

A funny thing happened: that continued until we started getting more and more products, more and more requests to load data in there. One day, we do some calculations and say, “Hey, we’re at the cheapest level of Redshift right now, we’ve got this compute-storage coupling thing going on, and we’re going to run out of space.” There were two levels, and the cost is pretty reasonable at the level we’re currently at. We dig in and figure out when we predict we’re going to have to move to the next level, then we look at the cost, and we’re like, “Uh-oh.”

Then we started digging into how we could decouple that storage and that compute. One of the best storage mechanisms out there, I think, and it’s cheap, is S3. We started looking at how we could move our current load data, get a process together to push our data into S3, and serve our customers via S3. We’re starting to see some patterns here. During that process, when we’re loading some of this data, the analysts go, “Hey, this is great,” but we were doing some logic back at the beginning, at the front, where we’re putting the data into Redshift. They’re saying, “Hey, this is great, but we need these other two fields.” Now we’ve got to go back and load 18 months of data because they need those other two fields.

We started seeing some patterns here, and I’ll go through them. We break our process up into four areas. The first is land: we land it raw, we land it all, and we keep it for quite a while. When you land the data like that, we had some limitations as far as how they could access it; they couldn’t look at the whole repository. Sometimes we needed to optimize and make it more efficient so they could look at the whole repository of data, but they could get that initial look, so we solved one problem. They could look at it and say, “This is cool. That day’s worth of data is cool. Now, can I look at the year’s worth of data?” That’s where the load came in. We do the minimal transformations there to get it to the point where they can look at the entire corpus of data.

Then the next piece is an integrate phase, where we’ll integrate it with a mart, with some other schema, some other file set, whatever. That’s our integrate phase. Then there’s an action phase where we are producing a product. The analysts have now seen this data, they’ve analyzed it, and they’ve come up with some products. When we produce those products, we call that action.

One thing I want to point out here: you’ll see the process columns get a little wider as they go toward the product. That was intentional. We want to push the business logic as close to the product as possible. We don’t want to reload; we don’t want to take the data and reload 18 months’ worth of it. Like I said, in land, we land it all. There’s not a lot of logic in the land process, and the logic gets more complex, and there’s a larger set of it, as you move toward that product. For example, if we have a product with some pretty complex business logic associated with it and something happens, I’m not having to go back like we did before and load everything from the beginning; I just have to change and reproduce that product.
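To make that land/load/integrate/action layering concrete, here is a minimal sketch, assuming a hypothetical S3 bucket layout and PySpark jobs; the bucket, prefixes, columns, and transformations are illustrative only, not the actual Scripps pipelines.

```python
# Sketch of the four-layer pattern over S3: raw data stays untouched in "land",
# and the business logic lives as close to the product ("action") as possible.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("layer-sketch").getOrCreate()
BUCKET = "s3://example-data-lake"  # hypothetical bucket

# LAND: documents exactly as delivered; no business logic here.
raw = spark.read.json(f"{BUCKET}/land/sweeps/dt=2017-05-01/")

# LOAD: minimal transformation (typing, partitioning) so analysts can query
# the whole corpus instead of a single day's drop.
loaded = raw.withColumn("entry_date", F.to_date("entry_ts"))
loaded.write.mode("overwrite").partitionBy("entry_date").parquet(f"{BUCKET}/load/sweeps/")

# INTEGRATE: join with other sets, master the customer, de-dupe.
customers = spark.read.parquet(f"{BUCKET}/load/customers/")
integrated = (loaded.join(customers, "customer_id")
                    .dropDuplicates(["customer_id", "entry_date"]))
integrated.write.mode("overwrite").parquet(f"{BUCKET}/integrate/audience/")

# ACTION: the product itself; the heavier logic sits here, so a rule change
# means rerunning only this step, not reloading 18 months of raw data.
product = integrated.groupBy("customer_id").agg(F.count("*").alias("sweeps_entries"))
product.write.mode("overwrite").parquet(f"{BUCKET}/action/sweeps_engagement/")
```

The point of the shape is the last comment: when a business rule changes, only the action step is rerun, not the raw history.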

This is where we’re at today. Visualizing those patterns, we sort of built them up, and that’s good for automation; we figured out automation is our friend too. We’re a small team, right? Recognizing these patterns, standardizing them, and reproducing them really helped us automate and gave us some cycles back for more innovation.

I want to go through this process real quick. Again, you’ll see the land, the load, the integrate, and the action. We have 30, 35, maybe 40 disparate data sources that we pull in now; over time it’s morphed into internal data, external data, APIs, data drops from vendors, et cetera. Just like I said, we land it.

What we changed, though, to get away from Redshift: when we pushed that data into S3, we needed to give our analysts a way to look at that data, so the first thing we did was actually create and manage our own Presto cluster. That took a lot of cycles away from us, so we started doing some evaluation and looking at what’s available from a managed-service perspective. We’re really moving into the managed-services area because, like I said, we’re a small team, there are a lot of demands being thrown at us, and we needed to get cycles back.

We ended up evaluating Qubole’s offering in comparison with some others and went with Qubole. Now we have this data being landed in S3, structured, unstructured, whatever, and we’re able to produce some schemas that in some cases only let them see a certain set of data. It might be a day’s worth, a week’s worth, whatever, just enough to give them a feel for what it looks like. Then we still do the same load process, where there are some partitioning things and some formatting things we may need to do, but we keep that transformation to a minimum.

Again, that’s pushed into the load schema. Qubole sits on top of that, and now they have access via the different tools available there, Presto, Hive, and Spark, to go in, look at the entire corpus of data, and do more in-depth analysis.
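As a sketch of how a load layer like that can be exposed to Presto, Hive, and Spark through a shared metastore, here is one way it might look; the database, table, columns, and S3 location are hypothetical, and this is not the actual Scripps DDL.

```python
# Register the "load" layer as a partitioned external table so the SQL engines
# (Presto, Hive, Spark) can see the entire corpus without moving the data.
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS load_db.sweeps (
        customer_id STRING,
        entry_ts    TIMESTAMP
    )
    PARTITIONED BY (entry_date DATE)
    STORED AS PARQUET
    LOCATION 's3://example-data-lake/load/sweeps/'
""")

# Pick up the partitions the load job wrote, so analysts see all of history.
spark.sql("MSCK REPAIR TABLE load_db.sweeps")
```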

Our audience data warehouse is still in Redshift. That’s the only thing in Redshift now; there’s no load data or land data in there anymore. All that stuff’s out, and all that’s in there is the data warehouse itself. We still have an integrate process that we use to push data into that audience data warehouse. In the integration process, we’re doing all the mastering of our customer: the de-dupes, the filtering, and whatever else that logic consists of, to push it in there.

In our action process, we do have a lot of products now, and we’re starting to expand. Our initial customers have expanded a little bit on the ad sales side. Our initial data science group was in the ad sales department; we now serve ad sales data science, ad sales marketing, ad sales inventory, and ad sales ops. Those are actually the four groups, which we’ll get to in a little bit, that we have rolled the self-service type infrastructure out to.

We have products in S3, we push things to databases, depending on what the customer needs and wants, and we have documents in S3. In the action process, down at the very bottom where you see the Qubole Presto cluster, it’s going over to a search and discovery index. We’re starting to branch out into providing products for delivery on the digital side. There are five or six different products where the analysts came up with some calculation; we have the data, we’re producing that product on a daily basis, and it’s actually being used to provide relevancy information for delivery of content on the site.

Actually, if I redid this diagram, that would still be in action, but I just want to call out that we’re using Presto now to pull straight from the load layer. We can action off of any layer we want to; if we wanted to action off land, we could.

Next. Now, let’s talk about the self-service platform overview. When we started out with our first customers, everything was just like the diagram we saw earlier this morning in the keynote: everything’s being thrown over the wall, here’s what we want, build this for us, and we do that. Then as the customer base started growing, more things are coming over the wall, and it’s getting a little difficult to keep all the balls in the air.

We always knew that we were going to move in this self-service direction, but we really started getting after it because we had heard that there are more customers coming, the digital engineering side, and that we’ve now branched out internationally; I got an email the other day. It’s very important to us, if we’re going to maintain the size of this team, which is six folks, that we get into the self-service area.

Here’s, basically, a very high-level view. We’ve got a production environment, which consists of what I showed you earlier in the top box. We still produce products. There are some strict SLAs we have to adhere to sometimes, so we have a lot of things in there that typically wouldn’t be done in a self-service environment.

Like I said, we’ve rolled out four self-service environments; I’m showing you two here. Basically, we give them read-only access to the data lake, and we use a least-privileges type pattern so that we can control what they have access to. We have PII data in our lake, so we control what each group needs. That’s a set of questions we ask them, and we don’t give them carte blanche access to everything yet. We got a request today for that. [laughs]

There are a lot of things going on in the self-service area. It’s fun, we’re having a good time. A self-service environment is basically a Qubole account. They have all the tools available to them: they can schedule jobs, they can write their own queries, they can write their own notebooks and schedule those notebooks, and they can build more complex workflows off of queries and notebooks together and schedule those.

One example: the ad sales marketing group had a process that was rather manual and took a significant number of hours; I think what they told me was somewhere around a week, and they did this every month. We rolled the platform out to them first, and when we rolled this out, man, they took off. In a matter of a couple of weeks, they had this thing in a fairly complex workflow and it’s producing this data for them weekly. There’s no manual intervention, it’s scheduled, so they’re like, “Woo.” Like I said, they took off, so they’re really the team that’s out there stressing the infrastructure and the environments and getting a lot of the bugs to surface.

I’m only showing one area here, SS81S3. They can read from the data lake, but they can write to and read from their own sandbox. If they want, they can source reports off those areas in that sandbox, they can schedule them, and they can maintain all of that. One of the values we’re trying to drive home to them is: if you have rather strict SLAs and you don’t want to support this 24/7, 365, we have our production environment, with some more stringent processes, and we support that 24/7, 365. We’re going to see how that goes.
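To illustrate the least-privileges pattern described above (read-only on the lake, a PII carve-out, read/write only in the group’s own sandbox), here is a rough sketch of what such an S3 policy could look like; the bucket names, prefixes, and statement layout are hypothetical, not the actual Scripps policy.

```python
# Sketch of a least-privilege policy for one self-service account:
# read-only on the shared lake, no PII prefix, full access only to its sandbox.
import json

read_only_lake_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {   # list and read the shared data lake
            "Effect": "Allow",
            "Action": ["s3:ListBucket", "s3:GetObject"],
            "Resource": [
                "arn:aws:s3:::example-data-lake",
                "arn:aws:s3:::example-data-lake/*",
            ],
        },
        {   # explicitly exclude the PII prefix unless this group is approved
            "Effect": "Deny",
            "Action": "s3:*",
            "Resource": "arn:aws:s3:::example-data-lake/land/pii/*",
        },
        {   # read/write only inside this account's own sandbox
            "Effect": "Allow",
            "Action": ["s3:ListBucket", "s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
            "Resource": [
                "arn:aws:s3:::example-selfservice-sandbox",
                "arn:aws:s3:::example-selfservice-sandbox/*",
            ],
        },
    ],
}

print(json.dumps(read_only_lake_policy, indent=2))
```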

They all want the access: “Hey, get out of my way,” kind of thing. I’m wondering, though, if there are situations that may occur where they’re asleep at night, they’re woken up at 2:00 AM, and they’ve got to get something fixed. Anyway, we’re going to see how that goes. We’re also moving in a direction where some of those sandbox areas for the different self-service environments can take different forms. We’ve got one team that wants an Aurora instance; we’ve done that. We’ve got one team, get this, that actually wants to push stuff to Excel. We can do that and we’re doing that.

Key takeaways: in a matter of a few months, 24-ish, something like that, we’ve established a fully embedded DevOps culture, we’ve produced a broad, talented, T-shaped team, and we’ve developed and deployed a platform that supports rapid product delivery. When I was on the search platform, we had things down as far as our frameworks, our CI/CD processes, all our build-and-deploy stuff, and our testing at build time, and it instills confidence, confidence that you can move pretty quickly.

I think in one deployment we were doing, it was a rather long day, but we probably did 40 or 50-some-odd deployments, because the team felt comfortable. If it passed these tests, we knew it was good. If we made a change and another test failed, we didn’t roll back and say, “Hold it guys, we’ve got to stop the website rollout, we’re done. We’re going to have to roll back. We’re going to deploy next week.” No, we felt comfortable enough to make that change and roll forward. In a matter of minutes, we’re out there and we tell the team that we’re supporting, “Hey, we’re good to go.”

We’ve got that same kind of thing here. Like I said, we focus on behavior-driven development and testing. Those tests are automated in what we call the Robot test framework, and they’re part of our build process; they run during build and deploy. Some of them are very slow and long-running, so they’d be very impactful at execution time, which is why we push them into our build and deploy process.
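As an illustration of the behavior-driven idea, here is a minimal acceptance-style check written in Python with pytest and boto3; the actual suite described in the talk is in the Robot Framework, and the bucket, prefix, and criteria below are made up.

```python
# Sketch of a behavior-level acceptance test: given yesterday's sources landed
# and the action job ran, the daily product files should exist and be non-empty.
import boto3
import pytest

BUCKET = "example-data-lake"                                # hypothetical bucket
PRODUCT_PREFIX = "action/sweeps_engagement/dt=2017-05-01/"  # hypothetical key

@pytest.fixture(scope="module")
def s3():
    return boto3.client("s3")

def test_daily_product_was_published(s3):
    resp = s3.list_objects_v2(Bucket=BUCKET, Prefix=PRODUCT_PREFIX)
    objects = resp.get("Contents", [])
    assert objects, "no product files found for the expected date"
    assert all(obj["Size"] > 0 for obj in objects), "a product file is empty"
```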

Our customers are starting to reap some benefits from our self-service data platform. On our road map, we’ve got a lot of work in this area. We’ve had several meetings while we’re here at the conference related to our self-service platform; the engineering department is very helpful, and I feel like we’re making progress. We’re continuing to develop and expand testing patterns and processes. To me, you can never have enough testing. You want to get that confidence that you can make a change and roll forward.

We’ve got a big initiative right now to return some resource cycles for innovation. There’s some operational automation we’re trying to focus in on, automating some things to give us those cycles back. Monitoring, notification, and self-healing: we’re big into this. If we can monitor it and alert on it, we should be able to do some things to self-heal and take care of it, so that we’re not having to get in and do all the manual steps to intervene, taking cycles away from innovating.
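As a toy sketch of that monitor-and-self-heal loop, assuming placeholder run_job and notify functions rather than any real scheduler or alerting API:

```python
# Toy self-healing wrapper: retry a failed step before paging a human.
import time

def run_job(name):
    # Placeholder: kick off a pipeline step and report success or failure.
    return False

def notify(message):
    # Placeholder: page or alert the on-call person.
    print(message)

def run_with_self_heal(name, retries=2, backoff_seconds=60):
    for attempt in range(retries + 1):
        if run_job(name):
            return True
        if attempt < retries:
            time.sleep(backoff_seconds)  # simplest self-heal: wait and retry
    notify(f"{name} failed after {retries + 1} attempts; manual intervention needed")
    return False
```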

We have an internal group. I call it DevOps; my wife would call it CFO, control freak ops, because that’s what I am, a little bit of a control freak. That’s why I like DevOps: you control it all. It makes me uncomfortable, but I’m starting to get to the point where we really need some help with some of the tier 1, tier 2 type stuff. We have a vendor that we’re working with that I feel pretty good about; we’ve made some pretty good strides in offloading some of that tier 1, tier 2 stuff.

Then we’ve got a pretty big initiative right now digging into data lineage. Our data sources and our lake are starting to grow, and one of our big things, which we’ve had some conversations about down here, is that I need to know: if I’ve got these data sources, and downstream they fork off into five or six different pipelines that end up at five or six different products, then if something fails anywhere along the way, I need to know that. I’d love to self-heal, but we’re marching down that direction, trying again to get some more cycles back.
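A minimal sketch of that lineage question (“if this step fails, which downstream products are affected?”), using a hypothetical dependency graph; the node names are made up.

```python
# Toy lineage graph from sources to products, and a walk that answers
# "everything downstream of a failed step".
from collections import deque

# edges: upstream step -> downstream steps (hypothetical names)
EDGES = {
    "vendor_drop": ["land_sweeps"],
    "land_sweeps": ["load_sweeps"],
    "load_sweeps": ["integrate_audience", "action_relevancy_feed"],
    "integrate_audience": ["action_audience_warehouse"],
}

def downstream_of(node, edges=EDGES):
    """Return every step reachable from `node`, i.e. everything its failure can break."""
    seen, queue = set(), deque([node])
    while queue:
        for child in edges.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

# A failure in the load step touches both the warehouse and the relevancy feed.
print(downstream_of("load_sweeps"))
```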

That’s it. Any questions? Like I said, I’m a generalist; my experts are at home. I have one here, but my experts are at home, so you might be able to stump me. I’m pretty sure you could, but I’d love to just have a conversation. I feel like we’re somewhere in the middle; we’re not all the way there. When I was in the advisory board meeting, there were people talking who have been with Qubole for five years. We’re barely two years old, but we’ve gotten off the start line and we’re making good progress.

My target audience today is those folks that maybe are still at the start line or just a little bit past it; I just want to be there for you, if we can help you. I’d love to have conversations with you, I’d love to share contact information, and we can talk well after this conference, if you’d like. If you’re in the cloud and you’ve got everything cooking with gas, that’s great; I’m really looking to connect with the folks where maybe we’re just a little bit further ahead. I gained a tremendous amount from the folks that are ahead of us on the first day and the second day. I just want to give a little bit of that back. Questions?

Audience 1: In your self-service world, do you allow people to bring their own data? We’ve got people that are like, “If only I could join this spreadsheet with all the other data that we have in the cloud.”

Dale: We do. In that environment that we have set up for them, they can pull data from any other source they want. The thing that we’re limiting right now is that we don’t allow any kind of product or data that’s been developed in a self-service account to be shared with other accounts. We’re having some discussions about changing that, but currently, because I’m a control freak, I don’t want something out there that I can’t control or fix if people are saying, “Hey, this is broke.” They can read from the data lake, we process all that data into the data lake, but they can also pull from their own sources.

Audience 2: Just to clarify, so they can pull down to Excel but they can’t load up?

Dale: Into the lake. They can load up into their sandbox area.

Audience 2: Right. They can bring it down but they can’t mix it in with your production data that’s up?

Dale: Correct, we don’t allow them to write over the sources, currently. Like I said, I got an email today saying, “I want access to write over the lake,” and I’m going, “We’re going to have to meet and talk about that a little bit.”

[laughter]

Audience 3: I actually have a ton of questions, I’m going to want your card after this.

Dale: Cool.

Audience 3: You mentioned that they can pull data in– or for the storage accounts that you provide, are they allowed to write to any location or their own locations that they–?[crosstalk]

Dale: The self-service accounts can write only to areas that we have set up for them, but it’s– I mean it’s unlimited, right?

Audience 3: One self-service account can’t read from a location that another self-service account has written to, and basically get around your policy of not sharing data between accounts?

Dale: No.

Audience 3: That’s good.

Dale: They’re limited to just what we set up for them in their sandbox area-

Audience 3: I got it. Cool.

Dale: -and there’s no interaction between accounts.

Audience 3: Then you mentioned the team of six T-shaped skill sets. What are those skill sets?

Dale: The skill sets I’m talking about are things like a DBA, or database administrator, a data engineer, a release engineer with build-and-deploy type expertise. You’ve got software engineers, people that are Java developers, that kind of thing. The T shape means they have an area that they’re really good at, that they’ve got a lot of experience and expertise in, more so maybe than another member of the team.

In that area, they become the reference for the team. The team sits together all the time in a big place we call the pit. The pit is a big area where we have a bunch of seats and tables, and all the time people are turning around and talking to the data expert in our group, saying, “This query is killing me, I don’t know what’s going on. I’ve done the best I could.” He comes over: “It’s this, this, this.” Boom. Done. There’s that interaction with those reference folks.

It was not easy to take a DBA that has done nothing else for 20 years and get them to the point of knowing what a VPC is, a security group, all those kinds of things. We use pair programming, and we’ve done some creative things with training. Our velocity was initially down until we got over the hump, but we’ve tried to encourage people, when they’re pair programming with someone, to choose the area they’re maybe not good at, to try to bolster or increase their knowledge in that area.

It’s worked well, because I’ve had three folks promote out. The DBA that had done nothing but DBA work for 20 years just exploded and then promoted out. What was so cool was that by the time they promoted, we were in pretty good shape from a DevOps perspective; most everyone could cover the entire range of skills needed to do whatever we need to do on our platform.

I’m sitting there thinking, “Okay, this is my data person, this is my DBA, what’s going to happen?” Then somebody goes, “I got it.” They volunteered: “I got it.” I said, “Cool.” They took it and barely missed a beat. We had a dip, but that knowledge transfer had been going on, and that person felt confident and said, “I got it.” What that did was allow me to hire more of an entry-level kind of person instead of an expert. They came in, we started embedding them in all of our processes, and then they found a niche, maybe a hole or a gap we had that they just liked, and there they go.

This happened two other times. All three times, we barely had a blip. They were good people. Not saying that when they’re gone, we didn’t miss them. We missed them but we barely had a blip.

Audience 4: Hi, I want to know how you plan to do your tier 1, tier 2 support, since your users are mostly internal users, right, like inside the company?

Dale: We do have external users; we do have some products that we deliver outside of our company, and we actually are legally bound in some cases to deliver those products. We’ve failed a couple of times, and failure is okay, but it’s been quite a while of us trying to get this tier 1, tier 2 transition off the ground.

What we decided to do is just basically break it down into the smallest components we can. We’ve got one of our most important pipelines, we just broke it down into pieces and we said, “Here, take this piece.” They start setting up meetings, they start evaluating, they start looking at monitoring, and alerting, and adding, and they start understanding that whole piece. When they get that one done, we go to the next piece.

Over time, we’re going to have that entire pipeline covered by them, and there are a lot of cycles that go into supporting that. That’s how we’re doing it: phasing it in, in small pieces. We’ve got some really good folks too, and they’re actually doing some of the self-healing and automation, which I was ecstatic about because I thought that was something we were going to have to do, but their tier 2 folks are actually involved and actually did some of the automation.

Audience 5: Thank you. I’ve heard a lot of people talking about self-service today and we have a lot of audience members that are in the Excel world, and so I’m wondering, for all those different teams, do they end up getting a data person that was more used to doing those queries and whatnot or were you having to train people to–? What was that mixture?

Dale: We’re two months into this, and what we’re finding is we have this whole broad spectrum of expertise and experience. We’ve got the folks that have taken off; they’re really doing some complex stuff, so much so that they’ve actually caused a couple of infrastructure-type issues that Qubole engineering and we are dealing with. Then we’ve got those folks that are okay with ANSI SQL kind of stuff.

What we’ve done when we onboard these teams is develop templates of examples. In addition to some of the examples that Qubole may provide, we actually do some specific examples for them to look at. We have a session with them, and we’re committed to them in the first part of it, to try to help them get over the hump. The range of expertise varies. The account we just rolled out is the one that actually wants the Excel push, but they seem to be fairly savvy from a SQL perspective.

We’re just getting into this self-service area, but it’s important to us. Like I said, we’ve already got a new set of customers. We’re going to treat those like we did the original ones, where they throw it over the wall until we can’t handle it anymore, and hopefully we’ll get them up to speed to where they feel comfortable becoming a self-service account themselves.

Audience 6: You said from your data lake you’re populating those data sets into the self-service accounts-

Dale: No, they’re reading and pulling. [crosstalk]

Audience 6: – or they have access to it?

Dale: They access and they pull.

Audience 6: Okay. Do you have any concerns for multiple versions of the truth or any pushback from leadership that you’re going to confuse us with different groups producing different metrics on the same data?

Dale: No. What we’re trying to do is take ownership of the platform and team with the product owners; our job is to provide the tools, the infrastructure, and the platform for them to be able to do their business. The work around governance, source of truth, and all of that is being done at, for example, the ad sales layer. Those folks are getting together and talking about it. We’re a lot better than we were; all the data used to be all over the place, and we’ve aggregated it now.

We actually do some of the initial heavy lifting and processing of the data. We consider the lake the source of truth, but yes, they can pull from there, do some things and start propagating that out somewhere and somebody on another account could maybe do something similar. We’re really leaving that up to the users themselves and they’re doing a pretty good job of trying to coordinate that and take care of that.