Curiosity-Driven Data Science - by Eric Colson @ HBR

Data science can enable wholly new and innovative capabilities that can completely differentiate a company. But those innovative capabilities aren’t so much designed or envisioned as they are discovered and revealed through curiosity-driven tinkering by the data scientists. So, before you jump on the data science bandwagon, think less about how data science will support and execute your plans and think more about how to create an environment to empower your data scientists to come up with things you never dreamed of.

First, some context. I am the Chief Algorithms Officer at Stitch Fix, an online personalized styling service with 2.7 million clients in the U.S. and plans to enter the U.K. next year. The novelty of our service affords us exclusive and unprecedented data with nearly ideal conditions to learn from it. We have more than 100 data scientists that power algorithmic capabilities used throughout the company. We have algorithms for recommender systems, merchandise buying, inventory management, relationship management, logistics, operations — we even have algorithms for designing clothes! Each provides material and measurable returns, enabling us to better serve our clients, while providing a protective barrier against competition. Yet, virtually none of these capabilities were asked for by executives, product managers, or domain experts — and not even by a data science manager (and certainly not by me). Instead, they were born out of curiosity and extracurricular tinkering by data scientists.

Data scientists are a curious bunch, especially the good ones. They work towards clear goals, and they are focused on and accountable for achieving certain performance metrics. But they are also easily distracted, in a good way. In the course of doing their work they stumble on various patterns, phenomena, and anomalies that are unearthed during their data sleuthing. This goads the data scientist’s curiosity: “Is there a better way that we can characterize a client’s style?” “If we modeled clothing fit as a distance measure could we improve client feedback?” “Can successful features from existing styles be re-combined to create better ones?” To answer these questions, the data scientist turns to the historical data and starts tinkering. They don’t ask permission. In some cases, explanations can be found quickly, in only a few hours or so. Other times, it takes longer because each answer evokes new questions and hypotheses, leading to more testing and learning.

Are they wasting their time? No. Not only does data science enable rapid exploration, it is also easier to measure the value of that exploration than it is in other domains. Statistical measures like AUC, RMSE, and R-squared quantify the amount of predictive power the data scientist’s exploration is adding. The combination of these measures and a knowledge of the business context allows the data scientist to assess the viability and potential impact of a solution that leverages their new insights. If there is no “there” there, they stop. But when there is compelling evidence and big potential, the data scientist moves on to more rigorous methods like randomized controlled trials or A/B testing, which can provide evidence of causal impact. They want to see how their new algorithm performs in real life, so they expose it to a small sample of clients in an experiment. They’re already confident it will improve the client experience and business metrics, but they need to know by how much. If the experiment yields a big enough gain, they’ll roll it out to all clients. In some cases, it may require additional work to build a robust capability around the new insights. This will almost surely go beyond what can be considered “side work” and they’ll need to collaborate with others for engineering and process changes.
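To make those measures concrete, here is a minimal sketch in Python of how a data scientist might check whether a tinkering idea adds predictive power over a baseline. Everything in it is an assumption made for illustration: the toy ratings, labels, and scores are invented, and the function names are hypothetical rather than taken from any Stitch Fix system.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean squared error: typical size of a prediction miss (lower is better)."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def r_squared(actual, predicted):
    """Share of the variance in `actual` explained by `predicted` (higher is better)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    wins = sum(1.0 if p > n else (0.5 if p == n else 0.0) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented toy data: client ratings on a 1-5 scale.
actual_ratings = [3.0, 4.0, 5.0, 2.0, 4.0]
baseline_pred = [3.6, 3.6, 3.6, 3.6, 3.6]   # always predict the average rating
new_idea_pred = [3.2, 3.9, 4.6, 2.5, 4.1]   # hypothetical model using a new feature

# Invented toy data: did the client keep the item (1) or return it (0)?
kept = [1, 0, 1, 1, 0, 0]
baseline_scores = [0.6, 0.5, 0.4, 0.7, 0.6, 0.3]
new_idea_scores = [0.8, 0.4, 0.6, 0.9, 0.5, 0.2]

print(f"RMSE      baseline={rmse(actual_ratings, baseline_pred):.3f}  "
      f"new={rmse(actual_ratings, new_idea_pred):.3f}")
print(f"R-squared baseline={r_squared(actual_ratings, baseline_pred):.3f}  "
      f"new={r_squared(actual_ratings, new_idea_pred):.3f}")
print(f"AUC       baseline={auc(kept, baseline_scores):.3f}  "
      f"new={auc(kept, new_idea_scores):.3f}")
```

If the new idea barely moves these numbers, there is no “there” there. A clear improvement on held-out historical data is only the signal to move on to a controlled experiment. It is not proof of causal impact.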

The key here is that no one asked the data scientist to come up with these innovations. They saw an unexplained phenomenon, had a hunch, and started tinkering. They didn’t have to ask permission to explore because it’s relatively cheap to allow them to do so. Had they asked permission, managers and stakeholders probably would have said ‘no’.

These two things, low-cost exploration and the ability to measure the results, set data science apart from other business functions. Sure, other departments are curious too: “I wonder if clients would respond better to this type of creative?” a marketer might ask. “Would a new user interface be more intuitive?” a product manager inquires. But those questions can’t be answered with historical data. Exploring those ideas requires actually building something, which will be costly. And justifying the cost is often difficult since there’s no evidence that suggests the ideas will work. With its low-cost exploration and risk-reducing evidence, data science makes it possible to try more things, leading to more innovation.

Sounds great, right? It is! But you can’t just declare as an organization that “we’ll do this too.” This is a very different way of doing things. You need to create an environment in which it can thrive.

First, you have to position data science as its own entity. Don’t bury it under another department like marketing, product, finance, etc. Instead, make it its own department, reporting to the CEO. In some cases, the data science team will need to collaborate with other departments to provide solutions. But it will do so as an equal partner, not as a support staff that merely executes on what is asked of it. Instead of positioning data science as a supportive team in service to other departments, make it responsible for business goals. Then, hold it accountable for hitting those goals, but let the data scientists come up with the solutions.

Next, you need to equip the data scientists with all the technical resources they need to be autonomous. They’ll need full access to data as well as the compute resources to process their explorations. Requiring them to ask permission or request resources will impose a cost and less exploration will occur. My recommendation is to leverage a cloud architecture where the compute resources are elastic and nearly infinite.

The data scientists will need to have the skills to provision their own processors and conduct their own exploration. They will have to be great generalists. Most companies divide their data scientists into teams of specialists (say, Modelers, Machine Learning Engineers, Data Engineers, Causal Inference Analysts, etc.) in order to get more focus. But this requires more people to be involved in pursuing any exploration, and coordinating multiple people gets expensive quickly. Instead, leverage “full-stack data scientists” with the skills to do all the functions. This lowers the cost of trying things, as a single tinkering initiative may require each of the data science functions I mentioned. Of course, data scientists can’t be experts in everything. So, you’ll need to provide a data platform that abstracts away the intricacies of distributed processing, auto-scaling, etc. This way the data scientist focuses more on driving business value through testing and learning, and less on technology.

Finally, you need a culture that will support a steady process of learning and experimentation. This means the entire company must have common values for things like learning by doing, being comfortable with ambiguity, and balancing long- and short-term returns. These values need to be shared across the entire organization as they cannot survive in isolation.

But before you jump in and implement this at your company, be aware that it will be hard if not impossible to implement at an older company. I’m not sure it could have worked, even at Stitch Fix, if we hadn’t enabled data science to be autonomous from the very beginning. I’ve been at Stitch Fix for six and a half years and, with a seat at the executive table, data science never had to be “inserted” into the organization. Rather, data science was native to us in the formative years, and hence, the necessary ways-of-working are more natural to us.

This is not to say data science is destined for failure at older, more mature companies, though it is certainly harder than starting from scratch. Some companies have been able to pull off miraculous changes. And it’s too important not to try. The benefits of this model are substantial, and for any company that wants data science to be a competitive advantage, it’s worth considering whether this approach can work for you.

Eric Colson is Chief Algorithms Officer at Stitch Fix. Prior to that he was Vice President of Data Science and Engineering at Netflix. @ericcolson

Bethany - Managing Director, December 21, 2018