“What problem are you solving?”

This is a question that I have heard a few times in the past when talking about my research activities with colleagues from other teams. Depending on the situation or on who was asking, the question could take other forms such as:

• Did the client ask for it?
• Is it really addressing a pain point for the client?
• What is the benefit of this exercise?
• Why do you focus on this instead of X? (“X” being one of the numerous services which could be improved – since nothing can ever be 100% efficient).

If you are a researcher and have already faced this kind of situation, this article might help you figure out the answer. Some of the questions above are closely tied to the notion of Return-on-Investment, or ROI. If you solve a problem expressed by the client, then the return is assured. If not, then your research activity has an uncertain ROI.

But having an uncertain ROI is not necessarily a bad thing. Put differently, not being able to calculate a precise ROI does not automatically imply a loss of value (negative ROI).

The first part of this article will consider research activities from a general perspective, while the second part will focus on R&D from a data science perspective.


We can consider the ROI challenge from both the Investment and the Return sides.

The Investment (“I”) side: Should you spend your time (and money) on an activity whose result is uncertain? The answer is… uncertain itself!

First, as a rule, it is usually a good idea to balance risk and not put all your eggs in one basket. To the extent possible, try to spread your time and budget across several projects with various degrees of risk.

Second, sometimes you will be doing research with a precise result in mind. Other times you will be pursuing an analysis or experiment just because it is interesting or because the technology seems very promising. The latter is not always a bad thing – things may work out in ways which simply could not have been anticipated at the outset of the project.

There are many success stories which started as something different – three examples below:

• Twitter started as a platform to exchange SMS messages with a small group of people
• Facebook started as Facemash at Harvard, a sort of “Hot or Not” online game
• The World Wide Web originated from a project to facilitate the sharing of information among researchers at the CERN laboratory in Geneva

The Return (“R”) side: If you do not solve a real client problem, you will fail.

First, this is probably true as a general rule, but exceptions do exist. To come back to the Twitter example, it was virtually impossible to pinpoint which problem Twitter solved. Biz Stone, one of the Twitter co-founders, would counter the challenge by asking “what real problem is ice-cream solving?”. Twitter may not solve a well-formulated human problem, but that does not prevent hundreds of millions of people from using it every day.

Second, client needs come in various forms. Some innovations will be client-driven (i.e. clients explicitly asked for “it”) and some innovations may answer latent needs that clients did not know they had. As Steve Jobs famously remarked, sometimes “people don’t know what they want until you show it to them.” Twitter would fall into this category.

Beyond ROI, let us consider the human aspect.

When comparing research and development activities, pure development is easier to understand: it feels somewhat safe. When in development mode, things are fairly certain: we know precisely what is being built and we know when it will be ready. This is something everyone can work with, and your colleagues can start planning all the associated activities (marketing, sales, deployment, customer service, etc.).

Research, by contrast, is uncertain.

What is being worked on? Why? Can we see it? When will it be ready? You will not be able to satisfactorily answer these questions until the project is sufficiently mature. Secrecy is not a great option either though. Thus, managing expectations often generates a certain amount of frustration.


In the field of data science, research is necessary for a variety of reasons, including:

Keeping pace with technology

Software is evolving faster than ever on all fronts of the data science domain – be it storage and access, analysis and algorithms or data visualization. Hadoop was first released in 2006, which was the same year that deep learning kicked off. In the last 10 years, many powerful tools have appeared (Cassandra, MongoDB, D3, AngularJS, etc.). The potential for efficiency gains is substantial, but it takes constant experimentation to bring these technologies into your business environment. Good news: costs are low. The advent of open-source software has created larger and ever more passionate developer communities. As a result, some of the best software available today is free. (Go ahead and try it out!)

Creating new revenue streams

Data sets will keep on expanding. Bigger data – both in volume and variety – leads to more opportunities for inventing new products and services. Entire lines of business may emerge as a result of solving data needs. The best example is Amazon Web Services, which started 15 years ago and has grown to a projected $13 bn in revenue this year. In many industries, it has become the norm to “monetize” data and analytics expertise.

Improving user experience

UX is a major differentiator in the digital world. Data science helps us understand what is needed, when and why, through A/B testing, consumer journey mapping, speech or text processing, segmentation and other techniques. Getting the UX right means taking the friction out of the human-product interaction, and it has the potential to bring in the “Wow” factor for consumers.

Predicting outcomes

The ability to foresee results under different “what-if” scenarios is increasingly critical to all aspects of business – from finance to operations, legal, marketing or communication.

All of the above applications require heavy experimentation, and there are very few data science activities whose result is perfectly determined beforehand. Often a data scientist may work on many little “silly” exercises before tackling one large-scale challenge. Taken in isolation, none of the toy exercises is really worth it. However, put them all together and it is easy to see that the probability of finding something great increases considerably.
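
To put rough numbers on this intuition, here is a minimal sketch in Python. It rests on two assumptions that are mine, not measurements: each small exercise has the same chance of leading to "something great" (the 5% figure is purely illustrative), and the exercises succeed or fail independently of one another.

```python
def prob_at_least_one_success(p: float, n: int) -> float:
    """Chance that at least one of n independent attempts succeeds.

    p is the (assumed) success probability of a single attempt.
    All n attempts fail with probability (1 - p) ** n, so the
    complement is the chance that at least one of them works.
    """
    return 1.0 - (1.0 - p) ** n


# Hypothetical figure: each toy exercise has a 5% chance of paying off.
ASSUMED_SUCCESS_RATE = 0.05

for n_exercises in (1, 5, 10, 20, 40):
    chance = prob_at_least_one_success(ASSUMED_SUCCESS_RATE, n_exercises)
    print(f"{n_exercises:>2} exercises -> {chance:.0%} chance of at least one success")
```

Under these assumptions, a single exercise is a long shot, while a few dozen of them together make at least one success more likely than not. Real projects are rarely independent or equally promising, so the output should be read as an illustration of the trend rather than a forecast.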


Let’s go back to the initial question: “What problem are you solving?” Beyond the project at hand, this question may indicate insufficient communication about the need for research in general.

First, whether you are an individual contributor or a team leader, it will help if you make a case for research very early on by explaining the probabilistic nature of success. For instance: “We will try 10 different things with the expectation that 3 will work out and that 1 will very likely pay off”.
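
One way to back up such a statement is a small binomial calculation. The sketch below assumes that each of the 10 attempts works out independently with a 30% chance, which is simply one reading of "3 out of 10" and not a figure taken from real projects. It covers only the "work out" part of the statement; whether an attempt that works out also pays off commercially is not modelled.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent attempts."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


# Assumed inputs, chosen to mirror the "10 things, 3 work out" example.
N_ATTEMPTS = 10
ASSUMED_SUCCESS_RATE = 0.3

# Mean of a binomial distribution: n * p.
expected_successes = N_ATTEMPTS * ASSUMED_SUCCESS_RATE

# At least one success is the complement of zero successes.
p_at_least_one = 1.0 - binomial_pmf(0, N_ATTEMPTS, ASSUMED_SUCCESS_RATE)

# At least three successes: sum the probabilities for k = 3..10.
p_at_least_three = sum(
    binomial_pmf(k, N_ATTEMPTS, ASSUMED_SUCCESS_RATE)
    for k in range(3, N_ATTEMPTS + 1)
)

print(f"Expected number of successes: {expected_successes:.1f}")
print(f"Chance that at least 1 works out: {p_at_least_one:.1%}")
print(f"Chance that at least 3 work out:  {p_at_least_three:.1%}")
```

With these assumed inputs, at least one success is close to certain, while reaching the "expected" three is far from guaranteed. That distinction is worth spelling out when setting expectations with colleagues.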

Second, it will also help if you convey the big picture, beyond the scope of the project discussed. For instance: “We will try to understand the time of purchase, but the bigger theme here is Know Your Customer: a key ingredient to Marketing”.

Finally, great ideas can emerge not only from your existing customers but also from other industries, academic research, conferences or books. As long as you fail reasonably fast, there is little harm in experimenting!