AI: Stop Kidding Yourself, Focus On Data First
The internet is littered with articles highlighting the colossal failure rates of AI and Big Data initiatives across industries. The numbers range between 60% and 87% — that, unfortunately, is the percentage of projects that fail, despite CIOs and CDOs channeling gobs of money into all things AI-related. It’s disturbing just how bad the success rates are for AI, Data Science, Analytics, and Big Data projects. Worse, the numbers from multiple annual studies aren’t changing much. VentureBeat AI reported that 87% of data science projects never make it into production; per a NewVantage survey, 77% of businesses reported that adoption of Big Data and AI initiatives continues to represent a major challenge, meaning the majority of the software being built is going nowhere; per Gartner, 80% of analytics insights will not deliver business outcomes through 2022; another Gartner report stated that 60% of Big Data projects fail to move past preliminary stages, a number that was later corrected to 85%.
In the majority of cases, unfortunately, companies have ‘technology solutions in search of a problem’, when common sense dictates the exact opposite. In fact, surveys have repeatedly shown that technology is the least of the problems. Throwing big money at the problem or repeatedly changing leadership, often by bringing in seasoned veterans, doesn’t seem to lead to success either. What, then, is the root cause behind these rather expensive failures?
We must start by asking a simple question: “What the heck does it even mean to use AI?” I have increasingly stopped using the term AI, replacing it with the (for me) more palatable ‘Advanced Analytics’. I feel less guilty using that phrase, yet it communicates the message most want to hear. Drawn from my own experience dabbling in Data Science over the past several years, Advanced Analytics comprises four essential ingredients: Problem Understanding, Data Management, Modeling & Analytics, and Solution Deployment. Lose one, and the entire pie falls apart. Let us compare the pie chart above with the art of baking an actual pie. Forgetting to add sugar, adding too much water to the flour, over-baking or under-baking: all are recipes for disaster. One misstep, and the entire pie becomes unpalatable. The same is true for the Advanced Analytics pie.
A large number of AI initiatives simply miss the age-old tenet of applying the right solution to the right problem. Sometimes people think all they need to do is throw money at a problem or put a technology in, and success will come out the other end; it just doesn’t happen that way. Although it represents only 10% of the pie chart, there is absolutely no substitute for domain expertise, that is, a clear understanding of the problem to be solved. Domain experts may well be able to solve complex problems with very simple data aggregation and charting, or with basic statistics, rather than employing complex AI algorithms. What are touted as “AI solutions” might, in fact, often be simple linear regressions or plain two-dimensional scatter plots. But knowing which variables to plot against each other, or which variables to use to predict the desired outcome, is the forte of domain experts. Organizations undergoing AI transformations often burden their domain experts with the task of training newly hired AI/ML experts, something that causes more harm than good in most instances, especially when the end solution turns out to use basic statistics or BI dashboards. The result: domain experts feel they could have done it themselves, given enough time and resources; the AI experts feel underwhelmed, not having used their potential to the fullest.
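To make the point concrete, here is a minimal sketch of a “domain expert’s AI solution”: an ordinary least-squares line fit, written with nothing but the standard library. The scenario and numbers are invented for illustration — a hypothetical expert suspects furnace temperature drives defect rate, and a two-line fit answers the question without any complex algorithm.

```python
from statistics import mean

# Invented, illustrative data: furnace temperature (°C) vs. defect rate (%).
temperature = [180, 190, 200, 210, 220, 230, 240]
defect_rate = [2.1, 2.4, 3.0, 3.3, 3.9, 4.2, 4.8]

# Ordinary least-squares slope and intercept, computed directly.
tx, ty = mean(temperature), mean(defect_rate)
slope = sum((x - tx) * (y - ty) for x, y in zip(temperature, defect_rate)) \
        / sum((x - tx) ** 2 for x in temperature)
intercept = ty - slope * tx

# A positive slope confirms, and quantifies, the expert's hunch.
print(f"defect_rate ≈ {slope:.3f} * temperature + {intercept:.2f}")
```

Knowing that these two columns are the ones worth fitting is exactly the domain knowledge no algorithm supplies.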
The other 10% of the Advanced Analytics pie — Solution Deployment, a.k.a. DevOps — is equally important. Imagine employing a complex ML algorithm like XGBoost to identify the key variables responsible for an undesired outcome. An AI expert might understand exactly how to implement the algorithm and tune it to produce accurate results, but more than 95% of the organization will only ever want to use the technique as a black box. We also know that execs like to see the ‘what’, not the ‘how’, of problem-solving (the execs, admittedly, have other business issues to deal with). DevOps is the only way to democratize complex algorithms across the organization. A high-quality web portal or a PC-based thick client might be the most effective way to shield most of the organization from the underlying complexity of the algorithms. One must simply ensure that the solution does not depend on a single AI expert on the team; there should be more than one individual who can demystify the black box.
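The black-box idea can be sketched in a few lines: hide the model behind a trivial HTTP endpoint so consumers only ever see “features in, score out.” This is an assumption-laden toy, not a production pattern — the `predict` function below is a placeholder standing in for a real trained model (e.g. an XGBoost booster loaded from disk), and the route and payload shape are invented.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(features):
    """Placeholder for a trained model; real code would load one at startup."""
    return {"score": sum(features) / len(features)}

class PredictHandler(BaseHTTPRequestHandler):
    """Exposes the model as a black box: POST a feature list, get a score."""
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        features = json.loads(self.rfile.read(length))
        body = json.dumps(predict(features)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass

# To serve standalone (blocks forever):
#   HTTPServer(("127.0.0.1", 8000), PredictHandler).serve_forever()
```

The portal or thick client then talks only to this endpoint; nobody outside the team needs to know what sits behind it.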
I am going to skip talking about the actual analytics part — the 30% slice — simply because that is what comes to mind for most people when they hear the word AI. It’s the most glamorous piece, the one that gets more press than everything else on the pie chart. It’s the focus of tens if not hundreds of online courses, college degrees, and bootcamps. And it is often considered the elixir that will multiply organizational productivity. The only point I will make is that AI Modeling, Training, and Analytics represents just 30% of the pie chart.
Which leaves us with the biggest slice of them all, almost 50%, and often the least exciting facet of Advanced Analytics: Data Management. Data is what makes all of these analytics and capabilities possible, yet most organizations, unfortunately, pay it the least attention. Most organizations are highly siloed, with data owners who simply do not collaborate and leaders who do not facilitate cross-domain communication. I’ve had data scientists tell me they could have done a project, but they couldn’t get access to the data. I find myself unblocking my team’s data-access issues more than I spend time and effort getting them trained on analytics methods. The problem with data is that it lives in different formats — structured and unstructured, video files, text, and images — and is kept in different places with different security and privacy requirements. Inconsistent and hard-to-access data causes projects to slow to a crawl right at the start, because the data must first be collected and cleaned. Most organizations lose patience without quick, early wins. And this, in my view, is the #1 reason a majority of AI initiatives fail.
Just as a child cannot write well-formed sentences until she has learned the alphabet and added words to her vocabulary, Advanced Analytics cannot succeed without first fully understanding the underlying data. One must always start every Analytics project by building the ability to access, extract, cleanse, manipulate, and comprehend all available data. Developing this ability is the all-important first success one must have before embarking on the rest of the Analytics journey. “Success (however small) breeds success” is the mantra I often profess to my teams.
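What “cleanse” means in practice is often mundane: the same fact arrives in different shapes from different siloed systems. A tiny hypothetical example, using invented date strings and formats, shows the kind of normalization that consumes so much early project time.

```python
import datetime

# Hypothetical: the same date arrives in three formats from three silos.
raw_dates = ["2021-03-01", "01/03/2021", "1 Mar 2021"]
KNOWN_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%d %b %Y"]

def cleanse(value):
    """Normalize inconsistent source formats to one canonical ISO form."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {value!r}")

print([cleanse(d) for d in raw_dates])  # three copies of '2021-03-01'
```

Trivial as it looks, multiplied across hundreds of fields and sources this is where the 50% slice of effort goes.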
If there is only one place an organization can invest — capital, infrastructure, headcount, and training — it must be Data Management. Organizations must focus on data even if they never adopt AI. By Data Management, I mean all three aspects: (1) Data Ingestion — the process of moving data from its source into some form of storage; (2) Data Storage; and (3) Data Consumption — the process of moving data from storage to the end algorithms and DevOps. Once the data has been cleansed and made readily available, it can, at the very least, be easily extracted and plotted. You would be surprised how many business problems can be solved with nothing more than easy access to data. Simple BI dashboards and Excel charts would suddenly start delivering incredible value. And of course, stitching in that 30% Analytics slice to create automated predictive and prescriptive AI solutions would become far easier.
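The three aspects above can be sketched end to end in a few lines of standard-library Python. Everything here is a toy stand-in: the CSV string plays the role of a source system, an in-memory SQLite table plays the role of the storage layer, and a SQL query plays the role of consumption by downstream analytics.

```python
import csv
import io
import sqlite3

# A toy source: note the missing value in the second row.
raw_csv = "sensor,reading\nA,1.5\nB,\nA,2.5\n"

# (1) Ingestion: move data from its source into storage.
rows = list(csv.DictReader(io.StringIO(raw_csv)))

# (2) Storage: a relational store; real systems might use a warehouse or lake.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE readings (sensor TEXT, reading REAL)")
clean = [(r["sensor"], float(r["reading"])) for r in rows if r["reading"]]
db.executemany("INSERT INTO readings VALUES (?, ?)", clean)

# (3) Consumption: downstream analytics pull from storage, not raw files.
avg = db.execute(
    "SELECT sensor, AVG(reading) FROM readings GROUP BY sensor"
).fetchall()
print(avg)  # [('A', 2.0)] -- the incomplete 'B' row was dropped at ingestion
```

Once data reliably flows through these three stages, the “easily extracted and plotted” payoff follows almost for free.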
“Data is the new oil” has become a popular idiom lately, often invoked to encourage the adoption of AI methods. The argument should rather be to treat data as a valuable asset first. Massage it, pamper it, store it in beautiful barrels. Splurge on it. AI analytics can wait.