5 Lessons To Help You Avoid AIOps Pitfalls


Increasing demands on IT operations have pushed IT Ops leaders to focus on AIOps. To realize its full potential, IT Ops teams must learn these five lessons and avoid the associated pitfalls on their AIOps journey, says Yoram Pollack, director of product marketing at BigPanda.

AIOps is still relatively new compared to established technologies such as enterprise data warehouses, and many early AIOps projects ran into problems whose aftereffects are still felt today. That’s why, for some IT Ops teams and leaders, the prospect of transforming their IT operations with AIOps is a cause for concern. But not every AIOps project struggles: when structured and executed properly, many succeed exceedingly well.

At the same time, AIOps has matured to the point where a critical mass of enterprises, including some of the largest companies in the world, have successfully deployed it and learned valuable lessons along the way.

5 Lessons To Deploy AIOps Projects

Here are the five main lessons to help you avoid common pitfalls as you set out on your own AIOps journey.

1. Set your expectations straight

Yes, we all know the saying, “Aim for the moon, and even if you miss you’ll land among the stars.” Unfortunately, it doesn’t apply to AIOps adoption. As Gartner stated in its recent Market Guide for AIOps Platforms, enterprises should “prioritize practical outcomes over aspirational goals by adopting an incremental approach…” when deploying AIOps platforms.

Biting off more than you can chew can delay your AIOps project, often by months or even years. Start small: focus on where it hurts most in your IT operations ecosystem, or on what causes the most delays in your incident management lifecycle. Integrate one tool at a time and test one AIOps capability at a time. Once you are satisfied with the results, incrementally add more tools to the AIOps platform and test more capabilities. Besides ensuring that your AIOps platform has proven itself before you fully rely on it, this step-by-step approach gives your team the chance to build the skills and confidence they need over time.

Also remember that people tend to assume AI behaves in a human-like manner, so it is often anthropomorphized and credited with unrealistic “superhuman” capabilities. In reality, AI in IT operations is algorithmic: it relies on alert ingestion, normalization and enrichment (or tagging) before correlation patterns can be generated, tested, and refined.


2. Make sure you can integrate with all your existing tools

You’ve probably invested significant resources, plus years of development and customization, in your monitoring, change, topology, collaboration and remediation tools. These tools are most likely tightly integrated into your IT Ops workflows and processes, and they reflect your organization’s knowledge and experience in providing services to your customers.

So it’s imperative that your chosen AIOps platform can integrate with these tools and ingest their data. Otherwise, vital information and key capabilities the AI needs to work properly will be missing. On top of that, a long and painful rip-and-replace effort can easily derail a project through the sheer amount of work involved and a long time to value.
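One common way to integrate many existing tools is a small adapter per tool that converts its native payload into one shared alert shape. The sketch below illustrates that idea under stated assumptions: `tool_a`, `tool_b`, their payload fields and the `Alert` schema are all hypothetical and do not describe any real monitoring product or the BigPanda API.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Alert:
    """Common alert shape shared by every integration (a hypothetical schema)."""
    source: str
    host: str
    check: str
    severity: str
    timestamp: float


def from_tool_a(payload: dict) -> Alert:
    # Hypothetical tool A: flat payload, textual state, timestamp as a string.
    return Alert(
        source="tool_a",
        host=payload["hostname"],
        check=payload["check_name"],
        severity=payload["state"].lower(),
        timestamp=float(payload["ts"]),
    )


def from_tool_b(payload: dict) -> Alert:
    # Hypothetical tool B: nested resource, numeric priority, epoch seconds.
    severity = {1: "critical", 2: "warning", 3: "info"}[payload["priority"]]
    return Alert(
        source="tool_b",
        host=payload["resource"]["host"],
        check=payload["metric"],
        severity=severity,
        timestamp=float(payload["epoch_seconds"]),
    )


# One adapter per integrated tool; adding a tool means adding an entry here.
ADAPTERS: Dict[str, Callable[[dict], Alert]] = {
    "tool_a": from_tool_a,
    "tool_b": from_tool_b,
}


def ingest(source: str, payload: dict) -> Alert:
    """Convert a raw payload from a known source into the common Alert shape."""
    adapter = ADAPTERS.get(source)
    if adapter is None:
        # Fail loudly: a tool without an integration is data the AI never sees.
        raise ValueError(f"no integration for source {source!r}")
    return adapter(payload)
```

For example, a `CRITICAL` state from tool A and priority `1` from tool B both arrive as `severity="critical"`, so downstream logic can treat them alike. Raising an error for an unknown source, rather than silently dropping the payload, reflects the point above: a tool that is not integrated is information the AI is missing.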

3. You need to be able to adequately prepare and cleanse your data

“Garbage in, garbage out” is a well-known maxim in IT, and it applies to IT operations as well. As just mentioned, it’s critical to ingest all the alerts from all your tools, but that alone is not enough. The quality of the ingested alerts also has a major impact on the success of AIOps solutions.

Alert quality is a combination of several attributes: actionability, clarity, and the presence or absence of contextual information. The key to high-quality alert data is cleansing and preparation through normalization, enrichment and tagging – which add the needed clarity and context.
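As a rough illustration of cleansing and preparation, the sketch below normalizes a check name onto a common taxonomy, enriches the alert with context tags from an inventory lookup, and records which context is still missing. The alias table, the `INVENTORY` dictionary (standing in for a CMDB or inventory system) and the choice of required context fields are all hypothetical assumptions made for the example.

```python
from typing import Dict

# Hypothetical common taxonomy: tool-specific check names -> one canonical name.
CHECK_ALIASES: Dict[str, str] = {
    "cpu_load": "cpu.high",
    "CPU Utilization": "cpu.high",
    "disk_free": "disk.low",
}

# Hypothetical inventory records, standing in for a CMDB or inventory system.
INVENTORY: Dict[str, Dict[str, str]] = {
    "web-01": {"service": "checkout", "team": "payments", "env": "prod"},
}

# Context an operator needs for the alert to be actionable (an assumption).
REQUIRED_CONTEXT = ("service", "team", "env")


def prepare(raw: Dict[str, str]) -> dict:
    """Normalize, enrich and quality-check one raw alert with 'host' and 'check' keys."""
    alert = dict(raw)
    # Normalization: map the check name onto the common taxonomy when known.
    alert["check"] = CHECK_ALIASES.get(raw["check"], raw["check"])
    # Enrichment/tagging: copy context tags from the inventory record, if any.
    tags = dict(INVENTORY.get(raw["host"], {}))
    alert["tags"] = tags
    # Quality check: list the context fields that are still missing.
    alert["missing_context"] = [k for k in REQUIRED_CONTEXT if k not in tags]
    return alert
```

An alert from `web-01` named `CPU Utilization` comes out as `cpu.high` with service, team and environment attached, while an alert from an unknown host keeps its normalized name but is flagged as lacking all context, which is exactly the kind of alert that correlates poorly.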

The importance of a common alert taxonomy (aka normalization) and of context in these alerts becomes clear when you look at event correlation. AIOps tools use AI/ML to detect correlation patterns in the hundreds of thousands of alerts they ingest and condense them into a small number of high-quality, actionable incidents. This is very difficult to achieve without a common alert naming scheme, or with context-less data; the result is weak correlation and limited, low-quality incidents.
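AIOps platforms learn correlation patterns with ML; the sketch below deliberately swaps in a fixed, hand-written rule (same `service` tag, arriving within a time window) just to show why normalized, tagged data matters. The `Alert` and `Incident` types, the `service` tag and the 600-second default window are hypothetical choices, not any product’s behavior.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Alert:
    service: str  # context tag added during enrichment ("" if missing)
    check: str    # normalized check name
    ts: float     # seconds since epoch


@dataclass
class Incident:
    service: str
    alerts: List[Alert] = field(default_factory=list)


def correlate(alerts: List[Alert], window_s: float = 600.0) -> List[Incident]:
    """Group alerts that share a service tag and arrive within window_s seconds
    of the previous alert in the same group (a fixed rule, not ML)."""
    latest: Dict[str, Incident] = {}  # most recent incident per service
    incidents: List[Incident] = []
    for alert in sorted(alerts, key=lambda a: a.ts):
        incident = latest.get(alert.service)
        if incident is None or alert.ts - incident.alerts[-1].ts > window_s:
            incident = Incident(alert.service)
            incidents.append(incident)
            latest[alert.service] = incident
        incident.alerts.append(alert)
    return incidents
```

Five alerts (four on `checkout`, one on `search`, with the last `checkout` alert arriving over an hour later) collapse into three incidents. The same rule also shows the failure mode described above: alerts with no `service` tag all share the empty key, so unrelated alerts that happen to arrive close together get lumped into one low-quality incident.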

Similarly, successful root cause analysis in modern IT environments relies on detecting the dependencies between infrastructure and application components. This information is either buried in the incoming data and needs to be extracted, or it lives in external data sources and needs to be added to the alerts. These external sources include inventory management systems, orchestration tools, APM service or flow maps, CMDBs and more. To match incidents to the IT changes that may be causing them, you also need to add information from other tools, such as CI/CD and change management systems.

That’s why your AIOps platform must provide built-in normalization, enrichment and tagging capabilities that add all this much-needed context at scale and can process millions of IT alerts every day.
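To make the root cause and change-matching ideas concrete, here is a deliberately simplified sketch. It assumes a hypothetical dependency map (the kind of data a CMDB or APM service map might supply) and treats an alerting component as a root-cause candidate only if none of its direct or transitive dependencies is also alerting; it then looks for changes to those candidates shortly before the incident. Component names, change IDs and the one-hour lookback are illustrative assumptions, not a real product’s algorithm.

```python
from typing import Dict, List, Set

# Hypothetical dependency map, e.g. exported from a CMDB or an APM service map:
# each component lists the components it depends on.
DEPENDS_ON: Dict[str, List[str]] = {
    "checkout-api": ["payments-svc", "cart-db"],
    "payments-svc": ["payments-db"],
    "cart-db": [],
    "payments-db": [],
}


def _depends_on_alerting(component: str, alerting: Set[str], seen: Set[str]) -> bool:
    """True if any direct or transitive dependency of component is alerting."""
    for dep in DEPENDS_ON.get(component, []):
        if dep in seen:
            continue
        seen.add(dep)
        if dep in alerting or _depends_on_alerting(dep, alerting, seen):
            return True
    return False


def root_cause_candidates(alerting: Set[str]) -> Set[str]:
    """Alerting components with no alerting dependency; under this simplified
    model, every other alerting component is treated as a symptom."""
    return {c for c in alerting if not _depends_on_alerting(c, alerting, set())}


def matching_changes(candidates: Set[str], incident_start: float,
                     changes: List[dict], lookback_s: float = 3600.0) -> List[dict]:
    """Changes to a candidate component made within lookback_s before the incident."""
    return [
        ch for ch in changes
        if ch["component"] in candidates
        and 0 <= incident_start - ch["ts"] <= lookback_s
    ]
```

If `checkout-api`, `payments-svc` and `payments-db` all alert, only `payments-db` remains a candidate, and a change to `payments-db` made shortly before the incident is surfaced while older changes and changes to unrelated components are not. Without the dependency map and change records added as context, neither step is possible.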


4. Your AI/ML needs to be explainable

Good data going into your AIOps platform will get you good results, and successfully leveraging your existing tribal knowledge to train and configure the AI will definitely benefit you. But you also need to be able to see, understand and edit the correlation logic as the AI/ML trains itself. Unfortunately, some solutions still obscure this logic and do not provide adequate control and testability, which is one of the most common causes of AIOps failure.

Google’s spam filters are a good analogy. Google provides a very sophisticated baseline configuration for detecting spam, but it still lets you mark a message as spam yourself or remove the spam label from a wrongly flagged email. It explains its decision and then learns from your intervention going forward.

The same is true for AI/ML in IT Ops. Your teams have to trust the results your AIOps tool produces, and that trust comes from explainability. They need to understand why the AI correlated certain alerts, and they must be able to accept or change the correlation pattern so it produces the desired result. Remember, you can have the best AI in the world, but if your teams don’t understand why it groups certain alerts together (and why it doesn’t group others), they will always be suspicious of the results, even when they are correct, and will eventually stop using the ML.
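A minimal sketch of what “explainable and editable” can mean in practice: a correlation rule that returns a plain-language reason with every decision, and whose parameters an operator can adjust. The `CorrelationRule` class, its `tag` and `window_s` fields and the alert dictionaries are hypothetical; real AIOps platforms learn richer patterns with ML, but the principle of showing the reason and letting people change the logic is the same.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class CorrelationRule:
    """A hypothetical, operator-editable correlation rule that explains itself."""
    tag: str = "service"     # alerts must share this tag
    window_s: float = 600.0  # and arrive at most this many seconds apart

    def explain(self, a: dict, b: dict) -> Tuple[bool, str]:
        """Return (grouped?, plain-language reason) for a pair of alerts."""
        value_a, value_b = a.get(self.tag), b.get(self.tag)
        if value_a is None or value_b is None:
            return False, f"missing {self.tag} tag, nothing to compare"
        if value_a != value_b:
            return False, f"different {self.tag}: {value_a!r} vs {value_b!r}"
        gap = abs(a["ts"] - b["ts"])
        if gap > self.window_s:
            return False, f"same {self.tag} but {gap:.0f}s apart (window is {self.window_s:.0f}s)"
        return True, f"same {self.tag} {value_a!r} within {gap:.0f}s (window is {self.window_s:.0f}s)"
```

With the default rule, two `checkout` alerts 900 seconds apart are kept separate with the reason `same service but 900s apart (window is 600s)`. An operator who knows these failures unfold slowly can set `window_s = 1200`, after which the same pair is grouped and the reason says why, much like correcting a spam filter and seeing it adapt.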

5. Your AIOps needs to be democratized

No two enterprises are exactly the same. Some utilize centralized IT Ops and NOC teams, and others use distributed DevOps and SRE teams. Some have huge and complex IT organizations, while others can afford to be small and nimble. Some were born and bred in the cloud, while others are just starting their digital transformation. And in each of these enterprises, many important stakeholders can, and need to, benefit from AIOps, from hands-on practitioners all the way up to heads of business units and CIOs.

Conclusion 

AIOps platforms must be accessible and present their data, views and dashboards to every persona in your organization, no matter which type of enterprise you belong to. Additionally, the platform cannot rely on data scientists, its configuration cannot depend on third-party consultants and product experts, and its administrative overhead must be minimal. Only then can you realize the full potential of your AIOps investment.
