
With various fast-growing technological innovations promising to enable digital transformation across enterprises, IT and business leaders are beginning to consider artificial intelligence (AI) as an attractive solution to many complex issues in operations and infrastructure. How can AI transform IT operations, and what are the key success factors to maximize your return on investment?
Context
In a 2019 survey in which business leaders ranked their top three IT priorities, 82% named managing a digital transformation initiative or program as one of them[1]. As leaders strive to modernize their businesses by adopting new technologies and processes, demand for support from IT operations has increased, along with the complexity of IT issues.
To capture the full value of their IT investments, IT leaders are now seeking new solutions to handle the growing workload, improve efficiency, and cut costs. Many are considering AI for its power to transform IT and business operations, but they also need to be prepared for the key challenges of implementation.
How AI can transform IT operations
IT operations teams face increasingly complex issues while demand from the business and end users keeps growing. AI can improve application monitoring by centralizing data and producing more accurate predictions and causal analysis. It can also streamline the service desk ticketing process, freeing the service desk team to work on more complex issues.
As a company adopts new technologies to meet new business needs and upgrades its existing tools, the number of applications to monitor grows. Adding to the complexity are IT investment decisions made across business functions without the involvement of the IT organization: in 2018, 24% of global services leaders (managers and above) reported making purchase decisions without including IT[2].
Currently, most applications are paired with their own application performance monitoring (APM) or digital experience monitoring (DEM) tools, so data is collected in silos. Infrastructure and operations leaders therefore face the challenge of efficiently monitoring all applications across the organization and building a comprehensive view of overall performance for incident prediction, causality identification, and decision-making.
One solution is an artificial intelligence for IT operations (AIOps) platform, which brings the data processing power needed to meet this challenge.
Figure 1 – AIOps enables event correlation across applications and business-IT domains
Without replacing any existing APM tools, AIOps platforms allow all monitoring logs to be centralized and processed using natural language processing (NLP), so that data of different kinds can be aggregated and analyzed together. Instead of examining applications one at a time, AIOps enables event correlation across applications and business-IT domains, and machine learning (ML) models can then be applied to detect anomalies and identify causality within the IT ecosystem.
Infrastructure and operations leaders can also leverage AI for more accurate incident prediction and alerting. Traditional monitoring tools rely on models that are either hardcoded or predetermined by vendors, so they are built around common patterns that humans can easily discover. With machine learning algorithms, models are built directly from data and can capture new patterns that the algorithms discover on their own. As APM tools collect more input data, these models continue to evolve, which makes them more precise and up to date than preprogrammed models. Data scientists should periodically validate the models to ensure they remain interpretable, and help improve predictions by identifying new correlations or correcting inaccurate results.
Identifying the root cause is key to incident response and future prediction, and AI can help speed up the process. As mentioned above, AIOps platforms aggregate data from different departments or domains and surface event correlations that may not be obvious when an investigation focuses on a single application. Beyond pattern recognition, AI can also help uncover cause-and-effect relationships in application performance data. Once the root cause has been identified, IT operations can respond to incidents faster and with more targeted resolutions. Understanding the cause also helps prevent similar incidents in the future.
When an issue occurs with an IT product or service, business users escalate the incident to the service desk. Without AI, each stage of ticket management requires human involvement to understand the context and then act based on individual experience or expertise. Given the volume of requests and the limited capacity of the IT support team, closing a single ticket can take days, if not weeks.
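The aggregation and correlation step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular AIOps product works: the raw field names (`app`, `service`, `time`, `msg`, `message`), the shared `Event` schema, and the fixed two-minute time window are all hypothetical. Real platforms combine richer signals such as topology, NLP-based text similarity, and learned models.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    source: str          # monitoring tool that emitted the record (APM, DEM, infrastructure, ...)
    service: str         # application or business domain affected
    timestamp: datetime
    message: str


def normalize(raw: dict, source: str) -> Event:
    """Map a tool-specific record onto one shared schema (field names are hypothetical)."""
    return Event(
        source=source,
        service=raw.get("app") or raw.get("service", "unknown"),
        timestamp=datetime.fromisoformat(raw["time"]),
        message=raw.get("msg") or raw.get("message", ""),
    )


def correlate(events: list[Event], window: timedelta) -> list[list[Event]]:
    """Chain events into a group while each arrives within `window` of the previous one,
    then keep only groups that involve more than one service."""
    groups: list[list[Event]] = []
    for event in sorted(events, key=lambda e: e.timestamp):
        if groups and event.timestamp - groups[-1][-1].timestamp <= window:
            groups[-1].append(event)
        else:
            groups.append([event])
    return [g for g in groups if len({e.service for e in g}) > 1]


# Records from three siloed tools, pooled into one stream
apm = [{"app": "checkout", "time": "2023-05-01T10:00:05", "msg": "p95 latency spike"}]
dem = [{"service": "web-frontend", "time": "2023-05-01T10:00:30", "message": "slow page loads"}]
infra = [
    {"service": "db-cluster", "time": "2023-05-01T09:59:50", "message": "CPU saturation"},
    {"service": "hr-portal", "time": "2023-05-01T14:00:00", "message": "disk 80% full"},
]
events = (
    [normalize(r, "apm") for r in apm]
    + [normalize(r, "dem") for r in dem]
    + [normalize(r, "infra") for r in infra]
)
incidents = correlate(events, window=timedelta(minutes=2))
for group in incidents:
    print([f"{e.service}: {e.message}" for e in group])
```

Here the database, checkout, and front-end events fall into one cross-application group, while the isolated HR-portal event is dropped; no single tool's view would have linked them.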
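To make the contrast between preset and data-driven models concrete, the sketch below compares a fixed, vendor-style threshold with a baseline learned from recent observations. A rolling mean and standard deviation (z-score) stands in here for the ML models described above; the 500 ms threshold, the five-point window, and the latency values are hypothetical.

```python
from statistics import mean, stdev

STATIC_THRESHOLD_MS = 500  # hypothetical vendor-preset alert threshold


def static_alerts(latencies: list[float]) -> list[int]:
    """Indices where latency exceeds the fixed threshold."""
    return [i for i, x in enumerate(latencies) if x > STATIC_THRESHOLD_MS]


def adaptive_alerts(latencies: list[float], window: int = 5, z_limit: float = 3.0) -> list[int]:
    """Indices that deviate strongly from a baseline learned from the previous `window` points."""
    alerts = []
    for i in range(window, len(latencies)):
        history = latencies[i - window:i]
        mu, sigma = mean(history), stdev(history)
        # Skip perfectly flat histories (sigma == 0) to avoid dividing by zero
        if sigma > 0 and abs(latencies[i] - mu) / sigma > z_limit:
            alerts.append(i)
    return alerts


# A service that normally answers in about 100 ms suddenly takes 240 ms
latencies = [100, 102, 98, 101, 99, 100, 240, 101, 99, 100]
print(static_alerts(latencies))    # → []
print(adaptive_alerts(latencies))  # → [6]
```

The fixed threshold misses the spike because 240 ms is "normal" by the vendor's generic standard, while the learned baseline flags it as abnormal for this particular service.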
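One simple way to turn correlated alerts into a root-cause hypothesis is to walk a service dependency map. The sketch below is a hypothetical heuristic, not a causal-inference method: it assumes the dependency graph is known, and it flags alerting services that have no alerting dependencies, treating their dependents' alerts as symptoms. The service names are made up.

```python
def downstream(depends_on: dict[str, list[str]], service: str) -> set[str]:
    """All services that `service` relies on, directly or transitively."""
    seen: set[str] = set()
    stack = list(depends_on.get(service, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(depends_on.get(node, []))
    return seen


def root_cause_candidates(depends_on: dict[str, list[str]], alerting: set[str]) -> set[str]:
    """Alerting services none of whose dependencies are alerting.
    (Dependency cycles are not handled specially in this sketch.)"""
    return {svc for svc in alerting if not (downstream(depends_on, svc) & alerting)}


# Hypothetical dependency map: each service lists what it calls
depends_on = {
    "web-frontend": ["checkout"],
    "checkout": ["payments", "db-cluster"],
    "payments": ["db-cluster"],
    "reporting": ["db-cluster"],
}
alerting = {"web-frontend", "checkout", "db-cluster"}
print(root_cause_candidates(depends_on, alerting))  # → {'db-cluster'}
```

Although three services raise alerts, only the database has no failing dependency beneath it, so the investigation can start there instead of at the front end where users first noticed the problem.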
Figure 2 – AI can help increase the availability of experts for more complex tasks
With an AI-powered virtual support agent (VSA), business users can reach the service desk on demand, whether or not a human agent is available. Using NLP, VSAs identify common questions and commands in their conversations with business users. They can then provide answers by searching the knowledge center and other resources, and perform basic tasks such as updating passwords or restoring settings.
Another way to use ML and NLP to improve the ticket lifecycle is to automate ticket categorization and optimize the availability of experts. The first step is to analyze the ticketing process and understand the key metrics of operational expectations (performance metrics, confidence thresholds, etc.); a predictive model can then be built from the existing backlog and closed-ticket history. When a new ticket is created, the request is qualified, automatically sorted into an incident category, and assigned to one of the available technical experts responsible for that type of request. This gets the ticket to the right level of support faster and gives Level 1 support more time to handle complex requests.
AI can help transform IT operations while improving data management and facilitating the centralization of databases, which in turn enhances incident prediction and causality identification. In addition, AI can increase the availability of the service desk and improve the end-user experience with quicker turnarounds.
To maximize these benefits, organizations need to address several key success factors for a successful AI roll-out.
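Production VSAs rely on trained NLP intent models. The sketch below substitutes a deliberately simple word-overlap (Jaccard similarity) matcher to show the flow: recognize the request, map it to a known task, and hand off to a human agent when confidence is low. The intent names, example phrases, and the 0.3 threshold are hypothetical.

```python
import re

# Hypothetical intents: example phrasings mapped to an automatable task
INTENTS = {
    "reset_password": ["reset my password", "forgot password", "password expired"],
    "restore_settings": ["restore default settings", "reset my settings"],
    "vpn_help": ["vpn not connecting", "cannot connect to vpn"],
}


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def match_intent(utterance: str, min_score: float = 0.3) -> str:
    """Best-matching intent by Jaccard word overlap, or a human handoff below `min_score`."""
    words = tokens(utterance)
    best, best_score = "handoff_to_human", 0.0
    for intent, examples in INTENTS.items():
        for example in examples:
            ex = tokens(example)
            score = len(words & ex) / len(words | ex)
            if score > best_score:
                best, best_score = intent, score
    return best if best_score >= min_score else "handoff_to_human"


print(match_intent("I forgot my password"))              # → reset_password
print(match_intent("my VPN is not connecting"))          # → vpn_help
print(match_intent("the printer on floor 3 is jammed"))  # → handoff_to_human
```

The fallback matters as much as the matching: requests the agent cannot recognize go to a human rather than receiving a wrong automated answer.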
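The article does not specify a model type. As one hedged illustration, the sketch below trains a multinomial naive Bayes classifier on closed-ticket history, applies a confidence threshold, and routes the ticket to the least-loaded available expert for the predicted category. The ticket texts, categories, expert records, and the 0.7 threshold are all made up.

```python
import math
import re
from collections import Counter, defaultdict


def words(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())


class TicketClassifier:
    """Multinomial naive Bayes with Laplace smoothing, trained on closed tickets."""

    def fit(self, tickets: list[tuple[str, str]]) -> "TicketClassifier":
        self.class_counts = Counter(category for _, category in tickets)
        self.word_counts = defaultdict(Counter)
        for text, category in tickets:
            self.word_counts[category].update(words(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text: str) -> tuple[str, float]:
        """Most likely category and its normalized probability."""
        total = sum(self.class_counts.values())
        log_scores = {}
        for category, n in self.class_counts.items():
            counts = self.word_counts[category]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)
            for w in words(text):
                score += math.log((counts[w] + 1) / denom)
            log_scores[category] = score
        top = max(log_scores.values())  # subtract the max before exponentiating for stability
        probs = {c: math.exp(s - top) for c, s in log_scores.items()}
        z = sum(probs.values())
        best = max(probs, key=probs.get)
        return best, probs[best] / z


def route(text: str, model: TicketClassifier, experts: list[dict], threshold: float = 0.7):
    """Assign to the least-loaded available expert for the predicted category, or send to manual triage."""
    category, confidence = model.predict(text)
    if confidence < threshold:
        return "manual_triage", None
    candidates = [e for e in experts if e["category"] == category and e["available"]]
    if not candidates:
        return category, None  # categorized, but waiting for an expert
    return category, min(candidates, key=lambda e: e["open_tickets"])["name"]


history = [
    ("cannot log in password expired", "access"),
    ("locked out of account after password reset", "access"),
    ("need access to shared drive", "access"),
    ("laptop screen flickering", "hardware"),
    ("keyboard not working on laptop", "hardware"),
    ("printer jammed on floor", "hardware"),
]
experts = [
    {"name": "Ana", "category": "access", "available": True, "open_tickets": 4},
    {"name": "Ben", "category": "access", "available": True, "open_tickets": 1},
    {"name": "Chen", "category": "access", "available": False, "open_tickets": 0},
    {"name": "Dee", "category": "hardware", "available": True, "open_tickets": 2},
]
model = TicketClassifier().fit(history)
print(route("password expired, cannot log in again", model, experts))  # → ('access', 'Ben')
print(route("laptop keyboard not working", model, experts))            # → ('hardware', 'Dee')
print(route("something is wrong", model, experts))                     # → ('manual_triage', None)
```

The threshold is where operational expectations enter the design: raising it sends more tickets to manual triage but reduces misrouted ones.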
Key success factors and how to implement them
To achieve success in AI implementation, decision-makers first need to make sure that business and IT leaders agree on the needs and targets for the implementation. Second, the organization should be kept informed throughout the project with a comprehensive communication plan. Third, data governance and infrastructure must be properly implemented. Finally, the organization’s core values should be incorporated into the design of the models to minimize bias.
To drive business value with AI, business leaders need to share their priorities and objectives with IT leaders, who then build these priorities into the guidelines and specifications for machine learning models so that the models generate useful business insights. Decision-makers should choose technologies that fit the requirements, since the most advanced technology is not always the best fit for specific business needs.
Preparing the organization for transformation is critical to successful adoption. As early as the proof-of-concept (POC) phase, business and IT leaders need to showcase the functionality of the AI technology and explain the changes to day-to-day work (where they come from, their objectives, and their impacts) to onboard business users and the IT operations team. These steps help demystify the new technology, build interest and excitement, and give users confidence in the tool.
Data governance, meaning the organization, processes, and controls that apply from data collection through storage and recovery, should be clearly defined to support a sustainable data strategy that ensures good data quality and efficient data valorization. Before introducing new technologies, it is essential to know the data heritage in order to have secure foundations to build on. The infrastructure needs to handle large volumes of data of different types, especially when data is pooled in centralized data lakes instead of being spread out across multiple siloed databases.
Predictive models need to incorporate the core values of the organization to avoid unwanted bias in pattern identification and event correlation. When selecting training datasets, data scientists need to make sure the population does not carry any systemic or institutionalized bias. To do so, they need to clean the data for accuracy, ensure the sample is truly random and inclusive, and continuously monitor and audit the models. For instance, the Local Interpretable Model-agnostic Explanations (LIME) method helps explain individual predictions by showing how they change when the inputs are modified.
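A model audit can start with a simple question: does the prediction move when a feature that should not matter is changed? The sketch below is a simplified one-feature-at-a-time perturbation check, not the full LIME algorithm (LIME samples many perturbations around an instance and fits a weighted local surrogate model). The `priority_model`, its features, the alternative values, and the 0.05 tolerance are hypothetical.

```python
def priority_model(ticket: dict) -> float:
    """Hypothetical opaque model that scores ticket urgency between 0 and 1."""
    score = 0.1
    score += 0.4 if ticket["outage"] else 0.0
    score += 0.02 * min(ticket["users_affected"], 10)
    # Stands in for a bias the model picked up from skewed training data
    score += 0.3 if ticket["department"] == "executive" else 0.0
    return min(score, 1.0)


def sensitivity(model, instance: dict, alternatives: dict[str, list]) -> dict[str, float]:
    """Largest change in the prediction when each feature is swapped for an alternative value."""
    base = model(instance)
    return {
        feature: max(abs(model({**instance, feature: v}) - base) for v in values)
        for feature, values in alternatives.items()
    }


ticket = {"outage": False, "users_affected": 3, "department": "executive"}
effects = sensitivity(
    priority_model,
    ticket,
    {"outage": [True], "users_affected": [1, 10], "department": ["finance", "engineering"]},
)
SHOULD_NOT_MATTER = {"department"}  # hypothetical policy reflecting organizational values
flagged = {f for f, e in effects.items() if f in SHOULD_NOT_MATTER and e > 0.05}
print(flagged)  # → {'department'}
```

The requester's department shifts the urgency score by about as much as an actual outage does, which is exactly the kind of result an audit should escalate.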
Conclusion
To keep up with the high demand for new technology, IT leaders can consider implementing AI in IT operations while keeping in mind the challenges around business objectives, change communication, data management, and model bias. For executives, AI generates quick business insights and monitors application performance to track the return on investment of new and legacy tools. For the service desk team, AI streamlines simple tasks and optimizes the workload based on individual expertise. For users, AI-powered VSAs execute simple requests or suggest solutions right away.
In this article, we focused on the benefits of AI and AIOps platforms. In addition to incorporating AI, companies can also leverage blockchain technology to support data analytics and address some of the challenges mentioned above. For instance, data heritage is easy to trace because each transaction or activity is recorded and verified by a distributed network of computers, which protects the integrity of the data. Moreover, because records on a blockchain cannot be altered without detection, companies can use them as a trustworthy input when analyzing large data sets and making predictions.
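The integrity property described here comes from chaining records by cryptographic hash. The sketch below shows only that ingredient on a single machine: it has no distributed network, consensus, or replication, so it is a tamper-evident log rather than a blockchain. The record fields are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64


def record_hash(prev_hash: str, payload: dict) -> str:
    """SHA-256 over the previous hash plus a canonical JSON encoding of the payload."""
    data = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def append(chain: list[dict], payload: dict) -> None:
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"payload": payload, "prev": prev, "hash": record_hash(prev, payload)})


def verify(chain: list[dict]) -> bool:
    """Recompute every link. An edited payload no longer matches its stored hash,
    and rewriting that hash would break the next entry's `prev` link."""
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev or entry["hash"] != record_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True


ledger: list[dict] = []
append(ledger, {"event": "ticket_created", "id": 101})
append(ledger, {"event": "ticket_assigned", "id": 101, "to": "Ben"})
append(ledger, {"event": "ticket_closed", "id": 101})
print(verify(ledger))              # → True
ledger[1]["payload"]["to"] = "Ana"  # tamper with a past record
print(verify(ledger))              # → False
```

Hash chaining makes tampering detectable; it does not make the original data correct, so data quality controls are still needed before records are written.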