Efficient ticket handling becomes paramount as the telecommunications industry embraces new advancements like 5G and technology stack disaggregation. Manual processes and lengthy queueing times can no longer keep up with these cutting-edge networks’ increasing complexity and demands. However, by harnessing the potential of advanced AI/ML algorithms and ticket automation, network operators can experience a transformative shift in their operations. This blog explains the ticket creation and resolution process and delves into the benefits of using ML algorithms and automation at different operational stages to address challenges amplified by the increased complexity of modern networks.
Process of Ticket Creation and Resolution
1.1 Ticket Creation: Origins and Varieties
Tickets originate from many sources. Some are submitted by people to address customer complaints, equipment faults and alarms, or network performance issues. Others are generated automatically through alarm automation or third-party systems. This diverse range of origins reflects the multifaceted nature of the issues that network operations must handle.
Automated ticket creation through alarming systems often marks the initial step in the automation journey of a Network Operations Centre (NOC). Many fault management systems can already automate ticket creation based on the received alarm type and a set of business rules. However, the next steps, which include automated triaging, root cause determination and remediation automation, require a more sophisticated AI/ML approach.
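The rule-based first step can be sketched in a few lines. This is only an illustration of the idea: the alarm types, severity levels, queue names and the `create_ticket` helper are all hypothetical and do not describe any particular fault management system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alarm:
    alarm_type: str
    severity: str
    network_element: str


@dataclass
class Ticket:
    title: str
    severity: str
    queue: str
    origin: str = "alarm-triggered"


# Hypothetical business rules:
# alarm type -> (minimum severity that opens a ticket, target queue)
RULES = {
    "LINK_DOWN": ("major", "transport"),
    "HIGH_CPU": ("critical", "core-software"),
}
SEVERITY_RANK = {"minor": 1, "major": 2, "critical": 3}


def create_ticket(alarm: Alarm) -> Optional[Ticket]:
    """Return a ticket if a business rule matches the alarm, else None."""
    rule = RULES.get(alarm.alarm_type)
    if rule is None:
        return None  # no rule for this alarm type
    min_severity, queue = rule
    if SEVERITY_RANK[alarm.severity] < SEVERITY_RANK[min_severity]:
        return None  # alarm is below the ticketing threshold
    return Ticket(
        title=f"{alarm.alarm_type} on {alarm.network_element}",
        severity=alarm.severity,
        queue=queue,
    )
```

Rules of this kind are easy to maintain but only cover alarm types someone has anticipated, which is why the later stages call for ML.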
1.2 Queue Allocation and Ticket Troubleshooting
Once a ticket is created, it is allocated to a specific queue, either manually by a human operator or automatically by a system. However, accurate queue allocation can be challenging, and mistakes lead to inefficiencies. Tickets may spend unnecessary time in one queue, only to be reassigned later to another team or operator, thereby extending the waiting time. Accurate and efficient queue allocation is crucial to minimize delays and ensure that tickets promptly reach the appropriate teams or individuals.
Once a ticket is correctly allocated to a team responsible for its resolution, it must reach the front of the queue before the troubleshooting process starts. This stage involves analysing the issue, identifying its root cause, and devising an appropriate resolution action. The duration of troubleshooting can vary significantly, ranging from a few minutes for simple problems to several days for complex issues. The expertise and efficiency of the support team play a vital role in expediting this stage.
1.3 Data Science Approach for Ticket Allocation and Resolution
Addressing the challenges of ticket allocation and troubleshooting requires a comprehensive method that can handle both tasks simultaneously. In this context, a data science approach that leverages information retrieval techniques and a Large Language Model (LLM) can match and analyse tickets effectively.
The foundation of this method lies in the information retrieval task. By utilizing a database of historical tickets and the description of a new incoming ticket, the system endeavours to find the most relevant past ticket that closely matches the new one. This is achieved through the semantic similarity of the text content embedded in the tickets. The LLM plays a crucial role in understanding and comparing the ticket content, leading to the generation of a similarity score for each potential match. The top matched entries, along with their corresponding similarity scores, are then provided as the output.
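The retrieval step can be illustrated with a minimal sketch. Note the substitution: a simple word-count vector and cosine similarity stand in for the LLM-based semantic embedding described above, so the example runs without any model. The ticket IDs, descriptions and function names are made up for illustration.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real system would use an LLM embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_matches(new_description: str, history: dict, k: int = 3) -> list:
    """Return the k historical tickets most similar to the new description."""
    query = embed(new_description)
    scored = [(tid, cosine(query, embed(text))) for tid, text in history.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical historical tickets
HISTORY = {
    "T-101": "High CPU usage on packet gateway after software upgrade",
    "T-102": "Fibre link down between aggregation routers",
    "T-103": "Customer reports slow mobile data in city centre",
}

matches = top_matches("CPU usage very high on gateway", HISTORY, k=2)
# matches is a list of (ticket_id, similarity_score), best match first
```

With word counts, only tickets sharing literal words score above zero; an LLM embedding would also match paraphrases such as "processor overload".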
Once the tickets are successfully matched based on their semantic similarity, the next step is to engage a recommender system that takes various ticket attributes into consideration. These attributes include the ticket origin (such as customer care, service request, change management, or alarm-triggered), network element and user associations, ticket severity, and others. The recommender system processes this information and recommends the team most likely to resolve the ticket and the best resolution approach.
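One simple way to picture the recommender is as a weighted vote over the matched tickets. The sketch below is an assumption about how such a step might look, not the method used in production: the attribute bonuses, field names and the `recommend` function are hypothetical.

```python
from collections import defaultdict

# Hypothetical bonus added when an attribute of the new ticket
# equals the same attribute of a matched historical ticket.
ATTRIBUTE_BONUS = {"origin": 0.2, "network_element": 0.3, "severity": 0.1}


def recommend(new_ticket: dict, matches: list):
    """Pick the (team, resolution) pair with the highest weighted vote.

    matches is a list of (similarity_score, historical_ticket) tuples.
    Returns ((team, resolution), score), or None if there are no matches.
    """
    votes = defaultdict(float)
    for similarity, past in matches:
        weight = similarity
        for attribute, bonus in ATTRIBUTE_BONUS.items():
            if attribute in new_ticket and new_ticket[attribute] == past.get(attribute):
                weight += bonus
        votes[(past["team"], past["resolution"])] += weight
    if not votes:
        return None
    return max(votes.items(), key=lambda item: item[1])


new_ticket = {
    "origin": "alarm-triggered",
    "network_element": "gateway",
    "severity": "critical",
}
matched = [
    (0.68, {"origin": "alarm-triggered", "network_element": "gateway",
            "severity": "major", "team": "core-software",
            "resolution": "software restart"}),
    (0.70, {"origin": "customer care", "network_element": "cell site",
            "severity": "minor", "team": "radio",
            "resolution": "escalate to level 2"}),
    (0.40, {"origin": "alarm-triggered", "network_element": "gateway",
            "severity": "critical", "team": "core-software",
            "resolution": "software restart"}),
]

(team, resolution), score = recommend(new_ticket, matched)
```

Here the single most similar ticket points to the radio team, but the two gateway tickets share more attributes with the new one and together outvote it.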
Another benefit of using the LLM is its question-answering capability. A user can ask the system specific questions to get additional insights, such as “will restarting the software resolve the problem of high CPU usage?” or “can you create a report with this information?”.
An important aspect of the process is model evaluation, including the assessment of system sensitivity and specificity (which reflect false negatives and false positives, respectively). Performance is assessed continuously, and if it is unsatisfactory, iterations can be made by refining the pre-processing steps, adjusting the model architecture, or augmenting the dataset to improve predictions.
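For reference, sensitivity and specificity follow directly from the confusion-matrix counts. The sketch below assumes a binary prediction (for example, "this ticket can be resolved by an automated restart"); the labels are invented for illustration.

```python
def sensitivity_specificity(actual: list, predicted: list) -> tuple:
    """Compute (sensitivity, specificity) for boolean labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)          # true positives
    fn = sum(1 for a, p in pairs if a and not p)      # false negatives
    tn = sum(1 for a, p in pairs if not a and not p)  # true negatives
    fp = sum(1 for a, p in pairs if not a and p)      # false positives
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Invented evaluation data: 4 positive and 6 negative tickets
actual = [True, True, True, True, False, False, False, False, False, False]
predicted = [True, True, True, False, False, False, False, False, True, True]

sens, spec = sensitivity_specificity(actual, predicted)
# sens = 3 / (3 + 1) = 0.75, spec = 4 / (4 + 2), roughly 0.67
```

Low specificity matters most for automation, because a false positive means a remediation is triggered on a ticket that did not need it.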
Another dimension of the problem that we account for is the model life cycle. To remain useful over time, the algorithm needs access to up-to-date data. This can be addressed with tools such as vector databases, which represent information as vectors of numbers and use search algorithms to retrieve it in a timely manner.
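To make the idea concrete, here is a toy in-memory stand-in for a vector database. It is a sketch under simplifying assumptions: the class name and methods are hypothetical, the vectors are hand-written rather than produced by a model, and the search is brute force, whereas real vector databases typically rely on approximate nearest-neighbour indexes to stay fast at scale.

```python
import math


class InMemoryVectorStore:
    """Toy vector store: upsert ticket vectors, query by cosine similarity."""

    def __init__(self):
        self._vectors = {}

    def upsert(self, ticket_id: str, vector: list) -> None:
        """Insert a new ticket vector or overwrite an existing one."""
        self._vectors[ticket_id] = list(vector)

    def query(self, vector: list, k: int = 1) -> list:
        """Return the k stored tickets closest to the query vector."""
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return dot / norm if norm else 0.0

        scored = [(tid, cosine(vector, v)) for tid, v in self._vectors.items()]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


store = InMemoryVectorStore()
store.upsert("T-101", [0.9, 0.1, 0.0])
store.upsert("T-102", [0.0, 0.2, 0.9])
before = store.query([1.0, 0.0, 0.0], k=1)

# A newly resolved ticket becomes searchable as soon as it is upserted,
# without retraining anything.
store.upsert("T-104", [1.0, 0.0, 0.0])
after = store.query([1.0, 0.0, 0.0], k=1)
```

The point of the example is the life-cycle aspect: the knowledge base is refreshed by adding vectors, not by rebuilding the model.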
By combining the power of information retrieval with a Large Language Model and a feature-rich recommender system, this data science approach streamlines the ticket allocation and resolution process. It enables efficient ticket handling by identifying past cases with similar characteristics and suggesting the most suitable team and resolution strategy for the new tickets. This approach ultimately enhances ticket management efficiency and ensures quicker and more effective problem resolution.
1.4 Remediation Automation
As observed earlier, the Root Cause Analysis (RCA) process leads to identifying the most appropriate resolution action for a given issue. This action could involve escalating the ticket to a higher level of support for further investigation or automating specific tasks, such as software restarts, system re-provisioning, or parameter changes.
Automating such resolution actions is feasible when they are repeatable and well-documented. Typically, support teams maintain a library of Methods of Procedure (MOPs) that can be automated using a hyper-automation framework encompassing various access methods like Robotic Process Automation (RPA), Command-Line Interfaces (CLIs), and Application Programming Interfaces (APIs).
An output-based decisioning engine further enhances the framework's capabilities. It is a component of an automated system that makes decisions or takes actions based on the output or results of previous steps or processes. It is designed to evaluate the outcomes or predictions produced by various algorithms, models, or modules within the system and then make informed decisions or initiate further actions accordingly.
As part of the remediation automation process configuration, the support team creates templates for specific MOPs and integrates them into the automation framework. Consequently, an API is exposed for each corresponding MOP. When the system provides a remediation recommendation that matches an already exposed API, it can be triggered automatically, streamlining the resolution process.
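The matching of recommendations to exposed MOPs can be pictured as a lookup in a registry. The following is a minimal sketch, not the framework's actual interface: the registry, the `mop` decorator and the restart handler are hypothetical, and the handler only returns a string where a real MOP would call a CLI, an API or an RPA bot.

```python
from typing import Callable, Dict

# Hypothetical registry: recommendation name -> automated MOP handler
MOP_REGISTRY: Dict[str, Callable[[str], str]] = {}


def mop(name: str):
    """Decorator that exposes a function as an automated MOP."""
    def register(func: Callable[[str], str]) -> Callable[[str], str]:
        MOP_REGISTRY[name] = func
        return func
    return register


@mop("software restart")
def restart_software(network_element: str) -> str:
    # Placeholder for the real procedure (CLI command, API call, RPA bot).
    return f"restart requested on {network_element}"


def handle_recommendation(recommendation: str, network_element: str) -> str:
    """Trigger the MOP if one is exposed, otherwise hand over to a person."""
    handler = MOP_REGISTRY.get(recommendation)
    if handler is None:
        return "escalate to human operator"
    return handler(network_element)
```

For example, `handle_recommendation("software restart", "gateway-1")` runs the registered handler, while a recommendation with no exposed MOP falls back to escalation.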
By automating repeatable tasks through a well-structured hyper-automation framework, the support team can improve efficiency, reduce manual intervention, and accelerate the resolution of issues. The combination of various access methods and the output-based decisioning engine ensures flexibility and adaptability to diverse scenarios, contributing to more efficient and reliable support operations.
1.5 Ticket Closure: Validating Outcomes and Completion
Once the resolution action is executed, its outcome is thoroughly validated to ensure the issue has been successfully addressed. This verification process confirms that the desired result has been achieved and the problem has been resolved to the satisfaction of the customer or network performance requirements. With a validated outcome, the ticket can be closed, marking the completion of the resolution process. In the NOC automation transformation, the validation process is just another step within the output-based decisioning engine of the automated ticket resolution.
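A simplified view of validation as a step in an output-based decisioning flow: the engine inspects the output of the executed action and decides whether to close the ticket, retry, or escalate. The function name, the retry limit and the CPU-load threshold below are assumptions chosen for illustration.

```python
from typing import Callable, Dict


def resolve_ticket(
    execute: Callable[[], float],
    validate: Callable[[float], bool],
    max_attempts: int = 2,
) -> Dict[str, object]:
    """Run the action, validate its output, then close or escalate."""
    for attempt in range(1, max_attempts + 1):
        output = execute()
        if validate(output):
            # Outcome confirmed: the ticket can be closed.
            return {"status": "closed", "attempts": attempt}
    # Validation never passed: hand the ticket to a higher support level.
    return {"status": "escalated", "attempts": max_attempts}


# Simulated CPU load (%) measured after each restart attempt
cpu_readings = iter([95.0, 40.0])

result = resolve_ticket(
    execute=lambda: next(cpu_readings),
    validate=lambda cpu_load: cpu_load < 80.0,
)
```

In this run the first restart does not bring the load under the threshold, the second one does, and the ticket is closed after two attempts.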
Business Case for NOC Transformation
Once the ML algorithms allocate the correct queue and predict the ticket resolution type, the most evident advantage of ticket automation is its ability to reduce the Mean Time to Repair (MTTR) drastically. By leveraging automation, what might have taken hours can now be accomplished within minutes, allowing support teams to address issues promptly. Moreover, even when the actual task execution time is relatively short, tickets often spend a substantial duration in the queue awaiting manual processing. It is common for queueing time to account for most of the MTTR. For instance, if an employee spends 10 minutes executing a software restart, but the ticket spends 90 minutes in the queue, the overall MTTR is 100 minutes. Software restarts are typically a ticket type that ML predicts with high precision; hence, they can be fully automated. In this example, automating the ticket workflow can realistically cut the MTTR from 100 minutes to just 1 minute, a 99% reduction in the overall resolution time. Across operations as a whole, an MTTR reduction of 40-50% is feasible.
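The arithmetic behind the software restart example is easy to reproduce. The helper name is ours; the figures are the ones quoted above.

```python
def mttr_reduction(queue_minutes: float, execution_minutes: float,
                   automated_minutes: float) -> tuple:
    """Return (manual MTTR in minutes, percentage reduction after automation)."""
    manual_mttr = queue_minutes + execution_minutes
    reduction_pct = (manual_mttr - automated_minutes) / manual_mttr * 100
    return manual_mttr, reduction_pct


# 90 minutes in the queue + 10 minutes of manual work, versus 1 minute automated
manual_mttr, reduction_pct = mttr_reduction(90, 10, 1)
```

Most of the gain comes from eliminating the 90 minutes of queueing rather than from executing the restart faster.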
Ticket automation goes beyond mere task execution. By automating routine and repetitive ticket-handling processes, valuable time is freed up for support personnel. With reduced queueing time for simpler tickets, employees can focus their expertise on more complex issues, accelerating their resolution. This ripple effect within the support team creates a positive feedback loop that drives the overall MTTR down further. Partial automation can already yield significant improvements, but the MTTR reduction reaches its maximum potential only as automation becomes more pervasive.
There are significant business benefits of the MTTR reduction and ticket automation; below are a few examples:
Improved Customer Satisfaction: Customer satisfaction levels rise when issues are resolved quickly and efficiently. By reducing the MTTR, businesses can address customer concerns promptly, leading to happier and more loyal customers. Satisfied customers are likelier to remain loyal, refer others to the business, and contribute to a positive brand reputation. This particularly applies to customer care-related tickets where each 1% improvement in First Call Resolution (FCR) increases transactional NPS by 1.4 points for an average call centre.
Minimized Downtime: Downtime can be costly for businesses, leading to lost productivity, revenue, and customer trust. By reducing MTTR, organizations can minimize the duration of service disruptions and outages. This translates to less downtime for customers, enabling them to continue their operations smoothly and reducing the negative impact on business operations. This leads to improved NPS and reduced churn.
Increased Productivity: MTTR reduction allows employees to spend less time troubleshooting and resolving issues. This frees up their valuable time, enabling them to focus on more strategic tasks, projects, and customer-facing activities. Improved productivity can lead to better efficiency, innovation, and overall business growth.
Unlocking OPEX Benefits: While the primary motivation for ticket automation is often to enhance service delivery and reduce MTTR, the benefits extend beyond efficiency gains. As the MTTR approaches its minimum achievable level, organizations can leverage the automation infrastructure to drive substantial OPEX reduction. With reduced manual intervention, personnel resources can be optimized, leading to potential cost savings. This OPEX reduction becomes particularly relevant in the current economic climate, where organizations increasingly focus on optimizing their operations.
Imagine a hypothetical (but realistic) business case for ticket automation. Assume that an operator handles around 120,000 tickets per year, 1/3 of which are already automated to some extent and 1/3 of which are not feasible for automation for various reasons (e.g. issues requiring local presence). This leaves 40,000 tickets feasible for automation (see Figure 1). The average resolution time for this class of tickets is 1.5 days, or 60,000 calendar days in total. The staff engagement level for these tickets is around 10% (i.e. a ticket spends 90% of the time in queues or idle, waiting to be resolved). This gives 6,000 calendar days of staff effort, or 18,000 man-days (the conversion is consistent with counting each 24-hour calendar day as three 8-hour working days). Multiplying this by the daily cost of a Full Time Employee (FTE) ($500) and applying a conservative estimate of the time reduction associated with automation (30-50%), we arrive at a savings range of $2.7 to $4.5 million. This figure doesn’t include additional benefits of MTTR reduction, such as improved customer satisfaction.
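The business case can be reproduced step by step. All inputs come from the paragraph above except one, which is an assumption on our part: the factor of three man-days per calendar day (a 24-hour day counted as three 8-hour working days) is inferred because it reconciles the 6,000 calendar days with the 18,000 man-days; the text does not state it explicitly.

```python
TICKETS_PER_YEAR = 120_000
AUTOMATABLE_SHARE = 1 / 3          # tickets feasible for automation
AVG_RESOLUTION_DAYS = 1.5          # calendar days per ticket
STAFF_ENGAGEMENT = 0.10            # share of that time staff actively work
MAN_DAYS_PER_CALENDAR_DAY = 3      # ASSUMPTION: 24 h = three 8-hour man-days
COST_PER_MAN_DAY = 500             # USD
TIME_REDUCTION_RANGE = (0.30, 0.50)

automatable_tickets = TICKETS_PER_YEAR * AUTOMATABLE_SHARE            # about 40,000
engaged_calendar_days = (
    automatable_tickets * AVG_RESOLUTION_DAYS * STAFF_ENGAGEMENT      # about 6,000
)
man_days = engaged_calendar_days * MAN_DAYS_PER_CALENDAR_DAY          # about 18,000
annual_cost = man_days * COST_PER_MAN_DAY                             # about $9 million

savings_low, savings_high = (annual_cost * r for r in TIME_REDUCTION_RANGE)

print(f"Estimated savings: ${savings_low / 1e6:.1f}M to ${savings_high / 1e6:.1f}M")
```

Changing the engagement level or the reduction percentage shows how sensitive the result is to those two estimates.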
In conclusion, the telecommunications industry is facing a pressing need for efficient ticket handling as networks become more complex with the advent of technologies like 5G SA and stack disaggregation. By harnessing the power of AI/ML algorithms and ticket automation, network operators can revolutionize their operations, leading to reduced MTTR, improved customer satisfaction, and increased productivity.
The data science approach, utilizing information retrieval and Large Language Models, streamlines ticket allocation and resolution by providing relevant historical matches and valuable recommendations for resolution teams. Integrating an output-based decisioning engine further enhances the system's adaptability and decision-making capabilities.
Ticket automation yields a plethora of benefits, from optimizing personnel resources and driving substantial OPEX reduction to minimizing downtime and unlocking improvements in customer satisfaction.
Looking ahead, Reailize envisions a future where 100% of tickets can be automated, leading to a truly "dark NOC" - a seamlessly automated Network Operations Centre where lights can be switched off, as AI-driven processes take the lead. With this vision in mind, Reailize sets forth on a transformative journey towards streamlined and highly efficient network operations in the telecommunications industry.
Are you ready to embrace the potential of AI and automation to achieve a "dark NOC" future?