Beyond the Norm: Unsupervised Anomaly Detection in Telecommunications with Mahalanobis Distance

Overview

This paper presents an unsupervised anomaly detection framework for telecommunications networks using Mahalanobis Distance (MD). The approach identifies network anomalies in high-dimensional Key Performance Indicator (KPI) data without requiring labeled datasets. By leveraging multivariate relationships among KPIs and implementing hierarchical aggregation across network levels (cells, sectors, and sites), the methodology enables telecom operators to proactively identify and localize network issues with high computational efficiency.

The Problem

Telecommunications networks generate massive volumes of high-dimensional KPI data from thousands of network elements that must be processed in near real-time. Traditional anomaly detection faces several critical challenges:

Lack of Labeled Data: Obtaining labeled examples of network anomalies is expensive and time-consuming, making supervised approaches impractical at scale.
High Dimensionality: Network data involves dozens of correlated KPIs, making it difficult to identify meaningful patterns.
False Positives and Negatives: Seasonal patterns, correlated features, and contextual factors (like time of day) can lead to incorrect detections, either flagging normal variations as anomalies or missing actual network failures.
Computational Overhead: Many existing anomaly detection methods are computationally expensive, making them unsuitable for large-scale, real-time network monitoring.
Interpretability: Network operators need to understand why an anomaly was flagged and which KPIs contributed to it, not just receive a binary alert.

The Solution

The paper proposes a comprehensive MD-based framework that addresses these challenges through several key innovations:

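Mahalanobis Distance for Multivariate Analysis: MD measures how far a data point lies from the center of a distribution while accounting for the variance of each feature and the correlations between features. This suits multivariate KPI data, where features are interdependent: an observation whose individual KPIs all look normal can still be flagged if their combination breaks the usual correlation pattern.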
Data Preprocessing Pipeline:
- KPI Ratio Adjustment: Ratios (like success rates) are adjusted based on sample size to avoid false positives from small samples.
- Feature Normalization: Distributions are normalized to ensure all KPIs contribute appropriately to the anomaly score.
- Contextual Awareness: Time-based and network-level context is incorporated to distinguish normal variations from true anomalies.
Dimensionality Reduction: A systematic feature selection process identifies the most relevant KPIs, balancing model efficiency and detection accuracy while maintaining broad anomaly coverage.
Hierarchical Aggregation Strategy: Anomaly scores are calculated at the cell level and then aggregated upward to sectors and sites. This multi-level approach enables operators to:
- Localize issues to specific network elements
- Prioritize troubleshooting efforts
- Understand the scope and impact of anomalies
SHAP-Based Interpretability: SHAP (SHapley Additive exPlanations) values are used to explain which KPIs contributed most to each anomaly, providing actionable insights for network engineers.
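
The hierarchical aggregation step can be illustrated with a minimal sketch. It assumes cell-level anomaly scores are already available and rolls them up with either the maximum or the mean. The topology, the score values, and the `aggregate_scores` helper are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from statistics import mean


def aggregate_scores(child_scores, parent_of, how="max"):
    """Roll child-level anomaly scores up to their parent elements.

    child_scores: dict mapping child id -> anomaly score
    parent_of:    dict mapping child id -> parent id
    how:          "max" keeps the worst child, "mean" averages the children
    """
    reducers = {"max": max, "mean": mean}
    reduce_fn = reducers[how]

    grouped = defaultdict(list)
    for child, score in child_scores.items():
        grouped[parent_of[child]].append(score)
    return {parent: reduce_fn(scores) for parent, scores in grouped.items()}


# Hypothetical topology and scores, for illustration only.
cell_scores = {"cell_1": 1.2, "cell_2": 4.8, "cell_3": 0.9, "cell_4": 1.1}
cell_to_sector = {
    "cell_1": "sector_A", "cell_2": "sector_A",
    "cell_3": "sector_B", "cell_4": "sector_B",
}
sector_to_site = {"sector_A": "site_X", "sector_B": "site_X"}

# Cells -> sectors -> sites, keeping the worst score at each level.
sector_scores = aggregate_scores(cell_scores, cell_to_sector, how="max")
site_scores = aggregate_scores(sector_scores, sector_to_site, how="max")
```

With the maximum as the reducer, a single badly behaving cell remains visible at the site level, and the sector scores show which branch it sits in. The mean instead reflects how widespread the problem is, so the choice of reducer depends on whether localization or overall impact matters more.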

Why It Matters

This framework delivers significant operational benefits for telecommunications providers:

Scalability: The unsupervised approach eliminates the need for labeled training data, making it practical to deploy across large, diverse networks.
Computational Efficiency: MD-based detection is significantly faster than baseline methods (Isolation Forest, Local Outlier Factor, One-Class SVM), enabling near real-time monitoring.
High Accuracy: The method achieves detection performance (measured by AUC) competitive with the baseline methods while maintaining a low false positive rate.
Actionable Insights: The combination of hierarchical aggregation and SHAP-based explanations helps operators quickly identify root causes and prioritize responses.
Cross-Dataset Generalization: Validation across different time periods (summer, winter, spring) demonstrates that the correlation structure of KPIs remains stable, allowing the model to generalize across seasonal variations.
Practical Use Cases: Case studies demonstrate the model's ability to pinpoint specific issues, such as identifying the Random Access Channel (RACH) success rate as a key anomaly contributor.

Relevance Beyond Telecommunications

The principles underlying this MD-based anomaly detection framework have broad applications across industries that deal with multivariate time series data:

Industrial IoT and Manufacturing: Monitor sensor data from production lines or machinery to detect equipment failures, quality issues, or process deviations before they cause downtime.
Energy and Utilities: Analyze data from smart grids, power plants, or distribution networks to identify anomalies that could indicate equipment failures, cyber-attacks, or inefficiencies.
Financial Services: Detect fraudulent transactions or unusual trading patterns by identifying deviations from normal multivariate behavior across multiple financial indicators.
Healthcare Systems: Monitor patient vital signs or hospital system metrics to detect early warning signs of medical emergencies or operational issues.
Cloud Infrastructure and IT Operations: Identify performance degradation, security threats, or resource bottlenecks by analyzing system metrics, logs, and performance indicators.
Transportation and Logistics: Monitor fleet operations, traffic patterns, or supply chain metrics to detect disruptions, inefficiencies, or safety issues.

The framework's emphasis on interpretability, computational efficiency, and unsupervised learning makes it particularly valuable in domains where labeled anomaly data is scarce but operational reliability is critical.

Technical Details

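Anomaly Detection Method: Mahalanobis Distance (MD) with threshold-based classification. MD is calculated as MD(x) = √((x - μ)ᵀ Σ⁻¹ (x - μ)), where x is the KPI vector of an observation, μ is the mean vector, and Σ is the covariance matrix of the KPIs. An observation is classified as anomalous when its MD exceeds the threshold.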
Preprocessing Techniques:
- Ratio adjustment using sample size weighting
- Feature normalization (standardization)
- Outlier handling and data cleaning
Feature Selection: Systematic evaluation of KPI subsets to identify the optimal balance between model complexity and detection performance.
Hierarchical Aggregation: Anomaly scores computed at cell level, then aggregated to sector and site levels using statistical measures (mean, max, percentiles).
Interpretability Framework: SHAP (SHapley Additive exPlanations) values quantify each KPI's contribution to anomaly scores, enabling root cause analysis.
Benchmark Comparison: The MD approach is compared against:
- Isolation Forest (IF)
- Local Outlier Factor (LOF)
- One-Class Support Vector Machine (OC-SVM)
Results show MD achieves competitive AUC scores while requiring significantly less computational time, especially on large datasets.
Validation: Cross-dataset validation across different seasons demonstrates model robustness and generalization capability.
Key Findings: The RACH (Random Access Channel) success rate is identified as a critical KPI for anomaly detection, and dimensionality reduction improves both efficiency and accuracy.
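
As an illustration of the MD formula and the threshold-based classification described above, the following is a minimal NumPy sketch on synthetic data. The two correlated "KPIs", the use of a pseudo-inverse, and the 99th-percentile threshold are assumptions made for the example; the paper's actual preprocessing and thresholding choices are not reproduced here.

```python
import numpy as np


def fit_reference(X):
    """Estimate the mean vector and inverse covariance from reference rows."""
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    # Assumption: a pseudo-inverse is used so that a near-singular covariance
    # (strongly correlated KPIs) does not make the inversion fail.
    cov_inv = np.linalg.pinv(cov)
    return mu, cov_inv


def mahalanobis(X, mu, cov_inv):
    """MD(x) = sqrt((x - mu)^T Sigma^-1 (x - mu)) for every row of X."""
    diff = X - mu
    squared = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return np.sqrt(np.maximum(squared, 0.0))  # guard against tiny negatives


# Synthetic reference data: two positively correlated features.
rng = np.random.default_rng(0)
true_cov = np.array([[1.0, 0.8], [0.8, 1.0]])
X_ref = rng.multivariate_normal([0.0, 0.0], true_cov, size=2000)

mu, cov_inv = fit_reference(X_ref)

# Assumed threshold rule: 99th percentile of the reference distances.
threshold = np.quantile(mahalanobis(X_ref, mu, cov_inv), 0.99)

# Two points at the same Euclidean distance from the center.
along = np.array([[2.0, 2.0]])     # consistent with the correlation
against = np.array([[2.0, -2.0]])  # contradicts the correlation

d_along = mahalanobis(along, mu, cov_inv)[0]
d_against = mahalanobis(against, mu, cov_inv)[0]
is_anomaly = d_against > threshold
```

Both test points are equally far from the center in Euclidean terms, but the point that contradicts the correlation between the two features receives a much larger MD and exceeds the threshold. This is the property that makes MD useful for interdependent KPIs.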

Status: Published
Journal: MDPI Computers
Volume: 14
Issue: 12
Article Number: 561
Publication Date: December 2025
DOI: 10.3390/computers14120561
Authors: Aline Mefleh, Michal Patryk Debicki, Ali Mubarak, Maroun Saade, and Nathanael Weill
