Advanced Semi-Supervised Learning With Uncertainty Estimation for Phase Identification in Distribution Systems

Bayesian Neural Network
Semi Supervised Learning
Uncertainty Estimation
Conference
Author

Kundan Kumar

Doi

Citation (IEEE)

K. Kumar, K. Utkarsh, J. Wang, and H. V. Padullaparti, “Advanced Semi-Supervised Learning With Uncertainty Estimation for Phase Identification in Distribution Systems,” Proc. IEEE Power & Energy Society General Meeting (PESGM), 2025.

Abstract

The integration of advanced metering infrastructure (AMI) into power distribution networks generates valuable data for tasks such as phase identification; however, the limited and unreliable availability of labeled data in the form of customer phase connectivity presents challenges. To address this issue, we propose a semi-supervised learning (SSL) framework that effectively leverages both labeled and unlabeled data.

  1. Why Phase Identification Needs a New Approach ?
Illustration of Semi-Supervised Learning Techniques
Fig. 1: Illustration of Semi-Supervised Learning Techniques

Problem: Utilities don’t know which phase customers are connected to — this affects voltage regulation, DER integration, and fault localization. • Challenge: Ground truth phase data is scarce, unreliable, and costly to collect. • Supervised learning ML methods need lots of labeled data – often unavailable or unreliable. • Motivation: How do we scale phase identification without needing tons of labeled data?

add techniques

Contribution

Our approach incorporates:

  • Self-training with an ensemble of multilayer perceptron classifiers.
  • Label spreading to propagate labels based on data similarity.
  • Bayesian Neural Networks (BNNs) for uncertainty estimation, improving confidence and reducing phase identification errors.

Key Highlights:

  • Achieved ~98% ± 0.08 accuracy on real utility data (Duquesne Light Company) using minimal and unreliable labeled data.
  • Uncertainty-aware predictions reduce misclassification risk and improve smart grid reliability.
  • Combines pseudo-labeling, graph-based SSL, and probabilistic modeling to handle data scarcity in real-world distribution networks.

Our SSL + Uncertainty Estimation approach provides an efficient and scalable solution for phase identification in AMI data, enabling utilities to improve modeling, simulation, and operational decision-making.


Semi-Supervised Learning Framework

We formulate SSL as a regularized optimization problem:Equation 1

\[ \min_{f \in \mathcal{F}} \left[ \frac{1}{n_L} \sum_{i=1}^{n_L} \ell(f(x_i), y_i) + \lambda R_u(f, \mathcal{D}_U) \right] \tag{1}\]

Where:

  • ( (, ) ): Supervised loss (cross-entropy)
  • ( R_u(f, _U) ): Unsupervised regularization term
  • ( ): Trade-off parameter

The challenge is designing ( R_u(f, _U) ) to effectively leverage unlabeled data.

\[\begin{equation} f\left(k\right) = \binom{n}{k} p^k\left(1-p\right)^{n-k} (\#eq:binom) \end{equation}\] You may refer to using \@ref(eq:binom), like see Equation @ref(eq:binom).

We formulate SSL as a regularized optimization problem:

\[ \min_{f \in \mathcal{F}} \left[ \frac{1}{n_L} \sum_{i=1}^{n_L} \ell(f(x_i), y_i) + \lambda R_u(f, \mathcal{D}_U) \right] \] Where \[a^2 + b^2 = d^2\] is the Unsupervised regularization term

Where: \[ R_u(f, \mathcal{D}_U) \] is the - ( (, ) ): Supervised loss (cross-entropy)
- ( R_u(f, _U) ): Unsupervised regularization term
- ( ): Trade-off parameter

The challenge is designing ( R_u(f, _U) ) to effectively leverage unlabeled data.

  • \[\ell(\cdot, \cdot)\]: Supervised loss (cross-entropy)

  • \[R_u(f, \mathcal{D}_U)\]: Unsupervised regularization term

  • \[\lambda\]: Trade-off parameter

  • ( (, ) ): Supervised loss (cross-entropy)

  • ( R_u(f, _U) ): Unsupervised regularization term

  • ( ): Trade-off parameter

We formulate SSL as a regularized optimization problem: min_{f∈ℱ} [ (1/n_L) ∑^{n_L}_{i=1} ℓ(f(x_i), y_i) + λR_u(f, 𝒟_U) ] Where:

ℓ(·,·): Supervised loss (cross-entropy) R_u(f, 𝒟_U): Unsupervised regularization term λ: Trade-off parameter

The challenge: designing R_u(f, 𝒟_U) that effectively exploits unlabeled data structure ## Methodology

We define the binomial distribution as:(Equation 2)

\[ f(k) = \binom{n}{k} p^k (1-p)^{n-k} \tag{2}\]

As shown in Equation @ref(eq:binom), the binomial function defines the probability…

Black-Scholes (Equation 3) is a mathematical model that seeks to explain the behavior of financial derivatives, most commonly options:

\[ \frac{\partial \mathrm C}{ \partial \mathrm t } + \frac{1}{2}\sigma^{2} \mathrm S^{2} \frac{\partial^{2} \mathrm C}{\partial \mathrm S^2} + \mathrm r \mathrm S \frac{\partial \mathrm C}{\partial \mathrm S}\ = \mathrm r \mathrm C \tag{3}\]

  1. Proposed SSL Framework Applied to AMI Data

    SSL Framework

  2. Distribution Feeder Topology

    Network Toplogy

  3. Training and Testing Data Partitions

    Data Split

  4. Accuracy Comparison of SSL Methods

Accuracy Comparison


Presentation

View Fullscreen Poster