Probabilistic neural networks in a nutshell

Feed-forward artificial neural networks that are closely related to kernel density estimation (KDE)

Miguel Ángel Cárdenas
Jun 7, 2021

Probabilistic neural networks (PNNs) are feed-forward artificial neural networks closely related to kernel density estimation (KDE) via Parzen windows; their decision rule asymptotically approaches the Bayes-optimal (minimum-risk) classifier. The technique is widely used to estimate class-conditional densities (also known as likelihoods) in supervised learning tasks.

The network, as introduced by Specht, is composed of four layers (summarized mathematically after the list):

  • Input layer: Features of data points (or observations)
  • Pattern layer: Evaluation of a kernel (Parzen window) centred on each training pattern
  • Summation layer: Summation of the pattern-layer outputs within each class, yielding the class-conditional PDF
  • Output layer: Hypothesis testing with the maximum a posteriori probability (MAP)
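
Loosely, and assuming the common choice of a Gaussian kernel with a single smoothing parameter σ (an assumption of this sketch; Specht's formulation admits other kernels), the four layers map onto the following computations:

```latex
% Pattern layer: one Gaussian kernel centred on each training pattern x_{k,i} of class k
\phi_{k,i}(x) = \exp\!\left( -\frac{\lVert x - x_{k,i} \rVert^{2}}{2\sigma^{2}} \right)

% Summation layer: average of the pattern activations within class k
g_k(x) = \frac{1}{n_k} \sum_{i=1}^{n_k} \phi_{k,i}(x)

% Output layer: class with the largest prior-weighted activation (MAP)
\hat{\theta}(x) = \arg\max_{k} \; P(\Theta = k)\, g_k(x)
```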

In order to understand the backbone mechanism of the PNN, one has to look back at Bayes' theorem. Suppose that the goal is to build a Bayes classifier for an observation X and a class label Θ, both treated as random variables (r.v.), with the training samples drawn independent and identically distributed (i.i.d.).
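
For a class label θ_k and an observation x, Bayes' theorem relates the posterior to the likelihood and the prior:

```latex
P(\Theta = \theta_k \mid X = x)
  = \frac{p(x \mid \Theta = \theta_k)\, P(\Theta = \theta_k)}
         {\sum_{j} p(x \mid \Theta = \theta_j)\, P(\Theta = \theta_j)}
```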

Finding the likelihood probability density function (PDF) can be a challenging problem; the Parzen-window method tackles it elegantly and reliably. Once the likelihood PDF is estimated, inferring the posterior probability becomes straightforward.

The Parzen window is a non-parametric method to estimate the PDF at a specific observation given a data set; that is, it requires no prior knowledge about the underlying distribution. The window is defined by a weighting (kernel) function Φ and a smoothing parameter h(n). (For further background on KDE, visit Sebastian Raschka's webpage.)
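
In its general form, the Parzen-window estimate of a density from n samples x_1, …, x_n in d dimensions, with weighting function Φ and smoothing parameter h(n), can be written as:

```latex
\hat{p}_n(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h(n)^{d}}\,
               \Phi\!\left( \frac{x - x_i}{h(n)} \right)
```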

Using the normal distribution as the weighting function leads to the following equation, normalized by the number of class-conditional observations.
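
With a Gaussian weighting function of width σ, the estimate of the class-conditional density for class k, built from its n_k observations x_{k,1}, …, x_{k,n_k} in d dimensions, takes the standard form:

```latex
\hat{f}_k(x) = \frac{1}{(2\pi)^{d/2}\, \sigma^{d}\, n_k}
               \sum_{i=1}^{n_k} \exp\!\left( -\frac{\lVert x - x_{k,i} \rVert^{2}}{2\sigma^{2}} \right)
```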

In a multivariate problem, Σ is a diagonal matrix whose entries are the variances of the individual features.
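
Writing Σ = diag(σ_1², …, σ_d²), one variance per feature, the estimate above generalizes to:

```latex
\hat{f}_k(x) = \frac{1}{(2\pi)^{d/2}\, |\Sigma|^{1/2}\, n_k}
               \sum_{i=1}^{n_k} \exp\!\left( -\tfrac{1}{2}\, (x - x_{k,i})^{\top} \Sigma^{-1} (x - x_{k,i}) \right)
```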

For a better understanding, consider a simple univariate case study. Suppose that X consists of i.i.d. observations drawn from one of two classes (a binary classification problem). Assume that σ = 1 and that an unclassified observation x = 3 is given.

Let Θ be a Bernoulli random variable that indicates the class hypothesis, and let P(Θ) be uniform (both classes equally likely). Under the hypothesis Θ = 1, the random variable X has a PDF defined by:

Under the alternative hypothesis Θ=2, X has a normal distribution with mean 2 and variance 1.

Therefore, a value of x that satisfies the boundary condition can be found numerically. This is the optimal solution, in the sense that it minimizes the misclassification rate. A visual representation of the hypothesis test between the class-conditional functions is shown below.

The decision boundary of the PNN is given by the set of points where the prior-weighted class-conditional densities are equal.
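
In the notation used above, with f̂₁ and f̂₂ the class-conditional density estimates and equal priors as assumed in this example, the boundary condition reads:

```latex
P(\Theta = 1)\, \hat{f}_1(x) = P(\Theta = 2)\, \hat{f}_2(x)
\qquad \Longrightarrow \qquad
\hat{f}_1(x) = \hat{f}_2(x) \quad \text{(equal priors)}
```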

The figure below shows the decision boundary and the conditional probability of error (shaded region).
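
A minimal numerical sketch of how such a boundary can be located is shown below. The class-2 density is the N(2, 1) stated above; the class-1 samples are hypothetical stand-ins, so the resulting boundary is illustrative only.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

sigma = 1.0                                # smoothing parameter, as in the example
x1_samples = np.array([-1.0, 0.0, 0.5])    # hypothetical class-1 data (illustrative only)

def f1(x):
    """Parzen-window (Gaussian kernel) estimate of the class-1 density at x."""
    return norm.pdf((x - x1_samples) / sigma).sum() / (len(x1_samples) * sigma)

def f2(x):
    """Class-2 density: normal with mean 2 and variance 1, as stated above."""
    return norm.pdf(x, loc=2.0, scale=1.0)

# With equal priors the decision boundary is where the two densities coincide.
boundary = brentq(lambda x: f1(x) - f2(x), -5.0, 5.0)
print(f"decision boundary at x = {boundary:.3f}")
```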

Finally, having observed x, the estimate chosen is the one that maximizes the posterior PDF over all Θ, i.e., the maximum a posteriori (MAP) decision.
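
Because the evidence in the denominator of Bayes' theorem does not depend on Θ, the MAP decision can be written directly in terms of the priors and the class-conditional estimates:

```latex
\hat{\theta}_{\mathrm{MAP}}(x)
  = \arg\max_{\theta}\; P(\Theta = \theta \mid X = x)
  = \arg\max_{\theta}\; P(\Theta = \theta)\, \hat{f}_{\theta}(x)
```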

Given the MAP estimator, the outcome is y2(x) = 0.0011 < 0.2103 = y1(x); thus, the observation is classified as Θ = 1.

To compare the PNN with other machine learning algorithms, a Python class was written that follows the scikit-learn estimator interface (a minimal sketch of the idea follows below). Using the default benchmark composed of three synthetic datasets, the PNN was compared with Gaussian-process and nearest-neighbors classifiers; the image below shows the accuracy achieved by each.
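
The repository linked below contains the actual implementation; as a rough, minimal sketch of what such a scikit-learn-compatible estimator can look like (the class name `PNNClassifier`, the single spherical bandwidth `sigma`, and the empirical priors are assumptions of this sketch, not necessarily the repository's choices):

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin


class PNNClassifier(BaseEstimator, ClassifierMixin):
    """Minimal probabilistic neural network with a spherical Gaussian Parzen window.

    Hypothetical sketch: a single smoothing parameter `sigma` is shared by
    all features and classes.
    """

    def __init__(self, sigma=1.0):
        self.sigma = sigma

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)
        # Pattern layer: store the training patterns, grouped by class.
        self.patterns_ = [X[y == c] for c in self.classes_]
        # Empirical class priors P(Theta = k).
        self.priors_ = np.array([len(p) / len(X) for p in self.patterns_])
        return self

    def _class_densities(self, X):
        X = np.asarray(X, dtype=float)
        d = X.shape[1]
        norm_const = (2.0 * np.pi) ** (d / 2.0) * self.sigma ** d
        densities = np.empty((X.shape[0], len(self.classes_)))
        for k, patterns in enumerate(self.patterns_):
            # Pattern layer: squared distance from each query point to each stored pattern.
            sq_dist = ((X[:, None, :] - patterns[None, :, :]) ** 2).sum(axis=2)
            # Summation layer: average of the Gaussian kernel activations of class k.
            densities[:, k] = np.exp(-sq_dist / (2.0 * self.sigma ** 2)).mean(axis=1) / norm_const
        return densities

    def predict_proba(self, X):
        # Output layer: prior-weighted densities normalized into posteriors (MAP-ready).
        weighted = self._class_densities(X) * self.priors_
        return weighted / weighted.sum(axis=1, keepdims=True)

    def predict(self, X):
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]
```

Here `sigma` plays the role of the smoothing parameter h(n) and is typically chosen by cross-validation.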

Check out the PNN repo for more info.
