
Software for Matrix Variate LDA and QDA

In the previous post, I had some rough notes on classification of matrix variate data. The MixMatrix package now has functions for training a linear or quadratic classifier on such data. The usage is similar to the functions MASS::lda() or MASS::qda(); however, the input must be an array or a list of matrices, with the group variable provided as a vector. That is, it cannot handle data frames or the formula interface directly, which is reasonable, as there is no immediately clear way to make those work for a collection of matrices; anybody using it would have to roll their own solution anyway.

set.seed(20180222)
library('MixMatrix')
# simulate three groups of 30 matrix variate normal observations (2 x 3),
# differing only in their mean matrices
A <- rmatrixnorm(30, mean = matrix(0, nrow = 2, ncol = 3))
B <- rmatrixnorm(30, mean = matrix(c(1, 0), nrow = 2, ncol = 3))
C <- rmatrixnorm(30, mean = matrix(c(0, 1), nrow = 2, ncol = 3))
# stack into a 2 x 3 x 90 array with a group label for each observation
ABC <- array(c(A, B, C), dim = c(2, 3, 90))
groups <- factor(c(rep("A", 30), rep("B", 30), rep("C", 30)))
prior <- c(30, 30, 30) / 90
matlda <- matrixlda(x = ABC, grouping = groups, prior = prior)
matqda <- matrixqda(x = ABC, grouping = groups, prior = prior)

This does not sphere the data or extract an SVD or Fisher discriminant scores; it is a simple linear/quadratic discriminant function based on the matrix variate normal likelihood (see the previous post for details). As such, the reported posteriors are based on the normal distribution.

The matrixlda function presumes equal covariance parameters across groups, while matrixqda fits separate covariance parameters for each group.

It is possible to set variance or mean restrictions using the ... argument. These restrictions are applied to all groups; see the sketch below.
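
For instance, something like the following (a hedged sketch: row.mean and row.variance are assumed here to be among the arguments passed through ... to the package's maximum likelihood fitting routine):

# assumed pass-through arguments: force a common mean within each row
# and an AR(1) structure on the row covariance, applied to all groups
matlda_restricted <- matrixlda(x = ABC, grouping = groups, prior = prior,
                               row.mean = TRUE, row.variance = "AR(1)")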

The predict method for these objects works in much the same way as for lda or qda objects: provide the fitted object and the new data, and it will return the MAP class assignments and the posterior probabilities. If no new data is provided, it will attempt to run on the original data, if that is still available in the environment.
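
Based on that default behavior, the following two calls should agree, assuming ABC is still in scope:

pred_default  <- predict(matlda)       # falls back to the original data
pred_explicit <- predict(matlda, ABC)  # the same data, passed explicitly
identical(pred_default$class, pred_explicit$class)  # expected: TRUE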

ABC[,,c(1,31,61)] # true class memberships: A, B, C
## , , 1
## 
##             [,1]       [,2]      [,3]
## [1,] -0.07020924  0.8559433 0.9827602
## [2,]  1.30097854 -0.6893054 0.2830575
## 
## , , 2
## 
##           [,1]     [,2]        [,3]
## [1,] 2.2374662 2.408268 -0.37053822
## [2,] 0.9653824 0.630897 -0.08850761
## 
## , , 3
## 
##            [,1]       [,2]      [,3]
## [1,] -1.6961217 -1.5642705 0.2045711
## [2,]  0.7782369  0.1842012 0.3869149
predict(matlda, ABC[,,c(1,31,61)])
## $class
## [1] B B A
## Levels: A B C
## 
## $posterior
##            [,1]        [,2]       [,3]
## [1,] 0.27340107 0.690217317 0.03638161
## [2,] 0.03833953 0.949049289 0.01261118
## [3,] 0.54647035 0.001273576 0.45225607
predict(matqda,  ABC[,,c(1,31,61)])
## $class
## [1] B B A
## Levels: A B C
## 
## $posterior
##            [,1]        [,2]       [,3]
## [1,] 0.24302341 0.735815885 0.02116070
## [2,] 0.03295848 0.963641160 0.00340036
## [3,] 0.54611977 0.007871269 0.44600896

In this example, one point each from classes A, B, and C was selected, and they were classified as B, B, and A. The QDA and LDA rules had only minor disagreements, which is to be expected, since the groups here truly do have the same covariances.
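
To quantify that agreement across the whole training set, here is a short sketch using only the objects and functions from above:

# cross-tabulate LDA and QDA class assignments on the training array
lda_class <- predict(matlda, ABC)$class
qda_class <- predict(matqda, ABC)$class
table(lda_class, qda_class)   # off-diagonal entries are disagreements
mean(lda_class == qda_class)  # proportion of matching assignments
mean(lda_class == groups)     # training-set accuracy for the LDA rule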

Mathematical detail

The discriminant score \(R_g\) for each group \(g\) in a QDA setting is:

\[\begin{eqnarray} R_g(\mathbf{X}) & = & \mathrm{trace}\big[ -\tfrac{1}{2}\mathbf{V}_g^{-1} \mathbf{X}^{T} \mathbf{U}_g^{-1} \mathbf{X} + \mathbf{V}_g^{-1} \mathbf{M}_g^{T} \mathbf{U}_g^{-1} \mathbf{X} - \tfrac{1}{2}\mathbf{V}_g^{-1} \mathbf{M}_g^{T} \mathbf{U}_g^{-1} \mathbf{M}_g \big] \\ & & -\tfrac{1}{2}\big(p \log|\mathbf{U}_g| + n\log|\mathbf{V}_g|\big) + \log p_g \end{eqnarray}\] When \(\mathbf{U}_i = \mathbf{U}_j\) and \(\mathbf{V}_i = \mathbf{V}_j\) for all groups \(i, j\) (the LDA setting), several of these terms cancel. The posterior probability is:

\[ P(\mathbf{X} \in g) = \frac{ \exp (R_g (\mathbf{X}))}{\sum_i \exp (R_i(\mathbf{X}))} \] with the bottom sum being over all groups \(i\).
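
As a self-contained illustration of these two formulas (the parameters below are hypothetical, not extracted from the fitted objects above):

tr <- function(A) sum(diag(A))  # matrix trace

# discriminant score R_g(X) for an n x p observation X, given group mean M,
# row covariance U (n x n), column covariance V (p x p), and prior p_g
R_g <- function(X, M, U, V, prior) {
  n <- nrow(X); p <- ncol(X)
  Uinv <- solve(U); Vinv <- solve(V)
  tr(-0.5 * Vinv %*% t(X) %*% Uinv %*% X +
       Vinv %*% t(M) %*% Uinv %*% X -
       0.5 * Vinv %*% t(M) %*% Uinv %*% M) -
    0.5 * (p * as.numeric(determinant(U)$modulus) +
             n * as.numeric(determinant(V)$modulus)) +
    log(prior)
}

# two hypothetical groups with shared identity covariances, different means
X  <- matrix(rnorm(6), nrow = 2, ncol = 3)
Ms <- list(matrix(0, 2, 3), matrix(1, 2, 3))
R  <- sapply(Ms, function(M) R_g(X, M, diag(2), diag(3), prior = 0.5))
exp(R - max(R)) / sum(exp(R - max(R)))  # the posterior, computed stably

Exponentiating and normalizing the scores, as in the last line, reproduces the posterior formula above.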