The MixMatrix package provides two functions for training a linear or quadratic classifier on matrix-variate data. The usage is similar to MASS::lda() or MASS::qda(); however, the input must be an array and the group variable must be provided as a vector (that is, these functions cannot handle data frames or the formula interface directly, which is reasonable, as there is no immediately clear way to make that work for a collection of matrices).

set.seed(20180222)
library('MixMatrix')
A <- rmatrixnorm(30, mean = matrix(0, nrow = 2, ncol = 3)) # creating the three groups
B <- rmatrixnorm(30, mean = matrix(c(1, 0), nrow = 2, ncol = 3))
C <- rmatrixnorm(30, mean = matrix(c(0, 1), nrow = 2, ncol = 3))
ABC <- array(c(A, B, C), dim = c(2, 3, 90)) # combining into one array
groups <- factor(c(rep("A", 30), rep("B", 30), rep("C", 30))) # labels
prior <- c(30, 30, 30) / 90 # equal prior
matlda <- matrixlda(x = ABC, grouping = groups, prior = prior) # perform LDA
matqda <- matrixqda(x = ABC, grouping = groups, prior = prior) # perform QDA

This does not sphere the data or extract an SVD or Fisher discriminant scores; it is a simple linear/quadratic discriminant function based on the likelihood.

The matrixlda function presumes equal covariance matrices among the groups, while matrixqda fits separate covariances for each group.

It is possible to set variance or mean restrictions from the MLmatrixnorm and MLmatrixt functions using the ... argument; these restrictions are applied to all groups.
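For example (a sketch; the argument names below, such as row.mean and row.variance, are taken from MLmatrixnorm and are assumed to be forwarded unchanged):

# hypothetical restricted fits: arguments are passed via '...' to the ML fitting functions
matlda_rowmean <- matrixlda(x = ABC, grouping = groups, prior = prior,
                            row.mean = TRUE)        # common mean within each row
matqda_ar1 <- matrixqda(x = ABC, grouping = groups, prior = prior,
                        row.variance = "AR(1)")     # AR(1) structure on the row covariance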

The predict method for these objects works in much the same way as for lda or qda objects: provide the fitted object and the new data, and it will return the MAP class assignments and the posterior probabilities. If no new data is provided, it will attempt to run on the original data, if that is still available in the environment.

ABC[, , c(1, 31, 61)] # true class memberships: A, B, C
#> , , 1
#> 
#>             [,1]       [,2]      [,3]
#> [1,] -0.07020924  0.8559433 0.9827602
#> [2,]  1.30097854 -0.6893054 0.2830575
#> 
#> , , 2
#> 
#>           [,1]     [,2]        [,3]
#> [1,] 2.2374662 2.408268 -0.37053822
#> [2,] 0.9653824 0.630897 -0.08850761
#> 
#> , , 3
#> 
#>            [,1]       [,2]      [,3]
#> [1,] -1.6961217 -1.5642705 0.2045711
#> [2,]  0.7782369  0.1842012 0.3869149
# predict the membership of the first observation of each group
predict(matlda, ABC[, , c(1, 31, 61)])
#> $class
#> [1] B B A
#> Levels: A B C
#> 
#> $posterior
#>            [,1]        [,2]       [,3]
#> [1,] 0.27340107 0.690217317 0.03638161
#> [2,] 0.03833953 0.949049289 0.01261118
#> [3,] 0.54647035 0.001273576 0.45225607
# predict the membership of the first observation of each group
predict(matqda, ABC[, , c(1, 31, 61)])
#> $class
#> [1] B B A
#> Levels: A B C
#> 
#> $posterior
#>            [,1]        [,2]       [,3]
#> [1,] 0.24302341 0.735815885 0.02116070
#> [2,] 0.03295848 0.963641160 0.00340036
#> [3,] 0.54611977 0.007871269 0.44600896

In this example, the first observation from each of classes A, B, and C was selected, and they were classified as B, B, and A. The QDA and LDA rules had only minor disagreements, which is to be expected, since the groups truly do have the same covariances.
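Since predict falls back to the original data when no new data is given, a quick way to summarize training-set performance is a confusion table (a sketch; it assumes ABC and groups are still in the environment):

# training-set confusion matrix for the LDA fit
table(predicted = predict(matlda)$class, actual = groups)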

Details of the modeling choices

Suppose there are two populations, $\pi_1$ and $\pi_2$, with prior probabilities of belonging to these classes, $p_1$ and $p_2$. Define $c(1|2)$ as the cost of misclassifying a member of population $\pi_2$ as a member of class 1 (and vice versa). Further, define $P(1|2)$ as the probability of classifying a member of population $\pi_2$ as a member of class 1 (and vice versa). Then we define the expected cost of misclassification as:

$$ECM = c(2|1)P(2|1)p_1 + c(1|2)P(1|2)p_2$$
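As a quick numeric illustration with hypothetical values: suppose $c(2|1) = 2$, $c(1|2) = 1$, $p_1 = 0.3$, $p_2 = 0.7$, and misclassification rates $P(2|1) = 0.1$ and $P(1|2) = 0.2$. Then:

# ECM = c(2|1) P(2|1) p1 + c(1|2) P(1|2) p2, with made-up costs and rates
2 * 0.1 * 0.3 + 1 * 0.2 * 0.7 # 0.06 + 0.14 = 0.2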

Classification Rule

A reasonable classification rule based on ECM is the following:

Classify as class 1 if:

$$\frac{f_1(x)}{f_2(x)} \geq \frac{c(1|2) p_2}{c(2|1) p_1}$$

where $f_i(x)$ is the probability density function for group $\pi_i$ evaluated at $x$.

Matrix Variate Normal Populations

Recall the probability density function for a matrix variate normal distribution:

$$f(\mathbf{X};\mathbf{M}, \mathbf{U}, \mathbf{V}) = \frac{\exp\left( -\frac{1}{2} \, \mathrm{tr}\left[ \mathbf{V}^{-1} (\mathbf{X} - \mathbf{M})^{T} \mathbf{U}^{-1} (\mathbf{X} - \mathbf{M}) \right] \right)}{(2\pi)^{np/2} |\mathbf{V}|^{n/2} |\mathbf{U}|^{p/2}}$$

Here $\mathbf{X}$ and $\mathbf{M}$ are $n \times p$, $\mathbf{U}$ is $n \times n$ and describes the covariance relationship between the rows, and $\mathbf{V}$ is $p \times p$ and describes the covariance relationship between the columns.
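As a notation check, here is a minimal base-R sketch of the log of this density, written directly from the formula (illustrative only, not the package's internal implementation):

# log matrix-variate normal density, straight from the formula above
logdmatnorm <- function(X, M, U, V) {
  n <- nrow(X); p <- ncol(X)
  E <- X - M # centered matrix
  -0.5 * sum(diag(solve(V) %*% t(E) %*% solve(U) %*% E)) -
    (n * p / 2) * log(2 * pi) - (n / 2) * log(det(V)) - (p / 2) * log(det(U))
}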

Estimated Minimum ECM Rule for Two Matrix Variate Normal Populations

A decision rule for this case:

\begin{eqnarray}
R(\mathbf{X}) & = & \mathrm{trace}\big[ -\frac{1}{2}(\mathbf{V}_1^{-1} \mathbf{X}^{T} \mathbf{U}_1^{-1} \mathbf{X} - \mathbf{V}_2^{-1} \mathbf{X}^{T} \mathbf{U}_2^{-1} \mathbf{X}) \\
& & + (\mathbf{V}_1^{-1} \mathbf{M}_1^{T} \mathbf{U}_1^{-1} - \mathbf{V}_2^{-1} \mathbf{M}_2^{T} \mathbf{U}_2^{-1}) \mathbf{X} \\
& & - \frac{1}{2}(\mathbf{V}_1^{-1} \mathbf{M}_1^{T} \mathbf{U}_1^{-1} \mathbf{M}_1 - \mathbf{V}_2^{-1} \mathbf{M}_2^{T} \mathbf{U}_2^{-1} \mathbf{M}_2) \big] \\
& & - \frac{1}{2}\big(p (\log|\mathbf{U}_1| - \log|\mathbf{U}_2|) + n (\log|\mathbf{V}_1| - \log|\mathbf{V}_2|)\big)
\end{eqnarray}

To classify based on this rule: if $R(\mathbf{X}) \geq \log(c(1|2)p_2) - \log(c(2|1)p_1)$, assign $\mathbf{X}$ to group 1; otherwise assign it to group 2.
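A direct, term-by-term base-R transcription of $R(\mathbf{X})$ (a sketch with all parameters supplied explicitly; the function name is hypothetical):

# discriminant score R(X) for two matrix normal populations
R_score <- function(X, M1, M2, U1, U2, V1, V2) {
  n <- nrow(X); p <- ncol(X)
  tr <- function(A) sum(diag(A))
  quad  <- -0.5 * tr(solve(V1) %*% t(X) %*% solve(U1) %*% X -
                       solve(V2) %*% t(X) %*% solve(U2) %*% X)    # term (1)
  lin   <- tr((solve(V1) %*% t(M1) %*% solve(U1) -
                 solve(V2) %*% t(M2) %*% solve(U2)) %*% X)        # term (2)
  const <- -0.5 * tr(solve(V1) %*% t(M1) %*% solve(U1) %*% M1 -
                       solve(V2) %*% t(M2) %*% solve(U2) %*% M2)  # term (3)
  dets  <- -0.5 * (p * (log(det(U1)) - log(det(U2))) +
                     n * (log(det(V1)) - log(det(V2))))           # term (4)
  quad + lin + const + dets
}
# classify to group 1 when R_score(X, ...) >= log(c12 * p2) - log(c21 * p1)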

In the multivariate case, this is equivalent to the LDA/QDA rules: term (1) is the quadratic form, which vanishes when the groups share a common covariance; term (2) is the LDA term; and terms (3) and (4) are constants which depend on the parameters and not on $\mathbf{X}$.

Typically, the models we have used implicitly assume equal prior probabilities and an equal cost of misclassification, but other inputs can be specified. With equal priors and equal costs of misclassification, the right-hand side of the rule is 0.

If there are equal covariances

If the two groups have the same covariances, the rule simplifies. The classification rule is then:

\begin{eqnarray}
R(\mathbf{X}) & = & \mathrm{trace}\big[ (\mathbf{V}^{-1} (\mathbf{M}_1 - \mathbf{M}_2)^{T} \mathbf{U}^{-1}) \mathbf{X} \\
& & - \frac{1}{2}(\mathbf{V}^{-1} \mathbf{M}_1^{T} \mathbf{U}^{-1} \mathbf{M}_1 - \mathbf{V}^{-1} \mathbf{M}_2^{T} \mathbf{U}^{-1} \mathbf{M}_2) \big] \\
& \geq & \log(c(1|2)p_2) - \log(c(2|1)p_1)
\end{eqnarray}

Classify to group 1 if the inequality holds. Note that $R(\mathbf{X})$ is linear in $\mathbf{X}$; this is directly analogous to LDA in the multivariate case.

Generalizing to multiple classes

The discriminant score $R_g$ for each group $g$ in a QDA setting is:

\begin{eqnarray}
R_g(\mathbf{X}) & = & \mathrm{trace}\big[ -\frac{1}{2}(\mathbf{V}_g^{-1} \mathbf{X}^{T} \mathbf{U}_g^{-1} \mathbf{X}) + (\mathbf{V}_g^{-1} \mathbf{M}_g^{T} \mathbf{U}_g^{-1}) \mathbf{X} - \frac{1}{2}(\mathbf{V}_g^{-1} \mathbf{M}_g^{T} \mathbf{U}_g^{-1} \mathbf{M}_g) \big] \\
& & - \frac{1}{2}\big(p \log|\mathbf{U}_g| + n \log|\mathbf{V}_g|\big) + \log p_g
\end{eqnarray}

When $\mathbf{U}_i = \mathbf{U}_j$ and $\mathbf{V}_i = \mathbf{V}_j$ for all groups $i, j$ (as matrixlda presumes), several of these terms cancel. The posterior probability is:

$$P(\mathbf{X} \in g) = \frac{\exp(R_g(\mathbf{X}))}{\sum_i \exp(R_i(\mathbf{X}))}$$

with the sum in the denominator taken over all groups $i$.
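Putting the pieces together, here is a sketch of the multi-class computation (Ms, Us, Vs, priors, and X are hypothetical: lists of group parameters, a vector of prior probabilities, and one observation):

# score one group, then softmax the scores into posterior probabilities
R_g <- function(X, M, U, V, prior) {
  n <- nrow(X); p <- ncol(X)
  tr <- function(A) sum(diag(A))
  tr(-0.5 * solve(V) %*% t(X) %*% solve(U) %*% X +
       solve(V) %*% t(M) %*% solve(U) %*% X -
       0.5 * solve(V) %*% t(M) %*% solve(U) %*% M) -
    0.5 * (p * log(det(U)) + n * log(det(V))) + log(prior)
}
scores <- mapply(function(M, U, V, pr) R_g(X, M, U, V, pr), Ms, Us, Vs, priors)
posterior <- exp(scores - max(scores)) / sum(exp(scores - max(scores))) # stabilized softmax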

Structure of the objects

The objects returned by matrixlda and matrixqda are S3 objects of class matrixlda and matrixqda, respectively. Example output is shown below:

matlda
#> $prior
#>         A         B         C 
#> 0.3333333 0.3333333 0.3333333 
#> 
#> $counts
#>  A  B  C 
#> 30 30 30 
#> 
#> $means
#> , , 1
#> 
#>            [,1]        [,2]       [,3]
#> [1,] 0.08629672 -0.06362916 -0.3045685
#> [2,] 0.03851982 -0.01375580  0.3978759
#> 
#> , , 2
#> 
#>           [,1]       [,2]       [,3]
#> [1,] 1.2872487  1.1199220 1.01282774
#> [2,] 0.3044151 -0.3797203 0.06539076
#> 
#> , , 3
#> 
#>            [,1]       [,2]       [,3]
#> [1,] 0.07377351 -0.3278233 -0.3031961
#> [2,] 0.92568372  1.3738475  0.9353985
#> 
#> 
#> $scaling
#> [1] 1.002299
#> 
#> $U
#>            [,1]       [,2]
#> [1,] 1.00000000 0.03243759
#> [2,] 0.03243759 0.97819549
#> 
#> $V
#>             [,1]        [,2]        [,3]
#> [1,]  1.00000000  0.01988966 -0.04879263
#> [2,]  0.01988966  0.93839973 -0.04563610
#> [3,] -0.04879263 -0.04563610  0.93321444
#> 
#> $lev
#> [1] "A" "B" "C"
#> 
#> $N
#> [1] 90
#> 
#> $method
#> [1] "normal"
#> 
#> $nu
#> NULL
#> 
#> $call
#> matrixlda(x = ABC, grouping = groups, prior = prior)
#> 
#> attr(,"class")
#> [1] "matrixlda"

matqda
#> $prior
#>         A         B         C 
#> 0.3333333 0.3333333 0.3333333 
#> 
#> $counts
#>  A  B  C 
#> 30 30 30 
#> 
#> $means
#> , , 1
#> 
#>            [,1]        [,2]       [,3]
#> [1,] 0.08629672 -0.06362916 -0.3045685
#> [2,] 0.03851982 -0.01375580  0.3978759
#> 
#> , , 2
#> 
#>           [,1]       [,2]       [,3]
#> [1,] 1.2872487  1.1199220 1.01282774
#> [2,] 0.3044151 -0.3797203 0.06539076
#> 
#> , , 3
#> 
#>            [,1]       [,2]       [,3]
#> [1,] 0.07377351 -0.3278233 -0.3031961
#> [2,] 0.92568372  1.3738475  0.9353985
#> 
#> 
#> $U
#> , , 1
#> 
#>            [,1]       [,2]
#> [1,] 1.00000000 0.02620514
#> [2,] 0.02620514 0.95989647
#> 
#> , , 2
#> 
#>            [,1]       [,2]
#> [1,]  1.0000000 -0.0173514
#> [2,] -0.0173514  0.9456145
#> 
#> , , 3
#> 
#>          [,1]     [,2]
#> [1,] 1.000000 0.129851
#> [2,] 0.129851 1.061265
#> 
#> 
#> $V
#> , , 1
#> 
#>             [,1]        [,2]        [,3]
#> [1,]  1.05498245 -0.07418973  0.09476411
#> [2,] -0.07418973  0.98800103 -0.04390729
#> [3,]  0.09476411 -0.04390729  0.79903040
#> 
#> , , 2
#> 
#>             [,1]       [,2]        [,3]
#> [1,]  1.09941477 0.12678075 -0.03952663
#> [2,]  0.12678075 0.94606849  0.03910645
#> [3,] -0.03952663 0.03910645  0.85913748
#> 
#> , , 3
#> 
#>              [,1]         [,2]       [,3]
#> [1,]  0.833193637  0.008466708 -0.2084547
#> [2,]  0.008466708  0.875328749 -0.1142510
#> [3,] -0.208454704 -0.114250988  1.1520533
#> 
#> 
#> $lev
#> [1] "A" "B" "C"
#> 
#> $N
#> [1] 90
#> 
#> $method
#> [1] "normal"
#> 
#> $nu
#> NULL
#> 
#> $call
#> matrixqda(x = ABC, grouping = groups, prior = prior)
#> 
#> attr(,"class")
#> [1] "matrixqda"

Session info

This vignette was built using rmarkdown.

sessionInfo()
#> R version 4.4.1 (2024-06-14)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 22.04.5 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0
#> 
#> locale:
#>  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
#>  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
#>  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
#> [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
#> 
#> time zone: UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] MixMatrix_0.2.8
#> 
#> loaded via a namespace (and not attached):
#>  [1] digest_0.6.37     desc_1.4.3        R6_2.5.1          fastmap_1.2.0    
#>  [5] xfun_0.47         glue_1.7.0        cachem_1.1.0      knitr_1.48       
#>  [9] htmltools_0.5.8.1 rmarkdown_2.28    lifecycle_1.0.4   cli_3.6.3        
#> [13] pkgdown_2.1.1     sass_0.4.9        textshaping_0.4.0 jquerylib_0.1.4  
#> [17] systemfonts_1.1.0 compiler_4.4.1    tools_4.4.1       ragg_1.3.3       
#> [21] evaluate_1.0.0    bslib_0.8.0       Rcpp_1.0.13       yaml_2.3.10      
#> [25] jsonlite_1.8.9    rlang_1.1.4       fs_1.6.4

All the code for easy copying

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
set.seed(20180222)
library('MixMatrix')
A <- rmatrixnorm(30, mean = matrix(0, nrow = 2, ncol = 3)) # creating the three groups
B <- rmatrixnorm(30, mean = matrix(c(1, 0), nrow = 2, ncol = 3))
C <- rmatrixnorm(30, mean = matrix(c(0, 1), nrow = 2, ncol = 3))
ABC <- array(c(A, B, C), dim = c(2, 3, 90)) # combining into one array
groups <- factor(c(rep("A", 30), rep("B", 30), rep("C", 30))) # labels
prior <- c(30, 30, 30) / 90 # equal prior
matlda <- matrixlda(x = ABC, grouping = groups, prior = prior) # perform LDA
matqda <- matrixqda(x = ABC, grouping = groups, prior = prior) # perform QDA
ABC[, , c(1, 31, 61)] # true class memberships: A, B, C
# predict the membership of the first observation of each group
predict(matlda, ABC[, , c(1, 31, 61)])
# predict the membership of the first observation of each group
predict(matqda, ABC[, , c(1, 31, 61)])

matlda

matqda

sessionInfo()