
Notes on Discriminant Analysis for Matrix Variate Distributions

I have some brief notes for a discussion here, so I’m posting them even though they’re a little incomplete, because why not? The topic is two-class classification for matrix variate normal distributions.

Expected Cost of Misclassification

ECM stands for expected cost of misclassification. Suppose there are two populations, \(\pi_1\) and \(\pi_2\), with prior probabilities of belonging to these classes, \(p_1\) and \(p_2\). Define \(c(1|2)\) as the cost of misclassifying a member of population \(\pi_2\) as a member of class \(1\) (and vice versa). Further, define \(P(1|2)\) as the probability of classifying a member of population \(\pi_2\) as a member of class \(1\) (and vice versa). Then the expected cost of misclassification is:

\[ECM = c(2|1)P(2|1)p_1 + c(1|2)P(1|2)p_2 \]
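To make the formula concrete, here is a tiny numeric sketch in Python. All of the costs, priors, and misclassification probabilities below are made-up values chosen just for illustration:

```python
# Hypothetical inputs for the ECM formula (not from any real model).
c_2_given_1 = 10.0   # cost of classifying a member of pi_1 as class 2
c_1_given_2 = 5.0    # cost of classifying a member of pi_2 as class 1
P_2_given_1 = 0.10   # probability of misclassifying a member of pi_1
P_1_given_2 = 0.20   # probability of misclassifying a member of pi_2
p_1, p_2 = 0.5, 0.5  # prior probabilities of the two populations

ecm = c_2_given_1 * P_2_given_1 * p_1 + c_1_given_2 * P_1_given_2 * p_2
print(ecm)  # 1.0
```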

Classification Rule

A reasonable classification rule based on ECM is the following:

Classify as class \(1\) if:

\[ \frac{f_1(x)}{f_2(x)} \geq \frac{c(1|2) p_2}{c(2|1)p_1} \]

Where \(f_i(x)\) is the probability density function for group \(\pi_i\) evaluated at \(x\).
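A minimal sketch of this rule in Python, assuming the caller supplies the two density functions; the function and parameter names (`classify_min_ecm`, `c12`, `c21`, and so on) are just illustrative:

```python
def classify_min_ecm(x, f1, f2, c12, c21, p1, p2):
    """Classify x to class 1 or 2 by the minimum-ECM rule.

    f1, f2 : callables returning the density of each population at x
    c12    : cost of classifying a member of pi_2 as class 1
    c21    : cost of classifying a member of pi_1 as class 2
    p1, p2 : prior probabilities of the two populations
    """
    # Classify to class 1 when the density ratio exceeds the cost/prior threshold.
    threshold = (c12 * p2) / (c21 * p1)
    return 1 if f1(x) / f2(x) >= threshold else 2
```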

Matrix Variate Normal Populations

Recall the probability density function for a matrix variate normal distribution:

\[f(\mathbf{X};\mathbf{M}, \mathbf{U}, \mathbf{V}) = \frac{\exp\left( -\frac{1}{2} \, \mathrm{tr}\left[ \mathbf{V}^{-1} (\mathbf{X} - \mathbf{M})^{T} \mathbf{U}^{-1} (\mathbf{X} - \mathbf{M}) \right] \right)}{(2\pi)^{np/2} |\mathbf{V}|^{n/2} |\mathbf{U}|^{p/2}} \]

\(\mathbf{X}\) and \(\mathbf{M}\) are \(n \times p\), \(\mathbf{U}\) is \(n \times n\) and describes the covariance relationship between the rows, and \(\mathbf{V}\) is \(p \times p\) and describes the covariance relationship between the columns.
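Here is a sketch of the log-density written directly from the formula above, with variable names following the notation in the text, and checked against `scipy.stats.matrix_normal` as a sanity test:

```python
import numpy as np
from scipy.stats import matrix_normal

def matrix_normal_logpdf(X, M, U, V):
    """Log-density of the matrix variate normal, following the formula above."""
    n, p = X.shape
    D = X - M
    # tr[ V^{-1} (X - M)^T U^{-1} (X - M) ]
    quad = np.trace(np.linalg.solve(V, D.T) @ np.linalg.solve(U, D))
    # log of the normalizing constant: (2 pi)^{np/2} |V|^{n/2} |U|^{p/2}
    log_norm = (n * p / 2) * np.log(2 * np.pi) \
        + (n / 2) * np.linalg.slogdet(V)[1] \
        + (p / 2) * np.linalg.slogdet(U)[1]
    return -0.5 * quad - log_norm

# Quick check against scipy on random positive definite U and V.
rng = np.random.default_rng(0)
n, p = 3, 2
M = rng.normal(size=(n, p))
A, B = rng.normal(size=(n, n)), rng.normal(size=(p, p))
U, V = A @ A.T + n * np.eye(n), B @ B.T + p * np.eye(p)
X = rng.normal(size=(n, p))
print(np.isclose(matrix_normal_logpdf(X, M, U, V),
                 matrix_normal.logpdf(X, mean=M, rowcov=U, colcov=V)))
```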

Estimated Minimum ECM Rule for Two Matrix Variate Normal Populations

A decision rule for this case:

\[\begin{eqnarray} R(\mathbf{X}) & = & \mathrm{trace}\big[ -\frac{1}{2}(\mathbf{V}_1^{-1} \mathbf{X}^{T} \mathbf{U}_1^{-1} \mathbf{X} - \mathbf{V}_2^{-1} \mathbf{X}^{T} \mathbf{U}_2^{-1} \mathbf{X}) \\ & & +(\mathbf{V}_1^{-1} \mathbf{M}_1^{T} \mathbf{U}_1^{-1} - \mathbf{V}_2^{-1} \mathbf{M}_2^{T} \mathbf{U}_2^{-1}) \mathbf{X} \\ & & -\frac{1}{2}(\mathbf{V}_1^{-1} \mathbf{M}_1^{T} \mathbf{U}_1^{-1} \mathbf{M}_1 - \mathbf{V}_2^{-1} \mathbf{M}_2^{T} \mathbf{U}_2^{-1} \mathbf{M}_2) \big] \\ & & -\frac{1}{2}(p (\log|\mathbf{U}_1|-\log|\mathbf{U}_2|)+ n(\log|\mathbf{V}_1|-\log|\mathbf{V}_2|) ) \end{eqnarray}\]

How to classify based on this:

If \(R(\mathbf{X}) \geq \log(c(1|2)p_2) - \log(c(2|1)p_1)\), assign \(\mathbf{X}\) to group \(1\), otherwise assign to group \(2\).
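A sketch of this decision rule in NumPy, assuming the parameter estimates \((\mathbf{M}_i, \mathbf{U}_i, \mathbf{V}_i)\) have already been obtained for each group; the function names `R` and `classify` and the argument layout are just illustrative:

```python
import numpy as np

def R(X, M1, U1, V1, M2, U2, V2):
    """Discriminant R(X) for two matrix variate normal populations."""
    n, p = X.shape
    U1iX, U2iX = np.linalg.solve(U1, X), np.linalg.solve(U2, X)
    U1iM1, U2iM2 = np.linalg.solve(U1, M1), np.linalg.solve(U2, M2)
    # quadratic term: tr[ V1^{-1} X^T U1^{-1} X - V2^{-1} X^T U2^{-1} X ]
    quad = np.trace(np.linalg.solve(V1, X.T) @ U1iX
                    - np.linalg.solve(V2, X.T) @ U2iX)
    # linear term: tr[ (V1^{-1} M1^T U1^{-1} - V2^{-1} M2^T U2^{-1}) X ]
    linear = np.trace((np.linalg.solve(V1, M1.T) @ np.linalg.inv(U1)
                       - np.linalg.solve(V2, M2.T) @ np.linalg.inv(U2)) @ X)
    # constant term from the means
    const = np.trace(np.linalg.solve(V1, M1.T) @ U1iM1
                     - np.linalg.solve(V2, M2.T) @ U2iM2)
    # log-determinant term
    logdets = (p * (np.linalg.slogdet(U1)[1] - np.linalg.slogdet(U2)[1])
               + n * (np.linalg.slogdet(V1)[1] - np.linalg.slogdet(V2)[1]))
    return -0.5 * quad + linear - 0.5 * const - 0.5 * logdets

def classify(X, params1, params2, c12=1.0, c21=1.0, p1=0.5, p2=0.5):
    """params1 = (M1, U1, V1), params2 = (M2, U2, V2)."""
    threshold = np.log(c12 * p2) - np.log(c21 * p1)
    return 1 if R(X, *params1, *params2) >= threshold else 2
```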

In the multivariate case, this is equivalent to the LDA/QDA rules: term (1) is the quadratic form, which vanishes when the groups share the same covariances; term (2) is the LDA (linear) term; and terms (3) and (4) are constants that depend on the parameters but not on \(\mathbf{X}\).

Typically, the models we have used implicitly assume an equal-probability prior and an equal cost of misclassification, but other inputs can be specified. With equal priors and equal costs of misclassification, the threshold on the right-hand side is 0.

Equal Covariances

If the two groups share the same covariance matrices \(\mathbf{U}\) and \(\mathbf{V}\), the rule simplifies to: \[\begin{eqnarray} R(\mathbf{X}) & = & \mathrm{trace}\big[ (\mathbf{V}^{-1} (\mathbf{M}_1 -\mathbf{M}_2)^{T}\mathbf{U}^{-1}) \mathbf{X} \\ & & -\frac{1}{2}(\mathbf{V}^{-1} \mathbf{M}_1^{T} \mathbf{U}^{-1} \mathbf{M}_1 - \mathbf{V}^{-1} \mathbf{M}_2^{T} \mathbf{U}^{-1} \mathbf{M}_2) \big] \\ & \geq & \log(c(1|2)p_2) - \log(c(2|1)p_1) \end{eqnarray}\]

Classify to group \(1\) if the inequality holds. Note that this rule is linear in \(\mathbf{X}\); it is directly analogous to LDA in the multivariate case.
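A corresponding sketch of the linear rule, assuming pooled estimates of the shared \(\mathbf{U}\) and \(\mathbf{V}\); again, the function name `R_linear` and argument names are illustrative:

```python
import numpy as np

def R_linear(X, M1, M2, U, V):
    """Discriminant for the equal-covariance case; linear in X."""
    Uinv = np.linalg.inv(U)
    # tr[ (V^{-1} (M1 - M2)^T U^{-1}) X ]
    linear = np.trace(np.linalg.solve(V, (M1 - M2).T) @ Uinv @ X)
    # tr[ V^{-1} M1^T U^{-1} M1 - V^{-1} M2^T U^{-1} M2 ]
    const = np.trace(np.linalg.solve(V, M1.T) @ Uinv @ M1
                     - np.linalg.solve(V, M2.T) @ Uinv @ M2)
    return linear - 0.5 * const

# Classify to group 1 if R_linear(X, M1, M2, U, V) >= log(c12 * p2) - log(c21 * p1).
```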