Bayesian theory

This section discusses the concepts behind using Bayesian techniques for model selection. The first part covers Bayes' theorem and how it can be used to get the probability of the model given the data. The second part covers the odds factor and how to interpret the posterior probability for model selection.

Bayes Theorem

Bayesian inference is used to calculate the whole posterior probability distribution function (PDF). The equation for the posterior probability can be written as

(1)\[P(\underline{\theta} | D, M) = P(D | \underline{\theta}, M)\frac{P(\underline{\theta} | M)}{P(D | M)},\]

where \(\underline{\theta}\) is a vector of model parameters, \(M\) is the model and \(D\) is the data. \(P(D | \underline{\theta}, M)\) is the likelihood of the data for a given set of parameter values. \(P(\underline{\theta} | M)\) is the prior distribution and represents current knowledge of the system. \(P(D | M)\) is the evidence and acts as a normalisation for the posterior. The shape and spread of the posterior PDF provide insight into the model and its parameters. A narrow posterior PDF suggests that the parameter is well defined by the data. A broad posterior PDF could mean that the model is insensitive to that specific parameter. If the posterior PDF is non-symmetric, then the most likely value is still the peak of the distribution. However, more of the probability then lies on one side of the peak, so the parameter is more likely to have a value on that side (higher or lower) than on the other.
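
As a minimal sketch of how the posterior is built from the likelihood, prior and evidence, the example below evaluates all three on a grid for a single parameter. The data values, the Gaussian noise level `sigma` and the flat prior range are assumptions made purely for illustration and are not taken from any particular model.

```python
import numpy as np

# Hypothetical data: five measurements of one quantity with known Gaussian noise.
data = np.array([4.8, 5.1, 5.3, 4.9, 5.4])
sigma = 1.0

# Grid over the single model parameter theta (the mean of the model).
theta = np.linspace(0.0, 10.0, 2001)
dtheta = theta[1] - theta[0]

# Flat prior P(theta | M) on [0, 10] (an assumption for this sketch).
prior = np.full_like(theta, 1.0 / 10.0)

# Likelihood P(D | theta, M) for independent Gaussian errors.
resid = (data[:, None] - theta[None, :]) / sigma
log_like = -0.5 * np.sum(resid**2, axis=0) - data.size * np.log(sigma * np.sqrt(2.0 * np.pi))
likelihood = np.exp(log_like)

# Evidence P(D | M): the integral of likelihood * prior, here a simple Riemann sum.
evidence = np.sum(likelihood * prior) * dtheta

# Posterior P(theta | D, M) = likelihood * prior / evidence.
posterior = likelihood * prior / evidence

# Location of the peak and a rough measure of the spread of the posterior.
peak = theta[np.argmax(posterior)]
mean = np.sum(theta * posterior) * dtheta
spread = np.sqrt(np.sum((theta - mean) ** 2 * posterior) * dtheta)
```

A grid is only practical for one or two parameters, which is why sampling methods are normally used for larger problems.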

The full posterior PDF can be calculated using Markov Chain Monte Carlo (MCMC) or nested sampling. Both methods produce samples that are distributed according to the posterior, so the full posterior PDF can be reconstructed from those samples.
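
To illustrate what sampling the posterior means in practice, the following is a minimal Metropolis sampler, one of the simplest MCMC algorithms. It is a sketch only: the function names `log_posterior` and `metropolis`, the toy data, the flat prior range and the step size are all assumptions for this example, and a production analysis would normally rely on an established sampling library.

```python
import numpy as np

def log_posterior(theta, data, sigma=1.0, lo=0.0, hi=10.0):
    """Unnormalised log posterior: Gaussian likelihood with a flat prior on [lo, hi]."""
    if theta < lo or theta > hi:
        return -np.inf  # zero prior probability outside the allowed range
    return -0.5 * np.sum(((data - theta) / sigma) ** 2)

def metropolis(log_prob, start, n_steps, step_size, rng, **kwargs):
    """Random-walk Metropolis sampler for a single parameter."""
    chain = np.empty(n_steps)
    current = start
    current_lp = log_prob(current, **kwargs)
    for i in range(n_steps):
        proposal = current + step_size * rng.normal()
        proposal_lp = log_prob(proposal, **kwargs)
        # Accept with probability min(1, P(proposal) / P(current)).
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        chain[i] = current
    return chain

rng = np.random.default_rng(seed=0)
data = np.array([4.8, 5.1, 5.3, 4.9, 5.4])  # hypothetical measurements
chain = metropolis(log_posterior, start=5.0, n_steps=20_000, step_size=0.5, rng=rng, data=data)

samples = chain[2_000:]  # discard the first part of the chain as burn-in
post_mean = samples.mean()
post_std = samples.std()
```

A histogram of `samples` approximates the posterior PDF. Note that the sampler only needs the posterior up to a constant, so the evidence is never computed here.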

Bayesian model selection uses Bayes' theorem to calculate the probability, \(P\), of the data \(D\) given the model \(M\)

(2)\[P(D|M) = \int_\Omega P(D| \underline{\theta}, M)P( \underline{\theta}|M)\,\mathrm{d}\underline{\theta},\]

where the \(\underline{\theta}\) are the parameters and the integral is over all possible values for the parameters, \(\Omega\). This quantity is known as the marginal likelihood or the model evidence.
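
Because the integral is an average of the likelihood over the prior, one simple (and usually inefficient) way to estimate it is to draw parameter values from the prior and average the likelihood. The sketch below does this for a toy one-parameter problem where a closed-form answer is available for comparison. The data, noise level and flat prior are assumptions for illustration; this is not how nested sampling works internally.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical data with known Gaussian noise, and a flat prior on [lo, hi].
data = np.array([4.8, 5.1, 5.3, 4.9, 5.4])
sigma = 1.0
lo, hi = 0.0, 10.0

def likelihood(theta):
    """P(D | theta, M) for each value in a 1-D array of theta."""
    resid = (data[:, None] - theta[None, :]) / sigma
    norm = (2.0 * np.pi * sigma**2) ** (-0.5 * data.size)
    return norm * np.exp(-0.5 * np.sum(resid**2, axis=0))

# Monte Carlo estimate: average the likelihood over samples drawn from the prior.
theta_samples = rng.uniform(lo, hi, size=200_000)
evidence_mc = np.mean(likelihood(theta_samples))

# Closed-form evidence for this toy problem. It is valid here because the
# likelihood is negligible outside the prior range.
n = data.size
s = np.sum((data - data.mean()) ** 2)
evidence_exact = (
    (2.0 * np.pi * sigma**2) ** (-0.5 * n)
    * np.exp(-0.5 * s / sigma**2)
    * np.sqrt(2.0 * np.pi * sigma**2 / n)
    / (hi - lo)
)
```

The estimate degrades quickly as the number of parameters grows, since fewer prior samples land where the likelihood is large.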

Odds factor

One method for comparing models is known as the odds factor, which is the ratio of the posterior probabilities of two different models. It assumes that you can calculate the probability of the data given the \(i^{\mathrm{th}}\) model (\(M_i\)) for each of the models being compared. This section will use a derivation based on the work from here.

For model selection we want the model posterior

\[P(M | D) = P(D | M) \frac{P(M)}{P(D)},\]

where \(P(D | M)\) is the probability of the data given the model, \(P(M)\) is the prior probability of the model and \(P(D)\) is the probability of the data. The probability of the data is the same for all models, so it cancels out when the ratio of two model posteriors is taken

(3)\[O_{21} = \frac{P(M_2 | D)}{P(M_1 | D)} = \frac{P(D | M_2)P(M_2)}{P(D | M_1)P(M_1)}\]

where \(O_{21}\) is the odds factor for models two (\(M_2\)) and one (\(M_1\)). If there is no prior reason to prefer either model, then \(P(M_1) \approx P(M_2)\) and equation (3) can be simplified to

\[O_{21} = \frac{P(D | M_2)}{P(D | M_1)},\]

which is known as the Bayes factor. Alternatively, if all of the models are given equal prior probability, the posterior probability for the \(j^\mathrm{th}\) model can be written as

\[P(M_j | D) = \frac{ P(D | M_j)}{ \sum_k P(D | M_k)}.\]
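
Evidence values are usually reported as logarithms, and exponentiating them directly can underflow. The sketch below normalises a set of log evidences into model probabilities in a numerically stable way, assuming equal prior probabilities for all models. The function name `model_probabilities` and the example log evidence values are hypothetical.

```python
import numpy as np

def model_probabilities(log_evidences):
    """Convert log P(D | M_k) values into P(M_k | D), assuming equal model priors."""
    log_z = np.asarray(log_evidences, dtype=float)
    # Subtracting the maximum leaves the ratios unchanged but avoids underflow.
    weights = np.exp(log_z - np.max(log_z))
    return weights / np.sum(weights)

# Hypothetical log evidences for three candidate models.
log_evidences = [-105.2, -103.9, -110.4]
probs = model_probabilities(log_evidences)

# The ratio of two model probabilities recovers the Bayes factor for that pair.
bayes_factor_21 = probs[1] / probs[0]
```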

To evaluate the odds factor, the probability of the data given the model needs to be calculated. This is typically written as

(4)\[P(D | M) = \int_\Omega P(D| \underline{\theta}, M)P(\underline{\theta} | M)\,\mathrm{d}\underline{\theta},\]

where \(\Omega\) is the available parameter space for \(\underline{\theta}\). This quantity can be evaluated directly using either Markov Chain Monte Carlo (MCMC) or nested sampling.
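
Putting these pieces together, the sketch below computes the evidence for two toy models and takes their ratio, assuming equal prior probabilities for the models. Model 1 fixes the mean at zero and has no free parameters, so its evidence is just the likelihood. Model 2 has a free mean with a flat prior, so its evidence is an integral, done here on a grid. The data, noise level and prior range are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical data with known Gaussian noise.
data = np.array([0.9, 1.4, 0.7, 1.2, 1.1])
sigma = 0.5

def log_likelihood(theta):
    """log P(D | theta, M) for each value of the mean theta."""
    theta = np.atleast_1d(theta)
    resid = (data[:, None] - theta[None, :]) / sigma
    return -0.5 * np.sum(resid**2, axis=0) - data.size * np.log(sigma * np.sqrt(2.0 * np.pi))

# Model 1: mean fixed at zero, so the evidence equals the likelihood at theta = 0.
log_ev1 = log_likelihood(0.0)[0]

# Model 2: free mean with a flat prior on [-5, 5]; integrate likelihood * prior.
theta = np.linspace(-5.0, 5.0, 4001)
dtheta = theta[1] - theta[0]
log_prior = -np.log(10.0)
log_integrand = log_likelihood(theta) + log_prior
shift = np.max(log_integrand)  # factor out the maximum for numerical stability
log_ev2 = shift + np.log(np.sum(np.exp(log_integrand - shift)) * dtheta)

# Odds factor O_21 with equal model priors (the Bayes factor).
log_odds_21 = log_ev2 - log_ev1
odds_21 = np.exp(log_odds_21)
```

A value of `odds_21` greater than one favours model 2, and a value less than one favours model 1. The width of the prior on the free parameter enters the evidence directly, so a wider prior penalises model 2 more strongly.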