
Thursday, July 29, 2021

Bayesian Parameter Inference

Let me first start with how I understand what Bayes' theorem is about; a more detailed introduction can be found in the book by D.S. Sivia [1]. The main goal is to assess the probability of a guess, hypothesis, or model (the posterior probability), given a probability function, chosen using information you know, that describes some data or input conditioned on that guess (the likelihood), as well as a probability function describing the guess, hypothesis, or model itself (the prior probability). In mathematical form this is written as:

$$ \underbrace{\text{prob}(H|X,I)}_{\text{posterior}} \propto \underbrace{\text{prob}(X|H,I)}_{\text{likelihood}} \times \underbrace{\text{prob}(H|I)}_{\text{prior}}, $$

where $H$ is the guess, hypothesis, or model parameters, $X$ is the actual observed outcome, results, or data, and $I$ is ancillary information that we know but that may not be key. The term on the left-hand side is called the posterior probability density function (pdf); it provides the probability of $H$ given $X$ (and $I$). Proportionality is used because I haven't included the normalizing term, the marginalization $\text{prob}(X|I)$, which is not always trivially known since we may not have sufficient $I$ to describe it.
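
To make this concrete, here is a minimal sketch (my own illustration, not from Sivia [1]) of Bayes' theorem on a discrete grid: it infers a coin's bias $H$ from a made-up count of heads $X$, assuming a flat prior and a binomial likelihood:

```python
import numpy as np

# Hypothetical example: infer the bias H of a coin from X = 7 heads in 10
# flips (numbers made up for illustration). Discretize H on a grid and apply
# posterior proportional to likelihood * prior, normalizing at the end.
H = np.linspace(0.0, 1.0, 201)                      # candidate values of H
prior = np.ones_like(H)                             # flat prior: prob(H|I)
heads, flips = 7, 10
likelihood = H**heads * (1.0 - H)**(flips - heads)  # binomial kernel: prob(X|H,I)

posterior = likelihood * prior                      # Bayes' theorem (unnormalized)
posterior /= np.trapz(posterior, H)                 # normalize so the pdf integrates to 1

print(f"posterior peaks at H = {H[np.argmax(posterior)]:.2f}")  # ~0.70
```

Note that normalizing on the grid plays the role of the marginalization term $\text{prob}(X|I)$ mentioned above.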

To apply Bayes' theorem to parameter reliability and uncertainty, one seeks the optimal parameters for $H$. The general idea is:

  1. Take the $\log$ of the posterior pdf ($L = \log \,\text{prob}(H|X)$) to handle different scales of $H$, and expand it in a Taylor series around the optimal parameters $H_o$: $$ L(H) = L(H_o) + (H-H_o)\frac{dL}{dH}\bigg|_{H_o} + \frac{1}{2}(H-H_o)^2 \frac{d^2L}{dH^2}\bigg|_{H_o} + \cdots $$
  2. The linear term vanishes by construction: from calculus, the optimal value $H_o$ occurs where the derivative $\frac{dL}{dH}\big|_{H_o}=0$, so expanding around $H_o$ eliminates it. Thus, when $L$ and $\frac{dL}{dH}$ can be evaluated analytically, it is straightforward to calculate $H_o$.
  3. A key point is that the non-linear terms, more specifically the quadratic term, can be used to determine the uncertainty (error) on the optimal value $H_o$; a symbolic sketch of these steps follows this list.
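
As a hedged illustration of steps 1-3 (mine, not from the reference), take a Gaussian log-likelihood for $n$ observations with sample mean $\bar{x}$ and known noise scale $s$; with a flat prior, $L$ equals the log-likelihood up to a constant. The symbols `xbar`, `s`, and `n` are assumptions introduced for this sketch:

```python
import sympy as sp

# Hypothetical illustration of steps 1-3: Gaussian log-likelihood for n
# observations with sample mean xbar and known noise scale s. With a flat
# prior, L = log prob(H|X) equals the log-likelihood up to a constant.
H, xbar, s, n = sp.symbols('H xbar s n', positive=True)
L = -n * (xbar - H)**2 / (2 * s**2)

dL = sp.diff(L, H)                   # step 2: set dL/dH = 0 to find H_o
H_o = sp.solve(sp.Eq(dL, 0), H)[0]
d2L = sp.diff(L, H, 2)               # step 3: curvature gives the uncertainty
sigma = sp.sqrt(-1 / d2L)            # sigma = (-d2L/dH^2)^(-1/2)

print(H_o)    # xbar
print(sigma)  # s/sqrt(n)
```
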
So what does this do if I want to know the reliability of the parameters of a hypothesis, guess, or model? Well, if we take the exponential of the expression in item 1 and ignore terms beyond the quadratic, we obtain the probability distribution:

$$ \text{prob}(H|X,I) \approx A \exp \left(\frac{1}{2}\frac{d^2 L}{dH^2}\bigg|_{H_o} \left(H-H_o \right)^2 \right) $$

where $A$ is a normalization constant. Comparing to a Gaussian/normal distribution, we find that the terms correspond:

$$ \begin{align} \mu &= H_o \\ \sigma &= \left(-\frac{d^2 L}{dH^2}\bigg|_{H_o}\right)^{-\frac{1}{2}} \end{align} $$

Therefore, the optimal parameters occur where $\frac{dL}{dH}\big|_{H_o} = 0$ and $\frac{d^2 L}{dH^2}\big|_{H_o} \lt 0$. This provides the reliability of the parameters of $H$ corresponding to $X$; more succinctly, we can infer the quality of fitted parameters for observed data by utilizing probability distribution functions and updating our knowledge using Bayes' theorem. Ultimately this corresponds to calculating the familiar mean and standard deviation, $\mu$ and $\sigma$, for some guess, hypothesis, or model given some actual observation, data, or results.
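
The same recipe can also be sketched numerically: maximize $L$ to find $H_o$, then recover $\sigma$ from a finite-difference estimate of the second derivative. Everything below (the synthetic data, the Gaussian noise model with known scale, the flat prior) is an assumption for illustration, not a prescription:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical numerical sketch: maximize L to locate H_o, then estimate
# sigma from a central finite-difference second derivative. The data are
# synthetic and the flat-prior Gaussian model is an assumption.
rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=0.5, size=50)   # synthetic observations X

def log_posterior(H, s=0.5):
    """Log-posterior L for the mean H, known noise s, flat prior (up to a constant)."""
    return -np.sum((data - H)**2) / (2 * s**2)

# dL/dH = 0 at the maximum of L, so minimize -L to find H_o.
res = minimize_scalar(lambda H: -log_posterior(H), bounds=(0.0, 4.0), method='bounded')
H_o = res.x

# sigma = (-d2L/dH^2)^(-1/2), with the curvature from a central finite difference.
h = 1e-4
d2L = (log_posterior(H_o + h) - 2 * log_posterior(H_o) + log_posterior(H_o - h)) / h**2
sigma = (-d2L) ** -0.5

print(f"H_o = {H_o:.3f} +/- {sigma:.3f}")   # roughly the sample mean +/- 0.5/sqrt(50)
```

For this quadratic log-posterior the finite-difference curvature is exact, so the printed $\sigma$ matches the analytic $s/\sqrt{n}$ from the symbolic sketch above.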

The quote I picked is from Nate Silver, a political statistician; it captures the essence of Bayes' theorem and why it is such a natural thought process to use in science:

"Under Bayes' theorem, no theory is perfect. Rather, it is a work in progress, always subject to further refinement and testing." 
-Nate Silver

References
[1] D.S. Sivia, Data Analysis: A Bayesian Tutorial, 2nd ed., Oxford University Press, 2012. See Ch. 2.2, pp. 20-22.

