brms (and Stan): worked examples

where the normalizing constant is the reciprocal of the beta function
\[ \begin{aligned} \frac{\Gamma(m + \alpha)\Gamma(n - m + \beta)}{\Gamma(n + \alpha+\beta)} &= \int_0^1 \theta^{\alpha + m - 1} (1-\theta)^{ \beta + n - m - 1}\ d\theta. \end{aligned} \]
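This identity is easy to verify numerically. The sketch below (Python for illustration; the values \(\alpha = \beta = 1\), \(m = 14\), \(n = 721\) are hypothetical choices, not taken from the text) evaluates the left-hand side via `math.lgamma` and approximates the integral with a midpoint rule:

```python
import math

# Hypothetical parameter values for the check.
alpha, beta, m, n = 1.0, 1.0, 14.0, 721.0

# Left-hand side: Gamma(m+alpha) Gamma(n-m+beta) / Gamma(n+alpha+beta),
# computed on the log scale to avoid overflow.
lhs = math.exp(math.lgamma(m + alpha) + math.lgamma(n - m + beta)
               - math.lgamma(n + alpha + beta))

# Right-hand side: midpoint-rule approximation of the integral over (0, 1).
steps = 200000
rhs = sum(((i + 0.5) / steps) ** (alpha + m - 1)
          * (1 - (i + 0.5) / steps) ** (beta + n - m - 1)
          for i in range(steps)) / steps

print(lhs, rhs)  # the two values agree to several significant digits
```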
```
## $mean
## [1] 0.02074689
##
## $var
## [1] 2.80614e-05
##
## $sd
## [1] 0.005297301
##
## $mode
## [1] 0.01941748
##
## $qi
## [1] 0.01167346 0.03232026
```
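These summaries have closed forms for a Beta posterior. As a sketch (Python for illustration): the output above is consistent with a hypothetical Beta(15, 708) posterior, e.g. a uniform Beta(1, 1) prior updated with \(m = 14\) successes in \(n = 721\) trials, so that \(a = \alpha + m = 15\) and \(b = \beta + n - m = 708\):

```python
import math
from bisect import bisect_left

# Hypothetical posterior: Beta(a, b) with a = 15, b = 708.
a, b = 15.0, 708.0

mean = a / (a + b)                           # posterior mean
var = a * b / ((a + b) ** 2 * (a + b + 1))   # posterior variance
sd = math.sqrt(var)
mode = (a - 1) / (a + b - 2)                 # posterior mode (a, b > 1)

# 0.95 quantile interval from a grid approximation of the CDF.
log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
steps = 200000
grid = [(i + 0.5) / steps for i in range(steps)]
dens = [math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))
        for x in grid]
total = sum(dens)
cdf, acc = [], 0.0
for d in dens:
    acc += d
    cdf.append(acc / total)

qi = (grid[bisect_left(cdf, 0.025)], grid[bisect_left(cdf, 0.975)])
print(mean, var, sd, mode, qi)
```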
Posterior intervals, such as the highest posterior density (HPD) interval or the quantile interval, provide ranges that contain a specified amount of probability mass. For example, a 0.95 interval is a range of values that contains 0.95 of the probability mass of the distribution.
The \(\varphi\) HPD interval for the probability density function \(\mathrm{P}(x)\) is computed by finding a probability density value \(p^*\) such that \[\mathrm{P}(\{x \colon \mathrm{P}(x) \geq p^*\}) = \varphi.\]
In other words, we find the value \(p^*\) such that the probability mass of the set of points whose density is at least \(p^*\) is exactly \(\varphi\).
The \(\varphi\) quantile interval ranges from the \((1-\varphi)/2\) quantile to the \((1+\varphi)/2\) quantile.
In general, the HPD interval is not trivial to compute. The quantile interval is easily computed from the cumulative distribution function. If the posterior is symmetric and unimodal, the HPD and quantile intervals coincide.
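Both intervals are easy to approximate from posterior draws. The sketch below (Python for illustration; the standard-normal "posterior" draws are a hypothetical stand-in) computes the quantile interval from empirical quantiles, and approximates the HPD interval as the narrowest window containing \(\varphi\) of the sorted draws:

```python
import random

random.seed(1)
# Hypothetical posterior draws: standard normal, so both intervals coincide.
draws = sorted(random.gauss(0.0, 1.0) for _ in range(100000))

def quantile_interval(xs, phi=0.95):
    # xs must be sorted; take the (1-phi)/2 and (1+phi)/2 empirical quantiles.
    n = len(xs)
    return xs[int((1 - phi) / 2 * n)], xs[int((1 + phi) / 2 * n)]

def hpd_interval(xs, phi=0.95):
    # xs must be sorted; among all windows containing a fraction phi of the
    # draws, the narrowest one approximates the HPD interval.
    n = len(xs)
    k = int(phi * n)
    width, i = min((xs[i + k] - xs[i], i) for i in range(n - k))
    return xs[i], xs[i + k]

print(quantile_interval(draws))  # approximately (-1.96, 1.96)
print(hpd_interval(draws))       # approximately the same, by symmetry
```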
In general, if we can evaluate the likelihood \(\mathrm{P}(\mathcal{D}\vert\theta)\) and the prior \(\mathrm{P}(\theta)\) at any point in parameter space, we can draw samples from the posterior distribution \(\mathrm{P}(\theta\vert\mathcal{D})\).
Let us denote the posterior \(\mathrm{P}(\theta\vert\mathcal{D})\) by \(f(\theta)\).
We draw proposals from a symmetric proposal distribution \(Q(\cdot\vert\cdot)\), i.e., one satisfying \(Q(a \vert b) = Q(b \vert a)\).
We start with an initial \(\tilde{\theta}_0\), and sample \(\tilde{\theta} \sim Q(\theta\vert\tilde{\theta}_0)\).
We then accept \(\tilde{\theta}\) with probability \[\alpha = \min\left(1, \frac{f(\tilde{\theta})}{f(\tilde{\theta}_0)}\right).\]
If accepted, \(\tilde{\theta}\) becomes the next state of the chain; otherwise we retain \(\tilde{\theta}_0\). Repeating this step yields a Markov chain, and after convergence its states are draws from the distribution \(f(\theta)\).
For this sampling scheme, the distribution \(f(\theta)\) need be known only up to a multiplicative constant, since any such constant cancels in the ratio \(f(\tilde{\theta})/f(\tilde{\theta}_0)\).
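The algorithm above can be sketched in a few lines. This is a minimal random-walk Metropolis implementation (Python for illustration); the Beta(15, 708)-shaped target and the tuning constants are hypothetical choices, not from the text:

```python
import math
import random

random.seed(0)

def log_f(theta):
    # Hypothetical unnormalized log target: Beta(15, 708)-shaped density,
    # known only up to its normalizing constant.
    if not 0.0 < theta < 1.0:
        return -math.inf
    return 14.0 * math.log(theta) + 707.0 * math.log(1.0 - theta)

def metropolis(log_f, theta0, n_steps, scale=0.01):
    """Random-walk Metropolis with a symmetric Gaussian proposal Q."""
    theta = theta0
    chain = []
    for _ in range(n_steps):
        proposal = random.gauss(theta, scale)
        # Accept with probability min(1, f(proposal) / f(theta)),
        # computed on the log scale for numerical stability.
        log_alpha = log_f(proposal) - log_f(theta)
        if log_alpha >= 0 or random.random() < math.exp(log_alpha):
            theta = proposal
        chain.append(theta)  # a rejection repeats the current state
    return chain

chain = metropolis(log_f, theta0=0.05, n_steps=50000)
burned = chain[5000:]  # discard warm-up draws taken before convergence
post_mean = sum(burned) / len(burned)
print(post_mean)  # close to the analytic mean 15/723, about 0.0207
```

Note that `log_f` omits the beta-function normalizing constant entirely, which is exactly the property the text highlights.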