6.3 Estimating the Error in $\mathbf{u}^0$

To estimate the error in the best-fitting parameter values that we find, we assume $\mathrm{P}\left( \mathbf{u} | \left\{ \mathbf{x}_i, f_i, \sigma_i \right\} \right)$ to be approximated by an $n_\mathrm{u}$-dimensional Gaussian distribution around $\mathbf{u}^0$. Taking a Taylor expansion of $L(\mathbf{u})$ about $\mathbf{u}^0$, we can write:

\begin{equation} L(\mathbf{u}) = L(\mathbf{u}^0) + \underbrace{ \sum_{i=0}^{n_\mathrm{u}-1} \left( u_i - u_i^0 \right) \left.\frac{\partial L}{\partial u_i}\right|_{\mathbf{u}^0} }_{\text{zero at } \mathbf{u}^0 \text{ by definition}} + \sum_{i=0}^{n_\mathrm{u}-1} \sum_{j=0}^{n_\mathrm{u}-1} \frac{\left( u_i - u_i^0 \right) \left( u_j - u_j^0 \right)}{2} \left.\frac{\partial^2 L}{\partial u_i \partial u_j}\right|_{\mathbf{u}^0} + \mathcal{O}\left( \mathbf{u} - \mathbf{u}^0 \right)^3 \label{L_taylor_expand} \end{equation} (6.4)
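
As a concrete one-parameter illustration of why the linear term vanishes, consider the log-probability of a single Gaussian measurement:

\[ L(u) = \mathrm{const} - \frac{\left( u - u^0 \right)^2}{2\sigma^2} \]

The first derivative, $-\left( u - u^0 \right)/\sigma^2$, is zero at the maximum $u = u^0$, the second derivative is $-1/\sigma^2$, and all higher derivatives vanish, so in this case the expansion terminates exactly at the quadratic term.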

Since the logarithm of a Gaussian distribution is a parabola, the quadratic terms in the above expansion encode the Gaussian component of the probability distribution $\mathrm{P}\left( \mathbf{u} | \left\{ \mathbf{x}_i, f_i, \sigma_i \right\} \right)$ about $\mathbf{u}^0$; the use of this approximation is called Gauss' method. The higher-order terms in the expansion represent any non-Gaussianity in the probability distribution, which we neglect; see MacKay, D.J.C., Information Theory, Inference and Learning Algorithms, CUP (2003). We may write the sum of the quadratic terms, which we denote $Q$, in matrix form:

\begin{equation} Q = \frac{1}{2} \left(\mathbf{u} - \mathbf{u}^0\right)^\mathrm{T} \mathbf{A} \left(\mathbf{u} - \mathbf{u}^0\right) \label{Q_vector} \end{equation} (6.5)
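
For concreteness, $Q$ is cheap to evaluate numerically. The following minimal sketch assumes a two-parameter problem; the vectors and the (negative-definite) Hessian matrix are purely illustrative:

\begin{verbatim}
import numpy as np

# Illustrative two-parameter example: best-fit vector u0, trial vector u,
# and a negative-definite Hessian matrix A of the log-probability L.
u0 = np.array([1.0, 2.0])
u = np.array([1.1, 1.9])
A = np.array([[-4.0, 1.0],
              [1.0, -9.0]])

d = u - u0           # displacement from the best-fit point
Q = 0.5 * d @ A @ d  # quadratic form of equation (6.5)
print(Q)
\end{verbatim}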

where the superscript $^\mathrm{T}$ denotes the transpose of the vector displacement from $\mathbf{u}^0$, and $\mathbf{A}$ is the Hessian matrix of $L$, given by:

\begin{equation} A_{ij} = \left( \nabla \nabla L \right)_{ij} = \left.\frac{\partial^2 L}{\partial u_i \partial u_j}\right|_{\mathbf{u}^0} \end{equation} (6.6)
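
In practice such second derivatives are often estimated numerically. The sketch below shows one standard approach, central finite differences; the function name logP, the test function and the step size h are illustrative assumptions, not necessarily how the fit command computes the matrix internally:

\begin{verbatim}
import numpy as np

def hessian(logP, u0, h=1e-4):
    # Central-difference estimate of A_ij = d2L/(du_i du_j) at u0.
    # logP is assumed to return L(u) for a parameter vector u.
    n = len(u0)
    A = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            A[i, j] = (logP(u0 + ei + ej) - logP(u0 + ei - ej)
                       - logP(u0 - ei + ej) + logP(u0 - ei - ej)) / (4 * h * h)
    return A

# For L(u) = -u_0^2 - 2 u_1^2 the Hessian is diag(-2, -4).
print(hessian(lambda u: -u[0]**2 - 2 * u[1]**2, np.array([0.0, 0.0])))
\end{verbatim}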

This is the Hessian matrix which is output by the fit command. In general, an $n_\mathrm{u}$-dimensional Gaussian distribution such as that given by equation (6.5) yields elliptical contours of equiprobability in parameter space, whose principal axes need not be aligned with our chosen coordinate axes (the variables $u_0 \ldots u_{n_\mathrm{u}-1}$). The eigenvectors $\mathbf{e}_i$ of $\mathbf{A}$ are the principal axes of these ellipses, and the corresponding eigenvalues $\lambda_i$ equal $-1/\sigma_i^2$, where $\sigma_i$ is the standard deviation of the probability density function along the direction of these axes.

This can be visualised by imagining that we diagonalise $\mathbf{A}$ and expand equation (6.5) in our diagonal basis. The resulting expression for $Q$ is a sum of square terms; the cross terms vanish in this basis by definition. The equations of the equiprobability contours become the equations of ellipses:

\begin{equation} Q = \frac{1}{2} \sum_{i=0}^{n_\mathrm{u}-1} A_{ii} \left( u_i - u_i^0 \right)^2 = k \end{equation} (6.7)

where $k$ is some constant. By comparison with the equation for the logarithm of a Gaussian distribution, we can associate $A_{ii}$ with $-1/\sigma_i^2$ in our eigenvector basis.
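
As a numerical sketch of this diagonalisation, numpy.linalg.eigh returns the eigenvalues and eigenvectors of a symmetric matrix; with an illustrative Hessian, the standard deviations along the principal axes then follow as $\sigma_i = 1/\sqrt{-\lambda_i}$:

\begin{verbatim}
import numpy as np

# Illustrative negative-definite Hessian of L at the maximum.
A = np.array([[-4.0, 1.0],
              [1.0, -9.0]])

lam, e = np.linalg.eigh(A)  # eigenvalues lam[i], eigenvectors e[:, i]
sigma = 1 / np.sqrt(-lam)   # lambda_i = -1/sigma_i^2 on the principal axes
print(sigma)
\end{verbatim}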

The problem of evaluating the standard deviations of our variables $u_i$ is more complicated, however, as we are attempting to evaluate the width of these elliptical equiprobability contours in directions which are, in general, not aligned with their principal axes. To achieve this, we first convert our Hessian matrix into a covariance matrix.
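
As a minimal numerical sketch of that conversion, assuming the standard Gaussian relation $\mathbf{C} = \left(-\mathbf{A}\right)^{-1}$ between the covariance matrix and the Hessian of the log-probability (the matrix values are again illustrative), the standard deviation of each $u_i$ is the square root of the corresponding diagonal element of $\mathbf{C}$:

\begin{verbatim}
import numpy as np

# Illustrative negative-definite Hessian of L at the maximum.
A = np.array([[-4.0, 1.0],
              [1.0, -9.0]])

C = np.linalg.inv(-A)          # covariance matrix C = (-A)^{-1}
sigma_u = np.sqrt(np.diag(C))  # standard deviations of u_0 and u_1
print(C)
print(sigma_u)
\end{verbatim}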