I'm going to spend a couple of posts talking about some techniques I've been using while building and calibrating models, as they seem to be less well known than they might be. The first in this series is Sensitivity Analysis.
== What is Uncertainty and Sensitivity Analysis ==
Uncertainty Analysis (UA) and Sensitivity Analysis (SA) are techniques to analyse the robustness of model outputs; that is, given a model, and certain assumptions about its input parameters (and their distributions), we would like to know how much we can trust the output. In general terms, Uncertainty Analysis gives us the amount of uncertainty (or variance) in the model output, while Sensitivity Analysis tells us how much of that variance is due to each of the input factors.
I'm not going to put many references in the text here; you can find everything you ever wanted to know in Saltelli's "Global Sensitivity Analysis: The Primer", ISBN 0470059974 (WileyBlackwell, 11 Jan 2008).
== Problem Number 1: Parameter Ranges and Distributions ==
The first problem we encounter is that in many cases there are no prior distributions for the parameters. Why does this matter? Take the model [tex]Y=0.1X_1+100X_2[/tex]. A naive approach might be to vary X1 and X2 between 0 and 1 and see what happens. The issue is that we have no reason to believe that this is the correct range. If we sample those distributions, run the model and compute the output variance, we'll see that the model has a high output variance, almost all of it due to X2. However, if X2 is a physical constant which is known to 10 significant figures, that conclusion rests on a false assumption, as X2 is not likely to vary that much. This issue appears quite frequently, as many results are still published without distributions attached to them. Even where results are published with error ranges, it is still up to the analyst to decide exactly how to convert these into an input prior: should it be a uniform distribution between the extremes? A normal distribution with the extremes as the 5th and 95th percentiles?
In short, until results are routinely published with a full probability distribution, this section of the analysis will always be problematic.
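To make the dependence on the assumed ranges concrete, here is a minimal sketch using the toy model above. The sample size, the seed, and the "well known constant" prior for X2 (a normal centred on 0.5 with a standard deviation of 1e-10) are my own illustrative assumptions, not values from any real measurement.

```python
import numpy as np

def model(x1, x2):
    # Toy model from the text: Y = 0.1*X1 + 100*X2
    return 0.1 * x1 + 100.0 * x2

def output_variance(x2_sampler, n=100_000, seed=0):
    """Monte Carlo estimate of V(Y), with X1 ~ U(0, 1) and X2 drawn by x2_sampler."""
    rng = np.random.default_rng(seed)
    x1 = rng.uniform(0.0, 1.0, n)
    x2 = x2_sampler(rng, n)
    return model(x1, x2).var()

# Naive prior: X2 varies uniformly between 0 and 1.
naive = output_variance(lambda rng, n: rng.uniform(0.0, 1.0, n))

# Hypothetical "physical constant" prior: X2 is pinned down very tightly.
tight = output_variance(lambda rng, n: rng.normal(0.5, 1e-10, n))

print(f"V(Y) with naive X2 prior: {naive:.4g}")
print(f"V(Y) with tight X2 prior: {tight:.4g}")
```

The same model gives an output variance that differs by about six orders of magnitude depending only on which prior we chose to believe for X2.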
== Problem Number 2: Bad OATs ==
If we now want to compute the sensitivity of the model to the different input parameters, a traditional approach, which is still often used, is to take the estimated "correct" parameters and examine the space around them by varying each input in turn. This is known as one-at-a-time, or OAT, analysis, and has several issues:
* the "correct" point is of central importance, so all findings are relative to that point.
* as the dimensionality increases, the proportion of the hypervolume bounded by the searched points decreases rapidly.
* no interactions between parameters are analysed.
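The dimensionality point can be given rough numbers. The sketch below assumes each input is varied by the same amount either side of the base point, so the OAT points sit on the axes of a hypercube of side 2. It computes two fractions of that hypercube: the convex hull of the OAT end points (a cross-polytope), and, as a more generous bound, the inscribed hypersphere that contains all of those points. Both function names are mine.

```python
import math

def hypersphere_fraction(k):
    """Volume of the unit k-ball divided by the volume of the enclosing cube of side 2."""
    return math.pi ** (k / 2) / (math.gamma(k / 2 + 1) * 2 ** k)

def cross_polytope_fraction(k):
    """Convex hull of the 2k points at +/-1 on each axis, relative to the same cube.

    The cross-polytope has volume 2^k / k!, the cube 2^k, so the ratio is 1 / k!.
    """
    return 1.0 / math.factorial(k)

for k in (2, 3, 5, 10):
    print(f"k={k:2d}  sphere: {hypersphere_fraction(k):.5f}  "
          f"hull: {cross_polytope_fraction(k):.2e}")
```

Even with the generous hypersphere bound, the explored fraction falls below 1% by ten dimensions.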
It's easy to come up with models which are problematic here. For example, take [tex]Y=X_1*X_2[/tex], and assume "correct" parameter values of 0. An OAT approach would involve setting X2 to 0 and varying X1, then setting X1 to 0 and varying X2. With this model, every one of those runs gives an output of 0. This would lead us to conclude that the model variance is 0, and the sensitivity to each parameter is 0, even though the output clearly changes as soon as both inputs move away from 0 together.
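Here is a small sketch of that failure. The range of -1 to 1 for both inputs, the 21-point grid and the sample size are assumptions I've picked for illustration.

```python
import numpy as np

def model(x1, x2):
    # Pure interaction model from the text: Y = X1 * X2
    return x1 * x2

base = (0.0, 0.0)                  # the assumed "correct" point
grid = np.linspace(-1.0, 1.0, 21)  # values tried for each input in turn

# OAT: vary one input while the other stays at its base value.
oat_x1 = model(grid, base[1])
oat_x2 = model(base[0], grid)
oat_variance = np.concatenate([oat_x1, oat_x2]).var()

# Global sampling: vary both inputs together over the same range.
rng = np.random.default_rng(0)
x1 = rng.uniform(-1.0, 1.0, 100_000)
x2 = rng.uniform(-1.0, 1.0, 100_000)
global_variance = model(x1, x2).var()

print("OAT variance:   ", oat_variance)
print("Global variance:", global_variance)
```

The OAT runs never leave the two axes, where the output is identically zero, while sampling both inputs at once recovers a variance close to the analytical value of 1/9 for these ranges.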
== The Path ==
The necessary approach is to compute both sensitivity and uncertainty over a representative sampling of the inputs. There are several ways to do this, and many sensitivity analysis methods are concerned with constructing experimental designs which allow the sensitivities to be estimated with the minimum number of model runs.
The sensitivity of a model to a parameter can be defined as
[tex]V_{X_i}\left(E_{X_{\sim i}}(Y \mid X_i)\right)[/tex]
which can be read as the variance (over the distribution of Xi) of the expected model output (averaged over all other parameters, written [tex]X_{\sim i}[/tex], with Xi fixed).
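As a rough illustration of that reading, the variance of the conditional expectation can be estimated by brute force with two nested samples: an outer one that fixes Xi, and an inner one that averages over the other input. The sketch uses the earlier toy model [tex]Y=0.1X_1+100X_2[/tex] with both inputs uniform on 0 to 1. The function name, the sample sizes and the decision to reuse one inner sample for every outer value are my own choices, and this is far from the most efficient design.

```python
import numpy as np

def model(x1, x2):
    # Toy model from earlier in the post: Y = 0.1*X1 + 100*X2
    return 0.1 * x1 + 100.0 * x2

def variance_of_conditional_mean(i, n_outer=2000, n_inner=500, seed=0):
    """Estimate V_{X_i}(E(Y | X_i)) for i in {1, 2}, with both inputs ~ U(0, 1)."""
    rng = np.random.default_rng(seed)
    # Outer sample: the values at which X_i is held fixed (one per row).
    xi = rng.uniform(0.0, 1.0, n_outer)[:, None]
    # Inner sample: values of the other input, shared by every row.
    other = rng.uniform(0.0, 1.0, n_inner)[None, :]
    y = model(xi, other) if i == 1 else model(other, xi)
    cond_mean = y.mean(axis=1)  # E(Y | X_i) for each fixed value of X_i
    return cond_mean.var()      # variance of that expectation over X_i

v1 = variance_of_conditional_mean(1)
v2 = variance_of_conditional_mean(2)
print(f"V(E(Y|X1)) = {v1:.3g}")
print(f"V(E(Y|X2)) = {v2:.3g}")
```

With these priors X2 accounts for essentially all of the output variance. Reusing the inner sample keeps the sampling noise in the inner average from swamping the very small X1 term. Note that this costs n_outer times n_inner model runs per parameter.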
In the next post, I'll explain this a little more, and then give some practical examples of how it can be used.