We conducted a designed experiment to quantify sources of uncertainty in geologists' interpretations of a geological cross section. A group of 28 geologists participated in the experiment. Each interpreted borehole record included up to three Palaeogene bedrock units, including the target unit for the experiment: the London Clay. The set of boreholes was divided into batches from which validation boreholes had been withheld; as a result, we obtained 129 point comparisons between the interpreted elevation of the base of the London Clay and its observed elevation in a borehole not used for that particular interpretation. Analysis of the results showed good general agreement between the observed and interpreted elevations, with no evidence of systematic bias. Between-site variation of the interpretation error was spatially correlated, and the variance appeared to be stationary. The between-geologist component of variance was smaller overall, and depended on the distance to the nearest borehole. There was also evidence that the between-geologist variance depends on the degree of experience of the individual. We used the statistical model of interpretation error to compute confidence intervals for any one interpretation of the base of the London Clay on the cross section, and to provide uncertainty measures for decision support in a hypothetical route-planning process. The statistical model could also be used to quantify error propagation in a full 3-D geological model produced from interpreted cross sections.
Three-dimensional (3-D) models are now the state of the art for presenting
geologists' knowledge and interpretation of subsurface structures, and are
supplied to varied users of geological information. There is no single
methodology for the production of models, and the method will reflect the
geological setting and the nature of the information available to the
modeller, which may include geophysical imagery, boreholes and surface
observations. Models can be produced by geostatistical interpolation
The uncertainty in a 3-D model is of interest to data users who will apply it
for decision making. For this reason, there has been considerable interest in
the development of quantitative or semi-quantitative operational methods to
characterise the uncertainty in 3-D models and the variation of
this uncertainty in space
If information in 3-D is produced by geostatistical interpolation, then the
uncertainty can be quantified directly on the basis of the geostatistical
model
In this paper, we are particularly interested in the uncertainty of models
produced by the cross-section interpretation methodology.
To this end, we undertook, and report here, an experiment to study the error in cross-section interpretation, hypothesizing that the variability of the interpretation error changes along the section in ways that can be described by a statistical model. We considered statistical models in which the variance of the interpretation error at some location depends on two factors. The first factor was the distance from the location to the nearest borehole available to support the interpretation of the cross section. Our hypothesis was that the variance of interpretation error would increase with the distance to the nearest available borehole. The second factor was the experience of the geologist making the interpretation; our hypothesis was that the variance of interpretation error would diminish with increasing geologist experience.
If our hypothesis is verified, then we could compute confidence intervals for
the interpreted height of a contact along a cross section, and model how this
uncertainty may propagate in the subsequent interpolation from the
interpreted cross section into a 3-D geological model. If statistical models
of the uncertainty in cross-section interpretation could be estimated for a
variety of geological settings, then these could be used to compute
uncertainty measures for new geological models, and so to calculate, for
example, decision–theoretical measures of the value of the model information
This study is based on an 8 km cross section in London which roughly follows the A12 road from Hackney northeast across the Lea Valley to Wanstead. The local geology (Fig. 1) consists of Quaternary deposits comprising alluvium along the valleys of the rivers Lea and Roding, with river terrace deposits at several levels beneath and flanking the alluvium and capping the low interfluvial ridge.
The Quaternary deposits are generally less than 5 m in total thickness, except beneath the Lea Valley, where up to 10 m are encountered. They rest everywhere on Palaeogene bedrock units. In order of increasing age and depth, these are the London Clay Formation, the Lambeth Group and the Thanet Formation. The Quaternary deposits rest on the London Clay Formation along part of the section, but cut down beneath the Lea Valley into the underlying Lambeth Group (Fig. 1). The Palaeogene deposits are underlain by the Chalk Group (Upper Cretaceous), which is several hundred metres thick and is the lowest unit considered here; it is encountered in approximately 10 % of the 143 available boreholes along the cross section (Fig. 1).
Map of surface geology (superficial and exposed bedrock) in the study area, with the line of the cross section shown. Map coordinates are in km on the British National Grid. One interpretation of the cross section is shown below, with the position and depth of the full borehole set indicated.
The Palaeogene strata in this region are affected by the Alpine Orogeny, and
underwent gentle folding, faulting and tilting in Oligocene–Miocene times
In the London area, the London Clay Formation is a relatively thick firm clay
without significant water flow, and it is therefore regarded as a good medium
for tunnels and excavations
Questionnaire on modelling experience and responses received.
The key idea of the experiment was that each of a set of participating geologists would make an interpretation of the three Palaeogene bedrock units on the cross section, drawing continuous (if occasionally interrupted) basal contacts of the units as interpretations of the information in a set of boreholes. Any one participant would use a subset of all available boreholes, so that their interpretation could be compared directly with each of a complementary validation subset. The difference between the interpreted and observed elevations of the base of the London Clay, the cross-section error, would then be treated as a variable for statistical analysis to identify important features of its variability. Note that, while we only examined the base of the London Clay, the participants interpreted this in the wider stratigraphical context by also drawing the bases of the other Palaeogene units.
The 51 available boreholes which prove the base of the London Clay were
subdivided by independent random sampling without replacement into ten
non-overlapping subsets of five validation boreholes. We call each of these
subsets a
A total of 28 geologists participated in the experiment. Of these, 22 were delegates at the GSI3D workshop which took place at the British Geological Survey (BGS), Keyworth, from 17 to 18 October 2012. Some of the workshop participants were staff of BGS; others were geologists from a variety of organisations and countries, with varying levels of experience in geological modelling, but all with some interest and experience, if rudimentary, in the use of the GSI3D software, which was used for this experiment by all participants. The remaining six geologists were BGS staff who participated in the experiment after the workshop.
Each participant was asked to complete a questionnaire before undertaking the exercise. Their unique number was recorded on the form. They had the option of recording their name and contact details on the form, or of remaining anonymous. In the questionnaire, each participant was asked to record a self-assessment of their experience of geological modelling in 3-D by identifying the most appropriate of four general descriptions. The descriptions and responses are presented in Table 1. Note that there was some variation in experience among the participants: two were novices in 3-D modelling, and eight had limited experience. This allows us to quantify the effect of increasing experience on the variability of interpretation error.
The key principle of the experiment was explained to all delegates, who were also provided with an explanation of the units in the cross section. Each participant in the experiment, on presenting at the workstations, was given a unique number, and an interpretation batch of boreholes. In addition to the boreholes, a standard interpretation of the superficial material (as a single unit) was provided, so that all participants were working on a common rockhead surface. The intersections of outcrops, as mapped in 2-D, with the cross section were also provided to all participants. A set of guidance notes on the GSI3D software was available, and at all times a staff member experienced with the software was available to help. When the interpretation was complete it was saved with a code which indicated the participant's unique number and the number of the interpretation batch and complementary validation batch of boreholes which had been allocated. As each geologist presented to participate, they were allocated one of the interpretation batches of boreholes, so that a more or less even distribution of participants over batches was achieved.
Once each geologist had completed and saved their interpretation, this was compared with the corresponding batch of validation boreholes, and the observed and interpreted elevations of the base of the London Clay were extracted. One modeller's interpretation was not correctly saved, so this was lost, and in some cases the London Clay was not present in the interpretation at the location of a validation borehole. Over all validation batches, we were able to make a total of 129 comparisons between an interpreted elevation of the base of the London Clay at the location of a borehole in a validation batch and the observed elevation in that validation borehole (i.e. in a borehole which had not been available to the geologist who made the particular interpretation). As described in Sect. 3.1 below, and formalised in Eq. (1), an observation of interpretation error is the difference between the interpreted and observed elevation of the base of the London Clay for one such comparison. Between 10 and 20 interpretation errors could be calculated for any validation batch.
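The random subdivision of boreholes into non-overlapping validation batches described above can be illustrated with a minimal sketch. The identifiers, the seed and the function name `make_validation_batches` are hypothetical, and for simplicity the interpretation batch is taken here as the complement within the same 51 boreholes; this is an assumption for illustration, not a record of the procedure actually used.

```python
import random


def make_validation_batches(borehole_ids, n_batches=10, batch_size=5, seed=0):
    """Randomly split boreholes into non-overlapping validation batches."""
    rng = random.Random(seed)
    # Sampling without replacement: each borehole is drawn at most once.
    chosen = rng.sample(list(borehole_ids), n_batches * batch_size)
    return [chosen[i * batch_size:(i + 1) * batch_size] for i in range(n_batches)]


# 51 hypothetical identifiers for boreholes proving the base of the London Clay.
boreholes = list(range(1, 52))
batches = make_validation_batches(boreholes)

# Boreholes left for interpretation when validation batch 0 is withheld
# (assumed here to be the complement of that batch).
interpretation_batch_0 = [b for b in boreholes if b not in batches[0]]
```

With 51 boreholes and ten batches of five, one borehole is left out of every validation batch in this sketch.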
This section provides an overview of the analyses undertaken to test our hypothesis, avoiding the statistical detail. The reader will find technical information about the statistical models and their estimation in Sects. 3.2–3.3, and these can be ignored by the reader who requires only a summary of the statistical methods. Section 3.4 explains how the selected statistical model for cross-section errors was used to represent the cross-section uncertainty with confidence intervals, and to analyse the implications of this uncertainty for a hypothetical application.
As reported in the previous section, the experimental results consist of a set of 129 comparisons between the interpreted and observed elevations of the base of the London Clay, where each interpretation in the set had been made without access to that particular observation. The variable for statistical analysis is the cross-section error, obtained for each of the 129 comparisons by subtracting the interpreted elevation of the base from the observed elevation. An error of zero therefore means that the observed and interpreted elevations were the same in the particular comparison. A negative error means that the interpreted base was higher than the observed base in that comparison.
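The sign convention for the cross-section error can be made concrete with a short sketch. The elevations below are invented for illustration and are not data from the experiment.

```python
def cross_section_error(observed, interpreted):
    """Cross-section error: observed elevation minus interpreted elevation (m)."""
    return observed - interpreted


# Hypothetical (observed, interpreted) elevations of the base, in metres.
comparisons = [(-20.0, -18.5), (-31.2, -33.0), (-25.0, -25.0)]
errors = [cross_section_error(obs, interp) for obs, interp in comparisons]

# First comparison: the interpreted base (-18.5 m) lies above the observed
# base (-20.0 m), so the error is negative.
```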
The statistical analysis of these values was done with linear mixed models. These treat the cross-section errors as a combination of a fixed effect (here a constant, the mean cross-section error) with random effects. The random effects represent sources of variation in the observed errors, and here account for differences between batches of validation boreholes (are the mean errors for the different batches significantly different?), between the sites of validation boreholes within batches (are the mean errors for different locations within each batch significantly different from each other?), and between the geologists. The means of the random effects are zero; their variances are interesting because they quantify the uncertainty introduced into the interpretation of the cross section by the factors which they represent (differences between modellers, differences between locations). In some of the more complex models, the variance of a random effect was modelled as a function of some covariate. For example, in one case, the variance of the effect of location was modelled as a function of the distance from the location to the nearest borehole available for interpretation (i.e. the nearest borehole in the interpretation batch allocated to the particular geologist). Such models could be used to predict how interpretation uncertainty varies along a cross section.
Summary of statistical models. In all cases the form of the random
effects component for between-batch, between-site and between-geologist
effects is indicated. For each term the dependency is given (independent or
an indicated correlation structure) and it is indicated whether the variance
is constant (stationary) or modelled as a variable quantity. A
We considered seven linear mixed models which were fitted in order, so in some cases a statistical inference about one model (i.e. showing that a particular random effect was not significant) determined the form of subsequent models (that effect was dropped).
The random effects which we considered can be defined with respect to two properties. The first is dependency. If a random effect is independent, then the value that it takes for one instance tells us nothing about the value that it takes in other instances. In the first model, 1a, the random effect that models differences between batches was independent, because the batches were formed by independent random sampling. In other models, a random effect may not be independent, but may have a correlation structure. In all models, the random effect that models differences between sites had a spatial correlation structure: one might expect cross-section errors at two nearby sites to be more similar than errors at two sites which are far apart. In models 1a and 1b, the random effect which accounts for variability of geologist interpretations was independent within any site (the effect for one geologist is independent of the effect for another), but the cross-section errors for any one geologist at different sites were modelled as correlated (a geologist who tends to interpret the base too high at one site might make a similar error at other sites).
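To illustrate what a spatial correlation structure for the between-site effect implies, the following sketch builds a correlation matrix for sites along a section. The exponential correlation function, the site positions and the distance parameter of 500 m are all assumptions made for illustration; they are not the function or the estimates used in this study.

```python
import math


def exponential_correlation(positions, distance_parameter):
    """Correlation matrix exp(-h / distance_parameter) for 1-D site positions.

    The exponential form is an assumed, commonly used correlation function.
    """
    return [
        [math.exp(-abs(a - b) / distance_parameter) for b in positions]
        for a in positions
    ]


# Hypothetical positions (m) of validation sites along the cross section.
sites = [0.0, 250.0, 1000.0, 4000.0]
corr = exponential_correlation(sites, 500.0)

# corr[0][1] (sites 250 m apart) is larger than corr[0][3] (4000 m apart):
# errors at nearby sites are expected to be more alike.
```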
The second property of random effects is stationarity in the variance (stationarity hereafter). A stationary random effect has a constant variance. However, the variance of a non-stationary random effect may be modelled as a variable which depends on some other factor. For example, in model 2b, the variance of the geologist random effect depends on the level of experience that each geologist recorded in the questionnaire (Table 1).
Table 2 summarises the differences between the models. Model 1a is a general one in which there are stationary random effects for batch, site and geologist differences. The batch effect is also independent, the site effect is spatially correlated (as in all models) and the geologist effect shows correlation between errors made by the same geologist. Models 1b and 1c were fitted to test, respectively, whether the variance of the batch effect could be assumed to be zero and whether the geologist random effect could be modelled as independent. The final model in group 1, 1d, was fitted to test whether the variance of the site effect was non-stationary, depending on the distance to the nearest available borehole.
In all the models in a second group of three, the batch effect was dropped, and the site effect was spatially correlated and stationary. The geologist effect was independent, but we considered non-stationary alternatives in which the variance depended on (2a) the distance to the nearest borehole available for interpretation, (2b) modeller self-identified experience, and (2c) both these factors.
We compared models in two ways (details in Sect. 3.2). In some cases, it was
possible to compare models by a log-likelihood ratio statistic
The results from this experiment were analysed by the fitting and comparison
of linear mixed models (LMM)
The fixed effect in all LMM that were considered here was the mean
cross-section error. The random effects modelled the contribution of
differences between batches, differences between sites and differences
between geologists. In an LMM, the random effects are modelled as Gaussian
random variables with mean zero and a variance. The variance may be
stationary, a parameter of the LMM, or it may be a variable expressed as a
parametric function of some covariate with parameters to be estimated
Model 1a takes the following form for a set of observations of
cross-section error in a vector
The matrix
The term
If the distance between site
The geologist effect in model 1a, the term
If each geologist had one and only one validation borehole, then the
geologist effect would be simply nested within sites as an independent random
error (regardless of whether there was one or more observations of
cross-section error at each validation site). However, in the current
experiment, each of the geologists was allocated all validation boreholes in
a particular batch, and so we must choose an appropriate statistical model
for the between-geologist effect observed at each of a set of boreholes. In
model 1a, we treat the geologist effects as correlated random variables
within batches. If we denote by
The random effects of the model in Eq. (
We used the
In the proposed model, there are
One may use this procedure to compare the LMM in Eq. (
However, if we consider a null model in which the between-batch variance is
zero this is not a standard case since zero is the lower bound for a
variance. A more general criterion for comparing models of differing
complexity, although not a formal test, is to compute for each model Akaike's
information criterion – AIC
Model 1b is a variant of 1a in which the between-batch variance is dropped. Since the batches were formed at random, one may expect that the mean error does not differ between the batches, except for random sample variation. However, in a comparison between these two models, the null (1b) is formed by fixing the between-batch variance at zero, which is a boundary in parameter space (variances cannot be negative). The models are therefore compared on the AIC.
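A comparison on the AIC can be sketched as follows, using the standard definition AIC = 2k - 2l, where k is the number of estimated parameters and l the maximised log-likelihood. The log-likelihoods and parameter counts are hypothetical and do not reproduce values from this study.

```python
def aic(log_likelihood, n_parameters):
    """Akaike's information criterion: 2k - 2 * (maximised log-likelihood)."""
    return 2 * n_parameters - 2 * log_likelihood


# Hypothetical (log-likelihood, number of parameters) for two nested models.
# If a variance estimate sits at zero, dropping it leaves the likelihood
# unchanged while removing one parameter.
models = {"full": (-250.0, 6), "reduced": (-250.0, 5)}
scores = {name: aic(ll, k) for name, (ll, k) in models.items()}
preferred = min(scores, key=scores.get)  # the smaller AIC is preferred
```

In this illustration the reduced model has an AIC two units smaller, which is the penalty for the one parameter dropped.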
Model 1c is a variant of 1a in which the correlation
Having selected one model from among 1a–1c, a variant was considered in
which the correlated variance of the between-site random variable,
Here we consider the possibility that the between-geologist variance can be
replaced by a parametric function. In principle this is compatible with any
variant of the models considered so far. The expression for the
between-geologist covariance matrix in Eq. (
Three parametric functions were considered. In the first, the
between-geologist variance for the
The second parametric model considered used the geologist's self-assessment
of experience in 3-D geological modelling. There were four levels of
experience to choose from, so the parameter
Summary statistics of cross-section error.
Model 1 and variants, parameter estimates and inferences.
A final model was considered which combined the last two variants, with
separate intercepts and slopes of the linear function for the geologist
standard deviation being specified for each level of experience (i.e. eight
new parameters replacing
Note that the parametric functions in these three models return variances,
which may vary from one observation of cross section to another. The terms in
We used the selected model (model 2a as described in the results section
below) to simulate realisations of the random component of cross-section
error along a part of the cross section (from 4000
Model 2 and variants, parameter estimates and inferences.
Summary of model comparisons. In each case, the first-named simpler
“null” model is compared with a more complex alternative, either on the
log-likelihood ratio
Estimate of the mean cross-section error conditional on model 2a.
By finding the 2.5th and 97.5th percentiles of the simulated cross-section
errors at any location, we approximate the 95 % confidence interval for
model error. This can be used to visualise the uncertainty. The simulations
can also be used to answer other questions. Consider, for example, an
engineer who wishes to dig a tunnel through the London Clay along the length
of this part of the cross section. We assume that the engineer wants to put
the route of the tunnel as close as possible to the base of the London Clay,
but wants to avoid intruding on the underlying Lambeth Group. The conditional
simulations can be used to assess the risk of intruding on the Lambeth Group
if the tunnel route is
Figure 2 shows a scatter plot of interpreted and observed heights of the base of the London Clay for all observations of cross-section error. The points are scattered around the bisector (where observed and interpreted heights are equal), and there is no visual evidence of a systematic bias. Table 3 shows the summary statistics of cross-section error, and Fig. 3 shows the histogram of this variable. The symmetrical form of the histogram and the weak skewness and kurtosis values suggest that an assumption of normality is plausible for the analysis of these data. They also suggest that, if there is any systematic tendency for the base of the London Clay to be interpreted too high or too low, then this effect is small.
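Summary statistics of the kind used to judge the plausibility of normality can be computed as in the sketch below. Moment-based skewness and excess kurtosis are assumed here; the exact estimators behind Table 3 are not stated in the text, and the input values are invented.

```python
import math


def summary_statistics(values):
    """Mean, sample standard deviation, skewness and excess kurtosis."""
    n = len(values)
    mean = sum(values) / n
    # Central moments (divisor n) used for the shape coefficients.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "mean": mean,
        "sd": math.sqrt(m2 * n / (n - 1)),  # divisor n - 1
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,  # zero for a normal variable
    }


# Hypothetical cross-section errors (m), symmetric about zero.
stats = summary_statistics([-2.0, -1.0, 0.0, 1.0, 2.0])
```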
The results for model 1a and its variants are shown in Table 4. Note that
the estimated between-batch variance is zero. When a REML estimate of a
parameter is at the boundary of parameter space, as here, it is advisable to
examine the likelihood profile in the vicinity of the estimate. To compute
the likelihood profile for a model parameter, that parameter is fixed at a
series of values and, for each, the remaining parameters are estimated by
maximum (residual) likelihood. The maximised likelihoods are then plotted
against the values of the parameter of interest. The profile likelihood
should increase smoothly towards the estimated value. The profile likelihood
for the batch variance satisfied this requirement. This is not unreasonable;
because the batches were formed at random, we would hope that the
between-batch variation is purely explicable in terms of sampling error. The
comparison of models 1a and 1b can be done by examining the AIC, which is
smaller for the latter model, in which the batch effect is dropped.
Model 1b is therefore selected over 1a. The profile likelihood for the
uncorrelated between-site variance in these models,
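The profile-likelihood procedure described above can be sketched for a much simpler case than the LMM. The example assumes independent Gaussian data with an unknown mean and variance, uses ordinary rather than residual maximum likelihood, and uses invented values; it shows only the mechanics of fixing one parameter on a grid and maximising over the other.

```python
import math


def profile_log_likelihood(values, variance):
    """Gaussian log-likelihood at a fixed variance, maximised over the mean."""
    n = len(values)
    # For this simple model the maximising mean is the sample mean,
    # whatever value the variance is fixed at.
    mean_hat = sum(values) / n
    rss = sum((v - mean_hat) ** 2 for v in values)
    return -0.5 * (n * math.log(2.0 * math.pi * variance) + rss / variance)


# Hypothetical observations.
values = [-1.2, 0.4, 2.1, -0.7, 1.5, -2.3, 0.2, 0.9]

# Fix the variance at a series of values and record the maximised likelihood.
grid = [0.5 + 0.1 * i for i in range(100)]
profile = [profile_log_likelihood(values, v) for v in grid]
best_on_grid = grid[profile.index(max(profile))]
```

Plotting `profile` against `grid` would show whether the likelihood rises smoothly towards its maximum, which is the property checked in the text.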
In model 1c, the correlation between within-site effects for particular
geologists is dropped (set to zero). The maximum likelihood is slightly
smaller than for model 1b, in which this parameter is estimated. However,
the log-likelihood ratio statistic,
In model 1d, a stationary correlated variance for the between-site effect
(as in model 1c) is replaced by two parameters for a linear function which
expresses this variance as a function of distance to the nearest borehole
available for interpretation. This (full) model can be compared with a (null)
model (1c) with a stationary variance by the log-likelihood ratio test.
Once again,
In summary, the consideration of model 1a and its variants in Table 4 leads
us to the selection of model 1c (smallest AIC in the table), in which the
batch effect and the correlation parameter
Table 5 shows results for model 2a and its variants. These models are based
on 1c, but differ in that, rather than assuming a stationary geologist
effect, the between-geologist within-site variance is modelled as a function
of covariates. In model 2a, the geologist variance is modelled as a linear
function of distance to the nearest borehole available to the geologist for
interpretation. The zero value of the intercept,
All validation observations of the interpreted and observed height of the base of the London Clay AOD. The red line is the bisector.
Model 2b is an alternative to 2a, in which the geologist variance depends
on the self-identified experience of the geologist in 3-D modelling. The
estimated parameters in Table 5 are plausible in that the variance is largest
for geologists who identified themselves as having “no experience of
modelling in 3-D” and smallest for those who identified themselves as having
“substantial experience of more than 2
Histogram of cross-section errors.
In model 2c, different relationships between geologist variance and distance to nearest borehole were fitted for the four levels of geological experience. In the fitted model, the intercepts were all zero, and the smallest slope was for the geologists with the highest experience level. However, while the log-likelihood ratio test shows that model 2c is significantly better than model 2b (i.e. adding the information on distance to nearest borehole to a model with geologist experience gives a significant improvement), the comparison of model 2c with 2a leads to the conclusion that adding geologist experience to a model which already has the distance to nearest borehole incorporated does not give a significant improvement. On the basis of the AIC, model 2a is preferred among all those considered in this study. Table 6 summarises all the key comparisons between models and the inferences which arise from these comparisons.
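A log-likelihood ratio test between nested models can be sketched as below. The statistic is twice the difference in maximised log-likelihoods, referred to a chi-squared distribution with degrees of freedom equal to the number of extra parameters. The log-likelihood values are hypothetical, `chi2_sf` is a helper written for this sketch, and the chi-squared reference is assumed to apply (it does not when the null value lies on a boundary of parameter space, as noted earlier for the batch variance).

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable with integer df."""
    if x <= 0.0:
        return 1.0
    # Closed forms for df = 1 and df = 2, then a recurrence in steps of 2.
    if df % 2 == 0:
        k, sf = 2, math.exp(-x / 2.0)
    else:
        k, sf = 1, math.erfc(math.sqrt(x / 2.0))
    while k < df:
        sf += (x / 2.0) ** (k / 2.0) * math.exp(-x / 2.0) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return sf


def likelihood_ratio_test(ll_null, ll_full, extra_parameters):
    """Return the log-likelihood ratio statistic and its p value."""
    statistic = 2.0 * (ll_full - ll_null)
    return statistic, chi2_sf(statistic, extra_parameters)


# Hypothetical maximised log-likelihoods; the full model has one more parameter.
statistic, p_value = likelihood_ratio_test(-252.0, -250.5, 1)
```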
Table 7 shows the estimated mean cross-section error and its standard error,
under model 2a. The Wald statistic
95 % probability interval for simulated cross-section errors conditional on the location of the nearest borehole (red symbol) and model 2a. Note that these are evaluated at discrete locations.
Fig. 4 shows the 95 % probability interval for cross-section errors along the section, approximated by the 2.5th and 97.5th percentiles of the conditionally simulated errors. The red symbols show the locations of the boreholes. There are two features of the interval. First, there is a rapid narrowing near the boreholes (the interval is zero at the boreholes, but this is only seen if the borehole coincides with a point where the error is sampled). This arises from the spatial correlation of the between-site component of cross-section error. The second feature is a gradual widening of the interval to a local maximum at the midpoint between successive boreholes. This is particularly apparent in the second half of the plot. This arises from the dependence of the between-modeller effect on the distance to the nearest borehole, showing how the constraint of the borehole on model error decays with distance. In Fig. 5, the confidence intervals are added to the interpretation of the base of the London Clay by one of the modellers.
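The use of simulated errors to obtain percentile intervals, and to answer a question such as the tunnel-route example, can be sketched as follows. The simulation here is a deliberately simple stand-in: independent Gaussian errors whose standard deviation grows linearly with distance to the nearest borehole, with an invented slope and invented distances. It is not the fitted model 2a, and the names `percentile` and `probability_route_acceptable` are hypothetical.

```python
import random


def percentile(sorted_values, q):
    """Percentile (q in [0, 100]) of a sorted list, by linear interpolation."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


rng = random.Random(1)
# Hypothetical distances (m) to the nearest borehole at discrete locations.
distances = [0.0, 100.0, 200.0, 300.0, 200.0, 100.0, 0.0]
# Stand-in for conditional simulation: sd = 0.01 * distance (assumed).
sims = [[rng.gauss(0.0, 0.01 * d) for d in distances] for _ in range(2000)]

# 95 % interval at each location from the 2.5th and 97.5th percentiles.
intervals = []
for j in range(len(distances)):
    column = sorted(s[j] for s in sims)
    intervals.append((percentile(column, 2.5), percentile(column, 97.5)))


def probability_route_acceptable(sims, offset, max_fraction=0.01):
    """Share of realisations in which a route placed `offset` metres above the
    interpreted base lies below the true base at no more than `max_fraction`
    of the locations. With error = observed - interpreted, the route is below
    the true base wherever error > offset."""
    acceptable = 0
    for s in sims:
        fraction_below = sum(e > offset for e in s) / len(s)
        acceptable += fraction_below <= max_fraction
    return acceptable / len(sims)


p_close = probability_route_acceptable(sims, 0.0)
p_safe = probability_route_acceptable(sims, 5.0)
```

Evaluating `probability_route_acceptable` over a range of offsets gives a curve of the kind plotted against distance above the modelled base.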
Fig. 6 shows a plot of the estimated probability that a tunnel built
Both the summary statistics and the scatter plot (Fig. 2), and the estimate of the mean cross-section error from the selected model 2a (Table 7), show that the data obtained in this study provide no evidence that there is any bias in the interpretation of the base of the London Clay by the geologists in this study; i.e. the mean error is not significantly different from zero.
We established this experiment to test the hypothesis that the variability of the error of interpretations of cross sections varies spatially. This hypothesis has been supported. First, we found that there is spatial dependence in the variability of the between-site component of cross-section error. This is to say that the cross-section error at one location is likely to be more strongly correlated with the error at a nearby location than at one farther away. This is reasonable, since if, for example, a surface tends to be interpreted as being too high above the Ordnance Datum at a site, perhaps because of faulting, then it is likely that a similar error will occur at nearby sites. There was no evidence, however, that the between-site variance depends on the distance to the nearest borehole.
One geologist's interpretation of the base of the London Clay (red) with 95 % confidence intervals (blue).
How close to the modelled base of the London Clay could you build a tunnel (over the last 4 km of the cross section) and have a specified probability (ordinate) that the tunnel will stray into the underlying Lambeth Group for no more than 1 % of its length?
The between-geologist variance is rather smaller than the between-site
variance (compare
The fitted model can be used to simulate cross-section errors, conditional on a distribution of boreholes. One may use this procedure to compute confidence intervals around the interpreted cross section, which quantify uncertainty in this interpretation and show how it changes in space. One could also use this simulation method to study the propagation of cross-section error in further processing to interpolate the surface into 2-D, and so produce 3-D volumes.
The methodology presented in this paper could be deployed in a wider range of geological settings in order to generate statistical models of cross-section error for those settings. These could then be used to compute confidence intervals for new models or measures of uncertainty specific to the requirements of particular data users, such as the example for the London Clay illustrated in Fig. 6.
The experimental design used in this study allowed us to make best use of somewhat sparse boreholes by examining multiple geologist interpretations at each validation site. However, if there had been a significant correlation between within-site effects for the same geologist, then subsequent modelling of the geologist variance would have been complicated. Alternatively, one might use an experimental design in which validation sites are nested within modellers (so each modeller has a unique batch of validation sites). This requires there to be many boreholes available, however, since each validation borehole is compared with just one interpretation. It also reduces the information that we obtain on between-modeller differences.
One way to get around the problem of insufficient validation observations is
to generate synthetic cross sections, perhaps conditioned on geophysical data
such as interpretations from seismic lines. These synthetic cross sections
can then be notionally sampled at as many locations as we want to provide
synthetic borehole data for interpretation and validation. In such an
experiment, the synthetic validation boreholes should be
sampled according to an optimised design
We are grateful to our colleague Luz Ramos Cabrera for her contribution to setting up the trial, to those delegates to the 2012 GSI3D workshop, and to other colleagues at the British Geological Survey who participated. This paper is published with the permission of the executive director of the British Geological Survey (NERC). Edited by: K. Zeigler