Uncertainty assessment in 3-D geological models of increasing complexity 517 Germany Black Forest

The quality of a 3-D geological model strongly depends on the type of integrated geological data, their interpretation and associated uncertainties. In order to improve an existing geological model and effectively plan further site investigation, it is of paramount importance to identify existing uncertainties within the model space. Information entropy, a voxel-based measure, provides a method for assessing structural uncertainties, comparing multiple model interpretations and tracking changes across consecutively built models. The aim of this study is to evaluate the effect of data integration (i.e., update of an existing model through successive addition of different types of geological data) on model uncertainty, model geometry and overall structural understanding. Several geological 3-D models of increasing complexity, incorporating different input data categories, were built for the study site Staufen (Germany). We applied the concept of information entropy in order to visualize and quantify changes in uncertainty between these models. Furthermore, we propose two measures, the Jaccard and the city-block distance, to directly compare dissimilarities between the models. The study shows that different types of geological data have disparate effects on model uncertainty and model geometry. The presented approach using both information entropy and distance measures can be a major help in the optimization of 3-D geological models.


Introduction
Three-dimensional (3-D) geological models have gained importance in structural understanding of the subsurface and are increasingly used as a basis for scientific investigation (e.g., Butscher and Huggenberger, 2007;Caumon et al., 2009;Bis-tacchi et al., 2013;Liu et al., 2014), natural resource exploration (e.g., Jeannin et al., 2013;Collon et al., 2015;Hassen et al., 2016), decision making (e.g., Campbell et al., 2010;Panteleit et al., 2013;Hou et al., 2016) and engineering applications (Hack et al., 2006;Kessler et al., 2008).Overall, 3-D geological models are usually preferable over 2-D solutions because our object of study is intrinsically three-dimensional in space and, therefore, they offer a higher degree of data consistency and superior data visualization.Moreover, they enable the integration of many different types of geological data such as geological maps, cross sections, outcrops, boreholes and data from geophysical (e.g., Boncio et al., 2004) and remote-sensing methods (e.g., Schamper et al., 2014).Nevertheless, input data are often sparse, heterogeneously distributed or poorly constrained.In addition, uncertainties from many sources such as measurement error, bias and imprecisions, randomness, and lack of knowledge are inherent to all types of geological data (Mann, 1993;Bárdossy and Fodor, 2001;Culshaw, 2005).Furthermore, assumptions and simplifications are made during data collection, and subjective interpretation is part of the modeling process (Bond, 2015).Hence, model quality strongly depends on the type of integrated geological data and its associated uncertainties.
In order to assess the quality and reliability of a 3-D geological model as objectively as possible, it is essential to address underlying uncertainties.Numerous methods have recently been proposed that enable estimates, quantification and visualization of uncertainty (Tacher et al., 2006;Wellmann et al., 2010;Lindsay et al., 2012Lindsay et al., , 2013Lindsay et al., , 2014;;Lark et al., 2013;Park et al., 2013;Kinkeldey et al., 2015).A promising approach is based on the concept of information entropy (Shannon, 1948).Wellmann and Regenauer-Lieb (2012)

applied this concept
Published by Copernicus Publications on behalf of the European Geosciences Union.
to 3-D geological models.In their study, they evaluated uncertainty as a property of each discrete point of the model domain by quantifying the amount of missing information with regard to the position of a geological unit (Wellmann and Regenauer-Lieb, 2012).They consecutively added new information to a 3-D model and compared uncertainties between the resulting models at discrete locations and as an average value for the total model domain using information entropy as a quantitative indicator.Through their approach, they addressed two important questions: (1) how is model quality related to the available geological information and its associated uncertainties, and (2) how is model quality improved through the incorporation of new information?
Wellmann and Regenauer-Lieb (2012) illustrated their approach using synthetic 3-D geological models, showing how additional geological information affects model uncertainty.The present study goes a step further.It applies the concept of information entropy as well as model dissimilarity to a real case, namely the city of Staufen, Germany, at the eastern margin of the Upper Rhine Graben.In contrast to the previous study, the present study evaluates the effects of the consecutive addition of data from different data categories to an existing model on model uncertainty and overall model geometry.We hypothesize that disparate effects of different data types on model uncertainty exist and that the quantification of these effects provides a trade-off between costs (i.e., data acquisition) and benefits (i.e., reduced uncertainty and therefore higher model quality).Thus, several 3-D geological models of the study site were consecutively built with increasing complexity; each of them based on an increasing amount of (real) categorized data.An approach was developed that uses information entropy and model dissimilarity for the quantitative assessment of uncertainty in the consecutive models.Results indicate that the approach is applicable for complex and real geological settings.The approach has large potential as a tool to support both model improvement through successive data integration and cost-benefit analyses of geological site investigations.

Study site
The city of Staufen suffers from dramatic ground heave that resulted in serious damage to many houses (southwest Germany, Fig. 1).Ground heave with uplift rates exceeding 10 mm month −1 started in 2007 after seven wells were drilled to install borehole heat exchangers (BHEs) for heating the local city hall (LGRB, 2010).After more and more houses in the historic city center showed large cracks, an exploration program was initiated by the state geological survey (LGRB -Landesamt für Geologie, Rohstoffe und Bergbau) in order to investigate the case.Results showed that the geothermal wells hydraulically connected anhydrite-bearing clay rocks with a deeper aquifer, and resulting water inflow into the anhydritic clay rock triggered the transformation of the min-eral anhydrite into gypsum (Ruch and Wirsing, 2013).This chemical reaction is accompanied by a volume increase that leads to rock swelling, a phenomenon typically encountered in tunneling in such rock (e.g., Einstein, 1996;Anagnostou et al., 2010;Butscher et al., 2011bButscher et al., , 2015;;Alonso, 2011), but recently also observed after geothermal drilling (Butscher et al., 2011a;Grimm et al., 2014).The abovementioned exploration program was aimed not only at finding the cause of the ground heave but also at better constraining the complex local geological setting.The hitherto existing geological data were not sufficient to explain the observed ground heave, locate the geological units that are relevant for rock swelling, and plan countermeasures.
Staufen is located west of the Black Forest at the eastern margin of the Upper Rhine Graben (URG).It is part of the "Vorbergzone" (Genser, 1958), a transition zone between the eastern main border fault (EMBF) of the graben and the graben itself.This zone is characterized by staggered fault blocks that were trapped at the graben margin during opening and subsidence of the graben.The strata of this transition zone are often steeply inclined or even vertical (Schöttle, 2005) and are typically displaced by west-dipping faults with a large normal displacement.The fault system, kinematically linked to the EMBF, has a releasing bend geometry and today experiences sinistral oblique movement (Behrmann et al., 2003).The major geological units at the site comprise Triassic and Jurassic sedimentary rocks, which are covered by Quaternary sediments of an alluvial plain in the south (Sawatzki and Eichhorn, 1999) (Fig. 1).
Three geological units play an important role for the swelling problem at the site: the Triassic Gipskeuper ("Gypsum Keuper") formation, which contains the swelling zone, and the underlying Lettenkeuper formation and Upper Muschelkalk formation, which are aquifers providing groundwater that accesses the swelling zone via pathways along the BHE.The Gipskeuper formation consists of marlstone and mudstone and contains the calcium-sulfate minerals anhydrite (CaSO 4 ) and gypsum (CaSO 4 + H 2 O).The thickness of this formation varies between 50 and 165 m, with an average thickness of 100-110 m (LGRB, 2010), depending on the degree of leaching of the sulfate minerals close to the ground surface.It is underlain by the Lettenkeuper formation (5-10 m thickness), consisting of dolomitic limestone, standstone and mudstone, and the Upper Muschelkalk formation (≈ 60 m thickness) dominantly consisting of limestone and dolomitic limestone.

Input data
Input data for the 3-D geological modeling include all available geological data that indicate (1) boundaries between geological units, (2) the presence of geological units and faults at a certain positions, and (3) orientation (dip and azimuth) of the strata.These data were classified into four categories (Fig. 2): (1) non-site-specific, (2) site-specific, (3) direct problem-specific data and (4) indirect problem-specific data.
The non-site-specific data category comprises geological data that are generally available from published maps (Sawatzki and Eichhorn, 1999), the literature (Genser, 1958;Groschopf et al., 1981;Schreiner, 1991) and the database of the state geological survey, LGRB.Furthermore, a digital terrain model (DTM) of 1 m grid size is included in the non-site-specific data.Outcrop and borehole data are mostly scarce and irregularly distributed in space.The site-specific data comprise drill logs of the geothermal drillings, which provided a pathway for uprising groundwater that finally triggered the swelling.Problem-specific data comprise all data collected during the exploration program that was conducted after heave at the ground surface caused damage to the local infrastructure (LGRB, 2010(LGRB, , 2012)).This exploration program was initiated because geological knowledge of the site was insufficient for an adequate understanding of the swelling process in the subsurface and for planning and implementing suitable countermeasures.The problem-specific data were further divided into direct data from drill cores of the three exploration boreholes (Fig. 2; EKB 1+2 and BB 3), which add very accurate point information, and indirect data from a seismic campaign (Fig. 2; Profile 1-5), which add rather "fuzzy" 2-D information that has to be interpreted.

3-D geological modeling
The 3-D geological models were constructed using the geomodeling software SKUA/GoCAD ® 15.5 by Paradigm.They cover an area of about 0.44 km 2 and have a vertical extent of 665 m.A smaller area of interest (AOI, 300 m × 300 m, 250 m vertical extent) was defined within the model domain, including the drilled wells and the area where heave at the ground surface was observed and the problem-specific data were collected.
The strata of the models cover 10 distinct geological units including Quaternary sediments, Triassic and Jurassic bedrock, and crystalline basement at the lower model boundary (Fig. 3).The Triassic strata are further divided (from top to bottom) into four formations of Keuper (Steinmergelkeuper, Schilfsandstein, Gipskeuper and Lettenkeuper), two formations of Muschelkalk (Upper Muschelkalk, Middle to Lower Muschelkalk) and the Buntsandstein formation.Figure 3 provides an overview over the modeled geological units and average thicknesses used in the initial models.
Four initial models were consecutively built, according to the four previously described data categories.Model 1 was constructed based only on non-site-specific data (maps, literature, etc.); Model 2 additionally considered site-specific data (drill logs of the seven geothermal drillings); Model 3 also included "direct" problem-specific data (exploration boreholes); and finally, Model 4 included "indirect" problemspecific data (seismic campaign).

Site specific
Additional data: Geological data and information on local to regional scale.

Geological data with direct reference to the area of interest (AOI).
Geological data with direct reference to the AOI and collected explicitly to address the swelling problem.data density and structural model complexity increase from Models 1 to 4 and the models required successively higher efforts in data acquisition in the field.
First, an explicit modeling approach (Caumon et al., 2009) was used to create representative boundary surfaces for the geological units and faults of the initial model because the available input data were, in terms of spatial coverage, not sufficient to directly use an implicit approach.Discrete smooth interpolation (DSI) provided by GoCAD ® was used as the interpolation method (Mallet, 1992), which resulted in Delaunay-triangulated surfaces for both horizons and faults.Subsequently, based on the explicitly constructed surfaces, a volumetric 3-D model was built by implicit geological modeling, implemented in the software SKUA ® .The implicit modeling approach uses a potential field interpolation considering the orientation of strata (Frank et al., 2007), and is based on the U -V -t concept (Mallet, 2004), where horizons represent geochronological surfaces.

General approach
Our approach for assessing uncertainties in the 3-D geological models consists of four distinct steps (Fig. 4): i. Building the initial 3-D geological models of increasing data density and structural complexity (see above).
ii.The definition of fault and horizon uncertainties.Horizon uncertainties were specified in SKUA ® by a maximum displacement parameter or by alternative surface interpretations, resulting in a symmetric envelope of possible surface locations around the initial surface.
To constrain the shape of generated horizons, SKUA ® uses a variogram that spatially correlates perturbations applied to the initial surfaces (Paradigm, 2015).Fault uncertainties were defined by a maximum displacement parameter and a Gaussian probability distribution around the initial fault surface (Caumon et al., 2007;Tertois and Mallet, 2007).
iii.The creation of 30 model realizations for each initial model based on the surface variations defined above, applying the Structure Uncertainty workflow of SKUA ® .
iv.The extraction of the geological information from all model realizations for analysis, comparison and visualization.For this purpose, the AOI was divided into a regular 3-D grid of 5 m cell size, resulting in 180 000 grid cells.The membership of a grid cell to a geological unit was defined as a discrete property of each grid cell and extracted for all 30 model realizations.Based on these data, we calculated the probability of each geological unit being present in a grid cell in order to derive the information entropy at the level of (1) a single grid cell, (2) a subset representing the area of extent of a geological unit and (3) the overall AOI.Furthermore, the fuzzy set entropy was calculated to determine the ambiguousness of the targeted geological units Gipskeuper (km1), Lettenkeuper (ku) and Upper Muschelkalk (mo) within the AOI.Calculations were conducted using the statistics package R (R Core Team, 2016).The underlying concepts and equations used to calculate probabilities and entropies are described in the following section.

Information entropy
The concept of information entropy (or Shannon entropy) was first introduced by Shannon (1948) and is well known in probability theory (Klir, 2005).It quantifies the amount of missing information and hence, the uncertainty at a discrete location x, based on a probability function P of a finite data set.When applied to geological modeling, information entropy expresses the "degree of membership" of a grid cell to a specific geological unit.In other words, information entropy quantitatively describes how unambiguously the available information predicts that unit U is present at location x.Information entropy was recently applied to 3-D geological modeling by Wellmann et al. (2010) and Wellmann and Regenauer-Lieb (2012) in order to quantify and visualize uncertainties introduced by the imprecision and inaccuracy of geological input data.A detailed description of the method can be found in the cited references and is briefly summarized here.By subdividing the model domain M into a regular grid, a discrete property can be assigned to any cell at location x in the model domain.In a geological context, the membership of a grid cell to a geological unit U can be defined as such a property by an indicator function: Applied to all n realizations k of the model space M, the indicator function yields a set of n indicator fields I with each of them defining the membership of a geological unit as a property of a grid cell.Considering the combined information of all indicator fields, it follows that membership is no longer unequivocally defined at a location x and hence has to be expressed by a probability function P U : From the probabilities of occurrence P x (U ) the uncertainty (or amount of missing information) associated with a discrete point (grid cell) can be obtained by calculating the information entropy H x (Shannon, 1948) for a set of all possible geological units U: (3) In a next step, information entropy H M can be calculated as an average value of H x over the entire model space: where |M| is the number of elements within M, H M = 0 denotes that the location of all geological units is precisely known (no uncertainty) and H M is maximal for equally distributed probabilities of the geological units (P P U 3 = . ..), which means that a clear distinction between geological units within the model space is not possible.Similarly, average information entropy can also be applied to only a subset of the model space (S ⊆ M): H S can be used to evaluate the contribution of a specific sub-domain to overall uncertainty.In the case of a drilling campaign, for example, the sub-domain can comprise a targeted depth or a geological formation of specific interest.In this study, we used the probability function P x (U ) with H S conditioned by P x (U ) > 0 to define subsets within the model space.Thus, each subset represents the probability space of a geological formation of interest, namely the Lettenkeuper (S ku ), Gipskeuper (S km1 ) and Upper Muschelkalk (S mo ) formation.
Wellmann and Regenauer-Lieb (2012) also adapted fuzzy set theory (Zadeh, 1965) in order to assess how well-defined a single geological unit is within a model domain.A fuzzy set of n model realizations introduces a certain degree of indefiniteness to a discrete property (e.g., membership of a ge-ological unit), resulting in imprecise boundaries which can be referred to as fuzziness.The fuzziness of a fuzzy set (De Luca and Termini, 1972) in the context of a geological 3-D model can be quantified by the fuzzy set entropy H U (Leung et al., 1992;Yager, 1995): where the probability function P x (U ) with an interval [0,1] represents the degree of membership of a grid cell to a fuzzy set.H U equals 0 when P x (U ) is either 0 or 1 everywhere within the set; and H U equals 1 when all cells of the set have an equal probability of P x (U ) = 0.5.

Model dissimilarity
The stepwise addition of input data to the models (see Sect. 3.1) not only affects uncertainties associated with a geological unit but also the geometry of the units and therefore their position, size and orientation in space.New data may significantly change the geometry of a geological unit but only marginally change the overall uncertainty.Thus, both model uncertainty and dissimilarity should be evaluated.In order to quantify the dissimilarity d between consecutive models in terms of the probability of a specific geological unit occurring in a given voxel, two measures, the Jaccard and the city-block distance (Fig. 5), are proposed to complement information entropy.However, dissimilarities between models, and therefore, uncertainties, have recently also been addressed very effectively using geo-diversity metrics such as formation depth and volume, curvature and neighborhood relationships together with principal component analysis (Lindsay et al., 2013) and through topological analysis, which quantifies geological relationships in a model (Thiele et al., 2016a, b).The set of locations for which the probability P x (U ) of belonging to a particular geological unit U is greater than a threshold value t can be defined by A threshold value of t = 0 was applied in order to capture and consider the same sample space as in H U .This definition is highly sensitive to outcomes of small probability and might, in some cases, be more robust using a threshold value greater than 0 (e.g., t > 0.05).The Jaccard similarity measure (Webb and Copsey, 2003) is then defined as the size of the intersection divided by the size of the union (overlap) of two sample sets (M1, M2), which in our case represent the similarity in position of a geological unit U between two models: Accordingly, the dissimilarity between models can be expressed by the Jaccard distance: where d JAC = 1 indicates maximum dissimilarity (no match in position of a geological unit U between two models) and d JAC = 0 indicates complete overlap.
Even though the use of binary dissimilarities is straightforward and suitable to quantify absolute changes in position of a geological unit between models, it does not account for fuzziness (see Sect. 3.3.2).Hence, the dissimilarity may be overestimated by the Jaccard distance.In order to include fuzziness, the normalized city-block distance was employed, adopting the probability function P x (U ) as a dimension to compare dissimilarities between the two sample sets (M1,M2) (Webb and Copsey, 2003;Paul and Maji, 2014): where N is the size of M1 ∪ M2 (i.e, number of grid cells present within the union).The distance is greatest for d NCB = 1.

Initial 3-D models
The four consecutively constructed initial models show a stepwise increase in structural complexity (Fig. 6).structures difficult, especially into depth (Jessell et al., 2010).Dip and strike were assumed uniform (40 and 35 • ) for all horizons across the model domain (see Fig. 6).Information from geological maps and outcrop data revealed a normal fault within the AOI, which was assumed to be ENE-WSW striking with a moderate displacement of about 50 m.In Model 2, horizon positions of the Schilfsandsteinkeuper (km2), Gipskeuper (km1) and Lettenkeuper (ku) were locally constrained by site-specific information provided by drill logs of the geothermal wells, slightly impacting fault displacement and thickness of the formations.However, changes in model geometry were minor, as no further infor-mation on horizon orientations was available and no additional faults could be located.By adding the direct problemspecific data from the exploration wells to Model 3, a horstgraben structure was identified that entailed a considerable displacement at two normal faults between and to the northwest of the wells with a displacement of 120 and 70 m, respectively.Furthermore, the drill logs included orientation measurements of the strata, resulting in a shift in position and inclination of layers, compared to the previous models.Thus, large parts of the model domain within the AOI changed from Model 2 to Model 3, and, as a consequence, dissimilarities between these models are particularly high (see Sect. 4.4).Finally, Model 4, which included data from a seismic campaign, has the highest degree of structural complexity.The information provided by seismic sections revealed uncertainties, which were present previously but not captured by the simpler Models 1 to 3. Ultimately, seismic data force the interpreter to add complexity down to a certain scale.However, seismic surveys are inherently ambiguous and allow alternative interpretations, especially concerning the orientation and number of faults as well as the type of fault contact to a fault network (e.g., branching) (Røe et al., 2014;Cherpeau and Caumon, 2015;Julio et al., 2015).In our case, seismic sections and interpretations were adopted from LGRB (2010).
The indirect problem-specific data from the seismic 2-D survey located several additional faults within the AOI, and in some cases caused a shift in position of faults compared to Model 3. The AOI was strongly fragmented by the added faults, and the orientation of layers is no longer uniform but varies strongly between fault blocks.In summary, the stepwise integration of data according to the four data categories improved our general knowledge of subsurface structures at the study site (Fig. 2).In addition, the effect of data integration from different exploration stages on modeled subsurface geometry could be evaluated and visualized.

Multiple model realizations
The multiple (30) model realizations created by the Structural Uncertainty workflow of SKUA ® are illustrated in Fig. 7 using 2-D cross sections of Models 1 and 4 as examples.A total number of 30 realizations and a cell size of 5 m was chosen as a compromise between model detail, lowest practical limit for statistical viability and data handling.For the same reason we did not base our number of realizations on an estimate of convergence.Instead we used the estimate of 30 realizations for a stable fluctuation in fuzzy entropy in a model developed by Wellmann et al. (2010) as a guideline value to our model.Perturbations in horizon location are based on (1) alternative surface interpretations, which reflect a maximum deviation in dip and azimuth (±5 • ) from the initial surface and (2) constant displacement values, which were assigned in order to account for uncertainties in formation thickness and boundary location.For a more detailed expla-nation of our choice of parameters, assigned probability distributions and specific input modes of the Structural Uncertainty workflow, please refer to the Supplement (Tables S1  and S2).In Model 1, the non-site-specific data set includes minimal constraints, resulting in faults and horizons of the realizations that are widely dispersed but parallel.In contrast, the faults and horizons of the Model 4 realizations are more narrowly dispersed where problem-specific data were available within the AOI.The workflow handles equal uncertainties consistently across models by producing a similar pattern of horizontal displacement in Models 1 and 4.This can be seen in particular for structures located close to the NW boundary, which were not further constrained by consecutively added geological data.However, it is also apparent from the mostly uniform orientation of the surfaces in the 30 realizations of each model that perturbation measures implemented in the Structural Uncertainty workflow did not allow for large variations in dip and azimuth of horizons or faults.Therefore, uncertainty may be systematically underestimated especially at greater depths.

Distribution of information entropy
Information entropy, quantified at the level of individual grid cells, can be visualized in 3-D to identify areas of uncertainty and evaluate changes in geometry resulting from successive data integration.Figure 8a shows the distribution of information entropy for Models 1 and 4. It can also be seen that the approach is suitable for locating areas with high degrees of uncertainty, indicated by dark red colors (hot spots) in this figure.Furthermore, Fig. 8b highlights where additional constraints from the data helped to optimize the model by reducing uncertainties ( H x < O) and whether further constraints are needed in locations of specific interest.The overall distribution of uncertainty was clearly affected by additional geological information from site-and problemspecific input data (Model 4).This effect is highlighted by the changes in entropy between the models (Fig. 8b).Additional constraints on horizon and fault boundaries caused a shift in position and orientation of geological units, followed by a large redistribution of uncertainties, indicated by the changes in entropy.It can be seen that new hot spots of uncertainty were introduced in proximity to the faults identified by the exploration boreholes and the seismic data incorporated into Model 4 (see Fig. 6).However, these new areas of uncertainty can be considered an optimization of the model because large parts of the preceding Model 1 did not reflect the complex local geology.Model 1 (wrongly) predicted low uncertainties for areas where information on unidentified but existing structures (i.e., faults) was missing.This illustrates that epistemic uncertainties at the study site are likely substantial.Even Model 4 will inevitably still underrepresent the true structural complexity at this site, especially in areas of low data density.In a risk-assessment and decision-making process, this can be problematic because low uncertainty areas might be in fact no-information areas.In such a case, the respective model area would actually be highly uncertain.However, ambiguities in data interpretation (e.g., seismic sections) can lead to incorrectly identified structures and uncertainty in any case, even in areas of high data density.Nevertheless, the approach allows one to assess and visualize uncertainties related to structures that have been identified during site investigation.To lessen the limitations posed by non-sampled locations, Yamamoto et al. (2014) proposed a post-processing method for uncertainty reduction, using multiple indicator functions and interpolation variance in addition to information entropy.Based on information theory, Wellmann (2013) further proposed joint entropy, conditional entropy and mutual information as measures to evaluate correlations and reductions of uncertainty in a spatial context.However, uncertainty from a lack of evidence of a geological structure (e.g., fault), known as imprecise knowledge (Mann, 1993), still depends on the density and completeness of available input data.

Average information entropy
The calculated average information entropy H T of the consecutive models steadily decreases with higher data specificity (i.e., non-site to problem-specific, see Fig. 2) from Models 1-4 (Fig. 9).Mean values of H M ranged from 0.56 (Model 1) to 0.39 (Model 4), where H M = 0 would denote no structural uncertainty.The decrease from Models 1 to 4 is approximately linear, indicating that all four categories of geological data had a similar impact on overall model uncertainty, even though the added information resulted in quite different model geometries and, as discussed above, in some cases in a local increase in entropy (see Fig. 8b).A similar but more pronounced trend was observed for the average entropy H S of the subsets S km1 , S ku and S mo , which represent the domain of the three geological units that are of particular importance to the swelling problem.However, entropy, i.e., the amount of uncertainty, is considerably higher within the domain of these geological units than for the overall model space, especially for the subsets S ku and S mo , identifying them as areas of a particularly high degree of uncertainty.Note that these units are the aquifers that have been hydraulically connected to the swellable rocks via the geothermal drillings.Nevertheless, all entropy values are comparably moderate, considering that a maximum of (only) five different geological units was found in any one grid cell across all four models, yielding a possible maximum entropy of H M = 2.32 for an equal probability distribution (P 1 = P 2 = P 3 = P 4 = P 5 ).For comparison: if all 10 geological units would be equally probable, the maximum entropy would be 3.32.Furthermore, median values and interquartile range dropped from 0.51 (0-0.99) in Model 1 to 0 (0-0.84) in Model 4. This helps to illustrate that the amount of grid cells with H x = 0 (indicating no inherent uncertainty), increased notably by 34.8 % from 40.6 (Model 1) to 54.8 % (Model 4) and that the remaining entropies in Model 4 are limited to a considerably smaller number of cells within the model domain.
Overall, comparing the pre-to post-site-investigation situations (Models 1-4), site and problem-specific investigations were all equally successful in adding information to the model and reducing uncertainties in the area of the targeted horizons.While the benefits from the different data are equal, the costs in data acquisition (i.e., work, money and time re- quired) may vary considerably, depending on the exploration method (e.g., drillings and seismic survey).An economic evaluation was not within the scope of this study.Nevertheless, the approach presented could improve cost and benefit analyses by quantifying the gain in information through different exploration stages.

Fuzzy set entropy
The fuzzy set entropy was calculated to indicate how welldefined a geological unit is within the model space.Applied to the swelling problem of our case study, a high degree of uncertainty remains with regard to the position of the relevant geological units (km1, ku, mo) after full data integration.We obtained fuzzy set entropy values (H U ) ranging between 0.329-0.504(Fig. 10).The fuzziness of these geological units only slightly changed from Models 1 to 4, indicating that higher data specificity did not translate into more clearly defined geological units within the model domain.This can be partially attributed to the complex geological setting of the study site.In the process of data integration, additional boundaries between geological units are created at newly introduced faults, increasing the overall fuzziness of a unit.
In the case of the Lettenkeuper formation (unit ku), boundaries are even slightly less well-defined in Model 4 compared to Model 1.This is likely related to the low thickness of the formation (5-10 m, Fig. 3) relative to the mesh size (5 m).A finer grid could reduce this effect; however, computation time would increase significantly.Wellmann and Regenauer-Lieb (2012) propose using unit fuzziness to determine an optimal representative cell size and reduce the impact of spatial discretization on information entropy.As previously discussed in Sect.4.2, our workflow does not explicitly consider uncertainties through dip and strike variations by a value indicated for this purpose but through perturbations based on alternative surface interpretations, which in our case likely underestimates the fuzziness of the targeted geological units at greater depths.Thus, overall fuzziness, particularly in Model 1, may be significantly higher than calculated.

Models dissimilarity
A gain in structural information through newly acquired data usually not only impacts model uncertainty but is also associated with a change in model geometry.The calculated distances between models can identify the data category with the strongest impact on model geometry and make it possible to determine whether model geometry and uncertainty are related.Figure 11 shows the calculated Jaccard and city-block distances between the models with respect to the targeted geological units km1, ku and mo.Calculated distances between models are rather high, with values of up to 0.78; indicating a pronounced shift in position of the geological units after data were added.The addition of both direct and indirect problem-specific data to Model 3 had a strong impact on model geometry, which can be seen by comparing the calculated distances between Models 2, 3 and 4 for both Jaccard and city block (Fig. 11).In contrast, sitespecific data had a much lower effect, with less than a 20 % (0.2) change in unit position, except for ku of the Jaccard distance (see distance between Models 1 and 2).
Overall, the city-block distance, which considers the fuzziness of geological boundaries, shows a similar trend to the Jaccard distance; however, changes are much less pronounced, especially for unit ku.According to the low cityblock distance, absolute changes in probability P x (U ) for each grid cell are small, whereas high Jaccard distances indicate a large number of grid cells being affected through newly added data.Thus, the Jaccard distance likely overestimated the actual dissimilarity between models.Comparing unit ku of both distances; the disparity between values hints at a large number of low-degree changes in membership of the grid cells ( P x (U ) 1).These predominately low-degree changes are likely related to the abovementioned high degree of unit boundary fuzziness and the resulting, illdefined, geological unit ku being shifted within the model domain.However, a direct comparison of fuzzy set entropy to the corresponding city-block distance yields no quantifiable relationship between model geometry and structural uncertainty.
Nonetheless, both distance measures allow the quantification and assessment of different aspects of dissimilarities and therefore, changes in geometry across models.Nevertheless, the city-block distance is preferable when sets of multiple realizations are compared because it factors in the probability of the occurrence of a geological unit at a discrete location.In recent years, various distance measures have already been applied in other contexts to create dissimilarity distance matrices and compare model realizations in history matching and uncertainty analysis, particularly in reservoir modeling (Suzuki et al., 2008;Scheidt and Caers, 2009a, b;Park et al., 2013).These include the Hausdorff distance which, similar to our approach, directly compares the geometry of different structural model realizations but also more sophisticated measures that calculate distances in realizations based on flow model responses from a transfer function.

Summary and conclusions
Prior work has demonstrated the effectiveness of information entropy in assessing model uncertainties and providing valuable insight into the geological information used to constrain a 3-D model.Wellmann and Regenauer-Lieb (2012), for example, evaluated how additional information reduces uncertainty and helps to constrain and optimize a geological model using the measure of information entropy.Their approach focused on a hypothetical scenario of newly added borehole data and cross-section information to a synthetic model.In the present study, information entropy and, in addition, model dissimilarity was used to assess the impact of newly acquired data on model uncertainties using actual siteinvestigation data in the complex geological setting of a real case.
We presented a new workflow and methods to describe the effect of data integration on model quality, overall structural understanding of the subsurface and model geometry.Our results provide a better understanding of how model quality can be assessed in terms of uncertainties in a data acquisition process of an exploration campaign, showing that information entropy and model dissimilarity are powerful tools to visualize and quantify uncertainties, even in complex geological settings.The main conclusions of this study are as follows: 1. Average and fuzzy set entropy can be used to evaluate uncertainties in 3-D geological modeling and, therefore, support model improvement during a consecutive data integration process.We suggest that the approach could be used to also perform a cost-benefit analysis of exploration campaigns.
2. The study confirms that 3-D visualization of information entropy can reveal hot spots and changes in the distribution of uncertainty through newly added data in real cases.The method provides insight into how additional data reduce uncertainties in some areas and how newly identified geological structures may create hot spots of uncertainty in others.Furthermore, the method stresses that parsimonious models can locally underestimate uncertainty, which is only revealed after new data are available and being considered.
3. Dissimilarities in model geometry across different sets of model realizations can effectively be quantified and evaluated by a single value using the city-block distance.A combination of the concepts of information entropy and model dissimilarity improves uncertainty assessment in 3-D geological modeling.
However, some limitations of the presented approach are noteworthy.Although it was designed to assess uncertainties in the position and thickness of horizons, uncertainty in orientation could only be included indirectly through perturbations based on alternative surface interpretations but not by explicit dip and azimuth parameter values indicated for this purpose.This may result in a systematic underestimation of uncertainties at greater depths of the model domain.Furthermore, our study site (Vorbergzone) is a highly fragmented geological entity, and epistemic uncertainties due to missing information about unidentified but existing geological structures are likely substantial.
Future work should therefore aim to include "fault block uncertainties" more effectively into the workflow, for example by including multiple fault network interpretations (Holden et al., 2003;Cherpeau et al., 2010;Cherpeau and Caumon, 2015) or by considering fault zones that produce a given displacement by a variable number of faults.Finally, all data of the investigated site were collected prior to our analysis; therefore, additional data were not explicitly collected in order to reduce detected uncertainties within the consecutive models.Applying this approach during an ongoing site investigation could improve the targeted exploration and allow a well-founded cost-benefit analysis through uncertainty hot-spot detection.
Data availability.The underlying research data were collected and provided by the state geological survey, LGRB.They are freely available in the form of two extensive reports (LGRB, 2010(LGRB, , 2012) ) summarizing the findings of the exploration campaigns conducted in the city of Staufen (Germany).Both reports can be downloaded from http://www.lgrb-bw.de/geothermie/staufen.Since the size of the simulation data sets is too large for an upload, the authors encourage interested readers to contact the coauthors.
The Supplement related to this article is available online at doi:10.5194/se-8-515-2017-supplement.

Figure 1 .
Figure 1.Study site and location of the model area and area of interest (AOI).

Figure 2 .Figure 3 .
Figure 2. Data categories and geological input data used to build four initial 3-D geological models.The green square indicates the area of interest (AOI), where data were extracted for further analysis.For geological formation color code, see Fig. 1.

Figure 5 .
Figure 5. Distance measures used to calculate dissimilarities between models (M1, M2).(a) Jaccard distance (d JAC ) using a true/false binary function and (b) normalized city-block distance based on a probability function.
Figure 6.(a) Cross section through the AOI of all four initial geological models with projected borehole tracks (black lines) and 3-D representations of (b) Model 1 and (c) Model 4.

Figure 7 .
Figure 7. Cross section through Models 1 and 4. The multiple lines show 30 model realizations with shifted faults and horizons (for the location of the cross sections, see Fig. 6).The horizontal lines indicate the land surface (purple) and the base of the Quaternary (blue).
Figure 8. 3-D view of the AOI with a discretization of 5 m for (a) average information entropy H M of Models 1 and 4 and (b) change in entropy H x between both models.

Figure 9 .
Figure 9. Average entropy H M calculated for the different models (mean and median) and for subsets of the model space of each model (S km1 , S ku , S mo ).

Figure 10 .
Figure 10.Fuzzy set entropy H U of the targeted geological units km1, ku and mo of the different models.

Figure 11 .
Figure 11.Dissimilarities between the different models expressed by (a) Jaccard distance and (b) city-block distance.