Data assimilation and uncertainty assessment in 3 D geological modeling

The quality of a 3D geological model strongly depends on the type of integrated geological data, their interpretation and associated uncertainties. In order to improve an existing geological model and effectively plan further site investigation, it is of paramount importance to identify existing uncertainties within the model space. Information entropy, a voxel based measure, provides a method for assessing structural uncertainties, comparing multiple model interpretations and tracking changes across consecutively built models. The aim of this study is to evaluate the effect of data assimilation on model uncertainty, model 5 geometry and overall structural understanding. Several geological 3D models of increasing complexity, incorporating different input data categories, were built for the study site Staufen (Germany). We applied the concept of information entropy in order to visualize and quantify changes in uncertainty between these models. Furthermore, we propose two measures, the Jaccard and the City-Block distance, to directly compare dissimilarities between the models. The study shows that different types of geological data have disparate effects on model uncertainty and model geometry. The presented approach using both 10 information entropy and distance measures can be a major help in the optimization of 3D geological models.


Introduction
Three dimensional (3D) geological models have gained importance in structural understanding of the subsurface and are increasingly used as a basis for scientific investigation (e.g., Butscher and Huggenberger, 2007;Caumon et al., 2009;Bistacchi et al., 2013;Liu et al., 2014), natural resource exploration (e.g., Jeannin et al., 2013;Collon et al., 2015;Hassen et al., 2016), decision-making (e.g., Campbell et al., 2010;Panteleit et al., 2013;Hou et al., 2016) and engineering applications (Hack et al., 2006;Kessler et al., 2008).3D geological models are favorable over 2D solutions due to their high data consistency and superior data visualization.Moreover, they enable the integration of many different types of geological data such as geological maps, cross-sections, outcrops, boreholes as well as data from geophysical (e.g., Boncio et al., 2004) and remote sensing methods (e.g., Schamper et al., 2014).Nevertheless, input data are often sparse, heterogeneously distributed or poorly constrained.In addition, uncertainties from many sources such as measurement error, bias and imprecisions, randomness and lack of knowledge are inherent to all types of geological data (Mann, 1993;Bárdossy and Fodor, 2001;Culshaw, 2005).Furthermore, assumptions and simplifications are made during data collection, and subjective interpretation is part of the modeling process (Bond, 2015).Hence, model quality strongly depends on the type of integrated geological data and its associated uncertainties.
In order to assess the quality and reliability of a 3D geological model as objectively as possible, it is essential to address underlying uncertainties.Numerous methods have recently been proposed that enable estimates, quantification and visualization of uncertainty (Tacher et al., 2006;Wellmann et al., 2010;Lindsay et al., 2012Lindsay et al., , 2013Lindsay et al., , 2014;;Lark et al., 2013;Park et al., 2013;Kinkeldey et al., 2015).A promising approach is based on the concept of information entropy (Shannon, 1948).Wellmann and Regenauer-Lieb (2012) applied this concept to 3D geological models.In their study, they evaluated uncertainty as a property of each discrete point of the model domain by quantifying the amount of missing information with regard to the position of a geological unit (Wellmann and Regenauer-Lieb, 2012).They consecutively added new information to a 3D model and compared uncertainties between the resulting models at discrete locations and as an average value for the total model domain using information entropy as a quantitative indicator.Through their approach, they addressed two important questions: 1) How is model quality related to the available geological information and its associated uncertainties; and 2) how is model quality improved through incorporation of new information?
Wellmann and Regenauer-Lieb (2012) illustrated their approach using synthetic 3D geological models, showing how additional geological information affects model uncertainty.The present study goes a step further.It applies the concept of information entropy as well as model dissimilarity to a real case, namely the city of Staufen, Germany at the eastern margin of the Upper Rhine Graben.In contrast to the previous study, the present study evaluates the effects of consecutive addition of data from different data categories to an existing model on model uncertainty and overall model geometry.We hypothesized that disparate effects of different data types on model uncertainty exist, and that quantification of these effects provides a trade-off between costs (i.e.data acquisition) and benefits (i.e.reduced uncertainty and therefore higher model quality).Thus, several 3D geological models of the study site were consecutively built with increasing complexity; each of them based on an increasing amount of (real) categorized data.An approach was developed that uses information entropy and model dissimilarity for quantitative assessment of uncertainty in the consecutive models.Results indicate that the approach is applicable for complex and real geological settings.The approach has large potential as a tool to support both model improvement through data assimilation and cost-benefit analyses of geological site investigations.

Study site
The city of Staufen suffers from dramatic ground heave that resulted in serious damage to many houses (South-West Germany, Fig. 1).Ground heave with uplift rates exceeding 10 mm month -1 started in 2007 after seven wells were drilled to install borehole heat exchangers for heating the local city hall (LGRB, 2010).After more and more houses in the historic city center showed large cracks, an exploration program was initiated by the State Geological Survey (LGRB) in order to investigate the case.Results showed that the geothermal wells hydraulically connected anhydrite-bearing clay rocks with a deeper aquifer, and resulting water inflow into the anhydritic clay rock triggered the transformation of the mineral anhydrite into gypsum (Ruch and Wirsing, 2013).This chemical reaction is accompanied by a volume increase that leads to rock swelling, a phenomenon typically encountered in tunneling in such rock (e.g., Einstein, 1996;Anagnostou et al., 2010;Butscher et al., 2011bButscher et al., , 2015;;Alonso, 2012), bur recently also observed after geothermal drilling (Butscher et al., 2011a;Grimm et al., 2014).The above mentioned exploration program aimed not only at finding the cause of the ground heave, but also at better constraining the complex local geological setting.The hitherto existing geological data were not sufficient to explain the observed ground heave, locate the geological units that are relevant for rock swelling, and plan counter measures.
Staufen is located west of the Black Forest at the eastern margin of the Upper Rhine Graben (URG).It is part of the "Vorbergzone" (Genser, 1958), a transition zone between the Eastern Main Border Fault (EMBF) of the graben and the graben itself.This zone is characterized by staggered fault blocks that got trapped at the graben margin during opening and subsidence of the graben.The strata of this transition zone are often steeply inclined or even vertical (Schöttle, 2005), and are typically displaced by west-dipping faults with a large normal displacement.The fault system, kinematically linked to the EMBF, has a releasing bend geometry and today experiences sinistral oblique movement (Behrmann et al., 2003).The major geological units at the site comprise Triassic and Jurassic sedimentary rocks, which are covered by Quaternary sediments of an alluvial plain in the south (Sawatzki and Eichhorn, 1999) (Fig. 1).
Three geological units play an important role for the swelling problem at the site: the Triassic Gipskeuper ("Gypsum Keuper") formation, which contains the swelling zone; and the underlying Lettenkeuper formation and Upper Muschelkalk formation, which are aquifers providing groundwater that accesses the swelling zone via pathways along the BHE.The Gipskeuper formation consists of marlstone and mudstone, and contains the calcium-sulfate minerals anhydrite (CaSO4) and gypsum (CaSO4 + H2O).The thickness of this formation varies between 50-165 m, with an average thickness of 100-110 m (LGRB, 2010), depending on the degree of leaching of the sulfate minerals close to the ground surface.It is underlain by the Lettenke-3 Methods

Input data
Input data for the 3D geological modeling include all available geological data that indicate: 1) boundaries between geological units, 2) presence of geological units and faults at a certain positions and 3) orientation (dip and azimuth) of the strata.These data were classified into four categories (Fig. 2): 1) non-site specific, 2) site specific, 3) problem direct specific data and 4) indirect problem specific data.

Non-site specific
Input data:

Additional data:
Seismic exploration e -Profil 1 5 Geological data and information at local to regional scale.
Geological data with direct reference to the area of interest (AOI).
Geological data with direct reference to the AOI and collected explicitly to address the swelling problem.The non-site specific data category comprise geological data that are generally available from published maps (Sawatzki and Eichhorn, 1999), literature (Genser, 1958;Groschopf et al., 1981;Schreiner, 1991) and the database of state geological survey (LGRB).Furthermore, a Digital Terrain Model (DTM) of 1 m grid size is included in the non-site specific data.Outcrop and borehole data are mostly scarce and irregularly distributed in space.The site specific data comprise drill logs of the geothermal drillings, which provided a pathway for uprising groundwater that finally triggered the swelling.Problem specific data comprise all data collected during the exploration program that was conducted after heave at the ground surface caused damage to the local infrastructure (LGRB, 2010(LGRB, , 2012)).This exploration program was initiated, because geological knowledge of the site was insufficient for an adequate understanding of the swelling process in the subsurface; and for planning and implementing suitable counter measures.The problem specific data were further divided into direct data from drill cores of the three exploration boreholes (Fig. 2; EKB 1 + 2 and BB 3), which add very accurate point information; and indirect data from a seismic campaign (Fig. 2; Profile 1-5), which add rather "fuzzy" 2D information that has to be interpreted.

3D geological modeling
The 3D geological models were constructed using the geomodeling software SKUA/GoCAD ® 15.5 by Paradigm.They cover an area of about 0.44 km 2 and have a vertical extent of 665 m.A smaller area of interest (AOI, 300 m × 300 m, 250 m vertical extent) was defined within the model domain, including the drilled wells and the area, where heave at the ground surface was observed and the problem specific data were collected.
The strata of the models cover 10 distinct geological units including Quaternary sediments, Triassic and Jurassic bedrock and crystalline basement at the lower model boundary (Fig. 3).The Triassic strata is further divided (from top to bottom) into four formations of the Keuper (Steinmergelkeuper, Schilfsandstein, Gipskeuper and Lettenkeuper), two formations of the Muschelkalk (Upper Muschelkalk, Middle to Lower Muschelkalk) and the Bundsandstein formation.Figure 3 provides an overview over the modeled geological units and average thicknesses used in the initial models.Four initial models were consecutively build, according to the four previously described data categories.Model 1 was constructed based only on non-site specific data (maps, literature, etc.); Model 2 additionally considered site specific data (drill logs of the seven geothermal drillings); Model 3 also included direct problem specific data (exploration boreholes); and finally, Model 4 included indirect problem specific data (seismic campaign).Through this approach, data density and model complexity increase from Model 1 to 4; and the models required successively higher efforts in data acquisition.
For each initial model, representative boundary surfaces between geological units that match the input data were built, using an explicit modeling approach (Caumon et al., 2009).We used the Discrete Smooth Interpolation (DSI) provided by GoCAD ® as the interpolation method (Mallet, 1992), which resulted in Delaunay-triangulated surfaces for both horizons and faults.
Subsequently, based on the explicitly constructed surfaces, a volumetric 3D model was built by implicit geological modeling, implemented in the software SKUA ® .The implicit modeling approach uses a potential field interpolation considering the orientation of strata (Lajaunie et al., 1997;Calcagno et al., 2008), and is based on the U-V-t concept (Mallet, 2004), where horizons represent geochronological surfaces.

General approach
Our approach for assessing uncertainties of the 3D geological models consists of four distinct steps (Fig. 4): (I) Building the initial 3D geological models of increasing data density and complexity (see above).
(II) Definition of fault and horizon uncertainties.Horizon uncertainties were specified in SKUA ® by a maximum displacement parameter or by alternative surface interpretations, resulting in a symmetric envelope of possible surface locations around the initial surface.Constant displacement values were assigned in order to account for uncertainties in formation thickness and boundary location.Alternative surface interpretations are based on a maximum deviation in dip and azimuth (± 5 • ) from the initial surface.To constrain the shape of generated horizons, SKUA ® uses a variogram that spatially correlates perturbations applied to the initial surfaces (Paradigm, 2015).Fault uncertainties were defined by a maximum displacement parameter and a Gaussian probability distribution around the initial fault surface.
(III) Creation of 30 model realizations for each initial model based on the above defined surface variations, applying the Structure Uncertainty workflow of SKUA ® .
(IV) Extraction of the geological information from all model realizations for analysis, comparison and visualization.For this purpose, the AOI was divided into a regular 3D grid of 5 m cell size, resulting in 180000 grid cells.The membership of a grid cell to a geological unit was defined as a discrete property of each grid cell and extracted for all 30 model realizations.
Based on these data, we calculated the probability of each geological unit being present in a grid cell in order to derive the information entropy at the level of: 1) a single grid cell, 2) a subset representing the area of extent of a geological unit and 3) the overall AOI.Furthermore, the fuzzy set entropy was calculated to determine the ambiguousness of the targeted

Information entropy
The concept of information entropy (or Shannon entropy) was first introduced by Shannon (1948) and is well known in probability theory (Klir, 2005).It quantifies the amount of missing information and hence, the uncertainty at a discrete location x, based on a probability function P of a finite data set.When applied to geological modeling, information entropy expresses the "degree of membership" of a grid cell to a specific geological unit.In other words, information entropy quantitatively describes how unambiguously the available information predicts that unit U is present at location x.Information entropy was recently applied to 3D geological modeling by Wellmann et al. (2010) and Wellmann and Regenauer-Lieb (2012) in order to quantify and visualize uncertainties introduced by imprecision and inaccuracy of geological input data.A detailed description of the method can be found in the cited references, and is briefly summarized here.
By subdividing the model domain into a regular grid, a discrete property can be assigned to any cell at location x in the model domain.In a geological context, the membership of a grid cell to a geological unit U can be defined as such a property by an indicator function: Applied to all n realizations of the model space M, the indicator function yields a set of n indicator fields I with each of them defining the membership of a geological unit as a property of a grid cell.Considering the combined information of all indicator fields, it follows that membership is no longer unequivocally defined at a location x and hence has to be expressed by a probability function P U : The probability of occurrence P U for each geological unit of a model domain can be used to obtain the uncertainty (or amount of missing information) associated with a discrete point (grid cell) by calculating the information entropy H (Shannon, 1948): In a next step, total information entropy H T can be calculated as an average value of H over the entire model space: where H T = 0 denotes that the location of all geological units is precisely known (no uncertainty), and H T is maximum for equally distributed probabilities of the geological units (P 1 = P 2 = P 3 = ...), which means that a clear distinction between geological units within the model space is not possible.
Information entropy can also be applied to only a subset of the model space: Wellmann and Regenauer-Lieb (2012) also adapted fuzzy set theory (Zadeh, 1965) in order to assess how well-defined a single geological unit is within a model domain.A fuzzy set of n model realization introduces a certain degree of indefiniteness to a discrete property (e.g.membership of a geological unit), resulting in imprecise boundaries which can be referred to as fuzziness.The fuzziness of a fuzzy set (De Luca and Termini, 1972) in the context of a geological 3D model can be quantified by the fuzzy set entropy H u (Leung et al., 1992;Yager, 1995): where the probability function p u (x) with an interval [0,1] represents the degree of membership of a grid cell to a fuzzy set.H u equals 0 when p u is either 0 or 1 everywhere within the set; and H u equals 1 when all cells of the set have an equal probability of p u = 0.5.

Model dissimilarity
The step-wise addition of input data to the models (see section 3.1) not only affects uncertainties associated with a geological unit, but also the geometry of the units, and therefore their position, size and orientation in space.New data may significantly change the geometry of a geological unit but only marginally change the overall uncertainty.Thus, both model uncertainty and dissimilarity should be evaluated.In order to quantify the dissimilarity (D) between consecutive models, two measures, the Jaccard and the City-block distance (Fig. 5), are proposed to complement information entropy.
Given a geological model set M consisting of n model realizations, the membership of a grid cell at location x to a geological unit U as a subset (U ⊆ M ) can be defined by an indicator function I U , conditioned by the probability p u : The overlap or similarity in position of a geological unit between two models u i and u j can then be calculated with the Jaccard similarity measure (Webb and Copsey, 2003): where a defines the size of the union (overlap) between two subregions of identical property, and N ij = a+b+c their intersection, with: a = number of occurrences of q i = 1 and q j = 1 b = number of occurrences of q i = 1 and q j = 0 Solid Earth Discuss., doi:10.5194/se-2016-174,2017 Manuscript under review for journal Solid Earth Published: 10 January 2017 c Author(s) 2017.CC-BY 3.0 License.c = number of occurrences of q i = 0 and q j = 1 Accordingly, the dissimilarity between models can be expressed by the Jaccard distance: Even though the use of binary dissimilarities is straight forward and suitable to quantify absolute change between models, it does not account for fuzziness (c.f., section 3.3.2).Hence, the dissimilarity may be overestimated by the Jaccard distance.In order to include fuzziness, the normalized City-Block distance was employed, adopting the probability function P u to compare dissimilarity of a sub-region (geological unit) between two models (i,j) (Webb and Copsey, 2003;Paul and Maji, 2014): where N is the combined number of cells in the sub-regions u i and u j .The distance is greatest for d N CB = 1.

Initial 3D models
The four consecutively constructed initial models show a step-wise increase in structural complexity (Fig. 6).In Model 2, horizon positions of the Schilfsandsteinkeuper (km2), Gipskeuper (km1) and Lettenkeuper (ku) were locally constrained by site-specific information provided by drill logs of the geothermal wells, slightly impacting fault displacement and thickness of the formations.However, changes in model geometry were minor, as no further information on horizon orientations was available and no additional faults could be located.With addition of the direct problem specific data from the exploration wells to Model 3, a Horst-Graben structure was identified that entailed a considerable displacement at a reverse (> 120 m) and a normal fault (70 m) north-west of the wells.Furthermore, the drill logs included orientation measurements of the strata, resulting in a shift in position and inclination of layers, compared to the previous models.Thus, large parts of the model domain within the AOI changed from Model 2 to Model 3 and, as a consequence, dissimilarities between these models are particularly high (cf., section.4.4).Finally, Model 4, which included data from a seismic campaign, has the highest degree of structural complexity.However, seismic surveys are inherently equivocal and allow alternative interpretations, especially concerning the orientation and number of faults as well as their connection to fault networks (Røe et al., 2014;Cherpeau and Caumon, 2015;Julio et al., 2015).In our case, the indirect problem specific data from the seismic 2D survey located several additional faults within the AOI, and in some cases caused a shift in position of faults compared to Model 3. The AOI was strongly fragmented by the added faults, and the orientation of layers is no longer uniform but varies strongly between fault blocks.In summary, the step-wise integration of data according to the four data categories improved our general knowledge of subsurface structures at the study site (Fig. 2).In addition, the effect of data integration from different exploration stages on modeled subsurface geometry could be evaluated and visualized.

Multiple model realizations
The multiple (30) model realizations created by the Structural Uncertainty workflow of SKUA are illustrated in Fig. 7 using 2D cross-sections of Model 1 and 4 as examples.In Model 1, the non-site specific data set minimal constraints, resulting in faults and horizons of the realizations that are widely dispersed but parallel.In contrast, the faults and horizons of the Model 4 realizations are more narrowly dispersed where problem-specific data was available within the AOI.The workflow handles equal uncertainties consistently across models by producing a similar pattern of horizontal displacement in Model 1 and Model 4. This can be seen in particular for structures located close to the NW boundary, which were not further constrained by consecutively added geological data.However, it is also apparent from the mostly uniform orientation of the surfaces in the 30 realizations of each model that displacement measures implemented in the Structural Uncertainty workflow did not allow for large variations in dip and azimuth of horizons or faults.Therefore, uncertainty may be systematically underestimated especially at greater depths.

Distribution of information entropy
Information entropy, quantified at the level of individual grid cells, can be visualized in 3D to identify areas of uncertainty and evaluate changes in geometry resulting from data assimilation.Figure 8a shows the distribution of information entropy for Model 1 and 4. It can also be seen that the approach is suitable for locating areas with high degrees of uncertainty, indicated by dark red colors (hot-spots) in this figure.Furthermore, Fig. 8b highlights where additional constraints from the data helped to optimize the model by reducing uncertainties (∆H < O) and whether further constraints are needed in locations of specific interest.
The overall distribution of uncertainty was clearly affected by additional geological information from site and problem specific input data (Model 4).This effect is highlighted by the changes in entropy between the models (Fig. 8b).Additional constraints on horizon and fault boundaries caused a shift in position and orientation of geological units, followed by a large redistribution of uncertainties, indicated by the changes in entropy.It can be seen that new hot-spots of uncertainty were introduced in proximity to the faults identified by the exploration boreholes and the seismic data incorporated into Model 4 (c.f., Fig. 6).However, these new areas of uncertainty can be considered an optimization of the model, because large parts of the preceding Model 1 did not reflect the complex local geology.Model 1 (wrongly) predicted low uncertainties for areas where information on existing structures (i.e.faults) was missing.It is a limitation of the approach that only uncertainty related to existing model structures can be quantified and visualized.Even Model 4 may still underrepresent the true structural complexity at this site.In a risk-assessment and decision-making process, this can be problematic, because low uncertainty areas might be in fact no-information areas.In such a case, the respective model area would actually be highly uncertain.Nevertheless, the approach allows one to assess and visualize uncertainties related to structures that have been identified during site investigation.
To lessen the limitations posed by non-sampled locations, Yamamoto et al. ( 2014) proposed a post-processing method for uncertainty reduction, using multiple indicator functions and interpolation variance in addition to information entropy.However, uncertainty from lack of evidence for a geological structure (e.g.fault), known as imprecise knowledge (Mann, 1993), still depends on the density and completeness of available input data.

Total information entropy
The calculated total information entropy H T of the consecutive models steadily decreases with higher data specificity from Model 1-4 (Fig. 9).Mean values of H T ranged from 0.56 (Model 1) to 0.39 (Model 4), where H T = 0 would denote no structural uncertainty.The decrease from Model 1 to 4 is approximately linear, indicating that all four categories of geological data had a similar impact on overall model uncertainty, even though the added information resulted in quite different model geometries and, as discussed above, in some cases in a local increase in entropy (cf., Fig. 8b).A similar but more pronounced trend was observed for the total mean entropy H Sub of the subsets S km1 , S ku and S mo , which represent the domain of the three geological units that are of particular importance to the swelling problem.However, entropy, i.e. the amount of uncertainty, is considerably higher within the domain of these geological units than for the overall model space, especially for the subsets 5 S ku and S mo , identifying them as areas of a particularly high degree of uncertainty.Note that these units are the aquifers that 14 Solid Earth Discuss., doi:10.5194/se-2016-174,2017 Manuscript under review for journal Solid Earth Published: 10 January 2017 c Author(s) 2017.CC-BY 3.0 License.have been hydraulically connected to the swellable rocks via the geothermal drillings.Nevertheless, all entropy values are comparably moderate, considering that a maximum of (only) five different geological units was found in any one grid cell across all four models, yielding a possible maximum entropy of H T = 2.32 for an equal probability distribution (P 1 = P 2 = P 3 = P 4 = P 5 ).For comparison: if all ten geological units would be equally probable, the maximum entropy would be 3.32.
Furthermore, median values and interquartile range dropped from 0.51 (0-0.99) in Model 1 to 0 (0-0.84) in Model 4. This helps to illustrate that the amount of grid cells with H T = 0 (indicating no inherent uncertainty), increased notably by 34.8 % from 40.6 % (Model 1) to 54.8 % (Model 4); and that the remaining entropies in Model 4 are limited to a considerably smaller number of cells within the model domain.Overall, comparing the pre-to post-site-investigation situations (Model 1-4), site and problem specific investigations were all equally successful in adding information to the model and reducing uncertainties in the area of the targeted horizons.
While the benefits from the different data are equal, the costs in data acquisition (i.e.work, money and time required) may vary considerably, depending on the exploration method (e.g., drillings, seismic survey, etc.).An economic evaluation was not within the scope of this study.Nevertheless, the approach presented could improve cost and benefit analyses by quantifying the gain in information through different exploration stages.

Fuzzy set entropy
The fuzzy set entropy was calculated to indicate how well-defined a geological unit is within the model space.Applied to the swelling problem of our case study, a high degree of uncertainty remains with regard to the position of the relevant geological units (km1, ku, mo) after data assimilation.We obtained fuzzy set entropy values (H U ) ranging between 0.329-0.504(Fig. 10).
The fuzziness of these geological units only slightly changed from Model 1 to Model 4, indicating that higher data specificity did not translate into more clearly defined geological units within the model domain.This can be partially attributed to the In case of the Lettenkeuper formation (unit ku), boundaries are even slightly less well-defined in Model 4 compared to Model 1.This is likely related to the low thickness of the formation (5-10 m, Fig. 3) relative to the mesh size (5 m).A finer grid could reduce this effect; however computation time would increase significantly.Wellmann and Regenauer-Lieb (2012) propose using unit fuzziness to determine an optimal representative cell size and reduce the impact of spatial discretization on information entropy.As previously discussed in section 4.2, our workflow does not consider uncertainties through dip and strike variations, which underestimates the fuzziness of the targeted geological units at greater depths.Thus, overall fuzziness, particularly in Model 1, may be significantly higher than calculated.

Models dissimilarity
A gain in structural information through newly acquired data usually not only impacts model uncertainty but is also associated with a change in model geometry.The calculated distances between models can identify the data category with the strongest impact on model geometry and make it possible to determine whether model geometry and uncertainty are related.Figure 11 shows the calculated Jaccard and City-Block distances between the models with respect to the targeted geological units km1, ku and mo.
Calculated distances between models are rather high, with values of up to 0.78; indicating a pronounced shift in position of the geological units after data was added.The addition of both direct and indirect problem specific data to Model 3 had a strong impact on model geometry, which can be seen by comparing the calculated distances between Model 2, 3 and 4 for both, Jaccard and City-Block (Fig. 11).In contrast, site specific data had a much lower effect, with less than 20 % (0.2) change in unit position, except for ku of the Jaccard distance (see distance between Model 1 and 2).Overall, the City-Block distance, which considers the fuzziness of geological boundaries, shows a similar trend as the Jaccard distance; however changes are much less pronounced, especially for unit ku.According to the low City-Block distance, absolute changes in probability P U for each grid cell are small, whereas high Jaccard distances indicate a large number of grid cells being affected through newly added data.Thus, the Jaccard distance likely overestimated the actual dissimilarity between models.
Comparing unit ku of both distances; the disparity between values hints at a large number of low degree changes in membership of the grid cells (∆P << 1).These predominately low degree changes are likely related to the above mentioned high degree of unit boundary fuzziness; and the resulting, ill defined, geological unit ku being shifted within the model domain.However, a direct comparison of fuzzy set entropy to the corresponding City-Block distance yields no quantifiable relationship between model geometry and structural uncertainty.
Nonetheless, both distance measures allow quantification and assessment of different aspects of dissimilarities and therefore, changes in geometry across models.Yet, the City-Block distance is preferable when sets of multiple realizations are compared, because it factors in the probability of occurrence of a geological unit at a discrete location.In recent years, various distance measures have already been applied in a similar fashion to create dissimilarity distance matrices and compare model realizations in history matching and uncertainty analysis, particularly in reservoir modeling (Suzuki et al., 2008;Scheidt and Caers, 2009a, b;Park et al., 2013).(1) Total and fuzzy set entropy can be used to evaluate uncertainties in 3D geological modeling and, therefore, support model improvement during a consecutive data assimilation process.We suggest that the approach could be used to also perform a cost-benefit analysis of exploration campaigns.
(2) The study confirms that 3D visualization of information entropy can reveal hot-spots and changes in distribution of uncertainty through newly added data in real cases.The method provides insight into how additional data reduce uncertainties in some areas, and how newly identified geological structures may create hot-spots of uncertainty in others.
(3) Dissimilarities in model geometry across different sets of model realizations can effectively be quantified and evaluated by a single value using the City-Block distance.A combination of the concepts of information entropy and model dissimilarity improves uncertainty assessment in 3D geological modeling.
However, some limitations of the presented approach are noteworthy.Although it was designed to assess uncertainties in the position and thickness of horizons, uncertainties in orientation could only be included indirectly with adequate parameters for dip and azimuth.This may result in a systematic underestimation of uncertainties at greater depths of the model domain.Furthermore, our study site (Vorbergzone) is a highly fragmented geological entity, and uncertainties due to missing information about unidentified but existing geological structures may also be underestimated with our approach.
Future work should therefore aim to include "fault block uncertainties" more effectively into the workflow, for example by including multiple fault network interpretations (Cherpeau et al., 2010;Cherpeau and Caumon, 2015) or by considering fault zones that produce a given displacement by a variable number of faults.Finally, all data of the investigated site was collected prior to our analysis; therefore additional data was not explicitly collected in order to reduce detected uncertainties within the consecutive models.Applying this approach during an ongoing site investigation could improve the targeted exploration and allow a well-founded cost-benefit analysis through uncertainty hot-spot detection.

Data availability
The underlying research data was collected and provided by the state geological survey (LGRB).It is freely available in the form of two extensive reports (LGRB, 2010(LGRB, , 2012) )

Figure 1 .
Figure 1.Study site and location of the model area and area of interest (AOI).

Figure 2 .
Figure 2. Data categories and geological input data used to build four initial 3D geological models.The green square indicates the area of interest (AOI), where data was extracted for further analysis.For geological formation color code see Fig. 1.

Figure 3 .
Abbr.q Solid Earth Discuss., doi:10.5194/se-2016-174,2017 Manuscript under review for journal Solid Earth Published: 10 January 2017 c Author(s) 2017.CC-BY 3.0 License.geological units Gipskeuper (km1), Lettenkeuper (ku) and Upper Muschelkalk (mo) within the AOI.Calculations were conducted using the statistics package R (R Core Team, 2016).The underlying concepts and equations used to calculate probabilities and entropies are described in the following section.

Figure 4 .
Figure 4. Uncertainty assessment workflow with four distinct steps.This workflow is applied to four initial models that are based on the different data sets illustrated in Fig. 2.
) Solid Earth Discuss., doi:10.5194/se-2016-174,2017 Manuscript under review for journal Solid Earth Published: 10 January 2017 c Author(s) 2017.CC-BY 3.0 License.H Sub can be used to evaluate the contribution of a specific sub-domain to overall uncertainty.In case of a drilling campaign, for example, the sub-domain can comprise a targeted depth or a geological formation of specific interest.In this study, we used the probability function P U with H Sub conditioned by P U > 0 to define subsets within the model space.Thus, each subset represents the probability space of a geological formation of interest, namely the Lettenkeuper (S ku ), Gipskeuper (S km1 ) and Upper Muschelkalk (S mo ) formation.

Figure 5 .
Figure 5. Distance measures used to calculate dissimilarities between models (i,j).(a) Jaccard distance (dJAC) using a TRUE/FALSE binary function and (b) Normalized City-Block distance based on a probability function.

Figure 6 .
Figure 6.(a) Cross-section through the AOI of all four initial geological models with projected borehole tracks (black lines) and 3D representations of (b) Model 1 and (c) Model 4.

SolidFigure 7 .
Figure 7. Cross-section through Model 1 and Model 4. The multiple lines show 30 model realizations with shifted faults and horizons (for the location of the cross-sections see Fig. 6).The horizontal lines indicate the land surface (purple) and the base of the Quaternary (blue).

SolidFigure 8 .
Figure 8. 3D view of the AOI with a discretization of 5 m for (a) total information entropy H of Model 1 and Model 4 and (b) change in entropy ∆H between both models.

Figure 9 .
Figure 9.Total entropy H calculated for the different models (mean and median) and for subsets of the model space of each model (Skm1, Sku, Smo).

SolidFigure 10 .
Figure 10.Fuzzy set entropy of the targeted geological units km1, ku and mo of the different models.

SolidFigure 11 .
Figure 11.Dissimilarities between the different models expressed by (a) Jaccard distance, and (b) City-Block distance.

5
Summary and conclusionsPrior work has demonstrated the effectiveness of information entropy in assessing model uncertainties and providing valuable insight into the geological information used to constrain a 3D model.Wellmann and Regenauer-Lieb (2012), for example, evaluated how additional information reduces uncertainty and helps to constrain and optimize a geological model using the measure of information entropy.Their approach focused on a hypothetical scenario of newly added borehole data and cross-Solid Earth Discuss., doi:10.5194/se-2016-174,2017 Manuscript under review for journal Solid Earth Published: 10 January 2017 c Author(s) 2017.CC-BY 3.0 License.section information to a synthetic model.In the present study, information entropy and, in addition, model dissimilarity was used to assess the impact of newly acquired data on model uncertainties using actual site investigation data in the complex geological setting of a real case.We presented a new workflow and methods to describe the effect of data assimilation on model quality, overall structural understanding of the subsurface and model geometry.Our results provide a better understanding of how model quality can be assessed in terms of uncertainties in a data acquisition process of an exploration campaign, showing that information entropy and model dissimilarity are powerful tools to visualize and quantify uncertainties, even in complex geological settings.The main conclusions of this study are: summarizing the findings of the exploration campaigns conducted in the Solid Earth Discuss., doi:10.5194/se-2016-174,2017 Manuscript under review for journal Solid Earth Published: 10 January 2017 c Author(s) 2017.CC-BY 3.0 License.city of Staufen (Germany).Both reports can be downloaded from http://www.lgrb-bw.de/geothermie/staufen.Since the size of the simulation datasets is too large for an upload, the authors encourage interested readers to contact the co-authors.Solid Earth Discuss., doi:10.5194/se-2016-174,2017 Manuscript under review for journal Solid Earth Published: 10 January 2017 c Author(s) 2017.CC-BY 3.0 License.