An evaluation of different measures of dynamically recrystallized grain size for paleopiezometry or paleowattometry studies

. Paleopiezometry and paleowattometry studies are essential to validate models of lithospheric deformation and therefore increasingly common in structural geology. These studies require a single measure of dynamically recrystallized grain size in natural mylonites to estimate the magnitude of differential paleostress (or the rate of mechanical work). This contribution tests the various measures of grain size used in the literature and proposes the frequency peak of a grain size distribution as the most robust estimator for paleopiezometry or paleowattometry studies. The novelty of the approach resides in the use of the Gaussian kernel density estimator as an alternative to the classical histograms, which improves reproducibility. A free, open-source, easy-to-handle script named GrainSizeTools (https://sourceforge. net/projects/grainsizetools/) was developed with the aim of facilitating the adoption of this measure of grain size in paleopiezometry or paleowattometry studies. The major advantage of the script over other programs is that by using the Gaussian kernel density estimator and by avoiding manual steps in the estimation of the frequency peak, the reproducibility of results is improved.


Introduction
Dynamic recrystallization was originally defined by Poirier and Guillopé (1979) as "a deformation-induced reworking of grain sizes, shapes and/or orientations with little or no chemical change". Because this definition does not make a clear distinction of how to discriminate between dynamic recrystallization and metamorphic reaction in some cases (e.g. feldspar neocrystallization during deformation), it was later re-defined by Stunitz (1998) as "the reconstruction of crys-talline material without a change in chemical composition driven by strain energy in the form of dislocations". Two processes of dynamic recrystallization have been identified (see a review in Urai et al., 1986 and references therein): (1) grain boundary migration  and (2) progressive sub-grain rotation (Poirier and Nicolas, 1975). The activation of these recrystallization processes depends on several factors, such as temperature, pressure, strain rate and presence of fluids. The interaction between both recrystallization types produces three identifiable types -when host grain occurs -of dynamic recrystallization microstructures: (1) bulging recrystallization, (2) sub-grain rotation and (3) grain boundary migration (see Stipp et al., 2002 for details).
Paleopiezometers are structural features of deformed rocks that vary with the magnitude of the applied differential stress under which they formed and that therefore provide a means of determining the magnitude of the paleostress (Twiss and Moores, 2007). Specifically, these methods apply to rocks deformed by dislocation creep, a dominant deformation mechanism in the mid-lower crust. Three different microstructural elements can be potentially used to infer stress: (i) the dislocation density (Goetze and Kohlstedt, 1973;Mercier et al., 1977;Kohlstedt and Weathers, 1980), (ii) the mean sub-grain diameter (Bird et al., 1969;Twiss, 1977;Mercier et al., 1977;Karato et al., 1980) and (iii) the mean dynamically recrystallized grain size (Twiss, 1977;Mercier et al., 1977). The recrystallized grain size is the most reliable and easily measurable microstructural feature and by far the most frequently used feature to estimate paleostress in mylonites. The estimations of differential stress assume that rocks under study deform by dislocation creep at constant stress and that they reach a mechanical steady-state creep. During this stage, rocks evolve to a stable mean grain size, as it has been experimentally demonstrated in different materials and rocks (Means, 1983;Ree, 1991;Pieri et al., 2001;Barnhoorn et al., 2004;Stipp et al., 2006) as well as inferred in natural mylonites (e.g. Michibayashi, 1993;Herwegh et al., 2005).

Published by Copernicus
Currently, there is no universally accepted theory to explain the relationship between dynamically recrystallized grain size and deformation conditions (see De Bresser et al., 2001;Austin and Evans, 2007;Shimizu, 2008;Platt and Behr, 2011). Some experimental studies suggest an inversely related relation between the mean grain size (D) and the differential stress of the type D = Aσ −m , where A and m are material and mechanism-specific constants, with the mean grain size independent of temperature, water content, strain rate and total strain (Luton and Sellars, 1969;Nicolas and Poirier, 1976;Twiss, 1977;Gillopé and Poirier, 1979;Schmid et al., 1980;Rutter, 1995;Stipp and Tullis, 2003;Stipp et al., 2006). In contrast, other authors suggest that dynamically recrystallized mean grain size shows a temperature dependency (e.g. Mercier et al., 1977;Ross et al., 1980;Tungatt and Humphreys, 1984;De Bresser et al., 1998;Ter Heege, 2002;Ter Heege et al., 2005;Shimizu, 2008) or that it is determined by the rate of mechanical work (i.e. the product of stress and strain rate) (Ter Heege, 2002;Austin and Evans, 2007). The latter implies that the estimation of the mean grain size is not a paleopiezometer but a paleowattometer (Austin and Evans, 2007).
Despite the disagreement in establishing the controlling variables and the differences in understanding of the underlying physical processes, the estimation of a representative measure of grain size and the grain size distribution in dynamically recrystallized mylonites remains two of the most important microstructural features to determine. The mean grain size or other single measures of grain size are necessary for paleopiezometer or paleowattometer studies, while the grain size distribution provides additional rheological information to unravel the contribution of different deformation mechanisms during deformation (e.g. Ter Herwegh et al., 2005). Given that direct measurement of differential stresses within the lower crust or the lithospheric mantle remains elusive (see Kozlovsky, 1987, andEmmermann andLauterjung, 1997, for the current limits reached so far), paleopiezometry and paleowattometry studies of ancient mylonitic rocks, being meanwhile exhumed, are the key to constrain indirectly overall strength in the lower crust and the lithospheric mantle and, therefore, to provide mechanical constraints in the modelling of lithospheric deformation.
Briefly, the methods for measuring grain size can be separated into three groups (Berger et al., 2011): (i) 1-D data (i.e. line intercept methods, number of grains per unit area or grain boundary density), (ii) 2-D data (based on the estimation of individual grain features using image analysis tools) and (iii) 3-D methods (computed tomography, serial sectioning). In the early studies, dominated by 1-D methods, usually the mean grain size was reported. This established a tendency and nowadays the mean grain size is by far the most used parameter in paleopiezometry studies. Other parameters were also used, such as the median grain size (geometric mean) and lately, with the rise of 2-D methods, the peak of the frequency of the distribution (usually poorly referred to as the mode). A question remains about which single measure of grain size is the best estimator for paleopiezometry or paleowattometry studies. As an example, Ranalli (1984) proposed the use of the median over the mean grain size based on probabilistic considerations, but the median grain size is barely used in the literature (e.g. Post and Tullis, 1999;Stipp and Tullis, 2003;Ter Heege et al., 2005;Shimizu, 2008). Because the key in paleopiezometry is reproducibility across studies (see for example Stipp et al., 2010), it is necessary to address this question.
The main objective of this study is to decide which single measure of grain size, obtained by 2-D methods, is the best for paleopiezometer or paleowattometer studies. The starting point of the analysis are the different measures of grain size that can be obtained once the data have been acquired from thin sections. For data acquisition, segmentation of grains and the derivation of the actual grain size distribution from thin sections the reader is referred to the extensive literature in the topic (e.g. Heilbronner and Bruhn, 1998;Higgins, 2000Higgins, , 2006Berger et al., 2011;Heilbronner and Barret, 2014).
To test the different measures of grain size previously used in literature, we propose a new tool -a script -to process the data produced by any image analysis software (ImageJ or equivalent). There are free software or scripts to estimate the grain size population and other parameters based on 2-D approaches and stereological considerations; for example the StripStar script (Heilbronner and Bruhn, 1998) and the CSD-Corrections program (Higgins, 2000). However CSDCorrections is neither open source, which limits the implementation of new methods, nor cross-platform. The StripStar script is written in a programming language (Fortran) that needs to be compiled each time a modification is introduced, which eventually penalizes productivity. In addition, it requires the use of other applications to obtain the graphical output or to allow further data treatment. Finally, both applications are not specifically designed to generate a single measure of grain size from the grain population but to derive the actual grain size population from 2-D measures.
The script implemented meeting the following criteria: (i) it is written in a free and easy-to-access programming language that runs all different platforms (Windows, Mac OS X, Linux, Unix); (ii) its use does not require any knowledge of programming to use (i.e. user-friendly); (iii) it provides a free and open-source code organized in a modular way, making it easier to modify, reuse or extend the code; and (iv) it M. A. Lopez-Sanchez and S. Llana-Fúnez: An evaluation of different measures of grain size 477 produces the numerical results required and ready-to-publish figures.

Deriving grain size from thin sections: a brief review
The following provides a review of how to describe the grain size of individual grains in dynamically recrystallized mylonites, which are the effects of cutting grains in thin sections and the parameters of which can be used to measure the grain size population for paleopiezometric studies. The selected examples introduce some important concepts, such as the cut-section or the intersection probability effects. The reader is referred to the following textbooks for more details (Higgins, 2006;Heilbronner and Barret, 2014).

Why a 2-D approach?
Since we are dealing with real volumes, 3-D grain size estimation would be most desirable. However, 3-D methods are still far from being standard techniques mainly due to being very time consuming and involving relatively expensive data acquisition equipment (synchrotron-or X-ray-based tomography). In addition, 3-D methods are not always applicable, especially in dynamically recrystallized mylonites in which most of the grains in contact are of the same phase (although synchrotron can discriminate crystallographic orientations and therefore deal with this issue).
The most widely used analysis techniques in literature to measure the mean grain size in dynamically recrystallized rocks, ceramics or alloys are 1-D methods, especially the line intercept method (e.g. Abrams, 1971;Karato, 1980;Rutter, 1995;Post and Tullis, 1999). However, these methods only report one value (1-D), usually the mean grain size, without considering the distribution of apparent grain sizes. As shown below, this circumstance severely limits its utility, since the knowledge of the distribution of grain sizes (or apparent grain sizes) allows the user to anticipate some possible issues, such as multimodal population or outliers, that may result in misleading interpretations (e.g. Heilbronner and Bruhn, 1998;Higgins, 2000Higgins, , 2006Berger et al., 2011). Another severe limitation is that 1-D methods only apply in case of monomineralic aggregates or samples (at thin-section scale). Finally, 1-D grain size measures cannot be used to derive the actual distribution of grain sizes from thin sections.
Nowadays, with the increase in computer power, the more efficient way to estimate the grain size features in deformed rocks is the use of 2-D methods (see Heilbronner and Bruhn, 1998;Herwegh, 2000;Berger et al., 2011). These methods segment grain sectional areas, allowing the calculation of grain area, grain orientations, grain elongations and grain surfaces via image analysis techniques. To perform this task, there are several free image-processing programs available, such as ImageJ (Scheider et al., 2012). These programs generate the full distribution of grain sectional diameters -strictly speaking very close to full due to optical and image resolution limitations (Heilbronner and Bruhn, 1998;Heilbronner, 2000;Herwegh, 2000;Berger et al., 2011). This 2-D approach has well-known limitations in order to obtain 3-D grain size information from 2-D measures (see Higgins, 2000Higgins, , 2006Heilbronner and Barret, 2014). It requires the assumption that all crystals have the same simple shape. Furthermore, if considered objects have shapes more complex than a sphere, there is a non-unique solution to derive the actual distribution of grain sizes from 2-D measurements (e.g. Higgins, 2000).

Defining the size of individual grains
To obtain a grain size value from a thin section it is first necessary to choose a correct parameter to describe the 2-D size of the grains. There are several parameters, such as the mean calliper diameter, the major or minor axis of an ellipse fitted to an individual grain or the maximum length (see Exner, 1972;Higgins, 2006;Heilbronner and Barret, 2014). When particles are expected to be spherical or close to a spherical shape (near-equant grains), its 3-D size can be uniquely characterized by diameter (or the average diameter when they are not perfect spheres). In this case, a common way to proceed is to convert the cross-sectional area of each individual grain into an individual 1-D length via the equivalent circular diameter (d) (Heilbronner and Bruhn, 1998;Herwegh, 2000;Berger et al., 2011): This assumption is acceptable most of the time for some of the most common dynamically recrystallized non-tabular grains in crustal and mantle shear zones, such as olivine, quartz, feldspar or calcite, at least when the main type of dynamic recrystallization is not "fast" grain boundary migration. The limitation of the latter consideration is due to the potential formation of lobate-shaped grains. For this reason, the equivalent circular diameter seems to be a good descriptor of grain size in such cases and it is often used in studies dealing with dynamically recrystallized grain size (e.g. Heilbronner and Bruhn, 1998;Berger et al., 2011).

Measuring the grain size in monodisperse populations: the cut-section effect
The simplest end-member distribution where all grains have identical shape (in this case spheres) and size is called a monodisperse distribution. In this model, it is also assumed that the grains are randomly distributed within the rock volume (statistically homogeneous) and with no interaction between them (i.e. there is no "grain packing"). When a monodisperse aggregate of grains is cut randomly, as may be the case in a thin section, the intersection plane rarely cuts R is the radius, a the section or chord length of a random section and r the apothem, this is, the distance from the centre of the circle to the midpoint of the section length. The chord length corresponds with the apparent diameter and ranges between 0 and the actual diameter of the grain when the section cuts through the centre of the circle.

(b)
Example showing an apparent section that corresponds to half of the actual diameter. The sketch illustrated that when a sphere is cut through randomly, it is most likely (P = 0.87) to obtain diameters larger than half of the diameter than the opposite.
exactly through the centre of each grain and the cut-section effect occurs (Fig. 1). In this case, the diameters obtained from the sectional areas represent a population of apparent diameters that can theoretically vary between 0 and the actual diameter of the grains (Fig. 1). In perfect monodisperse populations, the maximum diameter obtained from the data set would be the closest to the actual 3-D diameter. In fact, when monodisperse populations of grains are expected, the largest grain size in thin section can be theoretically used as a single measure of grain size. The accuracy of this estimate depends on choosing an appropriate sample size for the accuracy we are looking for (see Appendix A for details). Unfortunately, dynamic recrystallization in rocks, ceramics and alloys produces a continuous range of dynamically recrystallized grain sizes instead of a unique grain size (see later). Hence, this parameter is not useful as a single descriptor of grain size and will not be considered anymore. An alternative approach is to look at the distribution of apparent grain diameters obtained from thin sections. There are several ways to represent this type of data (see Higgins, 2006); here two of the most typical will be considered: the numerical density distribution of apparent diameters obtained in a histogram plot, known as the numberweighted plot (Fig. 2a), and the area percentages of equivalent diameters in a bar plot (i.e. the sum of the areas of the grains respect to the total for each grain size interval defined) (Fig. 2b), known as the area-weighted plot (Herwegh, 2000;Berger et al., 2011). A number of parameters can quantify the grain size from the apparent grain size populations: the mean grain size and the median grain size, both common using 1-D methods, but also the area-weighted grain size or the frequency peak of the distribution. In contrast to 1-D methods, the representation of data from the 2-D approach allows to envisage additional features of the complete population of apparent grains. As it will be shown later, this is a clear advantage allowing, for example, to visualize whether there is more than one population of grains or other possible sources of bias.
Following the example of a monodisperse population of grains, Fig. 2 shows that when a group of spheres is cut through randomly and assuming that there is no packing, the probability of cutting sections at different size intervals is not equal. This situation does not produce a uniform distribution of grain sizes but a unimodal one (Fig. 2). Figure 2a also reveals that the probability of cutting sections with lengths close to the actual diameter of the grain is always higher. These theoretical distributions show an extreme case of negative skewness (also known as J-shape distribution) due to the ceiling effect caused by the measured grain sections not being able to exceed a value: the actual diameter. As an example, the probability of obtaining apparent diameters larger than the half of the actual diameter for a random cut is P = 0.87 (Fig. 1b).
Another feature of a population of apparent grain sizes from a monodisperse distribution is that the actual diameter of the grain population is always within or close the most frequent class (or bin) of the histogram, that is, the modal interval ( Fig. 2). Even when smaller bin sizes are selected, there is a significant difference between the modal interval and the next closest, the effect being more pronounced in the case of the area-weighted plot (Fig. 2b). As previously mentioned, the mean, the median, the area-weighted and the frequency peak can all be calculated from this distribution of apparent grain sizes. Their values are not equivalent to each other (see Fig. 2), but when the data set is representative there are quantifiable relations between them. For example, it can be proved that in the case of monodisperse distribution of spheres the actual diameter is 1.28 times the mean of the entire apparent diameter population (or the mean 0.79 times the actual diameter) or the area-weighted mean 0.88 times the actual diameter. This is also the reason why the mean or the median grain size obtained with 1-D methods was sometimes multiplied by a factor to estimate the actual (3-D) size (i.e. to convert the mean grain size into the actual size) (e.g. Exner, 1972;Panozzo, 1982). This strategy is erroneous not only because, as previously pointed out, grain size in dynamically recrystallized mylonites does not follow a monodisperse distribution but also due to other issues that will be seen later (see also Heilbronner and Bruhn, 1998).
The frequency peak of a grain size distribution by the modal interval is one of the statistical parameters also used to characterize the dynamically recrystallized grain size (e.g. Berger et al., 2011). However, the estimation of the modal interval compared to the mean or the median has the drawbacks inherent to the use of histograms: (1) it is necessary to define the same left edge of the bin and the same bin size/width (or number of classes) to yield reproducible results in similar populations, and (2) they are not smooth (i.e. exact values are not known as the data are grouped into classes). To use the frequency peak over the mean, the median or the areaweighted mean grain size in a distribution of grain sizes, it Plots showing the effect of choosing different bin sizes for the same data set. The monodisperse data set is formed by 5997 measurements. The actual grain size, taken as a reference point, was set at 100 and the absolute maximum uncertainty in measure within the data was set at ± 4 (4 % of the actual size). For clarity, only the bins surrounding the actual grain size are shown. The left edge of the histogram was set at 0 and four different bin sizes were used. Note that the middle value of the modal interval is different in all cases although the population is the same. would be useful to overcome these constraints. In practice the actual left edge of the bin is set by the optical and image resolution limitations of the technique applied (Higgins, 2006), which means that varies across different studies. However, the actual theoretical left edge of the bin is 0, so that by setting this lower boundary reproducibility is improved.
The discrete nature of histograms implies that the frequency peak relates to a modal interval instead of to a single value. To convert this interval into a single measure of grain size we have to choose which value within the interval is best. However, in real cases there is no best value within the modal interval and therefore any value can be used (Fig. 3). Some programs (e.g. CSDCorrections) or studies (Berger et al., 2011) take the middle (or central) value of the modal interval.
The other critical factor to consider in histograms is the bin size/width. This parameter is directly related with the precision of the frequency peak in a distribution of grain (or apparent grain) sizes (Fig. 3). If the bin size is large, a gain or loss of accuracy occurs at the expense of precision. However, if a small bin size is selected, then misleading results are more likely (e.g. the bin size must not exceed the measurement uncertainty in data).
In summary, for a particular population of apparent sizes of grains, when the left edge of the bin and the bin size are the same, the limits of the modal interval will also be the same (Fig. 3). If the frequency peak is used as estimator of grain size, these values may be fixed for all apparent grain size populations to improve reproducibility. However, establishing a fixed bin size can produce misleading results due to over-or under-smoothing the appearance of the population. It is necessary to find an adequate balance between reproducible results and a bin size that allows to envisage important features of the population. For this, the implementation of an automatic process -an algorithm -that sets an optimal bin size based on the features of the population under study is ideal.

The Gaussian kernel density estimator (KDE) as an alternative to the histogram
To avoid the implicit discontinuous nature of histograms in identifying the frequency peak in a distribution of grains sizes, an alternative approach is proposed: the Gaussian kernel density estimator. The Gaussian KDE is, like the his- togram, a non-parametric density estimator, but it is smooth and independent of end points (Fig. 4). The interpretation of the data distribution still depends on the bandwidth chosen (the equivalent to the histogram's bin size/width), which strongly influences the shape of the Gaussian KDE and, therefore, the location of the peak value. It is therefore necessary to implement a reliable method to perform this task. In any event, there are plenty of methods to find the optimal kernel bandwidth in literature depending on the expected features of the data set (see Scott, 1992;Turlach, 1993;Bashtannyk and Hyndman, 2001). The use of the Gaussian KDE also has the advantage that it does not provide an interval, as in the case of histograms, but a unique value (the peak value) that represents the most likely value of dynamically recrystallized grain size in a population of grains. This approach prevents from having to choose a value within the modal interval to estimate the frequency peak, ultimately improving reproducibility.

Polydisperse populations: the intersection probability effect
Ideal monodisperse grain size populations, such as the one described in Sect. 2.3, do not exist in natural or experimentally deformed samples. In fact, previous grain size studies of dynamically recrystallized mylonites never show numberweighted plots of apparent grain sizes with J-shaped distributions but just the opposite, i.e. long tailing distributions skewed to the right (see examples in Heilbronner and Bruhn, 1998;Heilbronner and Tullis, 2006;Berger et al., 2011;Heilbronner and Barret, 2014). This indicates that naturally deformed mylonites are systems that show a continuous range of grain sizes instead of a unique value of grain size. Populations of grains that show similar shapes but different values of grain size are referred to as polydisperse systems (e.g. Higgins, 2000). These distributions can be unimodal or multimodal, depending on the number of local frequency peaks within the distribution. When we try to derive 3-D parameters from 2-D measurements in polydisperse systems, another effect to be considered beside the cut-section effect is called the intersection-probability effect (e.g. Higgins, 2000). This effect refers to larger grains being more likely to be represented on a plane section since they are more likely to be hit by the section plane. In other words, grains are hit by the section plane with a probability that is proportional to their diameters.
The case of the multimodal distributions of grain size in mylonites is common in the literature. For example, bimodal distributions are typical of two-phase mylonites, in which the two phases reach under similar deformation conditions a different dynamically recrystallized steady-state mean grain size. Other examples of bimodal distributions at thin-section scale have been reported in monomineralic samples, such as in experimentally deformed Carrara marble samples at large strains and high temperature (Barnhoorn et al., 2004)  . The Gaussian kernel density estimator is a function that stacks a Gaussian "bell curve" on top of each measurement and whose standard deviation, determined by the local probability density, defines the bandwidth (the equivalent to the histogram's bin size/width).

or in
quartzitic mylonites with different slip systems operating simultaneously, in which quartz grains with c-axis close to yaxis (prims) are larger than others with a different crystallographic orientation (Knipe and Law, 1987;Mancktelow, 1897;Heilbronner and Tullis, 2006). The best approach in these cases is to deal with the different populations of grains separately (see examples in Herwegh, 2000;Heilbronner and Bruhn, 1998). However, as discussed below, for some purposes it is useful to represent the area-weighted plot of the two or more mineral phases within the same plot. So far it is not clear what type of continuous polydisperse distribution best describes the steady-state dynamically recrystallized grain size population. Log-normal distributions appears to be one of the most suitable candidates (e.g. Ranalli, 1984;Michibayashi, 1993;Newman, 1994). However, the lack of sufficient studies in this regard prevents us from simulating log-normal grain size population for now, given that the typical parameters that describe natural or experimentally deformed dynamically recrystallized mylonites are unknown and the issue on how grain packing affects the distribution of apparent grain sizes remains unresolved.
For simplicity, a discrete population of grains with two sizes will be considered and then the cut-section and intersection probability effects will be applied to generate a bimodal population of apparent grains (see Appendix B for details). This bimodal discrete model, although unrealistic for dynamically recrystallized mylonites, is useful to show some of the consequences of the application of both effects in the acquisition of size parameters. Another advantage of this model, despite its simplicity, is that the interpretations apply to real cases. Figure 5 shows that the relative abundance of two sizes of grains in a rock is not equal to the relative frequency with which these grains will be observed on a section plane. Thus, the relative frequency of the modal interval, which represents 60 % of the population in number, is not 1.5 times the rel- Figure 5. Grain size distribution plots of a bimodal discrete population consisting of a 60 / 40 % mixture of spherical particles with sizes 100 and 120 respectively. The contribution in volume is 46.5 / 53.5 % respectively. The two local maxima in both plots indicate the existence of two populations of spheres with sizes specified by the local maxima. Both plots show different modal intervals due to the different approaches (see Sect. 2.2). The area-weighted plot locates the modal interval in the population that represents the major/main phase (i.e. the volumetric contribution).
ative frequency of the local modal interval that represents the 40 % fraction in number. This is Delesse's principle (Delesse, 1847(Delesse, , 1848, which shows that V v = A A , where V v is the average volume fraction in the solid rock that is occupied by a mineral (or grain size population) of interest, and A A is the average area fraction on a plane section that is occupied by the same mineral (or grain population). Another consequence is that the modal intervals in number-and areaweighted plots are not the same in this example (Fig. 5), since the area-weighted plot accounts for the area of the grains. In conclusion, the area-weighted plot informs about which of the two populations of grain size is the predominant phase in volume (i.e. the volumetric contribution), which matters for rheology considerations (Heilbronner and Bruhn, 1998;Herwegh et al., 2005Herwegh et al., , 2014. Figure 5 also shows that in bimodal (or multimodal) distributions the mean, the median or the area-weighted mean grain size becomes a meaningless parameter to describe the grain size. In contrast, the local modal intervals (i.e. the local maxima) reflect the frequency peak of the different grain size populations, provided that the resolution of the method allows it.

The script
The script is written in Python, a general-purpose high-level interpreted programming language characterized by a clear syntax and ease of learning. The main advantages of Python programming language are that (i) Python is free and opensource; (ii) the underlying computer language runs on all platforms (Windows, Mac OS X, Linux or Unix); (iii) there is a large number of open and freely available scientific libraries providing an environment for algorithm development, data analysis, data visualization and numeric computation; and (iv) the use of Python is becoming increasingly popular in academia.
The script, named GrainSizeTools, can be downloaded from the permanent site https://sourceforge.net/projects/ grainsizetools/ and requires the three following scientific Python libraries: Numpy and Scipy (Oliphant, 2007) for data analysis and MatplotLib (Hunter, 2007) for plotting. The script produces several types of output, allowing to save the graphical output as bitmap (eight file types to choose) or vector images (five file types to choose). Although the script is designed to produce figures ready for publication, they can be easily customized within the MatplotLib environment (i.e. when the figure is shown by the script and prior to be saved as a file) or by post-editing the vector image in vectorgraphic applications such as Adobe Illustrator, ACDSee Canvas or Inkscape. Another important point is that to use the script there is no need for prior knowledge of the Python language. The steps to estimate the recrystallized grain size are straightforward and a quick tutorial can be found online (http://sourceforge.net/p/grainsizetools/wiki/Home/).

Brief description of the script
The script is organized in a modular way using Python functions. This facilitates the modification, reuse or extension the code and allows specifications of each function. In the specifications, the user will find the assumptions made, the conditions that must be met for the inputs and the result/s obtained for each one.
The script can be divided into three main parts or functions with intuitive and self-explanatory names.
The first part is a function called "importdata" responsible for loading the data set into the memory for subsequent treatment. The data have to be previously stored in a text file such as txt (a datum on each line) or csv (comma-separated values).

M. A. Lopez-Sanchez and S. Llana-Fúnez: An evaluation of different measures of grain size
The second part is a function called "calc_diameters" that returns an array of the diameters calculated from the sectional areas, assuming that the grains are near-equant objects. If applicable, it also allows us to correct the grain sizes calculated by adding the perimeter of the grains not previously included in the image analysis.
The third part is a function called "find_grain_size" that returns the number and area-weighted plots. It can also produce the mean, the median and the area-weighted mean, the location of the Gaussian kernel density estimator peak, the modal intervals and their middle values produced by both approaches. Additional relevant information such as the bin size and the Gaussian KDE bandwidth estimated are also provided.
The procedure implemented within the script to estimate the frequency peak of the population is briefly explained below.
As portrayed above and shown in Fig. 3, choosing a different bin size (or number of classes) could lead to a different interpretation of the distribution of data and, therefore, different results. To ensure reproducibility, the idea behind the script is that for similar populations similar bin sizes need to be used. As far as possible manual data manipulation steps is kept to a minimum and an algorithm estimates on its own the optimal bin size. Although there is no best number of bins, there are some guidelines or rules of thumb to determine the optimal number of bins. GrainSizeTools implements two of the most commonly used rules of thumb: the Scott rule (Scott, 1979), based on the sample standard deviation, and the Freedman-Diaconis rule (Freedman and Diaconis, 1981), based on the sample interquartile range. The rule used by default within the script is the Freedman-Diaconis rule since it is less prone to outliers in comparison to Scott rule. The script also allows to set a user-defined bin size if desired. The same rules apply for the area-weighted approach, based on Herwegh (2000), which uses the sum of the areas respect to the total area (area percentages) of cross-sectional shapes for each grain size interval defined. Such procedure allows the comparison in number and area-weighted plots between distributions in order to obtain complementary information about the data set. The script returns the modal intervals and the middle values of the modal intervals in both cases. It also returns the area-weighted mean of the data set.
The script implements the Gaussian kernel density estimator within the number-weighted plot. It finds and returns the peak value of the KDE function; this is, the most likely value of dynamically recrystallized grain size in a number-weighted frequency plot. To estimate the optimal bandwidth of the Gaussian KDE we use the Silverman rule of thumb method (Silverman, 1986), since such a method works well for univariate systems with unimodal densities (Turlach, 1993;Bashtannyk and Hyndman, 2001). The implementation also allows to use the Silverman rule multiplied by a constant to modify the bandwidth for comparative purposes.

Discussion: evaluation of different measures of grain size
To test which measure of grain size gives the best estimate of the differential stress (or rate of mechanical work), the first step is to estimate how the introduction of different errors affect the different measures of grain size implemented within the script. For this purpose, the strategy is to simulate populations of apparent grain sizes with the introduction of different sources of errors and then see how the results depart from the expected values. For simplicity and to take advantage of the model presented in Fig. 2, the populations of grains considered will be discrete monodisperse 3-D populations of grain size. As previously explained, although this model is unrealistic to simulate grain size populations of real dynamically recrystallized mylonites, its simplicity is convenient to show the effects of different sources of error and the interpretations made can be transferred directly to real cases. A calibration of the different measures of grain size against a real sample will be shown later. The details of the simulation of apparent grain size populations from the discrete monodisperse 3-D populations with different sources of errors can be found in Appendix B.

Sources of error
The introduction of non-recrystallized grains within the population of apparent grains is likely to bias the distribution of grain sizes. These grains have actual sizes larger than the recrystallized fraction but, due to the cut-section effect, may reach sizes similar to the apparent grain size population. Furthermore, assuming that the recrystallization is not complete, these larger grains are expected to be more likely in thin sections due to the intersection probability effect. Figure 6a and b illustrate a situation in which 20 % of the grains are nonrecrystallized grains (i.e. larger in 3-D than the recrystallized grains) which were randomly cut and introduced in the population of apparent grain sizes. The mean, the median and the area-weighted mean are affected by the introduction of these outliers, shifting these parameters to higher values. In contrast, the frequency peak remains fixed provided that the bin size and/or the bandwidth chosen are the same (in case of histograms also the left edge of the bin).
Another non-negligible artefact arises when the smallest grain sizes are not measured due to optical and image resolution limitations of the technique applied (Higgins, 2006;Berger et al., 2011). In real distributions the smallest values reflect the resolution limitations of the applied techniques to acquire the data instead of tending to 0 as they would in theory. This effect tends to systematically slightly shift the mean, the median and the area-weighted mean grain size to higher values (Fig. 6c). The amount of shifting is variable across studies since it depends on the technique (optical microscopy or SEM) and the image resolution chosen. How-

c, d) Number-and area-weighted plots
showing the distribution of apparent diameters of a population of spheres of size 100 with uncertainty in the measure. The maximum uncertainty during the outline of the grains was set in ±10 (10 % the actual size). This uncertainty limit also sets a theoretical optical/resolution limitation of the measure (i.e. simulates that the technique does not allow measuring grains below the set limit). The mean, the median, the area-weighted mean and the modal interval grain sizes are provided. ever, the frequency peak is not affected by this source of error. Figure 6c and d show another non-negligible source of error in real samples: the effect of uncertainty in grain measurement (see Appendix B for details). In this case, the apparent population of grain sizes shows sizes above the actual diameter of recrystallized grains, modifying the ideal J-shape distribution. The introduction of uncertainty in grain measurement does not affect the estimation of the mean, the median and the area-weighted grain sizes. In fact, the results obtained in Fig. 6c and d for these grain size parameters are similar to those expected in a population without error measurement (cf. Figs. 2a and 6c), just slightly shifted to higher values because of the resolution limitation imposed in the acquisition and translated to the model. The uncertainty in measure is assumed to be random, which means that the errors will be compensated if the sample size chosen is large enough. As in previous examples, the frequency peak remains unaffected by this source of error.
It is important to note that the examples portrayed here are just end-member situations. In the study of real samples a combination of different sources of bias in the determina-tion of dynamically recrystallized grain size is expected. To conclude, the variations in the estimate of the mean, the median and the area-weighted mean shown in Fig. 6 are proof that these parameters are not reliable when the presence of outliers is significant. In contrast, the frequency peak (either modal interval or Gaussian KDE) is less prone to errors introduced by outliers and by the limitations of resolution in the technique applied to acquire the data compared to the other parameters. However, the methods to estimate the frequency peak suffer from its own limitations, especially the method based on the modal interval (see Sect. 2).

How many grains are needed to achieve reproducibility?
Before testing the script in natural samples and against other software available, it is necessary to consider the number of grains needed to achieve reproducibility. Previous studies stated that more than 500 grains are necessary for dynamically recrystallized grain size analysis (Heilbronner and Bruhn, 1998;Heilbronner, 2000). Since these studies did not explicitly address how they estimated the minimum number of grains needed for dynamically recrystallized grain size analysis, we assume these numbers rely solely on practical experience and are not derived by some theoretical underlying principle.
With the aim of solving this issue a Monte Carlo simulation was implemented to determine the minimum necessary sample size from a theoretical point of view. As stated above, at present we do not have sufficient knowledge about the type of distribution and the typical values that characterize the recrystallized grain size population in mylonites. Therefore, it is not possible to simulate in a reliable way synthetic populations of dynamically recrystallized grain sizes. To overcome this, a large data set of natural dynamically recrystallized deformed aggregate can be used. The strategyknown as bootstrapping -is to perform a random resampling of the data set chosen. For this approach, we use a population of dynamically recrystallized quartz grains measured in a naturally deformed granite with a population of 2945 grains (sample MAL-05, see below for details; Sect. 6). The sample is expected to contain a number of grains well above the minimum required. The next step is to test how the number of grains measured (or sample size) affects the estimation of the mean grain size by comparing all the results obtained for a particular sample size estimating the coefficient of variation.
The given condition of the Monte Carlo simulation is based on the typical uncertainty of grain measurements. Berger et al. (2011) found that when repeating the measurement procedure in the same targeted grain, the error margins in the diameters calculated ranged between 1 and 4 %. Taking the 4 % error, the goal is to find the minimum number of grains needed so that if the measurement is repeated a large number of times, nearly 95 % (2-sigma) or 99 % (3-sigma) of the time the mean grain size obtained shows a variation equal or less than 4 %. The condition is satisfied when the number of analyzed grains are 433 for sigma-2 (∼ 95 % of the time) and 965 for sigma-3 (∼ 99 % of the time) (Fig. 7).
It needs to be taken into account that if anyone performs the same simulation only one time they would probably obtain slightly different a number of grains. This variability is Table 1. Steps for obtaining the cross-sectional areas of the grains using the ImageJ application.
1. Several light microscopy digital images were acquired from four different areas of the thin section using cross-polarized light with or without the gypsum plate inserted. In two of the four areas chosen, four different images were taken at the same location, a pair of cross-polarized and cross-polarized with the gypsum plate inserted and a similar set of images (i.e. in the same area) but rotated 45 • . Then the images were superimposed to facilitate the identification of grain boundaries. The images in the four different areas were taken with different resolutions for comparison, varying from 0.72 to 0.35 micron pixel −1 .
2. Processing of digital images to enhance grain boundaries. The data treatment consisted of setting a correct contrast/brightness ratio and applying a high-pass filter to enhance boundaries if necessary.
3. Quartz grain segmentation was performed by manual outlining using a vector-graphics application. Each grain was converted into a closed polygon filled with black colour and making the boundary lines white. In this step, grains not considered dynamically recrystallized or cut by the image margins were discarded.
4. Generation of grey-scale raster image with the same resolution of the original image. It is ensured that the grain boundaries have a width of 2-3 pixels. This will be used later to correct the diameters obtained from the areas by adding the grain boundaries taking into account the resolution of the digital images.
5. Measurement of cross-sectional areas using an image analysis program, in this case ImageJ. Basically, the steps were as follows: (i) convert the image into an 8 bit binary image; (ii) set the scale of the image; (ii) set the type of measure/s to be done, which includes the area and other parameters of interest; and (iv) measure the area of each grains individually using "analyze particles". 6. The cross-sectional areas obtained were saved as a text file to analyze subsequently the data using the Grain-SizeTools script. explained by the random noise that shows the coefficient of variation line (see the inset in Fig. 7b), which is nearly parallel to the x-axis when approaching the value of 4 %. The stochastic nature of the experiment implies that a particular outcome cannot be predicted but rather the statistics of possible outcomes. A total of 250 runs were performed to estimate the mean, standard deviation and the 2-sigma errors based on the standard deviation. Error intervals of 414 to 433 grains (using the 2-sigma coefficient of variation; Fig. 7c) and 925 to 965 grains (using the 3-sigma coefficient of variation; Fig. 7d) were obtained. The higher values within these error intervals were taken as the minimum number of grains necessary to perform a reproducible and robust grain size analysis.

Testing different measures of grain size for reproducibility
To test whether the different measures of grain size implemented within the script yield reproducible results, a data set from a natural mylonitic granite sample (named MAL-05) was used. The mylonite comes from a crustal-scale extensional shear zone, the Vivero fault (Lopez-Sanchez, 2013), deforming a coarse-grained two-mica granite with quartz (∼ 35 %), feldspar (microcline and plagioclase; ∼ 60 %) and muscovite plus biotite (≤ 5 %) as main constituents. Mylonitic samples show quartz aggregates with complete or quasi-complete dynamic recrystallization dominated by subgrain rotation (Fig. 8) (Lopez-Sanchez, 2013). In contrast, feldspar shows cataclasis with syn-tectonic crystallization of very fine albite-oligoclase, K-feldspar and biotite grains along fractures as well as at the feldspar rims (Lopez-Sanchez, 2013). The thin sections were cut parallel to the mineral and stretching lineation and perpendicular to the mylonitic foliation (XZ sections). The procedure to acquire and measure the areas of the dynamically recrystallized grains is summarized in Table 1. The strategy to check reproducibility was as follows. Dynamically recrystallized grain size areas were measured and their apparent diameters derived from four different locations belonging to two thin sections of the same sample (sample MAL-05). There are no significant differences in the grain size population between the different selected locations within the thin section. The number of grains measured in each image meet the requirements portrayed in Sect. 5. The reproducibility was tested by comparing the results obtained in each of the images measured (or a combination thereof). The results of the comparison between parameters in terms of stability and robustness to estimate the dynamically recrystallized grain size are summarized in Table 2 and Figs. 9 and 10.
The worst results were obtained using the middle values of the modal intervals, with variations from 3.3 to 5.0 % in average and coefficients of variation from 0.039 to 0.058 (Table 2). In some cases errors obtained were up to 10.6 %. In contrast to this, the Gaussian KDE peak, the mean, the median and the area-weighted yield better reproducible results, with variations from 0.3 to 1.3 % in average ( Table 2). The Gaussian KDE peak, the median and the area-weighted  (Scott, 1979) to estimate the bin size of the histogram. c Values obtained from the modal interval using Freedman-Diaconis' Rule (Freedman and Diaconis, 1981) to estimate the bin size of the histogram. The maximum errors obtained were always below 4 %: 3.8 in the area-weighted mean, 2.6 in the median, 1.9 in the Gaussian KDE peak and 0.8 in the mean ( Table 2).
The results show that the Gaussian KDE peak, the mean, the median and the area-weighted grain size have comparable quality as grain size estimators. Therefore, any of them can be theoretically used for paleopiezometry or paleowattometry studies. This finding contrasts with the results obtained in the simulations performed in the Sect. 4.1, which indicated that, because of the introduction of different sources of error, the Gaussian KDE would theoretically produce the best re-  sults. This result might be explained by the fact that when the sample size chosen is large enough, the effects by outliers are significantly reduced.

Testing the script against other software available
The results obtained with the GrainSizeTools using the whole data set of sample MAL-05 and the Gaussian KDE peak estimator are compared to those obtained with StripStar (Heilbronner and Bruhn, 1998) and CSDCorrections (Higgins, 2000) (Table 3; Fig. 11). In the latter, which allows us to set different shape parameters for the grains, the grains were assumed to be perfect spheres and the type of measurement to be the ellipse major axis. Since both programs were designed to derive the actual grain size distribution from the sectional 2-D data instead of obtaining a single value, the middle value of the modal interval of the actual (3-D) dynamically recrystallized grain size distribution was selected for comparison. Both programs do not provide an automated process to estimate the number of classes (or the bin size) and, therefore, the results can vary depending on the number of classes chosen. To overcome this, all the results obtained in a defined range of number of classes were used, so that the mean and other statistical parameters of interest can be calculated (Table 3). As can be seen in Fig. 11 and Table 3, the outcomes obtained by the Gaussian KDE peak are similar -within the credible intervals -to those produced by estimating the frequency peak using the StripStar and CSDCorrections. In the case of the area-weighted mean grain size, the frequency peak yielded a slightly different result to that obtained by StripStar using the volume-weighted approach (Table 3), although this was also previously demonstrated in Heilbronner and Barret (2014). A remarkable inference can be drawn from this comparative. Although the relative probability of the frequency peak between the populations of apparent (2-D) and actual (3-D) grain size distributions is different, its location coincides within the error. In other words, the location of the Gaussian KDE peak in the apparent grain size populations indicates the actual location of the frequency peak of the 3-D grain size population. In consequence, the frequency peak as a grain size estimator produces similar results whether obtained from apparent (2-D) or actual (3-D) grain size distributions. In contrast, the other grain size parameters produce different results, depending on whether grain size populations derive from 2-D or 3-D observations.

Concluding remarks and future development
Several measures of grain size have been tested in order to discriminate the best estimate for paleopiezometry or paleowattometry studies, including for the first time the peak of the Gaussian KDE. The simulations performed on a natural sample indicated that the grain size parameters based on the modal intervals of the histogram yield the worst estimations, while the mean, the peak of Gaussian KDE, the median and the area-weighted mean grain size are equally well suited for describing grain size. However, the peak of the Gaussian KDE has several advantages over the others parameters.
1. It is potentially less prone to be shifted by the presence of outliers or due to resolution limitations in the acquisition of the data. 3. In case of multimodal grain size distributions in which the different populations of grain size cannot be separated, only the frequency peak is useful.
A free, open-source, easy-to-handle script, named GrainSize-Tools, is introduced with the aim of facilitating the adoption of the peak of the Gaussian KDE for paleopiezometry or paleowattometry studies. The script is written in Python, a cross-platform interpreted programming language. To use the script no previous knowledge of the Python language is necessary. The main advantage of the script compared to other software or scripts available is that it uses the Gaussian KDE and avoids manual steps to find the frequency peak of the grain size distribution, improving reproducibility as a consequence.
The results produced by the Gaussian KDE peak are different to those obtained in the past, aiming to establish a correlation between dynamically recrystallized grain size and differential stress in experiments that mainly used logarithmic and square root mean grain size using 1-D methods. Therefore, the use of this measure of grain size requires the recalibration of the paleopiezometer (or paleowattometer). As shown in Berger et al. (2011), an option in these cases is to use an empirical conversion matrix to relate the different parameters of grain size. However, as shown in Fig. 6, this is not an optimal approach since different measures of grain size are affected in a different way by different source of errors. The consequence is that no fully reliable correlation can be established among them since the introduction of errors change across studies. In conclusion, the best practice would be to create a database using the frequency peak and following a strict protocol in different deformed minerals and deformation conditions from laboratory published experiments. In the case of a perfect monodisperse population, the crosssection size probability can be directly estimated using the following equation (Sahagian and Proussevitch, 1998): where R is the radius of the sphere, r 1 and r 2 are the lower and upper limits of the apparent section interval defined, and P is the probability to cut sections within the interval defined. If we take a sphere of unit radius the equation simplifies to The aim is to find the number of grains needed to know with a certainty of 99 % that at least one of the grains measured in the data set have a size similar or within an error less than 4 % compared to the actual size. Assuming a sphere of unit radius and according to the equation shown above, the opposite can be calculated; this is, the probability that all sections obtained have a diameter shorter than 0.96 (an apparent diameter shorter or equal than 4 % with respect to the actual size) is If the sample size is increased (i.e. the number of random cuts), the probability that at least one random section shows lengths larger than 0.96 is where n is the number of random cuts (i.e. the sample size) and P total the probability that at least one apparent section is larger than 0.96 for any given n. The next step is to find when at least one of the lengths of the sections randomly cut is larger than 0.96 with a certainty of 99%. When the sample size is 15 the P total is 99.28 %. This means that if just 15 grains are measured, the chance of finding a section whose diameter has an error less than 4 % of the actual size of the grains in a monodisperse non-packed system is greater than 99 %.

Appendix B: A stochastic model to simulate grain size distributions (Monte Carlo approach)
To simulate the cut-section effect, a user-defined number of random cuts are generated (depending on the sample size desired) on a circle of unit radius. The Python built-in function called random.random generates a random irrational (floating point) number between 0.0 and 1.0. This represents any possible cut from the centre of the circle to its edge (Fig. B1). Then, following the Pythagorean theorem the length of the R a r R 0 1 a/2 r i a = 2 R 2 -r 2 a = 2 1 -r i 2 Figure B1. Model to generate apparent random sections through a sphere to calculate the apparent diameter. Following the Pythagorean theorem, it is necessary to know the radius (R) and the apothem (r) to calculate the chord or section length through a sphere. Taking a sphere of radius 1, the first random apothems (r i ) between 0 and 1 are generated and then the apparent diameters (from 2 to 0) are calculated and saved.
section (also known as chord length) is calculated using the following relationship: The apparent diameters obtained can be corrected -if necessary -according to a selected diameter. The Python function implemented to simulate this effect is called generateRan-domSections (see Supplement, SourceCode_AnexoB.py file). Because monodisperse systems are only affected by the cutsection effect, this Python function can be used to simulate perfect monodisperse systems. This process is repeated a number of times, as many as required to be reproducible within the level of confidence desired (see Appendix C). The data generated are stored and then plotted on a histogram (see Figs. 2 and 6 within the manuscript for examples). According to the law of large numbers (or Bernoulli's law), when sample size is large enough the Monte Carlo Simulation will produce the same results for a given accuracy obtained by the Eq. (A1) by Sahagian and Proussevitch (1998) shown in the Appendix A.
In the case of simulating a monodisperse data set with outliers (e.g. Fig. 6a, b), the procedure was as follows: first, the grain size, the sample size and the ratio between the correct measures and the outliers are established. Once the number of outliers that need to be added to the data set are calculated, the code generates a defined number of grains with a random size ranging from 1.01 to 1.5 times the recrystallized grain size defined (e.g. if the actual grain size is set to 100 microns, the outlier grains will range between 101 to 150 microns). The maximum limit has been set arbitrarily at 1.5 times. Then, for each random grain created, a random section is generated and added to the data set (see function generate-Sample_withOutliers within the SourceCode_AnexoB.py file in the Supplement).
To generate a data set simulating uncertainty during measurement -as in the Fig. 6c and d within the text-there are two approaches. One assumes that the uncertainty of the data is independent of the size considered and the other assumes that the measurement error is size dependent. Studies showed that the last approach is the correct one (Gualda, 2006;Berger at al., 2011). Berger et al. (2011) carried out a study involving repeated measurements on several grains and found that the error in grain size varies between 1 and 4 % for targeted grains but up to 15 % for smaller grains. This tendency is in part due to optimizing the resolution for the targeted grain sizes, resulting in considerable errors for the smallest grain sizes. Taking this behaviour into account, the procedure simulates such uncertainty as follows. First, it establishes the grain size, the sample size and the uncertainty expected for the sectional areas. Then it creates a number of random sections defined by the user. To generate the uncertainty within the data set, (i) the maximum absolute error is calculated taking into account the uncertainty desired, (ii) all the values below this maximum error are removed to prevent negative values within the data set during the addition of random errors (there is also a practical reason following this strategy, since the uncertainty obtained during measurements limits the actual optical and resolution limitations of the technique applied) and, finally, (iii) a random error between 0 and the maximum absolute error estimated with a random sign (i.e. positive or negative) is generated and added for each value. It is assumed that the expected error has a normal (Gaussian) distribution. Hence, the result is closer to the actual value than the extreme values defined (see function generateSample_withUncertainty within the SourceCode_AnexoB.py file in the Supplement).
To simulate the intersection probability effect in case of polydisperse populations, a large population of grains (say a million) is generated with a defined distribution and then the individual volumes of each grain and the total volume are calculated. At that point, assuming that the probability of intersecting a particular grain is directly proportional to its volume with respect to total volume, the individual volumes of the grains are normalized with respect to the total volume. Finally, an array of data with the cumulative sum of the elements is created. As an example, imagine a list of three grains with diameters 2, 3 and 5 (i.e. [2,3,5]). The volume of grains ([0.52, 4.19, 14.14]) is then normalized it to the total volume (18.85), obtaining the following array [0.028, 0.222, 0.750]. Finally, a cumulative sum of elements is calculated [0.028, 0.25, 1.0]. Once this array is created, a desired number of random values between 0.0 and 1.0 is generated. Each value obtained is used to find which grain is going to be sectioned according to its probability (i.e. volume) using the list of cumulative sum of elements created. Returning to the given example, when a random value of 0.21 is obtained, the position in the cumulative list that meets the following criteria value > 0.21 is 0.25, the second position in the list. Therefore the grain to be picked and sectioned from the original list of grain sizes is the grain located in the second position, which has a diameter of 3. This procedure was repeated 5000 times to create a large population of grains affected by the intersec-tion probability and, later, by the cut-section effect. Due to the notable differences between the original population (10 6 ) and the new one (5000), it is highly unlikely (< 0.05) that the same grain is chosen twice in the simulation process.
The simulation of a bimodal discrete distribution (Fig. 6), uses a Python function named generate_bimodal_sample. This function first generates two populations of grains with different sizes and in a proportion defined by the user. The function takes the probability section and the cut-section effects into account to generate a population of apparent diameters of the grains. The generate_bimodal_sample requires that the user introduces the following parameters: (i) the grain size of the population A, (ii) the grain size of the population B, (iii) the sample size and (iv) the ratio between the two populations (e.g. when ratio = 0.8 it means that population A represents 80 % -in number -of the total population). The function returns a text file with the random apparent diameters (2-D) generated.
The code implemented to perform this task is shown in the Supplement file SourceCode_AnexoB.py.
When the condition is not satisfied, the sample size is increased (with a step size defined by the user) and the process initiates again until the condition is satisfied. 4. When the condition is satisfied, the sample size for the defined condition is reported and a number of plots showing the evolution of the mean, the standard deviation and the modified coefficient of variation throughout the process of increasing the sample size are generated.
5. Finally, due to the stochastic nature of the process, it was repeated 250 times to obtain the statistics of possible outcomes. On this basis, we estimated the 2-sigma standard deviation to set the minimum number of grains needed.
The code implemented to perform this task is shown in the attached file: SourceCode_AnexoC The Supplement related to this article is available online at doi:10.5194/se-6-475-2015-supplement.
Author contributions. Marco A. Lopez-Sanchez designed the simulations, developed the Python code, interpreted the data and drafted the manuscript. Sergio Llana-Fúnez contributed with new ideas and tested the scripts independently during the preparation of the manuscript.