Remote sensing data processing by multivariate regression analysis method for iron mineral resource potential mapping: A case study in the Sarvian area, central Iran.

This paper uses multivariate regression to build a mathematical model, with reasonable accuracy, for iron skarn exploration in the region of interest, and to generalize multivariate regression as a method in the field of mineral prospectivity mapping (MPM). The main aim is to apply multivariate regression analysis (as an MPM method) to the mapped iron outcrops in the northeastern part of the study area in order to discover new iron deposits in other parts. Two types of multivariate regression models, expressed as two linear equations, were employed. Pixel values of the 14 ASTER satellite image bands were set as unique independent variables (UIVs), and the iron outcrop map (digitized from a 1 : 5000 geology map of the study area and field checked) was used as the dependent variable. According to the p value, $R^2$ and $R^2_{adj}$, the second regression model (which includes multiples and exponents of the UIVs) was the better-fitted model. The accuracy of the model was confirmed by the iron outcrop map and geological observations. Based on field observation, iron mineralization occurs at the contact between limestone and intrusive rocks (skarn type). Iron minerals consist dominantly of magnetite, hematite and goethite.


Introduction
The remote-sensing layer is one of the significant data layers applicable at different levels of mineral exploration, especially at the reconnaissance level. This data layer is processed with the most common techniques for the identification of minerals. Mineral exploration is a complex process (Gupta, 2003). Its complexity can be reduced by using remote-sensing techniques in the early stages of exploration to reconnoitre target areas before further exploratory operations are carried out. One of the best-known uses of remote sensing is mineral exploration and the identification of geological structures, faults and lineaments, geological units, alterations, and indicator and tracer minerals (Melesse et al., 2007; Carranza, 2008; Abedi et al., 2013; Golshadi et al., 2016; Feizi and Mansouri, 2012). These factors play important roles in recognizing mineralization in the region of interest, so identifying them saves time and cost and gives a more precise result (Xiong and Zuo, 2017).
There are various remote-sensing techniques for recognizing minerals. In all of them, satellite images are processed with specific mathematical algorithms in order to generate useful information. This information can be integrated with other information and layers for the evaluation and interpretation of exploratory results (Li et al., 2015; Abedi et al., 2012; Bonham-Carter and Agterberg, 1990; Carranza, 2009; Carranza and Sadeghi, 2010; Ford and Blenkinsop, 2008; Lindsay et al., 2014; Lisitsin et al., 2013; Pan and Harris, 2000; Porwal et al., 2010; Feizi and Mansouri, 2013a).
One of the important factors in remote-sensing processes is the use of an appropriate algorithm and a proper method. New image processing methods and algorithms continue to be developed. Among these methods, the regression analysis approach is significant because of its strong mathematical basis and its compatibility with geological data.
E. Mansouri et al.: Remote-sensing data processing with the multivariate regression analysis method
Multiple regression analysis has been used to identify stream sediment anomalies (e.g. Carranza, 2010a, b). Likewise, Granian et al. (2015) used multivariate regression effectively to map subsurface mineralization from lithogeochemical information. They used four types of multivariate regression models to delineate significant surface geochemical anomalies indicating subsurface gold mineralization, with borehole data as dependent variables and surface lithogeochemical data as independent variables.
Based on previous work such as Allbed et al. (2014), modelling and mapping of mineral potentials based on satellite image data and processing it based on remote-sensing and regression analysis is a promising approach as it facilitates timely detection with a low-cost procedure and allows decision makers to decide what necessary action should be taken as the first step in the mineral prospectivity mapping (MPM) field.
There are multiple types of regression analysis. Among these, multivariate regression analysis is selected and used in this paper. In multivariate regression analysis, the relationships between independent variables and dependent variables are estimated in order to analyse the effects of the independent variables on the dependent variables. This method can be used in remote sensing by modelling the mineralization outcrop points, which then serve directly for further exploration and for finding new prospective zones. One of its advantages is that minerals can be identified directly and quickly without the need for other exploration layers.
The aim of this paper is to process satellite images with the mathematical method of regression analysis and to apply it to remote sensing and geological units. A further purpose is to recognize new mineralization in the region of interest by modelling mines and known deposits. This aim is reached by identifying geological dependent variables and finding relationships among them for the exploration of new deposits with an acceptable accuracy in the study area.
The Sarvian iron ore deposit, with a reserve of 8 million tons, is a calcic iron skarn deposit. Because intrusive rocks and carbonate rocks occur in many parts of the study area, new iron skarn mineralization may be present. In this paper we used the regression method to identify new iron mineralization in other parts of the study area.
The main condition for using regression analysis is the existence of a dependent variable. In this study, Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) satellite image pixels located in the northeastern part of the study area were considered to be dependent variables. Also, ASTER satellite image pixels of other parts of the study area were considered to be independent variables. Two types of multivariate regression models were used to find new mineral deposits: the 14 bands of ASTER satellite images were set as unique independent variables (UIVs), while iron outcrop area data (digitized from a 1 : 5000 geology map of the study area and field checked) were set as dependent variables.

Study area
The Sarvian area is in the Orūmīyeh-Dokhtar magmatic arc in central Iran (Fig. 1a). This magmatic arc is the most important metallogenic area inside the district; it hosts large metal deposits such as lead, zinc, copper and iron. Crystallized limestone and dolomite are the oldest geological units in the study area, dating to the Permian and Triassic. Sedimentation of limestone and marl in the Qom formation occurred concurrently with continental sedimentation in the Oligocene. Most tectonic activity in the study area took the form of vertical movements, which caused instability in the basin and changed the depth of the sea. Vertical movements at the beginning of the Miocene caused notable volcanic activity in the study area. Important magmatism occurred in the late Miocene, which caused skarn mineralization where carbonate units of the Qom formation were in contact with the intrusive rocks. The main fault of the study area is the Bidehend fault, a strike-slip fault with a length of 43 km that lies 10 km away from the study area. The effect of this major fault on the study area is limited to the creation of parallel faults and fractures with the same direction as the Bidehend fault. There is no relationship between the skarn mineralization and faults in the Sarvian area because no mineralization has been reported in faults and fractures (Feizi et al., 2016, 2017).
The study area is dominated by Eocene intrusive rocks and carbonates of the Qom formation. Several types of metal and non-metal mineral ore deposits have, up to now, been reported in the study area. According to the 1 : 100 000 geological map of Kahak, the lithology of this area includes cream limestone with intercalations of marls (Qom formation), dark green andesitic-basaltic lava, volcanic breccia, hyaloclastic limestone, green megaporphyritic andesitic-basaltic lava, rhyodacitic domes, tonalite-quartz-diorite, microquartz-diorite-microquartz-monzodiorite, granite-granodiorite, alternations between light green and grey tuff, tuffaceous sandstone and shale with the intercalation of nummulitic sandy limestone and andesitic lava, and Orbitolina-bearing, thick-bedded to massive grey limestone (Aptian-Albian) (Feizi et al., 2016) (Fig. 1b).
These relationships are demonstrated by the calcic iron skarn ore (Sarvian mine) in the northeast of the study area (Feizi et al., 2017) (Fig. 2). Skarn-type Fe mineralization and alteration are localized along the contact zone between intrusive rocks and carbonate sequences (Zuo et al., 2014).

Multivariate regression
Regression analysis is an appropriate statistical tool for uncovering relationships between independent and dependent variables, as applied in geoscience by Granian et al. (2015). If the dependent variable is called $Y$ and the independent variables are called $x_i$, the formula is

$$Y = f(x_1, x_2, \ldots, x_t). \tag{1}$$

Based on regression analysis theory, $Y$ can be a linear or nonlinear function. If $Y$ is linear, the formula is

$$Y = a_0 + \sum_{i=1}^{t} a_i x_i + \varepsilon. \tag{2}$$

In this formula, $a_0$ is the constant factor, $\varepsilon$ is the random error, and $a_i$ are the regression coefficients. If there are $n$ samples in a data set and $t$ variables were measured for each sample, formula (2) can be written as

$$Y_j = a_0 + \sum_{i=1}^{t} a_i x_{ij} + \varepsilon_j, \quad j = 1, \ldots, n. \tag{3}$$

Formula (3) can be presented as a linear function matrix:

$$[Y] = [X][A] + [E]. \tag{4}$$

For calculating the coefficient matrix $[A]$, the least squares method is utilized:

$$[A] = \left([X]^{T}[X]\right)^{-1}[X]^{T}[Y]. \tag{5}$$

The inverse of the variance-covariance sample matrix is $[\Sigma]^{-1}$, and the covariance matrix among independent variables and samples is $[C]$. So the regression coefficient model is computed by formula (6):

$$[A] = [\Sigma]^{-1}[C]. \tag{6}$$

www.solid-earth.net/9/373/2018/ Solid Earth, 9, 373-384, 2018
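The least-squares estimate of the coefficient matrix described above can be illustrated with a minimal NumPy sketch. The function name `fit_linear_regression` and the synthetic data are assumptions made for illustration only; they are not taken from the paper, and `numpy.linalg.lstsq` is used here in place of an explicit matrix inverse because it solves the same least-squares problem more stably.

```python
import numpy as np

def fit_linear_regression(X, y):
    """Least-squares estimate of [a_0, a_1, ..., a_t] for y = a_0 + X @ a + error."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the constant a_0.
    design = np.column_stack([np.ones(len(X)), X])
    # lstsq minimizes the sum of squared residuals without forming an inverse.
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs

# Synthetic check (hypothetical data): y = 1 + 2*x1 - 3*x2 with no noise.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 2))
y_demo = 1.0 + 2.0 * X_demo[:, 0] - 3.0 * X_demo[:, 1]
coeffs_demo = fit_linear_regression(X_demo, y_demo)
```

With noise-free data the recovered coefficients should match the generating ones up to floating-point error; with real pixel data the residuals would of course be non-zero.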
Based on Granian et al. (2015), the following criteria were utilized for the examination of the regression analysis:
1. The mean of the random error should be zero and its variance should be constant.
2. The coefficient of determination ($R^2$) should be tested (Granian et al., 2015). $R^2$ is presented as

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2}{\sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)^2}. \tag{7}$$

In Eq. (7), $\bar{Y}$ is the mean of the dependent variable, $Y_i$ is the value of the ith sample and $\hat{Y}_i$ is the estimated value of the ith sample. The estimated $R^2$ value lies within the range [0, 1]. The value of $R^2$ should be 1 or close to 1 for well-fitted models.
3. Given that adding independent variables to the model will increase the $R^2$ value, the adjusted determination coefficient ($R^2_{adj}$) is presented as (Granian et al., 2015)

$$R^2_{adj} = 1 - \frac{\left(1 - R^2\right)(n - 1)}{n - t - 1}. \tag{8}$$

In Eq. (8), $n$ is the number of data and $t$ is the number of regression coefficients. If a set of explanatory variables is introduced into a regression one at a time, with $R^2_{adj}$ computed each time, the level at which $R^2_{adj}$ reaches a maximum, and decreases afterwards, would be a well-fitted model.
4. In regression analyses, the p value of the final coefficients for each specific model can be examined after choosing the best model. Accordingly, the p value of the regression model in the analysis of variance (ANOVA) test should be acceptable (less than or equal to 0.05).
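As a small worked illustration of the second and third criteria, the sketch below computes the coefficient of determination and its adjusted form in plain Python. It assumes the standard definitions (one minus the ratio of residual to total sum of squares, and the usual degrees-of-freedom correction with n samples and t regression coefficients excluding the constant); the function name and the toy numbers are hypothetical. The ANOVA p value is not computed here, since that would require an F-distribution routine.

```python
def goodness_of_fit(y, y_hat, t):
    """Return (R^2, adjusted R^2) for observed y, fitted y_hat and t predictors."""
    n = len(y)
    y_mean = sum(y) / n
    # Residual sum of squares: observed minus fitted values.
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    # Total sum of squares: observed values minus their mean.
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    # Assumed standard correction: penalizes each additional predictor.
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - t - 1)
    return r2, r2_adj

# Toy example (hypothetical values, one predictor).
y_obs = [1.0, 2.0, 3.0, 4.0, 5.0]
y_fit = [1.1, 1.9, 3.2, 3.8, 5.0]
r2_demo, r2_adj_demo = goodness_of_fit(y_obs, y_fit, t=1)
```

Here the residual sum of squares is 0.10 and the total sum of squares is 10, so the coefficient of determination is 0.99 and the adjusted value is slightly lower, which is the behaviour the third criterion relies on.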

Data collection
There are several iron ore bodies and one iron mine in the northeastern Sarvian study area. The regional geological conditions of the area suggest that the Sarvian iron mine is a good model for exploring the surrounding area. In this paper, a geology map of the mine is used as a training area for satellite imagery. In the training area, this method can model the iron outcrops (a dependent variable) based on ASTER satellite image bands (independent variables) (Fig. 3).

Remote-sensing data (independent variables)
The ASTER sensor was launched in December 1999 on board the Earth Observing System (EOS) Terra satellite. ASTER provides high-resolution images of the land surface, water, ice, and clouds using three separate sensor subsystems covering 14 multi-spectral bands from visible to thermal infrared (Table 1). Resolutions are 15, 30 and 90 m in the visible and near infrared (VNIR), shortwave infrared (SWIR) and thermal infrared (TIR), respectively. For more information see Feizi and Mansouri (2013b) and Mansouri and Feizi (2016).
In this study, after corrections, the SWIR and TIR bands were resampled to a pixel size of 15 m based on the VNIR3 band (panchromatic band). The layer stacking function was then used to build a new multiband file from georeferenced images of various pixel sizes, extents and projections (Mansouri et al., 2015). The date of the images is 11 June 2002.
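The resampling and layer-stacking step can be sketched as follows. This is only an illustration under simplifying assumptions: the arrays are hypothetical, already co-registered tiles covering the same footprint, and nearest-neighbour resampling is assumed (the paper does not state which resampling method was used). In practice this step is carried out in remote-sensing software on georeferenced images.

```python
import numpy as np

def upsample_nearest(band, factor):
    """Repeat every pixel factor x factor times (nearest-neighbour resampling)."""
    return np.repeat(np.repeat(band, factor, axis=0), factor, axis=1)

# Hypothetical tiles covering the same 180 m x 180 m footprint.
vnir = np.zeros((3, 12, 12))      # 3 VNIR bands at 15 m
swir = np.ones((6, 6, 6))         # 6 SWIR bands at 30 m
tir = np.full((5, 2, 2), 2.0)     # 5 TIR bands at 90 m

swir_15 = np.stack([upsample_nearest(b, 2) for b in swir])  # 30 m -> 15 m
tir_15 = np.stack([upsample_nearest(b, 6) for b in tir])    # 90 m -> 15 m

# Layer stack: one array holding all 14 bands on the common 15 m grid.
stack = np.concatenate([vnir, swir_15, tir_15], axis=0)
```

After this step every pixel of the 15 m grid carries 14 values, one per band, which is the form needed for the independent variables.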

Mapping of iron outcrops (dependent variable)
There are several iron veins and outcrops around the iron ore skarn mine in the northeastern part of the Sarvian area. Iron outcrops in the training area were mapped using a geological map on a scale of 1 : 1000 of the iron ore deposit. The map was then field checked. The shape file layer of iron outcrops was converted to a raster file with a pixel size of 15 m.
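Once the outcrop raster and the band stack share the same 15 m grid, they can be flattened into a per-pixel table for regression. The sketch below is an assumption about how such a table might be assembled: the 0/1 coding of the outcrop raster, the function name and the demo arrays are all hypothetical and are not described in the paper.

```python
import numpy as np

def build_training_table(stack, outcrop_mask):
    """Flatten a (bands, rows, cols) stack and a (rows, cols) mask into X, y."""
    n_bands = stack.shape[0]
    # One row per pixel, one column per band (independent variables).
    X = stack.reshape(n_bands, -1).T
    # Assumed coding: 1 = mapped iron outcrop pixel, 0 = any other pixel.
    y = outcrop_mask.reshape(-1).astype(float)
    return X, y

# Hypothetical 4 x 5 training area with a single outcrop pixel.
stack_demo = np.arange(14 * 4 * 5, dtype=float).reshape(14, 4, 5)
mask_demo = np.zeros((4, 5), dtype=bool)
mask_demo[1, 2] = True
X_train, y_train = build_training_table(stack_demo, mask_demo)
```

Row-major flattening keeps pixel order identical in `X_train` and `y_train`, so row k of the table always refers to the same ground location in both.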

Results
Multiple, factorial, polynomial and response surface regressions have been utilized in many fields including the geosciences (e.g. Granian et al., 2015). In this study, Model 1 ($Y_1$) was generated as a multiple linear regression model, and Model 2 ($Y_2$) was created from $Y_1$ plus additional terms formed from the UIVs (their multiples and exponents). The formulas for the two models are presented in Table 2. Thus, two linear equations ($Y_1$ and $Y_2$) were used to discover new mineral deposits, using pixel values from ASTER satellite data as independent variables and a map of iron outcrops as dependent variables. Of the two models proposed in this paper, Model 1 has 15 coefficients (14 for UIVs, 1 as constant, 0 for multiples and exponents of UIVs) and Model 2 has 106 coefficients (14 for UIVs, 1 as constant, 91 for multiples of UIVs) (Table 2).
Regression analyses were performed to assess the models in Table 2, and the critical criteria mentioned above were examined. The $R^2$, $R^2_{adj}$ and p values from the ANOVA test of the two multivariate regression models are provided in Table 3.
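The coefficient counts can be checked with a short sketch. The number 91 equals the number of distinct pairs that can be formed from 14 bands, so one reading is that the extra terms of the second model are the pairwise products of the UIVs. That reading is an assumption: the exact terms are defined in Table 2, and the function name and random data below are hypothetical.

```python
from itertools import combinations

import numpy as np

def expand_with_products(X):
    """Append all pairwise products x_i * x_j (i < j) to the original columns."""
    X = np.asarray(X, dtype=float)
    pairs = list(combinations(range(X.shape[1]), 2))
    products = np.column_stack([X[:, i] * X[:, j] for i, j in pairs])
    return np.column_stack([X, products])

# Hypothetical pixel table: 10 pixels x 14 band values.
X_bands = np.random.default_rng(1).random((10, 14))
X_model2 = expand_with_products(X_bands)

# 14 band terms + 91 product terms + 1 constant.
n_coefficients = X_model2.shape[1] + 1
```

Because the expanded model is still linear in its coefficients, it can be fitted with the same least-squares procedure as the first model.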
Table 4 presents the calculated coefficients of independent variables in regression models.Excluded independent variables are not mentioned in Table 4. Excluded variables were those that had no effect on iron mineralization and the mapped distribution of iron outcrops.
We used several criteria to review the differences between the two models. Firstly, the variance and the mean of the random error were acceptable for both models. Secondly, based on Table 3, the p values of the ANOVA test of the two models were equal to 0. For regression models, the acceptable p value should be less than or equal to 0.05. Thus, this criterion confirmed the validity of both models without specifying the most appropriate one.
The value of $R^2$ is close to 1 for well-fitted models. The $R^2$ values of the regression models are presented in Table 3. Model $Y_1$ has a lower $R^2$ than $Y_2$. Thus, on this criterion, the $Y_2$ model is better than the $Y_1$ model.
Because adding independent variables to the model will increase the $R^2$ value, the $R^2_{adj}$ value should also be checked. The $R^2_{adj}$ values of the regression models are presented in Table 3. As mentioned above, if a set of variables is introduced into a regression, with $R^2_{adj}$ computed each time, the level at which $R^2_{adj}$ reaches a maximum, and decreases afterwards, would result in a well-fitted model. According to Table 3, $Y_2$ is the better-fitted of the two models and is therefore the most appropriate model for mineral prospectivity mapping.
Thus, according to the results of the p value (ANOVA test), $R^2$ and $R^2_{adj}$, the second regression model ($Y_2$) was selected as the fitted model. To generate a mineral prospectivity map, model $Y_2$ was implemented in ArcGIS using the raster calculator tool. The normalized mineral prospectivity map of the study area is presented in Fig. 4.
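Applying a fitted linear model to every pixel and normalizing the result is what a raster calculator expression does; the NumPy sketch below mimics that step outside a GIS. It shows only the simpler linear form (a constant plus one coefficient per band), and it assumes min-max scaling to [0, 1], since the paper does not state which normalization was used. The function name, coefficients and raster are hypothetical.

```python
import numpy as np

def prospectivity_map(stack, coeffs):
    """Evaluate a_0 + sum_i a_i * band_i per pixel, then rescale to [0, 1]."""
    a0 = coeffs[0]
    a = np.asarray(coeffs[1:], dtype=float)
    # Contract the band axis: result has shape (rows, cols).
    score = a0 + np.tensordot(a, stack, axes=1)
    lo, hi = score.min(), score.max()
    if hi == lo:
        # Flat map: avoid division by zero.
        return np.zeros_like(score)
    # Assumed min-max normalization.
    return (score - lo) / (hi - lo)

# Hypothetical 14-band raster and coefficient vector (constant first).
stack_demo2 = np.random.default_rng(2).random((14, 6, 6))
coeffs_demo2 = np.linspace(-1.0, 1.0, 15)
pmap = prospectivity_map(stack_demo2, coeffs_demo2)
```

For a model with product terms, the same function could be reused after expanding the stack with the corresponding product layers.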

Discussion
A large part of the study area is formed of carbonate units of the Qom formation and intrusive rocks such as diorite, granodiorite and gabbro. These rock units increase the probability of skarn mineralization in the study area. The Sarvian iron ore, whose outcropping pixels are used in this paper as dependent variables, is also of skarn type.
According to field observations and the geological map of the area, there is contact between the intrusive units (diorite, granodiorite) and host rocks (limestone and siltstone of the Qom formation). In the contact area of intrusive units and host rocks, the skarn geological unit was seen as a narrow strip. The major economic mineralization of skarn iron ores in this region is magnetite.
To assess the accuracy of the selected model, the created prospectivity map was checked against the iron outcrops map in the northeastern part of the study area (Fig. 5). The locations of iron outcrops are in close agreement with predictions from the mineral prospectivity map. In addition, three target areas with very high potential were checked for iron outcrops, and the prospectivity map was confirmed by geological observations (Fig. 6). Based on field observation, iron mineralization occurs at contacts between limestone and intrusive rocks (skarn type). Iron mineralization consists dominantly of magnetite (Fig. 6). Therefore, the accuracy of the mineral prospectivity map is confirmed in the Sarvian area.
It is obvious that satellite images consist of various bands, and each pixel in different bands has a specific pixel value.Thus, some quantitative information is obtained which should be processed for reaching the goal of interest.In remote sensing, selecting the appropriate method and algorithm is significant for obtaining the best results.
Remote-sensing methods are mostly either spectrum based or pixel based. Within this categorization, various statistical and spectral methods are available. One of the methods that can be used in remote sensing is analytical regression. This method is a statistical process for estimating the relationships between variables.
The application of a multivariate regression method in remote sensing is based on a supervised method. The support vector machine (SVM) technique is a supervised approach which can be compared to multivariate regression analysis because both methods are supervised and based on regression functions. The theory of SVM is based on classification and regression. This method is one of the most recent approaches and has shown appropriate performance in recent years. Classification in SVM is based on linear separation of the data, for which an appropriate separating line must be selected. It is a linear training method which uses the empty space (margin) between the data of different classes. The SVM uses kernel functions to separate and classify classes. The better the kernel locates the classes at maximum distance from each other, the greater the accuracy of the classification. This refers to the maximum distance between the separating hyperplane and the closest samples of each class (Forkuor et al., 2017; Cheng and Bao, 2014).
The most important advantages of SVM are that it has good application in various fields and produces an optimal response. The most important disadvantages of SVM are mentioned below:
1. In the SVM method an appropriate kernel should be selected. Determining the proper kernel is very important. Selecting an inappropriate kernel causes errors in calculations and conclusions.
3. SVM has limitations in speed and time because in this method an optimization issue needs to be solved.
4. This method may not provide good results for all data.
The similarity between SVM and the method used in this paper is that both are based on regression mathematics theory and functions, but there are also some differences between them.The SVM classifies and separates the categories, but the analytic regression method uses existing relationships and correlations between the data for introducing the best possible model for predicting the result.
The most important advantages of the analytical regression method are mentioned below:
1. Almost all data can be used in this method.
2. The analytical regression method does not depend on a particular parameter, and it does not have any special restrictions like the SVM.
3. The analytical regression method does not have any limitations in speed and time.
4. Because the model predicts the results according to the data as well as the relationships between the data, the results are closer to reality.
Forkuor et al. (2017) used four methods, i.e. multiple linear regression (MLR), random forest regression (RFR), support vector machine (SVM) and stochastic gradient boosting (SGB), to study soil properties in southwestern Burkina Faso. They reported results for all four methods and stated that, according to the model performance statistics, the other methods are preferable to methods based on regression. This statement does not appear to hold for iron ore exploration in the Sarvian area. The results of regression analyses in the Sarvian area showed that all of the areas predicted by the selected regression model contain iron ore minerals.
Also Allbed et al. (2014) used regression analyses to identify soil properties on satellite imagery and achieved good results; but the difference between this paper and other similar papers is the use of regression analyses in mineral exploration as well as the generation of a mineral potential map.
To review the results of multivariate regression and to compare this method with other existing methods for mineral exploration, our previous work, Feizi and Mansouri (2013b), is referenced for two reasons. Firstly, the northern part of the study area in this paper is similar to some areas of the southern part of the study area in Feizi and Mansouri (2013b); secondly, like this paper, Feizi and Mansouri (2013b) was published with the aim of iron ore exploration. Feizi and Mansouri (2013b) used methods such as the Spectral Angle Mapper (SAM), principal component analysis (PCA), least-squares fit linear band prediction (LS-Fit), Minimum Noise Fraction Transform (MNF) and band ratio for iron ore exploration. According to the results obtained by these methods, the identified regions are iron oxide zones containing magnetite, hematite, goethite, limonite and jarosite minerals. In the Sarvian area, magnetite ore is more economic than the other minerals, and active mines with a higher proportion of magnetite (in comparison with hematite, goethite, limonite and jarosite) are more economic. The methods used by Feizi and Mansouri (2013b), such as SAM, PCA, LS-Fit, MNF and band ratio, therefore highlighted iron oxide alterations and a variety of iron minerals such as hematite, goethite, limonite and jarosite together with magnetite, which are not significant or economically valuable. So, to identify the areas with the most magnetite, the field study should be performed with a high accuracy, which prevents wasting time and money; the results of multivariate regression in this paper recognized magnetite areas accurately. In this paper, the pixels for the magnetite veins of the Sarvian iron ore mine are considered to be base pixels, and, therefore, the results obtained from this method represent the magnetite anomalies in the study area. Iron oxide alterations and a variety of uneconomic iron minerals such as hematite, goethite, limonite and jarosite were not observed in the study area based on the results of multivariate regression. Thus, the multivariate regression method performs more accurately than the other methods mentioned, which results in saving time and money, specifically with regard to the field study. It should be noted that the use of analytical regression in remote sensing is recent, and it needs further study, especially for different types of deposits in mineral exploration.
The novelty of this paper is the use of regression analyses in mineral exploration as well as the generation of a mineral potential map.For mineral exploration, various geo-data layers, such as geochemical, geophysical, remote sensing and geological geo-data layers, should be integrated into GIS, but the most important achievement of this method is that it can be used as a direct method for mineral exploration with the least requirement of other exploration layers.The direct detection of minerals such as copper, lead, zinc and some economically valuable minerals is difficult using remote sensing, but due to the accuracy of this method, these elements can be explored more easily than before.Selecting pixels as dependent variables has a direct effect on the results and is very important in regression analyses; therefore, the higher the resolution of the images, the more accurate the results will be.
Figure 6. Mineral prospectivity map of the Sarvian area, which was confirmed by field sampling of the three target areas.

Conclusions
1. Regression analysis is an appropriate and direct method for mineral prospectivity mapping (MPM) with satellite image data.In this paper, the output of processed satellite images using regression analysis indicates the iron potential zones accurately.
2. The application of multivariate regression analysis (as an MPM method) was confirmed in the Sarvian area.This paper used multivariate regression to create a mathematical model (with reasonable accuracy) for iron mineral exploration in the region of interest.
3. Two types of multivariate regression models, in the form of two linear equations, were employed to discover new mineral deposits. According to the results of the p value, $R^2$ and $R^2_{adj}$, the second regression model provided the best fit.
4. The accuracy of the model was confirmed by iron outcrop mapping and geological observations. Based on field observation, iron mineralization occurs at the contact between limestone and intrusive rocks (skarn type).
5. The results demonstrate that modelling and mapping satellite image data based on regression analysis and remote-sensing data is an efficient approach as it facilitates timely detection with a low-cost procedure and allows decision makers to decide what necessary action should be taken as the first step in an MPM field.
6. The regression analysis method is a subset of supervised classification because of the procedure described above. In this method, target spectra of the training area are used for modelling and MPM.

Figure 2. Location of the Sarvian iron mine in the study area.

Figure 4. Mineral prospectivity map of the Sarvian area.

Figure 5. Mineral prospectivity map of the Sarvian area, which is confirmed by iron outcrops.

Table 3. The $R^2$, $R^2_{adj}$ and p values of the ANOVA test of two multivariate regression models.

Table 4. The calculated coefficients of regression models 1 and 2. CST indicates the constant.