Element-by-element parallel spectral-element methods for 3-D teleseismic wave modeling

The development of an efficient algorithm for teleseismic wave field modeling is valuable for calculating the gradients of the misfit function (termed “misfit gradients”) or Fréchet derivatives when the teleseismic waveform is used for adjoint tomography. Here, we introduce an element-byelement parallel spectral-element method (EBE-SEM) for the efficient modeling of teleseismic wave field propagation in a reduced geology model. Under the plane-wave assumption, the frequency–wavenumber (FK) technique is implemented to compute the boundary wave field used to construct the boundary condition of the teleseismic wave incidence. To reduce the memory required for the storage of the boundary wave field for the incidence boundary condition, a strategy is introduced to efficiently store the boundary wave field on the model boundary. The perfectly matched layers absorbing boundary condition (PML ABC) is formulated using the EBE-SEM to absorb the scattered wave field from the model interior. The misfit gradient can easily be constructed in each time step during the calculation of the adjoint wave field. Three synthetic examples demonstrate the validity of the EBE-SEM for use in teleseismic wave field modeling and the misfit gradient calculation.


Introduction
The increasing demand for high-resolution imaging of deep lithospheric structures requires the utilization of teleseismic datasets for waveform inversion.Teleseismic waves provide tremendous amounts of information for the detection of crustal and upper mantle structures (Rondenay, 2009).Over the past forty years, many techniques have been developed to analyze teleseismic wave datasets, including receiver function analysis (Langston, 1977;Kind et al., 2012), teleseismic wave travel-time tomography based on ray theory (Zhang et al., 2011), teleseismic migration (Shragge et al., 2006), and teleseismic scattering tomography (Roecker et al., 2010;Tong et al., 2014a).To achieve the high-resolution imaging of lithospheric structures, the adjoint-state method has become the state-of-the-art technique for teleseismic wave imaging (Tong et al., 2014a;Monteiller et al., 2015).
The adjoint tomography technique constructs the Fréchet derivatives of the objective function with respect to the model parameters by numerically solving the full seismic wave equation twice if the forward wave fields are stored on a computer disk at every given time interval (Tromp et al., 2005;Liu and Tromp, 2006;Fichtner, 2011).Adjoint tomography has been successfully implemented to investigate crustal (Tape et al., 2009) and continental subsurface heterogeneity (Chen et al., 2015).Seismic adjoint waveform tomography, which has a greater resolution than the ray-based travel-time tomography for the same dense seismic ray coverage (Liu and Gu, 2012), is able to image small-scale (half of the minimum wavelength) heterogeneity (Virieux and Operto, 2009).The main drawback of the adjoint tomography is its large computational burden.The computational requirement is linearly related to the number of earthquakes used for the tomographic inversion and the iterations required by the optimization technique.For a typical local-scale model, such as the southern California region, several thousand 3-D Published by Copernicus Publications on behalf of the European Geosciences Union.S. Liu et al.: Element-by-element parallel spectral-element methods for 3-D teleseismic wave modeling full-wave field simulations are required to perform an adjoint tomography inversion (Tape et al., 2007(Tape et al., , 2009)).
Because most earthquakes occur in the crust, and seismic rays cannot easily illuminate the deep lithosphere in local seismic tomography, imaging the deep lithospheric structures can be difficult (Tong et al., 2014b, c).Increasing the model size enables more deep reflections and refractions to be included in the inversion dataset; as a result, deep structures can be inverted by fitting these reflected and refracted waveforms (Chen et al., 2015).However, for continental-scale models, it is difficult to invert short-period seismic data on a standard computing cluster, such as 1-2 s for P waves and 3-6 s for S waves (Tong et al., 2015).
To reduce the amount of computation involved in solving the full seismic wave equation, many hybrid methods have been developed to localize 2-D/3-D numerical solvers.The boundary conditions for the reduced simulation model are provided by rapid 1-D analytical solutions for the 1-D background Earth model (Capdeville et al., 2003;Monteiller et al., 2013Monteiller et al., , 2015;;Tong et al., 2014aTong et al., , 2015)).The 2-D/3-D responses to the heterogeneity inside the reduced model contribute to the coda waves of the teleseismic phases, and the 2-D/3-D effects outside the model are neglected.This assumption of a 1-D background layered Earth model is similar to that in receiver function analysis and is often effective for a station that is sufficiently far from the source (Langston, 1977;Rondenay, 2009).
Although the computational requirements can be efficiently reduced by these hybrid methods, the computational costs are still excessive for a small workstation when we are faced with the several thousand forward simulations required in 3-D teleseismic adjoint tomography.To further reduce the computational costs, Roecker et al. (2010) constructed a frequency domain 2.5-D hybrid method for teleseismic wave simulations.To simplify the teleseismic wave field invariant along a particular axis, the 2.5-D formulation can significantly reduce the computational demands.However, the 2.5-D formulation may restrict the application of the method in arbitrarily heterogeneous media (Tong et al., 2015).
In addition to the large computational demand (CPU time) associated with the 3-D hybrid methods, satisfying the memory requirements for storing the boundary wave fields to construct teleseismic incident boundary conditions is another important issue that should be carefully considered.Tong et al. (2015) adapted the Clayton and Engquist-type (CE-type) boundary condition (Clayton and Engquist, 1977) to interface the 1-D background analytical solution with a numerical solver on the boundary of a reduced model.This treatment can both assign the teleseismic incident condition for the computational domain and absorb the scattered wave field from the interior of the heterogeneous model.Implementing the CE-type boundary condition is extremely simple and does not substantially increase the required CPU time.However, the CE-type boundary condition can efficiently absorb waves only at approximately normal incident angles, and in-cident waves at grazing angles may be reflected back to the model (Yang et al., 2003), which may reduce the accuracy of the forward and adjoint wave fields in teleseismic adjoint tomography and thus decrease the accuracy of the constructed Fréchet derivatives.Note that the CE-type boundary condition requires nine wave field components (three displacement components and six stress components) to be stored on the model boundary; this requirement may be a significant burden on the computer memory for a relatively large-scale model decomposed by a dense numerical mesh.
Here, we introduce the element-by-element parallel spectral-element method (EBE-SEM) for the efficient modeling of teleseismic plane-wave propagation in reduced models.A significant advantage of the EBE-SEM is the easy parallelization of the spectral-element algorithm, which does not require the assembly of the global stiffness matrix.The spectral elements are equally allocated to every CPU core, and the product of the stiffness matrix and the solution vector is calculated element by element; these aspects are quite useful for ensuring load balance among the CPU cores.The element stiffness matrix can be written as the tensor product of the submatrices, which greatly reduce the computer memory.In addition, the misfit gradient can be efficiently constructed because the element matrices for calculating the misfit gradient can also be formulated from the tensor products of the element submatrices.The perfectly matched layers (PMLs; Collino and Tsogka, 2001;Komatitsch and Tromp, 2003;Liu et al., 2014) are formulated by the EBE-SEM to effectively absorb scattered waves.A detailed discussion is presented to incorporate the teleseismic incident boundary condition in the EBE-SEM, and only six components on the interface between the computational domain and the PML domain must be stored in the computer memory.The high efficiency of the EBE-SEM for teleseismic wave modeling and misfit gradient construction is demonstrated by using three numerical examples.

EBE-SEM
A schematic of a teleseismic plane wave entering a localized model is depicted in Fig. 1, where the localized model is delineated by the blue lines.We first introduce the EBE-SEM for isotropic elastic wave propagation in an infinite half space, which includes the localized model.We denote the total wave field u t , which is the summation of the background wave field u FK in the layered media and the scattered wave field u s .The wavenumber-frequency (FK) domain method (Haskell, 1953;Zhu and Rivera, 2002;Tong et al., 2014a) can be used to efficiently calculate u FK , and u s is excited by the incident plane wave because of the heterogeneity of the media.The incident angle is θ , which is the angle between the normal vector and -z.The azimuth angle is ϕ, which is the angle between the projection of the normal vector into the x-y plane and -y.The azimuth angle is not explicitly shown in this figure.

EBE-SEM for total wave simulation
We assume that the total wave field obeys the following elastic wave equation (Tong et al., 2014a): where the two dots above u t denote the second-order time derivative, I is the 3×3 identity matrix, λ and µ are Lamé parameters, ∇ = e x ∂ x + e y ∂ y + e z ∂ z is the gradient operator, and the superscript T denotes the matrix transpose operation.Following the classical SEM (Seriani, 1997;Komatitsch and Vilotte, 1998;Komatitsch and Tromp, 2002), using an arbitrary test vector v to multiply both sides of Eq. ( 1) and integrating over the domain , we obtain the following weak form wave equation: where n is the normal vector of the boundary ∂ .The definition of the double dot product can be found in De Basabe and Sen (2007).Under the natural boundary condition, the right side of Eq. ( 2) is 0. If the infinite space is decomposed into N nonoverlapping hexahedral elements, then Eq. ( 2) can be written as Each hexahedral element is mapped to the reference cube [−1, 1] 3 , and the chosen interpolation points are consistent with the Gauss-Legendre-Lobatto (GLL) quadrature points (Cohen, 2002).The interpolation basis functions in each element are as follows: where the subscript of φ denotes the ith basis function, ξ , γ , and η are the three coordinates in the isoparametric coordinate system, and ϕ is the Lagrange polynomial.The ith interpolation point in the physical element is mapped to the (i 1 , i 2 , i 3 )th node in the cube [−1, 1] 3 .Based on Eq. ( 4) and the discrete values u e,i t and v e,i on the interpolation points, the continuous values u e t and v e in element e can be approximated by where n is the polynomial order of the interpolation basis.By substituting Eq. ( 5) into Eq.( 2) and using the GLL quadrature rule, we obtain the ordinary differential equation in terms of time: where M e is the element mass matrix, K e is the element stiffness matrix, and U e t is the element solution vector.The elements of M e and K e are presented in Appendix A. Because the interpolation points are consistent with the GLL quadrature points, M e is diagonal, which is one of the advantages 972 S. Liu et al.: Element-by-element parallel spectral-element methods for 3-D teleseismic wave modeling of the Legendre polynomial-based SEM because the explicit inverse of the mass matrix can be easily obtained.The assembled global diagonal mass is as follows: We define the projection operator T e to map the corresponding elements of M back to form an element matrix: where the tilde above M e is used to distinguish the difference between the back-projected mass matrix and the element mass matrix in Eq. ( 6).Because the elements of the element mass matrix that correspond to the common points shared by the adjacent elements are summed to form the global mass matrix in Eq. ( 7), some elements of M e are greater than those of M e .The notation of Eq. ( 8) is important for the discussion below.If we denote the global solution vector as U t , then the back-projected element solution T e (U t ) is still U e t because u t is continuous cross elements.From Eq. ( 6), we have An explicit temporal scheme, such as the Newmark scheme (Liu et al., 2017a), and structure-preserving schemes (Liu et al., 2017b) can be adopted for time integration.Here, we simply use the second-order central difference method (Dablain, 1986) for the time evolution.The product of the stiffness matrix and the solution vector is calculated element by element at the element level rather than at the global level.The computational burden of SEM mainly stems from the matrix-vector product, and the significant advantage of the element-by-element scheme of Eq. ( 9) is the great computational balance among CPU cores when the parallel algorithm is performed on a workstation.Our parallel code is based on the Message Passing Interface (MPI) library (www.mpich.org/downloads),and we equally distribute the spectral elements to CPU cores.
To further increase the computational efficiency of SEM, the GLL quadrature rule is fully considered in the product of the stiffness matrix and the solution vector.To simplify the discussion, the product of K e 11 in Eq. (A4) and U tx is presented below, where the subscript x denotes the x component of the discrete displacement vector.Based on the tensor product of matrices (Seriani, 1998) where P 1 = D ⊗ I ⊗ I, P 2 = I ⊗ D ⊗ I, P 3 = I ⊗ I ⊗ D, I is the (n+1)th order identity matrix, and the (n + 1) . The superscripts of A represent the physical coordinates, whereas the subscripts denote the isoparametric coordinates.A 1,1 1,1 is given by where δ is the Kronecker-delta symbol, r, s, and t represents matrix row indices, i, j , and k denote matrix column indices, and Det(J) is the determinant of the Jacobi matrix evaluated at GLL point ξ i , γ j , η k .Equation ( 11) shows that A is a diagonal matrix, and only (n + 1) 3 elements need to be stored for each A. For every hexahedral element, 81 combinations may appear for A. However, 45 matrices need to be stored because of the symmetry of the matrices.

EBE-SEM for the PML formula
In our previous work, a second-order PML absorbing boundary condition (PML ABC) was formulated using the mixedgrid finite-element method (Liu et al., 2014).Here, we use this type of PML ABC to absorb the scattered wave field u s = u sx , u sy , u sz .To simplify the discussion, only the for-mula of the PML ABC for the displacement of the x component is shown below: where d x , d y , and d z are damping coefficients along the three coordinate axes, the subscript of d denotes the first spatial derivative of the damping coefficient, u sx,1 , u sx,2 , u sx,3 , u sx,4 , and u sx,5 are the five split components of u x , and P x,1 , P x,2 , and P x,3 are the three intermediate variables.Based on the natural boundary condition, the corresponding weak form of Eq. ( 12) is  The discretized version of Eq. ( 13) is  D x,z , then we obtain the element-by-element scheme for Eq. ( 14): T e (M 1 ) −1 T e (D x ) P e x,1 T e (M 1 ) −1 T e (D z ) P e x,3 Solid Earth, 8, 969-986, 2017 www.solid-earth.net/8/969/2017/From Eqs. (B11) to (A20), Eq. ( 15) can be rewritten as From Eq. ( 16), we observe that the element damping matrices do not need to be stored.It should be noted that the value of the damping coefficient is divided by N e if the node located on the element surface is shared by N e elements.In each element, the stiffness matrix can also be written as a tensor product of matrices, which is similar to Eq. (10).

Discussion of the EBE-SEM
Although the EBE-SEM is specially designed for teleseismic wave modeling (i.e., Eq. 9 is for teleseismic total wave field propagation if the proper teleseismic incident boundary condition is added and Eq.16 is for absorbing the scatted wave field), EBE-SEM can be directly used for wave field simulation of an earthquake that occurred in the interior of the model (computational domain) if a source term is added to Eq. (9).Seriani (1997) is the seminal work that introduced the EBE-SEM.In the 2-D case, the element-by-element scheme is combined with the Chebyshev orthogonal polynomialbased SEM.Because the Chebyshev orthogonal polynomial is orthogonal and associated with the weight 1/ 1 − ξ 2 , the Chebyshev orthogonal polynomial-based SEM cannot lead to a diagonal mass matrix and an iterative algorithm is generally used to solve a large spare linear system of equations, which may not be efficient for larger-scale seismic waveform modeling.Another aspect of the EBE algorithm of Seriani (1997) is that it is only appropriate for use with only rectangular elements of a fixed shape; this restriction may be problematic for curved elements.In contrast, EBE-SEM in the paper can be used for general cases.
The recent improvements in the SPECFEM3D software also incorporate the tensor products of the element stiffness matrix (Peter et al., 2011).Geology models can be decomposed by fully unstructured hexahedral meshes.Great load balancing is achieved based on graph partitioning.This software does not explicitly assemble the global stiffness matrix.The main contribution of this paper is the detailed introduction of EBE-SEM for high-efficient teleseismic wave modeling.
If appropriate boundary conditions are added for the computational domain and the PML domain, then the computation will be isolated in the counterpart domain, i.e., Eq. ( 9) for the computational domain and Eq. ( 16) for the PML domain.The boundary conditions are discussed presented below.

Teleseismic wave incident boundary conditions
For the completeness of this paper, we first simply introduce the FK method for determining the 3-D elastic-waveequation-based plane-wave propagation in layered media.For more details, the reader is referred to Haskell (1953) and Tong et al. (2014a).Our focus is to construct the teleseismic wave incident boundary conditions and develop a highly efficient method for the storing the boundary wave fields.

Plane-wave propagation in 1-D layered media
Transforming the elastic wave equation into the ω, k x , k y domain, we obtain the following equation: where u FK = u FKx , u FKy , u FKz is the plane wave in layered media, k x and k y are the wavenumbers in the x and y directions, respectively, and ω is the angular frequency.We assign u FK the same notation in both the time-space and frequency-wavenumber domains to avoid clustering.If we define u FKH e H = u FKx e x + u FKy e y and ke H = k x e x + k y e y , where k = k 2 x + k 2 y and e H is the horizontal normal vector, then Eq. ( 17) can be written as www.solid-earth.net/8/969/2017/Solid Earth, 8, 969-986, 2017 If we define y 1 = iu FKH , y 2 = u FKz , y 3 = iσ Hz , and y 4 = σ zz , Eq. ( 18) can be changed to the following ordinary differential equations: By calculating the eigenvalues and eigenvectors of F , the general solution of Eq. ( 19) can be written as where C 1 , C 2 , C 3 , and C 4 correspond to the amplitudes of the upgoing S wave, downgoing S wave, upgoing P wave, downgoing P wave, respectively; p = k ω , υ P = ω 1 α 2 − p 2 , υ S = ω 1 β 2 − p 2 , and γ = (2k 2 − ω 2 β 2 ) are the ray parameters.As shown in Fig. 2, the velocities, density, and thickness of the mth layer are denoted as V P m , V Sm , ρ m , and h m , respectively.We define the following matrices for the mth layer: The relationship between the wave fields at z = z m−1 and z = z m can be written as In Eq. ( 23), m is the propagation matrix.The wave field at the free surface can be represented by If we consider that the incident plane wave is an upgoing P wave with unit amplitude, then we have We define The values of y 3 and y 4 at the free surface can be written as Based on the free surface boundary condition y 3 = y 4 = 0, we have Substituting Eqs. ( 25) and (28) into Eq.( 23), we obtain the wave field at z = z m .Based on Eq. ( 23), we can calculate the wave field in each layer.The first N 1 elements of B compose the computational domain (the domain bounded by the blue lines in Fig. 1), and the elements from N 1 + 1 to N 1 + N 2 compose the PML domain.We define the set of elements in the computational domain as A total of N 3 elements in B C that contact the boundary of PML domain are collected in the following set: The set of N 2 elements that compose the PML domain is given by A total of N 4 elements in B P that contact the boundary of the computational domain are gathered in the following set: We define an operator that maps the elements of vector A to a target vector A t according to the numbering set A n .The index number of each elements of set A t in set A is gathered in A n , and we define From Eq. ( 9), we have where A C is the numbering set of the nodes in the computational domain.Equation ( 35) shows that only matrix-vector products in the elements of the computational domain and the elements of the PML domain in contact with the boundary of the computational domain contribute to the acceleration wave field on the nodes in the computational domain.
Based on Eq. ( 35), we have where A C,B is the numbering set of nodes of the computational domain located on the interface between the computational and PML domains.The second term of the right side of Eq. ( 36) can be written as T e (M) −1 K e T e (U t ) Equation ( 37) is the plane-wave incident condition for the computational domain.Based on Eq. ( 37), we only need to store the boundary wave field to construct the incident boundary condition for the computational domain.
The length of − e∈B P ,B T e (M) −1 K e T e (U FK ) A C,B is 3 times as long as the number of nodes located on the interface between the computational and PML domains; and it is not necessary to store U FK on the nodes of the elements in B P ,B .This storage technique can decreased the amount of memory required by n−1 n × 100 %, which is extremely important for large-scale models.This storage technique is similar to the method of the linear combination of a boundary wave field, which was proposed in our previous work (Liu et al., 2015).

S. Liu et al.: Element-by-element parallel spectral-element methods for 3-D teleseismic wave modeling
To simplify the discussion, the boundary condition of the first equation of Eq. ( 16) is discussed.From Eq. ( 16), we have where A P is the numbering set of nodes in the PML domain.
The scattered wave on the interface between the computational domain and the PML domain can be obtained by the following equation: where A P ,B is the numbering set of the nodes of the PML domain on the interface.U sx A P ,B does not need to be calculated by Eq. ( 16).From Eqs. ( 38) and ( 39), we have Considering the other equations of PML ABC, the boundary condition of PML ABC to absorb the scattered waves is In addition to the boundary condition given by Eq. ( 41), the Dirichlet boundary condition is added to the outer boundaries of the PML domain, and the natural boundary condition is added to the planes that connect the free surface of the computational domain.Because of the boundary conditions, the element-level matrix-vector product is restricted to only the element in the computational domain and the PML domain.Because the boundary conditions in Eqs. ( 37) and ( 41) involve the plane-wave fields, we call Eqs. ( 37) and ( 41) teleseismic wave incident conditions.

Analysis of the computational costs
We use the model in Fig. 1 to quantitatively discuss the computational cost of EBE-SEM for teleseismic wave modeling.
Because the thickness of the PML domain in our numerical examples is only three elements wide, the number of floating point operations is trivial compared with the computational domain, and the main computational cost of the PML domain is from the storage requirement of the boundary condition (Eq.41).
The model is decomposed into 75 000 cubic elements with a size of 2 km × 2 km × 2 km.The fourth-order interpolation polynomial is used in the space.The time interval is 0.01 s, and the total time for the numerical modeling is 100 s.
The floating point operations in the computational domain are mainly from the element matrix-vector product.A total of 1.0546875 × 10 10 floating point multiplication operations are required in each time step.If a global stiffness matrix is assembled, then the product of the global stiffness matrix and the global solution vector requires a total of 9.35974 × 10 9 floating point multiply operations.The computational burden of EBE-SEM is greater than that of the conventional SEM for two reasons: the element stiffness matrix and solution vector product are decomposed into a total of 189 submatrix and solution vector products (Eq.10), and the acceleration on common nodes shared by the adjacent elements is calculated more than once.However, the number of operations of EBE-SEM increases by only 11.26 %.This increased computational amount may be compensated for by the high load balance of EBE-SEM in parallel computing, as will be discussed in the numerical examples.
The memory requirements of EBE-SEM and the conventional SEM for teleseismic wave modeling are presented in Table 1.A total of 13.12 GB is required to store the boundary wave field to construct the teleseismic incident conditions.If the classical compressed sparse row (CSR) storage format (Greathouse and Daga, 2014) is used to store the spare stiffness matrix of the conventional SEM, then the storage requirement is 38.76 GB, which is nearly 25 times as large as the storage requirement of EBE-SEM to store the 45 element submatrices.Two factors contribute to this storage difference.First, CSR must store the row and column information of the global stiffness matrix in addition to the storage of nonzero elements.Nearly 0.018 and 3.87 GB are required to store the row and column information, respectively.Second, the 45 submatrices for each element are diagonal, i.e., only (n + 1) 3 nonzero elements of each submatrix must be stored.

Numerical examples
Three numerical examples are provided to validate the efficiency of EBE-SEM for teleseismic wave modeling.All three examples use the Gaussian source-time function with a cutoff frequency of 2 Hz (Tong et al., 2014a).

Benchmark for the 1-D crust-upper-mantle model
Except for the model parameters presented in Sect.4, the material parameters are shown in Table 2; the incidence angle is 15 • , and the azimuth angle is 100 • .The cross section of the plane-wave front with the Moho is located at (−400 km, 0 km, 30 km) at the initial time t = 0.A seismic station is    Figure 3 shows snapshots at four instants.When t = 10 s, the plane wave is still outside the domain (Fig. 3a-c).When t = 16 s, the incident plane wave enters the lower layer of the model (Fig. 3d-f).When t = 18 s, the transmitted and reflected P waves are very clear in the plane at y = 50 km (Fig. 3h). Figure 3i clearly illustrates that the wavelength in the upper mantle is greater than that in the crust because the velocity of the upper mantle is approximately 1.39 times the velocity of the crust.When t = 22 s, the transmitted P wave is reflected by the free surface, and the Moho-reflected waves are propagating outside the model (Fig. 3k).
To qualitatively evaluate the accuracy of EBE-SEM for teleseismic wave modeling, the synthetic seismology at the station is compared with the reference solution, which is generated by the FK technique.The results are shown in Fig. 4. As depicted in Fig. 4a, the synthetic and reference waveforms show excellent agreement.The direct wave with the largest amplitude is followed by the converted S wave and the crust multiples with relatively small amplitudes.Although the amplitudes of the crust multiples are small, their phases, ampli-tudes, and travel times are correctly modeled.The error curve is shown in the right panel of Fig. 4. The maximum relative difference between the numerical solution and the reference solution is less than 5 %, although the numerical method uses only fourth-order interpolation in the space and nearly one spectral-element sampling for the minimal wavelength.
The computation was performed on a workstation with 2 Intel Xeon CPUs (E5-2680 v3) and 128 GB of RAM, and 24 CPU cores were used.The master-slave communication pattern is used in our parallel algorithm, which is similar to that of Komatitsch and Tromp (2002).The parallel efficiency is shown in Fig. 5.Because the master-slave communication pattern requires at least two CPU cores to perform the parallel algorithm, the computation time of single CPU core is not shown in Fig. 5.When 2 CPU cores are used, the amount of communication between the CPU cores is negligible compared to the amount of computation.If we denote the CPU time when 2 CPU cores are used in parallel computation as T 2 , the CPU time of n CPU cores can be estimated by T n = 2 n T 2 .The red line in Fig. 5 represents the estimated CPU times (theoretical times), and the blue line with circles is the practical CPU times.It can be clearly observed that the actual CPU times are extremely close to the theoretical times, even if the number of CPU cores is 24.The red arrows in Fig. 5 indicate the abnormal CPU times.This phenomenon is attributed to the excessive communication amount when the numbers of CPU cores are 18 and 19.However the anomaly is not large compared with the neighboring CPU times.

Plane-wave incidence to a 3-D model
To demonstrate that EBE-SEM works well in 3-D heterogeneous models, an abnormal structure with cube shape of an additional 15 % plus wave speeds and density are added at the center of the model in Fig. 1.The size of the abnormal structure is 20 km × 20 km × 20 km.Except for velocity structure, the other computational parameters are the same as those of the first numerical example.The simulation results are shown in Fig. 6.
The black arrows in the upper plots of Fig. 6 indicate the distortion of the wave front because of the velocity anomaly in the media.Because of the positive velocity anomaly, the distorted wave front travels faster than the undistorted plane wave.The lower plots in Fig. 6 show strong scattered waves.As the yellow arrows indicates, the strong scattered waves do not reflect back to the interior of the model because of the efficiency of the PML ABC used in this paper.

Teleseismic waveform misfit gradient
One key advantage of the EBE-SEM is its convenience for constructing the misfit gradient because the element stiffness matrix can easily be assembled based on the tensor product of the submatrices.To illustrate this advantage, we first define the misfit function: where m = (ln ρ, ln λ, ln µ) represents model parameters, x r is the station location, x s is the teleseismic incident parameters, d is the observed dataset, and s is the synthetic dataset.The cross section between the plane wave and the Moho at y = 0, the incident angle, and the azimuth angle constitute the teleseismic incident parameters.To simplify the discussion, this example only considers one source and one station.
Based on the continuous adjoint method discussed by Fichtner (2011), the adjoint equation is where w t is the adjoint total wave field.The misfit gradient ∇ m E obeys the following equations (Monteiller et al., 2015): where e represents the eth element; W is the discrete vector of w; K e λ and K e µ are the element stiffness matrices for the misfit gradient calculation, which are presented in Appendix C; and N T is the total time step.K e λ and K e µ can also be written as a combination of the tensor product of the submatrices, which leads to an easy construction of the misfit gradient.The misfit gradients can be written as the gradients with respect to ln ρ, ln V P , and ln V S by simple combinations of Eqs. ( 44)-( 46).
We consider the model in the second numerical example to be the real model and the 1-D layered model in Fig. 1 as the initial model.The observed and synthetic waveforms at the station (red triangle in Fig. 1) are shown in Fig. 7a.As Fig. 7b shows, the time-reversed waveform differences between the observed data and the theoretical data act as the source term in Eq. ( 43).
Figure 8 shows the constructed misfit gradients in the plane at y = 61 km at two time slices.Because of the singularity of the source, large amplitudes are distributed in the vicinity of the adjoint source.The misfit gradient no longer has a banana-doughnut shape (Tromp et al., 2005) but rather is similar to the Greek letter (Fig. 8e, f) because of the plane-wave source.The misfit gradient with respect to ln ρ is much weaker than those for ln V P and ln V S .The extremely strong negative misfit gradients of ln V P distribute along the ray path of the direct P wave.

Discussion and conclusions
Teleseismic wave adjoint tomography has the ability to image the deep structure of the lithosphere.Thus, a highly efwww.solid-earth.net/8/969/2017/Solid Earth, 8, 969-986, 2017  ficient method for teleseismic wave forward-modeling and misfit calculation is important.In this work, the EBE-SEM was specially tailored to teleseismic wave modeling and misfit gradient calculation.In this approach, the PML ABC is discretized by EBE-SEM, and the method can efficiently absorb scattered teleseismic waves.Teleseismic wave incident conditions are constructed for the computational and PML domains.An economic technique for boundary wave field storage is introduced that can greatly reduce the required amount of computer memory.
The numerical results from the first and second numerical examples demonstrate not only the efficiency of EBE-SEM in modeling teleseismic waves but also the validity of the constructed teleseismic wave incident boundary condition.As shown in the third numerical example, hardly any extra effort is required to construct the misfit gradient.The EBE-SEM has advantages over the traditional SEM in three respects: the reduction in the required computer memory requirement, easy calculation of the misfit gradient, and significant parallelization efficiency.
Code and data availability.The original source codes for the numerical examples are written in C. To obtain these source codes, please contact the first author (email: slliu@math.tsinghua.edu.cn) or the corresponding author by email.The MPI library was downloaded from http://www.mpich.org/downloads.

Figure 1 .
Figure1.Schematic of a teleseismic plane wave entering a local two-layered crust-upper-mantle model.The study area is delineated by the blue lines.The origin of the Cartesian system is located on the surface at corner A of the model.The positive x and y directions point to the east and south, respectively, and the positive z direction points downward.The inverted red triangle located at (60 km, 60 km, 0 km) represents a seismic station.The plane in the lower left of the figure denotes the wave front of the incident teleseismic plane wave.The purple arrows indicates the normal vector of the plane wave.The incident angle is θ , which is the angle between the normal vector and -z.The azimuth angle is ϕ, which is the angle between the projection of the normal vector into the x-y plane and -y.The azimuth angle is not explicitly shown in this figure.
ρv x üsx,4 + d x + d y usx,4 + d x d y u sx,ρv x üsx,5 + (d x + d z ) usx,5 + d x d z u sx, + K e zx U e sz = 0, U sx = U sx,1 + U sx,2 + U sx,3 + U sx,4 + U sx,5 , (14) where K e x , K e y , K e z , K e xx , K e yy , K e zz , K e xy , K e yx , K e xz , and K e zx are element stiffness matrices and D e x , D e y , D e z , D e xx , D e yy , D e zz , D e xy , D e xz , D e x,y , and D e x,z are the element damping matrices.The detailed expressions of these element matrices are presented in Appendix B. The damping matrices are also diagonal matrices, whose element values are scaled by a constant compared with the element values of M e .If we denote the global damping matrices as D x , D y , D z , D xx , D yy , D zz , D xy , D xz , D x,y , and

Figure 2 .
Figure 2. 1-D layered background model.The thickness, P wave velocity, S wave velocity, and density of each layer are shown.

=
P x A P −A P ,B .

Figure 3 .
Figure 3. Plane-wave incident on a 1-D crust-upper-mantle model.The snapshots from the top to the bottom are taken at t = 10, 16, 18, and 22 s, respectively.The left, middle, and right panels are corresponding to snapshots of planes at x = 50 km, y = 50 km, and z = 30 km, respectively.The yellow arrows in (j) and (k) denote the weakly transmitted and reflected S waves.The color scale is shown on the right side of the figure.

Figure 4 .
Figure 4. Comparison of the waveforms generated by EBE-SEM (dotted line) and FK (solid line).The waveform computed by FK is considered the reference solution.Panels (a), (c), and (e) are the waveforms in the x, y, and z directions, respectively.The error curves presented in the right panels are obtained by subtracting of the numerical solution from the reference solution.For visualization purposes, the errors are multiplied by 20.

Figure 5 .
Figure 5. Parallel efficiency of the EBE-SEM.The red line denotes the theoretical CPU time, and the blue line with circles represents the actual CPU times.The red arrows designate the abnormal CPU times compared with the neighboring CPU times.

Figure 6 .
Figure 6.Snapshots in a heterogeneous model.The snapshots in the upper and lower panels are taken at t = 18 and 22 s, respectively.The left, middle, and right panels are snapshots of planes at x = 50 km, y = 50 km, and z = 30 km, respectively.The black arrows in the upper plots indicate the distortion of the wave front because of the velocity anomaly in the media.The yellow arrows in (a) and (b) indicate the scattered waves.The yellow arrows in (d) and (e) denote the scattered waves that are efficiently absorbed by the constructed PML ABC.The color scale is shown on the right side of the figure.

Figure 7 .
Figure7.Construction of the adjoint source.In the left panels, the synthetic waveforms from the second example are considered the observed data (blue line), and the synthetic waveforms from the first example are treated as the theoretical data (red line).The rectangular boxes indicate the time window[20, 24]  to isolate the waveforms to construct the adjoint source.The top, middle, and bottom panels correspond to the x, y, and z components of the displacement.The three components of the adjoint source-time function is the time-reversed waveform difference between the observed data and the theoretical data.

Figure 8 .
Figure 8. Misfit gradients in the plane at y = 61 km at two times.The top and bottom panels correspond to t = 18 and 24 s, respectively.The left, middle, and right panels correspond to the misfit gradients in terms of ρ, V P , and V S , respectively.The color scale is shown on the right side of the figure.
Element matrices for the PML domainThe element matrices for the PML domain are listed below: et al.: Element-by-element parallel spectral-element methods for 3represents the average value of the damping coefficient on the element e.Appendix C: Element matrices for the misfit gradientIn Eqs.(44) and (45), K e λ and K e µ are given by a 11 a 12 a 13 a 14 a 21 a 22 a 23 a 24 a 31 a 32 a 33 a 34 a 41 a 42 a 43 a 44 B = e 1 , . .., e N 1 , e N 1 +1 , . .., e N 1 +N 2 , . . . .

Table 1 .
Liu et al.:Element-by-element parallel spectral-element methods for 3-D teleseismic wave modeling Memory requirements of EBE-SEM and the conventional SEM for teleseismic wave modeling.All the values are stored in memory with single float precision.GB denotes gigabyte.