Introduction
Global stack of 1 h of seismograms accurate to a
shortest period of 2 s for an earthquake in 27 km depth computed
with Instaseis. The displacement is color-coded analogous to the
IRIS global stack , i.e. red: transversal component,
green: radial component, blue: vertical component. An automatic gain control
(AGC) with a window of 100 s length is used to balance large
amplitude variations between the various phases. Note that creating this
plot does not require to define the source depth at the time of database
calculation. A high-resolution version of this plot and one for 5 h
long seismograms is added as an electronic supplement.
Despite the exponential growth of computational power and substantial progress of 3-D
numerical methods for seismic wave propagation in the last 15 years , the simulation of the highest frequencies observed
in seismic waves on the global scale remains a high-performance computing challenge and is
not yet done routinely. This is why many seismologists still rely on approximate methods
to compute and analyze high-frequency body waves such as ray-theoretical travel times
e.g. the TauP-toolkit described in, WKBJ synthetics
, the reflectivity method or the frequency–wave number
integration method . More recently, several methods that include the full
physics in solving the seismic wave equation while reaching the highest observable
frequencies by assuming spherically symmetric models have become available, see
Fig. for an example. These methods include the direct solution
method DSM,, the frequency domain integration
method GEMINI, and a generalization of it including self
gravitation Yspec,.
As detailed by , the main drawback of these methods when applied
to computing wave fields rather than single seismograms, is their scaling proportional to the
number of points in space where the wave field is sampled. This motivated the development
of a direct time-domain approach, where the displacement as a function of space and time
is a natural field variable and only needs to be written to disk
. The implementation of this axisymmetric spectral
element method AxiSEM was recently extended to include anisotropy and attenuation
, published under public license
and is available at www.axisem.info.
As computing full global waveforms especially at higher frequencies requires substantial
computational resources, several initiatives serve to deliver waveforms by means of
databases without having to run a full numerical solver. The ShakeMovie project
(http://global.shakemovie.princeton.edu) provides synthetics for earthquakes from
the CMT (Global Centroid-Moment-Tensor) catalogue (www.globalcmt.org) recorded at permanent
Global Seismic Network (GSN) and International Federation of Digital Seismograph Networks (FDSN) stations
in 1-D and 3-D velocity models . The Pyrocko toolbox
(http://emolch.github.io/pyrocko) provides a Python interface to generate and
access Green's function databases, which for the global case are based on GEMINI,
several databases are offered for download.
In this paper we present a method that uses AxiSEM to generate global Green's
function databases and provides a Python interface for convenient extraction of
seismograms. The advantage over ShakeMovie synthetics are the possible higher
frequencies and arbitrary source and receiver combinations independent of catalogues and
real stations. Compared to Pyrocko with GEMINI synthetics,
AxiSEM is more efficient in generating the databases, allowing to routinely
compute them for a large number of different background models or specialized
applications (e.g. limited depth/distance ranges). Also, by using the Lagrangian
polynomials in the SEM (spectral element method) mesh as basis functions, it achieves higher spatial accuracy.
This paper is structured as follows. In Sect. we present the technical
aspects and argue for the choices made in the spatial and temporal discretization.
Section gives a short overview of the Python interface. In
Sect. we show the performance with respect to accuracy, speed and
disk space requirements for the databases. Finally, we depict a variety of applications in
Sect. .
Methods
Computing Green's functions with AxiSEM
The 3-D wave field is decomposed analytically into
monopole, dipole and quadrupole radiation patterns (left) and the remaining 2-D
problem is solved on a D-shaped domain (right) using the spectral element
method. While the forward databases require a total of four 2-D computations,
it is only two for the backward databases using reciprocity of the
Green's function: one for the vertical and one for the horizontal
components modified from.
AxiSEM was designed from the beginning with the application of computing
global wave fields rather then single seismograms in mind . This
becomes apparent in the following main advantages in this application: it uses a 2-D
discretization (Fig. ), with an analytical decomposition of the 3-D
wave field into several 2-D wave fields. For moment tensor sources, four 2-D wave fields are
needed, for force sources, two. As it is a time domain method, the displacement field in
space-time is a natural field variable of the numerical scheme and simply needs to be
written to disk without any extra computational cost when larger regions of Earth are
included in the database. AxiSEM uses a spectral element scheme for spatial
discretization which lends itself well to parallelization on High Performance Computing (HPC) systems. As it is based
on the weak formulation of the wave equation, it naturally includes the free surface
boundary condition and allows for highly accurate modeling of surface waves.
argued against using collective parallel I/O since the
availability of the NetCDF libraries was not granted on all
supercomputers. For that reason, we implemented a round robin I/O scheme, which remains
advantageous when running AxiSEM on less than about 100 cores in parallel and to
avoid installation problems on systems where NetCDF is not available as a pre-compiled
package. On supercomputers however, the situation has since improved and NetCDF compiled
with parallel support seems now to be widespread. For this reason, we implemented a
collective parallel I/O scheme that performs well, even when running on more than 1000
cores, see Table . In this scheme, all processes that have to write data
to disk communicate via the message passing interface (MPI) and then write collectively at
the same time to the parallel file system. This way we achieved throughputs of up to
4 GB s-1 on SuperMUC.
Forward and backward databases
Instaseis has the capability of dealing with forward wave fields,
i.e. the waves are propagated from a moment-tensor point source at fixed depth (i.e. receivers
exist throughout the medium), as well as backward or reciprocal
wave fields, where the wave fields are propagated from a single-force point source at fixed
depth and recorded throughout the medium (i.e. sources exist throughout the medium).
Potential applications of forward databases are the generation of 3-D
wave-propagation movies , the computation of incoming teleseismic
waves in 1-D/3-D hybrid methods e.g. or the forward
field in the computation of sensitivity kernels for seismic
tomography. To generate a forward database, a total of four runs with
AxiSEM are needed .
In contrast, reciprocal databases utilize the reciprocity of the Green's
functions, and are useful in all cases where the receivers are at fixed depth, thus for
instance mimicking earthquake catalogues recorded at stations along the surface. The
source can be located anywhere in the region where the Green's functions are recorded in
the simulation, thus allowing for unlimited choices in the source–receiver geometry. To
generate a reciprocal database, a total of two runs with AxiSEM are
needed, one for the vertical component and one for both horizontal components of the
seismogram . It is also possible to compute a database for the
vertical component seismograms only, which is then a factor of 3 faster and uses only
about 40 % of the disk space.
The spatial scheme
Lagrangian basis polynomials ln(ξ) of fourth order in
one dimension. At the collocation points, all but one are zero, such that the
value of the interpolated function at this point coincides with the
coefficient in this basis expansion.
Lagrange interpolation points inside an element (gray)
and its neighbors. Coordinates ξ and η are the reference
coordinates of the gray element. Points on the edges (black squares) are
shared between neighbors and function values at these points need to be
stored only once if the function is continuous (e.g. displacement). The
number of global degrees of freedom per element of such
functions is thus approximately 16
compared to 25 for discontinuous functions (e.g. strain).
For the spatial discretization we choose to keep the same basis as used in
AxiSEM. The displacement u within each element is expanded in terms of Lagrangian
polynomials li (see Fig. ) of order N defined on the integration
points of the spectral element scheme (see Fig. ):
u(ξ,η,t)=∑ij=0Nuij(t)li(ξ)lj(η);
ξ and η are the reference coordinates of the element and N typically
has a value of 4. This approach has several advantages.
The wave field is represented by polynomials, typically of degree 4;
interpolation is hence of 4th order accuracy.
The basis is local and only few coefficients are needed to represent the
wave field inside an element (typically 25), in contrast to e.g. global basis
functions such as spherical harmonics.
Discontinuities in the model that cause discontinuities in the strain Green's
functions are respected by the mesh.
The strain tensor (representing the moment tensor in the
reciprocal case) can be computed on the fly from the stored displacements at
high accuracy. This reduces the storage by a factor of 2 as the displacement
has 3 degrees of freedom, compared to 6 for the strain.
Since the displacement is continuous also at model
discontinuities and element boundaries, it needs to be stored only once at
all Gauss–Lobatto–Legendre (GLL) points that belong to multiple elements,
reducing the storage by another factor of 16/25=0.64 (see Fig. ).
Storing the displacement allows to use force sources as well without any extra
computation or storage requirements.
Snapshot of one component of the Green's
tensor (Grr,r) as represented in the SEM basis for a shortest period of
50s. Discontinuities such as caused by the crustal layers are exactly
represented and the wave field is smooth across doubling layers of the mesh.
One component of the strain Green's tensor
(Grr,r) for a distance of 30∘ as a function of time and depth
with a shortest period of 2 s. (a) SEM basis vs. (b) regular sampling
with 1 km distance and (c) phase and envelope misfits
EM and PM in the legend, see
caused by the regular sampling computed in the period range 1–20 s.
Dashed lines in the left panel sketch the spectral elements. The crustal
discontinuities of ak135f are indicated by
solid lines and lead to discontinuities in Grr,r, which are exactly
represented in the SEM basis.
Figure visualizes the spatial representation for a long period mesh
(50 s) for the Rayleigh wave train and the Grr,r component of the strain Green's
tensor: the strain is smooth also across the doubling layer of the mesh where the
background model ak135f, is smooth as well. Still, the
discontinuities of the model and hence the strain are explicitly represented by this
discretization and the resolution of the mesh is adapted to the local wavelength, as for
instance in the crust. Figure shows an example for 2 s
shortest period and compares the SEM discretization to regular depth sampling. In the regular
sampling case with nearest neighbor interpolation, the phase and envelope errors can be
quite large, especially close to the model discontinuities up to 80 % envelope misfit and
4 % phase misfit as defined by and for very shallow sources (up to
40 % envelope misfit and 14 % phase misfit).
Finite element mapping
Voronoi approximation (colored) of the AxiSEM
mesh (black lines) using the midpoints of the elements (red circles) only,
zoomed onto a doubling layer for a 50 s mesh. For most elements, the
Voronoi cell coincides almost exactly with the AxiSEM element, note
that most of the AxiSEM elements have edges of concentric circles
while the edges of the Voronoi cells are all straight lines. In the worst
case, six AxiSEM elements have to be tested whether a point is inside
or not.
One performance-critical step in the spatial scheme is to find the reference coordinates
(ξ,η) inside the spectral element that includes a point given in global coordinates (s,z). While the opposite mapping is trivial because this is how the elements of the SEM are
defined Appendix A1, it cannot be generally inverted easily.
presents an analytical inverse solution for quadrilateral elements, which
is quite involved and not easy to generalize for the semicircular elements used in
AxiSEM.
We follow a two-step approach to finding the reference coordinates. First,
we find the six closest element midpoints to
limit the search to a small number of candidate elements in which the point could be.
The number six is specific to the AxiSEM mesh, where each corner point can belong
to a maximum of six elements in the doubling layers, see Fig. . This step
can be seen as approximating the AxiSEM mesh with Voronoi cells. For most points,
the closest midpoint will already indicate the correct element, in the worst case the
second step has to be performed for all six candidates.
Lanczos kernels used for resampling. For large
values of the parameter a, it converges towards the sinc function, which
is the kernel that allows exact reconstruction for bandlimited signals as
stated in the Nyquist sampling theorem .
Resampling using a Lanczos kernel with a=12 of a
P arrival velocity seismogram convolved with the source time function from
Fig. recorded at 40∘ distance, and the
resampling error multiplied by a factor of 100. The relative error is on
the order of 0.05 % of RMS, see Fig. .
RMS error of the resampling using the first 1800 s
of the seismogram from Fig. . Convergence is reached
around a=20. It does not converge to zero because some high-frequency
energy was neglected in the downsampling, see Fig. .
In a second step, the reference coordinates (ξ,η) of the given point (sp,zp)
are computed for the six candidate elements sorted by the distance of the midpoints. If
both ξ and η are in the interval [-1, 1], the element is found. The coordinates (ξ,η) are
computed using an iterative gradient scheme adopted from SPECFEM3D
. Starting from the midpoint of the candidate element, updated
values are found by linear approximation of the inverse mapping:
ξn+1ηn+1=ξnηn+J-1(ξn,ηn)⋅sp-s(ξn,ηn)zp-z(ξn,ηn),
with the Jacobian matrix defined as
J(ξ,η)=∂ξs∂ηs∂ξz∂ηz,
and the mapping s(ξ,η) and z(ξ,η) depending on the element type
as defined in . In the AxiSEM mesh, this iteration
converges to numerical accuracy within less than 10 iterations and is not performance
critical for Instaseis as it is only used on the few candidate elements.
Also, this two-step approach requires only the midpoints of all elements in the mesh to be
read from file on initialization and can be implemented efficiently using the kd-tree
provided by the SciPy package (http://www.scipy.org/).
The temporal scheme
Normalized amplitude spectra of the Gaussian source
time function (slip rate) used at 2 s mesh period and a vertical
component synthetic seismogram recorded at 40∘ epicentral distance.
The vertical lines denote the resolution of the mesh and the Nyquist
frequency of the downsampling using four samples per mesh period.
The design of the temporal scheme is guided by a number of constraints on the spectrum of
the source time function: the spectrum should decay steep enough above the highest
frequency resolved by the mesh, such that the least number of samples according to the
Nyquist criterion can be used without introducing aliasing. On the other hand, it should
not decay too steeply, such that it is still possible to deconvolve and convolve with
another source time function. Additionally, the spectrum should be as flat as possible
within the usable frequency range as well as “earthquake-like” without the necessity of
deconvolution when extracting a seismogram from the database. An actual delta function
as would be required for true Green's functions cannot be represented in a discrete
approximation as it is not bandlimited.
We found a Gaussian source time function with σ=τ/3.5 to fulfill these
requirements, where τ is the shortest period resolved by the mesh.
Figure shows the amplitude spectra of this source time function as well
as a corresponding velocity seismogram at a distance of 40∘. The two spectra have
a very similar general shape and decay to 10-3 of the maximum at half the shortest
period. This motivates that sampling with four samples per period will not introduce
aliasing artifacts.
It is desirable to retrieve seismograms from the database with arbitrary time steps,
which requires interpolation or resampling. Popular time domain schemes such as
interpolation by low-order polynomials or splines do not work well close to the Nyquist
frequency. On the other hand, frequency domain resampling by zero-padding the discrete
Fourier transform of the signal can only resample to rational multiples of the original
sampling interval. Finally, the kernel from the theoretically exact reconstruction
according to the Nyquist–Shannon sampling theorem (i.e. the sinc function) has infinite
support which renders it impractical as well seeSect. 10.3 for an extended
introduction to interpolation.
Therefore, we adopt the Lanczos resampling scheme, which is popular in image processing,
and an approximation to the sinc-resampling with finite support. The Lanczos kernel is
defined as the sinc function multiplied by the Lanczos window function :
L(t)=sinc(t)sinc(t/a)ift∈[-a,a]0else,,
where a is a parameter to control the number of samples to be used in the
interpolation and the sinc function is defined as
sinc(t)=sin(πt)πt.
Interpolation is then performed by convolving the discrete signal si with
this kernel and evaluating it at the new time samples tj :
S(tj)=∑i=tj/Δt-a+1tj/Δt+asiLtj/Δt-i,
where ⌊⋅⌋ denotes the floor function and Δt the sampling
interval of the original signal. Figure
shows the Lanczos kernel for different values of a, Fig. shows a
practical example of resampling a seismogram. In Fig. we test a
number of values for a for the first 1800s of the same seismogram and we find
a=12 to be a reasonable compromise between cost (using 25 samples in the interpolation)
and accuracy (RMS error of 0.03 %).
Python API
The Instaseis Python API demonstrated in
a short interactive Python session. A Source and a
Receiver object are created and then passed to the
get_seismograms() method of an InstaseisDB object. This
will extract the Green's functions from the databases and perform all
necessary subsequent steps resulting in directly usable three-component
seismograms in form of an ObsPy Stream object. Please refer to the
Instaseis documentation for details.
Instaseis is implemented as a library for the Python programming language with some
performance critical parts written in Fortran. Furthermore it directly
integrates with the ObsPy package and
utilizes the Python bindings to NetCDF 4 .
This enables it to take advantage of
the strong scientific Python ecosystem built on top of the
SciPy Stack (http://www.scipy.org/).
Reasons for choosing Python include its growing popularity in the sciences and
it being easy to learn and use while still sufficiently powerful for complex
scenarios. Python is open-source and particularly well suited for big data applications
and the integration with web services and databases which suits the potential uses for
Instaseis.
Figure shows how to use the Python API in the most
simple case. Instaseis provides an object-oriented interface: in addition to
the shown Source and Receiver classes it furthermore provides
ForceSource and FiniteSource objects. These can also be
created by providing data in most commonly used file formats like StationXML,
QuakeML, and Standard Rupture Format. Please refer to the Instaseis
documentation for further details (www.instaseis.net).
Combining and integrating these features
enables the construction of modern and clean workflows to solve new problems. A
big advantage of this approach is that no temporary files need to be created
and the synthetic seismograms can be extracted from the databases on demand
when and where they are needed.
The Python API furthermore implements a client/server approach for remote
Instaseis database access over HTTP. This enables organizations to
host high-frequency databases and serve them to users over the internet. This
eliminates the need and upfront cost to calculate, store, and distribute
Instaseis databases for most users while still offering enough
performance for many use cases. The Python interface is data-source
independent: from a usage perspective it does not matter if the databases are
available locally or via the internet.
Instaseis is developed with a test-driven approach utilizing continuous
integration, i.e. every change in the code is automatically tested for a number of
different python version once committed to the repository. It is well documented, has a
high test coverage, and we intend to maintain it for the next couple of years providing a
solid foundation for future applications built on top of it. It is licensed under the
Lesser GNU General Public License v3.0, the source code and issue tracker are hosted on
GitHub.
Benchmarks
Accuracy
Comparison of vertical displacement seismograms
(bandpass filtered from 50 to 2 s period) for a
moment magnitude Mw = 5.0 event in 126 km depth under the
Tonga Islands, computed with Instaseis, AxiSEM and
Yspec in the anisotropic PREM model without ocean but including
attenuation. The traces are recorded at the GSN stations indicated in the
map. The zoom windows are depicted with gray rectangles in the record
section and the time scale is relative to the ray-theoretical arrival.
EM and PM denote the envelope and
phase misfit between Instaseis and
Yspec traces in the corresponding time
window.
As we already provided some rigorous validation comparing AxiSEM synthetics to a
reference solution Yspec in ,
the purpose of this section is only to confirm that using the new scheme
with reciprocal computations, spatial interpolation and temporal resampling does not
decrease accuracy. Figure shows a record section and some details
for Instaseis, AxiSEM and Yspec seismograms computed in the
anisotropic, visco-elastic PREM model for an event at 126 km depth beneath
Tonga bandpass filtered to 50–2 s period.
Storage requirements of the reciprocal databases for PREM
after zip compression for all three components and several parameter sets
(maximum source depth, components, seismogram length and epicentral distance
range). Dashed lines are fitted functions g(f)=af2.7,
where each point was weighted with the frequency f to ensure better fitting
at the higher frequencies. The exponent is slightly smaller than the expected 3 because the zip
compression is more efficient for longer time traces. At long periods,
element sizes are governed by the layer thickness rather than the wavelength,
resulting in the discrepancy from the power law at long periods.
While this figure is similar and the AxiSEM and Yspec reference data
actually the same as presented in Fig. 11, it is important to note
that they were generated in very different ways: here we computed a whole Green's
function database for all epicentral distances and down to 700 km source depth
and changing source or receivers would cost a few milliseconds only. In our previous
approach, this would have required a full new AxiSEM simulation on the order of
10 K CPU hours computational cost. Also, in contrast to , we used
default mesh parameters for 2 s period and time step close to the stability limit
of the 4th order symplectic time scheme . Still, the phase misfit
is well below 1 % in all zoom windows and the maximum of the
envelope misfit is 2 % for the PPP phase on station ALE.
The fact that these traces are virtually indistinguishable for such a demanding setup of
wave propagation over 800 wavelengths (waves at 2 s period traveling for
1600 s) verifies that the entire workflow of computing and
querying the database are correctly implemented. In particular, numerical reciprocity
(i.e. the different force and moment sources), on-the-fly calculation of the strain tensor
as well as temporal and spatial sampling have no significant adverse effect on accuracy,
i.e. any remaining errors vanish within numerical accuracy of the forward solver
AxiSEM.
I/O performance for a typical setup of AxiSEM on
SuperMUC. The simulation parameters were as follows: 2 s shortest period,
3600 s simulation length, model: ak135f, vertical component, maximum
source depth 700 km. The resulting uncompressed wave field file has a
size of 675 GB. The I/O throughput is not affected much by the number
of CPUs involved. The throughput between different runs varies, which is
probably caused by the changing I/O load on the system.
#CPUs
Runtime
I/O time
Throughput
rel. I/O time
4624
1091 s
196 s
3.44 GB s-1
18.0 %
2304
1802 s
281 s
2.40 GB s-1
15.6 %
1152
2359 s
167 s
4.04 GB s-1
7.0 %
576
4482 s
193 s
3.50 GB s-1
4.3 %
Computational cost in CPU hours (measured on
Monte Rosa: a Cray XE6 for Earth and Piz Daint, a Cray
XC30 for Mars) to generate full Instaseis databases with 1 h
long seismograms for two time schemes: 2nd order Newmark and 4th order
symplectic.
Database size
One major constraint for computing a database beside the CPU
cost is the permanent storage requirement. Here, we summarize the most
important parameters and the related scaling of the required disk space. The amount of data scales
with the third power of the highest frequency resolved by the mesh, but zip
compression is slightly more efficient for longer traces, resulting in an empirical
exponent of 2.7, see Fig. . Scaling with the length of the seismograms
is slightly stronger than linear, again because the compression is more efficient on the
zeros before the first P arrival. Scaling with depth and epicentral distance range is
linear, where the prefactor for depth scaling is halved at each doubling layer of the
mesh. The reciprocal databases for vertical (40 %) and horizontal (60 %) components are
computed and therefore usable independently.
Several examples are shown in Fig. : for Earth, a complete
reciprocal database including all three components, all epicentral distances and sources
down to 700 km and 1 h of seismogram length accurate down to 2 s
period, is about 1 TB in size. Calculating such a database once and storing it on a
central server will give any user arbitrary and immediate access to
short-period synthetic seismograms without any further cost. More specialized databases
are possible: for example to study inner core phases for shallow events in an epicentral
distance from 140 to 160∘, 200 GB storage suffices to store a
database with a frequency of 2 Hz.
Performance
To evaluate the overall performance of Instaseis, two distinct parts have to be
analyzed: first, the databases have to be generated with AxiSEM. Though very
efficient, the database generation at short periods is a high-performance computing task.
However, AxiSEM scales well on up to 10 000 cores such that global wave fields can be
computed at the highest frequencies within hours on a supercomputer. Detailed
performance and scaling tests of AxiSEM can be found in ,
here we just show the total CPU time required to compute full databases (i.e. horizontal
and vertical component) for 1 h long seismograms for two different time schemes
2nd order Newmark and 4th order symplectic and two planets
(Earth and Mars) at a variety of resolutions, see Fig. . The
general scaling of AxiSEM is proportional to T-3, where T is the shortest
period resolved by the mesh. The slight discrepancy from this power law at longer periods
is due to the thin crustal layers causing a smaller global time step in the simulation.
Simulations for Mars are approximately a factor of 5 faster than for Earth, due to the
smaller radius.
The performance of the second part, the seismogram extraction, on the other hand is rarely
limited by raw computing power. It scales linearly with increasing frequency of the
databases' Green's functions and can easily be accomplished on a standard laptop. The
limiting factor in most cases is the latency of the storage system, e.g. the time until
it starts reading from the database. To alleviate this issue we implement a buffering strategy
on the functions reading data from the files: the Green's functions from a whole element
of the numerical grid are read once and cached in memory. If data from the same element is
needed again at a later stage it will already be in memory, thus avoiding repeated disc
access. Once the cache memory limit is reached, the data with the earliest last access
time is deallocated, effectively resulting in a priority queue sorted by last access time.
This optimization is very effective for most common use cases as they oftentimes require
seismograms in a small range of epicentral distances and depths.
Instaseis comes with a number of integrated benchmarks to judge its performance
for a certain database on a given system. The benchmarks emulate the computational
requirements and data access patterns of some typical use cases like finite source
simulations and source parameter inversions. Finite sources within the benchmarks are
simulated by calculating waveforms for moment tensor sources on an imaginary fault plane
along the equator ranging from the surface to a depth of 25 km. One source is
calculated for each kilometer in depth until the bottom of the fault is reached. This is
repeated each kilometer along the fault's surface trajectory until the benchmark
terminates. A source parameter inversion is simulated by calculating seismograms from
moment tensor sources randomly scattered within 50 km distance to a fixed point.
Results for four runs are shown in Fig. . As is the case
with all benchmarks they have to be interpreted carefully, nonetheless they demonstrate
the behavior and performance characteristics of Instaseis on real machines.
Applications
In this section we depict several possible use cases of Instaseis. This list is
not exhaustive and deliberately unconnected to provide a broad overview.
Graphical user interface
To prominently highlight the features and nearly instantaneous seismogram
extraction for arbitrary source and receiver combinations of Instaseis, we
developed a cross-platform graphical user interface (GUI), shown in Fig. .
It ships with the standard Instaseis package and is written in PyQt, a Python
wrapper for the Qt toolkit.
Most evidently, this may be used for visual inspection and verification of any given
AxiSEM Green's function database. Instaseis' performance permits an
immediate visual feedback to changing parameters. This also delivers quantitative insight
for an intuitive understanding of the features and parameter sensitivities of seismograms.
Examples of this are the polarity flips of first arrivals when crossing a moment tensor's
nodal planes, the triplication of phases for shallow sources, the Hilbert transformed
shape of reflected phases and the relative amplitude of surface waves (especially
overtones) depending on the earthquake depth. Furthermore, the GUI allows the calculation
of seismograms from finite sources and the exploration of waveform differences in
comparison to best-fitting point sources.
Results of benchmarks for four typical use cases run on different hardware with a
variety of shortest periods. The graphs show the inverse time for the calculation
of the ith three-component seismogram. Each run calculated 1000
three-component seismograms, is repeated 10 times with the same random seed, the
top and bottom values are discarded, and the mean of the remaining eight values is
plotted. The CPU and I/O bound scenarios illustrate the speed with a fully
efficient and a deactivated cache, respectively. The two bottom scenarios emulate
real use cases, see the main text for details. Amongst other things they show
the consequence of a too small cache in the source inversion scenario for the
2 s run and the efficiency of the buffer in the finite source scenario for the
same database.
Screenshot of the Instaseis graphical user
interface (GUI). Aside from quickly exploring the characteristics of a
given Green's function database it is a great tool for understanding and
teaching many features of seismograms. The speed of Instaseis enables an
immediate visual response to changing source and receiver parameters. The
left-hand side shows three-component seismograms where theoretical arrival
times of various seismic phases are overlaid as vertical lines. The bar at
the top is used to change filter and resampling settings and the section on
the right side is used to modify source and receiver parameters.
IRIS web-interface
To enable usage of Instaseis seismograms to a broader community, we aim to remove
all hurdles of computing and storing large databases locally. To this
end, and in collaboration with IRIS, we plan to establish a web interface to the Instaseis
databases. In contrast to the ShakeMovie approach , this
interface will be able to handle arbitrary sources and receivers
independent from catalogue data or other parameter limitations.
The interface and databases will be described and benchmarked in
detail in a separate publication, the status of this project can be viewed on
http://ds.iris.edu/ds/products/ondemandsynthetics.
Finite-frequency tomography
Computational cost to compute many synthetic
seismograms for finite-frequency tomography with a shortest period of
2 s using different methods. For Yspec we assume that for every
source there are 1000 receivers with 3 components each. The shaded regions
for Instaseis indicate the dependence of the performance on the
actual source receiver distribution, compare
Fig. . Including the cost to generate the
database with AxiSEM, Instaseis breaks even with
Yspec for 14 000 waveforms, which is equivalent to about 5 sources
in this configuration.
Comparison between observed seismograms (black) and
Instaseis synthetics for the Sumatra earthquake on
30 September 2009 with magnitude Mw 7.5 at 82 km depth. Vertical axis labels are
epicentral distances, horizontal is time relative to the ray-theoretical
arrivals. A Gabor filter with 3.7 s central period is applied to all
traces and the synthetics are convolved with an inverted source time function
. The waveforms are aligned by computing relative
time-shifts between data and synthetic seismograms using cross-correlation
(similar to actual finite-frequency tomography).
In finite-frequency tomography e.g. information is extracted from
recorded seismograms by matched filters in multiple frequency bands .
A matched filter correlates a predicted signal with the measured signal to
detect the predicted signal in the presence of noise. In the case of seismic tomography, a
synthetic seismogram is necessary, which is usually created by convolving a Green's
function with an estimated source-time function. For body waves, short periods down to
1 s are commonly used e.g.. Typical data sets
contain thousands of earthquakes e.g., each recorded at hundreds of
stations, resulting in up to a million waveforms. For each of these waveforms, a separate
Green's function has to be calculated, which requires solving the seismic forward problem at
the desired frequencies. For wave propagation methods that solve the forward problem
separately for each event, computing these reference synthetics presents a formidable
computational challenge, which is why previous studies resorted to approximate solutions
like WKBJ or the reflectivity method . The full
method implemented in Yspec is about an order of magnitude faster
than AxiSEM in computing seismograms for a single source. However, at least in the current
implementation, the cost scales linearly with the number of events.
As Instaseis takes advantage of reciprocity of the Green's function, we can now build the
whole database for all possible sources with only two runs of AxiSEM: one for the vertical
and one for the horizontal components. Figure compares the
computational cost of computing the reference synthetics down to 2 s period assuming
that each event was recorded at 1000 three-component stations. Ignoring the cost of
computing the database, Instaseis is comparable in performance to WKBJ, but actually
returns full seismograms including all phases, see Fig. . In
contrast to WKBJ, where each crustal reverberation has to be defined separately, it
automatically calculates the full crustal response. Also, it appropriately models
diffracted phases such as Pdiff and triplicated phases from upper mantle
discontinuities. If we include the database generation, Instaseis breaks even in
computational cost with Yspec already at about 14 000 waveforms, i.e. five events with
1000 three-component stations each. At about 5⋅108 waveforms, the cost of
extracting the seismograms from the database becomes dominant over the database
generation. Assuming 2000 seismograms per event, this is equivalent to 10 000
earthquakes, i.e. in the order of available earthquake catalogues. However, generating
seismograms with different source locations or moment-tensor radiation patterns, which is
often necessary in tomography, does not require a new database
generation.
Probabilistic source inversion
Uncertainties in source parameters have been shown to have a strong influence on waveform
tomography . Probabilistic point source inversion estimates the
uncertainties of source parameters and their correlation. From these, the effect on
seismic tomography can be estimated . It requires the repeated
calculation of synthetic waveforms for varying moment tensors, depths and source time
functions to calculate the likelihood and posterior probability density of models in a
Bayesian sense. Changing source time function and moment tensor is extremely efficient
from an Instaseis perspective, and the limitation to a fixed epicenter means that
the I/O buffering can be done very efficiently, which is reflected in the Source Inversion test case in the benchmark (Fig. ).
Stations used in the source inversion validation (SIV)
exercise. Circles mark 30, 60 and 90∘ epicentral
distance. The finite source is a M 7.8 strike-slip earthquake in southern
California represented by ≈105 point sources, the beach ball
represents the centroid moment tensor, i.e. the orientation and predominant
direction of slip of the overall fault.
From a previous study , we assume that for an inversion for depth, the moment tensor
and the source time function, a 20-dimensional model space has to be sampled, which
requires to perform roughly 60 000 forward simulations. Using 100 seismic stations and
three-component seismograms, this means that roughly 1.8⋅107 waveforms have to be
calculated for one source inversion, costing on the order of 50–100 CPU hours
(Fig. ).
Finite sources
Seismograms for the SIV benchmark, Z-component aligned
on the P arrival band-pass filtered between 5 and 100 s period. The labels
denote the station code and epicentral distance.
In the frequency–wave number integration fk,,
only direct P and the depth phases were included, while Instaseis
provides full seismograms, including PP, PcP and other phases. Especially for the
stations in less than 40∘ distance, the effect is profound, since PP
arrives as a triplicated complex wave train only 70–100 s after P. Due
to the long source duration, the PP arrival overlaps with the direct P wave train
for several stations.
Finite sources can be represented in Instaseis by a cloud of point sources
without limitations on the fault geometry or source time functions. Each point source
needs to be attached with a moment tensor, a slip rate function and a time shift relative
to the origin time. These can for instance be retrieved from standard rupture format
(*.srf) or subfault format (*.param) files as provided by the USGS for most events
with M>6.5. As a show case, we computed the seismograms for the source inversion validation
(SIV) exercise #3 (http://equake-rc.info). The source is a M 7.8 strike-slip
earthquake on the San Andreas Fault represented by ≈105 point sources, where
each source has a different mechanism and slip rate function. The 52 stations are in
30 to 90∘ epicentral distance (see Fig. ), where the P wave
arrival is supposed to be well separated (compare Fig. ). Excluding the cost of
generating the database, it cost a total of 12 CPU hours to compute the 52 1-hour-long
three-component seismograms accurate down to 5 s.
Figure compares the Instaseis seismograms to P-phases computed with the
frequency–wave number integration method (fk) by , where only direct and
surface reflected phases where taken into account. While the first arriving waves agree to
certain extent with Instaseis providing systematically larger amplitudes, there are
significant differences for later time windows. These are due to additional phase arrivals
within the time window (especially triplicated PP, compare Fig. )
and crustal reverberations not modeled by the fk method. For events with long rupture
durations as in this example (200 s) this suggests that more accurate waveforms
should be beneficial for finite source inversions.
Insight/Mars
Seismic waves traveling in Mars after a meteorite
impact at its north pole computed with AxiSEM. P-waves are shown in
blue and S-waves and surface waves in red.
The upcoming NASA-lead Mars Insight mission , to be launched in
March 2016 and scheduled to land September 2016, will deploy a single station with both a
broad-band and a short-period seismometer on Mars. This will be the first
extra-terrestrial seismic mission since the Apollo lunar landings (1969–1972) and Mars
Viking missions (1975) with the goal of elucidating the interior structure of a planet
other than Earth. The instrument will record local, regional, and more distant
marsquakes (including meteorite impacts) and send data back to Earth for analysis.
Our knowledge of the seismic structure of Mars is limited because of lack of resolution
of currently available areophysical data e.g., and the limited
sensitivity of the Viking seismometers due to their installation on board of the lander.
For this reason, we will generate databases of “reference” seismic waveforms for a
comprehensive collection (order of magnitude 1000) of 1-D Martian models to be used by
modelers and analysts in preparation for the Insight mission. The models are constructed
from current areophysical data (mean mass, mean moment of inertia, tidal Love number, and
tidal dissipation) and thermodynamic modeling methods and summarize our current
understanding of the internal constitution of the planet. AxiSEM and hence
Instaseis can readily be used to propagate waves on Mars, see
Fig. , allowing us to build these databases very efficiently.
Synthetic ambient seismic noise
Synthetic ambient seismic noise cross correlations computed with
Instaseis. Left: 100 000 vertical force sources located in the
oceans and amplitude proportional to the significant wave height from the
NOAA WAVEWATCH III model on 3 January 2015 . Red crosses
indicate the receivers located in Munich and Zurich. Right: cross
correlations of 20 days of noise for (top) evenly distributed noise sources
and (bottom) the sources in the map, the traces are normalized to their
maximum amplitude.
As mentioned in Sect. , seismograms generated by force sources can be
extracted from the same reciprocal databases. This is particularly interesting for
studying ambient seismic noise. By cross-correlating noise recorded at two stations, using
long enough time series and under certain assumptions (uncorrelated, isotropically
distributed white noise sources), it is possible to retrieve the Green's function of the
medium between the two stations e.g.. However,
not all of these assumptions are met in nature, e.g. the noise sources are not evenly
distributed . Also, the noise sources themselves
are not yet well understood, especially with respect to the generation of Love waves in
the microseismic band .
Instaseis provides a basis to quickly generate noise synthetics to study such
effects, which we illustrate in Fig. . We computed noise cross
correlations, accurate down to a period of 5 s, for a total of 20 days of noise data
generated with 100 000 noise sources. The calculation only took 1 CPU hour. In the
first case, the noise sources consist of vertical forces with a random source-time
function, all have the same amplitude and are distributed evenly on the globe. The
resulting cross correlation is in good agreement with the Green's function, which is
obtained by introducing an impulse source at each of the stations in Zurich and Munich.
In the second case, sources are located in the oceans only, their amplitude proportional
to the significant wave height . For the two stations located in
Zurich and Munich, the close sources are thus solely located in the west, which leads to
strong asymmetry in the retrieved correlations . Instaseis
thus enables users to study noise on the global scale across the microseismic band, by
generating realistic waveforms at negligible cost.