We present a new method and implementation (Instaseis) to store global Green's
functions in a database which allows for near-instantaneous (on the order of milliseconds)
extraction of arbitrary seismograms. Using the axisymmetric spectral element method
(AxiSEM), the generation of these databases, based on reciprocity of the Green's
functions, is very efficient and is approximately half as expensive as a single
AxiSEM forward run. Full databases can thus be computed at half the cost of
computing seismograms for a single source in the previous scheme, which
allows databases to be computed at the highest frequencies globally observed. By storing
the basis coefficients of the numerical scheme (Lagrange polynomials), the Green's
functions are 4th order accurate in space and the spatial discretization respects
discontinuities in the velocity model exactly. High-order temporal interpolation using
Lanczos resampling allows seismograms to be retrieved at any sampling rate. AxiSEM is
easily adaptable to arbitrary spherically symmetric models of Earth as well as other
planets. In this paper, we present the basic rationale and details of the method as well
as benchmarks and illustrate a variety of applications. The code is open source and
available with extensive documentation at

Global stack of 1 h of seismograms accurate to a
shortest period of 2 s for an earthquake in 27 km depth computed
with Instaseis. The displacement is color-coded analogous to the
IRIS global stack

Despite the exponential growth of computational power and substantial progress of 3-D
numerical methods for seismic wave propagation in the last 15 years

As detailed by

As computing full global waveforms especially at higher frequencies requires substantial
computational resources, several initiatives serve to deliver waveforms by means of
databases without having to run a full numerical solver. The ShakeMovie project
(

In this paper we present a method that uses AxiSEM to generate global Green's function databases and provides a Python interface for convenient extraction of seismograms. The advantages over ShakeMovie synthetics are the higher attainable frequencies and arbitrary source and receiver combinations, independent of catalogues and real stations. Compared to Pyrocko with GEMINI synthetics, AxiSEM generates the databases more efficiently, so that they can be computed routinely for a large number of different background models or specialized applications (e.g. limited depth/distance ranges). Also, by using the Lagrange polynomials of the SEM (spectral element method) mesh as basis functions, it achieves higher spatial accuracy.

This paper is structured as follows. In Sect.

The 3-D wave field is decomposed analytically into
monopole, dipole and quadrupole radiation patterns (left) and the remaining 2-D
problem is solved on a D-shaped domain (right) using the spectral element
method. While the forward databases require a total of four 2-D computations,
it is only two for the backward databases using reciprocity of the
Green's function: one for the vertical and one for the horizontal
components

AxiSEM was designed from the beginning with the application of computing
global wave fields rather than single seismograms in mind

Instaseis can handle forward wave fields, where the waves are propagated from a moment-tensor point source at fixed depth and recorded throughout the medium (i.e. receivers exist throughout the medium), as well as backward or reciprocal wave fields, where the wave fields are propagated from a single-force point source at fixed depth and recorded throughout the medium (i.e. sources exist throughout the medium).

Potential applications of forward databases are the generation of 3-D
wave-propagation movies

In contrast, reciprocal databases utilize the reciprocity of the Green's
functions, and are useful in all cases where the receivers are at fixed depth, thus for
instance mimicking earthquake catalogues recorded at stations along the surface. The
source can be located anywhere in the region where the Green's functions are recorded in
the simulation, thus allowing for unlimited choices in the source–receiver geometry. To
generate a reciprocal database, a total of two runs with AxiSEM are
needed, one for the vertical component and one for both horizontal components of the
seismogram

Lagrangian basis polynomials

Lagrange interpolation points inside an element (gray)
and its neighbors. Coordinates

For the spatial discretization we choose to keep the same basis as used in
AxiSEM. The displacement

The wave field is represented by polynomials, typically of degree 4; interpolation is hence of 4th order accuracy.

The basis is local and only a few coefficients are needed to represent the wave field inside an element (typically 25), in contrast to global basis functions such as spherical harmonics.

Discontinuities in the model that cause discontinuities in the strain Green's functions are respected by the mesh.

The strain tensor (representing the moment tensor in the reciprocal case) can be computed on the fly from the stored displacements at high accuracy. This reduces the storage by a factor of 2 as the displacement has 3 degrees of freedom, compared to 6 for the strain.

Since the displacement is continuous also at model
discontinuities and element boundaries, it needs to be stored only once at
all Gauss–Lobatto–Legendre (GLL) points that belong to multiple elements,
reducing the storage by another factor of

Storing the displacement also allows force sources to be used without any extra computation or storage requirements.
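The local polynomial representation listed above can be illustrated with a minimal sketch. It assumes a polynomial degree of 4 and Gauss–Lobatto–Legendre (GLL) points in both reference directions, which gives the 25 coefficients per element mentioned in the list; any special treatment of elements touching the symmetry axis is ignored. The function names are hypothetical and are not part of Instaseis or AxiSEM.

```python
import math

# GLL points for polynomial degree 4 on the reference interval [-1, 1]:
# the end points plus the roots of the derivative of the Legendre polynomial P_4.
GLL_POINTS = [-1.0, -math.sqrt(3.0 / 7.0), 0.0, math.sqrt(3.0 / 7.0), 1.0]


def lagrange_basis(nodes, i, x):
    """Value at x of the i-th Lagrange polynomial defined on the given nodes."""
    value = 1.0
    for j, xj in enumerate(nodes):
        if j != i:
            value *= (x - xj) / (nodes[i] - xj)
    return value


def interpolate_in_element(coeffs, xi, eta, nodes=GLL_POINTS):
    """Evaluate a field stored as n x n nodal values at reference point (xi, eta)."""
    n = len(nodes)
    basis_xi = [lagrange_basis(nodes, i, xi) for i in range(n)]
    basis_eta = [lagrange_basis(nodes, j, eta) for j in range(n)]
    # Tensor-product expansion: sum over all 25 stored coefficients.
    return sum(coeffs[i][j] * basis_xi[i] * basis_eta[j]
               for i in range(n) for j in range(n))


def sample_on_nodes(func, nodes=GLL_POINTS):
    """Nodal values of func(xi, eta), i.e. the coefficients that would be stored."""
    return [[func(x, y) for y in nodes] for x in nodes]
```

Because the basis is a tensor product of degree-4 polynomials, any field that is itself a polynomial of degree 4 or less in each reference coordinate is reproduced to rounding error.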

Snapshot of one component of the Green's
tensor (

One component of the strain Green's tensor
(

Figure

Voronoi approximation (colored) of the AxiSEM mesh (black lines) using only the midpoints of the elements (red circles), zoomed onto a doubling layer for a 50 s mesh. For most elements, the Voronoi cell coincides almost exactly with the AxiSEM element. Note that most of the AxiSEM elements have edges that are arcs of concentric circles, while the edges of the Voronoi cells are all straight lines. In the worst case, six AxiSEM elements have to be tested to determine which one contains a given point.

One performance-critical step in the spatial scheme is to find the reference coordinates

We follow a two-step approach to finding the reference coordinates. First,
we find the six closest element midpoints to
limit the search to a small number of candidate elements in which the point could be.
The number six is specific to the AxiSEM mesh, where each corner point can belong
to a maximum of six elements in the doubling layers, see Fig.
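The first step of this search can be sketched as follows. The sketch makes several simplifying assumptions: the nearest midpoints are found by brute force rather than with a spatial index, the elements are convex quadrilaterals with straight edges (real AxiSEM elements mostly have curved edges), and the corners are listed counterclockwise. All function names are hypothetical.

```python
import heapq


def closest_midpoints(midpoints, point, k=6):
    """Indices of the k element midpoints closest to point (brute-force search)."""
    px, py = point
    return heapq.nsmallest(
        k, range(len(midpoints)),
        key=lambda i: (midpoints[i][0] - px) ** 2 + (midpoints[i][1] - py) ** 2)


def inside_convex_quad(corners, point):
    """True if point lies in the convex quadrilateral with counterclockwise corners."""
    px, py = point
    for (x0, y0), (x1, y1) in zip(corners, corners[1:] + corners[:1]):
        # A negative cross product means the point is to the right of this edge,
        # i.e. outside a counterclockwise polygon.
        if (x1 - x0) * (py - y0) - (y1 - y0) * (px - x0) < 0.0:
            return False
    return True


def find_element(elements, point, k=6):
    """Index of the element containing point, testing only the k nearest candidates."""
    # In practice the midpoints would be computed once and reused for every query.
    midpoints = [(sum(x for x, _ in c) / 4.0, sum(y for _, y in c) / 4.0)
                 for c in elements]
    for idx in closest_midpoints(midpoints, point, k):
        if inside_convex_quad(elements[idx], point):
            return idx
    return None
```

Restricting the inside test to six candidates keeps the cost per query independent of the mesh size once the nearest-midpoint lookup is done with a suitable spatial data structure.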

Lanczos kernels used for resampling. For large
values of the parameter

Resampling using a Lanczos kernel with

RMS error of the resampling using the first 1800 s
of the seismogram from Fig.

In a second step, the reference coordinates

Normalized amplitude spectra of the Gaussian source
time function (slip rate) used at 2 s mesh period and a vertical
component synthetic seismogram recorded at 40

The design of the temporal scheme is guided by a number of constraints on the spectrum of the source time function: the spectrum should decay steeply enough above the highest frequency resolved by the mesh that the smallest number of samples permitted by the Nyquist criterion can be used without introducing aliasing. On the other hand, it should not decay too steeply, so that it is still possible to deconvolve it and convolve with another source time function. Additionally, the spectrum should be as flat as possible within the usable frequency range as well as “earthquake-like”, so that no deconvolution is necessary when extracting a seismogram from the database. An actual delta function, as would be required for true Green's functions, cannot be represented in a discrete approximation as it is not bandlimited.

We found a Gaussian source time function with

It is desirable to retrieve seismograms from the database with arbitrary time steps,
which requires interpolation or resampling. Popular time domain schemes such as
interpolation by low-order polynomials or splines do not work well close to the Nyquist
frequency. On the other hand, frequency domain resampling by zero-padding the discrete
Fourier transform of the signal can only resample to rational multiples of the original
sampling interval. Finally, the kernel from the theoretically exact reconstruction
according to the Nyquist–Shannon sampling theorem (i.e. the sinc function) has infinite
support which renders it impractical as well

Therefore, we adopt the Lanczos resampling scheme, which is popular in image processing,
and is an approximation to sinc resampling with finite support. The Lanczos kernel is
defined as the sinc function multiplied by the Lanczos window function

Interpolation is then performed by convolving the discrete signal
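The resampling step can be sketched in a few lines of plain Python. The sketch assumes the commonly used form of the Lanczos kernel, sinc(x) sinc(x/a) for |x| < a and zero elsewhere, with the kernel half-width `a` given in samples; samples outside the signal are treated as zero. The function names and the default value of `a` are illustrative choices, not taken from the Instaseis code.

```python
import math


def lanczos_kernel(x, a):
    """sinc(x) * sinc(x / a) inside the window |x| < a, zero outside."""
    if x == 0.0:
        return 1.0
    if abs(x) >= a:
        return 0.0
    px = math.pi * x
    return a * math.sin(px) * math.sin(px / a) / (px * px)


def lanczos_resample(samples, dt_old, dt_new, a=12):
    """Resample a uniformly sampled signal from spacing dt_old to dt_new."""
    # Number of new samples that fit into the original time span.
    n_new = int(math.floor((len(samples) - 1) * dt_old / dt_new + 1e-9)) + 1
    out = []
    for j in range(n_new):
        x = j * dt_new / dt_old  # new sample time in units of old samples
        i0 = int(math.floor(x))
        value = 0.0
        # Only the 2a old samples around x fall inside the kernel support.
        for i in range(i0 - a + 1, i0 + a + 1):
            if 0 <= i < len(samples):
                value += samples[i] * lanczos_kernel(x - i, a)
        out.append(value)
    return out
```

Since the kernel equals one at zero and vanishes at all other integers, resampling to the original sampling interval returns the input unchanged to rounding error; larger values of `a` trade computation for accuracy close to the Nyquist frequency.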

The Instaseis Python API demonstrated in
a short interactive Python session. A

Instaseis is implemented as a library for the Python programming language with some
performance-critical parts written in Fortran. Furthermore, it directly
integrates with the ObsPy package

Figure

The Python API furthermore implements a client/server approach for remote Instaseis database access over HTTP. This enables organizations to host high-frequency databases and serve them to users over the internet. For most users, this eliminates the need for, and the upfront cost of, calculating, storing and distributing Instaseis databases, while still offering enough performance for many use cases. The Python interface is data-source independent: from a usage perspective it does not matter whether the databases are available locally or via the internet.

Instaseis is developed with a test-driven approach utilizing continuous integration, i.e. every change in the code is automatically tested for a number of different Python versions once committed to the repository. It is well documented, has a high test coverage, and we intend to maintain it for the next couple of years, providing a solid foundation for future applications built on top of it. It is licensed under the GNU Lesser General Public License v3.0; the source code and issue tracker are hosted on GitHub.

Comparison of vertical displacement seismograms
(bandpass filtered from 50 to 2 s period) for a
moment magnitude

As we already provided some rigorous validation comparing AxiSEM synthetics to a
reference solution

Storage requirements of the reciprocal databases for PREM
after zip compression for all three components and several parameter sets
(maximum source depth, components, seismogram length and epicentral distance
range). Dashed lines are fitted functions

While this figure is similar, and the AxiSEM and Yspec reference data are
actually the same as presented in

The fact that these traces are virtually indistinguishable for such a demanding setup of wave propagation over 800 wavelengths (waves at 2 s period traveling for 1600 s) verifies that the entire workflow of computing and querying the database is correctly implemented. In particular, numerical reciprocity (i.e. the different force and moment sources), on-the-fly calculation of the strain tensor as well as temporal and spatial sampling have no significant adverse effect on accuracy, i.e. any remaining errors vanish within the numerical accuracy of the forward solver AxiSEM.

I/O performance for a typical setup of AxiSEM on SuperMUC. The simulation parameters were as follows: 2 s shortest period, 3600 s simulation length, model: ak135f, vertical component, maximum source depth 700 km. The resulting uncompressed wave field file has a size of 675 GB. The I/O throughput is not affected much by the number of CPUs involved. The throughput between different runs varies, which is probably caused by the changing I/O load on the system.

Computational cost in CPU hours (measured on Monte Rosa: a Cray XE6 for Earth and Piz Daint, a Cray XC30 for Mars) to generate full Instaseis databases with 1 h long seismograms for two time schemes: 2nd order Newmark and 4th order symplectic.

One major constraint for computing a database besides the CPU
cost is the permanent storage requirement. Here, we summarize the most
important parameters and the related scaling of the required disk space. The amount of data scales
with the third power of the highest frequency resolved by the mesh, but zip
compression is slightly more efficient for longer traces, resulting in an empirical
exponent of 2.7, see Fig.
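As a small worked example of this scaling, the following sketch estimates how much larger a database becomes when the shortest period is reduced. Only the exponent of 2.7 is taken from the text; the function name is hypothetical, and all other parameters (depth range, seismogram length, distance range, model) are assumed to stay fixed.

```python
def storage_ratio(period_ref, period_new, exponent=2.7):
    """Factor by which the compressed database grows when the shortest
    period changes from period_ref to period_new (same units)."""
    return (period_ref / period_new) ** exponent


# Halving the shortest period from 10 s to 5 s increases the
# storage by a factor of 2 ** 2.7, i.e. roughly 6.5.
factor = storage_ratio(10.0, 5.0)
```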

Several examples are shown in Fig.

To evaluate the overall performance of Instaseis, two distinct parts have to be
analyzed: first, the databases have to be generated with AxiSEM. Though very
efficient, the database generation at short periods is a high-performance computing task.
However, AxiSEM scales well on up to 10 000 cores such that global wave fields can be
computed at the highest frequencies within hours on a supercomputer. Detailed
performance and scaling tests of AxiSEM can be found in

The performance of the second part, the seismogram extraction, on the other hand is rarely limited by raw computing power. It scales linearly with increasing frequency of the databases' Green's functions and can easily be accomplished on a standard laptop. The limiting factor in most cases is the latency of the storage system, i.e. the time until it starts reading from the database. To alleviate this issue we implement a buffering strategy in the functions reading data from the files: the Green's functions from a whole element of the numerical grid are read once and cached in memory. If data from the same element are needed again at a later stage, they will already be in memory, thus avoiding repeated disk access. Once the cache memory limit is reached, the data with the earliest last access time are deallocated, effectively resulting in a priority queue sorted by last access time. This optimization is very effective for most common use cases, as they often require seismograms in a small range of epicentral distances and depths.
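The buffering strategy described above amounts to a least-recently-used cache keyed by element. A minimal sketch, assuming a user-supplied `read_element` callable that returns the raw data of one element as a bytes-like object, could look as follows; the class and attribute names are hypothetical and do not reproduce the Instaseis implementation.

```python
from collections import OrderedDict


class ElementBuffer:
    """Least-recently-used cache for per-element Green's function data."""

    def __init__(self, read_element, max_size_in_bytes):
        self._read = read_element      # function: element id -> bytes-like data
        self._max = max_size_in_bytes  # memory limit of the cache
        self._data = OrderedDict()     # ordered from oldest to newest access
        self._size = 0
        self.disk_reads = 0            # counter, for illustration only

    def get(self, element_id):
        if element_id in self._data:
            # Cache hit: mark the element as most recently used.
            self._data.move_to_end(element_id)
            return self._data[element_id]
        # Cache miss: read the whole element once from storage.
        block = self._read(element_id)
        self.disk_reads += 1
        self._data[element_id] = block
        self._size += len(block)
        # Drop the entries with the earliest last access time until the
        # limit is respected, always keeping the element just read.
        while self._size > self._max and len(self._data) > 1:
            _, old = self._data.popitem(last=False)
            self._size -= len(old)
        return block
```

Queries that cluster in a small range of distances and depths repeatedly touch the same few elements, so most calls to `get` are served from memory.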

Instaseis comes with a number of integrated benchmarks to judge its performance
for a certain database on a given system. The benchmarks emulate the computational
requirements and data access patterns of some typical use cases like finite source
simulations and source parameter inversions. Finite sources within the benchmarks are
simulated by calculating waveforms for moment tensor sources on an imaginary fault plane
along the equator ranging from the surface to a depth of 25 km. One source is
calculated for each kilometer in depth until the bottom of the fault is reached. This is
repeated each kilometer along the fault's surface trajectory until the benchmark
terminates. A source parameter inversion is simulated by calculating seismograms from
moment tensor sources randomly scattered within 50 km distance to a fixed point.
Results for four runs are shown in Fig.
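The source geometry of the finite-source benchmark can be sketched as a simple generator. This is an illustration under stated assumptions, not the benchmark code shipped with Instaseis: the depth samples are taken to include both the surface and the bottom of the fault, the Earth radius is set to 6371 km, and the fault length is bounded by a hypothetical `fault_length_km` argument, whereas the actual benchmark runs until it terminates.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def finite_source_positions(fault_length_km, max_depth_km=25, start_longitude=0.0):
    """Yield (latitude, longitude, depth_in_m) for point sources on a vertical
    fault plane along the equator, one source per kilometer in each direction."""
    # Angular distance along the equator that corresponds to 1 km at the surface.
    deg_per_km = 360.0 / (2.0 * math.pi * EARTH_RADIUS_KM)
    for along_km in range(fault_length_km + 1):
        longitude = start_longitude + along_km * deg_per_km
        # Walk down the fault from the surface to its bottom.
        for depth_km in range(max_depth_km + 1):
            yield 0.0, longitude, depth_km * 1000.0
```

Consecutive sources are close to each other in both depth and distance, which is the access pattern for which element-wise buffering of the database pays off.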

In this section we depict several possible use cases of Instaseis. This list is not exhaustive and deliberately unconnected to provide a broad overview.

To prominently highlight the features and nearly instantaneous seismogram
extraction for arbitrary source and receiver combinations of Instaseis, we
developed a cross-platform graphical user interface (GUI), shown in Fig.

Most evidently, this may be used for visual inspection and verification of any given AxiSEM Green's function database. Instaseis' performance permits immediate visual feedback to changing parameters. This also gives an intuitive understanding of the features and parameter sensitivities of seismograms. Examples of this are the polarity flips of first arrivals when crossing a moment tensor's nodal planes, the triplication of phases for shallow sources, the Hilbert-transformed shape of reflected phases and the relative amplitude of surface waves (especially overtones) depending on the earthquake depth. Furthermore, the GUI allows the calculation of seismograms from finite sources and the exploration of waveform differences in comparison to best-fitting point sources.

Results of benchmarks for four typical use cases run on different hardware with a
variety of shortest periods. The graphs show the inverse time for the calculation
of the

Screenshot of the Instaseis graphical user interface (GUI). Aside from quickly exploring the characteristics of a given Green's function database it is a great tool for understanding and teaching many features of seismograms. The speed of Instaseis enables an immediate visual response to changing source and receiver parameters. The left-hand side shows three-component seismograms where theoretical arrival times of various seismic phases are overlaid as vertical lines. The bar at the top is used to change filter and resampling settings and the section on the right side is used to modify source and receiver parameters.

To make Instaseis seismograms accessible to a broader community, we aim to remove
all hurdles of computing and storing large databases locally. To this
end, and in collaboration with IRIS, we plan to establish a web interface to the Instaseis
databases. In contrast to the ShakeMovie approach

Computational cost to compute many synthetic
seismograms for finite-frequency tomography with a shortest period of
2 s using different methods. For Yspec we assume that for every
source there are 1000 receivers with 3 components each. The shaded regions
for Instaseis indicate the dependence of the performance on the
actual source receiver distribution, compare
Fig.

Comparison between observed seismograms (black) and
Instaseis synthetics for the Sumatra earthquake on
30 September 2009 with magnitude

In finite-frequency tomography

As Instaseis takes advantage of reciprocity of the Green's function, we can now build the
whole database for all possible sources with only two runs of AxiSEM: one for the vertical
and one for the horizontal components. Figure

Uncertainties in source parameters have been shown to have a strong influence on waveform
tomography

Stations used in the source inversion validation (SIV)
exercise. Circles mark 30, 60 and 90

From a previous study

Seismograms for the SIV benchmark: Z-component, aligned
on the P arrival and band-pass filtered between 5 and 100 s period. The labels
denote the station code and epicentral distance.
In the frequency–wave number integration

Finite sources can be represented in Instaseis by a cloud of point sources
without limitations on the fault geometry or source time functions. Each point source
needs to be assigned a moment tensor, a slip rate function and a time shift relative
to the origin time. These can for instance be retrieved from standard rupture format
(*.srf) or subfault format (*.param) files as provided by the USGS for most events
with
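The superposition of point sources can be sketched as follows. The sketch assumes that, for each point source, a seismogram for an impulsive source with the appropriate moment tensor is already available as a list of samples (called `green` here), that all traces share the same sampling interval and length, and that time shifts are non-negative and rounded to the nearest sample. The function names are hypothetical, and the effect of the source time function stored in the database is ignored.

```python
def convolve(signal, kernel, dt):
    """Discrete approximation of the convolution integral, truncated to len(signal)."""
    out = [0.0] * len(signal)
    for i, s in enumerate(signal):
        if s == 0.0:
            continue
        for k, w in enumerate(kernel):
            if i + k < len(out):
                out[i + k] += s * w * dt
    return out


def delay(signal, n_samples):
    """Delay a trace by n_samples >= 0 samples, padding the start with zeros."""
    n = min(n_samples, len(signal))
    return [0.0] * n + signal[:len(signal) - n]


def finite_source_seismogram(point_sources, dt):
    """Sum the contributions of all point sources.

    point_sources is a list of (green, slip_rate, time_shift) tuples, with
    time_shift given in seconds relative to the origin time."""
    total = None
    for green, slip_rate, time_shift in point_sources:
        trace = delay(convolve(green, slip_rate, dt), int(round(time_shift / dt)))
        total = trace if total is None else [a + b for a, b in zip(total, trace)]
    return total
```

Because wave propagation is linear, the cost of a finite-source seismogram grows only linearly with the number of point sources.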

Figure

Seismic waves traveling in Mars after a meteorite impact at its north pole computed with AxiSEM. P-waves are shown in blue and S-waves and surface waves in red.

The upcoming NASA-led Mars InSight mission

Our knowledge of the seismic structure of Mars is limited because of the lack of resolution
of currently available areophysical data

Synthetic ambient seismic noise cross correlations computed with
Instaseis. Left: 100 000 vertical force sources located in the
oceans and amplitude proportional to the significant wave height from the
NOAA WAVEWATCH III model on 3 January 2015

As mentioned in Sect.

Instaseis provides a basis to quickly generate noise synthetics to study such
effects, which we illustrate in Fig.

In this paper we presented a readily available methodology and code to extract seismograms
for spherical earth models from a Green's function database. Thanks to the efficient
generation of databases and the very fast extraction (on the order of milliseconds per
seismogram) of highly accurate seismograms (indistinguishable from those of conventional
forward solvers), Instaseis can replace previously employed approximations such as WKBJ,
reflectivity or frequency–wave number integration methods that were used for computational
reasons in many applications of global seismology. Instaseis is open source and available with extensive
documentation at

Future developments include Cartesian local domains with layered models, which are not yet supported by AxiSEM. As a large fraction of earthquakes are located below oceans and receivers on continents, it may be beneficial for body-wave studies to take advantage of the axisymmetric capability of AxiSEM and place the receiver on a circular “island” of continental crust within a global oceanic crustal model.

M. van Driel and L. Krischer implemented Instaseis, M. van Driel, S. C. Stähler and T. Nissen-Meyer continuously develop AxiSEM and added the database output, and K. Hosseini prepared the finite-frequency example. M. van Driel prepared the manuscript with contributions from all co-authors.

We thank the reviewers C. Tape and D. Al-Attar as well as the editor T. Taira for
their constructive comments that helped to substantially improve the manuscript.
We thank Alex Hutko and Chad Trabant (IRIS) for valuable discussions on the database
selection and implementation as well as user demand. Amir Khan provided the Mars model and
Olaf Zielke the reference data used in Fig.