1 Introduction
The numerical computation of convolution products is a crucial issue arising in many domains, such as filtering, the evaluation of boundary integral operators, optimal control, etc. In a continuous framework, a convolution product is of the form
(1) $f(x) = \int_\Omega G(x,y)\, q(y)\, \mathrm{d}y$,
where $\Omega$ is some domain of integration in $\mathbb{R}^3$ and $q$ some function. The bivariate function $G$ is some convolution kernel of the form
(2) $G(x,y) = g\big(r(x,y)\big)$,
where $r$ is some distance, which can be the classical Euclidean distance $r(x,y) = |x - y|$. Of course, eq. (1) does not admit an analytical expression in the general case and the integral is computed numerically using, for instance, a quadrature rule. Assuming we want to evaluate $f$ on a finite set of nodes $(x_i)_{1 \le i \le N_x}$, we have
(3) $f(x_i) \approx \sum_{j=1}^{N_y} \omega_j\, g\big(r(x_i, y_j)\big)\, q(y_j)$,
where $(\omega_j)_j$ and $(y_j)_j$ are respectively the weights and nodes of such a quadrature. The discrete convolution
may be recast as a simple matrix-vector product
(4) $f = G\, q$,
where $f_i = f(x_i)$, $q_j = \omega_j\, q(y_j)$, and $G_{ij} = g\big(r(x_i, y_j)\big)$. Obviously, the matrix $G$ is dense. Therefore, the storage and computational cost grow quadratically with the number of nodes. The computation of (4) is thus constrained to small problems ($N$ of at most a few thousand) on personal computers and small servers, and to moderately larger sizes on industrial servers.
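As a minimal illustration of the quadratic cost of (4), the following NumPy sketch (node positions, weights and the kernel choice $g(r) = 1/r$ are our own toy setup) assembles the dense kernel matrix and applies it to a vector:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000
x = rng.random((N, 3))            # target nodes
y = rng.random((N, 3)) + 2.0      # source nodes, shifted so that r > 0 everywhere
wq = rng.random(N)                # quadrature weights times the source values

# dense evaluation of (4): f_i = sum_j g(r(x_i, y_j)) * wq_j with g(r) = 1/r
r = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)  # N x N distance matrix
G = 1.0 / r                       # dense kernel matrix: O(N^2) storage
f = G @ wq                        # O(N^2) matrix-vector product
```

Both the storage of `G` and the product scale as $O(N^2)$, which is exactly what the acceleration methods discussed next avoid.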
The current approach is to perform the computation approximately, up to a given tolerance (accuracy). In the past thirty years, multiple so-called acceleration methods have been proposed. The entries of $G$ can be seen as the description of an interaction between a source set of nodes and a target set of nodes. Thus, each block of $G$ describes the interaction between a source subset and a target subset. These interactions may be compressible, i.e. they admit a low-rank representation. This is the case, for example, when the two subsets are far enough from each other according to an admissibility criterion. The methods mentioned in the following propose different alternatives on the way the interactions are characterized and computed. The standard way is probably the
Fast Multipole Method (FMM) developed by L. Greengard and V. Rokhlin (see [greengardFMM]), initially introduced for the computation of the gravitational potential of a cloud of particles. Later versions feature the support of oscillatory kernels like the Helmholtz Green kernel. One major drawback is that the implementations are mostly kernel-specific, despite recent advances in the domain [fong2009]. We refer to [chengFMM] for more details. In 1999, a new approach named hierarchical matrices (H-matrices) was introduced by S. Börm, L. Grasedyck and W. Hackbusch. This method is based on the representation of the matrix by a quadtree whose leaves are low-rank or full-rank submatrices. A strong advantage in favor of hierarchical matrices is that a complete algebra has been created: addition, multiplication, LU decomposition, etc. Unfortunately, H-matrices become less effective for strongly oscillating kernels because the rank of the compressible blocks increases with the frequency of the oscillations. For more details and a complete mathematical analysis, we refer to [hackbusch2015, borm2015]. One of the most recent compression methods, to our knowledge, is the Sparse Cardinal Sine Decomposition (SCSD) proposed by F. Alouges and M. Aussal in 2015 [alouges2015]. It is based on a representation of the Green kernel in the Fourier domain using the integral representation of the cardinal sine function. One major advantage is that the matrix-vector product is performed without any partitioning of the space. However, there is no corresponding algebra. All the aforementioned methods aim at a quasi-linear storage and computational complexity. We introduce here the Fast Free Memory method (FFM), which blends together multiple features of the existing methods in order to have a minimalist storage requirement.
It is designed for the computation of massive matrix-vector products, up to billions of nodes, where methods featuring quasi-linear storage complexity fail because of the non-linear part of their storage. The FFM relies heavily on the FMM algorithm, in particular on the octree-based space partitioning, and introduces compression techniques featured in the H-matrices and the SCSD. In particular, the FFM is a kernel-independent method for the computation of the matrix-vector product (3). A powerful feature is that the required storage is minimalist, with linear complexity, while the computational complexity is quasi-linear. This enables the computation of matrix-vector products with hundreds of millions of nodes in the source and target sets on laboratory-sized servers. This paper is divided into three main parts. In the first part, we develop the FFM algorithm for standard kernels (non-oscillating kernels and the Helmholtz Green kernel). In the second part, we prove the storage and computational complexities. In the last part, we illustrate the proven complexities on academic and industrially-sized problems with millions of entries, up to one billion.
2 The FFM algorithm
The FFM is a divide-and-conquer method implemented recursively. It is based on two partitioning trees (one for the source set and one for the target set) strongly related to the octree used in the classical FMM. This inheritance and the particularities of the FFM are explained in the first part of this section, where we also introduce the notations used throughout this paper. In the second part, we develop the different steps of the algorithm for the case of a general convolution and we optimize it for the case of oscillatory kernels.
2.1 The FFM and the FMM
The basic idea in the FMM is that the interaction between subsets of nodes sufficiently far from one another admits a low-rank representation. The space is therefore partitioned using two octrees (see Figure 1) obtained by successive refinements of the bounding boxes of the initial source and target sets of nodes. At each level of refinement, boxes far from one another (following a given criterion) correspond to compressible interactions. The other boxes are further subdivided and yield, in turn, low-rank and non-compressible interactions. When the matrices corresponding to the non-compressible interactions are small enough, a full computation is performed. For more details, we refer to the bibliography on the topic, for example [greengardFMM, chengFMM, darve1999, darve2000]. In the FMM, the low-rank approximation is performed using a multipole expansion of the convolution kernel, see [greengardHuang]. As will be illustrated in section 2.2, the FFM is more versatile and such expansions will be used only for the oscillating kernels.
The differences between the two methods find their origin in the way the partitioning octrees are built. Classically, the root bounding box for each set of nodes is tight, in the sense that its edge dimensions fit tightly the extent of the set of nodes in each direction. Consequently, the initial bounding boxes for the source and target sets have, in general, different sizes and shapes. In the FFM, both initial bounding boxes are cubic with the same edge length. In particular, it means that for a given level of refinement, all the boxes in both octrees are the same up to a translation. The direct consequence of this choice is that there is no need to interpolate data between the boxes of the different trees, because the discretizations in the quadrature spaces are the same. Another difference is that the FFM is a descent-only algorithm. This last particularity enables the linear storage complexity, see section 3.
2.2 The FFM algorithm
In this subsection, we describe the kernel-independent compression method for the matrix-vector product. We first describe the initialization. Then, we explain how the kernel-independent matrix-vector product is computed using a well-known Lagrange-interpolation method. We show how the user can choose the compression method when dealing with the particular case of the Helmholtz Green kernel. Finally, a stopping criterion is proposed. In the following, we reuse the previous notations and we introduce $\ell$ as the depth in the octree. The case $\ell = 0$ corresponds to the root and $\ell_{\max}$ is the maximum allowed depth.
2.2.1 Initialization
The initialization ($\ell = 0$) is performed easily by computing the bounding boxes for the source and target sets. Let $d_X$ and $d_Y$ be the maximum edge dimensions of the tight bounding boxes of the target set, resp. the source set; then the initial dimension for both bounding boxes is
(5) $d_0 = \max(d_X, d_Y)$.
The initial bounding box for each set is simply a cube enclosing the target set, resp. the source set, with edge length $d_0$. At depth $\ell$ in the octree, the dimension of the bounding boxes is simply $d_\ell = d_0 / 2^\ell$.
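A minimal sketch of this initialization (the helper names are ours; the point clouds are arbitrary arrays of shape (n, 3)):

```python
import numpy as np

def root_edge(X, Y):
    # common cubic bounding-box edge (5): the largest extent of either node cloud
    extent = lambda P: (P.max(axis=0) - P.min(axis=0)).max()
    return max(extent(X), extent(Y))

def edge_at_depth(d0, depth):
    # each refinement halves the boxes in every direction
    return d0 / 2 ** depth

X = np.array([[0.0, 0.0, 0.0], [2.0, 1.0, 0.5]])   # target nodes
Y = np.array([[5.0, 5.0, 5.0], [5.5, 5.0, 5.0]])   # source nodes
d0 = root_edge(X, Y)
```

Both clouds thus get the same root edge, so boxes at any depth coincide up to a translation.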
2.2.2 The kernelindependent FFM
We suppose that the current refinement level is $\ell$. We consider only one interaction, between a box of the source tree and a box of the target tree. This interaction is low-rank if the distance between their centers is greater than two times the edge length of a box. Let $N_t$ be the number of nodes in the target box and $N_s$ the number of nodes in the source box; the compressed product is performed using a classical bivariate Lagrange interpolation, see for example [cambier2017] or the chapters related to the H-matrices in [hackbusch2015]. The principle of the Lagrange interpolation is to approximate the convolution kernel as
(6) $G(x,y) \approx \sum_{u=1}^{r_t} \sum_{v=1}^{r_s} L_u^t(x)\, G(\xi_u, \zeta_v)\, L_v^s(y)$,
where

$(\xi_u)_u$ and $(\zeta_v)_v$ are the target and source control nodes for the Lagrange polynomials. Following [mastroianni2001, cambier2017], the best interpolation nodes are the Chebyshev nodes.

$L_u^t$, resp. $L_v^s$, are the Lagrange polynomials defined as the tensorization of the one-dimensional Lagrange polynomials in each direction of the space and localized at the control nodes.

$r_t$ and $r_s$ are the ranks of the interpolation for the target and source variables. If $r_i^t$, resp. $r_i^s$, is the rank of the interpolation in the direction $i \in \{1,2,3\}$, then $r_t = r_1^t r_2^t r_3^t$ and $r_s = r_1^s r_2^s r_3^s$. For a prescribed accuracy on the interpolation, these ranks depend solely on the size of the interpolation domain (in other words, the size of the bounding boxes) and on the regularity of the kernel being interpolated. Consequently, we have
(7) $r_t = r_s$
in the FFM framework.
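The separable approximation (6) is easy to check on a one-dimensional model kernel. The sketch below (our own toy setup: kernel $g(r) = 1/r$, two well-separated intervals, $p = 8$ Chebyshev nodes per box) builds the factored block $L_t\, \tilde K\, L_s^T$:

```python
import numpy as np

def cheb(a, b, p):
    # p Chebyshev nodes mapped to [a, b]
    t = np.cos((2 * np.arange(p) + 1) * np.pi / (2 * p))
    return 0.5 * (a + b) + 0.5 * (b - a) * t

def lagrange(nodes, x):
    # L[i, j] = j-th Lagrange polynomial (built on `nodes`) evaluated at x[i]
    L = np.ones((len(x), len(nodes)))
    for j, nj in enumerate(nodes):
        for m, nm in enumerate(nodes):
            if m != j:
                L[:, j] *= (x - nm) / (nj - nm)
    return L

g = lambda X, Y: 1.0 / np.abs(X[:, None] - Y[None, :])   # 1D model kernel

x = np.linspace(0.0, 1.0, 200)        # nodes of the target box [0, 1]
y = np.linspace(3.0, 4.0, 150)        # nodes of the well-separated source box [3, 4]
p = 8
cx, cy = cheb(0.0, 1.0, p), cheb(3.0, 4.0, p)
Lt, Ls = lagrange(cx, x), lagrange(cy, y)
K_low = Lt @ g(cx, cy) @ Ls.T         # rank-p approximation of the 200 x 150 block
```

Because the kernel is smooth on well-separated boxes, a handful of Chebyshev nodes per direction already gives several digits of accuracy.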
The interpolated matrix-vector product can therefore be recast as
(8) $\tilde f = L_t\, K\, L_s^T\, q$,
where

$q$ is the source vector whose entries are localized at the nodes contained within the source box.

$L_s \in \mathbb{R}^{N_s \times r_s}$, resp. $L_t \in \mathbb{R}^{N_t \times r_t}$, is the interpolation matrix whose entries are the Lagrange polynomials localized at the source (resp. target) control nodes and evaluated at the source (resp. target) nodes.

$K \in \mathbb{R}^{r_t \times r_s}$ is the transfer matrix whose entries are the kernel evaluated at each possible couple of target and source control nodes.

However, the rank of $K$ is not necessarily low, and $K$ can be further compressed using the Adaptive Cross Approximation, see [bebendorf], such that
(9) $K \approx U\, V$,
where $U \in \mathbb{R}^{r_t \times r}$ and $V \in \mathbb{R}^{r \times r_s}$ with $r \le \min(r_t, r_s)$. Finally, the Lagrange interpolation consists in four successive matrix-vector products such that
(10) $\tilde f = L_t \big( U \big( V \big( L_s^T\, q \big) \big) \big)$.
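As an illustration of (9), here is a deliberately simple full-pivoting variant of the ACA (our own reference version; the production algorithms in [bebendorf] use partially pivoted rules that avoid assembling the whole block):

```python
import numpy as np

def aca_full(K, tol=1e-10):
    # full-pivoting ACA: peel off rank-1 crosses until the residual is small,
    # returning U (m x r) and V (r x n) with ||K - U @ V||_F <= tol * ||K||_F
    R = K.copy()
    U, V = [], []
    target = tol * np.linalg.norm(K)
    while np.linalg.norm(R) > target:
        i, j = np.unravel_index(np.abs(R).argmax(), R.shape)  # pivot entry
        u = R[:, j] / R[i, j]
        v = R[i, :].copy()
        R = R - np.outer(u, v)
        U.append(u)
        V.append(v)
    return np.stack(U, axis=1), np.stack(V, axis=0)

# smooth, well-separated interaction block => rapidly decaying rank
x = np.linspace(0.0, 1.0, 120)
y = np.linspace(3.0, 4.0, 90)
K = 1.0 / np.abs(x[:, None] - y[None, :])
U, V = aca_full(K)
```

The returned rank is far below `min(K.shape)`, which is what makes the four products in (10) cheap.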
As the rank $r$ depends on the kernel, it may increase unacceptably when dealing with oscillatory kernels, because the polynomial order must be high to fit the oscillations in the subdomains. In that case, it is beneficial to use a kernel-specific compression method, as illustrated in the next section 2.2.3.
2.2.3 The FFM for oscillatory kernels
In this section, we illustrate the change of compression method for the special case of the Helmholtz Green kernel
(11) $G(x,y) = \dfrac{e^{\mathrm{i} k\, r(x,y)}}{4 \pi\, r(x,y)}$,
where $k$ is the wavenumber. A test is performed on the value of $k\, d_\ell$ to determine whether the low-rank interaction is oscillating. For example, the evaluation of the Helmholtz kernel (11) for two nodes $x$ and $y$ sufficiently close can be detected as non-oscillating because the value of $k\, r(x,y)$ is small. In this case, we use the Lagrange interpolation described in subsubsection 2.2.2 instead of a specific low-frequency FMM (see [greengard1998] or [darve2004]). In the other case, we approximate the kernel using its Gegenbauer-series expansion, like in the FMM.
Let $c_t$, resp. $c_s$, be the center of the target, resp. source, box; then
(12) $x - y = (x - c_t) + (c_t - c_s) - (y - c_s)$,
which can be reformulated like
(13) $x - y = d + X$,
where $X = c_t - c_s$ and $d = (x - c_t) - (y - c_s)$. Let also $\mathbb{S}^2$ be the unit sphere in $\mathbb{R}^3$; then, following for example [darve1999],
(14) $\dfrac{e^{\mathrm{i} k |d + X|}}{4 \pi |d + X|} = \lim_{L \to \infty} \int_{\mathbb{S}^2} e^{\mathrm{i} k \langle s, d \rangle}\, \mathcal{G}_L(s, X)\, \mathrm{d}s$,
where $\langle \cdot, \cdot \rangle$ denotes the Euclidean inner product and $\mathcal{G}_L$ is the Gegenbauer series such that
(15) $\mathcal{G}_L(s, X) = \dfrac{\mathrm{i} k}{16 \pi^2} \sum_{l=0}^{L} \mathrm{i}^l\, (2l+1)\, h_l^{(1)}(k |X|)\, P_l\!\left(\left\langle s, \tfrac{X}{|X|} \right\rangle\right)$,
where $h_l^{(1)}$ is the spherical Hankel function of the first kind and of order $l$, and $P_l$ is the Legendre polynomial of order $l$. In practice, (15) is truncated at the rank
(16) $L = \sqrt{3}\, k\, d_\ell + C(\varepsilon) \log\!\big(\sqrt{3}\, k\, d_\ell + \pi\big)$,
where $d_\ell$ is the size of the edge of the bounding box and $\varepsilon$ is the prescribed accuracy on the matrix-vector product, see [darve2000]. The integral (14) is computed using a spherical quadrature $(\omega_p, s_p)_{1 \le p \le N_q}$ such that, for two nodes $x$ and $y$,
(17) $G(x,y) \approx \displaystyle\sum_{p=1}^{N_q} \omega_p\, e^{\mathrm{i} k \langle s_p,\, x - c_t \rangle}\, \mathcal{G}_L(s_p, X)\, e^{-\mathrm{i} k \langle s_p,\, y - c_s \rangle}$.
The computation is therefore recast as three successive matrix-vector products
(18) $\tilde f = F_t\, D\, F_s\, q$,
where

$F_s$ is the dense $N_q \times N_s$ matrix representing the discrete non-uniform forward Fourier transform from the source set to the Fourier domain, such that
(19) $(F_s)_{pj} = e^{-\mathrm{i} k \langle s_p,\, y_j - c_s \rangle}$.

$D$ is the diagonal transfer matrix whose entries contain the value of the Gegenbauer series evaluated at the quadrature nodes, such that
(20) $D_{pp} = \omega_p\, \mathcal{G}_L(s_p, X)$.

$F_t$ is the dense $N_t \times N_q$ matrix representing the discrete non-uniform backward Fourier transform from the Fourier domain to the target set, such that
(21) $(F_t)_{ip} = e^{\mathrm{i} k \langle s_p,\, x_i - c_t \rangle}$.
Of course, the Fourier matrices are never assembled, and the Fourier transforms are computed using the corresponding Non-Uniform Fast Fourier Transform (NUFFT) introduced by A. Dutt and V. Rokhlin [duttNufft1993], later improved by Greengard [greengardNufft]. We refer to these papers for a complete analysis of the algorithm.
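The truncated Gegenbauer expansion can be verified numerically against the kernel itself. The sketch below (geometry, wavenumber and truncation rank are made up; normalization factors such as $1/(4\pi)$ are dropped on both sides) uses the addition-theorem form of the series for $|d| < |X|$:

```python
import numpy as np
from scipy.special import spherical_jn, spherical_yn, eval_legendre

def h1(l, z):
    # spherical Hankel function of the first kind
    return spherical_jn(l, z) + 1j * spherical_yn(l, z)

k = 2.0                             # wavenumber (assumed)
d = np.array([0.10, 0.20, -0.10])   # local part, small
X = np.array([2.00, 1.00, 0.50])    # transfer vector between box centers, |X| >> |d|

R = np.linalg.norm(d + X)
exact = np.exp(1j * k * R) / R      # un-normalized Helmholtz kernel

nd, nX = np.linalg.norm(d), np.linalg.norm(X)
cosg = d @ X / (nd * nX)

# Gegenbauer addition theorem, valid for |d| < |X|, truncated at L terms
L = 20
series = sum(1j * k * (-1) ** l * (2 * l + 1)
             * spherical_jn(l, k * nd) * h1(l, k * nX)
             * eval_legendre(l, cosg)
             for l in range(L + 1))
```

With $|d| \ll |X|$ the terms decay geometrically, which is why the truncation rank (16) grows only slowly with the prescribed accuracy.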
Remark 1.
In the classical FMM, the Fast Fourier Transforms are computed at the deepest level in the trees, forcing a back-propagation in the octree. In the FFM, they are performed on the fly, enabling a descent-only algorithm.
2.2.4 Stopping criterion
The recursive partitioning is stopped whenever one of the following is verified:

every compressible interaction has been computed;

the average number of nodes in the boxes is below some prescribed value.

The remaining non-compressible interactions, if any, are computed as a dense matrix-vector product.
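Putting subsections 2.2.1 to 2.2.4 together, the descent can be sketched in one dimension (the kernel, the admissibility test and all parameters below are our own toy choices; the real FFM works on octrees in three dimensions):

```python
import numpy as np

kernel = lambda X, Y: np.exp(-np.abs(X[:, None] - Y[None, :]))  # smooth model kernel

def cheb(a, b, p):
    # p Chebyshev nodes mapped to [a, b]
    t = np.cos((2 * np.arange(p) + 1) * np.pi / (2 * p))
    return 0.5 * (a + b) + 0.5 * (b - a) * t

def lagrange(nodes, x):
    # L[i, j] = j-th Lagrange polynomial (on `nodes`) evaluated at x[i]
    L = np.ones((len(x), len(nodes)))
    for j, nj in enumerate(nodes):
        for m, nm in enumerate(nodes):
            if m != j:
                L[:, j] *= (x - nm) / (nj - nm)
    return L

def ffm1d(x, y, q, ax, bx, ay, by, p=8, nleaf=32):
    # target box [ax, bx] against source box [ay, by]; returns G(x, y) @ q
    w = bx - ax                                # both boxes share the same edge length
    dist = abs(0.5 * (ax + bx) - 0.5 * (ay + by))
    if dist > 2 * w:                           # admissible: compress (section 2.2.2)
        cx, cy = cheb(ax, bx, p), cheb(ay, by, p)
        return lagrange(cx, x) @ (kernel(cx, cy) @ (lagrange(cy, y).T @ q))
    if len(x) * len(y) <= nleaf ** 2:          # small: dense product (section 2.2.4)
        return kernel(x, y) @ q
    out = np.zeros(len(x))                     # otherwise: recurse on the children
    mx, my = 0.5 * (ax + bx), 0.5 * (ay + by)
    for tx0, tx1 in [(ax, mx), (mx, bx)]:
        ti = (x >= tx0) & (x < tx1)
        for sy0, sy1 in [(ay, my), (my, by)]:
            si = (y >= sy0) & (y < sy1)
            if ti.any() and si.any():
                out[ti] += ffm1d(x[ti], y[si], q[si], tx0, tx1, sy0, sy1, p, nleaf)
    return out

rng = np.random.default_rng(0)
x, y, q = rng.random(400), rng.random(300), rng.random(300)
f = ffm1d(x, y, q, 0.0, 1.0, 0.0, 1.0)         # compressed product
```

Admissible pairs are compressed via Chebyshev-Lagrange interpolation, small pairs are computed densely, and everything else recurses on the children, mirroring the stopping criterion above.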
3 Complexity analysis
In this section, we prove the storage complexity and the computational complexity of the FFM, where $N = \max(N_x, N_y)$. To that purpose, we assume that each of the source and target sets
consists in a uniform distribution of $N$ nodes in a cube. The case of surface node distributions is then tackled as a particular case.
3.1 Complexity of an octree
We give here some general, well-known results on space-partitioning trees. Assuming $d_0$ is the length of the edge of the root bounding box, the bounding boxes at level $\ell$ have the dimension $d_\ell = d_0 / 2^\ell$. They contain (on average) $N / 8^\ell$ nodes. There are at most $\min(8^\ell, N)$ non-empty boxes. Consequently, the maximum depth of an octree is $O(\log N)$. The construction of the tree itself requires $O(N \log N)$ operations. In general, the required storage is also $O(N \log N)$, but it is $O(N)$ in the FFM framework because we store only the data needed at the current depth. For the sake of simplicity, we assume that the boxes contain exactly $N / 8^\ell$
nodes, as it does not modify the overall estimate.
The particular case of surface distributions
We emphasize the fact that using an octree to subdivide evenly distributed nodes on a surface amounts to considering a plane surface partitioned using a quadtree, as illustrated on Figure 2.
Consequently, we replace the factor $8^\ell$ by $4^\ell$ in the aforementioned results: the maximum depth is still $O(\log N)$, etc.
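These counting arguments are easy to check numerically; the helper below (our own, with an assumed leaf population `n_leaf`) returns the depth at which the average box population falls under the leaf size:

```python
def tree_depth(N, n_leaf, surface=False):
    # non-empty boxes multiply by 8 per level in a volume, ~4 on a surface, so
    # the average population N / c**depth reaches n_leaf at depth ~ log_c(N / n_leaf)
    c = 4 if surface else 8
    depth = 0
    while N / c ** depth > n_leaf:
        depth += 1
    return depth
```

For a volume distribution the depth grows like $\log_8 N$, for a surface like $\log_4 N$; in both cases it is $O(\log N)$.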
Remark 2.
For the sake of simplicity, the base of the logarithm is omitted whenever it is not required for the comprehension.
3.2 Complexity estimates for the kernelindependent FFM
We prove here the storage and computational complexity for the kernel-independent version. Since the FFM is a descent-only algorithm, we can discard data stored at the parent level in the tree. This minimal storage requirement leads to the following proposition.
Proposition 1.
The storage complexity of the FFM is $O(N)$ and the computational complexity is $O(N \log N)$.
Proof.
Before we prove the estimates, we make some preliminary remarks. We first emphasize that the interpolation step for the source set, resp. the target set, is performed only once for all the corresponding subsets. Second, we draw attention to the fact that, while there may seem to be many interpolations to compute, most of the transfers are the same. On Figure 3, the two transfers (arrows) represented between the boxes from the source tree (full lines) and the target tree (dotted lines) have the same transfer matrix, which is computed only once.
In fact, the number of different translations is uniquely determined by the initial configuration of the root bounding boxes, as illustrated in two dimensions on Figure 4. The left picture corresponds to two overlapping quadtrees. Enumerating all the possible low-rank interactions for the filled box at depth 2, the filled bounding box interacts at most with the three outer layers minus the closest layer of boxes, meaning that there is a fixed number of possibilities per box at all levels of refinement. All the other boxes are children of boxes involved in low-rank interactions at the previous level of refinement. In the right configuration, the trees are shifted and the number of possible interactions changes, but remains fixed. This number corresponds to the total number of transfer matrices which need to be computed at the current depth $\ell$; it does not depend on $N$.

Let $p$ be the number of control nodes for the Lagrange interpolation in each direction of space; then $p$ is bounded above by a value independent of $N$. Let $r = p^3$ and $N_\ell = N / 8^\ell$; then the storage requirement for the first interpolation is $O(r N_\ell)$ per interpolation matrix. Since there are as many matrices as there are boxes in the tree at level $\ell$, the storage requirement is then $O(r N) = O(N)$. The interpolation consists in $8^\ell$ matrix-vector products, each of them of size $N_\ell \times r$. The computational complexity of the complete Lagrange interpolation of the source set for any $\ell$ is therefore $O(N)$.

Let $T$ be the maximum number of unique transfers (see again Figure 4); the storage and the computation of these matrices are each $O(1)$, as the number and the dimensions of the transfer matrices depend neither on $N$ nor on $\ell$. Each interpolated source vector is then multiplied at most $T$ times by a transfer matrix whose size is independent of $N$ and $\ell$. Consequently, this transfer step has an $O(N)$ complexity for the storage and the computation.

Finally, the last interpolation is performed in two steps:

The construction of the interpolation matrices for the target set has the same complexity as for the source set, i.e. it is $O(N)$.

The transfer to the target set of interpolations consists in computing the product "interpolation matrix" times "transferred vector" for each of the transferred vectors. This step is performed at most $T$ times for each of the target boxes, i.e. $O(8^\ell)$ times.
Therefore, the second interpolation step has the same complexity as the step 1, i.e. it is $O(N)$ in storage and computational complexity.

To these costs, we must add the worst cost of the construction of the octrees, which is $O(N \log N)$, and of the non-compressible interactions, which is linear because the maximum number of close interactions for one node is bounded above by the maximum number of nodes allowed in the leaves of the tree.
We conclude that the storage requirement for the FFM is $O(N)$. Regarding the computational requirement, the worst case consists in performing the interpolation at each of the $O(\log N)$ levels. We conclude that the computational cost for the complete FFM product is $O(N \log N)$. ∎
Remark 3.
These complexity estimates do not depend on whether the nodes are evenly scattered in a volume (octree-based partitioning) or on a surface (equivalent quadtree-based partitioning).
3.3 Complexity estimates for the oscillating kernels
In the previous subsection 3.2, we proved very general estimates. When dealing with oscillating kernels, one must introduce the notion of wavelength and of "discretization" per wavelength. Let $h$ be the average distance between two nodes; it is generally chosen such that
(22) $h = \dfrac{\lambda}{6} = \dfrac{2\pi}{6k}$,
where we recall that $k$ is the wavenumber and $\lambda = 2\pi/k$ is the wavelength. In other words, there are approximately six nodes per wavelength. This is fundamentally different from the generic case, as the number of nodes in a given volume depends explicitly on the wavenumber $k$. We assume in the following that $N$ is accordingly adjusted as a function of $k$.
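For instance, with the rule (22), halving the wavelength quadruples the number of nodes needed on a fixed surface (the celerity and the area below are arbitrary placeholders):

```python
c = 1500.0   # assumed celerity (m/s), e.g. sound in water
S = 1.0      # assumed surface area (m^2)

def surface_nodes(f):
    # six nodes per wavelength (22): h = lambda / 6, so N ~ S / h^2 on a surface
    lam = c / f
    h = lam / 6.0
    return S / h ** 2
```

This is the $N \propto k^2$ scaling (surface case) used in the proof of Proposition 2 below.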
We suppose again that the nodes are scattered evenly in a volume. The special case of surfaces is tackled at the end of this subsection.
Proposition 2.
The storage complexity for the oscillating FFM is $O(N)$ and the computational complexity is $O(N \log^2 N)$.
Proof.
The proof is very similar to the proof of Proposition 1. We first remark that the NUFFT is based on the Fast Fourier Transform, which, on $N_q$ points, has an $O(N_q)$ storage requirement and an $O(N_q \log N_q)$ computational complexity. These estimates remain valid for the NUFFT, but with a bigger multiplicative constant, see [greengardNufft]. The main difficulty is to estimate the dependency of $N_q$ on $N$. To that purpose, we assume that for some constant $c > 0$ we have $N = c\, k^3$. In our choice of spherical quadrature, we have
(23) $N_q = O(L^2)$,
where $L$ is given by (16). By substituting $k = (N/c)^{1/3}$ and expanding $L^2$, we obtain
(24) $N_q \le c_2\, N^{2/3} + c_1\, N^{1/3} \log N + c_0 \log^2 N$,
where the $c_i$ are positive constants. Following the hypothesis $N = c\, k^3$, we have that
(25) $\dfrac{N_q}{N} \le c_2\, N^{-1/3} + c_1\, N^{-2/3} \log N + c_0\, N^{-1} \log^2 N$.
Consequently,
(26) $\dfrac{N_q}{N} \longrightarrow 0 \quad \text{as } N \to \infty$,
meaning that $N_q = o(N)$. In the particular case of surface distributions of nodes, we have $N = c\, k^2$ and the bound (25) becomes
(27) $\dfrac{N_q}{N} \le c_2 + c_1\, N^{-1/2} \log N + c_0\, N^{-1} \log^2 N$.
This means that for surface distributions of nodes, $N_q = O(N)$. Since the coefficients $c_i$ are all positive, we conclude that we always have $N_q = O(N)$.
We detail the complexity of each step below:

The NUFFT has the same complexity as the classical FFT. Therefore, the storage requirement for one NUFFT is $O(N_q)$, meaning that the total storage requirement is $O(N)$. The total number of NUFFTs at a given depth is at most the number of boxes, and each has the computational complexity $O(N_q \log N_q)$. Since $N_q = O(N)$, we conclude that the storage complexity of the first step is $O(N)$ and that the computational complexity is $O(N \log N)$.

The second step is a simple multiplication in the Fourier domain, which is performed at most $T$ times per box on vectors of length $N_q$, see the proof of Proposition 1. By remarking that $N_q = O(N)$, we conclude that this step has a linear storage and computational complexity.

The last step is the backward step of the first one. Consequently, we obtain exactly the same complexities.
By including the complexity of the tree and the computation of the non-compressible interactions (see the end of the proof of Proposition 1), we conclude that the worst storage requirement is still $O(N)$ and that the worst computational complexity is $O(N \log^2 N)$. ∎
Remark 4.
In this section, we proved that the storage requirement of the FFM is always linear, while the computational complexity is $O(N \log N)$ in the general case (but with possibly big multiplicative constants) and $O(N \log^2 N)$ for the particular case of oscillating kernels. These estimates are not worse than those of the existing compression methods, and they are illustrated in the next section.
4 Numerical Examples
In this section, we give examples of application of the FFM. Our implementation is written in the MATLAB language. First, we illustrate the estimates proven in the previous section 3. Then, we show how one can use the FFM to solve Boundary Integral Equations iteratively, and we solve two publicly available benchmarks in acoustic and electromagnetic scattering.
4.1 Scalability of the FFM
The scalability of the FFM is illustrated by computing the convolution product where the kernel is the Laplace Green kernel given below,
(28) $g(r) = \dfrac{1}{4 \pi r}$.
Physically, it amounts to computing, at the target nodes, the gravitational potential generated by masses located at the source nodes. To that purpose, we pick randomly $N$ source nodes and $N$ target nodes on the unit sphere centered at the origin, as illustrated on Figure 5.
Then, we simply compute the convolution product (4) with the FFM and we measure the memory requirement and the computation time.^{1} The computation was performed on a computer with 12 cores at 2.9 GHz, 256 GBytes of RAM and Matlab R2017a using single precision. The results are gathered in Table 1 for a prescribed accuracy on the matrix-vector product. The linear storage requirement is confirmed. We observe furthermore that the computational scalability is close to linear. More importantly, we are able to achieve a matrix-vector product with one billion nodes in each of the source and target sets in less than four hours. Of the required 100 GBytes, approximately 40 GBytes are needed for the storage of the coordinates of the nodes, the input vector and the output vector. This implies that the multiplicative constant in the linear storage estimate is small.
^{1} The convolution product between two sets of nodes may be computed using the ffmProduct() function, available within the open-source Gypsilab framework [gypsiHub] in the ./openFfm directory. An example is provided by running the nrtFfmBuilder.m script.
N  Time 1 core (s)  Time 12 cores (s)  Error  Memory 
2.04  9.08  1 MB  
9.30  17.1  10 MB  
87.8  33.4  100 MB  
1063  169  1GB  
–  1499  10 GB  
–  11340  100 GB 
The same experiment is repeated for the Helmholtz Green kernel. The results are given in Table 2 with the corresponding maximum value.
N  f (Hz)  Time 1 core (s)  Time 12 cores (s)  Error  
541  20  1.65  8.81  
1893  70  16.2  16.3  
5411  200  143  48.2  
16234  600  1557  350  
54113  2000  –  8246 
4.2 Boundary Integral Equations and the FFM
Boundary Integral Equations can be obtained from certain equations describing, for example, physical phenomena like the propagation of an acoustic or electromagnetic wave in a homogeneous medium. We refer to [nedelec], starting p. 110, or [coltonKress], starting p. 66, for more details on how they are obtained. In order to explain how the FFM is used to solve such equations, we consider the scattering of an acoustic wave propagating in an infinite medium by a scatterer with boundary $\Gamma$, on which we apply a Dirichlet boundary condition. This problem can be tackled by solving the following Boundary Integral Equation
(29) $\displaystyle \int_\Gamma G(x,y)\, \lambda(y)\, \mathrm{d}y = -u_{\mathrm{inc}}(x), \quad x \in \Gamma$,
where $\lambda$ is some unknown, $G$ is the Helmholtz Green kernel (see (11)) and $u_{\mathrm{inc}}$ is the incident wave. This equation may be solved using different methods. Here, we shortly present the Boundary Element Method based on a Galerkin formulation. We introduce a test function $\mu$ to obtain the Galerkin formulation of eq. (29)
(30) $\displaystyle \int_\Gamma \int_\Gamma \mu(x)\, G(x,y)\, \lambda(y)\, \mathrm{d}y\, \mathrm{d}x = -\int_\Gamma \mu(x)\, u_{\mathrm{inc}}(x)\, \mathrm{d}x$.
We further introduce the discrete approximation spaces such that
(31) $\lambda(y) \approx \sum_j \lambda_j\, \varphi_j(y)$,
(32) $\mu(x) \approx \sum_i \mu_i\, \varphi_i(x)$.
There are multiple ways to deal with this singular integral. Here, we integrate the double integral using a Gauss-Legendre quadrature. The resulting inaccurate integration of the singularity is tackled later. Let $(\omega_p, x_p)_p$ be the weights and nodes of the quadrature; eq. (30) now reads
(33) $\displaystyle \sum_p \sum_q \omega_p\, \omega_q\, \mu(x_p)\, G(x_p, x_q)\, \lambda(x_q) = -\sum_p \omega_p\, \mu(x_p)\, u_{\mathrm{inc}}(x_p)$.
Therefore, eq. (29) is rewritten as a linear system of equations,
(34) $A\, \lambda = b$.
The Galerkin matrix $A$ can be recast as the product of three matrices such that
(35) $A = B_t\, G_q\, B_s$,
where $B_s$ is the matrix "transporting" the basis functions to the quadrature nodes, $B_t$ is the "quadrature-to-test-functions" matrix, and $G_q$ is the matrix such that $(G_q)_{pq} = \omega_p\, \omega_q\, G(x_p, x_q)$. While the matrices $B_s$ and $B_t$ are sparse and can be stored with linear complexity, $G_q$ is full. In the process of an iterative inversion algorithm such as GMRES, see [saadSchultz], one or more matrix-vector products are required, consisting in the successive products:

the product by $B_s$, which has linear complexity;

the product by $G_q$, which is compressed using the FFM;

the product by $B_t$, which also has linear complexity.

In general, the size $N_q$ of the quadrature is closely related to the number of elements in the discretization of $\Gamma$. Assuming, for example, that there are three quadrature nodes per element, then $N_q$ is three times the number of elements, meaning that the size of the FFM product may be in fact much higher than the actual size of the linear system. The singular integral is computed independently using a semi-analytical method. It takes the form of an additional sparse correction matrix which "removes" the part of the singularity integrated numerically in $G_q$ and adds the "exact" integration of the kernel.
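A small dense sketch of the splitting (35) follows (sizes, sparsity patterns and the crude diagonal regularization are our own; in the FFM the middle matrix is of course never assembled):

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)
n_el, n_q = 100, 300                          # elements and quadrature nodes (3 per element)

xq = rng.random((n_q, 3))                                                # quadrature nodes
Bs = sp.random(n_q, n_el, density=0.05, format="csr", random_state=0)    # basis -> quadrature
Bt = sp.random(n_el, n_q, density=0.05, format="csr", random_state=1)    # quadrature -> tests

r = np.linalg.norm(xq[:, None, :] - xq[None, :, :], axis=-1)
Gq = 1.0 / (4.0 * np.pi * (r + np.eye(n_q)))  # Laplace kernel, diagonal crudely regularized

lam = rng.random(n_el)
Alam = Bt @ (Gq @ (Bs @ lam))                 # apply A without assembling Bt @ Gq @ Bs
```

Only the middle product touches a dense operator; it is exactly the one the FFM compresses, while the sparse outer products stay linear in cost.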
Our FFM library is interfaced with the Gypsilab software. Gypsilab is an open-source (GPL 3.0) Finite Element framework entirely written in the Matlab language, aiming at easily assembling the matrices related to the variational formulations arising in the Finite Element Method or in the Boundary Element Method. Among other things, it features a complete H-matrix algebra (sum, product, LU decomposition, …) compatible with the native matrix types of Matlab. For more details on the capabilities of Gypsilab, we refer to [gypsilab, gypsiHub]. In the context of this paper, we use it to manage the computation of the sparse matrices, the quadrature and the right-hand side.^{2}
^{2} The high-level interface with the FFM is available as another overloaded integral() function within Gypsilab. An example is provided by running the nrtFfmBuilderFem.m script.
We present here two examples of application. The first one corresponds to the scattering of an underwater acoustic wave by a submarine, and the second one to the scattering of an electromagnetic wave by a perfectly conducting rocket launcher.
4.2.1 Acoustic scattering by a submarine
We solve the acoustic scattering by a submarine with a Neumann boundary condition. This example is based on the BeTSSi benchmark [betssi]. The mesh is provided by ESI Group and is remeshed using the open-source Mmg platform [mmg]. We could solve this problem using the following equation
(36) $\displaystyle \frac{\partial}{\partial n_x} \int_\Gamma \frac{\partial G}{\partial n_y}(x,y)\, \lambda(y)\, \mathrm{d}y = -\frac{\partial u_{\mathrm{inc}}}{\partial n}(x), \quad x \in \Gamma$,
whose variational formulation is
(37) $\displaystyle \int_\Gamma \int_\Gamma G(x,y) \Big( \mathrm{rot}_\Gamma\, \mu(x) \cdot \mathrm{rot}_\Gamma\, \lambda(y) - k^2 \big(n(x) \cdot n(y)\big)\, \mu(x)\, \lambda(y) \Big)\, \mathrm{d}y\, \mathrm{d}x = -\int_\Gamma \mu(x)\, \frac{\partial u_{\mathrm{inc}}}{\partial n}(x)\, \mathrm{d}x$,
where $\lambda$ is the unknown, $\mu$ is the test function, $k$ the wavenumber, $n(x)$ is the outbound normal vector at position $x$, and $\partial/\partial n_y$ is the normal derivative with respect to the variable $y$. Eq. (36) is very ill-conditioned and is also ill-posed for some frequencies. The integral operator in eq. (36) is called the hypersingular boundary integral operator. The scattering problem could also be solved using another boundary integral equation
(38) $\displaystyle -\frac{\lambda(x)}{2} + \int_\Gamma \frac{\partial G}{\partial n_x}(x,y)\, \lambda(y)\, \mathrm{d}y = -\frac{\partial u_{\mathrm{inc}}}{\partial n}(x), \quad x \in \Gamma$,
which is better conditioned, but which is also ill-posed for some frequencies and less accurate in practice. The boundary integral operator in eq. (38) is the adjoint of the double-layer operator. To circumvent these issues, we use a linear combination of the integral operators involved in eqs. (36) and (38), called the Brakhage-Werner formulation, see [kress]. This formulation is better conditioned and always well-posed. It is important to note that each iteration of the GMRES algorithm requires in fact nine FFM products in our implementation: three for the part with the scalar product of the normal vectors, three for the part with the scalar product of the surface rotationals, and three for the adjoint of the double-layer operator.
The submarine is several tens of metres long and the frequency of the incident wave is set in the kilohertz range; given the celerity of sound in water, the wavelength is such that there are many wavelengths along the hull. Since the numerical integration is performed using a quadrature rule with 3 nodes per triangle, the size of one FFM product is three times the number of triangles of the mesh, which is much larger than the total number of (nodal) unknowns. The problem is solved using a preconditioned GMRES algorithm without restart on a 32-core server at 3.0 GHz with 512 GBytes of RAM. The same prescribed tolerance is used for both the GMRES and the FFM product. Convergence is achieved in 12 iterations and 50000 seconds (about 13 hours and 50 minutes), including the assembling of the preconditioner and the regularization matrix. The radiated field on the surface is represented on Figure 6; this computation is also performed using the FFM. The memory peak is measured at approximately 200 GBytes, when assembling the preconditioner and the regularization matrix.
4.2.2 Perfectly electric rocket launcher
This example is a slightly modified version of a test case extracted from the workshop EM ISAE 2018, see [workshop2018]. We study the scattering of an electromagnetic plane wave by a perfectly conducting launcher. This problem is solved using the Combined Field Integral Equation (CFIE), which is a linear combination of the Electric Field Integral Equation (EFIE) and the Magnetic Field Integral Equation (MFIE). For the construction of these equations, we refer once again to [nedelec], starting p. 234, or [coltonKress], starting p. 108. The EFIE reads
(39) $\displaystyle \mathrm{i} k Z\, \gamma_t \int_\Gamma G(x,y)\, J(y)\, \mathrm{d}y - \frac{Z}{\mathrm{i} k}\, \gamma_t \nabla \int_\Gamma G(x,y)\, \mathrm{div}_\Gamma J(y)\, \mathrm{d}y = -\gamma_t E_{\mathrm{inc}}(x)$,
where $J$ is the tangential trace of the magnetic field, $E_{\mathrm{inc}}$ is the incident electromagnetic wave, $Z$ is an impedance, and $\gamma_t$ is the tangential trace operator. The MFIE reads
(40) $\displaystyle \frac{J(x)}{2} - n(x) \times \int_\Gamma \nabla_x G(x,y) \times J(y)\, \mathrm{d}y = n(x) \times H_{\mathrm{inc}}(x)$,
where $H_{\mathrm{inc}}$ is the incident magnetic field. The CFIE then reads
(41) $\alpha\, \text{(EFIE)} + (1 - \alpha)\, \text{(MFIE)}, \quad \alpha \in (0, 1)$.
We use the classical Raviart-Thomas finite element space of order 0 (RT0). The launcher is 60 m long and 12 m in diameter (including the boosters). The electromagnetic wave propagates at 2 GHz, meaning that the wavelength is 0.15 m, assuming a celerity of light equal to $3 \times 10^8$ m.s⁻¹. Therefore, there are 400 wavelengths along the launcher. The total number of unknowns is the number of edges of the mesh; with three quadrature nodes per triangle, the size of one FFM product is three times the number of triangles. We use the same server as in subsubsection 4.2.1. One GMRES matrix-vector product now involves several FFM products (for the EFIE and for the MFIE) and lasts a few hours. The real part of the solution is represented on Figure 7.
4.3 The FFM compared to the H-matrices
We want to stress that, for smaller problem sizes, the FFM performs, in general, worse than other methods like the H-matrices or the FMM. Indeed, everything is computed anew at every call to the method in order to save memory. When compared to the H-matrices, the assembling of the H-matrix costs more than a few FFM products, but the matrix-vector product is then almost free. This is illustrated in Table 3 for the Laplace Green kernel. The computation is performed on a laptop, on a single core at 2.6 GHz, with 16 GBytes of available RAM. We use the H-matrix library featured in Gypsilab. However, the FFM enables the computation of massive convolution products, as illustrated in subsections 4.1 and 4.2, even for oscillating kernels, on small servers, thus avoiding the use of huge infrastructures.
N  Time FFM product (s)  Time matrix assembly (s)  Time matrix product (s) 

0.97  0.68  0.051  
2.16  5.9  0.13  
13.1  54.4  1.09  
103.5  733 (swapped)  122 (swapped) 
5 Conclusion
We have proposed a powerful and scalable alternative to the existing compression methods when dealing with convolution products featuring millions, or possibly billions, of nodes. The two main ingredients of the FFM are a descent-only tree traversal and cubic bounding boxes with the same edge size for the two octrees. They enable a linear storage complexity and a quasi-linear computational complexity, which are numerically highlighted. The FFM is kernel-independent, but its versatility enables optimizations for strongly oscillating convolution kernels. Moreover, the implementation effort is small, as only a few hundred Matlab lines are required. From a practical point of view, the FFM gives the possibility to compute realistic problems to users who do not have access to a computer cluster. The code is freely available in the git repository of Gypsilab at [gypsiHub], under the GPL 3.0 license.
Acknowledgements
This work is funded by the Direction Générale de l’Armement. We would like to thank François Alouges for his invaluable help in the development of this work and Leslie Greengard for providing the NUFFT code used for the computation of the numerical results.