Ames Laboratory*,
Department of Energy,
ISU,
Ames, Iowa
| Srinivas Aluru Ames Laboratory Iowa State University Ames, Iowa, 50011 |
G. M. Prabhu Computer Science Dept. Iowa State University Ames, Iowa, 50011 |
John Gustafson Ames Laboratory Iowa State University Ames, Iowa, 50011 |
(N log2 N) in two dimensions and
(N log4 N) in three dimensions. We analyze the Greengard and Barnes-Hut algorithms and show that they are unbounded for arbitrary distributions. We also present a truly distribution independent algorithm for solving the N-body problem in O(N log N) time in two dimensions and in O(N log2 N) time in three dimensions.
A large number of physical systems can be studied by simulating the interactions between the particles constituting the system. In a typical system each particle influences every other particle, often based on an inverse square law such as Newton's law of gravitation or Coulomb's law of electrostatic interaction. Examples of such physical systems can be found in astrophysics, plasma physics, molecular dynamics and uid dynamics. Since the simulation involves following the trajectories of motion of a collection of N particles, the problem is termed the N-body problem. Apart from traditional applications in the study of physical systems, some problems in numerical complex analysis and elliptic partial differential equations can also be solved using this approach. Applications of the problem are also found in the radiosity method, which attempts to create images by computing the equilibrium distribution of light for complex scene geometries.
Since it is not possible to solve the equations of motion for a collection of four or more particles in closed form, iterative methods are used to solve the N-body problem. At each discrete time interval, the force on each particle is computed and this information is used to update the position and velocity of each particle. A straightforward computation of the forces requires O(N2) work per iteration. The rapid growth with N effectively limits the number of particles that can be simulated by this method.
Several approaches have been used to reduce the complexity per iteration. Some of the techniques include transforming the problem to a position-velocity phase space, imposing a grid on the system of particles and computing cell-cell interactions. Such techniques either fail to model the system accurately or degrade to O(N2) complexity for non-uniform distributions of the particles.
Recently, a new class of particle simulation methods have emerged to solve the N-body problem. These methods are characterized by an organization of the particles into a hierarchy of clusters, starting from a cluster containing all the particles to clusters containing the individual particles. These methods are usually referred to as hierarchical methods or tree methods. Such a method was first proposed by Appel [2], whose scheme allows for clusters with arbitrary shapes.
A number of algorithms followed the work of Appel. Widely respected among these are the Barnes- Hut [4] and the Greengard [7] methods. Both depend on a data structure constructed by a fixed hierarchical cubical subdivision of the space. Salmon [14] studies the Barnes-Hut algorithm in great detail. While Appel and Barnes-Hut achieve required accuracy by restricting which clusters may interact, Greengard uses multipole expansions [7] to approximate the interactions to the desired precision. With the notable exception of Greengard, most researchers paid little attention to a rigorous worst-case complexity analysis of their algorithms. Greengard claims his algorithm reduces the complexity to O(N) per iteration.
In this paper, we show that the Greengard's algorithm is not O(N), as claimed. Both Barnes-Hut and Greengard's methods depend on the same data structure, which we show is distribution-dependent. For the distribution that results in the smallest run- ning time, we show that Greengard's algorithm is O(N log2N) in two dimensions and O(N log4N) in three dimensions. Both algorithms are unbounded for arbitrary distributions.
We have designed a hierarchical data structure whose size depends entirely upon the number of par- ticles and is independent of the distribution of the particles. Both Greengard's and Barnes-Hut algorithms can be used in conjunction with this data structure to reduce their complexity. Apart from reducing the complexity of the Barnes-Hut algorithm, the data structure also permits more accurate error estimation. The multipole algorithm designed using this data structure has a complexity of O(N log N) in two dimensions and O(N log2 N) in three dimen sions. To the best of our knowledge, this is the fastest distribution-independent algorithm for the N-body problem.
The rest of the paper is organized as follows: In Section 2, we analyze the complexity of Greengard and Barnes-Hut algorithms. We derive lower and upper bounds on the data structure used in these algorithms and use this result to disprove claims on their complexity. Section 3 contains the description of our new hierarchical data structure. We also show how to use the Greengard and Barnes-Hut algorithms on this data structure. An algorithm to create this new data structure is described in Section 4.
The Greengard and Barnes-Hut methods for computing N-body interactions consist of two alternating phases, repeated every time step:
Figure 1: Barnes-Hut physical subdivision of space for a collection of three particles in two dimensions.
The same data structure is used in both methods, constructed as follows: Begin with a cell (square in two dimensions and cube in three dimensions) big enough to contain all the particles. Subdivide the cell into 2d cells having half the side length of the original cell (where d is the number of dimensions). Discard cells that do not contain any particles. Stop the subdivision process on cells having exactly one particle. Recursively subdivide the cells that contain more than one particle. This recursive subdivision of the space into cells is naturally represented by a tree, which we shall refer to as the Barnes-Hut (BH) tree.
Figure 1 shows the Barnes-Hut physical subdivision of the space for a collection of three particles positioned as shown. The corresponding BH tree is shown in Figure 2. For convenience and simplicity, a two-dimensional problem is discussed but the results carry over to three-dimensional problems as well. In two dimensions, each cell is subdivided into four cells and the resulting structure is a quad-tree. In the example shown, the first subdivision separates particle 1 from particles 2 and 3. The next three subdivisions performed to separate particles 2 and 3 are not successful as one of the child cells at every level of the subdivision contains both the particles and the other three contain none. The recursive subdivision is continued until the particles 2 and 3 are separated.
Figure 2: The Barnes-Hut tree corresponding to the physical subdivision of space in Figure 1.
From this example, it is clear that a large number of recursive subdivisions may be required to separate
particles that are very close to each other. Let N be the number of particles in the system and let s be the smallest interparticle distance. We require s > 0 to avoid infinite interaction force. Let D be the length of a cell that can contain all the particles. Clearly, the worst-case path length of the BH tree is given by the worst-case path needed to separate the two particles which are closest to each other. The size of the smallest cell that can contain two particles s apart in two dimensions is
The paths separating the closest particles may contain recursive subdivisions until a cell of length smaller than s2 reached. Since each subdivision halves the length of the cells, the maximum path length is given by the smallest k for which
Figure 3: Smallest cells that could possibly contain two particles that are s apart in two and three dimensions.
In either case, the worst-case path length is O (log
) and the number of nodes in the tree is bounded by O(N log
).
Greengard assumes the length D of the cell containing all the particles is one, which can be achieved by appropriate scaling. Greengard's arguments can be summarized as follows: For a fixed machine precision, only certain classes of particle distributions can be modeled, independent of the algorithm used. Therefore, by restricting attention to only those particle distributions that can be modeled on a given machine, s has to be no less than the smallest floating point number representable. Thus, log
is bounded by a constant, termed p. The size of the tree is bounded by O(pN). Greengard determines the running time of his algorithm in two dimensions to be
The above arguments imply that the height of the tree is bounded by O(p), a constant. Yet, we know that the height of a tree with N leaves and at most a constant number of children per node is
(log N). How can this disparity be explained?
The problem lies in the assumption that the parameters D and s are entirely dependent on the spatial distribution of the particles and not related to the number of particles N. To understand why this assumption is invalid, consider the behavior of
as a function of N.
To minimize the ratio
for a fixed N, all the particles should be at a distance of s from their nearest neighbors. To see why, suppose this is not true. We can reduce D by 'moving-in' particles that are farther than s from each other, while keeping s the same. Or, we can increase s by increasing the distance between particles that are s apart, keeping D unchanged. In either case,
decreases, contradicting minimality. Furthermore, the particles must be packed as closely as possible. Figure 4 shows the configuration minimizing the ratio
for a fixed
Figure 4: The configuration minimizing the ratio of the cell length containing all the particles and the smallest interparticle distance in two dimensions. The ratio is minimized when all particles are a distance s from their nearest neighbors.
N in two dimensions. Each particle has six nearest neighbors, all at a distance s. The particle is at the center of the hexagon formed by its nearest neighbors. The particles do not fit in a cell of size smaller than D x D. Adding the particles column-wise,
for some constant c1 . Since this is computed using the configuration minimizing the ratio
, the worst- case path length (log
) is
(log N). In three dimensions,


is bounded by p, p is also
(log N). The error in Greengard's proof was the assumption that p was independent of N.
How does this affect particle distributions that can be modeled on a machine with precision parameter p? It is already noted that not all distributions can be modeled for any given N 3 because of precision limits. However, unless p
c log N (c a constant), no distribution can be modeled for that N. The very fact that we are able to run an N-body problem for a collection of N particles with precision parameter p implies that p
c log N. Thus, p cannot be taken as a constant in the analysis of the running time of the algorithm and Greengard's algorithm is not O(N). Greengard's time complexity in two dimensions is
, which is
(N log2 N). The three-dimensional complexity is
, which is
(N log4 N).
Next, let us investigate how large
can be for a fixed N. For any N 3 particles,
can be made arbitrarily large by reducing the distance between the closest particles (thus reducing s), or by increasing the spread of the particles (thus increasing D). Hence, the worst-case path length does not have an upper bound as a function of the number of particles and is entirely dependent upon the spatial distribution of the particles. This immediately implies that the size of the BH tree is unbounded and can be arbitrarily large for a fixed N. Since both the Greengard's and Barnes-Hut algorithms construct and visit each node in the BH tree at least once, these algorithms are unbounded for arbitrary distributions.
Clearly, not all particle distributions can be modeled on a given machine due to precision limits. But, an algorithm whose running time depends upon the distribution is undesirable. An analogy can be drawn to a sorting algorithm whose running time depends on the size of the numbers to be sorted. The complexity of a sorting algorithm is O(n log n), provided basic operations on the numbers to be sorted (like comparison, copying) can be accomplished in constant time. The complexity of the algorithm does not remain O(n log n) if this assumption is not valid. But, a sorting algorithm with running time as a function of the size of the numbers to be sorted is undesirable. Similarly, it is reasonable to assume that the distribution of the particles is representable in a given machine but algorithms whose running times depend on the distribution are undesirable.
The BH tree can contain a path on which every node represents the same set of particles, though each node represents a cell of a different size. Such a path can be arbitrarily large irrespective of the total number of particles. Each node on the path represents a cell of exactly half the length of the cell represented by its parent. Our intent is to rectify this unbounded nature of the BH tree.
Let v1, v2, ... , vk (k 2) be a maximal path in the BH tree such that each node of the path represents the same set of particles. The maximality of the path ensures that v1's parent has more particles than v1 and no child of vk has the same particles as vk. Since only cells having more than one particle are subdivided, it is imperative that vk have at least two child nodes. We can also assume without loss of generality that v1 has a parent. Otherwise, v1 has to be the root of the tree, thus containing all the particles in the system. By the property of the path v1, v2, ... , vk, vk also contains all the particles in the system. This simply means that our choice of the initial cell is too large for the system of particles and a cell
length of it (this is the cell represented by vk) can contain the entire system. In this case, we can safely make the subtree rooted at vk be the BH tree. Therefore, it can be assumed that v1 always has a parent. Furthermore, vi is the only child of vi-1 (1 < i ² k).
Consider such a maximal path v1, v2, ... , vk . Every node on such a path represents the same particles, but using cells of different sizes. Nodes in the BH tree are used to store aggregate information on the collection of particles they represent. For example, the Barnes-Hut method keeps track of the total mass and the center of mass of the collection of particles. Greengard's method computes the multipole expansion of the collection of particles. Since every node on such a path represents the same particles, they all contain the same information. Therefore, a modified tree obtained by eliminating this redundancy should contain the same information as the BH tree.
Consider the tree obtained by replacing every such maximal path by the last node on the path. The modified tree has N leaves, one per particle and the root of the tree contains all N particles, just as in the Barnes-Hut tree. The only deviation is that no internal node can have the same particles as one of its children. Since the number of particles contained by a node is the sum of the number of particles contained by its children, this translates to the condition that each internal node has at least two children. Let S(N) be the number of nodes in the modified tree for a collection of N particles in d dimensions.
These equations are satisfied by S(N)
2N-1, and the size of the tree is bounded by O(N) in any dimensions. A tree containing N leaves has a size
(N), proving the optimality of the modified tree. Since each child contains at least one particle less than its parent, the path length is also bounded by O(N).
In the Barnes-Hut method [4], the BH tree is traversed once for every particle in the system to approximate the force acting on the particle due to the rest of the system. The force on any particle p is approximated using the following recursive calculation: Let l be the length of the cell currently being processed. Let d be the distance between the particle and the center of mass of the cell under consideration. If
, where 0
< 1 is a prespecified accuracy criterion, the cell is treated as a single particle of equivalent mass located at the center of mass for the purpose of force calculation. Otherwise, the children of the cell are examined recursively to compute the force on p. The force calculation starts by examining the root cell. This calculation is repeated once for every particle in the system.
Let v1, v2, ... , vk be a maximal path in the BH tree such that each node contains the same particles and let v0 be the parent of v1. In the modified tree, vk is the child of v0. Suppose that the node v1 is reached while traversing the Barnes-Hut tree to compute the force on a particle p. Either the tree traversal stops at some vi (1
i
k) or the traversal proceeds to the children of vk. Let l (vi ) be the length of the cell, cm(vi ) be the center of mass, and M (vi ) be the total mass of the particles in the cell represented by node vi . Note that
If the traversal stopped at some vi (1
i
k),
is the accuracy criterion. Since vj is the only child of vj-1
(1 < j
k), the force contributed by the subtree rooted at v1 is the force between p and a mass of M (vi ) located at cm(vi ). In traversing the modified tree, vk is reached instead of v1. Since k
If the Barnes-Hut tree traversal proceeds to the children of vk, the same happens in the modified tree also. The force contributed by the subtree rooted at v1 in the Barnes-Hut tree is the force contributed by the subtree rooted at vk , which is the same for both trees. In either case, the force computations give the same result on both trees. The modified tree rectifies the unbounded nature of the Barnes-Hut tree without changing the force calculations of the Barnes-Hut algorithm. However, it improves the Barnes-Hut algorithm in two important ways: First, the running time of the algorithm is reduced. In fact, the worst-case traversal on the Barnes-Hut tree is unbounded as the BH tree is unbounded. The modified tree also allows for more accurate error estimation. The error in approximating the force between a particle p and a cluster of particles by treating the cluster as a single particle of equivalent mass located at the center of mass is proportional to
, where dr is the radius of the cluster and r is the distance of its center of mass from p. In the Barnes-Hut algorithm, the error created by treating the cell represented by node vi as a single particle is therefore proportional to
i
k), the error made is computed to be proportional to
Greengard's fast multipole algorithm [7] is a two- pass procedure on the BH tree. The first pass is a bottom-up traversal of the tree in which a p-term multipole expansion is formed at every node of the tree, where p is a precision parameter. The multipole expansions at the leaves are computed directly. At any internal node, the multipole expansion is formed by shifting the multipole expansions of the child nodes to the center of the cell represented by the node and adding them together. In the second pass, the tree is traversed top-down to compute the local expansions at every node. The local expansion at a node is formed by shifting the local expansion at the parent node to its center, shifting the multipole expansions of the well-separated children of the nearest neighbors of the parent of the node to its center and adding them together. Finally, the local expansions at every leaf are evaluated to compute the approximate cumulative force on each particle. For a detailed description of Greengard's algorithm, see [7].
Consider a run of the Greengard's algorithm on the BH tree containing a path v1, v2, ... , vk , where each node represents the same particles. Since vi is the only child of vi -1 (1 < i
k), the multipole expansion at vi -1 is formed by shifting the multipole expansion of vi to the center of the cell represented by vi -1. The multipole expansions at these nodes are merely translations of one another. Since v1, v2, ... , vk is a chain, the multipole expansions at these nodes are useful only to compute the multipole expansion of v1's parent. However, the contribution by v1's multipole expansion to the multipole expansion of its parent can be directly obtained by shifting the multipole expansion of vk to the center of the cell represented by the parent of v1. Thus, computing the multipole expansions at v1, v2, ... , vk-1 is unnecessary and is avoided by the modified tree. A similar argument shows that the correct local expansions at the leaves can be obtained using the modified tree.
In the multipole algorithm designed to run on the modified tree, the precision parameter p is a constant since it can be chosen independent of N. In the Greengard's algorithm, p has a lower bound of log N. This is because p is also used as an upper bound on the worst-case path length (log
) of the BH tree, which has a lower bound of log N. Therefore, p cannot be chosen independent of N and is also a function of the distribution of the particles. In the multipole algorithm on the modified tree, the precision parameter is merely a function of the desired accuracy of the force calculations chosen independent of the number and distribution of the particles.
The new algorithm consists of two traversals of the modified tree. Computing the p-term multipole/local expansions at a node take constant time. Evaluating a p-term local expansion for every particles also takes constant time. Since the number of nodes in the modified tree is O(N), running the multipole algorithm on the modified tree takes O(N) time. This is irrespective of the distribution of the particles.
The running time of this algorithm depends on the complexity of the tree creation and the complexity of performing the force calculations. It is already noted that the force computations can be performed in O(N) time on the modified tree. In the next section, we show that the modified tree can be created in O(N log N) time in two dimensions and in O(N log2 N) time in three dimensions. Thus, the new multipole algorithm has a complexity of O(N log N) in two dimensions and O(N log2 N) in three dimensions.
This section describes an algorithm to construct the modified tree for a collection of N particles. The concepts are illustrated with two-dimensional figures for convenience, but the results are applicable to three-dimensional problems as well. First, some terminology:
The physical space containing the particles is subdivided using cells. A cell is completely determined by the length of an edge of the cell and the position of one of the corners of the cell. The corner is chosen to be the point in the cell with the smallest value for each coordinate. In two dimensions, this is the lower, leftmost corner. Let l be the length of a cell. In order to describe the subcells of this cell, the corner of this cell is taken to be the origin. The cell contains 2kd cells of length
. The cells are positioned at (i
, j
) (0
i, j < 2k -1) in two dimensions. A line is called a k-boundary if it contains an edge of a cell of length
. There are 2k + 1 lines parallel to each axis and spaced
apart that are k-boundaries. The intersections of the k-boundaries determine the cells of size
. Any k-boundary is also a j-boundary for every j > k. See Figure 5. In three dimensions, a k-boundary is a plane containing a surface of a cell of size
. Note that the description of the subcells and the boundaries is relative to a cell.
A simple recursive algorithm for creating the modified tree for a cell containing a collection of particles is given below:
BuildTree(c)
BuildTree is initially called with a cell large enough to contain all the particles in the system. The running time of BuildTree can be computed by the amount of work done at every node of the modified tree, which is steps 1-4 and 6. Steps 4 and 6 require a constant amount of work at every node of the modified tree. Steps 2 and 3 can be accomplished as a byproduct of Step 1, as we shall see later. Step 1 can be accomplished as follows:
Let l be length of the cell c passed as input to BuildTree. Any cell smaller than c but contained in c has length
for some k > 0. By a suitable transformation, the corner of c is chosen to be the origin. Let b be the smallest box (a rectangle in two dimensions) containing all the particles of c. The rectangle is given by (xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax), where xmin is the smallest x coordinate of all the particles in c etc. The smallest cell in c containing all the particles should also contain the box b. A cell of size encloses b iff no k-boundary passes through b (see Figure 5). The smallest cell enclosing b is of size
,where k is the smallest integer for which a k-boundary passes through b. To determine this, we can examine boundaries parallel to each coordinate axis in turn.
Consider boundaries parallel to the y-axis. These can be specified by their distance from the y-axis. The family of k-boundaries is specified by i
,0
i
2k .
We need to find the smallest integer k such that a k- boundary parallel to y-axis passes through b, i.e. the smallest k such that xmin< i
< xmax for some i. By minimality of k, only one k-boundary passes through b. Let j be the smallest integer such that
< (xmax - xmin).
There is at least 1 and at most 2 j-boundaries passing through b. These boundaries are given by
and
. Since k
j, any k-boundary is also a j-boundary, forcing the k-boundary passing through b to coincide with h1 or h2. Let a be
.
and h2 = h1 or (a + 1)
. If
, let a' be the even integer among a and a+1. Otherwise, let a' be equal to a. It is clear that j - k is equal to the highest power of 2 that divides a'. One way to find this is
. Since all the above operations take constant time, the smallest cell contained in c enclosing the box b can be determined in constant time.
It is already established that the modified tree has O(N) nodes. The tree is created top-down starting at the root. At each node, the particles with the smallest and the largest coordinates in each dimension (xmin, xmax, ymin and ymax in two dimensions) are computed to identify the smallest box enclosing all the particles represented by the node. The smallest cell enclosing this box is computed and the children of the node determined in constant time. Note that the particles are not distributed among the child nodes. Such a distribution would result in O(N2 ) time for tree creation. Distributing the particles to the child nodes is not necessary provided we can determine the particles with extreme coordinates in the child nodes. Except for this task, the rest of the computations are done in constant time per node, for a total of O(N) time.
In BuildTree, we also need to determine cases where the cell contains exactly one particle or none. This can be determined as a byproduct of the computation of the smallest box b containing all the particles in the cell. If xmin = xmax and ymin = ymax, the cell contains exactly one particle. If the answer to any of the 4 queries is < none >, the cell is empty and can be discarded.
Finding the points with extreme coordinates can be translated to a range query problem, stated as follows: Given N points, set up a data structure to answer queries of the form 'which point has the smallest x-coordinate among the points that lie in a given square?' efficiently. Since the modified tree has O(N) nodes and we require four such queries per node (eight in three dimensions), the number of queries is O(N). The range query problem can be solved in O(N log N) preprocessing time and O(log N) query time in two dimensions. The corresponding times for three dimensions are O(N log2 N) and O(log2 N) respectively. The solution makes use of Priority Search Trees [12] and is omitted in the interest of brevity. The time required for preprocessing and answering O(N) queries is O(N log N) in two dimensions and O(N log2 N) in three dimensions. Consequently, the modified tree can be created in O(N log N) time in two dimensions and O(N log2 N) time in three dimensions.
The N-body problem has a lower bound of
(N) to compute all pairwise interactions. In light of the proof that the Greengard's method is not O(N), the fastest distribution-independent algorithm has a complexity of O(N log N) in two dimensions and O(N log2 N) in three dimensions. This complexity is mainly due to the hierarchical tree creation. A linear time algorithm to update the tree structure results in an algorithm with complexity matching the lower bound. The possibility of such an algorithm remains to be investigated.
The authors wish to thank Dr. Ravi Janardhan for suggesting the use of priority search trees to solve range query problems.