
For the first working area, we have extended the applications of this algorithm to external flows. To further speed up the simulation in viscous three-dimensional problems, we have studied and implemented a compact scheme which requires a much lower communication to computation ratio and has shown promise in the sample problem of a heat transfer in a plate simulation. The second working area involves comparison of various domain decomposition strategies of unstructured grids, and developing an optimum approach according to computer architecture.
Baker has written a numerical fluid dynamics code to study the performance characteristics and opportunities of parallel computers. The code models the behavior of inviscid, compressible fluid particles using an upwind scheme with an explicit, multilevel time integration. Both supersonic and subsonic flow regimes and various grid configurations have been examined.
This research work has led to the publication of a Ph.D thesis by Aluru under the supervision of John Gustafson and a presentation at the Supercomputing '94 conference. Aluru has left Ames Lab for NPAC, but continues to collaborate with Gustafson on the publication and presentation of these results.
A new version obtains an expression that gives the minimum number of NAND gates required to realize logical operations, for example, an n-bit adder. We have discovered in 1994 the surprising result that logical operations are much more difficult to optimize than algebraic operations, even when the vocabulary of possible operations is restricted to a single type like a NAND. Conversely, this may imply that searches become more tractable when used on higher-level constructs because higher-level operators tend to have more pruning rules and more dramatic effects from pruning.
The hoped-for parallelism in the resulting algorithm has materialized; early versions were very resistant to any method of distributing the data. Our method permits each processor to work asynchronously using "best available" information.
The SLALOM work led to a paradox posed in the 1993 report: either the Monte Carlo method is obsolete because the convergence is inferior to hierarchical methods (which seems to fly in the face of statistical theory) or it is comparable (which means the hierarchical methods have been oversold because of overlooked and unremovable sources of error or uncertainty in the answer). This paradox has been recently resolved, using a combination of theory and computational experiment; it is the latter case. Monte Carlo has identical convergence rate for two-dimensional problems, and slightly superior convergence rate for three-dimensional problems. A flaw was found in Hanrahan's "proof" of linear complexity. An argument has been found for order n exp(-1/2) error for two-dimensional hierarchical radiosity, the same as Monte Carlo. As a result, the remaining advantage of the hierarchical method is that it provides a rigorous bound whereas Monte Carlo can only provide a statistical range for the answer. Also as a result, we are renewing our interest and confidence in the Monte Carlo method as a viable approach to realistic view-independent rendering.
A conventional ray tracer (MESA) set up for Cray vector computers was ported to the nCUBE 2S at the SCL. This ray tracer was used in conjunction with the Adventures in Supercomputing program. Much of the port involved the removal of wasteful static memory allocation that prevented the program from fitting into a 4 megabyte nCUBE node. A typical frame required 40 seconds on a single CRAY Y-MP processors, and surprisingly only 45 seconds on a single nCUBE 2S processor. (Ray tracing tends to stress subroutine call overhead, memory latency, square roots, and divide operations that are relative strengths of the nCUBE and relative weaknesses of the CRAY). Thus, the 256 processors of the nCUBE 2S could be used in parallel to produce the independent frames for a Quick Time animation over 200 times faster than the CRAY. Since the nCUBE is also at least 50 times less expensive per hour to operate than the CRAY, the net price-performance improvement was about four orders of magnitude. The solution is complete in that all pieces of the software, from wire-frame model to Quick Time animation, are finished and working.
With the acquisition of a shared memory multiprocessor, Don is looking for programming tools targeting both distributed and shared memory. Qisun Feng has been collecting information and software from other sites to help make this decision. The performance tools will be extended to parallel versions of C++ and Fortran 90.
Don has also been working with Zhiming Sun of the Ames Lab Applied Mathematical Sciences Program for performance improvements of his wave propagation program. So far, the runtime has been reduced by 60% on a single processor and a parallel version is being considered.
We will be working closely with Lou Fishman of the Applied Mathematical Sciences Program on this project.
Conversion of the UTCOMP petroleum reservoir simulator, which is currently F77 for a Cray will be continued. The conversion path is to "clear text" F77 then F90 and HPF, with initial target systems being the local SCL multiprocessor systems (SGI, nCube, Intel). The intent is to study the conversion process, and build support tools for code conversion, incremental testing, and new code development. This requires testing of the new Fortran D and HPF compilers, installation of a new linear solver package, testing of the revised code and returning it to the authors at Univ. of Texas. We plan to bullet-proof the support tools, release them to other sites, and apply them to other projects.
We hope to be doing novel approaches to virtual reality by 1996, with the attitude that VR be the ultimate output of most scientific simulations. We note that the mainstream VR community has a workstation focus and is behind in thinking about MPP to solve some of its most daunting computer problems. We now have an excellent feel for the tradeoffs involved in the only two approaches that have the potential to solve Kajiya's Rendering Equation, and both are highly amenable to parallel decomposition.
N-body Algorithms
Aluru proved in late 1993 that the celebrated Greengard multipole algorithm is not order N as had been claimed, but instead is order N(log N)4. This astonishing result stems from a flaw in Greengard's decade-old proof that failed to recognize that machine precision and the number of distinct particles are related. All of the newer methods for many-body mechanics (Greengard, Appel, Barnes-Hut) fail unless the distribution of particles is "roughly uniform." Aluru has designed a new algorithm that avoids any restrictions on particle distribution and yields the same complexity as the Greengard or Appel methods. It also permits more accurate error estimation. The work is expected to have significant impact on future research in N-body methods. It has also opened up a variety of new problems in the field of N-body methods and a possibility of applying computational geometric techniques to solve the N-body problem.
Collaboration with Solid-State Physics
Charles Shorb and David Turner identified the computational kernel taking the bulk of the time on the solid state physics simulations being run on the Intel Paragon at the SCL. By applying loop unrolling and hand-tuned assembler, the kernel was made over four times faster, with commensurate effects on the overall program speed. Considerable use was made of the parallel and pipeline facilities within the Intel i860/XP processor that are not exploited by the compiler.
Applications of Parallel Exhaustive Search
Mulupuru and Gustafson have explored different uses of the massively parallel search program template originally applied to reducing the algebraic complexity of small expressions. Each use of the program involves crafting a different set of "pruning rules," rules that eliminate large parts of the search tree. They can be rigorous in that they remove only redundant parts of the search, or "aggressive" in that they remove parts of the search that seem to have low probability of success. The latter type trades provable optimality for increased possibility of discovery of an improved method, and is the reason for the discovery of the 5-multiplication cross product in 1993.
SLALOM
SLALOM is a hierarchical radiosity rendering program. Unlike other radiosity rendering programs, SLALOM puts an upper and lower bound on the radiosity of a given area, and tightens that bound with further computation. Other approaches to the radiosity problem simply estimate the radiosity of a given area and accept that answer so long as it produces a "pretty picture." Our approach guarantees the correctness of the answer, whereas other methods can easily be made to fail catastrophically on some geometries, no matter how much computing effort is used. We have currently made some breakthroughs on this project that allow the program to run in much less time and produce a much higher quality answer.
Ray Tracing
Gustafson and Sankaranarayanan have developed a ray tracing rendering program which does backward ray tracing (from the viewer) that accounts for polarization induced by ordinary reflection and refraction. We have confirmed our suspicion that the appearance of everyday objects is profoundly affected by the elliptical polarization of light from specular reflection. This is an unexpected result for an area thought to have been studied exhaustively by the graphics community for two decades.
Parallel Languages and Program Refinement Techniques
Don Heller joined the SCL in April of 1994 and is in the process of converting existing Fortran 77 programs to Fortran 90 and High Performance Fortran with the aid of a program rewrite system he is developing. It is recognized that fully automatic translation of F77 to F90 is not yet possible, so he is developing a method of rewriting the program to a higher level notation that is translated to any one of the target Fortran languages. This enables testing to be carried forward with existing systems before final delivery of HPF compilers and while the conversion is half finished. A preliminary version has been written and tested with the petroleum reservoir simulator UTCOMP from the University of Texas Petroleum engineering Department and Fortran D, a precursor of HPF from the Rice University Computer Science Department.
Technical Progress Expected In FY 1995
Radar Cross-Section by Hierarchical Methods
We are attempting to hire a post doc who will target the Radar Cross Section (RCS) problem. We plan to investigate the application of the hierarchical solvers developed at the SCL to the RCS problem. The current approaches to RCS involve very wasteful use of supercomputers to solve dense systems of equations with over 100,000 unknown quantities. The RCS community has rejected more efficient methods, possibly because of a quality-speed tradeoff. The SCL methods appear capable of simultaneously improving the validity of the answer and reducing the execution time by several orders of magnitude. We will renew ties with the defense aerospace researchers that work in this area to better understand both technical and nontechnical reasons why the older methods are still in use.
MPP Compiler Optimization
The MPP search program that produces provably optimal algebraic procedures for computing expressions can be easily modified to treat the repertoire of a computer instruction set instead of simply addition, subtraction, and multiplication. We hope to use MPP to optimize blocks of code consisting of about 10 operations at a time, and compare the very aggressive code thus produced with the code produced using commercially-available optimizing compilers. Early experiments indicate that a doubling of speed over the best commercial compilers is not difficult to achieve by this method.
Computational Fluid Dynamics
Steve Heistand has made progress on a CFD program that will work for both sub-sonic and super-sonic flow speeds. This program will run on our parallel machines, currently targeting the Paragon and the SGI Power Onyx. In its final form it will have built in grid adaption and domain decomposition which will be done in parallel as well. With the departure of Susan Ying, we will wind down our effort in CFD and use Heistand to help Don Heller with his tools for the creation and maintenance of large portable parallel application programs.
Three-body Interactions
We believe we have an approach to hierarchical methods involving interacting triples instead of pairs. This means that force calculations for the N-body problem that involve directional chemical bonds will be amenable to hierarchical methods and experience a dramatic reduction in computational complexity. In the radiation transfer application, the use of triples has been successfully demonstrated for nondiffuse surface reflectivities. We will attempt to extend the SLALOM algorithm to surfaces of non-diffuse reflective properties, and extend Greengard's N-body method to 3-body interactions. When the forward ray-tracing program is complete, we will be able to do fair comparisons of the two completely different methods for full Rendering Equation solution: Hierarchical and Monte Carlo.
Ray Tracing
We will change the polarized light ray tracer to do forward ray tracing. Because the output of forward ray tracers involves complete surface emittance properties, the rendering from different viewpoints can be done without recalculation. The program will thus be capable of rendering images of high quality at very high speeds for animation and virtual reality applications. The Monte Carlo nature of the ray tracer will allow the comparison of convergence properties to hierarchical methods, as described above.
Parallel Languages and Program Refinement Techniques
Certain idioms arise with sufficient frequency that a compact representation of the numerical method can be used to generate a variety of implementations on parallel systems. This approach will be used to develop templates and compact programs for petroleum reservoir simulation.
Technical Progress Expected In FY 1996
"Immortal Code"
The work on "immortal code" described under Performance Analysis involves the creation of example algorithms. These algorithms are more than examples, however. When produced under the restrictions of the "three-stage contract" model that involves the end-user, application programmer, compiler/system software designer, and hardware engineer, the result is a formidable mathematical accomplishment. We plan to produce several examples that solve classical problems such as equation solving or root finding, each of which will be a compendium of everything that is known about solving that problem with various algorithms on different computer architectures for specified dimensions of performance.
Hierarchical Algorithms
Work in hierarchical algorithms will continue through 1995. We hope to apply them to ab initio quantum chemistry calculations, where order N4 complexity has long limited the size of the molecules that can be studied with first-principles accuracy.
Optimization
Research in optimization will likely proceed in several directions: provably optimal circuit design and a massively parallel C or Fortran program optimizer are likely areas for activity, and we will continue to look for pruning rules that will bring larger logic problems and algebraic optimizations (like Strassen multiplication) within reach.
Informational Software Tools
A project will be initiated to develop advanced software tools to access large multimedia datasets. The project will focus on the design and implementation of sophisticated yet user friendly navigational and indexing tools. One other task in this project will be devoted to implementing tools to categorize and access individual components of information systems according to current library standards such as Z39.50.
Technical Progress Expected In FY 1997
Return to Publications page.
This page prepared by Maria E. Blanco.
The URL for this document is http://www.scl.ameslab.gov
Revised