SCL Cluster Cookbook
"More is better" almost always applies to cache. Running a problem out of main memory rather than out of cache slows performance considerably. The Scalable Computing Lab's HINT benchmark shows this effect clearly: every system's performance curve tails off as the problem size grows beyond the size of the system's cache.
Accesses to DRAM are extremely slow compared to the speed of the processor and can take orders of magnitude longer than a single CPU clock cycle. Caches keep recently used blocks of memory available for very fast access in case the CPU references a word from that block again. However, the very fast memory used for cache is expensive, and cache control circuitry becomes more complex as the size of the cache grows. Because of these limitations, the total size of a cache is usually in the range of 8KB to 2MB.
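The benefit of keeping recently used blocks can be illustrated with a toy simulation. The following is a minimal sketch, assuming a direct-mapped cache (each memory block can live in exactly one cache line); the function names, the 8KB cache size, and the 32-byte block size are illustrative assumptions, not a description of any particular CPU.

```python
def hit_rate(cache_bytes, block_bytes, addresses):
    """Fraction of accesses served by a toy direct-mapped cache."""
    num_lines = cache_bytes // block_bytes
    tags = [None] * num_lines          # block currently held by each cache line
    hits = 0
    total = 0
    for addr in addresses:
        block = addr // block_bytes    # memory block containing this address
        line = block % num_lines       # the single line that block may occupy
        if tags[line] == block:
            hits += 1                  # block is still cached: fast access
        else:
            tags[line] = block         # miss: block must be fetched from DRAM
        total += 1
    return hits / total if total else 0.0


def sweep(working_set_bytes, passes=4, stride_bytes=32):
    """Yield addresses for repeated passes over an array, one access per block."""
    for _ in range(passes):
        for addr in range(0, working_set_bytes, stride_bytes):
            yield addr


CACHE = 8 * 1024   # hypothetical 8KB cache
BLOCK = 32         # hypothetical 32-byte blocks

small = hit_rate(CACHE, BLOCK, sweep(4 * 1024))    # working set fits in cache
large = hit_rate(CACHE, BLOCK, sweep(16 * 1024))   # working set is twice the cache
print(small, large)  # 0.75 0.0
```

With a 4KB working set, only the first of the four passes misses; with a 16KB working set, every block is evicted before it is referenced again, so no access hits. Real caches and real programs behave less starkly, but the cliff at the cache size is the same effect that appears in the HINT curves.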
Because larger caches are slower, and because a high percentage of memory accesses go to a small area of memory, caches are often divided into levels that grow progressively larger and slower. Level 1 (or L1) cache is often a small (8KB to 64KB), very fast cache built into the CPU itself that can very quickly satisfy the majority of memory references. Level 2 (L2) is a much larger but slightly slower cache, physically located outside the CPU chip, that can satisfy the majority of references that miss the L1 cache. When the L2 cache is physically separate from the CPU itself, it can usually be sized to match a system's needs.
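The trade-off between cache levels can be made concrete with the standard average-access-time calculation. This is a rough sketch only: the function name and every latency and miss rate below are hypothetical values chosen for easy arithmetic, not measurements of any processor discussed here.

```python
def average_access_time(l1_time, l1_miss, l2_time, l2_miss, dram_time):
    """Average cycles per memory reference for an L1 -> L2 -> DRAM hierarchy.

    l1_miss is the fraction of all references that miss L1.
    l2_miss is the fraction of those L1 misses that also miss L2.
    """
    return l1_time + l1_miss * (l2_time + l2_miss * dram_time)


# Hypothetical figures: 1-cycle L1, 8-cycle L2, 64-cycle DRAM.
# Without an L2, every L1 miss goes straight to DRAM.
no_l2 = average_access_time(1, 0.125, 0, 1.0, 64)
with_l2 = average_access_time(1, 0.125, 8, 0.25, 64)
print(no_l2, with_l2)  # 9.0 4.0
```

Under these assumed numbers, an L2 that catches three quarters of the L1 misses cuts the average cost of a memory reference by more than half, even though the L2 itself is several times slower than the L1.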
With the Pentium Pro and Pentium II CPUs, however, Intel attaches the L2 cache directly to the processor. Thus the choice of processor determines the size of the L2 cache. The Pentium Pro is offered with a variety of L2 cache sizes (256KB to 1MB), while the Pentium II is offered only with a 512KB L2 cache. The Pentium II's L2 cache is slower than the Pentium Pro's, which slightly offsets the Pentium II's performance improvements over the Pentium Pro.
Figure 1 below shows the effect of cache on performance. Using data obtained from Ames Lab's HINT benchmark, the graph clearly shows the points where the L1 cache (the fastest cache) runs out: 8KB for the Pentium Pro and 16KB for the Pentium II. In the range where the L2 cache is active, the Pentium Pro has a slight performance advantage despite the Pentium II's faster clock rate. Only when the HINT problem size exceeds the Pentium Pro's L2 cache does the Pentium II regain a performance edge. Possibly due to differences in memory technology (FPM or EDO vs. SDRAM) or chipset design, the benchmarked Pentium Pro loses more performance than the Pentium II does when going to main memory.
Figure 1. HINT Comparison of Pentium Pro and Pentium II