Dynamic Repartitioning of Adaptively Refined Meshes

Kirk Schloegel, George Karypis, and Vipin Kumar
Supercomputing, 1998
Download Paper
Abstract
One ingredient which is viewed as vital to the successful conduct of many large-scale numerical simulations is the ability to dynamically repartition the underlying adaptive finite element mesh among the processors so that the computations are balanced and interprocessor communication is minimized. This requires that a sequence of partitions of the computational mesh be computed during the course of the computation in which the amount of data migration necessary to realize subsequent partitions is minimized, while all of the domains of a given partition contain a roughly equal amount of computational weight. Recently, parallel multilevel graph repartitioning techniques have been developed that can quickly compute high-quality repartitions for adaptive and dynamic meshes while minimizing the amount of data which needs to be migrated between processors. These algorithms can be categorized as either schemes which compute a new partition from scratch and then intelligently remap this partition to the original partition (hereafter referred to as scratch-remap schemes), or multilevel diffusion schemes. Scratch-remap schemes work quite well for graphs which are highly imbalanced in localized areas. On slightly to moderately imbalanced graphs and those in which imbalance occurs globally throughout the graph, however, they result in excessive vertex migration compared to multilevel diffusion algorithms. On the other hand, diffusion-based schemes work well for slightly imbalanced graphs and for those in which imbalance occurs globally throughout the graph. However, these schemes perform poorly on graphs that are highly imbalanced in localized areas, as the propagation of diffusion over long distances results in excessive edge-cut and vertex migration results. In this paper, we present two new schemes for adaptive repartitioning: Locally-Matched Multilevel Scratch-Remap (or LMSR) and Wavefront Diffusion. The LMSR scheme performs purely local coarsening and partition remapping in a multilevel context. In Wavefront Diffusion, the flow of vertices move in a wavefront from overbalanced to underbalanced domains. We present experimental evaluations of our LMSR and Wavefront Diffusion algorithms on synthetically generated adaptive meshes as well as on some application meshes. We show that our LMSR algorithm decreases the amount of vertex migration required to balance the graph and produces repartitionings of similar quality compared to state-of-the-art scratch-remap schemes. Furthermore, we show that our LMSR algorithm is more scalable in terms of execution time compared to state-of-the-art scratch-remap schemes. We show that our Wavefront Diffusion algorithm obtains significantly lower vertex migration requirements, while maintaining similar edge-cut results compared to state-of-the-art multilevel diffusion algorithms, especially for highly imbalanced graphs. Furthermore, we compare Wavefront Diffusion with LMSR and show that the former will result in lower vertex migration requirements and the later will result in higher quality edge-cut results. These results hold true regardless of the distance which diffusion is required to propagate in order to balance the graph. Finally, we discuss the run times of our schemes which are both capable of repartitioning an eight million node graph in under three seconds on a 128-processor Cray T3E.
Research topics: Graph partitioning | Parallel processing | ParMETIS