Parallel Multilevel Diffusion Algorithms for Repartitioning of Adaptive Meshes

Kirk Schloegel, George Karypis, and Vipin Kumar
UMN CS #97-014, 1997
Download Paper
Graph partitioning has been shown to be an effective way to divide a large computation over an arbitrary number of processors. A good partitioning can ensure load balance and minimize the communication overhead of the computation by partitioning an irregular mesh into $p$ equal parts while minimizing the number of edges cut by the partition. For a large class of irregular mesh applications, the structure of the graph changes from one phase of the computation to the next. Eventually, as the graph evolves, the adapted mesh has to be repartitioned to ensure good load balance. Failure to do so will lead to higher parallel run time. This repartitioning needs to maintain a low edge-cut in order to minimize communication overhead in the follow-on computation. It also needs to minimize the time for physically migrating data from one processor to another since this time can dominate overall run time. Finally, it must be fast and scalable since it may be necessary to repartition frequently. Partitioning the adapted mesh again from scratch with an existing graph partitioner can be done quickly and will result in a low edge-cut. However, it will lead to an excessive migration of data among processors. In this paper, we present new parallel algorithms for robustly computing repartitionings of adaptively refined meshes. These algorithms perform diffusion of vertices in a multilevel framework and minimize data movement without compromising the edge-cut. Furthermore, our parallel repartitioners include parameterized heuristics to specifically optimize edge-cut, total data migration, or the maximum amount of data migrated into and out of any one processor. Our results on a variety of synthetic meshes show that our parallel multilevel diffusion algorithms are highly robust schemes for repartitioning adaptive meshes. The resulting edge-cuts are close to those resulting from partitioning from scratch with a state-of-the-art graph partitioner, while data migration is substantially reduced. Furthermore, repartitioning can be done very fast. Our experiments show that meshes with around eight million vertices can be repartitioned on a 256-processor Cray T3D in only a couple of seconds.
Research topics: Graph partitioning | Parallel processing | ParMETIS | Scientific computing