Parallel Graph & Mesh Partitioning

Discussions about the routines in ParMETIS

Hi

I have just unpacked the distribution, finished editing makefile.inc, and run the examples from the /GRPAHS directory. They work fine, but when I try to use a 4960x4960 Harwell-Boeing matrix as input, none of the examples seem able to process it; mostly I get broken pipes.

Any idea whether Harwell-Boeing matrices are supported here, and if so, how?
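As far as I know, the ParMETIS example drivers read the plain CSR/METIS graph format, not Harwell-Boeing files, so the matrix has to be converted first. A minimal sketch of one conversion step, assuming the Harwell-Boeing file has already been parsed into symmetric, 0-based CSR arrays (`csr_to_graph` is a hypothetical helper, not part of ParMETIS): the diagonal must be dropped, because ParMETIS graphs may not contain self-loops.

```c
#include <assert.h>
#include <stdlib.h>

/* Convert a symmetric CSR matrix (0-based indices) into a METIS/ParMETIS
 * adjacency structure by dropping diagonal entries: the graph format
 * does not allow self-loops.  The caller frees *xadj_out/*adjncy_out. */
static void csr_to_graph(int n, const int *ia, const int *ja,
                         int **xadj_out, int **adjncy_out)
{
    int *xadj   = malloc((n + 1) * sizeof *xadj);
    int *adjncy = malloc(ia[n] * sizeof *adjncy);   /* upper bound on edges */
    int k = 0;

    xadj[0] = 0;
    for (int i = 0; i < n; i++) {
        for (int p = ia[i]; p < ia[i + 1]; p++)
            if (ja[p] != i)              /* skip the diagonal entry */
                adjncy[k++] = ja[p];
        xadj[i + 1] = k;
    }
    *xadj_out   = xadj;
    *adjncy_out = adjncy;
}
```

If the matrix is structurally unsymmetric, the pattern of A + A^T would have to be formed first, since the graph must be undirected.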

PS: I am using a Sun Blade 150 cluster with OS Solaris 10.

Any help will be greatly appreciated.

Regards
Pinaki R Das

Hi

I've used ParMetis to partition and distribute a graph. I then modify the graph along the way and want to rebalance it across the processes.

So I rebuild the vtxdist, xadj, and adjncy arrays, and things work fine when I call ParMETIS_V3_PartKway(). I don't care about weights, so I keep the weight arrays/parameters as NULL and 0 where needed.

That works fine.
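For reference, the rebuilding step above can be sketched for the simplest case of an even block distribution (`build_vtxdist` is a hypothetical helper, not ParMETIS API; the vtxdist convention itself is from the ParMETIS manual):

```c
#include <assert.h>

/* Rebuild vtxdist for an even block distribution of n global vertices
 * over npes processes.  vtxdist has npes+1 entries, and process p owns
 * global vertices [vtxdist[p], vtxdist[p+1]), per the ParMETIS manual. */
static void build_vtxdist(int n, int npes, int *vtxdist)
{
    vtxdist[0] = 0;
    for (int p = 0; p < npes; p++)
        vtxdist[p + 1] = vtxdist[p] + n / npes + (p < n % npes ? 1 : 0);
}
```

Every process must pass an identical vtxdist; a stale copy on one rank after the graph has been modified is an easy way to get silent corruption.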

If I try using ParMETIS_V3_AdaptiveRepart(), however (with the same inputs, NULL for vsize, and 1000 for itr), the function dies with a core dump and no error message. I have tried setting vsize to all 1's, random numbers, vertex ids --- no difference. What confounds me is that things work properly for ParMETIS_V3_PartKway with the same input!

I must be doing something weird with the inputs, but I can't figure out what. Please help.

Thanks,
Lee.
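One thing worth checking here (an assumption about the crash, not a confirmed diagnosis): the ParMETIS 3.x manual documents tpwgts as an array of ncon*nparts floats and ubvec as an array of ncon floats, and ParMETIS_V3_AdaptiveRepart may be less forgiving of NULL in those slots than ParMETIS_V3_PartKway. A minimal sketch of filling them with uniform targets (`uniform_targets` is a hypothetical helper):

```c
#include <assert.h>
#include <stdlib.h>

/* Fill uniform target partition weights (tpwgts, size ncon*nparts) and
 * imbalance tolerances (ubvec, size ncon), as described in the ParMETIS
 * manual.  The caller frees both arrays. */
static void uniform_targets(int ncon, int nparts,
                            float **tpwgts_out, float **ubvec_out)
{
    float *tpwgts = malloc((size_t)ncon * nparts * sizeof *tpwgts);
    float *ubvec  = malloc((size_t)ncon * sizeof *ubvec);

    for (int i = 0; i < ncon * nparts; i++)
        tpwgts[i] = 1.0f / nparts;   /* equal share for every part */
    for (int c = 0; c < ncon; c++)
        ubvec[c] = 1.05f;            /* allow 5% load imbalance */

    *tpwgts_out = tpwgts;
    *ubvec_out  = ubvec;
}
```

Passing these explicitly (together with ncon = 1 when there is a single constraint) costs nothing and removes one NULL from the call.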

Dear Sir or Madam,

The problem is as follows: when I set nparts to the number of processors, everything works. But when I set it to a different number, I get the following error:
p0_15533: p4_error: interrupt SIGSEGV: 11

I thought it was probably my own mistake, but when I changed the mtest.c file like this:
nparts = npes+3;
I got the same error.

Could you please tell me whether this is a bug in the ParMetis library, or whether I should keep looking for the fault on my side?

Sincerely,
Igor

P.S.: I'm using ParMetis-3.1 and mpich-1.2.7p1 for Linux.
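A plausible (unconfirmed) cause of a SIGSEGV that appears only when nparts != npes is a buffer sized by npes where the ParMETIS manual requires a size depending on nparts (tpwgts, for instance, needs ncon*nparts entries). A hedged sketch of a pre-call sanity check over the inputs (`check_inputs` is a hypothetical helper, not ParMETIS API):

```c
#include <assert.h>

/* Hypothetical pre-call sanity checks on the distributed CSR inputs.
 * nparts is independent of npes, so a value like npes+3 is legal; what
 * must be consistent are the array extents and the vtxdist layout. */
static int check_inputs(int npes, int n_global, const int *vtxdist,
                        int nparts, int ncon)
{
    if (nparts < 1 || ncon < 1)
        return 0;
    if (vtxdist[0] != 0 || vtxdist[npes] != n_global)
        return 0;                              /* must cover all vertices */
    for (int p = 0; p < npes; p++)
        if (vtxdist[p] > vtxdist[p + 1])
            return 0;                          /* must be nondecreasing */
    return 1;
}
```

Running such a check on every rank before the partitioning call at least rules out the caller's arrays before blaming the library.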

I sometimes see deadlocks in ParMetis when using MPICH 1.2.5, and I wondered if anyone else has experienced this.
I can't seem to get consistent failures, so it's difficult to track down. From what I can see, ParMetis exchanges information between processes that share a decomposed graph boundary. These exchanges are of the form:

for each neighbour
    post asynchronous recv

fill send buffers

for each neighbour
    post asynchronous send

wait for all sends/recvs to complete

In theory this should be deadlock-free, so I suspect the issue is with MPICH rather than ParMetis.