Segfault using ParMETIS

Hello,

I am using ParMETIS for partitioning tetrahedral grids via:


ParMETIS_V3_PartKway (idxtype *vtxdist, idxtype *xadj, idxtype *adjncy, idxtype *vwgt, idxtype *adjwgt,
int *wgtflag, int *numflag, int *ncon, int *nparts, float *tpwgts, float *ubvec,
int *options, int *edgecut, idxtype *part, MPI_Comm *comm);

Grids with up to 13000 elements (tetrahedra) are decomposed correctly; larger grids, however, result in a segfault:


[~-devel:20359] *** Process received signal ***
[~-devel:20359] Signal: Segmentation fault (11)
[~-devel:20359] Signal code: Address not mapped (1)
[~-devel:20359] Failing at address: 0x48ee
[~-devel:20359] [ 0] [0xffffe600]
[~-devel:20359] [ 1] /home/ckonrad/lib/libparmetis.so.3.1(ParMETIS_V3_PartMeshKway+0x189) [0xf7fb6789]
[~-devel:20359] [ 2] parpart(main+0x6fd) [0x804e3cd]
[~-devel:20359] [ 3] /lib/libc.so.6(__libc_start_main+0xe0) [0xa55f70]
[~-devel:20359] [ 4] parpart [0x804dbc1]
[~-devel:20359] *** End of error message ***

I really have no idea what causes this, so here is how I call the function:


ParMETIS_V3_PartMeshKway( elmdist, eptr, eind, NULL,
&wgtflag, &numflag, &ncon, &ncommonnodes, &nparts,
tpwgts, &ubvec, options,
&edgecut, part, & comworld );

I assume elmdist, eptr, and eind are correct. Maybe I am making a mistake with the weights; I want equally sized partitions. The documentation of this function seems to be copy-and-pasted from the corresponding graph partitioning routine: for example, the description of wgtflag references parameter names that do not appear in the definition of PartMeshKway.

I set: wgtflag = 0; numflag = 0; ncon = 0; ncommonnodes = 2; nparts = (between 2 and 50); tpwgts = [1/nparts ... 1/nparts] (nparts elements);
ubvec = 1.05 (1 element); options[0] = 0; options[1] and options[2] are arbitrary.
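
For reference, the weight arrays for equally sized partitions could be set up roughly as follows. This is only a sketch: as far as the ParMETIS manual describes it, tpwgts holds ncon * nparts entries and ubvec holds ncon entries, so the sketch assumes ncon = 1 rather than 0. The helper names uniform_tpwgts and uniform_ubvec are hypothetical and not part of ParMETIS.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: target weights for equally sized partitions.
// Assumption: tpwgts has ncon * nparts entries, each set to 1/nparts.
std::vector<float> uniform_tpwgts(int ncon, int nparts) {
    assert(ncon >= 1 && nparts >= 1);
    return std::vector<float>(static_cast<std::size_t>(ncon) * nparts,
                              1.0f / static_cast<float>(nparts));
}

// Hypothetical helper: one imbalance tolerance per constraint
// (1.05 is the value used in this thread).
std::vector<float> uniform_ubvec(int ncon, float tolerance = 1.05f) {
    assert(ncon >= 1);
    return std::vector<float>(static_cast<std::size_t>(ncon), tolerance);
}
```

With these, the call would receive tpwgts.data() and ubvec.data() instead of tpwgts and &ubvec. Whether ncon = 0 is the cause of the crash is not established here.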

Thank you for any hints,

Christian

RE: Can you open an issue on the

Can you open an issue on the flyspray issue tracking system?

thanks.

RE: even MeshToDual segfaults

Hello,

I have now used MeshToDual to convert my mesh to a graph as a first step. The mesh-partitioning function itself first converts the mesh to a graph, and since I also get a segfault when I call the conversion function manually, there must be something wrong with the parameters I pass to the mesh-partitioning routine.


ParMETIS_V3_Mesh2Dual(elmdist, eptr, eind, &numflag,
& ncommonnodes, &r1, &r2, & comworld);

The example that crashes has 19589 tetrahedra. Hence elmdist computes to [0 4897 9794 14691 19589], which seems okay.
eptr = [0 4 8 12 ... 4*4897] for the first processor (tetrahedra have 4 nodes). The eind array has dimension 4*4897 for the first processor and stores, consecutively, the 4 nodes of each of the first 4897 tetrahedra.
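
The layout described above can be sketched as follows. The helper names make_elmdist and make_eptr are hypothetical, int stands in for idxtype (an assumption about the build), and the remainder of the block distribution is assumed to go to the last process, which matches the numbers quoted above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: block distribution of nelems elements over nprocs
// processes; the last process takes the remainder.
std::vector<int> make_elmdist(int nelems, int nprocs) {
    assert(nelems >= 0 && nprocs >= 1);
    std::vector<int> elmdist(nprocs + 1);
    const int block = nelems / nprocs;
    for (int p = 0; p < nprocs; ++p) {
        elmdist[p] = p * block;   // first element owned by process p
    }
    elmdist[nprocs] = nelems;     // one past the last element overall
    return elmdist;
}

// Hypothetical helper: eptr for nlocal elements with a fixed number of
// nodes each (4 for tetrahedra). It has nlocal + 1 entries.
std::vector<int> make_eptr(int nlocal, int nodes_per_elem = 4) {
    assert(nlocal >= 0 && nodes_per_elem >= 1);
    std::vector<int> eptr(nlocal + 1);
    for (int i = 0; i <= nlocal; ++i) {
        eptr[i] = i * nodes_per_elem;  // offset of element i in eind
    }
    return eptr;
}
```

Note that with 19589 elements on 4 processes the last process owns 4898 elements, not 4897, so its eptr and eind arrays are one element longer than those of the other processes.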

Again numflag = 0, ncommonnodes = 2, r1 and r2 are data arrays for the result and comworld the communicator.
Do you have any ideas what I am doing wrong?

thank you, chris

RE: Can you try the command-line

Can you try the command-line version of METIS's mesh2dual on the mesh to make sure that there are no "bugs" in the mesh itself?

RE: proceeding

Hello,

I tried mesh2dual and in fact I got a segfault. This is because my meshes contain node numbers starting from 0 and not from 1. I modified the meshes, that is, I added one to each node number, and mesh2dual now works on all my mesh files.
I did the same with the input for ParMETIS. However, this didn't help; I see the same behavior as before.

Does a successful mesh2dual call tell me that my file is okay or is it possible that it might produce some kind of graph that also contains errors?

I have now applied ParMETIS_V3_Mesh2Dual to my modified meshes, with the same result as in the partitioning: a segfault...
Is there some tool that checks the mesh file? If I apply the process:
* renumbering from [0 to N] to [1 to N+1]
* mesh2dual
* graphchk

then I get that the resulting graph file is correct. I am puzzled.
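
A basic connectivity check of this kind could also be written by hand. The sketch below is a hypothetical helper, not a METIS or ParMETIS tool. It assumes the tetrahedra are stored as a flat eind array of node numbers, that numflag is the first valid node number (0 or 1), and that nnodes is the total number of nodes.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns an empty string if every tetrahedron has four
// distinct node numbers in [numflag, numflag + nnodes - 1], otherwise a short
// description of the first problem found.
std::string check_tet_connectivity(const std::vector<int>& eind,
                                   int numflag, int nnodes) {
    if (eind.size() % 4 != 0) {
        return "eind length is not a multiple of 4";
    }
    for (std::size_t e = 0; e < eind.size() / 4; ++e) {
        for (int a = 0; a < 4; ++a) {
            const int v = eind[4 * e + a];
            if (v < numflag || v >= numflag + nnodes) {
                return "element " + std::to_string(e) +
                       " has out-of-range node " + std::to_string(v);
            }
            for (int b = a + 1; b < 4; ++b) {
                if (eind[4 * e + b] == v) {
                    return "element " + std::to_string(e) +
                           " repeats node " + std::to_string(v);
                }
            }
        }
    }
    return "";
}
```

Running it with the same numflag that is passed to the library would catch a mesh that was shifted to start from 1 while numflag is still 0, or the other way around.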

chris

RE: partly

Hello,

I executed the code for the conversion to a graph like this:


ParMETIS_V3_Mesh2Dual(...);
cout << world_id << ": done" << endl;

Only one processor crashes; all the others work fine and display the done message. Maybe this is a hint that the problem has to do with writing the result values?

chris

RE: Source code and example files

Hello,

I generated a zip file containing these files:

* par.cpp (my source code)
* Makefile
* 3000.ele
* 5000.ele

You can reproduce my problem with that. Just 'make' and call:

mpirun -c 4 ./par_part 3000.ele (this will work)
mpirun -c 4 ./par_part 5000.ele (this won't)

The zip file is available at: http://home.in.tum.de/~konrad/par.zip

Thanks, chris