BDMPI - Big Data Message Passing Interface  Release 0.1
API Documentation

BDMPI programs are message-passing distributed memory parallel programs written using a subset of MPI's API. As such, developing a BDMPI program is nearly identical to developing an MPI-based parallel program. There are many online resources providing guidelines and documentation for developing MPI-based programs, all of which apply to BDMPI as well.

Though it is beyond the scope of this documentation to provide a tutorial on how to develop MPI programs, the following is a list of items that anybody using BDMPI should be aware of:

  • BDMPI supports programs written in C and C++.
    • Our own testing has been done using C-based BDMPI programs; as such, the C++ support has not been tested.
  • All BDMPI programs must include the mpi.h or bdmpi.h header file, and must call MPI_Init() before starting and MPI_Finalize() after finishing their work.
    • Ideally, MPI_Init() and MPI_Finalize() should be the first and last functions called in main(), respectively.
  • All BDMPI programs should exit from their main() function by returning EXIT_SUCCESS. A minimal skeleton illustrating these points is sketched after this list.
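
The following is a minimal sketch of such a program, assuming BDMPI's own mpi.h header is on the include path:

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
  int rank, size;

  /* MPI_Init() is the first call in main() */
  MPI_Init(&argc, &argv);

  MPI_Comm_size(MPI_COMM_WORLD, &size);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  printf("Hello from process %d of %d\n", rank, size);

  /* MPI_Finalize() is the last call before returning */
  MPI_Finalize();

  return EXIT_SUCCESS;
}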

MPI functions supported by BDMPI

BDMPI implements a subset of the MPI specification that includes functions for querying and creating communicators and for performing point-to-point and collective communications. For each of these functions, BDMPI provides a variant that starts with the MPI_ prefix and a variant that starts with the BDMPI_ prefix. The calling sequence of the first variant is identical to the MPI specification, whereas the calling sequence of the second variant has been modified to make it 64-bit compliant (e.g., most of the sizes that MPI assumes to be int have been replaced with size_t or ssize_t).

Since the calling sequence of these functions is the same as that in MPI's specification (available at http://www.mpi-forum.org), it is not repeated here.
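
As an illustration, the following sketch performs a simple point-to-point exchange using the MPI_-prefixed variants; it assumes the program is run with at least two slave processes and that MPI_Send/MPI_Recv are among the supported point-to-point functions.

#include <stdlib.h>
#include <mpi.h>

/* Rank 0 sends an array of ints to rank 1; the calling sequences are the
   ones defined by the MPI specification. */
int main(int argc, char *argv[])
{
  int rank, i, n = 1024, *buf;
  MPI_Status status;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  buf = (int *)malloc(n * sizeof(int));

  if (rank == 0) {
    for (i = 0; i < n; i++)
      buf[i] = i;
    MPI_Send(buf, n, MPI_INT, 1, 1, MPI_COMM_WORLD);           /* dest=1, tag=1 */
  }
  else if (rank == 1) {
    MPI_Recv(buf, n, MPI_INT, 0, 1, MPI_COMM_WORLD, &status);  /* src=0, tag=1 */
  }

  free(buf);
  MPI_Finalize();
  return EXIT_SUCCESS;
}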

Differences with the MPI specification

  • BDMPI's error checking and error reporting are significantly less robust than what the standard requires.
  • In all collective operations, the source and destination buffers of the operations are allowed to overlap (or be identical). In such cases, the operations will complete correctly and the new data will overwrite the old data (see the sketch after this list).
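
For example, under this relaxation the same array can be passed as both the source and the destination of a reduction. The following is a minimal sketch, assuming MPI_Allreduce is among the supported collectives.

#include <stdlib.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
  int i, rank, vals[4];

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  for (i = 0; i < 4; i++)
    vals[i] = rank + i;

  /* Source and destination are the same buffer; the element-wise sums
     overwrite the local values. (Standard MPI would require MPI_IN_PLACE.) */
  MPI_Allreduce(vals, vals, 4, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

  MPI_Finalize();
  return EXIT_SUCCESS;
}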

BDMPI-specific functions

BDMPI provides a small set of additional functions that an application can use to improve its out-of-core execution performance and get information about how the different MPI slave processes are distributed among the nodes. Since these functions are not part of the MPI specification, their names only start with the BDMPI_ prefix.

These functions are organized in three main groups that are described in the following subsections.

Communicator-related functions

Since BDMPI organizes the MPI processes into groups of slave processes, each group running on a different node, it is sometimes beneficial for the application to know how many slaves within a communicator are running on the same node and the total number of nodes that are involved. To achieve this, BDMPI provides a set of functions that can be used to interrogate each communicator in order to obtain the number of slaves per node, the rank of a process among the other processes on its own node, the number of nodes, and their ranks.

In addition, BDMPI defines some additional predefined communicators that are used to describe the processes of each node. These are described in Predefined communicators.
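
For instance, the node-local rank and the number of slaves on a node can be obtained by querying the BDMPI-specific BDMPI_COMM_NODE communicator with the standard communicator calls. The following is a minimal sketch, assuming the BDMPI-specific symbols are exposed by the bdmpi.h header.

#include <stdio.h>
#include <stdlib.h>
#include <bdmpi.h>

int main(int argc, char *argv[])
{
  int grank, lrank, lsize;

  MPI_Init(&argc, &argv);

  MPI_Comm_rank(MPI_COMM_WORLD, &grank);     /* rank among all processes */
  MPI_Comm_rank(BDMPI_COMM_NODE, &lrank);    /* rank among the slaves on this node */
  MPI_Comm_size(BDMPI_COMM_NODE, &lsize);    /* number of slaves on this node */

  printf("global rank %d is local rank %d of %d on its node\n",
         grank, lrank, lsize);

  MPI_Finalize();
  return EXIT_SUCCESS;
}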

Intra-slave synchronization

On a system with a quad-core processor it may be reasonable to allow 3-4 slaves to run concurrently. However, if that system's I/O subsystem consists of only a single disk, then in order to prevent I/O contention, it may be beneficial to limit the number of slaves that perform I/O concurrently. To achieve such synchronization, BDMPI provides a set of functions that can be used to implement mutex-like synchronization among the processes running on the same slave node. The number of slave processes that can be in a critical section concurrently is controlled by the -nc option of bdmprun (Options of bdmprun).
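
A minimal sketch of such a critical section follows; it assumes the entry and exit calls are named BDMPI_Entercritical() and BDMPI_Exitcritical() (consult bdmpi.h for the exact names) and that bdmprun was started with an appropriate -nc value.

#include <stdio.h>
#include <stdlib.h>
#include <bdmpi.h>

int main(int argc, char *argv[])
{
  int rank;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  BDMPI_Entercritical();   /* assumed name; at most -nc slaves of this node proceed */
  /* ... disk-intensive phase, e.g., reading this slave's chunk of the input ... */
  printf("process %d finished its I/O phase\n", rank);
  BDMPI_Exitcritical();    /* assumed name; allow another slave to enter */

  MPI_Finalize();
  return EXIT_SUCCESS;
}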

Storage-backed memory allocations

BDMPI exposes various functions that relate to its sbmalloc subsystem (Efficient loading & saving of a process's address space). An application can use these functions to explicitly allocate sbmalloc-handled memory and to force the loading/saving of memory regions that were previously allocated by sbmalloc (see the discussion in Issues related to sbmalloc).
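
The sketch below illustrates the intended usage pattern, assuming the allocation, load/save, and release calls are named BDMPI_sbmalloc(), BDMPI_sbload(), BDMPI_sbunload(), and BDMPI_sbfree(); consult bdmpi.h for the exact names and calling sequences.

#include <stdlib.h>
#include <bdmpi.h>

int main(int argc, char *argv[])
{
  size_t i, n = 1 << 28;      /* a large working array (~2GB of doubles) */
  double *x;

  MPI_Init(&argc, &argv);

  x = (double *)BDMPI_sbmalloc(n * sizeof(double));   /* assumed name: storage-backed allocation */

  BDMPI_sbload(x);            /* assumed name: bring the region into memory before use */
  for (i = 0; i < n; i++)
    x[i] = (double)i;
  BDMPI_sbunload(x);          /* assumed name: save the region back to storage */

  BDMPI_sbfree(x);            /* assumed name: release the allocation */

  MPI_Finalize();
  return EXIT_SUCCESS;
}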


Predefined communicators

The following are the communicators that are predefined in BDMPI. Note that both the ones with the MPI_ and the BDMPI_ prefix are defined.

MPI Name         BDMPI Name          Description
MPI_COMM_WORLD   BDMPI_COMM_WORLD    Contains all processes
MPI_COMM_SELF    BDMPI_COMM_SELF     Contains only the calling process
MPI_COMM_NULL    BDMPI_COMM_NULL     The null communicator
N/A              BDMPI_COMM_NODE     Contains all processes in a node
N/A              BDMPI_COMM_CWORLD   Contains all processes in cyclic order

The last two communicators are BDMPI-specific. BDMPI_COMM_NODE is used to describe the group of slave processes that were spawned by the same master process (i.e., bdmprun). Unless multiple instances of bdmprun were started on a single node, these processes will be all of the program's processes running on that node.

The ranks of the processes in the MPI_COMM_WORLD/BDMPI_COMM_WORLD communicator are assigned in increasing order based on the order in which the hosts appear in mpiexec's hostfile and the number of slaves spawned by each bdmprun instance. For example, the execution of the helloworld program (see Running BDMPI Programs):

mpiexec -hostfile ~/machines -np 3 bdmprun -ns 4 build/Linux-x86_64/test/helloworld

with the hostfile:

bd1-umh:1
bd2-umh:1
bd3-umh:1
bd4-umh:1

will create an MPI_COMM_WORLD/BDMPI_COMM_WORLD communicator in which the processes with ranks 0–3 are the slave processes on bd1-umh, ranks 4–7 are the slave processes on bd2-umh, and ranks 8–11 are the slave processes on bd3-umh. The BDMPI_COMM_CWORLD communicator also includes all processes but assigns ranks in a cyclic fashion based on the hosts specified in mpiexec's hostfile. For example, the ranks of the four slaves on bd1-umh for the above helloworld execution will be 0, 3, 6, and 9; those on bd2-umh will be 1, 4, 7, and 10; and those on bd3-umh will be 2, 5, 8, and 11.
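
The following sketch prints both ranks so the two orderings can be compared; as before, it assumes the BDMPI-specific symbols are exposed by the bdmpi.h header.

#include <stdio.h>
#include <stdlib.h>
#include <bdmpi.h>

int main(int argc, char *argv[])
{
  int wrank, crank;

  MPI_Init(&argc, &argv);

  MPI_Comm_rank(MPI_COMM_WORLD, &wrank);      /* block ordering across the nodes */
  MPI_Comm_rank(BDMPI_COMM_CWORLD, &crank);   /* cyclic ordering across the nodes */

  printf("world rank %d has cyclic rank %d\n", wrank, crank);

  MPI_Finalize();
  return EXIT_SUCCESS;
}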


Supported datatypes

The following are the datatypes that are predefined in BDMPI. Note that both the ones with the MPI_ and the BDMPI_ prefix are defined.

MPI Name                 BDMPI Name                 C equivalent
MPI_CHAR                 BDMPI_CHAR                 char
MPI_SIGNED_CHAR          BDMPI_SIGNED_CHAR          signed char
MPI_UNSIGNED_CHAR        BDMPI_UNSIGNED_CHAR        unsigned char
MPI_BYTE                 BDMPI_BYTE                 unsigned char
MPI_WCHAR                BDMPI_WCHAR                wchar_t
MPI_SHORT                BDMPI_SHORT                short
MPI_UNSIGNED_SHORT       BDMPI_UNSIGNED_SHORT       unsigned short
MPI_INT                  BDMPI_INT                  int
MPI_UNSIGNED             BDMPI_UNSIGNED             unsigned int
MPI_LONG                 BDMPI_LONG                 long
MPI_UNSIGNED_LONG        BDMPI_UNSIGNED_LONG        unsigned long
MPI_LONG_LONG_INT        BDMPI_LONG_LONG_INT        long long int
MPI_UNSIGNED_LONG_LONG   BDMPI_UNSIGNED_LONG_LONG   unsigned long long int
MPI_INT8_T               BDMPI_INT8_T               int8_t
MPI_UINT8_T              BDMPI_UINT8_T              uint8_t
MPI_INT16_T              BDMPI_INT16_T              int16_t
MPI_UINT16_T             BDMPI_UINT16_T             uint16_t
MPI_INT32_T              BDMPI_INT32_T              int32_t
MPI_UINT32_T             BDMPI_UINT32_T             uint32_t
MPI_INT64_T              BDMPI_INT64_T              int64_t
MPI_UINT64_T             BDMPI_UINT64_T             uint64_t
MPI_SIZE_T               BDMPI_SIZE_T               size_t
MPI_SSIZE_T              BDMPI_SSIZE_T              ssize_t
MPI_FLOAT                BDMPI_FLOAT                float
MPI_DOUBLE               BDMPI_DOUBLE               double
MPI_FLOAT_INT            BDMPI_FLOAT_INT            struct {float, int}
MPI_DOUBLE_INT           BDMPI_DOUBLE_INT           struct {double, int}
MPI_LONG_INT             BDMPI_LONG_INT             struct {long, int}
MPI_SHORT_INT            BDMPI_SHORT_INT            struct {short, int}
MPI_2INT                 BDMPI_2INT                 struct {int, int}

Supported reduction operations

The following are the reduction operations that are supported by BDMPI. Note that both the ones with the MPI_ and the BDMPI_ prefix are defined.

MPI Name      BDMPI Name      Description
MPI_OP_NULL   BDMPI_OP_NULL   null operation
MPI_MAX       BDMPI_MAX       max reduction
MPI_MIN       BDMPI_MIN       min reduction
MPI_SUM       BDMPI_SUM       sum reduction
MPI_PROD      BDMPI_PROD      product reduction
MPI_LAND      BDMPI_LAND      logical and reduction
MPI_BAND      BDMPI_BAND      bitwise and reduction
MPI_LOR       BDMPI_LOR       logical or reduction
MPI_BOR       BDMPI_BOR       bitwise or reduction
MPI_LXOR      BDMPI_LXOR      logical xor reduction
MPI_BXOR      BDMPI_BXOR      bitwise xor reduction
MPI_MAXLOC    BDMPI_MAXLOC    max value and location reduction
MPI_MINLOC    BDMPI_MINLOC    min value and location reduction
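
As a usage sketch, the MPI_MAXLOC operation combined with the MPI_DOUBLE_INT pair type locates the largest value across all processes together with the rank that owns it; the example assumes MPI_Reduce is among the supported collectives.

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
  int rank;
  struct { double val; int rank; } local, global;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  local.val  = (double)((rank * 7919) % 101);   /* arbitrary per-process value */
  local.rank = rank;

  /* Element-wise MAXLOC over {value, rank} pairs; rank 0 receives the result. */
  MPI_Reduce(&local, &global, 1, MPI_DOUBLE_INT, MPI_MAXLOC, 0, MPI_COMM_WORLD);

  if (rank == 0)
    printf("maximum value %.1f is owned by rank %d\n", global.val, global.rank);

  MPI_Finalize();
  return EXIT_SUCCESS;
}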