MPI in C++ and clean

June 23, 2015

Introduction

Motivation

  • Support parallel imaging under control of a single iteration control object.
    • cube imaging
    • efficiency
  • Explore design for introduction of MPI into the CASA C++ layer within the mpi4casa framework.

MPI C++/framework interface

Preferred design

  • Task-specific, sub-cluster-based MPI communicators provided by framework.
    • isolation of MPI communication for tasks
    • supports use of any MPI routines by tools
    • limited requirement for knowledge of run-time environment by tools
    • relies on framework for creation of communicators for tasks
    • resource allocation done by framework, not tasks
    • common design for MPI codes

Considerations

  • Communicator lifecycle management
    • Framework vs task responsibilities
  • Task-specific communicator vs cluster-wide singleton
  • MPI-2 vs MPI-3
  • Communicator values in mpi4py & SWIG
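
One portable way to hand a communicator value from mpi4py to SWIG-wrapped C++ is to pass the integer Fortran handle and convert it back on the C++ side. A minimal sketch, assuming a hypothetical SWIG binding setcomm() that takes the handle as a plain int (not an existing CASA call):

from mpi4py import MPI

comm = MPI.COMM_WORLD
fhandle = comm.py2f()     # integer Fortran handle; safe to pass through SWIG as an int
# tool.setcomm(fhandle)   # hypothetical; C++ side would do: MPI_Comm c = MPI_Comm_f2c(fhandle);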

Framework outline (possible)

  1. create a (common) communicator for the task,
  2. call the task, with communicator, and wait for the result, and
  3. free the task-specific communicator.
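
A minimal sketch of these three steps in plain mpi4py (run_task is a placeholder for invoking the actual task, and the rank list is illustrative):

from mpi4py import MPI

def run_task(comm):
    # placeholder task: rank 0 collects the rank numbers of all task processes
    return comm.gather(comm.rank, root=0)

world = MPI.COMM_WORLD
task_ranks = list(range(world.size))   # illustrative; normally a subset of the cluster

# 1. create a (common) communicator for the task; with MPI-3 Create_group
#    only the members of the group make this call
if world.rank in task_ranks:
    task_comm = world.Create_group(world.Get_group().Incl(task_ranks))
else:
    task_comm = MPI.COMM_NULL

if task_comm != MPI.COMM_NULL:
    # 2. call the task, with communicator, and wait for the result
    result = run_task(task_comm)
    # 3. free the task-specific communicator
    task_comm.Free()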

TaskComm

from mpi4casa.MPIInterface import MPIInterface
from mpi4casa.MPIEnvironment import MPIEnvironment

class TaskComm (object):
    """Pair a local mpi4py communicator with the name of the matching
    communicator variable on the remote CASA servers."""

    def __init__(self, name, local):
        self.__name = name    # name of the communicator variable on the servers
        self.__local = local  # local mpi4py communicator (None on non-members)
        return

    def __del__(self):
        self.free()
        return

    def free(self):
        # Free both the remote and the local copies of the communicator,
        # but never the world communicator itself.
        if self.__local and self.__local != MPIEnvironment.mpi_comm_world:
            MPIInterface.odo('if %s: %s.Free()' % (self.__name, self.__name),
                             MPIEnvironment.mpi_server_rank_list())
            self.__local.Free()
            self.__name = None
            self.__local = None
        return

    def create_group(self, group, newname):
        # Create a sub-communicator from the ranks in 'group': remotely, by
        # sending the Incl/Create_group commands once to the other group
        # members, and locally through mpi4py if this rank is a member.
        myrank = self.__local.rank
        groupname = newname + '_group'
        remote = [p for p in group if p != myrank]
        if remote:
            MPIInterface.odo('%s = %s.group.Incl(%s)'
                             % (groupname, self.__name, group), remote)
            MPIInterface.odo('%s = %s.Create_group(%s)'
                             % (newname, self.__name, groupname), remote)
        if myrank not in group:
            local_comm = None
        else:
            local_comm = self.__local.Create_group(self.__local.group.Incl(group))
        return TaskComm(newname, local_comm)
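
A possible use of TaskComm, continuing the framework outline above; it assumes the framework has already defined a communicator variable named task_comm both locally and on the servers, and that ranks 1-3 are the worker servers (both assumptions, not existing interfaces):

task = TaskComm('task_comm', task_comm)                 # wrap the framework-provided communicator
workers = task.create_group([1, 2, 3], 'worker_comm')   # sub-communicator for the worker servers
# ... run the imaging components over worker_comm ...
workers.free()
task.free()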

MPI communicators in tclean

Component communicators

The imager components are assigned communicators as follows:

  • Normalizer: normalization_comm
  • Deconvolver: deconvolution_comm
  • Imager: imaging_comm
  • Iteration control: iteration_comm

Note that these communicators need not all be distinct.

Communicator instances

Assuming that a task communicator is provided (call it task_comm), the top-level object can create any of the following communicators:

  • worker_comm: created from the group of processes doing the computation
  • MPI_COMM_SELF: pre-defined by MPI
  • MPI_COMM_NULL: pre-defined by MPI

The current implementation of parclean can support a variety of use cases simply by assigning different combinations of these communicators to the component communicators.

Serial case

  • CASA built without MPI

    Provide code alternatives for builds with and without MPI (work around or avoid MPI calls, set the rank to zero).

  • parallel run using only one CASA engine

    Assign MPI_COMM_SELF to all component communicators.
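
Both serial cases above can be handled with a small guard at the Python level; this is only a sketch of the idea, not the actual CASA build machinery:

try:
    from mpi4py import MPI      # available only in an MPI-enabled installation
    comm = MPI.COMM_SELF        # single-engine run: every component works alone
    rank = comm.rank            # always 0 for COMM_SELF
except ImportError:
    comm = None                 # build without MPI: avoid MPI calls entirely
    rank = 0                    # and behave as rank zero everywhere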

Parallel cases

Communicator assignments

  component communicator  continuum imaging  cube imaging       single
                          (worker / client)  (worker / client)  process
  ----------------------  -----------------  -----------------  -------
  normalization_comm      worker / null      self / null        self
  deconvolution_comm      worker / null      self / null        self
  imaging_comm            self / null        self / null        self
  iteration_comm          task / task        task / task        self

Entries give the communicator used by the worker / client processes; worker, task, self and null stand for worker_comm, task_comm, MPI_COMM_SELF and MPI_COMM_NULL.
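
A hypothetical sketch of how a top-level object could turn the table above into concrete assignments; the scenario names, the worker_comm and task_comm arguments, and the helper itself are illustrative, not existing parclean interfaces:

from mpi4py import MPI

def assign_communicators(scenario, task_comm, worker_comm, is_client):
    # encode the "Communicator assignments" table
    if scenario == 'single':
        norm = deconv = imaging = iteration = MPI.COMM_SELF
    elif is_client:
        norm = deconv = imaging = MPI.COMM_NULL
        iteration = task_comm
    else:
        norm = deconv = worker_comm if scenario == 'continuum' else MPI.COMM_SELF
        imaging = MPI.COMM_SELF
        iteration = task_comm
    return {'normalization_comm': norm,
            'deconvolution_comm': deconv,
            'imaging_comm': imaging,
            'iteration_comm': iteration}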

Current components

  • The imager normalization and deconvolution components are not actually parallel, yet in continuum imaging they are assigned worker_comm. How should they use it?

    • Wrap the existing components to do their work only in the rank 0 process of worker_comm (see the sketch below).
    • This works in all cases, and sets the stage for parallelizing any of the current imager components.
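
A minimal sketch of such a wrapper; run_serial_step stands in for an existing normalization or deconvolution call and is not an actual CASA function:

def run_on_leader(worker_comm, run_serial_step, *args):
    result = None
    if worker_comm.rank == 0:
        result = run_serial_step(*args)        # the serial component runs here only
    return worker_comm.bcast(result, root=0)   # every worker sees the same result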

Iteration control component

  • Reduction and broadcasts of iteration control records
  • Rank 0 is leader
  • Rank 0 on client node for GUI interaction when desired
  • New classes: DistSynthesisIterbot and DistSIIterBot
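
A sketch of the reduce-and-broadcast pattern for iteration control records, assuming the record is a dict with per-process fields such as peakresidual and iterdone; the field names and the controller object (a stand-in for something like DistSIIterBot) are illustrative:

def update_iteration_control(iteration_comm, local_record, controller):
    records = iteration_comm.gather(local_record, root=0)
    decision = None
    if iteration_comm.rank == 0:                     # rank 0 is the leader
        merged = {'peakresidual': max(r['peakresidual'] for r in records),
                  'iterdone': sum(r['iterdone'] for r in records)}
        decision = controller.merge(merged)          # hypothetical: leader decides whether to stop
    return iteration_comm.bcast(decision, root=0)    # every process receives the decision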