Distributed imaging¶
Multiple nodes can be used to speed up the imaging by running wsclean in
distributed mode, which makes use of MPI to parallelize the imaging over
multiple nodes. If MPI is found during compilation, the compiler will produce
a wsclean-mp binary which can exploit parallelism across multiple compute
nodes. This executable supports the same command-line options as
the regular wsclean executable, but will instead perform
parallel gridding using multiple MPI processes.
A typical run:
mpirun --hostfile host_file -np 8 wsclean-mp -size 10000 10000 ...
And a host-file could look like this:
node100 slots=1
node116 slots=1
node118 slots=1
node119 slots=1
node122 slots=1
node123 slots=1
node124 slots=1
node125 slots=1
node126 slots=1
node127 slots=1
node128 slots=1
node129 slots=1
node130 slots=1
The host file should specify one slot per host, otherwise multiple wsclean’s are executed on the same host, and that has (normally) no benefit, and would actually make those processes compete for memory and cpu. If you use wsclean-mp, all paths should be absolute for the hosts participating, and the paths should be reachable by all nodes (so -name should specify an absolute path name, and the input files should have an absolute path name).
wsclean-mp will distribute the different channels to different nodes. This implies that if you don’t use -channels-out there’s no benefit, whereas using -channels-out 8 with -np 8 gives you a speed-up of 8. If multiple output channels are not necessary for your science goal, one can use -fit-spectral-pol 1 -deconvolution-channels 1.
The multi-threading arguments (as described in the
parallelization section)
apply to each MPI process.
For example, when using -j 4, each MPI process will use 4 threads.
Note that WSClean only uses MPI during gridding. Other parts, such as deconvolution and reordering, only use the main MPI process and therefore only use multi-threading.
When using a single compute node, using MPI is discouraged, since multi-threaded
gridding using the -parallel-gridding is more efficient. When using multiple
nodes, the best performance is normally achieved using one MPI process with
multiple parallel gridders per node. The total number of threads per node
(-j option) should equal to the number of cores per node,
which is the default setting.
Multi-node processing with facets¶
When the MPI method described above is applied to facetted imaging, the
distribution over nodes is done both over facets and over output channels. This
is often not efficient for facetted imaging.
To improve this, it is recommended to use the “shared reads and writes” option.
This is described in the
facet-based imaging section. There’s also an
option -compound-tasks that makes sure to send the facets from the same
channel to the same node. Generally, using shared reads and writes is
the faster approach, though there may be circumstances where the use of
compound tasks is faster.
Pinning work to nodes¶
The -channel-to-node argument specifies the mapping of output channels to
node. For example, -channel-to-node 0,0,1,2 schedules the tasks for
output channel indices 0, 1, 2, 3 at compute nodes 0, 0, 1, 2, respectively.
The length of the list must equal the number of output channels
(-channels-out argument).
If -no-work-on-master is specified, the list may not contain 0.
By default, WSClean uses a round-robin distribution of output channels to nodes, because later channels with a higher frequency are often more expensive to grid. For example, with 10 output channels, 4 nodes, and an active main node, the mapping is 0,1,2,3,0,1,2,3,0,1.