MPICluster
Class#
An MPI cluster is a set of machines that work together to solve a problem. These machines are configured to communicate with each other using the Message Passing Interface (MPI) protocol.
With the MPICluster
class, you can create a dedicated MPI cluster on-demand that
is configured automatically to run your simulations in a distributed manner across
the machines.
To instantiate an MPICluster
object the following parameters can be configured:
the
machine_type
defines the type of CPU used for each machine. This parameter follows the naming convention set by Google Cloud, e.g.,c2-standard-16
. This convention is composed of a prefix that defines the CPU series, a suffix that sets the number of virtual CPUs (vCPU) per machine and the middle word refers to the level of RAM per vCPU. In the example,c2
refers to an Intel Xeon Scalable processor of 2nd generation,standard
means 4 GB of RAM per vCPU and will contain16
vCPUs. Currently, this is the list of available machine types available via the APIthe
num_machines
sets the number of machines available in the cluster. While the computational resource is active, these machines will be reserved for the user.the
data_disk_gb
defines the size of the NFS partition that is mounted on the head node and shared with the worker nodes. This partition is used to store the input and output files of the simulations.the
max_idle_time
determines the time a machine group can remain idle (without receiving any task) before it is terminated.the
auto_terminate_ts
defines the moment in time in which the resource will be automatically terminated, even if there are tasks still running.
For example, the following code creates an MPICluster with 2 machines of type
c2-standard-30
:
import inductiva
mpi_cluster = inductiva.resources.MPICluster(
machine_type="c2-standard-30",
num_machines=2,
data_disk_gb=100)
Creating an instance of MPICluster
does not start the machines. This only registers
the configuration on the API which can now be used to manage the cluster further.
Managing the MPI Cluster#
With your mpi_cluster
object ready, starting the cluster is as simple as calling mpi_cluster.start()
.
Within a few minutes, the machines will be set up and ready to collaborate
on running simulations. At any moment, you can check an estimate of the price
per hour of the cluster with mpi_cluster.estimate_cloud_cost()
and when you
have finished you can terminate it with mpi_cluster.terminate()
. Running
simulations will be killed and from this point, the mpi_cluster
object cannot
be re-used.
To simplify the workflow, the last two functions can also be performed via the CLI.
First, you can check the cost of the cluster by selecting the machine type and the number of machines you wish to use:
$ inductiva resources cost c2-standard-4 -n 4
Estimated total cost (per machine): 0.919 (0.230) $/h.
When you don’t need the MPI cluster anymore, you can easily kill it with the name:
$ inductiva resources terminate api-agn23rtnv0qnfn03nv93nc
MPI cluster on demand without any hassle.