Playdoh is a pure Python library for distributing computations across the free computing units (CPUs and GPUs) available in a small network of multicore computers. Playdoh supports independent (embarassingly) parallel problems as well as loosely coupled tasks such as global optimizations, Monte Carlo simulations and numerical integration of partial differential equations. It is designed to be lightweight and easy to use and should be of interest to scientists wanting to turn their lab computers into a small cluster at no cost.
This user guide is an introduction to Playdoh. It shows how to distribute independent parallel tasks, how to distribute optimizations, and how to write loosely coupled parallel tasks.
See also
Reference documentation Playdoh Reference.
First of all, you should have Python 2.6 installed on your machine with Numpy 1.3 at least (Playdoh is currently not available for Python 3). Then, go to the download page and download the archive or the executable if you’re on Windows. Finally, install the package with the Windows executable or with the following command:
python setup.py install
You can also run in a console:
easy_install playdoh
The installation script requires the Python package setuptools so that the command-line tool included in Playdoh can be automatically installed. The setuptools package should be automatically installed when you install Playdoh. The installation script automatically installs the following tools: playdoh and playdoh_gui, which are a command-line tool and a GUI, respectively.
All scripts using Playdoh should start by importing the Playdoh package as follows:
from playdoh import *
Playdoh offers a parallel and distributed implementation of the map() function to quickly evaluate a single Python function against several sets of parameters, across several CPUs and machines.
The following example shows how to distribute the function y=x**2 across two CPUs:
from playdoh import *
# The function to parallelize
def fun(x):
return x**2
# This line is required on Windows, any call to a Playdoh function
# must be done after this line on this OS.
# See http://docs.python.org/library/multiprocessing.html#windows
if __name__ == '__main__':
# Execute ``fun(1)`` and ``fun(2)`` in parallel on two CPUs on this machine
# and return the result.
print map(fun, [1,2], cpu=2)
See also
Reference for map(), examples for Independent parallel problems.
Playdoh offers two functions minimize() and maximize() to quickly optimize a Python objective function (fitness) in parallel across several CPUs and several machines. Three optimization algorithms are available: the particle swarm optimization PSO, the covariance matrix adaptation evolution strategy CMAES, and an island genetic algorithm GA
The following example shows how to maximize a Gaussian function in one dimension across 2 CPUs:
from playdoh import *
import numpy
# The fitness function to maximize
def fun(x):
return numpy.exp(-x**2)
if __name__ == '__main__':
# Maximize the fitness function in parallel
results = maximize(fun,
popsize = 10000, # size of the population
maxiter = 10, # maximum number of iterations
cpu = 2, # number of CPUs to use on the local machine
x_initrange = [-10.,10.]) # initial interval for the ``x`` parameter
# Display the final results in a table
print_table(results)
See also
Reference for maximize(), minimize(), examples for Optimization.
Some computational tasks cannot be distributed using the independent parallel interface of Playdoh but require some communication between subtasks and the introduction of synchronisation points. Playdoh offers a simple programming interface to do this.
The following example shows an implementation of a numerical solver of a partial differential equation (the heat equation) in parallel across several CPUs. There are two steps.
The full script:
from playdoh import *
from numpy import *
from pylab import *
# Any task class must derive from the ParallelTask
class HeatSolver(ParallelTask):
def initialize(self, X, dx, dt, iterations):
# X is a matrix with the function values and the boundary values
# X must contain the borders of the neighbors ("overlapping Xs")
self.X = X
self.n = X.shape[0]
self.dx = dx
self.dt = dt
self.iterations = iterations
self.iteration = 0
def send_boundaries(self):
# Send boundaries of the grid to the neighbors
if 'left' in self.tubes_out:
self.push('left', self.X[:,1])
if 'right' in self.tubes_out:
self.push('right', self.X[:,-2])
def recv_boundaries(self):
# Receive boundaries of the grid from the neighbors
if 'right' in self.tubes_in:
self.X[:,0] = self.pop('right')
if 'left' in self.tubes_in:
self.X[:,-1] = self.pop('left')
def update_matrix(self):
# Implement the numerical scheme for the PDE
Xleft, Xright = self.X[1:-1,:-2], self.X[1:-1,2:]
Xtop, Xbottom = self.X[:-2,1:-1], self.X[2:,1:-1]
self.X[1:-1,1:-1] += self.dt*(Xleft+Xright+Xtop+Xbottom-4*self.X[1:-1,1:-1])/self.dx**2
def start(self):
# Run the numerical integration of the PDE
for self.iteration in xrange(self.iterations):
self.send_boundaries()
self.recv_boundaries()
self.update_matrix()
def get_result(self):
# Return the result
return self.X[1:-1,1:-1]
def heat2d(n, iterations, nodes):
# ``split`` is the grid size on each node, without the boundaries
split = [(n-2)*1.0/nodes for _ in xrange(nodes)]
split = array(split, dtype=int)
split[-1] = n-2-sum(split[:-1])
dx=2./n
dt = dx**2*.2
# y is a Dirac function at t=0
y = zeros((n,n))
y[n/2,n/2] = 1./dx**2
# Split y horizontally
split_y = []
j = 0
for i in xrange(nodes):
size = split[i]
split_y.append(y[:,j:j+size+2])
j += size
# Define a double linear topology
topology = []
for i in xrange(nodes-1):
topology.append(('right', i, i+1))
topology.append(('left', i+1, i))
# Start the task
task = start_task(HeatSolver, # name of the task class
cpu = nodes, # use ``nodes`` CPUs on the local machine
topology = topology,
args=(split_y, dx, dt, iterations)) # arguments of the ``initialize`` method
# Retrieve the result, as a list with one element returned by ``HeatSolver.get_result`` per node
result = task.get_result()
result = hstack(result)
return result
if __name__ == '__main__':
result = heat2d(50, 100, 2)
hot()
imshow(result)
show()
See also
Reference for start_task(), ParallelTask, examples for Loosely coupled parallel problems.
Any computer within your local Ethernet network can be used to run computations with Playdoh. First, Python and Playdoh must be installed. Then, the Playdoh server must run so that computations can be submitted to it. Finally, when you launch a task, you can specify the special keyword machines which is a list containing the IP addresses of the machines to use.
To launch the Playdoh server, you have two options.
Use the open_server() function to start the Playdoh server:
# Open the server on the default port, using 4 CPUs and 1 GPU
open_server(maxcpu=4, maxgpu=1)
You can close a server remotely using the close_servers() function:
# Close the bobs-machine.university.com server
close_servers(['bobs-machine.university.com'])
You can use the playdoh command-line tool:
# Open the server on the default port, using all CPUs and GPUs available
playdoh open
# Open the server with 2 CPUs and 1 GPU
playdoh open 2 CPUs 1 GPU
You can also close servers and allocate resources using this script.
See also
Reference Command line tool.
A single computer running the Playdoh server can be used py several clients in parallel to execute different tasks. The computers’ resources need to be shared among the clients. To do this, each client begins by allocating on the server the number of CPUs he wants for his own computation, among all the idle CPUs on this machine. You have three options.
Resource allocation can be done using a few functions defined in Playdoh, most notably get_available_resources() to get the resources available on a server, and request_resources() to allocate resources on a server:
# Get the available resources on the specified server
available_resources = get_available_resources('bobs-machine.university.com')
# Allocate 2 CPUs on the server
request_resources('bobs-machine.university.com', CPU=2)
See also
Resource allocation example Example: resources, reference Resource allocation.
Resource allocation can be done with a GUI included in Playdoh and which can be run with the command playdoh_gui.
The command-line tool also allows to allocate resources on servers:
# obtain the available resources on server 'bobs-machine.university.com'
playdoh get bobs-machine.university.com
# obtain all the allocated resources on the server
playdoh get bobs-machine.university.com all
# request 2 CPUs and 1 GPU for this client
playdoh request bobs-machine.university.com 2 CPUs 1 GPU
See also
Reference Command line tool.
If you run a Playdoh server on your own computer, you can specify how many resources you allocated to others. First, you can do that when you launch the Playdoh server (see Launching the Playdoh server). Then, when a server is running on your machine, you can change the total number of available resources on your server with the function set_total_resources(). Finally, you can also use the command-line tool, like:
playdoh set 2 CPUs 1 GPU
See also
Reference Command line tool.
Playdoh defines several global variables.
User preferences dictionary.
See also
You can define some user preferences in the file ~/.playdoh/userpref.py. The character ~ refers to your home directory, which should be /usr/<username> on Linux and C:\Users\<username\ on Windows.
Current preferences are:
Authentication key used to secure communications within the network. This value must be the same on every computer. By default, it is playdohauthkey. You should generate your own key and share it with anyone who might submit computations to your computer. The following code snippet shows a way of generating a random 256 bits authentication key in Python:
import os, binascii
authkey = binascii.hexlify(os.urandom(32))
Here’s an example of a valid user preferences file:
USERPREF = {}
USERPREF['port'] = 3141
To retrieve user preferences in the code, use the global variable USERPREF as a dictionary:
from playdoh import *
print USERPREF['port']
GPUs are natively supported by Playdoh through the PyCUDA package. It means that your functions can load CUDA code dynamically and run it, so that you can use several GPUs on a single or on several machines in parallel. GPUs can be used for both independent parallel problems and loosely coupled parallel problems (including optimizations).
When loading CUDA code with PyCUDA, you can use the standard functions of PyCUDA to do it but you should never initialize the GPU drivers yourself: Playdoh takes care of that so that several GPUs can be handled transparently. Here’s an example of a function using PyCUDA and that can be safely distributed with Playdoh:
import pycuda
# The function loading the CUDA code
def fun(scale):
# The CUDA code, which multiplies a vector by a scale factor.
code = '''
__global__ void test(double *x, int n)
{
int i = blockIdx.x * blockDim.x + threadIdx.x;
if(i>=n) return;
x[i] *= %d;
}
''' % scale
# Compile the CUDA code to GPU code
mod = pycuda.compiler.SourceModule(code)
# Transform the CUDA function into a Python function
f = mod.get_function('test')
# Create a vector on the GPU filled with 8 ones
x = pycuda.gpuarray.to_gpu(ones(8))
# Start the function on the GPU
f(x, int32(8), block=(8, 1, 1))
# Load the result from the GPU to the CPU
y = x.get()
# Finally, return the result
return y
Warning
On Linux, you may experience an issue with the CUDA code not compiling. You can fix this problem using do_redirect=True in the Playdoh function (map(), minimize(), etc.).
See also
The full example Example: gpu, the PyCUDA website.
Resource allocation is the way computing units (CPUs and GPUs on machines) are allocated to clients’ computations. It can be done either manually or automatically. In the latter case, one specifies the machines and the total number of computing units to use.
The main Playdoh functions accept special keywords cpu, gpu, machines to tell Playdoh how to automatically allocate available resources. Also, they accept the keyword allocate to do resource allocation manually. In this case, this keyword must accept an Allocation object returned by the function allocate(). Manual resource allocation is done by specifying the number of units to use on every machine.
The following example shows how to allocate automatically 10 CPUs on two machines:
from playdoh import *
allocation = allocate(machines=['127.0.0.1', '127.0.0.2'], cpu=10)
This object can then be passed to map() or other Playdoh functions.
In the next example, resource allocation is done manually:
from playdoh import *
manual_alloc = {'127.0.0.1': 3, '127.0.0.2': 7}
allocation = allocate(unit_type='CPU', allocation=manual_alloc)
See also
When distributing a Python function with Playdoh using several machines, the function’s code is automatically retrieved and sent to the machines. When the function imports external Python packages, these packages need to be installed on every machine. When the function imports external Python modules (.py files), these modules must be explicitely specified so that they are also transferred to the other machines. This is done using the codedependencies special keyword in the main Playdoh functions. This argument is a list with the modules’ filenames, relatively to the main function location in the filesystem.
The following example shows how to use the map() function with an import of an external module:
from playdoh import *
# Import an external module in the same folder
from external_module import external_fun
# The function to parallelize
def fun(x):
# Use the function defined in the external module
return external_fun(x)**2
# This line is required on Windows, any call to a Playdoh function
# must be done after this line on this OS.
# See http://docs.python.org/library/multiprocessing.html#windows
if __name__ == '__main__':
# Execute ``fun(1)`` and ``fun(2)`` in parallel on two CPUs on this machine
# and return the result.
# The ``codedependencies`` argument contains the list of external Python modules
# to transfer on the machines executing the task. It is only needed when using
# remote machines, and not when using CPUs on the local machine.
print map(fun, [1,2], codedependencies=['external_module.py'])
This also works with the minimize() and maximize() functions.
See also
The full example Example: map_dependencies and the reference of map().
Some information about the optimization can be returned by the minimize() and maximize() functions by specifying the returninfo=True special keyword.
See also
Reference for the minimize() and maximize() functions.
Several groups of parameter populations can be optimized independently and in parallel with the same fitness function. This allows a vectorization of the fitness function for different optimization runs. The number of groups is specified with the groups special keyword in the minimize() and maximize() functions. The fitness function can accept the groups keyword to get the number of groups. The total population on the node is equally subdivided into groups subpopulations.
See also
Example: maximize_groups, reference for the minimize() and maximize() functions.