October 29, 2014

Safe use of Unix signals with the multiprocessing module in Python

Dear reader, since you are interested in this blog post, I assume you are familiar with the signal module, which lets you catch Unix signals from within your Python script. I also assume that, like me, you are an enthusiastic parallel processing fanboy who tries to make the best use of Python's parallel processing framework, multiprocessing, in day-to-day programming. :)

The conflict

signal is a great module! So is multiprocessing. But together they work against each other if you are not careful about how they interact. If you have ever tried to use both of them in a single script, you know what I mean.

In a long-running multiprocessing application, a safe and clean exit of the child processes is a must. There are many ways to achieve this. One common way is to share an Event object with the child process. The child occasionally checks whether that exit event is set and exits if necessary. Here's a demo script implementing this pattern, where the parent process feeds some data to a child worker process:

#!/usr/bin/env python3
import time
import queue
import signal
from multiprocessing import Process, Event, Queue


# The worker is intentionally lazy!
def lazy_ass_worker(exit_event, work_queue):
    while not exit_event.is_set():
        try: work = work_queue.get(timeout=1.0)
        except queue.Empty: continue

        print("I did job {} already! :)".format(work))
        print("A small nap won't hurt anyone!")
        time.sleep(1.0)

    print("Doing cleanup before leaving ...")


# The main guard lets the script also run with the "spawn" start method
# (the default on macOS and Windows), which re-imports this module in the child.
if __name__ == "__main__":
    exit_event = Event()
    work_queue = Queue()

    # Spawn the worker process.
    cp = Process(target=lazy_ass_worker, args=(exit_event, work_queue))
    cp.start()

    # Send some integers to the worker process.
    for x in range(100):
        work_queue.put(x)

    # Wait for CTRL+C from the user (signal.pause() is Unix only).
    try:
        signal.pause()
    except KeyboardInterrupt:
        # Since our worker is so delicate, notify it with the exit event
        # and then wait for it to finish its cleanup and exit (join).
        exit_event.set()
        cp.join()

At the end of the script, I naively tried to catch the KeyboardInterrupt exception (which Python raises by default in response to the SIGINT signal) in the parent process and then notify the child about the exit condition, so that it could do all the cleanup before exiting. But if you actually run the above script and press CTRL+C during execution, the line Doing cleanup before leaving ... is never printed. Here's what happens on my computer:

oscar@notebook ~ % python3 demo.py
I did job 0 already! :)
A small nap won't hurt anyone!
I did job 1 already! :)
A small nap won't hurt anyone!
^CProcess Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.4/multiprocessing/process.py", line 254, in _bootstrap
    self.run()
  File "/usr/lib/python3.4/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "demo.py", line 16, in lazy_ass_worker
    time.sleep(1.0)
KeyboardInterrupt
oscar@notebook ~ %
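The KeyboardInterrupt in that traceback is not specific to multiprocessing: Python installs signal.default_int_handler for SIGINT at startup, and that handler is what raises the exception. The minimal sketch below (Unix only; the helper name sigint_becomes_keyboard_interrupt is hypothetical) sends SIGINT to the current process with os.kill as a stand-in for CTRL+C. It reinstalls the default handler first, on the assumption that Python might have been started with SIGINT ignored.

```python
import os
import signal
import time

# Assumption: we want Python's default SIGINT behaviour. Reinstall it in case
# the interpreter was started with SIGINT ignored.
signal.signal(signal.SIGINT, signal.default_int_handler)


def sigint_becomes_keyboard_interrupt():
    """Send SIGINT to this process and report whether it raised KeyboardInterrupt."""
    try:
        # Only this process receives the signal, unlike CTRL+C, which the
        # terminal sends to the whole foreground process group.
        os.kill(os.getpid(), signal.SIGINT)
        time.sleep(1.0)  # the exception normally arrives long before this ends
    except KeyboardInterrupt:
        return True
    return False


if __name__ == "__main__":
    print(sigint_becomes_keyboard_interrupt())  # True
```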

What just happened?

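Pressing CTRL+C does not send SIGINT to the parent alone: the terminal delivers it to every process in the foreground process group, and the child processes spawned by multiprocessing belong to the same process group as their parent. So even if you catch the signal in the parent process, each child also receives SIGINT and handles it with Python's default SIGINT handler, which raises KeyboardInterrupt. This conflicts with the kind of pattern used in the demo above, where synchronization primitives from the multiprocessing module are used for a safe cleanup.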
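You can check the process-group part of this explanation directly. The sketch below has a child report its process group id back through a Queue and compares it with the parent's. The helper names are hypothetical, and it is Unix only because it requests the fork start method explicitly, which is what the original demo got by default on Linux.

```python
import multiprocessing
import os


def report_process_group(queue):
    # Runs in the child: send back the id of the process group it belongs to.
    queue.put(os.getpgrp())


def child_shares_process_group():
    ctx = multiprocessing.get_context("fork")  # Unix only
    queue = ctx.Queue()
    child = ctx.Process(target=report_process_group, args=(queue,))
    child.start()
    child_group = queue.get(timeout=5.0)
    child.join()
    return child_group == os.getpgrp()


if __name__ == "__main__":
    print(child_shares_process_group())  # True
```

Because both ids match, a CTRL+C typed in the terminal reaches the parent and the worker alike.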

The workaround

Every child process spawned by the multiprocessing module inherits signal handlers from the parent process. If we set the handlers for our target signals to SIG_IGN before spawning new processes, the child processes will ignore those signals. With this strategy, our demo needs only a few small modifications:

# Save a reference to the original signal handler for SIGINT.
default_handler = signal.getsignal(signal.SIGINT)

# Ignore SIGINT while the worker is spawned; the child inherits this setting.
signal.signal(signal.SIGINT, signal.SIG_IGN)

exit_event = Event()
work_queue = Queue()

# Spawn the worker process.
cp = Process(target=lazy_ass_worker, args=(exit_event, work_queue))
cp.start()

# Since we spawned all the necessary processes already,
# restore default signal handling for the parent process.
signal.signal(signal.SIGINT, default_handler)

# The rest of the script (feeding the queue, signal.pause() and the
# KeyboardInterrupt handler) stays the same.
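To convince yourself that the worker really inherits the ignored disposition, here is a self-contained sketch under the same assumptions as the demo (Unix, with the fork start method requested explicitly). The names patient_worker and child_ignores_sigint are hypothetical. It sends SIGINT to the child only, checks that the child is still alive, and then lets it exit through the exit event, just as the demo does.

```python
import multiprocessing
import os
import signal
import time


def patient_worker(exit_event, done_queue):
    # Keep running until the parent sets the exit event, then report cleanup.
    while not exit_event.wait(timeout=0.1):
        pass
    done_queue.put("cleanup done")


def child_ignores_sigint():
    ctx = multiprocessing.get_context("fork")  # Unix only
    exit_event = ctx.Event()
    done_queue = ctx.Queue()

    original_handler = signal.getsignal(signal.SIGINT)
    signal.signal(signal.SIGINT, signal.SIG_IGN)  # inherited by the child
    try:
        child = ctx.Process(target=patient_worker, args=(exit_event, done_queue))
        child.start()
    finally:
        # Restore the parent's handler as soon as the child exists.
        signal.signal(signal.SIGINT, original_handler)

    # Deliver SIGINT to the child, as CTRL+C in the terminal would.
    os.kill(child.pid, signal.SIGINT)
    time.sleep(0.5)
    survived = child.is_alive()

    # Now ask it to leave politely through the exit event.
    exit_event.set()
    message = done_queue.get(timeout=5.0)
    child.join()
    return survived, message


if __name__ == "__main__":
    print(child_ignores_sigint())  # (True, 'cleanup done')
```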

In the modified demo, the original SIGINT handler is restored once all the necessary processes have been spawned. If you use custom signal handlers, install them at this point. Note that some facilities of the multiprocessing module, such as Pool and Manager, implicitly spawn additional processes. Create them while SIGINT is still ignored, in the same manner.
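For a Pool, the same trick works as long as the workers are created while SIGINT is ignored. A minimal sketch, assuming the fork start method and the default maxtasksperchild=None (with worker recycling, replacement workers created after the handler is restored would not ignore SIGINT); square and squares_with_quiet_workers are hypothetical names:

```python
import multiprocessing
import signal


def square(x):
    return x * x


def squares_with_quiet_workers(n):
    ctx = multiprocessing.get_context("fork")  # Unix only
    original_handler = signal.getsignal(signal.SIGINT)
    signal.signal(signal.SIGINT, signal.SIG_IGN)  # workers inherit this
    try:
        pool = ctx.Pool(processes=2)  # workers are created right here
    finally:
        signal.signal(signal.SIGINT, original_handler)

    try:
        result = pool.map(square, range(n))
    except KeyboardInterrupt:
        # The workers ignore SIGINT, so the parent must stop them itself.
        pool.terminate()
        raise
    else:
        pool.close()
    finally:
        pool.join()
    return result


if __name__ == "__main__":
    print(squares_with_quiet_workers(5))  # [0, 1, 4, 9, 16]
```

Passing an initializer to Pool that calls signal.signal(signal.SIGINT, signal.SIG_IGN) in each worker is a common alternative that also covers recycled workers.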

Caveats

Ignoring important termination signals like SIGINT and SIGTERM in the child processes is risky in the early stages of development. If a programming or runtime error kills the parent before it sets the exit event, the orphaned children keep running and no longer respond to CTRL+C, so your development system can fill up with stray worker processes. In that case, just kill them with SIGKILL, which cannot be caught or ignored.
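The sketch below illustrates that last point. A hypothetical stubborn_worker ignores both SIGINT and SIGTERM, survives both, and only goes away with SIGKILL. It is Unix only, and the fork start method is requested explicitly to match the demo's Linux setup. multiprocessing reports a child killed by signal N with an exitcode of -N.

```python
import multiprocessing
import os
import signal
import time


def stubborn_worker(ready):
    # Ignore the usual polite termination signals, like a misbehaving worker.
    signal.signal(signal.SIGINT, signal.SIG_IGN)
    signal.signal(signal.SIGTERM, signal.SIG_IGN)
    ready.set()
    while True:
        time.sleep(0.1)


def sigkill_still_works():
    ctx = multiprocessing.get_context("fork")  # Unix only
    ready = ctx.Event()
    child = ctx.Process(target=stubborn_worker, args=(ready,))
    child.start()
    ready.wait(timeout=5.0)  # make sure the child has installed SIG_IGN

    # SIGINT and SIGTERM are ignored, so the child keeps running.
    os.kill(child.pid, signal.SIGINT)
    os.kill(child.pid, signal.SIGTERM)
    time.sleep(0.5)
    survived_polite_signals = child.is_alive()

    # SIGKILL cannot be caught or ignored.
    os.kill(child.pid, signal.SIGKILL)
    child.join(timeout=5.0)
    return survived_polite_signals, child.exitcode


if __name__ == "__main__":
    print(sigkill_still_works())  # (True, -9)
```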