Wednesday, August 08, 2007

Spawning subprocess with PyGTK using Twisted

Well, it is an age-old problem: how to schedule long-running tasks within a GUI main loop (in our case PyGTK). There are a few ways:
  1. Use Python's subprocess module and select on the pipe with gobject's io_add_watch
  2. Use GTK's built in subprocess spawning abilities
  3. Use Twisted
Options 1 and 2 are reasonable approaches, and both work, although option 1 won't work on Win32. The problem with both is that they rely on gobject's polling functions to achieve asynchronicity. That is fine when we are tied to a PyGTK main loop, but not when the application should also run in command-line mode. We really want to share the execution code between different UIs, perhaps including other toolkits.
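
To make option 1 a little more concrete, here is a rough sketch of what "selecting on the pipe" means, using only the standard library. It calls select.select directly where a PyGTK program would register the pipe with gobject's io_add_watch, so it illustrates the idea rather than the GTK integration. The name read_child_output is made up for this example, and like option 1 itself it is POSIX-only, since select on Windows accepts only sockets.

```python
import os
import select
import subprocess


def read_child_output(argv):
    """Run argv and return its stdout, reading only when the pipe is ready."""
    child = subprocess.Popen(argv, stdout=subprocess.PIPE)
    fd = child.stdout.fileno()
    chunks = []
    while True:
        # Wait (up to a second at a time) for the pipe to become readable.
        # A GUI would hand this waiting to its main loop instead of looping.
        ready, _, _ = select.select([fd], [], [], 1.0)
        if not ready:
            continue
        data = os.read(fd, 4096)
        if not data:  # an empty read means the child closed its stdout
            break
        chunks.append(data)
    child.stdout.close()
    child.wait()
    return b''.join(chunks)


if __name__ == '__main__':
    print(read_child_output(['ls', '-al']))
```

The loop here is exactly the part that ties the code to one way of waiting, which is why sharing it between a GUI and a command-line front end gets awkward.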

Enter Twisted. We need to do two things with Twisted:
  1. Make sure Twisted knows we are running with PyGTK
  2. Launch the process
Making sure Twisted knows that we are running inside PyGTK is quite easy (though I imagine the implementation was painful). To do this, you must install the gtk2reactor before importing twisted.internet.reactor anywhere in your program, like so:


from twisted.internet import gtk2reactor
gtk2reactor.install()
from twisted.internet import reactor


OK, I did say it was pretty easy. The one thing to remember is that you now run your main loop with reactor.run() rather than gtk.main().
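
Put together, a bare-bones application might look roughly like this. It is only a sketch: the window and its title are placeholders, and it assumes PyGTK 2 and a Twisted version that ships gtk2reactor. The imports live inside main() purely to make the required ordering explicit.

```python
def main():
    # Install the GTK-aware reactor first. Importing
    # twisted.internet.reactor before this point would install the
    # default reactor instead.
    from twisted.internet import gtk2reactor
    gtk2reactor.install()
    from twisted.internet import reactor
    import gtk

    window = gtk.Window()
    window.set_title('Twisted + PyGTK')  # placeholder title
    # Closing the window stops the reactor, which ends the main loop.
    window.connect('destroy', lambda widget: reactor.stop())
    window.show_all()

    reactor.run()  # takes the place of gtk.main()
```

Calling main() at the bottom of the script starts the application. Stopping the reactor from the destroy handler plays the role that gtk.main_quit would normally play.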

Now we should think about spawning our subprocess. We will do this by using reactor.spawnProcess, which looks like:


twisted.internet.reactor.spawnProcess = spawnProcess(self,
    processProtocol,
    executable,
    args=(),
    env={},
    path=None,
    uid=None, gid=None, usePTY=0, childFDs=None
)

The only unusual thing here is the processProtocol parameter. All the other parameters are the usual ones you would pass to something like subprocess.Popen. The processProtocol argument should be an instance of twisted.internet.protocol.ProcessProtocol (or a subclass of it), and it defines how data read from the subprocess's pipes is handled.

You can just use ProcessProtocol without overriding, but that will do nothing useful, not even print the results, so here is an example with our own ProcessProtocol class.


import os
# Have you remembered to install the gtk2reactor?
from twisted.internet import reactor
from twisted.internet.protocol import ProcessProtocol

class EchoingProcessProtocol(ProcessProtocol):

    # Will get called when the subprocess has data on stdout
    def outReceived(self, data):
        print 'STDOUT:', data
    
    # Will get called when the subprocess has data on stderr
    def errReceived(self, data):
        print 'STDERR:', data

    # Will get called when the subprocess starts
    def connectionMade(self):
        print 'Started running subprocess'

    # Will get called when the subprocess ends
    def processEnded(self, reason):
        print 'Completed running subprocess'

# Spawn the process and copy across the environment
reactor.spawnProcess(EchoingProcessProtocol(), 'ls', ['ls', '-al'], env=os.environ)
reactor.run()

Now the subprocess will execute, and anything written to the child's stdout will be printed to the screen.

This may seem entirely unremarkable, but it is now ready to plug into a GUI. Because outReceived is called from inside the GTK main loop, and only when the child has produced output, it never blocks waiting for data, so it may as well do something like:


def outReceived(self, data):
    self.text_view.get_buffer().insert(
        self.text_view.get_buffer().get_end_iter(), data
    )

This would append the received data to a text view widget (here called self.text_view).
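
One caveat: outReceived is handed whatever chunk of output happened to be available, which is not necessarily a whole line. If the GUI wants to update one line at a time, a small buffer in front of the widget helps. The LineSplitter class below is a hypothetical helper written for this post, not part of Twisted or PyGTK, and it assumes newline-terminated output.

```python
class LineSplitter(object):
    """Accumulate arbitrary chunks and hand back only complete lines."""

    def __init__(self):
        self._pending = b''

    def feed(self, data):
        """Add a chunk and return the list of lines it completed."""
        self._pending += data
        lines = self._pending.split(b'\n')
        # The last element is the unfinished tail (empty if the chunk
        # ended exactly on a newline); keep it for the next call.
        self._pending = lines.pop()
        return lines

    def flush(self):
        """Return any trailing partial line, for example from processEnded."""
        tail, self._pending = self._pending, b''
        return tail
```

Inside outReceived you would loop over self.splitter.feed(data) and insert each line into the buffer, then call flush() from processEnded to pick up any unterminated last line.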

Links:
Twisted How-to Processes
Twisted How-to PYGTK

7 comments:

Anonymous said...

err... why not use the one blindingly obvious solution: use python's threading module?

Ali said...

Yes, I guess threads are an option. Incidentally, how would you plan on using them? I can think of a few options, and it doesn't strike me as blindingly obvious which to use.

Remember we are running a subprocess.

glyph said...

Ali,

I just wanted to say, thank you so much for writing these nifty little articles about how to use Twisted in a variety of different situations. You are doing the Twisted community a great service by providing these kinds of insights about how mixing and matching Twisted functionality can be useful. Personally, I never would have thought to do a post about subprocesses plus GTK - it seems totally obvious, but this is exactly the sort of thing that will make the reasons Twisted exists and is useful more obvious.

Thanks again,

-glyph

glyph said...

anonymous,

Windows and Linux provide extremely subtly different behavior between processes. Twisted provides a nice abstraction over all that and just one API to use. We are continuously tweaking and improving the internals of that implementation to perform better on different OSes and provide more consistent behavior; if you use threads and processes directly, you have to figure it out yourself. Here's a very basic question: how do you start the process? exec(v(e))? CreateProcess? spawn(v(e)|l)? the subprocess module? What happens when a process hangs, how do you safely terminate it? You can't safely terminate a thread, after all. You also can't just use mutexes - the process might be in a state where you are ready to send it a signal, and then it terminates normally; even if the termination is handled in a mutex, you have to deal with the fact that it might no longer be around later, and provide different behavior for the termination method.

Plus, those of us with lots of experience programming with threads just know that it is always nice to avoid them :).

arnau said...

Quoting Jamie Zawinski's joke about regexps: "Some people, when confronted with a problem, think 'I know, I’ll use threads.' Now they have two problems."

So I agree with glyph: although threading is sometimes a great solution, it is usually an awful nightmare.

The Twisted approach is interesting and deserves a look. For this kind of problem I often use generators:

http://faq.pygtk.org/index.py?req=show&file=faq20.009.htp

http://faq.pygtk.org/index.py?req=show&file=faq23.020.htp

Ali said...

Glyph,

Well, thanks for all the hard work that has gone into making Twisted. As you say, things like this are trivial for a seasoned Twisted user or developer, but they are not something a PyGTK dev might have even thought about.

Ali

Ali said...

About threads,

I don't think threads are bad in and of themselves. I use them extensively in PIDA as the main asynchronous method, but the problems do exist: the usual problems with threads, compounded by the fact that GTK behaves differently when threaded, perhaps even in undefined, less-tested ways.

As long as I have been tight with the code and used one of the few abstractions we wrote to make it nicer, I haven't really had any problems, and the speeds have been really impressive. Forking actually came out slower, but I never found out why.

The shared namespace is also a massive bonus, or am I just being lazy?