Friday, January 4, 2013

batcher - simple shell wrappers for Python

One of my favorite Linux features is how repetitive tasks can be automated via shell scripts. Unfortunately, the Bourne shell language and its variations (e.g. Bash) are bogged down by a number of idiosyncrasies – such as the nearly constant need to enclose variable references in quotes, the awkwardness of conditional constructs, the difficulty of performing simple arithmetic and so on – which needlessly complicate development. On the other hand, the ability to directly invoke user shell commands, scripts and command-line applications is a convenience seldom matched by alternatives, which explains why we mostly put up with the shell.

A while ago I wanted to write a relatively complex automation script, and not looking forward to doing the required text processing in Bash, decided instead to try my luck with Python, using Popen objects to invoke the needed command-line applications and collect their outputs. In order to make this task simpler, I came up with the function below:
from subprocess import Popen, PIPE, STDOUT

def batch(*args, **options):
    options.setdefault('stdout', PIPE)
    options.setdefault('stderr', STDOUT)
    process = Popen(args, **options)
    return process.stdout
This allowed me to interface to shell tools in a straightforward way, for example:
for line in batch('git', 'status', '-s', cwd=path):
    # Do some line-oriented processing with the command's output
    pass
While it was simple and worked well enough, this interface surely had room for improvement. It bothered me that I had no access to the Popen object driving the shell process, but only to its stdout object; I also felt it was wasteful that the process would run to completion even if control broke out of the loop before it finished. Finally, on an aesthetic note, I thought it would be nice if there were some "automagic" way to turn command names into callables, so I could e.g. do git('status', '-s') instead of batch('git', 'status', '-s').

After some tinkering with Python's reflection API, I came up with the code below:
#! /usr/bin/env python
#coding=utf-8

# batcher.py
# Dynamic interfaces to shell commands

from subprocess import Popen, PIPE, STDOUT
from sys import modules

class batch(Popen):
    r'''A running shell command or script.
    '''
    def __init__(self, command, args, options):
        # Take our own option out first: Popen rejects unknown keyword arguments
        self.daemon = options.pop('daemon', True)
        args = (command,) + args
        options.setdefault('stdout', PIPE)
        options.setdefault('stderr', STDOUT)
        Popen.__init__(self, args, **options)

    def __del__(self):
        if self.daemon:
            self.close()
        else:
            Popen.__del__(self)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()

    def __iter__(self):
        return self

    def close(self):
        if self.returncode is not None:
            return

        try:
            self.kill()
        except OSError:
            # The process has already exited
            pass

    def next(self):
        return self.stdout.next().rstrip()

class batcher_module(object):
    r'''An extension to the default module object, which creates new
        command-calling functions as they are imported.
    '''
    def __init__(self, module):
        self.module = module

    def __getattr__(self, name):
        try:
            return getattr(self.module, name)
        except AttributeError:
            pass

        def batcher(*args, **options):
            return batch(name, args, options)

        setattr(self.module, name, batcher)
        return batcher

modules[__name__] = batcher_module(modules[__name__])
When a symbol is imported from the batcher module, the batcher_module object first checks whether it already exists. If it doesn't (which is most likely), the symbol is created and bound to a function that will run the correspondingly-named shell command when called. That function works by instantiating an object of the batch class, which inherits from Popen and adds the following customizations and extensions:
  • By default both standard output and standard error are collected in the stdout attribute. This can be changed at instantiation time using the named arguments stdout and stderr;
  • Whereas Popen objects try to remain alive for as long as the underlying process is active, batch objects work the other way around by killing the process (if not yet finished) when they're selected for garbage collection (the original behavior can be restored by passing the daemon named argument with a value of False);
  • batch objects are iterable, returning the next (right-trimmed) line of output at each iteration;
  • batch objects are also their own context managers, killing the underlying process (if not yet finished) when the context is exited;
  • Finally, batch objects implement the close() method, which kills the underlying process if it hasn't yet finished, and does nothing otherwise.
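The behaviors listed above can be tried out with a small stand-in. What follows is only a sketch, not the batcher module itself: the Batch class and the endless child script are hypothetical names made up for illustration, and the class is trimmed down and written to also run on Python 3 (hence universal_newlines and __next__, which the original code does not use).

```python
import sys
from subprocess import Popen, PIPE, STDOUT


class Batch(Popen):
    """Hypothetical stand-in for the batch class: iteration, close() and context exit only."""

    def __init__(self, *args, **options):
        options.setdefault('stdout', PIPE)
        options.setdefault('stderr', STDOUT)
        # Assumption for this sketch: read text lines rather than bytes on Python 3
        options.setdefault('universal_newlines', True)
        Popen.__init__(self, args, **options)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()

    def __iter__(self):
        return self

    def __next__(self):
        line = self.stdout.readline()
        if not line:
            raise StopIteration
        return line.rstrip()

    next = __next__  # Python 2 spelling of the iterator protocol

    def close(self):
        # Kill the child only if it is still running, then reap it
        if self.poll() is None:
            self.kill()
        self.wait()
        self.stdout.close()


# A child process that would keep printing numbers if left alone
endless = 'import itertools\nfor i in itertools.count(): print(i)'

with Batch(sys.executable, '-u', '-c', endless) as proc:
    first_three = [next(proc) for _ in range(3)]

# Leaving the with block killed the child, so an exit status is available
finished = proc.returncode is not None
```

Here the loop stops after three lines and the child never runs to completion, which is the waste the first version of the function could not avoid.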
Using the batcher module makes it possible to write Python scripts that seamlessly interface with shell applications, for example:
#! /usr/bin/env python
#coding=utf-8

# repo_dirty.py - scans a repo base for git projects containing uncommitted changes

from re import search
from batcher import git, repo

def gitdirty(path):
    for line in git('status', cwd=path):
        # Any line other than the branch header or the "nothing to commit" notice signals pending changes
        if search(r'(On branch master)|(nothing to commit)', line) is None:
            print 'Project "%s" is dirty' % path
            return True

    return False

def repodirty():
    projects = 0
    dirty = 0
    for path in repo('forall', '-c', 'pwd'):
        projects += 1
        dirty += 1 if gitdirty(path) else 0

    print 'Finished checking dirty projects'
    print 'Total projects checked: %d' % projects
    print 'Dirty projects: %d' % dirty

def main():
    repodirty()

if __name__ == '__main__':
    main()
For the past few days I have been using this interface to rewrite some of my own shell scripts in Python, and I am pleasantly surprised by the results. So far it has worked without a hiccup; being able to use Python for shell automation duties has brought great improvements in performance and reliability, as well as in my own satisfaction with the resulting code. Between Python's power and this newfound seamless integration with the shell environment, I now see little reason to ever bother writing a shell script again.

5 comments:

  1. Hello Helio,

    I have the blog lifeasdaddy, at which you left a comment today. I just want to let you know I don't mind you using that as a possible medium to contact John Harris. You are welcome. It is to be hoped that he makes contact with you directly.

    Best wishes,

    Bob Meade

  2. Hello, this seems to be very useful, thanks for posting!

    But is there any chance it could be enhanced by handling return codes? My bash scripts heavily rely on those. Just evaluating lines does not seem suitable to me in my multi-language environments.

  3. Hi Anonymous,

    Yes, you could check the returncode property, which is inherited from the Popen class. See the reference here:

    http://docs.python.org/2.7/library/subprocess.html#subprocess.Popen.returncode

    Depending on your needs you could either add a method to the batch class to call communicate() then return the process' exit value, or change the batcher function to do it.

    Thank you for sharing. I was actually conjuring up something similar. Python is a perfect language readily available in a few environments I (have to) work in. It's going to be fun mashing this up with the subcommands code I came across earlier this week. Cheers :)
    Rob

  5. Nice to know you liked it. :)

    Since I posted this code I have made some improvements to it, and I'll look into publishing it as a GitHub project. I'll post an update as soon as I do.
