One of my favorite Linux features is how repetitive tasks can be automated via shell scripts. Unfortunately, the Bourne shell language and its variations (e.g. Bash) are bogged down by a number of idiosyncrasies – such as the nearly constant need to enclose variable references in quotes, the awkwardness of conditional constructs, the difficulty of performing simple arithmetic and so on – which needlessly complicate development. On the other hand, the ability to directly invoke shell commands, scripts and command-line applications is a convenience seldom matched by alternatives, which explains why we mostly put up with it.
A while ago I wanted to write a relatively complex automation script, and not looking forward to doing the required text processing in Bash, decided instead to try my luck with Python, using Popen objects to invoke the needed command-line applications and collect their outputs. In order to make this task simpler, I came up with the function below:
```python
from subprocess import Popen, PIPE, STDOUT

def batch(*args, **options):
    # Run a command, merging its error output into stdout,
    # and return the resulting output stream
    options.setdefault('stdout', PIPE)
    options.setdefault('stderr', STDOUT)
    process = Popen(args, **options)
    return process.stdout
```

This allowed me to interface with shell tools in a straightforward way, for example:
```python
for line in batch('git', 'status', '-s', cwd=path):
    # Do some line-oriented processing with the command's output
    pass
```

While it was simple and worked well enough, this interface surely had room for improvement. It bothered me that I had no access to the Popen object driving the shell process, only to its stdout stream; I also felt it was wasteful that the process would run to completion even if control broke out of the loop before it finished. Finally, on an aesthetic note, I thought it would be nice if there were some "automagic" way to turn command names into callables, so I could e.g. do git('status', '-s') instead of batch('git', 'status', '-s').
After some tinkering with Python's reflection API, I came up with the code below:
```python
#! /usr/bin/env python
#coding=utf-8

# batcher.py
# Dynamic interfaces to shell commands

from subprocess import Popen, PIPE, STDOUT
from sys import modules


class batch(Popen):
    r'''A running shell command or script.
    '''
    def __init__(self, command, args, options):
        args = (command,) + args
        options.setdefault('stdout', PIPE)
        options.setdefault('stderr', STDOUT)

        # Consume the 'daemon' option here, since Popen doesn't accept it
        self.daemon = options.pop('daemon', True)
        Popen.__init__(self, args, **options)

    def __del__(self):
        if self.daemon:
            self.close()
        else:
            Popen.__del__(self)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()

    def __iter__(self):
        return self

    def close(self):
        # Kill the underlying process, unless it has already been waited on
        if self.returncode != None:
            return

        try:
            self.kill()
        except:
            pass

    def next(self):
        # Return the next right-trimmed line of output
        return self.stdout.next().rstrip()


class batcher_module(object):
    r'''An extension to the default module object, which creates new
        command-calling functions as they are imported.
    '''
    def __init__(self, module):
        self.module = module

    def __getattr__(self, name):
        try:
            return getattr(self.module, name)
        except AttributeError:
            pass

        def batcher(*args, **options):
            return batch(name, args, options)

        setattr(self.module, name, batcher)
        return batcher


# Replace this module in sys.modules with the attribute-creating wrapper
modules[__name__] = batcher_module(modules[__name__])
```

When a symbol is imported from the batcher module, the batcher_module object first checks whether it already exists. If it doesn't (which is most likely), the symbol is created and bound to a function that will run the correspondingly-named shell command when called. That function works by instantiating an object of the batch class, which inherits from Popen and adds the following customizations and extensions (a short usage sketch follows the list):
- By default both standard and error outputs are collected in the stdout attribute. This can be changed at instantiation time using the named arguments stdout and stderr;
- Whereas Popen objects try to remain alive for as long as the underlying process is active, batch objects work the other way around by killing the process (if not yet finished) when they're selected for garbage collection (the original behavior can be restored by passing the daemon named argument with a value of False);
- batch objects are iterable, returning the next (right-trimmed) line of output at each iteration;
- batch objects are also their own context managers, killing the underlying process (if not yet finished) when the context is exited;
- Finally, batch objects implement the close() method, which kills the underlying process if it hasn't yet finished, and does nothing otherwise.
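To make this concrete, here is a minimal, hypothetical usage sketch; the repository path, the command-line arguments and the search term are illustrative assumptions rather than part of the original scripts (git is assumed to be on the PATH):

```python
# Importing `git` from batcher triggers batcher_module.__getattr__,
# which creates a function that runs the like-named shell command.
from batcher import git

# As a context manager, the batch object kills the underlying process
# (if still running) when the block exits, even after an early break.
with git('log', '--oneline', cwd='/path/to/some/repo') as log:
    for line in log:  # iteration yields right-trimmed output lines
        print line
        if 'refactor' in line:
            break
```

Outside a with block, passing daemon=False in the call restores the stock Popen behavior, so the process is left to run to completion when the batch object is garbage collected.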
```python
#! /usr/bin/env python
#coding=utf-8

# repo_dirty.py - scans a repo base for git projects containing uncommitted changes

from re import search

from batcher import git, repo


def gitdirty(path):
    # A clean project reports "nothing to commit" in its git status output;
    # anything else means there are uncommitted changes
    for line in git('status', cwd=path):
        if search(r'nothing to commit', line) != None:
            return False

    print 'Project "%s" is dirty' % path
    return True


def repodirty():
    projects = 0
    dirty = 0
    for path in repo('forall', '-c', 'pwd'):
        projects += 1
        dirty += 1 if gitdirty(path) else 0

    print 'Finished checking dirty projects'
    print 'Total projects checked: %d' % projects
    print 'Dirty projects: %d' % dirty


def main():
    repodirty()


if __name__ == '__main__':
    main()
```

For the past few days I have been using this interface to rewrite some of my own shell scripts in Python, and I am pleasantly surprised by the results. So far it has worked without a hiccup; being able to use Python for shell automation duties brought great improvements in performance and reliability, as well as in my satisfaction with the resulting code. Between Python's power and this newfound seamless integration with the shell environment, I now see little reason to ever bother writing a shell script again.