Saturday, February 9, 2008

Virtual Humans as Application Interfaces

About a year ago, I was involved in a project that would employ a chatterbot as an alternative to a certain service provider's web interface. In short, the user would chat with the bot, and it would in turn drive the (somewhat awkward) interface for him.

Although the project never went into production, it gave me several insights into chatterbots' practical uses, particularly as man-machine interfaces. Perhaps the most surprising one is this: text interfaces can in fact be very good for users. You may wonder what this has to do with virtual humans, so let me elaborate.

Visual interfaces tend to be much less productive than keyboard-bound ones – just compare the pace at which your average mouse-clicker completes tasks with that of the slightly more experienced keyboard-shortcutter. They are also less amenable to automation, in the sense that it is much harder (when possible at all) to record and replay click sequences than it is to write and invoke command scripts – this again translates into decreased productivity, since the user cannot easily automate everyday tasks. Finally, a reasonably well-written command language will be a subset of a human language (probably English), which in principle should make the learning curve smoother – the visual language of icons and buttons has close to no parallel outside the computer world, which could explain why so many users have to be taught to "just click the darn icon" time and again.

But if all this is true, how come we have embraced visual interfaces to such an extent, at the expense of the text-based alternative? My impression is that, paradoxically, they are more appealing to end users. Text interfaces tend to be depressingly dry; the blinking cursor after the prompt rarely, if ever, gives any clue as to what is or isn't possible, or where and how to start. With visual interfaces, at least there is a menu bar (and perhaps a set of hinted icons) telling users what options are available.

So how do virtual humans fit into this picture? As I see it, they address several of the drawbacks of text interfaces, while taking advantage of their benefits:
  • The sight of a greeting character is much friendlier to humans than that spooky prompt. Suddenly you feel like you're typing to something (someone?) that will make an honest effort to get what you mean, instead of simply blurting "bad command or file name";

  • A communicative virtual human will be eager to explain what it can do for you, and what it needs from you to get things done. Since the interface is designed as a series of exchanges (or "chat"), it is easier for the bot to point out problems without frustrating or irritating the user than it would be for a visual application and its message boxes – which are interruptions of, rather than part of, the normal interface flow;

  • Because it is a text interface, it would be easier, from both technical and usability points of view, to make the virtual human remember and recall sequences of requests than it would be with a visual interface. In fact, a bot could even watch for recurrent patterns and offer to record and later recall them as single requests, freeing users from the chore of script writing;

  • There will be times when a virtual human will not be able to understand what the user means (then again, the same goes for visual interfaces). However, a properly designed bot would be able to tell the user it can't understand him, recall what he was doing up to that point, relate what it thinks the user may be trying to do, and then explain how to do it (in my experience writing AIML bots, this is considerably simpler to do in a chatterbot – where the bot is already keeping track of context changes as it steps through tasks – than in a visual interface). Alternatively, the bot could log its inability to fulfill the user's request, and the botmaster would later add the needed patterns – just like that, your interface has been mended to comply with your user's wishes. Imagine doing anything like that in a visual interface!
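
To make the last two points a little more concrete, here is a minimal sketch in Python (not AIML, and not the code from the project mentioned above). It assumes requests are matched by exact text, which is far cruder than real pattern matching, and every name in it (MacroSuggestingBot, handle, record, the sample requests) is made up for illustration. It shows a bot that counts repeated sequences of requests and offers to save them as a single request, and that logs anything it cannot understand for the botmaster.

```python
from collections import Counter, deque


class MacroSuggestingBot:
    """Toy conversational front end. All names here are hypothetical."""

    def __init__(self, commands, window=2, threshold=2):
        self.commands = commands            # request text -> handler function
        self.window = window                # length of the sequences to watch for
        self.threshold = threshold          # repeats needed before offering a macro
        self.recent = deque(maxlen=window)  # the last few understood requests
        self.seen = Counter()               # how often each sequence has occurred
        self.macros = {}                    # macro name -> tuple of requests
        self.unmatched = []                 # log for the botmaster to review

    def handle(self, text):
        """Return the list of replies for one line typed by the user."""
        request = text.strip().lower()
        if request in self.macros:
            # Replay a recorded sequence as if it were a single request.
            return [self.commands[step]() for step in self.macros[request]]
        if request not in self.commands:
            # Log the failure and tell the user what is possible instead.
            self.unmatched.append(text)
            options = ", ".join(sorted(self.commands))
            return [f"Sorry, I did not understand that. I can: {options}."]
        replies = [self.commands[request]()]
        self.recent.append(request)
        if len(self.recent) == self.window:
            sequence = tuple(self.recent)
            self.seen[sequence] += 1
            if self.seen[sequence] == self.threshold:
                steps = " then ".join(sequence)
                replies.append(f"You often ask for {steps}. "
                               "Shall I save that as one request?")
        return replies

    def record(self, name, sequence):
        """Store a sequence of known requests under a single name."""
        self.macros[name.strip().lower()] = tuple(sequence)


if __name__ == "__main__":
    bot = MacroSuggestingBot({
        "check balance": lambda: "Your balance is 10.",
        "pay bill": lambda: "Bill paid.",
    })
    for line in ["check balance", "pay bill", "check balance", "pay bill"]:
        print(bot.handle(line))
    bot.record("monthly routine", ("check balance", "pay bill"))
    print(bot.handle("monthly routine"))
    print(bot.handle("open the pod bay doors"))
    print(bot.unmatched)
```

A real bot would ask for a macro name in the conversation itself and would match patterns rather than exact strings, but even this toy version suggests how little machinery the "watch, offer, record" loop needs once every request is already text.
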

Mind you, I am not advocating that we burn all pointing devices. I have no doubt that, for many tasks, a pointer is still the best tool (take many picture editing tasks, for example). But I do believe a well-harnessed conversational interface could be a great advance for end users and developers alike.