Thursday, November 8, 2007
The other week, Barry Welford proposed that voice recognition is the killer enabling technology behind a future explosion of the mobile web. I am of two minds about audio for the web.
For output, I love it. I used the Sprint voice web heavily in the past (much of it has since become out of date), even though it was not that well designed in many respects and was audio only. I've been waiting for years for the technology to allow simultaneous audio and data streams; now I am waiting for someone to take advantage of it. Audio with the visual web should work really well. It can tie directly into portal theory, with the read-back version being a summary (due to speed), then slightly more detail in a glanceable view, and more info if you scroll or click. It's probably less useful for long articles, but the principle could be inverted so the visual component supports the audio stream, as I have mentioned before.
Not sure what I think of visual voicemail systems. They are nudging this way, but none thrills me totally as yet.
Input-wise, I have grave concerns. Star Trek, et al. use voice to engage the viewer in the otherwise very individual-centric behavior of interacting with a computer. But this is exactly why it does not work, in my experience. Recognition quality is not a concern of mine. As long ago as the early '90s, I was using OS/2 almost exclusively, and its voice input worked perfectly for interaction, and well enough at word recognition to be actively usable for typing. I'll even ignore the speed (it's a LOT slower to talk than to point or type, low-literacy folks aside).
The problem is the required isolation. Consider the use of an IVR today: how often has it been confused because someone next to you was talking? How often have you had to go away, or wait to make such a call, because your mom keeps wondering who you are talking to? I cannot think of a way around this, so voice input just seems like an insurmountably niche product to me.
I hear, never having been there, that Japanese commuter trains are very quiet, despite practically everyone clicking away on their mobiles. Can you imagine them all talking to their phones instead?