(updated July 31)
Microsoft’s failed voice recognition product demonstration at last Thursday’s Financial Analyst Meeting came full circle on Monday when Larry Osterman, a 22-year veteran Microsoft developer, admitted on his blog that he was responsible for the bug that led to the on-stage meltdown.
In a disarmingly confessional post entitled “Wait, that was my bug? Ouch!” Osterman describes the spectacle and then adds “…and it was my fault.”
Wait a second. Someone in the computer industry just took personal responsibility for messing up?
Eleven years after Microsoft co-founder Bill Gates, in his 1995 book “The Road Ahead,” predicted humans would one day talk to their computers rather than have to type, the future appeared to be at hand.
At Microsoft’s annual Financial Analyst Meeting on Thursday, Vista product manager Shanen Boettcher set out to show just how easy it will be to use the speech recognition technology built into the upcoming Windows Vista software: for example, dictating aloud a simple, heartfelt letter to mom and having one’s voice automatically transcribed into a computer.
The result was a disaster.
Several tries at making the computer understand the simple salutation “Dear Mom” ended with Microsoft software reading it as “Dear Aunt, let’s set so double the killer delete select all.” Attempts to correct or undo or delete the error only deepened the mess.
It was not just a perfect illustration of the problems of making machines understand human speech. What other features of Microsoft Windows Vista pose trouble, the audience was left to wonder? “The crashing demo didn’t do a lot to instill confidence in the new Windows product,” one Wall Street analyst, who was present at the demo, said.
Windows Vista, already five years in the making, has been postponed by Microsoft several times. Delays have put off the consumer version of Windows until early 2007 — after the crucial holiday shopping season. Vista is scheduled to ship to corporate customers this November, that is, unless more problems are uncovered.
Later, Microsoft CEO Steve Ballmer blamed the failed speech recognition product demonstration on “a little bit of echo” in the room, which confused the speech-to-text system. To be sure, a second demonstration during the meeting showed how effective speech recognition can be for navigating around applications, like Microsoft Outlook.
Structured menus appear to work fine. But recognizing random, natural speech still has quite a ways to go, by all appearances: “Let’s set so double the killer delete select all.”
– Additional reporting by Daisuke Wakabayashi.
Read the more than 2,800 comments on Digg…
Buyer beware, but several posters are offering T-shirts to commemorate the event. (1), (2), (3)
Here is the YouTube link to various videos of the demo gone awry.

54 comments so far
[...] Over the weekend, the wires were full with reports of a speech recognition demo at the Microsoft’s Financial Analysts Meeting here in Seattle that went horribly wrong. Slashdot had it, Neowin had it, Digg had it, Reuters had it. It was everywhere. And it was all my fault. Well, mostly. Rob Chambers on the speech team has already written about this, here’s the same problem from my side of the fence. About a month ago (more-or-less), we got some reports from an IHV that sometimes when they set the volume on a capture stream the actual volume would go crazy (crazy, for those that don’t know, is a technical term). Since volume is one of the areas in the audio subsystem that I own, the bug landed on my plate. At the time, I was overloaded with bugs, so another of the developers on the audio team took over the investigation and root caused the bug fairly quickly. The annoying thing about it was that the bug wasn’t reproducible - every time he stepped through the code in the debugger, it worked perfectly, but it kept failing when run without any traces. If you’ve worked with analog audio, it’s pretty clear what’s happening here - there’s a timing issue that is causing a positive feedback loop that resulted from a signal being fed back into an amplifier. It turns out that one of the common causes of feedback loops in software is a concurrency issue with notifications - a notification is received with new data, which updates a value, updating the value causes a new notification to be generated, which updates a value, updating the value causes a new notification, and so-on… The code actually handled most of the feedback cases involving notifications, but there were two lower level bugs that complicated things. 
The first bug was that there was an incorrect calculation that occurred when handling one of the values in the notification, and the second was that there was a concurrency issue - a member variable that should have been protected wasn’t (I’m simplifying what actually happened, but this suffices). As a consequence of these two very subtle low level bugs, the speech recognition engine wasn’t able to correctly control the gain on the microphone; when it did, it hit the notification feedback loop, which caused the microphone to clip, which meant that the samples being received by the speech recognition engine weren’t accurate. There were other contributing factors to the problem (the bug was fixed on more recent Vista builds than the one they were using for the demo, there were some issues with the way the speech recognition engine had been “trained”, etc), but it doesn’t matter - the problem wouldn’t have been nearly as significant. Mea Culpa. [...]
- Posted by Wait, that was MY bug? Ouch! » Wagalulu - Microsoft
[...] That’s what Windows Vista thought product manager Shanen Boettcher said: “Dear Aunt, lets set so double the killer delete select all.” What he DID say was “Dear mom”. Reuters has the story: Several tries at making the computer understand the simple salutation Dear Mom was read by Microsoft software as Dear Aunt, lets set so double the killer delete select all. Attempts to correct or undo or delete the error only deepened the mess. [...]
- Posted by brilliantdays.com | Dear Aunt, lets set so double the killer delete select all.
[...] When good demos go (very, very) bad - Reuters Newsblogs [...]
- Posted by Multi-Media Me » Lets set so double the killer delete select all.
dragon dictate gave me this gem:
- Posted by russspushy pop-up people pops
I still can’t stop laughing…
- Posted by achan
This is why i use PCLinuxOS. Or Gentoo, depending on my mood.
Windows is only useful for gaming, but it doesn’t even do that right half of the time. When WINE is perfect, Windows will be useless.
- Posted by sonicbhoc
Wow just release it in 2020 when its ready
- Posted by PreZ
[...] http://blogs.reuters.com/2006/07/28/when-good-demos-go-very-very-bad/ Eleven years after Microsoft co-founder Bill Gates in his 1995 book The Road Ahead predicted humans would one day talk to their computers rather than have to type, the future appeared to be at hand. At Microsoft’s annual Financia … [...]
- Posted by usmediaweb - » When good demos go (very, very) bad
[...] Based on this report from Reuters, I think we can agree that there’s still a tad of tweaking to be done… [...]
- Posted by that canadian girl » Blog Archive » When good demos go (very, very) bad
Lets set so double the killer delete select all.
FINALLY a phase that can replace “All your base are belong to us!”
- Posted by Joe
you would think windows would have thought of some sort of failsafe to make sure that their demo wouldnt screw up like it did. maybe as simple as restricting its dictionary to the words that would ultimately be used or something like that. it is just odd that vista would fail to recognize spoken words as it did when there are already programs that can do the same thing only much better.
- Posted by mike
[...] That’s the sentence that Windows Vista’s speech recognition wrote in response to an attempt to dictate it a simple “dear mom” letter. [...]
- Posted by Syntactically Correct - Amit Schreiber’s Blog » Blog Archive » Dear Aunt, lets set so double the killer delete select all.
[...] Click here for the story. [...]
- Posted by Issues » Blog Archive » It’s a shame Kubric isn’t around to see this
At the end of the transcript of Ballmer’s speech:
“Due to the varying sound quality and subject matter of tapes, the information in this transcript may contain inaccuracies.”
- Posted by Juha
Have put a video of the whole thing on my blog, plus a hilariously overdubbed Italian Vista speech recognition demo.
Link Rob Chalmers explaining what actually went wrong also on the blog - it wasn’t an echo, apparently…
http://www.geekzone.co.nz/juha/946
- Posted by Juha
http://virtualmagic.blogspot.com/2006/07/microsoft-vista-speech-demo.html
- Posted by Anonymous
One more reason I own Apple products. Bloatware no body needs. Besides if I feel like running Microsoft software I’ll just fire up my MacBook.
- Posted by Dave
yea, i’ve used the vista beta and tried the voice stuff. it actually works pretty well, but the documentation cause its a beta isn’t anywhere complete so finding out what the exact commands are to control stuff aren’t easily found. As with betas they change alot of stuff from version to version which goes undocumented so he might’ve gotten confused when the commands he was use to saying got picked up as natural speech and wrote them down. Stuff like this happens.. I think the voice, if they keep it updated and allow for an open (unlikely) developer interface to allow for addons it could really be a nice feature of the OS.
- Posted by matt
Ha ha! Reminds me of the early Newton handwriting recognition. Of course it got better, but all anyone remembers is the first generation.
- Posted by Ken
oic - 3 + 4 and now 1 + 2
- Posted by erm