Have you noticed that on many "reality" TV shows, the producers feel the need to add English subtitles for people who are speaking English? If it's so difficult for people to decipher speech, how in the world can we expect a computer to do it? The process of speech recognition involves seven very complicated steps, as explained at mental_floss.