They say when you build a better mousetrap, someone will come along and build a better mouse. That seems to have come true, except in this case, the builders are the same company.
You are familiar with reCAPTCHA, the security system that makes you decipher a pair of distorted words to prove you’re not a bot. One of the words is the security check; the other is a word the software could not read, and your answer is crowdsourced through reCAPTCHA to find out what it says. The purpose is to improve optical character recognition (OCR) in order to digitize printed books. So while one screwy word is there to deceive bots, the other is there to teach bots how to read screwy words. That in itself foreshadows the rest of the story. Google bought reCAPTCHA in 2009.
Google Street View has been working on improving its OCR to read house numbers, allowing it to map addresses better. You may have noticed that on some Blogger sites (Google owns Blogger), you have to interpret a photographed house number to leave a comment. Those house numbers are from Street View. You can see where this is leading.
Advances in the house number recognition project are such that Google’s software can recognize a single digit in a photograph 97.84% of the time. Since most street addresses have more than one digit, and every digit has to be read correctly, the accuracy on complete addresses is lower, at 90%.
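To see why a multi-digit address is harder than a single digit, here is a rough back-of-the-envelope sketch. It assumes each digit is read independently with the same 97.84% success rate, which is my simplification and not something Google has stated; the function name `address_accuracy` is made up for illustration.

```python
def address_accuracy(per_digit: float, digits: int) -> float:
    """Chance of reading every digit correctly.

    Assumption (not from the research): digit errors are independent
    and every digit has the same per-digit accuracy.
    """
    if digits < 1:
        raise ValueError("an address needs at least one digit")
    return per_digit ** digits


if __name__ == "__main__":
    # Print the estimate for house numbers of one to five digits.
    for n in range(1, 6):
        print(f"{n} digit(s): {address_accuracy(0.9784, n):.1%}")
```

Under that assumption, a five-digit house number comes out a little under 90%, which is in the same neighborhood as the quoted address figure. Real errors are unlikely to be independent, so treat this as an illustration and not as how the 90% was measured.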
To test the algorithm, Google also let it loose on its own reCAPTCHA puzzles. There, it is 99.8 percent accurate on the hardest reCAPTCHA puzzles. Given that the whole idea of CAPTCHAs is that they are too hard for computers to solve, that’s a pretty stunning number and the accuracy is likely better than that of most humans (at least I know I don’t get anywhere close to 99.8 percent accuracy when I try to solve CAPTCHAs…).
That’s obviously a problem for reCAPTCHA, because developers who are less interested in the science could use the same technique to spam blog comments, for example. Google, however, says that its CAPTCHA system is now less dependent on deciphering the distorted text than ever before. Instead, reCAPTCHA now looks at a broader range of clues. Entering the text is just one clue, but Google now looks at it as “a medium of engagement to elicit a broad range of cues that characterize humans and bots.”
So while Google is working on helping its bots read real-world words and numbers, it also has to scramble to find other ways to tell who is human and who is software. We don’t know what those other cues really are, because if Google told us, the bots would know, too. Read more about the research at TechCrunch. -via Digg