Wednesday, January 16, 2013

Guess what we're actually doing!

"This is so inspiring! I'm in absolute glee to know that I'm not alone in the world and............. and........... and........."

Huh, "Write words below." Okay, ha! Next... what in the world is that? v...c. ooh, no, r...i..t.n...t..' ?

What!? No! "Please retry"? Please retry?! No ways! Exit.

Sound familiar? The above situation is a demo of one of the most annoying, teeth-grinding wastes of time on the Internet. The "checking that you're not a robot" test that we all have to do before we can access anything is a double pain in the behind, and I have been all aboard the "I hate the Prove You're Not a Robot test" campaign with everyone else for as long as I can remember. I have left blogs countless times with comments unpublished because of these scratchy things. After the third "The answer you entered for the CAPTCHA was not correct", I'm just about ready to pull my hair out.

About a week ago, though, after completing one of those fruitless irritants, a link claiming to explain what they actually are caught my eye, and trust me, you will not believe what we're actually doing!

reCAPTCHA is a program created at Carnegie Mellon University, USA, and now get this: we, the populace of the world, are in fact digitizing the text of old books, newspapers, magazines and documents written before the computer age, in order to immortalize them and spread their word to the planet, while also protecting websites from spam and keeping computer robots away from restricted info and sites. Computers can't read distorted text as well as humans can, so they can't get into sites protected by a CAPTCHA. I never knew a twitch about this before, and don't lie, neither did you until about a minute ago. It is totally mind-blowing how much is going on right underneath our noses that we don't even realize!

Approximately 200 million reCAPTCHAs are solved per day. As of 2009, 20 years of The New York Times had been digitized, and a goal was set to finish the rest by the end of 2010. Unfortunately, nowhere on the world wide web does it say whether they really achieved this goal. Still, although each CAPTCHA takes only about 10 seconds on average, that adds up to an estimated 500,000 human hours spent solving these frustrating pickles every day, so I think we can both agree that their aim has pretty much been reached.
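For anyone who wants to check my maths, here is the back-of-the-envelope sum as a tiny Python sketch. The two inputs are just the figures quoted above, and the variable names are my own.

```python
# Back-of-the-envelope check using the two figures quoted above.
captchas_per_day = 200_000_000   # approximate reCAPTCHAs solved per day
seconds_per_captcha = 10         # rough average time per CAPTCHA

total_seconds = captchas_per_day * seconds_per_captcha
human_hours_per_day = total_seconds / 3600  # 3600 seconds in an hour

print(f"{human_hours_per_day:,.0f} human hours per day")  # prints "555,556 human hours per day"
```

So "about 500,000" is, if anything, rounding down a little.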

One last question: if we are the ones solving the words, how can the system tell if we're wrong? Here's how. A word that cannot be read correctly by OCR (Optical Character Recognition, software that tries to read text from scanned images) is given to a user together with another word whose answer is already known. If the user gets the known word right, the system assumes their answer for the new word is correct too. Hang on, but that's just working on assumptions. How could it possibly be accurate?
Wait, there's more: the system then gives the new image to a number of other people to interpret, so it can tell, with greater confidence, whether the original solution was accurate.
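
To make that two-step check a bit more concrete, here is a minimal Python sketch of the idea. It is only my own illustration, not reCAPTCHA's real code: the function names, the example words and the "three matching answers" threshold are all assumptions I made up for the demo.

```python
from collections import Counter

def check_response(known_answer, typed_known, typed_unknown, votes):
    """Count a guess for the unknown word only if the known word was typed correctly.

    Returns True if the user passes the test. (Hypothetical helper.)
    """
    if typed_known.strip().lower() != known_answer.strip().lower():
        return False  # failed the control word, so the guess is thrown away
    votes[typed_unknown.strip().lower()] += 1
    return True

def agreed_reading(votes, min_votes=3):
    """Return the most common guess once it has at least min_votes, else None.

    The threshold of 3 is an assumption for this sketch.
    """
    if not votes:
        return None
    word, count = votes.most_common(1)[0]
    return word if count >= min_votes else None

votes = Counter()
responses = [
    ("morning", "written"),
    ("morning", "written"),
    ("mourning", "vritten"),  # wrong control word, so this guess is ignored
    ("morning", "writtcn"),
    ("morning", "written"),
]
for typed_known, typed_unknown in responses:
    check_response("morning", typed_known, typed_unknown, votes)

print(agreed_reading(votes))  # prints "written"
```

The point is simply that one person's answer is never trusted on its own: it only counts if they passed the known word, and it only "wins" once enough other people agree.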

I'll be honest. I am truly glad that completing those squiggly words is contributing to the greater good, and I love that we are helping to spread knowledge all over the world, but as we're being honest here, to moi they are still the most colossal vexations of our dearest Internet. Fine, they're no longer a waste of time, but as a final word on the matter: in 2040 they're going to look like this, because there will be nothing left to digitize. Fun.