CAPTCHA vs Bots: Are you a Robot or Not

“With great power comes great responsibility,” Uncle Ben once said to Spider-Man. On the web today, advances in technology have made it necessary for you to verify that you are human. Bots have become so powerful that it is getting harder and harder to keep them from spamming or attacking websites, especially after years in which we inadvertently trained them by solving CAPTCHAs and improving their OCR (optical character recognition) capabilities.

CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) is now a daily part of surfing the internet. It started with typing obscured text from an image into a box to prove we are not bots, and has moved on to picking out a bus, car, crosswalk or other object in an image grid. The Turing test is used to determine whether an artificial intelligence can convincingly pass as human, and Google’s CAPTCHA is the most popular one yet.

Early versions of CAPTCHA (2000s)
Still an early version of CAPTCHA: the text became more obscured to make it harder for bots
Adoption of images from Google’s Street View
Use of natural images became a better option
Use of adversarial images to throw off bots was adopted and is still very much in use.

Ultimately, CAPTCHA is a filter, or selectively permeable barrier, meant to keep bots out and let humans in. However, CAPTCHA is itself a tool for training AI. Fed with all those solved CAPTCHAs, the AI eventually becomes so good at solving them that it does better than humans.

A classic case of the student outdoing the teacher is Google’s reCAPTCHA experiment back in 2014. Its machine-learning algorithm was able to solve complex, distorted CAPTCHA text with 99.8% accuracy, while humans fell short at a miserly 33%. That prompted the switch from synthetic images of distorted text to natural images, giving rise to the image grid we often see now.

Mind you, Google is not the only provider of CAPTCHA services. The search engine giant bought reCAPTCHA in 2009 from the Carnegie Mellon team. While it is free for most sites, solving CAPTCHAs helped Google digitize texts for Google Books and improve Google Street View as well as its image recognition software. The benefits Google derives are what keep it free, but it recently started charging an undisclosed Enterprise tier fee, which made Cloudflare ditch it.

Other popular CAPTCHA options are Honeypot, hCaptcha, Sweet Captcha, NuCaptcha and so on. They all have different methods of screening for bots. Honeypot uses a form field that is hidden from human users but visible to bots, so any data entered in that field is considered to come from a bot. hCaptcha and Sweet Captcha screen bots by asking users to match items in the same category. However, none of these methods is foolproof either, given the increasing sophistication of machine intelligence and the availability of human input.
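To make the Honeypot idea concrete, here is a minimal Python sketch of the server-side check. It is an illustration under stated assumptions, not how any particular product works: the field name "website" and the helper is_probable_bot are made up for this example, and the form is assumed to hide that field from human visitors (for instance with CSS).

```python
def is_probable_bot(form_data: dict, honeypot_field: str = "website") -> bool:
    """Return True when the hidden honeypot field contains any text.

    Assumption: humans never see the field, so they leave it empty,
    while a naive bot fills in every field it finds in the markup.
    """
    value = form_data.get(honeypot_field, "")
    return value.strip() != ""


# Hypothetical submissions, as a web framework might hand them over.
human_submission = {"name": "Ada", "email": "ada@example.com", "website": ""}
bot_submission = {"name": "x", "email": "x@example.com", "website": "http://spam.example"}

for submission in (human_submission, bot_submission):
    verdict = "bot" if is_probable_bot(submission) else "human"
    print(submission["email"], "looks like a", verdict)
```

A check like this only catches bots that blindly fill every field. One that renders the page or skips hidden inputs will sail through, which is part of why none of these methods is foolproof.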

So far, natural images have proven able to keep bots at bay, especially ‘adversarial images’, which are particularly difficult for AI vision to read. However, ImageNet hosts an ever-growing visual database of over 14 million hand-annotated images, with hundreds to thousands of images per node. These resources are used for AI vision training and software research, and on top of that we are constantly making bots better through the exact same CAPTCHA security meant to keep them out.

Also in 2014, ‘No CAPTCHA reCAPTCHA’ was introduced, where you just have to tick a box to pass the ‘are you a robot?’ test. However, this doesn’t replace the usual CAPTCHA, since you can still be given a challenge to solve after checking the box. This provides an extra layer of security in case the system is not convinced you are human. The latest update, reCAPTCHA v3, was introduced in 2018 and gives site owners more flexibility to fine-tune how it behaves on their sites.
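As a rough illustration of that fine-tuning, reCAPTCHA v3 reports a score between 0.0 and 1.0 for each request (higher means more likely human) and leaves it to the site to decide what to do with it. The Python sketch below assumes the verification response has already been fetched and parsed into a dict. The decide helper, the 0.5 threshold and the "login" action name are assumptions chosen for this example, not values prescribed by Google.

```python
def decide(verification: dict, threshold: float = 0.5,
           expected_action: str = "login") -> str:
    """Turn a parsed reCAPTCHA v3 verification response into a site decision.

    Returns "allow", "challenge" (ask for extra proof) or "reject".
    The threshold and action name are the site owner's own choices.
    """
    if not verification.get("success", False):
        return "reject"  # token was invalid or expired
    if verification.get("action") != expected_action:
        return "reject"  # token was issued for a different page action
    score = float(verification.get("score", 0.0))
    return "allow" if score >= threshold else "challenge"


# Hypothetical parsed responses for illustration only.
print(decide({"success": True, "score": 0.9, "action": "login"}))
print(decide({"success": True, "score": 0.3, "action": "login"}))
print(decide({"success": False}))
```

A stricter site could raise the threshold for sensitive actions such as payments and lower it for simple page views.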

You might be wondering how exactly clicking on a box can tell you apart from a robot. That’s because you’re not just clicking: an ARA (Advanced Risk Analysis) algorithm runs on the backend, monitoring how the user interacts with the CAPTCHA.

In the long run, making CAPTCHAs harder will not work, because machines learn fast and are better at these things than humans. Various techniques are readily available to hackers as APIs and open source tools to use in bots. Tools like Alchemy, NeuralTalk, GRIS and so on make CAPTCHA even less effective at keeping bots out. Google’s own reverse image search and voice recognition API can also be used to bypass CAPTCHAs.

You also have human CAPTCHA-solving services that work at a low cost, solving regular CAPTCHAs for about $1.00, while $2.99 gets you 1,000 reCAPTCHA challenges solved. The result shows how inconvenient CAPTCHA is becoming for people and how much easier it is becoming for bots to get through.

At the end of the day, CAPTCHA is not really very effective against bots. Instead, it has made the browsing experience more tedious for users and is a bump in the conversion process. The reality is that AI will keep getting better as time goes on, and so will CAPTCHAs, especially when complexity is the basis for differentiating humans from bots.

It is important to know that the presence of a CAPTCHA does not make a site safe. Recent phishing sites use CAPTCHA to look legitimate, as was the case in a recent Netflix credential phishing campaign. A CAPTCHA doesn’t validate a site’s credibility in any way, as it is just a measure to curb bot attacks.