CAPTCHAs
What Is a CAPTCHA?
A CAPTCHA filters bots. With an ever-increasing number of crawlers and bots flooding the web, the question is no longer whether a CAPTCHA or some other bot filter is required. From “Sign In” forms to “Contact Us” forms, bots often take advantage of unsecured or unprotected (i.e. “open”) forms and systems to send spam or take down services (a denial-of-service attack). Other bot filters include IP (VPN) checks, geolocation checks, and more. These more invasive techniques tend to be used for more sophisticated applications, such as protecting order forms, limiting requests, and content filtering.
Having said that, for everything else, a CAPTCHA will usually suffice. A CAPTCHA, or “Completely Automated Public Turing test to tell Computers and Humans Apart,” is essentially a simple Turing test: a test that only humans should be able to complete. Most bot filters operate by testing every user. However, the advent of “invisible” bot detection services, such as Google’s reCAPTCHA v3, which score each request in the background, allows for an improved user experience (UX), with only select users being required to complete a familiar “Are You a Robot?” test.
How They Work
CAPTCHAs first need to be installed by the owner or operator of a website. By choosing a provider for user challenges, one can avoid having to create and manage dynamically generated CAPTCHA challenges oneself. Having said that, a simple block diagram of how challenges are retrieved and processed is below:
Once the challenge is shown on the page (challenges range from picking a set of images, to entering hard-to-read text, to an audio challenge for visually impaired users), the user completes it. The request is then sent to your form or input processing logic, which should look like the following:
With the challenge successfully completed, the form or input can be processed, with an adequately low risk that the submission came from an automated bot.
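As one possible illustration of the processing step described above, the sketch below checks a submitted challenge token on the server before the form is handled. It assumes Google's reCAPTCHA verification endpoint, which takes `secret` and `response` fields and replies with JSON containing `success` (and `score` for score-based products). The function names, the `min_score` threshold of 0.5, and the injectable `post` parameter are assumptions made for this example, not part of any provider's library.

```python
import json
import urllib.parse
import urllib.request

# Google's documented reCAPTCHA verification endpoint.
VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def post_form(url, fields, timeout=5.0):
    """POST form-encoded fields and return the decoded JSON reply."""
    data = urllib.parse.urlencode(fields).encode("ascii")
    with urllib.request.urlopen(url, data=data, timeout=timeout) as resp:
        return json.load(resp)


def captcha_passed(token, secret, min_score=0.5, post=post_form):
    """Return True only if the provider confirms the challenge token.

    min_score is an assumed threshold; it only matters when the reply
    carries a "score" field (score-based products such as reCAPTCHA v3).
    """
    if not token:
        return False  # the form arrived without a challenge token
    try:
        reply = post(VERIFY_URL, {"secret": secret, "response": token})
    except (OSError, ValueError):
        return False  # fail closed on network errors or malformed replies
    if not reply.get("success"):
        return False
    score = reply.get("score")
    return score is None or score >= min_score
```

A form handler would call `captcha_passed(token, secret)` first and only process the submission when it returns `True`. Failing closed when the provider is unreachable is a design choice; some sites may prefer to fall back to another check instead.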
Other Validation Techniques
- VPN Checks
VPN IP validation often involves a primitive check against a database of known IP ranges and is often used for content blocking. Netflix, for example, has its own database to check whether an IP address belongs to a known VPN or datacenter rather than a residential or consumer network.
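A range check of this kind might look roughly like the sketch below, which uses Python's standard `ipaddress` module. The ranges listed are placeholders taken from the address blocks reserved for documentation, and `KNOWN_VPN_RANGES` and `is_known_vpn` are hypothetical names. A real deployment would load a maintained list of VPN and datacenter ranges instead.

```python
import ipaddress

# Placeholder ranges from the reserved documentation blocks (RFC 5737 and
# RFC 3849); these stand in for a real VPN/datacenter range database.
KNOWN_VPN_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_known_vpn(ip_text, ranges=KNOWN_VPN_RANGES):
    """Return True if the address falls inside any listed range."""
    try:
        address = ipaddress.ip_address(ip_text)
    except ValueError:
        return False  # not a parseable IP address; handle separately
    # Membership is False when the IP versions differ, so IPv4 and IPv6
    # ranges can share one list.
    return any(address in network for network in ranges)
```

A linear scan is adequate for a short list. A large database would likely call for a sorted or tree-based lookup.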
- GeoIP Checks
GeoIP checks verify that the location a user reports is not a large geographical distance away from the location of their IP address. A large distance indicates that the user is not really where they say they are, which can suggest that the user is either fraudulent or filling in random information in an attempt to spam or abuse a service.
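The distance comparison can be sketched with the haversine formula, as below. The example assumes both locations are already available as (latitude, longitude) pairs in degrees. Looking up the coordinates for an IP address is left to whichever GeoIP database is in use. The 500 km threshold and the function names are assumptions for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against rounding error near antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def location_mismatch(claimed, ip_located, max_km=500.0):
    """Return True if the claimed and IP-derived locations are too far apart.

    Both arguments are (latitude, longitude) tuples in degrees; max_km is
    an assumed threshold.
    """
    return distance_km(*claimed, *ip_located) > max_km
```

GeoIP data is often only accurate to the city or region level, so a generous threshold is usually sensible.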
- Behavioural Checks
Such checks are quite invasive for the end user but allow a webmaster, company, or website owner to see whether any particular request is behaving strangely. That is, by monitoring movement around a page, the speed of a request, and other metrics, a score can be generated to determine whether a request is allowed, requires further validation, or is blocked completely.
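To make the scoring idea concrete, here is a deliberately simple sketch. The signals, point weights, and thresholds are all invented for illustration. Production systems typically rely on many more metrics and on tuned or learned weights.

```python
from dataclasses import dataclass


@dataclass
class RequestSignals:
    seconds_to_submit: float   # time between page load and submission
    pointer_events: int        # mouse or touch movements observed
    requests_last_minute: int  # recent requests from the same client


def risk_score(signals):
    """Sum assumed penalty points into a score from 0 to 100."""
    score = 0
    if signals.seconds_to_submit < 2.0:
        score += 40  # submitted faster than a person could plausibly type
    if signals.pointer_events == 0:
        score += 30  # no movement around the page at all
    if signals.requests_last_minute > 30:
        score += 30  # unusually high request rate
    return score


def decide(signals, challenge_at=30, block_at=70):
    """Map the score to 'allow', 'challenge' (further validation), or 'block'."""
    score = risk_score(signals)
    if score >= block_at:
        return "block"
    if score >= challenge_at:
        return "challenge"
    return "allow"
```

For example, `decide(RequestSignals(0.5, 40, 3))` falls in the "challenge" band under these assumed weights, which is where a CAPTCHA could be shown.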
Conclusion
In an ever-growing Internet, bot filtering is an inevitability that most webmasters have to contend with. At best, such techniques make the user experience (UX) worse. At worst, legitimate users are barred from accessing a particular service. CAPTCHAs also raise accessibility concerns, though Google’s reCAPTCHA and hCaptcha have attempted to accommodate users with impairments through audio-based CAPTCHAs (hCaptcha offers a service where disabled users can opt out of a limited number of CAPTCHAs per day).
All in all, while CAPTCHAs continue to grow in complexity and popularity, the “it only takes one to ruin it for all” mantra holds ever true. More advanced bots, and CAPTCHA-solving services that bypass the “Turing test” by outsourcing the completion of bot checks, are a growing threat to existing filtering techniques. Thus, for applications that require additional assurance, CAPTCHAs can be combined with a (non-exhaustive) selection of VPN/datacenter IP checks, behaviour-based filtering, and fraud detection tools or databases.