I've been called insane, but haven't been proven a liar...

How Cloudflare tries to fuck your privacy: A brief look at the PoW CAPTCHA trend

26 Mar, 2024

(The bulk of this was written in 2022, but I did not release this since, well, most people are willfully ignorant retards, anyway. Why bother? I'm no tech/IT specialist and I've reached these conclusions on my own, with some help of the knowledge on the internet I've looked up by myself. If the masses aren't doing the same, they either don't care enough or are too stupid to care. )

Brief description of the situation

So, you're browsing the web in peace, and occasionally you encounter something like this on your page:

"Don't be alarmed, it's to make it harder for those scary bots to access the website," Cloudflare says. So a normie user will just let Cloudflare's heavily obfuscated scripts run, doing only a Cloudflare employee knows what, wait a couple of seconds, and voila! Access to the website granted.

In case you're a privacy-concerned person, they'll tell you that they don't do any fingerprinting and that this is just a mere proof-of-work test to complicate access to website for bad bots. (By the way, how does this particular user know what exactly is being done by Cloudflare if the scripts responsible for "bot protection" are heavily obfuscated?) When you find out that they actually do fingerprint you, and that this sort of fingerprinting is very intrusive, and that Cloudflare itself doesn't really hide the fact, openly advertising it on its corporate page, the same dude who told you it's totally not fingerprinting but a JS test now tells you that they do fingerprint their users, but this time "it's to defeat bad bots" (while his nose slightly grows in background) and that "Cloudflare fingerprinting isn't enough to uniquely track a user so calm TF down already". (his nose grows a bit more)

And then you familiarize yourself with canvas fingerprinting and event tracking and it turns out that this Cloudflare shill has fucking lied to you again. That son of a bitch.

Well, at this point you can be totally sure that something fishy is going on with all this "bot protection" bullshit. Strangely enough, no one in privacy communities gives any meaningful coverage of this issue - or at least, whatever coverage exists isn't popular enough to be found on the first three pages of Startpage, Swisscows, Qwant, and Brave Search.

Well, if no one speaks about this, then I will.

Why are they doing this?

So far there is no solid, popular "conspiracy" theory regarding the "bad bot pandemic", so let me create one.

Heavy presence of "client-side scanning" in many, if not all, "anti-bot" solutions

Just a few years ago, the chief way to combat Internet bots was the Reverse Turing Test, more commonly known as "CAPTCHA". The principle behind the test was easy and understandable: the user was prompted to solve a challenge to prove they're a sapient being and not some automated routine. It could of course contain tracking mechanisms, but as far as the test itself goes, it let sapient beings through if they passed the challenge. So blocking trackers is mostly enough in this case.

However, somewhere around 2020, Cloudflare, the now-chief user tracking company masquerading as a "bot protection system", started using JavaScript challenges performed by the end user's machine as a way to "ward off bad bots". Yes, they did exactly that: they've fucking switched from providing challenges only a sapient being can solve, to automated scripts which are solved by any machine running JS, in order to prevent bot access to the website.

Fishy. As. Fuck. It's more like replacing your door lock, which serves as a reliable enough anti-intruder measure, with a set of sensors scanning lots of data from anyone who's inside the house, while removing the fucking lock itself. All because "wasting twenty seconds to lock/unlock those locks each time you go in and out is a chore for most people". Maybe in some future setting in an abstract human society it will sound reasonable, but right now it sounds like absolute and total BS.

Cloudflare isn't the only company that claims to track users to protect the website from scary bots. There's a solid set of entities who track their users and their behavior, all supposedly "in order to stop bots". This isn't just "fishy", this is an entire fucking lake so full of fucking fish that you can't swim in it without being surrounded by a swarm of 'em.

The attempts to introduce face control on the Internet

Speaking of Cloudflare (and several other websites), they've come up with a particularly nasty scheme that outright prevents access to the website unless you show them all the papers, err, fingerprints needed to sufficiently identify you. There are several legal efforts to write authentication into law; however, these are met with fierce and only growing public resistance, so there's an alternative method: face control upon visiting the page.

In the case of Cloudflare, they stop the user before the user reaches the target site, by forcing them to submit a fingerprint which will, if not outright identify them uniquely, then at least severely narrow the list of potential matches - like, from several billion to just a few thousand matching persons - which, given the fact that six billion persons sure as hell aren't using that website, narrows the list of potential matches to just one person. And since Cloudflare is a service which has multiple clients, it means it can effectively track you across websites, which, combined with data exchange between tracking companies, leads to tracking your entire web history if you're not too careful.
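To make the cross-site part concrete, here is a minimal sketch of how any party sitting in front of several websites could stitch visits together once it has a stable fingerprint. Everything in it is an assumption made for illustration: the site names, the fingerprint IDs, the paths, and the `build_histories` helper are made up, and nothing here is taken from Cloudflare's actual systems.

```python
from collections import defaultdict

# Made-up visit records: (site, fingerprint id, requested path).
# In this sketch the same intermediary sees traffic for all three sites.
visits = [
    ("site-a.example", "fp_1f3a", "/forum/thread/17"),
    ("site-b.example", "fp_9c20", "/shop/cart"),
    ("site-b.example", "fp_1f3a", "/profile/settings"),
    ("site-c.example", "fp_1f3a", "/memes/top"),
]

def build_histories(records):
    """Group visits by fingerprint, ignoring which site they came from."""
    histories = defaultdict(list)
    for site, fingerprint, path in records:
        histories[fingerprint].append((site, path))
    return dict(histories)

histories = build_histories(visits)

# One fingerprint now links activity on three unrelated sites.
sites_seen = sorted({site for site, _ in histories["fp_1f3a"]})
print(sites_seen)
```

The point of the sketch is only that the join key is the fingerprint, not a login or a cookie set by any single site.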

They're lying, which means they definitely do not mean anything benevolent

And if they were really tracking the fucking bots, there wouldn't be any real reason for them to lie, hide, and obfuscate both the narrative and the JavaScript they intend to run on your machine. After all, if they do not intend to screw the users of their clients' websites by providing a sneaky tracking mechanism, why not just declare it openly?

"Yes, we use JavaScript fingerprinting, as well as other data-mining techniques, to protect our clients against bad bots."

Is not something you're going to hear when you just go "cut the crap, guys" and ask outright whether they're using tracking techniques to beat the bots or not. There will be all kinds of language maneuvering. They'll say this is "client-side scanning", "checking your browser", "checking if the connection is secure", anything but "give us your JS fingerprint to prove you're not a bot". They'll even lie to you outright until you conduct some basic research and possibly see their actions for what they are: tracking the end-users of their respective websites. And if you do so, they'll be like, "yeah, we do fingerprint ya, but you know, it's not really to track ya, it's to beat the bad bots y'know". And if you're one of those few individuals who didn't stop right there and kept digging and found out that the methods used for "bot protection" are the same methods which are used for tracking and surveillance in general, they'll just tell you that your privacy is a reasonable sacrifice in exchange for their clients' "security against bots" so fuck off already.

If users become well aware of their intentions, their tricky tracking scheme fucking fails

The key to operation of this whole sneak-tracking narrative is ignorance of the majority of users. Long as users don't know about the pervasive tracking they perform, there are no organized counter-measures against such tracking and no organized resistance to surveillance.

For the system to work, the users need either not to know they're tracked, or not to care - and the failure of EARN-IT and similar bills, as well as the invariable losses of internet passport/total surveillance/backdoor initiatives put forth as proposed bills in many countries, shows that the people do care about their privacy. Furthermore, already established acts such as the PATRIOT Act, as well as already established forms of surveillance (e.g. cameras in public spaces), are receiving ever-increasing amounts of flak from the general public, which makes pushing surveillance via law hard, costly, and ever-challenging. Even if they do push one measure, the reaction will be anything but mild, and it won't pass with time.

So the solution is to focus on users not knowing they're being watched. And judging by the fact that I am probably the first person to address this issue in more or less extensive detail, Cloudflare and the likes of it are doing a damn good job at this. Most people I know do not know the words "Cloudflare" or "DataDome" so far, and those who do, barely know anything about them.

If, however, people find out what's going on, the consequences will be grave indeed for the sneaky tracker team. A rough example of what's going to happen is shown by the war on Internet piracy sometime in the 2010s; they wanted the people to buy music CDs and video games, but instead tremendously raised people's awareness of their ruthlessness and disregard for others' privacy rights, and greatly popularized the use of Tor and other privacy tools. Only this time, it'll be far, far greater, since it doesn't exclusively concern warez downloaders. Cloudflare's client websites will be faced with the choice between ditching Cloudflare or admitting they have no respect for their customers' privacy; Cloudflare will be forced to back off or lose a huge share of their profits; and most importantly, the excuse for widespread surveillance in the form of bots will vanish as people actively propose non-intrusive anti-bot measures, and the digital voyeurists will be left with terrorism, crime and scary pedophiles as main reasons to install a spy bot. All of these reasons have ever-waning support from the public, and pushing them more intensely only accelerates the decay of people's trust in government surveillance, which itself was pushed not via force or solid arguments but via deception through selective attention focusing.

How is that bad for you, an ordinary Internet user?

Unless you've got nothing to hide (which isn't the case if you respect your privacy and sovereignty) there are numerous threats coming from Cloudflare.

Canvas fingerprinting will totally identify you

Cloudflare is intended for small websites with an audience of under a million, because bigger sites are expected to almost always be capable of handling both the CDN expenses and the DDoS/other-bad-bot protections by themselves. Hence, they are just not expected to be interested in Cloudflare.

Which means, any website using Cloudflare is expected to have a small audience.

As said before, canvas fingerprinting generates slightly-different images based on parameters such as the installed GPU and drivers for it, installed fonts, font settings etc., which means a lot of potential variations of canvas renderings. More precisely, the amount of potential canvas renderings is the product of the amounts of variations each relevant parameter adds.

For example, let's assume that there are eighty different canvas results produced by all of the GPUs and graphics card drivers in use at a given point of time - so there are already eighty different canvases we can get from changing the GPUs (and their drivers) alone. Then, let's say there are forty more versions of canvas produced by various operating systems and combinations of system settings and installed components. Then, there are roughly 60 ways to render a canvas depending on which browser is being used, as well as what settings are present. The total amount of canvas variations you can get for a particular image is: 80 * 40 * 60 = 192 000.

And if we assume you're one of eight billion Internet users, and that some malicious entity wants to ID you through the use of canvas, then your canvas rendering will narrow the range of potential matches roughly by the factor of 192 000: 8 000 000 000 / 192 000 = 41 666.67. In other words, that would mean you're one of roughly 41 667 people on whatever website has eight billion users.

But you just can't reasonably expect a Cloudflare-hosted website to have eight billion users, for reasons explained above. Let's say you're one of 400 000 users on that website. The canvas fingerprinting is, in that case, able to narrow the potential matches down to 400 000 / 192 000 = 2.08333 users. In other words, one of these two particular persons is you.

Even if we assume that no further precise filtering is happening and that canvas fingerprinting is the only identification tool used, there are only two persons who can be you. For an entity such as an authoritarian state, an invasive ad pusher, or a social engineer seeking to subtly brainwash you into believing something, this precision is more than enough.
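The arithmetic above can be sketched in a few lines of Python. The variation counts (80, 40, 60) and the audience sizes are the same made-up figures as in the text, and the helper names `canvas_variants` and `expected_matches` are invented for this sketch. It also assumes every canvas rendering is equally common, which is the same simplification the example makes.

```python
import math

def canvas_variants(*factors: int) -> int:
    """Multiply the per-parameter variation counts (GPU/driver, OS/settings, browser)."""
    return math.prod(factors)

def expected_matches(audience: int, variants: int) -> float:
    """Average number of users sharing one rendering, assuming a uniform spread."""
    return audience / variants

variants = canvas_variants(80, 40, 60)          # 192000 possible renderings
bits = math.log2(variants)                      # same figure expressed as bits of identifying info
whole_internet = expected_matches(8_000_000_000, variants)
small_site = expected_matches(400_000, variants)

print(variants)
print(round(bits, 2))
print(round(whole_internet, 2))
print(round(small_site, 2))
```

Swap in your own guesses for the three factors and the audience size to see how quickly the pool of matching users shrinks.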

The information you give to one web voyeurist, is available for all of them

There are several things web spies are doing with that information they gather from their users.

They hunt for your personal data so they can either sell it to someone, or use it to manipulate or harm you in future, or study you against your best interests so as to use mechanisms designed to change you in a way you do not even imagine to be possible. So Cloudflare are either building a Doomsday Machine which would somehow bring a massive calamity using all of that information they're gathering, or they're basically trading your identities with other networks. Or they do both.

Another nasty thing about Cloudflare is that it's present on many, many websites, gathering info from each and every Cloudflare-hosted website - which allows them to effectively track you across a large portion of the web. On their own. And if we assume that they do trade data with the others, which they likely do because the vast information trade improves the awareness of all web stalkers, benefitting all of them - then we can effectively assume that whatever Cloudflare can know about you is known to nearly the entire web.

Even if the website you're visiting does not track you by itself, if it's Cloudflare-hosted, it's a gigantic threat to your privacy. All Cloudflare needs is to correlate your activity on that website with your identity as stored in Cloudflare databases - and your private experience with that website is done for. Many Cloudflare-hosted websites, such as the Minds.com "alternative" social network, are even cooperating with Cloudflare, letting them know which account belongs to whom.

Cloudflare knows everything you're doing on a website hosted by them

Even if neither Cloudflare nor the website records your session for replay or uses any JavaScript which tracks your movements, do not forget that a lot of the website's content is still stored on Cloudflare's servers. For example, Minds stores all images on Cloudflare; and each time you're browsing memes or whatever, your browser requests those images from Cloudflare servers - letting Cloudflare know exactly which content you're into, long as they exchange data with Minds' staff (and given just how hard the Minds admins were trying to push user identification by the end of 2020, even proposing something as outrageous as providing a driver license scan merely to be able to use the website, they surely do cooperate with CF this way).

Even in cases where a website owner does not have insider agreements regarding user tracking, Cloudflare still knows everything you do on the website - long as this content is stored on their servers. Long as your browser needs to send requests to Cloudflare to receive or send content, Cloudflare can easily monitor the request history - and add it to your possibly-vast web history compiled by Cloudflare both from your actions on their servers, and from the data they've bought from other web tracking sources.

Realistically, you don't even know what the hell Cloudflare tries to run on your system

They heavily obfuscate their JavaScript - meaning, they try to make it as hard to read as possible, so someone wanting to know exactly what the hell it is doing on the user's machine will have trouble deciphering it.

Yes, someone has partially de-obfuscated the script and told us what they are doing. Partially. Which means there might still be bits and pieces of potentially-malicious code you know nothing about. For example, Cloudflare doesn't want you to know where they store that evercookie they're placing on your browser, you know, that evercookie which is associated with your entire browsing history known to CF, so they heavily obfuscate that part. Or that they're launching something like DrawnApart on your system and don't want you to be able to defend against this.

In any case, though, the very fact that they do obfuscate their code already means that they're up to no good. They may try and justify this with some BS like "the bot-makers will simply by-pass our defence mechanisms if they know what we do on your machine", but there's still one fuckin' way to avoid that: CAPTCHA. The one mechanism that does protect against bots of dubious purpose while letting the "good" ones like the search engines access the site.

Of course, they know that. And they know that CAPTCHA by itself is no good for user tracking. That's why they opt for "proof-of-work tests" which then turn into canvas fingerprinting "to discriminate between device classes" and then, when the users figure out it's not just to determine their device class - into whatever sounds like a good enough excuse.

And outside of those deobfuscated parts, you don't even know what the hell they do on your machine. Maybe this canvas fingerprinting is a part of some more complicated mechanism? Maybe they're launching a DrawnApart-like abstract function which by itself doesn't look suspicious and therefore might still be undetected.

Generic advice for "ordinary" web users

Remember, there is nothing more efficient at detecting bots than a Reverse Turing Test known as CAPTCHA. If a website is truly afraid of bots, it can afford to put up a bot test, for fuck's sake. Anything that claims to be a sort-of autonomous check is BS, and is highly likely an attempt to compromise your privacy.
Cloudflare cannot stop a DDoS attack or any bad bots because it mostly allows non-anonymized connections through. A good example is Minds.com, a social network which uses Cloudflare in "I'm under attack" mode constantly and yet is full of spam/ad bots and token farmers. If you're running a website, CF can help you track your users, but not stop a bot attack.
Any website that puts up impediments for Tor users wants to know you better. Against your best interests. They may say it's for security or some shit. They always come up with that when they need to compromise you in any way, though.
Internet ads are no longer merely annoying banners popping up and pissing you off. Modern web ads use extensive metrics which track your web behavior as well as lots of fingerprinting techniques to provide "personalized ads" - and to store your data for other purposes. A website begging you to allow ads to fund its existence might not be an evil asshole who wants your private data up for sale, but it surely assists in selling your data.
Whenever a website asks you to pause and wait "just a moment" - close the fucking webpage ASAP and put it into a special Sneaky Tracking Websites list. So you would never, ever deal with it again. It is highly unlikely that they're the only website hosting some useful-for-you information.

A few words about the "proof-of-work CAPTCHAs" as a form of closing this article

In the past, the bots were challenged with a so-called "Reverse Turing Test" - a challenge intended for a sapient being (or at least a semi-sapient human) to solve. Kinda like clicking on all pictures which contain a bus until ~~Google completes the event tracking and records your behavior as part of your fingerprint~~ the system decides you're not a bot. Sometimes, the bots could be trained to solve CAPTCHAs automatically. Other times, they could not.

Nowadays, however, the Internet is full of bots - and, most importantly, a lot of those bots are being used by the websites themselves, such as search crawlers, archiving bots etc., and yes, all sorts of web apps which integrate with one another. So in short, there is no longer a need to fight the bots, because the Internet uses the bots on a daily basis.

Sure, there is still a need to fight off the internet vandals of all sorts. The narrative goes like this: you have legitimate users and you have the bad bots. So in order to halt the suspicious users, you present them with a supposedly tough JavaScript challenge which is expected to take away a few seconds of your CPU processing time before you can proceed. (Why not just block the suspicious address for precisely those few seconds so they'll need to refresh the page to continue?) Ordinary users would need to wait a few seconds, while a malicious actor sending thousands of requests from their machine to your server in order to DDoS you (barely ever observed in reality, as most if not all DDoSers use botnets) will supposedly get halted as they'll need to complete those "PoW CAPTCHAs" every time they make a request from their machine. (Again, they'll just use a botnet.)

So basically, PoW CAPTCHAs are a shit mechanism which can stop only the never-really-seen kinds of attackers which, for some reason, use only one machine to launch tons of attacks on a server. And given that the nature of DDoS attacks is distributed - it's in the word, DDoS stands for "Distributed Denial of Service" which means, you use numerous machines to bombard a server with requests - well, the idea itself looks and sounds like shit.

That is, unless you consider that the PoW CAPTCHAs aren't there to stop bots. Or internet vandals.

They basically are a way to stop the user from reaching a website until they complete a "challenge". And one interesting thing about these challenges is that they are surprisingly good at stopping users with privacy-friendly setups, which just doesn't make sense if we're just running a script which is supposed to take some of your CPU time but not obtain a valid fingerprint of your machine. Kinda hints at the true purpose of these "challenges", since they also suck at stopping spammers and other bad actors and nobody seems to mind enough.

So what they're doing is installing a sort-of bouncer on all websites which just kicks you out unless you give him your ~~papers~~ fingerprints. As for why you would need to be concerned, that's another topic and, honestly, if you don't care about whether you're being tracked across the web or not, then why are you still reading? Just submit them all your data and get the heck out of here.