What is a browser fingerprint?
You won't be surprised by this, but a browser fingerprint is pretty much like a human fingerprint. Where a real fingerprint relies on the unique patterns found within the ridges of the finger, the browser's fingerprint is formed by the unique characteristics of the browser and device.
Just like on a real fingerprint, each characteristic on its own is not special, but a combination of 10 or 20 things together is unique enough to be able to say "this is probably the same browser".
Here's a very quick and basic example. Let's take the following datapoints:
- Browser: Chrome 92.0.4515.107
- Screen resolution: 1024×768
- Fonts installed: SF Display, Source Sans Pro
- Timezone: Europe/Zurich
- Language: it-CH
A screen resolution of 1024×768 is quite common, and 0.7 million Swiss people speak Italian, so those facts on their own aren't useful.
But the combination of all these attributes together may only be seen in 0.1% of browsers (1 in 1,000), and that would be enough to allow us to conclude that if we see that combination a few times, it's probably the same browser.
(Note: This is a simplified example. In real life, this isn't enough data to make a good fingerprint.)
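To put a number on "unique enough", the rarity of a combination can be expressed as bits of identifying information. Here's a minimal Python sketch using the 1 in 1,000 figure from the example above. The function name is our own, and the figure is illustrative rather than measured.

```python
import math

def identifying_bits(share_of_browsers: float) -> float:
    """Bits of identifying information carried by a trait (or combination
    of traits) that is seen in the given share of browsers."""
    if not 0 < share_of_browsers <= 1:
        raise ValueError("share_of_browsers must be in (0, 1]")
    return -math.log2(share_of_browsers)

# The combination in the example is seen in 0.1% of browsers (1 in 1,000).
bits = identifying_bits(0.001)
print(f"{bits:.2f} bits")  # 9.97 bits
```

Every extra bit halves the pool of browsers that match, which is why real fingerprints combine many more datapoints than the five shown here.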
What are they useful for?
These days, fingerprinting is most useful for the security and protection of websites, apps, and content.
For example, imagine a site offers a free trial and someone wants to keep using free trials instead of paying. This is not a difficult situation to imagine!
That user will use a different email and name every time, and free email addresses are easy to create. They may also use VPNs, go incognito, or clear their browser cookies. Unless you use the browser fingerprint to determine if you've seen this person before, they'll look like a new trial.
Since none of these things matter to the browser fingerprint, sites that limit free trials by fingerprint are able to stop this behaviour (or at least make it prohibitively difficult).
Fingerprints can also be used in click-fraud detection, in order to spot repeat clicks secretly coming from the same device, or clicks with high-risk characteristics.
In the next section, you can learn about historic fingerprinting use cases, and how it used to be possible to track the movement of the same browsers across websites. However sketchy this is, some advertising platforms used this technique for collecting data and targeting ads across the sites in their networks.
We've come a long way in the last 5-10 years, and nowadays this is no longer viable due to improvements in web browsers.
Brief history (how browsers fought back)
Of all the aspects of the internet that have been evolving over the last few years, privacy has to be top of the list. Powerful legislation has come into force, and browsers have become wiser about how to protect their users' privacy.
We'll take a brief look at browser fingerprinting then and now. It will help you to see where we are today.
The early days (the wild west)
Let's take ourselves back to 2010. Apple launched the first iPad. Internet Explorer was still the dominant web browser (soon to be overtaken by Chrome), and Bitcoin had only just got started a year before.
Remember BlackBerry? They had a 43% market share in 2010 but were in for a sharp decline from there. BBM (BlackBerry Messenger) was big.
Back in 2010, the Electronic Frontier Foundation (EFF) created a website that allowed visitors to test their own browser fingerprint. After collecting 470,161 fingerprints, they concluded there was enough entropy (uniqueness) in the average fingerprint to use as an identifier.
By 2012, the concept of canvas fingerprinting was gaining attention, after researchers at the University of California, San Diego exposed it in their paper "Pixel Perfect: Fingerprinting Canvas in HTML5".
Canvas fingerprinting is a method where the browser is asked to draw shapes or words (usually hidden from the browser window). The result differs subtly but distinctly between browsers and devices. These graphics can then be made into a 'hash' (a unique ID) that can be compared against future visits.
In 2014, social bookmarking company AddThis caused a controversy when it was found they were testing canvas fingerprinting as a replacement for cookies, all but unknown to their users and partners.
That same year, it was found that at least 12 high-profile web advertising and user-tracking companies were using fingerprinting to track users across websites. Their code could be found on thousands of popular websites.
Since 2014, other browser fingerprinting methods have been exposed and incorporated into fingerprinting technology, including methods that rely on:
- Internal IP address leaks (WebRTC leak)
- The unique way the browser establishes a secure connection to a server (JA3/Akamai hashes)
- Presence of plugins and fonts
- Techniques that could expose other sites a browser had visited (CSS fingerprinting)
- Round trip time (RTT) to different servers
Today's landscape and modern browsers
Since around 2015, Firefox has been pioneering browser advancements that aim to plug the holes in the data exposed to the outside world that allow fingerprinting to happen.
Chrome and other browsers have followed Firefox's lead and slowly started to tighten up.
Apple has also been leading the way with changes to Safari that attempt to present a more neutral outside view of the system configuration, in order to make tracking harder.
We've also seen new privacy-first browsers such as Brave emerge.
These advances have really come about from a much greater public concern over privacy, which has led to legislative changes too.
For example, the EU's ePrivacy Directive seeks to regulate the processing of data and protect privacy.
Impact on cross-site and cross-browser tracking
Whilst it continues to be possible to use fingerprinting for legitimate security purposes within a single website, the unique browser features that enabled effective cross-site tracking to work are no longer available.
In other words, the effectiveness of browser fingerprinting for purposes such as ad targeting is massively diminished. This use case requires very high certainty, and the tracking has to be effective across many websites and over a very long period of time.
This is not to say that fingerprinting is no longer used in advertising, just that it's no longer the precise (and opaque) tracking mechanism it once was.
Along with the continuing phasing out of third-party cookies, cross-site tracking is gradually being phased out of the web ecosystem.
The future of fingerprinting
Browser fingerprinting still has viable and legitimate uses. Read on to find out what modern fingerprinting looks like and how it can be used.
Under the hood - how it works
So as we've already explored in the introduction, a browser fingerprint is made up of data about the browser, device, and sometimes the connection, that is combined in order to generate a unique and stable ID for the visitor.
In the next section we'll look at the individual elements that go together to make up the fingerprint, but here we'll explore the high level concepts.
The fingerprinting process pretty much works like this:
- The fingerprint process starts when someone visits a site, usually as soon as the page loads up.
- A small JavaScript application runs, often called an 'agent' or 'collector' script, that gathers data.
- Some data is gathered using standard JavaScript (JS) APIs (i.e. the script just asks for the data), and other data needs to be gathered through tests (e.g. for canvas fingerprinting or font discovery).
- If the site is using simpler client-side fingerprinting, a unique ID is immediately created by 'hashing' all the collected data into an ID that is usually 32 characters in length. If the same data is seen again, the ID will be exactly the same.
- Services like Hitprobe go one step further: they send this data to the server and generate the fingerprint there, adding data that is only available server-side.
- The site can repeat this process for each visit, sign-up, request, etc. and check to see if the fingerprint has been seen before.
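The hashing and "seen before" steps above can be sketched in a few lines of Python. This is only an illustration: the field names are made up, the datapoints are assumed to have already been collected by the script in the browser, and MD5 is used simply because its hexadecimal output happens to be 32 characters long.

```python
import hashlib
import json

def fingerprint_id(datapoints: dict) -> str:
    """Hash collected datapoints into a 32-character hexadecimal ID."""
    # Serialise with sorted keys so the same data always gives the same string.
    canonical = json.dumps(datapoints, sort_keys=True, separators=(",", ":"))
    # MD5 is chosen for its 32-character digest, not for security.
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()

seen_fingerprints = set()  # in practice this would live in a database

def seen_before(datapoints: dict) -> bool:
    """Return True if this fingerprint was already recorded, else record it."""
    fp = fingerprint_id(datapoints)
    if fp in seen_fingerprints:
        return True
    seen_fingerprints.add(fp)
    return False

# Hypothetical datapoints, matching the example at the start of this guide.
visit = {
    "browser": "Chrome 92.0.4515.107",
    "screen": "1024x768",
    "fonts": ["SF Display", "Source Sans Pro"],
    "timezone": "Europe/Zurich",
    "language": "it-CH",
}
print(seen_before(visit))  # False: first time this combination is seen
print(seen_before(visit))  # True: identical data gives an identical ID
```

Note that changing any single datapoint produces a completely different ID, which is why the stability of the chosen datapoints matters so much.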
Client-side only vs. server enhanced fingerprints
There is enough data on the client-side alone to generate a fingerprint, but depending on the browser, many other users may share the same ID. In other words, there is not usually enough entropy to make the fingerprint reliably unique.
The server has access to data the browser alone doesn't. For example, the server can recognise the distinctive way different browsers open a secure (HTTPS) connection. The server also knows details about the user's internet connection.
By combining signals generated in the browser with the data gathered on the server, a server-side fingerprint can be a lot more stable and unique than a browser-only fingerprint.
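As a rough illustration of why the extra server-side data helps, here's a Python sketch in which two browsers report identical client-side signals but are told apart by what the server observes. The signal names and values are placeholders, and this is not a description of how any particular service builds its fingerprints.

```python
import hashlib
import json

def server_enhanced_id(client_signals: dict, server_signals: dict) -> str:
    """Hash browser-reported and server-observed signals into one ID."""
    combined = {"client": client_signals, "server": server_signals}
    canonical = json.dumps(combined, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Two visitors whose browsers report exactly the same client-side data.
client = {"screen": "1024x768", "timezone": "Europe/Zurich", "language": "it-CH"}

# Placeholder values standing in for what the server sees on each connection.
visitor_a = server_enhanced_id(client, {"tls_hash": "tls-profile-a"})
visitor_b = server_enhanced_id(client, {"tls_hash": "tls-profile-b"})

print(visitor_a == visitor_b)  # False: the server-side data separates them
```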
Downsides to browser fingerprinting
When considering fingerprinting, it's important to carefully weigh up the pros and cons, as there are some notable difficulties:
- Upgrades of the browser or OS can cause the fingerprint to change, although the impact of this will depend on the fingerprint service/method you choose.
- A fingerprint is unique to a browser, so someone can install multiple browsers to appear as a new visitor each time.
- It relies on JavaScript being available. However, because the vast majority of users have JS enabled (and the web doesn't really work without it anymore), a visitor with JavaScript disabled can itself be a reason to take a closer look.
- Since data is collected in the browser, and anyone can manipulate this data if they know how, fingerprints are never bulletproof.
But even given the above considerations, there is no better alternative if you need to detect uniqueness where the user is deliberately cutting off more conventional methods such as first-party cookies.
Anti-fingerprinting methods
Some modern browsers incorporate anti-fingerprinting measures. These either seek to block scripts used to fingerprint, falsify the information provided by JavaScript's APIs (for example reporting a fixed or approximate screen size), or inject randomness into graphics or audio to hinder methods such as canvas fingerprinting.
The Tor Project has created a browser and network that together defend against the collection of data within the browser and route traffic through volunteer-run servers known as Tor relays.
What data is used?
Here's a run-down of the most commonly used datapoints within fingerprinting.
Plugins
Plugins (or extensions, browser apps, or toolbars) are a way for users to extend their browser functionality by installing small apps from third-party developers directly from a store made available by the browser maker.
There are some popular plugins that are commonly found to be installed (for example, ad blockers). Some plugins are installed by default, for example, PDF viewers.
The plugin datapoint for fingerprinting relies on detecting which of a set of popular plugins are installed.
As an example, say we check a list of 5 plugins, and plugins 1, 2 and 4 are found to be installed. A simple representation may be 11010 (1 meaning installed, and 0 not installed).
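The 11010 representation can be produced with a one-line function. In this Python sketch the plugin names are placeholders, and the set of installed plugins is assumed to have been detected already.

```python
def plugin_bitstring(checklist, installed):
    """Return one character per checked plugin: '1' if installed, else '0'."""
    return "".join("1" if name in installed else "0" for name in checklist)

# Placeholder names for the 5 plugins being checked, in a fixed order.
checklist = ["plugin 1", "plugin 2", "plugin 3", "plugin 4", "plugin 5"]
installed = {"plugin 1", "plugin 2", "plugin 4"}

print(plugin_bitstring(checklist, installed))  # 11010
```

The order of the checklist has to stay fixed between visits, otherwise the same set of plugins would produce a different string.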
Fonts
In a similar way to the plugins method above, font fingerprinting determines which of a set of fonts are installed.
Because of the privacy impact, browsers don't simply supply a list of installed fonts. Instead, the installed fonts need to be inferred by rendering some text in a specific font, measuring the dimensions of the characters, and comparing them against the same text rendered in a fallback font.
But by using this slightly complex but generally reliable technique, it's possible to get a yes or no as to whether each font is present.
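The logic of that comparison can be sketched in Python. The `measure` callback here is hypothetical: it stands in for whatever renders a fixed test string in the browser and reports its width and height. A fake renderer is used below so that the example runs on its own.

```python
from typing import Callable, Iterable, Tuple

Size = Tuple[float, float]  # (width, height) of the rendered test string

def detect_fonts(candidates: Iterable[str],
                 measure: Callable[[str], Size],
                 fallback: str = "monospace") -> dict:
    """Infer which candidate fonts are installed.

    measure(font_stack) is a hypothetical callback that renders a fixed test
    string using the given font stack and returns its size. If a candidate
    font is missing, the renderer uses the fallback, so the size is unchanged.
    """
    baseline = measure(fallback)
    return {font: measure(f"'{font}', {fallback}") != baseline
            for font in candidates}

# Stand-in renderer for demonstration only: pretends one font is installed.
def fake_measure(font_stack: str) -> Size:
    if font_stack.startswith("'Source Sans Pro'"):
        return (412.0, 18.0)
    return (433.0, 17.0)  # size of the fallback font

print(detect_fonts(["Source Sans Pro", "Comic Sans MS"], fake_measure))
# {'Source Sans Pro': True, 'Comic Sans MS': False}
```

A font that happens to have the same dimensions as the fallback would be missed by this sketch, so it helps to compare against more than one fallback font.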
Browser settings (language, timezone)
There is a whole range of JavaScript APIs available to ask the browser about things such as language, timezone, etc.
These APIs exist because web apps need this information to function properly. For example, what timezone to show dates in, what language to use, etc.
Device attributes (screen size, OS, browser version)
The browser also reports a lot of other information about the device or browser directly. This includes the screen resolution, operating system, browser version, etc.
Canvas or WebGL hashes
Canvas fingerprinting is one of the most complex fingerprinting datapoints. This type of hash can be produced from either JavaScript's Canvas or WebGL APIs.
In layman's terms, the browser is asked to 'draw' text, shapes, fills, etc. using the 'canvas' API. This API outputs whatever it's asked to draw onto an HTML <canvas> element.
Normally, the <canvas> is visible and the shapes and text should mean something to the user. But since this graphic is only to be used to calculate a unique ID, the canvas is hidden. I.e. the browser draws it, but you can't actually see it.
Once the graphics are drawn, the browser is asked to convert the pixels into a 'hash', which is a unique ID. This ID will always be the same if the graphic is the same. But since the graphic will differ very slightly depending on the browser, graphics card, OS, CPU, etc., browsers tend to produce a unique and reproducible ID from this method.
It does normally take 100-200 milliseconds to draw the graphic and convert to an ID, but while this is a long time for a computer, it seems near enough instant to a human.
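The "pixels to hash" step can be illustrated in Python. In a real browser the bytes would come from the canvas element's pixel data. Here two tiny stand-in images are used instead, differing by a single colour value, to mimic the slight rendering differences between two machines.

```python
import hashlib

def canvas_hash(pixels: bytes) -> str:
    """Hash raw pixel bytes (for example RGBA values) into a hexadecimal ID."""
    return hashlib.sha256(pixels).hexdigest()

# Two stand-in 2x1 RGBA images that differ by one colour value.
machine_a = bytes([255, 0, 0, 255, 128, 128, 128, 255])
machine_b = bytes([255, 0, 0, 255, 128, 128, 129, 255])

print(canvas_hash(machine_a) == canvas_hash(machine_a))  # True: same graphic, same ID
print(canvas_hash(machine_a) == canvas_hash(machine_b))  # False: tiny difference, new ID
```

This is also why injecting a little randomness into the drawn graphic is an effective defence: any change to the pixels, however small, changes the whole ID.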
Audio hashes
Audio hashes are practically the same as canvas fingerprints, but rely on producing and encoding audio instead of graphics.
HTTP headers and TLS characteristics
When the browser makes a request to the server, it sends some properties that differ from browser to browser.
This includes HTTP headers. The User-Agent header, for example, describes the browser, version, and operating system that the request is coming from. Although very easily spoofed, it does differ from browser to browser and can be used in fingerprinting.
Another source of fingerprint data is the handshake that is done when a secure connection is established between a web browser and the server. This is called a TLS handshake. During the handshake, the browser and server must agree on the cryptographic algorithms to use. Part of this is declaring the algorithms that the browser supports. The available algorithms and the order they're presented, along with related settings, can form a reliable fingerprint.
There are several implementations of this method, including JA3 (first created by Salesforce) and Akamai's hash.
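To make this more concrete, here's a Python sketch of how a JA3 hash is put together. JA3 joins five fields from the browser's opening handshake message (TLS version, cipher suites, extensions, elliptic curves, and elliptic curve point formats) as decimal numbers, then takes the MD5 hash of the resulting string. The values below are illustrative and were not captured from a real browser. Reading them out of the handshake is outside the scope of this sketch.

```python
import hashlib

def ja3_string(tls_version, ciphers, extensions, curves, point_formats):
    """Build the JA3 input string: fields joined by commas,
    values within a field joined by dashes."""
    fields = [
        str(tls_version),
        "-".join(map(str, ciphers)),
        "-".join(map(str, extensions)),
        "-".join(map(str, curves)),
        "-".join(map(str, point_formats)),
    ]
    return ",".join(fields)

def ja3_hash(ja3: str) -> str:
    """MD5 of the JA3 string, as a 32-character hexadecimal ID."""
    return hashlib.md5(ja3.encode("ascii")).hexdigest()

# Illustrative handshake values (assumptions, not a real browser profile).
ja3 = ja3_string(771, [4865, 4866, 4867], [0, 23, 65281, 10, 11], [29, 23, 24], [0])
print(ja3)  # 771,4865-4866-4867,0-23-65281-10-11,29-23-24,0
print(ja3_hash(ja3))
```

Because the order of the values is preserved, two browsers that support the same algorithms but list them differently end up with different hashes.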
Network
The network (i.e. the user's IP address) can itself form part of a fingerprint. By combining the IP address and the browser collected data, a more distinctive hash can be generated.
However, it's easy for the user to change their IP address by using a VPN for example, and ISPs often assign dynamic IPs that change from time to time. So the IP alone can't be used to calculate a permanent fingerprint.
Is it legal to use browser fingerprinting?
Fingerprints and tracking legislation (cookie law, ePrivacy Directive)
Various jurisdictions have introduced cookie or tracking legislation. These include:
- European Union's 🇪🇺 ePrivacy Directive
- California's 🇺🇸 Consumer Privacy Act (CCPA) and Privacy Rights Act (CPRA)
- Virginia's 🇺🇸 Consumer Data Protection Act (VCDPA)
- Connecticut's 🇺🇸 Data Privacy Act (CTDPA)
- Canada's 🇨🇦 Personal Information Protection and Electronic Documents Act (PIPEDA)
- Australia's 🇦🇺 Privacy Principles (APPs)
- And others
These regulations generally prohibit tracking without explicit consent. Many have exemptions for necessary tracking, and some have exemptions for tracking on the basis of security, protection, etc.
You should consult your local legislation in the country you operate from, to understand the implications for you.
Fingerprints and personal data (GDPR)
Aside from legislation covering the need to obtain consent to track a visit, privacy legislation usually dictates the permission you need to obtain, store, or process personally identifiable information (PII).
Generally fingerprinting doesn't rely on PII, but an IP address can be considered to be PII in certain circumstances. So you may need to declare your use of fingerprinting if you store an IP address against other data that may allow the user to be identified if cross-referenced with other information.
Use cases for browser fingerprints
Click fraud prevention
For businesses running paid marketing campaigns, it's important to understand the quality of the traffic that is being sent. Click fraud prevention software helps marketers to weed out poor quality clicks, such as those from click farms, competitors, people not in the business's country, or accidental clicks.
Fingerprinting is used under the hood within modern click fraud prevention platforms to link the same user to different sessions, even if they try to appear distinct by enabling a VPN or going incognito.
It's now much harder to identify a visit only from an IP address, so it's more important than ever to choose a good marketing fraud solution that incorporates browser fingerprints. Without one, you'll be missing fraud that is hard to detect any other way.
As well as helping to identify wasted spend on PPC platforms like Google Ads and Meta Ads, some click fraud tools will also monitor wider paid marketing traffic, for example referrals, or partner lead generation.
Identifying repeat visits from anonymous users
Sometimes websites need to allow anonymous users (i.e. those who haven't yet signed up or logged in), to enjoy a limited amount of free content.
Think about news websites that offer a handful of articles free each month. Or social networks that ask a user to sign up after scrolling a few pages down in a feed.
If it were as easy as switching to incognito mode to bypass this, it wouldn't be much of an incentive to sign up. So these websites tend to use fingerprinting techniques to identify visits from the same anonymous user and enforce these limits.
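A metered limit like this can be sketched in Python by counting views per fingerprint instead of per cookie. The limit of 3 articles and the fingerprint values are made up for the example, and a real site would keep the counts in a database rather than in memory.

```python
from collections import Counter

FREE_ARTICLES_PER_MONTH = 3  # hypothetical limit

views = Counter()  # (fingerprint, "YYYY-MM") -> articles read so far

def may_read(fingerprint: str, month: str) -> bool:
    """Allow the article if this fingerprint is still under the monthly limit."""
    key = (fingerprint, month)
    if views[key] >= FREE_ARTICLES_PER_MONTH:
        return False
    views[key] += 1
    return True

# The same anonymous visitor opens four articles in one month.
results = [may_read("fp-abc", "2024-05") for _ in range(4)]
print(results)  # [True, True, True, False]
```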
Preventing account takeovers
Account takeover is where a fraudster or scammer gains access to a customer's account (usually a bank account, email account, or some other financial service), often by tricking the customer into handing over access.
Fingerprints can be useful to detect this situation because the customer will often log in with the same device every time, and a change in fingerprint can cause extra security measures to kick in.
This means that the customer does not need to go through 2FA (two-factor authentication) every time, but when it may matter, the fingerprint-triggered 2FA can protect the customer from account takeover.
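The decision itself is simple, as this Python sketch shows. The account and fingerprint values are placeholders, and the in-memory dictionary stands in for wherever trusted devices would really be stored.

```python
known_devices = {}  # account ID -> set of fingerprints seen on trusted logins

def requires_2fa(account_id: str, fingerprint: str) -> bool:
    """Ask for 2FA when the fingerprint is not one we trust for this account."""
    return fingerprint not in known_devices.get(account_id, set())

def remember_device(account_id: str, fingerprint: str) -> None:
    """Record a fingerprint as trusted. Call this only after 2FA succeeds."""
    known_devices.setdefault(account_id, set()).add(fingerprint)

print(requires_2fa("alice", "fp-laptop"))   # True: first login from this device
remember_device("alice", "fp-laptop")
print(requires_2fa("alice", "fp-laptop"))   # False: recognised device
print(requires_2fa("alice", "fp-unknown"))  # True: fingerprint has changed
```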
Tailoring content / personalization
A less common use case, because cookies can already achieve this more transparently and reliably, is personalisation.
An anonymous user's settings can be tied to their fingerprint, so when they are seen again, their preferences can be recalled.
Recognising connections between different users
Fingerprints can be used to spot where many users are logging in with the same username/password credentials.
This is known as credential sharing. Depending on the application and circumstances, you may choose to use this information only to guide product development or marketing.
If the app is strictly intended to be used by only one person per credential, you could choose to block access or reach out to the customer where this kind of activity is detected.
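Spotting this comes down to counting the distinct fingerprints seen for each account. In this Python sketch the threshold of 3 devices, the account names, and the login records are all invented for illustration.

```python
from collections import defaultdict

def shared_accounts(logins, max_devices=3):
    """Return accounts seen with more distinct fingerprints than max_devices."""
    devices = defaultdict(set)
    for account_id, fingerprint in logins:
        devices[account_id].add(fingerprint)
    return {acc: len(fps) for acc, fps in devices.items() if len(fps) > max_devices}

# One account used from five different browsers, another from just one.
logins = [("team@example.com", f"fp-{i}") for i in range(5)] + [
    ("solo@example.com", "fp-a"),
    ("solo@example.com", "fp-a"),
]
print(shared_accounts(logins))  # {'team@example.com': 5}
```

A sensible threshold allows for one person legitimately using a few devices, such as a phone, a laptop, and a work computer.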
Glossary of technical terms
Here are a few important technical terms used in this guide and a short explanation.
Hash
A cryptographic method to produce a fixed length ID (often 32 characters) from a set of data. The same data always results in the same hash, but the hash can't be converted back to the data.
Entropy
In basic terms, entropy is a measure of how surprising a set of data is. The higher the entropy, the more unpredictable or uncertain the information is. Low entropy indicates the data is more predictable.
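As a small worked example, here's a Python sketch that computes the entropy (in bits) of a list of observed values. The language values are made up. A datapoint that is the same for everyone carries no entropy, while one that is spread evenly over four values carries 2 bits.

```python
import math
from collections import Counter

def shannon_entropy(values) -> float:
    """Shannon entropy, in bits, of a list of observed values."""
    counts = Counter(values)
    total = sum(counts.values())
    return sum((n / total) * math.log2(total / n) for n in counts.values())

print(shannon_entropy(["en-US"] * 8))                         # 0.0
print(shannon_entropy(["en-US", "it-CH", "de-CH", "fr-CH"]))  # 2.0
```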
JavaScript
Often abbreviated to JS, JavaScript is a programming language supported by all browsers and used to create apps that run in the web browser.
HTTP
The browser needs to communicate with the server, and HTTP is the protocol used to 'talk' to the web server. When a browser loads a webpage, it fetches the page over HTTP.
Cookies
Tiny text documents that are saved in the browser and are sent back to the website on the next and subsequent visits. They're usually used to store settings and authentication state.
Anonymous visitor
Website visitors that are not logged in are known as anonymous visitors. They don't have a natural ID. Once they sign up or log in, they can then be identified by their email address, or the ID assigned by the application.
Click fraud
Fraudulent or low-quality clicks on paid ads, such as those from click farms or competitors. Click fraud detection software links multiple clicks together from the same device, sometimes using the browser fingerprinting techniques described here.
Hitprobe's browser fingerprinting (and more)
We hope you've enjoyed reading this complete guide to browser fingerprints.
We're Hitprobe, and our website protection platform offers stable server-side enhanced device fingerprints.
If you're ready to start using browser fingerprints to secure your website or app, Hitprobe is fast to implement, affordable, and there's a free forever plan to get you started right now.
Learn about Hitprobe's browser fingerprint solution and create your free account today.