Increased browser privacy without the breakage

First published: August 18, 2013
Last updated: September 22, 2013

Introduction

This article assumes that you are interested in surfing the web in a way which is more private than the modern default (which is to participate in a surveillance state where you "visit a website and it will almost certainly know who you are"). I'm not going to try to convince you why you should be interested in this if you aren't already. You can look elsewhere for that.

There is a lot of material about this subject on the web, but much of it includes advice which can go a long way toward ruining your browsing experience. The web was not built with privacy in mind, and if you want to obtain some you often have to subvert browser features or behaviours which web developers rely on. This can lead to websites which don't work properly or at all, or which work but don't look right (sometimes just ugly, sometimes so messed up you can't even read parts of the text). This article is going to show you ways you can increase your browsing privacy with little or no breakage. For example, I don't discuss NoScript or NotScript here, because they just break the modern web outright. Proponents of these extensions will tell you that it's not so bad and that you can quickly build up whitelists which make the web usable, but in my experience they are dead wrong. Even if they're not dead wrong yet, they will be in a few years' time. Influential web developers are explicitly leading the charge for a web where Javascript is absolutely essential and unavoidable. I'm not happy about that at all, but I'm not going to pretend it won't happen.

Most of the techniques discussed here should have relatively little impact on your day-to-day browsing, so you shouldn't feel like your browser has been severely crippled by them. I have tried to order them roughly so that the things which give you a lot of benefit for very little effort/cost come before the things which don't add much protection or take more effort to set up. If you start at the top and move down through the suggestions one at a time until you find a few things in a row which you can't be bothered with, then you should have privacy that is about as good as you can get it without more effort/learning/money/whatever.

The tactics discussed here will not grant you perfect, unconditional privacy or anonymity. They should go a long way toward protecting you from the ubiquitous tracking by advertising agencies, and dragnet surveillance at the ISP level. However, if you are violating federal laws or trying to subvert an oppressive regime and you have government agents with professional training and large budgets explicitly trying to track you in particular, the advice here is absolutely not going to cut it.

Corrections, additions and any other feedback are welcome via email.

Disable 3rd party cookies

Cookies are small data files that web pages can ask your browser to store on your hard drive, and which your browser will send back to the web server on subsequent requests. This allows sites to recognise your browser as your browser each time you make a request. This is essential to facilitate things like logging into a website, but cookies are also problematic for privacy. However, disabling them outright will utterly break large sections of the web, so that's not an option. An excellent compromise is to disable only "third party cookies". At the time of writing, this is actually default behaviour for Safari, and the Mozilla project is planning to make it the default behaviour for Firefox in the near future. This strongly suggests that disabling 3rd party cookies can be safely expected not to break anything.

To disable 3rd party cookies in Firefox, get to the Preferences window, select the "Privacy" tab and under the "History" section, select "Use custom settings for history". This will uncover an option for "Accept third-party cookies", which you should set to "Never".
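To make "third party" concrete, here is a minimal Python sketch of the decision a browser makes with this setting. It assumes, loudly, that a "site" is just the last two labels of the hostname; real browsers consult the Public Suffix List, so this toy version gets domains like example.co.uk wrong. The hostnames are made-up examples.

```python
# ASSUMPTION: naive "site" = last two hostname labels. Real browsers use the
# Public Suffix List instead; this is only an illustration of the idea.

def site_of(hostname: str) -> str:
    """Return a naive registrable domain, e.g. 'static.site1.com' -> 'site1.com'."""
    labels = hostname.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def is_third_party(page_host: str, request_host: str) -> bool:
    """True if a request goes to a different site than the page you are on."""
    return site_of(page_host) != site_of(request_host)


def may_send_cookies(page_host: str, request_host: str) -> bool:
    """Cookie policy with third-party cookies set to 'Never'."""
    return not is_third_party(page_host, request_host)


# Visiting www.site1.com: its own image server keeps its cookies,
# while a tracking pixel from another site gets none.
print(may_send_cookies("www.site1.com", "static.site1.com"))    # True
print(may_send_cookies("www.site1.com", "ads.badtracker.com"))  # False
```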

Install HTTPS Everywhere

HTTPS Everywhere is a browser extension (available for Firefox and Chrome) produced by the fine folks at the EFF. As its name suggests, HTTPS Everywhere gets your browser to use HTTPS anywhere it's possible to do so. This means that the connection between your browser and the web server is encrypted using SSL, and therefore cannot be easily read by any intermediaries, such as your ISP. Your ISP can still see the IP address of the webserver (and probably the domain name, unless you've been careful with your DNS setup), but they can't tell which pages in particular on that server you are reading, and they can't read the content of any comments you post, etc. Of course, the webserver itself can still see everything you do and potentially use it to track or identify you. HTTPS Everywhere protects against 3rd party eavesdropping, not 2nd party neglect of privacy.

Using HTTPS Everywhere simply makes sure that you always use an option which the web server is already configured to offer you, but which you otherwise might not always use. It doesn't do anything unusual or fancy, and therefore it shouldn't break anything.
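Under the hood, HTTPS Everywhere works from rulesets that map http:// addresses on particular sites to their https:// equivalents. The sketch below imitates that idea in Python; the two rules are hypothetical examples written for illustration, not copied from the extension's real rulesets.

```python
import re

# HYPOTHETICAL rules in the spirit of HTTPS Everywhere rulesets: each maps a
# plain-HTTP URL prefix on a site known to offer HTTPS to its HTTPS equivalent.
RULES = [
    (re.compile(r"^http://(www\.)?eff\.org/"), "https://www.eff.org/"),
    (re.compile(r"^http://en\.wikipedia\.org/"), "https://en.wikipedia.org/"),
]


def rewrite(url: str) -> str:
    """Return the HTTPS version of url if a rule matches, else url unchanged."""
    for pattern, replacement in RULES:
        if pattern.match(url):
            return pattern.sub(replacement, url, count=1)
    return url


print(rewrite("http://eff.org/about"))       # https://www.eff.org/about
print(rewrite("http://plain.example/page"))  # unchanged: no rule for this site
```

Note that a site with no rule is left alone, which is why this approach shouldn't break anything: it only upgrades connections the server is known to support.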

Block third party traffic to social media sites

Social media sites like Facebook and Twitter are a major component of the modern internet surveillance state. Lots of websites with no direct affiliation with either FB or Twitter nevertheless have social widgets on their pages, such as "Like" or "Tweet this" buttons, etc. The code and images for these buttons are downloaded from Facebook's or Twitter's servers, respectively, which means that they essentially receive a notification whenever you visit one of these sites. If you happen to be a FB or Twitter user and you're logged into your account at the time this happens, then that notification can be tied directly to your account, which in the case of Facebook is usually also tied directly to your real life identity. The solution to this is to use browser extensions which disallow connections to FB or Twitter when you are on external sites. This doesn't change the way your browser interacts directly with www.facebook.com or www.twitter.com at all, so both sites will still work 100% as usual. You'll just "miss out" on all the social widgets on external sites. I suppose this might count as "breakage" if you actually use those buttons.
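The blocking logic is simple enough to sketch. This is a hypothetical Python model, not code from any real extension: a request is dropped if it goes to a social network's domains while the page you're on belongs to somebody else. The domain lists are my own illustrative assumptions; the real extensions maintain their own.

```python
# HYPOTHETICAL domain lists for illustration; real extensions ship their own.
SOCIAL_DOMAINS = {
    "facebook": {"facebook.com", "facebook.net", "fbcdn.net"},
    "twitter": {"twitter.com", "twimg.com"},
}


def belongs_to(host: str, domain: str) -> bool:
    """True if host is the domain itself or one of its subdomains."""
    host = host.lower().rstrip(".")
    return host == domain or host.endswith("." + domain)


def should_block(page_host: str, request_host: str) -> bool:
    """Block a social network's resources unless you're on that network's own site."""
    for domains in SOCIAL_DOMAINS.values():
        if any(belongs_to(request_host, d) for d in domains):
            # Requests made while on the network itself are left untouched.
            return not any(belongs_to(page_host, d) for d in domains)
    return False


print(should_block("www.news.example", "connect.facebook.net"))  # True
print(should_block("www.facebook.com", "static.fbcdn.net"))      # False
print(should_block("www.news.example", "cdn.news.example"))      # False
```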

You can use an extension called "Facebook disconnect" (available for Firefox and Chrome) to block tracking by Facebook, and another called "Twitter disconnect" (also available for Firefox and Chrome) to block tracking by Twitter.

You may notice there is also a "Google disconnect" extension by the same developer, but be warned! Unlike the Facebook and Twitter Disconnects, this introduces moderate breakage to a noticeable number of sites. When a website that isn't Facebook or Twitter wants to download content from Facebook or Twitter, they are almost certainly just adding in fluff like Like buttons, and if that content is dropped you probably won't even notice. But when a website that isn't Google wants to download content from Google, it's sometimes because it wants to make use of the Google Hosted Libraries service. This lets web developers make use of popular Javascript libraries like jQuery without having to spend the bandwidth to host them or the effort to keep them up to date (it also introduces a single point of failure for a huge proportion of the web, but hey, shiny widgets!). If you block these connections, some web pages will fail massively (remember that graceful degradation and unobtrusive JS are deprecated concepts in this brave new Web 2.0 world!). If you really don't want to introduce any hassle or breakage, leave this one out (if you don't want to be tracked by Google, you'll need to make sure you're covered by other counter-measures).

Block Flash by default

Adobe Flash is a cancerous scourge on the web, which is thankfully likely to die a long, slow death as HTML5 replaces it (though HTML5 has its own shortcomings too). For the time being, however, it is a day-to-day necessity for web browsing, as it powers a lot of the web's multimedia: YouTube and Vimeo, SoundCloud and Bandcamp, and more. Unfortunately, it's also used for a lot of advertisements, and many sites even try to track you using small, invisible or nearly invisible Flash applications. Flash can be especially nasty for privacy due to its support for Local Shared Objects, or LSOs. These are commonly called "Flash cookies" because they act essentially like cookies, with the important difference that they are stored in a different place and managed by the Flash plugin rather than by your browser itself. When most browsers are told to "delete all cookies", Flash cookies will not be deleted. If you delete a site's regular cookies but not its Flash cookies, the site can recognise this and resend your browser the same cookie that you deleted, making it a so-called "super cookie" or "zombie cookie" (Flash is not the only way to create zombie cookies, so keep reading).

Thankfully, most people probably only want to run Flash applications on a small handful of sites, so there's no serious breakage involved in blocking Flash by default and adding the sites where you regularly use Flash to a whitelist. There are good extensions which do this, and which replace the Flash app on non-whitelisted sites with a nice shiny play button, so that you can permit Flash on unfamiliar sites quickly and easily:

Avoid using redirects

Many websites which provide you with a lot of links to external sites use browser redirection to track which of those links you click on. For example, Google search result pages give you a long list of links to web pages which usually aren't part of Google. The green URL text for each result will say something like www.notgoogle.com/somepage.html, and when you hover over the blue page title you'll see www.notgoogle.com/somepage.html show up in the bottom corner of your browser, but don't be fooled! Thanks to Javascript trickery, once you actually click the link you'll go somewhere like www.google.com/url?url=www.notgoogle.com/somepage.html&user=ABCDE1234. The real URL won't be so pretty, but the point to notice is that it starts with www.google.com, not www.notgoogle.com! The URL points to a redirect app running on one of Google's servers. When your browser goes there, Google immediately tells it to go to the www.notgoogle.com/somepage.html address, which your browser promptly does. The whole thing happens quickly enough that you don't really notice you aren't going straight to the external site. But you go to Google first, and all the other stuff in the redirect URL (like the code ABCDE1234 in the example above) is linked in Google's databases to your browser (via cookies or some other means). So not only does Google know what you searched for, but they know exactly which of the results you clicked on, in what order and how long you spent at each one! Google are not alone in this behaviour by any means: when your Facebook friends post links to external sites that show up in your feed, Facebook tracks which ones you do and do not click on (and it's a good bet that their system downloads a copy of every page you click through to and analyses the content in an attempt to refine their profile of you).

Fortunately, it's pretty easy to disable all this Javascript trickery and replace the redirect links with direct links to the pages you actually want to visit. As a bonus side-effect, things will load just a little faster, because you're cutting out an unnecessary trip to Google/Facebook/Whoever. There's no risk of this breaking anything, you're just replacing one link with another that points to the same (ultimate) destination.
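The rewriting these extensions do amounts to pulling the real destination back out of the redirect URL. Here's a minimal Python sketch. The table of redirector addresses and the query parameters assumed to carry the destination ("url", "q", "u") are illustrative assumptions; sites change these formats, which is exactly why you want a maintained extension rather than a homemade script.

```python
from urllib.parse import parse_qs, urlsplit

# ASSUMPTION: illustrative redirector table, (host, path) -> candidate parameters
# that hold the real destination. Not an authoritative description of either site.
REDIRECTORS = {
    ("www.google.com", "/url"): ("url", "q"),
    ("l.facebook.com", "/l.php"): ("u",),
}


def unwrap(link: str) -> str:
    """Return the destination hidden in a known redirect link, else the link itself."""
    parts = urlsplit(link)
    params = REDIRECTORS.get((parts.hostname, parts.path))
    if params:
        query = parse_qs(parts.query)  # also percent-decodes the values
        for name in params:
            if name in query:
                return query[name][0]
    return link


print(unwrap("https://www.google.com/url?url=http://www.notgoogle.com/somepage.html&user=ABCDE1234"))
# http://www.notgoogle.com/somepage.html
```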

Here are some browser extensions which rewrite links at popular sites to remove redirects:

Be smarter about Referers

By default, whenever you click on a link to a webpage, your browser sends the web server hosting that page the address of the page containing the link you clicked on. This is called a "Referer header" (yes, it should be spelled "Referrer", but someone made a typo when writing an HTTP standard document back in the 1990s, and now the "wrong" spelling is the one that browsers and servers all expect, so it has become right by convention). Referer headers can tell www.site2.com that you were just looking at www.site1.com, even if the two sites have nothing to do with one another. This is obviously bad for privacy but, as with cookies, if you just disable sending Referers completely you can expect to break things. The breakage won't be quite as bad as from disabling cookies, but there are definitely still pages which rely on correctly functioning Referer headers, for example to track your progress as you move from page to page of a long, complicated form without relying on cookies.

A good compromise is to only send Referers when you move within a site/domain. If you're going from www.site1.com/page1.html to www.site1.com/page2.html, including a Referer doesn't do much to compromise your privacy, because www.site1.com already knows which of its own pages you visit. Sending Referers in this case allows things like multi-page forms to work without a problem, but still keeps random webmasters from knowing which other sites you visit.
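The policy just described fits in a few lines. This Python sketch is an illustration, not the logic of any particular extension; it compares exact hostnames, which is a slightly stricter reading of "within a site" (a looser variant would compare registrable domains, so that www.site1.com and shop.site1.com count as the same site).

```python
from typing import Optional
from urllib.parse import urlsplit


def referer_for(current_page: str, target: str) -> Optional[str]:
    """Send a Referer only when staying on the same host; otherwise send none."""
    if urlsplit(current_page).hostname == urlsplit(target).hostname:
        return current_page
    return None


print(referer_for("http://www.site1.com/page1.html", "http://www.site1.com/page2.html"))
# http://www.site1.com/page1.html
print(referer_for("http://www.site1.com/page1.html", "http://www.site2.com/"))
# None
```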

Here are some browser extensions which implement smart Referer policies:

Tweak your hosts file

Your computer almost certainly has (or supports having) a file on it somewhere called a "hosts file". This file contains mappings from hostnames to IP addresses, and your computer will look for mappings here before querying any DNS servers. This is usually used to assign hostnames to other machines on your internal network, or to give short and easy-to-type nicknames to machines on the internet. However, there are no restrictions on the hostnames that can go in your hosts file. You can use this to very effectively isolate your computer from web servers that you don't want to ever talk to, such as those associated with tracking services. If you put in your hosts file that www.badtracker.com should resolve to 127.0.0.1 (an IP address which always points back to your own computer), then any time your computer tries to download anything from www.badtracker.com (be it a webpage, an image file, some Javascript code, whatever), it will in fact try to download it from your own computer. If you're not running a webserver, the connection will fail, and nothing which could compromise your privacy can happen.

Someone who cares has done all the hard work of producing a very thorough hosts file (complete with installation instructions for most operating systems) which will isolate your computer from a lot of nasty web servers - predominantly tracking systems run by the advertising industry, but also various sources of malware, common "shock sites", etc. Note that if you already have a hosts file, you should append this one to your existing one, rather than overwriting it, otherwise you'll lose whatever existing configuration you have.

This is an excellent and elegant approach to web privacy, which is not as widely known as it deserves to be. I love this method because it works for all programs on your computer at all times, it works against multiple forms of tracking (it stops you downloading tracked images or scripts or Flash content or anything else) and it doesn't need regular updating like browser extensions.
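If you'd rather script the "append, don't overwrite" step than do it by hand, here is a minimal Python sketch. It takes the text of your existing hosts file plus a list of hostnames to block, keeps every existing line untouched, and appends a 127.0.0.1 entry only for names that aren't already mapped. The blocked hostname is the made-up www.badtracker.com from earlier; writing the result back to the real hosts file (e.g. /etc/hosts) needs administrator rights and is left out.

```python
from typing import Iterable


def merge_hosts(existing: str, blocked_hosts: Iterable[str]) -> str:
    """Append '127.0.0.1 <host>' lines for hosts not already in the file."""
    mapped = set()
    for line in existing.splitlines():
        # Each entry is "IP hostname [aliases...]"; '#' starts a comment.
        fields = line.split("#", 1)[0].split()
        mapped.update(name.lower() for name in fields[1:])

    additions = []
    for host in blocked_hosts:
        if host.lower() not in mapped:
            additions.append(f"127.0.0.1 {host}")
            mapped.add(host.lower())  # avoid duplicates within blocked_hosts too

    merged = existing
    if merged and not merged.endswith("\n"):
        merged += "\n"  # don't glue the first new entry onto the last existing line
    return merged + "".join(entry + "\n" for entry in additions)


current = "127.0.0.1 localhost\n192.168.1.10 nas  # home file server\n"
print(merge_hosts(current, ["www.badtracker.com", "nas"]), end="")
# 127.0.0.1 localhost
# 192.168.1.10 nas  # home file server
# 127.0.0.1 www.badtracker.com
```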

Clear all persistent storage regularly

The main threat to browser privacy comes from the browser's capacity for persistent storage: web servers can store things in your browser which continue to exist there after you leave their website, and usually even after you exit and restart your browser. Cookies are the paradigmatic example of persistent storage in browsers (and we dealt with them in passing earlier in this guide by blocking 3rd party cookies, but there's more to consider than that!), but they are not the only form, and any form of persistent storage represents some kind of threat to privacy. Because the fact that cookies facilitate tracking is fairly widely known, and because blocking or removing cookies is relatively easy, many of the serious players in the internet surveillance state now use multiple forms of non-cookie persistent storage to track you as well. By combining all these different forms of persistence, it's possible to create "supercookies" or "zombie cookies" which are difficult to remove and can even be restored after removal!

Commonly used forms of persistent browser storage are:

The browser cache is the most technically interesting of these, in that it's not obvious how to apply it to tracking, so it requires some clever hacks. The cache can be coerced into storing short arbitrary strings through the use of ETags, and some advanced tracking systems (such as Evercookie) even get your browser to cache specially crafted PNG images with a unique identifier encoded in their pixel values, which a script later reads back out (really!).
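To show how an ETag can act as a cookie, here is a toy Python model of a hypothetical tracking server; it is not Evercookie's actual code. On the first visit the server invents an ID and sends it as the ETag of a tiny image. When the browser later revalidates its cached copy, it sends that ETag back in the If-None-Match header, handing the server the same ID. Clearing the cache is what breaks the chain.

```python
import uuid
from typing import Dict, Tuple


def tracking_pixel(request_headers: Dict[str, str]) -> Tuple[int, Dict[str, str], str]:
    """Hypothetical tracker endpoint: returns (status, response headers, visitor ID)."""
    etag = request_headers.get("If-None-Match")
    if etag:
        # The browser is revalidating its cached image: the ETag *is* the ID.
        # 304 Not Modified tells it to keep using (and keep storing) that copy.
        return 304, {"ETag": etag}, etag.strip('"')
    # First visit (or cache cleared): mint a new ID and attach it as the ETag.
    visitor_id = uuid.uuid4().hex
    return 200, {"ETag": f'"{visitor_id}"'}, visitor_id


# Simulate a browser's first visit, then a return visit using its cached ETag.
status1, headers1, id1 = tracking_pixel({})
status2, _, id2 = tracking_pixel({"If-None-Match": headers1["ETag"]})
print(status1, status2, id1 == id2)  # 200 304 True
```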

In principle you can disable all of these forms of persistence and have really good privacy, but if you do you're going to have a bad time. Disabling cookies (as discussed earlier) will leave you unable to log in to any website. Disabling caching will increase your bandwidth consumption (and that of the web servers you visit!) and slow down your browsing. You may be able to avoid relying on HTML5 storage for now, but it's a shiny new toy and it won't be long before web developers make sure you can't even read the news without it. So to avoid breakage you must keep these forms of persistence enabled, but to increase your privacy you should clear them out on a regular basis. Here are some options for automating this:

IMPORTANT NOTE: for maximum effectiveness, it's important that you (at least occasionally) clear all these sources of persistence simultaneously. Systems like Evercookie plant matching tracking codes in all the persistence forms at the same time, and if you delete some of them, they will be restored when the system recognises you from the others. So you can clear your cookies and your Flash LSOs and your cache, but if you forget about your HTML5 storage it will all be for naught. Only by emptying all these containers at the same time, while not using your browser, can you be confident that all tracking material has been removed.
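A toy Python model makes the "all at once" point concrete. The four storage areas and the respawning tracker are simplified, hypothetical stand-ins for what systems like Evercookie do: on every visit the tracker reads whichever copy of its ID survived and writes it back everywhere.

```python
import uuid
from typing import Dict, Optional

STORES = ("cookie", "flash_lso", "html5_storage", "cache")


class ToyBrowser:
    """Each persistent storage area holds at most one tracking ID."""

    def __init__(self) -> None:
        self.storage: Dict[str, Optional[str]] = {name: None for name in STORES}

    def clear(self, *names: str) -> None:
        for name in names:
            self.storage[name] = None


def tracker_visit(browser: ToyBrowser) -> str:
    """Hypothetical Evercookie-style tracker: reuse any surviving ID, then
    write it back into every store, respawning the deleted copies."""
    survivors = [v for v in browser.storage.values() if v is not None]
    visitor_id = survivors[0] if survivors else uuid.uuid4().hex
    for name in STORES:
        browser.storage[name] = visitor_id
    return visitor_id


browser = ToyBrowser()
first_id = tracker_visit(browser)

browser.clear("cookie", "flash_lso", "cache")  # forgot HTML5 storage...
print(tracker_visit(browser) == first_id)      # True: the ID respawns

browser.clear(*STORES)                         # clear everything at once
print(tracker_visit(browser) == first_id)      # False: a fresh, unlinked ID
```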

Change your IP address often

The conventional wisdom is that IP addresses are not especially useful for tracking - technologies like Network Address Translation mean that many different computers can surf the web using the same IP address. For example, often all the computers in a residential house, or in a cafe with free wifi, or in a classroom or office will appear to the outside world to have the same IP address. This makes IP addresses less useful for tracking individual users than other options like cookies. However, IP addresses are not completely uninformative. Large tracking systems which gather data from all over the web can probably tell when one or two computers are behind an IP address instead of tens or hundreds, by noticing e.g. the limited range of User Agents, or the lack of many simultaneous logged in Facebook sessions. Being the only computer behind a given IP address for a prolonged period of time may facilitate tracking.

Changing your IP address regularly may be fairly easy, or more difficult, depending on the particulars of your internet connection. As an experiment, determine your current IP address (just Google "show my ip" and use one of the many, many IP reporting sites that turn up), then turn off your modem. Wait 5 or 10 minutes, then power it up again and recheck your IP address. If you're lucky, it will have changed. If this is the case, you can simply make a habit of power cycling your modem first thing each morning, or last thing each night. If you find that your IP address sticks with you after turning your modem off and on again, you'll likely have to resort to more drastic measures to change your IP address, like using a VPN (see below).

Be aware that power cycling your modem to change your IP is not a perfect solution. For one thing, your ISP can only assign you addresses from a fixed pool, so you won't get a huge range of variation and will probably get occasional repeats. Also, geolocation technology can estimate your location from your IP, often down to the city level, and this technique won't counteract that at all. If it's important that your browsing cannot be tied to your geographic location, you will need to go one step further and start using a VPN (see below). However, if you don't want to go through that effort or pay for that service, this is an okay "poor man's" alternative.

Use a VPN

Coming soon.

Pay attention to DNS

DNS is the technology which translates human-readable domain names like www.luke.maurits.id.au into computer-readable IP addresses like 31.3.227.144. Whenever you attempt to connect to a remote computer (like a webserver or mailserver) using a name like google.com, your computer first needs to ask a DNS server for the corresponding IP address. Your computer typically asks one, two or maybe three DNS servers to do these translations, and typically those servers are all owned and operated by the same entity (very often your ISP). That entity has the capacity to build up a profile of the remote computers you connect to, though nothing more than that: if you use your browser to visit www.somesite.com/somepage.html, your DNS provider will learn that you visited www.somesite.com, but it won't be able to tell that you looked at /somepage.html as opposed to /someotherpage.html. Of course, if your DNS provider is your ISP, then this is cold comfort, because your ISP can see every page you visit unless the site supports HTTPS and you use it.

The biggest DNS-related "gotcha" applies to people who use a VPN, Tor or some other technique that prevents their ISP from monitoring what they do online. You might figure that the connection between your computer and your DNS server is now encrypted on its way through your ISP, but this may not be the case. When your modem/router hands your computer an IP address via DHCP, it usually also tells your computer to use the router itself as a DNS server (by providing a private IP address like 192.168.1.1). When your router receives a DNS request, it makes its own DNS request to your ISP's DNS servers, acting as a middle man between your computer and those servers. Now, the VPN software running on your computer is not part of your router's firmware (unless you're a DD-WRT wizard or similar), so when your router talks to your ISP's DNS servers, it talks to them directly, and your VPN offers no protection. This is called a "DNS leak", and it's a pretty serious privacy failure. In this scenario your ISP can't actually observe any traffic between you and external webservers or mailservers, but it does learn the hostname of every remote computer you connect to, which can convey a lot of information about the kind of websites you visit.

The solution is to configure your computer to not listen to the DNS servers provided via DHCP from your router, but to hardcode in the IP addresses of DNS servers. This will force your DNS traffic to go through the VPN. Your ISP won't be able to read it, because it will be encrypted when it passes through their networks. The DNS servers will still be able to build up a profile, but unlike your ISP they can't easily connect the IP address they see the requests coming from (your VPN's exit IP) with your identity.
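One quick way to spot the router-forwarding situation on a Unix-like system is to look at which nameservers /etc/resolv.conf lists. The Python sketch below flags any that sit on private (LAN) addresses. Treat a hit as a warning sign rather than proof: some VPNs deliberately push a private-address resolver that lives inside the tunnel, and some systems point resolv.conf at a local caching daemon on 127.0.0.1, which this check also flags.

```python
import ipaddress
from typing import List


def nameservers(resolv_conf: str) -> List[str]:
    """Extract 'nameserver' addresses from resolv.conf-style text."""
    servers = []
    for line in resolv_conf.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def possible_leaks(resolv_conf: str) -> List[str]:
    """Nameservers on private addresses: possibly your router, forwarding
    your queries to your ISP outside the VPN tunnel."""
    return [s for s in nameservers(resolv_conf) if ipaddress.ip_address(s).is_private]


sample = "# generated by DHCP\nnameserver 192.168.1.1\nnameserver 8.8.8.8\n"
print(possible_leaks(sample))  # ['192.168.1.1']
```

On a real machine you would pass it the contents of /etc/resolv.conf while your VPN is connected.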

So, which DNS servers should you use? DNS servers which anybody can connect to are, unsurprisingly, called "public DNS servers" and there are a lot of them to choose from. A popular choice is the two servers operated by Google, which are at 8.8.4.4 and 8.8.8.8. According to Google, the identifiable data in your requests is destroyed in 24-48 hours, though of course it's impossible for anybody outside of Google to verify this. It's also worth bearing in mind that if you log into Gmail using an account that's obviously linked to your true identity then Google can tie your VPN exit IP to that identity. If you hit their public DNS servers from that same IP, they can connect the queries to you (if you're using a VPN that shares exit IP addresses amongst its users then this connection is only probabilistic).

Some other public DNS servers to consider are:

At the end of the day, using any DNS server that you don't personally operate involves trusting some third party. The sensible thing to do is to put as little trust in any one third party as possible. If you use a Unix-based operating system, you can add the line "options rotate" to your /etc/resolv.conf file to tell long-running processes to cycle through the DNS servers listed in the file in round-robin fashion. You can specify up to three DNS servers. If each server is operated by a separate entity, then you entrust each entity with roughly one third of your DNS queries, instead of one entity with 100% of them. Makes sense.
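Here's a small Python sketch that builds resolv.conf text with up to three servers and the rotate option. 8.8.8.8 is Google's server mentioned above; 192.0.2.53 and 198.51.100.53 are placeholder documentation addresses standing in for two other, unrelated operators. Also be aware that DHCP clients and network managers often regenerate /etc/resolv.conf, so on many systems a hand-edited file will be overwritten unless you configure that tool as well.

```python
from typing import List

MAXNS = 3  # the classic resolver only uses the first three nameserver lines


def resolv_conf(servers: List[str]) -> str:
    """Build resolv.conf text that spreads queries across up to three servers."""
    if not 1 <= len(servers) <= MAXNS:
        raise ValueError(f"expected 1 to {MAXNS} nameservers, got {len(servers)}")
    lines = [f"nameserver {s}" for s in servers]
    lines.append("options rotate")  # round-robin between the servers above
    return "\n".join(lines) + "\n"


# 192.0.2.53 and 198.51.100.53 are PLACEHOLDERS, not real resolvers.
print(resolv_conf(["8.8.8.8", "192.0.2.53", "198.51.100.53"]), end="")
# nameserver 8.8.8.8
# nameserver 192.0.2.53
# nameserver 198.51.100.53
# options rotate
```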

Pay attention to your browser fingerprint

Most of the browser-specific things we've looked at so far for improving your privacy have to do with your browser persistently storing something that a web server once sent you: a cookie, a cached image, etc. This is how 90% of tracking systems work, but there are other considerations. There is some constant information that your browser sends to every web server, even ones it has never spoken to before. This includes things like your User Agent, a short string that tells web servers which browser and operating system (down to the version number) you are using, as well as lists of what languages you'd like your websites to be in, what plugins and fonts you have, etc. All of these details about your browser and computer, taken together, comprise what is called your "browser fingerprint".

The EFF really brought public attention to the issue of browser fingerprinting with their Panopticlick project. They've collected over 3 million browser fingerprints from volunteers, and will tell you how many other browsers they've seen with your exact fingerprint (usually, it's zero). They also give you a nice breakdown of how much information each part of your fingerprint contributes toward your identity.
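Panopticlick's "bits" follow a simple formula: if a fraction p of browsers share a given value, that value carries log2(1/p) bits of identifying information. A quick Python illustration, with made-up counts:

```python
import math


def identifying_bits(matching: int, total: int) -> float:
    """Surprisal of a fingerprint (or one attribute): log2(total / matching)."""
    return math.log2(total / matching)


# If 1 in 1,024 browsers shares your font list, that list carries 10 bits;
# being unique among 3 million browsers is about 21.5 bits.
print(identifying_bits(1, 1024))                 # 10.0
print(round(identifying_bits(1, 3_000_000), 1))  # 21.5
```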

There are a few common responses to the privacy threat posed by browser fingerprinting. It's very easy to change your User Agent to anything you like, so you can make your fingerprint a little less unique by changing your UA to your best guess of the most common UA in the world. There are also some browser extensions which try to take a more sophisticated approach by doing things like randomly changing your UA for each request, blocking the ability to list your plugins, etc. Here are some such extensions:

In my experience, these kinds of extensions can lead to pretty annoying breakage. Like so many privacy-enhancing extensions, they only do things which sound perfectly reasonable and like they shouldn't ruin your experience much at all. Then you try them and find out the web is chock full of totally unreasonable websites which will break badly unless you give them completely unrestricted access to every conceivable browser feature.

Personally, I think that today the threat from browser fingerprinting is somewhat exaggerated and that the standard defence is misguided. As mentioned, most people try to get around browser fingerprinting by doing things like changing their user agent to something really common, like the latest Internet Explorer on the latest Windows. A different approach is to realise that nobody else on Earth having your exact browser fingerprint is only a problem if you keep that fingerprint for a long time.

What is the natural lifetime of a browser fingerprint? Every time you upgrade your browser to the latest version, your fingerprint changes. This may not have been a big deal a few years ago, but at the time of writing mainstream browsers have break-neck release schedules compared to the past. Recent releases of Firefox have only been one month apart. This means you get a new fingerprint at least every month. It's not very different, admittedly, but a lot of the fingerprinting code I've seen just concatenates a bunch of stuff together (user agent, screen resolution and font list, say) and then feeds that big long string into a hash function like MD5 or SHA1. This means that something as minor as going from version 4.2.1 to 4.2.2 of your browser gives you a completely different fingerprint. So if you keep your browser up to date you get a new fingerprint at least every month without lifting a finger. If you use a few regularly updated plugins as well, your fingerprint might only last a week or two. That's still a long time to be tracked, but it could be a lot worse.
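Here is a minimal Python sketch of the naive concatenate-and-hash approach described above, using SHA-1 from the standard library. The attribute values are invented; the point is only that a one-digit change in the version number produces an unrelated-looking hash.

```python
import hashlib
from typing import List


def fingerprint(user_agent: str, screen: str, fonts: List[str]) -> str:
    """Naive fingerprint: join a few attributes and hash the result with SHA-1."""
    raw = "|".join([user_agent, screen, ",".join(fonts)])
    return hashlib.sha1(raw.encode("utf-8")).hexdigest()


fonts = ["DejaVu Sans", "Liberation Serif", "Ubuntu"]
before = fingerprint("ExampleBrowser/4.2.1 (X11; Linux x86_64)", "1920x1080", fonts)
after = fingerprint("ExampleBrowser/4.2.2 (X11; Linux x86_64)", "1920x1080", fonts)

print(before == after)  # False
print(before)
print(after)            # an unrelated-looking digest
```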

I think the best defence against browser fingerprinting is not to try to find a common fingerprint and keep it forever. It's to swap your globally unique fingerprint for a new globally unique fingerprint regularly, even every day. How to achieve this? It's pretty unlikely that there's a new release of your browser or one of your plugins every day, and some people use an extended support release (ESR) browser version, which keeps the same major version for many months at a time. Here are some ideas that should achieve a regularly changing browser fingerprint without requiring you to update your browser regularly, and without breaking your browsing experience at all:

"Font dancing": Font lists are often a crucial part of browser fingerprints. If you run a Unix-based OS with an easy-to-use package management system whose repository has far more fonts than you actually care about having installed, it shouldn't be hard to write a script that randomly installs and uninstalls extraneous fonts. If you ran this script from a cron job every day, you'd have a constantly shifting, random collection of fonts for your browser to report. I suspect you'd probably have to restart your browser every day for it to pick up the new fonts, but that can be made very painless with extensions like Restartless Restart for Firefox.
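A minimal sketch of the planning half of such a script, in Python. The package names in the pool are hypothetical placeholders (substitute optional font packages from your own distribution), the script has to be told which of them are currently installed, and it only prints apt-get commands as a dry run; having a daily cron job actually run them (as root) is up to you, and assumes a Debian/Ubuntu-style system.

```python
import random
from typing import List, Optional, Set, Tuple

# HYPOTHETICAL package names: replace with optional font packages you don't need.
OPTIONAL_FONTS = [
    "fonts-example-a", "fonts-example-b", "fonts-example-c",
    "fonts-example-d", "fonts-example-e",
]


def plan_font_dance(installed: Set[str], pool: List[str], keep: int,
                    seed: Optional[int] = None) -> Tuple[List[str], List[str]]:
    """Pick a random subset of `pool` to have installed today.

    Returns (to_install, to_remove); fonts outside the pool are never touched.
    """
    rng = random.Random(seed)
    wanted = set(rng.sample(pool, keep))
    to_install = sorted(wanted - installed)
    to_remove = sorted((installed & set(pool)) - wanted)
    return to_install, to_remove


if __name__ == "__main__":
    # In practice you'd ask the package manager which pool fonts are installed.
    currently_installed = {"fonts-example-a", "fonts-example-b"}
    install, remove = plan_font_dance(currently_installed, OPTIONAL_FONTS, keep=3)
    # Dry run only: print the commands instead of executing them.
    if install:
        print("apt-get install -y " + " ".join(install))
    if remove:
        print("apt-get remove -y " + " ".join(remove))
```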

Use separate browsers (or browser profiles) for anonymous and identified browsing

At this point in the article, you've done everything I know of that can be done to subvert tracking without having too much impact on the web browsing experience. But it's prudent to assume that some degree of tracking will occur nevertheless, due to things I've overlooked, bugs in privacy-enhancing browser extensions, new forms of tracking technology, etc. and to prepare yourself for that eventuality.

One way to do this is to use two different browsers (or two different profiles in one browser, if your browser of choice supports this). In one browser, you do all your surfing which is directly linkable to your real life identity, like logging into Facebook, Gmail (if you use your real name), your bank, PayPal, online stores: anything where the people running the site definitely know exactly who you are as soon as you log in. In the other browser, you do everything else (presumably most of your browsing). By using different browsers/profiles for these two different kinds of browsing, you also use separate cookie jars, separate caches, etc. You can also easily configure separate browser fingerprints. All of this means that even if tracking networks manage to connect some of the browsing you do in your anonymous browser/profile as belonging to a single person, it should be quite difficult for them to identify that person as you. About the only thing you'll have in common is an IP address.

This doesn't sound too hard in theory, but in practice it requires a lot of attention and discipline to stay effective. It potentially only takes one mistake in your "anonymous" browser to ruin the whole thing.