By now you’ve probably heard the term “cache” (pronounced like “cash”) – probably in connection with something in your web browser that you can “clear”. In any case, keep reading and we’ll explore what exactly this cache is and what it can do for you!
Some quick definitions:
To cache (used as verb): The process of storing the result of a request or computation so that it can be quickly retrieved the next time it’s needed, instead of requesting or calculating it again.
The cache (noun): The storage location where that saved information is kept, ready to be retrieved later.
Every modern web browser has caching built in: when you view a web page for the first time, some of the files, images, and scripts from that page are downloaded onto your computer and stored. When you go back to that site later, you don’t have to wait for those same files to download again from the internet before the page is displayed. The browser looks at the addresses of the files it’s about to request, finds them among what was saved before, and skips the download, showing you the copy on your computer behind the scenes. So when you “clear your browser cache,” you are deleting these saved files so that the next time you load a website, it actually fetches the latest data from the site again.
Of course, it gets more complex than that. Just on the browser side, an individual website can control how long each file should be kept in cache – or whether it should be cached at all. For instance, the logo on your company’s website can probably be cached for a long time. It’s not likely to change, and if it does, the new version would probably get a new file name, which wouldn’t match anything already in the cache.
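As a sketch of how a site communicates this, servers send HTTP response headers alongside each file; the values below are purely illustrative, telling the browser it may keep this logo for up to a year:

```
HTTP/1.1 200 OK
Content-Type: image/png
Cache-Control: public, max-age=31536000, immutable
```

A file the site never wants cached could instead send `Cache-Control: no-store`.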
All in all, this matters much more on slower connections (remember when we used dial-up?!), where spending extra time re-downloading the exact same thing we downloaded an hour ago, or yesterday, was something we badly wanted to avoid. On a very high-speed connection you might not even notice these days if nothing were ever cached again – but generally you don’t get that option.
As a developer, the more interesting kind of cache lives on the web server. This is not an automated process but a manual one, where a developer picks and chooses what makes sense to cache. For instance, suppose the web server has a function that asks the database for the list of the 50 US states – commonly used to fill a state drop-down when someone enters an address on a form. This is a great candidate for caching: the US states aren’t going to change very often, and a cache on the web server works across all users of the website. So once the first visitor asks for the 50 states, then for perhaps the next 24 hours (or even longer), every other visitor who asks for that list gets a quicker response – one that never even touches the database!
The main benefit of caching on a web server is scaling. A single user might notice a difference with some caching, but the dramatic benefit is to the server’s and database’s resources as the number of users climbs. Using our example above, say it takes 100 milliseconds to ask the database for the list of 50 states. With one user that’s no problem, but if 20 users ask for that list within the same second, things slow way down! And what if instead of 20 it’s 2,000 users? Rather than asking the database 2,000 times for the same unchanging information, we store the result in cache (in memory, in this case) and return it whenever the list of states is requested. That lookup is much faster than talking to a database (which may be busy with other requests too) – often 100 times faster or more.
If you can imagine scaling things up even further – to a site as popular as YouTube – you can see how immensely powerful caching is for speeding up a website. Eventually, though, you may need to make a decision: is it more important to give a fast response, or an accurate one? Take the number of “views” on a YouTube video. Asking the database for the exact count on every page load would be far too slow for the traffic a popular video gets. But what if the view count were only calculated every few minutes, and that stored result were returned to everyone who asks until the next refresh? It won’t be exactly accurate – it’s slightly old data – but it will be fast! Note that you couldn’t do this with something that MUST be accurate; that has to be handled a different way.
If you’re selling tickets to an event, where you absolutely can’t sell the same seat to more than one person, extra rules and steps must be followed to make sure two users can’t check out with the same seat at the same time. You cannot store the check for a seat’s availability in cache, no matter how busy your site gets! Sites like Ticketmaster have had to add additional complex features on their website, such as “waiting rooms,” so that only a certain number of people can be trying to purchase tickets to an event at one time, in order to make the experience as consistent as possible. So sometimes, being 100% accurate is so important that you make your website viewers wait!
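By contrast, a seat check has to consult the authoritative record every single time, with a guard so two buyers can’t check and reserve simultaneously. A toy in-memory Python sketch of that idea (a real site would use database transactions or row locks rather than a single process lock):

```python
import threading

_seats_sold = set()       # the authoritative record of sold seats
_lock = threading.Lock()  # only one buyer may check-and-reserve at a time

def reserve_seat(seat_id):
    """Atomically check availability and reserve; never answered from a cache."""
    with _lock:
        if seat_id in _seats_sold:
            return False  # someone else already bought this seat
        _seats_sold.add(seat_id)
        return True
```

The check and the reservation happen together under the lock, so there is no window where two buyers both see the seat as available.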
Overall, caching is an integral part of internet use today. We could not have fast, high-volume websites without significant effort put into caching behind the scenes. You may never have thought about it before, but all of this is working to make your experience on popular websites the best it can be!