Web App Scaling Considerations, for Cocktail Parties

I’ve been trying to dive more deeply into scaling web apps. When I want to internalize something, I often make up a little song or phrase to remind myself of the main points. Below are a few sentences I came up with for scaling, as well as an explanation of what each word means. I wanted to share it in case it’s helpful. This way, if either of us finds ourselves, say, at a cocktail party where scaling comes up (sure, it could happen), we can silently repeat the phrase to jog our memories about some of the major considerations.

The memory-jogging phrase is nonsense, but there’s an organization to its sentences. I find it helpful to think of them, and read their cadence, as poorly-translated furniture assembly instructions:

"Load horizontally or vertically. Separate shards sequentially. Cache your queues."

The discussion below describes what this all means in a very general way. At the very bottom is a bibliography/list of resources if you would like to check them out.

First Sentence: Think Server Structure - "Load horizontally or vertically."

“Load” - Stands for “load balancer”. This is a machine that receives many simultaneous browser requests and routes each one to one of several servers. This “balances the load” of incoming requests and keeps any one server from being overloaded. The servers that the load balancer assigns requests to are often clones of a master server. Each clone must be updated every time the master server is updated, but there are services that handle this.
“Horizontally” - Stands for horizontal scaling. Horizontal scaling means increasing the number of servers you use to handle incoming requests. This tends to go hand-in-hand with the load balancer setup described above. You might also consider using a content delivery network, aka a CDN. Very generally speaking, CDNs are meant to deal with the fact that response times are affected by the physical distance between a requesting browser and a responding server (keyword: latency). CDNs keep copies of your content (typically static files such as images, stylesheets, and scripts) on servers in different geographic areas and answer each request from a location near the user in order to minimize wait time. For example, if you are a U.S. company with a U.S. website and a Canadian customer base, you might want to use a CDN that serves your content from Canada, so that Canadian customers don’t get frustrated waiting for a distant, U.S.-based server to respond to them.
“Vertically” - Stands for think about vertical scaling. Vertical scaling simply means beefing up each server you use – adding more disk space, more RAM, more processing power, etc. There will always be limits to this, so I get the sense it’s only a band-aid, not a fantastic solution, when trying to scale.
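
To make the load balancer idea concrete, here is a minimal sketch of one common routing strategy, round-robin, where requests are handed to servers in a fixed rotation. The class name and the server names are made up for illustration, and real load balancers also handle health checks, timeouts, and much more.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands each incoming request to the next server in a fixed rotation."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("need at least one server")
        self.servers = list(servers)
        self._rotation = cycle(self.servers)

    def route(self, request):
        # Pick the next server in the rotation and pair it with the request.
        server = next(self._rotation)
        return server, request


# Hypothetical server names; horizontal scaling means making this list longer.
balancer = RoundRobinBalancer(["web-1", "web-2", "web-3"])
assignments = [balancer.route(f"GET /page/{i}")[0] for i in range(6)]
print(assignments)
# ['web-1', 'web-2', 'web-3', 'web-1', 'web-2', 'web-3']
```

Six requests spread evenly over three servers, two each. Adding a fourth name to the list is the code-sized version of scaling horizontally.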

Second Sentence: Think Database Structure - "Separate shards sequentially."

“Separate” - Stands for separate your web hosting server(s) from your database server(s). This is good practice in general. Don’t forget to think about having redundant databases - if you put all of your data eggs in one basket and that basket breaks, you’ve got a bad scene on your hands.
“Shards” - Stands for consider “sharding” your database, which is a form of partitioning. Sharding means dividing your database up among multiple machines. There are different ways you could do this. For example, if you run a blog, you might split by feature and have one machine for posts, one for reader profiles, and one for comments. You could also split the rows of a single large table across machines, for example by user ID. You might also use “directory-based” partitioning, meaning there is a separate machine with a table for looking up which machine holds a given piece of data. As you consider sharding, you’re going to want to avoid maxing out any one machine’s capacity (which forces you to reconfigure or move data) and avoid bottlenecks (for example, if all requests must pass through one machine, such as the lookup machine, it could become overloaded and slow performance).
“Sequentially” - Stands for think about SQL (often pronounced “sequel”) vs NoSQL. SQL is the query language used by relational databases, and “join” lookups across tables can get slow as the data grows. NoSQL databases (such as MongoDB and Couchbase) were born out of a desire to work around the limits, constraints, and timing issues of traditional SQL databases. NoSQL databases use alternative ways of relating data, such as nesting related data inside a single document, and typically avoid or limit join lookups. They also have a better reputation for scaling across many machines. If you are going to stick with SQL, one potential way to improve performance is denormalizing, which basically means repeating commonly-used data across different tables to avoid using a join lookup every time this commonly-used data is needed.
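
Here is a rough sketch of how an application might decide which machine holds a piece of data. The directory class mirrors the “directory-based” lookup table described above. The hash-based function is an alternative I am adding for comparison, not something covered above. The shard names, keys, and function names are all hypothetical.

```python
import hashlib

# Hypothetical shard names standing in for separate database machines.
SHARDS = ["db-0", "db-1", "db-2"]


def hash_shard(key: str) -> str:
    """Hash-based sharding: a stable hash of the key picks the machine."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]


class ShardDirectory:
    """Directory-based sharding: a lookup table records where each key lives."""

    def __init__(self):
        self._location = {}

    def assign(self, key: str, shard: str) -> None:
        self._location[key] = shard

    def lookup(self, key: str) -> str:
        return self._location[key]


directory = ShardDirectory()
directory.assign("user:alice", "db-0")
directory.assign("user:bob", "db-2")

print(directory.lookup("user:bob"))  # db-2
# The same key always hashes to the same shard.
print(hash_shard("user:alice") == hash_shard("user:alice"))  # True
```

The tradeoff shows up even at this size: the directory lets you move a key by changing one table entry, but every lookup has to go through it, which is the bottleneck risk mentioned above. The hash version needs no lookup table, but changing the number of shards changes where most keys land.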

Third Sentence: Think Improving Performance for Complex Tasks - "Cache your queues."

“Cache” - Stands for consider storing the results of complex processes in a cache for fast lookup, rather than running the processes each time a request is made. Cache in this context means a memory layer in between your application and your database. When a request or query is made, the application first checks the cache to see if the results are already stored there, and if so, the application can provide the results quickly. If not, the application goes to the database as usual and can store the result in the cache for next time. Caching the results of commonly-used, complex queries can provide a major speed boost and keep users from having to wait for those results. But there’s a tradeoff: the cached results might be slightly stale, because, after all, they were computed earlier and stashed away in the cache.
“Queue” - Stands for plan to have a queue, hand-in-hand with using a cache. The results of commonly-used, complex requests or queries that are stored in the cache have to get updated somehow, to keep them only slightly stale and not majorly stale. The updating is coordinated through a queue, which keeps track of the processes to re-run for the purpose of updating the cache. A background worker dutifully performs these processes on a first-in-first-out basis.
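
The cache and the queue can be sketched together in a few lines. This is a toy version under stated assumptions: the cache is a plain dictionary, the “database” is a made-up function that counts how often it is called, and the worker runs in the same process instead of in the background. All names here are hypothetical.

```python
from collections import deque


# Stand-in for a slow database query; the name and data are hypothetical.
def expensive_query(key):
    expensive_query.calls += 1
    return f"result for {key} (computed on call {expensive_query.calls})"


expensive_query.calls = 0

cache = {}               # in-memory layer between the app and the database
refresh_queue = deque()  # FIFO list of cache keys waiting to be recomputed


def get(key):
    """Serve from the cache when possible; fall back to the database on a miss."""
    if key not in cache:
        cache[key] = expensive_query(key)
    return cache[key]


def mark_stale(key):
    """Ask for a key to be recomputed later, without blocking the caller."""
    refresh_queue.append(key)


def run_refresh_worker():
    """Recompute queued keys in first-in-first-out order."""
    while refresh_queue:
        key = refresh_queue.popleft()
        cache[key] = expensive_query(key)


first = get("top-posts")    # miss: runs the query
second = get("top-posts")   # hit: served from the cache, possibly stale
mark_stale("top-posts")
run_refresh_worker()        # the worker refreshes the cached value
third = get("top-posts")
print(first == second, second == third, expensive_query.calls)
# True False 2
```

The query runs only twice across three reads: once on the first miss and once when the worker refreshes the entry. Between those two moments, readers get the older, slightly stale result, which is the tradeoff described above.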

Bibliography / List of Resources