System Design – Scalability

Scalability is defined as the capacity to change in size. In technology, it means the ability to grow and manage increased demand without affecting overall performance of the systems.

Why is scalability so hard?

A lot of systems are built iteratively. They start with an idea and a prototype. Traffic and workload grow over time. Scalability is often an after thought. A lot of times as system resources get more efficient, adding them to the existing ecosystem can expose scaling issues in the other systems.

Then there is a cost issue, especially for smaller companies. Investing in systems or resources that will help your systems scale from the onset is a tough decision to make, especially when operating on a tight budget.

The Fallout

When traffic to your site is light or your consumer base is low, weaknesses in your system are hard to spot. As the workload increases and surpasses the sytems’ ability to scale, performance drops.

Vertical vs Horizontal Scaling

In vertical scaling, we increase the overall capacity by increasing the capacity of the individual systems. E.g. increase CPU, memory etc. Vertical scaling is also referred to as scaling up.

Horizontal scaling or scaling out means adding more machines to your setup. With scaling out, you are spreading the workload across your infrastructure. The most common pattern in horizontal scaling is the use of load balancers that round robin system load.

Scalability strategies

Load Balancers

Use a load balancer for distributing load to systems. This works well if you have stateless applications and any instance put behind a load balancer can handle the load


Here there can be fast lookup of most recently used results. That will help with high frequency accesses of the same resource. Redis and Memcached are two commonly used caching systems.

NoSQL Databases

NoSQL databases scale better than Relational databases. This is primarily because of the ACID constraint on RDMS. When one or more if the ACID constraints are relaxed, write and read operations can scale.

E.g. Atomic transactions in database means all operations happen or none. However this means there is a write lock on the database till the operations commits or rolls back. In MongoDB a NoSQL database, write operations are atomic on the level of a single document. So even if your are writing multiple operations, other operations may interleave.

Similarly Cassandra drops rules around consistency. There is eventual consistency. This is highly optimal for systems with high write operations.

Content Delivery Networks

A Content Delivery Network will serve the user content from a location as close to the user as possible thus reducing latency.

Communication between microservices

On the surface, this does not look like a system design issue that will affect scale. We could decide, REST is the way we will communicate. However, this decision will not work well when scaling out to build hundreds of microservices that need to communicate. We will need to be able to keep the latency of our responses low. Here is where we could leverage a binary communication protocol like gRPC.

Another communication tool to look into is brokered messaging. This is particular needed to stream large volumes of data between systems. This can be done via a systems like Apache Kafka that create data pipelines between systems.

Always look to the future when designing systems so your systems can withstand the tests of time.

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s