Shortcut to seniority

System design

System design refers to architecturing a large-scale system, based on a set of requirements, and managing any possible trade-offs.

From my point of view, there are three major blocks.

First major block refers to problem solving.

These are the steps that a programmer can do in order to solve a specific problem.

From the implementation perspective, the first block (problem solving) applies to any problem, but are mainly in the scope of small tasks (level functions, classes).

The second major block refers to questioning everything.

These are the actions that a programmer must do in order to understand in detail a specific problem.

The second block (questioning everything) is in the scope of medium tasks (level: domain).

The third major block, and the most abstract one, is this chapter: system design.

These are the constraints and design choices that a programmer must be aware of in order to design an application.

The third block is in the scope of large tasks (level: application).

Specific terms

Single point of failure (SPOF)

A single point of failure is a part of a system that will stop the entire system from working, if it fails. In order to protect the system from SPOFs, we need to identify where are the problems and fix them. We should look for things such as data that is not backed up or any hardware or software system that does not have any redundancy.

Design techniques

Load balancing

Load balancing is a term that refers to distribution of work across multiple computing resources, or in simple terms, the process of distributing the requests to different servers, in order to reduce the burden of a single server.

This is done to maximize speed, capacity, and to make sure that no server is overloaded with requests. If a server goes down, the load balancer can also redirect the unprocessed requests to a different server.

image with caption — Figure xxx - System design - Load balancer

Database replication

Database replication refers to storing a distributed database, and copying data from a database within a computer / server to another one. The implementation in which database replication is used in order to eliminate data ambiguity and inconsistency among its users is known as normalization.

Database partitioning / sharding

We want to make use of database sharding in our system because we can scale the system by simply adding more shards. Doing this, we can improve the performance by balancing the workload across all shards, and we can also position them physically close to the users that will access that data, thus increasing the performance of the system.

Database partitioning can be performed in two ways:

Vertical partitioning, by storing different tables and columns in separate databases.
Horizontal partitioning, by splitting and storing rows of the same table in multiple databases.

Example of vertical partitioning

                                            
                                                UserData retrieveUserData(string user_id) {  
                                                return db["users"].get(user_id);  
                                                }  
                                                
                                                UserPhoto retrieveUserPhoto(string photo_id) {  
                                                return db["photos"].get(photo_id);  
                                                }

Example of horizontal partitioning

                                            
                                                UserData retrieveUserData(string user_id) {  
                                                return db_users[user_id % 2].get(user_id);  
                                                }

There are a few ways we can categorize the data sharding, such as algorithmic or dynamic sharding.

If we use algorithmic sharding, everyone can determine the partitions we use in the database and in which location we should look for a given data.

With algorithmic sharding, we use a function to locate the data. This is similar to using hash tables.

When we use this type of sharding, the data is distributed only by the function, without considering the size of each database.

If we use dynamic sharding, we need to use a service locator which will determine the location of the entries, and can be changed dynamically in case we want to split or redistribute data.

The disadvantage with this approach is that it becomes a single point of failure in our system.

Scaling the system

There are two ways of scaling up the system:

Vertical scaling	Horizontal scaling
Occurs when we add more hardware (CPU, RAM) to the existing machine.	Occurs when we add more machines into the pool of resources.

Caching

Caching allows a better usage of the already existing resources, and it is used to speed up the computation.

For example, there’s the Least Recently Used (LRU) policy, which is a cache replacement algorithm that removes the least recently used data in order to make space for newer one.

Design interview

If you’re preparing for a design interview, keep in mind the following:

You’re probably not expected to write any code, but identify high level components and describe the interactions between components.

Step 1. Gathering the requirements

The first step is similar to problem solving (understanding the problem).

Therefore, in an interview, you should ask questions to fully understand the scope of the problem.

Keep in mind that design questions are usually open-ended and don’t have one correct answer.

A. Clarify the use-cases and in-scope / out-of scope

Let’s say we have to build a social network:

Who can post ? Answer: Any user.
Who can read the post? Answer: Any user if public, friends for friends privacy, etc.
Can we add / follow other users? Answer: probably yes.
Can we react (thumbs-up, thumbs-down, etc.) on a post? Answer: probably yes.
Can we comment on a post?
Can we post photos and videos, or only text?
Can users search for other users?
Can non-users search for users?
Can users or non-users search for posts?

B. Clarify constraints and estimate the scale

How many total users we expect to have?
How many daily users we expect to have?
How many requests the server could get per second? What about per month?
How much data is read per second? How much data is written per second?
What is the amount of storage we need over 2-3-5-10 years?

In this step we should understand if the system is more read of write oriented, and identify the traffic and data constraints.

What scale is expected from the system?
How much storage would we need? Is this based on the users as well?
Have we already taken into account the data required to store the posts and any post-related data?
What network bandwidth usage are we expecting?

Step 2. Define the API

Once you have gathered the specifications, you should define the expected API exposed by the system.

For example, the API will probably have functions such as:

Post(user, text, location, timestamp)
DisplayTimeline(user, timestamp)

Step 3. Define the data model

The data model will clarify how the data is moving within the system components.

Through this, you identify entities of the system and their interaction with each other.

User: UserID, name, e-mail address, date of birth, last login
Post: postID, content, location, timestamp, reacts, privacy
User_Friends: friend1, friend2…

You should also think about which database system you should use, be it SQL or NoSQL.

Step 4. High level design

You should draw a big box to delimit the system, break it in sub-components, and then discuss the role of each component.

We need to have a load balancer to distribute the traffic appropiately.

We also need a database that can store all the posts, and such db should support a big number of reads.

We also probably need a distributed file storage system for storing the photos and videos.

Step 5. In-depth design

You should take 2-3 core components and discuss in details, approaches, pros and cons, and which one would you choose and why.

There’s no correct answer, so the only important thing here is to consider the tradeoffs while keeping the system constraints in mind.

Do we partition our data? Is there actually a need for that?
Should we try to store all the data on the same database instead?
How do we optimize the system to provide the most recent posts?
Where do we add caching? What about load balancing?
Should we separate the read and write requests on different servers?

Step 6. Identify bottlenecks

Try to identify bottlenecks and think about ways on how to mitigate them.

What are potential Single point of failures within the system?
How are we monitoring the performance of our system? Are we notified when something fails?

Shortcut to seniority

Badea Robert