Intro
In this article I’ll outline, in some detail, the process of scaling a single database to handle progressively higher volumes of read and write requests. We’ll start from a single PostgreSQL/MySQL database and look at recommendations for incremental scaling.
Remember that overengineering is a common issue! Don’t jump straight to sharding and ruin your simple system to accommodate a request volume that could be handled by adding a few indexes.
Single Server
We begin with the most basic design possible: a single backend server connected to an instance of PostgreSQL (or whatever relational database you prefer).
Note
In this article I’ll mostly be focusing on relational databases (RDBMSs), since many NoSQL services have different methods of scaling, and often require less of it because they’re already optimized for high traffic.
Here’s an idea of what it could look like (excuse the guerrilla-style Excalidraw):

This simple architecture can genuinely work great for many applications. If you’re running a simple WordPress page handling nowhere near even 50 requests per second (RPS), you can stop here. You’re gonna be fine.
For my more popular webapp developers, reaching ~50-100 RPS, it’s time for an upgrade.
Application Code
First off, if you’re running inefficient code on your server, with memory leaks and needless overhead, go optimize that code now. Slow server-side code can cause your client-server connection to bottleneck, sometimes even before your database does! So fix it.
And if your reaction to that is, “I don’t know, is it fast?”, it’s time to do some benchmarking and make sure. Ideally you’ve already taken care of this, but it’s a nice thing to check before things get complex.
That being said, if you’re getting more users and seeing higher latency, it’s probably because of slow reads from your database. Most applications are read-heavy, and writes (especially in simple RDBMSs) are fast because you just need to append a record to the end of a table or two.
Why is the server-database connection often slower than the client-server connection?
Server code can often run mostly in main memory and cache, without needing expensive disk accesses. Database work, on the other hand, is often composed almost entirely of disk accesses, with complex queries and joins. Not to mention that a single request to the server can often result in more than one query on the database.
Indexing
Smart indexing is the first step you can take towards optimizing your database. Frequently accessed data should be indexed on the attribute(s) it is queried by. For instance, if you find yourself constantly fetching emails from the user table given a uid, you’ll want to place an index on user.uid.
In most relational databases this can be done with one command like below:
```sql
CREATE INDEX index_name ON table_name (column_to_index);
```
So for the uid example it would look like:
```sql
CREATE INDEX user_index ON user (uid);
```
Note
Making an index on a primary key in most modern DBMSs (namely Postgres and MySQL) is redundant, as an index is automatically added to any column labeled PRIMARY KEY or UNIQUE.
There are a few types of indexes in use today, but by far the most common is the B-tree index. These indexes use a series of lookup tables (analogous to beloved multi-level page tables) forming a tree structure that lowers lookups to O(log n) time complexity.
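If you’re not sure whether a query is actually using an index, most databases will show you their query plan. Here’s a quick check in Postgres, reusing the uid index from above (the exact plan output will vary):
```sql
-- EXPLAIN shows how the planner intends to execute the query; an
-- "Index Scan" node means the uid index is being used. ("user" is
-- quoted here because it's a reserved word in Postgres.)
EXPLAIN SELECT email FROM "user" WHERE uid = 42;
```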
Other types of indexes
Composite index
A composite index indexes your table on two or more columns. The index sorts by the first column first, then the second, and so on.
A well-placed composite index can buy you a lot of time on complex queries involving two or more columns. If a query’s WHERE clause involves two or more columns, creating a composite index on those columns might serve your interests.
Creating a composite index is simple and follows the SQL outlined here:
```sql
CREATE INDEX index_name ON table_name (col1, col2, ...);
```
Note that the order of these columns is important. Databases usually follow the leftmost-prefix rule, meaning that in the example above the index can be used effectively to search on (col1), (col1, col2), (col1, col2, col3) and so on, but never (col2) or (col2, col3), etc. If you have other queries that search on col2 or (col2, col3), you should likely create a separate index for each. More details on this topic can be found in this article.
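To make the leftmost-prefix rule concrete, here’s a small sketch using a hypothetical orders table:
```sql
-- Hypothetical composite index on two columns.
CREATE INDEX orders_cust_date ON orders (customer_id, created_at);

-- These can use the index (they match a leftmost prefix):
SELECT * FROM orders WHERE customer_id = 17;
SELECT * FROM orders WHERE customer_id = 17 AND created_at >= '2025-01-01';

-- This can't use the index effectively (it skips the leading column):
SELECT * FROM orders WHERE created_at >= '2025-01-01';
```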
Full-text index
A full-text index is a very specific type of index that (you guessed it!) helps when querying text. For instance, when you have queries that end with:
```sql
... WHERE text_col LIKE '%key%';
```
it’s probably a good idea to look into one. In Postgres this takes a little more setup, but in MySQL it can be added with this command:
```sql
CREATE FULLTEXT INDEX index_name ON table_name (col);
```
Full-text indexes work by mapping words (or tokens) to the documents that contain them. This allows the database to quickly narrow the search range when looking for specific keywords.
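For reference, here’s roughly what that extra Postgres setup might look like. Postgres has no FULLTEXT keyword; instead you index a parsed tsvector, typically with a GIN index (the articles table here is hypothetical):
```sql
-- Index the parsed text so keyword searches can avoid full table scans.
CREATE INDEX articles_body_fts ON articles
    USING GIN (to_tsvector('english', body));

-- Query with the same expression so the planner can use the index.
SELECT id, title
FROM articles
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'key');
```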
What’s next?
Let’s take a quick look at what we have now:

This is nice, but optimizing application code and database indexes can only get you so far. As your reads increase and your tables grow larger, server-database operations will eventually begin to produce latency again. What’s the next best step?
Partitioning
Not to be confused with sharding, partitioning is the process of splitting up data within your database. This simplifies and speeds up queries, since each request can be directed to a smaller region of the data based on a key.
Sharding vs. partitioning
I’ve heard and found conflicting information on what exactly sharding means and you might hear a variety of definitions throughout your career. Many say sharding is a type of partitioning, which I guess I agree with. But for this article I’ll be using partitioning to mean splitting and grouping data within the database and sharding to mean splitting and grouping data into different servers.
There are two main types of partitioning: horizontal and vertical. Horizontal partitioning splits a table into groups of rows (records). Vertical partitioning splits a table into groups of columns (attributes). Vertical partitioning solves a more specific, and less common, problem than horizontal partitioning, but I’ll go over both here.
Vertical partitioning
Vertical partitioning is best used when you have a table with many columns, and less than 20% of those columns are being accessed frequently. When this happens it might (big might) be worth considering vertically partitioning your database into two tables, one holding the frequently accessed columns, and the other containing the rest.
However, doing this can have some serious consequences in terms of overhead and complexity. First of all, neither Postgres nor MySQL supports an easy way to partition vertically. Doing so requires manually splitting tables, and possibly downtime for your application.
Not to mention that all of the SQL queries in your application code would need to be refactored, along with any indexes made on these tables.
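If you did want to go down this road anyway, a manual split might look something like the sketch below (table and column names are hypothetical, and you’d still need to backfill, keep the tables consistent during the migration, and update every affected query):
```sql
-- Move rarely accessed columns out of a wide, hot "users" table.
CREATE TABLE user_profiles AS
SELECT uid, bio, avatar_url, preferences
FROM users;

ALTER TABLE user_profiles ADD PRIMARY KEY (uid);

-- Once the application reads profiles from the new table,
-- drop the moved columns from the hot table.
ALTER TABLE users
    DROP COLUMN bio,
    DROP COLUMN avatar_url,
    DROP COLUMN preferences;
```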
In general vertical partitioning can be a big mess, and is usually not necessary unless planned on from the outset of designing your schema.
That being said, you have your own brain and it’s a free country, so do what you please.
Horizontal partitioning
As I mentioned before, horizontal partitioning involves splitting a table into groups of rows. In practice there are a few different ways to go about this, but the most common are range-based and hash-based partitioning.
Range based
Range-based partitioning is when you divide your table into contiguous sections of rows, with each section directing data to a different partition. The sections are chosen based on a range key, which is one or more of the columns in the table. For instance, let’s say we decide to partition user data based on the year the account was created.
| account_created | partition |
|---|---|
| 2025 | 5 |
| 2024 | 4 |
| 2023 | 3 |
| … | … |
You might immediately notice a minor issue with this method. Unless the data being partitioned is static and not receiving writes, new partitions will always need to be added for newer customers. As soon as January comes around, we’ll have to assign 2026 -> partition 6.
Note
A way to fix this could be to partition based on a fixed range value like months, or a range of days.
Now when a query for user data reaches the database, the date will first be checked to see which partition to look in, and then the query will be fired on that smaller more efficient table.
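In PostgreSQL, declarative range partitioning makes this setup fairly painless. A minimal sketch, with hypothetical table and column names:
```sql
-- Parent table is partitioned on the account creation date.
CREATE TABLE users (
    uid             bigint,
    account_created date NOT NULL
) PARTITION BY RANGE (account_created);

-- One partition per year; the upper bound is exclusive.
CREATE TABLE users_2024 PARTITION OF users
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
CREATE TABLE users_2025 PARTITION OF users
    FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');
```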
Taking another look at our system diagram, we can get a clearer picture of how requests are processed now:

Notice again that we still only have one database! Also keep in mind that our indexes are still there; they’re just now applied (more efficiently) to each individual partition.
Issues with range based partitioning
Range-based partitioning’s largest issue is that it can often result in hot spots, where one partition receives the majority of reads and writes. Solving this is sometimes difficult, but one common fix is to use specialized partitions, where key ranges are unevenly sized, in an effort to distribute load uniformly.
For instance, let’s say that our company gained twice as many users in 2023 compared to other years, and as a result there are more requests to partition 3. A different partitioning scheme could solve this issue:
| account_created | partition |
|---|---|
| 2025 | 5 |
| 2024 | 4 |
| 2023/7/3-2023/12/31 | 3 |
| 2023/1/1-2023/7/2 | 2 |
| … | … |
Now the requests previously going to partition 3 will be split between partitions 3 and 2. Obviously a real-world situation would require more analysis of access patterns, but you get the idea.
Another way to solve this would be to instead use hash based partitioning.
Hash based partitioning
Hash-based partitioning solves the hotspot problem by running a fast, uniform hashing function on the partition key, then using the hash value to determine which partition to send the query to.
Note
Unlike in cryptography, a collision free (injective for all my math people) hash function is not necessary or ideal. The goal here is simply to achieve an even distribution of hash values.
One simple way to do this is to use a modulo formula as shown below.
N = # of partitions
partition id = hash(partition_key) mod N
For example, let’s say we want 5 partitions; the possible partition ids are then 0, 1, 2, 3, and 4, since those are all the possible remainders from division by 5.
This is a simple but effective approach used in many production systems. In fact, it happens to be the exact method PostgreSQL uses for hash-based partitions.
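Here’s what that looks like as PostgreSQL DDL, continuing the five-partition example (table and column names are hypothetical):
```sql
-- Parent table is partitioned on a hash of uid.
CREATE TABLE users (
    uid bigint NOT NULL
) PARTITION BY HASH (uid);

-- One partition per remainder of hash(uid) mod 5.
CREATE TABLE users_p0 PARTITION OF users FOR VALUES WITH (MODULUS 5, REMAINDER 0);
CREATE TABLE users_p1 PARTITION OF users FOR VALUES WITH (MODULUS 5, REMAINDER 1);
CREATE TABLE users_p2 PARTITION OF users FOR VALUES WITH (MODULUS 5, REMAINDER 2);
CREATE TABLE users_p3 PARTITION OF users FOR VALUES WITH (MODULUS 5, REMAINDER 3);
CREATE TABLE users_p4 PARTITION OF users FOR VALUES WITH (MODULUS 5, REMAINDER 4);
```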
Range queries on hashed partitions
One problem with hash-based partitioning is that queries that find data in a range, like
```sql
... WHERE id BETWEEN 200 AND 400;
```
can be expensive.
Since the hash is uniform, records for id=200 are likely stored in a completely different partition from id=201 and id=202. This can make queries like these slow and costly. If your query patterns involve ranges, it’s a good idea to use range-based partitioning (or at least avoid hash-based).
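You can see this for yourself with EXPLAIN on the hash-partitioned users table sketched earlier: the planner can’t prune hash partitions for a range predicate, so the plan will include a scan of every partition:
```sql
-- With hash partitioning, this range predicate hits all five partitions;
-- with range partitioning on uid, most of them could be pruned away.
EXPLAIN SELECT * FROM users WHERE uid BETWEEN 200 AND 400;
```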
Partitioning summary
Partitioning can provide some great performance increases by reducing the amount of data that needs to be processed for each query, but take great care to analyze your access patterns before making important decisions like how many partitions you want and what type of partitioning you’ll use.
Let’s recap where our system is right now. With proper partitioning and indexes, database reads may be tamed for a while, but it’s also possible that your client-server connection has become a bottleneck.

This means it’s time to scale your server.
Server scaling
When it comes to server scaling there are essentially two options: vertical and horizontal scaling.
Vertical scaling means upgrading the physical machines your server is running on. Usually this means adding more RAM or compute power. Vertical scaling is always an option, and can be a good way to increase the number of requests your servers can handle, especially if you want to keep things as simple as possible.
Vertical scaling can only get you so far, though, since a single machine has hard limits on compute power, and upgrades get increasingly expensive.
Horizontal scaling means adding more machines to your server pool. Horizontal scaling can provide more flexibility and fault tolerance, and often costs much less, but it does come with some overhead. One of the decisions you now have to make when a request comes into the backend is: which server should it go to?
Distributed systems solve this issue with load balancers.
Load balancing
Load balancing is the process of distributing requests across a pool of servers. Load balancers can be placed in many locations in an architecture, but they’re most commonly seen between the client and the application server pool.
Here’s what our architecture might look like once we start distributing load and scaling horizontally.

The method used to decide which server a request is sent to is called a load balancing algorithm. There are many good options, and as always the choice will depend on your use case. To read more about load balancing algorithms, check out Cloudflare’s summary of them here.
Single point of failure
Now that the system is officially distributed, it’s time to start worrying about big words like reliability, scalability, and availability. In this case, adding a single load balancer to handle all incoming traffic creates a single point of failure, which can jeopardize availability. For this reason it’s customary to include a warm standby that takes over if the load balancer fails.
Scaling horizontally with load balancers also has a variety of other benefits I won’t get into depth on here, but to cover them quickly:
- Elasticity: More servers can be added or removed with relative ease, and cloud services can even do this automatically (like AWS Elastic Load Balancing in combination with EC2 Auto Scaling).
- Fault tolerance: If an application server goes down, the load balancer can simply not send requests to that server until it is back online.
- Cost: Buying more commodity servers is often more cost-effective than scaling vertically and upgrading your current one. Additionally, elastic allocation of resources can ensure you’re only paying for what you need.
Notice that we still only have one dedicated database server. It might be the case that your daily active users (DAU) count is getting high enough that even a well-partitioned and indexed database has slow reads. So we’ll have to add one more layer of complexity.
Read replicas
To handle more reads we’ll need to split up our database by function. One primary server will handle all of the write operations, while replicas of that database handle reads.
Replicas can pull in writes from the primary database either synchronously or asynchronously, depending on what level of consistency the system requires. Be warned that if you want strong consistency and require every replica to be written synchronously on each write, writes could become a bottleneck for you in the future.
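If you’re on PostgreSQL, a couple of built-in views and functions make it easy to keep an eye on replication:
```sql
-- On the primary: list connected replicas and whether each one is
-- replicating synchronously or asynchronously.
SELECT client_addr, state, sync_state
FROM pg_stat_replication;

-- On a replica: rough estimate of how far it lags behind the primary.
SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;
```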
Diagramming will give us something like this:

Note
In this diagram I’ve included a “database router” component, which determines where to send each query. In most systems this functionality isn’t a separate component but is managed in the application servers. The main reason I included one here is because I didn’t wanna draw all those arrows.
That being said it is possible to have some sort of proxy server handling this type of routing and it could potentially be a good idea for your situation.
Distributing reads like this has much the same effect as using a load balancer in front of application servers.
Sharding
Note on caching
TLDR: Go right ahead!