11. Design A Chat System

info

Design a Chat System, like Whatsapp, Facebook messenger, Wechat, Line, Discord

A chat app performs different functions for different people. It is extremely important to nail down the exact requirements. For example, you do not want to design a system that focuses on group chat when the interviewer has one-on-one chat in mind. It is important to explore the feature requirements.

References

https://bytebytego.com/courses/system-design-interview/design-a-chat-system

Step 1 - Understand the problem and establish design scope

It is vital to agree on the type of chat app to design. In the marketplac:

One-on-one chat apps like Facebook Messenger, WeChat, and WhatsApp
Office chat apps that focus on group chat like Slack
Game chat apps, like Discord, that focus on large group interaction and low voice chat latency.

The first set of clarification questions should nail down what the interviewer has in mind exactly when she asks you to design a chat system. At the very least, figure out if you should focus on a one-on-one chat or group chat app.

In the chapter, we focus on designing a chat app like Facebook messenger, with an emphasis on the following features:

A one-on-one chat with low delivery latency. 一对一聊天
Small group chat (max of 100 people). 群组聊天
Online presence. 在线状态
Multiple device support. The same account can be logged in to multiple accounts at the same time. 支持多账户同时登录
Push notifications. 推送通知

It is also important to agree on the design scale. We will design a system that supports 50 million DAU.

Step 2 - Propose high-level design and get buy-in

Clients do not communicate directly with each other. Instead, each client connects to a chat service, which supports all the features mentioned above. Let us focus on fundamental operations. The chat service must support the following functions:

Receive messages from other clients.
Find the right recipients for each message and relay the message to the recipients.
If a recipient is not online, hold the messages for that recipient on the server until she is online.

When a client intends to start a chat, it connects the chats service using one or more network protocols. For a chat service, the choice of network protocols is important. Let us discuss this with the interviewer.

Requests are initiated by the client for most client/server applications. This is also true for the sender side of a chat application. 对于聊天应用程序, 请求都是由客户端发起的。

In Figure 2, when the sender sends a message to the receiver via the chat service, it uses the time-tested HTTP protocol, which is the most common web protocol. In this scenario, the client opens a HTTP connection with the chat service and sends the message, informing the service to send the message to the receiver. The keep-alive is efficient for this because the keep-alive header allows a client to maintain a persistent connection with the chat service. It also reduces the number of TCP handshakes. HTTP is a fine option on the sender side, and many popular chat applications such as Facebook [1] used HTTP initially to send messages.

However, the receiver side is a bit more complicated. Since HTTP is client-initiated, it is not trivial to send messages from the server. 由于 HTTP 是客户端发起的，因此从服务器发送消息并非易事。 Over the years, many techniques are used to simulate a server-initiated connection: polling, long polling, and WebSocket. Those are important techniques widely used in system design interviews so let us examine each of them.

Polling

As shown in Figure 3, polling is a technique that the client periodically asks the server if there are messages available. Depending on polling frequency, polling could be costly. It could consume precious server resources to answer a question that offers no as an answer most of the time.

Long polling

In long polling, a client holds the connection open until there are actually new messages available or a timeout threshold has been reached. Once the client receives new messages, it immediately sends another request to the server, restarting the process. Long polling has a few drawbacks:

Sender and receiver may not connect to the same chat server. HTTP based servers are usually stateless. If you use round robin for load balancing, the server that receives the message might not have a long-polling connection with the client who receives the message.
A server has no good way to tell if a client is disconnected.
It is inefficient. If a user does not chat much, long polling still makes periodic connections after timeouts.

WebSocket

WebSocket is the most common solution for sending asynchronous updates from server to client. Figure 5 shows how it works.

WebSocket connection is initiated by the client. It is bi-directional and persistent. WebSocket连接由客户端发起, 它是双向且持久的。

It starts its life as a HTTP connection and could be “upgraded” via some well-defined handshake to a WebSocket connection. Through this persistent connection, a server could send updates to a client. WebSocket connections generally work even if a firewall is in place. This is because they use port 80 or 443 which are also used by HTTP/HTTPS connections. 通过 HTTP 连接开始, 通过某些特定的握手升级为 WebSocket 连接, 并使用 80/443 端口

Earlier we said that on the sender side HTTP is a fine protocol to use, but since WebSocket is bidirectional, there is no strong technical reason not to use it also for sending. Figure 6 shows how WebSockets (ws) is used for both sender and receiver sides.

By using WebSocket for both sending and receiving, it simplifies the design and makes implementation on both client and server more straightforward. Since WebSocket connections are persistent, efficient connection management is critical on the server-side.

Step 3 - High-level design

Just now we mentioned that WebSocket was chosen as the main communication protocol between the client and server for its bidirectional communication, it is important to note that everything else does not have to be WebSocket. In fact, most features (sign up, login, user profile, etc) of a chat application could use the traditional request/response method over HTTP. Let us drill in a bit and look at the high-level components of the system.

As shown in Figure 7, the chat system is broken down into three major categories: stateless services, stateful services, and third-party integration.

Stateless Services 无状态服务

Stateless services are traditional public-facing request/response services, used to manage the login, signup, user profile, etc. 无状态服务是传统的面向公众的请求/响应服务，用于管理登录、注册、用户配置文件等。

Stateless services sit behind a load balancer whose job is to route requests to the correct services based on the request paths. These services can be monolithic or individual microservices. The one service that we will discuss more in deep dive is the service discovery. Its primary job is to give the client a list of DNS host names of chat servers that the client could connect to.

Stateful Service 有状态服务

The only stateful service is the chat service. The service is stateful because each client maintains a persistent network connection to a chat server. In this service, a client normally does not switch to another chat server as long as the server is still available. The service discovery coordinates closely with the chat service to avoid server overloading. We will go into detail in deep dive. 在此服务中，只要服务器仍然可用，客户端通常不会切换到另一个聊天服务器。

Third-party integration 第三方集成

For a chat app, push notification is the most important third-party integration. 对于聊天应用程序来说，推送通知是最重要的第三方集成。

It is a way to inform users when new messages have arrived, even when the app is not running. Proper integration of push notification is crucial. Refer to "Design a notification system" chapter for more information.

Scalability 可扩展性

On a small scale, all services listed above could fit in one server. Even at the scale we design for, it is in theory possible to fit all user connections in one modern cloud server. The number of concurrent connections that a server can handle will most likely be the limiting factor.

In our scenario, at 1M concurrent users, assuming each user connection needs 10K of memory on the server (this is a very rough figure and very dependent on the language choice), it only needs about 10GB of memory to hold all the connections on one box. 在我们的场景中，在 1M 并发用户的情况下，假设每个用户连接需要服务器上 10K 的内存（这是一个非常粗略的数字，并且非常依赖于语言的选择），则只需要大约 10GB 的内存来容纳一台服务器上的所有连接。

If we propose a design where everything fits in one server, this may raise a big red flag in the interviewer’s mind. No technologist would design such a scale in a single server. Single server design is a deal breaker due to many factors. The single point of failure is the biggest among them.

However, it is perfectly fine to start with a single server design. Just make sure the interviewer knows this is a starting point. Putting everything we mentioned together, Figure 8 shows the adjusted high-level design.

In Figure 8, the client maintains a persistent WebSocket connection to a chat server for real-time messaging.

Chat servers facilitate message sending/receiving. 聊天服务器负责消息发送/接收。
Presence servers manage online/offline status.
API servers handle everything including user login, signup, change profile, etc.
Notification servers send push notifications.
Finally, the key-value store is used to store chat history. When an offline user comes online, she will see all her previous chat history. K-V DB 用于存储聊天记录。当离线用户上线时，她将看到她之前的所有聊天记录。

Storage 存储

An important decision we must make is to decide on the right type of database to use: relational databases or NoSQL databases? To make an informed decision, we will examine the data types and read/write patterns.

Two types of data exist in a typical chat system. The first is generic data, such as user profile, setting, user friends list. These data are stored in robust and reliable relational databases. Replication and sharding are common techniques to satisfy availability and scalability requirements. 第一个是通用数据，例如用户个人资料、设置、用户好友列表。这些数据存储在强大且可靠的关系数据库中。复制和分片是满足可用性和可扩展性要求的常用技术。

The second is unique to chat systems: chat history data. It is important to understand the read/write pattern.

The amount of data is enormous for chat systems. A previous study [2] reveals that Facebook messenger and Whatsapp process 60 billion messages a day.
Only recent chats are accessed frequently. Users do not usually look up for old chats. 仅经常访问最近的聊天记录, 用户通常不会查找旧的聊天记录。
Although very recent chat history is viewed in most cases, users might use features that require random access of data, such as search, view your mentions, jump to specific messages, etc. These cases should be supported by the data access layer.
The read to write ratio is about 1:1 for 1 on 1 chat apps.

Selecting the correct storage system that supports all of our use cases is crucial. We recommend key-value stores for the following reasons:

Key-value stores allow easy horizontal scaling.
Key-value stores provide very low latency to access data.
Relational databases do not handle long tail [3] of data well. When the indexes grow large, random access is expensive. 关系数据库不能很好地处理长尾数据。当索引变大时，随机访问的成本很高。
Key-value stores are adopted by other proven reliable chat applications. For example, both Facebook messenger and Discord use key-value stores. Facebook messenger uses HBase [4], and Discord uses Cassandra [5].

Data models

Just now, we talked about using key-value stores as our storage layer. The most important data is message data. Let us take a close look.

Message table for 1 on 1 chat

Figure 9 shows the message table for 1 on 1 chat. The primary key is message_id, which helps to decide message sequence. We cannot rely on created_at to decide the message sequence because two messages can be created at the same time.

message {
    message_id bigint
    message_from bigint
    message_to bigint
    content text
    created_at timestamp
}

Message table for group chat

Figure 10 shows the message table for group chat. The composite primary key is (channel_id, message_id). Channel and group represent the same meaning here. channel_id is the partition key because all queries in a group chat operate in a channel.

group_message {
    channel_id bigint
    message_id bigint
    user_id bigint
    content text
    created_at timestamp
}

Message ID

How to generate message_id is an interesting topic worth exploring. message_id carries the responsibility of ensuring the order of messages. To ascertain the order of messages, message_id must satisfy the following two requirements: message_id 负责确保消息的顺序和消息的唯一标识符。

IDs must be unique.
IDs should be sortable by time, meaning new rows have higher IDs than old ones.

How can we achieve those two guarantees? The first idea that comes to mind is the “auto_increment” keyword in MySql. However, NoSQL databases usually do not provide such a feature.

The second approach is to use a global 64-bit sequence number generator like Snowflake [6]. This is discussed in the “Design a unique ID generator in a distributed system” chapter.

The final approach is to use local sequence number generator. Local means IDs are only unique within a group. The reason why local IDs work is that maintaining message sequence within one-on-one channel or a group channel is sufficient. This approach is easier to implement in comparison to the global ID implementation. 本地意味着 ID 仅在组内是唯一的。本地ID起作用的原因是维持一对一通道或组通道内的消息顺序就足够了。与全局 ID 实现相比，这种方法更容易实现。

Step 4 - Design deep dive

For the chat system, service discovery, messaging flows, and online/offline indicators worth deeper exploration.

Service discovery

The primary role of service discovery is to recommend the best chat server for a client based on the criteria like geographical location, server capacity, etc. Apache Zookeeper [7] is a popular open-source solution for service discovery. It registers all the available chat servers and picks the best chat server for a client based on predefined criteria. 服务发现的主要作用是根据地理位置、服务器容量等标准为客户端推荐最佳的聊天服务器。它注册所有可用的聊天服务器，并根据预定义的标准为客户端选择最佳的聊天服务器。

Figure 11 shows how service discovery (Zookeeper) works.

1.User A tries to log in to the app.
2.The load balancer sends the login request to API servers.
3.After the backend authenticates the user, service discovery finds the best chat server for User A. In this example, server 2 is chosen and the server info is returned back to User A.
4.User A connects to chat server 2 through WebSocket.

Message flows

In this section, we will explore 1 on 1 chat flow, message synchronization across multiple devices and group chat flow.

1 on 1 chat flow

1. User A sends a chat message to Chat server 1.
1. Chat server 1 obtains a message ID from the ID generator.
1. Chat server 1 sends the message to the message sync queue.
1. The message is stored in a key-value store.
5.a. If User B is online, the message is forwarded to Chat server 2 where User B is connected.
5.b. If User B is offline, a push notification is sent from push notification (PN) servers.
1. Chat server 2 forwards the message to User B. There is a persistent WebSocket connection between User B and Chat server 2.

Message synchronization across multiple devices 多设备消息同步

Many users have multiple devices. We will explain how to sync messages across multiple devices.

In Figure 13, user A has two devices: a phone and a laptop. When User A logs in to the chat app with her phone, it establishes a WebSocket connection with Chat server 1. Similarly, there is a connection between the laptop and Chat server 1.

Each device maintains a variable called cur_max_message_id, which keeps track of the latest message ID on the device. Messages that satisfy the following two conditions are considered as news messages:

The recipient ID is equal to the currently logged-in user ID.
Message ID in the key-value store is larger than cur_max_message_id.

With distinct cur_max_message_id on each device, message synchronization is easy as each device can get new messages from the KV store.

Small group chat flow

In comparison to the one-on-one chat, the logic of group chat is more complicated.

Figure 14 explains what happens when User A sends a message in a group chat. Assume there are 3 members in the group (User A, User B and user C). First, the message from User A is copied to each group member’s message sync queue: one for User B and the second for User C. You can think of the message sync queue as an inbox for a recipient. This design choice is good for small group chat because:

it simplifies message sync flow as each client only needs to check its own inbox to get new messages.
when the group number is small, storing a copy in each recipient’s inbox is not too expensive.

WeChat uses a similar approach, and it limits a group to 500 members [8]. However, for groups with a lot of users, storing a message copy for each member is not acceptable. 但是，对于拥有大量用户的组来说，为每个成员存储消息副本是不可接受的。

On the recipient side, a recipient can receive messages from multiple users. Each recipient has an inbox (message sync queue) which contains messages from different senders. Figure 15 illustrates the design. 在接收方，一个接收方可以接收来自多个用户的消息。每个收件人都有一个收件箱（消息同步队列），其中包含来自不同发件人的消息。图 15 展示了该设计。

Online presence 在线状态

An online presence indicator is an essential feature of many chat applications. Usually, you can see a green dot next to a user’s profile picture or username. This section explains what happens behind the scenes.

In the high-level design, presence servers are responsible for managing online status and communicating with clients through WebSocket. There are a few flows that will trigger online status change. Let us examine each of them. 呈现服务器负责管理在线状态并通过WebSocket与客户端进行通信。有一些流程会触发在线状态更改。让我们逐一考察一下。

The user login flow is explained in the “Service Discovery” section. After a WebSocket connection is built between the client and the real-time service, user A’s online status and last_active_at timestamp are saved in the KV store. Presence indicator shows the user is online after she logs in.

User logout

When a user logs out, it goes through the user logout flow as shown in Figure 17. The online status is changed to offline in the KV store. The presence indicator shows a user is offline.

User disconnection

We all wish our internet connection is consistent and reliable. However, that is not always the case; thus, we must address this issue in our design. When a user disconnects from the internet, the persistent connection between the client and server is lost.

A naive way to handle user disconnection is to mark the user as offline and change the status to online when the connection re-establishes. However, this approach has a major flaw. It is common for users to disconnect and reconnect to the internet frequently in a short time. For example, network connections can be on and off while a user goes through a tunnel. Updating online status on every disconnect/reconnect would make the presence indicator change too often, resulting in poor user experience. 处理用户断开连接的一种简单方法是将用户标记为离线，并在连接重新建立时将状态更改为在线。然而，这种方法有一个重大缺陷。用户在短时间内频繁断开和重新连接互联网是很常见的。例如，当用户穿过隧道时，网络连接可以打开和关闭。每次断开/重新连接时更新在线状态会使状态指示符更改过于频繁，从而导致用户体验不佳。

We introduce a heartbeat mechanism to solve this problem. Periodically, an online client sends a heartbeat event to presence servers. If presence servers receive a heartbeat event within a certain time, say x seconds from the client, a user is considered as online. Otherwise, it is offline. 我们引入心跳机制来解决这个问题。在线客户端定期向状态服务器发送心跳事件。如果在线状态服务器在一定时间内（例如 x 秒）从客户端接收到心跳事件，则认为用户在线。否则就是离线状态。

In Figure 18, the client sends a heartbeat event to the server every 5 seconds. After sending 3 heartbeat events, the client is disconnected and does not reconnect within x = 30 seconds (This number is arbitrarily chosen to demonstrate the logic). The online status is changed to offline.

Online status fanout

How do user A’s friends know about the status changes? Figure 19 explains how it works. Presence servers use a publish-subscribe model, in which each friend pair maintains a channel. When User A’s online status changes, it publishes the event to three channels, channel A-B, A-C, and A-D. Those three channels are subscribed by User B, C, and D, respectively. Thus, it is easy for friends to get online status updates. The communication between clients and servers is through real-time WebSocket.

用户A的好友如何知道状态变化? 状态服务器使用发布-订阅模型，其中每个朋友对维护一个频道。当用户A在线状态发生变化时，将事件发布到三个通道：通道 A-B、A-C、A-D。这三个频道分别由用户 B、C 和 D 订阅。这样，朋友就可以轻松获得在线状态更新。客户端和服务器之间的通信是通过实时 WebSocket 进行的。

The above design is effective for a small user group. For instance, WeChat uses a similar approach because its user group is capped to 500. For larger groups, informing all members about online status is expensive and time consuming. Assume a group has 100,000 members. Each status change will generate 100,000 events. 上述设计对于小用户群来说是有效的。例如，微信就采用了类似的方法，因为它的用户群体上限为 500 人。对于较大的群体，向所有成员通报在线状态既昂贵又耗时。假设一个群组有 100,000 名成员。每次状态更改都会生成 100,000 个事件。

To solve the performance bottleneck, a possible solution is to fetch online status only when a user enters a group or manually refreshes the friend list. 为了解决性能瓶颈，一种可能的解决方案是仅在用户进入群组或手动刷新好友列表时才获取在线状态。

Step 5 - Wrap up

In this chapter, we presented a chat system architecture that supports both 1-to-1 chat and small group chat. WebSocket is used for real-time communication between the client and server. The chat system contains the following components: chat servers for real-time messaging, presence servers for managing online presence, push notification servers for sending push notifications, key-value stores for chat history persistence and API servers for other functionalities. 聊天系统包含以下组件：用于实时消息传递的聊天服务器、用于管理在线状态的状态服务器、用于发送推送通知的推送通知服务器、用于保存聊天历史记录的键值存储以及用于其他功能的 API 服务器。

If you have extra time at the end of the interview, here are additional talking points:

Extend the chat app to support media files such as photos and videos. Media files are significantly larger than text in size. Compression, cloud storage, and thumbnails are interesting topics to talk about. 压缩、云存储和缩略图
End-to-end encryption. Whatsapp supports end-to-end encryption for messages. Only the sender and the recipient can read messages. Interested readers should refer to the article in the reference materials [9]. 端到端加密。 Whatsapp 支持消息的端到端加密。只有发件人和收件人才能阅读消息。
Caching messages on the client-side is effective to reduce the data transfer between the client and server. 在客户端缓存消息可以有效减少客户端和服务器之间的数据传输。
Improve load time. Slack built a geographically distributed network to cache users’ data, channels, etc. for better load time [10]. 改善加载时间。 Slack 构建了一个地理分布式网络来缓存用户的数据、频道等，以获得更好的加载时间。
Error handling.
The chat server error. There might be hundreds of thousands, or even more persistent connections to a chat server. If a chat server goes offline, service discovery (Zookeeper) will provide a new chat server for clients to establish new connections with. 与聊天服务器的持久连接可能有数十万甚至更多。如果聊天服务器离线，服务发现（Zookeeper）将提供一个新的聊天服务器供客户端建立新的连接。
Message resent mechanism. Retry and queueing are common techniques for resending messages. 消息重发机制。重试和排队是重新发送消息的常用技术。

References​

Step 1 - Understand the problem and establish design scope​

Step 2 - Propose high-level design and get buy-in​

Polling​

Long polling​

WebSocket​

Step 3 - High-level design​

Stateless Services 无状态服务​

Stateful Service 有状态服务​

Third-party integration 第三方集成​

Scalability 可扩展性​

Storage 存储​

Data models​

Message table for 1 on 1 chat​

Message table for group chat​

Message ID​

Step 4 - Design deep dive​

Service discovery​

Message flows​

1 on 1 chat flow​

Message synchronization across multiple devices 多设备消息同步​

Small group chat flow​

Online presence 在线状态​

User login​

User logout​

User disconnection​

Online status fanout​

Step 5 - Wrap up​