2. Design YouTube
Design a Video Platform, like YouTube, Bilibili, Netflix, TikTok.
Source: 花花酱
Interview Signals
Four aspects a system design interview evaluates:
- Working solution: can you propose a design that works; familiarity with common scenarios and design patterns.
- Analysis and communication: keep communicating with the interviewer; analyze storage and bandwidth.
- Tradeoffs (pros/cons): can you propose multiple solutions, evaluate their pros and cons, and make tradeoffs based on the requirements.
- Knowledge base: depth and breadth of knowledge.
Overview
- Step 1: Clarify the requirements
- Step 2: Capacity Estimation
- Step 3: System APIs
- Step 4: High-level System Design
- Step 5: Data Storage
- Step 6: Scalability
Step 1: Clarify the requirements
Clarify goals of the system
- Requirements
- Traffic size (e.g., daily active users)
Discuss the functionalities, and align with the interviewer on 2-3 components to focus on during the interview.
Type 1: Functional Requirement
- Upload
- View
- Share
- Like/Dislike
- Comment
- Search
- Recommend
- ...
Type 2: Non-Functional Requirement
Per the CAP theorem, a distributed system can only satisfy two of consistency, availability, and partition tolerance at the same time. Typical non-functional requirements:
- Consistency
- Every read receives the most recent write or an error
- Tradeoff with Availability: eventual consistency
- Availability
- Every request receives a (non-error) response, without the guarantee that it contains the most recent write
- Scalable
- Performance: low latency
- Partition tolerance (fault tolerance)
- The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes
Step 2: Capacity Estimation
Why Capacity Estimation?
- Evaluate candidate's analytical skills & system sense
- Helpful for identifying system bottlenecks in order to improve system scalability
- Infra resources need to be requested and provisioned ahead of time
Assumption
- 2 billion total users, 150 million daily active user (DAU)
- Each content creator
- 1% among all users are content creators
- Upload frequency: 1 video per week
- Each content consumer
- Avg watching time: 50 mins per day
- Each video
- Avg length: 10 mins
- Avg upload resolution: 1080p
- Bandwidth requirement for 1080p playback: 5Mbps
Storage Estimation
- New videos
- 2B * 1% * 1 video/week / (7 * 86400 s) ≈ 33 videos/s
- File size per minute
- When a user uploads a video, we transcode it into files at multiple resolutions to serve playback on different devices
- 1080p + 720p + 480p + 360p + ... = 100MB/min
- Daily write size
- 33 videos/s * 86400 s/day * 10 min/video * 100MB/min ≈ 2.9PB/day
- Replication
- redundancy: same-region replication x3
- availability: cross-region replication x3
Bandwidth Estimate
Daily upload bandwidth
- 33 videos/s * (10 * 60s/video) * 5Mbps = 99Gbps
Daily download/ongoing bandwidth
- Concurrent users
- 150M DAU * 50 mins / (24 * 60mins/day) = 5.2M users
- Bandwidth
- 5.2M * 5Mbps = 26Tbps
- Read / Write ratio
- 26Tbps / 99Gbps ≈ 263:1
Read heavy system
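The estimates above can be reproduced with a quick back-of-envelope script (all inputs are the Step 2 assumptions; rounding may differ slightly from the figures in the notes):

```python
# Back-of-envelope capacity estimation using the Step 2 assumptions.
total_users = 2_000_000_000
dau = 150_000_000
creator_ratio = 0.01            # 1% of users are content creators
uploads_per_week = 1            # videos per creator per week
avg_video_min = 10              # average video length
transcoded_mb_per_min = 100     # all output resolutions combined
playback_mbps = 5               # bandwidth for 1080p playback

# New videos per second
videos_per_s = total_users * creator_ratio * uploads_per_week / (7 * 86400)

# Daily write size in PB (1 PB = 1e9 MB here)
daily_write_pb = videos_per_s * 86400 * avg_video_min * transcoded_mb_per_min / 1e9

# Upload bandwidth in Gbps: each video carries 10 min of 5 Mbps content
upload_gbps = videos_per_s * avg_video_min * 60 * playback_mbps / 1e3

# Concurrent viewers (avg 50 min watched per day) and download bandwidth
concurrent = dau * 50 / (24 * 60)
download_tbps = concurrent * playback_mbps / 1e6

print(f"{videos_per_s:.0f} videos/s, {daily_write_pb:.1f} PB/day written")
print(f"{upload_gbps:.0f} Gbps up, {download_tbps:.0f} Tbps down "
      f"-> read:write ~ {download_tbps * 1000 / upload_gbps:.0f}:1")
```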
Step 3: System APIs
Public APIs
01 Upload Video
uploadVideo(api_dev_key, video_title, video_description, tags[], category_id, default_language, recording_details, video_contents)
- Notification: via email or in-app notification once encoding finishes. Because video processing usually takes a long time, the server handles it as an async task.
- api_dev_key (string): The API developer key of a registered account. This will be used to, among other things, throttle users based on their allocated quota.
- video_title (string): The title of the video.
- video_description (string): Optional description of the video.
- tags (string[]): Optional tags for the video.
- category_id (string): The category of the video, e.g., movie, song, person, etc.
- default_language (string): E.g., English, Mandarin, Hindi, etc.
- recording_details (string): The location where the video was recorded.
- video_contents (stream): The video to be uploaded.
02 Watch Video stream
streamVideo(api_dev_key, video_id, offset, codec, resolution)
- Input
- offset: stream the video from any offset, allowing users to resume watching
- codec & resolution: depend on the watch device
- Return
- stream: a media stream from the given offset
- api_dev_key (string): The API developer key of a registered account of our service.
- video_id (string): A string identifying the video.
- offset (number): We should be able to stream the video from any offset; the offset is the time in seconds from the start of the video. If we support play/pause across multiple devices, we need to store the offset on the server, which lets the user resume on any device from the same point where they stopped.
- codec (string) & resolution (string): The client should send codec and resolution info in the API call to support play/pause across multiple devices. Suppose you are watching a video in the Netflix app on a TV, pause it, and then resume in the Netflix app on your phone: the two devices have different resolutions and use different codecs, so both parameters are needed.
Videos are chopped into 2-second clips and stored separately. A typical short video of 5 mins = 150 clips; a medium video of 20 mins = 600 clips; a long video of 2 hours = 3600 clips.
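With fixed 2-second clips, turning a video duration into a clip count, or a playback offset into the clip to fetch, is simple integer arithmetic. A minimal sketch (function names are illustrative):

```python
CLIP_SECONDS = 2

def clip_count(duration_s: int) -> int:
    """Number of 2-second clips for a video of the given duration."""
    return (duration_s + CLIP_SECONDS - 1) // CLIP_SECONDS  # ceiling division

def clip_for_offset(offset_s: float) -> int:
    """Index of the clip containing a playback offset, for resuming."""
    return int(offset_s // CLIP_SECONDS)

assert clip_count(5 * 60) == 150       # short video
assert clip_count(20 * 60) == 600      # medium video
assert clip_count(2 * 3600) == 3600    # long video
assert clip_for_offset(61.5) == 30     # resume at 1:01.5 -> fetch clip 30
```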
03 Search a video
searchVideo(api_dev_key, search_query, user_id, user_embedding_id, user_location, maximum_videos_to_return, page_token)
- Input1: user_location_info
- Input2: user_embedding_info
- Input3: search_key_overwrite & synonyms
- api_dev_key (string): The API developer key of a registered account of our service.
- search_query (string): A string containing the search terms.
- user_id (string): The user's UUID.
- user_embedding_id (string): ID of the user's feature-vector data, used to retrieve the user embedding for subsequent ranking and sorting.
- user_location (string): Optional location of the user performing the search.
- maximum_videos_to_return (number): The maximum number of results to return in one request.
- page_token (string): This token specifies the page of the result set that should be returned.
Internal APIs
- encodeVideo(videoId, codec, resolution): transcode the video
- generateThumbnail(videoId): generate a thumbnail
- storeThumbnail(videoId, image): store the thumbnail
- getVideoInfo(videoId): get video info
- getVideoThumbnail(videoId): get the video thumbnail
- indexVideo(videoId): index the video
More details of video upload
- Client starts with HTTP requests to the server to establish various settings:
- a. Create a videoID
- b. Prepare a storage location (which storage address to upload to)
- c. ...
- Server responds with the storage location; the client then starts reading the video file and uploading it.
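The two-phase handshake above can be sketched as follows; the class shape, path scheme, and field names here are hypothetical, not an actual YouTube API:

```python
import uuid

class UploadService:
    """Sketch of the two-phase upload: settings first, then the bytes."""

    def __init__(self):
        self.sessions = {}  # video_id -> upload session state

    def init_upload(self, metadata: dict) -> dict:
        # a. Create a videoID; b. prepare a storage location for the file.
        video_id = str(uuid.uuid4())
        location = f"/media-storage/raw/{video_id}"  # hypothetical path scheme
        self.sessions[video_id] = {"meta": metadata, "location": location,
                                   "bytes": bytearray()}
        return {"video_id": video_id, "upload_url": location}

    def upload_chunk(self, video_id: str, chunk: bytes) -> None:
        # Client streams the file to the returned location, chunk by chunk.
        self.sessions[video_id]["bytes"].extend(chunk)

svc = UploadService()
resp = svc.init_upload({"video_title": "demo"})     # phase 1: settings
svc.upload_chunk(resp["video_id"], b"\x00" * 1024)  # phase 2: file bytes
```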
Step 4: High-level System Design
Scenario 1: Upload video
- Database(Metadata/User) stores video metadata, such as the title and description; the video files themselves are stored in Distributed Media Storage.
- Uploaded videos must be transcoded (encoded), which takes a long time, so it is handled as an async task through a Processing Queue: when a video is uploaded, the Upload Service publishes a task to the Processing Queue.
- Video Processing Service handles the processing: it pulls tasks from the Processing Queue and downloads the corresponding video from Distributed Media Storage. Processing includes transcoding, thumbnail extraction, and other steps. Once done, the video and thumbnails are written back to Distributed Media Storage, and their storage paths are updated in Database(Metadata/User).
- To achieve low latency, a common practice is to push videos to servers close to the users, i.e., the CDN. We therefore add a Video Distributing Service responsible for distributing videos and images to the CDN nodes. This is also an async task, since it is time-consuming, handled through a Completion Queue: when a video finishes processing, a task is added to this queue, and a downstream service reads the task and distributes the video.
Video Processing Service responsibilities
- Break the video down into chunks
- Transcoding (multiple codecs & resolutions)
- a. Decode
- b. Encode
- Generate thumbnails and previews
- (Advanced) video understanding with ML
The processing can be done in parallel across all chunks.
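Since chunks are independent, they can be transcoded in parallel and the results collected in order. A sketch using a thread pool, with `transcode` standing in for a real per-chunk encoder invocation (e.g., shelling out to ffmpeg):

```python
from concurrent.futures import ThreadPoolExecutor

def transcode(chunk_id: int, codec: str, resolution: str) -> str:
    # Stand-in for a real per-chunk encoder call (e.g., an ffmpeg invocation).
    return f"chunk-{chunk_id}.{resolution}.{codec}"

def process_video(num_chunks: int, codec: str = "h264",
                  resolution: str = "720p") -> list:
    # Each 2-second chunk is independent, so all chunks can be transcoded
    # in parallel; collecting futures in submit order keeps the output ordered.
    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = [pool.submit(transcode, i, codec, resolution)
                   for i in range(num_chunks)]
        return [f.result() for f in futures]

outputs = process_video(150)  # a 5-minute video = 150 chunks
```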
- Videos and images are static assets, well suited to CDN distribution.
- Not every video is kept on the CDN, since its capacity is limited. Popular videos are streamed to users from the CDN; unpopular ones are streamed from the origin data center.
Scenario 2: Watch video
- Video Playback Service is responsible for playing videos.
- Host Identify Service looks up where a video is served from: given the video, the user's IP address, and device info, it finds the CDN node closest to the user that stores this video. If one is found, its location is returned to the user, who can then watch the video; otherwise the video is streamed from the origin data center.
- Other data, such as the video's title and description, is read from Database(Metadata/User).
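The Host Identify lookup can be sketched as a nearest-first scan over CDN nodes with a fallback to the origin; the region names and index structure below are assumptions for illustration:

```python
ORIGIN = "origin-datacenter"

# cdn_index: region -> set of video_ids cached at that region's CDN node
cdn_index = {
    "us-west": {"v1", "v2"},
    "us-east": {"v1"},
}
# Preference order of CDN regions for each user region, nearest first.
nearness = {
    "us-west": ["us-west", "us-east"],
    "us-east": ["us-east", "us-west"],
}

def identify_host(video_id: str, user_region: str) -> str:
    """Return the closest CDN node that stores the video, else the origin."""
    for region in nearness.get(user_region, []):
        if video_id in cdn_index.get(region, set()):
            return f"cdn-{region}"
    return ORIGIN  # cold video: stream from the origin data center

assert identify_host("v2", "us-east") == "cdn-us-west"
assert identify_host("v9", "us-east") == ORIGIN
```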
Step 5: Data Storage
In SQL, char is a fixed-length data type, while varchar is variable-length.
- SQL database
- Usage: store relational database
- E.g., user table, video metadata
- NoSQL database
- Usage: store unstructured data
- E.g., store video thumbnails in big table
- In practice, the vast majority of companies store videos and images in a file system or blob storage
- File system / Blob storage
- Media file: image, audio, video
- E.g., distributed file systems: HDFS, GlusterFS
- E.g., Blob storage: Netflix used Amazon S3
Step 6: Scalability
- Identify potential bottlenecks
- Discuss solutions, focusing on tradeoffs
- Data sharding
- Data store, cache
- Load balancing
- E.g., user <-> application server; application server <-> cache server; application server <-> db
- Data caching
- Especially useful for read-heavy apps (reads >> writes)
Bottlenecks & Solutions
- Read-heavy system
- Distribute/replicate data across servers/regions
- Data sharding
- Data replication
- Latency-sensitive
- Caching
- CDN
Data Sharding
- Why?
- Horizontal scaling: distribute our database load to multiple servers making it scalable
- How?
- Sharding by video_id via consistent hashing
- Pros: uniform distribution (no hot-user problem)
- Cons:
- Need to query all shards in order to get a user's videos
- Hot videos can still put very high traffic on the shards holding them (e.g., a popular video everyone is watching)
- Solution: replicate the hot videos to more servers
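Sharding by video_id via consistent hashing can be sketched as a hash ring with virtual nodes (the vnode count and hash function are illustrative choices):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal consistent-hash ring; each shard gets virtual nodes so that
    video_ids spread uniformly and adding/removing a shard remaps few keys."""

    def __init__(self, shards, vnodes: int = 100):
        self.ring = sorted(
            (self._hash(f"{shard}#{i}"), shard)
            for shard in shards for i in range(vnodes)
        )
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def shard_for(self, video_id: str) -> str:
        # First ring position clockwise from the key's hash position.
        i = bisect.bisect(self.keys, self._hash(video_id)) % len(self.keys)
        return self.ring[i][1]

ring = ConsistentHashRing([f"shard-{n}" for n in range(4)])
placement = {vid: ring.shard_for(vid) for vid in ("v1", "v2", "v3")}
```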
- Sharding by user_id is the common alternative: it keeps each user's videos on one shard, but suffers from hot users and uneven distribution
Data Replication
- Why?
- Scale to handle heavy read traffic by making the same data available in multiple machines
- Improve availability
- How?
- Primary-secondary (master-slave) configuration
- All writes go to the primary DB and then propagate to all secondaries
- All reads go to the secondaries
- Data can be read at any time, without being blocked by writes
- Cons?
- Breaks consistency: reads may return stale data (not necessarily the latest write). Eventual consistency is acceptable here.
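A toy model of the primary-secondary flow, showing why reads never block on writes but can briefly be stale (replication is synchronous here only for demonstration; in a real system it is asynchronous):

```python
import itertools

class ReplicatedStore:
    """Writes go to the primary and propagate to secondaries; reads hit
    a secondary, so they never block on writes but may briefly be stale."""

    def __init__(self, num_secondaries: int = 2):
        self.primary = {}
        self.secondaries = [{} for _ in range(num_secondaries)]
        self._rr = itertools.cycle(range(num_secondaries))  # round-robin reads

    def write(self, key, value):
        self.primary[key] = value      # all writes go to the primary

    def replicate(self):
        for s in self.secondaries:     # asynchronous in a real system
            s.update(self.primary)

    def read(self, key):
        return self.secondaries[next(self._rr)].get(key)

db = ReplicatedStore()
db.write("video:1", {"title": "demo"})
stale = db.read("video:1")   # None: replication has not caught up yet
db.replicate()
fresh = db.read("video:1")   # now returns the written metadata
```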
Caching
- Where to put the cache?
- CDN
- Between servers and the database
- Anywhere the same data is read repeatedly
- How much to cache?
- Cache 20% of daily read videos (following the 80-20 rule)
- How to scale
- Distribute cache across many servers using consistent hashing mechanism
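Under the 80-20 rule, an LRU eviction policy tends to keep the hot ~20% resident. A minimal cache sketch using `OrderedDict`:

```python
from collections import OrderedDict

class LRUCache:
    """Evicts the least-recently-used entry once capacity is exceeded."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None                    # miss: caller falls back to the DB
        self.data.move_to_end(key)         # mark as recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the coldest entry

cache = LRUCache(capacity=2)
cache.put("v1", "meta1")
cache.put("v2", "meta2")
cache.get("v1")            # touch v1, so v2 becomes the coldest entry
cache.put("v3", "meta3")   # evicts v2
```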
E.g., to further optimize latency, Netflix partners with ISPs:
- Caches are placed inside the ISP. When a user requests a video, if the ISP's cache has it, the video is streamed to the user from that cache.
- The benefit is that the video does not need to be streamed from the CDN through the ISP to the user; saving that hop's bandwidth gives users lower latency while watching.
- 90% of Netflix traffic is served out of the ISP caches without hitting the CDN or the database.
CDN
- Predict the locations where people will want to watch a video.
- Copy the video in advance to the CDNs closest to the predicted locations, so the videos are ready to serve users when requested.
- Copy the video at off-peak time, e.g., overnight, to reduce bandwidth pressure during distribution.
- We store the popular videos on the CDN:
- CDNs replicate content in multiple places, so videos sit closer to users with fewer network hops and stream more smoothly.
- CDN machines make heavy use of caching and can serve most videos out of memory.
- Unpopular videos (1-20 views per day) are served by querying the origin DB servers.