2. Design YouTube
Design a Video Platform, like YouTube, Bilibili, Netflix, TikTok.
Source: 花花酱
Interview Signals
Four aspects a system design interview evaluates:
- Working solution: can you propose a design that works; familiarity with common scenarios and design patterns.
- Analysis and communication: keep communicating with the interviewer; analyze storage and bandwidth.
- Tradeoffs (pros/cons): can you propose multiple solutions, evaluate their pros and cons, and make tradeoffs based on the requirements.
- Knowledge base: depth and breadth of knowledge.
Overview
- Step 1: Clarify the requirements
- Step 2: Capacity Estimation
- Step 3: System APIs
- Step 4: High-level System Design
- Step 5: Data Storage
- Step 6: Scalability
Step 1: Clarify the requirements
Clarify goals of the system
- Requirements
- Traffic size (e.g., daily active users)
Discuss the functionalities, and align with the interviewer on 2-3 components to focus on during the interview.
Type 1: Functional Requirement
- Upload
- View
- Share
- Like/Dislike
- Comment
- Search
- Recommend
- ...
Type 2: Non-Functional Requirement
Per the CAP theorem, a distributed system can only satisfy two of consistency, availability, and partition tolerance at the same time. Typical non-functional requirements:
- Consistency
- Every read receives the most recent write or an error
- Tradeoff with Availability: eventual consistency
- Availability
- Every request receives a (non-error) response, without the guarantee that it contains the most recent write
- Scalable
- Performance: low latency
- Partition tolerance (fault tolerance)
- The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes
Step 2: Capacity Estimation
Why Capacity Estimation?
- Evaluate candidate's analytical skills & system sense
- Helpful for identifying system bottlenecks in order to improve system scalability
- Infra resources need to be requested and provisioned ahead of time
Assumption
- 2 billion total users, 150 million daily active user (DAU)
- Each content creator
- 1% among all users are content creators
- Upload frequency: 1 video per week
- Each content consumer
- Avg watching time: 50 mins per day
- Each video
- Avg length: 10 mins
- Avg upload resolution: 1080p
- Bandwidth requirement for 1080p playback: 5Mbps
Storage Estimation
- New videos
- 2B * 1% * 1 video/week / (7 * 86400 s) ≈ 33 videos/s
- File size per minute
- When a user uploads a video, we transcode it into files at multiple resolutions to serve playback on different devices
- 1080p + 720p + 480p + 360p + ... = 100MB/min
- Daily write size
- 33 videos/s * 86400 s/day * 10 min/video * 100MB/min ≈ 2.9PB/day
- Replication
- redundancy: same-region replication x3
- availability: cross-region replication x3
Bandwidth Estimate
Daily upload bandwidth
- 33 videos/s * (10 * 60s/video) * 5Mbps = 99Gbps
Daily download/ongoing bandwidth
- Concurrent users
- 150M DAU * 50 mins / (24 * 60mins/day) = 5.2M users
- Bandwidth
- 5.2M * 5Mbps = 26Tbps
- Read / Write ratio
- 26Tbps / 99Gbps ≈ 263:1
Read heavy system
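The estimates above can be reproduced with a quick back-of-envelope script (all inputs are the Step 2 assumptions; rounding may differ slightly from the figures in the notes):

```python
# Back-of-envelope capacity estimation using the Step 2 assumptions.
total_users = 2_000_000_000
dau = 150_000_000
creator_ratio = 0.01            # 1% of users are content creators
uploads_per_week = 1            # videos per creator per week
avg_video_min = 10              # average video length
transcoded_mb_per_min = 100     # all output resolutions combined
playback_mbps = 5               # bandwidth for 1080p playback

# New videos per second
videos_per_s = total_users * creator_ratio * uploads_per_week / (7 * 86400)

# Daily write size in PB (1 PB = 1e9 MB here)
daily_write_pb = videos_per_s * 86400 * avg_video_min * transcoded_mb_per_min / 1e9

# Upload bandwidth in Gbps: each video carries 10 min of 5 Mbps content
upload_gbps = videos_per_s * avg_video_min * 60 * playback_mbps / 1e3

# Concurrent viewers (avg 50 min watched per day) and download bandwidth
concurrent = dau * 50 / (24 * 60)
download_tbps = concurrent * playback_mbps / 1e6

print(f"{videos_per_s:.0f} videos/s, {daily_write_pb:.1f} PB/day written")
print(f"{upload_gbps:.0f} Gbps up, {download_tbps:.0f} Tbps down "
      f"-> read:write ~ {download_tbps * 1000 / upload_gbps:.0f}:1")
```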
Step 3: System APIs
Public APIs
01 Upload Video
uploadVideo(api_dev_key, video_title, video_description, tags[], category_id, default_language, recording_details, video_contents)
- Notification: via email or in-app notification once encoding finishes. Because video processing usually takes a long time, the server handles it as an async task.
- api_dev_key (string): The API developer key of a registered account. This will be used to, among other things, throttle users based on their allocated quota.
- video_title (string): The title of the video.
- video_description (string): Optional description of the video.
- tags (string[]): Optional tags for the video.
- category_id (string): The category of the video, e.g., movie, song, person, etc.
- default_language (string): E.g., English, Mandarin, Hindi, etc.
- recording_details (string): The location where the video was recorded.
- video_contents (stream): The video to be uploaded.
02 Watch Video stream
streamVideo(api_dev_key, video_id, offset, codec, resolution)
- Input
- offset: stream the video from any offset, allowing users to resume watching
- codec & resolution: depend on the watch device
- Return
- stream: a media stream from the given offset
- api_dev_key (string): The API developer key of a registered account of our service.
- video_id (string): A string identifying the video.
- offset (number): We should be able to stream the video from any offset; the offset is the time in seconds from the start of the video. If we support play/pause across multiple devices, we need to store the offset on the server, which lets the user resume on any device from the same point where they stopped.
- codec (string) & resolution (string): The client should send codec and resolution info in the API call to support play/pause across multiple devices. Suppose you are watching a video in the Netflix app on a TV, pause it, and then resume in the Netflix app on your phone: the two devices have different resolutions and use different codecs, so both parameters are needed.
Videos are chopped into 2-second clips and stored separately. A typical short video of 5 mins = 150 clips; a medium video of 20 mins = 600 clips; a long video of 2 hours = 3600 clips.
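With fixed 2-second clips, turning a video duration into a clip count, or a playback offset into the clip to fetch, is simple integer arithmetic. A minimal sketch (function names are illustrative):

```python
CLIP_SECONDS = 2

def clip_count(duration_s: int) -> int:
    """Number of 2-second clips for a video of the given duration."""
    return (duration_s + CLIP_SECONDS - 1) // CLIP_SECONDS  # ceiling division

def clip_for_offset(offset_s: float) -> int:
    """Index of the clip containing a playback offset, for resuming."""
    return int(offset_s // CLIP_SECONDS)

assert clip_count(5 * 60) == 150       # short video
assert clip_count(20 * 60) == 600      # medium video
assert clip_count(2 * 3600) == 3600    # long video
assert clip_for_offset(61.5) == 30     # resume at 1:01.5 -> fetch clip 30
```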
03 Search a video
searchVideo(api_dev_key, search_query, user_id, user_embedding_id, user_location, maximum_videos_to_return, page_token)
- Input1: user_location_info
- Input2: user_embedding_info
- Input3: search_key_overwrite & synonyms
- api_dev_key (string): The API developer key of a registered account of our service.
- search_query (string): A string containing the search terms.
- user_id (string): The user's UUID.
- user_embedding_id (string): ID of the user's feature-vector data, used to retrieve the user embedding for subsequent ranking and sorting.
- user_location (string): Optional location of the user performing the search.
- maximum_videos_to_return (number): The maximum number of results to return in one request.
- page_token (string): This token specifies the page of the result set that should be returned.
Internal APIs
- encodeVideo(videoId, codec, resolution): transcode the video
- generateThumbnail(videoId): generate a thumbnail
- storeThumbnail(videoId, image): store the thumbnail
- getVideoInfo(videoId): get video info
- getVideoThumbnail(videoId): get the video thumbnail
- indexVideo(videoId): index the video
More details of video upload
- Client starts with HTTP requests to the server to establish various settings:
- a. Create a videoID
- b. Prepare a storage location (which storage address to upload to)
- c. ...
- Server responds with the storage location; the client then starts reading the video file and uploading it.
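The two-phase handshake above can be sketched as follows; the class shape, path scheme, and field names here are hypothetical, not an actual YouTube API:

```python
import uuid

class UploadService:
    """Sketch of the two-phase upload: settings first, then the bytes."""

    def __init__(self):
        self.sessions = {}  # video_id -> upload session state

    def init_upload(self, metadata: dict) -> dict:
        # a. Create a videoID; b. prepare a storage location for the file.
        video_id = str(uuid.uuid4())
        location = f"/media-storage/raw/{video_id}"  # hypothetical path scheme
        self.sessions[video_id] = {"meta": metadata, "location": location,
                                   "bytes": bytearray()}
        return {"video_id": video_id, "upload_url": location}

    def upload_chunk(self, video_id: str, chunk: bytes) -> None:
        # Client streams the file to the returned location, chunk by chunk.
        self.sessions[video_id]["bytes"].extend(chunk)

svc = UploadService()
resp = svc.init_upload({"video_title": "demo"})     # phase 1: settings
svc.upload_chunk(resp["video_id"], b"\x00" * 1024)  # phase 2: file bytes
```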
Step 4: High-level System Design
Scenario 1: Upload video
- Database(Metadata/User) stores video metadata, such as the title and description; the video files themselves are stored in Distributed Media Storage.
- Uploaded videos must be transcoded (encoded), which takes a long time, so it is handled as an async task through a Processing Queue: when a video is uploaded, the Upload Service publishes a task to the Processing Queue.
- Video Processing Service handles the processing: it pulls tasks from the Processing Queue and downloads the corresponding video from Distributed Media Storage. Processing includes transcoding, thumbnail extraction, and other steps. Once done, the video and thumbnails are written back to Distributed Media Storage, and their storage paths are updated in Database(Metadata/User).
- To achieve low latency, a common practice is to push videos to servers close to the users, i.e., the CDN. We therefore add a Video Distributing Service responsible for distributing videos and images to the CDN nodes. This is also an async task, since it is time-consuming, handled through a Completion Queue: when a video finishes processing, a task is added to this queue, and a downstream service reads the task and distributes the video.
Video Processing Service responsibilities
- Break the video down into chunks
- Transcoding (multiple codecs & resolutions)
- a. Decode
- b. Encode
- Generate thumbnails and previews
- (Advanced) video understanding with ML
The processing can be done in parallel across all chunks.
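Since chunks are independent, they can be transcoded in parallel and the results collected in order. A sketch using a thread pool, with `transcode` standing in for a real per-chunk encoder invocation (e.g., shelling out to ffmpeg):

```python
from concurrent.futures import ThreadPoolExecutor

def transcode(chunk_id: int, codec: str, resolution: str) -> str:
    # Stand-in for a real per-chunk encoder call (e.g., an ffmpeg invocation).
    return f"chunk-{chunk_id}.{resolution}.{codec}"

def process_video(num_chunks: int, codec: str = "h264",
                  resolution: str = "720p") -> list:
    # Each 2-second chunk is independent, so all chunks can be transcoded
    # in parallel; collecting futures in submit order keeps the output ordered.
    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = [pool.submit(transcode, i, codec, resolution)
                   for i in range(num_chunks)]
        return [f.result() for f in futures]

outputs = process_video(150)  # a 5-minute video = 150 chunks
```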
- Videos and images are static assets, well suited to CDN distribution.
- Not every video is kept on the CDN, since its capacity is limited. Popular videos are streamed to users from the CDN; unpopular ones are streamed from the origin data center.
Scenario 2: Watch video
- Video Playback Service is responsible for playing videos.
- Host Identify Service looks up where a video is served from: given the video, the user's IP address, and device info, it finds the CDN node closest to the user that stores this video. If one is found, its location is returned to the user, who can then watch the video; otherwise the video is streamed from the origin data center.
- Other data, such as the video's title and description, is read from Database(Metadata/User).
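The Host Identify lookup can be sketched as a nearest-first scan over CDN nodes with a fallback to the origin; the region names and index structure below are assumptions for illustration:

```python
ORIGIN = "origin-datacenter"

# cdn_index: region -> set of video_ids cached at that region's CDN node
cdn_index = {
    "us-west": {"v1", "v2"},
    "us-east": {"v1"},
}
# Preference order of CDN regions for each user region, nearest first.
nearness = {
    "us-west": ["us-west", "us-east"],
    "us-east": ["us-east", "us-west"],
}

def identify_host(video_id: str, user_region: str) -> str:
    """Return the closest CDN node that stores the video, else the origin."""
    for region in nearness.get(user_region, []):
        if video_id in cdn_index.get(region, set()):
            return f"cdn-{region}"
    return ORIGIN  # cold video: stream from the origin data center

assert identify_host("v2", "us-east") == "cdn-us-west"
assert identify_host("v9", "us-east") == ORIGIN
```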
Step 5: Data Storage
In SQL, char is a fixed-length data type, while varchar is variable-length.
- SQL database
- Usage: store relational database
- E.g., user table, video metadata
- NoSQL database
- Usage: store unstructured data
- E.g., store video thumbnails in big table
- In practice, the vast majority of companies store videos and images in a file system or blob storage
- File system / Blob storage
- Media file: image, audio, video
- E.g., distributed file systems: HDFS, GlusterFS
- E.g., Blob storage: Netflix used Amazon S3
Step 6: Scalability
- Identify potential bottlenecks
- Discuss solutions, focusing on tradeoffs
- Data sharding
- Data store, cache
- Load balancing
- E.g., user <-> application server; application server <-> cache server; application server <-> db
- Data caching
- Especially useful for read-heavy apps (reads >> writes)
Bottlenecks & Solutions
- Read-heavy system
- Distribute/replicate data across servers/regions
- Data sharding
- Data replication
- Latency-sensitive
- Caching
- CDN
Data Sharding
- Why?
- Horizontal scaling: distribute our database load to multiple servers making it scalable
- How?
- Sharding by video_id via consistent hashing
- Pros: uniform distribution (no hot-user problem)
- Cons:
- Need to query all shards in order to get a user's videos
- Hot videos can still put very high traffic on the shards holding them (e.g., a popular video everyone is watching)
- Solution: replicate the hot videos to more servers
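Sharding by video_id via consistent hashing can be sketched as a hash ring with virtual nodes (the vnode count and hash function are illustrative choices):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal consistent-hash ring; each shard gets virtual nodes so that
    video_ids spread uniformly and adding/removing a shard remaps few keys."""

    def __init__(self, shards, vnodes: int = 100):
        self.ring = sorted(
            (self._hash(f"{shard}#{i}"), shard)
            for shard in shards for i in range(vnodes)
        )
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def shard_for(self, video_id: str) -> str:
        # First ring position clockwise from the key's hash position.
        i = bisect.bisect(self.keys, self._hash(video_id)) % len(self.keys)
        return self.ring[i][1]

ring = ConsistentHashRing([f"shard-{n}" for n in range(4)])
placement = {vid: ring.shard_for(vid) for vid in ("v1", "v2", "v3")}
```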
- Sharding by user_id is the common alternative: it keeps each user's videos on one shard, but suffers from hot users and uneven distribution
Data Replication
- Why?
- Scale to handle heavy read traffic by making the same data available in multiple machines
- Improve availability
- How?
- Primary-secondary (master-slave) configuration
- All writes go to the primary DB and then propagate to all secondaries
- All reads go to the secondaries
- Data can be read at any time, without being blocked by writes
- Cons?
- Breaks consistency: reads may return stale data (not necessarily the latest write). Eventual consistency is acceptable here.
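A toy model of the primary-secondary flow, showing why reads never block on writes but can briefly be stale (replication is synchronous here only for demonstration; in a real system it is asynchronous):

```python
import itertools

class ReplicatedStore:
    """Writes go to the primary and propagate to secondaries; reads hit
    a secondary, so they never block on writes but may briefly be stale."""

    def __init__(self, num_secondaries: int = 2):
        self.primary = {}
        self.secondaries = [{} for _ in range(num_secondaries)]
        self._rr = itertools.cycle(range(num_secondaries))  # round-robin reads

    def write(self, key, value):
        self.primary[key] = value      # all writes go to the primary

    def replicate(self):
        for s in self.secondaries:     # asynchronous in a real system
            s.update(self.primary)

    def read(self, key):
        return self.secondaries[next(self._rr)].get(key)

db = ReplicatedStore()
db.write("video:1", {"title": "demo"})
stale = db.read("video:1")   # None: replication has not caught up yet
db.replicate()
fresh = db.read("video:1")   # now returns the written metadata
```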
Caching
- Where to put the cache?
- CDN
- Between servers and the database
- Anywhere the same data is read repeatedly
- How much to cache?
- Cache 20% of daily read videos (following the 80-20 rule)
- How to scale
- Distribute cache across many servers using consistent hashing mechanism
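Under the 80-20 rule, an LRU eviction policy tends to keep the hot ~20% resident. A minimal cache sketch using `OrderedDict`:

```python
from collections import OrderedDict

class LRUCache:
    """Evicts the least-recently-used entry once capacity is exceeded."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None                    # miss: caller falls back to the DB
        self.data.move_to_end(key)         # mark as recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the coldest entry

cache = LRUCache(capacity=2)
cache.put("v1", "meta1")
cache.put("v2", "meta2")
cache.get("v1")            # touch v1, so v2 becomes the coldest entry
cache.put("v3", "meta3")   # evicts v2
```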
E.g., to further optimize latency, Netflix partners with ISPs:
- Caches are placed inside the ISP. When a user requests a video, if the ISP's cache has it, the video is streamed to the user from that cache.
- The benefit is that the video does not need to be streamed from the CDN through the ISP to the user; saving that hop's bandwidth gives users lower latency while watching.
- 90% of Netflix traffic is served out of the ISP caches without hitting the CDN or the database.
CDN
- Predict the locations where people will want to watch a video.
- Copy the video in advance to the CDNs closest to the predicted locations, so the videos are ready to serve users when requested.
- Copy the video at off-peak time, e.g., overnight, to reduce bandwidth pressure during distribution.
- We store the popular videos on the CDN:
- CDNs replicate content in multiple places, so videos sit closer to users with fewer network hops and stream more smoothly.
- CDN machines make heavy use of caching and can serve most videos out of memory.
- Unpopular videos (1-20 views per day) are served by querying the origin DB servers.