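Spring Cloud - Advertising System
Project Overview
This project implements the two core modules of an advertising system, the ad delivery system and the ad retrieval system, and tests their availability, as a way to learn the design ideas and implementation techniques behind advertising systems.
- Built on the Spring Cloud framework with the Eureka, Zuul, Ribbon, Feign, and Hystrix components. Eureka handles service registration and service discovery; both Zuul and Feign rely on the service information stored in Eureka. Zuul is the gateway and the entry point of the whole project. Ribbon and Feign are used to call other microservices; this is essentially the same as calling them with RestTemplate, except that the framework wraps the calls to make them easier to use. Hystrix provides circuit breaking and degradation, so that a fallback can be served when an interface fails.
- Builds the indexes inside the JVM and uses an inverted index to speed up retrieval.
- Uses the MySQL Binlog as the source of incremental index updates, by listening to and parsing the Binlog.
- Uses Kafka to optimize the incremental data update process and reduce the load on MySQL.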
Ad Delivery system and Ad Retrieval system Microservices built with Java/Spring Cloud.
- Developed Ad Delivery system to create ads by user/features/creative information, and Ad Retrieval system to retrieve ads based on keyword/feature/geographic/ad information.
- Used standalone mode of Eureka Server for Microservices registration and used Zuul Server as an API Gateway.
- Increased the speed of the Ad retrieval process by constructing indexes in the JVM, and using inverted indexes.
- Designed an incremental index update module that listens to and parses the MySQL Binlog.
- Optimized the incremental data update process with Kafka, reducing the pressure on MySQL.
- Utilized: Java, Spring Cloud (Eureka, Zuul, Ribbon, Feign, Hystrix), MySQL, Kafka, Maven, Git
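The incremental update path named in the bullets above (change events parsed from the Binlog, buffered, then applied to the in-JVM index) can be sketched roughly as follows. This is a minimal sketch under assumptions, not the project's code: `BinlogEvent`, `OpType`, and `IncrementalIndexUpdater` are hypothetical names, and a `BlockingQueue` stands in for the Kafka topic.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical change event, as a Binlog listener might emit it after parsing a row change.
final class BinlogEvent {
    enum OpType { INSERT, UPDATE, DELETE }

    final OpType op;
    final long adId;
    final String payload; // simplified row content; null for DELETE

    BinlogEvent(OpType op, long adId, String payload) {
        this.op = op;
        this.adId = adId;
        this.payload = payload;
    }
}

// Applies queued change events to an in-memory index (a stand-in for the JVM index).
final class IncrementalIndexUpdater {
    // Stand-in for a Kafka topic: the listener produces, the updater consumes.
    private final BlockingQueue<BinlogEvent> queue = new LinkedBlockingQueue<>();
    private final Map<Long, String> index = new ConcurrentHashMap<>();

    void publish(BinlogEvent event) {
        queue.add(event);
    }

    // Drains everything currently queued and returns the number of events applied.
    int drain() {
        int applied = 0;
        BinlogEvent event;
        while ((event = queue.poll()) != null) {
            switch (event.op) {
                case INSERT:
                case UPDATE:
                    index.put(event.adId, event.payload);
                    break;
                case DELETE:
                    index.remove(event.adId);
                    break;
            }
            applied++;
        }
        return applied;
    }

    String get(long adId) {
        return index.get(adId);
    }
}
```

Because the retrieval side only consumes from the queue, it never has to query MySQL to learn about a change, which is the load reduction the last bullets describe.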
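Environment Setup
Eureka
Configure a multi-node Eureka Server deployment
Edit the local hosts file so that the server hostnames all resolve to the same IP address.
# Open the hosts file
sudo vim /etc/hosts
# Insert the following entries
127.0.0.1 server1
127.0.0.1 server2
127.0.0.1 server3
# Save and exit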
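Package the Spring Boot application
# Note: be sure to build with JDK 1.8; do not use JDK 14 to package a program targeting JDK 8
cd my-imooc-ad-sping-cloud
# Package and skip the tests. -U forces Maven to check remote repositories for updated dependencies
mvn clean package -Dmaven.test.skip=true -U
# Change to the ad-eureka build output directory
cd ad-eureka/target
# Start the first Eureka server (server1)
java -jar ad-eureka-1.0-SNAPSHOT.jar --spring.profiles.active=server1
# If it starts successfully, open additional terminal windows and start the remaining servers
java -jar ad-eureka-1.0-SNAPSHOT.jar --spring.profiles.active=server2
java -jar ad-eureka-1.0-SNAPSHOT.jar --spring.profiles.active=server3
Multi-node service started successfully
Three instances are visible, which provides high availability; the same setup can be deployed across multiple machines.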
Kafka
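Download from the official website
Extract the archive
tar -zxvf kafka_2.12-3.1.0.tgz
Install via Homebrew (the method I used)
# If the installation fails, run an update first
brew update
brew install kafka
Edit the configuration file
# My Kafka configuration directory
cd /usr/local/etc/kafka
vim server.properties
# Insert the following
# Set broker.id
broker.id = 1
# Set log.dirs, the log directory
log.dirs = /path/to/your/log/directory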
Startup commands (Homebrew installation)
# Start the ZooKeeper service first
brew services start zookeeper
# Start the Kafka service
brew services start kafka
# Or start both manually
zookeeper-server-start /usr/local/etc/kafka/zookeeper.properties
kafka-server-start /usr/local/etc/kafka/server.properties
# Create a topic
# Older syntax (before Kafka 2.2)
# kafka-topics --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic imooc_ad_test
# Create a topic with the newer syntax
# Newer versions of kafka-topics connect through --bootstrap-server instead of --zookeeper
kafka-topics --create --topic test-topic --bootstrap-server localhost:9092 --replication-factor 1 --partitions 4
# Output: Created topic test-topic.
# List the existing topics
kafka-topics --list --bootstrap-server localhost:9092
# Start a producer
kafka-console-producer --broker-list localhost:9092 --topic test-topic
# Start a consumer
kafka-console-consumer --bootstrap-server localhost:9092 --topic test-topic --from-beginning
# Show details for a topic
kafka-topics --describe --bootstrap-server localhost:9092 --topic test-topic
# Example output:
Topic: test-topic TopicId: 9ihIygWuS8C8nY4gGwTNwg PartitionCount: 4 ReplicationFactor: 1 Configs: segment.bytes=1073741824
Topic: test-topic Partition: 0 Leader: 1 Replicas: 1 Isr: 1
Topic: test-topic Partition: 1 Leader: 1 Replicas: 1 Isr: 1
Topic: test-topic Partition: 2 Leader: 1 Replicas: 1 Isr: 1
Topic: test-topic Partition: 3 Leader: 1 Replicas: 1 Isr: 1
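Producer and consumer workflow
Startup commands (manual installation; run from the Kafka root directory)
# Start ZooKeeper. The Kafka distribution bundles ZooKeeper, which can run as a single node
bin/zookeeper-server-start.sh -daemon config/zookeeper.properties
# Start the Kafka server
bin/kafka-server-start.sh config/server.properties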
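# Create a topic (test). Before Kafka 2.2, use --zookeeper localhost:2181 in place of --bootstrap-server localhost:9092
bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic test
# List topics
bin/kafka-topics.sh --list --bootstrap-server localhost:9092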
# Start a producer
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
# Start a consumer
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
# Show details for a topic (test)
bin/kafka-topics.sh --describe --bootstrap-server localhost:9092 --topic test
Concept Review
Terminology
- Ad: a single ad (that is, an ad plan)
- Ad Group (Google), Ad Set (Facebook): a group of ads
- Ad delivery
- Ad retrieval
- ad inventory: The ad slots being offered to buyers.
- ad slot: The space on a web or mobile page where the ad is displayed.
- ad tag: A small piece of code that includes parameters describing the ad slot.
- ad server: Technology used by ad serving platforms to deliver creatives to ad slots on a publisher's properties. Ad servers usually include features such as creative selection, counts, and serving.
- advertiser: Organizations that want to promote a product through different media either directly or through other buyers.
- audience: The (unique) users who visit or use a publisher's property.
- audience segment: A selection, based on a subset of the taxonomy, that results in a set of (unique) users whom advertisers can target.
- buyer: Purchases ad slots to place creatives. Buyers can be networks, agencies, or advertisers.
- conversion: Predefined action by an advertiser that a user might take on an advertiser's property.
- CPA: Cost per action. What a buyer pays per action. Actions or conversions can have different goals, such as acquiring as many users as possible, retaining high-valued key customers, or getting targeted users to buy something on their website. An action might be downloading a whitepaper, signing up for a newsletter, or buying something on the advertiser's website.
- CPC: Cost per click. What a buyer pays per ad click.
- CPM: Cost per mille. What a buyer pays per thousand impressions.
- creative: Advertisement presented to the targeted user.
- CTR: Click-through rate. Number of clicks divided by number of impressions.
- CVR: Conversion rate. Number of conversions divided by number of impressions; many platforms instead divide by the number of clicks.
- DMP: Data management platforms provide additional user information to advertising technology (ad tech) players. These platforms might give access to a data dump, or sometimes they load the data to your platform, if you give them access to object storage such as Cloud Storage.
- impression: When an ad is fetched from its source, and is billable.
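The rate and cost terms in this list reduce to simple arithmetic. The helper below is a small sketch to make the definitions concrete; the class name `AdMetrics` and its method names are hypothetical, and CVR follows the impression-based definition given in the list.

```java
// Hypothetical helper that turns the glossary definitions into formulas.
final class AdMetrics {
    private AdMetrics() {}

    // CTR = clicks / impressions
    static double ctr(long clicks, long impressions) {
        return impressions == 0 ? 0.0 : (double) clicks / impressions;
    }

    // CVR as defined in the list above = conversions / impressions
    static double cvr(long conversions, long impressions) {
        return impressions == 0 ? 0.0 : (double) conversions / impressions;
    }

    // CPM is a price per thousand impressions
    static double cpmCost(double cpm, long impressions) {
        return cpm * impressions / 1000.0;
    }

    // CPC is a price per click
    static double cpcCost(double cpc, long clicks) {
        return cpc * clicks;
    }

    // CPA is a price per action (conversion)
    static double cpaCost(double cpa, long actions) {
        return cpa * actions;
    }
}
```

For example, 50 clicks on 1,000 impressions is a CTR of 0.05, and 10,000 impressions bought at a CPM of 2.5 cost 25.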
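Microservices
Point-to-point
Services call each other directly; as the system grows, this becomes hard to maintain.
API Gateway
The most widely used architecture. All business interfaces are exposed through the API Gateway, which is the single entry point for all client requests. Communication between microservices also goes through the API Gateway.
Zuul: the API Gateway component
Zuul provides service gateway features such as load balancing, reverse proxying, dynamic routing, and request forwarding. Most of Zuul's functionality is implemented through filters. Zuul defines four standard filter types and also supports custom filters (the course implements two custom filters that record access latency). The filter types correspond to the stages of a typical request lifecycle, as shown in the figure.
- Pre filters: run before the request is routed. Used for authentication, logging debug information, and so on.
- Routing filters: route the request to a microservice; used to build the request sent to that microservice.
- Post filters: add standard HTTP headers to the response.
- Error filters: run when an error occurs while handling the request.
- Custom filters: user-defined filters.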
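To illustrate how a pre filter and a post filter can cooperate to record access latency, here is a plain-Java simulation of the filter lifecycle. It deliberately does not use Zuul's API (whose base class is `ZuulFilter`); `SimpleFilter`, `FilterContext`, `FilterPipeline`, and the three filter classes are hypothetical names, and error filters are left out for brevity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical per-request context shared by all filters.
final class FilterContext {
    final Map<String, Object> attributes = new HashMap<>();
    final List<String> trace = new ArrayList<>();
}

// Hypothetical filter abstraction for this simulation only.
interface SimpleFilter {
    String type(); // "pre", "route", or "post"

    void run(FilterContext ctx);
}

// Pre filter: remembers when the request entered the gateway.
final class PreRequestFilter implements SimpleFilter {
    public String type() { return "pre"; }

    public void run(FilterContext ctx) {
        ctx.attributes.put("startTime", System.nanoTime());
        ctx.trace.add("pre");
    }
}

// Route filter: a real gateway would forward the request to a microservice here.
final class RouteFilter implements SimpleFilter {
    public String type() { return "route"; }

    public void run(FilterContext ctx) {
        ctx.trace.add("route");
    }
}

// Post filter: computes the elapsed time from the value stored by the pre filter.
final class AccessLogFilter implements SimpleFilter {
    public String type() { return "post"; }

    public void run(FilterContext ctx) {
        long start = (Long) ctx.attributes.get("startTime");
        ctx.attributes.put("elapsedNanos", System.nanoTime() - start);
        ctx.trace.add("post");
    }
}

// Runs filters phase by phase, regardless of the order in which they were registered.
final class FilterPipeline {
    private static final String[] PHASES = {"pre", "route", "post"};

    static FilterContext process(List<SimpleFilter> filters) {
        FilterContext ctx = new FilterContext();
        for (String phase : PHASES) {
            for (SimpleFilter filter : filters) {
                if (phase.equals(filter.type())) {
                    filter.run(ctx);
                }
            }
        }
        return ctx;
    }
}
```

The point of the design is that the two latency filters never reference each other; they communicate only through the shared request context.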
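ad-common module
Design rationale
- Shared code and configuration should not be scattered across the business modules, since that makes maintenance and updates harder.
- In a large system, response objects need a unified outer format.
- Business logic can throw many kinds of exceptions, so the collection of exception information should be unified as well.
A unified format that is easy for the front end to parse
- code = a unified encoding of HTTP status codes
- message = an error or informational message
- data = the uniformly wrapped data structure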
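A response envelope with these three fields could look roughly like the sketch below. The class names `CommonResponse`, `AdException`, and `ResponseAdvice`, as well as the code values 200 and 500, are assumptions made for illustration; the project's own classes and codes may differ.

```java
// Hypothetical unified response envelope; field names follow the list above.
final class CommonResponse<T> {
    private final int code;
    private final String message;
    private final T data;

    CommonResponse(int code, String message, T data) {
        this.code = code;
        this.message = message;
        this.data = data;
    }

    // Success envelope; 200 is an assumed value.
    static <T> CommonResponse<T> ok(T data) {
        return new CommonResponse<>(200, "", data);
    }

    // Error envelope carrying no data.
    static <T> CommonResponse<T> error(int code, String message) {
        return new CommonResponse<>(code, message, null);
    }

    int getCode() { return code; }

    String getMessage() { return message; }

    T getData() { return data; }
}

// Hypothetical business exception shared by all modules.
final class AdException extends Exception {
    AdException(String message) {
        super(message);
    }
}

// Hypothetical central handler: every business exception becomes the same envelope shape.
final class ResponseAdvice {
    static CommonResponse<String> handle(AdException ex) {
        // 500 is an assumed error code.
        return CommonResponse.error(500, "business error: " + ex.getMessage());
    }
}
```

With this shape the front end can always read `code` first and only then decide whether to use `data` or show `message`.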
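Review of Spring features
- The client sends a request to Spring MVC. All requests are dispatched centrally by the DispatcherServlet, which plays a role similar to a gateway.
- HandlerMapping locates the specific Controller. This happens in two steps: Handler and Mapping.
- The request is handed to the Controller, which calls the specific business Service.
- The Controller returns a ModelAndView to the DispatcherServlet.
- The DispatcherServlet consults the ViewResolver, renders the Model data into the View, and returns the HTTP response.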
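The dispatch steps above can be mimicked in a few lines of plain Java. This is only a toy model of the flow, not Spring MVC code: `MiniDispatcher` is a hypothetical class, a map plays the part of the handler mapping, a `Function` plays the controller, and `render` stands in for view resolution.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Toy front controller: one entry point that maps a path to a handler and renders the result.
final class MiniDispatcher {
    // path -> handler (stand-in for the handler mapping plus controller)
    private final Map<String, Function<String, String>> handlers = new HashMap<>();

    void register(String path, Function<String, String> handler) {
        handlers.put(path, handler);
    }

    String dispatch(String path, String requestBody) {
        Function<String, String> handler = handlers.get(path); // locate the controller
        if (handler == null) {
            return "404 Not Found";
        }
        String model = handler.apply(requestBody);             // controller calls the service
        return render(model);                                  // resolve and render the view
    }

    // Stand-in for view resolution and rendering.
    private String render(String model) {
        return "<view>" + model + "</view>";
    }
}
```

Registering a handler for a path and then calling `dispatch` with that path walks through the same locate, invoke, render sequence.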
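Implementing microservice calls
Calling via Ribbon
Ribbon is a client-side load balancer that gives good control over the behavior of HTTP and TCP clients.
- In SearchApplication.java, the RestTemplate bean is registered and annotated with @LoadBalanced to enable load balancing.
- In SearchController.java, service interfaces are called through RestTemplate. Unlike ordinary RestTemplate usage, the call uses the service name rather than ip + port. This works through the registry (Eureka Server).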
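The idea of calling by service name can be sketched as a small client-side balancer: the client holds a list of instances per service name and picks one for each call. The class `RoundRobinBalancer` and the service name and addresses in the usage note are hypothetical, and round-robin is used here only as one simple selection rule; it is not a statement about which rule Ribbon applies in this project.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical client-side balancer: service name -> instances, chosen in rotation.
final class RoundRobinBalancer {
    // Stand-in for the instance lists a client would fetch from the registry.
    private final Map<String, List<String>> registry = new ConcurrentHashMap<>();
    private final Map<String, AtomicInteger> counters = new ConcurrentHashMap<>();

    void register(String serviceName, String... instances) {
        registry.put(serviceName, Arrays.asList(instances));
        counters.put(serviceName, new AtomicInteger());
    }

    // Picks the next instance ("host:port") for the given service name.
    String choose(String serviceName) {
        List<String> instances = registry.get(serviceName);
        if (instances == null || instances.isEmpty()) {
            throw new IllegalStateException("no instances for " + serviceName);
        }
        int next = counters.get(serviceName).getAndIncrement();
        return instances.get(Math.floorMod(next, instances.size()));
    }

    // Turns a logical call (service name + path) into a concrete URL.
    String resolve(String serviceName, String path) {
        return "http://" + choose(serviceName) + path;
    }
}
```

After `register("ad-sponsor", "127.0.0.1:7000", "127.0.0.1:7001")`, successive `resolve("ad-sponsor", "/plans")` calls alternate between the two addresses, so the caller only ever needs to know the service name.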
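Calling via Feign
Feign provides a declarative web service client.
- Use @FeignClient to specify the name of the service to call.
- Declare @RequestMapping on the interface to specify the address and request type of the target service.
- Configure fallback in @FeignClient to specify the fallback used for circuit breaking.
- Interface: SponsorClient.java; fallback: SponsorClientHystrix.java.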
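The interface-plus-fallback pairing can be shown without any framework. The sketch below is plain Java rather than Feign or Hystrix code: `PlanClient`, `PlanClientFallback`, and `FallbackGuard` are hypothetical names that mirror the roles of the client interface and its fallback class, and only the fallback path is modelled (timeouts and circuit state are not).

```java
import java.util.Collections;
import java.util.List;

// Hypothetical client contract, in the role the declarative client interface plays.
interface PlanClient {
    List<String> getAdPlans(long userId);
}

// Hypothetical fallback: returns a safe default instead of propagating the failure.
final class PlanClientFallback implements PlanClient {
    public List<String> getAdPlans(long userId) {
        return Collections.emptyList();
    }
}

// Wraps a primary client and degrades to the fallback when the call fails.
final class FallbackGuard implements PlanClient {
    private final PlanClient primary;
    private final PlanClient fallback;

    FallbackGuard(PlanClient primary, PlanClient fallback) {
        this.primary = primary;
        this.fallback = fallback;
    }

    public List<String> getAdPlans(long userId) {
        try {
            return primary.getAdPlans(userId);
        } catch (RuntimeException e) {
            // The remote call failed; serve the degraded result.
            return fallback.getAdPlans(userId);
        }
    }
}
```

Callers depend only on `PlanClient`, so they receive an empty list rather than an exception when the remote service is unavailable.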
Ad data index design
Forward index
Inverted index
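As a rough illustration of the two structures, the sketch below keeps a forward index (ad unit id to keywords) and an inverted index (keyword to ad unit ids) side by side. The class `AdIndex`, its methods, and the choice of keywords as the indexed attribute are assumptions for the example, not the project's actual index classes.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory index holding both directions of the same relation.
final class AdIndex {
    // Forward index: ad unit id -> keywords
    private final Map<Long, Set<String>> forward = new HashMap<>();
    // Inverted index: keyword -> ad unit ids
    private final Map<String, Set<Long>> inverted = new HashMap<>();

    void add(long unitId, String... keywords) {
        Set<String> unitKeywords = forward.computeIfAbsent(unitId, k -> new HashSet<>());
        for (String keyword : keywords) {
            unitKeywords.add(keyword);
            inverted.computeIfAbsent(keyword, k -> new TreeSet<>()).add(unitId);
        }
    }

    // The forward index tells us which inverted entries to clean up.
    void remove(long unitId) {
        Set<String> unitKeywords = forward.remove(unitId);
        if (unitKeywords == null) {
            return;
        }
        for (String keyword : unitKeywords) {
            Set<Long> ids = inverted.get(keyword);
            if (ids != null) {
                ids.remove(unitId);
                if (ids.isEmpty()) {
                    inverted.remove(keyword);
                }
            }
        }
    }

    // Returns the ad units that match ALL given keywords (intersection of the id sets).
    Set<Long> search(String... keywords) {
        Set<Long> result = null;
        for (String keyword : keywords) {
            Set<Long> ids = inverted.getOrDefault(keyword, Collections.<Long>emptySet());
            if (result == null) {
                result = new TreeSet<>(ids);
            } else {
                result.retainAll(ids);
            }
        }
        return result == null ? Collections.<Long>emptySet() : result;
    }
}
```

Retrieval goes through the inverted index and never scans every ad unit, which is where the speed-up comes from; the forward index mainly serves updates and deletions.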