SpringCloud - Advertising System

Project Overview

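This project implements the two core modules of an advertising system, the ad delivery system and the ad retrieval system, and tests that they work. The goal is to learn the design ideas behind an ad system and how to implement them.

Built on the Spring Cloud framework, using the Eureka, Zuul, Ribbon, Feign, and Hystrix components. Eureka handles service registration and discovery; both Zuul and Feign rely on the service information stored in Eureka. Zuul is the gateway and the entry point of the whole project. Ribbon and Feign are used to call other microservices; in essence this is no different from calling them with RestTemplate, the framework simply wraps it to be easier to use. Hystrix provides circuit breaking and degradation: when an interface fails, a fallback can take over.
Indexes are built inside the JVM, and inverted indexes speed up retrieval.
Incremental index updates are driven by the MySQL Binlog: the service listens to and parses Binlog events.
Kafka is used to optimize how incremental data updates are distributed, reducing the load on MySQL.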

Ad Delivery system and Ad Retrieval system Microservices built with Java/Spring Cloud.

Developed Ad Delivery system to create ads by user/features/creative information, and Ad Retrieval system to retrieve ads based on keyword/feature/geographic/ad information.

Used standalone mode of Eureka Server for Microservices registration and used Zuul Server as an API Gateway.

Increased the speed of the Ad retrieval process by constructing indexes in the JVM, and using inverted indexes.

Designed an update module for incremental indexes that listens to and parses the MySQL Binlog.

Optimized the incremental data update process with Kafka, reducing the load on MySQL.

Utilized: Java, Spring Cloud (Eureka, Zuul, Ribbon, Feign, Hystrix), MySQL, Kafka, Maven, Git

Environment Setup

Eureka

Configure a multi-node Eureka Server deployment

Edit the local hosts file so that several server names resolve to the same IP address.

# Open the hosts file
sudo vim /etc/hosts

# insert
127.0.0.1 server1
127.0.0.1 server2
127.0.0.1 server3

# Save and exit

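Package the Spring Boot application

# Note: build with JDK 1.8. Do not use JDK 14 to package a JDK 8 program.
cd my-imooc-ad-sping-cloud

# Package and skip tests. -U forces Maven to check remote repositories for updated snapshots and releases.
mvn clean package -Dmaven.test.skip=true -U

# Go to the ad-eureka build output directory
cd ad-eureka/target

# Start the first Eureka server, server1
java -jar ad-eureka-1.0-SNAPSHOT.jar --spring.profiles.active=server1

# If it starts successfully, open other terminal windows and start the remaining servers
java -jar ad-eureka-1.0-SNAPSHOT.jar --spring.profiles.active=server2
java -jar ad-eureka-1.0-SNAPSHOT.jar --spring.profiles.active=server3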

Multi-node startup succeeded

Three instances are visible. This gives high availability, and the servers can be deployed on separate machines.

Kafka

Download from the official site

Official site

Extract the archive

tar -zxvf kafka_2.12-3.1.0.tgz

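Install with brew (what I did)

# If the install fails, run update first
brew update

brew install kafka

Edit the configuration file

# My kafka directory
cd /usr/local/etc/kafka

vim server.properties

# insert
# Set broker.id
broker.id = 1
# Set log.dirs, the log directory
log.dirs = <directory of your choice>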

Start commands (brew install)

# Start the zookeeper service first
brew services start zookeeper
# Start the kafka service
brew services start kafka

# Or start them manually
zookeeper-server-start /usr/local/etc/kafka/zookeeper.properties
kafka-server-start /usr/local/etc/kafka/server.properties

# Create a topic
# Old syntax (before Kafka 2.2)
# kafka-topics --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic  imooc_ad_test

# Create a topic, new syntax
# Newer versions of the CLI connect through --bootstrap-server instead of --zookeeper
kafka-topics --create --topic test-topic --bootstrap-server localhost:9092 --replication-factor 1 --partitions 4
# Output: Created topic test-topic.

# List existing topics
kafka-topics --list --bootstrap-server localhost:9092

# Start a producer
kafka-console-producer --broker-list localhost:9092 --topic test-topic

# Start a consumer
kafka-console-consumer --bootstrap-server localhost:9092 --topic test-topic --from-beginning

# Show topic details
kafka-topics --describe --bootstrap-server localhost:9092 --topic test-topic

# Sample topic details output
Topic: test-topic	TopicId: 9ihIygWuS8C8nY4gGwTNwg	PartitionCount: 4	ReplicationFactor: 1	Configs: segment.bytes=1073741824
	Topic: test-topic	Partition: 0	Leader: 1	Replicas: 1	Isr: 1
	Topic: test-topic	Partition: 1	Leader: 1	Replicas: 1	Isr: 1
	Topic: test-topic	Partition: 2	Leader: 1	Replicas: 1	Isr: 1
	Topic: test-topic	Partition: 3	Leader: 1	Replicas: 1	Isr: 1

Producer and Consumer workflow

Start commands (manual install; run from the Kafka root directory)

# Start ZooKeeper. The Kafka distribution bundles ZooKeeper, which can run as a single node
bin/zookeeper-server-start.sh -daemon config/zookeeper.properties

# Start the Kafka server
bin/kafka-server-start.sh config/server.properties

# Create a topic (test)
# Kafka 3.0 removed the --zookeeper option of kafka-topics.sh; on versions before 2.2 use --zookeeper localhost:2181 instead
bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic test

# List topics
bin/kafka-topics.sh --list --bootstrap-server localhost:9092

# Start a producer
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test

# Start a consumer
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning

# Show topic details (test)
bin/kafka-topics.sh --describe --bootstrap-server localhost:9092 --topic test

Knowledge Review

Terminology

Ad: a single ad (i.e. an ad plan)
Ad group: Ad Group (Google), Ad Set (Facebook)
Ad delivery
Ad retrieval
ad inventory: The ad slots being offered to buyers.
ad slot: The space on a web or mobile page where the ad is displayed.
ad tag: A small piece of code that includes parameters describing the ad slot.
ad server: Technology used by ad serving platforms to deliver creatives to ad slots on a publisher's properties. Ad servers usually include features such as creative selection, counts, and serving.
advertiser: Organizations that want to promote a product through different media either directly or through other buyers.
audience: The (unique) users who visit or use a publisher's property.
audience segment: A selection, based on a subset of the taxonomy, that results in a set of (unique) users whom advertisers can target.
buyer: Purchases ad slots to place creatives. Buyers can be networks, agencies, or advertisers.
conversion: Predefined action by an advertiser that a user might take on an advertiser's property.
CPA: Cost per action. What a buyer pays per action. Actions or conversions can have different goals, such as acquiring as many users as possible, retaining high-valued key customers, or getting targeted users to buy something on their website. An action might be downloading a whitepaper, signing up for a newsletter, or buying something on the advertiser's website.
CPC: Cost per click. What a buyer pays per ad click.
CPM: Cost per mille. What a buyer pays per thousand impressions.
creative: Advertisement presented to the targeted user.
CTR: Click-through rate. Number of clicks divided by number of impressions.
CVR: Conversion rate. Number of conversions divided by number of impressions.
DMP: Data management platforms provide additional user information to advertising technology (ad tech) players. These platforms might give access to a data dump, or sometimes they load the data to your platform, if you give them access to object storage such as Cloud Storage.
impression: When an ad is fetched from its source, and is billable.

MicroService

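Point-to-point

Services call each other directly. As the system grows, this becomes hard to maintain.

API Gateway

The most widely used architecture. All business interfaces are exposed through the API Gateway, which is the single entry point for every client. Communication between microservices also goes through the API Gateway.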

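Zuul: the API Gateway component

Zuul provides the service gateway and supports load balancing, reverse proxying, dynamic routing, and request forwarding. Most of Zuul's functionality is implemented through filters. Zuul defines four standard filter types and also supports custom filters (the course implements two custom filters that record access latency). The filter types correspond to the typical lifecycle of a request, as shown in the figure.

Pre filters: run before the Request is routed. Used for authentication, recording debug information, and so on.
Routing filters: route the Request to a microservice and build the request sent to it.
Post filters: add standard HTTP headers to the Response.
Error filters: run when an error occurs while handling the Request.
Custom filters: user-defined filters.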
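The filter lifecycle above can be illustrated with a small plain-Java sketch. It does not use Zuul's API: GatewayContextSketch, GatewayFilterSketch, FilterChainSketch, and the X-Latency-Nanos header are all hypothetical names chosen for illustration. The sketch only shows the phase ordering and how a pre filter and a post filter can cooperate to record latency, similar in spirit to the two custom filters mentioned above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical per-request state shared by every filter in the chain.
class GatewayContextSketch {
    final Map<String, Object> attributes = new HashMap<>();
    final Map<String, String> responseHeaders = new HashMap<>();
    String responseBody;
}

// Hypothetical filter contract: a phase name plus the work done in that phase.
interface GatewayFilterSketch {
    String type(); // "pre", "route", "post" or "error"
    void run(GatewayContextSketch ctx);
}

class FilterChainSketch {
    private final List<GatewayFilterSketch> filters = new ArrayList<>();

    FilterChainSketch add(GatewayFilterSketch filter) {
        filters.add(filter);
        return this;
    }

    // Runs every registered filter of the given phase, in registration order.
    private void runPhase(String type, GatewayContextSketch ctx) {
        for (GatewayFilterSketch filter : filters) {
            if (filter.type().equals(type)) {
                filter.run(ctx);
            }
        }
    }

    // pre, then route, then post; a failure in pre or route runs the error phase first.
    GatewayContextSketch handle() {
        GatewayContextSketch ctx = new GatewayContextSketch();
        try {
            runPhase("pre", ctx);
            runPhase("route", ctx);
        } catch (RuntimeException e) {
            ctx.attributes.put("error", e);
            runPhase("error", ctx);
        }
        runPhase("post", ctx);
        return ctx;
    }
}

class LatencyFiltersDemo {
    // Small helper that turns a phase name and a lambda into a filter.
    static GatewayFilterSketch of(final String type, final Consumer<GatewayContextSketch> body) {
        return new GatewayFilterSketch() {
            public String type() { return type; }
            public void run(GatewayContextSketch ctx) { body.accept(ctx); }
        };
    }

    static FilterChainSketch build() {
        return new FilterChainSketch()
            // Pre filter: remember when the request entered the gateway.
            .add(of("pre", ctx -> ctx.attributes.put("startNanos", System.nanoTime())))
            // Routing filter: stands in for forwarding the request to a microservice.
            .add(of("route", ctx -> ctx.responseBody = "ok"))
            // Post filter: compute the latency and expose it as a response header.
            .add(of("post", ctx -> {
                Object start = ctx.attributes.get("startNanos");
                if (start instanceof Long) {
                    long elapsed = System.nanoTime() - (Long) start;
                    ctx.responseHeaders.put("X-Latency-Nanos", String.valueOf(elapsed));
                }
            }));
    }

    public static void main(String[] args) {
        GatewayContextSketch ctx = build().handle();
        System.out.println(ctx.responseBody + " " + ctx.responseHeaders);
    }
}
```

In a real Zuul setup the framework owns the chain and the request context; only the individual filters would be written by hand.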

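ad-common module

Design rationale

Shared code and configuration should not be scattered across business modules, since that makes maintenance and updates harder.
In a large system, response objects need a unified outer format.
Business logic can throw many kinds of exceptions, so exception information should also be collected in a unified way.

A unified format is easy for the frontend to parse:

code = unified HTTP-style status code
message = error or informational message
data = uniformly wrapped payload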
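A minimal sketch of the unified response format and unified exception handling described above, in plain Java without Spring. The class names CommonResponseSketch, AdExceptionSketch, and ResponseWrapperSketch, and the code values 0, -1, and -2, are assumptions made for this example and are not taken from the project.

```java
import java.util.concurrent.Callable;

// Unified outer format for every response: code, message, data.
class CommonResponseSketch<T> {
    private final int code;
    private final String message;
    private final T data;

    CommonResponseSketch(int code, String message, T data) {
        this.code = code;
        this.message = message;
        this.data = data;
    }

    int getCode() { return code; }
    String getMessage() { return message; }
    T getData() { return data; }

    @Override
    public String toString() {
        return "{code=" + code + ", message=" + message + ", data=" + data + "}";
    }
}

// Hypothetical business exception thrown by service code.
class AdExceptionSketch extends Exception {
    AdExceptionSketch(String message) {
        super(message);
    }
}

class ResponseWrapperSketch {
    // Runs the business action and always returns the same outer shape,
    // whether the action succeeds or throws.
    static <T> CommonResponseSketch<T> wrap(Callable<T> action) {
        try {
            return new CommonResponseSketch<>(0, "", action.call());
        } catch (AdExceptionSketch e) {
            return new CommonResponseSketch<>(-1, "business error: " + e.getMessage(), null);
        } catch (Exception e) {
            return new CommonResponseSketch<>(-2, "internal error", null);
        }
    }

    public static void main(String[] args) {
        CommonResponseSketch<String> ok = wrap(() -> "plan-10");
        CommonResponseSketch<String> failed = wrap(() -> {
            throw new AdExceptionSketch("invalid request");
        });
        System.out.println(ok);
        System.out.println(failed);
    }
}
```

In a Spring application the same idea is usually applied once, centrally, so that individual controllers do not repeat the wrapping logic.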

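Reviewing Spring features

The client sends a Request to Spring MVC. Every Request is dispatched centrally by the DispatcherServlet, which plays a role similar to a gateway.
HandlerMapping locates the concrete Controller. This takes two steps: Handler and Mapping.
The Request is handed to the Controller, which calls the business Service.
The Controller returns a ModelAndView to the DispatcherServlet.
The DispatcherServlet consults the ViewResolver, renders the Model data into a View, and returns the HTTP response.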

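Calling other microservices

Calling via Ribbon

Ribbon is a client-side load balancer that gives fine-grained control over the behavior of HTTP and TCP clients.

SearchApplication.java registers the RestTemplate bean and marks it with @LoadBalanced to enable load balancing.
SearchController.java calls the service interface through RestTemplate. Unlike ordinary RestTemplate usage, the call targets a service name rather than ip + port. The name is resolved through the registry (Eureka Server).

Calling via Feign

Feign provides a declarative web service client.

@FeignClient specifies the name of the service to call.
@RequestMapping on the interface declares the path and request type of the target endpoint.
The fallback attribute of @FeignClient specifies the fallback used when the circuit breaks.
Interface: SponsorClient.java; fallback: SponsorClientHystrix.java.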
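To make the "service name instead of ip + port" idea concrete, here is a plain-Java sketch of client-side load balancing. It is not Ribbon's implementation: ServiceRegistrySketch stands in for the registry, RoundRobinBalancerSketch for the balancer, and the service name ad-sponsor-service and the addresses are made up for the example. Round-robin is used here only because it is the simplest strategy to show.

```java
import java.net.URI;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical in-memory stand-in for the registry (the role Eureka Server plays).
class ServiceRegistrySketch {
    private final Map<String, List<String>> instances = new ConcurrentHashMap<>();

    void register(String serviceName, String hostAndPort) {
        instances.computeIfAbsent(serviceName, k -> new CopyOnWriteArrayList<>()).add(hostAndPort);
    }

    List<String> lookup(String serviceName) {
        return instances.getOrDefault(serviceName, Collections.emptyList());
    }
}

// Picks one registered instance per call, rotating through them in order.
class RoundRobinBalancerSketch {
    private final ServiceRegistrySketch registry;
    private final Map<String, AtomicInteger> counters = new ConcurrentHashMap<>();

    RoundRobinBalancerSketch(ServiceRegistrySketch registry) {
        this.registry = registry;
    }

    // Rewrites http://<service-name>/path into http://<host:port>/path.
    String resolve(String url) {
        URI uri = URI.create(url);
        String serviceName = uri.getHost();
        List<String> candidates = registry.lookup(serviceName);
        if (candidates.isEmpty()) {
            throw new IllegalStateException("no instance registered for " + serviceName);
        }
        int turn = counters.computeIfAbsent(serviceName, k -> new AtomicInteger()).getAndIncrement();
        String instance = candidates.get(Math.floorMod(turn, candidates.size()));
        return uri.getScheme() + "://" + instance + uri.getRawPath();
    }
}

class RoundRobinDemo {
    public static void main(String[] args) {
        ServiceRegistrySketch registry = new ServiceRegistrySketch();
        registry.register("ad-sponsor-service", "10.0.0.1:7000");
        registry.register("ad-sponsor-service", "10.0.0.2:7000");

        RoundRobinBalancerSketch balancer = new RoundRobinBalancerSketch(registry);
        for (int i = 0; i < 3; i++) {
            System.out.println(balancer.resolve("http://ad-sponsor-service/plans"));
        }
    }
}
```

With @LoadBalanced, this rewriting happens inside the RestTemplate call, so the controller code only ever sees the service name.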
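The fallback idea can be sketched in plain Java as a decorator around the client interface. This is not Feign or Hystrix code: SponsorClientSketch, its getAdPlans method, and the fallback payload are hypothetical, and the sketch only shows the fallback path, not circuit-breaker state such as opening the circuit after repeated failures.

```java
// Hypothetical client interface; in the project this role is played by SponsorClient.java.
interface SponsorClientSketch {
    String getAdPlans(long userId);
}

// Fallback used when the remote call fails; the role of SponsorClientHystrix.java.
class SponsorClientFallbackSketch implements SponsorClientSketch {
    public String getAdPlans(long userId) {
        return "{code=-1, message=ad-sponsor unavailable}";
    }
}

// Decorator: try the primary client first, and degrade to the fallback on failure.
class FallbackClientSketch implements SponsorClientSketch {
    private final SponsorClientSketch primary;
    private final SponsorClientSketch fallback;

    FallbackClientSketch(SponsorClientSketch primary, SponsorClientSketch fallback) {
        this.primary = primary;
        this.fallback = fallback;
    }

    public String getAdPlans(long userId) {
        try {
            return primary.getAdPlans(userId);
        } catch (RuntimeException e) {
            return fallback.getAdPlans(userId);
        }
    }
}

class FallbackDemo {
    public static void main(String[] args) {
        // A primary client that always fails, simulating an unreachable service.
        SponsorClientSketch failing = userId -> {
            throw new IllegalStateException("ad-sponsor unreachable");
        };
        SponsorClientSketch client = new FallbackClientSketch(failing, new SponsorClientFallbackSketch());
        System.out.println(client.getAdPlans(15L));
    }
}
```

The caller always receives a response in the usual shape, which is the point of degrading instead of propagating the error.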

Ad data index design

Forward index

Inverted index

Ad data index maintenance
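A minimal sketch of the two index directions for keyword targeting, assuming an in-JVM index built on ConcurrentHashMap and ConcurrentSkipListSet (the collections named in the Q&A section). The class name UnitKeywordIndexSketch and its method names are assumptions for this example and may differ from the project's own index classes.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListSet;

class UnitKeywordIndexSketch {
    // Inverted index: keyword -> ids of the ad units that target it.
    private final Map<String, Set<Long>> keywordToUnits = new ConcurrentHashMap<>();
    // Forward index: ad unit id -> keywords it targets.
    private final Map<Long, Set<String>> unitToKeywords = new ConcurrentHashMap<>();

    // Adds one (unit, keyword) pair to both directions.
    void add(long unitId, String keyword) {
        keywordToUnits.computeIfAbsent(keyword, k -> new ConcurrentSkipListSet<>()).add(unitId);
        unitToKeywords.computeIfAbsent(unitId, k -> new ConcurrentSkipListSet<>()).add(keyword);
    }

    // Removes one (unit, keyword) pair from both directions.
    void delete(long unitId, String keyword) {
        Set<Long> units = keywordToUnits.get(keyword);
        if (units != null) {
            units.remove(unitId);
        }
        Set<String> keywords = unitToKeywords.get(unitId);
        if (keywords != null) {
            keywords.remove(keyword);
        }
    }

    // Inverted lookup: which units target this keyword?
    Set<Long> unitsFor(String keyword) {
        return Collections.unmodifiableSet(keywordToUnits.getOrDefault(keyword, Collections.emptySet()));
    }

    // Forward lookup: does this unit target every requested keyword?
    boolean matches(long unitId, Collection<String> keywords) {
        Set<String> indexed = unitToKeywords.get(unitId);
        return indexed != null && indexed.containsAll(keywords);
    }
}

class UnitKeywordIndexDemo {
    public static void main(String[] args) {
        UnitKeywordIndexSketch index = new UnitKeywordIndexSketch();
        index.add(10L, "BMW");
        index.add(10L, "Audi");
        index.add(12L, "BMW");
        System.out.println(index.unitsFor("BMW"));
        System.out.println(index.matches(10L, Collections.singletonList("Audi")));
    }
}
```

The inverted direction answers "which units could serve this request" without scanning every unit, which is where the retrieval speed-up comes from.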

MySQL-Binlog

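What is the Binlog

The Binlog is a binary log maintained by MySQL Server. It records the SQL statements that change, or could potentially change, MySQL data, and stores them on disk (in files) as "transactions".

Main uses

Replication: under MySQL's Master-Slave protocol, a Slave listens to the Binlog to replicate data and stay consistent with the Master.
Data recovery: restore data with the mysqlbinlog tool.
Incremental backup.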

-- Whether the Binlog is enabled
mysql> show variables like 'log_bin';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| log_bin       | ON    |
+---------------+-------+
1 row in set (0.30 sec)

-- Binlog format
mysql> show variables like 'binlog_format';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| binlog_format | ROW   |
+---------------+-------+
1 row in set (0.00 sec)

Kafka

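How Kafka Producers and Consumers work

Ways for a Producer to send messages

Fire and forget: call the send API and ignore the result. Because Kafka is highly available, the message is written in most cases, but messages can be lost when something goes wrong.
Synchronous send: send() returns a Future; calling its get() method tells us whether the send succeeded.
Asynchronous send: pass a callback to send(); it is invoked once the broker's result arrives.

Ways for a Consumer to consume messages

Automatic offset commit
Manual (synchronous) commit of the current offset
Manual asynchronous commit of the current offset
Manual asynchronous commit of the current offset with a callback
Mixed synchronous and asynchronous offset commits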
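The three send styles differ only in what the caller does with the result of send(). The sketch below shows that difference using a hypothetical in-memory producer, InMemoryProducerSketch, so that it runs without a broker. It is not the kafka-clients API; it only mirrors the shape of a send() that returns a Future and optionally accepts a callback.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BiConsumer;

// Hypothetical in-memory stand-in for a producer; it is NOT the kafka-clients API.
class InMemoryProducerSketch {
    private final ExecutorService io = Executors.newSingleThreadExecutor();
    private final AtomicLong nextOffset = new AtomicLong();

    Future<Long> send(String topic, String value) {
        return send(topic, value, null);
    }

    // "Writes" the message on a background thread, then reports (offset, error) to the callback.
    Future<Long> send(String topic, String value, BiConsumer<Long, Exception> callback) {
        Callable<Long> write = () -> {
            long offset = nextOffset.getAndIncrement();
            if (callback != null) {
                callback.accept(offset, null);
            }
            return offset;
        };
        return io.submit(write);
    }

    void close() {
        io.shutdown();
    }
}

class SendModesDemo {
    static List<String> run() {
        InMemoryProducerSketch producer = new InMemoryProducerSketch();
        List<String> events = new CopyOnWriteArrayList<>();
        try {
            // 1. Fire and forget: the returned Future is ignored.
            producer.send("test-topic", "m0");

            // 2. Synchronous send: block on get() to learn the outcome.
            long offset = producer.send("test-topic", "m1").get();
            events.add("sync offset=" + offset);

            // 3. Asynchronous send: the callback runs once the result is known.
            CountDownLatch done = new CountDownLatch(1);
            producer.send("test-topic", "m2", (off, error) -> {
                events.add(error == null ? "async offset=" + off : "async failed");
                done.countDown();
            });
            done.await();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            producer.close();
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Fire and forget is the cheapest and can silently lose messages; the synchronous style trades throughput for certainty; the asynchronous style keeps throughput while still surfacing failures.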

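Business logic review

Ad delivery system

Database table design

Ad retrieval system

What is the core idea

The media side sends an ad request (describing its intent in detail), the retrieval service searches the ad data (a condition-matching process), and returns a response.
Ad data is organized around the ad unit, so all matching information should be concentrated in the ad unit.
Ad data must be retrieved efficiently with the lowest possible latency, which requires designing and building a suitable index structure.
The index must stay consistent with the data stored in the database, hence full index + incremental index.

All the work so far serves one goal: letting the retrieval service perform retrieval efficiently and accurately.

The three elements of a media request

Media identifier: mediaId
Basic request information, RequestInfo: requestId, adSlots, App, Geo, Device
Matching information, FeatureInfo: KeywordFeature, DistrictFeature, ItFeature, FeatureRelation

Response of the retrieval service

Map<String, List> adSlot2Ads: the key is the ad slot code, which uniquely identifies an ad slot; the value is the ad data supplied to that slot.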

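Matching process of the retrieval service

The core idea is to loop over the ad slots in the media request and narrow the candidate set from large to small: the more ad units a condition filters out, the earlier it runs. For each ad slot, the matching process is:

Pre-filter ad units by ad slot type.
Filter ad units again by each Feature.
Filter by the status of the ad unit and its ad plan.
Match creative information against ad slot information.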
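The four matching steps can be sketched as a chain of filters over candidate ad units. This is a simplified illustration, not the project's search code: AdUnitSketch flattens unit, plan status, targeting, and a single creative into one hypothetical class; only keyword and district features are shown; and a feature is assumed to match when the unit's values contain every requested value.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical flattened view of an ad unit with its targeting and one creative.
class AdUnitSketch {
    final long unitId;
    final int positionType;
    final boolean active; // true when both the unit and its plan are in a valid state
    final Set<String> keywords;
    final Set<String> districts;
    final int creativeWidth;
    final int creativeHeight;

    AdUnitSketch(long unitId, int positionType, boolean active, Set<String> keywords,
                 Set<String> districts, int creativeWidth, int creativeHeight) {
        this.unitId = unitId;
        this.positionType = positionType;
        this.active = active;
        this.keywords = keywords;
        this.districts = districts;
        this.creativeWidth = creativeWidth;
        this.creativeHeight = creativeHeight;
    }
}

class AdSearchSketch {
    static Set<String> setOf(String... values) {
        return new HashSet<>(Arrays.asList(values));
    }

    // Returns the ids of the units that survive all four steps for one ad slot.
    static List<Long> search(List<AdUnitSketch> units, int positionType, int slotWidth, int slotHeight,
                             Set<String> keywords, Set<String> districts, boolean orRelation) {
        return units.stream()
            // Step 1: pre-filter by ad slot (position) type.
            .filter(u -> u.positionType == positionType)
            // Step 2: filter by features, combined with OR or AND.
            .filter(u -> {
                boolean keywordHit = u.keywords.containsAll(keywords);
                boolean districtHit = u.districts.containsAll(districts);
                return orRelation ? (keywordHit || districtHit) : (keywordHit && districtHit);
            })
            // Step 3: filter by unit and plan status.
            .filter(u -> u.active)
            // Step 4: the creative must fit the ad slot.
            .filter(u -> u.creativeWidth == slotWidth && u.creativeHeight == slotHeight)
            .map(u -> u.unitId)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<AdUnitSketch> units = Arrays.asList(
            new AdUnitSketch(10L, 1, true, setOf("BMW", "Audi"), setOf("Anhui-Hefei"), 1080, 720),
            new AdUnitSketch(12L, 1, false, setOf("BMW"), setOf("Anhui-Hefei"), 1080, 720));
        System.out.println(search(units, 1, 1080, 720, setOf("BMW"), setOf("Beijing"), true));
    }
}
```

Running the cheapest and most selective filter first keeps the later, more expensive checks working on a small candidate set.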

Q&A

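The index data is stored and manipulated with ConcurrentHashMap and ConcurrentSkipListSet. Could HashMap and HashSet be used instead? Why?

A: No. HashMap and HashSet are not thread-safe under high concurrency, whereas ConcurrentHashMap and ConcurrentSkipListSet are thread-safe classes.

What if there is too much ad data to fit in memory?

A: If the ad data no longer fits in the JVM, consider an in-memory storage tool such as Redis.

When loading the full index, why export the database tables to files instead of reading directly from the database?

A: Saving the data to files so that the program can load the full index at startup is a common approach. The benefit is that it avoids putting too much pressure on the database. Imagine many instances starting at once with the data held only in the database: every instance would read a large amount of data from the database at the same moment, and both network IO and latency would be very high. So this kind of feature does not read directly from the database; the full data set is dumped to files first.

How are the full and incremental indexes stored? How do you estimate their size?

A: When keeping indexes in memory (that is, in your JVM), the first consideration is how large the JVM memory can be. If the machine has plenty of memory, give the JVM more (at least 4 GB is recommended). Once the JVM memory is fixed, consider the current data volume. This is mostly an estimate: work out roughly how many bytes one Java object occupies and check whether the JVM can hold it all. The choice depends on the specific business. For a project like an ad system, though, it is safe to say the ad data volume will not be very large (this kind of data is special: there are only so many advertisers, so there is no massive data set). The data is typically at the MB level and fits in the JVM comfortably.

What can the Binlog incremental data (MySqlRowData) be used for once it is constructed?

Incremental indexes in the retrieval system. This is the core use in the retrieval system: when a database table field changes, the properties of the corresponding index object must change too. This is the incremental index, implemented in IndexSender.java.

Delivery of level-2 incremental indexes: IndexSender.Level2RowData

Delivery of level-3 incremental indexes: IndexSender.Level3RowData

Delivery of level-4 incremental indexes: IndexSender.Level4RowData

Delivery to other subsystems (extension)

For more general handling, the Binlog incremental data is delivered to Kafka. Other interested subsystems can listen to the corresponding topic to get the data. This is implemented in KafkaSender.java.

Delivery to a data analysis subsystem: model and analyze how advertisers change their ad data.

Delivery to a logging system: record the history of advertiser data changes for future reconciliation.

In the current implementation, ad-search is a Slave of the MySQL Master. If multiple ad-search instances are deployed, each instance is a Slave. Is that reasonable? Can you explain why?

A: Every ad-search instance acts as a MySQL Slave (see MySQL's Master/Slave protocol). With few Slaves (for example, low traffic and no more than 5 Slaves), this is fine. As the number of Slaves grows, the MySQL Master has to send the Binlog to every Slave, and bandwidth, replication lag, and concurrent connections all become bottlenecks. On the surface this looks like a MySQL bottleneck, but the real cause is an overly simple architecture that did not account for high concurrency and the need for many instances. So the number of Slaves must stay small, preferably no more than 5. At the same time, the number of ad-search instances may need to be large, because heavy traffic has to be spread across many instances.

Why export the database data from the sponsor module first and have the search module read it? Couldn't search call the sponsor service at startup to query the database and build the index objects?

A: Before the service goes live, ad data will already have been entered into the database, and the volume may be fairly large. Loading it directly from the database would put sudden pressure on the database and could also stall service startup. Reading from files and building the indexes directly is the best approach.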

Debug

Caused by: org.hibernate.HibernateException: Access to DialectResolutionInfo cannot be null when 'hibernate.dialect' not set

Add database-platform: org.hibernate.dialect.MySQL5Dialect to application.yml (under spring.jpa).

The server time zone value 'CDT' is unrecognized or represents more than one time zone. You must configure either the server or JDBC driver (via the serverTimezone configuration property) to use a more specifc time zone value if you want to utilize time zone support

Add the serverTimezone parameter to the datasource url in application.yml (for example, serverTimezone=UTC).

Test

Preparation

Export the current database table data to mysql_data; this is the full data set.

ad_plan.data
ad_unit.data
ad_creative.data
ad_creative_unit.data
ad_unit_district.data
ad_unit_it.data
ad_unit_keyword.data

Ad-sponsor (ad delivery system) test case

# Output
Hibernate: 
    select
        adplan0_.id as id1_1_,
        adplan0_.create_time as create_t2_1_,
        adplan0_.end_date as end_date3_1_,
        adplan0_.plan_name as plan_nam4_1_,
        adplan0_.plan_status as plan_sta5_1_,
        adplan0_.start_date as start_da6_1_,
        adplan0_.update_time as update_t7_1_,
        adplan0_.user_id as user_id8_1_ 
    from
        ad_plan adplan0_ 
    where
        (
            adplan0_.id in (
                ?
            )
        ) 
        and adplan0_.user_id=?
[AdPlan(id=10, userId=15, planName=推广计划名称, planStatus=1, startDate=2018-11-28 00:00:00.0, endDate=2019-11-20 00:00:00.0, createTime=2018-11-19 20:42:27.0, updateTime=2018-11-19 20:57:12.0)]

Ad-search (ad retrieval system) test case

Log of the index matching process

# Log

Initialized JPA EntityManagerFactory for persistence unit 'default'
2022-04-06 21:36:39.611  INFO 84689 --- [           main] com.imooc.ad.index.adplan.AdPlanIndex    : before add: {}
2022-04-06 21:36:39.611  INFO 84689 --- [           main] com.imooc.ad.index.adplan.AdPlanIndex    : after add: {10=AdPlanObject(planId=10, userId=15, planStatus=1, startDate=Wed Nov 28 00:00:00 CST 2018, endDate=Wed Nov 20 00:00:00 CST 2019)}
2022-04-06 21:36:39.624  INFO 84689 --- [           main] c.imooc.ad.index.creative.CreativeIndex  : before add: {}
2022-04-06 21:36:39.624  INFO 84689 --- [           main] c.imooc.ad.index.creative.CreativeIndex  : after add: {10=CreativeObject(adId=10, name=第一个创意, type=1, materialType=1, height=720, width=1080, auditStatus=1, adUrl=https://www.imooc.com)}
2022-04-06 21:36:39.637  INFO 84689 --- [           main] com.imooc.ad.index.adunit.AdUnitIndex    : before add: {}
2022-04-06 21:36:39.638  INFO 84689 --- [           main] com.imooc.ad.index.adunit.AdUnitIndex    : after add: {10=AdUnitObject(unitId=10, unitStatus=1, positionType=1, planId=10, adPlanObject=AdPlanObject(planId=10, userId=15, planStatus=1, startDate=Wed Nov 28 00:00:00 CST 2018, endDate=Wed Nov 20 00:00:00 CST 2019))}
2022-04-06 21:36:39.638  INFO 84689 --- [           main] com.imooc.ad.index.adunit.AdUnitIndex    : before add: {10=AdUnitObject(unitId=10, unitStatus=1, positionType=1, planId=10, adPlanObject=AdPlanObject(planId=10, userId=15, planStatus=1, startDate=Wed Nov 28 00:00:00 CST 2018, endDate=Wed Nov 20 00:00:00 CST 2019))}
2022-04-06 21:36:39.639  INFO 84689 --- [           main] com.imooc.ad.index.adunit.AdUnitIndex    : after add: {10=AdUnitObject(unitId=10, unitStatus=1, positionType=1, planId=10, adPlanObject=AdPlanObject(planId=10, userId=15, planStatus=1, startDate=Wed Nov 28 00:00:00 CST 2018, endDate=Wed Nov 20 00:00:00 CST 2019)), 12=AdUnitObject(unitId=12, unitStatus=1, positionType=1, planId=10, adPlanObject=AdPlanObject(planId=10, userId=15, planStatus=1, startDate=Wed Nov 28 00:00:00 CST 2018, endDate=Wed Nov 20 00:00:00 CST 2019))}
2022-04-06 21:36:39.652  INFO 84689 --- [           main] c.i.a.i.creativeunit.CreativeUnitIndex   : before add: {}
2022-04-06 21:36:39.660  INFO 84689 --- [           main] c.i.a.i.creativeunit.CreativeUnitIndex   : after add: {10-10=CreativeUnitObject(adId=10, unitId=10)}
2022-04-06 21:36:39.669  INFO 84689 --- [           main] c.i.ad.index.district.UnitDistrictIndex  : UnitDistrictIndex, before add: {}
2022-04-06 21:36:39.670  INFO 84689 --- [           main] c.i.ad.index.district.UnitDistrictIndex  : UnitDistrictIndex, after add: {10=[安徽省-淮北市]}
2022-04-06 21:36:39.671  INFO 84689 --- [           main] c.i.ad.index.district.UnitDistrictIndex  : UnitDistrictIndex, before add: {10=[安徽省-淮北市]}
2022-04-06 21:36:39.671  INFO 84689 --- [           main] c.i.ad.index.district.UnitDistrictIndex  : UnitDistrictIndex, after add: {10=[安徽省-宿州市, 安徽省-淮北市]}
2022-04-06 21:36:39.671  INFO 84689 --- [           main] c.i.ad.index.district.UnitDistrictIndex  : UnitDistrictIndex, before add: {10=[安徽省-宿州市, 安徽省-淮北市]}
2022-04-06 21:36:39.671  INFO 84689 --- [           main] c.i.ad.index.district.UnitDistrictIndex  : UnitDistrictIndex, after add: {10=[安徽省-合肥市, 安徽省-宿州市, 安徽省-淮北市]}
2022-04-06 21:36:39.671  INFO 84689 --- [           main] c.i.ad.index.district.UnitDistrictIndex  : UnitDistrictIndex, before add: {10=[安徽省-合肥市, 安徽省-宿州市, 安徽省-淮北市]}
2022-04-06 21:36:39.671  INFO 84689 --- [           main] c.i.ad.index.district.UnitDistrictIndex  : UnitDistrictIndex, after add: {10=[安徽省-合肥市, 安徽省-宿州市, 安徽省-淮北市, 辽宁省-大连市]}
2022-04-06 21:36:39.680  INFO 84689 --- [           main] com.imooc.ad.index.interest.UnitItIndex  : UnitItIndex, before add: {}
2022-04-06 21:36:39.680  INFO 84689 --- [           main] com.imooc.ad.index.interest.UnitItIndex  : UnitItIndex, after add: {10=[台球]}
2022-04-06 21:36:39.680  INFO 84689 --- [           main] com.imooc.ad.index.interest.UnitItIndex  : UnitItIndex, before add: {10=[台球]}
2022-04-06 21:36:39.680  INFO 84689 --- [           main] com.imooc.ad.index.interest.UnitItIndex  : UnitItIndex, after add: {10=[台球, 游泳]}
2022-04-06 21:36:39.681  INFO 84689 --- [           main] com.imooc.ad.index.interest.UnitItIndex  : UnitItIndex, before add: {10=[台球, 游泳]}
2022-04-06 21:36:39.681  INFO 84689 --- [           main] com.imooc.ad.index.interest.UnitItIndex  : UnitItIndex, after add: {10=[乒乓球, 台球, 游泳]}
2022-04-06 21:36:39.689  INFO 84689 --- [           main] c.i.ad.index.keyword.UnitKeywordIndex    : UnitKeywordIndex, before add: {}
2022-04-06 21:36:39.690  INFO 84689 --- [           main] c.i.ad.index.keyword.UnitKeywordIndex    : UnitKeywordIndex, after add: {10=[宝马]}
2022-04-06 21:36:39.690  INFO 84689 --- [           main] c.i.ad.index.keyword.UnitKeywordIndex    : UnitKeywordIndex, before add: {10=[宝马]}
2022-04-06 21:36:39.690  INFO 84689 --- [           main] c.i.ad.index.keyword.UnitKeywordIndex    : UnitKeywordIndex, after add: {10=[奥迪, 宝马]}
2022-04-06 21:36:39.690  INFO 84689 --- [           main] c.i.ad.index.keyword.UnitKeywordIndex    : UnitKeywordIndex, before add: {10=[奥迪, 宝马]}
2022-04-06 21:36:39.690  INFO 84689 --- [           main] c.i.ad.index.keyword.UnitKeywordIndex    : UnitKeywordIndex, after add: {10=[大众, 奥迪, 宝马]}

Sample search request

{
  "featureInfo": {
    "districtFeature": {
      "districts": [
        {
          "city": "合肥市",
          "province": "安徽省"
        }
      ]
    },
    "itFeature": {
      "its": [
        "台球",
        "游泳"
      ]
    },
    "keywordFeature": {
      "keywords": [
        "宝马",
        "大众"
      ]
    },
    "relation": "OR"
  },
  "mediaId": "imooc-ad",
  "requestInfo": {
    "adSlots": [
      {
        "adSlotCode": "ad-x",
        "height": 720,
        "minCpm": 1000,
        "positionType": 1,
        "type": [
          1,
          2
        ],
        "width": 1080
      }
    ],
    "app": {
      "activityName": "video",
      "appCode": "imooc",
      "appName": "imooc",
      "packageName": "com.imooc"
    },
    "device": {
      "deviceCode": "iphone",
      "displaySize": "1080 720",
      "ip": "127.0.0.1",
      "mac": "0xxxxx",
      "model": "x",
      "screenSize": "1080 720",
      "serialName": "123456789"
    },
    "geo": {
      "city": "北京市",
      "latitude": 100.28,
      "longitude": 88.61,
      "province": "北京市"
    },
    "requestId": "aaa"
  }
}

Testing the Hystrix Dashboard

Dashboard URL and monitoring stream URL

http://localhost:7002/hystrix
http://localhost:7001/ad-search/actuator/hystrix.stream

API Test

Create user

Get ad plans
