
Flink Notes

Interview Questions

1. How is your Flink cluster set up?

  • Task Managers: We run 50 TMs, one for each of the 50 consumer nodes of that Kafka topic, so every node's messages are processed immediately (a source-side sketch follows this list).
  • Job Managers: 3. The Job Manager coordinates tasks and manages the cluster. For high availability, Flink supports running multiple JMs: if the active JM fails, a standby JM takes over so the job keeps running without interruption.
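
Below is a minimal, hypothetical sketch of the source side under these assumptions (the topic name "app-logs", broker address, and consumer group are placeholders, not our real values; the topic is assumed to have 50 partitions, one per consumer node). The source parallelism is set to 50 so each partition gets its own subtask, and the resulting stream is the one consumed by the transformations under question 2. Job Manager high availability itself is typically configured in flink-conf.yaml (e.g., ZooKeeper-based HA), not in the job code.

// Hypothetical Kafka source setup: one source subtask per partition / consumer node.
Properties props = new Properties();
props.setProperty("bootstrap.servers", "kafka:9092");  // placeholder broker address
props.setProperty("group.id", "flink-log-consumer");   // placeholder consumer group

FlinkKafkaConsumer<String> consumer =
        new FlinkKafkaConsumer<>("app-logs", new SimpleStringSchema(), props); // placeholder topic

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
DataStream<String> stream = env
        .addSource(consumer)
        .setParallelism(50); // match the 50 partitions / Task Managers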

2. What data transformation operations did you use?

  • map parses the raw string logs into LogEvent Java objects for easier handling.
  • filter keeps only the entries we need (ERROR-level logs in the snippet below).
  • timeWindow aggregates the filtered entries over a fixed time window to produce a summary.
// Parse each raw log line into a Java object for easier handling
// ("stream" is the DataStream<String> read from Kafka).
DataStream<LogEvent> parsedStream = stream
        .map(new MapFunction<String, LogEvent>() {
            @Override
            public LogEvent map(String value) throws Exception {
                return LogParser.parse(value);
            }
        });

// Keep only the log entries we care about (ERROR level).
DataStream<LogEvent> filteredStream = parsedStream
        .filter(new FilterFunction<LogEvent>() {
            @Override
            public boolean filter(LogEvent logEvent) throws Exception {
                return logEvent.getLogLevel().equals("ERROR");
            }
        });

// Aggregate the entries per log type over a 5-minute time window.
DataStream<LogSummary> windowedSummary = filteredStream
        .keyBy("logType")
        .timeWindow(Time.minutes(5))
        .aggregate(new LogAggregationFunction());

// Emit the aggregated log summaries.
windowedSummary.print();

3. How much data do you process?

We process several million records per day, with each record around 5 KB in size.
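
As a rough back-of-envelope check (assuming "several million" means about 5 million records): 5,000,000 records/day × 5 KB ≈ 25 GB/day, which averages out to roughly 60 records/second and about 300 KB/s, a fairly modest steady-state load (peaks can of course be much higher).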

4. Imagine a case where your job has not been running for 2 hours for some reason. It's a real-time job, so we're missing the data from the last 2 hours. How are you going to handle this case?

  • Data Replay
    • When Flink reads from Kafka and a failure occurs, reset the consumer offset to a position before the failure and replay from there, so no data is lost (see the sketch after this list).
  • Checkpoints
    • A checkpoint is a snapshot of Flink's state, used for recovery.
    • If Flink hits an error, it can restart from the most recent checkpoint instead of reprocessing everything from scratch.
  • Sink (writing the results)
    • We must make sure data is not written twice after the system recovers.
    • Two-Phase Commit Protocol (2PC): for sinks that need exactly-once guarantees, Flink supports a two-phase commit, so data becomes visible downstream exactly once even after a failure and recovery.
    • Idempotent sink: some external systems or stores offer idempotent writes (e.g., upserts keyed by a primary key), so even if the same data is written multiple times, the final result is unchanged.
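
A minimal sketch of how these recovery pieces fit together, under the same assumptions as the source sketch above (topic name and props are placeholders; FlinkKafkaConsumer refers to the universal Kafka connector): checkpointing is enabled so Kafka offsets and operator state are restored automatically on restart, and for a manual backfill with no usable checkpoint the consumer can be told to start from a timestamp two hours in the past. For the sink itself, FlinkKafkaProducer configured with Semantic.EXACTLY_ONCE gives the two-phase-commit behavior described above, while an idempotent upsert into the target store is the simpler alternative.

// Checkpointing: snapshot operator state and Kafka offsets every 60 seconds,
// with exactly-once processing guarantees inside Flink.
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(60_000);
env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);

// Manual replay for the 2-hour gap: when the job is NOT restored from a
// checkpoint/savepoint, start reading from a timestamp before the outage.
// (props = the same Kafka Properties as in the source sketch under question 1.)
FlinkKafkaConsumer<String> consumer =
        new FlinkKafkaConsumer<>("app-logs", new SimpleStringSchema(), props); // placeholder topic
consumer.setStartFromTimestamp(System.currentTimeMillis() - 2 * 60 * 60 * 1000L);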