面试题：有一张表里面有三个字段，分别是（id，开始时间，结束时间），表中数据量为 5000W，如何统计流量最大的时候有多少条数据？

我们需要统计在某个时间段内，表中数据量最大的时候有多少条记录。具体来说，就是找到在某个时间点，有多少条记录的“开始时间”和“结束时间”区间覆盖了这个时间点。

解决方案

方法一：基于时间点的扫描统计

思路：
- 遍历所有可能的“时间点”，统计每个时间点有多少条记录的区间覆盖了它。
- 找到最大值。
实现步骤：
- 提取所有“开始时间”和“结束时间”，生成一个时间点列表。
- 对每个时间点，统计满足 开始时间 <= 时间点 <= 结束时间 的记录数。
- 找到最大值。
问题：
- 时间点是连续的，无法直接遍历所有时间点。
- 数据量太大（5000W），直接扫描性能较差。

方法二：基于事件的时间线扫描

思路：
- 将“开始时间”和“结束时间”视为事件：
  - “开始时间”表示流量增加 1。
  - “结束时间”表示流量减少 1。
- 将所有事件按时间排序，扫描时间线，统计流量的最大值。
实现步骤：
- 提取所有“开始时间”和“结束时间”，并标记事件类型（+1 或 -1）。
- 将所有事件按时间排序。
- 扫描时间线，维护一个计数器：
  - 遇到“开始时间”事件，计数器加 1。
  - 遇到“结束时间”事件，计数器减 1。
- 记录计数器的最大值。
优点：
- 只需要扫描一次时间线，时间复杂度为 O(N log N)（排序） + O(N)（扫描）。
- 适合大数据量场景。

SQL 实现：

WITH events AS (
    SELECT start_time AS event_time, 1 AS event_type
    FROM your_table
    UNION ALL
    SELECT end_time AS event_time, -1 AS event_type
    FROM your_table
),
sorted_events AS (
    SELECT event_time, event_type
    FROM events
    ORDER BY event_time
)
SELECT MAX(concurrent_count) AS max_concurrent
FROM (
    SELECT event_time,
           SUM(event_type) OVER (ORDER BY event_time) AS concurrent_count
    FROM sorted_events
) AS counts;

方法三：基于时间段的统计

思路：
- 将时间划分为多个区间（如每分钟、每小时），统计每个区间内的最大流量。
- 找到全局最大值。
实现步骤：
- 将时间划分为固定大小的区间（如每分钟）。
- 对每个区间，统计有多少条记录的区间覆盖了该时间区间。
- 找到最大值。
优点：
- 可以通过并行化加速统计。
- 适合需要近似结果的场景。

SQL 实现：

WITH time_intervals AS (
    SELECT generate_series(
               MIN(start_time),
               MAX(end_time),
               '1 minute'::interval
           ) AS interval_start
),
interval_counts AS (
    SELECT ti.interval_start,
           COUNT(*) AS concurrent_count
    FROM time_intervals ti
    JOIN your_table yt
    ON ti.interval_start BETWEEN yt.start_time AND yt.end_time
    GROUP BY ti.interval_start
)
SELECT MAX(concurrent_count) AS max_concurrent
FROM interval_counts;

方法四：基于索引的优化

思路：
- 对“开始时间”和“结束时间”字段建立索引，加速区间查询。
- 使用窗口函数或递归查询统计流量。
实现步骤：
- 对“开始时间”和“结束时间”字段建立联合索引。
- 使用窗口函数统计每个时间点的流量。

SQL 实现：

WITH events AS (
    SELECT start_time AS event_time, 1 AS event_type
    FROM your_table
    UNION ALL
    SELECT end_time AS event_time, -1 AS event_type
    FROM your_table
),
sorted_events AS (
    SELECT event_time, event_type
    FROM events
    ORDER BY event_time
)
SELECT MAX(concurrent_count) AS max_concurrent
FROM (
    SELECT event_time,
           SUM(event_type) OVER (ORDER BY event_time) AS concurrent_count
    FROM sorted_events
) AS counts;