常规的视图(View)可以看作一个预先保存好的查询,而Clickhouse的物化视图(materialized view,MV)功能类似于数据的插入触发器,会在来源数据插入的时候按照定义更新MV对应的数据
场景:统计平台不同时间段(分钟、小时、6小时、天、周、月、年)的下载量。原始数据:(when,platform,count)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
| CREATE TABLE download ( when DateTime, platform UInt32, cnt UInt32 ) ENGINE = MergeTree PARTITION BY toYYYYMM(when) ORDER BY (platform, when);
INSERT INTO download SELECT now() + number * 60 as when, rand() % 10, rand() % 100000000 FROM system.numbers LIMIT 50000
|
查看数据
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
| SELECT toStartOfDay(when) AS day, platform, sum(cnt) AS cnt FROM download GROUP BY platform, day order by day asc,platform
SELECT toStartOfWeek(when) AS week, platform, sum(cnt) AS cnt FROM download GROUP BY platform, week order by week asc,platform
|
按天的查询可以建立为MV,其中的POPULATE表示从已有的数据中生成,MV创建完成后原始数据表中的新增数据将自动按照GROUP BY条件更新到MV中。
需要注意MV在创建到数据导入完成会存在一部分时间,实际使用中一般不指定POPULATE,而在MV创建完成后再分段的注入数据,保证数据一致性。
1 2 3 4 5 6 7 8 9 10 11 12
| CREATE MATERIALIZED VIEW downloads_by_day ENGINE = SummingMergeTree PARTITION BY toYYYYMM(day) ORDER BY (platform, day) POPULATE AS SELECT toStartOfDay(when) AS day, platform, sum(cnt) AS cnt FROM download GROUP BY platform, day;
|
同样的,我们可以按照分析需求建立更多层级的MV,也可以在MV中选择不同的聚合函数。结合输入端给到的时间范围以及展示上最多需要的点数,调整需要查询MV。
1 2 3 4 5 6 7 8 9 10 11 12 13 14
| public enum TimeAlign { By1Min = 1, By5Min = 2, By10Min = 3, By15Min = 4, By30Min = 5, By1Hour = 6, By6Hour = 7, By12Hour = 8, By1Day = 9, By1Week = 10, By1Month = 11 }
|