Commit 147791b

docs: trigger overview and quick start (#1994)
1 parent 0e8e1c1 commit 147791b

9 files changed

+308
-2
lines changed

docs/enterprise/overview.md

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -32,7 +32,7 @@ This section provides an overview of the advanced features available in Greptime
3232
carries more cluster management and monitoring features.
3333
- [Read Replica](./read-replica.md): Read-only datanode instances for heavy query workloads such as
3434
analytical queries.
35-
- Triggers: Periodically evaluate your rules and trigger external
35+
- [Triggers](./trigger/overview.md): Periodically evaluate your rules and trigger external
3636
webhook. Compatible with Prometheus Alertmanager.
3737
- Reliability features for Flow.
3838

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
1+
---
2+
keywords: [Trigger, GreptimeDB Enterprise, SQL, Webhook]
3+
description: The overview of GreptimeDB Trigger.
4+
---
5+
6+
# Trigger
7+
8+
Trigger allows you to define evaluation rules with SQL.
9+
GreptimeDB evaluates these rules periodically; once the condition is met, a
10+
notification is sent out.
11+
12+
- [Quick Start Example](./quick-start.md): A step-by-step guide to set up a
13+
Trigger that monitors system load and raises alerts.
Lines changed: 140 additions & 0 deletions
@@ -0,0 +1,140 @@
1+
---
2+
keywords: [Trigger, Alert, GreptimeDB Enterprise, SQL, Webhook, Alertmanager, Slack]
3+
description: This guide demonstrates how GreptimeDB Triggers enable seamless integration with the Prometheus Alertmanager ecosystem for comprehensive monitoring and alerting.
4+
---
5+
6+
# Quick Start Example
7+
8+
This section walks through an end-to-end example that uses Trigger to monitor
9+
system load and raise an alert.
10+
11+
The diagram illustrates the complete end-to-end workflow of the example.
12+
13+
![Trigger demo architecture](/trigger-demo-architecture.png)
14+
15+
1. Vector continuously scrapes host metrics and writes them to GreptimeDB.
16+
2. A Trigger in GreptimeDB evaluates a rule every minute; whenever the condition
17+
is met, it sends a notification to Alertmanager.
18+
3. Alertmanager applies its own policies and finally delivers the alert to Slack.
19+
20+
## Use Vector to Scrape Host Metrics
21+
22+
Use Vector to scrape host metrics and write them to GreptimeDB. Below is a Vector
23+
configuration example:
24+
25+
```toml
26+
[sources.in]
27+
type = "host_metrics"
28+
scrape_interval_secs = 15
29+
30+
[sinks.out]
31+
inputs = ["in"]
32+
type = "greptimedb"
33+
endpoint = "localhost:4001"
34+
```
35+
36+
GreptimeDB auto-creates tables on the first write. The `host_load1` table stores
37+
the system load averaged over the last minute. It is a key performance indicator
38+
for measuring system activity. We can create a monitoring rule to track values
39+
in this table. The schema of this table is shown below:
40+
41+
```text
42+
+-----------+----------------------+------+------+---------+---------------+
43+
| Column | Type | Key | Null | Default | Semantic Type |
44+
+-----------+----------------------+------+------+---------+---------------+
45+
| ts | TimestampMillisecond | PRI | NO | | TIMESTAMP |
46+
| collector | String | PRI | YES | | TAG |
47+
| host | String | PRI | YES | | TAG |
48+
| val | Float64 | | YES | | FIELD |
49+
+-----------+----------------------+------+------+---------+---------------+
50+
```
51+
52+
## Set up Alertmanager with a Slack Receiver
53+
54+
The payload of GreptimeDB Trigger's Webhook is compatible with [Prometheus
55+
Alertmanager](https://prometheus.io/docs/alerting/latest/alertmanager/), so we
56+
can reuse Alertmanager’s grouping, inhibition, silencing and routing features
57+
without any extra glue code.
58+
59+
You can refer to the [official documentation](https://prometheus.io/docs/alerting/latest/configuration/)
60+
to configure Prometheus Alertmanager. Below is a minimal message template you
61+
can use:
62+
63+
```text
64+
{{ define "slack.text" }}
65+
{{ range .Alerts }}
66+
67+
Labels:
68+
{{- range .Labels.SortedPairs }}
69+
- {{ .Name }}: {{ .Value }}
70+
{{ end }}
71+
72+
Annotations:
73+
{{- range .Annotations.SortedPairs }}
74+
- {{ .Name }}: {{ .Value }}
75+
{{ end }}
76+
77+
{{ end }}
78+
{{ end }}
79+
```
80+
81+
When rendering a Slack message, the above template iterates over all alerts
82+
and displays the labels and annotations of each alert.
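To make the template's behavior concrete, here is a minimal Python sketch that mimics what it renders. It is not Alertmanager's Go template engine: the function name `render_slack_text` and the sample alert are hypothetical, whitespace handling is simplified, and a plain alphabetical sort stands in for `SortedPairs`.

```python
def render_slack_text(alerts):
    """Render labels and annotations for every alert, sorted by name."""
    lines = []
    for alert in alerts:
        lines.append("Labels:")
        for name, value in sorted(alert.get("labels", {}).items()):
            lines.append(f"- {name}: {value}")
        lines.append("Annotations:")
        for name, value in sorted(alert.get("annotations", {}).items()):
            lines.append(f"- {name}: {value}")
        lines.append("")  # blank line between alerts
    return "\n".join(lines).rstrip()


# Hypothetical alert, shaped like the one the trigger in this guide would send.
example = [{
    "labels": {"severity": "warning", "host": "node-1", "collector": "load"},
    "annotations": {"comment": "Your computer is smoking, should take a break."},
}]
print(render_slack_text(example))
```

Each alert produces one `Labels:` section and one `Annotations:` section, which is the structure the Slack message follows.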
83+
84+
Start Alertmanager once the configuration is ready.
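Before creating the Trigger, it may help to confirm that Alertmanager can reach Slack on its own. The sketch below builds a hand-made alert for Alertmanager's `/api/v2/alerts` endpoint using only the Python standard library. It assumes Alertmanager listens on `localhost:9093` as in this guide; the alert name `manual_test` and the helper names are hypothetical.

```python
import json
import urllib.request
from datetime import datetime, timezone

# Assumption: Alertmanager listens on localhost:9093, as in this guide.
ALERTMANAGER_URL = "http://localhost:9093/api/v2/alerts"


def build_test_alert_request(url=ALERTMANAGER_URL):
    """Build (but do not send) a POST request carrying one hand-made alert."""
    alerts = [{
        "labels": {"alertname": "manual_test", "severity": "warning"},
        "annotations": {"comment": "Manual test alert, safe to ignore."},
        "startsAt": datetime.now(timezone.utc).isoformat(),
    }]
    return urllib.request.Request(
        url,
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_test_alert():
    """Send the alert; requires a running Alertmanager."""
    with urllib.request.urlopen(build_test_alert_request(), timeout=5) as resp:
        return resp.status

# With Alertmanager running, call send_test_alert() and check the Slack channel.
```

If the message shows up in Slack, any later delivery problem is more likely on the Trigger side than in the Alertmanager configuration.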
85+
86+
87+
## Create Trigger
88+
89+
Connect to GreptimeDB with a MySQL client and run the following SQL:
90+
91+
```sql
92+
CREATE TRIGGER IF NOT EXISTS load1_monitor
93+
ON (
94+
SELECT collector AS label_collector,
95+
host as label_host,
96+
val
97+
FROM host_load1 WHERE val > 10 and ts >= now() - '1 minutes'::INTERVAL
98+
) EVERY '1 minute'::INTERVAL
99+
LABELS (severity=warning)
100+
ANNOTATIONS (comment='Your computer is smoking, should take a break.')
101+
NOTIFY(
102+
WEBHOOK alert_manager URL 'http://localhost:9093' WITH (timeout="1m")
103+
);
104+
```
105+
106+
The above SQL will create a trigger named `load1_monitor` that runs every minute.
107+
It evaluates the last 60 seconds of data in `host_load1`; if any load1 value
108+
exceeds 10, the `WEBHOOK` option in the `NOTIFY` syntax specifies that this
109+
trigger will send a notification to Alertmanager, which is running on localhost at
110+
port 9093.
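The rule's logic can be sketched in plain Python to show which rows would fire. This is only an illustration, not GreptimeDB's implementation: the function `evaluate` and the sample rows are hypothetical, and it assumes (as the `label_collector` and `label_host` aliases in the query suggest) that `label_`-prefixed columns become alert labels with the prefix removed.

```python
from datetime import datetime, timedelta, timezone

# Static values taken from the LABELS and ANNOTATIONS clauses of the trigger.
STATIC_LABELS = {"severity": "warning"}
STATIC_ANNOTATIONS = {"comment": "Your computer is smoking, should take a break."}


def evaluate(rows, now, threshold=10.0, window=timedelta(minutes=1)):
    """Return one alert-like dict per row that matches the rule's WHERE clause."""
    alerts = []
    for row in rows:
        if row["val"] > threshold and row["ts"] >= now - window:
            labels = dict(STATIC_LABELS)
            # Assumption: label_collector / label_host map to these label names.
            labels["collector"] = row["collector"]
            labels["host"] = row["host"]
            alerts.append({"labels": labels, "annotations": dict(STATIC_ANNOTATIONS)})
    return alerts


now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
rows = [
    {"ts": now - timedelta(seconds=30), "collector": "load", "host": "node-1", "val": 12.5},
    {"ts": now - timedelta(seconds=30), "collector": "load", "host": "node-2", "val": 3.0},  # below threshold
    {"ts": now - timedelta(minutes=5), "collector": "load", "host": "node-1", "val": 20.0},  # outside the window
]
for alert in evaluate(rows, now):
    print(alert)
```

Only the first sample row satisfies both conditions, so a single alert is produced for `node-1`.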
111+
112+
You can execute `SHOW TRIGGERS` to view the list of created Triggers.
113+
114+
```sql
115+
SHOW TRIGGERS;
116+
```
117+
118+
The output should look like this:
119+
120+
```text
121+
+---------------+
122+
| Triggers |
123+
+---------------+
124+
| load1_monitor |
125+
+---------------+
126+
```
127+
128+
## Test Trigger
129+
130+
Use [stress-ng](https://github.com/ColinIanKing/stress-ng) to simulate high CPU
131+
load for 60s:
132+
133+
```bash
134+
stress-ng --cpu 100 --cpu-load 10 --timeout 60
135+
```
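While stress-ng runs, you may want to watch load1 locally to see when it crosses the trigger's threshold of 10. The following sketch uses `os.getloadavg()` from the Python standard library (Unix only); the helper names `load1_exceeds` and `watch` are hypothetical, and the value Vector reports may lag slightly because of its 15-second scrape interval.

```python
import os
import time


def load1_exceeds(threshold=10.0, load1=None):
    """Return True when the 1-minute load average is above `threshold`."""
    if load1 is None:
        load1 = os.getloadavg()[0]  # Unix only: returns (load1, load5, load15)
    return load1 > threshold


def watch(seconds=60, interval=5):
    """Print load1 every `interval` seconds for roughly `seconds` seconds."""
    for _ in range(seconds // interval):
        current = os.getloadavg()[0]
        print(f"load1={current:.2f} above_threshold={load1_exceeds(load1=current)}")
        time.sleep(interval)

# Run watch() in a second terminal while stress-ng is running.
```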
136+
137+
The load1 will rise quickly, the Trigger notification will fire, and within a
138+
minute the Slack channel will receive an alert like:
139+
140+
![Trigger slack alert](/trigger-slack-alert.png)

i18n/zh/docusaurus-plugin-content-docs/current/enterprise/overview.md

Lines changed: 1 addition & 1 deletion
@@ -33,7 +33,7 @@ The Enterprise edition also offers more enhanced features that help enterprises optimize data efficiency and
3333
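- Elasticsearch query compatibility: Use GreptimeDB as the backend in Kibana.
3434
- Greptime Enterprise Management Console: An enhanced management interface that provides more cluster management and monitoring features.
3535
- [Read Replica](./read-replica.md): Datanodes dedicated to running complex queries, so that real-time writes are not affected.
36-
- Triggers: Periodically query and check preconfigured rules, and trigger external webhooks. Compatible with Prometheus
36+
- [Triggers](./trigger/overview.md): Periodically query and check preconfigured rules, and trigger external webhooks. Compatible with Prometheus
3737
Alertmanager.
3838
- Reliability features for Flow.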
3939

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
1+
---
2+
keywords: [Trigger, Alert, GreptimeDB Enterprise, SQL, Webhook]
3+
description: The overview of GreptimeDB Trigger.
4+
---
5+
6+
# Trigger
7+
8+
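Trigger allows users to define trigger rules with SQL statements. GreptimeDB evaluates these rules
9+
periodically and sends out a notification once the condition is met.
10+
11+
- [Quick Start Example](./quick-start.md): A step-by-step guide to using Trigger to monitor system load and
12+
raise an alert when the load is too high.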
Lines changed: 133 additions & 0 deletions
@@ -0,0 +1,133 @@
1+
---
2+
keywords: [Trigger, Alert, GreptimeDB Enterprise, SQL, Webhook, Alertmanager, Slack]
3+
description: This guide demonstrates how GreptimeDB Triggers integrate seamlessly with the Prometheus Alertmanager ecosystem for monitoring and alerting.
4+
---
5+
6+
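## Quick Start Example
7+
8+
This section walks through an end-to-end example that uses Trigger to monitor system load and raise an alert.
9+
10+
The diagram below illustrates the complete end-to-end workflow of the example.
11+
12+
![Trigger demo architecture](/trigger-demo-architecture.png)
13+
14+
1. Vector continuously collects host metrics and writes them to GreptimeDB.
15+
2. A Trigger in GreptimeDB evaluates the rule every minute; when the condition is met, it sends a
16+
notification to Alertmanager.
17+
3. Alertmanager groups, inhibits, and routes alerts according to its own configuration, and finally delivers the message
18+
to the designated channel through its Slack integration.
19+
20+
## Use Vector to Collect Host Metrics
21+
22+
First, use Vector to collect the load data of the local host and write it to GreptimeDB. A Vector configuration
23+
example is shown below: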
24+
25+
```toml
26+
[sources.in]
27+
type = "host_metrics"
28+
scrape_interval_secs = 15
29+
30+
[sinks.out]
31+
inputs = ["in"]
32+
type = "greptimedb"
33+
endpoint = "localhost:4001"
34+
```
35+
36+
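GreptimeDB automatically creates tables when data is written. The `host_load1` table records the load1 data;
37+
load1 is a key performance indicator for measuring system activity. We can create a monitoring rule to track the values in this table. The table schema
38+
is shown below: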
39+
40+
```text
41+
+-----------+----------------------+------+------+---------+---------------+
42+
| Column | Type | Key | Null | Default | Semantic Type |
43+
+-----------+----------------------+------+------+---------+---------------+
44+
| ts | TimestampMillisecond | PRI | NO | | TIMESTAMP |
45+
| collector | String | PRI | YES | | TAG |
46+
| host | String | PRI | YES | | TAG |
47+
| val | Float64 | | YES | | FIELD |
48+
+-----------+----------------------+------+------+---------+---------------+
49+
```
50+
51+
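## Configure Alertmanager with Slack Integration
52+
53+
The Webhook payload of GreptimeDB Trigger is compatible with [Prometheus Alertmanager](https://prometheus.io/docs/alerting/latest/alertmanager/),
54+
so we can reuse Alertmanager's grouping, inhibition, silencing, and routing features without any extra
55+
glue code.
56+
57+
You can refer to the [official documentation](https://prometheus.io/docs/alerting/latest/configuration/)
58+
to configure Prometheus Alertmanager. To present consistent, readable content in Slack messages, you can
59+
configure the following message template.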
60+
61+
```text
62+
{{ define "slack.text" }}
63+
{{ range .Alerts }}
64+
65+
Labels:
66+
{{- range .Labels.SortedPairs }}
67+
- {{ .Name }}: {{ .Value }}
68+
{{ end }}
69+
70+
Annotations:
71+
{{- range .Annotations.SortedPairs }}
72+
- {{ .Name }}: {{ .Value }}
73+
{{ end }}
74+
75+
{{ end }}
76+
{{ end }}
77+
```
78+
79+
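When rendering a Slack message, the above template iterates over all alerts and displays the labels and annotations of each alert.
80+
81+
Once the configuration is complete, start Alertmanager.
82+
83+
## Create Trigger
84+
85+
Create the Trigger in GreptimeDB. Connect to GreptimeDB with a MySQL client and run the following SQL: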
86+
87+
```sql
88+
CREATE TRIGGER IF NOT EXISTS load1_monitor
89+
ON (
90+
SELECT collector AS label_collector,
91+
host as label_host,
92+
val
93+
FROM host_load1 WHERE val > 10 and ts >= now() - '1 minutes'::INTERVAL
94+
) EVERY '1 minute'::INTERVAL
95+
LABELS (severity=warning)
96+
ANNOTATIONS (comment='Your computer is smoking, should take a break.')
97+
NOTIFY(
98+
WEBHOOK alert_manager URL 'http://localhost:9093' WITH (timeout="1m")
99+
);
100+
```
101+
102+
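The above SQL creates a trigger named `load1_monitor` that runs every minute. It evaluates the data in the `host_load1`
103+
table from the last 60 seconds; if any load1 value exceeds 10, the `WEBHOOK` option in the `NOTIFY` clause
104+
tells the Trigger to send a notification to the Alertmanager running on localhost at port 9093.
105+
106+
Execute `SHOW TRIGGERS` to view the list of created triggers.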
107+
108+
```sql
109+
SHOW TRIGGERS;
110+
```
111+
112+
The output should look like this:
113+
114+
```text
115+
+---------------+
116+
| Triggers |
117+
+---------------+
118+
| load1_monitor |
119+
+---------------+
120+
```
121+
122+
## Test Trigger
123+
124+
Use [stress-ng](https://github.com/ColinIanKing/stress-ng) to simulate high CPU load for 60 seconds:
125+
126+
```bash
127+
stress-ng --cpu 100 --cpu-load 10 --timeout 60
128+
```
129+
130+
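The load1 value will rise quickly, the Trigger notification will fire, and within a minute the designated Slack channel will receive an alert like the
131+
following:
132+
133+
![Slack alert example](/trigger-slack-alert.png)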

sidebars.ts

Lines changed: 8 additions & 0 deletions
@@ -572,6 +572,14 @@ const sidebars: SidebarsConfig = {
572572
'enterprise/release-notes/release-24_11',
573573
]
574574
},
575+
{
576+
type: 'category',
577+
label: 'Trigger',
578+
items: [
579+
'enterprise/trigger/overview',
580+
'enterprise/trigger/quick-start',
581+
]
582+
},
575583
],
576584
},
577585
{
58.6 KB

static/trigger-slack-alert.png

26.6 KB