2 changes: 1 addition & 1 deletion docs/enterprise/overview.md
@@ -32,7 +32,7 @@ This section provides an overview of the advanced features available in Greptime
carries more cluster management and monitoring features.
- [Read Replica](./read-replica.md): Read-only datanode instances for heavy query workloads such as
analytical queries.
-- Triggers: Periodically evaluate your rules and trigger external
+- [Triggers](./trigger/overview.md): Periodically evaluate your rules and trigger external
webhook. Compatible with Prometheus Alertmanager.
- Reliability features for Flow.

13 changes: 13 additions & 0 deletions docs/enterprise/trigger/overview.md
@@ -0,0 +1,13 @@
---
keywords: [Trigger, GreptimeDB Enterprise, SQL, Webhook]
description: An overview of GreptimeDB Trigger.
---

# Trigger

Trigger allows you to define evaluation rules with SQL.
GreptimeDB evaluates these rules periodically; once the condition is met, a
notification is sent out.

- [Quick Start Example](./quick-start.md): A step-by-step guide to set up a
Trigger that monitors system load and raises alerts.
140 changes: 140 additions & 0 deletions docs/enterprise/trigger/quick-start.md
@@ -0,0 +1,140 @@
---
keywords: [Trigger, Alert, GreptimeDB Enterprise, SQL, Webhook, Alertmanager, Slack]
description: This guide demonstrates how GreptimeDB Triggers enable seamless integration with the Prometheus Alertmanager ecosystem for comprehensive monitoring and alerting.
---

# Quick Start Example

This section walks through an end-to-end example that uses Trigger to monitor
system load and raise an alert.

The following diagram illustrates the complete workflow of the example.

![Trigger demo architecture](/trigger-demo-architecture.png)

1. Vector continuously scrapes host metrics and writes them to GreptimeDB.
2. A Trigger in GreptimeDB evaluates a rule every minute; whenever the condition
is met, it sends a notification to Alertmanager.
3. Alertmanager applies its own policies and finally delivers the alert to Slack.

## Use Vector to Scrape Host Metrics

Use Vector to scrape host metrics and write them to GreptimeDB. Below is a Vector
configuration example:

```toml
[sources.in]
type = "host_metrics"
scrape_interval_secs = 15

[sinks.out]
inputs = ["in"]
type = "greptimedb"
endpoint = "localhost:4001"
```

GreptimeDB auto-creates tables on the first write. The `host_load1` table stores
the system load averaged over the last minute. It is a key performance indicator
for measuring system activity. We can create a monitoring rule to track values
in this table. The schema of this table is shown below:

```text
+-----------+----------------------+------+------+---------+---------------+
| Column | Type | Key | Null | Default | Semantic Type |
+-----------+----------------------+------+------+---------+---------------+
| ts | TimestampMillisecond | PRI | NO | | TIMESTAMP |
| collector | String | PRI | YES | | TAG |
| host | String | PRI | YES | | TAG |
| val | Float64 | | YES | | FIELD |
+-----------+----------------------+------+------+---------+---------------+
```
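
Before defining a rule, it can help to confirm that samples are actually arriving in `host_load1`. The snippet below is a minimal sketch that assumes GreptimeDB is listening for MySQL-protocol connections on its default port 4002 on the local machine and that the `mysql` CLI is installed; the `show_latest_load1` helper and the environment variable names are made up for this example.

```bash
# Assumption: GreptimeDB's MySQL-protocol endpoint is 127.0.0.1:4002 (the default).
GREPTIME_HOST="${GREPTIME_HOST:-127.0.0.1}"
GREPTIME_MYSQL_PORT="${GREPTIME_MYSQL_PORT:-4002}"

# The five most recent load1 samples, newest first.
# Column names come from the schema shown above.
LATEST_LOAD1_SQL='SELECT ts, host, val FROM host_load1 ORDER BY ts DESC LIMIT 5;'

# Hypothetical helper: run the query through the mysql CLI.
show_latest_load1() {
  mysql -h "$GREPTIME_HOST" -P "$GREPTIME_MYSQL_PORT" -e "$LATEST_LOAD1_SQL"
}
```

Calling `show_latest_load1` a few times, about 15 seconds apart, should show new rows appearing if Vector is writing successfully.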

## Set up Alertmanager with a Slack Receiver

The payload of GreptimeDB Trigger's Webhook is compatible with [Prometheus
Alertmanager](https://prometheus.io/docs/alerting/latest/alertmanager/), so we
can reuse Alertmanager’s grouping, inhibition, silencing and routing features
without any extra glue code.

You can refer to the [official documentation](https://prometheus.io/docs/alerting/latest/configuration/)
to configure Prometheus Alertmanager. Below is a minimal message template you
can use:

```text
{{ define "slack.text" }}
{{ range .Alerts }}

Labels:
{{- range .Labels.SortedPairs }}
- {{ .Name }}: {{ .Value }}
{{ end }}

Annotations:
{{- range .Annotations.SortedPairs }}
- {{ .Name }}: {{ .Value }}
{{ end }}

{{ end }}
{{ end }}
```

When Alertmanager renders a Slack message with this template, it iterates over
all alerts and displays the labels and annotations of each one.
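
As a rough sketch of how this template could be wired into a Slack receiver, the shell snippet below writes a minimal `alertmanager.yml`. The webhook URL, channel name, directory layout, and the `write_alertmanager_config` helper are placeholders invented for illustration; `route.receiver`, `slack_configs`, and `templates` are standard Alertmanager configuration keys. Adapt it to your own setup rather than treating it as the recommended configuration.

```bash
# Placeholders: replace with your real Slack incoming-webhook URL and channel.
SLACK_WEBHOOK_URL="${SLACK_WEBHOOK_URL:-https://hooks.slack.com/services/REPLACE/ME}"
SLACK_CHANNEL="${SLACK_CHANNEL:-#alerts}"

# Hypothetical helper: write a minimal Alertmanager config into directory $1.
# The message template shown above is expected to be saved as a *.tmpl file
# under "$1/templates".
write_alertmanager_config() {
  mkdir -p "$1/templates"
  cat > "$1/alertmanager.yml" <<EOF
route:
  receiver: slack
receivers:
  - name: slack
    slack_configs:
      - api_url: '${SLACK_WEBHOOK_URL}'
        channel: '${SLACK_CHANNEL}'
        text: '{{ template "slack.text" . }}'
templates:
  - '$1/templates/*.tmpl'
EOF
}
```

For example, `write_alertmanager_config /tmp/alertmanager-demo` produces a file that can be passed to Alertmanager with `--config.file`.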

Start Alertmanager once the configuration is ready.
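
To check that Alertmanager is reachable and that the Slack route works before GreptimeDB is involved, one option is to push a hand-written alert to Alertmanager's v2 API (`POST /api/v2/alerts`). The sketch below assumes Alertmanager listens on `http://localhost:9093` and that `curl` is available; the helper names and the label and annotation values are illustrative only and are not what a Trigger would send.

```bash
# Assumption: Alertmanager is running locally on its default port.
ALERTMANAGER_URL="${ALERTMANAGER_URL:-http://localhost:9093}"

# Hypothetical helper: print a one-element alert list for the host name in $1.
build_test_alert() {
  cat <<EOF
[
  {
    "labels": {"alertname": "manual_test", "severity": "warning", "host": "$1"},
    "annotations": {"comment": "Manual test alert, safe to ignore."}
  }
]
EOF
}

# Hypothetical helper: POST the test alert to Alertmanager's v2 alerts endpoint.
send_test_alert() {
  build_test_alert "$1" |
    curl -sS -X POST -H 'Content-Type: application/json' \
      --data-binary @- "${ALERTMANAGER_URL}/api/v2/alerts"
}
```

Running `send_test_alert "$(hostname)"` should make a message appear in the Slack channel after Alertmanager's grouping delay, which confirms the receiver and template are in place.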


## Create Trigger

Connect to GreptimeDB with a MySQL client and run the following SQL:

```sql
CREATE TRIGGER IF NOT EXISTS load1_monitor
    ON (
        SELECT collector AS label_collector,
               host AS label_host,
               val
        FROM host_load1
        WHERE val > 10 AND ts >= now() - '1 minutes'::INTERVAL
    ) EVERY '1 minute'::INTERVAL
    LABELS (severity=warning)
    ANNOTATIONS (comment='Your computer is smoking, should take a break.')
    NOTIFY(
        WEBHOOK alert_manager URL 'http://localhost:9093' WITH (timeout="1m")
    );
```

The SQL above creates a trigger named `load1_monitor` that runs every minute.
Each run evaluates the last 60 seconds of data in `host_load1`. If any load1
value exceeds 10, the trigger sends a notification to the destination given by
the `WEBHOOK` option of the `NOTIFY` clause: here, the Alertmanager running on
localhost at port 9093.
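
To preview which rows would currently satisfy the rule, the trigger's inner query can be run by hand. This is a minimal sketch assuming the `mysql` CLI and GreptimeDB's default MySQL-protocol port 4002 on localhost; `preview_trigger_rows` and the environment variable names are hypothetical.

```bash
# Assumption: GreptimeDB's MySQL-protocol endpoint is 127.0.0.1:4002 (the default).
GREPTIME_HOST="${GREPTIME_HOST:-127.0.0.1}"
GREPTIME_MYSQL_PORT="${GREPTIME_MYSQL_PORT:-4002}"

# Same condition as the ON (...) clause of the trigger above.
TRIGGER_CONDITION_SQL="SELECT collector AS label_collector, host AS label_host, val
FROM host_load1
WHERE val > 10 AND ts >= now() - '1 minutes'::INTERVAL;"

# Hypothetical helper: run the condition query once through the mysql CLI.
preview_trigger_rows() {
  mysql -h "$GREPTIME_HOST" -P "$GREPTIME_MYSQL_PORT" -e "$TRIGGER_CONDITION_SQL"
}
```

An empty result from `preview_trigger_rows` means no sample in the last minute exceeded the threshold.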

You can execute `SHOW TRIGGERS` to view the list of created Triggers.

```sql
SHOW TRIGGERS;
```

The output should look like this:

```text
+---------------+
| Triggers |
+---------------+
| load1_monitor |
+---------------+
```

## Test Trigger

Use [stress-ng](https://github.com/ColinIanKing/stress-ng) to simulate high CPU
load for 60s:

```bash
stress-ng --cpu 100 --cpu-load 10 --timeout 60
```

The load1 value rises quickly, the Trigger fires a notification, and within a
minute the Slack channel receives an alert like this:

![Trigger slack alert](/trigger-slack-alert.png)
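
While the stress test runs, it can be useful to watch load1 locally and see when it crosses the threshold used by the Trigger. The sketch below assumes a Linux host, where the first field of `/proc/loadavg` is the one-minute load average; the function names are invented for this example.

```bash
# Threshold used in the Trigger's WHERE clause (val > 10).
THRESHOLD=10

# Read the one-minute load average (Linux only: first field of /proc/loadavg).
read_load1() {
  cut -d ' ' -f 1 /proc/loadavg
}

# Succeed (exit status 0) when value $1 is strictly greater than threshold $2.
exceeds_threshold() {
  awk -v v="$1" -v t="$2" 'BEGIN { exit !((v + 0) > (t + 0)) }'
}

# Print load1 every 5 seconds and flag samples above the threshold.
watch_load1() {
  while true; do
    current="$(read_load1)"
    if exceeds_threshold "$current" "$THRESHOLD"; then
      echo "load1=${current} (above ${THRESHOLD}, the Trigger condition should match)"
    else
      echo "load1=${current}"
    fi
    sleep 5
  done
}
```

Run `watch_load1` in a second terminal and stop it with Ctrl-C once the alert has arrived.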
@@ -33,7 +33,7 @@ The Enterprise edition also provides more enhancements that help businesses optimize data efficiency and
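- Elasticsearch query compatibility: Use GreptimeDB as the backend for Kibana.
- Greptime Enterprise Management Console: An enhanced management UI with more cluster management and monitoring features.
- [Read Replica](./read-replica.md): Datanodes dedicated to running complex queries, so that real-time writes are not affected.
-- Triggers: Periodically query and check pre-configured rules and trigger external webhooks. Compatible with Prometheus
+- [Triggers](./trigger/overview.md): Periodically query and check pre-configured rules and trigger external webhooks. Compatible with Prometheus
  Alertmanager.
- Reliability features for Flow.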

@@ -0,0 +1,12 @@
---
keywords: [Trigger, Alert, GreptimeDB Enterprise, SQL, Webhook]
description: An overview of GreptimeDB Trigger.
---

# Trigger

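Trigger allows you to define trigger rules with SQL statements. GreptimeDB evaluates
these rules periodically and sends out a notification once the condition is met.

- [Quick Start Example](./quick-start.md): A step-by-step guide to using Trigger to
monitor system load and raise an alert when the load is too high.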
@@ -0,0 +1,133 @@
---
keywords: [Trigger, Alert, GreptimeDB Enterprise, SQL, Webhook, Alertmanager, Slack]
description: This guide demonstrates how GreptimeDB Triggers integrate seamlessly with the Prometheus Alertmanager ecosystem for monitoring and alerting.
---

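## Quick Start Example

This section walks through an end-to-end example that uses Trigger to monitor system load and raise an alert.

The following diagram shows the complete end-to-end workflow of the example.

![Trigger demo architecture](/trigger-demo-architecture.png)

1. Vector continuously collects host metrics and writes them to GreptimeDB.
2. A Trigger in GreptimeDB evaluates a rule every minute; when the condition is met, it sends a
notification to Alertmanager.
3. Alertmanager groups, inhibits, and routes alerts according to its own configuration, and finally
delivers the message to the specified channel through its Slack integration.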

## Use Vector to Scrape Host Metrics

First, use Vector to collect the load data of the local host and write it to GreptimeDB. Below is a
Vector configuration example:

```toml
[sources.in]
type = "host_metrics"
scrape_interval_secs = 15

[sinks.out]
inputs = ["in"]
type = "greptimedb"
endpoint = "localhost:4001"
```

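GreptimeDB automatically creates tables when data is written. Among them, the `host_load1` table
records load1 data; load1 is a key performance indicator for measuring system activity. We can create
a monitoring rule to track the values in this table. The table schema is shown below: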

```text
+-----------+----------------------+------+------+---------+---------------+
| Column | Type | Key | Null | Default | Semantic Type |
+-----------+----------------------+------+------+---------+---------------+
| ts | TimestampMillisecond | PRI | NO | | TIMESTAMP |
| collector | String | PRI | YES | | TAG |
| host | String | PRI | YES | | TAG |
| val | Float64 | | YES | | FIELD |
+-----------+----------------------+------+------+---------+---------------+
```

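## Set up Alertmanager with a Slack Receiver

The payload of GreptimeDB Trigger's Webhook is compatible with [Prometheus Alertmanager](https://prometheus.io/docs/alerting/latest/alertmanager/),
so we can reuse Alertmanager's grouping, inhibition, silencing, and routing features without any
extra glue code.

You can refer to the [official documentation](https://prometheus.io/docs/alerting/latest/configuration/)
to configure Prometheus Alertmanager. To render consistent, readable content in Slack messages, you
can configure the following message template.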

```text
{{ define "slack.text" }}
{{ range .Alerts }}

Labels:
{{- range .Labels.SortedPairs }}
- {{ .Name }}: {{ .Value }}
{{ end }}

Annotations:
{{- range .Annotations.SortedPairs }}
- {{ .Name }}: {{ .Value }}
{{ end }}

{{ end }}
{{ end }}
```

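When a Slack message is generated with the above template, it iterates over all alerts and displays the labels and annotations of each one.

Start Alertmanager once the configuration is complete.

## Create Trigger

Create a Trigger in GreptimeDB. Connect to GreptimeDB with a MySQL client and run the following SQL: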

```sql
CREATE TRIGGER IF NOT EXISTS load1_monitor
    ON (
        SELECT collector AS label_collector,
               host AS label_host,
               val
        FROM host_load1
        WHERE val > 10 AND ts >= now() - '1 minutes'::INTERVAL
    ) EVERY '1 minute'::INTERVAL
    LABELS (severity=warning)
    ANNOTATIONS (comment='Your computer is smoking, should take a break.')
    NOTIFY(
        WEBHOOK alert_manager URL 'http://localhost:9093' WITH (timeout="1m")
    );
```

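The SQL above creates a trigger named `load1_monitor` that runs every minute. Each run evaluates the
last 60 seconds of data in the `host_load1` table. If any load1 value exceeds 10, the trigger sends a
notification to the destination given by the `WEBHOOK` option of the `NOTIFY` clause: here, the
Alertmanager running on localhost at port 9093.

Execute `SHOW TRIGGERS` to view the list of created Triggers.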

```sql
SHOW TRIGGERS;
```

The output should look like this:

```text
+---------------+
| Triggers |
+---------------+
| load1_monitor |
+---------------+
```

## Test Trigger

Use [stress-ng](https://github.com/ColinIanKing/stress-ng) to simulate high CPU load for 60 seconds:

```bash
stress-ng --cpu 100 --cpu-load 10 --timeout 60
```

The load1 value rises quickly, the Trigger fires a notification, and within a minute the specified
Slack channel receives an alert like this:

![Trigger Slack alert](/trigger-slack-alert.png)
8 changes: 8 additions & 0 deletions sidebars.ts
@@ -572,6 +572,14 @@ const sidebars: SidebarsConfig = {
'enterprise/release-notes/release-24_11',
]
},
{
type: 'category',
label: 'Trigger',
items: [
'enterprise/trigger/overview',
'enterprise/trigger/quick-start',
]
},
],
},
{
Binary file added static/trigger-demo-architecture.png
Binary file added static/trigger-slack-alert.png