Merged
2 changes: 1 addition & 1 deletion docs/enterprise/overview.md
@@ -32,7 +32,7 @@ This section provides an overview of the advanced features available in Greptime
carries more cluster management and monitoring features.
- [Read Replica](./read-replica.md): Read-only datanode instances for heavy query workloads such as
analytical queries.
- Triggers: Periodically evaluate your rules and trigger external
- [Triggers](./trigger/overview.md): Periodically evaluate your rules and trigger external
  webhooks. Compatible with Prometheus Alertmanager.
- Reliability features for Flow.

13 changes: 13 additions & 0 deletions docs/enterprise/trigger/overview.md
@@ -0,0 +1,13 @@
---
keywords: [Trigger, GreptimeDB Enterprise, SQL, Webhook]
description: The overview of GreptimeDB Trigger.
---

# Trigger

Trigger allows you to define evaluation rules with SQL.
GreptimeDB evaluates these rules periodically; once the condition is met, a
notification is sent out.
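The evaluate-then-notify cycle described above can be sketched as follows. This is a minimal model, not GreptimeDB's implementation: `runQuery` and `notify` are hypothetical callbacks standing in for executing the rule's SQL and sending the notification, and the sketch assumes that a non-empty result set means the condition is met. Ticks are simulated in a loop instead of using real timers so the behavior is easy to inspect.

```typescript
// Hypothetical row type: one result row of the rule's SQL query.
type Row = Record<string, string | number | null>;

// Hypothetical trigger definition; none of these names come from GreptimeDB.
interface TriggerSketch {
  name: string;
  intervalMs: number; // how often the rule is evaluated (the EVERY clause)
  runQuery: (nowMs: number) => Row[]; // executes the rule's SQL at time nowMs
  notify: (name: string, rows: Row[]) => void; // sends the notification
}

// Evaluate the rule once: in this model the condition is met when rows come back.
function evaluateOnce(t: TriggerSketch, nowMs: number): boolean {
  const rows = t.runQuery(nowMs);
  if (rows.length === 0) return false;
  t.notify(t.name, rows);
  return true;
}

// Simulate `ticks` consecutive evaluations starting at startMs, without real timers.
function simulate(t: TriggerSketch, startMs: number, ticks: number): number {
  let fired = 0;
  for (let i = 0; i < ticks; i++) {
    if (evaluateOnce(t, startMs + i * t.intervalMs)) fired++;
  }
  return fired;
}

// Usage: a demo rule whose query returns a row only in every other window.
const sent: string[] = [];
const demo: TriggerSketch = {
  name: "load1_monitor",
  intervalMs: 60000,
  runQuery: (nowMs) => ((nowMs / 60000) % 2 === 1 ? [{ label_host: "web-1", val: 12.5 }] : []),
  notify: (name, rows) => sent.push(`${name}: ${rows.length} row(s)`),
};
const firedCount = simulate(demo, 0, 4); // evaluates at 0 s, 60 s, 120 s, 180 s
```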

- [Quick Start Example](./quick-start.md): A step-by-step guide to set up a
Trigger that monitors system load and raises alerts.
134 changes: 134 additions & 0 deletions docs/enterprise/trigger/quick-start.md
@@ -0,0 +1,134 @@
---
keywords: [Trigger, Alert, GreptimeDB Enterprise, SQL, Webhook, Alertmanager, Slack]
description: This guide demonstrates how GreptimeDB Triggers enable seamless integration with the Prometheus Alertmanager ecosystem for comprehensive monitoring and alerting.
---

# Quick Start Example

This section walks through an end-to-end example that uses Trigger to monitor
system load (load1) and raise an alert.

"load1" refers to the load average of the Linux system over the past minute.
It is one of the key performance indicators for measuring how busy the system is.

The payload of GreptimeDB Trigger's Webhook is compatible with Prometheus
Alertmanager, so we can reuse Alertmanager’s grouping, inhibition, silencing and
routing features without any extra glue code.
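For reference, Alertmanager's v2 API (`POST /api/v2/alerts`) accepts a JSON array of alerts, each with `labels` (required) and optional `annotations`, `startsAt`, `endsAt`, and `generatorURL`. The sketch below builds one such alert for the load1 example. Which fields GreptimeDB actually populates, and the `alertname` label used here, are assumptions, so treat this as an illustration of the shape rather than the Trigger's exact payload.

```typescript
// Shape of an alert as accepted by Alertmanager's v2 API (POST /api/v2/alerts).
// Only `labels` is required; the other fields are optional.
interface PostableAlert {
  labels: Record<string, string>;
  annotations?: Record<string, string>;
  startsAt?: string; // RFC 3339 timestamp
  endsAt?: string; // RFC 3339 timestamp
  generatorURL?: string;
}

// Build an alert for one breaching host. All values are illustrative assumptions,
// including the `alertname` label, which is not confirmed to be set by GreptimeDB.
function buildLoadAlert(host: string, collector: string, firedAt: Date): PostableAlert {
  return {
    labels: { alertname: "load1_monitor", severity: "warning", host, collector },
    annotations: { comment: "Your computer is smoking, should take a break." },
    startsAt: firedAt.toISOString(),
  };
}

// The request body is an array of alerts, even when there is only one.
const body = JSON.stringify([buildLoadAlert("web-1", "load", new Date(Date.UTC(2025, 0, 1)))]);
```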

The following diagram illustrates the complete end-to-end workflow of the example.

![Trigger demo architecture](/trigger-demo-architecture.png)

1. Vector continuously scrapes host metrics and writes them to GreptimeDB.
2. A Trigger in GreptimeDB evaluates the rule `load1 > 10` every minute; whenever
the condition is met, it sends a notification to Alertmanager.
3. Alertmanager applies its own policies and finally delivers the alert to Slack.
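To make step 2 concrete, the sketch below mirrors what the rule computes over one evaluation window, following the `WHERE val > 10 AND ts >= now() - '1 minute'` condition of the `CREATE TRIGGER` statement in this guide. The `Sample` type and the default window and threshold parameters are illustrative, not part of GreptimeDB.

```typescript
// One load1 sample, mirroring a row of the host_load1 table (ts in epoch milliseconds).
interface Sample {
  ts: number;
  host: string;
  val: number;
}

// Samples that satisfy the rule in the look-back window ending at nowMs.
// Like the SQL, there is an inclusive lower bound on ts and no upper bound.
function breaches(samples: Sample[], nowMs: number, windowMs = 60000, threshold = 10): Sample[] {
  return samples.filter((s) => s.ts >= nowMs - windowMs && s.val > threshold);
}

// Distinct hosts that would show up in the alert, sorted for stable output.
function breachingHosts(samples: Sample[], nowMs: number): string[] {
  const hosts: string[] = [];
  for (const s of breaches(samples, nowMs)) {
    if (hosts.indexOf(s.host) < 0) hosts.push(s.host);
  }
  return hosts.sort();
}

// Usage: only samples from the last 60 s with val > 10 count.
const samples: Sample[] = [
  { ts: 0, host: "a", val: 12 }, // too old for the window ending at 120 s
  { ts: 50000, host: "b", val: 9 }, // too old and below the threshold
  { ts: 90000, host: "b", val: 11 },
  { ts: 100000, host: "a", val: 15 },
];
const hostsNow = breachingHosts(samples, 120000);
```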

## Prerequisites

Use Vector to scrape host metrics and write them to GreptimeDB. Below is a Vector
configuration example:

```toml
[sources.in]
type = "host_metrics"
scrape_interval_secs = 15

[sinks.out]
inputs = ["in"]
type = "greptimedb"
endpoint = "localhost:4001"
```
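With `scrape_interval_secs = 15`, each one-minute window evaluated by the Trigger created later in this guide holds only a handful of load1 samples per host, so a single high sample is enough to satisfy the rule. The helper below makes the arithmetic explicit under a simplifying assumption: scrapes happen exactly at multiples of the interval. It is a back-of-the-envelope aid, not part of Vector or GreptimeDB.

```typescript
// Count scrape timestamps k * intervalSecs (k >= 0) inside the window
// [nowSecs - windowSecs, nowSecs]. The lower bound is inclusive, matching
// `ts >= now() - '1 minute'::INTERVAL`; nowSecs is treated as the upper bound.
function samplesInWindow(nowSecs: number, windowSecs: number, intervalSecs: number): number {
  const start = Math.max(0, nowSecs - windowSecs);
  const first = Math.ceil(start / intervalSecs); // first scrape index inside the window
  const last = Math.floor(nowSecs / intervalSecs); // last scrape index inside the window
  return last >= first ? last - first + 1 : 0;
}

// Depending on alignment, a 60 s window sees 4 or 5 samples at a 15 s interval.
const aligned = samplesInWindow(120, 60, 15); // window [60, 120] includes both ends
const unaligned = samplesInWindow(125, 60, 15); // window [65, 125]
```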

GreptimeDB auto-creates tables on the first write. The resulting `host_load1`
table stores the load1 metrics; its schema is shown below:

```text
+-----------+----------------------+------+------+---------+---------------+
| Column | Type | Key | Null | Default | Semantic Type |
+-----------+----------------------+------+------+---------+---------------+
| ts | TimestampMillisecond | PRI | NO | | TIMESTAMP |
| collector | String | PRI | YES | | TAG |
| host | String | PRI | YES | | TAG |
| val | Float64 | | YES | | FIELD |
+-----------+----------------------+------+------+---------+---------------+
```
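The table name appears to be derived from the Vector metric namespace and name (`host` + `load1` gives `host_load1`). The sketch below encodes that assumed naming rule, together with a TypeScript type that mirrors the schema above. Both are illustrative, so check the real table names with `SHOW TABLES` in your deployment.

```typescript
// Row type mirroring the host_load1 schema above.
// ts is the time index; collector and host are tags (primary key); val is the field.
interface HostLoad1Row {
  ts: number; // TimestampMillisecond, as epoch milliseconds
  collector: string | null;
  host: string | null;
  val: number | null;
}

// Assumed naming rule (hypothetical helper): "<namespace>_<metric name>".
function tableNameFor(namespace: string, metricName: string): string {
  return namespace ? `${namespace}_${metricName}` : metricName;
}

// The tag values identify one time series within the table.
function seriesKey(row: HostLoad1Row): string {
  return `${row.collector ?? ""}/${row.host ?? ""}`;
}

const exampleRow: HostLoad1Row = { ts: 0, collector: "load", host: "web-1", val: 1.5 };
```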

Set up Alertmanager with a Slack receiver. Below is a minimal message template
you can use:

```text
{{ define "slack.text" }}
{{ range .Alerts }}

Labels:
{{- range .Labels.SortedPairs }}
- {{ .Name }}: {{ .Value }}
{{ end }}

Annotations:
{{- range .Annotations.SortedPairs }}
- {{ .Name }}: {{ .Value }}
{{ end }}

{{ end }}
{{ end }}
```

When used to render a Slack message, the template above iterates over all alerts
and displays the labels and annotations of each alert.
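The sketch below approximates what the `slack.text` template produces, which can help when previewing the message layout. It mirrors only the structure: a `Labels:` section and an `Annotations:` section per alert, each listing `- name: value` pairs sorted by name. Whitespace handling in Go's `text/template` differs, and the exact ordering rules of `.SortedPairs` are simplified to a plain sort here.

```typescript
// Minimal model of an alert as seen by the template.
interface TemplateAlert {
  labels: Record<string, string>;
  annotations: Record<string, string>;
}

// "- name: value" lines sorted by name, approximating .SortedPairs.
function sortedPairs(m: Record<string, string>): string[] {
  return Object.keys(m)
    .sort()
    .map((k) => `- ${k}: ${m[k]}`);
}

// Render one block per alert; blocks are separated by a blank line.
function renderSlackText(alerts: TemplateAlert[]): string {
  const blocks = alerts.map((a) =>
    ["Labels:", ...sortedPairs(a.labels), "Annotations:", ...sortedPairs(a.annotations)].join("\n"),
  );
  return blocks.join("\n\n");
}

const preview = renderSlackText([
  { labels: { severity: "warning", host: "web-1" }, annotations: { comment: "hot" } },
]);
```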

Start Alertmanager once the configuration is ready.


## Create Trigger

Connect to GreptimeDB with a MySQL client and run the following SQL:

```sql
CREATE TRIGGER IF NOT EXISTS load1_monitor
ON (
    SELECT collector AS label_collector,
           host AS label_host,
           val
    FROM host_load1 WHERE val > 10 AND ts >= now() - '1 minute'::INTERVAL
) EVERY '1 minute'::INTERVAL
LABELS (severity=warning)
ANNOTATIONS (comment='Your computer is smoking, should take a break.')
NOTIFY(
    WEBHOOK alert_manager URL 'http://localhost:9093' WITH (timeout="1m")
);
```

The SQL above creates a trigger named `load1_monitor` that runs every minute.
It evaluates the last 60 seconds of data in `host_load1`; if any load1 value
exceeds 10, it sends a notification to Alertmanager.
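The query aliases `collector` and `host` as `label_collector` and `label_host`, which suggests that result columns prefixed with `label_` become alert labels. The sketch below shows that mapping under this assumption: each row's `label_*` columns (prefix stripped) are merged with the static `LABELS` clause, and the `ANNOTATIONS` clause is copied as-is. How GreptimeDB treats other columns such as `val` is not modeled, and `rowToAlert` is a hypothetical helper.

```typescript
type Row = Record<string, string | number | null>;

interface Alert {
  labels: Record<string, string>;
  annotations: Record<string, string>;
}

// Assumed convention, inferred from the aliases in the CREATE TRIGGER query.
const LABEL_PREFIX = "label_";

// Turn one result row into an alert. Static labels are applied first, so a
// row label with the same name overrides it (a design choice of this sketch).
function rowToAlert(row: Row, staticLabels: Record<string, string>, annotations: Record<string, string>): Alert {
  const labels: Record<string, string> = {};
  for (const k of Object.keys(staticLabels)) labels[k] = staticLabels[k];
  for (const col of Object.keys(row)) {
    const v = row[col];
    if (col.indexOf(LABEL_PREFIX) === 0 && v !== null) {
      labels[col.slice(LABEL_PREFIX.length)] = String(v);
    }
  }
  const ann: Record<string, string> = {};
  for (const k of Object.keys(annotations)) ann[k] = annotations[k];
  return { labels, annotations: ann };
}

const alert = rowToAlert(
  { label_collector: "load", label_host: "web-1", val: 12.3 },
  { severity: "warning" },
  { comment: "Your computer is smoking, should take a break." },
);
```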

Execute `SHOW TRIGGERS` to view the list of created Triggers.

```sql
SHOW TRIGGERS;
```

The output should look like this:

```text
+---------------+
| Triggers |
+---------------+
| load1_monitor |
+---------------+
```

## Test Trigger

Use stress-ng to simulate high CPU load for 60 seconds:

```bash
stress-ng --cpu 100 --cpu-load 10 --timeout 60
```
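When reading the test results, keep in mind that Linux updates load1 as an exponentially damped moving average about every 5 seconds: each update computes `load = load * e^(-5/60) + runnable * (1 - e^(-5/60))`. As a result, load1 approaches the number of runnable tasks gradually instead of jumping to it. The sketch below uses the floating-point form of that update (the kernel uses a fixed-point approximation) and assumes a constant runnable-task count starting from zero load, which is a simplification. Under this model, after 60 seconds load1 reaches only about 63% of the sustained runnable count.

```typescript
// Linux recomputes the load averages roughly every 5 seconds.
const UPDATE_SECS = 5;
// Decay factor for the 1-minute average, exp(-5/60), in floating point.
const DECAY_1MIN = Math.exp(-UPDATE_SECS / 60);

// Simulate load1 after `seconds` of a constant number of runnable tasks,
// starting from `initial`. Each update blends the old value with the runnable count.
function simulateLoad1(runnable: number, seconds: number, initial = 0): number {
  let load = initial;
  const updates = Math.floor(seconds / UPDATE_SECS);
  for (let i = 0; i < updates; i++) {
    load = load * DECAY_1MIN + runnable * (1 - DECAY_1MIN);
  }
  return load;
}

// After 60 s (12 updates) from zero, load1 is runnable * (1 - e^-1).
const withTen = simulateLoad1(10, 60);
const withTwenty = simulateLoad1(20, 60);
```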

load1 rises quickly, the Trigger fires a notification, and within a minute the
Slack channel receives an alert like:

![Trigger slack alert](/trigger-slack-alert.png)
@@ -33,7 +33,7 @@ The Enterprise edition also offers more enhanced features that help enterprises optimize data efficiency and significantly
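- Elasticsearch query compatibility: use GreptimeDB as the backend in Kibana.
- Greptime Enterprise Management Console: an enhanced management interface with more cluster management and monitoring features.
- [Read Replica](./read-replica.md): datanodes dedicated to running complex queries, so that they do not affect real-time writes.
- Triggers: periodically query and evaluate preconfigured rules; can trigger external webhooks; compatible with Prometheus
- [Triggers](./trigger/overview.md): periodically query and evaluate preconfigured rules; can trigger external webhooks; compatible with Prometheus
  Alertmanager.
- Reliability features for Flow.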

@@ -0,0 +1,12 @@
---
keywords: [Trigger, Alert, GreptimeDB Enterprise, SQL, Webhook]
description: Overview of GreptimeDB Trigger.
---

# Trigger

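Trigger lets users define trigger rules with SQL statements. GreptimeDB evaluates
these rules periodically and sends out a notification once a condition is met.

- [Quick Start Example](./quick-start.md): A step-by-step guide to using Trigger to
  monitor system load and raise an alert when the load is too high.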
@@ -0,0 +1,131 @@
---
keywords: [Trigger, Alert, GreptimeDB Enterprise, SQL, Webhook, Alertmanager, Slack]
description: This guide demonstrates how GreptimeDB Trigger integrates seamlessly with the Prometheus Alertmanager ecosystem to provide monitoring and alerting.
---

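# Quick Start Example

This section walks through an end-to-end example that uses Trigger to monitor
system load (load1) and raise an alert.

"load1" refers to the Linux system load average over the past minute. It is one
of the key performance indicators for measuring how busy the system is.

In addition, GreptimeDB's Webhook output format is fully compatible with
Prometheus Alertmanager, so it can plug directly into the Alertmanager ecosystem.

The following diagram shows the complete end-to-end workflow of the example.

![Trigger demo architecture](/trigger-demo-architecture.png)

1. Vector continuously collects host metrics and writes them to GreptimeDB.
2. A Trigger in GreptimeDB evaluates the rule `load1 > 10` every minute; when the
   condition is met, it sends a notification to Alertmanager.
3. Alertmanager groups, inhibits, and routes the alert according to its own
   configuration, and finally delivers the message to the designated channel
   through its Slack integration.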
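Step 3 relies on Alertmanager's grouping. As a rough model, Alertmanager batches alerts that share the same values for the labels listed in a route's `group_by` setting into one notification. The sketch below reproduces only that bucketing step with a hypothetical `groupAlerts` helper; timing options such as `group_wait` and `group_interval`, as well as inhibition and silencing, are not modeled.

```typescript
interface SimpleAlert {
  labels: Record<string, string>;
}

interface AlertGroup {
  key: string;
  alerts: SimpleAlert[];
}

// Group key built from the values of the group_by labels (missing labels count as "").
function groupKey(alert: SimpleAlert, groupBy: string[]): string {
  return groupBy.map((l) => `${l}=${alert.labels[l] ?? ""}`).join(",");
}

// Bucket alerts by group key, keeping groups in first-seen order.
function groupAlerts(alerts: SimpleAlert[], groupBy: string[]): AlertGroup[] {
  const groups: AlertGroup[] = [];
  for (const a of alerts) {
    const key = groupKey(a, groupBy);
    let g = groups.filter((x) => x.key === key)[0];
    if (!g) {
      g = { key, alerts: [] };
      groups.push(g);
    }
    g.alerts.push(a);
  }
  return groups;
}

const incoming: SimpleAlert[] = [
  { labels: { alertname: "load1_monitor", host: "a" } },
  { labels: { alertname: "load1_monitor", host: "b" } },
  { labels: { alertname: "other", host: "a" } },
];
const byName = groupAlerts(incoming, ["alertname"]);
```

Grouping by `alertname` alone yields one notification for both load1 alerts; adding `host` to `group_by` would split them into separate notifications.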

## Prerequisites

First, use Vector to collect the local host's load data and write it into
GreptimeDB. A Vector configuration example is shown below:

```toml
[sources.in]
type = "host_metrics"
scrape_interval_secs = 15

[sinks.out]
inputs = ["in"]
type = "greptimedb"
endpoint = "localhost:4001"
```

GreptimeDB automatically creates tables when data is written. The `host_load1`
table records the load1 data; its schema is shown below:

```text
+-----------+----------------------+------+------+---------+---------------+
| Column | Type | Key | Null | Default | Semantic Type |
+-----------+----------------------+------+------+---------+---------------+
| ts | TimestampMillisecond | PRI | NO | | TIMESTAMP |
| collector | String | PRI | YES | | TAG |
| host | String | PRI | YES | | TAG |
| val | Float64 | | YES | | FIELD |
+-----------+----------------------+------+------+---------+---------------+
```

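The detailed steps for configuring Alertmanager's Slack receiver are not covered
here. To present consistent, readable content in Slack messages, you can configure
the following template: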

```text
{{ define "slack.text" }}
{{ range .Alerts }}

Labels:
{{- range .Labels.SortedPairs }}
- {{ .Name }}: {{ .Value }}
{{ end }}

Annotations:
{{- range .Annotations.SortedPairs }}
- {{ .Name }}: {{ .Value }}
{{ end }}

{{ end }}
{{ end }}
```

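When used to render a Slack message, the template above iterates over all alerts
and displays the labels and annotations of each alert.

Once the configuration is complete, start Alertmanager.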
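Before creating the Trigger, it can help to confirm that the Slack receiver works by posting a hand-made alert to Alertmanager's v2 API (`POST /api/v2/alerts`, which accepts a JSON array of alerts). The sketch below only builds the request; `buildTestAlertRequest` is a hypothetical helper, the base URL assumes Alertmanager's default port 9093 on the local machine, and the labels are illustrative. Any HTTP client can send the result, for example `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })` in Node.js 18 or later.

```typescript
interface TestAlertRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Build a request that posts one test alert to Alertmanager's v2 alerts endpoint.
function buildTestAlertRequest(
  baseUrl: string,
  labels: Record<string, string>,
  annotations: Record<string, string>,
): TestAlertRequest {
  const url = baseUrl.replace(/\/+$/, "") + "/api/v2/alerts"; // avoid a double slash
  const alerts = [{ labels, annotations }]; // the endpoint expects an array
  return {
    url,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(alerts),
  };
}

// A test alert shaped like the ones the load1 example is expected to produce.
const req = buildTestAlertRequest(
  "http://localhost:9093/",
  { alertname: "load1_monitor_test", severity: "warning", host: "web-1" },
  { comment: "Test alert for the Slack receiver." },
);
```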

## Create Trigger

Create a Trigger in GreptimeDB. Connect to GreptimeDB with a MySQL client and run
the following SQL:

```sql
CREATE TRIGGER IF NOT EXISTS load1_monitor
ON (
    SELECT collector AS label_collector,
           host AS label_host,
           val
    FROM host_load1 WHERE val > 10 AND ts >= now() - '1 minute'::INTERVAL
) EVERY '1 minute'::INTERVAL
LABELS (severity=warning)
ANNOTATIONS (comment='Your computer is smoking, should take a break.')
NOTIFY(
    WEBHOOK alert_manager URL 'http://localhost:9093' WITH (timeout="1m")
);
```

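The SQL above creates a trigger named `load1_monitor` that runs once every minute.
It evaluates the data from the last 60 seconds in the `host_load1` table; if any
load1 value exceeds 10, GreptimeDB sends a notification to Alertmanager.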

Run `SHOW TRIGGERS` to view the list of created triggers.

```sql
SHOW TRIGGERS;
```

The output should look like this:

```text
+---------------+
| Triggers |
+---------------+
| load1_monitor |
+---------------+
```

## Test Trigger

Use stress-ng to simulate 60 seconds of high CPU load:

```bash
stress-ng --cpu 100 --cpu-load 10 --timeout 60
```

The load1 value rises quickly and the Trigger fires; within one minute, the
designated Slack channel receives an alert like the following:

![Slack alert example](/trigger-slack-alert.png)
8 changes: 8 additions & 0 deletions sidebars.ts
@@ -572,6 +572,14 @@ const sidebars: SidebarsConfig = {
'enterprise/release-notes/release-24_11',
]
},
{
type: 'category',
label: 'Trigger',
items: [
'enterprise/trigger/overview',
'enterprise/trigger/quick-start',
]
},
],
},
{
Binary file added static/trigger-demo-architecture.png
Binary file added static/trigger-slack-alert.png