Merged
Changes from 2 commits
Original file line number Diff line number Diff line change
@@ -13,7 +13,9 @@ It should be noted that when handling character set matching, Utf-8 standard cha

If the 'pattern' is not a valid regular expression, an error is thrown.

Supported character match classes: https://github.com/google/re2/wiki/Syntax
Supported character match classes: https://www.boost.org/doc/libs/latest/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html

Doris supports enabling more advanced regular expression features, such as look-around zero-width assertions, through the session variable `enable_extended_regex` (default is `false`).

## Syntax

@@ -183,4 +185,22 @@ SELECT regexp_extract_all('hello (world) 123', '([[:alpha:]+');
```text
ERROR 1105 (HY000): errCode = 2, detailMessage = (10.16.10.2)[INVALID_ARGUMENT]Could not compile regexp pattern: ([[:alpha:]+
Error: missing ]: [[:alpha:]+
```

Advanced regexp
```sql
SELECT REGEXP_EXTRACT_ALL('ID:AA-1,ID:BB-2,ID:CC-3', '(?<=ID:)([A-Z]{2}-\\d)');
-- ERROR 1105 (HY000): errCode = 2, detailMessage = (127.0.0.1)[INVALID_ARGUMENT]Invalid regex pattern: (?<=ID:)([A-Z]{2}-\d). Error: invalid perl operator: (?<
```

```sql
SET enable_extended_regex = true;
SELECT REGEXP_EXTRACT_ALL('ID:AA-1,ID:BB-2,ID:CC-3', '(?<=ID:)([A-Z]{2}-\\d)');
```
```text
+-------------------------------------------------------------------------+
| REGEXP_EXTRACT_ALL('ID:AA-1,ID:BB-2,ID:CC-3', '(?<=ID:)([A-Z]{2}-\\d)') |
+-------------------------------------------------------------------------+
| ['AA-1','BB-2','CC-3'] |
+-------------------------------------------------------------------------+
```
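
To illustrate what "zero-width" means in the look-behind example above, here is a minimal sketch using Python's `re` module as a stand-in engine (an assumption made for illustration; Doris does not use Python's engine, and edge-case behavior may differ). It shows that the `ID:` prefix is checked but never becomes part of the match.

```python
import re

text = "ID:AA-1,ID:BB-2,ID:CC-3"
# Same pattern as the SQL example; a raw string avoids doubling the backslash.
pattern = re.compile(r"(?<=ID:)([A-Z]{2}-\d)")

# Each match starts right after "ID:"; the prefix is asserted, not consumed.
matches = [(m.group(0), m.start()) for m in pattern.finditer(text)]
print(matches)
```

Every matched string excludes `ID:` and begins three characters after the start of its `ID:` prefix, which is the property that look-behind provides.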
@@ -16,7 +16,9 @@ Support since Apache Doris 3.0.2

If the 'pattern' is not a valid regular expression, an error is thrown.

Supported character match classes: https://github.com/google/re2/wiki/Syntax
Supported character match classes: https://www.boost.org/doc/libs/latest/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html

Doris supports enabling more advanced regular expression features, such as look-around zero-width assertions, through the session variable `enable_extended_regex` (default is `false`).

## Syntax

@@ -226,4 +228,22 @@ mysql> SELECT REGEXP_EXTRACT_OR_NULL('123AbCdExCx', '([[:lower:]]+)C([[]ower:]]+
```text
ERROR 1105 (HY000): errCode = 2, detailMessage = (10.16.10.2)[INVALID_ARGUMENT]Could not compile regexp pattern: ([[:lower:]]+)C([[:lower:]+)
Error: missing ]: [[:lower:]+)
```

Advanced regexp
```sql
SELECT regexp_extract_or_null('foo123bar', '(?<=foo)(\\d+)(?=bar)', 1);
-- ERROR 1105 (HY000): errCode = 2, detailMessage = (127.0.0.1)[INVALID_ARGUMENT]Invalid regex pattern: (?<=foo)(\d+)(?=bar). Error: invalid perl operator: (?<
```

```sql
SET enable_extended_regex = true;
SELECT regexp_extract_or_null('foo123bar', '(?<=foo)(\\d+)(?=bar)', 1);
```
```text
+-----------------------------------------------------------------+
| regexp_extract_or_null('foo123bar', '(?<=foo)(\\d+)(?=bar)', 1) |
+-----------------------------------------------------------------+
| 123 |
+-----------------------------------------------------------------+
```
@@ -17,7 +17,9 @@ The `pos` parameter is of 'integer' type, used to specify the position in the st

If the `pattern` is not a valid regular expression, an error is thrown.

Supported character match classes: https://github.com/google/re2/wiki/Syntax
Supported character match classes: https://www.boost.org/doc/libs/latest/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html

Doris supports enabling more advanced regular expression features, such as look-around zero-width assertions, through the session variable `enable_extended_regex` (default is `false`).

## Syntax
```sql
@@ -179,4 +181,22 @@ SELECT regexp_extract('AbCdE', '([[:digit:]]+', 1);
```text
ERROR 1105 (HY000): errCode = 2, detailMessage = (10.16.10.2)[INVALID_ARGUMENT]Could not compile regexp pattern: ([[:digit:]]+
Error: missing ): ([[:digit:]]+
```

Advanced regexp
```sql
SELECT regexp_extract('foo123bar456baz', '(?<=foo)(\\d+)(?=bar)', 1);
-- ERROR 1105 (HY000): errCode = 2, detailMessage = (127.0.0.1)[INVALID_ARGUMENT]Invalid regex pattern: (?<=foo)(\d+)(?=bar). Error: invalid perl operator: (?<
```

```sql
SET enable_extended_regex = true;
SELECT regexp_extract('foo123bar456baz', '(?<=foo)(\\d+)(?=bar)', 1);
```
```text
+---------------------------------------------------------------+
| regexp_extract('foo123bar456baz', '(?<=foo)(\\d+)(?=bar)', 1) |
+---------------------------------------------------------------+
| 123 |
+---------------------------------------------------------------+
```
@@ -6,13 +6,17 @@
---

## Description

Performs a regular expression match on the string `str`, returning true if the match succeeds and false otherwise. `pattern` is the regular expression pattern.
Note that character set matching should use UTF-8 standard character classes, so that the function correctly identifies and processes characters from different languages.

If `pattern` is not a valid regular expression, an error is thrown.

Supported character match classes: https://github.com/google/re2/wiki/Syntax
Supported character match classes: https://www.boost.org/doc/libs/latest/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html

Doris supports enabling more advanced regular expression features, such as look-around zero-width assertions, through the session variable `enable_extended_regex` (default is `false`).

Note: Enabling this variable affects performance only when the regular expression contains advanced syntax (such as look-around). In testing, a `pattern` without look-around zero-width assertions (`?=`, `?!`, `?<=`, `?<!`) ran about 12 times faster than one that includes them. For better performance, simplify your regular expressions where possible and avoid such zero-width assertions.
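
One common way to avoid look-around is to consume the surrounding context and read only the capture group. The sketch below uses Python's `re` module as a stand-in engine (an assumption for illustration; it is not the engine Doris uses), and `first_group` is a hypothetical helper that mimics extracting capture group 1 of the first match.

```python
import re

def first_group(pattern: str, text: str):
    """Return capture group 1 of the first match, or None if nothing matches."""
    m = re.search(pattern, text)
    return m.group(1) if m else None

text = "foo123bar456baz"

# Look-around version: "foo" and "bar" are asserted but not consumed.
with_lookaround = first_group(r"(?<=foo)(\d+)(?=bar)", text)

# Look-around-free version: "foo" and "bar" are consumed, the group is unchanged.
without_lookaround = first_group(r"foo(\d+)bar", text)

print(with_lookaround, without_lookaround)
```

Both patterns yield the same capture group here. The rewrite is not always equivalent: because the context is consumed, results can differ when successive matches would need to share that context, so check each pattern against representative data.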

## Syntax

@@ -191,4 +195,20 @@ SELECT REGEXP('Hello, World!', '([a-z');

```text
ERROR 1105 (HY000): errCode = 2, detailMessage = (10.16.10.2)[INTERNAL_ERROR]Invalid regex expression: ([a-z
```

Advanced regexp
```sql
SELECT REGEXP('Apache/Doris', '([a-zA-Z_+-]+(?:\/[a-zA-Z_0-9+-]+)*)(?=s|$)');
-- ERROR 1105 (HY000): errCode = 2, detailMessage = (127.0.0.1)[INTERNAL_ERROR]Invalid regex expression: ([a-zA-Z_+-]+(?:/[a-zA-Z_0-9+-]+)*)(?=s|$). Error: invalid perl operator: (?=

SET enable_extended_regex = true;
SELECT REGEXP('Apache/Doris', '([a-zA-Z_+-]+(?:\/[a-zA-Z_0-9+-]+)*)(?=s|$)');
```
```text
+-----------------------------------------------------------------------+
| REGEXP('Apache/Doris', '([a-zA-Z_+-]+(?:\/[a-zA-Z_0-9+-]+)*)(?=s|$)') |
+-----------------------------------------------------------------------+
| 1 |
+-----------------------------------------------------------------------+
```
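
For a pure true/false match like the one above, a trailing look-ahead can usually be replaced by a non-capturing group that consumes the same text, since only the existence of a match matters. The sketch below checks this with Python's `re` module as a stand-in engine (an assumption for illustration), treating the match as a partial, search-style match.

```python
import re

subject = "Apache/Doris"

# Pattern from the example above, with a trailing look-ahead.
lookahead = r"([a-zA-Z_+-]+(?:/[a-zA-Z_0-9+-]+)*)(?=s|$)"
# Same pattern with the look-ahead turned into a consuming non-capturing group.
consuming = r"([a-zA-Z_+-]+(?:/[a-zA-Z_0-9+-]+)*)(?:s|$)"

matched_lookahead = re.search(lookahead, subject) is not None
matched_consuming = re.search(consuming, subject) is not None

print(matched_lookahead, matched_consuming)
```

The two patterns agree on whether a match exists; they differ only in how much text the match covers, which a boolean result does not expose.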
@@ -32,7 +32,9 @@ The REGEXP_EXTRACT_ALL function performs regular expression matching on the given string str

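If the 'pattern' argument is not a valid regular expression, an error is thrown.

Supported character match classes: https://github.com/google/re2/wiki/Syntax
Supported character match classes: https://www.boost.org/doc/libs/latest/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html

Doris supports enabling more advanced regular expression features, such as look-around zero-width assertions, through the session variable `enable_extended_regex` (default is `false`).

## Syntax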

@@ -201,4 +203,22 @@ SELECT regexp_extract_all('hello (world) 123', '([[:alpha:]+');
```text
ERROR 1105 (HY000): errCode = 2, detailMessage = (10.16.10.2)[INVALID_ARGUMENT]Could not compile regexp pattern: ([[:alpha:]+
Error: missing ]: [[:alpha:]+
```

Advanced regexp
```sql
SELECT REGEXP_EXTRACT_ALL('ID:AA-1,ID:BB-2,ID:CC-3', '(?<=ID:)([A-Z]{2}-\\d)');
-- ERROR 1105 (HY000): errCode = 2, detailMessage = (127.0.0.1)[INVALID_ARGUMENT]Invalid regex pattern: (?<=ID:)([A-Z]{2}-\d). Error: invalid perl operator: (?<
```

```sql
SET enable_extended_regex = true;
SELECT REGEXP_EXTRACT_ALL('ID:AA-1,ID:BB-2,ID:CC-3', '(?<=ID:)([A-Z]{2}-\\d)');
```
```text
+-------------------------------------------------------------------------+
| REGEXP_EXTRACT_ALL('ID:AA-1,ID:BB-2,ID:CC-3', '(?<=ID:)([A-Z]{2}-\\d)') |
+-------------------------------------------------------------------------+
| ['AA-1','BB-2','CC-3'] |
+-------------------------------------------------------------------------+
```
@@ -17,7 +17,9 @@
Supported since Apache Doris 3.0.2
:::

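Supported character match classes: https://github.com/google/re2/wiki/Syntax
Supported character match classes: https://www.boost.org/doc/libs/latest/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html

Doris supports enabling more advanced regular expression features, such as look-around zero-width assertions, through the session variable `enable_extended_regex` (default is `false`).

## Syntax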

@@ -227,4 +229,22 @@ mysql> SELECT REGEXP_EXTRACT_OR_NULL('123AbCdExCx', '([[:lower:]]+)C([[]ower:]]+
```text
ERROR 1105 (HY000): errCode = 2, detailMessage = (10.16.10.2)[INVALID_ARGUMENT]Could not compile regexp pattern: ([[:lower:]]+)C([[:lower:]+)
Error: missing ]: [[:lower:]+)
```

Advanced regexp
```sql
SELECT regexp_extract_or_null('foo123bar', '(?<=foo)(\\d+)(?=bar)', 1);
-- ERROR 1105 (HY000): errCode = 2, detailMessage = (127.0.0.1)[INVALID_ARGUMENT]Invalid regex pattern: (?<=foo)(\d+)(?=bar). Error: invalid perl operator: (?<
```

```sql
SET enable_extended_regex = true;
SELECT regexp_extract_or_null('foo123bar', '(?<=foo)(\\d+)(?=bar)', 1);
```
```text
+-----------------------------------------------------------------+
| regexp_extract_or_null('foo123bar', '(?<=foo)(\\d+)(?=bar)', 1) |
+-----------------------------------------------------------------+
| 123 |
+-----------------------------------------------------------------+
```
@@ -37,7 +37,9 @@ The pos parameter is of 'integer' type, used to specify the position in the string at which to start searching for the regular expression

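If the 'pattern' argument is not a valid regular expression, an error is thrown.

Supported character match classes: https://github.com/google/re2/wiki/Syntax
Supported character match classes: https://www.boost.org/doc/libs/latest/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html

Doris supports enabling more advanced regular expression features, such as look-around zero-width assertions, through the session variable `enable_extended_regex` (default is `false`).

## Syntax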
```sql
@@ -201,4 +203,22 @@ SELECT regexp_extract('AbCdE', '([[:digit:]]+', 1);
```text
ERROR 1105 (HY000): errCode = 2, detailMessage = (10.16.10.2)[INVALID_ARGUMENT]Could not compile regexp pattern: ([[:digit:]]+
Error: missing ): ([[:digit:]]+
```

Advanced regexp
```sql
SELECT regexp_extract('foo123bar456baz', '(?<=foo)(\\d+)(?=bar)', 1);
-- ERROR 1105 (HY000): errCode = 2, detailMessage = (127.0.0.1)[INVALID_ARGUMENT]Invalid regex pattern: (?<=foo)(\d+)(?=bar). Error: invalid perl operator: (?<
```

```sql
SET enable_extended_regex = true;
SELECT regexp_extract('foo123bar456baz', '(?<=foo)(\\d+)(?=bar)', 1);
```
```text
+---------------------------------------------------------------+
| regexp_extract('foo123bar456baz', '(?<=foo)(\\d+)(?=bar)', 1) |
+---------------------------------------------------------------+
| 123 |
+---------------------------------------------------------------+
```
@@ -12,7 +12,11 @@

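If the 'pattern' argument is not a valid regular expression, an error is thrown.

Supported character match classes: https://github.com/google/re2/wiki/Syntax
Supported character match classes: https://www.boost.org/doc/libs/latest/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html

Doris supports enabling more advanced regular expression features, such as look-around zero-width assertions, through the session variable `enable_extended_regex` (default is `false`).

Note: Enabling this variable affects performance only when the regular expression contains advanced syntax (such as look-around). In testing, a `pattern` without look-around zero-width assertions (`?=`, `?!`, `?<=`, `?<!`) ran about 12 times faster than one that includes them. For better performance, simplify your regular expressions where possible and avoid such zero-width assertions.

## Syntax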

@@ -192,3 +196,18 @@ SELECT REGEXP('Hello, World!', '([a-z');
ERROR 1105 (HY000): errCode = 2, detailMessage = (10.16.10.2)[INTERNAL_ERROR]Invalid regex expression: ([a-z
```

Advanced regexp
```sql
SELECT regexp('foobar', '(?<=foo)bar');
-- ERROR 1105 (HY000): errCode = 2, detailMessage = (127.0.0.1)[INTERNAL_ERROR]Invalid regex expression: (?<=foo)bar. Error: invalid perl operator: (?<

SET enable_extended_regex = true;
SELECT regexp('foobar', '(?<=foo)bar');
```
```text
+---------------------------------+
| regexp('foobar', '(?<=foo)bar') |
+---------------------------------+
| 1 |
+---------------------------------+
```