In the output above, the `pipeline` field is a YAML-formatted string. Since the JSON format does not display YAML strings well, the `echo` command can be used to present it in a more human-readable way:
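For example, assuming the YAML string has been captured into a shell variable (the sample content here is hypothetical), `echo -e` expands the escaped newlines into a readable multi-line document:

```shell
# Hypothetical pipeline body as returned inside the JSON (newlines escaped as \n)
pipeline='processors:\n  - date:\n      field: time'

# echo -e (bash) interprets the backslash escapes, printing the YAML across real lines
echo -e "$pipeline"
```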
The pipeline configuration contains an error. The `gsub` Processor expects the `replacement` field to be a string, but the current configuration provides an array. As a result, the pipeline creation fails with the following error message:
    type: string
  - field: time
    type: time
    index: timestamp
```
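For reference, a `gsub` processor entry with a valid string `replacement` might look like the following sketch (the field name, pattern, and replacement here are hypothetical):

```yaml
processors:
  - gsub:
      fields:
        - message
      pattern: 'error'
      replacement: 'failure'   # must be a single string, not an array
```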
Now that the Pipeline has been created successfully, you can test the Pipeline using the `dryrun` interface.
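As a sketch, a dryrun call sends the pipeline name and some sample data over HTTP. The path and body shape below are assumptions that may differ across GreptimeDB versions, so check your version's API reference:

```
POST /v1/events/pipelines/dryrun

{
  "pipeline_name": "test_pipeline",
  "data": [
    {"message": "hello world", "time": "2024-05-25 20:16:37.217"}
  ]
}
```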
Transform decides the final datatype and table structure in the database.
Table suffix allows storing the data into different tables.

- Version is used to state the pipeline configuration format. Although it's optional, it's highly recommended to start with version 2. See [here](#transform-in-version-2) for more details.
- Processors are used for preprocessing log data, such as parsing time fields and replacing fields.
- Dispatcher (optional) is used for forwarding the context into another pipeline, so that the same batch of input data can be divided and processed by different pipelines based on certain fields.
- Transform (optional) is used for converting data formats, such as converting string types to numeric types, and specifying indexes.
Here is an example of a simple configuration that includes Processors and Transform:
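A minimal sketch of such a configuration (the field names here are hypothetical) might be:

```yaml
processors:
  - date:
      field: time
      formats:
        - "%Y-%m-%d %H:%M:%S"

transform:
  - fields:
      - time
    type: time
    index: timestamp
```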
Starting from `v0.15`, GreptimeDB introduces a version `2` format.
The main change is the transform process.
Refer to [the following documentation](#transform-in-version-2) for detailed changes.
## Processor
Transform is used to convert data formats and specify indexes on columns. It is located under the `transform` section in the YAML file.
Starting from `v0.15`, GreptimeDB introduces the version 2 format and auto-transform to greatly simplify the configuration. See below for details.
A Transform consists of one or more configurations, and each configuration contains the following fields:
- `fields`: A list of field names to be transformed.
- `type`: The target transformation type in the database.
- `index` (optional): The index type.
- `tag` (optional): Specify the field to be a tag field.
- `on_failure` (optional): Handling method for transformation failures.
- `default` (optional): Default value.
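Putting these fields together, a single transform entry might look like the following sketch (the field name, index type, and default value are hypothetical examples):

```yaml
transform:
  - fields:
      - status_code
    type: int32
    index: inverted
    on_failure: default   # fall back to the default value on conversion failure
    default: 0
```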
### Transform in version 2
Originally, you had to manually specify all the fields in the transform section for them to be persisted in the database.
If a field is not specified in the transform, it is discarded.
As the number of fields grows, this can make the configuration both tedious and error-prone.

Starting from `v0.15`, GreptimeDB introduces a new transform mode which makes it easier to write pipeline configurations.
You only set the necessary fields in the transform section, specifying particular datatypes and indexes for them; the rest of the fields from the pipeline context are set automatically by the pipeline engine.
With the `select` processor, you can decide which fields are kept in the final table and which are not.
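For instance, a `select` processor entry could drop unwanted fields from the context before they are persisted. This is a sketch under assumed option names, with a hypothetical field:

```yaml
processors:
  - select:
      type: exclude       # assumed option: drop the listed fields
      fields:
        - raw_message     # hypothetical field to leave out of the final table
```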
However, this is a breaking change for existing pipeline configuration files.
If you have already used a pipeline with `dissect` or `regex` processors, then after upgrading the database, the original message string, which is still in the pipeline context, gets immediately inserted into the database, and there is no way to stop this behavior.

Therefore, GreptimeDB introduces the concept of a version to decide which transform mode to use, just like the version in a Docker Compose file. Here is an example:
```YAML
version: 2
processors:
  - date:
      field: input_str
      formats:
        - "%Y-%m-%dT%H:%M:%S%.3fZ"

transform:
  - field: input_str, ts
    type: time, ms
    index: timestamp
```
Simply add a `version: 2` line at the top of the config file, and the pipeline engine will run the transform in combined mode:
1. Process all written transform rules sequentially.
2. Write all fields of the pipeline context to the final table.

Note:
- The transform section **must contain one timestamp index field**.
- The transform process in version 2 consumes the original field in the pipeline context, so you can't transform the same field twice.
### Auto-transform
The transform configuration in the version 2 format is already a large simplification over the original transform.
However, there are times when you might want to combine the power of processors with the ease of use of `greptime_identity`, writing no transform code and letting the pipeline engine automatically infer and persist the data.

This is now possible in custom pipelines.
If no transform section is specified, the pipeline engine will attempt to infer the datatypes of the fields from the pipeline context and preserve them in the database, much like what the `identity_pipeline` does.

To create a table in GreptimeDB, a timestamp index column must be specified.
In this case, the pipeline engine will try to find a field of type `timestamp` in the context and set it as the time index column.
A `timestamp` field is produced by a `date` or `epoch` processor, so at least one `date` or `epoch` processor must be defined in the processors section.
Additionally, only one `timestamp` field is allowed; multiple `timestamp` fields would lead to an error due to ambiguity.
For example, the following pipeline configuration is now valid.
```YAML
version: 2
processors:
  - dissect:
      fields:
```
## Dispatcher
The pipeline dispatcher routes requests to other pipelines based on configured