128 changes: 125 additions & 3 deletions pipeline/inputs/opentelemetry.md
@@ -32,7 +32,7 @@ Fluent Bit has a compliant implementation which fully supports `OTLP/HTTP` and `
| `net.share_port` | Allow multiple plugins to bind to the same port. | `false` |
| `port` | The port for Fluent Bit to listen for incoming connections. | `4318` |
| `profiles_support` | Experimental feature. Feel free to test it, but don't enable it in production environments. | `false` |
| `raw_traces` | Forward traces without processing. | `false` |
| `raw_traces` | Forward traces without processing. When set to `false` (default), traces are processed using the unified JSON parser with strict validation. When set to `true`, trace data is forwarded as raw log messages without validation or processing. | `false` |
| `routable` | If set to `true`, the data generated by the plugin will be routable, meaning that it can be forwarded to other plugins or outputs. If set to `false`, the data will be discarded. | `true` |
| `storage.pause_on_chunks_overlimit` | Pause the input when it reaches its chunk limit. | _none_ |
| `storage.type` | Sets the storage type for this input, one of: `filesystem`, `memory` or `memrb`. | `memory` |
@@ -57,7 +57,7 @@ Fluent Bit has a compliant implementation which fully supports `OTLP/HTTP` and `
| `tls.verify_hostname` | Enable or disable hostname verification. | `off` |
| `tls.vhost` | Hostname to be used for TLS SNI extension. | _none_ |

Raw traces means that any data forwarded to the traces endpoint (`/v1/traces`) will be packed and forwarded as a log message, and won't be processed by Fluent Bit. The traces endpoint by default expects a valid `protobuf` encoded payload, but you can set the `raw_traces` option in case you want to get trace telemetry data to any of the Fluent Bit supported outputs.
When `raw_traces` is set to `false` (default), the traces endpoint (`/v1/traces`) processes incoming trace data using the unified JSON parser with strict validation. The endpoint accepts both `protobuf` and `JSON` encoded payloads. When `raw_traces` is set to `true`, any data forwarded to the traces endpoint will be packed and forwarded as a log message without processing, validation, or conversion to the Fluent Bit internal trace format.

### OpenTelemetry transport protocol endpoints

@@ -92,7 +92,7 @@ The OpenTelemetry input plugin supports the following telemetry data types:
|---------|---------------|----------------|------------|
| Logs | Stable | Stable | Stable |
| Metrics | Unimplemented | Stable | Stable |
| Traces | Unimplemented | Stable | Stable |
| Traces | Stable | Stable | Stable |

A sample configuration file to get started will look something like the following:

@@ -135,3 +135,125 @@ A sample curl request to POST JSON encoded log data would be:
```shell
curl --header "Content-Type: application/json" --request POST --data '{"resourceLogs":[{"resource":{},"scopeLogs":[{"scope":{},"logRecords":[{"timeUnixNano":"1660296023390371588","body":{"stringValue":"{\"message\":\"dummy\"}"},"traceId":"","spanId":""}]}]}]}' http://0.0.0.0:4318/v1/logs
```

## OpenTelemetry trace improvements

Fluent Bit includes enhanced support for OpenTelemetry traces with improved JSON parsing, error handling, and validation capabilities.

### Unified trace JSON parser

Fluent Bit provides a unified interface for processing OpenTelemetry trace data in JSON format. The parser converts OpenTelemetry JSON trace payloads into the Fluent Bit internal trace representation, supporting the full OpenTelemetry trace specification including:

- Resource spans with attributes
- Instrumentation scope information
- Span data (names, IDs, timestamps, status)
- Span events and links
- Trace and span ID validation

The unified parser handles the OpenTelemetry JSON encoding format, which wraps attribute values in type-specific containers (for example, `stringValue`, `intValue`, `doubleValue`, `boolValue`).
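As a rough illustration of that wrapping, the following Python sketch converts plain values into type-specific containers. The helper names `to_any_value` and `to_attributes` are hypothetical and aren't part of Fluent Bit. Writing `intValue` as a decimal string follows the protobuf JSON mapping for 64-bit integers; this document doesn't state whether the parser also accepts a bare JSON number.

```python
def to_any_value(value):
    """Wrap a Python value in an OTLP/JSON type-specific container."""
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return {"boolValue": value}
    if isinstance(value, int):
        # Assumption: 64-bit integers are written as decimal strings,
        # as in the protobuf JSON mapping.
        return {"intValue": str(value)}
    if isinstance(value, float):
        return {"doubleValue": value}
    if isinstance(value, str):
        return {"stringValue": value}
    raise TypeError(f"unsupported attribute type: {type(value).__name__}")


def to_attributes(mapping):
    """Convert a flat dict into the OTLP/JSON list of key/value entries."""
    return [{"key": key, "value": to_any_value(val)} for key, val in mapping.items()]
```

For example, `to_attributes({"service.name": "my-service"})` produces a single entry whose `value` is `{"stringValue": "my-service"}`.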

### Error status propagation

The OpenTelemetry input plugin provides detailed error status information when processing trace data. If trace processing fails, the plugin returns specific error codes that help identify the issue:

- `FLB_OTEL_TRACES_ERR_INVALID_JSON` - Invalid JSON format
- `FLB_OTEL_TRACES_ERR_INVALID_TRACE_ID` - Invalid trace ID format or length
- `FLB_OTEL_TRACES_ERR_INVALID_SPAN_ID` - Invalid span ID format or length
- `FLB_OTEL_TRACES_ERR_INVALID_PARENT_SPAN_ID` - Invalid parent span ID
- `FLB_OTEL_TRACES_ERR_STATUS_FAILURE` - Invalid span status code
- `FLB_OTEL_TRACES_ERR_INVALID_ATTRIBUTES` - Invalid attribute format
- `FLB_OTEL_TRACES_ERR_INVALID_EVENT_ENTRY` - Invalid span event
- `FLB_OTEL_TRACES_ERR_INVALID_LINK_ENTRY` - Invalid span link

#### Valid span status codes

The OpenTelemetry specification defines three valid span status codes. When processing trace data, the plugin accepts the following status code values (case-insensitive):

- `OK` - The operation completed successfully
- `ERROR` - The operation has an error
- `UNSET` - The status isn't set (default)

Any other status code value triggers `FLB_OTEL_TRACES_ERR_STATUS_FAILURE` and causes the trace data to be rejected. The status code must be provided as a string in the `status.code` field of the span object.
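A client can apply the same rule before sending data. The following Python sketch only mirrors the rule described above; it isn't the plugin's implementation, and `is_valid_status_code` is a hypothetical helper name.

```python
VALID_STATUS_CODES = {"OK", "ERROR", "UNSET"}


def is_valid_status_code(code):
    """Return True if code is one of the accepted status strings."""
    # Non-string values fail: the status code must be provided as a string.
    # The comparison is case-insensitive, so "ok" and "Ok" both pass.
    return isinstance(code, str) and code.upper() in VALID_STATUS_CODES
```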

#### Error handling behavior

When trace validation fails, the following behavior applies:

- **Trace data is dropped**: Invalid trace data isn't processed or forwarded. The trace payload is rejected immediately.
- **Error logging**: The plugin logs an error message with the specific error status code to help diagnose issues. Error messages include the error code number and description.
- **No retry mechanism**: Failed requests aren't automatically retried. The client must resend corrected trace data.
- **HTTP response codes**:
  - **HTTP/1.1**: Returns `400 Bad Request` with an error message when validation fails. Returns the configured `successful_response_code` (default `201 Created`) when processing succeeds.
  - **gRPC**: Returns gRPC status `2 (UNKNOWN)` with message "Serialization error." when validation fails. Returns gRPC status `0 (OK)` with an empty `ExportTraceServiceResponse` when processing succeeds.
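Because failed requests aren't retried, the client has to inspect the response itself. The following Python sketch shows one way to do that over HTTP using only the standard library. The function names and the `http://127.0.0.1:4318/v1/traces` URL are assumptions for illustration (the port is the documented default); adjust `success_code` if `successful_response_code` is configured differently.

```python
import json
import urllib.error
import urllib.request


def interpret_status(status, success_code=201):
    """Map an HTTP status code to a coarse outcome."""
    if status == success_code:
        return "accepted"
    if status == 400:
        # Validation failed: the payload was dropped and must be corrected.
        return "rejected"
    return "unexpected"


def post_traces(payload, url="http://127.0.0.1:4318/v1/traces"):
    """POST a JSON trace payload and report how the endpoint answered."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request) as response:
            return interpret_status(response.status)
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx and 5xx responses.
        return interpret_status(err.code)
```

A `"rejected"` result means that resending the same payload will fail again, so the caller should fix the data rather than retry.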

### Strict ID decoding

Fluent Bit enforces strict validation for trace and span IDs to ensure data integrity:

- **Trace IDs**: Must be exactly 32 hexadecimal characters (16 bytes)
- **Span IDs**: Must be exactly 16 hexadecimal characters (8 bytes)
- **Parent Span IDs**: Must be exactly 16 hexadecimal characters (8 bytes) when present

The validation process:

1. Verifies the ID length matches the expected size
2. Validates that all characters are valid hexadecimal digits (0-9, a-f, A-F)
3. Decodes the hexadecimal string to binary format
4. Rejects invalid IDs with appropriate error codes

Invalid IDs result in error status codes (`FLB_OTEL_TRACES_ERR_INVALID_TRACE_ID`, `FLB_OTEL_TRACES_ERR_INVALID_SPAN_ID`, and so on) and the trace data is rejected to prevent processing of corrupted or malformed trace information.
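The validation steps above can be sketched in Python as follows. This is a minimal illustration of the rules, not the plugin's C implementation, and the function names are hypothetical. Invalid input raises `ValueError` here, standing in for the `FLB_OTEL_TRACES_ERR_*` codes.

```python
import string

HEX_DIGITS = set(string.hexdigits)  # 0-9, a-f, A-F


def decode_hex_id(value, expected_bytes):
    """Validate a hexadecimal ID string and decode it to bytes."""
    # Step 1: the length must match exactly (two hex characters per byte).
    if not isinstance(value, str) or len(value) != expected_bytes * 2:
        raise ValueError(f"expected {expected_bytes * 2} hexadecimal characters")
    # Step 2: every character must be a hexadecimal digit.
    if not all(ch in HEX_DIGITS for ch in value):
        raise ValueError("ID contains non-hexadecimal characters")
    # Step 3: decode the hexadecimal string to its binary form.
    return bytes.fromhex(value)


def decode_trace_id(value):
    return decode_hex_id(value, 16)  # 32 hexadecimal characters


def decode_span_id(value):
    return decode_hex_id(value, 8)  # 16 hexadecimal characters
```

The explicit character check matters because `bytes.fromhex` on its own skips whitespace, which would be looser than the strict rule described here.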

### Example: JSON trace payload

The following example shows a valid OpenTelemetry JSON trace payload that can be sent to the `/v1/traces` endpoint:

```json
{
  "resourceSpans": [
    {
      "resource": {
        "attributes": [
          {
            "key": "service.name",
            "value": {
              "stringValue": "my-service"
            }
          }
        ]
      },
      "scopeSpans": [
        {
          "scope": {
            "name": "my-instrumentation",
            "version": "1.0.0"
          },
          "spans": [
            {
              "traceId": "0123456789abcdef0123456789abcdef",
              "spanId": "0123456789abcdef",
              "name": "my-span",
              "kind": 1,
              "startTimeUnixNano": "1660296023390371588",
              "endTimeUnixNano": "1660296023391371588",
              "status": {
                "code": "OK"
              },
              "attributes": [
                {
                  "key": "http.method",
                  "value": {
                    "stringValue": "GET"
                  }
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}
```

Trace IDs must be exactly 32 hexadecimal characters and span IDs must be exactly 16 hexadecimal characters. Invalid IDs are rejected with `FLB_OTEL_TRACES_ERR_INVALID_TRACE_ID` or `FLB_OTEL_TRACES_ERR_INVALID_SPAN_ID`.

In the example, the `status.code` field uses `"OK"`. Valid status code values are `"OK"`, `"ERROR"`, and `"UNSET"` (case-insensitive). Any other value triggers `FLB_OTEL_TRACES_ERR_STATUS_FAILURE` and causes the trace to be rejected.
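To generate a payload of this shape programmatically, a sketch along these lines could work. `build_trace_payload` is a hypothetical helper, and the scope name, version, and span kind are simply copied from the example. IDs come from `secrets.token_hex`, which returns two hexadecimal characters per byte and so yields the required lengths.

```python
import json
import secrets
import time


def build_trace_payload(service_name, span_name, duration_ns=1_000_000):
    """Build a single-span OTLP/JSON payload shaped like the example."""
    end_ns = time.time_ns()
    start_ns = end_ns - duration_ns
    span = {
        "traceId": secrets.token_hex(16),  # 32 hexadecimal characters
        "spanId": secrets.token_hex(8),    # 16 hexadecimal characters
        "name": span_name,
        "kind": 1,
        # Timestamps are written as strings, as in the example payload.
        "startTimeUnixNano": str(start_ns),
        "endTimeUnixNano": str(end_ns),
        "status": {"code": "OK"},
    }
    resource = {
        "attributes": [
            {"key": "service.name", "value": {"stringValue": service_name}}
        ]
    }
    scope = {"name": "my-instrumentation", "version": "1.0.0"}
    return {
        "resourceSpans": [
            {"resource": resource, "scopeSpans": [{"scope": scope, "spans": [span]}]}
        ]
    }
```

Calling `json.dumps(build_trace_payload("my-service", "my-span"))` gives a request body that can be posted to `/v1/traces` with the `Content-Type: application/json` header.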