[receiver/filelog] encoding not applied to multiline option #39011

@DougManton

Component(s)

receiver/filelog

What happened?

Description

SAP audit log files are UTF-16LE encoded, continuously appended to, and made up of fixed-length records with no line termination. This should be supportable using the filelog receiver's encoding and multiline settings, but I have found a reproducible bug and a workaround.

Steps to Reproduce

  1. Create a utf-16le file named auditlog.txt containing 10 SAP audit log records in the format:
2AUK20250227000000002316500018D110.102.8BATCH_ALRI                      SAPMSSY1                                0501Z91_VALR_IF&&Z91_VAL_PLSTATUS                                   10.122.81.29        2AUK20250227000000002316500018D110.102.8BATCH_ALRI                      SAPMSSY1                                0501Z91_VALR_IF&&Z91_VAL_PLSTATUS                                   10.122.81.29        2AUK20250227000000002316500018D110.102.8BATCH_ALRI                      SAPMSSY1                                0501Z91_VALR_IF&&Z91_VAL_PLSTATUS                                   10.122.81.29        2AUK20250227000000002316500018D110.102.8BATCH_ALRI                      SAPMSSY1                                0501Z91_VALR_IF&&Z91_VAL_PLSTATUS                                   10.122.81.29        2AUK20250227000000002316500018D110.102.8BATCH_ALRI                      SAPMSSY1                                0501Z91_VALR_IF&&Z91_VAL_PLSTATUS                                   10.122.81.29        2AUK20250227000000002316500018D110.102.8BATCH_ALRI                      SAPMSSY1                                0501Z91_VALR_IF&&Z91_VAL_PLSTATUS                                   10.122.81.29        2AUK20250227000000002316500018D110.102.8BATCH_ALRI                      SAPMSSY1                                0501Z91_VALR_IF&&Z91_VAL_PLSTATUS                                   10.122.81.29        2AUK20250227000000002316500018D110.102.8BATCH_ALRI                      SAPMSSY1                                0501Z91_VALR_IF&&Z91_VAL_PLSTATUS                                   10.122.81.29        2AUK20250227000000002316500018D110.102.8BATCH_ALRI                      SAPMSSY1                                0501Z91_VALR_IF&&Z91_VAL_PLSTATUS                                   10.122.81.29        2AUK20250227000000002316500018D110.102.8BATCH_ALRI                      SAPMSSY1                                0501Z91_VALR_IF&&Z91_VAL_PLSTATUS                                   10.122.81.29        
  2. Configure the filelog receiver as follows:
receivers:
  filelog/sap:
    include: [ auditlog.txt ]
    encoding: utf-16le
    multiline:
      line_start_pattern: '([23])[A-Z][A-Z][A-Z0-9]\d{14}00'
    preserve_trailing_whitespaces: true
    start_at: beginning
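
A minimal sketch for generating the test file from step 1, in case it helps anyone reproduce this. The field widths (72, 40, 68) are my assumption, inferred from the sample record so that each record comes out at 200 characters; the helper names make_record and write_audit_log are made up for this example.

```python
from pathlib import Path

RECORD_LEN = 200  # characters per record, as described in this report


def make_record() -> str:
    """Build one fixed-length record with no line terminator.

    The column widths below are assumptions inferred from the sample record.
    """
    fields = (
        "2AUK20250227000000002316500018D110.102.8BATCH_ALRI".ljust(72)
        + "SAPMSSY1".ljust(40)
        + "0501Z91_VALR_IF&&Z91_VAL_PLSTATUS".ljust(68)
        + "10.122.81.29"
    )
    return fields.ljust(RECORD_LEN)


def write_audit_log(path: Path, count: int = 10) -> int:
    """Write `count` records back to back as UTF-16LE and return the byte count."""
    # Python's "utf-16-le" codec does not prepend a byte order mark.
    data = (make_record() * count).encode("utf-16-le")
    path.write_bytes(data)
    return len(data)


if __name__ == "__main__":
    size = write_audit_log(Path("auditlog.txt"))
    print(f"wrote {size} bytes")
```

With the defaults this writes 10 records of 400 bytes each (2 bytes per character).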

Expected Result

10 log events

Actual Result

1 log event

Workaround

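I suspected that multiline processing does not honour the file encoding and instead applies the pattern to the raw UTF-16LE bytes, where every ASCII character is followed by a 0x00 byte, so the pattern never matches. To test this theory, I changed the pattern to match the low byte of each 16-bit character and to skip the high byte with a '.':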

receivers:
  filelog/sap:
    include: [ auditlog.txt ]
    encoding: utf-16le
    multiline:
      line_start_pattern: '([23]).[A-Z].[A-Z].[A-Z0-9].(\d.){14}0.0.'
    preserve_trailing_whitespaces: true
    start_at: beginning

This configuration outputs 10 log records, each containing a complete 200-character record.
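
The same behaviour can be reproduced outside the collector. The sketch below only illustrates the theory; it is not the collector's code, and it uses Python's re module rather than Go's regexp. It runs both patterns against a simplified record (just the 20-character header padded to 200 characters, which is my simplification) in decoded form and as raw UTF-16LE bytes. The helper names are made up for this example.

```python
import re

# Patterns copied from the two configurations in this report, as byte patterns.
ORIGINAL = rb"([23])[A-Z][A-Z][A-Z0-9]\d{14}00"
WORKAROUND = rb"([23]).[A-Z].[A-Z].[A-Z0-9].(\d.){14}0.0."


def sample_record() -> str:
    # Simplified record: only the header the pattern looks at, padded to 200 chars.
    return "2AUK2025022700000000".ljust(200)


def count_matches(pattern: bytes, data: bytes) -> int:
    """Count non-overlapping matches of `pattern` in `data`."""
    return sum(1 for _ in re.finditer(pattern, data))


def split_records(pattern: bytes, data: bytes) -> list[bytes]:
    """Split `data` at every position where `pattern` starts a match."""
    starts = [m.start() for m in re.finditer(pattern, data)]
    ends = starts[1:] + [len(data)]
    return [data[s:e] for s, e in zip(starts, ends)]


text = sample_record() * 10
decoded = text.encode("utf-8")    # what the pattern would see after decoding
raw = text.encode("utf-16-le")    # what is actually stored in the file

print(count_matches(ORIGINAL, decoded))  # 10
print(count_matches(ORIGINAL, raw))      # 0
print(count_matches(WORKAROUND, raw))    # 10

records = split_records(WORKAROUND, raw)
print(len(records), len(records[0].decode("utf-16-le")))  # 10 200
```

The original pattern finds nothing in the raw bytes because each character it expects is followed by a 0x00 byte, which is consistent with the single log event I see with the first configuration.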

Collector version

v0.122.0

Environment information

Environment

OS: macOS 15.3.2

OpenTelemetry Collector configuration

receivers:
  filelog/sap:
    include: [ auditlog.txt ]
    encoding: utf-16le
    multiline:
      line_start_pattern: '([23])[A-Z][A-Z][A-Z0-9]\d{14}00'
    preserve_trailing_whitespaces: true
    start_at: beginning
exporters:
  file/debug:
    path: debug.json
service:
  pipelines:
    logs:
      receivers:
        - filelog/sap
      exporters:
        - file/debug

Log output

Additional context

No response

Labels

bug, help wanted, never stale, receiver/filelog
