
Commit f54f0e4

🌊 Dissect suggestions (#242377)
## Add Dissect Pattern Suggestion Support to Streams Processing

### Summary

This PR adds automatic dissect pattern generation capabilities to the Streams processing pipeline, complementing the existing grok pattern suggestions. Dissect patterns provide faster log parsing for structured logs with simple delimiters (vs. regex-based grok).

### What was added

#### New Package: `@kbn/dissect-heuristics`

- **Core algorithm** (`extractDissectPatternDangerouslySlow`): Analyzes sample log messages to automatically extract dissect patterns
  - 6-step pipeline: whitespace normalization → delimiter detection → delimiter tree building → field extraction → modifier detection → pattern generation
  - Supports dissect modifiers: right padding (`->`), named skip (`?`), empty skip (`{}`)
- **LLM Review Integration**: Maps generic field names to ECS-compliant field names
  - `getReviewFields`: Prepares field metadata for LLM review
  - `getDissectProcessorWithReview`: Applies LLM suggestions to rename fields and handle multi-column field grouping
  - `ReviewDissectFieldsPrompt`: Structured prompt for LLM field mapping
- **Message Grouping**: Re-exports `groupMessagesByPattern` from `@kbn/grok-heuristics` for consistent message clustering

#### Server-Side API

- **New endpoint**: `POST /internal/streams/{name}/processing/_suggestions/dissect`
  - Input: connector ID, sample messages, review fields
  - Output: SSE stream with dissect processor configuration
- **Handler** (`dissect_suggestions_handler.ts`): Orchestrates LLM review and field mapping with OTEL/ECS field name resolution

#### Client-Side Integration

- **React hook** (`useDissectPatternSuggestion`):
  - Groups messages by pattern using `groupMessagesByPattern`
  - Extracts the dissect pattern from the largest message group
  - Calls the LLM for field review
  - Simulates the processor to validate results
  - Includes telemetry tracking for AI suggestion latency

### Architecture

Follows the same pattern as the existing grok suggestions:

1. The client groups similar log messages
2. A heuristic algorithm extracts a pattern from the largest group
3. The LLM reviews the pattern and maps fields to ECS/OTEL standards (it can decide to group fields, turn fields into static parts of the pattern, or skip fields)
4. Simulation validates the processor before applying it

### Open questions / considerations

* I forked a bunch of stuff from the grok implementation; theoretically some redundancy could be avoided, but I'm not sure how much it would help. For both client and server I abstracted out some base helpers, but I didn't go so far as to invent a whole new subsystem for pattern suggestions. Maybe it's worth it, not sure.
* I'm using the same pre-grouping used for grok and then just go with the biggest group, since if there are completely different message patterns, you are out of luck with dissect anyway. We could try to make the base logic smarter, but I'm not sure how.
* When parsing date patterns, it's very common that they are captured with multiple groups, like `%{+timestamp}-%{+timestamp}-%{+timestamp}`. This works fine, but it means that with the default `' '` append separator, the resulting custom timestamp column becomes a non-standard date format, which is not captured by the date format suggestion logic we have in place. Making that smarter would be great anyway.
* Added new tracking events for dissect patterns. This could also be a param on the existing one, but I wanted to stay backwards compatible.
* The dissect processor could use some love, e.g. a better editor experience, syntax highlighting, automatic multi-line preview, maybe even highlighting like grok. But I think that is out of scope for this PR.
* Sometimes the AI messes up and puts static values in places where they don't belong, breaking matches. We might be able to improve on that, but it doesn't happen a ton, so I didn't go too far on this.
I could imagine a simulation feedback loop: try the generated pattern, and if it doesn't produce matches, give it back to the LLM and let it try again.

<details>
<summary>Click to expand eval for loghub data</summary>

```
Getting suggestions...
- logs.apache-web: [%{field_1} %{field_2} %{field_3} %{field_4} %{field_5}] [%{field_6}] %{field_7->} %{field_8->} %{field_9}
- logs.hadoop-logs: %{field_1}-%{field_2}-%{field_3} %{field_4},%{field_5} %{field_6} [%{field_7}] %{field_8}: %{field_9} %{field_10} %{field_11} %{field_12} %{field_13}_%{field_14}_%{field_15}_%{field_16}
- logs.bgl-logs: - %{field_1} %{field_2} %{field_3}-%{field_4}-%{field_5}-%{field_6}-%{field_7} %{field_8}-%{field_9}-%{field_10}-%{field_11} %{field_12}-%{field_13}-%{field_14}-%{field_15}-%{field_16} %{field_17} %{field_18} %{field_19} %{field_20} %{field_21} %{field_22} %{field_23} %{field_24}
- logs.health-app-logs: %{field_1}-%{field_2}|%{field_3}_%{field_4}|%{field_5}|%{field_6}
- logs.windows: %{field_1}-%{field_2}-%{field_3} %{field_4}, %{field_5->} %{field_6->} %{field_7->} %{field_8->} %{field_9}
- logs.android: %{field_1}-%{field_2} %{field_3->} %{field_4->} %{field_5->} %{field_6->} %{field_7}: %{field_8}
- logs.thunderbird-logs: - %{field_1} %{field_2} %{field_3->} %{field_4->} %{field_5->} %{field_6->} %{field_7} %{field_8->}(%{field_9->})%{field_10->}[%{field_11->}]: %{field_12->} %{field_13->} %{field_14->} %{field_15}
- logs.proxifier-logs: [%{field_1} %{field_2}] %{field_3} - %{field_4} %{field_5->} %{field_6->} %{field_7->} %{field_8} %{field_9}
- logs.linux: %{field_1} %{field_2} %{field_3} %{field_4} %{field_5}(%{field_6}_%{field_7})[%{field_8}]: %{field_9->} %{field_10}; %{field_11->} %{field_12}
- logs.apache-web: [%{+attributes.custom.timestamp} %{+attributes.custom.timestamp} %{+attributes.custom.timestamp} %{+attributes.custom.timestamp} %{+attributes.custom.timestamp}] [%{severity_text}] %{body.text}
- logs.android: %{+attributes.custom.timestamp}-%{+attributes.custom.timestamp} %{+attributes.custom.timestamp->} %{resource.attributes.process.pid->} %{attributes.process.thread.id->} %{severity_text->} %{attributes.log.logger}: %{body.text}
- logs.windows: %{+attributes.custom.timestamp}-%{+attributes.custom.timestamp}-%{+attributes.custom.timestamp} %{+attributes.custom.timestamp}, %{severity_text->} %{resource.attributes.service.name->} %{body.text}
- logs.health-app-logs: %{+attributes.custom.timestamp}-%{+attributes.custom.timestamp}|Step_%{attributes.log.logger}|%{resource.attributes.process.pid}|%{body.text}
- logs.proxifier-logs: [%{+attributes.custom.timestamp} %{+attributes.custom.timestamp}] chrome.exe - %{attributes.url.domain} %{attributes.event.type->} %{attributes.custom.details}
- logs.thunderbird-logs: - %{attributes.custom.timestamp} %{+attributes.custom.timestamp_text} %{resource.attributes.host.name->} %{+attributes.custom.timestamp_text->} %{+attributes.custom.timestamp_text->} %{+attributes.custom.timestamp_text->} %{attributes.host.hostname} %{attributes.process.name->}(%{attributes.user.name->})%{field_10->}[%{resource.attributes.process.pid->}]: %{field_12->} %{body.text}
- logs.linux: %{+attributes.custom.timestamp} %{+attributes.custom.timestamp} %{+attributes.custom.timestamp} %{attributes.host.hostname} sshd(pam_unix)[%{resource.attributes.process.pid}]: %{+attributes.event.action->} %{+attributes.event.action}; %{body.text}
- logs.bgl-logs: - %{field_1} %{attributes.custom.date} %{+resource.attributes.host.name}-%{+resource.attributes.host.name}-%{+resource.attributes.host.name}-%{+resource.attributes.host.name}-%{+resource.attributes.host.name} %{+attributes.custom.timestamp}-%{+attributes.custom.timestamp}-%{+attributes.custom.timestamp}-%{+attributes.custom.timestamp} %{+attributes.custom.target_host}-%{+attributes.custom.target_host}-%{+attributes.custom.target_host}-%{+attributes.custom.target_host}-%{+attributes.custom.target_host} RAS KERNEL INFO %{body.text}
- logs.hadoop-logs: %{+attributes.custom.timestamp}-%{+attributes.custom.timestamp}-%{+attributes.custom.timestamp} %{+attributes.custom.timestamp},%{+attributes.custom.timestamp} INFO [%{attributes.process.thread.name}] %{attributes.log.logger}: %{attributes.custom.action} %{attributes.custom.component} for application appattempt_%{+attributes.custom.attempt_id}_%{+attributes.custom.attempt_id}_%{+attributes.custom.attempt_id}

Simulate processing...
- logs.apache-web: 1
  → body.text: 2 unique values (e.g., "mod_jk child workerEnv in error state 6", "workerEnv.init() ok /etc/httpd/conf/workers2.properties")
  → severity_text: 2 unique values (e.g., "error", "notice")
  → attributes.custom.timestamp: 38 unique values (e.g., "Fri Nov 14 15:27:00 2025", "Fri Nov 14 15:26:58 2025", "Fri Nov 14 15:26:56 2025", "Fri Nov 14 15:26:53 2025", "Fri Nov 14 15:26:52 2025", "Fri Nov 14 15:26:50 2025", "Fri Nov 14 15:26:49 2025", "Fri Nov 14 15:26:48 2025", "Fri Nov 14 15:26:47 2025", "Fri Nov 14 15:26:45 2025")
- logs.hadoop-logs: 1
  → attributes.process.thread.name: 1 unique values (e.g., "main")
  → attributes.custom.action: 1 unique values (e.g., "Created")
  → attributes.custom.attempt_id: 1 unique values (e.g., "1445144423722 0020 000001")
  → attributes.custom.timestamp: 65 unique values (e.g., "2025 11 14 15:27:01 370", "2025 11 14 15:27:00 070", "2025 11 14 15:26:58 770", "2025 11 14 15:26:57 470", "2025 11 14 15:26:56 170", "2025 11 14 15:26:54 870", "2025 11 14 15:26:53 570", "2025 11 14 15:26:52 270", "2025 11 14 15:26:50 970", "2025 11 14 15:26:49 670")
  → attributes.custom.component: 1 unique values (e.g., "MRAppMaster")
  → attributes.log.logger: 1 unique values (e.g., "org.apache.hadoop.mapreduce.v2.app.MRAppMaster")
- logs.bgl-logs: 1
  → body.text: 1 unique values (e.g., "instruction cache parity error corrected")
  → field_1: 2 unique values (e.g., "1117838573", "1117838570")
  → attributes.custom.date: 1 unique values (e.g., "2005.06.03")
  → attributes.custom.timestamp: 50 unique values (e.g., "2025 11 14 15.27.01.370000", "2025 11 14 15.27.00.070000", "2025 11 14 15.26.58.770000", "2025 11 14 15.26.57.470000", "2025 11 14 15.26.56.170000", "2025 11 14 15.26.54.870000", "2025 11 14 15.26.53.570000", "2025 11 14 15.26.52.270000", "2025 11 14 15.26.50.970000", "2025 11 14 15.26.49.670000")
  → resource.attributes.host.name: 1 unique values (e.g., "R02 M1 N0 C:J12 U11")
  → attributes.custom.target_host: 1 unique values (e.g., "R02 M1 N0 C:J12 U11")
- logs.linux: 0.6818181818181818
  → body.text: 2 unique values (e.g., "user unknown", "logname= uid=0 euid=0 tty=NODEVssh ruser= rhost=218.188.2.4")
  → attributes.host.hostname: 1 unique values (e.g., "combo")
  → attributes.event.action: 2 unique values (e.g., "check pass", "authentication failure")
  → resource.attributes.process.pid: 2 unique values (e.g., "19937", "19939")
  → attributes.custom.timestamp: 34 unique values (e.g., "Nov 14 15:27:01", "Nov 14 15:27:00", "Nov 14 15:26:58", "Nov 14 15:26:57", "Nov 14 15:26:56", "Nov 14 15:26:54", "Nov 14 15:26:53", "Nov 14 15:26:52", "Nov 14 15:26:50", "Nov 14 15:26:49")
- logs.android: 1
  → body.text: 22 unique values (e.g., "$printFreezingDisplayLogsopening app wtoken = AppWindowToken{9f4ef63 token=Token{a64f992 ActivityReco...", "HBM brightnessOut =38", "Animating brightness: target=38, rate=200", "HBM brightnessIn =38", "cleanUpApplicationRecordLocked, pid: 5769, restart: false", "cleanUpApplicationRecordLocked, pid: 23484, restart: false", "cleanUpApplicationRecord -- 23484", "cleanUpApplicationRecordLocked, reset pid: 5784, euid: 0", "cleanUpApplicationRecordLocked, pid: 5784, restart: false", "cleanUpApplicationRecord -- 5784")
  → severity_text: 4 unique values (e.g., "D", "I", "V", "W")
  → resource.attributes.process.pid: 4 unique values (e.g., "1702", "23650", "2227", "28601")
  → attributes.custom.timestamp: 95 unique values (e.g., "11 14 15:26:58.770", "11 14 15:26:57.470", "11 14 15:26:52.270", "11 14 15:26:50.970", "11 14 15:26:48.370", "11 14 15:26:45.770", "11 14 15:26:44.370", "11 14 15:26:42.970", "11 14 15:26:41.470", "11 14 15:26:38.870")
  → attributes.process.thread.id: 17 unique values (e.g., "2395", "1820", "1737", "1736", "3693", "17632", "17621", "23689", "2250", "14640")
  → attributes.log.logger: 7 unique values (e.g., "WindowManager", "DisplayPowerController", "ActivityManager", "DisplayManagerService", "AudioManager", "PhoneStatusBar", "PowerManagerService")
- logs.health-app-logs: 1
  → body.text: 10 unique values (e.g., "onExtend:1514038530000 14 0 4", "flush sensor data", "setTodayTotalDetailSteps=1514038440000##7007##548365##8661##12361##27173954", "calculateCaloriesWithCache totalCalories=126775", "processHandleBroadcastAction action:android.intent.action.SCREEN_ON", " getTodayTotalDetailSteps = 1514038440000##6993##548365##8661##12266##27164404", "onStandStepChanged 3579", "onReceive action: android.intent.action.SCREEN_ON", "calculateAltitudeWithCache totalAltitude=240", "REPORT : 7007 5002 150089 240")
  → resource.attributes.process.pid: 1 unique values (e.g., "30002312")
  → attributes.custom.timestamp: 10 unique values (e.g., "20251114 15:27:01:370", "20251114 15:27:00:070", "20251114 15:26:58:770", "20251114 15:26:57:470", "20251114 15:26:56:170", "20251114 15:26:54:870", "20251114 15:26:53:570", "20251114 15:26:52:270", "20251114 15:26:50:970", "20251114 15:26:49:670")
  → attributes.log.logger: 5 unique values (e.g., "LSC", "StandStepCounter", "SPUtils", "ExtSDM", "StandReportReceiver")
- logs.windows: 1
  → body.text: 7 unique values (e.g., "$Loaded Servicing Stack v6.1.7601.23505 with Core: C:\Windows\winsxs\amd64_microsoft-windows-servicin...", "Ending TrustedInstaller finalization.", "Reboot mark refs: 0", "Starting TrustedInstaller finalization.", "Ending the TrustedInstaller main loop.", "Idle processing thread terminated normally", "0000000e Created NT transaction (seq 2) result 0x00000000, handle @0xb8")
  → severity_text: 1 unique values (e.g., "Info")
  → attributes.custom.timestamp: 95 unique values (e.g., "2025 11 14 15:27:00", "2025 11 14 15:26:58", "2025 11 14 15:26:57", "2025 11 14 15:26:56", "2025 11 14 15:26:54", "2025 11 14 15:26:53", "2025 11 14 15:26:52", "2025 11 14 15:26:50", "2025 11 14 15:26:49", "2025 11 14 15:26:48")
  → resource.attributes.service.name: 2 unique values (e.g., "CBS", "CSI")
- logs.thunderbird-logs: 0.6190476190476191
  → field_10: 1 unique values (e.g., "")
  → body.text: 2 unique values (e.g., "opened for user root by (uid=0)", "closed for user root")
  → field_12: 1 unique values (e.g., "session")
  → attributes.host.hostname: 13 unique values (e.g., "dn754/dn754", "dn978/dn978", "en74/en74", "dn3/dn3", "dn261/dn261", "dn731/dn731", "src@eadmin1", "dn73/dn73", "dn228/dn228", "dn596/dn596")
  → attributes.custom.timestamp_text: 1 unique values (e.g., "2005.11.09 Nov 9 12:01:01")
  → attributes.process.name: 1 unique values (e.g., "crond")
  → resource.attributes.process.pid: 12 unique values (e.g., "2913", "2920", "3080", "2907", "2916", "4307", "2917", "2915", "2727", "12636")
  → attributes.custom.timestamp: 3 unique values (e.g., "1763134020", "1763134018", "1763134017")
  → attributes.user.name: 1 unique values (e.g., "pam_unix")
  → resource.attributes.host.name: 13 unique values (e.g., "dn754", "dn978", "en74", "dn3", "dn261", "dn731", "eadmin1", "dn73", "dn228", "dn596")
- logs.proxifier-logs: 1
  → attributes.event.type: 2 unique values (e.g., "open", "close,")
  → attributes.url.domain: 1 unique values (e.g., "proxy.cse.cuhk.edu.hk:5070")
  → attributes.custom.details: 38 unique values (e.g., "through proxy proxy.cse.cuhk.edu.hk:5070 HTTPS", "1190 bytes (1.16 KB) sent, 1671 bytes (1.63 KB) received, lifetime 00:02", "845 bytes sent, 12076 bytes (11.7 KB) received, lifetime <1 sec", "1165 bytes (1.13 KB) sent, 815 bytes received, lifetime <1 sec", "850 bytes sent, 10547 bytes (10.2 KB) received, lifetime 00:02", "0 bytes sent, 0 bytes received, lifetime <1 sec", "3425 bytes (3.34 KB) sent, 212164 bytes (207 KB) received, lifetime 00:18", "934 bytes sent, 5869 bytes (5.73 KB) received, lifetime <1 sec", "451 bytes sent, 18846 bytes (18.4 KB) received, lifetime <1 sec", "1293 bytes (1.26 KB) sent, 2439 bytes (2.38 KB) received, lifetime <1 sec")
  → attributes.custom.timestamp: 2 unique values (e.g., "11.14 15:27:01", "11.14 15:27:00")

Average Parsing Score (samples): 0.9577777777777778
Average Parsing Score (all docs): 0.9223184223184222
```

</details>

---------

Co-authored-by: kibanamachine <[email protected]>
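For intuition, the heuristic flow (find a consistent delimiter across samples, then emit placeholder field names) can be sketched in a deliberately naive form. This is not the actual `extractDissectPatternDangerouslySlow` implementation; `naiveDissectPattern` is a hypothetical name, and the real 6-step pipeline handles trees of delimiters and modifiers.

```typescript
// Illustrative sketch only: pick the first delimiter that splits every sample
// into the same number of parts, then emit generic %{field_N} placeholders.
function naiveDissectPattern(
  samples: string[],
  delimiters: string[] = ['|', ',', '\t', ' ']
): string | undefined {
  if (samples.length === 0) return undefined;
  for (const delim of delimiters) {
    const counts = samples.map((s) => s.split(delim).length);
    // A delimiter is "consistent" if every message has the same part count > 1.
    if (counts.every((c) => c === counts[0] && c > 1)) {
      const fields = Array.from({ length: counts[0] }, (_, i) => `%{field_${i + 1}}`);
      return fields.join(delim);
    }
  }
  return undefined; // no consistent delimiter found
}

console.log(
  naiveDissectPattern([
    '2024-01-15|INFO|UserService|User login successful',
    '2024-01-15|ERROR|DatabaseService|Connection timeout',
  ])
);
// → %{field_1}|%{field_2}|%{field_3}|%{field_4}
```

The LLM review step then renames these generic `field_N` placeholders to ECS/OTEL names, as seen in the eval output above.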
1 parent 5ee4015 commit f54f0e4


58 files changed: +6436, -267 lines changed


.github/CODEOWNERS

Lines changed: 1 addition & 0 deletions

```diff
@@ -930,6 +930,7 @@ x-pack/platform/packages/shared/kbn-apm-types @elastic/obs-ux-infra_services-tea
 x-pack/platform/packages/shared/kbn-classic-stream-flyout @elastic/kibana-management
 x-pack/platform/packages/shared/kbn-content-packs-schema @elastic/streams-program-team
 x-pack/platform/packages/shared/kbn-data-forge @elastic/obs-ux-management-team
+x-pack/platform/packages/shared/kbn-dissect-heuristics @elastic/obs-onboarding-team
 x-pack/platform/packages/shared/kbn-elastic-assistant @elastic/security-generative-ai
 x-pack/platform/packages/shared/kbn-elastic-assistant-common @elastic/security-generative-ai
 x-pack/platform/packages/shared/kbn-elastic-assistant-shared-state @elastic/security-generative-ai
```

package.json

Lines changed: 1 addition & 0 deletions

```diff
@@ -530,6 +530,7 @@
 "@kbn/discover-plugin": "link:src/platform/plugins/shared/discover",
 "@kbn/discover-shared-plugin": "link:src/platform/plugins/shared/discover_shared",
 "@kbn/discover-utils": "link:src/platform/packages/shared/kbn-discover-utils",
+"@kbn/dissect-heuristics": "link:x-pack/platform/packages/shared/kbn-dissect-heuristics",
 "@kbn/doc-links": "link:src/platform/packages/shared/kbn-doc-links",
 "@kbn/dom-drag-drop": "link:src/platform/packages/shared/kbn-dom-drag-drop",
 "@kbn/ebt-tools": "link:src/platform/packages/shared/kbn-ebt-tools",
```

tsconfig.base.json

Lines changed: 2 additions & 0 deletions

```diff
@@ -922,6 +922,8 @@
 "@kbn/discover-shared-plugin/*": ["src/platform/plugins/shared/discover_shared/*"],
 "@kbn/discover-utils": ["src/platform/packages/shared/kbn-discover-utils"],
 "@kbn/discover-utils/*": ["src/platform/packages/shared/kbn-discover-utils/*"],
+"@kbn/dissect-heuristics": ["x-pack/platform/packages/shared/kbn-dissect-heuristics"],
+"@kbn/dissect-heuristics/*": ["x-pack/platform/packages/shared/kbn-dissect-heuristics/*"],
 "@kbn/doc-links": ["src/platform/packages/shared/kbn-doc-links"],
 "@kbn/doc-links/*": ["src/platform/packages/shared/kbn-doc-links/*"],
 "@kbn/docs-utils": ["packages/kbn-docs-utils"],
```
Lines changed: 26 additions & 0 deletions (new file)

```markdown
# @kbn/dissect-heuristics

Utilities and helper functions for extracting Dissect patterns from log messages.

## Overview

This package provides an algorithm to automatically generate [Elasticsearch Dissect processor](https://www.elastic.co/guide/en/elasticsearch/reference/current/dissect-processor.html) patterns by analyzing sample log messages. Unlike Grok patterns, which use regular expressions, Dissect patterns use simple literal string delimiters for faster parsing of structured logs.

## Supported Dissect Modifiers

The extraction algorithm supports a subset of Dissect modifiers:

- **Right Padding (`->`)**: Handles variable trailing whitespace
- **Named Skip (`?`)**: Skips fields with non-meaningful constant values
- **Empty Skip (`{}`)**: Anonymous skip fields

**Note**: Reference keys (`*` and `&`) and append modifiers (`+`) are not supported by this implementation.

## Delimiter Reliability Heuristics

The current approach keeps delimiter scoring deliberately simple:

- Position consistency: Delimiters are scored only by how consistently they appear at similar character offsets across messages (variance-based exponential decay).
- Symmetry enforcement: After scoring, any single-character closing bracket `)`, `]`, `}` is discarded unless its matching opener `(`, `[`, `{` was also selected. This avoids generating patterns that fragment bracketed content using an orphan closer.

No additional bracket penalties (mismatch, crossing, depth variance, ordering instability) are applied, favoring simpler, more predictable behavior while still preventing obviously broken delimiter choices.
```
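The two heuristics the README describes can be sketched as below. This is an illustrative reimplementation, not the package's actual code: the names `positionConsistencyScore` and `enforceSymmetry` are hypothetical, and the real scoring may consider all occurrences of a delimiter rather than just the first.

```typescript
// Sketch of variance-based exponential decay: delimiters that sit at the same
// character offset in every message score 1; spread-out offsets decay toward 0.
function positionConsistencyScore(messages: string[], delimiter: string): number {
  const offsets = messages.map((m) => m.indexOf(delimiter));
  if (offsets.some((o) => o < 0)) return 0; // must appear in every message
  const mean = offsets.reduce((a, b) => a + b, 0) / offsets.length;
  const variance = offsets.reduce((a, o) => a + (o - mean) ** 2, 0) / offsets.length;
  return Math.exp(-variance);
}

// Sketch of symmetry enforcement: drop a closing bracket unless its matching
// opener was also selected, so bracketed content is never split by an orphan closer.
function enforceSymmetry(selected: Set<string>): Set<string> {
  const pairs: Record<string, string> = { ')': '(', ']': '[', '}': '{' };
  return new Set(Array.from(selected).filter((d) => !(d in pairs) || selected.has(pairs[d])));
}
```

With this shape, `exp(-variance)` gives exactly 1 for perfectly aligned delimiters and falls off quickly as alignment degrades, which matches the "deliberately simple" intent stated above.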
Lines changed: 25 additions & 0 deletions (new file)

```typescript
/*
 * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
 * or more contributor license agreements. Licensed under the Elastic License
 * 2.0; you may not use this file except in compliance with the Elastic License
 * 2.0.
 */

export { extractDissectPattern } from './src/extract_dissect_pattern';
export { getDissectProcessor } from './src/get_dissect_processor';
export { getReviewFields, type ReviewFields } from './src/review/get_review_fields';
export { getDissectProcessorWithReview } from './src/review/get_dissect_processor_with_review';
export { ReviewDissectFieldsPrompt } from './src/review/review_fields_prompt';
export { groupMessagesByPattern } from './src/group_messages';
export { serializeAST } from './src/serialize_ast';
export type {
  DissectPattern,
  DissectField,
  DissectModifiers,
  DelimiterNode,
  DissectProcessorResult,
  DissectAST,
  DissectASTNode,
  DissectFieldNode,
  DissectLiteralNode,
} from './src/types';
```
Lines changed: 12 additions & 0 deletions (new file)

```javascript
/*
 * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
 * or more contributor license agreements. Licensed under the Elastic License
 * 2.0; you may not use this file except in compliance with the Elastic License
 * 2.0.
 */

module.exports = {
  preset: '@kbn/test/jest_node',
  rootDir: '../../../../..',
  roots: ['<rootDir>/x-pack/platform/packages/shared/kbn-dissect-heuristics'],
};
```
Lines changed: 7 additions & 0 deletions (new file)

```jsonc
{
  "type": "shared-common",
  "id": "@kbn/dissect-heuristics",
  "owner": "@elastic/obs-onboarding-team",
  "group": "platform",
  "visibility": "shared"
}
```
Lines changed: 6 additions & 0 deletions (new file)

```json
{
  "name": "@kbn/dissect-heuristics",
  "private": true,
  "version": "1.0.0",
  "license": "Elastic License 2.0"
}
```
Lines changed: 155 additions & 0 deletions (new file)

```typescript
/*
 * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
 * or more contributor license agreements. Licensed under the Elastic License
 * 2.0; you may not use this file except in compliance with the Elastic License
 * 2.0.
 */

/**
 * Apache Common Log Format
 * Pattern: %{clientip} %{ident} %{auth} [%{timestamp}] "%{verb} %{request} %{httpversion}" %{response} %{bytes}
 */
export const APACHE_LOGS = [
  '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326',
  '127.0.0.1 - - [10/Oct/2000:13:55:37 -0700] "GET /images/logo.png HTTP/1.0" 200 1234',
  '192.168.1.1 - frank [10/Oct/2000:13:55:38 -0700] "POST /api/submit HTTP/1.1" 201 512',
];

/**
 * Syslog format
 * Pattern: %{month} %{day} %{time} %{hostname} %{process}[%{pid}]: %{message}
 */
export const SYSLOG = [
  'Mar 10 15:45:23 hostname sshd[12345]: Accepted password for user from 192.168.1.1',
  'Mar 10 15:45:24 hostname systemd[1]: Started service successfully',
  'Mar 10 15:45:25 hostname kernel[0]: Memory allocation completed',
];

/**
 * Custom pipe-delimited format
 * Pattern: %{date}|%{level}|%{service}|%{message}
 */
export const PIPE_DELIMITED = [
  '2024-01-15|INFO|UserService|User login successful',
  '2024-01-15|WARN|AuthService|Rate limit approaching',
  '2024-01-15|ERROR|DatabaseService|Connection timeout',
];

/**
 * Logs with variable trailing whitespace (needs right padding modifier)
 * Pattern: %{level->} %{message}
 */
export const VARIABLE_WHITESPACE = [
  'INFO Log message here',
  'WARN Log message here',
  'ERROR Log message here',
  'DEBUG Log message here',
];

/**
 * Logs with skip fields (constant values)
 * Pattern: %{ip} %{} %{} [%{timestamp}]
 */
export const WITH_SKIP_FIELDS = [
  '1.2.3.4 - - [30/Apr/1998:22:00:52]',
  '5.6.7.8 - - [01/May/1998:10:15:30]',
  '9.0.1.2 - - [02/May/1998:14:22:18]',
];

/**
 * Simple space-delimited
 * Pattern: %{field1} %{field2} %{field3}
 */
export const SPACE_DELIMITED = ['alpha beta gamma', 'one two three', 'red green blue'];

/**
 * JSON-like structure (challenging case)
 * Pattern: {"timestamp":"%{timestamp}","level":"%{level}","message":"%{message}"}
 */
export const JSON_LIKE = [
  '{"timestamp":"2024-01-15T10:30:00","level":"INFO","message":"User logged in"}',
  '{"timestamp":"2024-01-15T10:30:01","level":"WARN","message":"High memory usage"}',
  '{"timestamp":"2024-01-15T10:30:02","level":"ERROR","message":"Database error"}',
];

/**
 * CSV format
 * Pattern: %{id},%{name},%{email},%{status}
 */
export const CSV_FORMAT = [
  '1,John Doe,[email protected],active',
  '2,Jane Smith,[email protected],inactive',
  '3,Bob Johnson,[email protected],active',
];

/**
 * Edge case: No common delimiters
 */
export const NO_COMMON_DELIMITERS = ['abc123', 'xyz789', 'def456'];

/**
 * Edge case: All identical messages
 */
export const IDENTICAL_MESSAGES = [
  'Same message every time',
  'Same message every time',
  'Same message every time',
];

/**
 * Edge case: Single message
 */
export const SINGLE_MESSAGE = ['This is a single log message'];

/**
 * Edge case: Empty messages
 */
export const EMPTY_MESSAGES = ['', '', ''];

/**
 * Complex nested delimiters
 * Pattern: %{date} [%{level}] (%{module}) - %{message}
 */
export const NESTED_DELIMITERS = [
  '2024-01-15 [INFO] (UserService) - User authenticated successfully',
  '2024-01-15 [WARN] (AuthService) - Invalid token detected',
  '2024-01-15 [ERROR] (DatabaseService) - Query execution failed',
];

/**
 * Multiple spaces as delimiters
 * Pattern: %{field1} %{field2} %{field3}
 */
export const MULTIPLE_SPACES = [
  'field1 field2 field3',
  'data1 data2 data3',
  'test1 test2 test3',
];

/**
 * Tab-delimited format
 * Pattern: %{col1}\t%{col2}\t%{col3}
 */
export const TAB_DELIMITED = ['col1\tcol2\tcol3', 'val1\tval2\tval3', 'data1\tdata2\tdata3'];

/**
 * Kubernetes-style logs
 * Pattern: %{timestamp} %{stream} %{flag} %{message}
 */
export const KUBERNETES_LOGS = [
  '2024-01-15T10:30:00.000Z stdout F Container started successfully',
  '2024-01-15T10:30:01.000Z stderr F Error: Connection refused',
  '2024-01-15T10:30:02.000Z stdout P Partial log line continues',
];

/**
 * Expected patterns for testing
 */
export const EXPECTED_PATTERNS = {
  APACHE_LOGS:
    '%{clientip} %{ident} %{auth} [%{timestamp}] "%{verb} %{request} %{httpversion}" %{response} %{bytes}',
  SYSLOG: '%{month} %{day} %{time} %{hostname} %{process}[%{pid}]: %{message}',
  PIPE_DELIMITED: '%{date}|%{level}|%{service}|%{message}',
  SPACE_DELIMITED: '%{field1} %{field2} %{field3}',
  CSV_FORMAT: '%{id},%{name},%{email},%{status}',
};
```
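To illustrate what the `EXPECTED_PATTERNS` fixtures mean operationally, here is a minimal, hypothetical dissect-style matcher (`applyDissect` is not part of the package). It handles named keys and anonymous `%{}` skips only; the real Elasticsearch dissect processor also supports padding, append, and reference modifiers.

```typescript
// Apply a dissect pattern to a line by compiling %{name} keys into lazy regex
// groups and treating everything else as a literal delimiter.
function applyDissect(pattern: string, line: string): Record<string, string> | null {
  const parts = pattern.split(/(%\{[^}]*\})/).filter((p) => p !== '');
  let regex = '^';
  const names: string[] = [];
  for (const part of parts) {
    const key = part.match(/^%\{([^}]*)\}$/);
    if (key) {
      names.push(key[1]);
      regex += '(.*?)'; // lazy: stop at the next literal delimiter
    } else {
      regex += part.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'); // escape literals
    }
  }
  regex += '$';
  const match = line.match(new RegExp(regex));
  if (!match) return null;
  const out: Record<string, string> = {};
  names.forEach((name, i) => {
    if (name !== '') out[name] = match[i + 1]; // %{} is an anonymous skip
  });
  return out;
}

console.log(
  applyDissect('%{date}|%{level}|%{service}|%{message}', '2024-01-15|INFO|UserService|User login successful')
);
// → { date: '2024-01-15', level: 'INFO', service: 'UserService', message: 'User login successful' }
```

A matcher like this is also how the `WITH_SKIP_FIELDS` fixture behaves: `%{ip} %{} %{} [%{timestamp}]` keeps `ip` and `timestamp` while discarding the two constant `-` columns.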
