
Commit b411eba

awahab07, kibanamachine, and elasticmachine authored
[FatalReactError] Send additional metrics with Error Telemetry Event to understand transient, non-breaking errors. (#234589)
## Summary

This PR improves the React error boundary telemetry in Shared UX to correctly classify and measure short-lived (“transient”) errors.

## Problem

Historically, we could not reliably distinguish between:

* A momentary error boundary that appears briefly and then disappears, a probable false positive due to immediate user navigation or recovery.
* A “seen” error UI that remained on screen long enough to be considered a user-visible failure (validly Fatal).

This made it difficult to detect transient errors that users likely did not see, and that possibly should not be reported to telemetry as Fatal errors (or alerted on).

### What this PR does

Adds two telemetry fields to `REACT_FATAL_ERROR_EVENT_TYPE`:

* `component_render_min_duration_ms`: Minimum time the error boundary UI component remained mounted.
  - Helps identify truly momentary/transient error experiences.
* `has_transient_navigation`: A boolean that is `true` only if a navigation occurred within the first `TRANSIENT_NAVIGATION_WINDOW_MS` (e.g. 250ms) after the error appeared.
  - Enables analysis of transient errors that coincide with user or programmatic navigation (e.g., a route change after an error render).

#### Refactored Error Reporting Logic

To accurately capture `has_transient_navigation`, an error is not reported immediately. Instead, it is first enqueued and held for a brief time window (default 250ms) to determine whether a transient navigation has occurred. The error report is only committed after this window elapses, ensuring `has_transient_navigation` has been correctly determined.

A commit is triggered by any user action that unmounts the error boundary (provided `TRANSIENT_NAVIGATION_WINDOW_MS` has elapsed; otherwise it waits), or automatically after 10 seconds. This waiting period for classification does not affect `component_render_min_duration_ms`, which accurately measures the component's actual on-screen render time.

- `TRANSIENT_NAVIGATION_WINDOW_MS` (default 250ms) defines the navigation settlement window.
- `DEFAULT_MAX_ERROR_DURATION_MS` (default 10s) is the maximum time the error is held from reporting to determine `component_render_min_duration_ms`, so `component_render_min_duration_ms` maxes out at 10s.

#### Example Payload

```json
{
  "component_name": "KibanaSectionErrorBoundary",
  "component_stack": "...",
  "error_message": "Error: ...",
  "error_stack": "...",
  "component_render_min_duration_ms": 103,
  "has_transient_navigation": true
}
```

## How to Test / Reproduce

Since it's hard to reproduce such transient errors in a Dev environment, two reproducible error scenarios have been simulated in the commit `564007eb91704558a67e3474cdb116c0418a908a`. You can check out that commit and reproduce the scenarios.

**Tip:** To see the reported Fatal Error payload, you can add a log point in http://localhost:5601/entry:core/node_modules/@elastic/ebt/client/src/analytics_client/analytics_client.js at L46 with content `'reporting event', eventType, eventData`:

<img width="600" alt="image" src="https://github.com/user-attachments/assets/9fb56722-4c2b-418e-8690-80c7e05a35d9" />

### A) Logs app router issue (was fixed by [#194580](https://github.com/elastic/kibana/pull/194580/files#diff-65b7459bd6dcaafd801e69ddd217d7e977391ef8398526de737132b919c23c52))

After checking out [the commit](564007e), visit `http://localhost:5601/<proxy>/app/logs`

https://github.com/user-attachments/assets/e885e945-4ba9-499e-aeb2-afc9482e31e0

### B) Demo transient scenarios (in Kibana)

After checking out the same [commit](564007e), visit the Observability Onboarding page `http://localhost:5601/<proxy>/app/observabilityOnboarding` as shown in the following video:

https://github.com/user-attachments/assets/85897472-af10-46b9-8792-ee5fadf40426

### C) Storybook

A comprehensive story with docs/comments has been added for a detailed reproduction of the scenarios.
Run via `yarn run storybook shared_ux`, visit `localhost:9001` and consult the following videos:

https://github.com/user-attachments/assets/2615863b-9055-4618-a679-5468d75f7725

https://github.com/user-attachments/assets/bb82817f-52fd-48eb-ae8d-86400373c20c

### D) Reproducing the known error in prod scenario

Most transient errors in prod were reported by the scenario demonstrated in the following video, where a user's navigation attempts in quick succession would raise this error.

https://github.com/user-attachments/assets/670c6912-c706-4a32-967a-fbc870e92923

As shown above, it was reproducible before, but after [this](#220313) fix, it should be quite hard to reproduce. The metrics added in this PR will make it easier to identify such errors if they are still being reported, and such errors could possibly be filtered out based on the values of those metrics.

To reproduce this in Storybook, use the following story, which allows performing a navigation in close succession to the error boundary rendering:

https://github.com/user-attachments/assets/c5d1a82a-19f4-477b-a969-d756ff57c7c3

---------

Co-authored-by: kibanamachine <[email protected]>
Co-authored-by: Elastic Machine <[email protected]>
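The hold-then-commit flow described under "Refactored Error Reporting Logic" can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual `KibanaErrorService` code: the class `DeferredErrorReporter`, its method names, and the manually driven `tick` clock are all hypothetical, and only the two constants and the two payload field names come from the description above.

```typescript
// Hypothetical sketch: class, method, and type names are illustrative assumptions,
// not the real KibanaErrorService implementation.
const TRANSIENT_NAVIGATION_WINDOW_MS = 250; // default quoted in the PR description
const DEFAULT_MAX_ERROR_DURATION_MS = 10000; // default quoted in the PR description

interface ReportedFields {
  error_message: string;
  component_render_min_duration_ms: number;
  has_transient_navigation: boolean;
}

interface PendingError {
  message: string;
  shownAt: number; // when the error boundary UI mounted
  unmountedAt?: number; // when the error boundary UI unmounted, if it has
  hasTransientNavigation: boolean;
}

class DeferredErrorReporter {
  private pending: PendingError | undefined;
  private readonly report: (fields: ReportedFields) => void;

  constructor(report: (fields: ReportedFields) => void) {
    this.report = report;
  }

  /** Hold the error instead of reporting it immediately. */
  enqueue(message: string, now: number): void {
    this.pending = { message, shownAt: now, hasTransientNavigation: false };
  }

  /** A navigation counts as transient only inside the settlement window. */
  onNavigation(now: number): void {
    const p = this.pending;
    if (p && now - p.shownAt <= TRANSIENT_NAVIGATION_WINDOW_MS) {
      p.hasTransientNavigation = true;
    }
  }

  /** Record the real on-screen time, then commit if the window has elapsed. */
  onUnmount(now: number): void {
    if (this.pending && this.pending.unmountedAt === undefined) {
      this.pending.unmountedAt = now;
    }
    this.tick(now);
  }

  /** A timer would call this in a real implementation; it is driven manually here. */
  tick(now: number): void {
    const p = this.pending;
    if (!p) return;
    const elapsed = now - p.shownAt;
    const windowElapsed = elapsed >= TRANSIENT_NAVIGATION_WINDOW_MS;
    const timedOut = elapsed >= DEFAULT_MAX_ERROR_DURATION_MS;
    if (!timedOut && !(p.unmountedAt !== undefined && windowElapsed)) return;

    const end = p.unmountedAt ?? now;
    this.pending = undefined;
    this.report({
      error_message: p.message,
      component_render_min_duration_ms: Math.min(end - p.shownAt, DEFAULT_MAX_ERROR_DURATION_MS),
      has_transient_navigation: p.hasTransientNavigation,
    });
  }
}

// Usage: error shown at t=0, navigation at t=100, unmount at t=103.
const reported: ReportedFields[] = [];
const reporter = new DeferredErrorReporter((fields) => {
  reported.push(fields);
});
reporter.enqueue('Error: example', 0);
reporter.onNavigation(100);
reporter.onUnmount(103); // window not yet elapsed, so nothing is reported
reporter.tick(250); // window elapsed: the held error is committed
```

Passing timestamps in explicitly keeps the sketch deterministic. The measured duration is fixed at unmount time, so waiting out the classification window does not inflate `component_render_min_duration_ms`.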
1 parent b8f3c71 commit b411eba

21 files changed: +1453 -179 lines changed

src/platform/packages/shared/shared-ux/error_boundary/index.ts

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@
  * License v3.0 only", or the "Server Side Public License, v 1".
  */
 
-export { KibanaErrorBoundaryProvider } from './src/services/error_boundary_services';
+export { KibanaErrorBoundaryProvider } from './src/services/error_boundary_provider';
 export { KibanaErrorBoundary } from './src/ui/error_boundary';
 export { KibanaSectionErrorBoundary } from './src/ui/section_error_boundary';
 export { ThrowIfError } from './src/ui/throw_if_error';

src/platform/packages/shared/shared-ux/error_boundary/lib/telemetry_events.ts

Lines changed: 16 additions & 0 deletions
@@ -32,6 +32,22 @@ export const reactFatalErrorSchema = {
       optional: false as const,
     },
   },
+  component_render_min_duration_ms: {
+    type: 'long' as const,
+    _meta: {
+      description:
+        'Minimum duration in milliseconds that the fatal error component stayed rendered (before unmount). A max value of 10,000 (10s) is enforced to prevent excessive, indefinite or indeterminable durations.',
+      optional: false as const,
+    },
+  },
+  has_transient_navigation: {
+    type: 'boolean' as const,
+    _meta: {
+      description:
+        'Indicates if navigation occurred within the transient window (e.g. first 250ms) after the error occurred. This helps identify transient errors, successfully followed by a navigation, that users may not have seen.',
+      optional: false as const,
+    },
+  },
   error_message: {
     type: 'keyword' as const,
     _meta: {
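As an illustration of how the two new schema fields might be used downstream to separate likely-unseen errors from user-visible ones, here is a small sketch. The helper `isLikelyTransient`, the interface name, and the 250ms visibility threshold are assumptions made for this example; only the two field names come from the schema above.

```typescript
// Hypothetical helper: the field names match the schema, but the function name
// and the visibility threshold are assumptions for illustration only.
interface FatalErrorTransientFields {
  component_render_min_duration_ms: number;
  has_transient_navigation: boolean;
}

const ASSUMED_MIN_VISIBLE_MS = 250;

function isLikelyTransient(
  event: FatalErrorTransientFields,
  minVisibleMs: number = ASSUMED_MIN_VISIBLE_MS
): boolean {
  // Treat the error as transient if a navigation followed it quickly,
  // or if the error UI was on screen for less than the assumed threshold.
  return event.has_transient_navigation || event.component_render_min_duration_ms < minVisibleMs;
}

const sampleEvents: FatalErrorTransientFields[] = [
  { component_render_min_duration_ms: 103, has_transient_navigation: true },
  { component_render_min_duration_ms: 10000, has_transient_navigation: false },
];

// Keep only the events a user plausibly saw.
const userVisibleEvents = sampleEvents.filter((event) => !isLikelyTransient(event));
```

Whether to combine the two signals with OR, as here, or to require both is a policy choice the PR leaves open.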

src/platform/packages/shared/shared-ux/error_boundary/mocks/index.ts

Lines changed: 8 additions & 0 deletions
@@ -11,3 +11,11 @@ export { BadComponent } from './src/bad_component';
 export { ChunkLoadErrorComponent } from './src/chunk_load_error_component';
 export { getServicesMock } from './src/jest';
 export { KibanaErrorBoundaryStorybookMock } from './src/storybook';
+export { createAnalyticsMock } from './src/analytics_mock';
+export {
+  ControlsBar,
+  Spacer,
+  DocsBlock,
+  StoryActionButton,
+} from './src/error_boundary_story_controls';
+export { createServicesWithAnalyticsMock } from './src/story_services';
Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
+/*
+ * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
+ * or more contributor license agreements. Licensed under the "Elastic License
+ * 2.0", the "GNU Affero General Public License v3.0 only", and the "Server Side
+ * Public License v 1"; you may not use this file except in compliance with, at
+ * your election, the "Elastic License 2.0", the "GNU Affero General Public
+ * License v3.0 only", or the "Server Side Public License, v 1".
+ */
+
+import { action } from '@storybook/addon-actions';
+
+export interface TelemetryEvent {
+  type: string;
+  payload: unknown;
+  at: number;
+}
+
+export interface AnalyticsMock {
+  analytics: { reportEvent: (type: string, payload: unknown) => void };
+  getEvents(): TelemetryEvent[];
+  clear(): void;
+  subscribe(cb: (events: TelemetryEvent[]) => void): () => void;
+}
+
+export function createAnalyticsMock(): AnalyticsMock {
+  const events: TelemetryEvent[] = [];
+  const subscribers = new Set<(events: TelemetryEvent[]) => void>();
+  const reportAction = action('Report telemetry event');
+
+  function notify() {
+    subscribers.forEach((cb) => cb([...events]));
+  }
+
+  return {
+    analytics: {
+      reportEvent: (type: string, payload: unknown) => {
+        reportAction(type, payload);
+        events.push({ type, payload, at: Date.now() });
+        notify();
+      },
+    },
+    getEvents: () => [...events],
+    clear: () => {
+      events.splice(0, events.length);
+      notify();
+    },
+    subscribe: (cb) => {
+      subscribers.add(cb);
+      return () => subscribers.delete(cb);
+    },
+  };
+}
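To show how a story or test might consume this mock, the sketch below exercises the subscribe, report, and clear cycle. It re-declares a trimmed copy of the mock without the Storybook `action` logger so that it runs standalone; the names `createStandaloneAnalyticsMock` and `RecordedEvent` are hypothetical and exist only for this example.

```typescript
// Trimmed, standalone copy of the mock above. The Storybook `action` logger is
// omitted (an assumption made only so this sketch runs without Storybook).
interface RecordedEvent {
  type: string;
  payload: unknown;
  at: number;
}

function createStandaloneAnalyticsMock() {
  const events: RecordedEvent[] = [];
  const subscribers = new Set<(events: RecordedEvent[]) => void>();
  // Each subscriber receives a copy, so it cannot mutate the internal list.
  const notify = () => subscribers.forEach((cb) => cb([...events]));

  return {
    analytics: {
      reportEvent: (type: string, payload: unknown) => {
        events.push({ type, payload, at: Date.now() });
        notify();
      },
    },
    getEvents: () => [...events],
    clear: () => {
      events.splice(0, events.length);
      notify();
    },
    subscribe: (cb: (events: RecordedEvent[]) => void) => {
      subscribers.add(cb);
      return () => {
        subscribers.delete(cb);
      };
    },
  };
}

// Usage: track how many events each notification carried.
const analyticsMock = createStandaloneAnalyticsMock();
const snapshotSizes: number[] = [];
const unsubscribe = analyticsMock.subscribe((events) => {
  snapshotSizes.push(events.length);
});

analyticsMock.analytics.reportEvent('fatal-error-react', { has_transient_navigation: true });
analyticsMock.clear(); // subscribers are notified with an empty list
unsubscribe(); // later reports no longer reach the subscriber
analyticsMock.analytics.reportEvent('fatal-error-react', { has_transient_navigation: false });
```

Because `getEvents` and the subscriber callbacks both receive copies, a story can render the event list directly without risking accidental mutation of the mock's state.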

src/platform/packages/shared/shared-ux/error_boundary/mocks/src/bad_component.tsx

Lines changed: 2 additions & 2 deletions
@@ -25,8 +25,8 @@ export const BadComponent = () => {
   };
 
   return (
-    <EuiButton onClick={handleClick} data-test-subj="clickForErrorBtn">
-      Click for error
+    <EuiButton color="danger" onClick={handleClick} data-test-subj="clickForErrorBtn">
+      Throw error
     </EuiButton>
   );
 };

src/platform/packages/shared/shared-ux/error_boundary/mocks/src/chunk_load_error_component.tsx

Lines changed: 2 additions & 2 deletions
@@ -27,8 +27,8 @@ export const ChunkLoadErrorComponent = () => {
   };
 
   return (
-    <EuiButton onClick={handleClick} fill={true} data-test-subj="clickForErrorBtn">
-      Click for error
+    <EuiButton color="danger" onClick={handleClick} fill={true} data-test-subj="clickForErrorBtn">
+      Throw error
     </EuiButton>
   );
 };
Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
+/*
+ * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
+ * or more contributor license agreements. Licensed under the "Elastic License
+ * 2.0", the "GNU Affero General Public License v3.0 only", and the "Server Side
+ * Public License v 1"; you may not use this file except in compliance with, at
+ * your election, the "Elastic License 2.0", the "GNU Affero General Public
+ * License v3.0 only", or the "Server Side Public License, v 1".
+ */
+
+import React from 'react';
+import { EuiButton } from '@elastic/eui';
+
+export const Spacer: React.FC = () => <div style={{ height: 12 }} />;
+
+export const ControlsBar: React.FC<{ children: React.ReactNode }> = ({ children }) => (
+  <div style={{ display: 'flex', gap: 8, flexWrap: 'wrap', alignItems: 'center' }}>{children}</div>
+);
+
+export const DocsBlock: React.FC<{ title: string; children: React.ReactNode }> = ({
+  title,
+  children,
+}) => (
+  <div style={{ marginTop: 12, padding: 12, border: '1px dashed #ccc', borderRadius: 6 }}>
+    <div style={{ fontWeight: 700, marginBottom: 6 }}>{title}</div>
+    <div style={{ lineHeight: 1.5 }}>{children}</div>
+  </div>
+);
+
+export const StoryActionButton: React.FC<{
+  onClick: () => void;
+  children: React.ReactNode;
+  color?:
+    | 'primary'
+    | 'danger'
+    | 'success'
+    | 'warning'
+    | 'text'
+    | 'accent'
+    | 'accentSecondary'
+    | 'neutral'
+    | 'risk'
+    | undefined;
+  fill?: boolean;
+  disabled?: boolean;
+}> = ({ onClick, children, color = 'danger', fill = true, disabled }) => {
+  return (
+    <EuiButton
+      color={color}
+      onClick={onClick}
+      fill={fill}
+      isDisabled={disabled}
+      style={{ margin: 4 }}
+      size="s"
+    >
+      {children}
+    </EuiButton>
+  );
+};
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
+/*
+ * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
+ * or more contributor license agreements. Licensed under the "Elastic License
+ * 2.0", the "GNU Affero General Public License v3.0 only", and the "Server Side
+ * Public License v 1"; you may not use this file except in compliance with, at
+ * your election, the "Elastic License 2.0", the "GNU Affero General Public
+ * License v3.0 only", or the "Server Side Public License, v 1".
+ */
+
+import { action } from '@storybook/addon-actions';
+import type { AnalyticsMock } from './analytics_mock';
+import { createAnalyticsMock } from './analytics_mock';
+import type { KibanaErrorBoundaryServices } from '../../types';
+import { KibanaErrorService } from '../../src/services/error_service';
+
+export function createServicesWithAnalyticsMock(): {
+  services: KibanaErrorBoundaryServices;
+  mock: AnalyticsMock;
+} {
+  const onClickRefresh = action('Reload window');
+  const mock = createAnalyticsMock();
+  const analytics = mock.analytics;
+
+  return {
+    services: {
+      onClickRefresh,
+      errorService: new KibanaErrorService({ analytics }),
+    },
+    mock,
+  };
+}

src/platform/packages/shared/shared-ux/error_boundary/mocks/src/storybook.ts

Lines changed: 3 additions & 1 deletion
@@ -10,6 +10,7 @@
 import { AbstractStorybookMock } from '@kbn/shared-ux-storybook-mock';
 import { action } from '@storybook/addon-actions';
 import { KibanaErrorService } from '../../src/services/error_service';
+import { createAnalyticsMock } from './analytics_mock';
 import type { KibanaErrorBoundaryServices } from '../../types';
 
 // eslint-disable-next-line @typescript-eslint/no-empty-interface
@@ -27,7 +28,8 @@ export class KibanaErrorBoundaryStorybookMock extends AbstractStorybookMock<
 
   getServices(params: Params = {}): KibanaErrorBoundaryServices {
     const onClickRefresh = action('Reload window');
-    const analytics = { reportEvent: action('Report telemetry event') };
+    const mock = createAnalyticsMock();
+    const analytics = mock.analytics;
 
     return {
       ...params,
Lines changed: 29 additions & 8 deletions
@@ -14,6 +14,7 @@ import { analyticsServiceMock } from '@kbn/core-analytics-browser-mocks';
 import type { KibanaErrorBoundaryProviderDeps } from '../../types';
 import { KibanaErrorBoundary, KibanaErrorBoundaryProvider } from '../..';
 import { BadComponent } from '../../mocks';
+import { TRANSIENT_NAVIGATION_WINDOW_MS } from './error_service';
 import userEvent from '@testing-library/user-event';
 
 describe('<KibanaErrorBoundaryProvider>', () => {
@@ -26,34 +27,40 @@ describe('<KibanaErrorBoundaryProvider>', () => {
   it('creates a context of services for KibanaErrorBoundary', async () => {
     const reportEventSpy = jest.spyOn(analytics!, 'reportEvent');
 
-    const { findByTestId } = render(
+    const { findByTestId, unmount } = render(
       <KibanaErrorBoundaryProvider analytics={analytics}>
         <KibanaErrorBoundary>
           <BadComponent />
         </KibanaErrorBoundary>
       </KibanaErrorBoundaryProvider>
     );
     await userEvent.click(await findByTestId('clickForErrorBtn'));
+    unmount(); // Unmount to commit/report the error
+
+    // Wait for the error to be reported/committed
+    await new Promise((resolve) => setTimeout(resolve, 1.5 * TRANSIENT_NAVIGATION_WINDOW_MS));
 
     expect(reportEventSpy).toBeCalledWith('fatal-error-react', {
       component_name: 'BadComponent',
       component_stack: expect.any(String),
       error_message: 'Error: This is an error to show the test user!',
       error_stack: expect.any(String),
+      component_render_min_duration_ms: expect.any(Number),
+      has_transient_navigation: expect.any(Boolean),
     });
   });
 
   it('uses higher-level context if available', async () => {
-    const reportEventSpy1 = jest.spyOn(analytics!, 'reportEvent');
+    const reportEventParentSpy = jest.spyOn(analytics!, 'reportEvent');
 
-    const analytics2 = analyticsServiceMock.createAnalyticsServiceStart();
-    const reportEventSpy2 = jest.spyOn(analytics2, 'reportEvent');
+    const analyticsChild = analyticsServiceMock.createAnalyticsServiceStart();
+    const reportEventChildSpy = jest.spyOn(analyticsChild, 'reportEvent');
 
-    const { findByTestId } = render(
+    const { findByTestId, unmount } = render(
       <KibanaErrorBoundaryProvider analytics={analytics}>
         <KibanaErrorBoundary>
           Hello world
-          <KibanaErrorBoundaryProvider analytics={analytics2}>
+          <KibanaErrorBoundaryProvider analytics={analyticsChild}>
             <KibanaErrorBoundary>
               <BadComponent />
             </KibanaErrorBoundary>
@@ -63,12 +70,26 @@ describe('<KibanaErrorBoundaryProvider>', () => {
     );
     await userEvent.click(await findByTestId('clickForErrorBtn'));
 
-    expect(reportEventSpy2).not.toBeCalled();
-    expect(reportEventSpy1).toBeCalledWith('fatal-error-react', {
+    // Wait for nav to settle
+    await new Promise((resolve) => setTimeout(resolve, TRANSIENT_NAVIGATION_WINDOW_MS));
+
+    unmount(); // Unmount to commit/report the error
+
+    // Wait for the error to be reported/committed
+    await new Promise((resolve) => setTimeout(resolve, 500));
+
+    expect(reportEventParentSpy).not.toBeCalled();
+    expect(reportEventChildSpy).toBeCalledWith('fatal-error-react', {
       component_name: 'BadComponent',
       component_stack: expect.any(String),
       error_message: 'Error: This is an error to show the test user!',
       error_stack: expect.any(String),
+      has_transient_navigation: expect.any(Boolean),
+      component_render_min_duration_ms: expect.any(Number),
     });
+    expect(
+      (reportEventChildSpy.mock.calls[0][1] as Record<string, unknown>)
+        .component_render_min_duration_ms
+    ).toBeGreaterThanOrEqual(250);
   });
 });
