[Alerting] Allow alert tags to be modified in bulk #241883
Conversation
/ci
Pinging @elastic/response-ops (Team:ResponseOps)
Pinging @elastic/obs-ux-management-team (Team:obs-ux-management)
validate: {
  body: buildRouteValidation(
    t.intersection([
      t.type({
I don't understand why we need to provide the index. Shouldn't the index be derived from the alert? It feels like our internal system should be able to know and keep track of this.
Also, what if we are trying to bulk update alerts that are spread across multiple indices, for example adding a tag to both a synthetics and an APM rule for the same service? cc @cnasikas, there may be some context I'm missing here.
The code was developed before ResponseOps, but I assume the index is needed because it would be very inefficient to get the indices of the alerts from their IDs. You may be able to do a bulk get using an alias, but for the query param this is not feasible: you would need to do an aggregation (search) over all alerting indices, handle aggregation limits, etc. Behind the scenes, the bulk update of the alerts client uses update by query, which supports updating multiple documents through an alias. So you can set the index route param to `.alerts-observability-*` or even `alerts-observability-logs*,alerts-observability-apm*`.
I'm attempting this without success so far
PATCH kbn:/internal/rac/alerts/tags
{
"index": ".alerts-*",
"addTags": ["alert_tag"],
"alertIds": ["30b3b23c-27f5-4064-9947-1ebe376d4561"]
}
Result is
{
"statusCode": 404,
"error": "Not Found",
"message": "alerts with ids 30b3b23c-27f5-4064-9947-1ebe376d4561 and index .alerts-* not found"
}
Attempting to use the UUID
@cnasikas From my testing, it seems I'm not able to use an index pattern in the way you've suggested.
Users will have to provide the full index due to constraints with esClient.
So from the UI, it seems we will have to segment bulk requests so that there is one bulk request per index.
You are right, mget does not support index patterns. To avoid having the UI make multiple requests, instead of an mget we can do an aggregation to get the rule type IDs and consumers. Then we can construct the pairs and pass them to ensureAllAuthorized. The index would still be needed to target the o11y + stack alerts (and not modify the security alerts), but it can be optional and fall back to `.alerts-*`.
GET .alerts-*/_search
{
"query": {
"ids": {
"values": [
"36d6d650-1204-44f5-b105-3af0c5d5acde"
]
}
},
"aggs": {
"ruleTypeId": {
"terms": {
"field": "kibana.alert.rule.rule_type_id",
"size": 100
},
"aggs": {
"consumer": {
"terms": {
"field": "kibana.alert.rule.consumer",
"size": 100
}
}
}
}
},
"size": 0
}
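For illustration only, here is a rough TypeScript sketch (not from the PR) of how the buckets returned by this aggregation could be collapsed into the (ruleTypeId, consumers) pairs mentioned above. The field names follow the query; the helper name and response types are assumptions.

```ts
// Illustrative helper, not the PR's implementation. Assumes the nested terms
// aggregation response shape produced by the query above.
interface ConsumerBucket {
  key: string; // kibana.alert.rule.consumer
}

interface RuleTypeBucket {
  key: string; // kibana.alert.rule.rule_type_id
  consumer: { buckets: ConsumerBucket[] };
}

// Produces one entry per rule type id, listing every consumer seen for it,
// so a single bulk authorization call can cover all pairs.
function toAuthorizationPairs(buckets: RuleTypeBucket[]) {
  return buckets.map((bucket) => ({
    ruleTypeId: bucket.key,
    consumers: bucket.consumer.buckets.map(({ key }) => key),
  }));
}
```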
@elasticmachine merge upstream
  });
});

describe('failure scenarios', () => {
Could we please also mock the Not Found failure scenario?
  noKibanaPrivileges,
];

addTests({
Nice, I like how these are structured 🔥
  .set('kbn-xsrf', 'true')
  .send({
    alertIds: [alertId],
    addTags: ['new-tag'],
I noticed all of these only use the addTags param. Could you please add one scenario for tag removal? (No need to duplicate all tests; just one that bulk removes alert tags.)
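Purely as a sketch of what that single case might look like, assuming the same supertest helpers as the surrounding tests and a removeTags body parameter; the route path, field names, and response shape are assumptions, not the final API.

```ts
// Hypothetical test sketch: supertest helper, route path, removeTags param,
// and response shape are assumptions based on the surrounding tests.
it('bulk removes alert tags', async () => {
  const { body } = await supertest
    .patch('/internal/rac/alerts/tags')
    .set('kbn-xsrf', 'true')
    .send({
      alertIds: [alertId],
      removeTags: ['new-tag'],
    })
    .expect(200);

  // The removal should be reported as successful for the targeted alert.
  expect(body.results[0].status).to.eql('success');
});
```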
  ctx._source['${ALERT_WORKFLOW_TAGS}'] = new ArrayList();
}
for (item in params.addTags) {
  if (!ctx._source['${ALERT_WORKFLOW_TAGS}'].contains(item)) {
@cnasikas I was able to recreate this too. The difference between the success count and the length of the results array is confusing from a user-experience standpoint, and the "success" status is misleading.
dominiqueclarke
left a comment
When you use the query option, the results array is empty. Maybe it doesn't make sense to include an array item for every individual alert that matches the query, but perhaps return a single item containing the query with status: success?
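Something like the following, purely as an illustration of the suggested shape; the field names are hypothetical, not the PR's schema.

```ts
// Hypothetical response shape for query-based updates: a single entry echoing
// the query instead of one entry per matched alert.
interface BulkUpdateTagsResult {
  query?: string;
  id?: string;
  status: 'success' | 'failure';
}

const exampleQueryResponse: { results: BulkUpdateTagsResult[] } = {
  results: [{ query: 'kibana.alert.rule.consumer: slo', status: 'success' }],
};
```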
dominiqueclarke
left a comment
Blocking on the bug Umberto and I reported, where the success status can be incorrect.
@umbopepato Which user did you use and what is the consumer of each ES rule?
@umbopepato @dominiqueclarke Given all the misleading findings, I would suggest removing the
I believe that the response is correct here. You are not authorized to perform any action on that space, so a 403 indicates that.
I used the default elastic superuser.
I'm not sure I fully understand the rationale, but this particular issue is not as critical to me as the other. My user does technically have access to perform all actions in that space (they are a superuser); it's just that there are no alerts in that space. It's more of a 404.
Fine by me. That'll help us iterate over failures to populate the error in the UI.
Oh, I see what you mean. I thought you did NOT have access to that space. Let me investigate, and I will come back to you.
@dominiqueclarke I changed the response to return only the failures if they exist. I also return a 404 for the scenario with alerts from a different space. @umbopepato I could not reproduce your bug. Was it a refresh issue with the alerts table?
@elasticmachine merge upstream
I'm really sorry to report that I'm having difficulties with returning failures if there are any. Here I have two IDs, and I'm querying against the SLO alert indices. One ID is from an SLO alert and the other is from a synthetics alert. As you can see, it's reporting total: 1, updated: 1, without reporting any failures.
This is the expected response. We are using
dominiqueclarke
left a comment
We discussed and will move forward with merging this PR.
💚 Build Succeeded
cc @cnasikas



Closes #240356
Replaces this PR
Summary
This pull request introduces a new API route and supporting backend logic for bulk updating tags on alerts in the rule registry. The main changes include the addition of a `patchTags` method to the `AlertsClient`, a new route for bulk patching alert tags, and reusable scripts for tag updates. These improvements make it possible to add, remove, or replace tags on multiple alerts efficiently, either by IDs or by query.
Alerting authorization
The `AlertingAuthorization` class was changed to support bulk authorization of multiple rule type IDs and consumers. `ensureAuthorized` was renamed to `_ensureAuthorized` and now accepts `Array<{ ruleTypeId: string; consumers: string[] }>`. The logic is the same as before, but it constructs all security actions based on the input. This is needed to avoid having to do one authorization call per `(ruleTypeId, consumer)` pair. Lastly, a `bulkEnsureAuthorized` is exposed, which is a wrapper around the private `_ensureAuthorized`.
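As a minimal sketch of the kind of argument the bulk authorization entry point receives; the interface name and the rule type ID/consumer values below are illustrative assumptions, not taken from the PR.

```ts
// Sketch only: illustrative values, not the PR's exact API or data.
interface AuthorizationEntry {
  ruleTypeId: string;
  consumers: string[];
}

// One entry per rule type, covering all of its consumers, so a single
// authorization call can replace one call per (ruleTypeId, consumer) pair.
const entries: AuthorizationEntry[] = [
  { ruleTypeId: 'slo.rules.burnRate', consumers: ['slo'] },
  { ruleTypeId: 'xpack.uptime.alerts.monitorStatus', consumers: ['uptime'] },
];
```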
Alerts client
The code in the `AlertsClient` is pretty outdated. For this reason, I decided not to use the existing functionality and to write this from scratch. `bulkUpdateTags` is introduced, and it bulk updates the tags of multiple alerts either by using `alertIds` or by using a KQL `query`. In the first scenario, an aggregation is made to get the rule type ID and the consumer of each alert. Then we bulk authorize. If the user has access, we move forward and update the tags of the alerts. If not, we throw an error. For the `query` scenario, we apply the authorization filter along with the `query` to filter out the alerts that the user does not have access to. Lastly, we audit log only once for the whole bulk operation, not for each alert found.
API and Backend Enhancements
- `POST /internal/rac/alerts/tags` for bulk updating alert tags, supporting add, remove, and replace operations, with validation and error handling. [1] [2] (See the request sketch at the end of this summary.)
- `bulkUpdateTags` method in `AlertsClient`, enabling tag updates on alerts by IDs or query, using Elasticsearch scripts for efficient bulk operations.
Reusable Update Scripts
- Update scripts defined in `alert_client_bulk_update_scripts.ts` and integrated into the client logic. [1] [2]
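For reference, a hedged sketch of what request bodies to the new route could look like, based on the examples earlier in this conversation; the exact schema is defined by the route validation, and the index names below are typical alerts-as-data indices rather than fixed values.

```ts
// Illustrative request bodies only; field names follow the examples in this
// PR, and the index and tag values are assumptions for demonstration.
const updateByIds = {
  index: '.alerts-observability.slo.alerts-default',
  alertIds: ['30b3b23c-27f5-4064-9947-1ebe376d4561'],
  addTags: ['alert_tag'],
};

const updateByQuery = {
  index: '.alerts-observability.apm.alerts-default',
  query: 'kibana.alert.rule.consumer: apm',
  addTags: ['reviewed'],
  removeTags: ['triage'],
};
```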