Commit fd8c7f8

packages/openai: New package (#12494)

1 parent 561d4b7 commit fd8c7f8

57 files changed: +7649, -1 lines changed

.github/CODEOWNERS

Lines changed: 2 additions & 1 deletion

```diff
@@ -307,6 +307,7 @@
 /packages/nginx_ingress_controller_otel @elastic/obs-infraobs-integrations
 /packages/o365 @elastic/security-service-integrations
 /packages/okta @elastic/security-service-integrations
+/packages/openai @elastic/obs-infraobs-integrations
 /packages/opencanary @elastic/security-service-integrations
 /packages/oracle @elastic/obs-infraobs-integrations
 /packages/oracle_weblogic @elastic/obs-infraobs-integrations
@@ -480,4 +481,4 @@
 /packages/o365_metrics/data_stream/yammer_device_usage @elastic/obs-infraobs-integrations
 /packages/o365_metrics/data_stream/service_health @elastic/obs-infraobs-integrations
 /packages/o365_metrics/data_stream/viva_engage_device_usage_user_counts @elastic/obs-infraobs-integrations
-/packages/o365_metrics/data_stream/subscriptions @elastic/obs-infraobs-integrations
+/packages/o365_metrics/data_stream/subscriptions @elastic/obs-infraobs-integrations
```
Lines changed: 3 additions & 0 deletions

```yaml
dependencies:
  ecs:
    reference: git@v8.16.0
```
Lines changed: 112 additions & 0 deletions

# OpenAI

The OpenAI integration allows you to monitor OpenAI API usage metrics. OpenAI is an AI research and deployment company that offers an [API platform](https://openai.com/api) for its industry-leading foundation models.

With the OpenAI integration, you can track API usage metrics across models, as well as vector store and code interpreter usage. You can use Kibana to visualize your data, create alerts when usage limits are approaching, and view metrics when troubleshooting issues. For example, you can track token usage and API calls per model.

## Data collection

The OpenAI integration leverages the [OpenAI Usage API](https://platform.openai.com/docs/api-reference/usage) to collect detailed usage metrics. The Usage API delivers comprehensive insights into your API activity, helping you understand and optimize your organization's OpenAI API usage.
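As a concrete illustration, the kind of request the integration issues against the Usage API can be sketched in Python. The endpoint path and parameters follow the Usage API documentation; the `build_usage_request` helper and the placeholder key are illustrative, not part of the package:

```python
# Sketch: building a Usage API request for completions metrics.
# The request is only constructed here, not sent.
from urllib.parse import urlencode

USAGE_API = "https://api.openai.com/v1/organization/usage"

def build_usage_request(endpoint, admin_key, start_time, bucket_width="1d"):
    """Return the URL and headers for a Usage API call."""
    params = {
        "start_time": start_time,      # Unix seconds
        "bucket_width": bucket_width,  # aggregation bucket, e.g. "1d"
        "group_by": "project_id,user_id,api_key_id,model",
    }
    url = f"{USAGE_API}/{endpoint}?{urlencode(params)}"
    headers = {"Authorization": f"Bearer {admin_key}"}
    return url, headers

url, headers = build_usage_request("completions", "sk-admin-placeholder", 1735689600)
print(url)
```

The same pattern applies to the other usage endpoints (`audio_speeches`, `embeddings`, and so on), with only the endpoint segment changing.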
## Data streams

The OpenAI integration collects the following data streams:

- `audio_speeches`: Collects audio speeches usage metrics.
- `audio_transcriptions`: Collects audio transcriptions usage metrics.
- `code_interpreter_sessions`: Collects code interpreter sessions usage metrics.
- `completions`: Collects completions usage metrics.
- `embeddings`: Collects embeddings usage metrics.
- `images`: Collects images usage metrics.
- `moderations`: Collects moderations usage metrics.
- `vector_stores`: Collects vector stores usage metrics.

See the [Logs reference](#logs-reference) for more details on each data stream.

## Requirements

You need Elasticsearch for storing and searching your data and Kibana for visualizing and managing it.

You also need an OpenAI account with a valid [Admin key](https://platform.openai.com/settings/organization/admin-keys) for programmatic access to the [OpenAI Usage API](https://platform.openai.com/docs/api-reference/usage).

## Setup

For step-by-step instructions on how to set up an integration, see the [Getting started](https://www.elastic.co/guide/en/starting-with-the-elasticsearch-platform-and-its-solutions/current/getting-started-observability.html) guide.

### Generate an Admin key

Create a new key or use an existing one from the [Admin keys](https://platform.openai.com/settings/organization/admin-keys) page, then use it to configure the OpenAI integration.

## Collection behavior

By default, the OpenAI integration fetches metrics with a bucket width of one day (`1d`), so metrics are aggregated per day. Metrics are collected from the initial start time up to the current time, excluding the current bucket because it is still incomplete. In other words, for a given bucket width, the integration collects metrics from the initial start time until the current time minus the bucket width.
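The window arithmetic above can be made concrete with a short worked example (the `collection_window` helper is illustrative, not part of the package):

```python
# Worked example: the collection window excludes the current, incomplete
# bucket, so with a 1d bucket width data is collected up to now - 1 day.
from datetime import datetime, timedelta, timezone

BUCKET_WIDTHS = {"1m": timedelta(minutes=1),
                 "1h": timedelta(hours=1),
                 "1d": timedelta(days=1)}

def collection_window(initial_start, now, bucket_width="1d"):
    """Return the (start, end) range of complete buckets to collect."""
    end = now - BUCKET_WIDTHS[bucket_width]
    return initial_start, end

start = datetime(2025, 1, 1, tzinfo=timezone.utc)
now = datetime(2025, 1, 10, 12, 0, tzinfo=timezone.utc)
print(collection_window(start, now))  # end is 2025-01-09 12:00 UTC
```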
## Logs reference

**ECS Field Reference**

Refer to this [document](https://www.elastic.co/guide/en/ecs/current/ecs-field-reference.html) for detailed information on ECS fields.

### Audio speeches

The `audio_speeches` data stream captures audio speeches usage metrics.

{{event "audio_speeches"}}

{{fields "audio_speeches"}}

### Audio transcriptions

The `audio_transcriptions` data stream captures audio transcriptions usage metrics.

{{event "audio_transcriptions"}}

{{fields "audio_transcriptions"}}

### Code interpreter sessions

The `code_interpreter_sessions` data stream captures code interpreter sessions usage metrics.

{{event "code_interpreter_sessions"}}

{{fields "code_interpreter_sessions"}}

### Completions

The `completions` data stream captures completions usage metrics.

{{event "completions"}}

{{fields "completions"}}

### Embeddings

The `embeddings` data stream captures embeddings usage metrics.

{{event "embeddings"}}

{{fields "embeddings"}}

### Images

The `images` data stream captures images usage metrics.

{{event "images"}}

{{fields "images"}}

### Moderations

The `moderations` data stream captures moderations usage metrics.

{{event "moderations"}}

{{fields "moderations"}}

### Vector stores

The `vector_stores` data stream captures vector stores usage metrics.

{{event "vector_stores"}}

{{fields "vector_stores"}}

packages/openai/changelog.yml

Lines changed: 6 additions & 0 deletions

```yaml
# newer versions go on top
- version: "0.1.0"
  changes:
    - description: Initial draft of the OpenAI integration.
      type: enhancement
      link: https://github.com/elastic/integrations/pull/12494
```
Lines changed: 94 additions & 0 deletions

```yaml
config_version: 2
interval: {{interval}}
{{#if enable_request_tracer}}
resource.tracer.filename: "../../logs/cel/openai-audio-speeches-http-request-trace-*.ndjson"
resource.tracer.maxbackups: 5
{{/if}}
resource.url: {{api_url}}
fields_under_root: true
keep_null: true
state:
  initial_interval: {{initial_interval}}
  access_token: {{admin_token}}
  page: null
redact:
  fields:
    - access_token
program: |
  (
    !has(state.initial_start_time) ?
      state.with({
        "initial_start_time": int(timestamp(now - duration(state.initial_interval)))
      })
    :
      state
  ).as(state,
    (
      state.?want_more.orValue(false) ?
        state
      :
        state.with({
          "page": null,
          "initial_start_time": state.?cursor.last_bucket_start.orValue(int(timestamp(now - duration(state.initial_interval))))
        })
    ).as(state,
      request(
        "GET",
        state.url + "?" + {
          "start_time": [string(int(state.initial_start_time))],
          "page": state.page != null ? [state.page] : [],
          "bucket_width": ["{{ bucket_width }}"],
          "group_by": ["project_id,user_id,api_key_id,model"]
        }.format_query()
      ).with({
        "Header": {
          "Authorization": ["Bearer " + state.access_token],
          "Content-Type": ["application/json"]
        }
      }).do_request().as(resp, resp.StatusCode == 200 ?
        bytes(resp.Body).decode_json().as(body,
          {
            "events": body.data.filter(bucket,
                !(body.has_more == false && bucket.start_time >= body.data.map(bucket, bucket.start_time).max()))
              .map(bucket, size(bucket.results) > 0 ? bucket.results.map(result, { "message": result.with({"start_time": bucket.start_time, "end_time": bucket.end_time }).encode_json() }) : [{}]).flatten(),
            "cursor": {
              "last_bucket_start": size(body.data) > 0 ? body.data.map(bucket, bucket.start_time).max() : state.?cursor.last_bucket_start,
              "last_bucket_end": size(body.data) > 0 ? body.data.map(bucket, bucket.end_time).max() : state.?cursor.last_bucket_end
            },
            "want_more": body.has_more,
            "page": body.has_more ? body.next_page : null,
            "access_token": state.access_token,
            "initial_start_time": state.initial_start_time,
            "url": state.url
          }
        )
      :
        {
          "events": {
            "error": {
              "code": string(resp.StatusCode),
              "message": "GET: " + (size(resp.Body) != 0 ? string(resp.Body) : string(resp.Status) + " (" + string(resp.StatusCode) + ")")
            }
          },
          "want_more": false,
          "access_token": state.access_token,
          "initial_start_time": state.initial_start_time
        }
      )
    )
  )
tags:
{{#if preserve_original_event}}
  - preserve_original_event
{{/if}}
{{#each tags as |tag|}}
  - {{tag}}
{{/each}}
{{#contains "forwarded" tags}}
publisher_pipeline.disable_host: true
{{/contains}}
{{#if processors}}
processors:
{{processors}}
{{/if}}
```
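The control flow of the CEL program above can be approximated in Python: follow `next_page` cursors until `has_more` is false, drop the newest bucket on the final page because it is still incomplete, and remember the latest bucket start as the cursor. The `fetch_page` callable is an illustrative stand-in for the HTTP request the CEL input performs:

```python
# Rough Python equivalent of the CEL program's pagination and cursor logic.

def collect_all(fetch_page, start_time):
    """Follow next_page cursors until has_more is False, tracking a cursor."""
    events, page, cursor = [], None, {}
    while True:
        body = fetch_page(start_time=start_time, page=page)
        latest = max((b["start_time"] for b in body["data"]), default=None)
        for bucket in body["data"]:
            # On the final page, skip the newest bucket: it is still incomplete.
            if not body["has_more"] and bucket["start_time"] >= latest:
                continue
            for result in bucket["results"]:
                events.append({**result,
                               "start_time": bucket["start_time"],
                               "end_time": bucket["end_time"]})
        if latest is not None:
            cursor["last_bucket_start"] = latest
        if not body["has_more"]:
            return events, cursor
        page = body["next_page"]

# Two fake pages; the last bucket of the final page is dropped as incomplete.
pages = {
    None: {"data": [{"start_time": 1, "end_time": 2, "results": [{"model": "a"}]}],
           "has_more": True, "next_page": "p2"},
    "p2": {"data": [{"start_time": 2, "end_time": 3, "results": [{"model": "b"}]},
                    {"start_time": 3, "end_time": 4, "results": [{"model": "c"}]}],
           "has_more": False, "next_page": None},
}
events, cursor = collect_all(lambda start_time, page: pages[page], 0)
print([e["model"] for e in events], cursor)  # → ['a', 'b'] {'last_bucket_start': 3}
```

Note the asymmetry: on intermediate pages every bucket is emitted, while the final page holds back the newest bucket so it can be re-fetched, complete, on the next run.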
Lines changed: 92 additions & 0 deletions

```yaml
---
description: Pipeline for audio_speeches usage
processors:
  - set:
      field: ecs.version
      value: "8.16.0"
  - set:
      field: event.created
      copy_from: "@timestamp"
  - set:
      field: event.kind
      value: metric
  - rename:
      field: message
      target_field: openai.audio_speeches.results
      ignore_missing: true
      if: ctx.event?.original == null
  - remove:
      field: message
      if: ctx.event?.original != null
      ignore_missing: true
  - json:
      field: openai.audio_speeches.results
      target_field: openai.audio_speeches
  - remove:
      field: openai.audio_speeches.results
      ignore_missing: true
  # Add base OpenAI fields
  - rename:
      field: openai.audio_speeches.model
      target_field: openai.base.model
      ignore_missing: true
  - rename:
      field: openai.audio_speeches.project_id
      target_field: openai.base.project_id
      ignore_missing: true
  - rename:
      field: openai.audio_speeches.user_id
      target_field: openai.base.user_id
      ignore_missing: true
  - rename:
      field: openai.audio_speeches.api_key_id
      target_field: openai.base.api_key_id
      ignore_missing: true
  - rename:
      field: openai.audio_speeches.num_model_requests
      target_field: openai.base.num_model_requests
      ignore_missing: true
  - rename:
      field: openai.audio_speeches.object
      target_field: openai.base.usage_object_type
      ignore_missing: true
  - date:
      field: openai.audio_speeches.start_time
      target_field: openai.base.start_time
      formats:
        - UNIX
  - date:
      field: openai.audio_speeches.end_time
      target_field: openai.base.end_time
      formats:
        - UNIX
  - set:
      field: '@timestamp'
      copy_from: openai.base.start_time
  - remove:
      field:
        - openai.audio_speeches.start_time
        - openai.audio_speeches.end_time
      ignore_failure: true

###################
# Failure handler #
###################
on_failure:
  - append:
      tag: append_error_message
      field: error.message
      value: |
        Processor "{{{ _ingest.on_failure_processor_type }}}"
        with tag "{{{ _ingest.on_failure_processor_tag }}}"
        in pipeline "{{{ _ingest.on_failure_pipeline }}}"
        failed with message "{{{ _ingest.on_failure_message }}}"
  - set:
      tag: set_event_kind
      field: event.kind
      value: pipeline_error
  - append:
      field: tags
      value: preserve_original_event
      allow_duplicates: false
```
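The net effect of the pipeline on one event can be sketched as a Python function. The field names follow the processors above; `simulate_pipeline` itself and the sample message are illustrative, mirroring the json, rename, and date steps:

```python
# Sketch: what the audio_speeches ingest pipeline does to one event.
# The bucket-result JSON in `message` is parsed, shared fields move under
# openai.base.*, and stream-specific fields stay under openai.audio_speeches.*.
import json
from datetime import datetime, timezone

BASE_FIELDS = {"model", "project_id", "user_id", "api_key_id",
               "num_model_requests", "object"}

def simulate_pipeline(message: str) -> dict:
    result = json.loads(message)
    base, specific = {}, {}
    for key, value in result.items():
        if key == "object":
            base["usage_object_type"] = value       # rename: object
        elif key in BASE_FIELDS:
            base[key] = value                       # rename to openai.base.*
        elif key in ("start_time", "end_time"):
            # UNIX seconds -> date, as the pipeline's date processors do
            base[key] = datetime.fromtimestamp(value, tz=timezone.utc)
        else:
            specific[key] = value                   # stream-specific field
    doc = {"openai": {"base": base, "audio_speeches": specific}}
    doc["@timestamp"] = base.get("start_time")      # set @timestamp from bucket
    return doc

msg = json.dumps({"object": "organization.usage.audio_speeches.result",
                  "model": "tts-1", "characters": 4200,
                  "num_model_requests": 3, "start_time": 1735689600,
                  "end_time": 1735776000})
doc = simulate_pipeline(msg)
print(doc["openai"]["base"]["model"], doc["openai"]["audio_speeches"]["characters"])
```

Splitting shared fields into `openai.base.*` lets every data stream's dashboards and queries address model, project, user, and bucket times uniformly.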
Lines changed: 12 additions & 0 deletions

```yaml
- name: data_stream.type
  type: constant_keyword
  description: Data stream type.
- name: data_stream.dataset
  type: constant_keyword
  description: Data stream dataset.
- name: data_stream.namespace
  type: constant_keyword
  description: Data stream namespace.
- name: '@timestamp'
  type: date
  description: Event timestamp.
```
Lines changed: 41 additions & 0 deletions

```yaml
- name: openai
  type: group
  fields:
    # Base fields shared across OpenAI data streams
    - name: base
      type: group
      description: |
        Common fields across OpenAI data streams — completions, embeddings, moderations, images, audio_speeches and audio_transcriptions.
      fields:
        - name: model
          type: keyword
          description: Name of the OpenAI model used
        - name: num_model_requests
          type: long
          description: Number of requests made to the model
        - name: project_id
          type: keyword
          description: Identifier of the project
        - name: user_id
          type: keyword
          description: Identifier of the user
        - name: api_key_id
          type: keyword
          description: Identifier for the API key used
        - name: start_time
          type: date
          description: Start timestamp of the usage bucket
        - name: end_time
          type: date
          description: End timestamp of the usage bucket
        - name: usage_object_type
          type: keyword
          description: Type of the usage record
    # Audio-speeches-specific fields
    - name: audio_speeches
      type: group
      description: OpenAI audio speeches usage metrics and metadata
      fields:
        - name: characters
          type: long
          description: Number of characters processed
```
