Commit fd8c7f8

packages/openai: New package (#12494)

1 parent 561d4b7 commit fd8c7f8

57 files changed: +7649, -1 lines changed

.github/CODEOWNERS

Lines changed: 2 additions & 1 deletion

```diff
@@ -307,6 +307,7 @@
 /packages/nginx_ingress_controller_otel @elastic/obs-infraobs-integrations
 /packages/o365 @elastic/security-service-integrations
 /packages/okta @elastic/security-service-integrations
+/packages/openai @elastic/obs-infraobs-integrations
 /packages/opencanary @elastic/security-service-integrations
 /packages/oracle @elastic/obs-infraobs-integrations
 /packages/oracle_weblogic @elastic/obs-infraobs-integrations
@@ -480,4 +481,4 @@
 /packages/o365_metrics/data_stream/yammer_device_usage @elastic/obs-infraobs-integrations
 /packages/o365_metrics/data_stream/service_health @elastic/obs-infraobs-integrations
 /packages/o365_metrics/data_stream/viva_engage_device_usage_user_counts @elastic/obs-infraobs-integrations
-/packages/o365_metrics/data_stream/subscriptions @elastic/obs-infraobs-integrations
+/packages/o365_metrics/data_stream/subscriptions @elastic/obs-infraobs-integrations
```
Lines changed: 3 additions & 0 deletions

```yaml
dependencies:
  ecs:
    reference: git@v8.16.0
```
Lines changed: 112 additions & 0 deletions

# OpenAI

The OpenAI integration allows you to monitor OpenAI API usage metrics. OpenAI is an AI research and deployment company that offers an [API platform](https://openai.com/api) for its industry-leading foundation models.

With the OpenAI integration, you can track API usage metrics across models, as well as vector store and code interpreter usage. You can use Kibana to visualize your data, create alerts when usage limits are approaching, and view metrics when troubleshooting issues. For example, you can track token usage and API calls per model.

## Data collection

The OpenAI integration leverages the [OpenAI Usage API](https://platform.openai.com/docs/api-reference/usage) to collect detailed usage metrics. The Usage API delivers comprehensive insights into your API activity, helping you understand and optimize your organization's OpenAI API usage.
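As a concrete illustration, the kind of request the integration issues against the Usage API can be sketched in Python. The endpoint path and parameters follow the Usage API documentation; the `build_usage_request` helper and the placeholder key are illustrative, not part of the package:

```python
# Sketch: building a Usage API request for completions metrics.
# The request is only constructed here, not sent.
from urllib.parse import urlencode

USAGE_API = "https://api.openai.com/v1/organization/usage"

def build_usage_request(endpoint, admin_key, start_time, bucket_width="1d"):
    """Return the URL and headers for a Usage API call."""
    params = {
        "start_time": start_time,      # Unix seconds
        "bucket_width": bucket_width,  # aggregation bucket, e.g. "1d"
        "group_by": "project_id,user_id,api_key_id,model",
    }
    url = f"{USAGE_API}/{endpoint}?{urlencode(params)}"
    headers = {"Authorization": f"Bearer {admin_key}"}
    return url, headers

url, headers = build_usage_request("completions", "sk-admin-placeholder", 1735689600)
print(url)
```

The same pattern applies to the other usage endpoints (`audio_speeches`, `embeddings`, and so on), with only the endpoint segment changing.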
## Data streams

The OpenAI integration collects the following data streams:

- `audio_speeches`: Collects audio speeches usage metrics.
- `audio_transcriptions`: Collects audio transcriptions usage metrics.
- `code_interpreter_sessions`: Collects code interpreter sessions usage metrics.
- `completions`: Collects completions usage metrics.
- `embeddings`: Collects embeddings usage metrics.
- `images`: Collects images usage metrics.
- `moderations`: Collects moderations usage metrics.
- `vector_stores`: Collects vector stores usage metrics.

See the [Logs reference](#logs-reference) for more details on each data stream.

## Requirements

You need Elasticsearch for storing and searching your data and Kibana for visualizing and managing it.

You also need an OpenAI account with a valid [Admin key](https://platform.openai.com/settings/organization/admin-keys) for programmatic access to the [OpenAI Usage API](https://platform.openai.com/docs/api-reference/usage).

## Setup

For step-by-step instructions on how to set up an integration, see the [Getting started](https://www.elastic.co/guide/en/starting-with-the-elasticsearch-platform-and-its-solutions/current/getting-started-observability.html) guide.

### Generate an Admin key

Create a new key or use an existing one from the [Admin keys](https://platform.openai.com/settings/organization/admin-keys) page, then use it to configure the OpenAI integration.

## Collection behavior

By default, the OpenAI integration fetches metrics with a bucket width of one day (`1d`), so metrics are aggregated per day. Metrics are collected from the initial start time up to the current time, excluding the current bucket because it is still incomplete. In other words, for a given bucket width, the integration collects metrics from the initial start time until the current time minus the bucket width.
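The window arithmetic above can be made concrete with a short worked example (the `collection_window` helper is illustrative, not part of the package):

```python
# Worked example: the collection window excludes the current, incomplete
# bucket, so with a 1d bucket width data is collected up to now - 1 day.
from datetime import datetime, timedelta, timezone

BUCKET_WIDTHS = {"1m": timedelta(minutes=1),
                 "1h": timedelta(hours=1),
                 "1d": timedelta(days=1)}

def collection_window(initial_start, now, bucket_width="1d"):
    """Return the (start, end) range of complete buckets to collect."""
    end = now - BUCKET_WIDTHS[bucket_width]
    return initial_start, end

start = datetime(2025, 1, 1, tzinfo=timezone.utc)
now = datetime(2025, 1, 10, 12, 0, tzinfo=timezone.utc)
print(collection_window(start, now))  # end is 2025-01-09 12:00 UTC
```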
## Logs reference

**ECS Field Reference**

Refer to this [document](https://www.elastic.co/guide/en/ecs/current/ecs-field-reference.html) for detailed information on ECS fields.

### Audio speeches

The `audio_speeches` data stream captures audio speeches usage metrics.

{{event "audio_speeches"}}

{{fields "audio_speeches"}}

### Audio transcriptions

The `audio_transcriptions` data stream captures audio transcriptions usage metrics.

{{event "audio_transcriptions"}}

{{fields "audio_transcriptions"}}

### Code interpreter sessions

The `code_interpreter_sessions` data stream captures code interpreter sessions usage metrics.

{{event "code_interpreter_sessions"}}

{{fields "code_interpreter_sessions"}}

### Completions

The `completions` data stream captures completions usage metrics.

{{event "completions"}}

{{fields "completions"}}

### Embeddings

The `embeddings` data stream captures embeddings usage metrics.

{{event "embeddings"}}

{{fields "embeddings"}}

### Images

The `images` data stream captures images usage metrics.

{{event "images"}}

{{fields "images"}}

### Moderations

The `moderations` data stream captures moderations usage metrics.

{{event "moderations"}}

{{fields "moderations"}}

### Vector stores

The `vector_stores` data stream captures vector stores usage metrics.

{{event "vector_stores"}}

{{fields "vector_stores"}}

packages/openai/changelog.yml

Lines changed: 6 additions & 0 deletions

```yaml
# newer versions go on top
- version: "0.1.0"
  changes:
    - description: Initial draft of the OpenAI integration.
      type: enhancement
      link: https://github.com/elastic/integrations/pull/12494
```
Lines changed: 94 additions & 0 deletions

```yaml
config_version: 2
interval: {{interval}}
{{#if enable_request_tracer}}
resource.tracer.filename: "../../logs/cel/openai-audio-speeches-http-request-trace-*.ndjson"
resource.tracer.maxbackups: 5
{{/if}}
resource.url: {{api_url}}
fields_under_root: true
keep_null: true
state:
  initial_interval: {{initial_interval}}
  access_token: {{admin_token}}
  page: null
redact:
  fields:
    - access_token
program: |
  (
    !has(state.initial_start_time) ?
      state.with({
        "initial_start_time": int(timestamp(now - duration(state.initial_interval)))
      })
    :
      state
  ).as(state,
    (
      state.?want_more.orValue(false) ?
        state
      :
        state.with({
          "page": null,
          "initial_start_time": state.?cursor.last_bucket_start.orValue(int(timestamp(now - duration(state.initial_interval))))
        })
    ).as(state,
      request(
        "GET",
        state.url + "?" + {
          "start_time": [string(int(state.initial_start_time))],
          "page": state.page != null ? [state.page] : [],
          "bucket_width": ["{{ bucket_width }}"],
          "group_by": ["project_id,user_id,api_key_id,model"]
        }.format_query()
      ).with({
        "Header": {
          "Authorization": ["Bearer " + state.access_token],
          "Content-Type": ["application/json"]
        }
      }).do_request().as(resp, resp.StatusCode == 200 ?
        bytes(resp.Body).decode_json().as(body,
          {
            "events": body.data.filter(bucket,
                !(body.has_more == false && bucket.start_time >= body.data.map(bucket, bucket.start_time).max()))
              .map(bucket, size(bucket.results) > 0 ? bucket.results.map(result, { "message": result.with({"start_time": bucket.start_time, "end_time": bucket.end_time }).encode_json() }) : [{}]).flatten(),
            "cursor": {
              "last_bucket_start": size(body.data) > 0 ? body.data.map(bucket, bucket.start_time).max() : state.?cursor.last_bucket_start,
              "last_bucket_end": size(body.data) > 0 ? body.data.map(bucket, bucket.end_time).max() : state.?cursor.last_bucket_end
            },
            "want_more": body.has_more,
            "page": body.has_more ? body.next_page : null,
            "access_token": state.access_token,
            "initial_start_time": state.initial_start_time,
            "url": state.url
          }
        )
      :
        {
          "events": {
            "error": {
              "code": string(resp.StatusCode),
              "message": "GET: " + (size(resp.Body) != 0 ? string(resp.Body) : string(resp.Status) + " (" + string(resp.StatusCode) + ")")
            }
          },
          "want_more": false,
          "access_token": state.access_token,
          "initial_start_time": state.initial_start_time
        }
      )
    )
  )
tags:
{{#if preserve_original_event}}
  - preserve_original_event
{{/if}}
{{#each tags as |tag|}}
  - {{tag}}
{{/each}}
{{#contains "forwarded" tags}}
publisher_pipeline.disable_host: true
{{/contains}}
{{#if processors}}
processors:
{{processors}}
{{/if}}
```
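The control flow of the CEL program above can be approximated in Python: follow `next_page` cursors until `has_more` is false, drop the newest bucket on the final page because it is still incomplete, and remember the latest bucket start as the cursor. The `fetch_page` callable is an illustrative stand-in for the HTTP request the CEL input performs:

```python
# Rough Python equivalent of the CEL program's pagination and cursor logic.

def collect_all(fetch_page, start_time):
    """Follow next_page cursors until has_more is False, tracking a cursor."""
    events, page, cursor = [], None, {}
    while True:
        body = fetch_page(start_time=start_time, page=page)
        latest = max((b["start_time"] for b in body["data"]), default=None)
        for bucket in body["data"]:
            # On the final page, skip the newest bucket: it is still incomplete.
            if not body["has_more"] and bucket["start_time"] >= latest:
                continue
            for result in bucket["results"]:
                events.append({**result,
                               "start_time": bucket["start_time"],
                               "end_time": bucket["end_time"]})
        if latest is not None:
            cursor["last_bucket_start"] = latest
        if not body["has_more"]:
            return events, cursor
        page = body["next_page"]

# Two fake pages; the last bucket of the final page is dropped as incomplete.
pages = {
    None: {"data": [{"start_time": 1, "end_time": 2, "results": [{"model": "a"}]}],
           "has_more": True, "next_page": "p2"},
    "p2": {"data": [{"start_time": 2, "end_time": 3, "results": [{"model": "b"}]},
                    {"start_time": 3, "end_time": 4, "results": [{"model": "c"}]}],
           "has_more": False, "next_page": None},
}
events, cursor = collect_all(lambda start_time, page: pages[page], 0)
print([e["model"] for e in events], cursor)  # → ['a', 'b'] {'last_bucket_start': 3}
```

Note the asymmetry: on intermediate pages every bucket is emitted, while the final page holds back the newest bucket so it can be re-fetched, complete, on the next run.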
Lines changed: 92 additions & 0 deletions

```yaml
---
description: Pipeline for audio_speeches usage
processors:
  - set:
      field: ecs.version
      value: "8.16.0"
  - set:
      field: event.created
      copy_from: "@timestamp"
  - set:
      field: event.kind
      value: metric
  - rename:
      field: message
      target_field: openai.audio_speeches.results
      ignore_missing: true
      if: ctx.event?.original == null
  - remove:
      field: message
      if: ctx.event?.original != null
      ignore_missing: true
  - json:
      field: openai.audio_speeches.results
      target_field: openai.audio_speeches
  - remove:
      field: openai.audio_speeches.results
      ignore_missing: true
  # Add base OpenAI fields
  - rename:
      field: openai.audio_speeches.model
      target_field: openai.base.model
      ignore_missing: true
  - rename:
      field: openai.audio_speeches.project_id
      target_field: openai.base.project_id
      ignore_missing: true
  - rename:
      field: openai.audio_speeches.user_id
      target_field: openai.base.user_id
      ignore_missing: true
  - rename:
      field: openai.audio_speeches.api_key_id
      target_field: openai.base.api_key_id
      ignore_missing: true
  - rename:
      field: openai.audio_speeches.num_model_requests
      target_field: openai.base.num_model_requests
      ignore_missing: true
  - rename:
      field: openai.audio_speeches.object
      target_field: openai.base.usage_object_type
      ignore_missing: true
  - date:
      field: openai.audio_speeches.start_time
      target_field: openai.base.start_time
      formats:
        - UNIX
  - date:
      field: openai.audio_speeches.end_time
      target_field: openai.base.end_time
      formats:
        - UNIX
  - set:
      field: '@timestamp'
      copy_from: openai.base.start_time
  - remove:
      field:
        - openai.audio_speeches.start_time
        - openai.audio_speeches.end_time
      ignore_failure: true

###################
# Failure handler #
###################
on_failure:
  - append:
      tag: append_error_message
      field: error.message
      value: |
        Processor "{{{ _ingest.on_failure_processor_type }}}"
        with tag "{{{ _ingest.on_failure_processor_tag }}}"
        in pipeline "{{{ _ingest.on_failure_pipeline }}}"
        failed with message "{{{ _ingest.on_failure_message }}}"
  - set:
      tag: set_event_kind
      field: event.kind
      value: pipeline_error
  - append:
      field: tags
      value: preserve_original_event
      allow_duplicates: false
```
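The net effect of the pipeline on one event can be sketched as a Python function. The field names follow the processors above; `simulate_pipeline` itself and the sample message are illustrative, mirroring the json, rename, and date steps:

```python
# Sketch: what the audio_speeches ingest pipeline does to one event.
# The bucket-result JSON in `message` is parsed, shared fields move under
# openai.base.*, and stream-specific fields stay under openai.audio_speeches.*.
import json
from datetime import datetime, timezone

BASE_FIELDS = {"model", "project_id", "user_id", "api_key_id",
               "num_model_requests", "object"}

def simulate_pipeline(message: str) -> dict:
    result = json.loads(message)
    base, specific = {}, {}
    for key, value in result.items():
        if key == "object":
            base["usage_object_type"] = value       # rename: object
        elif key in BASE_FIELDS:
            base[key] = value                       # rename to openai.base.*
        elif key in ("start_time", "end_time"):
            # UNIX seconds -> date, as the pipeline's date processors do
            base[key] = datetime.fromtimestamp(value, tz=timezone.utc)
        else:
            specific[key] = value                   # stream-specific field
    doc = {"openai": {"base": base, "audio_speeches": specific}}
    doc["@timestamp"] = base.get("start_time")      # set @timestamp from bucket
    return doc

msg = json.dumps({"object": "organization.usage.audio_speeches.result",
                  "model": "tts-1", "characters": 4200,
                  "num_model_requests": 3, "start_time": 1735689600,
                  "end_time": 1735776000})
doc = simulate_pipeline(msg)
print(doc["openai"]["base"]["model"], doc["openai"]["audio_speeches"]["characters"])
```

Splitting shared fields into `openai.base.*` lets every data stream's dashboards and queries address model, project, user, and bucket times uniformly.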
Lines changed: 12 additions & 0 deletions

```yaml
- name: data_stream.type
  type: constant_keyword
  description: Data stream type.
- name: data_stream.dataset
  type: constant_keyword
  description: Data stream dataset.
- name: data_stream.namespace
  type: constant_keyword
  description: Data stream namespace.
- name: '@timestamp'
  type: date
  description: Event timestamp.
```
Lines changed: 41 additions & 0 deletions

```yaml
- name: openai
  type: group
  fields:
    # Base fields shared across OpenAI data streams
    - name: base
      type: group
      description: |
        Common fields across OpenAI data streams — completions, embeddings, moderations, images, audio_speeches and audio_transcriptions.
      fields:
        - name: model
          type: keyword
          description: Name of the OpenAI model used
        - name: num_model_requests
          type: long
          description: Number of requests made to the model
        - name: project_id
          type: keyword
          description: Identifier of the project
        - name: user_id
          type: keyword
          description: Identifier of the user
        - name: api_key_id
          type: keyword
          description: Identifier for the API key used
        - name: start_time
          type: date
          description: Start timestamp of the usage bucket
        - name: end_time
          type: date
          description: End timestamp of the usage bucket
        - name: usage_object_type
          type: keyword
          description: Type of the usage record
    # Audio-speeches-specific fields
    - name: audio_speeches
      type: group
      description: OpenAI audio speeches usage metrics and metadata
      fields:
        - name: characters
          type: long
          description: Number of characters processed
```
