Skip to content

Conversation

@krajorama
Copy link
Member

@krajorama krajorama commented Apr 24, 2025

Background

Based on https://github.com/prometheus/proposals/blob/main/proposals/2024-01-29_native_histograms_text_format.md
And OpenMetrics 2.0 WG discussions.

Changes

  • Allow structured complex types marked by "{" and "}" in the specification.
  • Allow multiple exemplars per complex type value.
  • Require that exemplars for complex type values have the timestamp. (1)
  • Be permissive about observing NaN , +Inf, -Inf. Discourage observing NaN.
  • Split histogram into ones with classic and native buckets.
  • For classic buckets, define behavior when observing NaN.
  • Define the native buckets and also how NaN is handled.
  • Define the text format of native histograms and also their exemplars.
  • Be permissive about the existence of _sum and _count series for classic histograms - for backwards compatibility.

(1) Not doing this on account of being an implementation limitation and also contradicts having backwards compatibility when exposing classic histograms as complex type.

Out of scope

Open questions / decisions

See OpenMetrics2.0 WG meeting notes tab

Signed-off-by: György Krajcsovits <[email protected]>
Signed-off-by: György Krajcsovits <[email protected]>
Signed-off-by: György Krajcsovits <[email protected]>

Numbers MUST be either floating points or integers. Note that ingestors of the format MAY only support float64. The non-real values NaN, +Inf and -Inf MUST be supported. NaN MUST NOT be considered a missing value, but it MAY be used to signal a division by zero.

Complex data types MUST contain all information necessary to recreate a Metric Type, with the exception of Created time and Exemplars.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assume we'll have the created timestamp separate from the JSON like data. prometheus/OpenMetrics#285

Signed-off-by: György Krajcsovits <[email protected]>
@krajorama
Copy link
Member Author

Note self: add details how summaries and classic histograms one liners (so not NHCB spans/deltas) fit into it.

krajorama added 4 commits May 22, 2025 10:05
Signed-off-by: György Krajcsovits <[email protected]>
Signed-off-by: György Krajcsovits <[email protected]>
Add semantic conventions about where complex types may occure.
Allow empty spans and deltas.
Be more precise.

Signed-off-by: György Krajcsovits <[email protected]>
Signed-off-by: György Krajcsovits <[email protected]>
Copy link
Member

@ArthurSens ArthurSens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had some minutes to read the PR. Couldn't read everything though, I'll do a more complete review another day!

Reviewed from mobile

@beorn7 beorn7 self-requested a review May 27, 2025 11:26
krajorama added 2 commits May 27, 2025 15:53
Signed-off-by: György Krajcsovits <[email protected]>
Signed-off-by: György Krajcsovits <[email protected]>
@krajorama krajorama marked this pull request as ready for review May 27, 2025 14:01
@krajorama

This comment was marked as outdated.

I did not want to clutter the examples with adding "# EOF" to all,
so I made a new marker to explicitly add it on request.

There was only one faulty example where we had an extra space, see
the Exemplars section.

Signed-off-by: György Krajcsovits <[email protected]>
Allow multiple exemplars for complex types, i.e. native histograms.
But require that the timestamp is present.

Signed-off-by: György Krajcsovits <[email protected]>
@krajorama krajorama force-pushed the krajo/om2.0-native-histograms branch from c16426b to 7965787 Compare June 11, 2025 07:27
Copy link
Member

@beorn7 beorn7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing all of this.

Mostly commented about the histogram aspect (but I couldn't resist and referred to OM specific things now and then).

beorn7
beorn7 previously requested changes Jun 24, 2025
Copy link
Member

@beorn7 beorn7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more comments.

Sorry for adding some comments outside of a review.

I also assume that non-outdated non-resolved comments are still on your radar to deal with later.

@bwplotka bwplotka moved this to Done in Open Metrics 2.0 Sep 3, 2025
@bwplotka bwplotka moved this from Done to In Progress in Open Metrics 2.0 Sep 3, 2025
@beorn7
Copy link
Member

beorn7 commented Sep 3, 2025

Just one more note from my side. (But as said, I'm not authorized to approve OMv2 related things.)

@bwplotka
Copy link
Member

bwplotka commented Sep 3, 2025

I'm planning to go through this, but looks generally good!

@beorn7
Copy link
Member

beorn7 commented Sep 3, 2025

My comments have all been addressed. (I just don't approve this for the reasons stated previously.)

…stograms

# Conflicts:
#	docs/specs/om/open_metrics_spec_2_0.md
##### Histogram with Classic Buckets

The MetricPoint's Bucket Values Sample MetricNames MUST have the suffix `_bucket`. If present, the MetricPoint's Sum Value Sample MetricName MUST have the suffix `_sum`.
The MetricPoint's Sum Value Sample MetricName MUST have the suffix `_sum`. The MetricPoint's Count Value Sample MetricName MUST have the suffix `_count`. The MetricPoint's Classic Bucket values Sample MetricNames MUST have the suffix `_bucket`.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note to reviewers: this will go away with #2679


Histograms with Native Buckets MUST use the integer native histogram data type.

The integer native histogram data type is represented as structured data with fields. There MUST NOT be any whitespace around fields.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note to reviewers: whitespace is under discussion for readability in complex types in general

@krajorama krajorama dismissed beorn7’s stale review November 5, 2025 08:10

Discussion is concluded. Awaiting reviews from approvers.

Signed-off-by: György Krajcsovits <[email protected]>

With the exception of the `sum` and `zero_threshold` field, all numbers MUST be integers and MUST NOT include dot '.' or exponent 'e'.

Native Bucket values MUST be ordered by their index, and their values MUST be placed in the `negative_buckets` (and/or `positive_buckets`) fields.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note to reviewers: we don't use delta encoding here, negative_buckets and positive_buckets hold the actual absolute non negative integer value.


Numbers MUST be either floating points or integers. Note that ingestors of the format MAY only support float64. The non-real values NaN, +Inf and -Inf MUST be supported. NaN MUST NOT be considered a missing value, but it MAY be used to signal a division by zero.

Complex data types MUST contain all information necessary to recreate a sample of a Metric Type, with the exception of Created Timestamp and Exemplars.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Complex data types MUST contain all information necessary to recreate a sample of a Metric Type, with the exception of Created Timestamp and Exemplars.
Complex data types MUST contain all information necessary to recreate a sample of a Metric Type, with the exception of Start Timestamp and Exemplars.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming this merges after the created -> start PR

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to leave this to #2770 and see which is merged first. But I'll just update it already.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be easier to do it after the other PR is merged actually, deferring.

Histograms measure distributions of discrete events. Common examples are the latency of HTTP requests, function runtimes, or I/O request sizes.

A Histogram MetricPoint MUST contain at least one bucket, and SHOULD contain Sum, and Created Timestamp values. Every bucket MUST have a threshold and a value.
A Histogram MetricPoint MUST contain Count, Sum values.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean a histogram with negative observations will no longer omit the sum? This differs from OTel's semantics for the histogram's sum: https://github.com/open-telemetry/opentelemetry-proto/blob/a8951735f7801e8adfaec5c0ace9262771cfec6e/opentelemetry/proto/metrics/v1/metrics.proto#L470.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the idea, yes. See also #2627

As stated in #2627 , requiring the opposite would be breaking for many people. Also native histograms don't have optional Sum field, so best you could do is set the Sum to 0.0. Which also means that when we convert classic histograms to NHCB, we get 0.0 if Sum is missing.

Now when it comes to conversion, you can easily convert the Prometheus native histogram to OTEL exponential and leave out the Sum if there's negative observations (negative buckets in use) or the zero bucket is in use (since the zero bucket doesn't tell you the sign of the observations).

I don't think this is a problem in reality. Most histograms that I've seen measure time or size which are naturally positive so you don't actually run into problems (unless there's a bug). On the other hand, what if you do want to measure negative values? This requirement would force you to add one more metric just for the Sum or some other trick.

A Histogram MUST measure values that are not NaN in either [Classic Buckets](#classic-buckets) or [Native Buckets](#native-buckets) or both. Measuring NaN is different for Classic and Native Buckets, see in their respective sections.

Semantically, buckets values are counters so MUST NOT be NaN or negative.
If a Histogram stops measuring values in either Classic or Native Buckets and keeps measuring values in the other, it MUST clear and not expose the buckets it stopped measuring into. This avoids exposing different distribution from the two kind of buckets at the same time.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we phrase this in terms of the requirements on the protocol, rather than requirements on metrics libraries? You can still describe how a metrics library can achieve the desired result, if thats helpful.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe something like: "If both Classic and Native Buckets are in use, a Histogram MUST NOT contain measurements in Classic (or Native) Buckets that was taken later than all measurements in the other kind of Native (or Classic) Buckets. This avoids the Histogram having a recent distribution in one kind of buckets and something outdated in the other kind."

In reality this might be hard to achieve 100% if you get the data async , so a couple of measurements might be taken between exposing the native and classic buckets.

So we can also make this a SHOULD, after all we get independent time series out of the different buckets.
Or in fact we can drop it altogether if we think this is too much.

Every Bucket MUST have well defined boundaries and a value. Boundaries of a Bucket MUST NOT be NaN. Bucket values MUST be integers. Semantically, bucket values are counters so MUST NOT be NaN or negative.

Negative threshold buckets MAY be used. Bucket thresholds MUST NOT equal NaN. Count and bucket values MUST be integers.
A Histogram SHOULD refuse to measure NaN value as adding NaN to the Sum will make the Sum equal to NaN and mask the sum of the real measurements until the next reset of the counters. If a Histogram does allow NaN, then NaN MUST be counted in the Count and MUST be added to the Sum, resulting in the Sum becoming NaN.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
A Histogram SHOULD refuse to measure NaN value as adding NaN to the Sum will make the Sum equal to NaN and mask the sum of the real measurements until the next reset of the counters. If a Histogram does allow NaN, then NaN MUST be counted in the Count and MUST be added to the Sum, resulting in the Sum becoming NaN.
A Histogram SHOULD NOT include NaN measurements as including NaN in the Sum will make the Sum equal to NaN and mask the sum of the real measurements for the lifetime of the time series. If a Histogram includes NaN measurements, then NaN measurements MUST be counted in the Count and the Sum MUST be NaN.

Same suggestion about phrasing based on the protocol, rather than based on how the metric is aggregated.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you also suggesting to move the text elsewhere or just rephrase?

Native histograms or any metric for that matter can be reset within the lifetime of the series, so I would stick to reset of the counters or more accurately reset of the Histogram.

Negative threshold buckets MAY be used. Bucket thresholds MUST NOT equal NaN. Count and bucket values MUST be integers.
A Histogram SHOULD refuse to measure NaN value as adding NaN to the Sum will make the Sum equal to NaN and mask the sum of the real measurements until the next reset of the counters. If a Histogram does allow NaN, then NaN MUST be counted in the Count and MUST be added to the Sum, resulting in the Sum becoming NaN.

A Histogram MAY refuse to measure +Inf and -Inf values as adding these to the Sum will mask the sum of the real measurements until the next reset of the counters. If a Histogram measures +Inf or -Inf, then +Inf or -Inf MUST be counted in the Count and MUST be added to the Sum, potentially resulting in +Inf, -Inf or NaN in the Sum, the later for example in case of adding +Inf to -Inf.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have guidance for whether they should or not? Or are we actually ambivalent about whether +/- Inf are included?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really. I can reword to say : if +-Inf is included, then what to do and add an explanation what it means. So basically flip the two sentences around and not use "MAY" . Something like:
"If a Histogram includes +Inf or -Inf measurement, then +Inf or -Inf MUST be counted in Count and MUST be added to the Sum, potentially resulting in +Inf, -Inf or NaN in the Sum, the later for example in case of adding +Inf to -Inf. Note that in this case the Sum of finite measurements is masked until the next reset of the Histogram."

WDYT?


A Histogram MetricPoint with Native Buckets MAY contain exemplars.

Exemplars associated with a Histogram MetricPoint with Native Buckets SHOULD have a timestamp. Note: storage implementations may drop exemplars without timestamps if keeping track of exemplars without timestamps is too resource intensive.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move this to the exemplar section (assuming these requirements apply across the board)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. We should discuss in the WG.

In general it would be great if we required timestamp all the time, but unfortunately getting the current time is usually pretty expensive, so I think saying MUST is out of the question. Prometheus solves this for single series by always comparing the received exemplar value to the last recorded value. If it matches, we drop it (no timestamp) needed.

So at scrape , if the exemplar is coming from a float series and doesn't have a timestamp , we assign it the scrape time as timestamp. The exemplar will not be stored in the exemplar store if it's a duplicate, so no problem. (Although it can lead to weird things as the exemplar storage is finite, so the exemplar might be rotated out of the storage and replaced with the same value , but later timestamp. Exemplars are experimental for a reason https://prometheus.io/docs/prometheus/latest/feature_flags/#exemplars-storage ).

For Histograms, the situation is tricky. If you store the Histogram as a native histogram, then you have a single series and you get multiple exemplars (N) at once . If they don't have a timestamp, then currently we have no way of comparing them to the last (M) exemplars in storage as the storage has no backwards references, no interface to add all exemplars at once to at least get them due to order in the buffer and doesn't store how many exemplars (M) belong together. We could solve this. Maybe help implementing prometheus/prometheus#17469 ?

The thing that we cannot solve at the moment is that Remote-Write may split exemplars between messages, so you might not get all exemplars at the same time. Which means you don't get (N) exemplars, you get (n) . Maybe even this is ok?

We've tried to solve the issue in prometheus/prometheus#13021 once , but didn't manage.

Signed-off-by: György Krajcsovits <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: In Progress

Development

Successfully merging this pull request may close these issues.

6 participants