@ringerc ringerc commented Sep 25, 2025

Add commands thanos tools query range and thanos tools query instant that use the Thanos gRPC API to execute PromQL on a Thanos Query endpoint.

This works like 'promtool query', but for the Thanos gRPC API instead of the Prometheus HTTP API.

Using the Thanos gRPC API improves performance and lowers memory use by using protobuf instead of JSON text serialization. It also streams large result sets to the client, so the whole serialised result does not have to be accumulated in memory before any of it can be sent; this lowers latency and peak memory use.

Results cannot be fully streamed, because the Thanos executor still accumulates them into a promql.Result before sending any. Streaming still helps, though: it avoids holding the whole serialised response in memory on top of the result data itself, which lowers peak Query memory use and latency to first response.
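To make the buffering-vs-streaming distinction concrete, here is a minimal, self-contained Go sketch (the `series` type and both emit functions are illustrative stand-ins, not the PR's actual code or Thanos's protobuf types): the buffered path serialises the entire result before the first byte can be sent, while the streamed path hands each message to the sink as it is produced, so peak serialisation memory is one message rather than the whole response.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// series is a hypothetical stand-in for a Thanos gRPC TimeSeries message.
type series struct {
	Labels  map[string]string `json:"labels"`
	Samples []float64         `json:"samples"`
}

// bufferedEmit mirrors the HTTP API path: the whole result set is
// serialised into one document before any of it can be sent.
func bufferedEmit(all []series) ([]byte, error) {
	return json.Marshal(all)
}

// streamedEmit mirrors the gRPC path: each series is encoded and passed
// to the sink as soon as it is available (JSON stands in for protobuf).
func streamedEmit(all []series, sink func([]byte)) error {
	for _, s := range all {
		b, err := json.Marshal(s)
		if err != nil {
			return err
		}
		sink(b)
	}
	return nil
}

func main() {
	data := []series{
		{Labels: map[string]string{"job": "kubelet"}, Samples: []float64{1, 1, 1}},
		{Labels: map[string]string{"job": "node"}, Samples: []float64{2, 2}},
	}
	n := 0
	_ = streamedEmit(data, func([]byte) { n++ })
	fmt.Println("messages streamed:", n)
}
```

In the streamed path the largest allocation is a single encoded series, which is why latency to first response also drops: the client can start consuming before the last series is encoded.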

Sending queries to Thanos Query Frontend is not currently supported, because Frontend only exposes the Prometheus HTTP API (and only uses the Prometheus HTTP API to talk to its downstreams).

  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Changes

  • Add query subcommand group to thanos tools
  • Add query range and query instant subcommands that use the thanos gRPC API to execute PromQL queries against a Thanos Query gRPC API endpoint

Details

This is a WIP, with the following improvements still required:

  • Needs docs updates
  • Needs test cases
  • Needs (m)TLS configuration support
  • Needs http_proxy support
  • Needs checks for gRPC endpoint format (supporting dns:, ipv4:, etc.)
  • Needs formatting options like promtool

Example use:

 ./thanos tools query range --start='-2h' --step '30s' --insecure \
     --server 'thanos-query.monitoring.svc.local:10901' \
     --timeout=5m --query.promql-engine=thanos \
     --query 'last_over_time({job="kubelet"}[1m])'

It can output results as:

  • largely promtool-compatible JSON (buffered client-side)
  • a JSON stream
  • promql-format series literals, like promtool's default output
  • a raw Go struct representation, for debugging

Unlike promtool, it also emits stats and warnings. These appear as # comments in the promql-format stream.
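As a rough illustration of the promql-format series literals mentioned above, here is a small self-contained Go sketch of rendering one series in a promtool-like `metric{labels} => value @[timestamp]` shape. The `sample` type and `renderSeries` helper are hypothetical, written for this example only, and are not the PR's actual formatting code:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// sample is a hypothetical stand-in for one point of a range-query series.
type sample struct {
	Value float64
	TS    int64 // milliseconds since the Unix epoch
}

// renderSeries formats a series as a promtool-style promql literal:
// the metric selector on the first line, then one "value @[ts]" per sample.
func renderSeries(name string, labels map[string]string, samples []sample) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic label order
	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		pairs = append(pairs, fmt.Sprintf("%s=%q", k, labels[k]))
	}
	var b strings.Builder
	fmt.Fprintf(&b, "%s{%s} =>", name, strings.Join(pairs, ", "))
	for _, s := range samples {
		fmt.Fprintf(&b, "\n%g @[%d]", s.Value, s.TS)
	}
	return b.String()
}

func main() {
	fmt.Println(renderSeries("ALERTS",
		map[string]string{"alertname": "TargetDown", "job": "kubelet"},
		[]sample{{1, 1758833662000}}))
}
```

Stats and warnings would then be extra lines prefixed with #, which promql-aware consumers can skip as comments.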

Verification

This is still WIP and needs test coverage added, but manual testing is promising:

$ kubectl port-forward -n monitoring "$(kubectl get pod -n monitoring -l app.kubernetes.io/component=query,app.kubernetes.io/instance=thanos -o name)" 10912:http 10911:grpc
$ time ./thanos tools query range --start='-2h' --step '5s' --insecure --server 'localhost:10911' --query.promql-engine=thanos --query 'last_over_time({job="kubelet"}[1m])' --timeout=5m >out 2>&1
./thanos tools query range --start='-2h' --step '30s' --insecure --server      8.50s user 0.89s system 106% cpu 8.834 total
$ less out
QueryRangeRequest: query:"last_over_time({job=\"kubelet\"}[1m])" start_time_seconds:1758833572 end_time_seconds:1758840772 interval_seconds:30 timeout_seconds:300 engine:thanos 
Issuing gRPC QueryRangeRequest: query:"last_over_time({job=\"kubelet\"}[1m])" start_time_seconds:1758833572 end_time_seconds:1758840772 interval_seconds:30 timeout_seconds:300 engine:thanos 
QueryRange response message: timeseries:<labels:<name:"__name__" value:"ALERTS" > labels:<name:"alertname" value:"TargetDown" > labels:<name:"alertstate" value:"pending" > labels:<name:"job" value:"kubelet" > labels:<name:"namespace" value:"kube-system" > labels:<name:"prometheus" value:"monitoring/kube-prometheus" > labels:<name:"prometheus_replica" value:"prometheus-kube-prometheus-0" > labels:<name:"service" value:"kubelet" > labels:<name:"severity" value:"warning" > samples:<value:1 timestamp:1758833662000 > samples:<value:1 timestamp:1758833692000 > samples:<value:1 timestamp:1758833722000 > > 
QueryRange response message: timeseries:<labels:<name:"__name__" value:"ALERTS_FOR_STATE" > labels:<name:"alertname" value:"TargetDown" > labels:<name:"job" value:"kubelet" > labels:<name:"namespace" value:"kube-system" > labels:<name:"prometheus" value:"monitoring/kube-prometheus" > labels:<name:"prometheus_replica" value:"prometheus-kube-prometheus-0" > labels:<name:"service" value:"kubelet" > labels:<name:"severity" value:"warning" > samples:<value:1.758833641e+09 timestamp:1758833662000 > samples:<value:1.758833641e+09 timestamp:1758833692000 > samples:<value:1.758833641e+09 timestamp:1758833722000 > > 
...
QueryRange response message: timeseries:<labels:<name:"__name__" value:"aggregator_discovery_aggregation_count_total" > labels:<name:"endpoint" value:"https-metrics" > labels:<name:"instance" value:"172.18.0.2:10250" > labels:<name:"job" value:"kubelet" > labels:<name:"metrics_path" value:"/metrics" > labels:<name:"namespace" value:"kube-system" > labels:<name:"node" value:"edbpgai-control-plane" > labels:<name:"prometheus" value:"monitoring/kube-prometheus" > labels:<name:"prometheus_replica" value:"prometheus-kube-prometheus-0" > labels:<name:"service" value:"kubelet" > samples:<timestamp:1758833662000 > [....]
[...]
QueryRange response message: stats:<samples_total:23068983 peak_samples:8 > 
QueryRange: EOF
ts=2025-09-25T22:53:01.612809733Z caller=main.go:174 level=info msg=exiting

@ringerc ringerc marked this pull request as draft September 25, 2025 22:54
Add commands 'thanos tools query range' and 'thanos tools query instant'
that use the Thanos gRPC API to execute PromQL on a Thanos Query
endpoint.

This works like 'promtool query' for the Thanos gRPC API instead of the
Prometheus HTTP API.

Using the Thanos gRPC API improves performance and lowers memory use by
using protobuf instead of text json serialization. It also streams large
result sets to the client, so the whole serialised result does not have
to be accumulated in memory before any of it can be sent; this lowers
latency and peak memory use.

Results cannot be fully streamed, because the Thanos executor still
accumulates them into a promql.Result before sending any. But it helps
reduce the memory overhead and latency of building the whole serialised
response in memory while still holding the result data in memory. This
lowers peak Query memory, and latency to first response.

Sending queries to Thanos Query Frontend is not currently supported,
because Frontend only exposes the Prometheus HTTP API (and only uses the
Prometheus HTTP API to talk to its downstreams).

This is a WIP, with the following improvements still required:

* Needs docs updates
* Needs test cases
* Needs (m)TLS configuration support
* Needs http_proxy support
* Checks for gRPC endpoint format needed (support dns:, ipv4:, etc)
* Needs formatting options like promtool
* --query should be positional parameter

Example use:

   ./thanos tools query range --start='-2h' --step '30s' --insecure \
       --server 'thanos-query.monitoring.svc.local:10901' \
       --timeout=5m --query.promql-engine=thanos \
       --query 'last_over_time({job="kubelet"}[1m])'

ringerc commented Sep 26, 2025

@MichaHoffmann suggested naming changes

maybe "tools_grpc.go" with "thanos tools grpc query-range" and such

which I can adopt soon.

We discussed the formatting code too. I explained that the json and promql formats are there to match promtool. I didn't find anything pre-existing in Thanos for them. The promtool code isn't exported, and doesn't take the same types as Thanos gRPC uses anyway.

