@ringerc ringerc commented Sep 25, 2025

Add commands thanos tools query range and thanos tools query instant that use the Thanos gRPC API to execute PromQL on a Thanos Query endpoint.

This works like 'promtool query', but for the Thanos gRPC API instead of the Prometheus HTTP API.

Using the Thanos gRPC API improves performance and lowers memory use by using protobuf instead of JSON text serialization. It also streams large result sets to the client, so the whole serialised result does not have to be accumulated in memory before any of it can be sent; this lowers latency and peak memory use.

Results cannot be fully streamed, because the Thanos executor still accumulates them into a promql.Result before sending any. Streaming still helps, though: it avoids holding the whole serialised response in memory on top of the result data itself, which lowers peak Query memory use and latency to first response.
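To make the buffering-vs-streaming distinction concrete, here is a minimal, self-contained Go sketch (the `series` type and both emit functions are illustrative stand-ins, not the PR's actual code or Thanos's protobuf types): the buffered path serialises the entire result before the first byte can be sent, while the streamed path hands each message to the sink as it is produced, so peak serialisation memory is one message rather than the whole response.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// series is a hypothetical stand-in for a Thanos gRPC TimeSeries message.
type series struct {
	Labels  map[string]string `json:"labels"`
	Samples []float64         `json:"samples"`
}

// bufferedEmit mirrors the HTTP API path: the whole result set is
// serialised into one document before any of it can be sent.
func bufferedEmit(all []series) ([]byte, error) {
	return json.Marshal(all)
}

// streamedEmit mirrors the gRPC path: each series is encoded and passed
// to the sink as soon as it is available (JSON stands in for protobuf).
func streamedEmit(all []series, sink func([]byte)) error {
	for _, s := range all {
		b, err := json.Marshal(s)
		if err != nil {
			return err
		}
		sink(b)
	}
	return nil
}

func main() {
	data := []series{
		{Labels: map[string]string{"job": "kubelet"}, Samples: []float64{1, 1, 1}},
		{Labels: map[string]string{"job": "node"}, Samples: []float64{2, 2}},
	}
	n := 0
	_ = streamedEmit(data, func([]byte) { n++ })
	fmt.Println("messages streamed:", n)
}
```

In the streamed path the largest allocation is a single encoded series, which is why latency to first response also drops: the client can start consuming before the last series is encoded.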

Sending queries to Thanos Query Frontend is not currently supported, because Frontend only exposes the Prometheus HTTP API (and only uses the Prometheus HTTP API to talk to its downstreams).

  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Changes

  • Add query subcommand group to thanos tools
  • Add query range and query instant subcommands that use the thanos gRPC API to execute PromQL queries against a Thanos Query gRPC API endpoint

Details

This is a WIP, with the following improvements still required:

  • Needs docs updates
  • Needs test cases
  • Needs (m)TLS configuration support
  • Needs http_proxy support
  • Needs checks for gRPC endpoint format (supporting dns:, ipv4:, etc.)
  • Needs formatting options like promtool

Example use:

 ./thanos tools query range --start='-2h' --step '30s' --insecure \
     --server 'thanos-query.monitoring.svc.local:10901' \
     --timeout=5m --query.promql-engine=thanos \
     --query 'last_over_time({job="kubelet"}[1m])'

It can output results as:

  • largely promtool-compatible JSON (buffered client-side)
  • a JSON stream
  • promql-format series literals, like promtool's default output
  • a raw Go struct representation, for debugging

Unlike promtool, it also emits stats and warnings. These appear as # comments in the promql-format stream.
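As a rough illustration of the promql-format series literals mentioned above, here is a small self-contained Go sketch of rendering one series in a promtool-like `metric{labels} => value @[timestamp]` shape. The `sample` type and `renderSeries` helper are hypothetical, written for this example only, and are not the PR's actual formatting code:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// sample is a hypothetical stand-in for one point of a range-query series.
type sample struct {
	Value float64
	TS    int64 // milliseconds since the Unix epoch
}

// renderSeries formats a series as a promtool-style promql literal:
// the metric selector on the first line, then one "value @[ts]" per sample.
func renderSeries(name string, labels map[string]string, samples []sample) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic label order
	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		pairs = append(pairs, fmt.Sprintf("%s=%q", k, labels[k]))
	}
	var b strings.Builder
	fmt.Fprintf(&b, "%s{%s} =>", name, strings.Join(pairs, ", "))
	for _, s := range samples {
		fmt.Fprintf(&b, "\n%g @[%d]", s.Value, s.TS)
	}
	return b.String()
}

func main() {
	fmt.Println(renderSeries("ALERTS",
		map[string]string{"alertname": "TargetDown", "job": "kubelet"},
		[]sample{{1, 1758833662000}}))
}
```

Stats and warnings would then be extra lines prefixed with #, which promql-aware consumers can skip as comments.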

Verification

This is still WIP and needs test coverage added, but manual testing is promising:

$ kubectl port-forward -n monitoring "$(kubectl get pod -n monitoring -l app.kubernetes.io/component=query,app.kubernetes.io/instance=thanos -o name)" 10912:http 10911:grpc
$ time ./thanos tools query range --start='-2h' --step '5s' --insecure --server 'localhost:10911' --query.promql-engine=thanos --query 'last_over_time({job="kubelet"}[1m])' --timeout=5m >out 2>&1
./thanos tools query range --start='-2h' --step '30s' --insecure --server      8.50s user 0.89s system 106% cpu 8.834 total
$ less out
QueryRangeRequest: query:"last_over_time({job=\"kubelet\"}[1m])" start_time_seconds:1758833572 end_time_seconds:1758840772 interval_seconds:30 timeout_seconds:300 engine:thanos 
Issuing gRPC QueryRangeRequest: query:"last_over_time({job=\"kubelet\"}[1m])" start_time_seconds:1758833572 end_time_seconds:1758840772 interval_seconds:30 timeout_seconds:300 engine:thanos 
QueryRange response message: timeseries:<labels:<name:"__name__" value:"ALERTS" > labels:<name:"alertname" value:"TargetDown" > labels:<name:"alertstate" value:"pending" > labels:<name:"job" value:"kubelet" > labels:<name:"namespace" value:"kube-system" > labels:<name:"prometheus" value:"monitoring/kube-prometheus" > labels:<name:"prometheus_replica" value:"prometheus-kube-prometheus-0" > labels:<name:"service" value:"kubelet" > labels:<name:"severity" value:"warning" > samples:<value:1 timestamp:1758833662000 > samples:<value:1 timestamp:1758833692000 > samples:<value:1 timestamp:1758833722000 > > 
QueryRange response message: timeseries:<labels:<name:"__name__" value:"ALERTS_FOR_STATE" > labels:<name:"alertname" value:"TargetDown" > labels:<name:"job" value:"kubelet" > labels:<name:"namespace" value:"kube-system" > labels:<name:"prometheus" value:"monitoring/kube-prometheus" > labels:<name:"prometheus_replica" value:"prometheus-kube-prometheus-0" > labels:<name:"service" value:"kubelet" > labels:<name:"severity" value:"warning" > samples:<value:1.758833641e+09 timestamp:1758833662000 > samples:<value:1.758833641e+09 timestamp:1758833692000 > samples:<value:1.758833641e+09 timestamp:1758833722000 > > 
...
QueryRange response message: timeseries:<labels:<name:"__name__" value:"aggregator_discovery_aggregation_count_total" > labels:<name:"endpoint" value:"https-metrics" > labels:<name:"instance" value:"172.18.0.2:10250" > labels:<name:"job" value:"kubelet" > labels:<name:"metrics_path" value:"/metrics" > labels:<name:"namespace" value:"kube-system" > labels:<name:"node" value:"edbpgai-control-plane" > labels:<name:"prometheus" value:"monitoring/kube-prometheus" > labels:<name:"prometheus_replica" value:"prometheus-kube-prometheus-0" > labels:<name:"service" value:"kubelet" > samples:<timestamp:1758833662000 > [....]
[...]
QueryRange response message: stats:<samples_total:23068983 peak_samples:8 > 
QueryRange: EOF
ts=2025-09-25T22:53:01.612809733Z caller=main.go:174 level=info msg=exiting

@ringerc ringerc marked this pull request as draft September 25, 2025 22:54
Add commands 'thanos tools query range' and 'thanos tools query instant'
that use the Thanos gRPC API to execute PromQL on a Thanos Query
endpoint.

This works like 'promtool query' for the Thanos gRPC API instead of the
Prometheus HTTP API.

Using the Thanos gRPC API improves performance and lowers memory use by
using protobuf instead of text json serialization. It also streams large
result sets to the client, so the whole serialised result does not have
to be accumulated in memory before any of it can be sent; this lowers
latency and peak memory use.

Results cannot be fully streamed, because the Thanos executor still
accumulates them into a promql.Result before sending any. But it helps
reduce the memory overhead and latency of building the whole serialised
response in memory while still holding the result data in memory. This
lowers peak Query memory, and latency to first response.

Sending queries to Thanos Query Frontend is not currently supported,
because Frontend only exposes the Prometheus HTTP API (and only uses the
Prometheus HTTP API to talk to its downstreams).

This is a WIP, with the following improvements still required:

* Needs docs updates
* Needs test cases
* Needs (m)TLS configuration support
* Needs http_proxy support
* Checks for gRPC endpoint format needed (support dns:, ipv4:, etc)
* Needs formatting options like promtool
* --query should be positional parameter

Example use:

   ./thanos tools query range --start='-2h' --step '30s' --insecure \
       --server 'thanos-query.monitoring.svc.local:10901' \
       --timeout=5m --query.promql-engine=thanos \
       --query 'last_over_time({job="kubelet"}[1m])'

ringerc commented Sep 26, 2025

@MichaHoffmann suggested naming changes

maybe "tools_grpc.go" with "thanos tools grpc query-range" and such

which I can adopt soon.

We discussed the formatting code too. I explained that the json and promql formats are there to match promtool. I didn't find anything pre-existing in Thanos for them. The promtool code isn't exported, and doesn't take the same types as Thanos gRPC uses anyway.

