StarRocks version 4.0

4.0.0

Release date: October 17, 2025

Data Lake Analytics

Unified Page Cache and Data Cache for BE metadata, and adopted an adaptive strategy for scaling. #61640
Optimized metadata file parsing for Iceberg statistics to avoid repetitive parsing. #59955
Optimized COUNT/MIN/MAX queries against Iceberg metadata by efficiently skipping over data file scans, significantly improving aggregation query performance on large partitioned tables and reducing resource consumption. #60385
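To illustrate why answering COUNT/MIN/MAX from metadata can skip data file scans: Iceberg manifests record per-file statistics such as record counts and column lower/upper bounds. The Python sketch below is a conceptual model, not StarRocks code. The `DataFileStats` class and its values are hypothetical, and the sketch ignores delete files and missing or truncated bounds, which a real engine has to handle.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class DataFileStats:
    """Hypothetical per-file statistics, modeled on Iceberg manifest entries."""
    record_count: int
    lower_bound: int  # smallest value of the column in this file
    upper_bound: int  # largest value of the column in this file


def aggregate_from_metadata(
    files: List[DataFileStats],
) -> Tuple[int, Optional[int], Optional[int]]:
    """Return (COUNT(*), MIN(col), MAX(col)) using file statistics only."""
    if not files:
        return 0, None, None
    count = sum(f.record_count for f in files)
    minimum = min(f.lower_bound for f in files)
    maximum = max(f.upper_bound for f in files)
    return count, minimum, maximum


# Made-up statistics for three data files
files = [
    DataFileStats(record_count=1000, lower_bound=5, upper_bound=90),
    DataFileStats(record_count=2500, lower_bound=1, upper_bound=70),
    DataFileStats(record_count=500, lower_bound=40, upper_bound=120),
]
print(aggregate_from_metadata(files))
```

The cost of this approach grows with the number of files rather than the number of rows, which is why the gain is largest on big partitioned tables.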
Supports compaction for Iceberg tables via procedure rewrite_data_files.
Supports Iceberg tables with hidden partitions, including creating, writing, and reading the tables. #58914
Supports setting sort keys when creating Iceberg tables.
Optimizes sink performance for Iceberg tables.
- Iceberg Sink supports spilling large operators, global shuffle, and local sorting to optimize memory usage and address small file issues. #61963
- Iceberg Sink optimizes local sorting based on Spill Partition Writer to improve write efficiency. #62096
- Iceberg Sink supports global shuffle for partitions to further reduce small files. #62123
Enhanced bucket-aware execution for Iceberg tables to improve concurrency and distribution capabilities of bucketed tables. #61756
Supports the TIME data type in the Paimon catalog. #58292
Upgraded Iceberg version to 1.10.0. #63667

Security and Authentication

In scenarios where JWT authentication and the Iceberg REST Catalog are used, StarRocks supports the passthrough of user login information to Iceberg via the REST Session Catalog for subsequent data access authentication. #59611 #58850
Supports vended credentials for the Iceberg catalog.
Supports granting StarRocks internal roles to external groups obtained via Group Provider. #63385 #63258
Added REFRESH privilege to external tables to control the permission to refresh them. #63385

Storage Optimization and Cluster Management

Introduced the File Bundling optimization for the cloud-native table in shared-data clusters to automatically bundle the data files generated by loading, Compaction, or Publish operations, thereby reducing the API cost caused by high-frequency access to the external storage system. #58316
Supports Multi-Table Write-Write Transaction to allow users to control the atomic submission of INSERT, UPDATE, and DELETE operations. The transaction supports Stream Load and INSERT INTO interfaces, effectively guaranteeing cross-table consistency in ETL and real-time write scenarios. #61362
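The atomicity guarantee described above can be illustrated with a generic sketch. The example uses Python's built-in `sqlite3` module purely as a stand-in, since it needs no running cluster. The table names are hypothetical, and StarRocks syntax and transaction semantics may differ from SQLite's. The point is only that writes to two tables inside one transaction either all become visible or none do.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode,
# so BEGIN / COMMIT / ROLLBACK are issued explicitly below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER NOT NULL)")
conn.execute("CREATE TABLE order_audit (order_id INTEGER NOT NULL, note TEXT NOT NULL)")


def load_batch(note):
    """Write to both tables in one transaction; return True if it committed."""
    conn.execute("BEGIN")
    try:
        conn.execute("INSERT INTO orders VALUES (1, 100)")
        # A None note violates NOT NULL and makes the second write fail.
        conn.execute("INSERT INTO order_audit VALUES (1, ?)", (note,))
        conn.execute("COMMIT")
        return True
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK")  # also undoes the insert into orders
        return False


def row_counts():
    orders = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    audit = conn.execute("SELECT COUNT(*) FROM order_audit").fetchone()[0]
    return orders, audit


failed = load_batch(None)
counts_after_failure = row_counts()
succeeded = load_batch("created")
counts_after_success = row_counts()
print(failed, counts_after_failure, succeeded, counts_after_success)
```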
Supports Kafka 4.0 for Routine Load.
Supports full-text inverted indexes on Primary Key tables in shared-nothing clusters.
Supports modifying aggregate keys of Aggregate tables. #62253
Supports enabling case-insensitive processing on names of catalogs, databases, tables, views, and materialized views. #61136
Supports blacklisting Compute Nodes in shared-data clusters. #60830
Supports global connection ID. #57256
Added the recyclebin_catalogs metadata view to Information Schema to display recoverable deleted metadata. #51007

Query and Performance Improvement

Supports DECIMAL256 data type, expanding the upper limit of precision from 38 to 76 digits. Its 256-bit storage provides better adaptability to high-precision financial and scientific computing scenarios, effectively mitigating DECIMAL128's precision overflow problem in very large aggregations and high-order operations. #59645
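The relationship between storage width and precision can be checked with plain integer arithmetic. The Python sketch below does not use StarRocks; the constants and the `digits` helper are illustrative names. It shows that a signed 128-bit integer holds every 38-digit value, that a signed 256-bit integer holds every 76-digit value, and that multiplying two 20-digit numbers already exceeds 38 digits.

```python
DECIMAL128_MAX_DIGITS = 38
DECIMAL256_MAX_DIGITS = 76


def digits(n: int) -> int:
    """Number of decimal digits in the absolute value of n."""
    return len(str(abs(n)))


# Largest signed 128-bit and 256-bit integers
max_int128 = 2**127 - 1
max_int256 = 2**255 - 1

# Every 38-digit value fits in 128 bits; every 76-digit value fits in 256 bits.
fits_128 = 10**DECIMAL128_MAX_DIGITS - 1 <= max_int128
fits_256 = 10**DECIMAL256_MAX_DIGITS - 1 <= max_int256

# The product of two 20-digit operands needs more than 38 digits.
a = 10**19 + 7
b = 10**19 + 9
product = a * b

print(digits(max_int128), digits(max_int256), fits_128, fits_256)
print(digits(a), digits(b), digits(product))
```

Here `product` has 39 digits, so it cannot be represented at 38-digit precision but sits well within 76.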
Improved the performance for basic operators. #61691 #61632 #62585 #61405 #61429
Optimized the performance of the JOIN and AGG operators. #61691
[Preview] Introduced SQL Plan Manager to allow users to bind a query plan to a query, thereby preventing the query plan from changing due to system state changes (mainly data updates and statistics updates), thus stabilizing query performance. #56310
Introduced Partition-wise Spillable Aggregate/Distinct operators to replace the original Spill implementation based on sorted aggregation, significantly improving aggregation performance and reducing read/write overhead in complex and high-cardinality GROUP BY scenarios. #60216
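The general idea behind partition-wise aggregation can be sketched as follows. This is a conceptual Python model under stated assumptions, not the StarRocks implementation: the function names are hypothetical, and in-memory lists stand in for partitions that a real engine would spill to disk. Rows are hash-partitioned by group key, so each key lands in exactly one partition and every partition can be aggregated independently with a small hash table, without sorting.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, int]  # (group key, value)


def partition_rows(rows: List[Row], num_partitions: int) -> List[List[Row]]:
    """Route each row to a partition chosen by the hash of its group key."""
    partitions: List[List[Row]] = [[] for _ in range(num_partitions)]
    for key, value in rows:
        partitions[hash(key) % num_partitions].append((key, value))
    return partitions


def aggregate_partition(rows: List[Row]) -> Dict[str, int]:
    """SUM(value) GROUP BY key over a single partition."""
    sums: Dict[str, int] = defaultdict(int)
    for key, value in rows:
        sums[key] += value
    return dict(sums)


def partition_wise_sum(rows: List[Row], num_partitions: int = 4) -> Dict[str, int]:
    """Aggregate one partition at a time; results never need merging by key."""
    result: Dict[str, int] = {}
    for part in partition_rows(rows, num_partitions):
        result.update(aggregate_partition(part))
    return result


rows = [("a", 1), ("b", 2), ("a", 3), ("c", 5), ("b", 5)]
print(partition_wise_sum(rows))
```

Because only one partition's hash table has to be resident at a time, peak memory is bounded by the largest partition rather than by the total number of distinct keys.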
Flat JSON V2:
- Supports configuring Flat JSON on the table level. #57379
- Enhanced JSON columnar storage by retaining the V1 mechanism while adding page- and segment-level indexes (ZoneMaps, Bloom filters), predicate pushdown with late materialization, dictionary encoding, and integration of a low-cardinality global dictionary to significantly boost execution efficiency. #60953
Supports an adaptive ZoneMap index creation strategy for the STRING data type. #61960
Enhanced query observability:
- Optimized EXPLAIN ANALYZE output to display the execution metrics by group and by operator for better readability. #63326
- QueryDetailActionV2 and QueryProfileActionV2 now support JSON format, enhancing cross-FE query capabilities. #63235
- Supports retrieving Query Profile information across all FEs. #61345
- SHOW PROCESSLIST statements display Catalog, Query ID, and other information. #62552
- Enhanced query queue and process monitoring, supporting display of Running/Pending statuses. #62261
Materialized view rewrites consider the distribution and sort keys of the original table, improving the selection of optimal materialized views. #62830

Functions and SQL Syntax

Added the following functions:
- bitmap_hash64 #56913
- bool_or #57414
- strpos #57278
- to_datetime and to_datetime_ntz #60637
- regexp_count #57182
- tokenize #58965
- format_bytes #61535
- encode_sort_key #61781
- column_size and column_compressed_size #62481
Provides the following syntactic extensions:
- Supports IF NOT EXISTS keywords in CREATE ANALYZE FULL TABLE. #59789
- Supports EXCLUDE clauses in SELECT. #57411
- Supports FILTER clauses in aggregate functions, improving readability and execution efficiency of conditional aggregations. #58937
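The FILTER clause for aggregate functions is standard SQL, so its effect can be shown without a StarRocks cluster. The sketch below uses Python's built-in `sqlite3` module as a stand-in (it assumes SQLite 3.30 or later, which added FILTER support) with a hypothetical `sales` table. StarRocks behavior should match in spirit, but the exact dialect may differ. The second query is the CASE-based form that FILTER replaces.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10), ("west", 20), ("east", 30), ("west", 5)],
)

# Conditional aggregation written with FILTER
with_filter = conn.execute(
    """
    SELECT SUM(amount) FILTER (WHERE region = 'east'),
           COUNT(*)    FILTER (WHERE amount >= 20)
    FROM sales
    """
).fetchone()

# The same aggregation written with CASE expressions
with_case = conn.execute(
    """
    SELECT SUM(CASE WHEN region = 'east' THEN amount END),
           COUNT(CASE WHEN amount >= 20 THEN 1 END)
    FROM sales
    """
).fetchone()

print(with_filter, with_case)
```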

Behavior Changes

Adjusted the logic of the materialized view parameter auto_partition_refresh_number so that it limits the number of partitions to refresh regardless of whether the refresh is automatic or manual. #62301
Flat JSON is enabled by default. #62097
The default value of the system variable enable_materialized_view_agg_pushdown_rewrite is set to true, indicating that aggregation pushdown for materialized view query rewrite is enabled by default. #60976
Changed the type of some columns in information_schema.materialized_views to better align with the corresponding data. #60054
The split_part function returns NULL when the delimiter is not matched. #56967
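Queries that relied on split_part returning a non-NULL value for inputs without the delimiter may need a review. The Python function below is only a model of the behavior as stated in this note, with `None` standing in for SQL NULL. The name `split_part_model` is hypothetical, the handling of out-of-range field numbers is an assumption, and the sketch covers only positive field numbers and non-empty delimiters.

```python
from typing import Optional


def split_part_model(content: str, delimiter: str, field: int) -> Optional[str]:
    """Model of split_part: None (NULL) when the delimiter never occurs."""
    if delimiter not in content:
        return None  # behavior described in this release note
    parts = content.split(delimiter)
    if field < 1 or field > len(parts):
        return None  # assumption: out-of-range fields also yield NULL
    return parts[field - 1]


print(split_part_model("a,b,c", ",", 2))
print(split_part_model("abc", ",", 1))
```

In SQL, wrapping the call in a function such as COALESCE is one way to keep a non-NULL fallback where downstream logic requires it.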
STRING now replaces fixed-length CHAR in CTAS/CREATE MATERIALIZED VIEW to avoid inferring the wrong column length, which may cause materialized view refresh failures. #63114 #62476
Data Cache-related configurations are simplified. #61640
- datacache_mem_size and datacache_disk_size are now effective.
- storage_page_cache_limit, block_cache_mem_size, block_cache_disk_size are deprecated.
Added new catalog properties (remote_file_cache_memory_ratio for Hive, and iceberg_data_file_cache_memory_usage_ratio and iceberg_delete_file_cache_memory_usage_ratio for Iceberg) to limit the memory resources used for Hive and Iceberg metadata cache, and set the default values to 0.1 (10%). Adjusted the metadata cache TTL to 24 hours. #63459 #63373 #61966 #62288
SHOW DATA DISTRIBUTION no longer merges the statistics of all materialized indexes with the same bucket sequence number. It only shows data distribution at the materialized index level. #59656
The default bucket size for automatic bucket tables is changed from 4GB to 1GB to improve performance and resource utilization. #63168
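A back-of-the-envelope calculation shows what the smaller default means for bucket counts. The sketch below assumes the number of buckets is roughly the partition size divided by the bucket size, rounded up, and it treats GB as GiB. The real automatic bucketing logic in StarRocks may weigh other factors, so `estimated_buckets` is a hypothetical helper, not the actual algorithm.

```python
import math

GIB = 1024**3


def estimated_buckets(partition_bytes: int, bucket_bytes: int) -> int:
    """Rough bucket count: partition size / bucket size, rounded up (assumption)."""
    return max(1, math.ceil(partition_bytes / bucket_bytes))


partition_size = 10 * GIB
old_default = estimated_buckets(partition_size, 4 * GIB)
new_default = estimated_buckets(partition_size, 1 * GIB)
print(old_default, new_default)
```

Under this assumption a 10 GB partition goes from 3 buckets to 10, which gives more units of parallelism at the price of more tablets to manage.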
The system determines the Partial Update mode based on the corresponding session variable and the number of columns in the INSERT statement. #62091
