StarRocks version 4.0
4.0.0
Release date: October 17, 2025
Data Lake Analytics
- Unified Page Cache and Data Cache for BE metadata, and adopted an adaptive strategy for scaling. #61640
- Optimized metadata file parsing for Iceberg statistics to avoid repetitive parsing. #59955
- Optimized COUNT/MIN/MAX queries against Iceberg metadata by efficiently skipping over data file scans, significantly improving aggregation query performance on large partitioned tables and reducing resource consumption. #60385
- Supports compaction for Iceberg tables via the rewrite_data_files procedure.
- Supports Iceberg tables with hidden partitions, including creating, writing, and reading the tables. #58914
- Supports setting sort keys when creating Iceberg tables.
- Optimizes sink performance for Iceberg tables:
  - Iceberg Sink supports spilling large operators, global shuffle, and local sorting to optimize memory usage and address small file issues. #61963
  - Iceberg Sink optimizes local sorting based on Spill Partition Writer to improve write efficiency. #62096
  - Iceberg Sink supports global shuffle for partitions to further reduce small files. #62123
- Enhanced bucket-aware execution for Iceberg tables to improve concurrency and distribution capabilities of bucketed tables. #61756
- Supports the TIME data type in the Paimon catalog. #58292
- Upgraded Iceberg version to 1.10.0. #63667
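The COUNT/MIN/MAX optimization builds on the per-data-file statistics that Iceberg manifests already record (record_count, lower_bounds, and upper_bounds in the Iceberg table spec). The Python sketch below is a simplified illustration of answering such an aggregation from manifest metadata alone. It assumes there are no delete files and that the bounds are exact (untruncated). The DataFile class and the aggregate_from_metadata function are hypothetical and do not reflect StarRocks' actual implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class DataFile:
    """Hypothetical, simplified stand-in for an Iceberg manifest entry."""
    record_count: int
    lower_bounds: Dict[str, int]  # column -> min value in this file
    upper_bounds: Dict[str, int]  # column -> max value in this file
    has_deletes: bool = False     # True if delete files apply to this data file


def aggregate_from_metadata(
    files: List[DataFile], column: str
) -> Optional[Tuple[int, Optional[int], Optional[int]]]:
    """Answer COUNT(*), MIN(column), MAX(column) from file metadata only.

    Returns None when the metadata is insufficient, in which case an
    engine would fall back to scanning the data files.
    """
    if not files:
        return 0, None, None  # empty table: COUNT(*) = 0, MIN/MAX are NULL
    if any(f.has_deletes for f in files):
        return None  # row-level deletes make record_count inexact
    if any(column not in f.lower_bounds or column not in f.upper_bounds for f in files):
        return None  # no statistics for this column (e.g. untracked or all NULL)
    total = sum(f.record_count for f in files)
    lo = min(f.lower_bounds[column] for f in files)
    hi = max(f.upper_bounds[column] for f in files)
    return total, lo, hi


files = [
    DataFile(100, {"id": 5}, {"id": 90}),
    DataFile(50, {"id": 1}, {"id": 40}),
]
print(aggregate_from_metadata(files, "id"))  # (150, 1, 90)
```

Returning None rather than a partial answer reflects the key design constraint. The shortcut is only safe when the metadata is known to be exact, so any doubt falls back to a regular scan.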
Security and Authentication
- In scenarios where JWT authentication and the Iceberg REST Catalog are used, StarRocks can pass user login information through to Iceberg via the REST Session Catalog for subsequent data access authentication. #59611 #58850
- Supports vended credentials for the Iceberg catalog.
- Supports granting StarRocks internal roles to external groups obtained via Group Provider. #63385 #63258
- Added the REFRESH privilege on external tables to control who can refresh them. #63385
Storage Optimization and Cluster Management
- Introduced the File Bundling optimization for cloud-native tables in shared-data clusters. It automatically bundles the data files generated by loading, Compaction, or Publish operations, reducing the API cost caused by high-frequency access to the external storage system. #58316
- Supports multi-table write-write transactions, allowing users to atomically commit INSERT, UPDATE, and DELETE operations. Transactions support the Stream Load and INSERT INTO interfaces, guaranteeing cross-table consistency in ETL and real-time write scenarios. #61362
- Supports Kafka 4.0 for Routine Load.
- Supports full-text inverted indexes on Primary Key tables in shared-nothing clusters.
- Supports modifying aggregate keys of Aggregate tables. #62253
- Supports enabling case-insensitive processing on names of catalogs, databases, tables, views, and materialized views. #61136
- Supports blacklisting Compute Nodes in shared-data clusters. #60830
- Supports global connection IDs. #57256
- Added the recyclebin_catalogs metadata view to Information Schema to display recoverable deleted metadata. #51007
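File Bundling cuts the number of objects, and therefore requests, sent to object storage. The Python sketch below illustrates the general bundling idea under a simple assumed layout: several small files are concatenated into one object, and an index records the (offset, length) of each logical file. The layout and the bundle/read_member helpers are hypothetical and are not StarRocks' actual on-disk format.

```python
from typing import Dict, List, Tuple


def bundle(files: Dict[str, bytes]) -> Tuple[bytes, Dict[str, Tuple[int, int]]]:
    """Concatenate small files into one blob and record where each one lives."""
    index: Dict[str, Tuple[int, int]] = {}
    parts: List[bytes] = []
    offset = 0
    for name, data in files.items():
        index[name] = (offset, len(data))  # (start offset, length) in the bundle
        parts.append(data)
        offset += len(data)
    return b"".join(parts), index


def read_member(blob: bytes, index: Dict[str, Tuple[int, int]], name: str) -> bytes:
    """Read one logical file back as a single ranged read of the bundle."""
    offset, length = index[name]
    return blob[offset:offset + length]


small_files = {"seg_0.dat": b"aaa", "seg_1.dat": b"bbbb", "del_0.dat": b"cc"}
blob, index = bundle(small_files)
# In this model, one PUT uploads the bundle instead of one PUT per small file.
print(len(small_files), "files ->", 1, "object")  # 3 files -> 1 object
print(read_member(blob, index, "seg_1.dat"))      # b'bbbb'
```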
Query and Performance Improvement
- Supports the DECIMAL256 data type, raising the maximum precision from 38 to 76 digits. Its 256-bit storage is better suited to high-precision financial and scientific computing scenarios, effectively mitigating DECIMAL128's precision overflow problem in very large aggregations and high-order operations. #59645
- Improved the performance of basic operators. #61691 #61632 #62585 #61405 #61429
- Optimized the performance of the JOIN and AGG operators. #61691
- [Preview] Introduced SQL Plan Manager, which allows users to bind a query plan to a query. This prevents the plan from changing due to system state changes (mainly data updates and statistics updates) and thus stabilizes query performance. #56310
- Introduced Partition-wise Spillable Aggregate/Distinct operators to replace the original Spill implementation based on sorted aggregation, significantly improving aggregation performance and reducing read/write overhead in complex and high-cardinality GROUP BY scenarios. #60216
- Flat JSON V2:
  - Supports configuring Flat JSON at the table level. #57379
  - Enhanced JSON columnar storage by retaining the V1 mechanism while adding page- and segment-level indexes (ZoneMaps, Bloom filters), predicate pushdown with late materialization, dictionary encoding, and integration with a low-cardinality global dictionary, significantly boosting execution efficiency. #60953
- Supports an adaptive ZoneMap index creation strategy for the STRING data type. #61960
- Enhanced query observability:
  - Optimized EXPLAIN ANALYZE output to display execution metrics by group and by operator for better readability. #63326
  - QueryDetailActionV2 and QueryProfileActionV2 now support the JSON format, enhancing cross-FE query capabilities. #63235
  - Supports retrieving Query Profile information across all FEs. #61345
  - SHOW PROCESSLIST now displays Catalog, Query ID, and other information. #62552
  - Enhanced query queue and process monitoring, supporting the display of Running/Pending statuses. #62261
- Materialized view rewrite now considers the distribution and sort keys of the original table, improving the selection of the optimal materialized view. #62830
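DECIMAL precision is counted in decimal digits, not bits. As a quick sanity check, the Python snippet below shows why 128-bit storage caps precision at 38 digits and 256-bit storage at 76 digits. It assumes the unscaled value is stored as a two's-complement signed integer, the usual representation for fixed-width decimals. The max_decimal_digits helper is illustrative only.

```python
def max_decimal_digits(bits: int) -> int:
    """Largest n such that every n-digit unscaled value fits in a signed `bits`-bit integer."""
    max_signed = 2 ** (bits - 1) - 1  # largest positive two's-complement value
    n = 0
    # Add digits while the largest (n+1)-digit number, 10**(n+1) - 1, still fits.
    while 10 ** (n + 1) - 1 <= max_signed:
        n += 1
    return n


print(max_decimal_digits(64))   # 18
print(max_decimal_digits(128))  # 38
print(max_decimal_digits(256))  # 76
```

Because 2^255 is roughly 5.8 × 10^76, every 76-digit value fits in 256 bits, but not every 77-digit value does.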
Functions and SQL Syntax
- Added the following functions:
- Provides the following syntactic extensions:
Behavior Changes
- Adjusted the logic of the materialized view parameter auto_partition_refresh_number so that it limits the number of partitions to refresh for both automatic and manual refreshes. #62301
- Flat JSON is enabled by default. #62097
- The default value of the system variable enable_materialized_view_agg_pushdown_rewrite is set to true, indicating that aggregation pushdown for materialized view query rewrite is enabled by default. #60976
- Changed the type of some columns in information_schema.materialized_views to better align with the corresponding data. #60054
- The split_part function returns NULL when the delimiter is not matched. #56967
- CTAS and CREATE MATERIALIZED VIEW now use STRING instead of fixed-length CHAR to avoid inferring the wrong column length, which could cause materialized view refresh failures. #63114 #62476
- Data Cache-related configurations are simplified. #61640
  - datacache_mem_size and datacache_disk_size are now effective.
  - storage_page_cache_limit, block_cache_mem_size, and block_cache_disk_size are deprecated.
- Added new catalog properties (remote_file_cache_memory_ratio for Hive, and iceberg_data_file_cache_memory_usage_ratio and iceberg_delete_file_cache_memory_usage_ratio for Iceberg) to limit the memory used by the Hive and Iceberg metadata caches, with a default value of 0.1 (10%). The metadata cache TTL is adjusted to 24 hours. #63459 #63373 #61966 #62288
- SHOW DATA DISTRIBUTION no longer merges the statistics of all materialized indexes with the same bucket sequence number. It only shows data distribution at the materialized index level. #59656
- The default bucket size for automatic bucket tables is changed from 4GB to 1GB to improve performance and resource utilization. #63168
- The system determines the Partial Update mode based on the corresponding session variable and the number of columns in the INSERT statement. #62091