Releases: dataflint/spark

Version 0.7.0

01 Dec 17:20

Release Notes - v0.7.0

Release Date: December 1, 2025

🎉 What's New

Delta Lake Instrumentation 🚀

This release introduces comprehensive Delta Lake monitoring and instrumentation capabilities:

  • Delta Lake Table Monitoring: New spark.dataflint.instrument.deltalake configuration flag to enable Delta Lake-specific instrumentation
  • Delta Lake Scan Page: New dedicated UI page showing Delta Lake scan operations and metrics
  • Full Table Scan Detection: Automatic alerts for full table scans on Delta Lake tables to help identify performance issues
  • Z-Order Cache Tracking: Monitor Z-Order optimization cache usage in table properties
  • Delta Log Integration: Direct integration with Delta Lake's cached snapshots for improved performance monitoring

Enhanced UI & User Experience 📊

Alerts Tab Improvements

  • Grouped Alerts: Alerts are now organized by alert type for better visibility and navigation
  • Search Functionality: New description search bar to quickly find specific alerts
  • Spill Selector: New UI component to identify and navigate to operations with data spills
  • Duration/Alert Navigation: Improved button logic for advancing through alerts by duration or index

SQL Flow Enhancements

  • Subquery Differentiation: Better visual differentiation for subqueries in the SQL execution plan
  • Union Support: Improved stage identification algorithm for UNION operations and missing nodes with same-stage neighbors
  • SQL Text Display: Enhanced SQL text rendering and display

JDBC Support

  • JDBC Scan Detection: Better support for JDBC scan operations with dedicated parsing and visualization
  • JDBC Examples: New comprehensive JDBC example demonstrating monitoring capabilities

Telemetry & Analytics 📈

  • Scarf Pixel Integration: Optional telemetry to help monitor OSS usage patterns
    • Can be disabled with spark.dataflint.telemetry.enabled=false flag

Technical Improvements 🔧

Core Enhancements

  • Delta Lake Reflection Utils: New utility classes for Delta Lake introspection and monitoring
  • Delta Table Path Parser: Robust parsing of Delta Lake table paths and identifiers
  • Improved Metrics Processing: Enhanced metric processors for better performance data collection

Bug Fixes

  • Fixed bytesToHumanReadableSize utility to handle comma-separated values correctly
  • Fixed read parser unit tests
  • Improved central snapshot deployment configuration
  • Better handling of missing nodes in stage identification
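
As an illustration of the bytesToHumanReadableSize fix, here is a minimal Python sketch. It assumes the issue was numeric strings that contain thousands separators (for example "1,536"); the function name, unit labels, and rounding are hypothetical and are not taken from DataFlint's code.

```python
def bytes_to_human_readable_size(value):
    """Format a byte count such as 1536 or "1,536" as a short string."""
    if isinstance(value, str):
        # Assumption: commas are thousands separators, so drop them before parsing.
        value = value.replace(",", "")
    size = float(value)
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    index = 0
    # Divide by 1024 until the value fits the current unit or units run out.
    while size >= 1024 and index < len(units) - 1:
        size /= 1024
        index += 1
    return f"{size:.1f} {units[index]}"
```

Stripping the separators before parsing means that integer inputs and comma-formatted string inputs go through the same code path.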

Build & CI/CD

  • Updated CI/CD workflows for improved reliability
  • Enhanced build configuration for Spark 3.x and 4.x compatibility
  • Improved artifact publishing process

📝 Full Changelog

Full Changelog: v0.6.1...v0.7.0

Features

  • Delta Lake instrumentation and monitoring (multiple commits)
  • Alert tab grouping by alert type (bce9c5f)
  • Spill selector component (0051e5c)
  • Description search bar (e9f1a36)
  • Scarf pixel telemetry integration (71d8287)
  • SQL text improvements (#39)
  • Subquery UI differentiation (4ec35fb)
  • JDBC scan support (3e2060f)
  • Full scan table alerts for Delta Lake (b8709cc)
  • Z-Order cache tracking (8718974)

Bug Fixes & Improvements

  • Improved stage identification algorithm for unions (d9e2902)
  • Fixed duration/alert navigation logic (f7b8c16)
  • Fixed read parser unit tests (b49efb7)
  • Fixed bytesToHumanReadableSize comma handling (c7e3695)
  • Improved Delta Lake listener implementation (def2b01)
  • Refactored listener architecture (3edef71)
  • Enhanced Delta Log integration (120a910, 1bc376c)
  • Added more supported SQL plan nodes (6e5b64e)

CI/CD & Build

  • CI improvements (#38)
  • Fixed central snapshot deployment (5e9a966)
  • Updated README and documentation (18810dc, #36)

🙏 Contributors

Special thanks to:

  • @menishmueli - Core development and features
  • @cxzl25 - SQL text improvements and CI enhancements
  • @daniel Aronovich - Documentation updates

📚 Documentation

For detailed usage instructions, see the README.

For Delta Lake instrumentation setup:

spark.conf.set("spark.dataflint.instrument.deltalake", "true")

To disable telemetry:

spark.conf.set("spark.dataflint.telemetry.enabled", "false")
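
The same two settings could also be supplied when the job is submitted. The sketch below builds the corresponding spark-submit `--conf` arguments. It assumes the flags are read like ordinary Spark configuration; the helper name and the `my_job.py` script are hypothetical.

```python
def dataflint_conf_args(instrument_delta_lake=True, telemetry_enabled=False):
    """Build spark-submit --conf arguments for the two DataFlint flags."""
    settings = {
        "spark.dataflint.instrument.deltalake": instrument_delta_lake,
        "spark.dataflint.telemetry.enabled": telemetry_enabled,
    }
    args = []
    for key, value in settings.items():
        # Spark configuration values are strings, so render booleans as true/false.
        args += ["--conf", f"{key}={str(value).lower()}"]
    return args


# my_job.py is a placeholder for your own application script.
print(" ".join(["spark-submit"] + dataflint_conf_args() + ["my_job.py"]))
```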

Version 0.6.1

08 Oct 14:21

  • Fixed POM dependencies for users who import DataFlint using Maven
  • Improved Delta Lake support: inserts, optimized writes, and the optimize command

Version 0.6.0

21 Sep 16:16

2 major changes:

Spark 4 Support ⭐
DataFlint now supports Spark 4! Due to breaking changes in Spark, Spark 4 requires a different DataFlint artifact name:
io.dataflint:dataflint_spark4_2.13:0.6.0
(instead of the artifact used for Spark 3)

Stage identification by stages with statistics 📊
This means better DataFlint support for Spark accelerators such as the NVIDIA RAPIDS Accelerator for Apache Spark.
DataFlint can now also identify SQL node stages from metrics with statistics, which name the stage and task that the min/median/max values came from.
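
To illustrate the idea, here is a minimal Python sketch. It assumes the metric value has the form that Spark SQL metrics typically display, where the max value is followed by "(stage <id>.<attempt>: task <id>)". The sample string and the helper name are hypothetical, and this is not DataFlint's actual implementation.

```python
import re

# Matches the "(stage 3.0: task 12)" suffix that follows the max value.
STAGE_TASK = re.compile(r"\(stage (\d+)\.(\d+): task (\d+)\)")


def stage_from_metric(metric_value):
    """Return the stage, attempt, and task named in a metric string, or None."""
    match = STAGE_TASK.search(metric_value)
    if match is None:
        return None
    stage_id, attempt, task_id = (int(group) for group in match.groups())
    return {"stage_id": stage_id, "attempt": attempt, "task_id": task_id}


# Hypothetical metric value in the assumed format.
sample = (
    "total (min, med, max (stageId: taskId))\n"
    "1.5 s (100 ms, 200 ms, 500 ms (stage 3.0: task 12))"
)
print(stage_from_metric(sample))
```

A node whose metrics carry no such suffix yields None, so a caller can fall back to another identification method.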

v0.5.1

01 Sep 19:40

🚀 New Features

  • Enhanced Navigation & User Experience
    • Added 2 new navigation icons for quick access to:
      • Slowest node in the execution plan
      • Nodes with alerts/issues
    • Added visual indicators in the minimap showing where alerts are located
    • Added color coding for minimap based on node performance state
  • Improved Node Support
    • Added support for "ShuffledHashJoin" node (with "d" suffix as used in Databricks)
    • Added support for search window nodes that don't have associated stages (common in Databricks window functions outside codegen cluster nodes)

🎨 Visual & UI Improvements

  • Node Visualization Enhancements
    • Unified color system for all nodes to ensure consistency
    • Tuned node performance colors for better visual clarity
    • Increased metrics text font size in nodes for improved readability
    • Added color coding for minimap based on node state
  • Stage Distribution Improvements
    • Extended shuffle read/write metrics to all stage distributions (not limited to exchange nodes)
    • Fixed ordering of read/write nodes in exchange stages for better logical flow

⚡ Performance Optimizations

  • Real-time Performance
    • Significant performance improvements for SQL plan visualization during real-time execution with live updates
    • Better aggregation of task attempts for more accurate node duration calculations
    • Added capping mechanism for codegen duration that exceeds stage duration
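
The capping rule for codegen duration can be sketched as follows. This assumes both durations are in milliseconds and that the cap is simply the stage duration; the function name is hypothetical and the real logic may differ.

```python
def capped_codegen_duration_ms(codegen_duration_ms, stage_duration_ms):
    """Never report a codegen duration longer than the stage that contains it."""
    return min(codegen_duration_ms, stage_duration_ms)


# A reported codegen time of 12 s inside a 10 s stage is capped at 10 s.
print(capped_codegen_duration_ms(12_000, 10_000))
```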

πŸ› Bug Fixes & Error Handling

  • Connection & Error Management
    • Added proper error messaging for "server disconnected" modal
    • Improved error handling and user feedback

🔧 Technical Improvements

  • Code Quality & Maintenance
    • Standardized color management system across all node types
    • Enhanced metrics calculation accuracy
    • Improved real-time data processing efficiency

Full Changelog: v0.5.0...v0.5.1

Version 0.5.0

24 Aug 05:22

New and updated design for the SQL plan nodes

Version 0.4.4

13 Aug 11:52

What's Changed

  • Adding support for Expand nodes, and calculating the expand ratio
  • Improving aggregation node names - Aggregate (partial_count) is now Count Within Partition
  • Projecting (i.e. selecting) all fields is shown as * (instead of empty text in Spark UI)
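
For intuition, an expand ratio for an Expand node can be computed from its row counts. The sketch below assumes the ratio is output rows divided by input rows, which is an interpretation and not a definition confirmed by these notes; the function name is hypothetical.

```python
def expand_ratio(input_rows, output_rows):
    """Rows produced per input row, or None when there is no input."""
    if input_rows == 0:
        return None
    return output_rows / input_rows


# An Expand node that turns 100 input rows into 300 output rows.
print(expand_ratio(100, 300))
```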

Full Changelog: v0.4.3...v0.4.4

Version 0.4.3

19 Jul 21:10

Added query URL support for sql_id and node_ids

Version 0.4.2

30 Jun 12:54

  • Support Z-Order-based pruning metrics in Databricks
  • Showing Python UDF function name in filter/select nodes
  • Add support for Generate (explode, inline, etc.) node

Version 0.4.1

03 Jun 07:07

  • Support DBR 15/16

Version 0.4.0

27 May 18:38

  • Support map by pandas and arrow functions
  • Added a new flag to silence alerts for a job:
    spark.dataflint.alert.disabled
    It accepts a comma-separated list of alerts, such as:
    smallTasks,idleCoresTooHigh
  • Added short recommendation on top of alert
  • Updated DataFlint logo
  • Better stage identification for various readers
  • Shows stage failures with an orange V on the SQL node, plus a list of complete stage failures
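
As a small illustration of the spark.dataflint.alert.disabled flag, the sketch below builds its value from a list of alert names. The helper name is hypothetical; the two alert names are the ones given as examples in these notes.

```python
def disabled_alerts_value(alert_names):
    """Join alert names into the comma-separated form the flag accepts."""
    cleaned = [name.strip() for name in alert_names if name.strip()]
    return ",".join(cleaned)


conf_key = "spark.dataflint.alert.disabled"
conf_value = disabled_alerts_value(["smallTasks", "idleCoresTooHigh"])
# The resulting pair can be passed as a spark-submit --conf argument.
print(f"--conf {conf_key}={conf_value}")
```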