Releases: dataflint/spark
Version 0.7.0
Release Notes - v0.7.0
Release Date: December 1, 2025
What's New
Delta Lake Instrumentation
This release introduces comprehensive Delta Lake monitoring and instrumentation capabilities:
- Delta Lake Table Monitoring: New `spark.dataflint.instrument.deltalake` configuration flag to enable Delta Lake-specific instrumentation
- Delta Lake Scan Page: New dedicated UI page showing Delta Lake scan operations and metrics
- Full Table Scan Detection: Automatic alerts for full table scans on Delta Lake tables to help identify performance issues
- Z-Order Cache Tracking: Monitor Z-Order optimization cache usage in table properties
- Delta Log Integration: Direct integration with Delta Lake's cached snapshots for improved performance monitoring
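The Delta Lake instrumentation above is opt-in. A minimal sketch of enabling it at submit time, using the flag named in these notes (the application jar path is a placeholder):

```shell
# Enable DataFlint's Delta Lake instrumentation for a single run.
# Flag name taken from this release's notes; jar path is a placeholder.
spark-submit \
  --conf spark.dataflint.instrument.deltalake=true \
  your-app.jar
```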
Enhanced UI & User Experience
Alerts Tab Improvements
- Grouped Alerts: Alerts are now organized by alert type for better visibility and navigation
- Search Functionality: New description search bar to quickly find specific alerts
- Spill Selector: New UI component to identify and navigate to operations with data spills
- Duration/Alert Navigation: Improved button logic for advancing through alerts by duration or index
SQL Flow Enhancements
- Subquery Differentiation: Better visual differentiation for subqueries in the SQL execution plan
- Union Support: Improved stage identification algorithm for UNION operations and missing nodes with same-stage neighbors
- SQL Text Display: Enhanced SQL text rendering and display
JDBC Support
- JDBC Scan Detection: Better support for JDBC scan operations with dedicated parsing and visualization
- JDBC Examples: New comprehensive JDBC example demonstrating monitoring capabilities
Telemetry & Analytics
- Scarf Pixel Integration: Optional telemetry to help monitor OSS usage patterns
- Can be disabled with the `spark.dataflint.telemetry.enabled=false` flag
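Telemetry is on by default; a sketch of opting out at submit time with the flag named above (the application jar path is a placeholder):

```shell
# Opt out of Scarf pixel telemetry for this run.
# Flag name taken from this release's notes; jar path is a placeholder.
spark-submit \
  --conf spark.dataflint.telemetry.enabled=false \
  your-app.jar
```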
Technical Improvements
Core Enhancements
- Delta Lake Reflection Utils: New utility classes for Delta Lake introspection and monitoring
- Delta Table Path Parser: Robust parsing of Delta Lake table paths and identifiers
- Improved Metrics Processing: Enhanced metric processors for better performance data collection
Bug Fixes
- Fixed `bytesToHumanReadableSize` utility to handle comma-separated values correctly
- Fixed read parser unit tests
- Improved central snapshot deployment configuration
- Better handling of missing nodes in stage identification
Build & CI/CD
- Updated CI/CD workflows for improved reliability
- Enhanced build configuration for Spark 3.x and 4.x compatibility
- Improved artifact publishing process
Full Changelog
Full Changelog: v0.6.1...v0.7.0
Features
- Delta Lake instrumentation and monitoring (multiple commits)
- Alert tab grouping by alert type (bce9c5f)
- Spill selector component (0051e5c)
- Description search bar (e9f1a36)
- Scarf pixel telemetry integration (71d8287)
- SQL text improvements (#39)
- Subquery UI differentiation (4ec35fb)
- JDBC scan support (3e2060f)
- Full scan table alerts for Delta Lake (b8709cc)
- Z-Order cache tracking (8718974)
Bug Fixes & Improvements
- Improved stage identification algorithm for unions (d9e2902)
- Fixed duration/alert navigation logic (f7b8c16)
- Fixed read parser unit tests (b49efb7)
- Fixed bytesToHumanReadableSize comma handling (c7e3695)
- Improved Delta Lake listener implementation (def2b01)
- Refactored listener architecture (3edef71)
- Enhanced Delta Log integration (120a910, 1bc376c)
- Added more supported SQL plan nodes (6e5b64e)
CI/CD & Build
- CI improvements (#38)
- Fixed central snapshot deployment (5e9a966)
- Updated README and documentation (18810dc, #36)
Contributors
Special thanks to:
- @menishmueli - Core development and features
- @cxzl25 - SQL text improvements and CI enhancements
- Daniel Aronovich - Documentation updates
Documentation
For detailed usage instructions, see the README.
For Delta Lake instrumentation setup:
```
spark.conf.set("spark.dataflint.instrument.deltalake", "true")
```
To disable telemetry:
```
spark.conf.set("spark.dataflint.telemetry.enabled", "false")
```
Version 0.6.1
- Fix POM dependencies for users who import DataFlint via Maven
- Improved Delta Lake support: inserts, optimized writes, and the optimize command
Version 0.6.0
2 major changes:
Spark 4 Support
DataFlint now supports Spark 4! Due to breaking changes in Spark, it requires a different DataFlint artifact name:
io.dataflint:dataflint_spark4_2.13:0.6.0
(instead of the regular DataFlint artifact used for Spark 3)
Stage identification by stages with statistics
DataFlint can now also identify SQL node stages via metrics with statistics that mention the stage and task data the min/median/max values came from. This means better DataFlint support for Spark accelerators such as NVIDIA RAPIDS for Spark.
Version 0.5.1
New Features
- Enhanced Navigation & User Experience
- Added 2 new navigation icons for quick access to:
- Slowest node in the execution plan
- Nodes with alerts/issues
- Added visual indicators in the minimap showing where alerts are located
- Added color coding for minimap based on node performance state
- Improved Node Support
- Added support for "ShuffledHashJoin" node (with "d" suffix as used in Databricks)
- Added support for search window nodes that don't have associated stages (common in Databricks window functions outside codegen cluster nodes)
Visual & UI Improvements
- Node Visualization Enhancements
- Unified color system for all nodes to ensure consistency
- Tuned node performance colors for better visual clarity
- Increased metrics text font size in nodes for improved readability
- Added color coding for minimap based on node state
- Stage Distribution Improvements
- Extended shuffle read/write metrics to all stage distributions (not limited to exchange nodes)
- Fixed ordering of read/write nodes in exchange stages for better logical flow
Performance Optimizations
- Real-time Performance
- Significant performance improvements for SQL plan visualization during real-time execution with live updates
- Better aggregation of task attempts for more accurate node duration calculations
- Added capping mechanism for codegen duration that exceeds stage duration
Bug Fixes & Error Handling
- Connection & Error Management
- Added proper error messaging for "server disconnected" modal
- Improved error handling and user feedback
Technical Improvements
- Code Quality & Maintenance
- Standardized color management system across all node types
- Enhanced metrics calculation accuracy
- Improved real-time data processing efficiency
Full Changelog: v0.5.0...v0.5.1
Version 0.5.0
New and updated design for the sql plan nodes
Version 0.4.4
What's Changed
- Adding support for Expand nodes, and calculating the expand ratio
- Improving aggregation node names - Aggregate (partial_count) is now Count Within Partition
- Projecting (i.e. selecting) all fields is shown as * (instead of empty text in Spark UI)
Full Changelog: v0.4.3...v0.4.4
Version 0.4.3
Added query url support for sql_id and node_ids
Version 0.4.2
- Support zorder based pruning metrics in databricks
- Showing python UDF function name in filter/select nodes
- Add support for Generate (explode, inline, etc) node
Version 0.4.1
- Support DBR 15/16
Version 0.4.0
- Support map by pandas and arrow functions
- Added a new flag to silence alerts for a job: `spark.dataflint.alert.disabled`, which accepts a comma-separated list of alerts such as `smallTasks,idleCoresTooHigh`
- Added a short recommendation on top of each alert
- Updated DataFlint logo
- Support better stage identification for various readers
- Shows stage failures with an orange V on the SQL node and a list of complete stage failures
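As a sketch, the alert-silencing flag described above could be passed at submit time like this (alert names taken from these notes; the application jar path is a placeholder):

```shell
# Silence the smallTasks and idleCoresTooHigh alerts for this job.
# Flag and alert names taken from these notes; jar path is a placeholder.
spark-submit \
  --conf spark.dataflint.alert.disabled=smallTasks,idleCoresTooHigh \
  your-app.jar
```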