Add doc for how to run nds power over Spark Connect #233
Conversation
Signed-off-by: Bobby Wang <[email protected]>
Hi @tgravescs @jihoonson @gerashegalov @eordentlich, please help review this PR, thanks very much.
Greptile Overview
Greptile Summary
Added documentation section for running NDS Power Run over Spark Connect (Spark 4.0.0+), including installation instructions for pyspark-client and two execution methods: direct command-line execution and notebook API usage.
Key additions:
- Installation steps for `pyspark-client==4.0.0`
- Command-line execution example using the `SPARK_REMOTE` environment variable
- Notebook API usage example with the `gen_sql_from_stream` and `run_query_stream` functions
- Important note that the python listener is disabled in the Spark Connect environment (py4j is unavailable)
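As a sketch of what a stream-splitting step like `gen_sql_from_stream` does, here is a hypothetical, self-contained parser that splits a TPC-DS-style query stream file into a query-name-to-SQL dict. The marker format and function name are assumptions for illustration, not the actual NDS implementation.

```python
# Hypothetical sketch, not the actual nds_power.py code: split a dsqgen-style
# query stream file into {template_name: sql_text} using the "-- start query"
# / "-- end query" marker lines.
import re

def parse_query_stream(text):
    """Split a query stream on '-- start query' header lines into a dict."""
    queries = {}
    name = None
    for line in text.splitlines():
        m = re.match(r"^-- start query \d+ in stream \d+ using template (\S+)", line)
        if m:
            name = m.group(1)
            queries[name] = []
        elif name and not line.startswith("-- end query"):
            queries[name].append(line)
    return {k: "\n".join(v).strip() for k, v in queries.items()}

sample = """-- start query 1 in stream 0 using template query96.tpl
select count(*) from store_sales;
-- end query 1 in stream 0 using template query96.tpl"""
print(parse_query_stream(sample))  # → {'query96.tpl': 'select count(*) from store_sales;'}
```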
Confidence Score: 5/5
- This PR is safe to merge with no risk - documentation-only change
- Documentation-only PR that adds clear instructions for running NDS Power Run over Spark Connect. All changes are isolated to README.md, properly formatted, technically accurate, and aligned with the existing codebase implementation (PysparkBenchReport.py). The note about disabled python listener correctly reflects the code behavior.
- No files require special attention
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| nds/README.md | 5/5 | Added comprehensive Spark Connect documentation with setup instructions and python listener note |
Sequence Diagram
```mermaid
sequenceDiagram
    participant User
    participant LocalEnv as Local Environment
    participant PySparkClient as PySpark Client
    participant SparkConnect as Spark Connect Server
    participant SparkCluster as Spark Cluster
    User->>LocalEnv: Install pyspark-client==4.0.0
    User->>LocalEnv: Set SPARK_REMOTE=sc://localhost
    alt Command-line execution
        User->>LocalEnv: python nds_power.py [args]
        LocalEnv->>PySparkClient: Initialize PySpark session
        PySparkClient->>SparkConnect: Connect via SPARK_REMOTE
        SparkConnect->>SparkCluster: Execute queries
        SparkCluster-->>SparkConnect: Query results
        SparkConnect-->>PySparkClient: Return results
        PySparkClient-->>LocalEnv: Write time.csv
        Note over PySparkClient,SparkConnect: Python listener disabled (no py4j)
    else Notebook API execution
        User->>LocalEnv: Import nds_power APIs
        LocalEnv->>PySparkClient: gen_sql_from_stream()
        LocalEnv->>PySparkClient: run_query_stream()
        PySparkClient->>SparkConnect: Connect and execute
        SparkConnect->>SparkCluster: Run queries
        SparkCluster-->>SparkConnect: Results
        SparkConnect-->>PySparkClient: Return data
        PySparkClient-->>LocalEnv: Save results
    end
```
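The "Write time.csv" step in the diagram can be sketched as timing each query and emitting a CSV report. The column names and helper functions below are illustrative stand-ins, not the actual `nds_power.py` code.

```python
# Hypothetical sketch of per-query timing and the time.csv report step.
import csv
import io
import time

def time_query(run_fn):
    """Run a query callable and return its wall-clock time in milliseconds."""
    start = time.perf_counter()
    run_fn()
    return (time.perf_counter() - start) * 1000.0

def write_time_report(results, fh):
    """Write (query_name, elapsed_ms) pairs as a CSV report."""
    writer = csv.writer(fh)
    writer.writerow(["query", "elapsed_ms"])
    for name, elapsed in results:
        writer.writerow([name, round(elapsed, 3)])

elapsed = time_query(lambda: sum(range(1000)))  # stand-in for a real Spark query
buf = io.StringIO()
write_time_report([("query96", elapsed)], buf)
print(buf.getvalue())
```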
1 file reviewed, no comments
jihoonson left a comment
Looks nice, left a minor suggestion.
nds/README.md (Outdated)

    --output_format parquet
    ```
    ### Power Run over Spark Connect
Can you add a subsection at Line 347 titled "Power Run with spark-submit"? I'd also add some introduction like below:
Users can use the `spark-submit-template` script to run the power run with spark-submit. An example command to submit nds_power.py by spark-submit-template utility is: ...
Sounds good. Done.
Co-authored-by: Gera Shegalov <[email protected]>
Greptile Overview
Greptile Summary
Added comprehensive documentation for running NDS Power benchmarks over Spark Connect. The documentation includes installation prerequisites, CLI usage examples, notebook API usage examples, and an important note about the python listener being disabled in Spark Connect environments.
Confidence Score: 5/5
Important Files Changed
File Analysis
Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant README as README.md Documentation
    participant pyspark_client as pyspark-client (local)
    participant nds_power as nds_power.py
    participant SparkConnect as Spark Connect Server
    participant SparkSession
    Note over User,SparkConnect: Setup Phase
    User->>pyspark_client: pip install pyspark-client==4.0.0
    User->>User: export SPARK_REMOTE=sc://localhost
    Note over User,SparkSession: Execution Phase (CLI)
    User->>nds_power: python nds_power.py parquet_sf3k query_0.sql time.csv
    nds_power->>SparkConnect: Connect via SPARK_REMOTE
    SparkConnect->>SparkSession: Create SparkSession
    nds_power->>nds_power: Check is_remote_only() == True
    nds_power->>nds_power: Skip python listener registration
    nds_power->>SparkSession: Execute queries via Spark Connect
    SparkSession-->>nds_power: Query results
    nds_power-->>User: time.csv with execution times
    Note over User,SparkSession: Execution Phase (Notebook API)
    User->>nds_power: from nds_power import gen_sql_from_stream, run_query_stream
    User->>nds_power: gen_sql_from_stream(query_stream_file)
    nds_power-->>User: query_dict
    User->>nds_power: run_query_stream(input_prefix, query_dict, ...)
    nds_power->>SparkConnect: Connect via SPARK_REMOTE
    SparkConnect->>SparkSession: Create SparkSession
    nds_power->>nds_power: Check is_remote_only() == True
    nds_power->>nds_power: Skip python listener registration
    nds_power->>SparkSession: Execute queries via Spark Connect
    SparkSession-->>nds_power: Query results
    nds_power-->>User: Execution complete with time logs
```
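The "Skip python listener registration" step in the diagram can be sketched as gating the py4j-based listener on whether a Spark Connect remote is configured. The `is_remote_only` helper below is an illustrative stand-in for whatever check `nds_power.py` actually performs.

```python
# Self-contained sketch of listener gating under Spark Connect; the env-var
# check is illustrative, not the actual nds_power.py implementation.
def is_remote_only(env):
    """True when SPARK_REMOTE points at a Spark Connect server (no py4j gateway)."""
    return bool(env.get("SPARK_REMOTE"))

def register_listener(env):
    """Register the python listener only when a py4j JVM gateway is available."""
    if is_remote_only(env):
        return "python listener disabled (py4j unavailable over Spark Connect)"
    return "python listener registered via py4j"

print(register_listener({"SPARK_REMOTE": "sc://localhost"}))
print(register_listener({}))
```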
1 file reviewed, no comments
    ```
    `Note:` the python listener is disabled when running nds_power.py over Spark Connect, as py4j
    is not available in the Spark Connect environment.
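One way a script could feature-detect py4j before registering a JVM-side listener is an importlib check, since pyspark-client ships without a JVM gateway. This is a sketch, not the actual gating used by `nds_power.py`.

```python
# Illustrative feature-detection for py4j availability; result depends on the
# local environment, so no specific output is asserted here.
import importlib.util

def py4j_available():
    """Return True only if the py4j package can be imported."""
    return importlib.util.find_spec("py4j") is not None

mode = "listener enabled" if py4j_available() else "listener disabled"
print(mode)
```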
It would be nice to say what this affects. If it affects the reporting of task failures, I think that would be good to add here.
This PR adds a section to the README on how to run NDS Power over Spark Connect, and adds a note that the python listener is disabled for now.
To fix the review comments