Conversation

@oracleloyall (Contributor)
In resource-constrained environments, 10 TB-scale workloads run with 8-way parallelism in Cloudberry may encounter specific anomalies in Motion-layer UDP communication. Below are four key scenarios and how the code modifications address them.

Four Anomaly Scenarios

  1. Capacity Mismatch:
    The receiving end’s buffer becomes full, but the sender is unaware. As a result, the sender keeps replaying its unacknowledged packet queue, causing unnecessary retransmissions and packet drops (see the sketch after the example plan below).

  2. False Deadlock Detection:
    The peer node processes heartbeat packets but fails to free up buffer capacity. This triggers a false deadlock judgment, incorrectly flagging network anomalies.

  3. Unprocessed Packets Require Main Thread Wakeup:
    When the receive queue is full, incoming data packets are discarded. However, the main thread still needs to be awakened to process backlogged packets in the queue, preventing permanent stalling.

  4. Execution Time Mismatch Across Nodes:
    Issues like data skew, computational performance gaps, or I/O bottlenecks cause significant differences in execution time between nodes. For example, in a hash join, if the inner table’s hash table is not yet built, the node cannot consume data from other nodes, leading to packet timeouts.
    Example Plan: packets sent through the Redistribute Motion below time out because the hash table in the receiving slice is still not ready, blocking packet processing.

    ->  Redistribute Motion 96:96  (slice10; segments: 96)  (cost=0.00..27670.24 rows=791688 width=14)
          Hash Key: web_sales.ws_item_sk
          ->  Hash Semi Join  (cost=0.00..27635.55 rows=791688 width=14)
                Hash Cond: (web_sales.ws_bill_customer_sk = share2_ref3.c_customer_sk)
                ->  Redistribute Motion 96:96  (slice11; segments: 96)  (cost=0.00..26675.88 rows=1994891 width=18)
                      Hash Key: web_sales.ws_bill_customer_sk
                      ->  Hash Join  (cost=0.00..26563.48 rows=1994891 width=18)
                            Hash Cond: (web_sales.ws_sold_date_sk = date_dim_3.d_date_sk)
                            ->  Seq Scan on web_sales  (cost=0.00..8887.20 rows=74999595 width=22)
                            ->  Hash  (cost=441.83..441.83 rows=49 width=4)
                                  ->  Seq Scan on date_dim date_dim_3  (cost=0.00..441.83 rows=49 width=4)
                                        Filter: ((d_year = 1999) AND (d_moy = 2))
                ->  Hash  (cost=436.83..436.83 rows=605155 width=4)
                      ->  Shared Scan (share slice:id 10:2)  (cost=0.00..436.83 rows=605155 width=4)
    ->  Hash  (cost=5289.15..5289.15 rows=3243 width=4)
          ->  HashAggregate  (cost=0.00..5289.15 rows=3243 width=4)
                Group Key: share0_ref3.i_item_sk
                ->  Shared Scan (share slice:id 7:0)  (cost=0.00..791.01 rows=37345054 width=4)
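For scenario 1, the following is a minimal, self-contained C sketch (not the actual Motion-layer code) of why the sender keeps retransmitting: the queue type, capacity, and function names are hypothetical, chosen only to illustrate that a receiver with no way to signal fullness leaves the sender replaying its unacknowledged queue on every timeout.

```c
#include <stdbool.h>
#include <stdio.h>

#define RX_QUEUE_CAP 4			/* hypothetical per-connection receive queue size */

/* Toy model of the receiver's fixed-capacity packet queue. */
typedef struct RxQueue
{
	int			count;
	int			capacity;
} RxQueue;

/*
 * Receiver side: with no feedback path, a full queue can only drop the
 * packet silently; the sender never learns that capacity ran out.
 */
static bool
rx_accept_packet(RxQueue *q, int seq)
{
	if (q->count >= q->capacity)
	{
		printf("receiver: queue full, dropping pkt %d (sender not told)\n", seq);
		return false;
	}
	q->count++;
	printf("receiver: queued pkt %d (%d/%d)\n", seq, q->count, q->capacity);
	return true;
}

int
main(void)
{
	RxQueue		q = {0, RX_QUEUE_CAP};

	/*
	 * Sender side: on every timeout it resends everything still sitting in
	 * its unacknowledged queue, because no ACK ever said "I am full".
	 */
	for (int round = 0; round < 2; round++)
		for (int seq = 1; seq <= 6; seq++)
			rx_accept_packet(&q, seq);

	return 0;
}
```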

Code Modifications and Their Impact

The code changes target the above scenarios by enhancing UDP communication feedback, adjusting deadlock checks, and ensuring proper thread wakeup. Key modifications (a simplified sketch of the combined behavior follows the list):

  1. Addressing Capacity Mismatch:

    • Added a new flag value (256) that marks the receive buffer as full.
    • When the receive queue is full, the receiver now sends a response carrying this flag back to the sender. This notifies the sender to pause or adjust transmission, preventing blind retransmissions.
  2. Fixing False Deadlock Detection:

    • Modified the deadlock-check routine to take an additional parameter, enabling ACK polling during deadlock checks.
    • Extended the initial timeout for deadlock suspicion to 600 seconds, reducing premature network error reports.
    • If no response arrives after 600 seconds, the buffer capacity is incrementally increased to alleviate false bottlenecks, and detailed state is logged before an error is finally raised.
  3. Ensuring Main Thread Wakeup on Full Queue:

    • Even when a packet is dropped because the receive queue is full, the main thread is still awakened if the packet matches the query/node/route it is waiting on, ensuring backlogged packets already in the queue get processed.
  4. Mitigating Node Execution Mismatches:

    • Added logging for retransmissions after a threshold number of attempts, giving visibility into prolonged packet delays (e.g., a receiver whose hash table is not yet built).
    • Reset the retry counter after a successful ACK poll, preventing excessive retry counts from triggering false timeouts.
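To show how these changes could fit together, here is a minimal, self-contained C sketch, not the actual Motion-layer code: the flag name ACK_FLAG_RX_FULL, the Ack struct, and both functions are hypothetical stand-ins; only the flag value 256, the pause-on-full behavior, the wakeup-on-drop, and the retry reset come from the description above.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical flag bit (value 256) carried in the acknowledgment when the
 * receive queue is full; the actual flag, struct, and function names in the
 * patch differ.
 */
#define ACK_FLAG_RX_FULL	0x100

typedef struct Ack
{
	int			flags;
	int			seq;
} Ack;

/*
 * Receiver: always answer, and set the "full" bit instead of staying silent.
 * Even when the incoming packet has to be dropped, the main thread is still
 * woken so it can drain the backlog already sitting in the queue.
 */
static Ack
rx_handle_packet(int seq, int queued, int capacity, bool *wake_main_thread)
{
	Ack			ack = {0, seq};

	if (queued >= capacity)
	{
		ack.flags |= ACK_FLAG_RX_FULL;
		*wake_main_thread = true;
	}
	return ack;
}

/*
 * Sender: pause retransmission of the unacked queue while the peer reports
 * itself full; reset the retry counter once a clean ACK comes back.
 */
static void
tx_handle_ack(const Ack *ack, bool *paused, int *retry_count)
{
	if (ack->flags & ACK_FLAG_RX_FULL)
		*paused = true;
	else
	{
		*paused = false;
		*retry_count = 0;
	}
}

int
main(void)
{
	bool		wake = false;
	bool		paused = false;
	int			retries = 7;

	/* Queue is at capacity (8/8): the ACK carries the "full" flag. */
	Ack			ack = rx_handle_packet(42, 8, 8, &wake);

	tx_handle_ack(&ack, &paused, &retries);
	printf("wake main thread: %d, sender paused: %d, retries: %d\n",
		   (int) wake, paused ? 1 : 0, retries);
	return 0;
}
```

The underlying design choice is that back-pressure is signalled explicitly through the acknowledgment path instead of being inferred from silence, which is what left the original retransmission loop blind.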


oracleloyall changed the title from "Fix: tcpcds high concurrency caused UDP Hung" to "Fix: tcpds high concurrency caused UDP Hung" on Nov 13, 2025
oracleloyall force-pushed the main branch 2 times, most recently from d5fbbf8 to 844c091 on November 13, 2025 07:05
my-ship-it merged commit acec354 into apache:main on Nov 18, 2025 (28 checks passed).