Skip to content

Conversation

@acul71
Copy link
Contributor

@acul71 acul71 commented Nov 10, 2025

Problem

The test test_kbucket_split_behavior in tests/core/kad_dht/test_unit_routing_table.py was failing intermittently in CI/CD, as reported in discussion #1023 and issue #1032.

Root Cause

The mock_host fixture in the TestRoutingTable class didn't properly mock host.new_stream(). When a bucket is full, the KBucket.add_peer() method tries to ping the oldest peer using _ping_peer(), which calls host.new_stream(). Without proper mocking, the behavior was unpredictable, causing intermittent test failures.

Solution

Mock host.new_stream() in the TestRoutingTable.mock_host fixture to raise an exception, simulating unresponsive peers. This makes the test deterministic and matches real production behavior where new_stream() fails for unreachable peers.

Changes

  • Added host.new_stream = AsyncMock(side_effect=Exception("Mock stream error")) to the mock_host fixture
  • This ensures ping operations fail consistently, allowing peers to be added when buckets are full
  • Matches real production behavior where unresponsive peers are replaced

Testing

  • ✅ Test passes consistently (verified with multiple runs)
  • ✅ All other tests in the file still pass (16/16)
  • ✅ No linting errors

Related

scacaca and others added 8 commits September 25, 2025 16:07
add checking peer_id in indentify
- Add peer ID validation to prevent peer ID spoofing attacks
- Reject signed peer records where peer ID doesn't match sender
- Optimize peer ID comparison (direct comparison instead of string conversion)
- Add comprehensive test for peer ID validation
- Update docstrings with security information
- Add newsfragment for issue #958

This addresses CVE-2023-40583 equivalent vulnerability.
Fix flaky test by ensuring mock_host.new_stream raises an exception
so that ping operations fail consistently. This allows peers to be
added when buckets are full, which is necessary for the test to pass.

The test was failing intermittently in CI/CD because the ping behavior
was unpredictable when new_stream was not properly mocked.

This simulates real production behavior where new_stream() fails for
unreachable peers, allowing the routing table to replace unresponsive
peers as per Kademlia protocol specifications.

Fixes #1032
@acul71
Copy link
Contributor Author

acul71 commented Nov 10, 2025

Hi @seetadev and @sumanjeet0012,

I've investigated the flaky test issue you mentioned in discussion #1023 regarding test_kbucket_split_behavior failing intermittently in CI/CD.

Root Cause

The test was flaky because the mock_host fixture in TestRoutingTable didn't properly mock host.new_stream(). When a bucket is full, the code tries to ping the oldest peer, and without proper mocking, the behavior was unpredictable.

Solution

I've created a fix that mocks new_stream() to raise an exception, simulating unresponsive peers. This:

  • Makes the test deterministic and reliable
  • Matches real production behavior where new_stream() fails for unreachable peers
  • Allows the routing table to correctly replace unresponsive peers (as per Kademlia protocol)

PR Created

I've opened PR #1033 with the fix. The test now passes consistently, and all other tests remain unaffected.

Could you please review when you have a chance? The fix is minimal and focused on making the test more reliable.

Thanks!

@seetadev
Copy link
Contributor

Excellent work, @acul71 👏

This is a clean, well-reasoned fix to one of the more persistent CI/CD issues we’ve been seeing with test_kbucket_split_behavior. Your analysis of the root cause — particularly the insight that mock_host.new_stream() wasn’t properly mocked and was introducing nondeterministic behavior — is spot on.

The solution of explicitly mocking new_stream() to raise an exception makes perfect sense. It brings the test behavior closer to real-world scenarios, where peers can be unreachable and ping attempts fail. This approach not only stabilizes the test but also improves the accuracy of how the routing table logic is validated against actual Kademlia behavior.

It’s great to see that you verified consistency across multiple runs and ensured that all other tests remained unaffected.

Thanks for taking the time to dig into the underlying issue and provide a production-aligned fix. 🚀

Looks ready for merge.

@seetadev seetadev merged commit e9d0959 into main Nov 11, 2025
38 checks passed
@seetadev seetadev deleted the fix/kad-dht-kbucket-split-test branch November 11, 2025 21:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fix flaky test: test_kbucket_split_behavior in kad_dht routing table

4 participants