Skip to content

Conversation

@codemaestro64
Copy link

What this PR does

Adds Python (py-libp2p v0.2.9) to the existing hole-punch-interop test suite.

Implemented

  • /libp2p/dcutr/0.1.0 protocol
  • CONNECTSYNC message exchange (JSON)
  • Observed-address exchange (host.get_network().get_addrs())
  • Simultaneous direct dial (100 ms window)
  • Verification – ping RTT < 100 ms → direct connection, otherwise fallback to relay
  • Bidirectional – Python initiator ↔ Go receiver and Go initiator ↔ Python receiver
  • NAT simulation via Docker + iptables masquerade (entrypoint)

Output

"python-v0.2.9 x go-v0.42 (dcutr,tcp,noise)",success
"go-v0.42 x python-v0.2.9 (dcutr,tcp,noise)",success

@dhuseby dhuseby added the hole-punching Related to hole punching interop label Nov 6, 2025
@seetadev seetadev marked this pull request as ready for review November 9, 2025 21:00
@sumanjeet0012
Copy link
Contributor

@codemaestro64 The CI/CD pipeline is failing due to a minor issue.

From the logs, we can see the following:

Dockerfile:6

4 | COPY pyproject.toml .
5 | RUN pip install --no-cache-dir -e .
6 | >>> COPY hole_punch_test.py .
7 | VOLUME /results
8 | CMD ["python", "hole_punch_test.py"]

ERROR: failed to build: failed to solve: failed to compute cache key: failed to calculate checksum of ref 06a078c6-7ffc-447f-b88c-6fd49120bdc5::7hh376udkbb2bncz241saf56a: "/hole_punch_test.py": not found

The issue is that the actual file name is 'hole_punch.py', but the Dockerfile references 'hole_punch_test.py'. This mismatch is causing the build error.

To resolve this, you can either:

  1. Rename the file to hole_punch_test.py, or
  2. Update the Dockerfile to use hole_punch.py.

The incorrect references in the Dockerfile are:

  • COPY hole_punch_test.py .
  • CMD ["python", "hole_punch_test.py"]

@acul71
Copy link
Contributor

acul71 commented Nov 24, 2025

@sumanjeet0012 @codemaestro64

Python Implementation Missing Features - Hole Punch Interop

Overview

This document analyzes what the Python implementation in hole-punch-interop/impl/python/v0.4.0 is missing compared to the requirements specified in hole-punch-interop/README.md and the expected test flow.

Critical Missing Features

1. Binary Name Requirement

Status: ❌ MISSING

  • Requirement: Docker containers MUST have a binary called hole-punch-client in their $PATH
  • Current State: The Python implementation runs python hole_punch.py directly via start.sh
  • Fix Needed: Create a wrapper script or symlink named hole-punch-client that executes the Python script

2. Redis Integration

Status: ❌ MISSING

  • Requirement: Clients must connect to Redis at redis:6379 and:
    • Block until RELAY_TCP_ADDRESS or RELAY_QUIC_ADDRESS is available
    • Listener must push its PeerId to LISTEN_CLIENT_PEER_ID after making a reservation
    • Dialer must block on LISTEN_CLIENT_PEER_ID availability
  • Current State: Python code expects RELAY_MULTIADDR and TARGET_ID environment variables directly, completely bypassing Redis
  • Fix Needed: Implement Redis client using redis Python library with blocking operations (blpop with timeout 0)

3. Test Flow Implementation

Status: ❌ INCORRECT

The Python implementation doesn't follow the required test flow:

Missing Steps:

  1. Redis Connection: No Redis client initialization
  2. Relay Address Discovery: Should wait for RELAY_TCP_ADDRESS/RELAY_QUIC_ADDRESS from Redis, not env vars
  3. Identify Protocol: Should use identify to discover external address (mentioned in README as SHOULD)
  4. Relay Reservation: Listener should listen on relay and make a reservation (not just connect)
  5. PeerId Publishing: Listener should push PeerId to Redis key LISTEN_CLIENT_PEER_ID after reservation
  6. PeerId Waiting: Dialer should block on LISTEN_CLIENT_PEER_ID from Redis, not use TARGET_ID env var

4. Output Format

Status: ❌ INCORRECT

  • Requirement: RTT MUST be printed to stdout in JSON format:
    { "rtt_to_holepunched_peer_millis": 12 }
  • Current State: Writes to /results/results.csv in CSV format
  • Fix Needed: Print JSON to stdout instead of writing CSV file

5. Mode Values

Status: ❌ INCORRECT

  • Requirement: MODE env variable should be listen or dial
  • Current State: Python code checks for initiator or receiver
  • Fix Needed: Update mode checking to use listen/dial

6. Transport Support

Status: ❌ MISSING

  • Requirement: Must support both tcp and quic transports
  • Current State: Only TCP is implemented (hardcoded)
  • Fix Needed: Add QUIC transport support and select based on TRANSPORT env variable

7. Required System Tools

Status: ❌ MISSING

  • Requirement: MUST have dig, curl, jq and tcpdump installed
  • Current State: Dockerfile only installs iptables and libgmp-dev
  • Fix Needed: Add these packages to Dockerfile:
    RUN apt-get update && apt-get install -y \
        git build-essential iptables libgmp-dev \
        dnsutils curl jq tcpdump \
        && rm -rf /var/lib/apt/lists/*

8. TCP_NODELAY

Status: ❌ MISSING

  • Requirement: Implementations MUST set TCP_NODELAY for the TCP transport
  • Current State: Not implemented in py-libp2p. The TCP transport (libp2p/transport/tcp/tcp.py) uses trio.open_tcp_stream() and trio.serve_tcp() which don't configure TCP_NODELAY. No socket option configuration exists in the codebase.
  • Fix Needed:
    • Option 1: Modify py-libp2p's TCP transport to set TCP_NODELAY on sockets. The TrioTCPStream class has access to stream.socket, so it's possible to add setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1) after socket creation.
    • Option 2: If modifying py-libp2p is not possible, create a wrapper that sets TCP_NODELAY on connections after they're established (though this is less ideal).
  • Note: Checked /home/luca/PNL_Launchpad_Curriculum/Libp2p/py-libp2p - no TCP_NODELAY support found.

9. Connection Keep-Alive

Status: ❌ UNCLEAR

  • Requirement: Implementations MUST make sure connections are being kept alive
  • Current State: Not explicitly configured
  • Fix Needed: Ensure keep-alive is enabled on TCP connections

10. 0RTT Negotiation

Status: ❌ UNCLEAR

  • Requirement: Dialer and listener both MUST use 0RTT negotiation for protocols
  • Current State: Not clear if implemented
  • Fix Needed: Verify and ensure 0RTT is used for protocol negotiation

11. Protocol Version

Status: ⚠️ POTENTIALLY INCORRECT

  • Current State: Uses /libp2p/dcutr/1.0.0
  • Note: README mentions /libp2p/dcutr/0.1.0 but Rust implementation uses /libp2p/dcutr/1.0.0 - need to verify correct version

12. Muxer Selection

Status: ❌ INCORRECT

  • Requirement: For TCP, MUST use noise + yamux to upgrade the connection
  • Current State: Uses muxer_preference="MPLEX" (mplex instead of yamux)
  • Fix Needed: Change to yamux for TCP transport

13. Logging Requirements

Status: ⚠️ PARTIALLY CORRECT

  • Requirement: Logs MUST go to stderr, RTT json MUST go to stdout
  • Current State: Uses Python logging which may go to stderr by default, but RTT goes to CSV file instead of stdout
  • Fix Needed: Ensure logging goes to stderr, RTT JSON goes to stdout

14. Listener Behavior

Status: ⚠️ PARTIALLY CORRECT

  • Requirement: Listener MUST NOT early-exit but wait to be killed by test runner
  • Current State: Uses await asyncio.Event().wait() which should work, but needs verification
  • Fix Needed: Ensure listener waits indefinitely until killed

15. Error Handling

Status: ⚠️ PARTIALLY CORRECT

  • Requirement: Implementations SHOULD exit early with a non-zero exit code if anything goes wrong
  • Current State: Some error handling exists but may not exit with proper codes
  • Fix Needed: Ensure all error paths exit with non-zero codes

Summary

The Python implementation is missing the core Redis-based orchestration mechanism and doesn't follow the required test flow. It appears to be using a simplified approach that bypasses the standard test infrastructure. To be compliant, it needs:

  1. Redis integration - Most critical missing piece
  2. Correct test flow - Following the step-by-step process in README
  3. Binary wrapper - hole-punch-client in PATH
  4. Output format - JSON to stdout instead of CSV
  5. Transport support - Add QUIC
  6. System dependencies - Add required tools
  7. Protocol compliance - yamux instead of mplex, TCP_NODELAY, etc.

Reference Implementation

The Rust implementation in hole-punch-interop/impl/rust/v0.53/rust-libp2p-.../hole-punching-tests/src/main.rs provides a complete reference for how the test flow should be implemented, including:

  • Redis client with blocking operations
  • Proper test flow following all steps
  • Correct output format
  • Transport selection
  • Error handling

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

hole-punching Related to hole punching interop

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants