Skip to content

Stuck telemetry stream #5

@jktjkt

Description

@jktjkt

On one of the lab boxes, the telemetry streams were not getting through -- neither on new connections, nor on the old ones. This is how the logs looked like:

Mar 15 10:11:48 roadm-c1 rousette[29777]: change: 57267 bytes
Mar 15 10:11:48 roadm-c1 rousette[29777]: [::1]:36514(for=_RNlpRAX0): new event, ∑ queue size = 0
Mar 15 10:11:48 roadm-c1 rousette[29777]: [::1]:36514(for=_RNlpRAX0): sent one event
Mar 15 10:11:48 roadm-c1 rousette[29777]: [::1]:36514(for=_RNlpRAX0): sleeping
Mar 15 10:11:48 roadm-c1 rousette[29777]: [::1]:36514(for=_a8bpCpSe): new event, ∑ queue size = 27858024
Mar 15 10:11:49 roadm-c1 rousette[29777]: change: 57269 bytes
Mar 15 10:11:49 roadm-c1 rousette[29777]: [::1]:36514(for=_RNlpRAX0): new event, ∑ queue size = 0
Mar 15 10:11:49 roadm-c1 rousette[29777]: [::1]:36514(for=_RNlpRAX0): sent one event
Mar 15 10:11:49 roadm-c1 rousette[29777]: [::1]:36514(for=_RNlpRAX0): sleeping
Mar 15 10:11:49 roadm-c1 rousette[29777]: [::1]:36514(for=_a8bpCpSe): new event, ∑ queue size = 27920198

I have not tried fetching other data besides the telemetry stream :(.

Stuff was fixed by restarting the reverse proxy (which took ages to finish):

Mar 15 10:12:00 roadm-c1 systemd[1]: nghttpx.service: State 'stop-sigterm' timed out. Killing.
Mar 15 10:12:00 roadm-c1 systemd[1]: nghttpx.service: Killing process 227 (nghttpx) with signal SIGKILL.
Mar 15 10:12:00 roadm-c1 systemd[1]: nghttpx.service: Killing process 233 (nghttpx) with signal SIGKILL.
Mar 15 10:12:00 roadm-c1 systemd[1]: nghttpx.service: Killing process 234 (nghttpx) with signal SIGKILL.
Mar 15 10:12:00 roadm-c1 systemd[1]: nghttpx.service: Killing process 30866 (nghttpx) with signal SIGKILL.
Mar 15 10:12:00 roadm-c1 systemd[1]: nghttpx.service: Main process exited, code=killed, status=9/KILL
Mar 15 10:12:00 roadm-c1 rousette[29777]: [::1]:36514(for=_RNlpRAX0): closed (INTERNAL_ERROR)
Mar 15 10:12:00 roadm-c1 rousette[29777]: [::1]:36514(for=_a8bpCpSe): closed (INTERNAL_ERROR)
Mar 15 10:12:00 roadm-c1 systemd[1]: nghttpx.service: Failed with result 'timeout'.
Mar 15 10:12:00 roadm-c1 systemd[1]: Stopped nghttpx reverse proxy for handling incoming HTTP requests.
Mar 15 10:12:00 roadm-c1 systemd[1]: nghttpx.service: Consumed 25min 12.430s CPU time.
Mar 15 10:12:00 roadm-c1 systemd[1]: Starting nghttpx reverse proxy for handling incoming HTTP requests...
Mar 15 10:12:00 roadm-c1 rousette[29777]: change: 57266 bytes
Mar 15 10:12:00 roadm-c1 nghttpx[30885]: 2024-03-15T10:12:00.260Z 30885 30885 34ff0138 NOTICE (shrpx.cc:971) Listening on 0.0.0.0
Mar 15 10:12:00 roadm-c1 nghttpx[30885]: 2024-03-15T10:12:00.260Z 30885 30885 34ff0138 NOTICE (shrpx.cc:971) Listening on [::]
Mar 15 10:12:00 roadm-c1 nghttpx[30885]: 2024-03-15T10:12:00.261Z 30885 30885 34ff0138 NOTICE (shrpx.cc:1780) Worker process [30886] spawned
Mar 15 10:12:00 roadm-c1 systemd[1]: Started nghttpx reverse proxy for handling incoming HTTP requests.
Mar 15 10:12:00 roadm-c1 nghttpx[30886]: 2024-03-15T10:12:00.280Z 30885 30886 34ff0138 NOTICE (shrpx_connection_handler.cc:414) [LISTEN:0x02156390] Created worker thread #0
Mar 15 10:12:00 roadm-c1 nghttpx[30885]: 2024-03-15T10:12:00.281Z 30885 30885 34ff0138 NOTICE (shrpx.cc:1553) Worker process pid=30886 is ready
Mar 15 10:12:00 roadm-c1 rousette[29777]: change: 57277 bytes
Mar 15 10:12:01 roadm-c1 rousette[29777]: [::1]:50574(for=_KXCrqzfx): GET /telemetry/optics
Mar 15 10:12:01 roadm-c1 rousette[29777]: [::1]:50574(for=_KXCrqzfx): new event, ∑ queue size = 0
Mar 15 10:12:01 roadm-c1 rousette[29777]: [::1]:50574(for=_KXCrqzfx): sent one event
Mar 15 10:12:01 roadm-c1 rousette[29777]: [::1]:50574(for=_KXCrqzfx): sleeping
Mar 15 10:12:01 roadm-c1 rousette[29777]: [::1]:50574(for=_Mf8JfaZh): GET /telemetry/optics
Mar 15 10:12:01 roadm-c1 rousette[29777]: [::1]:50574(for=_Mf8JfaZh): new event, ∑ queue size = 0
Mar 15 10:12:01 roadm-c1 rousette[29777]: [::1]:50574(for=_Mf8JfaZh): sent one event
Mar 15 10:12:01 roadm-c1 rousette[29777]: [::1]:50574(for=_Mf8JfaZh): sleeping
Mar 15 10:12:01 roadm-c1 rousette[29777]: change: 57269 bytes
Mar 15 10:12:01 roadm-c1 rousette[29777]: [::1]:50574(for=_KXCrqzfx): new event, ∑ queue size = 0
Mar 15 10:12:01 roadm-c1 rousette[29777]: [::1]:50574(for=_KXCrqzfx): sent one event
Mar 15 10:12:01 roadm-c1 rousette[29777]: [::1]:50574(for=_KXCrqzfx): sleeping
Mar 15 10:12:01 roadm-c1 rousette[29777]: [::1]:50574(for=_Mf8JfaZh): new event, ∑ queue size = 0
Mar 15 10:12:01 roadm-c1 rousette[29777]: [::1]:50574(for=_Mf8JfaZh): sent one event
Mar 15 10:12:01 roadm-c1 rousette[29777]: [::1]:50574(for=_Mf8JfaZh): sleeping
Mar 15 10:12:02 roadm-c1 rousette[29777]: [::1]:50574(for=_ULri4niY): GET /telemetry/optics
Mar 15 10:12:02 roadm-c1 rousette[29777]: [::1]:50574(for=_ULri4niY): new event, ∑ queue size = 0
Mar 15 10:12:02 roadm-c1 rousette[29777]: [::1]:50574(for=_ULri4niY): sent one event
Mar 15 10:12:02 roadm-c1 rousette[29777]: [::1]:50574(for=_ULri4niY): sleeping
Mar 15 10:12:02 roadm-c1 rousette[29777]: change: 57279 bytes
Mar 15 10:12:02 roadm-c1 rousette[29777]: [::1]:50574(for=_KXCrqzfx): new event, ∑ queue size = 0
Mar 15 10:12:02 roadm-c1 rousette[29777]: [::1]:50574(for=_KXCrqzfx): sent one event
Mar 15 10:12:02 roadm-c1 rousette[29777]: [::1]:50574(for=_KXCrqzfx): sleeping
Mar 15 10:12:02 roadm-c1 rousette[29777]: [::1]:50574(for=_Mf8JfaZh): new event, ∑ queue size = 0
Mar 15 10:12:02 roadm-c1 rousette[29777]: [::1]:50574(for=_Mf8JfaZh): sent one event
Mar 15 10:12:02 roadm-c1 rousette[29777]: [::1]:50574(for=_Mf8JfaZh): sleeping
Mar 15 10:12:02 roadm-c1 rousette[29777]: [::1]:50574(for=_ULri4niY): new event, ∑ queue size = 0
Mar 15 10:12:02 roadm-c1 rousette[29777]: [::1]:50574(for=_ULri4niY): sent one event
Mar 15 10:12:02 roadm-c1 rousette[29777]: [::1]:50574(for=_ULri4niY): sleeping
Mar 15 10:12:02 roadm-c1 rousette[29777]: change: 57276 bytes
Mar 15 10:12:02 roadm-c1 rousette[29777]: [::1]:50574(for=_KXCrqzfx): new event, ∑ queue size = 0
Mar 15 10:12:02 roadm-c1 rousette[29777]: [::1]:50574(for=_KXCrqzfx): sent one event
Mar 15 10:12:02 roadm-c1 rousette[29777]: [::1]:50574(for=_KXCrqzfx): sleeping
Mar 15 10:12:02 roadm-c1 rousette[29777]: [::1]:50574(for=_Mf8JfaZh): new event, ∑ queue size = 0
Mar 15 10:12:02 roadm-c1 rousette[29777]: [::1]:50574(for=_Mf8JfaZh): sent one event
Mar 15 10:12:02 roadm-c1 rousette[29777]: [::1]:50574(for=_Mf8JfaZh): sleeping
Mar 15 10:12:02 roadm-c1 rousette[29777]: [::1]:50574(for=_ULri4niY): new event, ∑ queue size = 0
Mar 15 10:12:02 roadm-c1 rousette[29777]: [::1]:50574(for=_ULri4niY): sent one event
Mar 15 10:12:02 roadm-c1 rousette[29777]: [::1]:50574(for=_ULri4niY): sleeping
Mar 15 10:12:03 roadm-c1 rousette[29777]: change: 57273 bytes
Mar 15 10:12:03 roadm-c1 rousette[29777]: [::1]:50574(for=_KXCrqzfx): new event, ∑ queue size = 0
Mar 15 10:12:03 roadm-c1 rousette[29777]: [::1]:50574(for=_KXCrqzfx): sent one event
Mar 15 10:12:03 roadm-c1 rousette[29777]: [::1]:50574(for=_KXCrqzfx): sleeping
Mar 15 10:12:03 roadm-c1 rousette[29777]: [::1]:50574(for=_Mf8JfaZh): new event, ∑ queue size = 0
Mar 15 10:12:03 roadm-c1 rousette[29777]: [::1]:50574(for=_Mf8JfaZh): sent one event
Mar 15 10:12:03 roadm-c1 rousette[29777]: [::1]:50574(for=_Mf8JfaZh): sleeping
Mar 15 10:12:03 roadm-c1 rousette[29777]: [::1]:50574(for=_ULri4niY): new event, ∑ queue size = 0
Mar 15 10:12:03 roadm-c1 rousette[29777]: [::1]:50574(for=_ULri4niY): sent one event
Mar 15 10:12:03 roadm-c1 rousette[29777]: [::1]:50574(for=_ULri4niY): sleeping
Mar 15 10:12:04 roadm-c1 nghttpx[30886]: 10.0.0.177 - - [15/Mar/2024:10:12:02 +0000] "GET /telemetry/optics HTTP/1.1" 200 248725 "-" "curl/7.81.0"
Mar 15 10:12:04 roadm-c1 rousette[29777]: [::1]:50574(for=_ULri4niY): closed (INTERNAL_ERROR)

And the full log from nghttpx:

Mar 15 09:54:19 roadm-c1 nghttpx[233]: 10.0.0.177 - - [15/Mar/2024:09:50:52 +0000] "GET /telemetry/optics HTTP/1.1" 200 0 "-" "Python/3.10 aiohttp/3.9.3"
Mar 15 10:06:13 roadm-c1 nghttpx[233]: 10.0.0.177 - - [15/Mar/2024:09:54:20 +0000] "GET /telemetry/optics HTTP/1.1" 200 0 "-" "Python/3.10 aiohttp/3.9.3"
Mar 15 10:09:01 roadm-c1 nghttpx[233]: 10.0.0.177 - - [15/Mar/2024:10:08:55 +0000] "GET /telemetry/optics HTTP/1.1" 200 0 "-" "curl/7.81.0"
Mar 15 10:09:10 roadm-c1 nghttpx[233]: 10.0.0.177 - - [15/Mar/2024:10:09:02 +0000] "GET /telemetry/optics HTTP/1.1" 200 0 "-" "curl/7.81.0"
Mar 15 10:10:30 roadm-c1 nghttpx[233]: 2024-03-15T10:10:30.060Z 227 233 c9d94777 NOTICE (shrpx_worker_process.cc:113) Graceful shutdown signal received
Mar 15 10:10:30 roadm-c1 nghttpx[233]: 2024-03-15T10:10:30.062Z 227 233 24cd368c NOTICE (shrpx_worker.cc:523) [WORKER:0xd36c58] Graceful shutdown commenc
Mar 15 10:10:30 roadm-c1 systemd[1]: Stopping nghttpx reverse proxy for handling incoming HTTP requests...
Mar 15 10:12:00 roadm-c1 systemd[1]: nghttpx.service: State 'stop-sigterm' timed out. Killing.
Mar 15 10:12:00 roadm-c1 systemd[1]: nghttpx.service: Killing process 227 (nghttpx) with signal SIGKILL.
Mar 15 10:12:00 roadm-c1 systemd[1]: nghttpx.service: Killing process 233 (nghttpx) with signal SIGKILL.
Mar 15 10:12:00 roadm-c1 systemd[1]: nghttpx.service: Killing process 234 (nghttpx) with signal SIGKILL.
Mar 15 10:12:00 roadm-c1 systemd[1]: nghttpx.service: Killing process 30866 (nghttpx) with signal SIGKILL.
Mar 15 10:12:00 roadm-c1 systemd[1]: nghttpx.service: Main process exited, code=killed, status=9/KILL
Mar 15 10:12:00 roadm-c1 systemd[1]: nghttpx.service: Failed with result 'timeout'.
Mar 15 10:12:00 roadm-c1 systemd[1]: Stopped nghttpx reverse proxy for handling incoming HTTP requests.
Mar 15 10:12:00 roadm-c1 systemd[1]: nghttpx.service: Consumed 25min 12.430s CPU time.
Mar 15 10:12:00 roadm-c1 systemd[1]: Starting nghttpx reverse proxy for handling incoming HTTP requests...
Mar 15 10:12:00 roadm-c1 nghttpx[30885]: 2024-03-15T10:12:00.260Z 30885 30885 34ff0138 NOTICE (shrpx.cc:971) Listening on 0.0.0.0
Mar 15 10:12:00 roadm-c1 nghttpx[30885]: 2024-03-15T10:12:00.260Z 30885 30885 34ff0138 NOTICE (shrpx.cc:971) Listening on [::]
Mar 15 10:12:00 roadm-c1 nghttpx[30885]: 2024-03-15T10:12:00.261Z 30885 30885 34ff0138 NOTICE (shrpx.cc:1780) Worker process [30886] spawned
Mar 15 10:12:00 roadm-c1 systemd[1]: Started nghttpx reverse proxy for handling incoming HTTP requests.
Mar 15 10:12:00 roadm-c1 nghttpx[30886]: 2024-03-15T10:12:00.280Z 30885 30886 34ff0138 NOTICE (shrpx_connection_handler.cc:414) [LISTEN:0x02156390] Creat
Mar 15 10:12:00 roadm-c1 nghttpx[30885]: 2024-03-15T10:12:00.281Z 30885 30885 34ff0138 NOTICE (shrpx.cc:1553) Worker process pid=30886 is ready
Mar 15 10:12:04 roadm-c1 nghttpx[30886]: 10.0.0.177 - - [15/Mar/2024:10:12:02 +0000] "GET /telemetry/optics HTTP/1.1" 200 248725 "-" "curl/7.81.0"

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions