A critical error when working with POLL and EPOLL? #1939

@John-Jasper-Doe

This is a follow-up to the previous issue.

    if (strikes >= MAX_RECV_LOOP_STRIKES) {
        ilog(LOG_WARN | LOG_FLAG_LIMIT, "UDP receive queue exceeded %i times: "
                "discarding packet", strikes);
        // Polling is edge-triggered so we won't immediately get here again.
        // We could remove ourselves from the poller though. Maybe call stream_fd_closed?
        return;
    }

After changing this code to reset active_read_events and error_strikes to 0, the problem was still not
completely resolved.

I suspect there is a critical error in the current implementation of the poller_poll() and epoll_events()
functions.

rtpengine/lib/poller.c

Lines 84 to 90 in 76dd9ab

    static int epoll_events(struct poller_item *it, struct poller_item_int *ii) {
        if (!it)
            it = &ii->item;
        return EPOLLHUP | EPOLLERR | EPOLLET |
            ((it->writeable && ii && ii->blocked) ? EPOLLOUT : 0) |
            (it->readable ? EPOLLIN : 0);
    }

epoll_events() sets the EPOLLET flag, but poller_poll() checks the returned events against the poll constants
(POLLERR, POLLHUP) rather than the epoll ones, which looks incorrect to me. As a result, RTP sessions freeze
10-15 seconds after the call. Removing EPOLLET fixes the freezing, but it leads to CPU overload (constant
uncontrolled wakeups).
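
For reference, here is a minimal sketch of the edge-triggered behaviour described above. It is not rtpengine code: it uses a plain pipe, and the function name wakeups_after_partial_read() is made up for illustration. It writes two bytes, waits once, reads only one byte, and then polls again without blocking. Per epoll(7), with EPOLLET no further event should be reported until new data arrives, whereas in level-triggered mode the descriptor keeps being reported while data remains.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper, not part of rtpengine.
 * Returns the number of events reported by a second, non-blocking
 * epoll_wait() after only part of the pending data has been read.
 * Pass EPOLLET for edge-triggered mode or 0 for level-triggered mode.
 * Returns -1 if any setup step fails. */
int wakeups_after_partial_read(unsigned int trigger_flag) {
    int pipefd[2];
    int epfd;
    int result = -1;
    char c;
    struct epoll_event ev = { 0 };
    struct epoll_event got;

    if (pipe(pipefd) != 0)
        return -1;

    epfd = epoll_create1(0);
    if (epfd < 0)
        goto out_pipe;

    /* Watch the read end of the pipe. */
    ev.events = EPOLLIN | trigger_flag;
    ev.data.fd = pipefd[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) != 0)
        goto out;

    /* Two bytes become readable. */
    if (write(pipefd[1], "ab", 2) != 2)
        goto out;

    /* First wait: the new data is reported in either mode. */
    if (epoll_wait(epfd, &got, 1, 0) != 1)
        goto out;

    /* Read only one byte, so one byte is still pending. */
    if (read(pipefd[0], &c, 1) != 1)
        goto out;

    /* Second wait, timeout 0: does the pending byte get reported again? */
    result = epoll_wait(epfd, &got, 1, 0);

out:
    close(epfd);
out_pipe:
    close(pipefd[0]);
    close(pipefd[1]);
    return result;
}
```

Calling wakeups_after_partial_read(EPOLLET) and wakeups_after_partial_read(0) side by side shows the difference between the two modes. A handler that returns early under EPOLLET without reading until EAGAIN leaves the descriptor in the same state as after the partial read here.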

What if epoll_events() and poller_poll() were changed to use the EPOLL constants instead of the POLL ones? In my
testing this gives no "frozen sessions", low CPU load, and correct handling of the epoll constants.
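
Before relying on the constant swap alone, it may be worth checking whether the POLL and EPOLL bits actually differ on the target platform. The sketch below is an illustrative check only; the function name poll_and_epoll_bits_match() is hypothetical. On common Linux/glibc systems I would expect the basic bits to have identical values, in which case the swap by itself would not change behaviour, but this should be verified on the system in question.

```c
#include <poll.h>
#include <sys/epoll.h>

/* Hypothetical helper, not part of rtpengine.
 * Returns 1 if the basic poll(2) event bits have the same numeric
 * values as their epoll(7) counterparts on this platform, else 0. */
int poll_and_epoll_bits_match(void) {
    return (unsigned int) POLLIN  == (unsigned int) EPOLLIN
        && (unsigned int) POLLPRI == (unsigned int) EPOLLPRI
        && (unsigned int) POLLOUT == (unsigned int) EPOLLOUT
        && (unsigned int) POLLERR == (unsigned int) EPOLLERR
        && (unsigned int) POLLHUP == (unsigned int) EPOLLHUP;
}
```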

New version of epoll_events():

    return EPOLLHUP | EPOLLERR | EPOLLET | EPOLLRDHUP | EPOLLPRI |
        ((it->writeable && ii && ii->blocked) ? EPOLLOUT | EPOLLWRNORM | EPOLLWRBAND : 0) |
        (it->readable ? EPOLLIN | EPOLLRDNORM | EPOLLRDBAND : 0);

And the new version of poller_poll() would check these flags:

    ...
    if (ev->events & (EPOLLERR | EPOLLHUP | EPOLLRDHUP))
        it->item.closed(it->item.fd, it->item.obj);
    else {
        if (ev->events & (EPOLLIN | EPOLLPRI | EPOLLRDNORM | EPOLLRDBAND)) {
            if (it->item.readable)
                it->item.readable(it->item.fd, it->item.obj);
        }
        else if (ev->events & (EPOLLOUT | EPOLLWRNORM | EPOLLWRBAND)) {
            mutex_lock(&p->lock);
            it->blocked = 0;

            ZERO(e);
            ...
        }
    ...
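
To make the branch order of this proposal easier to reason about, here is a small standalone sketch that mirrors only the flag tests. The enum and the function classify_events() are hypothetical names, not rtpengine identifiers, and the sketch ignores the callbacks, the locking, and the check that it->item.readable is set.

```c
#include <stdint.h>
#include <sys/epoll.h>

/* Hypothetical names, for illustration only. */
enum dispatch {
    DISPATCH_NONE,
    DISPATCH_CLOSED,
    DISPATCH_READABLE,
    DISPATCH_WRITEABLE
};

/* Mirrors the order of the flag tests in the proposed poller_poll():
 * error/hangup first, then readable, then writeable. */
enum dispatch classify_events(uint32_t events) {
    if (events & (EPOLLERR | EPOLLHUP | EPOLLRDHUP))
        return DISPATCH_CLOSED;
    if (events & (EPOLLIN | EPOLLPRI | EPOLLRDNORM | EPOLLRDBAND))
        return DISPATCH_READABLE;
    if (events & (EPOLLOUT | EPOLLWRNORM | EPOLLWRBAND))
        return DISPATCH_WRITEABLE;
    return DISPATCH_NONE;
}
```

Two consequences of this ordering are visible in the sketch: an event mask with both EPOLLIN and EPOLLHUP set takes the closed path without reading the pending data, and a mask with both EPOLLIN and EPOLLOUT set takes only the readable path.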

Do you think this is a correct fix?

Thanks in advance!
