provider: dedicated provide queue #907
Conversation
gammazero
left a comment
Provide collects batches of CIDs before providing them. This means there may be some delay while filling the batch, especially if each new CID arrives just before the pause-detection timer expires. It seems like it might be better to send new provides immediately.
One problem with doing that is that duplicate CIDs would not be removed, because the batch is currently a map of CIDs. Consider keeping the map to ignore duplicates, but publishing each unique CID immediately. There may be some additional logic needed to keep the map from growing too large, which may mean making it an LRU cache.
Also, we may need a test to show that providing does not need to wait for reprovide.
Otherwise, this all looks good.
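The "publish each unique CID immediately, but bound the dedup map with an LRU" idea above could be sketched roughly like this (all names here are hypothetical, not from this PR; a real implementation would likely use an existing LRU package):

```go
package main

import (
	"container/list"
	"fmt"
)

// dedupCache is a fixed-capacity LRU set of CID strings. It lets the
// provider publish each unique CID immediately while still ignoring
// duplicates, without the map growing without bound.
type dedupCache struct {
	cap   int
	ll    *list.List               // front = most recently seen
	index map[string]*list.Element // cid -> list node
}

func newDedupCache(capacity int) *dedupCache {
	return &dedupCache{
		cap:   capacity,
		ll:    list.New(),
		index: make(map[string]*list.Element),
	}
}

// Seen reports whether cid was already cached, and records it either way.
// When the cache is full, the least recently seen entry is evicted.
func (d *dedupCache) Seen(cid string) bool {
	if el, ok := d.index[cid]; ok {
		d.ll.MoveToFront(el)
		return true
	}
	if d.ll.Len() >= d.cap {
		oldest := d.ll.Back()
		d.ll.Remove(oldest)
		delete(d.index, oldest.Value.(string))
	}
	d.index[cid] = d.ll.PushFront(cid)
	return false
}

func main() {
	c := newDedupCache(2)
	fmt.Println(c.Seen("cid-a")) // false: first time, provide immediately
	fmt.Println(c.Seen("cid-a")) // true: duplicate, skip
	c.Seen("cid-b")
	c.Seen("cid-c")              // evicts cid-a (capacity 2)
	fmt.Println(c.Seen("cid-a")) // false again after eviction
}
```

The caller would only trigger a provide when `Seen` returns false; eviction means a very old CID may occasionally be re-provided, which is harmless.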
Added instant provides, but I didn't add tests yet. I have set instant provides as the default, with a "safe (?)" default of 128 concurrent provides; batched provides can still be used if the right option is provided. I have added a warning if the queue of CIDs to be provided instantly isn't processed fast enough. I set instant provides as the default because before #641, all CIDs were instantly provided, so the behavior is expected to be similar to what it used to be. It adds quite some complexity :(
In any case, the provide operation is resource intensive, with each provide triggering a message to 20 remote peers. With batching and the accelerated DHT client, we can group the messages we send to a remote peer, to avoid opening and closing a connection to the same peer many times. But it comes at the expense of timing: instant provides are "instant", but the node is expected to open and close more connections, even though the same payload is sent to the same peers. Some open questions:
@gammazero let me know if you see anything that can be simplified further
If using a worker pool, I do not see a need for batched provides. The size of the worker pool will limit the number of CIDs read into memory and processed. For lower-priority reprovides, batching could be used to limit the number of CIDs in memory, but this limitation can also be achieved with a worker pool. It would probably be good to have a smaller worker pool for reprovides than for provides, maybe even limiting the reprovide pool size to one or two. Batching used to be the way CIDs were deduplicated; that has been replaced by the deduplication cache in this PR, so there really does not appear to be any utility in batching. Once #901 is fixed, there will also be no need for the deduplication cache.
If we have a limited-size worker pool, there does not seem to be a need for a batch, since the size of the pool limits the number of CIDs that get loaded into memory for (re)providing. The pool size should be adjustable, and a smaller pool should be used for reprovides by default.
If provide sends each CID to 20 remote peers (requiring 20 connections) and the worker pool has 50 workers, then that is using up to 1000 connections. Perhaps the pool size can be informed by the maximum (or a reasonable) number of peer connections, leaving enough capacity to handle incoming peer connections. Does the resource manager give us some clue?
The main benefit of batching is that it reduces the number of connections we need to open: provide messages destined for the same remote peer are grouped and sent over a single connection.
Whereas if there is no batching, the provider will open a connection to the 20 closest nodes to each CID, send them the provide record, and close the connection. Hence batching helps reduce the number of dials. Note that batch providing only works with the accelerated DHT client.
This is tricky. The libp2p resource manager will probably prevent us from opening too many connections (if enabled).
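The dial-count trade-off described above can be made concrete with a toy model (illustrative only; peer sets here are invented, not real DHT routing):

```go
package main

import "fmt"

// countDials compares dial counts for instant vs batched providing,
// given each CID's set of closest peers. Instant providing dials once
// per (CID, peer) pair; batching sends one grouped message per
// distinct peer, so overlapping closest-peer sets share dials.
func countDials(closest map[string][]string) (instant, batched int) {
	distinct := map[string]bool{}
	for _, peers := range closest {
		instant += len(peers)
		for _, p := range peers {
			distinct[p] = true
		}
	}
	return instant, len(distinct)
}

func main() {
	// Three CIDs whose 3-closest-peer sets overlap (real DHTs use 20).
	closest := map[string][]string{
		"cid-1": {"peerA", "peerB", "peerC"},
		"cid-2": {"peerB", "peerC", "peerD"},
		"cid-3": {"peerA", "peerD", "peerE"},
	}
	instant, batched := countDials(closest)
	fmt.Println(instant, batched) // 9 5
}
```

The savings grow with overlap: the more CIDs share closest peers within a batch, the fewer dials batching needs relative to instant provides.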
gammazero
left a comment
See comments. I think we can get rid of batched provides.
gammazero
left a comment
LGTM
Reprovides are currently spiky, since all keys are reprovided at once (for both the normal and accelerated DHT clients). If new keys are added during a reprovide, the reprovider will wait until the current reprovide is finished before advertising the newly added keys to the DHT. So new keys are blocked by reprovides.
This PR splits the (re)provide queue into a provide queue and a reprovide queue, allowing provides to be advertised immediately, even during reprovides.
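The queue split can be pictured as a dequeue that always prefers fresh provides (a minimal sketch with assumed names, not the PR's actual data structure):

```go
package main

import "fmt"

// splitQueue holds fresh provides and background reprovides separately.
// Next always drains the provide queue first, so a new CID is never
// stuck behind a long reprovide sweep.
type splitQueue struct {
	provides   []string
	reprovides []string
}

func (q *splitQueue) PushProvide(cid string)   { q.provides = append(q.provides, cid) }
func (q *splitQueue) PushReprovide(cid string) { q.reprovides = append(q.reprovides, cid) }

// Next returns the next CID to advertise; fresh provides take priority.
func (q *splitQueue) Next() (string, bool) {
	if len(q.provides) > 0 {
		cid := q.provides[0]
		q.provides = q.provides[1:]
		return cid, true
	}
	if len(q.reprovides) > 0 {
		cid := q.reprovides[0]
		q.reprovides = q.reprovides[1:]
		return cid, true
	}
	return "", false
}

func main() {
	q := &splitQueue{}
	// A reprovide sweep is already queued...
	q.PushReprovide("old-1")
	q.PushReprovide("old-2")
	// ...when a new CID arrives; it is still served first.
	q.PushProvide("new-1")
	var order []string
	for cid, ok := q.Next(); ok; cid, ok = q.Next() {
		order = append(order, cid)
	}
	fmt.Println(order) // [new-1 old-1 old-2]
}
```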
ipfs/kubo#10777
Depending on either: