-
Notifications
You must be signed in to change notification settings - Fork 1.4k
zebra: optimize nexthop_active_update for routes with identical ECMP #20039
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
krishna-samy
wants to merge
4
commits into
FRRouting:master
Choose a base branch
from
krishna-samy:krishna/nh_active_optimization
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
zebra: optimize nexthop_active_update for routes with identical ECMP #20039
krishna-samy
wants to merge
4
commits into
FRRouting:master
from
krishna-samy:krishna/nh_active_optimization
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Contributor
Author
|
Some of the test results: scale test results - 10k routes with 512 ECMP:without fix:Event statistics for zebra: with fix: |
03d8d7c to
447be1a
Compare
The original implementation of nhe_received was done to store the received NHs from protocols as is during the early route processing. But the route_entry_update_nhe() was incorrectly overwriting re->nhe_received during NH resolution updates. This caused the received NHE to be replaced with the resolved NHE, defeating the purpose of tracking the original nexthops sent by the protocol. Fixing the same. Signed-off-by: Krishnasamy <[email protected]>
Signed-off-by: Krishnasamy <[email protected]>
Problem statement:
- Under large route churn with identical ECMP sets, Zebra spends excessive CPU
in nexthop_active_update during rib_process.
- This is because of repeated identical NH active checks, nhe hash lookups and
hash entry transitions during resolution process
- As per current code, each incoming route entry during a burst will go
through the above processing individually.
- This could be optimized to have efficient processing during route churn
Fix:
- Introduce new fields as below to cache the resolved_nhe_id for each incoming NHE received from protocol.
struct route_entry {
...
struct nhg_hash_entry *nhe_received;
...
}
- On the received/unresolved NHE:
struct nhg_hash_entry {
...
uint32_t resolved_nhe_id; // Cached resolved NHE ID (0 = not cached)
uint32_t cache_gen_num; // Validation stamp for cache
};
- 're->nhe_received' to store the received NHs set from protocol
- 'nhe_received->resolved_nhe_id' to store the resolved NHE in slow-path
and the same will be used for lookup during fast-path.
- 'global_nh_epoch' to track any system wide events so that the cached NHEs would be invalidated.
- On fast-path, if cache_gen_num matches global_nh_epoch, adopt the resolved NHG directly (skip heavy resolution).
- Increment global_nh_epoch on route-map changes, interface up/down/address events
and label updates to invalidate the cache
- So the validity of any cached 'resolved_nhe_id' is determined by equality check 'cache_gen_num == global_nh_epoch'.
- Some of the special cases like labels, route-map, self-pointing NHs are considered to skip the caching.
Signed-off-by: Krishnasamy <[email protected]>
Signed-off-by: Krishnasamy <[email protected]>
447be1a to
c4c7153
Compare
Contributor
Author
|
ci:rerun |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Problem statement:
Fix:
Result:
overall ~30% improvement in zebra during routes churn with identical ECMP
Before fix:
0 14684.091 1317 11149 25199 11185 25841 0 0 0 TE work_queue_runAfter fix:
0 7307.182 1166 6266 18044 6292 18051 0 0 0 TE work_queue_run