Skip to content

Conversation

@mmckeen
Copy link
Contributor

@mmckeen mmckeen commented Jul 22, 2025

Description

This PR allows for optional sampling of packet reporting when in high data aggregation level for packetparser.

By default, all packets are reported but optionally 1 out of n packets are sampled by random chance with the exception of certain important control flags or when hitting the reporting interval.

This allows Retina to scale to high network volume environments at the trade-off of some reporting granularity.

The performance impact of this is mostly for workloads with lots of new connections, connections already tracked in the conntrack table rely on #1665 for scalability.

The behavior added in #1665 allows for accurate reporting of metrics despite sampling being in place.

Related Issue

#1760

Checklist

  • I have read the contributing documentation.
  • I signed and signed-off the commits (git commit -S -s ...). See this documentation on signing commits.
  • I have correctly attributed the author(s) of the code.
  • I have tested the changes locally.
  • I have followed the project's style guidelines.
  • I have updated the documentation, if necessary.
  • I have added tests, if applicable.

Screenshots (if applicable) or Testing Completed

Main

Screenshot 2025-07-22 at 4 51 24 PM

After the change (with default sampling rate of 1)

Screenshot 2025-07-22 at 4 57 36 PM

After the change (with sampling rate of 1000)

Screenshot 2025-07-22 at 5 04 22 PM

Please refer to the CONTRIBUTING.md file for more information on how to contribute to this project.

@mmckeen mmckeen force-pushed the packetParserSampling branch 3 times, most recently from ebac9e3 to b2a5ab5 Compare July 22, 2025 23:08
@mmckeen mmckeen marked this pull request as ready for review July 23, 2025 00:05
@mmckeen mmckeen requested a review from a team as a code owner July 23, 2025 00:05
@nddq nddq requested review from SRodi and nddq and removed request for karina-ranadive and mainred August 1, 2025 03:40
@nddq nddq added the type/enhancement New feature or request label Aug 1, 2025
@mmckeen
Copy link
Contributor Author

mmckeen commented Aug 12, 2025

@nddq @SRodi any thoughts on this PR?

@nddq
Copy link
Member

nddq commented Aug 21, 2025

@mmckeen I will try to make a pass this week

Copy link
Member

@nddq nddq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should revisit the syntax of DataSamplingRate. For example, since I know the code context, I understand that setting DataSamplingRate to 1 means every packet will be sampled. But for someone new to Retina, this behavior may not be obvious.

@nddq
Copy link
Member

nddq commented Sep 2, 2025

Also, I'm all for ditching the reporting period entirely if we can make packet sampling works as it is a lot more flexible than the currently hardcoding value that we are using for reporting period 🙂 cc @anubhabMajumdar @SRodi So what I'm thinking of is either we report a packet because it carries a new flag or because it was randomly sampled. Some more advanced behavior that I could think of is that we can monitor cpu usage and dynamically adjust the sampling rate and recompile the packet parser program accordingly.

@mmckeen
Copy link
Contributor Author

mmckeen commented Sep 2, 2025

Also, I'm all for ditching the reporting period entirely if we can make packet sampling works as it is a lot more flexible than the currently hardcoding value that we are using for reporting period 🙂 cc @anubhabMajumdar @SRodi So what I'm thinking of is either we report a packet because it carries a new flag or because it was randomly sampled. Some more advanced behavior that I could think of is that we can monitor cpu usage and dynamically adjust the sampling rate and recompile the packet parser program accordingly.

The reporting interval might still be valid to enable us to make sure we report eventually for every 5-tuple. For example, we wouldn't want to sample and then never report the counters for a particular connection.

The reporting interval being something smallish also ensures Prometheus rate interval calculations (per-minute for example) stay valid.

@github-actions
Copy link

github-actions bot commented Oct 9, 2025

This PR will be closed in 7 days due to inactivity.

@github-actions github-actions bot added the meta/waiting-for-author Blocked and waiting on the author label Oct 9, 2025
@nddq nddq removed the meta/waiting-for-author Blocked and waiting on the author label Oct 11, 2025
@mmckeen
Copy link
Contributor Author

mmckeen commented Oct 21, 2025

Apologies, been a while for me to circle back on this.

I'll try and address the comments this week 🙇

@mmckeen mmckeen force-pushed the packetParserSampling branch from 4466a52 to 5ddf47d Compare October 22, 2025 21:20
@mmckeen mmckeen force-pushed the packetParserSampling branch from 5ddf47d to 49ed563 Compare October 22, 2025 22:48
@mmckeen mmckeen requested a review from nddq October 22, 2025 22:54
@mmckeen
Copy link
Contributor Author

mmckeen commented Oct 22, 2025

@nddq updated this with docs, the sampling calculation still looks accurate to me but happy for feedback 🙇

@nddq
Copy link
Member

nddq commented Oct 23, 2025

@mmckeen thanks for the docs! After re-reading it, the sampling calculation makes more sense to me now (I thought by default, the sampled variable is set to false 😅). Will try to do another round of review soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

type/enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants