@hd40910 hd40910 commented Apr 3, 2025

If we look at this Ruby code, there are two potential issues:

  1. The current implementation has a flaw in its retry mechanism. When AWS throttles the get_log_events call, the code attempts to retry, but there is no limit on how many times it will retry.
  2. The retry mechanism is inefficient because it retries the entire get_events method when throttling occurs if there are more than x events (a sketch of the intended fix follows below).
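For illustration, here is a minimal sketch of the direction this change takes, not the plugin's actual code: cap the number of retries and wrap only the throttled API call, rather than re-running the whole surrounding method. The helper name call_with_bounded_retry and its parameters are hypothetical; only get_log_events, @logs, and the ThrottlingException error class come from the discussion and the AWS SDK.

require "aws-sdk-cloudwatchlogs"

# Sketch only: bounded exponential retry around a single API call.
def call_with_bounded_retry(max_retry_count:, base_seconds:)
  retry_count = 0
  begin
    yield
  rescue Aws::CloudWatchLogs::Errors::ThrottlingException
    raise if retry_count >= max_retry_count     # give up once the limit is reached
    wait_time = base_seconds * (2**retry_count) # exponential backoff
    retry_count += 1
    sleep wait_time
    retry
  end
end

# Hypothetical usage: only the API call itself is retried, so pagination and
# event processing around it are not repeated when throttling occurs.
# response = call_with_bounded_retry(max_retry_count: 5, base_seconds: 2) do
#   @logs.get_log_events(request)
# end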
@hd40910 hd40910 (Author) commented Apr 3, 2025

@cosmo0920 for your approval; please let me know if you see any issues.

@cosmo0920 (Member) commented

Currently, I have no cycles to take a look at it.
Could y'all take a look at this, @kenhys or @daipom?

@daipom daipom self-requested a review April 10, 2025 10:52
@daipom daipom (Contributor) left a comment


@hd40910 Thanks for this improvement! Sorry for the delay.
This direction looks good to me.

I have commented on some points.
Please check them.

In addition, could you please consider the following points?

  • Remove unnecessary spaces.
    • Some new blank lines contain needless whitespace.
  • Update the README.
    • Add a description for the new parameter.
    • Update the description of throttling_retry_seconds to reflect the new exponential interval.
  • Fix the failing tests on CI.
    • It looks like they fail because the log message has changed.
    • Since the wait time will be random, it would be good to loosen the assert condition. It is unnecessary to assert an exact match of the entire message (a sketch follows this list).
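For illustration, a loosened assertion could look roughly like the following. This is a sketch, assuming a Fluent::Test driver d that exposes collected log lines via d.logs; the exact substring is illustrative, not necessarily the plugin's final message.

# Sketch only: match a stable substring of the warning instead of the full
# message, since the randomized wait time makes an exact match brittle.
assert(d.logs.any? { |line| line.include?("ThrottlingException on get_log_events") },
       "expected a throttling retry warning to be logged")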

config_param :end_time, :string, default: nil
config_param :time_range_format, :string, default: "%Y-%m-%d %H:%M:%S"
config_param :throttling_retry_seconds, :time, default: nil
config_param :max_retry_count, :integer, default: 999 #TODO

config_param :max_retry_count, :integer, default: 999 #TODO

Please consider an appropriate default value and remove the TODO comment.

I'm not familiar with CloudWatch.
Can there be a case where a user wants to retry an unlimited number of times, like the current version allows?

If so, retries should be unlimited when this value is nil.
And it might be better for the default value to be nil, for compatibility.
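For illustration, one way this suggestion could look (the helper name retry_allowed? is hypothetical, not part of the PR):

config_param :max_retry_count, :integer, default: nil

# Sketch only: nil keeps today's unlimited retries; an integer caps them.
def retry_allowed?(retry_count)
  return true if @max_retry_count.nil?
  retry_count < @max_retry_count
end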

request[:next_token] = log_next_token if !log_next_token.nil? && !log_next_token.empty?
request[:start_from_head] = true if read_from_head?(log_next_token)

# Only apply throttling retry to the API call, not the whole method

Suggested change
# Only apply throttling retry to the API call, not the whole method

Information that can be found in the code itself is not needed in comments.
(Maybe those comments were meant for the PR review. Thanks, I understand the code. Let's remove them now.)

request[:next_token] = next_token if next_token
request[:log_stream_name_prefix] = log_stream_name_prefix if log_stream_name_prefix

# Only apply throttling retry to the API call

Suggested change
# Only apply throttling retry to the API call

end

def throttling_handler(method_name)
# New method to handle API calls with throttling retry with exponential backoff

Suggested change
# New method to handle API calls with throttling retry with exponential backoff

if @throttling_retry_seconds && retry_count < @max_retry_count
# Calculate backoff with jitter: base_time * (2^retry_count) * random factor in [0.9, 1.1)
wait_time = @throttling_retry_seconds * (2 ** retry_count) * (0.9 + 0.2 * rand)
log.warn "Haia - ThrottlingException on #{method_name}. Retry #{retry_count+1}/#{@max_retry_count}. Waiting #{wait_time.round(2)} seconds."

Could you tell me what Haia means?

log.warn "Haia - ThrottlingException on #{method_name}. Retry #{retry_count+1}/#{@max_retry_count}. Waiting #{wait_time.round(2)} seconds."
sleep wait_time

# Only retry the API call itself, not recursively

Suggested change
# Only retry the API call itself, not recursively

request[:next_token] = next_token if next_token
response = @logs.describe_log_groups(request)

# Apply throttling handling to describe_log_groups too

Suggested change
# Apply throttling handling to describe_log_groups too
