@hd40910 hd40910 commented Apr 3, 2025

If we look at this Ruby code, there are two potential issues:

  1. The current implementation has a flaw in its retry mechanism. When AWS throttles the get_log_events call, the code attempts to retry, but there is no limit on how many times it will retry.
  2. The retry mechanism is inefficient because it retries the entire get_events method when throttling occurs if there are more than x events (a sketch of the intended fix follows below).
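For illustration, here is a minimal sketch of the direction this change takes, not the plugin's actual code: cap the number of retries and wrap only the throttled API call, rather than re-running the whole surrounding method. The helper name call_with_bounded_retry and its parameters are hypothetical; only get_log_events, @logs, and the ThrottlingException error class come from the discussion and the AWS SDK.

require "aws-sdk-cloudwatchlogs"

# Sketch only: bounded exponential retry around a single API call.
def call_with_bounded_retry(max_retry_count:, base_seconds:)
  retry_count = 0
  begin
    yield
  rescue Aws::CloudWatchLogs::Errors::ThrottlingException
    raise if retry_count >= max_retry_count     # give up once the limit is reached
    wait_time = base_seconds * (2**retry_count) # exponential backoff
    retry_count += 1
    sleep wait_time
    retry
  end
end

# Hypothetical usage: only the API call itself is retried, so pagination and
# event processing around it are not repeated when throttling occurs.
# response = call_with_bounded_retry(max_retry_count: 5, base_seconds: 2) do
#   @logs.get_log_events(request)
# end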
@hd40910 hd40910 (Author) commented Apr 3, 2025

@cosmo0920 for your approval; please let me know if you see any issues.

@cosmo0920 (Member) commented

Currently, I have no cycles to take a look at it.
Could y'all take a look at this, @kenhys or @daipom?

@daipom daipom self-requested a review April 10, 2025 10:52
@daipom daipom (Contributor) left a comment


@hd40910 Thanks for this improvement! Sorry for the delay.
This direction looks good to me.

I have commented on some points.
Please check them.

In addition, could you please consider the following points?

  • Remove unnecessary spaces.
    • Some new blank lines contain needless whitespace.
  • Update the README.
    • Add a description for the new parameter.
    • Update the description of throttling_retry_seconds to reflect the new exponential interval.
  • Fix the failing tests on CI.
    • It looks like they fail because the log message has changed.
    • Since the wait time will be random, it would be good to loosen the assert condition. It is unnecessary to assert an exact match of the entire message (a sketch follows this list).
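For illustration, a loosened assertion could look roughly like the following. This is a sketch, assuming a Fluent::Test driver d that exposes collected log lines via d.logs; the exact substring is illustrative, not necessarily the plugin's final message.

# Sketch only: match a stable substring of the warning instead of the full
# message, since the randomized wait time makes an exact match brittle.
assert(d.logs.any? { |line| line.include?("ThrottlingException on get_log_events") },
       "expected a throttling retry warning to be logged")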

config_param :end_time, :string, default: nil
config_param :time_range_format, :string, default: "%Y-%m-%d %H:%M:%S"
config_param :throttling_retry_seconds, :time, default: nil
config_param :max_retry_count, :integer, default: 999 #TODO

config_param :max_retry_count, :integer, default: 999 #TODO

Please consider an appropriate default value and remove the TODO comment.

I'm not familiar with CloudWatch.
Can there be a case where a user wants to retry an unlimited number of times, like the current version allows?

If so, retries should be unlimited when this value is nil.
And it might be better for the default value to be nil, for compatibility.
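For illustration, one way this suggestion could look (the helper name retry_allowed? is hypothetical, not part of the PR):

config_param :max_retry_count, :integer, default: nil

# Sketch only: nil keeps today's unlimited retries; an integer caps them.
def retry_allowed?(retry_count)
  return true if @max_retry_count.nil?
  retry_count < @max_retry_count
end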

request[:next_token] = log_next_token if !log_next_token.nil? && !log_next_token.empty?
request[:start_from_head] = true if read_from_head?(log_next_token)

# Only apply throttling retry to the API call, not the whole method

Suggested change
# Only apply throttling retry to the API call, not the whole method

Information that can be found in the code itself is not needed in comments.
(Maybe those comments were meant for the PR review. Thanks, I understand the code. Let's remove them now.)

request[:next_token] = next_token if next_token
request[:log_stream_name_prefix] = log_stream_name_prefix if log_stream_name_prefix

# Only apply throttling retry to the API call

Suggested change
# Only apply throttling retry to the API call

end

def throttling_handler(method_name)
# New method to handle API calls with throttling retry with exponential backoff

Suggested change
# New method to handle API calls with throttling retry with exponential backoff

if @throttling_retry_seconds && retry_count < @max_retry_count
# Calculate backoff with jitter: base_time * (2^retry_count) * random factor in [0.9, 1.1)
wait_time = @throttling_retry_seconds * (2 ** retry_count) * (0.9 + 0.2 * rand)
log.warn "Haia - ThrottlingException on #{method_name}. Retry #{retry_count+1}/#{@max_retry_count}. Waiting #{wait_time.round(2)} seconds."

Could you tell me what Haia means?

log.warn "Haia - ThrottlingException on #{method_name}. Retry #{retry_count+1}/#{@max_retry_count}. Waiting #{wait_time.round(2)} seconds."
sleep wait_time

# Only retry the API call itself, not recursively

Suggested change
# Only retry the API call itself, not recursively

request[:next_token] = next_token if next_token
response = @logs.describe_log_groups(request)

# Apply throttling handling to describe_log_groups too

Suggested change
# Apply throttling handling to describe_log_groups too
