Update in_cloudwatch_logs.rb for better throttling handling#264
Conversation
If we look at this ruby code , there are 2 potential issues: 1. The current implementation has a flaw in it retry mechanism. When AWS throttles the get_log_events call, the code attempts to retry, but there's no limit to how many times it will retry 2. The retry mechanism is inefficient because it's retrying the entire get_events method when throttling occurs if there are more than x events
|
@cosmo0920 for your approval, please do inform if you see issues |
daipom
left a comment
There was a problem hiding this comment.
@hd40910 Thanks for this improvement! Sorry for the delay.
This direction looks good to me.
I have commented on some points.
Please check them.
In addition, please consider the following points?
- Remove unnecessary spaces.
- Some new blank lines contain needless whitespaces.
- Update README.
- Add description for the new parameter.
- Change description of
throttling_retry_secondsabout the exponential interval.
- Fix failing tests on CI.
- Looks like they fail because the log message has changed.
- Since the wait time will be random, it would be good to loosen the assert condition. It would be unnecessary to assert an exact match for the entire message.)
| config_param :end_time, :string, default: nil | ||
| config_param :time_range_format, :string, default: "%Y-%m-%d %H:%M:%S" | ||
| config_param :throttling_retry_seconds, :time, default: nil | ||
| config_param :max_retry_count, :integer, default: 999 #TODO |
There was a problem hiding this comment.
config_param :max_retry_count, :integer, default: 999 #TODO
Please consider the appropriate default value and remove the TODO comment.
I'm not familiar with CloudWatch.
Can there be a case where a user wants to retry an unlimited number of times, like the current version?
If so, retry should be unlimited when this value is nil.
And the default value might be better to be nil for compatibility.
| request[:next_token] = log_next_token if !log_next_token.nil? && !log_next_token.empty? | ||
| request[:start_from_head] = true if read_from_head?(log_next_token) | ||
|
|
||
| # Only apply throttling retry to the API call, not the whole method |
There was a problem hiding this comment.
| # Only apply throttling retry to the API call, not the whole method |
Information that can be found in the code is not required for comments.
(Maybe those comments are for PR. Thanks. I understand the code. Let's remove them now.)
| request[:next_token] = next_token if next_token | ||
| request[:log_stream_name_prefix] = log_stream_name_prefix if log_stream_name_prefix | ||
|
|
||
| # Only apply throttling retry to the API call |
There was a problem hiding this comment.
| # Only apply throttling retry to the API call |
| end | ||
|
|
||
| def throttling_handler(method_name) | ||
| # New method to handle API calls with throttling retry with exponential backoff |
There was a problem hiding this comment.
| # New method to handle API calls with throttling retry with exponential backoff |
| if @throttling_retry_seconds && retry_count < @max_retry_count | ||
| # Calculate backoff with jitter: base_time * (2^retry_count) + random_jitter | ||
| wait_time = @throttling_retry_seconds * (2 ** retry_count) * (0.9 + 0.2 * rand) | ||
| log.warn "Haia - ThrottlingException on #{method_name}. Retry #{retry_count+1}/#{@max_retry_count}. Waiting #{wait_time.round(2)} seconds." |
There was a problem hiding this comment.
Could you tell me what Haia means?
| log.warn "Haia - ThrottlingException on #{method_name}. Retry #{retry_count+1}/#{@max_retry_count}. Waiting #{wait_time.round(2)} seconds." | ||
| sleep wait_time | ||
|
|
||
| # Only retry the API call itself, not recursively |
There was a problem hiding this comment.
| # Only retry the API call itself, not recursively |
| request[:next_token] = next_token if next_token | ||
| response = @logs.describe_log_groups(request) | ||
|
|
||
| # Apply throttling handling to describe_log_groups too |
There was a problem hiding this comment.
| # Apply throttling handling to describe_log_groups too |
If we look at this ruby code , there are 2 potential issues: