-
Notifications
You must be signed in to change notification settings - Fork 5.1k
[AzureFunctions] [Storage] Avoid swallowing exceptions when retrieving votes #53737
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR removes exception handling logic from the QueueMetricsProvider class, changing it from gracefully handling transient storage errors by logging warnings and returning default metrics, to now propagating exceptions to callers. The test is updated to expect exceptions to be thrown rather than handled internally.
Key changes:
- Removed try-catch blocks and error handling logic from
GetMetricsAsync()andGetQueueLengthAsync()methods - Changed exception behavior from graceful degradation to fail-fast
- Updated test from validating error handling behavior to expecting exceptions to propagate
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| sdk/storage/Microsoft.Azure.WebJobs.Extensions.Storage.Common/src/Shared/Queues/QueueMetricsProvider.cs | Removed all exception handling from GetMetricsAsync() and GetQueueLengthAsync(), now allowing RequestFailedException and other exceptions to propagate |
| sdk/storage/Microsoft.Azure.WebJobs.Extensions.Storage.Queues/tests/QueueMetricsProviderTests.cs | Updated test to verify exceptions are thrown instead of being handled; changed test method from async to sync; removed logging verification |
Comments suppressed due to low confidence (1)
sdk/storage/Microsoft.Azure.WebJobs.Extensions.Storage.Common/src/Shared/Queues/QueueMetricsProvider.cs:19
- The
_loggerfield is no longer used after removing all exception handling and logging. This unused field should be removed along with theloggerFactoryparameter from the constructor.
private readonly QueueClient _queue;
private readonly ILogger _logger;
| using Azure.Storage.Queues; | ||
| using Microsoft.Azure.WebJobs.Extensions.Storage.Common.Listeners; | ||
| using Microsoft.Azure.WebJobs.Extensions.Storage.Common.Tests; | ||
| using Microsoft.Azure.WebJobs.Host.Listeners; |
Copilot
AI
Nov 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This using statement appears to be unused and should be removed.
| using Microsoft.Azure.WebJobs.Host.Listeners; |
| QueueTriggerMetrics queueMetrics = await GetMetricsAsync().ConfigureAwait(false); | ||
| return queueMetrics.QueueLength; |
Copilot
AI
Nov 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing exception handling from GetQueueLengthAsync() and GetMetricsAsync() changes the API contract in a breaking way. Callers like QueueTargetScaler.GetScaleResultAsync() that previously received a default value (0) during transient errors will now experience exceptions. This could cause scaling operations to fail instead of gracefully degrading. Consider whether callers are prepared to handle exceptions for transient storage issues like queue not found, queue being deleted, or server-side errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do kinda agree with this but at the same time, if I recall correctly, if an exception is thrown in webjobs then it will actually emit errors to application insights actually and not tear down the entire application. @alrod @mathewc Could you help my understanding here? Just want to make sure we don't cause a breaking behavior change with this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's not clear to me either whether changing these contracts across all the extensions won't be breaking. As it stands, when any of these monitors/scalers can't provide metrics, they return default/empty metrics but don't throw.
The client of these APIs is not only ScaleManager but also ScaleMonitorService for runtime driven scale. Are there any other callers of these APIs that might not be prepared to handle exceptions? We need to carefully verify.
Yes, this is a risky change to make across all these extensions. I don't understand - why won't these error messages written to the supplied ILogger go to the customer's App Insights or other log streams? The logger should already be configured to do this.
| <PackageReference Update="Microsoft.Azure.SignalR.Serverless.Protocols" Version="1.10.0" /> | ||
| <PackageReference Update="Microsoft.Azure.WebJobs" Version="3.0.41" /> | ||
| <PackageReference Update="Microsoft.Azure.WebJobs.Sources" Version="3.0.41" PrivateAssets="All"/> | ||
| <PackageReference Update="Microsoft.Azure.WebJobs" Version="3.0.42" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jsquire Are the other webjob extension packages good to have this version bumped?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the heads-up, @amnguye. I have no objections to the bumps, but I do want to kick the live tests to ensure we don't accidentally blow something up.
|
/azp run net - webpubsub - tests |
|
/azp run net - eventhub - functions - tests |
|
Azure Pipelines successfully started running 1 pipeline(s). |
1 similar comment
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run net - servicebus - functions - tests |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
Confirmed that WebPubSub failures are consistent with nightly runs and not related to these changes. The version bumps look good. |
Proposed Improvement
We want to emit error logs to the customer’s Application Insights whenever there are connectivity issues to the resource.
Current Behavior
Exceptions are swallowed in the extension code and logged only in internal logs.
Planned Change
References
-https://github.com/Azure/azure-webjobs-sdk/blob/dev/src/Microsoft.Azure.WebJobs.Host/Scale/ScaleManager.cs#L195
-https://github.com/Azure/azure-webjobs-sdk/blob/dev/src/Microsoft.Azure.WebJobs.Host/Scale/ScaleManager.cs#L121
Additional Notes