Skip to content

Conversation

@alrod
Copy link
Member

@alrod alrod commented Nov 6, 2025

Proposed Improvement
We want to emit error logs to the customer’s Application Insights whenever there are connectivity issues to the resource.

Current Behavior
Exceptions are swallowed in the extension code and logged only in internal logs.

Planned Change

  • The change will allow connectivity exceptions to bubble up to the WebJobs SDK, so they can be logged in the standard WebJobs format.
  • All extensions will follow the same logging format for consistency.
  • In case of an exception, the trigger will be ignored for the current vote integration.

References
-https://github.com/Azure/azure-webjobs-sdk/blob/dev/src/Microsoft.Azure.WebJobs.Host/Scale/ScaleManager.cs#L195
-https://github.com/Azure/azure-webjobs-sdk/blob/dev/src/Microsoft.Azure.WebJobs.Host/Scale/ScaleManager.cs#L121

Additional Notes

  • The ScaleManager class in WebJobs SDK is used for both regular scale and runtime-driven scale.
  • We are also adding voting logs in the PR to standardize how the current vote is logged. These logs will also be provided to the customer.

Copilot AI review requested due to automatic review settings November 6, 2025 18:29
@github-actions github-actions bot added the Storage Storage Service (Queues, Blobs, Files) label Nov 6, 2025
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR removes exception handling logic from the QueueMetricsProvider class, changing it from gracefully handling transient storage errors by logging warnings and returning default metrics, to now propagating exceptions to callers. The test is updated to expect exceptions to be thrown rather than handled internally.

Key changes:

  • Removed try-catch blocks and error handling logic from GetMetricsAsync() and GetQueueLengthAsync() methods
  • Changed exception behavior from graceful degradation to fail-fast
  • Updated test from validating error handling behavior to expecting exceptions to propagate

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
sdk/storage/Microsoft.Azure.WebJobs.Extensions.Storage.Common/src/Shared/Queues/QueueMetricsProvider.cs Removed all exception handling from GetMetricsAsync() and GetQueueLengthAsync(), now allowing RequestFailedException and other exceptions to propagate
sdk/storage/Microsoft.Azure.WebJobs.Extensions.Storage.Queues/tests/QueueMetricsProviderTests.cs Updated test to verify exceptions are thrown instead of being handled; changed test method from async to sync; removed logging verification
Comments suppressed due to low confidence (1)

sdk/storage/Microsoft.Azure.WebJobs.Extensions.Storage.Common/src/Shared/Queues/QueueMetricsProvider.cs:19

  • The _logger field is no longer used after removing all exception handling and logging. This unused field should be removed along with the loggerFactory parameter from the constructor.
        private readonly QueueClient _queue;
        private readonly ILogger _logger;

using Azure.Storage.Queues;
using Microsoft.Azure.WebJobs.Extensions.Storage.Common.Listeners;
using Microsoft.Azure.WebJobs.Extensions.Storage.Common.Tests;
using Microsoft.Azure.WebJobs.Host.Listeners;
Copy link

Copilot AI Nov 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This using statement appears to be unused and should be removed.

Suggested change
using Microsoft.Azure.WebJobs.Host.Listeners;

Copilot uses AI. Check for mistakes.
Comment on lines +38 to +39
QueueTriggerMetrics queueMetrics = await GetMetricsAsync().ConfigureAwait(false);
return queueMetrics.QueueLength;
Copy link

Copilot AI Nov 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing exception handling from GetQueueLengthAsync() and GetMetricsAsync() changes the API contract in a breaking way. Callers like QueueTargetScaler.GetScaleResultAsync() that previously received a default value (0) during transient errors will now experience exceptions. This could cause scaling operations to fail instead of gracefully degrading. Consider whether callers are prepared to handle exceptions for transient storage issues like queue not found, queue being deleted, or server-side errors.

Copilot uses AI. Check for mistakes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do kinda agree with this but at the same time, if I recall correctly, if an exception is thrown in webjobs then it will actually emit errors to application insights actually and not tear down the entire application. @alrod @mathewc Could you help my understanding here? Just want to make sure we don't cause a breaking behavior change with this.

Copy link
Member

@mathewc mathewc Nov 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's not clear to me either whether changing these contracts across all the extensions won't be breaking. As it stands, when any of these monitors/scalers can't provide metrics, they return default/empty metrics but don't throw.

The client of these APIs is not only ScaleManager but also ScaleMonitorService for runtime driven scale. Are there any other callers of these APIs that might not be prepared to handle exceptions? We need to carefully verify.

Yes, this is a risky change to make across all these extensions. I don't understand - why won't these error messages written to the supplied ILogger go to the customer's App Insights or other log streams? The logger should already be configured to do this.

<PackageReference Update="Microsoft.Azure.SignalR.Serverless.Protocols" Version="1.10.0" />
<PackageReference Update="Microsoft.Azure.WebJobs" Version="3.0.41" />
<PackageReference Update="Microsoft.Azure.WebJobs.Sources" Version="3.0.41" PrivateAssets="All"/>
<PackageReference Update="Microsoft.Azure.WebJobs" Version="3.0.42" />
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jsquire Are the other webjob extension packages good to have this version bumped?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the heads-up, @amnguye. I have no objections to the bumps, but I do want to kick the live tests to ensure we don't accidentally blow something up.

@jsquire
Copy link
Member

jsquire commented Nov 7, 2025

/azp run net - webpubsub - tests

@jsquire
Copy link
Member

jsquire commented Nov 7, 2025

/azp run net - eventhub - functions - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

1 similar comment
@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@jsquire
Copy link
Member

jsquire commented Nov 7, 2025

/azp run net - servicebus - functions - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@jsquire
Copy link
Member

jsquire commented Nov 7, 2025

Confirmed that WebPubSub failures are consistent with nightly runs and not related to these changes. The version bumps look good.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Storage Storage Service (Queues, Blobs, Files)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants