Skip to content

Conversation

@GarrettBeatty
Copy link
Contributor

@GarrettBeatty GarrettBeatty commented Dec 11, 2025

Description

Previously if an error occurred when for a part, we would still queue the next parts, even though a previous part failed.

This would result in the behavior where, part 2 failed, queues part 3, part 3 fails, queues part 4, etc..

This change adds a check so that if a part fails with some exception, it will cancel the internal cancellation token to prevent other parts from getting queued.

Now the new behavior is part 2 failed, dont queue part 3.

Motivation and Context

#3806

Testing

  1. Unit test

Before my fix the unit test failed on

                    $"Part 10 should NOT be queued after Part 3 fails. Queued parts: {string.Join(", ", queuedPartsList)}");

after my fix, the test passes.

  1. I also re-ran this fix on the ec2 instance where i originally saw this error and i verified that the program was successfully cancelled immediately instead of going through all of the parts.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist

  • My code follows the code style of this project
  • My change requires a change to the documentation
  • I have updated the documentation accordingly
  • I have read the README document
  • I have added tests to cover my changes
  • All new and existing tests passed

License

  • I confirm that this pull request can be released under the Apache 2 license

@GarrettBeatty GarrettBeatty marked this pull request as ready for review December 11, 2025 17:14
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a bug where multipart downloads would continue queuing new parts even after an earlier part had failed, causing unnecessary network requests and delayed error reporting. The fix adds early cancellation logic using a ContinueWith callback on each download task to immediately cancel the internal CancellationTokenSource when any part fails, preventing the loop from queuing additional parts.

Key Changes

  • Adds a ContinueWith callback to each download task that cancels the internal token on first failure, stopping the queueing loop early
  • Improves exception capture to prioritize the original failure exception over cascading cancellation exceptions
  • Adds comprehensive unit test that validates parts stop being queued after a failure (max 6 parts instead of all 10)

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
sdk/test/Services/S3/UnitTests/Custom/MultipartDownloadManagerTests.cs Adds new test StartDownloadsAsync_PartFailsDuringDownload_StopsQueuingNewParts to verify that when Part 3 fails, the system stops queuing additional parts instead of continuing through all parts
sdk/src/Services/S3/Custom/Transfer/Internal/MultipartDownloadManager.cs Wraps the download task creation loop in try-catch for OperationCanceledException, adds ContinueWith callback to cancel internal token on task failure, improves exception handling to preserve original exception, and adds error logging to discovery methods

@GarrettBeatty GarrettBeatty changed the base branch from feature/transfermanager to gcbeatty/refactor2 December 12, 2025 21:42
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@GarrettBeatty GarrettBeatty removed the request for review from normj December 12, 2025 22:47
@GarrettBeatty GarrettBeatty changed the base branch from gcbeatty/refactor2 to feature/transfermanager December 14, 2025 19:26
@GarrettBeatty GarrettBeatty changed the base branch from feature/transfermanager to gcbeatty/refactor2 December 14, 2025 19:26
@GarrettBeatty GarrettBeatty marked this pull request as ready for review December 15, 2025 00:11
Copy link
Contributor

@peterrsongg peterrsongg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved. A small optimization could be made to reduce an allocation of an additional Collection on line https://github.com/aws/aws-sdk-net/pull/4228/changes#diff-58b3d78595e1526f2f5df0b2c5804bb23e2a97cdea6b73c0e644f8ea141ff836R42

by passing in a HashSet, but that's not a blocker for me.

Base automatically changed from gcbeatty/refactor2 to feature/transfermanager December 16, 2025 14:39
remove downloadexception field

code updates

code updates

code updates
@GarrettBeatty GarrettBeatty merged commit 9ade4e1 into feature/transfermanager Dec 16, 2025
1 check passed
@GarrettBeatty GarrettBeatty deleted the gcbeatty/cancelfix branch December 16, 2025 14:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants