Skip to content

Conversation

@paulb777
Copy link
Member

Fix #15281

Gemini's analysis:

After analyzing firebase_auth_credentials_provider_apple.mm, I've identified the source of the crash.

The Problem

Here's the relevant section of the GetToken method in the Firestore SDK:

    1 // ...
    2   std::weak_ptr<Contents> weak_contents = contents_;
    3   void (^get_token_callback)(NSString*, NSError*) =
    4       ^(NSString* _Nullable token, NSError* _Nullable error) {
    5         std::shared_ptr<Contents> contents = weak_contents.lock();
    6         if (!contents) {
    7           return;
    8         }
    9         // ... (rest of the logic)
   10       };
   11
   12   if (contents_->auth) {
   13     [contents_->auth getTokenForcingRefresh:force_refresh_
   14                                withCallback:get_token_callback];
   15   }
   16 // ...

The C++ object FirebaseAuthCredentialsProvider holds a shared pointer (std::shared_ptr) to its internal state called contents_.

  1. A std::weak_ptr (weak_contents) is created from contents_.
  2. An Objective-C block (get_token_callback) is created to serve as the completion handler.
  3. Crucially, this block captures the std::weak_ptr. When the async getTokenForcingRefresh method is called, the block is passed along.
  4. If the C++ FirebaseAuthCredentialsProvider object is deallocated before the token fetch completes, its contents_ member is destroyed along with it.
  5. When the get_token_callback block eventually runs, weak_contents.lock() correctly fails to acquire a shared_ptr because the object is
    gone. The if (!contents) { return; } check handles this correctly.

However, the crash isn't happening there. It's happening because of what the block doesn't capture.

The block implicitly captures the C++ completion object by value. This completion is a std::function, which itself holds a reference to the necessary context to continue the Firestore operation. When the FirebaseAuthCredentialsProvider is destroyed, the context that this completion function needs is also destroyed.

The race condition is:

  1. Firestore requests a token from Auth via FirebaseAuthCredentialsProvider.
  2. The FirebaseAuthCredentialsProvider object is deallocated for some reason (e.g., the Firestore instance is shut down).
  3. The async getToken operation completes on a background thread.
  4. The get_token_callback block is invoked.
  5. Inside the block, it tries to call the completion function, but the underlying object that completion needs to operate on has been destroyed.
  6. Crash. The objc_retain in your stack trace is likely from the Objective-C runtime trying to manage the memory of one of these captured objects as the block is executed, but the object is already a zombie.

The Solution

This is a bug within the Firestore SDK. The fix is to ensure that the completion object (and any other necessary state) is handled safely within the callback block, likely by capturing self or a shared_ptr to self and checking for its existence before proceeding.

I will now patch the file to fix this issue. I'll modify the block to capture a shared_ptr to the FirebaseAuthCredentialsProvider itself, ensuring it stays alive until the callback completes.

@gemini-code-assist
Copy link
Contributor

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why the tests here are changing how the provider is created.

Do the added class members need to be public?

// Firestore/core/src/credentials/firebase_app_check_credentials_provider_apple.h
      public std::enable_shared_from_this<FirebaseAppCheckCredentialsProvider> {

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's Gemini's explanation for why the tests needed changes:

When the production code is built, it uses std::make_shared. But when the unit tests are built,
they are likely creating the FirebaseAuthCredentialsProvider object directly on the stack, like this:

1 // From your search results in firebase_auth_credentials_provider_test.mm
2 FirebaseAuthCredentialsProvider credentials_provider(app, auth);

When credentials_provider.GetToken(...) is called in the test, it's being called on a stack-allocated object. The object isn't managed
by a shared_ptr, so the shared_from_this() call inside GetToken fails.

I need to modify the unit test file (firebase-ios-sdk/Firestore/core/test/unit/credentials/firebase_auth_credentials_provider_test.mm)
to create the provider using std::make_shared, mirroring the production code. This will resolve the build error and ensure the tests are
more accurate.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that explanation sounds reasonable.

Copy link
Member

@ncooke3 ncooke3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with Cheryl and Morgan's reviews.

Copy link
Contributor

@morganchen12 morganchen12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part is a bit hand-wavy:

  1. Inside the block, it tries to call the completion function, but the underlying object that completion needs to operate on has been destroyed.
  2. Crash. The objc_retain in your stack trace is likely from the Objective-C runtime trying to manage the memory of one of these captured objects as the block is executed, but the object is already a zombie.

For 5 we don't see another stack frame for the completion function invocation (though it could be inlined), and for 6 it would be nice to know what the actual zombie object is.

However, this bit of code is the most likely culprit given it's the only(?) place we consume the Auth token refresh completion API ourselves and the C++/ObjC boundary does require special consideration in a few places. I tried leafing through the code to identify what the most likely zombie was and didn't come to any meaningful conclusions.

The extension to the lifespan of the credential providers object via capture by Auth may be non-trivial, but I will let Cheryl decide if it's impactful or not.

Please wait for Cheryl's approval before merging.

@cherylEnkidu
Copy link
Contributor

I’m looking into this, but it will likely take a few days. Please feel free to reach out if you don’t hear back from me after a while.

@paulb777
Copy link
Member Author

paulb777 commented Dec 3, 2025

@morganchen12 Here's Gemini's response to your code review:

Screenshot 2025-12-02 at 6 08 07 PM

@morganchen12
Copy link
Contributor

The assertion that the zombie object is the std::function is most likely untrue since that object would be managed with C++ calls and not objc_release, but the rest is pretty plausible. The most likely zombie object is the app_check_listener_handle_ in firebase_app_check_credentials_provider_apple.h: everything else has been wrapped in Context explicitly to protect against this kind of memory error, although using shared_ptr instead of weak_ptr to guarantee the lifespan of the objects through the invocation of the closure seems prudent.

Overall still LGTM, but wait for Cheryl's review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Crash while fetching the auth token

5 participants