
Conversation

@jlledom
Contributor

@jlledom jlledom commented Nov 24, 2025

Basically, we want to optimize the following rake tasks from porta:

backend:storage:enqueue_rewrite
backend:storage:rewrite
backend:storage:rewrite_provider

The proc in porta that syncs data to backend looks like this:

REWRITER = ->(cinstance) do
  cinstance.send(:update_backend_application)
  cinstance.send(:update_backend_user_key_to_application_id_mapping)
  # to ensure that there is a 'backend_object' - there are
  # invalid data floating around
  if cinstance.provider_account
    cinstance.application_keys.each { |app_key| app_key.send(:update_backend_value) }
    cinstance.referrer_filters.each { |ref_filter| ref_filter.send(:update_backend_value) }
  end
end

So, for a provider with e.g. 500 applications, this ends up with:

  • 500 requests via update_backend_application
  • 500 requests via update_backend_user_key_to_application_id_mapping
  • N requests via app_key#update_backend_value, depending on whether the service uses app_id + app_key auth and how many keys there are.
  • A few requests via ref_filter#update_backend_value

So, more than 1000 requests for this provider, and more than 1500 if the service uses the app_id + app_key auth method.

After this PR, porta will have a new backend endpoint available, so it can batch applications and send all of them in a single request.
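
For reference, the payload accepted by the new endpoint (internal_api '/services/:service_id/applications') looks roughly like the sketch below. The field names are taken from the examples later in this conversation; treat the exact shape as illustrative rather than authoritative.

# Illustrative batch payload (Ruby hash); the exact request shape is defined
# by the endpoint added in this PR, this is only an approximation.
payload = {
  applications: [
    {
      id: "0f3a6cf3",
      state: "active",
      plan_id: "7",
      plan_name: "Basic",
      redirect_url: nil,
      user_key: nil,
      application_keys: ["bbbb"],
      referrer_filters: ["*"]
    }
    # ...one entry per application, so the 1000+ individual requests above
    # collapse into a handful of batched calls per provider.
  ]
}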

@jlledom jlledom self-assigned this Nov 24, 2025
@jlledom jlledom force-pushed the THREESCALE-10555-resync-application-batches branch from 87bd9d6 to 48d7021 Compare November 24, 2025 10:14
@jlledom jlledom marked this pull request as ready for review November 24, 2025 10:19
@mayorova
Contributor

I think that's nice!
Well, the first request is to add a spec for this in docs/specs/backend_internal.yaml.

Secondly, I am thinking about the following scenario:

  • there is an application already stored in backend Redis with some referrer filters (say, ref1 and ref2) and application keys key1 and key2
  • we send an update for the application, with the current values as per system DB: ref3, ref4 for filters, and key3, key4 for keys.

Why is there a mismatch? Well, maybe there was some hiccup in the communication between system and backend, and those update jobs were lost somehow. This is one of the purposes of the storage:rewrite worker - to get the two systems back in sync.

But I believe that with the current approach we are not deleting the old values from Redis (so key1 and key2 will still be valid, even though they shouldn't be)?

I believe this is the actual behavior with the current process too, and with the current one-object-at-a-time approach it's kind of difficult to handle.

But now that we pass the exact, complete desired state of each application, maybe we could overcome this limitation?
(unless the problem doesn't actually exist and I made it up)

So, for the two arrays, we may also try to ensure that any other values that are not supposed to be there are deleted. WDYT?

Comment on lines 52 to 55
application: app.to_hash.merge(
  application_keys: app.keys,
  referrer_filters: app.referrer_filters
)
Contributor

We add the keys and referrers to the resulting object. Can we also add the user_key (it seems that it's the only thing that's missing in the resulting application)?

Contributor Author

I've been checking this and decided not to do it. The reason is the absurdly complicated way we have to store user keys in apisonator. Instead of storing it under something like

application/service_id:#{service_id}/id:#{id}/user_key

It's actually stored as:

application/service_id:#{service_id}/key:#{key}/id

So the user key itself is stored in the redis key, and the value for that key is the app id. The ancients did it again. An app can only have one user key, so when a user key is rotated, a new redis key is created. Because of this, the previous user key has to be removed manually when rotating and also when deleting an application. Any inconsistency would leave old rotated keys still valid and usable, as well as orphaned keys.

In fact, I followed the whole flow of removing an app from porta all the way to backend, and I'd say user keys are always orphaned when removing an app.

With regards to returning the user key in the response, another side effect of this weird way of storing user keys is that, if we want to retrieve the correct user key, we'd need to read all user keys for all apps in that service and return the one whose value is the given app_id.

My guess is this design was a way to authenticate apps faster when receiving a hit. I understand that, but it has these undesired side effects and we can't change it now.

The fact is, if we want to return the user key in the response, the only plausible way to do it is to just return the received :user_key parameter, as in "this is the received user key and storing it didn't fail, so it must be the same one stored in redis". But I don't like this because all the other data in the response is loaded from redis, we don't return parameter data directly in the response, and I don't think we should make an exception here.

Contributor

The fact is, if we want to return the user key in the response, the only plausible way to do it is to just return the received :user_key parameter, as in "this is the received user key and storing it didn't fail, so it must be the same one stored in redis".

This was what I had in mind for my request 😬

To verify that "it is the one stored in redis", we can just fetch application/service_id:#{service_id}/key:#{key}/id and confirm it matches the app ID.

But OK, if you really don't want to add it, it's fine.

Contributor Author

To verify that "it is the one stored in redis", we can just fetch application/service_id:#{service_id}/key:#{key}/id and confirm it matches the app ID.

I didn't think of this, and it can be done since it's only one redis command. I'll do it 👍
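
(For the record, the check would be roughly something like this — a minimal sketch assuming a plain redis client, with the key layout quoted above:)

# Minimal sketch: the user key is stored as a redis key whose value is the
# application id, so a single GET is enough to verify the mapping.
def user_key_matches?(redis, service_id, user_key, app_id)
  stored_id = redis.get("application/service_id:#{service_id}/key:#{user_key}/id")
  stored_id == app_id.to_s
end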

Contributor Author

Done: 2bd7454

I also updated tests (69362c3) and spec (8bd9e0c)

@jlledom
Contributor Author

jlledom commented Nov 26, 2025

Well, the first request is to add a spec for this in docs/specs/backend_internal.yaml.

Working on it...

Secondly, I am thinking about the following scenario:

* there is an application already stored in backend Redis with some referrer filters (say, `ref1` and `ref2`) and application keys `key1` and `key2`

* we send an update for the application, with the current values as per system DB: `ref3`, `ref4` for filters, and `key3`, `key4` for keys.

Why is there a mismatch? Well, maybe there was some hiccup in the communication between system and backend, and those update jobs were lost somehow. This is one of the purposes of the storage:rewrite worker - to get the two systems back in sync.

But I believe that with the current approach we are not deleting the old values from Redis (so key1 and key2 will still be valid, even though they shouldn't be)?

Every time you see an inconsistency like this... it was Claude!

WDYT?

Yeah, I'll fix this behavior and use a complete replace approach.

@jlledom
Contributor Author

jlledom commented Nov 26, 2025

I think that's nice! Well, the first request is to add a spec for this in docs/specs/backend_internal.yaml.

Done: 5834a0d

Secondly, I am thinking about the following scenario:

* there is an application already stored in backend Redis with some referrer filters (say, `ref1` and `ref2`) and application keys `key1` and `key2`

* we send an update for the application, with the current values as per system DB: `ref3`, `ref4` for filters, and `key3`, `key4` for keys.

Why is there a mismatch? Well, maybe there was some hiccup in the communication between system and backend, and those update jobs were lost somehow. This is one of the purposes of the storage:rewrite worker - to get the two systems back in sync.

After checking this, I now think it's not worth doing, at least in this PR, because what you mention applies to all application data and even the applications themselves, not only referrer filters:

  • applications themselves:
    • remove all existing applications not in the given list
  • user keys
    • remove old user keys of given applications
    • remove keys for all applications not in the list
  • application keys and referrer filters
    • remove old items from given applications
    • remove items from applications not in the list

This is what we would have to do if we wanted to make this endpoint really sync with porta.

Doing each of the operations above is not easy. Redis doesn't have transactions like relational DBs, so we can't just wipe the list and add the new items, because if the new items fail to be added, the list would be left wiped. Also, for a few milliseconds, the list would actually be empty while we add the new items.

That means the only approach we have is to generate the diff and remove particular items. I see this approach as error-prone, even more so considering we also have to sync the memoization cache. And all of this at runtime, with requests coming in.

What I would need in order to do this in a way I'd feel confident enough about is atomicity and rollback features in redis. According to Claude, the only way to get that is writing and executing Lua scripts. We would need to replace the save_application_with_data method I added in this PR with a Lua script that syncs a single application atomically, all-or-nothing.

I think it's interesting to investigate the Lua approach, but in another PR, in my opinion. Adding a new endpoint that does exactly the same as before but in batches is a wide enough scope for this PR; we can address the sync feature later on. WDYT?

Edit: I found this. Even with Lua scripts there is no rollback in redis; there is no way to atomically transition from one consistent state to another.
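
For illustration only, the diff-based approach discussed above would look roughly like the sketch below for a single application's key set. The set key name and the redis client are assumptions, not code from this PR; each command is independent, which is exactly the consistency gap described above.

# Hypothetical sketch of a diff-based sync for one application's keys.
def sync_application_keys(redis, service_id, app_id, desired_keys)
  set_key = "application/service_id:#{service_id}/id:#{app_id}/keys"
  current = redis.smembers(set_key)

  stale   = current - desired_keys
  missing = desired_keys - current

  # Two separate commands: a failure between them leaves the set half-synced.
  redis.srem(set_key, stale) unless stale.empty?
  redis.sadd(set_key, missing) unless missing.empty?
end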

@jlledom jlledom requested a review from mayorova November 28, 2025 11:27
Comment on lines +618 to +633
allOf:
  - $ref: "#/components/schemas/Application"
  - type: object
    properties:
      user_key:
        type:
          - "null"
          - string
      application_keys:
        type: array
        items:
          type: string
      referrer_filters:
        type: array
        items:
          type: string
Contributor

I think this is the same model as in request body now? "Application schema" + user_key, application_keys and referrer_filters...
Maybe this can be put into a schema and referenced in both places? (request and response)

Contributor Author

Not exactly the same, because the response includes the field service_id, which is not part of the request, since the request includes the service id as a URL param.

internal_api '/services/:service_id/applications' do
  # Helper method to save a single application with all its associated data
  def save_application_with_data(service_id, app_data)
    app_id = app_data[:id]
Contributor

We need to validate the presence of the id field.
Currently, this is considered successful:

{
  "applications": [
    {
      "state": "ok"
    }
  ]
}

and it creates invalid entries in redis:

[db8] > keys *
1) "application/service_id:2/id:/state"
2) "service_id:2/applications"

[db8] > get application/service_id:2/id:/state
"ok"

[db8] > smembers service_id:2/applications
1) ""

Contributor Author

I added some validation to fix this: 5d3e46d

This looks like:

"applications": [
		{
			"status": "error",
			"id": null,
			"error": "Application has no id "
		},
		{
			"status": "modified",
			"application": {
				"service_id": "3",
				"id": "0f3a6cf3",
				"state": "active",
				"plan_id": "7",
				"plan_name": "Basic",
				"redirect_url": null,
				"user_key": null,
				"application_keys": [
					"bbbb"
				],
				"referrer_filters": [
					"*"
				]
			}
		}
]
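
(As an aside, the guard itself is simple — a rough sketch only, not necessarily the exact code in 5d3e46d:)

# Illustrative guard for a missing id; returns an error entry like the one
# in the response above, or nil when the id is present.
def application_error_entry(app_data)
  app_id = app_data[:id]
  return nil unless app_id.nil? || app_id.to_s.strip.empty?

  { status: "error", id: app_id, error: "Application has no id #{app_id}" }
end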

In your request example, you're sending an invalid state value: ok. Since I was already working on validation, I decided to validate this field too... however, apparently there are some inconsistencies about which states should be accepted.

Apparently, there are four possible states a cinstance can be in (ref):

state :pending
state :live
state :suspended
state :deprecated

But, when sending this to backend, live is translated into active (ref):

if plan && service
  state = self.state
  state = :active if live?

  ThreeScale::Core::Application.save( :service_id => service.backend_id,
                                      :id         => application_id,
                                      :state      => state,
                                      :plan_id    => plan.id,
                                      :plan_name  => plan.name,
                                      :redirect_url => redirect_url )
end

However, some tests in apisonator insert live directly into the DB:

Application.save(service_id: @service_id, id: @app_id, state: :live)

According to the API Docs, only active, pending and suspended are allowed.

state:
  type: string
  enum:
    - active
    - pending
    - suspended

For that reason, I validate only those three states, and I modified the tests so they don't insert live anymore. However, I'm aware we could have anything in production, so this might be a breaking change. WDYT?

Contributor

Ah, right, at first I thought this validation would only apply to the new batch updates - then it wouldn't be concerning. But indeed, it can happen for any application update, and we don't know what's in production.

So, I'd remove the ApplicationStateInvalid error.

Indeed, in my example I used state=ok, but I guess I wasn't clear enough that that was not what was bothering me; the only problem I saw was the missing ID, not the "invalid" state.

Contributor Author

Done: 316ea2f

@jlledom jlledom requested a review from mayorova December 9, 2025 16:31
Contributor

@mayorova mayorova left a comment

Looks good 👍
