
Conversation

@jlledom
Contributor

@jlledom jlledom commented Nov 24, 2025

Basically, we want to optimize the following rake tasks from porta:

backend:storage:enqueue_rewrite
backend:storage:rewrite
backend:storage:rewrite_provider

The proc in porta that syncs data to backend looks like this:

REWRITER = ->(cinstance) do
  cinstance.send(:update_backend_application)
  cinstance.send(:update_backend_user_key_to_application_id_mapping)
  # to ensure that there is a 'backend_object' - there are
  # invalid data floating around
  if cinstance.provider_account
    cinstance.application_keys.each { |app_key| app_key.send(:update_backend_value) }
    cinstance.referrer_filters.each { |ref_filter| ref_filter.send(:update_backend_value) }
  end
end

So, for a provider with e.g. 500 applications, this ends up with:

  • 500 requests via update_backend_application
  • 500 requests via update_backend_user_key_to_application_id_mapping
  • N requests via app_key#update_backend_value, depending on whether the service uses app_id + app_key auth and how many keys there are.
  • A few requests via ref_filter#update_backend_value

So, more than 1000 requests for this provider, and more than 1500 if the service uses the app_id + app_key auth method.

After this PR, porta will have a new backend endpoint available, so it can batch applications and send all of them in a single request.
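
For reference, the payload accepted by the new endpoint (internal_api '/services/:service_id/applications') looks roughly like the sketch below. The field names are taken from the examples later in this conversation; treat the exact shape as illustrative rather than authoritative.

# Illustrative batch payload (Ruby hash); the exact request shape is defined
# by the endpoint added in this PR, this is only an approximation.
payload = {
  applications: [
    {
      id: "0f3a6cf3",
      state: "active",
      plan_id: "7",
      plan_name: "Basic",
      redirect_url: nil,
      user_key: nil,
      application_keys: ["bbbb"],
      referrer_filters: ["*"]
    }
    # ...one entry per application, so the 1000+ individual requests above
    # collapse into a handful of batched calls per provider.
  ]
}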

@jlledom jlledom self-assigned this Nov 24, 2025
@jlledom jlledom force-pushed the THREESCALE-10555-resync-application-batches branch from 87bd9d6 to 48d7021 Compare November 24, 2025 10:14
@jlledom jlledom marked this pull request as ready for review November 24, 2025 10:19
@mayorova
Contributor

I think that's nice!
Well, the first request is to add a spec for this in docs/specs/backend_internal.yaml.

Secondly, I am thinking about the following scenario:

  • there is an application already stored in backend Redis with some referrer filters (say, ref1 and ref2) and application keys key1 and key2
  • we send an update for the application, with the current values as per system DB: ref3, ref4 for filters, and key3, key4 for keys.

Why is there a mismatch? Well, maybe there was some hiccup in the communication between system and backend, and those update jobs were lost somehow. This is one of the purposes of the storage:rewrite worker - to get the two systems back in sync.

But I believe that with the current approach we are not deleting the old values from Redis (so key1 and key2 will still be valid, even though they shouldn't be)?

I believe this is the actual behavior with the current process too, and with the current one-object-at-a-time approach it's kind of difficult to handle.

But now that we pass the exact, complete desired state of each application, maybe we could overcome this limitation?
(unless the problem doesn't actually exist and I made it up)

So, for the two arrays, we may also try to ensure that any other values that are not supposed to be there are deleted. WDYT?

Comment on lines 52 to 55
application: app.to_hash.merge(
  application_keys: app.keys,
  referrer_filters: app.referrer_filters
)
Contributor

We add the keys and referrers to the resulting object. Can we also add the user_key (it seems that it's the only thing that's missing in the resulting application)?

Contributor Author

I've been checking this and decided not to do it. The reason is the absurdly complicated way we have to store user keys in apisonator. Instead of storing it under something like

application/service_id:#{service_id}/id:#{id}/user_key

It's actually stored as:

application/service_id:#{service_id}/key:#{key}/id

So the user key itself is stored in the redis key, and the value for that key is the app id. The ancients did it again. An app can only have one user key, so when a user key is rotated, a new redis key is created. Because of this, the previous user key has to be removed manually when rotating and also when deleting an application. Any inconsistency would leave old rotated keys still valid and usable, as well as orphaned keys.

In fact, I followed the whole flow of removing an app from porta all the way to backend, and I'd say user keys are always orphaned when removing an app.

With regards to returning the user key in the response, another side effect of this weird way of storing user keys is that, if we want to retrieve the correct user key, we'd need to read all user keys for all apps in that service and return the one whose value is the given app_id.

My guess is this design was a way to authenticate apps faster when receiving a hit. I understand that, but it has these undesired side effects and we can't change it now.

The fact is, if we want to return the user key in the response, the only plausible way to do it is to just return the received :user_key parameter, as in "this is the received user key and storing it didn't fail, so it must be the same one stored in redis". But I don't like this because all the other data in the response is loaded from redis, we don't return parameter data directly in the response, and I don't think we should make an exception here.

Contributor

The fact is, if we want to return the user key in the response, the only plausible way to do it is to just return the received :user_key parameter, as in "this is the received user key and storing it didn't fail, so it must be the same one stored in redis".

This was what I had in mind for my request 😬

To verify that "it is the one stored in redis", we can just fetch application/service_id:#{service_id}/key:#{key}/id and confirm it matches the app ID.

But OK, if you really don't want to add it, it's fine.

Contributor Author

To verify that "it is the one stored in redis", we can just fetch application/service_id:#{service_id}/key:#{key}/id and confirm it matches the app ID.

I didn't think of this, and it can be done since it's only one redis command. I'll do it 👍
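
(For the record, the check would be roughly something like this — a minimal sketch assuming a plain redis client, with the key layout quoted above:)

# Minimal sketch: the user key is stored as a redis key whose value is the
# application id, so a single GET is enough to verify the mapping.
def user_key_matches?(redis, service_id, user_key, app_id)
  stored_id = redis.get("application/service_id:#{service_id}/key:#{user_key}/id")
  stored_id == app_id.to_s
end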

Contributor Author

Done: 2bd7454

I also updated tests (69362c3) and spec (8bd9e0c)

@jlledom
Contributor Author

jlledom commented Nov 26, 2025

Well, the first request is to add a spec for this in docs/specs/backend_internal.yaml.

Working on it...

Secondly, I am thinking about the following scenario:

* there is an application already stored in backend Redis with some referrer filters (say, `ref1` and `ref2`) and application keys `key1` and `key2`

* we send an update for the application, with the current values as per system DB: `ref3`, `ref4` for filters, and `key3`, `key4` for keys.

Why is there a mismatch? Well, maybe there was some hiccup in the communication between system and backend, and those update jobs were lost somehow. This is one of the purposes of the storage:rewrite worker - to get the two systems back in sync.

But I believe that with the current approach we are not deleting the old values from Redis (so key1 and key2 will still be valid, even though they shouldn't be)?

Every time you see an inconsistency like this... it was Claude!

WDYT?

Yeah, I'll fix this behavior and use a complete replace approach.

@jlledom
Contributor Author

jlledom commented Nov 26, 2025

I think that's nice! Well, the first request is to add a spec for this in docs/specs/backend_internal.yaml.

Done: 5834a0d

Secondly, I am thinking about the following scenario:

* there is an application already stored in backend Redis with some referrer filters (say, `ref1` and `ref2`) and application keys `key1` and `key2`

* we send an update for the application, with the current values as per system DB: `ref3`, `ref4` for filters, and `key3`, `key4` for keys.

Why is there a mismatch? Well, maybe there was some hiccup in the communication between system and backend, and those update jobs were lost somehow. This is one of the purposes of the storage:rewrite worker - to get the two systems back in sync.

After checking this, I now think it's not worth doing, at least in this PR, because what you mention applies to all application data and even the applications themselves, not only referrer filters:

  • applications themselves:
    • remove all existing applications not in the given list
  • user keys
    • remove old user keys of given applications
    • remove keys for all applications not in the list
  • application keys and referrer filters
    • remove old items from given applications
    • remove items from applications not in the list

This is what we would have to do if we wanted to make this endpoint really sync with porta.

Doing each of the operations above is not easy. Redis doesn't have transactions like relational DBs, so we can't just wipe the list and add the new items, because if the new items fail to be added, the list would be left wiped. Also, for a few milliseconds, the list would actually be empty while we add the new items.

That means the only approach we have is to generate the diff and remove particular items. I see this approach as error-prone, even more so considering we also have to sync the memoization cache. And all of this at runtime, with requests coming in.

What I would need in order to do this in a way I'd feel confident enough about is atomicity and rollback features in redis. According to Claude, the only way to get that is writing and executing Lua scripts. We would need to replace the save_application_with_data method I added in this PR with a Lua script that syncs a single application atomically, all-or-nothing.

I think it's interesting to investigate the Lua approach, but in another PR, in my opinion. Adding a new endpoint that does exactly the same as before but in batches is a wide enough scope for this PR; we can address the sync feature later on. WDYT?

Edit: I found this. Even with Lua scripts there is no rollback in redis; there is no way to atomically transition from one consistent state to another.
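
For illustration only, the diff-based approach discussed above would look roughly like the sketch below for a single application's key set. The set key name and the redis client are assumptions, not code from this PR; each command is independent, which is exactly the consistency gap described above.

# Hypothetical sketch of a diff-based sync for one application's keys.
def sync_application_keys(redis, service_id, app_id, desired_keys)
  set_key = "application/service_id:#{service_id}/id:#{app_id}/keys"
  current = redis.smembers(set_key)

  stale   = current - desired_keys
  missing = desired_keys - current

  # Two separate commands: a failure between them leaves the set half-synced.
  redis.srem(set_key, stale) unless stale.empty?
  redis.sadd(set_key, missing) unless missing.empty?
end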

@jlledom jlledom requested a review from mayorova November 28, 2025 11:27
Comment on lines +618 to +633
allOf:
  - $ref: "#/components/schemas/Application"
  - type: object
    properties:
      user_key:
        type:
          - "null"
          - string
      application_keys:
        type: array
        items:
          type: string
      referrer_filters:
        type: array
        items:
          type: string
Contributor

I think this is the same model as in request body now? "Application schema" + user_key, application_keys and referrer_filters...
Maybe this can be put into a schema and referenced in both places? (request and response)

Contributor Author

Not exactly the same, because the response includes the field service_id, which is not part of the request, since the request includes the service id as a URL param.

internal_api '/services/:service_id/applications' do
  # Helper method to save a single application with all its associated data
  def save_application_with_data(service_id, app_data)
    app_id = app_data[:id]
Contributor

We need to validate the presence of the id field.
Currently, this is considered successful:

{
  "applications": [
    {
      "state": "ok"
    }
  ]
}

and it creates invalid entries in redis:

[db8] > keys *
1) "application/service_id:2/id:/state"
2) "service_id:2/applications"

[db8] > get application/service_id:2/id:/state
"ok"

[db8] > smembers service_id:2/applications
1) ""

Contributor Author

I added some validation to fix this: 5d3e46d

This looks like:

"applications": [
		{
			"status": "error",
			"id": null,
			"error": "Application has no id "
		},
		{
			"status": "modified",
			"application": {
				"service_id": "3",
				"id": "0f3a6cf3",
				"state": "active",
				"plan_id": "7",
				"plan_name": "Basic",
				"redirect_url": null,
				"user_key": null,
				"application_keys": [
					"bbbb"
				],
				"referrer_filters": [
					"*"
				]
			}
		}
]
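
(As an aside, the guard itself is simple — a rough sketch only, not necessarily the exact code in 5d3e46d:)

# Illustrative guard for a missing id; returns an error entry like the one
# in the response above, or nil when the id is present.
def application_error_entry(app_data)
  app_id = app_data[:id]
  return nil unless app_id.nil? || app_id.to_s.strip.empty?

  { status: "error", id: app_id, error: "Application has no id #{app_id}" }
end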

In your request example, you're sending an invalid state value: ok. Since I was already working on validation, I decided to validate this field too... however, apparently there are some inconsistencies about which states should be accepted.

Apparently, there are four possible states a cinstance can be in (ref):

state :pending
state :live
state :suspended
state :deprecated

But, when sending this to backend, live is translated into active (ref):

if plan && service
  state = self.state
  state = :active if live?

  ThreeScale::Core::Application.save( :service_id => service.backend_id,
                                      :id         => application_id,
                                      :state      => state,
                                      :plan_id    => plan.id,
                                      :plan_name  => plan.name,
                                      :redirect_url => redirect_url )
end

However, some tests in apisonator insert live directly into the DB:

Application.save(service_id: @service_id, id: @app_id, state: :live)

According to the API Docs, only active, pending and suspended are allowed.

state:
  type: string
  enum:
    - active
    - pending
    - suspended

For that reason, I validate only those three states, and I modified the tests so they don't insert live anymore. However, I'm aware we could have anything in production, so this might be a breaking change. WDYT?

Contributor

Ah, right, at first I thought this validation would only apply to the new batch updates - then it wouldn't be concerning. But indeed, it can happen for any application update, and we don't know what's in production.

So, I'd remove the ApplicationStateInvalid error.

Indeed, in my example I used state=ok, but I guess I wasn't clear enough that that was not what was bothering me; the only problem I saw was the missing ID, not the "invalid" state.

Contributor Author

Done: 316ea2f

@jlledom jlledom requested a review from mayorova December 9, 2025 16:31
Contributor

@mayorova mayorova left a comment

Looks good 👍
