Conversation

@yurydelendik (Contributor) commented Sep 2, 2025

Draft implementation of the proposal that can be found at https://github.com/WebAssembly/custom-descriptors/blob/main/proposals/custom-descriptors/Overview.md.

Keeping it as a draft:

  • Decide whether a PackedIndex or a RefType mask extension for exact types is okay
  • Ensure validation rules are properly implemented
  • Has some support for wasm-smith
  • Exact / heap types for imports/ref.func

There are no tests or valid examples in the proposal yet, so I'm providing my own.

@alexcrichton (Member) left a comment

I only skimmed the validation/bit-packing-RefType bits and I'll cc @fitzgen for those bits too as I'm sure he'll be interested in them when they shape up. Otherwise I left a few comments here and there. Overall thanks for this though!

@fitzgen (Member) left a comment

Thanks!

@yurydelendik force-pushed the custom-descr branch 3 times, most recently from 9763983 to 9d29633 on October 23, 2025 at 17:56
@fitzgen (Member) left a comment

Changing RefType to 4 bytes is going to make ValType larger than 4 bytes, which has been observed to be a pretty big slowdown in past work. I'm going to try to avoid that by:

  • Allowing ValType to remain a nice Rust enum and grow larger than 4 bytes, but then
  • Creating an internal ValTypePacked type that is still 4 bytes but is not a nice enum and has worse ergonomics
  • Making the core validation loop use ValTypePacked instead of ValType

This should relieve the pressure you're bumping up against. I'll whip this up and then once it has landed on main you can rebase and then this should be able to merge (modulo any other review comments Alex might have, I haven't looked at anything other than the RefType bitpacking stuff).
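A minimal sketch of that split, with all names and bit encodings invented here for illustration (wasmparser's real definitions differ): the ergonomic enum grows past 4 bytes, while a packed companion type keeps the hot-path representation at 4 bytes.

```rust
// Ergonomic public representation: nice to match on, but the `Ref` payload
// (nullability plus a type index) pushes the enum to 8 bytes.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ValType {
    I32,
    I64,
    F32,
    F64,
    V128,
    Ref { nullable: bool, index: u32 },
}

/// Internal 4-byte form: variant tag in the low 3 bits, nullable flag in
/// bit 3, type index in the remaining 28 bits. Worse ergonomics, smaller.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ValTypePacked(u32);

impl ValTypePacked {
    fn pack(ty: ValType) -> ValTypePacked {
        ValTypePacked(match ty {
            ValType::I32 => 0,
            ValType::I64 => 1,
            ValType::F32 => 2,
            ValType::F64 => 3,
            ValType::V128 => 4,
            ValType::Ref { nullable, index } => {
                assert!(index < 1 << 28, "type index must fit in 28 bits");
                5 | (u32::from(nullable) << 3) | (index << 4)
            }
        })
    }

    fn unpack(self) -> ValType {
        match self.0 & 0b111 {
            0 => ValType::I32,
            1 => ValType::I64,
            2 => ValType::F32,
            3 => ValType::F64,
            4 => ValType::V128,
            _ => ValType::Ref {
                nullable: self.0 & 0b1000 != 0,
                index: self.0 >> 4,
            },
        }
    }
}

fn main() {
    // The packed form stays at 4 bytes while the ergonomic enum is 8.
    assert_eq!(std::mem::size_of::<ValTypePacked>(), 4);
    assert_eq!(std::mem::size_of::<ValType>(), 8);
    let ty = ValType::Ref { nullable: true, index: 1234 };
    assert_eq!(ValTypePacked::pack(ty).unpack(), ty);
    println!("ok");
}
```

The intent is that the core validation loop stores `ValTypePacked` and the pack/unpack conversions inline away at the boundaries; whether that actually happens is the question the benchmarks below answer.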

Thanks for your patience @yurydelendik!

@fitzgen (Member) commented Oct 24, 2025

I'll whip this up

I was hoping I would finish this up before the weekend but it looks like I won't. Since the CG meeting in Munich is next week, I probably won't finish this until the week afterwards.

@fitzgen (Member) commented Nov 7, 2025

Draft PR here: #2369

Alex has ideas about landing this PR before that, though.

@alexcrichton (Member)

Ok I've done a bit of benchmarking now. With this PR as-is:

bench results
validate/tests          time:   [8.5038 ms 8.5072 ms 8.5128 ms]
                        change: [+3.1259% +3.2126% +3.3074%] (p = 0.00 < 0.05)
                        Performance has regressed.

validate/lots-of-unique-fn-types
                        time:   [1.6182 ms 1.6190 ms 1.6200 ms]
                        change: [+4.2532% +4.3423% +4.4290%] (p = 0.00 < 0.05)
                        Performance has regressed.

validate-old/lots-of-unique-fn-types
                        time:   [764.95 µs 765.34 µs 765.94 µs]
                        change: [+5.6912% +5.7338% +5.7780%] (p = 0.00 < 0.05)
                        Performance has regressed.

validate/intgemm-simd   time:   [725.08 µs 725.26 µs 725.46 µs]
                        change: [-2.5999% -2.5459% -2.5027%] (p = 0.00 < 0.05)
                        Performance has improved.

validate-old/intgemm-simd
                        time:   [713.58 µs 713.74 µs 713.90 µs]
                        change: [-2.7302% -2.6875% -2.6424%] (p = 0.00 < 0.05)
                        Performance has improved.

validate/bz2            time:   [313.12 µs 313.41 µs 313.75 µs]
                        change: [-7.5212% -7.4616% -7.3949%] (p = 0.00 < 0.05)
                        Performance has improved.

validate-old/bz2        time:   [310.04 µs 310.17 µs 310.32 µs]
                        change: [-8.2737% -8.1943% -8.0962%] (p = 0.00 < 0.05)
                        Performance has improved.

validate/lots-of-empty-fn-types
                        time:   [611.75 µs 612.03 µs 612.33 µs]
                        change: [+6.5110% +6.6114% +6.7036%] (p = 0.00 < 0.05)
                        Performance has regressed.

validate-old/lots-of-empty-fn-types
                        time:   [687.19 µs 687.32 µs 687.48 µs]
                        change: [+6.3189% +6.3881% +6.4912%] (p = 0.00 < 0.05)
                        Performance has regressed.

validate/spidermonkey   time:   [18.976 ms 18.981 ms 18.987 ms]
                        change: [-3.8797% -3.8300% -3.7809%] (p = 0.00 < 0.05)
                        Performance has improved.

validate-old/spidermonkey
                        time:   [19.005 ms 19.008 ms 19.012 ms]
                        change: [-3.8858% -3.8447% -3.8097%] (p = 0.00 < 0.05)
                        Performance has improved.

validate/pulldown-cmark time:   [793.03 µs 793.35 µs 793.79 µs]
                        change: [-2.5407% -2.4893% -2.4298%] (p = 0.00 < 0.05)
                        Performance has improved.

validate-old/pulldown-cmark
                        time:   [787.91 µs 788.14 µs 788.41 µs]
                        change: [-3.1519% -3.1091% -3.0608%] (p = 0.00 < 0.05)
                        Performance has improved.

So not obviously bad. It's actually a speedup on all 'real world' modules but is an equivalent regression on synthetic stressor modules.

Next I took #2369 and massaged it into the validator as well. The commit is alexcrichton@c2d95e4 and is pretty messy, as my goal was just to get to the point of benchmarking. Try as I might, it's a net regression across the board:

bench numbers
validate/tests          time:   [8.6207 ms 8.6267 ms 8.6365 ms]
                        change: [+4.5500% +4.6617% +4.8015%] (p = 0.00 < 0.05)
                        Performance has regressed.

validate/lots-of-unique-fn-types
                        time:   [1.6303 ms 1.6312 ms 1.6324 ms]
                        change: [+5.0902% +5.2510% +5.4381%] (p = 0.00 < 0.05)
                        Performance has regressed.

validate-old/lots-of-unique-fn-types
                        time:   [756.01 µs 756.24 µs 756.51 µs]
                        change: [+4.5071% +4.5594% +4.6147%] (p = 0.00 < 0.05)
                        Performance has regressed.

validate/intgemm-simd   time:   [793.10 µs 793.51 µs 794.08 µs]
                        change: [+6.5632% +6.6305% +6.6962%] (p = 0.00 < 0.05)
                        Performance has regressed.

validate-old/intgemm-simd
                        time:   [781.38 µs 781.66 µs 781.94 µs]
                        change: [+6.5166% +6.5586% +6.6019%] (p = 0.00 < 0.05)
                        Performance has regressed.

validate/bz2            time:   [347.23 µs 347.40 µs 347.69 µs]
                        change: [+2.5307% +2.5825% +2.6475%] (p = 0.00 < 0.05)
                        Performance has regressed.

validate-old/bz2        time:   [344.52 µs 344.78 µs 345.28 µs]
                        change: [+1.9069% +2.0015% +2.1150%] (p = 0.00 < 0.05)
                        Performance has regressed.

validate/lots-of-empty-fn-types
                        time:   [610.01 µs 610.26 µs 610.52 µs]
                        change: [+6.2306% +6.3265% +6.4184%] (p = 0.00 < 0.05)
                        Performance has regressed.

validate-old/lots-of-empty-fn-types
                        time:   [678.92 µs 679.28 µs 679.86 µs]
                        change: [+5.0427% +5.1188% +5.2261%] (p = 0.00 < 0.05)
                        Performance has regressed.

validate/spidermonkey   time:   [20.722 ms 20.729 ms 20.737 ms]
                        change: [+4.9696% +5.0265% +5.0852%] (p = 0.00 < 0.05)
                        Performance has regressed.

validate-old/spidermonkey
                        time:   [20.741 ms 20.745 ms 20.749 ms]
                        change: [+4.8940% +4.9394% +4.9792%] (p = 0.00 < 0.05)
                        Performance has regressed.

validate/pulldown-cmark time:   [857.67 µs 857.86 µs 858.06 µs]
                        change: [+5.3636% +5.4135% +5.4581%] (p = 0.00 < 0.05)
                        Performance has regressed.


validate-old/pulldown-cmark
                        time:   [855.23 µs 855.41 µs 855.60 µs]
                        change: [+5.0936% +5.1401% +5.1899%] (p = 0.00 < 0.05)
                        Performance has regressed.

I've tried a number of ways to further optimize this in my commit, but to no avail. In retrospect this makes sense though:

  • ValType is extremely pervasive, and this PR inflates it from 4 to 8 bytes. While we're changing the validator to store ValTypePacked, nothing else is changing, so everywhere else gets a larger memory footprint. I think this is why the synthetic modules are regressing.
  • Validator performance with ValTypePacked is pretty bad because, looking at the generated code, there's a lot of conversion between ValType and ValTypePacked. The theory of ValTypePacked is that its conversion gets inlined away and is zero-cost, but in practice that's just not happening. It's certainly happening in some places, and I can see it, but it's not happening in all cases, which is basically what's required here.
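The size jump in the first bullet follows from how Rust lays out enums: tag plus the largest payload, rounded up to the payload's alignment. A toy illustration (these are not wasmparser's real definitions):

```rust
use std::mem::size_of;

// Payload fits in 3 loosely-aligned bytes: 1-byte tag + 3 bytes of payload
// gives a 4-byte enum with alignment 1.
#[allow(dead_code)]
#[derive(Clone, Copy)]
enum SmallVal {
    I32,
    I64,
    Ref([u8; 3]),
}

// A full u32 payload forces 4-byte alignment, so tag + padding + payload
// rounds the whole enum up to 8 bytes.
#[allow(dead_code)]
#[derive(Clone, Copy)]
enum WideVal {
    I32,
    I64,
    Ref(u32),
}

fn main() {
    assert_eq!(size_of::<SmallVal>(), 4);
    assert_eq!(size_of::<WideVal>(), 8);
    println!("SmallVal: {} bytes, WideVal: {} bytes",
             size_of::<SmallVal>(), size_of::<WideVal>());
}
```

Any struct, Vec, or stack slot holding the wide enum doubles its per-element footprint, which is how a one-field change ripples into the memory behavior of everything outside the validator.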

So my conclusion is that we should land this PR as-is. Update the test to accept an 8-byte MaybeType instead of a 4-byte MaybeType and just accept that ValType is now 8 bytes instead of 4 bytes. I don't think it's worth replacing ValType with ValTypePacked (e.g. have it be an opaque 32-bit integer) because that loses match exhaustiveness. I also don't think it's worth having two representations because it's objectively slower and also a larger maintenance burden. In the end if we want this to be faster I think we just need to wait for future Rust language features such as pattern types and/or the ability to define a type which has a broader niche than just NonZero for example.
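The exhaustiveness point can be sketched with toy types (invented for illustration): matching on an enum lets the compiler flag a missing case, while an opaque 32-bit integer always needs a catch-all arm that silently swallows new encodings.

```rust
#[derive(Clone, Copy)]
enum ValType {
    I32,
    I64,
    F32,
}

fn describe(ty: ValType) -> &'static str {
    // No `_` arm: adding a variant to ValType turns this into a compile
    // error, pointing at every match that must be updated.
    match ty {
        ValType::I32 => "i32",
        ValType::I64 => "i64",
        ValType::F32 => "f32",
    }
}

fn describe_packed(bits: u32) -> &'static str {
    // With an opaque u32 the compiler cannot know the valid encodings, so
    // a wildcard arm is mandatory and new encodings fall into it unnoticed.
    match bits {
        0 => "i32",
        1 => "i64",
        2 => "f32",
        _ => "unknown",
    }
}

fn main() {
    assert_eq!(describe(ValType::I64), "i64");
    assert_eq!(describe_packed(3), "unknown");
    println!("ok");
}
```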

@fitzgen (Member) commented Nov 7, 2025

  • ValType is extremely pervasive, and this PR inflates it from 4 to 8 bytes. While we're changing the validator to store ValTypePacked, nothing else is changing, so everywhere else gets a larger memory footprint. I think this is why the synthetic modules are regressing.

  • Validator performance with ValTypePacked is pretty bad because, looking at the generated code, there's a lot of conversion between ValType and ValTypePacked. The theory of ValTypePacked is that its conversion gets inlined away and is zero-cost, but in practice that's just not happening. It's certainly happening in some places, and I can see it, but it's not happening in all cases, which is basically what's required here.

Yeah I was hoping that the conversions would pretty much get inlined away and it is unfortunate that you aren't seeing that.

I think I agree with your conclusion that we can land this PR as-is (modulo CI going green) but I think it is worth thinking a tiny bit more about ValTypePacked before giving up there. Specifically, where are the conversions not getting inlined away? I would imagine it is at the places where valtypes enter/exit the validator code itself, not within the validator. Assuming that is correct, are there perhaps a couple choice places outside the validator that we could change to use the packed variant and avoid most of the conversion costs? On the other hand, if the conversions aren't boiling away in the middle of validator's hot loop, then this whole approach is pretty dead in the water.

@fitzgen (Member) commented Nov 7, 2025

So not obviously bad. It's actually a speedup on all 'real world' modules but is an equivalent regression on synthetic stressor modules.

FWIW, I think we should care about the real world modules way more than the synthetic ones.

@alexcrichton (Member)

On the other hand, if the conversions aren't boiling away in the middle of validator's hot loop, then this whole approach is pretty dead in the water.

More-or-less this is what I saw. For example, this is one of the hottest methods in the validator:

    fn visit_local_set(&mut self, local_index: u32) -> Self::Output {
        let ty = self.local(local_index)?;
        self.pop_operand(Some(ty))?;
        self.local_inits.set_init(local_index);
        Ok(())
    }

Here self.local is reading from some storage of ValTypePacked. Later, pop_operand reads a MaybeTypePacked representation. Here the ty isn't a "constant" like it is in other places (e.g. pop_operand(Some(ValType::I32))); it's genuinely a runtime variable. LLVM was unable to boil away all the conversions, so self.local, which returns ValType, would decode, and then it gets re-encoded in pop_operand to MaybeTypePacked.

We could fix this by changing self.local to return ValTypePacked, I think, but that starts bringing up the question of just wholesale replacing ValType with ValTypePacked everywhere in the validator. Once we do that, there's the question of everything else, and... then I reached the conclusion we shouldn't do this.
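The decode/re-encode churn described above can be sketched with toy types (names and encodings invented here, not wasmparser's actual internals): storage holds packed values, the accessor unpacks them into the ergonomic enum, and the very next call re-packs them to compare against packed storage.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ValType {
    I32,
    I64,
    F32,
    F64,
}

fn unpack(bits: u32) -> ValType {
    match bits {
        0 => ValType::I32,
        1 => ValType::I64,
        2 => ValType::F32,
        _ => ValType::F64,
    }
}

fn pack(ty: ValType) -> u32 {
    ty as u32
}

struct Validator {
    locals: Vec<u32>,   // locals stored packed
    operands: Vec<u32>, // operand stack stored packed
}

impl Validator {
    // Decodes on the way out: callers get the ergonomic enum.
    fn local(&self, idx: u32) -> ValType {
        unpack(self.locals[idx as usize])
    }

    fn pop_operand(&mut self, expected: Option<ValType>) -> bool {
        let actual = self.operands.pop().unwrap();
        match expected {
            // `ty` is a runtime variable here, so this pack() is a real
            // conversion that the optimizer may fail to fuse with the
            // unpack() performed in `local()`.
            Some(ty) => actual == pack(ty),
            None => true,
        }
    }

    fn visit_local_set(&mut self, local_index: u32) -> bool {
        let ty = self.local(local_index); // unpack...
        self.pop_operand(Some(ty))        // ...then immediately re-pack
    }
}

fn main() {
    let mut v = Validator {
        locals: vec![pack(ValType::I64)],
        operands: vec![pack(ValType::I64)],
    };
    assert!(v.visit_local_set(0));
    println!("ok");
}
```

When the two conversions sit on opposite sides of a function boundary with a runtime value in between, fusing them into a direct packed comparison depends entirely on inlining, which is what failed to happen consistently here.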

@fitzgen (Member) commented Nov 7, 2025

We could fix this by changing self.local to return ValTypePacked, I think, but that starts bringing up the question of just wholesale replacing ValType with ValTypePacked everywhere in the validator. Once we do that, there's the question of everything else, and... then I reached the conclusion we shouldn't do this.

Yeah, makes sense to me.

Really wish Rust didn't make us choose between packing and ergonomics!

@alexcrichton (Member)

@yurydelendik alright, if you're ok with it, the ball's now back in your court. I pushed up two commits here (feel free to rebase/squash them away as you see fit) as well as a merge with main. Notably, in 162dff0 I raised the assertion limit for MaybeType to unblock this PR. In the next commit I updated the spec test suite submodule, which has a lot more custom-descriptors tests. Would you be ok adjusting text/validation/etc as necessary to get the spec test suite passing?

@yurydelendik (Contributor, Author)

@yurydelendik alright, if you're ok with it, the ball's now back in your court. I pushed up two commits here (feel free to rebase/squash them away as you see fit) as well as a merge with main. Notably, in 162dff0 I raised the assertion limit for MaybeType to unblock this PR. In the next commit I updated the spec test suite submodule, which has a lot more custom-descriptors tests. Would you be ok adjusting text/validation/etc as necessary to get the spec test suite passing?

Thanks. I rebased and adjusted most of the syntax. Changed logic around ref.get_desc and ref.cast_desc too.

Not implemented yet is parsing of some 'exact' cases, e.g. in imports -- that requires some deep diving into data structures. Will be looking at that next.

@alexcrichton (Member)

Happy to help out if you get stuck! I just pushed up a commit that re-blesses (BLESS=1) the CLI testsuite, which cuts things down to just a few failures.

@alexcrichton (Member)

Also, if you'd like, these tests are pretty minor so I think it would be reasonable to land what you have here and address the missing functionality in a follow-up too. Whichever you'd prefer.

@yurydelendik (Contributor, Author)

Also, if you'd like, these tests are pretty minor so I think it would be reasonable to land what you have here and address the missing functionality in a follow-up too. Whichever you'd prefer.

That sounds good. We can land what is here so far, at any moment. I'll continue working on exact types in imports.

@yurydelendik yurydelendik marked this pull request as ready for review November 14, 2025 17:38
@yurydelendik yurydelendik requested a review from a team as a code owner November 14, 2025 17:38
@yurydelendik yurydelendik requested review from pchickey and removed request for a team November 14, 2025 17:38
@alexcrichton (Member)

Ok, sounds good. I'll do one final pass of review, frob CI so the test is expected to fail, and then land this.

@alexcrichton alexcrichton requested review from alexcrichton and removed request for pchickey November 14, 2025 18:30
@alexcrichton alexcrichton added this pull request to the merge queue Nov 14, 2025
Merged via the queue into bytecodealliance:main with commit ec6b54a Nov 14, 2025
34 checks passed