Skip to content

Conversation

@keithamus
Copy link

@keithamus keithamus commented Oct 2, 2025

This change is a proof of concept for how possible it would be to take
individual python modules and re-implement them in Rust, in a "Ship of
Theseus" style Rust rewrite.

This takes some of the simplest parts of the codebase - a list of
predicate free functions - and turns them into Rust checks. These
functions become quite a bit more trivial in Rust as Rust implements
many of these functions on char, so we simply delegate to those
methods.

These functions where chosen not because they're a slow path, but more
because they are straightforward functions that return booleans and show
demonstrate a good proof of concept. They certainly don't make things
slower, however.

The next steps would be to take some of the bigger functions in preds.py
that do full string comparison (such as isXMLishTagname), and port those.
This was avoided for now as these might be slightly more controversial,
as we might want to include dependencies for fast string matching.

I'm not very well versed in Python, this was mostly cobbled together using
https://medium.com/@MatthieuL49/a-mixed-rust-python-project-24491e2af424
and https://colliery.io/blog/rust-python-pattern/ as guides.

Some completely non empirical evaluation of timings, I ran the tests
with/without Rust bindings:

$ time BIKESHED_USE_RUST=0 ./bikeshed.py test --no-update
Running tests |████████████████████| 502/502 [100%] in 3:04.5 (2.72/s)
✔ All tests passed.
________________________________________________________
Executed in  184.82 secs    fish           external
   usr time  176.82 secs  717.00 micros  176.82 secs
   sys time    1.99 secs    0.00 micros    1.99 secs

$ time BIKESHED_USE_RUST=1 ./bikeshed.py test --no-update
Running tests [R] |████████████████████| 502/502 [100%] in 3:04.4 (2.72/s)
✔ All tests passed.

________________________________________________________
Executed in  184.72 secs    fish           external
   usr time  177.16 secs  481.00 micros  177.16 secs
   sys time    1.96 secs  172.00 micros    1.96 secs

As expected both take essentially the same time. This may prove 1 of 2
things:

  • I'm an idiot and haven't done this right.
  • The FFI boundary between Python/Rust isn't costing us much (at least
    for simple checks like these).

@keithamus keithamus force-pushed the implement-preds-py-as-a-rust-wheel-with-pyo3-maturin branch from 992d6ba to 43e9887 Compare October 2, 2025 11:57
This change is a proof of concept for how possible it would be to take
individual python modules and re-implement them in Rust, in a "Ship of
Theseus" style Rust rewrite.

This takes some of the simplest parts of the codebase - a list of
predicate free functions - and turns them into Rust checks. These
functions become quite a bit more trivial in Rust as Rust implements
many of these functions on `char`, so we simply delegate to those
methods.

These functions where chosen not because they're a slow path, but more
because they are straightforward functions that return booleans and show
demonstrate a good proof of concept. They certainly don't make things
_slower_, however.

The next steps would be to take some of the bigger functions in preds.py
that do full string comparison (such as isXMLishTagname), and port those.
This was avoided for now as these might be slightly more controversial,
as we might want to include dependencies for fast string matching.

I'm not very well versed in Python, this was mostly cobbled together using
https://medium.com/@MatthieuL49/a-mixed-rust-python-project-24491e2af424
and https://colliery.io/blog/rust-python-pattern/ as guides.

Some completely non empirical evaluation of timings, I ran the tests
with/without Rust bindings:

```sh
$ time BIKESHED_USE_RUST=0 ./bikeshed.py test --no-update
Running tests |████████████████████| 502/502 [100%] in 3:04.5 (2.72/s)
✔ All tests passed.
________________________________________________________
Executed in  184.82 secs    fish           external
   usr time  176.82 secs  717.00 micros  176.82 secs
   sys time    1.99 secs    0.00 micros    1.99 secs

$ time BIKESHED_USE_RUST=1 ./bikeshed.py test --no-update
Running tests [R] |████████████████████| 502/502 [100%] in 3:04.4 (2.72/s)
✔ All tests passed.

________________________________________________________
Executed in  184.72 secs    fish           external
   usr time  177.16 secs  481.00 micros  177.16 secs
   sys time    1.96 secs  172.00 micros    1.96 secs

```

As expected both take essentially the same time. This may prove 1 of 2
things:

- I'm an idiot and haven't done this right.
- The FFI boundary between Python/Rust isn't costing us much (at least
  for simple checks like these).
@keithamus keithamus force-pushed the implement-preds-py-as-a-rust-wheel-with-pyo3-maturin branch from 43e9887 to 7ae646b Compare October 2, 2025 12:08
@keithamus keithamus marked this pull request as ready for review October 2, 2025 12:08
@tabatkins
Copy link
Collaborator

Oooh, hell yeah. I've been meaning to do this exact thing for a while.

Note that I've got a huge change to the custom parser living in a branch, tho the edits to this file in particular are minimal. Don't go converting parser.py, tho. 😁

Another good test case will be the serializer, I think. Self-contained, lots of string building that I think is slow in Python, and a noticable chunk of time in large documents.

I won't be able to dig into this more immediately, as my vacation starts today, but I'm very excited to find in when I get back.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants