@xal-0
Contributor

@xal-0 xal-0 commented Jul 22, 2025

The definition of IntVector caused an implicit copy to convert SA to a Vector{Int}, negating the efforts to reuse memory. Replace it with @view to get reasonable memory consumption.
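
To illustrate the kind of copy described above, here is a minimal sketch. It assumes the buffer's element type is not Int; the wrapper `OffsetInts` and the buffer `sa` are hypothetical stand-ins for illustration, not the package's actual definitions.

```julia
# Hypothetical wrapper with a concretely typed field (illustration only).
struct OffsetInts <: AbstractVector{Int}
    vec::Vector{Int}
    off::Int
end
Base.size(v::OffsetInts) = (length(v.vec) - v.off,)
Base.IndexStyle(::Type{OffsetInts}) = IndexLinear()
Base.getindex(v::OffsetInts, i::Int) = v.vec[v.off + i]
Base.setindex!(v::OffsetInts, x, i::Int) = (v.vec[v.off + i] = x)

# Caller's buffer, assumed here to have a non-Int element type.
sa = zeros(Int32, 8)

# The default constructor converts its argument to Vector{Int}. Because the
# element types differ, the conversion allocates a new array.
wrapped = OffsetInts(sa, 4)
wrapped[1] = 7
@assert wrapped.vec !== sa   # the wrapper holds a copy, not the buffer
@assert sa[5] == 0           # so the write never reached the buffer

# A view shares memory with its parent, so writes land in the buffer.
viewed = @view sa[5:end]
viewed[1] = 7
@assert parent(viewed) === sa
@assert sa[5] == 7
```

If the buffer were already a Vector{Int}, the conversion would return the same array and no copy would occur, which is one way a copy like this can go unnoticed.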

Suffix sorting a 100 MiB UInt8 array (an aarch64 executable):

before fix   10.324155 seconds (75 allocations: 16.797 GiB, 1.12% gc time)
after fix     8.059011 seconds (26 allocations: 400.007 MiB, 0.00% gc time)

Comparison with a few commonly-used suffix sorting libraries:

libsais      7.563240 seconds (3 allocations: 400.000 MiB, 0.00% gc time)
divsufsort   4.244779 seconds (3 allocations: 400.000 MiB, 0.00% gc time)

Collaborator

@quinnj quinnj left a comment

Wowza; nice improvement!

@StefanKarpinski
Contributor

Do you want to set up a proper email address for contributions? I was about to merge and noticed that the authorship email address would be one of the wacky GitHub autogenerated email addresses.

@xal-0
Contributor Author

xal-0 commented Aug 13, 2025

Is that for squashed merges? I've set my primary email to public now.

@StefanKarpinski
Contributor

Yes. If you author commits locally, git uses whatever email address your local git config specifies, and that is what gets pushed. GitHub needs an email address when it's synthesizing a commit for you (like a squash).

@StefanKarpinski StefanKarpinski merged commit 90cba38 into JuliaCollections:master Aug 13, 2025
9 checks passed
@xal-0 xal-0 deleted the fix-memory-reuse branch August 14, 2025 20:13