Skip to content

int_in_range (and others) reports success even when running out of entropy #219

@ejmount

Description

@ejmount

Firstly, I feel like I'm missing something here because this is such a fundamental piece of functionality, it's implausible to me this problem isn't described anywhere (I couldn't see anything about in the docs - the only reference I've found is in a unit test buried in #184), but I'm raising it because it nonetheless caused problems for me.

In Unstructured, a call int_in_range(n..=m) consumes some bytes of the seed data to construct an index between 0 and the smallest power of 256 larger than (m-n). It does this by simply concatenating the bytes together, before mod-reducing and shifting this value into the desired output range. As mentioned in #184, this leads to biased results and inefficient use of the entropy if m-n is not a power of 256, but it will nonetheless work if the data is available. However, if there aren't enough bytes available, int_in_range does not return an Err - instead, if it can't get a next byte, it stops constructing the large index value and uses it as is in the rest of the calculation, returning an Ok. IOW, if we have less than the required number of bytes left, we interpret whatever's left as a single integer (which will be by definition less than (m-n), possibly as little as 1/256th of that if the range is a very large one) and use that for the reduction and shifting. In the particular case that there's no data left at all, we "successfully" return exactly n.

While I appreciate that this is technically compliant with the documentation, it's extremely counterintuitive and I can't tell if it's intended behavior or not. If you enable clippy::pedantic and configure avoid-breaking-exported-api = false, it will indeed flag int_in_range returning a Result as redundant because it cannot fail. If this isn't intended, can it be fixed to return NotEnoughData appropriately? (It also maybe looks like it should return EmptyChoose for an empty range, instead of panicking as it does currently)

Alternatively, if this is the intended behavior, perhaps I've misunderstood and none of Unstructured should be expected to be checking for this case, because only one method, bytes(), actually does sometimes return NotEnoughData. I assume this is because it returns a reference to the underlying data in order to facilitate &[u8]: Arbitrary, and so can't invent a value out of thin air like the other methods. If this is the idea, I'd appreciate if it could be included in the documentation somewhere that none of the methods are intended to be checking for a lack of data except by necessity.

FWIW I'm happy to submit a PR to help fix this, or do any work needed to push #184 over the line if this is an issue of documentation, I'd just need someone to clarify the intended design.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions