Commit ff91990

Merge pull request #454 from filecoin-project/fr32/doc/embedded-raw-data-alignment
fr32: fix constant 128
2 parents 0b0969b + c07a62f commit ff91990

1 file changed: +52 -12 lines changed

sector-base/src/io/fr32.rs

Lines changed: 52 additions & 12 deletions
@@ -87,6 +87,43 @@ padded in the first element and the remaining 66 bits form the incomplete
 data unit after it, which is aligned to 9 bytes. At the bit level, that
 last incomplete byte will have 2 valid bits and 6 extra bits.

+# Alignment of raw data bytes in the padded output
+
+This section is not necessary to use this structure but it does help to
+reason about it. By the previous definition, the raw data bits *embedded*
+in the padded layout are not necessarily grouped in the same byte units
+as in the original raw data input (due to the inclusion of the padding
+bits interleaved in that bit stream, which keep shifting the data bits
+after them).
+
+This can also be stated as: the offsets of the bits (relative to the byte
+they belong to, i.e., *bit-offset*) in the raw data input won't necessarily
+match the bit-offsets of the raw data bits embedded in the padded layout.
+The consequence is that each raw byte written to the padded layout won't
+result in a byte-aligned bit stream output, i.e., it may cause the appearance
+of extra bits (to convert the output to a byte-aligned stream).
+
+There are portions of the padded layout, however, where this alignment does
+happen. Particularly, when the padded layout accumulates enough padding bits
+that they altogether add up to a byte, the following raw data byte written
+will result in a byte-aligned output, and the same is true for all the other
+raw data bytes that follow it up until the element end, where new padding bits
+shift away this alignment. (The other obvious case is the first element, which,
+with no padded bits in front of it, has by definition all its embedded raw data
+bytes aligned, independently of the `data_bits`/`pad_bits` configuration used.)
+
+In the previous example, that happens after the fourth element, where 4 units
+of `pad_bits` add up to one byte and all of the raw data bytes in the fifth
+element will keep their original alignment from the byte input stream (and the
+same will happen after every other multiple of 4 elements). When that fourth
+element is completed we have then 127 bytes of raw data and 1 byte of padding
+(totalling 32 * 4 = 128 bytes of padded output), so the interval of raw data
+bytes `[127..159]` (indexed like this in the input raw data stream) will keep
+its original alignment when embedded in the padded layout, i.e., every raw
+data byte written will keep the output bit stream byte-aligned (without extra
+bits). (Technically, the last byte actually won't be a full byte since its last
+bits will be replaced by padding).
+
 # Key terms

 Collection of terms introduced in this documentation (with the format
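
The arithmetic in the new "Alignment of raw data bytes" section can be checked with a small sketch. It assumes the configuration from the example (`data_bits = 254`, `pad_bits = 2`) and that an element occupies a whole number of bytes. The names `gcd`, `aligned_element_period` and `period_sizes` are hypothetical helpers introduced here for illustration; they are not part of `fr32.rs`.

```rust
/// Hypothetical helper (not in `fr32.rs`): greatest common divisor.
pub fn gcd(a: usize, b: usize) -> usize {
    if b == 0 {
        a
    } else {
        gcd(b, a % b)
    }
}

/// Smallest number of full elements whose padding bits add up to a whole
/// number of bytes, i.e. the smallest `k > 0` with `k * pad_bits % 8 == 0`.
pub fn aligned_element_period(pad_bits: usize) -> usize {
    8 / gcd(pad_bits, 8)
}

/// (raw bytes consumed, padded bytes produced) by one alignment period.
/// Assumes `data_bits + pad_bits` is a multiple of 8.
pub fn period_sizes(data_bits: usize, pad_bits: usize) -> (usize, usize) {
    let period = aligned_element_period(pad_bits);
    (
        period * data_bits / 8,
        period * (data_bits + pad_bits) / 8,
    )
}
```

With the example values, `aligned_element_period(2)` is 4 and `period_sizes(254, 2)` is `(127, 128)`, which matches the 127 raw bytes and 128 padded bytes stated in the section.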
@@ -109,6 +146,9 @@ an additional summary of what was already discussed.
   byte (in a way the extra bits are the padding at the byte-level, but we don't
   use that term here to avoid confusions).
 * Sub-byte padding.
+* Bit-offset: offset of a bit within the byte it belongs to, ranging in `[0..8]`.
+* Embedded raw data: view of the input raw data when it has been decomposed in
+  bit streams and padded in the resulting output.

 **/
 #[derive(Debug)]
@@ -600,10 +640,10 @@ where
     W: Read + Write + Seek,
 {
     // In order to optimize alignment in the common case of writing from an aligned start,
-    // we should make the chunk a multiple of 128.
+    // we should make the chunk a multiple of 127 (4 full elements, see `PaddingMap#alignment`).
     // n was hand-tuned to do reasonably well in the benchmarks.
     let n = 1000;
-    let chunk_size = 128 * n;
+    let chunk_size = 127 * n;

     let mut written = 0;

@@ -768,7 +808,7 @@ where
     W: Write,
 {
     // In order to optimize alignment in the common case of writing from an aligned start,
-    // we should make the chunk a multiple of 128.
+    // we should make the chunk a multiple of 128 (4 full elements in the padded layout).
     // n was hand-tuned to do reasonably well in the benchmarks.
     let n = 1000;
     let chunk_size = 128 * n;
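
The two chunk-size comments above count bytes on different sides of the transformation: 127 is measured in raw bytes, 128 in padded bytes. A minimal sketch of why a raw chunk of `128 * n` bytes drifts out of alignment while `127 * n` does not, assuming `data_bits = 254` and `pad_bits = 2`. The function names are hypothetical and do not exist in `fr32.rs`.

```rust
/// Assumed Fr32 configuration (taken from the documentation's example).
const DATA_BITS: usize = 254;
const PAD_BITS: usize = 2;

/// Padding bits that precede the raw byte at `raw_byte_offset` once it is
/// embedded in the padded layout: `PAD_BITS` for every full element before it.
pub fn padding_bits_before(raw_byte_offset: usize) -> usize {
    (raw_byte_offset * 8 / DATA_BITS) * PAD_BITS
}

/// A raw byte keeps its bit-offsets in the padded output only when the
/// padding accumulated before it is a whole number of bytes.
pub fn starts_byte_aligned(raw_byte_offset: usize) -> bool {
    padding_bits_before(raw_byte_offset) % 8 == 0
}
```

Under these assumptions `starts_byte_aligned(127 * 1000)` is true (exactly 4000 elements, 8000 padding bits), whereas `starts_byte_aligned(128 * 1000)` is false (4031 full elements, 8062 padding bits), so the second raw chunk of the old size would start misaligned.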
@@ -1022,22 +1062,22 @@ mod tests {
         assert_eq!(padded[63], 0b0011_1111);
     }

-    // `write_padded` for 256 bytes of 1s, splitting it in two calls of 128 bytes,
+    // `write_padded` for 254 bytes of 1s, splitting it in two calls of 127 bytes,
     // aligning the calls with the padded element boundaries, check padding bits
     // in byte 31 and 63.
     #[test]
     fn test_write_padded_multiple_aligned() {
-        let data = vec![255u8; 256];
+        let data = vec![255u8; 254];
         let buf = Vec::new();
         let mut cursor = Cursor::new(buf);
-        let mut written = write_padded(&data[0..128], &mut cursor).unwrap();
-        written += write_padded(&data[128..], &mut cursor).unwrap();
+        let mut written = write_padded(&data[0..127], &mut cursor).unwrap();
+        written += write_padded(&data[127..], &mut cursor).unwrap();
         let padded = cursor.into_inner();

-        assert_eq!(written, 256);
+        assert_eq!(written, 254);
         assert_eq!(
             padded.len(),
-            FR32_PADDING_MAP.transform_byte_offset(256, true)
+            FR32_PADDING_MAP.transform_byte_offset(254, true)
         );
         assert_eq!(&padded[0..31], &data[0..31]);
         assert_eq!(padded[31], 0b0011_1111);
@@ -1049,16 +1089,16 @@ mod tests {
         // from the previous one.
     }

-    // `write_padded` for 265 bytes of 1s, splitting it in two calls of 128 bytes,
+    // `write_padded` for 265 bytes of 1s, split in two calls (127 bytes, then 138),
     // aligning the calls with the padded element boundaries, check padding bits
     // in byte 31 and 63.
     #[test]
     fn test_write_padded_multiple_first_aligned() {
         let data = vec![255u8; 265];
         let buf = Vec::new();
         let mut cursor = Cursor::new(buf);
-        let mut written = write_padded(&data[0..128], &mut cursor).unwrap();
-        written += write_padded(&data[128..], &mut cursor).unwrap();
+        let mut written = write_padded(&data[0..127], &mut cursor).unwrap();
+        written += write_padded(&data[127..], &mut cursor).unwrap();
         let padded = cursor.into_inner();

         assert_eq!(written, 265);
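
The `0b0011_1111` values checked at bytes 31 and 63 in these tests can be reproduced with a naive bit-by-bit padder. This is only an illustrative sketch, not the `write_padded` implementation: it is stateless, processes one bit at a time, and assumes least-significant-bit-first ordering within each byte (consistent with the test constants), `data_bits = 254` and 256-bit elements. `pad_bits_naive` is a hypothetical name.

```rust
/// Assumed Fr32 configuration.
const DATA_BITS: usize = 254;
const ELEMENT_BITS: usize = 256;

/// Hypothetical, unoptimized stand-in for the padding step: copies the raw
/// bits (LSB first) into the output and leaves two zero bits after every
/// 254 data bits. An element that is exactly completed gets its padding too.
pub fn pad_bits_naive(raw: &[u8]) -> Vec<u8> {
    let pad_bits = ELEMENT_BITS - DATA_BITS;
    let raw_bits = raw.len() * 8;
    let full_elements = raw_bits / DATA_BITS;
    let padded_bits = raw_bits + full_elements * pad_bits;

    // Output starts zeroed, so padding bits never need to be written.
    let mut out = vec![0u8; (padded_bits + 7) / 8];
    for i in 0..raw_bits {
        let bit = (raw[i / 8] >> (i % 8)) & 1;
        // Every full element before bit `i` shifts it by `pad_bits`.
        let o = i + (i / DATA_BITS) * pad_bits;
        out[o / 8] |= bit << (o % 8);
    }
    out
}
```

For 254 bytes of `0xFF` this sketch produces 256 bytes in which bytes 31 and 63 are `0b0011_1111` (six data bits followed by two padding bits) and bytes 0 to 30 are unchanged. For 127 bytes it produces 128 bytes, so a second call starting at raw offset 127 begins on an element boundary.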
