mem: Optimize buffer object re-use #8784
Open
Splitting a `buffer` results in fetching a new `buffer` object from a `sync.Pool`. The `buffer` object is returned to the pool only once the shared ref count falls to 0. As a result, only one of the `buffer` objects is returned to the pool for re-use. The "leaked" `buffer` objects may cause noticeable allocations when buffers are split frequently. I noticed this while attempting to remove a buffer copy by replacing the `bufio.Reader`.
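For context, here is a simplified, hypothetical sketch of the shared-ref-count lifecycle described above. The type, field, and function names are illustrative assumptions, not the actual `mem` package internals:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// leakyBuffer is a toy stand-in for a pooled buffer object whose two
// halves share a single ref count after a split.
type leakyBuffer struct {
	data []byte
	refs *atomic.Int32 // shared by both halves after a split
	pool *sync.Pool    // pool of *leakyBuffer objects
}

// split hands the second half to a new buffer object taken from the pool.
func (b *leakyBuffer) split(n int) (*leakyBuffer, *leakyBuffer) {
	b.refs.Add(1)
	child := b.pool.Get().(*leakyBuffer)
	child.data, child.refs, child.pool = b.data[n:], b.refs, b.pool
	b.data = b.data[:n]
	return b, child
}

// free returns the object to the pool only when the shared count reaches
// zero, so whichever half is freed first is never returned: the "leak".
func (b *leakyBuffer) free() {
	if b.refs.Add(-1) == 0 {
		b.pool.Put(b)
	}
}

func main() {
	pool := &sync.Pool{New: func() any { return new(leakyBuffer) }}
	refs := new(atomic.Int32)
	refs.Store(1)
	buf := &leakyBuffer{data: make([]byte, 8), refs: refs, pool: pool}
	left, right := buf.split(4)
	left.free()
	right.free() // only the object freed last goes back to the pool
	fmt.Println("one of the two buffer objects was dropped")
}
```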
### Solution

This PR introduces a root-owner model for the underlying `*[]byte` within `buffer` objects. The root object manages the slice's lifecycle, returning it to the pool only when its reference count reaches zero.

When a `buffer` is split, the new `buffer` is treated as a child, incrementing the ref counts for both itself and the root. Once a child's ref count hits zero, it returns itself to the pool and decrements the root's count.

Additionally, this PR replaces the `sync.Pool` used for `*atomic.Int32` by embedding `atomic.Int32` as a value field within the `buffer` struct. By eliminating the second pool and the associated pointer indirection, we reduce allocation overhead and improve cache locality during buffer lifecycle events.
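A minimal sketch of the root-owner model under the same assumed names (the real `buffer` also returns the backing `*[]byte` to its data pool, which is omitted here to keep the example small):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// refBuffer sketches the root-owner model: every buffer object carries its
// own embedded ref count, and a child additionally holds a reference on the
// root that owns the backing slice.
type refBuffer struct {
	data []byte
	root *refBuffer   // nil for a root buffer
	refs atomic.Int32 // embedded by value: no second pool, no pointer chase
	pool *sync.Pool   // pool of *refBuffer objects
}

func newRefBuffer(data []byte, pool *sync.Pool) *refBuffer {
	b := pool.Get().(*refBuffer)
	b.data, b.root, b.pool = data, nil, pool
	b.refs.Store(1)
	return b
}

// split creates a child with its own count of 1 and pins the root so the
// backing slice outlives every child.
func (b *refBuffer) split(n int) (*refBuffer, *refBuffer) {
	root := b.root
	if root == nil {
		root = b
	}
	root.refs.Add(1)
	child := b.pool.Get().(*refBuffer)
	child.data, child.root, child.pool = b.data[n:], root, b.pool
	child.refs.Store(1)
	b.data = b.data[:n]
	return b, child
}

// free returns this object to the pool as soon as its own count hits zero;
// a child then drops the reference it held on the root, letting the root
// release its resources once all children are gone.
func (b *refBuffer) free() {
	if b.refs.Add(-1) != 0 {
		return
	}
	root, pool := b.root, b.pool
	b.data, b.root, b.pool = nil, nil, nil
	pool.Put(b)
	if root != nil {
		root.free()
	}
}

func main() {
	pool := &sync.Pool{New: func() any { return new(refBuffer) }}
	buf := newRefBuffer(make([]byte, 8), pool)
	left, right := buf.split(4)
	left.free()
	right.free() // both objects are back in the pool at this point
	fmt.Println("no buffer objects leaked")
}
```

Since each object now carries its own embedded count, a child can return itself to the pool immediately, independently of when its siblings or the root are freed.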
### Benchmarks

A micro-benchmark showing the `buffer` object leak:
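The benchmark source isn't reproduced in this description; the sketch below shows what such a benchmark could look like using only the exported `mem` API (`mem.DefaultBufferPool`, `mem.NewBuffer`, `mem.SplitUnsafe`). The sub-benchmark names mirror the results below, but the buffer size and split point are assumptions:

```go
package mem_test

import (
	"testing"

	"google.golang.org/grpc/mem"
)

// BenchmarkSplit creates a pooled buffer per iteration; the "split" case
// splits it before freeing both halves, the "no-split" case frees it
// directly. With the leak, every split costs one buffer-object allocation.
func BenchmarkSplit(b *testing.B) {
	pool := mem.DefaultBufferPool()
	const size, splitAt = 4096, 1024

	b.Run("split", func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			buf := mem.NewBuffer(pool.Get(size), pool)
			left, right := mem.SplitUnsafe(buf, splitAt)
			left.Free()
			right.Free()
		}
	})

	b.Run("no-split", func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			mem.NewBuffer(pool.Get(size), pool).Free()
		}
	})
}
```

Run from the repository root with something like `go test -run='^$' -bench=Split -count=10 ./mem/` and compare the two runs with `benchstat`.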
Result on master vs this PR:

    goos: linux
    goarch: amd64
    pkg: google.golang.org/grpc/mem
    cpu: Intel(R) Xeon(R) CPU @ 2.60GHz
                      │   old.txt   │               new.txt               │
                      │   sec/op    │    sec/op     vs base                │
    Split/split-48      418.2n ± 0%   263.9n ± 1%  -36.89% (p=0.000 n=10)
    Split/no-split-48   221.1n ± 1%   208.5n ± 0%   -5.70% (p=0.000 n=10)
    geomean             304.1n        234.6n       -22.86%

                      │   old.txt   │               new.txt                │
                      │    B/op     │     B/op      vs base                 │
    Split/split-48      64.00 ± 0%      0.00 ± 0%  -100.00% (p=0.000 n=10)
    Split/no-split-48   0.000 ± 0%     0.000 ± 0%         ~ (p=1.000 n=10) ¹
    geomean                        ²                ?                     ² ³
    ¹ all samples are equal
    ² summaries must be >0 to compute geomean
    ³ ratios must be >0 to compute geomean

                      │   old.txt   │               new.txt                │
                      │  allocs/op  │  allocs/op    vs base                 │
    Split/split-48      1.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=10)
    Split/no-split-48   0.000 ± 0%     0.000 ± 0%         ~ (p=1.000 n=10) ¹
    geomean                        ²                ?                     ² ³
    ¹ all samples are equal
    ² summaries must be >0 to compute geomean
    ³ ratios must be >0 to compute geomean

The effect on local gRPC benchmarks is negligible since the `SplitUnsafe` function isn't called very frequently.

    $ go run benchmark/benchresult/main.go unary-before unary-after
    unary-networkMode_Local-bufConn_false-keepalive_false-benchTime_1m0s-trace_false-latency_0s-kbps_0-MTU_0-maxConcurrentCalls_120-reqSize_16000B-respSize_16000B-compressor_off-channelz_false-preloader_false-clientReadBufferSize_-1-clientWriteBufferSize_-1-serverReadBufferSize_-1-serverWriteBufferSize_-1-sleepBetweenRPCs_0s-connections_1-recvBufferPool_simple-sharedWriteBuffer_false
                   Title       Before        After Percentage
                TotalOps      2985694      3024364     1.30%
                 SendOps            0            0      NaN%
                 RecvOps            0            0      NaN%
                Bytes/op     74784.94     74784.99     0.00%
               Allocs/op       133.67       133.89     0.00%
                 ReqT/op 6369480533.33 6451976533.33    1.30%
                RespT/op 6369480533.33 6451976533.33    1.30%
                50th-Lat   2.410033ms    2.40116ms    -0.37%
                90th-Lat   3.145118ms   3.081771ms    -2.01%
                99th-Lat   3.563055ms   3.629663ms     1.87%
                 Avg-Lat   2.410529ms   2.379513ms    -1.29%
               GoVersion     go1.24.8     go1.24.8
             GrpcVersion   1.78.0-dev   1.78.0-dev

RELEASE NOTES:
* mem: Optimize re-use of `buffer` objects when using `SplitUnsafe`.