Skip to content

Conversation

@arjan-bal
Copy link
Contributor

@arjan-bal arjan-bal commented Dec 22, 2025

Splitting a buffer results in fetching a new buffer object from a sync.Pool. The buffer object is returned back to the pool only once the shared ref count falls to 0. As a result, only one of the buffer objects is returned back to the pool for re-use. The "leaked" buffer objects may cause noticeable allocations when buffers are split more frequently. I noticed this when attempting to remove a buffer copy by replacing the bufio.Reader.

Solution

This PR introduces a root-owner model for the underlying *[]byte within buffer objects. The root object manages the slice's lifecycle, returning it to the pool only when its reference count reaches zero.

When a buffer is split, the new buffer is treated as a child, incrementing the ref counts for both itself and the root. Once a child’s ref count hits zero, it returns itself to the pool and decrements the root’s count.

Additionally, this PR replaces the sync.Pool used for *atomic.Int32 by embedding atomic.Int32 as a value field within the buffer struct. By eliminating the second pool and the associated pointer indirection, we reduce allocation overhead and improve cache locality during buffer lifecycle events.

Benchmarks

A micro-benchmark showing the buffer object leak:

func BenchmarkSplit(b *testing.B) {
	pool := mem.DefaultBufferPool()
	b.Run("split", func(b *testing.B) {
		for b.Loop() {
			size := 1 << 15 // 32 KB
			slice := pool.Get(size)
			buf := mem.NewBuffer(slice, pool)
			left, right := mem.SplitUnsafe(buf, size/2)
			left.Free()
			right.Free()
		}
	})
	b.Run("no-split", func(b *testing.B) {
		for b.Loop() {
			size := 1 << 15 // 32 KB
			slice := pool.Get(size)
			buf := mem.NewBuffer(slice, pool)
			buf.Free()
		}
	})
}

Result on master vs this PR.

goos: linux
goarch: amd64
pkg: google.golang.org/grpc/mem
cpu: Intel(R) Xeon(R) CPU @ 2.60GHz
                  │   old.txt   │               new.txt               │
                  │   sec/op    │   sec/op     vs base                │
Split/split-48      418.2n ± 0%   263.9n ± 1%  -36.89% (p=0.000 n=10)
Split/no-split-48   221.1n ± 1%   208.5n ± 0%   -5.70% (p=0.000 n=10)
geomean             304.1n        234.6n       -22.86%

                  │   old.txt    │                 new.txt                 │
                  │     B/op     │    B/op     vs base                     │
Split/split-48      64.00 ± 0%      0.00 ± 0%  -100.00% (p=0.000 n=10)
Split/no-split-48   0.000 ± 0%     0.000 ± 0%         ~ (p=1.000 n=10) ¹
geomean                        ²               ?                       ² ³
¹ all samples are equal
² summaries must be >0 to compute geomean
³ ratios must be >0 to compute geomean

                  │   old.txt    │                 new.txt                 │
                  │  allocs/op   │ allocs/op   vs base                     │
Split/split-48      1.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=10)
Split/no-split-48   0.000 ± 0%     0.000 ± 0%         ~ (p=1.000 n=10) ¹
geomean                        ²               ?                       ² ³
¹ all samples are equal
² summaries must be >0 to compute geomean
³ ratios must be >0 to compute geomean

The effect on local gRPC benchmarks is negligible since the SplitUnsafe function isn't called very frequently.

$ go run benchmark/benchresult/main.go unary-before unary-after       
unary-networkMode_Local-bufConn_false-keepalive_false-benchTime_1m0s-trace_false-latency_0s-kbps_0-MTU_0-maxConcurr
entCalls_120-reqSize_16000B-respSize_16000B-compressor_off-channelz_false-preloader_false-clientReadBufferSize_-1-c
lientWriteBufferSize_-1-serverReadBufferSize_-1-serverWriteBufferSize_-1-sleepBetweenRPCs_0s-connections_1-recvBuff
erPool_simple-sharedWriteBuffer_false
               Title       Before        After Percentage
            TotalOps      2985694      3024364     1.30%
             SendOps            0            0      NaN%
             RecvOps            0            0      NaN%
            Bytes/op     74784.94     74784.99     0.00%
           Allocs/op       133.67       133.89     0.00%
             ReqT/op 6369480533.33 6451976533.33     1.30%
            RespT/op 6369480533.33 6451976533.33     1.30%
            50th-Lat   2.410033ms    2.40116ms    -0.37%
            90th-Lat   3.145118ms   3.081771ms    -2.01%
            99th-Lat   3.563055ms   3.629663ms     1.87%
             Avg-Lat   2.410529ms   2.379513ms    -1.29%
           GoVersion     go1.24.8     go1.24.8
         GrpcVersion   1.78.0-dev   1.78.0-dev

RELEASE NOTES:

  • mem: Improve pooling of buffer objects on using SplitUnsafe.

@arjan-bal arjan-bal added this to the 1.79 Release milestone Dec 22, 2025
@arjan-bal arjan-bal added Type: Performance Performance improvements (CPU, network, memory, etc) Area: Transport Includes HTTP/2 client/server and HTTP server handler transports and advanced transport features. labels Dec 22, 2025
@codecov
Copy link

codecov bot commented Dec 22, 2025

Codecov Report

❌ Patch coverage is 85.71429% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.35%. Comparing base (81a00ce) to head (f3bb58e).
⚠️ Report is 17 commits behind head on master.

Files with missing lines Patch % Lines
mem/buffers.go 85.71% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8784      +/-   ##
==========================================
+ Coverage   83.22%   83.35%   +0.13%     
==========================================
  Files         418      418              
  Lines       32385    32900     +515     
==========================================
+ Hits        26952    27425     +473     
- Misses       4050     4080      +30     
- Partials     1383     1395      +12     
Files with missing lines Coverage Δ
mem/buffers.go 87.25% <85.71%> (+0.38%) ⬆️

... and 37 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@arjan-bal arjan-bal force-pushed the better-recycle-buffer-objects branch from 45c3231 to e1a28ac Compare December 22, 2025 10:11
@arjan-bal arjan-bal requested review from dfawley and easwars December 22, 2025 10:38
@arjan-bal arjan-bal assigned easwars, dfawley and arjan-bal and unassigned easwars and dfawley Dec 22, 2025
@arjan-bal arjan-bal force-pushed the better-recycle-buffer-objects branch 2 times, most recently from 7619ed9 to c33b2ae Compare December 23, 2025 09:42
@arjan-bal arjan-bal force-pushed the better-recycle-buffer-objects branch from c33b2ae to 3331987 Compare December 23, 2025 09:56
@arjan-bal arjan-bal changed the title mem: Correctly recycle buffer objects after SplitUnsafe mem: Optimize buffer re-use Dec 23, 2025
@arjan-bal arjan-bal changed the title mem: Optimize buffer re-use mem: Optimize buffer object re-use Dec 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Area: Transport Includes HTTP/2 client/server and HTTP server handler transports and advanced transport features. Type: Performance Performance improvements (CPU, network, memory, etc)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants