Commit 62bdc28

Inaugural commit
0 parents  commit 62bdc28

74 files changed: 16717 additions, 0 deletions

.github/workflows/test.yml

Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
on: [push, pull_request]
name: Test
jobs:
  test:
    strategy:
      matrix:
        go-version: [1.23.x]
        os: [ubuntu-latest, macos-latest]
    runs-on: ${{ matrix.os }}
    steps:
      - name: Install Go
        uses: actions/setup-go@v5
        with:
          go-version: ${{ matrix.go-version }}
      - name: Checkout code
        uses: actions/checkout@v4
      - uses: actions/cache@v4
        with:
          path: |
            ~/go/pkg/mod
            ~/.cache/go-build
          key: ${{ runner.os }}-go-${{ hashFiles('**/go.sum') }}
          restore-keys: |
            ${{ runner.os }}-go-
      - name: Test
        run: go test -short -v ./... -coverpkg "github.com/prequel-dev/plz4,github.com/prequel-dev/plz4/internal/...,github.com/prequel-dev/plz4/pkg/..." -coverprofile coverage.out
      - name: Upload coverage as artifact
        uses: actions/upload-artifact@v3
        with:
          name: coverage.out
          path: coverage.out
        if: matrix.os == 'ubuntu-latest'
  coverage:
    runs-on: ubuntu-latest
    needs: test
    permissions:
      contents: write # This grants write access to the repository, including the Wiki.
    steps:
      - name: Download coverage artifact
        uses: actions/download-artifact@v3
        with:
          name: coverage.out
          path: ~/coverage.out
      - name: Update coverage report
        uses: ncruces/go-coverage-report@v0
        with:
          report: 'true'
          chart: 'true'
          amend: 'true'
          coverage-file: 'coverage.out'
        continue-on-error: true

LICENSE

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
BSD 2-Clause License (http://www.opensource.org/licenses/bsd-license.php)

Copyright (C) 2025, Prequel, Inc.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

README.md

Lines changed: 69 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,69 @@
# Parallel LZ4
[![license](http://img.shields.io/badge/license-BSD--2-red.svg?style=flat)](https://raw.githubusercontent.com/prequel-dev/plz4/main/LICENSE)
[![Build Status](https://github.com/prequel-dev/plz4/actions/workflows/test.yml/badge.svg)](https://github.com/prequel-dev/plz4/actions/workflows/test.yml)
[![Go Coverage](https://github.com/prequel-dev/plz4/wiki/coverage.svg)](https://raw.githack.com/wiki/prequel-dev/plz4/coverage.html)
[![GitHub Tag](https://img.shields.io/github/tag/prequel-dev/plz4.svg?style=social)](https://github.com/prequel-dev/plz4/tags)

The plz4 package provides a fast and simple Go library to encode and decode the [LZ4 Frame Format](./docs/lz4_Frame_Format.md) in parallel.

In addition, it provides the [plz4](./cmd/plz4) command line tool to generate and decode LZ4.

## Features

The primary goal of the plz4 project is performance, speed in particular. Multi-core machines are now commonplace, and LZ4's independent block mode is well suited to take full advantage of multiple cores, with some [caveats](#caveats).

This project attempts to support all of the features enumerated in the [LZ4 Frame Format](./docs/lz4_Frame_Format.md) specification. In addition to baseline features such as checksums and variable compression levels, the library supports the following:

- User-provided [dictionary](./docs/lz4_Frame_Format.md?plain=1#L220) to improve compression ratio
- [Linked blocks](./docs/lz4_Frame_Format.md?plain=1#L154) (i.e., non-independent) to improve compression ratio
- [Skippable frame](./docs/lz4_Frame_Format.md?plain=1#L308) support
- [Frame concatenation](./docs/lz4_Frame_Format.md?plain=1#L106) support
- User-specified parallelism and optional worker pool
- Progress callbacks
- [Sparse](./pkg/sparse) write support
- Random read access (see [caveats](#random-read-access))


## Design
30+
31+
The library is written in [Go](https://go.dev/), which provides a fantastic framework for parallel execution. For maximum feature compatibility, the underlying engine leverages the canonical [lz4 library](https://github.com/lz4/lz4) via CGO. As an alternative; there is an excellent [pure golang](https://github.com/pierrec/lz4) implementation by Pierre Curto.
32+
33+
The library runs in two modes; either synchronous or asynchronous mode. The asynchronous mode executes the encoding/decoding work in one or more goroutines to parallelize. In both modes, a memory pool is employed to minimize data allocations on the heap. Data blocks are reused from the memory pool when possible. On account of the minimal heap allocations, plz4 puts little pressure on the heap. As such, it performs well as a compression engine for long-running processes such as servers.
34+
35+
There is an inherent tradeoff between speed and memory in the LZ4 design. LZ4 compresses best with large blocks and as such the 4 Mib block size is the default. However, the more work done in parallel increases the amount of instantaneous RAM used. For example, a compression job using 8 cores and 4 MiB blocks could use upwards of 32 Mibs at one time (more than that when you consider both read and write blocks). A compression job using 8 cores and 64 KiB blocks would use much less, upwards of 512Kib.
36+
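The arithmetic behind those figures can be sketched in a few lines of Go. This is only a back-of-the-envelope estimate, not part of the plz4 API: the `peakBlockBytes` helper is a hypothetical name, and it assumes each worker holds exactly one block at a time.

```go
package main

import "fmt"

// peakBlockBytes estimates the block memory in flight when `workers`
// goroutines each hold one block of blockSize bytes. It is a lower bound:
// it ignores the separate read and write buffers a real job also needs.
// (Hypothetical helper, not a plz4 function.)
func peakBlockBytes(workers, blockSize int) int {
	return workers * blockSize
}

func main() {
	const KiB = 1 << 10
	const MiB = 1 << 20

	// 8 workers with the default 4 MiB blocks.
	fmt.Printf("8 x 4 MiB  = %d MiB\n", peakBlockBytes(8, 4*MiB)/MiB)
	// 8 workers with 64 KiB blocks.
	fmt.Printf("8 x 64 KiB = %d KiB\n", peakBlockBytes(8, 64*KiB)/KiB)
}
```
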
To manage this tradeoff, there are a few knobs:

- When compressing, tune the block size given the environment.
- For each job, the maximum number of goroutines may be specified. This, coupled with the block size, will limit overall RAM usage.
- There is additionally an option to provide a user-specified WorkerPool. The advantage here is that the overall number of cores is limited without having to manage the maximum parallel count on each job.

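To illustrate why a shared pool caps overall parallelism regardless of how many jobs are running, here is a minimal, generic sketch. It does not use the plz4 WorkerPool type or its API; the `pool`, `newPool`, `submit`, and `runJobs` names are all hypothetical, and the "work" is a placeholder for compressing one block.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// pool is a hypothetical shared worker pool: a fixed set of goroutines
// draining one task queue. It is NOT the plz4 WorkerPool type.
type pool struct {
	tasks chan func()
	wg    sync.WaitGroup
}

func newPool(workers int) *pool {
	p := &pool{tasks: make(chan func())}
	for i := 0; i < workers; i++ {
		p.wg.Add(1)
		go func() {
			defer p.wg.Done()
			for task := range p.tasks {
				task()
			}
		}()
	}
	return p
}

// submit blocks until a worker is free to take the task.
func (p *pool) submit(task func()) { p.tasks <- task }

// close stops accepting tasks and waits for the workers to drain.
func (p *pool) close() {
	close(p.tasks)
	p.wg.Wait()
}

// runJobs pushes `jobs` concurrent jobs of `blocks` tasks each through one
// shared pool and returns the peak concurrency seen and the tasks completed.
func runJobs(workers, jobs, blocks int) (int64, int64) {
	p := newPool(workers)
	var active, maxActive, completed int64
	var jobWG sync.WaitGroup
	for j := 0; j < jobs; j++ {
		jobWG.Add(1)
		go func() {
			defer jobWG.Done()
			for b := 0; b < blocks; b++ {
				p.submit(func() {
					n := atomic.AddInt64(&active, 1)
					for { // record the highest concurrency observed
						m := atomic.LoadInt64(&maxActive)
						if n <= m || atomic.CompareAndSwapInt64(&maxActive, m, n) {
							break
						}
					}
					// A real task would compress one block here.
					atomic.AddInt64(&active, -1)
					atomic.AddInt64(&completed, 1)
				})
			}
		}()
	}
	jobWG.Wait()
	p.close()
	return maxActive, completed
}

func main() {
	peak, done := runJobs(4, 3, 50)
	fmt.Printf("peak concurrency %d (limit 4), tasks completed %d\n", peak, done)
}
```

However many jobs submit work, the peak concurrency never exceeds the pool size, which is the property the third knob relies on.
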
## Caveats

### Linked Blocks

While LZ4 frames using independent blocks parallelize well, frames using the linked blocks feature do not. This is because each block depends on the data from the previous block. While plz4 can compress linked frames in parallel, it cannot decompress them in parallel because of this dependency.

### Content Checksum

There is another LZ4 Frame feature that is problematic at scale. By default, plz4 enables the content checksum feature, as recommended in the spec. This feature uses a 32-bit checksum to validate that the content stream produced during decompression has the same checksum as the original source. Because the checksum must be calculated serially, a highly parallel decompression job may fall behind during this calculation. To improve parallel throughput, disable the content checksum feature on decompress.

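The serial bottleneck can be seen in a small sketch: blocks may be decoded concurrently, but a running checksum has to consume them strictly in order. This is an illustration only, under several assumptions: `decodeBlock` is a hypothetical stand-in that merely upper-cases bytes, and CRC-32 from the Go standard library substitutes for the xxHash-32 checksum that the LZ4 Frame Format actually uses.

```go
package main

import (
	"bytes"
	"fmt"
	"hash/crc32"
	"sync"
)

// decodeBlock stands in for decompressing one independent block. It only
// upper-cases the bytes so the example stays self-contained (hypothetical).
func decodeBlock(src []byte) []byte { return bytes.ToUpper(src) }

// serialChecksum hashes the whole content in one pass, for comparison.
func serialChecksum(data []byte) uint32 { return crc32.ChecksumIEEE(data) }

// decodeAll decodes blocks concurrently, then feeds the running checksum
// strictly in block order, because the hash depends on byte order.
func decodeAll(blocks [][]byte) ([]byte, uint32) {
	out := make([][]byte, len(blocks))
	var wg sync.WaitGroup
	for i, b := range blocks {
		wg.Add(1)
		go func(i int, b []byte) {
			defer wg.Done()
			out[i] = decodeBlock(b) // parallel part
		}(i, b)
	}
	wg.Wait()

	h := crc32.NewIEEE() // stand-in for xxHash-32
	var content bytes.Buffer
	for _, b := range out { // serial part: one block after another
		h.Write(b)
		content.Write(b)
	}
	return content.Bytes(), h.Sum32()
}

func main() {
	blocks := [][]byte{[]byte("ab"), []byte("cd"), []byte("ef")}
	content, sum := decodeAll(blocks)
	fmt.Printf("content=%s match=%v\n", content, sum == serialChecksum(content))
}
```
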
### Random read access

Another advantage of independent blocks is the potential to support random read access. This is possible because each block can be independently decompressed. To support this, plz4 provides an optional progress callback that emits both the source offset and corresponding block offset during compression. An implementation can use this information to build lookup tables that can later be used to skip ahead during decompression to a known block offset. plz4 provides the 'WithReadOffset' option on the NewReader API to skip ahead and start decompression at a known block offset.

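A lookup table of this kind could be as simple as a sorted slice searched with a binary search. The sketch below is an assumption about how a caller might store the pairs reported by the progress callback; the `indexEntry` type, its field names, and the offset values are hypothetical, and the plz4 callback and 'WithReadOffset' signatures are deliberately not shown.

```go
package main

import (
	"fmt"
	"sort"
)

// indexEntry pairs an uncompressed source offset with the offset of the
// compressed block that starts there. Type and field names are hypothetical.
type indexEntry struct {
	srcOffset   int64
	blockOffset int64
}

// lookup returns the entry for the block containing srcPos. It assumes the
// index is sorted by srcOffset, as it would be if entries were appended in
// the order compression reported them.
func lookup(index []indexEntry, srcPos int64) (indexEntry, bool) {
	if len(index) == 0 || srcPos < index[0].srcOffset {
		return indexEntry{}, false
	}
	// Find the first entry that starts after srcPos; the one before it
	// is the block that contains srcPos.
	i := sort.Search(len(index), func(i int) bool {
		return index[i].srcOffset > srcPos
	})
	return index[i-1], true
}

func main() {
	// Made-up offsets for three 64 KiB source blocks.
	index := []indexEntry{
		{srcOffset: 0, blockOffset: 7},
		{srcOffset: 65536, blockOffset: 30011},
		{srcOffset: 131072, blockOffset: 58120},
	}
	if e, ok := lookup(index, 70000); ok {
		fmt.Printf("start at block offset %d, then skip %d bytes\n",
			e.blockOffset, 70000-e.srcOffset)
	}
}
```

The returned block offset is the value a caller would hand to the reader to start decompression, discarding the leading bytes of that block to reach the exact source position.
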
## Install

To install the 'plz4' command line tool, use the following command:

```
go install github.com/prequel-dev/plz4/cmd/plz4@latest
```

Use the 'bakeoff' command to determine whether your particular payload performs better using plz4 or the pure Go [implementation](https://github.com/pierrec/lz4). The two implementations differ in how compression level relates to output size.

