Commit 6ad2161

Add warning
1 parent 423a3ec commit 6ad2161

3 files changed

+82
-25
lines changed

README.md

Lines changed: 35 additions & 13 deletions
@@ -3,8 +3,25 @@
 Fast batched dataloading of BigWig files containing epigenetic track data and corresponding sequences powered by GPU
 for deep learning applications.

+> ⚠️ **BREAKING CHANGE (v0.3.0+)**: The axis order of the output matrix has changed from `(n_tracks, batch_size, sequence_length)` to `(batch_size, sequence_length, n_tracks)`. This change was long overdue and eliminates the need for (potentially memory-expensive) transpose operations downstream. If you're upgrading from an earlier version, please update your code accordingly (you probably need to delete one transpose in your code).
+
+
+
+
 ## Quickstart

+### Installation with Pixi
+
+Please take a look at the pixi.toml file. If you just want to use bigwig-loader,
+copy that pixi.toml, add the other libraries you need, and use the "prod" environment
+(you don't need to clone this repo; pixi will download bigwig-loader from the
+conda "dataloading" channel):
+
+```shell
+pixi run -e prod <my_training_command>
+```
+
+
 ### Installation with conda/mamba

 Bigwig-loader mainly depends on the rapidsai kvikio library and cupy, both of which are best installed using
@@ -65,16 +82,17 @@ dataset = PytorchBigWigDataset(
     regions_of_interest=train_regions,
     collection=example_bigwigs_directory,
     reference_genome_path=reference_genome_file,
-    sequence_length=1000,
-    center_bin_to_predict=500,
+    sequence_length=1_000_000,
+    center_bin_to_predict=500_000,
     window_size=1,
-    batch_size=32,
-    super_batch_size=1024,
-    batches_per_epoch=20,
+    batch_size=1,
+    super_batch_size=4,
+    batches_per_epoch=100,
     maximum_unknown_bases_fraction=0.1,
     sequence_encoder="onehot",
     n_threads=4,
     return_batch_objects=True,
+    dtype="bfloat16"
 )

 # Don't use num_workers > 0 in DataLoader. The heavy
@@ -88,7 +106,7 @@ class MyTerribleModel(torch.nn.Module):
         self.linear = torch.nn.Linear(4, 2)

     def forward(self, batch):
-        return self.linear(batch).transpose(1, 2)
+        return self.linear(batch)


 model = MyTerribleModel()
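
Note on the removed `transpose`: with the new layout the track axis comes last, so a layer that acts on the last axis already yields output in the same axis order as the targets. Code that still holds arrays in the pre-v0.3.0 layout can be converted with a single axis permutation. A minimal sketch with NumPy stand-ins; the array names and toy sizes are illustrative assumptions, not part of bigwig-loader:

```python
import numpy as np

# Toy sizes, chosen only for illustration.
n_tracks, batch_size, sequence_length = 2, 3, 5

# Stand-in for values in the old layout: (n_tracks, batch_size, sequence_length).
old_values = np.arange(n_tracks * batch_size * sequence_length).reshape(
    n_tracks, batch_size, sequence_length
)

# New layout: (batch_size, sequence_length, n_tracks).
# Old axis 1 (batch) moves first, old axis 2 (sequence) second, old axis 0 (tracks) last.
new_values = np.transpose(old_values, (1, 2, 0))

assert new_values.shape == (batch_size, sequence_length, n_tracks)
# The same element is reachable under both index orders.
assert new_values[1, 4, 0] == old_values[0, 1, 4]
```

Upgrading code would normally go the other way and simply drop such a permutation, as this hunk does.
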
@@ -98,10 +116,10 @@ def poisson_loss(pred, target):
     return (pred - target * torch.log(pred.clamp(min=1e-8))).mean()

 for batch in dataloader:
-    # batch.sequences.shape = n_batch (32), sequence_length (1000), onehot encoding (4)
+    # batch.sequences.shape = n_batch x sequence_length x onehot encoding (4)
     pred = model(batch.sequences)
-    # batch.values.shape = n_batch (32), n_tracks (2) center_bin_to_predict (500)
-    loss = poisson_loss(pred[:, :, 250:750], batch.values)
+    # batch.values.shape = n_batch x center_bin_to_predict x n_tracks
+    loss = poisson_loss(pred[:, 250000:750000, :], batch.values)
     print(loss)
     optimizer.zero_grad()
     loss.backward()
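
Note on the slice bounds: `250000:750000` selects the central 500,000 positions of a 1,000,000-long sequence, just as `250:750` selected the central 500 of 1000 before. The bounds can be derived instead of hard-coded, assuming the predicted window is centered in the input as it is in both versions of the example. `center_slice` is a hypothetical helper written for this note, not a bigwig-loader function:

```python
def center_slice(sequence_length: int, center_length: int) -> slice:
    """Return the slice selecting the central `center_length` positions.

    Hypothetical helper for illustration; not part of bigwig-loader.
    """
    if not 0 < center_length <= sequence_length:
        raise ValueError("center_length must be in (0, sequence_length]")
    start = (sequence_length - center_length) // 2
    return slice(start, start + center_length)


# Values from the updated README example.
crop = center_slice(1_000_000, 500_000)
print(crop.start, crop.stop)  # 250000 750000

# Values from the previous README example.
old_crop = center_slice(1000, 500)
print(old_crop.start, old_crop.stop)  # 250 750
```

With this, `pred[:, crop, :]` would select the same positions as the literal slice in the added line.
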
@@ -166,19 +184,23 @@ anything is unclear, please open an issue.

 ### Environment

+The pixi.toml includes a dev environment that has bigwig-loader installed
+as an editable PyPI dependency.
+
 1. `git clone git@github.com:pfizer-opensource/bigwig-loader`
 2. `cd bigwig-loader`
-3. create the conda environment" `conda env create -f environment.yml`
-4. `pip install -e '.[dev]'`
-5. run `pre-commit install` to install the pre-commit hooks
+3. optional: `pixi install -e dev`
+4. run `pre-commit install` to install the pre-commit hooks

 ### Run Tests
 Tests are in the tests directory. One of the most important tests is
 test_against_pybigwig which makes sure that if there is a mistake in
 pyBigWig, it is also in bigwig-loader.

+To run these tests you need a GPU.
+
 ```shell
-pytest -vv .
+pixi run -e dev test
 ```

 When github runners with GPU's will become available we would also

pixi.lock

Lines changed: 46 additions & 11 deletions
Some generated files are not rendered by default.

pixi.toml

Lines changed: 1 addition & 1 deletion
@@ -60,6 +60,6 @@ asgi-lifespan = "*"
 pyBigWig = "*"

 [environments]
-mac = {features = ["cpu", "test"]}
 prod = {features = ["gpu", "released"]}
 dev = {features = ["gpu", "dev", "test"]}
+dev-cpu = {features = ["cpu", "test"] }
