Problem
When reading a very large, highly tiled Cloud-Optimized GeoTIFF (e.g., 7 GB), the initial call to getImage() or parseFileDirectoryAt() triggers an unexpectedly large data download. For one of our test files, this initial metadata request downloads ~500 MB of data.
We have observed that this behavior is caused by the library fetching the entire TileOffsets and TileByteCounts arrays from the IFDs. For an image with millions of tiles, these arrays can be hundreds of megabytes in size, which defeats the purpose of the COG format's selective data access.
This makes the library difficult to use for efficiently visualizing very large-scale raster data on the web.
Behavior Comparison
The library @cogeotiff/core handles this scenario differently. Due to its lazy-loading approach, it appears to parse the IFD without immediately fetching the tile index arrays. For the same 7 GB COG file, it only requires ~7 MB of initial data transfer to begin visualization, which is the expected behavior.
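To illustrate why lazy access to the tile index can be so much cheaper, here is a minimal sketch that lists the tiles a pixel window touches and counts the index bytes needed for just those tiles. It is not taken from either library: `tilesForWindow` is a hypothetical helper, tiles are assumed to be numbered in row-major order, and each index entry is assumed to be 8 bytes wide.

```javascript
// Hypothetical helper: row-major indices of the tiles intersecting a pixel window.
// `win` uses pixel coordinates; `right` and `bottom` are exclusive.
function tilesForWindow(win, imageWidth, tileSize) {
  const tilesAcross = Math.ceil(imageWidth / tileSize);
  const firstCol = Math.floor(win.left / tileSize);
  const lastCol = Math.floor((win.right - 1) / tileSize);
  const firstRow = Math.floor(win.top / tileSize);
  const lastRow = Math.floor((win.bottom - 1) / tileSize);

  const indices = [];
  for (let row = firstRow; row <= lastRow; row++) {
    for (let col = firstCol; col <= lastCol; col++) {
      indices.push(row * tilesAcross + col);
    }
  }
  return indices;
}

// A 1024x1024 window in the top-left corner of a 1083648-pixel-wide image.
const windowTiles = tilesForWindow(
  { left: 0, top: 0, right: 1024, bottom: 1024 },
  1083648,
  256,
);

// Assumption: 8 bytes per entry, one entry in each of the two arrays per tile.
const windowIndexBytes = windowTiles.length * 2 * 8;

console.log(`${windowTiles.length} tiles, ${windowIndexBytes} bytes of tile index`);
```

Under these assumptions the window needs only a few hundred bytes of index data, although a real reader would probably fetch somewhat larger contiguous ranges to limit the number of requests.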
Steps to Reproduce
The issue can be easily verified on the official geotiff.js demo applications.
A. Reproducing on Official Demo Sites (No Code Required)
1. Navigate to geotiff.io or the COG Explorer.
2. Open the browser's developer tools and select the Network tab.
3. In the application, open the test file via URL: https://gisat-data.eu-central-1.linodeobjects.com/WorldCereal_GST-10/project/demo/merged_cog.tif
4. Observe the network requests. A large download of ~276 MB will be initiated to fetch the IFD before any part of the map is rendered.
B. Reproducing with Code
1. Use geotiff.js to open a large COG that contains a high number of tiles.
2. Call the getImage() method on the highest-resolution image.
3. Observe the network tab in the browser's developer tools and note the large download size.
import { fromUrl } from 'geotiff';
const cogUrl = 'https://gisat-data.eu-central-1.linodeobjects.com/WorldCereal_GST-10/project/demo/merged_cog.tif';
const tiff = await fromUrl(cogUrl);
const image = await tiff.getImage();
Technical Details of Example Data
The problem can be reproduced with the following public file:
https://gisat-data.eu-central-1.linodeobjects.com/WorldCereal_GST-10/project/demo/merged_cog.tif
File Analysis: The file is a valid COG created with the rio-cogeo tool (files created with the gdal_translate tool behaved the same way). The first image (Image 0) contains over 17 million tiles (1083648x1044736 pixels at 256x256 tile size).
IFD Size: The total size of the IFD headers, primarily TileOffsets and TileByteCounts, is ~276 MB. This corresponds directly to the data being downloaded on initialization.
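The reported size can be cross-checked with a short calculation. This sketch assumes 8 bytes per entry in each array (as with BigTIFF LONG8 values) and a single tile plane; under those assumptions the result matches the ~276 MB observed.

```javascript
// Image 0 dimensions and tile size, as reported above.
const imageWidth = 1083648;
const imageHeight = 1044736;
const tileEdge = 256;

// Assumption: 8 bytes per entry in each array (BigTIFF LONG8).
const bytesPerEntry = 8;

const tilesAcross = Math.ceil(imageWidth / tileEdge); // 4233
const tilesDown = Math.ceil(imageHeight / tileEdge);  // 4081
const tileCount = tilesAcross * tilesDown;            // 17274873

// Two arrays of equal length: TileOffsets and TileByteCounts.
const tileIndexBytes = tileCount * bytesPerEntry * 2; // 276397968

console.log(`${tileCount} tiles, ~${Math.round(tileIndexBytes / 1e6)} MB of tile index`);
```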
Below is a screenshot of the console log for the first IFD, showing the large TileOffsets and TileByteCounts arrays.
Proposed Solution / Feature Request
Are we missing something in how geotiff.js is meant to be used for this case? Is there a way to use the library without fetching the full TileOffsets and TileByteCounts arrays, for example through optional lazy loading? This would keep the initial metadata download minimal and align with the performance expectations of the COG format.
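To make the request concrete, here is a rough sketch of what lazy loading of such an array could look like. It is not geotiff.js code and does not describe how any existing library works: `LazyUint64Array` and the `readRange(offset, length)` callback are hypothetical names, and the entries are assumed to be 8-byte little-endian integers. The demo reads from an in-memory buffer instead of issuing HTTP Range requests.

```javascript
// Hypothetical lazy view over a TileOffsets-style array of 8-byte entries.
// `readRange(offset, length)` is an assumed callback returning Promise<ArrayBuffer>,
// which in a real reader could be backed by an HTTP Range request.
class LazyUint64Array {
  constructor(readRange, arrayOffset, entryCount, entriesPerChunk = 1024) {
    this.readRange = readRange;
    this.arrayOffset = arrayOffset;
    this.entryCount = entryCount;
    this.entriesPerChunk = entriesPerChunk;
    this.chunks = new Map(); // chunk number -> Promise<DataView>
  }

  // Byte range within the file that holds the given chunk of entries.
  chunkRange(chunkNo) {
    const first = chunkNo * this.entriesPerChunk;
    const count = Math.min(this.entriesPerChunk, this.entryCount - first);
    return { offset: this.arrayOffset + first * 8, length: count * 8 };
  }

  // Resolve one entry, fetching and caching only the chunk that contains it.
  async get(index) {
    if (!Number.isInteger(index) || index < 0 || index >= this.entryCount) {
      throw new RangeError(`index ${index} out of range`);
    }
    const chunkNo = Math.floor(index / this.entriesPerChunk);
    let chunk = this.chunks.get(chunkNo);
    if (!chunk) {
      const { offset, length } = this.chunkRange(chunkNo);
      chunk = this.readRange(offset, length).then((buf) => new DataView(buf));
      this.chunks.set(chunkNo, chunk);
    }
    const view = await chunk;
    // Assumption: little-endian byte order; values stay below 2^53.
    return Number(view.getBigUint64((index % this.entriesPerChunk) * 8, true));
  }
}

// In-memory stand-in for a remote file: 16 bytes of padding, then 5000 entries.
const demoArrayOffset = 16;
const demoEntryCount = 5000;
const demoFile = new ArrayBuffer(demoArrayOffset + demoEntryCount * 8);
const demoView = new DataView(demoFile);
for (let i = 0; i < demoEntryCount; i++) {
  demoView.setBigUint64(demoArrayOffset + i * 8, BigInt(i * 1000), true);
}

// Counts how many bytes the lazy array actually asks for.
let demoBytesRead = 0;
const demoReadRange = async (offset, length) => {
  demoBytesRead += length;
  return demoFile.slice(offset, offset + length);
};

const lazyOffsets = new LazyUint64Array(demoReadRange, demoArrayOffset, demoEntryCount);
lazyOffsets.get(4999).then((value) => {
  console.log(`entry 4999 = ${value}, bytes read so far: ${demoBytesRead}`);
});
```

The chunk size trades request count against over-fetching; 1024 entries per chunk is an arbitrary choice for this sketch.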