Performance: getImage() or parseFileDirectoryAt() fetches huge TileOffsets and TileByteCounts arrays, causing massive downloads for highly-tiled COGs #479

@MariDani

Description

Problem

When reading a very large, highly-tiled Cloud-Optimized GeoTIFF (e.g., 7 GB), the initial call to getImage() or parseFileDirectoryAt() triggers an unexpectedly large data download. For one of our test files, this initial metadata request downloads ~500 MB of data.

We have observed that this behavior is caused by the library fetching the entire TileOffsets and TileByteCounts arrays from the IFDs. For an image with millions of tiles, these arrays can be hundreds of megabytes in size, which defeats the purpose of the COG format's selective data access.

This makes the library difficult to use for efficiently visualizing very large-scale raster data on the web.

Behavior Comparison

The library @cogeotiff/core handles this scenario differently. Due to its lazy-loading approach, it appears to parse the IFD without immediately fetching the tile index arrays. For the same 7 GB COG file, it only requires ~7 MB of initial data transfer to begin visualization, which is the expected behavior.

Steps to Reproduce

The issue can be easily verified on the official geotiff.js demo applications.

A. Reproducing on Official Demo Sites (No Code Required)

1. Navigate to geotiff.io or the COG Explorer.
2. Open the browser's developer tools and select the Network tab.
3. In the application, open the test file via URL: https://gisat-data.eu-central-1.linodeobjects.com/WorldCereal_GST-10/project/demo/merged_cog.tif
4. Observe the network requests. A download of ~276 MB is initiated to fetch the IFD before any part of the map is rendered.

B. Reproducing with Code

1. Use geotiff.js to open a large COG that contains a high number of tiles.
2. Call the getImage() method on the highest-resolution image.
3. Observe the Network tab in the browser's developer tools and note the large download size.

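import { fromUrl } from 'geotiff';

const cogUrl = 'https://gisat-data.eu-central-1.linodeobjects.com/WorldCereal_GST-10/project/demo/merged_cog.tif';

const tiff = await fromUrl(cogUrl);
// getImage() with no argument returns the first image (index 0),
// which is the highest-resolution image in this file.
const image = await tiff.getImage();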
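
Outside the browser there is no Network tab, so the transfer size has to be measured some other way. The following is a minimal sketch of a byte-counting wrapper around a fetch function. The name `makeCountingFetch` is hypothetical and not part of geotiff.js, and the sketch assumes that geotiff.js issues its requests through the global `fetch`, which may not hold in every environment or version.

```javascript
// Hypothetical helper (not part of geotiff.js): wraps a fetch-compatible
// function and records how many requests were made and how many body
// bytes were received.
function makeCountingFetch(baseFetch) {
  const stats = { requests: 0, bytes: 0 };

  async function countingFetch(input, init) {
    const response = await baseFetch(input, init);
    stats.requests += 1;
    // Read a clone so the caller can still consume the original body.
    // Note: this waits for the full body before returning the response.
    const body = await response.clone().arrayBuffer();
    stats.bytes += body.byteLength;
    return response;
  }

  return { countingFetch, stats };
}
```

Under the assumption above, assigning `globalThis.fetch = makeCountingFetch(globalThis.fetch).countingFetch` before calling `fromUrl(cogUrl)` and then reading `stats.bytes` after `getImage()` resolves should give a rough figure comparable to the one shown in the Network tab.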

Technical Details of Example Data

The problem can be reproduced with the following public file:
https://gisat-data.eu-central-1.linodeobjects.com/WorldCereal_GST-10/project/demo/merged_cog.tif

File Analysis: The file is a valid COG created with the rio-cogeo tool (files created with the gdal_translate tool behaved the same way). The first image (Image 0) contains over 17 million tiles (1083648x1044736 pixels at 256x256 tile size).
IFD Size: The total size of the IFD headers, primarily TileOffsets and TileByteCounts, is ~276 MB. This corresponds directly to the data being downloaded on initialization.
Below is a screenshot of the console log of the first IFD, showing the large TileOffsets and TileByteCounts arrays.

[Screenshot: console log of the first IFD with the TileOffsets and TileByteCounts arrays]
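
The tile count and IFD size quoted above can be cross-checked with a short calculation. This sketch assumes a single set of tiles per image (one band, or chunky planar configuration) and 8 bytes per entry in both arrays, as would be the case for 64-bit BigTIFF values. Both are assumptions, not something read from the file, and `tileIndexSize` is a hypothetical helper.

```javascript
// Hypothetical helper: estimates the number of tiles and the combined size
// of the TileOffsets and TileByteCounts arrays for one image.
function tileIndexSize({ width, height, tileWidth, tileHeight, bytesPerEntry = 8 }) {
  const tilesAcross = Math.ceil(width / tileWidth);
  const tilesDown = Math.ceil(height / tileHeight);
  const tileCount = tilesAcross * tilesDown;
  // Two arrays (offsets and byte counts), one entry per tile in each.
  const bytes = tileCount * bytesPerEntry * 2;
  return { tilesAcross, tilesDown, tileCount, bytes };
}

const est = tileIndexSize({
  width: 1083648,
  height: 1044736,
  tileWidth: 256,
  tileHeight: 256,
});

console.log(est.tileCount);                       // 17274873
console.log((est.bytes / 1e6).toFixed(1) + ' MB'); // 276.4 MB
```

Under these assumptions the two arrays alone account for roughly 276 MB, which is consistent with the download size observed on initialization.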

Proposed Solution / Feature Request

We would like to know whether we are missing something in how geotiff.js is meant to be used here. Is there a way to use the library without fetching the full TileOffsets and TileByteCounts arrays, for example through optional lazy loading of those arrays? This would keep the initial metadata download minimal and align with the performance expectations of the COG format.
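
To illustrate what lazy loading could mean in practice, here is a minimal sketch of how a reader might locate the single TileOffsets entry for one tile and request only those bytes. It is not geotiff.js API: the function names are hypothetical, and it assumes tiles are indexed left to right, top to bottom within a single plane, with fixed-size 8-byte entries stored contiguously from a known file offset.

```javascript
// Hypothetical helpers, not part of geotiff.js.

// Linear index of the tile at column x, row y (row-major order).
function tileIndex(x, y, tilesAcross) {
  return y * tilesAcross + x;
}

// Byte range (inclusive) of one entry inside an offsets/byte-counts array
// that starts at arrayFileOffset in the file.
function entryByteRange(arrayFileOffset, index, bytesPerEntry = 8) {
  const start = arrayFileOffset + index * bytesPerEntry;
  return { start, end: start + bytesPerEntry - 1 };
}

// Value for an HTTP Range request header covering that entry.
function rangeHeader({ start, end }) {
  return `bytes=${start}-${end}`;
}

// Example: tile (2, 1) in an image that is 4233 tiles wide, with an
// illustrative array offset of 1000 bytes (made up for this example).
const idx = tileIndex(2, 1, 4233);
const range = entryByteRange(1000, idx);
console.log(idx, rangeHeader(range)); // 4235 'bytes=34880-34887'
```

With such a lookup, displaying one tile would cost a few small range requests for its index entries instead of a download of the whole array. In practice a reader would probably fetch and cache entries in larger blocks to limit the number of round trips.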
