Batch Processing Feature #40

@Blaizzy

Description

Overview

The goal is to add support for efficient batch processing of inputs to the MLX-VLM library. This will allow users to process multiple images and text prompts simultaneously to generate corresponding outputs in a single batch, improving performance.

Use cases:

  1. Generating captions for a large dataset of images.
  2. Localizing objects or regions in a batch of images based on textual descriptions.
  3. Classifying a large number of images into predefined categories, considering accompanying text information.
  4. Answering questions about a batch of images, using either a single shared question prompt or one prompt per image.
  5. Video processing.

Note: Tag @Blaizzy for code reviews and questions.

Requirements

Support batched inputs:

  • Accept a batch of images as input, provided as a list or array of image objects.
  • Accept a batch of text prompts as input, provided as a list or array of strings.
  • Accept a single text prompt as input, provided as a string (see the input-handling sketch after this list).
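
As a rough illustration of this input contract, a small normalization helper could map the accepted combinations onto two parallel lists. This is only a sketch: `prepare_batch` is a hypothetical name, not part of the current API, and it assumes a single string prompt is meant to apply to every image.

```python
from typing import Any, List, Sequence, Tuple, Union


def prepare_batch(
    images: Sequence[Any],               # paths, URLs, or already-loaded image objects
    prompts: Union[str, Sequence[str]],  # one shared prompt or one prompt per image
) -> Tuple[List[Any], List[str]]:
    """Normalize the accepted input combinations into two parallel lists."""
    images = list(images)
    if isinstance(prompts, str):
        # Assumption: a single prompt applies to every image in the batch.
        prompts = [prompts] * len(images)
    else:
        prompts = list(prompts)
    if len(images) != len(prompts):
        raise ValueError(
            f"Got {len(images)} images and {len(prompts)} prompts; "
            "provide one prompt per image or a single shared prompt."
        )
    return images, prompts
```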

Perform batch processing:

  • Process the batch of images and text prompts simultaneously (asynchronously where appropriate) using the MLX-VLM model.
  • Utilize parallel processing or GPU acceleration to optimize batch processing performance.
  • Ensure that the processing of one input in the batch does not affect the processing of other inputs, e.g. padding added for one input must not change another input's output (see the padding sketch after this list).
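
A common way to run prompts of different lengths through one forward pass is to left-pad the tokenized prompts to a shared length and carry an attention mask. The sketch below is generic and not tied to MLX-VLM internals; `pad_and_stack` and `pad_id` are placeholders.

```python
import mlx.core as mx


def pad_and_stack(token_lists, pad_id):
    """Left-pad variable-length token id lists to one length and stack them
    into a (batch, seq_len) array, with a mask that is 1 on real tokens."""
    max_len = max(len(tokens) for tokens in token_lists)
    input_ids, attention_mask = [], []
    for tokens in token_lists:
        pad = max_len - len(tokens)
        input_ids.append([pad_id] * pad + list(tokens))
        attention_mask.append([0] * pad + [1] * len(tokens))
    return mx.array(input_ids), mx.array(attention_mask)
```

The mask keeps pad positions out of attention, so an input's result should be the same whether it is padded as part of a batch or run alone.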

Generate batched outputs:

  • Return the generated outputs for each input in the batch.
  • Maintain the order of the outputs corresponding to the order of the inputs (see the result-record sketch after this list).
  • Support different output formats such as text, embeddings, or visual representations based on the specific task.
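
To keep outputs aligned with their inputs, results could be returned in input order as small per-item records. The `BatchResult` name and fields below are illustrative only, not an existing MLX-VLM type.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class BatchResult:
    index: int                   # position of the input in the original batch
    text: Optional[str] = None   # generated text, when the task produces text
    output: Any = None           # other payloads, e.g. embeddings
    error: Optional[str] = None  # set instead of text/output if this item failed
```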

Error handling:

  • Handle errors gracefully during batch processing.
  • Provide informative error messages for invalid inputs or processing failures.
  • Continue processing the remaining inputs in the batch if an error occurs for a specific input (see the error-handling sketch after this list).
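
A per-item guard keeps one failure from aborting the whole batch. The sketch assumes a hypothetical single-item `generate_one` callable; each failure is recorded with an informative message and processing continues.

```python
def generate_batch(generate_one, images, prompts):
    """Run generate_one on every (image, prompt) pair; record an error message
    for a failed item instead of aborting the rest of the batch."""
    results = []
    for i, (image, prompt) in enumerate(zip(images, prompts)):
        try:
            results.append({"index": i, "output": generate_one(image, prompt), "error": None})
        except Exception as exc:  # keep the batch going, report what failed and why
            results.append({"index": i, "output": None,
                            "error": f"{type(exc).__name__}: {exc}"})
    return results
```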

API design:

  • Provide a clear and intuitive API for users to perform batch processing.
  • Allow users to specify the maximum batch size supported by their system (see the API sketch after this list).
  • Provide options to control the batch processing behavior, such as enabling/disabling parallel processing.
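
One possible shape for the user-facing call, shown purely as a proposal rather than the existing MLX-VLM API: it accepts the batch-size ceiling, splits the work into chunks, and preserves input order. A `parallel` flag for enabling/disabling parallel execution could be added as a further keyword argument; `process_chunk` stands in for the actual model call.

```python
from typing import Callable, List, Sequence, Union


def batch_generate(
    process_chunk: Callable[[List, List[str]], List[str]],  # stand-in for the model call
    images: Sequence,
    prompts: Union[str, Sequence[str]],
    max_batch_size: int = 8,  # ceiling the user picks for their hardware
) -> List[str]:
    """Proposed shape of the public call: normalize the prompts, split the work
    into chunks of at most max_batch_size, and return outputs in input order."""
    images = list(images)
    prompts = [prompts] * len(images) if isinstance(prompts, str) else list(prompts)
    outputs: List[str] = []
    for start in range(0, len(images), max_batch_size):
        end = start + max_batch_size
        outputs.extend(process_chunk(images[start:end], prompts[start:end]))
    return outputs
```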

Documentation and examples:

  • Update the library documentation to include information about the batch processing feature.
  • Provide code examples demonstrating how to use the batch processing API effectively.
  • Include performance benchmarks and guidelines for optimal batch sizes based on system resources.

Implementation

  • Modify the existing input handling logic to accept batches of images and text prompts.
  • Implement batch processing functionality using parallel processing techniques or GPU acceleration libraries.
  • Optimize memory usage and performance for efficient batch processing (see the chunked-evaluation sketch after this list).
  • Update the output generation logic to handle batched outputs and maintain the correct order.
  • Implement error handling mechanisms to gracefully handle and report errors during batch processing.
  • Design and expose a user-friendly API for performing batch processing.
  • Write unit tests to verify the correctness and performance of the batch processing implementation.
  • Update the library documentation and provide code examples for using the batch processing feature.
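
For the memory-usage item, one option given MLX's lazy evaluation is to force evaluation after each chunk so computation graphs and intermediate buffers do not accumulate across the whole batch. `mx.eval` is the standard MLX call for this; the `forward` callable is a placeholder.

```python
import mlx.core as mx


def run_in_chunks(forward, chunks):
    """Evaluate each chunk's outputs before starting the next one, so peak
    memory stays roughly proportional to a single chunk."""
    all_outputs = []
    for chunk in chunks:
        out = forward(chunk)  # lazily-built MLX computation
        mx.eval(out)          # force evaluation now instead of accumulating graphs
        all_outputs.append(out)
    return all_outputs
```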

Testing

  • Prepare a comprehensive test suite to validate the batch processing functionality.
  • Test with different batch sizes and input variations to ensure robustness.
  • Verify that the generated outputs match the expected results for each input in the batch (see the test sketch after this list).
  • Measure the performance improvement gained by batch processing compared to individual processing.
  • Conduct error handling tests to ensure graceful handling of invalid inputs and processing failures.
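
A basic correctness test could compare the batched path against single-item generation on the same inputs. `batch_generate` and `generate_one` refer to the hypothetical helpers sketched above; a real test would construct them from a loaded model.

```python
def test_batch_matches_individual(batch_generate, generate_one, images, prompts):
    """Batched generation should reproduce per-item generation, in input order
    (assuming deterministic, e.g. greedy, decoding)."""
    batched = batch_generate(images, prompts)
    individual = [generate_one(image, prompt) for image, prompt in zip(images, prompts)]
    assert batched == individual
```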

Delivery

  • Integrate the batch processing feature into the existing MLX-VLM library codebase.
  • Ensure backward compatibility with previous versions of the library.
  • Provide release notes highlighting the new batch processing capability and any breaking changes.
  • Update the library version number following semantic versioning conventions.
  • Publish the updated library package to the relevant package repositories or distribution channels.

By implementing this batch processing feature, MLX-VLM will let users process multiple inputs efficiently in a single batch, improving the library's performance and usability across a range of vision-language tasks.
