Open
Labels
good first issue (Good for newcomers)
Description
Overview
The goal is to add support for efficient batch processing of inputs to the MLX-VLM library. This will allow users to process multiple images and text prompts simultaneously to generate corresponding outputs in a single batch, improving performance.
Use cases:
- Generating captions for a large dataset of images.
- Localizing objects or regions in a batch of images based on textual descriptions.
- Classifying a large number of images into predefined categories, considering accompanying text information.
- Answering questions based on a batch of images (single and multiple question prompts).
- Video processing.
Note: Tag @Blaizzy for code reviews and questions.
Requirements
Support batched inputs:
- Accept a batch of images as input, provided as a list or array of image objects.
- Accept a batch of text prompts as input, provided as a list or array of strings.
- Accept a single text prompt as input, provided as a string.
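The three input shapes above could be reduced to one internal form before any model call. The sketch below is only an illustration: `normalize_batch_inputs` is a hypothetical helper name, not part of the current MLX-VLM API, and it assumes that a single prompt string should be applied to every image in the batch.

```python
from typing import Any, List, Sequence, Tuple, Union


def normalize_batch_inputs(
    images: Sequence[Any],
    prompts: Union[str, Sequence[str]],
) -> List[Tuple[Any, str]]:
    """Pair every image with a prompt (hypothetical helper, not MLX-VLM API)."""
    images = list(images)
    if not images:
        raise ValueError("images must contain at least one item")
    if isinstance(prompts, str):
        # Assumption: a single prompt is broadcast to every image in the batch.
        prompts = [prompts] * len(images)
    else:
        prompts = list(prompts)
    if len(prompts) != len(images):
        raise ValueError(
            f"got {len(images)} images but {len(prompts)} prompts"
        )
    # The returned pairs keep the order of the input images.
    return list(zip(images, prompts))
```

Rejecting mismatched lengths up front gives the informative error early, before any expensive processing starts.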
Perform batch processing:
- Process the images and text prompts in a batch simultaneously (or asynchronously) using the MLX-VLM model.
- Utilize parallel processing or GPU acceleration to optimize batch processing performance.
- Ensure that the processing of one input in the batch does not affect the processing of other inputs.
Generate batched outputs:
- Return the generated outputs for each input in the batch.
- Maintain the order of the outputs corresponding to the order of the inputs.
- Support different output formats such as text, embeddings, or visual representations based on the specific task.
Error handling:
- Handle errors gracefully during batch processing.
- Provide informative error messages for invalid inputs or processing failures.
- Continue processing the remaining inputs in the batch if an error occurs for a specific input.
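One possible way to meet the ordering and error-handling requirements together is to return one result record per input, in input order, with either an output or an error message. This is a minimal sketch under assumptions: `BatchResult`, `run_with_error_isolation`, and the `process_one` callable are hypothetical names standing in for whatever the library ends up exposing.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Sequence


@dataclass
class BatchResult:
    """Outcome for one input (hypothetical type, not MLX-VLM API)."""
    index: int
    output: Optional[Any] = None
    error: Optional[str] = None

    @property
    def ok(self) -> bool:
        return self.error is None


def run_with_error_isolation(
    items: Sequence[Any],
    process_one: Callable[[Any], Any],
) -> List[BatchResult]:
    """Process items in order; a failure on one item does not stop the rest."""
    results: List[BatchResult] = []
    for index, item in enumerate(items):
        try:
            results.append(BatchResult(index=index, output=process_one(item)))
        except Exception as exc:
            # Record what went wrong for this input and keep going.
            message = f"{type(exc).__name__}: {exc}"
            results.append(BatchResult(index=index, error=message))
    return results
```

Because every input yields exactly one record, `results[i]` always corresponds to `items[i]`, even when some inputs fail.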
API design:
- Provide a clear and intuitive API for users to perform batch processing.
- Allow users to specify the maximum batch size supported by their system.
- Provide options to control the batch processing behavior, such as enabling/disabling parallel processing.
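As a rough illustration of the two options above, the sketch below splits the inputs into chunks no larger than a user-chosen maximum and optionally dispatches the chunks to a thread pool. `BatchConfig`, `iter_chunks`, `process_in_chunks`, and the `process_chunk` callable are hypothetical names. Whether threads actually speed up MLX workloads is not established here, so the `parallel` flag only shows how such a switch might look.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Any, Callable, Iterator, List, Sequence


@dataclass(frozen=True)
class BatchConfig:
    """User-facing options (hypothetical, not MLX-VLM API)."""
    max_batch_size: int = 4
    parallel: bool = False


def iter_chunks(items: Sequence[Any], size: int) -> Iterator[List[Any]]:
    """Yield consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("max_batch_size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def process_in_chunks(
    items: Sequence[Any],
    process_chunk: Callable[[List[Any]], List[Any]],
    config: BatchConfig = BatchConfig(),
) -> List[Any]:
    """Run `process_chunk` on each chunk and flatten outputs in input order."""
    chunks = list(iter_chunks(items, config.max_batch_size))
    if config.parallel:
        with ThreadPoolExecutor() as pool:
            # Executor.map returns results in the order of its inputs.
            chunk_outputs = list(pool.map(process_chunk, chunks))
    else:
        chunk_outputs = [process_chunk(chunk) for chunk in chunks]
    return [out for outputs in chunk_outputs for out in outputs]
```

A call such as `process_in_chunks(pairs, my_chunk_fn, BatchConfig(max_batch_size=2))` would then bound memory use per step while keeping the output order stable.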
Documentation and examples:
- Update the library documentation to include information about the batch processing feature.
- Provide code examples demonstrating how to use the batch processing API effectively.
- Include performance benchmarks and guidelines for optimal batch sizes based on system resources.
Implementation
- Modify the existing input handling logic to accept batches of images and text prompts.
- Implement batch processing functionality using parallel processing techniques or GPU acceleration libraries.
- Optimize memory usage and performance for efficient batch processing.
- Update the output generation logic to handle batched outputs and maintain the correct order.
- Implement error handling mechanisms to gracefully handle and report errors during batch processing.
- Design and expose a user-friendly API for performing batch processing.
- Write unit tests to verify the correctness and performance of the batch processing implementation.
- Update the library documentation and provide code examples for using the batch processing feature.
Testing
- Prepare a comprehensive test suite to validate the batch processing functionality.
- Test with different batch sizes and input variations to ensure robustness.
- Verify that the generated outputs match the expected results for each input in the batch.
- Measure the performance improvement gained by batch processing compared to individual processing.
- Conduct error handling tests to ensure graceful handling of invalid inputs and processing failures.
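The output-matching and performance checks above could share one small harness. The sketch below is an assumption about how such a comparison might be structured: `compare_batch_to_individual`, `process_one`, and `process_batch` are hypothetical names, and a single wall-clock measurement like this is only a starting point for real benchmarks.

```python
import time
from typing import Any, Callable, List, Sequence, Tuple


def compare_batch_to_individual(
    inputs: Sequence[Any],
    process_one: Callable[[Any], Any],
    process_batch: Callable[[Sequence[Any]], Sequence[Any]],
) -> Tuple[float, float, bool]:
    """Return (individual_seconds, batch_seconds, outputs_match)."""
    start = time.perf_counter()
    individual: List[Any] = [process_one(item) for item in inputs]
    individual_seconds = time.perf_counter() - start

    start = time.perf_counter()
    batched = list(process_batch(inputs))
    batch_seconds = time.perf_counter() - start

    # Outputs must agree element by element and in the same order.
    return individual_seconds, batch_seconds, individual == batched
```

Running it over several batch sizes would give the data needed for the batch-size guidelines mentioned under documentation.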
Delivery
- Integrate the batch processing feature into the existing MLX-VLM library codebase.
- Ensure backward compatibility with previous versions of the library.
- Provide release notes highlighting the new batch processing capability and any breaking changes.
- Update the library version number following semantic versioning conventions.
- Publish the updated library package to the relevant package repositories or distribution channels.
With this batch processing feature, MLX-VLM will let users process multiple inputs simultaneously, improving the performance and usability of the library for a range of vision-language tasks.