Samples for VLM video input. #3050
base: master
Conversation
Pull Request Overview
This PR adds a Python sample demonstrating video-to-text functionality for Vision Language Models (VLMs). The sample lets users supply a video file and chat with a VLM about its content through an interactive interface.
- Adds a new video_to_text_chat.py sample for VLM video input processing
- Updates the test configuration to include a tiny random LLaVA-NeXT-Video model and a sample video file
- Updates the documentation to describe the new video-to-text sample and its usage
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tests/python_tests/samples/conftest.py | Adds test model configuration for LLaVA-NeXT-Video and sample video file resource |
| samples/python/visual_language_chat/video_to_text_chat.py | New sample implementing video-to-text chat functionality using VLM pipeline |
| samples/python/visual_language_chat/README.md | Updates documentation to describe the new video-to-text sample and its usage |
| samples/deployment-requirements.txt | Adds opencv-python dependency required for video processing |
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Vladimir Zlobin <[email protected]>
Description
Python and C++ samples for VLM video input.
CVS-175408
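The sample itself isn't reproduced in this thread, so here is a minimal sketch of the approach described above, assuming openvino_genai's VLMPipeline API and opencv-python (the dependency this PR adds to deployment-requirements.txt). The `read_video_frames` helper, its frame-sampling strategy, and passing sampled frames through the `images=` argument are illustrative assumptions; the actual video_to_text_chat.py may use a dedicated video input path instead.

```python
# Hypothetical sketch of a video-to-text chat sample; the merged
# video_to_text_chat.py in this PR may differ in detail.
import argparse

import cv2  # opencv-python, added to deployment-requirements.txt by this PR
import numpy as np
import openvino as ov
import openvino_genai


def read_video_frames(path: str, max_frames: int = 8) -> list[ov.Tensor]:
    """Sample up to max_frames evenly spaced RGB frames from a video file."""
    capture = cv2.VideoCapture(path)
    total = int(capture.get(cv2.CAP_PROP_FRAME_COUNT))
    indices = np.linspace(0, max(total - 1, 0), num=min(max_frames, total), dtype=int)
    frames = []
    for index in indices:
        capture.set(cv2.CAP_PROP_POS_FRAMES, int(index))
        ok, frame = capture.read()
        if not ok:
            break
        # OpenCV decodes to BGR; VLM preprocessing expects RGB.
        frames.append(ov.Tensor(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)))
    capture.release()
    return frames


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("model_dir")
    parser.add_argument("video_path")
    args = parser.parse_args()

    pipe = openvino_genai.VLMPipeline(args.model_dir, "CPU")
    frames = read_video_frames(args.video_path)

    pipe.start_chat()
    while True:
        try:
            prompt = input("question:\n")
        except EOFError:
            break
        # Sampled frames are passed as images here; max_new_tokens=100
        # is an arbitrary choice for the sketch.
        result = pipe.generate(prompt, images=frames, max_new_tokens=100)
        print(result.texts[0])
        frames = []  # send the video frames only with the first turn
    pipe.finish_chat()


if __name__ == "__main__":
    main()
```

Invocation would look like `python video_to_text_chat.py ./llava-next-video ./video.mp4` (paths hypothetical), mirroring the model-dir-plus-media-file convention of the existing visual_language_chat samples.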
Checklist: