
Conversation

@sbalandi
Contributor

@sbalandi commented Nov 25, 2025

Description

WWB is updated to run the VLM pipeline with video inputs.

  • A new model type has been added for this purpose; to enable video inputs, run wwb with --model-type visual-video-text.
  • Default video data comes from the dataset lmms-lab/LLaVA-Video-178K, subset 30_60_s_academic_v0_1, using the videos from the archive 30_60_s_academic_v0_1_videos_10.tar.gz (chosen for its size, 274 MB). The archive includes 56 videos drawn from several datasets: YouCook2, NExT-QA, Ego4D, Charades, and ActivityNet (see the download sketch below).
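For reference, here is a minimal sketch of fetching that default archive by hand with huggingface_hub; the in-repo path of the archive is an assumption, and WWB's own data loading may resolve it differently:

```python
# Sketch: manually download and unpack the default video archive.
# The filename path within the dataset repo is an assumption.
import tarfile
from huggingface_hub import hf_hub_download

archive = hf_hub_download(
    repo_id="lmms-lab/LLaVA-Video-178K",
    repo_type="dataset",
    filename="30_60_s_academic_v0_1/30_60_s_academic_v0_1_videos_10.tar.gz",
)
with tarfile.open(archive) as tar:
    # 56 clips from YouCook2, NExT-QA, Ego4D, Charades, and ActivityNet
    tar.extractall(path="videos")
```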

How to use:

```sh
optimum-cli export openvino -m Qwen/Qwen2-VL-7B-Instruct --weight-format int8 qwen2-vl-7b-Instruct
python whowhatbench/wwb.py --base-model qwen2-vl-7b-Instruct --model-type visual-video-text --gt-data vlm_video_gt.csv
```
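The generated vlm_video_gt.csv can then be reused to score a target model against these references, following the usual WWB flow (assuming the standard --target-model flag applies to the video model type as well):

```sh
python whowhatbench/wwb.py --target-model <optimized-model-dir> --model-type visual-video-text --gt-data vlm_video_gt.csv
```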

Ticket:
CVS-173847

Checklist:

  • Tests have been updated or added to cover the new code.
  • This patch fully addresses the ticket.
  • I have made corresponding changes to the documentation.

@github-actions github-actions bot added the category: WWB PR changes WWB label Nov 25, 2025
@sbalandi force-pushed the wwb_video_in branch 4 times, most recently from 36c1196 to d347bdb on November 27, 2025 19:58
@sbalandi marked this pull request as ready for review November 27, 2025 21:41
