Build Multimodal AI Agents for Document and Video Intelligence using NVIDIA Nemotron Nano 2 VL

**Abstract of the talk/workshop**
This talk explores the architecture and capabilities of Nemotron Nano 2 VL, NVIDIA's compact yet powerful vision-language model, for building intelligent agents that can process and understand both documents and videos. Participants will learn how to use this multimodal model to build applications that extract insights from complex documents, analyze video content, and generate contextual responses. The session will cover practical implementation strategies, including setting up the model and integrating it with existing AI pipelines. Real-world use cases will demonstrate how these agents can transform document processing workflows and enable advanced video intelligence across industries. Attendees will gain hands-on insights into building efficient AI solutions that balance performance with computational cost.

**Category of the talk/workshop**
Data Science, Machine Learning, and AI

**Duration (including Q&A)**
40 minutes (30 minutes presentation + 10 minutes Q&A)

**Level of Audience**
Intermediate

**Speaker Bio**
Name: Navuluri Balaji
Company: AVK Tech Solutions
Position: Associate Software Developer
Email: navuluribalaji03@gmail.com
Years of Experience: 1
Portfolio: https://linktr.ee/BalajiNavuluri

**Prerequisites (if any)**
Basic understanding of AI/ML concepts
Familiarity with Python programming
Basic knowledge of neural networks and transformers (helpful but not required)
No specific software setup is required, as the talk will focus on concepts and implementation approaches

