
Build Multimodal AI Agents for Document and Video Intelligence using NVIDIA Nemotron Nano 2 VL #149

@NavuluriBalaji

Abstract of the talk/workshop
This talk explores the architecture and capabilities of Nemotron Nano 2 VL, NVIDIA's compact yet powerful vision-language model, and shows how to use it to build intelligent agents that can process and understand both documents and videos. Participants will learn how to use this multimodal model to create applications that extract insights from complex documents, analyze video content, and generate contextual responses. The session will cover practical implementation strategies, including setting up the model and integrating it with existing AI pipelines. Real-world use cases will demonstrate how these agents can transform document processing workflows and enable advanced video intelligence across industries. Attendees will gain hands-on insights into building efficient AI solutions that balance performance against computational cost.

Category of the talk/workshop
Data Science, Machine Learning, and AI

Duration (including Q&A)
40 minutes (30 minutes presentation + 10 minutes Q&A)

Level of Audience
Intermediate

Speaker Bio
Name: Navuluri Balaji
Company: AVK Tech Solutions
Position: Associate Software Developer
Email: [email protected]
Years of Experience: 1
Portfolio: https://linktr.ee/BalajiNavuluri

Prerequisites (if any)
Basic understanding of AI/ML concepts
Familiarity with Python programming
Basic knowledge of neural networks and transformers (helpful but not required)
No specific software setup is required, as the talk focuses on concepts and implementation approaches
