AIDC-AI

25 repositories

Pixelle-Video
Public
🚀 AI Fully Automated Short Video Engine
tts image-generation video-generation aigc comfyui
Python
•
Apache License 2.0
•71•423•15•0•Updated Dec 11, 2025
Pixelle-MCP
Public
An Open-Source Multimodal AIGC Solution based on ComfyUI + MCP + LLM https://pixelle.ai
Python
•
MIT License
•107•835•7•5•Updated Dec 10, 2025
Ovis-Image
Public
Ovis-Image is a 7B text-to-image model specifically optimized for high-quality text rendering, designed to operate efficiently under stringent computational constraints.
image-generation text-to-image
Python
•
Apache License 2.0
•11•261•4•0•Updated Dec 10, 2025
Marco-Voice
Public
A Unified Framework for Expressive Speech Synthesis with Voice Cloning
Python
•
Apache License 2.0
•35•398•5•0•Updated Dec 3, 2025
Ovis-U1
Public
A unified model that seamlessly integrates multimodal understanding, text-to-image generation, and image editing within a single powerful framework.
image-editing text-to-image multimodal-large-language-models
Python
•
Apache License 2.0
•14•442•3•0•Updated Dec 2, 2025
ComfyUI-Copilot
Public
An AI-powered custom node for ComfyUI designed to enhance workflow automation and provide intelligent assistance
agent flux ai copilot rag gpt-4 stable-diffusion comfyui llm-agent deepseek
TypeScript
•
MIT License
•251•3.9k•26•2•Updated Dec 1, 2025
Agentic-ADK
Public
Agentic ADK is an Agent application development framework launched by Alibaba International AI Business, based on Google-ADK and Ali-LangEngine.
Java
•
Apache License 2.0
•116•625•13•5•Updated Nov 24, 2025
Diffusion-SDPO
Public
Diffusion-SDPO: Safeguarded Direct Preference Optimization for Diffusion Models
text-to-image diffusion-model dpo flowmatching
Python
•
Apache License 2.0
•1•14•2•0•Updated Nov 11, 2025
Marco-MT
Public
3•24•0•0•Updated Nov 5, 2025
Marco-Search-Agent
Public
Marco Search Agent for Realistic and Challenging Agentic Search
Python
•
Apache License 2.0
•21•238•2•0•Updated Oct 24, 2025
Marco-Bench
Public
Python
•0•13•0•0•Updated Oct 24, 2025
Ovis
Public
A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.
chatbot multimodality multimodal vision-language-model multimodal-large-language-models vision-language-learning qwen llama3
Python
•
Apache License 2.0
•85•1.4k•78•3•Updated Sep 22, 2025
CHATS
Public
CHATS: Combining Human-Aligned Optimization and Test-Time Sampling for Text-to-Image Generation (ICML2025)
text-to-image dpo sdxl
Python
•
Apache License 2.0
•2•117•1•0•Updated Aug 19, 2025
Awesome-Unified-Multimodal-Models
Public
Awesome Unified Multimodal Models
multimodal-models text-to-image-generation vision-language-model multimodal-large-language-models unified-multimodal-models
28•943•6•5•Updated Aug 17, 2025
TeEFusion
Public
TeEFusion: Blending Text Embeddings to Distill Classifier-Free Guidance (ICCV 2025)
text-to-image distillation-model sd3 classifier-free-guidance
Python
•
Other
•2•8•1•0•Updated Jul 25, 2025
flashinfer
Public
FlashInfer: Kernel Library for LLM Serving
Cuda
•
Apache License 2.0
•594•1•0•0•Updated Jul 15, 2025
UNIC-Adapter
Public
Python
•
MIT License
•0•9•1•0•Updated Jul 10, 2025
Parrot
Public
🎉 The code repository for "Parrot: Multilingual Visual Instruction Tuning" in PyTorch.
multilingual mixture-of-experts vision-language-model multimodal-large-language-models
Python
•
Apache License 2.0
•3•77•1•0•Updated Jun 12, 2025
Marco-o1
Public
An Open Large Reasoning Model for Real-World Solutions
Python
•
Other
•80•1.5k•10•0•Updated May 30, 2025
TransBench
Public
2•40•3•0•Updated May 29, 2025
TG-LLaVA
Public
Python
•
Apache License 2.0
•0•9•0•0•Updated Jan 14, 2025
Wings
Public
The code repository for "Wings: Learning Multimodal LLMs without Text-only Forgetting" [NeurIPS 2024]
deep-learning mllm multimodal-large-language-models multimodal-llm text-only-forgetting
Python
•
Apache License 2.0
•1•24•1•0•Updated Dec 28, 2024
M3Bench
Public
Python
•
Apache License 2.0
•4•2•0•0•Updated Dec 15, 2024
Meissonic
Public
Python
•
Other
•0•3•0•0•Updated Nov 14, 2024
AutoGPTQ
Public
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
Python
•
Other
•528•3•0•0•Updated Nov 4, 2024