Repository: localaiLicense: apache-2.0
Highlights: - Natively multimodal, trained with interleaved image and text data - Strong performance on multimodal tasks, excels in instruction following - Maintains state-of-the-art performance on text-only benchmarks Architecture: - New 400M parameter vision encoder trained from scratch - 12B parameter multimodal decoder based on Mistral Nemo - Supports variable image sizes and aspect ratios - Supports multiple images in the long context window of 128k tokens
Links
Tags