Model Gallery

16 models from 1 repository

vllm-omni-wan2.2-t2v
Wan2.2-T2V-A14B via vLLM-Omni - Text-to-video generation model from Wan-AI. Generates high-quality videos from text prompts using a 14B parameter diffusion model.

Repository: localai | License: apache-2.0
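
A minimal sketch of driving this model from a local deployment, assuming a LocalAI-style server on localhost:8080. The endpoint path ("/video"), request fields, and response shape are assumptions rather than a confirmed API; consult the documentation for your LocalAI version.

```python
# Hypothetical text-to-video request against a local server.
# Endpoint path and field names are assumptions, not a confirmed API.
import requests

resp = requests.post(
    "http://localhost:8080/video",        # assumed endpoint
    json={
        "model": "vllm-omni-wan2.2-t2v",  # model name from this gallery
        "prompt": "a red fox running through fresh snow at golden hour",
    },
    timeout=600,  # video generation can take minutes
)
resp.raise_for_status()
print(resp.json())  # assumed: JSON pointing at the generated clip
```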

vllm-omni-wan2.2-i2v
Wan2.2-I2V-A14B via vLLM-Omni - Image-to-video generation model from Wan-AI. Generates high-quality videos from images using a 14B parameter diffusion model.

Repository: localai | License: apache-2.0

dream-org_dream-v0-instruct-7b
This is the instruct model of Dream 7B, which is an open diffusion large language model with top-tier performance.

Repository: localai | License: apache-2.0
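
Since this is an instruct model, it can be queried through LocalAI's OpenAI-compatible chat endpoint. A minimal sketch, assuming a server on localhost:8080:

```python
# Chat with the Dream 7B instruct model via the OpenAI-compatible
# /v1/chat/completions endpoint exposed by LocalAI.
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "dream-org_dream-v0-instruct-7b",
        "messages": [
            {"role": "user", "content": "Explain diffusion language models in two sentences."}
        ],
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```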

dreamshaper
The DreamShaper model by Lykon: a text-to-image model based on Stable Diffusion 1.5 that generates images from text prompts.

Repository: localai | License: other

stable-diffusion-3-medium
Stable Diffusion 3 Medium is a Multimodal Diffusion Transformer (MMDiT) text-to-image model that features greatly improved performance in image quality, typography, complex prompt understanding, and resource-efficiency.

Repository: localai | License: stabilityai-ai-community
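
A minimal sketch of generating an image from this model through the OpenAI-compatible /v1/images/generations endpoint; the host/port and the exact response fields depend on your deployment.

```python
# Text-to-image via the OpenAI-compatible images endpoint.
import requests

resp = requests.post(
    "http://localhost:8080/v1/images/generations",
    json={
        "model": "stable-diffusion-3-medium",
        "prompt": "a lighthouse on a cliff at dusk, photorealistic",
        "size": "1024x1024",
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["data"][0])  # typically a URL or base64 payload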

wan-2.1-t2v-1.3b-ggml
Wan 2.1 T2V 1.3B — text-to-video diffusion model, GGUF-quantized for the stable-diffusion.cpp backend. Generates short (33-frame) 832x480 clips from a text prompt. Cheapest Wan variant, suitable for CPU-offloaded inference with ~10 GB of usable RAM.

Repository: localai | License: apache-2.0

wan-2.1-i2v-14b-480p-ggml
Wan 2.1 I2V 14B 480P — image-to-video diffusion, GGUF Q4 quantization. Animates a reference image into a 33-frame 480p clip. Requires more RAM than the 1.3B T2V variant; CPU offload enabled by default.

Repository: localai | License: apache-2.0

wan-2.1-flf2v-14b-720p-ggml
Wan 2.1 FLF2V 14B 720P — first-last-frame-to-video diffusion, GGUF Q4_K_M. Takes a start and an end reference image and interpolates a 33-frame clip between them. Unlike the plain I2V variant, this model feeds the end frame through clip_vision as well, so it conditions semantically (not just in pixel space) on both endpoints. That makes it the right choice for seamless loops (start_image == end_image) and clean narrative cuts. Native 720p but accepts 480p resolutions; shares the same VAE, umt5_xxl text encoder, and clip_vision_h as I2V 14B.

Repository: localai | License: apache-2.0
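
A hedged sketch of the seamless-loop trick described above (start_image == end_image). The endpoint path and field names are assumptions drawn from this description, not a confirmed API.

```python
# Hypothetical seamless-loop request: the same frame as both endpoints.
import base64
import requests

with open("anchor_frame.png", "rb") as f:
    frame_b64 = base64.b64encode(f.read()).decode()

resp = requests.post(
    "http://localhost:8080/video",  # assumed endpoint
    json={
        "model": "wan-2.1-flf2v-14b-720p-ggml",
        "prompt": "gentle waves lapping a shore, looping",
        "start_image": frame_b64,   # same frame at both ends ->
        "end_image": frame_b64,     # seamless loop, per the entry above
    },
    timeout=1800,  # 14B video models are slow with CPU offload
)
resp.raise_for_status()
```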

wan-2.1-i2v-14b-720p-ggml
Wan 2.1 I2V 14B 720P — image-to-video diffusion, GGUF Q4_K_M. Native 720p sibling of the 480p I2V model: animates a single reference image into a 33-frame clip at up to 1280x720. Trained purely as image-to-video (no first-last-frame interpolation path), so motion is freer and the model is better suited to single-anchor animation than repurposing the FLF2V 720P variant for I2V. Shares the same VAE, umt5_xxl text encoder, and clip_vision_h as the I2V 14B 480P and FLF2V 14B 720P entries.

Repository: localai | License: apache-2.0

sd-1.5-ggml
Stable Diffusion 1.5 — text-to-image latent diffusion model, GGUF-quantized for the stable-diffusion.cpp backend.

Repository: localai | License: creativeml-openrail-m

sd-3.5-medium-ggml
Stable Diffusion 3.5 Medium is a Multimodal Diffusion Transformer (MMDiT) text-to-image model that features improved performance in image quality, typography, complex prompt understanding, and resource-efficiency.

Repository: localai | License: stabilityai-ai-community

sd-3.5-large-ggml
Stable Diffusion 3.5 Large is a Multimodal Diffusion Transformer (MMDiT) text-to-image model that features improved performance in image quality, typography, complex prompt understanding, and resource-efficiency.

Repository: localai | License: stabilityai-ai-community

flux.1-schnell
FLUX.1 [schnell] is a 12 billion parameter rectified flow transformer capable of generating images from text descriptions. For more information, see the announcement blog post. Key features: cutting-edge output quality and competitive prompt following, matching the performance of closed-source alternatives; trained using latent adversarial diffusion distillation, FLUX.1 [schnell] can generate high-quality images in only 1 to 4 steps; released under the Apache 2.0 license, the model can be used for personal, scientific, and commercial purposes.

Repository: localai | License: apache-2.0
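
Because schnell is distilled for 1 to 4 steps, a low step count is the main knob. A sketch assuming the step count can be passed per request; the "step" field is an assumption, and step count is often fixed in the model's YAML config instead.

```python
# Few-step generation with FLUX.1 [schnell].
import requests

resp = requests.post(
    "http://localhost:8080/v1/images/generations",
    json={
        "model": "flux.1-schnell",
        "prompt": "isometric pixel-art city block at night",
        "size": "1024x1024",
        "step": 4,  # schnell targets 1-4 steps (assumed request field)
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["data"][0])
```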

flux.1-kontext-dev
FLUX.1 Kontext [dev] is a 12 billion parameter rectified flow transformer capable of editing images based on text instructions. For more information, see the announcement blog post and the technical report; information about the [pro] version is available separately. Key features: edits existing images based on an edit instruction; provides character, style, and object reference without any finetuning; robust consistency allows an image to be refined through multiple successive edits with minimal visual drift; trained using guidance distillation, making FLUX.1 Kontext [dev] more efficient; open weights to drive new scientific research and to empower artists to develop innovative workflows. Generated outputs can be used for personal, scientific, and commercial purposes, as described in the FLUX.1 [dev] Non-Commercial License.

Repository: localai | License: flux-1-dev-non-commercial-license
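
A hypothetical sketch of instruction-based editing, assuming an OpenAI-style /v1/images/edits multipart endpoint is available; whether a given deployment routes Kontext through that endpoint is an assumption.

```python
# Hypothetical instruction-based image edit (endpoint support assumed).
import requests

with open("portrait.png", "rb") as f:
    resp = requests.post(
        "http://localhost:8080/v1/images/edits",  # assumed endpoint
        files={"image": ("portrait.png", f, "image/png")},
        data={
            "model": "flux.1-kontext-dev",
            "prompt": "change the background to a rainy city street",
        },
        timeout=300,
    )
resp.raise_for_status()
print(resp.json()["data"][0])
```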

flux.2-klein-4b
The FLUX.2 [klein] model family comprises the fastest FLUX image models to date. FLUX.2 [klein] unifies generation and editing in a single compact architecture, delivering state-of-the-art quality with end-to-end inference in under a second. It is built for applications that require real-time image generation without sacrificing quality, and runs on consumer hardware with as little as 13 GB of VRAM. FLUX.2 [klein] 4B is a 4 billion parameter rectified flow transformer capable of generating images from text descriptions, with support for multi-reference editing.

Repository: localai | License: apache-2.0

Z-Image-Turbo
Z-Image is a powerful and highly efficient image generation model with 6B parameters; of the three current variants, this is the Turbo edition. Z-Image-Turbo is a distilled version of Z-Image that matches or exceeds leading competitors with only 8 NFEs (number of function evaluations). It offers sub-second inference latency on enterprise-grade H800 GPUs and fits comfortably within 16 GB of VRAM on consumer devices. It excels in photorealistic image generation, bilingual text rendering (English and Chinese), and robust instruction adherence.

Repository: localai | License: apache-2.0