Model Gallery

18 models from 1 repository

vibevoice-cpp
VibeVoice Realtime 0.5B (C++ / GGML, Q8_0) - native C++ port of Microsoft VibeVoice via the vibevoice-cpp backend. 24kHz mono TTS with voice cloning from a single reference voice prompt. Default voice prompt: en-Carter_man.

Repository: localai · License: mit
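
A minimal usage sketch for this entry, assuming LocalAI's common default port (8080) and an OpenAI-style speech endpoint; the endpoint path and field names are assumptions based on the OpenAI API shape, not taken from this gallery:

```python
import json
import urllib.request

# Assumed endpoint: LocalAI typically serves an OpenAI-compatible API on :8080.
LOCALAI_TTS_URL = "http://localhost:8080/v1/audio/speech"

def build_tts_request(text, model="vibevoice-cpp", voice="en-Carter_man"):
    """Build a JSON body for a TTS call. en-Carter_man is the default voice
    prompt listed in the entry above; the field names follow the OpenAI
    speech API shape and are assumptions here."""
    return json.dumps({"model": model, "input": text, "voice": voice})

def synthesize(text, out_path="out.wav"):
    """Send the request and save the returned 24 kHz mono audio bytes."""
    req = urllib.request.Request(
        LOCALAI_TTS_URL,
        data=build_tts_request(text).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as f:
        f.write(resp.read())

if __name__ == "__main__":
    synthesize("Hello from VibeVoice.")
```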

vibevoice-cpp-asr
VibeVoice ASR 7B (C++ / GGML, Q4_K) - long-form speech-to-text with speaker diarization. Returns per-speaker JSON segments with start/end timestamps. English-only. ~10 GB download.

Repository: localai · License: mit
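
The per-speaker segment output lends itself to simple post-processing, e.g. summing talk time per speaker. The entry only says segments carry speaker labels and start/end timestamps, so the exact JSON field names below (segments, speaker, start, end) are assumptions for illustration:

```python
import json
from collections import defaultdict

# A plausible diarized transcript in the shape the entry describes:
# per-speaker segments with start/end timestamps. Field names are assumed.
SAMPLE = json.dumps({"segments": [
    {"speaker": "S1", "start": 0.0, "end": 4.2, "text": "Hello there."},
    {"speaker": "S2", "start": 4.2, "end": 6.0, "text": "Hi!"},
    {"speaker": "S1", "start": 6.0, "end": 9.5, "text": "How are you?"},
]})

def speaking_time(raw_json):
    """Total seconds spoken per speaker across all segments."""
    totals = defaultdict(float)
    for seg in json.loads(raw_json)["segments"]:
        totals[seg["speaker"]] += seg["end"] - seg["start"]
    return dict(totals)
```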

wan-2.1-t2v-1.3b-ggml
Wan 2.1 T2V 1.3B — text-to-video diffusion model, GGUF-quantized for the stable-diffusion.cpp backend. Generates short (33-frame) 832x480 clips from a text prompt. Cheapest Wan variant, suitable for CPU-offloaded inference with ~10 GB of usable RAM.

Repository: localai · License: apache-2.0
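
A hypothetical request-body sketch for this text-to-video entry. LocalAI's video-generation schema is not shown in this gallery, so every field name here is an assumption; only the dimensions and frame count come from the entry itself (33 frames at 832x480):

```python
import json

def build_t2v_request(prompt, model="wan-2.1-t2v-1.3b-ggml"):
    """JSON body for a text-to-video call; field names are assumed."""
    return json.dumps({
        "model": model,
        "prompt": prompt,
        "width": 832, "height": 480,  # native resolution for the 1.3B variant
        "frames": 33,                 # assumed field name for clip length
    })
```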

wan-2.1-i2v-14b-480p-ggml
Wan 2.1 I2V 14B 480P — image-to-video diffusion, GGUF Q4 quantization. Animates a reference image into a 33-frame 480p clip. Requires more RAM than the 1.3B T2V variant; CPU offload enabled by default.

Repository: localai · License: apache-2.0

wan-2.1-flf2v-14b-720p-ggml
Wan 2.1 FLF2V 14B 720P — first-last-frame-to-video diffusion, GGUF Q4_K_M. Takes a start and end reference image and interpolates a 33-frame clip between them. Unlike the plain I2V variant this model feeds the end frame through clip_vision as well, so it conditions semantically (not just in pixel-space) on both endpoints. That makes it the right choice for seamless loops (start_image == end_image) and clean narrative cuts. Native 720p but accepts 480p resolutions; shares the same VAE, t5xxl text encoder, and clip_vision_h as I2V 14B.

Repository: localai · License: apache-2.0
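
The seamless-loop trick described above (start_image == end_image) amounts to passing the same reference frame for both endpoints. A request-construction sketch; the start_image/end_image field names are assumptions, not the backend's documented schema:

```python
import base64
import json

def build_loop_request(image_path, prompt,
                       model="wan-2.1-flf2v-14b-720p-ggml"):
    """Build a first-last-frame request that passes the SAME image as both
    endpoints, so the interpolated 33-frame clip loops cleanly."""
    with open(image_path, "rb") as f:
        frame = base64.b64encode(f.read()).decode()
    return json.dumps({
        "model": model,
        "prompt": prompt,
        "start_image": frame,  # assumed field name
        "end_image": frame,    # identical frame => seamless loop
        "width": 832, "height": 480,  # 480p accepted despite native 720p
    })
```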

wan-2.1-i2v-14b-720p-ggml
Wan 2.1 I2V 14B 720P — image-to-video diffusion, GGUF Q4_K_M. Native 720p sibling of the 480p I2V model: animates a single reference image into a 33-frame clip at up to 1280x720. Trained purely as image-to-video (no first-last-frame interpolation path), so motion is freer and better-suited to single-anchor animation than repurposing the FLF2V 720P variant for i2v. Shares the same VAE, umt5_xxl text encoder, and clip_vision_h as the I2V 14B 480P and FLF2V 14B 720P entries.

Repository: localai · License: apache-2.0

sd-1.5-ggml
Stable Diffusion 1.5

Repository: localai · License: creativeml-openrail-m

sd-3.5-medium-ggml
Stable Diffusion 3.5 Medium is a Multimodal Diffusion Transformer (MMDiT) text-to-image model that features improved performance in image quality, typography, complex prompt understanding, and resource-efficiency.

Repository: localai · License: stabilityai-ai-community

sd-3.5-large-ggml
Stable Diffusion 3.5 Large is a Multimodal Diffusion Transformer (MMDiT) text-to-image model that features improved performance in image quality, typography, complex prompt understanding, and resource-efficiency.

Repository: localai · License: stabilityai-ai-community
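
Either SD 3.5 entry can be driven through an OpenAI-style image endpoint. A sketch assuming LocalAI's common default port and the OpenAI images API request/response shape (both are assumptions; check your LocalAI version's docs):

```python
import json

# Assumed endpoint and field names (OpenAI images API shape).
LOCALAI_IMAGES_URL = "http://localhost:8080/v1/images/generations"

def build_image_request(prompt, model="sd-3.5-medium-ggml", size="1024x1024"):
    """JSON body for a text-to-image call against either SD 3.5 entry."""
    return json.dumps({"model": model, "prompt": prompt, "size": size})

def extract_b64_images(response_text):
    """Pull base64-encoded images out of an OpenAI-style response body."""
    return [item["b64_json"]
            for item in json.loads(response_text).get("data", [])
            if "b64_json" in item]
```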

flux.1-dev-ggml
FLUX.1 [dev] is a 12 billion parameter rectified flow transformer capable of generating images from text descriptions. For more information, see the upstream blog post.

Key features:
- Cutting-edge output quality, second only to the state-of-the-art FLUX.1 [pro].
- Competitive prompt following, matching the performance of closed-source alternatives.
- Trained using guidance distillation, making FLUX.1 [dev] more efficient.
- Open weights to drive new scientific research and empower artists to develop innovative workflows.

Generated outputs can be used for personal, scientific, and commercial purposes as described in the flux-1-dev-non-commercial-license. This model is quantized with GGUF.

Repository: localai · License: flux-1-dev-non-commercial-license

flux.1-dev-ggml-q8_0
FLUX.1 [dev] is a 12 billion parameter rectified flow transformer capable of generating images from text descriptions. For more information, see the upstream blog post.

Key features:
- Cutting-edge output quality, second only to the state-of-the-art FLUX.1 [pro].
- Competitive prompt following, matching the performance of closed-source alternatives.
- Trained using guidance distillation, making FLUX.1 [dev] more efficient.
- Open weights to drive new scientific research and empower artists to develop innovative workflows.

Generated outputs can be used for personal, scientific, and commercial purposes as described in the flux-1-dev-non-commercial-license. This variant is quantized to GGUF Q8_0.

Repository: localai · License: flux-1-dev-non-commercial-license

flux.1-dev-ggml-abliterated-v2-q8_0
An abliterated (v2) variant of FLUX.1 [dev], quantized to GGUF Q8_0.

Repository: localai · License: flux-1-dev-non-commercial-license

flux.1-krea-dev-ggml
FLUX.1 Krea [dev] is a 12 billion parameter rectified flow transformer capable of generating images from text descriptions. For more information, see the upstream blog post and Krea's blog post.

Key features:
- Cutting-edge output quality, with a focus on aesthetic photography.
- Competitive prompt following, matching the performance of closed-source alternatives.
- Trained using guidance distillation, making FLUX.1 Krea [dev] more efficient.
- Open weights to drive new scientific research and empower artists to develop innovative workflows.

Generated outputs can be used for personal, scientific, and commercial purposes, as described in the flux-1-dev-non-commercial-license.

Repository: localai · License: flux-1-dev-non-commercial-license

flux.1-krea-dev-ggml-q8_0
FLUX.1 Krea [dev] is a 12 billion parameter rectified flow transformer capable of generating images from text descriptions. For more information, see the upstream blog post and Krea's blog post.

Key features:
- Cutting-edge output quality, with a focus on aesthetic photography.
- Competitive prompt following, matching the performance of closed-source alternatives.
- Trained using guidance distillation, making FLUX.1 Krea [dev] more efficient.
- Open weights to drive new scientific research and empower artists to develop innovative workflows.

Generated outputs can be used for personal, scientific, and commercial purposes, as described in the flux-1-dev-non-commercial-license. This variant is quantized to GGUF Q8_0.

Repository: localai · License: flux-1-dev-non-commercial-license

whisper-1
Port of OpenAI's Whisper model in C/C++

Repository: localai · License: mit
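
The whisper entries are used for speech-to-text. A minimal sketch of building a multipart/form-data body for an OpenAI-style /v1/audio/transcriptions endpoint; the endpoint path and field names are assumptions based on the OpenAI API shape, and a real client library pointed at a LocalAI base URL is the simpler route in practice:

```python
import uuid

def build_transcription_body(model, filename, audio_bytes):
    """Return (boundary, body) for a multipart upload of one audio file,
    with a "model" field selecting e.g. whisper-1, whisper-base, whisper-tiny."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return boundary, head + audio_bytes + tail
```

The returned boundary goes into the request's Content-Type header as `multipart/form-data; boundary=<boundary>`.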

whisper-base
Port of OpenAI's Whisper model in C/C++

Repository: localai · License: mit

whisper-tiny
Port of OpenAI's Whisper model in C/C++

Repository: localai · License: mit

silero-vad-ggml
Silero VAD - pre-trained enterprise-grade Voice Activity Detector.

Repository: localai