LocalAI - Models

nemo-parakeet-tdt-0.6b

NVIDIA NeMo Parakeet TDT 0.6B v3 is an automatic speech recognition (ASR) model from NVIDIA's NeMo toolkit. Parakeet models are state-of-the-art ASR models trained on large-scale English audio data.

qwen3-tts-cpp-0.6b-base-q4

Qwen3-TTS 0.6B Base (C++ / GGML, qwentts.cpp), Q4_K_M (~0.6 GB talker). Streaming + voice cloning, 24kHz mono, 11 languages.

qwen3-tts-0.6b-custom-voice

Qwen3-TTS is a high-quality text-to-speech model supporting custom voice, voice design, and voice cloning.

Links

https://huggingface.co/Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice

qwen3-asr-0.6b

Qwen3-ASR is an automatic speech recognition model supporting multiple languages and batch inference.

Links

https://huggingface.co/Qwen/Qwen3-ASR-0.6B

qwen3-reranker-0.6b

The Qwen3 Embedding model series is the latest proprietary model of the Qwen family, designed specifically for text embedding and ranking tasks. Built on the dense foundation models of the Qwen3 series, it provides text embedding and reranking models in three sizes (0.6B, 4B, and 8B). The series inherits the multilingual capabilities, long-text understanding, and reasoning skills of its foundation model, and it advances multiple text embedding and ranking tasks, including text retrieval, code retrieval, text classification, text clustering, and bitext mining.

**Exceptional Versatility**: The embedding model has achieved state-of-the-art performance across a wide range of downstream application evaluations. The 8B embedding model ranks No.1 on the MTEB multilingual leaderboard (as of June 5, 2025, score 70.58), while the reranking model excels in various text retrieval scenarios.

**Comprehensive Flexibility**: The Qwen3 Embedding series offers a full spectrum of sizes (from 0.6B to 8B) for both embedding and reranking models, catering to diverse use cases that prioritize efficiency and effectiveness. Developers can seamlessly combine these two modules. Additionally, the embedding model allows for flexible vector definitions across all dimensions, and both embedding and reranking models support user-defined instructions to enhance performance for specific tasks, languages, or scenarios.

**Multilingual Capability**: The Qwen3 Embedding series supports over 100 languages, thanks to the multilingual capabilities of the Qwen3 models. This includes various programming languages, and the series provides robust multilingual, cross-lingual, and code retrieval capabilities.

**Qwen3-Reranker-0.6B** has the following features:

- Model Type: Text Reranking
- Supported Languages: 100+ Languages
- Number of Parameters: 0.6B
- Context Length: 32k
- Quantization: q4_K_M, q5_0, q5_K_M, q6_K, q8_0, f16
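As a rough illustration of how a reranker's output is typically consumed, the sketch below reorders candidate documents by relevance score. The result shape (a list of `index` / `relevance_score` pairs) is an assumption modeled on Jina-style rerank responses and is not stated in this entry. `apply_rerank` is a hypothetical helper, and the scores are made-up values for illustration only.

```python
from typing import Dict, List, Tuple


def apply_rerank(
    documents: List[str], results: List[Dict], top_n: int = 3
) -> List[Tuple[str, float]]:
    """Return the top_n documents ordered by descending relevance score."""
    ranked = sorted(results, key=lambda r: r["relevance_score"], reverse=True)
    return [(documents[r["index"]], r["relevance_score"]) for r in ranked[:top_n]]


# Hypothetical candidates and scores, not real model output.
docs = [
    "Paris is the capital of France.",
    "Bananas are yellow.",
    "France borders Spain.",
]
fake_results = [
    {"index": 0, "relevance_score": 0.92},
    {"index": 1, "relevance_score": 0.03},
    {"index": 2, "relevance_score": 0.41},
]
top = apply_rerank(docs, fake_results, top_n=2)
```

In a retrieval pipeline, the candidates would usually come from an embedding search first, with the reranker applied only to that short list.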

Links

https://huggingface.co/Qwen/Qwen3-Reranker-0.6B

qwen3-0.6b

Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers advances in reasoning, instruction following, agent capabilities, and multilingual support, with the following key features:

- Unique support for seamless switching between thinking mode (for complex logical reasoning, math, and coding) and non-thinking mode (for efficient, general-purpose dialogue) within a single model, ensuring optimal performance across various scenarios.
- Significantly enhanced reasoning capabilities, surpassing previous QwQ (in thinking mode) and Qwen2.5 instruct models (in non-thinking mode) on mathematics, code generation, and commonsense logical reasoning.
- Superior human preference alignment, excelling in creative writing, role-playing, multi-turn dialogues, and instruction following, to deliver a more natural, engaging, and immersive conversational experience.
- Expertise in agent capabilities, enabling precise integration with external tools in both thinking and non-thinking modes and achieving leading performance among open-source models in complex agent-based tasks.
- Support for 100+ languages and dialects, with strong capabilities for multilingual instruction following and translation.

Qwen3-0.6B has the following features:

- Type: Causal Language Models
- Training Stage: Pretraining & Post-training
- Number of Parameters: 0.6B
- Number of Parameters (Non-Embedding): 0.44B
- Number of Layers: 28
- Number of Attention Heads (GQA): 16 for Q and 8 for KV
- Context Length: 32,768
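When thinking mode is active, a client usually wants to separate the model's reasoning from its final answer. The sketch below assumes the reasoning is wrapped in `<think>...</think>` tags ahead of the answer, which is how the upstream Qwen3 model card describes the output; this entry does not spell that out. `split_thinking` is a hypothetical helper name.

```python
import re
from typing import Tuple

# Assumed delimiter format for the reasoning segment.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(text: str) -> Tuple[str, str]:
    """Split raw model output into (reasoning, answer).

    If no <think> block is present (non-thinking mode), the reasoning
    part is empty and the whole text is treated as the answer.
    """
    match = _THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Drop the matched block and keep whatever surrounds it.
    answer = (text[: match.start()] + text[match.end():]).strip()
    return reasoning, answer


raw = "<think>2 + 2 is 4.</think>\nThe answer is 4."
reasoning, answer = split_thinking(raw)
```

Keeping the two parts separate lets an application log or hide the reasoning while showing only the answer to the user.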

qwen3-embedding-0.6b

The Qwen3 Embedding model series is the latest proprietary model of the Qwen family, designed specifically for text embedding and ranking tasks. Built on the dense foundation models of the Qwen3 series, it provides text embedding and reranking models in three sizes (0.6B, 4B, and 8B). The series inherits the multilingual capabilities, long-text understanding, and reasoning skills of its foundation model, and it advances multiple text embedding and ranking tasks, including text retrieval, code retrieval, text classification, text clustering, and bitext mining.

**Exceptional Versatility**: The embedding model has achieved state-of-the-art performance across a wide range of downstream application evaluations. The 8B embedding model ranks **No.1** on the MTEB multilingual leaderboard (as of June 5, 2025, score **70.58**), while the reranking model excels in various text retrieval scenarios.

**Comprehensive Flexibility**: The Qwen3 Embedding series offers a full spectrum of sizes (from 0.6B to 8B) for both embedding and reranking models, catering to diverse use cases that prioritize efficiency and effectiveness. Developers can seamlessly combine these two modules. Additionally, the embedding model allows for flexible vector definitions across all dimensions, and both embedding and reranking models support user-defined instructions to enhance performance for specific tasks, languages, or scenarios.

**Multilingual Capability**: The Qwen3 Embedding series supports over 100 languages, thanks to the multilingual capabilities of the Qwen3 models. This includes various programming languages, and the series provides robust multilingual, cross-lingual, and code retrieval capabilities.

**Qwen3-Embedding-0.6B-GGUF** has the following features:

- Model Type: Text Embedding
- Supported Languages: 100+ Languages
- Number of Parameters: 0.6B
- Context Length: 32k
- Embedding Dimension: Up to 1024, supports user-defined output dimensions ranging from 32 to 1024
- Quantization: q8_0, f16
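One way to picture the user-defined output dimension is to keep the first `dim` components of the full vector and re-normalize. The sketch below assumes that truncation followed by L2 normalization is an acceptable way to derive a shorter vector; this entry only states the supported range (32 to 1024), so treat the method as an assumption. `truncate_embedding` is a hypothetical helper, and the input vector is synthetic.

```python
import math
from typing import List, Sequence


def truncate_embedding(vec: Sequence[float], dim: int) -> List[float]:
    """Keep the first `dim` components and rescale to unit L2 norm."""
    if not 32 <= dim <= 1024:
        raise ValueError("dim must be between 32 and 1024")
    if dim > len(vec):
        raise ValueError("dim exceeds the length of the input vector")
    head = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # An all-zero vector cannot be normalized; return it unchanged.
        return head
    return [x / norm for x in head]


# Synthetic 1024-dimensional vector standing in for a real embedding.
full = [float(i % 7 + 1) for i in range(1024)]
short = truncate_embedding(full, 64)
```

Shorter vectors reduce index size and speed up similarity search, usually at some cost in retrieval quality.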

Links

https://huggingface.co/Qwen/Qwen3-Embedding-0.6B-GGUF

gustavecortal_beck-0.6b

A language model that handles delicate life situations and tries to really help you. Beck is based on Piaget and was fine-tuned on psychotherapeutic preferences from PsychoCounsel-Preference.

**Methodology**: Beck was trained using preference optimization (ORPO) and LoRA. You can reproduce the results with my repo for lightweight preference optimization, using the config that contains the hyperparameters. This work was performed using HPC resources (Jean Zay supercomputer) from GENCI-IDRIS (Grant 20XX-AD011014205).

**Inspiration**: Beck aims to reason about psychological and philosophical concepts such as self-image, emotion, and existence. Beck was inspired by my position paper on emotion analysis: Improving Language Models for Emotion Analysis: Insights from Cognitive Science.

aevum-0.6b-finetuned

- **Model Name:** Aevum-0.6B-Finetuned
- **Base Model:** Qwen3-0.6B
- **Architecture:** Decoder-only Transformer
- **Parameters:** 0.6 Billion
- **Task:** Code Generation, Instruction Following
- **Languages:** English, Python (optimized for code)
- **License:** Apache 2.0

**Overview:** Aevum-0.6B-Finetuned is a small, efficient language model fine-tuned for code generation and task following. Built on the Qwen3-0.6B foundation, it achieves a **HumanEval Pass@1 score of 21.34%**, which its model card describes as making it the most parameter-efficient sub-1B model in its category.

**Key Features:**

- Optimized for low-latency inference on CPU and edge devices.
- Fine-tuned on MBPP and DeepMind Code Contests for improved code generation accuracy.
- Suited to lightweight development, education, and prototyping.

**Use Case:** Developers and researchers who need a fast, compact, and open model for Python code generation without high-end hardware.

**Performance Benchmark:** The model card reports task accuracy comparable to models 10x its size.

**Cite:** @misc{aveum06B2025, title={aevum-0.6B-Finetuned: Lightweight Python Code Generation Model}, author={anonymous}, year={2025}}

**Try it:** Use via the Hugging Face `transformers` library with minimal setup. 👉 [Model Page on Hugging Face](https://huggingface.co/Aevum-Official/aveum-0.6B-Finetuned)
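For context on the HumanEval Pass@1 figure: pass@k is commonly estimated from `n` generated samples per problem, of which `c` pass the unit tests. The sketch below uses the widely cited unbiased estimator `1 - C(n-c, k) / C(n, k)`. How the score reported for this model was actually computed is not stated here, so this is a general illustration rather than the authors' procedure.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: samples generated, c: samples that passed, k: samples the user
    would be allowed to try.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: any draw of k contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Illustrative numbers only: 3 of 10 samples pass.
example = pass_at_k(n=10, c=3, k=1)
```

For `k = 1` the estimator reduces to `c / n`, so a benchmark-level Pass@1 is the average fraction of passing samples across problems.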

Links

https://huggingface.co/mradermacher/Aevum-0.6B-Finetuned-GGUF

parakeet-cpp-ctc-0.6b

CTC FastConformer, 0.6B. F16 GGUF for the parakeet-cpp backend (C++/ggml port of NVIDIA NeMo Parakeet), byte-identical to NeMo at WER 0. Faster than NeMo on CPU and GPU.
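"WER 0" means the two transcripts match word for word. As a reminder of the metric, word error rate is the word-level edit distance between a reference and a hypothesis, divided by the number of reference words. A minimal sketch follows; `wer` is a hypothetical helper, and whitespace tokenization without text normalization is an assumption.

```python
from typing import List


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev is the DP row for the previous reference prefix.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


identical = wer("the cat sat", "the cat sat")
one_sub = wer("the cat sat", "the dog sat")
```

Comparing a port's output against the original implementation's output in this way gives 0.0 only when every word agrees.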

parakeet-cpp-rnnt-0.6b

RNNT FastConformer, 0.6B. F16 GGUF for the parakeet-cpp backend (C++/ggml port of NVIDIA NeMo Parakeet), byte-identical to NeMo at WER 0. Faster than NeMo on CPU and GPU.

parakeet-cpp-tdt-0.6b-v2

TDT FastConformer, 0.6B (v2). F16 GGUF for the parakeet-cpp backend (C++/ggml port of NVIDIA NeMo Parakeet), byte-identical to NeMo at WER 0. Faster than NeMo on CPU and GPU.

parakeet-cpp-tdt-0.6b-v3

TDT FastConformer, 0.6B (v3, multilingual). F16 GGUF for the parakeet-cpp backend (C++/ggml port of NVIDIA NeMo Parakeet), byte-identical to NeMo at WER 0. Faster than NeMo on CPU and GPU.

parakeet-cpp-nemotron-3.5-asr-streaming-0.6b

Multilingual (40+ locales), prompt-conditioned, cache-aware streaming FastConformer RNN-T, 0.6B. Q8_0 GGUF for the parakeet-cpp backend (C++/ggml port of NVIDIA NeMo). Byte-identical to NeMo at WER 0 offline and streaming, about 2.5x faster than NeMo on CPU with no GPU. Select a language with the request "language" field (for example en, de, es, ja-JP), or leave it empty for automatic detection. License OpenMDW-1.1.
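A sketch of how the "language" field might be set on a transcription request. It assumes an OpenAI-compatible `/v1/audio/transcriptions` endpoint on a local server at `http://localhost:8080` and form fields named `model` and `language`; none of these details are stated in this entry. `transcription_fields` is a hypothetical helper. Only the form fields are built here, and no request is sent.

```python
from typing import Dict, Optional

MODEL = "parakeet-cpp-nemotron-3.5-asr-streaming-0.6b"
# Assumed base URL and path; adjust to the actual deployment.
ENDPOINT = "http://localhost:8080/v1/audio/transcriptions"


def transcription_fields(language: Optional[str] = None) -> Dict[str, str]:
    """Build the non-file form fields for a transcription request.

    An empty language string follows the "leave it empty" wording for
    automatic detection.
    """
    return {"model": MODEL, "language": language or ""}


german = transcription_fields("de")
japanese = transcription_fields("ja-JP")
auto_detect = transcription_fields()
```

With the third-party `requests` library, the fields could then be sent alongside the audio, for example `requests.post(ENDPOINT, data=german, files={"file": open("audio.wav", "rb")})`, where `audio.wav` is a placeholder path.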

parakeet-rnnt-0.6b-crispasr

NVIDIA Parakeet RNN-Transducer 0.6B (24-layer FastConformer) ASR. Runs via the CrispASR backend. Default GGUF size ~447 MB.

Links

https://huggingface.co/cstr/parakeet-rnnt-0.6b-GGUF

Model Gallery

nemo-parakeet-tdt-0.6b

qwen3-tts-cpp-0.6b-base-q4

qwen3-tts-0.6b-custom-voice

qwen3-asr-0.6b

qwen3-reranker-0.6b

qwen3-0.6b

qwen3-embedding-0.6b

gustavecortal_beck-0.6b

aevum-0.6b-finetuned

parakeet-cpp-ctc-0.6b

parakeet-cpp-rnnt-0.6b

parakeet-cpp-tdt-0.6b-v2

parakeet-cpp-tdt-0.6b-v3

parakeet-cpp-nemotron-3.5-asr-streaming-0.6b

parakeet-rnnt-0.6b-crispasr