LocalAI - Models

qwen3-30b-a3b

Qwen3 is the latest generation of large language models in Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support, with the following key features: Uniquely support of seamless switching between thinking mode (for complex logical reasoning, math, and coding) and non-thinking mode (for efficient, general-purpose dialogue) within single model, ensuring optimal performance across various scenarios. Significantly enhancement in its reasoning capabilities, surpassing previous QwQ (in thinking mode) and Qwen2.5 instruct models (in non-thinking mode) on mathematics, code generation, and commonsense logical reasoning. Superior human preference alignment, excelling in creative writing, role-playing, multi-turn dialogues, and instruction following, to deliver a more natural, engaging, and immersive conversational experience. Expertise in agent capabilities, enabling precise integration with external tools in both thinking and unthinking modes and achieving leading performance among open-source models in complex agent-based tasks. Support of 100+ languages and dialects with strong capabilities for multilingual instruction following and translation. Qwen3-30B-A3B has the following features: Type: Causal Language Models Training Stage: Pretraining & Post-training Number of Parameters: 30.5B in total and 3.3B activated Number of Paramaters (Non-Embedding): 29.9B Number of Layers: 48 Number of Attention Heads (GQA): 32 for Q and 4 for KV Number of Experts: 128 Number of Activated Experts: 8 Context Length: 32,768 natively and 131,072 tokens with YaRN. For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our blog, GitHub, and Documentation.

Links

Tags

qwen3-32b

Qwen3 is the latest generation of large language models in Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support, with the following key features: Uniquely support of seamless switching between thinking mode (for complex logical reasoning, math, and coding) and non-thinking mode (for efficient, general-purpose dialogue) within single model, ensuring optimal performance across various scenarios. Significantly enhancement in its reasoning capabilities, surpassing previous QwQ (in thinking mode) and Qwen2.5 instruct models (in non-thinking mode) on mathematics, code generation, and commonsense logical reasoning. Superior human preference alignment, excelling in creative writing, role-playing, multi-turn dialogues, and instruction following, to deliver a more natural, engaging, and immersive conversational experience. Expertise in agent capabilities, enabling precise integration with external tools in both thinking and unthinking modes and achieving leading performance among open-source models in complex agent-based tasks. Support of 100+ languages and dialects with strong capabilities for multilingual instruction following and translation. Qwen3-32B has the following features: Type: Causal Language Models Training Stage: Pretraining & Post-training Number of Parameters: 32.8B Number of Paramaters (Non-Embedding): 31.2B Number of Layers: 64 Number of Attention Heads (GQA): 64 for Q and 8 for KV Context Length: 32,768 natively and 131,072 tokens with YaRN. For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our blog, GitHub, and Documentation.

Links

Tags

qwen3-14b

Qwen3 is the latest generation of large language models in Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support, with the following key features: Uniquely support of seamless switching between thinking mode (for complex logical reasoning, math, and coding) and non-thinking mode (for efficient, general-purpose dialogue) within single model, ensuring optimal performance across various scenarios. Significantly enhancement in its reasoning capabilities, surpassing previous QwQ (in thinking mode) and Qwen2.5 instruct models (in non-thinking mode) on mathematics, code generation, and commonsense logical reasoning. Superior human preference alignment, excelling in creative writing, role-playing, multi-turn dialogues, and instruction following, to deliver a more natural, engaging, and immersive conversational experience. Expertise in agent capabilities, enabling precise integration with external tools in both thinking and unthinking modes and achieving leading performance among open-source models in complex agent-based tasks. Support of 100+ languages and dialects with strong capabilities for multilingual instruction following and translation. Qwen3-14B has the following features: Type: Causal Language Models Training Stage: Pretraining & Post-training Number of Parameters: 14.8B Number of Paramaters (Non-Embedding): 13.2B Number of Layers: 40 Number of Attention Heads (GQA): 40 for Q and 8 for KV Context Length: 32,768 natively and 131,072 tokens with YaRN. For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our blog, GitHub, and Documentation.

Links

Tags

qwen3-8b

Qwen3 is the latest generation of large language models in Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support, with the following key features: Uniquely support of seamless switching between thinking mode (for complex logical reasoning, math, and coding) and non-thinking mode (for efficient, general-purpose dialogue) within single model, ensuring optimal performance across various scenarios. Significantly enhancement in its reasoning capabilities, surpassing previous QwQ (in thinking mode) and Qwen2.5 instruct models (in non-thinking mode) on mathematics, code generation, and commonsense logical reasoning. Superior human preference alignment, excelling in creative writing, role-playing, multi-turn dialogues, and instruction following, to deliver a more natural, engaging, and immersive conversational experience. Expertise in agent capabilities, enabling precise integration with external tools in both thinking and unthinking modes and achieving leading performance among open-source models in complex agent-based tasks. Support of 100+ languages and dialects with strong capabilities for multilingual instruction following and translation. Model Overview Qwen3-8B has the following features: Type: Causal Language Models Training Stage: Pretraining & Post-training Number of Parameters: 8.2B Number of Paramaters (Non-Embedding): 6.95B Number of Layers: 36 Number of Attention Heads (GQA): 32 for Q and 8 for KV Context Length: 32,768 natively and 131,072 tokens with YaRN.

Links

Tags

qwen3-4b

Qwen3 is the latest generation of large language models in Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support, with the following key features: Uniquely support of seamless switching between thinking mode (for complex logical reasoning, math, and coding) and non-thinking mode (for efficient, general-purpose dialogue) within single model, ensuring optimal performance across various scenarios. Significantly enhancement in its reasoning capabilities, surpassing previous QwQ (in thinking mode) and Qwen2.5 instruct models (in non-thinking mode) on mathematics, code generation, and commonsense logical reasoning. Superior human preference alignment, excelling in creative writing, role-playing, multi-turn dialogues, and instruction following, to deliver a more natural, engaging, and immersive conversational experience. Expertise in agent capabilities, enabling precise integration with external tools in both thinking and unthinking modes and achieving leading performance among open-source models in complex agent-based tasks. Support of 100+ languages and dialects with strong capabilities for multilingual instruction following and translation. Qwen3-4B has the following features: Type: Causal Language Models Training Stage: Pretraining & Post-training Number of Parameters: 4.0B Number of Paramaters (Non-Embedding): 3.6B Number of Layers: 36 Number of Attention Heads (GQA): 32 for Q and 8 for KV Context Length: 32,768 natively and 131,072 tokens with YaRN.

Links

Tags

qwen3-1.7b

Qwen3 is the latest generation of large language models in Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support, with the following key features: Uniquely support of seamless switching between thinking mode (for complex logical reasoning, math, and coding) and non-thinking mode (for efficient, general-purpose dialogue) within single model, ensuring optimal performance across various scenarios. Significantly enhancement in its reasoning capabilities, surpassing previous QwQ (in thinking mode) and Qwen2.5 instruct models (in non-thinking mode) on mathematics, code generation, and commonsense logical reasoning. Superior human preference alignment, excelling in creative writing, role-playing, multi-turn dialogues, and instruction following, to deliver a more natural, engaging, and immersive conversational experience. Expertise in agent capabilities, enabling precise integration with external tools in both thinking and unthinking modes and achieving leading performance among open-source models in complex agent-based tasks. Support of 100+ languages and dialects with strong capabilities for multilingual instruction following and translation. Qwen3-1.7B has the following features: Type: Causal Language Models Training Stage: Pretraining & Post-training Number of Parameters: 1.7B Number of Paramaters (Non-Embedding): 1.4B Number of Layers: 28 Number of Attention Heads (GQA): 16 for Q and 8 for KV Context Length: 32,768

Links

Tags

qwen3-0.6b

Qwen3 is the latest generation of large language models in Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support, with the following key features: Uniquely support of seamless switching between thinking mode (for complex logical reasoning, math, and coding) and non-thinking mode (for efficient, general-purpose dialogue) within single model, ensuring optimal performance across various scenarios. Significantly enhancement in its reasoning capabilities, surpassing previous QwQ (in thinking mode) and Qwen2.5 instruct models (in non-thinking mode) on mathematics, code generation, and commonsense logical reasoning. Superior human preference alignment, excelling in creative writing, role-playing, multi-turn dialogues, and instruction following, to deliver a more natural, engaging, and immersive conversational experience. Expertise in agent capabilities, enabling precise integration with external tools in both thinking and unthinking modes and achieving leading performance among open-source models in complex agent-based tasks. Support of 100+ languages and dialects with strong capabilities for multilingual instruction following and translation. Qwen3-0.6B has the following features: Type: Causal Language Models Training Stage: Pretraining & Post-training Number of Parameters: 0.6B Number of Paramaters (Non-Embedding): 0.44B Number of Layers: 28 Number of Attention Heads (GQA): 16 for Q and 8 for KV Context Length: 32,768

Links

Tags

shuttleai_shuttle-3.5

A fine-tuned version of Qwen3 32b, emulating the writing style of Claude 3 models and thoroughly trained on role-playing data. Uniquely support of seamless switching between thinking mode (for complex logical reasoning, math, and coding) and non-thinking mode (for efficient, general-purpose dialogue) within single model, ensuring optimal performance across various scenarios. Significantly enhancement in its reasoning capabilities, surpassing previous QwQ (in thinking mode) and Qwen2.5 instruct models (in non-thinking mode) on mathematics, code generation, and commonsense logical reasoning. Superior human preference alignment, excelling in creative writing, role-playing, multi-turn dialogues, and instruction following, to deliver a more natural, engaging, and immersive conversational experience. Expertise in agent capabilities, enabling precise integration with external tools in both thinking and unthinking modes and achieving leading performance among open-source models in complex agent-based tasks. Support of 100+ languages and dialects with strong capabilities for multilingual instruction following and translation. Shuttle 3.5 has the following features: Type: Causal Language Models Training Stage: Pretraining & Post-training Number of Parameters: 32.8B Number of Paramaters (Non-Embedding): 31.2B Number of Layers: 64 Number of Attention Heads (GQA): 64 for Q and 8 for KV Context Length: 32,768 natively and 131,072 tokens with YaRN.

Links

Tags

yanolja_yanoljanext-rosetta-12b-2510

This model is a fine-tuned version of google/gemma-3-12b-pt. As it is intended solely for text generation, we have extracted and utilized only the Gemma3ForCausalLM component from the original architecture. Unlike our previous EEVE models, this model does not feature an expanded tokenizer. Base Model: google/gemma-3-12b-pt This model is a 12-billion parameter, decoder-only language model built on the Gemma3 architecture and fine-tuned by Yanolja NEXT. It is specifically designed to translate structured data (JSON format) while preserving the original data structure. The model was trained on a multilingual dataset covering the following languages equally: Arabic Bulgarian Chinese Czech Danish Dutch English Finnish French German Greek Gujarati Hebrew Hindi Hungarian Indonesian Italian Japanese Korean Persian Polish Portuguese Romanian Russian Slovak Spanish Swedish Tagalog Thai Turkish Ukrainian Vietnamese While optimized for these languages, it may also perform effectively on other languages supported by the base Gemma3 model.

Links

Tags

eurollm-9b-instruct

The EuroLLM project has the goal of creating a suite of LLMs capable of understanding and generating text in all European Union languages as well as some additional relevant languages. EuroLLM-9B is a 9B parameter model trained on 4 trillion tokens divided across the considered languages and several data sources: Web data, parallel data (en-xx and xx-en), and high-quality datasets. EuroLLM-9B-Instruct was further instruction tuned on EuroBlocks, an instruction tuning dataset with focus on general instruction-following and machine translation.

Links

Tags

llama3.1-70b-chinese-chat

"Llama3.1-70B-Chinese-Chat" is a 70-billion parameter large language model pre-trained on a large corpus of Chinese text data. It is designed for chat and dialog applications, and can generate human-like responses to various prompts and inputs. The model is based on the Llama3.1 architecture and has been fine-tuned for Chinese language understanding and generation. It can be used for a wide range of natural language processing tasks, including language translation, text summarization, question answering, and more.

Links

Tags

qwen-sea-lion-v4-32b-it-i1

**Model Name:** Qwen-SEA-LION-v4-32B-IT **Base Model:** Qwen3-32B **Type:** Instruction-tuned Large Language Model (LLM) **Language Support:** 11 languages including English, Mandarin, Burmese, Indonesian, Malay, Filipino, Tamil, Thai, Vietnamese, Khmer, and Lao **Context Length:** 128,000 tokens **Repository:** [aisingapore/Qwen-SEA-LION-v4-32B-IT](https://huggingface.co/aisingapore/Qwen-SEA-LION-v4-32B-IT) **License:** [Qwen Terms of Service](https://qwen.ai/termsservice) / [Qwen Usage Policy](https://qwen.ai/usagepolicy) **Overview:** Qwen-SEA-LION-v4-32B-IT is a high-performance, multilingual instruction-tuned LLM developed by AI Singapore, specifically optimized for Southeast Asia (SEA). Built on the Qwen3-32B foundation, it underwent continued pre-training on 100B tokens from the SEA-Pile v2 corpus and further fine-tuned on ~8 million question-answer pairs to enhance instruction-following and reasoning. Designed for real-world multilingual applications across government, education, and business sectors in Southeast Asia, it delivers strong performance in dialogue, content generation, and cross-lingual tasks. **Key Features:** - Trained for 11 major SEA languages with high linguistic accuracy - 128K token context for long-form content and complex reasoning - Optimized for instruction following, multi-turn dialogue, and cultural relevance - Available in full precision and quantized variants (4-bit/8-bit) - Not safety-aligned — suitable for downstream safety fine-tuning **Use Cases:** - Multilingual chatbots and virtual assistants in SEA regions - Cross-lingual content generation and translation - Educational tools and public sector applications in Southeast Asia - Research and development in low-resource language modeling **Note:** This model is not safety-aligned. Use with caution and consider additional alignment measures for production deployment. **Contact:** [[email protected]](mailto:[email protected]) for inquiries.

Links

https://huggingface.co/mradermacher/Qwen-SEA-LION-v4-32B-IT-i1-GGUF

Tags

yanoljanext-rosetta-27b-2511-i1

**YanoljaNEXT-Rosetta-27B-2511** *A multilingual, structure-preserving translation model built on Gemma3* This 27-billion-parameter language model, developed by Yanolja NEXT, is fine-tuned from **Google’s Gemma3-27B** to excel at translating structured data (JSON, YAML, XML) while preserving the original format. It supports **32 languages**, including English, Chinese, Korean, Japanese, German, French, Spanish, and more, with balanced training across all languages. Designed specifically for **high-accuracy, structured translation tasks**—such as localizing product catalogs, translating travel content, or handling technical documentation—the model ensures output remains syntactically valid and semantically precise. It achieves top-tier performance on English-to-Korean translation (CHrF++ score: **37.21**) and is optimized for efficient inference. The model is released under the **Gemma license**, making it suitable for research and commercial use with proper attribution. **Use Case:** Ideal for developers and localization teams needing reliable, format-aware translation in multilingual applications. **Base Model:** `google/gemma-3-27b-pt` **License:** Gemma (via Google) **Repository:** [yanolja/YanoljaNEXT-Rosetta-27B-2511](https://huggingface.co/yanolja/YanoljaNEXT-Rosetta-27B-2511)

Links

https://huggingface.co/mradermacher/YanoljaNEXT-Rosetta-27B-2511-i1-GGUF

Tags

canary-crispasr

NVIDIA Canary 1B v2 (FastConformer encoder-decoder), multilingual ASR + translation. Runs via the CrispASR backend. Default GGUF size ~600 MB.

Links

https://huggingface.co/cstr/canary-1b-v2-GGUF

Tags

Model Gallery

Filter by type:

Filter by tags:

qwen3-30b-a3b

qwen3-32b

qwen3-14b

qwen3-8b

qwen3-4b

qwen3-1.7b

qwen3-0.6b

shuttleai_shuttle-3.5

yanolja_yanoljanext-rosetta-12b-2510

eurollm-9b-instruct

llama3.1-70b-chinese-chat

qwen-sea-lion-v4-32b-it-i1

yanoljanext-rosetta-27b-2511-i1

canary-crispasr