Model Gallery

43 models from 1 repository

qwen3-30b-a3b
Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advances in reasoning, instruction following, agent capabilities, and multilingual support, with the following key features: unique support for seamless switching between thinking mode (for complex logical reasoning, math, and coding) and non-thinking mode (for efficient, general-purpose dialogue) within a single model, ensuring optimal performance across scenarios. Significantly enhanced reasoning, surpassing the previous QwQ (in thinking mode) and Qwen2.5 instruct models (in non-thinking mode) on mathematics, code generation, and commonsense logical reasoning. Superior human preference alignment, excelling at creative writing, role-playing, multi-turn dialogue, and instruction following, delivering a more natural, engaging, and immersive conversational experience. Strong agent capabilities, enabling precise integration with external tools in both thinking and non-thinking modes and achieving leading performance among open-source models on complex agent tasks. Support for 100+ languages and dialects, with strong multilingual instruction following and translation. Qwen3-30B-A3B has the following features: Type: Causal Language Model. Training Stage: Pretraining & Post-training. Number of Parameters: 30.5B total, 3.3B activated. Number of Parameters (Non-Embedding): 29.9B. Number of Layers: 48. Number of Attention Heads (GQA): 32 for Q and 4 for KV. Number of Experts: 128. Number of Activated Experts: 8. Context Length: 32,768 tokens natively and 131,072 tokens with YaRN. For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to the Qwen3 blog, GitHub, and documentation.
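The "128 experts, 8 activated" figures describe top-k MoE routing: for each token, a router scores all experts but only the 8 highest-scoring expert FFNs actually run, which is why only 3.3B of the 30.5B parameters are active per token. A minimal sketch of top-k gating, with the expert count and k taken from the specs above and everything else illustrative:

```python
import math

def topk_gate(router_logits, k=8):
    """Pick the top-k experts for one token and softmax-normalize
    their weights, as in a Qwen3-style MoE layer (128 experts, 8 active)."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    exps = [math.exp(router_logits[i]) for i in chosen]
    total = sum(exps)
    # Only these (expert_index, weight) pairs contribute to the token's output.
    return [(i, e / total) for i, e in zip(chosen, exps)]

# 128 router scores for one token; only 8 expert FFNs would be evaluated.
logits = [((i * 37) % 128) / 16.0 for i in range(128)]
selected = topk_gate(logits, k=8)
assert len(selected) == 8
assert abs(sum(w for _, w in selected) - 1.0) < 1e-9
```

The gate weights sum to 1, so the layer output is a convex combination of just 8 expert outputs; the other 120 experts cost nothing at inference time for that token.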

Repository: localai | License: apache-2.0

qwen3-32b
Qwen3-32B is the dense 32B model in the Qwen3 series; the series description above (see qwen3-30b-a3b) applies unchanged. Qwen3-32B has the following features: Type: Causal Language Model. Training Stage: Pretraining & Post-training. Number of Parameters: 32.8B. Number of Parameters (Non-Embedding): 31.2B. Number of Layers: 64. Number of Attention Heads (GQA): 64 for Q and 8 for KV. Context Length: 32,768 tokens natively and 131,072 tokens with YaRN. For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to the Qwen3 blog, GitHub, and documentation.

Repository: localai | License: apache-2.0

qwen3-14b
Qwen3-14B is the dense 14B model in the Qwen3 series; the series description above (see qwen3-30b-a3b) applies unchanged. Qwen3-14B has the following features: Type: Causal Language Model. Training Stage: Pretraining & Post-training. Number of Parameters: 14.8B. Number of Parameters (Non-Embedding): 13.2B. Number of Layers: 40. Number of Attention Heads (GQA): 40 for Q and 8 for KV. Context Length: 32,768 tokens natively and 131,072 tokens with YaRN. For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to the Qwen3 blog, GitHub, and documentation.
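The "131,072 tokens with YaRN" note that recurs in these cards refers to rescaling RoPE frequencies at inference time; the scaling factor is simply the ratio of the target context to the native context. A sketch of the arithmetic plus the rope_scaling-style config dict commonly passed to transformers-like inference stacks (treat the exact key names as an assumption, not a guaranteed API):

```python
native_ctx = 32_768
target_ctx = 131_072

# YaRN scaling factor: target over native context length.
factor = target_ctx / native_ctx  # 4.0

# A rope_scaling block in the style used by transformers-like stacks;
# key names are an assumption here and vary by inference backend.
rope_scaling = {
    "rope_type": "yarn",
    "factor": factor,
    "original_max_position_embeddings": native_ctx,
}

assert rope_scaling["factor"] == 4.0
```

Note the cards recommend enabling this only when you actually need long context; at the native 32k window no scaling is applied.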

Repository: localai | License: apache-2.0

qwen3-8b
Qwen3-8B is the dense 8B model in the Qwen3 series; the series description above (see qwen3-30b-a3b) applies unchanged. Qwen3-8B has the following features: Type: Causal Language Model. Training Stage: Pretraining & Post-training. Number of Parameters: 8.2B. Number of Parameters (Non-Embedding): 6.95B. Number of Layers: 36. Number of Attention Heads (GQA): 32 for Q and 8 for KV. Context Length: 32,768 tokens natively and 131,072 tokens with YaRN.
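The GQA figures in these cards (e.g. 32 query heads but only 8 KV heads for the 8B model) matter mostly for memory: the KV cache scales with the number of KV heads, not query heads, so grouped-query attention shrinks it 4x here. A back-of-the-envelope sketch, where head_dim=128 is an assumption (it is not stated in the card):

```python
def kv_cache_bytes(layers, kv_heads, head_dim, ctx_len, bytes_per_val=2):
    """fp16 K+V cache size in bytes for one sequence.
    The leading factor 2 accounts for storing both K and V."""
    return 2 * layers * kv_heads * head_dim * ctx_len * bytes_per_val

# Qwen3-8B specs from the card above; head_dim=128 is assumed.
size = kv_cache_bytes(layers=36, kv_heads=8, head_dim=128, ctx_len=32_768)
print(size / 2**30)  # 4.5 GiB at the native 32k context

# With full multi-head attention (32 KV heads) it would be 4x larger.
assert kv_cache_bytes(36, 32, 128, 32_768) == 4 * size
```

The same arithmetic explains why the 131k YaRN context is expensive: quadrupling ctx_len quadruples the per-sequence cache.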

Repository: localai | License: apache-2.0

qwen3-4b
Qwen3-4B is the dense 4B model in the Qwen3 series; the series description above (see qwen3-30b-a3b) applies unchanged. Qwen3-4B has the following features: Type: Causal Language Model. Training Stage: Pretraining & Post-training. Number of Parameters: 4.0B. Number of Parameters (Non-Embedding): 3.6B. Number of Layers: 36. Number of Attention Heads (GQA): 32 for Q and 8 for KV. Context Length: 32,768 tokens natively and 131,072 tokens with YaRN.

Repository: localai | License: apache-2.0

qwen3-1.7b
Qwen3-1.7B is the dense 1.7B model in the Qwen3 series; the series description above (see qwen3-30b-a3b) applies unchanged. Qwen3-1.7B has the following features: Type: Causal Language Model. Training Stage: Pretraining & Post-training. Number of Parameters: 1.7B. Number of Parameters (Non-Embedding): 1.4B. Number of Layers: 28. Number of Attention Heads (GQA): 16 for Q and 8 for KV. Context Length: 32,768 tokens.

Repository: localai | License: apache-2.0

qwen3-0.6b
Qwen3-0.6B is the dense 0.6B model in the Qwen3 series; the series description above (see qwen3-30b-a3b) applies unchanged. Qwen3-0.6B has the following features: Type: Causal Language Model. Training Stage: Pretraining & Post-training. Number of Parameters: 0.6B. Number of Parameters (Non-Embedding): 0.44B. Number of Layers: 28. Number of Attention Heads (GQA): 16 for Q and 8 for KV. Context Length: 32,768 tokens.

Repository: localai | License: apache-2.0

shuttleai_shuttle-3.5
A fine-tuned version of Qwen3-32B, emulating the writing style of Claude 3 models and thoroughly trained on role-playing data. It retains the Qwen3 series features described above (see qwen3-30b-a3b): seamless switching between thinking and non-thinking modes, strong reasoning, human preference alignment, agent and tool-use capabilities, and support for 100+ languages. Shuttle 3.5 has the following features: Type: Causal Language Model. Training Stage: Pretraining & Post-training. Number of Parameters: 32.8B. Number of Parameters (Non-Embedding): 31.2B. Number of Layers: 64. Number of Attention Heads (GQA): 64 for Q and 8 for KV. Context Length: 32,768 tokens natively and 131,072 tokens with YaRN.

Repository: localai | License: apache-2.0

qwen3-33b-a3b-stranger-thoughts-abliterated-uncensored
WARNING: NSFW. Vivid prose. INTENSE. Visceral details. Violence. HORROR. GORE. Swearing. UNCENSORED... humor, romance, fun. This repo contains the full-precision source code, in safetensors format, which can be used to generate GGUF, GPTQ, EXL2, AWQ, HQQ, and other formats, or used directly. ABOUT: A stranger, yet radically different version of Qwen/Qwen3-30B-A3B, abliterated by huihui-ai, with 4 added layers expanding the model to 33B total parameters. The goal: slightly alter the model to address some odd creative-thinking and output choices AND de-censor it. Please note that the modifications affect the entire model's operation; roughly, I adjusted the model to think a little "deeper" and "ponder" a bit, though this is a very rough description. I also ran reasoning tests (non-creative) to ensure the model was not damaged and roughly matched the original model's performance. That being said, reasoning and output generation will be altered regardless of your use case(s).

Repository: localai | License: apache-2.0

zhi-create-qwen3-32b-i1
Zhi-Create-Qwen3-32B is a fine-tuned model derived from Qwen/Qwen3-32B, with a focus on enhancing creative writing capabilities. Through careful optimization, the model shows promising improvements in creative writing performance, as evaluated using the WritingBench. In our evaluation, the model attains a score of 82.08 on WritingBench, which represents a significant improvement over the base Qwen3-32B model's score of 78.97. Additionally, to maintain the model's general capabilities such as knowledge and reasoning, we performed fine-grained data mixture experiments by combining general knowledge, mathematics, code, and other data types. The final evaluation results show that general capabilities remain stable with no significant decline compared to the base model.

Repository: localai | License: apache-2.0

nousresearch_hermes-4-14b
Hermes 4 14B is a frontier, hybrid-mode reasoning model based on Qwen 3 14B by Nous Research that is aligned to you. Read the Hermes 4 Technical Report, and chat with Hermes in Nous Chat: https://chat.nousresearch.com. Training highlights include a newly synthesized post-training corpus emphasizing verified reasoning traces, with massive improvements in math, code, STEM, logic, creativity, and format-faithful outputs, while preserving general assistant quality and broadly neutral alignment. What's new vs Hermes 3: Post-training corpus: massively increased dataset size, from 1M samples and 1.2B tokens to ~5M samples / ~60B tokens, blended across reasoning and non-reasoning data. Hybrid reasoning mode: explicit <think>…</think> segments when the model decides to deliberate, with options to make responses faster when you want. Top-quality, expressive reasoning that improves math, code, STEM, logic, and even creative writing and subjective responses. Schema adherence & structured outputs: trained to produce valid JSON for given schemas and to repair malformed objects. Much easier to steer and align: extreme improvements in steerability, especially reduced refusal rates.
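The "repair malformed objects" claim above describes the model itself being trained to fix bad JSON, but the consumer-side pattern it supports is simple: attempt a strict parse, and on failure fall back to cheap extraction and cleanup. A hedged sketch of that pattern using only the standard library (this is not Nous Research code):

```python
import json
import re

def parse_or_repair(text):
    """Parse model output as JSON; on failure, try two cheap repairs:
    extract the outermost {...} span, then strip trailing commas."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if not match:
        raise ValueError("no JSON object found in model output")
    # Remove trailing commas before a closing brace/bracket.
    candidate = re.sub(r",\s*([}\]])", r"\1", match.group(0))
    return json.loads(candidate)

out = parse_or_repair('Sure! Here it is: {"name": "Hermes", "tags": ["a", "b",]}')
assert out == {"name": "Hermes", "tags": ["a", "b"]}
```

In practice the harder failures (truncated output, unquoted keys) are better handled by re-prompting the model with the parse error, which is the behavior the schema-adherence training targets.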

Repository: localai | License: apache-2.0

gemma-3-glitter-12b-i1
A creative writing model based on Gemma 3 12B IT. This is a 50/50 merge of two separate trains: ToastyPigeon/g3-12b-rp-system-v0.1 - ~13.5M tokens of instruct-based training related to RP (2:1 human to synthetic) and examples using a system prompt. ToastyPigeon/g3-12b-storyteller-v0.2-textonly - ~20M tokens of completion training on long-form creative writing; 1.6M synthetic from R1, the rest human-created

Repository: localai | License: gemma

gemma-3-starshine-12b-i1
A creative writing model based on a merge of fine-tunes on Gemma 3 12B IT and Gemma 3 12B PT. This is the Story Focused merge. This version works better for storytelling and scenarios, as the prose is more novel-like and it has a tendency to impersonate the user character. See the Alternate RP Focused version as well. This is a merge of two G3 models, one trained on instruct and one trained on base: allura-org/Gemma-3-Glitter-12B - Itself a merge of a storywriting and RP train (both also by ToastyPigeon), on instruct ToastyPigeon/Gemma-3-Confetti-12B - Experimental application of the Glitter data using base instead of instruct, additionally includes some adventure data in the form of SpringDragon. The result is a lovely blend of Glitter's ability to follow instructions and Confetti's free-spirit prose, effectively 'loosening up' much of the hesitancy that was left in Glitter.

Repository: localai | License: gemma

tesslate_synthia-s1-27b
Synthia-S1-27b is a reasoning AI model developed by Tesslate AI, fine-tuned specifically for advanced reasoning, coding, and RP use cases. Built upon the robust Gemma3 architecture, Synthia-S1-27b excels in logical reasoning, creative writing, and deep contextual understanding. It supports multimodal inputs (text and images) with a large 128K token context window, enabling complex analysis suitable for research, academic tasks, and enterprise-grade AI applications.

Repository: localai | License: gemma

arliai_llama-3.3-70b-arliai-rpmax-v1.4
RPMax is a series of models trained on a diverse set of curated creative writing and RP datasets with a focus on variety and deduplication. This model is designed to be highly creative and non-repetitive: no two entries in the dataset share repeated characters or situations, which ensures the model does not latch on to a single personality and remains capable of understanding and acting appropriately for any character or situation.

Repository: localai | License: llama3.3

sao10k_llama-3.3-70b-vulpecula-r1
🌟 A thinking-based model inspired by Deepseek-R1, trained through both SFT and a little bit of RL on creative writing data. 🧠 Prefill or begin assistant replies with <think>\n to activate thinking mode, or not; it works well without thinking too. 🚀 Improved steerability, instruct-roleplay, and creative control over the base model. 👾 Semi-synthetic chat/roleplaying datasets that have been re-made, cleaned, and filtered for repetition, quality, and output. 🎭 Human-based natural chat/roleplaying datasets cleaned, filtered, and checked for quality. 📝 A diverse instruct dataset from a few different LLMs, cleaned and filtered for refusals and quality. 💭 Reasoning traces taken from Deepseek-R1 for instruct, chat, and creative tasks, filtered and cleaned for quality. Toxic/decensorship data was not needed for our purposes; the model is unrestricted enough as is.
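The prefill trick the card describes means opening the assistant turn with a reasoning tag yourself, so the model continues in thinking mode rather than deciding on its own. A sketch of building such a prompt; the chat-template special tokens below are the generic Llama-3 style and the <think> tag name follows the DeepSeek-R1 convention, both assumptions rather than this model's verified template:

```python
def build_prompt(user_msg, think=True):
    """Llama-3-style chat prompt with an optional prefilled reasoning
    opener on the assistant turn. '<think>' is an assumed tag name."""
    prompt = (
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_msg}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )
    if think:
        # Prefill: the model continues from here, so it begins by reasoning.
        prompt += "<think>\n"
    return prompt

p = build_prompt("Write a short scene.", think=True)
assert p.endswith("<think>\n")
```

With think=False the assistant turn starts empty and the model replies directly, matching the card's note that it works well without thinking.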

Repository: localai | License: llama3.3

l3.3-geneticlemonade-unleashed-v2-70b
An experimental release. zerofata/GeneticLemonade-Unleashed qlora trained on a test dataset. Performance is improved from the original in my testing, but there are possibly (likely?) areas where the model will underperform which I am looking for feedback on. This is a creative model intended to excel at character driven RP / ERP. It has not been tested or trained on adventure stories or any large amounts of creative writing.

Repository: localai | License: llama3

l3.3-genetic-lemonade-sunset-70b
Inspired to learn how to merge by the Nevoria series from SteelSkull. I wasn't planning to release any more models in this series, but I wasn't fully satisfied with Unleashed or the Final version. I happened upon this merge while testing and found myself coming back to it, so I decided to publish it. Designed for RP and creative writing, all three models in the series focus on striking a balance between writing style, creativity, and intelligence.

Repository: localai | License: llama3.3

e-n-v-y_legion-v2.1-llama-70b-elarablated-v0.8-hf
This checkpoint was finetuned with a process I'm calling "Elarablation", a portmanteau of "Elara" (a name that shows up in AI-generated writing and RP all the time) and "ablation". The idea is to reduce the amount of repetitiveness and "slop" that the model exhibits. In addition to significantly reducing the occurrence of the name "Elara", I've also reduced other very common names that pop up in certain situations. I've also specifically attacked two phrases, "voice barely above a whisper" and "eyes glinted with mischief", which now come up a lot less often. Finally, I've convinced it that it can put a f-cking period after the word "said", because a lot of slop-ish phrases tend to come after "said,". You can check out some of the more technical details in the overview on my GitHub repo, here: https://github.com/envy-ai/elarablate My current focus has been on some of the absolute worst offending phrases in AI creative writing, but I plan to go after RP slop as well. If you run into any issues with this model (going off the rails, repeating tokens, etc.), go to the community tab and post the context and parameters in a comment so I can look into it. Also, if you have any "slop" pet peeves, post the context of those as well and I can try to reduce or eliminate them in the next version. The settings I've tested with are temperature at 0.7 and all other filters completely neutral. Other settings may lead to better or worse results.
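"Temperature at 0.7 and all other filters completely neutral" translates to a sampler config where only temperature is active. A sketch as an OpenAI-compatible request body; the field names are the common ones and the neutral values for the other filters are assumptions about what "neutral" means in your backend:

```python
# OpenAI-compatible chat request with only temperature active.
payload = {
    "model": "legion-v2.1-llama-70b-elarablated-v0.8",  # illustrative model id
    "messages": [{"role": "user", "content": "Continue the story."}],
    "temperature": 0.7,       # the author's tested setting
    "top_p": 1.0,             # neutral: no nucleus filtering
    "frequency_penalty": 0.0, # neutral
    "presence_penalty": 0.0,  # neutral
}
assert payload["temperature"] == 0.7
```

Backends with extra samplers (min_p, repetition_penalty, etc.) would likewise set them to their no-op values to match the author's test conditions.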

Repository: localai | License: llama3.3

steelskull_l3.3-shakudo-70b
L3.3-Shakudo-70b is the result of a multi-stage merging process by Steelskull, designed to create a powerful and creative roleplaying model with a unique flavor. The creation process involved several advanced merging techniques, including weight twisting, to achieve its distinct characteristics.

Stage 1: The Cognitive Foundation & Weight Twisting. The process began by creating a cognitive and tool-use focused base model, L3.3-Cogmoblated-70B, through a `model_stock` merge of several models known for their reasoning and instruction-following capabilities. This base was built upon `nbeerbower/Llama-3.1-Nemotron-lorablated-70B`, a model intentionally "ablated" to skew refusal behaviors. This technique, known as weight twisting, helps the final model adopt more desirable response patterns by building on a foundation that is already aligned against common refusal patterns.

Stage 2: The Twin Hydrargyrum: Flavor and Depth. Two distinct models were then created from the Cogmoblated base. L3.3-M1-Hydrargyrum-70B was merged using `SCE`, a technique that enhances creative writing and prose style, giving the model its unique "flavor"; the Top_K for this merge was set at 0.22. L3.3-M2-Hydrargyrum-70B was created using a `Della_Linear` merge, which focuses on integrating the "depth" of various roleplaying and narrative models, with settings lambda: 1.1, weight: 0.2, density: 0.7, epsilon: 0.2.

Final Stage: Shakudo. The final model, L3.3-Shakudo-70b, was created by merging the two Hydrargyrum variants using a 50/50 `nuslerp`. This final step combines the rich, creative prose (flavor) from the SCE merge with the strong roleplaying capabilities (depth) from the Della_Linear merge, resulting in a model with a distinct and refined narrative voice. A special thank you to Nectar.ai for their generous support of the open-source community and my projects.
Additionally, a heartfelt thanks to all the Ko-fi supporters who have contributed—your generosity is deeply appreciated and helps keep this work going and the Pods spinning.
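The final 50/50 `nuslerp` step above is a normalized spherical interpolation of the two parents' weights. A toy sketch of plain slerp on small vectors, to illustrate the interpolation itself (this is not mergekit's actual nuslerp implementation, which additionally normalizes task vectors against a base model):

```python
import math

def slerp(a, b, t):
    """Spherical interpolation between two weight vectors a and b at t in [0, 1]."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    dot = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    omega = math.acos(max(-1.0, min(1.0, dot)))  # angle between the vectors
    if omega < 1e-8:
        # Vectors are (nearly) parallel: fall back to linear interpolation.
        return [(1 - t) * x + t * y for x, y in zip(a, b)]
    s = math.sin(omega)
    return [
        (math.sin((1 - t) * omega) * x + math.sin(t * omega) * y) / s
        for x, y in zip(a, b)
    ]

# A 50/50 merge of two orthogonal unit "weight" vectors.
merged = slerp([1.0, 0.0], [0.0, 1.0], 0.5)
assert abs(merged[0] - merged[1]) < 1e-9
assert abs(merged[0] - math.sqrt(0.5)) < 1e-9
```

Unlike a plain 50/50 average, slerp follows the arc between the two points, which is why merge authors reach for it when averaging would shrink weight magnitudes.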

Repository: localai | License: llama3.3

invisietch_l3.3-ignition-v0.1-70b
Ignition v0.1 is a Llama 3.3-based model merge designed for creative roleplay and fiction writing purposes. The model underwent a multi-stage merge process designed to optimise for creative writing capability, minimising slop, and improving coherence when compared with its constituent models. The model shows a preference for detailed character cards and is sensitive to detailed system prompting. If you want a specific behavior from the model, try prompting for it directly. Inferencing has been tested at fp8 and fp16, and both are coherent up to ~64k context.

Repository: localai | License: llama3.3
