Model Gallery

22 models from 1 repositories

Filter by type:

Filter by tags:

ds-r1-qwen3-8b-arliai-rpr-v4-small-iq-imatrix
The best RP/creative model series from ArliAI yet again. This time made based on DS-R1-0528-Qwen3-8B-Fast for a smaller memory footprint. Reduced repetitions and impersonation To add to the creativity and out of the box thinking of RpR v3, a more advanced filtering method was used in order to remove examples where the LLM repeated similar phrases or talked for the user. Any repetition or impersonation cases that happens will be due to how the base QwQ model was trained, and not because of the RpR dataset. Increased training sequence length The training sequence length was increased to 16K in order to help awareness and memory even on longer chats.

Repository: localaiLicense: apache-2.0

qwen3-22b-a3b-the-harley-quinn
WARNING: MADNESS - UN HINGED and... NSFW. Vivid prose. INTENSE. Visceral Details. Violence. HORROR. GORE. Swearing. UNCENSORED... humor, romance, fun. Qwen3-22B-A3B-The-Harley-Quinn This repo contains the full precision source code, in "safe tensors" format to generate GGUFs, GPTQ, EXL2, AWQ, HQQ and other formats. The source code can also be used directly. ABOUT: A stranger, yet radically different version of Kalmaze's "Qwen/Qwen3-16B-A3B" with the experts pruned to 64 (from 128, the Qwen 3 30B-A3B version) and then I added 19 layers expanding (Brainstorm 20x by DavidAU info at bottom of this page) the model to 22B total parameters. The goal: slightly alter the model, to address some odd creative thinking and output choices. Then... Harley Quinn showed up, and then it was a party! A wild, out of control (sometimes) but never boring party. Please note that the modifications affect the entire model operation; roughly I adjusted the model to think a little "deeper" and "ponder" a bit - but this is a very rough description. That being said, reasoning and output generation will be altered regardless of your use case(s). These modifications pushes Qwen's model to the absolute limit for creative use cases. Detail, vividiness, and creativity all get a boost. Prose (all) will also be very different from "default" Qwen3. Likewise, regen(s) of the same prompt - even at the same settings - will create very different version(s) too. The Brainstrom 20x has also lightly de-censored the model under some conditions. However, this model can be prone to bouts of madness. It will not always behave, and it will sometimes go -wildly- off script. See 4 examples below. Model retains full reasoning, and output generation of a Qwen3 MOE ; but has not been tested for "non-creative" use cases. Model is set with Qwen's default config: 40 k context 8 of 64 experts activated. Chatml OR Jinja Template (embedded) Four example generations below. IMPORTANT: See usage guide / repo below to get the most out of this model, as settings are very specific. If not set correctly, this model will not work the way it should. Critical settings: Chatml or Jinja Template (embedded, but updated version at repo below) Rep pen of 1.01 or 1.02 ; higher (1.04, 1.05) will result in "Harley Mode". Temp range of .6 to 1.2. ; higher you may need to prompt the model to "output" after thinking. Experts set at 8-10 ; higher will result in "odder" output BUT it might be better. That being said, "Harley Quinn" may make her presence known at any moment. USAGE GUIDE: Please refer to this model card for Specific usage, suggested settings, changing ACTIVE EXPERTS, templates, settings and the like: How to maximize this model in "uncensored" form, with specific notes on "abliterated" models. Rep pen / temp settings specific to getting the model to perform strongly. https://huggingface.co/DavidAU/Qwen3-18B-A3B-Stranger-Thoughts-Abliterated-Uncensored-GGUF GGUF / QUANTS / SPECIAL SHOUTOUT: Special thanks to team Mradermacher for making the quants! https://huggingface.co/mradermacher/Qwen3-22B-A3B-The-Harley-Quinn-GGUF KNOWN ISSUES: Model may "mis-capitalize" word(s) - lowercase, where uppercase should be - from time to time. Model may add extra space from time to time before a word. Incorrect template and/or settings will result in a drop in performance / poor performance. Can rant at the end / repeat. Most of the time it will stop on its own. Looking for the Abliterated / Uncensored version? https://huggingface.co/DavidAU/Qwen3-23B-A3B-The-Harley-Quinn-PUDDIN-Abliterated-Uncensored In some cases this "abliterated/uncensored" version may work better than this version. EXAMPLES Standard system prompt, rep pen 1.01-1.02, topk 100, topp .95, minp .05, rep pen range 64. Tested in LMStudio, quant Q4KS, GPU (CPU output will differ slightly). As this is the mid range quant, expected better results from higher quants and/or with more experts activated to be better. NOTE: Some formatting lost on copy/paste. WARNING: NSFW. Vivid prose. INTENSE. Visceral Details. Violence. HORROR. GORE. Swearing. UNCENSORED... humor, romance, fun.

Repository: localaiLicense: apache-2.0

nousresearch_hermes-4-14b
Hermes 4 14B is a frontier, hybrid-mode reasoning model based on Qwen 3 14B by Nous Research that is aligned to you. Read the Hermes 4 technical report here: Hermes 4 Technical Report Chat with Hermes in Nous Chat: https://chat.nousresearch.com Training highlights include a newly synthesized post-training corpus emphasizing verified reasoning traces, massive improvements in math, code, STEM, logic, creativity, and format-faithful outputs, while preserving general assistant quality and broadly neutral alignment. What’s new vs Hermes 3 Post-training corpus: Massively increased dataset size from 1M samples and 1.2B tokens to ~5M samples / ~60B tokens blended across reasoning and non-reasoning data. Hybrid reasoning mode with explicit … segments when the model decides to deliberate, and options to make your responses faster when you want. Reasoning that is top quality, expressive, improves math, code, STEM, logic, and even creative writing and subjective responses. Schema adherence & structured outputs: trained to produce valid JSON for given schemas and to repair malformed objects. Much easier to steer and align: extreme improvements on steerability, especially on reduced refusal rates.

Repository: localaiLicense: apache-2.0

gemma-3-12b-fornaxv.2-qat-cot
This model is an experiment to try to produce a strong smaller thinking model capable of fitting in an 8GiB consumer graphics card with generalizeable reasoning capabilities. Most other open source thinking models, especially on the smaller side, fail to generalize their reasoning to tasks other than coding or math due to an overly large focus on GRPO zero for CoT which is only applicable for coding and math. Instead of using GRPO, this model aims to SFT a wide variety of high quality, diverse reasoning traces from Deepseek R1 onto Gemma 3 to force the model to learn to effectively generalize its reasoning capabilites to a large number of tasks as an extension of the LiMO paper's approach to Math/Coding CoT. A subset of V3 O3/24 non-thinking data was also included for improved creativity and to allow the model to retain it's non-thinking capabilites. Training off the QAT checkpoint allows for this model to be used without a drop in quality at Q4_0, requiring only ~6GiB of memory. Thinking Mode Similar to the Qwen 3 model line, Gemma Fornax can be used with or without thinking mode enabled. To enable thinking place /think in the system prompt and prefill \n for thinking mode. To disable thinking put /no_think in the system prompt.

Repository: localaiLicense: gemma

l3.3-geneticlemonade-unleashed-70b-i1
Inspired to learn how to merge by the Nevoria series from SteelSkull. This model is the result of a few dozen different attempts of learning how to merge. Designed for RP, this model is mostly uncensored and focused around striking a balance between writing style, creativity and intelligence.

Repository: localaiLicense: llama3.3

l3.3-genetic-lemonade-sunset-70b
Inspired to learn how to merge by the Nevoria series from SteelSkull. I wasn't planning to release any more models in this series, but I wasn't fully satisfied with Unleashed or the Final version. I happened upon the below when testing merges and found myself coming back to it, so decided to publish. Model Comparison Designed for RP and creative writing, all three models are focused around striking a balance between writing style, creativity and intelligence.

Repository: localaiLicense: llama3.3

sophosympatheia_strawberrylemonade-70b-v1.1
This 70B parameter model is a merge of zerofata/L3.3-GeneticLemonade-Final-v2-70B and zerofata/L3.3-GeneticLemonade-Unleashed-v3-70B, which are two excellent models for roleplaying, on top of two different base models that were then combined into this model. In my opinion, this merge improves upon my previous release (v1.0) with enhanced creativity and expressiveness. This model is uncensored. You are responsible for whatever you do with it. This model was designed for roleplaying and storytelling and I think it does well at both. It may also perform well at other tasks but I have not tested its performance in other areas.

Repository: localaiLicense: llama3

astral-fusion-neural-happy-l3.1-8b
Astral-Fusion-Neural-Happy-L3.1-8B is a celestial blend of magic, creativity, and dynamic storytelling. Designed to excel in instruction-following, immersive roleplaying, and magical narrative generation, this model is a fusion of the finest qualities from Astral-Fusion, NIHAPPY, and NeuralMahou. ✨🚀 This model is perfect for anyone seeking a cosmic narrative experience, with the ability to generate both precise instructional content and fantastical stories in one cohesive framework. Whether you're crafting immersive stories, creating AI roleplaying characters, or working on interactive storytelling, this model brings out the magic. 🌟

Repository: localaiLicense: apache-2.0

llama3.1-darkstorm-aspire-8b
Welcome to Llama3.1-DarkStorm-Aspire-8B — an advanced and versatile 8B parameter AI model born from the fusion of powerful language models, designed to deliver superior performance across research, writing, coding, and creative tasks. This unique merge blends the best qualities of the Dark Enigma, Storm, and Aspire models, while built on the strong foundation of DarkStock. With balanced integration, it excels in generating coherent, context-aware, and imaginative outputs. Llama3.1-DarkStorm-Aspire-8B combines cutting-edge natural language processing capabilities to perform exceptionally well in a wide variety of tasks: Research and Analysis: Perfect for analyzing textual data, planning experiments, and brainstorming complex ideas. Creative Writing and Roleplaying: Excels in creative writing, immersive storytelling, and generating roleplaying scenarios. General AI Applications: Use it for any application where advanced reasoning, instruction-following, and creativity are needed.

Repository: localaiLicense: apache-2.0

l3.1-8b-slush-i1
Slush is a two-stage model trained with high LoRA dropout, where stage 1 is a pretraining continuation on the base model, aimed at boosting the model's creativity and writing capabilities. This is then merged into the instruction tune model, and stage 2 is a fine tuning step on top of this to further enhance its roleplaying capabilities and/or to repair any damage caused in the stage 1 merge. This is an initial experiment done on the at-this-point-infamous Llama 3.1 8B model, in an attempt to retain its smartness while addressing its abysmal lack of imagination/creativity. As always, feedback is welcome, and begone if you demand perfection. The second stage, like the Sunfall series, follows the Silly Tavern preset, so ymmv in particular if you use some other tool and/or preset.

Repository: localaiLicense: llama3

control-nanuq-8b
The model is a fine-tuned version of LLaMA 3.1 8B Supernova, designed to be "short and sweet" by minimizing narration and lengthy responses. It was fine-tuned over 4 epochs using OpenCAI and RP logs, with DPO applied to enhance coherence. Finally, KTO reinforcement learning was implemented on version 1.1, significantly improving the model's prose and creativity.

Repository: localaiLicense: llama3.1

nousresearch_hermes-4-70b
Hermes 4 70B is a frontier, hybrid-mode reasoning model based on Llama-3.1-70B by Nous Research that is aligned to you. Read the Hermes 4 technical report here: Hermes 4 Technical Report Chat with Hermes in Nous Chat: https://chat.nousresearch.com Training highlights include a newly synthesized post-training corpus emphasizing verified reasoning traces, massive improvements in math, code, STEM, logic, creativity, and format-faithful outputs, while preserving general assistant quality and broadly neutral alignment. What’s new vs Hermes 3 Post-training corpus: Massively increased dataset size from 1M samples and 1.2B tokens to ~5M samples / ~60B tokens blended across reasoning and non-reasoning data. Hybrid reasoning mode with explicit … segments when the model decides to deliberate, and options to make your responses faster when you want. Reasoning that is top quality, expressive, improves math, code, STEM, logic, and even creative writing and subjective responses. Schema adherence & structured outputs: trained to produce valid JSON for given schemas and to repair malformed objects. Much easier to steer and align: extreme improvements on steerability, especially on reduced refusal rates.

Repository: localaiLicense: llama3

valor-7b-v0.1
Valor speaks louder than words. This is a qlora finetune of blockchainlabs_7B_merged_test2_4 using the Neural-Story-v0.1 dataset, with the intention of increasing creativity and writing ability.

Repository: localaiLicense: apache-2.0

eurydice-24b-v2-i1
Eurydice 24b v2 is designed to be the perfect companion for multi-role conversations. It demonstrates exceptional contextual understanding and excels in creativity, natural conversation and storytelling. Built on Mistral 3.1, this model has been trained on a custom dataset specifically crafted to enhance its capabilities.

Repository: localaiLicense: apache-2.0

thedrummer_snowpiercer-15b-v1
Snowpiercer 15B v1 knocks out the positivity, enhances the RP & creativity, and retains the intelligence & reasoning.

Repository: localaiLicense: mit

mistral-small-3.2-46b-the-brilliant-raconteur-ii-instruct-2506
WARNING: MADNESS - UN HINGED and... NSFW. Vivid prose. INTENSE. Visceral Details. Violence. HORROR. GORE. Swearing. UNCENSORED... humor, romance, fun. Mistral-Small-3.2-46B-The-Brilliant-Raconteur-II-Instruct-2506 This repo contains the full precision source code, in "safe tensors" format to generate GGUFs, GPTQ, EXL2, AWQ, HQQ and other formats. The source code can also be used directly. ABOUT: A stronger, more creative Mistral (Mistral-Small-3.2-24B-Instruct-2506) extended to 79 layers, 46B parameters with Brainstorm 40x by DavidAU (details at very bottom of the page). This is version II, which has a jump in detail, and raw emotion relative to version 1. This model pushes Mistral's Instruct 2506 to the limit: Regens will be very different, even with same prompt / settings. Output generation will vary vastly on each generation. Reasoning will be changed, and often shorter. Prose, creativity, word choice, and general "flow" are improved. Several system prompts below help push this model even further. Model is partly de-censored / abliterated. Most Mistrals are more uncensored that most other models too. This model can also be used for coding too; even at low quants. Model can be used for all use cases too. As this is an instruct model, this model thrives on instructions - both in the system prompt and/or the prompt itself. One example below with 3 generations using Q4_K_S. Second example below with 2 generations using Q4_K_S. Quick Details: Model is 128k context, Jinja template (embedded) OR Chatml Template. Reasoning can be turned on/off (see system prompts below) and is OFF by default. Temp range .1 to 1 suggested, with 1-2 for enhanced creative. Above temp 2, is strong but can be very different. Rep pen range: 1 (off) or very light 1.01, 1.02 to 1.05. (model is sensitive to rep pen - this affects reasoning / generation length.) For creative/brainstorming use: suggest 2-5 generations due to variations caused by Brainstorm. Observations: Sometimes using Chatml (or Alpaca / others ) template (VS Jinja) will result in stronger creative generation. Model can be operated with NO system prompt; however a system prompt will enhance generation. Longer prompts, that more detailed, with more instructions will result in much stronger generations. For prose directives: You may need to add directions, because the model may follow your instructions too closely. IE: "use short sentences" vs "use short sentences sparsely". Reasoning (on) can lead to better creative generation, however sometimes generation with reasoning off is better. Rep pen of up to 1.05 may be needed on quants Q2k/q3ks for some prompts to address "low bit" issues. Detailed settings, system prompts, how to and examples below. NOTES: Image generation should also be possible with this model, just like the base model. Brainstorm was not applied to the image generation systems of the model... yet. This is Version II and subject to change / revision. This model is a slightly different version of: https://huggingface.co/DavidAU/Mistral-Small-3.2-46B-The-Brilliant-Raconteur-Instruct-2506

Repository: localaiLicense: apache-2.0

impish_nemo_12b
August 2025, Impish_Nemo_12B — my best model yet. And unlike a typical Nemo, this one can take in much higher temperatures (works well with 1+). Oh, and regarding following the character card: It somehow gotten even better, to the point of it being straight up uncanny 🙃 (I had to check twice that this model was loaded, and not some 70B!) I feel like this model could easily replace models much larger than itself for adventure or roleplay, for assistant tasks, obviously not, but the creativity here? Off the charts. Characters have never felt so alive and in the moment before — they’ll use insinuation, manipulation, and, if needed (or provoked) — force. They feel so very present. That look on Neo’s face when he opened his eyes and said, “I know Kung Fu”? Well, Impish_Nemo_12B had pretty much the same moment — and it now knows more than just Kung Fu, much, much more. It wasn’t easy, and it’s a niche within a niche, but as promised almost half a year ago — it is now done. Impish_Nemo_12B is smart, sassy, creative, and got a lot of unhingedness too — these are baked-in deep into every interaction. It took the innate Mistral's relative freedom, and turned it up to 11. It very well maybe too much for many, but after testing and interacting with so many models, I find this 'edge' of sorts, rather fun and refreshing. Anyway, the dataset used is absolutely massive, tons of new types of data and new domains of knowledge (Morrowind fandom, fighting, etc...). The whole dataset is a very well-balanced mix, and resulted in a model with extremely strong common sense for a 12B. Regarding response length — there's almost no response-length bias here, this one is very much dynamic and will easily adjust reply length based on 1–3 examples of provided dialogue. Oh, and the model comes with 3 new Character Cards, 2 Roleplay and 1 Adventure!

Repository: localaiLicense: apache-2.0

tlacuilo-12b
**Tlacuilo-12B** is a 12-billion-parameter fine-tuned language model developed by Allura Org, based on **Mistral-Nemo-Base-2407** and **Muse-12B**, optimized for high-quality creative writing, roleplay, and narrative generation. Trained using a three-stage QLoRA process with diverse datasets—including literary texts, roleplay content, and instruction-following data—the model excels in coherent, expressive, and stylistically rich prose. Key features: - **Base models**: Built on Mistral-Nemo-Base-2407 and Muse-12B for strong reasoning and narrative capability. - **Fine-tuned for creativity**: Optimized for roleplay, storytelling, and imaginative writing with natural, fluid prose. - **Chat template**: Uses **ChatML**, making it compatible with standard conversational interfaces. - **Recommended settings**: Works well with temperature 1.0–1.3 and min-p 0.02–0.05 for balanced, engaging responses. Ideal for writers, game masters, and creative professionals seeking a versatile, high-performance model for narrative tasks. > *Note: The GGUF quantized version (e.g., `Ennthen/Tlacuilo-12B-Q4_K_M-GGUF`) is a conversion of this base model for local inference via llama.cpp.*

Repository: localaiLicense: apache-2.0

magidonia-24b-v4.2.0-i1
**Model Name:** Magidonia 24B v4.2.0 **Base Model:** mistralai/Magistral-Small-2509 **Author:** TheDrummer **License:** MIT (as per standard for Hugging Face models) **Model Type:** Fine-tuned large language model (LLM) **Size:** 24 billion parameters **Description:** Magidonia 24B v4.2.0 is a creatively-oriented, open-weight fine-tuned language model developed by TheDrummer. Built upon the **Magistral-Small-2509** base, this model emphasizes **creativity, narrative dynamism, and expressive language use**—ideal for storytelling, roleplay, and imaginative writing. It features enhanced reasoning with a built-in **THINKING MODE**, activated using `` and `` tokens, encouraging detailed inner monologue before response generation. Designed for flexibility and minimal alignment constraints, it's well-suited for entertainment, world-building, and experimental use cases. **Key Features:** - Strong creative and literary capabilities - Supports structured thinking via special tokens - Optimized for roleplay and dynamic storytelling - Available in GGUF format for local inference (via llama.cpp, etc.) - Includes iMatrix quantization for high-quality low-precision performance **Use Case:** Ideal for writers, game masters, and AI artists seeking expressive, unfiltered, and imaginative language models. **Repository:** [TheDrummer/Magidonia-24B-v4.2.0](https://huggingface.co/TheDrummer/Magidonia-24B-v4.2.0) **Quantized Version (GGUF):** [mradermacher/Magidonia-24B-v4.2.0-i1-GGUF](https://huggingface.co/mradermacher/Magidonia-24B-v4.2.0-i1-GGUF) *(for reference only — use original for full description)*

Repository: localaiLicense: apache-2.0

cydonia-24b-v4.2.0-i1
**Cydonia-24B-v4.2.0** is a creatively oriented, large language model developed by *TheDrummer*, based on the **Mistral-Small-3.2-24B-Instruct-2507** foundation. Fine-tuned for dynamic storytelling, imaginative writing, and expressive roleplay, it excels in narrative coherence, linguistic flair, and non-aligned, open-ended interaction. Designed for users seeking creativity over strict alignment, the model delivers rich, engaging, and often surprising outputs—ideal for fiction writing, worldbuilding, and entertainment-focused AI use. **Key Features:** - Built on Mistral-Small-3.2-24B-Instruct-2507 base - Optimized for creative writing, roleplay, and narrative depth - Minimal alignment constraints for greater freedom and expression - Available in GGUF, EXL3, and iMatrix formats for local inference > *“This is the best model of yours I've tried yet… It writes superbly well.”* – User testimonial **Best For:** Writers, worldbuilders, and creators who value imagination, voice, and stylistic richness over rigid safety or factual accuracy. *Model Repository:* [TheDrummer/Cydonia-24B-v4.2.0](https://huggingface.co/TheDrummer/Cydonia-24B-v4.2.0)

Repository: localaiLicense: apache-2.0

almost-human-x3-32bit-1839-6b-i1
**Model Name:** Almost-Human-X3-32bit-1839-6B **Base Model:** Qwen3-Jan-v1-256k-ctx-6B-Brainstorm20x **Author:** DavidAU **Repository:** [DavidAU/Almost-Human-X3-32bit-1839-6B](https://huggingface.co/DavidAU/Almost-Human-X3-32bit-1839-6B) **License:** Apache 2.0 --- ### 🔍 **Overview** A high-precision, full-precision (float32) fine-tuned variant of the Qwen3-Jan model, specifically trained to emulate the literary and philosophical depth of Philip K. Dick. This model is the third in the "Almost-Human" series, built with advanced **"Brainstorm 20x"** methodology to enhance reasoning, coherence, and narrative quality—without sacrificing instruction-following ability. ### 🎯 **Key Features** - **Full Precision (32-bit):** Trained at 16-bit for 3 epochs, then finalized at float32 for maximum fidelity and performance. - **Extended Context (256k tokens):** Ideal for long-form writing, complex reasoning, and detailed code generation. - **Advanced Reasoning via Brainstorm 20x:** The model’s reasoning centers are expanded, calibrated, and interconnected 20 times, resulting in: - Richer, more nuanced prose - Stronger emotional engagement - Deeper narrative focus and foreshadowing - Fewer clichés, more originality - Enhanced coherence and detail - **Optimized for Creativity & Code:** Excels at brainstorming, roleplay, storytelling, and multi-step coding tasks. ### 🛠️ **Usage Tips** - Use **CHATML or Jinja templates** for best results. - Recommended settings: Temperature 0.3–0.7 (higher for creativity), Top-p 0.8, Repetition penalty 1.05–1.1. - Best used with **"smoothing" (1.5)** in GUIs like KoboldCpp or oobabooga. - For complex tasks, use **Q6 or Q8 GGUF quantizations**. ### 📦 **Model Formats** - **Full precision (safe tensors)** – for training or high-fidelity inference - **GGUF, GPTQ, EXL2, AWQ, HQQ** – available via quantization (see [mradermacher/Almost-Human-X3-32bit-1839-6B-i1-GGUF](https://huggingface.co/mradermacher/Almost-Human-X3-32bit-1839-6B-i1-GGUF) for quantized versions) --- ### 💬 **Ideal For** - Creative writing, speculative fiction, and philosophical storytelling - Complex code generation with deep reasoning - Roleplay, character-driven dialogue, and immersive narratives - Researchers and developers seeking a highly expressive, human-like model > 📌 **Note:** This is the original source model. The GGUF versions by mradermacher are quantized derivatives — not the base model. --- **Explore the source:** [DavidAU/Almost-Human-X3-32bit-1839-6B](https://huggingface.co/DavidAU/Almost-Human-X3-32bit-1839-6B) **Quantization guide:** [mradermacher/Almost-Human-X3-32bit-1839-6B-i1-GGUF](https://huggingface.co/mradermacher/Almost-Human-X3-32bit-1839-6B-i1-GGUF)

Repository: localaiLicense: apache-2.0

Page 1