LocalAI - Models

nanbeige4.1-3b-q8

Nanbeige4.1-3B is built upon Nanbeige4-3B-Base and represents an enhanced iteration of our previous reasoning model, Nanbeige4-3B-Thinking-2511, achieved through further post-training optimization with supervised fine-tuning (SFT) and reinforcement learning (RL). As a highly competitive open-source model at a small parameter scale, Nanbeige4.1-3B illustrates that compact models can simultaneously achieve robust reasoning, preference alignment, and effective agentic behaviors. Key features: Strong Reasoning: Capable of solving complex, multi-step problems through sustained and coherent reasoning within a single forward pass, reliably producing correct answers on benchmarks like LiveCodeBench-Pro, IMO-Answer-Bench, and AIME 2026 I. Robust Preference Alignment: Outperforms same-scale models (e.g., Qwen3-4B-2507, Nanbeige4-3B-2511) and larger models (e.g., Qwen3-30B-A3B, Qwen3-32B) on Arena-Hard-v2 and Multi-Challenge. Agentic Capability: First general small model to natively support deep-search tasks and sustain complex problem-solving with >500 rounds of tool invocations; excels in benchmarks like xBench-DeepSearch (75), Browse-Comp (39), and others.

Links

Tags

nanbeige4.1-3b-q4

Nanbeige4.1-3B is built upon Nanbeige4-3B-Base and represents an enhanced iteration of our previous reasoning model, Nanbeige4-3B-Thinking-2511, achieved through further post-training optimization with supervised fine-tuning (SFT) and reinforcement learning (RL). As a highly competitive open-source model at a small parameter scale, Nanbeige4.1-3B illustrates that compact models can simultaneously achieve robust reasoning, preference alignment, and effective agentic behaviors. Key features: Strong Reasoning: Capable of solving complex, multi-step problems through sustained and coherent reasoning within a single forward pass, reliably producing correct answers on benchmarks like LiveCodeBench-Pro, IMO-Answer-Bench, and AIME 2026 I. Robust Preference Alignment: Outperforms same-scale models (e.g., Qwen3-4B-2507, Nanbeige4-3B-2511) and larger models (e.g., Qwen3-30B-A3B, Qwen3-32B) on Arena-Hard-v2 and Multi-Challenge. Agentic Capability: First general small model to natively support deep-search tasks and sustain complex problem-solving with >500 rounds of tool invocations; excels in benchmarks like xBench-DeepSearch (75), Browse-Comp (39), and others.

Links

Tags

kalomaze_qwen3-16b-a3b

A man-made horror beyond your comprehension. But no, seriously, this is my experiment to: measure the probability that any given expert will activate (over my personal set of fairly diverse calibration data), per layer prune 64/128 of the least used experts per layer (with reordered router and indexing per layer) It can still write semi-coherently without any additional training or distillation done on top of it from the original 30b MoE. The .txt files with the original measurements are provided in the repo along with the exported weights. Custom testing to measure the experts was done on a hacked version of vllm, and then I made a bespoke script to selectively export the weights according to the measurements.

Links

Tags

readyart_amoral-fallen-omega-gemma3-12b

This model represents the forbidden merger of: 🧬 The-Omega-Directive-Gemma3-12B-v1.0 - Unprecedented coherent depravity, well-rounded ERP, low repetition even at maximum length ⚡ Fallen-Gemma3-12B-v1 - Fallen Gemma3 12B v1 is an evil tune 💎 amoral-gemma3-12B-v2 - Improved natural intelligence and less refusals This model maintains Forgotten-Safeword's complete lack of alignment: 🚫 No moral or ethical constraints on generated content 🔞 Will enthusiastically engage with any NSFW scenario 💀 May generate content that requires industrial-grade brain bleach ⚖️ Perfectly balanced... as all things should be 🔥 Maintains signature intensity with improved narrative flow 📖 Handles multi-character scenarios with improved consistency 🧠 Excels at long-form storytelling without losing track of plot threads ⚡ Noticeably better at following complex instructions than previous versions 🎭 Responds to subtle prompt nuances like a mind reader

Links

Tags

soob3123_veritas-12b

Veritas-12B emerges as a model forged in the pursuit of intellectual clarity and logical rigor. This 12B parameter model possesses superior philosophical reasoning capabilities and analytical depth, ideal for exploring complex ethical dilemmas, deconstructing arguments, and engaging in structured philosophical dialogue. Veritas-12B excels at articulating nuanced positions, identifying logical fallacies, and constructing coherent arguments grounded in reason. Expect discussions characterized by intellectual honesty, critical analysis, and a commitment to exploring ideas with precision.

Links

Tags

steelskull_l3.3-cu-mai-r1-70b

Cu-Mai, a play on San-Mai for Copper-Steel Damascus, represents a significant evolution in the three-part model series alongside San-Mai (OG) and Mokume-Gane. While maintaining the grounded and reliable nature of San-Mai, Cu-Mai introduces its own distinct "flavor" in terms of prose and overall vibe. The model demonstrates strong adherence to prompts while offering a unique creative expression. L3.3-Cu-Mai-R1-70b integrates specialized components through the SCE merge method: EVA and EURYALE foundations for creative expression and scene comprehension Cirrus and Hanami elements for enhanced reasoning capabilities Anubis components for detailed scene description Negative_LLAMA integration for balanced perspective and response Users consistently praise Cu-Mai for its: Exceptional prose quality and natural dialogue flow Strong adherence to prompts and creative expression Improved coherency and reduced repetition Performance on par with the original model While some users note slightly reduced intelligence compared to the original, this trade-off is generally viewed as minimal and doesn't significantly impact the overall experience. The model's reasoning capabilities can be effectively activated through proper prompting techniques.

Links

Tags

steelskull_l3.3-electra-r1-70b

L3.3-Electra-R1-70b is the newest release of the Unnamed series, this is the 6th iteration based of user feedback. Built on a custom DeepSeek R1 Distill base (TheSkullery/L3.1x3.3-Hydroblated-R1-70B-v4.4), Electra-R1 integrates specialized components through the SCE merge method. The model uses float32 dtype during processing with a bfloat16 output dtype for optimized performance. Electra-R1 serves newest gold standard and baseline. User feedback consistently highlights its superior intelligence, coherence, and unique ability to provide deep character insights. Through proper prompting, the model demonstrates advanced reasoning capabilities and unprompted exploration of character inner thoughts and motivations. The model utilizes the custom Hydroblated-R1 base, created for stability and enhanced reasoning. The SCE merge method's settings are precisely tuned based on extensive community feedback (of over 10 diffrent models from Nevoria to Cu-Mai), ensuring optimal component integration while maintaining model coherence and reliability. This foundation establishes Electra-R1 as the benchmark upon which its variant models build and expand.

Links

Tags

invisietch_l3.3-ignition-v0.1-70b

Ignition v0.1 is a Llama 3.3-based model merge designed for creative roleplay and fiction writing purposes. The model underwent a multi-stage merge process designed to optimise for creative writing capability, minimising slop, and improving coherence when compared with its constituent models. The model shows a preference for detailed character cards and is sensitive to detailed system prompting. If you want a specific behavior from the model, try prompting for it directly. Inferencing has been tested at fp8 and fp16, and both are coherent up to ~64k context.

Links

Tags

nousresearch_deephermes-3-llama-3-3b-preview

DeepHermes 3 Preview is the latest version of our flagship Hermes series of LLMs by Nous Research, and one of the first models in the world to unify Reasoning (long chains of thought that improve answer accuracy) and normal LLM response modes into one model. We have also improved LLM annotation, judgement, and function calling. DeepHermes 3 Preview is a hybrid reasoning model, and one of the first LLM models to unify both "intuitive", traditional mode responses and long chain of thought reasoning responses into a single model, toggled by a system prompt. Hermes 3, the predecessor of DeepHermes 3, is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the board. The ethos of the Hermes series of models is focused on aligning LLMs to the user, with powerful steering capabilities and control given to the end user. This is a preview Hermes with early reasoning capabilities, distilled from R1 across a variety of tasks that benefit from reasoning and objectivity. Some quirks may be discovered! Please let us know any interesting findings or issues you discover!

Links

Tags

eximius_persona_5b

I wanted to create a model with an exceptional capacity for using varied speech patterns and fresh role-play takes. The model had to have a unique personality, not on a surface level but on the inside, for real. Unfortunately, SFT alone just didn't cut it. And I had only 16GB of VRAM at the time. Oh, and I wanted it to be small enough to be viable for phones and to be able to give a fight to larger models while at it. If only there was a magical way to do it. Merges. Merges are quite unique. In the early days, they were considered "fake." Clearly, there's no such thing as merges. Where are the papers? No papers? Then it's clearly impossible. "Mathematically impossible." Simply preposterous. To mix layers and hope for a coherent output? What nonsense! And yet, they were real. Undi95 made some of the earliest merges I can remember, and the "LLAMA2 Era" was truly amazing and innovative thanks to them. Cool stuff like Tiefighter was being made, and eventually the time tested Midnight-Miqu-70B (v1.5 is my personal favorite). Merges are an interesting thing, as they affect LLMs in a way that is currently impossible to reproduce using SFT (or any 'SOTA' technique). One of the plagues we have today, while we have orders of magnitude smarter LLMs, is GPTisms and predictability. Merges can potentially 'solve' that. How? In short, if you physically tear neurons (passthrough brain surgery) while you somehow manage to keep the model coherent enough, and if you're lucky, it can even follows instructions- then magical stuff begins to happen.

Links

Tags

nano_imp_1b-q8_0

It's the 10th of May, 2025—lots of progress is being made in the world of AI (DeepSeek, Qwen, etc...)—but still, there has yet to be a fully coherent 1B RP model. Why? Well, at 1B size, the mere fact a model is even coherent is some kind of a marvel—and getting it to roleplay feels like you're asking too much from 1B parameters. Making very small yet smart models is quite hard, making one that does RP is exceedingly hard. I should know. I've made the world's first 3B roleplay model—Impish_LLAMA_3B—and I thought that this was the absolute minimum size for coherency and RP capabilities. I was wrong. One of my stated goals was to make AI accessible and available for everyone—but not everyone could run 13B or even 8B models. Some people only have mid-tier phones, should they be left behind? A growing sentiment often says something along the lines of: If your waifu runs on someone else's hardware—then she's not your waifu. I'm not an expert in waifu culture, but I do agree that people should be able to run models locally, without their data (knowingly or unknowingly) being used for X or Y. I thought my goal of making a roleplay model that everyone could run would only be realized sometime in the future—when mid-tier phones got the equivalent of a high-end Snapdragon chipset. Again I was wrong, as this changes today. Today, the 10th of May 2025, I proudly present to you—Nano_Imp_1B, the world's first and only fully coherent 1B-parameter roleplay model.

Links

Tags

llama-3.1-70b-japanese-instruct-2407

The Llama-3.1-70B-Japanese-Instruct-2407-gguf model is a Japanese language model that uses the Instruct prompt tuning method. It is based on the LLaMa-3.1-70B model and has been fine-tuned on the imatrix dataset for Japanese. The model is trained to generate informative and coherent responses to given instructions or prompts. It is available in the gguf format and can be used for a variety of tasks such as question answering, text generation, and more.

Links

Tags

l3.1-8b-celeste-v1.5

The LLM model is a large language model trained on a combination of datasets including nothingiisreal/c2-logs-cleaned, kalomaze/Opus_Instruct_25k, and nothingiisreal/Reddit-Dirty-And-WritingPrompts. The training was performed on a combination of English-language data using the Hugging Face Transformers library. Trained on LLaMA 3.1 8B Instruct at 8K context using a new mix of Reddit Writing Prompts, Kalo's Opus 25K Instruct and c2 logs cleaned This version has the highest coherency and is very strong on OOC: instruct following.

Links

Tags

llama-3.1-8b-stheno-v3.4-iq-imatrix

This model has went through a multi-stage finetuning process. - 1st, over a multi-turn Conversational-Instruct - 2nd, over a Creative Writing / Roleplay along with some Creative-based Instruct Datasets. - - Dataset consists of a mixture of Human and Claude Data. Prompting Format: - Use the L3 Instruct Formatting - Euryale 2.1 Preset Works Well - Temperature + min_p as per usual, I recommend 1.4 Temp + 0.2 min_p. - Has a different vibe to previous versions. Tinker around. Changes since previous Stheno Datasets: - Included Multi-turn Conversation-based Instruct Datasets to boost multi-turn coherency. # This is a separate set, not the ones made by Kalomaze and Nopm, that are used in Magnum. They're completely different data. - Replaced Single-Turn Instruct with Better Prompts and Answers by Claude 3.5 Sonnet and Claude 3 Opus. - Removed c2 Samples -> Underway of re-filtering and masking to use with custom prefills. TBD - Included 55% more Roleplaying Examples based of [Gryphe's](https://huggingface.co/datasets/Gryphe/Sonnet3.5-Charcard-Roleplay) Charcard RP Sets. Further filtered and cleaned on. - Included 40% More Creative Writing Examples. - Included Datasets Targeting System Prompt Adherence. - Included Datasets targeting Reasoning / Spatial Awareness. - Filtered for the usual errors, slop and stuff at the end. Some may have slipped through, but I removed nearly all of it. Personal Opinions: - Llama3.1 was more disappointing, in the Instruct Tune? It felt overbaked, atleast. Likely due to the DPO being done after their SFT Stage. - Tuning on L3.1 base did not give good results, unlike when I tested with Nemo base. unfortunate. - Still though, I think I did an okay job. It does feel a bit more distinctive. - It took a lot of tinkering, like a LOT to wrangle this.

Links

Tags

hermes-3-llama-3.2-3b

Hermes 3 3B is a small but mighty new addition to the Hermes series of LLMs by Nous Research, and is Nous's first fine-tune in this parameter class. Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the board.

Links

Tags

llama3.1-darkstorm-aspire-8b

Welcome to Llama3.1-DarkStorm-Aspire-8B — an advanced and versatile 8B parameter AI model born from the fusion of powerful language models, designed to deliver superior performance across research, writing, coding, and creative tasks. This unique merge blends the best qualities of the Dark Enigma, Storm, and Aspire models, while built on the strong foundation of DarkStock. With balanced integration, it excels in generating coherent, context-aware, and imaginative outputs. Llama3.1-DarkStorm-Aspire-8B combines cutting-edge natural language processing capabilities to perform exceptionally well in a wide variety of tasks: Research and Analysis: Perfect for analyzing textual data, planning experiments, and brainstorming complex ideas. Creative Writing and Roleplaying: Excels in creative writing, immersive storytelling, and generating roleplaying scenarios. General AI Applications: Use it for any application where advanced reasoning, instruction-following, and creativity are needed.

Links

Tags

control-8b-v1.1

An experimental finetune based on the Llama3.1 8B Supernova with it's primary goal to be "Short and Sweet" as such, i finetuned the model for 2 epochs on OpenCAI Sharegpt converted dataset and the RP-logs datasets in a effort to achieve this, This version of Control has been finetuned with DPO to help improve the smart's and coherency which was a flaw noticed in the previous model.

Links

Tags

control-nanuq-8b

The model is a fine-tuned version of LLaMA 3.1 8B Supernova, designed to be "short and sweet" by minimizing narration and lengthy responses. It was fine-tuned over 4 epochs using OpenCAI and RP logs, with DPO applied to enhance coherence. Finally, KTO reinforcement learning was implemented on version 1.1, significantly improving the model's prose and creativity.

Links

Tags

nousresearch_deephermes-3-llama-3-8b-preview

DeepHermes 3 Preview is the latest version of our flagship Hermes series of LLMs by Nous Research, and one of the first models in the world to unify Reasoning (long chains of thought that improve answer accuracy) and normal LLM response modes into one model. We have also improved LLM annotation, judgement, and function calling. DeepHermes 3 Preview is one of the first LLM models to unify both "intuitive", traditional mode responses and long chain of thought reasoning responses into a single model, toggled by a system prompt. Hermes 3, the predecessor of DeepHermes 3, is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the board. The ethos of the Hermes series of models is focused on aligning LLMs to the user, with powerful steering capabilities and control given to the end user. This is a preview Hermes with early reasoning capabilities, distilled from R1 across a variety of tasks that benefit from reasoning and objectivity. Some quirks may be discovered! Please let us know any interesting findings or issues you discover!

Links

Tags

locutusque_thespis-llama-3.1-8b

The Thespis family of language models is designed to enhance roleplaying performance through reasoning inspired by the Theory of Mind. Thespis-Llama-3.1-8B is a fine-tuned version of an abliterated Llama-3.1-8B model, optimized using Group Relative Policy Optimization (GRPO). The model is specifically rewarded for minimizing "slop" and repetition in its outputs, aiming to produce coherent and engaging text that maintains character consistency and avoids low-quality responses. This version represents an initial release; future iterations will incorporate a more rigorous fine-tuning process.

Links

Tags

sicariussicariistuff_impish_llama_4b

5th of May, 2025, Impish_LLAMA_4B. Almost a year ago, I created Impish_LLAMA_3B, the first fully coherent 3B roleplay model at the time. It was quickly adopted by some platforms, as well as one of the go-to models for mobile. After some time, I made Fiendish_LLAMA_3B and insisted it was not an upgrade, but a different flavor (which was indeed the case, as a different dataset was used to tune it). Impish_LLAMA_4B, however, is an upgrade, a big one. I've had over a dozen 4B candidates, but none of them were 'worthy' of the Impish badge. This model has superior responsiveness and context awareness, and is able to pull off very coherent adventures. It even comes with some additional assistant capabilities too. Of course, while it is exceptionally competent for its size, it is still 4B. Manage expectations and all that. I, however, am very much pleased with it. It took several tries to pull off just right. Total tokens trained: about 400m (due to being a generalist model, lots of tokens went there, despite the emphasis on roleplay & adventure). This took more effort than I thought it would. Because of course it would. This is mainly due to me refusing to release a model only 'slightly better' than my two 3B models mentioned above. Because "what would be the point" in that? The reason I included so many tokens for this tune is that small models are especially sensitive to many factors, including the percentage of moisture in the air and how many times I ran nvidia-smi since the system last started. It's no secret that roleplay/creative writing models can reduce a model's general intelligence (any tune and RL risk this, but roleplay models are especially 'fragile'). Therefore, additional tokens of general assistant data were needed in my opinion, and indeed seemed to help a lot with retaining intelligence. This model is also 'built a bit different', literally, as it is based on nVidia's prune; it does not 'behave' like a typical 8B, from my own subjective impression. This helped a lot with keeping it smart at such size. To be honest, my 'job' here in open source is 'done' at this point. I've achieved everything I wanted to do here, and then some.

Links

Tags

Model Gallery

Filter by type:

Filter by tags:

nanbeige4.1-3b-q8

nanbeige4.1-3b-q4

kalomaze_qwen3-16b-a3b

readyart_amoral-fallen-omega-gemma3-12b

soob3123_veritas-12b

steelskull_l3.3-cu-mai-r1-70b

steelskull_l3.3-electra-r1-70b

invisietch_l3.3-ignition-v0.1-70b

nousresearch_deephermes-3-llama-3-3b-preview

eximius_persona_5b

nano_imp_1b-q8_0

llama-3.1-70b-japanese-instruct-2407

l3.1-8b-celeste-v1.5

llama-3.1-8b-stheno-v3.4-iq-imatrix

hermes-3-llama-3.2-3b

llama3.1-darkstorm-aspire-8b

control-8b-v1.1

control-nanuq-8b

nousresearch_deephermes-3-llama-3-8b-preview

locutusque_thespis-llama-3.1-8b

sicariussicariistuff_impish_llama_4b