qwen3-4b-thinking-2507-gspo-easy
**Model Name:** Qwen3-4B-Thinking-2507-GSPO-Easy
**Base Model:** Qwen3-4B (by Alibaba Cloud)
**Fine-tuned With:** GRPO (Group Relative Policy Optimization)
**Framework:** Hugging Face TRL (Transformers Reinforcement Learning)
**License:** [MIT](https://huggingface.co/leonMW/Qwen3-4B-Thinking-2507-GSPO-Easy/blob/main/LICENSE)
---
### 📌 Description:
A fine-tuned version of the 4-billion-parameter **Qwen3-4B**, optimized for **step-by-step reasoning and complex problem-solving** using **GRPO**, a reinforcement learning method designed to enhance mathematical and logical reasoning in language models.
This model excels in tasks requiring **structured thinking**, such as solving math problems, logical puzzles, and multi-step reasoning, making it ideal for applications in education, AI assistants, and reasoning benchmarks.
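The core idea of GRPO, as described in Shao et al., 2024, is to sample a *group* of completions per prompt and compute each completion's advantage relative to the group's reward statistics, rather than training a separate value model. A minimal sketch of that group-relative normalization (function names are illustrative, not TRL APIs):

```python
# Sketch of GRPO's group-relative advantage (Shao et al., 2024).
# For one prompt: sample several completions, score each with a reward,
# then normalize rewards to zero mean / unit std within the group.
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-6):
    """Normalize a group of per-completion rewards within the group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: four completions for one math prompt, scored 1 if correct.
rewards = [1.0, 0.0, 1.0, 0.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Completions that score above the group average receive a positive advantage (and are reinforced); below-average ones receive a negative advantage. TRL's `GRPOTrainer` handles this sampling and normalization internally.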
### 🔧 Key Features:
- Trained with **TRL 0.23.1** and **Transformers 4.57.1**
- Optimized for **high-quality reasoning output**
- Part of the **Qwen3-4B-Thinking** series, designed to simulate human-like thought processes
- Compatible with Hugging Face `transformers` and `pipeline` API
### 📚 Use Case:
Perfect for applications demanding **deep reasoning**, such as:
- AI tutoring systems
- Advanced chatbots with explanation capabilities
- Automated problem-solving in STEM domains
### 📌 Quick Start (Python):
```python
from transformers import pipeline

question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"

# Chat-style input: the pipeline applies the model's chat template automatically.
generator = pipeline("text-generation", model="leonMW/Qwen3-4B-Thinking-2507-GSPO-Easy", device="cuda")

# Note: thinking models emit a reasoning trace before the answer, so a larger
# max_new_tokens budget may be needed for complete responses.
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])
```
> ✅ **Note**: This is the **original, non-quantized model**. Quantized versions (e.g., GGUF) are available separately under the same repository for efficient inference on consumer hardware.
---
🔗 **Model Page:** [https://huggingface.co/leonMW/Qwen3-4B-Thinking-2507-GSPO-Easy](https://huggingface.co/leonMW/Qwen3-4B-Thinking-2507-GSPO-Easy)
📝 **Training Details & Visualizations:** [WandB Dashboard](https://wandb.ai/leonwenderoth-tu-darmstadt/huggingface/runs/t42skrc7)
---
*Fine-tuned using GRPO — a method shown to improve mathematical reasoning in open language models. Cite: Shao et al., 2024 (arXiv:2402.03300)*