Model Gallery

16 models from 1 repository

qwen3.5-397b-a17b

Repository: localai | License: apache-2.0

qwen3.5-27b

Repository: localai | License: apache-2.0

qwen3.5-122b-a10b

Repository: localai | License: apache-2.0

qwen3-vl-2b-thinking
Qwen3-VL-2B-Thinking is the 2B-parameter reasoning ("thinking") variant of the Qwen3-VL series.

Repository: localai | License: apache-2.0

huihui-ai_huihui-gpt-oss-20b-bf16-abliterated
This is an uncensored version of unsloth/gpt-oss-20b-BF16 created with abliteration (see remove-refusals-with-transformers for details).

Repository: localai | License: apache-2.0

compumacy-experimental-32b
**A Specialized Language Model for Clinical Psychology & Psychiatry**

Compumacy-Experimental_MF is an advanced, experimental large language model fine-tuned to assist mental health professionals with clinical assessment and treatment planning. Built on unsloth/Qwen3-32B, it is designed to process complex clinical vignettes and generate structured, evidence-based responses that align with established diagnostic manuals and practice guidelines.

This model is a research-focused tool intended to augment, not replace, the expertise of a licensed clinician. It systematically applies diagnostic criteria from the DSM-5-TR, references ICD-11 classifications, and cites peer-reviewed literature to support its recommendations.

Repository: localai | License: apache-2.0

llama-3.2-3b-agent007
This model is a quantized version of EpistemeAI/Llama-3.2-3B-Agent007, developed by EpistemeAI and fine-tuned from unsloth/llama-3.2-3b-instruct-bnb-4bit on agent datasets. It was trained 2x faster with Unsloth and Hugging Face's TRL library.

Repository: localai | License: apache-2.0

llama-3.2-3b-agent007-coder
Llama-3.2-3B-Agent007-Coder-GGUF is a quantized version of EpistemeAI/Llama-3.2-3B-Agent007-Coder, itself a fine-tune of unsloth/llama-3.2-3b-instruct-bnb-4bit. The GGUF files were created with llama.cpp, and the model was trained on additional datasets including the Agent dataset, Code Alpaca 20K, and Magpie Ultra 0.1. It is optimized for multilingual dialogue as well as agentic retrieval and summarization tasks, is available for commercial and research use in multiple languages, and is best used with the transformers library.
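Since models in this gallery are served through LocalAI's OpenAI-compatible API, a chat request can be sketched as below. The base URL and model name are assumptions; match them to your own installation.

```python
import json

# Assumed LocalAI default endpoint -- adjust for your deployment.
BASE_URL = "http://localhost:8080/v1/chat/completions"

def build_request(prompt: str, model: str = "llama-3.2-3b-agent007-coder") -> str:
    """Build the JSON body for an OpenAI-style chat completion request."""
    body = {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a coding agent."},
            {"role": "user", "content": prompt},
        ],
        "temperature": 0.2,  # low temperature suits code generation
    }
    return json.dumps(body)

payload = build_request("Write a function that reverses a string.")
```

The payload can then be POSTed to `BASE_URL` with any HTTP client; LocalAI responds in the standard OpenAI chat-completions shape.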

Repository: localai | License: apache-2.0

fireball-meta-llama-3.2-8b-instruct-agent-003-128k-code-dpo
This is a quantized version of EpistemeAI/Fireball-Meta-Llama-3.2-8B-Instruct-agent-003-128k-code-DPO, an experimental fine-tune with a DPO dataset that turns Llama 3.1 8B into an agentic coder. It includes built-in agent features such as search, a calculator, and ReAct, along with self-learning via Unsloth, RAG applications, and memory. The context window is 128K tokens. The model can be integrated into projects using popular libraries such as Transformers and vLLM, and works well with LangChain or LlamaIndex. It is developed by EpistemeAI and licensed under Apache 2.0.

Repository: localai | License: apache-2.0

fireball-llama-3.11-8b-v1orpo
**Developed by:** EpistemeAI
**License:** apache-2.0
**Finetuned from:** unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit
**Finetuning methods:** DPO (Direct Preference Optimization) & ORPO (Odds Ratio Preference Optimization)

Repository: localai | License: apache-2.0

mn-lulanum-12b-fix-i1
This model was merged using the della_linear merge method with unsloth/Mistral-Nemo-Base-2407 as the base. The following models were included in the merge:

- VAGOsolutions/SauerkrautLM-Nemo-12b-Instruct
- anthracite-org/magnum-v2.5-12b-kto
- Undi95/LocalC-12B-e2.0
- NeverSleep/Lumimaid-v0.2-12B

Repository: localai | License: apache-2.0

mistralai_magistral-small-2509-multimodal
**Magistral Small 1.2**

Building upon Mistral Small 3.2 (2506) with added reasoning capabilities (SFT from Magistral Medium traces, followed by RL on top), Magistral Small is a small, efficient reasoning model with 24B parameters. It can be deployed locally, fitting within a single RTX 4090 or a 32GB RAM MacBook once quantized. Learn more about Magistral in the Mistral blog post; the model was presented in the paper Magistral. Quantization is from Unsloth, using their recommended parameters as defaults and including mmproj for multimodality.
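The "fits within a single RTX 4090 once quantized" claim can be sanity-checked with back-of-the-envelope arithmetic: at 4-bit quantization each weight takes roughly half a byte, before KV cache and activation overhead. The figures below are illustrative, not the exact quantized file size.

```python
# Rough VRAM estimate for a 24B-parameter model at 4-bit quantization.
PARAMS = 24e9              # parameter count
BYTES_PER_WEIGHT_Q4 = 0.5  # ~4 bits per weight

weights_gb = PARAMS * BYTES_PER_WEIGHT_Q4 / 1e9  # weight memory in GB
fits_4090 = weights_gb < 24  # RTX 4090 ships with 24 GB of VRAM
```

With ~12 GB of weights, a 24 GB card leaves headroom for the KV cache even at long contexts, which is why the quantized model also fits on a 32GB MacBook.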

Repository: localai | License: apache-2.0

llama-3.2-3b-small_shiro_roleplay
**Model Name:** Llama-3.2-3B-small_Shiro_roleplay-gguf
**Base Model:** Meta-Llama-3.2-3B-Instruct (via unsloth/Meta-Llama-3.2-3B-Instruct-bnb-4bit)
**Fine-Tuned With:** LoRA (rank 64) using Unsloth for optimized performance
**Task:** Roleplay & creative storytelling
**Format:** GGUF (Q4_K_M, Q8_0) – optimized for local inference via llama.cpp, LM Studio, Ollama
**Context Length:** 4096 tokens

**Description:** A compact yet powerful 3.2B-parameter fine-tuned Llama 3.2 model specialized for immersive, witty, and darkly imaginative roleplay. Trained on creative and absurd narrative scenarios, it excels at generating unique characters, engaging scenes, and high-concept storytelling with a distinct, sarcastic flair. Ideal for writers, game masters, and creative developers seeking a responsive, locally runnable assistant for imaginative storytelling.
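Llama 3.2 instruct models expect the Llama 3 chat template, which the GGUF file embeds and which llama.cpp or a tokenizer's `apply_chat_template` normally applies for you. A hand-built sketch of that format (the persona strings here are made up for illustration):

```python
def llama3_prompt(system: str, user: str) -> str:
    """Hand-build a Llama 3 style chat prompt to illustrate the
    template the GGUF file encodes; in practice the runtime applies
    this automatically."""
    return (
        "<|begin_of_text|>"
        f"<|start_header_id|>system<|end_header_id|>\n\n{system}<|eot_id|>"
        f"<|start_header_id|>user<|end_header_id|>\n\n{user}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = llama3_prompt(
    "You are Shiro, a sarcastic roleplay narrator.",  # hypothetical persona
    "Set the scene in a haunted lighthouse.",
)
```

The trailing assistant header leaves the prompt open for the model to continue, which is how roleplay turns are generated.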

Repository: localai | License: llama3.2

gpt-oss-20b-claude-4-distill-i1
**Model Name:** GPT-OSS 20B
**Base Model:** openai/gpt-oss-20b
**License:** Apache 2.0 (fully open for commercial and research use)
**Architecture:** 21B-parameter Mixture-of-Experts (MoE) language model

**Key Features:**
- Designed for powerful reasoning, agentic tasks, and developer applications.
- Supports configurable reasoning levels (Low, Medium, High) for balancing speed and depth.
- Native support for tool use: web browsing, code execution, function calling, and structured outputs.
- Trained on OpenAI's **harmony response format** — requires this format for proper inference.
- Optimized for efficient inference with native **MXFP4 quantization** (supports 16GB VRAM deployment).
- Fully fine-tunable and compatible with major frameworks: Transformers, vLLM, Ollama, LM Studio, and more.

**Use Cases:** Ideal for research, local deployment, agent development, code generation, complex reasoning, and interactive applications.

**Original Model:** [openai/gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b)

*Note: This repository contains quantized versions (GGUF) by mradermacher, based on the original fine-tuned model from armand0e, which was derived from unsloth/gpt-oss-20b-unsloth-bnb-4bit.*

Repository: localai | License: apache-2.0

financial-gpt-oss-20b-q8-i1
### **Financial GPT-OSS 20B (Base Model)**

**Model Type:** Causal Language Model (Fine-tuned for Financial Analysis)
**Architecture:** Mixture of Experts (MoE) – 20B parameters, 32 experts (4 active per token)
**Base Model:** `unsloth/gpt-oss-20b-unsloth-bnb-4bit`
**Fine-tuned With:** LoRA (Low-Rank Adaptation) on financial conversation data
**Training Data:** 22,250 financial dialogue pairs covering stocks (AAPL, NVDA, TSLA, etc.), technical analysis, risk assessment, and trading signals
**Context Length:** 131,072 tokens
**Quantization:** Q8_0 GGUF (for efficient inference)
**License:** Apache 2.0

**Key Features:**
- Specialized in financial market analysis: technical indicators (RSI, MACD), risk assessments, trading signals, and price forecasts
- Handles complex financial queries with structured, actionable insights
- Designed for real-time use with low-latency inference (GGUF format)
- Supports S&P 500 stocks and major asset classes across tech, healthcare, energy, and finance sectors

**Use Case:** Ideal for traders, analysts, and developers building financial AI tools. Use with caution—**not financial advice**.

**Citation:**
```bibtex
@misc{financial-gpt-oss-20b-q8,
  title={Financial GPT-OSS 20B Q8: Fine-tuned Financial Analysis Model},
  author={beenyb},
  year={2025},
  publisher={Hugging Face Hub},
  url={https://huggingface.co/beenyb/financial-gpt-oss-20b-q8}
}
```
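The "32 experts, 4 active per token" architecture is why a 20B MoE model is much cheaper per token than a dense 20B model: only the shared layers plus the routed experts run for each token. A rough sketch of that arithmetic (the expert-parameter fraction below is an assumed illustrative figure, not the model's actual layer breakdown):

```python
# Illustrative active-parameter estimate for a sparse MoE model.
TOTAL_PARAMS = 20e9
NUM_EXPERTS = 32
ACTIVE_EXPERTS = 4      # experts routed per token
EXPERT_FRACTION = 0.9   # assumed share of params in expert FFNs

shared = TOTAL_PARAMS * (1 - EXPERT_FRACTION)            # always-on params
per_expert = TOTAL_PARAMS * EXPERT_FRACTION / NUM_EXPERTS
active_per_token = shared + ACTIVE_EXPERTS * per_expert  # ~4.25B under these assumptions
```

Under these assumptions only about a fifth of the weights participate in any single forward pass, which is what makes low-latency GGUF inference practical.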

Repository: localai | License: apache-2.0

metatune-gpt20b-r1.1-i1
**Model Name:** MetaTune-GPT20B-R1.1
**Base Model:** unsloth/gpt-oss-20b-unsloth-bnb-4bit
**Repository:** [EpistemeAI/metatune-gpt20b-R1.1](https://huggingface.co/EpistemeAI/metatune-gpt20b-R1.1)
**License:** Apache 2.0

**Description:** MetaTune-GPT20B-R1.1 is a large language model fine-tuned for recursive self-improvement, making it one of the first publicly released models capable of autonomously generating training data, evaluating its own performance, and adjusting its hyperparameters to improve over time. Built upon the open-weight GPT-OSS 20B architecture and trained with Unsloth's optimized 4-bit quantization, this model excels in complex reasoning, agentic tasks, and function calling. It supports tools like web browsing and structured output generation, and is particularly effective in high-reasoning use cases such as scientific problem-solving and math reasoning.

**Performance Highlights (Zero-shot):**
- **GPQA Diamond:** 93.3% exact match
- **GSM8K (Chain-of-Thought):** 100% exact match

**Recommended Use:**
- Advanced reasoning & planning
- Autonomous agent workflows
- Research, education, and technical problem-solving

**Safety Note:** Use with caution. For safety-critical applications, pair with a safety guardrail model such as [openai/gpt-oss-safeguard-20b](https://huggingface.co/openai/gpt-oss-safeguard-20b).

**Fine-Tuned From:** unsloth/gpt-oss-20b-unsloth-bnb-4bit
**Training Method:** Recursive Self-Improvement on the [Recursive Self-Improvement Dataset](https://huggingface.co/datasets/EpistemeAI/recursive_self_improvement_dataset)
**Framework:** Hugging Face TRL + Unsloth for fast, efficient training

**Inference Tip:** Set reasoning level to "high" for best results and to reduce prompt injection risks.

👉 [View on Hugging Face](https://huggingface.co/EpistemeAI/metatune-gpt20b-R1.1) | [GitHub: Recursive Self-Improvement](https://github.com/openai/harmony)
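One way the "set reasoning level to high" tip can be applied through an OpenAI-compatible chat API is via the `reasoning_effort` field used by gpt-oss deployments. Whether a given server honours that field is an assumption — some setups instead expect "Reasoning: high" in the system prompt — so treat this as a sketch of the request shape only.

```python
import json

# Sketch: requesting the "high" reasoning level for a gpt-oss-based model.
# Model name and the reasoning_effort field are assumptions about your server.
body = {
    "model": "metatune-gpt20b-r1.1-i1",
    "reasoning_effort": "high",  # Low / Medium / High per the gpt-oss convention
    "messages": [
        {"role": "user", "content": "Prove that the square root of 2 is irrational."},
    ],
}
payload = json.dumps(body)
```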

Repository: localai | License: apache-2.0