Model Gallery

21 models from 1 repository

deepseek-ai.deepseek-v3.2
This is a quantized version of the DeepSeek-V3.2 model by deepseek-ai, optimized for efficient deployment. It targets text-generation tasks (pipeline tag `text-generation`) and follows the original DeepSeek-V3.2 architecture. For more details, refer to the [quantization repository](https://github.com/DevQuasar/deepseek-ai.DeepSeek-V3.2-GGUF).

Repository: localai

mox-small-1-i1
The model, **vanta-research/mox-small-1**, is a small-scale text-generation model optimized for conversational AI tasks. It supports chat, persona research, and chatbot applications. The quantized versions (e.g., i1-Q4_K_M, i1-Q4_K_S) are available for efficient deployment, with the i1-Q4_K_S variant offering the best balance of size, speed, and quality. The model is designed for lightweight inference and is compatible with frameworks like HuggingFace Transformers.

Repository: localai
License: apache-2.0

liquidai.lfm2-2.6b-transcript
This is a quantized version of `LiquidAI/LFM2-2.6B-Transcript`, a 2.6B-parameter language model for text-generation tasks, optimized for efficiency while retaining strong performance. It adds deployment-oriented optimizations on top of the base model and targets use cases like transcription and language modeling. It is trained on large-scale text data and supports multiple languages.

Repository: localai

rwkv7-g1c-13.3b
The model is **RWKV7 g1c 13.3B**, a large language model optimized for efficiency. It is quantized using **Bartowski's calibration v5 for imatrix** to reduce memory usage while maintaining performance. The base model is **BlinkDL/rwkv7-g1**, and this version is tailored for text-generation tasks. It balances accuracy and efficiency, making it suitable for deployment in various applications.

Repository: localai
License: apache-2.0

liquidai_lfm2-350m-extract
Based on LFM2-350M, LFM2-350M-Extract is designed to extract important information from a wide variety of unstructured documents (such as articles, transcripts, or reports) into structured outputs like JSON, XML, or YAML. Use cases include:

- Extracting invoice details from emails into structured JSON.
- Converting regulatory filings into XML for compliance systems.
- Transforming customer support tickets into YAML for analytics pipelines.
- Populating knowledge graphs with entities and attributes from unstructured reports.

You can find more information about other task-specific models in this blog post.
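A minimal usage sketch with the Hugging Face `transformers` chat pipeline, assuming the model is published as `LiquidAI/LFM2-350M-Extract`; the system prompt and the JSON schema below are illustrative assumptions, not part of the model card:

```python
from transformers import pipeline

# Load the extraction model (a quantized build may be preferable locally).
pipe = pipeline(
    "text-generation",
    model="LiquidAI/LFM2-350M-Extract",
    torch_dtype="auto",
    device_map="auto",
)

email = (
    "Hi, please find attached invoice #4821 from Acme Corp, "
    "dated 2025-03-14, total due: $1,250.00 by April 15."
)

messages = [
    # Hypothetical system prompt: request JSON following a simple schema.
    {"role": "system", "content": "Extract invoice details as JSON with keys: "
                                  "invoice_number, vendor, date, total, due_date."},
    {"role": "user", "content": email},
]

outputs = pipe(messages, max_new_tokens=256)
print(outputs[0]["generated_text"][-1])
```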

Repository: localai
License: lfm1.0

gpt-oss-120b
Welcome to the gpt-oss series, OpenAI's open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases. Two flavors of the open models are available:

- **gpt-oss-120b**: for production, general-purpose, high-reasoning use cases; fits on a single H100 GPU (117B parameters with 5.1B active parameters)
- **gpt-oss-20b**: for lower latency and local or specialized use cases (21B parameters with 3.6B active parameters)

Both models were trained on the harmony response format and should only be used with the harmony format, as they will not work correctly otherwise. This entry is dedicated to the larger gpt-oss-120b model; check out gpt-oss-20b for the smaller one.

Highlights:

- **Permissive Apache 2.0 license**: Build freely without copyleft restrictions or patent risk; ideal for experimentation, customization, and commercial deployment.
- **Configurable reasoning effort**: Easily adjust the reasoning effort (low, medium, high) based on your specific use case and latency needs (see the sketch below).
- **Full chain-of-thought**: Gain complete access to the model's reasoning process, facilitating easier debugging and increased trust in outputs. It is not intended to be shown to end users.
- **Fine-tunable**: Fully customize the models to your specific use case through parameter fine-tuning.
- **Agentic capabilities**: Use the models' native capabilities for function calling, web browsing, Python code execution, and Structured Outputs.
- **Native MXFP4 quantization**: The models are trained with native MXFP4 precision for the MoE layer, letting gpt-oss-120b run on a single H100 GPU and gpt-oss-20b run within 16GB of memory.
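A minimal sketch of selecting reasoning effort using the Hugging Face `transformers` chat pipeline; the `Reasoning: high` system message follows the harmony convention described above, and the prompt is illustrative:

```python
from transformers import pipeline

# Load gpt-oss-120b with automatic dtype and device placement.
pipe = pipeline(
    "text-generation",
    model="openai/gpt-oss-120b",
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    # Reasoning effort (low/medium/high) is set in the system message.
    {"role": "system", "content": "Reasoning: high"},
    {"role": "user", "content": "Summarize the trade-offs of MXFP4 quantization."},
]

outputs = pipe(messages, max_new_tokens=512)
print(outputs[0]["generated_text"][-1])
```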

Repository: localai
License: apache-2.0

huihui-ai_huihui-gpt-oss-20b-bf16-abliterated
This is an uncensored version of unsloth/gpt-oss-20b-BF16 created with abliteration (see remove-refusals-with-transformers to learn more about it).

Repository: localai
License: apache-2.0

l3.3-70b-euryale-v2.3
A direct replacement / successor to Euryale v2.2, not Hanami-x1, though it is slightly better than them in my opinion.

Repository: localai
License: llama3

steelskull_l3.3-cu-mai-r1-70b
Cu-Mai, a play on San-Mai for Copper-Steel Damascus, represents a significant evolution in the three-part model series alongside San-Mai (OG) and Mokume-Gane. While maintaining the grounded and reliable nature of San-Mai, Cu-Mai introduces its own distinct "flavor" in terms of prose and overall vibe. The model demonstrates strong adherence to prompts while offering a unique creative expression.

L3.3-Cu-Mai-R1-70b integrates specialized components through the SCE merge method:

- EVA and EURYALE foundations for creative expression and scene comprehension
- Cirrus and Hanami elements for enhanced reasoning capabilities
- Anubis components for detailed scene description
- Negative_LLAMA integration for balanced perspective and response

Users consistently praise Cu-Mai for its:

- Exceptional prose quality and natural dialogue flow
- Strong adherence to prompts and creative expression
- Improved coherency and reduced repetition
- Performance on par with the original model

While some users note slightly reduced intelligence compared to the original, this trade-off is generally viewed as minimal and doesn't significantly impact the overall experience. The model's reasoning capabilities can be effectively activated through proper prompting techniques.

Repository: localai
License: llama3.3

steelskull_l3.3-electra-r1-70b
L3.3-Electra-R1-70b is the newest release of the Unnamed series; this is the 6th iteration, based on user feedback. Built on a custom DeepSeek R1 Distill base (TheSkullery/L3.1x3.3-Hydroblated-R1-70B-v4.4), Electra-R1 integrates specialized components through the SCE merge method. The model uses float32 dtype during processing with a bfloat16 output dtype for optimized performance. Electra-R1 serves as the newest gold standard and baseline. User feedback consistently highlights its superior intelligence, coherence, and unique ability to provide deep character insights. Through proper prompting, the model demonstrates advanced reasoning capabilities and unprompted exploration of character inner thoughts and motivations. The model utilizes the custom Hydroblated-R1 base, created for stability and enhanced reasoning. The SCE merge method's settings are precisely tuned based on extensive community feedback (covering over 10 different models, from Nevoria to Cu-Mai), ensuring optimal component integration while maintaining model coherence and reliability. This foundation establishes Electra-R1 as the benchmark upon which its variant models build and expand.

Repository: localai
License: eva-llama3.3

rwkv-6-world-7b
RWKV (pronounced RwaKuv) is an RNN with GPT-level LLM performance that can also be directly trained like a GPT transformer (parallelizable). The project is currently at RWKV-7. It combines the best of RNNs and transformers: great performance, fast inference, fast training, low VRAM usage, "infinite" context length, and free text embeddings. Moreover, it is 100% attention-free and a Linux Foundation AI project.

Repository: localai
License: apache-2.0

opencoder-1.5b-base
The model is a large language model with 1.5 billion parameters, trained on 2.5 trillion tokens of code-related data. It supports both English and Chinese and is part of the OpenCoder LLM family, which also includes 8B base and chat models. The model achieves high performance across multiple language-model benchmarks and is one of the most comprehensively open-sourced models available.

Repository: localai
License: inf

finemath-llama-3b
This is a continual pre-training of Llama-3.2-3B on a mix of 📐 FineMath (our new high-quality math dataset) and FineWeb-Edu. The model demonstrates superior math performance compared to Llama 3.2 3B, while maintaining similar performance on knowledge, reasoning, and common-sense benchmarks. It was trained on 160B tokens using a mix of 40% FineWeb-Edu and 60% FineMath (30% FineMath-4+ subset and 30% InfiWebMath-4+ subset). We used nanotron for training; the training scripts are in our SmolLM2 GitHub repo.

Repository: localai
License: apache-2.0

l3.1-8b-llamoutcast-i1
Warning: this model is utterly cursed. Llamoutcast was originally intended to be a DADA finetune of Llama-3.1-8B-Instruct, but the results were unsatisfactory, so it received additional finetuning on a raw-text dataset and is now utterly cursed. It responds to Llama-3 Instruct formatting.

Repository: localai
License: cc-by-nc-4.0

mistral-7b-instruct-v0.3
The Mistral-7B-Instruct-v0.3 Large Language Model (LLM) is an instruct fine-tuned version of Mistral-7B-v0.3. Mistral-7B-v0.3 has the following changes compared to Mistral-7B-v0.2:

- Extended vocabulary to 32768
- Supports v3 Tokenizer
- Supports function calling (see the sketch below)
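A minimal function-calling sketch via the Hugging Face `transformers` chat template, which can serialize Python tools into the v3 prompt format; the `get_weather` tool is a made-up example, and parsing and executing the returned tool call is left to the caller:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

def get_weather(city: str) -> str:
    """Get the current weather for a city.

    Args:
        city: Name of the city.
    """
    return "sunny, 21°C"  # stub for illustration

model_id = "mistralai/Mistral-7B-Instruct-v0.3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "What is the weather in Paris right now?"}]

# The chat template serializes the tool schema (from the docstring) into the prompt.
inputs = tokenizer.apply_chat_template(
    messages,
    tools=[get_weather],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

out = model.generate(inputs, max_new_tokens=128)
# The model emits a tool-call payload; parse it and call get_weather yourself.
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```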

Repository: localai
License: apache-2.0

krutrim-ai-labs_krutrim-2-instruct
Krutrim-2 is a 12B parameter language model developed by the OLA Krutrim team. It is built on the Mistral-NeMo 12B architecture and trained across various domains, including web data, code, math, Indic languages, Indian context data, synthetic data, and books. Following pretraining, the model was finetuned for instruction following on diverse data covering a wide range of tasks, including knowledge recall, math, reasoning, coding, safety, and creative writing.

Repository: localai
License: krutrim-community-license-agreement-version-1.0

sicariussicariistuff_redemption_wind_24b
This is a lightly fine-tuned version of the Mistral 24B base model, designed as an accessible and adaptable foundation for further fine-tuning and as merge fodder. Key modifications include:

- **ChatML-ified**, with no additional tokens introduced (see the prompt sketch below).
- **High-quality private instruct data**: not generated by ChatGPT or Claude, ensuring no slop and good markdown understanding.
- **No refusals**: since it's a base model, refusals should be minimal to non-existent, though in early testing occasional warnings still appear (presumably baked into the pre-train).
- **High-quality private creative writing dataset**: mainly to further dilute baked-in slop, but it can actually write some stories; not bad for loss ~8.
- **Small, high-quality private RP dataset**: kept small so further tuning for RP will be easier; it contains ZERO SLOP, and some entries are 16k tokens long.
- **Exceptional adherence to character cards**: done to make further tunes intended for roleplay easier.
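Since the model is ChatML-ified with no new tokens, prompts use the standard ChatML turn markers. A minimal sketch of building such a prompt by hand (the message contents are placeholders):

```python
# Standard ChatML turn format; no extra special tokens were introduced.
def chatml_prompt(system: str, user: str) -> str:
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

print(chatml_prompt("You are a creative writing assistant.",
                    "Write the opening line of a heist story."))
```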

Repository: localai
License: apache-2.0

logics-qwen3-math-4b
**Model Name:** Logics-Qwen3-Math-4B
**Base Model:** Qwen/Qwen3-4B-Thinking-2507
**Size:** 4B parameters
**Fine-Tuned For:** Mathematical reasoning, logical problem solving, and algorithmic coding
**Training Data:** OpenMathReasoning, OpenCodeReasoning, Helios-R-6M

**Description:** A lightweight, high-precision 4B-parameter model optimized for mathematical and logical reasoning. Fine-tuned from Qwen3-4B-Thinking-2507, it excels at solving equations, performing step-by-step reasoning, and handling algorithmic tasks with structured outputs in LaTeX, Markdown, JSON, and more. Ideal for education, research, and deployment on mid-range hardware.

**Use Case:** Perfect for math problem-solving, code reasoning, and technical content generation in resource-constrained environments (a usage sketch follows).

**Tags:** #math #code #reasoning #4B #Qwen3 #text-generation #open-source
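A minimal sketch with the `transformers` chat pipeline; the Hugging Face repo id below is hypothetical (substitute the model's actual path), and the request for LaTeX output is illustrative:

```python
from transformers import pipeline

# NOTE: hypothetical repo id; replace with the model's actual Hugging Face path.
pipe = pipeline(
    "text-generation",
    model="Logics-Qwen3-Math-4B",
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Solve 3x^2 - 12x + 9 = 0 step by step "
                                "and give the final answer in LaTeX."},
]

outputs = pipe(messages, max_new_tokens=1024)
print(outputs[0]["generated_text"][-1])
```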

Repository: localai
License: apache-2.0

a2fm-32b-rl
**A²FM-32B-rl** is a 32-billion-parameter adaptive foundation model designed for hybrid reasoning and agentic tasks. It dynamically selects between *instant*, *reasoning*, and *agentic* execution modes using a **route-then-align** framework, enabling smarter, more efficient AI behavior. Trained with **Adaptive Policy Optimization (APO)**, it achieves state-of-the-art performance on benchmarks like AIME25 (70.4%) and BrowseComp (13.4%), while reducing inference cost by up to **45%** compared to traditional reasoning methods, delivering high accuracy at low cost. Originally developed by **PersonalAILab**, this model is optimized for tool-aware, multi-step problem solving and is ideal for advanced AI agents requiring both precision and efficiency.

🔹 *Model Type:* Adaptive Agent Foundation Model
🔹 *Size:* 32B
🔹 *Use Case:* Agentic reasoning, tool use, cost-efficient AI agents
🔹 *Training Approach:* Route-then-align + Adaptive Policy Optimization (APO)
🔹 *Performance:* SOTA on reasoning and agentic benchmarks

📄 [Paper](https://arxiv.org/abs/2510.12838) | 🐙 [GitHub](https://github.com/OPPO-PersonalAI/Adaptive_Agent_Foundation_Models)

Repository: localai
License: aml

gpt-oss-20b-esper3.1-i1
**Model Name:** gpt-oss-20b-Esper3.1
**Repository:** [ValiantLabs/gpt-oss-20b-Esper3.1](https://huggingface.co/ValiantLabs/gpt-oss-20b-Esper3.1)
**Base Model:** openai/gpt-oss-20b
**Type:** Instruction-tuned, reasoning-focused language model
**Size:** 20 billion parameters
**License:** Apache 2.0

### 🔍 Overview

gpt-oss-20b-Esper3.1 is a specialized, instruction-tuned variant of the 20B open-source GPT model, developed by **Valiant Labs**. It excels in **advanced coding, software architecture, and DevOps reasoning**, making it ideal for technical problem-solving and AI-driven engineering tasks.

### ✨ Key Features

- **Expert in DevOps & Cloud Systems:** Trained on high-difficulty datasets (e.g., Titanium3, Tachibana3, Mitakihara), it delivers precise, actionable guidance for AWS, Kubernetes, Terraform, Ansible, Docker, Jenkins, and more.
- **Strong Code Reasoning:** Optimized for complex programming tasks, including full-stack development, scripting, and debugging.
- **High-Quality Inference:** Uses `bf16` precision for full-precision performance; quantized versions (e.g., GGUF) are available for efficient local inference.
- **Open-Source & Free to Use:** Fully open-access, built on the public gpt-oss-20b foundation and trained with community datasets.

### 📌 Use Cases

- Designing scalable cloud architectures
- Writing and optimizing infrastructure-as-code
- Debugging complex DevOps pipelines
- AI-assisted software development and documentation
- Real-time technical troubleshooting

### 💡 Getting Started

Use the standard `text-generation` pipeline with the `transformers` library. Supports role-based prompting (e.g., `user`, `assistant`) and performs best with high-reasoning prompts.

```python
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="ValiantLabs/gpt-oss-20b-Esper3.1",
    torch_dtype="auto",
    device_map="auto",
)
messages = [{"role": "user", "content": "Design a Kubernetes cluster for a high-traffic web app with CI/CD via GitHub Actions."}]
outputs = pipe(messages, max_new_tokens=2000)
print(outputs[0]["generated_text"][-1])
```

> 🔗 **Model Gallery Entry**: *gpt-oss-20b-Esper3.1 – A powerful, open-source 20B model tuned for expert-level DevOps, coding, and system architecture. Built by Valiant Labs using high-quality technical datasets. Perfect for engineers, architects, and AI developers.*

Repository: localai
License: apache-2.0

qwen3-4b-thinking-2507-gspo-easy
**Model Name:** Qwen3-4B-Thinking-2507-GSPO-Easy
**Base Model:** Qwen3-4B (by Alibaba Cloud)
**Fine-tuned With:** GRPO (Group Relative Policy Optimization)
**Framework:** Hugging Face TRL (Transformers Reinforcement Learning)
**License:** [MIT](https://huggingface.co/leonMW/Qwen3-4B-Thinking-2507-GSPO-Easy/blob/main/LICENSE)

### 📌 Description

A fine-tuned 4-billion-parameter version of **Qwen3-4B**, optimized for **step-by-step reasoning and complex problem-solving** using **GRPO**, a reinforcement learning method designed to enhance mathematical and logical reasoning in language models. This model excels at tasks requiring **structured thinking**, such as solving math problems, logical puzzles, and multi-step reasoning, making it ideal for applications in education, AI assistants, and reasoning benchmarks.

### 🔧 Key Features

- Trained with **TRL 0.23.1** and **Transformers 4.57.1**
- Optimized for **high-quality reasoning output**
- Part of the **Qwen3-4B-Thinking** series, designed to simulate human-like thought processes
- Compatible with the Hugging Face `transformers` and `pipeline` API

### 📚 Use Case

Perfect for applications demanding **deep reasoning**, such as:

- AI tutoring systems
- Advanced chatbots with explanation capabilities
- Automated problem-solving in STEM domains

### 📌 Quick Start (Python)

```python
from transformers import pipeline

question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
generator = pipeline("text-generation", model="leonMW/Qwen3-4B-Thinking-2507-GSPO-Easy", device="cuda")
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])
```

> ✅ **Note**: This is the **original, non-quantized base model**. Quantized versions (e.g., GGUF) are available separately under the same repository for efficient inference on consumer hardware.

🔗 **Model Page:** [https://huggingface.co/leonMW/Qwen3-4B-Thinking-2507-GSPO-Easy](https://huggingface.co/leonMW/Qwen3-4B-Thinking-2507-GSPO-Easy)
📝 **Training Details & Visualizations:** [WandB Dashboard](https://wandb.ai/leonwenderoth-tu-darmstadt/huggingface/runs/t42skrc7)

*Fine-tuned using GRPO, a method proven to boost mathematical reasoning in open language models. Cite: Shao et al., 2024 (arXiv:2402.03300)*

Repository: localai
License: apache-2.0