Model Gallery

97 models from 1 repository

qwen3.5-27b-claude-4.6-opus-reasoning-distilled-heretic-i1

Repository: localai | License: apache-2.0

qwen3.5-27b-claude-4.6-opus-reasoning-distilled-i1
Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-i1-GGUF - A GGUF quantized model optimized for local inference. Specialized for reasoning and chain-of-thought tasks. Based on Qwen 3.5 architecture with enhanced language understanding. Available in multiple quantization levels for various hardware requirements. Distilled from Claude-style reasoning models for enhanced logical reasoning capabilities.
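The "multiple quantization levels for various hardware requirements" mentioned above map roughly to on-disk size. As a hedged sketch (the bits-per-weight figures below are approximate community estimates, not official llama.cpp numbers), the expected GGUF file size can be estimated from parameter count:

```python
# Rough GGUF file-size estimate from parameter count and quant bit-width.
# Bits-per-weight values are approximate community estimates, not official figures.
APPROX_BITS_PER_WEIGHT = {
    "Q8_0": 8.5,
    "Q6_K": 6.6,
    "Q4_K_M": 4.8,
    "IQ2_XXS": 2.1,
}

def estimate_size_gb(n_params_billion: float, quant: str) -> float:
    """Estimated on-disk size in GB for a given quantization level."""
    bits = APPROX_BITS_PER_WEIGHT[quant]
    return n_params_billion * 1e9 * bits / 8 / 1e9

# A 27B model at Q4_K_M lands around 16 GB:
print(round(estimate_size_gb(27, "Q4_K_M"), 1))  # → 16.2
```

This is why the same model appears at sizes from a couple of GB (IQ-class quants) up to near full precision.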

Repository: localai | License: apache-2.0

huihui-glm-4.7-flash-abliterated-i1
The model is a quantized version of **huihui-ai/Huihui-GLM-4.7-Flash-abliterated**, optimized for efficiency and deployment. It ships as GGUF files at various quantization levels (e.g., IQ1_M, IQ2_XXS, Q4_K_M) and is designed for tasks requiring low-resource deployment. Key features:

- **Base Model**: huihui-ai/Huihui-GLM-4.7-Flash-abliterated.
- **Quantization**: IQ1_M through Q4_K_M, balancing accuracy and efficiency.
- **Use Cases**: Lightweight inference on edge devices or other resource-constrained environments.
- **Downloads**: GGUF files of varying quality and size (0.2 GB to 18.2 GB).
- **Tags**: Abliterated, uncensored.

The base model is itself an "abliterated" (uncensoring) modification of the original GLM-4.7; this repository adds quantized weights tailored for deployment.

Repository: localai | License: mit

mox-small-1-i1
The model, **vanta-research/mox-small-1**, is a small-scale text-generation model optimized for conversational AI tasks. It supports chat, persona research, and chatbot applications. The quantized versions (e.g., i1-Q4_K_M, i1-Q4_K_S) are available for efficient deployment, with the i1-Q4_K_S variant offering the best balance of size, speed, and quality. The model is designed for lightweight inference and is compatible with frameworks like HuggingFace Transformers.

Repository: localai | License: apache-2.0

qwen3-vl-reranker-2b-i1
**Model Name:** Qwen3-VL-Reranker-2B-i1
**Base Model:** Qwen/Qwen3-VL-Reranker-2B

**Description:** A high-performance multimodal reranking model for state-of-the-art cross-modal search. It supports 30+ languages and handles text, images, screenshots, videos, and mixed modalities. With 2B parameters and a 32K context length, it refines retrieval results by combining embedding vectors with precise relevance scores, and is ideal for applications requiring accurate multimodal content matching.

**Key Features:**
- **Multimodal**: Text, images, videos, and mixed content.
- **Language Support**: 30+ languages.
- **Quantization**: Available in Q8_0 (best quality), Q4_K_M (fast, recommended), and lower-precision options.
- **Performance**: Outperforms base models in retrieval tasks (e.g., JinaVDR, ViDoRe v3).
- **Use Case**: Enhances search pipelines by refining embeddings with precise relevance scores.

**Downloads:** [GGUF Files](https://huggingface.co/mradermacher/Qwen3-VL-Reranker-2B-i1-GGUF) (e.g., `Qwen3-VL-Reranker-2B.i1-Q4_K_M.gguf`).

**Usage:** Requires `transformers`, `qwen-vl-utils`, and `torch`. Example: `from scripts.qwen3_vl_reranker import Qwen3VLReranker; model = Qwen3VLReranker(...)`

**Citation:** @article{qwen3vlembedding, ...}
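The two-stage pattern this card describes (a fast embedding retrieval pass refined by a more expensive reranker relevance score) can be sketched in plain Python. The reranker call is stubbed with a hypothetical `toy_relevance` function, since the real model requires `transformers` and GPU-sized weights:

```python
from typing import Callable

def rerank(query: str,
           candidates: list[str],
           embed_score: dict[str, float],
           relevance_fn: Callable[[str, str], float],
           top_k: int = 3) -> list[str]:
    """Two-stage retrieval: shortlist the top candidates by embedding score,
    then re-order the shortlist with a (more expensive) reranker score."""
    shortlist = sorted(candidates, key=lambda d: embed_score[d], reverse=True)[:top_k]
    return sorted(shortlist, key=lambda d: relevance_fn(query, d), reverse=True)

# Hypothetical stub standing in for the actual reranker model call:
toy_relevance = lambda q, d: sum(w in d for w in q.split())

docs = ["cats sit on mats", "dogs chase cats", "stock prices fell"]
scores = {docs[0]: 0.9, docs[1]: 0.8, docs[2]: 0.1}
print(rerank("cats mats", docs, scores, toy_relevance, top_k=2))
```

In a real pipeline the embedding scores would come from a vector index and `relevance_fn` would call the reranker model on each (query, document) pair.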

Repository: localai | License: apache-2.0

mistral-nemo-instruct-2407-12b-thinking-m-claude-opus-high-reasoning-i1
The model described in this repository is **Mistral-Nemo-Instruct-2407-12B** (12 billion parameters), a large language model optimized for instruction following and high-level reasoning tasks. It is a **quantized version** of the original model, compressed for efficiency while retaining key capabilities. The model is designed to generate human-like text and perform complex reasoning, making it suitable for applications requiring strong language understanding and output.

Repository: localai

iquest-coder-v1-40b-instruct-i1
The **IQuest-Coder-V1-40B-Instruct-i1-GGUF** is a quantized version of the original **IQuestLab/IQuest-Coder-V1-40B-Instruct** model, designed for efficient deployment. It is an **instruction-following large language model** with 40 billion parameters, optimized for tasks like code generation and reasoning.

**Key Features:**
- **Size:** 40B parameters (quantized for efficiency).
- **Purpose:** Instruction-based coding and reasoning.
- **Format:** GGUF (supports multi-part files).
- **Quantization:** Advanced techniques (e.g., IQ3_M, Q4_K_M) balancing performance and size.

**Available Quantizations:** **i1-Q4_K_M** is recommended for a balance of speed and size; lower-quality options trade quality for smaller files.

**Note:** This is a **quantized version**; the base model (IQuestLab/IQuest-Coder-V1-40B-Instruct) is the official source. For full fidelity, use the unquantized version or verify compatibility with your deployment tools.

Repository: localai | License: iquestcoder

minimax-m2.1-i1
The model **MiniMax-M2.1** (base model: *MiniMaxAI/MiniMax-M2.1*) is a large language model quantized for efficient deployment, optimized for speed and memory usage, with GGUF variants covering different performance trade-offs. Key features:

- **Quantized versions**: Low-precision (IQ1, IQ2, Q2_K, etc.) and higher-precision (Q4_K_M, Q6_K) options.
- **Usage**: Requires GGUF files; see [TheBloke's documentation](https://huggingface.co/TheBloke/KafkaLM-70B-German-V0.1-GGUF) for general notes on GGUF integration.
- **License**: Modified MIT (see [license link](https://github.com/MiniMax-AI/MiniMax-M2.1/blob/main/LICENSE)).
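To serve a GGUF build like this from a gallery-backed runtime, the model is typically registered with a small YAML config. A minimal sketch, assuming a LocalAI-style setup; the file name below is hypothetical and should match the quant you actually download:

```yaml
# Minimal LocalAI-style model config (sketch; file name is hypothetical)
name: minimax-m2.1-i1
parameters:
  model: MiniMax-M2.1.i1-Q4_K_M.gguf   # the downloaded GGUF weights
context_size: 4096
```

The runtime then exposes the model under `name` via its OpenAI-compatible API.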

Repository: localai | License: modified-mit

tildeopen-30b-instruct-lv-i1
The **TildeOpen-30B-Instruct-LV-i1-GGUF** is a quantized version of the base model **pazars/TildeOpen-30B-Instruct-LV**, optimized for deployment. It is an instruct-based language model trained on diverse datasets, supporting multiple languages (en, de, fr, pl, ru, it, pt, cs, nl, es, fi, tr, hu, bg, uk, bs, hr, da, et, lt, ro, sk, sl, sv, no, lv, sr, sq, mk, is, mt, ga). Licensed under CC-BY-4.0, it uses the Transformers library and is designed for efficient inference. The quantized version (with imatrix format) is tailored for deployment on devices with limited resources, while the base model remains the original, high-quality version.

Repository: localai | License: cc-by-4.0

qwen3-coder-30b-a3b-instruct-rtpurbo-i1
The model in question is a quantized version of the original **Qwen3-Coder** large language model, tailored for code generation. The base model, **RTP-LLM/Qwen3-Coder-30B-A3B-Instruct-RTPurbo**, is a 30B-parameter Mixture-of-Experts variant (the "A3B" denotes roughly 3B parameters active per token) optimized for instruction-following and code-related tasks, trained on diverse data to excel in programming and logical reasoning. The current repository provides a quantized (compressed) version of this model, suitable for deployment on hardware with limited memory at some loss of precision compared to the original. For a high-fidelity version, the unquantized base model is recommended.

Repository: localai

glm-4.5v-i1
The model in question is a **quantized version** of the **GLM-4.5V** large language model, originally developed by **zai-org**. This repository provides multiple quantized variants, optimized for different trade-offs between size, speed, and quality. The base model, **GLM-4.5V**, is a multilingual (Chinese/English) vision-language model, and this quantized version is designed for efficient inference on hardware with limited memory. Key features include:

- **Quantization options**: IQ2_M, Q2_K, Q4_K_M, IQ3_M, IQ4_XS, etc., with sizes ranging from 43 GB to 96 GB.
- **Performance**: Optimized for inference, with some variants (e.g., Q4_K_M) balancing speed and quality.
- **Vision support**: mmproj files are available in the static repository.
- **License**: MIT.

This quantized version is ideal for applications requiring compact, efficient models while retaining most of the capabilities of the base GLM-4.5V.

Repository: localai | License: mit

qwen-3-32b-medical-reasoning-i1
This is https://huggingface.co/kingabzpro/Qwen-3-32B-Medical-Reasoning applied to https://huggingface.co/Qwen/Qwen3-32B. Original model card by @kingabzpro:

Fine-tuning Qwen3-32B in 4-bit Quantization for Medical Reasoning. This project fine-tunes the Qwen/Qwen3-32B model on a medical reasoning dataset (FreedomIntelligence/medical-o1-reasoning-SFT) with 4-bit quantization for memory-efficient training.

Repository: localai | License: apache-2.0

qwen3-14b-griffon-i1
This is a fine-tuned version of the Qwen3-14B model using the high-quality OpenThoughts2-1M dataset. Fine-tuned with Unsloth's TRL-compatible framework and LoRA for efficient performance, this model is optimized for advanced reasoning tasks, especially math, logic puzzles, code generation, and step-by-step problem solving.

Training Dataset:
- Dataset: OpenThoughts2-1M
- Source: A synthetic dataset curated and expanded by the OpenThoughts team
- Volume: ~1.1M high-quality examples
- Content Type: Multi-turn reasoning, math proofs, algorithmic code generation, logical deduction, and structured conversations
- Tools Used: Curator Viewer

This dataset builds upon OpenThoughts-114k and integrates strong reasoning-centric data sources like OpenR1-Math and KodCode.

Intended Use: This model is particularly suited for chain-of-thought and step-by-step reasoning, code generation with logical structure, educational tools for math and programming, and AI agents requiring multi-turn problem-solving.

Repository: localai | License: apache-2.0

qwen3-4b-esper3-i1
Esper 3 is a coding, architecture, and DevOps reasoning specialist built on Qwen 3. Finetuned on our DevOps and architecture reasoning and code reasoning data generated with Deepseek R1! Improved general and creative reasoning to supplement problem-solving and general chat performance. Small model sizes allow running on local desktop and mobile, plus super-fast server inference!

Repository: localai | License: apache-2.0

symiotic-14b-i1
SymbioticLM-14B is a state-of-the-art 17.8 billion parameter symbolic–transformer hybrid model that tightly couples high-capacity neural representation with structured symbolic cognition. Designed to match or exceed performance of top-tier LLMs in symbolic domains, it supports persistent memory, entropic recall, multi-stage symbolic routing, and self-organizing knowledge structures. This model is ideal for advanced reasoning agents, research assistants, and symbolic math/code generation systems.

Repository: localai | License: afl-3.0

qwen3-the-josiefied-omega-directive-22b-uncensored-abliterated-i1
WARNING: NSFW. Vivid prose. INTENSE. Visceral Details. Violence. HORROR. GORE. Swearing. UNCENSORED... humor, romance, fun. A massive 22B, 62 layer merge of the fantastic "The-Omega-Directive-Qwen3-14B-v1.1" and off the scale "Goekdeniz-Guelmez/Josiefied-Qwen3-14B-abliterated-v3" in Qwen3, with full reasoning (can be turned on or off) and the model is completely uncensored/abliterated too.

Repository: localai | License: apache-2.0

qwen3-the-xiaolong-omega-directive-22b-uncensored-abliterated-i1
WARNING: NSFW. Vivid prose. INTENSE. Visceral Details. Violence. HORROR. GORE. Swearing. UNCENSORED... humor, romance, fun. A massive 22B, 62 layer merge of the fantastic "The-Omega-Directive-Qwen3-14B-v1.1" (by ReadyArt) and off the scale "Xiaolong-Qwen3-14B" (by nbeerbower) in Qwen3, with full reasoning (can be turned on or off) and the model is completely uncensored/abliterated too.

Repository: localai | License: apache-2.0

qwen3-55b-a3b-total-recall-v1.3-i1
WARNING: MADNESS - UNHINGED and... NSFW. Vivid prose. INTENSE. Visceral Details. Violence. HORROR. GORE. Swearing. UNCENSORED... humor, romance, fun.

This repo contains the full-precision source code, in "safe tensors" format, to generate GGUFs, GPTQ, EXL2, AWQ, HQQ and other formats. The source code can also be used directly. This model is for all use cases, but excels in creative use cases specifically. It is based on Qwen3-30B-A3B (MOE, 128 experts, 8 activated) with Brainstorm 40X (by DavidAU; details at the bottom of this page). This is the refined version -V1.3- of this project (see this repo for all settings, details, system prompts, example generations, etc.): https://huggingface.co/DavidAU/Qwen3-55B-A3B-TOTAL-RECALL-Deep-40X-GGUF/ Version 1.3 is slightly smaller, with further refinements to the Brainstorm adapter, which will change generation and reasoning performance within the model.

Repository: localai | License: apache-2.0

qwen3-42b-a3b-stranger-thoughts-deep20x-abliterated-uncensored-i1
WARNING: NSFW. Vivid prose. INTENSE. Visceral Details. Violence. HORROR. GORE. Swearing. UNCENSORED... humor, romance, fun.

Qwen3-42B-A3B-Stranger-Thoughts-Deep20x-Abliterated-Uncensored. This repo contains the full-precision source code, in "safe tensors" format, to generate GGUFs, GPTQ, EXL2, AWQ, HQQ and other formats. The source code can also be used directly.

ABOUT: Qwen's excellent "Qwen3-30B-A3B", abliterated by "huihui-ai", then combined with Brainstorm 20x (tech notes at the bottom of the page) in a MOE (128 experts) at 42B parameters (up from 30B). This pushes Qwen's abliterated/uncensored model to the absolute limit for creative use cases. Prose (all), reasoning, thinking... all will be very different from regular "Qwen 3s". This model will generate horror, fiction, erotica - you name it - in vivid, stark detail. It will NOT hold back. Likewise, regen(s) of the same prompt - even at the same settings - will create very different version(s) too. See FOUR examples below. The model retains the full reasoning and output generation of a Qwen3 MOE, but has not been tested for "non-creative" use cases.

Model is set with Qwen's default config: 40k context, 8 of 128 experts activated, ChatML OR Jinja template (embedded).

IMPORTANT: See the usage guide / repo below to get the most out of this model, as settings are very specific.

USAGE GUIDE: Please refer to this model card for specific usage, suggested settings, changing ACTIVE EXPERTS, templates, and the like; how to maximize this model in "uncensored" form, with specific notes on "abliterated" models; and rep pen / temp settings specific to getting the model to perform strongly: https://huggingface.co/DavidAU/Qwen3-18B-A3B-Stranger-Thoughts-Abliterated-Uncensored-GGUF

GGUF / QUANTS / SPECIAL SHOUTOUT: Special thanks to team mradermacher for making the quants! https://huggingface.co/mradermacher/Qwen3-42B-A3B-Stranger-Thoughts-Deep20x-Abliterated-Uncensored-GGUF

KNOWN ISSUES: The model may occasionally mis-capitalize a word (lowercase where uppercase should be) or add an extra space before a word. An incorrect template and/or settings will result in a drop in performance / poor performance.
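The "8 of 128 experts activated" setting above refers to standard MoE top-k routing: a gate scores every expert for each token, and only the k highest-scoring experts actually run. A minimal illustrative sketch (the real router is a learned layer inside the model, not this standalone function):

```python
import math
import random

def route_token(gate_logits, k=8):
    """Top-k MoE routing: keep the k highest-scoring experts for this
    token and softmax-normalize their gate weights."""
    top = sorted(range(len(gate_logits)),
                 key=lambda i: gate_logits[i], reverse=True)[:k]
    exps = [math.exp(gate_logits[i]) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

random.seed(0)
gate_logits = [random.gauss(0, 1) for _ in range(128)]  # one token's gate scores
active = route_token(gate_logits, k=8)
print(len(active))  # → 8 (the other 120 experts are skipped for this token)
```

Raising the number of active experts trades compute for quality, which is why model cards like this one call out changing ACTIVE EXPERTS as a tuning knob.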

Repository: localai | License: apache-2.0

zonui-3b-i1
ZonUI-3B — A lightweight, resolution-aware GUI grounding model trained with only 24K samples on a single RTX 4090.

Repository: localai | License: apache-2.0

zhi-create-qwen3-32b-i1
Zhi-Create-Qwen3-32B is a fine-tuned model derived from Qwen/Qwen3-32B, with a focus on enhancing creative writing capabilities. Through careful optimization, the model shows promising improvements in creative writing performance, as evaluated using the WritingBench. In our evaluation, the model attains a score of 82.08 on WritingBench, which represents a significant improvement over the base Qwen3-32B model's score of 78.97. Additionally, to maintain the model's general capabilities such as knowledge and reasoning, we performed fine-grained data mixture experiments by combining general knowledge, mathematics, code, and other data types. The final evaluation results show that general capabilities remain stable with no significant decline compared to the base model.

Repository: localai | License: apache-2.0
