Model Gallery

521 models from 1 repositories

Filter by type:

Filter by tags:

qwen3.6-27b-heretic-uncensored-finetune-neo-code-di-imatrix-max
Qwen3.6-27B-Heretic2-Uncensored-Finetune-Thinking Yes... fully uncensored AND fine tuned lightly. Freedom and brainpower. Trained on different Heretic base, with different KLD/Refusals. Model fine tune was used to finalize and "firm up" Heretic / uncensored changes. The goal here was light, minor fixes rather than full / heavy fine tune. That being said, the tuning still raised critical metrics. This is Version 2, using "trohrbaugh" Heretic, which has a lower refusal rate, and tuning bumped up the metrics a bit more too. This has also positively impacted "NEO-Coder Di-Matrix" (dual imatrix) GGUF quants as well (vs heretic/non heretic too). https://huggingface.co/DavidAU/Qwen3.6-27B-Heretic-Uncensored-FINETUNE-NEO-CODE-Di-IMatrix-MAX-GGUF ``` IN HOUSE BENCHMARKS [by Nightmedia]: arc-c arc/e boolq hswag obkqa piqa wino Qwen3.6-27B-Heretic2-Uncensored-Finetune-Thinking mxfp8 0.673,0.846,0.905... [instruct mode] Qwen3.6-27B-Heretic-Uncensored-Finetune-Thinking mxfp8 0.669,0.835,0.906,... [instruct mode] BASE UNTUNED MODEL: Qwen3.6-27B HERETIC (by llmfan46) [instruct mode] mxfp8 0.644,0.788,0.902,... ...

Repository: localaiLicense: apache-2.0

qwopus-glm-18b-merged
# 🪐 Qwen3.5-9B-GLM5.1-Distill-v1 ## 📌 Model Overview **Model Name:** `Jackrong/Qwen3.5-9B-GLM5.1-Distill-v1` **Base Model:** Qwen3.5-9B **Training Type:** Supervised Fine-Tuning (SFT, Distillation) **Parameter Scale:** 9B **Training Framework:** Unsloth This model is a distilled variant of **Qwen3.5-9B**, trained on high-quality reasoning data derived from **GLM-5.1**. The primary goals are to: - Improve **structured reasoning ability** - Enhance **instruction-following consistency** - Activate **latent knowledge via better reasoning structure** ## 📊 Training Data ### Main Dataset - `Jackrong/GLM-5.1-Reasoning-1M-Cleaned` - Cleaned from the original `Kassadin88/GLM-5.1-1000000x` dataset. - Generated from a **GLM-5.1 teacher model** - Approximately **700x** the scale of `Qwen3.5-reasoning-700x` - Training used a **filtered subset**, not the full source dataset. ### Auxiliary Dataset - `Jackrong/Qwen3.5-reasoning-700x` ...

Repository: localaiLicense: apache-2.0

qwen3.5-9b-glm5.1-distill-v1
# 🪐 Qwen3.5-9B-GLM5.1-Distill-v1 ## 📌 Model Overview **Model Name:** `Jackrong/Qwen3.5-9B-GLM5.1-Distill-v1` **Base Model:** Qwen3.5-9B **Training Type:** Supervised Fine-Tuning (SFT, Distillation) **Parameter Scale:** 9B **Training Framework:** Unsloth This model is a distilled variant of **Qwen3.5-9B**, trained on high-quality reasoning data derived from **GLM-5.1**. The primary goals are to: - Improve **structured reasoning ability** - Enhance **instruction-following consistency** - Activate **latent knowledge via better reasoning structure** ## 📊 Training Data ### Main Dataset - `Jackrong/GLM-5.1-Reasoning-1M-Cleaned` - Cleaned from the original `Kassadin88/GLM-5.1-1000000x` dataset. - Generated from a **GLM-5.1 teacher model** - Approximately **700x** the scale of `Qwen3.5-reasoning-700x` - Training used a **filtered subset**, not the full source dataset. ### Auxiliary Dataset - `Jackrong/Qwen3.5-reasoning-700x` ...

Repository: localaiLicense: apache-2.0

supergemma4-26b-uncensored-v2
Hugging Face | GitHub | Launch Blog | Documentation License: Apache 2.0 | Authors: Google DeepMind Gemma is a family of open models built by Google DeepMind. Gemma 4 models are multimodal, handling text and image input (with audio supported on small models) and generating text output. This release includes open-weights models in both pre-trained and instruction-tuned variants. Gemma 4 features a context window of up to 256K tokens and maintains multilingual support in over 140 languages. Featuring both Dense and Mixture-of-Experts (MoE) architectures, Gemma 4 is well-suited for tasks like text generation, coding, and reasoning. The models are available in four distinct sizes: **E2B**, **E4B**, **26B A4B**, and **31B**. Their diverse sizes make them deployable in environments ranging from high-end phones to laptops and servers, democratizing access to state-of-the-art AI. Gemma 4 introduces key **capability and architectural advancements**: * **Reasoning** – All models in the family are designed as highly capable reasoners, with configurable thinking modes. ...

Repository: localaiLicense: gemma

qwopus-glm-18b-merged
# 🪐 Qwen3.5-9B-GLM5.1-Distill-v1 ## 📌 Model Overview **Model Name:** `Jackrong/Qwen3.5-9B-GLM5.1-Distill-v1` **Base Model:** Qwen3.5-9B **Training Type:** Supervised Fine-Tuning (SFT, Distillation) **Parameter Scale:** 9B **Training Framework:** Unsloth This model is a distilled variant of **Qwen3.5-9B**, trained on high-quality reasoning data derived from **GLM-5.1**. The primary goals are to: - Improve **structured reasoning ability** - Enhance **instruction-following consistency** - Activate **latent knowledge via better reasoning structure** ## 📊 Training Data ### Main Dataset - `Jackrong/GLM-5.1-Reasoning-1M-Cleaned` - Cleaned from the original `Kassadin88/GLM-5.1-1000000x` dataset. - Generated from a **GLM-5.1 teacher model** - Approximately **700x** the scale of `Qwen3.5-reasoning-700x` - Training used a **filtered subset**, not the full source dataset. ### Auxiliary Dataset - `Jackrong/Qwen3.5-reasoning-700x` ...

Repository: localaiLicense: apache-2.0

qwen_qwen3.5-0.8b
Qwen 3.5 0.8B parameter model quantized for llama-cpp backend. Supports chat interactions and multimodal image-text inputs.

Repository: localaiLicense: apache-2.0

qwen_qwen3.5-2b
Qwen3.5-2B is a highly efficient, instruction-tuned multilingual language model available in various quantized GGUF formats. Optimized for llama-cpp inference, it supports chat and completion tasks with strong performance on low-RAM hardware. The model is available in multiple quantization levels ranging from Q8_0 to IQ2_M to balance quality and resource usage.

Repository: localaiLicense: apache-2.0

q3.5-bluestar-27b

Repository: localaiLicense: mit

qwen_qwen3-next-80b-a3b-thinking

Repository: localaiLicense: apache-2.0

nanbeige4.1-3b-q8
Nanbeige4.1-3B is built upon Nanbeige4-3B-Base and represents an enhanced iteration of our previous reasoning model, Nanbeige4-3B-Thinking-2511, achieved through further post-training optimization with supervised fine-tuning (SFT) and reinforcement learning (RL). As a highly competitive open-source model at a small parameter scale, Nanbeige4.1-3B illustrates that compact models can simultaneously achieve robust reasoning, preference alignment, and effective agentic behaviors. Key features: Strong Reasoning: Capable of solving complex, multi-step problems through sustained and coherent reasoning within a single forward pass, reliably producing correct answers on benchmarks like LiveCodeBench-Pro, IMO-Answer-Bench, and AIME 2026 I. Robust Preference Alignment: Outperforms same-scale models (e.g., Qwen3-4B-2507, Nanbeige4-3B-2511) and larger models (e.g., Qwen3-30B-A3B, Qwen3-32B) on Arena-Hard-v2 and Multi-Challenge. Agentic Capability: First general small model to natively support deep-search tasks and sustain complex problem-solving with >500 rounds of tool invocations; excels in benchmarks like xBench-DeepSearch (75), Browse-Comp (39), and others.

Repository: localaiLicense: apache-2.0

nanbeige4.1-3b-q4
Nanbeige4.1-3B is built upon Nanbeige4-3B-Base and represents an enhanced iteration of our previous reasoning model, Nanbeige4-3B-Thinking-2511, achieved through further post-training optimization with supervised fine-tuning (SFT) and reinforcement learning (RL). As a highly competitive open-source model at a small parameter scale, Nanbeige4.1-3B illustrates that compact models can simultaneously achieve robust reasoning, preference alignment, and effective agentic behaviors. Key features: Strong Reasoning: Capable of solving complex, multi-step problems through sustained and coherent reasoning within a single forward pass, reliably producing correct answers on benchmarks like LiveCodeBench-Pro, IMO-Answer-Bench, and AIME 2026 I. Robust Preference Alignment: Outperforms same-scale models (e.g., Qwen3-4B-2507, Nanbeige4-3B-2511) and larger models (e.g., Qwen3-30B-A3B, Qwen3-32B) on Arena-Hard-v2 and Multi-Challenge. Agentic Capability: First general small model to natively support deep-search tasks and sustain complex problem-solving with >500 rounds of tool invocations; excels in benchmarks like xBench-DeepSearch (75), Browse-Comp (39), and others.

Repository: localaiLicense: apache-2.0

vllm-omni-z-image-turbo
Z-Image-Turbo via vLLM-Omni - A distilled version of Z-Image optimized for speed with only 8 NFEs. Offers sub-second inference latency on enterprise-grade H800 GPUs and fits within 16GB VRAM. Excels in photorealistic image generation, bilingual text rendering (English & Chinese), and robust instruction adherence.

Repository: localaiLicense: apache-2.0

deepseek-ai.deepseek-v3.2
This is a quantized version of the DeepSeek-V3.2 model by deepseek-ai, optimized for efficient deployment. It is designed for text generation tasks and supports the pipeline tag `text-generation`. The model is based on the original DeepSeek-V3.2 architecture and is available for use in various applications. For more details, refer to the [official repository](https://github.com/DevQuasar/deepseek-ai.DeepSeek-V3.2-GGUF).

Repository: localai

z-image-turbo-diffusers
🚀 Z-Image-Turbo – A distilled version of Z-Image that matches or exceeds leading competitors with only 8 NFEs (Number of Function Evaluations). It offers ⚡️sub-second inference latency⚡️ on enterprise-grade H800 GPUs and fits comfortably within 16G VRAM consumer devices. It excels in photorealistic image generation, bilingual text rendering (English & Chinese), and robust instruction adherence.

Repository: localaiLicense: apache-2.0

glm-4.7-flash-derestricted
This model is a quantized version of the original GLM-4.7-Flash-Derestricted model, derived from the base model `koute/GLM-4.7-Flash-Derestricted`. It is designed for restricted use, featuring tags like "derestricted," "uncensored," and "unlimited." The quantized versions (e.g., Q2_K, Q4_K_S, Q6_K) offer varying trade-offs between accuracy and efficiency, with the Q4_K_S and Q6_K variants being recommended for balanced performance. The model is optimized for fast inference and supports multiple quantization schemes, though some advanced quantization options (like IQ4_XS) are not available. It is intended for use in environments with specific constraints or restrictions.

Repository: localaiLicense: mit

glm-4.7-flash
**GLM-4.7-Flash** is a 30B-A3B MoE (Model Organism Ensemble) model designed for efficient deployment. It outperforms competitors in benchmarks like AIME 25, GPQA, and τ²-Bench, offering strong accuracy while balancing performance and efficiency. Optimized for lightweight use cases, it supports inference via frameworks like vLLM and SGLang, with detailed deployment instructions in the official repository. Ideal for applications requiring high-quality text generation with minimal resource consumption.

Repository: localaiLicense: mit

lfm2.5-1.2b-nova-function-calling
The **LFM2.5-1.2B-Nova-Function-Calling-GGUF** is a quantized version of the original model, optimized for efficiency with **Unsloth**. It supports text and multimodal tasks, using different quantization levels (e.g., Q2_K, Q3_K, Q4_K, etc.) to balance performance and memory usage. The model is designed for function calling and is faster than the original version, making it suitable for tasks like code generation, reasoning, and multi-modal input processing.

Repository: localaiLicense: apache-2.0

mistral-nemo-instruct-2407-12b-thinking-m-claude-opus-high-reasoning-i1
The model described in this repository is the **Mistral-Nemo-Instruct-2407-12B** (12 billion parameters), a large language model optimized for instruction tuning and high-level reasoning tasks. It is a **quantized version** of the original model, compressed for efficiency while retaining key capabilities. The model is designed to generate human-like text, perform complex reasoning, and support multi-modal tasks, making it suitable for applications requiring strong language understanding and output.

Repository: localai

rwkv7-g1c-13.3b
The model is **RWKV7 g1c 13B**, a large language model optimized for efficiency. It is quantized using **Bartowski's calibrationv5 for imatrix** to reduce memory usage while maintaining performance. The base model is **BlinkDL/rwkv7-g1**, and this version is tailored for text-generation tasks. It balances accuracy and efficiency, making it suitable for deployment in various applications.

Repository: localaiLicense: apache-2.0

iquest-coder-v1-40b-instruct-i1
The **IQuest-Coder-V1-40B-Instruct-i1-GGUF** is a quantized version of the original **IQuestLab/IQuest-Coder-V1-40B-Instruct** model, designed for efficient deployment. It is an **instruction-following large language model** with 40 billion parameters, optimized for tasks like code generation and reasoning. **Key Features:** - **Size:** 40B parameters (quantized for efficiency). - **Purpose:** Instruction-based coding and reasoning. - **Format:** GGUF (supports multi-part files). - **Quantization:** Uses advanced techniques (e.g., IQ3_M, Q4_K_M) for balance between performance and quality. **Available Quantizations:** - Optimized for speed and size: **i1-Q4_K_M** (recommended). - Lower-quality options for trade-off between size/quality. **Note:** This is a **quantized version** of the original model, but the base model (IQuestLab/IQuest-Coder-V1-40B-Instruct) is the official source. For full functionality, use the unquantized version or verify compatibility with your deployment tools.

Repository: localaiLicense: iquestcoder

minimax-m2.1-i1
The model **MiniMax-M2.1** (base model: *MiniMaxAI/MiniMax-M2.1*) is a large language model quantized for efficient deployment. It is optimized for speed and memory usage, with quantized versions available in various formats (e.g., GGUF) for different performance trade-offs. The quantization is done by the user, and the model is licensed under the *modified-mit* license. Key features: - **Quantized versions**: Includes low-precision (IQ1, IQ2, Q2_K, etc.) and high-precision (Q4_K_M, Q6_K) options. - **Usage**: Requires GGUF files; see [TheBloke's documentation](https://huggingface.co/TheBloke/KafkaLM-70B-German-V0.1-GGUF) for details on integration. - **License**: Modified MIT (see [license link](https://github.com/MiniMax-AI/MiniMax-M2.1/blob/main/LICENSE)). For gallery use, emphasize its quantized variants, performance trade-offs, and licensing.

Repository: localaiLicense: modified-mit

Page 1