Model Gallery

197 models from 1 repositories

Filter by type:

Filter by tags:

qwen3.6-40b-claude-4.6-opus-deckard-heretic-uncensored-thinking-neo-code-di-imatrix-max
The Qwen 3.5 version (also 40B) got 181 likes+ This version uses the new Qwen 3.6 27B arch (which exceeds even Qwen's own 398B model). WARNING: This model has character and intelligence. It will take no prisoners. It will give no quarter. Uncensored, Unfiltered and boldly confident. Not even remotely "SFW", if you ask it for NSFW content. And it is wickedly smart too - exceeding the base model in 6 out of 7 benchmarks. Qwen3.6-40B-Claude-4.6-Opus-Deckard-Heretic-Uncensored-Thinking 40 billion parameters (dense, not moe) expanded from 27B Qwen 3.6, then trained on Claude 4.6 Opus High Reasoning dataset via Unsloth on local hardware... but there is much more to the story - in comes DECKARD. 96 layers, 1275 Tensors. (50% more than base model of 27B) Features variable length reasoning ; less complex = shorter, longer for more complex. Model performance has increased dramatically. And it has character too. A lot of character. No censorship, no nanny. (via Heretic) And it is very, very smart. ...

Repository: localaiLicense: apache-2.0

qwen3.6-27b-heretic-uncensored-finetune-neo-code-di-imatrix-max
Qwen3.6-27B-Heretic2-Uncensored-Finetune-Thinking Yes... fully uncensored AND fine tuned lightly. Freedom and brainpower. Trained on different Heretic base, with different KLD/Refusals. Model fine tune was used to finalize and "firm up" Heretic / uncensored changes. The goal here was light, minor fixes rather than full / heavy fine tune. That being said, the tuning still raised critical metrics. This is Version 2, using "trohrbaugh" Heretic, which has a lower refusal rate, and tuning bumped up the metrics a bit more too. This has also positively impacted "NEO-Coder Di-Matrix" (dual imatrix) GGUF quants as well (vs heretic/non heretic too). https://huggingface.co/DavidAU/Qwen3.6-27B-Heretic-Uncensored-FINETUNE-NEO-CODE-Di-IMatrix-MAX-GGUF ``` IN HOUSE BENCHMARKS [by Nightmedia]: arc-c arc/e boolq hswag obkqa piqa wino Qwen3.6-27B-Heretic2-Uncensored-Finetune-Thinking mxfp8 0.673,0.846,0.905... [instruct mode] Qwen3.6-27B-Heretic-Uncensored-Finetune-Thinking mxfp8 0.669,0.835,0.906,... [instruct mode] BASE UNTUNED MODEL: Qwen3.6-27B HERETIC (by llmfan46) [instruct mode] mxfp8 0.644,0.788,0.902,... ...

Repository: localaiLicense: apache-2.0

qwen3.6-35b-a3b-claude-4.6-opus-reasoning-distilled
# 🔥 Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled A reasoning SFT fine-tune of `Qwen/Qwen3.6-35B-A3B` on chain-of-thought (CoT) distillation mostly sourced from Claude Opus 4.6. The goal is to preserve Qwen3.6's strong agentic coding and reasoning base while nudging the model toward structured Claude Opus-style reasoning traces and more stable long-form problem solving. The training path is text-only. The Qwen3.6 base architecture includes a vision encoder, but this fine-tuning run did not train on image or video examples. - **Developed by:** @hesamation - **Base model:** `Qwen/Qwen3.6-35B-A3B` - **License:** apache-2.0 This fine-tuning run is inspired by Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled, including the notebook/training workflow style and Claude Opus reasoning-distillation direction. [](https://x.com/Hesamation) [](https://discord.gg/vtJykN3t) ## Benchmark Results The MMLU-Pro pass used 70 total questions per model: `--limit 5` across 14 MMLU-Pro subjects. Treat this as a smoke/comparative check, not a release-quality full benchmark. ...

Repository: localaiLicense: apache-2.0

qwen3.5-397b-a17b

Repository: localaiLicense: apache-2.0

nanbeige4.1-3b-q4
Nanbeige4.1-3B is built upon Nanbeige4-3B-Base and represents an enhanced iteration of our previous reasoning model, Nanbeige4-3B-Thinking-2511, achieved through further post-training optimization with supervised fine-tuning (SFT) and reinforcement learning (RL). As a highly competitive open-source model at a small parameter scale, Nanbeige4.1-3B illustrates that compact models can simultaneously achieve robust reasoning, preference alignment, and effective agentic behaviors. Key features: Strong Reasoning: Capable of solving complex, multi-step problems through sustained and coherent reasoning within a single forward pass, reliably producing correct answers on benchmarks like LiveCodeBench-Pro, IMO-Answer-Bench, and AIME 2026 I. Robust Preference Alignment: Outperforms same-scale models (e.g., Qwen3-4B-2507, Nanbeige4-3B-2511) and larger models (e.g., Qwen3-30B-A3B, Qwen3-32B) on Arena-Hard-v2 and Multi-Challenge. Agentic Capability: First general small model to natively support deep-search tasks and sustain complex problem-solving with >500 rounds of tool invocations; excels in benchmarks like xBench-DeepSearch (75), Browse-Comp (39), and others.

Repository: localaiLicense: apache-2.0

acestep-cpp-turbo
ACE-Step 1.5 Turbo (C++ / GGML) — native C++ music generation from text descriptions and lyrics. Two-stage pipeline: text-to-code (Qwen3 LM) + code-to-audio (DiT-VAE). Stereo 48kHz output. Uses Q8_0 quantized models for a good balance of quality and speed.

Repository: localaiLicense: mit

acestep-cpp-turbo-4b
ACE-Step 1.5 Turbo (C++ / GGML) with 4B LM — higher quality music generation from text and lyrics. Uses the larger 4B parameter LM for better metadata/code generation. Stereo 48kHz output.

Repository: localaiLicense: mit

qwen3-coder-next-mxfp4_moe
The model is a quantized version of **Qwen/Qwen3-Coder-Next** (base model) using the **MXFP4** quantization scheme. It is optimized for efficiency while retaining performance, suitable for deployment in applications requiring lightweight inference. The quantized version is tailored for specific tasks, with parameters like temperature=1.0 and top_p=0.95 recommended for generation.

Repository: localai

fish-speech-s2-pro
Fish Speech S2-Pro is a high-quality text-to-speech model supporting voice cloning via reference audio. Uses a two-stage pipeline: text to semantic tokens (LLaMA-based) then semantic to audio (DAC decoder).

Repository: localaiLicense: fish-audio-research-license

lfm2.5-1.2b-nova-function-calling
The **LFM2.5-1.2B-Nova-Function-Calling-GGUF** is a quantized version of the original model, optimized for efficiency with **Unsloth**. It supports text and multimodal tasks, using different quantization levels (e.g., Q2_K, Q3_K, Q4_K, etc.) to balance performance and memory usage. The model is designed for function calling and is faster than the original version, making it suitable for tasks like code generation, reasoning, and multi-modal input processing.

Repository: localaiLicense: apache-2.0

iquest-coder-v1-40b-instruct-i1
The **IQuest-Coder-V1-40B-Instruct-i1-GGUF** is a quantized version of the original **IQuestLab/IQuest-Coder-V1-40B-Instruct** model, designed for efficient deployment. It is an **instruction-following large language model** with 40 billion parameters, optimized for tasks like code generation and reasoning. **Key Features:** - **Size:** 40B parameters (quantized for efficiency). - **Purpose:** Instruction-based coding and reasoning. - **Format:** GGUF (supports multi-part files). - **Quantization:** Uses advanced techniques (e.g., IQ3_M, Q4_K_M) for balance between performance and quality. **Available Quantizations:** - Optimized for speed and size: **i1-Q4_K_M** (recommended). - Lower-quality options for trade-off between size/quality. **Note:** This is a **quantized version** of the original model, but the base model (IQuestLab/IQuest-Coder-V1-40B-Instruct) is the official source. For full functionality, use the unquantized version or verify compatibility with your deployment tools.

Repository: localaiLicense: iquestcoder

qwen3-coder-30b-a3b-instruct-rtpurbo-i1
The model in question is a quantized version of the original **Qwen3-Coder** large language model, specifically tailored for code generation. The base model, **RTP-LLM/Qwen3-Coder-30B-A3B-Instruct-RTPurbo**, is a 30B-parameter variant optimized for instruction-following and code-related tasks. It employs the **A3B attention mechanism** and is trained on diverse data to excel in programming and logical reasoning. The current repository provides a quantized (compressed) version of this model, which is suitable for deployment on hardware with limited memory but loses some precision compared to the original. For a high-fidelity version, the unquantized base model is recommended.

Repository: localai

qwen3-vl-8b-thinking
Qwen3-VL-8B-Thinking is the 8B parameter model of the Qwen3-VL series that is thinking. Uses recommended default parameters according to Unsloth documentation for Qwen 3 VL.

Repository: localaiLicense: apache-2.0

ibm-granite_granite-4.0-h-small
Granite-4.0-H-Small is a 32B parameter long-context instruct model finetuned from Granite-4.0-H-Small-Base using a combination of open source instruction datasets with permissive license and internally collected synthetic datasets. This model is developed using a diverse set of techniques with a structured chat format, including supervised finetuning, model alignment using reinforcement learning, and model merging. Granite 4.0 instruct models feature improved instruction following (IF) and tool-calling capabilities, making them more effective in enterprise applications.

Repository: localaiLicense: apache-2.0

ibm-granite_granite-4.0-h-tiny
Granite-4.0-H-Tiny is a 7B parameter long-context instruct model finetuned from Granite-4.0-H-Tiny-Base using a combination of open source instruction datasets with permissive license and internally collected synthetic datasets. This model is developed using a diverse set of techniques with a structured chat format, including supervised finetuning, model alignment using reinforcement learning, and model merging. Granite 4.0 instruct models feature improved instruction following (IF) and tool-calling capabilities, making them more effective in enterprise applications.

Repository: localaiLicense: apache-2.0

ibm-granite_granite-4.0-h-micro
Granite-4.0-H-Micro is a 3B parameter long-context instruct model finetuned from Granite-4.0-H-Micro-Base using a combination of open source instruction datasets with permissive license and internally collected synthetic datasets. This model is developed using a diverse set of techniques with a structured chat format, including supervised finetuning, model alignment using reinforcement learning, and model merging. Granite 4.0 instruct models feature improved instruction following (IF) and tool-calling capabilities, making them more effective in enterprise applications.

Repository: localaiLicense: apache-2.0

ibm-granite_granite-4.0-micro
Granite-4.0-Micro is a 3B parameter long-context instruct model finetuned from Granite-4.0-Micro-Base using a combination of open source instruction datasets with permissive license and internally collected synthetic datasets. This model is developed using a diverse set of techniques with a structured chat format, including supervised finetuning, model alignment using reinforcement learning, and model merging. Granite 4.0 instruct models feature improved instruction following (IF) and tool-calling capabilities, making them more effective in enterprise applications.

Repository: localaiLicense: apache-2.0

opengvlab_internvl3_5-30b-a3b
We introduce InternVL3.5, a new family of open-source multimodal models that significantly advances versatility, reasoning capability, and inference efficiency along the InternVL series. A key innovation is the Cascade Reinforcement Learning (Cascade RL) framework, which enhances reasoning through a two-stage process: offline RL for stable convergence and online RL for refined alignment. This coarse-to-fine training strategy leads to substantial improvements on downstream reasoning tasks, e.g., MMMU and MathVista. To optimize efficiency, we propose a Visual Resolution Router (ViR) that dynamically adjusts the resolution of visual tokens without compromising performance. Coupled with ViR, our Decoupled Vision-Language Deployment (DvD) strategy separates the vision encoder and language model across different GPUs, effectively balancing computational load. These contributions collectively enable InternVL3.5 to achieve up to a +16.0% gain in overall reasoning performance and a 4.05 ×\times× inference speedup compared to its predecessor, i.e., InternVL3. In addition, InternVL3.5 supports novel capabilities such as GUI interaction and embodied agency. Notably, our largest model, i.e., InternVL3.5-241B-A28B, attains state-of-the-art results among open-source MLLMs across general multimodal, reasoning, text, and agentic tasks—narrowing the performance gap with leading commercial models like GPT-5. All models and code are publicly released.

Repository: localaiLicense: apache-2.0

opengvlab_internvl3_5-30b-a3b-q8_0
We introduce InternVL3.5, a new family of open-source multimodal models that significantly advances versatility, reasoning capability, and inference efficiency along the InternVL series. A key innovation is the Cascade Reinforcement Learning (Cascade RL) framework, which enhances reasoning through a two-stage process: offline RL for stable convergence and online RL for refined alignment. This coarse-to-fine training strategy leads to substantial improvements on downstream reasoning tasks, e.g., MMMU and MathVista. To optimize efficiency, we propose a Visual Resolution Router (ViR) that dynamically adjusts the resolution of visual tokens without compromising performance. Coupled with ViR, our Decoupled Vision-Language Deployment (DvD) strategy separates the vision encoder and language model across different GPUs, effectively balancing computational load. These contributions collectively enable InternVL3.5 to achieve up to a +16.0% gain in overall reasoning performance and a 4.05 ×\times× inference speedup compared to its predecessor, i.e., InternVL3. In addition, InternVL3.5 supports novel capabilities such as GUI interaction and embodied agency. Notably, our largest model, i.e., InternVL3.5-241B-A28B, attains state-of-the-art results among open-source MLLMs across general multimodal, reasoning, text, and agentic tasks—narrowing the performance gap with leading commercial models like GPT-5. All models and code are publicly released.

Repository: localaiLicense: apache-2.0

opengvlab_internvl3_5-14b-q8_0
We introduce InternVL3.5, a new family of open-source multimodal models that significantly advances versatility, reasoning capability, and inference efficiency along the InternVL series. A key innovation is the Cascade Reinforcement Learning (Cascade RL) framework, which enhances reasoning through a two-stage process: offline RL for stable convergence and online RL for refined alignment. This coarse-to-fine training strategy leads to substantial improvements on downstream reasoning tasks, e.g., MMMU and MathVista. To optimize efficiency, we propose a Visual Resolution Router (ViR) that dynamically adjusts the resolution of visual tokens without compromising performance. Coupled with ViR, our Decoupled Vision-Language Deployment (DvD) strategy separates the vision encoder and language model across different GPUs, effectively balancing computational load. These contributions collectively enable InternVL3.5 to achieve up to a +16.0% gain in overall reasoning performance and a 4.05 ×\times× inference speedup compared to its predecessor, i.e., InternVL3. In addition, InternVL3.5 supports novel capabilities such as GUI interaction and embodied agency. Notably, our largest model, i.e., InternVL3.5-241B-A28B, attains state-of-the-art results among open-source MLLMs across general multimodal, reasoning, text, and agentic tasks—narrowing the performance gap with leading commercial models like GPT-5. All models and code are publicly released.

Repository: localaiLicense: apache-2.0

opengvlab_internvl3_5-14b
We introduce InternVL3.5, a new family of open-source multimodal models that significantly advances versatility, reasoning capability, and inference efficiency along the InternVL series. A key innovation is the Cascade Reinforcement Learning (Cascade RL) framework, which enhances reasoning through a two-stage process: offline RL for stable convergence and online RL for refined alignment. This coarse-to-fine training strategy leads to substantial improvements on downstream reasoning tasks, e.g., MMMU and MathVista. To optimize efficiency, we propose a Visual Resolution Router (ViR) that dynamically adjusts the resolution of visual tokens without compromising performance. Coupled with ViR, our Decoupled Vision-Language Deployment (DvD) strategy separates the vision encoder and language model across different GPUs, effectively balancing computational load. These contributions collectively enable InternVL3.5 to achieve up to a +16.0% gain in overall reasoning performance and a 4.05 ×\times× inference speedup compared to its predecessor, i.e., InternVL3. In addition, InternVL3.5 supports novel capabilities such as GUI interaction and embodied agency. Notably, our largest model, i.e., InternVL3.5-241B-A28B, attains state-of-the-art results among open-source MLLMs across general multimodal, reasoning, text, and agentic tasks—narrowing the performance gap with leading commercial models like GPT-5. All models and code are publicly released.

Repository: localaiLicense: apache-2.0

Page 1